I found Broker and the RPC servers to be very solid; they remained active between weekly IPLs. Like Wolfgang and Douglas, my client determined that multiple tasks within one job (NTASKS) did the job as well as multiple RPC server jobs, so we implemented NTASKS > 1.
The Natural RPC job was submitted several times to mitigate possible job failure; if the first (i.e. active) job failed, the next one in the queue would automatically take its place. But we found that almost all RPC server failures were the result of application logic errors, and these logic errors would cause the servers and jobs to fall like dominoes, in rapid succession, so the queued jobs gave little protection. I recommended that Optimize for Infrastructure be implemented to monitor the number of active tasks and send appropriate notifications. As a short-term solution, I built a Natural program that could initiate, query, or terminate a server. Running in batch mode, it could send an e-mail notification if the number of active tasks fell below a specific threshold. I used calls to BROKER, USR2071N and USR2073N to build it.
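
For anyone who wants to reproduce the idea, the threshold check described above can be sketched roughly as follows. This is only an illustration in Python, not the Natural program itself: `ThresholdCheck`, `check_server`, `query_active_tasks` and `notify` are hypothetical names, and the actual calls to BROKER, USR2071N and USR2073N are not shown. The query and the notification are passed in as functions so the logic stays independent of how the task count is obtained or how the e-mail is sent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ThresholdCheck:
    """Hypothetical settings for one monitored RPC server."""
    server_name: str
    min_active_tasks: int


def check_server(
    check: ThresholdCheck,
    query_active_tasks: Callable[[str], int],
    notify: Callable[[str], None],
) -> bool:
    """Return True if the server is healthy; otherwise notify and return False.

    query_active_tasks stands in for whatever reports the number of active
    server tasks; notify stands in for the e-mail step. Both are assumptions.
    """
    active = query_active_tasks(check.server_name)
    if active < check.min_active_tasks:
        notify(
            f"RPC server {check.server_name}: {active} active task(s), "
            f"below threshold {check.min_active_tasks}"
        )
        return False
    return True
```

A batch job could run a check like this on a schedule for each server and let `notify` send the e-mail; the threshold would normally sit somewhere below the configured NTASKS value so that a single lost task does not raise an alarm.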