We are in the process of establishing a Broker/Natural RPC server environment to serve 406 concurrent users. How can we tell how many RPC servers (NTASK) are enough to serve the user base? Is there a way I can tell if the broker is waiting for an RPC server to service its call?
To test out the theory, in our TEST environment, we have tried running 8 concurrent users with 8 broker worker threads and 8 instances of an RPC server. We can see that all 8 instances of the RPC server are used but cannot tell whether 8 RPC servers are indeed necessary.
SMH allows me to see the broker worker thread consumption and distribution. Is a similar tool available for the rpc servers?
I generally monitor the “pending conversations” statistic in SMH. If this number is greater than 1, it indicates that there is a message (conversation) waiting for a server to process it. If you are often seeing pending conversations of 3 or more (also watch “pending conversations (high)”), you will likely increase throughput by adding servers.
If you reduce the number of servers and the same load still shows no pending conversations, you can try reducing further. How many is “enough” will vary depending on the load (number of concurrent client requests), how long the service process takes (longer processing generally will require more servers to maintain overall throughput), and available memory and CPU. For example, a site had about 150 RPC servers across several applications, but they had a single-engine CPU: trimming the number of RPC servers down (to about 10 - 15) improved overall throughput as the CPU had less swapping and task management to do.
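To make the rule of thumb above concrete, here is a minimal Python sketch. It assumes you note down the “pending conversations” value by hand each time you refresh SMH; it does not call any SMH or EntireX interface. The function name, the 25% cut-off for “often”, and the sample values are my own assumptions for illustration, not anything from the product.

```python
def suggest_server_change(pending_samples, threshold=3, often_fraction=0.25):
    """Turn hand-collected 'pending conversations' readings into a hint.

    threshold      -- backlog size treated as significant (3, per the rule of thumb)
    often_fraction -- share of readings at/above threshold that counts as "often"
                      (hypothetical value, tune it for your site)
    """
    if not pending_samples:
        raise ValueError("need at least one sample")
    backlog = sum(1 for p in pending_samples if p >= threshold)
    if backlog / len(pending_samples) >= often_fraction:
        return "add servers"
    if max(pending_samples) == 0:
        # Nothing ever waited: the same load may run with fewer servers.
        return "try fewer servers"
    return "keep current count"


# Illustrative readings taken during one test run (made-up numbers).
samples = [0, 1, 4, 3, 0, 5, 2, 3]
print(suggest_server_change(samples))
```

Re-run the same load after each change in server count and compare the hints before settling on a number.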
Unless you are very resource constrained, having an extra server instance or two
in addition to what you deem “enough” won’t do much harm, and it leaves you prepared
for increasing load or peaks.
Establish an Attach Server that can start additional servers as needed. Then you do not need to have them all started from the beginning, which is especially useful when you do not know how many are required to cope with the peak.
We ran several load tests over the weekend. With 500 users, our pending conversations (high) consistently stays around 50-60. We have 10 worker threads set and 17 RPC subtasks started. The active conversations (high) seemed to be around 200.
My first thought is that perhaps too many conversations are waiting at any one point. We are looking at Natural subprogram performance as well. If the issue is not there (we looked at the time between callnat and return from callnat in the RPC server log), is it fair to say that we could try adding more RPC servers (additional NTASKs)? The other question is: what about an additional broker?
Analyzing the called subprograms is something that just needs to be done anyway,
simply because when you fire 50 requests per second and a single RPC call takes
2 seconds there is no way you can have NO queueing …
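The arithmetic behind that statement can be sketched with Little's law (average servers busy = arrival rate × service time). This is a back-of-the-envelope estimate that assumes steady arrivals and a fixed service time; the function names are made up for the example, and the 17 is the RPC subtask count mentioned earlier in the thread.

```python
import math


def min_servers(arrival_rate_per_s, service_time_s):
    """Servers needed just to keep up on average (Little's law), no headroom."""
    return math.ceil(arrival_rate_per_s * service_time_s)


def max_throughput(servers, service_time_s):
    """Requests per second a pool can finish when every server is always busy."""
    return servers / service_time_s


# 50 requests per second, 2 seconds per RPC call
print(min_servers(50, 2.0))     # 100
# What 17 RPC servers can sustain at 2 seconds per call
print(max_throughput(17, 2.0))  # 8.5
```

Anything arriving faster than the second number has to queue, which is why shortening the subprogram run time usually pays off before adding servers does.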
Workers …
Play with the number of workers, on every system there’s a point where increasing
the number of workers actually works against you, simply because z/OS isn’t very
good in scheduling subtasks, when EntireX distributes the work evenly and the
9th and 10th workers don’t get their fair share …
I found 8 to be a good rule of thumb, but … play with it.
Another broker … definitely an option, due to what I said above, z/OS does better
in scheduling address spaces than subtasks, so 2 brokers with 6 workers each
may give you better throughput than one Broker with 12. Of course it also adds
some base overhead, if your system can cope with that … give it a try, but I’d
say that’s more of a last resort than a typical scenario.
The “(high)” counters will always show the high water mark for this counter - once it reaches a high value, it will never go below that value until you restart the server. So if you have a high spike during your test’s startup cycle, for example, it will mask the highest activity during the actual test. During the test you need to refresh SMH to monitor the recent values for active conversations and pending conversations.
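A small Python sketch of the masking effect, assuming you record the current pending-conversations value at each SMH refresh. The readings and function names are invented for illustration; nothing here reads from SMH itself.

```python
def high_water_marks(samples):
    """Running maximum, which is how a '(high)' counter behaves until restart."""
    marks, peak = [], 0
    for s in samples:
        peak = max(peak, s)
        marks.append(peak)
    return marks


def windowed_peaks(samples, window):
    """Peak per block of `window` consecutive readings, so later phases stay visible."""
    return [max(samples[i:i + window]) for i in range(0, len(samples), window)]


# Made-up readings: a startup spike followed by steady-state load.
samples = [55, 60, 12, 8, 10, 9, 14, 11]
print(high_water_marks(samples)[-1])  # 60
print(windowed_peaks(samples, 4))     # [60, 14]
```

The running maximum stays at the startup spike, while the per-window peaks show that the steady-state backlog was far lower.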
There are a lot of variables to consider before determining the “best” mix for your environment and application. A sample:
CPU load, number of processors (more processors will generally allow more concurrent tasks/subtasks to run productively)
relative CPU usage of RPC servers vs EntireX Broker (vs Adabas…)
length of time for service programs to execute (this is where Adabas/Natural tuning of the applications called will help). As Wolfgang says, the longer the RPC service takes, the more queueing you can expect
number of concurrent users - is the test simulating active users with little think time? For many web applications, 500 “active” users will only result in 5 - 20 concurrent requests at a time as the rest is “think time” - time for the web application to render, the user to assess results and type/click new requests.
do you have very different durations across the different service requests - some that take little time, others that take seconds or minutes to run? Consider partitioning these services across different RPC Servers to prevent longer running services (usually fewer in number) from blocking the quick services (usually the majority of requests).
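The think-time point in the list above can be estimated roughly: each user spends only part of every request/think cycle actually waiting on a request. The sketch below uses that simple ratio and ignores queueing delay; the 0.5 second service time and 30 second think time are assumed values for illustration, not measurements from this thread.

```python
def concurrent_requests(users, service_time_s, think_time_s):
    """Average number of requests in flight for `users` cycling request -> think."""
    cycle = service_time_s + think_time_s
    return users * service_time_s / cycle


# 500 "active" users, 0.5 s per request, 30 s think time (assumed values)
print(round(concurrent_requests(500, 0.5, 30.0), 1))  # 8.2

# A load test with no think time keeps every user's request in flight
print(concurrent_requests(500, 0.5, 0.0))             # 500.0
```

If the load test drives all 500 users with no think time, it simulates far more concurrent requests than 500 real users would normally generate.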
Shameless plug: consider having Software AG Global Consulting Services apply their experience to help you achieve your application goals!