I am working in a webMethods environment with only two Integration Servers in a cluster. Node 1 handles most of the load, while only particular applications run on Node 2.
Traffic on Node 2 is relatively lower than on Node 1. However, we are facing an issue where the IS becomes unresponsive at times and transactions take a very long time to process. At times, the transaction transit time has even been observed to run into minutes.
Most of our interactions are with DB2 servers. When the server starts to sink, we observe that the number of system threads climbs to a high level (still within the maximum limit, and even within the threshold limit) and memory utilization goes to 95% or more (it is usually 80% to 90%).
The server logs show “Broken pipe error” and “SQL30108N. A connection failed but has been re-established.” but nothing else.
We had to restart the server to get it responding properly again, and I feel that restarting a production Integration Server every time shouldn’t be the solution.
Can someone suggest how I can troubleshoot this issue? (While the issue is occurring, the server is completely unresponsive, so no action can be taken through the IS Admin console.)
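
For reference, the thread count and memory percentage mentioned above are the kind of figures any JVM exposes through the standard java.lang.management API. Below is a minimal sketch of sampling them, in case it helps anyone correlate the numbers with the log errors. The class name JvmSnapshot and its methods are hypothetical, not part of webMethods, and the code reports on the JVM it runs in, so it would have to run inside (or be adapted to attach to) the Integration Server process; I am not claiming this is how the IS itself calculates its statistics.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;

// Hypothetical helper (not a webMethods class): samples live thread count
// and heap utilization of the JVM in which it is running.
class JvmSnapshot {

    // Heap in use as a percentage of the maximum.
    // Returns -1 when the JVM reports no defined maximum (max <= 0).
    static double heapUsedPercent(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            return -1.0;
        }
        return 100.0 * usedBytes / maxBytes;
    }

    // One line with the current thread count, peak thread count and heap usage.
    static String snapshot() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return String.format("threads=%d peak=%d heapUsedPct=%.1f",
                threads.getThreadCount(),
                threads.getPeakThreadCount(),
                heapUsedPercent(heap.getUsed(), heap.getMax()));
    }

    public static void main(String[] args) throws InterruptedException {
        // Take three samples, one second apart.
        for (int i = 0; i < 3; i++) {
            System.out.println(snapshot());
            Thread.sleep(1000);
        }
    }
}
```

Logging a line like this at a fixed interval would at least show whether the thread count and heap usage rise gradually before the hang or jump suddenly when the DB2 connection errors appear.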