Weird Intermittent Connection Refused: webMethods Integration Server


We are facing an intermittent Connection Refused error that occurs 3-4 times a day. The customer makes web service calls to an API Gateway (Policy Enforcement Point) server, which forwards each call to a webMethods Integration Server via a Load Balancer. The client also sends a scheduled SOAP over HTTPS web service call every 5 minutes to keep the systems monitored. This runs in parallel with the actual transactions made by the customer’s real consumers.

The API Gateway runs on Linux and the webMethods Integration Server runs on Windows Server 2012 R2.
Occasionally the API Gateway gets a Connection Refused error when a request arrives and it tries to invoke a web service on the Integration Server (webMethods IS 9.9). We don’t know why this happens, as the Integration Servers are up with no performance or resource issues. We cannot see anything in the Integration Server logs either.

The firewall person says it can’t be a firewall issue, because traffic between a given source IP and target IP is either all blocked or all allowed.

The network person says it must be the application server that is sending RESET to these requests: the Load Balancer can’t send a RESET on its own, only the application server can.

We have 2 Integration Servers (not clustered) running behind a Load Balancer VIP, which is invoked by an API Gateway with multiple hosts in a cluster. Every API Gateway host has seen Connection Refused failures, and every host is getting successful transactions as well.

On the Integration Server side, each of the 2 servers has 1000 threads allocated, and server health shows no issues at all. The issue happens in every environment (QA/Staging/Production). Another noticeable behavior is that the customer has performed several days of load testing and did not see a single Connection Refused failure during load testing.

We set up Wireshark captures on the Integration Servers and can see heartbeats from the Load Balancer continuously arriving at each of the 2 hosts. A transaction may fail at any point, but another transaction arriving within seconds may get a successful response. In the Wireshark captures we sometimes see:

  1. Just 1 packet with a RESET entry from the source (API Gateway) server, but we can’t tell whether it belongs to the same transaction that got Connection Refused.

  2. Once, the Integration Server received a SYN packet and returned RESET instead of SYN-ACK. We are not sure why this happened. Apart from these 1-2 instances where we saw some activity, we don’t see anything in Wireshark at the timestamps when the API Gateway got the Connection Refused error.
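
To make the second pattern (a SYN answered by RESET instead of SYN-ACK) easier to find in a long capture, something like the following could be run over packet records exported from the capture. This is only a sketch: the `Packet` record, its field names, and the `refused_handshakes` helper are hypothetical, and it assumes the capture has already been converted into simple (timestamp, addresses, ports, flags) records by whatever means is convenient.

```python
from collections import namedtuple

# Hypothetical record for one captured TCP packet; flags is a tuple such as
# ("SYN",), ("SYN", "ACK") or ("RST", "ACK").
Packet = namedtuple("Packet", "ts src sport dst dport flags")


def refused_handshakes(packets, window=1.0):
    """Return (syn, rst) pairs where a bare SYN was answered by a RST.

    Only a RST sent by the SYN's destination back to the SYN's source,
    within `window` seconds and before any SYN-ACK, is counted.
    """
    pending = {}  # (client, client_port, server, server_port) -> SYN packet
    refused = []
    for p in sorted(packets, key=lambda pkt: pkt.ts):
        flags = set(p.flags)
        reverse_key = (p.dst, p.dport, p.src, p.sport)
        if flags == {"SYN"}:
            # New connection attempt from a client.
            pending[(p.src, p.sport, p.dst, p.dport)] = p
        elif "RST" in flags:
            # A RST going back to a client whose SYN is still unanswered.
            syn = pending.pop(reverse_key, None)
            if syn is not None and p.ts - syn.ts <= window:
                refused.append((syn, p))
        elif {"SYN", "ACK"} <= flags:
            # Handshake was accepted; stop tracking this attempt.
            pending.pop(reverse_key, None)
    return refused
```

Each returned pair gives the client address, client port, and timestamp of a refused attempt, which could then be compared with the times at which the API Gateway logged Connection Refused. A RESET sent by the client side, as in the first pattern, is deliberately not counted here.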

Is there a setting on the Integration Server that can help us log all hits to the port, especially any connection refused events? We also looked at the Windows Event Viewer but do not see any exceptions there.
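
In the meantime, one thing we could do from the calling side is log every connection attempt ourselves. The sketch below is not an Integration Server setting and does not send a SOAP request; it only opens and closes a TCP connection to each target and records whether the connect was accepted, refused, or timed out. The function names, host names, and port used here are placeholders, not values from our environment.

```python
import socket
import time
from datetime import datetime, timezone


def probe(host, port, timeout=3.0):
    """Attempt one TCP connect and classify the outcome."""
    started = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "ok"
    except ConnectionRefusedError:
        # The peer (or something in between) answered the SYN with a RST.
        outcome = "refused"
    except socket.timeout:
        outcome = "timeout"
    except OSError as exc:
        outcome = f"error:{exc.errno}"
    elapsed_ms = (time.monotonic() - started) * 1000
    return outcome, elapsed_ms


def log_line(target, outcome, elapsed_ms):
    """Format one result with a UTC timestamp for later correlation."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    return f"{stamp} {target[0]}:{target[1]} {outcome} {elapsed_ms:.1f}ms"


def run(targets, rounds, interval=1.0):
    """Probe every target `rounds` times, printing one line per attempt."""
    for _ in range(rounds):
        for target in targets:
            outcome, elapsed_ms = probe(*target)
            print(log_line(target, outcome, elapsed_ms), flush=True)
        time.sleep(interval)
```

For example, `run([("is-host-1", 5555), ("is-host-2", 5555), ("lb-vip", 5555)], rounds=3600)` would probe both Integration Server hosts directly as well as the VIP roughly once a second for about an hour (the names and port are made up). Comparing refusals seen against the VIP with those seen against the individual hosts might at least show whether the RESET originates at the Integration Server or somewhere in front of it.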