Every now and then, our calls from an XML RPC Server to a SOAP-based web service fail with the error "Transport error: 408 Error: Request Timeout". I am fairly certain this is a failure on the server side (possibly the service's load balancer before the request ever reaches the service module) and that nothing we do on our end will prevent it. Even with httpConnectionTimeout="300" set, the failure occurs at about 30 seconds (the RPC timeout setting in Natural is 55 seconds); removing httpConnectionTimeout makes no difference. When the call works, a good response comes back in 1-2 seconds, so waiting longer than 30 seconds is unlikely to produce more good responses. A support request I opened with Software AG seems to confirm these impressions.
2021-02-23 15:23:59.747> EntireX-*orker-2( CP:HTTPTransport.invoke() I:doc format: SOAP 1.1 )
2021-02-23 15:23:59.747> EntireX-*orker-2( CP:HTTPTransport.invoke() I:SOAPAction: "" )
2021-02-23 15:23:59.747> EntireX-*orker-2( CP:HTTPTransport.invoke() I:Target: http://gapp013-qa.tcc.etn.com/PollartShippingDocs/PollartShippingDocEngineService )
2021-02-23 15:23:59.747> EntireX-*orker-2( CP:HTTPTransport.invoke() I:Timeout(ms): 300000 )
2021-02-23 15:23:59.763> EntireX-*orker-2( CP:HTTPTransport.invoke() I:Request: <?xml version='1.0' encoding='utf-8'?><SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"><SOAP-ENV:Body>
…
</SOAP-ENV:Body></SOAP-ENV:Envelope> )
2021-02-23 15:24:37.659> EntireX-*orker-2( CP:HTTPTransport.invoke() I:SendReceive Exception: com.softwareag.wsstack.client.API.WSClientException: org.apache.axis2.AxisFault: Transport error: 408 Error: Request Timeout )
2021-02-23 15:24:37.659> EntireX-*orker-2( CP:HTTPTransport.invoke() I:SendReceive Exception: com.softwareag.wsstack.client.API.WSClientException: org.apache.axis2.AxisFault: Transport error: 408 Error: Request Timeout
at com.softwareag.wsstack.client.impl.WSOperationClientImpl.execute(WSOperationClientImpl.java:68)
at com.softwareag.entirex.xml.rt.HttpTransportImpl.sendReceive(HttpTransportImpl.java:404)
at com.softwareag.entirex.xml.rt.TransportHandler.sendReceive(TransportHandler.java:236)
at com.softwareag.entirex.xml.rt.MessageHandler.processRPCMessage(MessageHandler.java:125)
at com.softwareag.entirex.xml.rt.XMLRPCServerRPCMessageHandler.processMessage(XMLRPCServerRPCMessageHandler.java:147)
at com.softwareag.entirex.aci.ServerRPCMessage.doNonConversation(ServerRPCMessage.java:66)
at com.softwareag.entirex.aci.ServerWorker.run(ServerWorker.java:185)
Caused by: org.apache.axis2.AxisFault: Transport error: 408 Error: Request Timeout
at org.apache.axis2.transport.http.HTTPSender.handleResponse(HTTPSender.java:340)
at org.apache.axis2.transport.http.HTTPSender.sendViaPost(HTTPSender.java:199)
at org.apache.axis2.transport.http.HTTPSender.send(HTTPSender.java:80)
at org.apache.axis2.transport.http.CommonsHTTPTransportSender.writeMessageWithCommons(CommonsHTTPTransportSender.java:406)
at org.apache.axis2.transport.http.CommonsHTTPTransportSender.invoke(CommonsHTTPTransportSender.java:233)
at org.apache.axis2.engine.AxisEngine.send(AxisEngine.java:443)
at org.apache.axis2.description.OutInAxisOperationClient.send(OutInAxisOperation.java:484)
at org.apache.axis2.description.OutInAxisOperationClient.executeImpl(OutInAxisOperation.java:263)
at org.apache.axis2.client.OperationClient.execute(OperationClient.java:165)
at com.softwareag.wsstack.client.impl.WSOperationClientImpl.execute(WSOperationClientImpl.java:65)
… 6 more
)
Ideally, we would want the team responsible for this service to identify the root cause and fix it. I am not optimistic this will happen, as we have been troubleshooting this issue for a long time.
My thoughts have now turned to recovering from this condition: is there anything I can do in the EntireX layers (Broker, XML RPC Server) to retry automatically when this 408 error occurs? I am guessing the Natural client could check the return code of the service CALLNAT and retry on a bad RC, though that would address not only this case but any number of other failures as well. Any retry logic would also have to allow for the possibility that a failure is not this sporadic 408 but that the service is actually down, and we would not want the retry logic to end up looping endlessly as a result.
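For illustration, here is a minimal sketch of the bounded retry I am picturing on the Natural side. Everything in it is hypothetical: the wrapper name RETRYWRP, the remote subprogram name SHIPDOCN, and the parameter layout are made up, and it assumes the RPC failure surfaces as a Natural runtime error that the wrapper can catch with ON ERROR.

* Caller: retry the wrapper a bounded number of times
DEFINE DATA LOCAL
1 #TRY     (I2)
1 #MAX-TRY (I2) INIT <3>
1 #RC      (N5)
END-DEFINE
*
FOR #TRY = 1 TO #MAX-TRY
  RESET #RC
  CALLNAT 'RETRYWRP' #RC        /* hypothetical wrapper around the remote call
  IF #RC = 0
    ESCAPE BOTTOM               /* good response: stop retrying
  END-IF
END-FOR
IF #RC NE 0
  IGNORE                        /* all attempts failed: surface the error here
END-IF
END
*
* RETRYWRP: wrapper subprogram that turns an RPC failure into a return code
DEFINE DATA PARAMETER
1 #RC (N5)
END-DEFINE
*
#RC := 0
CALLNAT 'SHIPDOCN'              /* remote subprogram via EntireX RPC (parameters omitted)
*
ON ERROR
  #RC := *ERROR-NR              /* e.g. the NAT error raised for the 408 transport failure
  ESCAPE ROUTINE
END-ERROR
END

In practice the wrapper would pass the real service parameters through, and the caller could inspect the returned error number to decide whether the failure looks like this sporadic 408 (worth retrying) or like the service being down (give up after the bounded number of attempts).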
Are there any options besides coding smart retry logic in the Natural client code?
Thanks,
Brian