Handling "Transport error: 408 Error: Request Timeout" from XML RPC Server

Every now and then, our calls from an XML RPC Server to a SOAP-based web service fail with the error "Transport error: 408 Error: Request Timeout". I am fairly certain this is a failure on the server side (possibly in the service's load balancer before the request even reaches the service module) and that nothing we can do would prevent it. Even with httpConnectionTimeout="300" set, when the failure happens, it happens at 30s (the RPC timeout setting in Natural is 55s); removing httpConnectionTimeout doesn't change this either. When the call succeeds, the response comes back in 1-2s, so waiting longer than 30s is unlikely to produce more good responses. A support request I opened with Software AG seems to confirm these impressions.

2021-02-23 15:23:59.747> EntireX-*orker-2( CP:HTTPTransport.invoke() I:doc format: SOAP 1.1 )
2021-02-23 15:23:59.747> EntireX-*orker-2( CP:HTTPTransport.invoke() I:SOAPAction: "" )
2021-02-23 15:23:59.747> EntireX-*orker-2( CP:HTTPTransport.invoke() I:Target: http://gapp013-qa.tcc.etn.com/PollartShippingDocs/PollartShippingDocEngineService )
2021-02-23 15:23:59.747> EntireX-*orker-2( CP:HTTPTransport.invoke() I:Timeout(ms): 300000 )
2021-02-23 15:23:59.763> EntireX-*orker-2( CP:HTTPTransport.invoke() I:Request: <?xml version='1.0' encoding='utf-8'?><SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"><SOAP-ENV:Body>

</SOAP-ENV:Body></SOAP-ENV:Envelope> )
2021-02-23 15:24:37.659> EntireX-*orker-2( CP:HTTPTransport.invoke() I:SendReceive Exception: com.softwareag.wsstack.client.api.WSClientException: org.apache.axis2.AxisFault: Transport error: 408 Error: Request Timeout )
2021-02-23 15:24:37.659> EntireX-*orker-2( CP:HTTPTransport.invoke() I:SendReceive Exception: com.softwareag.wsstack.client.api.WSClientException: org.apache.axis2.AxisFault: Transport error: 408 Error: Request Timeout
at com.softwareag.wsstack.client.impl.WSOperationClientImpl.execute(WSOperationClientImpl.java:68)
at com.softwareag.entirex.xml.rt.HttpTransportImpl.sendReceive(HttpTransportImpl.java:404)
at com.softwareag.entirex.xml.rt.TransportHandler.sendReceive(TransportHandler.java:236)
at com.softwareag.entirex.xml.rt.MessageHandler.processRPCMessage(MessageHandler.java:125)
at com.softwareag.entirex.xml.rt.XMLRPCServerRPCMessageHandler.processMessage(XMLRPCServerRPCMessageHandler.java:147)
at com.softwareag.entirex.aci.ServerRPCMessage.doNonConversation(ServerRPCMessage.java:66)
at com.softwareag.entirex.aci.ServerWorker.run(ServerWorker.java:185)
Caused by: org.apache.axis2.AxisFault: Transport error: 408 Error: Request Timeout
at org.apache.axis2.transport.http.HTTPSender.handleResponse(HTTPSender.java:340)
at org.apache.axis2.transport.http.HTTPSender.sendViaPost(HTTPSender.java:199)
at org.apache.axis2.transport.http.HTTPSender.send(HTTPSender.java:80)
at org.apache.axis2.transport.http.CommonsHTTPTransportSender.writeMessageWithCommons(CommonsHTTPTransportSender.java:406)
at org.apache.axis2.transport.http.CommonsHTTPTransportSender.invoke(CommonsHTTPTransportSender.java:233)
at org.apache.axis2.engine.AxisEngine.send(AxisEngine.java:443)
at org.apache.axis2.description.OutInAxisOperationClient.send(OutInAxisOperation.java:484)
at org.apache.axis2.description.OutInAxisOperationClient.executeImpl(OutInAxisOperation.java:263)
at org.apache.axis2.client.OperationClient.execute(OperationClient.java:165)
at com.softwareag.wsstack.client.impl.WSOperationClientImpl.execute(WSOperationClientImpl.java:65)
… 6 more
)

Ideally, we would want the team responsible for this service to identify the root cause and fix it. I am not optimistic this will happen, as we have been troubleshooting this issue for a long time.

My thoughts now turn to recovering from this condition: is there anything I can do in the EntireX layers (Broker, XML RPC Server) to auto-retry when this 408 error occurs? I am guessing the Natural client could check the return code of the service CALLNAT and retry on a bad RC, though that would address not just this case but any number of possible failures. Any retry logic would also have to allow for the possibility that the failure is not this sporadic 408 but that the service is actually down, and we wouldn't want the retry logic to end up looping a million times as a result.
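For what it's worth, the kind of bounded retry I have in mind would look roughly like this. It is sketched in Java rather than Natural, with hypothetical names and a simulated call; the point is only the bounded attempt count and the backoff between attempts, so a service that is down outright cannot cause an endless retry loop:

```java
import java.util.concurrent.Callable;

public class RetryDemo {
    // Hypothetical helper: retry a call up to maxAttempts, sleeping between
    // attempts with exponential backoff. If every attempt fails, the last
    // error is surfaced to the caller instead of looping forever.
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long backoffMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMs);
                    backoffMs *= 2;   // back off further on each failure
                }
            }
        }
        throw last;   // all attempts failed: report the original error
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulated service: fails with a "408" twice, then succeeds.
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) throw new RuntimeException("Transport error: 408");
            return "OK";
        }, 4, 100);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In a real implementation, the catch block should inspect the error and retry only for the transient 408 case, passing anything else (service down, bad request) straight through.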

Are there any options besides coding smart retry logic in the Natural client code?

Thanks,

Brian

Hi Brian,
as discussed in the Support Incident, a retry may simply fail again with the same timeout. We suggest contacting the provider of the Web Service to fix the cause of the timeouts.
Regards,
Peter

Hi Peter,

Thanks for reminding me that I had posted here in addition to opening the SR on this same topic, and for the opportunity to comment. When this issue was first raised to me, my first suggestion was for the service owner to identify and correct the reason for the timeout. All that followed was a bunch of finger-pointing: the owner of the service module says their code is not even getting invoked, so no logging on their side would matter; the platform folks say everything is fine with the server; the network people say there are no issues with traffic in or out of that server. After much time, the collective response was "why don't you just perform an automatic retry?".

So, I was very happy when Amin recommended, quite eloquently, that the service owner fix the root cause of the timeout. It gave me a stronger position: it is not just me saying this, but the vendor independently recommending the same course of action (and explaining why).

They are still trying to figure this out, and we have quashed the expectation that the permanent solution is automatic, immediate retry logic. Our current recovery from these failures is a manual retry, so we may still automate that to free up the time of the person who resends multiple transactions every day, but it will not be treated as the correct, permanent corrective action.

What they are doing right now is looking at enabling the RequestDumperValve in WEB-INF\jboss-web.xml to see whether it captures the failing requests in the server logs. They don't hold out much hope for that, though, and are thinking they really need a technical review of the application to see whether any design changes are needed. Hopefully they will figure it out in the end.
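For reference, a valve of this kind is typically declared in the deployment descriptor along these lines (a sketch only — the exact element names depend on the JBoss version, and note the class name is RequestDumperValve, with "Valve"):

```xml
<!-- WEB-INF/jboss-web.xml — illustrative fragment, verify against your JBoss version -->
<jboss-web>
    <valve>
        <class-name>org.apache.catalina.valves.RequestDumperValve</class-name>
    </valve>
</jboss-web>
```

The valve logs each incoming request's headers and parameters, which should at least show whether the failing requests reach the web container at all.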

Thanks also for adding your confirmation to back up Amin's position.

Regards,

Brian