XML RPC Server hangs on sixth call

An old problem has reared its ugly head, and I am under a lot of pressure to fix it. I have an SR open on this and don’t want to divert away from the efforts there, but because it’s an old problem we fixed, maybe I am overlooking something.

We have three web services that we call from Natural through EntireX and the XML RPC servers. We had these working after some previous issues, but we have since changed two of those services, which changes the WSDL, the IDL and the XMM for them. I cannot generate the IDL and XMM from the WSDL. SAG R&D is looking into this right now, because if I could, I could also generate the Natural client program. We have had to edit these objects manually in order to add the new parameters. I think we’ve done a pretty good job of understanding the format string, the buffer length, the mappings and the variables defined in the client program and PDA, but as all the warnings say not to edit these objects by hand, I cannot be sure our manual adjustments are 100% accurate.

In one changed service, the CheckAvailability one, they changed the XML payload on return to pass back an error type and message (A1 and A240 respectively) instead of a code (A4). The rest of the XML response maps well and is available to the program, but the newly added fields do not get returned to Natural.

The other changed service, PubSalesOrder, had some parameters added to the XML request, and as this is meant to be asynchronous, there is no XML response. The changes to our objects allow for five successful calls, but on the sixth call the XML RPC Server hangs, which is the behaviour we saw before. Sometimes we can deregister and restart the XML RPC Server for this service, and other times we have to reboot the whole Windows instance. Then we are good for another five successful calls, and the sixth one hangs consistently. When we had this problem before, it was solved when we implemented Reliable RPC, generated and used the Natural client stub, and deployed a fixed entirex.jar file on the Windows instance running the XML RPC Servers. As far as I can tell, we are still using those, so I don’t know why we would hang unless there is something not right with the stub. Since I can’t generate the stub from the current WSDL, I can’t be certain the objects are correct.

This is why I am focused on generating the IDL from the WSDL as a means of solving the problem where the call fails to reach the XML RPC Server, and why I think CheckAvailability is also failing, in a different way but from the same root cause. But if I am missing something easy, perhaps I can resolve it faster.

Please advise as soon as possible since Eaton is putting a lot of pressure on me to resolve it.

Thanks,

Brian,

I assume you’ve been asked for a broker trace already by support for that SR ?

Anything unusual and/or suspect in there ?

I’d say if it reproduces that easily and constantly there must be a way of catching it …
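
One way to pin down exactly which call stalls is a small harness that issues the calls one at a time with a timeout. The following is only a sketch: `flaky_service` is a hypothetical stand-in for the real client call (it is not an EntireX API), and the timeout values are arbitrary.

```python
import concurrent.futures as cf
import time


def first_hanging_call(call, attempts, timeout_s):
    """Run call(1), call(2), ... in sequence and return the number of the
    first call that does not finish within timeout_s seconds, else None."""
    pool = cf.ThreadPoolExecutor(max_workers=attempts)
    try:
        for i in range(1, attempts + 1):
            future = pool.submit(call, i)
            try:
                future.result(timeout=timeout_s)
            except cf.TimeoutError:
                return i
        return None
    finally:
        # Do not block on a worker thread that may still be stuck.
        pool.shutdown(wait=False)


def flaky_service(i):
    # Hypothetical stand-in: the sixth call stalls, as described in this thread.
    if i == 6:
        time.sleep(0.5)
    return i
```

With this stand-in, `first_hanging_call(flaky_service, attempts=8, timeout_s=0.1)` reports call 6. Running the same loop against the real client with tracing switched on would narrow the trace down to the one request that matters.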

R&D states that their analysis of the log files shows that the CheckAvailabilityRequest.xsd is fetched twice but I am waiting for further or final analysis.

Ok, but that isn’t related to Broker itself, is it?

Not sure what relation this has to the hang and “5 work, the sixth does not” though :wink:

What are your

entirex.server.minservers
entirex.server.maxservers

parameters set to ?

I don’t believe the Broker is having any issues at all. I think the Broker is attempting to deliver the message to the XML RPC Server, and it’s the server that has the problem. While that server is having problems, Broker is successfully working with other services, so I don’t think it’s a Broker issue.

I think it’s like the XML RPC Server is maintaining resources for a response that will never come from the service (as it’s asynchronous).

I don’t know if I actually specify the parms you inquire about.
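
A quick way to check is to scan the server's properties file for the two keys. This is a rough sketch that assumes the XML RPC Server reads a Java-style `key=value` properties file; the sample content and its values are invented for illustration, and `None` simply means the key is not set explicitly.

```python
def read_properties(text):
    """Parse simple key=value lines of a Java-style properties file.
    Blank lines and comment lines starting with '#' or '!' are skipped."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def server_pool_settings(text):
    """Return the two worker-pool settings asked about, or None if unset."""
    keys = ("entirex.server.minservers", "entirex.server.maxservers")
    props = read_properties(text)
    return {key: props.get(key) for key in keys}


# Hypothetical file content, not taken from the actual installation.
SAMPLE = """
# XML RPC Server settings (illustrative values only)
entirex.server.maxservers = 8
"""
```

Calling `server_pool_settings(SAMPLE)` returns a dictionary with one entry per key, so a missing setting is easy to spot.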

Brian,

I didn’t say (or imply) Broker is having issues, just thought the trace might reveal something !

When you say there’s an issue with the XSD being fetched twice, can you run the server
without it, i.e. remove (rename) the XSD for a while ?

Just curious, I don’t know if the server insists on having an XSD, maybe it works similar to
the SOA Gateway where we can work with or without an XSD, we’ll just not be able to do
schema validation of requests without it.

Hi Wolfgang,

I appreciate your help - you have always been there for us on SAG-L and in these forums!

Unfortunately, I have no control over how the service is deployed as another group of developers is creating this and they use the XSD files as per their own standards inherited from the AIA programming style that Oracle and Deloitte use as their “best practice”. Anyway, I am not sure how we will know anything about the data elements required for the service if we remove the XSD files, since that’s where the definitions all are. EntireX needs those to understand what parameters to put in the IDL, XMM and Natural Client objects.

As for the logs/trace files, R&D hasn’t asked for the Broker trace but rather provided me a patch to the Designer tool to assist me in resolving the fact that I cannot generate the IDL file from the current service definition. Is there something I should be able to see if I look myself?

Thanks,

Well, the XMM has the mapping, a WSDL defines the interface, XSDs are for validation.

I can only tell from how we handle that in the SOA Gateway: when an XSD is present,
data validation will take place; without the XSD the request will still be processed,
but one may hit issues on either the client side or the backend.

So it may be naive, but when there’s an XSD issue I’d try without to rule out one
of the possible problem sources :wink:

I’m afraid I cannot control the existence of the XSD to try that. Pretend this is a professional external service provider.

Tying up loose ends…

…this issue has been resolved by deploying the Hotfix 13 for EntireX.jar.

Thanks to all who responded.

Thanks for the information, Brian,

any insight as to why it was (consistently) the 6th call ?

I am guessing it’s some kind of resource issue because the old EntireX.jar file didn’t seem to recognize the null responses. It wasn’t even like it was a normal error - perhaps some condition it wasn’t coded to handle. Perhaps some sub-process was looping?

It handles it perfectly now. :slight_smile:

If you know the R&D folks, they may confide the specifics to you directly. It’s such a relief to have this behind me, though. I don’t have to feel like it was just my lack of experience. People at Eaton look to me to be the EntireX guru, while I feel I know about 1% of what there is to know. When I know twice as much as I do, I will feel like I know a half percent. :slight_smile:

There was an issue in the XML RPC Server: it did not correctly handle an HTTP response containing no payload for an asynchronous web service call.
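
To illustrate the general idea only (this is not the EntireX code, and the function name is made up): a caller that expects no reply has to treat a successful response with an empty body as a normal outcome instead of trying to parse it or waiting for more data.

```python
import xml.etree.ElementTree as ET


def parse_ws_response(status, body):
    """Return the parsed XML root element, or None when a successful
    response carries no payload (the asynchronous case)."""
    if not 200 <= status < 300:
        raise RuntimeError(f"web service returned HTTP {status}")
    if not body.strip():
        # Nothing to map back to the caller; this is not an error.
        return None
    return ET.fromstring(body)
```

Here `parse_ws_response(202, b"")` yields `None`, while a response with an XML body is parsed as usual.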

This was corrected last December with a fix for the 821 version of entirex.jar. However, at some point this year Brian replaced this (working) version of entirex.jar with the vanilla 822 version, which did not contain the correction (because the 8220 version had already been built in November). Replacing the 8220 version with the latest 822 fix brought back the working functionality.

I noticed when I deployed the fixed version of the v822 EntireX.jar file that the file I replaced was dated only 3 days before the file with the fix. Unfortunately, it took a lot longer to recognize that when I was asked to upgrade to v822, the fix was not part of the base. I had assumed that fixes from the prior version were always included in the base of the next one, even at the SM level.

Hi Brian,

the fix for 821 was built on December 8 2011. The 822 release version was built on November 4 2011. So it is impossible that it includes the fix …

The date of a file in the file system is not relevant. It is the date when the file was saved to disk. Only when the file is unpacked from an archive (which is what an installer does, for example) does it keep its original date. The build date can be seen in the log file (e.g. “Start of EntireX XML RPC Server, Version: 8.2.2.0.186, Date: 04 Nov 2011”).
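
The check can be scripted. The sketch below assumes the startup line always has the shape of the example quoted above; the function names are invented for illustration. Note that a build older than the fix cannot contain it, whereas a newer build date alone does not prove the fix is included.

```python
import re
from datetime import date

# Month abbreviations are mapped by hand to avoid locale-dependent parsing.
MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

BANNER = re.compile(
    r"Version: (?P<version>[\d.]+), "
    r"Date: (?P<day>\d{1,2}) (?P<month>[A-Za-z]{3}) (?P<year>\d{4})")


def build_info(log_line):
    """Extract (version, build date) from a startup line, or return None."""
    match = BANNER.search(log_line)
    if match is None or match.group("month") not in MONTHS:
        return None
    built = date(int(match.group("year")),
                 MONTHS[match.group("month")],
                 int(match.group("day")))
    return match.group("version"), built


def predates_fix(built, fix_built):
    """True if a build is older than the fix and so cannot contain it."""
    return built < fix_built


LINE = "Start of EntireX XML RPC Server, Version: 8.2.2.0.186, Date: 04 Nov 2011"
```

For the quoted line, `build_info(LINE)` gives version `8.2.2.0.186` built on 4 November 2011, and `predates_fix` confirms that this is before the 8 December 2011 fix.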

Then the many things I tried must have included deploying an EntireX.jar with a timestamp 3 days before the one that included the fix. Perhaps when I did the upgrade, that version was the latest one available.

There were a few support tickets that were long-running in this quest to get everything to work, which was well beyond just this one issue (though it was certainly the centrepiece).

There were lots of issues for such a mature product. I have to wonder if anyone else is trying to do what we’re doing with XML RPC Servers and bi-directional web service calls in and out of the Natural environment, the use of AIA style programming which includes vendor provided self-referencing schema definitions and asynchronous service calls.

The self-referencing schema definition issue is still unresolved, but we have an intermediary service deployed that doesn’t use them.