com.pcbsys.nirvana.client.nRequestTimedOutException on first publish

Hi,

My customer migrated from wm8/broker to wm9.7/um a few months back and we’re experiencing some strange behavior on ONE wm trigger.

Since then, on only one WM trigger, we always get the following error:

Error Messages:com.wm.app.b2b.server.dispatcher.exceptions.ResourceUnavailableException: com.pcbsys.nirvana.client.nRequestTimedOutException: The server failed to respond within the time out value:60000 last ID = 1246780 class com.pcbsys.nirvana.base.events.nTXPublishCommit

And only on the first element…

We’re parsing a CSV file; each line is converted into one publishable document and pushed through the pub.publish:publish service to the UM.
Every time a new doc gets consumed, the first CSV line that gets published returns the error above…

We found one article on webMethods saying this error was a false one and could be ignored, but the document definitely never got published, and so never got consumed…
If you resubmit the service through My webMethods or run it in debug, no error is returned…
As it’s a once-a-week process, for the moment we’re manually resubmitting the one in error, but we’re looking for a better solution.

Any help or guess would be useful before we open a case with webMethods…
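
Until the root cause is known, the manual resubmission described above could in principle be automated with a bounded retry around the publish step. The following is only a minimal Java sketch of that idea: `PublishRetry`, `PublishAction` and `publishWithRetry` are hypothetical names, and `PublishAction` merely stands in for whatever step invokes pub.publish:publish. None of them are part of the webMethods API.

```java
/** Sketch of a bounded retry; all names here are hypothetical, not webMethods API. */
class PublishRetry {

    /** Hypothetical stand-in for the step that calls pub.publish:publish. */
    interface PublishAction {
        void publish() throws Exception;
    }

    /**
     * Runs the action until it succeeds or maxAttempts is reached.
     * Returns the number of attempts used; rethrows the last failure otherwise.
     */
    static int publishWithRetry(PublishAction action, int maxAttempts, long backoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                action.publish();
                return attempt;
            } catch (Exception e) {
                // Assumption: every failure is retried. A real version would
                // probably retry only on the timeout exception seen above.
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis);
                }
            }
        }
        throw last;
    }
}
```

A retry like this is only safe if a timed-out publish really did not reach the UM, as observed here; otherwise it could produce duplicates.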

It looks like the timeout occurs because the underlying resource was not available to receive the message. I could be wrong, but we noticed there are issues with pub.publish with UM 9.7 in quorum. Can you provide more details: are you running it as a UM quorum? How much memory does your UM have? What kind of channel are you publishing to? Are you using any caching, etc.?

I’ve dug a little more and here are some more details:

We have the issue on each service publishing this specific publishable document.

The document is correctly synced with the UM.
The issue is occurring on our production, preproduction and dev environments (but all have been configured the same way, using the same procedure webMethods gave us).
All documents have been synced through Designer and bound on the UM with the following command:
./jmsadmin bind topic /wm/is/dirname/docname

Each time one of those services tries to publish the first document to the UM, we get the message I quoted in the first post.
All services behave the same way: they receive some CSV files, parse them, extract all necessary data, populate the document, and publish it to the UM (one message per line; some services have 1-10 lines, others 100…), and every time the first one is published we get the exception.

If we try to run those services in debug mode, we have no issue…

The new infrastructure has more than 4 times the capacity of the previous wm8 infra, and the wm8 infra was still running smoothly.

Hi Etienne,

are you sure that it is the first publish that is failing? How do you know it is the first publish?
Has this ever worked with UM, or is it a recent problem?
Are you using the setting “delayUntilServiceSuccess”? If so, then it will collect all published documents and commit them as one big transaction, which may cause problems if there are very many publishes.

Also you mention doing jmsadmin bind… Why do you do that? Are you also subscribing to these messages using JMS?

And if this is affecting your production environment, please open a Support Incident, to make sure it gets the right attention.

Jonathan Heywood
Software AG Product Management

Hi Jonathan,

thanks for your message.

We’re sure it’s the first message that is failing, as the service reads the CSV line by line, publishing one line at a time (through one sub-service), and it’s always the content of the first line, and the first service call, that we get in error. (By the way, no error is shown in the IS; it is only visible through My webMethods.)

We don’t use JMS triggers; it’s classic pub/sub Broker messaging which was migrated to the new IS / UM environment after a check by the webMethods team, who said it was doable without modification.

It has never worked since the migration to the UM. At first the technical guy from WM told us to upgrade the RAM of the UM, but we still have the issue. We were supposed to get some more news from WM, but have heard nothing for a few months (other projects got hotter, and as it’s a once-a-week issue it has been handled manually since then).

The jmsadmin bind was the command we ran to set up the UM topic at the beginning. It’s the syntax we got from WM to declare our topic on the UM to get pub/sub working.

The next step will be to open a Support Incident, but my customer doesn’t have time to work on it until January, so I was asking here in case someone could offer some clues before that, as it’s not an optimal situation.

thanks,
Etienne

Etienne,

You say that you see the error in MWS. Where precisely in MWS (which screen) are you seeing it?
Also do you have more information about the size of the message you are publishing?
And could you provide a screenshot of the doc type definition?

Regarding jmsadmin, that is not necessary and is completely unrelated to the use of native pub/sub. The correct way to create the asset on UM is to synchronize the publishable document type from Designer. Did you do that as well?

Can you please email me the name of the person from Software AG that told you to use jmsadmin? I would like to check that they have the correct information to guide customers. My email address is jonathan.heywood@softwareag.com

Hi Jonathan/Etienne,

We are hitting a similar issue in our env and just tracked it down to this post.

Are there any configuration tweaks that need to be done for this specific timeout issue in UM?

Env:
wM 9.10 (testing phase, with migration from 9.5 Broker).

Error:
See internal message for details INTMSG: com.pcbsys.nirvana.client.nRequestTimedOutException: The server failed to respond within the time out value:60000 last ID = 2417487 class com.pcbsys.nirvana.base.events.nTXPublishCommit com.wm.app.b2b.server.dispatcher.exceptions.ResourceUnavailableException: com.pcbsys.nirvana.client.nRequestTimedOutException: The server failed to respond within the time out value:60000 last ID = 2417487 class com.pcbsys.nirvana.base.events.nTXPublishCommit

Regards,
Sabarish

Sabarish,

I didn’t get enough answers from Etienne to be able to help further.
Can you provide some more details:

  • Are you using pub.publish:publish or pub.jms:send?
  • Is your UM server clustered?
  • Do you see this with every message sent?
  • Where do you see the error reported?

Hi Jonathan,

Please find below the details

  • Are you using pub.publish:publish or pub.jms:send?
    It’s pub.publish:publish
  • Is your UM server clustered?
    No.
  • Do you see this with every message sent?
    No
  • Where do you see the error reported?
    IS

//Sabarish

Sabarish,

The error implies a responsiveness issue with UM. This could occur if the UM server has limited resources or a (very) slow disk or network. It could also occur if there is a large backlog of messages on the channel.
I suggest you open a Support Incident to have this investigated further.

Hi Jonathan,

Thanks for the info. And yes, as suggested we will follow this with an SR.

I just want to highlight one more finding from our side for info in this thread.

It is the encoding type of the publishable document.

After running the migration utility service “pub.utils.messaging:migrateDocTypesTriggersToUM”, the doc encoding type was updated to “ProtocolBuffers”, and with this we are hitting the timeout error.

Reverting it to “IDATA” actually eliminated the timeout error and we were able to publish. (But again, the published docs are not picked up by the triggers, and that’s a different issue we are following up with SAG anyway.)

We are interested in knowing more about this encoding type, which currently gives different results. We would appreciate your inputs on this.

Thanks!

//Sabarish

Hi All,

The timeout issue got resolved on our end.

The issue was the presence of a regex with an asterisk in the filter conditions.

Removing the filter resolved the issue.

Thanks Jonathan.

//Sabarish

Sabarish,
I’m glad to hear it got resolved. I’d like to do some follow-up with our R&D department, so could you post the support incident number, or the filter condition that was giving the problems?
thanks

Hi Jonathan,

The SR is 5272857 and the filter conditions are as below.

Pre Migration:
%metadata/metadata/infoObject% L_EQUALS “XYZ” && (%metadata/metadata/finalReceiverSystem% = /XYZ.*/ || %metadata/metadata/finalReceiverSystem% = /ZYX.*/)

Post Migration:
(metadata.metadata.infoObject = ‘XYZ’ and (metadata.metadata.finalReceiverSystem LIKE ‘%XYZ%’ or metadata.metadata.finalReceiverSystem LIKE ‘%ZYX%’))

Actually, the migration utility service skipped the relevant triggers with warnings due to the presence of “.*” in them. We edited them manually, replacing the lexical operators but keeping the “(%metadata/metadata/finalReceiverSystem% = /XYZ.*/” condition, and this condition completely broke the channel behaviour: our publish failed with the timeout error.

This was fixed by using the “LIKE” operator.

The worst thing was that we had to manually enable the triggers one by one, out of 150+ triggers, and try publishing each time to find the culprit :).

Thanks for all the inputs!

//Sabarish
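
Rather than enabling 150+ triggers one by one, the Broker-style filter strings could be scanned up front for regex literals of the `= /.../` form, which is the construct reported above as the culprit. The following is a rough Java sketch under that assumption: `FilterRegexScan` and `regexLiterals` are hypothetical names, the pattern is a heuristic based only on the filters quoted in this thread, and how the filter strings are exported from the triggers is left open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Heuristic scan for regex literals in Broker-style trigger filters (hypothetical helper). */
class FilterRegexScan {

    // An equals sign, optional whitespace, then a slash-delimited literal such as /XYZ.*/.
    // Field paths like %metadata/metadata/x% are not matched, because their
    // slashes are not directly preceded by "=".
    private static final Pattern REGEX_LITERAL = Pattern.compile("=\\s*/([^/]+)/");

    /** Returns the body of every regex literal found in the filter, in order. */
    static List<String> regexLiterals(String brokerFilter) {
        List<String> found = new ArrayList<>();
        Matcher m = REGEX_LITERAL.matcher(brokerFilter);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }
}
```

Any trigger whose filter yields a non-empty list would be a candidate for a manual rewrite with LIKE before it is enabled on UM.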