Messages stuck on UM and not received by IS trigger

Sebastian_Konik · May 15, 2018, 9:50am

Hello,

We are using webMethods 10.1 (Universal Messaging and Integration Server) on Windows Server 2016, in a scenario with a high volume of small messages being received from Universal Messaging by webMethods triggers (both concurrent and serial, with and without IS-side filters).

From time to time, 1-2 times a week, we see that various triggers stop receiving documents. At that time the trigger is active, there is a corresponding connection on UM, and the message queue keeps growing (but only for the given trigger).

To resolve the issue we simply reload the package containing the blocked trigger (restarting UM does not help).

Has anyone seen such behavior?

fml2 · May 15, 2018, 10:30pm

What are the retry settings in the trigger? Do you see any error messages in the log? It might happen that an error occurs while some messages are processed, and then they get pushed back, and then the trigger tries to process them again and…

Depending on the retry settings, this might even result in an endless loop.

Atif_Ahmed · May 30, 2018, 7:03pm

We have seen the same behavior with triggers in our production environment. What are the desired retry settings for the trigger?

Sebastian_Konik · May 30, 2018, 7:55pm

Triggers have default retry settings (triggers are NOT configured to retry nor to suspend on fatal error).
The triggers are always in Active state, even when messages are queued and stuck on UM.

No errors are present on either the UM or the IS side. Nothing.

fml2 · May 31, 2018, 2:55am

Then I think you can do the following things:

Try to set server logging so that trigger activities are logged at a lower level
Open a support case; they can tell you which logger to activate, and they could also provide a patch that helps collect more debug information.

Sebastian_Konik · May 31, 2018, 11:36am

Done that. Nothing unusual in the logs.
We already raised a support ticket (5323528), but there is no resolution yet, so I’m curious whether anyone else has seen such problems.

Atif_Ahmed · May 31, 2018, 2:16pm

Is there a way to monitor, using Optimize, whether documents have not been picked up from a queue for more than 15 minutes? I have no experience with Optimize, so detailed steps would be highly appreciated.

thanks
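Independent of Optimize, the "not picked up for more than 15 minutes" condition itself can be sketched as a small check over periodic queue-depth samples. This is only an illustration under stated assumptions: the samples are assumed to come from some external poller (how they are collected from UM is not shown), the name `is_stalled` is hypothetical, and "depth never drops while the queue is non-empty" is used as a proxy for "nothing is being picked up".

```python
from typing import List, Tuple

STALL_SECONDS = 15 * 60  # the 15-minute threshold from the question


def is_stalled(samples: List[Tuple[float, int]],
               window: float = STALL_SECONDS) -> bool:
    """Return True if the queue looks stalled over the last `window` seconds.

    `samples` is a list of (timestamp_seconds, queue_depth), oldest first.
    Assumption: a queue counts as stalled when, across the whole window,
    it was never empty and its depth never decreased.
    """
    if len(samples) < 2:
        return False
    latest_ts = samples[-1][0]
    # Without at least one full window of history we cannot decide.
    if latest_ts - samples[0][0] < window:
        return False
    depths = [depth for ts, depth in samples if latest_ts - ts <= window]
    if len(depths) < 2 or min(depths) == 0:
        return False
    # Any decrease means a consumer took at least one message.
    return all(later >= earlier for earlier, later in zip(depths, depths[1:]))


if __name__ == "__main__":
    # Hypothetical samples taken every 5 minutes.
    print(is_stalled([(0, 10), (300, 12), (600, 12), (900, 15)]))
```

A proxy like this can miss the case where the consumer is alive but slower than the publishers, since the depth then also only grows; comparing consumed counts instead of depth would be more precise if such a counter is available.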

rmg · May 31, 2018, 11:09pm

I think you have to rely on UM Java API calls or CC (depending on the version) to monitor the UM queues/TTL.

HTH,
RMG

Florian_Altherr · June 1, 2018, 11:46am

In such situations it’s sometimes useful to check with Enterprise Manager for both

the “Outstanding Events” (in the Named Objects view).
the “Pending Messages” (when you click on a Trigger on the left side in Enterprise Manager).

My understanding:

“Outstanding Events” are events that are not processed yet (or that were put back into the queue because a trigger was not able to process them).
“Pending Messages” are events that are currently taken by the trigger, but whose processing is not yet acknowledged.

If messages are taken from the channel by the trigger, the number of outstanding events is DEcreased and the number of pending messages is INcreased (until a successful acknowledgement by the trigger).

When you keep an eye on those two values while having problems, you can at least figure out whether messages are taken from the queue but cannot be processed by a trigger (outstanding decreasing and pending increasing) OR whether the trigger does not even try to process them at all (only outstanding events increasing).
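The distinction described above can be written down as a small decision rule over two successive readings of the counters. This is a minimal sketch, assuming the two values are read by hand or by some poller from Enterprise Manager; the `Sample` type, the `classify` function and the result labels are hypothetical names for illustration, not part of any UM API.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    outstanding: int  # "Outstanding Events" for the named object
    pending: int      # "Pending Messages" for the trigger


def classify(before: Sample, after: Sample) -> str:
    """Compare two readings taken some time apart while the problem occurs."""
    d_outstanding = after.outstanding - before.outstanding
    d_pending = after.pending - before.pending
    if d_outstanding < 0 and d_pending > 0:
        # Messages leave the queue but are never acknowledged.
        return "taken-but-not-acknowledged"
    if d_outstanding > 0 and d_pending == 0:
        # Only the backlog grows: the trigger is not fetching at all.
        return "not-fetching"
    return "inconclusive"


if __name__ == "__main__":
    print(classify(Sample(outstanding=100, pending=0),
                   Sample(outstanding=150, pending=0)))
```

Any other combination is reported as inconclusive rather than guessed at, because the thread only describes these two patterns.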

rmg · June 1, 2018, 8:25pm

Hi Florian,

This counts as manual monitoring, but do you have any automated setup for the UM queues that alerts via CC or an Optimize dashboard? And is anything like that available out of the box, or are you aware of it in the newer versions?
Just curious.

TIA,
RMG

Wayne_Stewart1 · June 1, 2018, 8:52pm

Interestingly, I have an interest in developing something automated which would detect and alert on exactly these kinds of problems. However, I would need to collaborate with you a bit to get my idea prototyped out. Should not take too long if I get the use-case right as I have the notification portion figured out already. I just need some help on the condition detection side inside UM/IS.

Let me know if this would be of interest to you.

Mahesh_K_Sreenivas · June 2, 2018, 3:00am

Check the "UM HealthChecker" tool.

Wayne_Stewart1 · June 2, 2018, 8:18pm

Appreciate the tip. Will have a look at it. Thanks.

Wayne_Stewart1 · June 3, 2018, 5:56pm

OK, folks. I have to admit a bit of over-confidence on my part in terms of getting this done quickly based on my recollection of wMIS/UM.

It seems that on the wM side of things, I still can’t simply subscribe to a doc type, a webhook, or something similar to get the notification messages that I think I need in order to “trigger” the Alert without quite a bit of setup and configuration. Or, I haven’t been able to clearly see how to do that yet after skimming through a couple hundred pages of guides and looking at the HealthChecker tool. So, since I still want to move this forward, I am hoping I can “crowd-source” the wM side of the solution a bit by explaining what I already have in a bit more detail and seeing if I can get some additional collaboration here to get to a working end-to-end solution.

What I have already done: I have used the pub.client:http method in wMIS to allow the publishing of a JSON payload to an OpsGenie account (see image attached). The data in the JSON will create an Alert in OpsGenie which kicks off a voice call, SMS, mobile app notification to a person, or to a team of individuals, based on their work schedule, etc. My thinking was that adding a flow service which sends a real-time notification whenever business impacting issues requiring immediate action occurred would make this a piece of cake to configure for developers. I have tested this portion and it’s fairly easy to configure and most importantly works. Thus my thought that the wM/UM side of this was going to be quick and easy! At this point, I concede hubris.
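For readers who want to see the shape of such a notification outside of a flow service, here is a hedged sketch that only builds the HTTP request and does not send it. The payload in the attached image is not reproduced in this thread, so this is not that payload: the field names (`message`, `alias`, `description`, `priority`, `tags`), the endpoint and the `GenieKey` authorization header follow the OpsGenie Alert API as I understand it, and the trigger name, tag values and priority are purely hypothetical.

```python
import json
import urllib.request

# Assumed OpsGenie Alert API endpoint (EU accounts use a different host).
OPSGENIE_ALERTS_URL = "https://api.opsgenie.com/v2/alerts"


def build_alert_request(api_key: str, trigger_name: str,
                        outstanding: int, minutes_stalled: int) -> urllib.request.Request:
    """Build (but do not send) a POST request that would create an alert."""
    payload = {
        "message": f"UM trigger {trigger_name} is not consuming messages",
        # A stable alias lets repeated checks map to the same open alert.
        "alias": f"um-stalled-{trigger_name}",
        "description": (f"{outstanding} outstanding events, no consumption "
                        f"for {minutes_stalled} minutes."),
        "priority": "P2",                # hypothetical choice
        "tags": ["webMethods", "UM"],    # hypothetical tags
    }
    return urllib.request.Request(
        OPSGENIE_ALERTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"GenieKey {api_key}",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_alert_request("dummy-key", "orders.trigger", 1200, 15)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Sending it would be a matter of passing the request to `urllib.request.urlopen` with a real API key; inside Integration Server the same JSON body would go through pub.client:http as described above.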

The challenge for me now is figuring out which route is the most effective/efficient to pursue as a good solution for the wM/UM side:

HealthChecker seems to have a CLI interface. However, that seems to imply that I would have to create a job, schedule it to run the commands, extract the data and then call the flow service to create the Alert. Administratively, not that desirable.
I did see in the docs where it appears that HealthChecker has the capability to leverage Java to create Listeners on event changes. Made me wonder if a Java wM service could be the Listener and I could map and publish the docs I was originally looking for? Some upfront development required, but once done, may be simpler to administer, maintain and scale?
Are there other options that I am overlooking (hopefully simpler)?

Also, I am willing to schedule some time to brainstorm with anyone willing to work through this with me as it seems that a solution like this might have quite a bit of utility. Regardless, this is definitely an intriguing use-case that I’d like to complete. Even if only as a Proof-Of-Concept. Looking forward to your feedback.

Florian_Altherr · June 4, 2018, 9:08am

Hi rmg,

Sorry, I don’t know of out-of-the-box UM monitoring capabilities from Software AG, but would be interested, too.

We implemented our own Healthcheck DocType and Trigger, plus a Ping service doing publishAndWait on them. We also wrote our own Java clients to check the current content of channels, server-side filter conditions and so on, but we still run them manually (for instance before deployments) and do not use them for automated monitoring or alerting.

Cheers
Florian

Florian_Altherr · June 4, 2018, 9:18am

Thx for the tip. Where can I find it please?

Mahesh_K_Sreenivas · June 4, 2018, 2:34pm

At the location C:\SoftwareAG102\UniversalMessaging\tools\runner, look for runUMTool.bat. You can find more info in the UM Admin Docs.

Florian_Altherr · June 4, 2018, 2:37pm

Ah okay, cool. Thank you!

Wayne_Stewart1 · June 4, 2018, 6:16pm

Hi Florian,

Sounds like you’re validating that I need to go with option #2 and it is as involved as I had speculated. I appreciate the confirmation.

On the off chance that you have any additional specifics that can be shared about how/what you did, I would greatly appreciate that. Anything to save me some time would go a long way.

Thanks.
-Wayne

Atul_Patil1 · June 21, 2018, 2:22am

Did you get any solution for this from Software AG? We are also experiencing a similar issue in our production environment.

Thanks
