We are using webMethods 10.1 (Universal Messaging and Integration Server) on Windows Server 2016, in a scenario with a high volume of small messages being received from Universal Messaging by webMethods Triggers (both concurrent and serial, with and without IS-side filters).
From time to time, once or twice a week, we see various triggers stop receiving documents. At that time the trigger is active, there is a corresponding connection on UM, and the message queue keeps growing (but only for the given trigger).
To resolve the issue we simply reload the package containing the blocked trigger (restarting UM does not help).
What are the retry settings in the trigger? Do you see any error messages in the log? It might happen that an error occurs while some messages are processed, and then they get pushed back, and then the trigger tries to process them again and…
Depending on the retry settings, this might even result in an endless loop.
Triggers have default retry settings (they are configured neither to retry nor to suspend on fatal error).
The triggers are always in Active state, even when messages are queued and stuck on UM.
No errors are present on either the UM or the IS side. Nothing.
Is there a way we could use Optimize to monitor whether documents have not been picked up from a queue for more than 15 minutes? I have no experience with Optimize, so detailed steps would be highly appreciated.
In such situations it’s sometimes useful to check both of the following in Enterprise Manager:
the “Outstanding Events” (in the Named Objects view).
the “Pending Messages” (when you click on a Trigger on the left side in Enterprise Manager).
My understanding:
“Outstanding Events” are events that have not been processed yet (or that were put back into the queue because a trigger was not able to process them).
“Pending Messages” are events that are currently taken by the trigger, but whose processing is not yet acknowledged.
If messages are taken from the channel by the trigger, the number of outstanding events is DEcreased and the number of pending messages is INcreased (until a successful acknowledgement by the trigger).
If you keep an eye on those two values while the problem is occurring, you can at least figure out whether messages are taken from the queue but cannot be processed by a trigger (outstanding decreasing and pending increasing) OR whether the trigger does not even try to process them at all (only outstanding events increasing).
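To make the comparison above concrete, here is a minimal sketch of the same reasoning applied to two consecutive readings. It assumes the two counters have been read by hand (or by some script) from Enterprise Manager; the names `QueueSample` and `classify` and the result labels are made up for illustration and are not part of any UM API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueSample:
    """One reading of the two Enterprise Manager counters (hypothetical type)."""
    outstanding: int  # "Outstanding Events" for the trigger's named object
    pending: int      # "Pending Messages" shown for the trigger


def classify(prev: QueueSample, curr: QueueSample) -> str:
    """Compare two consecutive readings and name the likely situation."""
    d_outstanding = curr.outstanding - prev.outstanding
    d_pending = curr.pending - prev.pending

    # Messages leave the channel but are never acknowledged by the trigger.
    if d_outstanding < 0 and d_pending > 0:
        return "taken_but_not_acknowledged"
    # Only the backlog grows: the trigger does not even try to fetch messages.
    if d_outstanding > 0 and d_pending <= 0:
        return "not_consuming"
    # Anything else (e.g. a healthy, draining queue) needs a closer look.
    return "inconclusive"
```

For example, `classify(QueueSample(100, 0), QueueSample(150, 0))` falls into the "not_consuming" branch, which matches the symptom described at the start of this thread.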
This counts as manual monitoring, though. Do you have any automated setup for the UM queues that alerts via CC or an Optimize dashboard, and is anything like that available out of the box in the newer versions, as far as you are aware?
Just curious.
Interestingly, I have an interest in developing something automated which would detect and alert on exactly these kinds of problems. However, I would need to collaborate with you a bit to get my idea prototyped out. Should not take too long if I get the use-case right as I have the notification portion figured out already. I just need some help on the condition detection side inside UM/IS.
OK, folks. I have to admit a bit of over-confidence on my part in terms of getting this done quickly based on my recollection of wMIS/UM.
It seems that on the wM side of things, I still can’t simply subscribe to a doc type, a webhook, or something similar to get the notification messages that I think I need in order to “trigger” the Alert without quite a bit of setup and configuration. Or, I haven’t been able to clearly see how to do that yet after skimming through a couple hundred pages of guides and looking at the HealthChecker tool. So, since I still want to move this forward, I am hoping I can “crowd-source” the wM side of the solution a bit by explaining what I already have in a bit more detail and seeing if I can get some additional collaboration here to get to a working end-to-end solution.
What I have already done: I have used the pub.client:http service in wMIS to publish a JSON payload to an OpsGenie account (see image attached). The data in the JSON will create an Alert in OpsGenie which kicks off a voice call, SMS, or mobile app notification to a person, or to a team of individuals, based on their work schedule, etc. My thinking was that adding a flow service which sends a real-time notification whenever business-impacting issues requiring immediate action occur would make this a piece of cake to configure for developers. I have tested this portion and it’s fairly easy to configure and, most importantly, it works. Thus my thought that the wM/UM side of this was going to be quick and easy! At this point, I concede hubris.
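For readers who want to try the notification half outside of IS, here is a rough Python equivalent of that HTTP call. It is only a sketch: the actual payload used in the flow service is not shown in this thread, so the field values, the trigger name, the team name and the helper `build_alert_request` are all assumptions. The endpoint, the `GenieKey` authorization scheme and the field names follow the OpsGenie Alert API.

```python
import json
import urllib.request

OPSGENIE_ALERTS_URL = "https://api.opsgenie.com/v2/alerts"


def build_alert_request(api_key: str, trigger_name: str,
                        queue_depth: int, team: str) -> urllib.request.Request:
    """Build (but do not send) a create-alert request for a stalled trigger."""
    payload = {
        # OpsGenie limits "message" to 130 characters.
        "message": f"Trigger {trigger_name} is not consuming documents"[:130],
        # The alias lets OpsGenie de-duplicate repeated alerts for one trigger.
        "alias": f"wm-trigger-stalled-{trigger_name}",
        "description": f"{queue_depth} documents are waiting on UM for {trigger_name}.",
        "responders": [{"name": team, "type": "team"}],  # assumed team name
        "tags": ["webMethods", "UM"],
        "priority": "P2",  # assumed severity
    }
    return urllib.request.Request(
        OPSGENIE_ALERTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"GenieKey {api_key}",
        },
        method="POST",
    )
```

Sending it would be a matter of passing the returned request to `urllib.request.urlopen(request, timeout=10)` with a real API key; building and sending are kept separate so the payload can be inspected without network access.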
The challenge for me now is figuring out which route is the most effective/efficient to pursue as a good solution for the wM/UM side:
1. HealthChecker seems to have a CLI. However, that seems to imply that I would have to create a job, schedule it to run the commands, extract the data and then call the flow service to create the Alert. Administratively, not that desirable.
2. I did see in the docs where it appears that HealthChecker has the capability to leverage Java to create Listeners on event changes. Made me wonder if a Java wM service could be the Listener and I could map and publish the docs I was originally looking for? Some upfront development required, but once done, may be simpler to administer, maintain and scale?
Are there other options that I am overlooking (hopefully simpler)?
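Whichever route supplies the numbers, the condition-detection part of the polling approach could look roughly like the sketch below. It assumes that something (a scheduled job, a Java client, a listener) periodically reports the queue depth for one trigger; the class `StallDetector`, its method names and the 15-minute default are illustrative only, not an existing wM or UM facility.

```python
from typing import Optional


class StallDetector:
    """Flags a queue that holds documents but has not drained for a while."""

    def __init__(self, threshold_s: float = 15 * 60) -> None:
        self.threshold_s = threshold_s
        self._last_depth: Optional[int] = None
        self._stalled_since: Optional[float] = None

    def observe(self, depth: int, now_s: float) -> bool:
        """Record one reading; return True when an alert should fire."""
        draining = self._last_depth is not None and depth < self._last_depth
        self._last_depth = depth

        # An empty or shrinking queue means the trigger is still consuming.
        if depth == 0 or draining:
            self._stalled_since = None
            return False

        # Remember when the queue first stopped shrinking.
        if self._stalled_since is None:
            self._stalled_since = now_s
        return now_s - self._stalled_since >= self.threshold_s
```

One detector instance per trigger would be fed on every poll, and a `True` result would call the alerting flow service. Note that it keeps returning `True` on each poll while the stall lasts, so the alert side should de-duplicate.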
Also, I am willing to schedule some time to brainstorm with anyone willing to work through this with me as it seems that a solution like this might have quite a bit of utility. Regardless, this is definitely an intriguing use-case that I’d like to complete. Even if only as a Proof-Of-Concept. Looking forward to your feedback.
Sorry, I don’t know of any out-of-the-box UM monitoring capabilities from Software AG, but would be interested, too.
We implemented our own Healthcheck DocType and Trigger, plus a Ping service doing publishAndWait on them. We also wrote our own Java clients to check the current content of channels, server-side filter conditions and so on, but we still run them manually (for instance before deployments) and do not use them for automated monitoring or alerting.
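The ping idea can be sketched independently of IS as a round-trip check with a timeout. In the sketch below, `publish_and_wait` is a hypothetical callable standing in for whatever actually invokes the Ping service (it is assumed to return the echoed correlation id, or `None` on timeout); `PingResult` and `ping` are likewise made-up names, not the implementation described above.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class PingResult:
    ok: bool
    roundtrip_s: float
    detail: str


def ping(publish_and_wait: Callable[[str, float], Optional[str]],
         timeout_s: float = 30.0) -> PingResult:
    """Publish one health-check document and verify that the reply matches."""
    correlation_id = str(uuid.uuid4())
    started = time.monotonic()
    try:
        reply = publish_and_wait(correlation_id, timeout_s)
    except Exception as exc:  # broker or connection problems surface here
        return PingResult(False, time.monotonic() - started, f"publish failed: {exc}")
    elapsed = time.monotonic() - started

    if reply is None:
        # No trigger picked the document up within the timeout.
        return PingResult(False, elapsed, "no reply before timeout")
    if reply != correlation_id:
        return PingResult(False, elapsed, "reply did not match the ping")
    return PingResult(True, elapsed, "ok")
```

A failed result from a scheduled run of such a check could then feed the same alerting path, which would turn the manual check into an automated one.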
Sounds like you’re validating that I need to go with option #2 and it is as involved as I had speculated. I appreciate the confirmation.
On the off chance that you have any additional specifics that can be shared about how/what you did, I would greatly appreciate that. Anything to save me some time would go a long way.