I am trying to configure a rule so that I am notified when a particular IS service has been hanging for more than a specified time. I want to do this via the webMethods Manager console. Any ideas on how to achieve it with the built-in OMI features? I have already written a wrapper over the service that hangs, and I am able to determine, according to my criteria, whether the service is hanging or not. But how can I capture this via Manager?
I think this feature already exists with manager 6.5 and above. You can check the status of a service and configure an alert email if the online status is violated. You need to go into Manager console and drill into packages > services.
Have you investigated why the service “hangs” and then resolve that? Might save yourself a bit of work in creating infrastructure and services monitoring for something that shouldn’t be happening.
vicsu … what Manager 6.5 has is the ability to tell whether a service is currently running, how many times the service was invoked within the collection interval, etc. But it won't give you the ability to tell whether a service has been running for, say, 30 minutes.
Rob… I agree with you; we should tackle the root cause of the problem. The above requirement came out of an incident where one of the components that takes data from webMethods stopped responding, and because we had configured our triggers for synchronous processing, it backed up the Broker client queues. Since then we have added monitoring tools to watch the client application, the client queues, etc., and the problem with the client application has been fixed. Unfortunately, there are many other such applications managed by different teams, and we want to prevent this kind of situation from happening again. So I was looking for a way to take action preemptively instead of discovering the problem after it has gone unnoticed for too long.
But I also realized this may not be the best approach, since there are too many services and it is hard to monitor each one of them.
Monitoring queue depth is probably the best approach.
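To make the suggestion concrete, here is a minimal sketch (plain Python, not a Manager or Broker API) of the threshold pattern involved: `get_depth` stands in for however you actually read the Broker client queue depth, which is assumed rather than shown; the point is the alert decision.

```python
def queue_alert(get_depth, threshold):
    """Return an alert message if the queue depth exceeds the threshold,
    or None when the queue is within normal bounds.

    get_depth: a zero-argument callable returning the current queue depth
    (hypothetical stand-in for a real Broker queue query).
    """
    depth = get_depth()
    if depth > threshold:
        return f"queue depth {depth} exceeds threshold {threshold}"
    return None


# Example: a backed-up queue trips the alert, a shallow one does not.
print(queue_alert(lambda: 500, 100))  # alert string
print(queue_alert(lambda: 10, 100))   # None
```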
As you know, Manager can tell you if things are happening that are outside of "normal." You can get a little creative in identifying which data points Manager gathers would indicate a problem. For a hung service, the alert might be that the number of times it has run in the last period is below normal; usually a hung service will cause a bottleneck of some sort. I think you might also be able to have Manager monitor the number of threads running a given service; sticking at one value for a length of time would probably be abnormal and could be flagged.
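The "below normal" idea above can be sketched in a few lines (plain Python, not a Manager feature; the counts and the 50% fraction are illustrative assumptions): flag a service whose invocation count in the latest period has dropped well below its historical average, which can indicate a hang or an upstream bottleneck.

```python
def below_normal(history, latest, fraction=0.5):
    """Return True when `latest` falls below `fraction` of the mean of
    `history`.

    history: invocation counts from past collection periods.
    latest:  invocation count for the current period.
    """
    baseline = sum(history) / len(history)
    return latest < fraction * baseline


# A service that normally runs ~110 times per period but ran only 20
# times in the latest period would be flagged; 90 would not.
print(below_normal([100, 120, 110], 20))  # True
print(below_normal([100, 120, 110], 90))  # False
```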
I think I figured out how to do this. Since I can get the "CurrentlyRunning" value of a service, creating a rule with "(OMI/OMIISService CurrentlyRunning/Reading_Minimum) = (1)" and setting "Intervals before true" = 5 (say) will give a rule violation if the service has been running for the last 25 minutes (assuming the collection interval is 5 minutes).
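For anyone else reading, the "Intervals before true" semantics described above work out to something like the following (plain Python, not Manager's implementation): the rule only fires after the condition has held for N consecutive collection intervals.

```python
def rule_violated(readings, intervals_before_true):
    """Return True when the condition (reading minimum >= 1, i.e. the
    service never stopped running during the interval) has held for the
    last `intervals_before_true` readings.

    readings: per-interval minimums of CurrentlyRunning, oldest first.
    """
    if len(readings) < intervals_before_true:
        return False
    return all(r >= 1 for r in readings[-intervals_before_true:])


# With a 5-minute collection interval and intervals_before_true = 5,
# a violation means the service has been running for roughly 25 minutes.
print(rule_violated([1, 1, 1, 1, 1], 5))  # True: hung the whole window
print(rule_violated([1, 0, 1, 1, 1], 5))  # False: it completed once
```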