webMethods Integration Server offers an easy-to-use interface for scheduling tasks (Server > Scheduler). You can select the service to run, the user to run it as, the nodes on which it runs and, of course, the time at which it must run.
From the Admin Console one can see when services are scheduled, but cannot easily see whether they started as planned or how long they actually ran.
Because we have a large number of scheduled services, one difficulty we recently faced was that there was no way to tell which time slots are free (so that new services can be scheduled there) and which are really crowded (so that some services should be moved out).
Fortunately, we have a rule that all scheduled services must have the Audit log enabled, with “Log on” set to at least “Error, success, and start”.
This configuration means that information about every (audited) service invocation is persisted and can later be inspected.
It is recommended that IS services exposed as Web Services also have the Audit log enabled.
The Integration Server page where the audit data can be queried is under Logs > Service. This page allows checking specific service execution details; however it is not intended to provide an overview of what happened during the previous day or week, and some important information like long execution times or recurrent service failures can be easily missed.
The RichAudit IS package allows developers and admins to have a visual representation of the service audit data. In one view you can see if there were any problems:
On this page you can:
- filter by service name and by time interval
- easily differentiate between successful services (blue) and failed services (red)
- choose your zoom level (1, 2, 3, 6 or 12 hours, or larger: 1 day or 1 week)
- show only services that are scheduled ("Only scheduled services" check-box)
- see which time slots are "crowded" and which are free
- navigate UP and DOWN through the service list or see "All Services" on the same page for a better perspective
- navigate left or right on the timeline or return to the original position in time (Reset)
- click on the service bar (blue or red rectangle) to see the full details of the service (all calls of the service, server ID, user ID, etc.)
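The idea of "crowded" and "free" time slots can be illustrated with a small sketch. It is not RichAudit's code: it assumes the audit records have already been read into `(service, start, end)` tuples, and the function names `slot_load` and `free_slots` are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def slot_load(executions, slot=timedelta(hours=1)):
    """Count how many executions touch each time slot.

    `executions` is an iterable of (service, start, end) tuples; this
    layout is an assumption, not the real audit table schema.
    """
    load = Counter()
    for _service, start, end in executions:
        # Slot indexes counted from datetime.min, so slots align to midnight
        # for any slot size that divides a day evenly.
        first = (start - datetime.min) // slot
        last = (end - datetime.min) // slot
        # An execution ending exactly on a boundary also counts
        # toward the slot that begins at that boundary.
        for index in range(first, last + 1):
            load[datetime.min + index * slot] += 1
    return load


def free_slots(load, window_start, window_end, slot=timedelta(hours=1)):
    """Return the slot starts in [window_start, window_end) with no executions.

    `window_start` is expected to be aligned to a slot boundary.
    """
    free = []
    cursor = window_start
    while cursor < window_end:
        if load[cursor] == 0:  # Counter returns 0 for missing keys
            free.append(cursor)
        cursor += slot
    return free
```

Sorting the result of `slot_load` by count, for example with `load.most_common(5)`, lists the most crowded slots first, while `free_slots` gives the candidates for scheduling new services.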
The Details button shows the service execution over time:
Service History page (Click on Details button):
The “Service History” is especially useful to see if a service execution time has increased over time (past several months).
The “Duration Upper Limit” can be used if there are some peaks that make the rest of the chart look flat.
One advantage of RichAudit is that it can run in a production environment without adding overhead to the service execution time.
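As a rough illustration of both points, the sketch below averages execution times per month and optionally caps each value at an upper limit before averaging. Capping is only one possible reading of how such a limit could work and is an assumption here, as are the function name `monthly_average` and the `(start, duration_ms)` input layout.

```python
from collections import defaultdict
from datetime import datetime


def monthly_average(samples, upper_limit_ms=None):
    """Average duration per (year, month).

    `samples` is an iterable of (start, duration_ms) pairs (assumed layout).
    If `upper_limit_ms` is given, larger durations are capped at that value
    so that a single outlier does not dominate the scale.
    """
    buckets = defaultdict(list)
    for started, duration_ms in samples:
        if upper_limit_ms is not None:
            duration_ms = min(duration_ms, upper_limit_ms)
        buckets[(started.year, started.month)].append(duration_ms)
    # Sort by (year, month) so the result reads chronologically.
    return {
        month: sum(values) / len(values)
        for month, values in sorted(buckets.items())
    }
```

A steadily rising monthly average is the kind of trend the Service History view makes visible at a glance.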
At the moment only two views are available (Time Slots and Service History), but other views could also be rendered from the data available in the Audit table. For example:
- how service executions are distributed between nodes
- which users make the most calls
- when a certain error first occurred
- etc
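To give a feel for what these additional views would compute, here is a minimal sketch over audit records held in memory. The dictionary keys (`node`, `user`, `start`, `error`) and the function names are hypothetical and do not reflect the actual audit table columns.

```python
from collections import Counter
from datetime import datetime


def calls_per_node(records):
    """Number of service calls handled by each node."""
    return Counter(r["node"] for r in records)


def busiest_users(records, top=3):
    """The `top` users with the most calls, as (user, count) pairs."""
    return Counter(r["user"] for r in records).most_common(top)


def first_occurrence(records, error_text):
    """Earliest start time of a call whose error message contains `error_text`.

    Returns None if no record matches.
    """
    hits = [
        r["start"]
        for r in records
        if error_text in (r.get("error") or "")
    ]
    return min(hits) if hits else None
```

Each function is a single pass over the records, so the same aggregations could equally be expressed as grouped queries against the audit database.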
What would help you in the day-to-day monitoring of the Integration Server? Please leave a comment with your wish and it might get implemented.
Integration Server Package: RichAudit.zip
Update
RichAudit version 1.2 now has two new charts for Memory Usage and Thread Usage.
The user can see the up-to-date state of memory and thread usage directly in the IS console and can also check the values for previous days.
The chart allows zooming horizontally (time axis) and vertically (values axis).
Integration Server Package version 1.2: RichAudit v1_2.zip