Service to return the most frequently repeating error log entries

Hi there,
Is it possible to implement a service which would return top most repeating error log entries from the webMethods intergration server. This is needed for adding another step to our daily checks process which automates checking various components on all prod servers such as adapter connections, schedulers, triggers… From my point of view, it would be helpful to get a report of top most repeating error log entries from prod servers.

Thank you in advance,
n23

If you’re storing the error data in a DB (rather than a file), then this becomes an exercise in writing the right SQL query.

If in a file, the job is a bit more complex, and may be better suited to other tools such as a log watcher or something designed for processing semi-structured/unstructured data.
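To make the DB route concrete, here is a minimal sketch of the kind of grouped count involved, using an in-memory SQLite stand-in. The table and column names (`WMERROR`, `MSGTEXT`, `AUDITTIMESTAMP`) are hypothetical; check the actual schema of your JDBC-pool error destination before adapting this.

```python
import sqlite3

# In-memory stand-in for the IS error-log database.
# Table and column names are hypothetical -- verify against your schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE WMERROR (MSGTEXT TEXT, AUDITTIMESTAMP TEXT)")
conn.executemany(
    "INSERT INTO WMERROR (MSGTEXT, AUDITTIMESTAMP) VALUES (?, ?)",
    [
        ("Connection refused", "2021-01-01"),
        ("Connection refused", "2021-01-02"),
        ("Null pointer", "2021-01-02"),
    ],
)

# The core of the exercise: group identical messages and rank by frequency.
top_errors = conn.execute(
    """
    SELECT MSGTEXT, COUNT(*) AS occurrences
    FROM WMERROR
    GROUP BY MSGTEXT
    ORDER BY occurrences DESC
    LIMIT 10
    """
).fetchall()

print(top_errors)  # [('Connection refused', 2), ('Null pointer', 1)]
```

The same `GROUP BY` / `ORDER BY COUNT(*) DESC` shape applies whatever the real database is.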


Hi reamon,
Thanks for the answer! It seems that our ISs are configured to store the error log entries in WMERROR* files inside the /logs directory. I also checked the WMERROR table, but it does not contain any entries. I would rather not involve any third-party apps for this. In this case, I will check whether I can somehow define a schema for the file and try to parse it.

Best regards,
n23

If you upgrade to 10.7, there is an official API for accessing the server logs.
If you’re on an older version, take a look at this service.

wm.server.query:getPartialLog?log=error&numLines=50&startLine=0

Other possible arguments:
descendchecked
startDate
endDate

Better than going via the DB: the DB schema can change between versions, and the above works for both file and DB destinations.
regards,
John.


Downside: customer use of internal services is not supported.

True, but neither is direct DB access, and the service is less likely to change.

Apologies for implying that DB access would be preferable to the internal services. Definitely try the services instead. Just a caveat to the OP that SAG support won’t help, with either approach, should issues be encountered.

No problems Rob :stuck_out_tongue_winking_eye:

You’re right, neither is supported, but the services behind the admin DSP pages rarely change. Also, our migration to the new admin UI and the (documented) Admin API means these services will now never change, as they will be replaced. The biggest risk is that in the future they might disappear, but that is unlikely (unless you choose to; we might make them optional in the future), as we rarely remove existing features, to ensure backward compatibility.
regards
John.
PM

Thank you guys for your answers!
@John Right now we are on version 9.9, but we plan to upgrade to 10.7. I will try out the service wm.server.query:getPartialLog. I assume that I will have to use the same hack I use when calling internal services: using service invoke instead of calling the service directly.
@reamon I know that using internal services is not best practice, but unfortunately there are no better alternatives. For instance, for checking the state of triggers we are using an internal service: wm.server.triggers:getTriggerReport.

You can call it via http, I tested it using an external client with

http://localhost:5555/invoke/wm.server.query/getPartialLog?log=error&numLines=50&startLine=0

or, if you want to call it from a flow service, you can cheat by adding a debugLog step to the flow, then looking at the step’s properties and replacing the service attribute with “wm.server.query:getPartialLog”.

boom, you can now use the service just like a normal service, complete with all the inputs and outputs.
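For an external client, the invoke URL follows a fixed pattern (`/invoke/<folder>/<service>?args`). A small sketch of building such a URL programmatically; the helper name `build_invoke_url`, and the host/port, are assumptions for illustration, and a real call would also need Administrator credentials (e.g. HTTP Basic Auth):

```python
from urllib.parse import urlencode

def build_invoke_url(host, port, service, **params):
    """Build an IS HTTP invoke URL: http://host:port/invoke/<folder>/<name>?args.
    The service name uses the usual folder:service notation."""
    folder, _, name = service.rpartition(":")
    return f"http://{host}:{port}/invoke/{folder}/{name}?{urlencode(params)}"

url = build_invoke_url(
    "localhost", 5555,
    "wm.server.query:getPartialLog",
    log="error", numLines=50, startLine=0,
)
print(url)
# http://localhost:5555/invoke/wm.server.query/getPartialLog?log=error&numLines=50&startLine=0
```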

regards,
John.

Ok, thanks. Will try it out :slight_smile:

But this will only return the newest fifty lines of the log, not the entries that are logged most often.

Might be a task for the ELK stack.

Regards,
Holger

Hi Holger,
Yes, but I assume there is no max value you can pass to numLines, so I think passing something like 999999999999999999999 should be fine, no? Nevertheless, I will test it when I have the time. I will have to implement logic to pull the most frequent entries out of the logEntries string list. I just hope it won’t take forever for the flow to complete, since the plan is to call the flow on all prod servers in all clusters.
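The frequency-counting part is straightforward once the lines are in hand. A sketch, assuming each log line starts with a timestamp and timezone (the exact WMERROR line format varies, so the regex below is an assumption to adjust): stripping the timestamp first is what lets repeats of the same message group together.

```python
import re
from collections import Counter

def top_errors(log_lines, n=5):
    """Return the n most frequent messages in a list of error-log lines.
    Assumes lines look like '2021-03-01 10:15:00 CET <message>';
    adjust the regex to your actual log format."""
    counts = Counter()
    for line in log_lines:
        # Drop the leading 'date time zone ' prefix so identical messages match.
        msg = re.sub(r"^\S+ \S+ \S+ ", "", line).strip()
        if msg:
            counts[msg] += 1
    return counts.most_common(n)

lines = [
    "2021-03-01 10:15:00 CET [ISS.0086.0001E] Connection refused",
    "2021-03-01 10:16:00 CET [ISS.0086.0001E] Connection refused",
    "2021-03-01 10:17:00 CET [ISS.0086.0002E] Null pointer",
]
print(top_errors(lines))
```

This runs in a single pass over the list, so even a large pull from getPartialLog should count quickly; the slow part will be fetching and transferring the lines themselves.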

br,
n23

Hi Niemand23,

Using some sort of scripting is a better option, I feel. @Holger_von_Thomsen, @niemand23, @John_Carter4, @reamon, your opinion?

Be careful with putting too large an upper limit; you could crash your IS with an out-of-memory error!
Trying to count error messages will be very difficult without doing some kind of query, in which case we are back to @reamon’s advice, which is to use a DB query.

As an aside, in 10.7 we have a new statistics dashboard, and this might be a good candidate for the services tab. We already have something similar for APIs called “Top 5 slow APIs”, so it might be useful to add a “Top 5 failing services” to the services tab.
regards,
John.


Ok, thanks for the advice!
To give some more context why this is needed, we have a lot of flows which were implemented with absolutely 0 regards to support and monitoring. For some of them, we already implemented elastic search logging but there are just too many of them and we cannot afford to just monitor the error logs on all prod servers which is why in our case the process will assign a task (we are not using the task engine but a custom webMethods solution) to whoever is on call to check most repeating errors and after solving them, the task will be completed (we are also not using the business console for this but a custom dsp as I think that it is better for monitoring because you always know for sure which flow is invoked via Ajax).

A (less-known) nifty option, from a long-term stats-collection standpoint, would be to use the Event Manager feature of the IS to create a subscription to Exception events, using Designer. Your subscribing service can then write the error type (and/or message) to a table that has a counter you can increment for every re-occurrence of the same type of exception.

If you don’t want to depend on a point of failure (i.e., database), you can write the events to the filesystem or WxConfig (best option) if you are using it.

I understand that this essentially duplicates the Error Log entries and adds performance overhead, but it’s a reliable option, and you can expand to other types of events in the long run. A caveat is that the overhead introduced is proportional to the volume of service exceptions (i.e., events).
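The counter-table idea boils down to an upsert per event. The subscriber itself would be a flow or Java service inside the IS; this Python sketch only illustrates the increment-on-reoccurrence logic, with a hypothetical table name (`ERROR_COUNTS`) and SQLite standing in for the real database:

```python
import sqlite3

# Hypothetical counter table for exception events; in practice the
# subscribing IS service would perform this upsert via a JDBC adapter.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ERROR_COUNTS (ERROR_TYPE TEXT PRIMARY KEY, OCCURRENCES INTEGER)"
)

def record_exception(error_type):
    """Increment the counter for this error type, inserting it on first sight."""
    conn.execute(
        """
        INSERT INTO ERROR_COUNTS (ERROR_TYPE, OCCURRENCES) VALUES (?, 1)
        ON CONFLICT(ERROR_TYPE) DO UPDATE SET OCCURRENCES = OCCURRENCES + 1
        """,
        (error_type,),
    )

# Simulate three exception events arriving from the Event Manager.
for e in ["ServiceException", "ServiceException", "OutOfMemoryError"]:
    record_exception(e)

rows = dict(conn.execute("SELECT ERROR_TYPE, OCCURRENCES FROM ERROR_COUNTS"))
print(rows)
```

With the counts maintained continuously, the “top repeating errors” report becomes a trivial ORDER BY on the counter column rather than a scan of the raw log.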

Check the Service Development Help guide under Designer.


We too use internal services. The intent of my post above is really just to make sure you knew that “you’re on your own” when using these. Sounds like you do!

Good idea,
I forgot about that.
The service is

pub.event:addSubscriber

and you have to set eventType to “Error”, “Error Event” or “Exception” :thinking:
Can’t remember, you will have to experiment.

You then specify a service that implements the spec

pub.event:exception

This is a great feature that is undersold. We should give it a proper admin UI!

thnx @Venkata_Kasi_Viswanath for that
regards,
John.

@John_Carter4, indeed! I rue the fact that this useful feature is seldom used or publicized. It is a great aid for operational support.

I leveraged this feature to build a monitoring solution for a customer 9 years ago (on v8.2) and have never seen it used elsewhere, even in our Software AG delivery engagements.

A subscriber can be added via Designer and the pipeline can be saved to see the message contents. The documentation lists all the event types and the usage guidelines, so we don’t have to resort to trial and error :wink: