We currently have 8 nodes in production, and opening each node individually to check the loggers for processed or failed data is time-consuming. With numerous issues arising daily in production, tracking and validating these loggers takes up a lot of our time. We process a significant amount of data related to EDI 850, 855, 860, SAP, and sales, spread across many interfaces. We are looking for a solution or tool that lets us check multiple loggers in a single interface, with easy search capabilities. What would be the best approach to implement such a solution, or which tool could we use?
This is a pretty broad topic. Without additional information I would lean towards something like the ELK stack or Splunk.
Can you provide additional details on the following questions:
- What is the amount of data we are looking at? New per day/active/archival? Retention period? Legal requirements about deleting and keeping?
- Where are the log data currently stored? Purely plain text or also DB?
- How urgent is this?
Suppose we have one interface for which we are processing 500K messages per day. The retention period is 30 days.
We store the logs purely as plain text, in apploggers.log and msgloggers.log files on an on-prem server.
We have time to implement the solution.
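For a feel of what those numbers mean for whichever tool ends up holding the logs, here is a rough back-of-the-envelope sketch. The 500K messages per day and the 30-day retention come from the post above. The average event size is purely an assumed placeholder and should be replaced with a measured value. One message may also produce several log lines, which this ignores.

```python
MESSAGES_PER_DAY = 500_000  # from the post, for a single interface
RETENTION_DAYS = 30         # from the post
AVG_EVENT_BYTES = 1_000     # ASSUMPTION: placeholder, measure your own log lines


def retained_events(per_day, days):
    """Number of events that must stay searchable at any one time."""
    return per_day * days


def retained_bytes(per_day, days, avg_bytes):
    """Raw (uncompressed, unindexed) volume of the retained events."""
    return retained_events(per_day, days) * avg_bytes


events = retained_events(MESSAGES_PER_DAY, RETENTION_DAYS)
size_gb = retained_bytes(MESSAGES_PER_DAY, RETENTION_DAYS, AVG_EVENT_BYTES) / 1e9

print(events)   # 15000000
print(size_gb)  # 15.0
```

So this one interface alone would mean roughly 15 million retained events. Under the assumed event size that is about 15 GB of raw text, before any indexing overhead or the other interfaces are counted.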
Are you using MWS and audit logging? I assume a notification mechanism of some sort is used in the integrations to notify (ticketing system, email) when an error or failure occurs.
Beyond MWS, I agree with @jahntech.cj that ELK or similar can be useful for log capture/aggregation and analysis. There are various systems management/monitoring tools as well that can provide dashboards, notifications, automated recovery actions, etc. (I’d mention a couple but don’t want to run afoul of forum etiquette and rules about promoting things – a search should turn up some options)
This decision also depends on your requirements. Personally, I prefer using internal logging mechanisms. I prefer tracking services with the MWS audit function: you can review pipelines and the service flow, and resubmit right away after making small changes. Others prefer external logging tools like Logz.io or Dynatrace, so they ship their logs to a third party. I used to be a consultant, so I don't like to rely on third-party tools. If you are only interested in viewing internal log files, Command Central does that for every component from one place.
As an addition to my earlier post, I would also like to mention that, assuming I understand things correctly, we are talking about a business-critical project here. I am not a fan of going overboard with formalism. But this is also not something that should IMHO be done without involving all stakeholders and getting their buy-in as well as their commitment.
Lastly, I would expect a requirements document of at least 10-20 pages (to give a feeling for the amount of detail we are looking at). I would pay particular attention to the performance and operations side.
Beware the illusion of “best.” More effective might be “what are others using to achieve X?” One of the various things that others are doing might work in your situation.
Adding to Rob’s post with my answer to the question “what are others using to achieve this?”
In my current project, we took the approach of implementing a logging package built on top of Log4j 2, which enables us to emit log events from an executing service to a simple log file. Then, on each Integration Server host, we have a Splunk agent installed which ships the logs to Splunk.
The package does a number of important things, like supporting thread contexts, enriching the events with host, environment, thread, and calling service information, etc. However, one of the key things we did was to move away from issuing plain text log events. Instead, we now emit events based on a standard JSON structure. This approach allows us to emit rich log events that can be easily parsed in Splunk, enabling us to execute powerful queries and to build amazing dashboards. We also moved away from issuing email notifications from the Integration Server itself and now notifications are issued directly out of Splunk based on these JSON log events.
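To make the idea of enriched JSON events a bit more concrete, here is a minimal sketch using Python's standard `logging` module. It is not the Log4j 2 based package described above, only an illustration of the same idea. The class name `JsonFormatter`, the `service` attribute, and all field names are assumptions made for this example.

```python
import json
import logging
import socket
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def __init__(self, environment):
        super().__init__()
        self.environment = environment     # e.g. "prod" (hypothetical value)
        self.host = socket.gethostname()   # enrich every event with the host

    def format(self, record):
        event = {
            "timestamp": datetime.fromtimestamp(
                record.created, timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "host": self.host,
            "environment": self.environment,
            "thread": record.threadName,
            # "service" is a hypothetical attribute passed via `extra=`
            "service": getattr(record, "service", None),
            "message": record.getMessage(),
        }
        return json.dumps(event)


def build_logger(path, environment):
    """Create a logger that appends JSON lines to the file at `path`."""
    logger = logging.getLogger("integration")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(path)
    handler.setFormatter(JsonFormatter(environment))
    logger.addHandler(handler)
    return logger
```

A call such as `build_logger("app.json.log", "prod").info("Order received", extra={"service": "orders:receive850"})` would then append one JSON line to the file, which a log shipper on the host can pick up without any custom parsing rules. The file name and service name here are made up.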
I can elaborate more on the structure we’re using if needed but another key thing we did was to separate the static textual log messages from the variable data within them. For example, where before we used to emit events like “Job ABC processed 150 rows in 6000 milliseconds,” we now emit events like:
"message": "Job completed",
This made a world of difference.
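A rough sketch of why the split helps: once the message text is constant and the variable data sits in its own fields, finding slow jobs becomes a simple filter rather than a regular expression over free text. Only the `"message": "Job completed"` part comes from the post. The field names `job`, `rows`, and `duration_ms`, the second job `DEF`, and both helper functions are hypothetical and not the actual structure used in that project.

```python
import json


def make_event(message, **fields):
    """Build one JSON log line: constant message text plus separate data fields."""
    event = {"message": message}
    event.update(fields)
    return json.dumps(event)


def slow_jobs(lines, threshold_ms):
    """Return the names of completed jobs that took longer than threshold_ms."""
    result = []
    for line in lines:
        event = json.loads(line)
        if (
            event.get("message") == "Job completed"
            and event.get("duration_ms", 0) > threshold_ms
        ):
            result.append(event["job"])
    return result


# Hypothetical events; the first mirrors the "Job ABC ... 150 rows ... 6000 ms" example
lines = [
    make_event("Job completed", job="ABC", rows=150, duration_ms=6000),
    make_event("Job completed", job="DEF", rows=90, duration_ms=800),
]

print(slow_jobs(lines, 5000))  # ['ABC']
```

The same kind of filter maps directly onto a field-based search in a tool like Splunk, and a threshold like this is the sort of condition an alert can be built on.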
Now, if you’re interested in a vendor-neutral approach to shipping not only logs but also metrics and traces out of your application and into an observability platform of your choosing, I highly recommend you take a look at OpenTelemetry (https://opentelemetry.io/). It even has a Logs Data Model which you can use if you’d like.
There’s a cool product that supports OpenTelemetry for webMethods but I too don’t want to run afoul of forum etiquette so hit me up if you need more info.