We are using a PUB/SUB mechanism with six agents sharing the same client name, so we are relying on the load-balancing mechanism of webMethods Broker: we want only one agent to receive each message. This had been working without issue for 9 months until last week, when we started to see cases where two agents received the same message within a few hundred milliseconds of each other. We enriched the sending agent with very precise logging and have determined that the message is sent only a single time.
I have been told that it is not possible to activate filtered logging on the Broker, so full logging would need to be enabled, and there are millions of other, unrelated messages flowing through this same Broker. On the scope we are using, the message volume is very low: about 2,000 per day. So my questions are:
Has anyone seen behaviour of this sort? I know that the Broker is deprecated, but we are on the latest version.
Is it possible to activate logging on the broker, but filtered to one specific scope?
You can use the Document Logging functionality for a single document type: use LogOnPub and LogOnPublish. Since the logging queues will see only about 2,000 messages a day (assuming the flow uses a separate, unique document type), there is no need to set up the Integration Server logging utility (WmLogUtil) package to drain the logged documents from the Broker into an IS database. You can simply purge the logging queues once in a while.
Another option is to write a simple Java client that uses Broker Trace Events. I think you’ll need to subscribe to the Published, Enqueued, and Received events. This will generate traces for all pub/sub traffic, but that is OK, as tracing can handle very high volume.
I suggest you try Trace Events first. It is a simple 30-40 lines of Java code that uses the Broker Java admin APIs. Traces are generated only while your trace client is running, so there is no extra setup or cleanup.
Attached is some sample tracing code. It is part of Broker/bin/xutils and will not work standalone, but it should give you a good idea of what you need to do.
BrokerTraceEventReader.java (4.8 KB)
One other observation: the delay between the client posting the message and the first pickup is around 1.5 seconds, which seems rather long to me. I am wondering whether the WM server is overloaded and the slowness is exposing a race condition. Have you ever seen anything like this?
If there are only 2,000 messages a day, I doubt the delay is due to overload-related slowness or a race condition. A million or two messages spread evenly over the day through other queues should not have much effect. Most likely it is one of the following:
a) An IS native trigger is used, so the trigger delay kicks in and IS polls once every 2 seconds. Even though IS left its callback on the Broker, and the Broker delivered the message to IS right away, IS checked for message availability only after a second or so.
b) A custom Java/JMS client is used, and it relies on a polling mechanism.
c) A JMS client (or an IS JMS trigger) is used with a MaxReceive (or IS trigger prefetch) of 1 against a Broker cluster. max-receive=1 causes the Broker JMS client to go into polling mode to fetch exactly one message from exactly one Broker in the cluster. This polling is again periodic and causes a second or two of delay between publish and the start of processing.
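The latency effect of polling in cases (a) and (c) can be sketched with a small self-contained Java toy (this is a stand-in model, not the Broker or JMS API; the class name and the 500 ms interval are illustrative assumptions, the real IS trigger poll is closer to 2 s): the message becomes available immediately, but the consumer only notices it at its next poll tick.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class PollingDelayDemo {
    public static void main(String[] args) throws Exception {
        // Toy model: the "published" message lands in the queue instantly,
        // but the consumer checks for it only once per poll interval.
        BlockingQueue<Long> queue = new LinkedBlockingQueue<>();
        final long pollIntervalMs = 500; // illustrative; IS native triggers poll ~every 2 s
        CompletableFuture<Long> delayMs = new CompletableFuture<>();

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    Thread.sleep(pollIntervalMs); // the poll interval, not the broker, is the latency source
                    Long publishedAt = queue.poll();
                    if (publishedAt != null) {
                        delayMs.complete((System.nanoTime() - publishedAt) / 1_000_000);
                        return;
                    }
                }
            } catch (InterruptedException ignored) { }
        });
        consumer.start();

        Thread.sleep(120);            // "publish" shortly after the consumer starts polling
        queue.add(System.nanoTime()); // message is available on the queue immediately

        // Observed pickup delay is roughly (pollIntervalMs - publish offset),
        // i.e. a few hundred ms here, even though delivery itself was instant.
        System.out.println("pickup delay ~" + delayMs.get(5, TimeUnit.SECONDS) + " ms");
    }
}
```

With a 2-second poll, the same mechanism yields pickup delays of up to ~2 s, which would account for the 1.5 s observation without any overload.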
If you suspect load-related issues, use standard system monitoring and look for high CPU utilization (not overall, but a few threads running at 100%), high disk utilization (say 60% or more), and significant network utilization (say several MB per second averaged over days). From the application side, you can check /diag.log and see whether there are response-related entries there. A non-empty diag.log indicates some issue occurred at some point since the last restart; a consistent flow of response-related entries in diag.log indicates an ongoing issue.
Thanks again for your input. I am using the native C library (which is very old - about 12 years, I think). I have 6 agents that share a client state. Page 73 of the C programming manual describes this as ‘first come, first served’ load balancing. Each agent runs a listening thread sitting on awGetEvent with a 1-second timeout, so they are polling. One of the agents operates as a master and sends out a request for an action every 5 minutes. Sometimes there might be 5 or 6 outbound requests over a 30-second time frame (so individually separated by 5-6 seconds). The ‘master’ agent also operates a listening thread and may respond to a request once it has finished sending out the requests.
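The expected ‘first come, first served’ semantics of a shared client state can be modeled with a small self-contained Java sketch (a toy stand-in, not the Broker C API; all names here are hypothetical): six competing consumers block on one shared queue with a 1-second timeout, analogous to the awGetEvent loop, and each message should be picked up by exactly one of them.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedStateDemo {
    public static void main(String[] args) throws Exception {
        // Toy model of a shared client state: one queue, six competing agents.
        BlockingQueue<Integer> shared = new LinkedBlockingQueue<>();
        AtomicInteger delivered = new AtomicInteger();

        ExecutorService agents = Executors.newFixedThreadPool(6);
        for (int a = 0; a < 6; a++) {
            agents.submit(() -> {
                try {
                    while (true) {
                        // Analogous to awGetEvent with a 1-second timeout.
                        Integer msg = shared.poll(1, TimeUnit.SECONDS);
                        if (msg == null) break;      // idle timeout: a real agent would loop again
                        delivered.incrementAndGet(); // each message is taken by exactly one agent
                    }
                } catch (InterruptedException ignored) { }
            });
        }

        for (int i = 0; i < 100; i++) shared.add(i); // "publish" 100 messages

        agents.shutdown();
        agents.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("delivered=" + delivered.get()); // prints "delivered=100": never a duplicate
    }
}
```

In this model a doubled delivery is impossible by construction, which is why duplicate receptions in the real system point at the Broker side (or at two distinct client states accidentally being created) rather than at the agents’ receive loops.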
This has worked without fault for 9 months. I have complete logs on my side of all interactions over that time, and I have never issued a doubled outbound request. Last week, requests started being received by two, and sometimes three, agents. These receptions were in parallel, separated by a few hundred milliseconds. Apparently the Broker I am using is shared amongst many other applications, so while my load on it is truly trivial, I don’t know what is happening elsewhere.
I want to start by getting the platform metrics to see if it’s overloaded.
Which version of Broker are we talking about: 9.5, 9.6, or 10.5?
It might be worth checking whether the C client can be updated to the matching version of the Broker (including the latest fixes for the version in use).
Some metrics can be checked via the Messaging UI on MWS; if you need deeper insight, you can consider installing Optimize for Infrastructure.