Correlator status log lines

UPDATE (17/10/2019): there is now a community-developed tool that can help with analyzing correlator log files. See the ApamaCommunity/apama-log-analyzer project on GitHub, a Python 3 script for analyzing Apama correlator log files and extracting useful diagnostic information.

Correlator status lines allow you to monitor the runtime operational status of a running correlator. By default, a running correlator logs a status line every 5 seconds.

You can specify the interval at which the correlator logs status lines by providing the --logQueueSizePeriod option when you start the correlator.

WHERE TO LOOK FOR STATUS LOG LINES

The default behavior when a correlator is run from the command line is for the correlator log (which includes the status information) to be written to standard output. When run from Designer, status lines are shown in Console view.

FORMAT OF CORRELATOR STATUS LINES

When logging at INFO level, the status lines are in the format shown below. The meaning of each field is described in the Meaning of fields section.

Correlator Status: sm=11 nctx=1 ls=60 rq=0 eq=0 iq=0 oq=0 icq=0 lcn="<none>" lcq=0 lct=0.0 rx=5 tx=17 rt=31 nc=2 vm=325556 pm=81068 runq=0 si=915.3 so=0.0 srn="<none>" srq=0
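
If you want to process these lines programmatically beyond what the community apama-log-analyzer already provides, a small parser is easy to write. The following is a minimal sketch in Python; the regular expression and the parse_status_line name are illustrative assumptions based on the key=value layout shown above, not an official format specification.

import re

# Hypothetical minimal parser for a single "Correlator Status:" line.
STATUS_FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')

def parse_status_line(line):
    """Return a dict of field name -> value for one status line, or None."""
    if "Correlator Status:" not in line:
        return None
    fields = {}
    for key, value in STATUS_FIELD.findall(line.split("Correlator Status:", 1)[1]):
        value = value.strip('"')
        try:
            fields[key] = float(value) if "." in value else int(value)
        except ValueError:
            fields[key] = value  # non-numeric fields such as lcn or srn
    return fields

example = ('Correlator Status: sm=11 nctx=1 ls=60 rq=0 eq=0 iq=0 oq=0 icq=0 '
           'lcn="<none>" lcq=0 lct=0.0 rx=5 tx=17 rt=31 nc=2 vm=325556 '
           'pm=81068 runq=0 si=915.3 so=0.0 srn="<none>" srq=0')
print(parse_status_line(example)["pm"])   # prints 81068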

HOW TO REDIRECT LOGS TO FILE

To send correlator logs to a file, specify the -f file or --logfile file option, where file is the name of the file that you want to be the log file.

If you specify the name of an existing file, the correlator overwrites it on Windows or appends to it on UNIX. To avoid overwriting an existing log, specify ${START_TIME} in the log file name when you start the correlator.

For example: --logfile correlator_${START_TIME}.log

This creates a log file with a name such as correlator_2016-10-06_16:51:33.log

In UNIX shells, you need to escape the dollar sign so that the shell does not expand it, for example: correlator -l license --logfile correlator_\${START_TIME}.log when starting the correlator.

HOW STATUS LOG LINES CAN BE USEFUL

Status lines are useful for diagnosing correlator-busy scenarios when the correlator is connected to external applications, and for identifying memory leaks, listener leaks, and similar problems. For example:

  • If pm increases continuously, it may be an indication of a memory leak.
  • An increase in ls may indicate a listener leak.
  • If the runq parameter is consistently greater than 0 and latency is a problem, this indicates a performance issue: the application is not keeping up. You should reduce the amount of work being done in the correlator or add more/faster CPUs to your configuration.

In this way, you can assess the performance of applications based on the values of the fields shown in status lines.
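
To make that concrete, here is a small sketch that scans a correlator log for fields that only ever grow (for example pm, ls or sm), reusing the hypothetical parse_status_line() helper from the earlier sketch. The function name, field list and sample count are illustrative assumptions, not part of Apama itself.

# Minimal leak-spotting sketch, assuming parse_status_line() from the earlier example.
def find_growing_fields(log_path, fields=("pm", "ls", "sm"), min_samples=10):
    """Report fields whose value never decreases and grows overall."""
    history = {f: [] for f in fields}
    with open(log_path, errors="replace") as f:
        for line in f:
            status = parse_status_line(line)
            if status:
                for name in fields:
                    if name in status:
                        history[name].append(status[name])
    for name, values in history.items():
        if len(values) >= min_samples and values[-1] > values[0] and \
                all(b >= a for a, b in zip(values, values[1:])):
            print(f"{name} grew from {values[0]} to {values[-1]} "
                  f"over {len(values)} status lines - possible leak")

# find_growing_fields("correlator.log")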

MEANING OF FIELDS

Each field in a correlator status line is described below, together with its units and some examples of how issues can be diagnosed from it:

  • sm: Number of sub-monitors (monitor instances). Units: monitor instances. This number should usually be steady. A consistent increase indicates that monitors are being spawned but not terminating, which will eventually lead to an out-of-memory condition.
  • nctx: Number of contexts in the correlator, including the main context plus any user-defined contexts. Units: contexts. This number should usually be steady.
  • ls: Number of listeners, summed across all contexts; this includes each on statement and each stream source template. Units: listeners. This number should usually be steady. If ls is increasing but sm is not, the culprit is probably custom EPL that creates listeners that are never satisfied and never quit.
  • rq: Route queue, the sum of the number of routed events on the input queues of all contexts. Units: events. Typically 0 or near 0. If this number is large (more than about 50), there is probably a very lengthy (possibly infinite) loop routing events.
  • eq: Enqueue queue, the number of events on the enqueued events queue. Units: events. Typically 0 or near 0.
  • iq: Input queue, the sum of the number of entries on the input queues of all contexts. Units: events. Every context has an input queue with a maximum size of 20,000 events, and it is perfectly normal to see this number briefly in the hundreds. It is typically 0 or near 0, however; if it is steadily increasing, the contexts are not keeping up with incoming events and senders may eventually be blocked, either because the correlator simply cannot process events at the rate they arrive or because there is back pressure from the output queue.
  • oq: Output queue, the number of events waiting on the correlator's output queue to be dispatched to any registered receivers. Units: events. The output queue is fed by the "emit" and "send ... to" statements, and multiple processes may be attached to it as consumers. If any one of them cannot consume events at the rate the correlator is emitting them, the queue backs up and this number starts getting large. By default the limit is 10,000 events per receiver; if this limit is reached, the correlator will, by default, stop processing events and wait for the output queue to drain. Typically 0 or near 0; if it is growing, one or more receivers are slow.
  • icq: The total number of enqueued events on the input queues of all public contexts. Units: events. This is a subset of the events counted in iq. Typically 0 or near 0; if it is growing, the public contexts are not keeping up with incoming events.
  • lcn: Name of the slowest context. Not a numeric field.
  • lcq: For the context identified by lcn, the number of entries on its input queue. Units: events.
  • lct: For the context identified by lcn, the difference between its current logical time and the most recent time tick added to its input queue. Units: seconds.
  • rx: Number of events the correlator has received from external sources since it started. Units: events. This number increases as events are received; it only ever increases and can never go down.
  • tx: Number of events the correlator has sent to receivers since it started, including duplicate events sent to multiple receivers. Units: events. For example, if injected EPL emits an event on a channel with five subscribed receivers, tx goes up by five: there was one event, but it was sent five times. This number normally increases as events are emitted.
  • rt: Route total, the total number of events routed across all contexts since the correlator started. Units: events. This number normally increases as events are routed.
  • nc: Number of consumers, that is the number of components connected to receive emitted events. Units: connections. This number should usually be steady.
  • vm: Virtual memory used by the correlator process. Units: kilobytes. This number should usually be steady, though it may vary slightly.
  • pm: Physical memory used by the correlator process. Units: kilobytes. This number should usually be steady, though it may vary slightly.
  • runq: Run queue, the number of contexts that have work to do but are not currently running. Units: contexts. Typically 0 or near 0. If this number increases, it indicates a performance issue: the application is not keeping up.
  • si: The rate at which pages are being read from swap space. Units: pages/second. Typically 0 or near 0. If si is non-zero, the host is running out of memory and will not perform well; this severely affects latency and may prevent status lines from being logged regularly every 5 seconds.
  • so: The rate at which pages are being written to swap space. Units: pages/second. Typically 0 or near 0. If so is non-zero, the host is running out of memory and will not perform well; this severely affects latency and may prevent status lines from being logged regularly every 5 seconds.
  • srn: Slowest receiver name, the receiver with the largest number of incoming events waiting to be processed. Not a numeric field.
  • srq: Slowest receiver queue size, the number of events waiting for the receiver identified by srn. Units: events.

Apama implements flow control for its queues. The input queue (one per context) has a maximum size of 20,000 events and the output queue (one per receiver) has a maximum size of 10,000 events. If a receiver is slow, its output queue will grow until it is full and then block the contexts that are sending to it. If contexts are not keeping up (or are blocked), their input queues will grow to 20,000 events and then block senders.
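
To spot this kind of back pressure in a log after the fact, a crude check along the following lines can help. It again reuses the hypothetical parse_status_line() helper from the earlier sketch; note that oq is a total across all receivers while the 10,000-event limit is per receiver, so this is only a rough heuristic, with the 80% threshold chosen purely for illustration.

# Sketch of a back-pressure check, assuming parse_status_line() from the earlier example.
INPUT_QUEUE_LIMIT = 20000    # default per-context input queue limit
OUTPUT_QUEUE_LIMIT = 10000   # default per-receiver output queue limit

def warn_on_backpressure(log_path, threshold=0.8):
    """Warn when any status line shows a queue above threshold * its limit."""
    with open(log_path, errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            status = parse_status_line(line)
            if not status:
                continue
            if status.get("iq", 0) > threshold * INPUT_QUEUE_LIMIT:
                print(f"line {lineno}: input queues backing up (iq={status['iq']})")
            if status.get("oq", 0) > threshold * OUTPUT_QUEUE_LIMIT:
                print(f"line {lineno}: slow receiver suspected (oq={status['oq']})")

# warn_on_backpressure("correlator.log")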

Use the --qdisconnect command line flag to prevent the correlator from consuming an unbounded amount of memory and failing with an out-of-memory error in the case where a receiver is slow and events continue to be sent in.

CONCLUSION

The correlator status log lines are the first place you should look if you are experiencing problems (throughput, latency, memory use, high CPU, etc) with Apama and give a great insight into the internal status of the correlator. Please use the forum if you have any questions.
