We would like to check in real time what the current memory consumption is for the Cumulocity platform, even if we have not crossed a threshold. Is this something that is available?
In your posted example it is Streaming Analytics. For that, you have this to track metrics:
For custom microservices you can do the same. Other platform components are monitored by cloud operations when using a cloud-hosted and operated instance.
OK, thanks. So there is no way to easily check the current memory utilization percentage in real time via the web interface or the API? I have already downloaded the diagnostics and checked the memory profiler; please find a screenshot below. It seems to me that EPL is #1 with 68 MB, #2 is the model manager with 36 MB and 2,163 listeners, and #3 is the block category registry with 29 MB.
Have a look at the next section; the Prometheus endpoint should give you the necessary information:
Also, you get additional information in the status line in the log:
https://documentation.softwareag.com/pam/10.15.4/en/webhelp/pam-webhelp/index.html#page/pam-webhelp%2Fco-DepAndManApaApp_logging_correlator_status.html
I found this part in the metrics file inside the diagnostics, but I am not sure whether this is the allocated memory or the used memory, and what the memory utilization percentage is right now:
/////
# HELP sag_apama_correlator_virtual_memory_bytes Virtual memory usage
# TYPE sag_apama_correlator_virtual_memory_bytes gauge
sag_apama_correlator_virtual_memory_bytes 2648317952
# HELP sag_apama_correlator_physical_memory_bytes Physical memory usage
# TYPE sag_apama_correlator_physical_memory_bytes gauge
sag_apama_correlator_physical_memory_bytes 708743168
/////
Also, based on the startup log file, it seems that the total memory is around 1 gigabyte. Assuming that is the total physical memory, and considering that the used physical memory is 708743168 bytes, the used memory percentage is probably around 70% right now (a quick calculation follows the log excerpt below):
2024-05-27 09:00:07.308 ##### [140273325078528] - There are 1 CPU(s)
2024-05-27 09:00:07.309 INFO [140273325078528] - cgroups - available CPU(s) = 0.25
2024-05-27 09:00:07.309 INFO [140273325078528] - cgroups - CPU shares = 102
2024-05-27 09:00:07.309 INFO [140273325078528] - cgroups - maximum memory = 1,073,741,824 bytes
2024-05-27 09:00:07.309 INFO [140273325078528] - cgroups - soft memory limit = unlimited
2024-05-27 09:00:07.309 INFO [140273325078528] - cgroups - memory swap limit = unlimited
2024-05-27 09:00:07.309 INFO [140273325078528] - cgroups - memory swappiness = 30 %
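For reference, here is a quick way to turn those raw numbers into a percentage. This is only a back-of-the-envelope sketch using the figures quoted above (the correlator physical memory from the diagnostics and the cgroups limit from the startup log); it does not include the JVM's share of the pod:

# Rough utilization check from the numbers quoted above; substitute your own values.
physical_bytes = 708743168    # sag_apama_correlator_physical_memory_bytes (diagnostics)
limit_bytes = 1073741824      # cgroups "maximum memory" from the startup log

print(f"correlator physical memory: {physical_bytes / 1024**2:.0f} MB")         # ~676 MB
print(f"pod memory limit:           {limit_bytes / 1024**3:.0f} GB")            # 1 GB
print(f"utilization:                {100 * physical_bytes / limit_bytes:.0f}%")  # ~66%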
Also, based on the following log line from May 27: vm=1357556 pm=411016
2024-05-27 09:00:22.687 INFO [140273324783168] - Correlator Status: sm=162 nctx=27 ls=955 rq=0 iq=0 oq=0 icq=0 lcn="" lcq=0 lct=0.0 rx=210 tx=58 rt=379 nc=9 vm=1357556 pm=411016 runq=0 si=0.0 so=0.0 srn="" srq=0 jvm=0
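As an aside, those vm/pm values can also be pulled out of the correlator log programmatically. A minimal sketch, assuming the standard "Correlator Status:" line format shown above (the status-line documentation linked earlier describes the fields; vm and pm there are reported in kilobytes, so pm=411016 is roughly 400 MB):

import re

# Minimal sketch: extract vm/pm from an Apama "Correlator Status:" log line.
# The example line is the one quoted above; vm/pm are kilobytes in the status line.
line = ('2024-05-27 09:00:22.687 INFO [140273324783168] - Correlator Status: '
        'sm=162 nctx=27 ls=955 rq=0 iq=0 oq=0 icq=0 lcn="" lcq=0 lct=0.0 '
        'rx=210 tx=58 rt=379 nc=9 vm=1357556 pm=411016 runq=0 si=0.0 so=0.0 srn="" srq=0 jvm=0')

match = re.search(r'\bvm=(\d+) pm=(\d+)', line)
if match:
    vm_kb, pm_kb = (int(g) for g in match.groups())
    print(f"virtual memory:  {vm_kb / 1024:.0f} MB")   # ~1326 MB
    print(f"physical memory: {pm_kb / 1024:.0f} MB")   # ~401 MB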
I do not think you can distinguish between allocated and used memory. I think you have to assume that this memory is actually in use.
One thing I noticed is that it looks like you are mixing up the prefixes. Total memory is not 1 terabyte but 1 GB (in line with the original alarm you received). Likewise, the figures in the diagnostics you shared above are around 29 MB and 36 MB, not GB.
OK, so practically we have a maximum memory allocation for Analytics Builder of 1 GB ("maximum memory = 1,073,741,824 bytes"), and we are currently consuming 708743168 bytes ("physical memory is 708743168"), so the used memory percentage is probably around 70% right now, correct?
What is still not clear to me is that I summed up all the bytes in the eplmemoryprofiler spreadsheet and the result is 69240690, so I do not yet see how that maps to "physical memory is 708743168".
The normal per-tenant apama-ctrl microservice is typically composed of two processes.
A typical apama-ctrl-250mc-1g microservice is constrained to a quarter of a CPU core, and 1 GB of RAM.
Of the two processes inside the microservice, one is a JVM, and the second is a C++ process known as the correlator.
When looking at memory stats from the correlator, ignore anything related to virtual memory; you only care about physical memory. All EPL logic, Analytics Builder models, and Smart rules are executed in the correlator. The JVM process sits in front of it and mostly deals with REST requests from the web UI, etc. The JVM normally uses 300-400 MB.
If you use the detailed low-level developer profiling interfaces (or diagnostic dumps) to look at the approximate memory used by EPL monitors, don't expect it to add up to the full physical memory number. There are various overheads, plus anything sitting in the incoming or outgoing I/O queues, which is often where memory goes, particularly if there is any bottleneck talking to downstream services.
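If it is useful to repeat the exercise of summing up the profiler output, a minimal sketch along these lines can do it; the file name and column name below are assumptions, so adjust them to whatever the eplmemoryprofiler export in your diagnostics zip actually uses:

import csv

# Minimal sketch: sum the per-item byte counts from the EPL memory profiler export.
# The file name and the "bytes" column name are assumptions - check your diagnostics zip.
total_bytes = 0
with open("eplmemoryprofiler.csv", newline="") as f:
    for row in csv.DictReader(f):
        total_bytes += int(row["bytes"])

print(f"EPL items account for ~{total_bytes / 1024**2:.0f} MB")
# Expect this to be well below the correlator's physical memory figure; queues and
# runtime overheads make up much of the difference.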
Thanks. So are those 300-400 MB reserved for the JVM and not accessible when needed by the correlator, or is the whole 1 GB shared between the JVM and the correlator? The reason we started checking the memory stats from the correlator was a log entry about 90% memory consumption, so we are trying to dig into the details to find out whether some EPL, an Analytics Builder model, or anything else is causing a memory leak. On the other hand, we also need some way to see the memory utilization in real time, just like we monitor CPU/memory on a network device; so far we have not found this kind of telemetry in the web interface or via APIs.
Hi.
The resource limit is for the entire Kubernetes pod. For Streaming Analytics there are (normally) two distinct processes within the pod: the JVM and the Correlator.
As mentioned in some of the previous replies and doc links, you can get the combined metrics in Prometheus format by accessing <tenantURL>/service/cep/prometheus
Examples of useful metrics:
How much memory is the pod using (in MB):
# HELP sag_apama_in_c8y_apama_ctrl_total_physical_mb Total microservice physical memory usage
# TYPE sag_apama_in_c8y_apama_ctrl_total_physical_mb gauge
sag_apama_in_c8y_apama_ctrl_total_physical_mb 662
How much memory is the correlator process using (in bytes):
# HELP sag_apama_correlator_physical_memory_bytes Physical memory usage
# TYPE sag_apama_correlator_physical_memory_bytes gauge
sag_apama_correlator_physical_memory_bytes 312905728
What are the resource limits for this whole microservice/pod, including memory limit (in MB):
# HELP sag_apama_correlator_user_streaminganalytics_microservice_metadata The user status 'streaminganalytics_microservice_metadata'
# TYPE sag_apama_correlator_user_streaminganalytics_microservice_metadata gauge
sag_apama_correlator_user_streaminganalytics_microservice_metadata{cpuCoreLimit="1.00",memoryLimitMB="4096",microserviceName="apama-ctrl-1c-4g",microserviceVersion="25.176.0"} 0
How much memory is the JVM process using (in MB):
# HELP sag_apama_in_c8y_apama_ctrl_physical_mb Java process physical memory usage
# TYPE sag_apama_in_c8y_apama_ctrl_physical_mb gauge
sag_apama_in_c8y_apama_ctrl_physical_mb 364.46875
Here my pod is using 662 MB, made up of approximately 364 MB for the JVM and 298 MB for the Correlator.
As said earlier, you should ignore the virtual memory.
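If you want a quick real-time check without setting up a full Prometheus stack, a minimal sketch along these lines can poll the endpoint and work out the pod-level utilization. The endpoint path and metric names are the ones shown above; the tenant URL and the basic-auth credentials are placeholders, so use whatever authentication your tenant requires:

import re
import requests

# Minimal sketch: poll the Streaming Analytics Prometheus endpoint and compute the
# pod memory utilization. TENANT_URL and the credentials are placeholders.
TENANT_URL = "https://<tenantURL>"
resp = requests.get(f"{TENANT_URL}/service/cep/prometheus",
                    auth=("<tenant>/<user>", "<password>"))
resp.raise_for_status()
text = resp.text

def gauge(name):
    # Return the first sample value for a metric name, ignoring any labels.
    pattern = r"^" + re.escape(name) + r"(?:\{[^}]*\})?\s+([0-9.eE+-]+)\s*$"
    m = re.search(pattern, text, re.MULTILINE)
    return float(m.group(1)) if m else None

used_mb = gauge("sag_apama_in_c8y_apama_ctrl_total_physical_mb")
limit_match = re.search(r'memoryLimitMB="(\d+)"', text)

if used_mb is not None and limit_match:
    limit_mb = float(limit_match.group(1))
    print(f"pod is using {used_mb:.0f} MB of {limit_mb:.0f} MB "
          f"({100 * used_mb / limit_mb:.0f}%)")

With the example values shown above, this would print roughly 662 MB of 4096 MB (about 16%).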
Other metrics from the correlator process that are potentially useful to monitor are listed in the Apama docs:
List of correlator status statistics (softwareag.com)
If you are getting alarms about hitting 90% usage (of the pod limit), then there are a few things to check:
- That alarm described on this page: Troubleshooting and diagnostics - Cumulocity IoT documentation
- Is your output queue growing over time? (producing output faster than it can be consumed)
- Is your input queue growing over time? (Either the output queue is full and causing back-pressure, or you are trying to do too much work for each incoming message, or you are blocked on an external lookup - for example, are you attempting to manipulate ManagedObjects at incoming Measurement rates, or similarly making too many/inefficient remote Find requests at incoming Measurement rates?)
- Do you have a listener leak?
Some of those can be spotted either by graphing Prometheus metrics, or by passing the correlator log file (which was in the diagnostics zip that you had) through the log-file-analyzer: Introducing the Apama Log Analyzer - Knowledge base - Cumulocity IoT - Software AG Tech Community & Forums
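If you want to watch for the queue-growth and listener-leak patterns above without a full monitoring stack, a minimal sketch along these lines can sample the endpoint periodically so a steadily growing value stands out. The metric names to watch are deliberately left as placeholders; pick them from your own /service/cep/prometheus output or the correlator status statistics page linked above:

import re
import time
import requests

# Minimal sketch: sample selected gauges once a minute and print them, so a steadily
# growing queue size or listener count stands out over time. TENANT_URL, credentials
# and the WATCH list are placeholders - fill in metric names from your own endpoint output.
TENANT_URL = "https://<tenantURL>"
WATCH = [
    "sag_apama_correlator_physical_memory_bytes",
    # add the input/output queue and listener gauges you see in your endpoint output
]

def sample():
    text = requests.get(f"{TENANT_URL}/service/cep/prometheus",
                        auth=("<tenant>/<user>", "<password>")).text
    values = {}
    for name in WATCH:
        m = re.search(r"^" + re.escape(name) + r"\s+([0-9.eE+-]+)\s*$",
                      text, re.MULTILINE)
        if m:
            values[name] = float(m.group(1))
    return values

while True:
    print(time.strftime("%H:%M:%S"), sample())
    time.sleep(60)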