High heap memory use

Hi everyone, good morning!

Our UM Realm usually uses 300~800MB of the defined heap memory. We set a minimum of 2048MB and a maximum of 4096MB.

But for an unknown reason, today our UM Realm is using 3000MB+ of heap memory. Our CPU is also running at 95~99%.

Where should I look to find the process that is using this much? What should I do to fix this issue? Can I clear something in Universal Messaging Enterprise to release these system resources?

Please, check the attached screenshot.

Thanks in advance.

Hello Renan,

Have you confirmed that the CPU consumption is caused by the UM process? UM doesn’t usually go that hard on the CPU unless it’s spinning heavily in GC. If that is the case and it’s still not releasing any memory, it’s probably a case of non-persistent data held in UM. If you could generate a heap dump and share it, we could check what’s consuming all the heap space.

Thanks,
Stefan
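
For anyone wanting to check the "spinning in GC" theory before taking a heap dump, here is a minimal sketch using the JDK's jstat tool. The helper name jstat_col and the UM_PID variable are made up for illustration. The helper looks columns up by header name because the exact column set of "jstat -gcutil" differs between JDK versions.

```shell
#!/bin/sh
# Print the value of one named jstat column (for example O or FGC)
# for every sample line read from stdin. The first line must be the
# jstat header, which is how the column position is found.
jstat_col() {
    awk -v col="$1" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }
        idx     { print $idx }
    '
}

# Usage (UM_PID is a placeholder for the pid of the UM Java process):
#   jstat -gcutil "$UM_PID" 5000 12 | jstat_col O     # old generation, % used
#   jstat -gcutil "$UM_PID" 5000 12 | jstat_col FGC   # full GC count so far
```

If the O column stays close to 100 while FGC keeps climbing from one sample to the next, the JVM is most likely spending its CPU time in back-to-back full collections that free very little memory.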

Hi Mr. Stefan!

As you requested, I confirmed on our Linux server that it really is UM that is using all this CPU.

I will do what you requested. But, please, how can I generate this heap dump? I didn’t find this option in our Universal Messaging Enterprise.

Thanks,
Renan.

Hi. We solved the problem by restarting our UM Service.

But it seems that every Wednesday at 12:00:5* AM it starts this high CPU usage for no reason.

In our server.log, I can see that one trigger becomes very unstable and starts to throw the same error message over and over. Look at the screenshot.

What should I do to fix this issue?

Hi Renan,

To take a heap dump, first find the process ID of the problematic process.

Then run “kill -3 <pid>” to take the heap dump. It will be written to stdout, which is usually redirected to nohup.out for long-running processes.

Regards,
Holger

Running “kill -3” produces a thread dump. In order to produce a heap dump you would need to execute something like:

$JAVA_HOME/bin/jmap -dump:format=b,file=<myheap.bin> <pid>

You would obviously need to run that while the heap utilization is high. After that, if you could zip up and share the generated heap dump, we could have a look to see what’s consuming the space.

Could you also share the nirvana.log file from under the /UniversalMessaging/server//data/ directory so we can take a look at the memory footprint progression over time?

Thanks,
Stefan
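
The thread dump from "kill -3" is still useful for the CPU side of the problem. A rough sketch of the usual way to match the busiest OS threads to Java threads follows. The helper name tid_to_nid, the UM_PID variable and the thread id 174200 are placeholders, and it assumes a HotSpot JVM that prints thread ids as hexadecimal nid=0x... values (newer JDKs may print them in decimal, in which case no conversion is needed).

```shell
#!/bin/sh
# Convert a decimal Linux thread id (the PID column of "top -H") into the
# nid=0x... form that older HotSpot JVMs print next to each thread in a
# thread dump.
tid_to_nid() {
    printf 'nid=0x%x\n' "$1"
}

# Usage (UM_PID and 174200 are placeholders):
#   top -H -b -n 1 -p "$UM_PID" | head -n 20          # per-thread CPU usage
#   "$JAVA_HOME/bin/jstack" "$UM_PID" > threads.txt   # or use kill -3
#   grep "$(tid_to_nid 174200)" threads.txt           # find that thread
```

If the hot threads turn out to be GC worker threads, that points back at heap pressure rather than at application code.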

Hi.

Unfortunately, I can’t send the full content of our nirvana.log because it contains confidential information. But I can share some parts with you, mainly the ones from when UM starts the high CPU usage.

Please, could you check the attachments?

In screenshot1.png I noticed that after the exception the service starts throwing errors and logging out sessions.

After screenshot2.png, most of the lines became “User xxx logged out from…” and it didn’t stop until we restarted our UM Service.

Thank you.


Hi!

Today we are having the high heap memory usage issue again. Last week we increased the heap memory size for our UM, and now it’s using about 5GB.

I tried to generate a heap memory dump as you requested, but I got this result:

[master@server00 Documents]$ sudo /WebM/IS01/jvm/jvm/bin/jmap -dump:format=b,file=myheap.bin 174185

174185: Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding

Please, what should I do? I’m afraid to use the “-F” option because this is happening in a production environment.

Thank you.

Hello Renan,

How did you determine the process ID?
We use something called a Tanuki wrapper to oversee the Java process of the server, and it is sometimes possible to mix up the two and pass the wrong pid.
When in doubt, you can check the pid in the file /UniversalMessaging/server//data/RealmServer.lck

Thanks,
Stefan
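
A small sketch of that check, for reference. The helper name pid_from_lck and the LCK_FILE and UM_PID variables are hypothetical; point LCK_FILE at the RealmServer.lck file of your own realm. It assumes the lock file holds just the pid on its first line.

```shell
#!/bin/sh
# Read the pid stored on the first line of a lock file and print it,
# but only if it is a plain number. Returns non-zero otherwise.
pid_from_lck() {
    pid=$(head -n 1 "$1" | tr -d '[:space:]')
    case "$pid" in
        ''|*[!0-9]*) echo "no numeric pid in $1" >&2; return 1 ;;
        *) printf '%s\n' "$pid" ;;
    esac
}

# Usage (LCK_FILE is a placeholder for the path to RealmServer.lck):
#   UM_PID=$(pid_from_lck "$LCK_FILE") && ps -o pid=,comm=,args= -p "$UM_PID"
```

The ps output should show a java command line for that pid. If it shows the wrapper executable instead, the pid belongs to the supervising process and not to the JVM.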

Hello Stefan. Thank you for your fast reply!

I got the PID using the “top” command.

Using your suggestion I got the same PID:

[master@server00 Documents]$ cat /UMData/umserver01/data/RealmServer.lck
174185

But when I tried the heap dump I still got the same message:

174185: Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding

That’s odd. Well, you could try the force flag, but I fear the JVM is not very responsive at the moment. Could you zip up and upload the UMRealmService.log from the server bin directory? Alternatively, you can check it yourself to see what is happening to the server. Also, in the nirvana.log file we should be able to see the memory monitor logs and get at least a basic idea of the GC activity.

Thanks,
Stefan
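
One other common cause of the "Unable to open socket file" message is worth ruling out before reaching for -F: the HotSpot attach mechanism generally expects jmap to run as the same OS user as the target JVM, and on older JDKs running it as root through plain sudo can fail in exactly this way. Below is a hedged sketch of running jmap as the process owner. The helper name proc_uid, the UM_PID variable and the output path /tmp/umheap.hprof are placeholders, and it assumes JAVA_HOME points at the same JVM installation that runs UM.

```shell
#!/bin/sh
# Print the numeric uid that owns /proc/<pid>. On Linux this is normally
# the effective uid of the process.
proc_uid() {
    stat -c '%u' "/proc/$1"
}

# Usage (UM_PID and /tmp/umheap.hprof are placeholders):
#   sudo -u "#$(proc_uid "$UM_PID")" "$JAVA_HOME/bin/jmap" \
#       -dump:format=b,file=/tmp/umheap.hprof "$UM_PID"
```

The output file is written by that user, so the target directory has to be writable by it and needs enough free space for a dump roughly the size of the used heap.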

I managed to generate the heap dump. I used the “runUMTools.sh” passing the “heapdump” option and it generated a 9GB file.

As you requested, I compressed it but couldn’t send it here due to the 4096kb limit. The size after compression is 24,314kb. Worse, I can’t upload it to any online storage due to proxy restrictions. Is there any way for us to analyze this heap dump ourselves and find a solution?

Unfortunately, I can’t send the logs because they contain confidential information. My company doesn’t allow me to send them.

Hi. I did some searching and managed to open the .hprof file generated by the “runUMTools.sh” script with the Eclipse Memory Analyzer.

The results are in the attachments. Stefan, if possible, could you please take a look and see if you can figure out what’s happening?

Thanks in advance.
Screenshot_1.png


Hi,

In this case I would suggest checking whether you have the latest fixes for your UM version applied and whether there is an entry in the fix readmes related to this issue.

If you are on the latest fixes and the issue persists, you should open an incident at Empower to get this verified and hopefully fixed later on by SAG.

Regards,
Holger