UM Node shutting down with error "Nirvana Realm Server shutting down due to UserManager: OutOfMemoryError raised in User"

**Universal Messaging** versions used: 10.3 and 10.11

The UM node goes down after some time with the error logs below. We faced this issue a while back and resolved it by increasing direct memory to 4 GB, but it is happening again and increasing memory is not helping this time.

Could anyone please suggest how to overcome this error? Full logs:

INFO | jvm 1 | 2024/08/21 18:16:44 | “Client Setup:249” daemon prio=1 tid=0x1ec waiting on com.pcbsys.nirvana.server.store.cChannelList@6a9191 locked by Client Setup:207 BLOCKED

INFO | jvm 1 | 2024/08/21 18:16:49 | Nirvana Realm Server shutting down due to UserManager: OutOfMemoryError raised in User siebel@prdcrmcwf5.stc.com.sa’s handler

INFO | jvm 1 | 2024/08/21 18:16:50 | Nirvana Realm Server shutting down due to UserManager: OutOfMemoryError raised in User siebel@prdcrmcwf5.stc.com.sa’s handler

INFO | jvm 1 | 2024/08/21 18:16:50 | Nirvana Realm Server shutting down due to UserManager: OutOfMemoryError raised in User siebel@prdcrmcwf5.stc.com.sa’s handler

INFO | jvm 1 | 2024/08/21 18:16:50 | Nirvana Realm Server shutting down due to UserManager: OutOfMemoryError raised in User siebel@prdcrmcwf5.stc.com.sa’s handler

INFO | jvm 1 | 2024/08/21 18:16:52 | Starting Realm Wed Aug 21 18:16:52 AST 2024

INFO | jvm 1 | 2024/08/21 18:16:52 | Copyright (c) Software AG Limited. All rights reserved

INFO | jvm 1 | 2024/08/21 18:16:52 | Realm shutdown complete

INFO | jvm 1 | 2024/08/21 18:16:53 | Shutdown: Initiated by lock file being removed

STATUS | jvm 1 | 2024/08/21 18:16:53 | Shutting down Nirvana Realm

INFO | jvm 1 | 2024/08/21 18:16:53 | Nirvana Realm has been shutdown

STATUS | jvm 1 | 2024/08/21 18:16:53 | Stopping Wrapper monitor on port:9998

STATUS | jvm 1 | 2024/08/21 18:16:54 | UM realm server - stop called

STATUS | jvm 1 | 2024/08/21 18:16:54 | Nirvana Realm already shutdown

STATUS | wrapper | 2024/08/21 18:16:55 | <-- Wrapper Stopped

Thanks

See https://documentation.softwareag.com/universal_messaging/num10-15/webhelp/num-webhelp/#page/num-webhelp%2Fco_configuring_jvm_heap_direct_memory.html
You need to tune the JVM configuration to avoid out of memory issues.
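The settings from that page usually end up in the instance's wrapper configuration (typically Server_Common.conf under UniversalMessaging/server/&lt;instance&gt;/bin): wrapper.java.maxmemory for the heap and a wrapper.java.additional.N entry carrying -XX:MaxDirectMemorySize for direct memory. If you want to confirm the JVM actually picks the values up, here is a minimal, hypothetical Java check (nothing UM-specific) you can run with the same flags; the same numbers are visible in jconsole under the Memory and BufferPool MBeans:

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical sanity check, not part of UM: run it with the same memory flags
// you put in the wrapper conf (e.g. -Xmx12g -XX:MaxDirectMemorySize=4g) to see
// how the JVM interprets them.
public class ShowMemoryLimits {
    public static void main(String[] args) {
        System.out.println("Max heap (Runtime.maxMemory): "
                + Runtime.getRuntime().maxMemory() / (1024 * 1024) + " MB");
        // The "direct" buffer pool tracks memory allocated via ByteBuffer.allocateDirect,
        // which is what -XX:MaxDirectMemorySize caps.
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.println(pool.getName() + " pool: used="
                    + pool.getMemoryUsed() / (1024 * 1024) + " MB, capacity="
                    + pool.getTotalCapacity() / (1024 * 1024) + " MB, count=" + pool.getCount());
        }
    }
}
```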

Hi Stephane,

Thank you for your response. We have already assigned 12 GB of heap memory and 4 GB of direct memory but are still facing the same error.

Earlier we faced this issue on another server, where increasing direct memory to 4 GB worked, but not this time.

Thanks

What’s the actual memory available on the server?
I would lower the values you’ve set to ensure the JVM does not run out of memory.
The idea is to trigger garbage collection before the JVM runs out of memory.

I had similar issues in Kubernetes: UM containers were restarting every couple of hours because the memory available to the pod was lower than the heap size specified at the UM level.
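A small aside on the direct-memory side of this, since it is a separate budget from the heap: ByteBuffer.allocateDirect allocations count against -XX:MaxDirectMemorySize and are only returned when the owning buffer objects are garbage collected. If buffers stay referenced (for example, events queued for a slow consumer), the limit is hit no matter how much heap or system memory is free. A throwaway demo of that failure mode, hypothetical and nothing UM-specific:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Run with e.g. -XX:MaxDirectMemorySize=64m. Because the buffers stay referenced
// in the list, allocation eventually fails with "OutOfMemoryError: Direct buffer
// memory" even though the heap and the OS still have plenty of free memory.
public class DirectBufferPressure {
    public static void main(String[] args) {
        List<ByteBuffer> retained = new ArrayList<>();
        try {
            while (true) {
                retained.add(ByteBuffer.allocateDirect(1024 * 1024)); // 1 MB each
            }
        } catch (OutOfMemoryError e) {
            System.out.println("Failed after " + retained.size() + " direct 1 MB buffers: " + e);
        }
    }
}
```

Lowering the heap, as suggested above, mainly helps when buffers are unreferenced but not yet collected (a smaller heap means GC runs sooner); it will not help if something keeps accumulating events, which is worth ruling out before adding more memory.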

Hi Stephane,

We have 32 GB of memory on the VM, and only one UM server is running on each VM.

When we run the top command we can see that more than 20 GB is always free, but we still get this error.

Thanks

And no cgroups on the server that limit the amount of resources available to the JVM?
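If you want to double-check from the machine itself, any limit would show up in the cgroup filesystem. A hypothetical little reader, with paths for both cgroup v2 and v1 (they may be mounted elsewhere on your distribution):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical check for cgroup memory limits applied to this process.
public class CgroupMemoryLimit {
    public static void main(String[] args) throws Exception {
        Path[] candidates = {
            Paths.get("/sys/fs/cgroup/memory.max"),                   // cgroup v2
            Paths.get("/sys/fs/cgroup/memory/memory.limit_in_bytes")  // cgroup v1
        };
        for (Path p : candidates) {
            if (Files.exists(p)) {
                String limit = new String(Files.readAllBytes(p)).trim();
                System.out.println(p + " = " + limit); // "max" or a huge number means no limit
            }
        }
        // What this JVM believes it may use; container-aware JVMs already
        // derive this from any cgroup limit.
        System.out.println("Runtime.maxMemory() = "
                + Runtime.getRuntime().maxMemory() / (1024 * 1024) + " MB");
    }
}
```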

There is no such limit. The other two UMs in the same cluster are running fine, but they are on different VMs.

Thanks

I can see now that there are some errors with Client Setup threads being blocked.

jvm 1 | “Client Setup:243” daemon prio=1 tid=0x1b7 waiting on com.pcbsys.nirvana.server.store.cChannelList@52f41351 locked by Client Setup:140 BLOCKED
jvm 1 |
jvm 1 |
jvm 1 | “Client Setup:244” daemon prio=1 tid=0x1b8 waiting on com.pcbsys.nirvana.server.store.cChannelList@52f41351 locked by Client Setup:140 BLOCKED
jvm 1 |
jvm 1 |
jvm 1 | “Client Setup:245” daemon prio=1 tid=0x1b9 waiting on com.pcbsys.nirvana.server.store.cChannelList@52f41351 locked by Client Setup:140 BLOCKED
jvm 1 |
jvm 1 |
jvm 1 | “Client Setup:246” daemon prio=1 tid=0x1ba waiting on com.pcbsys.nirvana.server.store.cChannelList@52f41351 locked by Client Setup:140 BLOCKED
jvm 1 |
jvm 1 |
jvm 1 | “Client Setup:247” daemon prio=1 tid=0x1bb waiting on com.pcbsys.nirvana.server.store.cChannelList@52f41351 locked by Client Setup:140 BLOCKED
jvm 1 |
jvm 1 |
jvm 1 | “Client Setup:248” daemon prio=1 tid=0x1bc waiting on com.pcbsys.nirvana.server.store.cChannelList@52f41351 locked by Client Setup:140 BLOCKED
jvm 1 |
jvm 1 |
jvm 1 | “Client Setup:249” daemon prio=1 tid=0x1bd waiting on com.pcbsys.nirvana.server.store.cChannelList@52f41351 locked by Client Setup:140 BLOCKED
jvm 1 | Nirvana Realm Server shutting down due to UserManager: OutOfMemoryError raised in User siebel@prdcrmcwf1.stc.com.sa’s handler
wrapper | Pinging the JVM took 1 seconds to respond.
jvm 1 | Nirvana Realm Server shutting down due to UserManager: OutOfMemoryError raised in User siebel@prdcrmcesa1.stc.com.sa’s handler
wrapper | Pinging the JVM took 2 seconds to respond.
wrapper | Pinging the JVM took 1 seconds to respond.
wrapper | Pinging the JVM took 2 seconds to respond.
jvm 1 | UM realm server - stop called
jvm 1 | Deleting the server lock file: /wmapps/product/softwareag103/UM/UniversalMessaging/server/prdwmum1/data/RealmServer.lck
jvm 1 | Shutdown: Initiated by lock file being removed
jvm 1 | Shutting down Nirvana Realm
jvm 1 | Nirvana Realm has been shutdown
jvm 1 | Stopping Wrapper monitor on port:9998
wrapper | <-- Wrapper Stopped

Let’s see if other community members can help, but I guess you’ll need to raise an official support ticket here.

Hi @Holger_von_Thomsen,

Could you please advise here? All servers have the same configuration, but only node 1 is going down repeatedly.

Now, if we bring up node 1, the JMS alias connections get disabled while all 3 nodes are running.

Then we shut down node 1 and the JMS aliases work fine again.

Thanks

I might have missed a post, but are these three nodes in an active-active cluster, or are they separate installations serving different instances? Do they share the data directory, or does each have its own?

I believe I might have seen this issue with a customer after a 10.5 upgrade. If I recall correctly, I had to flush out all the pending events in the naming/defaultContext channel. There was an instance of OFI that was sending a lot of events to this channel, causing the UM to run out of memory. But this might be a unique scenario that does not fit your case.

Having said that, I would definitely recommend starting by clearing out the defaultContext channel. You can also try creating a brand-new temporary data directory, pointing the UM server to it, and restarting the UM.

If these things do not work, then you might have to open a ticket with support, because troubleshooting this requires complete access to your environment, which is not possible through forum posts. Hope it helps!
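In case the Enterprise Manager route is fiddly, here is a rough sketch of doing the same purge with the UM Java client API (nClient.jar). The class and method names are written from memory, so please verify them against the javadoc and client samples shipped with your 10.3 installation before touching production; the realm URL is a placeholder. Also keep in mind that /naming/defaultContext backs the JNDI entries your JMS aliases resolve, so take a backup of the realm or data directory first.

```java
import com.pcbsys.nirvana.client.nChannel;
import com.pcbsys.nirvana.client.nChannelAttributes;
import com.pcbsys.nirvana.client.nSession;
import com.pcbsys.nirvana.client.nSessionAttributes;
import com.pcbsys.nirvana.client.nSessionFactory;

// Sketch only: verify API names against the nClient javadoc shipped with UM 10.3.
public class PurgeDefaultContext {
    public static void main(String[] args) throws Exception {
        nSessionAttributes attrs = new nSessionAttributes("nsp://localhost:9000"); // placeholder RNAME
        nSession session = nSessionFactory.create(attrs);
        session.init();

        nChannelAttributes chanAttrs = new nChannelAttributes();
        chanAttrs.setName("/naming/defaultContext"); // JNDI channel used by JMS lookups
        nChannel channel = session.findChannel(chanAttrs);

        channel.purge(); // removes all events on the channel -- back up first

        session.close();
    }
}
```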

Hi Akshith,

This issue is happening on version 10.3. Could you please confirm where we have to clear the naming/defaultContext files?

Thanks

What is the buffer size and what is the largest message size you publish?

It is a channel. You can access it from the UM Enterprise Manager.

Hi Engin,

The read buffer size is 10240, and the message size is normally 10-20 KB.

Thanks

Hi Akshith,

Could you please share a screenshot? I am not able to find the channels in Enterprise Manager.

Thanks

Dear
Compare the fix level of all three nodes.
Try the options below.
Lower the heap memory to 2 GB or 1 GB and launch jconsole (under the JVM folder of the Software AG installation) to check the current usage. If usage looks stable, increase the memory again and retest.

Another option would be to take a backup of the data folder of the impacted node and then rejoin it to the cluster; since the setup is a three-node cluster, it should sync back from the master.
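To complement jconsole, the same numbers can be pulled remotely over JMX and logged during a test window. This assumes the realm JVM is started with the standard com.sun.management.jmxremote.* system properties (port, authentication, SSL as appropriate for your environment); the host, port, and class name below are placeholders, not UM defaults.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical remote watcher: prints heap and direct-buffer usage of the
// realm JVM once; wrap it in a loop or cron job for a simple trend log.
public class RemoteMemoryWatch {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://umhost:1099/jmxrmi"); // placeholder host/port
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            MemoryMXBean mem = ManagementFactory.newPlatformMXBeanProxy(
                    mbsc, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
            long usedMb = mem.getHeapMemoryUsage().getUsed() / (1024 * 1024);
            long maxMb = mem.getHeapMemoryUsage().getMax() / (1024 * 1024);
            System.out.println("Heap used/max: " + usedMb + " / " + maxMb + " MB");
            for (BufferPoolMXBean pool
                    : ManagementFactory.getPlatformMXBeans(mbsc, BufferPoolMXBean.class)) {
                System.out.println(pool.getName() + " buffers used: "
                        + pool.getMemoryUsed() / (1024 * 1024) + " MB");
            }
        }
    }
}
```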

Hi Bari,

Thank you for the response. The memory increase didn't help resolve the issue.
Yesterday we tried deleting the data directory and the UM came up, but since the queues and topics were missing we faced multiple failures.

Could you please confirm how long the sync from the other nodes should take? We can try this during off-business hours.

Thanks

The top command only shows you the available system memory, not the heap memory usage of the JVM.

Did you verify that all 3 nodes have the same resources allocated? Better double-check the number of cores, JVM max and min heap sizes, fix versions installed, and system ulimit values. Having the issue on only one node indicates that it is indeed a configuration error. If it is not a configuration error, then it is highly likely an installation error. If that's the case, you might need to recreate the instance or maybe reinstall your UM node. You can use Command Central to compare the configuration files.
