UM Cluster Issue

Hello Experts,

We have 3 UM nodes in a cluster.

Here is where the issue begins: one of the UM nodes went down due to a heap memory issue, and we brought the UM server back up immediately.

Ever since, duplicate document subscriptions have been happening. Based on our transaction audit tables, the same document is being subscribed to twice.

Note: No retry mechanism is configured at the trigger level, and the document processing mode is “Serial”.

First Subscription: The document is interfaced to the target successfully.

Second Subscription: The document is picked up by the wrong subscriber even though the filter conditions are specified correctly.

As it’s serial processing, documents are processed one by one, and after a couple of days the behaviour returns to normal. That is, new documents arrive and are processed successfully.

Can you please suggest how we can identify the root cause of this issue?

Thanks,
Pradeep.

I would suggest you involve SAG support for these kinds of issues, as I have personally experienced this on UM version 9.8.

Hello Mahesh,

Thanks for your suggestion.

However, in Designer I can see the UM filters defined.

When I checked the UM trigger subscriber, no filter is shown for the Selector. I have attached a screenshot of this.

Because of this, documents are being picked up by the wrong subscriber. Sometimes duplicate documents are also being processed.

If you have ever faced this kind of issue, please let me know what resolution was needed.

Selector_Issue.PNG

As Mahesh suggested, contact Software AG support for this issue.

If you want to try out a few things yourself, you can check the subscriber details on all 3 nodes. Maybe the filter got dropped on one node as part of recovery, so the 3 nodes are not really in sync, and you see odd behaviour depending on which node you are connected to. If the issue does appear to be a cluster node inconsistency, then you should report a defect. Or, you can choose to just delete and recreate the Topic and subscribers.

A good starting point for checking consistency across the nodes in your cluster is to use the Healthchecker.

Try the steps suggested above.

May I know your UM version and fix levels? The health check tool can be run from this location:

\SoftwareAG\UniversalMessaging\tools\runner

Hello All,

Thank you so much for your valuable inputs.

I ran the UM Healthcheck tool.

Below are the errors I see:

FIX LEVEL CHECK FOR NODES OF CLUSTER
WARN: Versions of the realms in cluster: ‘***’ does not match. Please update them to the same version.
umserver01 : Server Release Details = 9.10.0.16.100675
umserver02 : Server Release Details = 9.10.0.16.100675
umserver03 : Server Release Details = 10.1.0.4.102145

STORE MISMATCHES CHECK
ERROR: Could not find store (/RealmSpecific/NirvanaRepublishClusterChannel) on realm [umserver01] but it is present on realm [umserver03]
ERROR: Could not find store (/RealmSpecific/NirvanaRepublishClusterChannel) on realm [umserver02] but it is present on realm [umserver03]

I can see that the UM versions differ among the cluster nodes. Can this cause duplicate subscriptions and make UM selectors disappear?

Is there any impact if the stores do not match between the cluster nodes?
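For anyone who wants to check a saved Healthchecker report quickly, here is a minimal Python sketch that groups realms by the release string shown in the output quoted above. It is not part of the UM tooling; the function name and the regular expression are my own assumptions, based only on the three "Server Release Details" lines in this post.

```python
import re
from collections import defaultdict

# Sample lines copied from the Healthchecker output quoted in this post.
REPORT = """\
umserver01 : Server Release Details = 9.10.0.16.100675
umserver02 : Server Release Details = 9.10.0.16.100675
umserver03 : Server Release Details = 10.1.0.4.102145
"""

# Assumed line shape: "<realm> : Server Release Details = <dotted version>"
LINE = re.compile(r"^\s*(\S+)\s*:\s*Server Release Details\s*=\s*([\d.]+)\s*$")


def group_by_release(report):
    """Return {release string: [realm names]} for every matching line."""
    groups = defaultdict(list)
    for line in report.splitlines():
        match = LINE.match(line)
        if match:
            realm, release = match.groups()
            groups[release].append(realm)
    return dict(groups)


groups = group_by_release(REPORT)
for release, realms in groups.items():
    print(release, "->", ", ".join(realms))
if len(groups) > 1:
    print("Cluster nodes report different releases.")
```

More than one key in the result means the nodes do not all report the same release, which is what the WARN line in the report is flagging.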

This will cause ‘non-deterministic’ behaviour. UM cluster nodes must all be on the same version and fix level. In addition, your clients must be at a lower level than the server or at the same level. So, depending on what level your IS is at, you can either take everything to the latest fix level of 10.1 (that includes IS) or take the 10.1 node back to 9.10 so it is the same as the other nodes.

Jane

Hello Jane,

Thanks for your quick response.

When I run the HealthCheck utility, the environment versions are shown as below.

FIX LEVEL CHECK FOR NODES OF CLUSTER
WARN: Versions of the realms in cluster: ‘***’ does not match. Please update them to the same version.
umserver01 : Server Release Details = 9.10.0.16.100675
umserver02 : Server Release Details = 9.10.0.16.100675
umserver03 : Server Release Details = 10.1.0.4.102145

But if I check the UM installation directory:

C:\SoftwareAG\install\products\NUMRealmServer.prop clearly shows that the Realm version is 10.1.

e2ei/11/NUM_10.1.0.0.96243/UniversalMessaging/NUMRealmServer/NUMRealmServer-W64-Any/BM_NUM_RealmServer-W64-Any/props/version=10.1.0.0.96243

Is there any reason the HealthCheck utility shows a different version?

Thanks,
Pradeep.

Hi Sai Pradeep,

I agree with Jane’s comment.

All UM servers should be at the same version and fix level.

One more point on your issue: check the mem files. If the mem files are too large and contain some non-subscribed messages, then when you restart your UM node, all the other messages will also be subscribed again.

The delivered data is not being removed from the mem files.

Please raise an incident with the SAG support team. You will get quick support (I guess a couple of clients faced the same issue and got fixes from SAG).

I re-iterate my advice - all nodes in a cluster MUST be at the same version and fix level - this will be the first thing SAG Support will tell you. If you are running all nodes on the same machine from the same install then I assume this is a test machine. Without knowing how you have installed and set up I do not know how you have got a mixed release cluster. The 9.10 realms need to be upgraded, or the 10.1 realm downgraded.
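The two rules stated in this thread (all cluster nodes on the same version and fix level, and clients at or below the server level) can be sketched as a small Python check. This is only an illustration: the function names are hypothetical, and comparing the dotted release strings field by field as integers is my assumption about how "level" should be ordered, not something the UM documentation is quoted as saying here.

```python
def parse_release(release):
    """Turn a dotted release string into a tuple of ints for comparison."""
    return tuple(int(part) for part in release.split("."))


def cluster_is_uniform(node_releases):
    """True when every node reports exactly the same release string."""
    return len(set(node_releases.values())) == 1


def client_is_supported(client_release, server_release):
    """True when the client is at a lower or equal level than the server."""
    return parse_release(client_release) <= parse_release(server_release)


# Release strings as reported by the Healthchecker earlier in the thread.
nodes = {
    "umserver01": "9.10.0.16.100675",
    "umserver02": "9.10.0.16.100675",
    "umserver03": "10.1.0.4.102145",
}

print("uniform cluster:", cluster_is_uniform(nodes))
# Hypothetical pairing: a client at the 9.10 level against the 10.1 node.
print("9.10 client vs 10.1 server:",
      client_is_supported(nodes["umserver01"], nodes["umserver03"]))
```

The releases are compared as integer tuples rather than as plain strings because a string comparison would rank "9.10" above "10.1".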