Universal Messaging clustering

I am having issues with clustering in Universal Messaging 9.8.

I have configured a cluster of two Universal Messaging servers with sites named Primary and Secondary. Primary has the IsPrime flag set, but the cluster fails to form when we restart the servers. The entries from cluster.logs are shown below.

Cluster Members shows:

eaiitg Master Primary eaiitg Local Online
eaiitg1 nothing here Secondary nothing here Disconnected Local

,Cluster> Setting potential master to eaiitg1 yet master count is only 1.0 while we we need more than 1.0 for quorum
[Mon Jul 27 13:20:12 MDT 2015],Cluster> Setting potential master to eaiitg1 yet master count is only 1.0 while we we need more than 1.0 for quorum
[Mon Jul 27 13:20:12 MDT 2015],Cluster> Setting potential master to eaiitg1 yet master count is only 1.0 while we we need more than 1.0 for quorum
[Mon Jul 27 13:20:14 MDT 2015],Cluster> Setting potential master to eaiitg1 yet master count is only 1.0 while we we need more than 1.0 for quorum
[Mon Jul 27 13:20:14 MDT 2015],Cluster> Setting potential master to eaiitg1 yet master count is only 1.0 while we we need more than 1.0 for quorum
[Mon Jul 27 13:20:14 MDT 2015],Cluster> Cluster State Manager: Failed to establish viable cluster, resetting links
[Mon Jul 27 13:20:16 MDT 2015],Cluster> Setting potential master to eaiitg1 yet master count is only 1.0 while we we need more than 1.0 for quorum
[Mon Jul 27 13:20:16 MDT 2015],Cluster> Setting potential master to eaiitg1 yet master count is only 1.0 while we we need more than 1.0 for quorum
[Mon Jul 27 13:20:18 MDT 2015],Cluster> Setting potential master to eaiitg1 yet master count is only 1.0 while we we need more than 1.0 for quorum
[Mon Jul 27 13:20:19 MDT 2015],Cluster> Setting potential master to eaiitg1 yet master count is only 1.0 while we we need more than 1.0 for quorum
[Mon Jul 27 13:20:19 MDT 2015],Cluster> Setting potential master to eaiitg1 yet master count is only 1.0 while we we need more than 1.0 for quorum
[Mon Jul 27 13:20:21 MDT 2015],Cluster> Setting potential master to eaiitg1 yet master count is only 1.0 while we we need more than 1.0 for quorum
[Mon Jul 27 13:20:21 MDT 2015],Cluster> Setting potential master to eaiitg1 yet master count is only 1.0 while we we need more than 1.0 for quorum

I would appreciate your help. Thanks.
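The quorum message in the log above ("master count is only 1.0 while we need more than 1.0") is consistent with a strict-majority rule: with two realms the threshold is 2 / 2 = 1.0, so a single reachable realm can never exceed it. A minimal sketch of that arithmetic follows. The function name `has_quorum` is hypothetical, and the sketch assumes a plain majority rule only; it does not model how the IsPrime site flag changes the calculation.

```python
def has_quorum(reachable: int, cluster_size: int) -> bool:
    """Return True when strictly more than half of the members are reachable.

    Assumption: a plain majority rule, as suggested by the log message.
    Site and IsPrime handling is deliberately not modelled here.
    """
    return reachable > cluster_size / 2


# Two-realm cluster from the log: the threshold is 2 / 2 = 1.0.
print(has_quorum(1, 2))  # one realm reachable, not more than 1.0
print(has_quorum(2, 2))  # both realms reachable
```

Under this rule, a two-node cluster with one node disconnected cannot elect a master by itself, which matches the repeated "Failed to establish viable cluster" entries.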

Kiran,

Are you on the latest fixes? A couple of weeks ago SAG released Fix 2 for the Realm Server, which addresses some clustering issues.

I was facing exactly the same issue, for which I have a ticket open with SAG.

My situation doesn't allow me to apply the fix and test it. If you can, please apply the fix and share your feedback.

Alternatively, recreating the instances may help too.

Mangat,

We are on the latest fix and patch level. I tried uninstalling and reinstalling a couple of times, but the issue comes back after UM is restarted. I have opened a ticket with SAG and am waiting on a solution. Thanks for the update.

Auditing for the install location /eaiums/wM9/umserver

2015-07-14 23:34:14.932 MDT: Universal Messaging Shared Bundles 9.8 Fix 2(wMFix.NUMRepository.SharedBundles_9.8.0.0002-6716) installed
2015-07-14 23:34:14.932 MDT: Universal Messaging Java Client 9.8 Fix 2(wMFix.NUMClient_9.8.0.0002-6716) installed
2015-07-14 23:34:14.932 MDT: Universal Messaging JavaScript Client 9.8 Fix 2(wMFix.NUMClient.JavaScript_9.8.0.0002-6716) installed
2015-07-14 23:34:14.932 MDT: Universal Messaging C++ Client 9.8 Fix 2(LNXAMD64)(wMFix.NUMClient.LNX64_9.8.0.0002-6716) installed
2015-07-14 23:34:14.933 MDT: Universal Messaging Enterprise Manager 9.8 Fix 2(wMFix.NUMEnterpriseManager_9.8.0.0002-6716) installed
2015-07-14 23:34:14.933 MDT: Universal Messaging Realm Server 9.8 Fix 2(wMFix.NUMRealmServer_9.8.0.0002-6716) installed
2015-07-14 23:34:14.933 MDT: Universal Messaging Template Applications 9.8 Fix 2(LNXAMD64)(wMFix.NUMTemplateApplications.LNX_9.8.0.0002-6716) installed
2015-07-14 23:34:14.933 MDT: Universal Messaging SPM Bundles 9.8.0 Fix 1(wMFix.NUMspmBundles_9.8.0.0001-0001) installed

Kiran,

I am sure you have already done this, but here is how I resolved the issue.

I deleted the existing instances using ninstancemanager.bat|sh.

I deleted the data folder under umserver. Better still, delete the umserver folder under SAG_HOME/UniversalMessaging/server.

Then I recreated the instances using ninstancemanager.

Finally I recreated the cluster, and everything worked fine. I am still actively monitoring the UM cluster for recurring issues, but I have seen none in the last 3-4 days.

One thing to note: when you reinstall or recreate, make sure you delete the umserver folder. UM keeps references on the file system, and even after a reinstall it may still refer to the old instance.

I am not sure this is the best way to go. Mine is a QA environment, so I could do it.

HTH
Mangat

Hi,

We see a similar issue on our production UM servers.

One of the UM servers is not joining the cluster and logs the following error:

Cluster State Manager: Failed to establish viable cluster
Cluster> Found existing Master in cluster as ulvwsbms01, setting local state to that of cluster
[Tue Aug 25 15:53:40 BST 2015],Cluster> Found existing Master in cluster as xxxxxxxx, setting local state to that of cluster
[Tue Aug 25 15:53:42 BST 2015],Cluster> Found existing Master in cluster as xxxxxxxx, setting local state to that of cluster

Could you please let me know if you have received any update from SAG on this issue? I can't delete and recreate the instance, as this is production.

Robert,

What version are you on?

Can you check whether there are any large .mem files under the data folder of the server that is unable to join the cluster?

Is the server starting up?

Do you see any errors in the individual server logs or in the cluster logs?
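One way to run the .mem check asked about above is a small script that walks the data folder and lists the largest files. This is only a sketch: the function name `find_large_mem_files` and the 100 MiB default threshold are assumptions chosen for illustration, not values from the UM documentation, and the data folder path has to be supplied for your own install.

```python
import sys
from pathlib import Path


def find_large_mem_files(data_dir, min_bytes=100 * 1024 * 1024):
    """Return (path, size) pairs for .mem files of at least min_bytes, largest first.

    The 100 MiB default is an arbitrary illustrative threshold.
    """
    hits = []
    for path in Path(data_dir).rglob("*.mem"):
        if path.is_file():
            size = path.stat().st_size
            if size >= min_bytes:
                hits.append((path, size))
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Pass the realm's data folder as the first argument; defaults to the current directory.
    data_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, size in find_large_mem_files(data_dir):
        print(f"{size / (1024 * 1024):10.1f} MiB  {path}")
```

Run it against the data folder of the realm that fails to join, and lower `min_bytes` if nothing is reported.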