UM Clustering

I have created 4 UM instances and added them all to a cluster. If 2 nodes go down, the whole cluster goes offline.

Is there a setting I need to change, or is this the expected behavior? I am using UM 9.8.

Please advise on this.

For a cluster to be active, more than 50% of its servers must be active. With 2 of your 4 nodes down, exactly 50% of the servers are active, which is not enough for quorum. You need one additional vote for the cluster to stay up.
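The "more than 50%" rule can be sketched as simple arithmetic. This is only an illustration of a strict-majority check, not UM's actual implementation; the function name `has_majority` is made up for this example.

```python
def has_majority(active: int, total: int) -> bool:
    """Strict majority: more than half of all cluster nodes must be active."""
    return 2 * active > total


# Compare a 3-node cluster with one node down against a 4-node cluster
# with two nodes down and with one node down.
for active, total in [(2, 3), (2, 4), (3, 4)]:
    print(f"{active} of {total} active -> quorum: {has_majority(active, total)}")
```

With 2 of 3 nodes active the check passes, but with 2 of 4 it fails because exactly half is not a majority.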

Create a site and select one of the servers which is active as prime. This should work.

Here I am using a cluster, not sites. With 3 instances, if one goes down, the other 2 stay active as master and slave. With 4 instances, if two go down, the complete cluster goes down. Why?

What does it mean that more than 50% of the servers must be active? Can you elaborate on this?

Thanks.

Krishna,
the reason you need MORE than 50% to continue the cluster is to avoid a situation called “split brain”. If you have two data centers and all 4 servers are running, but the connection between the data centers goes down, you do NOT want both pairs of servers to continue processing (each half has exactly 50%), as you would create a situation that could not be recovered without losing data. Hence UM requires a majority (MORE than 50%) to achieve quorum. If you have two data centers, then you obviously have the possibility of EXACTLY 50%, so you should assign the Prime flag to one site to designate which site can continue in the 50% scenario. This prevents split-brain.
If you have a single data-center, then a 3-node cluster is recommended.
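To make the split-brain scenario concrete, here is a small sketch of a 4-node cluster whose two data centers lose their connection. It assumes a simplified rule: a strict majority always has quorum, and a group with exactly 50% has quorum only if it contains every node of the prime site. The node names and the `has_quorum` function are hypothetical, not part of any UM API.

```python
def has_quorum(reachable, all_nodes, prime_site=None):
    """Return True if the reachable nodes may keep the cluster running.

    Assumed rule: a strict majority always wins; an exact 50% group wins
    only if it contains every node of the prime site.
    """
    if 2 * len(reachable) > len(all_nodes):
        return True
    if prime_site and 2 * len(reachable) == len(all_nodes):
        return prime_site <= reachable
    return False


site_a = {"a1", "a2"}  # data center A (hypothetical node names)
site_b = {"b1", "b2"}  # data center B
cluster = site_a | site_b

# The link between the data centers fails: each half sees only itself.
for name, half in [("A", site_a), ("B", site_b)]:
    no_prime = has_quorum(half, cluster)
    a_prime = has_quorum(half, cluster, prime_site=site_a)
    print(f"site {name}: no prime -> {no_prime}, A is prime -> {a_prime}")
```

Without a prime site neither half continues, and with site A as prime only site A continues, so the two halves never process independently.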

Thanks Jonathan.

I have 3 test cases,
Case 1 -
Create 2 sites (A and B) in two different data centers, with 2 instances in each site, and enable the IsPrime flag on only one site. All 4 instances are in one cluster.

Case 2 -
Create 2 sites (A and B) in two different data centers, with 2 instances in each site, and enable the IsPrime flag on both sites. All 4 instances are in one cluster.

Case 3 -
Create 1 site (A) containing 2 instances that are located in 2 different data centers (A and B), and set the IsPrime flag on one instance.

Which case provides HA?

Krishna,

If you want a 4-node cluster, then Case 1 is what you want. It will be resilient against failure of any one machine without human intervention. If you lose the entire non-prime site, then also no human intervention will be needed. Only if you lose the entire prime site, will it be necessary to manually set the prime flag on the other site to restore cluster operation.

Case 2 is not supported and will not work. You cannot set the prime flag on both sites.

I don’t understand Case 3, but if you mean having your UM sites not matching the actual data centers, then this will also be a non-supported configuration that will not work.

Check out this video for more details on UM clustering: http://techcommunity.softwareag.com/pwiki/-/wiki/Main/Universal+Messaging+Clustering+Demystified

Hi Jonathan,

Thanks for the information.

Case 3 would be:

We have 2 UM instances in two different data centers, combine both instances using a site, and set the IsPrime flag on one of the instances.

I think this approach needs manual intervention when the prime node is down, like Case 1. Correct?

In Case 1 we are using 4 nodes, but here we are using only 2. I think this will give better performance compared with Case 1. Correct me if I am wrong.

Your understanding is correct.
The main difference is that your 2-node cluster will require manual intervention if either the prime MACHINE fails, or the prime SITE fails.
The 4-node cluster will be resilient to failure of any one machine without human intervention. That would only be needed if the prime SITE fails (which is very unlikely).
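The difference between the two layouts can be checked by enumerating failures. The sketch below assumes a simplified quorum rule (strict majority, or exactly 50% that includes the whole prime site) and uses made-up node and site names; it is a model for reasoning, not UM code.

```python
def has_quorum(alive, all_nodes, prime_site):
    """Assumed rule: strict majority, or exactly 50% including the whole prime site."""
    if 2 * len(alive) > len(all_nodes):
        return True
    return 2 * len(alive) == len(all_nodes) and prime_site <= alive


def failures_needing_intervention(sites, prime):
    """List single-machine and whole-site failures that leave no quorum."""
    all_nodes = set().union(*sites.values())
    scenarios = [(f"machine {n}", {n}) for n in sorted(all_nodes)]
    scenarios += [(f"site {s}", set(nodes)) for s, nodes in sorted(sites.items())]
    return [label for label, failed in scenarios
            if not has_quorum(all_nodes - failed, all_nodes, sites[prime])]


# Hypothetical layouts: one node per site versus two nodes per site.
two_node = {"A": {"a1"}, "B": {"b1"}}
four_node = {"A": {"a1", "a2"}, "B": {"b1", "b2"}}

print("2-node:", failures_needing_intervention(two_node, prime="A"))
print("4-node:", failures_needing_intervention(four_node, prime="A"))
```

Under this model the 2-node layout loses quorum when the prime machine (which is also the whole prime site) fails, while the 4-node layout loses quorum only when the entire prime site fails.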

Yes - performance of a 4-node cluster will be a little lower than a 2-node cluster and network traffic will be a lot higher.

@Jonathan,

Thanks for sharing this. I will come back to you if I have any additional questions on this topic.