I’m testing Universal Messaging in cluster mode, but when all the nodes fail the cluster doesn’t start again. That is, after I restart each node, all nodes appear in offline status, even though when I check them I can see that they are running and that communication between them is OK.
The only way to get all working again is to drop the whole cluster and recreate it.
Does anyone know the root cause of this behaviour, or how I can solve this problem?
I need to work on the same configuration without recreating the cluster even if all the nodes fail.
Hi Filiberto Conde,
If a cluster does not form, there are a number of potential causes.
Without sites, a cluster cannot form if 50% or more of the realms in the cluster are offline; more than half of the realms must be online. This is to avoid a split-brain scenario.
Sites allow one realm (or a group of realms collectively) in the cluster to be given a deciding vote in the cluster election. If the realm with the “site” flag is among the 50% of realms which are online, then the cluster can form. If it can be confirmed that the node with the “site” flag is offline, the site flag can be moved to an operational realm to allow a cluster to form with just 50% of the realms online.
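To make the quorum rule above concrete, here is a minimal Python sketch of it as described in this reply. It is only a simplified model for reasoning about which combinations of online realms can form a cluster; it is not Universal Messaging code, and the function name `can_form_cluster` and its parameters are made up for illustration.

```python
def can_form_cluster(total_realms, online_realms, site_realm_online=None):
    """Simplified model of the quorum rule described above (not UM code).

    site_realm_online is None when no site is configured; otherwise it says
    whether the realm holding the "site" flag is among the online realms.
    """
    if total_realms <= 0 or not 0 <= online_realms <= total_realms:
        raise ValueError("online_realms must be between 0 and total_realms")

    if 2 * online_realms > total_realms:
        # A strict majority is online: the cluster can form.
        return True
    if 2 * online_realms == total_realms:
        # Exactly 50% online: only the site flag can break the tie.
        return site_realm_online is True
    # Fewer than half of the realms are online.
    return False
```

For example, with four realms and two online, `can_form_cluster(4, 2)` is false without sites, while `can_form_cluster(4, 2, site_realm_online=True)` is true.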
If the realm nodes are online but the cluster still will not form, then it is worth checking network connectivity between the realms. There must be network connectivity in both directions from each realm in the cluster to all other cluster members. If the realms cannot connect to each other, they will not form a cluster.
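One rough way to check this is a plain TCP connection test, run from each realm host against every other realm, so that both directions are covered. The sketch below uses only the Python standard library; the helper names are made up for illustration, and a successful TCP connect only shows that the port is reachable, not that the realms will accept each other as cluster members.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def unreachable_peers(peers, timeout=3.0):
    """Return the (host, port) pairs from peers that could not be reached."""
    return [(host, port) for host, port in peers
            if not can_connect(host, port, timeout)]
```

On each node you would call `unreachable_peers` with the host names and interface ports of the other realms (for example `[("um-node2", 9000), ("um-node3", 9000)]`, which are placeholder values to replace with your own). An empty list on every node means each realm can reach all the others.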
The UM logs contain information about the changes in cluster state, and can be inspected to see if there is a potential problem.
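As a starting point for that inspection, a small filter like the one below can pull the cluster-related lines out of a realm log. It is only a sketch: it matches on the word "cluster" regardless of case, which is an assumption about what the relevant entries contain, and the log file location depends on your installation, so the path is left as a parameter.

```python
def cluster_lines(lines, keyword="cluster"):
    """Yield the lines that mention keyword, ignoring case."""
    needle = keyword.lower()
    for line in lines:
        if needle in line.lower():
            yield line.rstrip("\n")


def print_cluster_lines(log_path, keyword="cluster"):
    """Print matching lines from the log file at log_path."""
    # errors="replace" avoids failing on unexpected bytes in the log.
    with open(log_path, encoding="utf-8", errors="replace") as log_file:
        for line in cluster_lines(log_file, keyword):
            print(line)
```

Comparing the output from all nodes around the time of the restart should show how far each realm got in the cluster formation.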