Broker high-availability architecture strategies

I’ve been wondering about the best approach to making the broker highly available…

At 6.x I deployed the broker into a VCS cluster on one pair of FSC650s. It works OK (mostly), but it is a compromise.

a) The first problem is that there is only “one” broker. If that one goes down, best case you are waiting for VCS to bring it up again on its remote node; worst case you are going to be faffing about with some form of file corruption, and the cluster won’t help much there.

b) The second problem is scaling. Your broker is limited by the capacity of a single cluster member. Granted, it’s likely to be disk bound rather than CPU bound; nonetheless, fixing a capacity issue means getting a ‘bigger’ box, which is expensive.

c) Securing the broker data files requires taking the broker down. Granted, the data is ephemeral and you can argue you don’t need to do this very often, but I have a very low risk threshold when I’m dealing with an entity literally worth millions, so I insist it gets fully secured every night. This is very fast on our site: the EMC BCVs are synced and, when that completes, the broker is stopped, the BCVs are split off, the broker is restarted, and the BCV volumes are imaged off to tape. I’d like to have my cake and eat it: secure the data files without even the two minutes or so that the stop, split and restart takes.

We are currently doing a webMethods 7.1 upgrade and I’m thinking about a different approach.

Broadly, instead of (or possibly in conjunction with) hardware clustering, I’m thinking of creating a territory with several broker servers spread across a number of nodes, configuring them behind a hardware VIP (or several), and connecting the IS servers to the broker(s) via the VIP. That is essentially an active/passive configuration on a per-IS basis, though some IS instances may have one broker as their primary and others may have another.
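To make the per-IS active/passive idea a bit more concrete, here is a minimal planning sketch in Python. It is not anything webMethods provides: the host names are made up, and the round-robin rule is just one assumed way of spreading the IS instances across the brokers in the territory so that each IS gets a primary and a different standby.

```python
def assign_brokers(is_servers, brokers):
    """Give each IS a primary broker and a different standby, round-robin."""
    if len(brokers) < 2:
        raise ValueError("need at least two brokers for active/passive")
    plan = {}
    for i, is_name in enumerate(is_servers):
        primary = brokers[i % len(brokers)]
        # The standby is simply the next broker in the list (wrapping around).
        standby = brokers[(i + 1) % len(brokers)]
        plan[is_name] = {"primary": primary, "standby": standby}
    return plan


# Hypothetical host names, for illustration only.
BROKERS = ["broker-a", "broker-b", "broker-c"]
IS_SERVERS = ["is-01", "is-02", "is-03", "is-04"]

if __name__ == "__main__":
    for is_name, pair in assign_brokers(IS_SERVERS, BROKERS).items():
        print(f"{is_name}: primary={pair['primary']} standby={pair['standby']}")
```

The point of the layout is that no single broker is the primary for every IS, so losing one node only forces the IS instances assigned to it onto their standby.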

The advantages are better availability characteristics than simple hardware failover, and better scalability from dividing the IS load across multiple brokers.

Has anyone done this (and does it work)? Or does anyone have a better idea?

In my opinion, broker HA being available only through an external cluster is still one of the open issues, at least up to 7.x.

Using a territory is basically a good idea for certain scenarios, but using a load balancer may be problematic.

  1. When an IS connects to a broker, several queues are created on that broker for the IS. I doubt this will work without problems when connecting to a territory through a load balancer.
  2. Even if it works, it will cause problems for documents received by the IS. If 1) worked, you would have queues on several brokers, but the IS will retrieve and commit documents only from the broker it is currently connected to.
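The two points above can be illustrated with a toy model. This is plain Python, not the Broker API: the `ToyBroker` class, the client name and the documents are all invented for illustration, and it deliberately ignores any forwarding a real territory does between brokers. It only shows why documents sitting in a queue on one broker are not seen by an IS that the load balancer has since connected to another.

```python
from collections import deque


class ToyBroker:
    """Stand-in for a broker: one client queue per IS that has connected."""

    def __init__(self, name):
        self.name = name
        self.queues = {}

    def connect(self, client_id):
        # A queue for the client is created on whichever broker it connects to.
        return self.queues.setdefault(client_id, deque())

    def publish(self, client_id, doc):
        # Documents are only queued if this broker already knows the client.
        if client_id in self.queues:
            self.queues[client_id].append(doc)


def stranded_documents(brokers, client_id, connected_to):
    """Count documents queued for the client on brokers it is NOT connected to."""
    return {
        b.name: len(b.queues[client_id])
        for b in brokers
        if b is not connected_to and b.queues.get(client_id)
    }


broker_a, broker_b = ToyBroker("A"), ToyBroker("B")
broker_a.connect("is-01")             # load balancer picks A first
broker_a.publish("is-01", "doc-1")
broker_a.publish("is-01", "doc-2")
broker_b.connect("is-01")             # after a reconnect the load balancer picks B

if __name__ == "__main__":
    print(stranded_documents([broker_a, broker_b], "is-01", broker_b))
```

In this model the IS ends up with a queue on both brokers, and the two documents on A stay there until the IS happens to be routed back to A.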

As far as I know, up to version 7.x an IS must be connected to a single broker, and all members of an IS cluster must use the same broker.

If you are mainly interested in uninterrupted connectivity, and can accept the failover time for the documents that are in the broker during an outage, you could consider running unclustered IS instances that host identical functionality and connecting them to different brokers in a territory. If clients access the IS, the load balancer would then balance access across the IS instances.

If you are currently upgrading, you may also want to look at version 8. SAG introduced a logical broker there which should match the requirements for failover (I haven’t used it myself yet). Look for “logical broker” or a similar term. There is also a “broker cluster” mentioned in the docs, but this is only a new name for a territory.