CloudSearch clustering - deployment options

webMethods API Portal guide

In a standard single-node installation, a single CloudSearch instance exists, which holds the index data of all tenants in that installation.

Multiple CloudSearch instances without redundancy

When simply adding several CloudSearch instances to the system, a tenant's CloudSearch index data is handled similarly to the way it is handled when an installation has multiple database systems: all data of an individual tenant is stored in a single CloudSearch instance. If multiple CloudSearch instances are available, the data of different tenants is distributed across the available instances, as illustrated in Figure 1.
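For illustration, adding a CloudSearch instance is typically done through ARIS Cloud Controller (ACC). The following is a minimal sketch only; the app type cloudsearch_m and the instance ID cloudsearch2 are example names that may differ in your installation:

    configure cloudsearch_m cloudsearch2
    start cloudsearch2
    list

The list command shows the state of all runnables, so you can verify that the new instance started.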

Note: Using two CloudSearch instances in your system only allows distributing the data and load caused by different tenants across different instances; it does not allow spreading the query load of a single tenant across multiple instances, nor does it offer high availability. When a CloudSearch instance becomes unavailable, the tenants whose data resided on that instance will not be able to use any components that require CloudSearch access (in particular the business server and the portal) until the issue is resolved.

In summary, this variant provides data load distribution for multiple tenants across several instances, but it provides neither high availability nor query load distribution.

Two CloudSearch instances assigned to two different datacenters

In order to make CloudSearch highly available, there needs to be more than one CloudSearch instance in the system, and those instances need to have different values for the "zookeeper.application.instance.datacenter" parameter. In general, CloudSearch will allocate the data of each tenant to exactly one instance per datacenter (i.e., you will have as many identical copies of each tenant's index data in the system as you have different values for the datacenter parameter). If, for example, you have two CloudSearch instances and declare one to be in datacenter A and the other to be in datacenter B, as shown in Figure 2, both instances will hold the full index data of all tenants in the system.
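The parameter is set per instance. A hedged sketch using ACC, assuming two existing instances with the example IDs cloudsearch1 and cloudsearch2; the datacenter values dc-a and dc-b are likewise only examples, as what matters is that the values differ:

    stop cloudsearch1
    reconfigure cloudsearch1 zookeeper.application.instance.datacenter="dc-a"
    start cloudsearch1
    stop cloudsearch2
    reconfigure cloudsearch2 zookeeper.application.instance.datacenter="dc-b"
    start cloudsearch2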

This redundancy allows the system to continue working if one CloudSearch instance (or the entire failure scope in which it is running, i.e., the node, host, rack, or datacenter) fails; in other words, it offers high availability. At the same time, as long as both instances are available, the query load (but not the update load) of individual tenants can also be distributed across the two instances.

A disadvantage of this variant, as is usual with high availability, is that it comes at the price of increased resource usage: we need double the amount of CPU and memory for the additional CloudSearch instances, as each CloudSearch instance again has to hold the full amount of data of all tenants. Remember that a key element of CloudSearch query performance is its ability to hold the data in memory.
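Because query performance depends on holding the index data in memory, an instance that has to hold the data of all tenants may need a larger heap. A hedged sketch, assuming ACC's JAVA-Xmx runnable parameter and the example instance ID cloudsearch1; the value shown is illustrative, not a sizing recommendation:

    stop cloudsearch1
    reconfigure cloudsearch1 JAVA-Xmx="4g"
    start cloudsearch1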

In summary, this variant provides high availability for the CloudSearch component and query load distribution, but no data load distribution.

Four CloudSearch instances assigned to two different datacenters

To combine the advantages of clustering variants 1 and 2, one simply needs to have both: instances in different datacenters, and more than one instance per datacenter. Figure 3 shows a minimal configuration of this variant, with two instances in datacenter A and another two in datacenter B.

Observe how the index data of each tenant is found twice in the system, once in each datacenter: for example, the data of the tenant "sag" is found on CloudSearch instance 1 in datacenter A and on instance 4 in datacenter B. This redundancy again offers high availability, in that the system can tolerate the outage of an entire failure scope, as all data will still be available in the surviving failure scope.

Note that while in this configuration the system can tolerate the outage of the two CloudSearch instances that are in the same failure scope, it cannot tolerate the outage of any two instances! If, for example, CloudSearch instances 1 and 2 fail, no surviving instance is allocated to handle the data of the default tenant.

So while the guaranteed maximum number of failing instances that the system can survive has not increased compared to variant 2, we now distribute the query load of individual tenants across two instances (one instance in each datacenter) and the total amount of data (i.e., of all tenants) across two instances (in the same datacenter). As with variant 2 before, variant 3 comes at the price of increased resource usage, as we need double the amount of CPU and memory for the additional CloudSearch instances.
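Putting it together, a hedged ACC sketch of the four-instance layout from Figure 3; the app type cloudsearch_m, the instance IDs cloudsearch1 through cloudsearch4, and the datacenter values dc-a and dc-b are example names only:

    configure cloudsearch_m cloudsearch1 zookeeper.application.instance.datacenter="dc-a"
    configure cloudsearch_m cloudsearch2 zookeeper.application.instance.datacenter="dc-a"
    configure cloudsearch_m cloudsearch3 zookeeper.application.instance.datacenter="dc-b"
    configure cloudsearch_m cloudsearch4 zookeeper.application.instance.datacenter="dc-b"
    startall

With this assignment, CloudSearch can place each tenant's data on one instance per datacenter, combining the redundancy of variant 2 with the data load distribution of variant 1.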