Author: Vaidyanathan, Praveen
Supported Versions: 9.12 to 10.4
Introduction
API Gateway uses Elasticsearch as its primary data store for persisting different types of data like APIs, Policies, Applications etc apart from runtime events and metrics. This document is intended to provide some basic guidelines for configuring and managing Elasticsearch. At different points in the document, we will provide the configurations/tunings details vis-a-vis API Gateway. Please note, though the information provided in this document would enable to you get started with the basic configurations, it is important that you refer to the official Elasticsearch documentation for completeness.
Elasticsearch Basics
Document
A basic unit of information which can be stored and retrieved. A document is managed in JSON format. An API instance basic information can be visualized as a document.
Type
Type is a collection of documents which shares common attributes."API: type is a type which defines common attributes of APIs.
Index
Index is a collection of documents which are grouped as types. Till 5.6.4 version of Elasticsearch. From 6.X each index can have a single type.
Shard
Index can be divided as one or more shards. Shard is nothing but a data partition. This is needed as the index ( no. of documents) might not fit in a single node. So its partitioned and stores across nodes in a cluster and helps to scale horizontally. Data partition also speeds up the search performance can search can be done parallelly across shards of an index. Each document will be part of one primary shard and one or more replicas.
Replica
Replica is a duplicate of a shard. Replicas help to recover the data on shard loss or machine crash. Basically it improves the data availability.This helps to improve the concurrent reads and hence to scale out the searches.
Nodes
Node is an instance of Elasticsearch process. A collection of Nodes is called a cluster. There are many Nodes but the 2 most important types are:
Master Node
There will be only one master node in the cluster. The master node is responsible for creating or deleting an index, tracking which nodes are part of the cluster, and deciding which shards to allocate to which node. One master will be elected from all possible master eligible nodes (node.master=true) and all nods are master eligible by default.
Data Node
Data nodes hold data and perform data related operations such as CRUD, search, and aggregations. By default all nodes are data nodes (node.default=true).
Read / Write Semantics
For a write (update, delete), the request will be routed to primary shared. The primary validates the data, writes the changes and replicate the same to other shards. Once all replication is done successfully, the ack is sent back to caller.
For a read, the request is sent to all nodes which could serve the requests in "scatter" phase. the result is consolidated in "gather" phase and sent back to caller. if there are more replica, the request will be sent by round robin.
Refer read/write model for more details.
API Gateway Data Types
API Gateway data can be broadly classified into 4 types.
1. Core Configurations
This includes APIs, Applications, Policies, Plans, Packages, Administration Settings, Security Configurations (Keystores/Trustores) & Tokens (OAuth/API Keys). This data, by default, is stored in Elasticsearch embedded in API Gateway (as InternalDataStore).
2. Runtime Transactions
This includes the runtime transactions events and metrics data. By default API Gateway stores this data in Elasticsearch (InternalDataStore) but this can be modified so that API Gateway can emit this data to other destinations (like external Elasticsearch, RDBMS, DES etc).
3. Application Logs (from 10.3)
API Providers can configure the application logs to be stored in Elasticsearch (InternalDataStore) but this can be modified. Refer to the product documentation for more details.
4. Audit Logs (from 10.2)
API Gateway, by default, stores the audit logs in Elasticsearch (InternalDataStore) but this can be modified. Refer to the product documentation for more details.
Separation of Data
It is possible to separate the Core Configuration store from the other data store (such as Runtime transactions, Application Logs, Audit Logs). And this is recommended in the cases when there is a large volume of runtime transactions data and you want to scale the Elasticsearch used for runtime transactions independent of the default store. This is possible because API Gateway provides options for the API providers to configure external destinations to which API Gateway can emit the data such as Runtime transactions, Application Logs, Audit Logs. Refer to the product documentation to configure the external destination store.
Elasticsearch Configurations
Using External Elastic Search
API Gateway installation bundles a default Elasticsearch (IntenalDataStore) and all the data by default is written to this Elasticsearch. But the user can configure the API Gateway to use any external Elasticsearch.
Following table shows the Elasticsearch support matrix for API Gateway.
API Gateway
|
Elasticsearch
|
Comments
|
---|---|---|
9.12 | 2.3.2 | |
10.0 |
2.3.2 | |
10.1 | 2.3.2 | |
10.2 | >=2.3.2 and <= 5.6.4 | InternalDataStore is 5.6.4 but API Gateway can work with versions below 5.6.4 |
10.3 | >=2.3.2 and <= 5.6.4 | InternalDataStore is 5.6.4 but API Gateway can work with versions below 5.6.4 |
For 10.1, please refer to this article.
For 10.2 & 10.3, please refer to product documentation, API Gateway Configuration Guide or Configuring External Elasticsearch for API Gateway 10.2.
Elasticsearch HTTP Client Configurations
API Gateway provides options for you to control the way that API Gateway will talk to the Elasticsearch. Refer to product document, API Gateway Configuration Guide, to get more details on the same.
Elasticsearch server Configurations
API Gateway defaults to the Elasticsearch default configurations for its InternalDataStore. The default Elasticsearch configurations are generally good for starters. Following are the default configurations that you can find in InternalDataStore\config\elasticsearch.yml
Default Configuration
cluster.name: SAG_EventDataStore
node.name: XXXXX1540445984673
path.logs: C:\APIGW\InternalDataStore/logs
network.host: 0.0.0.0
http.port: 9240
discovery.zen.ping.unicast.hosts: ["localhost:9340"]
transport.tcp.port: 9340
path.repo: ['C:\APIGW\InternalDataStore/archives']
discovery.zen.minimum_master_nodes: 1
- For a production setup, you may have to change the defaults especially the cluster, networking, logging and data configurations. This document does not give you all the details on the configurations. You can refer to https://www.elastic.co/guide/en/elasticsearch/reference/current/settings.html.
- It is important that you run through the configurations mentioned at https://www.elastic.co/guide/en/elasticsearch/reference/current/important-settings.html.
- Detailed configurations can be looked up at https://www.elastic.co/guide/en/elasticsearch/reference/current/modules.html
Data location
Elasticsearch provides an option for you to configure the locations where you would want your data and logs to be stored. You should make sure that you specify the locations that are accessible and will have enough disk space. It is also important to monitor and ensure basic housekeeping of the data location by plan an effective data retention strategy. Following configuration lets you change the defaults - https://www.elastic.co/guide/en/elasticsearch/reference/current/path-settings.html
Data location
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
Network
Elasticsearch, by default, binds to the loopback address. It is important to change that for production deployment. https://www.elastic.co/guide/en/elasticsearch/reference/current/network.host.html
Sizing Strategy
The sizing strategy for Elasticsearch will depend on the volume of data that you plan to store. Generally, the Core Configurations (explained in the Data Types section above) does not have scope to grow in runtime (except for consumer applications & security tokens) and hence your initial sizing should depend on the number of APIs, configured policies, estimated number of applications/subscriptions & estimated number of security tokens.
The Runtime Transactions data has the potential to grow big. Runtime transactions and metrics are stored when Logging and Monitoring policies are enabled. You can base your sizing strategy based on the following parameters
- Transactions per second
- Average size of the payload (request + response)
- Data Retention strategy (purging interval)
The size of Application logs and Audit logs purely depend on your log configurations. You have to make a best judgment here.
The disk space allocation for all the above will also depend on your sharding and replication strategy (see below) as enabling these will distribute your data across different Elasticsearch nodes.
Sharding & Replication Strategy
An Elasticsearch index may store a large amount of data. When dealing with a growing data set, it is important to plan your sharding and replication strategy to meet your demands of high availability and performance throughput. Read the basics of sharding and replication at Shards & Replicas.
The sharding strategy should primarily depend on the growth of your data set and performance throughput. Lesser the number of shards the more will be the size of each shard and more computation resources needed on nodes that are holding the shards. On the other side, more the number of shards, there may be an overhead on the performance as the query results from all the shards will have to be consolidated. Hence there has to be a balance between the number of shards per index. Additionally, this is a quote from https://www.elastic.co/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster,
"The number of shards you can hold on a node will be proportional to the amount of heap you have available, but there is no fixed limit enforced by Elasticsearch. A good rule-of-thumb is to ensure you keep the number of shards per node below 20 to 25 per GB heap it has configured. A node with a 30GB heap should therefore have a maximum of 600-750 shards, but the further below this limit you can keep it the better. This will generally help the cluster stay in good health"
Since the shard size will have an impact on reallocation (in case of failover) and reindex (if needed), the general recommendation is to keep the shard size between 30-50 GB. So if you believe that your index might grow up to 600 GB of data, then you can define the number of shards as follows, assuming there are 3 Elasticsearch nodes with each of these nodes will run with 4 GB heap and 2 replicas per shard.
- Number of shards (30 GB/shard) - 20 primary (with replicas it is 60)
- No of shards per GB - 5 (max recommended 20-25)
- No of shards per node - 20 (max recommended for 4 GB node is 80-100)
- No of Elasticsearch nodes - 3
As your data grows, you can keep adding more nodes and Elasticsearch will do the rebalancing of the shards across the new nodes keeping the number of total shards constant.
For replication, the general recommendation for a production cluster is to have 2 replicas per shard for handling failover.
API Gateway ships 7 indices by default. They are
- gateway_default (store for Core Configurations)
- gateway_default_analytics (store for runtime transactions)
- gateway_default_dashboard
- gateway_default_license (store for license details)
- gateway_default_audit (store for audit logs)
- gateway_default_cache (store for cache statistics)
- gateway_default_log (store for application logs)
When it comes to Sharding and replication of the above indices, API Gateway leaves it to Elasticsearch default (which is 5 shards per index & 1 replica for a shard). You can specify the number of shards per index level. So based on your estimate on the index(es) dealing with a growing data set, you plan the number of shards for that index(es). So let's say you estimate large volumes of runtime transactions, then you increase the number of shards for that index appropriately. For others, you can go with the default value (5 shards per index). For replication, it is better to stick to the general recommendation of 2 replicas per shard irrespective of the index.
Heap and Memory Configurations
For JVM configurations, you can do these changes in config/jvm.options file. Take a look at https://www.elastic.co/guide/en/elasticsearch/reference/current/jvm-options.html.
The default heap configuration for Elasticsearch is 1 GB. More often than not, in a production setup, this need to be changed. More inputs on Heap and main memory configurations can be obtained from https://www.elastic.co/guide/en/elasticsearch/guide/current/heap-sizing.html and https://www.elastic.co/guide/en/elasticsearch/guide/current/hardware.html
Cluster Strategy & Configurations
Recommendations
- A stable master is important for the stability of the cluster. If your cluster really grows( > 8 nodes) or seeing stability issues, its recommended to host dedicated master nodes. i.e node.master=true , node.data=false , node.ingest=false
- To avoid a split-brain issue, it's recommended to use "quorum" for selecting a master. quorum = (No. master eligible nodes / 2) + 1. At least 3 nodes should be participating in the selection process to avoid split-brain. i.e discovery.zen.minimum_master_nodes=2
- By default "write" decision use "quorum". quorum = ((primary + replica) / 2) + 1. By default, each index will have 5 primary shards and 1 replica. The best that you could have is n-1 replica where n is the number of nodes. This provides protection in any worst case circumstances. But more replica means more work the data is sent to all replica but in parallel. More replica is bad for write heavy applications. But this also speeds up concurrent reads as replicas share the load. Less replica is bad for read heavy applications. 2 might be a good value up to 5 nodes to start with. Replicas can be changed at any time.
Securing Elasticsearch
Refer to Securing Elasticsearch for API Gateway 10.2 for more information on securing Elasticsearch.
Data Management
You can use API Gateway's Backup utilities to periodically back up the data in API Gateway. You may either adopt a similar strategy for all types of data or plan different strategies for different data. For example, you can adopt data backup strategy for your core configurations whereas you can adopt archive and purge for your transactions, audit and application logs based on your data retention period.
Backup and restore
When you perform a backup, API Gateway does only an incremental backup of your data. API Gateway backup and restore command-line utility will help you to take complete or partial backup and restore of API Gateway. You can refer to the Back up and restore of API Gateway assets and Periodical Data backup for more details.
Every time you back up your data, Elasticsearch creates a new snapshot that is incremental to the previous snapshot. In case you perform a daily backup, it is also important that you have a strategy on how many snapshots that you would want to maintain since in most of the practical cases it is sufficient that you maintain the backups for last few days. In such cases, it is a good practice to clean up the older snapshots. Use the following command to perform the cleanup.
cd <apigatewayInstallationLocation>\IntegrationServer\instances\default\bin
To List all the older backup
apigatewayUtil<.bat|.sh> list backup
All the created backups will be list in chronological sequence with the oldest backup listed first.
Example:-
./apigatewayUtil.sh list backupBackups available in default are
default-2019-may-27-15-18-3-618000000
default-2019-may-27-15-18-38-752000000
default-2019-may-27-15-18-48-435000000
default-2019-may-27-15-21-43-506000000
To delete a backup
apigatewayUtil<.bat|sh> delete backup -name <backupName>
Example:
./apigatewayUtil.sh delete backup -name default-2019-may-27-15-18-3-618000000
Archive and Purge
It is important to make sure you have an estimate of the data that you might store depending on your data retention strategy to help manage the data. Typically after your planned data retention period, you will have 2 options
- Purge old data
- Archive and purge the data
API Gateway provides option to archive the transactions, audit or applications logs data from the UI. For automation, you can use the Administration REST APIs (APIGatewayAdministration.json).
Dynamic configuration updates
Elasticsearch APIs can be used to update most of the configurations dynamically. You use the Cluster settings API to perform the configuration changes. Please refer to https://www.elastic.co/guide/en/elasticsearch/guide/current/_changing_settings_dynamically.html and https://www.elastic.co/guide/en/elasticsearch/reference/2.4/cluster-update-settings.html for more details.
You can find the list of all configurations that can be dynamically updated at https://www.elastic.co/guide/en/elasticsearch/reference/2.4/modules.html#modules
Monitoring
For Elasticsearch monitoring, no plugins are shipped along with API Gateway but there are a lot of good plugins available in the web. X-pack is a commercial monitoring solution from the same company which created Elasticsearch. The other good open-source alternatives are
Across Data Centre deployment
Elastic search does not encourage the deployment of ES across datacentres but has proposed few alternatives for the same.
Refer https://www.elastic.co/blog/clustering_across_multiple_data_centers.