Cumulocity IoT Edge: Fault Tolerance and Data Resilience vs High Availability (HA)

Nick.Ponomar · June 28, 2023, 4:11pm

Introduction

In the realm of computing and information technology, ensuring the continuous and uninterrupted operation of critical systems is of utmost importance. Two approaches play a pivotal role in achieving this objective: fault tolerance and data resilience on the one hand, and high availability (HA) on the other. While both aim to minimize downtime and maintain system reliability, they differ in their approaches and priorities. In this article, we explore how fault tolerance and data resilience differ from high availability, and why striking the right balance between them is crucial for building robust and dependable systems.

Terms and definitions

Fault tolerance refers to a system’s ability to continue functioning, even in the presence of hardware or software failures. It involves designing systems with redundancy and built-in mechanisms to detect and recover from faults. The primary goal of fault tolerance is to ensure that failures or faults do not result in data loss.
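To make the "detect and recover" idea concrete, here is a minimal Python sketch of one common fault-tolerance pattern, store-and-forward buffering. It is not Cumulocity IoT Edge code: the `send` callable is a hypothetical stand-in for any uplink that raises an exception on failure, and the buffer lives only in memory, whereas a real system would persist it to disk so that it also survives a restart.

```python
from collections import deque
from typing import Callable, Deque


class StoreAndForward:
    """Buffers readings locally so that a failed upload does not lose data.

    `send` is a hypothetical uplink callable that raises on failure.
    """

    def __init__(self, send: Callable[[dict], None]) -> None:
        self._send = send
        self._pending: Deque[dict] = deque()

    def submit(self, reading: dict) -> None:
        # Enqueue first, so the reading survives even if the send fails.
        self._pending.append(reading)
        self.flush()

    def flush(self) -> int:
        """Deliver buffered readings in order; return how many were sent."""
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except Exception:
                # Fault detected: keep the reading and retry on the next flush.
                break
            self._pending.popleft()
            sent += 1
        return sent

    @property
    def pending(self) -> int:
        """Number of readings still waiting to be delivered."""
        return len(self._pending)
```

Readings submitted while the uplink is down simply accumulate; the next successful `flush()` delivers them in their original order.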

Data resilience focuses on preserving data integrity and availability, particularly in the face of unexpected events or disasters. It involves implementing strategies and technologies to protect data from loss, corruption, or unauthorized access. Data resilience encompasses measures like data backups, replication, encryption, and disaster recovery planning. The primary objective is to ensure that critical data can be restored and accessed promptly, even in the event of a catastrophic failure.
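Backups only help if they can be trusted at restore time. The following Python sketch, assuming plain file-based backups with hypothetical paths, stores a SHA-256 checksum next to each copy and refuses to restore a copy whose checksum no longer matches, which catches silent corruption. It illustrates the general idea only; replication, encryption, and off-site storage are out of scope.

```python
import hashlib
import shutil
from pathlib import Path


def _sha256(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup(source: Path, backup_dir: Path) -> Path:
    """Copy `source` into `backup_dir` and record its checksum next to it."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)
    (backup_dir / (source.name + ".sha256")).write_text(_sha256(target))
    return target


def restore(backup_file: Path, destination: Path) -> None:
    """Restore a backup only if it still matches its recorded checksum."""
    checksum_file = backup_file.parent / (backup_file.name + ".sha256")
    expected = checksum_file.read_text().strip()
    if _sha256(backup_file) != expected:
        raise ValueError(f"backup {backup_file} is corrupted")
    shutil.copy2(backup_file, destination)
```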


High availability (HA) takes a slightly different approach. It emphasizes maximizing system uptime and minimizing downtime, with a strong focus on continuous service availability. HA systems are designed to provide seamless and uninterrupted access to services, even when individual components or subsystems fail. They typically employ redundancy, load balancing, and automatic failover mechanisms to ensure service continuity. The goal is to minimize service disruptions, maintain consistent performance, and meet the demands of users or customers who require constant access to the system.
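The combination of redundancy, load balancing, and automatic failover described above can be sketched on the client side as follows. This is a minimal, hypothetical Python example in which each replica is represented by a callable that returns a response or raises an exception; requests are spread round-robin across replicas, and a failing replica is skipped in favor of the next one.

```python
import itertools
from typing import Callable, Generic, List, Sequence, TypeVar

T = TypeVar("T")


class FailoverBalancer(Generic[T]):
    """Round-robin load balancing with automatic failover (illustrative only)."""

    def __init__(self, replicas: Sequence[Callable[[], T]]) -> None:
        if not replicas:
            raise ValueError("at least one replica is required")
        self._replicas = list(replicas)
        # Rotates the starting replica so load is spread across all of them.
        self._next_start = itertools.cycle(range(len(self._replicas)))

    def call(self) -> T:
        start = next(self._next_start)
        n = len(self._replicas)
        errors: List[Exception] = []
        for offset in range(n):
            replica = self._replicas[(start + offset) % n]
            try:
                return replica()
            except Exception as exc:
                # Replica considered down: fail over to the next one.
                errors.append(exc)
        raise RuntimeError(f"all {n} replicas failed: {errors!r}")
```

Production HA setups usually add health checks and take unhealthy nodes out of rotation instead of retrying them on every request; this sketch omits that for brevity.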


So how does Cumulocity IoT Edge address these concepts?

Cumulocity IoT Edge is a single-node distribution of Cumulocity IoT that includes the majority of Cumulocity IoT Platform features, enabling organizations to process and analyze data closer to its source.
While it does not strive to deliver high availability, it ensures fault tolerance and data resilience.
It is of the utmost importance to configure database backups (1), or even to set up your own MongoDB database (2), an option available with the Cumulocity IoT Edge offering on Kubernetes starting with version 10.17.
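As an illustration of scheduled backups, the Python sketch below wraps the standard MongoDB `mongodump` tool (using its `--uri`, `--archive`, and `--gzip` options) and keeps only the most recent archives. It is a generic, assumption-based example, not the Cumulocity IoT Edge backup procedure referenced in (1); the connection URI, backup directory, file-naming scheme, and retention count are all hypothetical.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Optional


def dump_command(uri: str, backup_dir: Path,
                 now: Optional[datetime] = None) -> List[str]:
    """Build a mongodump call writing a timestamped, gzipped archive."""
    now = now or datetime.now(timezone.utc)
    # UTC timestamps in this format sort lexicographically by time.
    archive = backup_dir / f"mongo-{now:%Y%m%dT%H%M%SZ}.archive.gz"
    return ["mongodump", f"--uri={uri}", f"--archive={archive}", "--gzip"]


def prune(backup_dir: Path, keep: int) -> List[Path]:
    """Delete all but the `keep` newest archives; return the deleted paths."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    archives = sorted(backup_dir.glob("mongo-*.archive.gz"))
    doomed = archives[:-keep]
    for path in doomed:
        path.unlink()
    return doomed


def run_backup(uri: str, backup_dir: Path, keep: int = 7) -> None:
    """Take a backup, then prune old archives only if the dump succeeded."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(dump_command(uri, backup_dir), check=True)
    prune(backup_dir, keep)
```

`run_backup` could be invoked from a scheduler such as cron; `dump_command` and `prune` are kept separate so they can be tested without a running database.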

Even though achieving HA is not an objective of Cumulocity IoT Edge, there are options for scaling parts of the solution independently of the Edge server itself. For example, if your devices rely on a specific connectivity protocol, it would be possible to scale just the connector while keeping a single-node server.

If you are interested in investigating these topics for your use case, please send me a private message or just drop me an email at nick.ponomar@softwareag.com.