Best practices and patterns for using non-IoT data in Cumulocity

system · May 14, 2019, 11:31am

Introduction

Cumulocity has a very flexible and extensible data model that enables you to represent and manage almost any IoT device and its data. While building solutions with Cumulocity, you will encounter situations in which you need to store additional data that is part of your solution but that is not IoT- or device-related data. In this article, we explore the options for such data based on our experience in building IoT solutions:

- Tenant options for configuration data.
- Files for static data.
- Events for time-series data.
- Inventory for master data.
- Binaries for larger data chunks.
- External storage for large amounts of non-IoT data.

Configuration data

If the additional data is configuration data that your users may want to modify, you can use an “option”. An option is simply a value that can be quickly retrieved using a category and a key. The following request retrieves the option “enabled” in the category “two-factor-authentication”:

GET /tenant/options/two-factor-authentication/enabled

{
"category": "two-factor-authentication",
"key": "enabled",
"value": "true"
}

Categories, keys, and values are not limited to the ones that Cumulocity itself uses, so you can add your own user-editable configuration parameters here.

POST /tenant/options
{
"category": "mycategory",
"key": "mykey",
"value": "myvalue"
}
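
As a rough sketch of how a web application might read such an option, the helpers below build the request path from a category and key and interpret the value as a boolean (in the example response above, the value is the string "true"). The function names, the use of `fetch`, and the assumption of an already authenticated browser session are illustrative and not part of the article.

```javascript
// Build the REST path for a tenant option (same layout as the GET example above).
function optionPath(category, key) {
  return '/tenant/options/' + encodeURIComponent(category) + '/' + encodeURIComponent(key);
}

// The example response carries the value as a string ("true"), so convert explicitly.
function isOptionEnabled(option) {
  return String(option.value).toLowerCase() === 'true';
}

// Hypothetical loader: assumes the browser session is already authenticated.
async function loadOption(category, key) {
  const response = await fetch(optionPath(category, key));
  if (!response.ok) {
    throw new Error('Option request failed with status ' + response.status);
  }
  return response.json();
}
```

A caller could then write something like `isOptionEnabled(await loadOption('mycategory', 'mykey'))`.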

Static, non-confidential data

If the additional data is mainly static and not confidential, a very low-overhead way to store it is simply as files in an application.

Let’s say that your devices send numeric error codes in certain situations. You would like to create an error database that translates the error codes into nicely readable text, an error category, and perhaps some documentation. For each type of device, you create a file “<type>.json”. These files are not user-editable and you only add new information when you manufacture new types or versions of devices.

For example, “mymachine.json” contains:

{
"7401": { 
"category": "System",
"severity": "Major",
"summary": "Mains failure",
"details": "This error occurs when there is no AC power supply on site. To check ..." 
},
"7403": {
"category": "System",
"severity": "Critical",
"summary": "Low battery voltage",
"details": "This error occurs when the battery voltage reaches 46V. To check ..."
},
"7410": {
"category": "System",
"severity": "Minor",
"summary": "Door open",
"details": "The door sensor was triggered by an unauthorised person." 
},
…
}

Zip these files into “mymachines.zip” and simply upload it to Cumulocity using “Add application” in the “Own applications” menu of the administration application. Now, when you want to show explanations for errors in your web application, use, for example:

$.get('/apps/mymachines/mymachine.json', function(errors) { ... })

or

import errors from '/apps/mymachines/mymachine.json';

to simply load your error database for “mymachine”.
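
Once the file is loaded, turning a numeric error code into readable text is a plain object lookup. The following is a minimal sketch; the function name `describeError`, the output format, and the fallback text for unknown codes are assumptions made for illustration.

```javascript
// Look up an error code in a loaded error database and format it for display.
// `errors` has the shape of "mymachine.json" above; unknown codes get a fallback text.
function describeError(errors, code) {
  const key = String(code);
  if (!Object.prototype.hasOwnProperty.call(errors, key)) {
    return 'Unknown error ' + key;
  }
  const entry = errors[key];
  return '[' + entry.severity + '] ' + entry.summary + ' (' + entry.category + ')';
}

// Sample data taken from the file above, with the details text shortened.
const sampleErrors = {
  '7401': { category: 'System', severity: 'Major', summary: 'Mains failure', details: '...' },
  '7410': { category: 'System', severity: 'Minor', summary: 'Door open', details: '...' }
};

console.log(describeError(sampleErrors, 7401)); // [Major] Mains failure (System)
console.log(describeError(sampleErrors, 9999)); // Unknown error 9999
```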

Data with timestamps

If your additional data has a timestamp, you can use events. Events have a type, a text that is displayed in the user interface, a source managed object, and any additional data that you may want to store. Events do not have to be produced by devices.

For example, assume that you want to store a log of notes or chat messages for your devices:

POST /event/events

{
"source": { "id":"17815" },
"type": "c8y_ChatEvent",
"text": "Can you have a look, please? It does not seem to connect.",
"time": "2019-04-18T12:03:27.845Z"
}

Such chat messages can be queried by type and time period using the REST APIs. They can even be shown in the user interface.

Add your own data as fragments and query them by the fragment name. For example, assume that you want your chat messages to be sent between two users, “from” and “to”:

POST /event/events

{
"source": { "id":"17815" },
"type": "c8y_ChatEvent",
"text": "Can you have a look, please? It does not seem to connect.",
"time": "2019-04-18T12:03:27.845Z",
"from_Harry": {},
"to_Sally": {}
}
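
A request body like the one above could be assembled with a small helper. This is only a sketch: the function name `makeChatEvent` and its parameters are hypothetical, and the `from_`/`to_` fragment naming simply follows the example.

```javascript
// Build a chat event body with empty "from_<user>" and "to_<user>" marker fragments.
function makeChatEvent(deviceId, text, from, to, time) {
  const event = {
    source: { id: String(deviceId) },
    type: 'c8y_ChatEvent',
    text: text,
    // Use the given Date, or the current time if none is passed.
    time: (time || new Date()).toISOString()
  };
  event['from_' + from] = {};
  event['to_' + to] = {};
  return event;
}
```

The returned object can be serialized with `JSON.stringify` and sent as the body of `POST /event/events`.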

You want to retrieve all recent messages from Harry for this device:

GET /event/events?source=17815&dateFrom=2019-05-01&dateTo=2019-05-15&fragmentType=from_Harry

Please note the use of the “source” (= device) and date parameters. We recommend always using at least one of these parameters, as queries without them may become slow for large data sets.
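
One way to follow this recommendation in client code is to build event queries through a helper that refuses unrestricted requests. The sketch below assumes a hypothetical function `buildEventQuery`; the parameter names are those of the request above, plus `type`, which is assumed to be the name of the type filter.

```javascript
// Build an event query path and insist on a source or date restriction.
function buildEventQuery(filter) {
  if (!filter.source && !filter.dateFrom && !filter.dateTo) {
    throw new Error('Restrict the query by source or by date');
  }
  const params = new URLSearchParams();
  for (const name of ['source', 'dateFrom', 'dateTo', 'fragmentType', 'type']) {
    if (filter[name]) {
      params.append(name, filter[name]);
    }
  }
  return '/event/events?' + params.toString();
}

console.log(buildEventQuery({
  source: '17815',
  dateFrom: '2019-05-01',
  dateTo: '2019-05-15',
  fragmentType: 'from_Harry'
}));
```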

Master data

Querying master data

Often, Cumulocity developers store data as “managed objects” in the Cumulocity inventory. The benefit of the inventory is that it has a very general structure and can be queried flexibly through its OData API.

However, the inventory is not a generic database for all kinds of mass data. It is mainly intended for devices and some amount of master data around devices. You can efficiently retrieve data, for example, by querying for:

- The ID of a managed object or its external IDs.
- Type or fragment type.
- Creation time.
- Name.
- Any text inside the managed object through full text search (“?text=...”).

Other parameters queried through the OData API (“?query=…”) may result in low performance since not all combinations of queries can be anticipated and indexed in Cumulocity’s underlying database system. However, many use cases can be addressed by the above query parameters.

Modeling master data

As a best practice, model any additional data in a document-oriented fashion. For example, assume that our error database contains confidential information that you do not want to expose to non-authenticated users, so you would rather store it in the inventory than as a file. If you followed a relational database design, you would create one managed object per error, perhaps like this:

POST /inventory/managedObjects

{
"name": "Mains failure",
"my_Error": { 
"category": "System",
"severity": "Major",
"details": "This error occurs when there is no AC power supply on site. To check ..." 
}
}

Location: https://.../inventory/managedObjects/17897

You might then refer to the error through an external ID that combines machine type and error code, like this:

POST /identity/globalIds/17897/externalIds

{
"externalId": "mymachine_7401",
"type": "my_ErrorCode"
}

This is a technically viable approach, but it pollutes your inventory with large amounts of very granular data. A more efficient approach is to use the power of document orientation. You can structure documents arbitrarily and store the data in larger units, for example, by putting all error codes into one document, or all error codes of a particular machine type into one document, just as we stored them in the file above:

POST /inventory/managedObjects

{
"type": "errorcodes_mymachine",
"7401": { 
"category": "System",
"severity": "Major",
"summary": "Mains failure",
"details": "This error occurs when there is no AC power supply on site. To check ..." 
},
"7403": {
"category": "System",
"severity": "Critical",
"summary": "Low battery voltage",
"details": "This error occurs when the battery voltage reaches 46V. To check ..."
},
"7410": {
"category": "System",
"severity": "Minor",
"summary": "Door open",
"details": "The door sensor was triggered by an unauthorised person." 
},
…
}
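
If the error data already exists as one record per error, it can be folded into a single document of this shape before it is posted. The following is a minimal sketch; the function name `toErrorDocument` and the flat input format with a `code` field are assumptions for illustration.

```javascript
// Combine granular per-error records into one document per machine type.
// Each record is expected to carry a `code` plus the descriptive fields.
function toErrorDocument(machineType, records) {
  const doc = { type: 'errorcodes_' + machineType };
  for (const record of records) {
    const { code, ...fields } = record;
    doc[String(code)] = fields;
  }
  return doc;
}

// Two records based on the example above, with the details text shortened.
const errorDocument = toErrorDocument('mymachine', [
  { code: 7401, category: 'System', severity: 'Major', summary: 'Mains failure', details: '...' },
  { code: 7403, category: 'System', severity: 'Critical', summary: 'Low battery voltage', details: '...' }
]);
```

The result can be used as the body of a single `POST /inventory/managedObjects` request instead of one request per error.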

Additional efficiency aspects

Since devices are a central concept in Cumulocity applications, the applications frequently retrieve device data. For example, the device management application will load the data for the first 100 top-level devices when you click on “All devices”.

If you model very large data structures as part of the representation of a device, working with that device in applications or microservices may become slow. For example, if you model a one-megabyte JSON structure in a device, that megabyte will be downloaded whenever you click on the device in an application.
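
To notice such cases early, a microservice or script could estimate how large a device representation is once serialized. The sketch below is an illustration only; the function names and the 100 KB threshold are hypothetical and should be adapted to your own applications.

```javascript
// Estimate the serialized size of a managed object in bytes (UTF-8 encoded JSON).
function jsonSizeInBytes(obj) {
  return new TextEncoder().encode(JSON.stringify(obj)).length;
}

// Hypothetical threshold; choose a limit that suits your own applications.
const MAX_DEVICE_BYTES = 100 * 1024;

// Report whether an object is larger than the chosen limit.
function isTooLargeForDevice(obj, limit = MAX_DEVICE_BYTES) {
  return jsonSizeInBytes(obj) > limit;
}
```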

In the case of complex machinery, it is often better to model the subsystems of the machinery in Cumulocity as several devices instead of putting all data into a single device. For example, if you have a wind turbine with hundreds or even thousands of parameters, you may want to model the PLC, generator, actuators, bearing controls and so forth separately as child devices of the main turbine device instead of putting all data into a single, very large device.

Large data chunks

If your additional data chunks are larger than a few kilobytes and you do not need to process the data in real time, you can store them as binary attachments. Binary attachments are supported for managed objects and for events; use events if the data has a time-series aspect or if you want it to be cleaned up automatically after a while. Attachments can be several megabytes in size (depending on your service provider) and are only loaded explicitly on demand. This means that they will not slow down applications or microservices that do not require the data. Examples of such data are configuration dumps or images.

Large, non-IoT data sets

Finally, if you have large sets of data that are unrelated to IoT or devices, you may not want to store them in Cumulocity at all. For example, assume that you have (or want to create) a CRM system or SIM management system. To show data from such systems in Cumulocity, you have two options:

Synchronization approach.
- Implement a microservice to map the external data to Cumulocity concepts and synchronize the data between Cumulocity and the other system regularly or on change.
- Show the data using Cumulocity widgets.
On-demand approach.
- Implement a microservice to query the data from the external system using the external system's REST or Web Services API.
- Show the data using a custom widget or tab.

The benefit of the second approach is that the data shown in Cumulocity is always up to date, and you have no additional storage cost from replicated data. As a trade-off, some Cumulocity features will not be available for such data (such as real-time processing with Apama). For example, Cumulocity uses the second approach for showing data from SIM management systems such as Cisco Jasper.

Summary

We have explored the diverse options for storing additional data in Cumulocity while building IoT solutions. These options range from very simple static storage to complex and large data models. We hope that you will find the best level of performance and functionality using the guidelines outlined above to pick your data model.