We are currently implementing an agent that listens for new pending operations via an MQTT / SmartREST 2.0 subscription. The operations are usually targeted at the child devices of the agent, so the agent registers itself as their parent in order to be notified of all child operations as well.
That works fine so far.
For scalability and availability reasons, we plan to run at least two instances of the agent, which means that the relation between an agent and its child devices is not static (for example, after a failover).
Now, what happens if a second agent registers itself as parent of a child device that already belongs to another agent?
I can see both agents are actually in the deviceParents array of the child, so this seems to be valid.
But what does that mean for the operation subscription? Will both parents receive the operation (and execute it)? Or just the first one, or the last one? Is that specified?
We obviously want to make sure each operation is executed only once, even if a failover has happened.
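To make the "executed only once" concern concrete, here is a minimal sketch of how an agent might acknowledge an operation before acting on it. It assumes the SmartREST 2.0 static templates 501 (set EXECUTING), 502 (set FAILED) and 503 (set SUCCESSFUL), the static operation messages 510 (c8y_Restart) and 511 (c8y_Command), and that status updates for a child go to `s/us/<childId>`. The `publish` and `execute` callbacks are hypothetical stand-ins for the MQTT client and the device-specific logic. Moving the operation to EXECUTING first does not by itself guarantee exactly-once execution, it only narrows the window.

```python
import csv
import io

# Static SmartREST 2.0 operation messages this sketch knows about (assumed subset).
OPERATION_FRAGMENTS = {"510": "c8y_Restart", "511": "c8y_Command"}


def parse_smartrest_line(line):
    """Split one SmartREST CSV line into (message_id, remaining_fields)."""
    row = next(csv.reader(io.StringIO(line)))
    return row[0], row[1:]


def status_update(template_id, fragment, *extra):
    """Build a status message such as '501,c8y_Restart' (CSV-escaped)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="").writerow([template_id, fragment, *extra])
    return buf.getvalue()


def handle_operation(line, publish, execute):
    """Acknowledge, run and report one operation received on s/ds.

    publish(topic, payload) and execute(device_id, fragment, params)
    are hypothetical callbacks supplied by the agent.
    """
    message_id, fields = parse_smartrest_line(line)
    fragment = OPERATION_FRAGMENTS.get(message_id)
    if fragment is None or not fields:
        return False  # not an operation this sketch understands
    device_id, params = fields[0], fields[1:]
    topic = f"s/us/{device_id}"  # assumption: child status updates go here
    publish(topic, status_update("501", fragment))  # PENDING -> EXECUTING
    try:
        execute(device_id, fragment, params)
    except Exception as exc:
        publish(topic, status_update("502", fragment, str(exc)))  # FAILED
        return False
    publish(topic, status_update("503", fragment))  # SUCCESSFUL
    return True
```

With real code, `publish` would wrap the MQTT client's publish call and `execute` would translate the operation into the device format.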
From an MQTT perspective, a child device can have only one agent that handles its operations, most likely the agent (identified by its client ID) that created the child device. So you cannot register a second agent (with a different client ID) as a parent of the child device. If you do this via REST, the messages will be sent only to the first agent; the second agent will not receive them.
What is your detailed plan for running two instances of that agent? Using the same client ID? Active-passive?
It will work if you use the same client ID. In my test, only the last client that subscribed to s/ds received all the operations. So active-active won't work, but active-passive could work if implemented correctly.
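As a rough illustration of what "implemented correctly" could mean for active-passive, the sketch below lets only the instance that currently holds a lease act as the active agent. Everything in it is an assumption for illustration: `LeaseStore` is an in-memory stand-in for whatever shared store the instances would really use, and `on_activate` / `on_deactivate` are hypothetical hooks where an instance would connect with the shared client ID and subscribe to s/ds, or disconnect again.

```python
import time


class LeaseStore:
    """In-memory stand-in for a shared lease (hypothetical; not a C8Y API)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._holder = None
        self._expires_at = 0.0

    def try_acquire(self, instance_id):
        """Take or renew the lease; return True if instance_id holds it now."""
        now = self._clock()
        free = self._holder is None or now >= self._expires_at
        if free or self._holder == instance_id:
            self._holder = instance_id
            self._expires_at = now + self._ttl
            return True
        return False


class AgentInstance:
    """One agent process; only the lease holder is active."""

    def __init__(self, instance_id, store, on_activate, on_deactivate):
        self.instance_id = instance_id
        self.active = False
        self._store = store
        self._on_activate = on_activate      # e.g. connect + subscribe to s/ds
        self._on_deactivate = on_deactivate  # e.g. disconnect from the broker

    def tick(self):
        """Call periodically, more often than the lease TTL."""
        has_lease = self._store.try_acquire(self.instance_id)
        if has_lease and not self.active:
            self.active = True
            self._on_activate()
        elif not has_lease and self.active:
            self.active = False
            self._on_deactivate()
        return self.active
```

If the active instance stops calling `tick()`, its lease expires and the next `tick()` of a passive instance promotes it. A real setup would need a lease store that is itself highly available.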
Thanks for the response. The core functionality of the agent is to:
receive MQTT messages from the devices (via load-balanced brokers) and generate a C8Y event for each
receive operations from C8Y, reformat them into the device format, and send an MQTT message to the device topic on the broker
An agent discovers child devices by seeing messages from the device: usually a connect message, but any message from the device will do. When a device comes online, the agent registers it as a child device in order to receive operations for it. The devices have already been created by a separate provisioning process before this point, so the agent only has to look them up.
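The discovery step described above can be sketched roughly as follows. The `lookup` and `register` callbacks are hypothetical: `lookup` would resolve a device's external ID to its existing ManagedObject ID, and `register` would add that object as a child of this agent. The class itself only makes sure each device is registered once per agent instance.

```python
class ChildDeviceRegistry:
    """Tracks which devices this agent instance has already registered."""

    def __init__(self, lookup, register):
        self._lookup = lookup        # hypothetical: external ID -> managed object ID or None
        self._register = register    # hypothetical: attach managed object to this agent
        self._known = {}

    def on_device_message(self, external_id):
        """Call for every device message; returns True on first registration."""
        if external_id in self._known:
            return False
        managed_object_id = self._lookup(external_id)
        if managed_object_id is None:
            return False  # not provisioned (yet); retried on the next message
        self._register(managed_object_id)
        self._known[external_id] = managed_object_id
        return True

    def forget(self, external_id):
        """Drop a device, e.g. after it moved to another agent."""
        self._known.pop(external_id, None)
```

Because any message triggers the check, a device that reconnects through a different broker is picked up by the new agent on its first message.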
Now, the setup aims to support up to 100,000 devices, and operations should be delivered in real time as long as the target device is online.
We therefore need a scalable and highly available setup of multiple agents that take care of the incoming messages. The relation between a device and its handling agent is usually sticky: the load balancer makes sure that messages from one device usually go to the same broker, but this is not guaranteed. There might be a rebalancing, or an agent might simply go down. This is the situation in which we need to make sure that we still receive an operation in (near) real time and that we execute it only once.
Right now each agent uses a different client ID and has its own ManagedObject. We therefore tried to implement a mechanism that changes the parent-device relationship whenever the visibility of a device changes, to make sure the device always belongs only to the correct parent (agent).
Unfortunately that is very tricky: we need to add the agent as the new parent of the device AND remove all other existing parents from it. That is at least two requests per swap, and we need to track which device belongs to which agent.
So I'm trying to find a simpler way to make sure operations are handled properly.
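To illustrate why a swap costs at least two requests, here is a small sketch that only plans the REST calls without sending them. It assumes the inventory child-device endpoints `POST /inventory/managedObjects/{parentId}/childDevices` and `DELETE /inventory/managedObjects/{parentId}/childDevices/{childId}`; the function name and the tuple layout are made up for this example.

```python
def plan_parent_swap(child_id, current_parent_ids, new_parent_id):
    """Return the (method, path, body) requests needed to make
    new_parent_id the only device parent of child_id."""
    requests = []
    if new_parent_id not in current_parent_ids:
        # Add the new agent first so the child is never without a parent.
        requests.append((
            "POST",
            f"/inventory/managedObjects/{new_parent_id}/childDevices",
            {"managedObject": {"id": child_id}},
        ))
    for parent_id in current_parent_ids:
        if parent_id != new_parent_id:
            requests.append((
                "DELETE",
                f"/inventory/managedObjects/{parent_id}/childDevices/{child_id}",
                None,
            ))
    return requests
```

For a child with one old parent this yields one POST and one DELETE. The two calls are not atomic, so for a short time the child has either two parents or, if the order is reversed, none.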