We have an Integration Server cluster and a UM cluster that use publishable documents and triggers to pass data between systems in a one-to-many pattern.
The issue we’re running into is that we’re unsure how to queue messages for the subscribing trigger when the destination system is down for planned maintenance. Currently the trigger pulls messages from the UM immediately, and disabling the trigger causes new messages not to be queued on the UM.
What is the best way to implement pub/sub with planned maintenance windows for subscriber systems?
Hi Daniel, as in so many cases the answer is “it depends”. Without knowing your system this is hard to say.
The most common way to suspend the flow is to suspend or even disable the trigger on the IS side. The messages then remain in the UM queue for later processing.
You need to understand what such a queue would mean for your sending systems. Would they run into timeouts and, in the worst case, have resubmission logic? In that case you would get duplicates of your input feeds, or worse, inside the UM queue.
You also need to know your queue and event parameters. UM supports a queue-wide and also an event-specific time to live (TTL). If that is set lower than the time your messages spend in the queue, you will lose transactions because the system will wipe them from the UM queue automatically.
You wrote: “disabling the trigger causes new messages to be not queued on the UM.” This sounds like such a TTL is configured.
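To illustrate the duplicate concern, here is a minimal sketch of filtering resubmitted messages when working through a backlog. It assumes every message carries a unique identifier set by the publisher; the "id" field and the drop_duplicates name are hypothetical and not part of IS or UM.

```python
def drop_duplicates(messages, key="id"):
    """Keep only the first message seen for each identifier.

    `key` is a hypothetical field name; use whatever uniquely
    identifies a business transaction in your documents.
    """
    seen = set()
    unique = []
    for message in messages:
        identifier = message[key]
        if identifier in seen:
            # A resubmission of something already in the backlog.
            continue
        seen.add(identifier)
        unique.append(message)
    return unique
```

This only covers duplicates inside one backlog. Duplicates that arrive after the backlog has been processed would need the identifiers to be stored somewhere persistent.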
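As a rough sanity check before a maintenance window, you can compare the configured TTL with the planned downtime. The sketch below is plain arithmetic, not a UM API call; the function name, the safety margin and the use of None for "no expiry" are assumptions made for illustration.

```python
from datetime import timedelta


def events_survive(ttl, window, margin=timedelta(0)):
    """Return True if an event queued at the start of the window
    is still in the queue when the window (plus margin) ends.

    ttl is None when no expiry is configured (an assumption of this
    sketch, not how UM represents it).
    """
    if ttl is None:
        return True
    return ttl > window + margin
```

For example, with a one-hour TTL and a four-hour maintenance window, events_survive(timedelta(hours=1), timedelta(hours=4)) is False, so events published early in the window would be wiped before the trigger is resumed.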
You need to be sure that your overall solution is “thread safe”. This means that even if you queue and later process the backlog, you need to know whether an execution sequence that differs from the order of arrival in your queue would be a problem for your data integrity. Unless your queues and triggers are configured to work single-threaded / serialized, it is possible that the execution order is not the sequence in which your incoming messages arrived.
So the best option is certainly also to ask your sending systems to suspend their feeds to you, so that you get no messages to queue.
I checked the TTL settings and reconfigured them, then disabled document retrieval, and the documents remain on the queue! Thank you for the advice.
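The ordering effect can be shown with a small simulation. This is a simplified model, not how IS schedules trigger threads: messages are handed out in arrival order to whichever worker is free first, and the durations are made-up processing times.

```python
import heapq


def completion_order(durations, workers):
    """Return message indices in the order they finish processing.

    durations[i] is the processing time of the i-th arrived message.
    """
    # Each heap entry is (time the worker becomes free, worker id).
    free_at = [(0, worker) for worker in range(workers)]
    heapq.heapify(free_at)
    finished = []
    for index, duration in enumerate(durations):
        start, worker = heapq.heappop(free_at)
        end = start + duration
        finished.append((end, index))
        heapq.heappush(free_at, (end, worker))
    # Sort by finish time; ties fall back to arrival order.
    return [index for _, index in sorted(finished)]
```

With durations [5, 1, 1], one worker finishes the messages as [0, 1, 2], while two workers finish them as [1, 2, 0]: the slow first message is overtaken by the two that arrived after it. If the later messages update the same record as the first one, that is exactly the data integrity problem described above.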
We’re aware that queue management will be a big task but we already have monitoring and alerting on our queue depths. We need to be able to pause triggers now during planned maintenance of our external systems.
Obviously, unplanned outages will be handled by our error handling process and will require some manual intervention to make sure all data is passed through to the external systems without accidentally rolling any transactions back.
Good to hear that the fix was so simple. There are several possible design patterns for using such queue logic to hide outages as much as possible from external parties. The SAG Professional Service team can help you with that.
In the document type, if the “Discard” property under webMethods Messaging is false, then the TTL is immaterial. We use IS Administrator to suspend triggers as needed. In general, suspending a particular target should not matter at all to publishers, though that’s not always going to be the case if a publisher is “waiting” for a response that comes minutes, hours, days, or weeks later.
Terminology-wise, be aware that a “disabled” trigger does one thing and a “suspended” trigger does another. In IS Administrator, you can suspend but not disable. In Designer, you can disable but not suspend.
I think this contributed a lot to the issues we were having.
There was a mix-up between “disabled” and “suspended”. This has been cleared up now.
Thanks for the advice.