Is anyone using the trigger's Resource monitoring service setting, found under Transient error handling in the trigger's Properties panel?
We have customers whose servers, databases, etc. go down, either for maintenance (which they neglect to tell us about) or because of hardware failure. This has happened to all of us, so I know I'm not saying anything new here.
We already have something in place that sends any failed documents to My webMethods so that they can be resubmitted. The trouble is that we have also run into really bad situations where, say, a DB went down at 1am and failed documents kept landing in My webMethods for the next 6 hours. When the EAI on-call person gets up the next morning, he has 2,000 to 3,000 docs waiting to be resubmitted manually. Not a good wakeup call, especially since My webMethods (at least over here) is somewhat limited in how many docs you can resend at once, but that's another subject.
Our real problem is that we need to be proactive instead of reactive. We need to be able to detect that a DB is down and not send anything to it until there is a good connection again. In the meantime, the data should be held in the Broker until the server/DB is available. From what I found in the forum, the best solution seems to be to go into Transient error handling on the trigger and set the following:
Retry until: Max attempts reached
Max retry attempts: not sure yet – to be discussed
Retry interval: not sure yet – to be discussed
On retry failure: Suspend and retry later
Resource monitoring service: use this as well
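To check that I'm reading these settings correctly, here is a rough model of the flow as I understand it. It is plain Java, not Integration Server code: the class and method names are made up for illustration, the retry interval is left out, and "one initial attempt plus up to Max retry attempts retries" is my assumption about how the count works.

```java
import java.util.function.BooleanSupplier;

// Simplified model of "retry, then suspend and retry later".
// Hypothetical names throughout; this is not a webMethods API.
class TriggerRetryModel {
    enum Outcome { PROCESSED, SUSPENDED }

    // Runs the trigger service once, then retries up to maxRetryAttempts times.
    // Returns SUSPENDED if every attempt fails (the "On retry failure" branch).
    static Outcome deliver(BooleanSupplier triggerService, int maxRetryAttempts) {
        for (int attempt = 0; attempt <= maxRetryAttempts; attempt++) {
            if (triggerService.getAsBoolean()) {
                return Outcome.PROCESSED;
            }
            // A real setup would wait for the retry interval here.
        }
        return Outcome.SUSPENDED;
    }

    // While suspended, the monitoring check is polled until it reports the
    // resource as available. Returns the number of polls it took, or -1 if
    // the resource never came back within maxPolls.
    static int pollsUntilResumed(BooleanSupplier monitor, int maxPolls) {
        for (int poll = 1; poll <= maxPolls; poll++) {
            if (monitor.getAsBoolean()) {
                return poll;
            }
        }
        return -1;
    }
}
```

If that model is right, documents stay queued while the trigger is suspended, and nothing is sent to My webMethods unless the resubmit path is hit for some other reason.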
For the check itself, I would need a service that asks the DB for its current timestamp; if I don't get one back, that tells me there's trouble. Of course, I would retry several times before coming to that conclusion. If the DB is not available, the triggers should be suspended until it is available again. I understand how to do the simple DB timestamp check, but I'm still not sure how to put together the resource monitoring service around it.
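Here is a minimal sketch of the check I have in mind, written as plain Java with JDBC rather than as an actual IS service. The class name, method names, JDBC URL and query are all placeholders of mine. My reading of the docs is that the monitoring service is expected to return an isAvailable string of "true" or "false", so the sketch returns that, but I haven't confirmed it.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.Timestamp;
import java.util.concurrent.Callable;

// Hypothetical helper; not part of any webMethods package.
class DbResourceCheck {

    // Asks the database for its current timestamp. Throws if the connection
    // or the query fails. The query is an assumption: some databases need a
    // different form (Oracle, for example, wants "FROM DUAL").
    static Timestamp fetchDbTimestamp(String jdbcUrl, String user, String password)
            throws Exception {
        DriverManager.setLoginTimeout(5); // seconds; avoid hanging on a dead host
        try (Connection con = DriverManager.getConnection(jdbcUrl, user, password);
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT CURRENT_TIMESTAMP")) {
            if (!rs.next()) {
                throw new IllegalStateException("no row returned");
            }
            return rs.getTimestamp(1);
        }
    }

    // Runs the probe up to maxAttempts times, pausing between attempts.
    // Returns "true" on the first non-null result, "false" if every attempt
    // fails or the thread is interrupted while waiting.
    static String isAvailable(Callable<Object> probe, int maxAttempts, long pauseMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (probe.call() != null) {
                    return "true";
                }
            } catch (Exception e) {
                // Any failure counts as one failed attempt.
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return "false";
                }
            }
        }
        return "false";
    }
}
```

It would be called along the lines of `DbResourceCheck.isAvailable(() -> DbResourceCheck.fetchDbTimestamp(url, user, password), 3, 2000)`. Keeping the probe separate from the retry loop means the same loop could wrap a check for some other resource later.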
If anyone is using this, can you tell me what else is in your service, so that I can put mine together? In particular, I don't understand what needs to be in it to get the triggers started back up once the database is found to be up again. I don't see a sample in IS. I read the very small section on it in the webMethods publish-subscribe doc, and the explanation isn't enough for me to actually build this. Searching the forum got me as far as the settings above, but I'm still unsure about the resource monitoring service itself: what goes in it, and how.
If there’s a better solution, I’m open to hearing about that as well. Thanks for your help!