On-Call support advice wanted

Hello and Happy New Year!

I don’t know whether I’m just not typing in the right keywords, but I searched through the forums for on-call advice without much luck. We are EAI on-call 24x7, and the way things are set up right now, we get paged for every error known to man.

Basically, things are set up so that any sort of error, from a database connection problem to someone typing the wrong password when trying to access MyWebmethods, automatically generates an error email that hits our Blackberry. As you can tell from my explanation, this is very annoying. Many of the errors that appear are ones we can do nothing about, and being constantly paged for minor things is not good.

There has to be a better way and I need your help. Can I get some ideas from those of you who are on support? It is my hope that someone has a better setup than this. How does yours work? I’m thinking that perhaps we need to redo our packages and do something in the catch block so that we can differentiate an urgent error from a minor one, but I don’t know what.

Again, overall, our setup is that ANY error (whether major or minor) within the IS Admin’s Error Log page automatically generates an error email which is sent to our Blackberry and this happens 24x7. Please help us come up with a better solution.:confused:

Thanks.

Sorry, it seems that I probably should have placed this question under Webmethods Architecture??? Not sure.

Sam,
There are about a million different ways to handle this, and I think we have all been through it. Not every error should page, only the ones deemed critical by the app teams and infrastructure teams. Setting that up, however, will require some work.

The developers of the flow services in IS should handle their own errors. They can sort them into categories such as critical (page), warning (email only), or debug. I’m a firm believer that a flow service can handle most types of errors it encounters without having to broadcast them.
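To make the category idea concrete, here is a minimal sketch in plain Java of what a catch block could call to decide whether an error pages, emails, or only gets logged. Everything in it is a hypothetical illustration: `Severity`, `Channel`, `ErrorRouter`, and the classification rules are assumptions for the example, not webMethods APIs, and the real rules would come from your app and infrastructure teams.

```java
// Hypothetical severity levels mirroring the categories described above.
enum Severity { CRITICAL, WARNING, DEBUG }

// Hypothetical delivery channels; none of these are webMethods APIs.
enum Channel { PAGE, EMAIL, LOG_ONLY }

final class ErrorRouter {
    private ErrorRouter() {}

    // Classify an exception caught in a service's catch block.
    // These rules are illustrative assumptions only.
    static Severity classify(Throwable error) {
        if (error instanceof SecurityException) {
            return Severity.DEBUG;      // e.g. someone typed the wrong password
        }
        if (error instanceof IllegalArgumentException) {
            return Severity.WARNING;    // e.g. bad input data someone should review
        }
        return Severity.CRITICAL;       // anything unrecognized is treated as urgent
    }

    // Map each severity to exactly one notification channel.
    static Channel channelFor(Severity severity) {
        switch (severity) {
            case CRITICAL: return Channel.PAGE;
            case WARNING:  return Channel.EMAIL;
            default:       return Channel.LOG_ONLY;
        }
    }
}
```

A catch block would then do something like `ErrorRouter.channelFor(ErrorRouter.classify(e))` and only send a page when the result is `Channel.PAGE`. Treating unrecognized errors as critical is a deliberate choice in this sketch: it keeps new failure modes visible until someone decides they are safe to downgrade.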

For server-type internal errors you have a few options. Once again, though, most of these can be handled by the flow service, because that’s really where the transaction is going to have to be handled anyway; i.e., a database being down, an FTP site not responding, or an HTTP server being down can all be handled by the individual flow services. Data-related errors should page out to those who can actually fix them. For IS-related errors, you can turn off the notification from the Admin screen or turn down the logging. You can also subscribe to events instead, which can be a bit more robust than just broadcasting emails/pages every time something gets thrown to the log.
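One way a service can handle a down database or unresponsive FTP site itself is to retry a few times and only notify anyone once every attempt has failed. The sketch below shows that pattern in plain Java under stated assumptions: `RetryThenEscalate`, its parameters, and the `escalate` callback are hypothetical names for illustration, not webMethods APIs, and a real service would normally wait between attempts.

```java
import java.util.function.Supplier;

final class RetryThenEscalate {
    private RetryThenEscalate() {}

    // Runs the action up to maxAttempts times. Failures are absorbed locally;
    // escalate runs exactly once, and only if every attempt failed.
    static <T> T call(Supplier<T> action, int maxAttempts, Runnable escalate) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();    // success: nothing is broadcast
            } catch (RuntimeException e) {
                last = e;               // handled here; a real service would pause before retrying
            }
        }
        escalate.run();                 // e.g. page or email, per your notification procedure
        throw last;                     // let the caller fail the transaction
    }
}
```

With this shape, a brief outage that clears on the second or third attempt never reaches anyone’s pager, while a sustained outage produces one notification instead of one per failed transaction attempt.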

Bottom line: work with your developers and infrastructure team to come up with your SLAs and notification procedures. Simply broadcasting emails and pages will just about guarantee that they will be ignored. And if they won’t help you out, then make sure they get added to the distribution lists.:smiley: