Correct way to retry IS Trigger Service

Hi - Just wanted to ask about the correct way to build an IS trigger service that must retry until it succeeds.

The requirements are:

  • the trigger gets a document from the broker and executes the IS trigger service.
  • if the trigger service fails, the document stays on the broker queue
  • the trigger keeps retrying the same document - if it succeeds, the document is removed from the broker queue

To meet these requirements, here’s what I do:

Trigger Settings
• Set trigger ‘Retry failure behavior’ property to ‘Suspend and retry later’
• Set a suitable trigger resource monitoring service

Trigger Service Structure
• The following Flow service structure is used:

[b]Try/Catch - Outer Sequence[/b]
	[b]Try Sequence[/b]
		(service steps)
	[b]Catch Sequence[/b]
		Set error flag to ‘true’
[b]BRANCH on error flag[/b]
        If 'true', throw pub.flow:throwExceptionForRetry

The trigger service structure is pretty strict. For instance, if I move the ‘BRANCH on error /throwExceptionForRetry’ code block to be under the ‘Catch’ sequence, it does not work – the service just fails, no retry occurs and the document does not stay enqueued on the broker.

Is this the right way to set up a retry-capable IS trigger? Is the structure of the IS Trigger Service correct? Specifically, I was wondering why the ‘BRANCH’ does not work under the ‘Catch’ sequence.

In my view, you should only configure trigger retry for transient errors.

Keep the following points in mind when configuring a retry limit for a trigger:

  • Your flow service should catch a transient error and re-throw it as a run-time exception, by invoking pub.flow:throwExceptionForRetry. If a transient error occurs and the trigger service does not re-throw it as a run-time exception, the trigger service ends in error. The Integration Server will not retry the trigger service.

  • The Integration Server does not retry a trigger service that fails because a service exception occurred.

  • In the Properties panel, under Retries on error, in the “Deliver until” property, select either “Max attempts reached” or “Successful”. In your case (unlimited retry) you should select “Successful”. However, I would not recommend this option. Please note that if a trigger is configured to retry until successful and an error condition is never remedied, the trigger service enters an infinite retry loop.

  • If you select “Max attempts reached” for the “Deliver until” property, you should also define the “Max attempt” and “retry interval” properties. In this case you can use the pub.flow:getRetryCount service in your trigger service to find out how many attempts have been made so far and decide accordingly whether to invoke pub.flow:throwExceptionForRetry.

  • You should leave “Retry failure behaviour” set to “throw service exception”.
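
The interaction between “Deliver until”, the attempt limit and the retry count can be pictured with a toy model. The Java sketch below is not Integration Server code and makes several assumptions: RetryPolicy, RetryRequested and deliver are hypothetical names, RetryRequested stands in for the run-time exception raised through pub.flow:throwExceptionForRetry, the counter handed to the service plays the role of pub.flow:getRetryCount (assumed to be 0 on the first attempt), and the retry interval is left out.

```java
import java.util.function.IntConsumer;

/** Hypothetical stand-in for the run-time exception that asks for a retry. */
class RetryRequested extends RuntimeException {
    RetryRequested(String message) { super(message); }
}

/** Toy model of a trigger's retry settings (not Integration Server code). */
class RetryPolicy {
    enum DeliverUntil { MAX_ATTEMPTS_REACHED, SUCCESSFUL }

    private final DeliverUntil deliverUntil;
    private final int maxAttempts; // retries allowed after the first attempt

    RetryPolicy(DeliverUntil deliverUntil, int maxAttempts) {
        this.deliverUntil = deliverUntil;
        this.maxAttempts = maxAttempts;
    }

    /** Runs the service with the current retry count; returns true on success. */
    boolean deliver(IntConsumer service) {
        int retryCount = 0;
        while (true) {
            try {
                service.accept(retryCount);
                return true;
            } catch (RetryRequested e) {
                boolean limited = deliverUntil == DeliverUntil.MAX_ATTEMPTS_REACHED;
                if (limited && retryCount >= maxAttempts) {
                    return false; // limit reached: retry failure
                }
                retryCount++; // with SUCCESSFUL the loop never gives up
            }
            // Any other exception propagates: the service ends in error, no retry.
        }
    }
}
```

In this model, a service that never stops throwing RetryRequested under SUCCESSFUL loops forever, which is the infinite retry situation described above.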

I hope this gives you a way forward.

Thanks
AmaR

Thanks a lot for your inputs Amar. However, this is still the best integration option I see that meets my requirements – ERP document delivery that tolerates extended (many hours) downtime and recovers automatically.

I want the trigger to retry when an exception is thrown in the trigger service. This is because an exception most commonly indicates a problem with the resource that the trigger service is using (in this case, the trigger service tries to insert a document into the SAP ERP, so an exception indicates the ERP is down). When this happens, the trigger suspends and keeps checking whether the ERP has come back up (via the trigger resource monitoring service). When the ERP is back up, the trigger resumes and starts inserting documents again.
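
To make the suspend/resume cycle concrete, here is a minimal Java sketch of a serial trigger. It is only a model under stated assumptions, not how the broker or Integration Server is implemented: SerialTrigger, ResourceUnavailable, drain and pollMonitor are hypothetical names, ResourceUnavailable stands in for the exception raised for retry, and pollMonitor stands in for one call of the resource monitoring service.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

/** Hypothetical stand-in for the exception that requests a retry. */
class ResourceUnavailable extends RuntimeException {
    ResourceUnavailable(String message) { super(message); }
}

/** Toy model of a serial trigger that suspends on retry failure. */
class SerialTrigger {
    private final Deque<String> queue = new ArrayDeque<>();
    private boolean suspended = false;

    void publish(String document) { queue.addLast(document); }
    boolean isSuspended() { return suspended; }
    int queued() { return queue.size(); }

    /** Processes documents in order until the queue is empty or the trigger suspends. */
    void drain(Consumer<String> triggerService) {
        while (!suspended && !queue.isEmpty()) {
            String document = queue.peekFirst(); // stays queued until the service succeeds
            try {
                triggerService.accept(document);
                queue.removeFirst(); // success: acknowledge and remove
            } catch (ResourceUnavailable e) {
                suspended = true; // "Suspend and retry later"
            }
        }
    }

    /** One poll of the resource monitor; resumes the trigger when the resource is back. */
    void pollMonitor(BooleanSupplier resourceAvailable) {
        if (suspended && resourceAvailable.getAsBoolean()) {
            suspended = false;
        }
    }
}
```

The point of the model is that the document at the head of the queue is only removed after a successful run, so nothing is lost while the ERP is down.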

This webMethods behavior is still not optimal. For example, suppose a data issue (a ‘bad’ document) caused an exception in the trigger service. The ‘bad’ document would ‘block’ the ‘good’ documents behind it in the queue and the trigger (assuming it is a serial trigger) would endlessly suspend and resume without going ahead.

Ideally, webMethods should give me the ability to differentiate between different classes of exceptions. For ‘bad’ documents, instead of suspending the trigger, I should be able to chuck them into some sort of ‘retry queue’ (where the data can be fixed up and resubmitted offline) while ‘good’ documents forge ahead.

That’s why I said you should only configure trigger retry for transient errors. A transient error is an error that arises from a temporary condition that might be resolved or restored, such as the unavailability of a resource owing to network issues or a failure to connect to a database.

What you want to achieve is quite possible in webMethods; you just need to implement it properly. The following documents should be very helpful for you.

  • GEAR 6.5 Error Handling
  • webMethods_Error_Message_Reference_6_5_and_6_5_1

Let me know if it’s of some help.

Many Thanks
AmaR

If you set the trigger property to retry until successful, it is good to have custom logic in the catch block that differentiates between an exception thrown because the ERP is down and one thrown because of a bad document.

I agree with Amar that setting the trigger property to retry until successful can put the flow into an infinite loop. An administrator will have to manually delete the document from the broker queue, or change the trigger properties in production, which won’t be allowed.

Why not build a helper service that checks the state of the ERP or any other back end and then takes appropriate action, such as suspending the trigger when the ERP is down?

My two cents.

“…I should be able to chuck them into some sort of ‘retry queue’ (where the data can be fixed up and resubmitted offline) while ‘good’ documents forge ahead.”

You can set up document/audit logging to do this. When the service encounters a bad document, have it throw an exception or exit signaling failure. With audit logging turned on, the pipeline is saved to the DB. In Monitor you can view failed services and rerun them, optionally editing the data in the pipeline if necessary (this should be viewed as an exceptional/rare process, not as a primary recovery approach; bad data should be fixed at the source, not by the messenger).

The suspend trigger/monitor to resume approach is a good one. Just add on the extra things you want to cover the special cases you identify.

[updated 12 August]
Thanks guys. I already have a resource monitoring service defined in the trigger that automatically suspends and resumes the trigger on exception.

I think Rob’s suggestion to use PRT based logging (in conjunction with service audit logging settings) to capture and review irrecoverable exceptions in Monitor/MWS is a good one.

One key sticking point is determining, reliably, which exceptions are irrecoverable and which are not. One approach is to inspect the output of getLastError. Another is to call the trigger’s resource monitoring service in the catch block. If the resource monitoring service returns failure, that implies (but does not guarantee) that the exception was caused by the resource being temporarily unavailable, i.e. the exception is recoverable. In that case, you’d set the error flag and call throwExceptionForRetry.

To summarize, the code block should look like this (amended based on Rob’s input below).


[b]Try/Catch - Outer Sequence[/b]
	[b]Try Sequence[/b]
		Service steps...  (e.g., insert into ERP)
	[b]Catch Sequence[/b]
		call getLastError [b]and/or[/b] call trigger resource monitoring service
		If Exception is recoverable (e.g., ERP is down)
			Set error flag to ‘true’
		If Exception is irrecoverable (e.g., bad data rejected by ERP)
			EXIT with failure (logs pipeline in PRT)
[b]BRANCH on error flag[/b]
        If 'true', throw pub.flow:throwExceptionForRetry
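
For readers who think in Java, the same decision logic can be sketched as follows. This is an analogy under stated assumptions, not webMethods code: ErpTriggerService, RetryLater and PermanentFailure are hypothetical names, RetryLater stands in for pub.flow:throwExceptionForRetry, PermanentFailure stands in for EXIT with failure, and erpAvailable plays the part of the trigger’s resource monitoring service. The flag is kept only to mirror the Flow structure; plain Java could throw straight from the catch.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

/** Hypothetical stand-in for the exception raised for retry. */
class RetryLater extends RuntimeException {
    RetryLater(String message, Throwable cause) { super(message, cause); }
}

/** Hypothetical stand-in for "EXIT with failure" on an irrecoverable error. */
class PermanentFailure extends RuntimeException {
    PermanentFailure(String message, Throwable cause) { super(message, cause); }
}

class ErpTriggerService {
    private final Consumer<String> insertIntoErp;
    private final BooleanSupplier erpAvailable; // resource monitoring check

    ErpTriggerService(Consumer<String> insertIntoErp, BooleanSupplier erpAvailable) {
        this.insertIntoErp = insertIntoErp;
        this.erpAvailable = erpAvailable;
    }

    void handle(String document) {
        boolean retry = false;
        Throwable failure = null;
        try {
            insertIntoErp.accept(document); // Try sequence: service steps
        } catch (RuntimeException e) {      // Catch sequence
            failure = e;
            if (!erpAvailable.getAsBoolean()) {
                retry = true; // resource looks down: treat as recoverable
            } else {
                // Resource is up, so the document itself is the likely problem.
                throw new PermanentFailure("rejected: " + document, e);
            }
        }
        if (retry) { // BRANCH on error flag, outside the try/catch
            throw new RetryLater("ERP unavailable, retry " + document, failure);
        }
    }
}
```

As noted above, a failed availability check only suggests that the error was transient, so a bad document that arrives while the ERP happens to be down would still be retried once the ERP returns.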

If anyone can shed light on these two remaining mysteries, that would be beautiful:

  1. Why won’t the BRANCH on error flag code work within the ‘catch’ block?
  2. Why should (as Rob states), the EXIT with failure step be the last step in the ‘catch’ block?

Looks like a good approach to me.

Be careful with the EXIT ‘$flow’ and signal FAILURE within the catch block. If it isn’t the last step in the SEQUENCE exit on DONE block it may not behave as desired.

Thanks Rob - I updated my code block in the previous post to reflect this, and asked a couple of questions.

I had mentioned “…may not behave as desired” in my earlier post because I wasn’t sure what would happen. I ran a quick test and found that an EXIT ‘$flow’ and signal FAILURE from the catch block will indeed exit. I was thinking that because the typical catch block is a SEQUENCE Exit on DONE that it might ignore any exceptional condition such as an exit with failure. But that doesn’t appear to be the case (which is good!).

The key thing here is to verify that the logic in a catch block is functioning as desired via testing.

Sonam,

Your approach is almost perfect. That is the correct way to handle scenarios that require indefinite retries.
The answer to your question about why throwExceptionForRetry cannot be called from the catch block:

The catch block, unlike a Java catch block, is a SEQUENCE with an EXIT setting. The setting for the catch block is usually “EXIT on DONE” or “EXIT on FAILURE”. If it is the former, any exception in the block is suppressed, which means your exception for retry is lost. If it is the latter, the SEQUENCE wraps the exception in a FlowException and throws that, which means your retriable (is that even a word?) exception just became non-retriable.

So what you are doing is the only way to use the throwExceptionForRetry service.
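
A rough Java model of the two cases described above may help. It only imitates the behaviour as explained in this post and is not the Flow engine: Sequence, RetrySignal and FlowExceptionStandIn are hypothetical names, with RetrySignal standing in for the exception raised for retry and FlowExceptionStandIn for the FlowException wrapper.

```java
/** Hypothetical stand-in for the exception raised for retry. */
class RetrySignal extends RuntimeException {
    RetrySignal(String message) { super(message); }
}

/** Hypothetical stand-in for the wrapper thrown by a failing SEQUENCE. */
class FlowExceptionStandIn extends RuntimeException {
    FlowExceptionStandIn(Throwable cause) { super(cause); }
}

/** Toy model of a SEQUENCE step with an exit setting. */
class Sequence {
    enum ExitOn { DONE, FAILURE }

    static void run(ExitOn exitOn, Runnable... steps) {
        for (Runnable step : steps) {
            try {
                step.run();
            } catch (RuntimeException e) {
                if (exitOn == ExitOn.FAILURE) {
                    // The original exception type is hidden inside the wrapper,
                    // so the caller no longer sees a retry request.
                    throw new FlowExceptionStandIn(e);
                }
                // EXIT on DONE: the failure is suppressed and the next step runs.
            }
        }
    }
}
```

In both branches the caller never sees a RetrySignal directly, which is why the retry has to be raised after the outer sequence has completed.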

Of course, you do want to be careful about what constitutes a forever retry; otherwise you can end up with a document stuck in the queue forever. If you have reason to doubt the way the adapter service throws a transient or non-transient exception, you can use your own logic, but then you can end up in an endless loop.

I do remember needing something to determine whether an exception thrown from a service is transient or not, because some of the child flows had adapter services in catch blocks which wrapped up the exception. The code for that service is now in PSUtilities and is called ps.util.system:getExceptionType.

It works because the ART will throw a com.wm.pkg.art.error.DetailedSystemException on a transient error. If the lastError structure has that pattern you can kind of unwrap it by throwing a runtime exception.
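
The pattern check itself is simple to sketch. The Java below is not the PSUtilities implementation; LastErrorInspector and looksTransient are hypothetical names, and it assumes the caller has already pulled the error type and error dump strings out of the lastError structure (the exact field names are an assumption here).

```java
/** Hypothetical helper; not the ps.util.system:getExceptionType implementation. */
class LastErrorInspector {
    /** Exception class the ART throws on a transient error, per the post above. */
    static final String TRANSIENT_MARKER =
            "com.wm.pkg.art.error.DetailedSystemException";

    /**
     * Returns true when either string mentions the transient exception class,
     * even if an outer exception has wrapped it.
     */
    static boolean looksTransient(String errorType, String errorDump) {
        return mentionsMarker(errorType) || mentionsMarker(errorDump);
    }

    private static boolean mentionsMarker(String text) {
        return text != null && text.contains(TRANSIENT_MARKER);
    }
}
```

A trigger service could use a check like this in its catch step to decide whether to set the error flag for retry.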

Hope I clarified the situation somewhat and did not add to the confusion.

Rupinder

Good to see you back on the forums!

Absolutely clarified it! Thanks for your help, and as Rob states, good to see you here again. :)

We are trying to implement something similar, and I couldn’t find the service getExceptionType in our PSUtilities package. We are on WM 6.5. Is there any other way that I can identify the exception type?

Thanks,
Manju

Could anyone else who has used ps.util.system:getExceptionType, or something similar to identify the root exception type, please pitch in?

Thanks
Manju

PSUtilities version 2.01 has that service. An earlier version may also have it. You’ll want to review the PSUtilities package available on Advantage or on the SoftwareAG site.

Hi
Is there any way to check whether a document is good or bad before it is put into the trigger queue?
That way the trigger would not waste time retrying a document when there is no strong reason to retry it.

Hi, I just read this post and would like to know how we can manually retry the trigger service once the max retry attempts have been reached.
Thank you.

If audit logging for the service is enabled, you can resubmit the service via MWS.

Thank you for these clarifications, Reamon.
I have another question: When we resubmit the trigger service via MWS, does this change the trigger state from suspended to enabled and consequently release messages that are stuck in the queue?