Event appears in snoop but does not get processed

fml2 · February 8, 2017, 5:37pm

Hello,

we have a weird situation and would need some help resolving it.

In Integration Server, we have a publishable document type with ‘Encoding type’ set to ‘Protocol buffers’. It is associated with a connection alias which points to Universal Messaging.

Sometimes, when an event of this type is published to UM, we can see it via snoop (with an event ID), but the event does not appear in the corresponding Named Object. That is, in the named object view, the Event-ID field stays one lower than the Event-ID shown in snoop.

The event does not get processed, which leaves process instances hanging, among other things.

Is this a known problem that occurs frequently and has a well-known solution?

Thanks!

Jonathan_Heywood · February 8, 2017, 5:49pm

Hi,
does your trigger have a subscription filter?
That could prevent a message from reaching the named object.
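To make the filter explanation concrete, here is a toy Python model. It is not the UM client API, and every class and field name in it is hypothetical. A channel assigns event IDs and copies each event only to the named objects whose subscription filter accepts it. Under these assumptions, a rejected event advances the channel's event ID (what snoop shows) but not the named object's last event ID, which matches the "one lower" symptom without any error being raised.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Props = dict  # event properties the (hypothetical) filter is evaluated against


@dataclass
class NamedObject:
    """Hypothetical stand-in for a UM named object (durable subscriber)."""
    name: str
    subscription_filter: Optional[Callable[[Props], bool]] = None
    queued: List[Tuple[int, Props]] = field(default_factory=list)
    last_event_id: int = -1  # -1 means nothing delivered yet


@dataclass
class Channel:
    """Hypothetical channel: assigns event IDs and fans events out."""
    named_objects: List[NamedObject] = field(default_factory=list)
    last_event_id: int = -1  # roughly what snoop would show

    def publish(self, props: Props) -> int:
        self.last_event_id += 1
        for obj in self.named_objects:
            # The event reaches the named object only if its filter accepts it.
            if obj.subscription_filter is None or obj.subscription_filter(props):
                obj.queued.append((self.last_event_id, props))
                obj.last_event_id = self.last_event_id
        return self.last_event_id


# Usage: the trigger only wants events with status == "OPEN".
trigger_obj = NamedObject("trigger", lambda p: p.get("status") == "OPEN")
channel = Channel([trigger_obj])
channel.publish({"status": "OPEN"})    # event 0: delivered
channel.publish({"status": "CLOSED"})  # event 1: filtered out on the server
print(channel.last_event_id, trigger_obj.last_event_id)  # 1 0
```

In this model, a filter that does not match what the publisher actually sends produces exactly such a silent gap: the event exists on the channel but never lands in the named object.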

fml2 · February 8, 2017, 10:38pm

Hello Jonathan,

Yes, the trigger does have a filter defined. It’s actually a subscription trigger for a process model, i.e. the trigger (and its filter expression) was generated by Designer during build and upload.

It’s not the first time we have done this, and it used to work. But now…

The bad thing is that we don’t see any error messages in any logs. Just nothing.

We use wM 9.9. How can we verify what filter is seen by UM? Is it possible to see the filter associated with a named object in Enterprise Manager?

Percio_Castro1 · February 9, 2017, 5:27pm

fml2,

We have exactly the same problem in 9.7: a subscription trigger that was previously working suddenly stops receiving messages. It only seems to happen to triggers that contain filters. Unfortunately, it’s intermittent and not easy to reproduce, which makes creating a support ticket a bit challenging.

I’m very interested in what you learn from this thread. Quick question: do you see this in all your environments? Are the Integration Servers clustered? Do you typically see this after a deployment or can it happen at any time?

Thanks,
Percio

fml2 · February 10, 2017, 12:19am

My impression is that once it works, it keeps working; it only stops working after a significant change, e.g. a deployment. During a deployment, we deploy triggers and then synchronize the publishable documents with the messaging provider (which happens to be UM).

We noted that it’s possible to fix the situation by disabling the trigger, deleting the named object, and then enabling the trigger again (which re-creates the named object). But of course that’s not an option for a production environment, because it loses the data held in the named object. And because it should just work. But it does not. And it is not possible to see what’s going on: no messages in the log, no way to see how the filters from the trigger have been transferred to UM. You’re just blind. All that’s left is to pray that everything goes well this time.

I think we have some IS nodes sharing the process DB but not organized as a cluster.

As you pointed out, it’s not easy to reproduce.

Jonathan_Heywood · February 10, 2017, 9:24am

There have been some issues in this area, so please make sure you apply the latest fixes - both to the UM server and to client libs. Note that on your IS installation, you will need to install three fixes: UM client, UM common libraries and UM shared bundles. It is the common libraries specifically that IS uses.

Percio_Castro1 · February 10, 2017, 4:47pm

fml2,

Yep, we also tend to see the issue after a deployment. Sounds like we’re experiencing the same problem.

Jonathan, when we first started experiencing the issue, we were on the latest but I’ll check again. Perhaps some new fixes have been released since.

Thanks,
Percio

fml2 · February 10, 2017, 5:02pm

We had the problems with Fix 13 (for wM 9.9) installed. I see that Fix 15 was released today, but from the descriptions I can’t tell that any of the issues resolved in Fixes 14 and 15 are related to what we’ve experienced.

Percio_Castro1 · February 28, 2017, 11:50pm

fml2,

Any luck finding the root cause? We ran into the issue again last week. We tried to determine the root cause but had no luck. Our developer investigating the issue did learn that if he simply deleted the named object and reloaded the package, the trigger started functioning again and the queued messages weren’t lost. We are in a clustered environment though, so I’m not sure if you would experience the same behavior.

Percio

fml2 · March 2, 2017, 4:06pm

Hello Percio,

No, we still haven’t found the cause. We know how to make it work again, but that involves deleting and re-creating named objects, which means all the data they contain is lost.

I don’t understand that. In my view, deleting named objects implies data loss. This is because the messages are not held in the channel but are immediately forwarded to the appropriate subscribers (= named objects, which in our case correspond to IS triggers). If the trigger is deactivated, the messages remain in the named object and get picked up when the trigger is activated again.
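The reasoning above can be pictured with a minimal toy model. It is a sketch under stated assumptions, not UM's actual implementation, and all names are hypothetical. The channel forwards each event into every named object's own queue and keeps no copy, the queue holds events while the trigger is disabled, and deleting and re-creating the named object starts from an empty queue.

```python
from collections import deque


class ToyNamedObject:
    """Hypothetical durable subscriber holding its own queue of event IDs."""

    def __init__(self):
        self.queue = deque()


class ToyChannel:
    """Hypothetical channel that forwards events and keeps no copy itself."""

    def __init__(self):
        self.named_objects = {}
        self.next_id = 0

    def create_named_object(self, name):
        self.named_objects[name] = ToyNamedObject()

    def delete_named_object(self, name):
        # Whatever was still queued for this subscriber is discarded.
        del self.named_objects[name]

    def publish(self):
        for obj in self.named_objects.values():
            obj.queue.append(self.next_id)
        self.next_id += 1


def drain(channel, name):
    """Simulates re-enabling the trigger: it consumes everything queued."""
    obj = channel.named_objects[name]
    consumed = list(obj.queue)
    obj.queue.clear()
    return consumed


ch = ToyChannel()
ch.create_named_object("trigger")
ch.publish()
ch.publish()                       # trigger disabled: events 0 and 1 wait
print(drain(ch, "trigger"))        # [0, 1] -> nothing lost
ch.publish()                       # event 2 queued while trigger disabled
ch.delete_named_object("trigger")  # workaround: delete ...
ch.create_named_object("trigger")  # ... and re-create
print(drain(ch, "trigger"))        # [] -> event 2 is gone
```

The model deliberately ignores clustering and any server-side retention on the channel, which may be one reason observations in other setups differ.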

Could you elaborate on how you avoid losing the data?

Thanks!

Ghislain_Lowa · April 27, 2017, 1:49pm

We have been facing the same problem for a couple of days on a recently installed 9.12. We have also been using the delete-object-and-restart-trigger workaround. Is there an official resolution for this yet? We can’t keep resolving it like that once we go live in a few weeks.
By the way, we found that the problem seems to happen only with webMethods Messaging triggers, whether or not they have filters. JMS triggers keep working fine when the issue occurs.

We have the following fixes installed on IS:
IS_9.12_Core_Fix4
IS_9.12_SPM_Fix1
TNS_9.12_Fix3

Anu_Sankoorikal · May 1, 2017, 3:26am

We are facing the same issue. We did an upgrade to 9.10 and all was working fine until now.

We have a UM topic with 3 durable subscribers. Two of them were getting the events and one was not. None of the three has any filters. Re-creating the named object fixed the issue, but that is not acceptable once this goes to Prod.

We have all the latest fixes installed as well.

Percio_Castro1 · May 1, 2017, 4:08pm

I’m disappointed to hear this issue still exists in 9.12. We are getting ready to kick off an upgrade project, and one of the reasons for the upgrade was precisely to eliminate these UM issues. We will do some thorough UM testing early in the project, and if the problems persist or can’t be resolved quickly, I suppose we may have to revert to the Broker.

Percio

fml2 · May 2, 2017, 12:56am

Percio, I feel the same way. And we are in a similar situation (an upgrade to 9.12 is being planned). I don’t quite understand how this situation can go on for so long. Are so few customers affected that the bug never gets proper attention?

Unfortunately, the Broker is not a viable solution, given the announcements regarding its future.

Jonathan_Heywood · May 2, 2017, 6:39pm

Percio, FML2,
Please do work with Global Support on these issues. Feel free to cc me on your communications with GS and I will make sure it gets the right attention from R&D. My email address is my firstname.lastname@softwareag.com
While we are committed to resolving this in currently supported versions, the good news is that shared durable subscribers have been rearchitected to be much simpler, removing the internal hidden queue and keeping messages directly on the channel. Look out for that in the 10.x releases.

Percio_Castro1 · May 8, 2017, 4:46pm

Thanks, Jonathan.

Sabarish_Natarajan3 · June 12, 2017, 2:32pm

Hi,

We are facing a similar issue in our wM 9.10 environment. Has anyone managed to find the root cause?

During our migration phase we hit this issue very often. With SAG support, we changed the publishable document encoding type to protocol buffers, corrected the UM provider filters, and finally deleted the channel (not PORD).

The setup worked for almost a month, but then the same issue resurfaced.

We have tried deleting the named objects, but the subscriptions still don’t work. We also see inconsistent connection details in Enterprise Manager: different logins show different details, and most of the time the filter conditions are missing from the connection details.

We’d appreciate any input on this. Thanks!

fml2 · June 12, 2017, 11:14pm

Have you re-created the named objects after deleting them? Without named objects, no messages will be delivered to the triggers.

Sabarish_Natarajan3 · June 13, 2017, 11:57am

Hi fml2,

Yes, the Named Objects were recreated.

The named object shows event ID “-1”, i.e. it is not receiving anything at all. Yet I can snoop the event on the respective channel after publishing, and the named object’s event ID is still -1.

The irony of this whole issue is that there is another track we migrated, with the same product and fix versions, where we never faced a single issue. The only difference between the two environments is that the migration utility was not run in the working one: we left the “enc type” as IDATA instead of protocol buffers and left the filter conditions unaltered, i.e. they were not moved to UM filters and the documents are still filtered in IS.
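The difference between the two environments can be pictured with a small hypothetical sketch (all names invented for illustration; this is not IS or UM code). With filtering left in IS, every event reaches the named object and the trigger discards non-matching documents locally. With the filter moved to UM, the server decides what arrives, so if the filter on the UM side were wrong or stale, events would simply never arrive and nothing on the IS side would report an error.

```python
from typing import Callable, Dict, List, Optional

Event = Dict[str, str]
Filter = Callable[[Event], bool]


def deliver(events: List[Event],
            provider_filter: Optional[Filter],
            local_filter: Optional[Filter]) -> List[Event]:
    """Return the events a trigger would finally process (hypothetical)."""
    # Step 1: server-side (UM) filtering decides what reaches the named object.
    reached = [e for e in events if provider_filter is None or provider_filter(e)]
    # Step 2: client-side (IS) filtering on whatever actually arrived.
    return [e for e in reached if local_filter is None or local_filter(e)]


def wanted(e: Event) -> bool:
    return e["type"] == "order"


events = [{"type": "order"}, {"type": "invoice"}, {"type": "order"}]

# Working environment: IData encoding, filter still evaluated in IS.
in_is = deliver(events, provider_filter=None, local_filter=wanted)
# Migrated environment: filter moved to UM as a provider filter.
in_um = deliver(events, provider_filter=wanted, local_filter=None)
print(len(in_is), len(in_um))  # 2 2


# A provider filter that does not match the published data drops everything.
def broken(e: Event) -> bool:
    return e.get("type") == "Order"


print(len(deliver(events, provider_filter=broken, local_filter=None)))  # 0
```

Both paths give the same result as long as the server-side filter is correct; the sketch only shows why a mismatch is invisible from the IS side.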

Please let us know if you have any further findings.

Thanks!

//Sabarish

fml2 · June 13, 2017, 10:14pm

Though it’s not the officially recommended configuration, it should work. I assume you’ve hit a bug. We also see weird behaviour in this area and are now trying to find out what’s causing it.