We have two Integration Servers in a cluster connecting to a Broker, version 6.5. During the usual restart last weekend, the Broker did not come up; the error said the BrokerData.qs file was corrupted. Once we replaced the corrupted file (and a few others under the Broker's default directory) from backup, it came up. After that, one of the interfaces started failing to publish its canonical with the error "Document type is not defined in the Broker." This interface is in production and was working fine before the .qs corruption incident, so I synced the document type from Developer and pushed it to the Broker. What I failed to notice at the time was that the trigger subscribing to this document had also lost its subscription. By the time I realized this it was too late: the interface had published a few records before I could disable/re-enable the trigger to restore the subscription. The trigger re-established its subscription, but it did not pick up the documents published earlier. It looks like data loss. :o
Since this document type's properties are Storage Type = Guaranteed and Discard = Never, it should be stored somewhere on disk.
Now, is there any way we can pull these documents out and resubmit them? The question may sound a little lame, but since this is production data, and while I'm working on other ways to get it re-sent, it would be great if someone could throw some light on this. Any help is highly appreciated.
If anyone has any idea about the other two issues, the .qs file corruption and the document type disappearing from the Broker, please let me know; that would also help in investigating further.
Guaranteed data is indeed stored to disk, in the BrokerData.qs file. Since that file became corrupted, the data is gone.
The configuration data is stored in BrokerConfig.qs. If you restored that from a backup as well, that is the likely source of “Document type is not defined in the Broker.”
For the data lost after the sync and trigger re-enable: unless you had set up a dead-letter queue, that data is also gone, because there were no subscribers. Guaranteed means the Broker will place the document on subscriber queues (written to disk with a two-phase commit). If there are no subscribers, the document is discarded (the dead-letter queue counts as a subscriber).
The discard property on the trigger only applies to documents on the trigger’s queue. And in this case the document never made it to the queue.
Have you been restarting Broker Server every weekend? What is the rationale for doing so? This really isn’t necessary.
I suspect the .qs corruption was caused by an ungraceful shutdown but that’s pure speculation.
Thanks for taking the time on this. I understand the dead-letter queue concept now. I'm planning to set one up, and after some more reading in the forums I learned that we need to use the Java Broker APIs to handle the dead-letter queue. Is there any other way you know of to do it?
Regarding the server restart: yes, the servers are restarted every week, and this has been the practice for a long time, since before I joined this client, so I'm not sure of the reason behind it. Just curious: is it unnecessary for IS as well, or just the Broker? If it isn't needed, I can push to stop doing it, as we have been seeing many issues lately. I too suspect an ungraceful shutdown, but I need to check with the server team whether they manually shut down IS first or do a scheduled hard reboot of the machine itself.
@DevNull43,
Since we managed to get the business to resend the data, we didn't need to use the utility, but thanks anyway for your suggestion.
Using the Broker APIs is the only way of which I am aware. Unfortunately there doesn’t appear to be a way to connect IS to a specific client queue so that you could define a trigger and trigger service.
For the restarts, it shouldn’t normally be needed for IS either but often is a way to work around memory leaks and other issues. Weekly or other periodic restarts should be treated as a temporary solution until the root issue is isolated and corrected.
I have just started writing the Java code for handling the dead-letter queue. Let me see whether I come up with something useful.
We ran into another issue: wMLogUtil is throwing errors every second saying "[LGU.0002.0002E] Error getting log events: The specified number of events to be peeked, are not in the queue." When I searched for the error code in the error reference guide, it said the Broker might not be reachable because (1) the network is down or (2) the Broker is loaded, but in our case everything was fine and I could see the connection enabled on the Admin page. I couldn't find anything substantial yet.
Considering the timeline and the year-end processes running critical reports, we took the issues to wM support, and they have come up with around 10 fixes (4 for Broker, 6 for IS). We will be applying them soon. I will update on how it goes.
But I'm still curious why these issues came up all of a sudden when no change was made to the server.
And regarding the server restart: as suspected, they do a machine reboot without shutting down IS first.
Our architecture runs on Windows. Is Windows stable enough in terms of networking? I ask because we are seeing many connection glitches here and there: Broker ping failed, reverse invoke ping failed. All of these are very momentary. As I have not done support on other platforms, I'm not sure whether this is normal or how the platforms differ.
I hope it's OK to continue this discussion here in this thread.
The Dead Letter Client
To activate the dead letter queue, you must configure the Broker's dead-letter client. The dead-letter client is a system client that subscribes to dead letters. It is a member of the eventLog client group. This client group generates an explicit-destroy client with a guaranteed queue.

By default, the client ID for the Broker's dead-letter queue is "DefaultDLQ_"; however, you can optionally append a suffix to this client ID when you activate the queue. For example, you might add the name of your Broker to the default client ID to form an ID such as "DefaultDLQ_BrokerEast01". You can also associate an application name with the dead-letter client to make it easier to locate in the Broker user interface.

When you activate the dead letter queue, the Broker generates a (disconnected) client state object using the client ID that you specify. You can use My webMethods to examine or purge the dead letter queue. You can also use the Broker client API to develop a program that connects to the dead-letter client's state object (using the client ID you assigned to the dead-letter client) and retrieves the dead letter documents from the queue.
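The retrieval program the excerpt mentions can be sketched with the Broker Java API. This is an untested sketch, not a definitive implementation: the host/port, Broker name, and batch sizes are placeholder assumptions, and the exact `reconnect` signature and event accessors should be verified against the Broker client API reference for your release.

```java
import COM.activesw.api.client.BrokerClient;
import COM.activesw.api.client.BrokerEvent;
import COM.activesw.api.client.BrokerException;

// Sketch: attach to the dead-letter client's existing state object and
// drain its queue. "localhost:6849" and "Broker #1" are placeholders;
// use your own Broker host and name, and the client ID (plus any suffix)
// you chose when activating the DLQ.
public class DlqReader {
    public static void main(String[] args) throws BrokerException {
        // reconnect() attaches to an existing (disconnected) client state
        // object instead of creating a new client.
        BrokerClient c = BrokerClient.reconnect(
            "localhost:6849",   // Broker host:port (placeholder)
            "Broker #1",        // Broker name (placeholder)
            "DefaultDLQ_",      // dead-letter client ID from the docs above
            "DLQ reader",       // application name
            null);              // null = default connection descriptor

        // Pull up to 100 dead letters, waiting at most 5 seconds.
        BrokerEvent[] events = c.getEvents(100, 5000);
        for (BrokerEvent e : events) {
            // Inspect, log, or republish each dead letter here.
            System.out.println("Dead letter: " + e);
        }
        c.disconnect();
    }
}
```

From here you could, as suggested later in this thread, send notification emails or republish the dead letters after adding a subscriber to handle them.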
I have seen "network" glitches from IS to Broker even when both are installed on the same machine, solved by a bunch of fixes, so I understand why they have requested that you install them.
In terms of OS, Windows is stable and many people run production environments on it; however, it all depends on whom you ask. For me, Windows is a toy.
I would go for a Unix platform; Solaris with zones has great scalability.
Actually, the DLQ has been around since before version 8. I think it was introduced with 6.5, but I'm not sure.
The issue isn’t in getting a DLQ established. It is that after you have one, there aren’t any tools (beyond the APIs) to do much of anything with the events that land there.
Ah, that sounds like the ticket. If one can use IS to process arbitrary documents off the DLQ (send emails, repub dead letters after adding a subscriber to handle them, etc.), that’s a good addition.
Thanks Rob for all your suggestions and input. We have now installed 12 fixes, including one for WmLogUtil, but WmLogUtil still seems to have the same problem.
@Dev,
We are still in 6.5 and would be starting the upgrade early next year. Thanks for your input on the dead-letter queue handling.
Just thought I would update the thread with the root cause and the fix for these file corruptions.
Lately we started noticing many issues with the IS config files and Broker config files getting corrupted, often during restarts. Even after a long list of fixes and updates provided by Software AG were applied, the issue remained the same. When we diverted our search away from webMethods to the OS level, we found that a third-party backup tool runs frequently to back up all the files on the system. It explicitly locks files while backing them up, so IS and the Broker could not access them and write information properly, and ultimately the files got corrupted. The software has an option to disable file locking during backup; after this switch we never noticed any file corruption on IS or the Broker again. So we consider this issue closed.
Has anyone developed a Java service yet to retrieve documents from the default dead-letter queue?
If so could I kindly ask for some sample code?
I have tried the following code, based on the Broker API documentation, but without success…
IDataCursor pipelineCursor = pipeline.getCursor();
String brokerServer = IDataUtil.getString(pipelineCursor, "brokerServer");
String brokerName   = IDataUtil.getString(pipelineCursor, "brokerName");
String brokerClient = IDataUtil.getString(pipelineCursor, "brokerClient");

/* Create a dead-letter subscription for all event types */
try
{
    BrokerClient c = new BrokerClient(brokerServer,
                                      brokerName,
                                      brokerClient);
    c.newSubscription("*", "{hint:DeadLetterOnly}");
}
catch (BrokerException e)
{
    throw new ServiceException(e);
}
But I get the following error during compilation…
129: cannot find symbol
symbol : constructor BrokerClient(java.lang.String,java.lang.String,java.lang.String)
location: class COM.activesw.api.client.BrokerClient
BrokerClient c = new BrokerClient(brokerServer
^
1 error
I started on the code, but I never completed it because we found the root cause of the issue that was pushing documents into the dead-letter queue.
Just a quick tip: the error looks like it might be due to too few parameters being passed to the constructor. There are three more: client group, application name, and connection descriptor.
Give it a try with all the parameters and see if it compiles OK.
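For reference, here is a sketch of the six-argument constructor call described in that tip, dropped into the original service's try block. The client-group, application-name, and client-ID values are illustrative assumptions; check the constructor signature in the Broker client API reference for your release.

```java
/* Sketch only: brokerServer, brokerName, and brokerClient come from the
   pipeline as in the original service; the remaining three arguments are
   placeholders to adapt to your environment. */
try
{
    BrokerClient c = new BrokerClient(
        brokerServer,      // Broker host:port, e.g. "localhost:6849"
        brokerName,        // Broker name
        brokerClient,      // client ID
        "eventLog",        // client group (the DLQ client group named earlier in this thread)
        "DLQ subscriber",  // application name (placeholder)
        null);             // null = default connection descriptor

    c.newSubscription("*", "{hint:DeadLetterOnly}");
}
catch (BrokerException e)
{
    throw new ServiceException(e);
}
```

With all six arguments supplied, the "cannot find symbol: constructor" error should go away, since no three-argument overload exists.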