Random memory problems in B2B server

In one of the webMethods Integration Servers that we administer, random memory errors started to occur some time ago, after months of flawless operation.
The error usually takes the form of one particular structure, tnMessage (created from bizdoc/Content by bytesToString, stringToDocument and documentToRecord), disappearing from the pipeline. Depending on exactly when it disappears, we get different flow errors.
From the forum I got a tip that it might be related to too many open files in WmRepository2, and that cleaning files out of WmRepository2 might solve the issue.
Applying this tip did not give positive results.
Remarkably, the error never occurs in another, identical B2B Trading Networks Integration Server. Both servers are on the same machine (Sun Solaris): one is installed in /opt/webMethods with 5555 as primary port, the other in /opt2/webMethods with 8555 as primary port. Both use the same java_root but of course run as separate Java processes.
The error occurs only in the /opt2/webMethods server on port 8555. Concurrency between processes has been ruled out as the cause of the problem.
By trial and error I found that with savePipelineToFile as one of the first service steps, before bytesToString, the error does not occur, but as soon as savePipelineToFile is disabled or removed, the error returns. So I now use this as a workaround.

I’m glad you found a workaround, but it doesn’t really fix the problem. My guess is that since the savePipelineToFile service writes to disk, enough synchronization is happening to avoid the bad result of a race condition somewhere later in the application.

If there is such a problem in the server, it needs to be solved definitively. Note that the default implementation of the IData object is not thread-safe. Also, the contents of the pipeline should not be worked on by two threads at a time. It is possible to create such a problem through a seemingly logical optimization: passing an IData containing a PO document to one thread for processing while the receiving thread starts formatting a PO Ack. Flow maps then run against two pipelines that contain references to a single IData, and random errors occur.
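To make the shared-reference hazard concrete, here is a minimal sketch in plain Java. It does not use the webMethods API: ordinary maps stand in for IData, and the class name, the "po" and "orderId" keys and the deepCopy helper are all made up for illustration. It only shows why handing the same nested document to two pipelines is risky, and how copying before the hand-off avoids it.

```java
import java.util.HashMap;
import java.util.Map;

public class SharedDocumentSketch {

    // Hypothetical stand-in for a nested IData document; a plain map is used
    // so the sketch runs without the webMethods libraries.
    static Map<String, Object> newPurchaseOrder() {
        Map<String, Object> po = new HashMap<>();
        po.put("orderId", "PO-1");
        return po;
    }

    // Recursively copies nested maps so the copy shares no mutable state.
    @SuppressWarnings("unchecked")
    static Map<String, Object> deepCopy(Map<String, Object> source) {
        Map<String, Object> copy = new HashMap<>();
        for (Map.Entry<String, Object> e : source.entrySet()) {
            Object v = e.getValue();
            copy.put(e.getKey(), v instanceof Map ? deepCopy((Map<String, Object>) v) : v);
        }
        return copy;
    }

    // Returns true if the worker pipeline still sees orderId after the
    // receiver thread has removed it from its own document.
    @SuppressWarnings("unchecked")
    static boolean workerSeesOrderId(boolean copyBeforeHandOff) {
        Map<String, Object> receiverPipeline = new HashMap<>();
        receiverPipeline.put("po", newPurchaseOrder());
        Map<String, Object> po = (Map<String, Object>) receiverPipeline.get("po");

        // Hand the document to a second pipeline, either by reference or as a copy.
        Map<String, Object> workerPipeline = new HashMap<>();
        workerPipeline.put("po", copyBeforeHandOff ? deepCopy(po) : po);

        // The receiver thread cleans up its document while the worker still needs it.
        Thread receiver = new Thread(() -> po.remove("orderId"));
        receiver.start();
        try {
            receiver.join(); // forces the unlucky ordering so the outcome is repeatable
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }

        Map<String, Object> seen = (Map<String, Object>) workerPipeline.get("po");
        return seen.containsKey("orderId");
    }

    public static void main(String[] args) {
        System.out.println("shared reference: " + workerSeesOrderId(false));
        System.out.println("deep copy:        " + workerSeesOrderId(true));
    }
}
```

With the shared reference the worker loses the field; with the copy it keeps it. In a real server the ordering is not forced by a join, which is why the failure would look random.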

How often does the problem happen?
What versions/patch levels are the servers running?
Does your application do any calls to threadedInvoke?
Does your application do other explicit execution of multi-threaded/concurrent operations?
What is your logging/audit level of the server and the specific services executing when this problem happens?
Do you write Java code to the IData API or the Values API or neither (just Flow)?
Has the problem ever been seen on any other server (QA, Dev)?

webMethods B2B Integration Server 4.0.1
In the meantime I have been playing around with this problem some more.

The application has two main parts:
1.acceptData is a remotely invoked service that receives a message body, along with some sender and content information. It writes the message body to disk and passes the sender information tnMessage to Trading Networks (receive).
2.processData is invoked by the processing rule in Trading Networks. It picks up the sender information tnMessage from the bizdoc, queries a database to find the receiver's FTP location, and sends the file written by acceptData via FTP.

processData starts at acceptData/passToTN

From audit.log I learn the following:
Errors always occurred in this sequence:
1.acceptData/passToTN
2.processData/getMessage
2.processData/lookupReceiver
1.acceptData/disconnect
2.processData/ftp fails

My workaround looks like this
1.acceptData/passToTN
2.processData/getMessage
2.processData/savePipeline
1.acceptData ends with server.disconnect
2.processData/restorePipeline
2.processData/lookupReceiver
2.processData/ftp ok

Occasionally the sequence is different
1.acceptData/passToTN
2.processData/getMessage
2.processData/savePipeline
2.processData/restorePipeline
1.acceptData/disconnect
2.processData/lookupReceiver fails
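
The three sequences above can be replayed in a small plain-Java sketch. It rests on one assumption that is not confirmed anywhere: that the disconnect at the end of acceptData clears a pipeline that processData is still using. A map stands in for the pipeline, a second map stands in for the file written by savePipeline, and the class and method names are made up. The passToTN step is left out because it does not touch tnMessage in this model.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InterleavingReplay {

    // Replays the steps in the given order against a single shared pipeline.
    // Returns the first step that cannot find tnMessage, or "none".
    static String firstFailure(List<String> steps) {
        Map<String, Object> pipeline = new HashMap<>();  // shared by both services (assumption)
        Map<String, Object> savedFile = new HashMap<>(); // stands in for the saved pipeline file
        for (String step : steps) {
            switch (step) {
                case "getMessage":
                    pipeline.put("tnMessage", "sender information");
                    break;
                case "savePipeline":
                    savedFile.clear();
                    savedFile.putAll(pipeline);
                    break;
                case "restorePipeline":
                    pipeline.putAll(savedFile);
                    break;
                case "disconnect":
                    pipeline.clear(); // assumed side effect, not confirmed server behaviour
                    break;
                case "lookupReceiver":
                case "ftp":
                    if (!pipeline.containsKey("tnMessage")) {
                        return step;
                    }
                    break;
                default:
                    throw new IllegalArgumentException("unknown step: " + step);
            }
        }
        return "none";
    }

    public static void main(String[] args) {
        // Original error sequence
        System.out.println(firstFailure(Arrays.asList(
                "getMessage", "lookupReceiver", "disconnect", "ftp")));
        // Workaround: disconnect falls between save and restore
        System.out.println(firstFailure(Arrays.asList(
                "getMessage", "savePipeline", "disconnect", "restorePipeline", "lookupReceiver", "ftp")));
        // Occasional sequence: disconnect falls after restore
        System.out.println(firstFailure(Arrays.asList(
                "getMessage", "savePipeline", "restorePipeline", "disconnect", "lookupReceiver")));
    }
}
```

Under that assumption the model fails at ftp, succeeds, and fails at lookupReceiver, matching the three audit.log sequences. The workaround only helps when the disconnect happens to land before the restore.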

So your guess is right: the workaround does not solve the problem. acceptData and processData run in different threads, but they obviously share the same pipeline.
I used to have a remote service call to acceptData without an explicit disconnect. Maybe that is why the error did not occur in the past.

Does a real solution exist?

Thanks for the additional info. I’ll grab someone that knows more about TN internals to give an opinion.

Feel free to open a Service Request on this one to make sure it is followed up on promptly because sometimes my memory is random.

What version of IS/TN are you using?

webMethods B2BServer 4.0.1
TradingNetworks 4.0

Applied fixes:
B2BS_4-0-1_FIX_2.jar
B2BS_4-0-1_FIX_11.jar
B2BS_4-0-1_FIX_25.jar
B2BS_4-0-1_FIX_30.jar
B2BS_4-0-1_FIX_43.jar
B2BS_4-0-1_FIX_57.jar
B2BS_4-0-1_SP2.jar

Please contact webMethods support and ask for the TN 4.0.1 patch FIX_7_TN401. It may solve your problem, but from the description I’m not certain that it will.