IS out-of-memory issues on AIX 5.x

I realize the answer is “it depends”, but we are running into intermittent problems with AS2 documents that are 4 MB in size. Other times we have been able to process docs of the same type as large as 12 MB. We are running IS/TN 6.1 on AIX 5.1 with 12.25 GB of memory. We have been told by webMethods that we have to enable large document handling to overcome this problem, but I have heard from other wm sources that the IS should handle documents as large as 50 MB without enabling large document handling. Does anyone have any suggestions/comments? Thanks

Roger

I think the number 1 thing that determines when a document is too large is “what else is going on when the large doc arrives?” In some cases it may be a 100k doc that pushes it over the top.

What is the nature of the intermittent problems? Out of memory errors?

Thanks for the reply Rob…
Yes, out of memory errors. IS won’t crash, but it will produce some Java core dumps.

Hi Roger,
Out of 12.25GB, how much is allocated to IS? By allocating 1.5GB to IS, we are able to parse and map a 14MB RN doc without “Large File Handling”. Yes, it was the only doc running at that time, and it involved revisiting the code and dropping the unwanted variables in the pipeline at each step.

Using the “Large File Handling”, we could process 50MB AS2/UCCNET doc.

On AIX, the user id running IS is not restricted in memory. The java settings on the IS command are: -Xms1536M -Xmx1792M (when we tried to make it larger, it caused problems). We are running IBM JVM 1.4.2. According to AIX, it is using only 3% of the real memory. Virtual size is 333MB. The only other things running on the server are the broker and an Oracle instance for the TN/logging database.

Besides our EDI processes, we do have multiple internal integrations running in this IS, that communicate with both our JDEdwards OneWorld and our PkMS warehouse management system. So we have several JDBC and OneWorld adapter connections running.

Roger,

Another large webMethods customer I am familiar with was having many core dump issues, not only with IS but also with other Java apps running on AIX 5.x.

IBM was working with this customer to come up with a workaround or fix, but so far the only recommendation was to go to a 64-bit OS, not because it would really fix the issue, but because the issue would occur much less frequently.

Not sure if your issues are the same as theirs, but they had several java apps running on JRE 1.4.2 that were having a core dump issue at high volumes and high memory usage scenarios.

You might check with support and keep escalating until you get to speak with someone who might be working on that customer’s issue. Not that I think this is a webMethods IS issue, but they may be familiar with a workaround.

Mark

We have been seeing out of memory issues on AIX.

The issue for us has been heap fragmentation rather than exhaustion.

You need to use the IBM tools to analyze your heapdumps (or IBM can help with this) to tell which situation you have. If you get an out-of-memory error when only part of the heap is used, you have a fragmentation problem.
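To make that rule of thumb concrete, here is a rough sketch in Python. The function name and the 90% cutoff are my own assumptions for illustration, not anything from IBM or webMethods; the heap numbers would come from whatever your heapdump analysis or GC logging reports at the time of the failure.

```python
def classify_oom(heap_used_bytes, heap_total_bytes, threshold=0.9):
    """Guess whether an out-of-memory failure looks like exhaustion or fragmentation.

    threshold is an assumed cutoff: at or above it, the heap is treated as
    genuinely full; below it, free space existed but was presumably not
    contiguous enough for the request.
    """
    if heap_total_bytes <= 0:
        raise ValueError("heap_total_bytes must be positive")
    used_fraction = heap_used_bytes / heap_total_bytes
    if used_fraction >= threshold:
        return "exhaustion"
    return "fragmentation suspected"


MB = 1024 ** 2
# Hypothetical reading: 358 MB in use out of a 1792 MB heap when the OOM hit.
print(classify_oom(358 * MB, 1792 * MB))
```

This only sorts the failure into one bucket or the other; it is no substitute for actually looking at the heapdump.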

Unlike the Sun JVM, the IBM 1.4.2 JVM adds a category of immovable heap objects they call “Dosed”. Any method-local storage is dosed, and therefore cannot be moved during a heap compaction cycle. The Integration Server uses a lot of method-local references.

Depending on your thread count, you can find yourself in a situation where you have a large number of dosed objects scattered across the heap but only using say… 20% of the total. The first relatively large object that comes along doesn’t find enough contiguous space in the heap. So a garbage collection is triggered, and then a compaction. All those dosed objects can’t be moved during the compaction cycle. Still not enough contiguous space. Game over. You could say that in this situation, the large object is the victim rather than the perpetrator.
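A toy model of that scenario, in Python. This is only a sketch of the idea: the heap is a list of slots, and the labels "pinned" and "movable" are my own stand-ins for dosed and ordinary objects. It says nothing about how the IBM collector is actually implemented.

```python
def largest_free_run(heap):
    """Return the length of the longest run of free (None) slots."""
    best = run = 0
    for slot in heap:
        run = run + 1 if slot is None else 0
        best = max(best, run)
    return best


def compact(heap):
    """Slide movable objects toward the low end; pinned objects stay put."""
    result = [slot if slot == "pinned" else None for slot in heap]
    movable = sum(1 for slot in heap if slot == "movable")
    for i in range(len(result)):
        if movable == 0:
            break
        if result[i] is None:
            result[i] = "movable"
            movable -= 1
    return result


def can_allocate(heap, size):
    """A request succeeds only if enough contiguous free slots exist."""
    return largest_free_run(heap) >= size


# 100-slot heap: a pinned object in every 5th slot (20% of the heap),
# plus ten movable objects scattered through the first half.
heap = ["pinned" if i % 5 == 0 else None for i in range(100)]
for i in range(2, 50, 5):
    heap[i] = "movable"

compacted = compact(heap)
print("free slots:", compacted.count(None), "of", len(compacted))
print("largest contiguous run:", largest_free_run(compacted))
print("can allocate 5 slots:", can_allocate(compacted, 5))
```

Even after compaction, 70 of the 100 slots are free but the longest contiguous run is 4, so a request for 5 slots fails.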

The dosed object situation is a calculated tradeoff IBM made in order to get higher performance by eliminating a layer of heap object reference indirection. An IBM java lab guy explained to me that at the time 1.4.2 was being designed, the workloads the hardware was supporting made this a good tradeoff. It’s not working so well now, and IBM has returned to a more “Sun-like” heap management approach for Java 1.5. If only we could get wm support on IBM java 1.5…

In the meantime, the best approach we have found so far is to recognize that in general, threads running inside the Integration Server are a poor place to queue workload, and to get aggressive about controlling thread count on any Integration Server. I’ve reduced the max server threads significantly.

The challenge is that by simply reducing the max server threads setting, there is a potential to starve high priority work with high-volume low priority work. So we must effectively throttle the workload requests at their source. This means taking a look at trigger throttle settings, concurrency settings for individual triggers, scheduler tasks, JDBC pool sizes, reverse-invoke connections, etc. etc. Ultimately, you may have a situation where more servers are required for high priority work, or a client simply must be prepared to wait.
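The idea of throttling at the source, rather than relying on one global thread cap, can be sketched like this in Python. Everything here is hypothetical: the class, the source names and the limits are invented for illustration and do not correspond to any Integration Server setting or API. In IS itself the same effect comes from the configuration settings listed above.

```python
import threading


class SourceThrottle:
    """Caps in-flight work per source so one source cannot use every thread."""

    def __init__(self, limits):
        # limits maps a (hypothetical) source name to its maximum concurrency.
        self._limits = dict(limits)
        self._in_flight = {name: 0 for name in self._limits}
        self._lock = threading.Lock()

    def try_start(self, source):
        """Admit one unit of work if the source is under its cap."""
        with self._lock:
            if self._in_flight[source] >= self._limits[source]:
                return False
            self._in_flight[source] += 1
            return True

    def finish(self, source):
        """Record that one unit of work from this source has completed."""
        with self._lock:
            if self._in_flight[source] == 0:
                raise RuntimeError("finish() called with nothing in flight")
            self._in_flight[source] -= 1

    def total_in_flight(self):
        with self._lock:
            return sum(self._in_flight.values())


# Assumed example: bulk EDI work capped at 2, priority orders capped at 3.
throttle = SourceThrottle({"edi_bulk": 2, "order_priority": 3})
admitted = [throttle.try_start("edi_bulk") for _ in range(3)]
print(admitted, throttle.try_start("order_priority"))
```

The third bulk request is turned away while the priority request still gets in, which is the point: the low priority source has to wait at its own door instead of occupying server threads.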

Most excellent description! Very informative.

Mark,

Have you found a utility to help analyze IBM JVM core dumps? I’ll dig around on DevWorks as well.

Mark C

Mark,

On AIX you have a text format ‘javacore’ (a sort of enhanced thread dump), a heapdump, and the AIX ‘native core’ that is equivalent to what you read with dbx on any unix system. In addition, with appropriate JVM properties and environment variables set, you can (and should) produce verbose garbage collector logs when this type of situation occurs.

You can identify IS threads from the text format javacore by looking for entries with identifier ‘2LKFLATMON’. This should be the same count you would get from: ps -lm -p $PID | wc -l, where $PID = the process id of the JVM running the IS. Looking at these is important because for multiple reasons, we cannot trust the thread counts displayed on the IS server statistics page. You can use IBM tools to analyze the heap, but if your verbose gc logging shows an out of memory failure when there was significant heap space available, you can probably assume there is a fragmentation problem.
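If you want to script the javacore count, something along these lines should do in Python. It is a minimal sketch: the function name is mine, and the sample text below is made-up placeholder content rather than real javacore output, so only the ‘2LKFLATMON’ tag itself comes from the description above.

```python
def count_tagged_entries(javacore_text, tag="2LKFLATMON"):
    """Count lines in a javacore dump that begin with the given tag."""
    return sum(
        1
        for line in javacore_text.splitlines()
        if line.lstrip().startswith(tag)
    )


# Placeholder text only; a real javacore has far more detail on each line.
sample = """\
SOMEOTHERTAG   placeholder header line
2LKFLATMON     placeholder entry A
2LKFLATMON     placeholder entry B
SOMEOTHERTAG   placeholder line in between
2LKFLATMON     placeholder entry C
"""
print(count_tagged_entries(sample))
```

For a real dump, read the file contents into a string and pass that in, then compare the result against the ps count.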

The native cores have come into play when we’ve had the JVM die due to JIT compiler bugs. Needless to say, mere mortals need assistance from IBM to analyze this.

For those with curiosity and a lot of patience, much detail is available in the IBM Java 1.4.2 diagnostic guide:

http://download.boulder.ibm.com/ibmdl/pub/software/dw/jdk/diagnosis/diag142.pdf

Let’s just say I eagerly await webMethods support for Java 1.5.

Mark R.

Here’s a related thread based on what I found from analysing core dump text files from some recent issues. Turns out that there are some known issues with the IBM 1.4.2 JITC on Windows when IS was started as an NT service.

Mark