Hello community users,
I would like to ask for your assistance and thoughts, based on your experience and knowledge. Our development environments run on a VM with Integration Server 7.1.3 installed (I can provide more details if needed), on Windows Server 2003 with Java version 1.5.0_15 and 12 GB of RAM. The main purpose is file integration, with conversion and communication with file servers (about 15 GB of data and about 50,000 transactions each day).
At the same time, we have a dedicated production server running Windows Server 2008 (with 96 GB of RAM).
Over the last year, we have periodically been experiencing low performance on the non-production servers.
Hence the question: where should I start looking? Are there specific rules that indicate what server hardware is needed to support 6 simultaneous development activities on IS together with such a load?
You can start by looking into some important IS environment settings, such as the Java max memory (-Xmx) in setenv.bat/sh, the total threads allocated to IS under Resources in the IS admin page, and the number of cores assigned to your non-prod VMs.
Can you also describe what you mean by low performance? Without knowing much else, I can almost guarantee that your issue is related to heavy memory usage, leading to long garbage collection pauses. Knowing your max heap (i.e. -Xmx as suggested by Prasad) will be a good starting point for us to send you down the right path.
Hello Prasad Pokala and Percio Castro,
Here are some details gathered during the weekend (when usage of the development environment is low):
- Instructions from server.bat:
set JMX_PARAM=-Dcom.sun.management.jmxremote.port=11556 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false
set JAVA_GC_LOG=-Xloggc:D:/webMethods712/IntegrationServer/logs/gc.log -XX:+PrintVMOptions -XX:+PrintGCDetails -XX:+PrintTenuringDistribution -XX:+PrintGCTimeStamps -verbose:gc -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=D:/webMethods7/IntegrationServer/logs/java.heapdump.hprof
set JAVA_GC_PARAM=-XX:-TraceClassUnloading -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=20000 -XX:NewSize=512M -XX:MaxNewSize=512M -XX:+CMSClassUnloadingEnabled -XX:+CMSIncrementalMode -XX:CMSMarkStackSize=8M -XX:CMSMarkStackSizeMax=32M -XX:+ParallelRefProcEnabled -XX:+CMSClassUnloadingEnabled -XX:+DisableExplicitGC
Please see Capture.JPG
I will be able to provide the Heap, CPU and GC Monitor graphs later this week (since the lowest performance is experienced during business hours).
I’m also planning to discuss this thread with our internal hardware engineer, in order to assess the performance of the VM’s current underlying hardware and the upgrade plans.
There are many JVM tuning parameters with different values. My first question would be: were these added for specific reasons and verified on a running system to confirm that they really help, or were they added as a bunch on the assumption that all of them would help?
The maximum memory allocated to the Java process is 9 GB. Even though CMS is enabled, I see from the IS admin screenshot that memory usage is close to 9 GB all the time. So what is occupying the memory and not getting released? Do you see CMS activity where it is trying to do minor GCs? Do you have the stats.log file on your system? You could share that as well.
Hello Senthilkumar G,
Please find the stats.log file attached. Later today, in the evening (17:00 CET), I will provide the monitoring screenshots and a new stats.log file.
These parameters were added some time ago and were verified. I only changed the JAVA_MIN_MEM and JAVA_MAX_MEM values when the VM was upgraded.
I’m sorry for my lack of knowledge, but could you please tell me how to trace CMS activity?
workingDay_morning_stats.log (69.7 KB)
Indeed, as Senthilkumar indicated, the number of JVM tuning parameters is mind boggling. My guess is that a previous engineer ran into the same issue you’re running into and instead of digging to find the root cause, they attempted to “solve” the problem by copying-and-pasting all these JVM options from a Google search or from previous experience, and neither is very helpful. In fact, making so many changes can often have a detrimental effect. Once we figure out what the root cause is, we can work on cleaning these up.
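As a starting point for that clean-up, here is a minimal Python sketch (my own illustration, not something taken from the server) that reads the JAVA_GC_PARAM string posted above, lists any flag that appears more than once, and estimates the nominal survivor space size implied by SurvivorRatio and NewSize. The helper names are made up for the example. The survivor calculation assumes the usual HotSpot definition of SurvivorRatio as eden size divided by the size of one survivor space; the real JVM also applies alignment and minimum sizes, so treat the number as a rough indication only.

```python
import re
from collections import Counter

# Copied from the JAVA_GC_PARAM line posted earlier in the thread.
JAVA_GC_PARAM = (
    "-XX:-TraceClassUnloading -XX:+UseParNewGC -XX:+UseConcMarkSweepGC "
    "-XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=20000 -XX:NewSize=512M "
    "-XX:MaxNewSize=512M -XX:+CMSClassUnloadingEnabled -XX:+CMSIncrementalMode "
    "-XX:CMSMarkStackSize=8M -XX:CMSMarkStackSizeMax=32M -XX:+ParallelRefProcEnabled "
    "-XX:+CMSClassUnloadingEnabled -XX:+DisableExplicitGC"
)

UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def duplicate_flags(params):
    """Return the flags that occur more than once, in sorted order."""
    counts = Counter(params.split())
    return sorted(flag for flag, n in counts.items() if n > 1)


def option_value(params, name):
    """Return the numeric value of -XX:<name>=<n>[K|M|G], or None if absent."""
    pattern = r"-XX:" + re.escape(name) + r"=(\d+)([KMG]?)(?=\s|$)"
    match = re.search(pattern, params)
    if match is None:
        return None
    return int(match.group(1)) * UNITS[match.group(2)]


def nominal_survivor_bytes(young_bytes, survivor_ratio):
    """Assumed split: young generation = eden + 2 survivors, eden = ratio * survivor."""
    return young_bytes // (survivor_ratio + 2)


if __name__ == "__main__":
    print("Duplicated flags:", duplicate_flags(JAVA_GC_PARAM))
    young = option_value(JAVA_GC_PARAM, "NewSize")
    ratio = option_value(JAVA_GC_PARAM, "SurvivorRatio")
    print("Nominal survivor space (bytes):", nominal_survivor_bytes(young, ratio))
```

Run against the posted settings, this reports -XX:+CMSClassUnloadingEnabled as listed twice and a nominal survivor space of roughly 26 KB. If that estimate is anywhere near reality, nearly everything that survives a minor collection would be promoted straight into the old generation, which is worth keeping in mind when reading the heap graphs.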
Now, these settings further confirm my previous suspicion that your issue is likely due to poor memory use, leading to long garbage collection pauses. The good news is that, based on your memory graph, your memory usage is high all the time, meaning we don’t have to wait for a specific, and perhaps elusive, event to figure out what’s consuming so much memory. In order to figure out what is consuming so much memory, you typically want to take a heap dump, using a tool like jmap. You can then analyze the heap dump with a tool like MAT (http://eclipse.org/mat) to determine the culprit(s). You may actually already have a heap dump lying around, since you’re passing the HeapDumpOnOutOfMemoryError and HeapDumpPath options to the server.
The bad news is that your application seems to use a LOT of memory all the time, so it’s going to take a long time to generate the heap dump (if you don’t have one already) and longer for MAT to parse it. You may want to consider reproducing the issue in a test environment with a smaller heap size so you can work with a smaller and more manageable heap dump. Of course, don’t make it too small, otherwise it will be more difficult to identify the culprit.
By the way, it looks like you have GC logging configured so if you’d like to confirm that GC is behind some of the performance issues you’re seeing, you can look at the file webMethods712/IntegrationServer/logs/gc.log. If you need help making sense of the file, there are tools out there to help you analyze GC logs.
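If you want a quick number out of gc.log before reaching for a dedicated tool, something along these lines might do. It is a rough Python sketch that assumes single-line records of the general shape produced by -XX:+PrintGCDetails together with -XX:+PrintGCTimeStamps. The sample lines are invented for illustration, and with -XX:+PrintTenuringDistribution enabled (as in your settings) a record can be spread over several lines, so the pattern would need adapting to your actual file.

```python
import re

# Matches records that start with an uptime stamp followed by "[GC" or "[Full GC"
# and end with "<seconds> secs]". Concurrent CMS phase lines such as
# "[CMS-concurrent-mark: ...]" are skipped on purpose: they are not pauses.
PAUSE_LINE = re.compile(r"^(\d+\.\d+): \[(?:Full )?GC\b.*?(\d+\.\d+) secs\]\s*$")


def gc_overhead(lines):
    """Return (total_pause_secs, elapsed_secs, pause_fraction) for the given lines."""
    pauses = []
    for line in lines:
        match = PAUSE_LINE.match(line.strip())
        if match:
            pauses.append((float(match.group(1)), float(match.group(2))))
    if len(pauses) < 2:
        return 0.0, 0.0, 0.0
    total = sum(duration for _, duration in pauses)
    # Elapsed time from the start of the first pause to the end of the last one.
    elapsed = pauses[-1][0] - pauses[0][0] + pauses[-1][1]
    return total, elapsed, total / elapsed


# Invented sample records, only to show the expected input shape.
SAMPLE = [
    "10.000: [GC 10.000: [ParNew: 400000K->1000K(500000K), 0.5000000 secs] "
    "8000000K->7900000K(9000000K), 0.5000000 secs]",
    "10.700: [CMS-concurrent-mark: 0.300/0.400 secs]",
    "12.000: [Full GC 12.000: [CMS: 7900000K->7800000K(8500000K), 1.5000000 secs] "
    "8000000K->7800000K(9000000K), 1.5000000 secs]",
]

if __name__ == "__main__":
    total, elapsed, fraction = gc_overhead(SAMPLE)
    print("paused %.1f s out of %.1f s (%.0f%%)" % (total, elapsed, fraction * 100))
```

A pause fraction that stays high during business hours would support the idea that GC is behind the freezes; a low one would point elsewhere.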
Looking at the stats log, below are the observations…
- Thread usage is at its maximum. The maximum number of threads defined looks to be around 240. Almost all the time, 220+ threads were in use.
- The maximum memory defined is 9 GB. Free memory was only between 200 MB and 1 GB. Almost all the time, 8 GB+ of memory is utilized.
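To put those two observations into percentages, here is a small Python calculation. It only uses the figures quoted above (240 threads, 220 in use, 9 GB heap, 200 MB to 1 GB free); treating 9 GB as 9 x 1024 MB and 1 GB as 1024 MB is my assumption, and the helper name is made up.

```python
def utilization_pct(used, capacity):
    """Percentage of capacity in use, rounded to one decimal place."""
    return round(100.0 * used / capacity, 1)


MAX_THREADS = 240
THREADS_IN_USE = 220           # "220+" in the stats log, so this is a lower bound
MAX_HEAP_MB = 9 * 1024         # assumes 9 GB means 9 x 1024 MB
FREE_MB_BEST = 1024            # most free memory observed (1 GB)
FREE_MB_WORST = 200            # least free memory observed

thread_pct = utilization_pct(THREADS_IN_USE, MAX_THREADS)
heap_pct_best = utilization_pct(MAX_HEAP_MB - FREE_MB_BEST, MAX_HEAP_MB)
heap_pct_worst = utilization_pct(MAX_HEAP_MB - FREE_MB_WORST, MAX_HEAP_MB)

if __name__ == "__main__":
    print("Thread pool in use: at least %.1f%%" % thread_pct)
    print("Heap in use: %.1f%% to %.1f%%" % (heap_pct_best, heap_pct_worst))
```

In other words, both the thread pool and the heap are running above roughly 89% of their limits nearly all the time, which leaves very little headroom for peaks.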
What kind of requests are being handled by this system? Looking at the number of sessions created and the requests handled, it appears that mostly HTTP requests are being handled by this system.
The next steps should be:
- share the gc.log to see if any GC activities were happening or not
- analyze the data that is occupied in memory. Percio has already explained what to do with memory analyzer.
- how does the CPU utilization look? What is the number of cores? Are there any other Java processes running on that machine (like MWS), or is it only Integration Server?
If you can reproduce the same load and memory usage in lower environments, it would be good to take a step-by-step approach. I would first remove all the JVM parameters (after analyzing the heap dump and gc.log) and add back only a few parameters, like CMS, to start with. Unless a parameter is demonstrably helpful, it need not be added to the JVM.
Also, the number of requests received by the system peaks only between 7:30 and 9:00. Session usage is also high during this period. I wonder what keeps the threads in use all the while, and what is occupying memory all the while. Is any static data being loaded into memory?
While working on this, you should also involve your OS admin team to establish the facts around the issue. There are many possible reasons for high memory usage. You should analyze everything carefully before taking any further action.
I would say, not really…
When we talk about high memory being used, this memory is consumed by the Java process (Integration Server) that is running on the operating system. The OS will not try to do anything with the memory allocated to the Java process. Whatever happens inside that allocated memory is done by IS, so we need to analyze the ‘high memory’ being consumed all the time from the IS end, not from the OS end.
Dear community users,
I would like to say a big thank you to you all. I appreciate the help and suggestions provided above. After my vacation over the last few weeks, I did a couple of checks and settled on the following solution:
- The VM’s underlying hardware was upgraded from “Intel Family 6 Model 15 Stepping 7” to “Intel Family 6 Model 63 Stepping 2”. The difference is obvious: GC now consumes no more than 20% of CPU (instead of 100%). No more freezes are experienced. Six developers are working efficiently while the standard data flow is processed on the same environment at the same time.
- We are planning to add an additional 4 GB of RAM, since heap usage is too high.
Glad to know that the major issues are resolved.
CPU wise, this looks to be fine…
Memory wise, adding more RAM to the machine and allocating that additional memory to the Java process (IS) is one option. However, if the Java process has some flaw, or some implementation has an issue that keeps adding objects to memory and consumes memory gradually, the problem is only postponed. If you can make sure that GC is clearing objects from memory, then you are all set. A simple way to check this is to use Oracle Java VisualVM and connect to Integration Server through the JMX port. If you notice a ‘see saw’ pattern in memory usage over a period of time, all is well… Sample screenshot attached…
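If you have a series of heap usage samples, wherever they come from, the same ‘see saw’ check can be approximated in a few lines. This is only a sketch in Python with invented sample values and made-up function names: it takes the low point after each drop as the post-GC floor and looks at whether that floor stays level or keeps climbing.

```python
def post_gc_floors(samples):
    """Return the local minima of the series, i.e. the value right after each drop."""
    floors = []
    for prev, cur, nxt in zip(samples, samples[1:], samples[2:]):
        if cur < prev and cur <= nxt:
            floors.append(cur)
    return floors


def floor_growth(samples):
    """Difference between the last and first post-GC floor, or None if unknown."""
    floors = post_gc_floors(samples)
    if len(floors) < 2:
        return None
    return floors[-1] - floors[0]


# Invented heap usage samples in GB, only to illustrate the two shapes.
HEALTHY = [2, 4, 6, 2, 4, 6, 2, 4, 6, 2, 4]   # floor stays at 2: GC reclaims memory
LEAKING = [2, 4, 6, 3, 5, 7, 4, 6, 8, 5, 7]   # floor climbs 3, 4, 5: memory is retained

if __name__ == "__main__":
    print("healthy floor growth:", floor_growth(HEALTHY))
    print("leaking floor growth:", floor_growth(LEAKING))
```

A floor that stays flat matches the healthy pattern described above, while a floor that rises over hours or days is the sign that adding RAM would only postpone the problem.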
The reason for the high heap memory usage is probably related to the huge number of packages and services compiled during server start. In the production environment, 36 GB is already used right after server start (during business hours heap usage grows to 60-70 GB). Our project is mainly focused on processing batches of data files (with conversion and application of business logic). Each day we have at least 50k files as input and at least 20 GB of plain-text data files as output.
Our codebase contains more than 600 packages (approx. 24,000 separate services), and 591,434 requests are handled per minute.
Wow! 600 packages is insane and impressive all at once. I am working with a customer that has been using webMethods since the early 2000s. I just looked at their version control system and they have 250+ packages, but many of them are no longer in Production, and these 250 packages are not all deployed to the same servers (they have 3 logical webMethods environments). The fact that you guys have 600 packages is definitely mind-boggling.
Now, when you said “36GB already used”, what were you referring to? It can’t be memory so I’m a bit confused.
Here are a couple of thoughts:
I too agree with Senthil that increasing the heap size will likely just delay the problem. If you have a memory leak, it will be just a matter of time before you run out of memory again. I still believe the exercises we suggested here, like taking heap dumps and using visualVM, would give you great insight into what the real problem is.
Regardless of whether the problem is due to a memory leak or due to the number of packages, one thing that you should definitely consider is breaking up the packages into separate environments. In other words, create additional Integration Servers and deploy certain packages to one set of servers and some other packages to another set of servers. With 600 packages, you could even break it up into 3 or more sets of servers. There are several benefits to doing this: faster start-up times, smaller heap sizes (and therefore quicker garbage collection), the ability to more easily pinpoint offending applications in situations like these, ensuring that offending applications don’t impact all other applications, etc.
Yes, having 600+ packages is a really high number, and you should plan to split out the interface-specific packages and move them to other IS nodes (which should handle clustering too); there are many advantages, as Percio mentioned.
Just curious: what min/max heap size have you allocated for this IS? I am also assuming the nodes are clustered and use an external DB (Oracle)?
Thanks for the suggestion. Currently each environment already runs as a cluster consisting of 2 nodes (leading and failover). Considering that the project is focused on batch processing of file data, and that we have customized schedules for source polling, splitting the code base looks quite risky from my perspective right now. Our application is designed neither as a real-time system nor as an event-based solution: every 4 minutes the server checks a huge number of file servers and folders for new files (about 60-70 internal and external servers, and about 2000 source locations). The MS SQL DB is located in the internal network, but runs on a separate server.
For the non-production environments, where we use VMs, I dedicated a maximum of 14 GB and a minimum of 12 GB to the heap. After the underlying hardware upgrade, which was completed last week, we noticed a significant performance increase. No more freezes.
I’m sorry about my lack of knowledge, but could you please explain in more detail how to take a heap dump using VisualVM in order to trace heap usage?