Integration Server high CPU usage

Hello,

We are in the middle of an upgrade project from version 7.1.3 to 10 (currently the latest). While analysing the current solution, we found that the IS service (running on Windows Server 2008) keeps the CPU at 80%-85% load all the time. Since this has become a showstopper for the upgrade project, could you please share best practices for profiling and identifying what is causing such high CPU load? Is there anything we should look at first?

Thank you,
Looking forward to your help and answers.

Hi Aleksandrs,

Can you describe your migration path? There is no officially supported direct path from 7.1.3 to 10.1 (the latest general release).

The latest version with a direct path from 7.1.3 is 9.5 SP1.
From there you need to migrate to 10.1, if this is possible.

See the Supported Upgrade Paths documentation for details.
Additionally, you will have to check the Built-In-Services Guides of all intermediate versions for deprecations and changes.

Regards,
Holger

In addition, plan to migrate from Broker to Universal Messaging (if Broker is used).

Hi,

You can request a free trial of Wrightia’s Service Profiler (https://www.wrightia.com/serviceprofiler) to analyse which of your services has the biggest impact on CPU usage.

The installation requires a server restart.

If that is not possible, check Service Usage to see which service is running most of the time, and take a thread dump to see which threads are running.

You can also analyse the logs for any relevant messages, and extract stats.log and parse it elsewhere (I used Excel) to see whether the IS is running GC too often.

You can also get a Diagnostics file from the IS and request an analysis from SAG.

Best regards,

Hi,

You can add some parameters to JAVA_OPTS to enable GC logging to a file.

In setenv.sh under profiles/IS_default/bin:
"-verbose:gc -Xloggc:/IntegrationServer/instances/default/logs/gc.log"

You can change the filename (here: gc.log) to any name you want.

Restart the IS after changing setenv.sh.
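
Once the log has collected some data, the pauses can be totalled with a small program. The following is only a minimal Java sketch, not an official tool. It assumes the classic HotSpot `-verbose:gc` line shape (`<uptime>: [GC ..., <pause> secs]`), which differs between JVM versions and collectors, so adjust the pattern to what your gc.log actually contains. The class name `GcLogSummary` is made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: summarises pause times from a -verbose:gc style log.
class GcLogSummary {
    // ASSUMPTION: lines look like
    //   "12.345: [GC 65536K->12345K(251392K), 0.0123456 secs]" or
    //   "12.345: [Full GC 200000K->150000K(251392K), 1.2345678 secs]".
    // Other JVM versions and collectors log differently.
    private static final Pattern LINE =
            Pattern.compile("^(\\d+\\.\\d+): \\[(Full GC|GC)\\b.*?, (\\d+\\.\\d+) secs\\]");

    int collections;       // all matched GC events
    int fullCollections;   // subset reported as "Full GC"
    double pauseSeconds;   // sum of the reported pause times
    double lastTimestamp;  // JVM uptime in seconds at the last matched event

    static GcLogSummary parse(List<String> lines) {
        GcLogSummary s = new GcLogSummary();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (!m.find()) {
                continue; // skip anything that does not match the assumed format
            }
            s.collections++;
            if ("Full GC".equals(m.group(2))) {
                s.fullCollections++;
            }
            s.pauseSeconds += Double.parseDouble(m.group(3));
            s.lastTimestamp = Double.parseDouble(m.group(1));
        }
        return s;
    }

    // Share of JVM uptime (up to the last logged event) spent in GC pauses.
    double overheadPercent() {
        return lastTimestamp > 0 ? 100.0 * pauseSeconds / lastTimestamp : 0.0;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java GcLogSummary <path-to-gc.log>");
            return;
        }
        GcLogSummary s = parse(Files.readAllLines(Paths.get(args[0])));
        System.out.printf("collections=%d full=%d pause=%.3fs overhead=%.2f%%%n",
                s.collections, s.fullCollections, s.pauseSeconds, s.overheadPercent());
    }
}
```

Frequent "Full GC" lines or a steadily high overhead figure would point towards the heap; what counts as high depends on your workload.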

Regards,
Holger

Hello Holger,

We are following the official path 7.1.3 → 9.5 → 10.1. Thank you for pointing out the deprecations and changes. The codebase analysis has been performed and found no issues along this upgrade path.

KR,
Alex

Hello,

According to our plan and platform, we don’t use Broker to its full extent, so the switch from Broker to UM is planned for the first 6 months after go-live on 10.1.

KR,
Alex

Hi Gerardo,

Thanks for the suggestion and the options. I really appreciate it. I will try the option that does not need a server restart, since our project has a high-availability requirement.

KR,
Alex

Hi Aleksandrs,

If no issues turned up during codebase analysis, you were lucky.

But you should plan a thorough regression test of your code to make sure everything works as expected.
We found some issues during our migration from 7.1 to 9.5 that were not detected during codebase analysis.

Regards,
Holger

Hej Holger,

Many thanks for your input. Would it be possible for you to briefly describe the issues you experienced? Just to share the experience.

KR,
Alex

Alex,

The trick to finding the root cause of CPU issues is to correlate information from the operating system to that of the JVM. By that I mean, you typically have to leverage tools/commands at the operating system level to identify the thread ID, within the JVM process, that is taking up most of the CPU. You can then generate a thread dump from the JVM and use that ID to find the specific thread that is causing the issue. Please note that the thread ID may be represented differently, so you may have to convert from decimal to hex, for example, to find the correct thread.
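
To make the decimal-to-hex step concrete, here is a small Java sketch. It assumes a HotSpot-style thread dump, where each thread header line carries the native thread ID in hex as `nid=0x...`, while the operating system tool reports the thread ID (TID) in decimal. `HotThreadFinder` and its method names are made up for illustration, and the sample dump lines in `main` are invented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical helper: maps a decimal OS thread ID to its thread dump entry.
class HotThreadFinder {
    // Converts the decimal TID from the OS tool to the hex form used in the dump.
    static String toNid(long osThreadId) {
        return "0x" + Long.toHexString(osThreadId);
    }

    // Returns the first dump line whose nid matches the given OS thread ID.
    // ASSUMPTION: header lines contain "nid=0x<hex>" as in HotSpot thread dumps.
    static Optional<String> findThreadHeader(List<String> dumpLines, long osThreadId) {
        // \b on both sides so that nid=0x1abc does not also match nid=0x1abcd
        Pattern nid = Pattern.compile("\\bnid=" + toNid(osThreadId) + "\\b");
        return dumpLines.stream()
                .filter(line -> nid.matcher(line).find())
                .findFirst();
    }

    public static void main(String[] args) {
        // Invented sample lines, only to show the lookup.
        List<String> dump = Arrays.asList(
                "\"Worker-1\" prio=6 tid=0x000000000a1b2c00 nid=0x1abcd runnable",
                "\"Worker-2\" prio=6 tid=0x000000000a1b3d00 nid=0x1abc runnable");
        long hotTid = 6844; // decimal TID as shown by the OS tool
        System.out.println(toNid(hotTid));
        System.out.println(findThreadHeader(dump, hotTid).orElse("no matching thread"));
    }
}
```

The stack trace printed under the matching header line in the real dump then shows what that thread was executing.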

For Windows operating systems, I have typically used Process Explorer (from Windows Sysinternals) for these types of tasks.

Having said this, in my professional experience, the root cause of most (not all) CPU issues related to webMethods tends to be insufficient memory allocation (i.e. the heap is not large enough). Similarly, it’s not uncommon to run into heap sizing issues during upgrades because of significant changes in the underlying software and, often, in the environment as well. For example, many customers use the opportunity to upgrade (or change) operating systems or to move from 32-bit to 64-bit architectures.

Given this, you may want to hook up VisualVM to the JVM (or turn on GC logging as suggested here) to determine whether the issue is memory. You could also just bump up the heap size and see if the problem goes away. If it turns out not to be memory-related and there’s a service behaving badly somewhere, then a profiler like the one suggested by Gerardo or Nibble Technologies’ Nanoscope (https://nibl.tech/nanoscope) could come in handy. Full disclosure: I work for Nibble Technologies. :)
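
If attaching VisualVM is not practical, the JVM’s standard management beans give a rough first number. The sketch below uses only `java.lang.management` and reports the share of uptime the current JVM has spent in garbage collection. It measures the JVM it runs in, so to say anything about the IS it would have to run inside the IS JVM (for example wrapped in a Java service, which is an assumption on my side and not shown here). `GcShare` is a made-up name, and for concurrent collectors the reported time is not purely pause time, so treat the result as an indication only.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: rough share of JVM uptime spent in garbage collection.
class GcShare {
    // Pure arithmetic, kept separate so it is easy to check.
    static double gcSharePercent(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return 100.0 * gcMillis / uptimeMillis;
    }

    // Sums the accumulated collection time over all collectors of this JVM.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC share of uptime: %.2f%%%n",
                gcSharePercent(totalGcMillis(), uptime));
    }
}
```

A share that stays high while the heap is close to its maximum would support the memory theory; a low share suggests looking at individual threads and services instead.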

Percio

Hi Alex,

We had to adjust the receive steps in our process models where a model uses several of them to join running instances.

Regarding OR joins, see 9-5-SP1_BPM_Process_Development_Help.pdf, page 125.

There are undocumented changes of behaviour, e.g. in pub.xml:xmlNodeToDocument for the parameter makeArrays in combination with the parameter keepDuplicates.

When using pub.prt.log:logCustomId, make sure that the input for the customId is filled. Otherwise the service will fail instead of using “NO VALUE” as was done in wM 7.1.

Not all default values for parameters mentioned in the various Built-In-Services guides are really handled as default values.
These need to be set manually sometimes.

If you are using the JDBC Adapter, remember to migrate the adapter services with the script provided by the new adapter version (from 6.5 [for wM 7.1] via 9.0 [possible for wM 9.5] to 9.10 [current version for wM 9.10 and newer]).

Regards,
Holger

Hello Percio,

Many thanks for the comprehensive answer. Starting this week, my team and I will look into this issue so that we can solve it before going live with 10.1.

KR,
Alex

Hi Holger,

For some reason, we don’t use business models in our project, but we have already noticed some issues with XML parsing and with wrapping content into XML. Mainly we ran into a namespace issue: the namespace had historically been passed incorrectly (hardcoded). Regarding JDBC, we didn’t find any issues at all.

KR,
Alex