BPO Performance Issues - Thread Dump

Hi All,
We are seeing performance issues with one of the BPO in production environment. We have done some tuning wrt server max thread pool size, Trigger threads max size and process audit and process engine pool size by increasing the pool size. The situation has improved very little. Dont know what is next to do…? Have observed few things

  1. On the service usage page we can see the below - wm.prt.dispatch:handleTransition(298) - Dont know what 298 refers to…? Is it that 298 transactions are waiting to get connection from ProcessEngine connection pool…? Can some one please explain this…?

  2. We have taken the thread dump of the server where the BPO is running. There are multiple entries as below.

“PRT6cfcbb70855d11e09940b1329a9ef116” daemon prio=1 tid=0x00002aaab404b9c0 nid=0x5bed in Object.wait() [0x000000006b39e000…0x000000006b39fd10]
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:474)
at com.wm.util.sync.LatchedSemaphore.semWait(LatchedSemaphore.java:54)
- locked <0x00002ab43f69b318> (a com.wm.util.sync.LatchedSemaphore)
at com.wm.util.sync.LatchedSemaphore.semWait(LatchedSemaphore.java:39)
- locked <0x00002ab43f69b318> (a com.wm.util.sync.LatchedSemaphore)
at com.wm.app.audit.AuditFactory.checkForSuspend(AuditFactory.java:948)
at com.wm.app.audit.AuditFactory.getPoolProducer(AuditFactory.java:808)
at com.wm.app.audit.AuditFactory._publishAudit(AuditFactory.java:628)
at com.wm.app.audit.impl.AuditRuntime.publishAudit(AuditRuntime.java:63)
at com.wm.app.prt.AuditLogging.publishAudit(AuditLogging.java:1065)
at com.wm.app.prt.AuditLogging.log(AuditLogging.java:850)
at com.wm.app.prt.AuditLogging.logTransition(AuditLogging.java:492)
at com.wm.app.prt.StepDispatcherAgent.sendTransition(StepDispatcherAgent.java:1961)
at com.wm.app.prt.StepDispatcherAgent.sendOutputEvents(StepDispatcherAgent.java:1863)
at com.wm.app.prt.StepDispatcherAgent.runStep(StepDispatcherAgent.java:1374)
at com.wm.app.prt.StepDispatcherAgent.execute(StepDispatcherAgent.java:1057)
at com.wm.app.prt.StepDispatcherAgent.synchRun(StepDispatcherAgent.java:143)
at com.wm.app.prt.StepDispatcher.startStep(StepDispatcher.java:1058)
at com.wm.app.prt.StepDispatcher.dispatch(StepDispatcher.java:1018)
at com.wm.app.prt.StepDispatcher.handleTransition(StepDispatcher.java:170)
at wm.prt.dispatch.handleTransition(dispatch.java:335)
at sun.reflect.GeneratedMethodAccessor41.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:592)
at com.wm.app.b2b.server.JavaService.baseInvoke(JavaService.java:403)
at com.wm.app.b2b.server.invoke.InvokeManager.process(InvokeManager.java:631)
at com.wm.app.b2b.server.util.tspace.ReservationProcessor.process(ReservationProcessor.java:40)
at com.wm.app.b2b.server.invoke.StatisticsProcessor.process(StatisticsProcessor.java:44)
at com.wm.app.b2b.server.invoke.ServiceCompletionImpl.process(ServiceCompletionImpl.java:241)
at com.wm.app.b2b.server.invoke.ValidateProcessor.process(ValidateProcessor.java:51)
at com.wm.app.b2b.server.ACLManager.process(ACLManager.java:228)
at com.wm.app.b2b.server.invoke.DispatchProcessor.process(DispatchProcessor.java:30)
at com.wm.app.b2b.server.AuditLogManager.process(AuditLogManager.java:624)
at com.wm.app.b2b.server.invoke.InvokeManager.invoke(InvokeManager.java:536)
at com.wm.app.b2b.server.invoke.InvokeManager.invoke(InvokeManager.java:339)
at com.wm.app.b2b.server.dispatcher.trigger.Trigger.invokeService(Trigger.java:331)
at com.wm.app.b2b.server.dispatcher.trigger.Trigger.processMessage(Trigger.java:233)
at com.wm.app.b2b.server.dispatcher.trigger.DefaultTriggerTaskHelper.process(DefaultTriggerTaskHelper.java:253)
at com.wm.app.b2b.server.dispatcher.trigger.TriggerTask.run(TriggerTask.java:282)
at com.wm.util.pool.PooledThread.run(PooledThread.java:118)
- locked <0x00002ab447060810> (a com.wm.app.b2b.server.TMPooledThread)
at java.lang.Thread.run(Thread.java:595)

Does the above mean there is any deadlock happening in PRT (process engine database). Is it a cause of concern…? If so, what could be done to avoid this. Do we need to further increase the JDBC pool size…?

Please explain the above thread dump example. If someone can provide some suggestion ASAP it would be very helpful. We are seeing issues with this BPO everyday in our production environment.

Thanks in Advance !

What are the steps of process model doing i.e… services associated with model steps? Do you see them taking more time to complete?
What are current pool size for process engine?
It would be difficult to find root cause by seeing only part of thread dump. Does this process model executes on single IS or over multiple IS?
Tweaking quality of service properties of process can help

Hi Shahid,

Thanks for your reply. I have attached the process file (renamed the file with.txt extension to attach, remove the .txt extension to view it) and the complete thread dump for reference.

We have observed that one model instance takes hours to complete. Have attached 1 sample request and as you can see there is a lot of time taken to go from one step to another. Wonder whether database connection is a bottleneck looking into the thread dump file.

Current size of pool thread for ProcessEngine and ProcessAudit is set to 50 each. OS, IS and Java details are as follows

OS.Version 5.4
OS.Kernel.bits 64
CPU.type x86_64
HW.memory[GB] 3.56
OS.Name Red Hat Enterprise Linux Server
CPU.cores 1

Product webMethods Integration Server
Version 7.1.3.0
Updates PRT_7-1-2_Fix2
Build Number 50

SSL Strong (128-bit)

Java Version 1.5.0_22 (49.0)
Java VM Name Java HotSpot™ 64-Bit Server VM
Java Build Info 1.5.0_22-b03, mixed mode
Java Vendor Sun Microsystems Inc.
Java Data Model 64-bit

Audit related setting on IS is as follows -

watt.server.auditDBSize=10
watt.server.auditDir=audit
watt.server.auditGuaranteed=true
watt.server.auditLog=perSvc
watt.server.auditMaxFetchSize=10
watt.server.auditMaxPool=10
watt.server.auditMinPool=1
watt.server.auditRetryCount=3
watt.server.auditSync=false
watt.server.auditThreshold=100
watt.server.auditlog.error=true
watt.server.auditlog.gd=true
watt.server.auditlog.security=false
watt.server.auditlog.session=true
watt.server.event.audit.async=true
watt.server.extendedSettingsList=watt.orange.eai.OUK_PAYGSUBACT_Config;watt.orange.eai.OUK_PAYGSUBACT_NotepadMsg;watt.orange.eai.OUK_PAYGSUBACT_WselURIs;watt.server.auditMaxFetchSize;watt.server.auditMaxPool;
watt.server.mbean.auditPerCall=false
watt.server.trigger.suspendOnAuditErrorWhen=ErrorPipelineEnabled

The BPO is running on 2 Integration Servers in clustered environment.

Please let me know if you need further information.

Thanks in Advance !
PayGSubscriptionActivation.txt (10.4 KB)
ThreadDumpServer1.zip (79 KB)
ThreadDumpServer2.zip (85.7 KB)

Is optimize locally property for process is selected or cleared?
What activity does the service performs which is associated with long running process?

Hi Shahid,

Optimize locally is set to false. I think enabling this property consumes high memory. In one of the previous instance we have seen it worsens the performance due to high memory usage. None of the services are long running as you can see from the attached sample request. The issue is that it takes lots of time to move from 1 step to another.

Thanks