Optimized Thread at Integration Server - Performance

Fandy_Chandra · May 7, 2018, 6:10am

Good day all,

I have several problems with my IS (9.7).

Some transactions have become slow, and I need to find the cause.

The IS Resources are like this:
Available Threads 96 % (1920 Threads)
Maximum Threads 2000
Minimum Threads 100
Available Threads Warning Threshold 15 % (300 Threads)
Scheduler Thread Throttle 75 % (1500 Threads)
Scheduler Current Threads 0

The statistics show usage well below the available resources:
                    Current   Max
Total Sessions      173       320
Licensed Sessions   169       290
Stateful Sessions   30        37
Service Instances   98        360
Service Threads     14        113
System Threads      842       943

FYI, some of the slowness comes from local IS services such as pub.schema:validate.

So I decided to analyze the thread dump and got this:
TIMED_WAITING = 500
WAITING = 223
RUNNABLE = 120
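
For reference, state counts like these can be tallied from a saved thread dump with a short script. This is only a minimal sketch, assuming a jstack-style dump in which each thread carries a `java.lang.Thread.State: ...` line; the function name and the sample text are made up for illustration.

```python
import re
from collections import Counter

# Matches the state line that jstack-style dumps print under each thread header,
# e.g. "   java.lang.Thread.State: TIMED_WAITING (parking)".
STATE_LINE = re.compile(r"^\s*java\.lang\.Thread\.State:\s+([A-Z_]+)")


def count_thread_states(dump_text):
    """Return a Counter mapping each thread state to its number of threads."""
    counts = Counter()
    for line in dump_text.splitlines():
        match = STATE_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Tiny made-up sample; in practice, read the saved dump file instead.
sample_dump = '''
"worker-1" #11 prio=5
   java.lang.Thread.State: RUNNABLE
"worker-2" #12 prio=5
   java.lang.Thread.State: TIMED_WAITING (parking)
"worker-3" #13 prio=5
   java.lang.Thread.State: WAITING (on object monitor)
'''

for state, n in count_thread_states(sample_dump).most_common():
    print(f"{state} = {n}")
```

Threads in the dump without a state line (some JVM-internal threads) are simply not counted by this sketch.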

So my questions are:

Assuming we still have enough resources to process the threads, why are only 120 threads RUNNABLE?
What is the difference between the service threads and the system threads shown on the statistics page?
What other cases can cause slowness in a service (excluding calls to the backend)?

Best Regards

Holger_von_Thomsen · May 7, 2018, 2:17pm

Hi Fandy,

Having enough threads available also requires having enough JVM memory available.

Please estimate the size of the documents being validated to check how much JVM memory is required.

Please note that an IS running in 32-bit mode can only access up to 3.5 GB of JVM memory.
When more memory is required, you will have to switch to 64-bit mode.

The difference between service threads and system threads is expected, because the IS runs some internal tasks in parallel to the custom service tasks.

A maximum of 2000 threads sounds a bit high to me. We usually configure a maximum of 500 threads and mostly run our instances with 2 GB max JVM memory.

Regards,
Holger

rmg · May 7, 2018, 5:52pm

Please also check the JVM memory allocated to that IS, as Holger mentioned above. Most of the time either hung threads or the JVM will be the culprit, unless other transient or network issues were noticed during the peak load of your IS/OS that could lead to this kind of runtime slowness in your environment.

HTH,
RMG

Fandy_Chandra1 · May 8, 2018, 3:26am

Hi Holger and rmg,

Thanks for your replies.

The JVM is allocated 8 GB of memory and on average uses about 40% of it. We use 64-bit Windows Server 2012.
The document size is around 2 KB.

Some of the slowness comes from the network (when we hit the backend), and that is okay. What concerns me is the local service itself (wmUtil), which sometimes takes 2s or more.

Is there another way to check how well the IS is performing?

Best Regards

Holger_von_Thomsen · May 8, 2018, 3:05pm

Hi Fandy,

Did you stop and restart the Integration Server, just in case there are some blocked threads inside the JVM?

Are you sure that it is the service which is so slow, and not just the auditing subsystem in the Integration Server?
Even when auditing is asynchronous (using a local Derby DB as a buffer), it will take some IS threads when there is a lot of information to be written to the audit database (i.e. session log, error log, service log).

Regards,
Holger

rmg · May 8, 2018, 4:02pm

Yes, sometimes you have to account for slowness if auditing is enabled (service, session, security, etc.). During peak time you may see memory spikes, which is normal as long as the IS resources are not stuck and it keeps responding to requests as normal with a bit of latency.

HTH,
RMG

Fandy_Chandra · May 16, 2018, 12:26pm

Hi Holger & rmg,

No, I didn't restart the IS. The blocked threads are no longer there because I turned off the service.
I still don't know the actual root cause: whether the service design is poor or it is an external factor.

Under normal conditions the end-to-end transaction takes only 1s, but during the slowness it takes almost 20s.

The client has now installed a monitoring tool called Dynatrace (I'm not sure whether I can trust it), and it shows that the slowness comes from the service level. Please find attached some samples captured by Dynatrace.

As you can see from the screenshot (if I am reading it correctly):

Total execution time for one service call is about 19s
The response from the backend is very fast, only 134ms
Some hidden (unknown) method takes almost 14s; I don't know what it is, and it causes this error:
“[ISS.0088.9145] this SOAPEnvelope object does not contain a valid SOAPHeader object”. It is thrown as “com.wm.app.b2b.server.ServiceException”

Usually we just blame a slow response from the backend, but given what this monitoring tool shows, I need to investigate further whether there is a problem with the product.

Another thing: during peak hours we have almost 180k transactions per hour. Is that a huge volume, or something the IS can normally handle?
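
As a rough back-of-the-envelope check on that figure, 180k transactions per hour is 50 per second on average. By Little's law (average in-flight requests = arrival rate × average latency), that is about 50 concurrent requests at the normal 1s latency but about 1000 at 20s. A sketch of the arithmetic, assuming arrivals are spread evenly over the hour, which real peak traffic will not be; the function name is made up for illustration.

```python
def in_flight(transactions_per_hour, latency_seconds):
    """Average concurrent requests by Little's law: L = arrival_rate * latency."""
    arrival_rate = transactions_per_hour / 3600.0  # transactions per second
    return arrival_rate * latency_seconds


peak = 180_000  # peak-hour volume quoted above

print(peak / 3600.0)        # average arrival rate per second
print(in_flight(peak, 1))   # in-flight requests at the normal 1s latency
print(in_flight(peak, 20))  # in-flight requests at the 20s seen during slowness
```

The point of the comparison is that the same volume needs roughly twenty times as many busy threads once latency degrades from 1s to 20s.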

Best Regards

Gerardo_Lisboa · May 17, 2018, 10:59am

Hi,

If you enable JMX monitoring (it needs a restart) you can get more details of the inner JVM threads and compare JVM CPU usage against system CPU usage.

Keep in mind, though, that another monitoring tool adds further performance impact.

Can you check whether system I/O is slowing your system? Maybe even swap file thrashing?

Another source of info is the amount of I/O on communications with UM/Broker, DB and Derby (the inner DB used for internal queue management).

Best regards,

Cong_Ngo_Dinh · August 24, 2018, 6:16am

Hi Fandy,
Can you share the status of this issue and how you resolved it?

Fandy_Chandra · August 24, 2018, 7:36am

Hi @Cong,

Yeah, we found out that the problem is not with the IS but with the Broker.
For each guaranteed document that we published, it took almost 3-5s waiting for the ack from the Broker.
Our Broker uses shared SAN storage for its data, and that storage had really bad disk read and write performance.
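
Slow synchronous writes of this kind can be checked independently of the Broker by timing small flushed writes on the volume in question. This is a rough sketch rather than a benchmark: the function name, block size, and sample count are placeholders, and it does not reproduce how the Broker itself writes its data.

```python
import os
import tempfile
import time


def fsync_latencies(directory, samples=20, block_size=4096):
    """Time `samples` small write+fsync cycles in `directory`; return seconds per cycle."""
    latencies = []
    payload = b"\0" * block_size
    # The temporary file is removed automatically when the context exits.
    with tempfile.NamedTemporaryFile(dir=directory) as tmp:
        for _ in range(samples):
            start = time.perf_counter()
            tmp.write(payload)
            tmp.flush()             # push Python's buffer to the OS
            os.fsync(tmp.fileno())  # ask the OS to push it to the storage device
            latencies.append(time.perf_counter() - start)
    return latencies


if __name__ == "__main__":
    # Placeholder path: point this at a directory on the volume under suspicion.
    results = fsync_latencies(tempfile.gettempdir())
    avg_ms = sum(results) / len(results) * 1000
    print(f"max {max(results) * 1000:.2f} ms, avg {avg_ms:.2f} ms")
```

Per-write latencies in the hundreds of milliseconds or more on the data volume would be consistent with multi-second acks for guaranteed documents.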

Cong_Ngo_Dinh · August 24, 2018, 8:35am

Thanks for sharing the status.
