Thread Pool Warning Threshold Exceeded

Forum Members,

I am facing an issue on a WM 9.7 PROD server running on AIX, where our prod server goes down after the thread pool threshold warnings below. Can you look into this exception? It looks suspicious: I suspect it is due to a hung service or a custom query run in MWS, which is connected to multiple ISs, one of which is the server that suddenly went down.

I searched the forum and found a couple of suggestions to increase the server thread pool in the IS resource settings, and I have increased the threads as well. But one SAG post left me unsure whether this happens because of IS or MWS, which is why I have opened this thread.

So can you please look into this exception and suggest which issue/component caused our PROD IS to go down?
If it is because of the MWS custom query, which I suspect, then

please guide me on how to increase threads in the MWS console/server, or tell me whether I need to increase the JVM size instead.


Available Thread Pool Warning Threshold Exceeded: 1% available
Available Thread Pool Warning Threshold Exceeded: 0% available
Error executing custom query for services com.wm.monitor.common.util.MonitorException: java.sql.SQLException: Could not create a connection to ISCoreAudit JDBC Pool Alias.
java.sql.SQLException: Could not create a connection to ISCoreAudit JDBC Pool Alias.
Error executing custom query for services com.wm.monitor.common.util.MonitorException: java.sql.SQLException: Could not create a connection to ISCoreAudit JDBC Pool Alias
Available Thread Pool Warning Threshold Exceeded: 12% available
Error executing custom query for services com.wm.monitor.common.util.MonitorException: java.sql.SQLException: Could not create a connection to ISCoreAudit JDBC Pool Alias.
[1589]
Established new remote connection to local for user userid
Established new remote connection to SSH for user userid
Established new remote connection to IS1 for user userid
Established new remote connection to IS2 for user userid
Established new remote connection to IS3 for user userid
Established new remote connection to One3 for user userid
Established new remote connection to Siebel for user userid
Soap Message Coder warning; unregistered coder for variable customData, using String
Soap Message Coder warning; unregistered coder for variable customData, using String
Soap Message Coder warning; unregistered coder for variable customData, using String
Expired remote connection to IS2 for user userid
Expired remote connection to One3 for user userid
Expired remote connection to IS4 for user userid
Expired remote connection to IS1 for user userid
Expired remote connection to SSH for user userid
Expired remote connection to Siebel for user userid
Expired remote connection to local for user userid
Soap Message Coder warning; unregistered coder for variable customData, using String
Soap Message Coder warning; unregistered coder for variable customData, using String
Soap Message Coder warning; unregistered coder for variable customData, using String
Soap Message Coder warning; unregistered coder for variable customData, using String
Soap Message Coder warning; unregistered coder for variable customData, using String
Soap Message Coder warning; unregistered coder for variable customData, using String
Soap Message Coder warning; unregistered coder for variable customData, using String
java.sql.SQLException: Could not create a connection to ISCoreAudit JDBC Pool Alias.
at com.wm.monitor.db.ISDataSource.getConnection(ISDataSource.java:49)

Can you tell me what your IS 9.7 fix levels are, and do you see any thread dumps in the install folders from when the IS was hung/not responding?

Also what’s the current setting on the JDBC Pool Alias definition?

Minimum Connections 0
Maximum Connections
Available Connections Warning Threshold
Waiting Thread Threshold Count

Also the current Resources page settings for these mainly:

Server Thread Pool
Available Threads
Maximum Threads
Minimum Threads

HTH,
RMG

Hi RMG,

On WM 9.7 the IS fix level is IS_9.7_Core_Fix3. I can’t go higher because I would need to justify it to my client with a strong reason, as it is a prod env.

Regarding the thread dump:
Sorry for the confusion, this IS is installed on Windows OS. Whatever suspicious logs were captured have been pasted in this thread; I didn’t enable crash dumps on the Windows OS.

Here is the ISCoreAudit JDBC Pool Alias definition:
Minimum Connections 0
Maximum Connections 10
Available Connections Warning Threshold 0 %
Waiting Thread Threshold Count 0
Idle Timeout 3300000 milliseconds

Resource settings:
Available Threads 84 % (84 Threads)
Maximum Threads 100
Minimum Threads 10
Available Threads Warning Threshold 15 % (15 Threads)
Scheduler Thread Throttle 75 % (75 Threads)
Scheduler Current Threads 0
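
For readers following the numbers: the settings above and the "Available Thread Pool Warning Threshold Exceeded: 12% available" log lines can be related with a little arithmetic. This is only a sketch. It assumes the warning fires when the free share of the pool drops below the configured threshold (the exact comparison IS uses is not stated in this thread), and the function names are made up for illustration.

```python
def available_percent(max_threads, threads_in_use):
    """Percentage of the server thread pool that is still free."""
    return 100.0 * (max_threads - threads_in_use) / max_threads


def warning_exceeded(max_threads, threads_in_use, warning_threshold_pct):
    """True when the free share of the pool is below the warning threshold.

    Assumption: a strict 'below threshold' comparison; IS may differ.
    """
    return available_percent(max_threads, threads_in_use) < warning_threshold_pct


# Values from the Resources page quoted above: 100 max threads, 15 % threshold.
MAX_THREADS = 100
WARNING_THRESHOLD_PCT = 15

for in_use in (16, 88, 100):
    print(in_use, available_percent(MAX_THREADS, in_use),
          warning_exceeded(MAX_THREADS, in_use, WARNING_THRESHOLD_PCT))
```

With 100 maximum threads, each thread is one percentage point, so the 1% and 0% lines in the log mean the pool was essentially exhausted at that moment.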

If you’re running out of threads, then the first step would be to take a thread dump and do some analysis there (e.g. do you see many instances of the same type of thread?) You can also look at your service usage to see if there are many instances of the same service running. Calling the service wm.server.admin:getDiagnosticData should return both pieces of information to you in a ZIP file. E.g.: http://localhost:9999/invoke/wm.server.admin:getDiagnosticData

The IS Admin guide talks about this service a bit.
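
To illustrate the kind of analysis I mean by "many instances of the same type of thread", here is a rough sketch that groups the threads in a dump by state and by top stack frame. It assumes a HotSpot-style text dump (quoted thread name, a `java.lang.Thread.State:` line, then `at ...` frames); an IBM javacore from AIX is laid out differently and would need a different parser. The thread names and the `example.*` frames in the sample are invented; only the `ISDataSource.getConnection` frame comes from the log posted earlier.

```python
import re
from collections import Counter

# Header of a thread in a HotSpot-style dump, e.g. "HTTP Handler 1" #31 daemon ...
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
STATE_LINE = re.compile(r'java\.lang\.Thread\.State:\s+(?P<state>\w+)')


def summarize_dump(text):
    """Return (threads per state, threads per top stack frame)."""
    states = Counter()
    top_frames = Counter()
    in_thread = False
    need_frame = False
    for line in text.splitlines():
        stripped = line.strip()
        if THREAD_HEADER.match(stripped):
            in_thread = True
            need_frame = True
            continue
        if not in_thread:
            continue
        state = STATE_LINE.search(stripped)
        if state:
            states[state.group("state")] += 1
        elif need_frame and stripped.startswith("at "):
            # Only the first frame after the header is the top of the stack.
            top_frames[stripped[3:]] += 1
            need_frame = False
        elif not stripped:
            in_thread = False  # a blank line ends the thread's block
    return states, top_frames


# Hypothetical sample, not taken from a real dump.
SAMPLE_DUMP = """
"HTTP Handler 1" #31 daemon prio=5
   java.lang.Thread.State: BLOCKED (on object monitor)
    at com.wm.monitor.db.ISDataSource.getConnection(ISDataSource.java:49)
    at example.Caller.run(Caller.java:10)

"HTTP Handler 2" #32 daemon prio=5
   java.lang.Thread.State: BLOCKED (on object monitor)
    at com.wm.monitor.db.ISDataSource.getConnection(ISDataSource.java:49)

"Scheduler" #33 daemon prio=5
   java.lang.Thread.State: RUNNABLE
    at example.Scheduler.tick(Scheduler.java:5)
"""

if __name__ == "__main__":
    states, top_frames = summarize_dump(SAMPLE_DUMP)
    print(states.most_common())
    print(top_frames.most_common(3))
```

If most threads share the same top frame, that frame is usually the place to start looking.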

Percio

Hi Percio,

My PROD server hung again; please find the exception below.

Any suggestions?

Error executing custom query for services com.wm.monitor.common.util.MonitorException: java.sql.SQLException: Could not create a connection to ISCoreAudit JDBC Pool Alias.
Soap Message Coder warning; unregistered coder for variable customData, using String
java.sql.SQLException: Could not create a connection to ISCoreAudit JDBC Pool A

Rajiv – Going by your error message, it seems the issue is with the DB. Did you check with the DB team whether any processes were locked, or whether any free connections to the DB were available, at the time you had this error?

Thanks,

Fine-tune the settings below in your JDBC pools:

Minimum Connections
Maximum Connections
Available Connections Warning Threshold
Waiting Thread Threshold Count
Idle Timeout

Hi Mahesh,

These are my current JDBC pool settings:

Minimum Connections 0
Maximum Connections 10
Available Connections Warning Threshold 0 %
Waiting Thread Threshold Count 0
Idle Timeout 90000 milliseconds

Can you suggest fine-tuned numbers for the JDBC pool?

Did you call getDiagnosticData like I suggested? The output of that service would tell you whether you were max’ing out your JDBC pool.

The error says “Could not create a connection to ISCoreAudit JDBC Pool A”. Try increasing Maximum Connections and monitor it for a week or two.

You may have to fine tune other params but we can start with this first.

Hi Rajiv,

please check the value for idle timeout, as it sounds a bit large to me.

Additionally, it is not an even multiple of minutes or hours, but something in between.

You might be hitting some network-based timeouts: the pool keeps the connection open longer than the network allows, so the network drops it before the pool's idle timeout expires.

When trying to use such a (stale) connection, this can lead to errors.
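
As a rough illustration of that check: the pool's idle timeout should stay below the shortest idle timeout enforced anywhere on the network path. The sketch below compares the two Idle Timeout values posted in this thread against a firewall timeout of 30 minutes, which is purely an assumed figure; the real value has to come from the network team. The function name is made up for this example.

```python
def stale_connection_risk(pool_idle_timeout_ms, network_idle_timeout_ms):
    """True if the pool may hold an idle connection longer than the network allows."""
    return pool_idle_timeout_ms >= network_idle_timeout_ms


# Assumption: firewall drops idle connections after 30 minutes (hypothetical value).
FIREWALL_IDLE_TIMEOUT_MS = 30 * 60 * 1000

# The two Idle Timeout values that were posted in this thread.
for idle_ms in (3_300_000, 90_000):
    print(idle_ms, stale_connection_risk(idle_ms, FIREWALL_IDLE_TIMEOUT_MS))
```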

Regards,
Holger

MR/Holger/Mahesh/Percio/RMG,

I have tried all the options you suggested, but I am still getting the “Could not create a connection to ISCoreAudit JDBC Pool” exception.

May I know where I can set these network-based timeouts?

Server which is not responding is my critical/client facing IS which is located in DMZ zone or i can say it’s a enterprise gateway for our INFRA

1> Increased server-level threads from 70 to 100
2> Increased pool size from max 10 to 20
3> Decreased idle timeout from 30k to 9000

But my prod IS is still not responding, and when I check the logs they show the above JDBC pool exceptions.

Any other suggestions?

Contact SAG support with the logs.

Rajiv – Can you check whether the ‘WMBATCH_DRIVER’ table exists in your wM product stack tables list?

Thanks,

Rajiv,

Since you mentioned my name on your post, I must reply and say that I respectfully disagree. I have not suggested any changes to your system. The only suggestion I made so far was for you to collect more information so you can do some root cause analysis, and as far as I can tell, that has not been done.

Don’t get me wrong, the suggestions made in this thread are all good, educated guesses based on the limited information that members have in front of them. However, you must do some root cause analysis to truly understand which suggestions apply, otherwise you are just shooting in the dark.

Not being able to connect to a database could very well be due to configuration in the IS, but it can also be due to a variety of different things, such as a bug, a JDBC driver and database mismatch, configuration on the database itself, or an issue with the network.

I left network for last on purpose because of your last, and very revealing, post, which said: “Server which is not responding is my critical/client facing IS which is located in DMZ zone or i can say it’s a enterprise gateway…”

Typically, servers out in the DMZ do NOT connect to databases for obvious reasons. For this reason, Enterprise Gateways (a.k.a. Reverse Invoke a.k.a. Reverse HTTP Gateways) are normally configured to use local DB. If you MUST connect to a database from this server, you should work closely with the security and networking teams to ensure that all firewall rules are in place and all timeout settings on the IS are configured to match the firewall and database timeouts. Please note you may actually have to go through multiple firewalls to reach a database from the DMZ.

Again, this is yet another guess at what your issue may be. Please collect more information, do some root cause analysis, and partner up with the security, networking and database teams because they should be able to provide you with more insight into the problem.

Good luck,
Percio

Rajiv,
While you are doing the analysis, here are some thoughts…
This thread started with the ‘thread pool warning’ problem, but it has since picked up more details and other problems as well…

  1. Thread pool warning:
    This warning arises only when the configured server thread pool is not sufficient. I see from your message that you increased it from 70 to 100. This number is very small. You can safely increase it to a minimum of 200 or 250 threads. With only 70 or 100, IS by itself would use up most of them.

  2. How do you measure what your thread pool size should be:
    This depends on the type of implementation you have. If you have a lot of web services and a high number of concurrent requests (TPS), this number might need to be even higher. The number of triggers, the connection count of the triggers, the minimum and maximum pool sizes, etc., would be the basis for this calculation.

  3. Are you seeing this exception in IS which is located in ‘DMZ’ environment?
    As Percio mentioned, connecting to a database from an IS located in the DMZ is not recommended at all. Percio has given more details. It is clearly a security threat, and the security and network teams will not allow such configurations to be put in place.
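
For point 2, a back-of-the-envelope version of the calculation could look like the sketch below. To be clear, this is not an official SAG formula: the inputs, the simple summation, and the 25 % headroom are all assumptions for illustration, and the sample numbers are invented. Real values should come from your own trigger configuration and observed load.

```python
import math


def estimate_max_threads(peak_concurrent_requests, trigger_max_threads,
                         other_threads=0, headroom=0.25):
    """Rough upper bound for the server thread pool size.

    peak_concurrent_requests: expected simultaneous service invocations
    trigger_max_threads: max execution threads of each trigger
    other_threads: scheduler tasks, internal IS work, etc. (assumed figure)
    headroom: safety margin on top of the summed demand (assumed 25 %)
    """
    demand = peak_concurrent_requests + sum(trigger_max_threads) + other_threads
    return math.ceil(demand * (1 + headroom))


# Hypothetical workload: 120 concurrent requests, three triggers, 20 other threads.
print(estimate_max_threads(120, [10, 10, 20], other_threads=20))
```

Treat the result only as a starting point and then confirm it against the Resources page under real load.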

Regards
Senthil

Thanks Senthil

  1. I have increased my server thread pool from 100 to 250 threads.

  2. Apologies for creating confusion here: the server which is hanging/crashing due to the JDBC pool issue is not our DMZ or enterprise gateway server but a normal PROD IS which resides inside the internal firewall of our INFRA.

I will keep an eye on this server for a few weeks to check how it performs after increasing the server thread pool.

I will post the exception here if it crashes again.

Hi Rajiv,
Now with this server thread pool count, you can monitor your system and see whether the resources it is using are as expected (as I said before: triggers, connection count, pool size, etc.). If you see any threads stuck or blocked, you need to analyze why they are blocked. A few thread dumps and diagnostic data taken at such times will help. If any thread takes a long time to complete, analyze whether that is expected or not. An appropriate expiry timeout should be set for any external calls (like DB, web services, REST services, send and wait), so that those resources are released rather than held for long if backends are not responding.

Regards
Senthil

Thanks Senthil.

Issue resolved for the time being by increasing the server thread pool.

Yep, that makes sense… and that is the first thing to do as part of the resource tuning settings. :smiley:

HTH,
RMG