weird error in Production environment

Hi Experts,

Recently we came across the issue below in Prod. Kindly look into it and shed some light on it.

Error Msg : Scheduler: Resources unavailable: [9119] Unable to allocate a new thread in the time specified
Envir : wM 8.2
The IS server has been running for the last 63 days, and for the last 2 days we have been seeing the error message above. We are unable to open the Administrator page, even in diagnostic mode, so we cannot check memory utilization, how many threads are currently running, or anything else.
Has anyone come across this situation, and if so, what steps did you follow to resolve it?

As we didn’t find any alternative, we went for a server restart.

Thanks,
Prabhakar

See if this helps;

Did you try the stats.log file? It holds the statistics even when the admin page is not available. You can parse it with the stats parser jar, which can be downloaded from the code samples (e.g. http://techcommunity.softwareag.com/ecosystem/communities/codesamples/webmethods/esb/SAMPLE-20130304115622257.html).

We had similar problems in the past, and still do. The root cause for us was some threads hanging for a long time, which also blocked OS resources. The problem was mostly on our test server, so we didn’t investigate much:

  1. when the admin page was accessible, we killed the hanging thread
  2. otherwise we restarted the server.

After the restart, change the thread settings under Settings → Resources.

Default is Minimum 10 Threads, Maximum 75 Threads.

When this message occurs, it means that all threads up to the maximum (75 by default) are busy and that the next request did not get a thread within the configured time.
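
To illustrate the mechanism, here is a minimal Java sketch of a bounded pool that gives up when no worker becomes free in time. It is only a toy model, not Integration Server code: the class `BoundedPoolSketch` and its methods are hypothetical names, and a semaphore stands in for the real scheduler.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Toy model of a bounded worker pool; hypothetical, not Integration Server code. */
class BoundedPoolSketch {
    private final Semaphore slots;

    BoundedPoolSketch(int maxThreads) {
        this.slots = new Semaphore(maxThreads);
    }

    /** Tries to claim a worker slot, giving up once the timeout has elapsed. */
    boolean tryAllocate(long timeoutMillis) {
        try {
            return slots.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag for the caller
            return false;
        }
    }

    /** Returns a slot to the pool when the work has finished. */
    void release() {
        slots.release();
    }

    public static void main(String[] args) {
        BoundedPoolSketch pool = new BoundedPoolSketch(75); // default maximum quoted above
        for (int i = 0; i < 75; i++) {
            pool.tryAllocate(0); // simulate 75 services that never return
        }
        // The next request waits for the timeout and then fails, analogous to error 9119.
        System.out.println("allocated while exhausted: " + pool.tryAllocate(100));
        pool.release(); // one hung service finishes or is killed
        System.out.println("allocated after release: " + pool.tryAllocate(100));
    }
}
```

The sketch also suggests why raising the maximum only buys time if threads never return: the pool fills up again unless the hanging services are found and fixed.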

Remark for using the StatsLogParser tool (which previously was available on Advantage [the old webMethods-Homepage]):
This tool only works if the JVM Max HeapSize does not exceed 2GB. Otherwise it will not be able to load the data.
Additionally, I am not sure whether it can be installed on 64-bit versions of Windows.

Regards,
Holger

R.P,

Yes, tuning the Resource settings mentioned above is the place to start.

Also, a side note: make sure you restart the system every 30+ days, especially in production (schedule a downtime window for weekend maintenance to keep the servers healthy and responsive, along with regular GC jobs). You may also need to check the JVM's heap dump to see what the server might be going through (assuming you are up to date with all the latest IS core fixes).

HTH,
RMG

Yes, as suggested by RMG, the best way to keep the server healthy is to restart it once a month.

We follow the same practice and take the maintenance window when there is very limited transaction flow.

Regards
Siva

RP,

Also, while the IS admin page was not accessible, did you check the Integration Server folder? It should have created a JVM heap dump and Java core files for further analysis.

HTH,
RMG

Hi RP,

My 2 cents:

  1. Check the JVM version. If possible, update to the latest stable release.
  2. Take a thread dump of the JVM and investigate it.
  3. Check the external resources such as the DB, IO operations and Adapters.
  4. Check CPU usage, RAM, disk space, etc.
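
As a starting point for investigating a thread dump, the following Java sketch counts threads per state in a dump saved to a text file. It assumes the HotSpot format (as written by `jstack` or `kill -3`), where each thread has a `java.lang.Thread.State: ...` line; other JVMs, such as IBM J9 javacore files, use a different layout. The class name `ThreadDumpSummary` is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Counts thread states in a HotSpot-style thread dump; hypothetical helper. */
class ThreadDumpSummary {
    // Matches lines such as "   java.lang.Thread.State: BLOCKED (on object monitor)".
    private static final Pattern STATE =
            Pattern.compile("java\\.lang\\.Thread\\.State: (\\w+)");

    static Map<String, Integer> countStates(String dump) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = STATE.matcher(dump);
        while (m.find()) {
            counts.merge(m.group(1), 1, Integer::sum); // one more thread in this state
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ThreadDumpSummary <thread-dump-file>");
            return;
        }
        String dump = new String(Files.readAllBytes(Paths.get(args[0])));
        for (Map.Entry<String, Integer> e : countStates(dump).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

A large share of BLOCKED or WAITING threads with similar stack traces would point to the resource they are all stuck on, for example a database connection or an adapter call.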

HTH.

Thanks,
Rankesh

Thanks all for your valuable inputs.

I checked the stats logs, but there is not much information in them. There is no issue with disk space, and the Unix system is in good shape. This is the first time we have had this issue. We usually restart the server every month, but this time we weren’t able to.

Thanks,
RP

I would like to add a few more points. Do a daily sanity check on all your IS servers and take the actions below:

1> You can reboot the IS if used memory is greater than 90% (you can see this on the Statistics page)
2> Check whether any currently running service has been running for more than 2 hours
3> You can restart the IS outside business hours (when there are few or no transactions)
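
The 90% memory rule can also be expressed as a small check. The Java sketch below uses the standard `MemoryMXBean` and reports on the JVM it runs in, so to be useful it would have to run inside the IS JVM (for example from a Java service) or be adapted to a remote JMX connection. The class `HeapCheck` and the way the percentage is computed are assumptions; the Statistics page may calculate its figure differently.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Heap usage check for the 90% rule of thumb; hypothetical helper. */
class HeapCheck {
    /** Used heap as a percentage of the given limit. */
    static double usedPercent(long usedBytes, long limitBytes) {
        return 100.0 * usedBytes / limitBytes;
    }

    /** True when usage is strictly above the threshold percentage. */
    static boolean isAboveThreshold(long usedBytes, long limitBytes, double thresholdPercent) {
        return usedPercent(usedBytes, limitBytes) > thresholdPercent;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getMax() returns -1 when no maximum is defined; fall back to committed memory.
        long limit = heap.getMax() > 0 ? heap.getMax() : heap.getCommitted();
        System.out.printf("heap used: %.1f%%%n", usedPercent(heap.getUsed(), limit));
        if (isAboveThreshold(heap.getUsed(), limit, 90.0)) {
            System.out.println("above 90%: consider scheduling a restart");
        }
    }
}
```

A single reading just before garbage collection can look worse than it is, so it is probably safer to act on several readings over time than on one sample.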

Thanks Mahesh for your 2 cents.

Thanks,
RP