Scheduled task stays 'Running' forever, but the service invoked by the scheduler is not running

Hello,

We have two clustered Integration Servers in Production, and sometimes a scheduled task that is targeted to run on a single node stays permanently in the ‘Running’ status. We have confirmed that the service invoked by the scheduler is NOT listed as a currently running instance under ‘Service Usage’ on the IS admin page.

This problem appears to happen randomly, with no IS shutdown or database outage involved.

The ‘Running’ status lasted for several days, and the scheduled task only went back to normal after Suspending/Enabling it.

System info is as follows:

Scheduled task configuration: Repeating Tasks With a Simple Interval, ‘Repeat after completion’ checked

Product webMethods Integration Server
Version 8.2.2.0
Updates IS_8.2_SP2_Core_Fix5
Build Number 228
SSL Strong (128-bit)

Clustering Status Enabled
Session Timeout 60 minutes
Cache Type Coherence
Time To Live 1 network segment

HW.memory[GB] 62.91
CPU.type x86_64
OS.Version 6.5
CPU.sockets 4
OS.Name Red Hat Enterprise Linux Server
CPU.threads 24
OS.Kernel.bits 64
CPU.cores 6

Java Version 1.6.0_27 (50.0)
Java VM Name Java HotSpot™ 64-Bit Server VM
Java Build Info 20.2-b06, mixed mode
Java Vendor Sun Microsystems Inc.
Java Home /opt/softwareag/jvm/jvm160/jre
Java Entrust Toolkit Version Entrust Authority™ Security Toolkit for the Java® Platform version 7.2 SP2 Patch 170072
Java Classpath /opt/softwareag/common/runtime/plugins/org.eclipse.equinox.launcher_1.1.0.v20100507.jar

Thanks in advance for your help.

The latest core fix for 8.2.2 is Fix12. Can you review the Readme of that fix, try upgrading to the latest, and see whether the symptoms go away?

Updates IS_8.2_SP2_Core_Fix12

HTH,
RMG

Are both of your instances running on the same box?

Also, what is the maximum connections setting on the IS JDBC pool of the cluster nodes? You may need to bump it up to 100 or 150, depending on the DB connection availability; this should help scheduler performance.

HTH,
RMG

rmg: At this moment it is not feasible for us to upgrade to the latest core fix, and the problem only happens in Production. If you know of some specific fix that could be related to our case, could you please refer me to it? (I’ve noticed KB Article ID: 1614329416, and although the description seems identical, the resolution applies to scheduled tasks running on ‘All Cluster Nodes’.)

Regarding the IS JDBC Pool, the number of maximum connections was somewhat low (10), we’ve increased it to 100, let’s see if that helps.

Tong Wang: Each instance runs on its own box.

Thank you both very much for your help.

Regarding the IS JDBC Pool, the number of maximum connections was somewhat low (10), we’ve increased it to 100, let’s see if that helps.

–> Oh, 10 is too low, and raising it should help (make sure you update all cluster nodes).

You are on Core Fix5, and I'm not sure which later fix would help, but I suggest going with the latest unless a specific workaround fix has been released. Let me also search Empower.

HTH,
RMG
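
To make the sizing point above concrete: raising the pool maximum on every cluster node multiplies the number of connections the database may be asked to accept. The following is only a minimal sketch of that arithmetic, assuming each node uses a single pool of the stated size; the class and method names are hypothetical and are not part of any Integration Server API.

```java
public class PoolSizing {
    /**
     * Worst-case number of database connections if every node fills its pool.
     * Assumption: one pool of identical size per node (illustrative simplification).
     */
    public static int worstCaseConnections(int clusterNodes, int maxConnectionsPerPool) {
        if (clusterNodes < 0 || maxConnectionsPerPool < 0) {
            throw new IllegalArgumentException("values must not be negative");
        }
        return clusterNodes * maxConnectionsPerPool;
    }

    public static void main(String[] args) {
        int nodes = 2; // two clustered Integration Servers, as described in this thread
        int before = worstCaseConnections(nodes, 10);  // original pool maximum
        int after = worstCaseConnections(nodes, 100);  // raised pool maximum
        System.out.println("Worst case before: " + before + ", after: " + after);
    }
}
```

The database's own connection limit needs to leave room for the raised figure, which is why the advice above ties the value to DB connection availability.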

In the IS admin UI, go to Settings > Resources and check the Server Thread Pool; you can increase it.
Also check the "Scheduler Thread Throttle" setting to give more threads to the scheduler.
HTH,

Normally you can increase the settings below; the scheduler is then allocated the Scheduler Thread Throttle percentage (75% by default) of the pool.

Server Thread Pool:
Maximum Threads
Minimum Threads

So you can try tuning it based on your IS’s load for better performance.

HTH,
RMG
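
As a rough illustration of the relationship described above, the scheduler's share of the Server Thread Pool can be estimated as the throttle percentage of Maximum Threads. This is only a sketch: the class and method names are hypothetical, the pool sizes are example values, and rounding down is an assumption rather than documented server behaviour.

```java
public class SchedulerThreads {
    /**
     * Estimated number of server threads the scheduler may use.
     * Assumption: the limit is throttlePercent of maxServerThreads, rounded down.
     */
    public static int schedulerThreadLimit(int maxServerThreads, int throttlePercent) {
        if (maxServerThreads < 0) {
            throw new IllegalArgumentException("maxServerThreads must not be negative");
        }
        if (throttlePercent < 0 || throttlePercent > 100) {
            throw new IllegalArgumentException("throttlePercent must be between 0 and 100");
        }
        return maxServerThreads * throttlePercent / 100;
    }

    public static void main(String[] args) {
        int throttle = 75; // default throttle percentage mentioned in this thread
        // Example pool sizes only; use the Maximum Threads value from your own server.
        System.out.println("Max 1000 threads: " + schedulerThreadLimit(1000, throttle));
        System.out.println("Max 1500 threads: " + schedulerThreadLimit(1500, throttle));
    }
}
```

Raising either Maximum Threads or the throttle percentage increases the estimate, so the two settings can be tuned together against the server's load.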

Hi Guys,

We also have the same problem: scheduler jobs show as ‘Running’, but no instance of the scheduled service is running on either of the clustered nodes. We configured the scheduled job to run on ‘any’ node of the cluster.
We are at WM 8.2.2 SP2 core fix 1. Is installing the latest fix the only option? I checked the IS JDBC pool and have 100 configured.
When the jobs go into the hung state, I see a message in the server log saying that one of the server nodes left the cluster and rejoined after a minute or so. Could this be the root cause of the issue, i.e. servers leaving the cluster and rejoining? Our cluster settings are exactly the same as the ones mentioned above in the thread.

I also have the questions below:

  1. Is there a service that can tell me whether a given service is currently running, as shown on the Service Usage page?
  2. If I have to bump ‘Max server threads’ in the Server Thread Pool under Resources, say from 1000 to 1500, what factors do I have to consider?

Any pointers to resolve the issue are much appreciated…

Thanks,
Venkat