The scheduler system is dead

We have run into a problem on our production machine.
The scheduler system is dead. We cannot open the Scheduler page in IS Administrator. We wrote a service to suspend the schedulers, and we found that the service hangs when it invokes pub.scheduler:getTaskIDs.

We also found a scheduled task that runs very frequently. That scheduled service hangs as well, and the machine's memory usage is higher than normal.

Wubinzi,

I believe the jobs.cnf file may have been corrupted. Delete this file, or copy it from your test environment and put it in prod: c:\WmHomedir\IntegrationServer\config\jobs.cnf

Restart your IS and recreate the schedulers that previously existed in prod.

Sorry, this might be a crazy resolution, but it is a fast one.

HTH,
RMG.

We have also seen this problem on one of our test servers. The other servers seem to be fine. We have a flow that suspends all of the schedules before we restart in the morning and it has been hanging as described in the first post. But on further investigation, all of the Simple/One-time schedules stopped running hours before the restart was kicked off.

We have noticed that once the Simple/One-time schedules stop running, we can no longer get to the Scheduler admin page, and the IS will not shut down if you try to restart it. However, any complex schedules that are set up continue to run normally.

At this point, I think I am going to try RMG's idea and recreate the schedules.

Thanks

Wubinzi/Rick,

As another option, report this problem to WM Tech Support, since this is a critical prod/test issue.

But I have never seen this behaviour, with Simple/One-time scheduled jobs hanging.

Please share the response with us if you get a better resolution from the WM Tech Support folks, so that wmusers can be aware of this in future.

Thanks,

I work with IS 4.6 on an AIX 5.2 system and I have encountered this problem several times. When the Scheduler page was unreachable, I tried to invoke the service that lists all the schedulers, but it ran for a very long time. In that case, two threads were victims of a deadlock; I was able to see it by sending a signal to the JVM so that it creates a javacore (kill -3 PID, where PID is the process ID of the IS).
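For anyone on a more recent JVM, the same deadlock check can be made from inside the process instead of reading a javacore. What follows is only a sketch under assumptions: it needs Java 8 or later (the management API used here does not exist on the 1.3.1 JVM mentioned above), the class name DeadlockProbe and the two "demo-job" threads are made up for illustration, and nothing in it is specific to IS.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical helper, not part of IS: asks the JVM which threads are deadlocked.
final class DeadlockProbe {

    /** Names of threads the JVM currently reports as deadlocked (empty if none). */
    static String[] deadlockedThreadNames() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null means no deadlock found
        if (ids == null) {
            return new String[0];
        }
        ThreadInfo[] infos = mx.getThreadInfo(ids);
        String[] names = new String[infos.length];
        for (int i = 0; i < infos.length; i++) {
            // An entry is null if the thread has died in the meantime.
            names[i] = infos[i] == null ? "(gone)" : infos[i].getThreadName();
        }
        return names;
    }

    /** Polls until a deadlock is reported or the timeout elapses. */
    static String[] waitForDeadlock(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        String[] names = deadlockedThreadNames();
        while (names.length == 0 && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            names = deadlockedThreadNames();
        }
        return names;
    }

    /** Demo only: two daemon threads take the same two locks in opposite order. */
    static void startDemoDeadlock() {
        Object lockA = new Object();
        Object lockB = new Object();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        Thread t1 = new Thread(() -> lockBoth(lockA, lockB, bothHoldFirst), "demo-job-1");
        Thread t2 = new Thread(() -> lockBoth(lockB, lockA, bothHoldFirst), "demo-job-2");
        t1.setDaemon(true); // daemon threads so the demo JVM can still exit
        t2.setDaemon(true);
        t1.start();
        t2.start();
    }

    private static void lockBoth(Object first, Object second, CountDownLatch bothHoldFirst) {
        synchronized (first) {
            bothHoldFirst.countDown();
            try {
                bothHoldFirst.await(); // wait until the other thread holds its first lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // never reached: each thread waits for the lock the other holds
            }
        }
    }

    public static void main(String[] args) {
        startDemoDeadlock();
        for (String name : waitForDeadlock(5000)) {
            System.out.println("deadlocked: " + name);
        }
    }
}
```

The demo threads only exist to give the probe something to find; on a real server you would call deadlockedThreadNames() alone, and a thread dump (kill -3) remains the more complete picture because it also shows the stack of each thread.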

The problem was solved by upgrading the IBM JVM from 1.3.1.11 to 1.3.1.15.

Well, I took the simple solution first before contacting support. I stopped the IS and deleted the jobs.cnf file. Then restarted and recreated all of the jobs. So far we have been running for 24 hours and our restart worked fine this morning. Prior to recreating the jobs, we were lucky if we could run 2 hours without the schedules getting messed up.
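A more cautious variation on this step is to move jobs.cnf aside instead of deleting it, so the old file can still be inspected or sent to support. The sketch below is only an illustration: the class name JobsFileBackup and the ".bak-<timestamp>" suffix are my own choices, the default path is the one quoted earlier in this thread, and it assumes the IS has already been stopped.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not shipped with IS: renames jobs.cnf instead of deleting it.
final class JobsFileBackup {

    /** Moves the given file to a timestamped sibling and returns the new path. */
    static Path moveAside(Path jobsFile) {
        // The suffix format is an arbitrary choice for this sketch.
        Path backup = jobsFile.resolveSibling(
                jobsFile.getFileName() + ".bak-" + System.currentTimeMillis());
        try {
            return Files.move(jobsFile, backup);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Default path is the one mentioned in this thread; pass your own as the first argument.
        Path jobs = Paths.get(args.length > 0
                ? args[0]
                : "c:\\WmHomedir\\IntegrationServer\\config\\jobs.cnf");
        System.out.println("Moved to " + moveAside(jobs));
    }
}
```

After the move, restart the IS and recreate the schedules as described above; the renamed file is not read by anything and can be removed once the server has been stable for a while.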

Thanks for the idea RMG

Depending on what version of IS you are running, the problem and the solution are of a different nature.
I have seen this problem in IS 4.x versions, and WM released a fix for it. The issue is that scheduler user jobs and system jobs were all running from the same thread pool. The thread pool is the key here! If your user job was coded in a funny way and hangs for any reason, it will continue restarting until it has used up all of the threads available to IS, and it will hang the scheduler and then the server. In later versions (4.6 to 6.0) the scheduler was separated into its own thread pool, but if one job uses up all of those threads your scheduler will still hang forever, although IS itself will keep running.
WM Fix#89 fixes this problem in 6.0.1 by allocating a separate pool for user threads only.
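To make the thread pool point concrete, here is a small stand-alone sketch using the standard java.util.concurrent executors. It is not how IS implements its scheduler; the class name PoolStarvationDemo, the pool sizes and the 500 ms timeout are all assumptions chosen for the demo. It submits user jobs that never finish and then checks whether a trivial "system" job still gets a thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustration only: generic executors standing in for the scheduler's thread pools.
final class PoolStarvationDemo {

    /** Submits hanging user jobs, then reports whether a system job completes in time. */
    static boolean systemJobRuns(ExecutorService userPool, ExecutorService systemPool,
                                 int hangingJobs, long timeoutMillis) {
        CountDownLatch never = new CountDownLatch(1); // never counted down: jobs hang
        for (int i = 0; i < hangingJobs; i++) {
            userPool.submit(() -> {
                try {
                    never.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // released by shutdownNow()
                }
            });
        }
        Future<?> systemJob = systemPool.submit(() -> { });
        try {
            systemJob.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // no free thread picked the job up in time
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            return false;
        }
    }

    /** User and system jobs share one pool of two threads. */
    static boolean sharedPoolCase() {
        ExecutorService shared = Executors.newFixedThreadPool(2);
        try {
            return systemJobRuns(shared, shared, 2, 500);
        } finally {
            shared.shutdownNow();
        }
    }

    /** User jobs get their own pool, so they cannot starve the system job. */
    static boolean separatePoolCase() {
        ExecutorService user = Executors.newFixedThreadPool(2);
        ExecutorService system = Executors.newFixedThreadPool(1);
        try {
            return systemJobRuns(user, system, 2, 500);
        } finally {
            user.shutdownNow();
            system.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("shared pool, system job ran: " + sharedPoolCase());
        System.out.println("separate pools, system job ran: " + separatePoolCase());
    }
}
```

With the shared pool, the two hanging user jobs hold both threads and the system job sits in the queue until the timeout; with a dedicated user pool, the system job runs straight away. That is the same idea as giving user threads their own pool, just reduced to a toy.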