Best Practice for Automatic Integration Server re-start/reco

Hello,
Any thoughts, suggestions, or guidance on how you handle automatic Integration Server restart/recovery at your installation (both dev and production)?
With the introduction of the lockfile, there is more than one way to initiate an automatic restart, and I wanted to know what the experience has been at your installation sites.
All comments and thoughts are highly appreciated.

Thanks
Ash

ashghogale@deloitte.com
aghogale@noblecorp.com

Actually, I would also like to hear best practices/experiences for Developer along with the IS.

Thanks again - Ash

I don't know if my response is what you're looking for, but…

We run our I.S. under Unix, and I have implemented the following procedure mainly to avoid starting the I.S. twice (which definitely makes a mess of the IS's repository and database).

If a problem is detected, the startup script aborts, and our operators have to follow a manual procedure.

So before actually starting the I.S., the script performs the following checks:

1/ Check whether a server.sh process is already running.
The first version checked for a Java process, but that wasn't safe because it's quite difficult to tell whether it's the I.S. or some other Java program.
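For illustration, check 1 could look something like the Python sketch below. It is not the script described above (which isn't shown); it assumes a POSIX `ps` that accepts `-eo args`, and `server_sh_running` and `list_cmdlines` are hypothetical names. Matching the basename of each argument against `server.sh` avoids the false positives of a plain "is there a java process" test.

```python
import subprocess


def server_sh_running(cmdlines, script_name="server.sh"):
    """Return True if any process command line runs the startup script.

    Matching on the script name rather than on "java" avoids mistaking
    an unrelated Java program for the Integration Server.
    """
    for line in cmdlines:
        # Compare the basename of each argument, so "/opt/.../bin/server.sh"
        # and "./server.sh" match but "myserver.sh" does not.
        for token in line.split():
            if token.rsplit("/", 1)[-1] == script_name:
                return True
    return False


def list_cmdlines():
    """Collect the full command line of every process (assumes POSIX ps)."""
    out = subprocess.run(
        ["ps", "-eo", "args"], capture_output=True, text=True, check=True
    ).stdout
    return out.splitlines()[1:]  # drop the header line
```

At startup you would call `server_sh_running(list_cmdlines())` and abort if it returns True. Keeping the parsing separate from the `ps` call makes the matching logic testable without a live process table.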

2/ Check whether the LOCKFILE is present. If so, the script aborts and the operators have to check whether another I.S. is running.
If not, they remove the file and rerun the startup script.

-> This wasn't enough because sometimes they removed the file even though the I.S. was there, mainly because the I.S. was unresponsive or still starting up.
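A minimal sketch of check 2, assuming the lockfile lives at a path you supply. The `/opt/IntegrationServer/LOCKFILE` default below is a hypothetical placeholder, not a documented location, and `lockfile_present` / `abort_if_locked` are hypothetical names. The script only reports and exits; deleting the file is left to an operator, as in the procedure above.

```python
import os
import sys

# Hypothetical placeholder: point this at wherever your I.S. writes its LOCKFILE.
LOCKFILE = "/opt/IntegrationServer/LOCKFILE"


def lockfile_present(path=LOCKFILE):
    """Return True if the lock file exists."""
    return os.path.exists(path)


def abort_if_locked(path=LOCKFILE):
    """Stop the startup script and leave the decision to an operator.

    The file is deliberately never deleted here: a lock that looks stale
    may belong to an I.S. that is unresponsive or still loading packages.
    """
    if lockfile_present(path):
        sys.stderr.write(f"{path} exists: check for a running I.S. before removing it\n")
        sys.exit(2)
```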

3/ Check whether the main admin port is already in use.

This is a very good safeguard, but only once the I.S. has loaded all its packages. Unfortunately, the I.S. only binds this port after loading everything, and since loading all our packages takes about 4 or 5 minutes, this check alone is not enough during that window.
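Check 3 can be approximated by attempting a TCP connection to the admin port. This sketch assumes the I.S. listens on the loopback interface (pass the real host if it binds elsewhere); `port_in_use` is a hypothetical helper, and the port number is whatever your admin port is configured to be. As noted above, a False result only means nothing is listening yet, not that no I.S. is loading packages.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=2.0):
    """Return True if something is already accepting connections on host:port.

    Caveat: the I.S. opens its admin port only after loading all packages,
    so False does NOT prove that no I.S. is starting up.
    """
    try:
        # A successful connect means some process is listening on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused or timed out: nothing is listening there.
        return False
```

Connecting, rather than trying to bind the port, avoids briefly grabbing the port yourself, which could collide with an I.S. that is just about to open it.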

Using these three sequential checks, we have prevented duplicate starts of the I.S. for the last 3 years.
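The three checks are meant to run in order, stopping at the first one that fires. A sketch of that driver, assuming each check is wrapped as a zero-argument callable that returns True when it sees a possibly running I.S. (`first_blocking_check` and `preflight_or_exit` are hypothetical names):

```python
import sys
from typing import Callable, Optional, Sequence, Tuple

# (human-readable name, callable returning True if an I.S. may be running)
Check = Tuple[str, Callable[[], bool]]


def first_blocking_check(checks: Sequence[Check]) -> Optional[str]:
    """Run checks in order and return the name of the first one that fires.

    Later checks are skipped once one fires, mirroring a startup script
    that aborts on the first problem it finds.
    """
    for name, detects_running_is in checks:
        if detects_running_is():
            return name
    return None


def preflight_or_exit(checks: Sequence[Check]) -> None:
    """Exit non-zero, so operators are alerted, if any check fires."""
    blocker = first_blocking_check(checks)
    if blocker is not None:
        sys.stderr.write(f"not starting I.S.: {blocker} check failed\n")
        sys.exit(1)
```

In a real startup script the list would hold the process, LOCKFILE, and admin-port checks in that order; the non-zero exit status is what tells the operators to step in.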

When the I.S. crashes, we first try to restart it and let it do its own cleanup. From time to time it failed to rebuild its repository, so we did some manual cleaning of the repository directories.

Hope it helps.

Best regards,

Laurent