I don't know if my response is what you're looking for, but…
We run our I.S. under Unix, and I have implemented the following procedure mainly to avoid starting the I.S. twice (which definitely makes a mess of the I.S. repository and database).
If a check fails, the startup script aborts and our operators have to follow a manual procedure.
So before actually starting the I.S., the script performs the following checks:
1/ Check whether a server.sh is already running.
The first version checked for the presence of a Java process, but that wasn't safe, as it is quite difficult to tell whether it is an I.S. or some other Java program.
2/ Check whether LOCKFILE is present. If so, I abort the script and the operators have to check whether another I.S. is running.
If not, they remove the file and rerun the startup script.
-> This wasn't enough on its own, because sometimes they removed the file even though the I.S. was there, mainly because the I.S. was unresponsive or still starting up.
3/ Check whether the main admin port is already in use.
A very good safeguard, but only once the I.S. has loaded all its packages. Unfortunately, the I.S. binds this port only after loading everything, and since loading all packages takes about 4 or 5 minutes, this check alone is not enough during that window.
Using these 3 sequential checks, we have prevented duplicate starts of the I.S. for the last 3 years.
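For illustration only, here is a rough Python sketch of the three checks above. It is not our actual script. The function names, the lock file path and the port number are all placeholders I made up for the example, so adapt them to your own installation. It also assumes the wrapper doing the checks is not itself named server.sh, otherwise check 1 would detect itself.

```python
import os
import socket
import subprocess

# Hypothetical placeholders: adjust to your own installation.
LOCKFILE = "/path/to/IntegrationServer/LOCKFILE"
ADMIN_HOST = "127.0.0.1"
ADMIN_PORT = 5555          # assumed admin port, check your own configuration
SCRIPT_NAME = "server.sh"


def script_running(script_name, ps_lines=None):
    """Check 1: is there a process whose command line contains script_name?

    ps_lines can be passed in for testing; otherwise we ask `ps`.
    """
    if ps_lines is None:
        out = subprocess.run(["ps", "-eo", "args"],
                             capture_output=True, text=True, check=True)
        ps_lines = out.stdout.splitlines()
    # Compare basenames so "/opt/is/bin/server.sh" matches "server.sh".
    return any(os.path.basename(token) == script_name
               for line in ps_lines
               for token in line.split())


def port_in_use(host, port, timeout=2.0):
    """Check 3: does something accept TCP connections on host:port?"""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0


def preflight(lockfile, host, port, script_name, ps_lines=None):
    """Run the three checks; an empty list means it looks safe to start."""
    problems = []
    if script_running(script_name, ps_lines):
        problems.append(f"{script_name} is already running")
    if os.path.exists(lockfile):          # Check 2
        problems.append(f"lock file {lockfile} is present")
    if port_in_use(host, port):
        problems.append(f"port {port} on {host} is already in use")
    return problems
```

A wrapper would call preflight(LOCKFILE, ADMIN_HOST, ADMIN_PORT, SCRIPT_NAME), print whatever it returns, and refuse to start the I.S. if the list is not empty. The sketch runs all three checks instead of stopping at the first failure, so the operators see every reason at once.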
When the I.S. crashes, we first try to restart it, letting it do its own cleanup. From time to time it failed to rebuild its repo, so we did some manual cleaning of the repo directories.
Hope this helps.