Optimize for Infrastructure 7.x

Has anyone tried to catch JVM out-of-heap-space errors?
java.lang.OutOfMemoryError: Java heap space

I can measure memory usage, free memory, total memory and compare these in all sorts of neat ways.
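
For reference, a minimal sketch of how those three figures can be read from inside a JVM and combined into one percentage. The class name HeapSnapshot and the helper usedPercent are made up for illustration; only the standard java.lang.Runtime calls are real. It reports on the JVM it runs in, so it would have to run inside the IS (for example as a Java service) to say anything about the IS heap.

```java
public class HeapSnapshot {

    /**
     * Percentage of the maximum heap that is currently in use.
     * total = heap currently reserved by the JVM, free = unused part of that,
     * max = the ceiling the heap may grow to (normally set with -Xmx).
     */
    static double usedPercent(long total, long free, long max) {
        long used = total - free;
        return 100.0 * used / max;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long total = rt.totalMemory();
        long free = rt.freeMemory();
        long max = rt.maxMemory();

        System.out.printf("total=%d free=%d max=%d used=%.1f%%%n",
                total, free, max, usedPercent(total, free, max));
    }
}
```

Note that the used figure includes garbage that has not been collected yet, so a high reading by itself does not mean the heap is about to run out.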

However, an Integration Server using more than 80% or even 99% of its memory does not necessarily hit a Java heap space error, so I receive a lot of false-positive alerts.

Also, after a heap space error the Integration Server does not necessarily die. Usually it still reports itself as online but stops processing. I can catch that by checking some other things, but I would like to catch it sooner. Currently I know when things should happen on the IS, so the check amounts to: did these 4 things happen in the past 60 minutes? If not, it is dead.

Any help would be most appreciated.

We wrote a simple Java program that follows (tail -f style) the verbose GC logs of the running production IS and does a quick analysis of the full GCs. For example, we have found empirically that large TN EDI documents cause 10 or more full GCs in a row, which is easy to detect and see in the verbose GC logs. When the Java program detects a suspected problem, it publishes a Broker document that makes Optimize for Infrastructure violate a custom rule. Eventually a high-priority trouble ticket goes to the correct TM support group (EDI, not me!).
Our 6-5 IS runs for months on a 4-6 GB JVM without any out-of-memory errors.
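
A rough sketch of the detection idea described above, not our actual program. Assumptions: the verbose GC log marks full collections with the text "Full GC" and minor ones with "GC" (the exact format depends on the JVM vendor, version and logging flags, so adjust the matching), and the log is piped in on standard input. The class name FullGcRunDetector is hypothetical, and the Broker publish step is left as a plain print because it depends on your own setup.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class FullGcRunDetector {
    private final int threshold;
    private int consecutive = 0;

    public FullGcRunDetector(int threshold) {
        this.threshold = threshold;
    }

    /**
     * Feed one log line. Returns true exactly once per run of full GCs,
     * at the moment the run length reaches the threshold.
     */
    public boolean accept(String line) {
        if (line.contains("Full GC")) {      // assumed marker for a full collection
            consecutive++;
            return consecutive == threshold;
        }
        if (line.contains("GC")) {           // any other GC line ends the run
            consecutive = 0;
        }
        return false;                        // non-GC lines leave the count unchanged
    }

    public static void main(String[] args) throws Exception {
        FullGcRunDetector detector = new FullGcRunDetector(10);
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            if (detector.accept(line)) {
                // Placeholder: this is where the alert document would be published.
                System.out.println("ALERT: 10 consecutive full GCs seen");
            }
        }
    }
}
```

It could be run as something like: tail -f gc.log | java FullGcRunDetector, where gc.log stands in for wherever your JVM writes its verbose GC output.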