IS crashing on a regular basis every night

Hi there experts,
We are facing an issue where our servers go down every night, and we have to start them up when we come in each morning.

The IS version is 7.1.2.0.
The OS is Solaris 10.

The following error appears in our error logs:

2010-07-20 04:00:00 PDT WMERROR f0c5046093ed11df86d3a511cdecfc50NULL f0c5046093ed11df86d3a511cdecfc501279623600331 server01.corporate.com:4200
java.security.ProviderException: implNextBytes() failed java.security.ProviderException: implNextBytes() failed
at sun.security.pkcs11.P11SecureRandom.implNextBytes(P11SecureRandom.java:170)
at sun.security.pkcs11.P11SecureRandom.engineNextBytes(P11SecureRandom.java:117)
at java.security.SecureRandom.nextBytes(SecureRandom.java:413)
at com.wm.util.text.UUID.generate(UUID.java:109)
at com.wm.app.audit.impl.AuditRuntime.generateContextId(AuditRuntime.java:137)
at com.wm.app.audit.impl.AuditRuntime.pushContext(AuditRuntime.java:101)
at com.wm.app.b2b.server.AuditLogManager.process(AuditLogManager.java:604)
at com.wm.app.b2b.server.invoke.InvokeManager.invoke(InvokeManager.java:536)
at com.wm.app.b2b.server.invoke.InvokeManager.invoke(InvokeManager.java:381)
at com.wm.app.b2b.server.ServiceManager.invoke(ServiceManager.java:237)
at com.wm.app.b2b.server.BaseService.invoke(BaseService.java:189)
at com.wm.lang.flow.FlowInvoke.invoke(FlowInvoke.java:324)
at com.wm.lang.flow.FlowState.invokeNode(FlowState.java:581)
at com.wm.lang.flow.FlowState.step(FlowState.java:441)
at com.wm.lang.flow.FlowState.invoke(FlowState.java:406)
at com.wm.app.b2b.server.FlowSvcImpl.baseInvoke(FlowSvcImpl.java:1040)
at com.wm.app.b2b.server.invoke.InvokeManager.process(InvokeManager.java:631)
at com.wm.app.b2b.server.util.tspace.ReservationProcessor.process(ReservationProcessor.java:40)
at com.wm.app.b2b.server.invoke.StatisticsProcessor.process(StatisticsProcessor.java:44)
at com.wm.app.b2b.server.invoke.ServiceCompletionImpl.process(ServiceCompletionImpl.java:241)
at com.wm.app.b2b.server.invoke.ValidateProcessor.process(ValidateProcessor.java:51)
at com.wm.app.b2b.server.ACLManager.process(ACLManager.java:228)
at com.wm.app.b2b.server.invoke.DispatchProcessor.process(DispatchProcessor.java:30)
at com.wm.app.b2b.server.AuditLogManager.process(AuditLogManager.java:624)
at com.wm.app.b2b.server.invoke.InvokeManager.invoke(InvokeManager.java:536)

From stderr.log:

log4j:ERROR Ignoring configuration file […/properties/log4j_con.properties].
log4j:ERROR Ignoring configuration file […/properties/log4j_con.properties].
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /webMethods6/logs/conversion.log (Too many open files)
at java.io.FileOutputStream.openAppend(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:177)

res_init: socket: Too many open files
res_init: socket: Too many open files
Jul 19, 2010 11:09:45 PM com.webMethods.sc.config.ConfigurationLogger log
WARNING: Failed to process OOB file-change event on file 'config/backup/policy/Sign_Auth_for_provider.xml'
res_init: socket: Too many open files
res_init: socket: Too many open files

The installed fixes and the JVM version are listed below.

Updates IS_7-1-2_PERF_FIX6
IS_7-1-2_Core_Fix12
IS_7-1-2_Flow_Fix4
IS_7-1-2_SrvPrtcl_Fix10
IS_7-1-2_WebSvcsXML_Fix10
IS_7-1-2_FlatFile_Fix3
IS_7-1-2_XA_Fix2
TNS_7-1-2_DB_Fix4
TNS_7-1-2_Doc_Fix2
IS_7-1-2_PkgMgmt_Fix1
TNS_7.1.2_Partner_Fix5
TNS_7.1.2_Doc_Fix5
TNS_7.1.2_General_Fix3
TNS_7.1.2_DB_Fix8
TNS_7.1.2_MWS_Fix4
Build Number 124
SSL Strong (128-bit)

Server Environment
Java Version 1.5.0_19 (49.0)
Java VM Name Java HotSpot™ Server VM
Java Build Info 1.5.0_19-b02, mixed mode
Java Vendor Sun Microsystems Inc.

Please let me know how to take care of this issue.
Folks, your help and assistance are greatly appreciated.
Sincerely,
Scooby!

“Too many open files” caught my eye. Options:

Review the integrations to make sure files are being closed.

Tweak Solaris to allow the JVM to have more file handles available.
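
To illustrate the first option, here is a minimal sketch of how unclosed handles exhaust the per-process limit while promptly closed ones do not. It is written in Python rather than as an IS Java service, and the function name `open_many`, the lowered soft limit of 256, and the use of `os.devnull` are all illustrative assumptions, not anything taken from the poster's integrations.

```python
import errno
import os
import resource


def open_many(times, close_each, soft_limit=256):
    """Open os.devnull `times` times under a lowered soft limit.

    Returns the errno of the first failed open, or None if every open succeeded.
    """
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if old_soft != resource.RLIM_INFINITY:
        soft_limit = min(soft_limit, old_soft)
    # Lower only the soft limit, and only for this process, so the demo fails fast.
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
    held = []
    try:
        for _ in range(times):
            fd = os.open(os.devnull, os.O_RDONLY)
            if close_each:
                os.close(fd)      # handle released before the next open
            else:
                held.append(fd)   # handle leaked, as in code that never closes
        return None
    except OSError as exc:
        return exc.errno
    finally:
        # Clean up the leaked handles and restore the original soft limit.
        for fd in held:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))
```

Calling `open_many(1000, close_each=False)` should return `errno.EMFILE`, the error code behind the "Too many open files" message, while `open_many(1000, close_each=True)` should return `None`. Raising the limit only postpones the failure if handles are genuinely being leaked.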

Rob,
Glad to see your response.
I think the fix for the “Too many open files” issue is to raise the open-files limit with ulimit, which needs to be set to unlimited.

What do you think about the error in the error log, the one with the word pkcs11 in it?

Thanks, Rob!

Scooby
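
Before changing the ulimit setting, it can help to confirm what the current limit actually is. The sketch below reads the soft and hard open-files limits of the process it runs in; a child process inherits these limits from whatever launches it, so running it as the same user and from the same environment that starts IS gives a rough idea of what the JVM gets. The function names are hypothetical, and this is only a quick check, not a webMethods tool.

```python
import resource


def format_limit(value):
    """Render a resource limit, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def nofile_limits():
    """Return (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe_nofile_limits():
    """Build a one-line summary similar in spirit to `ulimit -n` output."""
    soft, hard = nofile_limits()
    return "open files: soft=%s hard=%s" % (format_limit(soft), format_limit(hard))
```

Printing `describe_nofile_limits()` shows both values. The soft limit is the one the process actually hits, and it can be raised up to the hard limit without extra privileges.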

I think the pkcs11 error is a victim of the open-files issue. Do you have a timestamp for the “Too many open files” error?

No sir, I looked twice; there is only one, and it doesn't match up.

Is there a scheduled task that runs at 4am? Or a bit before?

Yeah, there is one… :)

And surprisingly, the IS was down today, but there were no errors or entries in any of the logs.
Arghhhhhhhhh

Would it be possible to disable that task for one day? That might help narrow down what the issue is.

I've asked the Dev folks for this already.
I'll be raising the logging level and I'll monitor this, fingers crossed.
I'll keep you posted.

Rob, thanks for helping me out on this one!
I appreciate it, I really do!

Scoobydoo!

When you run out of file descriptors, any operation that requires opening a socket, file, etc. will fail, and this can even lead to corrupted files that prevent IS from booting up next time.

But it would be easy to run that task in a DEV environment and check the resources consumed… lacking a DEV environment would be an issue! :)
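
One rough way to check the descriptors consumed while the task runs is to sample the number of entries under the process's `fd` directory in `/proc`. The sketch below assumes a `/proc/<pid>/fd` layout, as found on Solaris and Linux, and that the script runs as a user allowed to read that directory for the IS process. The function names and the `proc_root` parameter are hypothetical and exist only for illustration and testing.

```python
import os
import time


def count_open_fds(pid, proc_root="/proc"):
    """Return the number of entries in <proc_root>/<pid>/fd."""
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def sample_fd_counts(pid, samples, interval_seconds, proc_root="/proc"):
    """Take `samples` timestamped readings, `interval_seconds` apart."""
    readings = []
    for i in range(samples):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        readings.append((stamp, count_open_fds(pid, proc_root)))
        if i < samples - 1:
            time.sleep(interval_seconds)  # no sleep after the last reading
    return readings
```

For example, `sample_fd_counts(pid, samples=60, interval_seconds=60)` with the IS process ID would give an hour of readings around the scheduled task. A count that climbs steadily and never drops back suggests handles are not being closed.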

But what do you think of the PKCS11 error? Any ideas?

OK, I think increasing the file descriptor limit did solve this issue.
It's been a while and IS hasn't crashed!
Yippie!

Rob, you rock, my man!
Thank you so much for your help!
