Core dump in Apama

Hi,

We have installed UM and Apama 9.12 with Terracotta on the same server. From the logs, the issue appears to come from Nirvana (the UM plugin).

Any idea on the root cause of the issue?

/mnt/Disk1/softwareag/Apama/lib/libapcommon.so.9.12(ap_stackTrace+0x14)[0x7fc75e1c3914]
/mnt/Disk1/softwareag/Apama/lib/libapclient.so.9.12(_ZN3com5apama11coreHandlerEiP7siginfoPv+0x14d)[0x7fc75dc178cd]
/mnt/Disk1/softwareag/jvm/jvm/jre/lib/amd64/server/libjvm.so(+0x923e92)[0x7fc758264e92]
/mnt/Disk1/softwareag/jvm/jvm/jre/lib/amd64/server/libjvm.so(JVM_handle_linux_signal+0xb6)[0x7fc75826b526]
/mnt/Disk1/softwareag/jvm/jvm/jre/lib/amd64/server/libjvm.so(+0x920b03)[0x7fc758261b03]
/lib64/libpthread.so.0(+0xf370)[0x7fc75abdf370]
/mnt/Disk1/softwareag/Apama/bin/correlator(_ZN11NamedParent6removeERKSs+0x48)[0xa95f18]
/mnt/Disk1/softwareag/Apama/bin/correlator(_ZN14ObjectScaffoldI18CorrelatorSequence19ObjectVisitorVeneerI8ObjectCCI22CorrelatorObjectBaseGCEEE14deepDeleteImplEv+0x8e)[0xa11bbe]
/mnt/Disk1/softwareag/Apama/bin/correlator(_ZN15CorrelatorEvent14deepDeleteImplEv+0x49)[0x9ff9b9]
/mnt/Disk1/softwareag/Apama/bin/correlator(_ZN3com10softwareag12connectivity25HostSideMsglibBridge_Maps19consumeEventsLockedEPNS1_7MessageES4_+0x15fb)[0xf6498b]
/mnt/Disk1/softwareag/Apama/bin/correlator(_ZN3com10softwareag12connectivity12MsglibSender24HostSideMsglibBridgeBase20sendBatchTowardsHostEPNS1_7MessageES5_+0x449)[0xf5b059]
/mnt/Disk1/softwareag/Apama/bin/correlator(internal_host_side_send_fn+0x14)[0xf80344]
/mnt/Disk1/softwareag/Apama/lib/libconnectivity-json-codec.so(sag_plugin_sendBatchTowardsHost_JSONCodec+0xe)[0x7fc73c69adfe]
/mnt/Disk1/softwareag/Apama/bin/correlator(_ZN3com10softwareag12connectivity6CodecC20sendBatchTowardsHostEPNS1_7MessageES4_+0x12)[0xf4d732]
/mnt/Disk1/softwareag/Apama/bin/correlator(internal_host_side_send_fn+0x14)[0xf80344]
/mnt/Disk1/softwareag/Apama/lib/libconnectivity-string-codec.so(sag_plugin_sendBatchTowardsHost_StringCodec+0xe)[0x7fc73c48386e]
/mnt/Disk1/softwareag/Apama/bin/correlator(_ZN3com10softwareag12connectivity6CodecC20sendBatchTowardsHostEPNS1_7MessageES4_+0x12)[0xf4d732]
/mnt/Disk1/softwareag/Apama/bin/correlator(internal_host_side_send_fn+0x14)[0xf80344]
/mnt/Disk1/softwareag/Apama/lib/libUMConnectivity.so(_ZN3com10softwareag12connectivity7plugins11UMTransport23flushMessageTowardsHostEv+0xad)[0x7fc73ea483fd]
/mnt/Disk1/softwareag/Apama/lib/libUMConnectivity.so(_ZN3com10softwareag12connectivity7plugins11UMTransport2goEPNS_6pcbsys7nirvana6client13nConsumeEventE+0x88f)[0x7fc73ea4c37f]
/mnt/Disk1/softwareag/Apama/…/UniversalMessaging/cplus/lib/x86_64/libNirvana.so(_ZN3com6pcbsys7nirvana5nbase18nClientChannelList12pushCallbackEPNS1_6client13nConsumeEventEPh+0xd9)[0x7fc73d6df169]
/mnt/Disk1/softwareag/Apama/…/UniversalMessaging/cplus/lib/x86_64/libNirvana.so(_ZN3com6pcbsys7nirvana5nbase9nPushTask11handleEventEPNS0_10foundation2io10fBaseEventE+0xa1)[0x7fc73d6e68a1]
/mnt/Disk1/softwareag/Apama/…/UniversalMessaging/cplus/lib/x86_64/libNirvana.so(_ZN3com6pcbsys7nirvana5nbase9nPushTask3runEv+0x373)[0x7fc73d6e6f83]
/mnt/Disk1/softwareag/Apama/lib/libPocoFoundation.so.43(_ZN4Poco12PooledThread3runEv+0x1d9)[0x7fc73dacfe19]
/mnt/Disk1/softwareag/Apama/lib/libPocoFoundation.so.43(_ZN4Poco10ThreadImpl13runnableEntryEPv+0x9a)[0x7fc73dacaada]
/lib64/libpthread.so.0(+0x7dc5)[0x7fc75abd7dc5]
2017-06-29 11:27:23.282 FATAL [140490654471936] - Will now dump core

Hi Sush,

Is it possible to get some more information from you about what’s going on in the system at the time this happened? The rest of the log file would be particularly useful, for example. In particular, are you running the engine_delete tool, or are you attempting to shut down the process?

If the stack is to be believed, it’s deleting a message being delivered from UM. That should be reproducible without any UM code: there’s an API boundary in there which shouldn’t allow anything that UM does to crash the correlator. I say ‘if the stack is to be believed’ because, as far as I can tell from reading the code, there’s no way that deepDeleteImpl should be able to call NamedParent::remove (which is the frame directly above it in your stack trace).
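If it helps when reading the trace, here is a rough Python sketch for splitting each frame into module, symbol and offset, so the mangled C++ names are easier to pick out. The helper names (parse_frame, demangle) are made up for illustration and are not part of Apama or UM. Demangling is only attempted if the c++filt tool from binutils happens to be installed; otherwise the symbol is returned unchanged.

```python
import re
import shutil
import subprocess

# Matches backtrace lines of the form: /path/to/module(symbol+0xoff)[0xaddr]
# The symbol part may be empty, e.g. libjvm.so(+0x923e92)[0x7fc758264e92].
FRAME_RE = re.compile(
    r"^(?P<module>\S+)\((?P<symbol>[^+()]*)\+(?P<offset>0x[0-9a-fA-F]+)\)"
    r"\[(?P<addr>0x[0-9a-fA-F]+)\]$"
)


def parse_frame(line):
    """Return a dict with module, symbol, offset and addr, or None if the
    line is not a stack frame (for example the FATAL log line)."""
    match = FRAME_RE.match(line.strip())
    return match.groupdict() if match else None


def demangle(symbol):
    """Demangle an Itanium-ABI C++ symbol using c++filt if it is available.
    Non-mangled names, or any name when c++filt is missing, come back as-is."""
    if not symbol.startswith("_Z") or shutil.which("c++filt") is None:
        return symbol
    result = subprocess.run(["c++filt", symbol], capture_output=True, text=True)
    return result.stdout.strip() or symbol


if __name__ == "__main__":
    sample = ("/mnt/Disk1/softwareag/Apama/bin/correlator"
              "(_ZN11NamedParent6removeERKSs+0x48)[0xa95f18]")
    frame = parse_frame(sample)
    print(frame["module"], demangle(frame["symbol"]), frame["offset"])
```

Feeding each line of the trace through parse_frame and printing the demangled symbol gives a call chain that is much easier to compare against the source.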

In order for us to look into it further, I have some additional questions on top of those above:

  • Have you applied all the 9.12 patches? (I would be able to tell this if you sent the full log file.)
  • I see Java is enabled in this process. Did it produce an hs_err_pid error file? That may have further useful information.
  • Was anything else produced, for example on standard error?
  • Can you send us the core file it generated?
  • Is this reproducible, or did you see it just once?
  • Can you generate an input log from a run when it crashed and send us that?
  • Can we see your project?
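
For gathering the files mentioned above, something like the following Python sketch might help. It assumes the JVM's default crash report name (hs_err_pid<pid>.log, written to the process's working directory unless configured otherwise) and a core file whose name starts with "core"; the actual core file name and location depend on your system's core_pattern setting, so treat both patterns as guesses to adjust. The function name find_crash_artifacts is purely illustrative.

```python
from pathlib import Path


def find_crash_artifacts(directory):
    """List likely crash artifacts in one directory (non-recursive).

    Assumptions: JVM crash reports use the default hs_err_pid*.log name,
    and core files start with "core". Adjust the patterns for your setup.
    """
    root = Path(directory)
    return {
        "hs_err": sorted(p.name for p in root.glob("hs_err_pid*.log")),
        "core": sorted(p.name for p in root.glob("core*")),
    }


if __name__ == "__main__":
    # Point this at the directory the correlator was started from.
    print(find_crash_artifacts("."))
```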

HTH
Matt