Dear colleagues,
We are encountering a very strange issue in our production environment. We have a cluster with 2 servers (please see the settings below). Every day we need to restart both servers so that they can rejoin the cluster. Can you please help us find out what might be causing this?
IS server:
Product: webMethods Integration Server
Version: 9.9.0.0
Updates: IS_9.9_Core_Fix23, TNS_9.9_Fix5, IS_9.9_SPM_Fix4
Build Number: 102
Terracotta config:
<tc:tc-config xmlns:tc="http://www.terracotta.org/config">
<tc-properties>
<!-- Tolerant timeout settings taken from: http://www.terracotta.org/documentation/high-availability.html -->
<!-- l2 to l1 is Server timing out (and ejecting) the Client -->
<property name="l2.healthcheck.l1.ping.enabled" value="true" />
<property name="l2.healthcheck.l1.ping.idletime" value="5000" />
<property name="l2.healthcheck.l1.ping.interval" value="1000" />
<property name="l2.healthcheck.l1.ping.probes" value="3" />
<property name="l2.healthcheck.l1.socketConnect" value="true" />
<property name="l2.healthcheck.l1.socketConnectTimeout" value="5" />
<property name="l2.healthcheck.l1.socketConnectCount" value="10" />
<!-- Client reconnection properties -->
<property name="l2.l1reconnect.enabled" value="true" />
<property name="l2.l1reconnect.timeout.millis" value="2000" />
</tc-properties>
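<servers>
<server>
<data>server-data</data>
<logs>server-logs</logs>
<tsa-port>9510</tsa-port>
<jmx-port>9520</jmx-port>
<tsa-group-port>9530</tsa-group-port>
</server>
<server>
<data>server-data</data>
<logs>server-logs</logs>
<tsa-port>9510</tsa-port>
<jmx-port>9520</jmx-port>
<tsa-group-port>9530</tsa-group-port>
</server>
</servers>
<clients>
<logs>%(com.softwareag.tc.client.logs.directory)</logs>
</clients>
</tc:tc-config>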
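As a rough cross-check of the health-check values above, the following Python sketch adds them up. It assumes the worst-case formula from the Terracotta high-availability documentation (idletime + socketConnectCount * (interval * probes + socketConnectTimeout * interval)) and assumes that socketConnectTimeout is a multiplier of ping.interval rather than a value in milliseconds. The function and variable names are made up for illustration, and the GC time is the one reported in the client log below.

```python
# Values copied from the tc-config in this post.
PING_IDLETIME_MS = 5000        # l2.healthcheck.l1.ping.idletime
PING_INTERVAL_MS = 1000        # l2.healthcheck.l1.ping.interval
PING_PROBES = 3                # l2.healthcheck.l1.ping.probes
SOCKET_CONNECT_TIMEOUT = 5     # l2.healthcheck.l1.socketConnectTimeout (assumed multiplier)
SOCKET_CONNECT_COUNT = 10      # l2.healthcheck.l1.socketConnectCount
RECONNECT_TIMEOUT_MS = 2000    # l2.l1reconnect.timeout.millis


def max_healthcheck_window_ms(idletime, interval, probes, connect_timeout, connect_count):
    """Worst-case time before the server gives up on a silent but reachable client.

    Assumed formula (not verified against this exact Terracotta version):
    idletime + connect_count * (interval * probes + connect_timeout * interval)
    """
    return idletime + connect_count * (interval * probes + connect_timeout * interval)


tolerance_ms = max_healthcheck_window_ms(
    PING_IDLETIME_MS,
    PING_INTERVAL_MS,
    PING_PROBES,
    SOCKET_CONNECT_TIMEOUT,
    SOCKET_CONNECT_COUNT,
)

# "GC Time:11,753ms" from the MEMORY_LONGGC line in the client log.
long_gc_ms = 11753

print("health-check window (ms):", tolerance_ms)
print("reconnect window (ms):   ", RECONNECT_TIMEOUT_MS)
print("reported long GC (ms):   ", long_gc_ms)
print("GC longer than reconnect window:", long_gc_ms > RECONNECT_TIMEOUT_MS)
```

Under these assumptions the health-check window comes to 85000 ms, while the reported GC pause is several times longer than the 2000 ms reconnect window. Whether that is related to the rejoin problem is exactly what we are unsure about.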
Terracotta client log:
2018-06-02 04:09:31,547 [RemoteTransactionManager Flusher] INFO com.tc.object.tx.RemoteTransactionManagerImpl - ClientID[3683]: Ignoring RemoteTransactionManagerTask because status State[ REJOIN_IN_PROGRESS ]
2018-06-02 04:09:37,564 [RemoteTransactionManager Flusher] INFO com.tc.object.tx.RemoteTransactionManagerImpl - ClientID[3683]: Ignoring RemoteTransactionManagerTask because status State[ REJOIN_IN_PROGRESS ]
2018-06-02 04:09:43,669 [RemoteTransactionManager Flusher] INFO com.tc.object.tx.RemoteTransactionManagerImpl - ClientID[3683]: Ignoring RemoteTransactionManagerTask because status State[ REJOIN_IN_PROGRESS ]
2018-06-02 04:09:43,838 [L1_L2:TCComm Main Selector Thread_R (listen 0:0:0:0:0:0:0:0:38993)] WARN com.tc.net.protocol.transport.ClientMessageTransport - ConnectionID(-1.ffffffffffffffffffffffffffffffff.03506bc8-8a62-4248-abb7-212431fe288b-163ba0da4a7.USER): CLOSE EVENT : com.tc.net.core.TCConnectionImpl@1558944424: connected: false, closed: true local=10.12.141.33:33102 remote=10.12.141.33:9510 connect=[Sat Jun 02 04:09:31 CEST 2018] idle=12442ms [0 read, 0 write]. STATUS : SYN_SENT
2018-06-02 04:09:43,838 [L1_L2:TCComm Main Selector Thread_R (listen 0:0:0:0:0:0:0:0:38993)] WARN com.tc.net.protocol.transport.ClientMessageTransport - ConnectionID(-1.ffffffffffffffffffffffffffffffff.03506bc8-8a62-4248-abb7-212431fe288b-163ba0da4a7.USER): closing down connection - com.tc.net.core.TCConnectionImpl@1558944424: connected: false, closed: true local=10.12.141.33:33102 remote=10.12.141.33:9510 connect=[Sat Jun 02 04:09:31 CEST 2018] idle=12442ms [0 read, 0 write]
2018-06-02 04:09:43,838 [L1_L2:TCComm Main Selector Thread_W (listen 0:0:0:0:0:0:0:0:38993)] INFO com.tc.net.core.TCConnection - error writing to channel java.nio.channels.SocketChannel[closed]: null
2018-06-02 04:09:55,707 [RemoteTransactionManager Flusher] INFO com.tc.object.tx.RemoteTransactionManagerImpl - ClientID[3683]: Ignoring RemoteTransactionManagerTask because status State[ REJOIN_IN_PROGRESS ]
2018-06-02 04:09:55,709 [TC Memory Monitor] WARN tc.operator.event - NODE : ClientID[3683] Subsystem: MEMORY_MANAGER EventType: MEMORY_LONGGC Message: Detected long GC>8,000ms. GC count:2. GC Time:11,753ms. Frequent long GC cycles cause severe performance degradation.
2018-06-02 04:10:00,760 [L1_L2:TCComm Main Selector Thread_W (listen 0:0:0:0:0:0:0:0:38993)] INFO com.tc.net.core.TCConnectionManager - error event on connection com.tc.net.core.TCConnectionImpl@1558944424: connected: false, closed: true local=10.12.141.33:33102 remote=10.12.141.33:9510 connect=[Sat Jun 02 04:09:31 CEST 2018] idle=29364ms [0 read, 0 write]: null
2018-06-02 04:10:00,772 [RemoteTransactionManager Flusher] INFO com.tc.object.tx.RemoteTransactionManagerImpl - ClientID[3683]: Ignoring RemoteTransactionManagerTask because status State[ REJOIN_IN_PROGRESS ]
2018-06-02 04:10:00,894 [Rejoin Worker] WARN com.tc.platform.rejoin.RejoinManagerImpl - Error during channel open
java.net.ConnectException: Connection refused
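To see how often such long GC pauses occur, the client log can be scanned for MEMORY_LONGGC events. The Python sketch below is only an illustration: the regular expression is written against the single MEMORY_LONGGC line shown above and may need adjusting for other log layouts, and the names `LONG_GC` and `long_gc_events` are made up for this example.

```python
import re

# Matches the timestamp, GC count and GC time of a MEMORY_LONGGC operator event.
LONG_GC = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*"
    r"EventType: MEMORY_LONGGC .*"
    r"GC count:(?P<count>\d+)\. GC Time:(?P<ms>[\d,]+)ms"
)


def long_gc_events(lines):
    """Return (timestamp, gc_count, gc_time_ms) for every long-GC line."""
    events = []
    for line in lines:
        match = LONG_GC.search(line)
        if match:
            gc_time_ms = int(match.group("ms").replace(",", ""))
            events.append((match.group("ts"), int(match.group("count")), gc_time_ms))
    return events


# Two lines copied from the log excerpt above.
sample = [
    "2018-06-02 04:09:55,707 [RemoteTransactionManager Flusher] INFO "
    "com.tc.object.tx.RemoteTransactionManagerImpl - ClientID[3683]: Ignoring "
    "RemoteTransactionManagerTask because status State[ REJOIN_IN_PROGRESS ]",
    "2018-06-02 04:09:55,709 [TC Memory Monitor] WARN tc.operator.event - NODE : "
    "ClientID[3683] Subsystem: MEMORY_MANAGER EventType: MEMORY_LONGGC Message: "
    "Detected long GC>8,000ms. GC count:2. GC Time:11,753ms. Frequent long GC "
    "cycles cause severe performance degradation.",
]

for timestamp, count, millis in long_gc_events(sample):
    print(timestamp, "GC count:", count, "GC time (ms):", millis)
```

To run it against a real file, pass the open file object instead of `sample`, for example `long_gc_events(open(path))`, where `path` is the client log location.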
Best regards,
Oliver