IS servers leaving cluster on a daily basis

Niemand23_Niemand · June 4, 2018, 2:03pm

Dear colleagues,

We are encountering a very strange issue in our production environment. We have a cluster of 2 servers (please see the settings below). Every day we need to restart both servers so that they can rejoin the cluster. Can you please help us find out what might be causing this?

IS server:

Product webMethods Integration Server
Version 9.9.0.0
Updates IS_9.9_Core_Fix23
TNS_9.9_Fix5
IS_9.9_SPM_Fix4
Build Number 102

Terracotta config:

<tc:tc-config xmlns:tc="http://www.terracotta.org/config">

     <!-- Tolerant timeout settings taken from: http://www.terracotta.org/documentation/high-availability.html -->
     <!-- l2 to l1 is Server timing out (and ejecting) the Client  -->
     <property name="l2.healthcheck.l1.ping.enabled" value="true" />
     <property name="l2.healthcheck.l1.ping.idletime" value="5000" />
     <property name="l2.healthcheck.l1.ping.interval" value="1000" />
     <property name="l2.healthcheck.l1.ping.probes" value="3" />
     <property name="l2.healthcheck.l1.socketConnect" value="true" />
     <property name="l2.healthcheck.l1.socketConnectTimeout" value="5" />
     <property name="l2.healthcheck.l1.socketConnectCount" value="10" />

     <!-- Client reconnection properties -->
     <property name="l2.l1reconnect.enabled" value="true" />
     <property name="l2.l1reconnect.timeout.millis" value="2000" />

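(Remainder of the config; the XML tags were lost when posting. Two server entries, each with data directory server-data, logs directory server-logs, and ports 9510, 9520, 9530. Client logs directory: %(com.softwareag.tc.client.logs.directory))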
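For reference, the health-check properties above combine into a single worst-case window before the server ejects an unresponsive client. The sketch below uses the formula as I recall it from the Terracotta high-availability documentation (idle time plus socketConnectCount rounds of ping probes and socket-connect attempts, with socketConnectTimeout acting as a multiplier of ping.interval). Treat the formula as an assumption and check it against the documentation for your Terracotta version; the function name is made up for illustration.

```python
# Values copied from the properties in the post (milliseconds unless noted).
props = {
    "l2.healthcheck.l1.ping.idletime": 5000,
    "l2.healthcheck.l1.ping.interval": 1000,
    "l2.healthcheck.l1.ping.probes": 3,
    "l2.healthcheck.l1.socketConnectTimeout": 5,   # assumed: multiplier of ping.interval
    "l2.healthcheck.l1.socketConnectCount": 10,
    "l2.l1reconnect.timeout.millis": 2000,
}


def max_healthcheck_tolerance_ms(p, prefix="l2.healthcheck.l1."):
    """Worst-case time before the server gives up on a silent client.

    Assumed formula (verify against the Terracotta HA docs):
    idletime + socketConnectCount * (interval * probes + socketConnectTimeout * interval)
    """
    idle = p[prefix + "ping.idletime"]
    interval = p[prefix + "ping.interval"]
    probes = p[prefix + "ping.probes"]
    connect_timeout = p[prefix + "socketConnectTimeout"]
    connect_count = p[prefix + "socketConnectCount"]
    return idle + connect_count * (interval * probes + connect_timeout * interval)


tolerance = max_healthcheck_tolerance_ms(props)
reconnect = props["l2.l1reconnect.timeout.millis"]
print(f"health-check tolerance: {tolerance} ms, reconnect window: {reconnect} ms")
```

Under that assumption the health-check window is far longer than the 2000 ms reconnect window, so the two settings are worth comparing separately against any pause lengths seen in the logs.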

Terracotta client log:

2018-06-02 04:09:31,547 [RemoteTransactionManager Flusher] INFO com.tc.object.tx.RemoteTransactionManagerImpl - ClientID[3683]: Ignoring RemoteTransactionManagerTask because status State[ REJOIN_IN_PROGRESS ]
2018-06-02 04:09:37,564 [RemoteTransactionManager Flusher] INFO com.tc.object.tx.RemoteTransactionManagerImpl - ClientID[3683]: Ignoring RemoteTransactionManagerTask because status State[ REJOIN_IN_PROGRESS ]
2018-06-02 04:09:43,669 [RemoteTransactionManager Flusher] INFO com.tc.object.tx.RemoteTransactionManagerImpl - ClientID[3683]: Ignoring RemoteTransactionManagerTask because status State[ REJOIN_IN_PROGRESS ]
2018-06-02 04:09:43,838 [L1_L2:TCComm Main Selector Thread_R (listen 0:0:0:0:0:0:0:0:38993)] WARN com.tc.net.protocol.transport.ClientMessageTransport - ConnectionID(-1.ffffffffffffffffffffffffffffffff.03506bc8-8a62-4248-abb7-212431fe288b-163ba0da4a7.USER): CLOSE EVENT : com.tc.net.core.TCConnectionImpl@1558944424: connected: false, closed: true local=10.12.141.33:33102 remote=10.12.141.33:9510 connect=[Sat Jun 02 04:09:31 CEST 2018] idle=12442ms [0 read, 0 write]. STATUS : SYN_SENT
2018-06-02 04:09:43,838 [L1_L2:TCComm Main Selector Thread_R (listen 0:0:0:0:0:0:0:0:38993)] WARN com.tc.net.protocol.transport.ClientMessageTransport - ConnectionID(-1.ffffffffffffffffffffffffffffffff.03506bc8-8a62-4248-abb7-212431fe288b-163ba0da4a7.USER): closing down connection - com.tc.net.core.TCConnectionImpl@1558944424: connected: false, closed: true local=10.12.141.33:33102 remote=10.12.141.33:9510 connect=[Sat Jun 02 04:09:31 CEST 2018] idle=12442ms [0 read, 0 write]
2018-06-02 04:09:43,838 [L1_L2:TCComm Main Selector Thread_W (listen 0:0:0:0:0:0:0:0:38993)] INFO com.tc.net.core.TCConnection - error writing to channel java.nio.channels.SocketChannel[closed]: null
2018-06-02 04:09:55,707 [RemoteTransactionManager Flusher] INFO com.tc.object.tx.RemoteTransactionManagerImpl - ClientID[3683]: Ignoring RemoteTransactionManagerTask because status State[ REJOIN_IN_PROGRESS ]
2018-06-02 04:09:55,709 [TC Memory Monitor] WARN tc.operator.event - NODE : ClientID[3683] Subsystem: MEMORY_MANAGER EventType: MEMORY_LONGGC Message: Detected long GC>8,000ms. GC count:2. GC Time:11,753ms. Frequent long GC cycles cause severe performance degradation.
2018-06-02 04:10:00,760 [L1_L2:TCComm Main Selector Thread_W (listen 0:0:0:0:0:0:0:0:38993)] INFO com.tc.net.core.TCConnectionManager - error event on connection com.tc.net.core.TCConnectionImpl@1558944424: connected: false, closed: true local=10.12.141.33:33102 remote=10.12.141.33:9510 connect=[Sat Jun 02 04:09:31 CEST 2018] idle=29364ms [0 read, 0 write]: null
2018-06-02 04:10:00,772 [RemoteTransactionManager Flusher] INFO com.tc.object.tx.RemoteTransactionManagerImpl - ClientID[3683]: Ignoring RemoteTransactionManagerTask because status State[ REJOIN_IN_PROGRESS ]
2018-06-02 04:10:00,894 [Rejoin Worker] WARN com.tc.platform.rejoin.RejoinManagerImpl - Error during channel open
java.net.ConnectException: Connection refused
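The MEMORY_LONGGC warning in the log above carries the client ID, the reporting threshold, the GC count and the GC time. To see how often and how long these pauses occur over a day, the events could be pulled out of the log with something like the following sketch. The regular expression is derived only from the single line format shown in this post, so it is an assumption that other Terracotta versions format the message the same way; the function name is hypothetical.

```python
import re

# Pattern built from the MEMORY_LONGGC line in the client log above.
LONG_GC = re.compile(
    r"ClientID\[(?P<client>\d+)\] Subsystem: MEMORY_MANAGER "
    r"EventType: MEMORY_LONGGC Message: Detected long GC>(?P<threshold>[\d,]+)ms\. "
    r"GC count:(?P<count>\d+)\. GC Time:(?P<gc_time>[\d,]+)ms"
)


def parse_long_gc(line):
    """Return the long-GC fields of a log line as ints, or None if it is not such a line."""
    m = LONG_GC.search(line)
    if m is None:
        return None

    def to_int(text):
        # The log prints numbers with thousands separators, e.g. "11,753".
        return int(text.replace(",", ""))

    return {
        "client_id": int(m.group("client")),
        "threshold_ms": to_int(m.group("threshold")),
        "gc_count": int(m.group("count")),
        "gc_time_ms": to_int(m.group("gc_time")),
    }


sample = (
    "2018-06-02 04:09:55,709 [TC Memory Monitor] WARN tc.operator.event - "
    "NODE : ClientID[3683] Subsystem: MEMORY_MANAGER EventType: MEMORY_LONGGC "
    "Message: Detected long GC>8,000ms. GC count:2. GC Time:11,753ms. "
    "Frequent long GC cycles cause severe performance degradation."
)
print(parse_long_gc(sample))
```

Running this over the full client and server logs and lining the results up with the REJOIN_IN_PROGRESS and "Connection refused" entries would show whether the disconnects coincide with the long pauses.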

Best regards,
Oliver

r_eamon · June 25, 2018, 5:30pm

Not sure if you want to pursue this line of thinking, but perhaps one approach is to eliminate the use of IS clustering altogether. Which specific features of IS clustering is your environment actually leveraging?

Niemand23_Niemand · June 25, 2018, 10:16pm

Hello,

We need at least 2 servers in a cluster to handle the load, which is significant in the production environment.
Quick question: the tc server log contains a lot of entries like this: "ClientID[6178] Subsystem: MEMORY_MANAGER EventType: MEMORY_LONGGC Message: Detected long GC>8,000ms. GC count:4. GC Time:15,761ms. Frequent long GC cycles cause severe performance degradation"

What does the 8,000ms mean exactly? Is it the maximum allowed time for GC?
If so, is there any setting on the Terracotta side to increase this threshold?

Thank you in advance,
Oliver

r_eamon · June 25, 2018, 10:49pm

You don’t need an IS cluster for that. You just need a load balancer in front of them.

This old, old thread may be helpful.

http://tech.forums.softwareag.com/techjforum/posts/list/40113.page

Note that the posts about scheduled tasks needing IS clustering have been overcome by events – those no longer need IS clustering and instead use a shared DB to manage “run on any single instance” tasks.
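To illustrate the load-balancer point: two independent, non-clustered IS instances can share the load if something in front of them alternates requests and skips an instance that is down. In practice that would be a hardware or software load balancer, not application code; the sketch below only shows the idea. The class, its methods and the host names are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out backends in turn, skipping any that are marked down."""

    def __init__(self, backends):
        self._backends = list(backends)
        self._ring = cycle(self._backends)
        self._down = set()

    def mark_down(self, backend):
        self._down.add(backend)

    def mark_up(self, backend):
        self._down.discard(backend)

    def next_backend(self):
        # Try each backend at most once per call so we never loop forever.
        for _ in range(len(self._backends)):
            candidate = next(self._ring)
            if candidate not in self._down:
                return candidate
        raise RuntimeError("no healthy backend available")


# Hypothetical host names for the two IS instances.
balancer = RoundRobinBalancer(["is-node-1:5555", "is-node-2:5555"])
print([balancer.next_backend() for _ in range(4)])
```

With this setup neither instance depends on the other or on Terracotta, so a long pause on one node does not force a restart of both.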
