IS out of Clustering

Hi All,

We have two wM IS 7.1.2 servers running in a cluster. After a recent restart, one of them dropped out of the cluster and does not rejoin despite multiple restart attempts.
I have already tried replacing the cache files from backup in case they had become corrupted, but with no success.

I can see the following errors in the server log during restart:
2015-10-18 00:13:11 NZDT [WmSharedCacheSC.config.0595I] Reading cache configuration: distributedCache
2015-10-18 00:14:43 NZDT [WmSharedCacheSC.impl.0200E] Failed to create/get distributed cache ‘IS Cluster:ISProdCluster’; reason: java.lang.RuntimeException: Failed to start Service “Cluster” (ServiceState=SERVICE_STOPPED, STATE_ANNOUNCE); – check configuration
2015-10-18 00:14:43 NZDT [ISS.0033.0104I] Could not add server to the cluster. com.webMethods.sc.caching.CachingException: Failed to create/get distributed cache ‘IS Cluster:ISProdCluster’; reason: java.lang.RuntimeException: Failed to start Service “Cluster” (ServiceState=SERVICE_STOPPED, STATE_ANNOUNCE); – check configuration

The other server still shows up in the cluster.
I am not sure what went wrong, as both servers were in the cluster before the restart.

Any help is much appreciated.

Thanks,
Amit

Check whether tangosol-coherence-override.xml looks OK, i.e. compare it with the same file from a working environment.

Amit, copying coherence-login.jar, coherence.jar and tangosol.jar from common/lib/ext of the working environment into the non-working environment resolved the issue. Also make sure the correct cluster name is specified on both servers.

Thanks,
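
Before copying the jars across, it may help to confirm whether they actually differ between the two environments. The following is only a minimal sketch, assuming a JDK is available on both servers; the class name JarChecksum is hypothetical and SHA-256 is simply a convenient choice of digest. Run it against the three jars on each server and compare the output.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.MessageDigest;

public class JarChecksum {

    // Computes the SHA-256 digest of a file and returns it as lowercase hex.
    static String sha256(String path) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        InputStream in = new FileInputStream(path);
        try {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                digest.update(buf, 0, n);
            }
        } finally {
            in.close();
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    // Prints one "digest  path" line per file given on the command line.
    public static void main(String[] args) throws Exception {
        for (String path : args) {
            System.out.println(sha256(path) + "  " + path);
        }
    }
}
```

Example (run from common/lib/ext on each server): java JarChecksum coherence-login.jar coherence.jar tangosol.jar
If the digests match on both servers, the jars are identical and copying them is unlikely to change anything.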

Please check with your network or OS team whether there have been any changes to the environment hosting your wM installation. Such changes can often affect the cluster settings, and the cluster was working before.

HTH,
RMG

Thanks everyone for the suggestions.

I have already compared tangosol-coherence-override.xml with the copy from the working environment and it looks fine.

I will try replacing coherence-login.jar, coherence.jar and tangosol.jar with the ones from the working environment and will share the outcome.

The Unix (Solaris) server hosting this IS was recently rebooted, and the issue started appearing only after that.
I checked with the Unix team and they confirmed they did nothing except the server reboot.
Is there anything specific I should ask them to check?

Also, I tried to run the multicast test from “$wMHome/common/lib/ext”, but it gives the error below.
Command:
java .classpath tangosol.jar com.tangosol.net.MulticastTest .group 230.0.2.1

Error:
Exception in thread “main” java.lang.NoClassDefFoundError: /classpath

Thanks,
Amit
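
The NoClassDefFoundError above suggests that java took the first argument as the name of the main class rather than as an option; JVM options such as -classpath (or -cp) start with a hyphen. As an independent cross-check that does not depend on the Coherence jars, a minimal sketch using only the JDK's MulticastSocket could look like the following. The class name MulticastProbe, the port 9000 and the TTL of 4 are assumptions made for illustration; only the group 230.0.2.1 comes from the command above.

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

public class MulticastProbe {

    // True when the address lies in the multicast range (224.0.0.0/4 for IPv4).
    static boolean isMulticastGroup(String address) {
        try {
            return InetAddress.getByName(address).isMulticastAddress();
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String mode = args.length > 0 ? args[0] : "listen";
        String group = args.length > 1 ? args[1] : "230.0.2.1";        // group from the thread
        int port = args.length > 2 ? Integer.parseInt(args[2]) : 9000; // assumed port

        if (!isMulticastGroup(group)) {
            throw new IllegalArgumentException(group + " is not a multicast address");
        }
        InetAddress groupAddr = InetAddress.getByName(group);

        if ("send".equals(mode)) {
            MulticastSocket socket = new MulticastSocket(); // any free local port
            try {
                socket.setTimeToLive(4); // assumed TTL; adjust to the number of hops
                byte[] payload = "multicast probe".getBytes("UTF-8");
                socket.send(new DatagramPacket(payload, payload.length, groupAddr, port));
                System.out.println("sent probe to " + group + ":" + port);
            } finally {
                socket.close();
            }
        } else {
            MulticastSocket socket = new MulticastSocket(port);
            try {
                // null lets the JDK pick the default network interface
                socket.joinGroup(new InetSocketAddress(groupAddr, port), null);
                socket.setSoTimeout(30000); // wait up to 30 seconds for a packet
                byte[] buf = new byte[512];
                DatagramPacket packet = new DatagramPacket(buf, buf.length);
                try {
                    socket.receive(packet);
                    String text = new String(packet.getData(), 0, packet.getLength(), "UTF-8");
                    System.out.println("received '" + text + "' from "
                            + packet.getAddress().getHostAddress());
                } catch (SocketTimeoutException e) {
                    System.out.println("no multicast packet received within 30 seconds");
                }
            } finally {
                socket.close();
            }
        }
    }
}
```

Start "java MulticastProbe listen 230.0.2.1 9000" on one server and run "java MulticastProbe send 230.0.2.1 9000" on the other, then swap the roles to test the opposite direction. If the listener times out in either direction, multicast traffic is probably not reaching that host.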

I just found one more issue in our IS.
We are also getting the "[ISS.0033.0103W] Could not open the session cache." error for the file polling ports.

It seems IS is unable to connect to the distributed cache, and that is causing all these errors.

I am not sure how to fix this.

It is causing transactions to fail.

Please suggest.

Thanks,
Amit

Amit, please go through the thread below; it has some useful inputs.

http://tech.forums.softwareag.com/techjforum/posts/list/15/36561.page

Let me know if you are still having issues after performing the steps suggested in that thread.

Thanks,

Hi All,

This issue has now been resolved.
The network team fixed it as follows:
[color=red]
The fix was to vMotion guest server1 onto the same host as guest server2.
Once this was done, the ARP tables were cleared on server1,
and tests in both directions were successful.
[/color]

The problem was caught during multicast testing.

Thanks,
Amit