BrokerServers A01 and A02 are clustered in a high-availability Windows 2003 Enterprise Server cluster.
Yesterday BrokerServer A01 went down and Cluster Administrator activated BrokerServer A02, but it took 40 seconds to bring it online.
At the same time, the IS instances reconnected to BrokerServer A02. I found these details in the server log.
Why did Cluster Administrator take that much time to bring the second one online?
I think your question may actually be a Windows question. It’s Windows clustering that does the work.
But it also takes time for Broker Server to start up. The way Broker Server clustering works is that one Broker Server is active and running. The other is not even running (it cannot be). On failover, the active server is shut down and the standby is started. 40 seconds is fairly quick based on what I’ve seen elsewhere.
Usually the cluster software (Windows in this case) pings the Broker at a fixed interval to check that Broker Server is available (typically every 30 seconds). Once it determines that Broker Server has not been running for one or more intervals (the number depends on configuration), it tries to start the other Broker Server.
40 seconds for this activity doesn’t seem very long. If you still want to reduce the time further, you might want to check the ping frequency and other settings in the clustering software.
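To make the arithmetic in the two replies above concrete, here is a rough sketch in Java. It is only an illustration: the class and method names are made up, the 10-second startup time is an assumed figure, and the real polling interval and failure threshold depend on how the cluster resource is configured.

```java
public class FailoverEstimate {

    /**
     * Rough upper bound, in seconds, on the time between a Broker Server failure
     * and the standby being online. Assumes the failure happens right after a
     * successful health check, so detection needs the full number of missed polls.
     */
    static int worstCaseSeconds(int pollIntervalSec, int missedPollsAllowed, int startupSec) {
        int detectionSec = pollIntervalSec * missedPollsAllowed;
        return detectionSec + startupSec;
    }

    public static void main(String[] args) {
        // 30 s polling (the typical figure mentioned in the thread), one missed
        // poll, and a hypothetical 10 s Broker Server startup.
        System.out.println(worstCaseSeconds(30, 1, 10));
    }
}
```

With these inputs the estimate is 40 seconds, so a failover of that length is consistent with a 30-second health check plus Broker Server startup.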
Thank you, Reamon and JayaRam, for sharing the valuable information.
I am adding one more point to the existing thread. Even BrokerServer A02 acted as Active server, but at the same time Optimize for Infrastructure sent us 15 false alert notification mails in 2 hours for the "Broker server failure rule" (as out of compliance). For this reason I restarted the Infrastructure Data Collector and the Analytic Engine; after that, the false alert mails stopped.
Why was the Infrastructure Data Collector not able to recognize BrokerServer A02?
Do we need to add any details in MWS to avoid such scenarios?
So you’re saying that both A1 and A2 are active at the same time? You’re risking a loss of data in this configuration. Only one active Broker Server should have the broker files open at a given time. If you have independent files for each server, then on failover any unprocessed events in the original server will be left there until failover back to that node happens, which is probably not the desired behavior.
Only one BrokerServer is active at a time, either A01 or A02.
I am facing the problem in the production environment.
At the same time, Optimize for Infrastructure sent us 15 false alert notification mails in 2 hours for the "Broker server failure rule" (as out of compliance).
For this reason I restarted the Infrastructure Data Collector and the Analytic Engine; after that, the false alert mails stopped.
Why was the Infrastructure Data Collector not able to recognize BrokerServer A02?
Do we need to add any details in MWS to avoid such scenarios?
Let me rephrase: are you saying both A1 and A2 are running at the same time? I’m somewhat confused by your statement “Even BrokerServer A02 acted as Active server”.
Are you simply saying that A2 became the active server? And that’s when the alerts started? If so, I apologize for the misunderstanding.
When the failover occurs, does the IP get moved to A2? If not, and only the hostname is moved, but the IP changes, that’s probably the issue. JVMs cache hostname resolutions forever by default. When you restarted the collector/engine JVM it forced a hostname lookup again, this time getting the new IP.
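If the hostname really does resolve to a different IP after failover, one way to avoid restarting the JVM is to limit how long it caches successful lookups. The sketch below uses the standard `networkaddress.cache.ttl` security property. The class name and the 60-second value are only illustrative, the actual default TTL varies by JVM version and by whether a security manager is installed, and whether this can be set for the collector/engine JVM is an assumption that would need checking against the product configuration.

```java
import java.security.Security;

public class DnsCacheTtl {

    /**
     * Limits how long the JVM caches successful hostname lookups.
     * Must run before the first hostname resolution in the JVM,
     * because the cache policy is read once and then kept.
     */
    static void limitDnsCache(int ttlSeconds) {
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(ttlSeconds));
    }

    public static void main(String[] args) {
        limitDnsCache(60); // illustrative value: re-resolve hostnames after 60 s
        System.out.println(Security.getProperty("networkaddress.cache.ttl"));
    }
}
```

The same property can also be set in the JRE's `java.security` file, which avoids changing application code.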
In this scenario, an IP address and a network name have been defined for the cluster group. BrokerServers A01 and A02 are associated with the same cluster group, as per the documentation.
In a failover, the clients see the IP address and network name of the cluster group, not those of the individual nodes (A01 and A02).
Any idea whether the JVM caches the cluster group's IP address and network name, or the individual cluster nodes' IP addresses and network names?
We have found the following errors in the server log. Please find the details below:
EST [ISS.0098.0036E] DefaultProducer encountered Transport Exception: com.wm.app.b2b.server.dispatcher.exceptions.EndpointUnavailableException: [ISS.0098.9014] BrokerException: Timeout (112-1450): The request timed out.
EST [ISS.0098.0041C] Unable to connect to Broker. Starting to poll
EST [ISC.0088.0001E] SOAPException: [ISS.0088.9112] An Exception was thrown in the server:Siebel-Error invoking service ABI Siebel Adapter, method Upsert at step Upsert.(SBL-BPR-00162)EAIObjMgr_enu_0018_18998350.log SBL-BPR-00162 Error invoking service ABI Siebel Adapter, method Upsert at step Upsert.(SBL-BPR-00162) SBL-EAI-04421 IDS_RRN_ABI_SA_DML Cannot perform UpdateRecord on the business component Action(SBL-EAI-04421)
EST [ISS.0025.0025I] Broker Synchronizer initialized
EST [ISS.0098.0042I] Successfully reconnected to Broker. Stopped polling;
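For anyone who wants to check how often an IS instance lost and regained its Broker connection, here is a minimal sketch that scans server log lines for the two message codes shown above, `ISS.0098.0041C` and `ISS.0098.0042I`. The class and method names are made up for illustration, and the excerpt here has no timestamps, so the sketch only counts completed disconnect/reconnect cycles and does not measure how long each one lasted.

```java
import java.util.Arrays;
import java.util.List;

public class BrokerLogScan {
    // Message codes taken from the log excerpt in this thread.
    static final String LOST = "[ISS.0098.0041C]";     // "Unable to connect to Broker. Starting to poll"
    static final String RESTORED = "[ISS.0098.0042I]"; // "Successfully reconnected to Broker. Stopped polling"

    /** Counts disconnects that were later followed by a successful reconnect. */
    static int countCompletedOutages(List<String> lines) {
        int completed = 0;
        boolean polling = false; // true while IS is polling for the Broker
        for (String line : lines) {
            if (line.contains(LOST)) {
                polling = true;
            } else if (line.contains(RESTORED) && polling) {
                polling = false;
                completed++;
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        List<String> excerpt = Arrays.asList(
            "EST [ISS.0098.0041C] Unable to connect to Broker. Starting to poll",
            "EST [ISS.0025.0025I] Broker Synchronizer initialized",
            "EST [ISS.0098.0042I] Successfully reconnected to Broker. Stopped polling;");
        System.out.println(countCompletedOutages(excerpt));
    }
}
```

A disconnect with no matching reconnect is not counted, which makes it easy to spot an instance that is still polling.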