Hi All - Do you know if the MQ Adapter caches connections? If so, how can one prevent it? My suspicion is that wM caches MQ connection information, even after you bounce the IS.
Here’s the behavior we noticed during a Disaster Recovery test. Our Production IS712 was connected to Production MQ (version 6.0.2.3). The MQ Admin failed over from their PROD to their DR environment. From that point onwards, wM could no longer connect to MQ, even after it was bounced. I suspect this is because wM has cached the MQ connection information and is not robust enough to recognize that MQ is now running on a different server.
Has anyone seen this behavior? Or am I missing something?
The JVM caches hostname resolutions forever by default. Perhaps that played a part? You can disable hostname resolution caching via JVM properties or command-line parameters. Search for networkaddress.cache.ttl for information on the available switches.
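For what it’s worth, here’s a minimal sketch of lowering that TTL programmatically (the hostname is a placeholder). Keep in mind this is a *security* property, so the supported place for it is the java.security file, and setting it from code only works if it runs before the JVM does its first lookup:

```java
import java.net.InetAddress;
import java.security.Security;

public class DnsCacheTtl {
    public static void main(String[] args) throws Exception {
        // The "cache forever" default applies when a security manager is
        // installed; these properties must be set before the first lookup.
        Security.setProperty("networkaddress.cache.ttl", "60");          // successful lookups: 60s
        Security.setProperty("networkaddress.cache.negative.ttl", "10"); // failed lookups: 10s

        // Resolve twice, pausing past the TTL, to watch the cache expire.
        String host = args.length > 0 ? args[0] : "mqhost.example.com";  // placeholder MQ host
        System.out.println(host + " -> " + InetAddress.getByName(host).getHostAddress());
    }
}
```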
By “bounced” do you mean the IS was restarted? Or the server hosting the IS? The OS can cache hostname resolutions too, but that usually isn’t a problem, depending on the DNS entry’s TTL. Hostname-to-IP updates can take a bit of time to propagate, but I doubt that was the issue here.
After the failover, were other machines able to connect to the DR instance? Ping it? Resolve it to the right IP?
This feels like a DNS config type of issue rather than an IS MQ adapter issue, but that’s just a preliminary suspicion based on the info thus far.
Hi Rob - Yes, I bounced just the IS. And after MQ failed over to node2, I was able to telnet to that hostname & port (which is probably an alias for the MQ load balancer). So my host was able to communicate with the target MQ host.
If the JVM caches the hostname’s IP address forever, that would certainly not serve us well during a Disaster Recovery scenario when the target system (MQ, Oracle, Siebel, etc.) fails over to a new node. I am almost certain that’s what hosed us, because I was able to connect to Oracle. Also, the network team was sniffing packets coming out of our IS and saw no data going out when I disabled and re-enabled the MQ connections.
I know we set some JVM parameter (something like “cache.ttl”), which I’ll check. And I’ll also look into the “networkaddress.cache.ttl” setting.
Our IS does set “networkaddress.cache.ttl=60”, so now I’m really confused about why the hostname-to-IP lookup didn’t happen after the 60-second caching period expired. Maybe we’re not setting it correctly? Does it perhaps need to be set with a JAVA_ARG5="-Dnetworkaddress.cache.ttl=60" style syntax?
So just to be clear: the IS was bounced and still didn’t connect to the target MQ server. However, when I telnetted to that target MQ server from the IS’s host (from the command line), it connected correctly. This tells me the DNS lookup happened correctly at the host level. It was just the IS that had somehow still cached the old server’s connection information.
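One way to confirm where the stale answer lives is to do the lookup from Java itself (i.e., from a JVM rather than from telnet/nslookup) and compare the results from the same host. A minimal sketch, with a placeholder hostname:

```java
import java.net.InetAddress;

public class JvmLookup {
    public static void main(String[] args) throws Exception {
        // Ask the JVM (not the OS tools) what this name resolves to. If this
        // output differs from nslookup on the same box, the stale entry is in
        // the JVM's InetAddress cache, not the OS resolver.
        String host = args.length > 0 ? args[0] : "mq-lb.example.com"; // placeholder alias
        for (InetAddress a : InetAddress.getAllByName(host)) {
            System.out.println(host + " -> " + a.getHostAddress());
        }
    }
}
```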
The networkaddress.cache.ttl property needs to be in the java.security file. Is that where you put it? It isn’t supported as a command-line argument AFAIK. To set it on the command line, the property is sun.net.inetaddr.ttl, and that works only with a Sun JVM (I think).
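For reference, the supported form is an entry in the java.security file, not a -D flag:

```
# <java-home>/jre/lib/security/java.security
networkaddress.cache.ttl=60
```

The Sun-JVM-only command-line fallback would be -Dsun.net.inetaddr.ttl=60.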
Thanks for the feedback, Rob. I suspect something else is going on too, since bouncing the IS should have done the trick. Will keep you posted on what we find. The SR has been escalated with SAG.
Ok. We’ve been able to reproduce the MQ rc 2059 (MQRC_Q_MGR_NOT_AVAILABLE, queue manager not available) and have also identified a workaround. The workaround is goofy, so be forewarned …
Here’s the scenario:
With SSL enabled, when MQ fails over from PROD node1 to node2 (or moves from their PROD to DR environment), the wM MQ connections handle it very well: wM points to the new MQ node (or environment). I validated this by toggling an MQ connection that has minConns=1 with connection pooling turned on. Our MQ Admin verified he saw channels running from wM’s IP address.
Then, if you bounce the IS712, when wM comes back up the MQ connections return rc 2059. It seems that after a restart, the IS is somehow pointing back to the original MQ node and not to the one MQ moved to.
Here’s the goofy workaround (a sketch for scripting the disable/enable steps follows the list):
1 - Disable any MQ connection packages and restart the IS.
2 - When the IS is back up, create a new connection in any package (e.g., the Default package) and it will enable just fine.
3 - Re-enable the MQ connection packages from step 1 and you’ll notice they connect fine.
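If it helps anyone reproduce this, the disable/enable toggling in steps 1 and 3 can be scripted against the IS using the WmART built-in services pub.art.connection:disableConnection and pub.art.connection:enableConnection. A rough sketch using the IS Java client API; the host, credentials, and connection alias are placeholders, and the IS client jar must be on the classpath:

```java
import com.wm.app.b2b.client.Context;
import com.wm.data.IData;
import com.wm.data.IDataCursor;
import com.wm.data.IDataFactory;
import com.wm.data.IDataUtil;

public class ToggleMqConnection {
    public static void main(String[] args) throws Exception {
        Context ctx = new Context();
        ctx.connect("is-host:5555", "Administrator", "secret"); // placeholders

        // Build the input pipeline: connectionAlias is the namespace name of
        // the adapter connection, e.g. folder.subfolder:connectionName.
        IData input = IDataFactory.create();
        IDataCursor cur = input.getCursor();
        IDataUtil.put(cur, "connectionAlias", "mq.conns:prodQM"); // placeholder alias
        cur.destroy();

        ctx.invoke("pub.art.connection", "disableConnection", input);
        ctx.invoke("pub.art.connection", "enableConnection", input);
        ctx.disconnect();
    }
}
```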
So if anyone can figure out what’s causing this behavior, I’d greatly appreciate your thoughts. Note that if we perform the MQ failover and wM restart with SSL disabled, we don’t get the rc 2059s.
Hi Rob - Yes, we always set minConns=0. I set minConns=1 for one of our connections so the MQ team could see an active channel running on their end. It was only for diagnostic purposes.
Do you think that having even 1 out of a dozen MQ connections set to minConns=1 could be causing the issue? Also noteworthy: when I delete those MQ connection packages and restart the IS (as opposed to just disabling them), the “trick” doesn’t work.
Anyway, I’ve posed this to SAG as well, and they’re intrigued and investigating too. Will keep you posted.
It turns out that with the MQ Adapter using SSL, one has to re-enter each MQ connection’s keystore password and save each connection individually, then restart the IS. This resolved the rc 2059 issue.
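In case it saves someone a cycle: a quick way to sanity-check a keystore path/password pair before re-keying every connection is to try loading it from Java. A minimal sketch, assuming a JKS keystore; the path and password are placeholders:

```java
import java.io.FileInputStream;
import java.security.KeyStore;

public class KeystoreCheck {
    public static void main(String[] args) throws Exception {
        String path = "/opt/ssl/mqclient.jks";        // placeholder path
        char[] password = "changeit".toCharArray();   // placeholder password

        KeyStore ks = KeyStore.getInstance("JKS");
        FileInputStream in = new FileInputStream(path);
        try {
            // load() throws an IOException if the password doesn't match --
            // exactly the condition that re-entering the password fixes.
            ks.load(in, password);
        } finally {
            in.close();
        }
        System.out.println("Keystore opened OK; entries: " + ks.size());
    }
}
```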