I have two Integration Servers, where one performs a RemoteInvoke to the other. Over time I start seeing the wm.server:ping service being called every five minutes. I look through the logs and I can’t find a failed RemoteInvoke or one that didn’t finish. I also have timeout values associated with the Remote Server alias. I just can’t figure out where these pings are coming from or how to kill them off, except by restarting webMethods.
I am running wM 6.1 under Red Hat Linux on both servers.
Thanks.
The wm.server:ping service is called to ensure that the session on the target server does not expire. This is currently controlled by the underlying Client Context class that the remote server alias uses. The ‘ping’ interval is determined by the session timeout value on the remote server.
Restarting your Integration Server makes the ping calls disappear only if you do not attempt to use the server alias in a remote invoke.
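To make the keep-alive idea concrete, here is a small Python simulation (not webMethods code). It assumes a hypothetical session timeout of 600 seconds and a ping every 300 seconds; the real ratio between the two is not stated in this thread, only that the interval is derived from the remote session timeout. The function and parameter names are made up for illustration.

```python
def simulate_session(session_timeout_s, ping_interval_s, duration_s):
    """Return the second at which the session expires, or None if it survives.

    ping_interval_s=None (or 0) models a client that never pings.
    """
    last_activity = 0
    next_ping = ping_interval_s if ping_interval_s else None
    for now in range(1, duration_s + 1):
        # A ping counts as activity and resets the idle clock.
        if next_ping is not None and now == next_ping:
            last_activity = now
            next_ping += ping_interval_s
        # The server expires the session after a full idle timeout.
        if now - last_activity >= session_timeout_s:
            return now
    return None


# Hypothetical numbers: 10-minute session timeout, observed over one hour.
with_ping = simulate_session(600, 300, 3600)
without_ping = simulate_session(600, None, 3600)
```

In this model the session with a pinger never goes idle long enough to expire, which would match the long-lived connections seen on the target server, while the session without a pinger ends after one idle timeout.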
ed
I thought the session timeout value was used to timeout and end a session once there was no activity for that period of time. I didn’t think it was used to keep a session active.
Any guesses as to why the pings start up in the first place and then continue? Also, I assume you are agreeing that there is no way to stop these ping sessions except by stopping and starting webMethods?
Sorry I re-read your reply and I thought of something else.
The number of individual pings depends on the total number of sessions used by Remote Invoke. If you have global scope set for an alias, the same session is used for all remote invocations; session scope creates a remote session for each local unique session that uses the remote server alias.
Typically you should just use global scope, which reduces resource management on both servers; however, there are times when you’ll want to use session scope, for example when the remote services maintain state across service invocations.
One thing to note is that each pinger thread is released if you have a timeout set on the remote alias.
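The counting rule described above can be sketched in a few lines of Python. This is only a model of the description in this post, not the Integration Server implementation; the function name and the session IDs are hypothetical.

```python
def remote_session_count(local_session_ids, scope):
    """Estimate how many remote sessions (and so pingers) an alias would hold.

    local_session_ids: IDs of local sessions that invoked through the alias.
    scope: "global" or "session", as described for Remote Invoke.
    """
    unique_sessions = set(local_session_ids)
    if scope == "global":
        # One shared remote session, as long as the alias was used at all.
        return 1 if unique_sessions else 0
    if scope == "session":
        # One remote session per distinct local session.
        return len(unique_sessions)
    raise ValueError(f"unknown scope: {scope!r}")


# Hypothetical local sessions that used the alias ("a" used it twice).
callers = ["a", "b", "a", "c"]
global_count = remote_session_count(callers, "global")
session_count = remote_session_count(callers, "session")
```

Under this model, several concurrent pings from one server would be consistent with session scope being used by several distinct local sessions.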
Ed
I’m not sure what “global scope/session scope” is, but I am assuming that is some setting with Remote Invoke. I know it isn’t part of defining the Alias because I have never seen that setting. I will check with the developers on that.
I will try again to explain my problem; sorry for my failed previous attempts, or maybe I’m just missing what you are trying to tell me. If I watch the logs/connections between servers, I will see server A issue a Connect to server B, then issue a Remote Invoke to run a service on server B, and then issue a Disconnect. On server B I will see the connection come in and then disappear. This happens hundreds of times during the day with no problem and no ping service being called.
But among all those, I currently see five ping services being executed every 5-10 minutes from server A to server B. If I look on server B, I see five connections from server A that have been connected for over 80,000 seconds and growing. On server A, the Remote Server alias I have set up for server B specifies a timeout of 10, and on server B I have Session Timeout set to 30, yet these sessions continue. Over time there will be more than five, but I can’t seem to catch why or when they occur.
I believe this issue has happened to quite a few people before, and I am surprised it is still not fixed.
We also had problems with multiplying threads pinging remote servers.
But in our case, we had many of them because of the Manager Server.
The Manager Server invokes a service in the managed IS (located in WmOmiAgent) which will enable it to receive OMI notifications from the remote server. But it does not only do that, it also causes the remote server, IS, to ping the Manager Server similar to what happens with a remote alias.
When there is a network issue between the Manager and the Managed, some Pinger threads can hang (temporarily or forever), causing the IS running these threads to think the pinger is down, so it creates another one.
At some point, we even found an IS with hundreds of hanging Pinger threads and a dozen active Pingers. Imagine you have something like that on production servers.
Something similar can happen with a remote alias.
We have had an SR# open with webMethods for quite a while, with a bunch of traces attached, but for some reason it is being ignored…
riad
By the way, could you tell me the value of watt.net.timeout you use in your IS ?
It is in the Extended Settings but not visible until you make it visible.
I’ve checked both servers and it is set to 0.
Ok, 0 is the default value and it is a bad value (I wonder why bad values are quite often the default ones…).
Change it to 60 or 120 (seconds), so any socket used by a ping has to timeout within that time if there is a network failure or something like that.
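As a generic illustration of why a socket timeout matters here, the Python sketch below reads from a peer that never answers. It is not webMethods code, and it assumes that a value of 0 means "wait indefinitely", which is what the advice above implies; the function name is hypothetical.

```python
import socket


def read_with_timeout(host, port, timeout_s):
    """Try to read a reply; return None if nothing arrives within timeout_s.

    timeout_s=0 is mapped to None, i.e. block forever (assumed meaning of 0).
    """
    effective = timeout_s or None
    with socket.create_connection((host, port), timeout=effective) as sock:
        sock.settimeout(effective)
        try:
            return sock.recv(1024)
        except socket.timeout:
            # The peer stayed silent: give up instead of hanging the thread.
            return None


if __name__ == "__main__":
    # A local listener that accepts the TCP handshake but never replies.
    silent = socket.socket()
    silent.bind(("127.0.0.1", 0))
    silent.listen(1)
    print(read_with_timeout("127.0.0.1", silent.getsockname()[1], 0.5))
    silent.close()
```

With a finite timeout the call returns and the calling thread is free again; with no timeout the same call would block for as long as the peer stays silent, which is the kind of hang described for the Pinger threads.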
It would be nice if you could tell us the result after you change it.
riad
I will try changing it and post the results.
I still have to ask why the ping starts in the first place. This ping is not being called either manually or by any of our code. Something in webMethods causes these to start because of some unknown conditions.
Thanks.
RemoteInvoke uses the Context Java object (or TContext, I am not sure).
Anyway, the Context object will automatically start a ping thread to ping the target IS.
And you cannot control it. Weird design.
If you want another solution, use the simple HTTP client, it is lighter and faster and has less side effects.
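A minimal sketch of the plain-HTTP approach, written in Python for brevity. It assumes the target service is reachable at an `/invoke/folder/service` URL with basic authentication; the host, port, service name, and credentials below are placeholders, and the helper name is hypothetical.

```python
import base64
import urllib.request


def build_invoke_request(host, port, service, user, password):
    """Build an HTTP request for a service named like 'folder.sub:serviceName'."""
    folder, name = service.split(":", 1)
    url = f"http://{host}:{port}/invoke/{folder}/{name}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


# Placeholder values only; nothing is sent until urlopen() is called.
req = build_invoke_request("server-b.example.com", 5555,
                           "my.folder:myService", "user", "pw")
# To actually call it, with an explicit timeout in seconds:
# with urllib.request.urlopen(req, timeout=60) as resp:
#     body = resp.read()
```

Each call made this way is a single request with its own timeout, so the client side keeps no background thread running between invocations.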
The ping is done to ensure that the session on the target Integration Server is kept alive. One of the driving reasons for this behavior is that the session potentially contains state information built over a series of service invocations, which is lost if a session timeout occurs.
I still need to do more testing, but the initial result is that setting it to 30 caused even more pings to start up - a lot more.
Interesting…
I advise you, however, to increase the timeout value; 30 is too short.
By any chance, is the timeout associated with the Alias used by the RemoteInvoke lower than 30 secs ?
I set watt.net.timeout to 60.
We currently get lots of “[ISS.0036.0009E] Ping Failed” entries in the server logs of most of our Integration Servers, while a Remote Server Alias was only used between two of them.
The Integration Servers are spread across multiple physical machines, but the messages appear on most of them. Astonishingly, the messages do not appear on all Integration Servers, but on more than the two that were connected by a Remote Server Alias. The Integration Server on which the Remote Server was configured does not seem to be affected.
I removed the Remote Server Alias but it does not seem to change anything. Does anyone know of a solution to fix the problem?
Regards,
Ulf Licht
chris.paluch,
What was the outcome of your testing with watt.net.timeout=60 and alias timeout=5 minutes?
Ulf,
- Did you restart your IS’s when you removed the remote aliases? I mean both the IS that contained the aliases and the IS’s being referenced by the aliases.
- Are you running Manager Server, and did you add your IS’s to it?
riad