Delay in XML RPC Server reconnecting to EntireX broker

This really isn’t an error condition so I hate to bug support about it, but I am at a loss to explain why two different XML RPC Servers behave differently when reconnecting to the EntireX broker node once it comes back up.

The EntireX broker node 203 goes down at 7pm Saturday evening. Two XML RPC Servers that connect to it run on the same Windows server, and both have the same timeout parameters in entirex.xmlrpcserver.properties:

entirex.server.restartcycles=240

# Timeouts

entirex.server.waitattach=600S
entirex.server.waitserver=300S
entirex.timeout=60

About 4 hours after the Broker is shut down, the XML RPC Server services shut down:

2015-06-20 23:05:03.868> EntireX-*er-Main) Leave: XMLRPCServer.startXMLRPCServer()

The Broker node 203 was restarted at 1:21 am on Sunday.

As for the XML RPC Servers, they came up about an hour apart, with the first one starting about 2 hours after the Broker was back up and running:

Service A:

2015-06-21 03:16:53.265/main-1 Trace started:

W00/REQUEST 2015-06-21 03:17:32.353 ETBD0282 SPFX Values:
Prefix = 80010A2800000028000000000800000000000000000000D803000000000000
000000000000000000
Unique-ID = 31354A756E32312D3031323130312D3030303030312D303030303342
Key string: ,BID:10203=vista.tcc.etn.com,F=REGISTER,UID=XMLRPCServer,SC=RP
C,SN=UNIFYARGLFEED,SV=CALLNAT,API=10,ANODE=CLEOHSSAG01,ATYPE=Java,AVERS=9.
6.0.0.161,ANAME=XML RPC Server,ETXL=256.
SeqID = 369

Service B:

2015-06-21 04:15:10.069/main-1 Trace started:

W00/REQUEST 2015-06-21 04:15:18.536 ETBD0282 SPFX Values:
Prefix = 80010A2800000028000000000800000000000000000000DE03000000000000
000000000000000000
Unique-ID = 31354A756E32312D3031323130312D3030303030312D303030303531
Key string: ,BID:10203=vista.tcc.etn.com,F=REGISTER,UID=XMLRPCServer,SC=RP
C,SN=ORACREATESALESORDER,SV=CALLNAT,API=10,ANODE=CLEOHSSAG01,ATYPE=Java,AV
ERS=9.6.0.0.161,ANAME=XML RPC Server,ETXL=256.
SeqID = 593

This really wouldn’t be noticed at that time of day, except that we deployed a monitoring process that starts at 4am Sunday and checks whether all services are registered to the Brokers, and a ticket was created for this one development service not being registered. It did come up on its own 15 minutes later, which is an hour after the other one did.

What would be the reason for such a delay in restarting and reconnecting?

Also, how can I ensure the XML RPC Server services start and reconnect as soon as possible once the Brokers are available again?

Thanks,

Brian

Brian,
how are the RPC Servers restarted after their shutdown?
They can’t restart by themselves simply because they are no longer running.

The Windows services are set up to automatically restart.

When I go to Services and click Properties and then the Recovery tab, they both are set as such:

First failure: Restart the Service

Second failure: Restart the Service

Subsequent failures: Restart the Service

Reset fail count after: 1 days

Restart service after: 5 minutes

So, I would expect the service to constantly retry a restart every 5 minutes after it times out and shuts down, so both should have reconnected within 5 minutes of the Broker being back up again.

I would really like such a quick turnaround on reconnecting, and it doesn’t matter to me if it’s the Windows service staying up through a Broker outage or a Windows service restart. It seems either way the XML RPC Server service should reconnect to the Broker within 5 minutes of the Broker being up again in either state because of the XML RPC server parameters or the Windows service configuration.

The fact that it eventually does start and reconnect is nice. But now that I have a callout and a ticket to resolve every Sunday at 4am… that’s not nice.

Thanks,

Brian

The RPC Servers try to reconnect every 60 seconds until the number of attempts set by entirex.server.restartcycles is used up.
You have entirex.server.restartcycles=240, so the Servers terminate after 240*60 seconds = 4 hours.

If you set this value to 480, they will try reconnecting for 8 hours.
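
To make the arithmetic concrete, here is a minimal Python sketch. It assumes the fixed 60-second retry interval described above; the function names are made up for illustration and are not part of EntireX.

```python
import math

# Assumption: one reconnect attempt every 60 seconds, as stated in this reply.
RETRY_INTERVAL_SECONDS = 60


def retry_window_hours(restart_cycles: int) -> float:
    """Hours the server keeps retrying before it gives up and terminates."""
    return restart_cycles * RETRY_INTERVAL_SECONDS / 3600


def cycles_for_outage(outage_hours: float) -> int:
    """Smallest restartcycles value that covers an outage of the given length."""
    return math.ceil(outage_hours * 3600 / RETRY_INTERVAL_SECONDS)


print(retry_window_hours(240))  # 4.0
print(retry_window_hours(480))  # 8.0
print(cycles_for_outage(9))     # 540
```

In other words, pick the longest Broker outage you expect, convert it to minutes, and use at least that number for entirex.server.restartcycles.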

Doing some Googling, I found someone else asking a similar question about Windows services (not EntireX related).

One of the suggestions there was to create a bat file that can be scheduled to run on the Windows server every 5 minutes like this (example is for the Print Spooler):

@echo off
Rem Look for the Print Spooler service in the list of started services
net start | find /i "Print Spooler"
Rem If not found, log that a restart occurred and start the service.
if "%errorlevel%"=="1" (
   echo Service "Print Spooler" restarted at %time% on %date% by Script %0>>c:\ServiceRestart.Log
   net start "Print Spooler"
)
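
For anyone who would rather schedule a script than a bat file, the same check might look roughly like this in Python. This is only a sketch: it assumes the Windows `sc query` and `net start` commands are on the path and that the `sc query` output contains a STATE line with the word RUNNING. The function names are hypothetical, and the service name is whatever your XML RPC Server service is registered as.

```python
import subprocess


def service_is_running(sc_query_output: str) -> bool:
    """Return True if the `sc query` output reports the RUNNING state."""
    for line in sc_query_output.splitlines():
        if "STATE" in line and "RUNNING" in line:
            return True
    return False


def ensure_running(service_name: str, run=subprocess.run) -> bool:
    """Start the service if it is not running.

    Returns True if a start command was issued, False if it was already up.
    The `run` parameter exists only so the command runner can be swapped out.
    """
    result = run(["sc", "query", service_name], capture_output=True, text=True)
    if service_is_running(result.stdout):
        return False
    run(["net", "start", service_name], capture_output=True, text=True)
    return True


# Example call from a task scheduled every 5 minutes (Windows only):
#     ensure_running("Print Spooler")
```

Like the bat file, this only issues a start command when the service is not listed as running, so it is safe to run on a short schedule.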

I guess if the Windows Service recovery configuration doesn’t actually restart the service after 5 minutes the way one would think, I could take this approach instead. However, if this is a best practice when it comes to restarting XML RPC Servers, I assume someone in this community can back that up.

-Brian

And if you set it to e.g. 10000000 they will try reconnecting more or less forever.

I will do as you suggest, Rolf, since then I don’t have to mess with the bat file idea I just posted.

The entirex.xmlrpcserver.properties were coded for 4 hours because that’s the most the Broker nodes are down for SIT, QA and production environments (unless there is a longer scheduled outage for maintenance planned). But for development it is actually 9 hours because we shut development down at 7pm instead of midnight, so I should really use entirex.server.restartcycles=540 for development.

Then if I want to handle extended outages, I could code the bat files for quicker Windows service restart, as it seems the Windows Service recovery setting to restart after 5 minutes really means within 5 minutes of some time when Windows feels like getting around to it.

I guess 10 million cycles, which at 60 seconds each works out to roughly 19 years, ought to take care of just about any extended maintenance we could ever think of. LOL.

Thanks!