KeepAlive checked immediately after open of the connection

We have a Jms UM client implemented in a Oracle Java Stored Procedure. This means that each call should open JNDI and JMS connection to the UM server(s) pull data, close the connection and return the data being pulled. The connection is not open once and then kept open.

We see in the client log that after the first call being made, for the next ones - immediately after opening of the connection - the driver closes it because KeepAlive has not been received. It should wait 60+5 sec, not to check for missed KeepAlive immediately on open.

In the client log we have:
[Wed Nov 20 06:52:54 CET 2019] Session 10 attempting connection to [nsps://xxx:5150/]
[Wed Nov 20 06:52:54 CET 2019] Getting Default SSL SocketFactory
[Wed Nov 20 06:52:54 CET 2019] SSL-Socket> Close thread is blocked
[Wed Nov 20 06:52:54 CET 2019] IO : [xxx:5150] KeepAlive> Still not received any events from the server after allowing time for drift. Closing connection.

The client driver has parameter PermittedKeepAlivesMissed = 1

The version of the UM is 10.1.0.16.121219

In the UM driver code we see that the control of the event is based on com.pcbsys.foundation.base.fTimer - with static methods.
It is loaded into Jvm on the first call and for the subsequent calls the static values are preserved.

The KeepAlive check in com.pcbsys.foundation.io.fConnectionKeepAlive is the following:
if (fConnectionSettings.sPermittedMissedKeepAlives <= 0) {
return true;
} else {
long var1 = fTimer.getTicks();
long var3 = var1 - this.myConnection.myLastRxTime + 1000L;
long var5 = (long)fConnectionSettings.sPermittedMissedKeepAlives * this.myKeepAlive;
if (var3 >= var5) {
if (var3 >= var5 + 5000L) {
if (fConstants.logger.isWarningEnabled()) {
fConstants.logger.warn(“IO : [” + this.myConnection.id + “] KeepAlive> Still not received any events from the server after allowing time for drift. Closing connection.”);
}
where this.myConnection.myLastRxTime is not initialised on the creation of the connection but only on the first read. Having the fTimer static values and myLastRxTime returning 0 → the calculation above “(var3 >= var5 + 5000L)” seems to be true before the first read, so immediately after connect if KeepAlive is not received withing milliseconds - the client driver closes the connection immediately.

I don’t have experience with Oracle java stored procedures and the threading model that’s enforced there.
I’m afraid it’s not something we’ve aimed to support (respectively test) explicitly.

Generally speaking, the fTimer is a simple tick timer that uses java’s standard Timer. Indeed, the connection last read time is updated on read opearation, but that is something which should happen as part of the handshake routine when the connection is open, if that went fine.

My gut feeling is that the threading model in the Oracle DB is different from what you see in a standard JVM. I found some documentation here - Overview of Threading in Oracle Database

What’s the version of the Oracle DB you are using?
Would you be able to share the code you are using to manage the JNDI context and JMS connections lifecycle.
I wonder if we can get a thread dump from the DB JRE when the keep alive check fails…

Thanks,
Stefan

Hi Stefan,

The only difference between calling it from a standalone Java application and Oracle Java stored procedure is that we cannot keep the JNDI connection open [or open it only once to do the lookup] and use listener to retrieve the messages or pull messages in a loop. We must reconnect every X seconds/minutes to both JNDI and to Jms servers.
Here is the code:

com.pcbsys.nirvana.client.nConstants.setPermittedKeepAlivesMissed(1);

Context context = null;
Connection connection = null;
Session session = null;
MessageConsumer consumer = null;

try {
  System.setProperty("javax.net.ssl.keyStore", sKeyStore);
  System.setProperty("javax.net.ssl.keyStoreType", "pkcs12");
  System.setProperty("javax.net.ssl.keyStorePassword", sKeyStorePassword);
  System.setProperty("javax.net.ssl.trustStore", sTrustStore);
  if (sTrustStorePassword != null) 
    System.setProperty("javax.net.ssl.trustStorePassword", sTrustStorePassword);

  Properties env = new Properties();
  env.setProperty("java.naming.factory.initial", "com.pcbsys.nirvana.nSpace.NirvanaContextFactory");
  env.setProperty("java.naming.provider.url", sProviderURL);
  context = new InitialContext(env);
  ConnectionFactory factory = (ConnectionFactory)context.lookup(sConnFactoryName);
  connection = factory.createConnection();
  connection.start();
  session = connection.createSession(false, Session.CLIENT_ACKNOWLEDGE);
  Destination destination = session.createQueue(sQueueName);
  consumer = session.createConsumer(destination);
  Message message = consumer.receive(5000);

The driver logs lines come from a single statement:
context = new InitialContext(env);

I believe it can be reproduced with Java standalone client - instead of opening the JNDI connection once at the startup of the application - to have open+close executed in a loop with a sleep time in between.

Kind regards,
Gergana

Please also find below the rest of the Jms client log if it may help:

[Wed Nov 20 06:52:54 CET 2019] Setting log level to WARN
[Wed Nov 20 06:52:54 CET 2019] Session 10 has been created with URL list: nSessionAttributes:conns=1/[nsps://XX:5150/]
[Wed Nov 20 06:52:54 CET 2019] Session 10 has been created with URL list: nSessionAttributes:conns=1/[nsps:/XXX:5150/]
[Wed Nov 20 06:52:54 CET 2019] Session 10 attempting connection to [nsps://XXX:5150/]
[Wed Nov 20 06:52:54 CET 2019] Getting Default SSL SocketFactory
[Wed Nov 20 06:52:54 CET 2019] SSL-Socket> Close thread is blocked
[Wed Nov 20 06:52:54 CET 2019] IO : [XXX:5150] KeepAlive> Still not received any events from the server after allowing time for drift. Closing connection.
[Wed Nov 20 06:52:54 CET 2019] SSL-Socket> Close thread is blocked
[Wed Nov 20 06:58:54 CET 2019] Failed to connect to Realm on attempt 1 - nSessionAttributes:conns=1/[nsps:/XXX:5150/]
[Wed Nov 20 06:58:54 CET 2019] com.pcbsys.nirvana.client.nSessionNotConnectedException: The session is not currently connected to the server. Unable to perform the request:Session has been closed via another thread
com.pcbsys.nirvana.client.nSessionNotConnectedException: The session is not currently connected to the server. Unable to perform the request:Session has been closed via another thread
at com.pcbsys.nirvana.base.clientimpl.nConnectionManager.writeEvent(nConnectionManager.java)
at com.pcbsys.nirvana.base.clientimpl.nConnectionManager.writeEvent(nConnectionManager.java:800)
at com.pcbsys.nirvana.base.clientimpl.nConnectionManager.requestHandshake(nConnectionManager.java:605)
at com.pcbsys.nirvana.base.clientimpl.nConnectionManager.connect(nConnectionManager.java:551)
at com.pcbsys.nirvana.base.clientimpl.nConnectionManager.initialise(nConnectionManager.java:345)
at com.pcbsys.nirvana.client.nSession.init(nSession.java:267)
at com.pcbsys.nirvana.client.nSession.init(nSession.java:205)
at com.pcbsys.nirvana.nSpace.NirvanaContext.(NirvanaContext.java:163)
at com.pcbsys.nirvana.nSpace.NirvanaContext.(NirvanaContext.java:83)
at com.pcbsys.nirvana.nSpace.NirvanaContextFactory.getContext(NirvanaContextFactory.java:69)
at com.pcbsys.nirvana.nSpace.NirvanaContextFactory.getInitialContext(NirvanaContextFactory.java:50)
at javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:684)
at javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:313)
at javax.naming.InitialContext.init(InitialContext.java:244)
at javax.naming.InitialContext.(InitialContext.java:216)

This is quite puzzling and I would be genuinly surprised if it’s reproducible in a standalone java application.
The keep alive task is scheduled to run 60s after the connection is created. There could be something wrong with the thread scheduler, or the timer, causing the task to fire prematurely, but I’m really worried about the threading support in the DB JRE.
What’s also curious is this “SSL-Socket> Close thread is blocked” log entry - it kind of suggests a close operation on the driver didn’t finish for 40 seconds.
Furthermore there is a 6m delay between the session termination and the attempt to use it, which doesn’t really make sense since the JNDI context will initiate the session immediately after it’s created.

I’m afraid I don’t have an Oracle DB at hand to test that. I’m going to check if there is anyone with field experience in the area.

In the meantime, could you share what is the version of the DB.

I’m assuming the JNDI context is closed at the end of the stored procedure, right?

Thanks,
Stefan

Hi Stefan,

Further update on this issue.

Another specific when calling the Jms from Oracle Java stored procedure is that between the calls the JVM is on hold - the timer receives no ticks. When the next call is invoked the timer still has the old number or ticks. It seems to catch up soon after.
Here is a proof - random entry from the log, but the pattern is seen for every single occurance:

*** 2019-11-27T06:41:20.616359+01:00

[Tue Nov 26 22:54:19 CET 2019] Session 10 has been created with URL list: nSessionAttributes:conns=1/[nsps://XXX:5150/]
[Tue Nov 26 22:54:19 CET 2019] Session 10 has been created with URL list: nSessionAttributes:conns=1/[nsps://XXX:5150/]
[Wed Nov 27 06:41:20 CET 2019] Session 10 attempting connection to [nsps://XXX:5150/]

The top line is the Oracle timestamp. The first 2 lines have very wrong date and time - the one from the previous call from this session. The code of com.pcbsys.foundation.logger.fLogger changes the datetime in the log only “if (sLastDateChg + 1000L < fTimer.getTicks()) {”. This seems to prove the Timer is being held on hold.
As you can see above, sooner after the timer catches up - always less than a second.

As a summary - between the calls the Timer is on hold, then the second call is more than 60 sec after the first one - keepAlive seems to be scheduled immediately.
This problem never occurs when the call are within 60 sec from each other.

It seems that even if keepALive is scheduled immediately, if inside this class the variable myLastRxTime if NULL does not get value 0 in the below code …
long var1 = fTimer.getTicks();
long var3 = var1 - this.myConnection.myLastRxTime + 1000L;
… it will not close the connection immediately.

Another workaround is if we can call a method which go give us the current Timer ticks and to compare versus the system datetime in a loop until the Timer catches up.

Please advise.

Kind regards,
Gergana

Actually, initialising the variable could not help as long as the timer is behind.
So the only workaround it to get the timer ticks up to date before opening of the connection.

And yes, we close both the connections in the finally block.

Kind regards,
Gergana