Broker 7.1 on AIX 5.3: Slow Startup time. Do you use AIX/Bkr

If you are running Broker 7.1 on AIX, I’d be interested to hear if you’ve ever faced this problem.
I’d also be interested to hear from anyone using this combination who has this working just fine. Is there anyone?

When we initially installed Broker, it would start up in 4 seconds. About 2 weeks ago the startup time in all of our environments suddenly went up. We’re now averaging 18-30 minutes for a startup. Obviously, this presents problems in our HA environments: the clustering software is configured to wait only 30 seconds before going into error.

We’re working w/ support on this, but are not coming to a solution.
We’ve gone so far as to do a complete reinstall and still see the issue.

This is happening in environments where we have had a firmware upgrade to the OS (patches) applied, but also in environments where we’ve had no patches/upgrades applied.

We have resolved this issue.
This was due to the behavior of Broker when its ulimit on file descriptors is set to unlimited. At startup, Broker runs a routine that closes open file descriptors from 0 up to the ulimit value. When the ulimit is set to a finite value, this is quick, but when it is unlimited, the routine keeps going until it runs out of possible values. We stopped watching after it passed 40 million.

The problem is caused by two contributing items.

  1. The ID we are running under has its ulimit values set to unlimited.
  2. The S45Broker script sets a ‘ULIMIT_CMD’ variable to ‘ulimit -H 8192’, which, if executed as a command, would limit the file descriptors value to 8192. However, the S45Broker script never actually executes the variable, so the limit is never applied.
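To illustrate the second item, here is a minimal sketch of the difference between storing a command in a variable and actually running it. This is not the real S45Broker script. The `-n` flag (the open file descriptor limit in common shells) and the value 256 are assumptions chosen so the demo runs under modest limits; the command in the real script is the one quoted above.

```shell
#!/bin/sh
# Hypothetical excerpt, not the real S45Broker script.
# Assumption: -n selects the file descriptor limit; 256 is a demo value.
ULIMIT_CMD="ulimit -n 256"

# Assigning the variable changes nothing: the limit is still the inherited one.
before=$(ulimit -n)

# Expanding the variable on a line of its own runs the command.
# This is done in a subshell so the calling shell keeps its own limit.
after=$(
    $ULIMIT_CMD
    ulimit -n
)

echo "limit after assignment only: $before"
echo "limit after executing it:    $after"
```

In a startup script the expansion would sit on its own line before the Broker process is launched, so that the process inherits the lowered limit.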

While we are still verifying this, it appears the issue could occur on any Unix-based system, since they all use the ulimit setting.

We corrected this by editing the S45Broker script to execute the ULIMIT_CMD.
Another fix would have been to set our ID’s ulimit for file descriptors to 8192, or to any other value that is not unlimited.
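A middle road between the two fixes is a small guard that caps the limit only when it comes back unlimited. The sketch below is hypothetical: the function name `cap_fd_limit` is made up for illustration, it assumes the shell reports the limit as the word "unlimited", and it looks at the soft limit only.

```shell
#!/bin/sh
# Hypothetical helper: cap the descriptor limit only if it is unlimited.
# Assumption: "ulimit -n" prints the word "unlimited" when no cap is set.
cap_fd_limit() {
    cap=$1
    current=$(ulimit -n)
    if [ "$current" = "unlimited" ]; then
        # Lowering a limit is always permitted for the current process.
        ulimit -n "$cap"
    fi
}

# 8192 is the value mentioned for ULIMIT_CMD above.
cap_fd_limit 8192
echo "file descriptor limit now: $(ulimit -n)"
```

An ID that already has a finite limit is left alone, so the guard does not override limits an administrator set deliberately.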

Why did this take a couple of weeks to show up?
When we initially set up the environment, the Unix ID we were running under was using the system’s default limits, which included a limit on file descriptors. This was later changed so that all of the ulimit values would be unlimited.

How to tell if you’ll have this problem?
From the ID you plan to run Broker under, find out the hard and soft ulimit values.
ulimit -a gives you the soft limits.
ulimit -a -H gives you the hard limits, which cannot be exceeded.

If the file descriptors value comes back unlimited AND the S45Broker script has not been corrected to execute the ULIMIT_CMD, you will have this long startup time problem.
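The first half of that check can be scripted. The sketch below is a hypothetical helper to run as the ID that will start Broker. It assumes the shell prints "unlimited" for an uncapped limit, and it flags both the soft and the hard value because we did not confirm which one the startup routine reads. It does not inspect the S45Broker script itself.

```shell
#!/bin/sh
# Hypothetical check: are this ID's descriptor limits unlimited?
classify_limit() {
    case "$1" in
        unlimited) echo "at risk" ;;
        *)         echo "ok" ;;
    esac
}

soft=$(ulimit -S -n)   # soft limit on open file descriptors
hard=$(ulimit -H -n)   # hard limit on open file descriptors

echo "soft file descriptor limit: $soft ($(classify_limit "$soft"))"
echo "hard file descriptor limit: $hard ($(classify_limit "$hard"))"
```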