We recently expanded our production Integration Server cluster from 4 to 6 servers to cope with increasing load. Each webMethods Integration Server (v6.5) process has a Repository Server process on the same box, and all our webMethods processes run on Windows boxes. So we have 6 Integration Server processes and 6 Repository Server processes across 6 boxes. The clustering was set up through the webMethods-supplied GUI tool. The Repository Server processes use a DB-based repository.
The only real documentation we could find on this was in the cluster guide, which says:
To enable clustering and make the repository shareable, the repository server must be external to the Integration Server. The recommended external deployment co-locates a repository server with each Integration Server, as shown below. One of the repository servers will self-designate as the primary at system startup.
Can anyone with a similar setup confirm whether they are employing the same integration/repository server setup?
The reason I am asking is that, soon after the new servers were added to the cluster, we saw a few incidents where 1 or 2 of the integration servers failed to start up due to what looked like an issue when connecting to a repo server. We have a cyclic restart of all the servers at the start of each day. For example, we were getting errors like:
1 – Only the following 2 lines were logged in server.log, after which the service hung and eventually had to be shut down manually and restarted:
2010-07-11 05:42:22 BST [ISS.0025.0006C] License Manager started
2010-07-11 05:42:22 BST [ISS.0025.0001C] Integration Server 6.5 Build 395
2 – The server seems to get as far as attempting to connect to the repo server, but then fails as follows:
2010-04-15 08:12:03 BST [ISS.0014.0032C] Could not connect to configured Repository Server. Shutting down server.
 java.net.ConnectException: Connection refused: connect
2010-04-15 08:12:03 BST [ISC.0067.0108C] initRemoteRepositoryClient() exception: RMI Exception: Naming lookup of Repository failed, Connection refused to host: [Server-name], nested exception is:
 java.net.SocketException: Connection reset
2010-04-15 08:10:02 BST [ISC.0067.0108C] initRemoteRepositoryClient() exception: Error unmarshaling return header; nested exception is:
2010-04-15 08:09:09 BST [ISS.0025.0006C] License Manager started
2010-04-15 08:09:09 BST [ISS.0025.0001C] Integration Server 6.5 Build 395
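In case it helps anyone reproduce or rule out the connection problem, a rough pre-start check along these lines could confirm that a repo server's port is accepting TCP connections before the Integration Server is started. This is only a sketch under assumptions: `wait_for_port` is a made-up helper name, the host and port passed to it are placeholders for whatever your own configuration uses, and a successful TCP connect does not prove that the RMI lookup itself will succeed.

```python
import socket
import time


def wait_for_port(host, port, attempts=10, delay=5.0, timeout=3.0):
    """Return True once a TCP connection to host:port succeeds, else False.

    host/port are placeholders: use the address your Integration Server
    is configured to reach its repository server on.
    """
    for attempt in range(1, attempts + 1):
        try:
            # create_connection raises OSError (e.g. connection refused,
            # timeout) if nothing is accepting connections yet.
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            # Wait before the next try, but not after the final attempt.
            if attempt < attempts:
                time.sleep(delay)
    return False
```

A restart script could call `wait_for_port(repo_host, repo_port)` and only start the Integration Server service when it returns True, logging a warning otherwise. That would at least separate "the repo server was not listening yet" from the other failure modes above.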
Does anyone have any recommendations on how to manage the number of repo servers in a cluster that is growing in size? Does anyone know of any GEAR docs that address this issue? We were considering reducing the number of repo servers, but have not seen any documentation that really discusses the recommended setup.
Any thoughts would be much appreciated.