Hardware Cluster for IS

I know webMethods supports software clustering for IS. The software cluster architecture needs a load balancer on the network to provide a virtual IP. Has anybody used a hardware cluster to set up an IS cluster?
Could you tell me how to configure it? I know that a hardware cluster needs start, monitor, and stop scripts in the hardware cluster environment.
Could you share the scripts so I can use them as a reference?

Jordan,

Yes, you can build IS hardware clustering. However, you need to create start, monitor, and stop scripts for the OS clustering software.
This is usually done by an experienced webMethods consultant, since the scripts need to be written for your particular environment.

Jordan,
If you decide to do hardware clustering using a file system mount and dismount mechanism, the way Veritas does,
it’s always easy to write a small shell script for Veritas or another tool to monitor the PID etc.,
and then you can fail over to the secondary machine.
But when it fails over, IS will have to be restarted, and that may take a few minutes, and some of your flow services may fail.
Secondly, how are you going to manage scheduled services in cluster mode,
and also the notifications?

In Broker it works a little differently because awbrokermon is always running on the secondary box.
Let me know how you are thinking of doing IS hardware clustering.
Also check with webMethods whether they are going to support it.
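
The PID-based monitoring mentioned above could look roughly like the sketch below. It is written in Python rather than shell, and it assumes the start script writes the server's PID to a file; the PID file location and the 0/1 return convention are assumptions for illustration, not what Veritas or webMethods prescribe. It is Unix-only, since signal 0 only checks for existence on Unix-like systems.

```python
import os


def read_pid(pid_file):
    """Return the PID stored in pid_file, or None if it is missing or malformed."""
    try:
        with open(pid_file) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None


def is_process_alive(pid):
    """Check whether a process with this PID exists (signal 0 delivers nothing)."""
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def monitor(pid_file):
    """Return 0 if the process named in pid_file is running, 1 otherwise."""
    return 0 if is_process_alive(read_pid(pid_file)) else 1
```

A check like this only proves that the process exists, not that IS is actually answering requests, so a real monitor script would probably also probe a port. The clustering software will expect its own exit codes, so map the 0/1 result accordingly.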

I believe that at the “product roadmap” presentation at Integration World 2003 webMethods discussed having broker server software clustering in an upcoming release. At that time, October 2003, I think this feature was targeted for the late 2004/early 2005 release. I did not hear any discussion of adding support for IS hardware based clustering at that time.

Someone who has seen a more current roadmap presentation or who possesses a better memory should confirm this.

Mark

I did it with Veritas and Solaris, which was pretty straightforward. We did get webMethods to certify the solution so they would support it. Simple start/stop scripts were used to control both the IS instances and the awbrokermon process. On failover, Veritas handles all of the startup on the secondary node. Failover time varies depending on how big the IS instance is etc. Ours normally takes about 3 - 4 minutes in test. webMethods does have a Veritas agent for the Broker, but we decided to use our own scripts.

Mark,

Could you please tell us how to set up and configure a hardware cluster properly?

I have been struggling with the Broker / IS cluster configuration for some time and it’s still not OK: I get a lot of transport exceptions during startup of the passive IS instance, duplicate processing of documents, and so on.

The following lines describe our production environment and cluster config: 1x Broker and 2x IS on two physical servers, A and B, with virtual IPs VIP-A and VIP-B.

Server A running - Broker and IS1
Server B running - IS2

Broker, IS1 and IS2 are in active / passive mode.

In case of failover the service is migrated to the other server: A/Broker to B/Broker, A/IS1 to B/IS1, and B/IS2 to A/IS2.

Veritas Cluster is used. I have created scripts for starting, stopping, and monitoring Broker and IS. I think there is no problem in the cluster config and scripts: if the cluster detects that a service (Broker, IS1 or IS2) is down, it starts the migration procedure, migrates the storage, and starts the new service.

I think the problem is in the Broker and IS config and in the data stored / migrated in the cluster storage.

Let’s have a detailed look at the Broker / IS / cluster storage:

1.) Virtual IP settings

  • I have replaced all occurrences of physical IPs in config files with virtual IPs, for example in the dispatch.cnf file
  • I have configured the Broker via BrokerAdmin so that only virtual IPs are visible; previously both the physical and virtual IPs were visible
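
To double-check the first point, a small scan for leftover physical IPs can help. The following is only a sketch in Python; the function name, the directory argument, and the IP list are hypothetical, and it simply reports lines that still mention one of the given addresses.

```python
import os
import re


def find_physical_ip_lines(config_dir, physical_ips):
    """Yield (path, line_number, line) for every line that mentions a physical IP."""
    # Digit guards stop 10.0.0.1 from matching inside 10.0.0.11 or 110.0.0.1.
    patterns = [
        re.compile(r"(?<!\d)" + re.escape(ip) + r"(?!\d)") for ip in physical_ips
    ]
    for root, _dirs, files in os.walk(config_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as fh:
                    for number, line in enumerate(fh, start=1):
                        if any(p.search(line) for p in patterns):
                            yield path, number, line.rstrip("\n")
            except OSError:
                # Unreadable files are skipped rather than aborting the scan.
                continue
```

Running it over the config directories of each instance after a change should return nothing if all physical addresses were really replaced.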

2.) Cluster storage settings
I think this is the main problem.

Currently the following directories are migrated in the cluster. That is, if the cluster detects that a service is down, these directories are mounted on the passive node and the passive Broker / IS starts on this data:

Broker:
/wmBrokerData/default

IS1:
IntegrationServer/audit/data
IntegrationServer/DocumentStore
IntegrationServer/WmRepository2
IntegrationServer/WmRepository4
Servers/RepoV3/WmRepository

IS2:
IntegrationServer/audit/data
IntegrationServer/DocumentStore
IntegrationServer/WmRepository2
IntegrationServer/WmRepository4

During IS migration in the cluster, for example A/IS1 to B/IS1, I get a lot of exceptions:

[ISS.0095.0003C] AuditLogManager Runtime Exception: :[BAA.0000.0029] audit event for schema wMIsmJournal version 1 has no runtime configurations defined, event rejected. processing log entry >>>BasicData:RootContextID=bc0e7e10173a11db97b8e231640f3662,
AuditTimestamp=1153322696817,
ContextID=bc0e7e10173a11db97b8e231640f3662,
AuditSchemaName=wMIsmJournal,
AuditSchemaVersion=1,
ServerID=a.b.c:15555,
JournalLogEntry=[B@d18a80,ERRORINFO={
MemData:exId=BAA.0000.0029,exBundle=com.wm.app.audit.resources.AuditExcpMsgs,exError=0,exReason=0,exDfltMsg=audit event for schema {2} version {3} has no runtime configurations defined, event rejected.,
exWrapped=null,exParmcnt=2,exParm=wMIsmJournal,exParm=1}<<<.

[ISS.0095.0004C] AuditLogManager Runtime Rejected entry >>>BasicData:RootContextID=bc0e7e10173a11db97b8e231640f3662,
AuditTimestamp=1153322696817,
ContextID=bc0e7e10173a11db97b8e231640f3662,
AuditSchemaName=wMIsmJournal,
AuditSchemaVersion=1,ServerID=a.b.c:15555,
JournalLogEntry=[B@dc135d,ERRORINFO={
MemData:exId=BAA.0000.0029,
exBundle=com.wm.app.audit.resources.AuditExcpMsgs,exError=0,
exReason=0,exDfltMsg=audit event for schema {2} version {3} has no runtime configurations defined, event rejected.,exWrapped=null,
exParmcnt=2,exParm=wMIsmJournal,exParm=1}<<<.

And a lot of documents stay in the Broker queues and are not delivered. For each document and subprocess I get the following exception:

[ISS.0098.0049C] Exception:com.wm.app.b2b.server.dispatcher.exceptions.TransportException: [ISS.0098.9008] No registered Transport for the given id: RtUfgBA2HDdvcGRJQH14PyZlg2Y_SUBPROCESS_1.IS_HUB_trigger while executing trigger. Unable to ack Document for TriggerStore:SUBPROCESS_1.IS_HUB:trigger.

[ISS.0098.0049C] Exception:com.wm.app.b2b.server.dispatcher.exceptions.TransportException: [ISS.0098.9008] No registered Transport for the given id: RtUfgBA2HDdvcGRJQH14PyZlg2Y_SUBPROCESS_3.IS_HUB_trigger while executing trigger. Unable to ack Document for TriggerStore:SUBPROCESS_3.IS_HUB:trigger.

I found some articles on Advantage saying that this exception is thrown if there are invalid Broker sessions. For example, IS1 communicates with Broker, executes models / flows / …, and writes some partial info about the transaction into the WmRepositoryX directories, including info about the Broker session. Then A/IS1 is migrated to B/IS1, together with the WmRepositoryX directories holding the information about the old Broker session. But B/IS1 establishes a new session to Broker and is unable to acknowledge the documents that were processed with the previous session, so all these docs are now waiting in the Broker’s queues and are not delivered.
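
The stale-session explanation can be illustrated with a toy model. This is not how Broker or IS are implemented; `ToyBroker` and its methods are invented purely to show why an acknowledgement sent over a new session cannot clear a document that was handed out under the old one.

```python
class ToyBroker:
    """Toy stand-in for a broker that tracks unacknowledged documents per session."""

    def __init__(self):
        self._next_session = 0
        self.pending = {}  # doc_id -> id of the session that received the document

    def connect(self):
        """Open a new session and return its id."""
        self._next_session += 1
        return self._next_session

    def deliver(self, session, doc_id):
        """Hand a document to a session; it stays pending until acknowledged."""
        self.pending[doc_id] = session

    def ack(self, session, doc_id):
        """Acknowledge a document; only the session that received it may do so."""
        if self.pending.get(doc_id) != session:
            raise LookupError(
                f"no registered transport for document {doc_id!r} in session {session}"
            )
        del self.pending[doc_id]
```

If a document is delivered to the first session and the client then reconnects, an `ack` from the second session raises and the document stays in `pending`, which mirrors the undelivered documents described above.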

All triggers successfully connect to Broker when the IS is migrated, and all triggers also successfully reconnect when the Broker is migrated…

My question is: is the list of directories that are migrated in the cluster correct? Maybe it is not necessary to migrate whole directories, only some files… Could you help me with this issue?

Many thanks,

Stano

Stano,

You may want to consider not using an active/passive cluster for IS and using a load-balancing cluster instead (without the clustering facilities offered in IS). The errors you are seeing are all related to the challenge of maintaining state between the two instances. Your IS availability will be better and your configuration will be easier if you use an LB cluster. Plus, using hardware/OS clustering for IS is not officially supported by wM tech support, unless you get them to certify it for your installation as Mark did.

Your best bet is not to try to sync two IS instances by picking out the shared directories (although this can be done). I would go with one install of the IS and Broker on an externally connected disk array (either SCSI or SAN attached).

In normal operating mode Host A has the mount; during failover it’s unmounted and Host B gets the mount. You will not run into issues with that configuration. It also eliminates syncing config files, changes, etc.

Load balancing is a good option if your integration schemes support it. We use that as well for some instances. But for some types of integrations it introduces a lot of complexity that is not always needed. It really depends on your requirements. The other thing to consider with LB is licensing. Having two active servers can violate your CPU count if you don’t have enough CPUs in your license contract.

Rob & Mark,

thanks for your answers & time.

Rob, we can only use the active / passive cluster solution, there is no other possibility now. It’s too late to change it :frowning: Yes, I know that HW cluster must be certified by wM professional services.

Mark, the only difference between your suggestion and ours is that you share the whole wM installation and we share only some directories. Following the wM documentation, we currently share all directories where process data is stored; the rest (packages, jars, config) is not shared and is synchronized manually.

Yes, in this concept you have to synchronize IS packages, config, and patches between instances, but on the other hand you always have one backup instance. For example, if you apply a patch and IS will not start, you simply migrate to the passive node, where the patch hasn’t been applied yet, and IS will start. We can also have small differences in configuration between the active and passive instances; for example, we can specify that passive instances consume fewer resources, …

And the main difference is that you share only one Broker, always together with one IS. We have a second IS, IS2, and all the exceptions arise when IS2 is processing some requests and the failover procedure of Broker & IS1 starts: IS2 loses its connection to Broker while it has some in-transit data stored, reconnects to the passive Broker instance after about 5 minutes, and is unable to acknowledge the in-transit data, so a lot of undelivered docs remain in the Broker’s queues.

There must be something else causing these problems. I will play around with the shared directories; maybe it is not necessary to migrate all of them.

I’m reminded of an old saying: No matter how far down the wrong path you have travelled, turn around.

Professional Services offers certification for Broker clustering. They don’t have an “official” certification for IS. The docs explicitly state that IS hardware clustering is not supported. Mark’s installation may very well be the only certified IS hardware cluster on the planet. :slight_smile:

IS in a hardware cluster is not a common approach. As Mark has pointed out, it can be done, but IMO it will be something that you’ll always have some sort of issue with (6 months from now you’ll have to convince wM tech support that you’re operating a certified environment, and you’ll waste time just getting them to look at any issue you may have).

Having different configurations on the active and passive instances is a bad idea. One of the main premises of nodes in a cluster is that they are configured exactly the same way.

You’ve detailed out the issues that exist very nicely. Mark has suggested a way to avoid those. Is there something about your integrations that is forcing this approach?

Hmm, maybe we will have to reconsider the wM clustering…

At this time we are finishing our delivery to the customer and the very last open issue is wM clustering ;o) There is no time to make significant changes in our cluster design, e.g. changing it from a HW to a SW cluster.

I will try Mark’s solution and move the whole wM installation to the shared cluster disk, not only a subset of directories, but I’m afraid the problems will arise again because of the second IS.

Just a quick note on my suggestion: there is only one IS in that case. Since you are moving everything to a single directory, only one IS can exist because it is in fact the same IS server. It’s just mounted by the passive server during a failover, but it is still the exact same IS.

Of course. We have 2 cluster storages: one for Broker & IS1, and a second for IS2.

I have moved the webMethods installation to cluster storage and I got one JVM error. Maybe this will be useful information for someone else as well.

OS: Sun Solaris 5.9
JVM: wM default JVM 1.4.2-b28 (webMethods6/jvm/sol142/…)

I got the following error:
Error occurred during initialization of VM
java.lang.Error: Properties init: Could not determine current working directory.

And the IS didn’t start.

This problem occurs if the JVM is on a mounted disk (cluster storage) and the JVM does not have appropriate rights in the filesystem.

You have to change the rights for the mount point and all subdirectories to ‘755’ and perform an unmount and mount for the changes to take effect.
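
The permission change described above could be scripted along these lines. This is only a sketch in Python: `set_directory_mode` is a made-up helper name, the mount point is whatever path your cluster storage uses, and the script has to run as a user allowed to change those permissions (typically root). It touches directories only, not the files inside them.

```python
import os


def set_directory_mode(mount_point, mode=0o755):
    """Set `mode` on mount_point and every directory below it.

    Returns the number of directories whose mode was set.
    """
    changed = 0
    # os.walk yields mount_point itself first, then each subdirectory.
    for root, _dirs, _files in os.walk(mount_point):
        os.chmod(root, mode)
        changed += 1
    return changed
```

The unmount and mount step still has to be done afterwards through the cluster software or by hand.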

Hi Friends,
Can anybody suggest a solution for the following issue?

I have configured a listener on server-1. Server-2 is clustered with server-1.
Due to a business requirement I have to disable the listener in the cluster at runtime, i.e. I have to disable the listener on both servers.

Does anyone know whether disabling the listener using the pub.disableListener service will work for the cluster, i.e. whether it will disable the listener on both servers?

Or do I have to use a remoteInvoke or HTTP invoke service to call the pub.disableListener service individually on each server?

Regards
Sipun

I believe you need to disable the listener individually on each server. Listeners are not like scheduler jobs, which we can just disable on any one server. Please correct me if I am wrong.
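
If it does have to be done per server, the fan-out itself is simple. Here is a rough sketch in Python; `disable_on_all` and the `disable_listener` callable are hypothetical, and the callable stands for whatever mechanism you end up using (a remote invoke or an HTTP call to the disable service on that server).

```python
def disable_on_all(servers, disable_listener):
    """Call disable_listener(server) for every server in the cluster.

    Returns a dict mapping each server to None on success or to the
    error message if the call failed, so one bad node does not stop the rest.
    """
    results = {}
    for server in servers:
        try:
            disable_listener(server)
            results[server] = None
        except Exception as exc:
            results[server] = str(exc)
    return results
```

Collecting the errors instead of stopping at the first failure lets you see which nodes still have the listener enabled and retry only those.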