IS Clustering without Broker

Hi all,

I have an architecture with two machines, a1 and a2. They are clustered, but without Broker configured. I have observed the following:

  • If a process is triggered on a2 and finishes on a2, it completes successfully.
  • If a process is triggered on a2 and finishes on a1, it completes, but at a later stage the process times out.

I have tried using pub.prt.admin:changeProcessStatus and pub.remote:invoke, but the process still times out. Can anyone suggest a way to avoid using Broker?

How/why is the process that started on a2 moving to a1? A rule of thumb that is often followed: a process that starts on one instance finishes on that instance. Introduce logical servers and have process steps run on different logical servers only if there is a specific benefit to doing so.

Having multiple logical servers on different IS instances will require use of Broker.

Hi Reamon,

The reason is that a load balancer determines the route, so we cannot control where the process will complete. So far we have tested various scenarios. If we invoke pub.monitor.process.instanceControl:changeInstanceStatus through pub.remote:invoke with "Cancel" as the controlAction, the timeout timer stops for a process with a single document, but not for a process with multiple documents. We are still researching why.

I was just thinking: is it possible to log which server the process started on, retrieve that information during the final step, and pass control back to the original server to continue processing? I suspect it is not technically possible because of the BPM. Any advice would be good :slight_smile:
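Here is a minimal sketch of the "remember the origin node" idea, under stated assumptions. The first step records which node started the instance in a shared store. The final step either completes locally or forwards the instance back to that node.

Everything in it is hypothetical and none of it is a webMethods API:
- The `ConcurrentHashMap` stands in for a shared DB table.
- `forwardToNode` stands in for something like a remote invoke to the origin server.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

/**
 * Hypothetical sketch: remember which cluster node started each process
 * instance so the final step can hand control back to that node.
 * The map stands in for a shared DB table; these are NOT webMethods APIs.
 */
public class OriginRouting {
    // processInstanceId -> node that started it (stand-in for a shared table)
    private final Map<String, String> originByInstance = new ConcurrentHashMap<>();
    // Hypothetical forwarder (node, instanceId), e.g. a remote invoke to the origin node
    private final BiConsumer<String, String> forwardToNode;

    public OriginRouting(BiConsumer<String, String> forwardToNode) {
        this.forwardToNode = forwardToNode;
    }

    /** Called by the first process step; keeps the first recorded node. */
    public void recordStart(String instanceId, String localNode) {
        originByInstance.putIfAbsent(instanceId, localNode);
    }

    /**
     * Called by the final step. Returns true if the step may complete locally
     * (origin unknown, or already on the origin node); otherwise forwards the
     * instance to its origin node and returns false.
     */
    public boolean completeOrForward(String instanceId, String localNode) {
        String origin = originByInstance.get(instanceId);
        if (origin == null || origin.equals(localNode)) {
            return true;
        }
        forwardToNode.accept(origin, instanceId);
        return false;
    }
}
```

For example, if instance `p1` starts on a2 and its final step lands on a1, `completeOrForward("p1", "a1")` hands it back to a2 instead of finishing on a1. Whether the Process Engine can actually resume an instance this way is exactly the open question in the post.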

What is the process step doing to make it “leave” the current IS?

Hi Reamon,

In a single-direction outbound PIP, the process steps leaving the current IS are Receive Ack and Receive Ack2. We have no control over which server handles the acknowledgement.

In a bi-directional inbound PIP, the process step leaving the current IS is the Receive Internal Document step. We have no control over where the scheduler runs, because it runs in clustered mode.

Ah, I think I understand the scenario now.

Broker isn’t a factor.

Do the 2 IS instances have the same logical server name? I assume they both use the same set of DB tables for core audit, PE, etc. I'm thinking this is a simple configuration issue where A1 doesn't recognize that A2 handled the rest of the process (or vice versa). It's been a while since I've worked with Modeler/PE, so my memory is fuzzy. Perhaps wM support can resolve it quickly?

Hi Reamon,

We have engaged wM support and Professional Services on this issue as well, but are still waiting for their outcome. The problem seems to be more than a configuration issue, because the timer for the process on a2 keeps ticking while the process has completed on a1. Both IS instances have the same logical server name and use the same set of DB tables. With Broker, this problem is easy to resolve, as the process on a2 will subscribe to the same document and the process on a2 will end. Without Broker it is not that simple: we have tried to CANCEL the process on a2 using remote:invoke. We are still looking at what other options are available. Any options and opinions are much appreciated :slight_smile:

You can run processes on a cluster without Broker, but only in certain cases where the logic is handled by a third party (for example, TN).

I saw this on a project with RN, but looong ago.

If your environment does not meet these restrictions on the "third party" doing the logic, then you will need to install Broker.

However, since you have engaged wM GS and PS, it would be nice to know the resolution! :smiley:

Keep us posted!

Hi all,

The problem is resolved by fix PRT_7-1-2_Fix3 and above. I'm still clarifying exactly how they stop the timer on node1 and will update once they provide more information. My guess is that they simply don't log anything when the timer fires, and instead re-query the database for the current status before timing out; if the process is not completed, the timeout is triggered. :lame:

1-1ZE96V (PRT_7-1-2_Fix3)
You may observe that process timeout occurs in a cluster environment even after process has completed. For example, if a timer is initiated on node1, and the process completes and then is subsequently reaped from the database from node2, when the timer eventually fires on node1, the process status will be unknown (as the status code is no longer in the database). This issue is now resolved.

Or maybe the timer checks the database before firing: if the references in the database no longer exist, it assumes the other node has completed the process and exits? Who knows, but thanks for the update :smiley:
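The guess above can be sketched as a timer that re-reads the shared status before raising a timeout. This is only an illustration of that hypothesis, not the actual PRT_7-1-2_Fix3 internals.

The assumptions:
- The `ConcurrentHashMap` stands in for the shared process-status table both nodes use.
- `reap` simulates node2 removing a finished instance.
- A missing row is treated as "completed elsewhere", as the post speculates.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch: before a timeout fires, the node re-reads the instance
 * status from the shared store. If the instance has completed, or its row has
 * already been reaped by another node, the timeout is skipped.
 * Not the real PRT fix; the map stands in for the shared DB table.
 */
public class GuardedTimeout {
    public enum Status { STARTED, COMPLETED }

    // Stand-in for the shared status table read and written by all cluster nodes
    private final Map<String, Status> sharedStatus = new ConcurrentHashMap<>();

    public void setStatus(String instanceId, Status status) {
        sharedStatus.put(instanceId, status);
    }

    /** Simulates the reaper on another node deleting a finished instance. */
    public void reap(String instanceId) {
        sharedStatus.remove(instanceId);
    }

    /** Called when the local timer expires; returns true only if a timeout was raised. */
    public boolean onTimerFired(String instanceId) {
        Status current = sharedStatus.get(instanceId); // fresh read, not a cached copy
        if (current == null || current == Status.COMPLETED) {
            return false; // completed or reaped elsewhere in the cluster: exit quietly
        }
        raiseTimeout(instanceId);
        return true;
    }

    private void raiseTimeout(String instanceId) {
        System.out.println("Timeout for process instance " + instanceId);
    }
}
```

In the scenario from the fix note, node1 starts the timer, node2 completes and reaps the instance, and `onTimerFired` on node1 finds no row and returns without a timeout. Treating "unknown" as "done" is the key design choice. It would also hide a genuinely lost instance, so the real fix may well behave differently.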