Concurrency and clustering in wM IS

This is not like trying to prove something that does not exist, like claiming there is an elephant in the building living in a safe and then trying to disprove it. That claim may be true or not, but to disprove it you would need to check every safe. That is not the case here. If you have a stateful cluster, all you need to do is run a load test and crash one of the nodes during the test. This is not a philosophy question.

I should have been clearer in that entry. It is not directly related to stateful clusters. What I meant was:
Before executing a step, Integration Server passes the pipeline data to that step.
The step executes with that pipeline data and produces another pipeline after execution. If the step contains inner steps, the same process applies to them as well.
In short, every executed step generates a new pipeline. I don’t know how frequently Integration Server saves the pipeline, but stateful clusters use the database to save it. It may create a checkpoint for every step, or it may create checkpoints at arbitrary points. Only the developers themselves can know, unless they disclose that information somewhere, such as the documentation. Without that information, what I did was only speculation, a guess.
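To make the idea concrete, here is a minimal Java sketch of what I mean. It is only an illustration, assuming a made-up step interface and checkpoint call; it is not the actual IS implementation, and how often the real checkpoint would happen is exactly the open question.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only, NOT the actual IS implementation: each step receives the
// current pipeline, produces the next pipeline, and the engine *may* persist
// (checkpoint) that pipeline somewhere (e.g. a database) after each step.
public class PipelineSketch {

    // A step is just a function from pipeline to pipeline.
    interface Step {
        Map<String, Object> execute(Map<String, Object> pipeline);
    }

    // Stand-in for whatever persistence a stateful cluster might use.
    static void checkpoint(Map<String, Object> pipeline) {
        System.out.println("checkpoint: " + pipeline);
    }

    static Map<String, Object> run(List<Step> steps, Map<String, Object> input) {
        Map<String, Object> pipeline = new HashMap<>(input);
        for (Step step : steps) {
            pipeline = step.execute(pipeline); // the step produces the next pipeline
            checkpoint(pipeline);              // how often this really happens is unknown to me
        }
        return pipeline;
    }

    public static void main(String[] args) {
        List<Step> steps = List.of(
            p -> { p.put("a", 1); return p; },
            p -> { p.put("b", 2); return p; }
        );
        System.out.println(run(steps, new HashMap<>()));
    }
}
```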

Referencing past POC data is OK, but claiming something strongly just because you have more experience than someone else is not scientific. By that logic, the oldest people should know everything. That is certainly not the case; as people grow older they tend to resist learning new things and prefer to keep doing what they have been doing the most. That’s why kids are often better with technology than most of us. I’m not saying that as people grow older they don’t learn anything, or that they learn less; experience is important. But when there is a disagreement on a subject, we need to stick to the facts. That’s how science makes progress.

Exactly my point. How do you know it is still the same? Windows 95 ran on DOS. Based on my past 30 years of Windows experience, can I claim that Windows still runs on DOS? People are still disabling swap space because they think using the page file slows down execution. Maybe that was the case when we had 64 MB of RAM, and for servers it wasn’t necessary, but it certainly is not the case anymore. The same goes for using NOLOCK in every T-SQL SELECT query, or running services as transformers. We should update our knowledge and beliefs frequently. I asked that question to ask you all when you last updated your knowledge on this subject.

My point is: if stateful clustering doesn’t do anything for reliability, then why implement it? What is the use case for it? Why add it to the documentation? It can’t work without an F5 anyway, so what is the point of having it, according to you? What do you think the developers were thinking when they implemented it?

I didn’t reply to the unrelated parts in order not to get off topic. I am not talking about the checkpoint pattern. Stateful clusters may or may not use it out of the box; I don’t have that information, so there is no need to go off topic here.

If I sound rude, please excuse me. English is not my primary language, and I am not particularly polite even when speaking my primary language. It is certainly not intentional; I just wanted to clarify that, just in case.

True. And I hope that nothing I’ve said presents “been doing this a long time” as proof. I’ve tried to be careful about that, but if I’ve slipped in a place or two, I apologize. Documentation or evidence is by far preferred and is what I’m looking for…

This assumes that we’ve only been stating “my experience from long ago indicates X.” I don’t think anyone has made that claim without disclaimer or other info. Rather, “have never seen it and the docs don’t indicate that it has changed.” My understanding, based upon past exposure and document reviews over the years, is that IS clustering has not appreciably changed in a long time. Even when it changed to use Terracotta. I may have missed something – thus this thread exists. :slight_smile:

To allow multiple nodes to handle multiple interactions from a client in a stateful way. E.g. an interaction that requires a client to make multiple calls to achieve a “transaction” and the server can save session/state for the cluster so that any node can do the right thing given the state. It certainly aids in reliability so that if one node goes down, other nodes can still handle the calls without losing any existing state. But based upon docs and my understanding, an IS server instance does not hand things off to another node for execution. And if an IS instance goes down mid-execution, no other node will pick up that work where it left off on its own.

The session/state management is the same as in any application server – stuff things in the session object (or elsewhere) that is accessible to all nodes so that any node can process the first, second, third…nth call in a given logical transaction. It is not for an IS that is in the middle of running a service to hand it off to another node mid-execution.
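As a rough illustration of what I mean by “stuff things in the session object,” here is a plain servlet sketch. Nothing about it is IS-specific, the class and attribute names are made up, and depending on the server the package may be jakarta.servlet rather than javax.servlet; the point is only that session state replicated (or stored centrally) by the cluster lets any node handle the next call.

```java
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;

// Illustrative only: per-client state lives in the session; a clustered app
// server replicates it (or stores it centrally) so any node can handle call N+1.
public class MultiCallTransactionServlet extends HttpServlet {

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        HttpSession session = req.getSession(true);

        // How far has this client's logical "transaction" progressed so far?
        Integer step = (Integer) session.getAttribute("txnStep");
        int next = (step == null) ? 1 : step + 1;

        // Save the new state; in a stateful cluster every node can see it.
        session.setAttribute("txnStep", next);

        resp.getWriter().println("Handled step " + next + " of the transaction");
    }
}
```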

You do not sound rude. And hopefully no one feels any of the exchange has been rude or confrontational. I am finding it very helpful. And respectful.


The short answer to your question is here: Reverb (EDIT: note that “failover support” here refers to the use of the client Context/TContext Java classes as described in that section which I linked in my prior post)

If you need any of the features for which the second column states “No”, then you need a stateful cluster. If you don’t, then a stateless cluster is preferred due to its simplicity.

The longer answer is that stateful clusters were required by many other features in the past (e.g. scheduled tasks, certain types of trigger joins, etc.), but slowly but surely Software AG started removing some of those requirements, because synchronization across the cluster opened the door to other problems, especially in the days prior to Coherence, when webMethods used its home-grown repo for managing state. I worked as a Professional Services consultant for Software AG from 2008 to 2014, and I recall several conversations on this topic back then as R&D started to make these changes. I’d be happy to dig up some older conversations if it will help provide more context, but in a nutshell, this is the reason.

Percio

Both of your last replies make sense, but I can’t claim they are true unless I see it for myself. Like I said earlier, this will be one of my test cases for my upgrade project. I will also add symbolic steps with the debugLog service, probably 10 or more, and possibly add some delay between some of the steps.

I will be glad to test it as well if you have a specific test case in mind.

For this question, I want to build a 2-node auto-scaling IS cluster, enable saving the pipeline on failure, and run a load test. Once it is no longer scaling up, I plan to forcefully destroy all of the nodes except one.

For the test service, I plan to use a simple DB insert query. Let me know whether you think this will clarify things, and whether you have a better or parallel test scenario in mind. It will certainly be helpful whichever way it turns out. I will share my test results in this topic.


I’m looking forward to your results. As for a test, I think a simple service that does the following would be plenty:

  1. Log an initial message (e.g. BEGIN)
  2. Sleep
  3. Log a final message (e.g. END)

Then, with a simple 2-node stateful cluster, invoke the service from any client on node 1 and kill that node while the service is sleeping. If server-side automatic failover is a real thing, you should see the same service automatically execute on the other server without a client retry. If step checkpointing is also a thing, you won’t see the BEGIN log statement on the 2nd node; you will only see the END statement.

If you want to validate whether the pipeline is shared across, you could even generate a GUID in the very beginning of the service and log it in steps #1 and #3. If the GUID generated in node 1 also appears in the log statement in node 2 (assuming automatic failover happens), then you know that the pipeline was shared across the nodes.
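For reference, the whole test could be a single Java service along these lines. This is only a sketch: the class and service names are placeholders, Designer generates the method signature for you so you would only paste the body, and System.out is just a stand-in for whatever logging you prefer (in Flow you could use debugLog instead).

```java
import com.wm.data.IData;
import com.wm.data.IDataCursor;
import com.wm.data.IDataUtil;
import com.wm.app.b2b.server.ServiceException;
import java.util.UUID;

public final class FailoverProbe {

    // Sketch of the test service: log BEGIN, sleep, log END, carrying a GUID
    // through the pipeline so the two log lines can be correlated across nodes.
    public static final void failoverProbe(IData pipeline) throws ServiceException {
        String guid = UUID.randomUUID().toString();

        // Put the GUID into the pipeline; if the pipeline really is shared,
        // a failover node would see this same value.
        IDataCursor cursor = pipeline.getCursor();
        IDataUtil.put(cursor, "guid", guid);
        cursor.destroy();

        System.out.println("BEGIN " + guid);   // step 1: initial message

        try {
            Thread.sleep(60_000);              // step 2: window in which to kill node 1
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new ServiceException("sleep interrupted: " + e.getMessage());
        }

        System.out.println("END " + guid);     // step 3: final message
    }
}
```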

Good luck,
Percio


This is much simpler than what I had in mind. I will test this first, but I need to build my environment for it, and unfortunately I am no Kubernetes expert, so it will take a while.

If this doesn’t work, we can find out with my test case as well. I will keep this post updated.


If this occurs, I’m very interested in how the client would ever get the response. When node 1 goes away, the client will get disconnected and fail. There is no way I’m aware of for a client that sends an HTTP POST/GET to node 1 to get a response from node 2.


I get it. For this test, it may be easier to just go with a simple Docker Compose file that uses the images for IS and Terracotta from the Software AG Container Registry to spin up a simple environment without the complications of Kubernetes. If that feels complicated too, then a simple bare-metal install on your PC, trashed after the test, may be the quickest path.

Percio

This is possible with async calls, but I don’t remember whether IS actually supports async calls. As long as it uses a callback service, it is possible. Also, it doesn’t have to return a response; my service won’t be expecting a result, it will insert it into the DB.
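Just to illustrate the callback idea in generic terms (nothing here is IS-specific, just plain Java; the method names are stand-ins for the long-running service and for the DB insert the callback would do):

```java
import java.util.concurrent.CompletableFuture;

// Generic sketch of "async call with a callback": the original caller does not
// wait for the response; the callback records the result (e.g. a DB insert).
public class AsyncCallbackSketch {

    // Stand-in for the long-running service.
    static String longRunningWork(String input) {
        return "processed:" + input;
    }

    // Stand-in for the callback service that stores the result.
    static void callback(String result) {
        System.out.println("callback stored result: " + result);
    }

    public static void main(String[] args) throws Exception {
        // Fire and forget: the caller returns immediately, the callback gets the result.
        CompletableFuture
            .supplyAsync(() -> longRunningWork("request-42"))
            .thenAccept(AsyncCallbackSketch::callback);

        System.out.println("caller returned without waiting for the response");
        Thread.sleep(500); // keep the JVM alive long enough for the demo callback to run
    }
}
```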

If I weren’t already building a POC environment, and if I didn’t have a stateful test environment, I would do that. Stateful cluster configuration is not super easy and I don’t want to waste my time; it needs a DB, a load balancer, etc. that I can’t build myself. I don’t have the Terracotta Helm charts yet; it would take less time if I already had them in hand. Software AG was supposed to put them in the Helm repo 3 weeks ago, and they still haven’t done it.

Hi Rob,

For that, please add the newly introduced tag “discussion” to topics for which you don’t want to get these “mark as solved” reminders.
