Event size

Hi guys,
can anyone tell me the size of an event in ES 5.0 and the number of events that can be published at a time?

Thanks
SAm

The size of an event varies, as does the number of events that can be published at one time.

With no data, an event may be 1K or so. The data in an empty document is in the _env fields. Using ES 5.0, an event can be of unlimited size, but your server may feel the strain if it is ill-equipped.

You may publish an unlimited number of events at a time, but this occurs in sequence (i.e. rapid fire). The documents will be processed by subscribers at a finite speed, so you may find that you can publish faster than the documents can be subscribed/consumed. By increasing the number of adapter processes for your subscribers, though, you may find that this becomes manageable.

For example, if you rapid-fire publish 30 events of size = 300K and each must be written to disk by a subscribing IO Adapter, you may find it better to run 30 adapter processes for the IO Adapter. This way, as events arrive, up to 30 IO Adapter clients are available to write the 300K files to disk, as opposed to waiting for one adapter process to handle all 30.

(To edit the number of adapter processes, use the Adapter Configuration tool).
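
To make that concrete, here is a rough plain-Java sketch of the effect (my own stand-in code, not the adapter's actual internals): a pool of 30 worker threads plays the role of 30 IO Adapter processes, each writing a 300K event body to disk as it arrives. With a pool of one, the same burst drains one file at a time.

    import java.nio.file.*;
    import java.util.concurrent.*;

    public class ParallelWriters {
        public static void main(String[] args) throws Exception {
            // One worker per expected burst size, mirroring 30 IO Adapter processes.
            ExecutorService pool = Executors.newFixedThreadPool(30);
            byte[] payload = new byte[300 * 1024]; // stand-in for a 300K event body

            for (int i = 0; i < 30; i++) {
                final int n = i;
                pool.submit(() -> {
                    // Each "adapter process" writes its event to disk independently.
                    Files.write(Paths.get("event-" + n + ".dat"), payload);
                    return null;
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
    }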

Dan, I disagree with

“as events arrive, up to 30 IO Adapter clients are available for writing the 300K file to disk as opposed to waiting for the one adapter process to process all 30”

Correct me if this has changed with the 5.0 release of the broker, but in the good old days (4.x), my impression of what happens to events when they are received by the broker and delivered to subscribers is as follows, for ATC and intelligent adapters respectively.

This discussion covers guaranteed events that are persisted to the broker-guar file and ATC adapters or intelligent adapters configured to log state.

PUBLISH TO BROKER
1 - Guaranteed event is sent to broker
2 - Broker stores event in broker-guar and adds pointers/references to the event to the client state for all the subscribers
3 - Broker acknowledges the event to the publisher.
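
A toy plain-Java model of those three steps may help (all names are mine; this is not the broker's real implementation): persist the event first, then fan out a reference to every subscriber's client state, and only then return to the publisher, which stands in for the acknowledgement.

    import java.util.*;
    import java.util.concurrent.*;

    public class PublishModel {
        // Stand-ins for the broker-guar file and the per-subscriber client state.
        static final List<String> brokerGuar = Collections.synchronizedList(new ArrayList<>());
        static final Map<String, Queue<Integer>> clientState = new ConcurrentHashMap<>();

        static synchronized int publish(String event) {
            brokerGuar.add(event);                  // step 2: persist the event
            int ref = brokerGuar.size() - 1;
            for (Queue<Integer> q : clientState.values())
                q.add(ref);                         // step 2: add a reference for each subscriber
            return ref;                             // step 3: returning models the ack to the publisher
        }

        public static void main(String[] args) {
            clientState.put("subscriberA", new ConcurrentLinkedQueue<>());
            clientState.put("subscriberB", new ConcurrentLinkedQueue<>());
            publish("guaranteed event 1");
            System.out.println(clientState);        // both subscribers now hold a reference to event 0
        }
    }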

DELIVERY TO ATC SUBSCRIBER
1 - deliver event to subscriber
2 - ATC adapter master thread logs event to ATC database.
3 - Upon successful logging of the event, the ATC sends an Ack back to the broker
4 - Broker removes pointer/reference from client state
5 - ATC delegates the processing of the acknowledged event to any of the ATC sub-threads

DELIVERY TO INTELLIGENT ADAPTER
1 - deliver event to subscriber
2 - Intelligent adapter sends logging event to logging adapter
3 - Upon successful acknowledgement of the logging event, the intelligent adapter sends an Ack back to the broker
4 - Broker removes pointer/reference from client state
5 - Intelligent Adapter invokes the Integration Component for the event.

To support the requirement of maintaining order in delivering the events while at the same time ensuring end-to-end data integrity, the logging of the event MUST be single-threaded (managed by a master thread) before the processing of the event can be delegated to the sub-threads. A sketch of this sequence follows.
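
Here is a small plain-Java model of that sequence (class and method names are mine, not the adapter's real ones): a single master thread logs and acks each event in order, and only then hands processing to a pool of sub-threads.

    import java.util.concurrent.*;

    public class MasterThreadModel {
        public static void main(String[] args) throws Exception {
            BlockingQueue<String> fromBroker = new LinkedBlockingQueue<>();
            ExecutorService subThreads = Executors.newFixedThreadPool(4);
            fromBroker.put("event-1");
            fromBroker.put("event-2");

            for (int i = 0; i < 2; i++) {
                String event = fromBroker.take();        // step 1: event delivered
                log(event);                              // step 2: logged serially, in order
                ack(event);                              // steps 3-4: ack; broker drops its reference
                subThreads.submit(() -> process(event)); // step 5: delegated; order no longer guaranteed
            }
            subThreads.shutdown();
        }

        static void log(String e)     { System.out.println("logged " + e); }
        static void ack(String e)     { System.out.println("acked  " + e); }
        static void process(String e) { System.out.println("processed " + e); }
    }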

This post might be a little bit off topic.

Rgs,
Andreas Amundin
www.amundin.com

That is, once you enable multiple processes or multiple threads for an adapter, you lose the guarantee of order. So, only use multiple processes or multiple threads if order doesn’t matter.

Here is how it happens (presume two adapter processes).

  1. source sends event (sic. document) 1, a complex event
  2. source sends event 2, a simple event of the same type as event 1.
  3. target receives event 1 (process 1).
  4. target receives event 2 (process 2).
  5. target finishes processing event 2
  6. target finishes processing event 1

Note that event 2 finishes before event 1!
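
You can reproduce the inversion with plain Java (toy code, not the adapter's internals): two pooled workers pick the events up in order, but the simple one finishes first.

    import java.util.concurrent.*;

    public class OrderLossDemo {
        public static void main(String[] args) throws Exception {
            ExecutorService workers = Executors.newFixedThreadPool(2); // two adapter processes
            workers.submit(() -> handle("event 1 (complex)", 500));    // received first, slow to process
            workers.submit(() -> handle("event 2 (simple)", 50));      // received second, fast to process
            workers.shutdown();
            workers.awaitTermination(5, TimeUnit.SECONDS);
        }

        static void handle(String name, long workMillis) {
            try { Thread.sleep(workMillis); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            System.out.println("finished " + name); // "event 2 (simple)" prints first
        }
    }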
greg

Back to the original question:

==can anyone tell me the size of event in ES5.0 and number of events
==that can be published at a time.

==Thanks
==SAm

Well, the size of an individual event (document) is limited only by total memory and (for guaranteed events) the hard drive space available.

The number of events that can be published at a time is technically one*. The question of how much time it takes to publish a document and be ready to publish another is a much more answerable question, and is highly dependent upon the hardware environment available. webMethods has published a performance document (cf. the response to the SeeBeyond performance challenge) showing the limits on performance under ideal conditions with optimal hardware. There are enough variables in determining the speed at which you can publish documents that the best way to solve the problem is probably to set up the scenario and test the actual throughput for your situation. On my laptop (500MHz, 512MB RAM), for example, I could publish about 5000 0.1K volatile events per second.
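
If you want to run that kind of test yourself, a minimal harness looks something like this (publishOne() is a placeholder for your actual publish call, not a real API):

    public class PublishBenchmark {
        public static void main(String[] args) {
            int count = 5000;
            long start = System.nanoTime();
            for (int i = 0; i < count; i++) publishOne();
            double secs = (System.nanoTime() - start) / 1e9;
            System.out.printf("%d events in %.2fs = %.0f events/sec%n", count, secs, count / secs);
        }

        // Placeholder: swap in a real volatile-event publish for your broker.
        static void publishOne() { }
    }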

Tip:
When creating integrations, I tend (circumstances permitting) to create more fine-grained events rather than large ones. My reasoning is that I can more finely control which elements subscribers care about, and (more importantly) it creates a more even load for the broker. If you are working on a broker with many integrations, a large event causes the broker to focus only on it while it is being processed (i.e. queued), while with fine-grained events the broker can take a couple, process them, work on other tasks (other integrations’ publications, for example), and then come back and process a few more, creating a more consistent pressure on the broker than large event bursts would.
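
As a sketch of what I mean (the event layout and publish() stub are made up for illustration), publish one small LineItem event per item rather than one big Order event:

    import java.util.List;

    public class FineGrained {
        public static void main(String[] args) {
            List<String> lineItems = List.of("ord-1/sku-a/2", "ord-1/sku-b/1", "ord-1/sku-c/5");
            for (String item : lineItems)
                publish(item); // many small publishes keep the load on the broker even
        }

        // Placeholder for the real publish call.
        static void publish(String item) { System.out.println("published LineItem " + item); }
    }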

-greg

*If time is in fact continuous, and events are stipulated in such a way as to be in effect stochastically Poisson distributed, the probability of two “events” occurring at the “same time” becomes arbitrarily small.

What Greg describes has been my experience as well. Order is guaranteed to the main thread of the ATC, ILA, intelligent adapter, etc. but once the event is dispatched to worker threads you have a race condition–which thread will finish first? That depends on many things including thread scheduling, the amount of work to be done, I/Os, etc.

So Andreas and Dan are both right: events are retrieved and logged serially by the main thread, in order, but then processed in parallel by the subthreads, where order is lost and the performance benefits of multi-threading are gained.

Of course this is just my perception of how things appear to operate–I may be completely off-base with what is really going on under the covers!

Exactly, the broker guarantees the order of delivery. Once an adapter has acknowledged a delivered event the broker has satisfied the order requirement and will deliver the next event.

One point I want to make, with a follow-up question, is that the Ack of a guaranteed event won’t be sent by an adapter supporting data integrity until the event has been logged. This guarantees that the process can be restarted if it fails for any reason. My question: Broker 5.0 is supposed to take care of the logging before it delivers the event to the adapter. Does this happen?

Side note: for pre-5.0, the event had to be logged by the adapter to a database through the use of the logger adapter before the Ack could be sent. This was all done in the single-threaded master process. True parallel processing power was never achieved, as the event was typically logged on a different machine from where the adapter resided (if you followed wM guidelines). This comment only applies to Intelligent Adapters 4.5 and 4.6, not ATC.

Comments anyone?

Andreas

For pre-5.0, we’re really talking about where serialized processing ends and parallel begins. It’s serial up to the point where logging is finished and parallel after that–otherwise what would be the point of having threads?

Also, one can configure multiple adapters to pull from the same queue. The adapters operate independently and queued events are always delivered in order. Event1 goes to adapterA, event2 goes to adapterB, event3 would go to whichever adapter finished first.

Lastly, adapters don’t need to retrieve only 1 event at a time. I don’t know if the off-the-shelf adapters retrieve multiple events in one get operation. I would guess no.
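
If an adapter did batch its gets, the shape would be something like this plain-Java sketch (a BlockingQueue stands in for the client queue; this is not the off-the-shelf adapter code):

    import java.util.*;
    import java.util.concurrent.*;

    public class BatchGet {
        public static void main(String[] args) throws Exception {
            BlockingQueue<String> clientQueue =
                new LinkedBlockingQueue<>(List.of("e1", "e2", "e3", "e4", "e5"));
            List<String> batch = new ArrayList<>();
            batch.add(clientQueue.take()); // block until at least one event arrives
            clientQueue.drainTo(batch, 9); // then grab up to 9 more in the same get
            System.out.println("got batch " + batch); // events stay in queue order
        }
    }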

Ok…

Let us say we have an adapter A which publishes and waits for a reply from adapter B, and an adapter B which publishes and waits for a reply from adapter A. If either adapter publishes before it replies to its previous subscription, are we not looking at a deadlock situation?

This can be avoided if we have multiple adapter processes, but then we lose the event order.

On the event sizes…

Broker 5.0 can handle an event of any size, but what about the adapters? Many times I have seen an adapter throw an “Out of Memory” exception (in the 4.1.1 broker, and for event sizes less than 6MB), and increasing the heap only solves the problem temporarily.

Please correct me if I am missing something.

On Ramesh’s first point:

Normally, the main queue that has subscriptions is not used for request/reply, for just this reason. The main queue and thread retrieve subscribed events and dispatch them to a worker thread. Each worker thread is given its own client queue to the broker for publishing and request/reply. That way, things don’t get intermingled.

For your specific scenario, if both adapters have only one sub-thread then you could indeed get into deadlock, but one of the adapters will time out waiting for the reply (you should never wait forever) and the other may succeed.
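
A quick plain-Java illustration of the never-wait-forever rule (toy code, not the adapter's request/reply API): a bounded wait turns the potential deadlock into a recoverable timeout.

    import java.util.concurrent.*;

    public class ReplyTimeout {
        public static void main(String[] args) {
            ExecutorService pool = Executors.newSingleThreadExecutor();
            Future<String> reply = pool.submit(() -> {
                Thread.sleep(10_000); // the other adapter never answers in time
                return "reply";
            });
            try {
                System.out.println(reply.get(2, TimeUnit.SECONDS)); // bounded wait
            } catch (TimeoutException e) {
                reply.cancel(true);   // give up: log, retry, or compensate instead of deadlocking
                System.out.println("request timed out, no deadlock");
            } catch (Exception e) {
                Thread.currentThread().interrupt();
            } finally {
                pool.shutdownNow();
            }
        }
    }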

wrt the second point:

Huge events: do everything possible to avoid them. Message brokers were never intended for handling huge things as atomic entities, despite wM’s and other vendors’ best efforts to accommodate this abuse. Bust them up.
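
One way to bust them up, sketched in plain Java (the chunk size, identifiers, and publish() stub are all illustrative): split the payload into numbered chunks so each published event stays small and the subscriber can reassemble by sequence.

    import java.util.Arrays;

    public class Chunker {
        public static void main(String[] args) {
            byte[] big = new byte[6 * 1024 * 1024];     // the kind of 6MB payload that strains adapters
            int chunkSize = 256 * 1024;
            int total = (big.length + chunkSize - 1) / chunkSize;
            for (int seq = 0; seq < total; seq++) {
                int from = seq * chunkSize;
                byte[] chunk = Arrays.copyOfRange(big, from, Math.min(from + chunkSize, big.length));
                publish("file-123", seq, total, chunk); // one small event per chunk
            }
        }

        // Placeholder for the real publish; id/seq/total let the subscriber reassemble.
        static void publish(String id, int seq, int total, byte[] chunk) {
            System.out.println(id + " chunk " + (seq + 1) + "/" + total + " (" + chunk.length + " bytes)");
        }
    }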

I am just going to chime in in support of Rob’s comment:

“Huge events: do everything possible to avoid them.”

I can’t agree more. Many people have made this point and we’ll need to make it many more times.

Huge events are often a sign that a batch process is being implemented. wME is an event-driven, asynchronous architecture that is not meant to support a batch process. Do everything to avoid a batch process, or your project runs the risk of failure.

For future reference, use this thread to show that there are many of us out there who strongly recommend against huge events and batch processes.

Rgs,
Andreas Amundin
www.amundin.com