Scalability of webMethods IS 6.1

Hi,

I am planning to have a single 6.1 Integration Server handling around 200 transactions per second. Each transaction would consist of:
- a call from the IS API;
- a synchronous request to an external system over XML/HTTP (so I would have XML encoding/parsing to perform).
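
For illustration, the XML/HTTP leg of each transaction would look roughly like this (a minimal sketch in plain Java; the endpoint URL, payload handling, and timeout values are placeholders, not the real system):

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class XmlHttpCall {
        // Placeholder endpoint; the real external system's URL differs.
        private static final String ENDPOINT = "http://external-system.example.com/service";

        // POST a request document and block until the XML reply comes back.
        public static String send(String requestXml) throws IOException {
            HttpURLConnection conn = (HttpURLConnection) new URL(ENDPOINT).openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "text/xml");
            conn.setConnectTimeout(5000);  // fail fast if the backend is unreachable
            conn.setReadTimeout(30000);    // generous read timeout for the synchronous reply
            try (OutputStream out = conn.getOutputStream()) {
                out.write(requestXml.getBytes("UTF-8"));
            }
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
                StringBuilder reply = new StringBuilder();
                for (String line; (line = in.readLine()) != null; ) {
                    reply.append(line).append('\n');
                }
                return reply.toString();
            } finally {
                conn.disconnect();
            }
        }
    }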

Do you know if a single Integration Server can handle this kind of high volume, provided it is given enough memory and CPU? Are there limits to vertical scalability? What is the usual bottleneck?

Thanks for your help.

Nicolas.

Nicolas,

There’s really not enough information to answer your question. If you look in Bookshelf > Product Information > White Papers, there’s a document called webMethods 6.0.1 Technical Report - Platform Core Performance that gives some examples of the performance you might see. For example, on a 2-CPU 3GHz Wintel box, we parsed 418 5K XML documents/second, which is pretty close to the volume you’re talking about (roughly a 2x margin over your 200/sec). However, if the document size is 66K, the same hardware drops to 34 documents/sec; at that rate you’d need roughly six times the hardware to hit 200/sec. If that’s more like your situation, you might want to look at scaling both horizontally and vertically.

Assuming that you’re just doing what you describe (receiving an XML document and sending a request to an external system), and you’re not using logging, then the CPU is almost certainly the limiting factor. If you’re using guaranteed delivery, then the disk can become a bottleneck. If the document sizes are very large (e.g., 100MB), then memory can become the bottleneck. We’ve never seen a case where the network is the limiting factor for Integration Server, but it sometimes is for Broker.
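
If you want to sanity-check the CPU-bound parse rate on your own hardware rather than take our numbers on faith, a crude micro-benchmark along these lines will give you a ballpark figure (a sketch using plain JAXP; the generated sample document merely stands in for a representative ~5K payload of your own):

    import java.io.ByteArrayInputStream;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;

    public class ParseBench {
        public static void main(String[] args) throws Exception {
            // Substitute a representative ~5K document from your real traffic.
            byte[] doc = buildSampleDoc(5 * 1024);
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            int iterations = 10000;
            long start = System.currentTimeMillis();
            for (int i = 0; i < iterations; i++) {
                builder.parse(new ByteArrayInputStream(doc));
                builder.reset();  // reset so the builder can be reused for the next parse
            }
            long elapsed = Math.max(1, System.currentTimeMillis() - start);
            System.out.println((iterations * 1000L / elapsed) + " docs/sec");
        }

        // Generate a throwaway XML document of approximately the requested size.
        private static byte[] buildSampleDoc(int approxSize) {
            StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"?><orders>");
            while (sb.length() < approxSize) {
                sb.append("<order id=\"1\"><item>widget</item><qty>3</qty></order>");
            }
            sb.append("</orders>");
            return sb.toString().getBytes();
        }
    }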

If you can be more specific about what you’re trying to do, I can provide a more accurate answer.

–Jeremy

Slight correction to my earlier note.

Hi Nicolas,

200 tx/sec is pretty high volume.
Even if in theory you could do that with a single IS, I do not recommend it, because:

  • My experience with a single heavily loaded IS is that it becomes unstable (for instance, thread pools sometimes get out of control and grow heavily, the audit runtime becomes unstable, etc.).
  • What is the cost of losing some transactions? If it is high, you definitely need to make your system redundant by using multiple duplicated ISs that share the load and work in active-active mode; if one of the ISs goes down, the others continue. Even multiple ISs on the same machine are better than one single big IS (that's my experience). One IS does not scale well above a certain number of CPUs (4 or 6), and a big IS with many threads is worse than multiple ISs with fewer threads (process CPU switching beats thread CPU switching in many cases). If you can have redundancy in hardware, that is even nicer: you benefit from both availability and scalability. Along with redundancy, you can also invest in a load balancer. Note that if you use the Java IS API (Context), you can specify multiple IS targets and it will manage failover, and maybe load balancing, for you (see the sketch after this list).
  • If you make your IS redundant, I suggest forgetting about the IS cluster option, because of RepoV2, which is a performance issue but mostly a stability killer for high-volume transactions (when are they planning to rewrite it?!).
  • If you do audit logging to the DB, watch out! More because of stability than of performance. In our case, we ran into bad problems because the audit runtime is not robust. For instance, if the DB is unavailable for some time (tablespace full, or max segments reached, which can easily happen for high-volume transactions even with good capacity planning, especially at the beginning of a production run), the IS will buffer the audit logs until the DB is reachable again, but meanwhile the audit runtime eats a good portion of the CPU, degrading performance, and a percentage of the transactions will fail because of timeouts. Then you will run into two issues:
    1. If the maximum number of buffered audit logs is reached, all the threads will block. You can't even log in to the IS! That's bad design, in my opinion.
    2. When the DB becomes available again, the IS flushes the buffered audit logs with increased CPU usage; meanwhile, some transactions will time out, and once the flush finishes, the IS stays very slow until you clean up all the audit log files and restart it. So this is not so good for high availability! (Support knows about it from our SR# but no action so far.)
    Anyway, these issues can be minimized by using redundancy and a good way to detect that one of the ISs is not responding (e.g., by executing a dummy service like ping; see the sketch after this list).
  • There are some JVM tuning options that help a *lot*, which you can find in some of the PDF files on Advantage (a good one is "concurrentio").
  • ...
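
To illustrate the ping-and-failover idea, here is a rough sketch of an external health check that exercises the built-in wm.server:ping service over plain HTTP and routes each call to the first node that answers (host names, port, and credentials are placeholders; the Java client API's Context handles failover more cleanly than this):

    import java.io.IOException;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.Base64;

    public class IsHealthCheck {
        // Placeholder hosts/credentials; replace with your duplicated IS instances.
        private static final String[] HOSTS = { "is-node1:5555", "is-node2:5555" };
        private static final String AUTH = Base64.getEncoder()
                .encodeToString("monitor:secret".getBytes());

        // A node is "up" if the built-in wm.server:ping service answers with HTTP 200.
        static boolean isAlive(String host) {
            try {
                HttpURLConnection conn = (HttpURLConnection)
                        new URL("http://" + host + "/invoke/wm.server/ping").openConnection();
                conn.setRequestProperty("Authorization", "Basic " + AUTH);
                conn.setConnectTimeout(2000);
                conn.setReadTimeout(2000);
                return conn.getResponseCode() == 200;
            } catch (IOException e) {
                return false;  // unreachable or timed out: treat the node as down
            }
        }

        // Naive failover: route each call to the first node that still answers.
        static String pickNode() {
            for (String host : HOSTS) {
                if (isAlive(host)) {
                    return host;
                }
            }
            throw new IllegalStateException("no Integration Server is responding");
        }
    }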
riad

Hi Jeremy and Riad,

Thank you very much for your answers. Here is a more precise description of the integration I have planned:

  • For each transaction, an IS client (using the C API) calls an IS service (I expect the IS client is not created each time, but that IS clients are pooled). I expect less than 5KB of data (wM native IData) to be sent. Then simple validations and transformations (hardcoded, no database calls) take place, the data is encoded in XML and sent to the external system. The external system sends data back in XML (less than 5KB), the data is parsed, simple validations and transformations are performed, and the data is sent back to the IS client (as the output of the IS service called in the first place).

Please note that the call to the external system is a synchronous call that will typically last 2-10 seconds, so I may have between 400 and 2000 connections open concurrently with the external system (200 tx/s × 2-10 s per call). I guess this can be an issue.
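
To keep that number under control, I am thinking of capping the concurrent backend calls, along these lines (just a sketch; the limit of 1000 is a guess to be tuned, and send() reuses the XmlHttpCall sketch from my first post):

    import java.util.concurrent.Semaphore;
    import java.util.concurrent.TimeUnit;

    public class BackendGate {
        // Cap concurrent backend connections; the limit is a guess to be tuned.
        private static final Semaphore SLOTS = new Semaphore(1000);

        public static String callBackend(String requestXml) throws Exception {
            // Refuse (or queue elsewhere) rather than pile up unbounded connections.
            if (!SLOTS.tryAcquire(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("backend saturated: no free connection slot");
            }
            try {
                return XmlHttpCall.send(requestXml);  // the synchronous 2-10 s XML/HTTP call
            } finally {
                SLOTS.release();  // always free the slot, even if the call fails
            }
        }
    }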

The platform is HP-UX, so there may be issues specific to Unix compared with the Windows platform.

Any feedback based on this more detailed information would be really appreciated.

One more detail on the note above: we do not need any guaranteed-delivery features on these ISs.

Thanks for the further details. As a rough order of magnitude, I’ll guess that you’ll get half the throughput we get in our 5K parse test, because of the backend call. That’s only a guess, though.

Since you mention HP-UX, there’s a report similar to the others I cited called hpux and webMethods Integration Server 6.0.1 (available on Advantage in Best Practices > Product Technical Notes). Looking at the 5K parse test results, it shows performance as high as 1175 docs/sec on an 8-way 875MHz PA-RISC box or 959 docs/sec on a 4-way 1.3GHz Itanium box. Be sure to take half those numbers to accommodate your architecture (roughly 590 and 480 docs/sec, still well above your 200).

In short, I don’t think you’ll have any trouble reaching your 200 docs/sec requirement. How you choose to size your box will depend on how much room for growth you want, as well as any detailed benchmarking you perform.

–Jeremy

P.S. The one thing I don’t know about is keeping 400-2000 open connections… I have no numbers to indicate how that will impact performance.