Performance issue | Common Considerations for Addressing Performance Issues in webMethods 10.15 Servers

Hi,

Since migrating our webMethods servers from version 10.5 to 10.15, we’ve been seeing performance issues on the upgraded servers.

Errors:

502: Unexpected EOF at target

503: The Service is temporarily unavailable

504: Gateway Timeout

Issue: the proxy is throwing HTTP 502, 503, and 504 status codes

Flow: API GATEWAY → DNS → wM Rest API

We reviewed the following webMethods settings, which are closely tied to performance:

DNS TTL: 15 Mins
JAVA.MAX MEMORY: 8192
JAVA.INIT MEMORY: 5200
Maximum threads: 2000

We also compared all existing configurations on the old 10.5 servers with those on the new 10.15 servers.
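Comparing settings by hand is easy to get wrong, so a small script can make the 10.5 vs 10.15 comparison repeatable. The Java sketch below is one way to do it, assuming the settings being compared live in plain key=value files (for example each instance’s server.cnf; the file name and location are assumptions, so check your installation). It lists every key that was added, removed, or changed. The class name is hypothetical, and the watt.server.* entries in main are illustrative values only, not recommendations.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Compares two key=value configuration dumps, e.g. the same file copied from a 10.5 and a 10.15 server.
final class ConfigDiff {

    // Parses key=value text with java.util.Properties, which also handles comments and escapes.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // StringReader does no real I/O, so this is not expected to happen.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns key -> "old -> new" for every key whose value differs, sorted by key.
    // Keys present on only one side are reported with "<missing>" for the absent value.
    static TreeMap<String, String> diff(Properties oldCfg, Properties newCfg) {
        Set<String> keys = new TreeSet<>(oldCfg.stringPropertyNames());
        keys.addAll(newCfg.stringPropertyNames());
        TreeMap<String, String> changes = new TreeMap<>();
        for (String key : keys) {
            String before = oldCfg.getProperty(key, "<missing>");
            String after = newCfg.getProperty(key, "<missing>");
            if (!before.equals(after)) {
                changes.put(key, before + " -> " + after);
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        // Illustrative excerpts only; load the real files from each server instead.
        Properties oldCfg = parse("watt.server.threadPool=2000\nwatt.server.threadPoolMin=10\n");
        Properties newCfg = parse("watt.server.threadPool=2000\nwatt.server.threadPoolMin=20\n");
        diff(oldCfg, newCfg).forEach((key, change) -> System.out.println(key + ": " + change));
    }
}
```

Using java.util.Properties rather than a hand-written split means comment lines and escaped characters are handled the same way the JDK handles them.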

Apart from these settings, what other checks can I perform to pinpoint the cause of the performance issue on the 10.15 servers?

Please feel free to add your insights and questions.

Thanks
Ajay

Do you have the latest fixes installed? Most of the performance-degrading bugs are resolved in early fixes.

This doesn’t look like it has anything to do with IS, unless it is a bug.

I need more information about this one. It could be coming from the endpoint that IS calls.

This might be caused by firewall or network settings. If you are using a proxy, check the proxy bypass settings and make sure there aren’t any overly broad wildcards there. For instance, a proxy bypass entry like *app* will bypass the proxy for every HTTP call to a host that has “app” in its name.
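To see how far a wildcard like *app* reaches, the Java sketch below treats a bypass entry as a simple glob where * matches any run of characters and the pattern must match the whole host name, case-insensitively. That matching rule is an assumption for illustration; the exact rules in your IS version may differ, so use this to reason about why broad entries are risky, not as a reimplementation of IS behavior. The class name and host names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Shows how a broad proxy bypass wildcard such as *app* matches far more hosts than intended.
final class BypassCheck {

    // Converts a simple glob ('*' = any run of characters) into a case-insensitive regex.
    // Everything except '*' is quoted, so dots in host names are matched literally.
    static Pattern toRegex(String glob) {
        String regex = Arrays.stream(glob.split("\\*", -1))
                .map(Pattern::quote)
                .collect(Collectors.joining(".*"));
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
    }

    // True if the whole host name matches the bypass entry, i.e. the proxy would be skipped.
    static boolean bypasses(String entry, String host) {
        return toRegex(entry).matcher(host).matches();
    }

    public static void main(String[] args) {
        // Hypothetical host names; substitute the hosts your services actually call.
        List<String> hosts = List.of("api.example.com", "apps.partner.net", "happy-hour.internal");
        for (String host : hosts) {
            System.out.println(host + " bypasses proxy for *app*: " + bypasses("*app*", host));
        }
    }
}
```

Note that happy-hour.internal matches *app* just as apps.partner.net does, which is exactly the kind of unintended bypass the reply warns about.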

These thread pool values look high to me, but there is no formula we can apply for heap size and thread pools. It is usually trial and error, and it depends heavily on the number of cores on the server. If you set the value too high, threads may starve, and you get timeouts due to frequent context switching. If you aren’t occasionally running out of threads, don’t increase it too much. You can cut it in half, then in half again, and see if that helps.
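The “cut it in half, then in half again” approach can be written down as a trial schedule. The Java sketch below produces 2000, 1000, 500 for the value in the original post and prints threads per core for context only; as the reply says, there is no formula that turns that ratio into a correct pool size. The class and method names are hypothetical, and availableProcessors() reports the cores of whichever machine runs the code, so run it on the IS host or substitute its core count.

```java
import java.util.ArrayList;
import java.util.List;

// Builds the "cut it in half, then in half again" trial sequence for the server thread pool size.
final class ThreadPoolTrials {

    // Returns start, start/2, start/4, ... for at most `steps` entries, stopping before going below `minimum`.
    static List<Integer> halvingSchedule(int start, int steps, int minimum) {
        List<Integer> sizes = new ArrayList<>();
        int size = start;
        for (int i = 0; i < steps && size >= minimum; i++) {
            sizes.add(size);
            size /= 2;
        }
        return sizes;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Threads per core is shown only as context; it is not a sizing rule.
        for (int size : halvingSchedule(2000, 3, 1)) {
            System.out.printf("pool size %d -> %.1f threads per core (%d cores)%n",
                    size, (double) size / cores, cores);
        }
    }
}
```

Each size in the schedule is meant to be tried and measured under realistic load before moving to the next one.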

If none of these turns out to be the root cause, we need more information. You may want to share your JVM statistics, your server.log and wrapper.log entries, and your fix levels.
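For the JVM statistics, the Java sketch below reads thread and heap figures through the standard java.lang.management API. It reports on whichever JVM runs it, so applying it to IS would mean running it inside IS (for example from a Java service) or reading the same MXBeans over JMX; both routes are assumptions about your setup, and JDK tools such as jcmd or jconsole expose the same data without any code. The class name is hypothetical. Keep in mind that the JVM thread count covers every thread, not only the IS server thread pool.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;

// Captures a few JVM statistics that are useful to share alongside server.log and wrapper.log.
final class JvmSnapshot {
    private static final long MB = 1024L * 1024L;

    final int liveThreads;      // all live JVM threads, not just the IS server thread pool
    final int peakThreads;      // highest live thread count since JVM start (or since the peak was reset)
    final long heapUsedMb;
    final long heapCommittedMb;
    final long heapMaxMb;       // -1 when the JVM reports no defined maximum

    private JvmSnapshot(int liveThreads, int peakThreads, long heapUsedMb, long heapCommittedMb, long heapMaxMb) {
        this.liveThreads = liveThreads;
        this.peakThreads = peakThreads;
        this.heapUsedMb = heapUsedMb;
        this.heapCommittedMb = heapCommittedMb;
        this.heapMaxMb = heapMaxMb;
    }

    static JvmSnapshot capture() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax();
        return new JvmSnapshot(
                threads.getThreadCount(),
                threads.getPeakThreadCount(),
                heap.getUsed() / MB,
                heap.getCommitted() / MB,
                max < 0 ? -1 : max / MB);
    }

    @Override
    public String toString() {
        return String.format("threads live=%d peak=%d, heap used=%dMB committed=%dMB max=%dMB",
                liveThreads, peakThreads, heapUsedMb, heapCommittedMb, heapMaxMb);
    }

    public static void main(String[] args) {
        System.out.println(capture());
    }
}
```

Capturing a few snapshots while the 502/503/504 errors are occurring is likely more useful than a single snapshot taken at an idle moment.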

Another thing to consider: premature optimization is the root of all evil. Did you test your server with the default values? If you don’t need optimization, don’t optimize. You might have caused this behavior yourself by optimizing prematurely. I know copying values from the previous version seems like a good idea, but most of the time it isn’t. You don’t know why those values were set in the first place. Some might have been set to work around functionality that was missing in the past but is no longer missing, some might have been set for testing and then forgotten, and some might have been set out of a lack of knowledge. If you want to optimize your IS, you can follow the webMethods performance tuning documents. They are slightly outdated, but most of the principles should still apply. You need Empower access to review these documents.
https://empower.softwareag.com/Products/TechnicalReports/default.aspx

In addition to the info @engin_arlak shared, you’ll want to look beyond IS as well. Network/firewall/proxy settings would be good to check too.

The 15-minute TTL for host name resolution is unlikely to be the cause of a performance issue. IMO (and this is what we do), the JVM setting should be as low as possible: host name lookups are highly unlikely to be a performance problem, but using stale IPs can be. We set ours to 1 min (I wanted 1 second but was overruled).
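In the JDK, successful DNS lookups are cached according to the networkaddress.cache.ttl security property, given in seconds (-1 means cache forever, 0 means do not cache). The Java sketch below sets it programmatically to 60 seconds, matching the 1-minute value mentioned above. How your IS installation should apply it (for example, by editing the JDK’s java.security file instead of using code) is an assumption you should verify; the class and method names are hypothetical.

```java
import java.security.Security;

// Sets the JVM-wide DNS cache TTL via the standard networkaddress.cache.ttl security property.
final class DnsTtl {
    static final String TTL_PROPERTY = "networkaddress.cache.ttl";

    // The JDK reads this property when its address cache policy is first initialized,
    // so this should run before the first host name lookup in the JVM.
    static void setPositiveTtlSeconds(int seconds) {
        Security.setProperty(TTL_PROPERTY, Integer.toString(seconds));
    }

    // Returns the current value, or null if the property is not set anywhere.
    static String currentTtl() {
        return Security.getProperty(TTL_PROPERTY);
    }

    public static void main(String[] args) {
        setPositiveTtlSeconds(60); // 1 minute
        System.out.println(TTL_PROPERTY + " = " + currentTtl());
    }
}
```

A short TTL trades a few extra lookups for picking up IP changes quickly, which is the trade-off the reply argues for.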

For the memory settings, there are various points of view, but one approach is to make INIT and MAX match. As with most things, there are pros and cons. Measuring and observing should show whether a static heap (mx == ms) or a grow/shrink heap provides a benefit in your environment.
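To confirm which heap mode a server is actually running with, the Java sketch below reads the JVM’s startup arguments and compares -Xms with -Xmx. It assumes the INIT and MAX memory settings end up on the command line as -Xms/-Xmx (worth checking in your wrapper configuration); the class and method names are hypothetical. With the values from the original post, isStaticHeap(5200, 8192) is false, so those servers use a grow/shrink heap.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Locale;

// Checks whether a JVM was started with a static heap (-Xms equal to -Xmx) or a grow/shrink heap.
final class HeapMode {

    // Converts a HotSpot size string such as "5200m", "8g", "4096k" or a plain byte count into whole megabytes.
    static long toMb(String size) {
        String s = size.trim().toLowerCase(Locale.ROOT);
        char unit = s.charAt(s.length() - 1);
        String digits = s.substring(0, s.length() - 1);
        switch (unit) {
            case 'k': return Long.parseLong(digits) / 1024;
            case 'm': return Long.parseLong(digits);
            case 'g': return Long.parseLong(digits) * 1024;
            default:  return Long.parseLong(s) / (1024 * 1024);
        }
    }

    // Returns the value of the last argument starting with prefix (later flags typically win), or null if absent.
    static String findFlag(List<String> jvmArgs, String prefix) {
        String value = null;
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) {
                value = arg.substring(prefix.length());
            }
        }
        return value;
    }

    static boolean isStaticHeap(long initMb, long maxMb) {
        return initMb == maxMb;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        String xms = findFlag(jvmArgs, "-Xms");
        String xmx = findFlag(jvmArgs, "-Xmx");
        if (xms == null || xmx == null) {
            System.out.println("-Xms and/or -Xmx not set explicitly; the JVM is using its defaults.");
            return;
        }
        long initMb = toMb(xms);
        long maxMb = toMb(xmx);
        System.out.println("init=" + initMb + "MB max=" + maxMb + "MB static heap: " + isStaticHeap(initMb, maxMb));
    }
}
```

Reading the flags instead of comparing the heap MemoryUsage init and max values avoids false mismatches, since some collectors report a usable maximum slightly below -Xmx.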

Similar to another thread about optimization and “measure, measure, measure”, there is this old guidance from item 8, “Don’t optimize prematurely”, of C++ Coding Standards: 101 Rules, Guidelines, and Best Practices:

Spur not a willing horse (Latin proverb): Premature optimization is as addictive as it is unproductive. The first rule of optimization is: Don’t do it. The second rule of optimization (for experts only) is: Don’t do it yet. Measure twice, optimize once.

Some wise person told me that there is no such thing as best practice; there is only less vodka.

@ajaykumar.rangisetti You have done the upgrade. There is no turning back. Look at each error and get it fixed.

HTTP 5xx error codes are server-side errors. In your case, find out which server is returning them (the REST API server or some other server).

A timeout (504) means the processing could not complete in time, and that includes the time your backend server takes. Service unavailable (503) means the threads are too busy or the server is down.
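As a quick reference, the Java sketch below encodes the reading of the three status codes given in this thread: 502 for the “Unexpected EOF at target” case, 503 for busy threads or a down server, and 504 for processing that did not finish in time, backend time included. The class and method names are hypothetical, and the hints are starting points for investigation rather than diagnoses.

```java
// Maps the gateway status codes from this thread to where to start looking.
final class GatewayStatus {

    static String firstCheck(int status) {
        switch (status) {
            case 502:
                return "Bad Gateway: the target closed the connection or sent an invalid response; "
                        + "check the target server and everything between it and the gateway.";
            case 503:
                return "Service Unavailable: the target is down or its threads are too busy; "
                        + "check server availability and thread pool usage.";
            case 504:
                return "Gateway Timeout: processing did not finish in time, including backend time; "
                        + "compare the gateway timeout with the end-to-end latency.";
            default:
                return status >= 500 && status <= 599
                        ? "Other server-side (5xx) error: find out which server returned it."
                        : "Not a server-side error.";
        }
    }

    public static void main(String[] args) {
        for (int status : new int[] {502, 503, 504}) {
            System.out.println(status + " -> " + firstCheck(status));
        }
    }
}
```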

There is no one solution for all these errors.

Do you have the latest fixes installed? Most of the performance-degrading bugs are resolved in early fixes.

→ Yes, we have Core Fix 6 installed.
This doesn’t look like it has anything to do with IS, unless it is a bug.

→ API Gateway generated this error because the target was not ready to handle any requests.

I need more information about this one. It could be coming from the endpoint that IS calls.
→ This error is also thrown by API Gateway, because it did not get a response back from the target.

Flow: DNS → wM REST API → HTTP call (this HTTP call takes at most 9 ms to respond)
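Since the downstream HTTP call responds in at most 9 ms, a useful next step is to time each hop separately, for example by calling the wM REST API directly instead of through API Gateway. The Java sketch below uses the JDK’s java.net.http.HttpClient (Java 11+) to time one GET request and reports a timeout as status -1 instead of failing. The class name and the default URL are hypothetical (5555 is the usual IS port, but the path is a placeholder); pass the real endpoint as the first argument.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpTimeoutException;
import java.time.Duration;

// Times one GET request sent directly to an endpoint, bypassing API Gateway, to see which hop is slow.
final class DirectProbe {

    // HTTP status (or -1 if the request timed out) and elapsed wall-clock time in milliseconds.
    static final class Result {
        final int status;
        final long elapsedMs;

        Result(int status, long elapsedMs) {
            this.status = status;
            this.elapsedMs = elapsedMs;
        }
    }

    static Result probe(URI uri, Duration timeout) {
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .connectTimeout(timeout)
                .build();
        HttpRequest request = HttpRequest.newBuilder(uri).timeout(timeout).GET().build();
        long start = System.nanoTime();
        try {
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            return new Result(response.statusCode(), (System.nanoTime() - start) / 1_000_000);
        } catch (HttpTimeoutException e) {
            // Connect or response timeout: report it as data, since timeouts are what we are hunting.
            return new Result(-1, (System.nanoTime() - start) / 1_000_000);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while probing " + uri, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical default URL; pass the real REST endpoint as the first argument.
        URI target = URI.create(args.length > 0 ? args[0] : "http://localhost:5555/restv2/example");
        Result result = probe(target, Duration.ofSeconds(10));
        System.out.println("status=" + result.status + " elapsedMs=" + result.elapsedMs);
    }
}
```

Running the same probe against the gateway URL and the direct URL from the same machine should show whether the extra time is spent in the gateway, DNS, or the network path rather than in IS.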

It looks like there is more than one problem in your environment, and the information you provided is still not enough. I recommend creating a support ticket; it is your best bet. If you want to keep this thread going in the background while support addresses your issues, it would be great if you also included the related log entries. While waiting, you can test a thread pool size of 1000 and then 500. It looks like you have enough heap, but we don’t know the rest of your hardware. Please share your performance metrics, hardware configuration, and related log entries, and let us know if you made any changes to the OS. Sometimes we implement a workaround for a valid reason and keep applying it again and again even after the underlying issue is long gone, and that same workaround can cause an unrelated issue.