pub.xml:xmlNodeToDocument causes a memory utilization spike for large documents

We have a large XML string (about 250 MB) that needs to be converted to an IS document. pub.xml:xmlNodeToDocument uses a lot of memory, and server memory utilization goes up to 99%. The XML node iterator is not helping much, since it’s a complex XML structure. Is there any way to mitigate this?
Thanks in advance

Can you share the structure?

What are the ways to break the document into logical units? Think about how this might be done in a “normal” programming language – what would be done there to be memory efficient? The answers to that can lead to how to do the same with, possibly, a combination of built-in and custom services.

Another item to consider, if you have any control over the source – don’t create a large document/entity. Break it up. Of course, this depends on what the underlying scenario is. Any additional info may lead to more useful suggestions to consider.


@vignesh.govindhan Integration Server natively supports processing a file up to 20 MB (I think so) by default without any issue. Anything above that is a memory-consuming process that directly affects the JVM runtime.

As @reamon mentioned, you need to break the file up and process it in chunks, so that it will not sit directly in your JVM heap.

I really like this part. Sometimes we should ask this kind of question before starting any development in webMethods.

Let me bring my own experience to the table.

A few months ago, I received a file of around 5 GB from a client. It was a flat file of 5 years of SWIFT transactions. The file has a complex structure, so I could not derive a schema from it and use webMethods’ large flat file handling. I decided to use Python (so as not to depend on the JVM) to process and split the file into multiple smaller flat files that I could then process using flow services in webMethods. A rough sketch of that splitting idea is shown below.
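To illustrate, here is a minimal sketch of such a streaming split, written in Java to match the Java-service context of the rest of this thread (the same pattern works in Python). It assumes a simple line-oriented flat file where any `recordsPerChunk` consecutive lines can safely go into their own chunk file; real SWIFT data may need smarter record-boundary detection, and the paths and chunk size here are illustrative.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FlatFileSplitter {

    // Split a large line-oriented flat file into smaller chunk files,
    // reading one line at a time so heap usage stays constant.
    public static void split(Path input, Path outputDir, int recordsPerChunk) throws Exception {
        Files.createDirectories(outputDir);
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            int lineCount = 0;
            int chunkIndex = 0;
            BufferedWriter writer = null;
            try {
                while ((line = reader.readLine()) != null) {
                    if (lineCount % recordsPerChunk == 0) {
                        if (writer != null) writer.close();
                        Path chunk = outputDir.resolve("chunk-" + (chunkIndex++) + ".txt");
                        writer = Files.newBufferedWriter(chunk, StandardCharsets.UTF_8);
                    }
                    writer.write(line);
                    writer.newLine();
                    lineCount++;
                }
            } finally {
                if (writer != null) writer.close();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical paths; each chunk can then be fed to a flow service.
        split(Path.of("swift-transactions.dat"), Path.of("chunks"), 100_000);
    }
}
```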

This feedback, together with @reamon’s suggestion (“Think about how this might be done in a ‘normal’ programming language – what would be done there to be memory efficient?”), may help you find a solution that fits your needs.

There is no hard 20 MB limit for processing files or other data sets.


Keep in mind that this does not necessarily mean splitting a large file into multiple physical files, though that is indeed an option. Just don’t load the whole thing into memory. There are multiple ways to process it in a memory-efficient manner.


Thanks for sharing your input; good to know this insight. I intended to refer to the default values of the large file configuration extended properties, such as the Integration Server setting watt.server.tspace.max = 50 MB.

I agree with you that there is no hard limit for processing files.

Processing a very large file (100 MB+) in Integration Server is usually not a good idea. If you absolutely need to do it, it’s better to dedicate an Integration Server or two for this purpose in the production environment. If there is a repeating block in the XML file, you can implement a lightweight service to process that block, move on to the next block, and so on. If you need to load the entire XML file into memory, you need to increase your heap size greatly (several GBs), even if you implement it using bigXML.

This only applies to Trading Networks and incoming documents (entry services). It is not a constraint in any other context. Based upon your original post, you’re not using TN.

If a client is posting to your IS, you could implement something similar to how TN does it as long as the HTTP post has a Content-Length header. If the transfer is chunked, then a different approach would be needed.
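A minimal sketch of that idea, independent of any IS-specific API (the class name, method, and 10 MB threshold are illustrative assumptions): look at the declared Content-Length and, above some threshold, stream the body straight to a temp file with a fixed-size buffer instead of buffering it in memory. One simple fallback for chunked transfers, where no length is declared, is to treat them as large.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class LargePayloadReceiver {

    private static final long IN_MEMORY_LIMIT = 10L * 1024 * 1024; // 10 MB, illustrative

    /**
     * Returns an InputStream over the request body. Small bodies are buffered
     * in memory; large bodies, and chunked transfers (contentLength < 0, i.e.
     * no Content-Length header), are streamed to a temp file with a fixed-size
     * buffer, so heap usage stays constant regardless of payload size.
     */
    public static InputStream receive(long contentLength, InputStream body) throws Exception {
        if (contentLength >= 0 && contentLength <= IN_MEMORY_LIMIT) {
            return new ByteArrayInputStream(body.readAllBytes());
        }
        Path tempFile = Files.createTempFile("large-payload-", ".xml");
        try (OutputStream out = Files.newOutputStream(tempFile)) {
            byte[] buffer = new byte[64 * 1024];
            int read;
            while ((read = body.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        return Files.newInputStream(tempFile);
    }
}
```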

A guideline to keep in mind, which aligns with @engin_arlak’s comments, is that middleware tools (particularly publish/subscribe tools) were intended for lots of little calls/messages/events/documents, not big batch jobs.

While there are ways to handle large files within IS, just as there are with any programming environment, one needs to determine whether or not IS is the right tool in such situations. And if you do use IS in such cases (we do so quite a bit), just do so in a memory-friendly manner – this requires a full understanding of what the built-in services do and don’t do, and perhaps at times writing your own code.

The built-in services and their documentation descriptions tend to lead to solutions that load things fully into memory. And to add to the fun, the behavior of mapping strings causes the string to be duplicated, each time. So for a 500 MB string that is passed to a couple of services, it is very easy for it to be replicated multiple times, consuming 1.5, 2 or more GB. And if you don’t keep a tidy pipeline, you can run into memory issues.

HTH


In the past (a decade ago, I guess) I created a Java service using a StAX parser to break a single large XML file (up to 5 GB) into multiple small files/chunks, and was able to process them. I explained it in detail in a blog post back then. See if you can access the URL below.

https://akshith-webmethods.blogspot.com/2013/08/using-webmethods-java-service-stax.html
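For readers who can’t reach the blog, here is a minimal sketch of the general StAX streaming technique described above (not the author’s exact service). It assumes a repeating element named `record` and no XML namespaces; the element name and file paths are illustrative.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.XMLEvent;

public class StaxXmlChunker {

    // Stream through a huge XML file and write every <record> element
    // (an assumed repeating-block name) into its own small XML file,
    // so only one record is ever held in memory at a time.
    public static void chunk(Path input, Path outputDir) throws Exception {
        Files.createDirectories(outputDir);
        XMLInputFactory inFactory = XMLInputFactory.newInstance();
        XMLOutputFactory outFactory = XMLOutputFactory.newInstance();

        try (InputStream in = Files.newInputStream(input)) {
            XMLEventReader reader = inFactory.createXMLEventReader(in);
            int chunkIndex = 0;
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                // A start tag of the repeating block begins a new chunk file.
                if (event.isStartElement()
                        && "record".equals(event.asStartElement().getName().getLocalPart())) {
                    Path chunk = outputDir.resolve("record-" + (chunkIndex++) + ".xml");
                    try (OutputStream out = Files.newOutputStream(chunk)) {
                        XMLEventWriter writer = outFactory.createXMLEventWriter(out, "UTF-8");
                        writer.add(event);
                        int depth = 1; // handles <record> nested inside <record>
                        while (depth > 0 && reader.hasNext()) {
                            XMLEvent inner = reader.nextEvent();
                            if (inner.isStartElement()
                                    && "record".equals(inner.asStartElement().getName().getLocalPart())) {
                                depth++;
                            } else if (inner.isEndElement()
                                    && "record".equals(inner.asEndElement().getName().getLocalPart())) {
                                depth--;
                            }
                            writer.add(inner);
                        }
                        writer.close();
                    }
                }
            }
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        chunk(Path.of("huge.xml"), Path.of("chunks")); // hypothetical paths
    }
}
```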


I cannot read the blog post; I get “authorization refused”. Can you check the link?

It is a pretty old blog that I had closed to the public due to a conflict of interest with a former employer regarding my innovations and their ownership. I have opened it up to public access now. You can try again.

I will start publishing articles again this quarter.


Wow, this post is incredibly informative!

Did you try these options?

1. pub.xml:getXMLNodeIterator
2. pub.xml:loadEnhancedXMLNode

Great post, learned a lot.