Large XML handling - Issue

Hi,

We have a requirement to process a huge (1 GB) XML file in webMethods.
Below are a sample XML and the wM code that processes the file using the 'nodeIterator' approach.

Sample XML (element names other than OrderItem are illustrative; the original markup was stripped in this copy):

<Orders>
   <OrderItem>
      <Name>Item 1</Name>
      <Quantity>100</Quantity>
      <Price>100.00</Price>
   </OrderItem>
   <OrderItem>
      <Name>Item 2</Name>
      <Quantity>200</Quantity>
      <Price>200</Price>
   </OrderItem>
   ...
   <OrderItem>
      <Name>Item 100000</Name>
      <Quantity>700</Quantity>
      <Price>700.00</Price>
   </OrderItem>
</Orders>
wM Code:

1.11 pub.file:getFile (Input → loadAs = "stream")
1.12 pub.xml:xmlStringToXMLNode (Input → $filestream)
1.13 pub.xml:getXMLNodeIterator (Input → criteria = "OrderItem")
1.14 SplitProcess: REPEAT
1.141 pub.xml:getNextXMLNode
1.142 BRANCH on '/next'
1.1421 $null: EXIT "SplitProcess"
1.1422 $default: SEQUENCE
1.14221 pub.xml:xmlNodeToDocument
1.14222 pub.xml:documentToXMLString
1.14223 processOrderItem

When we tested the code with a 25 MB file, processing took about 13 seconds.
However, when we gradually increased the size to 250 MB, processing took about 1 hour 20 minutes and severely degraded server performance.
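For readers outside webMethods, the nodeIterator approach above is the generic streaming-parse pattern: pull one record at a time instead of building the whole tree. Here is a minimal Python sketch of the same idea using xml.etree.ElementTree.iterparse; only the OrderItem element name comes from the flow's iterator criteria, the other element names are assumptions based on the sample:

```python
import xml.etree.ElementTree as ET
from io import BytesIO

# Hypothetical sample mirroring the thread's structure; only the
# OrderItem tag is confirmed by the flow's iterator criteria.
SAMPLE = b"""<Orders>
  <OrderItem><Name>Item 1</Name><Quantity>100</Quantity><Price>100.00</Price></OrderItem>
  <OrderItem><Name>Item 2</Name><Quantity>200</Quantity><Price>200.00</Price></OrderItem>
</Orders>"""

def process_order_items(stream):
    """Process one OrderItem at a time without building the full tree."""
    total = 0.0
    for event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "OrderItem":
            total += float(elem.findtext("Price", "0"))
            # Analogue of movingWindow=true: drop the node once processed
            # so memory stays bounded regardless of file size.
            elem.clear()
    return total

print(process_order_items(BytesIO(SAMPLE)))  # prints 300.0
```

The key point is the clear() call after each record: without it, even an event-driven parse keeps every processed node alive and memory grows with file size.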

Any inputs on the questions below would be a great help:

  1. Does this approach load the entire document into memory?
  2. If yes, what would be an alternate way to process this data?
  3. Similar to 'LargeFileHandling' in EDI, can we process this data in chunks by writing to an alternate hard-disk location?

We referenced the link below while deciding on the approach.
http://www.wmusers.com/ezine/2002dec_pupadhya_1.shtml

Thanks,
Bot

Are you using the moving window property? If not, set movingWindow to "true" in the pub.xml:getXMLNodeIterator service. It will discard old nodes from memory.

Regards,
Saravanan.E

Hi Saravanan,

Thanks for the reply.

Currently, we have the movingWindow property set to true.

Earlier, we processed a 250MB file without setting the property and the processing took about 1hr 20mins.

After setting the property to true, a 125MB file took about 1min 2secs to process.

There is definitely a noticeable difference in performance.

However, we tested it while server memory usage was already around 90%, consumed by other processes.

Will run further tests with larger files and provide an update.

Meanwhile, any inputs on the questions below are highly appreciated:

  1. Does this approach load the entire document into memory?
  2. If yes, what would be an alternate way to process this data?
  3. Similar to 'LargeFileHandling' in EDI, can we process this data in chunks by writing to an alternate hard-disk location?

Regards,
Bot

  1. Does this approach load the entire document into memory?

With movingWindow = true, only the current node is loaded in memory; old nodes are discarded.

  2. If yes, what would be an alternate way to process this data?
  3. Similar to 'LargeFileHandling' in EDI, can we process this data in chunks by writing to an alternate hard-disk location?

Another way is to split the file node by node, publish each node to the Broker, and subscribe to the documents on multiple Integration Servers.
Regards,
Saravanan.E
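Saravanan's Broker fan-out is essentially "split the file into records, then process records in parallel". As a rough sketch of that shape outside webMethods (the record format and the processOrderItem stand-in are assumptions, and a thread pool stands in for Broker publish/subscribe):

```python
from concurrent.futures import ThreadPoolExecutor
import xml.etree.ElementTree as ET
from io import BytesIO

# Hypothetical sample; element names are illustrative.
SAMPLE = b"""<Orders>
  <OrderItem><Name>Item 1</Name><Price>100.00</Price></OrderItem>
  <OrderItem><Name>Item 2</Name><Price>200.00</Price></OrderItem>
  <OrderItem><Name>Item 3</Name><Price>300.00</Price></OrderItem>
</Orders>"""

def process_order_item(xml_fragment):
    """Stand-in for the thread's processOrderItem service."""
    item = ET.fromstring(xml_fragment)
    return item.findtext("Name"), float(item.findtext("Price"))

def split_and_process(stream, workers=4):
    fragments = []
    for event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "OrderItem":
            # Serialize each record (as documentToXMLString does in the flow),
            # then free the node to keep memory bounded.
            fragments.append(ET.tostring(elem))
            elem.clear()
    # Hand the fragments to workers: the publish/subscribe analogue.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_order_item, fragments))

print(split_and_process(BytesIO(SAMPLE)))
```

In a real Broker setup the fragments would be published as documents and consumed by trigger services on multiple Integration Servers; the structure (split, distribute, process independently) is the same.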

We moved our service to WebLogic because webMethods couldn't handle 25 MB XML.

As you can see above, IS can handle large XML documents, if the integration is designed and implemented correctly.

Two key concepts are needed: 1) don't load the entire document into memory; iterate over the nodes instead, and 2) implement a mechanism for processing individual records/documents in parallel.

Generic statements such as “X couldn’t handle Y” are usually incorrect if X is in the hands of a person with the right skills and experience.

Mark

Hi Saravanan,

Thanks for the confirmation!

Regards,
Bot.

Not a true statement. I have seen XML files almost 200 times that size processed in wM with no issues, in under 30 seconds.

As Mark mentioned above, if you have the right person with the skills and experience, they will make it happen.

Cheers,
Akshith

Interesting. Any other solutions?

I have previously implemented large-file XML processing using the open-source StAX API in a Java service (receiver service).

http://docs.oracle.com/javase/tutorial/jaxp/stax/why.html
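StAX is a Java pull-parsing API (the application asks the parser for the next event, rather than the parser pushing callbacks). To keep the examples in this thread in one language, here is a rough Python analogue of the same feed-and-pull idea using xml.etree.ElementTree.XMLPullParser; the sample markup and chunk size are illustrative:

```python
import xml.etree.ElementTree as ET
from io import BytesIO

# Hypothetical sample; element names are illustrative.
SAMPLE = b"""<Orders>
  <OrderItem><Name>Item 1</Name></OrderItem>
  <OrderItem><Name>Item 2</Name></OrderItem>
</Orders>"""

def names_from_stream(stream, chunk_size=64):
    """Feed the parser small chunks and pull completed records,
    the way a StAX cursor advances through a stream."""
    parser = ET.XMLPullParser(events=("end",))
    names = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if elem.tag == "OrderItem":
                names.append(elem.findtext("Name"))
    parser.close()
    return names

print(names_from_stream(BytesIO(SAMPLE)))  # prints ['Item 1', 'Item 2']
```

Because the file is read in fixed-size chunks, memory use depends on the record size, not the total file size, which is the property that matters for 1 GB inputs.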

I did not have a chance to look at Ehcache, but maybe you could use it for large-file processing too.

Cheers,
Akshith

Hi Saravanan.E and Akki,

This wmusers thread helped me create a flow service for large-file handling.
Thanks a lot!

Regards,
Jeevan