How to handle large XML in webMethods 9.9

Hello guys,

I have around 20 MB of XML to process in webMethods 9.9. Please let me know what the best approach is.

I tried the logic below.

  1. xmlStringToXMLNode
  2. getXMLNodeIterator (criteria=)
  3. Repeat (count=not sure what to set, repeat on=success)
    3.1 getNextXMLNode
    3.2 xmlNodeToDocument

I used the above logic, but I don’t know what count I should set on the Repeat step.

Regards,
Nikhil Pardeshi

Explore “Enhanced XML Parsing”

Hello,

I tried that as well, but I want to know what basic configuration should be set on the IS page for Enhanced XML Parsing.

I checked the guide but am still not sure what numbers to put in. I have set the max heap size to 2 GB.
For the time being, I am using the default settings and have implemented the logic below.

  1. xmlStringToEnhancedXMLNode
  2. xmlNodeToDocument

My max heap size is set to 2 GB.

Repeat is generally used when you know how many iterations it should run. Otherwise, use Loop and break out with an Exit step.

Of course, Repeat can work the same way, and I think that is the case where you don’t set any count value.

I previously built a solution that handled 100 MB+ XML, and it is not much different from processing a large flat file as a stream.

HTH.

Thank you, guys. From a code point of view this is great. I would also like to know more from the infrastructure side: would it be better to save the data to local disk on the IS, to some other server, or to a Terracotta server that loads it into memory and reads it from there? What is the best strategy?

Thank you

Hello guys, I have already developed the code using a Repeat step. Here it is for reference.

  1. xmlStringToXMLNode
  2. getXMLNodeIterator (criteria=name of the repeating document)
  3. Repeat (label=largeXML, repeat on=success)
    3.1 getNextXMLNode (output=next)
    3.2 Branch on next
      3.2.1 Exit (label=$null, exit from=largeXML, signal=success)
    3.3 xmlNodeToDocument (write your logic here)
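
For readers who think in code rather than Flow, the same "fetch next, stop when nothing is left" pattern as the Repeat flow above can be sketched in plain Java. This is only an analogy using the standard StAX API (javax.xml.stream), not the webMethods services themselves. The class name, the method name, and the `order` element in the usage note are hypothetical, and the sketch assumes the repeating element contains text only. It takes a String to stay self-contained; for a genuinely large file you would more likely hand the parser an InputStream.

```java
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: visits each occurrence of a repeating element one at a time.
final class RepeatingElementReader {

    // Calls handler with the text of every <elementName> element and returns how many were seen.
    // Assumes the element holds text only (getElementText fails on nested child elements).
    static int forEach(String xml, String elementName, Consumer<String> handler) {
        int count = 0;
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                // Like the Repeat step: keep asking for the next event until none is left.
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && elementName.equals(reader.getLocalName())) {
                        handler.accept(reader.getElementText());
                        count++;
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return count;
    }
}
```

For example, `RepeatingElementReader.forEach(xml, "order", System.out::println)` would print each order's text as it is reached, without building a tree of the whole document.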

From the infrastructure point of view, you can use the tspace properties in the extended settings.
watt.server.tspace.max specifies the maximum number of bytes that can be stored at any one time in the hard
disk drive space that you defined using the watt.server.tspace.location property.

Extended settings in IS page:
watt.server.tspace.location=…/IntegrationServer/Largefiletemplocation
watt.server.tspace.max=52428800

The default value is 52,428,800 bytes (50 MB).
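
To pick a value other than the default, the arithmetic is simple enough to sketch. The Java below is only a rough sizing aid. It assumes each in-flight document needs about its own size in tspace, which is a simplification and not something the guide states, and the class and method names are hypothetical.

```java
// Hypothetical sizing helper for watt.server.tspace.max (values are in bytes).
final class TspaceSizing {

    // Converts megabytes to bytes, e.g. 50 MB -> 52428800.
    static long megabytesToBytes(long megabytes) {
        return megabytes * 1024L * 1024L;
    }

    // Rough number of documents that fit at once, assuming each one
    // occupies about its own size in tspace (an assumption, see above).
    static long documentsThatFit(long tspaceMaxBytes, long documentBytes) {
        return tspaceMaxBytes / documentBytes;
    }
}
```

Under that assumption, the 50 MB default leaves room for two 20 MB documents at a time, so a larger value may be worth considering if several large files can arrive together.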

You can also try the settings below:

set JAVA_MIN_MEM=1024M
set JAVA_MAX_MEM=4096M
set JAVA_MAX_PERM_SIZE=4096M

If a failure occurs in the middle of processing a large file, how would you process the failed chunk? Or would you have to reprocess the entire file from the beginning?

This is entirely up to the solution design. For example, if it is a batch file from an external source and the failure is due to a data integrity issue, you could fail the whole process.

In the same situation, you could instead fail the process partially and respond with the last successful location/index, so that the retry starts from the point of failure.

Generally, you can restart processing from the point of failure if the solution tracks processing status “reliably”, even when the failure is caused by a temporary technical issue.

There is no point in reprocessing the same data over and over if the cause of the failure has not been resolved, e.g. a data issue, capacity issue, or infrastructure issue.
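
The "return the last successful index and retry from there" idea can be sketched in Java as follows. This is a minimal illustration, not a webMethods API: the class and method names are hypothetical, records are plain strings, and the checkpoint lives only in memory. A real solution would have to persist it somewhere reliable.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of restarting a batch from the point of failure.
final class ResumableBatch {

    // Processes records starting at startIndex and returns the index of the first
    // record that was NOT processed: records.size() when everything succeeded,
    // otherwise the index of the record whose handler threw.
    static int processFrom(List<String> records, int startIndex, Consumer<String> handler) {
        int next = startIndex;
        try {
            for (; next < records.size(); next++) {
                handler.accept(records.get(next));
            }
        } catch (RuntimeException e) {
            // Stop at the first failure; 'next' still points at the failed record,
            // so the caller can store it as the checkpoint for a later retry.
        }
        return next;
    }
}
```

A caller would store the returned index and, once the cause of the failure has been fixed, call `processFrom` again with that index instead of starting from zero.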

HTH

Yes, it depends on your design.

If your target is a DB, for example, you can configure your connection for local transactions and explicitly commit or roll back partial transactions.