Large XML file handling in webMethods without TN - URGENT

Hi All,

We have the following requirement: handle large XML files of 40-50 MB.

  1. We receive the large XML document via HTTP POST to webMethods; the size will be 40-50 MB.

  2. We have to convert the incoming XML document to a flat file and write the file to a shared folder.

  3. Send the flat file to the target server through FTP.

I have gone through several posts on wMUsers about using an XML node iterator and getNextNode to process records. But if there are millions of records in the XML document, it will be time consuming to process each XML record,
convert it to flat file, and append the string to the file in the shared folder.

Can we process the records in chunks of 1000 at a time, and if so, how?
OR
Is there any other way to handle large XML files efficiently?

Regards,
Kumar

Yes, there are ways to read big XML files in webMethods.

One way I know of is to get the file's metadata into a document and publish it. On the subscribing side, after receiving this document, use a Java service that reads the file chunk by chunk (chunking controlled in Java by a few variables - file size, maximum number of lines per sub-document), with a loop in the Java to read all the chunks, publishing a document per chunk, say a maximum of 1000 records per doc. Add a couple of string fields to the sub-document to indicate whether it is the first doc or the last doc. After reading each chunk, we have to append some XML tags to make it valid, then convert it to a document and publish it. A rough sketch of this splitting logic is below.

Suppose 5 docs are published: convert them to flat files, give them proper names, and save them in a directory. At the end, append them in the same way, without breaking the sequence, move the result to a different directory, and then send it to the client/customer.
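A minimal sketch of this splitting idea in plain Java follows. All class, tag, and file names here are illustrative, not part of any webMethods API, and it assumes one repeating element per line of the file, which is a simplification (a real splitter would use a streaming parser):

```java
import java.io.*;
import java.util.*;

// Cuts a large XML file into chunks of at most MAX_RECORDS repeating
// elements, re-wraps each chunk in a root element so it stays well-formed,
// and flags the first and last chunks as Bijendra describes.
public class XmlChunker {
    static final int MAX_RECORDS = 1000;

    public static void split(File in, File outDir) throws IOException {
        try (BufferedReader r = new BufferedReader(new FileReader(in))) {
            List<String> records = new ArrayList<>();
            List<String> full = null;   // a completed chunk, held back one step
            int chunkNo = 0;            // so we know at EOF which chunk is last
            String line;
            while ((line = r.readLine()) != null) {
                String t = line.trim();
                if (!t.startsWith("<record")) continue;   // skip envelope lines
                records.add(t);
                if (records.size() == MAX_RECORDS) {
                    if (full != null) {
                        writeChunk(outDir, chunkNo, full, chunkNo == 0, false);
                        chunkNo++;
                    }
                    full = records;
                    records = new ArrayList<>();
                }
            }
            // End of file: flush the held-back chunk, then any partial tail.
            if (full != null) {
                writeChunk(outDir, chunkNo, full, chunkNo == 0, records.isEmpty());
                chunkNo++;
            }
            if (!records.isEmpty()) {
                writeChunk(outDir, chunkNo, records, chunkNo == 0, true);
            }
        }
    }

    static void writeChunk(File dir, int no, List<String> recs,
                           boolean first, boolean last) throws IOException {
        File f = new File(dir, String.format("chunk-%04d.xml", no));
        try (PrintWriter w = new PrintWriter(new FileWriter(f))) {
            // Re-wrap so each chunk is a valid document; the attributes play
            // the role of the "first doc / last doc" strings mentioned above.
            w.println("<records first=\"" + first + "\" last=\"" + last + "\">");
            for (String rec : recs) w.println("  " + rec);
            w.println("</records>");
        }
    }
}
```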

I hope it will help you.

Thanks
Bijendra

Read 1000 via the node iterator approach, then write 1000.

Hi Rob,

Could you please let me know how or where to specify the record count, like 1000, when using the NodeIterator?

I want to get 1000 records from the XML each time.

Sample Code:

  1. pub.xml:xmlStringToXMLNode (input: $fileStream)
  2. pub.xml:getXMLNodeIterator (input: “node” from previous step; “criteria” - the repeating record element)
  3. Repeat (until no more data exists)
    3(a) pub.xml:getNextXMLNode (input: “iterator” from step 2)
    3(b) Map data to flat file
    3(c) pub.flatFile:convertToString
    3(d) writeToFile

Regards,
Kumar


You would not specify a record count for any existing service. You’d call getNextXMLNode 1000 times. Then map, convert to string and write to file. Repeat until done.

Be aware that the returned next/node object will be reused on the next call to getNextXMLNode, so you'll want to call xmlNodeToDocument for each getNextXMLNode. Then, once you have a list of 1000, you can map, convert, and write those 1000 items. A sketch of that loop follows.
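To make the reuse caveat concrete, here is a minimal sketch of that loop as a helper inside an IS Java service. The helper name readBatch and the exception handling are illustrative; pub.xml:getNextXMLNode and pub.xml:xmlNodeToDocument are the built-in services discussed above, invoked via Service.doInvoke:

```java
// Shared/imports section of the Java service would need:
//   com.wm.data.*, com.wm.app.b2b.server.*, com.wm.lang.ns.NSName, java.util.*

public static List<IData> readBatch(Object iterator, int max)
        throws ServiceException {
    List<IData> batch = new ArrayList<IData>();
    try {
        while (batch.size() < max) {
            // Invoke pub.xml:getNextXMLNode with the iterator.
            IData p = IDataFactory.create();
            IDataCursor c = p.getCursor();
            IDataUtil.put(c, "iterator", iterator);
            c.destroy();
            IData out = Service.doInvoke(NSName.create("pub.xml:getNextXMLNode"), p);
            c = out.getCursor();
            IData next = IDataUtil.getIData(c, "next");  // holds "name" and "node"
            c.destroy();
            if (next == null) break;                     // iterator exhausted

            // Convert IMMEDIATELY: the node inside "next" is recycled by the
            // following getNextXMLNode call, so it must not be held across loops.
            IDataCursor nc = next.getCursor();
            Object node = IDataUtil.get(nc, "node");
            nc.destroy();
            IData p2 = IDataFactory.create();
            c = p2.getCursor();
            IDataUtil.put(c, "node", node);
            c.destroy();
            IData out2 = Service.doInvoke(NSName.create("pub.xml:xmlNodeToDocument"), p2);
            c = out2.getCursor();
            batch.add(IDataUtil.getIData(c, "document"));
            c.destroy();
        }
    } catch (Exception e) {
        throw new ServiceException(e.getMessage());
    }
    return batch;
}
```

Each batch of up to 1000 documents can then be mapped, converted with pub.flatFile:convertToString, and appended to the output file before the next batch is read.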


Hi Rob,

Thanks for your reply. Sorry, please ignore the code in my previous message. The new code would be:

  1. pub.xml:xmlStringToXMLNode (input: $fileStream)

  2. pub.xml:getXMLNodeIterator (input: “node” from previous step; “criteria” - the repeating record element)

  3. Repeat (until no more data exists)
    3.1 Repeat (RepeatCount = 1000)
    3.1.1 pub.xml:getNextXMLNode (input: “iterator” from step 2)
    3.1.2 Map data to tempFlatFileDoc
    3.1.3 pub.list:appendToDocumentList - append until 1000 records are collected

    3.2 pub.flatFile:convertToString
    3.3 writeToFile - append the string each time we get 1000 records (see the writeToFile sketch below)
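Note that writeToFile in step 3.3 is not a built-in service. A minimal custom Java service for it might look like the sketch below (the pipeline variable names fileName and data are illustrative):

```java
// Shared/imports: com.wm.data.*, com.wm.app.b2b.server.ServiceException, java.io.*

public static final void writeToFile(IData pipeline) throws ServiceException {
    IDataCursor c = pipeline.getCursor();
    String fileName = IDataUtil.getString(c, "fileName");
    String data     = IDataUtil.getString(c, "data");
    c.destroy();

    // Open in append mode so each 1000-record batch is added to the end of
    // the flat file instead of overwriting the batches already written.
    try (FileWriter w = new FileWriter(fileName, true)) {
        w.write(data);
    } catch (IOException e) {
        throw new ServiceException(e.getMessage());
    }
}
```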


I am looking for a better approach than this. I think I would end up creating a Java service to get 1000 records at a time until no data remains. Correct me, or let me know if there is a better way of doing it.

Thanks in advance…

Regards,
Kumar

That is the basic approach (though you want to be careful about assuming the file will always have more than 1000 records, and about handling the “tail” of the file, which will not always end with exactly 1000 records). What are your specific objections to the approach, such that you're looking for something “better”? In what ways would Java be better?

You could use something other than appendToDocumentList to accumulate the list. You could have your own service that uses a Collection class underneath, for example.

Hi Rob,

Thanks for your quick reply. If there are millions of records in the input XML, then we would end up calling the three steps below millions of times, which takes a performance hit.

3.1.1 pub.xml:getNextXMLNode (input: “iterator” from step 2)
3.1.2 Map data to tempFlatFileDoc
3.1.3 pub.list:appendToDocumentList - append until 1000 records are collected

So instead of calling the above three services that many times, I am looking for some Java code where I can extract records by start index and end index, retrieving the records in chunks. Correct me if I am wrong.
Let me know if there is a better approach.

Thanks in advance.

Regards,
Kumar

Hi Rob,

Is there any better approach that you can think of?

Regards,
Kumar

IMO, you're optimizing prematurely. You do not yet know whether this will be too slow, nor exactly what the performance hit might be. Give the steps a try. You may be surprised to find it is fast enough.

If I were to change anything about the steps, I might use something other than appendToDocumentList. I'd replace it with my own service that adds each record to a Collection of some sort, perhaps a LinkedList, then have another service that converts that collection to the form needed to write to the file. A sketch of that pair of services is below.
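As a sketch of that idea (the service names addToBatch and batchToList and the pipeline variable names are illustrative, not built-ins), the pair of Java services might look like this. The win over pub.list:appendToDocumentList is that the built-in allocates a new array on every call, while adding to a LinkedList is constant-time per record:

```java
// Shared/imports: com.wm.data.*, com.wm.app.b2b.server.ServiceException, java.util.*

// addToBatch: appends one document to a LinkedList carried in the pipeline.
public static final void addToBatch(IData pipeline) throws ServiceException {
    IDataCursor c = pipeline.getCursor();
    LinkedList<IData> batch = (LinkedList<IData>) IDataUtil.get(c, "batch");
    if (batch == null) batch = new LinkedList<IData>();
    batch.add(IDataUtil.getIData(c, "item"));
    IDataUtil.put(c, "batch", batch);
    c.destroy();
}

// batchToList: converts the accumulated collection into an IData[] that can
// be mapped into the ffValues input of pub.flatFile:convertToString, then
// clears the batch for the next chunk of 1000.
public static final void batchToList(IData pipeline) throws ServiceException {
    IDataCursor c = pipeline.getCursor();
    LinkedList<IData> batch = (LinkedList<IData>) IDataUtil.get(c, "batch");
    IData[] documentList = (batch == null)
            ? new IData[0]
            : batch.toArray(new IData[batch.size()]);
    IDataUtil.put(c, "documentList", documentList);
    if (c.first("batch")) c.delete();   // drop the collection from the pipeline
    c.destroy();
}
```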

Use of an index implies that all lines are in memory in some way.

The node iterator is intended to be fast and memory efficient. Give it a chance.

Thanks a lot… I will definitely try it.

Regards,
Kumar

I have to handle a large XML file. How do I process it using getXMLNodeIterator and getNextXMLNode?

Did you check the Built-In Services reference documentation for these services? It should give you usage guidance on the services that process large XML payloads node by node.

Also, please use the search in this section; it will give you more results from what was already discussed in the wmusers.com legacy forum.

HTH,
RMG

Here you go - sample code for processing large XML on IS:

http://techcommunity.softwareag.com/ecosystem/communities/codesamples/webmethods/esb/SAMPLE-20120803202914137.html

-Senthil

Hi Pradeep, have you created a Java service for this scenario, where more than 100,000 (1 lakh) records exist? I am trying to do the same but have not been able to. Please let me know your email ID.