Efficient/fast handling of large XML files

Sebastian_Hockmann · March 14, 2016, 4:33pm

Hello.

I have large XML files (approx. 20-30 MB). I use the service “joAppEDI.services.Inbound.Bundesanzeiger.SPL:XmlSamples_v5”
in the attached package.

I want to process the attached XML file. The password for “huge.zip” is 1234.

The current runtime is not satisfactory, and the service causes very high memory consumption and CPU utilization.

My implementation:

getFile (as stream)
xmlStringToXMLNode
getXMLNodeIterator
getNextXMLNode
queryXMLNode
xmlNodeToDocument

Is there a better approach?

Thanks a lot for your ideas, best practices, and hints.

BR,
Sebastian

P.S. At the moment I can’t upload the service. If needed, I can send it, e.g. by mail.
huge.zip (811 KB)

rmg · March 14, 2016, 4:38pm

Sebastian,

I believe you are on the right track for handling large XML streams with the flow steps you mentioned.

Processing large XML data also depends on settings such as your IS/JVM heap configuration, which could be slowing down the runtime you are seeing.

Is your test environment configured to handle these large XML payloads and thresholds?

HTH,
RMG

Mahesh_K_Sreenivas · March 14, 2016, 5:00pm

Read the chapter “Configuring the Enhanced XML Parser” in the IS Administration Guide; it will help you handle large XML files.

Sebastian_Hockmann · March 15, 2016, 4:11pm

Hi.

But I do not own a Terracotta license. Can I still use the advantages of the enhanced XML parser?

When I try to activate only the cache (no BigMemory) under IS → Enhanced XML Parsing, I get
the message “License key file D:\SoftwareAG\webMethods96\common\conf\terracotta-license.key does not exist or cannot be read”

See screenshot attached.

BR,
Sebastian

Sebastian_Hockmann · March 15, 2016, 4:43pm

Hello.

Currently the total runtime for processing a 36 MB XML document is about 30 minutes. Is this normal?

Are there any dos and don’ts or recommendations for XML processing?

What can I do on the IS side to improve the “parsing power”? For example, at the moment it seems as if IS
utilizes only one CPU while the second one “sleeps”.

Thanks for your hints.
Sebastian

Sebastian_Hockmann · March 15, 2016, 5:44pm

Hello.

One more question.

I use appendToDocumentList for the individual XML segments to build the new XML document from the source XML.

So I guess the resulting XML document is held completely in memory.

Would it be more efficient or faster to write each new XML document segment to disk, so that the new XML file is built up step by step?

Once it is complete, I could use loadXml and send it to my web service. I am thinking about using some kind of StreamWriter.

Does this sound OK to you, or am I on the wrong track?
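
The idea of building the output step by step, rather than collecting everything in a document list, could be sketched roughly as follows in plain Java with the standard StAX API (javax.xml.stream). This is only an illustration and not an IS built-in service: the class name IncrementalXmlOutput and the method writeEntities are hypothetical, and the element layout is heavily simplified compared to the real SPL file.

```java
import java.io.Writer;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper: writes each segment to the target as soon as it is
// known, so the complete output never has to exist as one in-memory list.
final class IncrementalXmlOutput {

    // Writes one simplified <SLL_SPL_ENTITY> per reference id to 'out'.
    // 'out' can be a FileWriter, a BufferedWriter, or any other Writer.
    static void writeEntities(Writer out, Iterable<String> refIds) {
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartDocument();
            w.writeStartElement("SLL_SPL_DATA_FILE"); // namespace prefix omitted in this sketch
            for (String id : refIds) {
                w.writeStartElement("SLL_SPL_ENTITY");
                w.writeStartElement("SLL_SPL_DATA_REF_ID");
                w.writeCharacters(id);
                w.writeEndElement(); // closes SLL_SPL_DATA_REF_ID
                w.writeEndElement(); // closes SLL_SPL_ENTITY
                w.flush();           // push the finished segment to the target
            }
            w.writeEndElement();     // closes SLL_SPL_DATA_FILE
            w.writeEndDocument();
            w.close();               // does not close the underlying Writer
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write XML", e);
        }
    }

    public static void main(String[] args) {
        java.io.StringWriter target = new java.io.StringWriter();
        writeEntities(target, java.util.List.of("102280", "102290"));
        System.out.println(target);
    }
}
```

Whether this is actually faster inside IS than appendToDocumentList would still have to be measured; the sketch only shows that memory use can stay independent of the number of segments.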

BR,
Sebastian

Holger_von_Thomsen · March 16, 2016, 1:15pm

Hi Sebastian,

can you explain your business case in a bit more detail, please?

Using appendToDocumentList sounds a bit strange here.

Regards,
Holger

Sebastian_Hockmann · March 16, 2016, 1:42pm

Hello Holger.

Sure.

We are using SAP GTS. We must upload XML Sanctioned Party Lists (SPL), supplied by a data provider, into the system.
We would like to use a standard web service from SAP.

The main problem is that the SPL file looks like this:

<n0:SLL_SPL_DATA_FILE xmlns:n0=“SAP Help Portal”>
<SLL_SPL_CONTROL>
<SLL_SPL_DATA_PROVIDER_ID>340888228</SLL_SPL_DATA_PROVIDER_ID>
<SLL_SPL_FILE_ID>SLS_200019</SLL_SPL_FILE_ID>
<SLL_SPL_VRSIO>US-00831</SLL_SPL_VRSIO>
<SLL_SPL_BASE_URL>http://www.awr-portal.de/SubBoy/pdf.jsp?site=</SLL_SPL_BASE_URL>
</SLL_SPL_CONTROL>
<SLL_SPL_ENTITY>
<SLL_SPL_HEADER>
<SLL_SPL_DATA_REF_ID>102280</SLL_SPL_DATA_REF_ID>
<SLL_SPL_LIST_TYPE>DPLUS</SLL_SPL_LIST_TYPE>
<SLL_SPL_AUTHORITY></SLL_SPL_AUTHORITY>
<SLL_SPL_LAW1>USDEB</SLL_SPL_LAW1>
<SLL_SPL_LAW2></SLL_SPL_LAW2>
<SLL_SPL_LAW3></SLL_SPL_LAW3>
<SLL_SPL_ENTRY_DATE>2013-02-12</SLL_SPL_ENTRY_DATE>
<SLL_SPL_VALID_FROM>1994-02-07</SLL_SPL_VALID_FROM>
<SLL_SPL_VALID_TO></SLL_SPL_VALID_TO>
<SLL_SPL_COMMENT1>(59 Federal Register 5664, February 7, 1994)</SLL_SPL_COMMENT1>
<SLL_SPL_COMMENT2></SLL_SPL_COMMENT2>
<SLL_SPL_COMMENT3></SLL_SPL_COMMENT3>
<SLL_SPL_URL>USDEB</SLL_SPL_URL>
<SLL_SPL_ENTITY_TYPE>ORGANIZATION</SLL_SPL_ENTITY_TYPE>
<SLL_SPL_GROUP>DEB</SLL_SPL_GROUP>
</SLL_SPL_HEADER>
<SLL_SPL_NAME>
<SLL_SPL_DATA_REF_ID>102280</SLL_SPL_DATA_REF_ID>
<SLL_SPL_REF_TYPE>AKA</SLL_SPL_REF_TYPE>
<SLL_SPL_NAME1>Aero Systems, Inc.</SLL_SPL_NAME1>
<SLL_SPL_NAME2></SLL_SPL_NAME2>
<SLL_SPL_NAME3></SLL_SPL_NAME3>
<SLL_SPL_NAME4></SLL_SPL_NAME4>
<SLL_SPL_NAME_CO></SLL_SPL_NAME_CO>
<SLL_SPL_EXTERNAL_ID></SLL_SPL_EXTERNAL_ID>
<SLL_SPL_PASSPORT_COUNTRY></SLL_SPL_PASSPORT_COUNTRY>
<SLL_SPL_IDENTIFICATION_NUMBER>12345</SLL_SPL_IDENTIFICATION_NUMBER>
<SLL_SPL_PASSPORT_NUMBER>990142545</SLL_SPL_PASSPORT_NUMBER>
</SLL_SPL_NAME>
<SLL_SPL_ADDRESS>
<SLL_SPL_DATA_REF_ID>102280</SLL_SPL_DATA_REF_ID>
<SLL_SPL_COUNTRY>XX</SLL_SPL_COUNTRY>
</SLL_SPL_ADDRESS>
</SLL_SPL_ENTITY>
<SLL_SPL_ENTITY>
<SLL_SPL_HEADER>
<SLL_SPL_DATA_REF_ID>102290</SLL_SPL_DATA_REF_ID>
<SLL_SPL_LIST_TYPE>DPLUS</SLL_SPL_LIST_TYPE>
<SLL_SPL_AUTHORITY></SLL_SPL_AUTHORITY>
<SLL_SPL_LAW1>USDEB</SLL_SPL_LAW1>
<SLL_SPL_LAW2></SLL_SPL_LAW2>
<SLL_SPL_LAW3></SLL_SPL_LAW3>
<SLL_SPL_ENTRY_DATE>2013-02-12</SLL_SPL_ENTRY_DATE>
<SLL_SPL_VALID_FROM>2009-08-25</SLL_SPL_VALID_FROM>
<SLL_SPL_VALID_TO></SLL_SPL_VALID_TO>
<SLL_SPL_COMMENT1>(74 Federal Register 42949, August 25, 2009)</SLL_SPL_COMMENT1>
<SLL_SPL_COMMENT2></SLL_SPL_COMMENT2>
<SLL_SPL_COMMENT3></SLL_SPL_COMMENT3>
<SLL_SPL_URL>USDEB</SLL_SPL_URL>
<SLL_SPL_ENTITY_TYPE>PERSON</SLL_SPL_ENTITY_TYPE>
<SLL_SPL_GROUP>DEB</SLL_SPL_GROUP>
</SLL_SPL_HEADER>
<SLL_SPL_NAME>
<SLL_SPL_DATA_REF_ID>102290</SLL_SPL_DATA_REF_ID>
<SLL_SPL_REF_TYPE>AKA</SLL_SPL_REF_TYPE>
<SLL_SPL_NAME1>Aguilar-Medina, Guillermo</SLL_SPL_NAME1>
<SLL_SPL_NAME2></SLL_SPL_NAME2>
<SLL_SPL_NAME3></SLL_SPL_NAME3>
<SLL_SPL_NAME4></SLL_SPL_NAME4>
<SLL_SPL_NAME_CO></SLL_SPL_NAME_CO>
<SLL_SPL_EXTERNAL_ID></SLL_SPL_EXTERNAL_ID>
<SLL_SPL_PASSPORT_COUNTRY></SLL_SPL_PASSPORT_COUNTRY>
<SLL_SPL_IDENTIFICATION_NUMBER></SLL_SPL_IDENTIFICATION_NUMBER>
<SLL_SPL_PASSPORT_NUMBER></SLL_SPL_PASSPORT_NUMBER>
</SLL_SPL_NAME>
<SLL_SPL_ADDRESS>
<SLL_SPL_DATA_REF_ID>102290</SLL_SPL_DATA_REF_ID>
<SLL_SPL_COUNTRY>XX</SLL_SPL_COUNTRY>
</SLL_SPL_ADDRESS>
<SLL_SPL_DATE_OF_BIRTH>
<SLL_SPL_DOB_FROM>1981-11-01</SLL_SPL_DOB_FROM>
<SLL_SPL_DOB_TO>1981-11-30</SLL_SPL_DOB_TO>
<SLL_SPL_DOB_ISSUING_CITY></SLL_SPL_DOB_ISSUING_CITY>
<SLL_SPL_DOB_ISSUING_REGION></SLL_SPL_DOB_ISSUING_REGION>
<SLL_SPL_DOB_ISSUING_COUNTRY></SLL_SPL_DOB_ISSUING_COUNTRY>
</SLL_SPL_DATE_OF_BIRTH>
</SLL_SPL_ENTITY>
<SLL_SPL_ENTITY>
…
</SLL_SPL_ENTITY>
…

The web service expects the same structure, except that it needs the additional element CONTROLLER
(see screenshot).
Also, all empty fields have to be set to zero, but the pipeline “forgets” these empty elements.

So I have to go through the XML document, set the CONTROLLER element, and run some validations on the fields.
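
As a rough illustration of keeping empty elements instead of losing them, the following plain-Java StAX sketch reads the leaf elements of a fragment and substitutes a placeholder for every empty one. It is not taken from the service discussed here: the class EmptyFieldDefaults, the method leafValues, and the placeholder value are all assumptions, and what “zero” has to look like for each field depends on the WSDL.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: collects leaf elements in document order and keeps
// the empty ones by giving them a placeholder value.
final class EmptyFieldDefaults {

    static Map<String, String> leafValues(String xml, String placeholder) {
        Map<String, String> values = new LinkedHashMap<>();
        StringBuilder text = new StringBuilder();
        boolean leaf = false; // true while no child element has been seen
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    text.setLength(0); // a new element starts: forget earlier text
                    leaf = true;
                } else if (event == XMLStreamConstants.CHARACTERS) {
                    text.append(r.getText());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (leaf) {
                        String value = text.toString().trim();
                        // a repeated element name overwrites the earlier entry
                        values.put(r.getLocalName(), value.isEmpty() ? placeholder : value);
                    }
                    leaf = false; // the enclosing element has children
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        return values;
    }

    public static void main(String[] args) {
        String header = "<SLL_SPL_HEADER><SLL_SPL_LAW1>USDEB</SLL_SPL_LAW1>"
                + "<SLL_SPL_LAW2></SLL_SPL_LAW2></SLL_SPL_HEADER>";
        System.out.println(leafValues(header, "0"));
    }
}
```

Because the map is keyed by element name, the sketch fits a single header or name block; a whole entity with repeated names would need a different structure.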

Yesterday I replaced appendToDocumentList with addList from PSUtilities, but there was no real performance boost.

So I am still searching for what makes my current implementation so slow.

For a 19 MB XML file the service runs for 10 minutes. If I execute savePipelineToFile at the end of the service, the resulting file is about 700 MB.

Any suggestions?

I can send my current implementation if you want.

Thanks and best regards,
Sebastian

rmg · March 16, 2016, 4:48pm

Yes, I believe appendToDocumentList takes a lot of memory because it temporarily holds both the original list and the new list. Writing the XML to disk or using a custom Java stream writer could be faster, but it all depends on how IS handles reading and writing such large XML payloads.

Let us see what other expert users have to say on this particular issue.

HTH,
RMG

Holger_von_Thomsen · March 16, 2016, 6:26pm

Hi Sebastian,

I hope you have created a WSD (type Consumer) for the SAP web service WSDL.

Then you will find some document types in it with the required structures.
There you can also check whether the candidates for “empty” fields must exist and whether they can be set to null (xsi:nil).

Once you have read in the payload as an XML string, convert it to the matching document type reference using the built-in services under pub.xml:*.

Map this document to the input document type of the web service and add the missing structures.
You can use pub.schema:validate to check whether the document matches the document type.

Then call the web service with this input document.

This approach should be quite fast, except for reading in the payload, which depends on the size of the file.

See the IS Built-In Services guide for information about the mentioned services.

Can you share a screenshot of your flow?
We will then try to find out where the service spends its time.

Regards,
Holger

rmg · March 16, 2016, 7:24pm

I was under the assumption that the inbound service receives the XML payload and parses it for mapping to a different target document structure. If that is not correct, Holger’s advice of invoking the SAP WSDL and passing the XML data to the WS endpoint will do the trick.

HTH,
RMG

Sebastian_Hockmann · March 18, 2016, 9:48am

Hello.

I found my issue: I had used three XML iterators for the same file.

First I used an iterator to select the first segment, then the last one.
After that I read all the other segments (approx. 13,000).

I changed this. Now I am using one iterator with selection criteria for
the three segments I need.

Before, the runtime was approx. 30 minutes for a 36 MB XML file. After the change,
the runtime is 45 seconds. That is a very big performance boost.
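
The single-pass idea can be sketched outside IS with the standard Java StAX API: one reader walks the document once and reacts to every element name it has been asked for. This is not how the node iterator is implemented; the class SinglePassScan and the method countSegments are hypothetical names for illustration. One plausible factor in the original slowdown, not confirmed in this thread, is that each separate iterator has to advance through the document on its own, so the file was scanned several times.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: handles all wanted segment types in one pass.
final class SinglePassScan {

    // Walks the document once and counts every start tag whose local name
    // is contained in 'wanted'. Real code would process the segment here
    // instead of only counting it.
    static Map<String, Integer> countSegments(String xml, Set<String> wanted) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (wanted.contains(name)) {
                        counts.merge(name, 1, Integer::sum);
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        return counts;
    }

    public static void main(String[] args) {
        String xml = "<root><SLL_SPL_CONTROL/><SLL_SPL_ENTITY/><SLL_SPL_ENTITY/></root>";
        System.out.println(countSegments(xml, Set.of("SLL_SPL_CONTROL", "SLL_SPL_ENTITY")));
    }
}
```

For a real file, a reader created from an InputStream would replace the StringReader so that the document is never loaded as one string.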

However, I cannot explain technically why this happened.

So only the problem with the empty WSDL date fields remains.

In Designer I could not change the date fields to not required or optional.
Even if I set them to 0000-00-00, the web service is not called because the
validation fails.

So what is the trick to allow empty date fields? I have attached the WSDL file.

Thanks a lot for your great support and help so far.

Sebastian
sanctionedpartylistupdatein_binding.xml (19.3 KB)

Holger_von_Thomsen · March 18, 2016, 12:52pm

Hi Sebastian,

most likely you will have to get this changed in SAP, then re-export the WSDL from there and re-import in webMethods.

If the field is really required, it should be allowed to be “null” (empty).

Regards,
Holger
