PARSE XML with large XML files

Stefan3 · May 6, 2014, 2:24pm

Hello,

Is it possible to parse large XML files line by line with Natural, similar to a SAX parser in Java? The only approach I have found (in the documentation and in this forum) is to read the whole (!) XML file into a dynamic variable and call PARSE XML on it, which looks like a DOM parser to me. This works fine for small XML files, but for larger files (the magic file size in our environment seems to be around 30 MB) I get:

NAT1222 Memory required for statement execution not available.

The code I use:

DEFINE WORK FILE 1 #FILENAME TYPE 'UNFORMATTED'
READ WORK FILE 1 #XML
END-WORK
*
PARSE XML #XML INTO PATH #XML-PATH NAME #XML-NAME VALUE #XML-VALUE
  INPUT (AD=IO) #XML-PATH (AL=70) #XML-NAME (AL=70) #XML-VALUE (AL=70)
END-PARSE

I also tried reading the work file line by line and calling PARSE XML for each line, but then I get the following error, probably because a single line is not a well-formed XML document:

0350 NAT8311 Error parsing XML document
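Any conforming XML parser reacts the same way to most single lines: a line such as `<records>` opens an element that is never closed, so it is not a well-formed document on its own. The small Java check below (JDK DOM parser, not Natural) illustrates this; the sample lines and the tag names `records`, `record` and `id` are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class FragmentCheck {
    /** True if the text is a complete, well-formed XML document on its own. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // unclosed root element, stray end tag, etc.
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical lines of a pretty-printed file, each fed to the parser separately
        String[] lines = {"<records>", "<record><id>1</id></record>", "</records>"};
        for (String line : lines) {
            System.out.println(isWellFormed(line) + "\t" + line);
        }
    }
}
```

Run as-is, it prints `false`, `true`, `false`: only the line that happens to contain one complete element parses.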

Could anyone tell me if it is possible to parse large XML files with Natural?

Best regards,
Stefan

Ralph_Zbrog · May 7, 2014, 7:16am

I was able to parse a file of almost 45 MB by increasing the Work Area Size from the default 20 MB to 200 MB. I’m using Natural for Windows 8.3.1.

Natural Configuration Utility → Natural Parameter Files → NATPARM → Natural Execution Configuration → Buffer Sizes → Work Area Size (USIZE)

Finn_the_Dane1 · May 7, 2014, 9:36am

The Danish company register has just switched its output format to XML, delivered as ONE big document of 1.3 GB!!!

Most environments don’t support this size, so as a workaround I helped a customer create a “front-end” that breaks the XML into the actual “records” and then passes them to the parser.

More precisely: read a suitable chunk of data into a buffer until it contains the relevant XML elements, pass that section to the parser and delete it from the buffer, then refill the buffer with the next record.
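The buffer-and-refill loop described above can be sketched in Java as follows. Finn did this in Natural; this is only a language-neutral illustration, not his code. It assumes a hypothetical repeating element `record` whose start tag carries no attributes, which is never nested inside itself and never appears inside comments or CDATA. Each string handed to the callback is a complete document that an ordinary parser accepts.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class RecordSplitter {
    /**
     * Reads the input in chunks of chunkSize characters and passes every complete
     * <tag>...</tag> section to onRecord, deleting it from the buffer afterwards.
     * Assumptions: no attributes on the start tag, no nested records.
     */
    static void split(Reader in, String tag, int chunkSize, Consumer<String> onRecord) {
        String startTag = "<" + tag + ">";
        String endTag = "</" + tag + ">";
        StringBuilder buffer = new StringBuilder();
        char[] chunk = new char[chunkSize];
        try {
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.append(chunk, 0, n);                       // refill the buffer
                while (true) {
                    int start = buffer.indexOf(startTag);
                    if (start < 0) {
                        // Keep just enough tail to complete a start tag split across chunks
                        int keep = Math.min(buffer.length(), startTag.length() - 1);
                        buffer.delete(0, buffer.length() - keep);
                        break;
                    }
                    int end = buffer.indexOf(endTag, start);
                    if (end < 0) {
                        buffer.delete(0, start);                  // record incomplete: read more
                        break;
                    }
                    int stop = end + endTag.length();
                    onRecord.accept(buffer.substring(start, stop)); // one well-formed record
                    buffer.delete(0, stop);                       // remove it from the buffer
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical document layout; the real tag names are not given in the thread
        String xml = "<records><header>x</header>"
                + "<record><id>1</id></record>"
                + "<record><id>2</id></record></records>";
        List<String> records = new ArrayList<>();
        split(new StringReader(xml), "record", 8, records::add); // tiny chunks on purpose
        records.forEach(System.out::println);
    }
}
```

For a real file you would pass something like `Files.newBufferedReader(Path.of(...))` and a chunk size of a few kilobytes; memory use is then bounded by roughly one chunk plus the largest single record, not by the file size.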

Finn

Stefan3 · May 7, 2014, 3:03pm

Hi Finn,

Is this implemented in Natural or externally? How can I read part of an XML file with Natural and provide a valid XML document to the parser? If I stop reading, e.g. after a certain number of lines, the XML document is incomplete and Natural’s parser will not process it.

Best regards,
Stefan

Finn_the_Dane1 · May 7, 2014, 3:17pm

The structure of the document is something like this:

…

…

…

…

So I keep reading and filling the buffer until I have located both the start tag and the end tag of a record, and then copy that section of the buffer to a dynamic string that I pass to the parser.
Of course, you have to generate a parser subprogram from a schema that contains only the section you want to parse.

It is all simple string handling, and all done in Natural.
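The second half of this step, handing one extracted record to a parser, can be illustrated in Java with an ordinary DOM parser: because each record is small, the in-memory tree stays small no matter how big the whole file is. This is a sketch, not Finn's generated Natural subprogram; the tag names are hypothetical, and the `path=value` output is an illustrative format only loosely resembling the PATH/VALUE pairs PARSE XML returns.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RecordFields {
    /** Parses one small, well-formed record and returns "path=value" for every leaf element. */
    static List<String> leafValues(String recordXml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(recordXml)))
                    .getDocumentElement();
            List<String> out = new ArrayList<>();
            walk(root, root.getTagName(), out);
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("record is not well-formed XML", e);
        }
    }

    /** Depth-first walk; whitespace text between elements is ignored. */
    private static void walk(Element el, String path, List<String> out) {
        boolean hasChildElements = false;
        NodeList children = el.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElements = true;
                Element c = (Element) child;
                walk(c, path + "/" + c.getTagName(), out);
            }
        }
        if (!hasChildElements) {
            out.add(path + "=" + el.getTextContent().trim());
        }
    }

    public static void main(String[] args) {
        // Hypothetical record, as produced by the buffer step
        leafValues("<record><id>1</id><name>Foo</name></record>").forEach(System.out::println);
    }
}
```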

Stefan3 · May 8, 2014, 8:45am

Hello,

It seems to me that the main problem is the lack of a SAX-like XML parser in Natural. Both solutions (increasing the Work Area Size and writing your own “parser”) only work around the limitations of the current implementation of XML handling in Natural (DOM).

We use Natural for batch processing, which involves large work files simply because of the large number of records processed. How can it be that Natural only provides a DOM parser and not a SAX parser? I think the latter would be far more useful in a system like Adabas/Natural.
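For comparison, this is roughly what the SAX model looks like in Java, using the JDK's `javax.xml.parsers.SAXParser`: the parser reads from an `InputStream` and calls back into a handler for each start tag, text run and end tag, so memory use does not grow with the size of the file. It is a minimal sketch; the tag names `record` and `id` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxRecordStream {
    /**
     * Streams the XML from the input, counts recordTag elements and passes the text of
     * every fieldTag element to onField. Only the current field's text is held in memory.
     */
    static int stream(InputStream in, String recordTag, String fieldTag, Consumer<String> onField) {
        int[] records = {0};
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private boolean inField = false;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals(recordTag)) records[0]++;
                if (qName.equals(fieldTag)) {
                    inField = true;
                    text.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inField) text.append(ch, start, length); // may be called several times
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals(fieldTag)) {
                    inField = false;
                    onField.accept(text.toString());
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
        return records[0];
    }

    public static void main(String[] args) {
        // Hypothetical sample; a real run would pass Files.newInputStream(...) instead
        String xml = "<records><record><id>1</id></record><record><id>2</id></record></records>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        int n = stream(in, "record", "id", id -> System.out.println("id " + id));
        System.out.println(n + " records");
    }
}
```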

However, as we probably won’t be able to solve this problem ourselves, I think I’ll try to split the XML file up into smaller parts (externally) and then have Natural process them one by one.

Best regards,
Stefan

Finn_the_Dane1 · May 8, 2014, 8:52am

Hi Stefan
To the best of my knowledge, the Natural parser IS in fact a SAX parser!
The problem is that its only input is a dynamic string, and there is a practical limit to the length of that string.

Perhaps someone could come up with a variant that takes a work file as input?!

Finn
BTW, string handling in Natural is not that tricky, so why split the process in two?
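A parser that "takes a work file as input" is essentially a streaming pull parser. As an illustration only (Java's StAX API, not a Natural feature), the loop below pulls events from an `InputStream` one at a time, so memory depends roughly on nesting depth and the longest text node rather than on file size. As in the PARSE XML ... END-PARSE loop, the caller's code runs once per parsed item; the `path=text` format is chosen here for illustration and is not Natural's PATH notation, and the tag names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class PullParse {
    /** Pulls events from the stream one at a time and reports "path=text" for each text node. */
    static void forEachValue(InputStream in, Consumer<String> onValue) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE); // one event per text run
        Deque<String> path = new ArrayDeque<>(); // current element path, root first
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(in);
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    path.addLast(reader.getLocalName());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    path.removeLast();
                } else if (event == XMLStreamConstants.CHARACTERS) {
                    String text = reader.getText().trim(); // skip indentation whitespace
                    if (!text.isEmpty()) onValue.accept(String.join("/", path) + "=" + text);
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample; for a file, pass Files.newInputStream(Path.of(...)) instead
        String xml = "<records>\n <record><id>1</id><name>A</name></record>\n</records>";
        forEachValue(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                System.out::println);
    }
}
```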
