Large flat files need to be converted to XML

Hi,

I need your advice on the requirement below. Please help.

I have a requirement wherein I need to transform two huge flat files from two different sources into a single target XML file. These two source files arrive at different times of day. At night, we need to do a small transformation and then merge both files into a single XML file.
The target XML has a header node, the actual data nodes from the two files, and a trailer node. The header has some generic info and the trailer has the record count.
My question is: can I form the target XML by merging 3 different XMLs, i.e.
fdgfdgfdg




efddgdg
Will this XML be valid XML?
Since the incoming flat files are 20 MB in size, I can't keep two files of this size in memory and then do the documentToXML service. Please advise.

Yes, you could do that. No, it wouldn’t be valid XML nor is it well-formed.

Valid XML requires that it comply with a particular schema/DTD. To be valid, the XML must also be well-formed.

Well-formed requires that every start tag have an end tag. It also requires that the document have only one root element (yours has 4).

BUT

It might be acceptable XML, depending on the target of the XML. IS, for example, is perfectly happy to parse XML with multiple root elements.
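The well-formedness point above is easy to demonstrate with a standard Java DOM parser (the class and method names here are mine, a minimal sketch): a string with one root element parses fine, while sibling root elements are rejected as a fatal error.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

public class WellFormedCheck {

    // Returns true if the string parses as a well-formed XML document.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder db =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            db.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (Exception e) {
            // Multiple root elements raise a fatal SAXParseException.
            return false;
        }
    }

    public static void main(String[] args) {
        // One root element wrapping everything: well-formed.
        System.out.println(isWellFormed("<root><a/><b/></root>")); // true
        // Two sibling roots, as in the question: not well-formed.
        System.out.println(isWellFormed("<a/><b/>"));              // false
    }
}
```

A strict parser on the receiving side will reject the multi-root form, which is why wrapping everything in a single root (or treating the output as plain concatenated text, as IS tolerates) matters.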

XML constructs aside, you can achieve what you want by appending files together to create your final XML document, assuming the files contain the XML you want.

  1. Open output stream/file.
  2. Write the header.
  3. Open the first data file. Read a chunk. Write it to the output file. Repeat until EOF.
  4. Do the same with the second file.
  5. Write the trailer.
  6. Close the stream/file.
  7. Send the file wherever it needs to go.

Thanks for the reply!

Sorry, I didn't mention that there is a root node. Here is how the target XML looks:

...few child nodes here...data here is independent of the incoming files... ...few child nodes here...with one total-transaction node

Now, I am planning to write a service which will create a target file and write the XML lines below first…


…few child nodes here…

Then I shall read the first file, extract the data, and append the nodes into the target file. I have set 'header=false' in the documentToXMLString service…

I shall repeat the same for the second file as well…

Then at the end I shall append the trailer node… and the closing root node.

Will this work?

I am reading the flat file (20 MB) as a stream (getFile) and mapping the stream to convertToValues (I have designed the schema to extract only the required XML data embedded in the flat file).
After extracting the XML data, I am closing the stream using the pub.io:close service.
I am then parsing this XML into the target format.
Please let me know whether this will work and whether it will be able to hold the max load, which is supposed to be 26 MB per file.

regards,
Khan

This is getting into details that probably won’t be adequately addressed in a forum such as this, but I think the answer is maybe.

Using convertToValues and documentToXMLString will likely lead you to loading the entire file into memory, which you may want to avoid depending upon your system configuration, the load on the server, etc.

Refer to the documentation about large document handling. Effectively, you’ll want to convert the flat file to XML in parts, not all at once.
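The "in parts" principle looks like this in plain Java (a sketch only; the class, the line-per-record layout, and the `toXml` callback are my assumptions, not the IS large-document API): convert one record at a time and append it, so memory use is bounded by the size of a single record rather than the 26 MB file.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;

public class ChunkedConverter {

    // Reads the flat file record by record (one record per line here),
    // converts each record to XML, and appends it to the output.
    // Returns the record count, which is handy for the trailer node.
    static long convert(Path flatFile, Path xmlOut,
                        Function<String, String> toXml) throws IOException {
        long records = 0;
        try (BufferedReader in = Files.newBufferedReader(flatFile, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(xmlOut, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(toXml.apply(line)); // only this record is in memory
                records++;
            }
        }
        return records;
    }
}
```

In IS itself you would look for the iterator-style option on the flat-file conversion service rather than hand-rolling this, but the memory behavior you are after is the same: never hold the whole file.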