Large Flat File Parsing and Publishing the Generated Document

Hi all

I am having a problem handling large flat files. The flat file is roughly 70 MB. I need to parse it, create a document from it, and publish that document. Because of its size (about 130,000 records, each about 600 characters long), I am getting an Out of Memory Exception. I have the “iterate” variable set to “true” and am calling the convertToValue service in a REPEAT loop, but I don't know how to drop the results of convertToValue on each iteration. Furthermore, since I need to publish this document, all 130,000 records have to be in what gets published. Is there any solution to my problem?
Of course, there is another solution: read the parsed records into a document in a loop of 20,000 iterations, publish that document and drop it, then read the next 20,000 records and publish them, and so on. But this is not what I want. …
Is there a proper solution provided by webMethods itself for problems like this? Are there any experts in this area?
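A rough back-of-the-envelope estimate shows why this runs out of memory. The sketch below (class name hypothetical) multiplies the figures from the post, 130,000 records of about 600 characters, by 2 bytes per char, which is how Java Strings store text on the 1.4-era JVMs discussed in this thread (newer JVMs with compact strings may use 1 byte per char for Latin-1 text). It counts only the raw characters; the parsed field documents, their keys and per-object overhead come on top of that.

```java
// Rough estimate of raw character data only, before any parsing overhead.
class FlatFileMemoryEstimate {
    // Figures quoted in the original post.
    static final long RECORDS = 130_000L;
    static final long CHARS_PER_RECORD = 600L;
    // Assumption: UTF-16 Strings, 2 bytes per char (JDK 1.4 through 8).
    static final long BYTES_PER_CHAR = 2L;

    static long rawCharBytes() {
        return RECORDS * CHARS_PER_RECORD * BYTES_PER_CHAR;
    }

    public static void main(String[] args) {
        long bytes = rawCharBytes();
        // Integer division, so the MB figure is rounded down.
        System.out.println("Raw char data: " + bytes + " bytes (~"
                + bytes / (1024 * 1024) + " MB)");
    }
}
```

Running main prints `Raw char data: 156000000 bytes (~148 MB)`, and that is before a single field document has been built.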

Hi Salman,

I am having a similar problem…
I am not able to parse the flat file either. I get an Out of Memory Exception in the getFile service itself. I am getting the contents as bytes.
Any solution?

thanks
pawan

Salman,

The moment you say “I have to publish the document with 130000 records”, you are basically not leaving a lot of options to play with.

The only things I can think of:

a) Try the 1.4 Sun or IBM JVM and keep increasing the JVM memory to an arbitrarily large value (4GB). Raise the maximum heap (-Xmx), not just the minimum (-Xms); the minimum only sets the starting size.
b) Turn off “inbound client queuing”.
c) Run GC explicitly right after you invoke pub.publish:publish.
d) Decrease the “Max documents to Send per Transaction” setting to a value as low as (probably) 1.
e) Use appropriate settings for your trigger capacity and refill level (the GEAR performance document talks about this).
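For items (a) and (c), a minimal plain-Java sketch (not an Integration Server service; the class name is hypothetical) can confirm whether a new heap setting actually took effect and how much heap is in use around a GC request. Runtime.maxMemory() reflects the maximum heap (-Xmx), so raising only -Xms will not change it. System.gc() is a request the JVM is free to ignore, so item (c) is a hint rather than a guarantee.

```java
class HeapCheck {
    private static final long MB = 1024L * 1024L;

    // Largest heap the JVM will try to use; reflects -Xmx.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    // Heap currently in use: memory allocated to the heap minus the free part of it.
    static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MB;
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMb() + " MB");
        System.out.println("Used before GC request: " + usedHeapMb() + " MB");
        // Only a request; the JVM decides whether and when to collect.
        System.gc();
        System.out.println("Used after GC request: " + usedHeapMb() + " MB");
    }
}
```

If "Max heap" does not move after you change the server's JVM options, the new setting is not being picked up.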

All said and done, I have to say (more out of experience than anything else) that it will be difficult to hold a document of that size in memory. You have to process it in chunks. If you need transactional integrity in case of failures while processing this large document, you can always maintain state that lets you resume from where you left off (if 63417 records were processed before the server went down, the state information tells you to start processing from record 63418 when the server comes back up).
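The chunk-and-resume idea can be sketched in plain Java. Publisher and Checkpoint are hypothetical interfaces, not webMethods APIs: Publisher stands in for whatever actually publishes a batch (pub.publish:publish in Integration Server), and Checkpoint for durable state, such as a file or database row, that survives a restart. The checkpoint holds the number of records already published and is saved only after a batch has gone out, so a restart re-reads the input and skips exactly those records.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class ChunkedPublisher {
    // Hypothetical hook standing in for the real publish step.
    interface Publisher {
        void publish(List<String> batch);
    }

    // Hypothetical durable store for "how many records are already published".
    interface Checkpoint {
        long load();
        void save(long recordsPublished);
    }

    static void run(Iterator<String> records, int batchSize,
                    Publisher publisher, Checkpoint checkpoint) {
        long alreadyDone = checkpoint.load();
        long seen = 0;
        List<String> batch = new ArrayList<>(batchSize);
        while (records.hasNext()) {
            String record = records.next();
            seen++;
            if (seen <= alreadyDone) {
                continue; // published before the restart; skip it
            }
            batch.add(record);
            if (batch.size() == batchSize) {
                publisher.publish(batch);
                checkpoint.save(seen);              // record progress only after publishing
                batch = new ArrayList<>(batchSize); // drop the published batch
            }
        }
        if (!batch.isEmpty()) { // final partial batch
            publisher.publish(batch);
            checkpoint.save(seen);
        }
    }
}
```

Because the checkpoint is written after publishing, a crash between the two steps republishes that one batch on restart, so subscribers should tolerate duplicates.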

Make sure you load the file as a ‘stream’, not bytes: a stream is read from the filesystem as needed, whereas bytes loads the whole file into memory. Using the iterator is the way to go (I’ve done this recently), but the critical part is to make sure that, within the REPEAT loop, you drop any unnecessary variables created inside the loop. Keep the ffIterator variable, though.
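A plain-Java analogue of the stream-versus-bytes advice, assuming newline-terminated records in a single-byte encoding (ISO-8859-1); the class and the RecordHandler interface are hypothetical. Only the current line and a small read buffer are in memory at any time, whereas Files.readAllBytes would materialize the whole 70 MB file as one array. Nothing is accumulated across iterations, which is the Java counterpart of dropping loop variables inside the REPEAT.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class StreamingRecordReader {
    // Hypothetical per-record callback, e.g. "parse and publish this one record".
    interface RecordHandler {
        void handle(String record);
    }

    // Contrast: Files.readAllBytes(file) would hold the entire file as one byte[].
    // Here the file is read line by line; returns the number of records handled.
    static long forEachRecord(Path file, RecordHandler handler) throws IOException {
        long count = 0;
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = in.readLine()) != null) {
                handler.handle(line);
                count++;
                // 'line' is replaced on the next pass, so records do not pile up.
            }
        }
        return count;
    }
}
```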

The point of using the iterator is to process one node at a time. If you are going to put all the nodes together into one big record, you effectively negate the benefit of using the iterator. Try to publish the individual records or discrete components one at a time rather than one large document.
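If the records are fixed-length (about 600 characters each, per the original post) with no line separators, the one-record-at-a-time idea can be sketched as a lazy Iterator over a Reader. This only illustrates the concept behind ffIterator; it is not the webMethods flat-file iterator, and the class name is hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Lazily yields one fixed-length record at a time; only one record is read ahead.
class FixedLengthRecordIterator implements Iterator<String> {
    private final Reader in;
    private final int recordLength;
    private String next; // record read ahead, or null at end of input

    FixedLengthRecordIterator(Reader in, int recordLength) {
        this.in = in;
        this.recordLength = recordLength;
        this.next = readRecord();
    }

    // Reader.read may return fewer chars than asked, so keep reading until full or EOF.
    private String readRecord() {
        char[] buf = new char[recordLength];
        int filled = 0;
        try {
            while (filled < recordLength) {
                int n = in.read(buf, filled, recordLength - filled);
                if (n == -1) break;
                filled += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // A short final record is returned as-is; no chars at all means end of input.
        return filled == 0 ? null : new String(buf, 0, filled);
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public String next() {
        if (next == null) throw new NoSuchElementException();
        String current = next;
        next = readRecord();
        return current;
    }
}
```

With `new FixedLengthRecordIterator(reader, 600)`, each record returned by next() can be parsed and published on its own inside the loop, so the iterator never holds more than one record's worth of characters.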

My $0.02

Have you tried configuring largeDoc handling?

Hi Will,

Can you tell me how I can load it as a stream and process it?
I could not figure out how to use the node iterator. Does it work for flat files?

I am not familiar with configuring largeDoc handling.
Please let me know where I can find the documentation for it.

I am using SAP Business Connector, which is really webMethods Integration Server 4.6.

Please help.
thanks
pawan

I believe you can find more about largeDoc in the wmEDI or wmTN package documentation.
Use pub.io:streamToBytes.

HTH