Finding a better way to process a large flat file

I have a pipe-delimited file that contains about 115k+ records. I use the “convertToValues” service and it takes a very long time to process. I even used the recommended approach for handling large flat files, which is to set “iterator” to true and use a REPEAT step with “convertToValues”, exiting when “ffIterator” is $null. It still takes a while to process. I’m looking for ways to optimize this process. If anyone has any suggestions, please let me know. Thanks.

How long is “a while”? What processing is done for each record?

An approach I’ve used for simple DB loading is to create a FLOW service that reads a group of records from the flat file at a time (using REPEAT and ffIterator), then uses a batch insert to write them to the DB. This cuts down on the per-record overhead; a rough sketch of the batching idea is below.
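This isn’t Flow code, just the same batching pattern in plain Java/JDBC to illustrate the idea. The connection URL, file name, table and column names are made up, and the file is assumed to have at least two pipe-delimited fields per line:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchLoad {
    public static void main(String[] args) throws Exception {
        int batchSize = 1000;  // tune to your DB and memory budget

        try (Connection con = DriverManager.getConnection(
                     "jdbc:yourdb://dbhost/mydb", "user", "password");   // placeholder URL
             PreparedStatement ps = con.prepareStatement(
                     "INSERT INTO target_table (col1, col2) VALUES (?, ?)"); // placeholder table
             BufferedReader in = new BufferedReader(new FileReader("large_file.txt"))) {

            String line;
            int count = 0;
            while ((line = in.readLine()) != null) {       // stream one record at a time
                String[] fields = line.split("\\|", -1);   // pipe-delimited record
                ps.setString(1, fields[0]);
                ps.setString(2, fields[1]);
                ps.addBatch();                             // queue the insert
                if (++count % batchSize == 0) {
                    ps.executeBatch();                     // flush a full batch to the DB
                }
            }
            ps.executeBatch();                             // flush the remainder
        }
    }
}
```

The point is that only one batch of records is in memory at any time, and the DB sees one round trip per batch instead of one per record.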

I’m running this via Developer. I ran the flow service that does the “convertToValues” and got tired of waiting for it to complete, which was in the 15-30 minute range. I thought I must be doing something wrong. I’m not inserting into a database; I just need to parse the flat file so I can use it in the next step of the process. I’m also using “appendToDocumentList” after the “convertToValues”.

There are several things to consider when using the FFIterator:

You need to parse the file as a stream, otherwise the iterator doesn’t change the fact that the whole file is loaded. Switching to a stream means only part of the file is held in memory at a time. Please see the documentation on how to configure the FFIterator correctly.
The above also means you need to do the further processing inside the loop, so each processed part can be dropped before the next iteration. If your processing requires the complete contents in memory, the FFIterator won’t help. If you append all the data to a document list, you still end up with the complete file contents in memory, so you may need to rethink your processing approach altogether (see the sketch below).
Running this from Developer won’t work well, as it transfers all the data to Developer. You should use Developer only with small test data and run on the IS via invoke when parsing the complete file.
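As a plain-Java analogy to the first two points (the file name, field positions and the “KEEP” filter below are invented for illustration): streaming only pays off if each record is handled and then dropped inside the loop, and only the few records the next step really needs are kept.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.List;

public class StreamDontAccumulate {
    public static void main(String[] args) throws Exception {
        List<String> keepers = new ArrayList<>();   // holds only the records we actually need later

        try (BufferedReader in = new BufferedReader(new FileReader("large_file.txt"))) {
            String line;
            while ((line = in.readLine()) != null) {        // one record in memory at a time
                String[] fields = line.split("\\|", -1);

                // process the record here, then let it go out of scope;
                // append only the (hopefully rare) records the next step needs
                if ("KEEP".equals(fields[0])) {
                    keepers.add(line);
                }
                // appending *every* record to keepers would rebuild the whole
                // file in memory and defeat the purpose of streaming
            }
        }
        System.out.println("Records kept: " + keepers.size());
    }
}
```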

Regards

Martin

Martin,

Thanks for the suggestion. I didn’t realize I was loading it as “bytes” instead of a “stream”. I will make that change and see how it goes. However, based on what you said, it would be pointless to use “appendToDocumentList” on everything, since that would load all the data into memory. What I’ll do is parse only the records I need and append just those. When you say don’t use Developer to test, do you mean I should use the IS invoke from the admin console?

Yes, invoking from Developer is fine for debugging but highly inefficient, as Developer controls the complete flow. Invoking from the admin console, either via package management or by simply adding the invoke path to the URL, is far better when dealing with a lot of data; a placeholder example of such a URL follows.
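Something like the following, where the host, folder and service name are placeholders for your own and 5555 is just the common default IS port:

```
http://<is-host>:5555/invoke/myFolder.mySubfolder/parseLargeFile
```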