Large Flat File - no ffiterate

Hi,
We need to pull a large file (40MB) from FTP and then process it. I am able to pull it from FTP (it takes about 30 mins) using the stream option. The file has just one header record, and the remaining 350k lines are detail records. I couldn't use ffiterate in convertToValues because there is nothing to iterate upon. I wrote a Java service to filter the 350k records down to about 60k records (7MB). I then still need to pass the result through convertToValues, loop through the 60k records, and publish the final document list to the Broker because of the existing integration scenario. The whole process takes about 2.5 hrs. Is there a better way to run convertToValues on such a large file?
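As an aside, the filtering step described above can be done in a streaming fashion so the 40MB file is never held in memory at once. The following is a plain-Java sketch of that idea, not the poster's actual service: the method name `filterDetails` and the keep-condition are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Writer;
import java.util.function.Predicate;

public class DetailFilter {
    // Streams the input line by line, writes the header plus only the detail
    // lines that match 'keep', and never loads the whole file into memory.
    public static int filterDetails(BufferedReader in, Writer out,
                                    Predicate<String> keep) throws IOException {
        String header = in.readLine();          // first line is the header record
        if (header == null) return 0;
        out.write(header);
        out.write('\n');
        int kept = 0;
        String line;
        while ((line = in.readLine()) != null) {
            if (keep.test(line)) {              // hypothetical filter condition
                out.write(line);
                out.write('\n');
                kept++;
            }
        }
        return kept;                            // number of detail lines kept
    }
}
```

Reading and writing through streams rather than a single string keeps the memory footprint proportional to one line, not to the whole file.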

Thanks,
Arjun

I had the same problem working at Johnson & Johnson. After pulling the file by FTP, I converted it to a string, separated out the detail records, and then processed them in batches. That was my best solution for this issue. Regards.
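A minimal sketch of that batching idea, assuming the first line is the header and the rest are details (the batch size of 1000 and the callback shape are illustrative choices, not anything from webMethods):

```java
import java.util.List;
import java.util.function.Consumer;

public class BatchProcessor {
    // Splits the file's lines into header + detail records and hands the
    // details to a callback in fixed-size batches, so each batch can be
    // converted and published separately instead of all 350k lines at once.
    public static int processInBatches(List<String> lines, int batchSize,
                                       Consumer<List<String>> handle) {
        List<String> details = lines.subList(1, lines.size()); // skip the header
        int batches = 0;
        for (int i = 0; i < details.size(); i += batchSize) {
            int end = Math.min(i + batchSize, details.size());
            handle.accept(details.subList(i, end)); // e.g. convertToValues + publish
            batches++;
        }
        return batches;
    }
}
```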

“I couldn’t use ffiterate to convert to values as there is nothing to iterate upon”

Is each line a record? If so, then iterate will return one record at a time.

Hi,

As I understand from your post, it sounds like the file contains repeating line records.
If so, you can still use convertToValues by creating a new schema for this document with Header and LIN records.
If you set iterate to true, you will get the value of one node at a time, which boosts performance because less memory is needed to carry a heavy pipeline.
Since you will have the value of only one document node at a time, you can perform your action (a map or anything else) and then move on to the next node.
Basically, you need to implement a REPEAT block that runs until the ffiterator is null.
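As a rough plain-Java analogy of that REPEAT block (the real thing is a Flow service calling convertToValues with iterate set to true; `FlatFileIterator` here is a hypothetical stand-in for the ffiterator, not a webMethods class):

```java
import java.util.ArrayList;
import java.util.List;

public class IterateLoop {
    // Hypothetical stand-in for the ffiterator state handed back by
    // convertToValues when iterate=true: next() returns one record,
    // or null once the file is exhausted.
    interface FlatFileIterator { String next(); }

    // Mirrors the Flow REPEAT block: process one record at a time and stop
    // when the iterator returns null, so the pipeline never carries all
    // 350k records at once.
    public static List<String> drain(FlatFileIterator it) {
        List<String> processed = new ArrayList<>();
        String record;
        while ((record = it.next()) != null) {   // REPEAT until ffiterator is null
            processed.add(record.trim());        // map / publish would happen here
        }
        return processed;
    }
}
```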

Hope it helps! Try it out and let us know if it does not solve your problem.

Cheers!
nD

Thanks for your response. The first line is the header record, followed by the 350k detail lines. I just tried your method of iterating over each line and it worked. However, this doesn't validate that the first record of the file is the header record; I will have to explicitly do a substring or something similar to extract each field of the header record and check it.
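For what it's worth, that substring check on the first line could look something like the sketch below. The field positions, the 3-character "HDR" tag, and the 8-digit date are invented for illustration; the real layout would come from the actual file specification.

```java
public class HeaderCheck {
    // Validates the first line of the file against an assumed fixed-width
    // header layout: a 3-char record type ("HDR") followed by an 8-digit date.
    // Both the positions and the "HDR" tag are illustrative assumptions.
    public static boolean isValidHeader(String firstLine) {
        if (firstLine == null || firstLine.length() < 11) return false;
        String recordType = firstLine.substring(0, 3);
        String date = firstLine.substring(3, 11);
        return recordType.equals("HDR")
                && date.chars().allMatch(Character::isDigit);
    }
}
```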

If convertToValues cannot validate the structure, is it recommended for such scenarios?

Thanks,
Arjun

Since you have one header per file followed by multiple detail lines, you can iterate over the lines, store the values from the header record in a temporary document, and use them whenever you need.

I have not quite understood your current issue, though. Let me know what is stopping you now.

cheers!
nD

Thanks for the reply! My obstacle is validating the schema of such a big file. For ffiterate to iterate over all the lines, the schema has to be created as RecordwithNoID. That will not tell us which record is the header record; only after the convertToValues step can we validate the values of the first record to confirm it is the header. Done this way, the flow service would have to check all 350k lines to see which one is the header record.

My question is: is this the right way to validate the schema of the file? I read that convertToValues is what is used to validate the file against its schema.

Thanks,
Arjun