Large Flat File - no ffiterate

Hi,
We need to pull a large file (40MB) from FTP and then process it. I am able to pull it from FTP (it takes about 30 mins) using the stream option. The file has just one header record, and the remaining 350k lines are detail records. I couldn't use ffiterate in convertToValues because there is nothing to iterate upon. I wrote a Java service to filter the 350k records down to about 60k records (7MB). I then still need to pass the result through convertToValues, loop through the 60k records, and publish the final document list to the Broker because of the existing integration scenario. The whole process takes about 2.5 hrs. Is there a better way to run convertToValues on such a large file?
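As an aside, the filtering step described above can be done in a streaming fashion so the 40MB file is never held in memory at once. The following is a plain-Java sketch of that idea, not the poster's actual service: the method name `filterDetails` and the keep-condition are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Writer;
import java.util.function.Predicate;

public class DetailFilter {
    // Streams the input line by line, writes the header plus only the detail
    // lines that match 'keep', and never loads the whole file into memory.
    public static int filterDetails(BufferedReader in, Writer out,
                                    Predicate<String> keep) throws IOException {
        String header = in.readLine();          // first line is the header record
        if (header == null) return 0;
        out.write(header);
        out.write('\n');
        int kept = 0;
        String line;
        while ((line = in.readLine()) != null) {
            if (keep.test(line)) {              // hypothetical filter condition
                out.write(line);
                out.write('\n');
                kept++;
            }
        }
        return kept;                            // number of detail lines kept
    }
}
```

Reading and writing through streams rather than a single string keeps the memory footprint proportional to one line, not to the whole file.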

Thanks,
Arjun

I had the same problem working at Johnson & Johnson. After pulling the file by FTP, I converted it to a string, separated out the detail records, and then processed them in batches. That was my best solution for this issue. Regards.
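A minimal sketch of that batching idea, assuming the first line is the header and the rest are details (the batch size of 1000 and the callback shape are illustrative choices, not anything from webMethods):

```java
import java.util.List;
import java.util.function.Consumer;

public class BatchProcessor {
    // Splits the file's lines into header + detail records and hands the
    // details to a callback in fixed-size batches, so each batch can be
    // converted and published separately instead of all 350k lines at once.
    public static int processInBatches(List<String> lines, int batchSize,
                                       Consumer<List<String>> handle) {
        List<String> details = lines.subList(1, lines.size()); // skip the header
        int batches = 0;
        for (int i = 0; i < details.size(); i += batchSize) {
            int end = Math.min(i + batchSize, details.size());
            handle.accept(details.subList(i, end)); // e.g. convertToValues + publish
            batches++;
        }
        return batches;
    }
}
```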

“I couldn’t use ffiterate to convert to values as there is nothing to iterate upon”

Is each line a record? If so, then iterate will return one record at a time.

Hi,

As I understand from your post, it sounds like the file contains repeating line records.
If so, you can still use convertToValues by creating a new schema for this document with Header and LIN records.
If you set iterate to true, you will get the value of one node at a time, which boosts performance because less memory is needed to carry a heavy pipeline.
Since you will have the value of only one document node at a time, you can perform your action (a map or anything else) and then move on to the next node.
Basically, you need to implement a REPEAT block that runs until the ffiterator is null.
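As a rough plain-Java analogy of that REPEAT block (the real thing is a Flow service calling convertToValues with iterate set to true; `FlatFileIterator` here is a hypothetical stand-in for the ffiterator, not a webMethods class):

```java
import java.util.ArrayList;
import java.util.List;

public class IterateLoop {
    // Hypothetical stand-in for the ffiterator state handed back by
    // convertToValues when iterate=true: next() returns one record,
    // or null once the file is exhausted.
    interface FlatFileIterator { String next(); }

    // Mirrors the Flow REPEAT block: process one record at a time and stop
    // when the iterator returns null, so the pipeline never carries all
    // 350k records at once.
    public static List<String> drain(FlatFileIterator it) {
        List<String> processed = new ArrayList<>();
        String record;
        while ((record = it.next()) != null) {   // REPEAT until ffiterator is null
            processed.add(record.trim());        // map / publish would happen here
        }
        return processed;
    }
}
```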

Hope it helps! Try it out and let us know if it does not solve your problem.

Cheers!
nD

Thanks for your response. The first line is the header record, followed by the 350k detail lines. I just tried your method of iterating over each line and it worked. However, this doesn't validate that the first record of the file is the header record; I will have to explicitly do a substring or something similar to extract each field of the header record and check it.
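For what it's worth, that substring check on the first line could look something like the sketch below. The field positions, the 3-character "HDR" tag, and the 8-digit date are invented for illustration; the real layout would come from the actual file specification.

```java
public class HeaderCheck {
    // Validates the first line of the file against an assumed fixed-width
    // header layout: a 3-char record type ("HDR") followed by an 8-digit date.
    // Both the positions and the "HDR" tag are illustrative assumptions.
    public static boolean isValidHeader(String firstLine) {
        if (firstLine == null || firstLine.length() < 11) return false;
        String recordType = firstLine.substring(0, 3);
        String date = firstLine.substring(3, 11);
        return recordType.equals("HDR")
                && date.chars().allMatch(Character::isDigit);
    }
}
```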

If convertToValues cannot validate the structure, is it recommended for such scenarios?

Thanks,
Arjun

Since you have one header per file followed by multiple detail lines, you can iterate over the lines, store the values from the header record in a temporary document, and use them whenever you need.

I have not quite understood your current issue, though. Let me know what is stopping you now.

cheers!
nD

Thanks for the reply! My obstacle is validating the schema of such a big file. For ffiterate to iterate over all the lines, the schema has to be created as RecordwithNoID. That will not tell us which record is the header record; only after the convertToValues step can we validate the values of the first record to confirm it is the header. Done this way, the flow service would have to check all 350k lines to see which one is the header record.

My question is: is this the right way to validate the schema of the file? I read that convertToValues is what is used to validate the file against its schema.

Thanks,
Arjun