Large Flat File Processing

Hi Reamon, can you please tell me more about this? I saw the flat file services, but I couldn’t find anything for splitting up a file.

My problem is that I need to process files in the 500 MB to 1 GB range. So the best approach seems to be to split the files into chunks of 100 MB or so and then process each one in webMethods so that it doesn’t run out of memory.

Can you please let me know the best option? They are going with wM at any cost, and they want to get the data into a database as fast as possible. We are doing a batch insert since we are not doing any transformations: simply take the file, validate it against the schema, and batch insert.

Can you please let me know if there is any utility to split files? (Our files are pipe-delimited.)

Hi reamon,

Same question as the previous member: I didn’t find any particular service in IS under the flat file package that could split a file into multiple smaller files, which one could then process further. The only option I know of is using the $iterator field; however, that does not split the file but rather processes one record at a time, and with a huge flat file it could still crash the IS server. Let me know otherwise.

The splitService provided by Bhawesh is great; thanks for sharing it.

Splitting a file into multiple smaller files is okay, but it adds complications. Personally, I would not use the flat file services for file splitting. There are other, easier techniques for that.
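
As an illustration (not an IS service, just a rough sketch in plain Java; the file names and chunk size are only examples), splitting a pipe-delimited file into smaller files by record count can be as simple as reading one line at a time and rolling over to a new output file every N records:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DelimitedFileSplitter {

    // Splits a delimited text file into chunks of recordsPerChunk lines each.
    // Only one line is held in memory at a time, so the source file size doesn't matter.
    public static void split(Path source, Path outputDir, int recordsPerChunk) throws IOException {
        Files.createDirectories(outputDir);
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8)) {
            String line;
            int recordCount = 0;
            int chunkIndex = 0;
            BufferedWriter writer = null;
            while ((line = reader.readLine()) != null) {
                if (writer == null || recordCount == recordsPerChunk) {
                    if (writer != null) {
                        writer.close();          // finish the previous chunk
                    }
                    chunkIndex++;
                    recordCount = 0;
                    Path chunk = outputDir.resolve("chunk_" + chunkIndex + ".txt");
                    writer = Files.newBufferedWriter(chunk, StandardCharsets.UTF_8);
                }
                writer.write(line);
                writer.newLine();
                recordCount++;
            }
            if (writer != null) {
                writer.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Illustrative paths and chunk size only.
        split(Paths.get("large_input.txt"), Paths.get("chunks"), 500000);
    }
}
```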

The key to evaluating options for processing a large file is understanding the content. An approach for a delimited file would be different from the approach for an XML file.

For a delimited file, the flat file services can be used. convertToValues accepts inputs that support reading a file a bit at a time instead of all at once. The documentation describes the inputs, but as a quick summary: open the file with loadAs set to stream, set iterate to true, and keep track of the ffIterator as you loop over the records.

Read 1 record (or up to X records), do whatever is needed (map, validate, etc.), and write the records directly to the target, such as a DB.

If you want to process X records at a time, you’ll need to do a bit more work to gather them into a document list: call convertToValues X times, appending each record to the list.

Loop until convertToValues returns no more records.
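
Inside IS that loop is all done with the flat file services and the ffIterator, so there is nothing to hand-code. As a rough plain-Java analogue of the same streaming idea (a hedged sketch; the file name is illustrative and the process step is a placeholder), only the current record is ever in memory:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class StreamingRecordProcessor {

    public static void main(String[] args) throws IOException {
        // Open the file as a stream; read and handle one record at a time.
        try (BufferedReader reader = Files.newBufferedReader(
                Paths.get("large_input.txt"), StandardCharsets.UTF_8)) {
            String record;
            while ((record = reader.readLine()) != null) {   // loop until no more records
                String[] fields = record.split("\\|", -1);   // pipe-delimited fields
                process(fields);                             // map, validate, write to target, etc.
            }
        }
    }

    // Placeholder for whatever per-record work is needed.
    private static void process(String[] fields) {
        // e.g. validate the field count, then write the record to the DB
    }
}
```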

For XML, use node iterator techniques described in the documentation.
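
The node iterator services are covered in the docs; as a rough analogue outside IS (a hedged sketch, and the record element name is hypothetical), a streaming XML parser visits one element at a time instead of loading the whole document tree:

```java
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StreamingXmlReader {

    public static void main(String[] args) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        try (InputStream in = new FileInputStream("large_input.xml")) {
            XMLStreamReader reader = factory.createXMLStreamReader(in);
            int recordCount = 0;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "record".equals(reader.getLocalName())) { // hypothetical element name
                    recordCount++;
                    // process this element's content here, then let it go
                }
            }
            reader.close();
            System.out.println("Records seen: " + recordCount);
        }
    }
}
```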

“Best” always needs definition. :-)

Splitting the large file into smaller files and processing those is one way. Another way is to process the large file iteratively. For a delimited file, my other post describes how to do so. Using iteration, the file is not loaded completely into memory. Out of the box, you can read 1 record at a time, process it, then get the next, etc.

For efficiency, particularly if the records are to be written to a DB, it would be desirable to read multiple records at a time, then write them as a group. Doing that takes a bit more work. You’d do a loop inside a loop:

Open file as stream
LOOP until no more records
…LOOP until groupSize or no more records
……convertToValues → returns 1 record
……add that record to a document list
…Write the list to the DB
Close the file

Obviously this is simplified, but it gives the high-level approach. You can split this into a couple of different services, e.g. have the inner loop be a service that accepts the iterator and returns a document list, then call it until there are no more records.
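
For reference, here is a hedged plain-Java sketch of that same nested-loop pattern (the connection string, table, and columns are all hypothetical; inside IS you would use convertToValues and a JDBC adapter batch insert instead):

```java
import java.io.BufferedReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchedInsertSketch {

    public static void main(String[] args) throws Exception {
        int groupSize = 1000; // records per DB batch; tune as needed

        try (Connection conn = DriverManager.getConnection(
                     "jdbc:yourdb://host/db", "user", "password");           // hypothetical connection
             PreparedStatement stmt = conn.prepareStatement(
                     "INSERT INTO target_table (col1, col2) VALUES (?, ?)");  // hypothetical table
             BufferedReader reader = Files.newBufferedReader(
                     Paths.get("large_input.txt"), StandardCharsets.UTF_8)) { // open file as stream

            String record = reader.readLine();
            while (record != null) {                              // LOOP until no more records
                int gathered = 0;
                while (record != null && gathered < groupSize) {  // LOOP until groupSize or no more records
                    String[] fields = record.split("\\|", -1);    // one pipe-delimited record
                    stmt.setString(1, fields[0]);
                    stmt.setString(2, fields[1]);
                    stmt.addBatch();                              // add the record to the "list"
                    gathered++;
                    record = reader.readLine();
                }
                stmt.executeBatch();                              // write the group to the DB
            }
        }
    }
}
```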

Hope this helps, though it may be a bit late.

Hello Himanshu Kumar,

I am looking into large file handling (500 MB) and saw your post here. I didn’t find any particular service in IS to split files or to handle large files.

Could you kindly share the split service provided by Bhawesh here? That would be really helpful.

Regards,
Deepa