I am receiving a 1 GB flat file from which I need to process the header, detail and trailer records. I need to create multiple XML files depending on the header information.
I am publishing to the Broker to process the header, detail and trailer records. I created a trigger that processes my records in single-threaded mode, but I would like to make the trigger multi-threaded for the detail records.
How can I tell when all of my detail records have been processed by the multi-threaded trigger?
Also, the convertToValues service creates @-prefixed fields, and I cannot publish these @fields to my Broker. How can I drop them from the convertToValues output?
Hi Sam,
A 1 GB flat file is indeed a large file, and special care should be taken when handling a file of this type, considering that the IS JVM cannot have more than roughly 2.5 GB allocated to it.
Below is how I have processed a file of this size (> 1 GB, containing more than 500,000 records):
1> To avoid any overhead, I used only the IS (no Broker, no TN, no Modeller, etc.).
2> Wrote a fileSplitter Java service. This service reads the input file as a stream, goes through each record to do some validation, creates a set of temp files of 20,000 records each (configurable, supplied as an input to the service) and returns the list of temp files it created. A rough sketch of this idea appears at the end of this post.
3> The main flow then processes these temp files one at a time in a loop, appends the results to the desired output file and deletes each temp file after it has been processed.
4> Found that a split size of 20,000 records was optimal (total processing time < 2 hrs.); setting it higher or lower increased the total processing time.
5> The solution is scalable: if the input file grows in the future, the splitter will simply create more temp files, but the IS will still handle only a small chunk of data (20,000 records) at a time and so will not run out of memory.
Your integration could be totally different from mine, but I wanted to give you some pointers to ponder while handling a large file (> 1 GB).
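For illustration only, here is a minimal standalone sketch of the splitting idea. It is not the actual IS Java service I wrote, and it assumes simple newline-delimited records with no special validation:

    import java.io.*;
    import java.util.ArrayList;
    import java.util.List;

    public class FileSplitter {

        // Splits a large flat file into temp files of at most recordsPerChunk
        // records each and returns the list of temp files created.
        public static List<File> split(File input, int recordsPerChunk, File tmpDir) throws IOException {
            List<File> chunks = new ArrayList<>();
            try (BufferedReader reader = new BufferedReader(new FileReader(input))) {
                BufferedWriter writer = null;
                String line;
                int count = 0;
                while ((line = reader.readLine()) != null) {
                    if (count % recordsPerChunk == 0) {
                        if (writer != null) writer.close();
                        File chunk = File.createTempFile("chunk_", ".tmp", tmpDir);
                        chunks.add(chunk);
                        writer = new BufferedWriter(new FileWriter(chunk));
                    }
                    // place for lightweight per-record validation, kept minimal on purpose
                    writer.write(line);
                    writer.newLine();
                    count++;
                }
                if (writer != null) writer.close();
            }
            return chunks;
        }

        public static void main(String[] args) throws IOException {
            // Example: split a 1 GB flat file into 20,000-record chunks.
            List<File> chunks = split(new File("bigfile.dat"), 20000, new File("/tmp"));
            System.out.println("Created " + chunks.size() + " temp files");
        }
    }

In an actual IS Java service the chunk size, directories and file list would of course go through the pipeline, and any per-record validation should be kept as lightweight as possible.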
Must the data for item 2 be processed after item 1? Of course the lines that make up an item (line1, detail1.1, detail1.2) need to stay together, but can the items be processed independently and in any order?
The question wasn’t whether or not you need to process the entire file. But rather, whether or not all the items in the file must be processed as a single unit–in other words, if one item fails for some reason, can you continue with the remaining items or do you need to stop and rollback all work up to that point?
A flat file? An XML file? An “electronic catalog file” is not descriptive enough.
In this case, the process outlined by Bhawesh should work fine for you too. You do not need to publish items to the Broker. Do not use it as part of your solution. Be careful with how you write your file splitter service–do as little processing as possible.
Hi Sam,
I can send you the fileSplitter Java service. As Rob mentioned, I have tuned this service for performance, because it is the service that takes the hit of reading the large file.
I am having an issue similar to what is being described in this thread and would appreciate any insight anyone can offer. I need to process a 60-70 MB file. The format of the file is:
O - Order header information.
B (1 occurrence per O)
S (1 occurrence per O)
P (multiple occurrences per O)
There are about 70,000 Order records that need to be processed. I do not need to store these records anywhere, just process them and send an email.
I am able to run my service if I read a small sample file using getFile. But when I set iterate=true, I cannot do anything with the data. I am mapping ffValues to a document created from my flat file schema, and I use the correct schema name for ffSchema in convertToValues. If I run savePipelineToFile, I see data in my document, but if I try to write any of that data to the debug log, the values are null.
As I mentioned before, if I bring the whole file into memory using getFile, the service works fine; I just cannot seem to stream in the data.
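To make it concrete, what I am ultimately trying to do, expressed here as a plain Java sketch rather than the actual flow, and assuming each record type can be identified by its leading character, is to stream the file and hand one complete order (O + B + S + P records) at a time to the processing logic:

    import java.io.*;
    import java.util.ArrayList;
    import java.util.List;

    public class OrderStreamer {

        public static void main(String[] args) throws IOException {
            try (BufferedReader in = new BufferedReader(new FileReader("orders.dat"))) {
                List<String> currentOrder = new ArrayList<>();
                String line;
                while ((line = in.readLine()) != null) {
                    // a new "O" record marks the start of the next order
                    if (line.startsWith("O") && !currentOrder.isEmpty()) {
                        processOrder(currentOrder);
                        currentOrder.clear();
                    }
                    currentOrder.add(line);
                }
                if (!currentOrder.isEmpty()) {
                    processOrder(currentOrder);   // last order in the file
                }
            }
        }

        private static void processOrder(List<String> records) {
            // placeholder: validate the O/B/S/P records and trigger the e-mail
            System.out.println("Processed order with " + records.size() + " records");
        }
    }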
I need to split fairly large files being sent between different Integration Servers via a Broker, one in the US and one in ASPAC.
The files are approximately 50 MB in size, but there are around 50+ documents that get sent to the Broker on the US side, so the Broker is receiving quite a lot of traffic.
Could you please send me the fileSplitter Java service?
My Id: datta.saru@gmail.com
I have seen a lot of requests for the file splitter service you wrote. To avoid lengthening the thread with such requests, could you attach a zip of the code to the post itself?
Oh, they are not necessarily flat text files; they could be zip files as well. I need to connect to the SFTP server and get them. I was thinking of letting the underlying Unix commands fetch the file instead of pulling it into pipeline memory. But how do I do that? Any thoughts?
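One way to shell out to the operating system from a Java service is java.lang.ProcessBuilder. The sketch below is illustration only; the host, user and batch-file path are placeholders, and it assumes sftp in batch mode is available on the IS box:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;

    public class SftpViaOs {

        public static int fetch(String batchFile, String userAtHost) throws IOException, InterruptedException {
            // "sftp -b batchfile user@host" runs the get commands listed in the batch file
            ProcessBuilder pb = new ProcessBuilder("sftp", "-b", batchFile, userAtHost);
            pb.redirectErrorStream(true);            // merge stderr into stdout for simple logging
            Process p = pb.start();
            try (BufferedReader out = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
                String line;
                while ((line = out.readLine()) != null) {
                    System.out.println("sftp: " + line);
                }
            }
            return p.waitFor();                      // 0 means the transfer succeeded
        }

        public static void main(String[] args) throws Exception {
            int rc = fetch("/tmp/get_files.batch", "user@sftp.example.com");
            System.out.println("exit code " + rc);
        }
    }

The remote file then lands directly on local disk, where it can be split or streamed as discussed earlier in this thread rather than loaded into the pipeline.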
Great job, B Singh. Your code works fine with 32-bit webMethods; however, it sometimes skips a portion of a line on 64-bit webMethods. Please advise. I don't know Java. Do you have another service which does the same on 64-bit? Thanks.