Issue with File read

Hi,

I am using pub.io:read along with a getFile service to read a CSV file.
It is taking far too long. The first 100 lines are read quickly, but after that each batch of lines takes progressively longer. For example, reading from the 200th to the 300th line takes 30 s, and from the 500th to the 600th line takes 3 min …

Please, can anyone help?

Thanks in advance,
Hitesh

Can you post your FLOW snippet? I suspect the slowdown may be due to repeated reallocation of a byte array or of a string list.

Attached is an image of the WM flow.

Steps followed:

  1. Get the length of each line into a list; this list has as many entries as there are lines in the file.
  2. Get the file as a stream.
  3. Loop over the list from step 1; total loop count = total number of lines.
  4. Read a row.
  5. Parse the row.
  6. Publish it.
  7. Call the garbage collector after every 100th row.

We even disabled the publish step … but it made no difference.
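
In plain Java terms, the loop amounts to something like this (a rough sketch of the pattern only; the real steps are FLOW services, and the file name and line lengths here are made up):

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Rough Java equivalent of the FLOW loop above. Note the fresh byte[]
// allocated on every iteration (mirroring createByteArray inside the
// loop) and the explicit GC call every 100 rows.
public class RowByRowRead {
    public static void main(String[] args) throws IOException {
        int[] lineLengths = {42, 57, 61 /* ... one entry per line */};
        try (InputStream in = new FileInputStream("input.csv")) {
            int rowNum = 0;
            for (int len : lineLengths) {
                byte[] buffer = new byte[len];      // new array every row
                int read = in.read(buffer, 0, len); // read one row
                if (read <= 0) {
                    break;                          // end of file
                }
                String row = new String(buffer, 0, read); // bytesToString
                // ... parse the row and publish the document ...
                if (++rowNum % 100 == 0) {
                    System.gc();                    // GC every 100th row
                }
            }
        }
    }
}
```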

Thanks for your help
Hitesh

General rule of thumb: don’t allocate things inside a loop. Move as much as possible outside the loop, and reuse arrays, string buffers, etc.

Disable more and more until you can isolate what is causing the bottleneck. I suspect that calling createByteArray and bytesToString on each iteration may be contributing. You can’t really get rid of the bytesToString, but you should be able to use just one byte array, allocated before the loop and large enough to hold the largest row.
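
In plain Java, the idea looks something like this (a sketch only, assuming the maximum row length is known before the loop starts; the file name and lengths are made up):

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch: allocate one buffer before the loop, sized for the largest
// row, and reuse it on every iteration instead of creating a new
// byte array per row.
public class ReusedBufferRead {
    public static void main(String[] args) throws IOException {
        int[] lineLengths = {42, 57, 61 /* ... one entry per line */};
        int maxLen = 0;
        for (int len : lineLengths) {
            maxLen = Math.max(maxLen, len);
        }
        byte[] buffer = new byte[maxLen]; // allocated once, before the loop
        try (InputStream in = new FileInputStream("input.csv")) {
            for (int len : lineLengths) {
                int read = in.read(buffer, 0, len); // reuse the same buffer
                if (read <= 0) {
                    break;                          // end of file
                }
                String row = new String(buffer, 0, read); // bytesToString equivalent
                // ... parse and publish ...
            }
        }
    }
}
```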

tokeniseString would be another thing to investigate.

Be aggressive in dropping pipeline variables. Drop vars as soon as you possibly can.

HTH

Thanks for the suggestions; I will try them and let you know.

Thanks again,
Hitesh

I’d get rid of the GC call every 100 lines. If you have a large heap and a stop-the-world collector, you’ll be adding a significant pause every 100 lines.

Following Rob’s advice about not allocating variables inside the loop should eliminate the perceived need to call System.gc().
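
To get a feel for the cost, a hypothetical micro-benchmark like the following (not from this thread; numbers vary with heap size and collector) shows that an explicit System.gc() is not free:

```java
// Hypothetical micro-benchmark: time a single explicit System.gc()
// after generating some garbage. Exact numbers depend on heap size
// and collector; the point is that the pause is not free, and the
// flow above pays it once per 100 rows.
public class GcPauseDemo {
    public static void main(String[] args) {
        byte[][] recent = new byte[16][];
        // Generate garbage so the collector has work to do
        for (int i = 0; i < 1_000_000; i++) {
            recent[i & 15] = new byte[1024];
        }
        long start = System.nanoTime();
        System.gc(); // typically a stop-the-world pause
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println("System.gc() took " + ms + " ms; kept "
                + recent.length + " buffers live");
    }
}
```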

Mark

Thanks for the suggestions. I tried both (not allocating variables inside the loop, and removing the GC call), but saw no improvement.

Finally, I added a service that reads the file using the standard Java I/O API and loads it into a list. This worked!

Now it’s able to process 21k lines in about 10 minutes!
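
For anyone who hits the same problem, the Java service boils down to something like this (a sketch of the approach; the actual service name and inputs will differ):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Sketch of the replacement: read the whole file once with buffered
// standard Java I/O and return a list of lines, instead of issuing
// one pub.io:read per row from FLOW.
public class FileToList {
    public static List<String> readLines(String path) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```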

Thanks for all your help
Hitesh