File Polling Question

RMG or experts,
We are using a get-file service to fetch the latest files, and we run that service on a schedule. It is reading big files before they have finished being written. Is there any solution besides the file polling service in wM? Thanks.

Where are you getting the files from, a remote FTP box or from a local folder?

Why don’t you check the file size at a few intervals in your service and see whether it changes? If it doesn’t, the file has been completely transferred; otherwise, keep checking the file size.

Just curious, why don’t you want to use file polling, since it supports this exact scenario out of the box?

From a local box where IS is installed.

Our current service does that: it checks the file size twice, with a gap of 45 seconds. I want to change it, since it takes hours to process large volumes. Thanks.

How about renaming the file after the transfer completes and then looking for the new name? Suppose the file name is flatFile and you change it to flatFileDone; your service then looks only for flatFileDone.

The service might be picking up half-cooked files. You can rename the file to a new name and then pick it up; if some other application is still holding it, the rename will error out. Otherwise, as Khan suggested, you can go for a file polling port, which takes care of file age, moving, and reading the file.

What is the size of the file you are failing to read, and what kind of file is it (XML, txt, etc.)?

How do I check whether the file is completely written before renaming it? That’s the only issue I have.
1. The scheduler runs every 50 seconds.
2. It gets all unprocessed files.
3. It checks the size of every file twice, 45 seconds apart, and compares.
4. If the sizes are equal, it processes the file.

Step 3 is taking too much time, and I want to find an alternative to it. All the files are flat files.
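For reference, the size comparison in steps 3 and 4 can be sketched roughly as follows. This is only an illustration in Python, not the poster's actual service; the names `snapshot_sizes` and `stable_files` are hypothetical. It assumes the 45-second wait is applied once to the whole batch rather than once per file, since a per-file wait grows linearly with the number of files.

```python
import os
import time


def snapshot_sizes(directory):
    """Map each regular file in `directory` to its current size in bytes."""
    sizes = {}
    for entry in os.scandir(directory):
        if entry.is_file():
            sizes[entry.path] = entry.stat().st_size
    return sizes


def stable_files(directory, wait_seconds=45, sleep=time.sleep):
    """Return files whose size did not change across a single wait.

    All files are measured, the wait happens once, and all files are
    measured again, so the cost is one wait per run, not one per file.
    """
    before = snapshot_sizes(directory)
    sleep(wait_seconds)
    after = snapshot_sizes(directory)
    # A file counts as stable only if it still exists with the same size.
    return sorted(path for path, size in before.items()
                  if after.get(path) == size)
```

An unchanged size is only a heuristic: a writer that pauses for longer than the wait will still look finished.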

Maybe I wasn’t clear, but I was hoping that the application that is putting the file there can rename it after transferring the data.

Replace your step 3 with a rename command: rename the file by appending a timestamp to the original filename. The rename will fail if the file is still growing, that is, if the source application is still writing to it. If the rename succeeds, go ahead and read the file; otherwise control goes to the catch block, and the file will be picked up on the next run or whenever it is fully available. This way you avoid reading a half-written file.
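A minimal sketch of this rename-then-read idea, written in Python for illustration rather than as a wM service. The function name `try_claim` and the timestamp format are assumptions. Whether the rename actually fails while another process has the file open depends on the operating system: it typically does on Windows, while Unix-like systems generally allow the rename to succeed.

```python
import os
import time


def try_claim(path):
    """Try to rename `path` by appending a timestamp.

    Returns the new path on success, or None if the rename failed
    (for example because the OS refused it or the file is gone).
    """
    claimed = f"{path}.{time.strftime('%Y%m%d%H%M%S')}"
    try:
        os.rename(path, claimed)
    except OSError:
        # Equivalent of the "catch block": leave the file for the next run.
        return None
    return claimed
```

A caller would read the file only when `try_claim` returns a path, and skip it otherwise.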

Thanks Suren, I got it. Please let me know if there are any other options available.

Guys,
Please give me your valuable opinions.

Suren,
Looks like the above solution doesn’t work on Unix, because I was able to change the file name while the file was being used. Is there any way we can do this on Unix?

Hi,
So you mean that even after renaming, the file kept growing in size under the new name? :confused: This is strange. In this case there is one more thing to try: move the file to a new location. For example, create a folder called “working” and move the file there, appending a timestamp to the filename.

And how are you renaming: using a wM built-in service (which one?), or have you written custom code to do it (please post it)?

Hope you don’t mind me asking so many questions. :wink:

Suren,
Your trick works on Windows but not on Unix, because on Unix we can rename a file even while it is being used. I am using the mv command to rename the file. Thanks.

It’s the process that is writing the file that should rename it, as Talha Khan mentioned. Here are the steps:

  1. Process that is dropping off the files writes a file either to a temp directory or to a temp name. For example, _myFile.txt.

  2. When finished writing, that same process renames the file to myFile.txt.

  3. Your process (you might consider using the file polling port instead of writing your own polling services) will ignore all files that start with “_”. It will process any file that does not have a leading _ in the name.

This will ensure that files being written are not picked up prematurely.

In any case, this won’t address the issue of “taking hours to process large volumes.” That’s a different problem.
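The writer's side of the three steps above might look something like this. It is a sketch in Python under stated assumptions: the function names `write_then_publish` and `ready_files` are hypothetical, and the temp name and final name are assumed to be in the same directory so that the rename stays on one filesystem.

```python
import os


def write_then_publish(directory, name, data):
    """Write `data` under a temporary "_" name, then rename to the final name."""
    temp_path = os.path.join(directory, "_" + name)
    final_path = os.path.join(directory, name)
    with open(temp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to disk before publishing
    # The final name only appears once the content is complete.
    os.replace(temp_path, final_path)
    return final_path


def ready_files(directory):
    """List file names a reader may process: anything without a leading "_"."""
    return sorted(n for n in os.listdir(directory)
                  if not n.startswith("_")
                  and os.path.isfile(os.path.join(directory, n)))
```

If the writer crashes midway, only the `_` file is left behind, so the reader never sees a partial file under the final name.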

Thanks Reamon,
but different applications put files in the shared location. For now I have modified my code to pick up only files that are at least 1 minute old: my service gets the current time and the file’s last modification time, and only if the difference is more than a minute does it pick up and process the file. Please let me know if this can be done in a better way. Thanks.

saritha
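The age check described above can be sketched as follows. This is an illustrative Python version, not the poster's service; the function name `old_enough` and the 60-second default are assumptions taken from the "1 minute" in the post.

```python
import os
import time


def old_enough(path, min_age_seconds=60, now=None):
    """Return True if `path` was last modified at least `min_age_seconds` ago.

    `now` is injectable so the check can be tested without waiting.
    """
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path) >= min_age_seconds
```

As with the size comparison, this is a heuristic: a writer that stalls for longer than the threshold will still be treated as finished.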

I’d suggest the following:

  • Over time, get the applications that write to the directory to change how they do so, per the approach listed earlier. Checking file age is a reasonable interim measure.

  • Use the file polling port instead of your own code.

Hello everyone,

I have a scenario where I need to pick up files from a UNIX folder and FTP them to a destination.

Basically, I want to schedule this interface rather than use FilePolling. Can anyone help if they have any code available?

There will be more than 30 files to pick up from that location, and we want to schedule this interface to run every 15 minutes.

Option 2: If nothing else works out, I want to try FilePolling too. I have heard that even FilePolling can pick up half-cooked files.

Please suggest if anyone has successfully implemented this and has a solution.

Your help and inputs are much appreciated.

Thanks,
David.

File polling is basically a scheduled activity. The schedule is established in the file polling port config instead of on the scheduler page, but underneath, the scheduler is used.

The way to avoid picking up partial files is this:

  • The process writing the files uses a temporary filename of some sort. Like “myFile20100530.csv.tmp”.
  • When the source process has finished writing it renames the file to “myFile20100530.csv”. Renames are an atomic operation.
  • The file polling looks only for *.csv files and ignores all others.

This guarantees that a file will only be picked up if the source process is finished writing and has successfully renamed it, regardless of the nature of the protocol being used (e.g. writing via FTP, shared disk, etc.).

With this approach, the file poller will never pick up:

  • Files that are still being written to by the source process.
  • Files that are incomplete due to a failure of the source process.
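The polling side of this scheme comes down to a filename filter. Here is a rough Python sketch of that filter only; it is not how the wM file polling port is implemented, and the function name `poll` is hypothetical.

```python
import fnmatch
import os


def poll(directory, pattern="*.csv"):
    """Return paths of regular files in `directory` whose names match `pattern`.

    A name such as "myFile20100530.csv.tmp" does not match "*.csv",
    because the pattern has to match the whole file name.
    """
    matches = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if fnmatch.fnmatch(name, pattern) and os.path.isfile(path):
            matches.append(path)
    return matches
```

The filter needs no timing assumptions, because a file reaches the matching name only through the writer's final rename.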

The file polling port has a facility to wait for a period of time (file age) before processing a file. This can be effective but in some cases simply reduces, rather than eliminates, the potential for picking up partial files.

Hi Reamon,

Thanks for your reply. But the problem here is that the source system doesn’t create a temp name file and then rename it. We discussed this with them, and they mentioned that it is all in their settings which automatically assigns the filename.

So, they cannot rename the file. What would be the second-best approach? What is the file age setting in the FilePolling configuration? If I set the file age to 3 minutes, then when the first file arrives, does it wait 3 minutes and then pick up the file, and similarly wait 3 minutes before picking up the second file? Am I right?

Thanks,
David