File polling port to poll multiple types of files

Hi

What is the syntax for polling multiple types of files in the ‘file name filter’?
I want to poll *.txt and *.doc files using a single file polling port.
Please tell me the syntax for achieving this.

Swetha

Hi,
Do you want to poll an FTP location?

If so, create an IS service that logs in to the FTP server and lists the files in the desired directory using pub.client.ftp:ls with filenamePattern=*.txt (or whatever pattern you need). Schedule this service to run at the desired interval. If you want *.doc files too, call pub.client.ftp:ls twice and use the appendToStringList service to build the complete fileNameList. You can then FTP GET or MGET the files, as your design requires.
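The list-once-per-pattern-then-merge idea is easy to see outside IS as well. Below is a minimal Python sketch using the standard ftplib module. `collect_matching` and `ftp_lister` are hypothetical helpers, not IS services, and the filtering is done on the client because FTP servers differ in how (or whether) they expand wildcards in NLST.

```python
from fnmatch import fnmatchcase
from ftplib import FTP
from typing import Callable, Iterable, List


def collect_matching(list_fn: Callable[[str], Iterable[str]],
                     patterns: Iterable[str]) -> List[str]:
    """Call list_fn once per pattern and merge the results into one list,
    mirroring one pub.client.ftp:ls call per pattern plus appendToStringList."""
    merged: List[str] = []
    seen = set()
    for pattern in patterns:
        for name in list_fn(pattern):
            if name not in seen:  # unlike a plain append, skip names already collected
                seen.add(name)
                merged.append(name)
    return merged


def ftp_lister(ftp: FTP) -> Callable[[str], List[str]]:
    """Hypothetical adapter: list the current remote directory and filter
    client-side, since NLST wildcard support varies between FTP servers."""
    def list_fn(pattern: str) -> List[str]:
        return [name for name in ftp.nlst() if fnmatchcase(name, pattern)]
    return list_fn


# Usage (hypothetical host, credentials and directory):
# with FTP("ftp.example.com") as ftp:
#     ftp.login("user", "password")
#     ftp.cwd("/outbound")
#     to_fetch = collect_matching(ftp_lister(ftp), ["*.txt", "*.doc"])
```

The matching here is case-sensitive, so a file named `D.TXT` would not be picked up by `*.txt`.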

Hope this helps.

Kind Regards,
Soumik G Biswas

Hi Soumik,

Thank you very much for the reply, but that’s not what I am looking for.

When configuring the file polling port, what should I enter in the ‘file name filter’ so that it accepts both .txt and .doc files?
A previous post said that we should use regular expressions, but I couldn’t work out the correct pattern.

Can anyone please tell me the pattern?

If only .doc and .txt files are going to be present in the scan directory, you can leave the file name filter field blank so that the port picks up all the files there.

Hi Suren,

The directory will contain other types of files besides .doc and .txt, and among them I need to pick up only the files matching these two patterns.

Most of this has already been noted, but I thought I’d summarize for posterity:

  • The file polling port cannot poll FTP servers.
  • The file polling port monitors a directory for files that match a pattern. This approach gives you no facility to scan the directory yourself; the polling port does the scan on your behalf.

You can try a regex to pick up files that have particular filename extensions, but it won’t be a simple expression and it will be difficult to have it do exactly what you need.

This KB article on Advantage discusses the regex supported by the file polling port:
https://advantage.webmethods.com/advantage?targChanId=kb_home&oid=1613386882

That article links to the documentation for the class the file polling port uses for pattern matching (Jakarta ORO’s GlobCompiler, which handles glob-style patterns rather than full regular expressions):
http://jakarta.apache.org/oro/api/org/apache/oro/text/GlobCompiler.html

It’s pretty limited.
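To show why a single glob is awkward here: going by the GlobCompiler documentation, glob syntax appears to offer only `*`, `?`, character classes and escaping, with no alternation like `txt|doc`. The closest one-pattern approximation checks each extension letter with a character class, and it accepts names it shouldn’t. The sketch below uses Python’s `fnmatch` as a stand-in for the port’s matcher, assuming `*` and `[...]` behave the same way in both; the file names are made up.

```python
import re
from fnmatch import fnmatchcase

# Stand-in for the port's glob matcher (assumption: same * and [...] semantics).
SINGLE_GLOB = "*.[td][xo][tc]"              # one glob covering both .txt and .doc ...
ALTERNATION = re.compile(r".*\.(txt|doc)")  # ... versus what regex alternation expresses

names = ["report.txt", "memo.doc", "photo.jpg", "odd.tot", "odd.dxc", "notes.TXT"]

glob_hits = [n for n in names if fnmatchcase(n, SINGLE_GLOB)]
regex_hits = [n for n in names if ALTERNATION.fullmatch(n)]

print(glob_hits)   # ['report.txt', 'memo.doc', 'odd.tot', 'odd.dxc']
print(regex_hits)  # ['report.txt', 'memo.doc']
```

The glob lets through combinations such as `.tot` and `.dxc`, and both forms are case-sensitive, so `notes.TXT` is missed by each.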

I suggest the following:

  • Use a pattern that excludes files with a particular prefix. For example, to ignore all files that start with Z or z: [^Zz]*

This picks up all files except those that start with Z or z. Clients writing files to the directory would write them with a leading Z and, when finished, rename the file to remove the Z. This prevents the port from picking up files before they are completely written.

  • Don’t allow “different types of files” within that directory. The directory should be reserved for IS processing only.
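A minimal Python sketch of the leading-Z convention, with both sides in one place: the writer creates a Z-prefixed file and renames it when done, and the poller side keeps only the names the filter accepts. Python’s `fnmatch` stands in for the port’s matcher (an assumption); note that it spells the negated class `[!Zz]` rather than `[^Zz]`. `write_then_publish` and `eligible` are hypothetical names.

```python
import os
from fnmatch import fnmatchcase
from typing import List

# fnmatch writes a negated class as [!Zz]; the port's filter in the post is [^Zz]*.
READY_FILTER = "[!Zz]*"


def write_then_publish(directory: str, name: str, data: bytes) -> str:
    """Writer side: write under a Z-prefixed name, then rename to the final name."""
    if name[:1] in ("Z", "z"):
        raise ValueError("final names must not start with Z or z, or the filter skips them")
    temp_path = os.path.join(directory, "Z" + name)
    final_path = os.path.join(directory, name)
    with open(temp_path, "wb") as f:
        f.write(data)
    os.replace(temp_path, final_path)  # same directory, so on POSIX the rename is atomic
    return final_path


def eligible(directory: str) -> List[str]:
    """Poller side: the names a [^Zz]* filter would accept."""
    return sorted(n for n in os.listdir(directory) if fnmatchcase(n, READY_FILTER))
```

The guard in `write_then_publish` matters: under this convention, real file names must never start with Z or z, or the port would ignore them indefinitely.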

HTH.

Hi,

Thanks for the valuable information, but I have a question about one of your points.

Isn’t the problem of picking up half-written files already taken care of by the File Age property that we can set when configuring a file polling port?

Thanks,
Suren

Almost. It narrows the possibility of picking up files that are still open to another process but doesn’t necessarily eliminate it: a writer that pauses for longer than the configured file age leaves a file that looks old enough but isn’t complete. Plus, it introduces additional artificial latency, which may not be acceptable for a given integration.
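A file-age check only sees timestamps, so it cannot tell a finished file from one whose writer has merely paused (a slow network copy, for example). A minimal Python sketch, assuming the age is measured from the file’s last-modified time; `old_enough` is a hypothetical helper, not the port’s implementation:

```python
import os
import time
from typing import List, Optional


def old_enough(directory: str, min_age_s: float, now: Optional[float] = None) -> List[str]:
    """Names of regular files whose last modification is at least min_age_s seconds old.
    Assumes the age is measured from the last-modified time (mtime)."""
    now = time.time() if now is None else now
    ready: List[str] = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) >= min_age_s:
            ready.append(name)
    return ready
```

A writer that stalls for longer than `min_age_s` produces a file that passes this check even though it is incomplete. That is the gap the rename convention closes.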

The foolproof way is for the process dropping off the file to rename it once it is complete (this tip applies to FTP file dropping too).
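For the FTP case, the sender can do the same thing on the server with FTP’s own rename commands (RNFR/RNTO). A minimal sketch with Python’s ftplib; `upload_then_rename` and its default `Z` prefix are hypothetical, chosen to match the `[^Zz]*` filter idea:

```python
import io
from ftplib import FTP


def upload_then_rename(ftp: FTP, final_name: str, data: bytes, temp_prefix: str = "Z") -> None:
    """Upload under a temporary name, then rename once the server has confirmed the transfer."""
    temp_name = temp_prefix + final_name
    # storbinary returns only after the server's completion reply for STOR
    ftp.storbinary(f"STOR {temp_name}", io.BytesIO(data))
    # RNFR/RNTO: the final name appears only once the data is fully on the server
    ftp.rename(temp_name, final_name)
```

The reading side (a polling service or an FTP ls pattern) then ignores anything that still carries the temporary prefix.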

Thanks for the clarification. This brings up one more question about half-written files with FTP, if it’s OK to continue in the same thread.

Is it true that in UNIX you can successfully rename a file that is currently being written by another process? If so, could an FTP GET issued after the rename still retrieve only part of the data?

I’m not sure if it is or not, via FTP.

I’m familiar with instances where the file polling port has picked up in-progress files. The first thing it does is rename the file to move it to the work directory, so it seems it is indeed possible to rename a file that another process is still writing to.

I’ve not heard of a scenario where an FTP client renamed a file successfully and then downloaded it only to find it was incomplete, but I suppose it may be possible.

The key here is to take an approach where two processes are not trying to access a file at the same time. This means using a scheme where the process picking up the file will not even look for a file that is still being written. The writing process should write to a temp filename/directory that the reading process completely ignores. When complete, the writing process renames the file to use a filename and/or directory that the reading process is monitoring. Rename operations within a single file system are atomic. When the rename is done, the file is available for pick up.
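The staging-directory variant of this scheme, sketched in Python under the assumption of a POSIX file system; `publish`, `scan_dir` and `staging_dir` are hypothetical names. The `fsync` is an extra precaution beyond what the post describes, so the bytes are on disk before the name becomes visible to the reader:

```python
import os


def publish(scan_dir: str, staging_dir: str, name: str, data: bytes) -> str:
    """Write into a staging directory the reader never looks at, then move the
    finished file into the scan directory with a single rename."""
    staging_path = os.path.join(staging_dir, name)
    final_path = os.path.join(scan_dir, name)
    with open(staging_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to disk before the file becomes visible
    # Atomic only when both directories are on the same file system; across file
    # systems os.replace raises OSError (EXDEV) instead of copying the way mv does.
    os.replace(staging_path, final_path)
    return final_path
```

Keeping the staging directory on the same file system as the scan directory is what makes the final step a true rename rather than a copy that a poller could observe half-done.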

Yes Reamon, I agree that is the best approach. I haven’t come across that scenario either, because at my previous client we required the source application to follow the same scheme: write the file to a temp file/directory and then rename/move it to the scan directory.

But in one of the threads,
http://www.wmusers.com/forum/showthread.php?p=60053#poststop
one user was able to issue an mv command on a file that was still being written, and the file kept growing in size under the new filename. As I don’t have a UNIX system to test this, I thought I’d raise it here.

Thanks,
Suren

Yes, in the scenario in that thread, it is definitely possible. I’ve seen this myself.
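On UNIX, a rename only changes the directory entry. A process that already has the file open keeps writing to the same underlying file (inode), so the data continues to land in the file under its new name. That is why a rename done by the reader (as when the file polling port moves a file to its work directory) cannot prove the file is complete, while a rename done by the writer after it finishes can. A short POSIX-only Python sketch reproduces the effect; the function and file names are arbitrary:

```python
import os
import tempfile
from typing import Tuple


def rename_while_writing() -> Tuple[int, bytes]:
    """POSIX-only demo: rename a file while a writer still has it open.
    (On Windows the rename is normally refused while the file is open.)"""
    with tempfile.TemporaryDirectory() as d:
        old_path = os.path.join(d, "incoming.dat")
        new_path = os.path.join(d, "renamed.dat")
        with open(old_path, "wb") as writer:
            writer.write(b"first half ")
            writer.flush()
            os.rename(old_path, new_path)  # what mv does, with the writer mid-file
            size_after_rename = os.path.getsize(new_path)
            writer.write(b"second half")   # the open descriptor still refers to the same file
        with open(new_path, "rb") as f:
            return size_after_rename, f.read()


print(rename_while_writing())  # (11, b'first half second half')
```

The file had 11 bytes when it was renamed and kept growing afterwards, which matches the behavior described in the linked thread.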