Could someone please help me understand the difference between a file poll and a scheduled task reading files from a folder?
I am trying to read a certain file every night at 1:00 AM. I think I could do this using either file polling to invoke a processing service OR using a (scheduled) service doing a getFile and then invoking the same processing service.
The only additional thing that I can see with file polling is that it moves your files from the "in" directory to the working directory, and then to the complete or error directory.
Is there more?
There are a number of extras with file polling. Other than specifying IP restrictions and ACLs, you can, e.g., also limit the “maximum number of invocation threads”, which lets you regulate how many threads you dedicate to the polling. This option can be interesting to play with if you have performance issues with incoming traffic.
When it comes to performance, the file polling mechanism is the better option. It also has a nice error handling feature and archives files to a working/error directory, which helps a lot when debugging and reprocessing files without putting an additional burden on the source side.
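To illustrate the idea of capping invocation threads, here is a minimal Python sketch. It is not how IS implements the setting; `process_with_limit` and its `process` helper are hypothetical names, and the thread pool simply stands in for the poller's invocation threads.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def process_with_limit(file_names, max_threads):
    """Process files with at most max_threads running at once.

    Returns (results, peak), where peak is the highest number of
    processing calls observed running at the same time.
    """
    active = 0
    peak = 0
    lock = threading.Lock()

    def process(name):
        # Hypothetical processing service; it only upper-cases the name.
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)  # simulate some work
        with lock:
            active -= 1
        return name.upper()

    # max_workers plays the role of the invocation thread limit.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        results = list(pool.map(process, file_names))
    return results, peak
```

Lowering `max_threads` trades throughput for a smaller load on the server, which is the knob the setting gives you.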
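For anyone who wants to see the directory hand-off spelled out, here is a rough Python sketch of the move-to-working, then move-to-complete-or-error pattern. It is only an illustration under assumed names (`poll_once`, `demo`, and the four directory names are made up here), not the actual file polling port code.

```python
import shutil
import tempfile
from pathlib import Path


def poll_once(monitor_dir, working_dir, done_dir, error_dir, process):
    """Run one polling pass and return the names of files processed successfully."""
    handled = []
    for path in sorted(Path(monitor_dir).iterdir()):
        if not path.is_file():
            continue
        # Move the file out of the monitored folder before processing so the
        # next pass does not pick it up again.
        work_path = Path(working_dir) / path.name
        shutil.move(str(path), str(work_path))
        try:
            process(work_path)
        except Exception:
            # Keep the failed file for debugging and later reprocessing.
            shutil.move(str(work_path), str(Path(error_dir) / path.name))
        else:
            shutil.move(str(work_path), str(Path(done_dir) / path.name))
            handled.append(path.name)
    return handled


def demo():
    """Build a throwaway directory layout, poll it once, report where files ended up."""
    root = Path(tempfile.mkdtemp())
    dirs = {name: root / name for name in ("in", "working", "done", "error")}
    for d in dirs.values():
        d.mkdir()
    (dirs["in"] / "good.txt").write_text("ok")
    (dirs["in"] / "bad.txt").write_text("fail")

    def process(path):
        # Hypothetical processing service: rejects files whose content is "fail".
        if path.read_text() == "fail":
            raise ValueError("cannot process " + path.name)

    handled = poll_once(dirs["in"], dirs["working"], dirs["done"], dirs["error"], process)
    result = {name: sorted(p.name for p in d.iterdir()) for name, d in dirs.items()}
    shutil.rmtree(root)
    return handled, result
```

The failed file ends up in the error folder untouched, so it can be dropped back into the monitored folder later without asking the source system to resend it.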
Thanks RMG and loveloic. That helps!
Another factor you may want to consider is clustering. If you use an IS cluster, file polling is not cluster-aware and can only be used in special cases. Scheduled services do run in a cluster; however unstable the scheduler is, it still runs in clustered mode. The file poller will act independently on each server in a cluster.
So from what I understand, file polling is not cluster-aware. Is there any workaround for that? We are planning to move from a single IS to a clustered environment. I am expecting a lot of surprises, and file polling was one of them. Correct me if I am wrong.
I would be interested in learning about the “special cases” that you refer to in your posting. Could you elaborate or point me to any documentation on this?
Thanks for your help.
File polling will run in a cluster, but if the pollers monitor the same directory and the same file types, it is possible for several servers in the cluster to get the same file. In most cases, some of the servers will throw an erroneous error because the file was already moved to the working directory by another server. Since the file poller does not coordinate its activities in a cluster, this can be a problem depending on the integration.
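The "file was already moved" error can be pictured with a small Python sketch. This is only an illustration of two uncoordinated pollers racing for one file, not IS internals; `claim`, `demo`, and the directory names are hypothetical, and the two attempts run one after the other to keep the outcome predictable.

```python
import os
import shutil
import tempfile
from pathlib import Path


def claim(monitor_dir, working_dir, name):
    """Try to claim a file by renaming it into this node's working directory.

    Returns True if this node got the file, False if it was already gone.
    """
    try:
        os.rename(Path(monitor_dir) / name, Path(working_dir) / name)
        return True
    except FileNotFoundError:
        # Another node moved the file between our directory listing and our move.
        return False


def demo():
    root = Path(tempfile.mkdtemp())
    shared_in = root / "in"
    work_a = root / "working_a"
    work_b = root / "working_b"
    for d in (shared_in, work_a, work_b):
        d.mkdir()
    (shared_in / "orders.csv").write_text("data")
    # Both nodes saw orders.csv in their listing; node A happens to move first.
    node_a = claim(shared_in, work_a, "orders.csv")
    node_b = claim(shared_in, work_b, "orders.csv")
    shutil.rmtree(root)
    return node_a, node_b
```

Here the losing node just reports `False`; a poller that does not expect this situation would surface it as an error instead.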
The special cases I referred to previously are that you can still use the file poller in an IS cluster, but with each server monitoring a different type of file or a different directory so there are no collisions. It's basically vertical scaling instead of real cluster horizontal scalability.
I would not expect too many surprises from an IS cluster. For the most part it works pretty well, especially if your cluster is mainly working as a listener for real-time network requests such as HTTP, IS clients or Broker. The scalability, failover and load balancing ports seem to be working fine. The repository can be a little sensitive and may need to be refreshed at times. The scheduler has some simple but tricky configurations. But as usual, it all depends on what you are deploying and doing with it.
One solution that I am thinking of would be to configure a dedicated server in the cluster for the file polling task. This server picks up the file and publishes it to a shared broker. From there on, any IS that is available can take up the processing of the document.
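As a rough picture of that design, here is a Python sketch in which an in-memory queue stands in for the shared Broker and threads stand in for the IS nodes. All names (`run_pipeline`, `worker`) are hypothetical, and a real Broker adds persistence and delivery guarantees that this toy does not have.

```python
import queue
import threading


def run_pipeline(file_names, worker_count=3):
    """One poller publishes documents; whichever worker is free consumes them.

    Returns a list of (worker_id, document) pairs in processing order.
    """
    broker = queue.Queue()  # stand-in for the shared Broker
    processed = []
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            doc = broker.get()
            if doc is None:  # sentinel: no more documents for this worker
                return
            with lock:
                processed.append((worker_id, doc))

    workers = [threading.Thread(target=worker, args=(i,)) for i in range(worker_count)]
    for t in workers:
        t.start()
    # The dedicated poller is the only component that touches the folder,
    # so each file is published exactly once.
    for name in file_names:
        broker.put(name)
    # One sentinel per worker so every thread shuts down.
    for _ in workers:
        broker.put(None)
    for t in workers:
        t.join()
    return processed
```

Because only one component reads the folder, the duplicate pickup problem goes away, at the cost of making that one poller a single point of failure.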
If that dedicated server goes down too, then what is the next possible solution to recover?
Hmmm…I would think that one would need to have a failover mechanism in place for that situation?
In our case, the file polling will be part of an asynchronous communication and if the server does go down, it will need to be brought up to process the backed up files.
Makes sense… Thanks for the clarification.
Just to add to it: if you have servers in a cluster that perform file polling, schedule a service to copy the file to a local file domain on the server and then process it. This should take care of the clustering problem or any deadlock situation.
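One possible reading of this suggestion, sketched in Python: a scheduled job copies each file from the shared folder to a folder local to the server and works on the local copy. The names `copy_then_process` and `demo` are made up for illustration, and deleting the shared original after success is an assumption added here so the next run does not pick the file up again.

```python
import shutil
import tempfile
from pathlib import Path


def copy_then_process(shared_dir, local_dir, process):
    """Copy each file from the shared folder to a local folder, then process the copy."""
    results = []
    for src in sorted(Path(shared_dir).iterdir()):
        if not src.is_file():
            continue
        local_copy = Path(local_dir) / src.name
        shutil.copy2(src, local_copy)
        results.append(process(local_copy))
        # Assumption: remove the shared original after success so the next
        # scheduled run does not process it a second time.
        src.unlink()
    return results


def demo():
    root = Path(tempfile.mkdtemp())
    shared = root / "shared"
    local = root / "local"
    shared.mkdir()
    local.mkdir()
    (shared / "report.txt").write_text("hello")
    # Hypothetical processing service: returns the upper-cased file content.
    results = copy_then_process(shared, local, lambda p: p.read_text().upper())
    remaining = [p.name for p in shared.iterdir()]
    shutil.rmtree(root)
    return results, remaining
```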