When we want to introduce parallel processing, we use pub/sub. It is an almost trivial way to do so, and it avoids all the “fun” of managing async service invocations and joins. We never drop to Java to do multi-threading in IS.
Quick terminology glossary to aid communication:
- Scheduler/scheduling - facility within IS to define scheduled tasks. There is only one of these in IS. It manages execution of tasks.
- Task - configuration that identifies a service to be run at specified times or intervals. (Many people refer to these as “schedulers” for some reason, but the docs and UI call these tasks or “user tasks”.)
- Service - any IS-hosted service can be defined in a task to be executed.
Assuming you have Universal Messaging (or Broker, if you are still using it), here is an approach that may fit the scenario of interacting with multiple SFTP servers at once, but with only one active interaction per SFTP server at a time.
- Parent service executed periodically via a scheduled task. It obtains the list of SFTP servers and, for each one, publishes a document that identifies the SFTP “source” in a field of the publishable document type.
- Define a trigger for each SFTP source server, with a filter on that source field. Set each trigger to use serial processing.
- Depending on how the interactions with each SFTP server are defined (hard-coded or configuration-driven), each trigger invokes either a common service (configuration-driven) that uses the published document as input to determine what to do, i.e. multiple triggers and one common service for all; or a service specific to the given SFTP server, i.e. one trigger and one service for each SFTP server.
The serial setting ensures that just one thread is active, on just one of the IS instances in the cluster, at a time (edit: this does not mean an “IS cluster”, but multiple identical IS instances connected to the same UM). The documents each trigger receives are processed serially.
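To make the intended behavior concrete, here is a minimal in-process analogy in plain Java. It is not the IS/UM trigger API, and it does not settle the caveat below about how UM actually distributes filtered documents. The record `SftpPollRequest` stands in for the publishable document type, each single-threaded executor stands in for one serial trigger whose filter matches one source, and `ParentService.run` stands in for the scheduled parent service. All of these names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical stand-in for the publishable document type: one field naming the SFTP "source".
record SftpPollRequest(String source) {}

// In-process analogy only (NOT the IS/UM trigger API): each SFTP source gets its own
// single-threaded executor, playing the role of a serial trigger filtered on that source.
class FanOut {
    private final Map<String, ExecutorService> triggers = new LinkedHashMap<>();
    private final Consumer<SftpPollRequest> commonService;

    FanOut(List<String> sources, Consumer<SftpPollRequest> commonService) {
        this.commonService = commonService;
        for (String source : sources) {
            // One thread per source: documents for the same source are handled one at a time,
            // while different sources can be handled in parallel.
            triggers.put(source, Executors.newSingleThreadExecutor());
        }
    }

    // "Publish": deliver the document to the trigger whose filter matches its source.
    // Returns false when no trigger matches (the document is not picked up here).
    boolean publish(SftpPollRequest doc) {
        ExecutorService trigger = triggers.get(doc.source());
        if (trigger == null) {
            return false;
        }
        trigger.submit(() -> commonService.accept(doc));
        return true;
    }

    // Stop accepting work and wait for in-flight documents to finish.
    void shutdown() {
        triggers.values().forEach(ExecutorService::shutdown);
        try {
            for (ExecutorService trigger : triggers.values()) {
                trigger.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

// Stand-in for the parent service run by the scheduled task: one document per SFTP source.
class ParentService {
    static void run(List<String> sftpSources, FanOut bus) {
        for (String source : sftpSources) {
            bus.publish(new SftpPollRequest(source));
        }
    }
}
```

Passing one `Consumer` for every source mirrors the configuration-driven variant (many triggers, one common service); the hard-coded variant would map each source to its own consumer instead.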
That said, there is a caveat. I am not 100% certain that using filters in triggers for the same doc type will distribute multiple events to run in parallel. You’ll need to confirm the behavior.
A potentially big downside to this approach, depending upon the specifics of your implementation, is that every time you add an SFTP target, you would need to create a trigger (with a filter).
Another possibility is to not retrieve the list of SFTP servers to poll from somewhere else, and instead define a scheduled task for each SFTP server. One advantage is that the polling interval would not need to be the same for all SFTP servers: some can be polled once a day, others every 5 minutes.
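As a rough illustration of "one task per server, each with its own interval", here is a sketch using `java.util.concurrent.ScheduledExecutorService` as a stand-in for the IS scheduler. It is an analogy, not how IS tasks are configured; `pollService` and the server names are hypothetical.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Analogy only: the ScheduledExecutorService stands in for the IS scheduler, and each
// scheduleWithFixedDelay call stands in for one scheduled task per SFTP server.
class PerServerTasks {
    static Map<String, ScheduledFuture<?>> scheduleAll(ScheduledExecutorService scheduler,
                                                      Map<String, Duration> intervalPerServer,
                                                      Consumer<String> pollService) {
        Map<String, ScheduledFuture<?>> tasks = new LinkedHashMap<>();
        intervalPerServer.forEach((server, interval) -> {
            Runnable poll = () -> {
                try {
                    pollService.accept(server);
                } catch (RuntimeException e) {
                    // A run that throws cancels all later runs of that scheduled task,
                    // so log and swallow to keep polling this server on the next interval.
                    System.err.println("Poll of " + server + " failed: " + e);
                }
            };
            // Fixed delay: the next run for this server starts only after its previous run
            // has finished, so runs for the same server never overlap.
            tasks.put(server, scheduler.scheduleWithFixedDelay(
                    poll, 0, interval.toMillis(), TimeUnit.MILLISECONDS));
        });
        return tasks;
    }
}
```

For example, `Map.of("partner-a", Duration.ofDays(1), "partner-b", Duration.ofMinutes(5))` (hypothetical server names) gives each server its own polling rate without any shared list lookup.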
My advice: keep it as simple as possible. The “stop a duplicate…” and “Scheduler is not allowed to run another instance…” approaches are getting a bit involved and error-prone.
Here is yet another possibility that may be workable and allow any number of simultaneous threads against a single SFTP server.
- Get list of files from SFTP server. If none, exit.
- For each file:
  - Rename the file to “reserve” it for this thread. Either add a suffix to the file name (something that the “list of files” step is not looking for) or move it to another directory (a move is a rename).
  - If the rename fails, ignore the file and move on to the next one, because another thread has already reserved it.
  - Retrieve the renamed file. Move or delete it when done.
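The reserve-by-rename steps above can be sketched with `java.nio.file`, with a local directory standing in for the SFTP server. This is a simulation under that assumption: on a real SFTP server you would use your SFTP client's list and rename operations, and the exact error a losing rename produces depends on the server. The `*.csv` pattern, the `.inprogress` suffix, and the `done` directory are hypothetical choices.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class ReserveByRename {
    static final String PATTERN = "*.csv";               // what "list of files" looks for (hypothetical)
    static final String RESERVED_SUFFIX = ".inprogress"; // a name PATTERN does not match

    // One polling pass; safe to run from any number of threads. Returns files this caller processed.
    static int pollOnce(Path inbox, Path done, Consumer<Path> process) throws IOException {
        List<Path> candidates = new ArrayList<>();
        try (DirectoryStream<Path> listing = Files.newDirectoryStream(inbox, PATTERN)) {
            for (Path p : listing) {
                candidates.add(p);
            }
        }
        if (candidates.isEmpty()) {
            return 0; // nothing to do: exit
        }
        int processed = 0;
        for (Path file : candidates) {
            Path reserved = file.resolveSibling(file.getFileName() + RESERVED_SUFFIX);
            try {
                // The rename is the reservation: only one caller can move the original name away.
                Files.move(file, reserved, StandardCopyOption.ATOMIC_MOVE);
            } catch (NoSuchFileException e) {
                continue; // another thread reserved this file first; move on to the next one
            }
            process.accept(reserved);
            // Done: move the reserved file out of the inbox.
            Files.move(reserved, done.resolve(file.getFileName()), StandardCopyOption.REPLACE_EXISTING);
            processed++;
        }
        return processed;
    }
}
```

Because the rename either succeeds for exactly one caller or fails, several threads (or IS instances) can run `pollOnce` against the same directory without processing a file twice.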
We use this approach frequently. Works fine.
Another technique that should almost always be used: when writing a file for some other process to pick up, write the file to a name that the other process is not looking for, e.g. use .tmp instead of .csv, or write to another directory (on the same volume). When done, rename the file to the needed name. This prevents the other process from picking up files that are still being written to, or partial files left behind because the app writing them failed.
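A minimal sketch of write-then-rename with `java.nio.file`, assuming the consuming process only picks up the final name (for example `*.csv`) in the target directory. The method name `deliver` and the choice of appending `.tmp` to the final name are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

class WriteThenRename {
    // Write under a name the downstream process ignores (".tmp"), then rename to the final name.
    // Assumes the consumer only looks for the final name/extension (e.g. *.csv) in dir.
    static Path deliver(Path dir, String finalName, List<String> lines) throws IOException {
        Path tmp = dir.resolve(finalName + ".tmp");
        Path target = dir.resolve(finalName);
        try {
            Files.write(tmp, lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            // A failed write leaves at most a .tmp file; the consumer never sees a partial target.
            Files.deleteIfExists(tmp);
            throw e;
        }
        // Same directory (same volume), so this is a plain rename; ATOMIC_MOVE throws
        // instead of silently falling back to copy-and-delete.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        return target;
    }
}
```

The same-volume requirement matters: across volumes a "move" becomes a copy, and the consumer could again see a partially written file.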