Best practices for breaking up large CSV files and submitting to Amazon SNS

We are using webmethods.io Integration cloud to pick files up from our SFTP folder, take a file (most commonly a large CSV, ~100k-200k rows), normalize the records to our JSON schema, and then submit the records to Amazon SNS. All of this is currently handled in a flow service. The flow submits each row in the file as a separate SNS message, but that is too transactionally expensive because the SNS connection is opened for every row. Does anyone have recommendations for making this flow viable?

We’re considering:

  • Chunking: parsing the CSV file into smaller batches and sending each batch to SNS as a single message. For example, we take a 1,000-line file, break it into four 250-row batches, and send each batch as one SNS message (versus sending 1,000 messages, one row per message).
  • Better flow service performance: are we missing basic functions, or some way of holding an SNS connection open, that would let us send large volumes of messages more efficiently without the cost of opening and closing the Amazon connection for every row?
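To make the chunking idea concrete, here is a minimal Python sketch (purely illustrative, outside of webmethods.io — the function names and the 250-row batch size are assumptions) that parses a CSV and serializes each fixed-size batch of rows into one JSON payload, ready to be sent as a single message:

```python
import csv
import io
import json
from itertools import islice

def batch_rows(reader, batch_size):
    """Yield successive lists of up to batch_size rows from a csv.DictReader."""
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch

def csv_to_json_batches(csv_text, batch_size=250):
    """Parse CSV text and return one JSON string per batch of rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [json.dumps(batch) for batch in batch_rows(reader, batch_size)]

# Example: a 1,000-row file with batch_size=250 yields 4 messages.
csv_text = "id,name\n" + "\n".join(f"{i},row{i}" for i in range(1000))
messages = csv_to_json_batches(csv_text, batch_size=250)
```

The same idea applies regardless of runtime: the cost moves from one publish per row to one publish per batch, at the price of the consumer having to unpack a JSON array.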

Hi Alden,
As I understand it, you can send the records in batches; this will definitely improve performance.
You should also consider processing the messages in parallel.
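A rough sketch of the parallel-processing idea (plain Python with a thread pool and a stand-in SNS client so it runs without AWS credentials — the names and the worker count are assumptions, not webmethods.io flow syntax):

```python
import json
from concurrent.futures import ThreadPoolExecutor

def publish_batch(client, topic_arn, batch):
    """Publish one batch of rows as a single message (stand-in for an SNS publish)."""
    return client.publish(TopicArn=topic_arn, Message=json.dumps(batch))

class FakeSnsClient:
    """Stand-in client that records published messages instead of calling AWS."""
    def __init__(self):
        self.published = []
    def publish(self, TopicArn, Message):
        self.published.append(Message)
        return {"MessageId": str(len(self.published))}

# Four batches of 50 rows each, published concurrently by 4 workers.
batches = [[{"id": i} for i in range(n * 50, (n + 1) * 50)] for n in range(4)]
client = FakeSnsClient()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda b: publish_batch(client, "arn:example", b), batches))
```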

Regards
Vikash Sharma

Thanks for the confirmation! Do you know where we can find notes or approach documents online that describe how to send the files in batches or process the messages in parallel?

Hi Mitchell,

Are you asking for documentation pertaining to webmethods.io Integration (iPaaS), or for the general on-premise TN (Trading Networks) batch implementation/configuration? Please clarify.

HTH,
RMG

We’re on the webmethods.io ipaas solution. Thanks!

OK, then maybe Vikash can advise here on whether any iPaaS blogs or white-paper documentation is already available for a batching implementation!

HTH,
RMG

ok thanks! will see if @Vikash_Sharma1 has any thoughts.

Hi Alden,
Below is the approach I can think of:

  1. Receive the file.
  2. Loop over the received file.
  3. Create batches of 50 or 100 rows (depending on the size of the data).
  4. Submit each batch to SNS.

So in this case, if we go with a batch size of 50 and have 200 rows in total, we will connect to SNS only 4 times.
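The arithmetic above can be sketched as follows (plain Python with a counting stub in place of a real SNS connection — the helper names are assumptions for illustration):

```python
import json

class CountingPublisher:
    """Stub that counts how many times we 'connect' to SNS."""
    def __init__(self):
        self.calls = 0
    def publish(self, message):
        self.calls += 1

def submit_in_batches(rows, publisher, batch_size=50):
    """Steps 2-4: loop over the rows, build batches, submit each batch once."""
    for start in range(0, len(rows), batch_size):
        publisher.publish(json.dumps(rows[start:start + batch_size]))

rows = [{"row": i} for i in range(200)]   # step 1: the received file, 200 rows
publisher = CountingPublisher()
submit_in_batches(rows, publisher, batch_size=50)
# 200 rows / 50 rows per batch -> 4 publishes
```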

As of now we don't have this kind of use case documented, but we will definitely work on it.

You can also refer to the link below; it discusses a similar use-case implementation on on-premise IS.

As for the payload size that webmethods.io Integration can handle, it depends on the platform on which your tenant is hosted.

Refer to the link below for details:
https://docs.webmethods.io/integration/workflow_building_blocks/payload_and_log_data_support/#gsc.tab=0

Regards
Vikash Sharma

Ok thanks, we will take a look!

I haven’t yet found a webmethods.io flow service function similar to pub.json:getArrayIterator or pub.json:getNextBatch, but will continue to look.

Let me know if a similar use case pops up or gets documented. Thanks!

Alden