Loop and appendToDocumentList

Hi,

I want clarification regarding the usage of loop and appendToDocumentList.

In a situation where we want to create an output list depending on the incoming document list, we can do it in two ways.

One is to use the Output Array of the LOOP step, and the other is to use the built-in service appendToDocumentList.

I want to know the ideal situation for using each of them, and any pros and cons of these methods.

I will appreciate your comments.

Thanks

Hello,

As per my understanding: if the number of input records is equal to the number of output records, you should go for the Output Array; otherwise, use the appendToDocumentList service.

Regards,
Sasanka

If the array size is the same, I use “output array”. If not, I use addToList (the one from PSUtilities; I make a copy of it). I stay away from appendToDocList.

Hello,

You can also use indices in the loop over the input array to form an output array if the number of input records is not equal to the number of output records.

Thanks,
Sreenivas
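
The index-based approach Sreenivas describes can be sketched in plain Java as below. This is an illustration only: String[] stands in for an IS document list, the selection rule is hypothetical, and it ASSUMES that writing past the end of the output list grows it by reallocating (modelled with Arrays.copyOf); it is not how IS is documented to behave internally.

```java
import java.util.Arrays;

// Sketch of building an output list with a separate index counter.
// ASSUMPTION: writing past the end of the list reallocates it (Arrays.copyOf).
class DynamicIndex {
    // Writes value at index, growing the array first if the index is past the end.
    static String[] setAt(String[] list, int index, String value) {
        if (index >= list.length) {
            list = Arrays.copyOf(list, index + 1); // reallocate and copy to fit the index
        }
        list[index] = value;
        return list;
    }

    // Keeps only the documents that pass a hypothetical selection rule.
    static String[] filter(String[] input) {
        String[] output = new String[0];
        int outIndex = 0;                          // next free slot in the output
        for (String doc : input) {
            if (doc.length() > 1) {                // hypothetical selection rule
                output = setAt(output, outIndex++, doc);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(filter(new String[] {"ab", "c", "de"}))); // [ab, de]
    }
}
```

Unlike a preallocated output array, this leaves no null entries, but under the stated assumption every write past the end copies the list, which would be consistent with the ranking reported further down in this thread.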

Guys, thanks for the reply.

I am looking more at the pros and cons of using both.

As haragopal said, “I stay away from appendToDocList”. Is there any reason behind this?

According to my understanding, there are performance issues: people have observed degraded performance when appendToDocumentList is used. I am looking for information more like this.

For large lists, appendToDocumentList performs poorly because of the way it is implemented. Every time you call appendToDocumentList, it basically creates a brand new list with a size equal to the original size plus 1 (assuming you’re appending one item). It then copies all the items from the original list to the new list and puts the appended item at the end. This frequent memory reallocation and copying of data is what gives you the performance hit.
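
To make that cost concrete, here is a plain-Java sketch of the copy-on-append pattern described above. It uses String[] as a stand-in for an IS document list and is not the actual WmPublic source; it only counts how many element copies the pattern performs.

```java
import java.util.Arrays;

// Sketch of copy-on-append: a new array one slot larger on every append.
class CopyOnAppend {
    // Appends one item by allocating a new array of length + 1 and copying
    // every existing element into it.
    static String[] append(String[] list, String item) {
        String[] bigger = Arrays.copyOf(list, list.length + 1);
        bigger[list.length] = item;
        return bigger;
    }

    // Builds an n-item list one append at a time and returns the number of
    // element copies made: 0 + 1 + ... + (n - 1) = n(n - 1) / 2.
    static long copiesToBuild(int n) {
        String[] list = new String[0];
        long copies = 0;
        for (int i = 0; i < n; i++) {
            copies += list.length;          // Arrays.copyOf copies every existing element
            list = append(list, "doc" + i);
        }
        return copies;
    }

    public static void main(String[] args) {
        // Prints 499500, 1999000 and 7998000 copies for 1000, 2000 and 4000 appends.
        for (int n : new int[] {1_000, 2_000, 4_000}) {
            System.out.println(n + " appends -> " + copiesToBuild(n) + " element copies");
        }
    }
}
```

Doubling the number of appends roughly quadruples the copying work. java.util.ArrayList avoids this by growing its backing array geometrically, so appends are amortized O(1).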

When you use an output array, you can assume that the output array is allocated once with the same size as the input array so you don’t run into this problem. The problem with output array is that if you have a condition within the loop (e.g. a BRANCH) that prevents you from mapping a value into the output array, the element at that index of the array will be null, which may not be what you expect. That’s why many folks have indicated that you should use “output array” when the source list and target list are of the same size.
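
The null-entry pitfall can be reproduced with a small plain-Java model of a LOOP with Output Array. String[] stands in for the document lists, an if statement plays the role of the BRANCH, and the "keep" prefix condition is purely hypothetical.

```java
import java.util.Arrays;

// Sketch of a LOOP with Output Array where a BRANCH skips some iterations.
class OutputArrayNulls {
    // The output is allocated once with the input's length; each iteration
    // may or may not map a value into its slot.
    static String[] mapWithBranch(String[] input) {
        String[] output = new String[input.length];  // allocated once, same size as input
        for (int i = 0; i < input.length; i++) {
            if (input[i].startsWith("keep")) {         // hypothetical BRANCH condition
                output[i] = input[i].toUpperCase();
            }
            // on the other path nothing is mapped, so output[i] stays null
        }
        return output;
    }

    public static void main(String[] args) {
        String[] out = mapWithBranch(new String[] {"keep-a", "skip-b", "keep-c"});
        System.out.println(Arrays.toString(out)); // [KEEP-A, null, KEEP-C]
    }
}
```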

I ran a test for a customer a few years ago to compare different methods of mapping a source list to a target list involving large lists (up to 100,000 items), and at the time, they ranked as follows from fastest to slowest:

  1. Java Loop: looping done purely in Java with a Java service
  2. Implicit Loop: for simple lists of the same size, you may want to link them directly in a MAP step and let the IS handle looping implicitly
  3. Explicit Loop: using a LOOP step and its Output Array
  4. Append to Array List: similar to append to document list except that an array list is used in the background so there’s no reallocation and copying of data. It is important to set an appropriate initial size to maximize performance.
  5. Append to Document List: using the WmPublic service
  6. Dynamic Index: using a LOOP step without specifying Output Array and mapping to a specific item in the output list using a variable index
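
Method 4 above can be sketched in plain Java as follows, assuming the source list's length is a reasonable upper bound for the output, so it can serve as the initial capacity. String stands in for an IS document, and the filter and trim logic is a hypothetical placeholder for real mapping.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of "Append to Array List": collect into a pre-sized ArrayList,
// then convert to an array once at the end.
class AppendToArrayList {
    static String[] filterAndMap(String[] source) {
        // Initial capacity = source length (upper bound on output size),
        // so the backing array never has to grow during the loop.
        List<String> target = new ArrayList<>(source.length);
        for (String doc : source) {
            if (!doc.isEmpty()) {               // output need not be 1-to-1 with input
                target.add(doc.trim());         // hypothetical mapping step
            }
        }
        return target.toArray(new String[0]);  // single conversion back to an array
    }

    public static void main(String[] args) {
        String[] out = filterAndMap(new String[] {" a ", "", "b"});
        System.out.println(out.length + " " + String.join(",", out)); // 2 a,b
    }
}
```

Because the list is converted only once, no null holes appear and nothing is copied per append.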

NOTE: for methods 1 through 4, the time taken to copy the lists grew linearly with the size of the list, whereas for methods 5 and 6 it grew much faster than linearly.

Hope this helps,
Percio

One last thing: one natural question after hearing or reading “appendToDocumentList performs poorly for larger lists” is: what is a large list? It’s a good question, but unfortunately, it depends. It depends on the physical resources available to the IS, it depends on the complexity of your mapping, etc.

For that reason, many choose to take the approach recommended by haragopal, which is: use the array list approach if there isn’t a 1-to-1 relationship between items in the source list and items in the target list. In other words, forget about appendToDocumentList.

Percio

The key for all of the options listed in Percio’s excellent summary is to test which approach works for your situation. Do not assume one approach will be faster than another. Measure.

In recent tests of my own with appendToDocumentList (though not with the number of elements Percio used), I found that performance had improved dramatically compared with the same tests I had done a few years ago. The JVM version and settings being used will undoubtedly have a big impact on this performance.

As Percio noted, there is no one answer. So the key is to try out different approaches until you get the performance your integration needs.
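
As a starting point for such measurements, here is a rough plain-Java timing harness comparing copy-on-append with a pre-sized ArrayList. It is only a sketch: String[] stands in for IS document lists, a single timed run after a short warm-up is noisy, and a proper harness such as JMH (Java Microbenchmark Harness) gives more trustworthy numbers. Results will vary with JVM version and settings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntFunction;

// Rough, single-shot timing of two list-building strategies.
class RoughTiming {
    // Copy-on-append: a new array one slot larger for every item.
    static String[] byCopying(int n) {
        String[] list = new String[0];
        for (int i = 0; i < n; i++) {
            list = Arrays.copyOf(list, list.length + 1);
            list[list.length - 1] = "doc" + i;
        }
        return list;
    }

    // Pre-sized ArrayList, converted to an array once at the end.
    static String[] byArrayList(int n) {
        List<String> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            list.add("doc" + i);
        }
        return list.toArray(new String[0]);
    }

    // Runs the strategy a few times to warm up the JIT, then times one run.
    static long timeMillis(IntFunction<String[]> strategy, int n) {
        for (int i = 0; i < 3; i++) {
            strategy.apply(n);                           // warm-up runs, results ignored
        }
        long start = System.nanoTime();
        String[] result = strategy.apply(n);
        long elapsed = System.nanoTime() - start;
        if (result.length != n) {
            throw new IllegalStateException("unexpected list size " + result.length);
        }
        return elapsed / 1_000_000;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 2_000, 8_000}) {
            System.out.printf("n=%d copy-on-append=%d ms, ArrayList=%d ms%n", n,
                    timeMillis(RoughTiming::byCopying, n), timeMillis(RoughTiming::byArrayList, n));
        }
    }
}
```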

Thank you very much guys.

Percio, I really appreciate the information and test result shared by you.

Hello there,
I just want to share some code and put in my two cents’ worth, based on the concepts I’ve read so far:


		// Required imports (declared in the Java service's imports):
		// java.util.List, java.util.ArrayList, com.wm.data.IData, com.softwareag.util.IDataMap

		// Pipeline input
		// IDataMap is an alternative to the IDataCursor interface
		IDataMap iMap = new IDataMap(pipeline);
		
		// Destination list that collects all documents in order
		List<IData> toList = new ArrayList<>();
		
		// Copy the existing "toList" document list (if any) first, so it keeps its position
		IData[] toListIData = iMap.getAsIDataArray("toList");
		if (toListIData != null) {
			for (IData iData : toListIData) {
				toList.add(iData);
			}
		}
		
		// Then append every document from the "fromList" document list (if any)
		IData[] fromList = iMap.getAsIDataArray("fromList");
		if (fromList != null) {
			for (IData iData : fromList) {
				toList.add(iData);
			}
		}
		
		// Finally append the single "fromItem" document (if any)
		IData fromItem = iMap.getAsIData("fromItem");
		if (fromItem != null) {
			toList.add(fromItem);
		}
		
		// Pipeline output: convert the List back to an IData[] and overwrite "toList"
		iMap.put("toList", toList.toArray(new IData[toList.size()]));

The objective is to:

  1. Create an ArrayList named toList as the destination. The existing DocList “toList” (if there is one) must be added first, so that its documents come first, and all newly appended documents from the DocList “fromList” and/or the Document “fromItem” are placed after them.
  2. Capture the incoming “toList” IData[] from the pipeline (in the variable toListIData; this is the destination DocList, if any) and iterate over it, adding each document to the toList ArrayList.
  3. Capture the incoming “fromList” IData[] from the pipeline and iterate over it, appending each document to the toList ArrayList.
  4. Capture the incoming “fromItem” IData from the pipeline and append it to the toList ArrayList.
  5. Convert the toList ArrayList into an IData[] with toArray(new IData[toList.size()]) and put it into the pipeline output.

DISCLAIMER
There are some other things to consider:

  1. The time complexity of this operation is linear, O(n), in the combined length of “toList” and “fromList”.
  2. I haven’t found a way to convert an IData[] directly into a List/Vector/Set, so that I could append without looping. If I find one, I’ll update my code.
  3. I also haven’t compared the performance against pub.list:appendToDocumentList, so I don’t know whether my code performs better.
  4. I compiled the code on Integration Server 9.12 (IS_9.12_Core_Fix21); Java version: 1.8.0_202 (52.0).
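
Regarding point 2, the standard library already offers Collections.addAll and Arrays.asList for adding a whole array to a List without a hand-written loop. The sketch below shows the same toList/fromList/fromItem merge as a generic method; in a real IS Java service T would be IData, but it is left generic here (an assumption for illustration) so the example compiles without the IS libraries. The method name merge is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Generic sketch of the merge above; null inputs are skipped as in the original.
class AppendLists {
    static <T> List<T> merge(T[] toList, T[] fromList, T fromItem) {
        // Size the list once so it never has to grow while appending.
        int capacity = (toList == null ? 0 : toList.length)
                + (fromList == null ? 0 : fromList.length)
                + (fromItem == null ? 0 : 1);
        List<T> merged = new ArrayList<>(capacity);
        if (toList != null) {
            Collections.addAll(merged, toList);          // existing documents first
        }
        if (fromList != null) {
            merged.addAll(Arrays.asList(fromList));      // then the incoming list
        }
        if (fromItem != null) {
            merged.add(fromItem);                        // then the single document
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> out = merge(new String[] {"a", "b"}, new String[] {"c"}, "d");
        System.out.println(Arrays.toString(out.toArray(new String[0]))); // [a, b, c, d]
    }
}
```

Both calls still copy each element once, so the cost stays linear; they only remove the explicit loops. Note that Arrays.asList returns a fixed-size view of the array, so it is used here only as a source for addAll, never as the destination list.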

I hope this answer helps you…

Thank you

Kind regards,

Ivan