vectorToArray Vs appendToDocumentList

I have implemented addItemToVector and vectorToArray for building lists instead of appendToDocumentList.
In my experience, vectors have worked perfectly for me, but recently one of the Performance Testers pointed out that this causes a performance delay and that I should use appendToDocumentList instead of vectorToArray.

Is this correct?

No, it’s most probably not correct. Under most circumstances, appendToDocumentList is slower than other methods of dealing with lists.

There is an article somewhere (or maybe even a post in the forums) with a detailed analysis (including performance) of different ways to cope with lists.

That said, I never use anything but appendToDocumentList because I never have to process huge lists. In that case the performance penalty is negligible. Modern JVMs are quite good at memory allocation. And the code remains more readable, which is an important virtue in my view.


Did the “Performance Tester” make this as a general guideline statement? Or was it indicated based upon profiling of the integration and a determination was made that the majority of the time was being spent in building the list?

My guess is that it was a general statement. It likely falls into the category of “premature optimization”, which is to be avoided because it is almost always ineffective.

You should use appendToDocumentList unless you find specific evidence that indicates you need to use some other approach (and replacing it with Vector is suspect, but that’s a different discussion). Most likely, unless you’re handling thousands of entries, this will not be a meaningful performance point.


It is best to use vectorToArray when you want to loop and create multiple document lists.

To make it more clear and simple to understand -
appendToDocumentList takes more time and memory because it needs to carry the entire fromList and toList in the pipeline, which degrades performance.
vectorToArray takes less time and gives better-performing results.

We can demonstrate the performance to the Performance Tester with a sample POC that uses the same code/input/output, once with appendToDocumentList and once with vectorToArray.

“Best” is a subjective term, which requires analysis and evaluation of characteristics. This thread mentions time and memory, but the question is whether or not that time and memory actually matter.

Using Vector may or may not be more performant, depending on a number of factors.

A Vector still uses an array. And it is re-allocated as needed behind the scenes. Performance will depend on the initial and increment size inputs when the Vector was created.
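A minimal sketch of that growth behavior in plain Java. The class name `VectorGrowthDemo` and the sizes are made up for illustration; `Vector(int initialCapacity, int capacityIncrement)` and `capacity()` are standard `java.util.Vector` members. Per the Javadoc, an increment of zero or less makes the backing array double each time it needs to grow.

```java
import java.util.Vector;

// Hypothetical demo class; not part of Integration Server.
class VectorGrowthDemo {

    // Adds `count` items to a Vector created with the given sizing
    // and returns the capacity of its backing array afterwards.
    static int capacityAfter(int initialCapacity, int capacityIncrement, int count) {
        Vector<String> v = new Vector<>(initialCapacity, capacityIncrement);
        for (int i = 0; i < count; i++) {
            v.add("item" + i);
        }
        return v.capacity();
    }

    public static void main(String[] args) {
        // Increment 0: the backing array doubles when it fills up (4 -> 8).
        System.out.println(capacityAfter(4, 0, 5)); // 8
        // Increment 2: the backing array grows by 2 slots at a time (4 -> 6).
        System.out.println(capacityAfter(4, 2, 5)); // 6
    }
}
```

Each growth step allocates a new array and copies the old contents, so a poorly chosen initial size or increment can bring back much of the copying the Vector was meant to avoid.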

Vector is synchronized which can impact performance.

Don’t get me wrong – using Vector is fine if the situation warrants. My objection is to the generalization that one should always use Vector instead of the appendToDocumentList service because of a view that doing so is “better” or faster.

Using Vector also has these considerations:

  • Custom code. Instead of using built-in services, which SAG has tested for you, you’re creating yet another component that you need to maintain.

  • Additional call to get the array behind the Vector – which actually re-allocates a new array.

  • People new to your environment will need to be advised to avoid the built-in services and instead use your custom services.

It’s okay to go down this path – but do so because you’ve measured the time it takes, and that time is meaningful to the integration, and reducing the time taken will measurably impact the overall integration, not simply because of a general “it will be faster” declaration. Otherwise you’re adding complexity (relatively) for no benefit.

Additional info: for me, I ended up replacing the use of appendToDocumentList once in the past. It turned out that the list size was such that after about 1000 records, the time taken was significantly impacting the integration. This was 10+ years ago. (I think the post that fml2 references above is one in which I participated and provided this same info.)

Just a few years after that, the same measurements I had done to warrant the change, came back different. The performance for the same size of list that was slow before, was now not slow. JVM advances helped.

As fml2 and I suggest, measure first. Then change if warranted. In the past 10 years or so, after that change I described, I’ve never needed to replace the use of appendToDocumentList. Chances are, you don’t need to either.
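One rough way to "measure first" outside Integration Server is a small timing sketch like the one below. Everything in it is an assumption for illustration: plain `String` arrays stand in for document lists, `buildByArrayCopy` imitates growing the list one entry per call, and `buildByList` imitates the accumulate-then-convert pattern. Single-shot timings like this are noisy because of JIT warm-up, so treat the numbers as a hint and confirm them by profiling the real integration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical timing sketch; plain String arrays stand in for document lists.
class AppendTimingSketch {

    // Grows the array by one slot per item, copying everything each time.
    static String[] buildByArrayCopy(int n) {
        String[] list = new String[0];
        for (int i = 0; i < n; i++) {
            list = Arrays.copyOf(list, list.length + 1);
            list[list.length - 1] = "doc" + i;
        }
        return list;
    }

    // Accumulates in a growable list and converts to an array once.
    static String[] buildByList(int n) {
        List<String> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            list.add("doc" + i);
        }
        return list.toArray(new String[0]);
    }

    // Returns the elapsed nanoseconds for one run of the task.
    static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        for (int n : new int[] {100, 1_000, 10_000}) {
            long copyNs = timeNanos(() -> buildByArrayCopy(n));
            long listNs = timeNanos(() -> buildByList(n));
            System.out.printf("n=%d arrayCopy=%dus list=%dus%n",
                    n, copyNs / 1_000, listNs / 1_000);
        }
    }
}
```

The list size at which the two columns start to diverge noticeably is the number that matters; if your real lists never get near it, the choice of approach makes no practical difference.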

Hi Rob,

You are right, I have mentioned time and memory because in integration space we need them more frequently.

I agree that using Vector may or may not be performant depending on a number of factors.
But at the same time, appendToDocumentList is not performant either when compared to vectorToArray. That is why we were discussing using vectorToArray instead of appendToDocumentList.

I would rather say that one should not always use Vector instead of appendToDocumentList if only a small amount of data passes through the integration layer.

Could you please let us know what custom code you might have used? It would be useful for others to know the alternatives.
Note: vectorToArray is a built-in service in the SAG product suite.

Thanks for the reminder about addItemToVector/vectorToArray being built-in services. They didn’t use to be. :) I don’t recall when they were added. My point about “custom code” vs. built-in services is clearly off base here. My apologies.

However, the overall point about “measure before optimizing” still stands.

For me, if I find that using appendToDocumentList is a meaningful bottleneck, I’m likely to use something other than Vector. Indeed, if my memory isn’t faulty I think I used LinkedList in the past (long ago).

Be careful about generalizations such as “small amount.” What number is that? It will vary depending upon a number of factors.

For Shubham, if the Performance Tester is suggesting a particular path, what is the evidence they have for the recommendation? And for your specific integration, does this debate matter at all? Chances are high that it makes no difference which you use.

Exactly my point. It makes no difference which approach we use to add items to a list. The concern then comes down to the readability of the Vector approach.
Their concern was that maybe the support team won’t understand the concept of addItemToVector and vectorToArray.
From whatever I have read, it’s best to use vectors in a multi-threaded environment. If it is single-threaded, a doc list should be used because vectors will heat up the performance unnecessarily.
Not sure I understand “vectors will heat up the performance”.

It would seem we are thinking similarly. :)

Performance is not likely the concern here. Readability is the primary aspect for consideration. appendToDocumentList has one call in the loop. addItemToVector is in the loop, then vectorToArray comes after the loop. Not overly complex. Just another step. But it may need to be explained to someone new to FLOW.

Whenever people use “fuzzy” descriptions for the supposed benefits of something, I start to tune out. :) As you note, “heat up the performance” is meaningless. I prefer evidence, not anecdotal phrases like “better” or “best.”

Vector (or other synchronized collection object) is indeed useful for multiple thread activity – but unless your service is spawning threads and all of them access the same Vector object, this is not a concern. If you’re doing typical work in your service, there is only one thread accessing that Vector. Synchronization is not necessary.
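For contrast, here is a sketch of the one situation where the synchronization does earn its keep: several threads appending to the same Vector object. The class and method names are hypothetical, and the thread and item counts are arbitrary.

```java
import java.util.List;
import java.util.Vector;

// Hypothetical demo: several threads appending to one shared Vector.
class SharedVectorDemo {

    // Starts `threads` workers that each add `perThread` items to the
    // same Vector, waits for all of them, and returns the final size.
    static int fillFromThreads(int threads, int perThread) {
        List<Integer> shared = new Vector<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    shared.add(i); // Vector.add is synchronized
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return shared.size();
    }

    public static void main(String[] args) {
        // No additions are lost because every add takes the Vector's lock.
        System.out.println(fillFromThreads(4, 1_000)); // 4000
    }
}
```

A loop in a typical service never looks like this: one thread builds the list from start to finish, so the lock is acquired and released on every add without protecting anything.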

The underlying theme: optimize things AFTER testing indicates the area where the time is spent. Given the info thus far, the decision to use addItemToVector/vectorToArray or appendToDocumentList should be based upon other “*-ilities”. :)


Hello there,
I just want to share code to put my two cents worth, based on the discussion I’ve read so far.
I implemented it with a List instead of a Vector, which I considered not pretty useful for just one method entered by many threads, although if you prefer, you can replace the List with a Vector anyway.


		// pipelineInput
		// IDataMap is alternative to Interface IDataCursor
		IDataMap iMap = new IDataMap(pipeline);
		
		// Instantiate List<IData> toList
		List<IData> toList = new ArrayList<>();
		
		// Read the existing "toList" document list from the pipeline
		IData[] toListIData = iMap.getAsIDataArray("toList");
		// If it is present, copy its entries into the ArrayList first
		if (toListIData != null) {
			for (IData iData : toListIData) {
				toList.add(iData);
			}
		}
		
		// Read the "fromList" document list from the pipeline
		IData[] fromList = iMap.getAsIDataArray("fromList");
		// If it is present, append its entries to the ArrayList
		if (fromList != null) {
			for (IData iData : fromList) {
				toList.add(iData);
			}
		}
		
		// Read the single "fromItem" document from the pipeline
		IData fromItem = iMap.getAsIData("fromItem");
		// If it is present, append it to the ArrayList
		if (fromItem != null) {
			toList.add(fromItem);
		}
		
		// pipelineOutput
		/**
		 * for sending pipeline output toList after converting List into new
		 * IData[toList.size()];
		 */
		iMap.put("toList", toList.toArray(new IData[toList.size()]));

The objective is to:

  1. Create an ArrayList named toList as the destination. Any existing DocList “toList” must be added to it first so that the existing entries come first; the incoming DocList “fromList” and/or Document “fromItem” are then appended after them.
  2. Capture the incoming IData[] “toList” from the pipeline (held in the variable toListIData; this is the destination DocList, if any) and recursively append its entries to the ArrayList toList.
  3. Capture the incoming IData[] “fromList” from the pipeline and recursively append its entries to the ArrayList toList.
  4. Capture the incoming IData “fromItem” from the pipeline and append it to the ArrayList toList.
  5. Convert the ArrayList “toList” into an IData[] using “new IData[toList.size()]” and put it into the pipeline output.

DISCLAIMER
There are some other things to consider:

  1. The time complexity of this operation is linear, O(n), where n depends on the recursion over the lengths of “toList” and “fromList”.
  2. I haven’t found a way to convert directly from IData to a List/Vector/Set so that I could append more easily without recursion. If there is one, I’ll update my code.
  3. I also haven’t compared the performance against pub.list:appendToDocumentList, so I don’t know whether my code performs better or not.
  4. I compiled the code on Integration Server 9.12 (IS_9.12_Core_Fix21); Java version: 1.8.0_202 (52.0).

I hope this answer helps you…

Thank you

Kind regards,

Ivan

What would you say the value of this is over using “appendToDocumentList”? It is copying the complete list twice every time it is called.

You mention multi-thread and recursion but neither of those are in play here.

Hello Reamon,

Hmm… I mentioned earlier that

but then I suppose you’re right; I didn’t realize that the value isn’t much different from appendToDocumentList when the DocList is fairly long.
After some thought I changed some of the code to use an ExecutorService…


	public static final void appendToDocListJava(IData pipeline) throws ServiceException {
		// pipelineInput
		// IDataMap is alternative to Interface IDataCursor
		IDataMap iMap = new IDataMap(pipeline);
		
		ExecutorService executor = null;
		CountDownLatch latch = null;
		
		// Capture "toList" DocList from pipeline
		IData[] toListIData = iMap.getAsIDataArray("toList");
		// toList IData ? !null : add to toList ArrayList
		if (toListIData != null) {
			latch = new CountDownLatch(toListIData.length);
			executor = Executors.newFixedThreadPool(toListIData.length);
			for (IData iData : toListIData) {
				executor.submit(new AppendListTask(latch, iData));
			}
		
			try {
				latch.await();
			} catch (InterruptedException e) {
				// TODO Auto-generated catch block
				e.printStackTrace();
			}
		
			// Just cleanup
			executor.shutdown();
			latch = null;
			executor = null;
		}
		
		// Capture "fromList" DocList from pipeline
		IData[] fromList = iMap.getAsIDataArray("fromList");
		// fromList ? !null : append toList ArrayList
		if (fromList != null) {
			latch = new CountDownLatch(fromList.length);
			executor = Executors.newFixedThreadPool(fromList.length);
			for (IData iData : fromList) {
				executor.submit(new AppendListTask(latch, iData));
			}
		
			try {
				latch.await();
			} catch (InterruptedException e) {
				// TODO Auto-generated catch block
				e.printStackTrace();
			}
		
			// Just cleanup
			executor.shutdown();
			latch = null;
			executor = null;
		}
		
		// Capture "fromItem" DocList from pipeline
		IData fromItem = iMap.getAsIData("fromItem");
		// fromItem ? !null : append toList ArrayList
		if (fromItem != null) {
			toList.add(fromItem);
		}
		
		// pipelineOutput
		/**
		 * for sending pipeline output toList after converting List into new
		 * IData[toList.size()];
		 */
		iMap.put("toList", toList.toArray(new IData[toList.size()]));
	}
	
	// --- <<IS-BEGIN-SHARED-SOURCE-AREA>> ---
	
	static List<IData> toList = Collections.synchronizedList(new ArrayList<IData>());
	
	private static class AppendListTask implements Runnable {
	
		private CountDownLatch latch;
		private IData iData;
	
		private AppendListTask(CountDownLatch latch, IData iData) {
			this.latch = latch;
			this.iData = iData;
		}
	
		@Override
		public void run() {
			toList.add(this.iData);
			this.latch.countDown();
		}
	
	}

My last post was a complete no-brainer; let me know your thoughts/comments.

Thanks Rob

Your early comment: “I consider not pretty useful for just one method entered by many threads”

The method does not need to concern itself with threads at all. The method scope, when called, will always be just one thread. It is not creating nor using any objects that are available to multiple threads.

The discussion started with alternatives to appendToDocumentList that did not exhibit performance issues when the list grew large. How “large” the list needs to be before the performance issues hit varies based upon a number of factors. Most people do not need to worry about it as most document lists will never be of a size that matters.

Document lists are simple arrays. appendToDocumentList allocates a new list of size “toList” plus the size of “fromList” plus one more if “fromItem” is input. Then copies all the old toList entries to the new array, and all the fromList and fromItem entries to the new array. As noted in the thread, the JVM has gotten pretty good at doing this at a speed that is acceptable in most situations.
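The allocate-and-copy behavior described above can be sketched in plain Java as follows. This is an illustration only, not the source of pub.list:appendToDocumentList: the class `AppendSketch` is hypothetical and `Object` stands in for `IData`.

```java
import java.util.Arrays;

// Hypothetical sketch of the allocate-and-copy behavior described above;
// this is not the source of pub.list:appendToDocumentList.
class AppendSketch {

    // Returns a new array holding toList, then fromList, then fromItem.
    // Any of the three inputs may be null, mirroring optional inputs.
    static Object[] append(Object[] toList, Object[] fromList, Object fromItem) {
        int toLen = (toList == null) ? 0 : toList.length;
        int fromLen = (fromList == null) ? 0 : fromList.length;
        int extra = (fromItem == null) ? 0 : 1;

        // One allocation sized for everything...
        Object[] result = new Object[toLen + fromLen + extra];
        // ...then the old entries are copied across,
        if (toLen > 0) System.arraycopy(toList, 0, result, 0, toLen);
        // followed by the new list entries and the single item.
        if (fromLen > 0) System.arraycopy(fromList, 0, result, toLen, fromLen);
        if (extra == 1) result[toLen + fromLen] = fromItem;
        return result;
    }

    public static void main(String[] args) {
        Object[] list = null;
        // Called once per loop iteration, so each call recopies the list so far.
        for (int i = 0; i < 3; i++) {
            list = append(list, null, "doc" + i);
        }
        System.out.println(Arrays.toString(list)); // [doc0, doc1, doc2]
    }
}
```

Appending n items one at a time this way copies 0 + 1 + … + (n − 1) = n(n − 1)/2 existing entries in total, which is why the cost only becomes visible once the list gets long.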

Alternatives to this try to avoid repeated allocating and copying of the array. The solution you’ve shared does not avoid this. Indeed, it allocates and copies twice in each call–once as the ArrayList is built, then again when ArrayList.toArray() is called.

The second version creates a thread for every item to add to the ArrayList object. That’s a lot of overhead just to add an item to a list. Are you sure you want to create 1000+ threads for a 1000+ item list?

Have you done tests to check the performance? At what number of items does appendToDocumentList start to become very slow on your machine? How does that compare to your appendToDocListJava service? Does your JVM crash when the list size reaches a level where the number of threads and resources exceeds the JVM capacity?

Tests always reveal reality but my guesses are that both versions of appendToDocListJava will start exhibiting performance degradation at a list size far smaller than appendToDocumentList.