Flow problem: reduce a list to uniques

Has anyone achieved the following neatly in Flow?

I have a list of documents, each of which contains an “id” tag. The same “id” may be repeated in more than one document.

I want to reduce this to a list of “ids” with no repeats.

So input:
Doc[0] -> id=“john”
Doc[1] -> id=“paul”
Doc[2] -> id=“john”
Doc[3] -> id=“george”
Doc[4] -> id=“ringo”
Doc[5] -> id=“paul”

Would give output: “john”,“paul”,“george”,“ringo”.

Obviously it’s a trivial Java service, but it feels like something that ought to play to Flow’s strengths.

If order doesn’t need to be retained, sort the list, then loop over the sorted list and append to a new list each time the current list item differs from the previous.
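For comparison, here is roughly what that sort-then-skip approach looks like in Java. It is only a sketch: the class and method names are made up for illustration, and it assumes the list contains no null entries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SortedUniques {
    // Sorts a copy of the input, then keeps an item only when it differs
    // from the item immediately before it in sorted order.
    static String[] uniques(String[] unfiltered) {
        String[] sorted = unfiltered.clone();
        Arrays.sort(sorted);
        List<String> out = new ArrayList<>();
        String previous = null; // plays the role of a "last item" temporary
        for (String current : sorted) {
            if (!current.equals(previous)) {
                out.add(current);
            }
            previous = current;
        }
        return out.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] ids = {"john", "paul", "john", "george", "ringo", "paul"};
        // prints [george, john, paul, ringo]
        System.out.println(Arrays.toString(uniques(ids)));
    }
}
```

Note that the result comes back in sorted order, not in the order the ids first appeared.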

Hmm, I’ve never found a built-in service to sort String lists.

I think what you’re telling me here is that this problem does not play to Flow’s strengths as much as I imagined it would, inasmuch as you have to write an algorithm (however simple) with temporary variables and so forth. It felt as if, since Flow is good at mapping, it might also be good at folding, and I’d missed a feature.

Just for reference the Java version is:

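// needs java.util.Arrays, java.util.Set, java.util.TreeSet
Set<String> set = new TreeSet<String>();
set.addAll(Arrays.asList(unfiltered));
// pass a typed array: the no-argument toArray() returns Object[]
String[] filtered = set.toArray(new String[0]);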

So I put that in a Java service and put it in my “things that should be BIS” package.

Here you create a temporary TreeSet instance, which is by definition sorted in ascending order, and you use a method which only adds an element if it is not already present. This is almost exactly what was suggested in the Flow solution, which, frankly, would have been more readable.

I could equally have used a HashSet, which is not sorted. The important ‘by definition’ here is that a Set is “A collection that contains no duplicate elements.”
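Neither TreeSet nor HashSet returns the ids in the order they first appeared, which is what the example output at the top shows. If that order matters, one option is java.util.LinkedHashSet, which iterates in insertion order. A minimal sketch follows; the class and method names are invented for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class OrderedUniques {
    // LinkedHashSet drops duplicates like any Set, but iterates in the
    // order in which elements were first inserted.
    static String[] uniques(String[] unfiltered) {
        Set<String> set = new LinkedHashSet<>(Arrays.asList(unfiltered));
        return set.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] ids = {"john", "paul", "john", "george", "ringo", "paul"};
        // prints [john, paul, george, ringo]
        System.out.println(Arrays.toString(uniques(ids)));
    }
}
```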

True, although it’s not important whether this method “only adds if not already present” or “always adds, replacing the previous instance”. The point is that any method that adds to a set will not result in a duplicate item.

I can’t – and I’ve tried – make this brief or readable in Flow. Here’s my best attempt:

1. INVOKE sort inList (presupposes a sort service)
2. MAP (fabricate lastItem variable)
3. LOOP over inList
3.1 BRANCH (evaluate labels)
3.1.1 inList L_NOT_EQUALS lastItem: SEQUENCE
3.1.1.1 INVOKE pub.list:addToList (inList, outList)
3.1.1.2 MAP (inList -> lastItem)
4. MAP (drop lastItem variable)

This says to me “here’s an algorithm: read it”, whereas the Set example says to me “throw everything into a Set, then get it out again”.

I have not found a BIS to sort a list.

I’d hoped this was a common enough requirement that Flow had anticipated it. For example, if there were an option to addToList that suppressed duplicate items, the Flow could have been:

1. LOOP over inList
1.1 pub.list:addToList (inList, outList, noDuplicates=true)

That’s much more readable to me. An efficient implementation would require some cached indexing, but if it were a BIS that would be WM’s problem.
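To make the “cached indexing” point concrete, here is one way such a no-duplicates append could work internally, sketched in Java. Everything here is hypothetical: NoDuplicateList is an invented name, not a webMethods service, and the noDuplicates option above does not exist as far as this thread has established.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of any webMethods API.
class NoDuplicateList {
    private final List<String> items = new ArrayList<>(); // output, in first-seen order
    private final Set<String> seen = new HashSet<>();     // the cached index

    // Appends the item only if it has not been added before.
    // Returns true when the item was appended.
    boolean add(String item) {
        if (seen.add(item)) { // Set.add returns false for an element already present
            items.add(item);
            return true;
        }
        return false;
    }

    String[] toArray() {
        return items.toArray(new String[0]);
    }
}
```

The index makes each append a hash lookup instead of a scan of the whole output list, at the cost of holding every id twice.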