selecting distinct documents from a doc list

Hi,

I have a document list as input to a flow service. There are duplicate documents in this doc list. How do I eliminate the duplicates?

Sample input data:
RecNo  SSN  Firstname  Lastname  Grade
1      123  Tom        Hanks     10
2      143  Tomy       Hanks     10
3      123  Tom        Hanks     10

I want only the first and second records in my output document list. The filter condition should achieve what the following SQL query achieves:
select distinct SSN, Firstname, Grade from inputdata

Regards,
kalravivar

Hi kalravivar,
Try creating a string list from your document list by concatenating all the values from each document. For example, your string list would now be:
123.Tom.Hanks.10
143.Tomy.Hanks.10
123.Tom.Hanks.10

(I used a dot (.) to separate the elements.)

Now pass this string list to a service that removes the duplicates. You can find a duplicate-removal service in PSUtilities or in any of your common services.
After removing the duplicates, split the string list back into a document list (you can tokenize on the dots to separate the elements and form the documents).
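Outside Integration Server, the concatenate/dedupe/split round trip above can be sketched in plain Java as follows. The class and method names (`ConcatDedupe`, `dedupe`) are hypothetical, not webMethods services; note that this approach breaks if a field value itself contains the delimiter, so pick a delimiter that cannot appear in your data.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

public class ConcatDedupe {
    // Join each record's fields with a delimiter, drop duplicate strings,
    // then split each surviving string back into its fields.
    // LinkedHashSet keeps the records in their original input order.
    static List<String[]> dedupe(List<String[]> records, String delim) {
        Set<String> seen = new LinkedHashSet<>();
        for (String[] rec : records) {
            seen.add(String.join(delim, rec));
        }
        List<String[]> out = new ArrayList<>();
        for (String joined : seen) {
            // Pattern.quote() escapes the delimiter (e.g. ".") for the regex split
            out.add(joined.split(Pattern.quote(delim)));
        }
        return out;
    }
}
```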

regards,
thanks,
Napster

Hi Napster,

Thanks for providing a solution.
I thought there was a single service for this, something similar to ps.util.list:filterDocumentList.

Regards,
kalravivar

Napster,

What is the webMethods service that compares a string with a string list and checks whether the string is in the string list or not?
I am not able to find one in PSUtilities.
Regards,
kalravivar

Dear kalravivar,
There seems to be no service in PSUtilities that does what you are asking for. Anyway, there is no need to compare a string with a string list; just Google for any Java service that removes duplicates from a string list.

regards,
Napster

Code For Eliminating duplicates from a string list:

Create a Java service – remDupFromStrList
Input/output parameters: inStrList and outStrList (both are string lists)
In the Shared tab, import the following:
java.util.Arrays
java.util.HashSet
java.util.List
java.util.Set
--------------------Here is the code----------
IDataCursor pipelineCursor = pipeline.getCursor();
String[] inStrList = IDataUtil.getStringArray( pipelineCursor, "inStrList" );
pipelineCursor.destroy();

// drop duplicates by loading the array into a HashSet
List list = Arrays.asList(inStrList);
Set set = new HashSet(list);
String[] result = new String[set.size()];
set.toArray(result);

IDataCursor pipelineCursor_1 = pipeline.getCursor();
IDataUtil.put( pipelineCursor_1, "outStrList", result );
pipelineCursor_1.destroy();
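The IDataCursor/IDataUtil plumbing above only runs inside Integration Server, but the core of the service is just the HashSet step. Here is a standalone sketch of that logic (the class name `RemDup` is made up for illustration); using a LinkedHashSet instead of a HashSet also preserves the input order, which plain HashSet does not guarantee.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class RemDup {
    // Same core logic as remDupFromStrList, minus the pipeline plumbing.
    // A LinkedHashSet drops duplicates while keeping first-seen order.
    static String[] removeDuplicates(String[] inStrList) {
        Set<String> set = new LinkedHashSet<>(Arrays.asList(inStrList));
        return set.toArray(new String[0]);
    }
}
```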

Cheers,
Kalravivar

Here are 2 other approaches to consider:

  • Sort the document list by SSN. Loop over the list, copying a record from the source list to the target list only if its SSN differs from that of the previous record. There are a couple of doc list sort services available; search the forums.

  • Create a couple of Java services that use a java.util.HashSet to eliminate dupes. One service would accept the document, its key (SSN), and an optional HashSet object; it returns the HashSet, which is used as input to the next call. The other service would return a doc list from the HashSet after all the docs have been added.
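The first approach above (sort, then compare each record's SSN with the previous one) can be sketched like this in plain Java. The names `SortDedupe` and `dedupeBySsn` are hypothetical, and each document is represented here simply as a `String[]` with the SSN in field 0.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortDedupe {
    // Sort the list by SSN, then copy a record to the target list only
    // when its SSN differs from the previous record's SSN.
    static List<String[]> dedupeBySsn(List<String[]> docs) {
        List<String[]> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparing((String[] d) -> d[0])); // field 0 = SSN
        List<String[]> out = new ArrayList<>();
        String prevSsn = null;
        for (String[] doc : sorted) {
            if (!doc[0].equals(prevSsn)) {
                out.add(doc);
            }
            prevSsn = doc[0];
        }
        return out;
    }
}
```

Note that this drops any record whose SSN matches an earlier one, so duplicates are detected on the key alone, not on the whole record.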

Kalravivar,

Will this work with a documentList instead of a StringList?