Selecting distinct documents from a doc list

Ravi_R · March 21, 2011, 11:13pm

Hi,

I have a document list as input to a flow service, and it contains duplicate documents. How do I eliminate the duplicates?

Sample input data:
RecNo SSN Firstname Lastname Grade
1 123 Tom Hanks 10
2 143 Tomy Hanks 10
3 123 Tom Hanks 10

I want only the first and second records in my output document list. The filter condition should achieve what the following SQL query achieves:
select distinct SSN, Firstname, Grade from inputdata

Regards,
kalravivar

napster · March 22, 2011, 11:36am

Hi kalravivar,
Try creating a string list from your document list by concatenating all the values of each document in the doc list. For example, your string list would now be:
123.Tom.Hanks.10
143.Tomy.Hanks.10
123.Tom.Hanks.10

(I used a dot (.) to separate the elements.)

Now pass this string list to a service that removes the duplicates. You can get a service to remove duplicates from PSUtilities or from any of your common services.
After removing the duplicates, split the string list back into a document list. (You can tokenize on the dot (.) to separate the elements and form a document list.)
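
A minimal sketch of the concatenate, de-duplicate, and split steps described above, written in plain Java so it can be tried outside Integration Server. Each document is modelled as a Map<String, String>; the class name ConcatDedupe, the helper methods, and the FIELDS array are assumptions for illustration, not PSUtilities services. It assumes every field is present and that no value contains the separator, otherwise the split step misreads the record.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

class ConcatDedupe {
    // Field names taken from the sample data; adjust to the real document structure.
    static final String[] FIELDS = {"SSN", "Firstname", "Lastname", "Grade"};
    // Separator between values; it must not occur inside any value.
    static final String SEP = ".";

    // Hypothetical helper that builds one sample document.
    static Map<String, String> doc(String ssn, String first, String last, String grade) {
        Map<String, String> d = new LinkedHashMap<String, String>();
        d.put("SSN", ssn);
        d.put("Firstname", first);
        d.put("Lastname", last);
        d.put("Grade", grade);
        return d;
    }

    // Step 1: join the field values of one document into a single string.
    static String toKey(Map<String, String> d) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < FIELDS.length; i++) {
            if (i > 0) {
                sb.append(SEP);
            }
            sb.append(d.get(FIELDS[i]));
        }
        return sb.toString();
    }

    // Step 3: split a joined string back into a document.
    static Map<String, String> fromKey(String key) {
        String[] parts = key.split(Pattern.quote(SEP), -1);
        Map<String, String> d = new LinkedHashMap<String, String>();
        for (int i = 0; i < FIELDS.length; i++) {
            d.put(FIELDS[i], parts[i]);
        }
        return d;
    }

    // Steps 1 to 3 together; the LinkedHashSet (step 2) drops repeated
    // strings and keeps the first-seen order.
    static List<Map<String, String>> distinct(List<Map<String, String>> docs) {
        Set<String> keys = new LinkedHashSet<String>();
        for (Map<String, String> d : docs) {
            keys.add(toKey(d));
        }
        List<Map<String, String>> out = new ArrayList<Map<String, String>>();
        for (String key : keys) {
            out.add(fromKey(key));
        }
        return out;
    }
}
```

With the three sample records as input, distinct(...) returns two documents, SSN 123 and SSN 143, in their original order.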

regards,
thanks,
Napster

Ravi_R · March 22, 2011, 2:50pm

Hi Napster,

Thanks for providing a solution.
I thought there would be a single service for this, something similar to ps.util.list:filterDocumentList.

Regards,
kalravivar

Ravi_R · March 26, 2011, 12:06am

Napster,

What is the wm service that compares a string with a string list and checks whether the string is in the string list?
I am not able to find one in PSUtilities.
Regards,
kalravivar

napster · March 28, 2011, 11:44am

Dear kalravivar,
There seems to be no service in PSUtilities that does what you are asking for. Anyway, there is no need to compare a string with a string list; just google for any Java service that removes duplicates from a string list.

regards,
Napster

Ravi_R · April 15, 2011, 5:58pm

Code For Eliminating duplicates from a string list:

Create a java service – remDupFromStrList
input/output parameters: inStrList and outStrList (Both are string lists)
In the shared tab import the following:
java.util.Arrays
java.util.HashSet
java.util.List
java.util.Set
--------------------Here is the code----------
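// Read the input string list from the pipeline
IDataCursor pipelineCursor = pipeline.getCursor();
String[] inStrList = IDataUtil.getStringArray( pipelineCursor, "inStrList" );
pipelineCursor.destroy();

// A HashSet keeps one copy of each value (it does not preserve the input order)
List<String> list = Arrays.asList(inStrList);
Set<String> set = new HashSet<String>(list);
String[] result = new String[set.size()];
set.toArray(result);

// Write the de-duplicated list back to the pipeline
IDataCursor pipelineCursor_1 = pipeline.getCursor();
IDataUtil.put( pipelineCursor_1, "outStrList", result);
pipelineCursor_1.destroy();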
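
One thing to be aware of: a HashSet does not guarantee iteration order, so outStrList may come back in a different order than inStrList. If order matters, a LinkedHashSet keeps the first occurrence of each value in input order. Below is a rough sketch of that variant as a plain static method so it can be run outside Integration Server; the class and method names are hypothetical, and the null check is an added assumption about how a missing input should be handled.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class OrderedDedupe {
    // Hypothetical helper: returns the distinct values of inStrList in first-seen order.
    static String[] removeDuplicates(String[] inStrList) {
        if (inStrList == null) {
            return new String[0]; // treat a missing input list as empty
        }
        // LinkedHashSet drops repeats but remembers insertion order
        Set<String> set = new LinkedHashSet<String>(Arrays.asList(inStrList));
        return set.toArray(new String[set.size()]);
    }
}
```

Inside the Java service, the same effect should come from swapping HashSet for LinkedHashSet in the code and in the shared-tab imports.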

Cheers,
Kalravivar

reamon · April 15, 2011, 9:37pm

Here are 2 other approaches to consider:

1. Sort the document list by SSN. Loop over the list, copying the record from the source list to the target list only if the SSN of the current record differs from the previous record. There are a couple of doc list sort services available. Search the forums.
2. Create a couple of Java services to use a java.util.HashSet to eliminate dupes. One service would accept the document, its key (SSN) and an optional HashSet object. It returns a HashSet object, which can be used as input to the next call. The other service would return a doc list from HashSet after all the docs have been added.
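
Both approaches can be sketched in plain Java, with each document modelled as a Map<String, String> rather than an IData object. This is only an illustration under assumptions: the class and method names are made up, every document is assumed to carry a non-null key field, and the HashSet variant is collapsed into a single method instead of the two services described above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DistinctByKey {
    // Hypothetical helper that builds a small sample document.
    static Map<String, String> doc(String ssn, String firstname) {
        Map<String, String> d = new LinkedHashMap<String, String>();
        d.put("SSN", ssn);
        d.put("Firstname", firstname);
        return d;
    }

    // Approach 1: sort a copy by key, then keep a record only when its
    // key differs from the key of the previous record.
    static List<Map<String, String>> bySorting(List<Map<String, String>> docs, final String key) {
        List<Map<String, String>> sorted = new ArrayList<Map<String, String>>(docs);
        sorted.sort(Comparator.comparing((Map<String, String> d) -> d.get(key)));
        List<Map<String, String>> out = new ArrayList<Map<String, String>>();
        String previous = null;
        for (Map<String, String> d : sorted) {
            String current = d.get(key);
            if (!current.equals(previous)) {
                out.add(d);
            }
            previous = current;
        }
        return out;
    }

    // Approach 2: remember the keys already seen in a HashSet and keep
    // only the first document for each key, in input order.
    static List<Map<String, String>> byHashSet(List<Map<String, String>> docs, String key) {
        Set<String> seen = new HashSet<String>();
        List<Map<String, String>> out = new ArrayList<Map<String, String>>();
        for (Map<String, String> d : docs) {
            if (seen.add(d.get(key))) { // add() returns false for a repeated key
                out.add(d);
            }
        }
        return out;
    }
}
```

Both methods compare on a single key (SSN here). To match the SELECT DISTINCT query from the first post, the key could instead be built from several fields.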

pth30041 · March 28, 2012, 1:08am

Kalravivar,

Will this work with documentList instead of a StringList?
