I have a flat file with a single field consisting of personnel numbers. I would like to remove duplicate records in this flat file.
Inside Designer, I converted the flat file into an IS document type and sorted the records in ascending order using pub.document:sortDocuments. Then I added a LOOP step over the IS document list with a BRANCH step inside. The branch condition checks whether the current iteration’s record is not equal to the next iteration’s record; if so, I append the current record to another document list. To make the index of the next record dynamic, I added a variable called “iteration2” and add 1 to it after each iteration.
An example of the condition in the branch step: %DocumentDT/recordWithNoId/Employee Number% != %DocumentDT/recordWithNoId[%iteration2%]/Employee Number%
However, this is not working. Is this the best way to remove duplicate records in Designer, or do you have a better suggestion for removing duplicate records from a flat file?
Is it really just a file with a single column of personnel numbers? Nothing else? If so, you don’t need to flat-file (FF) parse it; you can use a simpler mechanism to convert it to a string list and use Java classes to help remove the duplicates. Can you share more details?
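As a rough illustration of the "convert to a string list" part, here is a minimal sketch in plain Java. It assumes the file content is already available as a single string with one personnel number per line; the class and method names (FlatFileLines, toStringList) are made up for the example and are not webMethods services.

```java
import java.util.ArrayList;
import java.util.List;

public class FlatFileLines {
    // Hypothetical helper: splits raw file content into one trimmed
    // entry per non-empty line. "\\R" matches any line break (Java 8+).
    static List<String> toStringList(String content) {
        List<String> values = new ArrayList<>();
        for (String line : content.split("\\R")) {
            String value = line.trim();
            if (!value.isEmpty()) {
                values.add(value); // keep duplicates; this step only parses
            }
        }
        return values;
    }
}
```

Inside a Java service the resulting list would still need to be mapped to a pipeline string list; that wiring is left out here.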
For situations where we’re dealing with a single column, using the hashtable service could be advantageous. If you create a hashtable object and populate it with the records as keys, uniqueness is ensured automatically, because a key can appear only once. The unique records can then be retrieved easily as a list.
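The same idea can be sketched in plain Java, for example inside a Java service. This version uses java.util.LinkedHashSet rather than a hashtable, which is a substitution on my part: it also keeps each value only once, but returns the values in first-seen order. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class PersonnelDedup {
    // Hypothetical helper: a set stores each value at most once, so
    // copying the list through a LinkedHashSet drops repeats while
    // preserving the order in which values first appeared.
    static List<String> removeDuplicates(List<String> personnelNumbers) {
        return new ArrayList<>(new LinkedHashSet<>(personnelNumbers));
    }
}
```

No sorting is needed beforehand, since duplicates are detected by hash lookup and not by comparing neighbouring records.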
Even with a multi-column file, a Hash* class can help eliminate duplicates. But a key aspect of the question is how big the data is expected to be. If it is “too big”, other techniques will be needed.
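For the multi-column case, one possible sketch is to key a LinkedHashMap on the column that identifies a record and keep the first line seen for each key. The delimiter, the key column index, and the names below are assumptions for illustration, not details from the original file. Everything is held in memory, so this only suits files that fit comfortably in the heap.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class RecordDedup {
    // Hypothetical helper: keeps the first line for each distinct value
    // of the key column; later lines with the same key are dropped.
    static List<String> dedupByColumn(List<String> lines, String delimiter, int keyIndex) {
        Map<String, String> firstByKey = new LinkedHashMap<>();
        for (String line : lines) {
            // Quote the delimiter so characters like "|" are taken literally.
            String[] fields = line.split(Pattern.quote(delimiter), -1);
            if (keyIndex >= fields.length) {
                continue; // skip lines that do not have the key column
            }
            firstByKey.putIfAbsent(fields[keyIndex].trim(), line);
        }
        return new ArrayList<>(firstByKey.values());
    }
}
```

Whether to keep the first or the last occurrence of a key is a design choice; replacing putIfAbsent with put would keep the last one instead.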