How to check duplicate data

Hi All,

I want to know about checking duplicate data. My requirement is: there are 2 files (work files). Read files 1 and 2 and validate for errors such as duplicate data in a file, non-numeric values, etc.
If an error is encountered, write the record to an error file with the corresponding message.
For example, suppose there are 2 records with the same data in file 1. How can I check for duplicates?

Thanks,
Rohan.

Natural does not have a file COMPARE function, so it does not work like that…
Try: IF #FIELD1 = #FIELD2 WRITE (1) 'Duplicate' END-IF
and it may help to first sort both work files by the same key to assist the compare.

I have written a program which checks for duplicates; it runs once a year.
The trick is to sort both files on the same key. Read file 1 (preferably the smaller file), move its key to a variable, then read file 2.
If the saved key is greater than the file 2 key, read file 2 again; if it is less, read file 1; and of course, if they are equal, write 'error - record key'.
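A sketch of that two-file match in Natural (the field names, the A8 key format, and work file 3 as the error file are all hypothetical; both work files are assumed already sorted ascending on the key):

```natural
* Hypothetical layout: each file carries an 8-byte key, sorted ascending
DEFINE DATA LOCAL
1 #KEY1 (A8)
1 #KEY2 (A8)
1 #EOF1 (L)
1 #EOF2 (L)
END-DEFINE
*
PERFORM READ-FILE-1
PERFORM READ-FILE-2
REPEAT UNTIL #EOF1 OR #EOF2
  IF #KEY1 < #KEY2
    PERFORM READ-FILE-1              /* file 1 is behind: advance it
  ELSE
    IF #KEY1 > #KEY2
      PERFORM READ-FILE-2            /* file 2 is behind: advance it
    ELSE
      WRITE WORK FILE 3 'ERROR - DUPLICATE KEY' #KEY1
      PERFORM READ-FILE-1            /* keys equal: report, then move on
    END-IF
  END-IF
END-REPEAT
*
DEFINE SUBROUTINE READ-FILE-1
  READ WORK FILE 1 ONCE #KEY1
    AT END OF FILE
      #EOF1 := TRUE
    END-ENDFILE
END-SUBROUTINE
*
DEFINE SUBROUTINE READ-FILE-2
  READ WORK FILE 2 ONCE #KEY2
    AT END OF FILE
      #EOF2 := TRUE
    END-ENDFILE
END-SUBROUTINE
END
```

Note this finds keys that appear in both files; duplicates within a single file need the successive-record test discussed further down the thread.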

Have fun.

Thanks Kazi… I got your point. But in my case I want to eliminate duplicates within the same file. For example: if file 1 contains identical records, I have to eliminate them.

Requirement :

Read files 1 and 2 and validate for errors such as duplicate data in a file, non-numeric values, etc.
If an error is encountered, write the record to an error file with the corresponding message.

If all the data looks okay, then process the records. I need an output report sorted by employee ID, showing only employees in Active status.

Thanks,
Rohan.

You are either describing your problem inadequately, or you have never programmed.

The two posted answers, by Verne and Kazi, both suffice to answer your problem as stated.

Assuming you have a work file sorted by Employee ID, and you should not have duplicate Employee IDs, you can simply do an IF test between successive records.
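A minimal sketch of that successive-record test (the field names, the A8 format, and work file 2 as the error file are hypothetical; file 1 is assumed sorted by employee ID):

```natural
DEFINE DATA LOCAL
1 #EMP-ID  (A8)
1 #PREV-ID (A8)
END-DEFINE
*
READ WORK FILE 1 #EMP-ID
  IF #EMP-ID = #PREV-ID              /* same key as previous record
    WRITE WORK FILE 2 'ERROR - DUPLICATE EMPLOYEE ID' #EMP-ID
  END-IF
  #PREV-ID := #EMP-ID                /* save key for next comparison
END-WORK
END
```

The first record is compared against a blank #PREV-ID, which is harmless as long as a blank employee ID is not a legal value.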

In Natural, you could also use AT BREAK to discover duplicates, or PERFORM BREAK, or even IF BREAK.
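For instance, an AT BREAK version might look like this (a sketch with hypothetical names; it counts the records in each key group and reports any group larger than one when the value changes — again assuming a sorted file):

```natural
DEFINE DATA LOCAL
1 #EMP-ID (A8)
1 #COUNT  (I4)
END-DEFINE
*
READ WORK FILE 1 #EMP-ID
  AT BREAK OF #EMP-ID                /* value changed (or end of data)
    IF #COUNT > 1
      WRITE 'DUPLICATE EMPLOYEE ID' OLD(#EMP-ID)
    END-IF
    RESET #COUNT
  END-BREAK
  ADD 1 TO #COUNT
END-WORK
END
```

Break processing also fires at end of data, so the last key group is checked as well.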

steve

To delete (or simply identify) duplicate key values on a WORK file, use a sort utility (such as DFSORT or SYNCSORT) rather than Natural. SORT outperforms Natural when it comes to sequential I/O.

To delete the duplicate records, your sort parameters will look similar to this:

SORT FIELDS=(start,length,format,A),NOEQUALS
SUM FIELDS=NONE
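
For example, if the key were an 8-byte character field starting in column 1 (a hypothetical layout), the control statements would be:

```text
SORT FIELDS=(1,8,CH,A),EQUALS
SUM FIELDS=NONE
```

One design point: with SUM FIELDS=NONE, coding EQUALS guarantees that the first record of each set of duplicates is the one kept; with NOEQUALS, which duplicate survives is unpredictable, in exchange for somewhat better sort performance.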