Read vendor file
Sort records found by vendor name (excluding spaces and punctuation)
Select records on vendor file with same name but different tax identification number
Here’s my superbly inefficient first attempt. After my test job ran for an hour with no end in sight, I canceled it.
DEFINE DATA LOCAL
01 VENDOR VIEW OF VENDOR
02 VENDOR-NUM(A13)
02 REDEFINE VENDOR-NUM
03 FILLER 1X
03 #TIN(A9)
02 NAME(A31)
01 VENDOR-DUP VIEW OF VENDOR
02 VENDOR-NUM(A13)
02 REDEFINE VENDOR-NUM
03 FILLER 1X
03 #TIN(A9)
02 NAME(A31)
01 #NAME(A31)
01 #NAME2(A31)
END-DEFINE
READ VENDOR
RESET #NAME #NAME2
MOVE VENDOR.NAME TO #NAME
EXAMINE #NAME FOR ' ' DELETE
EXAMINE #NAME FOR ',' DELETE
EXAMINE #NAME FOR '.' DELETE
READ VENDOR-DUP
MOVE VENDOR-DUP.NAME TO #NAME2
EXAMINE #NAME2 FOR ' ' DELETE
EXAMINE #NAME2 FOR ',' DELETE
EXAMINE #NAME2 FOR '.' DELETE
ACCEPT IF #NAME2 = #NAME AND VENDOR-DUP.#TIN NE VENDOR.#TIN
END-ALL
SORT #NAME USING NAME VENDOR
AT BREAK OF #NAME
SKIP (1) 1
END-BREAK
*
WRITE (1) VENDOR 26T NAME
END-SORT
END
Why check for differences before you send it for sorting ?
As far as I can see, If you check when you get them back you can avoid reading through the whole DB n*n as you do now.
Finn
Unless name can start with one of the removed characters
READ VENDOR-DUP STARTING FROM #NAME
plus check for a reasonable strategy to drop out as early as
possible from the inner READ loop, otherwise you will read VENDOR-DUP
from start to end for every single VENDOR record …
DEFINE DATA LOCAL
01 VENDOR VIEW OF VENDOR
02 VENDOR-NUM(A13)
02 REDEFINE VENDOR-NUM
03 FILLER 1X
03 #TIN(A9)
02 NAME(A31)
01 #HOLD (5)
02 #NAME (A31)
02 NAME (A31)
02 VEND0R=NUM
01 #I (I4)
END-DEFINE
READ VENDOR
MOVE VENDOR.NAME TO #NAME
EXAMINE #NAME FOR ' ' DELETE
EXAMINE #NAME FOR ',' DELETE
EXAMINE #NAME FOR '.' DELETE
END-ALL
SORT BY #NAME USING NAME VENDOR-NUM
IF #NAME NE #HOLD.#NAME (1)
IF #I GT 1
DISPLAY (ES=ON) #HOLD(*)
END-IF
RESET #HOLD
MOVE #NAME TO #HOLD.#NAME (1)
MOVE NAME TO #HOLD.NAME(1)
MOVE VENDOR-NUM TO #HOLD.VENDOR-NUM(1)
MOVE 1 TO #I
ESCAPE TOP
END-IF
ADD 1 TO #I
MOVE #NAME TO #HOLD.#NAME (#I)
MOVE NAME TO #HOLD.NAME(#I)
MOVE VENDOR-NUM TO #HOLD.VENDOR-NUM(#I)
END-SORT
True, you are sorting the entire file, but that is better than reading it n*n times.
If this is something that will need to be done frequently, I would add a CLEAN-NAME to the file and make it a descriptor. Then you could histogram on CLEAN-NAME and use *NUMBER to find duplicates.
EMPLOYEES-DUP.PERSONNEL-ID NE EMPLOYEES.PERSONNEL-ID
is used to eliminate false hits/duplicates resulting from reading the same record as parent and child (eg outer loop PID=1 and inner loop PID=1). With SORT, there will be no false duplicates, so this test is unnecessary. Jerome also removed that test in his one-file code example.
Thanks Ralph, and sorry, I knew I must be missing something,
and I know I should stay clear of the forum with a headache like
that …
Nice solution then if external tools are allowed, yet there are, and must be,
“better” solutions using just Natural & Adabas, with the proper descriptors.
Sorry, guys, yesterday I missed the end of our such an interesting discussion.
Today I have 2 more points to add.
Wolfgand, I think nowadays most mainframe shops consider SORT as “integral part of OS”; perhaps, your remark about “external tools” is not that appropriate, if I may call it that way
However, I agree with you regarding requirement as for "different tax identification number " and I think Ralph is too optimistic: SELECT would certainly sort out ALL dups (with the same vendor`s name); however, the task to verify that those records have different tax numbers do require some MORE CODE to add…
Lets pretend this is the "flattened vendor file" (the vendors name is the first 8 positions):
00000001 THIS IS REC1
00000002 THIS IS TAX#2
00000005 THIS IS TAX#51
00000002 THIS IS TAX#2
00000003 THIS IS REC3
00000004 THIS IS REC6
00000002 THIS IS TAX#2
00000006 THIS IS REC8
00000007 THIS IS REC9
00000005 THIS IS TAX#52
00000005 THIS IS TAX#53
After SELECT ALL dups I got the following (unless I missed something?)
00000002 THIS IS TAX#2
00000002 THIS IS TAX#2
00000002 THIS IS TAX#2
00000005 THIS IS TAX#51
00000005 THIS IS TAX#52
00000005 THIS IS TAX#53
Since the first 3 records have the same tax-ID#, they should not be included into the final report; however, this is what I received:
00000002 THIS IS TAX#2
THIS IS TAX#2
THIS IS TAX#2
00000005 THIS IS TAX#51
THIS IS TAX#52
THIS IS TAX#53
NEXT FIN
NAT9995 Natural session terminated normally.
Only Dan, the original poster, can say for sure, but I presumed that a file named VENDOR would have unique TIN values. Now for a file named VENDOR-DETAILS …
In our case there can be multiple TINs on the VENDOR file.
PREFIX TIN SEQUENCE NUMBER
F 123456789 001
F 123456789 002
S 123456789 001
Prefix indicates if vendor uses a FEIN as their TIN or a SSN as their TIN or if they’re a foreign vendor
Sequence number is for a company with multiple locations.