Sequential match

Natural developers automatically think “READ/FIND” when master and transaction files are involved, but a sequential match may be a better solution, dramatically improving performance.

When processing large master and transaction files, the nested FIND is initiated many times. Because there is much more overhead involved in starting a FIND than there is in continuing a READ, READing an entire file is much less expensive than FINDing each individual record. A sequential match replaces the nested FIND with a READ WORK. The WORK file is created with a READ LOGICAL or PHYSICAL, which will benefit from Prefetch or Multi-fetch.

If you need to find widows (masters with no transactions), you READ MASTER with a nested FIND TRANSACTION. Then to find orphans (transactions with no master), you READ TRANSACTION with a nested FIND MASTER. The results require two complete passes of the two files. Another benefit of a sequential match is that you can find all widows and orphans in a single pass (by Adabas) of each file.

In Natural, a sequential match is implemented with two WORK files, or an ADABAS master with a WORK file for the transactions. It may be necessary to flatten an Adabas file to a WORK file. A Prefetched READ PHYSICAL as an extract, followed by a sort, will perform very well for this.

It is necessary that the two files be read in the same sequence, and that there is a 1-to-1 or 1-to-many relationship; a many-to-many relationship is not supported. In a one-to-many relationship, the master is defined as the “one” and the transaction file is the “many.”

ZZMATCH.NSP is attached. For testing/validation purposes, it is ready to RUN, as is, on mainframe or Windows. Follow the embedded instructions for customization for real-world applications. A fair bit of the code is for demonstration purposes (e.g. IF #TRACE-xxx …) and can be deleted once you are convinced the program is working as expected.

To test under Windows, create a text file called ZZMATCHM.TXT from the following data, and drag it to the same Natural library as ZZMATCH. It will appear as a Resource.

00000050BOSTON              FRANK               202 555 1212   ACCTNGENG
00000060SMITH               JOHN                714 555 9876   ENTMNTGER
00000065SMITHERS            CAROL               229 555 1212   ENTMNTENG
00000070SMYTHE              JON                 562 555 6543   ENTMNTGER
00000080GERSTNER            LOU                 204 555 2345   SALE01FRE
00000090SMITH               SAM                 415 555 6382   SALE01ENG
00000001CAIN                ABLE                800 555 1234   TECH01ENG

Create a text file called ZZMATCHT.TXT from the following data, and create another Resource, as above.

SMITH               ENTMNT07290000025000
SMYTHE              ENTMNT00284000059900
GERSTNER            SALE0100185000025000
GERSTNER            SALE0102974100000000
GERSTNER            SALE0103297001500000
GERSTNER            SALE0104126800750000
HENDRIX             SALE0101184000100000
SMITH               SALE0100039701000000
CAIN                TECH0101343000100000
SMITH               TECH0127032000085500
THOMPSON            TECH0100719000785000
THOMPSON            TECH0124900000456700
ZILDJAN             TECH1300020000012500

You may need to modify the DEFINE WORK FILE statements to correctly point to the two Resources.

For IBM mainframes, here is some JCL you can use to test ZZMATCH.

//jobname  JOB (acct),'ZZMATCH SAMPLE',CLASS=a,MSGCLASS=x,
//             MSGLEVEL=(1,1),NOTIFY=userid,REGION=0M
//*                                    (c) 2002 LEN Consulting LLC
//CMWKF01   DD *                       MSTR
--- insert MASTER data here ---
//CMWKF02   DD *                       TRAN
--- insert TRANSACTION data here ---
// (4.49 KB)

You can do the same in Adabas without using work files. The statements

READ (1) file BY descriptor = start-val STARTING WITH ISN = isn
MOVE descriptor TO start-val

replaces the


The nice thing with the “STARTING WITH ISN” option is that it is not doing what it says :wink:
Indeed it starts BEHIND the given ISN and not WITH the given ISN.
Thus if you have a master loop reading the master info and a subroutine reading the transactions in parallel, the subroutine must save the current descriptor value and the corresponding ISN. Next time when it is called, the subroutine starts either with the same descriptor value and the next higher ISN (if there are multple records with the same descriptor value) or with the next higher descriptor value (otherwise). I.e. the READ (1) reads the transactions in descriptor sequence but one record by one record.

The READ will perform just as well as the FIND, but overall the READ WORK will perform much better.

If the master has a million records, you will initiate a million READ loops, regardless of the “(1)” notation. Using a READ WORK means that you extracted the million records with a single, Prefetched READ loop. You get the same number of records, but initiating a single loop is much fastrer than initiating a million loops.

It’s not the point that you can use a READ (1). The point is that you probably should be using a READ WORK.