Meanwhile I did some tests with other sizes. A rule of thumb (for my system) is:
- from 0 to about 200 occurrences, use IF #ARRAY(*) = ...
- from about 200 to 2000 occurrences, use the copycode with the binary-search algorithm
- from about 2000 occurrences on, use an Adabas descriptor
Of course you can drop some commands from the copycode (e.g. filling #idx directly instead of using a found flag), but this is not a game changer.
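For readers who do not have the copycode at hand, the idea is a plain binary search over the sorted array. A rough sketch follows; the names and sizes are invented and this is not the actual copycode:

* hypothetical binary-search sketch, not the actual copycode
DEFINE DATA LOCAL
1 #ARRAY  (A10/1:*)          /* X-array, EXPANDed and filled in ascending order elsewhere
1 #SEARCH (A10)
1 #OCC    (I4)
1 #LOW    (I4)
1 #HIGH   (I4)
1 #MID    (I4)
1 #FOUND  (L)
END-DEFINE
*
#OCC := *OCC(#ARRAY)         /* evaluate once, outside any processing loop
*
#LOW  := 1
#HIGH := #OCC
RESET #FOUND
REPEAT UNTIL #LOW > #HIGH
  #MID := (#LOW + #HIGH) / 2
  DECIDE FOR FIRST CONDITION
    WHEN #ARRAY(#MID) = #SEARCH
      #FOUND := TRUE
      ESCAPE BOTTOM
    WHEN #ARRAY(#MID) < #SEARCH
      #LOW := #MID + 1
    WHEN NONE
      #HIGH := #MID - 1
  END-DECIDE
END-REPEAT

Note that *OCC is evaluated once, before the loop, rather than on every pass.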
Sorry, but it doesn't make sense to read file A into an array. I'll pass every record of file A only once, so reading it into an array won't save any database I/Os; it would just consume additional time for building the array.
Just for a data point, I'm wondering what kind of machine you are testing on: mainframe, LUW server, or PC? How many other users/jobs are active? If mainframe, is the Natural Optimizer involved? What Adabas caching is in use, and what about indexes and table files? Do you only care about optimizing wall-clock time, or does CPU time matter too?
I guess Steve Robinson is enjoying his retirement; I haven't seen him post in over a year. But he always recommended doing large table searches using EXAMINE #TABLE(*) FOR #STRING ... GIVING POSITION. This requires redefining an (A1/10000), or whatever size, over the original array. And, even uglier, you need to add an extra character between array elements and populate it with some unused character to avoid matching a value split between two occurrences. But I think he was only focusing on saving CPU time. I would only use this method if I was charged a lot for CPU minutes.
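If I remember the technique correctly, a minimal sketch might look like the following. All names, the sizes, and the H'FF' separator value are my own inventions (the separator just has to be a byte that never occurs in the data), it assumes a Natural version that allows alpha fields longer than 253 bytes, and it is not Steve's original code:

* hypothetical sketch of the separator + redefine + EXAMINE lookup
DEFINE DATA LOCAL
1 #TAB-STRING   (A11000)     /* the whole table viewed as one alpha field
1 REDEFINE #TAB-STRING
  2 #ENTRY (1:1000)
    3 #KEY      (A10)        /* the table values
    3 #SEP      (A1)         /* the extra byte between elements
1 #SEARCH       (A11)
1 REDEFINE #SEARCH
  2 #SEARCH-KEY (A10)
  2 #SEARCH-SEP (A1)
1 #POS          (I4)
1 #OCCNO        (I4)
END-DEFINE
*
#SEP(*)     := H'FF'         /* a byte value that never appears in #KEY
#SEARCH-SEP := H'FF'
* ... fill #KEY(*) here ...
*
#SEARCH-KEY := 'WANTED'      /* the full 10-byte value, trailing blanks included
EXAMINE #TAB-STRING FOR #SEARCH GIVING POSITION #POS
IF #POS NE 0
  #OCCNO := (#POS - 1) / 11 + 1   /* convert byte position to occurrence number
  WRITE 'found in occurrence' #OCCNO
END-IF

Because the search value ends with the separator byte, a hit can only start on an element boundary, which is exactly what the extra character buys you.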
Hezi, accessing a 20-record file (FIND) a million times (parent loop) in a single program is bad design. Those million FINDs impact system throughput and every Adabas user. Even for a 20000-record file, you would be executing 50 Adabas commands, on average, for each record. Still not acceptable.
Calling an external routine (subprogram, function, external subroutine) incurs lots of overhead. This increases with more parameters, including array entries.
A colleague, George Vidacovich, demonstrated the speed of binary searches (your copycode) a decade ago and I have used them several times since. They are great.
Compute #OCC = *OCC (#ARRAY) only once, rather than within the REPEAT.
Because both parent and child files are sorted by the same value, replace the binary search with sequential match logic. At the beginning of the program, write the child file to a WORK file. That way you will access each record (parent and child) only once, and no array processing is needed. You will find a sample program attached as a zip file in the link above.
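In case the attachment is not available, here is a rough sketch of the idea; all names are invented, it assumes the child keys have already been written to work file 1 in ascending ITEM-NO order, and it is not the actual sample program:

* hypothetical sequential match sketch
DEFINE DATA LOCAL
1 FILE-A VIEW OF FILE-A
  2 ITEM-NO
1 #CHILD-KEY (A10)           /* one key per work-file record, sorted ascending
1 #CHILD-EOF (L)
END-DEFINE
*
PERFORM GET-CHILD            /* prime the child stream
*
READ FILE-A BY ITEM-NO
  REPEAT UNTIL #CHILD-EOF OR #CHILD-KEY >= ITEM-NO
    PERFORM GET-CHILD        /* advance the child file while its key is behind
  END-REPEAT
  IF NOT #CHILD-EOF AND #CHILD-KEY = ITEM-NO
    IGNORE                   /* parent has a matching child: do some stuff here
  END-IF
END-READ
*
DEFINE SUBROUTINE GET-CHILD
  READ WORK FILE 1 ONCE #CHILD-KEY
    AT END OF FILE
      #CHILD-EOF := TRUE
    END-ENDFILE
END-SUBROUTINE
END

The parent READ drives the loop and the child work file is only advanced while its key is behind, so each record of both files is read exactly once.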
I'm on a (quite outdated) Solaris test machine, and I'm almost alone on it at the moment; that means Adabas is handling fewer than 100 commands per second and the CPU is 95% idle.
The Adabas buffer pool is 1 GB while the database is 80 GB, and the buffer pool hit rate is 99.9%.
Here is one more option to try. Read FILE-B and compress the FOREIGN-KEY values into a text field with delimiters. Then, when reading FILE-A, use EXAMINE #TEXT FOR FOREIGN-KEY GIVING #NMB1. My understanding is that an EXAMINE on a text string is a significant improvement over array logic.
ZZMATCH, my sample program, extracts the child file from Adabas to a WORK file. You could adapt it to load the data into an array, but the WORK file technique handles unlimited file sizes and eliminates any internal CPU for indexing. WORK file processing is not the fastest thing that Natural does, but I would be surprised to find a substantial reduction in elapsed time from using an array. Let us know your results.
Five years ago, I hit 75. I retired. My wife and I went on quite a number of vacations (mostly hiking). Then came Covid. We did local hiking trips, thus avoiding restaurants and hotels.
That left time to follow Natural on the Tech Forum and on SAG-L. Somehow, I did not follow this thread completely. So, to clarify something: the last time I did comparisons, using a large alpha field for an EXAMINE always beat any technique that involved working with arrays. The reason is quite simple: arrays require work; a single alpha field does not have all that overhead. I will look later; somewhere on the Forums are copies of my Inside Natural. I think there is an index which will simplify finding the articles that involve searching arrays. The extra character to delineate array members is not at all ugly. You simply do a REDEFINE (as you said); no additional coding is required. Yes, I was concerned with saving CPU time.
Now to the posting involving the two files, A and B. Jerome Blanc basically posted what you would find in the Inside Natural articles.
I do not seem to have the entire postings. I will have to look again for them.
Dion Kejlberg pointed out that if the File-A records can be read in Physical sequence, that would be faster than Logical sequence. This could be very important for performance.
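For anyone who has not compared the two, the difference is just the READ clause (view and descriptor names are placeholders). Physical sequence avoids the index reads but returns the records unsorted, so it only applies where the processing does not depend on the order:

READ FILE-A IN PHYSICAL SEQUENCE   /* records arrive in storage order: cheapest I/O, no sort order
  IGNORE                           /* process the record here
END-READ
*
READ FILE-A BY ITEM-NO             /* logical sequence via the descriptor: sorted, but extra index I/O
  IGNORE                           /* process the record here
END-READ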
Actually, the File-A activity is probably the main 'cost' of the entire process. As long as you avoid things like FIND NUMBERs on FILE-B, you should be fine. Even though there would be differences between the various search techniques, the savings will probably be minuscule compared with reading the million records.
When I started reading this thread, I remembered fondly going to MIT for Jim Wisdom's annual Natural Conference with you, Dennis Hamilton, Kelly Jones, Jim, and a host of others, all discussing nerdy stuff like READ vs FIND. Nice to see that this sliver of the corner of life hasn't changed. Hope you are well in retirement.
@Matthias1: Can you please share the file_a view fields used for 'do some stuff here' and the (additional?) view fields used for 'do some additional stuff here'?
Glad to hear your retirement is going well, and that you were able to continue hiking during the Covid shutdowns. The hiking will keep you young, much more so than keeping up with SAG-L and this forum!
Sorry, maybe calling the redefine 'ugly' was a bit over the top; I just find it a bit messy-looking in my code when I use it. But whenever I have needed to save CPU time on a large table scan, I have used it, ever since I read your tests and statistics examples. Thank you for all the tips and tricks you have shared with us over the years.
Maybe this thread could be used to justify a request to the Natural developers to add a binary table search or a hash table search option to the EXAMINE command. Cobol has always had the binary search algorithm built in, and it was always useful, especially when CPUs were so much slower. I wonder if the developers just didn't want to make Natural look too much like Cobol? Or maybe they didn't see why anybody would bother reading an Adabas table into a Natural array and doing lookups there, since Adabas was so fast and efficient?
Anyway, I'll be interested in seeing how Ralph's sequential file match compares time-wise if Mathias has time to test it. I'm sure it would use the least CPU, and probably the least wall time too. It is good to use if you have a canned routine ready to plug in, but I always hated writing them from scratch and testing against old Cobol files back in the day. I'm sure a lot of mainframe CPU upgrades could have been avoided over the years if more sequential matches had been implemented for nightly, weekly and monthly batch runs. I've seen so many batch jobs running that reread the same data for reports that the previous job had just read. But we got lazy, as it was so much easier to just read the database again, and less error prone. It also seemed like a step back in time to managing sequential master files, which was why we went to DBMS systems in the first place.
Cheers to all, and thanks to Mathias for the interesting thread,
George Cooper (semi-retired and fighting it)
Today I tried Ralph Zbrog's points #4 and #5 and the array redefine recommended by Jerome and others.
- Using #occ instead of *OCC doesn't really help.
- Since file_a is read in sorted order, Ralph Zbrog's #5 is a good idea. See the code snippet below.
- Searching in a string instead of an array doesn't really help either. But to be honest: I didn't use a redefinition; I compressed the array into a string beforehand. I don't think that matters.
* use search array with start pos
*
reset #found
#timx := *TIMX
#idx-start := 1
read (#reads) file_a by item_no
  for #idx = #idx-start to *OCC(#array(*))
    decide for first condition
      when #array(#idx) > item_no
        escape bottom
      when #array(#idx) < item_no
        #idx-start := #idx + 1
      when #array(#idx) = item_no
        #idx-start := #idx + 1
        add 1 to #found
        escape bottom
      when none
        terminate 1
    end-decide
  end-for
end-read
#timx := *TIMX - #TIMX
write 'array + pos ' #TIMX_P13 '=' #found
*
* emulate array with redefine
*
reset #found
compress '|' #array(*) '|' into #adyn with all delimiters '|'
#timx := *TIMX
read (#reads) file_a by item_no
  compress '|' item_no '|' into #item_no_delimited leaving no
  examine #adyn for #item_no_delimited giving position #idx
  if #idx ne 0
    add 1 to #found
  end-if
end-read
#timx := *TIMX - #TIMX
write 'array redef ' #TIMX_P13 '=' #found
I am pleased to see that the sequential match logic fared well, but the results may be a bit skewed. You have three loops to read file A, one for each scenario. The first loop may have a longer run time because Adabas needs to retrieve data blocks from disk; the subsequent loops may be faster because some (or all) of the blocks are already loaded into the buffer pool. Rerun with the 'array + pos' or 'array redef' test first in the program to see if it changes the results.
Shouldn't
when #array(#idx) = item_no #idx-start := #idx + 1
be
when #array(#idx) = item_no #idx-start := #idx
instead? Aren't you expecting multiple records to match each array entry?