Hi all,
What is the difference in using a “FIND NUMBER” statement and using a
“HISTOGRAM with *NUMBER” ???
Waiting for reply.
Rahul
Hi all,
What is the difference in using a “FIND NUMBER” statement and using a
“HISTOGRAM with *NUMBER” ???
Waiting for reply.
Rahul
FIND NUMBER will return just the overall number of records a FIND with the same search criteria would return, it doesn’t open a processing loop.
HISTOGRAM will open a processing loop and present a descriptor value along with the number of records containing that value on every iteration.
Thanks Wolfgang.
If you want the number of records for a specific key value FIND NUMBER is faster than HISTOGRAM(1). If you want the number of records for each occurrence in a specific key range a HISTOGRAM is the solution. If you only want the number of records within the key range you use FIND NUMBER. If it is a complex criterion, you use FIND NUMBER, too.
Another application of FIND NUMBER is the provision of the record list for later processing: FIND NUMBER … RETAIN AS. You can also use FIND NUMBER … RETAIN AS for union, intersection and difference of retained sets.:
FIND NUMBER view WITH set1 OR set2 RETAIN AS set-union
FIND NUMBER view WITH set1 AND set2 RETAIN AS set-intersection
FIND NUMBER view WITH (set1 OR set2) AND NOT (set1 AND set2) RETAIN AS set-difference
In Report Mode, FIND NUMBER permits a WHERE clause involving other fields from a record. There is no such capability for HISTOGRAM.
steve
But Steve, in HISTOGRAM also we have a WHERE clause ( as per Natural programming guide , P-142).
But I guess there is no use of that WHERE clause because it is mentioned there → “The field specified in the WHERE clause must be the same as in the main clause of the HISTOGRAM statement”
Could you please clarify this???
In a HISTOGRAM statement you always have access only to the descriptor field the histogram uses, i.e. your view definition may only contain this descriptor and no other field from the ddm. You can of course redefine that field as you wish and write a where clause on any of this redefinition fields.
I think, your problem is understanding the WHERE clause itself. It has nothing to do with the WITH clause in a FIND statement. The WITH is evaluated by ADABAS, the WHERE is evaluated by NATURAL, i.e. after the record has been passed from ADABAS to NATURAL but before the record is available to the body of the processing loop. The WHERE clause effects *COUNTER but not *NUMBER (in a FIND), as *NUMBER is an ADABAS provided value, *COUNTER counts how often the body of the processing loop has been entered. Similar statements are ACCEPT/REJECT except that these statements are coded within the body of the processing loop and thus don’t effect *COUNTER (the body of the processing loop has been entered!)
I think there’s at least one more possiblity than using View-Fields.
Example:
define data local
*
01 VIEW-NAME view of DDM-NAME
02 AE
*
end-define
*
histogram VIEW-NAME for AE
where *NUMBER > 10
*
display *NUMBER VIEW-NAME
*
end-histogram
*
end
A Read or Find for that purpose would be slower!
To become a quibbler: *number ist not a field, it is a system variable.
Of course you can use any variable in the where clause. But if you want to compare to anything you have read (i.e. “field”), this must be defined in the view. And in a histogram you can only use a view that contains no other “field” than the descriptor.
Now I stop quibbling.
Wilfried, does it mean that the above line of code " histogram VIEW-NAME for AE where *NUMBER > 10 " is not valid???
It is valid and it does work
I think some of the confusion here arises because Natural does such a good job of abstracting what Adabas is doing, but to understand the difference between FIND NUMBER and HISTOGRAM you need to understand the Adabas back end.
First, remember that Adabas stores records in its DATA container files, and also builds an index for each descriptor in the ASSO container files. A descriptor may be a field in the record (a normal descriptor), a part of a field (a subdescriptor), a concatenation of parts or all of two or more fields (a superdescriptor), or a value calculated from one or more fields (a hyperdescriptor.) A descriptor index logically consists of a sorted list of values with each value having an associated list of the ISNs of the records matching that value.
To satisfy any FIND (Sx) command, Adabas creates a list of the ISNs of all records that meet the find criteria. In the simplest case this is just those records with a particular value of a single descriptor, but in other cases it may involve a range of values, multiple descriptors, and even fields that have not been indexed as descriptors. Normally your program then retrieves each record in the constructed ISN list in turn (using L1/L4 commands), but in the case of FIND NUMBER Adabas merely returns the size of the ISN list, which Natural places in the *NUMBER system variable.
For the HISTOGRAM (L9) command, Adabas reads through a single descriptor index and returns each value with the count of its associated ISN list, which NATURAL again places in the *NUMBER system variable. (The Adabas command manual calls the L9 command “Read Descriptor Values”, which is probably clearer than “HISTOGRAM”.)
If you understand what Adabas is doing “behind the scenes”, the difference between FIND NUMBER and HISTOGRAm is quite clear.
Thanks Curtis
Hi folks,
I would like to think on performance, but according to Curtis said, which conclusion we can take?
In this issue, we are not going to consider Histogram needs a specific View, Find number can be used with AND or OR criteria. Let consider only performance.
1 ) S1 command is the same, doesn’t matter if you are using FIND, FIND NUMBER or FIND (1), Natural handles it for you, so, only ISQ is retuned on System Variable (*number);
2 ) S1 command is not the same, Adabas handles it according to the information that you pass by Control Block. Find Number doesn’t have format buffer, and there is no loop to generate RC command;
3 ) L9 is cheaper than S1, because L9 uses only inverted list; so, no AC and DS are used. S1 uses inverted list, depend on the amount of the records, Work part 2 can be used, AC and DS are used - the first record is returned on S1 command;
4 ) Find number is S1 command, even if Natural handles the command. So, no format buffer (FB), loop (hence, no RC command is issued), and ISQ is placed on *number (system variable);
5 ) To check only if the record exist - descriptor with 1 or no record:
a. Find number should be used instead of Histogram (1);
6 ) To check only if the record exist - descriptor with large records:
a. Histogram (1) should be used instead of Find Number;
7 ) If you use Find Number, only S1 is generated, since there is no loop (no RC)
8 ) If you use Histogram (1), L9 and RC are generated;
9 ) Find number is always the best choose, only if the amount of ISN is enough on NSISN parameter;
After all, which statement is better to use? Find Number or Histogram (1)?
Thanks.
Hi folks,
Does anyone have any suggestion, comments or any explanation about the issue above?
Thanks.
There are several reasons for using FIND NUMBER versus HISTOGRAM, but, in my experience, there are no files whose contents apply solely to one of those criteria. There is always some subset of key values that would be processed better by the “other” statement. But you know only after the fact, during execution, that the other statement would have been better.
To simplify the decision, I base it on the answer to one question: Is your descriptor unique? If so, then avoid HISTOGRAM’s RC call and code
FIND NUMBER EMPLOYEES WITH NAME = 'ZBROG'
otherwise use
HISTOGRAM EMPLOYEES FOR NAME FROM 'ZBROG'
TO 'ZBROG'
END-HISTOGRAM
If you’re not yet at Natural 4, change the HISTOGRAM to use a THRU clause.
HISTOGRAM EMPLOYEES FOR NAME FROM 'ZBROG'
THRU 'ZBROG'
END-HISTOGRAM
And if it’s not easy to provide the THRU value, use
HISTOGRAM EMPLOYEES FOR NAME FROM 'ZBROG'
ESCAPE BOTTOM
END-HISTOGRAM
Hi Ralph,
I understand that you agreed with me about the item 5.
Thanks, does anyone want to add comments, suggestion about the other itens?
point 6 is irrelevant to the discussion: the size of the record will not impact the performance of either command.
If there was a blanket answer “use command x rather than command y” for all cases, Adabas wouldn’t need to support “command y” or Natural would be modified to generate “command x” in all cases where it now uses “command y”.
As Ralph alludes to - even if you run a test against a set of data and programs, the performance would vary for the selection criteria or data. A test against today’s scenario may yield a different performance result when run against tomorrow’s scenario.
For scenarios other than the simple existence check (counting, complex criteria, multiple-millions of records, etc, etc) you would have to define what you are trying to achieve more specifically. My general answer would be “it depends” - it depends on what your environment - data, processing, application, etc - and what the “question” is.
For a simple existence check, I generally use the FIND NUMBER as a first choice:
I will use a HISTOGRAM in an existence check to
Performance for both commands is nearly always “good enough”. I will almost always get more bang for my performance and tuning efforts looking elsewhere.
I think point 6 is relevant. John Smith wrote about “large records”. I would say: “large number of records”.
For example imaging the question: Is there any descriptor-value starting with “AB”? I would use HISTOGRAM here…
define data local
01 #search (A15) init <'AB'>
01 #low (A15) init full length <H>
01 #high (A15) init full length <H>
*
01 hist-id view of example-file
02 id-field (A15)
*
end-define
*
COMPRESS #search #low into #low LEAVING NO
COMPRESS #search #high into #high LEAVING NO
HIST-1.
HISTOGRAM (1) hist-id for id-field FROM #low THRU #high
END-HISTOGRAM
IF *COUNTER(HIST-1.) = 0
write "there is no value starting with" #search
else
write "there is a value starting with" #search
END-IF