READing a Range by Multi Valued Descriptor Field

Hi,

We can not use READ by from thru in case of MU (Multi Valued) Descriptors.

Looks like Adabas Version 7 supports reading a range of values by the following statement

“READ by from to ”

I am using Adobes version 7 and Natural Version 3. But my program is giving compilation error while using TO or ENDING AT or THRU clause with READ. I am not sure what the problem is? :frowning:

As a work around, I can use FIND Number … FROM … THRU, followed by a READ (nnnn) to select a range. (where nnnn = *Number from FIND NUMBER)

Is there a better way to READ a range of values for MU descriptor?
Why is my program giving compilation error for READ…TO statement?

Thanks,
Bharani

Natural does indeed give a compiler error for the ENDING AT clause when reading by a multiple valued descriptor.

HOWEVER, there is not problem when using it for a STARTING AT clause. So, you can simply have:

READ MYVIEW BY LANG STARTING FROM ‘C’
IF LANG (*) GT ‘GZZZ’
ESCAPE BOTTOM (IMMEDIATE if appropriate)

where you supply the escape logic appropriate.

Where in the documentation does it say that Version 7 supports an ENDING AT clause for a MU descriptor?

steve

As a followup to the posting above. The real question is , what does an ENDING AT clause “mean” when you have a multiple valued descriptor??

For example: suppose it were legal to have (it is not)

READ MYVIEW BY LANG STARTING FROM ‘F’ ENDING AT ‘M’

I have a record with two values for LANG, FRA and SPA

should this record satisfy the criteria? Because of the FRA, we will see this record.

Does the ENDING AT ‘M’ mean we should start from ‘F’ and read the file until we have no records with any value less than ‘M’? Does the "ENDING AT ‘M’ “override” the STARTING FROM ‘F’?

Do you see the problem(s)? Which is why Natural does not allow the ending at value for such a scenario.

steve

And READ … TO … is not valid in Natural 3. You’ll need version 4 for that.

Hi Steve,

Thanks for the quick reply.

(1) I do not want to use ESCAPE BOTTOM because I will end up with duplicate records selected.

(2) Thanks for confirming that “TO” clause will be permitted from Nat 4 version. We will soon be migrating to Nat 4.

(3) Regarding the documentation on READ with TO clause, Please see the link http://techcommunity.softwareag.com/ecosystem/documentation/natural/nat622unx/sm/read_1050.htm

It has a tabular column as attached in this mail

Thanks,
Bharani
READ.doc (25.5 KB)

I have sent a question to Software AG regarding the functioning of the READ of an MU with a range and a TO clause.

My guess, and that is all it is at the moment, a guess, is that Adabas works it way through the associator from the STARTING FROM Vvalue to the TO value. Thus, the only records you do NOT process are those that have ALL their values outside the specified range. If that is indeed what you want, and, you do not want to have to program around multiple processing of a single record, I would suggest

FIND MYVIEW WITH MUDESC EQ start THRU end

The FIND eliminates duplicate ISNs. If necessary, you could either use SORT, or even SORTED BY (but only for a small number of records) to achieve a sequencing by MUDESC (the lowest in each record).

Please note that with a large range, and a large file, the FIND would have to do a lot of work in the associator (up front). By contrast, if you just want the first n records, where n is small compared to the records that satisfy the criteria, the READ would be a better choice (maybe, depending on what you want to do with duplicates).

steve

Hi Steve,

Thanks. My Program is already using FIND but the volume is huge (nearly 4 million records are processed by FIND). I want to change the program to use READ.

Until we migrate to Nat 4 (when I can use READ … TO), I am planning to use FIND NUMBER…From … Thru… >> Move *Number To #Cnt >> READ (#Cnt) From…

Please advise if this sounds all good.

Many Thanks,
Bharani

How many records are in the file? If the number is not substantially greater than the 4 million records in the range, it may be more efficient (and simpler) to read through the file in physical sequence and check each record for the range.

I do not understand what you are doing with the FIND NUMBER and the READ. Doesn’t look like the FIND NUMBER is doing anything, and, with the numbers you cited, this should be very expensive. I think what you are trying to do is use the FIND to eliminate duplicates; but the READ can still process a record more than once. Could you clarify what you think this is doing?

steve

I’m wondering if this will help what you are trying to do:

Use the HISTOGRAM statement to browse thru the MUDESC index. If a value is one you need, you can then do a FIND on that value:


HISTOGRAM histview mudesc
(starting at / ending at are available)
  FIND findview WITH MUDESC = histview.MUDESC
     /* process all records with this MUDESC value - maybe there is more than one
  END-FIND
END-HISTOGRAM

This is most helpful if the “mudesc” contains values that let you decided if the FIND is to be performed or not. This gives you control over the sequence of reading the records. It does NOT prevent you from reading a given record more than once (just as doing a READ by the MUDESC) - if a record has 3 different values in the MU, you will read it 3 times, once for each value.

However, if you are really reading most of a 4 million record file: don’t use any of the descriptors; just read the file in physical or ISN sequence and SORT the records as needed. This will only read each record once - you treat the MU as an array to determine if you want to include it or not:


IF MUDESC(*) = 'VALUE' THEN
....
END-IF

The number of records in the file is about 16 million. So I can not use READ physical.

FIND works fine without giving duplicates. FIND does not give me duplicates as our file will not have duplicate values in the MU field of the same record. But FIND will affect performance for reading 4 million records.

READ with ESCAPE BOTTOM results in Duplicates. So I can not use it. The value I am looking for can be present anywhare within the array. i.e there is no fixed index of the MU field where the value I am looking for can be found. READ gives duplicates because my EXAMINE for … ESCAPE BOTTOM if EQ 0 would not work.

Therefore I think that I have to use a FIND NUMBER >> move *number to #limit >> READ (#limit) untill we migrate to Nat 4 by when I can use READ by from TO <value 2>

I agree with Steve: I don’t understand what your FIND NUMBER approach will do for you. If it works for your testing it is coincidental, not a function of the approach you are using.

At 4 million expected hits on a 16 million record file, you are in the “grey area” - it may be faster to do a READ PHYSICAL for 16m records than to use FIND/READ logic to select 4m. The major factor is the average number of records per block - if it is greater than 4, a READ PHYSICAL will likely be faster than FINDs or READs The rule of thumb is 20% hits - less than that you should use FIND logic, more than that, consider READ and READ PHYSICAL. The rule of thumb is based on a general average of 5 or more compressed records per physical block. The FIND and READ LOGICAL statements use more I/O per record retrieved because they have to access the associator for the index and the address converter; READ PHYSICAL simply returns records from the Data area. Combined with PREFETCH/MULTIFETCH, READ PHYSICAL can return data records at extraordinary speeds.

I’m not sure what you mean when you say the READ gives duplicates. Can one record have more than one value in the MU that satisfies your selection criteria? If so, then yes, the READ could give you the same record more than once.

The elimination of duplicates with the FIND occurs because the FIND merges the selected ISN list, so if more than one MU value in the same record satisfies your selection, the FIND will only return the record once. The READ (even with Natural 4 TO clause) will return duplicates.

How many different values are you searching in your 16m records for? That is - how many possible values are there in the from/to range you are looking for?

Hi Bharani;

If I understand what you are trying to do with the FIND NUMBER, it does not work at all.

Consider the following:

FIND NUMBER file WITH MUDESC = 10 thru 20
MOVE *NUMBER TO #NUM

assume that there are five distinct records that have values in this range; so #NUM is 5. Assume one of the records has MU values 11,12,13,14,15. Other 4 records have higher values.

Now do:

READ (#NUM) file BY MUDESC STARTING FROM 10

You will read the same record five times. You will never see the last four records that have values in the range.

Or, have I missed something in the code you have described?

I would do a timing for the physical READ (as suggested by Doug Kelly). Assuming there are no updates involved, use MULTIFETCH.

steve