Histogram

Hi Everyone,

I have a question regarding HISTOGRAM.
I have a file with multiple-occurrence fields,
something like this:

01 TEST VIEW OF TEST
    02 A (A5) D
M 02 GROUP
       03 B  (A2)
       03 C  (A7)
    02 D (N8)
01 KEY (A9)
  02 B (A2)
  02 C (A7)

The multiple-occurrence fields range from 1 to 99 occurrences.

Now I have to read over 600,000 records and filter on B.

So I have a program that does the following:

DEFINE DATA
LOCAL
01 TEST-HIST OF TEST
    02 A 
    02 B  
    02 C  
    02 D 
*
01 #SEARCH-KEY (A2)
END-DEFINE
 #SEARCH-KEY := 'D'
*
HS1.
HISTOGRAM TEST KEY STARTING FROM #SEARCH-KEY
*
IF #KEY NE #SEARCH-KEY
      ESCAPE BOTTOM(HS1.)
END-IF  
* 
DO FIND KEY ON RECORD USING HS1 KEY
CALCULATE VALUE
*
END-HISTOGRAM
END

My question is the following:
Is it possible to get different results depending on whether the program is run in batch or online?
Must the HISTOGRAM not always have a full key?
Can the result differ from the READ results?

(Example of the READ program:)

DEFINE DATA
LOCAL
01 TEST-VIEW OF TEST
    02 A  
    02 C*GROUP
    02 GROUP (1:99)
       03 B  
       03 C   
    02 D  
*
01 #SEARCH-KEY (A2)
01 #INDEX             (N2)
END-DEFINE
*
 #SEARCH-KEY := 'D'
*
RD1.
READ TEST-VIEW BY A
*
#INDEX := RD1.C*GROUP
* 
IF  RD1.B(#INDEX) EQ 'D'
*
       CALCULATE VALUE
*
END-IF
*
END-READ
END

Hi Everyone,
Sorry, I think this is the wrong forum, but I could not find a way to delete or move the post.

Please post a copy in the Natural area and we’ll get that one here deleted afterwards.

First, based on what you posted, I initially thought you were working with Multiple Valued fields.
However, the following part of your post does not describe MU fields but a Periodic Group:
01 TEST VIEW OF TEST
02 A (A5) D
M 02 GROUP
03 B (A2)
03 C (A7)
So, I will continue with that premise.

A side note: when doing a HISTOGRAM, the view must contain only one field.

Values for fields within a PE Group do not have to fill the field. To see this, assuming you have access to the standard demo file, VEHICLES, just do a HISTOGRAM on the field MAKE.

Assume you have a PE Group, one of whose fields, say CREDIT-CARD, is a descriptor. The inverted tables, which are the basis for a HISTOGRAM command, have entries which functionally (not actually physically) have appended subscripts, for example VISA-1, VISA-2, etc. You can see the subscript by displaying *ISN. For example:
HISTOGRAM myview credit-card
display credit-card *isn *number

You will get three columns, like this:
VISA 1 47
VISA 2 26
VISA 5 14

The above would indicate there are no records with VISA as the third or fourth occurrence.
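The inverted-list behaviour described above can be modelled in a few lines of Python. This is only a sketch of the logical view (not Natural code, and not how Adabas physically stores the index): each record is assumed to be a list of CREDIT-CARD values by occurrence, empty occurrences are assumed to be null-suppressed, and the data is hypothetical. The function `pe_histogram` (a name made up for this sketch) groups entries by value and occurrence and counts records, mirroring the value, *ISN and *NUMBER columns.

```python
from collections import defaultdict


def pe_histogram(records):
    """Model a HISTOGRAM on a descriptor that lives inside a periodic group.

    records maps ISN -> list of values, one per PE occurrence (index 0 is
    occurrence 1). Returns sorted (value, occurrence, number) tuples, where
    number is how many records hold that value in that occurrence.
    """
    isns_by_key = defaultdict(set)
    for isn, values in records.items():
        for occurrence, value in enumerate(values, start=1):
            if value:  # assumption: empty occurrences are null-suppressed
                isns_by_key[(value, occurrence)].add(isn)
    return sorted((value, occ, len(isns)) for (value, occ), isns in isns_by_key.items())


# Hypothetical CREDIT-CARD values, keyed by ISN.
records = {
    1: ["VISA", "AMEX"],
    2: ["VISA", "VISA"],
    3: ["AMEX", "", "", "", "VISA"],
}

for value, occurrence, number in pe_histogram(records):
    print(value, occurrence, number)
```

Adding up the VISA rows gives 2 + 1 + 1 = 4, yet only three records (ISNs 1, 2 and 3) contain VISA, because record 2 holds VISA in both occurrence 1 and occurrence 2. Any logic that totals *NUMBER across such rows counts some records more than once.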

There would be no differences between running a HISTOGRAM in batch or online.

The only difference between a HISTOGRAM loop and a READ loop would occur if the file were being updated while the loops run. Since a READ of the file takes considerably longer than a HISTOGRAM of a single field, an updated record would be more likely to show up in the READ loop.

Finally, what are you actually trying to do? Could you show us an actual view of the PE Group? Could you explain what information you are trying to extract from the file?


Hi Steve,

Yes, you are correct; I remembered it incorrectly, as I made the post from home.

But I would say it is a periodic group with fields that occur multiple times.
So, using the VEHICLES example, it would look like this:

01 VEHICLES VIEW OF VEHICLES
  02 CAR-DETAILS
    03 MAKE (1:99)
    03 MODEL (1:99)
    03 COLOR (1:99)
    03 COLOUR (1:99)

Unfortunately, I will not be able to share the view publicly.

I was asked to investigate whether the count of records supplied by a report is correct.
The report reads the data the same way as the first program I wrote: a HISTOGRAM followed by a FIND of the records.
The view has two sets of periodic groups, both with multiple-occurrence fields.
The report filters on the second periodic group and then uses the first periodic group to group the subtotals.

01 VEHICLES VIEW OF VEHICLES
  02 CAR-CLASS
     03 CLASS (1:99)
  02 CAR-DETAILS
    03 MAKE (1:99)
    03 MODEL (1:99)
    03 COLOR (1:99)
    03 COLOUR (1:99)

So, say, for make FORD: totals by class (sedan, pickup, etc.).

The report gives me 680,000, broken up over the different classes.

The same logic duplicated in another program gives me 649,000, and the READ program that I wrote later gives 648,000, both broken up over the different classes.
I executed both of my programs online and in batch.

I have since exported some data into an SQL database and done a count there as well, and the READ seems to give the more accurate count.

It seems that the HISTOGRAM on a periodic group with multiple values is not returning the correct number of records.
That is why I was asking these questions.

Regards,
Ivan

HISTOGRAM returns the number of values. When a PE is involved, it cannot be expected to return the number of records.

Using the Employees file for my example: empty the file, then add a single record with two income values, one in US dollars and one in euros. A HISTOGRAM on CURRENCY-SALARY will return two values (*COUNTER = 2) despite there being only one record in the file.
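Ralph's point can be sketched as a small Python model. The names and data are hypothetical: the salary figures are invented, and CURRENCY-SALARY is treated here as a simple (currency, salary) pair. Each distinct value is one pass of the HISTOGRAM loop, so the number of rows (the role of *COUNTER) can exceed the number of records.

```python
def histogram_rows(records, field):
    """Distinct values of a multiple-occurrence field across all records.

    Each distinct value is one pass of the HISTOGRAM loop, so the length
    of the result plays the role of *COUNTER after the loop.
    """
    values = set()
    for record in records:
        values.update(record[field])
    return sorted(values)


# Hypothetical: the emptied file reloaded with ONE employee record holding
# two (currency, salary) income entries; the salary figures are made up.
employees = [{"CURRENCY-SALARY": [("USD", 50000), ("EUR", 45000)]}]

rows = histogram_rows(employees, "CURRENCY-SALARY")
print("rows:", rows)
print("counter:", len(rows), "records:", len(employees))
```

Two rows come back from a file holding a single record, which is exactly why *COUNTER after a HISTOGRAM loop should not be read as a record count.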

It may if

 01 VEHICLES VIEW OF VEHICLES  
  02 CAR-DETAILS  
    03 MAKE (1:99)  
    03 MODEL (1:99)  
    03 COLOR (1:99)  
    03 COLOUR (1:99)  

You wrote:

But I would say it is a periodic group with fields that occur multiple times.

This is not true.

CAR-DETAILS is NOT a Periodic Group. It is a group that contains four Multiple Valued Fields.

While Ralph discussed a Periodic Group, his post is also true for what you have, which is a bunch of MU fields.

Suppose you have a single record with three cars, whose MAKEs are ACURA, FORD and TOYOTA.

HISTOGRAM VEHICLES MAKE
DISPLAY MAKE *NUMBER

The above would yield three lines of output, each of which has an *NUMBER of 1; yet, there is but one record, not three.

In general, HISTOGRAM is concerned with occurrences of values, not records.

One record could have many different values, whether you are dealing with a PE or an MU.
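Steve's three-car example can be checked with a small Python model. It assumes an MU field is simply a list of values per record; `histogram` is a hypothetical helper that maps each value to the set of ISNs holding it, so the size of each set corresponds to *NUMBER.

```python
from collections import defaultdict


def histogram(records, field):
    """Map each value of an MU field to the set of ISNs holding it.

    len() of each set is what HISTOGRAM reports in *NUMBER for that value.
    """
    isns_by_value = defaultdict(set)
    for isn, record in records.items():
        for value in record[field]:
            isns_by_value[value].add(isn)
    return dict(isns_by_value)


# Steve's example: a single record (ISN 1) with three MAKE values.
records = {1: {"MAKE": ["ACURA", "FORD", "TOYOTA"]}}

h = histogram(records, "MAKE")
for value in sorted(h):
    print(value, len(h[value]))  # one line per value, *NUMBER = 1 each

sum_of_numbers = sum(len(isns) for isns in h.values())
distinct_records = len(set().union(*h.values()))
print("sum of *NUMBER:", sum_of_numbers, "records:", distinct_records)
```

Summing *NUMBER over the three lines gives 3, while the union of the ISN sets shows there is only one record.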

A question: what exactly does your READ program do?

It looks like this:

RD1.
READ TEST-VIEW BY A
*
  #INDEX := RD1.C*GROUP
*
  IF RD1.B(#INDEX) EQ 'D'
*
    CALCULATE VALUE
*
  END-IF
*
END-READ

This loop appears to be intended to read through the whole file (why BY A? Why not READ PHYSICAL?). I assume “CALCULATE VALUE” is supposed to keep a running total of the number of occurrences of a particular value. Is this correct? Is there supposed to be a FOR loop within the READ to step through all the occurrences of each record? (It looks like you are only looking at one, the last occurrence.)

If the above is true, HISTOGRAM will not work for you. You could improve performance a lot if you got rid of the FOR loop (assuming there is such a loop) and used EXAMINE…GIVING NUMBER for each record.
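For comparison, a record-based count can be sketched in Python. This models the approach only (it is not Natural code, and the data and function name are hypothetical): every record is visited once, as READ PHYSICAL would do, and for each record the occurrences holding the target value are counted, which is the job EXAMINE…GIVING NUMBER would do in Natural. A record contributes at most 1 to the record total, however many of its occurrences match.

```python
def count_records_and_occurrences(records, field, target):
    """Model of READ PHYSICAL plus a per-record EXAMINE ... GIVING NUMBER.

    Returns (records_with_value, total_occurrences).
    """
    records_with_value = 0
    total_occurrences = 0
    for record in records:  # like READ PHYSICAL: each record exactly once
        n = record[field].count(target)  # like EXAMINE ... GIVING NUMBER
        total_occurrences += n
        if n > 0:
            records_with_value += 1  # a record counts once, however many hits
    return records_with_value, total_occurrences


# Hypothetical MAKE values for three records.
vehicles = [
    {"MAKE": ["FORD", "TOYOTA", "FORD"]},
    {"MAKE": ["ACURA"]},
    {"MAKE": ["FORD"]},
]

print(count_records_and_occurrences(vehicles, "MAKE", "FORD"))
```

With the sample data, FORD appears in three occurrences but only two records, so the function returns (2, 3). Keeping both totals makes it easy to see whether a report is counting records or values.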

Again, what does the READ loop actually do (the pseudocode does not really explain it), and are you simply trying to improve the performance of the READ loop by replacing it with a HISTOGRAM loop?

I understand you cannot disclose real data names, but could you provide the actual structure of the data?