Checking for a single value within an array

There are times when I need to determine whether a value is located
within an array and I wonder what the most efficient method is.

Let's assume a standard one-dimensional array of 20 occurrences.

I usually use:
if #array(*) eq scan 'ABC' then…

I also use:
reset #i
examine full #array(*) for 'ABC' giving index #i
if #i gt 0 then…

I’ve also used:
reset #found
for #x 1 20
  if #array(#x) eq ' ' escape bottom end-if
  if #array(#x) eq 'ABC'
    #found := true
    escape bottom
  end-if
end-for
if #found then…

I guess there are probably some other possibilities…

Any ideas?

I think your three methods are not equivalent. Method 1 would also find an array value of 'XXXABCXXX'; Method 3 would not.

BTW: I would code Method 3 as follows:

if #array(*) = 'ABC' then ...
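To make the difference concrete, here is a small sketch in the thread's own Natural style. The array length (A9) and the three values are made up for illustration. The SCAN test (Method 1) should be satisfied by 'XXXABCXXX' because it looks for 'ABC' anywhere inside an occurrence, while the equality test (Method 3) should not, because no occurrence is exactly 'ABC'.

```natural
* Sketch: SCAN matches a substring, = matches the whole value
DEFINE DATA LOCAL
1 #ARRAY (A9/1:3) INIT <'XXXABCXXX','DEF','GHI'>
END-DEFINE
*
IF #ARRAY (*) = SCAN 'ABC'          /* Method 1: 'ABC' anywhere in an occurrence
  WRITE 'SCAN: ABC found'
ELSE
  WRITE 'SCAN: ABC not found'
END-IF
*
IF #ARRAY (*) = 'ABC'               /* Method 3: an occurrence must equal 'ABC'
  WRITE 'EQ: ABC found'
ELSE
  WRITE 'EQ: ABC not found'
END-IF
END
```

If all values are known to be exactly three characters long, the two tests give the same answer, which is the situation described in the next reply.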

Forgot a couple of things.

1 - Let's assume you know the values in the array are always 3 characters long.
This would mean if #array(*) eq scan 'ABC' should only be true when the full value 'ABC' is found (which rules out ABC appearing somewhere inside a longer string).

2 - Forgot about if #array(*) eq 'ABC' then…
This works for positive conditions, but what if you want to check the negative (i.e. #array(*) ne 'ABC')?
Then if #array(*) ne 'ABC' no longer works.
For this situation I've used:
if #array(*) eq 'ABC'
  ignore
else…
Not sure if this is the most efficient approach.

Ideas?

If this is something that will be done very often, performance is critical, and you do not mind a little extra programming, I would do the following:

DEFINE DATA LOCAL
1 #GROUP (20)
  2 #ARRAY (A3)
  2 #DUMMY (A1)
1 REDEFINE #GROUP
  2 #STRING (A80)
:::::

Move asterisks (or some other character that will never appear in the array values) to #DUMMY(*).

Now, SCAN (or EXAMINE) the #STRING. This will be orders of magnitude faster than using the array itself.

steve

Here is a timing comparison (forgot the cpu comparison, but other timings show cpu and elapsed times to be consistent)

* THIS PROGRAM DEMONSTRATES HOW EXPENSIVE IT IS TO EXAMINE AN ARRAY
* AS OPPOSED TO EXAMINE'ING A STRING.

DEFINE DATA LOCAL
1 #STRING (A60)
1 REDEFINE #STRING
  2 #ARRAY (A3/1:20)
1 #LOOP (P9)
1 #NUMBER (P3)
END-DEFINE
*
SETTIME
FOR #LOOP = 1 TO 300000
  IGNORE
END-FOR
WRITE 3/10 'CONTROL LOOP TIME==>' *TIMD (0150)
*
SETTIME
FOR #LOOP = 1 TO 300000
  EXAMINE #STRING FOR 'XYZ' GIVING NUMBER #NUMBER
END-FOR
WRITE 3/10 'STRING TIME==>' *TIMD (0210)
*
SETTIME
FOR #LOOP = 1 TO 300000
  EXAMINE #ARRAY (*) FOR 'XYZ' GIVING NUMBER #NUMBER
END-FOR
WRITE 3/10 'ARRAY TIME==>' *TIMD (0270)
*
END

PAGE #   1                    DATE:    Jan 16, 2008
PROGRAM: EXAMIN13             LIBRARY: INSIDE



     CONTROL LOOP TIME==>       17


     STRING TIME==>       25


     ARRAY TIME==>       58

The CONTROL LOOP is just the overhead of the FOR loop. For a true comparison, subtract this from the next two times. Thus, the comparison is 8 vs 41, a factor of 5.

steve

To answer the other question (“if #array(*) ne 'ABC'”): if you are checking that ABC is not in ANY occurrence of the array, just move the NOT outside:

if not(#array(*) = 'ABC')

The “if #array(*) ne 'ABC'” test is not the negation of “= 'ABC'”: it is true if any occurrence is not 'ABC', so it is false only when ALL occurrences are 'ABC'.
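A minimal sketch of the two negations, using made-up array values in which 'ABC' is present. Following the rule just described, the NE test should still be true (the other occurrences differ from 'ABC'), while NOT (... = 'ABC') should be false, which is the answer you actually want for “ABC is not in the array”.

```natural
* Sketch: NE is not the same as NOT (... =)
DEFINE DATA LOCAL
1 #ARRAY (A3/1:3) INIT <'ABC','DEF','GHI'>
END-DEFINE
*
IF #ARRAY (*) NE 'ABC'              /* true if ANY occurrence differs from 'ABC'
  WRITE 'NE: true'
ELSE
  WRITE 'NE: false'
END-IF
*
IF NOT (#ARRAY (*) = 'ABC')         /* true only if NO occurrence equals 'ABC'
  WRITE 'NOT =: true (ABC is not in the array)'
ELSE
  WRITE 'NOT =: false (ABC is in the array)'
END-IF
END
```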

Steve’s performance timings should be taken seriously, but that doesn’t mean that you can simply redefine all your arrays as strings.

DEFINE DATA LOCAL
1 #ARRAY (A3/2)    INIT <'ABC', 'DEF'>
                   1 REDEFINE #ARRAY
  2 #STRING (A6)
1 #SEARCH (A3)     INIT <'CDE'>
1 #NA (I4)
1 #NS (I4)
END-DEFINE
EXAMINE #ARRAY (*) FOR #SEARCH GIVING NUMBER #NA
EXAMINE #STRING    FOR #SEARCH GIVING NUMBER #NS
DISPLAY #SEARCH
        #ARRAY (*)
        #STRING
        #NA
        #NS
END
#SEARCH #ARRAY #STRING     #NA         #NS
------- ------ ------- ----------- -----------

CDE     ABC    ABCDEF            0           1
        DEF

EXAMINE of the array gives a correct result, but EXAMINE of the string does not. The array must be modified to insert a delimiter (some unused hexadecimal value) between entries.
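A sketch of that fix applied to the example above: each occurrence is followed by a one-byte delimiter, and the string redefines the whole group. H'FF' is assumed to be a value that never appears in the data, and #DELIM is a name introduced here for illustration. With a delimiter between 'ABC' and 'DEF', 'CDE' can no longer straddle two entries, so both counts should now be 0.

```natural
* Sketch: same data as above, with a delimiter after each occurrence
DEFINE DATA LOCAL
1 #GROUP (2)
  2 #ARRAY (A3)
  2 #DELIM (A1) INIT ALL <H'FF'>    /* assumed never to occur in the data
1 REDEFINE #GROUP
  2 #STRING (A8)                    /* 2 x (A3 value + A1 delimiter)
1 #SEARCH (A3) INIT <'CDE'>
1 #NA (I4)
1 #NS (I4)
END-DEFINE
*
MOVE 'ABC' TO #ARRAY (1)
MOVE 'DEF' TO #ARRAY (2)
EXAMINE #ARRAY (*) FOR #SEARCH GIVING NUMBER #NA
EXAMINE #STRING    FOR #SEARCH GIVING NUMBER #NS
DISPLAY #SEARCH #NA #NS
END
```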

True. See my posting before the timing comparison.

steve

So, if we are really striving for efficiency and we only want to know if the value exists in the array, change GIVING NUMBER to GIVING POSITION. That way the examine can stop when it finds the first match within the string.
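Note that GIVING POSITION on the string returns a byte offset rather than an occurrence number. If you also need the index, it can be recovered from the offset, assuming every occurrence is the A3 value plus a one-byte delimiter (4 bytes) and the search value is exactly 3 characters long, so that a match can only start at an occurrence boundary. The sketch reuses the layout from Steve's example; #POS and #OCC are hypothetical names, and '%' is assumed never to appear in the data. With 'XYZ' in occurrence 10, the match should start at byte 37, giving occurrence 10.

```natural
* Sketch: GIVING POSITION on the delimited string, then offset -> index
DEFINE DATA LOCAL
1 #GROUP (20)
  2 #ARRAY (A3)
  2 #DUMMY (A1) INIT ALL <'%'>      /* delimiters loaded once, at compile time
1 REDEFINE #GROUP
  2 #STRING (A80)
1 #POS (I4)
1 #OCC (I4)
END-DEFINE
*
MOVE 'XYZ' TO #ARRAY (10)
*
EXAMINE #STRING FOR 'XYZ' GIVING POSITION #POS   /* stops at the first match
IF #POS > 0
  COMPUTE #OCC = (#POS - 1) / 4 + 1  /* 4 bytes per occurrence
  WRITE 'XYZ found in occurrence' #OCC
ELSE
  WRITE 'XYZ not found'
END-IF
END
```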

Now, there are other things to consider.

  • Are you running under Natural Optimizer?
    Can you load your delimiters once, or must they be reloaded with each repetition?
    Do you have control over the array so that you CAN redefine it?

If you cannot redefine the array (perhaps it is part of a file view), and you can load your delimiters once, you can still gain some efficiency by compressing the array into a long string and using examine.

Here is a modified version of Steve’s code:

0010 * THIS PROGRAM DEMONSTRATES HOW EXPENSIVE IT IS TO EXAMINE AN ARRAY
0020 * AS OPPOSED TO EXAMINE'ING A STRING.
0030 *
0040 DEFINE DATA LOCAL
0050 1 #GROUP   (20)
0060   2 #ARRAY (A3)
0070   2 #DUMMY (A1)
0080 1 REDEFINE #GROUP
0090   2 #STRING (A80)
0100 1 #COMPRESS (A80)
0110 1 #LOOP     (I4)
0120 1 #LIMIT    (I4) INIT <500000>
0130 1 #NUMBER   (I4)
0140 END-DEFINE
0150 *
0160   OPTIONS MCG=OFF  WRITE 'Nat Optimizer: OFF'
0170 * OPTIONS MCG=ON   WRITE 'Nat Optimizer: ON '
0180 * MOVE 'XYZ' TO #ARRAY (01) WRITE '"XYZ" in FIRST position'  / '-'(30)
0190 * MOVE 'XYZ' TO #ARRAY (10) WRITE '"XYZ" in MIDDLE position' / '-'(30)
0200   MOVE 'XYZ' TO #ARRAY (20) WRITE '"XYZ" in LAST position'   / '-'(30)
0210 *
0220 SETTIME
0230 FOR #LOOP = 1 TO #LIMIT
0240   IGNORE
0250 END-FOR
0260 WRITE 'CONTROL LOOP TIME==>' *TIMD (0220)
0270 *
0280 SETTIME
0290 FOR #LOOP = 1 TO #LIMIT
0300   EXAMINE #ARRAY (*) FOR 'XYZ' GIVING INDEX  #NUMBER
0310 END-FOR
0320 WRITE 'ARRAY TIME==>       ' *TIMD (0280)
0330 *
0340 SETTIME
0350 FOR #LOOP = 1 TO #LIMIT
0360   MOVE '%' TO #DUMMY (*)
0370   EXAMINE #STRING   FOR 'XYZ' GIVING POSITION #NUMBER
0380 END-FOR
0390 WRITE 'STRING TIME==>      ' *TIMD (0340)  'move inside loop'
0400 *
0410 SETTIME                                                              
0420 FOR #LOOP = 1 TO #LIMIT                                              
0430   COMPRESS #ARRAY (*) INTO #COMPRESS WITH DELIMITERS '%'             
0440   EXAMINE #COMPRESS FOR 'XYZ' GIVING POSITION #NUMBER                
0450 END-FOR                                                              
0460 WRITE 'COMPRESS TIME==>    ' *TIMD (0410)   'compress inside loop'   
0470 *                                                                    
0480 SETTIME                                                              
0490 FOR #LOOP = 1 TO #LIMIT                                              
0500   EXAMINE #STRING   FOR 'XYZ' GIVING POSITION #NUMBER                
0510 END-FOR                                                              
0520 WRITE 'STRING TIME==>      ' *TIMD (0480)   'move outside loop'      
0530 *                                                                    
0540 SETTIME
0550 FOR #LOOP = 1 TO #LIMIT
0560   EXAMINE #COMPRESS FOR 'XYZ' GIVING POSITION #NUMBER
0570 END-FOR
0580 WRITE 'COMPRESS TIME==>    ' *TIMD (0540)   'compress outside loop'
0590 *
0600 END

and the times I got with it (I’m running on a mainframe):

Natural Optimizer OFF        ---XYZ position---
                             FIRST  MIDDLE  LAST
                             -----  ------  ----
 CONTROL LOOP TIME==>            3       3     3
 ARRAY TIME==>                   8      17    24
 STRING TIME==> (inside)        19      20    19
 COMPRESS TIME==> (inside)      27      27    26
 STRING TIME==> (outside)       10       9     9
 COMPRESS TIME==> (outside)      8       8     8

Natural Optimizer ON         ---XYZ position---
                             FIRST  MIDDLE  LAST
                             -----  ------  ----
 CONTROL LOOP TIME==>            0       0     0
 ARRAY TIME==>                   7      14    21
 STRING TIME==> (inside)         1       4     8
 COMPRESS TIME==> (inside)      13      13    12
 STRING TIME==> (outside)        0       3     6
 COMPRESS TIME==> (outside)      5       5     3

Make of this what you will.

Similar results on Nat6@Solaris and Nat6@WinXP:

CONTROL LOOP TIME==> 23
STRING TIME==> 35
ARRAY TIME==> 73

CONTROL LOOP TIME==> 4
STRING TIME==> 7
ARRAY TIME==> 16

Next question: Does a string-Redefinition make the array itself slower? The answer is: No.

DEFINE DATA LOCAL
1 #STRING (A60)
1 REDEFINE #STRING
  2 #ARRAY (A3/1:20)
1 #LOOP (P9)
1 #NUMBER (P3)
*
1 #ARRAY2 (A3/1:20)
1 #LOOP2 (P9)
1 #a3 (A3)
END-DEFINE
*
  SETTIME
FOR #LOOP = 1 TO 100000
  FOR #LOOP2 = 1 TO 20
    IGNORE
  END-FOR
END-FOR
WRITE 3/10 'CONTROL LOOP TIME==>' *TIMD (0130)
*
  SETTIME
FOR #LOOP = 1 TO 100000
  FOR #LOOP2 = 1 TO 20
    #array(#loop2) := 'XXX'
  END-FOR
END-FOR
WRITE 3/10 '#ARRAY TIME==>' *TIMD (0210)
*
  SETTIME
FOR #LOOP = 1 TO 100000
  FOR #LOOP2 = 1 TO 20
    #array2(#loop2) := 'XXX'
  END-FOR
END-FOR
WRITE 3/10 '#ARRAY2 TIME==>' *TIMD (0290)
*
END

#ARRAY TIME is almost identical to #ARRAY2 TIME.

Jerome LeBlanc wrote:

"Can you load your delimiters once, or must they be reloaded with each repetition? "

You should only load the delimiters once. They will never be referenced.

I will run the code again with cpu-time. The reason is quite simple. There should be no difference between the outside MOVE and the outside COMPRESS. After these “initialization” operations, #STRING and #COMPRESS are just two A80 variables being EXAMINE’d. The times should be the same.

steve

Here are the times for a comparison of the COMPRESS and MOVE (string) operations. As you will see, they are identical. In the first runs, the COMPRESS won by a small margin, consistently. Then I realized there was one character fewer after the COMPRESS, so I added the MOVE SUBSTRING. As you can see below, the two approaches are basically identical; indeed, some of the output favored the MOVE/string by one unit, rather than the other way around.

* THIS PROGRAM DEMONSTRATES HOW EXPENSIVE IT IS TO EXAMINE AN ARRAY
* AS OPPOSED TO EXAMINE'ING A STRING.

DEFINE DATA LOCAL
1 #GROUP (20)
  2 #ARRAY (A3)
  2 #DUMMY (A1)
1 REDEFINE #GROUP
  2 #STRING (A80)
1 #LOOP (P9)
1 #COMPRESS (A80)
1 #NUMBER (P3)
1 #CPU-START (P9)
1 #CPU-ELAPSED (P9)
END-DEFINE
*
MOVE 'abc' TO #ARRAY (*)
MOVE '*' TO #DUMMY (*)
COMPRESS FULL #ARRAY (*) INTO #COMPRESS WITH DELIMITER '*'
MOVE '*' TO SUBSTRING (#COMPRESS,80,1)
*
INCLUDE AATITLER
INCLUDE AASETC
*
MOVE *CPU-TIME TO #CPU-START
SETA. SETTIME
FOR #LOOP = 1 TO 300000
  IGNORE
END-FOR
COMPUTE #CPU-ELAPSED = *CPU-TIME - #CPU-START
WRITE 3/10 'CONTROL LOOP TIME==>' *TIMD (SETA.) #CPU-ELAPSED
*
MOVE *CPU-TIME TO #CPU-START
SETB. SETTIME
FOR #LOOP = 1 TO 300000
  EXAMINE #STRING FOR 'XYZ' GIVING NUMBER #NUMBER
END-FOR
COMPUTE #CPU-ELAPSED = *CPU-TIME - #CPU-START
WRITE 3/10 'STRING TIME==>' *TIMD (SETB.) #CPU-ELAPSED
*
MOVE *CPU-TIME TO #CPU-START
SETC. SETTIME
FOR #LOOP = 1 TO 300000
  EXAMINE #COMPRESS FOR 'XYZ' GIVING NUMBER #NUMBER
END-FOR
COMPUTE #CPU-ELAPSED = *CPU-TIME - #CPU-START
WRITE 3/10 'compress time==>' *TIMD (SETC.) #CPU-ELAPSED
*
END

PAGE # 1 DATE: JAN 17, 2008
PROGRAM: EXAMIN16 LIBRARY: INSIDE

     CONTROL LOOP TIME==>       17        166


     STRING TIME==>       41        387


     COMPRESS TIME==>       40        386

steve

Wow. Great replies.
Didn’t realize how expensive array processing could be.
Seems the decision would be a little more code vs. readability/maintainability for an entire staff.
Afraid that it could get a little confusing when someone new comes to maintain the program and doesn’t quite get the idea of using a ‘dummy’ delimiter char to help.

Again… much thanks and I will pass this along to the staff here.

Jason

Well, of course you want to load them only once. But there are situations where you might have to load them each time, for example when you are calling a subprogram that returns an array.

In any case, with EXAMINE you must be careful if what you are looking for has trailing blanks. The FULL option of the EXAMINE statement will fix this:

DEFINE DATA LOCAL                                                  
1 #GROUP (15)                                                      
  2 #ARRAY (A3) INIT <'ABC','DEF','GHI','JKL','MNO','PQR','STU'>   
  2 #DUMMY (A1) INIT ALL <'~'>                                     
1 REDEFINE #GROUP                                                  
  2 #STRING (A60)                                                  
1 #SEARCH   (A3) INIT <'HI '>                                      
1 #P        (I2)                                                   
END-DEFINE                                                         
*                                                                  
WRITE '=' #STRING /                                                
EXAMINE #STRING FOR #SEARCH GIVING POSITION #P                     
IF #P GT 0                                                         
  PRINT '=' #SEARCH 'was found at position' #P ' -- FULL not used' 
END-IF                                                             
EXAMINE #STRING FOR FULL #SEARCH GIVING POSITION #P                
IF #P = 0                                                          
   PRINT '=' #SEARCH 'was not found within string.  -- FULL used'  
END-IF                                                             
END

It is a shame that array processing is so expensive; it is quite a bit easier to follow, as Jason points out.

If the array values change for each iteration (from an Adabas record or a CALLNAT, as suggested by Jerome), then the array values need to be moved to an intermediate structure, but the delimiters can be initialized at compile time. I’ll leave it to others to compare the efficiency of this structural move to a COMPRESS of the array and delimiters to create the target string.

DEFINE DATA LOCAL    
1 #M (I4)               CONST <10>    
1 #ARRAY (A3/#M)    
1 #STRING1 (A40)        INIT FULL LENGTH <H'FF'>  /* Hex FF
                        1 REDEFINE #STRING1    
  2 #TABLE1 (#M)    
    3 #STRUCT1 (A3)    
    3 FILLER 1X    
*    
1 #TABLE2 (#M)    
  2 #STRUCT2 (A3)    
  2 #DELIMITER (A1)     INIT (*) <H'FF'> /* Hex FF
                        1 REDEFINE #TABLE2    
  2 #STRING2 (A40)    
*    
1 #P (I4)    
END-DEFINE    
*    
ASSIGN #ARRAY (1)   = 'ABC'       /* Simulate array retrieval    
ASSIGN #STRUCT1 (*) = #ARRAY (*)  /* Insert into structure    
EXAMINE #STRING1 FOR 'ABC' GIVING POSITION #P    
WRITE '=' #P    
*    
ASSIGN #ARRAY (*)   = 'ABC'       /* Simulate array retrieval    
ASSIGN #STRUCT2 (*) = #ARRAY (*)  /* Insert into structure    
EXAMINE #STRING2 FOR 'ABC' GIVING POSITION #P    
WRITE '=' #P    
END

The first example initializes the entire string to the delimiter value. The second example initializes only the delimiters.

In a previous posting, my sample code contained EXAMINE … GIVING NUMBER. I should have coded EXAMINE … GIVING POSITION. It’s been a while since I ran comparisons, but I recall POSITION being quicker because it does not require a scan to the end of the string, as does NUMBER.
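For anyone who wants to try the comparison left open above (moving a freshly retrieved array into a pre-delimited structure versus COMPRESSing it with delimiters), here is a sketch of a timing harness in the same style as the earlier programs. The loop count, the '%' delimiter, the TM1./TM2. labels and the variable names are assumptions; no results are claimed, and as the mainframe figures earlier in the thread show, the outcome may depend on the platform and on whether Natural Optimizer is on. Both variants use GIVING POSITION.

```natural
* Sketch: structure move vs. COMPRESS when the array arrives fresh each time
DEFINE DATA LOCAL
1 #M (I4) CONST <20>
1 #ARRAY (A3/#M)
1 #TABLE (#M)
  2 #STRUCT (A3)
  2 #DELIM (A1) INIT ALL <'%'>      /* delimiters set once, at compile time
1 REDEFINE #TABLE
  2 #STRING (A80)
1 #COMPRESS (A80)
1 #LOOP (I4)
1 #LIMIT (I4) INIT <300000>
1 #P (I4)
END-DEFINE
*
MOVE 'XYZ' TO #ARRAY (#M)           /* worst case: match in the last occurrence
*
TM1. SETTIME
FOR #LOOP = 1 TO #LIMIT
  ASSIGN #STRUCT (*) = #ARRAY (*)   /* only the values are moved
  EXAMINE #STRING FOR 'XYZ' GIVING POSITION #P
END-FOR
WRITE 'STRUCTURE MOVE TIME==>' *TIMD (TM1.)
*
TM2. SETTIME
FOR #LOOP = 1 TO #LIMIT
  COMPRESS FULL #ARRAY (*) INTO #COMPRESS WITH DELIMITERS '%'
  EXAMINE #COMPRESS FOR 'XYZ' GIVING POSITION #P
END-FOR
WRITE 'COMPRESS TIME==>      ' *TIMD (TM2.)
END
```

FULL is used on the COMPRESS so that blank occurrences keep their width, which keeps the two target strings laid out alike.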