Starting with the next issue of Inside Natural (see website; address below in signature)
I have added a new column. It is called “CPU Police”. The premise of the column
is simple. Many programmers, usually under time constraints, write code for simple
functions without considering alternative code. The code ends up in production.
Unfortunately, sometimes this code ends up within loops that are executed literally
millions of times per day.
Take the code that Weihnachsbar contributed to address the posting of Joseph regarding
“extracting” a “pattern” text string. My guess is that virtually everyone reading this
would implement this solution without giving it a second thought. It is a fine solution,
except, one can write code that performs the same function yet executes in one third
the time (and consumes about half the CPU time).
This is NOT to discourage participation in this forum (or, SAG-L for that matter).
To the contrary, the point of this posting is to encourage individuals within
the Natural community to find ways to better utilize Natural by writing more efficient code,
AND, sharing that code with the rest of the user community.
Part of the problem is that when the original posting was made it sort of specified
“PATTERN” as part of the solution. As will be shown below, “extracting” a “pattern”
of characters is not something that the PATTERN option does very well. Knowing that
PATTERN is fairly expensive, and noting that the posted solution had two EXAMINE for
PATTERNs, I rewrote a solution as follows:
EXAMINE #ORIGINAL FOR ‘a’ GIVING POSITION #PATTERNSTART
EXAMINE SUBSTRING (#ORIGINAL,#PATTERNSTART) FOR ‘the’
GIVING POSITION #LENGTH
ADD 2 TO #LENGTH /* AFTER EXAMINE, ONLY HAVE UPTO THE t
Note, I have, for demonstration simplicity, eliminated various tests, and the
final MOVE SUBSTRING, which would be in common with the original solution.
The point of the above is that two EXAMINEs without PATTERN were likely to be
faster than two EXAMINEs with PATTERN, plus an EXAMINE without PATTERN, plus
a MOVE.
I wrote a timing comparison on my PC (Natural 6.1.1). Here is the code and the
output.
DEFINE DATA LOCAL
1 #LENGTH (I4)
1 #AFTERREPLACE (I4)
1 #PATTERNLENGTH (I4)
1 #PATTERNSTART (I4)
*
1 #ORIGINAL (A250)
1 #COPY (A250)
1 #FOUNDPATTERN (A250)
1 #FINDPATTERN (A250)
1 #LOOP (P9)
1 #START-CPU (P9)
1 #CPU-ELAPSED (P9)
END-DEFINE
*
INCLUDE AATITLER
INCLUDE AASETC
*
#ORIGINAL := ‘When a pattern was found in my string, to get the ‘-
’ substring that matched the pattern, or to get the length of the’-
’ pattern-matching substring’
*
#FINDPATTERN := ‘a*the’
*
SETB. SETTIME
MOVE CPU-TIME TO #START-CPU
FOR #LOOP = 1 TO 100000
EXAMINE #ORIGINAL FOR ‘a’ GIVING POSITION #PATTERNSTART
EXAMINE SUBSTRING (#ORIGINAL,#PATTERNSTART) FOR ‘the’
GIVING POSITION #LENGTH
ADD 2 TO #LENGTH / AFTER EXAMINE, ONLY HAVE UPTO THE t
END-FOR
COMPUTE #CPU-ELAPSED = *CPU-TIME - #START-CPU
WRITE 5T ‘FAST WAY’ *TIMD (SETB.) #CPU-ELAPSED /
*
SETA. SETTIME
MOVE *CPU-TIME TO #START-CPU
FOR #LOOP = 1 TO 100000
EXAMINE #ORIGINAL FOR ’ ’ GIVING LENGTH #LENGTH
#COPY := #ORIGINAL
EXAMINE #COPY FOR PATTERN #FINDPATTERN REPLACE FIRST WITH ‘.’
GIVING LENGTH #AFTERREPLACE
*
#PATTERNLENGTH := #LENGTH - #AFTERREPLACE + 1
EXAMINE #ORIGINAL FOR PATTERN #FINDPATTERN
GIVING POSITION #PATTERNSTART
END-FOR
COMPUTE #CPU-ELAPSED = *CPU-TIME - #START-CPU
WRITE 5T ‘SLOW WAY’ *TIMD (SETA.) #CPU-ELAPSED /
*
SETC. SETTIME
MOVE *CPU-TIME TO #START-CPU
FOR #LOOP = 1 TO 100000
IGNORE
END-FOR
COMPUTE #CPU-ELAPSED = *CPU-TIME - #START-CPU
WRITE 5T ‘FOR LOOP’ *TIMD (SETC.) #CPU-ELAPSED
END
PAGE # 1 DATE: Sep 08, 2006
PROGRAM: PATTRN04 LIBRARY: SYSTEM
FAST WAY 33 324
SLOW WAY 48 466
FOR LOOP 5 56
For a meaningful comparison, one has to subtract the overhead of the FOR loop from the
two approaches. Thus, the elapsed times are 28 versus 43 ( a saving of a third in the
elapsed time ) and the CPU times are 268 versus 410 (a similar saving of about a third).
Then I realized that I could improve Weihnachsbar’s code. The second EXAMINE for PATTERN
was actually not necessary. Instead, the functionality of the second EXAMINE for PATTERN
could be incorporated within the first EXAMINE for PATTERN, as in:
EXAMINE #COPY FOR PATTERN #FINDPATTERN REPLACE FIRST WITH '.'
GIVING POSITION #PATTERNSTART GIVING LENGTH #AFTERREPLACE
The new timings were:
PAGE # 1 DATE: Sep 08, 2006
PROGRAM: PATTRN05 LIBRARY: SYSTEM
FAST WAY 33 324
SLOW WAY 36 349
FOR LOOP 5 56
The difference between the two was certainly reduced. After FOR loop subtractions, the
performance difference was down to about 10% (as opposed to 30% earlier).
Then I remembered an interesting fact. SUBSTRING on the mainframe was rewritten
for Version 4, resulting in substantial performance improvement. I ran PATTRN05
on the mainframe. Here are the numbers:
PAGE # 1 DATE: Sep 08, 2006
PROGRAM: PATTRN05 LIBRARY: SYSTEM
FAST WAY 4 37
SLOW WAY 11 64
FOR LOOP 1 5
Since most of you, I am sure, run Natural on the mainframe, the numbers above
are the most important in this exercise. Subtracting out the FOR loop, shows an
almost 50 reduction in CPU time, and a 2/3 reduction in elapsed time.
steve