I have an interesting performance question.
There is a File A with over 1 million records to process, and there is a File B with ~20 records. I have to process all records of File A, but for each one I have to check whether there is a matching record in File B, in which case some additional work is done. In Natural you would code:
READ FILE_A BY PRIMARY-KEY
  /* do some stuff here
  FN-B. FIND NUMBER FILE_B WITH FOREIGN-KEY = FILE_A.PRIMARY-KEY
  IF *NUMBER(FN-B.) NE 0
    /* do some additional stuff here
  END-IF
END-READ
So far so good. In my case, however, I feel this code performs badly, because it issues one million FINDs. So I tried the following:
READ FILE_B BY FOREIGN-KEY
  ADD 1 TO #I4
  EXPAND ARRAY #ARRAY TO (1:#I4)
  #ARRAY(#I4) := FOREIGN-KEY
END-READ
*
READ FILE_A BY PRIMARY-KEY
  /* do some stuff here
  IF PRIMARY-KEY = #ARRAY(*)
    /* do some additional stuff here
  END-IF
END-READ
Here are the performance figures (where 1486 means 148.6 secs):
search array 1486 #FOUND: 20
find number 2527 #FOUND: 20
→ Searching an array (with size 20) is faster than FIND NUMBER.
But now I want to take the next step.
Question #1 is: At which array size does the array technique become slower than FIND NUMBER?
→ I wrote a test program to try out some sizes (a sketch of such a harness follows the numbers below). The break-even point seems to be around 500; of course this could depend on hardware etc.
search array 2503 #FOUND: 500
find number 2513 #FOUND: 500
search array 3547 #FOUND: 1000
find number 2443 #FOUND: 1000
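For reference, here is a minimal sketch of how such a timing harness could look. This is my reconstruction, not the original test program: the A10 key format is a placeholder, and it relies on Natural's rule that format T values count in tenths of a second in arithmetic, which matches the granularity of the figures above.

DEFINE DATA LOCAL
1 FILE_A VIEW OF FILE_A
  2 PRIMARY-KEY
1 #ARRAY   (A10/1:*)            /* X-array, loaded from FILE_B as shown above
1 #START   (T)
1 #ELAPSED (N7)
1 #FOUND   (I4)
END-DEFINE
*
#START := *TIMX
READ FILE_A BY PRIMARY-KEY
  IF PRIMARY-KEY = #ARRAY(*)    /* the sequential array search being measured
    ADD 1 TO #FOUND
  END-IF
END-READ
#ELAPSED := *TIMX - #START      /* difference in tenths of seconds
WRITE 'search array' #ELAPSED '#FOUND:' #FOUND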
Question #2 is: For huge arrays there must be a better way to search than IF PRIMARY-KEY = #ARRAY(*). Which way is best? I already have an idea, but what do you think?
The reason for this runtime behavior is that the array is scanned sequentially: every occurrence is run through the IF statement until one occurrence satisfies it.
You could write a function that uses the technique of folding (German: Falten), i.e. repeatedly halving the search interval of a sorted array, as in a binary search.
This decreases the number of executed IF statements considerably: roughly log2(n) comparisons instead of up to n.
Using techniques like (A) DYNAMIC and BY VALUE in the parameter definition even enables you to write an abstract solution to this problem that can be reused for similar problems.
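To make the folding idea concrete, here is a minimal sketch of such a binary search in Natural. This is my interpretation, not tested code: it assumes #ARRAY is sorted ascending (which the READ FILE_B BY FOREIGN-KEY load above already delivers) and that the search runs inside the READ of FILE_A, replacing IF PRIMARY-KEY = #ARRAY(*).

DEFINE DATA LOCAL
1 #LOW   (I4)
1 #HIGH  (I4)
1 #MID   (I4)
1 #INDEX (I4)                     /* 0 = key not found
END-DEFINE
*
* inside the READ of FILE_A:
#LOW   := 1
#HIGH  := *OCC(#ARRAY)
#INDEX := 0
REPEAT WHILE #LOW LE #HIGH
  #MID := (#LOW + #HIGH) / 2      /* truncating division halves ("folds") the interval
  DECIDE FOR FIRST CONDITION
    WHEN #ARRAY(#MID) = PRIMARY-KEY
      #INDEX := #MID              /* hit: remember the occurrence
      ESCAPE BOTTOM
    WHEN #ARRAY(#MID) < PRIMARY-KEY
      #LOW := #MID + 1            /* key can only be in the upper half
    WHEN NONE
      #HIGH := #MID - 1           /* key can only be in the lower half
  END-DECIDE
END-REPEAT
IF #INDEX NE 0
  /* do some additional stuff here
END-IF

Per record this executes at most about log2(n) comparisons instead of up to n, which is why the break-even against FIND NUMBER should move far beyond 500 occurrences.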
EXAMINE FULL #ARRAY(*) FOR FULL … GIVING INDEX … is not faster.
Here are the numbers:
search array 1463
examine 1436
find number 2477
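Roughly, that EXAMINE variant looks like this inside the READ of FILE_A (a sketch; #IX is a hypothetical I4 work field):

EXAMINE FULL #ARRAY(*) FOR FULL PRIMARY-KEY GIVING INDEX #IX
IF #IX NE 0                       /* 0 means no occurrence matched
  /* do some additional stuff here
END-IF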
In my case, there are only 20 records in File B, and I know that number doesn't change much. But I thought: what if there were 100 or even more records? At what point does IF #ARRAY(*) = key become slower than FIND? In my case it was 500, which is interesting, because I didn't expect such a high number. And I think that with a smarter search algorithm we can push that limit beyond 1000. That would change the way I write programs a lot.
From the results you published, I understand that EXAMINE is faster, am I right?
I am not sure that calling a function will improve the overall time, but it will surely be useful.
But it could be that the BY VALUE parameter definition makes performance worse.
The reason is that with every invocation of the subprogram, the whole array is physically copied.
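For illustration, a sketch of such a parameter data area (the field formats are placeholders, and whether BY VALUE is accepted on a (1:V) array may depend on your Natural version):

DEFINE DATA PARAMETER
1 #KEY   (A) DYNAMIC BY VALUE     /* scalar: copying is cheap
1 #ARRAY (A/1:V) DYNAMIC BY VALUE /* whole array physically copied on every CALLNAT
1 #INDEX (I4)                     /* by reference: result travels back to the caller
END-DEFINE

Dropping BY VALUE on #ARRAY passes it by reference and avoids the copy, at the price that the subprogram could modify the caller's array.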
It seems to me that FIND is always around 2400 regardless of the array size, whereas the search variants (function, external subroutine, copycode etc.) vary depending on the array size.