I also thought about the buffer pool. My test code above reads the first 1,000,000 records of file_a and takes e.g. 500 records from it for searching. The records to search for are evenly distributed across the file.
After that I run the first test. I can't prove it, but I suspect some (or even all) of the file_a records are already in the buffer pool before the first test starts.
Lots of good suggestions here. From my perspective and experience:
The outer loop will consume most of the time once you eliminate the FIND NUMBER.
Doing a READ to load the records into an array is a great idea; I do it all the time. Note that I also use this technique for online table lookups, using +AIV arrays.
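A minimal Natural sketch of that load-once technique, assuming made-up names (MY-DDM, KEY-FIELD, DATA-FIELD are illustrative, not from the original posts): read the lookup file a single time into parallel arrays, then search those arrays in memory instead of issuing a database call per lookup.

```natural
* Sketch only: MY-DDM, KEY-FIELD and DATA-FIELD are assumed names.
DEFINE DATA LOCAL
1 LOOKUP-VIEW VIEW OF MY-DDM
  2 KEY-FIELD
  2 DATA-FIELD
1 #KEY-TAB  (A10/1:1000)      /* parallel arrays holding the table
1 #DATA-TAB (A20/1:1000)
1 #CNT (I4)
END-DEFINE
*
* Load the table once, up front
READ LOOKUP-VIEW BY KEY-FIELD
  ADD 1 TO #CNT
  #KEY-TAB(#CNT)  := KEY-FIELD
  #DATA-TAB(#CNT) := DATA-FIELD
END-READ
END
```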
If you are reading lots of records for the lookup table, use MULTI-FETCH.
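The MULTI-FETCH clause on a READ asks Adabas to return records in blocks rather than one per call. A hedged sketch (view and field names assumed, block size of 100 chosen for illustration):

```natural
* MULTI-FETCH OF 100 fetches records in blocks of 100, reducing
* the number of Adabas calls for a long sequential read.
READ MULTI-FETCH OF 100 LOOKUP-VIEW BY KEY-FIELD
  /* process each record as usual
END-READ
```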
Try setting the array size to its expected maximum at the beginning, and only expand further when you actually reach that maximum, instead of expanding on every record.
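With an X-array this could look roughly like the sketch below (sizes and names are assumptions for illustration): one EXPAND ARRAY up front to the expected maximum, then growth in large chunks only when the array is full.

```natural
DEFINE DATA LOCAL
1 #TAB (A10/1:*)              /* X-array, resizable at runtime
1 #CNT (I4)
1 #MAX (I4)
END-DEFINE
*
* Expand once to the expected maximum, not once per record
#MAX := 100000
EXPAND ARRAY #TAB TO (1:#MAX)
*
* Later, while filling: grow in big chunks only when full
ADD 1 TO #CNT
IF #CNT > #MAX
  ADD 10000 TO #MAX
  EXPAND ARRAY #TAB TO (1:#MAX)
END-IF
#TAB(#CNT) := 'VALUE'
END
```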
In many real-life situations there are related fields (parallel arrays), so EXAMINE with GIVING INDEX, which returns the occurrence where the value was found, is very valuable for obtaining the related data items.
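A short sketch of that parallel-array lookup, assuming arrays like those loaded earlier (#KEY-TAB and #CITY-TAB are hypothetical names): the occurrence returned by GIVING INDEX addresses the same position in the related array.

```natural
DEFINE DATA LOCAL
1 #KEY-TAB  (A10/1:500)       /* parallel arrays loaded earlier
1 #CITY-TAB (A20/1:500)
1 #SEARCH-KEY (A10)
1 #CITY (A20)
1 #IX (I4)
END-DEFINE
*
* GIVING INDEX returns the occurrence where the value was found;
* the same index then picks the related item from the other array.
EXAMINE #KEY-TAB(*) FOR #SEARCH-KEY GIVING INDEX #IX
IF #IX > 0
  #CITY := #CITY-TAB(#IX)
END-IF
END
```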
My 5 cents.