I need to update some specific fields of XML documents (records) in a huge database. I figure out that the most straightforward way is to use XQuery programmatically as follows:
for each “somekey” in list of “keys”
update for $a in input()/root where $a/key=“somekey”
do insert “value corresponding to ‘somekey’” into $a/target
} /* of course, this is only the algorithm which I will implement in Java API */
Here the “/root/key” element is properly indexed. But, I have some 10000 distinct “keys” which I try to match, and corresponding to each key I insert the “newelement” into the “/root/target”. And this particular doctype “root” contains some 4.5 million documents/records in it.
My question is: given this scenario, instead of doing as above, would it be advantageous to access the database records sequentially (i.e. going through the whole database only once), modify the document in memory and put it back into the database? The algorithm would be something like:
for each document “doc” in “root” doctype
currentkey ← “doc”/root/key
insert element value corresponding to ‘currentkey’ into “doc”/root/target
write back “doc” to the database
In other words, I seem to fail to appreciate how does indexing help in general? Does my first algorithm need 100004500000 operations (based on the numbers I gave above)? Or does it only need 10000 operations? Or, probably it actually needs 10000log(4500000), but it is still very bad. How many operations does it take to fetch the right document based on an indexed value?
Could you please pour in your valuable ideas. I can give more details, if you need … for now I wanted to keep it simple, hence this silly way of writing.