These are the steps of indexing in Lucene given in our syllabus:
The first step says that it is creating an index, whereas the last step says that it’s adding a document to the index.
What’s the difference between these two? Can I get an example?
Here’s what I think should happen:
- Collect all the words from each document and list them like this:
doc1=>word1,word2,WORD3….wordn
doc2=>word1,WORD2,word3….wordn
And so on.
- Analyse the words: remove whichever words the analyzer filters out, and process the remaining ones as the analyzer specifies.
Say what now remains is:
doc1=>word1,word3,…word(n-1)
doc2=>word2,…word(n-3)
- Done. Now you can build the inverted index by converting these per-document word lists into a mapping from each word to the documents that contain it.
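
To make the three steps above concrete, here is a minimal sketch of how I picture them in plain Java. It does not use the Lucene API at all: the class name `InvertedIndexSketch`, the `analyze` and `buildIndex` methods, and the tiny stop-word list are all made up for illustration, and a real Lucene analyzer would likely do more (or different) processing than lowercasing and stop-word removal.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class InvertedIndexSketch {
    // Hypothetical stop-word list (an assumption, not Lucene's actual list).
    static final Set<String> STOP_WORDS = Set.of("a", "an", "the", "is", "of", "and");

    // Steps 1 and 2: collect the words of one document, lowercase them,
    // and drop the ones this toy "analyzer" filters out.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Step 3: turn "doc => words" into "word => docs" (the inverted index).
    static Map<String, SortedSet<String>> buildIndex(Map<String, String> docs) {
        Map<String, SortedSet<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> entry : docs.entrySet()) {
            for (String term : analyze(entry.getValue())) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(entry.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("doc1", "The quick brown Fox");
        docs.put("doc2", "A lazy fox and the DOG");
        // Print each surviving word with the documents that contain it.
        buildIndex(docs).forEach((term, ids) -> System.out.println(term + " => " + ids));
    }
}
```

With these two sample documents, "fox" ends up pointing at both doc1 and doc2, while "the", "a" and "and" never reach the index because the toy stop-word list removes them.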