Chapter 4 Index Construction
This chapter introduces and compares several ways to construct an index.
Comparing BSBI (blocked sort-based indexing) with SPIMI (single-pass in-memory indexing), I found that SPIMI is better than BSBI in
several respects:
1. SPIMI does not need a data structure that maps terms to termIDs, which can take up a lot of
memory for large collections.
2. SPIMI does not sort the tokens (BSBI's termID-docID pairs); it appends each docID directly to its term's postings list, so it runs faster than BSBI.
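For contrast with points 1 and 2, here is a minimal BSBI-style block inversion in Python. It is my own sketch, not the book's pseudocode: `bsbi_invert_block` is a hypothetical name, a block is assumed to be an in-memory list of (docID, tokens) pairs, and writing the block to disk is left out. The two costs SPIMI avoids are visible: the global `term_to_id` map and the sort over every (termID, docID) pair.

```python
def bsbi_invert_block(docs, term_to_id):
    """Invert one block BSBI-style (hypothetical helper, illustration only).

    docs: list of (doc_id, tokens) pairs for this block (assumed input shape).
    term_to_id: global dict term -> termID shared by all blocks; this is the
    mapping that SPIMI does not need.
    """
    pairs = []
    for doc_id, tokens in docs:
        for token in tokens:
            # Assign the next free termID the first time a term is seen.
            term_id = term_to_id.setdefault(token, len(term_to_id))
            pairs.append((term_id, doc_id))
    # The defining BSBI step: sort all (termID, docID) pairs of the block,
    # one pair per token.
    pairs.sort()
    postings = {}
    for term_id, doc_id in pairs:
        plist = postings.setdefault(term_id, [])
        # Pairs are sorted, so repeats of a docID are adjacent; keep one.
        if not plist or plist[-1] != doc_id:
            plist.append(doc_id)
    return postings
```

The sort is over as many pairs as there are tokens in the block, which is where BSBI's extra time goes.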
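However, I also noticed that SPIMI does sort something: before writing a block's index to disk, it sorts the block's terms in lexicographic order
so that the blocks can be merged more easily later. Since this sorts only the distinct terms of a block rather than every token, it is usually much cheaper than BSBI's sort.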
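To make this concrete, here is a minimal SPIMI-style sketch in Python, my own illustration rather than the book's pseudocode. `spimi_invert_block` and `merge_blocks` are hypothetical names; blocks are kept in memory as term-sorted lists instead of disk files, and docIDs are assumed to increase from one block to the next.

```python
import heapq
from itertools import groupby


def spimi_invert_block(docs):
    """Invert one block SPIMI-style (hypothetical helper, illustration only).

    docs: list of (doc_id, tokens) pairs with increasing doc_ids (assumed).
    No termIDs and no sorting of postings.
    """
    dictionary = {}  # term -> postings list, grown as tokens arrive
    for doc_id, tokens in docs:
        for token in tokens:
            plist = dictionary.setdefault(token, [])
            # docIDs arrive in increasing order, so postings stay sorted
            # without a sort; only skip a repeat of the current docID.
            if not plist or plist[-1] != doc_id:
                plist.append(doc_id)
    # The only sort: the block's distinct terms, so blocks can be merged later.
    return [(term, dictionary[term]) for term in sorted(dictionary)]


def merge_blocks(blocks):
    """k-way merge of term-sorted blocks into one index (sketch)."""
    merged = []
    # heapq.merge requires each input to be sorted by the key (here: term),
    # which is exactly why SPIMI sorts terms before writing a block.
    stream = heapq.merge(*blocks, key=lambda entry: entry[0])
    for term, group in groupby(stream, key=lambda entry: entry[0]):
        postings = []
        for _, plist in group:
            postings.extend(plist)  # earlier blocks hold smaller docIDs
        merged.append((term, postings))
    return merged
```

For equal terms, `heapq.merge` yields entries from earlier inputs first, so concatenating the postings keeps them in docID order under the stated assumption about block order.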
Although SPIMI seems better than BSBI, I am curious about the situations in which
BSBI is more suitable than SPIMI. Or is SPIMI always faster than BSBI?