
Proceedings Paper

An improved inverted index model and its retrieval algorithm
Author(s): Chaotao Liu; Zushu Li

Paper Abstract

The traditional inverted index scheme is limited because it records only the frequency and positions of word terms in documents, not the spatial order of those terms within the documents' structure. This paper develops an improved inverted index scheme that combines paragraph sequences, sentence sequences, and word sequences into a list that replaces the posting list of the traditional inverted index. An algorithm for similarity calculation and text retrieval based on this improved scheme is given. The new similarity is the traditional similarity multiplied by a paragraph sequence similarity coefficient, a sentence sequence similarity coefficient, and a word sequence similarity coefficient, which can be denoted as SimNew(D,Q) = Sim(D,Q) * CeofP * CeofS * CeofW. Documents are then ranked by this similarity to produce the retrieval results. In an experiment, documents selected from Google search results were reranked by the similarity calculated with this algorithm. The results show that the algorithm helps users retrieve information that matches their queries more closely.
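As a rough illustration of the scheme described in the abstract, the sketch below builds an index whose postings are (paragraph, sentence, word) sequence triples and combines a base similarity with three coefficients as in SimNew(D,Q) = Sim(D,Q) * CeofP * CeofS * CeofW. The abstract does not say how the coefficients or the base similarity are computed, so they are plain inputs here. The function names `build_index` and `sim_new`, the blank-line paragraph split, and the punctuation-based sentence split are assumptions made for this example, not the authors' implementation.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map term -> doc_id -> list of (paragraph, sentence, word) triples.

    Assumption: paragraphs are separated by blank lines and sentences end
    with '.', '!' or '?'. The paper may segment text differently.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        paragraphs = [p for p in text.split("\n\n") if p.strip()]
        for p_no, paragraph in enumerate(paragraphs):
            sentences = [s for s in re.split(r"[.!?]+", paragraph) if s.strip()]
            for s_no, sentence in enumerate(sentences):
                words = re.findall(r"\w+", sentence.lower())
                for w_no, word in enumerate(words):
                    # The triple stands in for a traditional posting entry.
                    index[word][doc_id].append((p_no, s_no, w_no))
    return index


def sim_new(sim, ceof_p, ceof_s, ceof_w):
    """SimNew(D,Q) = Sim(D,Q) * CeofP * CeofS * CeofW.

    The coefficients are taken as given; how to derive them from the
    sequence triples is not specified in the abstract.
    """
    return sim * ceof_p * ceof_s * ceof_w


docs = {"d1": "Inverted index models. Retrieval works.\n\nIndex again."}
index = build_index(docs)
print(index["index"]["d1"])
print(sim_new(0.5, 1.0, 0.5, 0.5))
```

Documents could then be ranked by sorting on the `sim_new` score in descending order, which corresponds to the reranking step mentioned in the abstract.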

Paper Details

Date Published: 9 January 2008
PDF: 6 pages
Proc. SPIE 6794, ICMIT 2007: Mechatronics, MEMS, and Smart Materials, 679443 (9 January 2008); doi: 10.1117/12.784029
Author Affiliations
Chaotao Liu, Chongqing Univ. (China) and Chongqing Jiaotong Univ. (China)
Zushu Li, Chongqing Univ. (China) and Chongqing Institute of Technology (China)


Published in SPIE Proceedings Vol. 6794:
ICMIT 2007: Mechatronics, MEMS, and Smart Materials
