
Proceedings Paper

Robust keyword retrieval method for OCRed text
Author(s): Yusaku Fujii; Hiroaki Takebe; Hiroshi Tanaka; Yoshinobu Hotta

Paper Abstract

Document management systems have grown in importance as electronic filing becomes more common and as books, magazines, manuals, and similar materials are digitized with scanners or digital cameras for storage or for reading on a PC or an electronic book reader. Text obtained by optical character recognition (OCR) is usually attached to these electronic documents to support retrieval. Because OCR output generally contains character recognition errors, robust retrieval methods have been introduced to cope with them. In this paper, we propose a retrieval method that is robust against both character segmentation errors and character recognition errors. The method tolerates segmentation errors by allowing noise characters to be inserted and characters to be dropped during keyword matching. It tolerates recognition errors by allowing each keyword character to be matched against the OCR recognition candidates for a character position, or to be substituted by any other character. The recall rate of the proposed method was 15% higher than that of the conventional method; however, its precision rate was 64% lower.
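
The abstract does not give the matching algorithm itself, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes the OCR output is available as a list of candidate-character lists, one per character position, and uses a standard Sellers-style approximate matching recurrence with unit costs. The function name `find_keyword`, the lattice representation, the unit costs, and the `max_cost` threshold are all hypothetical choices made for this example.

```python
from typing import List, Sequence


def find_keyword(keyword: str,
                 ocr: Sequence[Sequence[str]],
                 max_cost: int = 1) -> List[int]:
    """Return the end positions (1-based, inclusive) in `ocr` at which
    `keyword` matches with a total edit cost of at most `max_cost`.

    `ocr[j]` holds the recognition candidates for OCR position j
    (hypothetical representation; a real OCR engine may expose this differently).
    """
    m = len(keyword)
    # prev[i]: cheapest alignment of keyword[:i] ending at the previous OCR position.
    # Before any OCR position is read, matching i keyword characters means dropping all i.
    prev = list(range(m + 1))
    hits = []
    for j, candidates in enumerate(ocr, start=1):
        cur = [0] * (m + 1)  # cur[0] == 0: a match may start at any position
        for i in range(1, m + 1):
            # Free if the keyword character is among the candidates,
            # otherwise count it as a substitution by some other character.
            sub = 0 if keyword[i - 1] in candidates else 1
            cur[i] = min(
                prev[i - 1] + sub,  # candidate match or substitution
                prev[i] + 1,        # OCR position treated as an inserted noise character
                cur[i - 1] + 1,     # keyword character dropped from the OCR text
            )
        if cur[m] <= max_cost:
            hits.append(j)
        prev = cur
    return hits


# Recognition error: the top candidates read "cot", but "a" is a second candidate.
lattice = [["c", "e"], ["o", "a"], ["t", "l"]]
print(find_keyword("cat", lattice, max_cost=0))

# Segmentation error: a noise character was inserted between "a" and "t".
noisy = [["c"], ["a"], ["'"], ["t"]]
print(find_keyword("cat", noisy, max_cost=1))
```

Raising `max_cost` in this sketch admits more matches, which illustrates the kind of trade-off the abstract reports: a more tolerant matcher can raise recall while lowering precision.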

Paper Details

Date Published: 24 January 2011
PDF: 7 pages
Proc. SPIE 7874, Document Recognition and Retrieval XVIII, 787411 (24 January 2011); doi: 10.1117/12.876470
Author Affiliations
Yusaku Fujii, Fujitsu Labs., Ltd. (Japan)
Hiroaki Takebe, Fujitsu Labs., Ltd. (Japan)
Hiroshi Tanaka, Fujitsu Labs., Ltd. (Japan)
Yoshinobu Hotta, Fujitsu Labs., Ltd. (Japan)

Published in SPIE Proceedings Vol. 7874:
Document Recognition and Retrieval XVIII
Gady Agam; Christian Viard-Gaudin, Editor(s)
