
Proceedings Paper

Highly accurate retrieval method of Japanese document images through a combination of morphological analysis and OCR
Author(s): Yutaka Katsuyama; Hiroaki Takebe; Koji Kurokawa; Takahiro Saitoh; Satoshi Naoi

Paper Abstract

We have developed a method that retrieves Japanese document images more accurately by combining OCR character candidate information with a conventional plain text search engine. In this method, the document image is first recognized by normal OCR to produce text. Keyword areas are then estimated from the OCR-produced text through morphological analysis. A lattice of candidate character codes is extracted from these areas, and character strings are then extracted from the lattice using a word-matching method in noun areas and a K-th DP-matching method in undefined word areas. Finally, the extracted character strings are added to the OCR-produced text to improve document retrieval accuracy when using a conventional plain text search engine. Experimental results from searches of 49 OHP sheet images showed that our method achieves a recall rate of 98.2%, compared to 90.3% for a conventional method using only the OCR-produced text, while requiring about the same processing time as normal OCR.
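The word-matching step in noun areas can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `match_words` and `augment_text`, the list-of-lists lattice representation, and the toy dictionary are all assumptions made for illustration, and the K-th DP-matching used for undefined word areas is not shown. The idea is that a dictionary word is accepted if each of its characters appears among the OCR candidates at the corresponding position, and accepted words are appended to the OCR text so a plain text search engine can find them.

```python
from typing import List, Set


def match_words(lattice: List[List[str]], dictionary: Set[str]) -> List[str]:
    """Return dictionary words that can be spelled from the candidate lattice.

    lattice[i] holds the OCR candidate characters for position i
    (hypothetical representation; the paper does not specify one).
    """
    found: List[str] = []
    for start in range(len(lattice)):
        for word in sorted(dictionary):
            end = start + len(word)
            if end > len(lattice):
                continue  # word would run past the end of the keyword area
            # Accept the word if every character is a candidate at its position.
            if all(ch in lattice[start + i] for i, ch in enumerate(word)):
                if word not in found:
                    found.append(word)
    return found


def augment_text(ocr_text: str, lattice: List[List[str]], dictionary: Set[str]) -> str:
    """Append matched words missing from the OCR text, for plain text search."""
    extras = [w for w in match_words(lattice, dictionary) if w not in ocr_text]
    return ocr_text + " " + " ".join(extras) if extras else ocr_text


# Toy example: the top-ranked OCR result is "検素", but the second candidate
# at position 1 allows the dictionary word "検索" to be recovered.
example_lattice = [["検", "険"], ["素", "索"]]
example_dictionary = {"検索", "文書"}
print(augment_text("検素", example_lattice, example_dictionary))
```

In this sketch the original OCR text is kept and the recovered strings are only added to it, which matches the abstract's description of adding extracted strings rather than replacing the OCR output.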

Paper Details

Date Published: 18 December 2001
PDF: 11 pages
Proc. SPIE 4670, Document Recognition and Retrieval IX, (18 December 2001); doi: 10.1117/12.450739
Author Affiliations
Yutaka Katsuyama, Fujitsu Labs. Ltd. (Japan)
Hiroaki Takebe, Fujitsu Labs. Ltd. (Japan)
Koji Kurokawa, Fujitsu Labs. Ltd. (Japan)
Takahiro Saitoh, Fujitsu Labs. Ltd. (Japan)
Satoshi Naoi, Fujitsu Labs. Ltd. (Japan)

Published in SPIE Proceedings Vol. 4670:
Document Recognition and Retrieval IX
Paul B. Kantor; Tapas Kanungo; Jiangying Zhou, Editor(s)
