Proceedings Paper

Keyword spotting via word shape recognition
Author(s): Jeff L. DeCurtins; Edward C. Chen

Paper Abstract

With the advent of on-line access to very large collections of document images, electronic classification into areas of interest has become possible. A first approach to classification might be the use of OCR on each document followed by analysis of the resulting ASCII text. But if the quality of a document is poor, the format unconstrained, or time is critical, complete OCR of each image is not appropriate. An alternative approach is the use of word shape recognition (as opposed to individual character recognition) and the subsequent classification of documents by the presence or absence of selected keywords. Use of word shape recognition not only provides a more robust collection of features but also eliminates the need for character segmentation (a leading cause of error in OCR). In this paper we describe a system we have developed for the detection of isolated words, word portions, and multi-word phrases in images of documents. It is designed to be used with large, changeable keyword sets and very large document sets. The system provides for automated training of desired keywords and creation of indexing filters to speed matching.
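To make the idea of matching on word shape rather than on individual characters concrete, here is a minimal toy sketch. It is not the authors' system: their method works on word images, whereas this example works on text strings and assigns each letter a hand-picked shape class (ascender, descender, or x-height), a common generic word-shape coding assumed here purely for illustration. The functions `shape_code`, `build_index`, `spot`, and `spot_phrase` are hypothetical names; the shape index plays the role of an indexing filter that narrows the candidates before any finer matching.

```python
from collections import defaultdict

# Toy shape classes (an assumption for illustration, not the paper's features):
#   "A" = character with an ascender (tall lowercase letters, capitals, digits)
#   "D" = character with a descender ("j" is simplified to a descender here)
#   "x" = anything else, treated as confined to the x-height band
ASCENDERS = set("bdfhklt") | set("ABCDEFGHIJKLMNOPQRSTUVWXYZ") | set("0123456789")
DESCENDERS = set("gjpqy")


def shape_code(word: str) -> str:
    """Map a word to a coarse shape string, one symbol per character."""
    code = []
    for ch in word:
        if ch in ASCENDERS:
            code.append("A")
        elif ch in DESCENDERS:
            code.append("D")
        else:
            code.append("x")
    return "".join(code)


def build_index(docs: dict[str, list[str]]) -> dict[str, list[tuple[str, int]]]:
    """Index every word position by its shape code (hypothetical indexing filter)."""
    index: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for doc_id, words in docs.items():
        for pos, word in enumerate(words):
            index[shape_code(word)].append((doc_id, pos))
    return dict(index)


def spot(keyword: str, index: dict[str, list[tuple[str, int]]]) -> list[tuple[str, int]]:
    """Return candidate (doc_id, position) pairs whose shape matches the keyword."""
    return index.get(shape_code(keyword), [])


def spot_phrase(
    phrase: list[str],
    docs: dict[str, list[str]],
    index: dict[str, list[tuple[str, int]]],
) -> list[tuple[str, int]]:
    """Spot a multi-word phrase: filter on the first word, then check the rest."""
    codes = [shape_code(w) for w in phrase]
    hits = []
    for doc_id, pos in spot(phrase[0], index):
        window = docs[doc_id][pos:pos + len(codes)]
        if [shape_code(w) for w in window] == codes:
            hits.append((doc_id, pos))
    return hits


if __name__ == "__main__":
    docs = {
        "d1": ["some", "page", "images"],
        "d2": ["user", "match", "words"],
    }
    idx = build_index(docs)
    print(spot("match", idx))                      # [('d2', 1)]
    print(spot("same", idx))                       # [('d1', 0), ('d2', 0)]
    print(spot_phrase(["user", "match"], docs, idx))  # [('d2', 0)]
```

In this sketch "same", "some", and "user" all share the shape `xxxx`, so a shape lookup returns a candidate set rather than a final answer; a real system would still need a finer comparison to confirm each hit.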

Paper Details

Date Published: 30 March 1995
PDF: 8 pages
Proc. SPIE 2422, Document Recognition II, (30 March 1995); doi: 10.1117/12.205829
Author Affiliations
Jeff L. DeCurtins, SRI International (United States)
Edward C. Chen, SRI International (United States)

Published in SPIE Proceedings Vol. 2422:
Document Recognition II
Luc M. Vincent; Henry S. Baird, Editor(s)

© SPIE.