Proceedings Paper

Communication theory framework for document recognition
Author(s): Gary E. Kopec; Philip A. Chou

Paper Abstract

Document image decoding (DID) is a recently proposed generic framework for document recognition that is based on an explicit communication theory view of the processes of document creation, transmission, and interpretation. DID views a document recognition problem as containing four elements: a message (information) source, an encoder (formatter and renderer), a noisy channel (e.g., printer, scanner), and an image decoder (recognizer). Applying DID to a particular recognition problem involves developing stochastic models for the source, encoder, and channel processes. The DID approach to modeling is based on stochastic attributed context-free grammars. The kernel of the DID approach to image decoding is an informed best-first search algorithm, called the iterated complete path (ICP) algorithm, that is similar to branch-and-bound and related heuristic search and optimization techniques. Decoders are produced by a decoder generator, whose inputs are a Markov source model and values for the channel parameters. The generator creates the necessary computation schedules and outputs an optimized in-line C program that implements the decoder. The customized decoder program is then compiled, linked with a support library, and used to decode images.
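
The source/channel/decoder view in the abstract can be illustrated with a toy maximum a posteriori (MAP) decoder. The sketch below is not the ICP algorithm and not the generated C decoder described above; it is a plain Viterbi dynamic program over a hypothetical two-symbol Markov source observed through an assumed pixel-flipping channel. The function name `viterbi_decode`, the tables `LOG_PRIOR`, `LOG_TRANS`, and `LOG_CHANNEL`, and all probability values are illustrative assumptions, not taken from the paper.

```python
import math

def viterbi_decode(observations, states, log_prior, log_trans, log_channel):
    """Return (log score, path) of the most probable source sequence.

    Maximizes log P(source path) + log P(observations | path),
    i.e. the MAP criterion for a Markov source and a memoryless channel.
    """
    # best[s] holds the best-scoring path that ends in state s so far.
    best = {
        s: (log_prior[s] + log_channel[s][observations[0]], [s])
        for s in states
    }
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            # Pick the best predecessor p for state s.
            score, path = max(
                ((best[p][0] + log_trans[p][s], best[p][1]) for p in states),
                key=lambda item: item[0],
            )
            new_best[s] = (score + log_channel[s][obs], path + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])

# Hypothetical toy model: symbols "a" and "b" render as pixel 0 and 1,
# and the channel flips a pixel with probability 0.1 (assumed values).
STATES = ["a", "b"]
LOG_PRIOR = {"a": math.log(0.5), "b": math.log(0.5)}
LOG_TRANS = {
    "a": {"a": math.log(0.8), "b": math.log(0.2)},
    "b": {"a": math.log(0.2), "b": math.log(0.8)},
}
LOG_CHANNEL = {
    "a": {0: math.log(0.9), 1: math.log(0.1)},
    "b": {0: math.log(0.1), 1: math.log(0.9)},
}

if __name__ == "__main__":
    noisy_image = [0, 0, 1, 0, 0]  # one pixel possibly flipped by the channel
    score, path = viterbi_decode(
        noisy_image, STATES, LOG_PRIOR, LOG_TRANS, LOG_CHANNEL
    )
    print(path, score)
```

In this toy setting the source model and the channel model are combined in a single score, so an isolated pixel that disagrees with its neighbors can be explained as channel noise rather than as a change of symbol. An informed best-first search such as ICP targets the same kind of optimum while trying to avoid scoring every transition.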

Paper Details

Date Published: 23 March 1994
PDF: 2 pages
Proc. SPIE 2181, Document Recognition, (23 March 1994); doi: 10.1117/12.171095
Gary E. Kopec, Xerox Palo Alto Research Ctr. (United States)
Philip A. Chou, Xerox Palo Alto Research Ctr. (United States)

Published in SPIE Proceedings Vol. 2181:
Document Recognition
Luc M. Vincent; Theo Pavlidis, Editor(s)

© SPIE