Proceedings Paper

Stochastic attribute grammar model of document production and its use in document image decoding
Author(s): Philip A. Chou; Gary E. Kopec

Paper Abstract

Document image decoding (DID) refers to the process of document recognition within a communication theory framework. In this framework, a logical document structure is a message communicated by encoding the structure as an ideal image, transmitting the ideal image through a noisy channel, and decoding the degraded image into a logical structure as close to the original message as possible, on average. Thus document image decoding is document image recognition where the recognizer performs optimal reconstruction by explicitly modeling the source of logical structures, the encoding procedure, and the channel noise. In previous work, we modeled the source and encoder using probabilistic finite-state automata and transducers. In this paper, we generalize the source and encoder models using context-free attribute grammars. We employ these models in a document image decoder that uses a dynamic programming algorithm to minimize the probability of error between original and reconstructed structures. The dynamic programming algorithm is a generalization of the Cocke-Younger-Kasami parsing algorithm.
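To make the parsing step more concrete, the following is a minimal sketch of the standard probabilistic (Viterbi) Cocke-Younger-Kasami algorithm, which the abstract names as the starting point for the decoder. It is not the authors' method: it parses a one-dimensional token sequence with a plain stochastic context-free grammar in Chomsky normal form, whereas the paper's decoder works on degraded images and uses attribute grammars. The function name `viterbi_cyk`, the toy grammar, its nonterminals, and its probabilities are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (not from the paper).
# Lexical rules: (nonterminal, token) -> probability.
LEXICAL = {
    ("Head", "title"): 1.0,
    ("Para", "para"): 1.0,
    ("Body", "para"): 0.6,
}
# Binary rules: (parent, left child, right child) -> probability.
BINARY = {
    ("Doc", "Head", "Body"): 1.0,
    ("Body", "Para", "Body"): 0.4,
}


def viterbi_cyk(tokens, lexical, binary, start="Doc"):
    """Return (log_prob, tree) for the most probable parse, or None if no parse exists."""
    n = len(tokens)
    # best[(i, j)][A] = (log-probability, backpointer) for A spanning tokens[i:j].
    best = defaultdict(dict)

    # Width-1 spans are filled from the lexical rules.
    for i, tok in enumerate(tokens):
        cell = best[(i, i + 1)]
        for (a, word), p in lexical.items():
            if word == tok:
                lp = math.log(p)
                if a not in cell or lp > cell[a][0]:
                    cell[a] = (lp, tok)

    # Wider spans combine two adjacent sub-spans, keeping the best score per nonterminal.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            cell = best[(i, j)]
            for k in range(i + 1, j):
                left = best.get((i, k), {})
                right = best.get((k, j), {})
                for (a, b, c), p in binary.items():
                    if b in left and c in right:
                        lp = math.log(p) + left[b][0] + right[c][0]
                        if a not in cell or lp > cell[a][0]:
                            cell[a] = (lp, (k, b, c))

    if start not in best.get((0, n), {}):
        return None

    def build(a, i, j):
        # Follow backpointers to recover the parse tree as nested tuples.
        _, back = best[(i, j)][a]
        if j - i == 1:
            return (a, back)
        k, b, c = back
        return (a, build(b, i, k), build(c, k, j))

    return best[(0, n)][start][0], build(start, 0, n)


# Usage: decode a token sequence standing in for an observed "document".
result = viterbi_cyk(["title", "para", "para"], LEXICAL, BINARY)
```

Here `result` is a pair holding the log-probability of the best parse and the parse tree as nested tuples; a sequence the grammar cannot derive returns `None`. Working in log-probabilities avoids numerical underflow on longer inputs, and keeping one best entry per nonterminal and span is what makes the search a dynamic program rather than an enumeration of all parses.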

Paper Details

Date Published: 30 March 1995
PDF: 8 pages
Proc. SPIE 2422, Document Recognition II, (30 March 1995); doi: 10.1117/12.205842
Philip A. Chou, Xerox Palo Alto Research Ctr. (United States)
Gary E. Kopec, Xerox Palo Alto Research Ctr. (United States)

Published in SPIE Proceedings Vol. 2422:
Document Recognition II
Luc M. Vincent; Henry S. Baird, Editor(s)
