
Proceedings Paper

Document-specific character template estimation
Author(s): Gary E. Kopec; Mauricio Lomelin

Paper Abstract

An approach to supervised training of document-specific character templates from sample page images and unaligned transcriptions is presented. The template estimation problem is formulated as one of constrained maximum-likelihood parameter estimation within the document image decoding (DID) framework. This leads to a two-phase iterative training algorithm consisting of transcription alignment and aligned template estimation (ATE) steps. The ATE step is the heart of the algorithm and involves assigning template pixel colors to maximize likelihood while satisfying a template disjointness constraint. The training algorithm is demonstrated on a variety of English documents, including newspaper columns, 15th-century books, degraded images of 19th-century newspapers, and connected text in a script-like font. Three applications enabled by the training procedure are described: high-accuracy document-specific decoding, transcription error visualization, and printer font generation.
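The ATE step can be illustrated with a small sketch. It assumes an asymmetric bit-flip noise channel of the kind commonly used in DID (alpha0 = probability that an ideal white pixel is observed white, alpha1 = probability that an ideal black pixel is observed black), fixed-size templates anchored at a top-left origin, and glyph origins already produced by the transcription alignment phase. The names `pixel_weights` and `estimate_templates` are hypothetical, and the greedy "clear explained pixels" rule used to approximate the disjointness constraint is an assumption of this sketch, not necessarily the procedure described in the paper.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical data model for this sketch (not from the paper):
# the page is a binary raster (1 = black), and alignment has already
# produced one (label, x, y) top-left origin per glyph instance.
Image = List[List[int]]
Instance = Tuple[str, int, int]
Template = List[List[int]]


def pixel_weights(alpha0: float, alpha1: float) -> Tuple[float, float]:
    """Return (gamma, beta) for an asymmetric bit-flip channel.

    Turning one template pixel black changes the log likelihood by
    gamma * N1 - beta * N, where N is the number of aligned samples of
    that pixel and N1 is how many of those samples are black in the image.
    """
    gamma = math.log(alpha0 * alpha1 / ((1 - alpha0) * (1 - alpha1)))
    beta = math.log(alpha0 / (1 - alpha1))
    return gamma, beta


def estimate_templates(image: Image, instances: List[Instance], height: int,
                       width: int, alpha0: float = 0.9,
                       alpha1: float = 0.9) -> Dict[str, Template]:
    """Greedy aligned template estimation with an assumed disjointness heuristic."""
    gamma, beta = pixel_weights(alpha0, alpha1)
    z = [row[:] for row in image]  # working copy; explained pixels get cleared
    rows, cols = len(z), len(z[0])

    origins: Dict[str, List[Tuple[int, int]]] = {}
    for label, x, y in instances:
        origins.setdefault(label, []).append((x, y))
    templates = {c: [[0] * width for _ in range(height)] for c in origins}

    def observed(x: int, y: int) -> int:
        # Pixels outside the page are treated as white.
        return z[y][x] if 0 <= y < rows and 0 <= x < cols else 0

    def score(c: str, u: int, v: int) -> float:
        n1 = sum(observed(x + u, y + v) for x, y in origins[c])
        return gamma * n1 - beta * len(origins[c])

    while True:
        # Find the still-white template pixel with the largest positive score.
        best = None
        for c, tmpl in templates.items():
            for v in range(height):
                for u in range(width):
                    if tmpl[v][u] == 0:
                        s = score(c, u, v)
                        if s > 0 and (best is None or s > best[0]):
                            best = (s, c, u, v)
        if best is None:
            return templates
        _, c, u, v = best
        templates[c][v][u] = 1
        # Disjointness heuristic: clear the image pixels this template pixel
        # now explains, so they no longer count as black evidence for any
        # other template pixel.
        for x, y in origins[c]:
            if 0 <= y + v < rows and 0 <= x + u < cols:
                z[y + v][x + u] = 0
```

With alpha0 = alpha1 = 0.9, gamma equals 2 * beta, so without the clearing step each template pixel would be set by what is essentially a per-pixel majority vote over its aligned samples. The clearing step is what couples the templates: once an image pixel has been explained by one template, it cannot also support a black pixel in an overlapping template.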

Paper Details

Date Published: 7 March 1996
PDF: 13 pages
Proc. SPIE 2660, Document Recognition III, (7 March 1996); doi: 10.1117/12.234712
Author Affiliations
Gary E. Kopec, Xerox Palo Alto Research Ctr. (United States)
Mauricio Lomelin, Microsoft Corp. (United States)

Published in SPIE Proceedings Vol. 2660:
Document Recognition III
Luc M. Vincent; Jonathan J. Hull, Editor(s)

© SPIE.