
Proceedings Paper

Multilevel character templates for document image decoding
Author(s): Gary E. Kopec

Paper Abstract

Early work in document image decoding was based on a bilevel imaging model in which an observed image is formed by passing an ideal bilevel image through a memoryless asymmetric bit-flip channel. While this simple model has proven useful in practice, there are many situations in which the bit-flip channel is an inadequate degradation model. This paper presents a multilevel generalization of the bilevel model in which the pixels of the ideal image are assigned values from a finite set of L discrete 'levels'. Level 0 is a background color and the remaining levels are foreground colors. The observed image is bilevel and is modelled as the output of a memoryless L-input symbol, 2-output symbol channel. The multilevel model is motivated in part by the intuition that pixels in a character image are more or less reliably black, depending on their distance from an edge. In addition, the multilevel model supports both 'write-black' and 'write-white' levels, and thus can be used to implement a probabilistic analog of morphological 'hit-miss' filtering. In experiments with the University of Washington UW-II English journal database, the character error rate with multilevel templates was about a factor of four lower than the error rate with bilevel templates.
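
As an illustration of the channel model described in the abstract, the following minimal sketch scores an observed bilevel image against a multilevel template, assuming each level l has a single parameter giving the probability that a pixel at that level is observed black. The function name `log_likelihood`, the table `P_BLACK`, and every numeric value are hypothetical choices made for this example; they are not taken from the paper, and the paper's actual decoding and parameter estimation procedures are not reproduced here.

```python
import math

# Hypothetical channel parameters: P_BLACK[l] is the assumed probability that
# a pixel whose ideal level is l is observed as black (1).
# Level 0 is the background; levels 1 and 2 are 'write-black' foreground
# levels of increasing reliability; level 3 is a 'write-white' level that is
# assumed to be even more reliably white than the background.
P_BLACK = [0.05, 0.60, 0.95, 0.01]


def log_likelihood(template, observed, p_black):
    """Log-probability of a bilevel image given a multilevel template.

    template: 2D list of integer levels in range(len(p_black))
    observed: 2D list of 0/1 pixels with the same shape as template
    The channel is memoryless, so per-pixel log-probabilities simply add.
    """
    total = 0.0
    for template_row, observed_row in zip(template, observed):
        for level, bit in zip(template_row, observed_row):
            p = p_black[level]
            total += math.log(p if bit else 1.0 - p)
    return total


# A toy 3x3 template: reliable core, less reliable edge, one write-white pixel.
template = [
    [0, 1, 0],
    [1, 2, 1],
    [3, 1, 0],
]
clean = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]
# Same image with the reliable centre pixel flipped to white.
damaged = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]

score_clean = log_likelihood(template, clean, P_BLACK)
score_damaged = log_likelihood(template, damaged, P_BLACK)
```

Under these assumed parameters, flipping the centre pixel lowers the score far more than flipping an edge pixel would, which mirrors the intuition that pixels far from an edge are more reliably black.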

Paper Details

Date Published: 3 April 1997
PDF: 10 pages
Proc. SPIE 3027, Document Recognition IV, (3 April 1997); doi: 10.1117/12.270070
Gary E. Kopec, Xerox Palo Alto Research Ctr. (United States)

Published in SPIE Proceedings Vol. 3027:
Document Recognition IV
Luc M. Vincent; Jonathan J. Hull, Editor(s)
