
Proceedings Paper

High-performance OCR preclassification trees
Author(s): Henry S. Baird; C. L. Mallows

Paper Abstract

We present an automatic method for constructing high-performance preclassification decision trees for OCR. A good preclassifier prunes the set of candidate classes to a much smaller set without erroneously pruning the correct class. We build the decision tree by greedy entropy minimization, using pseudo-randomly generated training samples derived from a model of imaging defects, and then 'populate' the tree with many more samples to drive down the error rate. In [BM94] we presented a statistically rigorous stopping rule for population that enforces a user-specified upper bound on error: this works in practice, but is too conservative, driving the error far below the bound. Here we describe a refinement that achieves the user-specified accuracy more closely and thus improves the pruning rate of the resulting tree. The method exploits the structure of the tree: the essential technical device is a leaf-selection rule based on Good's Theorem [Good53]. We illustrate its effectiveness through experiments on a pan-European polyfont classifier.
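The leaf-selection idea can be illustrated with a small sketch. Good's Theorem (the Good-Turing estimate) says that the probability mass of classes not yet observed in a sample is estimated by the fraction of observations that occurred exactly once. This is not the authors' implementation; all names here (`missing_mass`, `select_leaves`, the toy leaf data) are hypothetical, and the rule shown is only one plausible way such an estimate could drive further population of a tree.

```python
# Hedged sketch of a Good-Turing leaf-selection rule: keep populating
# only those leaves whose estimated unseen-class mass still exceeds the
# user-specified error bound. Names and data are illustrative, not from
# the paper.
from collections import Counter

def missing_mass(class_counts):
    """Good-Turing estimate of the probability that the next sample
    reaching this leaf belongs to a class not yet recorded there:
    (# classes seen exactly once) / (total samples at the leaf)."""
    n = sum(class_counts.values())
    if n == 0:
        return 1.0  # nothing observed yet: all mass is unseen
    singletons = sum(1 for c in class_counts.values() if c == 1)
    return singletons / n

def select_leaves(leaves, error_bound):
    """Return the leaves whose estimated unseen-class mass exceeds the
    error bound; only these would need further training samples."""
    return [name for name, counts in leaves.items()
            if missing_mass(counts) > error_bound]

# toy example: per-class sample counts observed at two leaves
leaves = {
    "leaf_a": Counter({"A": 40, "B": 55, "C": 1}),  # 1 singleton / 96 samples
    "leaf_b": Counter({"D": 2, "E": 1, "F": 1}),    # 2 singletons / 4 samples
}
print(select_leaves(leaves, error_bound=0.05))  # → ['leaf_b']
```

Under this rule, well-populated leaves (almost no singleton classes) stop receiving samples early, while sparsely populated leaves continue, which is one way a per-leaf estimate could keep the overall error near, rather than far below, the bound.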

Paper Details

Date Published: 30 March 1995
PDF: 7 pages
Proc. SPIE 2422, Document Recognition II, (30 March 1995); doi: 10.1117/12.205840
Author Affiliations:
Henry S. Baird, AT&T Bell Labs. (United States)
C. L. Mallows, AT&T Bell Labs. (United States)

Published in SPIE Proceedings Vol. 2422:
Document Recognition II
Luc M. Vincent; Henry S. Baird, Editor(s)

© SPIE.