Proceedings Paper

Partitioning of the degradation space for OCR training

Paper Abstract

Optical character recognition (OCR) algorithms generally perform better when presented with homogeneous data. This paper studies a method designed to increase the homogeneity of training data, based on an understanding of the types of degradation that occur during printing and scanning and of how these degradations affect the homogeneity of the data. Dividing the degradation space by edge spread has been shown to improve recognition accuracy over dividing it by threshold or point spread function width alone; the challenge lies in deciding how many partitions to use and at what values of edge spread to place the divisions. Clustering of different types of character features, fonts, sizes, resolutions, and noise levels shows that edge spread is indeed a strong indicator of the homogeneity of character data clusters.
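
To make the partitioning question concrete, the following is a minimal sketch of splitting training samples into groups by a scalar edge-spread value. The abstract does not say how the authors choose the number of partitions or the cut points, so the equal-count (quantile) rule, the function names, and the sample values below are all illustrative assumptions, not the paper's method.

```python
from bisect import bisect_right


def quantile_cut_points(values, n_partitions):
    """Return n_partitions - 1 cut points that split the values into
    roughly equal-sized groups. Assumption: an equal-count rule stands in
    for whatever criterion the paper actually uses."""
    if n_partitions < 1:
        raise ValueError("n_partitions must be at least 1")
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_partitions] for i in range(1, n_partitions)]


def assign_partition(edge_spread, cut_points):
    """Index of the partition that an edge-spread value falls into.
    A value equal to a cut point goes to the higher partition."""
    return bisect_right(cut_points, edge_spread)


def partition_samples(samples, n_partitions):
    """Group (label, edge_spread) pairs into n_partitions lists."""
    cuts = quantile_cut_points([es for _, es in samples], n_partitions)
    groups = [[] for _ in range(n_partitions)]
    for label, es in samples:
        groups[assign_partition(es, cuts)].append((label, es))
    return cuts, groups


# Hypothetical samples: (character label, edge-spread value).
samples = [("a", 0.1), ("b", 0.2), ("c", 0.9),
           ("d", 1.0), ("e", 1.8), ("f", 2.0)]
cuts, groups = partition_samples(samples, 3)
```

Each resulting group could then be used to train a separate classifier, with the two open choices named in the abstract (the number of partitions and the cut values) exposed as `n_partitions` and `cuts`.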

Paper Details

Date Published: 16 January 2006
PDF: 8 pages
Proc. SPIE 6067, Document Recognition and Retrieval XIII, 606705 (16 January 2006); doi: 10.1117/12.641229
Elisa H. Barney Smith, Boise State Univ. (United States)
Tim Andersen, Boise State Univ. (United States)

Published in SPIE Proceedings Vol. 6067:
Document Recognition and Retrieval XIII
Kazem Taghva; Xiaofan Lin, Editors
