Proceedings Paper

Preliminary evaluation of histogram-based binarization algorithms
Author(s): Junichi Kanai; Kevin O. Grover

Paper Abstract

To date, most optical character recognition (OCR) systems process binary document images, and the quality of the input image strongly affects their performance. Because binarization is inherently lossy, different algorithms typically produce different binary images from the same grayscale image. The objective of this research is to study the effects of global binarization algorithms on the performance of OCR systems. Three binarization methods were examined: the best fixed threshold value for the data set, the ideal histogram method, and Otsu's algorithm. Four contemporary OCR systems and 50 hard-copy pages containing 91,649 characters were used in the experiments. The pages were digitized at 300 dpi and 8 bits/pixel, and each was binarized at 36 different threshold values (ranging from 59 to 199 in increments of 4). The resulting 1,800 binary images were processed by all four OCR systems. All systems made approximately 40% more errors on images generated by Otsu's method than on those generated by the ideal histogram method. Two of the systems made approximately the same number of errors on images generated by the best fixed threshold value and on those generated by Otsu's method.
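For readers unfamiliar with global thresholding, the following is a minimal sketch of the two ingredients the abstract names explicitly: the sweep of 36 fixed thresholds and Otsu's histogram-based threshold selection. It is an illustration under stated assumptions, not the authors' implementation. The function names, the convention that pixels at or below the threshold count as foreground (dark text), and the tie-breaking rule are all hypothetical choices. The "ideal histogram method" is not described in the abstract, so it is not sketched here.

```python
def fixed_thresholds(start=59, stop=199, step=4):
    """Threshold sweep described in the abstract: 59 to 199 in steps of 4."""
    return list(range(start, stop + 1, step))


def otsu_threshold(histogram):
    """Return the gray level t that maximizes between-class variance.

    `histogram[i]` is the number of pixels with gray level i.
    Assumption: class 0 holds levels <= t, class 1 holds levels > t.
    Ties are broken in favor of the lowest t (an arbitrary choice).
    """
    total = sum(histogram)
    if total == 0:
        raise ValueError("empty histogram")
    sum_all = sum(level * count for level, count in enumerate(histogram))

    weight0 = 0      # number of pixels in class 0 so far
    sum0 = 0         # sum of gray levels in class 0 so far
    best_t, best_var = 0, -1.0
    for t, count in enumerate(histogram):
        weight0 += count
        sum0 += t * count
        weight1 = total - weight0
        if weight0 == 0:
            continue          # class 0 still empty
        if weight1 == 0:
            break             # class 1 empty from here on
        mean0 = sum0 / weight0
        mean1 = (sum_all - sum0) / weight1
        # Between-class variance, up to a constant factor of 1/total**2.
        between_var = weight0 * weight1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(pixels, t):
    """Map a 2-D list of gray levels to 1 (foreground, <= t) or 0 (background)."""
    return [[1 if p <= t else 0 for p in row] for row in pixels]


if __name__ == "__main__":
    # Synthetic bimodal 8-bit histogram: dark text near 50, paper near 200.
    hist = [0] * 256
    hist[50] = 100
    hist[200] = 100
    print(len(fixed_thresholds()), otsu_threshold(hist))
```

In an experiment of the kind described, each page would be binarized once per value in `fixed_thresholds()` and once at its own `otsu_threshold` result, and the OCR error counts would then be compared across methods.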

Paper Details

Date Published: 30 March 1995
PDF: 9 pages
Proc. SPIE 2422, Document Recognition II (30 March 1995); doi: 10.1117/12.205823
Author Affiliations
Junichi Kanai, Univ. of Nevada/Las Vegas (United States)
Kevin O. Grover, Univ. of Nevada/Las Vegas (United States)


Published in SPIE Proceedings Vol. 2422:
Document Recognition II
Luc M. Vincent; Henry S. Baird, Editor(s)

© SPIE.