
Proceedings Paper

Discrimination of handwritten from machine-printed text
Author(s): Steve Chahal

Paper Abstract

Discriminating handwritten from machine-printed text is important for character recognition applications because most recognition algorithms for handwritten text differ considerably from those for machine-printed text. An efficient separation of the two streams is therefore necessary prior to recognition in order to minimize system cost and complexity. Several techniques based on character connectivity and heuristics have been proposed, but very few achieve results at the 99% level. The technique described in this paper has been shown to yield performance figures in the high 99% range on tens of thousands of IRS tax forms and postal envelopes. Its main discrimination feature is the density of black relative to white pixels for a binary field, or the overall density of pixels for a gray-scale field. First, the given field is boxed very closely and its boundaries are isolated in space. A horizontal histogram is then extracted for this field, and the total number of black pixels is computed. The number of black pixels per unit area is generated for binary text, and the sum of all pixels is generated for gray-level text. When tested on a large number of samples, these densities cluster in distinct normal distributions for handwritten and machine-printed text respectively. Fuzzy thresholds are set where the two normal curves cross, with a confidence interval of 99%. Samples whose densities fall below the threshold are considered handwritten, and samples whose densities fall above the threshold are considered machine-printed.
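
The binary-field steps in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names (tight_crop, black_density, gaussian_crossing, classify) are hypothetical, the field is assumed to be a NumPy array with 1 for black and 0 for white, and a single crisp threshold at the crossing of the two fitted normal curves stands in for the fuzzy thresholds and 99% confidence interval, which the abstract does not specify in detail. The gray-scale case is not covered.

```python
import numpy as np


def tight_crop(field):
    """Box the field closely: keep only rows/columns containing black pixels."""
    rows = np.flatnonzero(field.any(axis=1))
    cols = np.flatnonzero(field.any(axis=0))
    if rows.size == 0:
        raise ValueError("field contains no black pixels")
    return field[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]


def black_density(field):
    """Black pixels per unit area of the tightly boxed binary field."""
    boxed = tight_crop(np.asarray(field))
    horizontal_histogram = boxed.sum(axis=1)  # black-pixel count per row
    total_black = horizontal_histogram.sum()
    return total_black / boxed.size


def gaussian_crossing(mu_a, sigma_a, mu_b, sigma_b):
    """Point between the two means where the two normal pdfs are equal."""
    # Equating the log-pdfs gives a*x^2 + b*x + c = 0.
    a = 1.0 / (2 * sigma_b**2) - 1.0 / (2 * sigma_a**2)
    b = mu_a / sigma_a**2 - mu_b / sigma_b**2
    c = (mu_b**2 / (2 * sigma_b**2) - mu_a**2 / (2 * sigma_a**2)
         + np.log(sigma_b / sigma_a))
    if np.isclose(a, 0.0):
        candidates = np.array([-c / b])  # equal variances: linear equation
    else:
        roots = np.roots([a, b, c])
        candidates = roots[np.isreal(roots)].real
    lo, hi = min(mu_a, mu_b), max(mu_a, mu_b)
    between = [x for x in candidates if lo <= x <= hi]
    if not between:
        raise ValueError("the curves do not cross between the two means")
    return float(between[0])


def classify(density, threshold):
    """Below the threshold: handwritten; above it: machine-printed."""
    return "handwritten" if density < threshold else "machine-printed"
```

In use, the means and standard deviations passed to gaussian_crossing would be estimated from labeled handwritten and machine-printed samples, and the resulting threshold applied to the density of each new field.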

Paper Details

Date Published: 1 June 1994
PDF: 8 pages
Proc. SPIE 2238, Hybrid Image and Signal Processing IV, (1 June 1994); doi: 10.1117/12.177714
Steve Chahal, Grumman Data Systems Corp. (United States)


Published in SPIE Proceedings Vol. 2238:
Hybrid Image and Signal Processing IV
David P. Casasent; Andrew G. Tescher, Editor(s)

© SPIE