Proceedings Paper

Automatic extraction of numeric strings in unconstrained handwritten document images
Author(s): M. Mehdi Haji; Tien D. Bui; Ching Y. Suen

Paper Abstract

Numeric strings such as identification numbers carry vital information in documents. In this paper, we present a novel algorithm for the automatic extraction of numeric strings from unconstrained handwritten document images. The algorithm has two main phases: pruning and verification. In the pruning phase, the algorithm first performs a new segment-merge procedure on each text line and then, using a new regularity measure, prunes all character sequences that are unlikely to be numeric strings. The segment-merge procedure consists of two modules: a new explicit character segmentation algorithm based on the analysis of skeletal graphs, and a merging algorithm based on graph partitioning. All candidate sequences that pass the pruning phase are sent to a recognition-based verification phase for the final decision. Recognition follows a coarse-to-fine approach using probabilistic RBF networks. We developed the algorithm for processing real-world documents, in which letters and digits may be connected or broken. The effectiveness of the proposed approach is demonstrated by extensive experiments on a real-world database of 607 documents, which contains handwritten, machine-printed, and mixed documents with different layouts and levels of noise.
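
The two-phase control flow described in the abstract (prune, then verify) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the `Component` type, the `regularity` function (a simple height-uniformity score standing in for the paper's regularity measure, which the abstract does not define), the thresholds, and the `classify` callback (a stand-in for the probabilistic RBF networks). The segment-merge procedure is assumed to have already produced the candidate sequences.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Component:
    """Bounding box of one segmented character candidate (hypothetical type)."""
    x: int
    y: int
    width: int
    height: int


def regularity(seq: Sequence[Component]) -> float:
    """Placeholder score in [0, 1]; NOT the paper's regularity measure.

    Assumption for illustration: a sequence is more "regular" when its
    component heights vary little relative to their mean.
    """
    if not seq:
        return 0.0
    heights = [c.height for c in seq]
    avg = mean(heights)
    if avg == 0:
        return 0.0
    return 1.0 / (1.0 + pstdev(heights) / avg)


# Maps a component to (label, confidence). Stand-in for a real recognizer.
Classifier = Callable[[Component], Tuple[str, float]]


def extract_numeric_strings(
    candidates: Sequence[Sequence[Component]],
    classify: Classifier,
    min_regularity: float = 0.8,   # hypothetical threshold
    min_confidence: float = 0.5,   # hypothetical threshold
    min_length: int = 2,           # hypothetical minimum string length
) -> List[str]:
    """Return the digit strings of candidates that survive both phases."""
    results = []
    for seq in candidates:
        # Phase 1 (pruning): cheap test that discards unlikely sequences.
        if len(seq) < min_length or regularity(seq) < min_regularity:
            continue
        # Phase 2 (verification): every component must be read as a digit.
        labels = []
        for comp in seq:
            label, confidence = classify(comp)
            if not label.isdigit() or confidence < min_confidence:
                break
            labels.append(label)
        else:
            results.append("".join(labels))
    return results
```

The point of the ordering is that the inexpensive pruning test runs first, so the costlier recognizer is only called on sequences that already look plausible.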

Paper Details

Date Published: 24 January 2011
PDF: 9 pages
Proc. SPIE 7874, Document Recognition and Retrieval XVIII, 78740L (24 January 2011); doi: 10.1117/12.874706
Author Affiliations
M. Mehdi Haji, Concordia Univ. (Canada)
Tien D. Bui, Concordia Univ. (Canada)
Ching Y. Suen, Concordia Univ. (Canada)


Published in SPIE Proceedings Vol. 7874:
Document Recognition and Retrieval XVIII
Gady Agam; Christian Viard-Gaudin, Editor(s)

© SPIE.