Proceedings Paper

Word segmentation of off-line handwritten documents

Paper Abstract

Word segmentation is the most critical pre-processing step for any handwritten document recognition and/or retrieval system. When the writing style is unconstrained (written in a natural manner), recognition of individual components may be unreliable, so they must be grouped into word hypotheses before recognition algorithms can be applied. This paper describes a machine learning approach based on gap metrics that separates a line of unconstrained handwritten text into words. Our approach uses a set of both local and global features, motivated by the way humans perform this task. In addition, to offset the weaknesses of individual distance computation methods, we propose a combined distance measure computed using three different methods. Classification is performed by a three-layer neural network. The algorithm is evaluated on an unconstrained handwriting database containing 50 pages of handwritten documents (1,026 lines, 7,562 word images). The overall accuracy is 90.8%, which is better than that of a previous method.
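
The idea of combining several gap distances can be illustrated with a minimal sketch. The abstract does not name the three distance methods, the feature set, or the way the distances are combined, so everything below is an assumption made for illustration: the three measures (bounding-box gap, minimum run-length gap, minimum Euclidean distance), the plain average used to combine them, and all function names are hypothetical and are not taken from the paper. Each connected component is represented as a set of `(x, y)` foreground pixel coordinates.

```python
from math import hypot
from statistics import mean

# Illustrative sketch only: the distance measures, the averaging rule and
# the feature pair are assumptions, not the method described in the paper.

def bbox_gap(left, right):
    """White columns between the two bounding boxes (negative if they overlap)."""
    return min(x for x, _ in right) - max(x for x, _ in left) - 1

def run_length_gap(left, right):
    """Shortest horizontal white run between the components on shared rows.

    Returns None when the components share no row.
    """
    rows = {y for _, y in left} & {y for _, y in right}
    if not rows:
        return None
    return min(
        min(x for x, y in right if y == r) - max(x for x, y in left if y == r) - 1
        for r in rows
    )

def euclidean_gap(left, right):
    """Smallest pixel-to-pixel Euclidean distance between the components."""
    return min(hypot(xr - xl, yr - yl) for xl, yl in left for xr, yr in right)

def combined_gap(left, right):
    """Average of the distances that are defined for this pair (assumed rule)."""
    values = [
        bbox_gap(left, right),
        run_length_gap(left, right),
        euclidean_gap(left, right),
    ]
    return mean(v for v in values if v is not None)

def gap_features(components):
    """One (local, global) feature pair per gap in a left-to-right line.

    local  = combined gap itself
    global = gap divided by the mean gap of the whole line
    Needs at least two components and a non-zero mean gap.
    """
    gaps = [combined_gap(a, b) for a, b in zip(components, components[1:])]
    line_mean = mean(gaps)
    return [(g, g / line_mean) for g in gaps]

# Toy line with three components; the second gap is much wider than the first.
a = {(0, 0), (1, 0)}
b = {(3, 0), (4, 0)}
c = {(10, 0), (10, 1)}
features = gap_features([a, b, c])
```

In a pipeline like the one the abstract outlines, feature vectors of this kind would be passed to a trained classifier (a three-layer neural network in the paper) that labels each gap as inter-word or intra-word. The sketch stops before that step because the abstract gives no details of the network.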

Paper Details

Date Published: 28 January 2008
PDF: 6 pages
Proc. SPIE 6815, Document Recognition and Retrieval XV, 68150E (28 January 2008); doi: 10.1117/12.767055
Chen Huang, Univ. at Buffalo (United States)
Sargur N. Srihari, Univ. at Buffalo (United States)

Published in SPIE Proceedings Vol. 6815:
Document Recognition and Retrieval XV
Berrin A. Yanikoglu; Kathrin Berkner, Editor(s)
