
Proceedings Paper

A statistical approach to line segmentation in handwritten documents
Author(s): Manivannan Arivazhagan; Harish Srinivasan; Sargur Srihari

Paper Abstract

A new technique for segmenting a handwritten document into distinct lines of text is presented. Line segmentation is the first and most critical pre-processing step in a document recognition/analysis task. The proposed algorithm starts by obtaining an initial set of candidate lines from the piece-wise projection profile of the document. The lines traverse around any obstructing handwritten connected component by associating it with the line above or below. The decision to associate such a component is made by (i) modeling the lines as bivariate Gaussian densities and evaluating the probability of the component under each Gaussian, or (ii) the probability obtained from a distance metric. The proposed method is robust to skewed documents and to documents with lines running into each other. Experimental results show that on 720 documents (which include English, Arabic, and children's handwriting) containing a total of 11,581 lines, 97.31% of the lines were segmented correctly. In an experiment on 200 handwritten images with 78,902 connected components, 98.81% of the components were associated with the correct lines.
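The association decision in option (i) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the Gaussians are fitted or how the probability of a whole component is evaluated. The sketch assumes each line's Gaussian is estimated from the (x, y) coordinates of pixels already assigned to that line, and that a component is scored by summing the log-density of its pixels. The names `fit_gaussian`, `log_density`, and `assign_component` are hypothetical.

```python
import math


def fit_gaussian(points):
    """Estimate the mean and 2x2 covariance of a list of (x, y) points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    return (mx, my), (sxx, sxy, syy)


def log_density(point, mean, cov):
    """Log of the bivariate Gaussian density at a single (x, y) point."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2  # must be positive (non-degenerate line)
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Mahalanobis distance using the closed-form inverse of a 2x2 matrix.
    maha = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * maha - math.log(2 * math.pi) - 0.5 * math.log(det)


def assign_component(component, line_above, line_below):
    """Return 'above' or 'below' for an obstructing connected component.

    Assumption: the component's score under a line is the sum of the
    log-densities of its pixels under that line's Gaussian.
    """
    above = fit_gaussian(line_above)
    below = fit_gaussian(line_below)
    score_above = sum(log_density(p, *above) for p in component)
    score_below = sum(log_density(p, *below) for p in component)
    return "above" if score_above >= score_below else "below"


# Toy data: two horizontal text lines centered at y = 10 and y = 50.
line_above = [(x, y) for x in range(0, 100, 5) for y in range(8, 13)]
line_below = [(x, y) for x in range(0, 100, 5) for y in range(48, 53)]

# A descender-like component hanging just under the upper line.
component = [(40, y) for y in range(15, 25)]
print(assign_component(component, line_above, line_below))
```

Working in log-densities avoids numerical underflow when a component has many pixels. Option (ii), the distance-metric probability, would replace `log_density` with a score derived from the chosen distance; the abstract does not specify that metric, so it is not sketched here.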

Paper Details

Date Published: 29 January 2007
PDF: 11 pages
Proc. SPIE 6500, Document Recognition and Retrieval XIV, 65000T (29 January 2007); doi: 10.1117/12.704538
Manivannan Arivazhagan, Univ. at Buffalo, SUNY (United States)
Harish Srinivasan, Univ. at Buffalo, SUNY (United States)
Sargur Srihari, Univ. at Buffalo, SUNY (United States)


Published in SPIE Proceedings Vol. 6500:
Document Recognition and Retrieval XIV
Xiaofan Lin; Berrin A. Yanikoglu, Editor(s)
