Proceedings Paper

Text segmentation of machine-printed Gurmukhi script
Author(s): Gurpreet Singh Lehal; Chandan Singh

Paper Abstract

This paper describes a scheme for text segmentation of machine-printed Gurmukhi script documents. There has been a tremendous amount of research on text segmentation of machine-printed Roman script documents. In contrast, there has been very little reported research on text segmentation of Indian language scripts in general and Gurmukhi script in particular. Research on text segmentation of Gurmukhi script faces major problems, mainly related to the unique characteristics of the script, such as the connectivity of characters on the headline, two or more characters in a word having intersecting minimum bounding rectangles along the horizontal direction, multi-component characters, touching characters that are present even in clean documents, and horizontally overlapping text segments. In our proposed method, we use the horizontal projection profile to successively divide the text area into smaller sub-areas, or horizontal strips, each of which contains (1) a set of text lines, (2) a single text line, or (3) sub-parts of text lines. Using the vertical projection profile, the horizontal strips are physically split into smaller units such as words, characters, or sub-characters, depending on the type of the strip. Finally, each of these units is segmented into a set of connected components. A classifier is trained to recognize these connected components, which are later merged to form characters.
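The projection-profile and connected-component steps in the abstract can be illustrated with a minimal sketch, assuming a binarized image stored as a list of rows with 1 for ink and 0 for background. The function names (`horizontal_strips`, `vertical_units`, `connected_components`) and the toy `page` are hypothetical, and the choice of 8-connectivity is an assumption, since the abstract does not state which connectivity is used. Cutting only at zero-valued gaps is a deliberate simplification: the sketch does not classify strips into the three types, separate the Gurmukhi headline, handle touching characters, or merge classified components into characters as the paper describes.

```python
from collections import deque
from typing import List, Tuple

# Binarized image: one list per row, 1 = ink pixel, 0 = background (assumption).
Image = List[List[int]]


def horizontal_profile(img: Image) -> List[int]:
    """Count of ink pixels in each row."""
    return [sum(row) for row in img]


def vertical_profile(img: Image) -> List[int]:
    """Count of ink pixels in each column."""
    return [sum(col) for col in zip(*img)]


def split_runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Half-open [start, end) runs where the profile is non-zero.

    Simplification: only zero-valued gaps act as cut points."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i  # entering a run of ink
        elif not value and start is not None:
            runs.append((start, i))  # leaving a run of ink
            start = None
    if start is not None:
        runs.append((start, len(profile)))  # run reaches the last index
    return runs


def horizontal_strips(img: Image) -> List[Image]:
    """Cut the text area into horizontal strips at blank rows."""
    return [img[top:bottom] for top, bottom in split_runs(horizontal_profile(img))]


def vertical_units(strip: Image) -> List[Image]:
    """Cut one strip into smaller units at blank columns."""
    return [[row[left:right] for row in strip]
            for left, right in split_runs(vertical_profile(strip))]


def connected_components(img: Image) -> List[List[Tuple[int, int]]]:
    """Label 8-connected ink components with breadth-first search.

    Each component is returned as a list of (row, col) pixel coordinates."""
    height = len(img)
    width = len(img[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    components = []
    for r in range(height):
        for c in range(width):
            if not img[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                # Visit all 8 neighbours that are ink and not yet labelled.
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


# Toy 5x7 "page" with two text strips (illustrative data, not from the paper).
page: Image = [
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1, 1],
    [1, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0],
]

if __name__ == "__main__":
    for strip in horizontal_strips(page):
        units = vertical_units(strip)
        # Number of units in the strip, then the component count of each unit.
        print(len(units), [len(connected_components(u)) for u in units])
```

Running the script prints `2 [1, 1]` twice: each of the two strips splits into two units, each holding one component. Breadth-first search is used rather than recursion so that large components do not hit Python's recursion limit. On real Gurmukhi text, the headline joins the characters of a word, so a word's vertical profile generally has no zero gap along the headline rows; that is one reason the paper splits strips differently depending on their type instead of relying on blank columns alone.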

Paper Details

Date Published: 21 December 2000
PDF: 9 pages
Proc. SPIE 4307, Document Recognition and Retrieval VIII, (21 December 2000); doi: 10.1117/12.410840
Author Affiliations
Gurpreet Singh Lehal, Thapar Institute of Engineering and Technology (India)
Chandan Singh, Punjabi Univ. (India)

Published in SPIE Proceedings Vol. 4307:
Document Recognition and Retrieval VIII
Paul B. Kantor; Daniel P. Lopresti; Jiangying Zhou, Editor(s)

© SPIE