
Proceedings Paper

Character segmentation using visual interword constraints in a text page
Author(s): Tao Hong; Jonathan J. Hull

Paper Abstract

Character segmentation is a critical preprocessing step for text recognition. This paper presents a method that uses visual inter-word constraints available in a text image to split word images into smaller image pieces. The method applies to machine-printed text in which the same spacing is always used between identical pairs of characters. The visual inter-word constraints considered here include information about whether one word image is a sub-image of another. For example, consider two word images A and B, showing `mathematical' and `the.' If the shorter word image B is found to be a sub-image of the longer word image A, then A is split into three pieces, A1, A2, and A3, where A2 matches B, A1 corresponds to `ma,' and A3 corresponds to `matical.' The image piece A1 can then be used to split A3 into two parts, `ma' and `tical.' The method is based purely on image processing using the visual context in a text page; no recognition is involved.
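
The splitting step in the abstract's example can be illustrated with a small sketch. This is not the authors' implementation: the toy renderer `render`, the helpers `crop`, `find_subimage`, and `split_by_subimage`, and the `max_mismatch` tolerance are all hypothetical names introduced here, and exact column-wise matching on synthetic bitmaps stands in for whatever image matching the paper actually uses. The sketch only assumes, as the abstract does, that identical character sequences produce identical pixels.

```python
def render(word):
    """Toy renderer (hypothetical): each lowercase letter becomes one
    5-pixel column encoding its alphabet index in binary, followed by
    one blank spacing column. Identical letters give identical pixels."""
    rows = [[] for _ in range(5)]
    for ch in word:
        code = ord(ch) - ord("a") + 1
        for r in range(5):
            rows[r].append((code >> r) & 1)  # glyph column
            rows[r].append(0)                # fixed inter-character gap
    return rows


def width(img):
    """Number of pixel columns in a row-major binary image."""
    return len(img[0]) if img else 0


def crop(img, start, end):
    """Return columns [start, end) of the image."""
    return [row[start:end] for row in img]


def find_subimage(long_img, short_img, max_mismatch=0):
    """Return the leftmost column offset at which short_img matches
    long_img with at most max_mismatch differing pixels, else None."""
    w_long, w_short = width(long_img), width(short_img)
    if w_short == 0:
        return None
    for off in range(w_long - w_short + 1):
        mismatches = sum(
            a != b
            for row_l, row_s in zip(long_img, short_img)
            for a, b in zip(row_l[off:off + w_short], row_s)
        )
        if mismatches <= max_mismatch:
            return off
    return None


def split_by_subimage(long_img, short_img):
    """Split long_img around the first match of short_img.
    Returns the non-empty pieces in left-to-right order; if there is
    no match, the long image is returned unsplit."""
    off = find_subimage(long_img, short_img)
    if off is None:
        return [long_img]
    end = off + width(short_img)
    pieces = [
        crop(long_img, 0, off),
        crop(long_img, off, end),
        crop(long_img, end, width(long_img)),
    ]
    return [p for p in pieces if width(p) > 0]


# The abstract's example: B (`the') splits A (`mathematical') into A1, A2, A3.
a = render("mathematical")
b = render("the")
a1, a2, a3 = split_by_subimage(a, b)

# A1 (`ma') is then reused to split A3 (`matical') further.
parts = split_by_subimage(a3, a1)
```

No character labels are consulted during matching; the strings appear only in the toy renderer that produces the test images. On real scans the exact-equality comparison would have to give way to a tolerant match, which is what the `max_mismatch` parameter hints at.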

Paper Details

Date Published: 30 March 1995
PDF: 11 pages
Proc. SPIE 2422, Document Recognition II, (30 March 1995); doi: 10.1117/12.205820
Tao Hong, SUNY/Buffalo (United States)
Jonathan J. Hull, SUNY/Buffalo (United States)

Published in SPIE Proceedings Vol. 2422:
Document Recognition II
Luc M. Vincent; Henry S. Baird, Editor(s)
