
Proceedings Paper

Divide-and-conquer approach to Japanese text segmentation
Author(s): Stephen W. Lam; Qunfeng Liao; Sargur N. Srihari

Paper Abstract

This paper presents a robust text segmentation algorithm for printed Japanese documents. A divide-and-conquer approach is proposed to handle a large variety of image qualities and print styles. The approach adapts its processing strategy to the text quality: a method drawing on diverse knowledge sources is used to segment degraded text, while a fast, simple method is used for good-quality text. Because the algorithm adaptively selects the method for each scenario, the segmentation is highly efficient in terms of speed and accuracy. The segmenter has three modules, for image preprocessing, line segmentation, and character segmentation. The preprocessor uses statistical information about the image's connected components to globally estimate character size, and uses the projection profile to determine image quality. If the image is noisy, the line segmenter requires a 'thresholding and smoothing' step prior to line extraction. During character segmentation, the character segmenter first tries to locate components that contain touching characters. If touching characters exist, an algorithm that combines profile-based splitting with classifier-based multiple-hypothesis processing is invoked to perform the segmentation.
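
As a rough illustration of the projection-profile idea mentioned in the abstract, the following minimal Python sketch segments a binary image into text lines and then splits an overly wide component at the minimum of its column profile. It is not the authors' implementation: the function names, the 1.5 x character-width test, and the use of a single cut point are assumptions made for illustration, and the classifier-based hypothesis step is not modeled.

```python
def projection_profile(image, axis):
    """Count foreground pixels (value 1) per row (axis=0) or per column (axis=1)."""
    if axis == 0:
        return [sum(row) for row in image]
    return [sum(col) for col in zip(*image)]


def segment_runs(profile, threshold=0):
    """Return (start, end) pairs, end exclusive, where profile > threshold."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i                      # a run of ink begins
        elif value <= threshold and start is not None:
            runs.append((start, i))        # the run ends at a gap
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def split_wide_run(run, profile, char_width):
    """Split a run suspected of holding touching characters.

    Assumed heuristic: a run wider than 1.5 * char_width is cut once,
    at the interior column with the fewest foreground pixels.
    """
    start, end = run
    if end - start <= 1.5 * char_width:
        return [run]
    cut = min(range(start + 1, end - 1), key=lambda i: profile[i])
    return [(start, cut), (cut, end)]


# Toy binary image: two text lines separated by a blank row.
image = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [1, 1, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

lines = segment_runs(projection_profile(image, axis=0))
first_line = image[lines[0][0]:lines[0][1]]
chars = segment_runs(projection_profile(first_line, axis=1))
```

Here `lines` holds the row ranges of the two text lines and `chars` the column ranges of the components in the first line. In a fuller system the `char_width` argument would come from the globally estimated character size, and each candidate cut would be accepted or rejected by a classifier.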

Paper Details

Date Published: 30 March 1995
PDF: 12 pages
Proc. SPIE 2422, Document Recognition II, (30 March 1995); doi: 10.1117/12.205824
Stephen W. Lam, SUNY/Buffalo (United States)
Qunfeng Liao, SUNY/Buffalo (United States)
Sargur N. Srihari, SUNY/Buffalo (United States)

Published in SPIE Proceedings Vol. 2422:
Document Recognition II
Luc M. Vincent; Henry S. Baird, Editor(s)

© SPIE