Proceedings Paper

Script determination of mixed Chinese/English document images using Kolmogorov complexity measure
Author(s): Zheru Chi; Qing Wang

Paper Abstract

In this paper, we propose an approach based on the Kolmogorov complexity (KC) measure for determining script classes in mixed Chinese (complex characters)/English document images. The approach consists of two main steps, document image preprocessing and KC measurement, and can successfully separate Chinese text lines from English ones. It is robust and reliable in handling document images of different appearances and densities, as well as various fonts, sizes, and styles of characters. Experimental results on a set of 40 text line images (20 English text lines and 20 complex Chinese text lines) drawn from various document images show that a 100% correct classification rate can be achieved.
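
The abstract does not say how the KC measure is computed. As an illustration only, the sketch below approximates it with a Lempel-Ziv style phrase count over a binarized text line, a common practical stand-in for Kolmogorov complexity. The function names, the row-wise flattening of the image, and the normalization are assumptions made for this example and are not taken from the paper.

```python
import math


def lz_phrase_count(bits: str) -> int:
    """Count phrases in an incremental (LZ78-style) parsing of a bit string.

    Hypothetical stand-in for the paper's KC measure: fewer phrases
    means a more regular, more compressible sequence.
    """
    phrases = set()
    current = ""
    for b in bits:
        current += b
        if current not in phrases:
            # First time this phrase is seen: record it and start a new one.
            phrases.add(current)
            current = ""
    # A leftover, already-seen fragment still counts as one final phrase.
    return len(phrases) + (1 if current else 0)


def line_complexity(line_image) -> float:
    """Normalized complexity of a binarized text line.

    `line_image` is assumed to be a list of rows, each a list of 0/1
    pixels (1 = ink). Rows are flattened left to right, top to bottom.
    """
    bits = "".join(str(int(p)) for row in line_image for p in row)
    n = len(bits)
    if n < 2:
        return 0.0
    # c * log2(n) / n is a standard normalization for a phrase count c.
    return lz_phrase_count(bits) * math.log2(n) / n
```

In such a setup, a text line would be assigned to a script class by comparing `line_complexity` against a threshold estimated from labelled samples. That threshold, and whether this particular estimator separates the two scripts as cleanly as the result reported in the abstract, would have to be established experimentally.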

Paper Details

Date Published: 31 July 2002
PDF: 7 pages
Proc. SPIE 4875, Second International Conference on Image and Graphics, (31 July 2002); doi: 10.1117/12.477053
Author Affiliations:
Zheru Chi, Hong Kong Polytechnic Univ. (Hong Kong)
Qing Wang, Northwestern Polytechnic Univ. (China)

Published in SPIE Proceedings Vol. 4875:
Second International Conference on Image and Graphics
Wei Sui, Editor(s)
