Proceedings Paper

Touching character segmentation method for Chinese historical documents
Author(s): Xiaolu Sun; Liangrui Peng; Xiaoqing Ding

Paper Abstract

OCR for Chinese historical documents remains an open problem. Because these documents are handwritten or hand-carved in various styles, overlapping and touching characters make character segmentation very difficult. This paper presents an over-segmentation-based method for handling overlapping and touching Chinese characters in historical documents. The segmentation process has two parts: over-segmentation and segmenting-path optimization. In the first part, touching strokes are located and split by analyzing the geometric information of the white and black connected components. The segmentation cost of each touching stroke is estimated from the shape and location of the connected components and from the width of the touching stroke. The second part uses dynamic programming with local optimization to find the best segmenting path: an HMM represents the alternative segmenting paths, and the Viterbi algorithm searches for a locally optimal solution. Experimental results on real Chinese documents show that the proposed method is effective.
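
The path-optimization step can be illustrated with a generic min-cost Viterbi search over candidate cut points. This is only a minimal sketch under stated assumptions, not the authors' model: the two states (0 = keep the candidate joined, 1 = cut), the emission costs derived from hypothetical normalized cut costs, and the transition penalty on consecutive cuts are all invented here for illustration.

```python
def viterbi_min_cost(emit, trans):
    """Generic min-sum Viterbi.

    emit[t][s]  : cost of being in state s at step t
    trans[p][s] : cost of moving from state p to state s
    Returns (best_total_cost, best_state_sequence).
    """
    n_states = len(trans)
    cost = list(emit[0])   # best cost of reaching each state at step 0
    back = []              # back-pointers for steps 1..T-1
    for t in range(1, len(emit)):
        new_cost, ptr = [], []
        for s in range(n_states):
            # cheapest predecessor for state s at step t
            p = min(range(n_states), key=lambda q: cost[q] + trans[q][s])
            ptr.append(p)
            new_cost.append(cost[p] + trans[p][s] + emit[t][s])
        cost = new_cost
        back.append(ptr)
    # trace the cheapest path backwards from the best final state
    last = min(range(n_states), key=lambda q: cost[q])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return cost[last], path


# Hypothetical cut costs in [0, 1] for three candidate cut points
# (low = likely a true character boundary). Assumed, not from the paper.
cut_costs = [0.1, 0.8, 0.2]
emit = [[1 - c, c] for c in cut_costs]   # state 0 = keep, state 1 = cut
trans = [[0.0, 0.0],
         [0.0, 0.5]]                     # assumed penalty for two cuts in a row
total, path = viterbi_min_cost(emit, trans)
print(path)  # [1, 0, 1]: cut at the first and third candidates
```

In this toy setup the search cuts where the estimated cost is low and keeps the high-cost candidate joined; the paper's actual HMM states and cost terms may differ.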

Paper Details

Date Published: 18 January 2010
PDF: 8 pages
Proc. SPIE 7534, Document Recognition and Retrieval XVII, 75340D (18 January 2010); doi: 10.1117/12.840251
Xiaolu Sun, Tsinghua Univ. (China)
Liangrui Peng, Tsinghua Univ. (China)
Xiaoqing Ding, Tsinghua Univ. (China)

Published in SPIE Proceedings Vol. 7534:
Document Recognition and Retrieval XVII
Laurence Likforman-Sulem; Gady Agam, Editor(s)
