
Proceedings Paper

Design and development of an ancient Chinese document recognition system

Paper Abstract

The digitization of ancient Chinese documents presents new challenges to the OCR (Optical Character Recognition) research field because of the large character set of ancient Chinese, the variety of font types, and the diversity of document layout styles; these documents reflect thousands of years of Chinese civilization. After analyzing the general characteristics of ancient Chinese documents, we present a solution for recognizing ancient Chinese documents with regular font types and layout styles. Building on previous work on multilingual OCR in the TH-OCR system, we focus on the design and development of two key technologies: character recognition and page segmentation. Experimental results show that the developed character recognition kernel, which covers 19,635 Chinese characters, outperforms our original traditional Chinese recognition kernel. A benchmark test on printed ancient Chinese books shows that the proposed system is effective for regular ancient Chinese documents.

Paper Details

Date Published: 15 December 2003
PDF: 8 pages
Proc. SPIE 5296, Document Recognition and Retrieval XI, (15 December 2003); doi: 10.1117/12.529107
Author Affiliations
Liangrui Peng, Tsinghua Univ. (China)
Pingping Xiu, Tsinghua Univ. (China)
Xiaoqing Ding, Tsinghua Univ. (China)


Published in SPIE Proceedings Vol. 5296:
Document Recognition and Retrieval XI
Elisa H. Barney Smith; Jianying Hu; James Allan, Editors

© SPIE.