
Proceedings Paper

A word language model based contextual language processing on Chinese character recognition
Author(s): Chen Huang; Xiaoqing Ding; Yan Chen

Paper Abstract

This paper studies the design and implementation of language models. Unlike previous research, we emphasize the importance of word-based n-gram models in language modeling. We build a word-based language model with the SRILM toolkit and apply it to contextual language processing of Chinese documents. A modified absolute discounting smoothing algorithm is proposed to reduce the perplexity of the language model. Compared with a character-based language model, the word-based model improves post-processing performance in an online handwritten character recognition system, but it also greatly increases computation and storage costs. In addition to quantizing the model data non-uniformly, we design a new tree storage structure that compresses the model and also improves search efficiency. We evaluate these approaches on a test corpus of recognition results for online handwritten Chinese characters, and propose a modified confidence measure for recognition candidate characters that yields accurate posterior probabilities at reduced complexity. The weighted combination of linguistic knowledge and candidate confidence information proves successful and can be developed further to improve recognition accuracy.
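
To make the combination of linguistic knowledge and candidate confidence concrete, the following is a minimal sketch, not the authors' method. It assumes a bigram model smoothed with standard interpolated absolute discounting (the paper's modified variant is not described in the abstract), a log-linear weighting of language model and recognizer scores, and recognition candidates that are already segmented into word tokens. The function names, the `discount` and `lm_weight` parameters, and the toy corpus are all hypothetical; SRILM is not used.

```python
import math
from collections import Counter, defaultdict


def train_bigram_absolute_discount(sentences, discount=0.75):
    """Return p(word, prev) for a bigram model with interpolated absolute discounting.

    Assumption: this is the textbook scheme, not the paper's modified algorithm.
    """
    unigrams = Counter()
    bigrams = Counter()
    context_counts = Counter()
    followers = defaultdict(set)  # distinct words seen after each context
    for sentence in sentences:
        tokens = ["<s>"] + list(sentence)
        unigrams.update(tokens[1:])
        for prev, word in zip(tokens, tokens[1:]):
            bigrams[(prev, word)] += 1
            context_counts[prev] += 1
            followers[prev].add(word)
    total = sum(unigrams.values())
    vocab_size = len(unigrams)

    def p_unigram(word):
        # Add-one smoothing with one extra slot reserved for unseen words.
        return (unigrams[word] + 1) / (total + vocab_size + 1)

    def p_bigram(word, prev):
        c_prev = context_counts[prev]
        if c_prev == 0:
            return p_unigram(word)
        # Subtract a fixed discount from every seen bigram count ...
        discounted = max(bigrams[(prev, word)] - discount, 0.0) / c_prev
        # ... and redistribute the freed mass according to the unigram model.
        backoff_mass = discount * len(followers[prev]) / c_prev
        return discounted + backoff_mass * p_unigram(word)

    return p_bigram


def decode(candidates, p_bigram, lm_weight=0.5):
    """Viterbi search over per-position candidate lists.

    candidates: list of positions, each a list of (token, confidence) pairs
    with confidence in (0, 1]. The score of a path is a weighted sum of
    log language model probability and log recognizer confidence.
    """
    best = {"<s>": (0.0, [])}  # last token -> (score, path)
    for position in candidates:
        new_best = {}
        for token, confidence in position:
            rec_score = (1.0 - lm_weight) * math.log(confidence)
            score, path = max(
                (
                    (s + lm_weight * math.log(p_bigram(token, prev)) + rec_score, pth)
                    for prev, (s, pth) in best.items()
                ),
                key=lambda item: item[0],
            )
            new_best[token] = (score, path + [token])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


# Toy corpus and candidate lattice (hypothetical data).
corpus = [["语言", "模型"], ["语言", "模型"], ["语言", "处理"], ["汉字", "识别"]]
p = train_bigram_absolute_discount(corpus)
lattice = [
    [("语言", 0.9), ("汉字", 0.1)],
    [("识别", 0.55), ("模型", 0.45)],
]
print(decode(lattice, p, lm_weight=0.5))  # language model and recognizer combined
print(decode(lattice, p, lm_weight=0.0))  # recognizer confidences only
```

In this toy lattice the recognizer alone prefers 识别 in the second position, while the combined score selects 模型 because the bigram 语言 模型 is frequent in the training corpus. In practice the weight would be tuned on held-out recognition results.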

Paper Details

Date Published: 18 January 2010
PDF: 10 pages
Proc. SPIE 7534, Document Recognition and Retrieval XVII, 75340N (18 January 2010); doi: 10.1117/12.838718
Chen Huang, Tsinghua Univ. (China)
Xiaoqing Ding, Tsinghua Univ. (China)
Yan Chen, Tsinghua Univ. (China)

Published in SPIE Proceedings Vol. 7534:
Document Recognition and Retrieval XVII
Laurence Likforman-Sulem; Gady Agam, Editor(s)

© SPIE.