Proceedings Paper

OCR result optimization based on pattern matching

Paper Abstract

Post-processing of OCR results is a bottleneck in document image processing systems. Proofreading is necessary because the current recognition rate is not high enough for publishing. The OCR system labels every recognition result as confident or unconfident. Proofreaders need to check only the unconfident characters, since the error rate of confident characters is low enough for publishing. However, the current algorithm marks too many characters as unconfident, so the OCR results need to be optimized. In this paper we propose a pattern-matching algorithm that decreases the number of unconfident results: if an unconfident character matches a confident character well, its label can be changed to confident. Pattern matching uses the original character images, so it can reduce the problems caused by image normalization and scanning noise. We introduce WXOR, WAN, and four-corner based pattern matching to improve matching quality, and confidence analysis to reduce errors among similar characters. Experimental results show that our algorithm achieves improvements of 54.18% on the first image set, which contains 102,417 Chinese characters, and 49.85% on the second image set, which contains 53,778 Chinese characters.
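
The relabeling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it substitutes a plain, unweighted XOR pixel distance for the WXOR, WAN, and four-corner measures, omits the confidence analysis, and assumes that a match only counts when both characters carry the same recognition result. The names `Glyph`, `xor_distance`, and `relabel`, and the default `threshold`, are hypothetical.

```python
from dataclasses import dataclass
from typing import List

Bitmap = List[List[int]]  # binary character image: 1 = ink, 0 = background


@dataclass
class Glyph:
    code: str        # character class returned by the OCR engine
    confident: bool  # confidence label returned by the OCR engine
    image: Bitmap    # original (non-normalized) character image


def xor_distance(a: Bitmap, b: Bitmap) -> float:
    """Fraction of pixels that differ between two bitmaps (unweighted XOR)."""
    same_shape = len(a) == len(b) and all(
        len(ra) == len(rb) for ra, rb in zip(a, b)
    )
    total = sum(len(row) for row in a)
    if not same_shape or total == 0:
        return 1.0  # simplifying assumption: mismatched sizes never match
    diff = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return diff / total


def relabel(glyphs: List[Glyph], threshold: float = 0.05) -> List[Glyph]:
    """Promote unconfident glyphs that closely match a confident glyph."""
    confident = [g for g in glyphs if g.confident]
    result = []
    for g in glyphs:
        promote = not g.confident and any(
            c.code == g.code and xor_distance(g.image, c.image) <= threshold
            for c in confident
        )
        result.append(Glyph(g.code, g.confident or promote, g.image))
    return result
```

In this sketch, an unconfident glyph that differs from a confident glyph of the same class by one pixel in a 4x4 image (distance 1/16) would be promoted at `threshold=0.1`, while a glyph with a different class or a very different shape would stay unconfident. Matching only against glyphs that were confident in the input keeps a promoted glyph from serving as a template for further promotions.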

Paper Details

Date Published: 29 January 2007
PDF: 8 pages
Proc. SPIE 6500, Document Recognition and Retrieval XIV, 650009 (29 January 2007); doi: 10.1117/12.702786
Junqing Shang, Tsinghua Univ. (China)
Changsong Liu, Tsinghua Univ. (China)
Xiaoqing Ding, Tsinghua Univ. (China)

Published in SPIE Proceedings Vol. 6500:
Document Recognition and Retrieval XIV
Xiaofan Lin; Berrin A. Yanikoglu, Editor(s)

© SPIE.