
Proceedings Paper

Incorporating linguistic post-processing into whole-book recognition

Paper Abstract

We describe a technique for linguistic post-processing of whole-book recognition results. Whole-book recognition is a technique that improves recognition of book images using fully automatic cross-entropy-based model adaptation. In previously published work, word recognition was performed on each word separately, without awareness of passage-level information such as word-occurrence frequencies. As a result, some words that are rare in real text may appear much more often in recognition results, and vice versa. Differences between word frequencies in the recognition results and in prior knowledge may therefore indicate recognition errors over a long passage. In this paper, we propose a post-processing technique that enhances whole-book recognition results by minimizing the differences between word frequencies in the recognition results and prior word frequencies. The technique works better on longer passages, and in a 90-page experiment it reduces the character error rate by about 20% in relative terms, from 1.24% to 0.98%.
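The idea of pulling passage-level word frequencies toward prior frequencies can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes, hypothetically, that each word image comes with a short list of candidate words and log-scores, that the mismatch is measured with a KL divergence, and that a greedy search with an arbitrary `weight` parameter is used. The names `kl_divergence`, `objective`, and `postprocess` are all invented for illustration.

```python
import math
from collections import Counter


def kl_divergence(counts, prior, floor=1e-6):
    """KL(empirical || prior) over the words that occur in `counts`.

    Words missing from `prior` get a small floor probability (an assumption).
    """
    total = sum(counts.values())
    kl = 0.0
    for word, count in counts.items():
        p = count / total
        q = prior.get(word, floor)
        kl += p * math.log(p / q)
    return kl


def objective(choice, candidates, prior, weight):
    """Recognition cost plus weighted frequency mismatch for one labeling."""
    words = [candidates[i][j][0] for i, j in enumerate(choice)]
    rec_cost = -sum(candidates[i][j][1] for i, j in enumerate(choice))
    mismatch = len(words) * kl_divergence(Counter(words), prior)
    return rec_cost + weight * mismatch


def postprocess(candidates, prior, weight=1.0, max_sweeps=10):
    """Greedy coordinate descent over per-word candidate choices.

    `candidates[i]` is a list of (word, log_score) pairs for word image i.
    """
    # Start from the isolated-word result: the top-scoring candidate per image.
    choice = [max(range(len(c)), key=lambda j: c[j][1]) for c in candidates]
    best = objective(choice, candidates, prior, weight)
    for _ in range(max_sweeps):
        improved = False
        for i in range(len(candidates)):
            for j in range(len(candidates[i])):
                if j == choice[i]:
                    continue
                trial = choice[:i] + [j] + choice[i + 1:]
                cost = objective(trial, candidates, prior, weight)
                if cost < best - 1e-12:
                    choice, best = trial, cost
                    improved = True
        if not improved:
            break
    return [candidates[i][j][0] for i, j in enumerate(choice)]


# Toy data (made up): "tho" is rare in the prior but wins one isolated decision.
prior = {"the": 0.5, "cat": 0.25, "sat": 0.249, "tho": 0.001}
candidates = [
    [("the", -0.1), ("tho", -2.0)],
    [("cat", -0.1)],
    [("tho", -0.6), ("the", -0.8)],
    [("sat", -0.1)],
]
print(postprocess(candidates, prior))  # ['the', 'cat', 'the', 'sat']
```

In this toy passage the third word is read as "tho" when words are scored in isolation, and the frequency term flips it to "the". With only four words the empirical frequencies are very noisy, which is consistent with the abstract's remark that the technique works better on longer passages.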

Paper Details

Date Published: 18 January 2010
PDF: 8 pages
Proc. SPIE 7534, Document Recognition and Retrieval XVII, 75340M (18 January 2010); doi: 10.1117/12.839099
Pingping Xiu, Lehigh Univ. (United States)
Henry S. Baird, Lehigh Univ. (United States)

Published in SPIE Proceedings Vol. 7534:
Document Recognition and Retrieval XVII
Laurence Likforman-Sulem; Gady Agam, Editor(s)

© SPIE