Proceedings Paper

Robust language-independent OCR system
Author(s): Zhidong A. Lu; Issam Bazzi; Andras Kornai; John Makhoul; Premkumar S. Natarajan; Richard Schwartz

Paper Abstract

We present a language-independent optical character recognition system that is capable, in principle, of recognizing printed text from most of the world's languages. For each new language or script, the system requires sample training data along with ground truth at the text-line level; there is no need to specify the location of the lines, the words, or the characters. The system uses hidden Markov modeling technology to model each character. In addition to language independence, the technology enhances performance for degraded data, such as fax, by using unsupervised adaptation techniques. Thus far, we have demonstrated the language independence of this approach for Arabic, English, and Chinese. Recognition results are presented in this paper, including results on faxed data.
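
As a rough illustration of what "a hidden Markov model per character" can mean, the sketch below scores a sequence of quantized feature symbols against two toy left-to-right models and picks the better-scoring character. It is not the authors' system: the model sizes, probabilities, symbol alphabet, and the names `viterbi_log_score`, `make_left_to_right`, `MODELS`, and `classify` are all assumptions made for illustration. It also classifies an isolated sequence, whereas the abstract describes recognition at the text-line level without pre-segmentation.

```python
import math

NEG_INF = float("-inf")


def log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi_log_score(obs, start, trans, emit):
    """Log-probability of the single best state path for `obs` under one HMM."""
    n_states = len(start)
    # Initialise with the first observation.
    scores = [log(start[s]) + log(emit[s][obs[0]]) for s in range(n_states)]
    # Extend the best path one observation at a time.
    for o in obs[1:]:
        scores = [
            max(scores[p] + log(trans[p][s]) for p in range(n_states))
            + log(emit[s][o])
            for s in range(n_states)
        ]
    return max(scores)


def make_left_to_right(emit):
    """Build a left-to-right HMM: each state either repeats or advances by one."""
    n = len(emit)
    start = [1.0] + [0.0] * (n - 1)  # always begin in the first state
    trans = [[0.0] * n for _ in range(n)]
    for s in range(n):
        if s + 1 < n:
            trans[s][s] = 0.5
            trans[s][s + 1] = 0.5
        else:
            trans[s][s] = 1.0  # final state only loops on itself
    return start, trans, emit


# Hypothetical two-state models over a three-symbol alphabet {0, 1, 2}.
# Each inner list holds one state's emission probabilities.
MODELS = {
    "a": make_left_to_right([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),
    "b": make_left_to_right([[0.1, 0.1, 0.8], [0.8, 0.1, 0.1]]),
}


def classify(obs):
    """Return the character whose model gives `obs` the highest Viterbi score."""
    return max(MODELS, key=lambda c: viterbi_log_score(obs, *MODELS[c]))


if __name__ == "__main__":
    print(classify([0, 0, 1, 1]))
    print(classify([2, 2, 0, 0]))
```

In a line-level recognizer of the kind the abstract describes, character models like these would typically be chained into one larger network so that decoding finds the character sequence and its boundaries jointly; that step is omitted here.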

Paper Details

Date Published: 29 January 1999
PDF: 9 pages
Proc. SPIE 3584, 27th AIPR Workshop: Advances in Computer-Assisted Recognition, (29 January 1999); doi: 10.1117/12.339811
Author Affiliations
Zhidong A. Lu, BBN Technologies/GTE (United States)
Issam Bazzi, BBN Technologies/GTE (United States)
Andras Kornai, BBN Technologies/GTE (United States)
John Makhoul, BBN Technologies/GTE (United States)
Premkumar S. Natarajan, BBN Technologies/GTE (United States)
Richard Schwartz, BBN Technologies/GTE (United States)

Published in SPIE Proceedings Vol. 3584:
27th AIPR Workshop: Advances in Computer-Assisted Recognition
Robert J. Mericsko, Editor
