Proceedings Paper

Evaluating text categorization in the presence of OCR errors
Author(s): Kazem Taghva; Thomas A. Nartker; Julie Borsack; Steven Lumos; Allen Condit; Ron Young

Paper Abstract

In this paper we describe experiments that investigate the effects of OCR errors on text categorization. In particular, we show that in our environment, OCR errors have no effect on categorization when we use a classifier based on the naive Bayes model. We also observe that dimensionality reduction techniques eliminate a large number of OCR errors and improve categorization results.
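
The abstract does not say which naive Bayes variant or which dimensionality reduction technique was used, so the following is only a minimal sketch under stated assumptions: a multinomial naive Bayes classifier with add-one smoothing, and document-frequency thresholding as the reduction step. The function names, the `min_df` parameter, and the toy documents are all hypothetical and are not taken from the paper. The sketch illustrates one plausible reason such a reduction can remove OCR errors: a misrecognized token tends to appear in very few documents, so a frequency threshold drops it, along with any legitimate rare terms.

```python
import math
from collections import Counter, defaultdict

def select_vocabulary(docs, min_df=2):
    """Keep terms that occur in at least min_df documents (assumed reduction step)."""
    df = Counter()
    for tokens, _ in docs:
        df.update(set(tokens))  # count each term once per document
    return {t for t, n in df.items() if n >= min_df}

def train_naive_bayes(docs, vocab):
    """Multinomial naive Bayes with add-one smoothing over the reduced vocabulary."""
    class_docs = Counter()
    term_counts = defaultdict(Counter)
    for tokens, label in docs:
        class_docs[label] += 1
        term_counts[label].update(t for t in tokens if t in vocab)
    total_docs = sum(class_docs.values())
    model = {}
    for label, n in class_docs.items():
        denom = sum(term_counts[label].values()) + len(vocab)
        log_prior = math.log(n / total_docs)
        log_like = {t: math.log((term_counts[label][t] + 1) / denom) for t in vocab}
        model[label] = (log_prior, log_like)
    return model

def classify(model, tokens):
    """Return the label with the highest log posterior; out-of-vocabulary tokens are ignored."""
    def score(label):
        log_prior, log_like = model[label]
        return log_prior + sum(log_like[t] for t in tokens if t in log_like)
    return max(model, key=score)

# Toy data, invented for illustration only.
train = [
    ("nuclear waste storage site".split(), "waste"),
    ("nuclear wasle storage permit".split(), "waste"),   # "wasle" mimics an OCR error
    ("water flow rock sample".split(), "geology"),
    ("rock sample watcr analysis".split(), "geology"),   # "watcr" mimics an OCR error
]
vocab = select_vocabulary(train, min_df=2)
model = train_naive_bayes(train, vocab)
prediction = classify(model, "storage of nuclear wasle".split())
```

In this toy run the reduced vocabulary contains neither of the simulated OCR errors, and the noisy test document is still scored using only the terms that survive the threshold.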

Paper Details

Date Published: 21 December 2000
PDF: 7 pages
Proc. SPIE 4307, Document Recognition and Retrieval VIII (21 December 2000); doi: 10.1117/12.410861
Author Affiliations:
Kazem Taghva, Univ. of Nevada/Las Vegas (United States)
Thomas A. Nartker, Univ. of Nevada/Las Vegas (United States)
Julie Borsack, Univ. of Nevada/Las Vegas (United States)
Steven Lumos, Univ. of Nevada/Las Vegas (United States)
Allen Condit, Univ. of Nevada/Las Vegas (United States)
Ron Young, Univ. of Nevada/Las Vegas (United States)

Published in SPIE Proceedings Vol. 4307:
Document Recognition and Retrieval VIII
Paul B. Kantor; Daniel P. Lopresti; Jiangying Zhou, Editor(s)
