Proceedings Paper

Performance evaluation of two Arabic OCR products
Author(s): Tapas Kanungo; Gregory A. Marton; Osama Bulbul

Paper Abstract

Numerous Optical Character Recognition (OCR) companies claim that their products have near-perfect recognition accuracy (close to 99.9%). In practice, however, these accuracy rates are rarely achieved. Most systems break down when the input document images are highly degraded, such as scanned images of carbon-copy documents, documents printed on low-quality paper, and documents that are n-th generation photocopies. Moreover, end users cannot compare the relative performance of the products because the reported accuracy results are not based on the same dataset. In this article we report our evaluation results for two popular Arabic OCR products: (1) Sakhr OCR and (2) OmniPage for Arabic. Our evaluation establishes that the Sakhr OCR product has a 15.47% lower page error rate relative to the OmniPage page error rate. The absolute page accuracy rates for Sakhr and OmniPage are 90.33% and 86.89%, respectively. The evaluation was performed using the SAIC Arabic image dataset, and we used only those pages for which both OCR systems produced output. A scatter plot of the page accuracy-rate pairs reveals that Sakhr in general performs better on low-accuracy (degraded) pages. The scatter-plot visualization technique allows an algorithm developer to easily detect and analyze outliers in the results.
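
The paired-page comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' code. The page values, the function names `split_by_diagonal` and `diagonal_outliers`, and the 20-point gap threshold are all assumptions made for illustration. Each page is a point (accuracy of system A, accuracy of system B); the side of the diagonal it falls on shows which system did better, and points far from the diagonal are candidate outliers.

```python
# Hypothetical per-page accuracy pairs in percent: (system_a, system_b).
# These values are invented for illustration; they are NOT from the paper.
pages = [
    (98.0, 97.5),
    (95.0, 96.0),
    (70.0, 40.0),
    (88.0, 85.0),
    (30.0, 75.0),
]


def split_by_diagonal(pairs):
    """Count pages where A beats B, where B beats A, and where they tie."""
    a_better = sum(1 for a, b in pairs if a > b)
    b_better = sum(1 for a, b in pairs if b > a)
    ties = len(pairs) - a_better - b_better
    return a_better, b_better, ties


def diagonal_outliers(pairs, min_gap=20.0):
    """Return indices of pages whose two accuracies differ by at least
    min_gap percentage points (an assumed, arbitrary threshold)."""
    return [i for i, (a, b) in enumerate(pairs) if abs(a - b) >= min_gap]


if __name__ == "__main__":
    print(split_by_diagonal(pages))   # pages won by A, won by B, tied
    print(diagonal_outliers(pages))   # pages far from the diagonal
```

Restricting the input to pages for which both systems produced output, as the abstract does, keeps every point well defined on both axes.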

Paper Details

Date Published: 29 January 1999
PDF: 8 pages
Proc. SPIE 3584, 27th AIPR Workshop: Advances in Computer-Assisted Recognition, (29 January 1999); doi: 10.1117/12.339809
Tapas Kanungo, Univ. of Maryland/College Park (United States)
Gregory A. Marton, Univ. of Maryland/College Park (United States)
Osama Bulbul, Univ. of Maryland/College Park (United States)

Published in SPIE Proceedings Vol. 3584:
27th AIPR Workshop: Advances in Computer-Assisted Recognition
Robert J. Mericsko, Editor(s)
