Proceedings Paper

Generic approach for OCR performance evaluation
Author(s): Abdel Belaid; Laurent Pierron

Paper Abstract

This paper presents the limits of commercial character recognition engines (OCRs) and shows how to go beyond these limits to meet industrial goals for document capture and coding performance. The recent integration of these OCRs into several industrial capture chains suggests that it is possible to reach, electronically, the same performance as human typists. After a general description of the problems and a presentation of the OCR limits, the paper focuses on the methodology used and details the steps proposed for improving individual performance. The first step is the individual evaluation of each OCR: its output is compared with a ground truth, which highlights its defects and catalogues its main errors on the documents processed. The second step increases this individual performance by combining the OCR with others. We chose to combine only two OCRs, deemed very efficient and complementary on the same class of documents. The residual errors are treated in the last step, which proposes a list of heuristics that resolve specific OCR defects on borderline cases. To validate our approach, the second part of the paper presents a practical experiment aimed at reaching industrial performance. The approach was tested within an industrial application for automatic document capture, targeting the score imposed on one specific document class: 1 error per 10,000 characters.
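The first step described in the abstract, comparing OCR output with a ground truth to catalogue errors, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the use of Python's `difflib` for alignment, and the reading of the 1-in-10,000 score as an upper bound on the character error rate are all assumptions made here. `difflib` does not guarantee a minimal edit distance, so the rate it yields is an approximation.

```python
from collections import Counter
from difflib import SequenceMatcher

# Assumption: the target score is read as a maximum character error rate.
TARGET_ERROR_RATE = 1 / 10_000


def _mismatches(truth: str, ocr: str):
    """Yield (truth_fragment, ocr_fragment) for every non-matching span."""
    matcher = SequenceMatcher(None, truth, ocr, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            yield truth[i1:i2], ocr[j1:j2]


def catalogue_errors(truth: str, ocr: str) -> Counter:
    """Count how often each (expected, recognised) confusion occurs."""
    return Counter(_mismatches(truth, ocr))


def error_rate(truth: str, ocr: str) -> float:
    """Approximate character error rate relative to the ground truth length."""
    if not truth:
        raise ValueError("ground truth must not be empty")
    wrong = sum(max(len(t), len(o)) for t, o in _mismatches(truth, ocr))
    return wrong / len(truth)


def meets_target(truth: str, ocr: str) -> bool:
    """True when the approximate error rate does not exceed the target."""
    return error_rate(truth, ocr) <= TARGET_ERROR_RATE


if __name__ == "__main__":
    truth = "Invoice 10045 total"
    ocr = "Invoice l0045 total"  # hypothetical OCR output: '1' read as 'l'
    print(catalogue_errors(truth, ocr))
    print(error_rate(truth, ocr), meets_target(truth, ocr))
```

The catalogue of confusions, such as `1` read as `l`, is the kind of information that the last step's heuristics could draw on, although the abstract does not specify how those heuristics are built.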

Paper Details

Date Published: 18 December 2001
PDF: 13 pages
Proc. SPIE 4670, Document Recognition and Retrieval IX, (18 December 2001); doi: 10.1117/12.450729
Abdel Belaid, CNRS (France)
Laurent Pierron, CNRS (France)

Published in SPIE Proceedings Vol. 4670:
Document Recognition and Retrieval IX
Paul B. Kantor; Tapas Kanungo; Jiangying Zhou, Editor(s)

© SPIE. Terms of Use