Proceedings Paper

Use of synthesized images to evaluate the performance of optical character recognition devices and algorithms
Author(s): Frank R. Jenkins; Junichi Kanai

Paper Abstract

Synthesizing document images is a cost-effective way to create a large test database, and it allows researchers to control typesetting and noise variables. Yet the effectiveness of using synthesized images in optical character recognition (OCR) research has not been extensively investigated. In this project, three kinds of test databases were used to study the performance of OCR devices: digitized "real-world" documents, page images synthesized from ASCII files, and those synthesized images after being printed and digitized. The cleanest synthesized images were not necessarily recognized most accurately. Our results suggest that, in addition to typographical features and noise, linguistic features affect the performance of an OCR device.

Paper Details

Date Published: 23 March 1994
PDF: 10 pages
Proc. SPIE 2181, Document Recognition, (23 March 1994); doi: 10.1117/12.171107
Author Affiliations:
Frank R. Jenkins, Univ. of Nevada/Las Vegas (United States)
Junichi Kanai, Univ. of Nevada/Las Vegas (United States)

Published in SPIE Proceedings Vol. 2181:
Document Recognition
Luc M. Vincent; Theo Pavlidis, Editor(s)

© SPIE