
Proceedings Paper

Methodologies for using UW databases for OCR and image-understanding systems
Author(s): Ihsin T. Phillips

Paper Abstract

This paper discusses methodologies for automatically selecting document pages and zones with the desired page/zone attributes from the UW databases. The selected pages can then be randomly partitioned into subsets for training and testing purposes. This paper also discusses three degradation methodologies that allow the developers of OCR and document recognition systems to create unlimited 'real-life' degraded images with geometric distortions, coffee stains, and water marks. Since the degraded images are created from the images in the UW databases, the nearly perfect original groundtruth files in the UW databases can be reused. The process of creating the additional document images and the associated groundtruth and attribute files requires only a fraction of the original cost and time.
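
The selection and partitioning step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the record layout, the attribute names (`language`, `columns`), and the helper functions `select_pages` and `partition` are all hypothetical, and the actual UW attribute files use their own format.

```python
import random


def select_pages(pages, **wanted):
    """Return the pages whose attributes match every requested key/value pair."""
    return [p for p in pages if all(p.get(k) == v for k, v in wanted.items())]


def partition(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of items and split it into (train, test) lists."""
    shuffled = list(items)
    # A seeded generator keeps the random split reproducible.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical attribute records, invented for illustration only.
pages = [
    {"id": "P001", "language": "English", "columns": 2},
    {"id": "P002", "language": "English", "columns": 1},
    {"id": "P003", "language": "English", "columns": 2},
    {"id": "P004", "language": "Japanese", "columns": 2},
    {"id": "P005", "language": "English", "columns": 2},
    {"id": "P006", "language": "English", "columns": 2},
]

selected = select_pages(pages, language="English", columns=2)
train, test = partition(selected, train_fraction=0.75, seed=42)
print(len(selected), len(train), len(test))
```

Fixing the seed means the same training and testing subsets can be regenerated later, which may be useful when comparing systems on identical splits.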

Paper Details

Date Published: 1 April 1998
PDF: 16 pages
Proc. SPIE 3305, Document Recognition V, (1 April 1998); doi: 10.1117/12.304624
Ihsin T. Phillips, Seattle Univ. (United States)

Published in SPIE Proceedings Vol. 3305:
Document Recognition V
Daniel P. Lopresti; Jiangying Zhou, Editor(s)

© SPIE.