Proceedings Paper

Using the web to validate document recognition results: experiments with business cards
Author(s): Clemens Oertel; Shauna O'Shea; Adam Bodnar; Dorothea Blostein

Paper Abstract

The World Wide Web is a vast information resource that can be useful for validating the results produced by document recognizers. Three computational steps are involved, all of them challenging: (1) use the recognition results in a Web search to retrieve Web pages that contain information similar to that in the document, (2) identify the relevant portions of the retrieved Web pages, and (3) analyze these relevant portions to determine what corrections (if any) should be made to the recognition result. We have conducted exploratory implementations of steps (1) and (2) in the business-card domain: we use fields of the business card to retrieve Web pages and identify the most relevant portions of those Web pages. In some cases, this information appears suitable for correcting OCR errors in the business card fields. In other cases, the approach fails because the information is stale: when a business card is several years old and its holder has changed jobs, websites (such as the holder's home page or the company website) no longer contain information matching that on the card. Our exploratory results indicate that in some domains it may be possible to develop effective means of querying the Web with recognition results, and to use the retrieved information to correct the recognition results and/or detect that the information is stale.
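
The three steps in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the sliding word-window matching, the use of Python's difflib for string similarity, and the 0.8 acceptance threshold are all assumptions made for illustration. The Web search itself is left out, so the page text is passed in as a plain string.

```python
from difflib import SequenceMatcher


def build_query(card_fields):
    """Step 1 (sketch): quote each non-empty OCR'd field to form a search query."""
    return " ".join(f'"{value}"' for value in card_fields.values() if value)


def similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(field_value, page_text):
    """Step 2 (sketch): slide a word window over the page text and return
    (score, candidate) for the window most similar to the field value."""
    words = page_text.split()
    n = max(1, len(field_value.split()))
    best = (0.0, "")
    for i in range(len(words) - n + 1):
        candidate = " ".join(words[i:i + n])
        score = similarity(field_value, candidate)
        if score > best[0]:
            best = (score, candidate)
    return best


def validate_field(field_value, page_text, accept=0.8):
    """Step 3 (sketch): classify the field against the page text.

    The threshold `accept` is an assumed value, not one from the paper.
    Returns ("confirmed", value), ("corrected", candidate), or
    ("unmatched", None); the last may indicate stale information.
    """
    score, candidate = best_match(field_value, page_text)
    if score < accept:
        return ("unmatched", None)
    if candidate == field_value:
        return ("confirmed", field_value)
    return ("corrected", candidate)


if __name__ == "__main__":
    card = {"name": "Dorothea B1ostein", "org": "Queen's University"}
    page = "Contact Dorothea Blostein at Queen's University School of Computing"
    print(build_query(card))
    for field, value in card.items():
        print(field, validate_field(value, page))
```

In this sketch an OCR confusion such as "1" for "l" yields a near match on the page, which is reported as a proposed correction, while a page with no close match is reported as unmatched rather than corrected.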

Paper Details

Date Published: 17 January 2005
PDF: 11 pages
Proc. SPIE 5676, Document Recognition and Retrieval XII, (17 January 2005); doi: 10.1117/12.588717
Author Affiliations:
Clemens Oertel, Eberhard-Karls-Univ. Tübingen (Germany)
Shauna O'Shea, Queen's Univ. (Canada)
Adam Bodnar, Univ. of British Columbia (Canada)
Dorothea Blostein, Queen's Univ. (Canada)

Published in SPIE Proceedings Vol. 5676:
Document Recognition and Retrieval XII
Elisa H. Barney Smith; Kazem Taghva, Editor(s)
