
Proceedings Paper

OCR for World Wide Web images
Author(s): Jiangying Zhou; Daniel P. Lopresti; Zhibin Lei

Paper Abstract

A significant amount of text now present in World Wide Web documents is embedded in image data, and a large portion of it does not appear elsewhere at all. To make this information available, we need to develop techniques for recovering textual information from in-line Web images. In this paper, we describe two methods for Web image OCR. Recognizing text extracted from in-line Web images is difficult because characters in these images are often rendered at a low spatial resolution. Such images are typically considered to be 'low quality' by traditional OCR technologies. Our proposed methods utilize the information contained in the color bits to compensate for the loss of information due to low sampling resolution. The first method uses a polynomial surface fitting technique for object recognition. The second method is based on the traditional n-tuple technique. We collected a small set of character samples from Web documents and tested the two algorithms. Preliminary experimental results show that our n-tuple method works quite well. However, the surface fitting method performs rather poorly due to the coarseness and small number of color shades used in the text.
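For readers unfamiliar with the n-tuple technique named in the abstract, the following is a minimal sketch of a generic n-tuple classifier that reads quantized intensity levels rather than binary pixels. It is not the authors' implementation: the abstract gives no algorithmic details, so the class name `NTupleClassifier`, the parameters (`n`, `count`, `levels`), the random choice of pixel positions, and the use of 8-bit grayscale values in place of color bits are all assumptions made for illustration.

```python
import random


def make_tuples(width, height, n, count, seed=0):
    """Pick `count` random groups of `n` distinct pixel positions (assumed scheme)."""
    rng = random.Random(seed)
    positions = [(x, y) for y in range(height) for x in range(width)]
    return [tuple(rng.sample(positions, n)) for _ in range(count)]


def quantize(value, levels, max_value=255):
    """Map an intensity in 0..max_value onto one of `levels` coarse shades."""
    return min(value * levels // (max_value + 1), levels - 1)


def addresses(image, tuples, levels):
    """For each tuple, read the quantized shades at its positions as one address."""
    return [tuple(quantize(image[y][x], levels) for (x, y) in t) for t in tuples]


class NTupleClassifier:
    """Hypothetical multi-level n-tuple classifier for small character images."""

    def __init__(self, width, height, n=3, count=40, levels=4, seed=0):
        self.tuples = make_tuples(width, height, n, count, seed)
        self.levels = levels
        self.memory = {}  # label -> one set of seen addresses per tuple

    def train(self, image, label):
        tables = self.memory.setdefault(label, [set() for _ in self.tuples])
        for table, address in zip(tables, addresses(image, self.tuples, self.levels)):
            table.add(address)

    def score(self, image, label):
        """Count the tuples whose address was seen during training for `label`."""
        seen = self.memory[label]
        current = addresses(image, self.tuples, self.levels)
        return sum(address in table for table, address in zip(seen, current))

    def classify(self, image):
        return max(self.memory, key=lambda label: self.score(image, label))


def bar_image(vertical, fg=255, bg=0, size=5):
    """Toy 5x5 sample: a vertical or horizontal bar through the center."""
    mid = size // 2
    return [[fg if (x == mid if vertical else y == mid) else bg
             for x in range(size)] for y in range(size)]


clf = NTupleClassifier(width=5, height=5)
clf.train(bar_image(vertical=True), "I")
clf.train(bar_image(vertical=False), "-")
print(clf.classify(bar_image(vertical=True, fg=200, bg=30)))
```

The last line classifies a vertical bar drawn in slightly different shades than the training sample. Because the shades fall into the same quantization bins, every tuple address matches the trained "I" entry. Using several intensity levels per pixel, rather than a single black/white bit, is one way to keep information that low spatial resolution would otherwise discard.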

Paper Details

Date Published: 3 April 1997
PDF: 9 pages
Proc. SPIE 3027, Document Recognition IV (3 April 1997); doi: 10.1117/12.270080
Author Affiliations
Jiangying Zhou, Panasonic Technologies, Inc. (United States)
Daniel P. Lopresti, Panasonic Technologies, Inc. (United States)
Zhibin Lei, Brown Univ. (United States)

Published in SPIE Proceedings Vol. 3027:
Document Recognition IV
Luc M. Vincent; Jonathan J. Hull, Editor(s)
