Proceedings Paper

Accessing textual information embedded in Internet images
Author(s): Apostolos Antonacopoulos; Dimosthenis Karatzas; Jordi Ortiz-Lopez

Paper Abstract

Indexing and searching of WWW pages rely on analyzing text. Current technology cannot process the text embedded in images on WWW pages. This paper argues that this is a significant problem, as text in image form is usually semantically important (e.g. headers, titles). The results of a recent study are presented to show that the majority (76%) of words embedded in images do not appear elsewhere in the main text and that the majority (56%) of ALT tag descriptions of images are incorrect or do not exist at all. Research under way to devise tools to extract text from images, based on the way humans perceive color differences, is outlined and results are presented.
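
As a rough illustration of the ALT tag part of such a survey, the sketch below counts the images on a page whose ALT description is missing or empty. It is not the authors' tool: the class name `AltAudit` and the function `audit_alt` are hypothetical, and the sketch uses only Python's standard `html.parser` module. It can detect absent descriptions only; judging whether an existing description is incorrect would require reading the text in the image itself, which is the problem the paper addresses.

```python
from html.parser import HTMLParser


class AltAudit(HTMLParser):
    """Count <img> tags and those whose alt attribute is missing or empty.

    Hypothetical helper for illustration; not taken from the paper.
    """

    def __init__(self):
        super().__init__()
        self.images = 0
        self.missing_alt = 0

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names for us.
        if tag != "img":
            return
        self.images += 1
        alt = dict(attrs).get("alt")
        # A valueless attribute (<img alt>) is reported as None.
        if alt is None or not alt.strip():
            self.missing_alt += 1


def audit_alt(html):
    """Return (number of images, number with missing or empty alt)."""
    parser = AltAudit()
    parser.feed(html)
    parser.close()
    return parser.images, parser.missing_alt


if __name__ == "__main__":
    page = (
        '<img src="title.gif" alt="Home">'
        '<img src="menu.gif">'
        '<img src="logo.gif" alt="  ">'
    )
    total, missing = audit_alt(page)
    print(f"{missing} of {total} images lack an ALT description")
```

Run over a set of saved pages, the two counts give the share of images with no usable ALT description, which is one component of the 56% figure quoted above.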

Paper Details

Date Published: 27 December 2000
PDF: 8 pages
Proc. SPIE 4311, Internet Imaging II, (27 December 2000); doi: 10.1117/12.411891
Apostolos Antonacopoulos, Univ. of Liverpool (United Kingdom)
Dimosthenis Karatzas, Univ. of Liverpool (United Kingdom)
Jordi Ortiz-Lopez, Univ. Politecnica de Catalunya (Spain)

Published in SPIE Proceedings Vol. 4311:
Internet Imaging II
Giordano B. Beretta; Raimondo Schettini, Editor(s)

© SPIE