
Proceedings Paper

Recognition as translating images into text
Author(s): Kobus Barnard; Pinar Duygulu; David A. Forsyth

Paper Abstract

We present an overview of a new paradigm for tackling long-standing computer vision problems. Specifically, our approach is to build statistical models that translate from visual representations (images) to semantic ones (associated text). Since providing optimal text for training is difficult at best, we propose working with whatever associated text is available in large quantities. Examples include large image collections with keywords, museum image collections with descriptive text, news photos, and images on the web. In this paper we discuss how the translation approach can give a handle on difficult questions such as: What counts as an object? Which objects are easy to recognize, and which are hard? Which objects are indistinguishable using our features? How can low-level vision processes, such as feature-based segmentation, be integrated with high-level processes such as grouping? We also summarize some of the models proposed for translating from visual information to text, and some of the methods used to evaluate their performance.
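As a rough illustration of the translation idea (not code from the paper itself): if each image is reduced to a bag of discrete region tokens ("blobs") paired with its caption words, a machine-translation-style alignment model can be fit with EM, so that each blob learns a distribution over words. The blob ids, captions, and table name `t` below are all hypothetical, and the model shown is a minimal IBM-Model-1-style sketch under those assumptions.

```python
from collections import defaultdict

# Toy "images": each pairs discrete region tokens (blob cluster ids)
# with caption words. All ids and captions here are made up.
corpus = [
    (["blob_sky", "blob_grass"], ["sky", "grass"]),
    (["blob_sky", "blob_water"], ["sky", "water"]),
    (["blob_grass", "blob_tiger"], ["tiger", "grass"]),
    (["blob_water", "blob_tiger"], ["tiger", "water"]),
]

blobs = {b for regions, _ in corpus for b in regions}
words = {w for _, caption in corpus for w in caption}

# Translation table t(word | blob), initialised uniformly.
t = {b: {w: 1.0 / len(words) for w in words} for b in blobs}

for _ in range(30):  # EM iterations
    counts = defaultdict(lambda: defaultdict(float))
    for regions, caption in corpus:
        for w in caption:
            z = sum(t[b][w] for b in regions)  # alignment normaliser
            for b in regions:
                # E-step: soft-align word w to each blob in this image
                counts[b][w] += t[b][w] / z
    for b in blobs:
        # M-step: renormalise expected counts into probabilities
        total = sum(counts[b].values())
        t[b] = {w: c / total for w, c in counts[b].items()}

# Each blob's most probable word, i.e. its "translation".
best = {b: max(t[b], key=t[b].get) for b in blobs}
```

On this toy corpus the EM alignments sharpen toward the word that consistently co-occurs with each blob, which is the sense in which recognition becomes translation: attaching words to image regions rather than classifying whole images.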

Paper Details

Date Published: 10 January 2003
PDF: 11 pages
Proc. SPIE 5018, Internet Imaging IV, (10 January 2003); doi: 10.1117/12.478427
Author Affiliations:
Kobus Barnard, Univ. of Arizona (United States)
Pinar Duygulu, Middle East Technical Univ. (Turkey)
David A. Forsyth, Univ. of California/Berkeley (United States)

Published in SPIE Proceedings Vol. 5018:
Internet Imaging IV
Simone Santini; Raimondo Schettini, Editors

© SPIE.