
Proceedings Paper

Sequence-to-sequence image caption generator
Author(s): Rehab Alahmadi; Chung Hyuk Park; James Hahn

Paper Abstract

Image captioning has recently received much attention from the artificial intelligence (AI) research community. Most current work follows the encoder-decoder machine translation model to generate captions for images automatically, typically using a Convolutional Neural Network (CNN) as the image encoder and a Recurrent Neural Network (RNN) as the decoder that generates the caption. In this paper, we propose a sequence-to-sequence model that follows the same encoder-decoder machine translation model but uses an RNN as the image encoder, such that the input to the model is a sequence of images representing the objects in the image. These objects are ordered to match their order in the captions. We demonstrate the results of the model on the Flickr30K dataset and compare them with state-of-the-art methods that use the same dataset. The proposed model outperformed the state-of-the-art methods on all metrics.
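
As an illustration of the ordering step described in the abstract, the following minimal sketch sorts detected object labels by where each is first mentioned in a caption. The abstract does not say how objects are matched to caption words, so the function name `order_objects_by_caption`, the exact-token matching with crude plural handling, and the example data are all assumptions made for this sketch, not the authors' procedure.

```python
from typing import List


def order_objects_by_caption(labels: List[str], caption: str) -> List[str]:
    """Sort object labels by the position of their first mention in the caption.

    Hypothetical helper: labels that never appear in the caption are placed
    last, keeping their original relative order (an arbitrary choice here).
    """
    # Lowercase and strip basic punctuation so tokens compare cleanly.
    tokens = caption.lower().replace(".", " ").replace(",", " ").split()

    def first_position(label: str) -> int:
        name = label.lower()
        for i, tok in enumerate(tokens):
            # Crude plural handling: the token "dogs" matches the label "dog".
            if tok == name or tok == name + "s":
                return i
        return len(tokens)  # not mentioned: sort after every mentioned label

    # sorted() is stable, so labels with equal keys keep their input order.
    return sorted(labels, key=first_position)


if __name__ == "__main__":
    detected = ["frisbee", "grass", "dog"]  # assumed detector output
    caption = "A dog jumps over the grass to catch a frisbee."
    print(order_objects_by_caption(detected, caption))
```

In a full pipeline, the image crops corresponding to the ordered labels would then be fed, in that order, to the RNN encoder; that part is omitted here because the abstract gives no architectural details.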

Paper Details

Date Published: 15 March 2019
PDF: 7 pages
Proc. SPIE 11041, Eleventh International Conference on Machine Vision (ICMV 2018), 110410C (15 March 2019); doi: 10.1117/12.2523174
Author Affiliations
Rehab Alahmadi, The George Washington Univ. (United States); King Saud Univ. (Saudi Arabia)
Chung Hyuk Park, The George Washington Univ. (United States)
James Hahn, The George Washington Univ. (United States)

Published in SPIE Proceedings Vol. 11041:
Eleventh International Conference on Machine Vision (ICMV 2018)
Antanas Verikas; Dmitry P. Nikolaev; Petia Radeva; Jianhong Zhou, Editor(s)

© SPIE