Proceedings Paper

When visual object-context features meet generic and specific semantic priors in image captioning
Author(s): Heng Liu; Chunna Tian; Mengmeng Jiang

Paper Abstract

In this work, we propose a novel encoder-decoder image captioning framework that improves performance by jointly exploiting visual object-context features and generic and specific semantic priors. In the encoding stage, we first extract semantic attributes and object-related and scene-related image features, and then feed them sequentially to the RNN encoder, which thereby captures a rich general semantic and visual object-context representation of the image. To incorporate test-specific semantic priors in the RNN decoding stage, we apply cross-modal retrieval to find the captions most similar to the test image in the visual-semantic embedding space of VSE++. BLEU-4 is used to measure the similarity between the generated sentence and the retrieved captions, which incorporates the sentence-making priors of the test-specific reference captions. Evaluation on the Microsoft COCO benchmark dataset shows the superiority of our algorithm over state-of-the-art approaches on standard evaluation metrics.
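
The retrieval and BLEU-4 comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the BLEU-4 score enters decoding, so the sketch assumes it is used to pick, among candidate captions, the one closest to the retrieved captions. The function names (`cosine`, `retrieve_captions`, `bleu4`, `rerank`), the toy embedding vectors, and the `eps` smoothing floor are all hypothetical; a real system would take its embeddings from a trained VSE++ model.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_captions(image_vec, caption_bank, k=2):
    """Return the k captions whose embeddings are closest to the image.

    caption_bank is a list of (caption_text, embedding) pairs. The
    embeddings are assumed to be precomputed in a joint image-text space.
    """
    ranked = sorted(caption_bank,
                    key=lambda item: cosine(image_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, references, eps=1e-9):
    """Sentence-level BLEU-4: clipped 1- to 4-gram precision with a
    brevity penalty. eps is an assumed smoothing floor that avoids log(0)."""
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]
    if not cand or not refs:
        return 0.0
    log_prec = 0.0
    for n in range(1, 5):
        cand_counts = ngrams(cand, n)
        # Maximum count of each n-gram over all references (for clipping).
        max_ref = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        precision = clipped / total if total else 0.0
        log_prec += 0.25 * math.log(max(precision, eps))
    # Brevity penalty against the reference length closest to the candidate.
    ref_len = min((len(r) for r in refs),
                  key=lambda length: (abs(length - len(cand)), length))
    bp = 1.0 if len(cand) > ref_len else math.exp(1.0 - ref_len / len(cand))
    return bp * math.exp(log_prec)


def rerank(candidates, retrieved):
    """Pick the candidate caption with the highest BLEU-4 score against
    the retrieved captions (assumed use of the score)."""
    return max(candidates, key=lambda c: bleu4(c, retrieved))
```

A caller would first run `retrieve_captions` on the test image's embedding, then pass the decoder's candidate sentences and the retrieved captions to `rerank`. Whether the published method reranks complete sentences or applies the score during beam search is not stated in the abstract.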

Paper Details

Date Published: 6 May 2019
PDF: 9 pages
Proc. SPIE 11069, Tenth International Conference on Graphics and Image Processing (ICGIP 2018), 110691E (6 May 2019); doi: 10.1117/12.2524235
Heng Liu, Xidian Univ. (China)
Chunna Tian, Xidian Univ. (China)
Mengmeng Jiang, Xidian Univ. (China)

Published in SPIE Proceedings Vol. 11069:
Tenth International Conference on Graphics and Image Processing (ICGIP 2018)
Chunming Li; Hui Yu; Zhigeng Pan; Yifei Pu, Editor(s)

© SPIE.