
Proceedings Paper
When visual object-context features meet generic and specific semantic priors in image captioning
Paper Abstract
In this work, we propose a novel encoder-decoder image captioning framework that improves performance by jointly exploiting visual object-context features together with generic and specific semantic priors. In the encoding stage, we first extract semantic attributes as well as object-related and scene-related image features, and then feed them sequentially into the RNN encoder, yielding a representation that captures both rich general semantics and the visual object-context of the image. To incorporate test-specific semantic priors into the decoding stage, we apply cross-modal retrieval in the visual-semantic embedding space of VSE++ to find the captions most similar to the test image. The BLEU-4 score between the generated sentence and the retrieved captions is then used as a similarity measure, transferring sentence-construction priors from these test-specific reference captions to the generated output. Evaluation on the Microsoft COCO benchmark shows that our algorithm outperforms state-of-the-art approaches on standard evaluation metrics.
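The paper does not publish code, but the sequential feeding of the three feature types into the RNN encoder can be illustrated with a minimal sketch. The following assumes PyTorch; all names (attr_feat, obj_feat, scene_feat) and dimensions are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class SequentialFeatureEncoder(nn.Module):
    """Sketch: feed attribute, object, and scene features step by step
    into an LSTM encoder, as the abstract describes (assumed details)."""
    def __init__(self, feat_dim=2048, hidden_dim=512):
        super().__init__()
        # Project each feature type into a common embedding space.
        self.proj = nn.Linear(feat_dim, hidden_dim)
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)

    def forward(self, attr_feat, obj_feat, scene_feat):
        # Stack the three vectors as a length-3 input sequence:
        # (batch, 3, feat_dim) -> (batch, 3, hidden_dim)
        seq = self.proj(torch.stack([attr_feat, obj_feat, scene_feat], dim=1))
        _, (h_n, c_n) = self.lstm(seq)
        # The final hidden state summarizes general semantics plus
        # object/scene context and would initialize the caption decoder.
        return h_n, c_n
```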
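The test-time use of retrieved captions can likewise be sketched as BLEU-4-based scoring of candidate sentences. The abstract does not specify whether the score is used to rerank beam-search candidates or inside the decoding objective; the reranking variant below is one plausible reading, using NLTK's sentence_bleu, with the VSE++ retrieval assumed to have already produced the reference captions:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def rerank_by_bleu4(candidates, retrieved_captions):
    """Score each candidate caption by its BLEU-4 similarity to the
    captions retrieved for the test image; return the best candidate."""
    smooth = SmoothingFunction().method1  # avoids zero scores on short sentences
    references = [c.lower().split() for c in retrieved_captions]
    best, best_score = None, -1.0
    for cand in candidates:
        score = sentence_bleu(references, cand.lower().split(),
                              weights=(0.25, 0.25, 0.25, 0.25),
                              smoothing_function=smooth)
        if score > best_score:
            best, best_score = cand, score
    return best, best_score

# Usage sketch: candidates from the decoder's beam search,
# retrieved captions from VSE++ nearest-neighbour retrieval.
candidates = ["a man riding a horse on a beach",
              "a person on a horse near the ocean"]
retrieved = ["a man rides his horse along the shore",
             "a cowboy riding a horse on the beach"]
print(rerank_by_bleu4(candidates, retrieved))
```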
Paper Details
Date Published: 6 May 2019
PDF: 9 pages
Proc. SPIE 11069, Tenth International Conference on Graphics and Image Processing (ICGIP 2018), 110691E (6 May 2019); doi: 10.1117/12.2524235
Published in SPIE Proceedings Vol. 11069:
Tenth International Conference on Graphics and Image Processing (ICGIP 2018)
Chunming Li; Hui Yu; Zhigeng Pan; Yifei Pu, Editor(s)
Author Affiliations
Mengmeng Jiang, Xidian Univ. (China)
© SPIE.
