
Proceedings Paper

Generating description with multi-feature fusion and saliency maps of image
Author(s): Lisha Liu; Yuxuan Ding; Chunna Tian; Bo Yuan

Paper Abstract

Generating a description for an image can be regarded as a form of visual understanding. The task spans artificial intelligence, machine learning, natural language processing, and many other areas. In this paper, we present a model that generates descriptions for images based on an RNN (recurrent neural network) with object attention and multiple image features. Deep recurrent neural networks perform excellently in machine translation, so we use them to generate natural-sentence descriptions for images. The proposed method uses a single CNN (convolutional neural network) trained on ImageNet to extract image features. However, we believe such features cannot adequately capture the content of an image, since they may focus only on the object regions. We therefore add scene information to the image features using a CNN trained on Places205. Experiments show that the model with multiple features extracted by the two CNNs performs better than the one with a single feature. In addition, we apply saliency weights to images to emphasize the salient objects in them. We evaluate our model on MSCOCO with public metrics, and the results show that it performs better than several state-of-the-art methods.
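
As a rough illustration of the multi-feature fusion the abstract describes, the sketch below (in PyTorch, which the paper does not specify) fuses object features from an ImageNet-trained CNN with scene features from a Places205-trained CNN and feeds the result to an LSTM decoder. All module names, dimensions, and the concatenation-based fusion are assumptions for illustration only; the paper's saliency weighting and object attention are not reproduced here.

import torch
import torch.nn as nn

class MultiFeatureCaptioner(nn.Module):
    # Hypothetical captioner: two CNN feature vectors fused, then decoded by an LSTM.
    def __init__(self, obj_dim=4096, scene_dim=4096, embed_dim=512,
                 hidden_dim=512, vocab_size=10000):
        super().__init__()
        # Project the two precomputed CNN features (ImageNet- and
        # Places205-trained) into a shared space and fuse by concatenation.
        self.obj_proj = nn.Linear(obj_dim, embed_dim)
        self.scene_proj = nn.Linear(scene_dim, embed_dim)
        self.fuse = nn.Linear(2 * embed_dim, embed_dim)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, obj_feat, scene_feat, captions):
        # obj_feat: (B, obj_dim) object content; scene_feat: (B, scene_dim) scene context.
        fused = self.fuse(torch.cat([self.obj_proj(obj_feat),
                                     self.scene_proj(scene_feat)], dim=1))
        # Feed the fused image feature as the first input step, then the caption tokens.
        inputs = torch.cat([fused.unsqueeze(1), self.embed(captions)], dim=1)
        hidden, _ = self.rnn(inputs)
        return self.out(hidden)  # per-step vocabulary logits

This forward pass simply teacher-forces the ground-truth caption, as is standard for training such models; at inference time the fused feature would seed the decoder and words would be sampled step by step.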

Paper Details

Date Published: 10 April 2018
PDF: 8 pages
Proc. SPIE 10615, Ninth International Conference on Graphic and Image Processing (ICGIP 2017), 106151D (10 April 2018); doi: 10.1117/12.2304845
Author Affiliations:
Lisha Liu, Xidian Univ. (China)
Yuxuan Ding, Xidian Univ. (China)
Chunna Tian, Xidian Univ. (China)
Bo Yuan, Xidian Univ. (China)


Published in SPIE Proceedings Vol. 10615:
Ninth International Conference on Graphic and Image Processing (ICGIP 2017)
Hui Yu; Junyu Dong, Editor(s)
