
Proceedings Paper

Explore fine-grained discriminative visual explanation when making classification decision
Author(s): Zhengxia Gao; Aiwen Jiang; Jianyi Wan

Paper Abstract

Language and images are two of the most important media for describing the surrounding world. Fine-grained visual explanations help people understand the reasons or motivation behind a vision system's classification decisions. Building on the pioneering work of Lisa, this paper proposes a new model for generating discriminative visual explanations. It extracts res5c image features from a deep residual network and uses a multimodal compact bilinear strategy for multimodal information fusion. A selective attention mechanism is introduced to focus on the visual parts most related to the predicted category information. The proposed network considers both the spatial distribution of image content and a fusion strategy that better models information from different modalities. Results on the CUB Bird Dataset show that the model improves the quality of the explanation statements, which indicates that the proposed network is effective.
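The fusion and attention steps named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: it shows the general compact bilinear idea (count-sketch each modality, then combine the sketches by circular convolution via FFT) followed by a softmax over image regions. All names, the toy dimensions, the random hashes, and the linear scoring vector `w` are assumptions made for illustration only.

```python
import numpy as np

def count_sketch(x, h, s, d):
    """Project x (length n) to length d: out[h[i]] += s[i] * x[i]."""
    out = np.zeros(d)
    np.add.at(out, h, s * x)  # unbuffered add handles repeated indices
    return out

def mcb_fuse(x, y, hx, sx, hy, sy, d):
    """Circular convolution of the two count sketches, computed via FFT."""
    fx = np.fft.rfft(count_sketch(x, hx, sx, d))
    fy = np.fft.rfft(count_sketch(y, hy, sy, d))
    return np.fft.irfft(fx * fy, n=d)

def attend(regions, cls_vec, params, w, d):
    """Softmax attention over image regions, scored from the fused vector."""
    hx, sx, hy, sy = params
    scores = np.array([w @ mcb_fuse(r, cls_vec, hx, sx, hy, sy, d)
                       for r in regions])
    alpha = np.exp(scores - scores.max())  # subtract max for stability
    alpha /= alpha.sum()
    return alpha, alpha @ regions          # weights and attended feature

# Toy sizes (hypothetical; real res5c features are far larger).
rng = np.random.default_rng(0)
n_img, n_cls, d, n_regions = 8, 5, 16, 4
hx, hy = rng.integers(0, d, n_img), rng.integers(0, d, n_cls)
sx, sy = rng.choice([-1.0, 1.0], n_img), rng.choice([-1.0, 1.0], n_cls)
regions = rng.normal(size=(n_regions, n_img))  # one row per spatial location
cls_vec = rng.normal(size=n_cls)               # stand-in category embedding
w = rng.normal(size=d)                         # stand-in learned scorer

alpha, context = attend(regions, cls_vec, (hx, sx, hy, sy), w, d)
```

Here `alpha` holds one weight per image region and `context` is the weighted sum of region features. In a trained model the hashes would be fixed once and `w` would be learned; both are random in this sketch.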

Paper Details

Date Published: 9 August 2018
PDF: 5 pages
Proc. SPIE 10806, Tenth International Conference on Digital Image Processing (ICDIP 2018), 108065V (9 August 2018); doi: 10.1117/12.2502901
Zhengxia Gao, Jiangxi Normal Univ. (China)
Aiwen Jiang, Jiangxi Normal Univ. (China)
Jianyi Wan, Jiangxi Normal Univ. (China)

Published in SPIE Proceedings Vol. 10806:
Tenth International Conference on Digital Image Processing (ICDIP 2018)
Xudong Jiang; Jenq-Neng Hwang, Editor(s)

© SPIE.