
Proceedings Paper

Deep visual-semantic for crowded video understanding
Author(s): Chunhua Deng; Junwen Zhang

Paper Abstract

Visual-semantic features play a vital role in crowded video understanding. Convolutional Neural Networks (CNNs) have achieved a significant breakthrough in learning representations from images. However, learning visual-semantic features, and extracting them effectively for video analysis, remains a challenging task. In this study, we propose a novel visual-semantic method to capture both appearance and dynamic representations. In particular, we propose a spatial context method based on fractional Fisher vector (FV) encoding of CNN features, which can be regarded as our main contribution. In addition, to capture temporal context information, we also apply the fractional encoding method to dynamic images. Experimental results on the WWW crowd video dataset demonstrate that the proposed method outperforms the state of the art.
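The abstract does not spell out the fractional FV formulation, so the sketch below is only a rough illustration of the general technique it builds on: a standard improved Fisher vector computed over local CNN descriptors with a diagonal-covariance GMM, where a power exponent alpha < 1 stands in for the fractional normalization step. The function name `fisher_vector`, the `alpha` parameter, and the toy data are illustrative assumptions, not the authors' code.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(descriptors, gmm, alpha=0.5):
    """Improved Fisher vector over local descriptors (gradients w.r.t.
    the means and variances of a diagonal-covariance GMM), followed by
    power and L2 normalization. alpha < 1 acts as the fractional
    exponent here; the paper's exact fractional FV variant may differ.

    descriptors: (T, D) array of local CNN features.
    """
    T, D = descriptors.shape
    K = gmm.n_components
    gamma = gmm.predict_proba(descriptors)      # (T, K) soft assignments
    w = gmm.weights_                            # (K,) mixture weights
    mu = gmm.means_                             # (K, D) component means
    sigma = np.sqrt(gmm.covariances_)           # (K, D) std devs ('diag' GMM)

    fv = []
    for k in range(K):
        diff = (descriptors - mu[k]) / sigma[k]  # whitened residuals
        g_mu = (gamma[:, k, None] * diff).sum(0) / (T * np.sqrt(w[k]))
        g_sig = (gamma[:, k, None] * (diff**2 - 1)).sum(0) / (T * np.sqrt(2 * w[k]))
        fv.extend([g_mu, g_sig])
    fv = np.concatenate(fv)                      # (2 * K * D,)

    fv = np.sign(fv) * np.abs(fv) ** alpha       # power ("fractional") normalization
    return fv / (np.linalg.norm(fv) + 1e-12)     # L2 normalization

# Toy usage: encode 500 random 256-dim "CNN descriptors" with a 16-component GMM.
rng = np.random.default_rng(0)
feats = rng.standard_normal((500, 256))
gmm = GaussianMixture(n_components=16, covariance_type="diag", random_state=0).fit(feats)
print(fisher_vector(feats, gmm).shape)           # (8192,) = 2 * 16 * 256
```

In practice the GMM would be fit on descriptors pooled from training videos, and the resulting per-frame or per-clip encodings fed to a linear classifier; the same encoding would be applied to dynamic images for the temporal branch.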

Paper Details

Date Published: 8 March 2018
PDF: 5 pages
Proc. SPIE 10609, MIPPR 2017: Pattern Recognition and Computer Vision, 106091E (8 March 2018); doi: 10.1117/12.2285848
Author Affiliations:
Chunhua Deng, Wuhan Univ. of Science and Technology (China)
Hubei Province Key Lab. of Intelligent Information Processing and Real-time Industrial System (China)
Junwen Zhang, Wuhan Univ. of Science and Technology (China)
Hubei Province Key Lab. of Intelligent Information Processing and Real-time Industrial System (China)


Published in SPIE Proceedings Vol. 10609:
MIPPR 2017: Pattern Recognition and Computer Vision
Zhiguo Cao; Yuehuang Wang; Chao Cai, Editor(s)

© SPIE.