
Proceedings Paper

Mixed 3D-(2+1)D convolution for action recognition
Author(s): Bin Yang; Ping Zhou

Paper Abstract

2D CNNs for video-based action modeling ignore temporal information and treat multiple frames analogously to channels. To address this, a mixed convolution structure built on the ResNet-18 residual network is designed for video feature extraction. 3D convolutions and (2+1)D convolutions are interleaved throughout the network. First, 2D convolution is applied to each input video frame in the spatial domain. Then, 1D temporal convolution is applied to the output of the 2D convolution. Finally, 3D convolution models spatial and temporal information jointly. Results show that the mixed convolution structure enhances the propagation of temporal information and noticeably improves both video feature extraction and action recognition accuracy.
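The three steps in the abstract (per-frame 2D convolution, 1D temporal convolution, then 3D convolution) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: it assumes a single channel, no padding, stride 1, and no nonlinearity, batch normalization, or residual connections, and the helper names (`conv2d`, `conv1d_time`, `conv2plus1d`, `conv3d`) and toy kernels are hypothetical. It only shows how a (2+1)D block factorizes a spatiotemporal filter and how such a block might be followed by a full 3D convolution.

```python
# Toy single-channel sketch; all names and kernel values are illustrative assumptions.

def conv2d(frame, k):
    """Valid 2D cross-correlation of one H x W frame with a kh x kw kernel."""
    kh, kw = len(k), len(k[0])
    H, W = len(frame), len(frame[0])
    return [[sum(frame[i + a][j + b] * k[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(W - kw + 1)]
            for i in range(H - kh + 1)]

def conv1d_time(clip, kt):
    """Valid 1D cross-correlation along the time axis of a T x H x W clip."""
    t = len(kt)
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    return [[[sum(clip[n + c][i][j] * kt[c] for c in range(t))
              for j in range(W)]
             for i in range(H)]
            for n in range(T - t + 1)]

def conv2plus1d(clip, ks, kt):
    """(2+1)D block: spatial 2D conv on each frame, then temporal 1D conv."""
    spatial = [conv2d(frame, ks) for frame in clip]
    return conv1d_time(spatial, kt)

def conv3d(clip, k):
    """Valid 3D cross-correlation of a T x H x W clip with a t x kh x kw kernel."""
    t, kh, kw = len(k), len(k[0]), len(k[0][0])
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    return [[[sum(clip[n + c][i + a][j + b] * k[c][a][b]
                  for c in range(t) for a in range(kh) for b in range(kw))
              for j in range(W - kw + 1)]
             for i in range(H - kh + 1)]
            for n in range(T - t + 1)]

def shape(x):
    """Return (T, H, W) of a nested-list clip."""
    return (len(x), len(x[0]), len(x[0][0]))

# Toy clip: 6 frames of 7 x 7 integer "pixels".
clip = [[[(n * 7 + i * 3 + j) % 5 for j in range(7)] for i in range(7)]
        for n in range(6)]

ks = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]   # spatial 3 x 3 kernel
kt = [1, 2, 1]                              # temporal kernel of length 3
# 3D kernel built as the outer product of kt and ks (a separable special case).
k3 = [[[kt[c] * ks[a][b] for b in range(3)] for a in range(3)] for c in range(3)]

out_21 = conv2plus1d(clip, ks, kt)   # steps 1 and 2: 2D per frame, then 1D in time
out_3d = conv3d(clip, k3)            # same filter applied as one 3D convolution
mixed = conv3d(out_21, k3)           # step 3: a 3D conv stacked after the (2+1)D block

print(shape(out_21), shape(mixed), out_21 == out_3d)
```

Without a nonlinearity between the two stages, the (2+1)D block equals a 3D convolution whose kernel is the outer product of the temporal and spatial kernels, which is why `out_21` and `out_3d` match here. In a trained network the stages are typically separated by a nonlinearity and use multiple channels, so the two layer types are no longer interchangeable.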

Paper Details

Date Published: 14 August 2019
PDF: 6 pages
Proc. SPIE 11179, Eleventh International Conference on Digital Image Processing (ICDIP 2019), 1117949 (14 August 2019); doi: 10.1117/12.2540276
Author Affiliations
Bin Yang, Guilin Univ. of Electronic Technology (China)
Ping Zhou, Guilin Univ. of Electronic Technology (China)

Published in SPIE Proceedings Vol. 11179:
Eleventh International Conference on Digital Image Processing (ICDIP 2019)
Jenq-Neng Hwang; Xudong Jiang, Editor(s)

© SPIE.