Proceedings Paper

Pre-trained D-CNN models for detecting complex events in unconstrained videos

Paper Abstract

Rapid event detection must keep pace with a growing need to process large video collections; whether for surveillance videos or unconstrained web videos, automatically recognizing high-level, complex events is a challenging task. Motivated by existing methods that are complex, computationally demanding, and often non-replicable, we designed a simple system that is quick, effective, and carries minimal overhead in terms of memory and storage. Our system is clearly described, modular in nature, replicable on any desktop, and demonstrated with extensive experiments, backed by analysis of different Convolutional Neural Networks (CNNs), both stand-alone and fused with others. With a large corpus of unconstrained, real-world video data, we examine the usefulness of different CNN models as feature extractors for modeling high-level events, i.e., pre-trained CNNs that differ in architecture, training data, and number of outputs. For each CNN, we use frames sampled at 1 fps from all training exemplars to train one-vs-rest SVMs for each event. To represent videos, frame-level features were fused using a variety of techniques, the best being to max-pool between predetermined shot boundaries and then average-pool to form the final video-level descriptor. Through extensive analysis, several insights were found on using pre-trained CNNs as off-the-shelf feature extractors for the task of event detection. Fusing the SVMs of different CNNs showed some combinations to be complementary. It was concluded that no single CNN works best for all events, as some events are more object-driven while others are more scene-based. Our top performance resulted from learning event-dependent weights for different CNNs.
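
The pooling and fusion steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names pool_video and fuse_scores, the input layouts, and the weight normalization are assumptions made for illustration. The abstract does not say how shot boundaries are detected or how the event-dependent weights are learned, so both are taken here as given inputs.

```python
import numpy as np


def pool_video(frame_feats, shot_starts):
    """Max-pool frame features within each shot, then average-pool across shots.

    frame_feats: (n_frames, dim) array of per-frame CNN features.
    shot_starts: sorted frame indices at which a new shot begins
                 (hypothetical input; boundary detection is out of scope).
    """
    frame_feats = np.asarray(frame_feats, dtype=float)
    # np.split cuts the frame axis at each boundary index; drop empty segments.
    shots = [s for s in np.split(frame_feats, shot_starts) if len(s) > 0]
    # One max-pooled descriptor per shot.
    shot_desc = np.stack([s.max(axis=0) for s in shots])
    # Average the shot descriptors into a single video-level descriptor.
    return shot_desc.mean(axis=0)


def fuse_scores(scores, weights):
    """Weighted late fusion of per-CNN SVM scores with per-event weights.

    scores:  (n_cnns, n_events) SVM scores for one video.
    weights: (n_cnns, n_events) non-negative weights, assumed already learned;
             normalized here so that each event's weights sum to 1.
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum(axis=0, keepdims=True)
    return (weights * scores).sum(axis=0)


# Toy example: 5 frames, 2-dimensional features, second shot starts at frame 2.
feats = [[1, 0], [3, 2], [0, 5], [2, 2], [4, 1]]
video_desc = pool_video(feats, [2])

# Toy example: 2 CNNs, 2 events.
fused = fuse_scores([[0.2, 0.8], [0.6, 0.4]], [[1, 3], [3, 1]])
```

In the toy example the two shots max-pool to [3, 2] and [4, 5], so the video descriptor is their mean, [3.5, 3.5]. Because the fusion weights differ per event column, one CNN can dominate for one event and a different CNN for another, which is the sense in which the weights are event-dependent.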

Paper Details

Date Published: 19 May 2016
PDF: 9 pages
Proc. SPIE 9871, Sensing and Analysis Technologies for Biomedical and Cognitive Applications 2016, 98710O (19 May 2016); doi: 10.1117/12.2228504
Joseph P. Robinson, Northeastern Univ. (United States)
Yun Fu, Northeastern Univ. (United States)

Published in SPIE Proceedings Vol. 9871:
Sensing and Analysis Technologies for Biomedical and Cognitive Applications 2016
Liyi Dai; Yufeng Zheng; Henry Chu; Anke D. Meyer-Bäse, Editor(s)
