
Proceedings Paper

A spatio-temporal deep learning approach for human action recognition in infrared videos
Author(s): Anuj K. Shah; Ripul Ghosh; Aparna Akula

Paper Abstract

Human action recognition in indoor environments can be crucial for avoiding serious accidents and damage. Applications range from monitoring the actions of elderly people living alone or persons with disabilities to monitoring persons working alone in a chamber or in an isolated industrial environment. These scenarios demand automatic, near real-time activity recognition and alerting to save lives and assets. Because the sensing modality should work round the clock in a non-intrusive manner, we opted for a thermal infrared camera, which captures the heat emitted by objects in the scene and generates an image. Motivated by the recent success of convolutional neural networks (CNNs) for human action recognition in IR images, we extend that work by incorporating one additional dimension, the temporal information. We designed and implemented a 3D-CNN that learns both the spatial and the sequential features in thermal IR videos. Eight action classes are considered: Walking, Standing, Falling, Lying, Sitting, Falling from chair, Sitting up (recovering from a fall from a sitting posture), and Getting up (recovering from a fall from a lying posture). To evaluate the proposed framework, infrared (IR) videos of the different actions were captured in three diverse home environments: inside a study room, inside a bedroom, and in the garden. The dataset comprises 2641 training and 894 test IR videos, each half a second long, performed by more than 50 volunteers. The 3D-CNN comprises two blocks, each consisting of two convolution layers and one max-pool layer, and automatically constructs features from the raw data, incorporating both spatial and temporal information to learn actions. Network parameters are learned with supervised training using the back-propagation algorithm. Experimental results indicate that the proposed spatio-temporal deep learning architecture achieves 85% classification accuracy on the 894 complex test videos of the IR action dataset.
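
The abstract gives only the layer layout of the network (two blocks, each with two convolutions followed by one max pool), so the following is a minimal sketch of how a clip's dimensions would shrink through such a stack, not the authors' implementation. The clip size (16 frames of 64x64 pixels, one thermal channel), the 3x3x3 unpadded kernels, the 2x2x2 pooling window, and the filter counts (32 and 64) are all assumptions chosen for illustration; none of them appear in the abstract.

```python
from typing import List, Tuple

# (channels, frames, height, width) of a video clip or feature map
Shape = Tuple[int, int, int, int]


def conv3d_shape(shape: Shape, out_channels: int, kernel: int = 3) -> Shape:
    """Output shape of a stride-1, unpadded 3D convolution (assumed kernel size)."""
    _, t, h, w = shape
    return (out_channels, t - kernel + 1, h - kernel + 1, w - kernel + 1)


def maxpool3d_shape(shape: Shape, window: int = 2) -> Shape:
    """Output shape of a non-overlapping 3D max pool (assumed window size)."""
    c, t, h, w = shape
    return (c, t // window, h // window, w // window)


def block_shape(shape: Shape, out_channels: int) -> Shape:
    """One block as described in the abstract: conv, conv, max pool."""
    shape = conv3d_shape(shape, out_channels)
    shape = conv3d_shape(shape, out_channels)
    return maxpool3d_shape(shape)


def trace_network(input_shape: Shape,
                  block_channels: Tuple[int, ...] = (32, 64)) -> List[Shape]:
    """Return the input shape followed by the shape after each block.

    block_channels holds hypothetical filter counts, one per block.
    """
    shapes = [input_shape]
    for channels in block_channels:
        shapes.append(block_shape(shapes[-1], channels))
    return shapes


if __name__ == "__main__":
    # Hypothetical half-second clip: 1 thermal channel, 16 frames, 64x64 pixels.
    for stage, shape in enumerate(trace_network((1, 16, 64, 64))):
        print(f"after block {stage}: {shape}")
```

Under these assumed sizes the temporal axis collapses to a single step after the second block, which is one reason a short half-second clip suits a shallow two-block design; a classifier over the eight action classes would then operate on the flattened final feature map.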

Paper Details

Date Published: 7 September 2018
PDF: 9 pages
Proc. SPIE 10751, Optics and Photonics for Information Processing XII, 1075111 (7 September 2018); doi: 10.1117/12.2502993
Anuj K. Shah, Indian Institute of Engineering Science and Technology (India)
Ripul Ghosh, CSIR - Ctr. Scientific Instruments Organisation (India)
Aparna Akula, CSIR - Ctr. Scientific Instruments Organisation (India)

Published in SPIE Proceedings Vol. 10751:
Optics and Photonics for Information Processing XII
Abdul A. S. Awwal; Khan M. Iftekharuddin; Mireya García Vázquez, Editor(s)

© SPIE.