
Proceedings Paper

A hierarchical framework for understanding human-human interactions in video surveillance

Paper Abstract

Understanding human behavior in video is essential in numerous applications, including smart surveillance, video annotation/retrieval, and human-computer interaction. However, recognizing human interactions is a challenging task due to ambiguity in body articulation, variations in body size and appearance, loose clothing, mutual occlusion, and shadows. In this paper we present a framework for recognizing human actions and interactions in color video, and a hierarchical graphical model that unifies multiple-level processing in video computing: pixel level, blob level, object level, and event level. A mixture-of-Gaussians (MOG) model is used at the pixel level to train and classify individual pixel colors. Relaxation labeling with an attribute relational graph (ARG) is used at the blob level to merge the pixels into coherent blobs and to register inter-blob relations. At the object level, the poses of individual body parts are recognized using Bayesian networks (BNs). At the event level, the actions of a single person are modeled using a dynamic Bayesian network (DBN). The results of the object-level descriptions for each person are juxtaposed along a common timeline to identify an interaction between two persons. The linguistic 'verb argument structure' is used to represent human action in terms of triplets, and a meaningful semantic description is obtained. Our system achieves semantic descriptions of positive, neutral, and negative interactions between two persons: hand-shaking, standing hand-in-hand, and hugging as the positive interactions; approaching, departing, and pointing as the neutral interactions; and pushing, punching, and kicking as the negative interactions.
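
The pixel-level step in the abstract can be illustrated with a minimal sketch of mixture-of-Gaussians color classification. This is not the authors' implementation: the class labels, the RGB color space, the diagonal covariances, and all parameter values below are hypothetical and hand-picked for illustration, and the training stage that the paper mentions is not shown. Each class is assumed to have its own mixture, and a pixel is assigned to the class whose mixture gives its color the highest likelihood.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a diagonal-covariance Gaussian at color vector x."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * math.log(2 * math.pi * vi) - (xi - mi) ** 2 / (2 * vi)
    return math.exp(log_p)

def mixture_likelihood(x, components):
    """Likelihood of x under a mixture given as (weight, mean, var) tuples."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)

def classify_pixel(x, class_models):
    """Return the label whose mixture assigns x the highest likelihood."""
    return max(class_models,
               key=lambda label: mixture_likelihood(x, class_models[label]))

# Hypothetical class models (assumed values, not taken from the paper).
# Each component is (weight, RGB mean, per-channel variance).
CLASS_MODELS = {
    "skin": [
        (0.6, (200.0, 150.0, 130.0), (400.0, 400.0, 400.0)),
        (0.4, (170.0, 120.0, 100.0), (400.0, 400.0, 400.0)),
    ],
    "shirt": [
        (1.0, (40.0, 60.0, 160.0), (300.0, 300.0, 300.0)),
    ],
    "background": [
        (1.0, (90.0, 90.0, 90.0), (900.0, 900.0, 900.0)),
    ],
}

if __name__ == "__main__":
    for pixel in [(195, 148, 128), (45, 62, 155), (92, 88, 91)]:
        print(pixel, classify_pixel(pixel, CLASS_MODELS))
```

In a full system the mixture parameters would be estimated from labeled training pixels, and the per-pixel labels would then be passed to the blob-level stage for merging.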

Paper Details

Date Published: 17 January 2005
PDF: 15 pages
Proc. SPIE 5682, Storage and Retrieval Methods and Applications for Multimedia 2005, (17 January 2005); doi: 10.1117/12.587211
Sangho Park, Univ. of Texas/Austin (United States)
J. K. Aggarwal, Univ. of Texas/Austin (United States)


Published in SPIE Proceedings Vol. 5682:
Storage and Retrieval Methods and Applications for Multimedia 2005
Rainer W. Lienhart; Noboru Babaguchi; Edward Y. Chang, Editors
