
Proceedings Paper

A hierarchical framework for understanding human-human interactions in video surveillance

Paper Abstract

Understanding human behavior in video is essential in numerous applications, including smart surveillance, video annotation/retrieval, and human-computer interaction. However, recognizing human interactions is a challenging task due to ambiguity in body articulation, variations in body size and appearance, loose clothing, mutual occlusion, and shadows. In this paper we present a framework for recognizing human actions and interactions in color video, together with a hierarchical graphical model that unifies multi-level processing in video computing: pixel level, blob level, object level, and event level. A mixture-of-Gaussians (MOG) model is used at the pixel level to train and classify individual pixel colors. Relaxation labeling with an attribute relational graph (ARG) is used at the blob level to merge the pixels into coherent blobs and to register inter-blob relations. At the object level, the poses of individual body parts are recognized using Bayesian networks (BNs). At the event level, the actions of a single person are modeled using a dynamic Bayesian network (DBN). The object-level descriptions for each person are juxtaposed along a common timeline to identify an interaction between two persons. The linguistic 'verb argument structure' is used to represent human action in terms of agent-motion-target triplets, and a meaningful semantic description in terms of subject-verb-object is obtained. Our system produces semantic descriptions of positive, neutral, and negative interactions between two persons: hand-shaking, standing hand-in-hand, and hugging as the positive interactions; approaching, departing, and pointing as the neutral interactions; and pushing, punching, and kicking as the negative interactions.
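
As an illustration of the pixel-level step only, the following is a minimal sketch of classifying a pixel color with per-class Gaussian mixtures. It is not the authors' implementation: the class names, the mixture parameters in `CLASS_MODELS`, the diagonal-covariance simplification, and the function names are all assumptions made for this example, and the training stage (for instance, fitting the mixtures with expectation-maximization) is omitted.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a diagonal-covariance Gaussian at color vector x."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)

def mixture_likelihood(x, components):
    """Likelihood of x under a mixture given as (weight, mean, var) tuples."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)

def classify_pixel(x, class_models):
    """Return the class whose mixture assigns x the highest likelihood."""
    return max(class_models,
               key=lambda name: mixture_likelihood(x, class_models[name]))

# Hypothetical, hand-picked RGB mixtures (assumed values, not from the paper).
CLASS_MODELS = {
    "skin": [
        (0.6, (200, 150, 130), (400, 400, 400)),
        (0.4, (170, 120, 100), (400, 400, 400)),
    ],
    "shirt": [(1.0, (40, 60, 160), (300, 300, 300))],
    "background": [(1.0, (90, 90, 90), (900, 900, 900))],
}

if __name__ == "__main__":
    for pixel in [(195, 148, 128), (42, 58, 158), (90, 90, 90)]:
        print(pixel, classify_pixel(pixel, CLASS_MODELS))
```

In a pipeline like the one described, the per-pixel labels would then be passed to the blob level, where neighboring pixels with the same label are merged into coherent regions.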

Paper Details

Date Published: 17 January 2005
PDF: 15 pages
Proc. SPIE 5682, Storage and Retrieval Methods and Applications for Multimedia 2005, (17 January 2005); doi: 10.1117/12.587211
Sangho Park, Univ. of Texas/Austin (United States)
J. K. Aggarwal, Univ. of Texas/Austin (United States)

Published in SPIE Proceedings Vol. 5682:
Storage and Retrieval Methods and Applications for Multimedia 2005
Rainer W. Lienhart; Noboru Babaguchi; Edward Y. Chang, Editor(s)
