Detecting humans carrying objects

Decomposing an image sequence into slices that generate twisted patterns sufficiently characterizes human gait and a particular class of activities.
10 March 2009
Yang Ran, Qinfen Zheng, and Rama Chellappa

Surveillance cameras may be a tool for solving crimes, but what about using them to prevent or stop criminals or terrorists? Although computerized video-based monitoring would seem to be the obvious answer, algorithms that can recognize suspicious activities and individuals have proven highly difficult to devise. Examples of such threats include a terrorist carrying a suicide bomb or a military target holding a heavy weapon. Traditional approaches to this problem are based on markers or feature points extracted from the human body, an approach that is impractical for low-resolution images and moving platforms.1,2

Using artificial intelligence,3 we are developing a computer monitoring system that can analyze human motion in real time under challenging conditions such as low resolution. We proceed by examining the cyclic property of motion and present algorithms to classify humans in videos according to their gait patterns. When a person's limbs are unencumbered, gait movements are symmetrical. Represented graphically, they form a twisted helical pattern called a double-helical signature (DHS) that resembles a figure 8. The pattern is changed by any activity that disturbs the symmetry, such as carrying a package. By defining these signatures, our system can recognize unique characteristics in human gait and automatically detect asymmetries (see Figure 1).
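As an illustration of the idea (not the authors' actual algorithm), a broken gait symmetry can be flagged by comparing the swing amplitudes of the two arm regions over a gait cycle. The signals and threshold below are illustrative assumptions:

```python
import numpy as np

# Illustrative sketch: score left/right swing asymmetry from two
# 1D motion signals (e.g. horizontal arm displacement per frame).
def asymmetry_score(left: np.ndarray, right: np.ndarray) -> float:
    """Ratio in [0, 1]: near 0 for symmetric swing, near 1 when one
    side barely moves (e.g. an arm pinned by a carried object)."""
    a_l, a_r = left.std(), right.std()
    return abs(a_l - a_r) / max(a_l + a_r, 1e-9)

t = np.linspace(0, 4 * np.pi, 200)
free = np.sin(t)            # unencumbered arm swings normally
pinned = 0.05 * np.sin(t)   # carrying hand barely oscillates

print(round(asymmetry_score(free, free), 2))    # 0.0
print(round(asymmetry_score(free, pinned), 2))  # 0.9
```

A threshold on this score (say, above 0.5) would then separate natural walking from one-handed carrying in this toy setting.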


Figure 1. (top) Original frames and silhouette in X-Y. (middle) Activity sequence in X-Y-t. (bottom) Selected 2D slices containing DHS in X-t. t: Time axis.

In the proposed model, an image sequence is decomposed into ‘X-t’ slices (where ‘t’ is the time axis). The motion of the limbs is represented as a pair of kinematic chains oscillating out of phase. During bipedal leg swing, the arms swing to keep the center of gravity above the point of contact and to minimize the energy needed to balance the body. We expect that the presence of a sufficiently heavy object will (at least in the hand regions) distort the DHS pattern. Because of the compactness of the method, we need only look at one slice to understand the arm articulation. We have studied three activities—natural walking, carrying an object with one hand, and holding an object—and examined the different symmetries associated with them. One side of the signature disappears when a person is carrying an object in one hand, and the whole DHS disappears when objects are held in the arms.
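The slicing itself is simple to sketch: treating the video as a 3D volume of shape (T, H, W), an X-t slice is just the volume fixed at one image row. The synthetic oscillating "foot" below is an assumption for demonstration:

```python
import numpy as np

# Hypothetical sketch: extract an X-t slice from a video volume.
# volume has shape (T, H, W): T frames of H-by-W silhouette masks.
def xt_slice(volume: np.ndarray, y: int) -> np.ndarray:
    """Return the X-t slice at image row y, with shape (T, W)."""
    return volume[:, y, :]

# Synthetic example: a point oscillating sinusoidally in X over time,
# which traces a braided curve in the X-t plane.
T, H, W = 64, 48, 32
volume = np.zeros((T, H, W), dtype=np.uint8)
t = np.arange(T)
x = (W // 2 + (W // 4) * np.sin(2 * np.pi * t / 32)).astype(int)
volume[t, H - 1, x] = 1        # mark the point's position each frame

s = xt_slice(volume, H - 1)    # the twisted pattern lives in this plane
print(s.shape)                 # (64, 32)
```

Two such out-of-phase curves in one slice, taken at leg or hand height, form the figure-8 braid the article calls the DHS.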

We have integrated human gait DHS into a real-time video surveillance system and used it to study and locate pedestrians. The experimental results have demonstrated the effectiveness of the system under lighting changes, shadows, camera motion, and various viewing angles, with and without obstacles. The results also indicate that the approach is superior to many existing methods in terms of accuracy and reliability.

There are several major differences between our method and traditional marker- or 2D segmentation-based methods. First, we study the topology of activities guided by a kinematic model, whereas other approaches discard such information by focusing either on correlation or on the silhouette histogram. Second, existing methods assume that objects have already been segmented or aligned; we focus on simultaneous detection and segmentation, with and without occlusion. Third, we analyze the geometric constraints between DHSs from multiple views and individuals. Finally, we investigate the usefulness of the DHS in recognizing activities.

In conclusion, we have developed and tested several methods for human motion modeling and analysis with specific reference to surveillance applications. We use a kinematic model that characterizes the high-order statistical deformations of a human body, and we also analyze the redundancy in gait signatures extracted at different heights. The proposed method naturally integrates temporal body kinematics with 2D shape information. It does not require silhouettes or feature tracking. Our approach has two major advantages. First, the twisted pattern belongs to a Frieze group, which enables separation of self-intersecting curves for robust and efficient learning. Second, only a finite set of DHSs is needed to compactly and sufficiently represent activity volume topology and to estimate articulation parameters such as cadence, step and stride length, and style.

Human motion analysis and recognition can be expanded in many ways. Some potential avenues to explore in the context of the proposed approach include being able to preempt incidents through real-time alarms for suspicious behavior, enhanced forensic capabilities via content-based video retrieval, and situational awareness with joint cognizance of location, identity, and activity of objects in the monitored space. In our own group, we plan to study performance by increasing the classes of activities being considered. We will also extend our experimental research to videos captured from aerial vehicles.


Yang Ran, Qinfen Zheng
Center for Automation Research (CfAR)
University of Maryland
College Park, MD
SET Corporation
Greenbelt, MD
Yang Ran received his PhD in electrical engineering from the University of Maryland, College Park, in 2006. He has published book chapters and journal and conference papers in computer vision research, especially in human motion analysis. His interests are focused on biometrics, surveillance and monitoring, video clustering, and related commercial applications.
Rama Chellappa
CfAR
University of Maryland
College Park, MD
