Context-aware tracking with wide-area motion imagery

A tracker that draws links between individuals' behavior patterns can locate targets in large, highly detailed video images.
07 June 2013
Jianjun Gao, Haibin Ling, Erik Blasch, Khanh Pham, Zhonghai Wang, and Genshe Chen

Wide-area motion imagery (WAMI) generates high-resolution images that enable tracking and recording of vehicle and pedestrian movements over city-sized areas. Sensors are mounted on aircraft to provide a wide field of view, but the resulting image may include a very large number of individuals and objects, which makes it difficult to identify a target by physical characteristics alone. To overcome this, we can use WAMI alongside a ‘context-aware’ tracker: that is, one that is sensitive to the target's interactions with its surroundings (or ‘network’). We can then record an individual's ‘pattern of life’ by following their movements over several days (their trace) and during a single day (their tracelets).

Using WAMI video, we focus on targets that display abnormal behaviors: for example, a suspected drug dealer who drives to the same place every other day. We identify the location of interest (a drug house, in our example) from an established database. The tracker records cars and pedestrians that visit the house, enabling us to construct a network that includes the location (the ‘node’ for our model). We also record the cars' and pedestrians' traces and tracelets (the ‘links’) to obtain information about the targets (see Figure 1). Finally, we analyze individuals' behavior using a reasoning model that employs trace analysis to obtain the pattern of life. When new locations are identified, we update the knowledge database for future movement analysis.
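The node-and-link bookkeeping described above can be illustrated with a small Python sketch. This is not the reasoning model used in this work: the names `build_visit_network` and `regular_interval`, the identifiers such as `car_17` and `house_A`, and the dates are all hypothetical, and the test for regularity (equal gaps between consecutive visits) is a deliberately simple assumption.

```python
from collections import defaultdict
from datetime import date


def build_visit_network(visits):
    """Map each (target, location) link to the sorted days on which it was observed."""
    links = defaultdict(set)
    for target_id, location_id, day in visits:
        links[(target_id, location_id)].add(day)
    return {link: sorted(days) for link, days in links.items()}


def regular_interval(days, min_visits=3):
    """Return the gap in days if consecutive visits are equally spaced, else None."""
    if len(days) < min_visits:
        return None
    gaps = {(later - earlier).days for earlier, later in zip(days, days[1:])}
    return gaps.pop() if len(gaps) == 1 else None


# Hypothetical tracker output: (target, location node, day of visit).
visits = [
    ("car_17", "house_A", date(2013, 6, 1)),
    ("car_17", "house_A", date(2013, 6, 3)),
    ("car_17", "house_A", date(2013, 6, 5)),
    ("car_42", "house_A", date(2013, 6, 2)),
]

network = build_visit_network(visits)
flagged = {link: regular_interval(days) for link, days in network.items()}
# car_17 visits house_A every 2 days; car_42 has too few visits to judge.
```

A real system would need to tolerate missed detections and irregular gaps, which this sketch does not attempt.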


Figure 1. A diagram of the system for pattern-of-life recognition using wide-area motion imagery (WAMI) and a context-aware multi-target tracker (CAMT).

We pinpoint our selected location using the WAMI sensor's viewing angle and a terrain map. Our context-aware multi-target tracker (CAMT) uses a spatio-temporal model called maximum consistency context, which seeks the most consistent associations in each target's neighborhood.1 This determines whether two targets detected in consecutive frames should be linked, and it selects reliable context information while filtering out noisy distractions.
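To give a feel for how neighborhood context can settle an ambiguous frame-to-frame link, the sketch below scores each candidate association by how many nearby targets would also find a detection if they made the same move. It is a much-simplified stand-in, not the published maximum consistency context formulation: the function `associate`, its parameters `gate`, `radius`, and `tol`, and the assumption that neighboring vehicles move coherently are all illustrative.

```python
import math


def associate(targets, detections, gate=15.0, radius=50.0, tol=3.0):
    """Link each target to the gated detection whose motion its neighbors support most."""
    links = {}
    for tid, pos in targets.items():
        # Context: other targets within `radius` of this one in the current frame.
        neighbors = [p for nid, p in targets.items()
                     if nid != tid and math.dist(p, pos) <= radius]
        best_key, best_j = None, None
        for j, det in enumerate(detections):
            step = math.dist(det, pos)
            if step > gate:
                continue  # too far to be the same target one frame later
            shift = (det[0] - pos[0], det[1] - pos[1])
            # Count neighbors that would also land on a detection with this shift.
            support = sum(
                any(math.dist((n[0] + shift[0], n[1] + shift[1]), d) <= tol
                    for d in detections)
                for n in neighbors
            )
            key = (support, -step)  # prefer more support, then the shorter move
            if best_key is None or key > best_key:
                best_key, best_j = key, j
        links[tid] = best_j
    return links


# Three vehicles moving +10 in x; detection 3 is a distractor close to target "a".
targets = {"a": (0.0, 0.0), "b": (0.0, 10.0), "c": (0.0, 20.0)}
detections = [(10.0, 0.0), (10.0, 10.0), (10.0, 20.0), (4.0, 3.0)]
links = associate(targets, detections)
# links == {"a": 0, "b": 1, "c": 2}
```

A plain nearest-neighbor rule would link target "a" to the distractor, because it is closer than the true detection. The sketch does not enforce one-to-one assignment, which a practical tracker would need.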

To determine the features of a moving target, we use a particle filter: a sampling technique for estimating the probability density function of the variables in the system. The target's features are observations of its underlying states. Our algorithm automatically updates its estimation of the probability distribution of changes in these states over time. Figure 2 shows a tracking result using the CAMT over a very large number of targets.
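As a minimal illustration of the predict, weight, and resample cycle behind a particle filter, the sketch below tracks a one-dimensional position and velocity from noisy position measurements. It is a generic bootstrap filter, not the CAMT implementation: the function name `particle_filter`, the constant-velocity motion model, and every noise level and particle count are assumptions chosen for the example.

```python
import math
import random


def particle_filter(observations, n_particles=500, obs_sigma=1.0, seed=0):
    """Bootstrap particle filter over a (position, velocity) state; returns position estimates."""
    rng = random.Random(seed)
    # Initial belief: broad Gaussian spread over position and velocity.
    particles = [(rng.gauss(0.0, 5.0), rng.gauss(0.0, 2.0)) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # Predict: move each particle by its velocity and add process noise.
        particles = [(p + v + rng.gauss(0.0, 0.2), v + rng.gauss(0.0, 0.1))
                     for p, v in particles]
        # Weight: Gaussian likelihood of the observed position given each particle.
        weights = [math.exp(-0.5 * ((z - p) / obs_sigma) ** 2) for p, _ in particles]
        total = sum(weights)
        if total == 0.0:  # every weight underflowed; fall back to uniform weights
            weights = [1.0] * n_particles
            total = float(n_particles)
        # Estimate: weighted mean of the particle positions.
        estimates.append(sum(w * p for w, (p, _) in zip(weights, particles)) / total)
        # Resample: draw particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


# Synthetic target moving one unit per frame, observed with unit-variance noise.
noise = random.Random(1)
truth = [float(t) for t in range(1, 31)]
observations = [x + noise.gauss(0.0, 1.0) for x in truth]
estimates = particle_filter(observations)
errors = [abs(e - x) for e, x in zip(estimates, truth)]
```

The weighted particle set is the sampled approximation of the probability density over the target's states, and the resampling step is what lets that approximation follow the states as they change over time.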

Using our technique over a specific period, we construct information networks and examine the semantic attributes of each network element to determine patterns. We detect nodes and links using DJ-Cluster, a density-and-join-based clustering algorithm.2 Using the network structure (the detected nodes, links, and associated attributes), we can generate features based on target activities and events. As we accumulate spatio-temporal analysis of targets' journeys over an extended period, we can recognize specific actions, from which we infer behavior during a particular time frame. From these features and the activities recognized in the region of interest, we obtain the pattern of life.3–8
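The density-and-join idea can be sketched in a few lines of Python. The version below follows the general scheme of DJ-Cluster as we understand it from reference 2: a point whose neighborhood within `eps` holds at least `min_pts` points seeds a cluster, and clusters that share any point are joined. The function name `dj_cluster`, the union-find bookkeeping, and the sample coordinates are illustrative assumptions rather than the code used for the results here.

```python
import math


def dj_cluster(points, eps, min_pts):
    """Return one cluster label per point (-1 for noise) using density-and-join clustering."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    clustered = set()
    for i, p in enumerate(points):
        # Density step: the neighborhood includes the point itself.
        neighborhood = [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        if len(neighborhood) < min_pts:
            continue  # too sparse; stays noise unless another neighborhood absorbs it
        # Join step: merge this neighborhood with any cluster it overlaps.
        for j in neighborhood:
            parent[find(j)] = find(i)
            clustered.add(j)

    roots, labels = {}, []
    for i in range(len(points)):
        if i in clustered:
            labels.append(roots.setdefault(find(i), len(roots)))
        else:
            labels.append(-1)
    return labels


# Hypothetical stop positions: two frequently visited places and one stray stop.
stops = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dj_cluster(stops, eps=1.5, min_pts=3)
# labels == [0, 0, 0, 0, 1, 1, 1, -1]
```

Each resulting cluster is a candidate node, and the journeys between clusters supply the links. The neighborhood search here is quadratic in the number of points, so a spatial index would be needed at WAMI scale.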


Figure 2. An example of CAMT results used to track the pattern of life of multiple targets in WAMI data. Each line segment indicates the short-term trajectory of a target, that is, a connection between its positions in the last several frames. The colors indicate movement in different directions.

In future work, we will extend our methods for information extraction to include full motion videos that provide greater detail of the target. These would have higher resolution and would be taken from a side view, as well as from above. We would also include text to complement the video input, providing target information.


Jianjun Gao, Zhonghai Wang, Genshe Chen
Intelligent Fusion Technology, Inc. (IFT)
Germantown, MD

Jianjun Gao is a research scientist. He graduated from Northwestern Polytechnic University with a BS in 1989 and MSc in 1991, majoring in automatic control. His interests include target tracking, image processing, and WAMI data exploitation.

Zhonghai Wang is a senior research scientist. He received his PhD from Michigan Technological University in 2010. Previously, he was a postdoctoral researcher at Missouri University of Science and Technology (2011-2012), and an engineer at China Aerospace Science and Industry Corporation.

Genshe Chen received his BS and MSc in electrical engineering, and his PhD in aerospace engineering, in 1989, 1991 and 1994, respectively, from Northwestern Polytechnical University, Xian, China. He is chief technology officer at IFT and provides strategic guidance for government services and commercial solutions.

Haibin Ling
Department of Computer and Information Sciences
Temple University
Philadelphia, PA

Haibin Ling is an assistant professor. He received his PhD from the University of Maryland, College Park, in 2006 and gained postdoctorate training at the University of California Los Angeles in 2007. He worked for Microsoft Research Asia (2000-2001) and Siemens Corporate Research (2007-2008).

Erik Blasch
Information Directorate
Air Force Research Laboratory
Rome, NY

Erik Blasch holds a BS from Massachusetts Institute of Technology and an MBA and PhD from Wright State University. He has compiled more than 400 papers in areas of target tracking, information/sensor/image fusion, and robotics applications. He is an SPIE Fellow.

Khanh Pham
Space Vehicles Directorate
Air Force Research Laboratory
Kirtland Air Force Base, NM

Dr. Pham is a senior aerospace engineer. He has contributed to numerous aspects of the SPIE Defense, Security and Sensing Symposium through basic research in optics control and technical leadership in planning developments for conferences on sensors and systems for space applications.


References:
1. X. Shi, E. Blasch, W. Hu, H. Ling, Using maximum consistency context for multiple target association in wide area traffic scenes, Int'l Conf. on Acoustics, Speech and Signal Process., 2013.
2. C. Zhou, D. Frankowski, P. Ludford, S. Shekhar, L. Terveen, Discovering personal gazetteers: an interactive clustering approach, Proc. ACM Int'l Symp. on Adv. in Geographic Inf. Syst., p. 266-273, 2004.
3. P. Liang, D. Shen, E. P. Blasch, K. Pham, Z. Wang, G. Chen, H. Ling, Spatial context for moving vehicle detection in wide area motion imagery with multiple kernel learning, Proc. SPIE 8751, p. 875105, 2013. doi:10.1117/12.2015967
4. P. Liang, H. Ling, E. Blasch, G. Seetharaman, D. Shen, G. Chen, Vehicle detection in wide aerial surveillance using temporal context, Int'l Conf. on Inf. Fusion, 2013.
5. E. Blasch, G. Seetharaman, K. Palaniappan, H. Ling, G. Chen, Wide-area motion imagery (WAMI) exploitation tools for enhanced situation awareness, IEEE Appl. Imagery Pattern Recognit. Workshop, 2012.
6. E. Blasch, P. C. G. Costa, K. B. Laskey, H. Ling, G. Chen, The URREF ontology for semantic wide area motion imagery exploitation, IEEE Nat. Aero. and Electron. Conf., 2012.
7. P. Liang, T. Teodoro, H. Ling, E. Blasch, G. Chen, L. Bai, Multiple kernel learning for vehicle detection in wide area motion imagery, Int'l Conf. on Inf. Fusion, 2012.
8. Y. Wu, G. Chen, E. Blasch, L. Bai, H. Ling, Feature-based background registration in wide-area motion imagery, Proc. SPIE 8402, p. 840204, 2012. doi:10.1117/12.918804