The defense industry and a desire to improve search-and-rescue efforts have driven significant research and development in motion detection over the past few decades. Computer-vision researchers have contributed extensively to addressing the problems faced by, for example, national-defense and border-control agencies, and by teams on disaster-relief and fire-rescue missions. Although the human eye is unmatched in its ability to accurately detect, track, and analyze targets, it is limited by the observer's concentration and fatigue. In contrast, a well-trained computer-vision system can automatically detect and track specific targets in several hours of video footage. Perfecting such a system has been the field's focus for several years. Problems such as occlusion, background clutter, and artifacts caused by illumination conditions, motion, and weather have all been significant research topics.
A large fraction of computer-vision research has focused on detection and tracking in the visible part of the spectrum, i.e., the range of electromagnetic radiation visible to the human eye. However, solutions to the tracking problem are independent of the specific spectral range used. The only wavelength-dependent differences are in target luminosity and signature. Spectral ranges outside the visible regime, such as UV and IR, provide a wealth of information about both animate and inanimate objects. For instance, bees (whose visible spectrum is shifted towards the UV regime) can recognize flowers containing nectar, even though all flowers of a certain family may appear similar to the human eye. With advancements in technology, devices such as IR cameras and night-vision goggles are now used extensively to gather and process information in military-conflict settings, denying the enemy the advantage of night cover and bunkers. Vast amounts of computational power and parallel-processing techniques enable real-time target detection and tracking. Nevertheless, only a limited amount of work has been done on detection and tracking using thermal (IR) imagery.
Figure 1. (top row) Candidate target positions. (bottom row) Targets detected after eliminating false positives using steerable filters. All true targets have been detected correctly.
Several detection methods require that targets appear as bright regions (hotspots) in the images.1–3 These methods assume that the target's IR radiation is much stronger than that of the background and noise combined. We attempt to provide a solution to real-time target tracking in forward-looking IR imagery4 in the presence of camera motion and changing target features (shape and intensity). Once the targets have been detected, tracking is done using three modules. The first finds the translation vector in image space that minimizes the distance between the feature distributions of the target model and the candidate target, where both distributions are obtained from the intensity and local standard-deviation measures of the frames. The other two modules compensate for the sensor's egomotion and update the target model. We have implemented some of these concepts in our modular end-to-end system COCOA,5 which can perform motion compensation, detect and track moving objects, and index videos taken from a camera mounted on a moving aerial platform. It works in both visible and thermal-imaging modes.
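The feature distributions used by the first tracking module can be illustrated with a minimal sketch. It builds a joint histogram of intensity and local standard deviation for an image patch and compares two such histograms. The choice of the Bhattacharyya-based distance, the 3×3 neighborhood, the bin count, and all function names (`local_std`, `feature_histogram`, `bhattacharyya_distance`) are assumptions made for illustration; the text above does not specify them.

```python
import numpy as np

def local_std(image, radius=1):
    """Standard deviation over each pixel's (2*radius+1)^2 neighborhood."""
    img = image.astype(np.float64)
    padded = np.pad(img, radius, mode="edge")
    size = 2 * radius + 1
    h, w = img.shape
    total = np.zeros_like(img)
    total_sq = np.zeros_like(img)
    # Accumulate sums over all shifted copies of the image (a box filter).
    for dy in range(size):
        for dx in range(size):
            window = padded[dy:dy + h, dx:dx + w]
            total += window
            total_sq += window ** 2
    n = size * size
    mean = total / n
    return np.sqrt(np.maximum(total_sq / n - mean ** 2, 0.0))

def feature_histogram(image, bins=8, max_intensity=255.0, max_std=64.0):
    """Normalized joint histogram of (intensity, local standard deviation)."""
    std = np.clip(local_std(image), 0.0, max_std)
    hist, _, _ = np.histogram2d(
        image.ravel().astype(np.float64), std.ravel(),
        bins=bins, range=[[0.0, max_intensity], [0.0, max_std]])
    return hist / hist.sum()

def bhattacharyya_distance(p, q):
    """Distance in [0, 1]; 0 means identical distributions (assumed metric)."""
    coefficient = np.sum(np.sqrt(p * q))
    return np.sqrt(max(1.0 - coefficient, 0.0))

# Synthetic patches: a hot target, a second view of it, and cool background.
rng = np.random.default_rng(0)
model_patch = 200.0 + 5.0 * rng.standard_normal((15, 15))
similar_patch = 200.0 + 5.0 * rng.standard_normal((15, 15))
background_patch = 50.0 + 5.0 * rng.standard_normal((15, 15))

model = feature_histogram(model_patch)
d_same = bhattacharyya_distance(model, model)
d_similar = bhattacharyya_distance(model, feature_histogram(similar_patch))
d_background = bhattacharyya_distance(model, feature_histogram(background_patch))
```

In this toy setting the distance to the second target view is much smaller than the distance to the background patch, which is the property a tracker relies on when it searches for the translation that minimizes the distance.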
We perform target detection using an intensity histogram and partition the intensity space4 while assigning ambiguous regions according to fuzzy c-means clustering rules.6 This is followed by a merging phase in which we fuse the edge information and brightness constraints.7 Once the regions are segmented, a confidence measure for each candidate region is computed as the product of two sigmoid functions (mathematical S-shaped functions),
$$C_i = \frac{1}{1 + e^{-\lambda_1(\mu_f - \mu_1)}} \times \frac{1}{1 + e^{-\lambda_2(\mu_f - \mu_b - \mu_2)}},$$
where μf and μb are the mean foreground and background levels of the ith target, respectively, λ1 and λ2 control the slopes of the sigmoids, and μ1 and μ2 are the functions' offsets. If a target region is bright and has a high contrast with its neighboring areas, this equation assigns it a confidence close to unity. Otherwise, its confidence level will be close to zero. We select regions of high confidence as possible targets.
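A small numerical sketch may help make this behavior concrete. It assumes that the first sigmoid acts on the foreground mean μf (brightness) and the second on the difference μf − μb (contrast); this pairing, the function name `region_confidence`, and every parameter value below are illustrative assumptions rather than settings taken from the original work.

```python
import math

def sigmoid(x, slope, offset):
    """Logistic function with the given slope and offset."""
    return 1.0 / (1.0 + math.exp(-slope * (x - offset)))

def region_confidence(mu_f, mu_b, lam1=0.1, lam2=0.1, mu1=128.0, mu2=40.0):
    """Product of a brightness sigmoid and a contrast sigmoid (assumed form)."""
    brightness = sigmoid(mu_f, lam1, mu1)        # high when the region is bright
    contrast = sigmoid(mu_f - mu_b, lam2, mu2)   # high when it stands out
    return brightness * contrast

# A bright, high-contrast region versus a dim, low-contrast one
# (gray levels on an assumed 0-255 scale).
hot = region_confidence(mu_f=230.0, mu_b=60.0)
dim = region_confidence(mu_f=90.0, mu_b=80.0)
```

With these hypothetical settings, `hot` comes out close to one and `dim` close to zero, so thresholding the confidence separates likely targets from the rest.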
Once target detection has been achieved, we proceed to target tracking, where we use the objects' positions and sizes to initialize the system (see Figure 1). After computing the required motion-compensation parameters, we apply the resulting transformation to the previous target center to obtain the approximate new candidate center. We then perform mean-shift iterations8 near the new target position to increase the likelihood that the new candidate model matches the target model. Figures 2 and 3 demonstrate the robustness of the system when both the model update and motion compensation are used. We present results on sequences with a low signal-to-noise ratio (SNR) and high global motion, as well as on sequences containing multiple targets.
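The two steps described above, predicting the new center from the camera motion and then refining it with mean-shift iterations, can be sketched as follows. This is a simplified illustration under several assumptions: the global motion is given as a known 2×3 affine matrix acting on (row, column) positions, the target model is a plain intensity histogram with a flat kernel, and the model-update module is omitted. All function names are hypothetical.

```python
import numpy as np

def window_pixels(frame, center, half, bins):
    """Coordinates and histogram-bin indices of the square window at center."""
    cy, cx = int(round(center[0])), int(round(center[1]))
    r0, r1 = max(cy - half, 0), min(cy + half + 1, frame.shape[0])
    c0, c1 = max(cx - half, 0), min(cx + half + 1, frame.shape[1])
    rows, cols = np.mgrid[r0:r1, c0:c1]
    rows, cols = rows.ravel(), cols.ravel()
    idx = (frame[rows, cols].astype(np.int64) * bins) // 256
    return rows, cols, idx

def intensity_histogram(idx, bins):
    """Normalized histogram of bin indices (flat kernel, an assumption)."""
    hist = np.bincount(idx, minlength=bins).astype(np.float64)
    return hist / hist.sum()

def compensate_egomotion(center, affine):
    """Apply a 2x3 affine transform to a (row, col) position."""
    return affine @ np.array([center[0], center[1], 1.0])

def mean_shift_track(frame, model, start, half, bins=16, max_iter=20, eps=1e-3):
    """Move the window toward pixels whose bins are under-represented."""
    center = np.asarray(start, dtype=np.float64)
    for _ in range(max_iter):
        rows, cols, idx = window_pixels(frame, center, half, bins)
        candidate = intensity_histogram(idx, bins)
        # Weight each pixel by sqrt(model / candidate) for its histogram bin.
        weights = np.sqrt(model[idx] / candidate[idx])
        if weights.sum() == 0.0:
            break
        new_center = np.array([rows @ weights, cols @ weights]) / weights.sum()
        moved = np.linalg.norm(new_center - center)
        center = new_center
        if moved < eps:
            break
    return center

# Frame 0: a 9x9 hot target centered at (20, 20) on a cool background.
bins, half = 16, 4
frame0 = np.full((64, 64), 50, dtype=np.uint8)
frame0[16:25, 16:25] = 200
_, _, idx0 = window_pixels(frame0, (20, 20), half, bins)
model = intensity_histogram(idx0, bins)

# Frame 1: the camera motion shifts the scene by (+6, +4) and the target
# moves a further (+2, -1) on its own, so its true center is (28, 23).
frame1 = np.full((64, 64), 50, dtype=np.uint8)
frame1[24:33, 19:28] = 200
true_center = np.array([28.0, 23.0])

affine = np.array([[1.0, 0.0, 6.0], [0.0, 1.0, 4.0]])  # assumed known
predicted = compensate_egomotion((20, 20), affine)
tracked = mean_shift_track(frame1, model, predicted, half, bins)
```

Motion compensation alone places the window within a few pixels of the target; the mean-shift iterations then remove most of the residual error caused by the target's own motion.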
Figure 2. Although the target looks very similar to the background, its position is tracked correctly.
Figure 3. Demonstration of multiple-target tracking.
The COCOA tracking module follows targets detected in the motion-detection stage for as long as they remain visible in the camera's field of view. This is critical for obtaining tracks that reflect the motion characteristics of the object of interest over extended durations. COCOA consists of three modules. The egomotion-compensation module accounts for the continuous motion of the camera. The motion-detection module uses accumulative frame differencing and background subtraction to detect independently moving objects, such as cars and people. After frame differencing, the difference images are combined to obtain a summed image. Background subtraction is performed hierarchically, using both pixel- and frame-level processing.9 The tracking module offers two methods. In the kernel-based object tracker, each object is represented separately by its own color distribution (or, in IR images, its intensity values), and tracking is performed in global coordinates. In the blob-tracking approach, regions of interest in successive frames are represented by unique appearance and shape models for multi-target tracking. The temporal relationship between the blobs is established using a cost function that takes into account appearance and shape similarities. Figure 4 shows an IR sequence in which two cars are tracked using the blob-tracking algorithm (see video demonstration10).
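The accumulative frame differencing mentioned above can be illustrated with a short sketch. It assumes the frames have already been aligned by egomotion compensation, takes the middle frame as the reference, sums the absolute differences to every other frame, and thresholds the summed image. The choice of reference frame, the threshold, and the function name `accumulated_difference` are assumptions for illustration, not details of COCOA itself.

```python
import numpy as np

def accumulated_difference(aligned_frames, threshold):
    """Sum absolute differences to the middle frame and threshold the sum."""
    frames = np.asarray(aligned_frames, dtype=np.float64)
    reference = frames[len(frames) // 2]
    summed = np.abs(frames - reference).sum(axis=0)
    return summed, summed > threshold

# Five aligned frames: a 3x3 hot object moves 3 pixels to the right per frame.
frames = []
for i in range(5):
    frame = np.full((32, 32), 40, dtype=np.uint8)
    frame[10:13, 5 + 3 * i: 8 + 3 * i] = 200
    frames.append(frame)

summed, mask = accumulated_difference(frames, threshold=300.0)
```

In this toy example the object's position in the reference frame differs from every other frame, so it accumulates four large differences, whereas each of its other positions accumulates only one. Thresholding the summed image therefore keeps the object where it is in the reference frame and suppresses the trailing "ghosts" that a single frame difference would leave.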
Figure 4. Tracking target objects in IR imagery using COCOA's blob tracking.
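As a rough sketch of how a cost function over appearance and shape can link blobs between frames, the example below scores every pair of blobs from two successive frames and greedily accepts the cheapest pairs. The blob descriptors (mean intensity and area), the equal weights, the cost threshold, and the greedy assignment are all assumptions chosen for brevity; the actual COCOA cost function is not given in the text.

```python
import numpy as np

def association_cost(blob_a, blob_b, w_appearance=0.5, w_shape=0.5):
    """Weighted sum of normalized appearance and shape differences."""
    appearance = abs(blob_a["mean_intensity"] - blob_b["mean_intensity"]) / 255.0
    shape = abs(blob_a["area"] - blob_b["area"]) / max(blob_a["area"], blob_b["area"])
    return w_appearance * appearance + w_shape * shape

def match_blobs(previous, current, max_cost=0.5):
    """Greedily pair blobs across frames, cheapest pairs first."""
    if not previous or not current:
        return {}
    costs = np.array([[association_cost(p, c) for c in current] for p in previous])
    matches, used = {}, set()
    for flat in np.argsort(costs, axis=None):
        i, j = divmod(int(flat), costs.shape[1])
        if i in matches or j in used or costs[i, j] > max_cost:
            continue
        matches[i] = j
        used.add(j)
    return matches

# Two cars in the previous frame; the detector lists them in the
# opposite order in the current frame (hypothetical measurements).
previous = [{"mean_intensity": 220.0, "area": 120.0},
            {"mean_intensity": 150.0, "area": 60.0}]
current = [{"mean_intensity": 152.0, "area": 62.0},
           {"mean_intensity": 215.0, "area": 118.0}]

matches = match_blobs(previous, current)
```

Here `matches` maps each previous blob index to its counterpart in the current frame, so the two tracks keep their identities even though the detection order changed. A globally optimal assignment could replace the greedy loop when many blobs compete for the same match.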
In summary, we have introduced two different methods to perform target tracking. One of these was developed specifically for IR imagery,4 while the other can track both within and outside the visible spectrum.5 Compensating for sensor egomotion is a key stage in both methods. Results based on sequences with low SNR and high egomotion show the robustness of both approaches for tracking targets in both visible and thermal-imaging modes. We intend to implement this system for real-time operation in the near future. Tracking in general, and tracking in IR imagery in particular, represents a significant challenge for computer vision. When resources and time are limited, real-time target-tracking systems that can deliver payloads from beyond the line of sight and assist in continuous activity monitoring can significantly enhance the chances of success during battle and search-and-rescue missions, increase security during surveillance, and detect events that are considered out of the ordinary.
Arjun Nagendran, Mubarak Shah
Computer Vision Laboratory
University of Central Florida (UCF)
Arjun Nagendran is a postdoctoral research associate. He has a PhD in robotics (2008) from the University of Manchester (UK). He was responsible for ground-vehicle systems for Team Tumbleweed, one of the six 2008 finalists of the UK's Ministry of Defence Grand Challenge. He specializes in control and landing mechanisms for unmanned air vehicles.
Mubarak Shah is the founding director of the Computer Vision Laboratory. He is co-author of three books published by Springer: Automated Multi-Camera Surveillance: Algorithms and Practice (2008), Video Registration (2003), and Motion-Based Recognition (1997). In 2006, he was awarded a Pegasus Professor award, the highest award given to UCF faculty members. He has been a recipient of several other awards, including the Harris Corporation's 1999 Engineering Achievement Award.