Vehicle-tracking in wide-area motion imagery from an airborne platform
Intelligence image analysts are confronted with an increasing volume of data from a large number of high-resolution sensors, including those mounted on unmanned systems. An example is the wide area motion imagery (WAMI) sensor, which records high-resolution, full-motion video over multiple square kilometers from an airborne platform. WAMI's level of detail is such that all individual vehicles are clearly visible. However, analysts are unable to monitor all movements in this video data simultaneously. Therefore, they would greatly benefit from automatic tracking of objects and detection of events and anomalies.
We obtained WAMI data from the CorvusEye 1500CM sensor system, which consists of four cameras and generates images with an overall resolution of 116 megapixels at a frame rate of 2 Hz. Most current detection and tracking algorithms cannot handle such low frame rates, because motion estimation for moving-object detection and the tracking of many fast-moving objects generally require high update frequencies. To address this problem, we have developed a processing pipeline that is suitable for high-resolution, low-frequency WAMI data. Once the tracks are of high quality, it becomes possible to reliably perform automatic high-level reasoning, such as event and anomaly detection.
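To make the frame-rate problem concrete, the short sketch below computes how far a vehicle travels between two consecutive frames. The 2 Hz frame rate comes from the sensor description; the 80 km/h speed, the 25 Hz comparison rate, and the 4.5 m vehicle length are assumptions chosen only for illustration.

```python
def displacement_per_frame_m(speed_kmh: float, frame_rate_hz: float) -> float:
    """Distance in meters that a vehicle covers between two consecutive frames."""
    speed_m_per_s = speed_kmh / 3.6
    return speed_m_per_s / frame_rate_hz


VEHICLE_LENGTH_M = 4.5  # assumed typical car length, not taken from the article

at_2hz = displacement_per_frame_m(80.0, 2.0)    # WAMI frame rate
at_25hz = displacement_per_frame_m(80.0, 25.0)  # assumed conventional video rate

print(f"2 Hz:  {at_2hz:.1f} m per frame ({at_2hz / VEHICLE_LENGTH_M:.1f} vehicle lengths)")
print(f"25 Hz: {at_25hz:.1f} m per frame ({at_25hz / VEHICLE_LENGTH_M:.1f} vehicle lengths)")
```

Under these assumptions a vehicle moves more than two of its own lengths between frames at 2 Hz, so its positions in consecutive frames do not overlap, which is one reason trackers designed for high update frequencies struggle.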
We built a processing pipeline (see Figure 2) that comprises a static vehicle detector, a smart object filtering algorithm using a 3D building reconstruction, and multi-camera object tracking based on template matching [1]. Using the resulting tracks, we were able to implement event and anomaly detection algorithms.
The first step is the detection of the vehicles, for which we use a static object detector. Because this shape-based detector works on individual images, it does not depend on image motion stabilization. Furthermore, it is able to detect vehicles that are stationary (parked, or waiting at traffic lights, for example), which would not be possible with established background-subtraction techniques.
In the second step, we filter false detections based on their altitude, which we determine using a 3D reconstruction of the area (see Figure 3). The reconstruction lets us derive an altitude map for each video frame, from which we estimate the height of each detection.
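The altitude-based filtering step can be sketched as follows. The article does not state the exact rejection rule, so this example assumes that a detection is discarded when the altitude map places it higher above the ground than a threshold. The function name, the (row, col) detection format, and the 2 m threshold are all hypothetical.

```python
from typing import List, Sequence, Tuple

Detection = Tuple[int, int]  # (row, col) pixel position; hypothetical format


def filter_by_height(
    detections: Sequence[Detection],
    height_map: Sequence[Sequence[float]],
    max_height_m: float = 2.0,  # assumed threshold, not from the article
) -> List[Detection]:
    """Keep detections whose height above ground is at most max_height_m.

    height_map[row][col] holds the height above ground in meters for one
    video frame. Detections outside the map are dropped because their
    height cannot be verified.
    """
    n_rows = len(height_map)
    n_cols = len(height_map[0]) if n_rows else 0
    kept = []
    for row, col in detections:
        inside = 0 <= row < n_rows and 0 <= col < n_cols
        if inside and height_map[row][col] <= max_height_m:
            kept.append((row, col))
    return kept


# Toy altitude map: two cells lie on elevated structures (12 m and 9 m).
toy_height_map = [
    [0.0, 0.5, 12.0],
    [0.1, 9.0, 0.3],
]
toy_detections = [(0, 0), (0, 2), (1, 1), (1, 2)]
print(filter_by_height(toy_detections, toy_height_map))
```

In this toy example only the two detections at ground level survive; the two on elevated cells are treated as false detections.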
The key component of our approach is multi-camera tracking. We use template matching to associate detections with tracks, which is advantageous because it counters the inaccuracy of the GPS data embedded in the video data, and because it can find matches when the vehicle detector does not. This results in long tracks even if the imagery has a low frame rate. Figure 4 shows an example of a vehicle tracked for more than two minutes.
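A minimal sketch of template matching is given below. The article does not name the similarity measure, so this example assumes normalized cross-correlation, a common choice, and searches a small window around a predicted position. The function names, the search radius, and the toy image are illustrative assumptions, not the authors' implementation.

```python
import math


def ncc(patch_a, patch_b):
    """Normalized cross-correlation of two equally sized patches (lists of rows)."""
    flat_a = [v for row in patch_a for v in row]
    flat_b = [v for row in patch_b for v in row]
    mean_a = sum(flat_a) / len(flat_a)
    mean_b = sum(flat_b) / len(flat_b)
    dev_a = [v - mean_a for v in flat_a]
    dev_b = [v - mean_b for v in flat_b]
    denom = math.sqrt(sum(d * d for d in dev_a) * sum(d * d for d in dev_b))
    if denom == 0.0:
        return 0.0  # a flat patch carries no texture to match on
    return sum(a * b for a, b in zip(dev_a, dev_b)) / denom


def best_match(image, template, predicted, radius):
    """Return ((top, left), score) of the best template position near `predicted`.

    `predicted` is the expected (top, left) corner of the vehicle patch and
    `radius` is the search radius in pixels (both hypothetical parameters).
    """
    t_h, t_w = len(template), len(template[0])
    max_top = len(image) - t_h
    max_left = len(image[0]) - t_w
    p_row, p_col = predicted
    best_pos, best_score = None, -2.0  # NCC lies in [-1, 1]
    for top in range(max(0, p_row - radius), min(max_top, p_row + radius) + 1):
        for left in range(max(0, p_col - radius), min(max_left, p_col + radius) + 1):
            window = [row[left:left + t_w] for row in image[top:top + t_h]]
            score = ncc(window, template)
            if score > best_score:
                best_pos, best_score = (top, left), score
    return best_pos, best_score


# Toy frame: the vehicle appearance [[1, 2], [3, 4]] sits at row 3, column 2.
toy_image = [[0] * 6 for _ in range(6)]
toy_image[3][2], toy_image[3][3] = 1, 2
toy_image[4][2], toy_image[4][3] = 3, 4
toy_template = [[1, 2], [3, 4]]

position, score = best_match(toy_image, toy_template, predicted=(2, 2), radius=2)
print(position, round(score, 3))
```

Because the search is driven by the track's stored appearance rather than by detector output, a match can be found even in frames where the detector misses the vehicle, and a generous search radius can absorb errors in the predicted position.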
During the six minutes of video, the system identified 41,000 tracks, from which we could extract different kinds of information. One example is a speed map. Figure 5 shows the speed of each track in kilometers per hour plotted on the ground plane. This plot clearly shows where the main roads are located and reveals the speed limit on the main road near the city center (80 km/h).
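A speed map of this kind can be sketched as follows, assuming each track is a list of ground-plane positions in meters sampled at the sensor's 2 Hz frame rate. The grid cell size, the function names, and the choice to average speeds per cell are assumptions made for illustration.

```python
import math
from collections import defaultdict

FRAME_RATE_HZ = 2.0  # sensor frame rate stated in the article


def segment_speeds_kmh(track, frame_rate_hz=FRAME_RATE_HZ):
    """Speed in km/h between each pair of consecutive track positions.

    `track` is a list of (x_m, y_m) ground-plane positions, one per frame.
    """
    speeds = []
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        distance_m = math.hypot(x1 - x0, y1 - y0)
        speeds.append(distance_m * frame_rate_hz * 3.6)
    return speeds


def speed_map(tracks, cell_size_m=10.0):
    """Mean speed in km/h per ground-plane grid cell, keyed by (cell_x, cell_y)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for track in tracks:
        segments = zip(track, track[1:], segment_speeds_kmh(track))
        for (x0, y0), (x1, y1), speed in segments:
            # Assign each segment's speed to the cell containing its midpoint.
            mid_x, mid_y = (x0 + x1) / 2.0, (y0 + y1) / 2.0
            cell = (int(mid_x // cell_size_m), int(mid_y // cell_size_m))
            sums[cell] += speed
            counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


# Toy track: 10 m per frame at 2 Hz corresponds to 20 m/s, or 72 km/h.
toy_tracks = [[(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]]
toy_map = speed_map(toy_tracks)
for cell, speed in sorted(toy_map.items()):
    print(cell, round(speed, 1))
```

Cells crossed by many fast tracks stand out in such a map, which is how main roads and their typical speeds become visible.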
Our method is capable of tracking vehicles in challenging multi-camera WAMI data, which can then be used for event and anomaly detection. Automatic processing of full-motion video can be useful in both real-time and offline scenarios. In a real-time scenario, the data connection may have insufficient bandwidth to transfer all video data to the ground station. On-board tracking and analysis of moving objects can be used to determine the relevant data to send to the ground station. In an offline scenario, the automatic processing can support the operator or image analyst, who can use the information to effectively interpret large amounts of data or to perform data searches faster or more effectively. For real-time operations it is vital to get an automatic alert if any vehicle approaches a certain area of interest from a suspicious direction and with an above-average speed.
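The alert described above could be expressed as a simple rule on the latest track segment. This is only a sketch: the circular area of interest, the heading convention (0 degrees along the positive y axis, increasing clockwise), the tolerance, and all parameter names are assumptions, since the article does not specify how the rule is implemented.

```python
import math


def heading_deg(p0, p1):
    """Heading of the move p0 -> p1 in degrees (0 = +y axis, clockwise)."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0


def angle_diff_deg(a, b):
    """Smallest absolute difference between two angles in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def should_alert(p0, p1, speed_kmh, mean_speed_kmh, aoi_center,
                 alert_radius_m, suspicious_heading_deg, tolerance_deg=45.0):
    """True if the vehicle nears the area of interest fast and on a suspicious heading.

    p0 and p1 are the previous and current ground-plane positions in meters.
    """
    d0 = math.hypot(p0[0] - aoi_center[0], p0[1] - aoi_center[1])
    d1 = math.hypot(p1[0] - aoi_center[0], p1[1] - aoi_center[1])
    approaching = d1 < d0 and d1 <= alert_radius_m
    suspicious = angle_diff_deg(heading_deg(p0, p1), suspicious_heading_deg) <= tolerance_deg
    fast = speed_kmh > mean_speed_kmh
    return approaching and suspicious and fast


# Toy case: a vehicle south of the area drives north at 108 km/h (15 m per frame at 2 Hz).
print(should_alert((0.0, -150.0), (0.0, -135.0), 108.0, 50.0,
                   aoi_center=(0.0, 0.0), alert_radius_m=200.0,
                   suspicious_heading_deg=0.0))
```

In practice the mean speed could be taken from the speed map for the road in question rather than from a single global value, but that choice is left open here.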
In summary, automatic information extraction is a necessity when handling data flows from high-resolution WAMI sensors. We have developed a detection and tracking system to meet this need. Without our techniques, intelligence analysts would not be able to effectively use these sensors. Together with the Dutch Ministry of Defense, we will now further investigate how intelligence analysts can make optimal use of our technology, including how complex data can be fused, queried, and visualized.
We would like to thank Harris Corporation for making the CorvusEye™ 1500 imagery available. This research was carried out within the Unmanned Systems program (V1340), sponsored by the NL MOD, in which TNO investigates the use of different unmanned systems.
Jasper van Huis (MSc) is an expert in the design and implementation of algorithms that perform automatic information extraction in full-motion video.