Stationary video surveillance systems are an important tool for identifying possible security threats. However, they are of limited use for mobile platforms, such as trucks, trains, and ships. For example, stationary units are not available at all locations all of the time, which presents challenges for monitoring a moving vehicle. We have resolved this issue, even removing the need for stationary surveillance altogether, with ARENA, a flexible system developed under the EU FP7 program for automatic detection and recognition of human and vehicular threats to mobile assets. Based on multisensory data analysis, ARENA deploys detection sensors on the platform itself (a truck, for example). As a result, we can continuously follow, monitor, and protect the vehicle (see Figures 1 and 2).
Figure 1. Truck protected by the ARENA mobile surveillance system.
Figure 2. Vessel protected by the ARENA system.
Our system incorporates low-level sensor data processing, high-level decision support, and interfaces for human-machine interaction. We have developed algorithms on specific datasets (for example, one set could represent a certain parking lot: see Figures 3 and 4) and use them to detect and recognize threats in another dataset (for example, one representing a separate parking lot, with different light and weather conditions). We have also developed a generic system architecture so that we can easily change the type and number of sensors in the network, making ARENA flexible enough for use on various types of mobile target.
Figure 3. Results from object detection and tracking. The car and the people (framed in green) are detected and tracked, while the truck and the car above the people are stationary and therefore not detected or tracked. As soon as an object starts to move, the detection and tracking algorithms receive input data from the changes in the image.
Figure 4. A view of a parking lot. User-defined areas are indicated with arrows. Learned zones are in red. T: Truck. SZ: Smoking area. SA: Service area. PW: Pavement.
To follow pedestrians within a specified scene, we use object detection and tracking based on the Fastest Pedestrian Detector in the West (FPDW).1 This is a highly accurate algorithm with a short calculation time that uses integral channel features to detect pedestrians from a single image, without the need for any prior information. FPDW supplies the detections, which we then track using standard methods, such as linear Kalman filters (algorithms that estimate an object's state from a series of measurements observed over time), constant velocity motion models, and multi-hypothesis trackers for data association.
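As an illustration of the tracking step, the following is a minimal sketch of a linear Kalman filter with a constant velocity motion model along a single axis. It is not the ARENA implementation: the class name, the helper `track_positions`, and the noise values (`accel_var`, `meas_var`) are assumptions chosen for the example, and data association between multiple detections is left out.

```python
from dataclasses import dataclass


@dataclass
class ConstantVelocityKalman1D:
    """Linear Kalman filter with state [position, velocity] on one axis."""
    pos: float = 0.0
    vel: float = 0.0
    # Entries of the symmetric 2x2 state covariance P (large = uncertain start).
    p_pp: float = 100.0
    p_pv: float = 0.0
    p_vv: float = 100.0
    accel_var: float = 1.0   # assumed process noise (acceleration variance)
    meas_var: float = 0.25   # assumed measurement noise variance

    def predict(self, dt: float) -> None:
        # State prediction x <- F x with F = [[1, dt], [0, 1]].
        self.pos += dt * self.vel
        # Covariance prediction P <- F P F^T + Q (white-noise acceleration Q).
        q = self.accel_var
        p_pp = self.p_pp + 2 * dt * self.p_pv + dt * dt * self.p_vv
        p_pv = self.p_pv + dt * self.p_vv
        self.p_pp = p_pp + q * dt ** 4 / 4
        self.p_pv = p_pv + q * dt ** 3 / 2
        self.p_vv = self.p_vv + q * dt ** 2

    def update(self, z: float) -> None:
        # Only the position is measured, so H = [1, 0].
        innovation = z - self.pos
        s = self.p_pp + self.meas_var        # innovation variance
        k_p = self.p_pp / s                  # Kalman gain for position
        k_v = self.p_pv / s                  # Kalman gain for velocity
        self.pos += k_p * innovation
        self.vel += k_v * innovation
        # Covariance update P <- (I - K H) P.
        p_pp, p_pv, p_vv = self.p_pp, self.p_pv, self.p_vv
        self.p_pp = (1 - k_p) * p_pp
        self.p_pv = (1 - k_p) * p_pv
        self.p_vv = p_vv - k_v * p_pv


def track_positions(measurements, dt=1.0):
    """Run predict/update over a sequence of position measurements."""
    kf = ConstantVelocityKalman1D()
    for z in measurements:
        kf.predict(dt)
        kf.update(z)
    return kf
```

For a pedestrian in the image plane or on the ground plane, one such filter per axis (or a single four-state filter) could be fed with the detector's positions frame by frame; the estimated velocity is then available for predicting where to look for the same person in the next frame.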
We have developed three different algorithms for identifying events: action recognition, group detection, and zone-based activity recognition. The first consists of a separate detector for each action, such as walk, run, turn, check, fight, enter, and loiter. Each detector quantifies space-time interest point features from the image sequence using a soft-assignment random forest. This method for human action detection and classification achieves generalization by replacing the binary decisions inside the trees with a sigmoid function. We capture locations of the motion features in 3D using a Gaussian layout model, and classify the bag-of-words histograms (vectors of occurrence counts for image features) using an action-specific support vector machine, a supervised classification method that we use here to recognize motion patterns in images.2
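The idea of replacing a binary decision with a sigmoid can be sketched as follows. This is an illustrative toy, not the detector described in reference 2: the two-level tree layout, the `steepness` parameter, and all function names are assumptions made for the example. With a hard split, a feature vector lands in exactly one leaf; with a soft split, it contributes a weight to every leaf, and the weights sum to one.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def hard_split(value: float, threshold: float) -> float:
    """Binary decision: weight 1.0 for the right branch, else 0.0."""
    return 1.0 if value > threshold else 0.0


def soft_split(value: float, threshold: float, steepness: float = 10.0) -> float:
    """Soft decision: weight of the right branch varies smoothly in (0, 1).

    `steepness` is a hypothetical parameter controlling how sharp the
    transition around the threshold is.
    """
    return sigmoid(steepness * (value - threshold))


def leaf_weights(features, tree, split):
    """Weights of the four leaves of a depth-2 tree.

    `tree` maps 'root', 'left', and 'right' to (feature_index, threshold).
    Each leaf weight is the product of the branch weights along its path.
    """
    root_f, root_t = tree["root"]
    left_f, left_t = tree["left"]
    right_f, right_t = tree["right"]
    go_right = split(features[root_f], root_t)
    left_child_right = split(features[left_f], left_t)
    right_child_right = split(features[right_f], right_t)
    return [
        (1 - go_right) * (1 - left_child_right),
        (1 - go_right) * left_child_right,
        go_right * (1 - right_child_right),
        go_right * right_child_right,
    ]
```

A feature value that lies close to a threshold is thus shared between neighboring leaves instead of being forced into one of them, which makes the resulting histogram less sensitive to small changes in the input.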
For the second algorithm, group detection, we analyze how closely pedestrians are positioned within a scene (see Figure 5). From a security perspective, gatherings of people are of interest since they can signal the early stages of unrest, such as fights or attacks. We based this algorithm on K-means clustering, with modeling based on the silhouette value, a measure of how well an object fits within its assigned data cluster.3
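A minimal sketch of how K-means and the silhouette value can work together is given below. It assumes that the number of groups is chosen as the one with the highest mean silhouette, which is one common way to combine the two and not necessarily the procedure of reference 3. The function names, the farthest-point initialization, and `max_k` are likewise assumptions for the example; positions are 2D ground-plane coordinates in meters.

```python
import math


def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


def kmeans(points, k, iterations=50):
    # Deterministic farthest-point initialization (assumed, for reproducibility).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def mean_silhouette(points, labels):
    clusters = set(labels)
    if len(clusters) < 2:
        return 0.0
    scores = []
    for i, p in enumerate(points):
        # Mean distance from p to the other members of each cluster.
        mean_dist = {}
        for c in clusters:
            d = [dist(p, q) for j, q in enumerate(points)
                 if labels[j] == c and j != i]
            if d:
                mean_dist[c] = sum(d) / len(d)
        a = mean_dist.pop(labels[i], None)   # cohesion within own cluster
        if a is None:                        # singleton cluster: score 0
            scores.append(0.0)
            continue
        b = min(mean_dist.values())          # separation to nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def detect_groups(points, max_k=4):
    """Return (score, labels, centers) for the k with the best mean silhouette.

    Returns None when there are too few points to compare clusterings.
    """
    best = None
    for k in range(2, min(max_k, len(points) - 1) + 1):
        labels, centers = kmeans(points, k)
        score = mean_silhouette(points, labels)
        if best is None or score > best[0]:
            best = (score, labels, centers)
    return best
```

A silhouette value close to 1 means that a track lies much nearer to the members of its own group than to any other group, so a high mean value suggests that the chosen partition matches well-separated gatherings.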
Figure 5. Group detection. Black dots are tracks and red circles are groups. m: Meters.
For the third event recognition algorithm, we identify activity zones: areas where people interact or alter their behavior. We can automatically establish pedestrians' patterns of transition between zones, and use these to define situational events. A typical example might be: ‘From just south of zone Truck to just north of zone Service Area.’ We presented an evaluation of the three event-recognition algorithms recently.4
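The transition patterns can be illustrated with a small sketch that maps a pedestrian track to a sequence of zone-to-zone events. In ARENA the zones are learned or user-defined; here, purely as an assumption for the example, they are hand-specified axis-aligned rectangles, and the function names and coordinates are hypothetical.

```python
def zone_of(point, zones):
    """Return the name of the first zone containing the point, or None."""
    x, y = point
    for name, (x_min, y_min, x_max, y_max) in zones.items():
        if x_min <= x <= x_max and y_min <= y <= y_max:
            return name
    return None


def zone_transitions(track, zones):
    """List (from_zone, to_zone) events along a track of (x, y) positions.

    Positions outside every zone are skipped, so a transition is reported
    when the track next enters a zone different from the last one visited.
    """
    events = []
    previous = None
    for point in track:
        current = zone_of(point, zones)
        if current is None:
            continue
        if previous is not None and current != previous:
            events.append((previous, current))
        previous = current
    return events


# Hypothetical zone layout in meters: (x_min, y_min, x_max, y_max).
EXAMPLE_ZONES = {
    "Truck": (0.0, 0.0, 4.0, 10.0),
    "Service Area": (10.0, 0.0, 14.0, 4.0),
}
```

A track that starts beside the truck and ends in the service area yields a single `("Truck", "Service Area")` event, which is the kind of situational event that a later stage could combine with others.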
The output data from event recognition is transferred to the threat recognition algorithm. From this, we compile simple events to form longer segments, which we analyze to determine if threats are present. We have described the algorithm in detail elsewhere.5
ARENA's backbone is the integration platform, which provides all algorithms with sensor data and makes results easily accessible for operators.
We demonstrated our system in live and pre-recorded examples in Paris, France, in April 2014. In one day, we recorded scenarios on the truck parking lot shown in Figure 3, and then successfully processed the sensor data in the integrated platform, including detection, tracking, fusion, and recognition of events and threats. We also simulated a maritime scenario, where we detected and tracked vessels and recognized threats based on inconsistencies of radar, information from automatic identification systems, and unusual patterns of movement.
In summary, ARENA is a flexible surveillance system that enables automatic detection and recognition of the actions of individuals within an image, and identifies potential threats. In future work, we will focus on specific applications to develop the system towards a prototype. Moreover, all the specific algorithms need further development, particularly to reduce the risk of false alarms.
Asa Waern, Maria Andersson, Henrik Petersson
Swedish Defence Research Agency (FOI)
Maria Andersson is a senior scientist at FOI and a guest researcher at Linköping University. She received an MSc in mechanical engineering in 1989 and PhD in energy systems in 1997, both from Linköping University. Her research interests include situation assessment, event recognition, and signal processing.
Henrik Petersson received an MSc in electrical engineering and applied physics in 2002 and PhD in applied physics in 2008, both from Linköping University. Since then he has been a scientist, with research interests that include signal processing, machine learning, and sensor fusion. He has experience in both theoretical analysis and practical implementation aspects.
1. The fastest pedestrian detector in the West, Proc. Brit. Mach. Vis. Conf., p. 68, 2010. doi:10.5244/C.24.68
2. G. J. Burghouts, K. Schutte, H. Bouma, R. J. M. den Hollander, Selection of negative samples and two-stage combination of multiple features for action detection in thousands of videos, Mach. Vis. Appl. 25(1), p. 85-98, 2014.
3. M. Andersson, F. Gustafsson, L. St-Laurent, D. Prévost, Recognition of anomalous motion patterns in urban surveillance, IEEE J. Sel. Top. Sign. Proc. 7(1), p. 102-110, 2013.
4. M. Andersson, L. Patino, G. J. Burghouts, A. Flizikowski, M. Evans, D. Gustafsson, H. Petersson, K. Schutte, J. Ferryman, Activity recognition and localization on a truck parking lot, Proc. IEEE Conf. Adv. Vid. Sign. Surv., p. 263-269, 2013.
5. G. J. Burghouts, K. Schutte, R. J-M. ten Hove, S. P. van den Broek, J. Baan, O. Rajadell, J. van Huis, J. van Rest, Instantaneous threat detection based on a semantic representation of activities, zones and trajectories, Sign. Im. Vid. Proc., 2013. (In review).