SPIE Press Book
Automatic Target Recognition
This Tutorial Text provides an inside view of the automatic target recognition (ATR) field from the perspective of an engineer working in the field for 40 years. The algorithm descriptions and testing procedures covered in the book are appropriate for addressing military problems. The book also addresses unique aspects and considerations in the design, testing, and fielding of ATR systems. These considerations need to be understood by ATR engineers working in the defense industry as well as by their government customers. The final chapter discusses the future of ATR and provides a type of Turing test for determining if an ATR system is truly smart (neuromorphic or brain-like). The Appendix provides difficult-to-find resources available to the ATR engineer.
Table of Contents
- 1 Definitions and Performance Measures
- 1.1 What is Automatic Target Recognition (ATR)?
- 1.2 Basic Definitions
- 1.3 Detection Criteria
- 1.4 Performance Measures for Target Detection
- 1.4.1 Truth-normalized measures
- 1.4.2 Report-normalized measure
- 1.4.3 Receiver operator characteristic curve
- 1.4.4 Pd versus FAR curve
- 1.4.5 Pd versus list length
- 1.4.6 Other factors that may enter the detection equation
- 1.4.7 Missile terminology
- 1.4.8 Clutter level
- 1.5 What is Automatic Target Recognition (ATR)?
- 1.5.1 Object taxonomy
- 1.5.2 Confusion matrix
- 1.5.3 Some commonly used terms from probability and statistics
- 1.6 Experimental Design
- 1.6.1 Test plan
- 1.6.2 ATR and human subject testing
- 1.7 Characterizations of ATR Hardware/Software
- 2 Target Detection Strategies
- 2.1 Introduction
- 2.1.1 What is target detection?
- 2.1.2 Detection schemes
- 2.1.3 Scale
- 2.1.4 Polarity, shadows, and image form
- 2.1.5 Methodology for algorithm evaluation
- 2.2 Simple Detection Algorithms
- 2.2.1 Triple-window filter
- 2.2.2 Hypothesis testing as applied to an image
- 2.2.3 Comparison of two empirically determined means: variations on the T-test
- 2.2.4 Tests involving variance, variation, and dispersion
- 2.2.5 Tests for significance of hot spot
- 2.2.6 Nonparametric tests
- 2.2.7 Test involving textures and fractals
- 2.2.8 Tests involving blob edge strength
- 2.2.9 Hybrid tests
- 2.2.10 Triple-window filters using several inner-window geometries
- 2.3 More-Complex Detectors
- 2.3.1 Neural network detectors
- 2.3.2 Discriminant functions
- 2.3.3 Deformable templates
- 2.4 Grand Paradigms
- 2.4.1 Geometrical and cultural intelligence
- 2.4.2 Neuromorphic paradigm
- 2.4.3 Learning on-the-fly
- 2.4.4 Integrated sensing and processing
- 2.4.5 Bayesian surprise
- 2.4.6 Modeling and simulation
- 2.4.7 SIFT and SURF
- 2.4.8 Detector designed to operational scenario
- 2.5 Traditional SAR and Hyperspectral Target Detectors
- 2.5.1 Target detection in SAR imagery
- 2.5.2 Target detection in hyperspectral imagery
- 2.6 Conclusions and Future Direction
- 3 Target Classifier Strategies
- 3.1 Introduction
- 3.1.1 Parables and paradoxes
- 3.2 Main Issues to Consider in Target Classification
- 3.2.1 Issue 1: Concept of operations
- 3.2.2 Issue 2: Inputs and outputs
- 3.2.3 Issue 3: Target classes
- 3.2.4 Issue 4: Target variations
- 3.2.5 Issue 5: Platform issues
- 3.2.6 Issue 6: Under what conditions does a sensor supply useful data?
- 3.2.7 Issue 7: Sensor issues
- 3.2.8 Issue 8: Processor
- 3.2.9 Issue 9: Conveying classification results to the human-in-the-loop
- 3.2.10 Issue 10: Feasibility
- 3.3 Feature Extraction
- 3.4 Feature Selection
- 3.5 Examples of Feature Types
- 3.5.1 Histogram of oriented gradients
- 3.5.2 Histogram of optical flow feature vector
- 3.6 Examples of Classifiers
- 3.6.1 Simple classifiers
- 3.6.2 Basic classifiers
- 3.6.3 Contest-winning and newly popular classifiers
- 3.7 Discussion
- 4 Unification of Automatic Target Tracking and Automatic Target Recognition
- 4.1 Introduction
- 4.2 Categories of Tracking Problems
- 4.2.1 Number of targets
- 4.2.2 Size of targets
- 4.2.3 Sensor type
- 4.2.4 Target type
- 4.3 Tracking Problems
- 4.3.1 Point target tracking
- 4.3.2 Video tracking
- 4.4 Extensions of Target Tracking
- 4.4.1 Activity recognition (AR)
- 4.4.2 Patterns-of-life and forensics
- 4.5 Collaborative ATT and ATR (ATT↔ATR)
- 4.5.1 ATT data useful to ATR
- 4.5.2 ATR data useful to ATT
- 4.6 Unification of ATT and ATR (ATT∪ATR)
- 4.6.1 Visual pursuit
- 4.6.2 A bat's echolocation of flying insects
- 4.6.3 Fused ATT∪ATR
- 4.7 Discussion
- 5 How Smart Is Your Automatic Target Recognizer?
- 5.1 Introduction
- 5.2 Test for Determining the Intelligence of an ATR
- 5.2.1 Does the ATR understand human culture?
- 5.2.2 Can the ATR deduce the gist of a scene?
- 5.2.3 Does the ATR understand physics?
- 5.2.4 Can the ATR participate in a pre-mission briefing?
- 5.2.5 Does the ATR possess deep conceptual understanding?
- 5.2.6 Can the ATR adapt to the situation, learn on-the-fly, and make analogies?
- 5.2.7 Does the ATR understand the rules of engagement?
- 5.2.8 Does the ATR understand the order of battle and force structure?
- 5.2.9 Can the ATR control platform motion?
- 5.2.10 Can the ATR fuse information from a wide variety of sources?
- 5.2.11 Does the ATR possess metacognition?
- 5.3 Sentient versus Sapient ATR
- 5.4 Discussion: Where Is ATR Headed?
- Appendix 1: Resources
- Appendix 2: Acronyms
An automatic target recognizer (ATR) is a real-time or near-real-time image/signal-understanding system. An ATR is presented with a stream of data. It outputs a list of the targets that it has detected and recognized in the data provided to it. A complete ATR system can also perform other functions such as image stabilization, preprocessing, mosaicking, target tracking, activity recognition, multi-sensor fusion, sensor/platform control, and data packaging for transmission or display.
In the early days of ATR, there were fierce debates between proponents of signal processing and those in the emerging field of computer vision. Signal processing fans were focused on more advanced correlation filters, stochastic analysis, estimation and optimization, transform theory, and time-frequency analysis of nonstationary signals. Advocates of computer vision said that signal processing provides some nice tools for our toolbox, but what we really want is an ATR that works as well as biological vision. ATR designers were less interested in processing signals than understanding scenes. They proposed attacking the ATR problem through artificial intelligence (AI), computational neuroscience, evolutionary algorithms, case-based reasoning, expert systems, and the like. Signal processing experts are interested in tracking point-like targets. ATR engineers want to track a target with some substance to it, identify what it is, and determine what activity it is engaged in. Signal processing experts keep coming up with better ways to compress video. ATR engineers want more intelligent compression. They want the ATR to tell the compression algorithm which parts of the scene are more important and hence deserving of more bits in the allocation. ATR, in and of itself, can be thought of as a data reduction technique. The ATR takes in a lot of data and outputs relatively little data. Data reduction is necessary due to bandwidth limitations of the data link and workload limits of the time-strapped human operator. People are very good at analyzing video until fatigue sets in or they get distracted. They don't want to be like the triage doctor at the emergency ward, assessing everything that comes in the door, continually assigning priorities to items deserving further attention. Pilots and ground station operators want a machine to relieve their burden as long as it rarely makes a mistake. Trying to do this keeps ATR engineers employed.
As often told to the author, pilots and image analysts are not looking for machines to replace them entirely. But, such decisions will be made higher up in the chain of command as ATR technology progresses.
The human vision system is not "designed" to analyze certain kinds of data such as rapid step-stare imagery, complex-valued signals that arise in radars, hyperspectral imagery, 3D LADAR data, or fusion of signal data with various forms of precise metadata. ATR shines when the sustained data rate is too high or too prolonged for the human brain, or the data is not well suited for presentation to humans. Nevertheless, most current ATRs operate with humans in the loop. Humans, at present, are much better than ATRs at tasks requiring consultation, comprehension, and judgement. Humans still make the final decision and determine the action to be taken. This means that ATR output, which is statistical and multi-faceted by nature, has to be presented to the human decision makers in an easily understood form. This is a difficult man/machine interface problem. Marching toward the future, more autonomous robotic systems will necessarily rely more on ATRs to substitute for human operators, possibly serving as the "brains" of entire robotic platforms. We leave this provocative topic to the end of the book.
Systems engineers took notice once ATRs became deployable. System engineers are grounded in harsh reality. They care little about the debate between signal processing and computer vision. They don't want to hear about an ATR being brain-like. They are not interested in which classification paradigm performs 1% better than the next. They care about the concept of operations (ConOps) and how it directs performance and functionality. They care about mission objectives and mission requirements. They want to identify all possible stakeholders, form an integrated product team, determine key performance parameters (KPPs), and develop test and evaluation (T&E) procedures to determine if performance requirements are met. Self-test is the norm for published papers and conference talks. Independent test and evaluation, laboratory blind tests, field tests, and software regression tests are the norm for determining if a system is deployable. The system engineer's focus is broader than ATR performance. System engineers want the entire system, or system of systems, to work well, including platform, sensors, ATR, and data links. They want to know what data can be provided to the ATR and what data the ATR can provide to the rest of the system. They want to know how one part of the system affects all other parts of the system. System designers care a lot about size, weight, power, latency, current and future costs, logistics, timelines, mean time between failure, and product repair and upgrade. They want to know the implications of system capture by the enemy.
At one time, ATR was the sole charge of the large defense electronics companies, working closely with the government labs. Only the defense companies and government have fleets of data collection aircraft, high-end sensors, and access to foreign military targets. Although air-to-ground has been the focus of much ATR work, ATR actually covers a wide range of sensors, operating within or between the layers of space, air, ocean/land surface, and undersea/underground. Although the name ATR implies recognition of targets, ATR engineers have broader interests. ATR groups tackle any type of military problem involving the smart processing of imagery or signals. The government (or government-funded prime contractor) is virtually the only customer. So, some of the ATR engineer's time is spent reporting to the government, participating in joint data collections, taking part in government-sponsored tests, and proposing new programs to the government.
Since the 1960s, the field of ATR has advanced in parallel with similar work in the commercial sector and academia, involving industrial automation, medical imaging, surveillance and security, video analytics, and space-based imaging. Technologies of interest to both the commercial and defense sector include low-power processors, novel sensors, increased system autonomy, people detection, robotics, rapid search of vast amounts of data (Big Data), undersea inspection, and remote medical diagnosis. The bulk of funding in some of these areas has recently shifted from the defense to the commercial sector. More money is spent on computer animation for Hollywood movies than for the synthesis of forward-looking infrared (FLIR) and synthetic aperture radar (SAR) imagery. The search engine companies are investing much more in neural networks compared to the defense companies. Well-funded brain research programs are investigating the very basis of human vision and cognitive processing. The days of specialized military processors (e.g., VHSIC) are largely over. Reliance is now on chips in high-volume production: multi-core processors (e.g., Intel and ARM), FPGAs (e.g., Xilinx and Intel/Altera) and GPUs (e.g., Nvidia and AMD). Highly packaged sensors (visible, FLIR, LADAR, and radar) combined with massively parallel processors are advancing rapidly for the automotive industry to meet new safety standards (e.g., MobilEye). Millions of systems will soon be produced per year. Current advanced driver assistance systems (ADAS) can detect pedestrians, animals, bicyclists, road signs, traffic lights, cars, trucks, and road markers. These are a lot like ATR tasks. The rapid advancement of ADAS might one day lead to driverless cars.
Some important differences between ATRs and commercial systems are worth noting. ATRs generally have to detect and recognize objects at much longer ranges than commercial systems. Enemy detection and recognition are noncooperative processes. Although a future car might have a LADAR, radar, or FLIR sensor, it won't have one that can produce high-quality data from a 20,000-ft range. An ADAS will detect a pedestrian but won't report if he is carrying a rifle. Search engine companies need to search large volumes of data with image-based search, but they don't have the metadata to help the search, such as is available on military platforms. That being said, the cost and innovation rate of commercial electronics can't be matched by military systems. The distinction between commercial and military systems is starting to blur in some instances. Cell phones now include cameras, inertial measurement units, GPS, computers, algorithms, and transmitters/receivers. Slightly rugged versions of commercial cell phones and tablet computers are starting to be used by the military, even with ATR apps. "Toy" drones are approaching the sophistication of the smallest military unmanned air vehicles. ATR engineers are in tune with advances in the commercial sector and their applicability to ATR. Even their hobbies tend to focus on technology, e.g., quadcopters, novel cameras, 3D printers, computers, phone apps, and robots.
ATR is not limited to a device; it is also a field of research and development. ATR technology can be incorporated into systems in the form of self-contained hardware, FPGA code, or higher-level language code. ATR groups can help add autonomy to many types of systems. ATR can be viewed very narrowly or very broadly, borrowing concepts from a wide variety of fields. Papers on ATR are often of the form: "Automatic Target Recognition using XXX," where the XXX can be any technology such as super-resolution, principal component analysis, sparse coding, singular value decomposition, Eigen templates, correlation filters, kinematic priors, adaptive boosting, hyperdimensional manifolds, Hough transforms, foveation, etc. In the more ambitious papers, the XXX is a mélange of technologies, such as fuzzy-rule-based expert system, wavelet neural genetic network, fuzzy morphological associative memory, optical holography, deformable wavelet templates, hierarchical support vector machine, Bayesian recognition by parts, etc. Get the picture? Nearly any type of technology, everything but the kitchen sink, can be thrown at the ATR problem, with scant large-scale independent competitive test results to indicate which approach really works best, supposing that "best" can be defined and measured. This book is not a comprehensive survey of every technology that has ever been applied to ATR. This book covers some of the basics of ATR. While some of the topics in this book can be found in textbooks on pattern recognition and computer vision, this book focuses on their application to military problems as well as the unique requirements of military systems.
The topics covered in the book are organized in the way one would design an ATR. The first step is to understand the military problem and make a list of potential solutions to the problem. A key issue is the availability of sufficiently comprehensive sets of data to train and test the potential solutions. This involves developing a sound test plan, specifying procedures and equations, and determining who is going to do the testing. Testing isn't open ended. Exit criteria are needed to determine when a given test activity has been successfully completed. The next steps in ATR design are choosing the detector and classifier. The detector focuses attention on the regions of interest in the imagery requiring additional scrutiny. The classifier further processes these regions of interest and is the decision engine for class assignment. It can operate at any or all levels of a decision tree, from clutter rejection to identifying a specific vehicle or activity. Detected targets are often tracked. Target tracking has historically been treated as a separate subject from ATR, mainly because point-like targets contain too little information to apply an ATR. But, as sensor resolution becomes better, the engineering disciplines of target tracking and ATR are starting to merge. The ATR and tracker can be united for efficiency and performance. The last chapter points out how primitive current ATRs really are, as compared to biological systems. It suggests ways for measuring the intelligence of an ATR. This goes far beyond the basic performance measurement techniques covered in Chapter 1. The first appendix lists the many resources available to the ATR engineer. Many of the listed agencies supply training and testing data, perform blind tests, and sponsor research into compelling new sensor and ATR designs. The second appendix explains the acronyms used in the book.
Chapter 1: ATR technology has benefited from a significant investment over the last 50 years. However, the once-accepted definitions and evaluation criteria have been displaced by the march of technology. The first chapter updates the language for describing ATR systems and provides well-defined criteria for evaluating such systems. This will advance collaboration between ATR developers, evaluators, and end-users.
ATR is used as an umbrella term for a broad range of military technology beyond just the recognition of targets. In a more general sense, ATR means sensor data exploitation. Two types of definitions are included in the first chapter. One type defines fundamental concepts. The other type defines basic performance measures. In some cases, definitions consist of a list of alternatives. This approach enables choices to be made to meet the needs of particular programs. The important point to keep in mind is that within the context of a particular experimental design, a set of protocols should be adopted to best fit the situation, applied, and then kept constant throughout the evaluation. This is especially important for competitive testing.
The definitions given in Chapter 1 are intended for evaluation of end-to-end ATR systems as well as the prescreening and classifier stages of the systems. Sensor performance and platform characteristics are excluded from the evaluation. It is recognized that sensor characteristics and other operational factors affect the imagery and associated metadata. A thorough understanding of data quality, integrity, synchrony, availability, and timeline is important for ATR development, test, and evaluation. Data quality should be quantified and assessed. However, methods for doing so are not covered in this book. The results and validity of ATR evaluation depend on the representativeness and comprehensiveness of the development and test data. The adequacy of development and test data is primarily a budgetary issue. The ATR engineer should understand and be able to convey the implications of limited, surrogate, or synthetic data, but might not have much control over the state of affairs.
Chapter 1 formalizes definitions and performance measures associated with ATR evaluation. All performance measures must be accepted as ballpark predictions of actual performance in combat. More carefully formulated experiments will provide more meaningful conclusions. The final measure of effectiveness takes place in the battlefield.
Chapter 2: Hundreds of simple target detection algorithms were tested on mid- and longwave FLIR images, as well as X-band and Ku-band SAR images. Each algorithm is briefly described. Indications are given as to which performed well. Some of these simple algorithms are loosely derived from standard tests of the difference of two populations. For target detection, these are typically populations of pixel grayscale values or features derived from them. The statistical tests are often implemented in the form of sliding triple-window filters. Several more-elaborate algorithms are also described with their relative performances noted. These algorithms utilize neural networks, deformable templates, and adaptive filtering. Algorithm design issues are broadened to cover system design issues and concepts of operation.
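As a rough illustration of the sliding triple-window idea, the following Python sketch compares the grayscale values in a small inner window with those in a surrounding outer ring, skips a guard band between the two, and scores each location with a Welch-style t statistic. The function names, the window half-widths, and the choice of the Welch form are assumptions made for illustration only; the book describes many variations, and this is not presented as any one of them.

```python
import math
from statistics import mean, variance


def triple_window_score(image, row, col, inner=1, guard=2, outer=4, eps=1e-9):
    """Welch-style t statistic between inner-window and outer-ring pixels.

    inner, guard, and outer are hypothetical half-width parameters.
    Assumes inner >= 1 so each sample holds at least two pixels.
    """
    target, background = [], []
    for dr in range(-outer, outer + 1):
        for dc in range(-outer, outer + 1):
            distance = max(abs(dr), abs(dc))  # Chebyshev distance: square windows
            value = image[row + dr][col + dc]
            if distance <= inner:
                target.append(value)       # inner (target) window
            elif distance > guard:
                background.append(value)   # outer (background) ring
            # pixels with inner < distance <= guard form the ignored guard band
    spread = math.sqrt(variance(target) / len(target)
                       + variance(background) / len(background))
    return (mean(target) - mean(background)) / (spread + eps)


def score_map(image, inner=1, guard=2, outer=4):
    """Slide the filter over every center whose outer window fits in the image."""
    rows, cols = len(image), len(image[0])
    return {
        (r, c): triple_window_score(image, r, c, inner, guard, outer)
        for r in range(outer, rows - outer)
        for c in range(outer, cols - outer)
    }
```

Thresholding the resulting scores would yield candidate detections. The guard band keeps the edges of a target slightly larger than the inner window from contaminating the background sample, which is one plausible reason for using three windows rather than two.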
Since target detection is such a fundamental problem, it is often used as a test case for developing technology. New technology leads to innovative approaches for attacking the problem. Eight inventive paradigms, each with deep philosophical underpinnings, are described in relation to their effect on target detector design.
Chapter 3: Target classification algorithms have generally kept pace with developments in the academic and commercial sectors since the 1970s. However, most recently, investment into object classification by Internet companies and various large-scale projects for understanding the human brain has far outpaced that of the defense sector. The implications are noteworthy.
There are some unique characteristics of the military classification problem. Target classification is not solely an algorithm design problem, but is part of a larger system design task. The design flows down from a ConOps and KPPs. Required classification level is specified by contract. Inputs are image and/or signal data and time-synchronized metadata. The operation is often real-time. The implementation minimizes size, weight, and power (SWaP). The output must be conveyed to a time-strapped operator who understands the rules of engagement. It is assumed that the adversary is actively trying to defeat recognition. The target list is often mission dependent, not necessarily a closed set, and can change on a daily basis. It is highly desirable to obtain sufficiently comprehensive training and testing data sets, but costs of doing so are very high, and data on certain target types are scarce or nonexistent. The training data might not be representative of battlefield conditions, suggesting the avoidance of designs tuned to a narrow set of circumstances. A number of traditional and emerging feature extraction and target classification strategies are reviewed in the context of the military target classification problem.
Chapter 4: The subject being addressed is how an automatic target tracker (ATT) and an ATR can be fused together so tightly and so well that their distinctiveness becomes lost in the merger. This has historically not been the case outside of biology and a few academic papers. The biological model of ATT∪ATR arises from dynamic patterns of activity distributed across many neural circuits and structures (including those in the retinae). The information that the brain receives from the eyes is "old news" at the time that it receives it. The eyes and brain forecast a tracked object's future position, rather than relying on the received retinal position. Anticipation of the next moment—building up a consistent perception—is accomplished under difficult conditions: motion (eyes, head, body, scene background, target) and processing limitations (neural noise, delays, eye jitter, distractions). Not only does the human vision system surmount these problems, but it has innate mechanisms to exploit motion in support of target detection and classification. Biological vision doesn't normally operate on snapshots. Feature extraction, detection, and recognition are spatiotemporal. When scene understanding is viewed as a spatiotemporal process, target detection, target recognition, target tracking, event detection, and activity recognition (AR) do not seem as distinct as they are in current ATT and ATR designs. They appear as similar mechanisms taking place at varying time scales. A framework is provided for unifying ATT, ATR, and AR.
Chapter 5: ATRs have been under development since the 1960s. Advances in computer processing, computer memory, and sensor resolution are easy to evaluate. However, the time horizon of the truly smart ATR seems to be receding at a rate of one year per year. One issue is that there has never been a way to measure the intelligence of an ATR. This is fundamentally different from measuring detection and classification performance. The description of what constitutes an ATR, and in particular a smart ATR, keeps changing. Early ATRs did little more than detect fuzzy bright spots in first-generation FLIR video or ten-foot-resolution SAR data. Sensors are getting better, computers are getting faster, and the ATR is expected to take over more of the workload. With unmanned systems there is no human onboard to digest information. The ATR is compelled to transmit only the most important information over a limited-bandwidth data link. The ATR or robotic system can be viewed as a substitute for a human. What constitutes intelligence in artificial humans has long been debated, starting with stories of golems, continuing to the Turing Test, and including current dire predictions of super-intelligent robots superseding humans. Chapter 5 provides a type of Turing Test for judging the intelligence of an ATR.
Appendix 1: The first appendix lists the many resources available to the ATR engineer and includes a brief historical overview of the technologies involved in ATR development.
Appendix 2: The second appendix defines all of the acronyms used in this book.
Special thanks to the United States Army Night Vision and Electronic Sensors Directorate (NVESD), Air Force, Navy, DARPA, and Northrop Grumman for supporting this work over the years. This book benefited from critique and suggestions made by the reviewers and SPIE staff.
The views and opinions expressed in this book are solely those of the author in his private capacity and do not represent those of any company, the United States Federal Government, any entity of the U.S. Federal Government, or any private organization. Links to organizations are provided solely as a service to our readers. Links do not constitute an endorsement by any organization or the Federal Government, and none should be inferred. While extensive efforts have been made to verify statements and facts presented in this book, any factual errors or errors of opinion are solely those of the author. No position or endorsement by the U.S. Federal Government, any entity of the Federal government, or any other organization regarding the validity of any statement of fact presented in this book should be inferred.
Author's Contact Information
Comments on this book are welcome. The author can be contacted at Bruce.Jay.Schachter@gmail.com.
Bruce J. Schachter