Proceedings Volume 11394

Automatic Target Recognition XXX


Purchase the printed version of this volume at proceedings.com or access the digital version at SPIE Digital Library.

Volume Details

Date Published: 15 June 2020
Contents: 12 Sessions, 26 Papers, 19 Presentations
Conference: SPIE Defense + Commercial Sensing 2020
Volume Number: 11394

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 11394
  • Aerial, Remote Sensing, and Space-Based ATR I
  • Aerial, Remote Sensing, and Space-Based ATR II
  • Computer Vision for ATR I
  • Active Sensing (RF, LIDAR, and Acoustic)
  • ATR Algorithms I
  • ATR Algorithms II
  • Phenomenology and Analysis
  • Keynote Speaker Dr. Saurabh Prasad
  • Multi Sensor/Multi Spectrum
  • Target Detection Algorithms
  • Additional Paper
Front Matter: Volume 11394
Front Matter: Volume 11394
This PDF file contains the front matter associated with SPIE Proceedings Volume 11394, including the Title Page, Copyright information, and Table of Contents.
Aerial, Remote Sensing, and Space-Based ATR I
Deep learning based moving object detection for oblique images without future frames
Won Yeong Heo, Seongjo Kim, DeukRyeol Yoon, et al.
Moving object detection from UAV/aerial images is one of the essential tasks in surveillance systems. However, most existing works do not take into account the characteristics of oblique images. In addition, many methods use future frames to detect moving objects in the current frame, which causes delayed detection. In this paper, we propose a deep learning based moving object detection method for oblique images that does not use future frames. Our network has a CNN (Convolutional Neural Network) architecture in which the first and second layers contain sublayers with different kernel sizes. These sublayers detect objects of different sizes or speeds, which is important because objects closer to the camera appear bigger and faster in oblique images. Our network takes the past five frames, registered with respect to the last frame, and produces a heatmap prediction for moving objects. Finally, we apply threshold processing to distinguish between object pixels and non-object pixels. We present experimental results on our dataset, which contains about 15,000 training images and about 6,000 test images with ground truth annotations for moving objects. We demonstrate that our method performs better than several previous works.
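A minimal sketch of the multi-kernel idea described above, assuming a PyTorch implementation; the channel counts, kernel sizes, and input resolution are illustrative guesses rather than the authors' actual configuration:

```python
import torch
import torch.nn as nn

class MultiKernelBlock(nn.Module):
    """Parallel sublayers with different kernel sizes, concatenated along channels.

    Mirrors the idea of covering objects of different apparent sizes/speeds in
    oblique imagery; branch widths and kernel sizes here are assumptions.
    """
    def __init__(self, in_ch: int, out_ch_per_branch: int = 8):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch_per_branch, kernel_size=k, padding=k // 2)
            for k in (3, 5, 7)
        ])

    def forward(self, x):
        return torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)

# Five registered past frames stacked as channels (batch=1, 5 frames, 128x128)
frames = torch.randn(1, 5, 128, 128)
features = MultiKernelBlock(in_ch=5)(frames)
print(features.shape)  # torch.Size([1, 24, 128, 128])
```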
Height-adaptive vehicle detection in aerial imagery using metadata of EO sensor
Seongjo Kim, Won Yeong Heo, HyunSeong Sung, et al.
Detecting targets in aerial imagery plays an important role in military reconnaissance and defense. One of the main difficulties in aerial imagery detection over a wide range of heights is instability: detection performs well only on test data obtained at the same height range as the training data. To solve this problem, we utilize the sensor metadata to calculate the GSD (Ground Sample Distance) and the pixel size of vehicles in our test images, both of which depend on height. Based on this information, we estimate the optimal ratio for image preprocessing and apply it to the test images. As a result, our method detects vehicles captured at heights of 100 m to 300 m with a higher F1-score than approaches that do not consider the metadata.
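A small sketch of the kind of metadata-driven preprocessing this abstract describes, assuming a simple pinhole model at nadir; the focal length, pixel pitch, vehicle length, and target pixel size below are hypothetical values, not the paper's:

```python
def ground_sample_distance(altitude_m: float, focal_length_mm: float, pixel_pitch_um: float) -> float:
    """Approximate nadir GSD (metres/pixel) from EO sensor metadata."""
    return (altitude_m * pixel_pitch_um * 1e-6) / (focal_length_mm * 1e-3)

def resize_ratio(gsd_m: float, vehicle_length_m: float = 4.5, target_pixels: float = 48) -> float:
    """Scale factor mapping a vehicle of known physical size to a fixed pixel size."""
    pixels_at_native_gsd = vehicle_length_m / gsd_m
    return target_pixels / pixels_at_native_gsd

# Illustrative metadata values only
gsd = ground_sample_distance(altitude_m=200, focal_length_mm=50, pixel_pitch_um=3.45)
print(f"GSD = {gsd:.4f} m/px, resize ratio = {resize_ratio(gsd):.3f}")
```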
Investigation of search strategies to identify optimized and efficient templates for automatic target recognition in remotely sensed imagery
Samantha S. Carley, Stanton R. Price, Samantha J. Tidrick, et al.
Object detection remains an important and ever-present component of computer vision applications. While deep learning has been the focal point for much of the research actively being conducted in this area, there still exist certain applications in which such a sophisticated and complex system is not required. For example, if a very specific object or set of objects is to be automatically identified, and these objects' appearances are known a priori, then a much simpler and more straightforward approach known as matched filtering, or template matching, can be a very accurate and powerful tool for object detection. In our previous work, we investigated using machine learning, specifically the improved Evolution COnstructed features framework, to identify (near-) optimal templates for matched filtering for a given problem. Herein, we explore how different search algorithms, e.g., genetic algorithms, particle swarm optimization, and gravitational search algorithms, can derive not only (near-) optimal templates but also promote templates that are more efficient. Specifically, given a defined template for a particular object of interest, can these search algorithms identify a subset of information that enables more efficient detection algorithms while minimizing degradation of detection performance? Performance is assessed in the context of algorithm efficiency, accuracy of the object detection algorithm and its associated false alarm rate, and search algorithm performance. Experiments are conducted on handpicked images of commercial aircraft from the xView dataset, one of the largest publicly available datasets of overhead imagery.
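For reference, a hedged sketch of the matched-filtering baseline discussed above, using OpenCV's normalized cross-correlation; the detection threshold and the idea of swapping in a reduced template are illustrative, not taken from the paper:

```python
import cv2
import numpy as np

def detect_with_template(image_gray: np.ndarray, template: np.ndarray, thresh: float = 0.7):
    """Matched filtering via normalized cross-correlation; returns candidate hit locations.

    A reduced (sparser) template found by a search algorithm could be substituted for
    `template` to trade a small accuracy loss for faster correlation.
    """
    response = cv2.matchTemplate(image_gray, template, cv2.TM_CCOEFF_NORMED)
    ys, xs = np.where(response >= thresh)
    return list(zip(xs.tolist(), ys.tolist()))
```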
Aerial, Remote Sensing, and Space-Based ATR II
Domain adversarial neural network-based oil palm detection using high-resolution satellite images
Detection of oil palm trees provides necessary information for monitoring oil palm plantations and predicting palm oil yield. A supervised model, such as a deep neural network trained on remotely sensed images of a source domain, can obtain high accuracy in that same region. However, performance largely degrades if the model is applied to a different, unannotated target region, due to changes in sensors, weather conditions, acquisition time, etc. In this paper, we propose a domain adaptation based approach for oil palm detection across two different high-resolution satellite images. With manually labeled samples collected from the source domain and unlabeled samples collected from the target domain, we design a domain-adversarial neural network composed of a feature extractor, a class predictor, and a domain classifier to learn domain-invariant representations and the classification task simultaneously during training. Detection tasks are conducted in six typical regions of the target domain. Our proposed approach improves accuracy by 25.39% in terms of F1-score in the target domain, and performs 9.04%-15.30% better than existing domain adaptation methods.
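A minimal sketch of the domain-adversarial arrangement described above (feature extractor, class predictor, and domain classifier coupled through gradient reversal), assuming a PyTorch implementation; layer sizes and the two-class heads are illustrative assumptions:

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, negated (scaled) gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class DANN(nn.Module):
    """Feature extractor + class predictor + domain classifier (dimensions are illustrative)."""
    def __init__(self, in_dim: int = 256, feat_dim: int = 64):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.class_head = nn.Linear(feat_dim, 2)   # e.g. oil palm vs background
        self.domain_head = nn.Linear(feat_dim, 2)  # source vs target

    def forward(self, x, lam: float = 1.0):
        f = self.features(x)
        return self.class_head(f), self.domain_head(GradReverse.apply(f, lam))
```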
Computer Vision for ATR I
Target classification in infrared imagery by cross-spectral synthesis using GAN
Images can be captured using devices operating in different light spectra. As a result, cross-domain image translation becomes a nontrivial task, requiring the adaptation of deep convolutional neural networks (DCNNs) to resolve the aforementioned imagery challenges. Automatic target recognition (ATR) from infrared imagery in a real-time environment is one such difficult task. Generative Adversarial Networks (GANs) have already shown promising performance in translating image characteristics from one domain to another. In this paper, we explore the potential of the GAN architecture for cross-domain image translation. Our proposed GAN model maps images from the source domain to the target domain in a conditional GAN framework. We verify the performance of the generated images with the help of a CNN-based target classifier. Classification results on the synthetic images achieve performance comparable to the ground truth, confirming that the designed network generates realistic images.
Active Sensing (RF, LIDAR, and Acoustic)
Radar target recognition using structured sparse representation
Radar target recognition using structured sparse representation is the focus of this paper. Block-sparse representation and recovery is applied to the radar target recognition problem assuming a stepped-frequency radar is used. The backscatter of commercial aircraft models as recorded in a compact range is used to train and test a block-sparse based classifier. The motivation is to investigate scenarios where the target backscatter is corrupted by extraneous scatterers (similar to the disguise problem), and to investigate scenarios where scatterer occlusion takes place (similar to the face occlusion problem). Additional scenarios of whether the target azimuth position is completely or partially known are also examined.
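As a simplified illustration of classification from class-structured dictionaries, the sketch below assigns a signature to the class whose training block best reconstructs it. It substitutes an ordinary per-class least-squares fit for a true block-sparse recovery solver, so it is a stand-in rather than the paper's algorithm:

```python
import numpy as np

def classify_block_sparse(y: np.ndarray, dictionaries: list[np.ndarray]) -> int:
    """Assign y (e.g. a stepped-frequency backscatter signature) to the class whose
    training block reconstructs it with the smallest residual. A dedicated group-sparse
    solver would be used in practice to enforce block sparsity across all classes."""
    residuals = []
    for D in dictionaries:                       # D: columns = training signatures of one class
        coeffs, *_ = np.linalg.lstsq(D, y, rcond=None)
        residuals.append(np.linalg.norm(y - D @ coeffs))
    return int(np.argmin(residuals))
```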
A comparison of template matching and deep learning for classification of occluded targets in LiDAR data
Automatic target recognition (ATR) is an ongoing topic of research for the Air Force. In this effort we develop, analyze and compare template matching and deep learning algorithms for use in the task of classifying occluded targets in light detection and ranging (LiDAR) data. Specifically, we analyze convolutional sparse representations (CSR) and convolutional neural networks (CNN). We explore the strengths and weaknesses of each algorithm separately, then improve the algorithms, and finally provide a comprehensive comparison of the developed tools. To conduct this final comparison, we improve the functionality of current LiDAR simulators to include our occlusion creator and parallelize our data simulation tools for use on the DoD High Performance Computers. Our results demonstrate that for this problem, a DenseNet trained with images containing representative clutter outperforms a basic CNN and the CSR approach.
Multi-feature optimization strategies for target classification using seismic and acoustic signatures
Perimeter monitoring systems have become one of the most researched topics in recent times. Owing to the increasing demand for using multiple sensor modalities, the data for processing is becoming high dimensional. These representations are often too complex to visualize and decipher. In this paper, we investigate the use of feature selection and dimensionality reduction strategies for the classification of targets using seismic and acoustic signatures. A time-slice classification approach with 43 features extracted from multi-domain transformations has been evaluated on the SITEX02 military vehicle dataset, consisting of a tracked AAV and a wheeled DW vehicle. Acoustic signals with SVM-RBF resulted in an accuracy of 93.4%, and for seismic signals, an ensemble of decision trees with bagging resulted in an accuracy of 90.6%. Furthermore, principal component analysis (PCA) and neighborhood component analysis (NCA) based feature selection approaches have been applied to the extracted features. The NCA-based approach retained only 20 features and obtained classification accuracies of ~94.7% for acoustic and ~90.5% for seismic signals. An increase of ~2% to 4% is observed for NCA when compared to the PCA-based feature transformation approach. A further fusion of the individual seismic and acoustic classifier posterior probabilities increases the classification accuracy to 97.7%. Finally, the PCA- and NCA-based feature optimization strategies have also been validated on CSIO experimental datasets comprising moving civilian vehicles and anthropogenic activities.
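A hedged sketch of the PCA- versus NCA-based pipelines with an RBF SVM, assuming scikit-learn; the 43-dimensional feature matrix and the choice of 20 retained components mirror the abstract's numbers, but the data below are random placeholders:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NeighborhoodComponentsAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X: (n_time_slices, 43) multi-domain features, y: vehicle class labels -- placeholders here
X, y = np.random.randn(200, 43), np.random.randint(0, 2, 200)

pca_clf = make_pipeline(StandardScaler(), PCA(n_components=20), SVC(kernel="rbf"))
nca_clf = make_pipeline(StandardScaler(),
                        NeighborhoodComponentsAnalysis(n_components=20),
                        SVC(kernel="rbf"))
for name, clf in [("PCA", pca_clf), ("NCA", nca_clf)]:
    clf.fit(X, y)
    print(name, clf.score(X, y))   # training-set score, for illustration only
```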
Classifying WiFi "physical fingerprints" using complex deep learning
Logan Smith, Nicholas Smith, Joshua Hopkins, et al.
Wireless communication is susceptible to security breaches by adversarial actors mimicking Media Access Controller (MAC) addresses of currently-connected devices. Classifying devices by their “physical fingerprint” can help to prevent this problem since the fingerprint is unique for each device and independent of the MAC address. Previous techniques have mapped the WiFi signal to real values and used classification methods that support solely real-valued inputs. In this paper, we put forth four new deep neural networks (NNs) for classifying WiFi physical fingerprints: a real-valued deep NN, a corresponding complex-valued deep NN, a real-valued deep CNN, and the corresponding complex-valued deep convolutional NN (CNN). Results show state-of-the-art performance against a dataset of nine WiFi devices.
Adversarial training on SAR images
Recent studies have shown that machine learning networks trained on simulated synthetic aperture radar (SAR) images of vehicular targets do not generalize well to classification of measured imagery. This disconnect between these two domains is an interesting, yet-unsolved problem. We apply an adversarial training technique to try and provide more information to a classification network about a given target. By constructing adversarial examples against synthetic data to fool the classifier, we expect to extend the network decision boundaries to include a greater operational space. These adversarial examples, in conjunction with the original synthetic data, are jointly used to train the classifier. This technique has been shown in the literature to increase network generalization in the same domain, and our hypothesis is that this will also help to generalize to the measured domain. We present a comparison of this technique to off-the-shelf convolutional classifier methods and analyze any improvement.
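The adversarial-example generation step could look like the following FGSM-style sketch in PyTorch; FGSM is a common choice, but the paper does not specify its exact attack, so treat this as an assumption:

```python
import torch
import torch.nn.functional as F

def fgsm_examples(model, images, labels, eps: float = 0.03):
    """Generate FGSM adversarial examples from synthetic SAR chips (an assumed attack;
    the paper's exact adversarial construction is not specified in the abstract)."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    return (images + eps * images.grad.sign()).detach()

# Training would then mix the perturbed chips with the original synthetic data, e.g.:
# x_adv = fgsm_examples(net, x_synth, y)
# loss = F.cross_entropy(net(torch.cat([x_synth, x_adv])), torch.cat([y, y]))
```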
ATR Algorithms I
A probabilistic analysis of connected component sizes in random binary images (Conference Presentation)
This paper addresses the problem of determining the probability mass function of connected component sizes for independent and identically distributed binary images. We derive an exact solution and an effective approximation that can be readily computed for all component sizes.
Flexible deep transfer learning by separate feature embeddings and manifold alignment
Samuel Rivera, Joel Klipfel, Deborah Weeks
Object recognition is a key enabler across industry and defense. As technology changes, algorithms must keep pace with new requirements and data. New modalities and higher resolution sensors should allow for increased algorithm robustness. Unfortunately, algorithms trained on existing labeled datasets do not directly generalize to new data because the data distributions do not match. Transfer learning (TL) or domain adaptation (DA) methods have established the groundwork for transferring knowledge from existing labeled source data to new unlabeled target datasets. However, current DA approaches assume similar source and target feature spaces and suffer in the case of massive domain shifts or changes in the feature space. Existing methods assume the data are either the same modality, or can be aligned to a common feature space. Therefore, most methods are not designed to support a fundamental domain change such as visual to auditory data. We propose a novel deep learning framework that overcomes this limitation by learning separate feature extractions for each domain while minimizing the distance between the domains in a latent lower-dimensional space. The alignment is achieved by considering the data manifold along with an adversarial training procedure. We demonstrate the effectiveness of the approach versus traditional methods with several ablation experiments on synthetic, measured, and satellite image datasets. We also provide practical guidelines for training the network while overcoming vanishing gradients which inhibit learning in some adversarial training settings.
Training set effect on super resolution for automated target recognition
Matthew Ciolino, David Noever, Josh Kalin
Single Image Super Resolution (SISR) is the process of mapping a low-resolution image to a high-resolution image. This inherently has applications in remote sensing as a way to increase the spatial resolution of satellite imagery. This suggests a possible improvement to automated target recognition in image classification and object detection. We explore the effect that different training sets have on SISR with the network Super Resolution Generative Adversarial Network (SRGAN). We train 5 SRGANs on different land-use classes (e.g., agriculture, cities, ports) and test them on the same unseen dataset. We attempt to find the qualitative and quantitative differences in SISR, binary classification, and object detection performance. We find that curated training sets containing objects in the test ontology perform better on both computer vision tasks, while a complex distribution of images allows object detection models to perform better. However, Super Resolution (SR) might not be beneficial to certain problems and will see diminishing returns for datasets that are closer to being solved.
SAR automatic target recognition with less labels
Joseph F. Comer, Reed W. Andrews, Navid Naderializadeh, et al.
Synthetic-Aperture-Radar (SAR) is a commonly used modality in mission-critical remote-sensing applications, including battlefield intelligence, surveillance, and reconnaissance (ISR). Processing SAR sensory inputs with deep learning is challenging because deep learning methods generally require large training datasets and high-quality labels, which are expensive for SAR. In this paper, we introduce a new approach for learning from SAR images in the absence of abundant labeled SAR data. We demonstrate that our geometrically-inspired neural architecture, together with our proposed self-supervision scheme, enables us to leverage the unlabeled SAR data and learn compelling image features with few labels. Finally, we show the test results of our proposed algorithm on the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset.
ATR Algorithms II
Identifying unlabeled WiFi devices with zero-shot learning
In wireless networks, MAC-address spoofing is a common attack that allows an adversary to gain access to the system. To circumvent this threat, previous work has focused on classifying wireless signals using a “physical fingerprint”, i.e., changes to the signal caused by physical differences in the individual wireless chips. Instead of relying on MAC addresses for admission control, fingerprinting allows devices to be classified and then granted access. In many network settings, the activity of legitimate devices—those devices that should be granted access—may be dynamic over time. Consequently, when faced with a device that comes online, a robust fingerprinting scheme must quickly identify the device as legitimate using the pre-existing classification, and meanwhile identify and group those unauthorized devices based on their signals. This paper presents a two-stage Zero-Shot Learning (ZSL) approach to classify a received signal originating from either a legitimate or unauthorized device. In particular, during the training stage, a classifier is trained for classifying legitimate devices. The classifier learns discriminative features and the outlier detector uses these features to classify whether a new signature is an outlier. Then, during the testing stage, an online clustering method is applied for grouping those identified unauthorized devices. Our approach allows 42% of unauthorized devices to be identified as unauthorized and correctly clustered.
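An illustrative two-stage arrangement, assuming scikit-learn components as stand-ins (an MLP classifier, an IsolationForest outlier detector, and DBSCAN in place of the paper's learned discriminative features and online clustering); the fingerprint data below are random placeholders:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.ensemble import IsolationForest
from sklearn.neural_network import MLPClassifier

# Stage 1: train on signatures of known legitimate devices (placeholder fingerprints)
X_legit, y_legit = np.random.randn(300, 64), np.random.randint(0, 5, 300)
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X_legit, y_legit)
outlier = IsolationForest(random_state=0).fit(X_legit)

# Stage 2: flag outliers among new signals and group them; inliers keep their class label
X_new = np.random.randn(50, 64)
is_outlier = outlier.predict(X_new) == -1
if is_outlier.any():
    unauthorized_groups = DBSCAN(eps=1.5).fit_predict(X_new[is_outlier])
legit_labels = clf.predict(X_new[~is_outlier])
```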
Adventures in deep learning geometry
Donald Waagen, Don Hulsey, Jamie Godwin, et al.
Deep learning models are pervasive for a multitude of tasks, but the complexity of these models can limit interpretation and inhibit trust. For a classification task, we investigate the induced relationships between the class conditioned data distributions, and geometrically compare/contrast the data with the deep learning models' output weight vectors. These geometric relationships are examined across models as a function of dense hidden layer width. Additionally, we geometrically characterize perturbation-based adversarial examples with respect to the deep learning model.
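One simple version of the geometric comparison described above computes the cosine similarity between each class-conditioned mean feature vector and the corresponding output-layer weight vector; the function name and the choice of cosine similarity are assumptions made for illustration:

```python
import numpy as np

def class_weight_alignment(features: np.ndarray, labels: np.ndarray, weight_matrix: np.ndarray):
    """Cosine similarity between each class's mean feature vector and the corresponding
    output weight vector (rows of weight_matrix), one per class."""
    sims = []
    for c in range(weight_matrix.shape[0]):
        mu = features[labels == c].mean(axis=0)
        w = weight_matrix[c]
        sims.append(float(mu @ w / (np.linalg.norm(mu) * np.linalg.norm(w))))
    return sims
```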
Do we miss targets when we capture hyperspectral images with compressive sensing?
The utilization of compressive sensing (CS) techniques for hyperspectral (HS) imaging is appealing since HS data is typically huge and very redundant. The CS design offers a significant reduction of the acquisition effort, which can be manifested in faster acquisition of the HS datacubes, acquisition of larger HS images, and removing the need for post-acquisition digital compression. But do all these benefits come at the expense of the ability to extract targets from the HS images? The answer to this question, of course, depends on the specific CS design and on the target detection algorithm employed. In a previous study we showed that there is virtually no target detection performance degradation when a classical target detection algorithm is applied to data acquired with CS HS imaging techniques of the kind we have developed during the last years. In this paper we further investigate the robustness of our CS HS techniques for the task of object classification by deep learning methods. We show preliminary results demonstrating that deep neural network classifiers perform equally well when applied to HS data captured with our compressive sensing methods as when applied to conventionally sensed HS data.
Image fusion for context-aided automatic target recognition
Automatic Target Recognition (ATR) has seen many recent advances from image fusion, machine learning, and data collections to support multimodal, multi-perspective, and multi-focal day-night robust surveillance. This paper highlights ideas, strategies, and concepts as well as provides an example for electro-optical and infrared image fusion cooperative intelligent ATR analysis. The ATR results support simultaneous tracking and identification for physics-based and human-derived information fusion (PHIF). The importance of context serves as a guide for ATR systems and determines the data requirements for robust training in deep learning approaches.
Phenomenology and Analysis
Robustness of adversarial camouflage (AC) for naval vessels
Different types of imaging sensors are frequently employed for detection, tracking and classification (DTC) of naval vessels. A number of countermeasure techniques are currently employed against such sensors, and with the advent of ever more sensitive imaging sensors and sophisticated image analysis software, the question becomes what to do in order to render DTC as hard as possible. In recent years, progress in deep learning has resulted in algorithms for image analysis that often rival human beings in performance. One approach to fool such strategies is the use of adversarial camouflage (AC). Here, the appearance of the vessel we wish to protect is structured in such a way that it confuses the software analyzing the images of the vessel. In our previous work, we added patches of AC to images of frigates. The patches were placed on the hull and/or superstructure of the vessels. The results showed that these patches were highly effective, tricking a previously trained discriminator into classifying the frigates as civilian. In this work we study the robustness and generality of such patches. The patches have been degraded in various ways, and the resulting images fed to the discriminator. As expected, the more the patches are degraded, the harder it becomes to fool the discriminator. Furthermore, we have trained new patch generators designed to create patches that will withstand such degradations. Our initial results indicate that the robustness of AC patches may be increased by adding degrading filters in the training of the patch generator.
Keynote Speaker Dr. Saurabh Prasad
Advances in supervised and semi-supervised machine learning for hyperspectral image analysis (Conference Presentation)
Saurabh Prasad
Recent advances in optical sensing technology (miniaturization and low-cost architectures for spectral imaging) and sensing platforms from which such imagers can be deployed have the potential to enable ubiquitous multispectral and hyperspectral imaging on demand in support of a variety of applications, including remote sensing and biomedicine. Often, however, robust analysis with such data is challenging due to limited/noisy ground-truth, and variability due to illumination, scale and acquisition conditions. In this talk, I will review recent advances in: (1) Subspace learning for learning illumination invariant discriminative subspaces from high dimensional hyperspectral imagery; (2) Semi-supervised and active learning for image analysis with limited ground truth; and (3) Deep learning variants that learn the spatial-spectral information in multi-channel optical data effectively from limited ground truth, by leveraging the structural information available in the unlabeled samples as well as the underlying structured sparsity of the data.
Multi Sensor/Multi Spectrum
Combining visible and infrared spectrum imagery using machine learning for small unmanned aerial system detection
There is an increasing demand for technology and solutions to counter commercial, off-the-shelf small unmanned aerial systems (sUAS). Advances in machine learning and deep neural networks for object detection, coupled with lower cost and power requirements of cameras, have led to promising vision-based solutions for sUAS detection. However, solely relying on the visible spectrum has previously led to reliability issues in low contrast scenarios such as sUAS flying below the treeline and against bright sources of light. Alternatively, due to the relatively high heat signatures emitted from sUAS during flight, a long-wave infrared (LWIR) sensor is able to produce images that clearly contrast the sUAS from its background. However, compared to widely available visible spectrum sensors, LWIR sensors have lower resolution and may produce more false positives when exposed to birds or other heat sources. This research work proposes combining the advantages of the LWIR and visible spectrum sensors using machine learning for vision-based detection of sUAS. Utilizing the heightened background contrast from the LWIR sensor, combined and synchronized with the relatively higher resolution of the visible spectrum sensor, a deep learning model was trained to detect the sUAS in previously difficult environments. More specifically, the approach demonstrated effective detection of multiple sUAS flying above and below the treeline, in the presence of heat sources, and in glare from the sun. Our approach achieved a detection rate of 71.2 ± 8.3%, improving by 69% compared to LWIR alone and by 30.4% compared to the visible spectrum alone, and achieved a false alarm rate of 2.7 ± 2.6%, a decrease of 74.1% and 47.1% compared to LWIR and the visible spectrum alone, respectively, averaged over single and multiple drone scenarios and controlled for the same object detector confidence threshold of at least 50%. With a network of these small and affordable sensors, one can accurately estimate the 3D position of the sUAS, which could then be used for elimination or further localization by narrower sensors, like a fire-control radar (FCR). Videos of the solution's performance can be seen at https://sites.google.com/view/tamudrone-spie2020/.
Evaluating the variance in convolutional neural network behavior stemming from randomness
Christopher Menart
Deep neural networks are a powerful and versatile machine learning technique with strong performance on many tasks. A large variety of neural architectures and training algorithms have been published in the past decade, each attempting to improve aspects of performance and computational cost on specific tasks. But the performance of these methods can be chaotic. Not only does the behavior of a neural network vary significantly with respect to small algorithmic changes, but the same training algorithm, run multiple times, may produce models with different performance, due to multiple stochastic aspects of the training process. Replication of experiments in deep neural network design is difficult in part for this reason. We perform empirical evaluations using the canonical task of image recognition with Convolutional Neural Networks to determine what degree of variation in neural network performance is due to random chance. This has implications for network tuning as well as for the evaluation of architecture and algorithm changes.
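A toy illustration of the kind of seed-variance measurement involved, using a small scikit-learn network on the digits dataset as a stand-in for the paper's CNN image-recognition experiments; the model, dataset, and number of runs are assumptions made for brevity:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Only the random seed changes between runs; data split and architecture stay fixed
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

scores = []
for seed in range(10):
    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=seed)
    scores.append(clf.fit(X_tr, y_tr).score(X_te, y_te))
print(f"mean accuracy = {np.mean(scores):.3f}, std across seeds = {np.std(scores):.3f}")
```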
Network dynamics based sensor data processing
Two-dimensional (2D) image processing and three-dimensional (3D) LIDAR point cloud data analytics are two important sensor data processing techniques for many applications such as autonomous systems, self-driving cars, medical imaging and many other fields. However, 2D image data are distributed on regular 2D grids, while 3D LIDAR data are represented in point cloud format, consisting of points nonuniformly distributed in 3D space. These different data representations lead to different data processing techniques. The irregular structure of 3D LIDAR data often makes 3D LIDAR analytics challenging, so highly successful diffusion equation methods for image processing cannot be directly applied to 3D LIDAR processing. In this paper, applying network and network dynamics theory to 2D image and 3D LIDAR analytics, we propose graph-based data processing techniques that unify 2D image processing and 3D LIDAR data analytics. We demonstrate that both 2D images and 3D point cloud data can be processed in the same framework, and the only difference is the way neighbor nodes are chosen. Thus, the diffusion equation techniques of 2D image processing can be used to process 3D point cloud data. Within this general framework, we propose a new adaptive diffusion equation technique for data processing and show experimentally that it achieves high performance.
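A compact sketch of graph diffusion in the spirit of this framework, where the only modality-specific choice is how neighbors are found (a k-nearest-neighbor search here, which is an assumption; pixel grids could equally use their 4- or 8-neighborhoods); the step size, neighbor count, and iteration count are illustrative:

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_diffusion(points: np.ndarray, signal: np.ndarray, k: int = 8,
                  steps: int = 10, dt: float = 0.2) -> np.ndarray:
    """Explicit graph-heat-equation smoothing of a per-node signal. `points` may be
    2D pixel coordinates or 3D LIDAR points; only the neighbourhood construction changes."""
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k + 1)       # first neighbour is the point itself
    f = signal.astype(float)
    for _ in range(steps):
        neighbour_mean = f[idx[:, 1:]].mean(axis=1)
        f = f + dt * (neighbour_mean - f)      # discrete Laplacian (diffusion) step
    return f
```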
Target Detection Algorithms
Patch-based Gaussian mixture model for scene motion detection in the presence of atmospheric optical turbulence
In long-range imaging regimes, atmospheric turbulence degrades image quality. In addition to blurring, the turbulence causes geometric distortion effects that introduce apparent motion in acquired video. This is problematic for image processing tasks, including image enhancement and restoration (e.g., superresolution) and aided target recognition (e.g., vehicle trackers). To mitigate these warping effects from turbulence, it is necessary to distinguish between actual in-scene motion and apparent motion caused by atmospheric turbulence. Previously, the current authors generated a synthetic video by injecting moving objects into a static scene and then applying a well-validated anisoplanatic atmospheric optical turbulence simulator. With known per-pixel truth of all moving objects, a per-pixel Gaussian mixture model (GMM) was developed as a baseline technique. In this paper, the baseline technique has been modified to improve performance while decreasing computational complexity. Additionally, the technique is extended to patches such that spatial correlations are captured, which results in further performance improvement.
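For intuition, a simplified patch-based background model along the lines described above, assuming grayscale frames as NumPy arrays; it keeps a single running Gaussian per patch, whereas the paper's method maintains a full mixture, so this is a reduced stand-in:

```python
import numpy as np

def patch_features(frame: np.ndarray, patch: int = 8) -> np.ndarray:
    """Average intensity per non-overlapping patch (captures local spatial correlation)."""
    h, w = frame.shape[0] // patch * patch, frame.shape[1] // patch * patch
    return frame[:h, :w].reshape(h // patch, patch, w // patch, patch).mean(axis=(1, 3))

class PatchGaussianModel:
    """Running single-Gaussian background model per patch; the paper's mixture model
    would keep several such components per patch instead of one."""
    def __init__(self, alpha: float = 0.05, n_sigma: float = 2.5):
        self.alpha, self.n_sigma = alpha, n_sigma
        self.mean = self.var = None

    def update(self, frame: np.ndarray) -> np.ndarray:
        f = patch_features(frame)
        if self.mean is None:
            self.mean, self.var = f.copy(), np.full_like(f, 25.0)
        motion = np.abs(f - self.mean) > self.n_sigma * np.sqrt(self.var)
        self.mean = (1 - self.alpha) * self.mean + self.alpha * f
        self.var = (1 - self.alpha) * self.var + self.alpha * (f - self.mean) ** 2
        return motion   # True where in-scene motion (not turbulence warping) is declared
```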
Real-time thermal infrared moving target detection and recognition using deep learned features
Aparna Akula, Varinder Kaur, Neeraj Guleria, et al.
Surveillance applications demand round-the-clock monitoring of regions under constrained illumination conditions. Thermal infrared cameras, which capture the heat emitted by objects in the scene, are a suitable sensor technology for such applications. However, developing AI techniques for automatic target detection in monitoring applications is challenging due to high within-class target variability, variations in target pose, widely varying environmental conditions, etc. This paper presents a real-time framework to detect and classify targets in a forest landscape. The system comprises two main stages: moving target detection and detected target classification. For the first stage, Mixture of Gaussians (MoG) background subtraction is used to detect Regions of Interest (ROI) in individual frames of the IR video sequence. For the second stage, a pre-trained Deep Convolutional Neural Network with additional custom layers has been used for feature extraction and classification. A challenging thermal dataset was created using both experimentally generated thermal infrared images and the publicly available FLIR Thermal Dataset. This dataset is used for training and validating the proposed deep learning framework. The model demonstrated a preliminary testing accuracy of 95%. The real-time deployment of the framework is done on an embedded platform with an 8-core ARM v8.2 64-bit CPU and a 512-core Volta GPU with Tensor Cores. The moving target detection and recognition framework achieved a frame rate of approximately 23 fps on this embedded computing platform, making it suitable for deployment in resource constrained environments.
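A hedged sketch of the first (moving target detection) stage using OpenCV's MOG2 background subtractor and contour-based ROI extraction; the morphology step, area threshold, and subtractor parameters are assumptions, and the second-stage CNN classification is only indicated in comments:

```python
import cv2

def detect_moving_rois(frame_gray, subtractor, min_area: int = 100):
    """Stage 1 of the described pipeline: MoG background subtraction followed by
    contour extraction to produce candidate ROIs for the CNN classifier.
    (Uses the OpenCV 4.x findContours signature.)"""
    fg = subtractor.apply(frame_gray)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    fg = cv2.morphologyEx(fg, cv2.MORPH_OPEN, kernel)          # suppress speckle noise
    contours, _ = cv2.findContours(fg, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= min_area]

# Usage sketch (the classification stage would crop each ROI and feed a pre-trained CNN):
# mog = cv2.createBackgroundSubtractorMOG2(history=300, varThreshold=25)
# rois = detect_moving_rois(gray_frame, mog)
```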
How robust are deep object detectors to variability in ground truth bounding boxes? Experiments for target recognition in infrared imagery
Evan A. Stump, Francisco Reveriano, Leslie M. Collins, et al.
In this work we consider the problem of developing deep learning models, such as convolutional neural networks (CNNs), for automatic target detection (ATD) in infrared (IR) imagery. CNN-based ATD systems must be trained to recognize objects using bounding box (BB) annotations generated by human annotators. We hypothesize that individual annotators may exhibit different biases and/or variability in the characteristics of their BB annotations. Similarly, computer-aided annotation methods may also introduce different types of variability into the BBs. In this work we investigate the impact of BB variability on the behavior and detection performance of CNNs trained using them. We consider two specific BB characteristics here: the center-point, and the overall scale of BBs (with respect to the visual extent of the targets they label). We systematically vary the bias or variance of these characteristics within a large training dataset of IR imagery, and then evaluate the performance of the resulting trained CNN models. Our results indicate that biases in these BB characteristics do not impact performance, but will cause the CNN to mirror the biases in its BB predictions. In contrast, variance in these BB characteristics substantially degrades performance, suggesting care should be taken to reduce variance in the BBs.
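A small utility sketch of the sort of controlled perturbation this study describes, injecting bias and/or variance into the center and scale of a ground-truth box; the (cx, cy, w, h) convention and parameter names are assumptions made for illustration:

```python
import numpy as np

def perturb_box(box, center_bias=0.0, center_sigma=0.0, scale_bias=1.0, scale_sigma=0.0, rng=None):
    """Inject bias and/or variance into a (cx, cy, w, h) ground-truth box, mimicking
    annotator behaviour; centre offsets are expressed as fractions of the box size."""
    rng = rng or np.random.default_rng()
    cx, cy, w, h = box
    cx += (center_bias + rng.normal(0, center_sigma)) * w
    cy += (center_bias + rng.normal(0, center_sigma)) * h
    s = scale_bias + rng.normal(0, scale_sigma)
    return (cx, cy, w * s, h * s)

# Example: unbiased but noisy centres, systematically oversized boxes
noisy = perturb_box((100.0, 80.0, 32.0, 16.0), center_sigma=0.05, scale_bias=1.2)
```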
Additional Paper
Methods for real-time optical location and tracking of unmanned aerial vehicles using digital neural networks
Unmanned aerial vehicles (UAVs) play an important role in human life. Today there is a high rate of technology development in the field of UAV production. Along with the growing popularity of private UAVs, the threat of using drones for terrorist attacks and other illegal purposes is also significantly increasing. This makes UAV detection and tracking in city conditions very important. In this paper we consider the possibility of detecting drones from a video image. The work compares the effectiveness of the fast neural networks YOLO v.3, YOLO v.3-SPP and YOLO v.4. The experimental tests showed the effectiveness of using the YOLO v.4 neural network for real-time UAV detection without significant quality losses. To estimate the detection range, a calculation of the target's projection in image points at different ranges was performed. The experimental tests showed the possibility of detecting UAVs 0.3 m in size at a distance of about 1 km with a precision of more than 90%.