Proceedings Volume 11398

Geospatial Informatics X



Volume Details

Date Published: 18 June 2020
Contents: 6 Sessions, 18 Papers, 14 Presentations
Conference: SPIE Defense + Commercial Sensing 2020
Volume Number: 11398

Table of Contents

All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 11398
  • UAV Sensing and Analysis I
  • UAV Sensing and Analysis II
  • Full Motion Video Analytics
  • Environmental and Geospatial Analytics
  • Geospatial Informatics Applications
Front Matter: Volume 11398
This PDF file contains the front matter associated with SPIE Proceedings Volume 11398, including the Title Page, Copyright information, and Table of Contents.
UAV Sensing and Analysis I
Developing structure-from-motion models from applied streetview and UAV images
It is common that after a disaster, teams are sent to collect data on damaged buildings. The images are often taken with a handheld camera and a drone, a collection process that is slow, labor-intensive, and can place the photo collectors in hazardous conditions. To address this, a drivable, omnidirectional camera can produce images that could potentially be combined with drone images to create a functioning three-dimensional model with drastically reduced data collection times. This paper discusses the methods and applications of Applied Streetview images in the Pix4D modeling software. The Applied Streetview images went through several processing stages, and the resulting models were combined with UAV data. The merged data sets were then visually compared for aesthetics and accuracy. The research uses images collected from areas around the University of Washington's campus.
Aerial 3D building reconstruction from RGB drone imagery
3D building reconstruction is an important problem with applications in urban planning, emergency response, and disaster planning. This paper presents a new pipeline for 3D reconstruction of buildings from RGB imagery captured via a drone. We leverage the commercial software Pix4D to construct a 3D point cloud from RGB drone imagery, which is then used in conjunction with image processing and geometric methods to extract a building footprint. The footprint is then extruded vertically based on the heights of the segmented rooftops. The footprint extraction involves two main steps: line segment detection and polygonization of the lines. To detect line segments, we project the point cloud onto a regular grid, detect preliminary lines using the Hough transform, refine them via RANSAC, and convert them into line segments by checking the density of the points surrounding each line. In the polygonization step, we convert detected line segments into polygons by constructing and completing partial polygons, and then filter them by checking for support in the point cloud. The polygons are then merged based on their respective height profiles. We have tested our system on two buildings of several thousand square feet in Alameda, CA, and obtained F1 scores of 0.93 and 0.95, respectively, compared to the ground truth.
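As a rough illustration of the line-segment detection step described in this abstract (a sketch, not the authors' implementation), the following snippet projects 2D point locations onto a regular grid and detects candidate segments with a probabilistic Hough transform; the RANSAC refinement is omitted, and the grid resolution and thresholds are assumed values.

```python
import numpy as np
import cv2

def detect_footprint_segments(points_xy, cell=0.25, min_votes=60,
                              min_len_px=40, max_gap_px=10):
    """Project 2D point locations onto a grid and detect line segments.

    points_xy : (N, 2) array of x/y coordinates in metres (hypothetical input).
    cell      : grid resolution in metres per pixel (assumed value).
    """
    # Rasterize the point cloud into an occupancy image.
    mins = points_xy.min(axis=0)
    ij = np.floor((points_xy - mins) / cell).astype(int)
    h, w = ij[:, 1].max() + 1, ij[:, 0].max() + 1
    occ = np.zeros((h, w), np.uint8)
    occ[ij[:, 1], ij[:, 0]] = 255

    # Probabilistic Hough transform gives preliminary line segments;
    # the paper additionally refines lines with RANSAC, omitted here.
    segs = cv2.HoughLinesP(occ, rho=1, theta=np.pi / 180,
                           threshold=min_votes,
                           minLineLength=min_len_px, maxLineGap=max_gap_px)
    return [] if segs is None else segs.reshape(-1, 4)

# Example usage: segments = detect_footprint_segments(cloud[:, :2])
```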
Evaluation of different contour extraction approaches in the context of contour based image registration
In recent years, imagery from airborne sensors has become available at low cost due to the advent of affordable drone systems. Such imagery can be used to address many different tasks in various fields of application. While the imagery itself bears all of the information required for some tasks, other tasks require the imagery to be georeferenced to certain accuracy requirements. If this is not the case, registering the imagery to reference images that come with a satisfactory georeference allows us to transfer this georeference to the imagery. Many registration approaches described in the literature require an image and the reference to be of sufficiently similar appearance in order to work properly. To address registration problems in more dissimilar cases, we have been developing a registration method based on contour matching. In a nutshell, this method comprises two main steps: extracting contour points from both the image and the reference, and matching them. To optimize the overall performance of our registration method, we strive to improve the performance of each step individually, both by implementing new algorithms and by fine-tuning relevant parameters. The scope of this work is the implementation of a novel contour point extraction algorithm to improve the first step of our method, as well as its evaluation in the context of our registration method. Line-shaped objects exceeding a certain length, such as road networks, are likely to be present in both the image and the reference despite their possible difference in appearance. The novel contour point extraction algorithm capitalizes on this by focusing on the extraction of contour points representing such line-shaped objects.
GPU and multi-threaded CPU enabled normalized cross correlation
Nafis Ahmed, Evan Teters, Rumana Aktar, et al.
Image matching has been a critical research topic in many computer vision applications such as stereo vision, feature tracking, motion tracking, image registration and mosaicing, object recognition, and 3D reconstruction. Normalized Cross Correlation (NCC) is a template-based image matching approach that is invariant to linear brightness and contrast variations. As a first step in mosaicing, we rely heavily on NCC for matching images, which is an expensive and time-consuming operation. We therefore implement NCC on the GPU and on a multi-threaded CPU in order to improve execution time for real-time applications. Finally, we compare the gains in performance and timing obtained by moving the NCC implementation from the CPU to the GPU.
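For reference, a minimal CPU sketch of the NCC matching described above, using OpenCV's normalized cross-correlation template matcher; the GPU and multi-threaded CPU variants evaluated in the paper are not reproduced here, and the toy frame/patch data are stand-ins.

```python
import cv2
import numpy as np

def ncc_match(image, template):
    """Return the best-match location and NCC score of `template` in `image`."""
    # TM_CCOEFF_NORMED is a normalized cross-correlation that is invariant
    # to linear brightness and contrast changes, as noted in the abstract.
    scores = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(scores)
    return max_loc, max_val

# Toy demonstration with a synthetic frame and a patch cut from it.
frame = np.random.randint(0, 255, (480, 640), dtype=np.uint8)
patch = frame[100:164, 200:264].copy()
loc, score = ncc_match(frame, patch)
print(f"best match at {loc}, NCC = {score:.3f}")   # expect (200, 100), ~1.0
```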
UAV Sensing and Analysis II
Characterization of unmanned aerial systems (UAS) geometry using feeds from fixed video cameras
Srikanth Gururajan, Matthew Dreyer
With the widespread use of multirotor UAS/drones in the civilian and commercial sectors, the skies are going to get crowded. Under this scenario, it is not inconceivable to anticipate issues with enforcing flight rules and regulations on these drones. These might come about as a result of failures on the drones themselves (loss of communication, sensor or actuator failures) or, in some cases, deliberately uncooperative drones. Therefore, in order to implement effective Counter UAS (C-UAS) measures, it is important to fully characterize the uncooperative drone, particularly its capabilities; the first step in this process is the identification of the geometry of the drone. In this paper, we present the preliminary results of an effort to characterize the geometry of a drone using feeds from fixed video cameras. Preliminary results indicate that it is feasible to identify the general geometry of the drone, such as whether it is a quadcopter or another configuration.
UAV detection with a dataset augmented by domain randomization
Diego Marez, Samuel Borden, Lena Nans
Object detection for computer vision systems continues to be a complicated problem in real-world situations. For instance, autonomous vehicles need to operate with very small margins of error as they encounter safety-critical scenarios such as pedestrian and vehicle detection. The increased use of unmanned aerial vehicles (UAVs) by both government and private citizens has created a need for systems which can reliably detect UAVs in a large variety of conditions and environments. In order to achieve small margins of error, object detection systems, especially those reliant on deep learning methods, require large amounts of annotated data. The use of synthetic datasets provides a way to alleviate the need to collect annotated data. Unfortunately, the nature of synthetic dataset generation introduces a reality and simulation gap that hinders an object detector's ability to generalize on real world data. Domain randomization is a technique that generates a variety of different scenarios in a randomized fashion both to close the reality and simulation gap and to augment a hand-crafted dataset. In this paper, we combine the AirSim simulation environment with domain randomization to train a robust object detector. As a final step, we fine-tune our object detector on real-world data and compare it with object detectors trained solely on real-world data.
Deep net route generation faster than a bullet
Recent breakthroughs in deep net processing have shown the ability to compute solutions to physics-based problems, such as the three-body problem, many orders of magnitude faster. In this paper, we show how a deep autoencoder, trained on paths generated using a dynamical, physics-based model, can generate comparable routes much faster. The autogenerated routes have all the properties of a physics-based model without the computational burden of explicitly solving the dynamical equations. This result is useful for planning and for multi-agent reinforcement learning simulations. In addition, the fast route planning capability may prove useful in real-time situations such as collision avoidance or fast dynamic targeting response.
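The sketch below shows one plausible way to set up a deep autoencoder on fixed-length waypoint sequences, in the spirit of the approach described above; the layer sizes, route length, and random training data are assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

ROUTE_LEN = 64          # number of (x, y, z) waypoints per route (assumed)
DIM = ROUTE_LEN * 3

class RouteAutoencoder(nn.Module):
    """Compress a flattened route into a small latent code and decode it back."""
    def __init__(self, latent=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(DIM, 256), nn.ReLU(),
                                     nn.Linear(256, latent))
        self.decoder = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(),
                                     nn.Linear(256, DIM))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = RouteAutoencoder()
routes = torch.randn(32, DIM)              # stand-in for physics-generated routes
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = nn.functional.mse_loss(model(routes), routes)
loss.backward()
optim.step()
```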
Design of trace-based NS-3 simulations for UAS video analytics with geospatial mobility
Chengyi Qu, Alicia Esquivel Morel, Drew Dahlquist, et al.
The continuous evolution of commercial Unmanned Aerial Systems (UAS) is fuelling a rapid advancement in the fields of network edge-communication applications for smart agriculture, smart traffic management, and border security. A common problem in UAS (a.k.a. drone systems) research and development is the cost related to deploying and running realistic testbeds. Due to the constraints in safe operation, handling limited energy resources, and government regulation restrictions, UAS testbed building is time-consuming and not easily configurable for high-scale experiments. In addition, experimenters have a hard time creating repeatable and reproducible experiments to test major hypotheses. In this paper, we present a design for performing trace-based NS-3 simulations that can be helpful for realistic UAS simulation experiments. We run experiments with real-world UAS traces including various mobility models, geospatial link information, and video analytics measurements. Our experiments assume a hierarchical UAS platform with low-cost/high-cost drones co-operating using a geo-location service in order to provide a ‘common operating picture’ for decision makers. We implement a synergized drone and network simulator that features three main modules: (i) a learning-based optimal scheme selection module, (ii) an application environment monitoring module, and (iii) a trace-based simulation and visualization module. Simulations generated from our implementation have the ability to integrate different drone configurations, wireless communication links (air-to-air; air-to-ground), as well as mobility routing protocols. Our approach is beneficial for evaluating network-edge orchestration algorithms pertaining to, e.g., management of energy consumption, video analytics performance, and networking protocol configuration.
DeepOSM-3D: recognition in aerial LiDAR RGBD imagery
In this paper, we present a pipeline and prototype vision system for near-real-time semantic segmentation and classification of objects such as roads, buildings, and vehicles in large, high-resolution, wide-area, real-world aerial LiDAR point-cloud and RGBD imagery. Unlike previous works, which have focused on exploiting ground-based sensors or narrowed the scope to detecting the density of large objects, here we address the full semantic segmentation of aerial LiDAR and RGBD imagery by exploiting crowd-sourced labels that densely canvas each image in the 2015 Dublin dataset [1]. Our results indicate important improvements to detection and segmentation accuracy with the addition of aerial LiDAR over RGB imagery alone, which has important implications for civilian applications such as autonomous navigation and rescue operations. Moreover, the prototype system can segment and search geographic areas as large as 1 km² in a matter of seconds on commodity hardware with high accuracy (~90%), suggesting the feasibility of real-time scene understanding on small aerial platforms.
Full Motion Video Analytics
Short-Term Video Stabilization Using Ground Plane Segmentation for Low Altitude UAV Object Tracking (Conference Presentation)
Deniz Kavzak Ufuktepe, Jaired Collins, Hadi AliAkbarpour, et al.
Fast, efficient, and robust algorithms are needed for real-time visual tracking that can also run smoothly on airborne embedded systems. The flux tensor can be used to provide motion-based cues for visual tracking. In order to apply object motion detection to a raw image sequence captured by a moving platform, the motion caused by the camera movement must first be stabilized. Using feature points to estimate the homography between frames is a simple registration method that can be used for stabilization. For a good homography estimate, most of the feature points should lie on the same plane in the images; however, when the scene has complex structure, estimating a good homography becomes very challenging. In this work, we propose a robust video stabilization algorithm that allows flux-based motion detection to efficiently identify moving objects. Our experiments show satisfactory results where other methods have been shown to fail on the same type of raw videos.
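A bare-bones version of the feature-based registration step mentioned in this abstract (ORB features, RANSAC homography, warp to the reference frame), without the ground-plane segmentation the paper adds; function and variable names are illustrative.

```python
import cv2
import numpy as np

def stabilize_to_reference(ref_gray, cur_gray):
    """Warp `cur_gray` into the coordinate frame of `ref_gray` via a homography."""
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(ref_gray, None)
    k2, d2 = orb.detectAndCompute(cur_gray, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    matches = sorted(matches, key=lambda m: m.distance)[:500]
    pts_ref = np.float32([k1[m.queryIdx].pt for m in matches])
    pts_cur = np.float32([k2[m.trainIdx].pt for m in matches])
    # RANSAC discards correspondences that do not fit a single plane-induced
    # homography, which is why points off the dominant plane hurt the estimate.
    H, _ = cv2.findHomography(pts_cur, pts_ref, cv2.RANSAC, 3.0)
    return cv2.warpPerspective(cur_gray, H, ref_gray.shape[::-1])
```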
Proper synchronization of geospatial metadata in motion imagery and its evaluation
Bastian Erdnuess
Motion imagery with geospatial metadata are recordings that are used to provide information about the observed scene. Given the comparably high speed and agility of the sensor platform (usually some kind of aircraft), the metadata has to be synchronized very accurately to each individual recorded image to yield accurate results. The quality of the geospatial metadata can be evaluated by a 3D reconstruction of a motion imagery sequence (with software like Agisoft Metashape [1] or COLMAP [2,3]) and comparison of the reconstructed camera poses with the camera poses derived from the metadata. The results obtained so far suggest that a synchronization mismatch between the video frames and the metadata is often one of the largest sources of inaccuracy in the geospatial metadata, and one of the easiest to avoid. For this reason, we assembled our own system from a commercially available image sensor and metadata module that can be attached to a small aircraft, and evaluated the quality of its metadata on a test flight. This article describes the system used and the result of the metadata calibration [4] performed to evaluate the quality of the metadata and its synchronization to the image frames.
pyTAG: python-based interactive training data generation for visual tracking algorithms (Conference Presentation)
In this study, a rapid training data and ground truth generation tool has been implemented for visual tracking. The proposed tool's plugin structure allows integration, testing, and validation of different trackers. The tracker can be paused, resumed, forwarded, rewound, and re-initialized on the fly after it loses the object, which is a necessary step in training data generation. This tool has been implemented to help researchers rapidly generate ground truth and training data, fix annotations, and run and visualize their own single-object trackers or existing object tracking techniques.
Maritime LOD balancing: evaluating the effect of level of detail on ship classification
Cameron Hilton, Jane Berk, Shibin Parameswaran
Synthetic data has been shown to be an effective proxy for real data for training computer vision algorithms when acquiring labeled data is costly or impossible. Ship detection and classification from satellite imagery and surveillance video is one such area, and images generated using gaming engines such as Unity3D have been used successfully to circumvent the need for annotated real data. However, there is a lack of understanding of how the rendering quality of 3D models affects algorithms trained on synthetic data. In this work, we investigate how the level of detail (LOD) of objects in a maritime scene affects ship classification algorithms. To study this systematically, we create datasets featuring objects with varying LODs and observe their significance for computer vision algorithms. Specifically, we evaluate the impact of mismatched LOD datasets on classification algorithms and investigate the effect of low- or high-LOD datasets on a model's ability to transfer to real data. The LOD of the 3D objects is quantified using image quality metrics, while the performance of the computer vision algorithms is compared using accuracy metrics.
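As a hedged illustration of quantifying LOD differences with image quality metrics (the paper's exact metrics are not reproduced here), the snippet below scores a lower-LOD rendering against a high-LOD reference using PSNR and SSIM from scikit-image; the inputs are synthetic stand-ins for renderings of the same ship and viewpoint.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def lod_similarity(reference, rendering):
    """Score a lower-LOD rendering against a high-LOD reference of the same view.

    Both inputs are float grayscale images in [0, 1]; higher PSNR/SSIM means the
    lower-LOD model preserves more of the reference appearance.
    """
    psnr = peak_signal_noise_ratio(reference, rendering, data_range=1.0)
    ssim = structural_similarity(reference, rendering, data_range=1.0)
    return psnr, ssim

# Stand-in data: a "high-LOD" image and a degraded copy acting as "low-LOD".
rng = np.random.default_rng(0)
hi_lod = rng.random((256, 256))
lo_lod = np.clip(hi_lod + 0.05 * rng.standard_normal(hi_lod.shape), 0, 1)
print(lod_similarity(hi_lod, lo_lod))
```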
Environmental and Geospatial Analytics
Robust terrain classification of high spatial resolution remote sensing data employing probabilistic feature fusion and pixelwise voting
R. Derek West, Brian J. Redman, David A. Yocky, et al.
There are several factors that should be considered for robust terrain classification. We address the issue of high pixel-wise variability within terrain classes from remote sensing modalities, when the spatial resolution is less than one meter. Our proposed method segments an image into superpixels, makes terrain classification decisions on the pixels within each superpixel using the probabilistic feature fusion (PFF) classifier, then makes a superpixel-level terrain classification decision by the majority vote of the pixels within the superpixel. We show that this method leads to improved terrain classification decisions. We demonstrate our method on optical, hyperspectral, and polarimetric synthetic aperture radar data.
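The superpixel-level majority vote described above can be sketched as follows, with SLIC superpixels standing in for the segmentation and a precomputed per-pixel label map standing in for the PFF classifier output (which is not reproduced here).

```python
import numpy as np
from skimage.segmentation import slic

def superpixel_majority_vote(image, pixel_labels, n_segments=500):
    """Replace each pixel's class by the majority class of its superpixel.

    image        : RGB array used only to form superpixels.
    pixel_labels : integer class map from a pixel-wise classifier (e.g. PFF).
    """
    segments = slic(image, n_segments=n_segments, compactness=10, start_label=0)
    voted = np.empty_like(pixel_labels)
    for sp in np.unique(segments):
        mask = segments == sp
        # Majority vote over the pixel-wise decisions inside this superpixel.
        voted[mask] = np.bincount(pixel_labels[mask]).argmax()
    return voted

# Example usage: smoothed = superpixel_majority_vote(rgb_image, pff_label_map)
```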
Deep learning model for accurate vegetation classification using RGB image only
The objective of this paper is to detect the type of vegetation so that a more accurate Digital Terrain Model (DTM) can be generated by excluding vegetation from the Digital Surface Model (DSM) based on the vegetation type (such as trees). In this way, many different inpainting methods can subsequently be applied to restore the terrain information at the removed vegetation pixels of the DSM and obtain a more accurate DTM. We trained three DeepLabV3+ models with three different datasets collected at different resolutions. Among the three DeepLabV3+ models, the model trained with the dataset whose image resolution is closest to that of the test images provided the best performance, and the semantic segmentation results with this model looked highly promising.
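For context, one common way to instantiate and fine-tune a DeepLabV3-style segmentation model in PyTorch is via torchvision; note that torchvision ships DeepLabV3 rather than DeepLabV3+, and the class set and input tiles below are assumptions, not the paper's datasets.

```python
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

NUM_CLASSES = 3   # e.g. background, low vegetation, trees (assumed label set)

# torchvision provides DeepLabV3 (without the "+" decoder); it is used here
# only to show a fine-tuning setup, not the exact model used in the paper.
model = deeplabv3_resnet50(weights=None, num_classes=NUM_CLASSES)
model.train()

images = torch.randn(2, 3, 512, 512)                  # stand-in RGB tiles
targets = torch.randint(0, NUM_CLASSES, (2, 512, 512))  # stand-in label maps
out = model(images)["out"]                             # (N, C, H, W) logits
loss = torch.nn.functional.cross_entropy(out, targets)
loss.backward()
```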
Performance comparison of different inpainting algorithms for accurate DTM generation
To accurately extract a digital terrain model (DTM), it is necessary to remove from the digital surface model (DSM) the heights due to vegetation, such as trees and shrubs, and manmade structures such as buildings and bridges. The resulting DTM can then be used for construction planning, land surveying, etc. Normally, the process of extracting a DTM involves two steps. First, accurate land cover classification is required. Second, an image inpainting process is needed to fill in the missing pixels left by trees, buildings, bridges, etc. In this paper, we focus on the second step of using image inpainting algorithms for terrain reconstruction. In particular, we evaluate seven conventional and deep-learning-based inpainting algorithms from the literature using two datasets. Both objective and subjective comparisons were carried out. It was observed that some algorithms yielded slightly better performance than others.
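Two of the classical inpainting algorithms typically included in such comparisons are available directly in OpenCV; the sketch below fills masked DSM pixels with both methods on a synthetic stand-in raster (the paper's datasets and deep learning methods are not reproduced).

```python
import cv2
import numpy as np

# Synthetic stand-ins: a smooth terrain raster plus a mask marking removed
# vegetation/building pixels (in practice these come from the DSM and the
# land-cover classification step).
y, x = np.mgrid[0:256, 0:256]
dsm = (20 + 5 * np.sin(x / 40.0) + 3 * np.cos(y / 60.0)).astype(np.float32)
mask = np.zeros((256, 256), np.uint8)
mask[80:140, 100:180] = 255          # pretend a building stood here

# OpenCV inpainting operates on 8-bit images, so normalize the DSM first.
lo, hi = float(dsm.min()), float(dsm.max())
dsm8 = np.uint8(255 * (dsm - lo) / (hi - lo))

telea = cv2.inpaint(dsm8, mask, inpaintRadius=5, flags=cv2.INPAINT_TELEA)
ns = cv2.inpaint(dsm8, mask, inpaintRadius=5, flags=cv2.INPAINT_NS)

# Map back to heights; the two reconstructions can then be compared against
# a reference DTM with RMSE or similar objective measures.
dtm_telea = telea.astype(np.float32) / 255 * (hi - lo) + lo
dtm_ns = ns.astype(np.float32) / 255 * (hi - lo) + lo
```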
Geospatial Informatics Applications
An exploration of NIIRS, image quality, and machine learning
The interpretability of an image indicates the potential intelligence value of the data. Historically, the National Imagery Interpretability Rating Scale (NIIRS) has been the standard for quantifying the intelligence potential based on image analysis by human observers. Empirical studies have demonstrated that spatial resolution is the dominant predictor of the NIIRS level of an image. Today, the value of imagery is no longer simply determined by spatial resolution, since additional factors such as spectral diversity and temporal sampling are significant. Furthermore, analyses are performed by machines as well as humans. Consequently, NIIRS no longer accurately quantifies potential intelligence value for an image or set of images. We are exploring new measures of information potential based on mutual information. Our research suggests that new measures of image “quality” based on information theory can provide meaningful standards that go beyond NIIRS. In our approach, mutual information provides an objective method for quantifying divergence across objects and activities in an image. This paper presents the rationale for our approach, the technical description, and the results of early experimentation to explore the feasibility of establishing an information-theoretic standard for quantifying the intelligence potential of an image.
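As a small, generic example of the kind of mutual-information computation alluded to above (not the authors' proposed measure), the following estimates MI between two co-registered arrays from their joint histogram.

```python
import numpy as np

def mutual_information(a, b, bins=64):
    """Estimate mutual information (in bits) between two equally sized arrays."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()                 # joint probability estimate
    px = pxy.sum(axis=1, keepdims=True)       # marginal of a
    py = pxy.sum(axis=0, keepdims=True)       # marginal of b
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

# Example with two hypothetical co-registered bands of the same scene:
# mi = mutual_information(band_red, band_nir)
```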
GAN-based unpaired image-to-image translation for maritime imagery
Chelsea Mediavilla, Jonathan Sato, Mitch Manzanares, et al.
Generating imagery using gaming engines has become a popular method to either augment or completely replace the need for real data. This is due largely to the fact that gaming engines, such as Unity3D and Unreal, have the ability to produce novel scenes and ground-truth labels quickly and at low cost. However, there is a disparity between rendering imagery in the digital domain and testing in the real domain on a deep learning task. This disparity/gap is commonly known as domain mismatch or domain shift, and without a solution, it renders synthetic imagery impractical and ineffective for deep learning tasks. Recently, Generative Adversarial Networks (GANs) have shown success at generating novel imagery and overcoming this gap between two different distributions by performing cross-domain transfer. In this research, we explore the use of state-of-the-art GANs to perform a domain transfer from a rendered synthetic domain to a real domain. We evaluate the data generated using an image-to-image translation GAN on a classification task as well as through qualitative analysis.
Passive identification of vessel type through track motion analysis
Peter C. Yung, John M. Irvine
Maritime situational awareness depends on accurate knowledge of the locations, types, and activities of ocean-bound vessels. Such data can be gathered by analyzing the motion patterns of vessel tracks collected using coastal radar, visual identification, and Automatic Identification System (AIS) reports. We have developed a technique for predicting the types of vessels from abstract representations of their motion patterns. Our approach involves constructing multiple state sequences which represent activities syntactically. From these sequences, we generate multi-state transition matrices, which are the central feature used to train a support-vector machine classifier. Applying this technique to historical AIS data, our model successfully predicts vessel type even in cases where vessels do not follow known routes. Using only location information as the base feature for our model, we circumvent classification issues that arise from vessels' non-compliance with AIS regulations as well as the inability to visually identify vessels.
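A simplified sketch of the track-feature pipeline described above: discretize a track's speeds into motion states, build a first-order transition matrix, and feed the flattened matrix to a support-vector classifier. The state definitions, thresholds, and stand-in data are illustrative assumptions, not the authors' exact representation.

```python
import numpy as np
from sklearn.svm import SVC

N_STATES = 4  # e.g. stopped / slow / cruising / fast (assumed discretization)

def speed_to_states(speeds_knots):
    """Discretize a speed series into motion states (assumed thresholds)."""
    return np.digitize(speeds_knots, [0.5, 5.0, 12.0])

def transition_matrix(states, n=N_STATES):
    """Row-normalized first-order transition matrix of a state sequence."""
    m = np.zeros((n, n))
    for a, b in zip(states[:-1], states[1:]):
        m[a, b] += 1
    m /= np.maximum(m.sum(axis=1, keepdims=True), 1)
    return m.ravel()          # flattened matrix serves as the classifier feature

# Stand-in tracks and vessel-type labels (real features would come from AIS).
rng = np.random.default_rng(0)
tracks = [rng.uniform(0, 20, size=200) for _ in range(40)]
labels = rng.integers(0, 3, size=40)

X = np.stack([transition_matrix(speed_to_states(t)) for t in tracks])
clf = SVC(kernel="rbf").fit(X, labels)
```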
Long time-series analysis of urban development based on effective building extraction
The effective detection of urban development is the basis for understanding urban sustainability. Although various studies have concentrated on long-time-series analysis of urban development, the resolution of the images used was too low to focus on individual objects. In this paper, we provide a long-time-series analysis of built-up areas at an annual frequency in Beijing, China, from 2000 to 2015, based on automatic building extraction from high-resolution satellite images. We propose a deep-learning-based method to extract buildings and employ an ensemble learning method to improve the localization of boundaries. The time-series results of built-up areas are analyzed under two schemes, i.e., change detection over the past fifteen years and evaluation of the whole region in three selected years. Our proposed method achieves an average overall accuracy (OA) of 93%. The results reveal that Beijing developed more rapidly during 2001-2008 than during other periods in terms of the density and number of buildings.