Proceedings Volume 11139

Applications of Machine Learning


Volume Details

Date Published: 6 November 2019
Contents: 10 Sessions, 36 Papers, 8 Presentations
Conference: SPIE Optical Engineering + Applications 2019
Volume Number: 11139

Table of Contents

  • Front Matter: Volume 11139
  • Machine Vision and Manufacturing
  • Electro-Optics, Imaging, and Video Processing
  • Computing Hardware for Machine Learning
  • Remote Sensing
  • Big Data, Simulations, and Theory I
  • Big Data, Simulations, and Theory II
  • Medical Imaging and Healthcare I
  • Medical Imaging and Healthcare II
  • Poster Session
Front Matter: Volume 11139
This PDF file contains the front matter associated with SPIE Proceedings Volume 11139, including the Title Page, Copyright Information, Table of Contents, Author and Conference Committee lists.
Machine Vision and Manufacturing
Machine learning prediction of defect types for electroluminescence images of photovoltaic panels
Despite recent technological advances for photovoltaic panel maintenance (electroluminescence imaging, drone inspection), only a few large-scale studies achieve identification of the precise category of defects or faults. In this work, electroluminescence-imaged modules are automatically split into cells using projections on the x and y axes to detect cell boundaries. Regions containing potential defects or faults are then detected using the Hough transform combined with mathematical morphology. Care is taken to remove most of the bus bars and cell boundaries. Afterwards, 25 features are computed, focusing on both the geometry of the regions (e.g. area, perimeter, circularity) and the statistical characteristics of their pixel values (e.g. median, standard deviation, skewness). Finally, features are mapped to the ground-truth labels with Support Vector Machine (RBF kernel) and Random Forest algorithms, coupled with undersampling and SMOTE oversampling, using stratified 5-fold cross-validation. A dataset of 982 electroluminescence images of installed multi-crystalline photovoltaic modules was acquired in outdoor conditions (evening) with a CMOS sensor. After automatic blur detection, 753 images (47,244 cells) remain for fault evaluation. All images were evaluated by experts in PV fault detection, who labelled finger failures and three types of cracks according to their severity levels (A, B and C). Our results, based on 6 data series, yield an accuracy of 0.997 and a recall of 0.274 with the Support Vector Machine. Improving the region detection process will most likely improve performance.
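As a rough illustration of the evaluation protocol named in the abstract (RBF-kernel SVM, SMOTE oversampling applied only to the training folds, stratified 5-fold cross-validation), a minimal Python sketch follows; the feature matrix X (the 25 region features) and label vector y are placeholders, not the authors' pipeline.

```python
# Illustrative sketch, not the authors' code: RBF-kernel SVM with SMOTE
# oversampling, scored by stratified 5-fold cross-validation.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score, recall_score
from imblearn.over_sampling import SMOTE

def evaluate_svm_smote(X, y, seed=0):
    """X: (n_regions, 25) feature array; y: defect-type labels (placeholders)."""
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    accs, recalls = [], []
    for train_idx, test_idx in skf.split(X, y):
        # Oversample only the training fold so no test information leaks in.
        X_res, y_res = SMOTE(random_state=seed).fit_resample(X[train_idx], y[train_idx])
        clf = SVC(kernel="rbf", gamma="scale").fit(X_res, y_res)
        pred = clf.predict(X[test_idx])
        accs.append(accuracy_score(y[test_idx], pred))
        recalls.append(recall_score(y[test_idx], pred, average="macro"))
    return float(np.mean(accs)), float(np.mean(recalls))
```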
Deep learning-based semantic segmentation for in-process monitoring in laser welding applications
The broad use of laser welding in various industrial applications such as shipbuilding, automotive production and battery manufacturing results from its capabilities of high productivity, flexibility and effectiveness [1]. However, the complex nature of laser-material interaction requires additional measures in order to reach the high-quality standards of the goods produced. Therefore, continuous process monitoring in laser welding is crucial to achieve reliable mass production and high-quality products at once. Camera-based process monitoring offers great advantages compared to one-dimensional observation techniques. The spatial resolution enables the monitoring of several process characteristics simultaneously, which leads to a more detailed description of the current process state [2]. In the last few years, we proposed a coaxially integrated camera system with external illumination. Process images taken by this system typically show the keyhole area and the weld pool, but also areas of solidified weld and areas of the blank sheet [3]. To automate image evaluation with respect to the recognition of the aforementioned areas, we propose a convolutional neural network architecture to perform pixel-wise image classification [4]. In this paper, we investigate the influence of multiple hyper-parameters required for the network architecture in use, but also the amount of data that is necessary for high segmentation accuracies. In a second step, the outcome of the network is used to detect process deviations in laser welding image data using supervised machine learning. With the help of the Random Forest algorithm, the extracted process characteristics are assessed with respect to prediction accuracy. Based on the information in the segmented image data, further investigations are carried out into the possibility of predicting individual process parameters such as laser power, welding speed and focus size simultaneously.
Image classification and control of microfluidic systems
Albert B. Chu, Du Nguyen, Alan D. Kaplan, et al.
Current microfluidic-based microencapsulation systems rely on human experts to monitor and oversee the entire process spanning hours in order to detect and rectify when defects are found. This results in high labor costs, degradation and loss of quality in the desired collected material, and damage to the physical device. We propose an automated monitoring and classification system based on deep learning techniques to train a model for image classification into four discrete states. Then we develop an actuation control system to regulate the flow of material based on the predicted states. Experimental results of the image classification model show class average recognition rate of 95.5%. In addition, simulated test runs of our valve control system verify its robustness and accuracy.
Deep learning for automated defect detection in high-reliability electronic parts
Emily A. Donahue, Tu-Thach Quach, Kevin Potter, et al.
Recent advances in deep learning have shown promising results for anomaly detection that can be applied to the problem of defect detection in electronic parts. In this work, we train a deep learning model with Generative Adversarial Networks (GANs) to detect anomalies in images of X-ray CT scans. The GANs detections can then be reviewed by an analyst to confirm the presence or absence of a defect in a scan, significantly reducing the amount of time required to analyze X-Ray CT scans. We employ a trained GAN via a system referred to in the literature as an AnoGAN. We train the AnoGAN on images of X-Ray CT scans from normal, non-defective components until it is capable of generating images that are indistinguishable from genuine part scans. Once trained, we query the AnoGAN with an image of an X-ray CT scan that is known to contain a defect, such as a crack or a void. By sampling the GANs latent space, we generate an image that is as visually close to the query image as possible. Because the AnoGAN has learned a distribution over non-defective parts, it can only produce images without defects. By taking the difference between the query image and the generated image, we are able to highlight anomalous areas in the defective part. We hypothesize that this work can be used to improve speed and accuracy for quality assurance of manufactured parts by applying machine learning to non-destructive imaging.
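A minimal sketch of the AnoGAN query step described above, assuming a pre-trained PyTorch generator G that maps latent vectors to defect-free scan images; names, shapes and optimizer settings are illustrative, not the authors' implementation.

```python
import torch

def anogan_anomaly_map(G, query_img, latent_dim=100, steps=500, lr=0.01):
    """Search the latent space for the generated image closest to the query scan."""
    z = torch.randn(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        residual = torch.abs(G(z) - query_img).mean()   # pixel-space residual loss
        residual.backward()
        opt.step()
    # The generator only produces defect-free images, so the remaining
    # difference highlights anomalous (defective) regions of the query.
    with torch.no_grad():
        return torch.abs(G(z) - query_img)
```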
Electro-Optics, Imaging, and Video Processing
Sensor networks and artificial intelligence for real time motion analysis
This paper addresses the problem of motion analysis performed on digital data captured by a network of motion sensors distributed over a three-dimensional field of interest. Motion analysis means performing motion detection, motion-oriented classification, estimation and prediction of kinematic parameters, tracking to build trajectories, and warning of the occurrence of potential abnormalities, incidents or accidents. Kinematic parameters are defined as spatial and temporal positions, velocity, scale and orientation. The entire system can be decomposed into three major components. First, a network of sensors captures and generates all relevant motion information. Second, a tree-structured telecommunication system concentrates all motion information to a data sink or gateway. Third, an Artificial Intelligence (A.I.) in a remote monitoring center processes the entire data stream transmitted from the gateway. The A.I. is composed of three major components: a Simulating Software, a Deep Learning System, and an Expert System. This paper addresses the structural relation between the motion sensor network and the artificial intelligence in order to display on a screen a complete, real-time motion analysis of the events taking place in a three-dimensional field of interest. This work addresses and compares different motion sensors, the reference network being made of motion sensors based on passive photodetection. Other sensor networks of interest are based on active detection, namely ultrasonic waves (SONAR), microwaves (RADAR) and lasers (LIDAR). A limited number of video cameras turns out to be unavoidable with any motion sensor network, whether active or passive, distributed or localized. Video cameras are required to produce high-resolution images allowing pattern recognition and motion disambiguation. To conclude, a comparison is presented of different distributed systems that perform motion analysis through different potential technologies for motion sensor networks. To network efficiently in real time with an A.I., two main challenging questions are raised, related first to the structure of the motion information and second to the amount of data to be transmitted. Distributed passive photodetection sensor networks are optimal solutions for long-term indoor or short-term outdoor analyses. Active sensor networks are optimal solutions to extend long-term motion analysis to surrounding outdoor areas.
Semantic segmentation in egocentric video frames with deep learning for recognition of activities of daily living
José A. Zamorano Raya, Mireya S. García Vázquez, Juan C. Jaimes Méndez, et al.
The analysis of videos for the recognition of Instrumental Activities of Daily Living (IADL) through object detection and context analysis, applied to the evaluation of the capacity of patients with Alzheimer's disease and age-related dementia, has recently gained a lot of interest. The incorporation of human perception into recognition, search, detection and visual content understanding tasks has become one of the main tools for the development of systems and technologies that support the performance of people in their daily life activities. In this paper we propose a model for automatic segmentation of the salient region where the objects of interest are found in egocentric video using fully convolutional networks (FCN). The segmentation is performed with information regarding human perception, obtaining a better segmentation at the pixel level. This segmentation involves the objects of interest and the salient region in egocentric videos, providing precise information to detection systems and to automatic indexing of objects in video, where these systems have improved their performance in the recognition of IADL. To measure the model's segmentation performance on the salient region, we benchmark on two databases: first, the Georgia Tech Egocentric Activity database, and second, our own database. Results show that the method achieves a significantly better performance in the precision of the semantic segmentation of the region where the objects of interest are located, compared with the GBVS (Graph-Based Visual Saliency) method.
Image disambiguation with deep neural networks
Omar DeGuchy, Alex Ho, Roummel F. Marcia
In many signal recovery applications, measurement data is comprised of multiple signals observed concurrently. For instance, in multiplexed imaging, several scene subimages are sensed simultaneously using a single detector. This technique allows for a wider field-of-view without requiring a larger focal plane array. However, the resulting measurement is a superposition of multiple images that must be separated into distinct components. In this paper, we explore deep neural network architectures for this image disambiguation process. In particular, we investigate how existing training data can be leveraged and improve performance. We demonstrate the effectiveness of our proposed methods on numerical experiments using the MNIST dataset.
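A toy sketch of this setup, assuming the multiplexed measurement is modeled as the sum of two MNIST digits and a small network is trained to recover both components; the architecture is an illustrative stand-in, not the paper's networks.

```python
import torch
import torch.nn as nn

class Disambiguator(nn.Module):
    """Maps one superimposed 28x28 measurement to two reconstructed digit images."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 512), nn.ReLU(),
            nn.Linear(512, 2 * 28 * 28), nn.Sigmoid(),
        )

    def forward(self, mixed):
        return self.net(mixed).view(-1, 2, 28, 28)

# Training pairs (illustrative): mixed = (img_a + img_b).clamp(0, 1),
# target = torch.stack([img_a, img_b], dim=1); minimize MSE(model(mixed), target).
```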
Restoration of turbulence-degraded images based on deep convolutional network
Atmospheric turbulence is an irregular form of motion in the atmosphere. Because of turbulence interference, when an optical system images a target through the atmosphere, the observed image exhibits point-intensity diffusion, image blur, image drift and other turbulence effects. Digital restoration of turbulence-degraded images, which removes the blurring effect and suppresses the noise, is a classical ill-conditioned problem. Traditional approaches relying on image heuristics suffer from high-frequency noise amplification and processing artifacts. In this paper, the image degradation model of the turbulent flow is given, the point spread function of turbulence is approximated by a Gaussian function model, and a general framework of neural networks for restoring turbulence-degraded images is investigated. Blur and additive noise are considered simultaneously. Two solutions, respectively exploiting fully convolutional networks (FCN) and conditional Generative Adversarial Networks (CGAN), are presented. The FCN, based on minimizing the mean squared reconstruction error (MSE) in pixel space, achieves high PSNR. On the other hand, the CGAN, based on a perceptual loss optimization criterion, retrieves more textures. We conduct comparison experiments to demonstrate the performance at different degrees of turbulence intensity from the training configuration. The results indicate that the proposed networks outperform traditional approaches in restoring high-frequency details and suppressing noise effectively.
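A small sketch of the degradation model just described: a Gaussian approximation of the turbulence point spread function followed by additive noise. Parameter values are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def degrade(image, psf_sigma=2.0, noise_sigma=0.01, rng=None):
    """Simulate turbulence blur (Gaussian PSF) plus additive Gaussian noise."""
    if rng is None:
        rng = np.random.default_rng()
    blurred = gaussian_filter(image, sigma=psf_sigma)            # Gaussian PSF blur
    noisy = blurred + rng.normal(0.0, noise_sigma, image.shape)  # additive noise
    return np.clip(noisy, 0.0, 1.0)
```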
Computing Hardware for Machine Learning
Digital neuromorphic chips for deep learning inference: a comprehensive study
Over the past few years, deep neural networks have achieved state-of-the-art accuracy in a broad spectrum of applications. However, implementing deep networks on general purpose architectures is a challenging task, as they require high computational resources and massive memory bandwidth. Recently, several digital neuromorphic chips have been proposed to address these issues. In this paper, we explore sixteen prominent rate-based digital neuromorphic chip architectures, optimized primarily for inference. Specific focus is on: What is the motivation to design digital neuromorphic chips? Which optimizations play a key role in improving their performance? What are the main research trends in current generation chips?
Digit recognition based on programmable nanophotonic processor
Artificial neural networks are computational models inspired by biological neural networks, playing a significant role in image recognition, language translation, computer vision and other fields. In this paper, we propose a fully optical neural network (ONN) based on a programmable nanophotonic processor (PNP) to realize digit recognition. The architecture includes 4 layers of cascaded Mach–Zehnder interferometers (MZIs), which can theoretically execute the matrix operations corresponding to a two-layer fully connected ANN with four inputs. We simulate the cascaded MZIs and adjust the phase shifters to match weight matrices calculated beforehand by an ANN on a computer. The accuracy for 4-class handwritten digits in the ONN is 80.29% due to the compressed input data. The accuracy for 10-class digits reaches 99.23% when the number of input nodes increases to just 36. The results demonstrate that handwritten digits can be recognized effectively through the PNP-based ONN and that the PNP construction could be extended to more complex recognition systems.
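For reference, a textbook-style model of a single Mach-Zehnder interferometer as a 2x2 unitary, parameterized by an internal phase theta and an external phase phi; cascading such blocks yields the programmable matrix operations mentioned above. This is a common convention, not necessarily the exact one used by the authors.

```python
import numpy as np

def mzi(theta, phi):
    """2x2 unitary of one MZI: external phase, 50:50 coupler, internal phase, coupler."""
    bs = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)   # 50:50 beam splitter
    inner = np.diag([np.exp(1j * theta), 1.0])       # internal phase shifter
    outer = np.diag([np.exp(1j * phi), 1.0])         # external phase shifter
    return outer @ bs @ inner @ bs

u = mzi(0.3, 1.1)
print(np.allclose(u.conj().T @ u, np.eye(2)))  # True: the transfer matrix is unitary
```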
Remote Sensing
Reconstruct missing pixels of Landsat land surface temperature product using a CNN with partial convolution
U.S. Landsat Analysis Ready Data (ARD) recently included the Land Surface Temperature (LST) product, which contains widespread and irregularly-shaped missing pixels due to cloud contamination or incomplete satellite coverage. Many analyses rely on complete LST images; therefore, techniques that accurately fill data gaps are needed. Here, the development of a partial-convolution based model with a U-Net-like architecture to reconstruct the missing pixels in the ARD LST images is discussed. The original partial convolution layer is modified to consider both the convolution kernel weights and the number of valid pixels in the calculation of the mask correction ratio. In addition, a new partial merge layer is developed to merge feature maps according to their masks. Pixel reconstruction using this model was conducted using Landsat 8 ARD LST images in Colorado between 2014 and 2018. Complete LST patches (64×64) for two identical scenes acquired on different dates (up to 48 days apart) were randomly paired with ARD cloud masks to generate the model inputs. The model was trained for 10 epochs and the validation results show that the average RMSE values for a restored LST image in the unmasked, masked, and whole regions are 0.29 K, 1.00 K, and 0.62 K, respectively. In general, the model is capable of capturing the high-level semantics from the inputs and bridging the difference in acquisition dates for gap filling. The transition between the masked and unmasked regions (including the edge area of the image) in restored images is smooth and reflects realistic features (e.g., LST gradients). For large masked areas, the reference provides semantics at both low and high levels.
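A minimal sketch of the partial-convolution step underlying this model, following the original formulation (output re-scaled by a mask correction ratio, mask updated wherever any valid pixel contributed); the paper's modification additionally weights the ratio by the kernel weights, which is not reproduced here.

```python
import torch
import torch.nn.functional as F

def partial_conv(x, mask, weight, bias=None):
    """x: (N, C, H, W) image; mask: (N, 1, H, W), 1 = valid pixel; weight: (Cout, C, k, k)."""
    k = weight.shape[-1]
    pad = k // 2
    ones = torch.ones(1, 1, k, k, device=x.device)
    valid = F.conv2d(mask, ones, padding=pad)          # number of valid pixels per window
    out = F.conv2d(x * mask, weight, bias=None, padding=pad)
    out = out * (float(k * k) / valid.clamp(min=1.0))  # mask correction ratio
    if bias is not None:
        out = out + bias.view(1, -1, 1, 1)
    new_mask = (valid > 0).float()                     # holes shrink layer by layer
    return out, new_mask
```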
Conditional generative adversarial networks for data augmentation and adaptation in remotely sensed imagery
Jonathan Howe, Kyle Pula, Aaron A. Reite
The difficulty in obtaining labeled data relevant to a given task is among the most common and well-known practical obstacles to applying deep learning techniques to new or even slightly modified domains. The data volumes required by the current generation of supervised learning algorithms typically far exceed what a human needs to learn and complete a given task. We investigate ways to expand a given labeled corpus of remote sensed imagery into a larger corpus using Generative Adversarial Networks (GANs). We then measure how these additional synthetic data affect supervised machine learning performance on an object detection task.

Our data driven strategy is to train GANs to (1) generate synthetic segmentation masks and (2) generate plausible synthetic remote sensing imagery corresponding to these segmentation masks. Run sequentially, these GANs allow the generation of synthetic remote sensing imagery complete with segmentation labels. We apply this strategy to the data set from ISPRS' 2D Semantic Labeling Contest - Potsdam, with a follow-on vehicle detection task. We find that in scenarios with limited training data, augmenting the available data with such synthetically generated data can improve detector performance.
Unsupervised feature learning in remote sensing
Aaron Reite, Scott Kangas, Zackery Steck, et al.
The need for labeled data is among the most common and well-known practical obstacles to deploying deep learning algorithms to solve real-world problems. The current generation of learning algorithms requires a large volume of data labeled according to a static and pre-defined schema. Conversely, humans can quickly learn generalizations based on large quantities of unlabeled data, and turn these generalizations into classifications using spontaneous labels, often including labels not seen before. We apply a state-of-the-art unsupervised learning algorithm to the noisy and extremely imbalanced xView data set to train a feature extractor that adapts to several tasks: visual similarity search that performs well on both common and rare classes; identifying outliers within a labeled data set; and learning a natural class hierarchy automatically.
Phenomenological versus random data augmentation for hyperspectral target detection
Joshua D. Zollweg, Charles F. LaCasse, Braden J. Smith
In this effort, random noise data augmentation is compared to phenomenologically-inspired data augmentation for a target detection task, evaluated on the Digital Imaging and Remote Sensing Image Generation (DIRSIG) model “MegaScene” simulated hyperspectral dataset. Random data augmentation is commonly used in the machine learning literature to improve model generalization. While random perturbations of an input may work well in certain fields such as image classification, they can be unhelpful in other applications such as hyperspectral target detection. For instance, random noise augmentation may not be beneficial when the applied noise distribution does not match underlying physical signal processes or sensor noise. In the context of a low-noise sensor, augmentation mimicking material mixing and other practical spectral modulations is likely to be more effective when used to train a target detector. It is therefore important to utilize a data augmentation strategy that emulates the natural variability in observed spectra. To validate this claim, a small fully connected neural network architecture is trained using an ideal hemispheric reflectance materials dataset as a trivial baseline. That dataset is then augmented using Gaussian random noise and the model is retrained and again applied to MegaScene. Finally, augmentation is instead performed using phenomenological insight and used to retrain and reevaluate the model. In this work, the phenomenological augmentation implements only simple and commonly encountered spectral permutations, namely linear mixing and shadowing. Comparison is made between the augmented models and the baseline model in terms of low constant false alarm rate (CFAR) performance.
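A schematic version of the phenomenological augmentations named above, applying linear mixing with a background spectrum and a simple multiplicative shadowing term to a target reflectance spectrum; mixing fractions and shading ranges are illustrative assumptions.

```python
import numpy as np

def augment_spectrum(target, background, rng=None):
    """Return a physically motivated perturbation of a target spectrum."""
    if rng is None:
        rng = np.random.default_rng()
    fill = rng.uniform(0.3, 1.0)      # target fill fraction (linear mixing)
    shade = rng.uniform(0.5, 1.0)     # shadowing as broadband attenuation
    return shade * (fill * target + (1.0 - fill) * background)
```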
Paired neural networks for hyperspectral target detection
Dylan Z. Anderson, Joshua D. Zollweg, Braden J. Smith
Spectral matched filtering and its variants (e.g. Adaptive Coherence Estimator or ACE) rely on strong assumptions about target and background distributions. For instance, ACE assumes a Gaussian distribution of background and additive target model. In practice, natural spectral variation, due to effects such as material Bidirectional Reflectance Distribution Function, non-linear mixing with surrounding materials, or material impurities, degrade the performance of matched filter techniques and require an ever-increasing library of target templates measured under different conditions. In this work, we employ the contrastive loss function and paired neural networks to create data-driven target detectors that do not rely on strong assumptions about target and background distribution. Furthermore, by matching spectra to templates in a highly nonlinear fashion via neural networks, our target detectors exhibit improved performance and greater resiliency to natural spectral variation; this performance improvement comes with no increase in target template library size. We evaluate and compare our paired neural network detector to matched filter-based target detectors on a synthetic hyperspectral scene and the well-known Indian Pines AVIRIS hyperspectral image.
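A compact sketch of a paired (Siamese-style) detector trained with the contrastive loss: matched template/spectrum pairs are pulled together in an embedding space and mismatched pairs are pushed apart by a margin. The embedding network, band count, and margin value are illustrative assumptions.

```python
import torch
import torch.nn as nn

embed = nn.Sequential(nn.Linear(224, 64), nn.ReLU(), nn.Linear(64, 16))  # 224 bands assumed

def contrastive_loss(spec_a, spec_b, is_match, margin=1.0):
    """is_match: 1.0 for target/template pairs, 0.0 for background/template pairs."""
    d = torch.norm(embed(spec_a) - embed(spec_b), dim=1)
    pull = is_match * d.pow(2)                                       # matched pairs: small distance
    push = (1.0 - is_match) * torch.clamp(margin - d, min=0).pow(2)  # mismatched: keep a margin
    return (pull + push).mean()
```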
Big Data, Simulations, and Theory I
A deep learning framework for mesh relaxation in arbitrary Lagrangian-Eulerian simulations
Ming Jiang, Brian Gallagher, Noah Mandell, et al.
The Arbitrary Lagrangian-Eulerian (ALE) method is used in a variety of engineering and scientific applications for enabling multi-physics simulations. Unfortunately, the ALE method can suffer from failures that require users to adjust a set of parameters to control mesh relaxation. In this paper, we present a deep learning framework for predicting mesh relaxation in ALE simulations. Our framework is designed to train a neural network using data generated from existing ALE simulations developed by expert users. In order to capture the spatial coherence inherent in simulations, we apply convolutional-deconvolutional neural networks to achieve up to 0.99 F1 score in predicting mesh relaxation.
Trade-offs between inference and learning in image segmentation
Quan Nguyen, Reid Porter, Beate Zimmer
During end-to-end learning, application level performance metrics, in combination with large training sets, are used to optimize deep neural network pipelines for the task at hand. There are two main places where application level performance metrics are typically introduced: in energy functions that are minimized during inference and in loss functions that are minimized during training. Minimizing energy functions and minimizing loss functions are both hard problems in the general case and an application specific trade-off must be made between how much effort is spent in inference versus training. In this paper we explore this trade-off in the context of image segmentation. Specifically, we use a novel, computationally efficient, family of networks to investigate the trade-off between two traditional extremes. At one extreme are inference networks that minimize a correlation clustering energy function. At the other extreme are learning networks that minimize a Rand Error loss function.
Big Data, Simulations, and Theory II
ML health monitor: taking the pulse of machine learning algorithms in production
Bringing the research advances in Machine Learning (ML) to production is necessary for businesses to gain value from ML. A key challenge of production ML is the monitoring and management of real-time prediction quality. This is complicated by the variability of live production data, the absence of real-time labels and the non-determinism of ML techniques themselves. We define ML Health as the real-time assessment of ML prediction quality and present an approach to monitoring and improving ML Health: specifically, a complete solution to monitor and manage ML Health within a realistic, full production ML lifecycle. We describe a number of ML Health techniques and assess their efficacy on publicly available datasets. Our solution handles production realities such as scale, heterogeneity and distributed runtimes. We present what we believe is the first solution to production ML Health explored at both an empirical and a complete system implementation level.
Simple generative model for assessing feature selection based on relevance and redundancy
An experimental procedure is proposed for measuring the performance of feature selection algorithms in a way that is not directly tied either to particular machine learning algorithms or to particular applications. The main interest is in situations for which there are a large number of features to be sifted through. The approach is based on simulated training sets with adjustable parameters that characterize the "relevance" of individual features as well as the collective "redundancy" of sets of features. In some cases, these training sets can be virtualized; that is, having specified their properties, one does not actually have to explicitly generate them. As a specific illustration, the method is used to compare variants of the minimum redundancy maximum relevance (mRMR) algorithm, and to characterize the performance of these variants in different regimes.
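For orientation, a greedy version of the baseline mRMR selection whose variants the paper compares: at each step the feature maximizing relevance (mutual information with the label) minus average redundancy (mutual information with already-selected features) is added. The sketch assumes discrete or pre-binned features and uses scikit-learn estimators; it is not the paper's procedure.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score

def mrmr(X, y, n_select):
    """Greedy mRMR over a feature matrix X (features assumed discrete or binned)."""
    relevance = mutual_info_classif(X, y, discrete_features=True)
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < n_select:
        def score(j):
            if not selected:
                return relevance[j]
            redundancy = np.mean([mutual_info_score(X[:, j], X[:, s]) for s in selected])
            return relevance[j] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```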
Nonlinear spectral preprocessing for small-brain machine learning
Luat T. Vuong, Hobson Lane
Substantial computing costs are required to use deep-learning algorithms. Here, we implement feature extraction based on analytic relations in the Fourier-transform domain. In an example relevant to visual odometry, we demonstrate a reduction in algorithmic complexity with cross-power spectral preprocessors for feature extraction in lieu of learned convolutional filters. With spectral reparameterization and spectral pooling, not only can the optical flow (spatial disparity of images in a sequence) be computed, but occluding objects can also be tracked in the foreground without deep learning. There is evidence that insects with small brains implement similar visual-data spectral preprocessors, which may be critical in the development of future real-time machine learning applications.
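A bare-bones illustration of the cross-power-spectrum idea: classical phase correlation recovers the translation between two frames with no learned filters. This is the standard textbook formulation, offered only to make the preprocessing concrete, not the authors' exact pipeline.

```python
import numpy as np

def phase_correlation_shift(frame_a, frame_b, eps=1e-9):
    """Estimate the (dy, dx) shift between two equally sized grayscale frames."""
    Fa, Fb = np.fft.fft2(frame_a), np.fft.fft2(frame_b)
    cross = Fa * np.conj(Fb)
    cross /= np.abs(cross) + eps                 # normalized cross-power spectrum
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = frame_a.shape
    # Wrap shifts larger than half the frame back to negative displacements.
    return (dy - h if dy > h // 2 else dy, dx - w if dx > w // 2 else dx)
```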
Medical Imaging and Healthcare I
Glaucoma diagnosis using transfer learning methods
Comparison of deep learning results from various studies for glaucoma diagnosis is essentially meaningless since private data sets are often used. Another challenge is overfitting of deep learning models on relatively small public datasets. This overfitting leads to poor generalization. Here, we propose a practical approach for fine-tuning an existing state-of-the-art deep learning model, namely Inception-v3, for glaucoma detection. A two-pronged approach using a transfer learning methodology combined with data augmentation and normalization is proposed herein. We used a publicly available dataset, RIM-ONE, which has 624 monocular and 159 stereoscopic retinal fundus images. Data augmentation operations mimicking the natural deformations in fundus images, along with Contrast Limited Adaptive Histogram Equalization (CLAHE) and normalization, were applied to the images. The weights of the Inception-v3 network were pretrained on the ImageNet dataset, which consists of real-world objects. We fine-tuned this network on the RIM-ONE dataset to get the deep features required for glaucoma detection without overfitting. Even though we used a small dataset, the results obtained from this network are comparable to those reported in the literature.
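A hedged sketch of this kind of transfer-learning setup (ImageNet-pretrained Inception-v3 with a new binary head, plus CLAHE preprocessing); the layer choices, CLAHE parameters and frozen-base strategy are illustrative and not necessarily the paper's exact configuration.

```python
import cv2
from tensorflow.keras import layers, models
from tensorflow.keras.applications import InceptionV3

def clahe_rgb(img_uint8):
    """Apply CLAHE to the lightness channel of an RGB fundus image (uint8)."""
    lab = cv2.cvtColor(img_uint8, cv2.COLOR_RGB2LAB)
    lab[..., 0] = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(lab[..., 0])
    return cv2.cvtColor(lab, cv2.COLOR_LAB2RGB)

base = InceptionV3(weights="imagenet", include_top=False, pooling="avg",
                   input_shape=(299, 299, 3))
base.trainable = False   # reuse ImageNet features; train only the new classification head
model = models.Sequential([base, layers.Dense(1, activation="sigmoid")])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```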
Cross-domain diabetic retinopathy detection using deep learning
Globally, diabetic retinopathy (DR) is one of the leading causes of blindness, but due to the low patient-to-doctor ratio, performing clinical retinal screening for all such patients is not always possible. In this paper, a deep learning-based automated diabetic retinopathy detection method is presented. Though different frameworks exist for classifying different retinal diseases with both shallow machine learning algorithms and deep learning algorithms, there is very little literature on the problem of variation of sources between training and test data. The Kaggle EYEPACS data were used in this study for training and the Messidor dataset was used for testing the efficiency of the model. With proper data sampling, augmentation and pre-processing techniques, it was possible to achieve state-of-the-art classification accuracy on the Messidor dataset (which has different camera settings and image resolutions). The model achieved significant performance, with a sensitivity of almost 90% and a specificity of 91.94%, with an average accuracy of 90.4%.
Performance analysis of machine learning and deep learning architectures for malaria detection on cell images
Plasmodium malaria is a parasitic protozoan that causes malaria in humans. Computer aided detection of Plasmodium is a research area attracting great interest. In this paper, we study the performance of various machine learning and deep learning approaches for the detection of Plasmodium on cell images from digital microscopy. We make use of a publicly available dataset composed of 27,558 cell images with equal instances of parasitized (contains Plasmodium) and uninfected (no Plasmodium) cells. We randomly split the dataset into groups of 80% and 20% for training and testing purposes, respectively. We apply color constancy and spatially resample all images to a particular size depending on the classification architecture implemented. We propose a fast Convolutional Neural Network (CNN) architecture for the classification of cell images. We also study and compare the performance of transfer learning algorithms developed based on well-established network architectures such as AlexNet, ResNet, VGG-16 and DenseNet. In addition, we study the performance of the bag-of-features model with Support Vector Machine for classification. The overall probability of a cell image comprising Plasmodium is determined based on the average of probabilities provided by all the CNN architectures implemented in this paper. Our proposed algorithm provided an overall accuracy of 96.7% on the testing dataset and area under the Receiver Operating Characteristic (ROC) curve value of 0.994 for 2756 parasitized cell images. This type of automated classification of cell images would enhance the workflow of microscopists and provide a valuable second opinion.
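The ensembling step described above reduces, in code, to averaging the per-image probabilities produced by the individual networks; the sketch below assumes Keras-style models exposing predict and an illustrative 0.5 decision threshold.

```python
import numpy as np

def ensemble_probability(models, cell_images):
    """Average the Plasmodium probability over all trained CNN architectures."""
    per_model = [m.predict(cell_images).ravel() for m in models]
    return np.mean(per_model, axis=0)

# is_parasitized = ensemble_probability(trained_cnns, batch) > 0.5  (threshold assumed)
```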
Histopathological image classification with deep convolutional neural networks
In the last few years, deep learning approaches have been applied successfully to different modalities of medical imaging problems and have achieved state-of-the-art accuracy. Due to the huge volume and variety of imaging modalities, it remains a large open research area. In this paper, we apply the Inception Residual Recurrent Convolutional Neural Network (IRRCNN) model to histopathological image classification, using a new publicly available dataset named KIMIA Path960. This database contains 960 histopathological images with 20 different classes (different types of tissue collected from 400 Whole Slide Images). In this implementation, we evaluate the model with non-overlapping patches of 64×64 pixels, and variant samples are generated from each patch with different data augmentation techniques, including rotation, shear, zooming, and horizontal and vertical flipping. The experimental results are compared against Local Binary Pattern (LBP), bag-of-visual-words (BoVW), and deep learning methods with AlexNet and VGG16 networks. The IRRCNN model shows around 98.79% testing accuracy for augmented patch-level evaluation, which is around 2.29% and 4% higher than Support Vector Machine with histogram intersection kernel (IKSVM) with BoVW and VGG16, respectively. Additionally, this evaluation demonstrates that the deep feature representation-based method outperforms traditional feature-based methods, including LBP and BoVW, for the histopathological image classification problem.
Medical Imaging and Healthcare II
Automated quantification of DNA damage via deep transfer learning based analysis of comet assay images
Srikanth Namuduri, Barath Narayanan Narayanan, Mahsa Karbaschi, et al.
The comet assay is a technique used to assess the DNA damage in individual cells. The extent of the damage is indicated by the ratio between the amount of DNA in the tail of the comet and the amount in the head. This assessment is typically made by an operator manually analyzing the images. This process is inefficient and time consuming. Researchers in the past have used machine learning techniques to automate this process, but they required manual feature extraction. In some cases, deep learning was applied, but only for damage classification. We have successfully applied Convolutional Neural Networks (CNN) to achieve automated quantification of DNA damage from comet images. Typically, deep learning techniques such as CNNs require large amounts of labelled training data, which may not always be available. We demonstrate that by applying deep transfer learning, state-of-the-art results can be obtained in the detection of DNA damage, even with a limited number of comet images.
Classification of quantitative phase images of cancer cells using machine learning (Conference Presentation)
I will present our latest advances in the development of a machine learning classifier on interferometric phase microscopy (IPM) quantitative tomographic maps, obtained by new wavefront sensors, in order to obtain real-time grading of cancer cells. An internal contrast mechanism that can be used when imaging cells without staining is their refractive index. The light beam passing through the imaged cells is delayed, since the cells have a slightly higher refractive index compared with their surroundings, which can be captured by IPM. Contrary to qualitative phase contrast methods, IPM yields the full sample wavefront containing the optical thickness map or optical path delay (OPD) map of the cell, so that at each x–y point of this map, the OPD is equal to the integral of the refractive index values across the cell thickness. We have recently compared the quantitative phase imaging-based features of healthy and cancer cells, and of primary cancer and metastatic cancer cells, under stain-free IPM. For this task, we chose pairs of cell lines taken from the same individual and organ. The cells were round and unattached to the surface to allow imaging during flow, and therefore most of the cells look alike, so a subjective pathological examination cannot be performed, even under IPM. We therefore applied both a new deep learning approach that can work with a small training set and principal component analysis (PCA) followed by support vector machine (SVM) classifiers, and obtained classification results (healthy/cancer/metastatic) with over 90% sensitivity and specificity.
Feasibility of sparse contact thermometry as a method for breast cancer detection
It has been widely reported that breast tumors produce surface temperature signatures due to an increased metabolic heat generation rate and angiogenesis (the generation of new blood vessels around a tumor). The present work provides an assessment of the feasibility of using sparse contact thermometry to detect tumors in the breast. The surface temperatures at positions approximately corresponding to the sensor locations in a proposed sensor array were obtained from infrared thermography images of 123 healthy patients and 27 patients with breast cancer. A Support Vector Machine was trained and tested through Leave One Out Cross Validation. The model obtained an AUC ROC of 0.914, with a sensitivity of 92.6% and specificity of 82.1%. These results are close to the gold standard and even higher in women with high breast density. The present work shows promise for sparse contact thermometry. It is imperative to conduct further research with larger sample sizes and with data collected with sparse contact thermometry devices to determine the effectiveness of the method.
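A compact sketch of the evaluation scheme described above: an SVM over the sparse surface-temperature readings, scored with leave-one-out cross-validation. The feature layout (one reading per sensor position) and kernel choice are assumptions for illustration.

```python
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import roc_auc_score

def loocv_auc(temperatures, labels):
    """temperatures: (n_patients, n_sensors) surface readings; labels: 1 = cancer."""
    clf = SVC(kernel="rbf", probability=True)
    probs = cross_val_predict(clf, temperatures, labels,
                              cv=LeaveOneOut(), method="predict_proba")[:, 1]
    return roc_auc_score(labels, probs)
```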
Transfer learning approach to multiclass classification of child facial expressions
The classification of facial expression has been extensively studied using adult facial images which are not appropriate ground truths for classifying facial expressions in children. The state-of-the-art deep learning approaches have been successful in the classification of facial expressions in adults. A deep learning model may be better able to learn the subtle but important features underlying child facial expressions and improve upon the performance of traditional machine learning and feature extraction methods. However, unlike adult data, only a limited number of ground truth images exist for training and validating models for child facial expression classification and there is a dearth of literature in child facial expression analysis. Recent advances in transfer learning methods have enabled the use of deep learning architectures, trained on adult facial expression images, to be tuned for classifying child facial expressions with limited training samples. The network will learn generic facial expression patterns from adult expressions which can be fine-tuned to capture representative features of child facial expressions. This work proposes a transfer learning approach for multi-class classification of the seven prototypical expressions including the ‘neutral’ expression in children using a recently published child facial expression data set. This work holds promise to facilitate the development of technologies that focus on children and monitoring of children throughout their developmental stages to detect early symptoms related to developmental disorders, such as Autism Spectrum Disorder (ASD).
Non-invasive detection of breast cancer using deep learning (Conference Presentation)
Tarek M. Taha, Rishad Raiyan, Sharmin Akhtar, et al.
Abnormal and excessive cell growths, known as neoplasms, are not always a sign of cancer. These neoplasms may be benign or malignant; only malignant neoplasms are known as cancer. Therefore, determining the nature of a neoplasm is a critical step in the cancer detection process. Commonly used invasive methods for determining the type of neoplasm include fine needle aspiration cytology (FNAC) and tru-cut biopsy. Both these processes involve extracting tissue from the affected area via a needle and then observing the cells under a microscope. This observation is known in medical terms as histopathology. Histopathological images provide direct evidence for the classification of the neoplasm. Histopathological image analysis, however, requires a skilled pathologist to describe the pathological findings. On the other hand, there are many non-invasive procedures used for cancer detection, such as elastography and mammography. Elastography maps the elastic properties of soft tissue using ultrasound or magnetic resonance imaging; cancerous neoplasms will often be stiffer than healthy ones. A mammogram is simply an x-ray picture of the breast. These non-invasive approaches unfortunately provide only indirect evidence for classification, making the classification job quite difficult. Creating a correlation between the non-invasive results and the histopathological images using deep learning can make it much easier to diagnose the nature of the tumor using only non-invasive procedures. Furthermore, training a deep network to recognize malignant tumors from histopathological images can make the entire cancer detection process automated. This paper proposes a deep learning based approach to determining the malignancy of a neoplasm using only non-invasive imaging. The results show good prediction accuracies.
Poster Session
Support vector machine and convolutional neural network based approaches for defect detection in fused filament fabrication
Identifying defective builds early on during Additive Manufacturing (AM) processes is a cost-effective way to reduce scrap and ensure that machine time is utilized efficiently. In this paper, we present an automated method to classify 3D-printed polymer parts as either good or defective based on images captured during Fused Filament Fabrication (FFF), using independent machine learning and deep learning approaches. Either of these approaches could be potentially useful for manufacturers and hobbyists alike. Machine learning is implemented via Principal Component Analysis (PCA) and a Support Vector Machine (SVM), whereas deep learning is implemented using a Convolutional Neural Network (CNN). We capture videos of the FFF process on a small selection of polymer parts and label each frame as good or defective (2674 good frames and 620 defective frames). We divide this dataset for holdout validation by using 70% of the images belonging to each class for training, leaving the rest for blind testing purposes. We obtain an overall accuracy of 98.2% and 99.5% for the classification of polymer parts using machine learning and deep learning techniques, respectively.
Crack detection of UAV concrete surface images
To improve the robustness of concrete crack detection in complex environments featuring non-uniform illumination, low contrast, and stain noise, such as roads and bridges, I present a systematic approach for automatic crack detection in UAV images for monitoring concrete facilities such as buildings and civil structures. A two-step process was applied: first, a deep learning technique detects regions containing cracks, and then cracks are detected based on image processing and region properties. I applied a transfer learning approach, using a pre-trained network to identify cracks. I used pixel-value-based binarization of the image data with an edge-preserving filter, which reduced noise in the region. Experimental results on UAV images showed that this approach has good potential to be applied to concrete crack detection.
Visual attentiveness recognition using probabilistic neural network
Yi-Chun Chen, Yi-Jing Lin, I-Chieh Chen, et al.
For instant recognition of visual attentiveness, we established a set of studies based on signal conversion and machine learning of electroencephalogram (EEG) data. In this work, we invited twelve participants, who were asked either to play test games, to induce a state of visual attention, or to take a rest, for a relaxed state. The brainwaves of the participants were recorded by an EEG monitor during the experiments. The EEG signals were transformed from the time domain into the frequency domain by the fast Fourier transform (FFT) to obtain the frequency distributions of brainwaves in the different visual attention states. The frequency information was then input into a probabilistic neural network (PNN) to build a discrimination model and to learn rules that determine whether an EEG epoch corresponds to paying attention or not. As a type of supervised feedforward neural network, the PNN offers high training speed and good error tolerance, which is suitable for instant classification tasks. Given a set of training samples, the PNN can train a predictive model on the specific EEG features by a supervised learning algorithm, yielding a classifier for visual attentiveness. In this paper, the proposed method successfully offers efficient differentiation for the assessment of visual attentiveness using FFT and PNN. The predictive model can distinguish EEG epochs with attentive or relaxed states, with an average accuracy higher than 82% over the twelve participants. This attention classifier is expected to aid smart lighting control, specifically in assessing how different lighting situations influence users' visual work concentration.
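A minimal probabilistic neural network in the sense used above: each class is scored by a Gaussian-kernel density over its training FFT feature vectors and the higher-scoring class (attentive vs. relaxed) wins. The smoothing parameter is an illustrative assumption.

```python
import numpy as np

def pnn_predict(train_X, train_y, x, sigma=0.1):
    """Classify one FFT feature vector x with a Gaussian-kernel PNN."""
    scores = {}
    for c in np.unique(train_y):
        d2 = np.sum((train_X[train_y == c] - x) ** 2, axis=1)
        scores[c] = np.mean(np.exp(-d2 / (2.0 * sigma ** 2)))  # class-average kernel response
    return max(scores, key=scores.get)                          # attentive vs. relaxed
```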
Monitoring forest disturbance using change detection on synthetic aperture radar imagery
Alice M. S. Durieux, Matthew T. Calef, Scott Arko, et al.
Although monitoring forest disturbance is crucial to understanding atmospheric carbon accumulation and biodiversity loss, persistent cloud cover, especially in tropical areas, makes detecting forest disturbances using optical remotely sensed imagery difficult. In Sentinel-1 synthetic aperture radar (SAR) images, forest clearings exhibit reduced backscatter as well as increased interferometric coherence. We combined SAR and interferometric SAR metrics from Sentinel-1 data collected in Borneo between 2017 and 2018 and applied unsupervised change detection methods to the time series. The results show that a simple log-ratio based detector performs similarly to a more sophisticated anomalous change detection algorithm. The log-ratio detector was deployed to compare a 2017 mean Sentinel-1 composite with a 2018 mean composite. Approximately 20,000 newly deforested areas were identified in 2018, for a total of 3000 km². The findings suggest that leveraging SAR data to monitor deforestation has the potential to achieve better performance than Global Forest Watch, the current Landsat-based gold standard. Future work will leverage the short revisit time (6-12 days) of Sentinel-1 as an opportunity for continuous monitoring of deforestation. The improved time resolution associated with SAR observations in cloudy regions might enable the identification of areas at risk of deforestation early enough in the clearing process to allow preventive actions to be taken.
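In essence, the log-ratio detector compares the yearly mean backscatter composites pixel by pixel; a schematic version is below, with the decision threshold left as an illustrative placeholder.

```python
import numpy as np

def log_ratio_change(mean_2017, mean_2018, threshold=1.5, eps=1e-6):
    """Flag pixels whose backscatter dropped sharply between the two composites."""
    log_ratio = np.log((mean_2017 + eps) / (mean_2018 + eps))
    # Forest clearing lowers backscatter, so a large positive log-ratio marks new clearings.
    return log_ratio > threshold
```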
Road network mapping from aerial images
Suchit Jain, Rohan Mittal, Prakamya Mishra, et al.
The building and expansion of an efficient transportation network are essential for urban advancement. However, tracking road development in an area is not an easy task, as city planners do not always have access to credible information. A road network mapping framework is proposed which uses a random forest model for pixel-wise road segmentation. Road detection is followed by computer vision post-processing steps, including Connected Component Analysis (CCA) and the Hough Lines method, for network extraction from high-resolution aerial images. The custom dataset used consists of images collected from an urban settlement in India.
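A rough sketch of the pipeline stages named above: per-pixel random forest classification, connected-component filtering, and Hough line extraction. The feature construction, blob-size and Hough thresholds are placeholders rather than the authors' settings.

```python
import cv2
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def extract_road_lines(train_feats, train_labels, image_feats, min_area=200):
    """image_feats: (H, W, F) per-pixel features; returns Hough line segments."""
    rf = RandomForestClassifier(n_estimators=100).fit(train_feats, train_labels)
    h, w, f = image_feats.shape
    road = rf.predict(image_feats.reshape(-1, f)).reshape(h, w)
    mask = (road * 255).astype(np.uint8)
    # Connected Component Analysis: drop small spurious blobs.
    n, comp, stats, _ = cv2.connectedComponentsWithStats(mask)
    for i in range(1, n):
        if stats[i, cv2.CC_STAT_AREA] < min_area:
            mask[comp == i] = 0
    # Probabilistic Hough transform recovers road segments from the cleaned mask.
    return cv2.HoughLinesP(mask, 1, np.pi / 180, threshold=80,
                           minLineLength=50, maxLineGap=10)
```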
An efficient framework for monitoring tree cover in an area through aerial images
Suchit Jain, Rohan Mittal, Arpan Jain, et al.
Monitoring tree cover in an area plays an important role in a wide range of applications, and advances in UAV technology have made it feasible to capture the high-resolution imagery needed for this purpose. In this study, we adopt a state-of-the-art object detector, Mask Region-based CNN (Mask R-CNN) [1], through transfer learning, for the task of tree segmentation and counting. One bottleneck for the proposed task is the huge amount of data required if the model is to be scalable to different geographical regions. Towards this end, we explore the use of a sampling technique based on Gist descriptors and Gabor filtering in order to minimize the amount of training data required for obtaining excellent model performance across images with varied geographical features. This study was conducted across four regions in India, each having a different geographical landscape. We captured a total of 2357 images across all four regions. The final training dataset comprised 48 images (sampled using the aforementioned method), representative of the entire dataset. Our method demonstrates high-quality and scalable tree detection results.
Weed classification in grasslands using convolutional neural networks
Lyndon N. Smith, Arlo Byrne, Mark F. Hansen, et al.
Automatic identification and selective spraying of weeds (such as dock) in grass can provide very significant long-term ecological and cost benefits. Although machine vision (with an interface to suitable automation) provides an effective means of achieving this, the associated challenges are formidable, due to the complexity of the images. This results from factors such as the percentage of dock in the image being low, the presence of other plants such as clover, and changes in the level of illumination. Here, these challenges are addressed by the application of Convolutional Neural Networks (CNNs) to images containing grass and dock; and grass, dock and white clover. The performance of conventionally-trained CNNs and those trained using ‘Transfer Learning’ was compared. This was done for increasingly small datasets, to assess the viability of each approach for projects where large amounts of training data are not available. Results show that CNNs provide considerable improvements over previous methods for classification of weeds in grass. While previous work has reported best accuracies of around 83%, here a conventionally-trained CNN attained 95.6% accuracy for the two-class dataset, with 94.9% for the three-class dataset (i.e. dock, clover and grass). Interestingly, use of Transfer Learning, with as few as 50 samples per class, still provides accuracies of around 84%. This is very promising for agricultural businesses that, due to the high cost of collecting and processing large amounts of data, have not yet been able to employ Neural Network models. Therefore, the employment of CNNs, particularly when incorporating Transfer Learning, is a very powerful method for classification of weeds in grassland, and one that is worthy of further research.
Nature of distracted driving in various physiological conditions
Road traffic injury has become a severe problem today, claiming more than 1.25 million lives each year worldwide and draining 3% of total global GDP. According to the National Highway Transportation Safety Administration, about half a million people were injured in 2014 due to distracted-driving-related car crashes. In this work, we have considered the brainwave, heart rate and blood pressure level of a distracted driver. Among the various sources of distraction (e.g., multitasking, severe weather conditions, external sound effects), we monitored driving behavior through the EEG signal. In particular, the alpha, beta, gamma, delta and theta brainwaves have a significant connection with emotion, stress and other psychological responses. Our EEG data analysis can provide a pathway to detect the physiological condition of distracted drivers and avoid road accidents.