Proceedings Volume 10996

Real-Time Image Processing and Deep Learning 2019


Volume Details

Date Published: 26 July 2019
Contents: 8 Sessions, 28 Papers, 16 Presentations
Conference: SPIE Defense + Commercial Sensing 2019
Volume Number: 10996

Table of Contents

  • Front Matter: Volume 10996
  • Deep Learning I
  • Deep Learning II
  • Real-Time Algorithms and Systems I
  • Deep Learning III
  • Real-Time Video Processing
  • Real-Time Algorithms and Systems II
  • Poster Session
Front Matter: Volume 10996
Front Matter: Volume 10996
This PDF file contains the front matter associated with SPIE Proceedings Volume 10996, including the Title Page, Copyright information, Table of Contents, Author and Conference Committee lists.
Deep Learning I
A deep learning-based smartphone app for real-time detection of retinal abnormalities in fundus images
Haoran Wei, Abhishek Sehgal, Nasser Kehtarnavaz
This paper presents the real-time implementation of two deep neural networks, trained to detect retinal abnormalities, as a smartphone app. The app provides a low-cost and universally accessible alternative to fundus cameras, since smartphones are widely available and can be fitted with commercially available lenses for examination of the retina. The process of training the two convolutional neural networks for retinal abnormality detection on two publicly available datasets is discussed. Furthermore, it is shown how a smartphone app, in both Android and iOS versions, is created from the trained networks. The results indicate that detection of retinal abnormalities can be carried out on smartphones on the fly, as retina images are captured by their cameras in real time.
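For readers who want a feel for the deployment step, the sketch below runs a converted classifier on-device with TensorFlow Lite; the model file name and input layout are assumptions, since the paper's trained networks are not reproduced here.

```python
import numpy as np
import tensorflow as tf

# Minimal sketch of on-device inference with a TFLite-converted model.
# "retina_net.tflite" is a hypothetical file name, not the authors' model.
interpreter = tf.lite.Interpreter(model_path="retina_net.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

frame = np.zeros(inp["shape"], dtype=np.float32)   # placeholder camera frame
interpreter.set_tensor(inp["index"], frame)
interpreter.invoke()
probs = interpreter.get_tensor(out["index"])       # abnormality probabilities
```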
No-reference image quality assessment based on deep convolutional neural networks
A no-reference image quality assessment technique can measure the visual distortion in an image without any reference image data. Image distortions can be introduced during the acquisition, compression, or transmission of digital images. Among the many types of image distortion, JPEG and JPEG2000 compression artifacts, additive white noise, Gaussian blur, and fast fading are the most common. A typical real-world image may contain multiple types of distortion. Our aim is to determine the different types of distortion present in an image and to estimate the total distortion level using a novel architecture built from multiple Deep Convolutional Neural Networks (MDNN). The proposed model classifies the types of distortion present in an image, thereby achieving both objectives. First, local contrast normalization (LCN) is performed on the images, which are then fed into the deep neural network for training. The images are processed by a convolution-based distortion classifier that estimates the probability of each distortion type. Next, the distortion quality is predicted for each class. These probabilities are fused using a weighted average-pooling algorithm to obtain a single regressor output. We also experimented with different parameters of the neural network, including optimizers (Adam, Adadelta, SGD, RMSprop) and activation functions (ReLU, softmax, sigmoid, and linear). The LIVE II database is used for training, since it contains five of the major distortion types. Cross-dataset validation is performed on the CSIQ and TID2008 databases. The results were evaluated using different correlation coefficients (SROCC, PLCC), and we achieved a linear correlation with the differential mean opinion scores (DMOS) for each of these coefficients in the tests conducted.
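The fusion step described above reduces to a probability-weighted average of per-distortion quality predictions. A minimal sketch, with hypothetical class probabilities and scores:

```python
import numpy as np

def fuse_quality_scores(dist_probs, dist_scores):
    """Fuse per-distortion quality predictions into one regressor output
    by probability-weighted average pooling (illustrative version)."""
    p = np.asarray(dist_probs, dtype=float)
    q = np.asarray(dist_scores, dtype=float)
    return float(np.sum(p * q) / np.sum(p))

# e.g. 5 distortion classes (JPEG, JP2K, white noise, blur, fast fading)
probs  = [0.70, 0.10, 0.05, 0.10, 0.05]   # classifier output (hypothetical)
scores = [42.0, 55.0, 60.0, 48.0, 52.0]   # per-class quality predictions
print(fuse_quality_scores(probs, scores))
```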
Integrated photonic FFTs for convolutional neural networks and real-time processing (Conference Presentation)
Convolutional neural networks have become an essential element of spatial deep learning systems. In the prevailing architecture, the convolution operation is performed with Fast Fourier Transforms (FFTs) electronically on GPUs. The parallelism of GPUs provides an efficiency gain over CPUs; however, both approaches, being electronic, are bound by the speed and power limits of the interconnect delay inside the circuits. Here we present a silicon-photonics-based architecture for convolutional neural networks that harnesses the phase property of light to perform FFTs efficiently. Our all-optical FFT is based on nested Mach-Zehnder interferometers, directional couplers, and phase shifters, with backend electro-optic modulators for sampling. The FFT delay depends only on the propagation delay of the optical signal through the silicon photonic structures. Designing and analyzing the performance of a convolutional neural network deployed with our on-chip optical FFT, we find dramatic improvements of up to 10² over state-of-the-art GPUs when exploring a compounded figure of merit given by power per convolution over area. At a high level, this performance is enabled by mapping the desired mathematical function, an FFT, synergistically onto hardware, in this case optical delay interferometers.
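The "nested" structure maps naturally onto the butterfly diagram of the Cooley-Tukey FFT; the recursive sketch below is a plain software FFT whose +/- butterfly pairs correspond, conceptually, to what each 2×2 interferometer computes optically (the optical design itself is not reproduced here).

```python
import numpy as np

def fft_recursive(x):
    """Radix-2 decimation-in-time FFT (power-of-2 lengths). Each butterfly
    (the +/- pair below) is the operation a 2x2 interferometer performs
    optically; the recursion depth mirrors the nesting of MZI stages."""
    n = len(x)
    if n == 1:
        return x
    even = fft_recursive(x[0::2])
    odd = fft_recursive(x[1::2])
    tw = np.exp(-2j * np.pi * np.arange(n // 2) / n) * odd  # twiddle factors
    return np.concatenate([even + tw, even - tw])

x = np.random.rand(8).astype(complex)
assert np.allclose(fft_recursive(x), np.fft.fft(x))
```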
Computationally efficient training of deep neural networks via transfer learning
Diane Oyen
Transfer learning is a highly successful approach to training deep neural networks for customized image recognition tasks. Transfer learning leverages deep learning models previously trained on massive datasets and re-trains the network for a novel image recognition dataset. Typically, the advantage of transfer learning has been measured in sample efficiency; instead, we investigate the computational efficiency of transfer learning. A good pre-trained model provides features that can be used as input to a new classifier (usually the top layers of a neural network). We show that, if a good pre-trained model is selected, training a new classifier can be much more computationally efficient than training a deep neural network without transfer learning. The first step in transfer learning is to select a pre-trained model to use as a feed-forward network for generating features. This selection process either uses human intuition/convenience or a methodical but computationally intensive validation loop. Instead, we would prefer a method to select the pre-trained model that will produce the best transfer results for the least amount of computation. To this end, we provide a computationally efficient metric for the fit between a pre-trained model and a novel image recognition task. The better the fit, the less computation will be needed to re-train the pre-trained model for the novel task. As machine learning becomes ubiquitous, highly accurate trained models will proliferate, and computationally efficient transfer learning methods will enable rapid development of new image recognition models.
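A minimal PyTorch sketch of the computational shortcut described above: the pre-trained backbone is frozen and used purely as a feed-forward feature generator, so gradients are computed only for the small new classifier head. The model choice and hyperparameters are illustrative, not the author's.

```python
import torch
import torch.nn as nn
from torchvision import models

# Freeze a pre-trained backbone; train only a new classification head.
backbone = models.resnet18(pretrained=True)   # newer torchvision: weights=...
for p in backbone.parameters():
    p.requires_grad = False                   # no backprop through backbone
backbone.fc = nn.Linear(backbone.fc.in_features, 10)  # new 10-class head

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(x, y):
    optimizer.zero_grad()
    loss = criterion(backbone(x), y)
    loss.backward()                           # gradients only reach the head
    optimizer.step()
    return loss.item()
```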
Deep Learning II
Deep learning for impulsive noise removal in color digital images
Deep learning has been widely applied in many computer vision tasks due to its impressive capability for automatic feature extraction and classification. Recently, deep neural networks have been used for image denoising, but most of the proposed approaches were designed for Gaussian noise suppression. In this paper, we therefore address the problem of impulsive noise reduction in color images using Denoising Convolutional Neural Networks (DnCNN). This network architecture utilizes the concept of deep residual learning and is trained to learn the residual image instead of the denoised one directly. Our preliminary results show that direct application of DnCNN achieves significantly better results than the state-of-the-art filters designed for impulsive noise in color images.
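As a rough illustration of residual learning for denoising, the sketch below follows the DnCNN pattern; the depth, width, and training setup are placeholders, not the paper's exact configuration. The network predicts the noise map, which is subtracted from the input.

```python
import torch
import torch.nn as nn

class DnCNNLike(nn.Module):
    """Simplified DnCNN-style residual denoiser (illustrative only)."""
    def __init__(self, channels=3, depth=7, width=64):
        super().__init__()
        layers = [nn.Conv2d(channels, width, 3, padding=1),
                  nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(width, width, 3, padding=1),
                       nn.BatchNorm2d(width), nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(width, channels, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, noisy):
        residual = self.body(noisy)   # network learns the noise component
        return noisy - residual       # denoised image = input - residual
```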
On data augmentation for segmenting hyperspectral images
Jakub Nalepa, Michal Myller, Michal Kawulok, et al.
Data augmentation is a popular technique which helps improve the generalization capabilities of deep neural networks, and can be perceived as implicit regularization. It is widely adopted in scenarios where acquiring high-quality training data is time-consuming and costly, with hyperspectral satellite imaging (HSI) being a real-life example. In this paper, we investigate data augmentation policies (exploiting various techniques, including generative adversarial networks applied to generate artificial HSI data) which help improve the generalization of deep neural networks (and other supervised learners) by increasing the representativeness of training sets. Our experimental study performed on HSI benchmarks showed that hyperspectral data augmentation boosts the classification accuracy of the models without sacrificing their real-time inference speed.
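Among the augmentation policies mentioned, the simplest baseline is per-band jittering of existing pixel spectra; the sketch below shows only that baseline (the GAN-based policies studied in the paper are far more elaborate).

```python
import numpy as np

def augment_pixels(spectra, labels, n_new, noise_scale=0.02, rng=None):
    """Simple HSI augmentation baseline: resample labelled pixel spectra
    and jitter each band multiplicatively with Gaussian noise.
    spectra: (n_pixels, n_bands) array; labels: (n_pixels,) array."""
    rng = rng or np.random.default_rng()
    idx = rng.integers(0, len(spectra), n_new)
    noise = rng.normal(0.0, noise_scale, (n_new, spectra.shape[1]))
    return spectra[idx] * (1.0 + noise), labels[idx]
```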
Fast multi-modal reuse: co-occurrence pre-trained deep learning models
This paper studies data fusion in traditional, spatial, and aerial video-stream applications, addressing the processing of data from multiple sources using co-occurrence information under a common semantic metric. Using co-occurrence information to infer semantic relations between measurements avoids the need for external information such as labels. Many current Vector Space Models (VSM) do not preserve co-occurrence information, leading to a less useful similarity metric. We propose a proximity-matrix embedding as part of the learned metric embedding, whose entries reflect the relations between co-occurrence frequencies observed in the input sets. First, we show an implicit spatial sensor proximity-matrix calculation using Jaccard similarity for an array of sensor measurements, and compare it with state-of-the-art kernel PCA learning from a feature-space proximity representation; this relates to a k-radius ball of nearest neighbors. Finally, we extend our unsupervised model with class co-occurrence boosting using pre-trained multi-modal reuse.
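The sensor proximity matrix can be sketched directly: binarize the measurements and compute pairwise Jaccard similarities. A minimal version, with an illustrative input layout (rows as sensors, columns as binary observations):

```python
import numpy as np

def jaccard_proximity(binary_measurements):
    """Pairwise Jaccard similarity between binarized sensor measurements.
    Returns an (n_sensors, n_sensors) proximity matrix."""
    X = np.asarray(binary_measurements, dtype=bool)
    inter = (X[:, None, :] & X[None, :, :]).sum(-1)
    union = (X[:, None, :] | X[None, :, :]).sum(-1)
    return inter / np.maximum(union, 1)    # guard against empty unions

sensors = np.random.rand(5, 100) > 0.5     # 5 sensors, 100 binary readings
print(jaccard_proximity(sensors))
```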
Deep learning for fast super-resolution reconstruction from multiple images
Michal Kawulok, Pawel Benecki, Krzysztof Hrynczenko, et al.
Recent advancements in single-image super-resolution reconstruction (SRR) are attributed primarily to convolutional neural networks (CNNs), which effectively learn the relation between low and high resolution and make it possible to obtain high-quality reconstruction within seconds. SRR from multiple images benefits from information fusion, which improves the reconstruction outcome compared with example-based methods. On the other hand, multiple-image SRR is computationally more demanding, mainly due to the required subpixel registration of the input images. Here, we explore how to exploit CNNs in multiple-image SRR and demonstrate that a competitive reconstruction outcome can be obtained within seconds.
Real-Time Algorithms and Systems I
An efficient algorithm for fast block matching motion estimation using an adaptive threshold scheme
Djoudi Kerfa, Abdelkader Saidane
A new block-matching algorithm for fast motion estimation is proposed. The so-called Star Diamond Search with Adaptive Threshold (SDth) is a two-step algorithm. An adaptive threshold on the matching error eliminates invalid blocks early in the motion estimation procedure; the algorithm then searches for the final motion vector with a star-diamond search. The proposed SDth algorithm has been implemented and tested on several video sequences, and compared with previous search methods to demonstrate its utility.
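A hedged sketch of the threshold idea: candidate blocks whose matching error exceeds an adaptive threshold are discarded before they can influence the search. The candidate pattern and threshold below are placeholders, not the SDth search pattern itself.

```python
import numpy as np

def sad(block_a, block_b):
    """Sum of absolute differences, the usual block-matching error."""
    return np.abs(block_a.astype(int) - block_b.astype(int)).sum()

def search_with_threshold(cur_block, ref, cx, cy, candidates, threshold):
    """Evaluate candidate displacements, eliminating early any block whose
    matching error exceeds the (adaptive) threshold."""
    h, w = cur_block.shape
    best, best_mv = None, (0, 0)
    for dx, dy in candidates:
        x, y = cx + dx, cy + dy
        if x < 0 or y < 0 or y + h > ref.shape[0] or x + w > ref.shape[1]:
            continue
        err = sad(cur_block, ref[y:y + h, x:x + w])
        if err > threshold:          # invalid block: discarded early
            continue
        if best is None or err < best:
            best, best_mv = err, (dx, dy)
    return best_mv, best
```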
Low-exposure image frame generation algorithms for feature extraction and classification
Krishnamurthy V. Vemuru, Jeffrey D. Clark
Neuromorphic architectures enable machine learning on a faster timescale than conventional processors and require the encoding of spike trains from images for computer vision applications. We report low-exposure image representation algorithms that can generate multiple short-exposure frames from a given long-exposure image. The frame deconvolution is non-linear in the sense that the difference between adjacent short-exposure frames changes with exposure time; however, the frames retain a structural representation of the original image, such that the image reconstructed from these frames has a Peak Signal-to-Noise Ratio (PSNR) of over 300 and a Structural Similarity Index Metric (SSIM) close to unity. We show that the low-exposure frames generated by our algorithms enable feature extraction for machine learning or deep learning, e.g., classification using convolutional neural networks. The validation accuracy for classification depends on the range of the random subtraction parameter a, used in our algorithms to simulate low-exposure frames. When the maximum of a equals the largest allowed change in pixel intensity per time step, the validation accuracy for classification of digits in the Digits dataset is 90±3% based on the first 1 ms frame. The accuracy increases to 97% with only 40% of the 1 ms frames generated for a given exposure time. These results show that machine learning can be extended to low-exposure images.
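A rough sketch of the decomposition idea, under the assumption that each short-exposure frame is obtained by randomly subtracting up to a from the remaining intensity budget; the authors' exact algorithm may differ.

```python
import numpy as np

def short_exposure_frames(long_img, n_frames, a_max, rng=None):
    """Decompose a long-exposure image into short-exposure frames by
    repeated random subtraction with parameter a in [0, a_max]
    (a sketch of the idea, not the paper's exact algorithm)."""
    rng = rng or np.random.default_rng()
    remaining = long_img.astype(np.int32).copy()
    frames = []
    for _ in range(n_frames):
        a = rng.integers(0, a_max + 1, size=long_img.shape)
        step = np.minimum(remaining, a)   # cannot subtract below zero
        remaining -= step
        frames.append(step)
    frames[-1] += remaining               # keep sum equal to the input
    return frames                         # sum(frames) reconstructs long_img
```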
Parallel image and video self-recovery scheme with high recovery capability
Javier Molina-Garcia, Volodymyr I. Ponomaryov, Rogelio Reyes-Reyes, et al.
In this paper, a parallel scheme for self-recovery of tampered images and videos is proposed. The designed technique is based on two methods for generating the digest image: halftoning and a block-based scheme; in addition, an authentication algorithm is implemented using a block-based method. To obtain robustness to the tampering-coincidence problem, the proposed scheme embeds multiple versions of the recovery watermark. Finally, during the recovery process, an inverse-halftoning algorithm is applied to improve the quality of the reconstructed image. The proposed scheme was implemented so that each stage is highly parallelizable on GPUs and multicore processors. Experimental results show that the novel framework produces watermarked and recovered images of good quality. Simulation results on parallel architectures demonstrate the efficiency of the technique when implemented in a real-time environment.
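As an illustration of the digest-generation stage, the sketch below implements classic Floyd-Steinberg error-diffusion halftoning, one standard way to build the kind of compact halftone digest such a scheme embeds; the paper's specific halftoning variant is not reproduced here.

```python
import numpy as np

def floyd_steinberg(gray):
    """Error-diffusion halftoning of an 8-bit grayscale image."""
    img = gray.astype(float).copy()
    h, w = img.shape
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            old = img[y, x]
            new = 255.0 if old >= 128 else 0.0
            out[y, x] = new
            err = old - new                     # diffuse quantization error
            if x + 1 < w:               img[y, x + 1]     += err * 7 / 16
            if y + 1 < h and x > 0:     img[y + 1, x - 1] += err * 3 / 16
            if y + 1 < h:               img[y + 1, x]     += err * 5 / 16
            if y + 1 < h and x + 1 < w: img[y + 1, x + 1] += err * 1 / 16
    return out
```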
Learning optimal actions with imperfect images
Song Jun Park, Dale R. Shires
Deep reinforcement learning has been successful in training an agent to play Atari games at human level. There, the game's image output was fed into a deep neural network to compute optimal actions. Conceptually, reinforcement learning can be viewed as the intersection of planning, uncertainty, and learning. In this paper, deep reinforcement learning is applied to a problem formulated as a partially observable Markov decision process (POMDP). Specifically, the input images are perturbed to introduce imperfect knowledge. POMDP formulations assume uncertainty in the true state space and are thus a more accurate representation of real-world scenarios. The deep Q-network is adopted to see whether an optimal sequence of actions can be learned when the inputs are not fully observable. Experimental results indicate that deep reinforcement learning discovered optimal strategies in the majority of test cases, albeit converging to the optimal solution more slowly.
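The perturbation setup can be sketched as a thin wrapper that corrupts each observation before the agent sees it, turning the fully observable game into a POMDP; the environment interface and noise model below are illustrative assumptions.

```python
import numpy as np

class NoisyObservationWrapper:
    """Perturb observations so the agent faces a POMDP: the true frame is
    hidden behind noise (an illustrative stand-in for the paper's setup)."""
    def __init__(self, env, noise_std=0.1, rng=None):
        self.env, self.noise_std = env, noise_std
        self.rng = rng or np.random.default_rng()

    def _perturb(self, obs):
        noisy = obs + self.rng.normal(0.0, self.noise_std, obs.shape)
        return np.clip(noisy, 0.0, 1.0)   # assumes frames scaled to [0, 1]

    def reset(self):
        return self._perturb(self.env.reset())

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return self._perturb(obs), reward, done, info
```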
Deep Learning III
CNN classification based on global and local features
In this study, we investigated how shapes are classified based on local and global features by four representative convolutional neural networks (CNNs): AlexNet, VGG, ResNet, and Inception. While local features are based on simple components, such as the orientation of a line segment, global features are based on the whole object, such as whether an object has a hole. For example, solid triangles and solid squares are differentiated by local features, whereas solid circles and rings are differentiated by a global feature. Two sets of experiments were performed. In the first, we examined how the four CNNs pre-trained on ImageNet (with transfer learning) learned to differentiate regular shapes (equilateral triangles, squares, circles, and rings). Our results showed that the pre-trained CNNs exhibited faster learning rates on tasks discriminating the local features than on tasks discriminating the global feature. However, the transfer learning of discriminating the global feature in regular shapes generalized better to irregular shapes than the transfer learning of discriminating local features. In the second experiment, the CNNs were trained from scratch (with random weight initialization) to discriminate local and global features in regular and irregular shapes. Unlike with transfer learning, the CNNs exhibited faster learning rates in discriminating the global feature than the local features. As with transfer learning, the CNNs generalized well to discriminating the global feature of irregular shapes, but poorly to discriminating the local features in irregular shapes. The overarching goal of this research is to create a paradigm and benchmark to directly compare how CNNs and primate visual systems process geometrical invariants. In contrast to the ImageNet approach, which employs natural images to train CNNs, we employed the “ShapeNet” approach, which features geometrical shapes with well-defined properties. The ShapeNet approach will not only help elucidate the strengths and limitations of CNN computation, but also provide insights into visual information processing in primates.
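A minimal sketch of the kind of shape generator such a benchmark needs, using OpenCV drawing primitives; note that a solid circle and a ring differ only by the hole, a global feature.

```python
import numpy as np
import cv2

def make_shape(kind, size=64):
    """Render one regular shape on a blank canvas (illustrative generator,
    not the authors' dataset code)."""
    img = np.zeros((size, size), np.uint8)
    c, r = (size // 2, size // 2), size // 3
    if kind == "circle":
        cv2.circle(img, c, r, 255, -1)
    elif kind == "ring":
        cv2.circle(img, c, r, 255, -1)
        cv2.circle(img, c, r // 2, 0, -1)       # punch the hole
    elif kind == "square":
        cv2.rectangle(img, (c[0]-r, c[1]-r), (c[0]+r, c[1]+r), 255, -1)
    elif kind == "triangle":
        pts = np.array([[c[0], c[1]-r], [c[0]-r, c[1]+r], [c[0]+r, c[1]+r]],
                       dtype=np.int32)
        cv2.fillPoly(img, [pts], 255)
    return img
```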
Using subject face brightness assessment to detect ‘deep fakes’ (Conference Presentation)
A technology to create videos that purport to be authentic recordings of a subject has been developed and, based on its use of deep learning neural networks, is commonly known as ‘deep fakes.’ The videos produced by this technology are, in at least some cases, visually indistinguishable from real recordings. They can, prospectively, be used to defame celebrities, politicians, and business leaders, to make false statements appear authentic, and potentially even to change the course of national and regional elections. Clearly, techniques are needed to ascertain the veracity of video recordings. Techniques have been proposed that measure blinking and other characteristics of the subject; however, these rely on minor flaws in the video-generation algorithms, which may disappear quickly as developers improve their software. This paper presents work toward a more systematic way to detect artificially constructed videos. Specifically, it analyzes the brightness of the subject’s face, comparing algorithmically constructed or reconstructed videos to actual recordings made under similar circumstances. Brightness is compared and averaged across the image; additionally, pixel-to-adjacent-pixel and regional differences are compared and contrasted. Comparative results for constructed and original recordings are presented. Also discussed are the differences in results when different individuals, including the subject him- or herself, are used as the basis for the subject’s reconstruction. Based on the results and analysis, the paper discusses the pros and cons of using subject face brightness as a technique for detecting fakes across several application areas.
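A hedged sketch of the brightness cues described: global mean brightness, pixel-to-adjacent-pixel differences, and coarse regional means over a detected face region (face detection and the comparison protocol are omitted).

```python
import numpy as np

def brightness_stats(face_region):
    """Brightness cues that can be compared between real and reconstructed
    faces (illustrative feature set, not the paper's exact measures)."""
    f = face_region.astype(float)
    mean_brightness = f.mean()
    dx = np.abs(np.diff(f, axis=1)).mean()   # horizontal neighbour diffs
    dy = np.abs(np.diff(f, axis=0)).mean()   # vertical neighbour diffs
    h, w = f.shape[:2]
    regions = [f[i*h//2:(i+1)*h//2, j*w//2:(j+1)*w//2].mean()
               for i in range(2) for j in range(2)]  # 2x2 regional means
    return mean_brightness, dx, dy, regions
```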
Real-Time Video Processing
Kalman-based motion estimation in video surveillance systems for safety applications
The paper presents the design of a Kalman-based motion estimation algorithm to be used in video surveillance systems for safety applications, such as fast smoke and fire alarm generation. In particular, the Kalman estimator, combined with a color detection method, allows detecting the presence of smoke and/or fire in video scenes. Hence, it is a key component of indoor and outdoor camera-based surveillance systems.
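A minimal constant-velocity Kalman filter for tracking a detected blob's 2-D position; the state model and noise covariances below are generic textbook choices, not the paper's tuning.

```python
import numpy as np

# State: [px, py, vx, vy]; we observe position only.
dt = 1.0
F = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
              [0, 0, 1, 0],  [0, 0, 0, 1]], float)   # state transition
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)    # measurement model
Q = 0.01 * np.eye(4)                                 # process noise
R = 1.0 * np.eye(2)                                  # measurement noise

x = np.zeros(4)
P = np.eye(4)

def kalman_step(z):
    """Predict, then correct with the measured blob position z = (px, py)."""
    global x, P
    x = F @ x
    P = F @ P @ F.T + Q
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)     # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x[:2], x[2:]   # estimated position and velocity
```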
Recent advances in integrated photonic-electronic technologies for high-speed processing and communication circuits for light-based transducers
The paper reviews recent advances in integrated technologies for embedding photonic and electronic circuits and systems within the same chip and/or package. Thanks to multi-project wafer services like Europractice, new technologies (e.g., SG25_PIC by IHP or iSiPP50G by IMEC) are available at affordable cost for integrating photonic devices such as waveguides, photodiodes, photonic modulators, and fiber couplers on a silicon chip. As an evolution, IHP's SG25H_EPIC is a high-performance technology that combines, on the same chip, the integrated silicon photonic devices of the SG25_PIC technology with the bipolar and CMOS electronic circuits of the SiGe BiCMOS SG25H4 technology. The paper reviews these capabilities and shows how photonics and electronics can be integrated on the same chip to reach high speed in the acquisition, processing, and transmission of signals from light-based transducers.
Real-Time Algorithms and Systems II
Detection of retinal abnormalities using smartphone-captured fundus images: a survey
Several retinal pathologies cause severe damage that may lead to vision loss. Some of this damage requires expensive treatment, and some is irreversible due to the lack of therapies. Early diagnosis is therefore highly recommended to control ocular diseases. However, the early stages of several ocular pathologies produce symptoms that patients cannot distinguish. Moreover, an ageing population, as found in most industrialized countries, is an important prevalence factor for ocular diseases, and ageing also implies reduced mobility, a limiting factor for periodic eye screening. These constraints lead to late ocular diagnoses, and hence large numbers of ocular pathology patients are registered. Forecast statistics indicate that the affected population will grow in the coming years.

Several devices allowing capture of the retina have recently been proposed. They consist of optical lenses that can be snapped onto a smartphone, providing fundus images of acceptable quality. The challenge, then, is to perform automatic ocular pathology detection on smartphone-captured fundus images with high detection performance while respecting the timing constraints of clinical use. This paper presents a survey of smartphone-captured fundus image quality and of the existing methods that use such images for detecting retinal structures and abnormalities.

To this end, we first summarize the works that evaluate smartphone-captured fundus image quality and field of view (FOV). Then, we report on the ability to detect abnormalities and ocular pathologies from those fundus images. Thereafter, we propose a flowchart of the processing pipeline of detection methods for smartphone-captured fundus images, and we investigate the implementation environment required to perform the detection of retinal abnormalities.
Real-time pyramidal Lukas-Kanade tracker performance estimation
Pavel Babayan, Sergei Buiko, Leonid Vdovkin, et al.
One effective way to improve object tracking performance is to fuse base tracking algorithms so as to exploit their advantages and eliminate their disadvantages. This fusion requires estimating the performance of the base object tracking algorithms; the performance of each algorithm must be estimated in real time for its result to be used in the fusion. In this paper we propose a performance estimation algorithm for an object tracker based on the pyramidal implementation of the Lukas-Kanade feature tracker.

The performance estimation is based on analyzing the variation of intermediate algorithm parameters calculated during object tracking, such as total and mean feature lifetime, eigenvalues, inter-frame mean square coordinate difference, etc. Different combinations of these parameters were tested to obtain the best evaluation quality. The statistical measures were calculated over image sequences one to two hundred frames long. These measures are highly correlated with performance measures based on ground truth data: tracking precision and the ratio of falsely detected features. The experimental research was performed using synthetic and real-world image sequences. We investigated the effectiveness of performance estimation under different observation conditions and during image degradations caused by noise, blur, and low contrast.

The experimental results show good performance estimation quality. This allows the Lukas-Kanade feature tracker to be fused with other tracking algorithms (correlation-based, segmentation, change detection) to obtain reliable tracking. Since the approach is based on intermediate Lukas-Kanade algorithm parameters, it adds no appreciable computational complexity to the tracking process, so performance estimation can be implemented in real time.
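A minimal OpenCV sketch of the underlying tracking step together with two of the intermediate statistics mentioned (the ratio of lost features and the inter-frame mean square coordinate difference); the file names are placeholders.

```python
import cv2
import numpy as np

# Track features with OpenCV's pyramidal Lucas-Kanade implementation and
# collect simple statistics usable as performance cues (illustrative).
prev = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)   # hypothetical files
curr = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

p0 = cv2.goodFeaturesToTrack(prev, maxCorners=200, qualityLevel=0.01,
                             minDistance=7)
p1, status, err = cv2.calcOpticalFlowPyrLK(prev, curr, p0, None,
                                           winSize=(21, 21), maxLevel=3)

lost_ratio = 1.0 - status.mean()                # fraction of lost features
good_old, good_new = p0[status == 1], p1[status == 1]
msd = np.mean(np.sum((good_new - good_old) ** 2, axis=-1))  # mean sq. shift
print(lost_ratio, msd)
```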
Robust switching technique of impulsive noise removal in color digital images
Bogdan Smolka, Boguslaw Cyganek, Michal Kawulok, et al.
In this paper a novel method of impulsive noise removal in color images is presented. The proposed filtering design is based on a new measure of pixel similarity, which takes into account the structure of the local neighborhood of the pixels being compared. Thus, the new distance measure can be regarded as an extension of the reachability distance used in the construction of the local outlier factor, widely used in the big data analysis. Using the new similarity measure, an extension of the classic Vector Median Filter (VMF) has been developed. The new filter is extremely robust to outliers introduced by the impulsive noise, retains details and has the unique ability to sharpen image edges. Using the structure of the developed filter, a new impulse detector has been constructed. The cumulated sum of smallest reachability distances in the filtering window serves as a robust measure of pixel outlyingness. In this way, a pixel will be treated as corrupted if a predefined threshold is exceeded and will be replaced by the average of pixels which were found to belong to the original, pristine image; otherwise the processed pixel will be retained. This structure is similar to the Fast Averaging Peer Group Filter, however the incorporation of the reachability measure makes this technique more robust. The new filtering design can be applied in real time scenario, as its computational efficiency is comparable with the standard VMF, which is fast enough to be used for the enhancement of video sequences. The new filter operates in a 3×3 filtering window, however the information acquired from a larger window is processed. The source of additional information is the local neighborhood of pixels, which is used for the determination of the novel reachability measure. The experiments performed on a large database of color images show that the new filter surpasses existing designs especially in the case of highly polluted images. The robust reachability measure assures that the clusters of impulses are being removed, as not only the pixels, but also their neighborhoods are considered. The novel measure of dissimilarity can be also used in other tasks whose main goal is the detection of outliers.
Radar real-time image processing for machine perception
The paper reviews real-time image processing techniques for machine perception, with a specific focus on radar image processing for autonomous vehicles. Both algorithms and real-time computing platforms for range-Doppler-direction analysis (i.e., estimation of the distance, speed, and angle of the target) are reviewed.
Simultaneous computation of discrete Radon transform quadrants for efficient implementation on real time systems
Ricardo Oliva-García, Óscar Gómez-Cárdenes, David Carmona-Ballester, et al.
The discrete Radon transform is a technique for detecting lines in images. It is much lighter to compute than Radon transforms based on the Fourier slice theorem, which use the FFT as their basic computing block. Even so, it does not lend itself to optimal fine-grained parallelization, because four passes over mirrored and flipped versions of the input are needed to compute the four quadrants, of 45 degrees each, that arise from the decomposition of discrete lines in slope-intercept form. A new method is proposed that solves the four quadrants simultaneously, allowing a more efficient parallelization. In higher dimensions the Radon transform needs even more runs of the basic algorithm, e.g., in three dimensions there are 12 dodecants to solve instead of 4 quadrants. The proposed method can be extended to alleviate the problem in those higher dimensions as well, achieving an even greater gain.
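A deliberately naive illustration of slope-intercept line sums for a single 45-degree quadrant (with wrap-around intercepts for brevity); the paper's contribution, computing all four quadrants in one simultaneous pass, is not reproduced here.

```python
import numpy as np

def drt_quadrant(img):
    """Naive line sums over discrete lines y = b + s*x with slopes in
    [0, 1] (one quadrant of the transform). Illustration only: a real
    discrete Radon transform uses a fast recursive scheme."""
    n, m = img.shape
    out = np.zeros((n, m))          # rows: slope index, cols: intercept
    xs = np.arange(n)
    for s in range(n):              # slope s/(n-1) in [0, 1]
        for b in range(m):
            ys = (b + (s * xs) // max(n - 1, 1)) % m
            out[s, b] = img[xs, ys].sum()
    return out
```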
Poster Session
Design of neuron-calculators for the normalized equivalence of two matrix arrays based on FPGA for self-learning equivalently convolutional neural networks (SLE_CNNs)
First, in the introduction, we show the urgent need to create neuron-calculators (NCs) for the normalized equivalence of two matrix arrays for self-learning equivalently-convolutional neural networks (SLE_CNNs) and video processors for parallel image processing with enhanced functionality. We consider promising areas of application of such single- and multi-channel neuron-calculators as high-precision, high-speed, high-performance accelerators for hardware systems and architectures for recognition, classification, and image categorization, in particular for 2D-image space-invariant associative memory structures and SLE_CNNs based on the equivalence paradigm. Next, we consider and analyze the theoretical foundations and the mathematical apparatus of matrix and continuous logic and their basic operations, show their functional completeness, and evaluate their advantages and prospects for the design of biologically inspired devices and systems for processing and analyzing signal arrays. We show that certain functions of continuous logic, including the normalized equivalence of vector and matrix signals and the bounded difference of continuous logic, form a powerful basis for designing advanced accelerator calculators and microcells for hybrid (mixed) analog-to-digital transformations, comparisons, and computations of characteristics. We then consider in more detail the design and simulation of such digital neuron-calculators for the normalized equivalence of two matrix arrays on FPGA, including various modifications depending on the dimension and number of compared arrays. We propose an approach for calculating the normalized equivalence functions comparing current image fragments and filters of dimensions 3×3, 7×7, 15×15, and others. The project is implemented on an Altera MAX II FPGA. Simulation with Intel Quartus Prime 17 showed that a single chip can host four parallel neuron-calculators processing four filters of size 15×15. The approximate processing time is less than 5 μs; power consumption is 50 mW at a supply voltage of 2.5 V and a clock frequency of 50 MHz. We also consider modifications that improve performance for different filter sizes, present modeling results for the proposed new NC implementations, and estimate and compare them.
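In software terms, the normalized equivalence of two arrays built from the continuous-logic primitives mentioned above can be sketched as the mean of the element-wise equivalences 1 - |x - y|; the FPGA computes this with bounded-difference hardware, but the arithmetic is the same.

```python
import numpy as np

def normalized_equivalence(a, b):
    """Normalized equivalence of two arrays with entries in [0, 1]:
    the mean of the element-wise continuous-logic equivalence 1 - |x - y|
    (a software sketch of the FPGA neuron-calculator's function)."""
    a = np.asarray(a, float)
    b = np.asarray(b, float)
    return float(np.mean(1.0 - np.abs(a - b)))   # 1.0 means identical

patch = np.random.rand(15, 15)     # current image fragment
filt = np.random.rand(15, 15)      # filter of the same size
print(normalized_equivalence(patch, filt))
print(normalized_equivalence(patch, patch))      # -> 1.0
```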
Parallel blind semi-fragile color image watermarking based on fast discrete cosine transform
Alexis Jiménez-Calzadilla, Volodymyr I. Ponomaryov, Rogelio Reyes-Reyes, et al.
In this paper, a completely blind, semi-fragile color image watermarking method for copyright protection is proposed. The embedding algorithm works by swapping three pairs of mid-frequency coefficients in each non-overlapping 8×8 Discrete Cosine Transform (DCT) block, using a fast DCT algorithm that exploits the eigenvectors of a symmetric “second difference” matrix to achieve higher computational performance. The simplicity of the embedding, extraction, and DCT computation offers a significant advantage in processing time. Experimental results demonstrate watermark imperceptibility, with high PSNR and SSIM values for the watermarked image (45 dB and 0.948, respectively). Additionally, the embedded 2D binary watermark, such as a company trademark or owner’s logotype, can be extracted with a completely blind process, i.e., neither the original carrier image, the original watermark, nor any other information derived from them is required. The average normalized correlation of the extracted binary watermark is 0.91 even when the watermarked image is attacked by scaling, noise addition, or JPEG compression. Simulation results of the parallel multicore-CPU implementation show that the proposed method is more efficient and effective for real-time image watermarking than commonly used DCT techniques.
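The embedding rule can be sketched per block: enforce an ordering on a pair of mid-frequency DCT coefficients according to the watermark bit. One illustrative pair is shown; the paper swaps three pairs per 8×8 block and uses a specialized fast DCT.

```python
from scipy.fftpack import dct, idct

def embed_bit(block, bit, pair=((2, 3), (3, 2))):
    """Embed one watermark bit in an 8x8 block by ordering a pair of
    mid-frequency DCT coefficients (illustrative pair and rule)."""
    C = dct(dct(block.T, norm='ortho').T, norm='ortho')   # 2-D DCT
    (u1, v1), (u2, v2) = pair
    if (C[u1, v1] > C[u2, v2]) != bool(bit):   # enforce the bit's ordering
        C[u1, v1], C[u2, v2] = C[u2, v2], C[u1, v1]
    return idct(idct(C.T, norm='ortho').T, norm='ortho')  # inverse 2-D DCT
```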
A robust framework for tamper detection in digital recorded voice signals
Adriana Bernal-Patiño, Volodymyr I. Ponomaryov, Rogelio Reyes-Reyes, et al.
This paper focuses on detecting tampering, and pinpointing the manipulated region, in digitally recorded voice signals via a blind semi-fragile watermark embedded in the frequency domain using the Fast Fourier Transform and Quantization Index Modulation with Dither Modulation (QIM-DM). The proposed method has been evaluated on several male and female recorded voice signals. It introduces no audibly perceptible difference between the original and the watermarked voice signals, obtaining an average Signal-to-Noise Ratio of 44 dB, and provides resistance to various MP3 compression rates. Simulation results of the parallel implementation on a multicore CPU platform show efficient performance in a real-time environment.
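A compact sketch of QIM-DM itself: the watermark bit selects one of two dithered quantization lattices for a frequency coefficient, and extraction picks the nearer lattice. The step size and dither are placeholders.

```python
import numpy as np

def qim_dm_embed(coeff, bit, delta, dither):
    """Quantize a coefficient onto the lattice selected by the bit;
    the two lattices are offset by delta/2 (standard QIM-DM)."""
    d = dither if bit else dither - delta / 2.0
    return delta * np.round((coeff - d) / delta) + d

def qim_dm_extract(coeff, delta, dither):
    """Decode by re-quantizing with both lattices and picking the nearer."""
    e1 = abs(coeff - qim_dm_embed(coeff, 1, delta, dither))
    e0 = abs(coeff - qim_dm_embed(coeff, 0, delta, dither))
    return int(e1 < e0)

c = qim_dm_embed(0.37, 1, delta=0.1, dither=0.02)
assert qim_dm_extract(c, delta=0.1, dither=0.02) == 1
```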
Mobile detection of crop diseases for agricultural yield management
In agricultural economies worldwide, plant diseases are a major cause of economic losses. In this paper we propose an automated method for real-time crop monitoring and disease detection. Images were captured on a daily basis over an 8-acre field. Image features are extracted using Speeded-Up Robust Features (SURF) after the Maximally Stable Extremal Regions (MSER) method finds blobs. The features are used to classify the images with K-means clustering in the training phase. Ground-truth diseased-crop images are stored in a database with the same features to act as prototypes and are compared to real-time images for disease detection using nearest-neighbor classification. The experimental dataset currently consists of rice and maize crops, with 100 diseased images and approximately 1000 normal crop images. Results show 83.3% accuracy and provide farmers with information about their crops, alerting them to disease when required and allowing corrective action. There is scope to extend the classification and detection method to real-time platforms. Such applications would be a valuable tool for agricultural yield management, especially since the field of interest may be very large and diseases may not be uniformly distributed.
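A hedged sketch of the front half of the described pipeline with OpenCV: MSER proposes blob regions, SURF describes sampled points, and K-means clusters the descriptors. SURF requires the opencv-contrib nonfree build; the file name and parameters are illustrative.

```python
import cv2
import numpy as np

img = cv2.imread("leaf.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical file

# MSER finds stable blob regions; sample a few points per region.
mser = cv2.MSER_create()
regions, _ = mser.detectRegions(img)
keypoints = [cv2.KeyPoint(float(x), float(y), 16)
             for r in regions for x, y in r[::max(1, len(r) // 4)]]

# SURF descriptors at the sampled keypoints (opencv-contrib, nonfree).
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
_, desc = surf.compute(img, keypoints)

# Cluster descriptors with K-means to build the training vocabulary.
if desc is not None:
    desc = np.float32(desc)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, labels, centers = cv2.kmeans(desc, 8, None, criteria, 3,
                                    cv2.KMEANS_RANDOM_CENTERS)
```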
Ocular disease diagnosis in fundus images using deep learning: approaches, tools and performance evaluation
Yaroub Elloumi, Mohamed Akil, Henda Boudegga
Ocular pathology detection from fundus images presents an important challenge in health care. Each pathology has different severity stages that may be deduced by verifying the existence of specific lesions, and each lesion is characterized by morphological features. Moreover, several lesions of different pathologies have similar features, and a patient may be affected by several pathologies simultaneously. Consequently, ocular pathology detection is a multiclass classification problem with a complex resolution principle. Several methods for detecting ocular pathologies from fundus images have been proposed; those based on deep learning are distinguished by higher detection performance, due to their capability to configure the network with respect to the detection objective.

This work proposes a survey of ocular pathology detection methods based on deep learning. First, we study the existing methods for either lesion segmentation or pathology classification. Afterwards, we extract the principal processing steps and analyze the proposed neural network structures. Subsequently, we identify the hardware and software environment required to deploy the deep learning architectures. Thereafter, we investigate the experimental methodology used to evaluate the methods and the databases used for the training and testing phases. Detection performance ratios and execution times are also reported and discussed.
Visual front-end for underwater scene change detection and environment monitoring by the autonomous drone
Autonomous underwater drone operation requires on-line analysis of signals coming from various sensors. In this paper we focus on the design of the visual front end of an underwater drone, optimized for abrupt signal-change detection to aid maneuvering and underwater object-search operations. The proposed method relies on tensor space comparison with the chordal kernel function. This kernel measures a distance expressed through the principal angles between unfolded tensors on Grassmann manifolds. Although tested on color videos, the method can be scaled to accept more signal types in the input tensors. Experiments show promising results.
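The chordal comparison can be sketched in a few lines: orthonormalize the unfolded tensors, take the SVD of their cross-product to obtain principal angles, and turn the projection distance into a kernel. The RBF form below is one common choice, not necessarily the paper's exact kernel.

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles between the column spaces of two unfolded tensors:
    orthonormalize, then SVD of the cross-product."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))

def chordal_kernel(A, B, gamma=1.0):
    """A chordal-type kernel on the Grassmann manifold: RBF of the
    projection distance (sum of squared sines of principal angles)."""
    theta = principal_angles(A, B)
    return np.exp(-gamma * np.sum(np.sin(theta) ** 2))
```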
Rock grains segmentation using curvilinear structures based features
Sebastian Iwaszenko, Karolina Nurzynska
A segmentation method for delineating rock grains in images of rock material is presented. The method is based on a five-dimensional intensity feature vector calculated for each point of the analyzed image. The features include the pixel's grey level, the grey-level average and standard deviation calculated over the pixel's neighbourhood, and the vesselness and vesselness-scale parameters, which measure the local curvilinearity of objects depicted in the image. Machine learning classifiers, such as k-nearest neighbours, support vector machines, and artificial neural networks, are used to determine the edges of rock grains. Manually segmented images were used as ground truth. A series of experiments was performed to find the best feature-calculation and classification parameters. Post-processing methods (thinning and best-fit) are proposed to improve the delineation. The obtained results show that an accuracy as high as 89% can be expected.
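A partial sketch of the per-pixel feature maps (grey level, neighbourhood mean, and neighbourhood standard deviation); the two vesselness features would come from a curvilinearity filter such as Frangi's, omitted here.

```python
import numpy as np
from scipy import ndimage

def point_features(gray, win=7):
    """Per-pixel feature maps for grain-edge classification: grey level,
    local mean, and local standard deviation over a win x win window
    (the vesselness features of the 5-D vector are not computed here)."""
    g = gray.astype(float)
    mean = ndimage.uniform_filter(g, size=win)
    sq_mean = ndimage.uniform_filter(g * g, size=win)
    std = np.sqrt(np.maximum(sq_mean - mean * mean, 0.0))
    return np.stack([g, mean, std], axis=-1)   # shape (H, W, 3)
```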
A newly proposed object detection method using Faster R-CNN inception with ResNet based on Tensorflow
The purpose of this project is to study previous deep learning object detection methods and to propose a new one. The new model combines three techniques: Region-based Convolutional Neural Networks (R-CNN), Inception, and ResNet. In object detection, every method aims to ensure maximum accuracy, high FPS, high resolution, and fast speed, but due to limited computational power it is not always possible to balance all four. R-CNN in general is capable of handling high-resolution images at a decent number of frames per second. In our method we introduced the Inception concept by dividing each large convolutional layer into smaller convolutional layers, and by adding ResNet we were able to remove any extra layers that did not help in gaining higher accuracy. Although the FPS was low, the input images were high resolution and the mean average precision was also high (almost as high as SSD). We retrained on the COCO and OpenImages datasets, and the results were decent. The model we built was based on the TensorFlow library.
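For orientation, this is the standard way to run a Faster R-CNN Inception-ResNet detector exported with the TensorFlow Object Detection API; the model path is a placeholder, and the output keys follow the API's conventions rather than anything specific to this paper.

```python
import numpy as np
import tensorflow as tf

# Load an exported SavedModel (hypothetical path) and run one detection.
detect_fn = tf.saved_model.load("faster_rcnn_inception_resnet/saved_model")

image = np.zeros((1, 640, 640, 3), dtype=np.uint8)   # placeholder input
detections = detect_fn(tf.constant(image))

boxes = detections["detection_boxes"][0].numpy()
scores = detections["detection_scores"][0].numpy()
classes = detections["detection_classes"][0].numpy()
keep = scores > 0.5                                  # confidence filter
print(boxes[keep], classes[keep])
```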