Tenth International Conference on Machine Vision (ICMV 2017)

Volume Details

Date Published: 18 April 2018

Contents: 11 Sessions, 98 Papers, 0 Presentations

Conference: Tenth International Conference on Machine Vision 2017

Volume Number: 10696

Front Matter: Volume 10696
Pattern Recognition
Target Detection and Tracking
Feature Matching and Image Segmentation
Image Classification
Image Analysis and Quality Assessment
Machine Vision and Visualization
Video Processing Technology and Methods
Computer Photography and Imaging Technology
Image Processing and Applications
Computer Information Engineering and Signal Processing

Front Matter: Volume 10696

This PDF file contains the front matter associated with SPIE Proceedings Volume 10696, including the Title Page, Copyright information, Table of Contents, Introduction (if any), and Conference Committee listing.

Pattern Recognition

Analysis of straw row in the image to control the trajectory of the agricultural combine harvester (Erratum)

Aleksandr Yurievich Shkanaev, Darya Alekseevna Krokhina, Dmitry Valerevich Polevoy, et al.

Publisher’s Note: This paper, originally published on 13 April 2018, was replaced with a corrected/revised version on 14 September 2018. If you downloaded the original PDF but are unable to access the revision, please contact SPIE Digital Library Customer Service for assistance.

The paper proposes a solution for the automatic operation of a combine harvester along straw rows using images from a camera installed in the cab of the harvester. A U-Net is used to recognize straw rows in the image. The edges of the row are approximated in the segmented image by curved lines and then converted into the harvester coordinate system for the automatic operating system. The “new” network architecture and the approaches to row approximation have improved the quality of the recognition task and the processing speed of the frames up to 96% and 7.5 fps, respectively. Keywords: Grain harvester,

Geometry-based populated chessboard recognition

Youye Xie, Gongguo Tang, William Hoff

Chessboards are commonly used to calibrate cameras, and many robust methods have been developed to recognize unoccupied boards. However, when the chessboard is populated with chess pieces, such as during an actual game, the problem of recognizing the board is much harder. Challenges include occlusion caused by the chess pieces, the presence of outlier lines and low viewing angles of the chessboard. In this paper, we present a novel approach to address the above challenges and recognize the chessboard. The Canny edge detector and Hough transform are used to capture all possible lines in the scene. The k-means clustering and a k-nearest-neighbors inspired algorithm are applied to cluster and reject the outlier lines based on their Euclidean distances to the nearest neighbors in a scaled Hough transform space. Finally, based on prior knowledge of the chessboard structure, a geometric constraint is used to find the correspondences between image lines and the lines on the chessboard through the homography transformation. The proposed algorithm works for a wide range of operating angles and achieves high accuracy in experiments.

Biometric identification based on feature fusion with PCA and SVM

László Lefkovits, Szidónia Lefkovits, Simina Emerich

Biometric identification is gaining ground compared to traditional identification methods. Many biometric measurements may be used for secure human identification. The most reliable among them is the iris pattern because of its uniqueness, stability, unforgeability and inalterability over time. The approach presented in this paper is a fusion of different feature descriptor methods such as HOG, LIOP, LBP, used for extracting iris texture information. The classifiers obtained through the SVM and PCA methods demonstrate the effectiveness of our system applied to one and both irises. The performances measured are highly accurate and foreshadow a fusion system with a rate of identification approaching 100% on the UPOL database.

Comparison of the scanned pages of the contractual documents

Elena Andreeva, Vladimir V. Arlazarov, Temudzhin Manzhikov, et al.

This paper states the problem of comparing digitized pages of official documents. The problem arises when two customer copies of a contract, signed at different times by the two parties, are compared in order to find possible modifications introduced by one of the parties. It is of practical significance in the banking sector when contracts are concluded on paper. A recognition-based comparison method is suggested, which compares two bags of words obtained by recognizing the master and test pages. The described experiments were conducted using the Tesseract OCR engine and a siamese neural network. The advantages of the suggested method are the stable operation of the comparison algorithm and its high precision; one disadvantage is its dependence on the chosen OCR.

Textual blocks rectification method based on fast Hough transform analysis in identity documents recognition

P. V. Bezmaternykh, D. P. Nikolaev, V. L. Arlazarov

Textual blocks rectification or slant correction is an important stage of document image processing in OCR systems. This paper considers existing methods and introduces an approach for the construction of such algorithms based on Fast Hough Transform analysis. A quality measurement technique is proposed and obtained results are shown for both printed and handwritten textual blocks processing as a part of an industrial system of identity documents recognition on mobile devices.

Deep learning based beat event detection in action movie franchises

N. Ejaz, U. A. Khan, M. A. Martínez-del-Amor, et al.

Automatic understanding and interpretation of movies can be used in a variety of ways to semantically manage massive volumes of movie data. The “Action Movie Franchises” dataset is a collection of twenty Hollywood action movies from five famous franchises with ground truth annotations at the shot and beat level of each movie. In this dataset, the annotations are provided for eleven semantic beat categories. In this work, we propose a deep learning based method to classify shots and beat events on this dataset. The training dataset for each of the eleven beat categories is developed and then a Convolutional Neural Network is trained. After finding the shot boundaries, key frames are extracted for each shot and then three classification labels are assigned to each key frame. The classification labels for each of the key frames in a particular shot are then used to assign a unique label to each shot. A simple sliding window based method is then used to group adjacent shots having the same label in order to find a particular beat event. The results of beat event classification are presented based on criteria of precision, recall, and F-measure. The results are compared with the existing technique and significant improvements are recorded.
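The grouping step mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes that every shot already carries a single label and simply merges runs of adjacent shots with identical labels. The function name `group_beats` and the label strings are hypothetical, and the paper's actual sliding-window procedure may differ.

```python
from itertools import groupby


def group_beats(shot_labels):
    """Merge runs of adjacent shots sharing a label.

    Returns a list of (label, first_shot_index, last_shot_index) tuples.
    """
    events = []
    index = 0
    for label, run in groupby(shot_labels):
        length = len(list(run))
        events.append((label, index, index + length - 1))
        index += length
    return events


# Hypothetical per-shot labels, for illustration only.
labels = ["pursuit", "pursuit", "battle", "battle", "battle", "pursuit"]
print(group_beats(labels))
```

Each returned tuple is one candidate beat event spanning a contiguous range of shots.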

Target Detection and Tracking

Compressed multi-block local binary pattern for object tracking

Tianwen Li, Yun Gao, Lei Zhao, et al.

Both robustness and real-time performance are very important for object tracking in real environments. The much-studied trackers based on deep learning struggle to meet real-time requirements. Compressive sensing provides technical support for real-time tracking. In this paper, an object is tracked via a multi-block local binary pattern feature. The feature vector is extracted from the multi-block local binary pattern feature and compressed using a sparse random Gaussian matrix as the measurement matrix. The experiments showed that the proposed tracker ran in real time and outperformed the existing compressive trackers based on Haar-like features on many challenging video sequences in terms of accuracy and robustness.

Enhanced online convolutional neural networks for object tracking

Dengzhuo Zhang, Yun Gao, Hao Zhou, et al.

In recent years, object tracking based on convolutional neural networks has gained more and more attention. The initialization and update of the convolution filters directly affect the precision of object tracking. In this paper, a novel object tracker based on an enhanced online convolutional neural network without offline training is proposed, which initializes the convolution filters with a k-means++ algorithm and updates the filters by error back-propagation. Comparative experiments with 7 trackers on 15 challenging sequences showed that our tracker performs better than the other trackers in terms of AUC and precision.

Automated grain extraction and classification by combining improved region growing segmentation and shape descriptors in electromagnetic mill classification system

Sebastian Budzan

In this paper, an automatic method of grain detection and classification is presented. As input, it uses a single digital image of the copper ore milling process, obtained with a high-quality digital camera. Grinding is an extremely energy- and cost-intensive process, so granularity evaluation should be performed with high efficiency and low time consumption. The method proposed in this paper is based on three-stage image processing. First, all grains are detected using Seeded Region Growing (SRG) segmentation with a proposed adaptive thresholding based on the calculation of the Relative Standard Deviation (RSD). In the next step, the detection results are improved using information about the shape of the detected grains, obtained from a distance map. Finally, each grain in the sample is classified into one of the predefined granularity classes. The quality of the proposed method has been evaluated using nominal granularity samples and by comparison with other methods.

Distribution majorization of corner points by reinforcement learning for moving object detection

Hao Wu, Hao Yu, Dongxiang Zhou, et al.

Corner points play an important role in moving object detection, especially in the case of a free-moving camera. Corner points provide more accurate information than other pixels and reduce unnecessary computation. Previous works use only intensity information to locate the corner points; however, the information provided by the preceding and following frames can also be used. We utilize this information to focus on more valuable areas and ignore less valuable ones. The proposed algorithm is based on reinforcement learning, which regards the detection of corner points as a Markov process. In the Markov model, the video to be detected is regarded as the environment, the selections of blocks for one corner point are regarded as actions and the performance of detection is regarded as the state. Corner points are assigned to the blocks which are separated from the original whole image. Experimentally, we select a conventional method which uses marching and the Random Sample Consensus algorithm to obtain objects as the main framework and utilize our algorithm to improve the result. The comparison between the conventional method and the same method with our algorithm shows that our algorithm reduces false detections by 70%.

Performance improvement of multi-class detection using greedy algorithm for Viola-Jones cascade selection

Alexander A. Tereshin, Sergey A. Usilin, Vladimir V. Arlazarov

This paper aims to study the problem of multi-class object detection in a video stream with Viola-Jones cascades. An adaptive algorithm for selecting a Viola-Jones cascade, based on a greedy choice strategy for solving the N-armed bandit problem, is proposed. The efficiency of the algorithm is shown on the problem of detecting and recognizing bank card logos in a video stream. The proposed algorithm can be effectively used in document localization and identification, recognition of road scene elements, localization and tracking of lengthy objects, and for solving other problems of rigid object detection in heterogeneous data flows. The computational efficiency of the algorithm makes it possible to use it both on personal computers and on mobile devices based on processors with low power consumption.
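A greedy choice strategy for an N-armed bandit can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: each cascade is treated as one arm, every arm is tried once, and afterwards the arm with the highest mean reward is always chosen. The function names and the fixed per-cascade rewards are hypothetical; the abstract does not specify the paper's reward definition or update rule.

```python
def select_cascade(counts, totals):
    """Greedy choice: try each cascade once, then pick the best mean reward."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    means = [t / n for t, n in zip(totals, counts)]
    return max(range(len(means)), key=means.__getitem__)


def run(frame_rewards, n_frames):
    """Simulate cascade selection with a fixed (hypothetical) reward per cascade."""
    counts = [0] * len(frame_rewards)
    totals = [0.0] * len(frame_rewards)
    for _ in range(n_frames):
        arm = select_cascade(counts, totals)
        counts[arm] += 1
        totals[arm] += frame_rewards[arm]
    return counts


# Three hypothetical cascades; the second one is rewarded most often.
print(run([0.1, 0.8, 0.3], 10))
```

With deterministic rewards the strategy settles on the best cascade after one exploratory pass. With noisy rewards a purely greedy rule can lock onto a suboptimal arm, which is why exploration terms are often added in practice.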

Vision based speed breaker detection for autonomous vehicle

Arvind C.S., Ritesh Mishra, Kumar Vishal, et al.

In this paper, we present a robust, real-time, vision-based approach to detecting speed breakers in urban environments for autonomous vehicles. Our method is designed to detect speed breakers using visual inputs obtained from a camera mounted on top of a vehicle. The method performs inverse perspective mapping to generate a top view of the road and segments out the region of interest based on difference-of-Gaussian and median-filtered images. Furthermore, the algorithm performs RANSAC line fitting to identify the possible speed breaker candidate region. The initial region guessed via RANSAC is validated using a support vector machine. Our algorithm can detect different categories of speed breakers on cement, asphalt and interlock roads under various conditions and has achieved a recall of ~0.98.

Deep learning architecture for recognition of abnormal activities

Marwa Khatrouch, Mariem Gnouma, Ridha Ejbali, et al.

Video surveillance is one of the key areas of computer vision research. The scientific challenge in this field involves the implementation of automatic systems to obtain detailed information about the behavior of individuals and groups. In particular, the detection of abnormal movements of groups or individuals requires a fine analysis of frames in the video stream. In this article, we propose a new method to detect anomalies in crowded scenes. We try to categorize the video in a supervised mode accompanied by unsupervised learning using the principle of the autoencoder. In order to construct an informative concept for the recognition of these behaviors, we use a representation technique based on the superposition of human silhouettes. Evaluation on the UMN dataset demonstrates the effectiveness of the proposed approach.

Multi person detection and tracking based on hierarchical level-set method

Chadia Khraief, Faouzi Benzarti, Hamid Amiri

In this paper, we propose an efficient unsupervised method for multi-person tracking based on a hierarchical level-set approach. The proposed method uses both edge and region information in order to effectively detect objects. The persons are tracked in each frame of the sequence by minimizing an energy functional that combines color, texture and shape information. These features are encoded in a covariance matrix used as the region descriptor. The present method is fully automated, without the need to manually specify the initial contour of the level set. It is based on combined person detection and background subtraction methods. The edge-based information is employed to maintain a stable evolution, guide the segmentation towards apparent boundaries and inhibit region fusion. The computational cost of the level set is reduced by using the narrow band technique. Many experiments are performed on challenging video sequences and show the effectiveness of the proposed method.

Vanishing points detection using combination of fast Hough transform and deep learning

Alexander Sheshkus, Anastasia Ingacheva, Dmitry Nikolaev

In this paper we propose a novel method for vanishing point detection based on a convolutional neural network (CNN) approach and the fast Hough transform algorithm. We show how to define a fast Hough transform neural network layer and how to use it in order to increase the usability of the neural network approach for the vanishing point detection task. Our algorithm includes a CNN with a sequence of convolutional and fast Hough transform layers. We build an estimator for the distribution of possible vanishing points in the image. This distribution can be used to find vanishing point candidates. We provide experimental results from tests of the suggested method using images collected from videos of road trips. Our approach shows stable results on test images with different projective distortions and noise. The described approach can be effectively implemented for mobile GPUs and CPUs.

Aggregated channels network for real-time pedestrian detection

Farzin Ghorban, Javier Marín, Yu Su, et al.

Convolutional neural networks (CNNs) have demonstrated their superiority in numerous computer vision tasks, yet their computational cost is prohibitive for many real-time applications such as pedestrian detection, which is usually performed on low-consumption hardware. In order to alleviate this drawback, most strategies focus on using a two-stage cascade approach. Essentially, in the first stage a fast method generates a significant but reduced number of high-quality proposals that later, in the second stage, are evaluated by the CNN. In this work, we propose a novel detection pipeline that further benefits from the two-stage cascade strategy. More concretely, the enriched and subsequently compressed features used in the first stage are reused as the CNN input. As a consequence, a simpler network architecture, adapted for such small input sizes, makes it possible to achieve real-time performance and obtain results close to the state of the art while running significantly faster without the use of a GPU. In particular, considering that the proposed pipeline runs at frame rate, the achieved performance is highly competitive. We furthermore demonstrate that the proposed pipeline can by itself serve as an effective proposal generator.

Algorithm for covert convoy of a moving target using a group of autonomous robots

Igor Polyakov, Evgeny Shvets

An important application of autonomous robot systems is to substitute for human personnel in dangerous environments, reducing their involvement and the subsequent risk to human lives. In this paper we solve the problem of covertly convoying a civilian in a dangerous area with a group of unmanned ground vehicles (UGVs) using social potential fields. The novelty of our work lies in the usage of UGVs, as compared to the unmanned aerial vehicles typically employed for this task in the approaches described in the literature. Additionally, we assume that the group of UGVs should simultaneously solve the problem of patrolling the area to detect intruders. We develop a simulation system to test our algorithms, provide numerical results and give recommendations on how to tune the potentials governing the robots’ behaviour to prioritize between patrolling and convoying tasks.

Compressed normalized block difference for object tracking

Yun Gao, Dengzhuo Zhang, Donglan Cai, et al.

Feature extraction is very important for robust and real-time tracking. Compressive sensing provides technical support for real-time feature extraction. However, all existing compressive trackers are based on compressed Haar-like features, and how to compress other, more powerful high-dimensional features is worth researching. In this paper, a novel compressed normalized block difference (CNBD) feature is proposed. To resist noise effectively in the high-dimensional normalized pixel difference (NPD) feature, the normalized block difference feature extends the two pixels in the original NPD formula to two blocks. A CNBD feature is obtained by compressing a normalized block difference feature based on compressive sensing theory, with a sparse random Gaussian matrix as the measurement matrix. Comparative experiments with 7 trackers on 20 challenging sequences showed that the tracker based on the CNBD feature performs better than the other trackers, especially the FCT tracker based on compressed Haar-like features, in terms of AUC, SR and precision.
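The extension from pixels to blocks can be illustrated with a short Python sketch. It uses the usual NPD definition from the literature, f(x, y) = (x − y) / (x + y) with f(0, 0) = 0, and assumes that each block is summarized by its mean intensity. That summary, the function names and the toy image are assumptions, and the compression step with a random measurement matrix is not shown.

```python
def npd(x, y):
    """Normalized pixel difference; defined as 0 when both inputs are 0."""
    if x + y == 0:
        return 0.0
    return (x - y) / (x + y)


def block_mean(image, top, left, size):
    """Mean intensity of a size x size block (image is a list of rows)."""
    values = [v for row in image[top:top + size] for v in row[left:left + size]]
    return sum(values) / len(values)


def nbd(image, block_a, block_b, size):
    """Normalized block difference: NPD applied to two block means (assumed)."""
    mean_a = block_mean(image, block_a[0], block_a[1], size)
    mean_b = block_mean(image, block_b[0], block_b[1], size)
    return npd(mean_a, mean_b)


# Toy 4x4 image: dark left half, brighter right half.
image = [[10, 10, 30, 30] for _ in range(4)]
print(nbd(image, (0, 0), (0, 2), 2))
```

Averaging over a block before taking the normalized difference makes the feature less sensitive to noise in any single pixel, which matches the motivation stated in the abstract.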

Feature Matching and Image Segmentation

Neural network-based feature point descriptors for registration of optical and SAR images

Dmitry Abulkhanov, Ivan Konovalenko, Dmitry Nikolaev, et al.

Registration of images of different nature is an important technique used in image fusion, change detection, efficient information representation and other problems of computer vision. Solving this task using feature-based approaches is usually more complex than registration of several optical images, because traditional feature descriptors (SIFT, SURF, etc.) perform poorly when the images are of different nature. In this paper we consider the problem of registration of SAR and optical images. We train a neural network to build feature point descriptors and use the RANSAC algorithm to align the found matches. Experimental results are presented that confirm the method’s effectiveness.

FPFH-based graph matching for 3D point cloud registration

Jiapeng Zhao, Chen Li, Lihua Tian, et al.

Correspondence detection is a vital step in point cloud registration, and it can help obtain a reliable initial alignment. In this paper, we put forward an advanced point feature-based graph matching algorithm to solve the initial alignment problem of rigid 3D point cloud registration with partial overlap. Specifically, Fast Point Feature Histograms are first used to determine the initial possible correspondences. Next, a new objective function is provided to make the graph matching more suitable for partially overlapping point clouds. The objective function is optimized by the simulated annealing algorithm to obtain the final group of correct correspondences. Finally, we present a novel set partitioning method which can transform the NP-hard optimization problem into an O(n³)-solvable one. Experiments on the Stanford and UWA public data sets indicate that our method can obtain better results in terms of both accuracy and time cost compared with other point cloud registration methods.

Graphic matching based on shape contexts and reweighted random walks

Mingxuan Zhang, Dongmei Niu, Xiuyang Zhao, et al.

Graphic matching is a critical issue in many areas of computer vision. In this paper, a new graphic matching algorithm combining shape contexts and reweighted random walks is proposed. On the basis of the local shape context descriptor, the reweighted random walks algorithm is modified to achieve stronger robustness and correctness in the final result. Our main idea is to use the shape context descriptor during the random walk iterations in order to control the random walk probability matrix. We calculate a bias matrix from the descriptors and use it in each iteration to enhance the accuracy of the random walks and random jumps; finally, we obtain the one-to-one registration result by discretizing the matrix. The algorithm not only preserves the noise robustness of reweighted random walks but also possesses the rotation, translation and scale invariance of shape contexts. Extensive experiments on real images and random synthetic point sets, and comparisons with other algorithms, confirm that this new method can produce excellent results in graphic matching.

A comparative analysis of image features between weave embroidered Thangka and piles embroidered Thangka

Zhenjiang Li, Weilan Wang

Thangka is a treasure of Tibetan culture. In its digital protection, most of the current research focuses on the content of Thangka images, not the fabrication process. For silk embroidered Thangka of "Guo Tang", there are two craft methods, namely, weave embroidered and piles embroidered. The local texture of weave embroidered Thangka is rough, and that of piles embroidered Thangka is smoother. In order to distinguish these two kinds of fabrication processes from images, an effective segmentation algorithm for color blocks is designed first, and the obtained color blocks contain the local texture patterns of the Thangka image. Second, the local texture features of the color blocks are extracted and screened. Finally, the selected features are analyzed experimentally. The experimental analysis shows that the proposed features reflect well the difference between the weave embroidered and piles embroidered methods.

From image captioning to video summary using deep recurrent networks and unsupervised segmentation

Bogdan-Andrei Morosanu, Camelia Lemnaru

Automatic captioning systems based on recurrent neural networks have been tremendously successful at providing realistic natural language captions for complex and varied image data. We explore methods for adapting existing models trained on large image caption data sets to a similar problem, that of summarising videos using natural language descriptions and frame selection. These architectures create internal high level representations of the input image that can be used to define probability distributions and distance metrics on these distributions. Specifically, we interpret each hidden unit inside a layer of the caption model as representing the un-normalised log probability of some unknown image feature of interest for the caption generation process. We can then apply well understood statistical divergence measures to express the difference between images and create an unsupervised segmentation of video frames, classifying consecutive images of low divergence as belonging to the same context, and those of high divergence as belonging to different contexts. To provide a final summary of the video, we provide a group of selected frames and a text description accompanying them, allowing a user to perform a quick exploration of large unlabeled video databases.
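The divergence-based segmentation described in this abstract can be sketched in Python. This is a minimal illustration under stated assumptions: the hidden activations of each frame are treated as un-normalised log probabilities and normalised with a softmax, the Kullback-Leibler divergence between consecutive frames is used as the divergence measure, and a fixed threshold decides where a new context begins. The choice of KL divergence, the threshold value and all names are assumptions; the abstract does not say which divergence the authors use.

```python
import math


def softmax(logits):
    """Turn un-normalised log probabilities into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for strictly positive q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def segment_frames(frame_logits, threshold):
    """Group consecutive frame indices whose divergence stays below threshold."""
    if not frame_logits:
        return []
    segments = [[0]]
    for i in range(1, len(frame_logits)):
        p = softmax(frame_logits[i - 1])
        q = softmax(frame_logits[i])
        if kl_divergence(p, q) > threshold:
            segments.append([i])      # high divergence: a new context starts
        else:
            segments[-1].append(i)    # low divergence: same context
    return segments


# Hypothetical hidden-unit activations for four frames.
frames = [[2.0, 0.0, 0.0], [2.1, 0.0, 0.0], [0.0, 0.0, 3.0], [0.0, 0.0, 3.1]]
print(segment_frames(frames, threshold=0.5))
```

One representative frame per segment could then be passed to the caption model to produce the text summary.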

A novel automatic segmentation workflow of axial breast DCE-MRI

Feten Besbes, Norhene Gargouri, Alima Damak, et al.

In this paper we propose a novel process for fully automatic breast tissue segmentation which is independent of expert calibration and contrast. The proposed algorithm is composed of two major steps. The first step consists in the detection of breast boundaries. It is based on image content analysis and the Moore-Neighbour tracing algorithm. As a processing step, Otsu thresholding and a neighbors algorithm are applied. Then, the external area of the breast is removed to get an approximate breast region. The second step is the delineation of the chest wall, which is considered as the lowest cost path linking three key points. These points are located automatically at the breast. They are, respectively, the left and right boundary points and the middle upper point, placed at the sternum region using a statistical method. The minimum cost path search problem is solved with the Dijkstra algorithm. Evaluation results reveal the robustness of our process in the face of different breast densities, complex forms and challenging cases. In fact, the mean overlap between manual segmentation and automatic segmentation with our method is 96.5%. A comparative study shows that our proposed process is competitive with and faster than existing methods. The segmentation of 120 slices with our method takes 20.57±5.2 s.
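The lowest cost path search named in this abstract can be illustrated with a generic Dijkstra implementation on a pixel cost grid. This is only a sketch under stated assumptions: 4-connected neighbours, a path cost equal to the sum of the costs of the visited cells, and a hypothetical toy grid. How the paper derives its cost map and connects the three key points is not described in the abstract.

```python
import heapq


def min_cost_path(cost, start, goal):
    """Dijkstra on a 4-connected grid; returns (total_cost, path)."""
    rows, cols = len(cost), len(cost[0])
    best = {start: cost[start[0]][start[1]]}
    parent = {}
    heap = [(best[start], start)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == goal:
            break
        if dist > best[node]:
            continue  # stale queue entry
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                new_dist = dist + cost[nr][nc]
                if new_dist < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_dist
                    parent[(nr, nc)] = node
                    heapq.heappush(heap, (new_dist, (nr, nc)))
    # Walk back from the goal to recover the path.
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return best[goal], path[::-1]


# Toy cost map: the expensive middle column forces a detour along the bottom.
grid = [[1, 9, 1],
        [1, 9, 1],
        [1, 1, 1]]
total, path = min_cost_path(grid, (0, 0), (0, 2))
print(total, path)
```

A path through three key points could be obtained by running the search twice, from the left point to the middle point and from the middle point to the right point.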

Breast mass segmentation in mammograms combining fuzzy c-means and active contours

Marwa Hmida, Kamel Hamrouni, Basel Solaiman, et al.

Segmentation of breast masses in mammograms is a challenging issue due to the nature of mammography and the characteristics of masses. In fact, mammographic images are poor in contrast and breast masses have various shapes and densities with fuzzy and ill-defined borders. In this paper, we propose a method based on a modified Chan-Vese active contour model for mass segmentation in mammograms. We conduct the experiment on mass Regions of Interest (ROI) extracted from the MIAS database. The proposed method consists of three main stages. First, the ROI is preprocessed to enhance the contrast. Next, two fuzzy membership maps are generated from the preprocessed ROI based on the fuzzy C-Means algorithm. Finally, these fuzzy membership maps are used to modify the energy of the Chan-Vese model and to perform the final segmentation. Experimental results indicate that the proposed method yields good mass segmentation results.

An adhered-particle analysis system based on concave points

Wencheng Wang, Fengnian Guan, Lin Feng

Particles that adhere to one another affect image analysis in computer vision systems. In this paper, a separation method based on concave points is designed. First, after image segmentation, a corner detection algorithm is adopted to obtain a rough estimate of potential concave points. Then, the method computes the area ratio of the candidates to accurately localize the final separation points. Finally, it uses the separation points of each particle and the neighboring pixels to estimate the original particles before adhesion and provides estimated profile images. The experimental results show that this approach provides good results that match the human visual cognitive mechanism.

Segmentation algorithm on smartphone dual camera: application to plant organs in the wild

Sarah Bertrand, Guillaume Cerutti, Laure Tougne

In order to identify the species of a tree, botanists inspect its different organs: the leaves, the bark, the flowers and the fruits. To develop an algorithm that automatically identifies the species, we need to extract these objects of interest from their complex natural environment. In this article, we focus on the segmentation of flowers and fruits and we present a new method of segmentation based on an active contour algorithm using two probability maps. The first map is constructed via the dual camera found on the back of the latest smartphones. The second map is made with the help of a multilayer perceptron (MLP). The combination of these two maps to drive the evolution of the object contour allows an efficient segmentation of the organ from a natural background.

Video segmentation using keywords

Vinh Ton-That, Chi-Tai Vong, Xuan-Truong Nguyen-Dao, et al.

At the DAVIS-2016 Challenge, many state-of-the-art video segmentation methods achieved promising results, but they still depend heavily on annotated frames to distinguish between background and foreground. Creating these frames accurately takes a lot of time and effort. In this paper, we introduce a method to segment objects from video based on keywords given by the user. First, we use the real-time object detection system YOLOv2 to identify regions in the first frame containing objects whose labels match the given keywords. Then, for each region identified in the previous step, we use the Pyramid Scene Parsing Network to label each pixel as foreground or background. These frames can be used as input frames for the Object Flow algorithm to perform segmentation on the entire video. We conduct experiments on a subset of the DAVIS-2016 dataset at half its original size, which show that our method can handle many popular classes in the PASCAL VOC 2012 dataset with acceptable accuracy, about 75.03%. We suggest wider testing in combination with other methods to improve this result in the future.

Fast words boundaries localization in text fields for low quality document images

Dmitry Ilin, Dmitriy Novikov, Dmitry Polevoy, et al.

The paper examines the problem of precise localization of word boundaries in document text zones. Document processing on a mobile device consists of document localization, perspective correction, localization of individual fields, finding words in separate zones, segmentation and recognition. When an image is captured with a mobile digital camera under uncontrolled conditions, digital noise, perspective distortions or glare may occur. Further document processing is complicated by the specifics of the document: layout elements, complex background, static text, document security elements and a variety of text fonts. Moreover, the problem of word boundary localization has to be solved at runtime on a mobile CPU with limited computing capabilities under specified restrictions. At the moment, there are several groups of methods optimized for different conditions. Methods for scanned printed text are quick but limited to images of high quality. Methods for text in the wild have excessively high computational complexity and are thus hardly suitable for running on mobile devices as part of a mobile document recognition system. The method presented in this paper solves a more specialized problem than the task of finding text in natural images. It uses local features, a sliding window and a lightweight neural network in order to achieve an optimal speed-precision ratio. The running time of the algorithm is 12 ms per field on an ARM processor of a mobile device. The error rate for boundaries localization on a test sample of 8000 fields is 0.3

Image Classification

Early detection of lung cancer from CT images: nodule segmentation and classification using deep learning

Manu Sharma, Jignesh S. Bhatt, Manjunath V. Joshi

Lung cancer is one of the most common causes of cancer deaths worldwide. Its survival rate is low, mainly because of late diagnosis. With hardware advances in computed tomography (CT), it is now possible to capture high-resolution images of the lung region. However, this needs to be complemented by efficient algorithms that detect lung cancer at an early stage from the acquired CT images. To this end, we propose a two-step algorithm for early detection of lung cancer. Given the CT image, we first extract a patch around the center of the nodule and segment the lung nodule region. We propose to use the Otsu method followed by morphological operations for the segmentation. This step enables accurate segmentation because the threshold is data-driven. Unlike other methods, we perform the segmentation without using the complete contour information of the nodule. In the second step, a deep convolutional neural network (CNN) classifies the nodule in the segmented patch as malignant or benign. Accurate segmentation of even a tiny nodule, followed by classification with a deep CNN, enables the early detection of lung cancer. Experiments were conducted on 6306 CT images from the LIDC-IDRI database. We achieved a test accuracy of 84.13%, with sensitivity and specificity of 91.69% and 73.16%, respectively, clearly outperforming the state-of-the-art algorithms.
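
The data-driven threshold mentioned in this abstract can be illustrated with a minimal sketch of Otsu's method, which picks the gray level that maximizes the between-class variance of the histogram. This is not the authors' implementation: the function names and the tiny 4x4 patch are assumptions made for illustration, and the morphological post-processing step is omitted.

```python
def otsu_threshold(histogram):
    """Return the gray level t that maximizes between-class variance.

    Pixels with value <= t form the lower class, pixels > t the upper class.
    """
    total = sum(histogram)
    sum_all = sum(level * count for level, count in enumerate(histogram))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the lower class so far
    sum0 = 0  # intensity sum of the lower class so far
    for t, count in enumerate(histogram):
        w0 += count
        sum0 += t * count
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance is undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(patch, t):
    """Mark pixels brighter than the threshold as foreground (1)."""
    return [[1 if v > t else 0 for v in row] for row in patch]


# Hypothetical 8-bit patch: a bright blob on a dark background.
patch = [
    [10, 12, 11, 13],
    [12, 200, 210, 11],
    [10, 205, 198, 12],
    [11, 13, 10, 12],
]
hist = [0] * 256
for row in patch:
    for v in row:
        hist[v] += 1

threshold = otsu_threshold(hist)
mask = binarize(patch, threshold)
```

Because the threshold is computed from the patch histogram itself, no fixed intensity cut-off has to be chosen in advance.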

Singular spectrum decomposition of Bouligand-Minkowski fractal descriptors: an application to the classification of texture Images

João Batista Florindo

This work proposes the use of Singular Spectrum Analysis (SSA) for the classification of texture images, more specifically, to enhance the performance of the Bouligand-Minkowski fractal descriptors in this task. Fractal descriptors are known to be a powerful approach to model and, in particular, identify complex patterns in natural images. Nevertheless, the multiscale analysis involved in those descriptors makes them highly correlated. Although other attempts to address this point have been proposed in the literature, none of them investigated the relation between the fractal correlation and the well-established analysis employed in time series, for which SSA is one of the most powerful techniques. The proposed method was employed for the classification of benchmark texture images and the results were compared with other state-of-the-art classifiers, confirming the potential of this analysis in image classification.

Classification of time-series images using deep convolutional neural networks

Nima Hatami, Yann Gavet, Johan Debayle

Convolutional Neural Networks (CNNs) have achieved great success in image recognition tasks by automatically learning a hierarchical feature representation from raw data. While the majority of the Time-Series Classification (TSC) literature is focused on 1D signals, this paper uses Recurrence Plots (RP) to transform time-series into 2D texture images and then takes advantage of a deep CNN classifier. Image representation of time-series introduces different feature types that are not available for 1D signals, and therefore TSC can be treated as a texture image recognition task. The CNN model also allows learning different levels of representation together with a classifier, jointly and automatically. Therefore, using RP and CNN in a unified framework is expected to boost the recognition rate of TSC. Experimental results on the UCR time-series classification archive demonstrate the competitive accuracy of the proposed approach, compared not only to existing deep architectures but also to state-of-the-art TSC algorithms.
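
As a rough illustration of how a recurrence plot turns a 1D series into a 2D image, the sketch below builds time-delay state vectors and records the pairwise distances between them. The embedding dimension, delay, optional threshold `eps` and the function name are assumptions for this example; the abstract does not state the settings used in the paper.

```python
import math


def recurrence_plot(series, dim=2, delay=1, eps=None):
    """Return the recurrence matrix of a 1D series.

    Each state is a time-delay vector (x[i], x[i + delay], ...).
    Without eps, entry (i, j) is the Euclidean distance between states i and j.
    With eps, entry (i, j) is 1 when that distance is at most eps, else 0.
    """
    n = len(series) - (dim - 1) * delay
    states = [
        tuple(series[i + k * delay] for k in range(dim)) for i in range(n)
    ]
    plot = [[math.dist(a, b) for b in states] for a in states]
    if eps is not None:
        plot = [[1 if d <= eps else 0 for d in row] for row in plot]
    return plot


# Hypothetical periodic series: the binary plot shows a checkerboard texture.
series = [0, 1, 0, 1, 0, 1]
rp = recurrence_plot(series, dim=2, delay=1, eps=0.1)
```

The resulting matrix can be stored as a grayscale image and passed to any image classifier.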

Gender classification from face images by using local binary pattern and gray-level co-occurrence matrix

Betül Uzbaş, Ahmet Arslan

Gender classification is an important step in human-computer interaction and identification. The human face image is one of the important sources for determining gender. In the present study, gender classification is performed automatically from facial images. To classify gender, we propose a combination of features extracted from the face, eye and lip regions using a hybrid method of Local Binary Pattern and Gray-Level Co-Occurrence Matrix. The features are extracted from automatically obtained face, eye and lip regions. All of the extracted features are combined and given as input parameters to classification methods (Support Vector Machine, Artificial Neural Networks, Naive Bayes and k-Nearest Neighbor) for gender classification. The Nottingham Scan face database, which consists of frontal face images of 100 people (50 male and 50 female), is used for this purpose. In the experimental studies, the highest success rate, 98%, was achieved using the Support Vector Machine. The experimental results illustrate the efficacy of our proposed method.
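
For readers unfamiliar with Local Binary Patterns, the following sketch computes the basic 3x3 LBP code of each interior pixel and collects the codes in a 256-bin histogram, which is the kind of feature vector that could be fed to a classifier. The neighbor ordering and function names are choices made for this example, not details taken from the paper, and the co-occurrence matrix part of the hybrid method is not shown.

```python
# Clockwise neighbor offsets (row, column), starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """Return the 8-bit LBP code of pixel (r, c).

    Bit k is set when the k-th neighbor is at least as bright as the center.
    """
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """Histogram of LBP codes over all interior pixels of a grayscale image."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist


flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
peak = [[1, 1, 1], [1, 9, 1], [1, 1, 1]]
corner = [[9, 1, 1], [1, 5, 1], [1, 1, 1]]
```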

Training a whole-book LSTM-based recognizer with an optimal training set

Mohammad Reza Soheili, Mohammad Reza Yousefi, Ehsanollah Kabir, et al.

Despite recent progress in OCR technologies, whole-book recognition is still a challenging task, in particular for old and historical books, where unknown font faces or the low quality of paper and print add to the challenge. Therefore, pre-trained recognizers and generic methods do not usually perform up to the required standards, and performance usually degrades for larger-scale recognition tasks such as a whole book. Methods with reportedly low error rates turn out to require a great deal of manual correction. Generally, such methodologies do not make effective use of concepts such as redundancy in whole-book recognition. In this work, we propose to train Long Short-Term Memory (LSTM) networks on a minimal training set obtained from the book to be recognized. We show that by clustering all the sub-words in the book and using the sub-word cluster centers as the training set for the LSTM network, we can train models that outperform any identical network trained on randomly selected pages of the book. In our experiments, we also show that although the sub-word cluster centers are equivalent to about 8 pages of text for a 101-page book, an LSTM network trained on such a set performs competitively with an identical network trained on a set of 60 randomly selected pages of the book.

A deep learning pipeline for Indian dance style classification

Swati Dewan, Shubham Agarwal, Navjyoti Singh

In this paper, we address the problem of dance style classification for Indian dance, or any dance in general. We propose a 3-step deep learning pipeline. First, we extract 14 essential joint locations of the dancer from each video frame, which lets us derive the location of any body region within the frame. We use this in the second step, which forms the main part of our pipeline. Here, we divide the dancer into regions of important motion in each video frame and extract patches centered at these regions. The main discriminative motion is captured in these patches. We stack the features from all such patches of a frame into a single vector to form our hierarchical dance pose descriptor. Finally, in the third step, we build a high-level representation of the dance video using the hierarchical descriptors and train a Recurrent Neural Network (RNN) on it for classification. Our novelty also lies in the way we use multiple representations for a single video. This helps us to: (1) overcome the RNN limitation of learning short sequences better than long sequences such as dance; (2) extract more data from the available dataset for effective deep learning by training on multiple representations. Our contributions in this paper are threefold: (1) we provide a deep learning pipeline for classification of any form of dance; (2) we show that a segmented representation of a dance video works well with sequence learning techniques for recognition purposes; (3) we extend and refine the ICD dataset and provide a new dataset for the evaluation of dance. Our model performs comparably to, or in some cases better than, the state of the art on action recognition benchmarks.

Automatic white blood cell classification using pre-trained deep learning models: ResNet and Inception

Mehdi Habibzadeh, Mahboobeh Jannesari, Zahra Rezaei, et al.

This work gives an account of the evaluation of white blood cell differential counts via a computer-aided diagnosis (CAD) system and hematology rules. Leukocytes, also called white blood cells (WBCs), play a main role in the immune system. They are responsible for phagocytosis and immunity, and therefore for defense against infection, which bears on the incidence of fatal diseases and on mortality. Admittedly, microscopic examination of blood samples is a time-consuming, expensive and error-prone task. A manual diagnosis searches for specific leukocytes and abnormalities in their number in the blood slides while a complete blood count (CBC) examination is performed. Complications may arise from the large number of varying samples, including different types of leukocytes, related sub-types and their concentration in blood, which makes the analysis prone to human error. This process can be automated by computerized techniques, which are more reliable and economical. In essence, we seek a fast, accurate mechanism for classification that gathers information about the distribution of white blood cells, which may help to diagnose the degree of any abnormalities during a CBC test. In this work, we consider the problem of pre-processing and supervised classification of white blood cells into four primary types, namely Neutrophils, Eosinophils, Lymphocytes and Monocytes, using the proposed deep learning framework. In the first step, this research proposes three consecutive pre-processing operations: color distortion, bounding box distortion (crop) and image flipping (mirroring). In the second phase, white blood cell recognition is performed with hierarchical topological feature extraction using the Inception and ResNet architectures. Finally, the results obtained from the preliminary analysis of cell classification, with 11200 training samples and an evaluation data set of 1244 white blood cells, are presented in confusion matrices and interpreted using accuracy rate and false positives, with the classification framework being validated by experiments conducted on poor-quality blood images of 320 × 240 pixels. The differential outcomes in the challenging cell detection task, as shown in the results section, indicate that there is a significant achievement in using the Inception and ResNet architectures with the proposed settings. Our framework detects on average 100% of the four main white blood cell types using ResNet V1 50, while promising alternative results with 99.84% and 99.46% accuracy are obtained with ResNet V1 152 and ResNet 101, respectively, with 3000 epochs and fine-tuning of all layers. Further statistical confusion matrix tests revealed that this work achieved sensitivity values of 1, 0.9979 and 0.9989, with area under the curve (AUC) scores of 1, 0.9992 and 0.9833, for the three proposed techniques. In addition, the current work shows negligible or small false negative counts of 0, 2 and 1, and false positive counts of 0, 0 and 5, in leukocyte detection.

Convolutional neural network with transfer learning for rice type classification

Vaibhav Amit Patel, Manjunath V. Joshi

Presently, rice type is identified manually by humans, which is time-consuming and error-prone. Therefore, there is a need to perform this task by machine, which makes it faster and more accurate. This paper proposes a deep learning based method for classification of rice types. We propose two methods to classify the rice types. In the first method, we train a deep convolutional neural network (CNN) using the given segmented rice images. In the second method, we train a combination of a pretrained VGG16 network and the proposed method, using transfer learning in which the weights of a pretrained network are used to achieve better accuracy. Our approach can also be used to classify rice grains as broken or fine. We train a 5-class model for classifying rice types using 4000 training images and another 2-class model for the classification of broken and normal rice using 1600 training images. We observe that despite the rice images being distinct from ImageNet data, our architecture, pretrained on ImageNet, boosts classification accuracy significantly.

Comparison of classification algorithms for various methods of preprocessing radar images of the MSTAR base

A. A. Borodinov, V. V. Myasnikov

The present work compares the accuracy of known classification algorithms in the task of recognizing local objects in radar images for various image preprocessing methods. Preprocessing involves speckle noise filtering and normalization of the object orientation in the image by the method of image moments and by a method based on the Hough transform. The following classification algorithms are compared: decision tree, support vector machine, AdaBoost and random forest. Principal component analysis is used to reduce the dimension. The research is carried out on objects from the MSTAR base of radar images. The paper presents the results of the conducted studies.
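
Orientation normalization by image moments usually relies on the standard formula for the principal-axis angle, theta = 0.5 * atan2(2 * mu11, mu20 - mu02), where mu20, mu02 and mu11 are second-order central moments. The sketch below computes that angle for a small grayscale image; the function name and test images are assumptions for illustration, and the abstract does not say whether the authors use exactly this formulation.

```python
import math


def principal_orientation(img):
    """Angle (radians) of the principal axis, measured from the x (column) axis.

    Uses theta = 0.5 * atan2(2 * mu11, mu20 - mu02) with x = column, y = row.
    """
    m00 = sum(sum(row) for row in img)
    cx = sum(c * v for row in img for c, v in enumerate(row)) / m00
    cy = sum(r * v for r, row in enumerate(img) for v in row) / m00
    mu20 = mu02 = mu11 = 0.0
    for r, row in enumerate(img):
        for c, v in enumerate(row):
            dx, dy = c - cx, r - cy
            mu20 += v * dx * dx
            mu02 += v * dy * dy
            mu11 += v * dx * dy
    return 0.5 * math.atan2(2 * mu11, mu20 - mu02)


# Hypothetical objects: a horizontal bar and a diagonal streak.
horizontal = [
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
]
diagonal = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
angle_h = principal_orientation(horizontal)
angle_d = principal_orientation(diagonal)
```

Rotating the object by the negative of this angle aligns its principal axis with the image x axis before classification.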

Image Analysis and Quality Assessment

Mobile and embedded fast high resolution image stitching for long length rectangular monochromatic objects with periodic structure

Elena Limonova, Daniil Tropin, Boris Savelyev, et al.

In this paper we describe a stitching protocol that makes it possible to obtain high-resolution images of long monochromatic objects with periodic structure. This protocol can be used for long documents or man-made objects in satellite images of uninhabitable regions such as the Arctic. The length of such objects can reach notable values, while modern camera sensors have limited resolution and cannot provide a good enough image of the whole object for further processing, e.g. for use in an OCR system. The idea of the proposed method is to acquire a video stream containing the full object in high resolution and use image stitching. We expect the scanned object to have straight boundaries and periodic structure, which allows us to introduce regularization into the stitching problem and adapt the algorithm to the limited computational power of mobile and embedded CPUs. With the help of the detected boundaries and structure, we estimate the homography between frames and use this information to reduce the complexity of stitching. We demonstrate our algorithm on a mobile device and show an image processing speed of 2 fps on a Samsung Exynos 5422 processor.

Modification of YAPE keypoint detection algorithm for wide local contrast range images

A. Lukoyanov, D. Nikolaev, I. Konovalenko

Keypoint detection is an important tool of image analysis, and among many contemporary keypoint detection algorithms YAPE is known for its computational performance, allowing its use in mobile and embedded systems. One of its shortcomings is high sensitivity to local contrast, which leads to high detection density in high-contrast areas and missed detections in low-contrast ones. In this work we study the contrast sensitivity of YAPE and propose a modification which compensates for this property on images with a wide local contrast range (Yet Another Contrast-Invariant Point Extractor, YACIPE). As a model example, we consider the traffic sign recognition problem, where some signs are well lit, whereas others are in shadow and thus have low contrast. We show that the number of traffic signs on which no keypoints are detected is 40% lower for the proposed modification than for the original algorithm.

Reducing noise component on medical images

Evgeny Semenishchev, Viacheslav Voronin, Vladimir Dub, et al.

Medical visualization and the analysis of medical data are topical research areas. Medical images are used in microbiology, genetics, roentgenology, oncology, surgery, ophthalmology, etc. Initial data processing is a major step towards obtaining a good diagnostic result. The paper considers an approach that filters an image while preserving object borders. The proposed algorithm is based on sequential data processing. At the first stage, local areas are determined using threshold processing together with the classical ICI algorithm. The second stage uses a method based on two criteria, namely the L2 norm and the first-order square difference. To preserve the boundaries of objects, the transition boundary and its local neighborhood are processed by a filtering algorithm with a fixed coefficient. As examples, reconstructed images from CT, x-ray and microbiological studies are shown. The test images demonstrate the effectiveness of the proposed algorithm and its applicability to the analysis of many medical imaging applications.

Image quality assessment for selfies with and without super resolution

Aya Kubota, Seiichi Gohshi

With the advent of cellphone cameras, in particular on smartphones, many people now take photos of themselves alone and with others in the frame; such photos are popularly known as “selfies”. Most smartphones are equipped with two cameras: a front-facing camera and a rear camera. The camera located on the back of the smartphone is referred to as the “out-camera,” whereas the one located on the front is called the “in-camera.” In-cameras are mainly used for selfies. Some smartphones feature high-resolution cameras. However, the full image quality cannot be obtained because smartphone cameras often have low-performance lenses. Super resolution (SR) is a recent technological advancement that increases image resolution. We developed a new SR technology that can run on smartphones. Smartphones with the new SR technology are already available on the market and have registered sales. However, the effectiveness of the new SR technology has not yet been verified. Comparing image quality with and without SR on a smartphone display is necessary to confirm the usefulness of this new technology. Methods based on objective and subjective assessments are required to measure image quality quantitatively. It is known that typical objective assessment values, such as Peak Signal to Noise Ratio (PSNR), do not agree well with how viewers perceive image and video quality. When digital broadcasting started, the standard was determined using subjective assessment. Although subjective assessment usually comes at a high cost because of personnel expenses for observers, the results are highly reproducible when the assessment is conducted under the right conditions and with statistical analysis. In this study, the subjective assessment results for selfie images are reported.

Toward a perceptual image quality assessment of color quantized images

Mariusz Frackiewicz, Henryk Palus

Color image quantization is an important operation in the field of color image processing. In this paper, we consider new perceptual image quality metrics for the assessment of quantized images. Metrics of this type, e.g. DSCSI, MDSIs, MDSIm and HPSI, achieve the highest correlation coefficients with MOS in tests on the six publicly available image databases. The research was limited to images distorted by two types of compression: JPG and JPG2K. Statistical analysis of the correlation coefficients based on the Friedman test and post-hoc procedures showed that the differences between the four new perceptual metrics are not statistically significant.

Spatio-thermal depth correction of RGB-D sensors based on Gaussian processes in real-time

Christoph Heindl, Thomas Pönitz, Gernot Stübl, et al.

Commodity RGB-D sensors capture color images along with dense pixel-wise depth information in real-time. Typical RGB-D sensors are provided with a factory calibration and exhibit erratic depth readings due to coarse calibration values, ageing and thermal influence effects. This limits their applicability in computer vision and robotics. We propose a novel method to accurately calibrate depth considering spatial and thermal influences jointly. Our work is based on Gaussian Process Regression in a four dimensional Cartesian and thermal domain. We propose to leverage modern GPUs for dense depth map correction in real-time. For reproducibility we make our dataset and source code publicly available.

Analysis of computer images in the presence of metals

Alexey Buzmakov, Anastasia Ingacheva, Victor Prun, et al.

Artifacts caused by intensely absorbing inclusions are encountered in computed tomography with polychromatic scanning and may obscure or simulate pathologies in medical applications. To improve the quality of reconstruction when high-Z inclusions are present, we previously proposed, and tested on synthetic data, an iterative technique with a soft penalty mimicking linear inequalities on the photon-starved rays. This note reports a test on the tomographic laboratory set-up at the Institute of Crystallography FSRC “Crystallography and Photonics” RAS, in which tomographic scans were successfully made of a temporary tooth without an inclusion and with a Pb inclusion.

Optimization of the hierarchical interpolator for image compression

Mikhail Gashnikov

Hierarchical interpolation of images is investigated in the context of image compression. A new approach is proposed for optimizing the adaptive interpolator for hierarchical compression. This approach is based on optimizing the entropy of the compressed signal, which suits the compression problem better than the known approach based on optimizing the interpolation error. An optimization algorithm for the adaptive interpolator is built on this approach. The computational complexity of the proposed interpolator is estimated theoretically and compared with the complexity of other interpolators. The advantage of the proposed interpolator over known interpolators is investigated experimentally, with the gain measured by the size of the archive file. Recommendations for the use of the proposed interpolator are formulated.
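
The idea of judging an interpolator by the entropy of what must be encoded, rather than by its interpolation error, can be sketched on a toy 1D signal. In this hypothetical setup, even-indexed samples are assumed known from a coarser level and odd-indexed samples are predicted; the two candidate predictors, the signal and all names are illustrative assumptions, not the adaptive interpolator from the paper.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Empirical Shannon entropy in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def residuals(signal, predict):
    """Prediction residuals of odd-indexed samples from their even neighbors."""
    return [
        signal[i] - predict(signal[i - 1], signal[i + 1])
        for i in range(1, len(signal) - 1, 2)
    ]


def predict_average(left, right):
    return (left + right) // 2


def predict_left(left, right):
    return left


signal = [0, 1, 2, 4, 6, 7, 8, 10, 12]
candidates = {"average": predict_average, "left": predict_left}
entropies = {
    name: shannon_entropy(residuals(signal, fn)) for name, fn in candidates.items()
}
# Choose the predictor whose residuals are cheapest to entropy-code.
best = min(entropies, key=entropies.get)
```

Selecting by entropy targets the size of the encoded residual stream directly, which is the quantity a compressor ultimately cares about.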

Sum of top-hat transform based algorithm for vessel enhancement in MRA images

Hibet-Allah Ouazaa, Hajer Jlassi, Kamel Hamrouni

Magnetic Resonance Angiography (MRA) images are rich in information, but they suffer from poor contrast, uneven illumination and noise, so the images need to be enhanced. However, significant information can be lost if improper techniques are applied. Therefore, in this paper, we propose a new enhancement method. We first apply the CLAHE method to increase the contrast of the image. Then, we apply the sum of Top-Hat transforms, performed with the structuring element oriented at different angles, to increase the brightness of vessels. The methodology is tested and evaluated on the publicly available BRAINIX database, using MSE (Mean Square Error), PSNR (Peak Signal to Noise Ratio) and SNR (Signal to Noise Ratio) for the evaluation. The results demonstrate that the proposed method can efficiently enhance image details and is comparable with state-of-the-art algorithms. Hence, the proposed method could be broadly used in various applications.
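
Two of the evaluation measures named in this abstract have simple standard definitions: MSE is the mean of squared pixel differences, and PSNR = 10 * log10(peak^2 / MSE). A minimal sketch follows, assuming 8-bit images stored as nested lists; the function names and sample images are made up for illustration.

```python
import math


def mse(reference, processed):
    """Mean squared error between two equally sized grayscale images."""
    diffs = [
        (a - b) ** 2
        for row_a, row_b in zip(reference, processed)
        for a, b in zip(row_a, row_b)
    ]
    return sum(diffs) / len(diffs)


def psnr(reference, processed, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    error = mse(reference, processed)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / error)


reference = [[0, 0], [0, 0]]
processed = [[0, 0], [0, 10]]
error = mse(reference, processed)   # 100 / 4 pixels
quality = psnr(reference, processed)
```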

A detail-preserved and luminance-consistent multi-exposure image fusion algorithm

Guanquan Wang, Yue Zhou

When irradiance across a scene varies greatly, we can hardly get an image of the scene without over- or underexposed areas, because of the constraints of cameras. Multi-exposure image fusion (MEF) is an effective method to deal with this problem by fusing multi-exposure images of a static scene. A novel MEF method is described in this paper. In the proposed algorithm, coarser-scale luminance consistency is preserved by contribution adjustment using the luminance information between blocks, and a detail-preserving smoothing filter stitches blocks smoothly without losing details. Experimental results show that the proposed method performs well in preserving luminance consistency and details.

3D shape recovery from image focus using Gabor features

Fahad Mahmood, Jawad Mahmood, Ayesha Zeb, et al.

Recovering an accurate and precise depth map from a set of 2-D images of the target object, each carrying different focus information, is the ultimate goal of 3-D shape recovery. The focus measure algorithm plays an important role in this architecture, as it converts color value information into focus information that is then used to recover the depth map. This article introduces Gabor features as a focus measure for recovering a depth map from a set of 2-D images. The frequency and orientation representation of Gabor filter features is similar to that of the human visual system and is normally applied for texture representation. Owing to its low computational complexity, sharp focus measure curve, robustness to random noise sources and accuracy, it is considered a superior alternative to most recently proposed 3-D shape recovery approaches. The algorithm is thoroughly investigated on real image sequences and a synthetic image dataset. The efficiency of the proposed scheme is also compared with state-of-the-art 3-D shape recovery approaches. Finally, by means of two global statistical measures, root mean square error and correlation, we claim that this approach, in spite of its simplicity, generates accurate results.

The method for homography estimation between two planes based on lines and points

Julia Shemiakina, Alexander Zhukovsky, Dmitry Nikolaev

The paper considers the problem of estimating a transform connecting two images of one planar object. A RANSAC-based method is proposed for calculating the parameters of a projective transform, which uses point and line correspondences simultaneously. A series of experiments was performed on synthesized data. The presented results show that the convergence rate of the algorithm is significantly higher when actual lines are used instead of line intersection points. When both lines and feature points are used, it is shown that the convergence rate does not depend on the ratio between lines and feature points in the input dataset.
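
Points and lines can enter the same estimation problem because a homography acts on both: if homogeneous points map as x' = H x, then lines map as l' = H^(-T) l, so incidence is preserved. The sketch below only demonstrates this standard projective-geometry fact; it is not the RANSAC estimator from the paper, and the matrix `H` and all function names are hypothetical.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def transpose(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]


def inverse3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[value / det for value in row] for row in adj]


def map_point(H, p):
    """Apply x' = H x to a 2D point and dehomogenize."""
    x, y, w = mat_vec(H, [p[0], p[1], 1.0])
    return (x / w, y / w)


def map_line(H, line):
    """Apply l' = H^(-T) l to a line given as (a, b, c) with ax + by + c = 0."""
    return mat_vec(transpose(inverse3(H)), line)


def line_through(p, q):
    """Line through two points: cross product of their homogeneous forms."""
    return [p[1] - q[1], q[0] - p[0], p[0] * q[1] - p[1] * q[0]]


# Hypothetical homography and a line defined by two points.
H = [[1.0, 0.2, 3.0], [0.1, 1.0, -2.0], [0.001, 0.002, 1.0]]
p, q = (1.0, 2.0), (4.0, -1.0)
mapped_line = map_line(H, line_through(p, q))
mapped_p, mapped_q = map_point(H, p), map_point(H, q)


def incidence(line, point):
    """Zero when the point lies on the line."""
    return line[0] * point[0] + line[1] * point[1] + line[2]
```

Each line correspondence therefore constrains H just as a point correspondence does, which is why both can feed one estimator.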

Document localization algorithms based on feature points and straight lines

Natalya Skoryukina, Julia Shemiakina, Vladimir L. Arlazarov, et al.

An important part of a system for analyzing planar rectangular objects is localization: the estimation of the projective transform from a template image of an object to its photograph. The system also includes subsystems such as the selection and recognition of text fields, the usage of contexts, etc. In this paper three localization algorithms are described. All algorithms use feature points and two of them also analyze near-horizontal and near-vertical lines in the photograph. The algorithms and their combinations are tested on a dataset of real document photographs. A method of localization quality estimation is also proposed that allows the localization subsystem to be configured independently of the quality of the other subsystems.

Perception-oriented fusion of multi-sensor imagery: visible, IR, and SAR

D. Sidorchuk, V. Volkov, S. Gladilin

This paper addresses the problem of image fusion of optical (visible and thermal domain) data and radar data for the purpose of visualization. These types of images typically contain a lot of complementary information, and their joint visualization can be more useful and convenient for a human user than a set of individual images. To solve the image fusion problem we propose a novel algorithm that utilizes some peculiarities of human color perception and is based on grey-scale structural visualization. The benefits of the presented algorithm are exemplified on satellite imagery.

Machine Vision and Visualization

Wire connector classification with machine vision and a novel hybrid SVM

Vedang Chauhan, Keyur D. Joshi, Brian W. Surgenor

A machine vision-based system has been developed and tested that uses a novel hybrid Support Vector Machine (SVM) in a part inspection application with clear plastic wire connectors. The application required the system to differentiate between 4 different known styles of connectors plus one unknown style, for a total of 5 classes. The requirement to handle an unknown class is what necessitated the hybrid approach. The system was trained with the 4 known classes and tested with 5 classes (the 4 known plus the 1 unknown). The hybrid classification approach used two layers of SVMs: one layer was semi-supervised and the other layer was supervised. The semi-supervised SVM was a special case of unsupervised machine learning that classified test images as one of the 4 known classes (to accept) or as the unknown class (to reject). The supervised SVM classified test images as one of the 4 known classes and consequently would give false positives (FPs). Two methods were tested. The difference between the methods was that the order of the layers was switched. The method with the semi-supervised layer first gave an accuracy of 80% with 20% FPs. The method with the supervised layer first gave an accuracy of 98% with 0% FPs. Further work is being conducted to see if the hybrid approach works with other applications that have an unknown class requirement.

High-resolution hyperspectral ground mapping for robotic vision

Frank Neuhaus, Christian Fuchs, Dietrich Paulus

Recently released hyperspectral cameras use large, mosaiced filter patterns to capture different ranges of the light’s spectrum in each of the camera’s pixels. Spectral information is sparse, as it is not fully available in each location. We propose an online method that avoids explicit demosaicing of camera images by fusing raw, unprocessed, hyperspectral camera frames inside an ego-centric ground surface map. It is represented as a multilayer heightmap data structure, whose geometry is estimated by combining a visual odometry system with either dense 3D reconstruction or 3D laser data. We use a publicly available dataset to show that our approach is capable of constructing an accurate hyperspectral representation of the surface surrounding the vehicle. We show that in many cases our approach increases spatial resolution over a demosaicing approach, while providing the same amount of spectral information.

A study on low-cost, high-accuracy, and real-time stereo vision algorithms for UAV power line inspection

Hongyu Wang, Baomin Zhang, Xun Zhao, et al.

Conventional stereo vision algorithms suffer from high levels of hardware resource utilization due to algorithm complexity, or poor levels of accuracy caused by inadequacies in the matching algorithm. To address these issues, we propose a stereo range-finding technique that achieves an excellent balance between cost, matching accuracy and real-time performance for power line inspection using UAVs. This was achieved through the introduction of a special image preprocessing algorithm and a weighted local stereo matching algorithm, as well as the design of a corresponding hardware architecture. Stereo vision systems based on this technique have a lower level of resource usage and also a higher level of matching accuracy following hardware acceleration. To validate the effectiveness of our technique, a stereo vision system based on our improved algorithms was implemented using the Spartan 6 FPGA. In comparative experiments, the system using the improved algorithms outperformed the system based on the unimproved algorithms in terms of resource utilization and matching accuracy. In particular, Block RAM usage was reduced by 19%, and the improved system was also able to output range-finding data in real time.

Drawing a baseline in aesthetic quality assessment

Fernando Rubio, M. Julia Flores, Jose M. Puerta

Aesthetic classification of images is an inherently subjective task. There is no validated collection of images/photographs labeled by experts as having good or bad quality. Currently, the closest approximation is to use databases of photos in which a group of users rates each image. Hence, there is no unique good/bad label but rather a rating distribution given by user votes. Because of this peculiarity, the problem of binary supervised aesthetic classification cannot be stated as directly as other Computer Vision tasks. Recent literature follows an approach in which researchers take the average user rating for each image and establish an arbitrary threshold to determine its class or label. In this way, images above the threshold are considered of good quality, while images below the threshold are considered of bad quality. This paper analyzes the current literature and reviews the attributes able to represent an image, dividing them into three families: specific, general and deep features. From those that have proved most competitive, we have selected a representative subset, our main goal being to establish a clear experimental framework. Finally, once the features were selected, we applied them to the full AVA dataset. For validation we report not only accuracy values, which are not very informative in this case, but also metrics able to evaluate classification power on imbalanced datasets. We have conducted a series of experiments in which several well-known classifiers are learned from the data. In this way, the paper provides what we consider valuable and valid baseline results for the given problem.
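
The labeling scheme and the remark about accuracy on imbalanced data can be illustrated with a minimal sketch. The threshold value, the function names and the toy ratings below are assumptions made for illustration; the abstract does not state which threshold or which imbalance-aware metrics the authors used. Balanced accuracy (the mean of per-class recall) is shown as one common choice.

```python
def labels_from_ratings(mean_ratings, threshold=5.0):
    """Map each image's mean user rating to 1 (good) or 0 (bad).

    The threshold of 5.0 is an assumed, arbitrary cut-off.
    """
    return [1 if r > threshold else 0 for r in mean_ratings]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, which does not reward majority-class guessing."""
    recalls = []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        if not idx:
            continue  # class absent from the labels, skip it
        correct = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


# Toy mean ratings (made up): most images fall above the threshold.
ratings = [6.1, 5.8, 7.0, 4.2, 6.5, 5.5, 6.9, 5.2]
y_true = labels_from_ratings(ratings)
y_pred = [1] * len(y_true)  # trivial classifier: always predict "good"

print(accuracy(y_true, y_pred))           # high despite learning nothing
print(balanced_accuracy(y_true, y_pred))  # exposes the trivial classifier
```

A classifier that always predicts the majority class scores well on plain accuracy here but only 0.5 on balanced accuracy, which is why accuracy alone is not very informative for this problem.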

Fuzzy classification for strawberry diseases-infection using machine vision and soft-computing techniques

Hamit Altıparmak, Mohamad Al Shahadat, Ehsan Kiani, et al.

Robotic agriculture requires smart and feasible techniques to replace human intelligence with machine intelligence. Strawberry is one of the important Mediterranean products, and enhancing its productivity requires modern, machine-based methods. Whereas a human identifies disease-infected leaves by eye, a machine should also be capable of vision-based disease identification. The objective of this paper is to practically verify the applicability of a new computer-vision method for discriminating between healthy and disease-infected strawberry leaves that requires neither a neural network nor time-consuming training. The proposed method was tested under outdoor lighting conditions using a regular DSLR camera without any particular lens. Since the type and infection degree of a disease are approximated by a human brain, a fuzzy decision maker classifies the leaves in images captured on-site that have the same properties as human vision. Optimizing the fuzzy parameters for a typical strawberry production area at a summer mid-day in Cyprus produced 96% accuracy for segmented iron deficiency and 93% accuracy for segmented using a typical human instant classification approximation as the benchmark holding higher accuracy than a human eye identifier. The fuzzy-based classifier provides an approximate result for deciding whether the leaf is healthy or not.

A low-cost machine vision system for the recognition and sorting of small parts

Gustavo Barea, Brian W. Surgenor, Vedang Chauhan, et al.

An automated machine vision-based system for the recognition and sorting of small parts was designed, assembled and tested. The system was developed to address a need to expose engineering students to the issues of machine vision and assembly automation technology, with readily available and relatively low-cost hardware and software. This paper outlines the design of the system and presents experimental performance results. Three different styles of plastic gears, together with three different styles of defective gears, were used to test the system. A pattern matching tool was used for part classification. Nine experiments were conducted to demonstrate the effects of changing various hardware and software parameters, including: conveyor speed, gear feed rate, classification, and identification score thresholds. It was found that the system could achieve a maximum system accuracy of 95% at a feed rate of 60 parts/min, for a given set of parameter settings. Future work will be looking at the effect of lighting.

Three main paradigms of simultaneous localization and mapping (SLAM) problem

Vandad Imani, Keijo Haataja, Pekka Toivanen

Simultaneous Localization and Mapping (SLAM) is one of the most challenging research areas within computer and machine vision for automated scene commentary and explanation. SLAM has been a developing research area in robotics in recent years. Using SLAM, a robot can estimate its position at distinct points in time, which indicates the robot’s trajectory, and can also generate a map of the environment. The distinctive trait of SLAM is that it both estimates the location of the robot and builds a map in various types of environment. SLAM is effective in different types of environment, such as indoor, outdoor, air, underwater, underground and space. Several approaches have been investigated for using SLAM in distinct environments. The purpose of this paper is to provide an accurate, perceptive review of the history of SLAM based on laser/ultrasonic sensors and cameras as perception input data. In addition, we focus mainly on three paradigms of the SLAM problem with all their pros and cons. In the future, intelligent methods and new ideas will be applied to visual SLAM to estimate the motion of an intelligent underwater robot and to build a feature map of the marine environment.

Computer vision system: a tool for evaluating the quality of wheat in a grain tank

Uryi Igorevish Minkin, Aleksei Vladimirovich Panchenko, Aleksandr Yurievich Shkanaev, et al.

The paper describes a technology for automating the evaluation of grain quality in the grain tank of a combine harvester. A special recognition algorithm analyzes photographic images taken by the camera and provides automatic estimates of the total mass fraction of broken grains and the presence of non-grains. The paper also presents the operating details of the tank prototype and reports the accuracy of the algorithms designed.

Using virtual environment for autonomous vehicle algorithm validation

Aleksandrs Levinskis

This paper describes a possible use of a modern game engine for validating and proving the concept of an algorithm design. As a result, a simple visual odometry algorithm is provided to show the concept and go through all workflow stages. Some of the stages involve a Kalman filter, used to estimate the optical flow velocity as well as the position of a moving camera mounted on the vehicle body. In particular, the Unreal Engine 4 game engine is used to generate optical flow patterns and the ground truth path. The Horn and Schunck method is applied to determine optical flow. It is shown that this method can estimate the position of the camera attached to the vehicle with a certain displacement error with respect to ground truth, depending on the optical flow pattern. For the displacement rate, the RMS error is calculated between the estimated and actual positions.
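
The evaluation step, an RMS error between estimated and ground truth positions, can be sketched as follows. This is an illustration only: the function name, the use of 2D positions and the sample paths are assumptions, and the abstract does not specify how the authors align or sample the two trajectories.

```python
import math


def rms_position_error(estimated, ground_truth):
    """Root-mean-square Euclidean distance between matching 2D positions."""
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("paths must be non-empty and of equal length")
    squared = [
        (ex - gx) ** 2 + (ey - gy) ** 2
        for (ex, ey), (gx, gy) in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical paths: the estimate drifts sideways from a straight ground truth.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
estimate = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.2), (3.0, 0.3)]
print(rms_position_error(estimate, truth))
```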

Reinforcement learning in computer vision

A. V. Bernstein, E. V. Burnaev

Nowadays, machine learning has become one of the basic technologies used in solving various computer vision tasks such as feature detection, image segmentation, object recognition and tracking. In many applications, complex systems such as robots are equipped with visual sensors from which they learn the state of the surrounding environment by solving the corresponding computer vision tasks. Solutions of these tasks are used for making decisions about possible future actions. It is therefore not surprising that, when solving computer vision tasks, we should take into account special aspects of their subsequent application in model-based predictive control. Reinforcement learning is a modern machine learning technology in which learning is carried out through interaction with the environment. In recent years, reinforcement learning has been used both for solving applied tasks such as the processing and analysis of visual information, and for solving specific computer vision problems such as filtering, extracting image features, localizing objects in scenes, and many others. The paper briefly describes reinforcement learning technology and its use for solving computer vision problems.

Aerial images visual localization on a vector map using color-texture segmentation

I. A. Kunina, L. M. Teplyakov, A. P. Gladkov, et al.

In this paper we study the problem of combining UAV-obtained optical data and a coastal vector map in the absence of satellite navigation data. The method is based on representing the territory as a set of segments produced by color-texture image segmentation. We then find the geometric transform that gives the best match between these segments and the land and water areas of the georeferenced vector map. The transform consists of an arbitrary shift relative to the vector map and bounded rotation and scaling. These parameters are estimated using the RANSAC algorithm, which matches the segment contours to the contours of the land and water areas of the vector map. To implement this matching, we suggest computing shape descriptors robust to rotation and scaling. We performed numerical experiments demonstrating the practical applicability of the proposed method.

Video Processing Technology and Methods

Image quality assessment for video stream recognition systems

Timofey S. Chernov, Nikita P. Razumnuy, Alexander S. Kozharinov, et al.

Recognition and machine vision systems have long been widely used in many disciplines to automate various processes of life and industry. Input images of optical recognition systems can be subjected to a large number of different distortions, especially in uncontrolled or natural shooting conditions, which leads to unpredictable results of recognition systems, making it impossible to assess their reliability. For this reason, it is necessary to perform quality control of the input data of recognition systems, which is facilitated by modern progress in the field of image quality evaluation. In this paper, we investigate the approach to designing optical recognition systems with built-in input image quality estimation modules and feedback, for which the necessary definitions are introduced and a model for describing such systems is constructed. The efficiency of this approach is illustrated by the example of solving the problem of selecting the best frames for recognition in a video stream for a system with limited resources. Experimental results are presented for the system for identity documents recognition, showing a significant increase in the accuracy and speed of the system under simulated conditions of automatic camera focusing, leading to blurring of frames.

Recurrent neural network based virtual detection line

Roberts Kadikis

The paper proposes an efficient method for detection of moving objects in the video. The objects are detected when they cross a virtual detection line. Only the pixels of the detection line are processed, which makes the method computationally efficient. A Recurrent Neural Network processes these pixels. The machine learning approach allows one to train a model that works in different and changing outdoor conditions. Also, the same network can be trained for various detection tasks, which is demonstrated by the tests on vehicle and people counting. In addition, the paper proposes a method for semi-automatic acquisition of labeled training data. The labeling method is used to create training and testing datasets, which in turn are used to train and evaluate the accuracy and efficiency of the detection method. The method shows similar accuracy as the alternative efficient methods but provides greater adaptability and usability for different tasks.

First stereo video dataset with ground truth for remote car pose estimation using satellite markers

Gustavo Gil, Giovanni Savino, Marco Pierini

The leading causes of PTW (Powered Two-Wheeler) crashes and near misses in urban areas are failed or delayed predictions of the changing trajectories of other vehicles. Regrettably, misperception by both car drivers and motorcycle riders results in fatal or serious consequences for riders. Intelligent vehicles could provide early warning about possible collisions, helping to avoid the crash. There is evidence that stereo cameras can be used for estimating the heading angle of other vehicles, which is key to anticipating their imminent location, but there is limited heading ground truth data available in the public domain. Consequently, we employed a marker-based technique to create ground truth for car pose and created a dataset for computer vision benchmarking purposes. This dataset of a moving vehicle, collected from a statically mounted stereo camera, is a simplification of a complex and dynamic reality and serves as a test bed for car pose estimation algorithms. The dataset contains the accurate pose of the moving obstacle, and realistic imagery including texture-less and non-Lambertian surfaces (e.g. reflectance and transparency).

Method of determining the necessary number of observations for video stream documents recognition

Vladimir V. Arlazarov, Konstantin Bulatov, Temudzhin Manzhikov, et al.

This paper discusses the task of document recognition in a sequence of video frames. To optimize processing speed, the stability of the recognition results obtained from several video frames is estimated. For identity document (Russian internal passport) recognition on a mobile device, it is shown that the number of observations necessary for obtaining a precise recognition result can be decreased significantly.

A no-reference image and video visual quality metric based on machine learning

Vladimir Frantc, Viacheslav Voronin, Evgenii Semenishchev, et al.

The paper presents a novel visual quality metric for lossy compressed video quality assessment. A high degree of correlation with subjective quality estimates is achieved by using a convolutional neural network trained on a large number of pairs of video sequences and subjective quality scores. We demonstrate how our predicted no-reference quality metric correlates with qualitative opinion in a human observer study. Results are shown on the EVVQ dataset in comparison with existing approaches.

Optimal frame-by-frame result combination strategy for OCR in video stream

Konstantin Bulatov, Aleksander Lynchenko, Valeriy Krivtsov

This paper describes the problem of combining classification results of multiple observations of one object. This task can be regarded as a particular case of decision-making using a combination of experts’ votes with calculated weights. The accuracy of various methods of combining the classification results, depending on different models of input data, is investigated using the example of frame-by-frame character recognition in a video stream. It is shown experimentally that the strategy of choosing a single most competent expert has an advantage when the input data contain no irrelevant observations (here, irrelevant means containing character localization and segmentation errors). At the same time, this work demonstrates the advantage of combining several of the most competent experts according to the multiplication rule or voting if irrelevant samples are present in the input data.
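
The three combination strategies named in the abstract can be compared on a toy example. This is a sketch under stated assumptions, not the authors' method: each frame is represented as a dictionary of class scores, the "competence" of a frame is approximated by its highest score, and all frames are combined rather than a selected subset. The function names and the scores are hypothetical.

```python
from collections import Counter


def best_single_expert(frames):
    """Return the top class of the frame with the highest top score."""
    best = max(frames, key=lambda scores: max(scores.values()))
    return max(best, key=best.get)


def product_rule(frames):
    """Multiply per-class scores across frames and return the best class."""
    combined = {cls: 1.0 for cls in frames[0]}
    for scores in frames:
        for cls in combined:
            combined[cls] *= scores[cls]
    return max(combined, key=combined.get)


def majority_vote(frames):
    """Each frame votes for its top class; the most frequent class wins."""
    votes = Counter(max(scores, key=scores.get) for scores in frames)
    return votes.most_common(1)[0][0]


# Hypothetical scores for one character seen in three frames ("O" vs "0").
frames = [
    {"O": 0.60, "0": 0.40},
    {"O": 0.55, "0": 0.45},
    {"O": 0.10, "0": 0.90},
]
print(best_single_expert(frames), product_rule(frames), majority_vote(frames))
```

In this toy input the single confident frame dominates both the best-expert choice and the product, while voting follows the two weakly confident frames, so the strategies can disagree on the same observations.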

Satellite markers: a simple method for ground truth car pose on stereo video

Gustavo Gil, Giovanni Savino, Simone Piantini, et al.

Artificial prediction of the future location of other cars is a must in the context of advanced safety systems. The remote estimation of car pose, and particularly of its heading angle, is key to predicting its future location. Stereo vision systems make it possible to obtain 3D information about a scene. Ground truth in this specific context is associated with reference information about the depth, shape and orientation of the objects present in the traffic scene. Creating 3D ground truth is a measurement and data fusion task associated with the combination of different kinds of sensors. The novelty of this paper is a method to generate ground truth car pose from video data alone. When the method is applied to stereo video, it also provides the extrinsic camera parameters for each camera at frame level, which are key to quantifying the performance of a moving stereo vision system because the system is subjected to undesired vibrations and/or leaning. We developed a video post-processing technique which employs a common camera calibration tool for the 3D ground truth generation. In our case study, we focus on accurate heading angle estimation of a moving car under realistic imagery. As outcomes, our satellite marker method provides accurate car pose at frame level, and the instantaneous spatial orientation for each camera at frame level.

Hierarchical vs non-hierarchical audio indexation and classification for video genres

Nouha Dammak, Yassine BenAyed

In this paper, Support Vector Machines (SVMs) are used for segmenting and indexing video genres based only on audio features extracted at block level, which has the notable advantage of capturing local temporal information. The main contribution of our study is to show the considerable effect on classification accuracy of using a hierarchical categorization structure based on the Mel Frequency Cepstral Coefficients (MFCC) audio descriptor. The classification covers three common video genres: sports videos, music clips and news scenes. The sub-classification may divide each genre into several multi-speaker and multi-dialect sub-genres. The approach was validated on over 360 minutes of video, yielding a classification accuracy of over 99%.

Computer Photography and Imaging Technology

Overview of machine vision methods in x-ray imaging and microtomography

Alexey Buzmakov, Denis Zolotov, Marina Chukalina, et al.

Digital X-ray imaging has become widely used in science, medicine and non-destructive testing. This allows modern digital image analysis to be used for automatic information extraction and interpretation. We give a short review of applications of machine vision in scientific X-ray imaging and microtomography, including image processing, feature detection and extraction, image compression to increase camera throughput, microtomography reconstruction, visualization and setup adjustment.

Triadic split-merge sampler

Anne C. van Rossum, Hai Xiang Lin, Johan Dubbeldam, et al.

In machine vision, typical heuristic methods to extract parameterized objects out of raw data points are the Hough transform and RANSAC. Bayesian models carry the promise of optimally extracting such parameterized objects given a correct definition of the model and the type of noise at hand. One category of solvers for Bayesian models is Markov chain Monte Carlo (MCMC) methods. Naive implementations of MCMC methods suffer from slow convergence in machine vision due to the complexity of the parameter space. To address this, blocked Gibbs and split-merge samplers have been developed that assign multiple data points to clusters at once. In this paper we introduce a new split-merge sampler, the triadic split-merge sampler, that performs steps between two and three randomly chosen clusters. This has two advantages. First, it reduces the asymmetry between the split and merge steps. Second, it is able to propose a new cluster that is composed of data points from two different clusters. Both advantages speed up convergence, which we demonstrate on a line extraction problem. We show that the triadic split-merge sampler outperforms the conventional split-merge sampler. Although this new MCMC sampler is demonstrated in a machine vision context, its application extends to the very general domain of statistical inference.

Pixel-wise deblurring imaging system based on active vision for structural health monitoring at a speed of 100 km/h

Tomohiko Hayakawa, Yushi Moko, Kenta Morishita, et al.

In this paper, we propose a pixel-wise deblurring imaging (PDI) system based on active vision for compensation of the blur caused by high-speed one-dimensional motion between a camera and a target. The optical axis is controlled by back-and-forth motion of a galvanometer mirror to compensate for the motion. High-spatial-resolution images captured by our system during high-speed motion are useful for efficient and precise visual inspection, such as visually judging abnormal parts of a tunnel surface to prevent accidents; hence, we applied the PDI system to structural health monitoring. By mounting the system onto a vehicle in a tunnel, we confirmed significant improvement in image quality for submillimeter black-and-white stripes and real tunnel-surface cracks at a speed of 100 km/h.

Laser projection positioning of spatial contour curves via a galvanometric scanner

Junchao Tu, Liyan Zhang

The technology of laser projection positioning is widely applied in advanced manufacturing fields (e.g. composite plying, parts location and installation). To make better use of it, a laser projection positioning (LPP) system is designed and implemented. Firstly, the LPP system is built from a laser galvanometric scanning (LGS) system and a binocular vision system. Next, the system model is constructed by applying a Single-hidden Layer Feed-forward Neural Network (SLFN). Secondly, the LGS system and the binocular system, which are independent of each other, are integrated through a data-driven calibration method based on the extreme learning machine (ELM) algorithm. Finally, a projection positioning method is proposed within the framework of the calibrated SLFN system model. A well-designed experiment is conducted to verify the viability and effectiveness of the proposed system. In addition, the accuracy of projection positioning is evaluated to show that the LPP system achieves good localization.

Blur kernel estimation with algebraic tomography technique and intensity profiles of object boundaries

Anastasia Ingacheva, Marina Chukalina, Timur Khanipov, et al.

Motion blur caused by camera vibration is a common source of degradation in photographs. In this paper we study the problem of finding the point spread function (PSF) of a blurred image using a tomography technique. The PSF reconstruction result strongly depends on the particular tomography technique used. We present a tomography algorithm with regularization adapted specifically for this task. We use the algebraic reconstruction technique (ART) as the starting algorithm and introduce regularization. We use the conjugate gradient method for the numerical implementation of the proposed approach. The algorithm is tested on a dataset of 9 kernels extracted from real photographs by the Adobe corporation, for which the point spread function is known. We also investigate the influence of noise on the quality of image reconstruction and how the number of projections influences the change in magnitude of the reconstruction error.
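
For readers unfamiliar with ART, the basic unregularized iteration (the Kaczmarz method) can be sketched in a few lines. This shows only the starting algorithm named in the abstract; it does not include the authors' regularization or their conjugate gradient implementation. The function name, the relaxation parameter and the tiny linear system are assumptions chosen for illustration.

```python
def art(A, b, iterations=50, relaxation=1.0):
    """Plain ART (Kaczmarz): project the estimate onto one equation at a time.

    A is a list of rows (one per projection measurement) and b holds the
    measured values; the result approximates x such that A x = b.
    """
    x = [0.0] * len(A[0])
    for _ in range(iterations):
        for row, measured in zip(A, b):
            norm_sq = sum(a * a for a in row)
            if norm_sq == 0.0:
                continue  # an all-zero row carries no information
            residual = measured - sum(a * xj for a, xj in zip(row, x))
            step = relaxation * residual / norm_sq
            x = [xj + step * a for xj, a in zip(x, row)]
    return x


# Hypothetical 2-pixel "image" observed through three ray sums.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [2.0, 3.0, 5.0]
print(art(A, b))
```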

Formation of the image on the receiver of thermal radiation

Tatiana A. Akimenko

The formation of the thermal picture of the observed scene, with verification of the quality of the thermal images obtained, is one of the important stages of the technological process that determine the quality of a thermal imaging observation system. In this article, we propose a model for the formation of a thermal picture of a scene, which must take into account the features of the object of observation as the source of the signal and the transmission of the signal through the physical elements of the thermal imaging system, which process the signal at the optical, photoelectronic and electronic stages. This determines the final parameters of the signal and its compliance with the requirements for thermal information and measurement systems.

3D shape recovery from image focus using gray level co-occurrence matrix

Fahad Mahmood, Umair Munir, Fahad Mehmood, et al.

Recovering a precise and accurate 3-D shape of the target object using a robust 3-D shape recovery algorithm is an ultimate objective of the computer vision community. The focus measure algorithm plays an important role in this architecture: it converts the color values of each pixel of the acquired 2-D image dataset into corresponding focus values. After convolving the focus measure filter with the input 2-D image dataset, a 3-D shape recovery approach is applied to recover the depth map. In this paper, we propose using the Gray Level Co-occurrence Matrix along with its statistical features for computing the focus information of the image dataset. The Gray Level Co-occurrence Matrix quantifies the texture present in the image using statistical features and then applies the joint probability distribution function of the gray level pairs of the input image. Finally, we quantify the focus value of the input image using a Gaussian Mixture Model. Owing to its low computational complexity, sharp focus measure curve, robustness to random noise sources and accuracy, it is considered a superior alternative to most recently proposed 3-D shape recovery approaches. The algorithm is thoroughly investigated on real image sequences and a synthetic image dataset. The efficiency of the proposed scheme is also compared with state-of-the-art 3-D shape recovery approaches. Finally, by means of two global statistical measures, root mean square error and correlation, we claim that this approach, in spite of its simplicity, generates accurate results.
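
A Gray Level Co-occurrence Matrix and one of its standard statistical features can be computed as in the sketch below. The abstract does not say which features or pixel offsets the authors use, so the horizontal offset, the choice of the contrast feature and the tiny test patches are assumptions; the Gaussian Mixture Model step is not shown.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix for pixel pairs offset by (dx, dy).

    image is a list of rows of integer gray levels in range(levels).
    Entry [i][j] is the joint probability of level i next to level j.
    """
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    if total == 0:
        raise ValueError("image too small for the requested offset")
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """GLCM contrast: large when neighboring gray levels differ strongly."""
    n = len(p)
    return sum((i - j) ** 2 * p[i][j] for i in range(n) for j in range(n))


# Hypothetical 4-level patches: a sharp edge pattern versus a smoothed one.
sharp = [[0, 3, 0, 3], [0, 3, 0, 3]]
blurred = [[1, 2, 1, 2], [1, 2, 1, 2]]
print(contrast(glcm(sharp, 4)), contrast(glcm(blurred, 4)))
```

The sharper patch yields the higher contrast value, which is the intuition behind using texture statistics as a focus measure.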

Establishing the correspondence between closed contours of objects in images with projective distortions

Alexey V. Savchik, Victoria A. Sablina, Dmitry P. Nikolaev

In this paper, we consider the task of finding the correspondence between closed contours of objects in an image pair with small projective distortions. Several methods are considered and compared. Experimental results for two contour sets are provided. Sufficient conditions for the applicability of the method of selecting the nearest contour are presented and proven.

Real-time stop sign detection and distance estimation using a single camera

Wenpeng Wang, Yuxuan Su, Ming Cheng

In the modern world, the rapid development of driver assistance systems has made driving much easier than before. To increase onboard safety, a method is proposed to detect STOP signs and estimate distance using a single camera. For STOP sign detection, an LBP-cascade classifier was applied to identify the sign in the image, and distance estimation was based on the principle of pinhole imaging. A road test was conducted using a detection system built with a CMOS camera and software developed in Python with the OpenCV library. Results show that the proposed system reaches a maximum detection accuracy of 97.6% at 10 m, a minimum of 95.00% at 20 m, and a maximum error of 5% in distance estimation. The results indicate that the system is effective and has the potential to be used in both autonomous driving and advanced driver assistance systems.
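
Under the pinhole camera model, an object of known physical width W that appears w pixels wide, seen by a camera with focal length f expressed in pixels, lies at a distance of roughly d = f * W / w. The sketch below applies that relation; the function name, the focal length and the sign width are hypothetical values, not figures from the paper.

```python
def estimate_distance(focal_length_px, real_width_m, width_px):
    """Pinhole-model distance (in meters) to an object of known width."""
    if width_px <= 0:
        raise ValueError("detected width must be positive")
    return focal_length_px * real_width_m / width_px


# Assumed values: 800 px focal length and a 0.75 m wide sign.
FOCAL_LENGTH_PX = 800.0
SIGN_WIDTH_M = 0.75

for detected_width_px in (60, 30):
    d = estimate_distance(FOCAL_LENGTH_PX, SIGN_WIDTH_M, detected_width_px)
    print(detected_width_px, "px ->", d, "m")
```

Halving the detected width doubles the estimated distance, so a small pixel error in the bounding box matters more for distant signs.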

Image Processing and Applications

Face landmark point tracking using LK pyramid optical flow

Gang Zhang, Sikan Tang, Jiaquan Li

LK pyramid optical flow is an effective method for object tracking in video. In this paper it is used for face landmark point tracking in video. The following landmark points are considered: outer corner of the left eye, inner corner of the left eye, inner corner of the right eye, outer corner of the right eye, tip of the nose, left corner of the mouth and right corner of the mouth. The landmark points are marked by hand in the first frame. For subsequent frames, tracking performance is analyzed. Two kinds of conditions are considered: single factors, such as the normalized case, pose variation with slow movement, expression variation, illumination variation, occlusion, frontal face with rapid movement, and posed face with rapid movement; and combinations of factors, such as pose and illumination variation, pose and expression variation, pose variation and occlusion, illumination and expression variation, and expression variation and occlusion. Global and local measures are introduced to evaluate tracking performance under different factors or combinations of factors. The global measures comprise the number of images aligned successfully, the average alignment error, and the number of images aligned before failure; the local measures comprise the number of images aligned successfully for components of a face and the average alignment error for the components. To test the performance of face landmark point tracking under different cases, tests are carried out on image sequences gathered by us. Results show that the LK pyramid optical flow method can track face landmark points under the normalized case, expression variation, illumination variation that does not affect facial details, and pose variation, and that different factors or combinations of factors have different effects on alignment performance for different landmark points.

Researches of fruit quality prediction model based on near infrared spectrum

Yulin Shen, Lian Li

With the improvement in standards for food quality and safety, people pay more attention to the internal quality of fruits, so measuring fruit internal quality is increasingly imperative. In general, nondestructive analysis of the soluble solid content (SSC) and total acid content (TAC) of fruits is vital and effective for quality measurement in global fresh produce markets. In this paper, we therefore aim to establish a novel fruit internal quality prediction model based on SSC and TAC for the near infrared spectrum. Firstly, fruit quality prediction models based on a PCA + BP neural network, a PCA + GRNN network, a PCA + BP AdaBoost strong classifier, PCA + ELM and a PCA + LS_SVM classifier are designed and implemented. Then, in the NSCT domain, the median filter and the Savitzky-Golay filter are used to preprocess the spectral signal, and the Kennard-Stone algorithm is used to automatically select the training samples and test samples. Thirdly, we obtain the optimal models by comparing 15 kinds of prediction model based on the theory of the multi-classifier competition mechanism. Specifically, non-parametric estimation is introduced to measure the effectiveness of the proposed model: the reliability and variance of the non-parametric estimation of each prediction model are used to evaluate the prediction result, with the estimated value and confidence interval serving as a reference. The experimental results demonstrate that this model can better achieve the optimal evaluation of the internal quality of fruit. Finally, we employ cat swarm optimization to optimize the two optimal models obtained above from non-parametric estimation. Empirical testing indicates that the proposed method can provide more accurate and effective results than other forecasting methods.

Non-parametric adaptative JPEG fragments carving

Sabrina Cherifa Amrouche, Dalila Salamani

The most challenging JPEG recovery tasks arise when the file header is missing. In this paper we propose a two-layer machine learning model to restore headerless JPEG images. We first build a classifier able to identify the structural properties of the images/fragments and then use an AutoEncoder (AE) to learn the fragment features for header prediction. We define a universal JPEG header, and the remaining free image parameters (height and width) are predicted with a Gradient Boosting Classifier. Our approach achieved 90% accuracy using the manually defined features and 78% accuracy using the AE features.
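
To show where the two free parameters sit in a JPEG header, the sketch below checks a fragment for the start-of-image marker and builds a baseline start-of-frame (SOF0) segment from a predicted height and width. The marker values and segment layout follow the JPEG baseline format; the sampling factors and quantization table assignments are assumed placeholder choices, and the helper names are hypothetical. It does not reproduce the paper's universal header.

```python
import struct

SOI = b"\xff\xd8"  # JPEG start-of-image marker


def has_jpeg_header(fragment: bytes) -> bool:
    """Return True if the fragment starts with the SOI marker."""
    return fragment.startswith(SOI)


def build_sof0(height: int, width: int, components: int = 3) -> bytes:
    """Build a baseline SOF0 segment carrying the image dimensions."""
    # Segment length counts everything after the marker: 8 fixed bytes
    # plus 3 bytes per colour component.
    length = 8 + 3 * components
    segment = b"\xff\xc0" + struct.pack(
        ">HBHHB", length, 8, height, width, components)
    for cid in range(1, components + 1):
        # Assumed 4:2:0 layout: luma 2x2 sampling with table 0,
        # chroma 1x1 sampling with table 1.
        sampling = 0x22 if cid == 1 else 0x11
        qtable = 0 if cid == 1 else 1
        segment += bytes([cid, sampling, qtable])
    return segment
```

In a carving pipeline of this kind, the predicted height and width would be written into such a segment before the recovered scan data is handed to a decoder.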

2.5d body estimation via refined forest with field-based objective

Jaehwan Kim, HoWon Kim

In this paper, we present a 2.5D* body region classification method based on the global refinement of a random forest. The refinement reduces the size of the trained model while preserving prediction accuracy. We also incorporate a field-inspired objective into the random forest to account for the pairwise spatial relationships between neighboring data points. Numerical and visual experiments with artificial 3D data confirm the usefulness of the proposed method.

Rough case-based reasoning system for continues casting

Wenbin Su, Zhufeng Lei

Continuous casting occupies a pivotal position in the iron and steel industry. Rough set theory and case-based reasoning (CBR) were combined in the research and implementation of quality assurance for continuous casting billets, to improve the efficiency and accuracy of determining the processing parameters. An object-oriented method was applied to represent the continuous casting cases. The attribute weights were calculated by an algorithm based on rough set theory, and a retrieval mechanism for the continuous casting cases was designed. Several cases were used to test the retrieval mechanism; analysis of the results revealed how the retrieval attributes influence the determination of the processing parameters. A comprehensive evaluation model was established using attribute recognition theory. Different methods were adopted to describe the quality condition of the continuous casting billet according to the features of the defects. With the system, knowledge is not only preserved but also applied to adjust the processing parameters through case-based reasoning, so as to assure the quality of continuous casting and raise its level of intelligence.
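
A weighted case retrieval step of the kind described above might look like the following sketch. It assumes that attribute values are already normalized to the range 0 to 1, that the weights have been computed elsewhere (the rough-set weighting itself is not shown), and that similarity is a weighted sum of per-attribute closeness. The attribute names and the `retrieve` function are hypothetical.

```python
def retrieve(case_base, query, weights):
    """Return the stored case most similar to the query.

    case_base: list of dicts with an "attributes" mapping (values in 0..1).
    query: mapping of attribute name to normalized value.
    weights: mapping of attribute name to weight (assumed precomputed).
    """
    total = sum(weights.values())

    def similarity(attributes):
        # Weighted average of per-attribute closeness (1 minus distance).
        score = sum(w * (1.0 - abs(attributes[name] - query[name]))
                    for name, w in weights.items())
        return score / total

    return max(case_base, key=lambda case: similarity(case["attributes"]))
```

A larger weight makes a mismatch in that attribute cost more, which is how the weighting can change which stored processing parameters are proposed for a new case.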

Generation method of synthetic training data for mobile OCR system

Yulia S. Chernyshova, Alexander V. Gayer, Alexander V. Sheshkus

This paper addresses one of the fundamental problems of machine learning: acquiring training data. Obtaining enough natural training data is difficult and expensive. In recent years the use of synthetic images has become more attractive, as it saves human time and provides a large number of images that would otherwise be difficult to obtain. However, for successful learning on an artificial dataset one should try to reduce the gap between the natural and synthetic data distributions. In this paper we describe an algorithm for creating artificial training datasets for OCR systems, using the Russian passport as a case study.

The evaluation of correction algorithms of intensity nonuniformity in breast MRI images: a phantom study

Damian Borys, Wojciech Serafin, Kamil Gorczewski, et al.

The aim of this work was to test the most popular and essential algorithms for intensity nonuniformity correction in breast MRI. In this type of MRI, the signal is strong, especially in the proximity of the coil, but it can also exhibit inhomogeneities. The evaluated methods of signal correction were N3, N3FCM, N4, Nonparametric, and SPM. For testing purposes, a uniform phantom object was imaged with a breast MRI coil to obtain test images. To quantify the results, two measures were used: integral uniformity and standard deviation. For each algorithm, the minimum, average, and maximum values of both measures were calculated using a binary mask created for the phantom. Two methods, N3FCM and N4, obtained the lowest values in these measures; visually, however, the phantom was most uniform after correction with N4.
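
The two measures can be sketched on a masked image as shown below. The abstract does not give the exact formula for integral uniformity; this sketch assumes the common percentage form based on the maximum and minimum intensities inside the mask, for which lower values mean a more uniform image. The function name and the list-of-rows image representation are illustrative.

```python
import math

def uniformity_measures(image, mask):
    """Return (integral_uniformity_percent, standard_deviation).

    image: 2D list of intensities; mask: 2D list of 0/1 flags of the
    same shape marking the phantom region.
    """
    values = [v for row, mask_row in zip(image, mask)
              for v, inside in zip(row, mask_row) if inside]
    lo, hi = min(values), max(values)

    # Assumed definition: 100 * (max - min) / (max + min) inside the mask.
    integral_uniformity = 100.0 * (hi - lo) / (hi + lo)

    # Population standard deviation of the masked intensities.
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return integral_uniformity, std
```

Applying the function to each corrected slice and then taking the minimum, average, and maximum over slices would give per-algorithm summaries of the kind the abstract reports.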

Ontology based decision system for breast cancer diagnosis

Soumaya Trabelsi Ben Ameur, Florence Cloppet, Laurent Wendling, et al.

In this paper, we focus on the analysis and diagnosis of breast masses, inspired by expert concepts and rules. Accordingly, a Bag of Words is built based on the ontology of breast cancer diagnosis, which is accurately described in the Breast Imaging Reporting and Data System. To fill the gap between low-level knowledge and expert concepts, a semantic annotation is developed using a machine learning tool. Breast masses are then classified as benign or malignant according to expert rules implicitly modeled with a set of classifiers (KNN, ANN, SVM and Decision Tree). This semantic context of analysis offers a framework in which we can include external factors and other meta-knowledge, such as patient risk factors, as well as exploit more than one modality. Based on MRI and DECEDM modalities, our system achieves a recognition rate of 99.7% with the Decision Tree, an improvement of 24.7% owing to semantic analysis.

Age and gender estimation using Region-SIFT and multi-layered SVM

Hyunduk Kim, Sang-Heon Lee, Myoung-Kyu Sohn, et al.

In this paper, we propose an age and gender estimation framework using the region-SIFT feature and a multi-layered SVM classifier. The suggested framework entails three processes. The first step is landmark-based face alignment. The second step is feature extraction, in which we introduce the region-SIFT feature extraction method based on facial landmarks. We first define sub-regions of the face and then extract SIFT features from each sub-region. To reduce the dimensionality of the features, we employ Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). Finally, we classify age and gender using a multi-layered Support Vector Machine (SVM) for efficient classification. Rather than performing gender estimation and age estimation independently, the multi-layered SVM can improve the classification rate by constructing a classifier that estimates the age according to gender. Moreover, we collect a dataset of face images from the internet, called DGIST_C. The proposed method was evaluated on the FERET database, the CACD database, and the DGIST_C database. The experimental results demonstrate that the proposed approach performs age classification and gender estimation very efficiently and accurately.

Multiview 3D sensing and analysis for high quality point cloud reconstruction

Andrej Satnik, Ebroul Izquierdo, Richard Orjesek

Multiview 3D reconstruction techniques enable digital reconstruction of 3D objects from the real world by fusing different viewpoints of the same object into a single 3D representation. This process is by no means trivial and the acquisition of high quality point cloud representations of dynamic 3D objects is still an open problem. In this paper, an approach for high fidelity 3D point cloud generation using low cost 3D sensing hardware is presented. The proposed approach runs in an efficient low-cost hardware setting based on several Kinect v2 scanners connected to a single PC. It performs autocalibration and runs in real-time exploiting an efficient composition of several filtering methods including Radius Outlier Removal (ROR), Weighted Median filter (WM) and Weighted Inter-Frame Average filtering (WIFA). The performance of the proposed method has been demonstrated through efficient acquisition of dense 3D point clouds of moving objects.
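
Of the filters listed above, Radius Outlier Removal is the simplest to illustrate: a point is kept only if enough other points lie within a given radius of it. The brute-force sketch below shows the idea in plain Python; the function name, parameters, and quadratic neighbor search are choices made here for clarity, and a real-time system would use a spatial index instead.

```python
import math

def radius_outlier_removal(points, radius, min_neighbors):
    """Keep points that have at least min_neighbors other points
    within the given radius.

    points: list of (x, y, z) tuples.
    """
    kept = []
    for i, p in enumerate(points):
        # Count other points inside the sphere centred on p.
        neighbors = sum(1 for j, q in enumerate(points)
                        if j != i and math.dist(p, q) <= radius)
        if neighbors >= min_neighbors:
            kept.append(p)
    return kept
```

Isolated points, such as sensor noise floating away from the object surface, have few neighbors and are dropped, while points on dense surfaces survive.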

Hand motion modeling for psychology analysis in job interview using optical flow-history motion image: OF-HMI

Intissar Khalifa, Ridha Ejbali, Mourad Zaied

To survive the competition, companies always seek the best employees. Selection depends on the candidate's answers to the interviewer's questions and on the candidate's behavior during the interview session. The study of this behavior is always based on a psychological analysis of the movements accompanying the answers and discussions. Few techniques have been proposed to date for automatically analyzing a candidate's nonverbal behavior. This paper is part of a work psychology recognition system; it concentrates on spontaneous hand gestures, which are very significant in interviews according to psychologists. We propose a motion history representation of the hand based on a hybrid approach that merges optical flow and history motion images. First, the optical flow technique is used to detect hand motions in each frame of a video sequence. Second, history motion images (HMI) are used to accumulate the output of the optical flow, yielding a good representation of the hand's local movement in a global temporal template.
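
The accumulation step can be illustrated with the classic motion history image update, applied here to per-pixel optical flow magnitudes. This sketch assumes that a pixel counts as moving when its flow magnitude exceeds a threshold, that moving pixels are set to a maximum value `tau`, and that all other pixels decay by one per frame; these rules and the names `update_mhi`, `tau`, and `thresh` are assumptions, not the paper's exact formulation.

```python
def update_mhi(mhi, flow_magnitude, tau=5, thresh=1.0):
    """Return the motion history image after one more frame.

    mhi: 2D list holding the current history values.
    flow_magnitude: 2D list of optical flow magnitudes for the new frame.
    """
    return [
        [tau if mag > thresh else max(0, hist - 1)
         for hist, mag in zip(hist_row, mag_row)]
        for hist_row, mag_row in zip(mhi, flow_magnitude)
    ]
```

After several frames, recently moved pixels hold high values and older motion fades, so a single image encodes both where and how recently the hand moved.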

Computer Information Engineering and Signal Processing

Speaker emotion recognition: from classical classifiers to deep neural networks

Eya Mezghani, Maha Charfeddine, Henri Nicolas, et al.

Speaker emotion recognition has been considered among the most challenging tasks in recent years. Automatic systems for security, medicine or education can be improved by taking the affective state of speech into account. In this paper, a twofold approach for speech emotion classification is proposed. First, a relevant set of features is adopted; second, numerous supervised training techniques, involving classic methods as well as deep learning, are tested. Experimental results indicate that a deep architecture can improve classification performance on two affective databases, the Berlin Dataset of Emotional Speech and the Surrey Audio-Visual Expressed Emotion (SAVEE) dataset.

Method of estimation of scanning system quality

Eugene Larkin, Vladislav Kotov, Natalya Kotova, et al.

Estimating scanner parameters is an important part of developing an electronic document management system. This paper suggests considering the scanner as a system that contains two main channels: a photoelectric conversion channel and a channel for measuring the spatial coordinates of objects. Although both channels consist of the same elements, their parameters should be tested separately. A special structure of the two-dimensional reference signal is proposed for this purpose. In this structure, the fields for testing the various scanner parameters are spatially separated. The characteristics of the scanner are associated with the loss of information when a document is digitized. Methods to test grayscale transmitting ability, resolution and aberration level are proposed.

Squeeze-SegNet: a new fast deep convolutional neural network for semantic segmentation

Geraldin Nanfack, Azeddine Elhassouny, Rachid Oulad Haj Thami

Recent research on deep convolutional neural networks has focused on improving accuracy, which has led to significant advances. Whereas these networks were once limited to classification tasks, contributions from the scientific communities entering the field have made them very useful in higher-level tasks such as object detection and pixel-wise semantic segmentation. Brilliant ideas in deep-learning-based semantic segmentation have thus advanced the state of the art in accuracy; however, these architectures are very difficult to apply in embedded systems, as is the case for autonomous driving. We present a new deep fully convolutional neural network for pixel-wise semantic segmentation, which we call Squeeze-SegNet. The architecture follows the encoder-decoder style. We use a SqueezeNet-like encoder and a decoder formed by our proposed squeeze-decoder module and an upsample layer that uses downsample indices as in SegNet, and we add a deconvolution layer to provide the final multi-channel feature map. On datasets like Camvid or City-states, our net gets SegNet-level accuracy with less than 10 times fewer parameters than SegNet.
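
The idea of upsampling with downsample indices, as in SegNet, can be shown on a single small feature map. The sketch below assumes 2x2 max pooling with stride 2 on a map with even dimensions, and it stores the position of each maximum so that unpooling can put values back where they came from; it is a plain-Python illustration of the mechanism, not the Squeeze-SegNet layers themselves.

```python
def max_pool_with_indices(fmap):
    """2x2 max pooling (stride 2) that also records where each max was."""
    pooled, indices = [], []
    for r in range(0, len(fmap), 2):
        pooled_row, index_row = [], []
        for c in range(0, len(fmap[0]), 2):
            window = [(fmap[r + dr][c + dc], (r + dr, c + dc))
                      for dr in (0, 1) for dc in (0, 1)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool(pooled, indices, height, width):
    """Place each pooled value back at its recorded position; rest is zero."""
    out = [[0] * width for _ in range(height)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out
```

Because the indices are reused rather than learned, this kind of upsampling adds no parameters, which fits the goal of a compact network for embedded use.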

Faulty node detection in wireless sensor networks using a recurrent neural network

Jamila Atiga, Nour Elhouda Mbarki, Ridha Ejbali, et al.

Wireless sensor networks (WSN) consist of a set of sensors that are increasingly used in large-scale surveillance applications in different areas: military, environment, health, etc. Although sensors have become smaller and cheaper to manufacture, they may operate in places that are difficult to access, with no possibility of recharging the battery, and they generally have limited resources in terms of transmission power, processing capacity, data storage and energy. These sensors can be used in hostile environments, for example on a battlefield or in the presence of fires, floods or earthquakes. In these environments the sensors can fail, even in normal operation. Because undetected sensor faults can reduce the quality of the surveillance, it is necessary to develop fault-tolerant algorithms that detect defective nodes in wireless sensor networks. The values measured by the sensors are used to estimate the state of the monitored area. We used the Non-linear Auto-Regressive with eXogenous inputs (NARX) model, a recurrent neural network architecture, to predict the state of a sensor node from its previous values, described as time series. The experimental results verify that our proposed model improves the prediction of the state.
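
A NARX model predicts the next value of a series from its own past values and from past values of an exogenous series. The sketch below builds that lagged input vector and flags a reading as faulty when it deviates from the prediction by more than a threshold. The residual-threshold rule, the choice of exogenous series (for example, a neighboring node's readings), and the `model` callable standing in for a trained network are all assumptions made for illustration.

```python
def narx_regressor(y, u, t, ny=2, nu=2):
    """Lagged inputs for time t: past outputs y, then past exogenous u."""
    return ([y[t - k] for k in range(1, ny + 1)] +
            [u[t - k] for k in range(1, nu + 1)])


def detect_faults(y, u, model, ny=2, nu=2, threshold=1.0):
    """Return the time indices whose reading departs from the prediction.

    model: any callable mapping a regressor list to a predicted value
    (a stand-in for the trained NARX network).
    """
    faults = []
    for t in range(max(ny, nu), len(y)):
        predicted = model(narx_regressor(y, u, t, ny, nu))
        if abs(y[t] - predicted) > threshold:
            faults.append(t)
    return faults
```

For a quick check, `model` can be as simple as `lambda x: sum(x) / len(x)`; a trained network would replace it without changing the surrounding logic.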

Laban movement analysis to classify emotions from motion

Swati Dewan, Shubham Agarwal, Navjyoti Singh

In this paper, we present a study of Laban Movement Analysis (LMA) for understanding basic human emotions from nonverbal human behaviors. While there are many studies on understanding behavioral patterns based on natural language processing and speech processing applications, understanding emotions or behavior from nonverbal human motion is still a very challenging and unexplored field. LMA provides a rich overview of the scope of movement possibilities. These basic elements can be used for generating movement or for describing movement. They provide an inroad to understanding movement and to developing movement efficiency and expressiveness. Each human being combines these movement factors in his/her own unique way and organizes them to create phrases and relationships which reveal personal, artistic, or cultural style. In this work, we build a motion descriptor based on a deep understanding of Laban theory. The proposed descriptor builds on previous works and encodes experiential features by using temporal windows. We present a more conceptually elaborate formulation of Laban theory and test it in a relatively new domain of behavioral research with applications in human-machine interaction. The recognition of affective human communication may be used to provide developers with a rich source of information for creating systems that are capable of interacting well with humans. We test our algorithm on the UCLIC dataset, which consists of body motions of 13 non-professional actors portraying anger, fear, happiness and sadness. We achieve an accuracy of 87.30% on this dataset.

A new centrality measure for identifying influential nodes in social networks

Delel Rhouma, Lotfi Ben Romdhane

The identification of central nodes has been a key problem in the field of social network analysis. Centrality is a measure that accounts for the popularity or visibility of an actor within a network. To capture this concept, various measures, either simple or more elaborate, have been developed. Nevertheless, many "traditional" measures are not designed to be applicable to huge data. This paper sets out a new node centrality index suitable for large social networks. It uses the number of neighbors of a node and the connections between them to characterize a "pivot" node in the graph. We present experimental results on real data sets which show the efficiency of our proposal.
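
The two ingredients named in the abstract, the number of neighbors of a node and the connections between those neighbors, can be computed locally as in the sketch below. The abstract does not state how the index combines them, so the function only returns the two counts; the adjacency-set representation and the name `neighborhood_stats` are assumptions.

```python
def neighborhood_stats(adjacency, node):
    """Return (number of neighbors, number of edges among those neighbors).

    adjacency: dict mapping each node to the set of its neighbors
    (undirected graph, comparable node labels).
    """
    neighbors = adjacency[node]
    # Count each neighbor-to-neighbor edge once by ordering the pair.
    links = sum(1 for a in neighbors for b in neighbors
                if a < b and b in adjacency[a])
    return len(neighbors), links
```

Both counts depend only on a node's immediate neighborhood, which is the property that makes this kind of index practical on large graphs compared with measures that need all shortest paths.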