Proceedings Volume 10341

Ninth International Conference on Machine Vision (ICMV 2016)

Antanas Verikas, Petia Radeva, Dmitry P. Nikolaev, et al.

Volume Details

Date Published: 16 August 2017
Contents: 10 Sessions, 94 Papers, 0 Presentations
Conference: Ninth International Conference on Machine Vision 2016
Volume Number: 10341

Table of Contents

  • Front Matter: Volume 10341
  • Target Detection
  • Pattern Recognition
  • Video Processing and Visualization
  • Image Transformation
  • Image Analysis
  • Robot and Machine Vision
  • Medical Image Processing
  • Image Processing and Applications
  • Computer Information Technology and Applications
Front Matter: Volume 10341
This PDF file contains the front matter associated with SPIE Proceedings Volume 10341, including the Title Page, Copyright information, Table of Contents, Introduction (if any), and Conference Committee listing.
Target Detection
SOFF: Scalable and oriented FAST-based local features
Noura Bouhlel, Anis Ben Ammar, Amel Ksibi, et al.
Local feature detection is a fundamental module in several mobile vision applications such as mobile object recognition and mobile visual search. The effectiveness and efficiency of a local feature detector determine to what extent it is suitable for a mobile application. Over the past decades, several local feature detectors have been developed. In this paper, we are interested in the FAST (Features from Accelerated Segment Test) local feature detector because of its efficiency. However, the FAST detector shows poor robustness against both scale and rotation changes. Therefore, we aim at enhancing FAST's robustness against both scale and rotation changes while maintaining good efficiency. To this end, we propose a Scalable and Oriented FAST-based local Feature detector (SOFF). A comprehensive comparison against the FAST detector and its variants is performed on benchmark datasets. Experimental results demonstrate that the SOFF detector outperforms other FAST-based detectors in many cases. Furthermore, it is efficient to compute and thereby suitable for mobile vision applications.
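As an illustration only (not the authors' SOFF implementation), scale- and rotation-aware FAST detection can be sketched with OpenCV building blocks: FAST applied over an image pyramid, with an ORB-style intensity-centroid orientation per keypoint. All parameter values below are assumptions.

```python
import cv2
import numpy as np

def detect_oriented_fast(gray, n_levels=4, scale=0.75, fast_threshold=20, patch=15):
    """Detect FAST corners over a pyramid and attach an intensity-centroid angle."""
    fast = cv2.FastFeatureDetector_create(threshold=fast_threshold)
    keypoints, img, level_scale = [], gray.copy(), 1.0
    for level in range(n_levels):
        for kp in fast.detect(img, None):
            x, y, r = int(kp.pt[0]), int(kp.pt[1]), patch // 2
            if x - r < 0 or y - r < 0 or x + r >= img.shape[1] or y + r >= img.shape[0]:
                continue
            # Orientation from the intensity centroid of the patch around the corner.
            m = cv2.moments(img[y - r:y + r + 1, x - r:x + r + 1].astype(np.float32))
            kp.angle = float(np.degrees(np.arctan2(m["m01"], m["m10"])))
            # Map coordinates back to the original image resolution.
            kp.pt = (kp.pt[0] / level_scale, kp.pt[1] / level_scale)
            kp.octave = level
            keypoints.append(kp)
        img = cv2.resize(img, None, fx=scale, fy=scale, interpolation=cv2.INTER_AREA)
        level_scale *= scale
    return keypoints
```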
Automatic construction of a recurrent neural network based classifier for vehicle passage detection
Evgeny Burnaev, Ivan Koptelov, German Novikov, et al.
Recurrent Neural Networks (RNNs) are extensively used for time-series modeling and prediction. We propose an approach for automatic construction of a binary classifier based on Long Short-Term Memory RNNs (LSTM-RNNs) for detection of a vehicle passage through a checkpoint. As an input to the classifier we use multidimensional signals of various sensors that are installed on the checkpoint. Obtained results demonstrate that the previous approach to handcrafting a classifier, consisting of a set of deterministic rules, can be successfully replaced by an automatic RNN training on an appropriately labelled data.
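A minimal sketch of such a binary LSTM classifier in Keras, assuming windows of T time steps over N sensor channels with a 0/1 "vehicle passage" label per window; the paper's actual architecture and hyper-parameters are not reproduced here.

```python
import numpy as np
from tensorflow.keras import layers, models

T, N = 100, 8  # assumed window length and number of sensor channels
model = models.Sequential([
    layers.Input(shape=(T, N)),
    layers.LSTM(64),                       # summarises the multidimensional sensor window
    layers.Dense(1, activation="sigmoid"), # probability of a vehicle passage
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# x: (num_windows, T, N) sensor signals, y: (num_windows,) binary labels (placeholder data).
x, y = np.random.randn(256, T, N), np.random.randint(0, 2, 256)
model.fit(x, y, epochs=5, batch_size=32, validation_split=0.2)
```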
Moving object classification in infrared and visible spectra
This paper introduces a novel method for moving object classification in the infrared and visible spectra. The method is based on a data-mining process combining a set of best features based on shape, texture and motion. The proposed method relies either on the visible spectrum or on the infrared spectrum according to weather conditions (sunny days, rain, fog, snow, etc.) and the timing of the video acquisition. Experimental studies are carried out to prove the efficiency of our predictive models in classifying moving objects and the originality of our process with intelligent fusion of the VIS-IR spectra.
Human fall detection based on block matching and silhouette area
Mariem Gnouma, Ridha Ejbali, Mourad Zaied
Currently, there are several fall detection systems based on video analysis. However, these systems have not yet reached the desired level of appropriateness and robustness. To reduce the risk of falling in insecure environments, a new method is developed in this paper to detect and predict human falls. In this approach, we adopt a block matching motion estimation algorithm based on acceleration and changes of the human body silhouette area, which are obtained from a single surveillance camera. We also present an algorithm to accelerate the fall detection system based on a local adjustment of the velocity field.
Human motion detection and tracking
Neziha Jaouedi, Soumaya Zaghbani, Noureddine Boujnah, et al.
Human behavior analysis in a video scene is based on motion feature extraction and recognition. Much information is hidden behind gestures, sudden motion and walking, and many research works have tried to model and then recognize human behavior through motion analysis. In our work we focus on motion extraction and tracking with some enhancements compared to previous approaches. The first phase of behavior analysis is the extraction of the regions of the video that are in motion. The second phase is devoted to motion tracking using a Kalman filter. Performance evaluation is based on both visual quality and the numerical values of the MSE (Mean Squared Error).
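For illustration, the tracking phase can be sketched with OpenCV's constant-velocity Kalman filter, where the measurement is the centroid of the detected moving region; the matrix values are standard textbook choices, not the authors'.

```python
import cv2
import numpy as np

kf = cv2.KalmanFilter(4, 2)  # state: [x, y, vx, vy], measurement: [x, y]
kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                [0, 1, 0, 1],
                                [0, 0, 1, 0],
                                [0, 0, 0, 1]], np.float32)
kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                 [0, 1, 0, 0]], np.float32)
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2
kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1

def track_step(measured_centroid):
    """Predict the next position, then correct with the measured region centroid."""
    prediction = kf.predict()
    kf.correct(np.array(measured_centroid, np.float32).reshape(2, 1))
    return prediction[:2].ravel()
```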
Very deep recurrent convolutional neural network for object recognition
Sourour Brahimi, Najib Ben Aoun, Chokri Ben Amar
In recent years, computer vision has become a very active field. This field includes methods for processing, analyzing, and understanding images. The most challenging problems in computer vision are image classification and object recognition. This paper presents a new approach for the object recognition task. This approach exploits the success of the Very Deep Convolutional Neural Network for object recognition. In fact, it improves the convolutional layers by adding recurrent connections. The proposed approach was evaluated on two object recognition benchmarks: Pascal VOC 2007 and CIFAR-10. The experimental results prove the efficiency of our method in comparison with state-of-the-art methods.
Fast grasping of unknown objects using cylinder searching on a single point cloud
Qujiang Lei, Martijn Wisse
Grasping of unknown objects, with neither appearance data nor object models given in advance, is very important for robots that work in an unfamiliar environment. The goal of this paper is to quickly synthesize an executable grasp for one unknown object by using cylinder searching on a single point cloud. Specifically, a 3D camera is first used to obtain a partial point cloud of the target unknown object. An original method is then employed to post-process the partial point cloud to minimize the uncertainty which may lead to grasp failure. In order to accelerate the grasp search, the surface normal of the target object is then used to constrain the synthesis of the cylinder grasp candidates. Operability analysis is then used to select all executable grasp candidates, followed by force balance optimization to choose the most reliable grasp as the final grasp execution. In order to verify the effectiveness of our algorithm, simulations on a Universal Robots UR5 arm and an under-actuated Lacquey Fetch gripper are used to examine its performance, and successful results are obtained.
Data-driven approach to human motion modeling with Lua and gesture description language
Tomasz Hachaj, Katarzyna Koptyra, Marek R. Ogiela
The aim of this paper is to present a novel proposition of a human motion modelling and recognition approach that enables real-time MoCap signal evaluation. By motion (action) recognition we mean classification. The role of this approach is to propose a syntactic description procedure that can be easily understood, learnt and used in various motion modelling and recognition tasks in all MoCap systems, no matter whether they are vision or wearable-sensor based. To do so we have prepared an extension of the Gesture Description Language (GDL) methodology that enables movement description and real-time recognition so that it can use not only positional coordinates of body joints but virtually any type of discretely measured MoCap output signal, such as accelerometer, magnetometer or gyroscope data. We have also prepared and evaluated a cross-platform implementation of this approach using the Lua scripting language and Java technology. This implementation is called Data Driven GDL (DD-GDL). In the tested scenarios the average execution speed is above 100 frames per second, which matches the acquisition rate of many popular MoCap solutions.
Pattern Recognition
Speaker gender identification based on majority vote classifiers
Eya Mezghani, Maha Charfeddine, Henri Nicolas, et al.
Speaker gender identification is considered among the most important tools in several multimedia applications, namely in automatic speech recognition, interactive voice response systems and audio browsing systems. Gender identification system performance is closely linked to the selected feature set and the employed classification model. Typical techniques are based on selecting the best performing classification method or searching for the optimum tuning of one classifier's parameters through experimentation. In this paper, we consider a relevant and rich set of features involving pitch, MFCCs as well as other temporal and frequency-domain descriptors. Five classification models, including decision tree, discriminant analysis, naïve Bayes, support vector machine and k-nearest neighbor, were evaluated. The three best performing classifiers among the five contribute by majority voting on their scores. Experiments were performed on three datasets spoken in three languages, English, German and Arabic, in order to validate the language independency of the proposed scheme. Results confirm that the presented system reaches a satisfying accuracy rate and promising classification performance thanks to the discriminating abilities and diversity of the used features combined with mid-level statistics.
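A hedged sketch of the selection-plus-voting idea with scikit-learn: evaluate the five classifiers by cross-validation, keep the three best, and combine them by hard (majority) voting. The acoustic features (pitch, MFCCs, ...) are assumed to be precomputed into X; all parameters are illustrative, not the paper's settings.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score

candidates = {
    "tree": DecisionTreeClassifier(),
    "lda": LinearDiscriminantAnalysis(),
    "nb": GaussianNB(),
    "svm": SVC(),
    "knn": KNeighborsClassifier(),
}

def build_majority_voter(X, y):
    # Rank the five models by cross-validated accuracy and keep the best three.
    scores = {name: cross_val_score(clf, X, y, cv=5).mean()
              for name, clf in candidates.items()}
    best3 = sorted(scores, key=scores.get, reverse=True)[:3]
    voter = VotingClassifier([(n, candidates[n]) for n in best3], voting="hard")
    return voter.fit(X, y)
```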
Deep neural network features for horses identity recognition using multiview horses’ face pattern
Islem Jarraya, Wael Ouarda, Adel M. Alimi
To monitor the state of horses in the barn, breeders need a monitoring system with a surveillance camera that can identify and distinguish between horses. We proposed in [5] a method of horse identification at a distance using the frontal facial biometric modality. Due to changes of view, face recognition becomes more difficult. In this paper, the number of images used in our THoDBRL'2015 database (Tunisian Horses DataBase of Regim Lab) is augmented by adding images of other views. Thus, we use front, right-profile and left-profile face views. Moreover, we suggest an approach for multiview face recognition. First, we propose to use the Gabor filter for face characterization. Next, due to the augmented number of images and the large number of Gabor features, we propose to use a Deep Neural Network with an auto-encoder to obtain the most pertinent features and to reduce the size of the feature vector. Finally, we evaluate the proposed approach on our THoDBRL'2015 database and use a linear SVM for classification.
Predicting human activities in sequences of actions in RGB-D videos
David Jardim, Luís Nunes, Miguel Dias
In our daily activities we perform prediction or anticipation when interacting with other humans or with objects. Prediction of human activity made by computers has several potential applications: surveillance systems, human-computer interfaces, sports video analysis, human-robot collaboration, games and health-care. We propose a system capable of recognizing and predicting human actions using supervised classifiers trained with automatically labeled data, evaluated on our human activity RGB-D dataset (recorded with a Kinect sensor) and using only the positions of the main skeleton joints to extract features. Using conditional random fields (CRFs) to model the sequential nature of actions in a sequence has been done before, but where other approaches try to predict an outcome or anticipate ahead in time (seconds), we try to predict what the next action of a subject will be. Our results show an activity prediction accuracy of 89.9% using an automatically labeled dataset.
Interacting with mobile devices by fusion eye and hand gestures recognition systems based on decision tree approach
Hanene Elleuch, Ali Wali, Anis Samet, et al.
Two systems of eye and hand gesture recognition are used to control mobile devices. Based on real-time video streaming captured from the device's camera, the first system recognizes the motion of the user's eyes and the second detects static hand gestures. To avoid any confusion between natural and intentional movements, we developed a system to fuse the decisions coming from the eye and hand gesture recognition systems. The fusion phase is based on a decision tree approach. We conducted a study on 5 volunteers and the results show that our system is robust and competitive.
Towards human behavior recognition based on spatio temporal features and support vector machines
Sawsen Ghabri, Wael Ouarda, Adel M. Alimi
Security and surveillance are vital issues in today's world. The recent acts of terrorism have highlighted the urgent need for efficient surveillance. There is indeed a need for an automated video surveillance system which can detect the identity and activity of a person. In this article, we propose a new paradigm to recognize an aggressive human behavior such as a boxing action. Our proposed system for human activity detection includes the use of a fusion between Spatio-Temporal Interest Point (STIP) and Histogram of Oriented Gradients (HoG) features. The novel feature is called Spatio-Temporal Histogram of Oriented Gradients (STHOG). To evaluate the robustness of our proposed paradigm, with a local application of the HoG technique on STIP points, we made experiments on the KTH human action dataset based on multi-class Support Vector Machine classification. The proposed scheme outperforms basic descriptors like HoG and STIP, achieving a classification accuracy of 82.26%.
Android application for handwriting segmentation using PerTOHS theory
Hanen Akouaydi, Sourour Njah, Adel M. Alimi
The paper handles the problem of handwriting segmentation on mobile devices. Many applications have been developed to facilitate the recognition of handwriting, to move beyond the limited number of keys on keyboards, and to introduce a drawing space for writing instead of using keyboards. In this paper, we present a mobile approach to handwriting segmentation that uses PerTOHS theory, the Perceptual Theory of On-line Handwriting Segmentation, where handwriting is defined as a sequence of elementary and perceptual codes. In fact, the theory analyzes the written script and tries to learn the visual handwriting code features in order to generate new ones via the generated perceptual sequences. To get this classification we apply the Beta-elliptic model, a fuzzy detector and also genetic algorithms in order to obtain the EPCs (Elementary Perceptual Codes) and GPCs (Global Perceptual Codes) that compose the script. Finally, we present our Android application M-PerTOHS for handwriting segmentation.
Off-lexicon online Arabic handwriting recognition using neural network
Hamdi Yahia, Aymen Chaabouni, Houcine Boubaker, et al.
This paper highlights a new method for online Arabic handwriting recognition based on grapheme segmentation. The main contribution of our work is to explore the utility of the Beta-elliptic model in segmentation and feature extraction for online handwriting recognition. Indeed, our method consists in decomposing the input signal into continuous parts called graphemes based on the Beta-elliptic model, and classifying them according to their position in the pseudo-word. The segmented graphemes are then described by a combination of geometric features and trajectory shape modeling. The efficiency of the considered features has been evaluated using a feed-forward neural network classifier. Experimental results on the ADAB benchmark database show the performance of the proposed method.
Real-time hand gesture recognition based on feature points extraction
Soumaya Zaghbani, Neziha Jaouedi, Noureddine Boujnah, et al.
Tracking moving objects is an increasingly active area in the computer vision field. It plays a very important role in human-computer interaction. In this context we have developed a hand tracking and gesture recognition system that allows interaction with the machine in an intuitive and natural way. To ensure tracking, we apply the Kalman filter and detect the optimal points of the hand in order to determine the gesture expressed by the user.
Methodology for mammal classification in camera trap images
Luis Pulido Castelblanco, Claudia Isaza Narváez, Angélica Díaz Pulido
The use of camera traps in animal ecology studies has increased because it facilitates the work of biologists and allows them to obtain information that would otherwise be impossible to collect. A large number of photographs are captured with this wildlife photography technique, making their subsequent analysis difficult. This paper presents a method to automatically identify the images with at least one animal and to classify them between birds and mammals. In this work a fuzzy classifier and a matched filter were used to identify the images with animals and to segment the images. An artificial neural network was employed to classify the segments between birds and mammals. We obtained a classification accuracy of 73.1%, validating the model over real camera trap sessions. The database includes several difficulties, such as constant changes in the scene due to climatic factors or animals partially occluded by the environment. This method was implemented in software that is currently used at the Alexander von Humboldt Biological Resources Research Institute for studies of biodiversity in Colombia.
Convolutional neural networks with balanced batches for facial expressions recognition
Elena Battini Sönmez, Angelo Cangelosi
This paper considers the issue of fully automatic emotion classification on 2D faces. In spite of the great effort made in recent years, traditional machine learning approaches based on hand-crafted feature extraction followed by a classification stage have failed to produce a real-time automatic facial expression recognition system. The proposed architecture uses Convolutional Neural Networks (CNN), which are built as a collection of interconnected processing elements to simulate the human brain. The basic idea of CNNs is to learn a hierarchical representation of the input data, which results in better classification performance. In this work we present a block-based CNN algorithm, which uses noise as a data augmentation technique and builds batches with a balanced number of samples per class. The proposed architecture is a very simple yet powerful CNN, which can yield state-of-the-art accuracy on the very competitive Extended Cohn-Kanade database benchmark.
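The two training-data ideas mentioned above can be sketched as a batch generator that samples an equal number of images per class and adds Gaussian noise as augmentation; batch size, noise level and array layout are assumptions, not the paper's settings.

```python
import numpy as np

def balanced_noisy_batches(images, labels, per_class=8, noise_sigma=0.05, rng=None):
    """Yield batches with `per_class` samples of every class, perturbed by Gaussian noise."""
    rng = rng or np.random.default_rng()
    classes = np.unique(labels)
    by_class = {c: np.flatnonzero(labels == c) for c in classes}
    while True:
        idx = np.concatenate([rng.choice(by_class[c], per_class) for c in classes])
        rng.shuffle(idx)
        batch = images[idx].astype(np.float32)
        batch += rng.normal(0.0, noise_sigma, batch.shape)  # noise as data augmentation
        yield batch, labels[idx]
```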
Reading recognition of pointer meter based on pattern recognition and dynamic three-points on a line
Yongqiang Zhang, Mingli Ding, Wuyifang Fu, et al.
Pointer meters are frequently applied in industrial production because they are directly readable. They should be calibrated regularly to ensure the precision of the readings. Currently, manual calibration is most frequently adopted to accomplish the verification of a pointer meter, and the required professional skills and subjective judgment may lead to large measurement errors, poor reliability and low efficiency. In the past decades, with the development of computer technology, machine vision and digital image processing techniques have been applied to recognize the reading of the dial instrument. In the existing recognition methods, all the parameters of the dial instruments are supposed to be the same, which is not the case in practice. In this work, the recognition of a pointer meter reading is regarded as a pattern recognition problem. We obtain the features of a small area around the detected point, treat those features as a pattern, divide the certified images based on the Gradient Pyramid algorithm, train a classifier with a support vector machine (SVM) and complete the pattern matching of the divided images. Then we obtain the reading of the pointer meter precisely under the dynamic three-points-on-a-line (DTPML) principle, which eliminates the error caused by tiny differences between the panels. Eventually, the experimental results prove that the method proposed in this work is superior to state-of-the-art works.
Recognizing online Arabic handwritten characters using a deep architecture
Najiba Tagougui, Monji Kherallah
Recognizing online Arabic handwritten script has been gaining more interest because of the impressive advances in mobile devices, which require more and more intelligent handwriting recognizers. Since many previous studies have demonstrated that Deep Neural Networks (DNN) exhibit great performance, we propose in this work a new system based on a DNN in which we try to optimize the training process by a smooth construction of the deep architecture. The output error of each unit in the previous layer is computed and only the smallest error is maintained in the next iteration. This paper uses the LMCA database for the training and testing data. The experimental study reveals that our proposed DBNN using generated bottleneck features can outperform state-of-the-art online recognizers.
CS-BoW: a scalable parallel image recognition method
In this paper, a scalable parallel image recognition method, CS-BoW (Class-Specific Bag of Words), is proposed. CS-BoW builds submodels for each class using weighted BoW in the training phase. In the test phase, CS-BoW calculates and sorts the coding residual of each test image for every class, and each test image is then assigned to the class with the smallest coding residual. The proposed CS-BoW is easy to scale: to extend the training dataset with a new class, one only needs to build a submodel for the new class, with no extra calculation for the existing data. Since CS-BoW calculates the coding residual of each test image for every class, it is convenient to obtain Top-N accuracy, while many other image recognition methods can only report Top-1 accuracy. The experimental results show that CS-BoW achieves comparable accuracy in a very short time when run in parallel; it can be as fast as 0.05 s per image on Caltech 101 and Caltech 256.
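A rough sketch of the class-specific idea (not the paper's weighted BoW): one k-means codebook "submodel" per class, with classification by the smallest total coding residual of a test image's local descriptors. Descriptor extraction and the weighting scheme are omitted; the codebook size is an assumed value.

```python
import numpy as np
from sklearn.cluster import KMeans

def train_submodels(descriptors_per_class, codebook_size=64):
    """descriptors_per_class: {class_label: (n_descriptors, dim) array}."""
    return {c: KMeans(n_clusters=codebook_size, n_init=4).fit(d)
            for c, d in descriptors_per_class.items()}

def classify(test_descriptors, submodels):
    """Assign the image to the class whose codebook yields the smallest residual."""
    residuals = {}
    for c, km in submodels.items():
        # Distance of each descriptor to its nearest codeword, summed over the image.
        residuals[c] = np.min(km.transform(test_descriptors), axis=1).sum()
    return min(residuals, key=residuals.get)

# Scaling to a new class only requires training one more submodel:
# submodels[new_class] = KMeans(n_clusters=64, n_init=4).fit(new_class_descriptors)
```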
Initial proposition of kinematics model for selected karate actions analysis
Tomasz Hachaj, Katarzyna Koptyra, Marek R. Ogiela
The motivation for this paper is to initially propose and evaluate two new kinematics models developed to describe motion capture (MoCap) data of karate techniques. We developed this novel proposition to create a model that is capable of handling action description both from multimedia and professional MoCap hardware. For the evaluation we used 25-joint data of karate technique recordings acquired with a Kinect version 2. It consists of MoCap recordings of two professional sport (black belt) instructors and masters of Oyama Karate. We selected the following actions for the initial analysis: left-handed furi-uchi punch, right-leg hiza-geri kick, right-leg yoko-geri kick and left-handed jodan-uke block. Based on our evaluation, we can conclude that both proposed kinematics models seem to be a convenient method for karate action description. Of the two proposed variable models, the global one seems more useful for further work. We think so because, in the case of the considered punches, its variables seem to be less correlated and may also be easier to interpret thanks to a single reference coordinate system. Principal component analysis also proved to be a reliable way to examine the quality of the kinematics models, and a plot of the variables in principal component space nicely presents the dependences between variables.
Trigram-based algorithms for OCR result correction
Konstantin Bulatov, Temudzhin Manzhikov, Oleg Slavin, et al.
In this paper we consider the task of improving optical character recognition (OCR) results for document fields on low-quality and average-quality images using N-gram models. Cyrillic fields of the Russian Federation internal passport are analyzed as an example. Two approaches are presented: the first is based on the hypothesis that a symbol depends on its two adjacent symbols, and the second is based on the calculation of marginal distributions and Bayesian network computation. A comparison of the algorithms and experimental results within a real document OCR system are presented; it is shown that the document field OCR accuracy can be improved by more than 6% for low-quality images.
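A toy sketch of the first approach (a symbol depends on its two neighbours): rescoring per-position character alternatives from the OCR engine with a character trigram model. Confidences and trigram probabilities are assumed to be log-probabilities; the paper's exact scoring is not reproduced here.

```python
from itertools import product

def correct_field(hypotheses, trigram_logprob):
    """hypotheses: list of dicts {char: OCR log-confidence}, one per position.
    trigram_logprob: function (a, b, c) -> log P(c | a, b), with '^' as padding."""
    best_text, best_score = None, float("-inf")
    # Exhaustive search over alternatives; acceptable for short document fields.
    for chars in product(*[list(h) for h in hypotheses]):
        score = 0.0
        for i, c in enumerate(chars):
            a = chars[i - 2] if i >= 2 else "^"
            b = chars[i - 1] if i >= 1 else "^"
            score += hypotheses[i][c] + trigram_logprob(a, b, c)
        if score > best_score:
            best_text, best_score = "".join(chars), score
    return best_text
```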
Slant rectification in Russian passport OCR system using fast Hough transform
Elena Limonova, Pavel Bezmaternykh, Dmitry Nikolaev, et al.
In this paper, we introduce a slant detection method based on the Fast Hough Transform and demonstrate its application in an industrial system for Russian passport recognition. About 1.5% of these documents appear to be slanted or italic. This fact reduces the recognition rate, because optical recognition systems are normally designed to process normal fonts. Our method uses the Fast Hough Transform to analyse the vertical strokes of characters extracted with the help of the x-derivative of a text line image. To improve the quality of the detector we also introduce field grouping rules. The resulting algorithm reaches high detection quality. Almost all errors of the considered approach occur on passports with non-standard fonts, while the slant detector works appropriately.
Fast integer approximations in convolutional neural networks using layer-by-layer training
Dmitry Ilin, Elena Limonova, Vladimir Arlazarov, et al.
This paper explores a layer-by-layer training method for neural networks that use approximate calculations and/or low-precision data types. The proposed method allows recognition accuracy to be improved using standard training algorithms and tools. At the same time, it allows neural network calculations to be sped up using fast approximate calculations and compact data types. We consider 8-bit fixed-point arithmetic as an example of such an approximation for image recognition problems. In the end, we show a significant accuracy increase for the considered approximation along with a processing speedup.
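A toy illustration of the 8-bit fixed-point approximation mentioned above (the paper's layer-by-layer training scheme is not shown): weights and activations are stored as int8 with a scale factor, and the dot product is accumulated in integers.

```python
import numpy as np

def quantize_int8(w):
    """Return (int8 values, scale) such that w ≈ q * scale."""
    scale = float(np.abs(w).max()) / 127.0 or 1.0  # fall back to 1.0 for all-zero input
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_linear(x_q, x_scale, w_q, w_scale):
    """Integer matrix product with rescaling back to floating point."""
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32).T  # int32 accumulator
    return acc.astype(np.float32) * (x_scale * w_scale)
```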
Video Processing and Visualization
Real-time mobile phone dialing system based on SSVEP
Dongsheng Wang, Toshiki Kobayashi, Gaochao Cui, et al.
Brain-computer interface (BCI) systems based on the steady-state visual evoked potential (SSVEP) provide higher information transfer rates and require shorter training time than BCI systems using other brain signals. They have been widely used in brain science, rehabilitation engineering, biomedical engineering and intelligent information processing. In this paper, we present a real-time mobile phone dialing system based on SSVEP; it is more portable than other dialing systems because the flashing dial interface is set on a small tablet. With this online BCI system, we can identify the specific frequency representing a number using the canonical correlation analysis (CCA) method and dial out successfully without any physical movement such as finger tapping. This phone dialing system is promising for helping disabled patients improve their quality of life.
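A sketch of the standard CCA frequency-identification step for SSVEP (not the authors' full online system): correlate a multi-channel EEG segment with sine/cosine references at each candidate stimulation frequency and pick the frequency with the largest canonical correlation. Sampling rate, harmonics and frequencies are illustrative values.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def ssvep_frequency(eeg, fs=250.0, freqs=(8.0, 9.0, 10.0, 11.0), n_harmonics=2):
    """eeg: (n_samples, n_channels) segment; returns the detected stimulation frequency."""
    t = np.arange(eeg.shape[0]) / fs
    best_f, best_r = None, -1.0
    for f in freqs:
        # Reference matrix: sines and cosines of the frequency and its harmonics.
        ref = np.column_stack([fn(2 * np.pi * f * k * t)
                               for k in range(1, n_harmonics + 1)
                               for fn in (np.sin, np.cos)])
        cca = CCA(n_components=1).fit(eeg, ref)
        u, v = cca.transform(eeg, ref)
        r = np.corrcoef(u[:, 0], v[:, 0])[0, 1]
        if r > best_r:
            best_f, best_r = f, r
    return best_f
```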
Local visual similarity descriptor for describing local region
Many works have been devoted to exploring local region information, including both the information of the local features in a local region and their spatial relationships, but none of them provides a compact representation of this information. To achieve this, we propose a new approach named Local Visual Similarity (LVS). LVS first calculates the similarities among the local features in a local region and then forms these similarities into a single vector named the LVS descriptor. In our experiments, we show that the LVS descriptor can preserve local region information with low dimensionality. In addition, experimental results on two public datasets demonstrate the effectiveness of the LVS descriptor.
Visual attention in egocentric field-of-view using RGB-D data
Veronika Olesova, Wanda Benesova, Patrik Polatsek
Most of the existing solutions for predicting visual attention focus solely on 2D images and disregard any depth information. This aspect has always been a weak point, since depth is an inseparable part of biological vision. This paper presents a novel method of saliency map generation based on the results of our experiments with egocentric visual attention and an investigation of its correlation with perceived depth. We propose a model to predict attention using a superpixel representation, with the assumption that contrast objects are usually salient and have a sparser spatial distribution of superpixels than their background. To incorporate depth information into this model, we propose three different depth techniques. The evaluation is done on our new RGB-D dataset created with SMI eye-tracking glasses and a Kinect v2 device.
Temporal and spatial information extraction from videos based on the change in length of the shadow
Jiayun Wang, Jian Zu, Likang Wang
In this paper, taking the atmospheric refractive index into account, we present an approach to extract the recording date and geo-location from videos, based on the change of the shadow length. The paper carefully takes different information about the given video (photographed date, the length of the selected object) into consideration and forms a comprehensive approach to extract its temporal and spatial information. On the basis of this approach, we analyze the shadow length data of a chosen object from a real video and extract the temporal and spatial information of the video. Compared with the actual information, the error is less than 1%, which proves the validity of our approach.
Video denoising using low rank tensor decomposition
Lihua Gui, Gaochao Cui, Qibin Zhao, et al.
Reducing noise in a video sequence is of vital importance in many real-world applications. One popular method is block-matching collaborative filtering. However, the main drawback of this method is that the noise standard deviation for the whole video sequence must be known in advance. In this paper, we present a tensor-based denoising framework that considers 3D patches instead of 2D patches. By collecting similar 3D patches non-locally, we employ low-rank tensor decomposition for collaborative filtering. Since we specify a non-informative prior over the noise precision parameter, the noise variance can be inferred automatically from the observed video data. Therefore, our method is more practical, as it does not require knowing the noise variance. The experiments on video denoising demonstrate the effectiveness of our proposed method.
Forward rectification: spatial image normalization for a video from a forward facing vehicle camera
Viktor Prun, Dmitri Polevoy, Vassiliy Postnikov
The work in this paper is focused on visual ADAS (Advanced Driver Assistance Systems). We introduce forward rectification, a technique for making computer vision algorithms more robust against changes in camera mount point and mount angles. Using the technique can increase recognition quality as well as lower the dimensionality of the required algorithm invariance, making it possible to apply simpler affine-invariant algorithms for applications that require projective invariance. To provide useful results, this rectification requires thorough calibration of the camera, which can be done automatically or semi-automatically. The technique is of a general nature and can be applied to different algorithms, such as pattern matching detectors and convolutional neural networks. The applicability of the technique is demonstrated on the detection rate of a HOG-based car detector.
A study of vignetting correction methods in camera colorimetric calibration
The procedure of colorimetric calibration of a camera ensures accurate and repeatable acquisition of scene colors. The most common approach defines the calibration only as the color transformation between the camera image colors and the colorimetric colors. The only condition for image acquisition is a uniform illumination of the scene. Unfortunately, such an assumption does not account for many distortions caused by image acquisition. One of them is vignetting, which can be described as a decrease of light intensity from the image center to the image corners. This phenomenon causes the same effects as non-uniform illumination, that is, a change of the color values of the same object depending on the image coordinates. This paper is an attempt to analyze the influence of vignetting correction on the results of camera colorimetric calibration. The experiment conducted under uniform light conditions shows that the improvement of calibration quality depends on the chosen vignetting correction method.
Snapscreen: TV-stream frame search with projectively distorted and noisy query
Natalya Skoryukina, Timofey Chernov, Konstantin Bulatov, et al.
In this work we describe an approach to real-time image search in large databases that is robust to a variety of query distortions such as lighting alterations, projective distortions or digital noise. The approach is based on the extraction of keypoints and their descriptors, random hierarchical clustering trees for the preliminary search, and RANSAC for refining the search and scoring the results. The algorithm is implemented in the Snapscreen system, which allows determining a TV channel and a TV show from a picture acquired with a mobile device. The implementation is enhanced using a preceding localization of the screen region. Results on real-world data with different modifications of the system are presented.
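A hedged sketch of this kind of retrieval pipeline with OpenCV primitives: binary keypoint descriptors, approximate matching with FLANN (LSH trees), and the RANSAC inlier count as the candidate score. The production system's descriptors, tree parameters and screen-localization step are not reproduced here.

```python
import cv2
import numpy as np

orb = cv2.ORB_create(nfeatures=1000)
flann = cv2.FlannBasedMatcher(
    dict(algorithm=6, table_number=6, key_size=12, multi_probe_level=1),  # LSH index
    dict(checks=32))

def score_candidate(query_img, candidate_img):
    """Return the number of RANSAC inliers between the query frame and a candidate frame."""
    kq, dq = orb.detectAndCompute(query_img, None)
    kc, dc = orb.detectAndCompute(candidate_img, None)
    if dq is None or dc is None:
        return 0
    matches = [m for pair in flann.knnMatch(dq, dc, k=2) if len(pair) == 2
               for m, n in [pair] if m.distance < 0.75 * n.distance]  # ratio test
    if len(matches) < 4:
        return 0
    src = np.float32([kq[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kc[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    _, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return 0 if mask is None else int(mask.sum())
```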
Image Transformation
Comparative study of feature selection with ensemble learning using SOM variants
Ameni Filali, Chiraz Jlassi, Najet Arous
Ensemble learning has succeeded in improving stability and clustering accuracy, but its runtime prohibits it from scaling up to real-world applications. This study deals with the problem of selecting a subset of the most pertinent features for every cluster of a dataset. The proposed method is another extension of the Random Forests approach, using self-organizing map (SOM) variants for unlabeled data, that estimates the out-of-bag feature importance from a set of partitions. Every partition is created using a different bootstrap sample and a random subset of the features. We then show that the internal estimates used to measure variable pertinence in Random Forests are also applicable to feature selection in unsupervised learning. This approach aims at dimensionality reduction, visualization and cluster characterization at the same time. We provide empirical results on nineteen benchmark data sets indicating that RFS can lead to significant improvement, in terms of clustering accuracy, over several state-of-the-art unsupervised methods, with a very limited subset of features. The approach shows promise for very broad domains.
Parameterized adaptive predictor for digital image compression based on the differential pulse code modulation
An adaptive nonlinear predictor is proposed for a digital image compression method based on differential pulse code modulation. For this purpose, the Graham predictor is parameterized. The proposed predictor works in different ways depending on the local image contours. A special feature is offered for estimating the contour direction and intensity in the neighborhood of the current pixel. The parameters of the proposed predictor are calculated by a rapid training procedure before the actual compression. This procedure minimizes the sum of absolute values of the prediction errors. The theoretical computational complexity of the proposed predictor is given. The considered predictors are compared on real images in computational experiments. The advantage of the proposed algorithm is demonstrated. In addition, the gain of the compression method based on differential pulse code modulation with the proposed predictor over the JPEG compression method is demonstrated.
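For illustration only, a gradient-adaptive DPCM predictor in the spirit of the Graham predictor can be sketched as follows: the prediction switches between the left and upper neighbours depending on an estimate of the local contour direction. The parameterization and training procedure of the paper are not shown; this is textbook-style pseudocode made runnable.

```python
import numpy as np

def dpcm_predict(img):
    """Return the prediction image for a grayscale array."""
    img = img.astype(np.float64)
    pred = np.zeros_like(img)
    h, w = img.shape
    for y in range(1, h):
        for x in range(1, w):
            a, b, c = img[y, x - 1], img[y - 1, x], img[y - 1, x - 1]  # left, top, top-left
            # Estimate the contour direction from neighbour differences.
            if abs(b - c) < abs(a - c):
                pred[y, x] = a  # horizontal contour: predict from the left pixel
            else:
                pred[y, x] = b  # vertical contour: predict from the upper pixel
    return pred

# DPCM actually encodes the prediction errors (residuals):
# residual = img.astype(np.float64) - dpcm_predict(img)
```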
A hybrid method of natural scene text detection using MSERs masks in HSV space color
Text detection in natural scenes holds great importance in the field of research and still remains a challenging and important task because of the varying sizes, fonts, line orientations, illumination conditions, weak characters and complex backgrounds in images. The contribution of our proposed method is to filter out complex backgrounds by combining three strategies. These are enhancing the edge candidate detection in the HSV color space, then using MSER candidate detection to get different masks applied in the HSV color space as well as in grayscale. After that, we opt for the Stroke Width Transform (SWT) and heuristic filtering. Such strategies are followed so as to maximize the capture of candidate text-pixel zones and distinguish between text boxes and the rest of the image. The non-text components are filtered out by classifying the character candidates with Support Vector Machines (SVM) using Histogram of Oriented Gradients (HOG) features. Finally, we apply bounding box localization after a word grouping stage where false positives are eliminated by the geometrical properties of text blocks. The proposed method has been evaluated on the ICDAR 2013 scene text detection competition dataset, and the encouraging experimental results demonstrate the robustness of our method.
Semi-regular remeshing based trust region spherical geometry image for 3D deformed mesh used MLWNN
Naziha Dhibi, Akram Elkefi, Wajdi Bellil, et al.
Triangular surfaces are now widely used for modeling three-dimensional objects. Since these models have very high resolution and the geometry of the mesh is often very dense, it is necessary to remesh such objects to reduce their complexity, and the mesh quality (connectivity regularity) must be improved. In this paper, we review the main semi-regular remeshing methods of the state of the art, given that semi-regular remeshing is mainly relevant for wavelet-based compression. We then present our remeshing method based on a trust-region spherical geometry image to obtain a good 3D mesh compression scheme, used to deform a 3D mesh based on the Multi-library Wavelet Neural Network (MLWNN) structure. Experimental results show that the progressive remeshing algorithm is capable of obtaining more compact representations and semi-regular objects, and yields efficient compression capabilities with a minimal set of features for a good 3D deformation scheme.
Image deblurring in video stream based on two-level image model
Arseniy Mukovozov, Dmitry Nikolaev, Elena Limonova
An iterative algorithm is proposed for blind multi-image deblurring of binary images. Binarity is the only prior restriction imposed on the image. The image formation model assumes convolution with an arbitrary kernel and addition of a constant value. A penalty functional is composed using the binarity constraint for regularization. The algorithm estimates the original image and the distortion parameters by alternately reducing the two parts of this functional. Experimental results for natural (non-synthetic) data are presented.
Automatic topics segmentation for TV news video
Mounira Hmayda, Ridha Ejbali, Mourad Zaied
Automatic identification of television programs in the TV stream is an important task for operating archives. This article proposes a new spatio-temporal approach to identify the programs in a TV stream in two main steps. First, a reference catalogue of video features of visual jingles is built. We use the features that characterize instances of the same program type to identify the different types of programs in the television stream. The role of the video features is to represent the visual invariants for each visual jingle using appropriate automatic descriptors for each television program. Second, programs in television streams are identified by examining the similarity of the video signal to the visual jingles in the catalogue. The main idea of the identification process is to compare the visual similarity of the video signal features in the television stream to the catalogue. After presenting the proposed approach, the paper reports encouraging experimental results on several streams extracted from different channels and composed of several programs.
Variational frame difference models for motion segmentation
The frame difference method is a good method for motion segmentation, but its result contains many wrong motion regions and incomplete motion objects. In this paper we combine the variational method with the frame difference method to propose two motion segmentation models; the proposed models are based on different invariance assumptions. The models can detect motion objects and make up for the inadequacy of the frame difference method by means of smoothing terms. Experimental results show that the proposed models detect motion objects better.
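For reference, the plain frame-difference baseline that the variational models above improve upon can be written in a few lines of OpenCV; the threshold and blur size here are arbitrary choices.

```python
import cv2

def frame_difference_mask(prev_gray, curr_gray, thresh=25):
    """Binary mask of pixels whose intensity changed between two consecutive frames."""
    diff = cv2.absdiff(cv2.GaussianBlur(prev_gray, (5, 5), 0),
                       cv2.GaussianBlur(curr_gray, (5, 5), 0))
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    return mask
```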
Unsupervised color texture segmentation using active contour model and oscillating information
Jinpeng Zhang, Guodong Wang, Zhenkuan Pan, et al.
It is common for textures to occur in real-world color images; moreover, textures can cause difficulties in image segmentation. To address these difficulties, we put forward a new model. In this model we only need the information of the structural and oscillating components of the real color image. The model is based on the VO model, MTV and active contour models. We use the fast Split Bregman algorithm to solve this model. The results of our model are presented in the numerical experiments.
Random forest feature selection approach for image segmentation
László Lefkovits, Szidónia Lefkovits, Simina Emerich, et al.
In the field of image segmentation, discriminative models have shown promising performance. Generally, every such model begins with the extraction of numerous features from annotated images. Most authors create their discriminative model by using many features without any selection criteria. A more reliable model can be built by using a framework that selects the variables that are important from the point of view of the classification and eliminates the unimportant ones. In this article we present a framework for feature selection and data dimensionality reduction. The methodology is built around the random forest (RF) algorithm and its variable importance evaluation. In order to deal with datasets so large as to be practically unmanageable, we propose an algorithm based on RF that reduces the dimension of the database by eliminating irrelevant features. Furthermore, this framework is applied to optimize our discriminative model for brain tumor segmentation.
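A minimal sketch of the general idea, variable-importance-based reduction with a random forest in scikit-learn; the article's own framework, thresholds and importance-evaluation details are not reproduced.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def select_important_features(X, y, keep_fraction=0.2, n_trees=200, seed=0):
    """Return indices of the most important features according to an RF."""
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=seed, n_jobs=-1)
    rf.fit(X, y)
    order = np.argsort(rf.feature_importances_)[::-1]
    keep = order[:max(1, int(keep_fraction * X.shape[1]))]
    return keep  # retrain the discriminative model on X[:, keep]
```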
Comparison of k-means related clustering methods for nuclear medicine images segmentation
Damian Borys, Pawel Bzowski, Marta Danch-Wierzchowska, et al.
Image segmentation is often used in medical image processing. This crucial task can affect all results obtained in further steps of image analysis. In nuclear medicine emission tomography imaging, where the acquired and reconstructed images contain large noise and a high level of blurring, segmentation and tumour boundary delineation can be a very challenging task. Many of the existing image segmentation methods are based on clustering. In this work, we have implemented and tested a few clustering-based methods. We have mainly focused on k-means-related algorithms to evaluate and compare their accuracy. In this group we have chosen the k-means algorithm, k-medoid clustering, and the fuzzy C-means (FCM) method. Results for all methods were verified using the gold standard obtained from an anatomical image co-registered with the emission tomography dataset. The match between both datasets was quantified using the Jaccard index. Results were compared with a standard segmentation algorithm based on a fixed threshold (standardized uptake value, SUV, with threshold 2.5), which is a commonly used standard in clinical practice, and also with previously implemented and verified methods (including a game-theoretic algorithm). For all tested methods we obtained very similar results, comparable to the SUV 2.5 threshold method but worse than the game-theoretic method.
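One of the compared steps can be sketched as k-means clustering of voxel intensities into a binary delineation, evaluated with the Jaccard index against a gold-standard mask. The clinical preprocessing, the k-medoid/FCM variants and the SUV thresholding are not shown; k=2 is an assumption for a binary delineation.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_delineation(volume, k=2):
    """Cluster voxel intensities and return a boolean mask of the brightest cluster."""
    intensities = volume.reshape(-1, 1).astype(np.float64)
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(intensities)
    brightest = np.argmax([intensities[labels == c].mean() for c in range(k)])
    return (labels == brightest).reshape(volume.shape)

def jaccard(mask_a, mask_b):
    """Jaccard index |A ∩ B| / |A ∪ B| between two boolean masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 1.0
```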
Parallel implementation of a watershed algorithm on shared memory multicore architecture
The watershed transform is widely used in image segmentation. In the literature, this transform is computed by various algorithms, among which is the M-border kernel algorithm [1]. This algorithm computes the watershed transform in the framework of edge-weighted graphs. It is based on a local property that makes it well suited to parallelization. In this paper we propose a parallel implementation of this algorithm. We start by studying the data dependency problems that it raises. We then give an approach that overcomes these problems based on an alternated edge processing strategy. The implementation of this strategy on a shared-memory multicore architecture using a Single Program Multiple Data (SPMD) approach proves its effectiveness. In fact, experimental results show that our implementation achieves a relative speedup factor of 2.8 using 4 processors over the performance of the sequential algorithm using a single processor on the same system.
Image Analysis
A new key recovery attack against DM-QIM image watermarking algorithm
Vitaly Mitekin
In this paper, a new attack aimed at DM-QIM watermark detection and extraction is proposed, based on known histogram-based attacks against the non-dithered QIM algorithm. The key recovery results obtained with the proposed attack show that, for sets of watermarked images, it is possible to outperform known watermark detection attacks proposed for DM-QIM when non-local image statistics are utilized. Several known techniques for secure key synchronization are also recommended to improve the robustness of existing DM-QIM schemes against the proposed attack.
To image analysis in computed tomography
The presence of errors in a tomographic image may lead to misdiagnosis when computed tomography (CT) is used in medicine, or to wrong decisions about the parameters of technological processes when CT is used in industrial applications. Two main causes produce these errors. First, errors occur at the measurement step, e.g. incorrect calibration and estimation of the geometric parameters of the set-up. The second cause is the nature of the tomographic reconstruction step. At this stage, a mathematical model to calculate the projection data is created. The applied optimization and regularization methods, along with the numerical implementations of the chosen method, have their own specific errors. Nowadays, many research teams try to analyze these errors and establish the relations between the error sources. In this paper, we do not analyze the nature of the final error, but present a new approach for calculating its distribution in the reconstructed volume. We hope that the visualization of the error distribution will allow experts to clarify the medical report impression or expert summary given after analyzing CT results. To illustrate the efficiency of the proposed approach we present both simulation and real data processing results.
Combining convolutional neural networks and Hough Transform for classification of images containing lines
In this paper, we propose an expansion of convolutional neural network (CNN) input features based on the Hough Transform. We perform morphological contrasting of the source image followed by the Hough Transform, and then use the result as input for some of the convolutional filters. Thus, the CNN's computational complexity and the number of units are not affected. Morphological contrasting and the Hough Transform are the only additional computational expenses of the introduced CNN input feature expansion. The proposed approach was demonstrated on the example of a CNN with a very simple structure. We considered two image recognition problems: object classification on CIFAR-10 and printed character recognition on a private dataset with symbols taken from Russian passports. Our approach allowed us to reach a noticeable accuracy improvement without much computational effort, which can be extremely important in industrial recognition systems or in difficult problems utilising CNNs, like pressure ridge analysis and classification.
Fast techniques for nonlinear mapping of hyperspectral data
Evgeny Myasnikov
The paper considers the problem of fast nonlinear mapping of hyperspectral data. An analysis of various ways to speed up the nonlinear mapping algorithm is performed, including stochastic algorithms, methods based on space partitioning, interpolation techniques, and parallel implementations on modern parallel architectures. A general scheme for hyperspectral data processing that summarizes the analyzed methods and algorithms is given along with recommendations. Experimental results for the proposed technique are presented for well-known hyperspectral images. Possible applications of the technique for hyperspectral image analysis are discussed in the paper.
The real time endoscopic image analysis
Radi Kadushnikov, Dmitry Bykov, Sergey Studenok, et al.
A new method for real-time endoscopic image analysis is presented. This method improves accuracy and helps avoid subjectivity when performing endoscopic image classification. The method is based on the use of the Scale-Invariant Feature Transform (SIFT) detector and the computation of gastric mucosa pit-pattern skeletons. Subsequent use of the "bag of visual words" method ("bag of features") and the K-means method for key point clustering allows image classification with more than 85% accuracy.
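A hedged sketch of the SIFT plus "bag of visual words" part of such a pipeline (descriptor extraction, k-means codebook, word-histogram features); the pit-pattern skeleton computation and the paper's final classifier are omitted. SIFT is available as cv2.SIFT_create in recent OpenCV releases.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

sift = cv2.SIFT_create()

def build_codebook(training_images, n_words=200):
    """Cluster SIFT descriptors from training images into a visual-word codebook."""
    descs = [sift.detectAndCompute(img, None)[1] for img in training_images]
    descs = np.vstack([d for d in descs if d is not None])
    return KMeans(n_clusters=n_words, n_init=4).fit(descs)

def bow_histogram(image, codebook):
    """Normalized histogram of visual-word occurrences used as the image feature vector."""
    _, desc = sift.detectAndCompute(image, None)
    n_words = codebook.n_clusters
    if desc is None:
        return np.zeros(n_words)
    words = codebook.predict(desc.astype(np.float64))
    hist = np.bincount(words, minlength=n_words).astype(np.float64)
    return hist / hist.sum()
```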
Detection of informative fragments for image quality assessment
Algorithms for the detection of informative image fragments are proposed. The algorithms detect fragments of two types. Fragments of the first type («homogeneous fragments») are used to estimate the noise in the image. Fragments of the second type («stepped fragments») are used to estimate the frequency response of the distortion system. Thus, these fragments can be used for the assessment of image distortions and image quality in natural scenarios.
New image quality metric used for the assessment of color quantization algorithms
Color quantization is an important operation in the field of color image processing. In this paper, we consider the usefulness of the new DSCSI metric for the assessment of quantized images. This metric is presented against the background of other useful image quality metrics for evaluating color image differences, and it is also shown that the DSCSI metric achieves the highest correlation coefficients with MOS. For further verification of the DSCSI metric, combined methods initialized with the results of well-known splitting algorithms such as POP, MC, Wu, etc. were tested. Experimental results of such combined methods indicate that the Wu+KM approach leads to the best quantized images in the sense of the DSCSI metric.
Further applications of the DSCSI metric for evaluating color quantization
Color image quantization is often used in tasks such as image compression and image segmentation. In this paper, we continue to consider the usefulness of the new DSCSI metric for evaluating quantized images. Our use of the DSCSI metric confirmed that color quantization in the CIELAB color space is better than in the basic RGB color space. On several examples we found very good suitability of DSCSI in the case of quantization with dithering. In the tests of different dithering algorithms, the best results in terms of the DSCSI metric were reached by the classical Floyd-Steinberg algorithm at an error propagation level of 75-85%.
Robot and Machine Vision
Active classifier selection for RGB-D object categorization using a Markov random field ensemble method
Maximilian Durner, Zoltán Márton, Ulrich Hillenbrand, et al.
In this work, a new ensemble method for the task of category recognition in different environments is presented. The focus is on service robotic perception in an open environment, where the robot’s task is to recognize previously unseen objects of predefined categories, based on training on a public dataset. We propose an ensemble learning approach to be able to flexibly combine complementary sources of information (different state-of-the-art descriptors computed on color and depth images), based on a Markov Random Field (MRF). By exploiting its specific characteristics, the MRF ensemble method can also be executed as a Dynamic Classifier Selection (DCS) system. In the experiments, the committee- and topology-dependent performance boost of our ensemble is shown. Despite reduced computational costs and using less information, our strategy performs on the same level as common ensemble approaches. Finally, the impact of large differences between datasets is analyzed.
Robotic system construction with mechatronic components inverted pendulum: humanoid robot
Lucian Alexandru Sandru, Marius Florin Crainic, Diana Savu, et al.
Mechatronics is a new methodology used to achieve an optimal design of an electromechanical product. This methodology is a collection of practices, procedures and rules used by those who work in a particular branch of knowledge or discipline. Education in mechatronics at the Polytechnic University Timisoara is organized on three levels: bachelor, master and PhD studies. These activities also cover the design of mechatronic systems. In this context, the design, implementation and experimental study of a family of mechatronic demonstrators occupies an important place. In this paper, a variant of a mechatronic demonstrator based on the combination of electrical and mechanical components is proposed. The demonstrator, named humanoid robot, is equivalent to an inverted pendulum. The analysis of the components for the associated functions of the humanoid robot is presented. This type of development of mechatronic systems, through the combination of hardware and software, offers the opportunity to build optimal solutions.
Comparative analysis of ROS-based monocular SLAM methods for indoor navigation
This paper presents a comparison of four of the most recent ROS-based monocular SLAM-related methods: ORB-SLAM, REMODE, LSD-SLAM, and DPPTAM, and analyzes their feasibility for a mobile robot application in an indoor environment. We tested these methods using video data recorded from a conventional wide-angle full HD webcam with a rolling shutter. The camera was mounted on a human-operated prototype of an unmanned ground vehicle, which followed a closed-loop trajectory. Both feature-based methods (ORB-SLAM, REMODE) and direct SLAM-related algorithms (LSD-SLAM, DPPTAM) demonstrated reasonably good results in the detection of volumetric objects, corners, obstacles and other local features. However, we met difficulties in recovering the homogeneously colored walls that are typical for offices, since all of these methods created empty spaces in the reconstructed sparse 3D scene. This may cause collisions of an autonomously guided robot with featureless walls and thus limits the applicability of maps obtained by the considered monocular SLAM-related methods for indoor robot navigation.
Design of the arm-wrestling robot’s force acquisition system based on Qt
Zhixiang Huo, Feng Chen, Yongtao Wang
As a robot combining entertainment and medical rehabilitation, the arm-wrestling robot is of great research significance. In order to collect the arm-wrestling robot's force signals, the design and implementation of its force acquisition system are introduced in this paper. The system is based on the MP4221 data acquisition card and is programmed with Qt. It successfully collects the analog signals on a PC. The interface of the system is simple and the real-time performance is good. The test results show the feasibility of the approach for the arm-wrestling robot.
The research on visual industrial robot which adopts fuzzy PID control algorithm
Yifei Feng, Guoping Lu, Lulin Yue, et al.
A control system for a six-degree-of-freedom visual industrial robot, based on multi-axis motion control cards and a PC, was studied. To handle the time-varying, non-linear characteristics of the industrial robot's servo system, an adaptive fuzzy PID controller was adopted, which achieved a better control effect. In the vision system, a CCD camera acquires signals and sends them to a video processing card. After processing, the PC controls the motion of the six joints through the motion control cards. Experiments show that the manipulator can cooperate with a machine tool and the vision system to grasp, process and verify parts. This work contributes to the manufacturing of industrial robots.
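A minimal sketch of the adaptive-gain PID idea behind this abstract, using a crude rule base on the error magnitude as a stand-in for a full fuzzy inference system; the gains, thresholds and toy plant are assumptions, not the authors' controller.

```python
# Adaptive PID loop whose gains are nudged by simple fuzzy-style rules on the
# error magnitude; placeholder for a full fuzzy inference system.

class AdaptivePID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def fuzzy_adjust(self, error):
        """Crude rule base: large error -> boost Kp, small error -> boost Ki."""
        e = abs(error)
        if e > 1.0:            # "error is large"
            return 1.2 * self.kp, 0.8 * self.ki, self.kd
        elif e > 0.1:          # "error is medium"
            return self.kp, self.ki, self.kd
        else:                  # "error is small"
            return 0.8 * self.kp, 1.2 * self.ki, self.kd

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        d_error = (error - self.prev_error) / self.dt
        kp, ki, kd = self.fuzzy_adjust(error)
        self.integral += error * self.dt
        self.prev_error = error
        return kp * error + ki * self.integral + kd * d_error


# Toy usage: drive a hypothetical first-order joint model towards a setpoint.
pid = AdaptivePID(kp=2.0, ki=0.5, kd=0.1, dt=0.01)
position = 0.0
for _ in range(500):
    u = pid.update(setpoint=1.0, measurement=position)
    position += 0.01 * u   # hypothetical plant response
print(position)
```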
The calculation of a projective transformation in the problem of planar object targeting by feature points
The work is devoted to the calculation of a projective transformation, which arises in many machine vision problems. The details of computing the projective transformation and the specifics of its implementation in common mathematical libraries are carefully analyzed. Different approaches are compared in terms of both performance and accuracy, using both artificially generated and real data.
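A minimal sketch contrasting a plain DLT estimate of a projective transformation (homography) with OpenCV's RANSAC-based estimator. This is illustrative only, not the implementations benchmarked in the paper; the correspondences are synthetic.

```python
import numpy as np
import cv2

def dlt_homography(src, dst):
    """Direct Linear Transform from >= 4 point correspondences."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=np.float64))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

# Synthetic correspondences generated from a known homography.
src = np.float32([[0, 0], [1, 0], [1, 1], [0, 1], [0.5, 0.5]])
H_true = np.float32([[1.1, 0.05, 2.0], [0.02, 0.95, -1.0], [1e-3, 2e-3, 1.0]])
dst = cv2.perspectiveTransform(src.reshape(-1, 1, 2), H_true).reshape(-1, 2)

H_dlt = dlt_homography(src, dst)
H_cv, mask = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
print(np.allclose(H_dlt, H_cv, atol=1e-3))
```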
Machine vision and appearance based learning
In machine vision, smart algorithms are used to organize and extract high-level information from the available data. The resulting high-level understanding of image content, received from a visual sensing system and belonging to an appearance space, is only a key first step in solving various specific tasks such as mobile robot navigation in uncertain environments, road detection in autonomous driving systems, etc. Appearance-based learning has become very popular in the field of machine vision. In general, the appearance of a scene is a function of the scene content, the lighting conditions, and the camera position. The mobile robot localization problem is considered in a machine learning framework via appearance space analysis. This problem is reduced to a regression-on-an-appearance-manifold problem, and recent regression-on-manifolds methods are used for its solution.
Iris recognition and what is next? Iris diagnosis: a new challenging topic for machine vision from image acquisition to image interpretation
Molecular image-based techniques are widely used in medicine to detect specific diseases. Look diagnosis is an important issue, but the analysis of the eye also plays an important role in detecting specific diseases. These are important topics in medicine, and their standardization by an automatic system can be a new and challenging field for machine vision. Compared to iris recognition, iris diagnosis places much higher demands on image acquisition and interpretation of the iris. Iris diagnosis (iridology) is the investigation and analysis of the colored part of the eye, the iris, to discover factors that play an important role in the prevention and treatment of illnesses as well as in the preservation of optimum health. An automatic system would pave the way for a much wider use of iris diagnosis for diagnosing illnesses and for individual health protection. With this paper, we describe our work towards an automatic iris diagnosis system. We describe the image acquisition and the problems associated with it. Different ways of image acquisition and image preprocessing are explained. We describe the image analysis method for the detection of the iris, and the meta-model for image interpretation is given. Based on this model we show the many image analysis tasks, ranging from image-object feature analysis and spatial image analysis to color image analysis. Our first results for the recognition of the iris are given. We describe how to detect the pupil and unwanted lamp spots, and explain how to recognize orange-blue spots in the iris and match them against the topological map of the iris. Finally, we give an outlook on further work.
The automated testing facility based on machine vision for optimizing grain quality control technology
A. V. Panchenko, N. V. Reshetnyak, A. M. Samohin, et al.
The article describes the construction of a simulation stand and the software for validating grain quality analysis technology in a combine harvester's grain tank during the harvesting process, using a machine-vision-based hardware-software complex. The preparation of training samples for automating sensor development and the quality control of the testing facility are also described.
Real-time object-to-features vectorisation via Siamese neural networks
Object-to-features vectorisation is a hard problem for objects that are difficult to distinguish. Siamese and triplet neural networks are among the more recent tools used for this task. However, most networks used are very deep and prove hard to compute in an Internet of Things setting. In this paper, a computationally efficient neural network is proposed for real-time object-to-features vectorisation into a Euclidean metric space. We use the L2 distance to reflect feature vector similarity during both training and testing. In this way, the feature vectors we develop can be easily classified with a K-Nearest Neighbours classifier. Such an approach can be used to train networks to vectorise "problematic" objects such as images of human faces and keypoint image patches, for example keypoints on Arctic maps and surrounding marine areas.
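A minimal sketch of the general pipeline described here: a shallow embedding network trained with a triplet margin loss so that L2 distances reflect similarity, followed by K-Nearest-Neighbours classification of the resulting feature vectors. The network size, margin, PyTorch framework and random data are all assumptions, not the paper's model.

```python
import torch
import torch.nn as nn
from sklearn.neighbors import KNeighborsClassifier

class SmallEmbedder(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        self.fc = nn.Linear(32, dim)

    def forward(self, x):
        z = self.fc(self.features(x).flatten(1))
        return nn.functional.normalize(z, p=2, dim=1)  # unit-norm embeddings

net = SmallEmbedder()
criterion = nn.TripletMarginLoss(margin=0.2)
optim = torch.optim.Adam(net.parameters(), lr=1e-3)

# One toy training step on random anchor/positive/negative patches.
anchor, positive, negative = (torch.randn(8, 1, 32, 32) for _ in range(3))
loss = criterion(net(anchor), net(positive), net(negative))
optim.zero_grad(); loss.backward(); optim.step()

# Classify embeddings with a Euclidean K-NN, as the abstract suggests.
with torch.no_grad():
    train_vecs = net(torch.randn(20, 1, 32, 32)).numpy()
    test_vecs = net(torch.randn(5, 1, 32, 32)).numpy()
knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
knn.fit(train_vecs, [i % 4 for i in range(20)])   # toy labels
print(knn.predict(test_vecs))
```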
Medical Image Processing
SVM classification of microaneurysms with imbalanced dataset based on borderline-SMOTE and data cleaning techniques
Qingjie Wang, Jingmin Xin, Jiayi Wu, et al.
Microaneurysms are the earliest clinical signs of diabetic retinopathy, and many algorithms have been developed for the automatic classification of this specific pathology. However, the imbalanced class distribution of the dataset usually causes the classification accuracy of true microaneurysms to be low. Therefore, by combining the borderline synthetic minority over-sampling technique (BSMOTE) with data cleaning techniques such as Tomek links and Wilson's edited nearest neighbor rule (ENN) to resample the imbalanced dataset, we propose two new support vector machine (SVM) classification algorithms for microaneurysms. The proposed BSMOTE-Tomek and BSMOTE-ENN algorithms consist of: 1) the adaptive synthesis of minority samples in the neighborhood of the borderline, and 2) the removal of redundant training samples to improve the efficiency of data utilization. Moreover, a modified SVM classifier with probabilistic outputs is used to divide the microaneurysm candidates into two groups: true microaneurysms and false microaneurysms. Experiments with a public microaneurysm database show that the proposed algorithms have better classification performance, as measured by the receiver operating characteristic (ROC) curve and the free-response receiver operating characteristic (FROC) curve.
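A minimal sketch of the resampling-plus-SVM idea using the imbalanced-learn package: Borderline-SMOTE oversampling followed by Tomek-link (or ENN) cleaning and a probabilistic SVM. The synthetic features and class weights are placeholders, not microaneurysm data, and the library choice is an assumption rather than the authors' implementation.

```python
from imblearn.over_sampling import BorderlineSMOTE
from imblearn.under_sampling import TomekLinks, EditedNearestNeighbours
from imblearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.datasets import make_classification

# Synthetic imbalanced data standing in for microaneurysm candidate features.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)

bsmote_tomek = Pipeline([
    ("oversample", BorderlineSMOTE(random_state=0)),   # synthesize borderline minority samples
    ("clean", TomekLinks()),                           # remove ambiguous/redundant samples
    ("svm", SVC(kernel="rbf", probability=True)),      # probabilistic SVM outputs
])
# Swap TomekLinks() for EditedNearestNeighbours() to get the BSMOTE-ENN variant.
bsmote_tomek.fit(X, y)
print(bsmote_tomek.predict_proba(X[:5]))
```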
On the analysis of local and global features for hyperemia grading
L. Sánchez, N. Barreira, N. Sánchez, et al.
In optometry, hyperemia is the accumulation of blood flow in the conjunctival tissue. Dry eye syndrome and allergic conjunctivitis are two of its main causes. Its main symptom is a red hue in the eye that optometrists evaluate subjectively according to a grading scale. In this paper, we propose an automatic approach to the problem of hyperemia grading in the bulbar conjunctiva. We compute several image features on images of the patients' eyes, analyse the relations among them by using feature selection techniques, and transform the feature vector of each image into a value in the appropriate grading range by means of machine learning techniques. We analyse different areas of the conjunctiva to evaluate their importance for the diagnosis. Our results show that it is possible to mimic the experts' behaviour through the proposed approach.
Dense-HOG-based drift-reduced 3D face tracking for infant pain monitoring
Ronald W.J.J. Saeijs, Walther E. Tjon A Ten, Peter H. N. de With
This paper presents a new algorithm for 3D face tracking intended for clinical infant pain monitoring. The algorithm uses a cylinder head model and 3D head pose recovery by alignment of dynamically extracted templates based on dense-HOG features. The algorithm includes extensions for drift reduction, using re-registration in combination with multi-pose state estimation by means of a square-root unscented Kalman filter. The paper reports experimental results on videos of moving infants in hospital who are relaxed or in pain. Results show good tracking behavior for poses up to 50 degrees from upright-frontal. In terms of eye location error relative to inter-ocular distance, the mean tracking error is below 9%.
Hybrid approach for detection of dental caries based on the methods FCM and level sets
Marwa Chaabene, Ramzi Ben Ali, Ridha Ejbali, et al.
This paper presents a new technique for the detection of dental caries, a bacterial disease that destroys the tooth structure. In our approach, we have developed a new segmentation method that combines the advantages of the fuzzy C-means (FCM) algorithm and the level set method. The results obtained by the FCM algorithm are used by the level set algorithm to reduce the influence of noise on each of these algorithms, to facilitate level set manipulation and to lead to more robust segmentation. The sensitivity and specificity confirm the effectiveness of the proposed method for caries detection.
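A minimal numpy sketch of the fuzzy C-means building block; in a hybrid pipeline such as the one described, the resulting membership map would seed the level-set stage. The image here is random, the cluster count is assumed, and the level-set step itself is omitted.

```python
import numpy as np

def fcm(values, c=2, m=2.0, iters=50):
    """Fuzzy C-means on a 1-D array of intensities. Returns (centers, U)."""
    rng = np.random.default_rng(0)
    U = rng.random((c, values.size))
    U /= U.sum(axis=0)                          # memberships sum to 1 per pixel
    for _ in range(iters):
        Um = U ** m
        centers = Um @ values / Um.sum(axis=1)  # weighted cluster centers
        dist = np.abs(values[None, :] - centers[:, None]) + 1e-9
        U = 1.0 / (dist ** (2.0 / (m - 1)))     # standard FCM membership update
        U /= U.sum(axis=0)
    return centers, U

image = np.random.rand(64, 64)                  # placeholder for a dental X-ray
centers, U = fcm(image.ravel(), c=2)
# Hard mask from the brighter cluster, usable to initialise a level-set contour.
mask = (U.argmax(axis=0) == centers.argmax()).reshape(image.shape)
print(centers, mask.mean())
```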
Finding glenoid surface on scapula in 3D medical images for shoulder joint implant operation planning: 3D OCR
Majid Mohammad Sadeghi, Emin Faruk Kececi, Kerem Bilsel M.D., et al.
Medical imaging has great importance in the earlier detection, better treatment and follow-up of diseases. 3D medical image analysis with CT and MRI images has also been used to aid surgeries by enabling patient-specific implant fabrication, where having a precise three-dimensional model of the associated body parts is essential. In this paper, a 3D image processing methodology for finding the plane on which the glenoid surface has the maximum surface area is proposed. Finding this surface is the first step in designing a patient-specific shoulder joint implant.
Mammographic mass classification based on possibility theory
Marwa Hmida, Kamel Hamrouni, Basel Solaiman, et al.
Shape and margin features are very important for differentiating between benign and malignant masses in mammographic images. In fact, benign masses are usually round or oval and have smooth contours, whereas malignant tumors generally have an irregular shape and appear lobulated or spiculated in their margins. This knowledge suffers from imprecision and ambiguity. Therefore, this paper deals with the problem of mass classification using shape and margin features while taking into account the uncertainty linked to the degree of truth of the available information and the imprecision related to its content. Thus, in this work, we propose a novel mass classification approach which provides a possibility-based representation of the extracted shape features and builds a possibilistic knowledge basis in order to evaluate the possibility degree of malignancy and benignity for each mass. For experimentation, the MIAS database was used, and the classification results show the good performance of our approach despite using simple features.
Abnormal cervical cell detection based on an adaptive margin-based feature selection method
Lili Zhao, Kuan Li, Hongyun Yang, et al.
In an abnormal cervical cell detection system, the discriminative abilities of different features are not the same, so an optimized method for combining all features is an essential component of such a system. Feature selection can improve the utilization ratio of each feature and the performance of the classification problem. Previous efforts in cervical abnormal cell detection have mainly focused on changing the feature space into a new one by using a binary weight vector. In this work, the binary weight values are extended to multiple weight values. According to the statistical distribution of the data, an adaptive margin-based weighted feature selection method is proposed in this paper. This method performs best compared with three other methods, and the experimental result achieves 96% accuracy on a real-world cervical smear image dataset.
Image Processing and Applications
Towards diverse visual suggestions on Flickr
Ghada Feki, Anis Ben Ammar, Chokri Ben Amar
With the great popularity of the photo sharing site Flickr, the research community is working to produce innovative applications that enhance different Flickr services. In this paper, we present a new process for generating diverse visual suggestions on Flickr. We unify the social aspect of Flickr and the richness of Wikipedia to produce a large number of meanings, illustrated by diverse visual suggestions that can integrate the diversity aspect into Flickr search. We conduct an experimental study to illustrate the effect of fusing Wikipedia and Flickr knowledge on the diversity rate of Flickr search results and to reveal the evolution of the diversity aspect in the images returned by different search engines.
High resolution satellite image indexing and retrieval using SURF features and bag of visual words
Samia Bouteldja, Assia Kourgli
In this paper, we evaluate the performance of SURF descriptor for high resolution satellite imagery (HRSI) retrieval through a BoVW model on a land-use/land-cover (LULC) dataset. Local feature approaches such as SIFT and SURF descriptors can deal with a large variation of scale, rotation and illumination of the images, providing, therefore, a better discriminative power and retrieval efficiency than global features, especially for HRSI which contain a great range of objects and spatial patterns. Moreover, we combine SURF and color features to improve the retrieval accuracy, and we propose to learn a category-specific dictionary for each image category which results in a more discriminative image representation and boosts the image retrieval performance.
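A minimal bag-of-visual-words sketch in the spirit of this retrieval pipeline: local descriptors are quantised against a learned vocabulary and each image is represented by a word histogram. ORB is used here only because SURF requires OpenCV's non-free contrib build; the tile file names, vocabulary size and normalisation are assumptions, not the paper's configuration.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

orb = cv2.ORB_create(nfeatures=500)

def descriptors(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, des = orb.detectAndCompute(img, None)
    return des if des is not None else np.empty((0, 32), np.uint8)

paths = ["tile_0.png", "tile_1.png"]               # hypothetical HRSI tiles
all_des = np.vstack([descriptors(p) for p in paths]).astype(np.float32)

# Visual vocabulary learned by k-means over all training descriptors.
vocab = KMeans(n_clusters=64, n_init=5, random_state=0).fit(all_des)

def bovw_histogram(path):
    des = descriptors(path).astype(np.float32)
    words = vocab.predict(des)
    hist, _ = np.histogram(words, bins=np.arange(65))
    return hist / max(hist.sum(), 1)               # L1-normalised word histogram

# Retrieval then ranks database images by histogram similarity to the query.
```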
Iris indexing based on local intensity order pattern
Simina Emerich, Raul Malutan, Septimiu Crisan, et al.
In recent years, iris biometric systems have increased in popularity and have been proven capable of handling large-scale databases. The main advantages of these systems are accuracy and reliability. A proper classification of iris patterns is expected to reduce the matching time in huge databases. This paper presents an iris indexing technique based on the Local Intensity Order Pattern. The performance of the present approach is evaluated on the UPOL database and compared with other recent systems designed for iris indexing. The results illustrate the potential of the proposed method for large-scale iris identification.
A new indexing method of HDR images using color histograms
Various methods for color-histogram-based indexing of Low Dynamic Range images have been developed. All these methods are considered effective, but none of the algorithms has been extended to High Dynamic Range (HDR) images. In this paper, we present a new method for HDR image indexing using histogram intersection in the hue-saturation-value (HSV) color space. For a given HDR image, the proposed approach computes a global descriptor based on a quantization of the HSV color space. This descriptor is highly discriminative and fast to compute. The strength of our approach is proven through experimental results on a database containing 205 images classified into 12 categories. Preliminary results show that the developed algorithm performs well for HDR image retrieval.
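A minimal sketch of HSV-quantised histogram indexing with histogram intersection as the similarity measure. The HDR radiance map is simulated by a float image here, and the quantisation levels (16 hue x 4 saturation x 4 value bins) are an assumption, not the paper's exact setting.

```python
import cv2
import numpy as np

def hsv_histogram(bgr_float, bins=(16, 4, 4)):
    """Quantised HSV histogram of a float BGR image with values in [0, 1]."""
    hsv = cv2.cvtColor(bgr_float.astype(np.float32), cv2.COLOR_BGR2HSV)
    h = np.clip(hsv[..., 0] / 360.0, 0, 1 - 1e-6)   # hue is in degrees for float input
    s = np.clip(hsv[..., 1], 0, 1 - 1e-6)
    v = np.clip(hsv[..., 2], 0, 1 - 1e-6)
    idx = ((h * bins[0]).astype(int) * bins[1] + (s * bins[1]).astype(int)) \
          * bins[2] + (v * bins[2]).astype(int)
    hist = np.bincount(idx.ravel(), minlength=bins[0] * bins[1] * bins[2])
    return hist / hist.sum()

def intersection(h1, h2):
    return np.minimum(h1, h2).sum()   # 1.0 means identical histograms

# Two simulated HDR images, normalised to [0, 1] as a crude stand-in for tone mapping.
img_a, img_b = np.random.rand(128, 128, 3), np.random.rand(128, 128, 3)
print(intersection(hsv_histogram(img_a), hsv_histogram(img_b)))
```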
A new method for text detection and recognition in indoor scene for assisting blind people
Hanen Jabnoun, Faouzi Benzarti, Hamid Amiri
Developing assistive systems for handicapped persons has become a challenging task in research projects. Recently, a variety of tools have been designed to help visually impaired or blind people as visual substitution systems. The majority of these tools are based on converting input information into auditory or tactile sensory information. Furthermore, object recognition and text retrieval are exploited in visual substitution systems. Text detection and recognition provide a description of the surrounding environment, so that the blind person can readily recognize the scene. In this work, we introduce a method for detecting and recognizing text in indoor scenes. The process consists of detecting the regions of interest that should contain text using connected components; text detection is then performed by employing image correlation. This component of an assistive system for blind persons should be simple, so that users can obtain the most informative feedback within the shortest time.
Fast adaptive matting based on iterative solution
Jaehwan Kim, HoWon Kim
In this paper, we introduce a fast adaptive matting method, where the adaptive matting is carried with a parallelized iterative method, instead of a closed-form solution. As one of its applications, we also incorporate a saliency detection into the adaptive matting, which provides an automated way of extracting salient objects from a bundle of images. This is useful for various problems including object based retrieval, classification and so on. Numerical experiments and visual comparison with the publicly available sample images show the effectiveness of the proposed method.
Atmospheric correction of hyperspectral images using qualitative information about registered scene
In this paper a method for the atmospheric correction of hyperspectral images is proposed. In the first stage, the observed image is used to estimate the parameters of atmospheric distortions using a common radiative transfer model. In contrast to other existing approaches, we use the full nonlinear form of the radiative transfer model together with a linear spectral model, which is applied to describe undistorted hyperspectral pixels. The combination of both models allows us to evaluate the parameters of atmospheric distortions using only the hyperspectral image and qualitative information about the scene. The latter is a list of (undistorted) spectral signatures, which can appear in different linear combinations in the registered scene. The proposed method does not require any precedent information (sets of pixels containing predefined information) or pure hyperspectral pixels. Thus, it can be applied for blind identification of the atmospheric distortion model and for subsequent atmospheric correction. Experimental results presented in this paper demonstrate the performance of the method.
Classification of rice grain varieties arranged in scattered and heap fashion using image processing
Sudhanva Bhat, Sreedath Panat, Arunachalam N
Inspection and classification of food grains is a manual process in many food grain processing industries. Automating such a process would benefit industries facing a shortage of skilled workforce, and machine vision techniques are popular approaches for developing such automation. Most of the existing works on the topic deal with identifying the rice variety by analyzing images of well separated and isolated rice grains from which many geometrical features can be extracted. This paper proposes techniques to estimate geometrical parameters from images of scattered as well as heaped rice grains, where the grain boundaries are not clearly identifiable. A methodology based on convexity is proposed to separate touching rice grains in scattered rice grain images and obtain their geometrical parameters. In the case of heaped arrangements, a Pixel-Distance Contribution Function is defined and used to obtain points inside rice grains and then to find the boundary points of the grains. These points are fitted with the equation of an ellipse to estimate grain lengths and breadths. The proposed techniques are applied to images of scattered and heaped rice grains of different varieties, and it is shown that each variety yields a unique set of results.
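A minimal OpenCV sketch of the final step described for scattered grains: once grain boundaries are isolated, an ellipse is fitted to each boundary to estimate length and breadth. Otsu thresholding here is a simple placeholder for the paper's convexity-based separation of touching grains, and the file name is hypothetical.

```python
import cv2

img = cv2.imread("rice_scattered.png", cv2.IMREAD_GRAYSCALE)   # hypothetical image
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

dimensions = []
for cnt in contours:
    if len(cnt) < 5:                  # cv2.fitEllipse needs at least 5 points
        continue
    (cx, cy), (axis_a, axis_b), angle = cv2.fitEllipse(cnt)
    dimensions.append((max(axis_a, axis_b), min(axis_a, axis_b)))  # length, breadth

# Per-variety statistics (mean length, breadth, aspect ratio) would then be
# compared across varieties, as in the paper's experiments.
print(dimensions[:5])
```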
A framework of text detection and recognition from natural images for mobile device
Zied Selmi, Mohamed Ben Halima, Ali Wali, et al.
In light of the remarkable audio-visual impact on modern life and the massive use of new technologies (smartphones, tablets, ...), the image has been given great importance in the field of communication. It has become the most effective, attractive and suitable means of communication for transmitting information between people. Of all the various kinds of information that can be extracted from an image, our focus is particularly on text. Since text detection and recognition in natural images is a major problem in many applications, it has drawn the attention of a great number of researchers in recent years. In this paper, we present a framework for text detection and recognition from natural images for mobile devices.
Classification of foods by transferring knowledge from ImageNet dataset
Elnaz J. Heravi, Hamed H. Aghdam, Domenec Puig
Automatic classification of foods is a way to control food intake and tackle obesity. However, it is a challenging problem since foods are highly deformable and complex objects. Results on the ImageNet dataset have revealed that Convolutional Neural Networks have a great expressive power to model natural objects. Nonetheless, it is not trivial to train a ConvNet from scratch for the classification of foods. This is due to the fact that ConvNets require large datasets, and to our knowledge there is no large public food dataset for this purpose. An alternative solution is to transfer knowledge from trained ConvNets to the domain of foods. In this work, we study how transferable state-of-the-art ConvNets are to the task of food classification. We also propose a method for transferring knowledge from a bigger ConvNet to a smaller ConvNet while keeping its accuracy similar to that of the bigger ConvNet. Our experiments on the UECFood256 dataset show that GoogLeNet, VGG and residual networks produce comparable results if we start transferring knowledge from the appropriate layer. In addition, we show that our method is able to effectively transfer knowledge to the smaller ConvNet using unlabeled samples.
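A minimal transfer-learning sketch in the spirit of this abstract: an ImageNet-pretrained ConvNet is reused for food classification by replacing its final layer and fine-tuning only that layer. ResNet-18, the PyTorch framework and the random batch are placeholders; the paper itself compares GoogLeNet, VGG and residual networks on UECFood256.

```python
import torch
import torch.nn as nn
from torchvision import models

num_food_classes = 256                      # e.g. UECFood256
net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in net.parameters():
    p.requires_grad = False                 # freeze the transferred layers
net.fc = nn.Linear(net.fc.in_features, num_food_classes)  # new trainable head

optimizer = torch.optim.Adam(net.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One toy step on a random batch standing in for food images.
images = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, num_food_classes, (4,))
loss = criterion(net(images), labels)
optimizer.zero_grad(); loss.backward(); optimizer.step()
print(float(loss))
```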
A sparse representation-based approach for copy-move image forgery detection in smooth regions
Copy-move image forgery is the act of cloning a restricted region in the image and pasting it once or multiple times within that same image. This procedure intends to cover a certain feature, probably a person or an object, in the processed image or emphasize it through duplication. Consequences of this malicious operation can be unexpectedly harmful. Hence, the present paper proposes a new approach that automatically detects Copy-move Forgery (CMF). In particular, this work broaches a widely common open issue in CMF research literature that is detecting CMF within smooth areas. Indeed, the proposed approach represents the image blocks as a sparse linear combination of pre-learned bases (a mixture of texture and color-wise small patches) which allows a robust description of smooth patches. The reported experimental results demonstrate the effectiveness of the proposed approach in identifying the forged regions in CM attacks.
Aerial image geolocalization by matching its line structure with route map
I. A. Kunina, A. P. Terekhin, T. M. Khanipov, et al.
The classic way of geolocating aerial photographs is to bind their local coordinates to a geographic coordinate system using GPS and IMU data. At the same time, the possibility of geolocation in a jammed navigation field is also of practical interest. In this paper we consider an approach to visual localization relative to a vector road map without GPS. We suggest a geolocalization algorithm which detects image line segments and searches for a geometric transformation that provides the best mapping between the obtained segment set and the line segments of the road map. We assume IMU and altimeter data are still available, which allows us to work with orthorectified images. The problem is hence reduced to a search for a transformation consisting of an arbitrary shift and bounded rotation and scaling relative to the vector map. These parameters are estimated using RANSAC by matching straight line segments from the image to vector map segments. We also investigate how the algorithm's stability is influenced by the segment coordinates (two spatial and one angular).
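A minimal RANSAC sketch of the transformation search described here: a shift plus bounded rotation and scale is hypothesised from pairs of matched points and scored by how many matches it brings into agreement. Segment midpoints stand in for full segments, the matches are synthetic, and the angular coordinate is ignored for brevity; this is not the paper's estimator.

```python
import numpy as np

def similarity_from_pair(p, q):
    """Scale/rotation/translation mapping two source points onto two targets."""
    dp, dq = p[1] - p[0], q[1] - q[0]
    s = np.linalg.norm(dq) / np.linalg.norm(dp)
    ang = np.arctan2(dq[1], dq[0]) - np.arctan2(dp[1], dp[0])
    R = np.array([[np.cos(ang), -np.sin(ang)], [np.sin(ang), np.cos(ang)]])
    t = q[0] - s * R @ p[0]
    return s, ang, R, t

def ransac_similarity(src, dst, iters=1000, tol=3.0, max_rot=0.2, max_scale=0.1):
    rng, best = np.random.default_rng(0), (0, None)
    for _ in range(iters):
        i, j = rng.choice(len(src), 2, replace=False)
        s, ang, R, t = similarity_from_pair(src[[i, j]], dst[[i, j]])
        if abs(ang) > max_rot or abs(s - 1.0) > max_scale:
            continue                               # outside the allowed bounds
        residual = np.linalg.norm(dst - (s * src @ R.T + t), axis=1)
        inliers = int((residual < tol).sum())
        if inliers > best[0]:
            best = (inliers, (s, ang, t))
    return best

# Synthetic midpoints of matched image / map segments, with a few wrong matches.
src = np.random.rand(50, 2) * 1000
dst = src + np.array([120.0, -40.0])               # pure shift as the true transform
dst[:5] += np.random.rand(5, 2) * 500              # outliers
print(ransac_similarity(src, dst))
```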
Computer Information Technology and Applications
Random forest ensemble classification based fuzzy logic
Abdelkarim Ben Ayed, Marwa Benhammouda, Mohamed Ben Halima, et al.
In this paper, we address supervised data classification using fuzzy random forests, which combine the robustness of decision trees, the power of random selection, which increases the diversity of the trees in the forest, and the flexibility of fuzzy logic with respect to noise. We are interested in the construction of a forest of fuzzy decision trees. Our system is validated on nine standard classification benchmarks from the UCI repository; it is able to handle noisy data, reduces the error rate, and exhibits greater robustness and interpretability.
Local polynomial model: a new approach to vignetting correction
Vignetting refers to the fall-off of pixel intensity from the center towards the edges of the image. The correction of vignetting is a required pre-processing step in many machine vision applications. In this paper, we propose a new local polynomial model of vignetting. The order of the polynomial is a parameter of the model and allows the model to be fitted to the real vignetting of the camera-lens system. The novelty of the proposed model is its local fitting to vignetting data, in contrast to the global models described in the literature. The new model was tested on two camera-lens systems with radial and non-radial vignetting and compared with methods known from the literature. Based on the obtained results, the proposed model gives the best quality of vignetting correction among the tested vignetting models.
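A minimal sketch of the polynomial building block behind vignetting correction: a low-order 2-D polynomial is least-squares fitted to a flat-field frame and the image is divided by the fitted fall-off. The paper's contribution is fitting such polynomials locally, per window; this sketch shows a single global fit on a synthetic flat field, as one such building block.

```python
import numpy as np

def poly_design(x, y, order=2):
    """Design matrix with all monomials x^i * y^j of total degree <= order."""
    cols = [x**i * y**j for i in range(order + 1) for j in range(order + 1 - i)]
    return np.stack(cols, axis=-1)

h, w = 200, 300
yy, xx = np.mgrid[0:h, 0:w]
x, y = xx / w - 0.5, yy / h - 0.5
flat_field = 1.0 - 0.6 * (x**2 + y**2)            # synthetic radial fall-off

A = poly_design(x.ravel(), y.ravel())
coeffs, *_ = np.linalg.lstsq(A, flat_field.ravel(), rcond=None)
model = (A @ coeffs).reshape(h, w)

gain = model.max() / model                        # per-pixel correction gain
corrected = flat_field * gain                     # close to 1.0 everywhere
print(float(corrected.std()))
```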
Spatial kernel bandwidth estimation in background modeling
When modeling the background with kernel density estimation, the selection of a proper kernel bandwidth becomes a critical issue. It is not easy, however, to perform pixel-wise kernel bandwidth estimation when the data associated with each pixel is insufficient. In this paper, we present a new method using spatial information to estimate the pixel-wise kernel bandwidth. The number of pixels in a spatial region is large enough to capture the variance of the underlying distribution on which the optimal kernel bandwidth is estimated. To show the effectiveness of the estimated kernel bandwidth, the background subtraction using this bandwidth is applied to OLED defect detection and its result is compared to those using the bandwidths obtained from other approaches.
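A minimal sketch of KDE background subtraction in which the kernel bandwidth of each pixel is derived from the sample spread over a spatial neighbourhood rather than from that pixel's own short temporal history. The 9x9 window, the std-based bandwidth rule and the toy frames are assumptions, not the paper's exact estimator.

```python
import numpy as np
from scipy.ndimage import uniform_filter

history = np.random.rand(20, 120, 160)             # 20 background frames (toy data)
frame = np.random.rand(120, 160)                   # current frame to test

# Spatial bandwidth: local std of the temporal-median image over a 9x9 window.
median_img = np.median(history, axis=0)
local_mean = uniform_filter(median_img, size=9)
local_sq = uniform_filter(median_img**2, size=9)
bandwidth = np.sqrt(np.maximum(local_sq - local_mean**2, 1e-6))

# Gaussian KDE over the temporal samples with the spatially estimated bandwidth.
diff = frame[None] - history                       # (T, H, W)
density = np.exp(-0.5 * (diff / bandwidth) ** 2).mean(axis=0) / (
    bandwidth * np.sqrt(2 * np.pi))

foreground = density < 0.1                         # low background density => changed pixel
print(foreground.mean())
```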
OpenCL-based vicinity computation for 3D multiresolution mesh compression
3D multiresolution mesh compression systems are still widely addressed in many domains. These systems increasingly require volumetric data to be processed in real time, so performance becomes constrained by material resource usage and the overall computational time. In this paper, our contribution lies in computing, in real time, the triangle neighborhoods of 3D progressive meshes for a robust compression algorithm based on the scan-based wavelet transform (WT) technique. The originality of the latter algorithm is to compute the WT with minimum memory usage by processing data as they are acquired. However, with large data, this technique is considered poor in terms of computational complexity. For this reason, this work exploits the GPU to accelerate the computation using OpenCL as a heterogeneous programming language. Experiments demonstrate that, aside from the portability across various platforms and the flexibility guaranteed by the OpenCL-based implementation, this method achieves a speedup factor of 5 compared to the sequential CPU implementation.
Bayesian user modeling: evaluation metrics of an adaptive user interface
Rim Rebai, Mohamed Amin Maalej, Adel Mahfoudhi, et al.
The adaptability of a web application is its ability to react to the needs and preferences of users. Thus, the user models used by such an adaptive user interface contain personal information that is required for the personalized learning process. The evaluation of web applications then focuses on how users can learn to achieve their objectives, and a variety of measures have been used to gather this information. In this paper, we investigate and present our adaptive web interface based on a Bayesian network approach, and we give special importance to the evaluation of this web interface. The experiments show that the adaptive web interface provides results that satisfy the user, and we confirmed that the adaptive user interface was more comfortable to use than the fixed user interface.
Artificial intelligent e-learning architecture
Mafawez Alharbi, Mahdi Jemmali
Many institutions and universities have been compelled to use e-learning, due to its ability to provide additional and flexible solutions for students and researchers. In the last decade, e-learning has brought about extreme changes in the delivery of education, allowing learners to access multimedia course material at any time and from anywhere to suit their specific needs. In e-learning, instructors and learners are in different places; they do not engage in a classroom environment but within a virtual universe. Many researchers have defined e-learning according to their objectives, yet only a small number of e-learning architectures have been proposed in the literature, and the proposed architectures lack an embedded intelligent system. This research argues that unexplored potential remains, as there is scope for e-learning to be an intelligent system. We therefore propose an e-learning architecture that incorporates an intelligent system, with intelligent components built into the architecture.
Force-directed visualization for conceptual data models
Andrew Battigaglia, Noah Sutter
Conceptual data models are increasingly stored in an eXtensible Markup Language (XML) format because of its portability between different systems and the ability of databases to use this format for storing data. However, when attempting to capture business or design needs, an organized graphical format is preferred in order to facilitate communication to receive as much input as possible from users and subject-matter experts. Existing methods of achieving this conversion suffer from problems of not being specific enough to capture all of the needs of conceptual data modeling and not being able to handle a large number of relationships between entities. This paper describes an implementation for a modeling solution to clearly illustrate conceptual data models stored in XML formats in well organized and structured diagrams. A force layout with several different parameters is applied to the diagram to create both compact and easily traversable relationships between entities.
Kernel credal classification rule
In this paper, we propose a kernel version of the credal classification rule (CCR) to perform classification in a high-dimensional feature space. Kernel-based approaches have been popular for several years for solving supervised and unsupervised learning problems. Here, the CCR is extended by replacing the inner product with an appropriate positive definite function; the corresponding algorithm is called the Kernel Credal Classification Rule (KCCR). The approach is applied to the classification of generated and real data in order to evaluate and compare the performance of the KCCR method with other classification methods.
Cluster forest based fuzzy logic for massive data clustering
This article focuses on developing an improved cluster ensemble method based on cluster forests. Cluster forests (CF) can be considered a clustering counterpart of Random Forests (RF) in the context of massive data. The method aggregates intermediate Fuzzy C-Means (FCM) clustering results via spectral clustering, since the pseudo-clustering results are represented in the spectral space, in order to classify these data sets in the multidimensional data space. One of the main advantages is the use of FCM, which builds fuzzy memberships over all partitions of the dataset thanks to fuzzy logic, whereas classical algorithms such as K-means build only hard partitions. First, we improve the CF clustering algorithm by integrating fuzzy FCM and compare it with other existing clustering methods. Second, we compare the K-means and FCM clustering methods with agglomerative hierarchical clustering (HAC) and other methods from the literature, using data benchmarks from the UCI repository.
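A minimal sketch of the ensemble idea: several clusterings on random feature subsets are aggregated into a co-association affinity matrix, which is then partitioned with spectral clustering. Plain k-means stands in here for the fuzzy C-means base clusterer the paper uses; the data, feature-subset size and number of runs are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, n_features=10, centers=3, random_state=0)
rng = np.random.default_rng(0)

co_assoc = np.zeros((len(X), len(X)))
n_runs = 20
for _ in range(n_runs):
    feats = rng.choice(X.shape[1], size=4, replace=False)    # random feature subset
    labels = KMeans(n_clusters=3, n_init=5,
                    random_state=int(rng.integers(1e6))).fit_predict(X[:, feats])
    co_assoc += (labels[:, None] == labels[None, :])          # co-membership votes
co_assoc /= n_runs

# Final partition from the accumulated affinities, as in the cluster-forest scheme.
final = SpectralClustering(n_clusters=3, affinity="precomputed",
                           random_state=0).fit_predict(co_assoc)
print(np.bincount(final))
```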
Comparison between extreme learning machine and wavelet neural networks in data classification
Siwar Yahia, Salwa Said, Olfa Jemai, et al.
The Extreme Learning Machine is a well-known learning algorithm in the field of machine learning. It is a feed-forward neural network with a single hidden layer and an extremely fast learning algorithm with good generalization performance. In this paper, we aim to compare the Extreme Learning Machine with wavelet neural networks, which are also widely used. We have used six benchmark data sets to evaluate each technique: Wisconsin Breast Cancer, Glass Identification, Ionosphere, Pima Indians Diabetes, Wine Recognition and Iris Plant. Experimental results show that both the Extreme Learning Machine and wavelet neural networks reach good results.
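A minimal single-hidden-layer Extreme Learning Machine sketch: the input weights are random and fixed, and only the output weights are solved in closed form via the Moore-Penrose pseudo-inverse. The hidden-layer size and train/test split are arbitrary choices; Iris is one of the benchmarks the abstract lists.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

class ELM:
    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden, self.rng = n_hidden, np.random.default_rng(seed)

    def _hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))   # sigmoid activations

    def fit(self, X, y):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))  # fixed random weights
        self.b = self.rng.normal(size=self.n_hidden)
        T = np.eye(y.max() + 1)[y]                                  # one-hot targets
        self.beta = np.linalg.pinv(self._hidden(X)) @ T             # closed-form output weights
        return self

    def predict(self, X):
        return (self._hidden(X) @ self.beta).argmax(axis=1)

X, y = load_iris(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
model = ELM(n_hidden=50).fit(Xtr, ytr)
print("test accuracy:", (model.predict(Xte) == yte).mean())
```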
Deep SOMs for automated feature extraction and classification from big data streaming
Mohamed Sakkari, Ridha Ejbali, Mourad Zaied
In this paper, we propose a deep self-organizing map model (Deep-SOMs) for automated feature extraction and learning from big data streams, benefiting from the Spark framework for real-time streaming and highly parallel data processing. The deep SOM architecture is based on the notion of abstraction (patterns are automatically extracted from the raw data, from less to more abstract). The proposed model consists of three hidden self-organizing layers, an input layer and an output layer. Each layer is made up of a multitude of SOMs, with each map focusing only on a local sub-region of the input image. Each layer then processes its local information to generate more global information in the higher layer. The proposed Deep-SOMs model is unique in terms of its layer architecture, its SOM sampling method and its learning. During the learning stage we use a set of unsupervised SOMs for feature extraction. We validate the effectiveness of our approach on large data sets such as the Leukemia and SRBCT datasets. Comparison results show that the Deep-SOMs model performs better than many existing algorithms for image classification.
Fuzzy feature selection based on interval type-2 fuzzy sets
Sahar Cherif, Nesrine Baklouti, Adel Alimi, et al.
When dealing with real world data; noise, complexity, dimensionality, uncertainty and irrelevance can lead to low performance and insignificant judgment. Fuzzy logic is a powerful tool for controlling conflicting attributes which can have similar effects and close meanings. In this paper, an interval type-2 fuzzy feature selection is presented as a new approach for removing irrelevant features and reducing complexity. We demonstrate how can Feature Selection be joined with Interval Type-2 Fuzzy Logic for keeping significant features and hence reducing time complexity. The proposed method is compared with some other approaches. The results show that the number of attributes is proportionally small.