Proceedings Volume 9445

Seventh International Conference on Machine Vision (ICMV 2014)

Antanas Verikas, Branislav Vuksanovic, Petia Radeva, et al.

Volume Details

Date Published: 27 February 2015
Contents: 10 Sessions, 83 Papers, 0 Presentations
Conference: Seventh International Conference on Machine Vision (ICMV 2014)
Volume Number: 9445

Table of Contents

  • Front Matter: Volume 9445
  • Pattern Recognition
  • Feature Detection and Target Tracking
  • Image Processing
  • Image Analysis and Information Encryption
  • Modeling and Visualization
  • Video Analysis and Processing
  • Medical Signal Processing
  • Signal Processing
  • Information Systems and Image Processing Applications
Front Matter: Volume 9445
This PDF file contains the front matter associated with SPIE Proceedings Volume 9445, including the Title Page, Copyright information, Table of Contents, and Conference Committee listing.
Pattern Recognition
Road shape recognition based on scene self-similarity
Vassili V. Postnikov, Darya A. Krohina, Victor E. Prun
A method for determining the road shape and direction is proposed. The road can have a curved shape and may be seen unclearly due to weather effects or relief features. The proposed method uses video from a frontal camera rigidly mounted in a car as input data. The method is based on the self-similarity of a typical road image, i.e. a smaller image inside the road region is close to a downscaled version of the initial image.
A speech recognition system based on hybrid wavelet network including a fuzzy decision support system
Olfa Jemai, Ridha Ejbali, Mourad Zaied, et al.
This paper aims at developing a novel approach for speech recognition based on wavelet networks learnt by the fast wavelet transform (FWN) and including a fuzzy decision support system (FDSS). Our contributions are twofold. First, we propose a novel learning algorithm for speech recognition based on the fast wavelet transform (FWT), which has many advantages compared to other algorithms and solves major problems of previous works in computing connection weights. Previously, the weights were determined by a direct solution requiring matrix inversion, which may be computationally intensive; the new algorithm instead computes the connection weights by iterative application of the FWT. Second, we propose a new classification scheme for this speech recognition system: it emulates a human reasoning mode, employing an FDSS to compute similarity degrees between test and training signals. Extensive empirical experiments were conducted to compare the proposed approach with other approaches. The obtained results show that the new speech recognition system performs better than previously established ones.
Apply lightweight recognition algorithms in optical music recognition
Viet-Khoi Pham, Hai-Dang Nguyen, Tung-Anh Nguyen-Khac, et al.
The digitalization of musical scores and their transformation into machine-readable format help people enjoy music, learn music, conserve music sheets, and even assist music composers, but the results of existing methods still require improvements for higher accuracy. Therefore, the authors propose lightweight algorithms for Optical Music Recognition to help people recognize and automatically play musical scores. In our proposal, after removing staff lines and extracting symbols, each music symbol is represented as a grid of identical M × N cells, and the features are extracted and classified with multiple lightweight SVM classifiers. Through experiments, the authors find that a grid of 10 × 12 cells yields the highest precision. Experimental results on a dataset of 4929 music symbols taken from 18 modern music sheets in the Synthetic Score Database show that the proposed method classifies printed musical scores with accuracy up to 99.56%.
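As a rough illustration of the grid-based features described above, the sketch below resizes a binary symbol image to a 10 × 12 grid of per-cell ink densities and feeds the result to a linear SVM; the function names, cell size, and the use of OpenCV and scikit-learn are illustrative assumptions, not the authors' exact pipeline.

```python
import numpy as np
import cv2
from sklearn.svm import LinearSVC

def grid_features(symbol_img, rows=12, cols=10):
    """Represent a binary symbol image as a rows x cols grid of cell densities."""
    binary = (symbol_img > 0).astype(np.float32)
    # INTER_AREA averages pixels, so each output cell holds its mean ink density
    return cv2.resize(binary, (cols, rows), interpolation=cv2.INTER_AREA).ravel()

# Hypothetical usage, assuming lists of symbol images and class labels:
# clf = LinearSVC().fit([grid_features(s) for s in train_symbols], train_labels)
# predicted = clf.predict([grid_features(s) for s in test_symbols])
```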
Novel palmprint representations for palmprint recognition
Hengjian Li, Jiwen Dong, Jinping Li, et al.
In this paper, we propose a novel palmprint recognition algorithm. First, the palmprint images are represented by anisotropic filters. The filters are built on Gaussian functions along one direction and on the second derivative of Gaussian functions in the orthogonal direction; this choice is motivated by the optimal joint spatial and frequency localization of the Gaussian kernel. Therefore, they can better approximate the edges or lines of palmprint images. A palmprint image is processed with a bank of anisotropic filters at different scales and rotations for robust palmprint feature extraction. Once these features are extracted, subspace analysis is applied to the feature vectors for dimension reduction as well as class separability. Experimental results on a public palmprint database show that the accuracy is improved by the proposed representations compared with Gabor filters.
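The following minimal sketch constructs one such anisotropic kernel: a Gaussian along one axis and the second derivative of a Gaussian along the orthogonal axis, rotated by an angle. The kernel size and sigmas are assumed values for illustration.

```python
import numpy as np

def anisotropic_kernel(size=31, sigma_u=6.0, sigma_v=2.0, theta=0.0):
    """Gaussian along u, second derivative of a Gaussian along v, rotated by theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    u = x * np.cos(theta) + y * np.sin(theta)   # smoothing direction
    v = -x * np.sin(theta) + y * np.cos(theta)  # edge/line-sensitive direction
    g_u = np.exp(-u**2 / (2 * sigma_u**2))
    # analytic second derivative of a 1-D Gaussian along v
    g_vv = (v**2 / sigma_v**4 - 1.0 / sigma_v**2) * np.exp(-v**2 / (2 * sigma_v**2))
    kernel = g_u * g_vv
    return kernel - kernel.mean()  # zero mean: flat image regions respond with 0

# A filter bank is then obtained by varying theta and the sigmas, and each
# kernel is applied with, e.g., cv2.filter2D before subspace analysis.
```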
Feature integration with random forests for real-time human activity recognition
Hirokatsu Kataoka, Kiyoshi Hashimoto, Yoshimitsu Aoki
This paper presents an approach for real-time human activity recognition. Three different kinds of features (flow, shape, and a keypoint-based feature) are applied in activity recognition. We use random forests for feature integration and activity classification: a forest is created for each feature and performs as a weak classifier. The International Classification of Functioning, Disability and Health (ICF) proposed by the WHO is applied in order to set a novel class definition for activity recognition. Experiments on human activity recognition using the proposed framework show 99.2% (Weizmann action dataset), 95.5% (KTH human actions dataset), and 54.6% (UCF50 dataset) recognition accuracy at real-time processing speed. The feature integration and activity-class definition allow us to accomplish high-accuracy recognition matching the state-of-the-art in real time.
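A minimal sketch of the per-feature forest integration, assuming one precomputed feature matrix per feature type and scikit-learn's RandomForestClassifier; averaging the per-forest class probabilities is one plausible integration rule, not necessarily the authors' exact one.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_forests(feature_sets, y):
    """One forest per feature type (flow, shape, keypoint), each a weak classifier."""
    return [RandomForestClassifier(n_estimators=100).fit(X, y) for X in feature_sets]

def predict_integrated(forests, feature_sets):
    """Integrate the weak classifiers by averaging their class probabilities."""
    probs = [f.predict_proba(X) for f, X in zip(forests, feature_sets)]
    return np.mean(probs, axis=0).argmax(axis=1)

# Hypothetical usage with assumed matrices X_flow, X_shape, X_kp and labels y:
# forests = train_forests([X_flow, X_shape, X_kp], y)
# labels = predict_integrated(forests, [X_flow_test, X_shape_test, X_kp_test])
```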
Diamond recognition algorithm using two-channel x-ray radiographic separator
Dmitry P. Nikolaev, Andrey Gladkov, Timofey Chernov, et al.
In this paper, a real-time classification method for two-channel X-ray radiographic diamond separation is discussed. The proposed method does not require direct hardware calibration but uses sample images as a training dataset. It includes an online dynamic time warping algorithm for inter-channel synchronization. Additionally, algorithms for online source signal control are discussed, including X-ray intensity control, optical noise detection and sensor occlusion detection.
Comparison of two algorithm modifications for projective-invariant recognition of plane boundaries with one concavity
Natalia Pritula, Dmitrii P. Nikolaev, Alexander Sheshkus, et al.
In this paper we present two modifications of an algorithm for projective-invariant recognition of plane boundaries with one concavity. The input images are created with an orthographic pinhole camera with a fixed focal length, so the variety of possible projective transformations is limited. The first modification considers the task in a more general setting, while the other uses prior information about the camera model. We test the hypothesis that the second modification has better accuracy. Results of around 20000 numerical experiments that confirm the hypothesis are included.
Improving text recognition by distinguishing scene and overlay text
Bernhard Quehl, Haojin Yang, Harald Sack
Video texts are closely related to the content of a video. They provide a valuable source for indexing and interpretation of video data. Text detection and recognition tasks in images or videos typically distinguish between overlay and scene text. Overlay text is artificially superimposed on the image at the time of editing, while scene text is captured by the recording system. Typically, OCR systems are specialized for one kind of text type; in video images, however, both types of text can be found. In this paper, we propose a method to automatically distinguish between overlay and scene text in order to dynamically control and optimize the post-processing steps that follow text detection. Based on a feature combination, a Support Vector Machine (SVM) is trained to classify scene and overlay text. We show how this distinction between overlay and scene text improves the word recognition rate. The accuracy of the proposed methods has been evaluated using publicly available test data sets.
LBP and SIFT based facial expression recognition
Omer Sumer, Ece Olcay Gunes
This study compares the performance of local binary patterns (LBP) and the scale invariant feature transform (SIFT) with support vector machines (SVM) in automatic classification of discrete facial expressions. Facial expression recognition is a multiclass classification problem with seven classes: happiness, anger, sadness, disgust, surprise, fear and contempt. Using SIFT feature vectors and a linear SVM, 93.1% mean accuracy is achieved on the CK+ database. The performance of the LBP-based classifier with a linear SVM is reported on SFEW using the strictly person independent (SPI) protocol; the seven-class mean accuracy on SFEW is 59.76%. Experiments on both databases showed that LBP features can be quite descriptive if a good localization of facial points and a suitable partitioning strategy are followed.
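For illustration, a block-wise uniform LBP histogram of a face image, in the spirit of the partitioning strategy mentioned above, can be computed as follows; the block count and LBP parameters are assumptions, not the paper's settings.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray_face, n_blocks=7, P=8, R=1):
    """Uniform LBP histograms over an n_blocks x n_blocks partition, concatenated."""
    lbp = local_binary_pattern(gray_face, P, R, method='uniform')
    n_bins = P + 2  # P+1 uniform pattern codes plus one bin for non-uniform ones
    h, w = lbp.shape
    feats = []
    for i in range(n_blocks):
        for j in range(n_blocks):
            block = lbp[i * h // n_blocks:(i + 1) * h // n_blocks,
                        j * w // n_blocks:(j + 1) * w // n_blocks]
            hist, _ = np.histogram(block, bins=n_bins, range=(0, n_bins))
            feats.append(hist / max(hist.sum(), 1))  # per-block normalization
    return np.concatenate(feats)

# The concatenated histogram would then be fed to a linear SVM classifier.
```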
A comparative study of local descriptors for Arabic character recognition on mobile devices
Maroua Tounsi, Ikram Moalla, Adel M. Alimi, et al.
Nowadays, the number of mobile applications based on image registration and recognition is increasing. Among the most interesting applications are mobile translators, which can read text characters in the real world and translate them into the native language instantaneously. In this context, we aim to recognize characters in natural scenes by computing significant points, so-called keypoints or interest points, in the image. It is therefore important to compare and evaluate feature descriptors in terms of matching accuracy and processing time in the particular context of natural scene images. In this paper, we compare the efficiency of binary features as alternatives to the traditional SIFT and SURF in matching Arabic characters extracted from natural scenes. We demonstrate that the binary descriptor ORB yields not only matching performance similar to the well-known SIFT but also faster computation suitable for mobile applications.
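A small benchmark in the spirit of this comparison, using OpenCV's ORB and SIFT detectors with brute-force matching; the file names are placeholders, and SIFT requires OpenCV 4.4 or later.

```python
import time
import cv2

# Hypothetical inputs: a character template and a natural-scene image
img1 = cv2.imread('char_template.png', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('scene.png', cv2.IMREAD_GRAYSCALE)

for name, det, norm in [('ORB', cv2.ORB_create(), cv2.NORM_HAMMING),
                        ('SIFT', cv2.SIFT_create(), cv2.NORM_L2)]:
    t0 = time.perf_counter()
    k1, d1 = det.detectAndCompute(img1, None)
    k2, d2 = det.detectAndCompute(img2, None)
    # cross-checked brute-force matching keeps only mutual nearest neighbors
    matches = cv2.BFMatcher(norm, crossCheck=True).match(d1, d2)
    print(f'{name}: {len(matches)} matches in {time.perf_counter() - t0:.3f} s')
```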
Feature Detection and Target Tracking
Object detection using categorised 3D edges
Lilita Kiforenko, Anders Glent Buch, Leon Bodenhagen, et al.
In this paper we present an object detection method that uses edge categorisation in combination with a local multi-modal histogram descriptor, all based on RGB-D data. Our target application is robust detection and pose estimation of known objects. We propose to apply a recently introduced edge categorisation algorithm for describing objects in terms of their different edge types. Relying on edge information allows our system to deal with objects with little or no texture or surface variation. We show that edge categorisation improves matching performance due to the higher level of discrimination, which is made possible by the explicit use of edge categories in the feature descriptor. We quantitatively compare our approach with the state-of-the-art template-based Linemod method, which also provides an effective way of dealing with texture-less objects; tests were performed on our own object dataset. Our results show that detection based on the edge local multi-modal histogram descriptor outperforms Linemod with a significantly smaller number of templates.
Information based universal feature extraction
Mohammad Amiri, Rüdiger Brause
In many real-world image-based pattern recognition tasks, the extraction and usage of task-relevant features is the most crucial part of the diagnosis. In the standard approach, features mostly remain task-specific, although humans who perform such tasks always use the same image features, trained in early childhood. It seems that universal feature sets exist, but they have not yet been systematically found. In our contribution, we tried to find those universal image feature sets that are valuable for most image-related tasks. In our approach, we trained a neural network on natural and non-natural images of objects and background, using a Shannon information-based algorithm and learning constraints. The goal was to extract those features that give the most valuable information for the classification of visual objects and hand-written digits. This gives a good start and a performance increase for other image learning tasks, implementing a transfer learning approach. As a result, we found that we could indeed extract features which are valid in all three kinds of tasks.
A rotation invariant local Zernike moment based interest point detector
Gökhan Özbulak, Muhittin Gökmen
Detection of interest points in an image is an important phase of the object detection problem in computer vision, and corners are good candidates for such interest points. In this study, by optimizing Ghosal's corner model based on local Zernike moments (LZM) and using the LZM representation presented by Sariyanidi et al., a rotation-invariant interest point detector is proposed. The performance of the proposed detector is evaluated on Mikolajczyk's dataset prepared for rotation invariance, and our method outperforms well-known methods such as SIFT and SURF in terms of the repeatability criterion.
Stereoscopic roadside curb height measurement using V-disparity
Florin Octavian Matu, Iskren Vlaykov, Mikkel Thogersen, et al.
Managing road assets, such as roadside curbs, is one of the interests of municipalities. As an interesting application of computer vision, this paper proposes a system for automated measurement of the height of roadside curbs. The developed system uses the spatial information available in the disparity image obtained from a stereo setup. Data about the geometry of the scene is extracted in the form of a row-wise histogram of the disparity map. By parameterizing the two strongest lines, each pixel can be labeled as belonging to one plane: ground, sidewalk or curb candidate. Experimental results show that the system can measure the height of the roadside curb with good accuracy and precision.
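A minimal sketch of the row-wise disparity histogram (the v-disparity representation) underlying this kind of method; the maximum disparity and the line-extraction step are assumptions.

```python
import numpy as np

def v_disparity(disparity, max_disp=128):
    """Row-wise histogram of a disparity map: one disparity histogram per image row."""
    rows = disparity.shape[0]
    vdisp = np.zeros((rows, max_disp), dtype=np.int32)
    for r in range(rows):
        d = disparity[r]
        d = d[(d > 0) & (d < max_disp)]  # drop invalid/out-of-range disparities
        vdisp[r] = np.bincount(d.astype(np.int32), minlength=max_disp)[:max_disp]
    return vdisp

# Dominant straight lines in the v-disparity image (found e.g. with a Hough
# transform) correspond to the ground and sidewalk planes; the row offset
# between them relates to the curb height.
```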
Automatic enrollment for gait-based person re-identification
Javier Ortells, Raúl Martín-Félez, Ramón A. Mollineda
Automatic enrollment involves a critical decision-making process within the person re-identification context; however, this process has traditionally been undervalued. This paper studies the problem of automatic person enrollment from a realistic perspective relying on gait analysis. Experiments simulating random flows of people with considerable appearance variations between different observations of a person have been conducted, modeling both short- and long-term scenarios. Promising results based on ROC analysis show that automatically enrolling people by their gait is feasible with high success rates.
Feature extraction of probe mark image and automatic detection of probing pad defects in semiconductor using CSVM
Jeong-Hoon Lee, Jee-Hyong Lee
As the semiconductor micro-fabrication process continues to advance, the probing pads in a chip also become smaller. A probe needle contacts each probing pad for electrical testing; however, the probe needle may touch the probing pad incorrectly. Such contact failures damage probing pads and cause qualification problems. In order to detect contact failures, the current system inspects the probing marks on pads, but due to its low accuracy, engineers have to redundantly verify the system's results once more, which lowers efficiency. We suggest an approach for automatic defect detection that solves these problems using image processing and CSVM. We develop significant features of probing marks to classify contact failures more correctly, and reduce the workload of engineers by 38%.
Pedestrian detection system based on HOG and a modified version of CSS
Daniel Luis Cosmo, Evandro Ottoni Teatini Salles, Patrick Marques Ciarelli
This paper describes a complete pedestrian detection system based on sliding windows. Two feature extraction techniques are used: HOG (Histogram of Oriented Gradients) and CSS (Color Self-Similarities); windows are classified with a linear SVM (Support Vector Machine). Besides these techniques, we use mean shift and hierarchical clustering to fuse multiple overlapping detections. The results obtained on the INRIA Person dataset show that the proposed system, using only HOG descriptors, achieves better results than similar systems, with a log-average miss rate of 43% against 46%, due to cropping the final detections to better fit the modified annotations. The addition of the modified CSS increases the efficiency of the system, leading to a log-average miss rate of 39%.
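For reference, OpenCV ships a stock HOG plus linear SVM pedestrian detector that realizes the HOG part of such a pipeline; the CSS channel and the clustering-based fusion of the paper are not included in this sketch, and the input file name is a placeholder.

```python
import cv2

# Stock HOG descriptor with OpenCV's pretrained linear SVM people detector
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

img = cv2.imread('street.png')  # hypothetical input frame
# Sliding-window detection over an image pyramid
boxes, weights = hog.detectMultiScale(img, winStride=(8, 8), scale=1.05)
for (x, y, w, h) in boxes:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

# Overlapping detections would then be fused, e.g. with mean shift or
# hierarchical clustering as the paper does, before evaluation.
```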
Image boundaries detection: from thresholding to implicit curve evolution
Souleymane Balla-Arabé, Vincent Brost, Fan Yang
The development of high-dimensional large-scale imaging devices increases the need for fast, robust and accurate image segmentation methods. Due to its intrinsic advantages, such as the ability to extract complex boundaries while handling topological changes automatically, the level set method (LSM) has been widely used in boundary detection. Nevertheless, its computational complexity limits its use in real-time systems. Furthermore, most LSMs share the limitation of very often converging to a local minimum, while the effectiveness of many computer vision applications depends on the whole image boundaries. In this paper, using the image thresholding and implicit curve evolution frameworks, we design a novel boundary detection model which handles the above drawbacks of LSMs. In order to accelerate the method on graphics processing units, we use the explicit and highly parallelizable lattice Boltzmann method to solve the level set equation. The introduced algorithm is fast and achieves global image segmentation in a spectacular manner. Experimental results on various kinds of images demonstrate the effectiveness and efficiency of the proposed method.
Image Processing
Improving parametric active contours by using object center of gravity distance map
A. Marouf, A. Houacine
In this paper, we propose an improvement of classical parametric active contours. The method presented here consists in adding a new energy term based on a distance map to the object's center of gravity. This additional term acts as an attraction force that constrains the contour to remain in the vicinity of the object. The distance map introduced here differs from the classical one since it is not based on a binary image, but rather constitutes a simplified and very fast version that relates only to one point, defined as the expected center of gravity of the object. The additional forces act as a kind of balloon method with improved convergence. The method is evaluated for object segmentation in images, and also for object tracking; the center of gravity is computed from the initial contour for each image of the sequence considered. Compared to the balloon method, the presented approach appears to be faster and less prone to loops, and it behaves better for object tracking.
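A sketch of the single-point distance map described above, computed from the current contour rather than from a binary image; the way the map enters the snake energy is an assumed illustration.

```python
import numpy as np

def centroid_distance_map(contour_points, shape):
    """Distance of every pixel to the expected object center of gravity,
    estimated as the centroid of the current contour (rows, cols) points."""
    cy = contour_points[:, 0].mean()
    cx = contour_points[:, 1].mean()
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    return np.hypot(yy - cy, xx - cx)

# Hypothetical use inside a snake iteration: an extra energy term such as
#   E_center(p) = w * dist_map[p]
# penalizes contour points that drift far from the expected center, pulling
# the contour back toward the object like an improved balloon force.
```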
Improving color image segmentation by spatial-color pixel clustering
Henryk Palus, Mariusz Frackiewicz
Image segmentation is one of the most difficult steps in the computer vision process, and pixel clustering is only one among many techniques used for it. In this paper a new segmentation technique is proposed that performs clustering in the five-dimensional feature space built from three color components and two spatial coordinates. The advantages of taking the image structure into account in pixel clustering are shown. The proposed 5D k-means technique requires, similarly to other segmentation techniques, additional postprocessing to eliminate oversegmentation. Our approach is evaluated on different simple and complex images.
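A minimal sketch of 5D clustering in (R, G, B, x, y) space with scikit-learn's k-means; the spatial weighting knob is an assumption, not the paper's exact scaling.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_5d(image, k=6, spatial_weight=1.0):
    """Cluster pixels in (R, G, B, x, y) space; spatial_weight balances
    color similarity against spatial proximity (assumed parameter)."""
    h, w, _ = image.shape
    yy, xx = np.mgrid[0:h, 0:w]
    feats = np.column_stack([image.reshape(-1, 3).astype(np.float64),
                             spatial_weight * xx.ravel(),
                             spatial_weight * yy.ravel()])
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(feats)
    return labels.reshape(h, w)  # per-pixel segment labels

# Small regions in the label map would then be merged in a postprocessing
# step to counter oversegmentation, as the paper notes.
```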
Segmentation of color images using genetic algorithm with image histogram
P. Sneha Latha, Pawan Kumar, Samruddhi Kahu, et al.
This paper proposes a family of color image segmentation algorithms using a genetic approach and a color similarity threshold in terms of just-noticeable difference. Instead of segmenting and then optimizing, the proposed technique directly uses a GA for optimized segmentation of color images. Applying a GA to larger color images is computationally heavy, so it is applied to a 4D color image histogram table. The performance of the proposed algorithms is benchmarked on the BSD dataset against color histogram based segmentation and the Fuzzy C-means algorithm using the Probabilistic Rand Index (PRI). The proposed algorithms yield better analytical and visual results.
Interactive object segmentation using color similarity based nearest neighbor regions mergence
Jun Zhang, Qieshi Zhang
Effective object segmentation is an important task in computer vision. Since automatic image segmentation has difficulty separating objects from natural scenes, an interactive approach becomes a good solution. In this paper, a color similarity measure based region mergence approach with interactive operation is proposed. Some local regions belonging to the background and to the object need to be interactively marked. To judge whether two adjacent regions should be merged, a color similarity measure is proposed with the help of the marks. Merging is executed based on the marks in the background: the two regions with maximum similarity are merged until all candidate regions have been examined. Consequently, the object is segmented by ignoring the merged background. Experiments prove that the proposed method can obtain more accurate results in natural scenes.
Image detection and compression for memory efficient system analysis
Advances in digital signal processing have been progressing towards efficient use of memory and processing. Both of these factors can be exploited by feasible image storage techniques that compute the minimum information of an image, which enhances later processing. The Scale Invariant Feature Transform (SIFT) can be utilized to estimate and retrieve an image. In computer vision, SIFT can be implemented to recognize an image by comparing its key features with saved SIFT keypoint descriptors. The main advantage of SIFT is that it not only removes redundant information from an image but also reduces the keypoints by matching their orientation and adding them together in different windows of the image [1]. Another key property of this approach is that it works more efficiently on highly contrasted images, because its design is based on collecting keypoints from the contrast shades of the image.
FPGA based image processing for optical surface inspection with real time constraints
Ylber Hasani, Ernst Bodenstorfer, Jörg Brodersen, et al.
Today, high-quality printing products like banknotes, stamps, or vouchers are automatically checked by optical surface inspection systems. In a typical optical surface inspection system, several digital cameras acquire the printing products at fine resolution from different viewing angles and at multiple wavelengths of the visible and near-infrared spectrum of light. The cameras deliver data streams with a huge amount of image data that have to be processed by an image processing system in real time. Due to the printing industry's demand for higher throughput together with the necessity to check finer details of the print and its security features, the data rates to be processed tend to explode. In this contribution, a solution is proposed where the image processing load is distributed between FPGAs and digital signal processors (DSPs) in such a way that the strengths of both technologies can be exploited. The focus lies upon the implementation of image processing algorithms in an FPGA and its advantages. In the presented application, FPGA-based image preprocessing enables real-time implementation of an optical color surface inspection system with a spatial resolution of 100 μm and for object speeds over 10 m/s. For the implementation of image processing algorithms in the FPGA, pipeline parallelism with clock frequencies up to 150 MHz, together with spatial parallelism based on multiple instantiations of modules for parallel processing of multiple data streams, is exploited for processing the image data of two cameras and three color channels. Due to their flexibility and fast response times, it is shown that FPGAs are ideally suited for realizing a configurable all-digital PLL for the processing of camera line-trigger signals with frequencies of about 100 kHz, using pure synchronous digital circuit design.
Selection of optimal wavelet bases for image compression using SPIHT algorithm
Maria Rehman, Imran Touqir, Wajiha Batool
This paper presents the performance of several wavelet bases in SPIHT coding. Two types of wavelet bases are tested with the SPIHT algorithm: orthogonal and biorthogonal. The results of using the coefficients of these bases are compared on the basis of compression ratio and peak signal-to-noise ratio. The paper shows that biorthogonal wavelet bases perform better than orthogonal ones; among the biorthogonal wavelets, bior4.4 shows good results in SPIHT coding.
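SPIHT itself is not readily available in common Python libraries, so the sketch below approximates the basis comparison by keeping only the largest wavelet coefficients with PyWavelets and measuring PSNR; the keep fraction and decomposition level are assumptions, and this thresholding is a crude stand-in for the actual SPIHT bit allocation.

```python
import numpy as np
import pywt

def psnr(a, b):
    """Peak signal-to-noise ratio between two 8-bit images."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return 10 * np.log10(255.0 ** 2 / mse)

def compress(img, wavelet, keep=0.05):
    """Keep only the largest `keep` fraction of wavelet coefficients."""
    coeffs = pywt.wavedec2(img.astype(np.float64), wavelet, level=4)
    arr, slices = pywt.coeffs_to_array(coeffs)
    thresh = np.quantile(np.abs(arr), 1 - keep)
    arr[np.abs(arr) < thresh] = 0  # discard small coefficients
    rec = pywt.array_to_coeffs(arr, slices, output_format='wavedec2')
    return pywt.waverec2(rec, wavelet)

# Comparing an orthogonal and a biorthogonal basis on an image `img`:
# for w in ['db4', 'bior4.4']:
#     print(w, psnr(img, compress(img, w)[:img.shape[0], :img.shape[1]]))
```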
The effect of deorientation angle on polarimetric SAR image decomposition
Boussad Azmedroub, Mounira Ouarzeddine, Boularbah Souissi
Polarimetric image decomposition is nowadays among the most important applications of multi-polarization, multi-frequency SAR radar images. With the growth of new satellite missions equipped with fully polarimetric modes, there is a strong need for accurate methods and new approaches to handle the huge amount of data coming from different airborne and spaceborne missions and to better understand the several different mechanisms that occur in a resolution cell. In this paper we are interested in polarimetric SAR image decomposition, and we compare the Yamaguchi decomposition, also called the four-component decomposition, before and after image compensation for the orientation angle. The orientation angle directly affects the scattering mechanisms and induces errors in the decomposition results, especially in urban areas with complex structures. We demonstrate with power profiles and RGB color composite images that the volume scattering contribution decreases drastically after deorientation, whereas the helix scattering contribution is not sensitive to orientation. The test site is situated in the north of Algiers and the satellite data is a fully polarimetric acquisition in C band. The results are in high agreement with the Google Earth optical image.
Atmospheric correction of hyperspectral images based on approximate solution of transmittance equation
A. M. Belov, V. V. Myasnikov
The paper presents a method for atmospheric correction of remote sensing hyperspectral images. The method is based on an approximate solution of the MODTRAN transmittance equation using simultaneous analysis of a remote sensing hyperspectral image and an "ideal" hyperspectral image which is free from atmospheric distortions. Experimental results show that the proposed method is applicable for performing atmospheric correction.
Independent transmission of sign language interpreter in DVB: assessment of image compression
Petr Zatloukal, Martin Bernas, Lukáš Dvořák
Sign language on television provides information to the deaf that they cannot get from the audio content. If the sign language interpreter is transmitted over an independent data stream, the aim is to ensure sufficient intelligibility and subjective image quality of the interpreter with a minimum bit rate. This work deals with ROI-based video compression of a Czech sign language interpreter implemented in the x264 open source library. The results of this approach are verified in subjective tests with the deaf. The tests examine the intelligibility of sign language expressions containing minimal pairs for different levels of compression and various resolutions of the image with the interpreter, and evaluate the subjective quality of the final image for a good viewing experience.
Orthogonal wavelet moments and their multifractal invariants
Dm. V. Uchaev, D. V. Uchaev, V. A. Malinnikov
This paper introduces a new family of moments, namely orthogonal wavelet moments (OWMs), which are an orthogonal realization of wavelet moments (WMs). In contrast to WMs with a nonorthogonal kernel function, these moments can be used for multiresolution image representation and image reconstruction. The paper also introduces multifractal invariants (MIs) of OWMs which can be used instead of OWMs. Reconstruction tests performed with noise-free and noisy images demonstrate that MIs of OWMs can also be used for image smoothing, sharpening and denoising. It is established that the reconstruction quality for MIs of OWMs can be better than that of the corresponding orthogonal moments (OMs), and reduces to the reconstruction quality of the OMs if the zero scale level is used.
Image Analysis and Information Encryption
High-accurate and noise-tolerant texture descriptor
Alireza Akoushideh, Babak Mazloom-Nezhad Maybodi
In this paper, we extend the pyramid transform domain approach to local binary patterns (PLBP) to create a highly accurate and noise-tolerant texture descriptor. We combine PLBP information of sub-band images, obtained using the wavelet transform, at different resolutions and construct several new descriptors. Multi-level and -resolution LBP (MPR_LBP), multi-level and -band LBP (MPB_LBP), and multi-level, -band and -resolution LBP (MPBR_LBP) are the proposed descriptors, which are applied to unsupervised classification of texture images on the Outex, UIUC, and Scene-13 data sets. Experimental results show that the proposed descriptors not only achieve acceptable texture classification accuracy with significantly lower feature length, but are also more noise-robust than a number of recent state-of-the-art LBP extensions.
Fusing the RGB channels of images for maximizing the between-class distances
Ali Güneş, Efkan Durmuş, Habil Kalkan, et al.
In many machine vision applications, objects or scenes are imaged in color (red, green and blue) but then transformed into grayscale images before processing. One can use equal weights for the contribution of the color components to the grayscale image, or the unequal weights provided by the luminance mapping of the National Television Standards Committee (NTSC) standard. NTSC weights, which basically enhance the visual properties of images, may not perform well for classification purposes. In this study, we propose an adaptive color-to-grayscale conversion approach which increases the accuracy of image classification. The method optimizes the contribution of the color components so as to increase the between-class distances of images in opposing classes. The experimental results show that the proposed method increases the distances between the images of different classes by 1% to 87% depending on the dataset, which results in classification accuracy increases between 1% and 4% on benchmark classifiers.
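A toy version of the idea, assuming two lists of RGB images, one per class: weighted grayscale conversion followed by a between-class centroid distance that a simple search could maximize. The search strategy and the exact separability criterion are illustrative assumptions.

```python
import numpy as np

def between_class_distance(weights, imgs_a, imgs_b):
    """Distance between class centroids after weighted RGB-to-gray mapping."""
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()  # weights are normalized to sum to 1
    mean_gray = lambda imgs: np.array([(img @ w).mean() for img in imgs])
    return abs(mean_gray(imgs_a).mean() - mean_gray(imgs_b).mean())

# A naive grid search over candidate weight triples (a, b, 1-a-b):
# best = max(((a, b, 1 - a - b)
#             for a in np.arange(0, 1.01, 0.05)
#             for b in np.arange(0, 1.01 - a, 0.05)),
#            key=lambda w: between_class_distance(w, class0_imgs, class1_imgs))
```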
Auto-SEIA: simultaneous optimization of image processing and machine learning algorithms
Valentina Negro Maggio, Luca Iocchi
Object classification from images is an important task for machine vision and a crucial ingredient for many computer vision applications, ranging from security and surveillance to marketing. Image-based object classification techniques properly integrate image processing and machine learning (i.e., classification) procedures. In this paper we present a system for automatic simultaneous optimization of algorithms and parameters for object classification from images. More specifically, the proposed system is able to process a dataset of labelled images and to return the best configuration of image processing and classification algorithms, and of their parameters, with respect to classification accuracy. Experiments with real public datasets are used to demonstrate the effectiveness of the developed system.
An approach for combining multiple descriptors for image classification
Duc Toan Tran, Bart Jansen, Rudi Deklerck, et al.
Recently, efficient image descriptors have shown promise for image classification tasks. Moreover, methods based on the combination of multiple image features provide better performance than methods based on a single feature. This work presents a simple and efficient approach for combining multiple image descriptors. We first employ a Naive-Bayes Nearest-Neighbor scheme to evaluate four widely used descriptors. For all features, "Image-to-Class" distances are computed directly, without descriptor quantization. Since distances measured by different metrics can be of different nature and may not be on the same numerical scale, a normalization step is essential to transform them into a common domain prior to combining them. Our experiments conducted on a challenging database indicate that z-score normalization followed by a simple sum-of-distances fusion technique can significantly improve the performance compared to using individual features. We also observed that our experimental results on the Caltech 101 dataset outperform previous results.
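The fusion rule itself is compact; below is a minimal sketch assuming precomputed Image-to-Class distance matrices, one per descriptor, each of shape (queries, classes).

```python
import numpy as np

def fuse_distances(distance_sets):
    """Z-score normalize each descriptor's Image-to-Class distances, then sum.

    distance_sets: list of arrays, each of shape (n_queries, n_classes)."""
    fused = np.zeros_like(distance_sets[0], dtype=np.float64)
    for d in distance_sets:
        fused += (d - d.mean()) / d.std()  # map every metric to a common scale
    return fused.argmin(axis=1)  # predicted class = smallest fused distance

# Hypothetical usage with four descriptor distance matrices:
# labels = fuse_distances([d_sift, d_surf, d_gist, d_color])
```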
Sub-word image clustering in Farsi printed books
Mohammad Reza Soheili, Ehsanollah Kabir, Didier Stricker
Most OCR systems are designed for the recognition of a single page. In the case of unfamiliar font faces, low-quality paper and degraded prints, the performance of these products drops sharply. However, an OCR system can use the redundancy of word occurrences in large documents to improve recognition results. In this paper, we propose a sub-word image clustering method for applications dealing with large printed documents. We assume that the whole document is printed in a single unknown font with low print quality. Our method finds clusters of equivalent sub-word images with an incremental algorithm. Due to the low print quality, we propose an image matching algorithm for measuring the distance between two sub-word images based on the Hamming distance and the ratio of the area to the perimeter of the connected components. We built a ground-truth dataset of more than 111000 sub-word images to evaluate our method. All of these images were extracted from an old Farsi book. We cluster all of these sub-words, including isolated letters and even punctuation marks, and then label all centers of the created clusters manually. We show that all sub-words of the book can be recognized with more than 99.7% accuracy by assigning the label of each cluster center to all of its members.
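A sketch of a distance of this kind between two binary sub-word images, combining a normalized Hamming distance with the difference of area-to-perimeter ratios of the connected components; the weighting beta is an assumed value, not the authors' tuned combination.

```python
import numpy as np
import cv2

def subword_distance(a, b, beta=0.5):
    """Distance between two binary sub-word images (nonzero = ink)."""
    # pad both images to a common canvas so they can be compared pixel-wise
    h, w = max(a.shape[0], b.shape[0]), max(a.shape[1], b.shape[1])
    pa = np.zeros((h, w), np.uint8); pa[:a.shape[0], :a.shape[1]] = a > 0
    pb = np.zeros((h, w), np.uint8); pb[:b.shape[0], :b.shape[1]] = b > 0
    hamming = np.count_nonzero(pa != pb) / float(h * w)

    def area_perimeter_ratio(img):
        contours, _ = cv2.findContours(img, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_NONE)
        perimeter = sum(cv2.arcLength(c, True) for c in contours)
        return img.sum() / perimeter if perimeter > 0 else 0.0

    return hamming + beta * abs(area_perimeter_ratio(pa) - area_perimeter_ratio(pb))
```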
Vehicle passes detector based on multi-sensor analysis
D. Bocharov, D. Sidorchuk, I. Konovalenko, et al.
This study deals with a new approach to the problem of detecting vehicle passes in a vision-based automatic vehicle classification system. Essential non-affine image variations and signals from an induction loop are the events that can be considered as indicators of an object's presence. We propose several vehicle detection techniques based on image processing and induction loop signal analysis, and we suggest a combined method based on multi-sensor analysis to improve vehicle detection performance. Experimental results in complex outdoor environments show that the proposed multi-sensor algorithm is effective for vehicle detection.
Vision-based industrial automatic vehicle classifier
Timur Khanipov, Ivan Koptelov, Anton Grigoryev, et al.
The paper describes an automatic video-stream-based motor vehicle classification system. The system determines the vehicle type at payment collection plazas on toll roads. Classification is performed in accordance with a preconfigured set of rules which determine the type by the number of wheel axles, vehicle length, height over the first axle and full height. These characteristics are calculated using various computer vision algorithms: contour detectors, correlation analysis, the fast Hough transform, Viola-Jones detectors, connected components analysis, elliptic shape detectors and others. The input data contains video streams and induction loop signals. The output signals are vehicle enter and exit events, vehicle type, motion direction, speed and the above-mentioned features.
Automatic emotional expression analysis from eye area
Betül Akkoç, Ahmet Arslan
Eyes play an important role in expressing emotions in nonverbal communication. In the present study, emotional expression classification was performed based on features automatically extracted from the eye area. First, the face area and the eye area were automatically extracted from the captured image. Afterwards, the parameters to be used for the analysis were obtained from the eye area through the discrete wavelet transform. Using these parameters, emotional expression analysis was performed with artificial intelligence techniques. As the result of the experimental studies, six universal emotions consisting of expressions of happiness, sadness, surprise, disgust, anger and fear were classified at a success rate of 84% using artificial neural networks.
Interactive change detection based on dissimilarity image and decision tree classification
Yan Wang, Alain Crouzil, Jean-Baptiste Puel
Our study mainly focuses on detecting changed regions in two images of the same scene taken by digital cameras at different times. Images taken by digital cameras generally provide less information than multi-channel remote sensing images. Moreover, application-dependent insignificant changes, such as shadows or clouds, may cause the failure of classical methods based on image differences. The machine learning approach seems promising, but the lack of a sufficient volume of training data for photographic landscape observatories rules out many methods. We therefore investigate the interactive learning approach and provide a discriminative model based on a 16-dimensional feature space comprising textural appearance and contextual information. Dissimilarity measures in different neighborhood sizes are used to detect differences within the neighborhood of an image pair. To detect changes between two images, the user designates change and non-change samples (pixel sets) in the images using a selection tool. This data is used to train a classifier using a decision tree training method, which is then applied to all the other pixels of the image pair. The experiments have proved the potential of the proposed approach.
Modeling and Visualization
SubPatch: random kd-tree on a sub-sampled patch set for nearest neighbor field estimation
Fabrizio Pedersoli, Sergio Benini, Nicola Adami, et al.
We propose a new method to compute the approximate nearest-neighbor field (ANNF) between image pairs using a random kd-tree and patch set sub-sampling. By exploiting image coherence we demonstrate that it is possible to reduce the number of patches on which we compute the ANNF, while maintaining high overall accuracy in the final result. Information on missing patches is then recovered by interpolation and propagation of good matches. The introduction of the sub-sampling factor on patch sets also allows setting the desired trade-off between accuracy and speed, providing a flexibility that is lacking in state-of-the-art methods. Tests conducted on a public database prove that our algorithm achieves superior performance with respect to the PatchMatch (PM) and Coherence Sensitive Hashing (CSH) algorithms in comparable computational time.
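A minimal sketch of the sub-sampled nearest-neighbor field, with SciPy's exact kd-tree standing in for the randomized kd-tree of the paper; patches are assumed to be given as flattened rows of two arrays.

```python
import numpy as np
from scipy.spatial import cKDTree

def annf_subsampled(patches_a, patches_b, step=4):
    """Compute the NN field only for every `step`-th A-patch (sub-sampling),
    leaving the remaining entries to be filled by interpolation/propagation."""
    tree = cKDTree(patches_b)                 # index over all B-patches
    sub = np.arange(0, len(patches_a), step)  # sub-sampled query set
    _, nn = tree.query(patches_a[sub])        # nearest B-patch per sampled A-patch
    field = np.full(len(patches_a), -1, dtype=np.int64)
    field[sub] = nn  # -1 entries are later recovered from good neighbor matches
    return field

# Larger `step` trades accuracy for speed, mirroring the sub-sampling factor
# described in the abstract.
```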
Disparity estimation from monocular image sequence
Qieshi Zhang, Sei-ichiro Kamata
This paper proposes a novel method for estimating disparity accurately. To achieve an ideal result, an optimal adjusting framework is proposed to address noise, occlusions, and outliers. Different from typical multi-view stereo (MVS) methods, the proposed approach uses not only the color constraint but also the geometric constraint associating multiple frames of the image sequence. The results show disparity maps with good visual quality: most of the noise is eliminated, errors in occlusion areas are suppressed, and the details of scene objects are preserved.
Sparse decomposition learning based dynamic MRI reconstruction
Peifei Zhu, Qieshi Zhang, Sei-ichiro Kamata
Dynamic MRI is widely used for many clinical exams, but slow data acquisition is a serious problem. The application of Compressed Sensing (CS) has demonstrated great potential to increase imaging speed. However, the performance of CS largely depends on the sparsity of the image sequence in the transform domain, where there is still much to be improved. In this work, the sparsity is exploited by the proposed Sparse Decomposition Learning (SDL) algorithm, which is a combination of low-rank plus sparsity and Blind Compressed Sensing (BCS). With this decomposition, only the sparse component is modeled as a sparse linear combination of temporal basis functions. This makes the coefficients sparser and retains more details of the dynamic components compared to learning the whole images. Reconstruction is performed on the undersampled data, where joint multicoil data consistency is enforced by combining Parallel Imaging (PI). The experimental results show the proposed method decreases the Mean Square Error (MSE) by about 15-20% compared to other existing methods.
Video Analysis and Processing
Generalization of the Viola-Jones method as a decision tree of strong classifiers for real-time object recognition in video stream
A. Minkina, D. Nikolaev, S. Usilin, et al.
In this paper, we present a new modification of Viola-Jones complex classifiers. We describe a complex classifier in the form of a decision tree of strong classifiers and provide a method for training such classifiers. The performance impact of the tree structure is analyzed, and the precision and performance of the presented method are compared with those of the classical cascade. Various tree architectures are studied experimentally. The task of detecting vehicle wheels in images obtained from an automatic vehicle classification system is taken as an example.
A combined vision-inertial fusion approach for 6-DoF object pose estimation
Juan Li, Ana M. Bernardos, Paula Tarrío, et al.
The estimation of the 3D position and orientation of moving objects ('pose' estimation) is a critical process for many applications in robotics, computer vision and mobile services. Although major research efforts have been devoted to designing accurate, fast and robust indoor pose estimation systems, it remains an open challenge to provide a low-cost, easy-to-deploy and reliable solution. Addressing this issue, this paper describes a hybrid approach for 6 degrees of freedom (6-DoF) pose estimation that fuses acceleration data and stereo vision to overcome the respective weaknesses of single-technology approaches. The system relies on COTS technologies (standard webcams, accelerometers) and printable colored markers. It uses a set of infrastructure cameras, located so that the tracked object is visible most of the operation time; the target object has to include an embedded accelerometer and be tagged with a fiducial marker. This simple marker has been designed for easy detection and segmentation, and it may be adapted to different service scenarios (in shape and colors). Experimental results show that the proposed system provides high accuracy while satisfactorily dealing with the real-time constraints.
Classification of similar but differently paced activities in the KTH dataset
Shreeya Sengupta, Hui Wang, Piyush Ojha, et al.
The KTH video dataset [1] contains three activities - walking, jogging and running - which are very similar but are carried out at a different natural pace. We show that explicit inclusion of a feature which may be interpreted as a measure of the overall state of motion in a frame improves a classifier's ability to discriminate between these activities.
A new method for high-capacity information hiding in video robust against temporal desynchronization
Vitaly Mitekin, Victor A. Fedoseev
This paper presents a new method for high-capacity information hiding in digital video, along with algorithms for embedding and extracting hidden information based on this method. These algorithms do not require temporal synchronization to provide robustness against both malicious and non-malicious frame dropping (temporal desynchronization). At the same time, due to the randomized distribution of hidden information bits across the video frames, the proposed method allows increasing the hiding capacity proportionally to the number of frames used for information embedding. The proposed method is also robust against the "watermark estimation" attack aimed at estimating hidden information without knowing the embedding key or the non-watermarked video. The presented experimental results demonstrate the declared features of this method.
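The key-driven randomized layout can be sketched as follows: with the same key, the embedder and the extractor derive the same pseudo-random assignment of payload bit indices to frames, so no temporal synchronization marks are needed. Function and parameter names here are hypothetical, not the authors' scheme.

```python
import random

def bit_layout(key, n_frames, payload_bits, bits_per_frame):
    """Key-driven pseudo-random mapping of payload bit indices to frames.

    Each frame carries a random (redundant) subset of the payload, so the
    extractor, seeded with the same key, can recover bits from whichever
    frames survive frame dropping."""
    rng = random.Random(key)  # identical key => identical layout on both sides
    return {f: [rng.randrange(payload_bits) for _ in range(bits_per_frame)]
            for f in range(n_frames)}

# Hypothetical usage: embedder and extractor both call
#   layout = bit_layout(secret_key, n_frames=500, payload_bits=4096,
#                       bits_per_frame=64)
# and agree on which payload bits each surviving frame carries.
```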
Video partitioning by segmenting moving object trajectories
Tapas Badal, Neeta Nain, Mushtaq Ahmed
Video partitioning is involved in a number of applications: it provides solutions for monitoring and tracking a particular person's trajectory and also helps to generate a semantic analysis of a single entity or of the entire video. Many recent advances in object detection and tracking rely on motion structure and data association to assign labels to trajectories and analyze them independently. In this work we propose an approach for video partitioning, and a structure is given to store the motion of the target set to be monitored in the video. Spatio-temporal tubes separate individual objects, which helps to generate a semantic analysis report for each object individually. The semantic analysis system for video based on this framework provides not only efficient synopsis generation but also spatial collision handling, where temporal consistency can be resolved for the representation of the semantic knowledge of each object. To keep computational complexity low, trajectories are generated online, while classification, knowledge representation and arrangement over the spatial domain are performed offline.
Experimental comparison of methods for estimation of the observed velocity of the vehicle in video stream
In this paper, we consider the problem of estimating an object's velocity from a video stream by comparing three new methods of velocity estimation: the vertical edge algorithm, a modified Lucas-Kanade method, and the feature points algorithm. As an applied example, the task of automatic evaluation of vehicle velocity from video streams on toll roads is chosen. We took videos from cameras mounted on toll roads and annotated them to determine the true velocity. The proposed methods are compared with each other in terms of correct velocity detection. The practical relevance of this paper lies in the implementation of these methods and in overcoming the difficulties of their realization.
Analysis to feature-based video stabilization/registration techniques within application of traffic data collection
Mojtaba T. Sadat, Francesco Viti
Machine vision is rapidly gaining popularity in the field of Intelligent Transportation Systems. In particular, advantages are foreseen from the exploitation of Aerial Vehicles (AVs) in delivering a superior view of traffic phenomena. However, vibration on AVs makes it difficult to extract moving objects on the ground. To partly overcome this issue, image stabilization/registration procedures are adopted to correct and stitch multiple frames taken of the same scene but from different positions, angles, or sensors. In this study, we examine the impact of multiple feature-based techniques for stabilization, and we show that the SURF detector outperforms the others in terms of time efficiency and output similarity.
A robust SIFT-based descriptor for video classification
Raziyeh Salarifard, Mahshid Alsadat Hosseini, Mahmood Karimian, et al.
The voluminous amount of video in today's world has made the subject of objective (or semi-objective) video classification very popular. Among the various descriptors used for video classification, SIFT and LIFT can lead to highly accurate classifiers, but the SIFT descriptor does not consider video motion and LIFT is time-consuming. In this paper, a robust descriptor for semi-supervised classification based on video content is proposed. It retains the benefits of the LIFT and SIFT descriptors and overcomes their shortcomings to some extent. To extract this descriptor, the SIFT descriptor is first applied, and the motion of the extracted keypoints is then employed to improve the accuracy of the subsequent classification stage. As the SIFT descriptor is scale invariant, the proposed method is also robust toward zooming. Also, using the global motion of keypoints in videos helps to neglect the local motions caused by the cameraman during video capture. In comparison to other works that consider the motion and mobility of videos, the proposed descriptor requires fewer computations. Results obtained on the TRECVID 2006 dataset show that the proposed method achieves more accurate results than SIFT in content-based video classification by about 15 percent.
An agglomerative approach for shot summarization based on content homogeneity
Antonis Ioannidis, Vasileios Chasanis, Aristidis Likas
An efficient shot summarization method is presented based on agglomerative clustering of the shot frames. Unlike other agglomerative methods, our approach relies on a cluster merging criterion that computes the content homogeneity of a merged cluster. An important feature of the proposed approach is the automatic estimation of the number of a shot's most representative frames, called keyframes. The method starts by splitting each video sequence into small, equal-sized clusters (segments). Then, agglomerative clustering is performed: from the current set of clusters, a pair of clusters is selected and merged to form a larger unimodal (homogeneous) cluster. The algorithm proceeds until no further cluster merging is possible. At the end, the medoid of each final cluster is selected as a keyframe, and the set of keyframes constitutes the summary of the shot. Numerical experiments demonstrate that our method reasonably estimates the number of ground-truth keyframes, while extracting non-repetitive keyframes that efficiently summarize the content of each shot.
Medical Signal Processing
Automatic identification of vessel crossovers in retinal images
L. Sánchez, N. Barreira, M. G. Penedo, et al.
Crossovers and bifurcations are interest points of the retinal vascular tree useful for diagnosing diseases. Specifically, detecting these interest points and identifying which of them are crossings gives us the opportunity to search for arteriovenous nicking, that is, an alteration of the vessel tree where an artery is crossed by a vein and the former compresses the latter. These formations are a clear indicator of hypertension, among other medical problems. Several studies have attempted to define an accurate and reliable method to detect and classify these relevant points. In this article, we propose a new method to identify crossovers. Our approach is based on segmenting the vascular tree and analyzing the surrounding area of each interest point. The minimal path between vessel points in this area is computed in order to identify the connected vessel segments and, as a result, to distinguish between bifurcations and crossovers. Our method was tested on retinographies from the public databases DRIVE and VICAVR, obtaining an accuracy of 90%.
Filter-based feature selection and support vector machine for false positive reduction in computer-aided mass detection in mammograms
V. D. Nguyen, D. T. Nguyen, T. D. Nguyen, et al.
In this paper, a method for reducing false positives in computer-aided mass detection in screening mammograms is proposed. A set of 32 features, including First Order Statistics (FOS) features, Gray-Level Co-occurrence Matrix (GLCM) features, Block Difference Inverse Probability (BDIP) features, and Block Variation of Local Correlation coefficients (BVLC) features, is extracted from detected Regions of Interest (ROIs). An optimal subset of 8 features is selected from the full feature set by means of a filter-based Sequential Backward Selection (SBS). Then, a Support Vector Machine (SVM) is utilized to classify the ROIs into massive regions or normal regions. The method's performance is evaluated using the area under the Receiver Operating Characteristic (ROC) curve (AUC or AZ). On a dataset consisting of about 2700 ROIs detected from the mini-MIAS database of mammograms, the proposed method achieves AZ = 0.938.
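A sketch of filter-based backward selection followed by an SVM, assuming a two-class problem with labels 0 and 1; the Fisher-score criterion used as the filter here is an assumption and may differ from the paper's actual criterion.

```python
import numpy as np
from sklearn.svm import SVC

def fisher_score(X, y):
    """Filter criterion: sum of per-feature Fisher scores for two classes."""
    c0, c1 = X[y == 0], X[y == 1]
    num = (c0.mean(axis=0) - c1.mean(axis=0)) ** 2
    den = c0.var(axis=0) + c1.var(axis=0) + 1e-12
    return (num / den).sum()

def backward_select(X, y, n_keep=8):
    """Greedy SBS: repeatedly drop the feature whose removal hurts the
    class-separability criterion least, until n_keep features remain."""
    keep = list(range(X.shape[1]))
    while len(keep) > n_keep:
        scores = [fisher_score(X[:, [f for f in keep if f != j]], y) for j in keep]
        keep.pop(int(np.argmax(scores)))  # removal leaving the best score
    return keep

# Hypothetical usage on a 32-feature ROI dataset:
# selected = backward_select(X_train, y_train, n_keep=8)
# clf = SVC().fit(X_train[:, selected], y_train)
```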
The brain MRI classification problem from wavelets perspective
Mohamed Mokhtar Bendib, Hayet Farida Merouani, Fatma Diaba
Haar and Daubechies 4 (DB4) are the most used wavelets for brain MRI (Magnetic Resonance Imaging) classification. The former is simple and fast to compute, while the latter is more complex and offers better resolution. This paper explores the potential of both in performing normal versus pathological discrimination on the one hand, and multiclassification on the other. The Whole Brain Atlas is used as a validation database, and the Random Forest (RF) algorithm is employed as the learning approach. The achieved results are discussed and statistically compared.
Computer-aided diagnosis method for MRI-guided prostate biopsy within the peripheral zone using grey level histograms
Andrik Rampun, Paul Malcolm, Reyer Zwiggelaar
This paper describes a computer-aided diagnosis method for targeted prostate biopsies within the peripheral zone in T2-weighted MRI. We subdivide the peripheral zone into four regions, compare each sub-region's grey level histogram with malignant and normal histogram models, and use specific metrics to estimate the presence of abnormality. The initial evaluation is based on 200 MRI slices taken from 40 different patients; we achieved an 87% correct classification rate, with 89% sensitivity and 86% specificity. The main contribution of this paper is a novel computer-aided diagnosis approach using grey level histogram analysis between sub-regions. From a clinical point of view, the developed method could assist clinicians in performing targeted biopsies, which are better than the random biopsies currently used.
Semi-automated segmentation of neuroblastoma nuclei using the gradient energy tensor: a user driven approach
Florian Kromp, Sabine Taschner-Mandl, Magdalena Schwarz, et al.
We propose a user-driven method for the segmentation of neuroblastoma nuclei in microscopic fluorescence images involving the gradient energy tensor. Multispectral fluorescence images contain intensity and spatial information about antigen expression, fluorescence in situ hybridization (FISH) signals and nucleus morphology. The latter serves as the basis for the detection of single cells and the calculation of shape features, which are used to validate the segmentation and to reject false detections. Accurate segmentation is difficult due to varying staining intensities and aggregated cells. It requires several (meta-)parameters, which have a strong influence on the segmentation results and have to be selected carefully for each sample (or group of similar samples) through user interaction. Because our method is designed for clinicians and biologists, who may have only limited image processing background, an interactive parameter selection step allows the implicit tuning of parameter values. With this simple but intuitive method, segmentation results with high precision for a large number of cells can be achieved with minimal user interaction. The strategy was validated on hand-segmented datasets of three neuroblastoma cell lines.
Tumor growth model for atlas based registration of pathological brain MR images
Wafa Moualhi, Zagrouba Ezzeddine
The motivation of this work is to register a brain magnetic resonance (MR) image containing a tumor with a normal brain atlas. The normal brain atlas is deformed in order to take into account the presence of a large space-occupying tumor. The method uses an a priori model of tumor growth, assuming that the tumor grows radially from a starting point. First, an affine transformation brings the patient image and the brain atlas into global correspondence. Second, the seeding of a synthetic tumor into the brain atlas provides a template for the lesion. Finally, the seeded atlas is deformed by combining a method derived from optical flow principles with a model of tumor growth (MTG). Results show that an automatic segmentation of brain structures can be provided in the presence of large deformations.
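A toy version of the seeding step, growing a synthetic tumor radially from a seed voxel in the atlas volume; the intensity profile and radius are invented for illustration:

```python
# Seed a synthetic, radially decaying tumor into an atlas volume.
import numpy as np

def seed_radial_tumor(atlas, center, radius, intensity=1.0):
    zz, yy, xx = np.indices(atlas.shape)
    dist = np.sqrt((zz - center[0]) ** 2 + (yy - center[1]) ** 2
                   + (xx - center[2]) ** 2)
    tumor = np.clip(1.0 - dist / radius, 0.0, None) * intensity  # radial falloff
    return np.maximum(atlas, tumor)
```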
X-ray fluorescence tomography: Jacobian matrix and confidence of the reconstructed images
The goal of X-ray Fluorescence Computed Tomography (XFCT) is to give a quantitative description of an object under investigation (sample) in terms of its element composition. However, light and heavy elements inside the object contribute differently to the attenuation of the X-ray probe and of the fluorescence. As a result, elements located in the shadow area do not contribute to the registered spectrum. Iterative reconstruction procedures tend to set to zero the variables describing the element content of the corresponding unit volumes, as these variables do not change the system's condition number. Inversion of the XFCT Radon transform gives random values in these areas. To evaluate the confidence of the reconstructed images, we propose to calculate, in addition to the reconstructed images, a generalized image based on the Jacobian matrix. This image highlights the areas of doubt, if any exist. In this work we have attempted to prove the advisability of such an approach; for this purpose, we analyzed in detail the process of tomographic projection formation.
Signal Processing
Escaping path approach for speckle noise reduction
Marek Szczepanski, Krystian Radlak
A novel fast filtering technique for multiplicative noise removal in ultrasound images is presented in this paper. The proposed algorithm utilizes the concept of digital paths created on the image grid presented in [1], adapted to the needs of multiplicative noise reduction. The new approach uses a special type of digital path, the so-called Escaping Path Model, and a modified path-length calculation based on topological as well as gray-scale distances. The experiments confirmed that the proposed algorithm achieves results comparable with existing state-of-the-art denoising schemes in suppressing multiplicative noise in ultrasound images.
Optimized curvelet-based empirical mode decomposition
Renjie Wu, Qieshi Zhang, Sei-ichiro Kamata
Recent years have seen immense improvement in the development of signal processing based on the Curvelet transform. The Curvelet transform provides a new multi-resolution representation whose frame elements exhibit higher directional sensitivity and anisotropy than Wavelets, multi-Wavelets, steerable pyramids, and so on. These features arise from the anisotropic notion of scaling. In practice, time-series signal processing problems are often encountered, and time-frequency analysis methods are studied to solve them. However, time-frequency analysis cannot always be trusted, and many new methods have been proposed. The Empirical Mode Decomposition (EMD) is one of the most widely used. EMD aims to decompose functions that are the superposition of a reasonably small number of components, well separated in the time-frequency plane, into their building blocks, each of which can be viewed as locally approximately harmonic. However, it cannot handle the directionality of high-dimensional data. A reallocation method based on the Curvelet transform (optimized Curvelet-based EMD) is proposed in this paper. We introduce a definition for a class of functions that can be viewed as a superposition of a reasonably small number of approximately harmonic components in an optimized Curvelet family. We analyze this algorithm and demonstrate its results on data. The experimental results prove the effectiveness of our method.
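To make the decomposition idea concrete, here is a bare-bones EMD sifting loop; production EMD needs careful boundary handling and stopping criteria, and the paper's Curvelet-based extension is not shown:

```python
# Minimal EMD sifting: envelopes from extrema, subtract their mean, iterate.
import numpy as np
from scipy.signal import argrelextrema
from scipy.interpolate import CubicSpline

def sift_imf(x, n_sift=10):
    h = x.copy()
    t = np.arange(len(x))
    for _ in range(n_sift):
        maxima = argrelextrema(h, np.greater)[0]
        minima = argrelextrema(h, np.less)[0]
        if len(maxima) < 2 or len(minima) < 2:
            break
        upper = CubicSpline(maxima, h[maxima])(t)   # upper envelope
        lower = CubicSpline(minima, h[minima])(t)   # lower envelope
        h = h - (upper + lower) / 2                 # remove local mean
    return h

t = np.linspace(0, 1, 1000)
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)
imf1 = sift_imf(x)    # should capture the fast 40 Hz component
```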
Information Systems and Image Processing Applications
Error analysis of rigid body posture measurement system based on circular feature points
Ju Huo, Jishan Cui, Ning Yang
For the problem of determining pose parameters with monocular vision, using coplanar feature points arranged in a quadrilateral on the target, an improved two-stage iterative algorithm is proposed to improve the computational model of rigid-body posture measurement. A monocular-vision rigid-body posture measurement system is designed, and a unified experimental procedure is established to bring the measured coordinates of each feature point into a common coordinate system. Sources of error in the rigid-body posture measurement system are analyzed theoretically and investigated through simulation experiments. Combined with analysis of actual experiments under the simulated error conditions, the pose measurement accuracy is assessed and the comprehensive error of the measurement system is given, which provides theoretical guidance for improving measurement precision.
Optical flow based velocity estimation for mobile robots
Xiuzhi Li, Guanrong Zhao, Songmin Jia, et al.
This paper presents a novel optical-flow-based technique to perceive the instantaneous motion velocity of mobile robots. The primary focus of this study is to determine the robot's ego-motion using the displacement field in temporally consecutive image pairs. In contrast to most previous approaches for estimating velocity, we employ a polynomial-expansion-based dense optical flow approach and propose a quadratic-model-based RANSAC refinement of the flow fields to make our method more robust with respect to noise and outliers. Accordingly, techniques for geometrical transformation and interpretation of the inter-frame motion are presented. The advantages of our proposal are validated by real experiments conducted on a Pioneer robot.
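A sketch of this pipeline using OpenCV's polynomial-expansion (Farneback) flow and a RANSAC fit of a per-component quadratic motion model; the frame files and RANSAC settings are placeholder assumptions:

```python
# Dense Farneback flow + RANSAC fit of a quadratic (x, y) motion model.
import cv2
import numpy as np

def quadratic_design(x, y):
    return np.stack([np.ones_like(x), x, y, x * x, x * y, y * y], axis=1)

def ransac_quadratic(pts, flow_vecs, iters=200, thresh=0.5):
    A = quadratic_design(pts[:, 0], pts[:, 1])
    best_inliers = np.zeros(len(pts), dtype=bool)
    rng = np.random.default_rng(0)
    for _ in range(iters):
        idx = rng.choice(len(pts), 6, replace=False)     # minimal sample
        coef, *_ = np.linalg.lstsq(A[idx], flow_vecs[idx], rcond=None)
        resid = np.linalg.norm(A @ coef - flow_vecs, axis=1)
        inliers = resid < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    coef, *_ = np.linalg.lstsq(A[best_inliers], flow_vecs[best_inliers], rcond=None)
    return coef, best_inliers

prev = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)    # hypothetical frames
curr = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
flow = cv2.calcOpticalFlowFarneback(prev, curr, None, 0.5, 3, 21, 3, 5, 1.1, 0)
ys, xs = np.mgrid[0:flow.shape[0]:8, 0:flow.shape[1]:8]  # sparse sample grid
pts = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
vecs = flow[ys.ravel(), xs.ravel()]                      # (u, v) samples
coef, inliers = ransac_quadratic(pts, vecs)
```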
Autonomous landing of a helicopter UAV with a ground-based multisensory fusion system
Dianle Zhou, Zhiwei Zhong, Daibing Zhang, et al.
This paper focuses on the vision-based autonomous landing problem for a helicopter unmanned aerial vehicle (UAV) and proposes a ground-based multisensory fusion system for autonomous landing. The system includes an infrared camera, an ultra-wideband radar that measures the distance between the UAV and the ground-based system, and a pan-tilt unit (PTU). An infrared camera is used so that the UAV can be detected in all weather conditions. To avoid the complexity of computing the target's three-dimensional coordinates from stereo or single-camera vision, the ultra-wideband radar module provides depth information; the image-driven PTU tracks the UAV in real time and its three-dimensional coordinates are computed. Test results show that, compared with DGPS, the proposed approach is effective and robust.
A computer control system using a virtual keyboard
Ridha Ejbali, Mourad Zaied, Chokri Ben Amar
This work is in the field of human-computer communication, specifically gestural communication. The objective was to develop a system for gesture recognition that will be used to control a computer without a keyboard. The idea consists in using a visual panel printed on ordinary paper to communicate with the computer.
Unified framework of face hallucination across multiple modalities
Xiang Ma, Junhui Liu, Wenmin Li
Face hallucination in a single-modality environment has been heavily studied; in real-world environments with multiple modalities it is still at an early stage. This paper presents a unified framework to solve the face hallucination problem across multiple modalities, i.e., different expressions, poses, and illuminations. Almost all state-of-the-art face super-resolution methods generate only a single output with the same modality as the low-resolution input. Our proposed framework is able to generate multiple outputs of different new modalities from a single low-resolution input. It includes a global transformation with diagonal loading for modeling the mappings among different new facial modalities, and a local position-patch based method with weight compensation for incorporating image details. Experimental results illustrate the superiority of our framework.
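The "diagonal loading" mentioned above is, in effect, ridge-regularized least squares. A sketch of the position-patch idea under assumed data shapes:

```python
# Represent an LR input patch by LR training patches at the same position,
# then apply the same weights to the corresponding HR training patches.
import numpy as np

def hallucinate_patch(y_lr, A_lr, B_hr, lam=0.01):
    """y_lr: (d,) input LR patch; A_lr: (d, m) LR training patches;
    B_hr: (D, m) corresponding HR training patches."""
    m = A_lr.shape[1]
    G = A_lr.T @ A_lr + lam * np.eye(m)    # diagonal loading stabilises G
    w = np.linalg.solve(G, A_lr.T @ y_lr)  # reconstruction weights
    return B_hr @ w                        # transfer weights to HR space
```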
Search-free license plate localization based on saliency and local variance estimation
Amin Safaei, H. L. Tang, S. Sanei
In recent years, the performance and accuracy of automatic license plate recognition (ALPR) systems have greatly improved; however, the increasing number of applications for such systems has made ALPR research more challenging than ever. The inherent computational complexity of search-dependent algorithms remains a major problem for current ALPR systems. This paper proposes a novel search-free localization method based on the estimation of saliency and local variance. Gabor functions are then used to validate the choice of candidate license plate. The algorithm was applied to three image datasets with different levels of complexity, and the results were compared with a number of benchmark methods, particularly in terms of speed. The proposed method outperforms state-of-the-art methods and can be used for real-time applications.
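A common way to compute the local-variance map that drives such search-free localization is var = E[x^2] - E[x]^2 over a sliding window; the window size and thresholding strategy here are illustrative:

```python
# Local variance map: plate regions tend to have dense character strokes
# and therefore high local variance; thresholding yields candidate regions.
import numpy as np
from scipy.ndimage import uniform_filter

def local_variance(gray, win=15):
    g = gray.astype(float)
    mean = uniform_filter(g, win)
    mean_sq = uniform_filter(g ** 2, win)
    return np.clip(mean_sq - mean ** 2, 0, None)
```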
Seam tracking with adaptive image capture for fine-tuning of a high power laser welding process
Olli Lahdenoja, Tero Säntti, Mika Laiho, et al.
This paper presents the development of methods for real-time fine-tuning of a high-power laser welding process for thick steel by using a compact smart camera system. When performing welding in butt-joint configuration, the laser beam's location needs to be adjusted exactly according to the seam line in order to allow the injected energy to be absorbed uniformly into both steel sheets. In this paper, on-line extraction of seam parameters is targeted by taking advantage of a combination of dynamic image intensity compression, image segmentation with a focal-plane processor ASIC, and a Hough transform on an associated FPGA. Additional filtering of Hough line candidates based on temporal windowing is further applied to reduce unrealistic frame-to-frame tracking variations. The proposed methods are implemented in Matlab using image data captured with adaptive integration time. The simulations are performed in a hardware-oriented way to allow real-time implementation of the algorithms on the smart camera system.
Accurate and robust spherical camera pose estimation using consistent points
Christiano Couto Gava, Bernd Krolla, Didier Stricker
This paper addresses the problem of multi-view camera pose estimation for high-resolution, full spherical images. A novel approach to simultaneously retrieve camera poses along with a sparse point cloud is designed for large-scale scenes. We introduce the concept of consistent points, which allows the most reliable 3D points to be selected dynamically for nonlinear pose refinement. In contrast to classical bundle adjustment approaches, we propose to reduce the parameter search space while jointly optimizing camera poses and scene geometry. Our method notably improves the accuracy and robustness of camera pose estimation, as shown by experiments carried out on real image data.
Method of center localization for objects containing concentric arcs
This paper proposes a method for automatic center localization of objects containing concentric arcs. The method utilizes structure tensor analysis and a voting scheme optimized with the Fast Hough Transform. Two applications of the proposed method are considered: (i) wheel tracking in a video-based system for automatic vehicle classification and (ii) growth-ring analysis on a tree cross-cut image.
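The structure-tensor step can be sketched as smoothed outer products of image gradients; the orientation field it yields can then drive a vote toward the common center of concentric arcs (the voting scheme and Fast Hough Transform are omitted here):

```python
# Per-pixel structure tensor [[Jxx, Jxy], [Jxy, Jyy]] from smoothed
# outer products of image gradients.
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def structure_tensor(gray, sigma=2.0):
    g = gray.astype(float)
    gx = sobel(g, axis=1)
    gy = sobel(g, axis=0)
    Jxx = gaussian_filter(gx * gx, sigma)
    Jxy = gaussian_filter(gx * gy, sigma)
    Jyy = gaussian_filter(gy * gy, sigma)
    return Jxx, Jxy, Jyy
```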
High-speed segmentation-driven high-resolution matching
Fredrik Ekstrand, Carl Ahlberg, Mikael Ekström, et al.
This paper proposes a segmentation-based approach for matching of high-resolution stereo images in real time. The approach employs direct region matching in a raster scan fashion influenced by scanline approaches, but with pixel decoupling. To enable real-time performance it is implemented as a heterogeneous system of an FPGA and a sequential processor. Additionally, the approach is designed for low resource usage in order to qualify as part of unified image processing in an embedded system.
A one-bit approach for image registration
Motion estimation or optic flow computation for automatic navigation and obstacle avoidance programs running on Unmanned Aerial Vehicles (UAVs) is a challenging task. These challenges come from the requirements of real-time processing speed and small, lightweight image processing hardware with very limited resources (especially memory) embedded on the UAVs. Solutions toward both simplifying computation and saving hardware resources have recently received much interest. This paper presents an approach for image registration using binary images that addresses these two requirements. The approach uses translational information between two corresponding patches of binary images to estimate global motion. These low bit-resolution images require very little memory to store and allow simple logic operations such as XOR and AND to be used instead of more complex computations such as subtractions and multiplications.
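A minimal version of this one-bit idea: binarize both patches, then score candidate translations by Hamming distance (XOR plus popcount), which maps naturally onto the cheap logic operations described above:

```python
# Estimate translation between two patches using 1-bit images and XOR.
import numpy as np

def one_bit_register(patch_a, patch_b, max_shift=8):
    a = patch_a > patch_a.mean()              # 1-bit quantisation
    b = patch_b > patch_b.mean()
    best, best_cost = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # np.roll wraps around; fine for a sketch on interior patches
            shifted = np.roll(np.roll(b, dy, axis=0), dx, axis=1)
            cost = np.count_nonzero(a ^ shifted)   # XOR + popcount
            if cost < best_cost:
                best_cost, best = cost, (dy, dx)
    return best
```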
Remotely sensed image restoration using partial differential equations and watershed transformation
Avishan Nazari, Amin Zehtabian, Marco Gribaudo, et al.
This paper proposes a novel approach for remotely sensed image restoration. The main goal of this study is to mitigate the two most common types of noise in remote sensing images while preserving important details such as edges. To this end, a novel method based on partial differential equations is proposed. The parameters used in the proposed algorithm are set adaptively according to the type of noise and the texture of the noisy datasets. Moreover, we propose a segmentation pre-processing step based on the watershed transformation to localize the denoising process. The performance of the restoration techniques is measured using the PSNR criterion. For further assessment, we also feed the original/noisy/denoised images into an SVM classifier and explore the results.
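As an illustration of PDE-based denoising, here is a classic Perona-Malik anisotropic diffusion step, which smooths homogeneous areas while preserving edges; it is a stand-in, not the paper's adaptive PDE or parameter selection:

```python
# Perona-Malik anisotropic diffusion with exponential edge-stopping function.
import numpy as np

def perona_malik(img, n_iter=20, kappa=15.0, dt=0.2):
    u = img.astype(float).copy()
    for _ in range(n_iter):
        # finite differences toward the four neighbours
        dn = np.roll(u, 1, axis=0) - u
        ds = np.roll(u, -1, axis=0) - u
        de = np.roll(u, -1, axis=1) - u
        dw = np.roll(u, 1, axis=1) - u
        # conduction coefficients: small across strong edges
        cn, cs = np.exp(-(dn / kappa) ** 2), np.exp(-(ds / kappa) ** 2)
        ce, cw = np.exp(-(de / kappa) ** 2), np.exp(-(dw / kappa) ** 2)
        u += dt * (cn * dn + cs * ds + ce * de + cw * dw)
    return u
```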
Kernel weights optimization for error diffusion halftoning method
This paper describes a study to find the best error diffusion kernel for digital halftoning under various restrictions on the number of non-zero kernel coefficients and their set of values. As an objective measure of quality, WSNR was used. The problem of multidimensional optimization was solved numerically using several well-known algorithms: Nelder–Mead, BFGS, and others. The study found a kernel that provides a quality gain of about 5% in comparison with the best of the commonly used kernels, introduced by Floyd and Steinberg. Other kernels obtained allow the computational complexity of the halftoning process to be significantly reduced without loss of quality.
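For reference, the Floyd-Steinberg baseline that the optimized kernels are compared against distributes the quantization error to four neighbors with fixed weights 7/16, 3/16, 5/16, and 1/16:

```python
# Classic Floyd-Steinberg error diffusion halftoning.
import numpy as np

def floyd_steinberg(gray):
    img = gray.astype(float) / 255.0
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            old = img[y, x]
            new = 1.0 if old > 0.5 else 0.0      # quantise to black/white
            img[y, x] = new
            err = old - new
            if x + 1 < w:               img[y, x + 1]     += err * 7 / 16
            if y + 1 < h and x > 0:     img[y + 1, x - 1] += err * 3 / 16
            if y + 1 < h:               img[y + 1, x]     += err * 5 / 16
            if y + 1 < h and x + 1 < w: img[y + 1, x + 1] += err * 1 / 16
    return (img * 255).astype(np.uint8)
```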
Genetic algorithms for mesh surface smoothing
Mehmet Yasin Özsağlam, Mehmet Çunkaş
This paper presents a new 3D mesh smoothing algorithm based on evolutionary methods: a new optimization technique using a genetic algorithm. The main approach is to expand the search space by generating new meshes as genetic individuals. Feature preservation and model shrinkage, the main problems of existing smoothing algorithms, are handled: features are preserved and shrinkage is avoided. With this method, over-smoothing effects are reduced and undesirable noise is effectively removed.
A review of state-of-the-art speckle reduction techniques for optical coherence tomography fingertip scans
Luke Nicholas Darlow, Sharat Saurabh Akhoury, James Connan
Standard surface fingerprint scanners are vulnerable to counterfeiting attacks and to failure due to skin damage and distortion. Thus a high-security, damage-resistant means of fingerprint acquisition is needed, providing scope for new approaches and technologies. Optical Coherence Tomography (OCT) is a high-resolution imaging technology that can be used to image the human fingertip and allows the extraction of a subsurface fingerprint. Being robust toward spoofing and damage, the subsurface fingerprint is an attractive solution. However, the nature of the OCT scanning process induces speckle: a correlative and multiplicative noise. Six speckle-reducing filters for the digital enhancement of OCT fingertip scans have been evaluated. The optimized Bayesian non-local means algorithm improved the structural similarity between processed and reference images by 34%, increased the signal-to-noise ratio, and yielded the most promising visual results. An adaptive wavelet approach, originally designed for ultrasound imaging, and a speckle-reducing anisotropic diffusion approach also yielded promising results. A reformulation of these in future work, with an OCT-specific speckle model, may improve their performance.
Evaluating word semantic properties using Sketch Engine
Velislava Stoykova, Maria Simkova
The paper describes an approach that uses the statistically-based tools incorporated into the Sketch Engine system for electronic text corpus processing to mine big textual data and to search for and extract word semantic properties. It presents and compares a series of word-search experiments using different statistical approaches and evaluates the results of searches of the Bulgarian-language EUROPARL 7 corpus to extract word semantic properties. Finally, the methodology is extended to multilingual application using the Slovak-language EUROPARL 7 corpus.
Noncontact surface roughness measurement using a vision system
Erdinç Koçer, Erhan Horozoğlu, Ilhan Asiltürk
Surface roughness measurement is one of the basic measurements that determine the quality and performance of a final product. After machining operations, stylus-type (tracer) instruments are commonly used in industry to measure the roughness of the resulting surface. This contact technique has disadvantages, such as user error, because the device must be calibrated during measurement. In this study, measurement and evaluation techniques were developed using vision devices on images of the machined surfaces. Image-based surface measurement simplifies the process because it is non-contact and causes no damage, and the measurement and analysis of surface roughness were made more precise and accurate. The results obtained experimentally with the contact device were compared with those of the non-contact image processing software, and satisfactory results were obtained.
A unified approach for development of Urdu Corpus for OCR and demographic purpose
Prakash Choudhary, Neeta Nain, Mushtaq Ahmed
This paper presents a methodology for the development of an Urdu handwritten text image corpus and the application of corpus linguistics in the fields of OCR and information retrieval from handwritten documents. Compared with other scripts, the Urdu script is somewhat complicated for data entry: entering a single character can require a combination of multiple keystrokes. Here, a mixed approach is proposed and demonstrated for building an Urdu corpus for OCR and demographic data collection. The demographic part of the database could be used to train a system to fetch data automatically, which will help simplify the manual data-processing tasks currently involved in data collection from input forms such as Passport, Ration Card, Voting Card, AADHAR, Driving licence, Indian Railway Reservation, and Census data. This would increase the participation of the Urdu language community in understanding and benefiting from government schemes. To make the database available and applicable across a broad area of corpus linguistics, we propose a methodology for data collection, mark-up, digital transcription, and XML metadata for benchmarking.
Memory-efficient large-scale linear support vector machine
Abdullah Alrajeh, Akiko Takeda, Mahesan Niranjan
Stochastic gradient descent has been advanced as a computationally efficient method for large-scale problems. In classification problems, many proposed linear support vector machines are very effective. However, they assume that the data is already in memory, which may not always be the case. Recent work suggests a classical method that divides such a problem into smaller blocks and then solves the sub-problems iteratively. We show that a simple modification, shrinking the dataset early, produces significant savings in computation and memory. We further find that on problems larger than previously considered, our approach is able to reach solutions on top-end desktop machines while competing methods cannot.
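A sketch of block-wise training with early shrinking, using scikit-learn's SGDClassifier as a stand-in linear SVM; the shrinking rule (drop samples far outside the margin) and block size are illustrative assumptions:

```python
# Train in blocks, then shrink: keep only samples near or inside the margin.
import numpy as np
from sklearn.linear_model import SGDClassifier

def train_blockwise(X, y, n_passes=5, block=10_000, margin=2.0):
    clf = SGDClassifier(loss="hinge", alpha=1e-4)
    active = np.arange(len(X))
    classes = np.unique(y)
    for _ in range(n_passes):
        np.random.shuffle(active)
        for start in range(0, len(active), block):
            idx = active[start:start + block]
            clf.partial_fit(X[idx], y[idx], classes=classes)
        # functional margin of each still-active sample
        signs = np.where(y[active] == classes[1], 1.0, -1.0)
        scores = clf.decision_function(X[active]) * signs
        active = active[scores < margin]   # drop confidently classified points
        if len(active) == 0:
            break
    return clf
```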
On improvements of neural network accuracy with fixed number of active neurons
In this paper, the possibility of improving multilayer-perceptron-based classifiers by using a composite classifier scheme with a predictor function is explored. Recognition of embossed number characters on plastic cards in images taken by a mobile camera is used as a model problem.
Measuring the engagement level of children for multiple intelligence test using Kinect
Dongjin Lee, Woo han Yun, Chan kyu Park, et al.
In this paper, we present an affect recognition system for measuring the engagement level of children using the Kinect while they perform a multiple intelligence test on a computer. First, we recorded 12 children while they solved the test and manually created ground-truth data for the engagement level of each child. For feature extraction, the Kinect for Windows SDK provides user segmentation and skeleton tracking, yielding the 3D joint positions of a child's upper-body skeleton. After analyzing the children's movement, the engagement level of their responses is classified into two classes: high or low. We present the classification results using the proposed features and identify the features that are significant in measuring engagement.
Real time rectangular document detection on mobile devices
Natalya Skoryukina, Dmitry P. Nikolaev, Alexander Sheshkus, et al.
In this paper we propose an algorithm for real-time detection of rectangular document borders in mobile-device applications. The proposed algorithm is based on the combinatorial assembly of possible quadrangle candidates from a set of line segments and on projective document reconstruction using the known focal length. The Fast Hough Transform is used for line detection, and a 1D modification of an edge detector is proposed for the algorithm.
An iterative undersampling of extremely imbalanced data using CSVM
Jong Bum Lee, Jee Hyong Lee
Semiconductors are major components of electronic devices and require very high reliability and productivity. If defective chips can be predicted in advance, product quality will improve and productivity will increase through reduced test cost. However, classifier performance on defective chips is very poor because semiconductor data are extremely imbalanced, at roughly 1:1000. In this paper, an iterative undersampling method using CSVM is employed to deal with the class imbalance. The main idea is to select the informative majority-class samples around the decision boundary determined by the classifier. Our experimental results demonstrate that our method outperforms other sampling methods with regard to accuracy on defective chips in highly imbalanced data.
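The boundary-focused undersampling loop can be sketched as follows; scikit-learn's class_weight option stands in for the cost-sensitive SVM (CSVM), and the round count and sampling ratio are illustrative:

```python
# Iteratively re-select majority-class samples nearest the decision boundary.
import numpy as np
from sklearn.svm import SVC

def iterative_undersample(X, y, rounds=5, ratio=2):
    minority = np.where(y == 1)[0]
    majority = np.where(y == 0)[0]
    rng = np.random.default_rng(0)
    # start from a random, roughly balanced subset of the majority class
    subset = rng.choice(majority,
                        size=min(len(majority), ratio * len(minority)),
                        replace=False)
    for _ in range(rounds):
        idx = np.concatenate([minority, subset])
        clf = SVC(kernel="rbf", class_weight="balanced").fit(X[idx], y[idx])
        # keep the majority samples closest to the current boundary
        dist = np.abs(clf.decision_function(X[majority]))
        subset = majority[np.argsort(dist)[:ratio * len(minority)]]
    return clf
```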