Proceedings Volume 10828

Third International Workshop on Pattern Recognition

Xudong Jiang, Zhenxiang Chen, Guojian Chen

Volume Details

Date Published: 7 August 2018
Contents: 8 Sessions, 56 Papers, 0 Presentations
Conference: Third International Workshop on Pattern Recognition 2018
Volume Number: 10828

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 10828
  • Pattern Recognition
  • Target Detection
  • Image Transformation and Analysis
  • Image Detection Technology and Application
  • Image Processing and Application
  • Signal Analysis and Processing
  • Computer Science and Engineering
Front Matter: Volume 10828
Front Matter: Volume 10828
This PDF file contains the front matter associated with SPIE Proceedings Volume 10828 (WPR18), including the Title Page, Copyright information, Table of Contents, and Conference Committee listing.
Pattern Recognition
Classification of handwritten Japanese hiragana characters of 71 categories attached with sound marks by using pattern augmentation for deep learning
Yoshihiro Shima, Yuki Omori
Neural networks are a powerful technology for classifying character patterns and object images, and a large number of training samples is very important for classification accuracy. A novel method for recognizing handwritten hiragana characters is proposed that combines a pre-trained convolutional neural network (CNN) and a support vector machine (SVM). The training samples are augmented by pattern distortions such as cosine translation and elastic distortion. A pre-trained CNN, Alex-Net, which was pre-trained on large-scale object image datasets, is used as the feature extractor, and an SVM is used as the trainable classifier. Original hiragana samples of 71 classes in the ETL9B database are divided into two folds by odd and even dataset numbers. Samples with odd dataset numbers and the augmented patterns are used to train the SVM, with the feature vectors of the character patterns passed to the SVM from Alex-Net. Over five test runs with 100 test patterns for each of the 71 classes, the average error rate was 2.378%, and the lowest error rate was 2.113% with 468,600 training patterns of distorted hiragana characters. Experimental results show that the proposed method is effective in recognizing handwritten hiragana characters.
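As a concrete illustration of the kind of pattern distortion the abstract mentions, here is a minimal sketch of a cosine-translation augmentation; the amplitude and the toy stroke image are illustrative choices, not the paper's actual parameters:

```python
import math

def cosine_shift(img, amplitude=2.0):
    """Shift each row horizontally by a cosine of its vertical position,
    a simple geometric distortion for augmenting character images."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        # Shift varies smoothly from +amplitude (top) to -amplitude (bottom).
        dx = int(round(amplitude * math.cos(math.pi * y / (h - 1))))
        for x in range(w):
            sx = x - dx  # sample from the shifted source column
            if 0 <= sx < w:
                out[y][x] = img[y][sx]
    return out

# A vertical stroke at column 3 becomes a gentle S-curve after distortion.
img = [[1 if x == 3 else 0 for x in range(7)] for _ in range(5)]
distorted = cosine_shift(img)
```

Applying several such distortions with different amplitudes to each original sample is one way the training set can be multiplied many times over.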
Radar micro-Doppler signature analysis and its application to gait recognition
Jianfeng Ren, Xudong Jiang
The micro-Doppler signature (mDS) has often been utilized for radar target recognition in the literature. Most existing approaches focus on extracting visual features for human operators. In this paper, we provide a complete solution to gait recognition using radar micro-Doppler analysis. Gait recognition is challenging due to the time-varying nature of the micro-Doppler signature and the small differences in gait among different people. To align two mDSs in time, we propose to utilize dynamic time warping (DTW). To uncover the tiny differences among people, we treat the distances of a sample to all gallery samples as its feature vector and classify it using a support vector machine. To evaluate the proposed approach, we create an mDS-gait database, on which the approach demonstrates superior performance compared with existing ones.
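The DTW alignment and distance-vector feature described above can be sketched in a few lines; this is a generic textbook DTW implementation, not the authors' code, and the probe/gallery sequences are illustrative toy data:

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = minimal cumulative cost aligning a[:i] with b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

# Distances of a probe to all gallery samples form its feature vector,
# which would then be fed to an SVM.
gallery = [[0, 1, 2, 3], [3, 2, 1, 0]]
probe = [0, 1, 1, 2, 3]
feature = [dtw_distance(probe, g) for g in gallery]
```

The key idea is that DTW absorbs timing variation (the repeated `1` in the probe costs nothing), so the remaining distances reflect genuine gait differences.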
Fingerprint identification based on neural network for large fingerprint database
Zhicheng Wang, Hongwei Zhang, Juan Peng, et al.
In this paper, we propose a novel approach to fingerprint identification for large fingerprint databases. First, we use a neural network to extract minutiae. Then, to address the problems of large-database storage, matching speed, and differing fingerprint sizes and angles, we propose a new method that translates the fingerprint image into an n-dimensional descriptor representing the fingerprint features; this descriptor is not influenced by those problems. Finally, we propose D-LVQ to classify the sample set based on the training model, so that all steps of fingerprint identification are performed by neural networks. Experimental results show that the proposed method achieves better performance for large fingerprint databases in accuracy, time, and storage.
An abnormal telephone identification model based on ensemble algorithm
Yahan Yuan, Kei Ji, Runyuan Sun III, et al.
Due to the rapid development of the communications industry and the popularization of telephones, more and more personal information leaks and telephone fraud cases have occurred. Operators currently lack effective ways to deal with fraudulent calls. Inspired by ensemble learning, we found that the bagging algorithm can solve the classification problem of unbalanced data, and this paper proposes an abnormal phone recognition model based on it. In particular, we use PCA dimension reduction when processing the data to better mine the effective features of the samples; multiple training sets are constructed by bootstrap sampling, and the ensemble of learners trained on these sets solves the classification problem of unbalanced abnormal telephone data. Experiments show that the prediction accuracy of the bagging-based abnormal phone recognition model is better than that of a single decision tree model, the unbalanced-sample problem is solved, and a relatively ideal prediction effect is achieved.
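The bootstrap-and-vote structure the abstract describes can be sketched as follows; the one-feature threshold learner and toy data are illustrative stand-ins for the paper's decision trees and PCA-reduced telephone features:

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) examples with replacement (bagging's bootstrap step)."""
    return [rng.choice(data) for _ in data]

def train_stump(sample):
    """Toy base learner: threshold midway between class centroids."""
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == 0]
    if not pos or not neg:  # a class may be absent in a bootstrap sample
        return 0.5
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def bagging_predict(thresholds, x):
    """Majority vote over the ensemble of threshold learners."""
    votes = [1 if x >= t else 0 for t in thresholds]
    return Counter(votes).most_common(1)[0][0]

rng = random.Random(0)
# Imbalanced toy data: many normal phones (label 0), few abnormal (label 1).
data = [(x / 10, 0) for x in range(8)] + [(2.0, 1), (2.2, 1)]
thresholds = [train_stump(bootstrap_sample(data, rng)) for _ in range(15)]
pred = bagging_predict(thresholds, 2.1)
```

Because each learner sees a different bootstrap resample, the rare abnormal class gets varied emphasis across the ensemble, which is the property that helps with imbalance.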
Microcontroller-based face recognition using combinations of morphing algorithms of Beier-Neely, Delaunay Triangulation and Alpha Blending
Jessie R. Balbin, Marianne M. Sejera, Mark P. Halili, et al.
This study aims to compare three algorithms, the Beier-Neely morphing algorithm, the Delaunay triangulation technique, and alpha blending, which can be used for face morphing with the intention of applying them to face recognition systems. The study also showcases different integrations of these algorithms; based on the results, Alpha-Delaunay Triangulation is found to be the best among the integrations.
Recognizing emotions in Chinese text using dictionary and ensemble of classifiers
Yanyong Ai, Zhenxiang Chen, Shanshan Wang, et al.
In recent years, subjective texts have shown great application value. As a hot research issue in the field of natural language processing, the analysis of emotions in text has attracted attention from many scholars and greatly advanced research on the emotional polarity of Chinese texts. This paper presents an emotion classification algorithm combining a dictionary and an ensemble classifier. First, based on the fusion of multiple dictionaries, such as an emotional dictionary, a degree dictionary, and a negation dictionary, the negative and positive scores of each sentence are output according to the designed emotion calculation algorithm. Defining the difference between the negative and positive scores as the emotional tendency value, the samples are sorted by this value, and those with the highest emotional tendency values are selected as training samples. Finally, an ensemble classifier is used to classify the text emotions. Built on six machine learning algorithms, including multinomial Bayes, decision tree, random forest, k-nearest neighbor, SVM, and logistic regression, the ensemble classifier aims to achieve the best classification effect while minimizing the disadvantages of the individual classifiers. The results show that the classification accuracy of the ensemble classifier is better than that of the individual classifiers.
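The dictionary-based scoring stage can be sketched as below; the mini-dictionaries, example tokens, and modifier weights are hypothetical English stand-ins for the paper's Chinese emotional, degree, and negation lexicons:

```python
# Hypothetical mini-dictionaries; a real system would load large lexicons.
POSITIVE = {"good", "great", "happy"}
NEGATIVE = {"bad", "sad", "awful"}
NEGATORS = {"not", "never"}
DEGREE = {"very": 2.0, "slightly": 0.5}

def emotion_scores(tokens):
    """Return (positive_score, negative_score) for a tokenized sentence."""
    pos = neg = 0.0
    weight, flip = 1.0, False
    for tok in tokens:
        if tok in DEGREE:
            weight *= DEGREE[tok]       # degree word scales the next hit
        elif tok in NEGATORS:
            flip = not flip             # negation flips polarity
        elif tok in POSITIVE or tok in NEGATIVE:
            is_pos = (tok in POSITIVE) != flip
            if is_pos:
                pos += weight
            else:
                neg += weight
            weight, flip = 1.0, False   # reset modifiers after a hit
    return pos, neg

p, n = emotion_scores("not very good".split())
tendency = p - n  # emotional tendency value used to rank samples
```

Sentences with the largest |tendency| are the most confidently polar, which is why they make good automatically-labeled training samples for the ensemble.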
A method of automatic recognition for answer sheet
Yingjie Xia, Xiangru Yu, Rui Chen, et al.
The traditional way to recognize an answer sheet is to use an optical mark reader (OMR). A given OMR only recognizes a certain answer sheet with a fixed format, which results in poor universality. We propose a recognition method for answer sheets with arbitrary formats. After designing a new answer sheet or using an existing one, the printed answer sheets are converted into images by high-definition (HD) scanning after being filled in during an exam, and the images are recognized automatically by image processing techniques. According to the positioning crosses found on the answer sheets, the images are corrected if they are tilted. Then candidate number recognition, option recognition, and page number recognition are carried out in the order specified by users. The maximum between-cluster variance method is used for candidate number and option recognition, while the page number is recognized by template matching. Experimental results show that the accuracy can reach 100%. The method is easy to implement, low in cost, and highly universal.
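The maximum between-cluster variance (Otsu) step used for candidate number and option recognition can be sketched on a gray-level histogram; this is the standard formulation, with a toy bimodal histogram for illustration:

```python
def otsu_threshold(hist):
    """Pick the threshold maximizing between-class variance (Otsu's method).

    hist[g] is the pixel count of gray level g."""
    total = sum(hist)
    total_sum = sum(g * h for g, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0.0
    for t, h in enumerate(hist):
        w0 += h           # pixels at or below t (background weight)
        sum0 += t * h
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0 = sum0 / w0                 # background mean
        m1 = (total_sum - sum0) / w1   # foreground mean
        var = w0 * w1 * (m0 - m1) ** 2  # between-class variance (up to scale)
        if var > best_var:
            best_var, best_t = var, t
    return best_t

# Bimodal toy histogram: dark pencil marks near level 1, paper near level 6.
hist = [0, 10, 5, 0, 0, 2, 9, 4]
t = otsu_threshold(hist)
```

Pixels at or below the returned threshold are treated as filled marks; because the threshold is computed per image, scanner brightness differences do not need a hand-tuned constant.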
Button location and recognition method for elevator based on artificial mark and geometric transformation
Jianjie Shi, Qinghao An, Jinping Li
A button location and recognition method is proposed to help a robot locate elevator buttons so that it can take the elevator to the destination floor autonomously. The position of the elevator's control panel is determined with four artificial marks around it, and the marks are located by analyzing nested contours. After transforming the button panel into a rectangle with a perspective transformation, we extract the features of the buttons using morphological transformations. The layout of the button panel is then determined with projection histograms of the panel. Finally, the position and function of each button are obtained by comparing the layout with a template that has been set manually.
Target Detection
3D-DETNet: a single stage video-based vehicle detector
Suichan Li, Feng Chen
Video-based vehicle detection has received considerable attention over recent decades, and deep learning based algorithms are an effective means for it. However, these methods are mainly devised for static images, and applying them to video vehicle detection directly often yields poor performance. In this work, we propose 3D-DETNet, a single-stage video-based vehicle detector integrating a 3DConvNet with focal loss. Our method can capture temporal information and is more suitable for detecting vehicles in video than other single-stage methods. 3D-DETNet takes multiple video frames as input and generates multiple spatial feature maps; these feature maps are fed to the 3DConvNet sub-model to capture temporal information, which is then fed to a final fully convolutional model that predicts the locations of vehicles in the video frames. We evaluate our method on the UA-DETRAC vehicle detection dataset. The experimental results show that our method yields better performance while keeping a higher detection speed of 26 fps compared with other typical methods.
Outlier detection algorithm based on robust component analysis
Cha Zheng, Lixin Ji, Chao Gao, et al.
In the outlier detection problem, a notable issue with most existing algorithms is that they cannot detect high-dimension outliers effectively. To provide a practical solution, we propose an outlier detection algorithm based on robust component analysis. The basic idea is to train multiple base detectors with the robust component analysis results of the training dataset. Furthermore, we generate some virtual outliers, utilize them to test the capacities of the base detectors, and combine the detectors according to the test results to obtain the final outlier detector. Experimental results comparing the proposed method with baseline approaches on several datasets show the performance of our approach.
Global contrast saliency detection of images with small scale structure suppression
Weilin Ling, Yinwei Zhan
Image saliency detection is a way to extract the information from an image that a human pays close attention to. Objects of small-scale structure usually cause difficulties in saliency detection. For this reason, we propose a global contrast saliency detection method based on color relations that uses color differences, globally spatial relations of images, and a kind of image segmentation. Experiments demonstrate that this approach can ensure that objects of small-scale structure do not appear in the final saliency map.
An extension of BING to high IOU threshold
Canzhang Guo, Yinwei Zhan
BING is an objectness measure that extracts proposal windows in an image that may contain objects, avoiding a cumbersome sliding-window search for object detection. BING has a high recall rate when the Intersection-over-Union (IOU) threshold is 0.5, and runs as fast as 300 fps. However, the recall rate drops rapidly when the IOU threshold is greater than 0.5. In this paper, we investigate the cause of this phenomenon and propose how to improve the recall rates, using the average recall rate in the performance evaluation of the objectness measure. The problem of too few positive samples in the secondary training stage is solved by selecting parameters with respect to training and testing.
Robust vanishing point detection based on block-wise weighted soft voting scheme
Xue Fan, Zhiquan Feng, Xiaohui Yang, et al.
Vanishing point detection is a challenging task due to variations in road types and cluttered backgrounds. Most existing texture-based methods detect the vanishing point using pixel-wise voting map generation, which suffers from high computational complexity and from the noise votes introduced by incorrectly estimated texture orientations. In this paper, a block-wise weighted soft voting scheme is developed for good performance in complex road scenes. First, gLoG filters are applied to estimate the texture orientation of each pixel. Then the image is divided into blocks in a sliding fashion, and a histogram is constructed from the texture orientations of the pixels within each block to obtain the dominant orientation bin. Instead of using the texture orientations of all valid pixels within each block, only the dominant orientation bin is utilized to perform weighted soft voting. Experimental results on the benchmark dataset show that the proposed method achieves the best performance when compared with state-of-the-art works.
Image Transformation and Analysis
Fast infrared image segmentation method based on 2D OTSU and particle swarm optimization
Song-Tao Liu, Zhan Wang, Zhen Wang
Threshold segmentation based on the 1D histogram and an optimal objective function is an important image segmentation method, but applied directly to infrared images its ability to suppress background noise is weak. In this paper, the 2D maximum inter-class variance (OTSU) method is applied to infrared image segmentation, which improves the segmentation effect obviously but takes a long time to compute. Therefore, an improved Particle Swarm Optimization (PSO) algorithm is introduced to speed it up, improving the real-time performance of the algorithm. Experimental results show that the new method has not only a good segmentation effect but also high computational efficiency, making it a fast infrared image segmentation method.
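To illustrate the speed-up idea, here is a minimal PSO sketch minimizing a 1-D toy objective. In the paper the objective would be the (negated) 2D between-class variance over candidate threshold pairs; the inertia and acceleration weights below are generic textbook defaults, not the authors' settings:

```python
import random

def pso_minimize(f, lo, hi, n_particles=20, iters=60, seed=0):
    """Minimal particle swarm optimizer over a 1-D interval [lo, hi]."""
    rng = random.Random(seed)
    xs = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vs = [0.0] * n_particles
    pbest = list(xs)            # each particle's best position so far
    gbest = min(xs, key=f)      # swarm-wide best position so far
    w, c1, c2 = 0.7, 1.4, 1.4   # inertia and acceleration weights
    for _ in range(iters):
        for i in range(n_particles):
            vs[i] = (w * vs[i]
                     + c1 * rng.random() * (pbest[i] - xs[i])
                     + c2 * rng.random() * (gbest - xs[i]))
            xs[i] = min(hi, max(lo, xs[i] + vs[i]))  # clamp to the interval
            if f(xs[i]) < f(pbest[i]):
                pbest[i] = xs[i]
            if f(xs[i]) < f(gbest):
                gbest = xs[i]
    return gbest

# Toy objective standing in for the (negated) 2D between-class variance.
best = pso_minimize(lambda x: (x - 3.0) ** 2, 0.0, 10.0)
```

Instead of evaluating the criterion at every possible threshold exhaustively, the swarm evaluates it only at a few hundred points, which is where the run-time saving comes from.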
Aesthetic QR code generation with background contrast enhancement and user interaction
Lijian Lin, Xinyi Zou, Liping He, et al.
Quick Response Code, abbreviated as QR code, is a two-dimensional matrix extensively used both in the automotive industry and in general commercial applications. Compared with traditional barcodes, QR code is prevalent for its enormous information capacity and efficient error correction mechanism. However, standard QR codes achieve a high decode rate at the expense of aesthetic appearance. With the intention of resolving this contradiction, we propose a novel aesthetic QR code generation method. Differing from previous works, which mainly rely on the error correction mechanism, we first enhance the contrast of the background image so that more modules can be eliminated after the initial threshold-based module elimination, while maintaining readability and demonstrating visual information to customers simultaneously. User interaction can further be adopted to delete modules as the customer requires, using the error correction mechanism.
Glandular cavity segmentation based on local correntropy-based K-means (LCK) clustering and morphological operations
One of the ways to diagnose cancer is to obtain images of the cells under the microscope through biopsies. Because the images of the stained cells are very complicated, there is a great deal of interference with the doctor's observations. To address this issue, we propose a new method for segmenting glandular cavity from gastric cancer cell images. Our method combines local correntropy-based K-means (LCK) clustering method and morphological operations to divide the image into complete glandular cavity and remove all extra-cavity interference areas. Our method does not require human interaction. The acquired image boundary features and internal information are complete, allowing doctors to diagnose cancer more quickly and efficiently.
Statistical multi-scale Laws' texture energy for texture segmentation
Recognizing objects in a digital image is nowadays one of the main research focuses. Texture is an important and valuable feature for describing the coarseness and the regularity pattern of an object's surface. We present an effective technique for segmenting different textures by integrating color information and Laws' texture energy. The first step is to convert an image from RGB to HSV color space to obtain the hue channel as the basic feature. The second step is to calculate Laws' texture energy at each pixel by computing statistics, including the mean and variance, over a series of multi-scale moving windows; the several variances produced in this step form a vector used as an additional feature. This work uses a threshold on the difference between neighborhood vectors to distinguish coarseness in a region after segmentation with the basic feature. In addition, it calculates the mean hue difference of each color in regions containing many colors within a 5 × 5 window and uses a threshold on the mean to judge the similarity between colors. Images from the Berkeley Segmentation Dataset (BSDS) containing several textures were examined using a difference threshold of 70 between neighborhood vectors and a mean hue threshold of 10. The results show that 70.6% of the texture segmentations can be accepted after combining color information and Laws' texture energy, a favorable result for texture segmentation.
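The Laws' texture energy computation can be sketched with the classic L5/E5 kernels; this is a generic single-scale formulation (valid-mode filtering, mean absolute response), not the exact multi-scale mean/variance statistics of the paper:

```python
L5 = [1, 4, 6, 4, 1]    # level (smoothing) kernel
E5 = [-1, -2, 0, 2, 1]  # edge kernel

def outer(u, v):
    """2-D Laws mask as the outer product of two 1-D kernels."""
    return [[a * b for b in v] for a in u]

def filter2d(img, k):
    """Valid-mode 2-D filter response (no padding)."""
    kh, kw = len(k), len(k[0])
    out = []
    for i in range(len(img) - kh + 1):
        row = []
        for j in range(len(img[0]) - kw + 1):
            s = sum(k[a][b] * img[i + a][j + b]
                    for a in range(kh) for b in range(kw))
            row.append(s)
        out.append(row)
    return out

def texture_energy(img, mask):
    """Mean absolute filter response: one Laws' energy statistic."""
    resp = filter2d(img, mask)
    vals = [abs(v) for row in resp for v in row]
    return sum(vals) / len(vals)

flat = [[5] * 8 for _ in range(8)]                   # uniform region
edge = [[0] * 4 + [10] * 4 for _ in range(8)]        # vertical intensity edge
e_flat = texture_energy(flat, outer(L5, E5))
e_edge = texture_energy(edge, outer(L5, E5))
```

Because E5 sums to zero, a uniform region yields zero energy while structured regions yield large energy, which is what makes the statistic useful for separating textures.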
The denoising semi-coupled dictionary learning for retina image super-resolution
Jiwen Dong, Weifang Wang, Guang Feng, et al.
Retina images are mainly obtained by Spectral Domain Optical Coherence Tomography (SD-OCT); however, most of the acquired volume data are low-resolution (LR) images with noise, making it hard to quantify diseased tissue from low-quality retinal images. In this paper, we propose a denoising Semi-Coupled Dictionary Learning (SCDL) model that reconstructs the noisy image while guaranteeing a certain noise robustness. First, we use the non-local similarities of retina images to construct a constraint term, which is added to the objective function of the proposed model. Then, to guarantee the fidelity of the reconstructed image, the initial interpolation section is replaced by the corresponding LR image after super-resolution reconstruction. Because the noise in the LR image would affect the reconstructed image quality, we perform bilateral filtering on the LR image before the replacement. Finally, two sets of experiments on noisy retinal images validate that our proposed method outperforms other state-of-the-art methods.
Image Detection Technology and Application
Indirect Gaussian kernel parameter optimization for one-class SVM in fault detection
Yingchao Xiao, Haichao Gao, Yongjie Yan
One-class SVM (OCSVM) is widely adopted as an effective method for fault detection, and its Gaussian kernel parameter directly influences its fault detection performance. However, the absence of fault samples in the training set makes it difficult to optimize this parameter. To solve this problem, a novel method of Gaussian kernel parameter optimization is proposed in this paper. The method first automatically selects edge and inner samples from the training set, and then optimizes the parameter by adjusting the distribution of the mappings of the edge and inner samples in the feature space, so as to facilitate the building of OCSVM models. Moreover, the method does not need to train OCSVM models during the parameter optimization, which saves computational resources. The effectiveness of the proposed method is verified by experiments on 2D data sets and UCI data sets.
A computer aided diagnosis system for lung cancer detection using support vector machine
Boran Şekeroğlu, Erkan Emirzade
Computer aided diagnosis (CAD) is starting to be implemented broadly in the diagnosis and detection of many varieties of abnormalities acquired during various imaging procedures. The main aim of CAD systems is to increase the accuracy and decrease the time of diagnosis, while their general goals are to find the locations of nodules and determine their characteristic features. As lung cancer is one of the most fatal and leading cancer types, there have been plenty of studies on the use of CAD systems to detect it. Yet CAD systems still need considerable development to identify the different shapes of nodules, segment the lung, and reach higher levels of sensitivity, specificity, and accuracy. In this paper, the Lung Image Database Consortium (LIDC) database, which comprises a set of documented thoracic CT scans of lung cancer, is used. After image pre-processing, segmentation, and feature extraction/selection steps, classification is performed using a Support Vector Machine (SVM) with a Gaussian RBF kernel, achieving 97.3% specificity and 92.0% sensitivity, which is superior to recently proposed CAD systems.
Learning-to-rank based abnormal phone analysis in the environment of telecommunication big data
Jian Liu, Ke Ji II, Runyuan Sun III, et al.
It is very important to find abnormal calls and take effective control measures, but most current solutions are passive processing technologies that lack active detection methods. Based on existing telecom big data, salient features that can represent abnormal telephones were found through statistical analysis of abnormal telephone behavior. An active detection method for abnormal telephones was then designed based on the Ranking SVM learning-to-rank method. Experimental results on real datasets show that the proposed method achieves higher accuracy under different sample sizes.
DroidDetector: a traffic-based platform to detect android malware using machine learning
Jingya Shen, Zhenxiang Chen, Shanshan Wang, et al.
With the rapid development of the mobile Internet, more and more people use smart phones to access the Internet, especially Android devices, which have become the most popular devices of the moment. Although today's mobile operating systems do their best to provide users with a secure Internet environment, the open source nature of Android means the outbreak of Android malware still cannot be completely stopped. Although existing source-based static detection and behavior-based dynamic detection can identify mobile malware, many problems remain, such as low detection efficiency and difficulty of deployment. To solve these problems, we propose DroidDetector, a detection engine that automatically detects whether an app is malware by using offline-trained machine learning models for network traffic analysis. DroidDetector uses the VpnService class provided by the Android SDK to intercept network traffic (it does not require root permission). All data analysis is performed on the server, which consumes minimal cache and resources on mobile devices. We extract the lengths of the first 8 packets of network traffic as features and use the Support Vector Machine (SVM) classification algorithm to train the model. In an evaluation experiment on 53,107 TCP packet-length feature tuples, DroidDetector achieves 95.68% detection confidence.
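The feature extraction the abstract describes, taking the lengths of the first 8 packets of a flow, can be sketched as below; the sample flow contents are hypothetical, and a real pipeline would feed these fixed-size vectors to an SVM:

```python
def packet_length_feature(flow, k=8):
    """Lengths of the first k packets of a flow, zero-padded to size k."""
    lengths = [len(pkt) for pkt in flow[:k]]
    return lengths + [0] * (k - len(lengths))  # pad short flows with zeros

# Hypothetical flow: a list of raw packet payloads (bytes) in arrival order.
flow = [b"\x16\x03\x01", b"GET / HTTP/1.1", b"ok"]
feature = packet_length_feature(flow)
```

Packet lengths are cheap to record on-device and need no payload inspection, which fits the design goal of doing the heavy analysis server-side.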
Escherichia coli and Enterococcus faecalis growth detection device in ice using impedance microbiology and image processing technique
Jessie R. Balbin, Ernesto M. Vergara Jr., Kzandra H. Katigbak, et al.
Various drinks served in different establishments always come with ice. Ice is the main component in preparing cold drinks; it also brings benefits to human health. However, due to poor sanitation in ice manufacturing, it can also put human health at risk. Escherichia coli and Enterococcus faecalis are the most common bacteria found in ice, and too much intake of ice contaminated by these two bacteria can lead to serious illness. In this study, the researchers present a prototype that detects the growth of Escherichia coli and Enterococcus faecalis using impedance microbiology and image processing. In addition, a graphical user interface (GUI) was implemented that displays captured images of the different phases of the sample and the concentration of bacteria present.
Oil spills detection by spreading attention along Gestalt cues
Qianqian Deng, Hao Dou, Peng Gao, et al.
This paper proposes an oil spill detection method based on superpixel merging. Inspired by the Gestalt criteria of cognitive psychology, we explore two typical Gestalt grouping cues, proximity and similarity, to drive the superpixel merging that accomplishes oil spill detection. First, the input infrared image is over-segmented into superpixels. Then we extract the feature of each superpixel and compute feature contrast to obtain initial attention. Finally, the attention is spread along the Gestalt grouping cues to merge superpixels and obtain the final whole oil spill regions.
Design of integrated quality detect system for saw blades based on LabVIEW
To solve the problems of low efficiency and poor precision that usually appear in saw blade detection, we designed a system that can detect surface jump, thickness, warpage, and cracks at the same time. Laser displacement sensors were used to ensure accuracy. The saw blade is driven by a stepper motor, which avoids direct manipulation of the blade by the user's hands. By using eddy current testing, cracks can be accurately detected. The human-machine interface (HMI), designed in LabVIEW, not only performs the data processing but also provides an easy-to-use interface, and test data can be stored in a database for further analysis. The system features fast detection speed, high precision, and good safety, and can thus replace the traditional method of saw blade detection.
Detection of zero degree belt loss in radial tire based on multiscale Gabor transform
Xiunan Zheng, Zengzhi Pang, Qingtao Hou, et al.
Tire safety is becoming more and more important with the increasing number of vehicles. Zero Degree Belt Loss (ZDBL) is one of the important defects in radial tires that attracts serious attention, as it can have a fatal influence on tire quality. In this study, an effective method to detect ZDBL in all-steel radial tires based on the multiscale Gabor transform and a morphological filter is proposed. First, multiscale and multi-direction Gabor filtering of the tire tread image is carried out. After Gabor filtering, the texture of the zero degree belt is obviously different from the other parts in the zero degree direction. Then, according to the direction feature extracted by the Gabor transform, a morphological filter is constructed to retain the zero-degree-direction texture. Finally, if the number of pixels in the zero degree direction after morphological filtering is less than a threshold, the tire is judged to have ZDBL. 800 tire images obtained from a tire factory are used in our experiment, including 100 normal images without any defects, 100 images with ZDBL, and 600 images with other types of defects. The results show a precision of 99.8% and a recall rate of up to 99.9%. Testing in the tire factory has also achieved good results without misreporting.
Surface defects detection of paper dish based on Mask R-CNN
Xuelong Wang, Ying Gao II, Junyu Dong, et al.
Machine vision is widely used in the detection of surface defects in industrial products. However, traditional detection algorithms are usually specialized and cannot be generalized to detect all types of defects. Object detection algorithms based on deep learning have powerful learning ability and can identify various types of defects. This paper applies an object detection algorithm to defect detection on paper dishes. We first captured images with different shapes of defects; the defects in these images were then annotated and integrated for model training. Next, a Mask R-CNN model was trained for defect detection, and finally we tested the model on different defect categories. Not only could the category and location of each defect in the image be obtained, but a pixel-level segmentation was also given. The experiments show that Mask R-CNN is a successful approach for the defect detection task and can quickly detect defects with high accuracy.
Determination of jeepney engine condition based on smoke emission analysis using carbon monoxide and carbon dioxide gas sensors with color-based segmentation using L*a*b color space
Jessie R. Balbin, Glenn V. Magwili, Jordaniel C. Agus, et al.
Jeepneys play a significant role in the Philippines' transportation system, as they are the most widely used mode of transportation in the country. With the advent of jeepney modernization, jeepneys with poor emissions are threatened with being phased out due to excessive emissions of harmful gases, which affect the environment as well as the health of commuters. Using an Arduino, CO (carbon monoxide) and CO2 (carbon dioxide) sensors, and a webcam, the researchers created a prototype that identifies the likely engine problem of a jeep by analyzing its smoke emissions. The device is accompanied by a graphical user interface for initializing the prototype, viewing real-time data, and saving data for reference and future use.
Spectral-spatial hyperspectral image classification based on extended training set
Changli Li, Qingyun Wang
Hyperspectral remote sensing image classification achieves good results using the support vector machine (SVM) even with very few training samples. However, due to restrictions on the number of samples, it is hard to further enhance classification accuracy when using only spectral information. On the other hand, when the training samples are few, one can improve classification accuracy by increasing them. Accordingly, we present a method of extending the training samples by using spatial information. In this method, the samples contained in one segmentation region are treated as the same class, and the class labels of all pixels in the region are decided by the class labels of the training samples contained in it. These new samples are named the extended training set. Experiments show that the proposed method performs better than direct use of the majority voting method.
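The extension step can be sketched as a per-region majority vote over the few labeled pixels; the segmentation map, pixel ids, and class names below are toy placeholders, not data from the paper:

```python
from collections import Counter

def extend_training_set(segments, train_labels):
    """Label every pixel of a segmentation region by majority vote of the
    training samples inside it (segments: pixel id -> region id)."""
    votes = {}  # region id -> Counter of training labels inside it
    for pixel, label in train_labels.items():
        votes.setdefault(segments[pixel], Counter())[label] += 1
    extended = {}
    for pixel, region in segments.items():
        if region in votes:  # regions with no labeled pixel stay unlabeled
            extended[pixel] = votes[region].most_common(1)[0][0]
    return extended

# Toy segmentation: pixels 0-2 form region A, pixels 3-5 form region B.
segments = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
train_labels = {0: "water", 1: "water", 4: "crop"}  # sparse labeled pixels
extended = extend_training_set(segments, train_labels)
```

Three labeled pixels become six training samples here; the spatial assumption is simply that a segmentation region is homogeneous enough to share one class.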
A self-adaptive subtraction algorithm for dynamic background video
Zhiyong An, JiaHui Zhang, Shuying Chen, et al.
This paper presents an effective background modeling method that incorporates an adaptive mechanism for dynamic backgrounds. Each pixel in the background model is defined by a history of the N most recent image values at that pixel. The model is compared with the current pixel value to determine whether or not the pixel belongs to the background, using a decision threshold. We design the time-spatial dynamic (TSD) feature to describe the dynamic background. Based on the TSD feature, the decision thresholds can be adjusted adaptively with feedback loops, overcoming the influence of a global threshold on dynamic backgrounds. Updating the background model is essential in order to account for changes in the background, such as moving background objects and lighting changes; the update rate of the background model is also adjusted adaptively with background changes based on the TSD feature. The experimental results demonstrate that the proposed algorithm outperforms several state-of-the-art methods on dynamic background video sequences.
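A per-pixel sample-based model of this kind can be sketched as below. This is a simplified illustration with a fixed decision threshold; in the paper, both the threshold and the update rate are adapted per pixel from the TSD feature, which is omitted here:

```python
import numpy as np

rng = np.random.default_rng(0)

class SampleBackgroundModel:
    """Per-pixel background model storing N sample values per pixel.

    A pixel is background if at least `min_matches` stored samples lie
    within `radius` of the current value; `radius` plays the role of the
    decision threshold (fixed here for brevity).
    """
    def __init__(self, first_frame, n_samples=20, radius=20, min_matches=2):
        self.samples = np.repeat(first_frame[None].astype(np.int16),
                                 n_samples, axis=0)
        self.radius = radius
        self.min_matches = min_matches

    def apply(self, frame):
        frame = frame.astype(np.int16)
        matches = (np.abs(self.samples - frame[None]) < self.radius).sum(axis=0)
        background = matches >= self.min_matches
        # conservative update: overwrite one stored sample, and only
        # at pixels currently classified as background
        idx = rng.integers(0, self.samples.shape[0])
        self.samples[idx][background] = frame[background]
        return ~background  # foreground mask
```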
Image Processing and Application
Special faster-RCNN for multi-objects detection
Libin Hu, Changzhi Wei, Xinghai Yang, et al.
A series of neural networks called RCNN plays a vital role in object detection; as the most refined of these, Faster RCNN achieves end-to-end object detection with comparatively low detection time and high accuracy. In this work, we propose two changes to the original Faster RCNN model for multi-object detection. First, we feed the RCNN network 1800 regions of interest (ROIs) from the RPN instead of 300, and all 1800 ROIs are used to train the softmax classification and bounding-box regression. Second, we traverse the XML annotation files of every training image to obtain the number of marked objects and calculate the IoU value for every marked object; we then define a dynamic loss function that uses these two values per image to evaluate and optimize the Faster RCNN model.
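The IoU (intersection over union) value used in the dynamic loss is the standard box-overlap measure; a minimal sketch for corner-format boxes:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```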
An automatic facial beautification method for video post-processing
Yifeng Zhou, Wensha Tian, Chengrong Yu, et al.
Facial beautification is very popular nowadays, and many photography apps support it. However, automatic beautification of human faces in video is still relatively rare. In this paper, we present an automatic facial beautification method for video post-processing software. First, we use OpenCV and Dlib to detect the face. Second, we use Gaussian blur and median filtering to whiten the facial area. We then use linear interpolation to add decoration to the cheeks. Lastly, we enhance the lip color using the digital differential analyzer (DDA) and scan-line algorithms. The method has been developed as a plugin for After Effects (AE). Experiments show that our method achieves good results with no obvious artifacts and is easy to operate.
Crowd counting system by facial recognition using Histogram of Oriented Gradients, Completed Local Binary Pattern, Gray-Level Co-Occurrence Matrix and Unmanned Aerial Vehicle
Jessie R. Balbin, Ramon G. Garcia, Kaira Emi D. Fernandez, et al.
A counting system is a device used for identifying the number of people present in a crowd. It has a wide variety of uses in the fields of statistics, business, and the social sciences. This study introduces a facial-recognition counting method that uses an unmanned aerial vehicle to capture aerial images of the crowd and MATLAB to process those images and count the number of people present. The algorithms used in this paper are the Histogram of Oriented Gradients (HOG) and Completed Local Binary Pattern (CLBP) for low density, and the Gray-Level Co-Occurrence Matrix (GLCM) for high density. From the data gathered, the program classifies an object as a head if it can see all of the human facial features, e.g., eyes, nose, and mouth. Thus, to obtain the best results in counting people in a crowd with this method, the user must take pictures at an angle and height at which the features of the face can be seen; in our case, at 15 degrees and 3.2 meters, respectively. In an actual field setting, however, many people will face different directions and some faces will be blocked by other people.
A CNN-based probability hypothesis density filter for multitarget tracking
Chenming Li, Wenguang Wang, Yankuan Liang
Recently, the probability hypothesis density (PHD) filter has shown excellent multiple-target tracking performance and has been applied to tracking targets in video. The PHD filter usually needs to integrate other features for image object tracking. However, a single hand-crafted feature shows poor robustness, while fusing multiple features increases complexity. To alleviate these problems, a deep convolutional neural network (CNN) based PHD filter is proposed in this paper. The proposed method exploits the impressive representational power of CNN features to improve robustness without increasing complexity. Besides this, we also revise the update process of the standard PHD filter to directly output continuous tracks and newborn targets. Experiments on the MOT17 dataset validate the efficacy of the proposed method for multitarget tracking in image sequences.
Research on position calibration method of temperature image of pool based on image processing technology
During the collection of laser molten-pool temperature data by a CCD industrial camera, the laser cladding process and the installation location of the mechanical equipment cause the lens axis of the camera not to be perpendicular to the surface of the molten pool, resulting in positioning error from tilted photography. In order to eliminate this positioning error and restore the true shape of the molten pool, a position calibration method for the molten-pool temperature image based on image processing technology is proposed. The experimental results show that after image processing, the morphology of the acquired molten-pool image is restored, which lays the foundation for subsequent image processing.
Grayscale difference sensitivity of human eyes on LCD screen
Xiangru Yu, Yimin Dou, Jinping Li
Grayscale difference sensitivity reflects the visual characteristics of human eyes, and people now often watch images and videos on LCD (liquid crystal display) screens, so research on grayscale difference sensitivity on LCDs is interesting and meaningful. Up to now, experiments on human sensitivity to brightness change have mainly fallen into two categories: one focuses on rods or other specific structures of the retina, which requires high-precision control of illumination devices; the other merely selects some contrasts at different spatial resolutions and regards the eye as a whole instead of its specific structures. After analyzing the advantages and disadvantages of these two kinds of experiments, we propose an experiment based on Weber's law to measure grayscale difference sensitivity on an LCD screen, which helps distinguish the selected change and the region of interest. Under actual illumination, the experiment is conducted on all gray levels. The procedure is as follows: first, present an image in a certain gray range as the background on the LCD screen; second, randomly select an area as the foreground; third, the tester adjusts the foreground grayscale gradually until the difference is perceptible, and the foreground position is then marked to verify the tester's response; finally, the background and foreground grayscales are recorded. The experiment was conducted under indoor illumination with 100 student volunteers who have normal or corrected visual acuity. To verify the experimental results, an image grayscale compression algorithm is proposed. The results show that the distribution of the grayscale difference sensitivity data is regular and that the experiment conforms to Weber's law.
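Weber's law states that the just-noticeable difference dI is roughly proportional to the background intensity I, so the recorded pairs can be summarized by their Weber fractions dI / I. A minimal sketch, assuming the experiment's records are stored as (background, foreground) grayscale pairs (this data format is an assumption, not from the paper):

```python
def weber_fractions(records):
    """Weber fraction |fg - bg| / bg for each recorded grayscale pair."""
    return [abs(fg - bg) / bg for bg, fg in records]

def mean_weber_fraction(records):
    """Average Weber fraction; roughly constant if Weber's law holds."""
    fractions = weber_fractions(records)
    return sum(fractions) / len(fractions)
```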
Fractional amplitude of low-frequency fluctuation and degree centrality in autistic children: a resting-state fMRI study
Bo Miao, Junling Guan, Qingfang Meng, et al.
Autism negatively affects healthy cognitive development in children. As reliable neuroimaging markers, the fractional amplitude of low-frequency fluctuation (fALFF) reflects the intensity of spontaneous brain activity, and degree centrality (DC) reflects whole-brain connectivity at the voxel level; combining these two markers lets us study the pathological mechanism of autism from more aspects. We investigated fALFF and weighted DC differences using functional magnetic resonance imaging (fMRI) data from 24 autistic children and 24 neurotypical children. Compared with neurotypical children, autistic children showed increased fALFF in the right medial frontal gyrus, right dorsal anterior cingulate cortex, and bilateral ventral posterior cingulate cortex, as well as decreased fALFF in the bilateral visual cortex. Autistic children also showed increased weighted DC in the left middle temporal gyrus, left middle frontal gyrus, and bilateral ventral posterior cingulate cortex, as well as decreased weighted DC in the left posterior cerebellar lobe and left visual cortex. Our results suggest that the pathological mechanism of autism is associated with changes in spontaneous activity and connectivity in many brain regions, and that these changes may affect theory-of-mind ability.
Rapid image retrieval with binary hash codes based on deep learning
GuangWei Deng, Cheng Xu, XiaoHan Tu, et al.
With the ever-growing large-scale image data on the web, rapid image retrieval has become one of the hot spots in the multimedia field, yet reliable retrieval remains difficult due to complex variations in image appearance. Inspired by the robustness of convolutional neural network features, we propose an effective deep learning framework that generates compact, similarity-preserving binary hash codes for rapid image retrieval. Our main idea is to incorporate a deep convolutional neural network (CNN) into the hash functions so as to jointly learn feature representations and their mappings to hash codes. In particular, our approach takes pairs of images as training inputs and learns hash codes and image representations together. Meanwhile, an effective loss function maximizes the discriminability of the output space by encoding the supervised information from the input image pairs. We extensively evaluate retrieval performance on two large-scale datasets, CIFAR-10 and NUS-WIDE, and the evaluation shows that our method outperforms traditional hashing methods in image retrieval.
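At retrieval time, schemes of this kind typically threshold the activations of a latent layer into bits and rank the database by Hamming distance. A minimal sketch of that inference step (the threshold value and array layout are illustrative assumptions; the learned CNN itself is omitted):

```python
import numpy as np

def binarize(features, threshold=0.5):
    """Turn real-valued latent-layer activations into binary hash codes."""
    return (np.asarray(features) > threshold).astype(np.uint8)

def hamming_ranking(query_code, db_codes):
    """Rank database entries by Hamming distance to the query code.

    Returns (indices sorted from nearest to farthest, distances).
    """
    dists = (db_codes != query_code).sum(axis=1)
    return np.argsort(dists, kind="stable"), dists
```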
Fast scene layout estimation via deep hashing
Yi Zhu, Wenbing Luo, Hanxi Li, et al.
In this work, we propose an efficient method for accurately estimating the scene layout in both outdoor and indoor scenarios. For outdoor scenes, the horizon line in a road image is estimated, while for indoor scenes, the wall-wall, wall-ceiling and wall-floor edges are estimated. A number of image patches are first cropped from the image and then fed into a convolutional neural network originally trained for object detection. The deep features yielded by three different layers are compared with the features of the training patches in a spatial-aware hashing fashion. The horizon line is then estimated via a sophisticated voting stage in which different voters are weighted according to their importance. In particular, for the more complex labels (in indoor scenes), we introduce the structural forest to further enhance the deep features before learning the hashing function. In practice, the proposed algorithm outperforms the state-of-the-art methods in accuracy for outdoor scenes while achieving performance comparable to the best indoor scene layout estimators. Furthermore, the proposed method runs in real time (up to 25 fps).
A similarity learning for fine-grained images based on the Mahalanobis metric and the kernel method
Zisheng Fu, Ninghua Wang, Zhimin Feng, et al.
Since most prior studies on similar-image retrieval focused on the category level, image similarity learning at the fine-grained level remains challenging and often leads to a semantic gap between low-level visual features and high-level human perception. To solve this problem, we propose a Mahalanobis and kernel-based similarity (Mah-Ker) method combined with features produced by a convolutional neural network (CNN). First, triplet constraints are introduced to characterize the fine-grained image similarity relationship upon which the Mahalanobis metric is trained. Then, a kernel-based metric is introduced in the last layer of the model to devise nonlinear extensions of the Mahalanobis metric and further enhance performance. Experiments based on the real VIP.com dress dataset showed that our proposed method achieved a promising, higher retrieval performance than both the state-of-the-art fine-grained similarity model and hand-crafted visual feature based approaches.
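The building blocks here are the learned Mahalanobis distance and the triplet constraint. A minimal sketch of both, assuming a positive semidefinite matrix M has already been learned (the hinge margin and function names are illustrative):

```python
import numpy as np

def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y) under a learned PSD M."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(d @ M @ d)

def triplet_loss(anchor, pos, neg, M, margin=1.0):
    """Hinge loss on one triplet: the positive pair should be closer
    than the negative pair by at least `margin`."""
    return max(0.0, margin
               + mahalanobis_sq(anchor, pos, M)
               - mahalanobis_sq(anchor, neg, M))
```

With M = I this reduces to the ordinary squared Euclidean distance; training adjusts M so that triplets sampled from fine-grained similarity annotations incur zero loss.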
Signal Analysis and Processing
Fine scale estimation for correlation filter tracking
Yanchuan Wang, Hongtao Yu, Shaomei Li, et al.
Focusing on the poor performance of correlation filter trackers under scale variation, a fine scale estimation approach is proposed. First, we train a scale correlation filter using the target's initial state. Second, the target is segmented according to its shape, and two subgraph correlation filters are established. During tracking, we judge the trend of scale change from the relative position changes of the subgraphs and offset the weights of the scale samples accordingly; in this way, we obtain a coarse scale estimate of the target. Finally, we use Newton's method to accurately estimate the target scale. Experiments show that the algorithm achieves more accurate scale estimation and effectively improves the tracking success rate.
Attribute reduction based on improved information entropy
Baohua Liang, Fei Ruan, Yun Liu
The traditional information entropy algorithm considers only the size of knowledge granularity, while the algebraic view considers only the impact of attributes on the determined subsets of the domain. In order to find an objective and comprehensive measure of attribute importance, we first, starting from the algebraic view, propose the definition of approximate boundary viscosity. Second, based on the definition of relative fuzzy entropy, the concept of relative information entropy is proposed, which can effectively measure the importance of attributes. To further amplify attribute importance, a concept of enhanced information entropy with significant amplification is proposed based on relative information entropy. Third, two new attribute reduction methods are proposed by combining the approximate boundary precision with relative information entropy and enhanced information entropy, respectively. Making full use of the results of U/B when seeking U/(B∪b) greatly reduces the time overhead of the system. Finally, experimental analysis and comparison verify the feasibility and validity of the proposed algorithms in terms of reduction quality and classification accuracy.
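The entropy-based measures above build on the classical conditional entropy of the decision attribute given a condition attribute subset, computed over the equivalence classes U/B. A minimal sketch of that underlying quantity and the resulting attribute significance (the table format and function names are illustrative, not the paper's notation):

```python
import math
from collections import Counter, defaultdict

def conditional_entropy(table, attrs, decision):
    """H(decision | attrs) over a decision table given as a list of dicts.

    Rows are grouped into equivalence classes by their values on `attrs`
    (i.e. the partition U/B), and the decision entropy is averaged over
    the classes weighted by their size.
    """
    groups = defaultdict(list)
    for row in table:
        groups[tuple(row[a] for a in attrs)].append(row[decision])
    n = len(table)
    h = 0.0
    for block in groups.values():
        p_block = len(block) / n
        counts = Counter(block)
        h -= p_block * sum((c / len(block)) * math.log2(c / len(block))
                           for c in counts.values())
    return h

def significance(table, attrs, candidate, decision):
    """Entropy drop from adding `candidate` to the attribute subset."""
    return (conditional_entropy(table, attrs, decision)
            - conditional_entropy(table, attrs + [candidate], decision))
```

Reduction algorithms greedily add the attribute with the largest significance; reusing the partition U/B when refining to U/(B∪b) is what saves computation.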
Classification of ECG Arrhythmia using symbolic dynamics through fuzzy clustering neural network
This paper presents an automatic ECG arrhythmia classification method using symbolic dynamics through a hybrid classifier. The proposed method consists of four steps: pre-processing, data extraction, symbolic time series construction, and classification. ECG signals are first pre-processed to remove noise. The QRS complex is then extracted, followed by R-peak detection. From the R-peak values, a symbolic time series representation is formed. Finally, the symbolic time series is classified using a fuzzy clustering neural network (FCNN). To evaluate the proposed method, we conducted experiments on the MIT-BIH dataset and compared the results with support vector machine (SVM) and radial basis function neural network (RBFNN) classifiers. The experimental results reveal that the FCNN classifier outperforms the other two classifiers.
Ensemble empirical mode decomposition applied to long-term solar time series analysis
Jianmei An, Yunfang Cai, Yi Qi, et al.
Solar time series exhibit nonlinear and non-stationary behavior, and possibly multi-modal dynamical processes operating in solar magnetic indicators. In the present work, the novel ensemble empirical mode decomposition (EEMD) is applied to study the monthly distribution of sunspot areas from the extended time series of solar activity indices (ESAI) database over the interval from January 1821 to December 1989. It is established that the quasi-periodic variations of monthly sunspot areas consist of at least three well-defined dynamical components: short-term variations obviously smaller than one year, mid-term variations with periodic scales varying from 1 year to 15 years, and periodic variations with periodicities larger than 15 years. The results indicate that the EEMD technique is an advanced tool for analyzing the weakly nonlinear and non-stationary dynamical behavior of the solar magnetic activity cycle.
A fast CUM-m-Capon algorithm for DOA estimation based on fourth-order cumulant
Kaikai Chao, Xinyu Zhang, Kai Huo, et al.
The problem of super-resolution DOA estimation at very low SNR has attracted much interest for decades. In this paper, we propose a fast DOA estimation algorithm based on the fourth-order cumulant for MIMO radar systems. Combined with an averaging matrix dimension-reduction technique, the proposed CUM-m-Capon DOA estimation algorithm achieves improved DOA estimation performance. Matlab simulation results show that the proposed algorithm has high resolution at low SNR, and adopting the averaging dimension-reduction technique lowers the algorithm's complexity. Tests using measured data show that the proposed method achieves better accuracy and robustness.
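For orientation, the standard Capon (MVDR) spatial spectrum that the CUM-m-Capon method builds on can be sketched as follows; this uses the second-order covariance of a uniform linear array, not the paper's fourth-order cumulant matrix or its dimension-reduction step:

```python
import numpy as np

def capon_spectrum(X, angles_deg, d=0.5):
    """Capon (MVDR) spatial spectrum P(theta) = 1 / (a^H R^-1 a).

    X: array snapshots, shape (n_sensors, n_snapshots)
    d: inter-element spacing in wavelengths (half-wavelength by default)
    """
    n = X.shape[0]
    R = X @ X.conj().T / X.shape[1]          # sample covariance
    R_inv = np.linalg.inv(R + 1e-6 * np.eye(n))  # diagonal loading
    spectrum = []
    for theta in np.deg2rad(angles_deg):
        a = np.exp(-2j * np.pi * d * np.arange(n) * np.sin(theta))
        spectrum.append(1.0 / np.real(a.conj() @ R_inv @ a))
    return np.array(spectrum)
```

Peaks of the spectrum over the scanned angles give the DOA estimates; replacing R with a fourth-order cumulant matrix suppresses Gaussian noise, which is the paper's low-SNR motivation.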
Music genre classification using a hierarchical long short term memory (LSTM) model
Chun Pui Tang, Ka Long Chui, Ying Kin Yu, et al.
This paper examines the application of the Long Short Term Memory (LSTM) model to music genre classification. We explore two different approaches. (1) In the first method, we use a single LSTM to directly classify 6 different genres of music; the method is implemented and the results are shown and discussed. (2) The first approach is only good for 6 or fewer genres, so in the second approach we adopt a hierarchical divide-and-conquer strategy to achieve 10-genre classification. In this approach, music is first classified into strong and mild genre classes. The strong class includes hip-hop, metal, pop, rock, and reggae, because they usually have heavier and stronger beats; the mild class includes jazz, disco, country, classical, and blues, because they tend to be musically softer. We further divide the subclasses into sub-subclasses to help with classification: an input piece is first classified as strong or mild, and each subclass is then classified further until one of the ten final classes is identified. Each subclass classification module is implemented using an LSTM. Our hierarchical divide-and-conquer scheme is built and tested. The average classification accuracy of this approach for 10-genre classification is 50.00%, which is higher than the state-of-the-art approach that uses a single convolutional neural network. Our experimental results show that this hierarchical scheme improves classification accuracy significantly.
Indexing and classifying snore characteristics using Support Vector Machine and integrated signal processing algorithm
Jessie R. Balbin, Ernesto Vergara Jr., Ross Junior S. Calma, et al.
Snoring is the loud or harsh sound that occurs when an individual sleeps. Snoring can be produced through the nose, throat, uvula, or tongue, and the nature of each can be a sign helping to specify what medical ailment or disorder a person may have. This paper focuses on a sleeping disorder called obstructive sleep apnea (OSA). Building on other investigations of snore detection and indexing, categories of snore have been segregated and classified by their elementary acoustic features, such as sound intensity and frequency. The study aims to produce a device that records a snore sound and classifies which ailment the patient may be suffering from, using a Support Vector Machine (SVM) and a signal processing algorithm.
Computer Science and Engineering
A genetic algorithm-based approach for class-imbalanced learning
Shangyan Dong, Yongcheng Wu
It is often the case in machine learning that real-world datasets are imbalanced. When dealing with this problem, traditional classification methods that aim to maximize overall accuracy are not suitable. To tackle this issue and improve classifier performance, methods based on oversampling, undersampling, and cost-sensitive classification are widely employed. In this paper, we propose a new genetic algorithm-based oversampling technique for class-imbalanced datasets. The genetic algorithm creates optimized synthetic minority-class instances to produce a balanced training dataset. The experimental results on 5 class-imbalanced datasets show that our method performs better than three existing sampling techniques in terms of AUC and F-measure.
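The core idea of generating minority instances with GA-style variation operators can be sketched as below. This is a simplified illustration using arithmetic crossover and Gaussian mutation only; the paper's method additionally optimizes the synthetic instances, which is omitted here:

```python
import random

def synthesize_minority(minority, n_new, mutation_scale=0.1, seed=0):
    """Generate synthetic minority samples via GA-style variation.

    Each child is an arithmetic crossover of two random minority
    parents, perturbed by small Gaussian mutation.
    """
    rng = random.Random(seed)
    children = []
    for _ in range(n_new):
        p1, p2 = rng.sample(minority, 2)
        alpha = rng.random()  # crossover weight
        child = [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]
        child = [c + rng.gauss(0, mutation_scale) for c in child]
        children.append(child)
    return children
```

The synthetic instances are appended to the minority class until the training set is balanced, after which any standard classifier can be trained.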
Ideal solutions of TOPSIS based on vague sets and their application to landmark preference
Bo Wei, Xiaoyu Zhang, Xue He, et al.
Ideal solutions are an important part of the technique for order preference by similarity to ideal solution (TOPSIS) based on vague sets. In order to expand and supplement these ideal solutions, and considering the potential influence of the degree of unknown in vague sets on the degrees of truth-membership and false-membership, three potential ideal solutions are proposed by seeking new ideal-solution forms between the maximal ideal solutions and the actual ideal solutions. Theoretical proof shows that all of the proposed potential ideal solutions are vague sets, which means they can all be applied in the calculation of TOPSIS based on vague sets; they constitute three supplementary forms of the ideal solutions of vague-set TOPSIS. Several properties of the proposed potential ideal solutions are discussed, showing that they can be converted to the maximal ideal solutions and the actual ideal solutions under certain conditions. The proposed potential ideal solutions are applied to landmark preference based on TOPSIS, their effectiveness and feasibility are illustrated, and a new way of performing landmark preference is thereby provided.
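For orientation, the classic crisp TOPSIS procedure that the vague-set variant generalizes can be sketched as follows; the vague-set version replaces the crisp ideal and anti-ideal vectors below with vague-set ideal solutions, which this sketch does not model:

```python
import numpy as np

def topsis(matrix, weights, benefit):
    """Classic crisp TOPSIS scores in [0, 1]; higher is better.

    matrix: alternatives x criteria
    weights: criterion weights
    benefit: True where larger values are better, False for cost criteria
    """
    m = np.asarray(matrix, dtype=float)
    norm = m / np.linalg.norm(m, axis=0)        # vector normalization
    v = norm * np.asarray(weights, dtype=float)  # weighted matrix
    benefit = np.asarray(benefit)
    ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))
    anti = np.where(benefit, v.min(axis=0), v.max(axis=0))
    d_pos = np.linalg.norm(v - ideal, axis=1)   # distance to ideal
    d_neg = np.linalg.norm(v - anti, axis=1)    # distance to anti-ideal
    return d_neg / (d_pos + d_neg)              # relative closeness
```

Alternatives (here, candidate landmarks) are ranked by the returned closeness scores.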
Research on opinion spreading based on military hierarchical network
Zhiyi Zhang, Xiaohu Yin, Haiyan Liu, et al.
The military unit is a special hierarchical social network. Based on complex network theory, opinion dynamics, and multi-agent theory, this paper studies the regularities of opinion spreading in a military hierarchical network. The analysis shows that the opinion value, influence, and tenacity of an individual, together with the trust between neighboring individuals, are the main factors influencing opinion spreading. The speed of opinion spreading in a network with interconnections between same-level nodes is low, and it is hard to reach a consistent supporting state. The research shows that in such a closed social system, achieving consistent support requires not only keeping the purity of the network but also intervening effectively with the special individuals.
Target regression tracking based on convolutional neural network
Hongwei Zhang, Xiang Fan, Bin Zhu, et al.
For visual tracking with a UAV, non-rigid changes of the target usually result in accumulated errors and a decline in tracking precision. In view of this problem, a target regression tracking algorithm based on a convolutional neural network is proposed. First, we use a Siamese convolutional neural network to extract features, which are used as the input of a tracker based on self-adapted-scale kernel correlation filters. Then, to cope with the cumulative errors caused by changes in target form, a target regression network is designed to refine the location. Using the refined location to extract samples and update the tracker's filter parameters prevents the tracker from being polluted. The experimental results show that the algorithm achieves high tracking precision as well as fast speed compared with state-of-the-art tracking algorithms, and is especially able to handle non-rigid changes of the target.
Hadoop-based analysis model of network public opinion and its implementation
Fei Wang, Peiyu Liu, Zhenfang Zhu
In order to mine network public opinion effectively, this paper proposes a Hadoop-based network public opinion analysis model. The model applies the HDFS file system to store massive network data in a distributed manner, providing fault tolerance and reliability. Because the traditional K-means clustering method is too inefficient to process massive data, the model adopts a MapReduce-based distributed K-means topic clustering method to process massive public opinion information efficiently through multi-machine cooperation. Hot network public opinion in a given period of time is obtained by analyzing topic heat, and the effectiveness of the proposed method is verified by experiments.
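One K-means iteration maps naturally onto MapReduce: mappers assign points to their nearest centers and emit partial sums, and reducers average them into new centers. A pure-Python sketch of the two steps (the actual system runs these on Hadoop; the function names are illustrative):

```python
def kmeans_map(points, centers):
    """Map step: emit (nearest-center index, (point, 1)) pairs."""
    for p in points:
        idx = min(range(len(centers)),
                  key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
        yield idx, (p, 1)

def kmeans_reduce(pairs):
    """Reduce step: average the points assigned to each center index."""
    sums, counts = {}, {}
    for idx, (p, c) in pairs:
        if idx not in sums:
            sums[idx] = [0.0] * len(p)
            counts[idx] = 0
        for j, x in enumerate(p):
            sums[idx][j] += x
        counts[idx] += c
    return {i: [s / counts[i] for s in sums[i]] for i in sums}
```

Iterating map and reduce until the centers stabilize yields the distributed clustering; a combiner summing (point, count) pairs locally would cut shuffle traffic on a real cluster.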
A rough-set based measurement for the membership degree of fuzzy C-means algorithm
The traditional fuzzy C-means (FCM) algorithm is stable and easy to implement. However, data elements on cluster boundaries are easily assigned to incorrect classes, reducing the efficiency of the FCM algorithm. To solve this problem, this paper presents a Rough-FCM algorithm that combines the FCM algorithm with rough sets through new update equations, taking advantage of the positive region and boundary region of rough sets. First, Rough-FCM divides the data elements into the positive region or the boundary region of each class according to a preset threshold. Second, it updates the cluster centers and membership matrices with the new equations, enabling a second clustering based on the first FCM clustering. By comparing the experimental results of Rough-FCM with K-means, DBSCAN, and FCM under four clustering evaluation indexes on both synthetic and real datasets, we show that our proposed algorithm improves the outcomes on most of the datasets relative to these three classic clustering algorithms.
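The standard FCM updates that Rough-FCM modifies can be sketched as follows; these are the classical membership and center equations, not the paper's rough-set variants:

```python
import numpy as np

def fcm_memberships(X, centers, m=2.0):
    """Standard FCM membership update:
    u[k, i] = 1 / sum_j (d[k, i] / d[k, j]) ** (2 / (m - 1)).
    X: points (n, dim); centers: (C, dim); returns U of shape (n, C).
    """
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.fmax(d, 1e-12)  # avoid division by zero at a center
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1))
    return 1.0 / ratio.sum(axis=2)

def fcm_centers(X, U, m=2.0):
    """Standard FCM center update: weighted mean with weights u ** m."""
    w = U ** m
    return (w.T @ X) / w.sum(axis=0)[:, None]
```

Rough-FCM inserts a thresholding step between these updates, routing each point into a class's positive or boundary region and reweighting the center update accordingly.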
Short-term power load forecasting based on the PSO-RVR model
Yan Zhang, Genji Yuan, Hanran Ji, et al.
In order to improve the accuracy of short-term load forecasting for power systems, this paper proposes a relevance vector regression (RVR) load forecasting model based on particle swarm optimization (PSO) and compares it with the support vector regression (SVR) model. To address the randomness in initializing the RVR parameters, namely the penalty function and the kernel function, the PSO algorithm is used to optimize these parameters, achieving better prediction results. The classical particle swarm optimization algorithm is a global optimization algorithm that can quickly find the optimal parameters for relevance vector regression. The PSO-based RVR model is applied to short-term load forecasting. The simulation results show that the optimized model is more accurate than the traditional SVR and RVR prediction models, and that the prediction accuracy of the PSO-RVR model is higher than that of PSO-SVR, which verifies the feasibility and practical value of PSO-optimized relevance vector regression for short-term load forecasting.
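The parameter search can be driven by a minimal global-best PSO like the sketch below, here minimizing a toy objective; in the paper the objective would be the RVR validation error over its kernel and penalty parameters (the inertia and acceleration constants are common defaults, not values from the paper):

```python
import numpy as np

def pso(objective, dim, n_particles=20, iters=100, bounds=(-5.0, 5.0), seed=0):
    """Minimal global-best PSO minimizing `objective` over a box."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))   # positions
    v = np.zeros_like(x)                          # velocities
    pbest = x.copy()                              # personal bests
    pbest_f = np.array([objective(p) for p in x])
    gbest = pbest[np.argmin(pbest_f)].copy()      # global best
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        f = np.array([objective(p) for p in x])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        gbest = pbest[np.argmin(pbest_f)].copy()
    return gbest, float(pbest_f.min())
```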
Design for omni-directional mobile wheelchair control system based on brain computer interface
Jiaxing Lu, Linyan Wu, Weiwei Zai, et al.
Based on the steady-state visual evoked potential (SSVEP) signal generation method, a multi-mode control system with manual control, remote control, and brain-computer signal control is designed for an omni-directional wheelchair. The system structure design and the EEG signal acquisition and recognition processing technology are introduced. The software architecture of the overall control system, the kinematic modeling of the omni-directional wheelchair, the implementation of the fundamental motion control algorithm, and the design of the wheelchair motion control algorithm are expounded, along with the motion scheduling and control algorithms of the main control system. The system uses the user's EEG signal to control the movement of the wheelchair and also supports remote control. A friendly interactive interface allows monitoring-center staff or patients' relatives to supervise the wheelchair's motion state and environmental information in real time. Experimental results show that the system can stably and reliably analyze EEG signals and has practical value.