Proceedings Volume 10609

MIPPR 2017: Pattern Recognition and Computer Vision

Zhiguo Cao, Yuehuang Wang, Chao Cai

Volume Details

Date Published: 20 March 2018
Contents: 2 Sessions, 60 Papers, 0 Presentations
Conference: Tenth International Symposium on Multispectral Image Processing and Pattern Recognition (MIPPR2017) 2017
Volume Number: 10609

Table of Contents

  • Front Matter: Volume 10609
  • Pattern Recognition and Computer Vision
Front Matter: Volume 10609
Front Matter: Volume 10609
This PDF file contains the front matter associated with SPIE Proceedings Volume 10609, including the Title Page, Copyright information, Table of Contents, and Conference Committee listing.
Pattern Recognition and Computer Vision
Static facial expression recognition with convolution neural networks
Facial expression recognition is currently an active research topic in computer vision, pattern recognition, and artificial intelligence. In this paper, we develop a convolutional neural network (CNN) for classifying human emotions from static facial expressions into one of seven facial emotion categories. We pre-train our CNN model on the combined FER2013 dataset, formed from its train, validation, and test sets, and fine-tune it on the extended Cohn-Kanade database. To reduce overfitting, we utilize several techniques, including dropout and batch normalization in addition to data augmentation. According to the experimental results, our CNN model has excellent classification performance and robustness for facial expression recognition.
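The dropout regularization cited in the abstract above can be sketched in a few lines; this is a generic textbook illustration (inverted dropout), not the authors' implementation:

```python
import random

def inverted_dropout(activations, p_drop, rng=None):
    """Inverted dropout: zero each unit with probability p_drop and rescale
    survivors by 1/(1 - p_drop) so the expected activation is unchanged."""
    rng = rng or random.Random(0)  # fixed seed for a reproducible sketch
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

acts = [1.0, 2.0, 3.0, 4.0]
dropped = inverted_dropout(acts, p_drop=0.5)
# surviving units are scaled by 2.0; dropped units become 0.0
```

At test time no units are dropped and no rescaling is needed, which is exactly why the inverted form divides by `keep` during training.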
Feature hashing for fast image retrieval
Lingyu Yan, Jiarun Fu, Hongxin Zhang, et al.
Current research on content-based image retrieval mainly focuses on robust feature extraction. However, due to the exponential growth of online images, it is necessary to consider searching among large-scale image collections, which is very time-consuming and unscalable; hence, much attention must be paid to the efficiency of image retrieval. In this paper, we propose a feature hashing method for image retrieval that not only generates a compact fingerprint for image representation but also prevents large semantic loss during the hashing process. To generate the fingerprint, an objective function of semantic loss is constructed and minimized, which combines the influence of both the neighborhood structure of the feature data and the mapping error. Since machine-learning-based hashing effectively preserves the neighborhood structure of the data, it yields visual words with strong discriminability. Furthermore, the generated binary codes make building the image representation low-complexity, efficient, and scalable to large-scale databases. Experimental results show the good performance of our approach.
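The idea of a compact binary fingerprint that preserves neighborhood structure can be illustrated with the classic random-hyperplane hashing scheme (sign of random projections); this is a simplified stand-in for the learned hashing the abstract describes, not the paper's method:

```python
import random

def random_hyperplanes(dim, n_bits, seed=0):
    """One random Gaussian hyperplane per output bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

def hash_feature(vec, planes):
    """Compact binary fingerprint: one bit per hyperplane (sign of projection)."""
    return tuple(1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
                 for plane in planes)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

planes = random_hyperplanes(dim=4, n_bits=16)
a = hash_feature([1.0, 0.9, 0.1, 0.0], planes)
b = hash_feature([1.0, 1.0, 0.0, 0.1], planes)    # near-duplicate of a
c = hash_feature([-1.0, 0.2, 0.9, -0.5], planes)  # unrelated vector
# similar vectors collide on most bits, so hamming(a, b) < hamming(a, c)
```

Because nearby vectors fall on the same side of most hyperplanes, their Hamming distance approximates the angle between them, which is the neighborhood-preservation property the learned objective in the paper optimizes explicitly.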
Affine invariant feature extraction based on the shape of local support region
Luping Lu, Yong Zhang
Feature extraction is an important step in image feature matching, and the repeatability of features is particularly crucial. Perspective deformation of images can decrease this repeatability. This paper introduces a feature extraction method that improves the repeatability of features when notable perspective deformation exists. First, initial feature points are extracted by the classical Harris algorithm. Then a local support region is extracted for every initial feature point, and affine rectification parameters are calculated based on the shape of the support region. The image patch around each feature point is then resampled using these affine rectification parameters, and the final feature points are extracted and described on the resampled image patches. Thanks to the affine rectification, the repeatability of the final features is much better than that of the initial features, and the feature descriptors obtained on the resampled image patches are better suited for image matching.
Adaptive region constrained FCM algorithm for image segmentation with hierarchical superpixels
Lei Li, Zhuoli Dong, Xuan Fei, et al.
Spatial fuzzy c-means (FCM) clustering has been successfully applied in image segmentation. However, due to noise and intensity inhomogeneity in images, most spatially constrained models fail to resolve the misclassification problem. To further improve segmentation accuracy, a robust spatially constrained FCM-based image segmentation method with hierarchical region information is proposed in this paper. First, two levels of superpixels of the input image are generated by two classical segmentation methods, and the first-level superpixels, instead of the pixels, serve as the input to FCM. Second, by incorporating spatial constraints from the high-level superpixels, a novel membership function for the first-level superpixels is designed to overcome the impact of image noise and accelerate the convergence of the clustering process. By using superpixels instead of pixels and incorporating superpixel information into the spatial constraints, the proposed method achieves highly consistent segmentation results. Experimental results on the Berkeley image database demonstrate the good performance of the proposed method.
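For readers unfamiliar with FCM, the standard membership update underlying the abstract above is u_ik = 1 / Σ_j (d_ik/d_jk)^(2/(m-1)); a minimal 1-D sketch (without the paper's superpixel spatial constraints) is:

```python
def fcm_memberships(points, centers, m=2.0, eps=1e-12):
    """Standard FCM membership update: u_ik = 1 / sum_j (d_ik/d_jk)^(2/(m-1)).
    Uses scalar points and absolute distance for brevity."""
    u = []
    for x in points:
        dists = [max(abs(x - c), eps) for c in centers]  # avoid divide-by-zero
        row = [1.0 / sum((di / dj) ** (2.0 / (m - 1.0)) for dj in dists)
               for di in dists]
        u.append(row)
    return u

u = fcm_memberships([0.1, 0.9, 0.5], centers=[0.0, 1.0])
# each membership row sums to 1; the point at 0.1 leans heavily to cluster 0,
# and the midpoint 0.5 splits 50/50
```

The paper's contribution replaces the raw pixel distances with superpixel-level terms plus a spatial constraint, but the membership normalization has this same form.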
A mixture model for robust registration in Kinect sensor
Li Peng, Huabing Zhou, Shengguo Zhu
The Microsoft Kinect sensor has been widely used in many applications, but it suffers from low registration precision between the color image and the depth image. In this paper, we present a robust method to improve the registration precision with a mixture model that can handle multiple images under a nonparametric model. We impose nonparametric geometric constraints on the correspondence, as a prior distribution, in a reproducing kernel Hilbert space (RKHS). The estimation is performed by the EM algorithm, which, by also estimating the variance of the prior model, is able to obtain good estimates. We illustrate the proposed method on a publicly available dataset. The experimental results show that our approach outperforms the baseline methods.
HOG pedestrian detection based on edge symmetry and trilinear interpolation
In computer vision, pedestrian detection is a key problem. In this paper, we propose to speed up the HOG+SVM algorithm without sacrificing classification accuracy. To eliminate the aliasing effects that arise during HOG extraction, we use trilinear interpolation to extract features, and we propose a HOG pedestrian detection method based on edge symmetry. Traditional HOG pedestrian detection suffers from slow detection speed and a low detection rate. In our experiments on the INRIA dataset, using trilinear interpolation and edge symmetry improves both the detection quality and the detection rate.
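The orientation part of the trilinear interpolation mentioned above distributes each gradient vote linearly between the two nearest orientation bins instead of hard-assigning it; a minimal sketch (one vote, orientation dimension only, not the authors' code) is:

```python
def vote_orientation(angle_deg, magnitude, n_bins=9):
    """Split one gradient vote linearly between the two nearest orientation
    bins (the orientation axis of HOG's trilinear interpolation).
    Unsigned gradients: angles in [0, 180)."""
    bin_width = 180.0 / n_bins
    pos = angle_deg / bin_width - 0.5   # position relative to bin centers
    lo = int(pos // 1)                  # lower bin index (may be -1: wraps)
    frac = pos - lo
    hist = [0.0] * n_bins
    hist[lo % n_bins] += magnitude * (1.0 - frac)
    hist[(lo + 1) % n_bins] += magnitude * frac
    return hist

h = vote_orientation(angle_deg=25.0, magnitude=1.0)
# 25 deg lies between bin centers 10 and 30 (9 bins of 20 deg):
# weight 0.25 to the bin at 10 and 0.75 to the bin at 30
```

Full trilinear interpolation applies the same linear split along the two spatial cell axes as well, which is what removes the aliasing from hard binning.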
A multi-view face recognition system based on cascade face detector and improved Dlib
Hongjun Zhou, Pei Chen, Wei Shen
In this research, we present a framework for a multi-view face detection and recognition system based on a cascade face detector and improved Dlib. This method aims to solve the problems of low efficiency and low accuracy in multi-view face recognition, to build a multi-view face recognition system, and to discover a suitable monitoring scheme. For face detection, the cascade face detector extracts Haar-like features from the training samples, and these features are used to train a cascade classifier with the Adaboost algorithm. Next, for face recognition, we propose an improved distance model based on Dlib to improve the accuracy of multi-view face recognition. Furthermore, we apply the proposed method to face images taken from different viewing directions, including the horizontal, overhead, and upward-looking views, and investigate a suitable monitoring scheme. The method works well for multi-view face recognition; it has been simulated and tested, showing satisfactory experimental results.
A framework for farmland parcels extraction based on image classification
It is very important for the government to build an accurate national database of basic cultivated land, and farmland parcel extraction is one of the basic steps in this work. In the past, however, people had to spend much time determining whether an area was a farmland parcel, since remote sensing images could only be understood through visual interpretation. To overcome this problem, this study proposes a method to extract farmland parcels by means of image classification. In the proposed method, the farmland areas and ridge areas of the classification map are semantically processed independently, and the results are fused to form the final farmland parcels. Experiments on high-spatial-resolution remote sensing images have shown the effectiveness of the proposed method.
A depth enhancement strategy for kinect depth image
Kinect is a motion sensing input device widely used in computer vision and related fields. However, there are many inaccurate depth values in Kinect depth images, even with Kinect v2. In this paper, an algorithm is proposed to enhance Kinect v2 depth images. Based on the principle of its depth measurement, the foreground and the background are treated separately. For the background, holes are filled according to the depth data in the neighborhood; for the foreground, a filling algorithm based on the color image that considers both spatial and color information is proposed. An adaptive joint bilateral filtering method is used to reduce noise. Experimental results show that the processed depth images have a clean background and clear edges, and the results are better than those of traditional strategies. The method can be applied in 3D reconstruction to preprocess depth images in real time and obtain accurate results.
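The background hole-filling step described above (filling invalid pixels from valid neighborhood depth) can be sketched with a simple median-of-neighbors pass; this is a generic illustration, not the authors' algorithm, which additionally uses color information for the foreground:

```python
import statistics

def fill_depth_holes(depth, invalid=0):
    """Fill invalid (zero) depth pixels with the median of their valid
    8-neighbours. Pixels with no valid neighbour are left untouched."""
    h, w = len(depth), len(depth[0])
    out = [row[:] for row in depth]
    for y in range(h):
        for x in range(w):
            if depth[y][x] != invalid:
                continue
            neigh = [depth[ny][nx]
                     for ny in range(max(0, y - 1), min(h, y + 2))
                     for nx in range(max(0, x - 1), min(w, x + 2))
                     if (ny, nx) != (y, x) and depth[ny][nx] != invalid]
            if neigh:
                out[y][x] = statistics.median(neigh)
    return out

d = [[5, 5, 5],
     [5, 0, 5],
     [5, 5, 4]]
filled = fill_depth_holes(d)
# the centre hole takes the median of its valid neighbours (5)
```

A median (rather than a mean) avoids blending depths across an object boundary, which matters most near the foreground edges the abstract highlights.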
The method for froth floatation condition recognition based on adaptive feature weighted
The fusion of froth characteristics can play a complementary role in expressing the content of a froth image, and the weighting of these characteristics is the key to making full use of the relationships between different features. In this paper, an adaptive feature weighting method for froth flotation condition recognition is proposed. Froth features with and without weights are both classified using a support vector machine (SVM). The classification accuracy and the optimal equalization algorithm under each ore grade are regarded as the result of the adaptive feature weighting algorithm, and the effectiveness of the adaptive weighting method is demonstrated.
Method of segmenting river from remote sensing image
Qingyun Tang, Jun Zhang, Daimeng Zhang, et al.
This paper presents a method for segmenting the river area in remote sensing images. The spectral distribution of the river area is relatively uniform, its overall gray level is dark, and its spectrum is evenly distributed regardless of direction; the spectral information of land areas, by contrast, is mostly cluttered and non-uniform, and even a land area with a relatively uniform spectral distribution exhibits a certain directionality. Based on these characteristics, this paper uses a cross-shaped template and takes the regional variance as the regional texture feature to obtain an adaptive threshold and hence an adaptive binary map. A river is usually a connected body of water, and only a sufficiently large area can be identified as a river, so a binary image labeling algorithm is used to obtain the largest connected region, which is marked as the river. Experiments show that this river segmentation method is suitable for remote sensing images with relatively large river regions.
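The final step above, keeping only the largest connected region of the binary map, is standard connected-component labeling; a minimal BFS sketch (a generic illustration, not the authors' code) is:

```python
from collections import deque

def largest_component(binary):
    """Return a mask keeping only the largest 4-connected foreground region
    of a binary image (here: the river, as the largest water body)."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for sy in range(h):
        for sx in range(w):
            if not binary[sy][sx] or seen[sy][sx]:
                continue
            comp, q = [], deque([(sy, sx)])
            seen[sy][sx] = True
            while q:                      # flood-fill one component
                y, x = q.popleft()
                comp.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        q.append((ny, nx))
            if len(comp) > len(best):
                best = comp
    mask = [[0] * w for _ in range(h)]
    for y, x in best:
        mask[y][x] = 1
    return mask

img = [[1, 1, 0, 1],
       [0, 1, 0, 0],
       [0, 0, 0, 1]]
mask = largest_component(img)
# only the 3-pixel component in the top-left survives
```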
Robust image matching via ORB feature and VFC for mismatch removal
Tao Ma, Wenxing Fu, Bin Fang, et al.
Image matching underlies many image processing and computer vision problems, such as object recognition and structure from motion. Current methods rely on good feature descriptors and mismatch removal strategies for detection and matching. In this paper, we propose a robust image matching approach based on the ORB feature and VFC for mismatch removal. ORB (Oriented FAST and Rotated BRIEF) is an outstanding feature with performance comparable to SIFT at lower cost, and VFC (Vector Field Consensus) is a state-of-the-art mismatch removal method. The experimental results demonstrate that our method is efficient and robust.
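To give a feel for mismatch removal, here is a deliberately crude consistency filter: keep matches whose displacement stays near the median displacement. This is a much-simplified stand-in for VFC, which instead fits a smooth vector field in an RKHS and rejects matches inconsistent with it:

```python
import statistics

def filter_matches(matches, tol=2.0):
    """Keep matches ((x1,y1),(x2,y2)) whose displacement is within `tol`
    of the median displacement. A toy proxy for VFC's consensus idea."""
    dx = statistics.median(x2 - x1 for (x1, y1), (x2, y2) in matches)
    dy = statistics.median(y2 - y1 for (x1, y1), (x2, y2) in matches)
    return [m for m in matches
            if abs((m[1][0] - m[0][0]) - dx) <= tol
            and abs((m[1][1] - m[0][1]) - dy) <= tol]

matches = [((0, 0), (10, 5)), ((3, 2), (13, 7)),
           ((5, 5), (15, 10)), ((1, 1), (40, -3))]  # last one is an outlier
inliers = filter_matches(matches)
# the consistent translation (+10, +5) keeps the first three matches
```

A median filter only handles a global translation; VFC's smooth-field prior is what lets it cope with rotation, scaling, and locally varying motion.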
Dual-threshold segmentation using Arimoto entropy based on chaotic bee colony optimization
In order to extract targets from complex backgrounds more quickly and accurately, and to further improve defect detection, a dual-threshold segmentation method using Arimoto entropy based on chaotic bee colony optimization is proposed. First, single-threshold selection based on Arimoto entropy is extended to dual-threshold selection so that the target can be separated from the background more accurately. Then, the intermediate variables in the Arimoto entropy dual-threshold selection formulae are calculated by recursion to eliminate redundant computation and reduce the amount of calculation. Finally, the local search phase of the artificial bee colony algorithm is improved with a chaotic sequence based on the tent map, so that the search for the two optimal thresholds is noticeably accelerated. Extensive experimental results show that, compared with existing segmentation methods such as multi-threshold segmentation using maximum Shannon entropy, two-dimensional Shannon entropy segmentation, two-dimensional Tsallis gray entropy segmentation, and multi-threshold segmentation using reciprocal gray entropy, the proposed method segments the target more quickly and accurately, with superior segmentation quality, proving to be a fast and effective method for image segmentation.
The selection of the optimal baseline in the front-view monocular vision system
Bincheng Xiong, Jun Zhang, Daimeng Zhang, et al.
In the front-view monocular vision system, the accuracy of solving the depth field is related to the length of the inter-frame baseline and the accuracy of image matching. In general, a longer baseline leads to higher precision in solving the depth field; at the same time, however, the difference between the inter-frame images increases, which makes image matching harder, decreases matching accuracy, and may ultimately cause depth field estimation to fail. A usual practice is to improve matching accuracy with a tracking-and-matching method, but this approach easily causes matching drift between images with a large interval, producing cumulative matching error, so the accuracy of the resulting depth field remains low. In this paper, we propose a depth field fusion algorithm based on the optimal baseline length. First, we analyze the quantitative relationship between depth field accuracy and the inter-frame baseline length, and find the optimal baseline length through extensive experiments. Second, we introduce the inverse depth filtering technique from sparse SLAM and solve the depth field under the constraint of the optimal baseline length. Extensive experimental results show that our algorithm can effectively eliminate mismatches caused by image changes and can still solve the depth field correctly in large-baseline scenes. Our algorithm is superior to the traditional SFM algorithm in time and space complexity, and the optimal baseline obtained from the experiments can guide depth field computation in front-view monocular vision.
Learning deep features with adaptive triplet loss for person reidentification
Zhiqiang Li, Nong Sang, Kezhou Chen, et al.
Person reidentification (re-id) aims to match a specified person across non-overlapping cameras, which remains a very challenging problem. While previous methods mostly focus on feature extraction or metric learning, this paper attempts to jointly learn both the global full-body and local body-part features of the input persons with a multichannel convolutional neural network (CNN) model, trained with an adaptive triplet loss function that serves to minimize the distance between images of the same person and maximize the distance between different persons. The experimental results show that our approach achieves very promising results on the large-scale Market-1501 and DukeMTMC-reID datasets.
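The triplet loss referred to above, in its standard fixed-margin form (the paper's variant adapts the margin, which is not reproduced here), is max(0, d(a,p) − d(a,n) + m):

```python
def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss with squared Euclidean distance: pull the
    positive at least `margin` closer to the anchor than the negative."""
    d = lambda u, v: sum((a - b) ** 2 for a, b in zip(u, v))
    return max(0.0, d(anchor, positive) - d(anchor, negative) + margin)

# same identity close, different identity far -> loss is zero
zero = triplet_loss([0.0, 0.0], [0.1, 0.0], [1.0, 1.0])
# margin violated (negative closer than positive) -> positive loss
bad = triplet_loss([0.0, 0.0], [1.0, 0.0], [0.9, 0.0])
```

Gradients flow only through triplets with nonzero loss, which is why re-id training pipelines typically mine hard or semi-hard triplets.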
A novel approach for fire recognition using hybrid features and manifold learning-based classifier
Rong Zhu, Xueying Hu, Jiajun Tang, et al.
Although image/video-based fire recognition has received growing attention, an efficient and robust fire detection strategy is rarely explored. In this paper, we propose a novel approach to automatically identify the flame or smoke regions in an image. It is composed of three stages: (1) block processing divides an image into several non-overlapping blocks, which are identified as suspicious fire regions or not using two color models and a color-histogram-based similarity matching method in the HSV color space; (2) since the flame and smoke regions have more significant visual characteristics than other content, two kinds of image features are extracted for fire recognition: local features based on the Scale Invariant Feature Transform (SIFT) descriptor and the Bags of Keypoints (BOK) technique, and texture features based on Gray Level Co-occurrence Matrices (GLCM) and Wavelet-based Analysis (WA); and (3) a manifold-learning-based classifier is constructed over two image manifolds designed via an improved Globular Neighborhood Locally Linear Embedding (GNLLE) algorithm, with the extracted hybrid features as input feature vectors, to decide whether an image contains fire. Experiments and comparative analyses with four approaches are conducted on the collected image sets. The results show that the proposed approach is superior to the others in detecting fire, achieving high recognition accuracy and a low error rate.
Research on three-dimensional reconstruction method based on binocular vision
Jinlin Li, Zhihui Wang, Minjun Wang
As a hot and difficult issue in computer vision, binocular stereo vision is an important form of computer vision that has broad application prospects in many fields, such as aerial mapping, vision navigation, motion analysis, and industrial inspection. In this paper, research is done into binocular stereo camera calibration, image feature extraction, and stereo matching. In the camera calibration module, the internal parameters of a single camera are obtained using Zhang Zhengyou's checkerboard calibration method. For image feature extraction and stereo matching, the SURF operator (a local feature operator) and the SGBM algorithm (a global matching algorithm) are used respectively, and their performance is compared. After feature point matching is completed, the correspondence between matching points and 3D object points, i.e., the 3D information, can be established using the calibrated camera parameters.
Multi-modal image registration via depth information based on point set matching
Bin Sun, Qi Yang, Kai Hu, et al.
Image registration is an important pre-processing operation for performing multi-modal joint analysis correctly. However, registration of images captured by different sensors is a very challenging problem due to the apparent differences between the scenes. The traditional Coherent Point Drift (CPD) method is a global registration approach that strongly relies on the extracted features. For infrared and visible images, registration methods based on edges or points are inappropriate, since those features might be significantly different; depth information, by contrast, is a more robust feature for multi-modal image pairs. In this paper, we propose an algorithm based on the Canny detector to extract object edges, and the regions of interest (ROI) are obtained from the depth maps of the image pairs, in which point set registration over common features can usually be carried out successfully. Experimental results on real-world data demonstrate the effectiveness of the proposed approach, which is superior to the traditional CPD algorithm for multi-modal image registration.
Scene text detection by leveraging multi-channel information and local context
Runmin Wang, Shengyou Qian, Jianfeng Yang, et al.
As an important information carrier, text plays a significant role in many applications. However, text detection in unconstrained scenes is a challenging problem due to cluttered backgrounds, varied appearances, uneven illumination, etc. In this paper, an approach based on multi-channel information and local context is proposed to detect text in natural scenes. Since character candidate detection plays a vital role in a text detection system, Maximally Stable Extremal Regions (MSERs) and a graph-cut-based method are integrated to obtain character candidates by leveraging multi-channel image information. A cascaded false-positive elimination mechanism is constructed from the perspectives of the character and the text line respectively. Because local context information is very valuable, it is utilized to retrieve missing characters and boost text detection performance. Experimental results on two benchmark datasets, the ICDAR 2011 and ICDAR 2013 datasets, demonstrate that the proposed method achieves state-of-the-art performance.
Selecting good regions to deblur via relative total variation
Lerenhan Li, Hao Yan, Zhihua Fan, et al.
Image deblurring is to estimate the blur kernel and restore the latent image; it is usually divided into two stages, kernel estimation and image restoration. In kernel estimation, selecting a good region that contains structure information helps the accuracy of the estimated kernel, yet good regions to deblur are usually expert-chosen or found by trial and error. In this paper, we apply a metric named relative total variation (RTV) to discriminate structure regions from smooth and textured ones. Given a blurry image, we first calculate the RTV of each pixel to determine whether it belongs to a structure region, after which we sample the image in an overlapping way; the sampled region containing the most structure pixels is the best region to deblur. Both qualitative and quantitative experiments show that our proposed method helps estimate the kernel accurately.
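The intuition behind RTV can be shown in one dimension: the ratio of the absolute summed gradient to the summed absolute gradient is near 1 for a consistent edge and near 0 for oscillating texture. This 1-D analogue is a simplification of the windowed 2-D measure the paper uses:

```python
def rtv_score(signal, eps=1e-3):
    """1-D analogue of relative total variation: |sum of gradients| over
    sum of |gradients|. Near 1 for a monotone edge, near 0 for oscillating
    texture; eps dominates on flat regions, keeping the score small."""
    grads = [b - a for a, b in zip(signal, signal[1:])]
    return abs(sum(grads)) / (sum(abs(g) for g in grads) + eps)

edge    = rtv_score([0, 1, 2, 3, 4])   # monotone ramp: structure
texture = rtv_score([0, 1, 0, 1, 0])   # oscillation: texture
# the structural region scores far higher than the textured one
```

In texture, gradients cancel in the numerator but accumulate in the denominator, which is exactly the property used to rank candidate regions for kernel estimation.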
Single image super-resolution based on convolutional neural networks
Lamei Zou, Ming Luo, Weidong Yang, et al.
We present a deep learning method for single image super-resolution (SISR). The proposed approach learns an end-to-end mapping between low-resolution (LR) and high-resolution (HR) images, represented as a deep convolutional neural network that takes the LR image as input and outputs the HR image. Our network uses five convolution layers whose kernel sizes include 5×5, 3×3, and 1×1. In the proposed network, we use residual learning and combine different convolution kernel sizes within the same layer. The experimental results show that our proposed method outperforms existing methods on benchmark images in both reconstruction quality indices and human visual effects.
An improved multi-paths optimization method for video stabilization
For video stabilization, the difference between the original camera motion path and the optimized one is proportional to the cropping ratio and warping ratio. A good optimized path should preserve the moving tendency of the original one while keeping the cropping ratio and warping ratio of each frame within a proper range. In this paper, we use an improved warping-based motion representation model and propose a Gaussian-based multi-path optimization method to obtain a smooth path and a stabilized video. The proposed video stabilization method consists of two parts: camera motion path estimation and path smoothing. We estimate the perspective transform between adjacent frames according to the warping-based motion representation model, which works well on challenging videos where most previous 2D or 3D methods fail for lack of long feature trajectories. The multi-path optimization method deals well with parallax: we calculate the space-time correlation of adjacent grid cells, and a Gaussian kernel is then used to weight the motion of adjacent cells. The multiple paths are smoothed while minimizing the cropping ratio and the distortion. We test our method on a large variety of consumer videos with casual jitter and parallax and achieve good results.
Threshold-adaptive canny operator based on cross-zero points
Boqi Liu, Xiuhua Zhang, Hanyu Hong
Canny edge detection[1] is a technique to extract useful structural information from different vision objects while dramatically reducing the amount of data to be processed, and it has been widely applied in various computer vision systems. Two thresholds have to be set before the edges are segregated from the background; usually, two static values are chosen as the thresholds based on the developers' experience[2]. In this paper, a novel automatic thresholding method is proposed: the relation between the thresholds and cross-zero points is analyzed, and an interpolation function is deduced to determine the thresholds. Comprehensive experimental results demonstrate the effectiveness of the proposed method and its advantage for stable edge detection under changing illumination.
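For context, a common simple automatic scheme sets the high threshold from a percentile of the gradient-magnitude distribution and the low threshold as a fixed fraction of it; the paper's cross-zero-point interpolation replaces this heuristic, which is shown here only as a baseline sketch:

```python
def auto_canny_thresholds(grad_mags, high_pct=0.9, ratio=0.4):
    """Baseline automatic Canny thresholds: high = the given percentile of
    gradient magnitudes, low = a fixed fraction of high. (The paper instead
    derives both thresholds from cross-zero points.)"""
    mags = sorted(grad_mags)
    idx = min(len(mags) - 1, int(high_pct * len(mags)))
    high = mags[idx]
    return ratio * high, high

low, high = auto_canny_thresholds([0, 1, 1, 2, 3, 5, 8, 13, 21, 40])
# high is the 90th-percentile magnitude (40); low is 40% of it (16.0)
```

Because the thresholds track the magnitude distribution of each frame, they adapt to changing illumination, the same robustness goal the paper pursues with its interpolation function.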
Supervised guiding long-short term memory for image caption generation based on object classes
Jian Wang, Zhiguo Cao, Yang Xiao, et al.
Present models of image caption generation suffer from attenuation of the image's visual semantic information and from errors in the guidance information. To solve these problems, we propose a supervised guiding Long Short Term Memory model based on object classes, named S-gLSTM for short. It uses the object detection results from R-FCN as high-confidence supervisory information and updates the guidance word set by judging whether the last output matches the supervisory information. S-gLSTM learns how to extract the currently interesting information from the image's visual semantic information based on the guidance word set; this information is fed into the S-gLSTM at each iteration as guidance for caption generation. To acquire the text-related visual semantic information, S-gLSTM fine-tunes the network weights through back-propagation of the guiding loss. Supplying guidance information at each iteration solves the problem of visual semantic information attenuation in the traditional LSTM model, and the supervised guidance information in our model reduces the impact of mismatched words on caption generation. We test our model on the MSCOCO2014 dataset and obtain better performance than the state-of-the-art models.
Color correction using weighted moving least squares in image mosaicking applications
In image mosaicking applications, the colors of the images to be mosaicked may be inconsistent due to different camera settings and lighting conditions. This paper proposes an effective color correction algorithm to correct these photometric disparities. First, corresponding points between two images are extracted using SIFT Flow. Second, the corresponding points serve as the control points of the weighted moving least squares algorithm, which corrects the color of the input image; this operation is conducted for each channel of the input image separately in RGB space. Finally, combining the three corrected single-channel images yields the final corrected RGB image. Experimental results show that the proposed color correction method performs better than state-of-the-art algorithms.
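The core of such a per-channel correction is a weighted least-squares fit of a gain and offset from corresponding-point intensities; the sketch below does a single global fit, whereas moving least squares refits these weights locally around every pixel (and is not the authors' exact formulation):

```python
def wls_gain_offset(src, ref, weights):
    """Weighted least-squares fit of ref ~ gain * src + offset for one colour
    channel, from corresponding-point intensities. Closed-form solution of
    the weighted normal equations."""
    sw = sum(weights)
    mx = sum(w * s for w, s in zip(weights, src)) / sw   # weighted means
    my = sum(w * r for w, r in zip(weights, ref)) / sw
    cov = sum(w * (s - mx) * (r - my) for w, s, r in zip(weights, src, ref))
    var = sum(w * (s - mx) ** 2 for w, s in zip(weights, src))
    gain = cov / var
    return gain, my - gain * mx

gain, offset = wls_gain_offset(src=[10, 50, 90], ref=[30, 110, 190],
                               weights=[1.0, 1.0, 1.0])
# ref = 2 * src + 10 exactly, so the fit recovers gain 2 and offset 10
```

In the moving variant, `weights` decay with distance from the pixel being corrected, so the gain/offset vary smoothly across the image instead of being constant.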
A method of non-contact reading code based on computer vision
Chunsen Zhang, Xiaoyu Zong, Bingxuan Guo
With the purpose of guaranteeing the security of computer information exchange between internal and external networks (trusted and untrusted networks), a non-contact code-reading method based on machine vision is proposed, which differs from existing network physical isolation methods. Using computer monitors, a camera, and other equipment, the information to be exchanged is processed through image coding, generation of a standard image, display and capture of the actual image, homography matrix calculation, image distortion correction, and decoding under calibration, achieving secure, non-contact, one-way transmission of computer information between the internal and external networks. The effectiveness of the proposed method is verified by experiments on real computer text data, and a data transfer speed of 24 kb/s is achieved. The experiments show that this algorithm has high security, fast speed, and little information loss; it can meet the daily needs of confidentiality departments to update data effectively and reliably, and it solves the difficulty of computer information exchange between secret and non-secret networks, with distinctive originality, practicability, and practical research value.
Drug-related webpages classification based on multi-modal local decision fusion
Ruiguang Hu, Xiaojing Su, Yanxin Liu
In this paper, multi-modal local decision fusion is used for drug-related webpage classification. First, meaningful text is extracted through HTML parsing, and effective images are chosen by the FOCARSS algorithm. Second, six SVM classifiers are trained for six kinds of drug-taking instruments, represented by PHOG, and one SVM classifier is trained for cannabis, represented by the mid-level feature of the BOW model. For each instance in a webpage, the seven SVMs give seven labels for its image, and seven further labels are obtained by searching for the names of the drug-taking instruments and cannabis in the related text. Concatenating the seven image labels and seven text labels yields the representation of the instances in the webpages. Finally, multi-instance learning is used to classify the drug-related webpages. Experimental results demonstrate that the classification accuracy of multi-instance learning with multi-modal local decision fusion is much higher than that of single-modal classification.
A new matching algorithm for affine point sets
Zhiguo Tan, Jianping Ou, Fubing Chen, et al.
A novel point pattern matching algorithm based on point features is proposed. In this paper, we construct each point's feature map according to the point set's distribution and the points' positions. The log-polar coordinate transformation is then applied to the feature map, and the moment invariants method is used to describe the transformed feature map in vector form; coarse matching results are acquired by comparing the feature vectors. After this, an iterative method, relaxation labeling, is introduced to obtain the final matching result. Two contributions are made in this paper. First, we construct a log-polar-transformation-based point feature (L-PTM), which withstands affine transformation. Second, a new point pattern matching algorithm is proposed that combines the L-PTM with relaxation labeling. The method is insensitive to outliers and noise. Experiments demonstrate the validity and robustness of the algorithm.
Weak texture objects pose estimation based on 3D model
Yang Chen, Hanmo Zhang, Shaoxiong Tian, et al.
This paper proposes a 3D pose estimation method for weak-texture objects that performs point matching between a test image and a matched rendering image of the object, rather than its 3D model. Given a 3D model of an object, we use an exemplar-based 2D-3D matching method to estimate the coarse pose of the object. We first obtain 2D rendering images of each view of the object using its 3D model and build an exemplar-based model from all the rendering images. For a test image, we then perform 2D-3D matching using the proposed model, and the rendering image with the highest score is the best match to the test image. The coarse pose is obtained from the view parameters of the rendering images. Finally, we perform point matching between the matched rendering image and the test image to estimate the pose more accurately. The proposed coarse-to-fine pose estimation method provides a stronger constraint, which makes pose estimation more accurate. The experimental results demonstrate the effectiveness of the proposed method.
Deep learning based hand gesture recognition in complex scenes
Zihan Ni, Nong Sang, Cheng Tan
Recently, region-based convolutional neural networks (R-CNNs) have achieved significant success in object detection, but their accuracy is not high for small and similar objects, such as hand gestures. To address this problem, we present an online hard example testing (OHET) technique that evaluates the confidence of the R-CNN outputs and regards outputs with low confidence as hard examples. In this paper, we propose a cascaded network to recognize gestures. First, we use the region-based fully convolutional network (R-FCN), which is capable of detecting small objects, to detect gestures, and then use OHET to select the hard examples. To enhance recognition accuracy, we re-classify the hard examples with a VGG-19 classification network to obtain the final output of the gesture recognition system. Comparative experiments with other methods show that the cascaded network combined with OHET reaches a state-of-the-art result of 99.3% mAP on small and similar gestures in complex scenes.
Fast-match on particle swarm optimization with variant system mechanism
Yuehuang Wang, Xin Fang, Jie Chen
Fast-Match is a fast and effective algorithm for approximate template matching under 2D affine transformations, which can match the target with maximum similarity without knowing the target's pose. It minimizes the Sum-of-Absolute-Differences (SAD) error to obtain the best affine transformation, and is widely used in image matching because of its speed and robustness. In this paper, our approach is to search for an approximate affine transformation with the Particle Swarm Optimization (PSO) algorithm. We treat each potential transformation as a particle that possesses a memory function; each particle is given a random velocity and flows through the 2D affine transformation space. To accelerate the algorithm and improve its ability to find the global optimum, we introduce a variant-system mechanism on this basis. The benefit is that we avoid evaluating a huge number of potential transformations and falling into local optima, so a few transformations suffice to approximate the optimal solution. The experimental results show that our method is faster and more accurate within a smaller affine transformation space.
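A minimal sketch of PSO-driven template matching under the SAD criterion follows. For brevity it searches only integer translations rather than the full 2D affine space, and the inertia and attraction coefficients are illustrative defaults; the paper's variant-system mechanism is not modeled here.

```python
import random

def sad(image, template, tx, ty):
    """Sum of absolute differences of `template` placed at column tx, row ty."""
    return sum(
        abs(image[ty + i][tx + j] - template[i][j])
        for i in range(len(template))
        for j in range(len(template[0]))
    )

def pso_match(image, template, n_particles=20, n_iters=50, seed=0):
    """Toy PSO over integer translations (the paper searches full affine)."""
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    bounds = (w - tw, h - th)
    parts = []
    for _ in range(n_particles):
        p = [rng.randint(0, bounds[0]), rng.randint(0, bounds[1])]
        parts.append({"pos": p[:], "vel": [0.0, 0.0], "best": p[:],
                      "best_err": sad(image, template, p[0], p[1])})
    gbest = min(parts, key=lambda q: q["best_err"])["best"][:]
    gerr = sad(image, template, gbest[0], gbest[1])
    for _ in range(n_iters):
        for q in parts:
            for d in range(2):
                # Inertia + pull toward personal and global bests.
                q["vel"][d] = (0.6 * q["vel"][d]
                               + 1.5 * rng.random() * (q["best"][d] - q["pos"][d])
                               + 1.5 * rng.random() * (gbest[d] - q["pos"][d]))
                q["pos"][d] = min(max(int(round(q["pos"][d] + q["vel"][d])), 0),
                                  bounds[d])
            err = sad(image, template, q["pos"][0], q["pos"][1])
            if err < q["best_err"]:
                q["best"], q["best_err"] = q["pos"][:], err
                if err < gerr:
                    gbest, gerr = q["pos"][:], err
    return tuple(gbest), gerr
```

Each particle keeps a memory of its personal best position, and the swarm shares a global best, mirroring the "particle with memory function" described above.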
Discriminative correlation filter tracking with occlusion detection
Aiming at the problem that correlation filter-based tracking algorithms cannot track targets under severe occlusion, a target re-detection mechanism is proposed. First, building on ECO, we propose a multi-peak detection model that uses the response value to distinguish occlusion from deformation during tracking, which improves the tracking success rate. Then we add a confidence model to the update mechanism to effectively prevent model drift caused by similar targets or background during the tracking process. Finally, a re-detection mechanism for the target is added: relocation is performed after the target is lost, which increases the accuracy of target positioning. The experimental results demonstrate that the proposed tracker performs favorably against state-of-the-art methods in terms of robustness and accuracy.
A method of airborne infrared and visible image matching based on HOG feature
Xue Wang, Qing Zhou, Qiang Liu, et al.
In the all-time matching and navigation task, the aircraft matches real-time infrared images acquired by an infrared imaging sensor against a reference visible image provided by satellite for accurate localization. However, the large difference between infrared and visible images makes the task challenging. In this paper, for the sake of engineering application in avionics systems, we obtain real-time infrared images according to the flight trajectory and then match them against the reference visible images. Furthermore, HOG features are extracted from both the real-time infrared images and the reference visible images to describe their feature similarity, for the purpose of accurate matching and localization. Experimental results demonstrate that the proposed method not only realizes matching between airborne infrared and visible images, but also achieves high localization accuracy, showing good performance and robustness.
A method of vehicle license plate recognition based on PCANet and compressive sensing
Xianyi Ye, Feng Min
The manual feature extraction of traditional vehicle license plate methods is not robust to diverse changes, and the high dimension of features extracted with the Principal Component Analysis Network (PCANet) leads to low classification efficiency. To solve these problems, a vehicle license plate recognition method based on PCANet and compressive sensing is proposed. First, PCANet is used to extract features from character images. Then a sparse measurement matrix, a very sparse matrix satisfying the Restricted Isometry Property (RIP) condition of compressed sensing, is used to reduce the dimension of the extracted features. Finally, a Support Vector Machine (SVM) is used to train on and recognize the reduced features. Experimental results demonstrate that the proposed method outperforms a Convolutional Neural Network (CNN) in both recognition rate and runtime. Compared with omitting compressive sensing, the proposed method has a lower feature dimension and thus higher efficiency.
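As a hedged sketch of the dimensionality-reduction step, the code below builds an Achlioptas-style very sparse random matrix, a standard family of measurement matrices that satisfies RIP with high probability, and projects feature vectors with it. The sparsity parameter and function names are illustrative; the paper does not specify its exact matrix construction here.

```python
import numpy as np

def sparse_measurement_matrix(m, n, seed=0):
    """Very sparse random matrix (Achlioptas-style): entries are
    +sqrt(s), 0, -sqrt(s) with probabilities 1/(2s), 1 - 1/s, 1/(2s),
    here with s = sqrt(n). Such matrices satisfy the RIP condition with
    high probability and give a cheap, data-independent projection."""
    rng = np.random.default_rng(seed)
    s = int(np.sqrt(n))
    u = rng.random((m, n))
    mat = np.zeros((m, n))
    mat[u < 1 / (2 * s)] = np.sqrt(s)
    mat[u > 1 - 1 / (2 * s)] = -np.sqrt(s)
    return mat / np.sqrt(m)  # normalization keeps norms roughly preserved

def reduce_features(features, m, seed=0):
    """Project (num_samples, n)-shaped features down to m dimensions."""
    n = features.shape[1]
    phi = sparse_measurement_matrix(m, n, seed)
    return features @ phi.T
```

The reduced features would then be fed to the SVM in place of the raw PCANet output.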
Automated railroad reconstruction from remote sensing image based on texture filter
Remote sensing techniques have improved incredibly in recent years, and very accurate results and high-resolution images can now be acquired. Such data offer a possible way to reconstruct railroads. In this paper, an automated railroad reconstruction method from remote sensing images based on the Gabor filter is proposed. The method is divided into three steps. First, the edge-oriented railroad characteristics (such as line features) in a remote sensing image are detected using the Gabor filter. Second, two response images with mutually perpendicular filtering orientations are fused to suppress noise and obtain long, smooth stripe regions of railroads. Third, a set of smooth regions is extracted by computing a global threshold for the previous result image using Otsu's method and then converting it to a binary image with that threshold. This workflow was tested on a set of remote sensing images and found to deliver very accurate results in a quick and highly automated manner.
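A small sketch of the filtering step: the code builds the real part of a Gabor kernel at a given orientation and fuses the responses of two perpendicular filters. The kernel parameters and the choice of element-wise minimum as the fusion operator are assumptions for illustration; the paper only states that the two responses are fused to suppress noise.

```python
import numpy as np

def gabor_kernel(ksize, theta, sigma=2.0, lam=4.0, gamma=0.5):
    """Real part of a Gabor kernel oriented at angle `theta` (radians)."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return (np.exp(-(xr**2 + gamma**2 * yr**2) / (2 * sigma**2))
            * np.cos(2 * np.pi * xr / lam))

def convolve2d(img, k):
    """Naive 'same'-size convolution with zero padding (fine for a demo)."""
    half = k.shape[0] // 2
    pad = np.pad(img, half)
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(pad[i:i + k.shape[0], j:j + k.shape[1]] * k)
    return out

def fused_line_response(img, theta):
    """Fuse responses of two Gabor filters with perpendicular orientations.
    The element-wise minimum (keep structures both filters respond to)
    is one plausible fusion rule, assumed here for illustration."""
    r1 = np.abs(convolve2d(img, gabor_kernel(9, theta)))
    r2 = np.abs(convolve2d(img, gabor_kernel(9, theta + np.pi / 2)))
    return np.minimum(r1, r2)
```

Otsu's threshold would then be computed on the fused response to obtain the binary stripe regions.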
Stereo matching algorithm based on double components model
Xiao Zhou, Kejun Ou, Jianxin Zhao, et al.
Tiny wires are a great threat to the safety of UAV flight. They occupy only a few isolated pixels far from the background, while most existing stereo matching methods require a support region of a certain area to improve robustness, or assume depth dependence between neighboring pixels to satisfy global or semi-global optimization, so false alarms or even failures occur when images contain tiny wires. A new stereo matching algorithm based on a double-component model is proposed in this paper. According to texture type, the input image is decomposed into two independent component images: one contains only the sparse wire texture, and the other contains all remaining parts. Different matching schemes are adopted for each pair of component images. Experiments prove that the algorithm can effectively compute the depth image of the complex scenes seen by a patrol UAV, detecting tiny wires as well as large objects. Compared with current mainstream methods, it has obvious advantages.
Ship detection based on rotation-invariant HOG descriptors for airborne infrared images
Guojing Xu, Jinyan Wang, Shengxiang Qi
Infrared thermal imagery is widely used in various kinds of aircraft because of its all-time applicability, and detecting ships in infrared images has attracted considerable research interest in recent years. For downward-looking infrared imagery, in order to overcome the uncertainty of target imaging attitude due to the unknown positional relationship between the aircraft and the target, we propose a new infrared ship detection method that integrates a rotation-invariant gradient orientation histogram (Circle Histogram of Oriented Gradient, C-HOG) descriptor with a support vector machine (SVM) classifier. In detail, the proposed method uses HOG descriptors to express the local features of infrared images, adapting to changes in illumination and overcoming sea clutter effects. Different from the traditional computation of the HOG descriptor, we subdivide the image into annular spatial bins instead of rectangular sub-regions, and then apply the Radial Gradient Transform (RGT) to the gradients to obtain rotation-invariant histogram information. Considering airborne engineering applications and real-time requirements, we use an SVM trained on ship-target and non-target background infrared sample images to discriminate real ships from false targets. Experimental results show that the proposed method performs well in both robustness and runtime for infrared ship detection at different rotation angles.
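The annular-bin idea can be sketched as follows: instead of HOG's rectangular cells, each pixel of a patch is assigned to a concentric ring around the patch center, so a rotation of the patch leaves every pixel in the same bin. The number of rings and the handling of corner pixels are illustrative assumptions.

```python
import numpy as np

def annular_bin_index(shape, n_rings):
    """Assign each pixel of a patch to one of `n_rings` concentric
    annular bins (instead of rectangular HOG cells), so that rotating
    the patch about its center leaves every pixel in its bin."""
    h, w = shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    y, x = np.mgrid[0:h, 0:w]
    r = np.hypot(y - cy, x - cx)
    r_max = min(cy, cx) + 1e-9
    idx = np.floor(r / r_max * n_rings).astype(int)
    idx[idx >= n_rings] = -1  # corners outside the largest ring are dropped
    return idx
```

Gradient orientations (after the Radial Gradient Transform) would then be histogrammed per ring instead of per rectangular cell.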
RPBS: Rotational Projected Binary Structure for point cloud representation
Bin Fang, Zhiwei Zhou, Tao Ma, et al.
In this paper, we propose a novel three-dimensional local surface descriptor named RPBS for point cloud representation. First, the points cropped around the query point within a predefined radius are regarded as a local surface patch. Then pose normalization is applied to the local surface to make our descriptor invariant to rotation. To obtain more information about the cropped surface, a multi-view representation is formed by successively rotating it about the coordinate axes. Further, orthogonal projections onto the three coordinate planes are used to construct two-dimensional distribution matrices, and each matrix is binarized according to whether each grid cell is occupied: if yes, the cell is set to one, otherwise zero. We compute the binary maps from all viewpoints and concatenate them as the final descriptor. Comparative experiments evaluating our descriptor against several state-of-the-art 3D descriptors are conducted on the standard Bologna dataset, and the results show that our descriptor achieves the best performance in feature matching experiments.
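The projection-and-binarize step can be sketched as below: a normalized point cloud is projected onto the XY, XZ and YZ planes, each projection is rasterized into an occupancy grid, and the binarized grids are concatenated. Grid resolution and normalization details are illustrative; the full RPBS also rotates the patch into multiple views before projecting.

```python
import numpy as np

def binary_projection_maps(points, grid=8):
    """Project a normalized (N, 3) point cloud onto the three coordinate
    planes and binarize each occupancy grid (1 = grid cell occupied)."""
    # Scale points into the unit cube [0, 1).
    p = points - points.min(axis=0)
    p = p / (p.max() + 1e-9)
    cells = np.minimum((p * grid).astype(int), grid - 1)
    maps = []
    for drop_axis in range(3):  # drop Z, Y, X -> XY, XZ, YZ planes
        keep = [a for a in range(3) if a != drop_axis]
        m = np.zeros((grid, grid), dtype=np.uint8)
        m[cells[:, keep[0]], cells[:, keep[1]]] = 1
        maps.append(m)
    # Concatenated binary maps form (one view of) the descriptor.
    return np.concatenate([m.ravel() for m in maps])
```

In the full descriptor this would be repeated for each rotated view and the per-view vectors concatenated.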
Environmentally adaptive crop extraction for agricultural automation using super-pixel and LAB Gaussian model
Cuina Li, Guangyu Shi, Zhenghong Yu
In this paper, we propose an environmentally adaptive crop extraction method for agricultural automation using a LAB Gaussian model and super-pixel segmentation. A Gaussian mixture model in LAB color space is introduced to describe the distribution of crop pixels, adapting to the outdoor environment, and the super-pixel technique is applied for structure preservation. Comparative experiments show that our method outperforms other commonly used extraction methods.
Improved dense trajectories for action recognition based on random projection and Fisher vectors
Shihui Ai, Tongwei Lu, Yudian Xiong
As an important application of intelligent monitoring systems, action recognition in video has become a very important research area of computer vision. In order to improve the accuracy of action recognition in video with improved dense trajectories, an advanced encoding method is introduced that combines Fisher Vectors with Random Projection. The method reduces the trajectory features by projecting the high-dimensional trajectory descriptors into a low-dimensional subspace via Random Projection, after defining and analyzing a Gaussian mixture model. A GMM-FV hybrid model is introduced to encode the trajectory feature vectors and reduce their dimension; the computational complexity is reduced by the Random Projection, which shrinks the Fisher coding vector. Finally, a linear SVM is used as the classifier to predict labels. We tested the algorithm on the UCF101 and KTH datasets. Compared with some existing algorithms, the results show that the method not only reduces the computational complexity but also improves the accuracy of action recognition.
Action recognition in depth video from RGB perspective: A knowledge transfer manner
Jun Chen, Yang Xiao, Zhiguo Cao, et al.
Recognizing human actions across different video modalities has become a highly promising trend in video analysis. In this paper, we propose a method for transferring human action recognition from RGB video to depth video using domain adaptation, where features learned from RGB videos are used for action recognition in depth videos. More specifically, we take three steps to solve this problem. First, video is more complex than images because it carries both spatial and temporal information; to better encode this information, the dynamic image method is used to represent each RGB or depth video as a single image, after which most image feature extraction methods can be applied to video. Second, since videos are represented as images, a standard CNN model can be used for training and testing, and also for feature extraction thanks to its powerful representational ability. Third, since RGB videos and depth videos belong to two different domains, domain adaptation is applied to make the two feature domains more similar, so that features learned from the RGB model can be used directly for depth video classification. We evaluate the proposed method on a complex RGB-D action dataset (NTU RGB-D), and domain adaptation from RGB to depth action recognition yields an accuracy improvement of more than 2%.
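The dynamic image step above can be sketched with the simplified closed-form temporal weighting often used for approximate rank pooling (the weights alpha_t = 2t - T - 1 are the common simplified variant; whether the paper uses this exact weighting is an assumption):

```python
import numpy as np

def dynamic_image(frames):
    """Collapse a video of shape (T, H, W) into one image with the
    simplified approximate-rank-pooling weights alpha_t = 2t - T - 1,
    so later frames get positive weight and earlier ones negative.
    The result summarizes the video's temporal evolution as an image
    that a standard image CNN can consume."""
    T = frames.shape[0]
    t = np.arange(1, T + 1)
    alpha = 2 * t - T - 1
    # Weighted sum over the time axis.
    return np.tensordot(alpha, frames, axes=(0, 0))
```

A static video collapses to zero (the weights sum to zero), while motion leaves a signed trace, which is what makes the representation informative for action recognition.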
Geometry and coherence based feature matching for structure from motion
Kai Wei, Kun Sun, Wenbing Tao
We present a fast feature matching approach based on coherence and geometry constraints. Our method first estimates the epipolar geometry between the images with a small number of feature points, then uses the epipolar geometry constraint and the coherence among the matches to guide the matching of the remaining features. For the remaining feature points, we first reduce the set of candidate matching points according to the epipolar geometry constraint. After that, we use the coherence constraint, which requires the matches of neighboring feature points to be neighbors themselves, to further reduce the number of candidate matching points. Such a strategy effectively reduces matching time and retains more correct matches that would otherwise be filtered out by David Lowe's ratio test. Finally, we roughly remove mismatches using the coherence among the matches. We validate the effectiveness of our method through matching and SfM results on various public datasets.
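The epipolar pruning step can be sketched as follows: given a fundamental matrix F, a point in image 1 defines an epipolar line in image 2, and only candidates close to that line are kept. The distance threshold and function name are illustrative assumptions.

```python
import numpy as np

def epipolar_candidates(pt, candidates, F, max_dist=2.0):
    """Keep only candidate points in image 2 whose distance to the
    epipolar line l = F @ [x, y, 1] of `pt` is below `max_dist` pixels."""
    l = F @ np.array([pt[0], pt[1], 1.0])
    a, b, c = l
    norm = np.hypot(a, b)  # point-line distance normalizer
    kept = []
    for (x, y) in candidates:
        if abs(a * x + b * y + c) / norm <= max_dist:
            kept.append((x, y))
    return kept
```

The coherence constraint would then prune the survivors further by requiring neighboring features to map to neighboring candidates.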
Research on driver fatigue detection
Driver fatigue is one of the main causes of frequent traffic accidents, so a driver fatigue detection system is very significant for avoiding them. This paper presents a real-time method based on the fusion of multiple facial features, including eye closure, yawning and head movement. The eye state is classified as open or closed by a linear SVM classifier trained on HOG features of the detected eye. The mouth state is determined by the width-height ratio of the mouth. Head movement is detected via the head pitch angle calculated from facial landmarks. The driver's fatigue state is then inferred by a model trained on the above features. According to the experimental results, the driver fatigue detection achieves excellent performance, indicating that the developed method is valuable for avoiding traffic accidents caused by driver fatigue.
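The mouth-state cue can be sketched as a simple width-height ratio check on mouth landmarks. The four-point landmark layout and the threshold value here are illustrative assumptions, not the paper's configuration:

```python
def mouth_open_ratio(mouth_pts):
    """Width-height ratio cue for yawning: the ratio drops as the mouth
    opens. `mouth_pts` are (x, y) landmarks: left corner, right corner,
    top of inner lip, bottom of inner lip (a simplified 4-point layout)."""
    left, right, top, bottom = mouth_pts
    width = abs(right[0] - left[0])
    height = abs(bottom[1] - top[1])
    return width / max(height, 1e-6)  # guard against a zero-height mouth

def is_yawning(mouth_pts, threshold=1.8):
    """Flag a yawn when the mouth is tall relative to its width.
    The threshold is illustrative, not taken from the paper."""
    return mouth_open_ratio(mouth_pts) < threshold
```

In the full system this binary cue would be fused with the eye-closure and head-pitch features before the fatigue decision.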
Smoke regions extraction based on two steps segmentation and motion detection in early fire
Wenlin Jian, Kaizhi Wu, Zirong Yu, et al.
Aiming at problems in early video-based smoke detection for fire video, this paper proposes a method to extract suspected smoke regions by combining two-step segmentation with motion characteristics. Early smoldering smoke appears as gray or gray-white regions. In the first stage, regions of interest (ROIs) containing smoke are obtained by a two-step segmentation method. Then, suspected smoke regions are detected by combining the two-step segmentation with motion detection. Finally, morphological processing is used to extract the smoke regions. The Otsu algorithm is used for segmentation, and the ViBe algorithm is used to detect the motion of smoke. The proposed method was tested on 6 test videos containing smoke. The experimental results, checked against visual observation, show the effectiveness of our proposed method.
Collaborative identification method for sea battlefield target based on deep convolutional neural networks
Guangdi Zheng, Mingbo Pan, Wei Liu, et al.
Target identification on the sea battlefield is the prerequisite for judging the enemy in modern naval battle. In this paper, a collaborative identification method based on convolutional neural networks is proposed to identify typical sea-battlefield targets. Different from the traditional single-input/single-output identification method, the proposed method constructs a multi-input/single-output co-identification architecture based on an optimized convolutional neural network and weighted D-S evidence theory. The simulation results show that
Airplane detection in remote sensing images using convolutional neural networks
Airplane detection in remote sensing images remains a challenging problem and has attracted great interest from researchers. In this paper we propose an effective method to detect airplanes in remote sensing images using convolutional neural networks. With the rise of deep neural networks in target detection, deep learning methods show greater advantages than traditional methods, and we give an explanation of why this happens. To improve airplane detection performance, we combine a region proposal algorithm with convolutional neural networks, and in the training phase we divide the background into multiple classes rather than one class, which reduces false alarms. Our experimental results show that the proposed method is effective and robust in detecting airplanes.
Unsupervised classification of high-resolution remote-sensing images under edge constraints
Classification is a crucial task in various remote sensing applications. Edges are among the most important characteristics of high-resolution remote-sensing images and can considerably improve classification accuracy. Therefore, in this paper, we propose an unsupervised classification method that incorporates edge information into a clustering procedure. First, a consistency coefficient function, which measures the similarity between the edges obtained by clustering and those obtained by edge detection methods, is defined to guarantee more accurate edges. Then, a clustering procedure based on HMRF-FCM is designed, in which the edge constraints are exploited through the edge consistency. Experiments on synthetic and real remote sensing images show that the proposed method obtains more accurate classification results.
Modeling of biologically motivated self-learning equivalent-convolutional recurrent-multilayer neural structures (BLM_SL_EC_RMNS) for image fragments clustering and recognition
The biologically motivated self-learning equivalent-convolutional recurrent multilayer neural structures (BLM_SL_EC_RMNS) for image-fragment clustering and recognition are discussed. We consider these neural structures and their spatially invariant equivalent models (SIEMs), based on the proposed equivalent two-dimensional functions of image similarity and the corresponding matrix-matrix (or tensor) procedures, which use operations of continuous logic and nonlinear processing as their basis. These SIEMs can simply describe signal processing during all training and recognition stages, and they are suitable for unipolar-coded multilevel signals. The clustering efficiency of such models and their implementation depends on the discriminant properties of the neural elements of the hidden layers. Therefore, the main model and architecture parameters and characteristics depend on the types of nonlinear processing applied and on the function used for image comparison or for adaptive-equivalent weighting of input patterns. We show that these SL_EC_RMNSs have several advantages, such as self-learning and self-identification of features and similarity signs of fragments, and the ability to cluster and recognize image fragments efficiently even under strong mutual correlation. The proposed combined learning-recognition clustering method, which takes the structural features of fragments into account, is suitable not only for binary but also for color images, and it combines self-learning with the formation of weighted clustered matrix patterns. Its model is constructed on the basis of recursive continuous-logic and nonlinear processing algorithms together with the k-means method or the winner-takes-all (WTA) method. The experimental results confirm that fragments with large numbers of elements can be clustered. For the first time, the possibility of generalizing these models to the space-invariant case is shown.
Experiments were carried out on images of different dimensions (a reference array) and on fragments of different dimensions for clustering. The experiments, performed in the Mathcad software environment, showed that the proposed method is universal, converges well, requires a small number of iterations, maps easily onto a matrix structure, and confirmed its prospects. Understanding the mechanisms of self-learning equivalence-convolutional clustering, the accompanying competitive processes among neurons, and the principles of neural auto-encoding/decoding and recognition with self-learned cluster patterns is therefore very important; these rely on the algorithm and principles of nonlinear processing of two-dimensional spatial image-comparison functions. The experimental results show that such models can be successfully used for auto- and hetero-associative recognition, and that they can explain some mechanisms known as the "reinforcement-inhibition concept". We also demonstrate real model experiments confirming that nonlinear processing by the equivalent function allows determining the winner neurons and adjusting the weight matrix. At the end of the report, we show how to use the obtained results and propose a new, more efficient hardware architecture for SL_EC_RMNS based on matrix-tensor multipliers, and we estimate the parameters and performance of such architectures.
Deep visual-semantic for crowded video understanding
Chunhua Deng, Junwen Zhang
Visual-semantic features play a vital role in crowded video understanding. Convolutional Neural Networks (CNNs) have achieved a significant breakthrough in learning representations from images. However, learning visual-semantic features, and how to extract them effectively for video analysis, remains a challenging task. In this study, we propose a novel visual-semantic method to capture both appearance and dynamic representations. In particular, we propose a spatial context method based on fractional Fisher vector (FV) encoding of CNN features, which can be regarded as our main contribution. In addition, to capture temporal context information, we also apply the fractional encoding method to dynamic images. Experimental results on the WWW crowd video dataset demonstrate that the proposed method outperforms the state of the art.
Weighted least square method for epipolar rectification in semi-calibrated image
Guojia Zhu, Huabing Zhou, Yiwei Tao III
The traditional method for epipolar rectification in the semi-calibrated case is RANdom SAmple Consensus (RANSAC), which cannot obtain correct parameters when serious mismatches exist. The weighted least squares method is therefore proposed to solve this problem. First, Scale Invariant Feature Transform (SIFT) features are extracted and initial feature matching is conducted for the image pairs. Next, according to the internal geometric relations of the corresponding points, the problem is transformed into a maximum likelihood estimation problem. Then each pair of corresponding points is given a weight, and the weight is regarded as a latent variable standing for the probability of a correct match. Finally, the weighted least squares method and the Expectation Maximization (EM) algorithm are used to estimate the latent variables and the uncalibrated parameters. Experimental results show that the proposed method not only keeps rectification precision high, but also produces slighter image morphing and faster rectification than state-of-the-art algorithms.
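The EM-weighted estimation idea can be sketched on a generic linear model: latent weights estimate the probability that each equation comes from a correct match (E-step), and the parameters then solve the weight-adjusted least squares problem (M-step). This is a toy linear version under an assumed inlier-Gaussian/flat-outlier mixture; the paper's model is the rectification geometry itself, not a linear system.

```python
import numpy as np

def weighted_lsq_em(A, b, n_iters=20, inlier_sigma=1.0, outlier_level=10.0):
    """EM-flavored weighted least squares for A @ x ~ b.

    w_i is the latent probability that equation i is a correct match;
    x solves the w-weighted normal equations at each M-step."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]
    for _ in range(n_iters):
        r = A @ x - b
        # E-step: responsibility of the inlier Gaussian vs. a flat outlier model.
        inlier = (np.exp(-r**2 / (2 * inlier_sigma**2))
                  / (np.sqrt(2 * np.pi) * inlier_sigma))
        w = inlier / (inlier + 1.0 / outlier_level)
        # M-step: weighted least squares (scale rows by sqrt of the weights).
        sw = np.sqrt(w)
        x = np.linalg.lstsq(sw[:, None] * A, sw * b, rcond=None)[0]
    return x, w
```

Unlike RANSAC's hard inlier/outlier split, the soft weights let every correspondence contribute in proportion to its estimated reliability.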
Detection of vehicle parts based on Faster R-CNN and relative position information
Mingwen Zhang, Nong Sang, Youbin Chen, et al.
Detection and recognition of vehicles are two essential tasks in intelligent transportation systems (ITS). Currently, a prevalent approach is to first detect the vehicle body, logo or license plate and then recognize it, so the detection task is the most basic but also the most important work. Besides the logo and license plate, other parts, such as the vehicle face, lamps, windshield and rearview mirrors, are also key parts that reflect the characteristics of a vehicle and can be used to improve the accuracy of the recognition task. In this paper, the detection of vehicle parts is studied, and the work is novel. We choose Faster R-CNN as the basic algorithm, take the local area of an image where the vehicle body is located as input, and obtain multiple bounding boxes with their own scores. If the box with the maximum score is chosen directly as the final result, it is often not the best one, especially for small objects. This paper presents a method that corrects the original score with the relative position information between two parts; the box with the maximum comprehensive score is then chosen as the final result. Compared with the original output strategy, the proposed method performs better.
Near infrared and visible face recognition based on decision fusion of LBP and DCT features
Zhihua Xie, Shuai Zhang, Guodong Liu, et al.
Visible face recognition systems, being vulnerable to illumination, expression, and pose variations, cannot achieve robust performance in unconstrained situations. Meanwhile, near infrared face images, being light-independent, can avoid or limit the drawbacks of face recognition in visible light, but their main challenges are low resolution and low signal-to-noise ratio (SNR). Therefore, near infrared and visible fusion face recognition has become an important direction in unconstrained face recognition research. In order to extract discriminative complementary features between near infrared and visible images, we propose in this paper a novel near infrared and visible face fusion recognition algorithm based on DCT and LBP features. First, the effective features of the near-infrared face image are extracted from the low-frequency part of the DCT coefficients and the partition histograms of the LBP operator. Second, the LBP features of the visible-light face image are extracted to compensate for the missing detail features of the near-infrared face image. Then, the LBP features of the visible-light face image and the DCT and LBP features of the near-infrared face image are each sent to a classifier for labeling. Finally, a decision-level fusion strategy is used to obtain the final recognition result. The visible and near infrared face recognition is tested on the HITSZ Lab2 visible and near infrared face database. The experimental results show that the proposed method extracts the complementary features of near-infrared and visible face images and improves the robustness of unconstrained face recognition. Especially for small training samples, the recognition rate of the proposed method reaches 96.13%, a significant improvement over the 92.75% of the method based on statistical feature fusion.
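As a rough sketch of the LBP feature used above, the code below computes the basic 8-neighbor LBP code image and its normalized histogram. This is the textbook LBP operator; the partitioning of the face into blocks before histogramming, which the paper uses, is omitted for brevity.

```python
import numpy as np

def lbp_image(gray):
    """Basic 8-neighbor LBP: each interior pixel becomes an 8-bit code
    recording which neighbors are >= the center (borders are skipped)."""
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    center = gray[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        nb = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        out |= ((nb >= center).astype(np.uint8) << bit)
    return out

def lbp_histogram(gray, bins=256):
    """Normalized histogram of LBP codes, the actual feature vector."""
    codes = lbp_image(gray)
    hist, _ = np.histogram(codes, bins=bins, range=(0, bins))
    return hist / hist.sum()
```

In the fusion pipeline, one such histogram per face block (for both the NIR and visible images) would be concatenated and fed to the classifiers.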
Low, slow, small target recognition based on spatial vision network
Zhao Cheng, Pei Guo, Xin Qi
Traditional photoelectric monitoring uses a large number of identical cameras. To ensure full coverage of the monitored area, this approach requires many cameras, which leads to overlapping coverage and higher costs, resulting in waste. To reduce monitoring cost and address the difficulty of finding, identifying and tracking low-altitude, slow-speed, small targets, this paper presents a spatial vision network for low-slow-small target recognition. Based on the camera imaging principle and a monitoring model, the spatial vision network is modeled and optimized. Simulation results demonstrate that the proposed method performs well.
Vehicle parts detection based on Faster - RCNN with location constraints of vehicle parts feature point
Vehicle part detection plays an important role in public transportation safety and mobility; the task is to detect the position of each vehicle part. We propose a new approach that combines Faster R-CNN and a three-level cascaded convolutional neural network (DCNN). The output of Faster R-CNN is a series of bounding boxes with coordinate information, from which we can locate vehicle parts, while the DCNN precisely predicts the feature point position, which is the center of a vehicle part. We design an output strategy that combines these two results, with two advantages: the quality of the bounding boxes is greatly improved, meaning the vehicle part feature point position can be located more precisely, and the positional relationship between vehicle parts is preserved, effectively improving the validity and reliability of the result. With our algorithm, the performance of vehicle part detection improves noticeably compared with Faster R-CNN.
Hand pose estimation in depth image using CNN and random forest
Xi Chen, Zhiguo Cao, Yang Xiao, et al.
Thanks to the availability of low-cost depth cameras, such as Microsoft Kinect, 3D hand pose estimation has attracted special research attention in recent years. Due to the large variation in hand viewpoint and the high dimensionality of hand motion, 3D hand pose estimation is still challenging. In this paper we propose a two-stage framework that combines a CNN and a Random Forest to boost the performance of hand pose estimation. First, we use a standard Convolutional Neural Network (CNN) to regress the hand joint locations. Second, a Random Forest refines the joints from the first stage. In the second stage, we propose a pyramid feature that merges the information flow of the CNN. Specifically, we get the rough joint locations from the first stage and rotate the convolutional feature maps (and image). After this, for each joint, we map its location onto each feature map (and image), crop features around that location on each feature map (and image), and feed the extracted features into the Random Forest for refinement. Experimentally, we evaluate our proposed method on the ICVL dataset and obtain a mean error of about 11 mm; our method also runs in real time on a desktop.
A fast non-local means algorithm based on integral image and reconstructed similar kernel
Zheng Lin, Enmin Song
Image denoising is one of the essential methods in digital image processing. The non-local means (NLM) approach is a remarkable denoising technique, but its computational complexity is high. In this paper, we design a fast NLM algorithm based on an integral image and a reconstructed similarity kernel. First, the integral image is introduced into the traditional NLM algorithm; this eliminates a great deal of repeated computation in the parallel processing and greatly improves the running speed of the algorithm. Second, to correct the error introduced by the integral image, we construct a similarity window that resembles a Gaussian kernel in a pyramidal stacking pattern. Finally, to eliminate the effect of replacing the Gaussian-weighted Euclidean distance with the plain Euclidean distance, we construct a 3 x 3 similarity kernel within the neighborhood window, which reduces the effect of noise on a single pixel. Experimental results demonstrate that the proposed algorithm is about seventeen times faster than the traditional NLM algorithm, yet produces comparable results in terms of peak signal-to-noise ratio (PSNR increases by 2.9% on average) and perceptual image quality.
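The integral-image building block can be sketched as follows: a summed-area table turns any rectangular patch sum into an O(1) four-corner lookup. In fast NLM variants this table is typically built over squared pixel differences so that patch distances come at constant cost per pixel; the snippet below is a generic illustration of the trick, not the paper's code.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with an extra zero first row and column,
    so patch sums need no boundary special cases."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def patch_sum(ii, y, x, h, w):
    """Sum of img[y:y+h, x:x+w] in O(1) via four table lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]
```

Building the table once costs O(N) for an N-pixel image; afterwards every patch sum, regardless of patch size, costs four lookups, which is what removes the repeated per-patch summation from the inner NLM loop.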
A portable low-cost 3D point cloud acquiring method based on structure light
Li Gui, Shunyi Zheng, Xia Huang, et al.
A fast and low-cost method of acquiring 3D point cloud data is proposed in this paper. It addresses the lack of texture information and the low efficiency of acquiring point cloud data with only one pair of cheap cameras and a projector. First, we put forward a scene-adaptive design method for a random encoding pattern: a coding pattern is projected onto the target surface to create texture information favorable for image matching. Subsequently, we design an efficient dense matching algorithm tailored to the projected texture. After optimization of the global algorithm and multi-kernel parallel development combining hardware and software, a fast point-cloud acquisition system is accomplished. Evaluation of point cloud accuracy shows that point clouds acquired by the proposed method have high precision. Moreover, the scanning speed meets the demands of dynamic scenes and the method has good practical application value.
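The projected random encoding pattern can be sketched as a seeded binary dot image; the density parameter below is an ad hoc illustration of the "scene adaptive" knob, not the paper's actual design rule.

```python
import numpy as np

def random_pattern(height, width, dot_density=0.3, seed=0):
    """Generate a random binary dot pattern for projection.

    Illustrative sketch: each pixel is lit (255) with probability
    `dot_density`, giving the otherwise textureless surface a
    matchable texture. A fixed seed keeps the pattern reproducible
    between the projector and the matching stage.
    """
    rng = np.random.default_rng(seed)
    return (rng.random((height, width)) < dot_density).astype(np.uint8) * 255
```

In practice the density would be tuned to the scene (hence "scene adaptive"): too sparse and matching windows lack texture, too dense and neighboring windows become self-similar.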
Chinese character recognition based on Gabor feature extraction and CNN
As an important application in the fields of text line recognition and office automation, Chinese character recognition has become an important subject of pattern recognition. However, owing to the large number of Chinese characters and the complexity of their structure, Chinese character recognition is very difficult. To solve this problem, this paper proposes a method for printed Chinese character recognition based on Gabor feature extraction and a convolutional neural network (CNN). The main steps are preprocessing, feature extraction, and training and classification. First, the gray-scale Chinese character image is binarized and normalized to reduce the redundancy of the image data. Second, each image is convolved with Gabor filters of different orientations, extracting feature maps for eight orientations of the Chinese characters. Third, the Gabor feature maps together with the original image are convolved with learned kernels, and the results of the convolution form the input of the pooling layer. Finally, the feature vector is used for classification and recognition. In addition, the generalization capacity of the network is improved with dropout. The experimental results show that this method can effectively extract the characteristics of Chinese characters and recognize them.
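An eight-orientation Gabor bank like the one in the second step can be sketched directly in NumPy; the kernel size and wavelength below are illustrative defaults, not values from the paper.

```python
import numpy as np

def gabor_kernel(theta, ksize=9, sigma=2.0, lam=4.0, gamma=0.5):
    """Real part of a Gabor kernel at orientation theta (radians):
    a cosine carrier of wavelength `lam` under an elliptical
    Gaussian envelope (aspect ratio `gamma`, spread `sigma`)."""
    r = ksize // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)      # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + gamma**2 * yr**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * xr / lam)

# One kernel per orientation, pi/8 apart, covering eight directions.
bank = [gabor_kernel(k * np.pi / 8) for k in range(8)]
```

Convolving the character image with each kernel in `bank` yields the eight orientation feature maps that, together with the original image, feed the learned convolution layers.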
Convolutional neural network using generated data for SAR ATR with limited samples
Longjian Cong, Lei Gao, Hui Zhang, et al.
Being able to operate in all weather conditions and at all times, Synthetic Aperture Radar (SAR) has been a hot research topic in remote sensing. Despite the well-known advantages of SAR, extracting features from it is hard because of its unique imaging methodology, and this challenge has attracted the research interest of traditional Automatic Target Recognition (ATR) methods. With the development of deep learning, convolutional neural networks (CNNs) offer another way to detect and recognize targets when a huge number of samples is available, but this premise often does not hold when it comes to monitoring a specific type of ship. In this paper, we propose a method that enhances the performance of Faster R-CNN with limited samples to detect and recognize ships in SAR images.
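Generating extra training data from a few SAR target chips can be sketched with geometric transforms plus multiplicative speckle, which mimics SAR's characteristic noise model. This is an illustrative augmentation recipe under assumed parameters, not the paper's generation method.

```python
import numpy as np

def augment(chip, rng):
    """Produce one synthetic variant of a SAR target chip.

    Illustrative recipe: random horizontal flip, random 90-degree
    rotation, and multiplicative gamma-distributed speckle with
    unit mean (shape 4.0, scale 0.25), approximating multi-look
    SAR speckle statistics.
    """
    out = chip.copy()
    if rng.random() < 0.5:
        out = np.fliplr(out)
    out = np.rot90(out, k=int(rng.integers(0, 4)))
    speckle = rng.gamma(shape=4.0, scale=0.25, size=out.shape)
    return out * speckle
```

Running `augment` repeatedly over the handful of available chips expands the training set before fine-tuning the detector, at the cost of only covering variation the transforms can express.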