2019 International Conference on Image and Video Processing, and Artificial Intelligence (2019)

Volume Details

Date Published: 27 November 2019

Contents: 7 Sessions, 109 Papers, 0 Presentations

Conference: The Second International Conference on Image, Video Processing and Artificial Intelligence 2019

Volume Number: 11321


Front Matter: Volume 11321
Image Processing and Applications
Video Processing and Applications
Computer Vision and AI
Machine Learning and Artificial Neural Networks
AI Systems
Big Data and Large-Scale Scientific Computing

Front Matter: Volume 11321


This PDF file contains the front matter associated with SPIE Proceedings Volume 11321, including the Title Page, Copyright information, Table of Contents, Author and Conference Committee lists.

Image Processing and Applications

Automatic segmentation algorithm of license plate image based on PCNN and DNN

Xiangyu Deng, Wenjuan Qin, Ran Zhang, et al.


License plate segmentation is a key technology in the process of license plate location and recognition. How to realize automatic segmentation of license plate image under complex illumination conditions has been a hot issue in intelligent transportation system (ITS). This paper deals with license plate image segmentation under a variety of lighting conditions. Based on the adaptive segmentation of license plate images by the Pulse Coupled Neural Network (PCNN), the relationship between the license plate image contrast and the PCNN iteration entropy is analyzed. An adaptive segmentation algorithm for license plate image using Deep Neural Network (DNN) to select the optimal result is proposed, and the selected segmentation image is filtered by the connected domain, which lays a foundation for subsequent license plate location, character segmentation and recognition. Simulation experiments show that the proposed algorithm performs better license plate segmentation and optimal selection for license plate images under various lighting conditions.

Adopting centrality measure models in visualized financial datasets

Jie Hua, Guohua Wang, Youquan Xu


Financial data is complex to analyse because of its complicated relationships and multiple attributes. Centrality measure models from social network analysis (SNA) can identify the most critical variables in a network, and graph layouts can represent not only data networks but also the relations among data entries. To the best of our knowledge, no previous work has combined these two methods on the Australian stock market. This study adopts centrality measures and a force-directed graph drawing algorithm to offer users both overviews and detailed views, with ranking factors based on weighted degree, PageRank and eigenvector metrics. The outcomes show that the methodology can produce clear graph layouts of the stocks' social network and identify the central stocks (represented through features such as node colour and size) and the business sectors they belong to. This study may help stakeholders gain deeper insight into complex financial datasets and offer another perspective for adjusting future investments.
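As a rough illustration of two of the ranking metrics named in this abstract, the sketch below computes weighted degree and a weighted PageRank by power iteration on a toy undirected graph. The tickers, edge weights, damping factor and function names are all hypothetical and are not taken from the paper; it is a minimal sketch, not the authors' pipeline.

```python
from collections import defaultdict


def weighted_degree(edges):
    """Sum of incident edge weights for each node of an undirected graph."""
    degree = defaultdict(float)
    for u, v, w in edges:
        degree[u] += w
        degree[v] += w
    return dict(degree)


def pagerank(edges, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration on an undirected graph."""
    neighbours = defaultdict(dict)
    for u, v, w in edges:
        neighbours[u][v] = w
        neighbours[v][u] = w
    nodes = sorted(neighbours)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Every node receives the same teleportation share first.
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for u in nodes:
            total = sum(neighbours[u].values())
            # Each node passes its rank on in proportion to edge weight.
            for v, w in neighbours[u].items():
                new_rank[v] += damping * rank[u] * w / total
        rank = new_rank
    return rank


# Hypothetical tickers and correlation-style weights, for illustration only.
EDGES = [("AAA", "BBB", 0.9), ("AAA", "CCC", 0.8),
         ("AAA", "DDD", 0.7), ("BBB", "CCC", 0.2)]

degrees = weighted_degree(EDGES)
ranks = pagerank(EDGES)
central = max(ranks, key=ranks.get)
print(central, degrees[central])
```

In a visualisation, scores like these could drive node size or colour, which is the kind of encoding the abstract describes.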

Functional magnetic resonance imaging classification based on random forest algorithm in Alzheimer's disease

Yu Wang, Changsheng Li


To classify Alzheimer's disease (AD) by analyzing medical image data, this paper proposes a computer-aided diagnosis method based on the random forest algorithm. Functional magnetic resonance imaging (fMRI) data are collected from 34 AD patients, 35 subjects with mild cognitive impairment (MCI) and 35 normal controls (NC). Firstly, the functional connection between the different regions of the whole brain is calculated using the Pearson correlation coefficient. Then the importance of the functional connections between different brain regions is measured and the important features are selected using the random forest algorithm. Finally, classification is performed using a support vector machine (SVM) classifier with ten-fold cross-validation. The classification model based on random forest and SVM performs well in recognizing AD, and the classification accuracy can reach 90.68%. Functional connection characteristics can be effectively analyzed by the random forest algorithm, which can distinguish AD, MCI and NC accurately. At the same time, the abnormal brain regions involved in AD pathogenesis can be obtained. The related experimental results can provide an objective reference for the early clinical diagnosis of AD.
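The first step of this pipeline, building functional connectivity features from Pearson correlations between regional time series, can be sketched as follows. The toy signals and function names are hypothetical, and the later random forest feature selection and SVM classification are not shown; this is only a minimal sketch of the correlation step.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def connectivity_features(series):
    """Flatten the upper triangle of the region-by-region correlation matrix."""
    regions = len(series)
    return [pearson(series[i], series[j])
            for i in range(regions)
            for j in range(i + 1, regions)]


# Hypothetical time series for three brain regions (one list per region).
SIGNALS = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],  # rises with region 0
    [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as region 0 rises
]

features = connectivity_features(SIGNALS)
print(features)
```

Each subject would yield one such feature vector, with one entry per pair of regions, which a feature selector and classifier could then consume.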

Histogram-based image watermarking algorithm using visual perception characteristics

Wenwen Du, Daoshun Wang, Shundong Li, et al.


Watermarking algorithms based on the geometric invariance of image histogram are effective and can resist various common attacks. However, all existing histogram-based image watermarking algorithms are constructed from all the pixels of the entire image; thus the embedded watermark energy is randomly distributed throughout the image, causing visual quality degradation in the smooth areas. In this paper, an improved algorithm using human visual perception characteristics is proposed. Firstly, we calculate the JND threshold mapping of the carrier image and select a portion of the pixels with the largest threshold as samples of the statistical histogram. Secondly, we calculate the mean of the selected pixel set, determine the embedding region and divide it into several groups. Finally, by adjusting the number of pixels in three bins per group, 2 bits of the watermark are embedded. According to the geometric invariance of the histogram and the different sensitivity of human eyes to the smooth and textured areas, we embed the watermark in the positions which are not easily perceived by human eyes. Experiments show that the proposed algorithm significantly improves the visual quality of the smooth areas in the watermarked image, but it has weaker robustness to signal processing attacks.

An offline fast model training method using CGAN for anti-jamming in true environment

Minmin Jiang, Da-peng Li, Fuqi Mu, et al.


Deep learning has been applied in the cognitive radio field, for example to anti-jamming communication based on spectrum waterfall images. However, anti-jamming communication is essentially a decision problem, which interacts with the environment dynamically and is quite different from classification and detection problems. Training for a decision problem often requires sampling a large number of labeled examples from the true environment, which is extremely time-consuming. In this paper, an offline fast model training method for anti-jamming communication is proposed. Specifically, a novel framework is developed to accelerate the training procedure. Like the Gym toolkit, a virtual environment generator that produces spectrum waterfall images is built with a CGAN, which is trained on a dataset sampled randomly by a real transceiver. Because the environment generator rapidly outputs a variety of synthesized spectrum waterfall images, the training efficiency is improved significantly. The experiments show that, compared with the online training method, the time cost of the offline method is reduced by over 50%.

The application of transfer learning for scene recognition

Boyi Hong, Cong Jin, Nansu Wang, et al.


Video is often accompanied by advertisement recommendation, which is an important part of it. To make the recommendation of advertisements intelligent, it is important to know the categories of scenes in videos. Although scene recognition and classification have been extensively studied, most methods require large data sets and long training times. To address this issue, we adopt transfer learning, which has achieved great success in visual tasks with high accuracy and small data sets. In this scheme, we propose a model which can be applied to intelligent recommendation of advertisements. We chose class places from taskonomy as our source task model, and it has relatively good accuracy after freezing and training. Our model is suitable not only for indoor scenes but also for several outdoor scenes which often appear in video and have advertising value.

RGB-D dynamic facial dataset capture for visual speech recognition

Naveed Ahmed


We present a new comprehensive RGB-D dynamic facial dataset capturing system that can be used for facial recognition, emotion recognition, or visual speech processing. Our facial dataset uses an RGB-D (Kinect) camera to record 20 individuals saying 20 common English words or phrases. Using Kinect facial tracking, we record not only the facial features but also the facial outline, RGB data, depth data, the mapping between RGB and depth data, facial animation units, facial shape units, and finally 2D and 3D representations of the face along with the 3D head orientation. The captured RGB-D dynamic facial dataset can be employed in several applications. We demonstrate its effectiveness by presenting a new visual speech recognition system that employs three-dimensional spatial and temporal data of different facial feature points. The results demonstrate that our RGB-D dynamic facial dataset can be effectively employed in a visual speech recognition system.

Prototype of guide robot using marks in dynamic environments for visually impaired people

José A. Rumipamba L., Aníbal R. Pérez C., Christian E. Flores A., et al.


Visually impaired people face several limitations when performing daily tasks. They almost always need the help of a sighted person, which makes them dependent on other people. This work presents a mobile robot prototype that aims to give visually impaired people independence when making purchases in a supermarket. To this end, we present the development of a differential-drive mobile towing robot that uses infrared sensors to follow a black line, a web camera to recognize green and red marks that indicate turn and stop actions respectively, ultrasonic sensors to avoid collisions with obstacles along the route, and a LattePanda vented card to control the system. In addition, results from tests in which the prototype guided people are presented, providing data that detail the behavior of this model under experimental conditions using artificial vision and infrared sensors.

The application of transfer learning in film and television works

Bihan Lian, Cong Jin, Nansu Wang, et al.


Many personalized advertisement recommendation studies suffer from the problem that only certain tagged items can be recommended during video playback, which means they cannot recommend more of the products that users really like, nor can they learn at the source what users really like. Because scenes change frequently across different videos, users could choose from many more items they like. This study adopts transfer learning to overcome the limited data volume and provide users with a variety of options. Starting from an image classification model learned on a large data set, this paper proposes a method for scene object recognition in TV programs, such as movies, TV plays, variety shows and short videos, by transferring a pre-trained deep image classification model to a specific task. High-level representations are learned on a small training set to produce a task-specific target model. Experiments on small data sets and a real face set collected by the authors show that transfer learning is effective and efficient. For video applications, this study provides a theoretical basis for personalized click recommendation for video users.

Research on autonomous driving perception based on deep learning algorithm

Bolin Zhou, Jihu Zheng, Chen Chen, et al.


Multi-source sensor fusion is the foundation of motion planning for autonomous driving systems and a crucial part of improving the performance of unmanned systems. In this article, based on the deep learning platform constructed by CATARC and Udacity's Lincoln MKZ multi-sensor data, we implement a low-cost data fusion solution for object pose estimation using the Robot Operating System, computer vision, the Point Cloud Library, deep neural networks and an extended Kalman filter, aiming to provide technical support for the industrialization of autonomous driving technologies.

Asymmetric multiple-images cryptosystems based on discrete multiple-parameters fractional Fourier transforms

Yu Tai, Wen-Liang Hsue, Jun-Yu Liu, et al.


There are many free parameters in discrete multiple-parameters fractional Fourier transforms. These parameters can be applied as encryption keys that make a cryptosystem safer. In this paper, these extended versions of the discrete Fourier transform are incorporated into an existing asymmetric multiple-images cryptosystem. The resulting encryption system is based on phase and amplitude truncations, which is the reason that the system is asymmetric. The phase combination operation during encoding makes the random phase key required for decoding, which enhances the security of decoding. The discrete Fourier transform is replaced by discrete multiple-parameters fractional Fourier transforms to increase the number of free parameters and the sensitivity to errors without decreasing the sensitivity of the phase and amplitude keys. Related computer experiments are performed to validate our proposed methods.

CNNs in the frequency domain for image super-resolution

Yingnan Liu, Randy Clinton Paffenroth


This paper develops methods for recovering high-resolution images from low-resolution images by combining ideas inspired by sparse coding, such as compressive sensing techniques, with super-resolution neural networks. Sparse coding leverages the existence of bases in which signals can be sparsely represented, and herein we use such ideas to improve the performance of super-resolution convolutional neural networks (CNN). In particular, we propose an improved model in which CNNs are used for super-resolution in the frequency domain, and we demonstrate that such an approach improves the performance of image super-resolution neural networks. In addition, we indicate that instead of numerous deep layers, a shallower architecture in the frequency domain is sufficient for many types of image super-resolution problems.

Denoising and contrast enhancement fusion based on white balance for underwater images

Chao Wei, Junfeng Wang, Guannan Chen


When a scene reaches the human eye or another sensor through the underwater medium, the original color of the object is largely lost, and the background of most images is blue-green. In this paper, to remove the blue-green cast of the underwater image and increase its color contrast, a novel method that fuses two images is proposed. The two images are the filtering-based denoising result and the contrast enhancement result, both derived from a white-balanced version of the original degraded image. Then, based on the two images, the associated weight maps are designed to enhance the edge texture and color contrast of the output image. Finally, to avoid artifacts in the reconstructed image, we adopt a multiscale fusion strategy to fuse the two processed images. Experiments show that our algorithm achieves better image contrast and edge sharpness than other methods and obtains better exposure for darker areas of the image.

Pneumonia detection based on deep neural network Retinanet

Mao Liu, Yumeng Tan, Lina Chen


The interpretation of chest X-rays is critical for the discovery of thoracic diseases, including pneumonia and lung cancer, which affect millions of people worldwide each year. This time-consuming task usually requires radiologists to read the images, and diagnostic errors arise from fatigue and from the lack of diagnostic expertise in areas of the world where there are no radiologists. Recently, deep learning methods have performed well in the field of medical imaging, thanks to the emergence of large network architectures and large labeled datasets. In this work, we describe our approach to pneumonia classification and localization in chest radiographs. This method uses only open-source deep learning object detection and is based on RetinaNet, a fully convolutional network which incorporates global and local features for object detection. Our method achieves the classification and localization of pneumonia in chest radiographs through key modifications to the image preprocessing and training process, and it combines bounding boxes from multiple models at test time to improve classification and localization. After image enhancement and algorithm improvement, we randomly selected 100 chest radiographs from the second-stage chest dataset to test our detection algorithm and achieved good results, with an accuracy of 90.25%.

Human face detection based on skin color with principal component analysis (PCA) algorithm

Dorsa S. Kiaei, Saeed Tavakoli


Detecting and identifying a person's face is one of the challenging issues in computer applications. The main part is the facial recognition of the whole image. Problems include sensitivity to the exposure conditions of the input image. We present a method to detect human faces in color images based on the edges and skin color of the image by setting appropriate thresholds. First, image enhancement is performed, especially if the image is obtained from an infinite lighting condition. Then the skin was digested in the laboratory environment. The edges of the image are combined with the tone of the skin color to separate all areas other than the face. For this purpose, skin color detection methods have been used to detect faces. The Eigenface method is also known as the PCA method. This process has been tested on a variety of pictures and samples and has yielded good results. Due to the increasing instances of identity theft and the occurrence of terrorism in the past years, recognizing and identifying a person can determine the unity of a human being with another person's face. For facial recognition in an online monitoring system or an offline image, the main component to be identified is the skin areas. Skin color has been proven to be a powerful and useful cue for face detection, localization, and tracking. Face detection is simply a facial image for determining the input given, regardless of its size, position, and background. The current evolution of computer technology has increased in this era.

A fractal image denoising algorithm in wavelet domain

Na Wang


An image denoising and enhancement algorithm based on fractal coding in the wavelet domain is presented. A lemma is introduced first, and the advantages of fractal coding in the wavelet domain are analyzed. Then, the denoising algorithm based on fractal coding in the wavelet domain is given in detail, and the experimental results are presented and analyzed. When the noise level is high, the denoising effect of this method is better than that of the general method, with a higher logarithmic signal-to-noise ratio and peak logarithmic signal-to-noise ratio. At the same time, the computational cost of this algorithm is lower and its precision is higher.

A blind digital watermarking scheme based on RS coding

Na Wang


To address the robustness of digital watermarking, a robust blind watermarking algorithm based on RS coding technology is proposed and implemented. Before the watermark signal is embedded, RS channel encoding is carried out. When embedding and extracting the watermark, the relevance of the wavelet transform coefficients is calculated to perform extraction and enhance robustness. The experimental results show a gain of 12 dB compared with the scheme without coding.

Walsh orthogonal moments for efficiently reducing Gibbs noise in image reconstruction

Bing He, Jiangtao Cui, Yanguo Peng


Orthogonal moments have been extensively used as global feature-extraction descriptors in two-dimensional image reconstruction. However, a common problem for existing image moments, including orthogonal moments built in the Cartesian coordinate system and radial orthogonal moments defined in polar coordinate space, is that the Gibbs effect occurs when image reconstruction is performed using features extracted from finite-order moments. It is well known that Gibbs noise can affect the representation capability of image moments and the quality of reconstructed images. In this paper, we propose a set of novel orthogonal moments named Walsh orthogonal moments (WOMs). The basis functions of Walsh orthogonal moments are Walsh functions, which are composed of a class of complete orthogonal discontinuous binary function systems. They therefore make it possible to avoid calculating high-order polynomials, and experimental results show that Gibbs noise can be efficiently avoided by using discontinuous functions as the basis functions of Walsh orthogonal moments.
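To illustrate why a discontinuous binary basis can represent sharp edges compactly, the sketch below builds Walsh functions in Hadamard ordering with the Sylvester construction and applies them to a one-dimensional step signal. This is a simplified one-dimensional analogue with hypothetical function names; the paper's two-dimensional WOM definition and ordering may differ.

```python
def walsh_hadamard_matrix(order):
    """Rows are Walsh functions (Hadamard ordering) sampled at 2**order points."""
    matrix = [[1]]
    for _ in range(order):
        top = [row + row for row in matrix]
        bottom = [row + [-x for x in row] for row in matrix]
        matrix = top + bottom
    return matrix


def walsh_coefficients(signal, matrix):
    """Project the signal onto every Walsh function (normalised by length)."""
    n = len(signal)
    return [sum(w * s for w, s in zip(row, signal)) / n for row in matrix]


def reconstruct(coeffs, matrix):
    """Rebuild the signal as a weighted sum of Walsh functions."""
    n = len(coeffs)
    return [sum(coeffs[k] * matrix[k][i] for k in range(n)) for i in range(n)]


W = walsh_hadamard_matrix(3)          # 8 basis functions of length 8
step = [0, 0, 0, 0, 1, 1, 1, 1]       # a sharp edge in the middle
coeffs = walsh_coefficients(step, W)
restored = reconstruct(coeffs, W)
nonzero = [k for k, c in enumerate(coeffs) if c != 0]
print(nonzero, restored)
```

For this step signal only two coefficients are non-zero and the edge is rebuilt without ringing, whereas a truncated series of smooth basis functions would oscillate near the discontinuity.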

Multi-loss joint optimization for person re-identification

Mengxue Ren, Shuhua Lu


Due to the rise of deep learning, person re-identification has become a research hotspot in computer vision. Most person re-identification algorithms use the softmax function as the loss function, which can increase the inter-class distance but converges poorly on the intra-class distance. Therefore, a person re-identification model based on multi-loss optimization is proposed by adding center loss. Center loss reduces the intra-class distance, which makes up for the shortcoming of softmax loss. Two models are selected for a comparative experiment to prove the effectiveness of our method. One is the re-ranking person re-identification model with k-reciprocal coding, which is named IDE_ResNet-50+Jaccard. The other is the person re-identification model without k-reciprocal coding, which is named IDE_ResNet-50. The experiments are performed on the Market-1501 dataset, and the results show that our method outperforms the original model, with increases of 1.25% and 0.63% in mAP and rank-1 accuracy for the IDE_ResNet-50+Jaccard model. For the IDE_ResNet-50 model, mAP and rank-1 accuracy increased by 1.86% and 0.18%, respectively.
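A minimal sketch of a joint softmax and center loss for a single sample is given below. The class centers, the balancing weight of 0.01 and the function names are assumptions chosen for illustration; the abstract does not state the weighting or how centers are updated.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of softmax(logits) against the true class index."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]


def center_loss(feature, center):
    """Half the squared distance between a feature and its class center."""
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))


def joint_loss(logits, feature, label, centers, weight=0.01):
    """Softmax loss plus a weighted center loss (weight is an assumed value)."""
    return (softmax_cross_entropy(logits, label)
            + weight * center_loss(feature, centers[label]))


# Hypothetical two-class setup with 2-D embedding features.
CENTERS = {0: [0.0, 0.0], 1: [4.0, 4.0]}

near = joint_loss([2.0, 0.0], [0.1, -0.1], 0, CENTERS)
far = joint_loss([2.0, 0.0], [3.0, 3.0], 0, CENTERS)
print(near, far)
```

Both samples have the same logits, so the softmax term is identical; only the center term penalises the feature that lies far from its own class center, which is the intra-class effect the abstract refers to.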

Feature optimization for pedestrian detection based on faster R-CNN

Mengxue Ren, Shuhua Lu


Due to the rise of convolutional neural networks, pedestrian detection has achieved great success. However, many existing methods do not fully utilize their features, which results in unsatisfactory detection results. A new pedestrian detection model named SE-Faster R-CNN is proposed in this paper. Adding an SENet block to Faster R-CNN strengthens the expressive power of the features. Then GN-Faster R-CNN, which is generated by adding a Group Normalization (GN) layer to Faster R-CNN, is proposed. The proposed architectures are trained and tested on the Caltech dataset. In addition, the VGG16 and ZF models are used as the backbone of the detection network. A comparative experiment is carried out to compare the effectiveness of the two optimization methods. The experimental results show that, after adding SENet, the miss rates of the ZF model and the VGG16 model were reduced by 0.392% and 0.999%, respectively. After adding the GN layer, the miss rate of the VGG16 model was reduced by 0.665%, while the miss rate of the ZF model increased by 2.093%.

Blind motion deblurring using multi-scale residual channel attention network

Jiakai Dai, Yujun Zeng


In recent years, the multi-scale approach has been applied to image restoration tasks, including super-resolution and deblurring, and has proved beneficial to both optimization-based and learning-based methods for improving restoration performance. Meanwhile, it has been observed that high-frequency information plays an important role in blind motion deblurring. Unlike previous learning-based methods, which simply deepen the deblurring network without discriminating between low-frequency content and high-frequency details, we propose a novel multi-scale convolutional neural network (CNN) framework with residual channel attention blocks (RCAB) for blind motion deblurring. RCAB has the residual in residual (RIR) structure, which consists of several residual groups with long skip connections and allows low-frequency information to pass through the skip connections conveniently, and it can adaptively learn more useful channel-wise features and pay more attention to high-frequency information. Experimental results show that our proposed method obtains better deblurred images than state-of-the-art learning-based image deblurring methods in terms of both quantitative metrics and visual quality.

A texture segmentation method for high resolution remote sensing images combining gray edge information

Na Wang


A texture segmentation method for high-resolution remote sensing images that combines gray edge information is proposed. Firstly, an initial segmentation strategy based on gray edge detection is used to segment the image. Then the texture features of the image are extracted using the Gauss-Markov random field model. In the feature space, the mean value of each class of features in the initial segmentation is obtained, and the feature vectors are then clustered with these means as initial points to complete the segmentation. This method addresses two drawbacks of the standard fuzzy C-means clustering algorithm in image segmentation: slow operation and strong dependence on the initial values. A real remote sensing image is segmented with this algorithm. Experiments show that this method is faster and more stable.

Instance segmentation by using mask R-CNN based on feature fusion of RGB and depth images

Jinyu Sun, Chengxiong Jin, Shiwei Ma


Instance segmentation for obstacle detection based on machine vision and deep learning is quite important for autonomous driving systems. In this paper, a method using Mask R-CNN based on feature fusion of RGB and depth images for instance segmentation is proposed. It extracts the features of the depth image with a purpose-designed two-layer NiN network, and uses convolution to realize the feature fusion and dimension reduction of the RGB image and depth image. The edge texture in the depth image can improve the accuracy of bounding box positioning. Experimental results on a typical benchmark dataset demonstrate the effectiveness of the proposed method, which improves the segmentation accuracy by 4% and the recall rate by 2%.

Real-time lane detection, fitting and navigation for unstructured environments

Apoorve Singhal, Vibhakar Mohta, Arvind Jha, et al.


In the outdoor environment, robot perception is a challenging task encompassing several layers of abstraction, such as lane detection, object detection and avoidance, and way-point navigation. Intelligent Ground Vehicles are becoming popular, and an efficient perception stack is essential to scaling them for different tasks. Issues such as illumination variance, shadows and occlusions cause researchers to adopt computationally heavy approaches for improving generalization. We present a novel, real-time approach for combined lane detection, obstacle detection, and way-point navigation using features from a 2D-LiDAR and camera. A robust curve fitting algorithm has been implemented with minimal computation in mind. The overall processing pipeline has been tested and validated to work well in outdoor conditions.

Coordinated detection of robots during processing

Xuefeng Wu, Mingyu Wang


In order to realize the functions of tool, workpiece replacement, surface roughness, machining defect and tool wear detection in the process of intelligent manufacturing, a system which can realize in-machine detection is proposed by combining robot with machine vision detection technology. The overall structure of the system is composed of CNC machine tools, cutting tools, tool shank, camera, lens, computer and so on. The manipulator is calibrated by the method of simultaneous calibration of hand and eye, the clamping device and camera are installed on the manipulator, and the replacement of cutting tool and workpiece can be realized by using industrial image processing algorithm, as well as the in-machine detection of surface roughness, machining defects and tool wear. The experimental results show that the method can meet the requirements of some tests.

Real-time patient facial expression recognition using convolutional neural network

Xin Chen, Yutong Qian, Shilei Fu, et al.


Real-time monitoring of patients in hospital is of great importance, as it serves as an alarm for emergency conditions. However, all-day attendance by carers or monitors is costly and a waste of resources. With the development of deep learning, it is worth considering low-cost real-time recognition methods from machine learning instead. This paper proposes to monitor the state of patients via facial expression recognition. To that end, a two-stage approach is proposed: detection of the patient's face, followed by classification of the facial expression. The face detector relies on Haar features and is pre-trained. The detected faces are then classified as either “normal” or “abnormal” by a convolutional neural network. The training and test data are collected in real scenes by mobile phone. The experimental results show that an accuracy of 83% is achieved on the test set.

Laser-assisted turning spot detection algorithm for workpiece surface

Xuefeng Wu, Xianliang Zhou


A machine vision system is used to extract the laser spot area and the position of the spot center on the workpiece. Its overall structure consists of lathe, camera, PC and laser. It has the advantages of simple structure, high integration and flexible system design. Firstly, the area of the spot illuminated to the workpiece is obtained, and the power density and the temperature of the spot center are deduced. The position of the spot center on the workpiece is obtained to facilitate the adjustment of the laser irradiation position. In order to detect the light spot on the surface of a cylindrical workpiece, the pin-hole imaging model and the cylindrical surface expansion algorithm are used to expand and fit the surface part of the light spot on the surface of a cylindrical workpiece, and the Hough transform is used to locate its exact position.

Night colorize: fully convolutional colorization network for low-light images

Lubin Xia, Li Li, Weiqi Jin, et al.


An end-to-end network is proposed for low-light images natural colorization using a deep fully convolutional architecture. The network consists of a downsampling sub-network and an upsampling sub-network. The downsampling component extracts the high-level features of the input images, while the upsampling component transforms the high-level features to color. A skip connection is used to transmit low layer information to the deep layer so as to improve the colorization accuracy. Gamma correction and random noise augmentation are used to improve the network adaptability to low-light images. The trained model can naturally colorize low-light images without any reference image or artificial scribbles.
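One plausible reading of the gamma correction and random noise augmentation mentioned here is sketched below: a normally exposed grayscale row is darkened with a gamma curve and perturbed with Gaussian noise to imitate a low-light input. The gamma value, noise level, seed and function names are assumptions for illustration; the abstract does not give the actual settings.

```python
import random


def gamma_correct(pixels, gamma):
    """Apply p ** gamma to pixels in [0, 1]; gamma > 1 darkens the image."""
    return [p ** gamma for p in pixels]


def add_noise(pixels, sigma, rng):
    """Add zero-mean Gaussian noise and clip the result back to [0, 1]."""
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in pixels]


def simulate_low_light(pixels, gamma=2.5, sigma=0.02, seed=0):
    """Darken, then add noise (gamma, sigma and seed are assumed values)."""
    rng = random.Random(seed)
    return add_noise(gamma_correct(pixels, gamma), sigma, rng)


# One row of a hypothetical grayscale image, values in [0, 1].
ROW = [0.0, 0.25, 0.5, 0.75, 1.0]

dark = gamma_correct(ROW, 2.5)
augmented = simulate_low_light(ROW)
print(dark)
```

Training pairs built this way would let a colorization network see inputs closer to real low-light statistics without collecting extra data.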

Lung image segmentation by generative adversarial networks

Jiaxin Cai, Hongfeng Zhu

Show abstract

Lung image segmentation plays an important role in computer-aided diagnosis and treatment of pulmonary diseases. This paper proposes a lung image segmentation method based on generative adversarial networks. We employ several variants of generative adversarial networks and use their image translation capability to perform segmentation: the network translates the original lung image into the segmented image. The method was tested on a real lung image dataset. Experimental results show that the proposed method is effective and outperforms a state-of-the-art method.

Infrared and visible image fusion using joint convolution sparse coding

Chengfang Zhang, Zhen Yue, Dan Yan, et al.

Show abstract

Infrared and visible images carry different types of information about the same scene, but the two are correlated. Traditional convolutional sparse representation fusion considers the individual characteristics of each image but not the correlation between infrared and visible images. This results in insufficient detail retention and low contrast. To overcome these issues, joint convolutional sparse coding is introduced, and a novel visible/infrared image fusion method is proposed. First, low-pass decomposition is used to decompose the source image into low- and high-pass components. Subsequently, joint convolutional sparse coding and a “choose-maximum” fusion strategy are used to fuse the base layers, and the “absolute-maximum” strategy is used for the detail layers. Finally, image reconstruction is performed on the low- and high-pass components to obtain the final fused image. The proposed method not only avoids patch-based sparse fusion, which can destroy the image’s global structural features, but also fully integrates related information between infrared and visible images. Four groups of typical infrared and visible images are used in fusion experiments to evaluate the proposed algorithm. The experimental results show that the proposed fusion algorithm performs best in both subjective visual quality and objective evaluation indicators. Compared with the fusion method based on convolutional sparse representation, three Q-series objective evaluation indicators increased by 3.83%, 5.31%, and 0.48%, respectively.
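The two fusion rules named in this abstract can be illustrated with a short sketch. It applies the rules element-wise to small 2D arrays and reconstructs the fused image by adding the fused base and detail layers. Both choices are simplifying assumptions: the paper applies the rules within joint convolutional sparse coding, and the function names are hypothetical.

```python
def choose_max(a, b):
    """'Choose-maximum' rule: keep the larger value at each position."""
    return [[max(x, y) for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]

def abs_max(a, b):
    """'Absolute-maximum' rule: keep the value with the larger magnitude."""
    return [[x if abs(x) >= abs(y) else y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]

def fuse(base_ir, detail_ir, base_vis, detail_vis):
    """Fuse base layers and detail layers separately, then add them (assumed reconstruction)."""
    base = choose_max(base_ir, base_vis)
    detail = abs_max(detail_ir, detail_vis)
    return [[b + d for b, d in zip(row_b, row_d)]
            for row_b, row_d in zip(base, detail)]

if __name__ == "__main__":
    print(fuse([[1, 5]], [[-4, 1]], [[3, 2]], [[2, -1]]))
```

The absolute-maximum rule keeps the sign of the stronger detail coefficient, so a strong dark edge in one image is not cancelled by a weak bright edge in the other.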

Research on technology of image processing for ID photos

Min Huang, Hong Lan

Show abstract

Second-generation ID photos must meet strict standards for personal digital photos. This paper investigates an image processing technique for converting a personal photo into a standard ID photo or passport photo. The processed photos are accepted by the "Second generation ID photodetection platform" and can be used on an ID card. The proposed method has three steps. First, a clipping algorithm based on distances along the head is used to crop the photo to the required dimensions. Second, an improved FloodFill algorithm is used to change the photo background. Third, an adjustment algorithm based on the mean RGB color of the skin is used to adjust and equalize the face color, so that the photo meets the standards and passes the detection platform. With this method, unqualified photos were successfully processed into qualified ID photos. The method requires only a single photo with a relatively simple background from the user. After the three steps above, that photo is converted into a standard identity card photo. The proposed method removes the inconvenience of having ID photos taken carefully in a photo studio.
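The background-replacement step relies on flood fill. The sketch below shows the basic, unimproved algorithm only, since the abstract does not describe the authors' improvements. The 4-connected breadth-first traversal, the per-channel `tolerance` parameter and the function name are assumptions made for illustration.

```python
from collections import deque

def flood_fill(image, seed, new_color, tolerance=0):
    """Recolor the 4-connected region around `seed` whose pixels are within
    `tolerance` (per channel) of the seed color. Returns a new image."""
    rows, cols = len(image), len(image[0])
    base = image[seed[0]][seed[1]]
    out = [row[:] for row in image]  # leave the input untouched

    def similar(pixel):
        return all(abs(a - b) <= tolerance for a, b in zip(pixel, base))

    seen = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        out[r][c] = new_color
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in seen and similar(image[nr][nc])):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return out

if __name__ == "__main__":
    white, face, blue = (255, 255, 255), (200, 150, 120), (0, 0, 255)
    photo = [[white, white, white], [white, face, white], [white, white, white]]
    print(flood_fill(photo, (0, 0), blue))
```

Seeding the fill from a corner pixel works only when the background is fairly uniform, which matches the abstract's requirement of a relatively simple background.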

Automatic recognition of radar signal types based on convolutional neural network

Guoqing Ruan, Wei Wu

Show abstract

In the field of cognitive electronic warfare, automatic feature learning and recognition of radar signals is an important technology for intelligence reconnaissance. This paper analyzes the basic structure of the convolutional neural network (CNN) and proposes an automatic recognition algorithm for radar signals. First, the radar signal is transformed into a time-frequency image, and the principal component information of the image is extracted by image processing. Then, the designed network, CNN-LeNet-5, is used to learn and recognize features automatically. Simulation results show that the algorithm can effectively identify eight kinds of radar signals at low signal-to-noise ratios.

Smooth voxel surface for medical volumetric rendering

Porawat Visutsak, Fuangfar Pensiri, Orawan Chaowalit

Show abstract

This paper implements the trilinear interpolation algorithm together with the marching cubes method to generate a smooth voxel surface from 2D digital images. Trilinear interpolation is a direct extension of bilinear interpolation: it can be seen as the linear interpolation of two bilinear interpolations. The proposed method is fast and easy to implement, and it produces smoother results than the marching cubes technique alone. Therefore, for volume rendering tasks such as 3D medical models and terrains, where a very large number of lookups in 3D grids are performed, this method is a good choice for generating high-resolution 3D surfaces.
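The description of trilinear interpolation as "the linear interpolation of two bilinear interpolations" can be written out directly. The sketch below is illustrative only: the `corners[z][y][x]` layout and the function names are assumptions, and the marching cubes stage is not shown.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t

def bilinear(c00, c10, c01, c11, tx, ty):
    """Interpolate along x on two edges, then along y between the results."""
    return lerp(lerp(c00, c10, tx), lerp(c01, c11, tx), ty)

def trilinear(corners, tx, ty, tz):
    """corners[z][y][x] holds the 8 values at the corners of one grid cell;
    (tx, ty, tz) are fractional offsets inside that cell."""
    near = bilinear(corners[0][0][0], corners[0][0][1],
                    corners[0][1][0], corners[0][1][1], tx, ty)
    far = bilinear(corners[1][0][0], corners[1][0][1],
                   corners[1][1][0], corners[1][1][1], tx, ty)
    return lerp(near, far, tz)

if __name__ == "__main__":
    # Corner values sampled from f(x, y, z) = x + 2y + 4z
    cell = [[[x + 2 * y + 4 * z for x in (0, 1)] for y in (0, 1)] for z in (0, 1)]
    print(trilinear(cell, 0.5, 0.5, 0.5))
```

Each lookup costs seven linear interpolations, which is why the cost matters when a very large number of grid lookups are performed.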

Remote sensing image fusion based on dictionary learning and sparse representation

Fei Yin, Shuhua Cao, Xiaojie Xu

Show abstract

It is a tough challenge to find a remote sensing image fusion method that retains as much spatial and spectral information as possible from the panchromatic (PAN) image and the multispectral (MS) image. Sparse representation (SR), a powerful tool for handling high-dimensional signals, can perform remote sensing image fusion better than other popular methods. In addition, to obtain better fusion results without color distortion, this paper proposes a remote sensing image fusion algorithm that uses SR and color matching instead of the intensity-hue-saturation (IHS) color model and the Brovey transform. Experimental results show that the proposed method produces fused images with better spatial detail and spectral information than three well-known methods.

Transfer learning of deep CNN de-noiser prior for Chinese ancient calligraphy tablet image denoising

Feihang Ge, Lifeng He, Yuyan Chao

Show abstract

Tablet images are significant vehicles of ancient cultural heritage. However, because of natural or man-made damage, ancient tablet images usually contain a large amount of noise and many scratches, which makes it very difficult to recognize the objects of interest carved in the tablets. To deal with this problem, this paper proposes a method based on transfer learning of a DnCNN denoiser prior. First, all parameters of all layers of a DnCNN pre-trained on natural images are transferred to the target network. The initial CNN filter weights are then fine-tuned on noisy Chinese tablet calligraphy images by back-propagation so that they better reflect the noise characteristics of tablet images. In addition, the structure of Chinese tablet calligraphy is taken into account to remove isolated small scratches by combining a connected-region technique with DnCNN transfer denoising. Experiments on real noisy tablet images demonstrate that, compared with existing image denoising methods, the proposed method is effective in both noise removal and detail preservation.

Research and design of three-dimensional brain augmented reality system based on medical image features

Wang Zhao, Yan Ren, Yufei Pang, et al.

Show abstract

With the rapid development of information technology and the application of artificial intelligence in the medical field, the latest computer technology has become a means of improving medical diagnosis. Various medical imaging devices, such as CT and MRI, provide two-dimensional images of the human body. Doctors urgently need to determine accurately the spatial position, size and geometric shape of a lesion, and its spatial relationship to surrounding tissues. It is therefore very important to use computer technology to reconstruct human organs and display the lesion area in three dimensions. At present, the results of automatic recognition and labeling of brain images are displayed in two-dimensional form, so three-dimensional visualization technology is needed to reconstruct them. In addition, supplementary information can be superimposed on brain images for integrated display by combining virtual and real information, and augmented reality is well suited to these requirements. This paper studies the process of feature extraction and three-dimensional reconstruction for diseases in brain images. Combining augmented reality functions with DICOM files and the Unity tool, a program for three-dimensional reconstruction and visualization of brain image features is designed to help doctors grasp the spatial information of the brain quickly and intuitively.

Video Processing and Applications

A CAVE-based panoramic video playing method with 360° viewpoint

Ao Li, Long Ye

Show abstract

CAVE is a large virtual reality display system. This paper focuses on a CAVE-based panoramic video playing method that plays panoramic video in any stream format in the CAVE system with a free viewpoint. Our framework includes processing of the video data captured by a VR video capture device, construction of the CAVE model, viewpoint switching, and viewpoint interaction with the video, resulting in a smooth and efficient panoramic video playing architecture.

Research on the application of modern information technology in teaching based on the network multimedia environment

Xiaofei Peng

Show abstract

With the rapid development of the information industry, network multimedia technology has been widely used in College English teaching, which has greatly promoted its development and achieved remarkable results. However, after many years of use, multimedia teaching has also shown many problems. Some disequilibria have gradually been exposed in the implementation of the network multimedia teaching mode in colleges and universities. From the perspective of educational ecology, this paper explores the underlying causes of these disequilibria, puts forward suggestions and countermeasures for resolving them, and explores how to optimize the overall ecology of College English teaching by coordinating the relationships among its ecological factors, so as to put the imbalanced teaching system back on a sound track of development.

Surveillance of abnormal behavior in elevators based on edge computing

Yan Qi, Ping Lou, Junwei Yan, et al.

Show abstract

Edge computing is an extension of the cloud computing paradigm that shifts part of the computing data, applications and services from the cloud server to the network edge, providing low-latency, mobility and location-aware support for delay-sensitive applications. Elevators in high-rise buildings are geographically distributed and mobile, and their safety and reliability have attracted public attention. Security inside the elevator is a key issue, especially in emergencies that require fast response and low latency. In this paper, a video surveillance system for abnormal behavior in elevators is designed and developed using the edge computing paradigm. The system recognizes abnormal image sequences and evaluates abnormal behavior. Video images are collected, processed and analyzed at the network edge in real time. The edge computing nodes are distributed and deployed according to the geographic location of the elevators. The edge nodes are built on mobile embedded devices and use the computing resources of those devices to perform edge computing. Across the edge network, several clusters of edge nodes are built to perform distributed computation tasks.

Video action recognition based on improved 3D convolutional network and sparse representation classification

Wang Liu, Qi Fu, Yuqiu Lu, et al.

Show abstract

Because typical convolutional neural networks fail to model actions over their full temporal extent, this study proposes a novel video action recognition algorithm based on an improved 3D Convolutional Network (iC3D) architecture with K-means keyframe extraction and sparse representation classification (SRC). During feature extraction, constrained K-means keyframe extraction reduces the redundant information generated by consecutive video frames and enlarges the temporal receptive field. Meanwhile, to improve noise immunity, sparse coding and its reconstruction errors are used for classification. The proposed method achieves 96.5% recognition accuracy on the widely used video action classification dataset UCF101, outperforming other competing methods. In addition, we built an in-the-wild test dataset to verify the generalization performance of the proposed model.

A fast blink-control system based on FPGA

Siwen Cai, Peng Wang, Qiang Zhang, et al.

Show abstract

People have long pursued fast and convenient methods of controlling machines. However, existing human-computer interaction methods still have many shortcomings. We present a fast blink-control system based on FPGA. Video image processing techniques were used to realize the blink-control algorithm, and the Verilog hardware description language (Verilog HDL) was used to develop functional modules on the FPGA. Eye localization and blink recognition were implemented with an integral projection method based on a gradient operator, according to the features of the eyes. Subsequent operations were then performed based on the detection results. The designed system was verified on an ALTER FPGA development platform. The results showed that the accuracy of eye blink detection (frontal face, no inclination) was 98% and the average response time of the system was about 413 ms, which reached the level of real-time detection. The system therefore achieves the goal of providing a new control mode that is fast and effective.
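The idea of integral projection on a gradient image can be sketched in software. This is a simplified illustration, not the authors' Verilog design: it assumes a horizontal central-difference gradient and takes the projection peaks as the eye position, and all function names are hypothetical.

```python
def gradient_magnitude(image):
    """Horizontal central difference |I(r, c+1) - I(r, c-1)|; border columns stay 0."""
    rows, cols = len(image), len(image[0])
    grad = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(1, cols - 1):
            grad[r][c] = abs(image[r][c + 1] - image[r][c - 1])
    return grad

def integral_projections(grad):
    """Sum the gradient image along each row and along each column."""
    row_proj = [sum(row) for row in grad]
    col_proj = [sum(col) for col in zip(*grad)]
    return row_proj, col_proj

def locate_peak(projection):
    """Index of the first maximum of a projection."""
    return max(range(len(projection)), key=projection.__getitem__)

if __name__ == "__main__":
    face = [[100] * 6 for _ in range(5)]
    face[2] = [100, 100, 20, 20, 100, 100]  # a dark, high-contrast band
    rows, cols = integral_projections(gradient_magnitude(face))
    print(locate_peak(rows), locate_peak(cols))
```

Projections reduce the 2D search to two 1D peak searches built from running sums, which is one reason the approach maps well onto streaming hardware.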

Non-uniform correction of infrared image based on adaptive forgetting factor recursive least square method

Chenlong Guo, Xining Liang, Yixuan Liu

Show abstract

The traditional neural network method faces the problem of gradient step size selection when performing non-uniformity correction of infrared images. A large step size easily causes divergence, while a small step size makes convergence slow. The algorithm is also prone to ghosting and image blurring. To address this problem, this paper proposes an infrared image non-uniformity correction method based on recursive least squares with an adaptive forgetting factor. First, the least squares method is rewritten in incremental form and applied to the calculation of the offset and gain of the non-uniformity correction, so that the correction can be trained frame by frame. Second, because the background of previous frames is learned and produces ghosts when the image changes suddenly after a long static period, a forgetting factor is introduced, and it is calculated from the local structural similarity index (SSIM). Experimental results show that the iterative step size of the proposed method is calculated adaptively, without manual adjustment, and that the method effectively overcomes the ghosting problem. Compared with the traditional neural network method and the temporal high-pass filtering method, the proposed algorithm performs best.
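The incremental least-squares update for one pixel's gain and offset can be sketched as a standard recursive least squares (RLS) step with a forgetting factor. This is a textbook formulation under stated assumptions, not the authors' exact algorithm: the forgetting factor `lam` is passed in as a plain number, whereas the paper derives it from local SSIM, and the target value `d` is taken as given.

```python
def rls_update(theta, P, x, d, lam=0.98):
    """One RLS step for the model d ~ gain * x + offset.

    theta: (gain, offset) estimate; P: 2x2 inverse-correlation matrix;
    x: raw pixel value; d: desired corrected value; lam: forgetting factor.
    """
    phi = (x, 1.0)                                   # regressor
    p_phi = (P[0][0] * phi[0] + P[0][1] * phi[1],
             P[1][0] * phi[0] + P[1][1] * phi[1])    # P @ phi
    denom = lam + phi[0] * p_phi[0] + phi[1] * p_phi[1]
    k = (p_phi[0] / denom, p_phi[1] / denom)         # gain vector
    err = d - (theta[0] * phi[0] + theta[1] * phi[1])
    theta = (theta[0] + k[0] * err, theta[1] + k[1] * err)
    # P <- (P - k (P phi)^T) / lam, valid because P is symmetric
    P = [[(P[i][j] - k[i] * p_phi[j]) / lam for j in range(2)] for i in range(2)]
    return theta, P

if __name__ == "__main__":
    theta, P = (1.0, 0.0), [[1e4, 0.0], [0.0, 1e4]]
    for x in range(20):                              # samples of d = 2x + 3
        theta, P = rls_update(theta, P, float(x), 2.0 * x + 3.0)
    print(theta)
```

A forgetting factor below 1 discounts older frames, so lowering it when the scene changes lets the estimate drop a stale background faster, which is the ghosting problem the abstract describes.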

SAR target recognition based on Gabor filter and convolutional neural network

Chenlong Guo, YuXuan Han, HuiYing Zhang

Show abstract

In this paper, a synthetic aperture radar (SAR) target recognition method based on a Gabor filter and a convolutional neural network is proposed. An ordinary convolutional neural network obtains its connection weights through self-learning, but these weights have no clear meaning, and the network often requires more convolution kernels and more time to complete the self-learning of features. Because of its local amplification property, the Gabor filter is used here as a fixed connection weight for the first convolution kernel of the convolutional neural network. A convolutional neural network consisting of 7 convolutional layers and 2 fully connected layers is then constructed and used for SAR target recognition. Experimental results show that adding the Gabor filter as a fixed first convolutional layer greatly improves the convergence rate of the convolutional neural network, and that recognition performance is better than that of an ordinary convolutional neural network.

Optimizing the function of urban intelligent transportation system

Qiujie Liu, Chengfu Yang

Show abstract

With the development of the urban economy and the growth in car ownership, traffic congestion has become a stubborn problem in urban governance. Existing road traffic and information systems are under severe strain, and building intelligent transportation has become an urgent task for major cities. This study focuses on optimizing the function of the urban intelligent transportation system. Taking both the road network and the user into account, it proposes the design of a function module for the "recommended maximum flow path in the minimum time in real time", explains the design concept, and realizes intelligent decision support for querying users. The study then analyzes the mechanism by which the proposed module works: a conceptual model of minimum-time maximum flow is established, the algorithm steps for finding the minimum-time maximum flow are explored, and a simple case is analyzed. This study provides a useful reference for the construction and improvement of urban intelligent transportation systems.

Video-based traffic sign detection and recognition

Qiuyu Zhang, Yongliang Shen, Zhang Yi

Show abstract

For driverless cars, the detection and recognition of traffic signs is critical to understanding the environment around the vehicle. Pictures taken during driving contain many small traffic signs, which are difficult to detect with existing object detection methods. In view of the real-time and accuracy requirements of traffic sign recognition for driverless cars, this paper improves on the VGG-16 convolutional neural network structure and proposes the VGG-8 model. The network is optimized with SGD and Nesterov momentum, and is trained and tested on the GTSRB traffic sign dataset. By comparing a small-scale four-layer convolutional neural network, VGG-6, with the 13-layer convolutional neural network VGG-16, a six-layer convolutional neural network, VGG-8, is proposed. In tests on ten traffic signs in video, VGG-8 achieves higher accuracy and running speed, and shows a degree of feasibility for practical applications.

Computer Vision and AI

Design and implementation of encrypted file system based on FPGA

Xiaoyuan Wang

Show abstract

The Data Encryption Standard (DES) algorithm is widely used in encryption design. This paper uses DES to encrypt and decrypt data files on an FPGA. The author analyzes the DES algorithm and its workflow in depth, including IP substitution, S-box substitution, P-box substitution and key substitution. Based on this analysis of the core techniques of DES, the algorithm is used to encrypt and decrypt data files. The encryption and decryption processes of DES are essentially the same, apart from a few details. For example, when generating the round keys, the key is cyclically shifted to the left for encryption but to the right for decryption. Because the encrypting and decrypting parties must use the same key, DES is a symmetric algorithm.
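The left-shift versus right-shift remark about key generation can be checked with a small sketch of the DES key schedule's rotation step. It tracks only one 28-bit key half and omits the PC-1 and PC-2 permutations, so it is an illustration of the rotation logic rather than a DES implementation. The per-round shift amounts are the standard DES schedule; the function names are assumptions.

```python
# Standard DES per-round rotation amounts for the 28-bit key halves.
LEFT_SHIFTS = [1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1]
# Decryption walks the same states backwards using right rotations.
RIGHT_SHIFTS = [0, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1]
MASK28 = (1 << 28) - 1

def rotl28(value, n):
    """Rotate a 28-bit value left by n bits."""
    return ((value << n) | (value >> (28 - n))) & MASK28

def rotr28(value, n):
    """Rotate a 28-bit value right by n bits."""
    return ((value >> n) | (value << (28 - n))) & MASK28

def encryption_states(half):
    """Key-half register contents for encryption rounds 1..16."""
    states = []
    for shift in LEFT_SHIFTS:
        half = rotl28(half, shift)
        states.append(half)
    return states

def decryption_states(half):
    """Key-half register contents for decryption rounds 1..16."""
    states = []
    for shift in RIGHT_SHIFTS:
        half = rotr28(half, shift)
        states.append(half)
    return states

if __name__ == "__main__":
    c0 = 0x0F0CCAA  # arbitrary 28-bit example value
    print(decryption_states(c0) == encryption_states(c0)[::-1])
```

Because the left shifts sum to 28, the register returns to its starting value after round 16, so decryption can start from the same loaded key and simply rotate in the opposite direction to obtain the round keys in reverse order.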

Sheep delivery scene detection based on faster-RCNN

Shiwen Sun, Junping Qin, Hongcheng Xue

Show abstract

Sheep delivery scene detection is an important application of object detection technology in the field of animal delivery detection. There are reports, both domestic and international, on detecting delivery scenes for pigs and dairy cows, but research on sheep delivery behavior is still in its infancy. This paper applies the Faster-RCNN model to the detection of ewes and newborn lambs in sheep delivery scenes. Using a self-made sheep delivery scene dataset, Faster-RCNN models based on the ZF and VGG16 feature extraction networks and on the Soft-NMS algorithm were trained, and the experimental results were compared. The comparison shows that the Faster-RCNN model based on the Soft-NMS algorithm and the VGG16 feature extraction network performs better in sheep delivery scene detection. The method can effectively detect ewes and newborn lambs in sheep delivery scenes, extends the range of applications of artificial intelligence in animal husbandry, and has practical value for promoting the development of smart animal husbandry.

A novel robot teaching system based on augmented reality

Zebiao Guan, Yi Liu, Yuqing Li, et al.

Show abstract

This paper presents a novel robot teaching system based on augmented reality that addresses problems such as the inefficiency of off-line programming and low interactivity. Combining a motion capture system, a mixed reality head-mounted display (Microsoft HoloLens) and a 6-DOF robot arm, we design a robot teaching system with three main functions: 1) trajectory programming, 2) visualization of trajectory information, and 3) virtual previews of robot motion. First, we perform coordinate transformations among the HoloLens, the motion capture system and the robot arm. Then, we superimpose the virtual robot arm on the real industrial environment using virtual reality technology. Next, we use the motion capture system to capture the three-dimensional position data of a handheld demonstrator and save these data to the registers of the real robot arm for trajectory programming. Finally, we transfer the data to the HoloLens to visualize the trajectory information and preview the robot motion virtually.

Detection and fine 3D pose estimation of texture-less objects

Jian Peng, Yingbo Zhang, Shaojun Zhou

Show abstract

This paper presents a method for real-time detection and accurate pose estimation of objects. We start from LINEMOD, proposed by Hinterstoisser et al., which shows typical problems such as a lack of robustness to clutter and occlusion. We propose a method called Block-Matching. First, multi-angle templates are extracted for each object. Each template is then cut into many small blocks, and each block is treated as a template to match against the object in the scene. If the matching score of a small block in a template exceeds a certain threshold, the desired object is taken to be present in the scene. The ICP algorithm is the most common choice for fine 3D pose estimation, but ICP can easily get stuck in local minima and its performance depends heavily on the quality of the initialization. To avoid these defects, particle swarm optimization is employed to refine the 6D pose of the target object. Extensive experimental results demonstrate the superior performance of the approach compared with the state of the art.

Joining geometric and RGB features for RGB-D semantic segmentation

Shaopeng Zhang, Ming Zhong, Gang Zeng, et al.

Show abstract

A depth map is a regular format for geometric data. Some approaches have harnessed point clouds derived from the depth channel to extract 3D features and have demonstrated superiority over traditional 2.5D representation approaches. However, adding 3D features to the pixels of RGB frames in order to incorporate geometric information is a challenging task. In this paper, we propose a simple and general framework that combines the geometric information of depth maps with the RGB information of color maps for semantic segmentation. Specifically, we first extract geometric features from a point cloud derived from the depth map, and then extract RGB features from the color map. Because of the regular format of the depth map, the geometric features can be easily mapped to the corresponding pixels of the RGB features. We then combine the RGB and geometric features with our Element-Max-Min-Fuse function. Additionally, an efficient decoder module is designed to refine the segmentation results. We demonstrate the effectiveness of the proposed model on the S3DIS dataset. The experimental results show that our method improves on the results obtained with RGB images or point clouds alone.

XGAN: adversarial attacks with GAN

Xiaoyu Fang, Guoxu Cao, Huapeng Song, et al.

Show abstract

Recent studies have demonstrated that deep neural networks can be attacked by adding small pixel-level perturbations to the input data. In general, such perturbations are imperceptible to the human eye, but they can completely subvert the output of a deep neural network classifier to achieve untargeted or targeted attacks. The current common practice is to generate a perturbation for the neural network and superimpose it on the original image. In this paper, we attack deep neural networks by using a GAN to generate the target images directly. This method achieves excellent results in black-box attacks and is also compatible with the preconditions of most neural network attacks. Using this method, we achieved an 82% success rate in black-box targeted attacks on the CIFAR-10 and MNIST datasets, while ensuring that the generated image is comparable to the original image.

Target tracking based on hierarchical feature fusion of residual neural network

Hui Jin, XinYang Li

Show abstract

Feature representation is a crucial part of the target tracking process. Hand-crafted features are relatively simple and fast enough for real-time use, but their representation ability is insufficient, and they are prone to drift under rapid change and target occlusion. Given the strong representation ability of deep neural network features in target detection and recognition tasks, deep neural networks are increasingly used as feature extraction tools, but how to use and integrate these features is still worth studying. In this paper, the Residual Neural Network (ResNet) is the main object of study, and the influence of each layer on target tracking performance is analyzed in detail. A feature fusion strategy that uses the convolutional layers and the addition layers is then determined. We train a separate classifier for each of these layers, and then search the multi-layer response maps to infer the target location in a coarse-to-fine fashion. The algorithm is verified on the OTB-50 dataset. The one-pass evaluation (OPE) value reaches 0.612, which is better than that of algorithms of the same type.

Action recognition based on virtual simulation for prosthetic vision

Dantong Xu, Ying Zhao, Aiping Yu, et al.

Show abstract

At present, even state-of-the-art retinal prostheses are limited by low resolution and by their image processing algorithms. To find suitable parameters for prosthetic vision and to help visual prosthesis wearers interpret low-resolution gray-scale images correctly and effectively, action recognition experiments based on virtual simulation were performed in this paper. Animated videos of a skeletonized figure walking upright were created in 3ds Max, and the action animation clips were then pixelized in MATLAB. The actions were classified into three categories: combined actions, simple actions and difficult actions. All action clips were produced at 6 resolutions (6×16, 24×24, 32×32, 48×48, 64×64, 128×128). Twenty observers (classified by gender and experience) were recruited to participate in the test and were asked to recognize actions at the different resolutions. The results showed no significant regular pattern of gender difference in recognition accuracy. As for experience, the recognition accuracy of experienced observers was higher than that of inexperienced ones. It was concluded that learning experience can improve recognition accuracy, that experienced observers need a lower resolution to recognize an action, and that 48×48 is a suitable resolution with considerable potential.

MFM Net: modify feature map for object detection

Jinhui Qin, Weiqi Jin, Su Qiu, et al.

Show abstract

Object detection, an important task in computer vision, is widely used in face recognition and autonomous driving. Based on VGG16, a fast and simple backbone compared with deeper networks, this paper proposes a new block, named the Modify Feature Map (MFM) Block, to improve feature maps. It is motivated by two facts: different channels in a feature map represent different features of an image, and every position in a feature map belongs to either an object or the background. We build MFM Net to predict location and class. Experiments on Pascal VOC 2007 and MS COCO show that MFM Net achieves high performance at real-time speed.

Receptive field enrichment network for pedestrian detection

Pengfei Luo, Zengfu Wang

Show abstract

Current advanced pedestrian detection methods adopt feature maps with different resolutions to cover multi-scale pedestrians. Although a multi-scale feature pyramid can alleviate the problems caused by scale variation, each layer used for detection has only a fixed receptive field, which causes failures on pedestrians with a wide range of scales and aspect ratios. In this paper, we propose the Receptive Field Enrichment Network (RFENet), an end-to-end framework for fast and accurate pedestrian detection. Two blocks are introduced in this framework: a receptive field enrichment module (RFEM) and a hierarchy aggregation module (HAM). The former is designed to diversify the receptive fields of features so as to adapt better to pedestrians with different scales and aspect ratios. The latter is applied to enhance the entire feature hierarchy by merging spatial information and high-level semantics from different layers simultaneously. To evaluate the effectiveness of our method, extensive experiments are conducted on the CityPersons and Caltech datasets. The results show that the proposed RFENet achieves performance comparable with state-of-the-art methods.

Self-repairing object tracking method by adopting multi-level features

Mengjie Hu, Ying Xiong, Xiaoyang Li

Show abstract

Visual object tracking is a fundamental problem in the computer vision community and has been studied for decades. Without additional information, trackers are prone to drift over time. In this paper, we propose a self-repairing online object tracking algorithm based on features at different levels. Fine-grained low-level features are used to locate the specific object in each frame, and coarse-grained high-level features are used to describe the category-level representation. We design a tracking kernel updating mechanism based on the category-level description to correct online tracking drift. We tested the proposed algorithm on the OTB-50 dataset and compared it with several popular real-time online tracking algorithms. Experimental results demonstrate the effectiveness of the proposed method.

Applying estimation models to accelerate genetic algorithms for charging scheduling problems in wireless rechargeable sensor networks

Jingjing Chen, Hong wei Chen, Wu Yu Chang

In this paper, we design a genetic algorithm based on a new charging path estimation model to obtain an efficient charging schedule in a wireless rechargeable sensor network. Specifically, we first propose a charging path estimation model, through which the expected cost of a scheduled charging path can be obtained. Based on this model, a genetic algorithm that supports charging scheduling for wireless charging vehicles is devised, with a traditional design of chromosome structure, selection, crossover and mutation operations. We finally evaluate the performance of the proposed algorithm by extensive simulations. Simulation results show that the proposed algorithm is promising and can improve the performance of wireless rechargeable sensor networks.
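
As a rough illustration of the approach described above, the following sketch runs a permutation-based genetic algorithm whose fitness is an estimated path cost. The abstract does not give the estimation model, so `estimate_path_cost` (plain Euclidean tour length from a depot at the origin), the order crossover, the swap mutation, the tournament selection, and all parameter values are assumptions made for illustration only.

```python
import math
import random


def estimate_path_cost(order, nodes, depot=(0.0, 0.0)):
    """Hypothetical cost model: length of depot -> nodes in `order` -> depot."""
    points = [depot] + [nodes[i] for i in order] + [depot]
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def order_crossover(p1, p2, rng):
    """Copy a random slice from p1, then fill the gaps in the order used by p2."""
    n = len(p1)
    i, j = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[i:j + 1] = p1[i:j + 1]
    kept = set(child[i:j + 1])
    fill = [g for g in p2 if g not in kept]
    for k in range(n):
        if child[k] is None:
            child[k] = fill.pop(0)
    return child


def swap_mutation(order, rng, rate=0.2):
    """With probability `rate`, swap two positions of the visiting order."""
    order = order[:]
    if rng.random() < rate:
        a, b = rng.sample(range(len(order)), 2)
        order[a], order[b] = order[b], order[a]
    return order


def tournament(population, costs, rng, k=3):
    """Return the cheapest of k randomly drawn individuals."""
    picks = rng.sample(range(len(population)), k)
    return population[min(picks, key=lambda idx: costs[idx])]


def schedule_charging(nodes, pop_size=40, generations=100, seed=0):
    """Search for a low-cost visiting order of the sensor nodes."""
    rng = random.Random(seed)
    n = len(nodes)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        costs = [estimate_path_cost(ind, nodes) for ind in population]
        best = population[min(range(pop_size), key=lambda idx: costs[idx])]
        next_gen = [best[:]]  # elitism: the best order always survives
        while len(next_gen) < pop_size:
            p1 = tournament(population, costs, rng)
            p2 = tournament(population, costs, rng)
            next_gen.append(swap_mutation(order_crossover(p1, p2, rng), rng))
        population = next_gen
    return min(population, key=lambda ind: estimate_path_cost(ind, nodes))
```

Calling `schedule_charging` with a list of `(x, y)` node positions returns a visiting order as a list of node indices. Because the cost model is the only problem-specific part, a more realistic estimator could be swapped in without touching the genetic operators.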

Efficient training and inference in highly temporal activity recognition

Masoud Charkhabi, Nivedita Rahurkar

High-performance activity recognition models from video data are difficult to train and deploy efficiently. We measure efficiency in performance, model size, and run-time, during both training and inference. Researchers have demonstrated that 3D convolutions capture space-time dynamics well [13]. The challenge is that 3D convolutions are computationally intensive. [8] proposes the Temporal Shift Module (TSM) for training efficiency, and [5] proposes DeepCompression for inference efficiency. TSM is a simple yet effective way to gain near 3D convolution performance at 2D convolution computation cost. We apply these efficiency techniques to a newly labeled activity recognition data set through transfer learning. Our labeling strategy is meant to create highly temporal activity. We benchmark against a 2D ResNet50 backbone trained on individual frames, and a multilayer 3DCNN on multi-frame short videos. Our contributions are: 1. A new highly temporal activity recognition dataset based on egoHands [1]. 2. Results showing that a 3D backbone on videos outperforms a 2D one on frames. 3. With TSM we achieve 5x training efficiency in run-time with negligible performance loss. 4. With quantization alone we achieve 10x inference efficiency in model size with negligible performance loss.
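
The temporal shift idea mentioned above can be sketched in a few lines: part of the channels of each frame's feature vector is exchanged with the neighbouring frames, so that a later 2D convolution sees some temporal context at no extra multiply cost. The function name `temporal_shift`, the nested-list layout, the `fold_div` parameter, and the zero padding at the sequence ends are assumptions for this illustration, not details taken from the abstract.

```python
def temporal_shift(frames, fold_div=8):
    """Shift a fraction of channels along the time axis.

    frames: list of T per-frame feature vectors, each a list of C numbers.
    The first C // fold_div channels take their value from the next frame,
    the following C // fold_div channels from the previous frame, and the
    remaining channels stay in place. Missing neighbours are zero padded.
    """
    T = len(frames)
    C = len(frames[0])
    fold = C // fold_div
    out = [[0.0] * C for _ in range(T)]
    for t in range(T):
        for c in range(C):
            if c < fold:
                src = t + 1      # value comes from the following frame
            elif c < 2 * fold:
                src = t - 1      # value comes from the preceding frame
            else:
                src = t          # channel is left untouched
            if 0 <= src < T:
                out[t][c] = frames[src][c]
    return out
```

The shift itself involves no multiplications, which is consistent with the abstract's point that the temporal mixing comes at roughly 2D convolution cost.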

Machine Learning and Artificial Neural Networks

Tai chi action recognition based on structural LSTM with attention module

Lingxiao Dong, Dongmei Li, Shaobin Li, et al.

Tai chi is a traditional Chinese sport that is popular all over the world. It is expressed by slow, soft, and continuously flowing moves. At present, studies on action recognition are generally aimed at common actions, such as walking and jumping. These algorithms are not very suitable for Tai chi recognition because Tai chi actions have unique characteristics. Through careful analysis of Tai chi moves and research on the existing Tai chi dataset, we propose and build a Tai chi dataset, named Sub-Tai chi. This dataset is based on joints and skeleton, and consists of 15 representative basic actions of different body parts. For Tai chi action recognition, we use Structural LSTM with Attention Module, which is an action recognition method based on neural networks. We use an RNN to capture action features and a fully connected layer to classify actions. In this paper, we introduce velocity features and acceleration features to improve Tai chi action recognition. Experimental results show that the method proposed in this paper achieves an accuracy of about 79%, which is nearly 7% higher than the original algorithm.

Research on intelligent internet financial investment model

Hualing Liu, Saijun Zhou, Wanmeng Yang

Currently there is a growing concern over the issue of peer-to-peer (P2P) lending. A key challenge for personal investors in P2P lending marketplaces is how to accurately identify the subject of loan funds and how to effectively evaluate the profit and risk of the subject in the context of lending success. In this paper, we use the nuclear regression model to evaluate the probability of successful lending, to provide an efficient frontier for investors, and to give the optimal combination of recommended bids for lenders under different risk preferences. Finally, we verify the scheme with data from Paipai Lending, the largest P2P network lending website in China. Experimental results reveal that the scheme can effectively provide investors with more investment options.

Pattern recognition by pattern inversion

Alexei Mikhailov, Mikhail Karavay

An alternative approach to pattern recognition is discussed that amounts to operations on inverse patterns and resembles the working of Google-type search engines. Unlike neural networks, which iteratively calculate weights over many learning cycles, the inverse patterns-based paradigm (neural cortex) does not use weights and follows a challenging learning trend that attempts to achieve human-like generalization from a single example.

A research on generative adversarial network algorithm based on GPU parallel acceleration

Haibo Chen, Tao Jia, Jing Tang

In recent years, the Generative Adversarial Network (GAN) has received much attention in the field of machine learning. It is an unsupervised learning model that is widely used for image, video, voice, and other data. Based on GAN's two-player zero-sum game theory, researchers have proposed excellent variant algorithms such as Deep Convolutional GAN (DCGAN), Conditional GAN (CGAN), Least Squares GAN (LSGAN), and Boundary Equilibrium GAN (BEGAN), which have gradually overcome the problems of training imbalance and model collapse. However, the time efficiency of model training has always been a challenging problem. This paper proposes a GAN algorithm based on GPU parallel acceleration, which utilizes the computing power of the GPU and the advantages of multi-parallel computing, greatly reduces the time of model training, improves the training efficiency of the GAN model, and achieves better modeling performance. Finally, we use the LSUN public scene dataset and the TIMIT public voice dataset to evaluate the proposed algorithm and compare it with the traditional GAN, DCGAN, LSGAN, and BEGAN algorithms. The experiments demonstrate the training-time advantage of the algorithm introduced in this paper.

The classification of facial expressions with multi-cultural backgrounds: an event-related fMRI study

Sutao Song, Chunyu Liu, Lijie Huang, et al.

Expression recognition is important for our social interaction and communication, but the role of face-selective regions in discriminating various facial expressions remains unclear, especially when the expressions come from multi-cultural backgrounds. In this study, 800 facial expressions collected from 5 different facial expression databases with western or eastern cultural backgrounds were shown to the subjects in a slow event-related fMRI experiment. The subjects were instructed to indicate the category of facial expressions (happy, disgust, angry or neutral) by pressing different buttons. One multivariate pattern analysis method, the support vector machine, was trained to predict the categories of facial expressions. Results showed that: (1) the face-selective regions differed in their ability for expression decoding, but a similar pattern was observed, with a predominance in classifying facial expressions with opposite valence, i.e. happy vs. fear and happy vs. disgust. Besides, angry vs. disgust and happy vs. neutral achieved the lowest results. (2) the accuracies of cross-database facial expression classification were as high as the accuracy of generalization across runs within-database. These results provide evidence for the consistency of the representation of facial expressions in the human brain across different cultural backgrounds.

Prognostic recurrence analysis method for non-small cell lung cancer based on CT imaging

Xu Wang, Hui-hong Duan, Sheng-dong Nie

In order to assist doctors in planning postoperative treatment and re-examination of patients with non-small cell lung cancer, this study proposed a prognostic recurrence analysis method for non-small cell lung cancer based on CT imaging features, aiming to use multiple CT image features to predict the prognosis recurrence of non-small cell lung cancer. Firstly, the lung tumor area was segmented and features were extracted. Secondly, the extracted feature data was optimized for removing redundant features. Then, the optimized feature data and the patient's prognosis were taken as input, the data was trained using a machine learning method, and a predictive analysis model was constructed to predict the prognosis of the non-small cell patient. Finally, experiments were designed to verify the performance of the prognostic recurrence analysis model. A total of 157 patients with non-small cell lung cancer were enrolled in the study. The experimental results showed that the predictive accuracy of the prognostic recurrence model of random forest classifier based on CT imagery grayscale, shape and texture is as high as 84.7%, which can effectively assist doctors to make more accurate prognosis for patients with non-small cell lung cancer. This model can help doctors choose treatment and review methods to prolong the patient's survival.

Task scheduling algorithm for non-unit time testing

Zhongming Yang, Wei Li, Wenquan Zeng

Because test tasks differ in length, multi-thread test task scheduling for automatic software testing is a non-unit-time task scheduling procedure. This paper studies a unit-time task scheduling algorithm with time limits and penalties from the point of view of task deadlines, and designs a scheduling algorithm that can handle non-unit-time tasks and complete the testing tasks within the limited time range. The optimal scheduling design of this algorithm has been verified by experiments.

Community conflict prediction method based on spliced BiLSTM

Si Chen, Xiaodong Cai, Bo Li, et al.

Existing community conflict prediction models usually use a single unidirectional LSTM network to process graph and word embeddings simultaneously. However, there is no temporal coherence between graph and word embeddings, and their importance for prediction is different. A community conflict prediction method based on spliced bidirectional LSTM is proposed. Firstly, two bidirectional LSTMs are utilized to process graph and word embeddings respectively to break the temporal dependency. Secondly, the hidden states of the two bidirectional LSTMs are weighted. Finally, the weighted hidden states are spliced and fed into subsequent layers of the neural network to predict conflicts. Experimental results show that this method can improve the AUC value to 0.733 on the Reddit dataset, and reduce the number of training iterations.

Research on path optimization of reverse logistics network

Tao Fan, Ying Sun

In the era of "Internet +", when using genetic algorithm to explore the recovery path optimization problem of reverse logistics network, it is found that the transportation time and vehicle arrangement have an important impact on the problem while requiring the shortest transportation path. In practical applications, when applying the Travelling Salesman Problem (TSP) method to solve the location and path planning problems of reverse logistics network, it is necessary to redesign the distance matrix of the method and the fitness of the solution algorithm. This paper designs a set of reverse logistics path planning model considering path, environmental impact and resource utilization.

Entity relationship extraction optimization based on entity recognition

Yanru Zhong, Zhaorong He, Leixian Zhao, et al.

Chinese entity relationships are usually stored in the form of triples, which are typically extracted using syntactic dependency and semantic role labeling. Entities extracted in this way may be greatly influenced by noise. This paper uses a neural network to optimize the recognition of Chinese entity triples: initial triples are extracted first, and a neural network is then applied to the initial triples to identify the entities. Our experiments found that this method not only removes entity noise well, but can also be controlled through the neural network, so that only triples that parse as expected are returned.

Semantic-constraint graph dual non-negative matrix factorization in text co-clustering

Yu Liu, Jiaxun Hua, Youguang Chen

Co-clustering, an extension of one-sided clustering, refers to the process of clustering data points and features simultaneously. In text clustering tasks, traditional one-sided clustering algorithms have difficulty dealing with the sparsity problem. Instead, a co-clustering procedure, where the data is commonly organized as a big matrix aggregated from data points, has proved more useful when faced with sparsity. Based on the traditional co-clustering approaches, a new model named SC-DNMF, which takes into account the semantic constraints between words, is proposed in this paper. Experiments on several datasets indicate that our proposal improves clustering accuracy over traditional co-clustering models.

Bagging deep autoencoders with dynamic threshold for semi-supervised anomaly detection

Bingjun Guo, Lei Song, Taisheng Zheng, et al.

In the field of anomaly detection, anomalies are usually very rare compared with normal samples, which is not conducive to the construction of an anomaly detection model. In this paper, we propose a semi-supervised anomaly detection algorithm based on deep autoencoders. With this algorithm, only normal samples are needed to train the anomaly detection model. To improve the robustness of the algorithm, the Bagging ensemble method is used to train and combine multiple deep autoencoders. In the process of Bagging, a dynamic threshold for anomaly detection is applied to increase the diversity of the individual autoencoders. Compared with other semi-supervised methods including one-class SVM, SOM and K-Means, our proposed method shows clearly superior anomaly detection performance.

A research on cell inconsistency prediction of power battery using Gaussian process regression

Liu Ling, Song Chao, Yong-bo Xie, et al.

Cell inconsistency affects battery life and driving safety. To improve the accuracy of online prediction of cell inconsistency in power batteries, battery characteristics are analyzed based on vehicle network big data, and a health indicator (HI) based on the cell terminal voltage difference is proposed through the degradation model. Because the distribution of the cell terminal voltage difference is similar across battery discharge conditions, a health indicator sequence based on SOC (State of Charge) is constructed, and the next health indicator is predicted by Gaussian process regression. The prediction results show that the method requires fewer training samples and fewer hardware resources, and the overall prediction accuracy is not less than 85%, which can meet practical requirements.
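
To make the prediction step more concrete, the following minimal sketch computes the posterior mean of a zero-mean Gaussian process with an RBF kernel for one-dimensional inputs. The kernel choice, the `length` and `noise` hyperparameters, and the helper names `rbf`, `solve` and `gp_predict` are assumptions for illustration; the abstract does not specify how the regression is configured.

```python
import math


def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def gp_predict(xs, ys, x_new, length=1.0, noise=1e-6):
    """Posterior mean at x_new: k(x_new, X) (K + noise * I)^-1 y."""
    K = [[rbf(a, b, length) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    alpha = solve(K, ys)
    return sum(rbf(x_new, a, length) * w for a, w in zip(xs, alpha))
```

In the setting of the abstract, `xs` would hold the SOC positions of the observed health indicators and `ys` the indicator values, with `x_new` the next SOC position. With a zero-mean prior the prediction decays toward zero far from the observed inputs, so in practice the indicator values would typically be centred first.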

A face reconstruction method based on fusion regression network and gradient descent

Menglin Zhao, Ming Yan

Three-dimensional (3D) face reconstruction refers to the restoration and reconstruction of a 3D model of a face from one or more two-dimensional (2D) images. It has been widely used in face recognition, expression migration, face editing and other applications. Existing algorithms still have many shortcomings in reconstructing a 3D face with a parametric model in real time. In this paper, based on the convolutional neural network, we integrate a weight mask into the loss function, and then use the back propagation algorithm to calculate the parameter gradient error. Finally, the parameters are updated by gradient descent on the loss function. The experimental results show that this method can accurately reconstruct the 3D contour of the face, and that the reconstruction results are complete with a known topological structure. This is very important for applications that follow face reconstruction, such as face changing and expression changing, whose accuracy is greatly improved.

Research on face recognition technology and its application in intelligent security

Haibo Chen, Lei Liu, Jing Tang

With the rapid development of artificial intelligence, face recognition, as an important part of artificial intelligence, has a tremendous impact on people's lives. This paper aims to improve the accuracy and speed of face recognition by using deep learning and cloud computing technology, as well as to provide a face recognition service for intelligent security systems. Firstly, the Convolutional Neural Network (CNN) is introduced, and Large Margin Cosine Loss (LMCL) + Center Loss is proposed as the loss function of the CNN to train the face feature extraction model. Then, combined with cloud computing technology, a parallel model training and face recognition implementation scheme is proposed to solve the problems of large data volume and slow model training faced by face recognition technology. Finally, the face recognition scheme is applied to the intelligent security system of a telecommunications operator, and good results are achieved.

Ensemble learning based multi-source information fusion

Junyi Xu, Le Li, Ming Ji

Aiming at the target recognition tasks of multi-source sensors, this paper proposes a decision-level information fusion model based on ensemble learning to improve the target recognition ability of distributed sensors. Based on distributed sensing data, feature analysis model is first constructed to reduce the dimension of original data. Then, target recognition model is constructed by data mining to realize the rapid identification by single classifier. On this basis, information fusion model based on ensemble learning is proposed to assist decision-making, combined with different ensemble strategies to improve the robustness and reliability of multi-source sensor target recognition. Finally, five public data sets are used to verify the effect of multi-source information fusion model under four homogeneous strategies and two heterogeneous strategies.
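
Decision-level fusion of this kind can be illustrated with two simple combination rules. The sketch below fuses the class labels reported by several single-sensor classifiers by plain majority voting and by weighted voting; the function names and the idea of using per-sensor reliability weights are assumptions for illustration, since the abstract does not name the specific ensemble strategies.

```python
from collections import Counter


def fuse_majority(labels):
    """Return the label predicted by the largest number of sensors."""
    return Counter(labels).most_common(1)[0][0]


def fuse_weighted(labels, weights):
    """Return the label with the largest total weight.

    weights: one non-negative reliability score per sensor (hypothetical,
    for example a validation accuracy of that sensor's classifier).
    """
    scores = {}
    for label, weight in zip(labels, weights):
        scores[label] = scores.get(label, 0.0) + weight
    return max(scores, key=scores.get)
```

The two rules can disagree: a single highly reliable sensor may outvote two weak ones under weighted voting, while majority voting ignores reliability altogether.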

Improved extreme learning machine and its application in SAR target recognition

Jian Chen

In this paper, an improved algorithm for the extreme learning machine is proposed and applied to SAR target recognition. To address the influence of noise and of the spatial distribution of the training samples on the calculation of the classification plane, different penalty factors are given to different training samples, and on this basis the "weighted extreme learning machine" is proposed. The kernel function is then introduced into the "extreme learning machine" to improve its ability to approximate nonlinear functions. Considering that the general training algorithm of the weighted extreme learning machine is slow and consumes a lot of computer memory when the number of training samples is large, a training method based on the conjugate gradient algorithm is proposed. The test on "banana benchmark data" shows that the weighted extreme learning machine based on the conjugate gradient method converges in a number of iterations far smaller than the number of samples, and that the calculation speed is much faster than the traditional algorithm. Finally, the proposed algorithm is applied to SAR target recognition. The test on the MSTAR data set shows that the proposed algorithm is not only extremely fast in SAR target recognition, but also has better recognition performance than the support vector machine, the general extreme learning machine, the BP neural network and other algorithms.
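
The conjugate gradient training step can be illustrated on a generic linear system. The sketch below solves A x = b for a symmetric positive-definite matrix A, which is the kind of system that arises from the normal equations of a regularized weighted least-squares fit; treating the training problem in that form is an assumption here, as the abstract does not state the exact system being solved.

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for a symmetric positive-definite matrix A.

    A is a list of rows, b a list of numbers. The loop stops early once the
    squared residual norm drops below `tol`, which is what allows it to
    finish in far fewer iterations than the system size on easy problems.
    """
    n = len(b)
    max_iter = max_iter or n
    x = [0.0] * n
    r = list(b)          # residual b - A x for the initial guess x = 0
    p = list(r)          # first search direction
    rs_old = sum(v * v for v in r)
    for _ in range(max_iter):
        if rs_old < tol:
            break
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs_old / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(v * v for v in r)
        p = [r[i] + (rs_new / rs_old) * p[i] for i in range(n)]
        rs_old = rs_new
    return x
```

Each iteration only needs one matrix-vector product, so the full matrix never has to be inverted or factorized, which is in line with the memory and speed motivation given in the abstract.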

Scene matching areas classification based on PCANet and MLP

Kai Sun, Liang Pan, Weilin Yuan

Scene matching aided navigation is mainly used in the autonomous navigation of aircraft. In the scene matching field, selecting scene matching areas is a great challenge. Traditional methods focus on extracting image features and building a model to fit the relationship between image features and matching suitability indicators. We propose, for the first time, a new method combining the principal component analysis network (PCANet) and the multi-layer perceptron (MLP) to select scene matching areas. Firstly, we build a dataset based on images captured by the TerraSAR-X satellite. Secondly, we extract information from each image with PCANet and generate labels based on matching probability. Finally, the MLP is used to automatically fit the mapping between image and matching suitability. The proposed method avoids manual feature extraction and improves performance on different tasks. The method proposed in this paper performs better than a convolutional neural network (CNN).

Convolutional neural network with contextualized word embedding for text classification

Gaoyang Fan, Cui Zhu, Wenjun Zhu

Text classification is a fundamental task in natural language processing that has attracted wide attention and is widely applied. However, previous methods mainly use traditional static word embeddings, which cannot deal with the problem of polysemy. For this reason, we propose to use contextualized BERT word embeddings to effectively encode the input sequence and then use a temporal convolutional module, which simply computes a 1-D convolution, to extract high-level features. Finally, a max-pooling layer retains the most critical features for text classification. We conduct experiments on six commonly used large-scale text categorization datasets, including sentiment analysis, problem classification and topic classification tasks. Because BERT is limited in processing long text, we propose an effective truncation method. Experimental results show that our proposed method outperforms previous methods.
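
The convolution and pooling stage described above can be sketched on plain lists. Here `embeddings` stands in for the contextualized token vectors (which would come from BERT in the paper), a single filter slides over the token axis, and max-over-time pooling keeps its strongest response. The single-filter setup, the ReLU activation, and the function names are assumptions for illustration.

```python
def conv1d(embeddings, kernel, bias=0.0):
    """Valid 1-D convolution over the token axis followed by ReLU.

    embeddings: L token vectors, each a list of D numbers.
    kernel: k rows of D weights (one filter spanning k consecutive tokens).
    Returns L - k + 1 activations.
    """
    k = len(kernel)
    activations = []
    for start in range(len(embeddings) - k + 1):
        total = bias
        for offset in range(k):
            token = embeddings[start + offset]
            total += sum(w * x for w, x in zip(kernel[offset], token))
        activations.append(max(0.0, total))
    return activations


def max_over_time(feature_map):
    """Keep only the strongest response of a filter across all positions."""
    return max(feature_map)
```

In a full model, many such filters would be applied in parallel and their pooled outputs concatenated into the feature vector passed to the classifier.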

An experimental verification for the convergence of the BIBCI algorithm

Chunxiao Ren, Yuxiao Wu

The BIBCI algorithm solves the parameter estimation problem for the Choquet integral. The computational complexity for the Choquet integral is reduced from O((n+K) ∗ 2²ⁿ) to K ∗ O(n² log n). In this paper, we give an experimental verification of the convergence of the BIBCI algorithm. The experiments show that the BIBCI algorithm is convergent.
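
For readers unfamiliar with the quantity being estimated, the following sketch evaluates a discrete Choquet integral for a given fuzzy measure. It is not the BIBCI algorithm; it only shows the integral itself, and the representation of the measure as a dictionary keyed by `frozenset` of input indices is an assumption for illustration.

```python
def choquet_integral(values, measure):
    """Discrete Choquet integral of non-negative `values`.

    measure maps frozenset(indices) -> weight of that coalition and is
    assumed to be monotone, with the full index set mapped to 1.0.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    total = 0.0
    previous = 0.0
    for rank, i in enumerate(order):
        # Coalition of all inputs whose value is at least values[i].
        coalition = frozenset(order[rank:])
        total += (values[i] - previous) * measure[coalition]
        previous = values[i]
    return total
```

Since a fuzzy measure on n inputs assigns a value to each of the 2ⁿ subsets, the number of parameters grows exponentially with n, which is why efficient estimation matters. When the measure is additive, the integral reduces to an ordinary weighted sum.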

The visual synaesthesia analysis of Chinese traditional music aesthetics

Yan Gao, Xinyu Ma, Lingyun Xie

With the development of multi-platform and diversified audio-visual multimedia, new requirements and technical problems have been put forward for the automatic integration and matching of audio and visual content. Studies have shown that synaesthesia has a solid physiological and psychological basis. As parts of Chinese traditional culture, Chinese traditional music and Chinese painting are closely connected through cultural background, aesthetic connotation, spatial consciousness and emotional expression. In this paper, the synaesthesia rule between Chinese traditional music aesthetics and the composition elements of Chinese painting was found through a visual synaesthesia experiment and verified by a multi-perception experiment. It can provide support for audio-visual interactive aesthetics, audio-visual intelligent matching calculation and music visualization.

Deep learning-based visual inspection for the delayed brittle fracture of high-strength bolts in long-span steel bridges

Jing Zhou, Linsheng Huo, Gangbing Song, et al.

The delayed brittle fracture of high-strength bolts in long-span steel bridges threatens the safety of the bridges and can even lead to serious accidents. Currently, periodic human inspection, the most commonly applied detection method for this kind of high-strength bolt damage, is a dangerous process and consumes plenty of manpower and time. To detect the damage quickly and automatically, a visual inspection approach based on deep learning is proposed. YOLOv3, an object detection algorithm based on the convolutional neural network (CNN), is introduced due to its good performance in the detection of small objects. First, a dataset including 500 images labeled for damage is developed. Then, the YOLOv3 neural network model is trained using the dataset, and the capability of the trained model is verified using 2 new damage images. The feasibility of the proposed detection method has been demonstrated by the experimental results.

A new gait energy image based on mask processing for pedestrian gait recognition

Zhong Li, Jiulong Xiong, Xiangbin Ye

Under a more realistic experimental setup, the performance of existing gait recognition approaches drops drastically, because pedestrians are mostly under different and unknown covariate conditions. Thus, the influence of changes in clothing and carried items on the profile of pedestrians is the main obstacle to gait recognition. In this paper, we propose a new Gait Energy Image based on mask processing (MP-GEI) to reduce the influence of covariate conditions. Firstly, we calculate the Gait Energy Image (GEI) and its synthetic average template, which includes the various features of 124 subjects under different covariate conditions from five views (54°, 72°, 90°, 108°, 126°). Secondly, we propose the Gait Entropy Image (GEnI) and calculate its synthetic average template (T-GEnI). Thirdly, we calculate the mask representing the dynamic feature areas in T-GEnI by setting a threshold. Finally, we use parts of the mask to remove the irrelevant gait information in GEI. In this work, we explore the performance of MP-GEI with two models based on the convolutional neural network (CNN), and experiments are carried out on the CASIA Dataset B. Our results demonstrate that the proposed approach achieves a better correct classification rate than GEI when pedestrians are under different and unknown covariate conditions. In addition, using the pre-trained VGG-16 model to extract deep features for recognition is more effective than fine-tuning the pre-trained VGG-16 model.
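
The four steps above can be sketched on tiny binary silhouettes. The code assumes that the GEI is the per-pixel average of aligned silhouettes, that the entropy image uses the usual per-pixel binary Shannon entropy, and that masking keeps the GEI pixels inside the thresholded high-entropy area. The abstract does not state which parts of the mask are used or how the threshold is chosen, so these choices and all function names are illustrative assumptions.

```python
import math


def gait_energy_image(silhouettes):
    """Average aligned binary silhouettes (each H x W of 0/1) pixel by pixel."""
    n = len(silhouettes)
    h, w = len(silhouettes[0]), len(silhouettes[0][0])
    return [[sum(s[y][x] for s in silhouettes) / n for x in range(w)]
            for y in range(h)]


def gait_entropy_image(gei):
    """Per-pixel binary Shannon entropy of the foreground probability."""
    def entropy(p):
        if p <= 0.0 or p >= 1.0:
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return [[entropy(p) for p in row] for row in gei]


def dynamic_mask(geni, threshold):
    """Mark pixels whose entropy reaches the threshold as dynamic (1)."""
    return [[1 if v >= threshold else 0 for v in row] for row in geni]


def apply_mask(gei, mask):
    """Keep GEI values inside the mask and zero out everything else."""
    return [[g * m for g, m in zip(g_row, m_row)]
            for g_row, m_row in zip(gei, mask)]
```

Pixels that are always foreground or always background have zero entropy, so the mask concentrates on regions that move during the gait cycle, such as the limbs.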

Intelligent control method for high maneuvering fighter

Zongcheng Liu, Renwei Zuo, Qiuni Li, et al.

For high maneuvering fighters, we propose an intelligent control method which has strong robustness in the sense that the system nonlinearities are almost completely unknown and dead-zone nonlinearity is present. Moreover, the proposed intelligent controller has very simple structure since approximators are not used and differentiation of virtual control is avoided. It is proved that all the signals in the closed-loop control system are bounded, and the attitude angle tracking error can converge to an arbitrarily small neighborhood around zero by choosing the appropriate design parameters.

Auditory effect of chanting sound

Wei-lin Ning, Ya-li Liu

The sound of Buddhist chanting is characterized by both speech and music, and it plays an important part in traditional culture. Its unique auditory effect needs to be studied and developed. In this paper, the auditory effects of the chanting sound were explored from two aspects: mood survey and physiological parameters. The experiment was designed from the perspective of subjective auditory perception. Experiments showed that the chanting sound inhibited negative emotions, enhanced the sense of calm, slowed down the heart rate, and first increased the imbalance of EEG power and then balanced it. In contrast experiments, we found that listening to traffic noise stimulated negative emotions, inhibited positive emotions and continuously enhanced the imbalance of EEG power.

Convolutional neural networks application in cardiovascular decision support systems

Natalia Konnova, Mikhail Basarab, Michael Khachatryan, et al.

The paper considers the possibilities of using neural network methods of machine learning to diagnose the states of the human cardiovascular system and to support decision-making in cardiology and cardiac surgery. The issues of processing and preparing electrocardiography signals, selecting the architecture, and tuning neural network parameters for the automation of diagnosis are discussed. The results obtained with multilayer perceptrons and convolutional neural networks in assigning the submitted input cardiovascular data to one of the classes of states in the selected space are examined. Using specially developed software, proprietary numerical experiments with real clinical data were carried out. Given these results, which demonstrate the applicability of the deep learning methods and algorithms used to diagnostic automation, a model of a hierarchical decision support system is proposed.

Analysis on EEG signal with machine learning

Jaehoon Cha, Kyeong Soo Kim, Haolan Zhang, et al.

In this paper, research on the electroencephalogram (EEG) is carried out through principal component analysis (PCA) and the support vector machine (SVM). PCA is used to collect EEG data characteristics, and SVM is used to discriminate the behaviors. The actual EEG signals are obtained from 18 participants who raised their hands through meditation and through actual movement during the experiments. The 16-channel data from the experiments form one data set. To obtain the principal components of the EEG signal, 16 features are considered from each channel and normalized. Simulation results demonstrate that two behaviors (i.e., raising hands and meditation) can be clearly classified using SVM, which is also visualized by a 2-dimensional principal component plot. Our research shows that specific human actions and thinking can be efficiently classified based on EEG signals using machine learning techniques like PCA and SVM. The result can be applied to triggering actions by thought alone.

Development of upper limb rehabilitation training control system based on path planning

Kai Guo, Shasha Zhao, Yongfeng Liu, et al.

Experience from a large number of clinical rehabilitation treatments for hemiplegia shows that the rehabilitation process of hemiplegia patients can be divided into three phases: the acute phase, the recovery phase, and the sequelae phase. Training therapy at different stages often needs to adopt different training strategies and modes according to clinical characteristics. Through three rehabilitation training methods (passive, semi-passive semi-active, and active), single joint movement and compound movement of the upper limb with 7 degrees of freedom can be realized. The three rehabilitation training modes can be applied to the acute, recovery, and sequelae phases of patient rehabilitation, respectively.

A novel improved CNN algorithm via denoising approach

Jie Zhang, Jingjing Liu, Mingyu Wang, et al.

The address-event-based Dynamic Vision Sensor (DVS) and the Convolutional Neural Network (CNN) have been widely researched in recent years. However, the data collected by a DVS are easily affected by noise, which makes it difficult to identify the target during classification. To solve the problem of misclassification, a novel improved CNN (NI-CNN) technique is proposed in this paper. Firstly, an appropriate number of event pulses is chosen and mapped to the frame domain, and an optimized denoising approach is then applied to the whole classification system. Secondly, intra-class spacing is reduced and inter-class divergence is enlarged by a joint loss function with adjusted regularization parameters. Numerical comparisons between our proposed approach and some state-of-the-art solvers, on several accessible databases, are presented to demonstrate its efficiency and effectiveness.

Research on textual classification of medical history in electronic patient records based on LSTM

Yirong Zhuo, Dong Cao, Haimei Wu, et al.

Natural Language Processing (NLP) is an important direction in the field of computer science and artificial intelligence. Combined with deep learning, NLP can effectively transform unstructured natural language into structured data. The electronic medical records of hospitals are mainly used in the clinic, and their data need to be reorganized before they can be used for research. This paper studies the automatic classification and extraction of medical history information fields based on a Convolutional Neural Network (CNN) and a Long Short-Term Memory network (LSTM), aiming to solve, for traditional Chinese medicine, the classification problem of automatically extracting all medical history information from the mixed text of medical records. The experimental results show that the F value is 0.8506 with the CNN and 0.8810 with the LSTM, which indicates a good classification effect.

AI Systems

A sparsity-relaxed algorithm for the under-determined convolutive blind source separation

Junjie Yang, Yi Guo, Zuyuan Yang, et al.

Convolutive blind source separation (CBSS) is a signal processing method that separates multiple sources from a convolutive mixing model. The aim of CBSS is to recover the latent sources in a reverberant environment. Usually, a two-stage scheme consisting of mixing matrix estimation and source recovery is used to achieve this. In this paper, we mainly discuss the source recovery problem given an estimated mixing matrix. Specifically, this problem can be cast as a sparse source reconstruction optimization model, especially for the under-determined case where the number of sources is greater than the number of microphones. Inspired by the fact that only a few source components are active at each time-frequency slot, a new augmented Lagrange method is proposed to find the optimal sparse solution of the sources with an ℓp norm (0<p<1) based measurement function. The proposed method relaxes the strict sparsity assumption on the sources and hence improves the source separation performance. The experimental results demonstrate that the proposed algorithm is superior to the state-of-the-art methods.

Chinese dialect identification using prosodic classes and enhanced bigram model

Linjia Sun

A method based on prosodic classes is proposed for Chinese dialect identification in this paper. The prosodic classes are obtained from a large number of prosodic words, which are the basic units of the prosodic structure, and simultaneously include acoustic, phonotactic and prosodic features to classify dialects. In addition, the pauses between prosodic words are also considered and described as a special prosodic class. The differences between the Chinese dialects are distinguished by the prosodic classes and their order in whole sentences. The enhanced bigram model (EBM), based on the HMM technique, is proposed to obtain the sequential statistics of sequences of prosodic classes; it is shown to yield better identification performance and to outperform the universal HMM model. We implement the new method to illustrate its identification capability and evaluate it on the corpus from the Project for the Protection of Language Resources of China. The experimental results show that our method is competitive with existing methods.

An adaptive hierarchical multi-hop routing protocol based on energy balance in WSN

Xi Jie, Yaran Li, Shuai Han, et al.

Due to the harsh battlefield environment and the large amount of transmission data, it is preferable for individual soldiers carrying sensor nodes to form a wireless sensor network (WSN) with the command center, so as to carry out more concealed and deeper military operations. However, data transmission in a mobile WSN faces many difficult challenges, such as unfixed topology, packet loss, and high energy consumption. How to design an efficient route that effectively improves the packet transmission success rate and the WSN lifetime has been one of the important issues in mobile WSN research. A variety of cluster routing protocols have been proposed in the past; however, it is hard to balance energy consumption and lifetime. Hence, this paper proposes an adaptive hierarchical multi-hop routing (AHMHR) protocol based on energy balance to enhance the reliability of long-distance transmission in the mobile WSN. Preliminary simulation shows great potential for improving the packet transmission success rate and reducing delay and energy consumption compared with existing protocols, including LEACH, LEACH-ME and CBR-Mobile. The simulation results also show that the performance of AHMHR remains highly stable and that the AHMHR protocol can achieve energy balance and prolong the WSN lifetime regardless of the proportion of mobile nodes.

WOA with adaptive mutation operator to estimate parameters of heavy oil thermal cracking model

Shuyue Zhang, Ning Wang

This paper proposes an enhanced whale optimization algorithm with adaptive mutation operator (amWOA). In amWOA, the adaptive mutation operator is designed to balance the global search and local search abilities. The population sequencing strategy is added to the mutation operator to help the algorithm jump out of the local optimum. The numerical results of three test functions show that the amWOA has better performance. The amWOA is adopted for parameter estimation of the heavy oil thermal cracking model. The simulation results show that the amWOA has the smallest modeling error.

A new distance measure based on Pythagorean hesitant fuzzy sets and its application to multi-criteria decision making

Yanru Zhong, Xiuyan Guo, Hong Gao, et al.

The Pythagorean fuzzy set (PFS) is characterized by five parameters, namely membership degree, non-membership degree, indeterminacy degree, strength of commitment about membership, and direction of commitment. Distance measure is an important index in the Pythagorean hesitant fuzzy (PHF) environment when solving multi-criteria decision-making (MCDM) problems. However, the existing distance measure considers the differences between the membership degrees, the non-membership degrees, and the degrees of indeterminacy, but ignores the influence of the difference between the directions of PHF sets (PHFSs), so it may sometimes lead to unreasonable results. Motivated by this, the five parameters of the PFS are fully extended to the Pythagorean hesitant fuzzy set (PHFS) in this paper, first generating new distance measures for PHFSs and introducing some properties and theorems. Then, the proposed method is applied to MCDM with PHF information by considering the distance between the positive ideal solution and each alternative. Finally, to validate the effectiveness of the proposed method, a practical experiment is introduced for comparison with existing methods.

Video-based violence detection by human action analysis with neural network

Yunqing Zhao, Wilton W. T. Fok, C. W. Chan

In recent years, human action analysis has been a focal point in video processing, especially for action recognition and safety surveillance. It usually serves as an auxiliary tool to minimize the manpower needed for special tasks. This paper explores human action analysis in a specified situation, based on human posture extraction by a pose-estimation algorithm. Deep neural network (DNN) methods were used, composed of residual learning blocks for feature extraction and a recurrent neural network for time-series data learning. All these modules can be applied to real-time videos, classifying different security levels of actions between two people with 91.8% accuracy on the test set. Some other classical network structures were also compared as baselines. To address prediction errors between two classes, a logic enhancement algorithm is proposed and applied after the forward inference of the neural network model. Experiments were conducted on real-time videos, achieving satisfactory performance.

Development of educational game based on Cocos2d-JS engine

Pingping Chen, Min Zou, Xiaoran Geng, et al.

The project has developed an educational WeChat game based on the Cocos2d-JS engine, using an innovative form that combines traditional amusement games with an English word answering system. The game combines enjoyment with knowledge and puts the idea of the educational game into practice. Players can have a good experience because of this combination, and the game can achieve the goal of "combining teaching with pleasure".

Research on autonomous maneuvering decision of UCAV based on approximate dynamic programming

Zhencai Hu, Peng Gao, Fei Wang

Unmanned aircraft systems can perform some more dangerous and difficult missions that manned aircraft systems cannot perform. For tasks with high complexity, such as air combat, a maneuvering decision mechanism is required to sense the combat environment and choose the optimal strategy in real time. This paper formulates the one-to-one air combat maneuvering problem in a 3D environment and proposes an approximate dynamic programming approach to make optimal maneuvering decisions automatically. The aircraft searches for combat strategies based on reinforcement learning while sensing the environment, taking available maneuvering actions and receiving feedback reward signals. To solve the problem of dimensional explosion in air combat, the proposed method is implemented through feature selection, trajectory sampling, function approximation and the Bellman backup operation in the air combat simulation environment. This approximate dynamic programming approach provides a fast response to rapidly changing tactical situations and learns effective strategies to fight against the opponent aircraft.
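
The Bellman backup named in the abstract above can be illustrated with a minimal sketch. It is not the authors' air combat model: the four-state line world, the reward, the discount factor `gamma` and all function names below are hypothetical, and the values are stored in a table rather than in the function approximator the paper describes.

```python
def bellman_backup(values, state, actions, transition, reward, gamma=0.9):
    """Return the best one-step lookahead value: max_a [r(s, a) + gamma * V(s')]."""
    return max(
        reward(state, a) + gamma * values[transition(state, a)]
        for a in actions
    )


# Hypothetical toy problem: states 0..3 on a line, state 3 is an absorbing goal.
STATES = range(4)
ACTIONS = (-1, +1)  # step left or step right


def transition(state, action):
    """Deterministic move, clipped to the line; the goal state never changes."""
    if state == 3:
        return 3
    return min(max(state + action, 0), 3)


def reward(state, action):
    """Reward 1.0 for the step that enters the goal, 0.0 otherwise."""
    return 1.0 if state != 3 and transition(state, action) == 3 else 0.0


# Repeated sweeps of backups (value iteration) until the table stops changing.
values = [0.0] * 4
for _ in range(50):
    values = [bellman_backup(values, s, ACTIONS, transition, reward) for s in STATES]
```

In an approximate dynamic programming setting the table `values` would be replaced by a fitted function of state features, and the backups would be computed only on sampled trajectories.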

Research on vehicle control technology of brain-computer interface based on SSVEP

Sheng Long, Zongtan Zhou, Yang Yu, et al.

In this paper, we propose an asynchronous paradigm for controlling a car using a steady-state visual evoked potential (SSVEP)-based brain-computer interface (BCI) and conduct experimental tests on a real car outside the laboratory. The paradigm uses six stimulation frequencies, classifies targets with the canonical correlation analysis (CCA) method, and generates multi-task vehicle control strategies, including left and right turn signals, wipers, horn, doors and hazard lights. Four healthy volunteers participated in the online car control experiment, and the average correct rate reached 88.43%. Subject S1 showed the most satisfactory BCI-based performance, with a true positive rate and false positive rate in line with expectations. The research shows the feasibility and effectiveness of the paradigm in automotive control applications and lays the foundation for future research and development of related brain-controlled automotive technologies, which could offer supplements or alternatives to individuals with mobility impairments and an auxiliary driving strategy to healthy people.

Quantization of deep convolutional networks

Yea-Shuan Huang, Charles Djimy Slot, Chang Wu Yu

In recent years, increasingly complex architectures for deep convolutional networks (DCNs) have been proposed to boost performance on image recognition tasks. However, the gains in performance have come at the cost of a substantial increase in computation and model storage resources. Implementing quantized DCNs has the potential to alleviate some of these costs and to facilitate deployment on embedded hardware. In this paper, we experiment with three different quantizers for the implementation of DCNs. We denote them the min-max quantizer (MMQ), the average quantizer (AQ) and the histogram average quantizer (HAQ). We used a set of 8 different bit-widths (i.e., one, two, …, eight bits) to quantize each DCN's weights in our experiments. Experimental results show that, because HAQ does not destroy the original weight distribution, it outperforms both MMQ and AQ.
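
As a rough illustration of what quantizing weights to a given bit-width involves, the sketch below implements a generic uniform quantizer whose grid spans the minimum and maximum weight. This is an assumption about what a min-max quantizer usually means, not the paper's definition of MMQ, and the function name and sample weights are hypothetical.

```python
def min_max_quantize(weights, bits):
    """Uniformly quantize weights onto 2**bits levels spanning [min, max]."""
    if bits < 1:
        raise ValueError("bits must be at least 1")
    lo, hi = min(weights), max(weights)
    if hi == lo:  # all weights equal: nothing to quantize
        return list(weights)
    steps = 2 ** bits - 1  # number of intervals between lo and hi
    step = (hi - lo) / steps
    # Snap each weight to the nearest grid point, then map back to a float.
    return [lo + round((w - lo) / step) * step for w in weights]


weights = [-1.0, -0.4, 0.1, 0.3, 1.0]  # hypothetical layer weights
one_bit = min_max_quantize(weights, bits=1)
eight_bit = min_max_quantize(weights, bits=8)
```

With one bit every weight collapses to either the minimum or the maximum, while with eight bits the rounding error is bounded by half a grid step.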

Sentence-level sentiment analysis via BERT and BiGRU

Jianghong Shen, Xiaodong Liao, Zhuang Tao

Sentiment analysis is a significant task in natural language processing (NLP). Acquiring high-quality word representations is key to the task. In particular, the same word has different meanings in different sentences, which the computer should recognize. Traditional word embeddings do not handle this well. In this paper, we propose a BERT (Bidirectional Encoder Representation from Transformers) + BiGRU (Bidirectional Gated Recurrent Unit) model, which first converts words into vectors via the BERT model to obtain contextualized embeddings and then performs sentiment analysis with the BiGRU. Experimental results show that our model performs best among the various methods compared.

IFPSS: intelligence fire point sensing systems in AIoT environments

Kun-Ming Yu, Yen-Chiu Chen, Chung-Hsing Liu, et al.

Conditions at a fire scene change rapidly. How to collect and analyze the most immediate fire information, and so provide the most effective information for disaster decision-making, has always been an important issue. This paper proposes an Intelligent Fire Point Sensing System (IFPSS), which predicts fire conditions using artificial intelligence technology and large amounts of gas and temperature data collected at fire scenes by IoT devices. The IFPSS collected actual gas and temperature data from the simulation room where an actual fire test was conducted. Taking carbon monoxide (CO) and hydrogen sulfide (H₂S) data as an example, the artificial intelligence analysis of IFPSS uses a linear regression algorithm to establish the model. After training and testing, the model predicts with an accuracy of up to 84.4% whether the fire is in its very early stages.

Short-term solar PV forecasting based on recurrent neural network and clustering

Wen Ouyang, Kun-Ming Yu, Nattawat Sodsong, et al.

With the large-scale deployment of solar photovoltaic (PV) installations, managing the efficiency of the generation system has become essential. One of the main challenges for solar PV power output is the difficulty of managing fluctuations in solar irradiance. Generally speaking, the power output is heavily influenced by solar irradiance and sky conditions, which are constantly changing. Thus, the ability to accurately forecast solar PV power is critical for optimizing the generation system and ensuring the quality of service. In this paper, we propose a solar PV forecasting model that uses a Recurrent Neural Network (RNN) in a cascade model combined with hierarchical clustering to improve the overall prediction accuracy. Compared with other learning algorithms, namely Feed-forward Artificial Neural Network (FFNN), GRU, Support Vector Regression (SVR) and K Nearest Neighbors (KNN), using cluster data from K-means clustering and hierarchical clustering, the proposed model had the lowest average NRMSE, 8.88%, with hierarchically clustered data. According to the results, hierarchical clustering is better suited to solar PV forecasting than K-means clustering.
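
For readers unfamiliar with the NRMSE figure quoted above, the following sketch shows one common way to compute it. The abstract does not say how the error is normalized, so dividing the RMSE by the range of the observed values is an assumption (other conventions divide by the mean or by installed capacity); the function name and sample data are hypothetical.

```python
import math


def nrmse(actual, predicted):
    """Root-mean-square error divided by the range of the actual values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    span = max(actual) - min(actual)
    if span == 0:
        raise ValueError("actual values have zero range")
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse) / span


actual = [0.0, 2.0, 4.0, 6.0]     # hypothetical measured PV output
predicted = [1.0, 2.0, 4.0, 5.0]  # hypothetical forecast
score = nrmse(actual, predicted)
```

Multiplying `score` by 100 gives a percentage comparable in form, though not necessarily in definition, to the value reported in the abstract.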

Big Data and Large-Scale Scientific Computing

Collaborative adaptive scheduling scheme for multi-source big data tasks in the cloud

Lizi Zheng III, Delong Cui

With the arrival of the big data age, workflow applications are moving from their original infrastructure to more efficient, reliable and affordable cloud computing platforms. Focusing on the scheduling of fine-grained big data tasks in a cloud computing environment, we propose a novel collaborative adaptive algorithm based on reinforcement learning in this paper. Experimental results demonstrate the efficiency of the collaborative adaptive scheduling scheme.

Hybrid recommendation algorithm based on logistic regression refinement sorting model

Shihui Chang, Wenguo Wei, Guiyuan Xie

Mainstream recommendation systems mainly use the content-based method or the collaborative filtering method. However, in specific recommendation scenarios, a hybrid algorithm often performs better than a single algorithm. We introduce a new recommendation method based on a hybrid algorithm combined with a logistic regression refinement sorting model. Our method can achieve a higher accuracy rate and recall rate when item and user features need to be considered comprehensively. We recall and sort items with a hybrid algorithm built on the content-based method and the collaborative filtering method. After the recall process, we obtain preliminary, roughly sorted recommendation lists. We then train a logistic regression refinement sorting model on the rough sorting results. The recommendation results become more accurate after refinement sorting. We used the song data of a music website as experimental data and set up three comparative experiments under different feature weight values. The experimental results show that, when item and user features are considered comprehensively, our method is better than other mainstream methods in accuracy rate and recall rate.

WiFi location method based on TSNE-KNN

Yanru Zhong, Qingbo Xie, Shuaijie Zhao, et al.

To address the low positioning accuracy and high data dimensionality of the traditional WiFi fingerprint locating method, a WiFi fingerprint indoor locating method based on TSNE-KNN is proposed. In the offline stage, the dimensionality of the WiFi fingerprint database is reduced using TSNE (t-distributed stochastic neighbor embedding), and the TSNE parameters are adjusted to obtain a 2D (two-dimensional) WiFi fingerprint database with high differentiation. In the online stage, the collected real-time WiFi signal strength, together with the original WiFi fingerprint database, is first used as the input of TSNE. The 2D WiFi fingerprint database obtained in the offline stage is used as the initial solution, with a set of arbitrary data added to it. The TSNE parameters obtained in the offline stage are used to calculate the dimensionality-reduced data. The KNN (k-nearest neighbor) algorithm is then used to determine the WiFi location. Finally, the fingerprint database of the fourth floor of the EE building on the XJTLU north campus is used as input in the experiment. Experiments show that TSNE-KNN can effectively represent the characteristics of high-dimensional data with low-dimensional data and also improves location accuracy.
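
The final KNN step described above can be sketched as follows. The sketch covers only the nearest-neighbor lookup in an already reduced 2D space; the TSNE embedding itself is not shown. Averaging the positions of the k nearest reference points is an assumption about how the location is formed, and the function name and the tiny database are hypothetical.

```python
import math


def knn_locate(fingerprints, query, k=3):
    """Estimate a position as the mean position of the k nearest fingerprints.

    fingerprints: list of (embedding, position) pairs, where embedding is a
    2D point in the reduced space and position is a physical (x, y) location.
    query: the 2D embedding of the real-time measurement.
    """
    if not 1 <= k <= len(fingerprints):
        raise ValueError("k must be between 1 and the database size")
    nearest = sorted(fingerprints, key=lambda fp: math.dist(fp[0], query))[:k]
    xs = [position[0] for _, position in nearest]
    ys = [position[1] for _, position in nearest]
    return (sum(xs) / k, sum(ys) / k)


# Hypothetical reduced fingerprint database: (embedding, physical position).
database = [
    ((0.0, 0.0), (0.0, 0.0)),
    ((1.0, 0.0), (2.0, 0.0)),
    ((0.0, 1.0), (0.0, 2.0)),
    ((10.0, 10.0), (20.0, 20.0)),
]
estimate = knn_locate(database, (0.2, 0.2), k=3)
```

The distant fourth reference point is ignored for this query, so the estimate is the centroid of the three nearby positions.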

Summary on sensors in agricultural robots

Peng Li, Jinlei Liu, Wei Zhang, et al.

The development of agricultural robots in recent years has greatly advanced agricultural automation. Agricultural robots can accomplish various tasks to help producers manage farmland better and improve yield. The sensors deployed on agricultural robots are essential components for implementing their functions. This paper provides a summary of frequently applied sensors on agricultural robots based on a survey of recent literature. Moreover, the concrete use of those sensors in different aspects of agricultural robots is briefly introduced. Additionally, the application of human-robot collaboration in agricultural robots is introduced. Finally, the current problems and future work of agricultural robots are discussed.

Double threshold control genetic algorithm based on optimal protection

Bin He, Yuxing Zhang, Yu Wang, et al.

Because the genetic algorithm is easily trapped in a local minimum when applied to complex problems, double thresholds are introduced to dynamically adjust the similarity of the parents and the mutation probability. The proposed algorithm helps to enhance crossover effectiveness and population diversity, improving the search efficiency of the algorithm. In addition, the added optimal protection guarantees that the optimal individual is not destroyed while the population search area is expanded. Finally, the improved genetic algorithm is tested on a 164-point TSP model. The experimental results show that the improved genetic algorithm finds new solutions and improves search efficiency when the population evolution stagnates. Comparative simulations with parameter pairs could provide theoretical guidance for selecting the thresholds and coefficients.

High-performance vehicle diagnostic information collection device for vehicle big data

San-Fu Wang, Yin-Tang Chen, Cheng-Wei Yang, et al.

On-board diagnostics II (OBD-II) [1] is a new automotive device that provides the vehicle's self-diagnostic and reporting capability to the vehicle owner or repair technician, who can then understand the status of the various vehicle subsystems. These self-diagnostic reports are the basic material for vehicle big data. Currently marketed OBD-II systems can only be used for small cars. However, large vehicles are the main cause of car accidents. Therefore, the proposed OBD-II device is designed for the niche market of buses and commercial vehicles.

Using electromagnetic parameters of targets and R language to enhance the accurate and rapid judgment

Pai-Song Chiang, Chiang-Ju Chien

Radar is the main means used worldwide to monitor conditions at sea, but radar has many disadvantages when used for sustained surveillance. In this paper, an innovative idea based on the electromagnetic parameters of radar is presented, and the programs are executed in the R language. A big data method is used to improve the accuracy of judgments about unusual boats at sea. Radar yields large amounts of electromagnetic parameter data that are unique, like human fingerprints. Using this feature, a database of electromagnetic parameters is created in which every parameter stands for a vehicle, whether a boat or an airplane. Because surveillance with the traditional radar method is less effective, programs are built on these big data to make the procedure easier. Simulation values for the 8-49 boats are fed into the programs and tested continuously. However large the data set is, accurate information is returned rapidly, and under all three simulated conditions the correct targets are extracted accurately by the different programs. Finally, crime or accident records are listed after clustering, which helps the administrator predict conditions. This study could improve patrol and security missions at sea for nations, and it could also save the human-resource cost of monitoring.