Proceedings Volume 9897

Real-Time Image and Video Processing 2016


Volume Details

Date Published: 6 June 2016
Contents: 5 Sessions, 29 Papers, 0 Presentations
Conference: SPIE Photonics Europe 2016
Volume Number: 9897

Table of Contents

All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 9897
  • Applications of Real-Time Image Processing
  • Real-Time Imaging Implementations
  • Real-Time Video Processing
  • Poster Session
Front Matter: Volume 9897
Front Matter: Volume 9897
This PDF file contains the front matter associated with SPIE Proceedings Volume 9897, including the Title Page, Copyright information, Table of Contents, Introduction (if any), and Conference Committee listing.
Applications of Real-Time Image Processing
Real-time FPGA-based radar imaging for smart mobility systems
Sergio Saponara, Bruno Neri
The paper presents an X-band FMCW (Frequency Modulated Continuous Wave) radar imaging system, called X-FRI, for surveillance in smart mobility applications. X-FRI detects the presence of targets (e.g. obstacles at a railway or urban road crossing, or ships in a small harbor), as well as their speed and position. With respect to alternative solutions based on LIDAR or camera systems, X-FRI operates in real time even in poor lighting and weather conditions, day and night. The radio-frequency transceiver is realized with COTS (Commercial Off The Shelf) components on a single board. An FPGA-based baseband platform allows for real-time radar image processing.
3D real-time visualization of blood flow in cerebral aneurysms by light field particle image velocimetry
Matthias F. Carlsohn, André Kemmling, Arne Petersen, et al.
Cerebral aneurysms require endovascular treatment to eliminate potentially lethal hemorrhagic rupture by hemostasis of blood flow within the aneurysm. Devices (e.g. coils and flow diverters) promote hemostasis; however, measurement of blood flow within an aneurysm or cerebral vessel before and after device placement at a microscopic level has not been possible so far. Such measurement would allow better individualized treatment planning and improve the design of devices. For experimental analysis, direct measurement of real-time microscopic cerebrovascular flow in micro-structures may be an alternative to computed flow simulations. Applying microscopic aneurysm flow measurement on a regular basis, to empirically assess a large number of anatomic shapes and the corresponding effects of different devices, would require a fast and reliable method with high-throughput assessment at low cost. Transparent three-dimensional (3D) models of brain vessels and aneurysms may be used for microscopic flow measurements by particle image velocimetry (PIV); however, up to now the size of the structures has set the limits for conventional 3D-imaging camera set-ups. Online flow assessment requires additional computational power to cope with processing the large amounts of data generated by sequences of multi-view stereo images, e.g. produced by a light field camera capturing the 3D information of complex flow processes by plenoptic imaging. Recently, a fast and low-cost workflow for producing patient-specific three-dimensional models of cerebral arteries has been established by stereo-lithographic (SLA) 3D printing. These 3D arterial models are transparent and exhibit a replication precision within the submillimeter range required for accurate flow measurements under physiological conditions. We therefore test the feasibility of microscopic flow measurements by PIV analysis using a plenoptic camera system capturing light field image sequences.
Averaging across a sequence of single, double, or triple shots of flashed images enables reconstruction of the real-time corpuscular flow through the vessel system before and after device placement. This approach could enable 3D insight into microscopic flow within blood vessels and aneurysms at submillimeter resolution. We present an approach that allows real-time assessment of 3D particle flow by high-speed light field image analysis, including an image-processing solution that addresses the high computational load. The imaging set-up accomplishes fast and reliable PIV analysis in transparent 3D models of brain aneurysms at low cost. High-throughput microscopic flow assessment of differently shaped brain aneurysms may therefore become possible, as required for patient-specific device designs.
Application of the local similarity filter for the suppression of multiplicative noise in medical ultrasound images
In this paper we address the problem of the reduction of multiplicative noise in digital images. This kind of image distortion, also known as speckle noise, severely decreases the quality of medical ultrasound images, and therefore effective enhancement and restoration is of vital importance for proper visual inspection and quantitative measurements. The proposed Pixel-Patch Similarity Filter (PPSF) is structured as a weighted average of pixels in a processing block, where the weights are determined by calculating the sum of squared differences between the mean of a patch and the intensities of the pixels of the local window at the block center. The design is similar to the bilateral and non-local means filters; however, we neglect the topographic distance between pixels, which substantially decreases the computational complexity. The new technique was evaluated on standard gray-scale test images contaminated with multiplicative noise modelled using Gaussian and uniform distributions. Its efficiency was also assessed on a set of ultrasonographic images simulated by means of the Field II software and on real ultrasound images of a finger joint. The comparison with state-of-the-art techniques revealed the very high efficiency of the proposed filtering framework, especially for strongly degraded images. Visually, homogeneous areas are smoother, while image edges and small details are better preserved. The experiments have shown that satisfactory results are obtained with patches consisting of only 9 samples belonging to a relatively small processing block of 7x7 pixels, which ensures the low computational complexity of the proposed denoising scheme and allows its application in real-time image processing scenarios.
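The similarity-weighting idea described above can be sketched in plain Python. The block/patch radii and the smoothing constant `h` below are illustrative choices, not values from the paper, and a Gaussian-like weight kernel is assumed for concreteness:

```python
import math

def ppsf(img, block=1, patch=1, h=10.0):
    """Similarity-weighted averaging in the spirit of the PPSF: each output
    pixel is a weighted mean over a processing block, and a neighbour's
    weight decays with the squared difference between the mean of the patch
    around that neighbour and the centre intensity. The topographic
    (spatial) distance between pixels is deliberately ignored."""
    H, W = len(img), len(img[0])

    def clamp(v, hi):
        return min(max(v, 0), hi - 1)

    def patch_mean(y, x):
        vals = [img[clamp(y + dy, H)][clamp(x + dx, W)]
                for dy in range(-patch, patch + 1)
                for dx in range(-patch, patch + 1)]
        return sum(vals) / len(vals)

    out = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            num = den = 0.0
            centre = img[y][x]
            for dy in range(-block, block + 1):
                for dx in range(-block, block + 1):
                    ny, nx = clamp(y + dy, H), clamp(x + dx, W)
                    w = math.exp(-((patch_mean(ny, nx) - centre) ** 2) / (h * h))
                    num += w * img[ny][nx]
                    den += w
            out[y][x] = num / den
    return out
```

On a constant image all weights are equal and the filter leaves it unchanged, which is a quick sanity check of the weighting scheme.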
Memory efficient and constant time 2D-recursive spatial averaging filter for embedded implementations
Qifeng Gan, Lama Seoud, Houssem Ben Tahar, et al.
Spatial Averaging Filters (SAF) are extensively used in image processing for image smoothing and denoising. Their latest implementations already achieve constant-time computational complexity regardless of kernel size. However, all existing O(1) algorithms require additional memory for temporary data storage. In order to minimize memory usage in embedded systems, we introduce a new two-dimensional recursive SAF. It uses previously computed pixel values along both rows and columns to calculate the current one, and thus achieves constant-time computational complexity without any additional memory. Experimental comparisons with previous SAF implementations show that the proposed 2D-recursive SAF requires no additional memory while offering a computational time similar to the most efficient existing SAF algorithm. These features make it especially suitable for embedded systems with limited memory capacity.
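The row-and-column recursion idea can be sketched with a simple first-order blend; this is an illustration of in-place recursive averaging under assumed coefficients, not the paper's exact filter:

```python
def recursive_saf(img, alpha=0.6):
    """Illustrative 2D-recursive spatial averaging: the current pixel is
    blended with the ALREADY-FILTERED left and upper neighbours and the
    result is written back in place -- O(1) work per pixel regardless of
    the effective smoothing extent, and no temporary buffer is needed."""
    H, W = len(img), len(img[0])
    for y in range(H):
        for x in range(W):
            prev = []
            if x > 0:
                prev.append(img[y][x - 1])   # previously computed result (same row)
            if y > 0:
                prev.append(img[y - 1][x])   # previously computed result (same column)
            if prev:
                img[y][x] = alpha * img[y][x] + (1 - alpha) * sum(prev) / len(prev)
    return img
```

A constant image passes through unchanged, while an isolated impulse is attenuated, which illustrates the averaging behaviour.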
Real-time optical flow estimation on a GPU for a skid-steered mobile robot
Accurate egomotion estimation is required for mobile robot navigation. The egomotion is often estimated using optical flow algorithms. For an accurate estimation of optical flow, most modern algorithms require large memory resources and high processor speed. However, the simple single-board computers that control the motion of a robot usually do not provide such resources. On the other hand, most modern single-board computers are equipped with an embedded GPU that can be used in parallel with the CPU to improve the performance of the optical flow estimation algorithm. This paper presents a new Z-flow algorithm for efficient computation of optical flow using an embedded GPU. The algorithm is based on phase correlation optical flow estimation and provides real-time performance on a low-cost embedded GPU. A layered optical flow model is used, with layer segmentation performed by a graph-cut algorithm with a time-derivative-based energy function. This approach makes the algorithm both fast and robust in low-light and low-texture conditions. The implementation of the algorithm for a Raspberry Pi Model B computer is discussed. For evaluation, the computer was mounted on a Hercules skid-steered mobile robot equipped with a monocular camera. The evaluation was performed using hardware-in-the-loop simulation and experiments with the Hercules robot. The algorithm was also evaluated on the KITTI Optical Flow 2015 dataset. The resulting endpoint error of the optical flow calculated with the developed algorithm was low enough for navigation of the robot along the desired trajectory.
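The phase-correlation core of such an estimator can be illustrated in 1D: the shift between two signals appears as the peak of the inverse transform of their normalized cross-power spectrum. This is a generic textbook sketch of the principle, not the Z-flow implementation:

```python
import cmath

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def phase_correlate(a, b):
    """Recover the circular shift d such that b[n] = a[n - d]: normalize the
    cross-power spectrum to pure phase, transform back, and take the peak."""
    A, B = dft(a), dft(b)
    R = []
    for Ak, Bk in zip(A, B):
        c = Bk * Ak.conjugate()          # cross-power spectrum
        m = abs(c)
        R.append(c / m if m > 1e-9 else 0j)
    corr = idft(R)
    return max(range(len(corr)), key=lambda n: corr[n].real)

sig = [0, 1, 3, 7, 3, 1, 0, 0]
shifted = sig[-2:] + sig[:-2]            # circular shift by +2
print(phase_correlate(sig, shifted))     # recovers the shift of 2
```

Applied per image block, the peak location gives that block's displacement between frames; a GPU implementation parallelizes the transforms across blocks.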
Real-Time Imaging Implementations
Design of a 3D-IC multi-resolution digital pixel sensor
N. Brochard, J. Nebhen, J. Dubois, et al.
This paper presents a digital pixel sensor (DPS) integrating a sigma-delta analog-to-digital converter (ADC) at the pixel level. The digital pixel includes a photodiode, a sigma-delta modulator and a digital decimation filter. It features an adaptive dynamic range and multiple resolutions (up to 10-bit) with high linearity. A specific row decoder and column decoder are also designed to permit reading a chosen pixel of the matrix together with its 4 x 4 neighborhood. Finally, a complete design in the CMOS 130 nm 3D-IC FaStack Tezzaron technology is described, revealing a high fill factor of about 80%.
Parallel implementation of a hyperspectral data geometry-based estimation of number of endmembers algorithm
In recent years, hyperspectral analysis has been applied in many remote sensing applications, and hyperspectral unmixing has proven a challenging task in hyperspectral data exploitation. This process consists of three stages: (i) estimation of the number of pure spectral signatures or endmembers, (ii) automatic identification of the estimated endmembers, and (iii) estimation of the fractional abundance of each endmember in each pixel of the scene. However, unmixing algorithms can be computationally very expensive, a fact that compromises their use in applications under real-time constraints. Several techniques have been proposed to address this problem, but most works so far have focused on the second and third stages. The execution cost of the first stage is usually lower than that of the other stages, and the stage can even be skipped if the estimate is known a priori. However, its acceleration on parallel architectures is still an interesting and open problem. In this paper we address this issue focusing on the GENE algorithm, a promising geometry-based proposal introduced in [1]. We have evaluated our parallel implementation in terms of both accuracy and computational performance through Monte Carlo simulations on real and synthetic data. Performance results on a modern GPU show satisfactory 16x speedup factors, which allow us to expect that this method could meet real-time requirements in a fully operational unmixing chain.
FPGA implementation of glass-free stereo vision
Weidong Tang, Xiaolin Yan
This paper presents an efficient real-time glass-free 3D system based on an FPGA. The system converts a two-view input, a 60 frames per second (fps) 1080p stream, into a multi-view video at 30 fps and 4K resolution. In order to provide a smooth and comfortable viewing experience, glass-free 3D systems must display multi-view videos. Generating a multi-view video from a two-view input involves three steps: the first is to compute disparity maps from the two input views; the second is to synthesize new views based on the computed disparity maps and the input views; the last is to produce the output video from the new views according to the specifications of the lens installed on the TV set.
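The second step, view synthesis, can be sketched for a single image row: each pixel is shifted by a fraction of its disparity toward the virtual viewpoint. Integer shifts and a naive hole-filling rule are simplifying assumptions; real systems also blend the right view:

```python
def synthesize_view(left, disparity, alpha=0.5):
    """Forward-warp one row of the left view by alpha * disparity to render
    a virtual viewpoint between the two input cameras (alpha=0 gives the
    left view, alpha=1 the right). Holes left by occlusions are filled
    naively from the nearest synthesized pixel."""
    W = len(left)
    out = [None] * W
    for x in range(W):
        nx = x - round(alpha * disparity[x])   # shift towards the virtual camera
        if 0 <= nx < W:
            out[nx] = left[x]
    for x in range(W):                         # naive hole filling
        if out[x] is None:
            out[x] = out[x - 1] if x > 0 and out[x - 1] is not None else left[x]
    return out
```

Repeating this for several values of alpha produces the intermediate views a multi-view display needs.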
A novel fast median filter algorithm without sorting
As one of the most widely applied nonlinear smoothing methods, the median filter is quite effective at removing salt-and-pepper noise and impulsive noise while maintaining image edge information without blurring boundaries, but its computational load is its main drawback in real-time processing systems. To address this issue, researchers have proposed many effective fast algorithms. However, most of these algorithms are based on sorting operations, which makes real-time implementation difficult. In this paper, exploiting the large-scale Boolean calculation capability and convenient shift operations of FPGAs (Field Programmable Gate Arrays), we propose a novel median-finding algorithm without sorting, which finds the median value effectively and whose execution time remains almost constant regardless of the filter radius. Based on this algorithm, a real-time median filter has been realized. Extensive tests demonstrate the validity and correctness of the proposed algorithm.
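One well-known way to find a median using only comparisons and shifts, with no sorting, is to build it bit by bit from the most significant bit down. Whether this matches the paper's exact FPGA scheme is not stated, so the sketch below only illustrates the general principle:

```python
def median_no_sort(vals, bits=8):
    """Median of an odd-sized window without sorting: decide the median one
    bit at a time, MSB first, by counting how many samples are >= the
    candidate prefix. Runtime depends on the bit depth, not on the window
    size, and only Boolean tests and shifts are needed -- which is what
    makes the approach attractive on an FPGA."""
    half = len(vals) // 2
    median = 0
    for b in range(bits - 1, -1, -1):
        trial = median | (1 << b)
        # if more than half the samples are >= trial, this bit is in the median
        if sum(1 for v in vals if v >= trial) > half:
            median = trial
    return median
```

The loop finds the largest value x with at least half+1 samples >= x, which for an odd window is exactly the median; in hardware the per-bit count is a population count over comparator outputs.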
Real-Time Video Processing
4K-based intra and interprediction techniques for HEVC
D. G. Fernández, A. A. Del Barrio, Guillermo Botella, et al.
The HEVC/H.265 standard, released in 2013, halves the required bandwidth in comparison with the previous H.264 standard. This opens the door to many relevant applications in multimedia video coding and transmission. Thanks to the HEVC improvements, the real-time constraints of 4K and 8K Ultra High Definition video can be met. Nonetheless, HEVC implementations require a vast amount of resources. In this contribution we propose intra- and inter-prediction techniques that diminish the HEVC complexity while complying with the real-time and quality constraints. The performance is noticeably increased when compared with the HM16.2 reference software as well as the x265 encoder, while maintaining similar quality.
In-network adaptation of SHVC video in software-defined networks
Olatunde Awobuluyi, James Nightingale, Qi Wang, et al.
Software Defined Networking (SDN), when combined with Network Function Virtualization (NFV), represents a paradigm shift in how future networks will behave and be managed. SDNs are expected to provide the underpinning technologies for future innovations such as 5G mobile networks and the Internet of Everything. The SDN architecture offers features that facilitate an abstracted and centralized global network view in which packet forwarding or dropping decisions are based on application flows. Software Defined Networks facilitate a wide range of network management tasks, including the adaptation of real-time video streams as they traverse the network. SHVC, the scalable extension of the recent H.265 standard, is a new video encoding standard that supports ultra-high definition (UHD) video streams with spatial resolutions of up to 7680×4320 and frame rates of 60 fps or more. The massive increase in bandwidth required to deliver these UHD video streams dwarfs the bandwidth requirements of current high definition (HD) video and poses very significant challenges for network operators. In this paper we go substantially beyond the limited number of existing implementations and proposals for video streaming in SDNs, all of which have primarily focused on traffic engineering solutions such as load balancing. By implementing and empirically evaluating an SDN-enabled Media Adaptation Network Entity (MANE) we provide valuable empirical insight into the benefits and limitations of SDN-enabled video adaptation for real-time video applications. The SDN-MANE is the video adaptation component of our Video Quality Assurance Manager (VQAM) SDN control plane application, which also includes an SDN monitoring component to acquire network metrics and a decision-making engine whose algorithms determine the optimum adaptation strategy for any real-time video application flow given the current network conditions.
Our proposed VQAM application has been implemented and evaluated on an SDN, allowing us to provide important benchmarks for video streaming over SDN and for SDN control plane latency.
The QoE implications of ultra-high definition video adaptation strategies
James Nightingale, Olatunde Awobuluyi, Qi Wang, et al.
As the capabilities of high-end consumer devices increase, streaming and playback of Ultra-High Definition (UHD) video is set to become commonplace. The move to these new, higher-resolution video services is one of the main factors contributing to the predicted continuation of growth in video-related traffic in the Internet. This massive increase in bandwidth requirements, even when mitigated by the use of new video compression standards such as H.265, will place an ever-increasing burden on network service providers. This will be especially true in mobile environments, where users have come to expect ubiquitous access to content. Consequently, delivering UHD and Full UHD (FUHD) video content is one of the key drivers for future Fifth Generation (5G) mobile networks. One often voiced, but as yet unanswered, question is whether users of mobile devices with modest screen sizes (e.g. smartphones or smaller tablets) will actually benefit, in terms of an improved user experience, from consuming the much higher bandwidth required to watch online UHD video. In this paper, we use scalable H.265 encoded video streams to conduct a subjective evaluation of the impact on a user's perception of video quality across a comprehensive range of adaptation strategies, covering each of the three adaptation domains, for UHD and FUHD video. The results of our subjective study provide insightful and useful indications of which methods of adapting UHD and FUHD streams have the least impact on users' perceived QoE. In particular, it was observed that, in over 70% of cases, users were unable to distinguish between full HD (1080p) and UHD (4K) videos when they were unaware of which version was being shown to them. Our results can be used to derive adaptation rule sets that facilitate fast, QoE-aware in-network adaptation of video streams in support of real-time adaptation objectives.
Undoubtedly they will also promote discussion around how network service providers manage their relationships with end users, and how service level agreements might be shaped to account for what may be viewed as 'unproductive' use of bandwidth that delivers only marginal or imperceptible improvements in viewing experience.
Real-time multi-camera video acquisition and processing platform for ADAS
The paper presents the design of a real-time, low-cost embedded system for image acquisition and processing in Advanced Driver Assistance Systems (ADAS). The system adopts a multi-camera architecture to provide a panoramic view of the objects surrounding the vehicle. Fish-eye lenses are used to achieve a large Field of View (FOV). Since they introduce radial distortion of the images projected onto the sensors, a real-time algorithm for distortion correction is also implemented in a pre-processor. An FPGA-based hardware implementation, re-using IP macrocells for several ADAS algorithms, allows for real-time processing of the input streams from VGA automotive CMOS cameras.
Ghost removing for HDR real-time video stream generation
High dynamic range (HDR) image generation from a set of low dynamic range images taken with different exposure times is a low-cost and easy technique that provides good results for static scenes. Temporal exposure bracketing cannot be applied directly to dynamic scenes, since camera or object motion between bracketed exposures creates ghosts in the resulting HDR image. In this paper we describe a real-time ghost-removal hardware implementation added to the high dynamic range video flow of our FPGA-based HDR smart camera, which is able to provide a full-resolution (1280 x 1024) HDR video stream at 60 fps. We present experimental results to show the efficiency of the implemented ghost-removal method.
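The underlying ghost test can be sketched in software: map each bracketed pixel to radiance and exclude exposures whose radiance disagrees with a reference frame. The tolerance and the exposure-time weighting below are illustrative assumptions, not the paper's hardware parameters:

```python
def merge_hdr(exposures, times, ref=0, tol=0.25):
    """Exposure-bracket merge with a simple ghost test. Each LDR value is
    mapped to radiance (value / exposure time); brackets whose radiance
    deviates from the reference frame by more than `tol` (relative) are
    treated as moving content and dropped, so the reference alone defines
    those pixels. Longer exposures get larger weights (less photon noise)."""
    n = len(exposures[0])
    out = []
    for i in range(n):
        ref_rad = exposures[ref][i] / times[ref]
        acc = wsum = 0.0
        for img, t in zip(exposures, times):
            rad = img[i] / t
            if ref_rad == 0 or abs(rad - ref_rad) / max(ref_rad, 1e-6) <= tol:
                acc += rad * t          # weight by exposure time
                wsum += t
        out.append(acc / wsum if wsum else ref_rad)
    return out
```

For a static pixel every bracket agrees and all exposures contribute; for a pixel crossed by a moving object, only the reference bracket survives, which is what suppresses the ghost.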
Architecture of web services in the enhancement of real-time 3D video virtualization in cloud environment
Adedayo Bada, Qi Wang, Jose M. Alcaraz-Calero, et al.
This paper proposes a new approach to improving 3D video rendering and streaming by jointly exploring and optimizing both cloud-based virtualization and web-based delivery. The proposed web service architecture firstly establishes a software virtualization layer based on QEMU (Quick Emulator), an open-source virtualization package that has been able to virtualize system components except for 3D rendering, which is still in its infancy. The architecture then exploits the cloud environment to boost the speed of rendering at the QEMU software virtualization layer. The capabilities and inherent limitations of Virgil 3D, one of the most advanced 3D virtual Graphics Processing Units (GPUs) available, are analyzed through benchmarking experiments and integrated into the architecture to further speed up the rendering. Experimental results are reported and analyzed to demonstrate the benefits of the proposed approach.
Contour-based object orientation estimation
Boris Alpatov, Pavel Babayan
Real-time object orientation estimation is a relevant problem in computer vision today. In this paper we propose an approach to estimating the orientation of objects lacking axial symmetry. The proposed algorithm is intended to estimate the orientation of a specific known 3D object, so a 3D model is required for learning. The algorithm consists of two stages: learning and estimation. The learning stage is devoted to exploring the studied object: using the 3D model, we gather a set of training images by capturing the model from viewpoints evenly distributed on a sphere. The points are distributed on the sphere according to the geosphere principle, which minimizes the size of the training image set. The gathered training images are used to calculate descriptors, which are then used in the estimation stage. The estimation stage focuses on matching the descriptor of an observed image against the training image descriptors. The experimental research was performed using a set of images of an Airbus A380. The proposed orientation estimation algorithm showed good accuracy (mean error below 6°) in all case studies. The real-time performance of the algorithm was also demonstrated.
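For the viewpoint sampling step, a commonly used stand-in for an even spherical distribution is the Fibonacci spiral; the paper uses the geosphere (subdivided icosahedron) principle instead, so the sketch below only illustrates the goal of near-uniform viewpoint coverage:

```python
import math

def sphere_viewpoints(n):
    """Spread n camera positions near-uniformly on the unit sphere using the
    Fibonacci spiral: latitudes are evenly spaced in z, longitudes advance
    by the golden angle, avoiding clustering at the poles."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n          # even spacing in height
        r = math.sqrt(1.0 - z * z)             # radius of that latitude circle
        theta = golden_angle * i
        pts.append((r * math.cos(theta), r * math.sin(theta), z))
    return pts
```

Rendering the 3D model from each returned direction yields the training image set from which the descriptors are computed.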
Poster Session
Distance and speed measurements from monocular images
A method of estimating vehicle height, width and speed from images obtained by a monocular camera is presented. The method is based on the detection and tracking of vehicle license plates. The distance between the license plate and the camera is calculated from its pixel coordinates. The method makes no assumptions about the camera mounting height. The computational cost and the processing time are reduced by using tilt measurements provided by a microelectromechanical sensor and field-of-view data obtained prior to installation.
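The distance computation rests on the standard pinhole relation: an object of known physical width W imaged w pixels wide by a camera with focal length f (in pixels) lies at distance Z = f·W / w. The numeric values below are illustrative, not taken from the paper:

```python
def plate_distance(f_px, plate_width_m, width_px):
    """Pinhole ranging from a license plate of known width:
    Z = f * W / w, with f in pixels, W in metres, w in pixels."""
    return f_px * plate_width_m / width_px

# EU plates are 0.52 m wide; with f = 1000 px and a 40 px wide detection:
print(plate_distance(1000, 0.52, 40))   # 13.0 metres
```

Tracking the detected plate over successive frames then gives speed as the change in Z (and lateral position) divided by the frame interval.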
Parallel multilayer perceptron neural network used for hyperspectral image classification
Beatriz P. Garcia-Salgado, Volodymyr I. Ponomaryov, Marco A. Robles-Gonzalez
This study focuses on time optimization for the classification problem, presenting a comparison of five Artificial Neural Network Multilayer Perceptron (ANN-MLP) architectures. We use Artificial Neural Networks (ANN) because they can recognize patterns in data in less time. Time and classification accuracy are considered together in the comparison. For the time comparison, two computational paradigms are analysed for each ANN-MLP architecture, using three schemes. Firstly, sequential programming is applied using a single CPU core. Secondly, parallel programming is employed over a multi-core CPU architecture. Finally, a programming model running on a GPU architecture is implemented. Furthermore, the classification accuracy of the proposed five ANN-MLP architectures is compared with a state-of-the-art Support Vector Machine (SVM) in three classification frames: 50%, 60% and 70% of the data set's observations are randomly selected to train the classifiers. A visual comparison of the classification results is also presented, and the Peak Signal to Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) criteria are calculated to characterise visual perception. The images employed were acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS), the Reflective Optics System Imaging Spectrometer (ROSIS) and the Hyperion sensor.
Swarming visual sensor network for real-time multiple object tracking
Yuri P. Baranov, Sergey N. Yarishev, Roman V. Medvedev
Position control of multiple objects is one of the most pressing problems in various technology areas. For example, in construction this problem takes the form of multi-point deformation control of bearing constructions in order to prevent collapse; in mining, deformation control of lining constructions; in rescue operations, locating potential victims and sources of ignition; in transport, traffic control and detection of traffic violations; in robotics, motion control for an organized group of robots; and many other problems in different areas. Using stationary devices to solve these problems is impractical due to the complex and variable geometry of the control areas. In these cases, self-organized systems of moving visual sensors are the best solution. This paper presents a concept of a scalable visual sensor network with a swarm architecture for multiple object pose estimation and real-time tracking. Recent developments in distributed measuring systems are reviewed, the advantages and disadvantages of existing systems are investigated, and the theoretical design principles of a swarming visual sensor network (SVSN) are stated. To measure object coordinates in the world coordinate system using a TV camera, the intrinsic (focal length, pixel size, principal point position, distortion) and extrinsic (rotation matrix, translation vector) calibration parameters need to be determined. Robust camera calibration is too resource-intensive a task for a moving camera; in this situation the position of the camera is usually estimated using a visual mark with known parameters, and all measurements are performed in mark-centered coordinate systems. A general adaptive algorithm for coordinate conversion between devices with various intrinsic parameters is developed, and various network topologies are reviewed.
Minimum error in object tracking is achieved by finding the shortest path between the tracked object and the bearing sensor, which sets the global coordinate system. Weight coefficients are determined by the experimental studies of the system's sensors presented in this article. The conclusions obtained from this work form the basis for the production of SVSN prototypes and their future study.
Analysis and segmentation of images in case of solving problems of detecting and tracing objects on real-time video
The article deals with methods of image segmentation based on color space conversion that allow efficient detection of a single color against a complex background and under varying lighting, as well as detection of objects on a homogeneous background. The results of the analysis of segmentation algorithms of this type and the possibility of their implementation in software are presented. The implemented algorithm is computationally demanding, which limits its application to video analysis; however, it allows us to solve the problem of analysing objects in an image when no image dictionary or knowledge base is available, as well as the problem of choosing the optimal frame quantization parameters for video analysis.
Automatic finger joint synovitis localization in ultrasound images
Karolina Nurzynska, Bogdan Smolka
Long-lasting inflammation of the joints results, among other conditions, in many forms of arthritis. When not treated, it may affect other organs and the patient's general health. Therefore, early detection and proper medical treatment are of great value. The patient's organs are scanned with high-frequency acoustic waves, which enable visualization of interior body structures in an ultrasound sonography (USG) image. Although the procedure is standardized, different projections result in a variety of possible data, which should be analyzed in a short period of time by a physician using medical atlases as guidance. This work introduces an efficient framework, based on a statistical approach to the finger joint USG image, that enables automatic localization of the skin and bone regions, which are then used to localize the finger joint synovitis area. The processing pipeline performs the task in real time and proves highly accurate when compared to annotations prepared by an expert.
Optimized adaptation algorithm for HEVC/H.265 dynamic adaptive streaming over HTTP using variable segment duration
Adaptive video streaming using HTTP has become popular in recent years for commercial video delivery. The recent MPEG-DASH standard allows interoperability and adaptability between servers and clients from different vendors. The delivery of the MPD (Media Presentation Description) files in DASH and the DASH client behaviours are beyond the scope of the DASH standard. However, the different adaptation algorithms employed by clients do affect the overall performance of the system and the users' QoE (Quality of Experience), hence the need for research in this field. Moreover, standard DASH delivery is based on fixed segments of the video, yet there is no standard segment duration for DASH: various fixed segment durations have been employed by different commercial solutions and researchers, each with its own merits. Most recently, the use of variable segment duration in DASH has emerged, but only a few preliminary studies without practical implementation exist. In addition, such a technique requires a DASH client to be aware of segment duration variations, and this requirement and the corresponding implications for DASH system design have not been investigated. This paper proposes a segment-duration-aware bandwidth estimation and next-segment selection adaptation strategy for DASH. Firstly, an MPD file extension scheme to support variable segment duration is proposed and implemented in a realistic hardware testbed. The scheme is tested on a DASH client, and the tests and analysis have led to insight into the time to download the next segment and the buffer behaviour when fetching and switching between segments of different playback durations. Issues such as sustained buffering when switching between segments of different durations and slow response to changing network conditions are highlighted and investigated.
An enhanced adaptation algorithm is then proposed to accurately estimate the bandwidth and precisely determine the time to download the next optimal segment, considering the variable segment duration. Furthermore, objective metrics are employed to highlight the merits of the compression efficiency achieved by using longer segment durations for higher-bitrate representations.
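The segment-duration-aware selection idea can be sketched as follows; the safety margin and the simple averaging of throughput samples are illustrative assumptions, not the paper's exact algorithm:

```python
def next_bitrate(representations, history, safety=0.8):
    """Pick the representation (bits/s) for the next segment. Throughput is
    estimated from (bytes / download time) of recently fetched segments --
    each sample already normalises out that segment's playback duration,
    which is what makes the estimate valid under variable segment duration.
    The highest representation below safety * estimate is chosen."""
    # history: list of (downloaded_bytes, download_seconds) per past segment
    samples = [8 * b / t for b, t in history]        # bits per second
    estimate = sum(samples) / len(samples)
    feasible = [r for r in representations if r <= safety * estimate]
    return max(feasible) if feasible else min(representations)
```

A fuller client would also weight recent samples more heavily and factor in the current buffer level before committing to a switch.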
Automatic detection and classification of obstacles with applications in autonomous mobile robots
Volodymyr I. Ponomaryov, Dario I. Rosas-Miranda
A hardware implementation of automatic detection and classification of objects that can represent obstacles for an autonomous mobile robot, using stereo vision algorithms, is presented. We propose and evaluate a new method to detect and classify objects for a mobile robot in outdoor conditions. The method is divided into two parts: the first is the object detection step, based on the distance from the objects to the camera and a BLOB analysis; the second is the classification step, based on visual primitives and an SVM classifier. The proposed method runs on a GPU in order to reduce processing time. This is performed with hardware based on multi-core processors and a GPU platform, using an NVIDIA® GeForce® GT640 graphics card and Matlab on a PC running Windows 10.
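The two-stage pipeline above can be sketched in miniature as follows. The distance threshold, the 4-connected BLOB labelling, and the linear-SVM decision rule are illustrative stand-ins for the paper's GPU implementation; all names and values are assumptions.

```python
# Illustrative two-stage pipeline: (1) detect obstacle candidates via a
# distance threshold plus BLOB (connected-component) analysis on a depth
# map, (2) classify each blob with a pre-trained linear SVM decision
# function. Thresholds, weights and features are placeholders.

def detect_blobs(depth, max_dist):
    """Return 4-connected components of pixels closer than max_dist."""
    h, w = len(depth), len(depth[0])
    labels = [[0] * w for _ in range(h)]
    blobs, next_label = [], 1
    for y in range(h):
        for x in range(w):
            if depth[y][x] < max_dist and labels[y][x] == 0:
                stack, blob = [(y, x)], []
                labels[y][x] = next_label
                while stack:                      # iterative flood fill
                    cy, cx = stack.pop()
                    blob.append((cy, cx))
                    for ny, nx in ((cy-1, cx), (cy+1, cx),
                                   (cy, cx-1), (cy, cx+1)):
                        if (0 <= ny < h and 0 <= nx < w and
                                depth[ny][nx] < max_dist and
                                labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            stack.append((ny, nx))
                blobs.append(blob)
                next_label += 1
    return blobs

def svm_classify(features, weights, bias):
    """Linear SVM decision: positive score means 'obstacle'."""
    return sum(f * w for f, w in zip(features, weights)) + bias > 0

depth = [[9, 9, 2, 2],
         [9, 9, 2, 9],
         [3, 9, 9, 9]]
blobs = detect_blobs(depth, max_dist=5)   # finds two nearby regions
```

In the full system each blob would be described by visual primitives (area, shape, texture) before the SVM decision; here a plain feature vector stands in for them.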
Real-time framework for tensor-based image enhancement for object classification
In many practical situations, visual pattern recognition is heavily burdened by the low quality of input images due to noise and geometrical distortions, as well as by low-quality acquisition hardware. Although image quality improvement techniques such as nonlinear filtering exist, only a few attempts reported in the literature try to build these enhancement methods into a complete chain for multi-dimensional object recognition, such as color video or hyperspectral images. In this work we propose a joint multilinear signal filtering and classification system built upon the multi-dimensional (tensor) approach. Tensor filtering is performed by projecting the multi-dimensional input signal into the tensor subspace spanned by the best-rank tensor decomposition. Object classification, in turn, is done by constructing a tensor subspace based on the Higher-Order Singular Value Decomposition (HOSVD) applied to the prototype patterns. Our experiments show that the proposed chain achieves high object recognition accuracy in real time, even from poor-quality prototypes. More importantly, the proposed framework allows unified classification of signals of any dimensionality, such as color images or video sequences, which are exemplars of 3D and 4D tensors, respectively. The paper also discusses practical issues related to the implementation of the key components of the proposed system.
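The tensor-subspace projection the abstract refers to can be written out explicitly. The following is a standard formulation for a third-order tensor, not a reproduction of the paper's derivation; the symbols are generic.

```latex
% For an input tensor $\mathcal{T} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$
% (e.g. a colour image as a 3D tensor), the best rank-$(R_1,R_2,R_3)$
% approximation keeps orthonormal factor matrices
% $U_n \in \mathbb{R}^{I_n \times R_n}$ and projects each mode onto its
% dominant subspace:
\hat{\mathcal{T}}
  = \mathcal{T} \times_1 U_1 U_1^{\mathsf{T}}
                \times_2 U_2 U_2^{\mathsf{T}}
                \times_3 U_3 U_3^{\mathsf{T}} .
```

Classification then assigns a test pattern to the class whose HOSVD-derived subspace yields the smallest residual $\lVert \mathcal{X} - \hat{\mathcal{X}}_c \rVert$, which is what allows the same machinery to handle 3D and 4D tensors uniformly.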
Development of the software for images segmentation and objects detecting on video
This article considers implementation features of segmentation algorithms in software for detecting and tracking objects in streaming video. Particular attention is paid to the choice of segmentation algorithm and to the implementation of the classifier, based on Haar cascades, that locates the required object in the image. The main algorithms underlying the software and examples of the program's operation are given.
Real-time and low-cost embedded platform for car's surrounding vision system
Sergio Saponara, Emilio Franchi
The design and implementation of a flexible, low-cost embedded system for real-time vision of a car's surroundings is presented. The target of the proposed multi-camera vision system is to give the driver a better view of the objects that surround the vehicle. Fish-eye lenses are used to achieve a larger Field of View (FOV) but, on the other hand, introduce radial distortion in the images projected onto the sensors. With low-cost cameras there can also be alignment issues. Since these complications are noticeable and dangerous, a real-time algorithm for their correction is presented, followed by another real-time algorithm that merges the four camera video streams into a single view. Real-time image processing is achieved through a hardware-software platform.
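The radial-distortion correction mentioned above can be illustrated with a simple polynomial (Brown-style) lens model. The coefficients `k1`, `k2` below are hypothetical calibration values, and the single-division inversion is a simplification of what a real fish-eye pipeline would do.

```python
# Minimal sketch of radial-distortion correction with a polynomial model.
# k1, k2 are made-up calibration coefficients; (cx, cy) is the principal
# point. A production system would precompute a full remapping table.

def undistort_point(xd, yd, k1, k2, cx, cy):
    """Map a distorted pixel (xd, yd) toward its undistorted position
    using the model x_u = x_d / (1 + k1*r^2 + k2*r^4), applied about
    the principal point."""
    x, y = xd - cx, yd - cy
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    return cx + x / scale, cy + y / scale

# The principal point is unaffected; peripheral pixels move inward,
# which is how barrel (fish-eye) distortion is undone.
u, v = undistort_point(640, 240, k1=2e-7, k2=1e-13, cx=320, cy=240)
```

In the real system this mapping would be baked into a lookup table so that each output pixel is fetched in constant time, which is what makes per-frame correction feasible on an embedded platform.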
Static hand gesture recognition based on finger root-center-angle and length weighted Mahalanobis distance
Static hand gesture recognition (HGR) has drawn increasing attention in computer vision and human-computer interaction (HCI) recently because of its great potential. However, HGR is a challenging problem due to the variations of gestures. In this paper, we present a new framework for static hand gesture recognition. Firstly, the key joints of the hand, including the palm center, the fingertips and finger roots, are located. Secondly, we propose novel and discriminative features called root-center-angles to alleviate the influence of the variations of gestures. Thirdly, we design a distance metric called finger length weighted Mahalanobis distance (FLWMD) to measure the dissimilarity of the hand gestures. Experiments demonstrate the accuracy, efficiency and robustness of our proposed HGR framework.
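A length-weighted Mahalanobis-style distance of the kind the abstract names can be sketched as below. The diagonal-covariance simplification, the normalised length weighting, and all numeric values are illustrative assumptions, not the exact FLWMD definition from the paper.

```python
import math

# Hedged sketch of a finger-length-weighted Mahalanobis-style distance
# between two gestures described by per-finger root-center-angle
# features. Weighting and covariance handling are illustrative only.

def flwmd(angles_a, angles_b, variances, finger_lengths):
    """Weight each finger's squared Mahalanobis term by its relative
    length, so longer (more reliably located) fingers dominate."""
    total_len = sum(finger_lengths)
    return math.sqrt(sum(
        (l / total_len) * (a - b) ** 2 / v
        for a, b, v, l in zip(angles_a, angles_b,
                              variances, finger_lengths)))

d = flwmd([0.50, 0.90, 1.20, 1.45, 1.70],   # gesture A angles (rad)
          [0.52, 0.88, 1.25, 1.40, 1.70],   # gesture B angles (rad)
          [0.01] * 5,                        # per-feature variances
          [40, 70, 80, 75, 60])              # finger lengths (px)
```

Recognition would then assign the input gesture to the template with the smallest such distance.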
A computationally efficient denoising and hole-filling method for depth image enhancement
Depth maps captured by Kinect depth cameras are being widely used for 3D action recognition. However, such images often appear noisy and contain missing pixels or black holes. This paper presents a computationally efficient method for both denoising and hole-filling in depth images. The denoising is achieved by utilizing a combination of Gaussian kernel filtering and anisotropic filtering. The hole-filling is achieved by utilizing a combination of morphological filtering and zero block filtering. Experimental results using the publicly available datasets are provided indicating the superiority of the developed method in terms of both depth error and computational efficiency compared to three existing methods.