Proceedings Volume 8499

Applications of Digital Image Processing XXXV


View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 4 October 2012
Contents: 10 Sessions, 78 Papers, 0 Presentations
Conference: SPIE Optical Engineering + Applications 2012
Volume Number: 8499

Table of Contents

  • Front Matter: Volume 8499
  • Image Signal Processing
  • High Dynamic Range Imaging
  • Distributed Video Coding
  • HEVC: An Emerging Video Standard I
  • HEVC: An Emerging Video Standard II
  • Low Complexity and Highly Parallel Image Coding
  • Computer Vision Techniques and Applications
  • Mobile Video and Application
  • Poster Session
Front Matter: Volume 8499
Front Matter: Volume 8499
This PDF file contains the front matter associated with SPIE Proceedings Volume 8499, including the Title Page, Copyright information, Table of Contents, and Conference Committee listing.
Image Signal Processing
Comparative study of resolution improvement of optical intrinsic signal imaging by extracting outlier images during data analysis
Optical intrinsic signal imaging (OISI) is a functional neuroimaging technique that measures changes in cortical light reflectance induced in vivo by changes in both cortical absorption and scattering. These changes are spatially correlated with neuronal activity and are due to changes in hemoglobin concentration and cell swelling. Typically, a light source at 630 nm illuminates the exposed cortex to emphasize changes in deoxyhemoglobin, and a CCD camera acquires the reflected light during each trial (stimulation). A trial consists of recording multiple consecutive frames to minimize noise during image acquisition. Unfortunately, during trial processing both good- and poor-quality images are combined, resulting in an overall degradation of resolution performance. The present study describes the performance evaluation of an algorithm developed to detect and screen out these poor images (outliers) during OISI analysis. The algorithm's performance was tested on a rodent model, and the experimental results highlight its potential for enhancing the resolution of the active area in the final OISI images.
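The abstract does not spell out the screening criterion, so the sketch below shows one plausible, hypothetical way to flag poor trial frames: correlate each frame against the median frame of the trial and discard frames whose correlation is anomalously low. The function names, threshold, and synthetic data are assumptions made for illustration only.

    # Hypothetical outlier screening for trial frames, assuming frames is a
    # (num_frames, H, W) float array of cortical reflectance images.
    import numpy as np

    def screen_outlier_frames(frames, z_thresh=2.5):
        """Flag frames whose correlation with the median frame is anomalously low."""
        reference = np.median(frames, axis=0)          # robust reference image
        ref_flat = reference.ravel() - reference.mean()
        scores = []
        for frame in frames:
            f = frame.ravel() - frame.mean()
            corr = f @ ref_flat / (np.linalg.norm(f) * np.linalg.norm(ref_flat) + 1e-12)
            scores.append(corr)
        scores = np.asarray(scores)
        # a frame is an outlier if its correlation is far below the typical value
        z = (scores - scores.mean()) / (scores.std() + 1e-12)
        keep = z > -z_thresh
        return frames[keep], keep

    # Example: average only the retained frames of a simulated trial.
    trial = np.random.rand(32, 128, 128)
    clean, mask = screen_outlier_frames(trial)
    activation_map = clean.mean(axis=0)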
Super-resolution preprocessing of data from undersampled imaging systems for phase diversity
Eric A. Shields
Phase diversity algorithms allow the wavefront and an estimate of the scene to be reconstructed from multiple images with a known phase change between measurements. These algorithms rely on sampling requirements that are frequently not met in remote sensing imaging systems. It is demonstrated that super-resolution pre-processing of imagery from undersampled systems can effectively increase the sampling, thereby allowing application of traditional phase diversity algorithms. Experimental results are presented for both a point object and an extended scene.
Statistics of Fresnelet coefficients in PSI holograms
Marc Wilke, Alok Kumar Singh, Ahmad Faridian, et al.
Advances in computer technology are moving real-time capable, digital holography into the realm of near-future feasibility. The small pixel size required in the recording of even small objects and the large detector area (high numerical aperture in a lensless recording setup) required for high resolution reconstruction result in large amounts of data, especially considering real-time video applications. The special requirements posed by digital holographic microscopy using lasers operating in the UV range are another application generating large quantities of data that suggest the use of compression for transmission and storage. Holograms differ significantly from natural images, as both the intensity and the phase of the incoming wavefront are recorded. The information about the recorded object is non-localized in the detector plane, and in many applications the phase is far more important than the intensity as it provides information about different optical path lengths (e.g. distance and thus shape in metrology, presence of transparent structures in microscopy). This paper examines the statistical properties of PSI holograms. The holograms are transformed using Fresnelets, a wavelet analysis of the reconstructed wavefront in the object plane. Since the wavefront is complex valued, the complex amplitude has been separated into real-valued phase and amplitude before wavelet transformation. The results show that while the phase can be statistically modeled using a Generalized Gaussian Distribution (GGD) with exponent α ≈ 1.5, the statistics of the amplitude seem to be the result of two separable components, each corresponding to a GGD. These are identified as the speckle field caused by sub-wavelength surface roughness with α ≈ 2 and the actual object with α ≈ 1. These results suggest the separate application of classical image compression based on GGD statistics in the subbands to the phase, the speckle amplitude and the object amplitude.
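For readers who want to reproduce the subband modelling step, the following minimal sketch fits a Generalized Gaussian Distribution (scipy's gennorm, whose shape parameter plays the role of the α above) to synthetic coefficients; the data, seed, and scale are assumptions, not hologram measurements.

    # Fit a GGD to a (synthetic) wavelet/Fresnelet subband and read off the shape exponent.
    import numpy as np
    from scipy.stats import gennorm

    rng = np.random.default_rng(0)
    # stand-in for zero-mean phase coefficients of one subband
    subband = gennorm.rvs(beta=1.5, scale=0.1, size=50_000, random_state=rng)

    # fix the location at zero, as appropriate for zero-mean subbands
    alpha, loc, scale = gennorm.fit(subband, floc=0.0)
    print(f"estimated GGD exponent alpha ~ {alpha:.2f}, scale ~ {scale:.3f}")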
Performance evaluation of consumer-grade 3D sensors for static 6DOF pose estimation systems
J. A. Marvel, M. Franaszek, J. L. Wilson, et al.
Low-cost 3D depth and range sensors are steadily becoming more widely available and affordable, and thus popular for robotics enthusiasts. As basic research tools, however, their accuracy and performance are relatively unknown. In this paper, we describe a framework for performance evaluation and measurement error analysis for 6 degrees of freedom pose estimation systems using traceable ground truth instruments. Characterizing sensor drift and variance, and quantifying range, spatial and angular accuracy, our framework focuses on artifact surface fitting and static pose analysis, reporting testing and environmental conditions in compliance with the upcoming ASTM E57.02 standard.
Adaptive bilateral filter for video and image upsampling
Rahul Vanam, Yan Ye
Upsampling is a post-processing method for increasing the spatial resolution of an image or video. Most video players and image viewers support upsampling functionality. Sometimes upsampling can introduce blurring, ringing, and jaggedness artifacts in the upsampled video or image thereby lowering its visual quality. In this paper, we present an adaptive bilateral interpolation filter for upsampling a video or image by an arbitrary upsampling factor, and show that it mitigates most of the artifacts produced by conventional upsampling methods.
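As a rough, hypothetical illustration of bilateral-weighted interpolation (not the authors' specific adaptive filter design), the sketch below forms each high-resolution sample as a weighted average of nearby low-resolution pixels, with a spatial Gaussian weight and a range weight taken against a cheap bilinear guess; the window size and sigma values are assumed.

    # Bilateral-weighted upsampling sketch (slow reference implementation).
    import numpy as np
    from scipy.ndimage import zoom

    def bilateral_upsample(lr, factor, sigma_s=1.0, sigma_r=0.1, radius=2):
        guide = zoom(lr, factor, order=1)            # cheap initial upsampling for the range term
        H, W = guide.shape
        out = np.zeros_like(guide)
        for y in range(H):
            for x in range(W):
                cy, cx = y / factor, x / factor      # position in low-resolution coordinates
                y0, x0 = int(round(cy)), int(round(cx))
                num, den = 0.0, 0.0
                for dy in range(-radius, radius + 1):
                    for dx in range(-radius, radius + 1):
                        yy, xx = y0 + dy, x0 + dx
                        if 0 <= yy < lr.shape[0] and 0 <= xx < lr.shape[1]:
                            ws = np.exp(-((yy - cy)**2 + (xx - cx)**2) / (2 * sigma_s**2))
                            wr = np.exp(-((lr[yy, xx] - guide[y, x])**2) / (2 * sigma_r**2))
                            num += ws * wr * lr[yy, xx]
                            den += ws * wr
                out[y, x] = num / den
        return out

    hr = bilateral_upsample(np.random.rand(16, 16), factor=2)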
Comparison of refractive power maps from a reference surface: geometric versus Zernike power polynomials
Angel S. Cruz Felix, Sandra Balderas-Mata, Estela López-Olazagasti, et al.
In recent years there have been many advances in the field of visual optics, such as new technologies to measure and to analyze the wavefront aberration function of the human eye. In this direction, corneal topographers have been extensively used as a tool to obtain data from which the refractive power maps of the human cornea can be computed in order to characterize the optical function of the eye. On the other hand, it is well known that the optical aberrations of the human eye can be described as an expansion in Zernike polynomials. In this work we present a qualitative comparison of a refractive power map from a reference refractive surface obtained with an alternative form of representation of the wavefront aberrations in the dioptric power domain, first proposed in 2007 by Iskander et al. [1], and the usual geometrical representation of refractive power maps. We present our preliminary results from this comparison.
Image reconstruction from compressive samples via a max-product EM algorithm
We propose a Bayesian expectation-maximization (EM) algorithm for reconstructing structured approximately sparse signals via belief propagation. The measurements follow an underdetermined linear model where the regression-coefficient vector is the sum of an unknown approximately sparse signal and a zero-mean white Gaussian noise with an unknown variance. The signal is composed of large- and small-magnitude components identified by binary state variables whose probabilistic dependence structure is described by a hidden Markov tree (HMT). Gaussian priors are assigned to the signal coefficients given their state variables and the Jeffreys’ noninformative prior is assigned to the noise variance. Our signal reconstruction scheme is based on an EM iteration that aims at maximizing the posterior distribution of the signal and its state variables given the noise variance. We employ a max-product algorithm to implement the maximization (M) step of our EM iteration. The noise variance is a regularization parameter that controls signal sparsity. We select the noise variance so that the corresponding estimated signal and state variables (obtained upon convergence of the EM iteration) have the largest marginal posterior distribution. Our numerical examples show that the proposed algorithm achieves better reconstruction performance compared with the state-of-the-art methods.
Two-factor authentication system based on optical interference and one-way hash function
We present a two-factor authentication method to verify the identity of a person who tries to access an optoelectronic system. The method is based on the optical interference principle and a traditional one-way hash function (e.g., MD5). The authentication process is straightforward: the phase key and the password-controlled phase lock of one user are loaded on two Spatial Light Modulators (SLMs) in advance, by which two coherent beams are modulated and then interfere with each other at the output plane, leading to an output image. By comparing the output image with all the standard certification images in the database, the system can verify the user's identity. The system design process, however, involves an iterative Modified Phase Retrieval Algorithm (MPRA). For an authorized user, a phase lock is first created based on a "Digital Fingerprint (DF)", which is the result of a hash function applied to a preselected user password. The corresponding phase key can then be determined by use of the phase lock and a designated standard certification image. Note that the encoding/design process can only be realized by digital means, while the authentication process could be achieved digitally or optically. Computer simulations are given to validate the proposed approach.
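The toy sketch below illustrates the verification idea only, not the paper's MPRA design procedure: an MD5 digest of the password seeds a phase lock, the stored phase key and the lock interfere as two phase-only modulated beams, and the resulting intensity pattern is correlated with the certification image. Array sizes, the seeding scheme, and the correlation test are assumptions.

    # Simplified two-factor verification: hash-derived phase lock + stored phase key.
    import hashlib
    import numpy as np

    N = 64

    def phase_lock_from_password(password, shape=(N, N)):
        digest = hashlib.md5(password.encode()).digest()      # 16-byte "digital fingerprint"
        seed = int.from_bytes(digest, "big") % (2**32)
        rng = np.random.default_rng(seed)                     # fingerprint-controlled phase pattern
        return rng.uniform(0.0, 2.0 * np.pi, shape)

    def interfere(phase_key, phase_lock):
        u = np.exp(1j * phase_key) + np.exp(1j * phase_lock)  # two coherent phase-only beams
        return np.abs(u)**2                                   # intensity at the output plane

    # Enrollment (hypothetical): the MPRA would design the phase key so that the
    # interference reproduces a certification image; here we simply store the result.
    lock = phase_lock_from_password("correct horse battery staple")
    phase_key = np.random.default_rng(1).uniform(0.0, 2.0 * np.pi, (N, N))
    certified = interfere(phase_key, lock)

    # Verification: a wrong password yields a different lock and a low correlation score.
    candidate = interfere(phase_key, phase_lock_from_password("wrong password"))
    score = np.corrcoef(candidate.ravel(), certified.ravel())[0, 1]
    print(f"correlation with certification image: {score:.3f}")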
Image restoration based on topological properties of functions of two variables
Image restoration refers to the problem of estimating an ideal image from its observed degraded one. Functions of two variables can be well described with two variations. One of them is a total variation for continuously differentiable functions. Another one is called a linear variation. The linear variation is a topological characteristic of a function of two variables whereas the total variation is a metrical characteristic of a function. A restoration algorithm based on both total variation-based regularization and variations is proposed. Computer simulation results illustrate the performance of the proposed algorithm for restoration of blurred images.
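As a small worked example of the total-variation (metrical) term discussed above, here is a gradient-descent denoising-style sketch; the paper's algorithm additionally exploits the topological linear variation and handles blur, which this illustration does not.

    # Minimal total-variation regularization sketch (denoising-style gradient descent).
    import numpy as np

    def tv_restore(observed, lam=0.1, step=0.1, iters=200, eps=1e-6):
        u = observed.copy()
        for _ in range(iters):
            gx = np.diff(u, axis=1, append=u[:, -1:])            # forward differences
            gy = np.diff(u, axis=0, append=u[-1:, :])
            mag = np.sqrt(gx**2 + gy**2 + eps)
            # divergence of the normalized gradient field (subgradient of TV)
            div = (np.diff(gx / mag, axis=1, prepend=(gx / mag)[:, :1])
                   + np.diff(gy / mag, axis=0, prepend=(gy / mag)[:1, :]))
            u -= step * ((u - observed) - lam * div)
        return u

    noisy = np.clip(np.eye(64) + 0.2 * np.random.randn(64, 64), 0, 1)
    restored = tv_restore(noisy)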
New image compression algorithm based on improved reversible biorthogonal integer wavelet transform
Libao Zhang, Xianchuan Yu
Low computational complexity and high coding efficiency are the most significant requirements for image compression and transmission. The reversible biorthogonal integer wavelet transform (RB-IWT) achieves low computational complexity through the lifting scheme (LS) and allows both lossy and lossless decoding from a single bitstream. However, RB-IWT degrades the performance and peak signal-to-noise ratio (PSNR) of lossy image coding. In this paper, a new IWT-based compression scheme based on an optimal RB-IWT and an improved SPECK is presented. In the new algorithm, the scaling parameter of each subband is chosen to optimize the transform coefficients. During coding, all image coefficients are encoded using a simple, efficient quadtree partitioning method. The scheme is similar to SPECK, but the new method uses a single quadtree partitioning instead of the set partitioning and octave band partitioning of the original SPECK, which reduces coding complexity. Experimental results show that the new algorithm not only has low computational complexity, but also provides lossy-coding PSNR performance comparable to the SPIHT algorithm using RB-IWT filters and better than the SPECK algorithm. Additionally, the new algorithm efficiently supports both lossy and lossless compression from a single bitstream. The presented algorithm is valuable for future remote sensing image compression.
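To illustrate how a reversible integer wavelet transform stays lossless through the lifting scheme, here is a one-dimensional LeGall 5/3 lifting sketch with periodic boundaries; it is not the paper's optimized RB-IWT with per-subband scaling parameters.

    # Reversible integer 5/3 lifting transform: predict + update with integer shifts.
    import numpy as np

    def lift_53_forward(x):
        x = np.asarray(x, dtype=np.int64)
        even, odd = x[0::2].copy(), x[1::2].copy()
        # predict step: detail (high-pass) coefficients
        odd -= (even + np.roll(even, -1)) >> 1
        # update step: approximation (low-pass) coefficients
        even += (odd + np.roll(odd, 1) + 2) >> 2
        return even, odd

    def lift_53_inverse(even, odd):
        even = even - ((odd + np.roll(odd, 1) + 2) >> 2)
        odd = odd + ((even + np.roll(even, -1)) >> 1)
        x = np.empty(even.size + odd.size, dtype=np.int64)
        x[0::2], x[1::2] = even, odd
        return x

    signal = np.random.randint(0, 256, size=64)
    lo, hi = lift_53_forward(signal)
    assert np.array_equal(lift_53_inverse(lo, hi), signal)   # perfectly reversible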
High Dynamic Range Imaging
Building a high dynamic range video sensor with spatially nonregular optical filtering
Michael Schöberl, Alexander Belz, Arne Nowak, et al.
Although we observe steady progress in the development of High Dynamic Range Video (HDRV) technology, current image sensors still lack the dynamic range needed for high image quality applications. We propose a new imaging principle that is based on a spatial variation of optical Neutral Density (ND) filters on top of some pixels. In existing work, this method has been used to trade spatial resolution for an increase in dynamic range. We improve this approach by a non-regular placement of these filters. The non-regular sampling is an important step, as any sub-sampling with regular patterns leads to aliasing. The non-regular patterns, however, preserve just a single dominant spatial frequency and enable an image reconstruction without aliasing. In combination with a new image reconstruction approach, we are able to recover image details at high resolution. The iterative reconstruction is based on the assumption that natural images can be represented with few coefficients in the Fourier domain. As typical natural images can be classified as near-sparse, the method enables the reconstruction of images of high objective and visual quality. Extending the theory and simulation of this method, we present details of a practical implementation. While building a demonstration system we encountered many challenges, including effects like crosstalk as well as aspects such as sensor selection, mask fabrication, and mounting of the masks.
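A hedged sketch of the reconstruction idea follows: measured pixels at non-regular positions are re-imposed after each iteration while the estimate is hard-thresholded in the Fourier domain, reflecting the near-sparsity assumption; the mask, iteration count, and sparsity level are illustrative choices, not the authors' parameters.

    # Iterative Fourier-sparsity reconstruction from non-regularly sampled pixels.
    import numpy as np

    def reconstruct_from_mask(samples, mask, iters=100, keep_ratio=0.10):
        estimate = samples.copy()
        n_keep = int(keep_ratio * samples.size)
        for _ in range(iters):
            spectrum = np.fft.fft2(estimate)
            # keep only the largest Fourier coefficients (sparsity prior)
            thresh = np.sort(np.abs(spectrum).ravel())[-n_keep]
            spectrum[np.abs(spectrum) < thresh] = 0.0
            estimate = np.real(np.fft.ifft2(spectrum))
            estimate[mask] = samples[mask]        # re-impose the measured pixels
        return estimate

    rng = np.random.default_rng(0)
    image = np.outer(np.sin(np.linspace(0, 8, 128)), np.cos(np.linspace(0, 8, 128)))
    mask = rng.random(image.shape) > 0.5          # non-regular sampling pattern
    observed = np.where(mask, image, 0.0)
    recovered = reconstruct_from_mask(observed, mask)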
Temporal coherency for video tone mapping
Ronan Boitard, Kadi Bouatouch, Remi Cozot, et al.
Tone Mapping Operators (TMOs) aim at converting real world high dynamic range (HDR) images, captured with HDR cameras, into low dynamic range (LDR) images that can be displayed on LDR displays. Several TMOs have been proposed over the last decade, from simple global mapping to more complex operators simulating the human visual system. While these solutions generally work well for still pictures, they are usually less efficient for video sequences, where they are a source of visual artifacts. Only a few of them can be adapted to cope with a sequence of images. In this paper we present a major problem that a static TMO usually encounters while dealing with video sequences, namely temporal coherency. Indeed, as each tone mapper deals with each frame separately, no temporal coherency is taken into account and hence the results can be quite disturbing for highly varying dynamics in a video. We propose a temporal coherency algorithm that is designed to analyze a video as a whole, and from its characteristics adapts each tone-mapped frame of a sequence in order to preserve the temporal coherency. This temporal coherency algorithm has been tested on a set of real as well as Computer Graphics Image (CGI) content and put in competition with several algorithms that are designed to be time-dependent. Results show that temporal coherency preserves the overall contrast in a sequence of images. Furthermore, this technique is applicable to any TMO as it is a post-processing that only depends on the used TMO.
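A deliberately simplified, hypothetical variant of such a post-processing step is sketched below: each tone-mapped frame is rescaled by the ratio of its HDR frame's log-average luminance to the sequence-wide maximum, so relative brightness between frames is preserved. The actual algorithm analyzes more of the sequence characteristics than this.

    # Toy temporal-coherency post-process applied on top of any per-frame TMO output.
    import numpy as np

    def log_average(lum, eps=1e-6):
        return np.exp(np.mean(np.log(lum + eps)))

    def enforce_temporal_coherency(hdr_frames, ldr_frames):
        keys = np.array([log_average(f) for f in hdr_frames])   # frame "keys" of the HDR sequence
        anchor = keys.max()                                      # sequence-level anchor
        coherent = []
        for key, ldr in zip(keys, ldr_frames):
            scale = key / anchor                                 # darker HDR frames stay darker after mapping
            coherent.append(np.clip(ldr * scale, 0.0, 1.0))
        return coherent

    # hdr_frames: list of luminance maps; ldr_frames: per-frame TMO outputs in [0, 1].
    hdr_frames = [np.random.rand(64, 64) * (10 ** i) for i in range(1, 4)]
    ldr_frames = [f / f.max() for f in hdr_frames]
    stable = enforce_temporal_coherency(hdr_frames, ldr_frames)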
A comparative survey on high dynamic range video compression
Alper Koz, Frederic Dufaux
High dynamic range (HDR) video compression has until now been approached by using the high profile of the existing state-of-the-art H.264/AVC (Advanced Video Coding) codec, or by separately encoding a low dynamic range (LDR) video and the residue resulting from the estimation of the HDR video from the LDR video. Although the latter approach has the distinctive advantage of providing backward compatibility with 8-bit LDR displays, the superiority of one approach over the other in terms of the rate-distortion trade-off has not yet been verified. In this paper, we first give a detailed overview of the methods in these two approaches. Then, we experimentally compare the two approaches with respect to different objective and perceptual metrics, such as HDR mean square error (HDR MSE), perceptually uniform peak signal to noise ratio (PU PSNR) and the HDR visible difference predictor (HDR VDP). We first conclude that the methods optimized for backward compatibility with 8-bit LDR displays are superior to the method designed for the high profile encoder, both for 8-bit and 12-bit mappings, in terms of all metrics. Second, using higher bit-depths with a high profile encoder gives better rate-distortion performance than employing an 8-bit mapping with an 8-bit encoder for the same method, in particular when the dynamic range of the video sequence is high. Third, rather than encoding the residue signal in backward compatible methods, changing the quantization step size of the LDR layer encoder would be sufficient to achieve a required quality. In other words, the quality of tone mapping is more important than residue encoding for the performance of HDR image and video coding.
Effect of tone mapping operators on visual attention deployment
Manish Narwaria, Matthieu Perreira Da Silva, Patrick Le Callet, et al.
High Dynamic Range (HDR) images/videos require the use of a tone mapping operator (TMO) when visualized on Low Dynamic Range (LDR) displays. From an artistic intention point of view, TMOs are not necessarily transparent and might induce different behavior when viewing the content. In this paper, we investigate and quantify how TMOs modify visual attention (VA). To that end, both objective and subjective tests in the form of eye-tracking experiments have been conducted on several still image contents processed by 11 different TMOs. Our studies confirm that TMOs can indeed modify human attention and fixation behavior significantly. They therefore suggest that VA needs to be considered when evaluating the overall perceptual impact of TMOs on HDR content. Since existing studies have so far only considered quality or aesthetic appeal, this study brings in a new perspective regarding the importance of VA in HDR content processing for visualization on LDR displays.
Image and video compression for HDR content
Yang Zhang, Erik Reinhard, Dimitris Agrafiotis, et al.
High Dynamic Range (HDR) technology can offer high levels of immersion with a dynamic range meeting and exceeding that of the Human Visual System (HVS). A primary drawback with HDR images and video is that memory and bandwidth requirements are significantly higher than for conventional images and video. Many bits can be wasted coding redundant imperceptible information. The challenge is therefore to develop means for efficiently compressing HDR imagery to a manageable bit rate without compromising perceptual quality. In this paper, we build on previous work of ours and propose a compression method for both HDR images and video, based on an HVS optimised wavelet subband weighting method. The method has been fully integrated into a JPEG 2000 codec for HDR image compression and implemented as a pre-processing step for HDR video coding (an H.264 codec is used as the host codec for video compression). Experimental results indicate that the proposed method outperforms previous approaches and operates in accordance with characteristics of the HVS, tested objectively using a HDR Visible Difference Predictor (VDP). Aiming to further improve the compression performance of our method, we additionally present the results of a psychophysical experiment, carried out with the aid of a high dynamic range display, to determine the difference in the noise visibility threshold between HDR and Standard Dynamic Range (SDR) luminance edge masking. Our findings show that noise has increased visibility on the bright side of a luminance edge. Masking is more consistent on the darker side of the edge.
BoostHDR: a novel backward-compatible method for HDR images
Francesco Banterle, Roberto Scopigno
In this paper, we present BoostHDR, a novel method for compressing high dynamic range (HDR) images. The algorithm leverages a novel segmentation-based tone mapping operator (TMO) which relaxes the no-seams constraint. Our method can work with both JPEG and JPEG2000 encoders. Moreover, it provides better results than the state of the art in HDR image compression in terms of bits per pixel (bpp) and visual quality using objective metrics.
A JPEG backward-compatible HDR image compression
High Dynamic Range (HDR) imaging is expected to become one of the technologies that could shape the next generation of consumer digital photography. Manufacturers are rolling out cameras and displays capable of capturing and rendering HDR images. The popularity and full public adoption of HDR content is however hindered by the lack of standards for quality evaluation, file formats, and compression, as well as the large legacy base of Low Dynamic Range (LDR) displays that are unable to render HDR. To facilitate widespread HDR usage, backward compatibility of HDR technology with commonly used legacy image storage, rendering, and compression is necessary. Although many tone-mapping algorithms have been developed for generating viewable LDR images from HDR content, there is no consensus on which algorithm to use and under which conditions. This paper, via a series of subjective evaluations, demonstrates the dependency of the perceived quality of tone-mapped LDR images on environmental parameters and image content. Based on the results of the subjective tests, it proposes to extend the JPEG file format, as the most popular image format, in a backward compatible manner to also handle HDR pictures. To this end, the paper provides an architecture to achieve such backward compatibility with JPEG and demonstrates the efficiency of a simple implementation of this framework when compared to the state of the art in HDR image compression.
Distributed Video Coding
Side information improvement in transform-domain distributed video coding
A. Abou-Elailah, G. Petrazzuoli, J. Farah, et al.
Side Information (SI) has a strong impact on the rate-distortion performance in distributed video coding. The quality of the SI can be impaired when the temporal distance between the neighboring reference frames increases. In this paper, we introduce two novel methods that allow improving the quality of the SI. In the first approach, we propose a new estimation method for the initial SI using backward and forward motion estimation. The second one consists in re-estimating the SI after decoding all WZFs within the current Group of Pictures (GOP). For this purpose, the SI is first successively refined after each decoded DCT band. Then, after decoding all WZFs within the GOP, we adapt the search area to the motion content. Finally, each already decoded WZF is used, along with the neighboring ones, to estimate a new SI closer to the original WZF. This new SI is then used to reconstruct again the WZF with better quality. The experimental results show that, compared to the DISCOVER codec, the proposed method reaches an improvement of up to 3.53 dB in rate-distortion performance (measured with the Bjontegaard metric) for a GOP size of 8.
Refining WZ rate estimation in DVC with feedback channel constraints
Distributed video coding (DVC) has attracted a lot of attention during the past decade as a new solution for video compression where the computationally most intensive operations are performed by the decoder instead of by the encoder. One very important issue in many current DVC solutions is the use of a feedback channel from the decoder to the encoder for the purpose of determining the rate of the coded stream. The use of such a feedback channel is not only impractical in storage applications but even in streaming scenarios feedback-channel usage may result in intolerable delays due to the typically large number of requests for decoding one frame. Instead of reverting to a feedback-free solution by adding complexity to the encoder for performing encoder-side rate estimation, as an alternative, in previous work we proposed to incorporate constraints on feedback channel usage. To cope better with rate fluctuations caused by changing motion characteristics, in this paper we propose a refined approach exploiting information available from already decoded frames at other temporal layers. The results indicate significant improvements for all test sequences (using a GOP of length four).
Adaptive distributed video coding with correlation estimation using expectation propagation
Lijuan Cui, Shuang Wang, Xiaoqian Jiang, et al.
Distributed video coding (DVC) is rapidly increasing in popularity as a way of shifting complexity from the encoder to the decoder without, at least in theory, degrading compression performance. In contrast with conventional video codecs, the inter-frame correlation in DVC is exploited at the decoder based on the received syndromes of the Wyner-Ziv (WZ) frame and the side information (SI) frame generated from other frames available only at the decoder. However, the ultimate decoding performance of DVC rests on the assumption that perfect knowledge of the correlation statistics between the WZ and SI frames is available at the decoder. Therefore, the ability to obtain a good statistical correlation estimate is becoming increasingly important in practical DVC implementations. Generally, the existing correlation estimation methods in DVC can be classified into two main types: pre-estimation, where estimation starts before decoding, and on-the-fly (OTF) estimation, where the estimate can be refined iteratively during decoding. As potential changes between frames might be unpredictable or dynamic, OTF estimation methods usually outperform pre-estimation techniques at the cost of increased decoding complexity (e.g., sampling methods). In this paper, we propose a low complexity adaptive DVC scheme using expectation propagation (EP), where correlation estimation is performed OTF as it is carried out jointly with decoding of the factor graph-based DVC code. Among different approximate inference methods, EP generally offers a better tradeoff between accuracy and complexity. Experimental results show that our proposed scheme outperforms the benchmark state-of-the-art DISCOVER codec and other cases without correlation tracking, and achieves comparable decoding performance with significantly lower complexity compared with sampling methods.
Exploiting the error-correcting capabilities of low-density parity check codes in distributed video coding using optical flow
Lars Lau Rakêt, Jacob Søgaard, Matteo Salmistraro, et al.
We consider Distributed Video Coding (DVC) in the presence of communication errors. First, we present DVC side information generation based on a new method of optical flow driven frame interpolation, where a highly optimized TV-L1 algorithm is used for the flow calculations and three flows are combined. Thereafter, methods for exploiting the error-correcting capabilities of the LDPCA code in DVC are investigated. The proposed frame interpolation adds a symmetric flow constraint to the standard forward-backward frame interpolation scheme, which improves quality and the handling of large motion. The three flows are combined in one solution. The proposed frame interpolation method consistently outperforms an overlapped block motion compensation scheme and a previous TV-L1 optical flow frame interpolation method, with average PSNR improvements of 1.3 dB and 2.3 dB, respectively. For a GOP size of 2, an average bitrate saving of more than 40% is achieved compared to DISCOVER on Wyner-Ziv frames. In addition, we exploit and investigate the internal error-correcting capabilities of the LDPCA code in order to make it more robust to errors. We investigate how to achieve this goal by only modifying the decoding. One of the approaches is to use bit flipping; alternatively one can modify the parity check matrix of the LDPCA. Different schemes known from LDPC codes are considered and evaluated in the LDPCA setting. Results show that the performance depends heavily on the type of channel used and on the quality of the Side Information.
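To make the interpolation step concrete, the sketch below warps two frames to the temporal midpoint along a given flow field and averages them; the flow here is a toy global translation rather than a TV-L1 estimate, and the symmetric constraint and three-flow combination of the paper are omitted.

    # Flow-driven midpoint frame interpolation for side information generation.
    import numpy as np
    from scipy.ndimage import map_coordinates

    def interpolate_midpoint(frame0, frame1, flow):
        """flow[..., 0] = dy, flow[..., 1] = dx, mapping frame0 pixels onto frame1."""
        H, W = frame0.shape
        yy, xx = np.mgrid[0:H, 0:W].astype(float)
        # sample frame0 half a step backwards and frame1 half a step forwards along the flow
        warp0 = map_coordinates(frame0, [yy - 0.5 * flow[..., 0], xx - 0.5 * flow[..., 1]],
                                order=1, mode="nearest")
        warp1 = map_coordinates(frame1, [yy + 0.5 * flow[..., 0], xx + 0.5 * flow[..., 1]],
                                order=1, mode="nearest")
        return 0.5 * (warp0 + warp1)

    frame0 = np.random.rand(64, 64)
    frame1 = np.roll(frame0, shift=2, axis=1)            # pure horizontal motion of 2 pixels
    flow = np.zeros((64, 64, 2)); flow[..., 1] = 2.0
    side_information = interpolate_midpoint(frame0, frame1, flow)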
Iterative Wyner-Ziv decoding and successive side-information refinement in feedback channel-free hash-based distributed video coding
Frederik Verbist, Nikos Deligiannis, Shahid M. Satti, et al.
This work presents a novel rate control scheme that suppresses the need for a feedback channel in hash-based distributed video coding (DVC) architectures. Our state-of-the-art DVC schemes generate side-information (SI) at the decoder by means of hash-based overlapped block motion estimation followed by probabilistic motion compensation, where key frames or previously decoded non-key frames, called Wyner-Ziv frames, are used as references. These DVC systems employ powerful low-density parity-check accumulate codes to code the Wyner-Ziv frames in the transform domain. Our previous DVC architectures use a classical decoder-driven rate control scheme with a feedback channel. Specifically, chunks of accumulated syndrome bits are sent from the encoder to the decoder upon request from the latter until successful decoding is achieved. In order to suppress the feedback channel, the encoder of the DVC system, proposed in this work, approximates the SI available to the decoder using a novel low complexity SI generation technique. Subsequently, the conditional probabilities of the original Wyner-Ziv frames, given the approximation of the SI at the encoder, are used to generate an estimate of the required rate for channel decoding. Hence the presence of a feedback channel is evaded. Additionally, the proposed feedback channel-free DVC system is equipped with advanced reconstruction techniques to reduce the impact of failed channel decoding. In this context, our DVC architecture features iterative refinement of the SI at the decoder. The latter allows for reattempting to decode Wyner-Ziv data for which the channel decoding failed in previous decoding steps when only a lower quality version of the SI was available. Experimental results show competitive compression performance of our novel feedback channel-free hash-based DVC system with respect to the feedback channel-based benchmark in DVC.
Stereo side information generation in low-delay distributed stereo video coding
Matteo Salmistraro, Søren Forchhammer
Distributed Video Coding (DVC) is a technique that allows shifting the computational complexity from the encoder to the decoder. One of the core elements of the decoder is the creation of the Side Information (SI), which is a hypothesis of what the to-be-decoded frame looks like. Much work on DVC has been carried out: often the decoder can use future and past frames in order to obtain the SI, exploiting temporal redundancy. Other work has addressed a multiview scenario; exploiting the frames coming from cameras close to the one being decoded (usually a left and a right camera), it is possible to create SI exploiting inter-view spatial redundancy. A careful fusion of the two SIs should be done in order to use the best part of each. In this work we study a stereo low-delay scenario using only two views. Due to the delay constraint we use only past frames of the sequence being decoded, and past and present frames of the other. This is done by using extrapolation, to exploit the temporal redundancy, and well-known techniques for stereo error concealment. This allows us to create good quality SI even though we are only using two views. We have also used a new method, inspired by multi-hypothesis decoding, in order to fuse the two SIs: the multiple hypotheses are used to fuse the SIs. Preliminary results show improvements of up to 1 dB.
HEVC: An Emerging Video Standard I
Large and various shapes block processing in HEVC
Il-Koo Kim, Junghye Min, Tammy Lee, et al.
Recently, the Joint Collaborative Team on Video Coding (JCT-VC), a joint team of ITU-T SG 16 Q.6 (VCEG) and ISO/IEC JTC1/SC29/WG11 (MPEG), was established and started to define a new video coding standard called High Efficiency Video Coding (HEVC). This paper introduces the block partitioning structure of the HEVC standard and presents its analysis results. Among the many technical aspects of HEVC, the block partitioning structure has been considered a key factor in its significant coding efficiency improvement. Compared with the macroblock structure of fixed size 16x16 in H.264/AVC, HEVC defines three flexible-size units according to their functionalities. The coding unit (CU) defines a region sharing the same choice between spatial and temporal prediction and is represented by a leaf node of the quadtree structure. Moreover, the prediction unit (PU) defines a region sharing the same prediction information, and the transform unit (TU), which is specified by another quadtree, defines a region sharing the same transformation. This paper introduces the technical details of the block partitioning structure of HEVC, with emphasis on the consistently designed framework combining the three different units. The provided experimental results justify each component of the block partitioning structure.
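The toy example below mimics the quadtree recursion: a block is split into four sub-blocks whenever splitting lowers a simple rate-distortion-style cost. The cost model (variance plus a flat rate penalty) is purely illustrative; an HEVC encoder compares actual RD costs of prediction and transform choices.

    # Recursive quadtree coding-unit partitioning with a toy cost model.
    import numpy as np

    def block_cost(block, lam=4.0):
        return float(np.var(block)) * block.size + lam    # "distortion" proxy + flat rate term

    def split_cu(block, y=0, x=0, min_size=8, lam=4.0):
        size = block.shape[0]
        cost_no_split = block_cost(block, lam)
        if size <= min_size:
            return [(y, x, size)], cost_no_split
        half = size // 2
        leaves, cost_split = [], 0.0
        for dy in (0, half):
            for dx in (0, half):
                sub_leaves, sub_cost = split_cu(block[dy:dy + half, dx:dx + half],
                                                y + dy, x + dx, min_size, lam)
                leaves += sub_leaves
                cost_split += sub_cost
        if cost_split < cost_no_split:
            return leaves, cost_split
        return [(y, x, size)], cost_no_split

    ctu = np.zeros((64, 64)); ctu[16:32, 16:32] = np.random.rand(16, 16)   # detail in one corner
    leaf_cus, _ = split_cu(ctu)
    print(f"{len(leaf_cus)} leaf coding units")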
Block merging for quadtree-based partitioning in HEVC
Benjamin Bross, Simon Oudin, Philipp Helle, et al.
With the prospective High Efficiency Video Coding (HEVC) standard as jointly developed by ITU-T VCEG and ISO/IEC MPEG, a new step in video compression capability is achieved. Technically, HEVC is a hybrid video-coding approach using quadtree-based block partitioning together with motion-compensated prediction. Even though a high degree of adaptability is achieved by quadtree-based block partitioning, this approach is intrinsically tied to certain drawbacks which may result in redundant sets of motion parameters to be transmitted. In order to remove those redundancies, a block-merging algorithm for HEVC is proposed. This algorithm generates a single motion-parameter set for a whole region of contiguous motion-compensated blocks. Simulation results show that the proposed merging technique works more efficiently than a conceptually similar direct mode.
Entropy coding of syntax elements related to block structures and transform coefficient levels in HEVC
Tung Nguyen, Philipp Helle, Martin Winken, et al.
The most recent video compression technology is High Efficiency Video Coding (HEVC). This soon to be completed standard is a joint development by Video Coding Experts Group (VCEG) of ITU-T and Moving Picture Experts Group (MPEG) of ISO/IEC. As one of its major technical novelties, HEVC supports variable prediction and transform block sizes using the quadtree approach for block partitioning. In terms of entropy coding, the Draft International Standard (DIS) of HEVC specifies context-based adaptive binary arithmetic coding (CABAC) as the single mode of operation. In this paper, a description of the specific CABAC-based entropy coding part in HEVC is given that is related to block structures and transform coefficient levels. In addition, experimental results are presented that indicate the benefit of the transform-coefficient level coding design in HEVC in terms of improved coding performance and reduced complexity.
Core transform design for high efficiency video coding
Jie Dong, Yan Ye
High Efficiency Video Coding (HEVC) is the next generation video coding standard currently being developed by the Joint Collaborative Team on Video Coding (JCT-VC). It employs various coding unit sizes 2^K×2^K, where K is a positive integer with typical values from 3 to 6; it also uses larger transform sizes up to 32×32. This raises the interest in seeking high performance higher order integer transforms with low computation requirements. This paper presents approaches to designing order-N (N=4, 8, 16, 32) integer transforms, by which the derived integer transforms have special symmetry structures to ensure the matrix factorization. The proposed set of high order integer transforms with well selected elements demonstrates excellent coding performance, compared with the core transform design in HEVC.
New fast DCT algorithms based on Loeffler's factorization
Yoon Mi Hong, Il-Koo Kim, Tammy Lee, et al.
This paper proposes a new 32-point fast discrete cosine transform (DCT) algorithm based on Loeffler's 16-point transform. Fast integer realizations of 16-point and 32-point transforms are also provided based on the proposed transform. For the recent development of High Efficiency Video Coding (HEVC), simplified quantization and de-quantization processes are proposed. Three different forms of implementation with essentially the same performance, namely matrix multiplication, partial butterfly, and full factorization, can be chosen according to the given platform. In terms of the number of multiplications required for the realization, our proposed full factorization is 3~4 times faster than a partial butterfly, and about 10 times faster than direct matrix multiplication.
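The sketch below shows the even/odd ("partial butterfly") decomposition on which such fast DCT realizations rest: an N-point DCT-II reduces to an N/2-point DCT of the folded sums plus a small matrix product on the folded differences. Floating-point cosines are used here for clarity; HEVC-style designs replace them with scaled integer approximations.

    # Partial-butterfly (even/odd) decomposition of the DCT-II, verified against the full matrix.
    import numpy as np

    def dct_matrix(N):
        k, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
        return np.cos(np.pi * (2 * n + 1) * k / (2 * N))

    def dct_partial_butterfly(x):
        N = x.size
        if N == 1:
            return x.copy()
        half = N // 2
        even_in = x[:half] + x[::-1][:half]            # folded sums feed a half-size DCT
        odd_in = x[:half] - x[::-1][:half]             # folded differences feed the odd basis
        out = np.empty(N)
        out[0::2] = dct_partial_butterfly(even_in)
        k = (2 * np.arange(half) + 1)[:, None]         # odd frequencies 1, 3, 5, ...
        n = np.arange(half)[None, :]
        out[1::2] = np.cos(np.pi * (2 * n + 1) * k / (2 * N)) @ odd_in
        return out

    x = np.random.rand(32)
    assert np.allclose(dct_partial_butterfly(x), dct_matrix(32) @ x)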
HEVC: An Emerging Video Standard II
Subjective quality evaluation of the upcoming HEVC video compression standard
Philippe Hanhart, Martin Rerabek, Francesca De Simone, et al.
High Efficiency Video Coding (HEVC) is the latest attempt by ISO/MPEG and ITU-T/VCEG to define the next generation compression standard beyond H.264/MPEG-4 Part 10 AVC. One of the major goals of HEVC is to provide efficient compression for resolutions beyond HDTV. However, the subjective evaluations that led to the selection of technologies were bound to HDTV resolution. Moreover, performance evaluation metrics to report efficiency results of this standard are mainly based on PSNR, especially for resolutions beyond HDTV. This paper provides subjective evaluation results to assess the performance of the current HEVC codec for resolutions beyond HDTV.
Informal subjective quality comparison of video compression performance of the HEVC and H.264/MPEG-4 AVC standards for low-delay applications
Michael Horowitz, Faouzi Kossentini, Nader Mahdi, et al.
This paper presents the results of an informal subjective quality comparison between the current state of the emerging High Efficiency Video Coding (HEVC) draft standard and the well-established H.264 / MPEG-4 AVC High Profile (HP) for low-delay applications. The tests consisted of two basic encoding comparisons. First, we compare the Main profile low-delay configuration of the HEVC reference software (HM) against a similarly configured H.264 / MPEG-4 AVC HP reference encoder (JM). Additionally, to complement these results, the widely-recognized production-quality H.264 / MPEG-4 AVC encoder known as x264 is compared with a production-quality HEVC implementation from eBrisk Video. The encoding configurations are designed to reflect relevant application scenarios and to enable a fair comparison to the maximum extent feasible. When viewing HM and JM encoded video side-by-side in which the JM was configured to use approximately twice the bit rate of the HM encoded video, viewers indicated that they preferred the HM encoded video in approximately 74% of trials. Similarly, when comparing the eBrisk HEVC and x264 H.264 / MPEG-4 AVC production encoders in which x264 was configured to use approximately twice the bit rate of the eBrisk encoded video, viewers indicated they preferred the eBrisk HEVC encoded video in approximately 62% of trials. The selection of which encoding was displayed on which side for the side-by-side viewing was established in a randomized manner, and the subjective viewing experiments were administered in a double-blind fashion. The results reported in this paper generally confirm that the HEVC design (as represented by HM version 7.1 and separately by a production-quality HEVC implementation) exhibits a substantial improvement in compression capability beyond that of H.264 / MPEG-4 AVC (as represented by a similarly-configured JM version 18.3 and x264 version core 122 r2184, respectively) for low-delay video applications, with HEVC exhibiting roughly twice or more of the overall compression capability of H.264 / MPEG-4 AVC.
Study of decoder complexity for HEVC and AVC standards based on tool-by-tool comparison
Y. J. Ahn, W. J. Han, D. G. Sim
High Efficiency Video Coding (HEVC) is the latest standardization effort of ISO/IEC MPEG and ITU-T VCEG for further improving the coding efficiency of the H.264/AVC standard. It has been reported that HEVC can provide subjective visual quality comparable to H.264/AVC at only half the bit-rate in many cases. In this paper, the decoder complexities of HEVC and H.264/AVC are studied to provide initial complexity estimates of the HEVC decoder compared with the H.264/AVC decoder. For this purpose, several selected coding tools including intra prediction, motion compensation, transform, loop filters and entropy coding have been analyzed in terms of number of operations as well as their statistical differences.
Analysis of 3D and multiview extensions of the emerging HEVC standard
Anthony Vetro, Dong Tian
Standardization of a new set of 3D formats has been initiated with the goal of improving the coding of stereo and multiview video, and also facilitating the generation of multiview output needed for auto-stereoscopic displays. Part of this effort will develop 3D and multiview extensions of the emerging standard for High Efficiency Video Coding (HEVC). This paper outlines some of the key technologies and architectures being considered for standardization, and analyzes the viability, benefits and drawbacks of different codec designs.
DCT based interpolation filter for motion compensation in HEVC
Alexander Alshin, Elena Alshina, Jeong Hoon Park, et al.
The High Efficiency Video Coding (HEVC) draft standard has the challenging goal of doubling coding efficiency compared to H.264/AVC. Many aspects of the traditional hybrid coding framework were improved during the new standard's development. Motion compensated prediction, in particular the interpolation filter, is one area that was improved significantly over H.264/AVC. This paper presents the details of the interpolation filter design of the draft HEVC standard. The coding efficiency improvement over the H.264/AVC interpolation filter is studied and experimental results are presented, which show a 4.0% average bitrate reduction for the luma component and an 11.3% average bitrate reduction for the chroma component. The coding efficiency gains are significant for some video sequences and can reach up to 21.7%.
Parallel tools in HEVC for high-throughput processing
Minhua Zhou, Vivienne Sze, Madhukar Budagavi
HEVC (High Efficiency Video Coding) is the next-generation video coding standard being jointly developed by the ITU-T VCEG and ISO/IEC MPEG JCT-VC team. In addition to the high coding efficiency, which is expected to provide 50% more bit-rate reduction when compared to H.264/AVC, HEVC has built-in parallel processing tools to address bitrate, pixel-rate and motion estimation (ME) throughput requirements. This paper describes how CABAC, which is also used in H.264/AVC, has been redesigned for improved throughput, and how parallel merge/skip and tiles, which are new tools introduced for HEVC, enable high-throughput processing. CABAC has data dependencies which make it difficult to parallelize and thus limit its throughput. The prediction error/residual, represented as quantized transform coefficients, accounts for the majority of the CABAC workload. Various improvements have been made to the context selection and scans in transform coefficient coding that enable CABAC in HEVC to potentially achieve higher throughput and increased coding gains relative to H.264/AVC. The merge/skip mode is a coding efficiency enhancement tool in HEVC; the parallel merge/skip breaks dependency between the regular and merge/skip ME, which provides flexibility for high throughput and high efficiency HEVC encoder designs. For ultra high definition (UHD) video, such as 4kx2k and 8kx4k resolutions, low-latency and real-time processing may be beyond the capability of a single core codec. Tiles are an effective tool which enables pixel-rate balancing among the cores to achieve parallel processing with a throughput scalable implementation of multi-core UHD video codec. With the evenly divided tiles, a multi-core video codec can be realized by simply replicating single core codec and adding a tile boundary processing core on top of that. These tools illustrate that accounting for implementation cost when designing video coding algorithms can enable higher processing speed and reduce implementation cost, while still delivering high coding efficiency in the next generation video coding standard.
Techniques for increasing throughput in HEVC transform coefficient coding
Rajan L. Joshi, Joel Sole, Jianle Chen, et al.
Transform coefficient coding in HEVC encompasses the scanning patterns and the coding methods for the last significant coefficient, significance map, coefficient levels and sign data. Unlike H.264/AVC, HEVC has a single entropy coding mode based on the context adaptive binary arithmetic coding (CABAC) engine. Due to this, achieving high throughput for transform coefficient coding was an important design consideration. This paper analyzes the throughput of different components of transform coefficient coding with special emphasis on the explicit coding of the last significant coefficient position and high throughput binarization. A comparison with H.264/AVC transform coefficient coding is also presented, demonstrating that HEVC transform coefficient coding achieves higher average and worst case throughput.
HEVC deblocking filtering and decisions
Andrey Norkin, Kenneth Andersson, Arild Fuldseth, et al.
The emerging High Efficiency Video Coding (HEVC) standard uses a block-based coding scheme, which may cause blocking artifacts, especially at lower bitrates. An adaptive in-loop deblocking filter is used in the standard to reduce visible artifacts at block boundaries. The deblocking filter detects artifacts at the block boundaries and attenuates them by applying a selected filter. This paper will present deblocking decisions and filtering operations that are used in HEVC.
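A simplified, hypothetical decision in the spirit of the standard is sketched below: local second differences on both sides of the boundary measure smoothness, and only when that activity falls below a threshold beta is a clipped low-pass correction applied to the pixels next to the boundary. The exact HEVC thresholds, strong/weak filter split, and clipping rules are omitted.

    # Simplified one-line deblocking decision and correction across a block boundary.
    import numpy as np

    def deblock_line(p, q, beta=8, tc=2):
        """p = three pixels left of the boundary (p2, p1, p0), q = (q0, q1, q2)."""
        p2, p1, p0 = p
        q0, q1, q2 = q
        activity = abs(p2 - 2 * p1 + p0) + abs(q2 - 2 * q1 + q0)
        if activity >= beta:
            return p, q                                # strong texture or a true edge: leave it alone
        # weak-filter-style correction of the two pixels adjacent to the boundary
        delta = np.clip(((q0 - p0) * 4 + (p1 - q1) + 4) >> 3, -tc, tc)
        return (p2, p1, p0 + delta), (q0 - delta, q1, q2)

    left, right = (100, 100, 100), (112, 112, 112)     # flat areas with a 12-level step
    print(deblock_line(left, right))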
The adaptive loop filtering techniques in the HEVC standard
Ching-Yeh Chen, Chia-Yang Tsai, Yu-Wen Huang, et al.
This article introduces the adaptive loop filtering (ALF) techniques being considered for the HEVC standard. The key idea of ALF is to minimize the mean square error between original pixels and decoded pixels using Wiener-based adaptive filter coefficients. ALF is located at the last processing stage of each picture and can be regarded as a tool trying to catch and fix artifacts from previous stages. The suitable filter coefficients are determined by the encoder and explicitly signaled to the decoder. In order to achieve better coding efficiency, especially for high resolution videos, local adaptation is used for the luma signal by applying different filters to different regions in a picture. In addition to filter adaptation, filter on/off control at the largest coding unit (LCU) level is also helpful for improving coding efficiency. Syntax-wise, filter coefficients are sent in a picture level header called the adaptation parameter set (APS), and filter on/off flags of LCUs are interleaved at the LCU level in the slice data. Besides supporting picture-based optimization of ALF, the syntax design can support low delay applications as well. When the filter coefficients in the APS are trained using a previous picture, filter on/off decisions can be made on the fly during encoding of LCUs, so the encoding latency is only one LCU. Simulation results show that ALF can achieve on average 5% bit rate reduction, and up to 27% bit rate reduction, for 25 HD sequences. The run time increases are 1% and 10% for encoders and decoders, respectively, with unoptimized C++ code in software.
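A minimal sketch of the Wiener-based training idea follows: collect, for each pixel, the window of decoded neighbours and solve the least-squares normal equations so the filtered output best matches the original picture. The diamond filter shapes, block classification, and signalling of the actual ALF design are omitted.

    # Least-squares (Wiener-style) training of a 5x5 restoration filter.
    import numpy as np

    def train_wiener_filter(original, decoded, radius=2):
        k = 2 * radius + 1
        H, W = decoded.shape
        rows, targets = [], []
        for y in range(radius, H - radius):
            for x in range(radius, W - radius):
                patch = decoded[y - radius:y + radius + 1, x - radius:x + radius + 1]
                rows.append(patch.ravel())
                targets.append(original[y, x])
        A = np.asarray(rows)                       # N x k^2 matrix of decoded neighbourhoods
        b = np.asarray(targets)
        coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
        return coeffs.reshape(k, k)

    original = np.random.rand(64, 64)
    decoded = original + 0.05 * np.random.randn(64, 64)   # stand-in for reconstruction error
    alf_filter = train_wiener_filter(original, decoded)
    print(alf_filter.round(2))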
Low Complexity and Highly Parallel Image Coding
Parallel hardware architecture for JPEG-LS based on domain decomposition
S. Ahmed, Z. Wang, M. Klaiber, et al.
JPEG-LS has a large number of different and independent context sets that provide an opportunity for parallelism. Like JPEG-LS, many lossless image compression standards have "adaptive" error modeling at their core. This, however, leads to data dependency loops in the compression scheme such that parallel compression of neighboring pixels is not possible. In this paper, a hardware architecture is proposed in order to achieve parallelism in JPEG-LS compression. In the adaptive part of the algorithm, the context update and error modeling of a pixel belonging to a context number depend on the previous pixel having the same context number. On the other hand, the probability of two successive pixels being in different contexts is only 17%. Thus storage is required for the intermediary pixels of the same context. In this architecture, a buffer mechanism is built to exploit the parallelism regardless of the adaptive characteristics. Despite the introduced architectural parallelism, the resulting JPEG-LS codec is fully compatible with the ISO/IEC 14495-1 JPEG-LS standard. A design for such a hardware system is provided here and simulated on an FPGA, and it is compared with a sequential pipelined architecture of JPEG-LS implemented on an FPGA. The final design makes it possible to work with a streaming image sensor and does not require storing the entire image before compression. Thus it is capable of lossless compression of input images in real-time embedded systems.
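For reference, the sketch below computes the JPEG-LS context number that such a buffer mechanism has to track: the three local gradients are quantized into nine regions each and folded by sign, giving the standard's 365 contexts; the thresholds shown are the defaults for 8-bit data.

    # JPEG-LS context computation from the causal neighbourhood of the current pixel.
    T1, T2, T3 = 3, 7, 21                      # default JPEG-LS quantization thresholds (8-bit)

    def quantize_gradient(d):
        q = 0 if d == 0 else 1 if abs(d) < T1 else 2 if abs(d) < T2 else 3 if abs(d) < T3 else 4
        return -q if d < 0 else q

    def context_number(a, b, c, d):
        """a = left, b = above, c = above-left, d = above-right neighbours of the current pixel."""
        q1, q2, q3 = quantize_gradient(d - b), quantize_gradient(b - c), quantize_gradient(c - a)
        ctx = (q1 * 9 + q2) * 9 + q3           # 729 signed contexts before sign folding
        sign = -1 if ctx < 0 else 1            # sign folding: contexts come in +/- pairs (365 total)
        return abs(ctx), sign

    print(context_number(a=100, b=104, c=98, d=110))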
Towards high-speed, low-complexity image coding: variants and modification of JPEG 2000
Thomas Richter, Sven Simon
Recently, the JPEG committee discussed the introduction of an "ultrafast" mode for JPEG 2000 encoding. The extension under consideration replaces the EBCOT coding of JPEG 2000 with a combined Huffman-runlength code, and adds an optional additional prediction step after quantization. While the resulting codec is not compatible with existing JPEG 2000, it still allows lossless transcoding from JPEG 2000 and back, and performance measurements show that it offers nearly the quality of JPEG 2000 and similar quality to JPEG XR at a much lower complexity, comparable to that of the IJG JPEG software. This work introduces the extension, and compares its performance with other JPEG standards and other extensions of JPEG 2000 currently under standardization.
GPU-specific reformulations of image compression algorithms
Jiří Matela, Petr Holub, Martin Jirman, et al.
Image compression has a number of applications in various fields, where processing throughput and/or latency is a crucial attribute and the main limitation of state-of-the-art implementations of compression algorithms. At the same time contemporary GPU platforms provide tremendous processing power but they call for specific algorithm design. We discuss key components of successful design of compression algorithms for GPUs and demonstrate this on JPEG and JPEG2000 implementations, each of which contains several types of algorithms requiring different approaches to efficient parallelization for GPUs. Performance evaluation of the optimized JPEG and JPEG2000 chain is used to demonstrate the importance of various aspects of GPU programming, especially with respect to real-time applications.
Optimization of the motion estimation for parallel embedded systems in the context of new video standards
Fabrice Urban, Olivier Déforges, Jean-Francois Nezan
The efficiency of video compression methods mainly depends on the motion compensation stage, and the design of efficient motion estimation techniques is still an important issue. Highly accurate motion estimation can significantly reduce the bit-rate, but involves high computational complexity. This is particularly true for the new generations of video compression standards, MPEG AVC and HEVC, which involve techniques such as multiple reference frames, sub-pixel estimation, and variable block sizes. In this context, the design of fast motion estimation solutions is necessary and concerns two linked aspects: a high quality algorithm and its efficient implementation. This paper summarizes our main contributions in this domain. In particular, we first present the HME (Hierarchical Motion Estimation) technique. It is based on a multi-level refinement process where the motion vectors are first estimated on a sub-sampled image. The multi-level decomposition provides robust predictions and is particularly suited for variable block size motion estimation. The HME method has been integrated in an AVC encoder, and we propose a parallel implementation of this technique, with the motion estimation at pixel level performed by a DSP processor and the sub-pixel refinement realized in an FPGA. The second technique that we present is called HDS, for Hierarchical Diamond Search. It combines the multi-level refinement of HME with a fast search at pixel accuracy inspired by the EPZS method. This paper also presents its parallel implementation on a multi-DSP platform and its use in the HEVC context.
Multi-modal low cost mobile indoor surveillance system on the Robust Artificial Intelligence-based Defense Electro Robot (RAIDER)
We present an autonomous system capable of performing security check routines. The surveillance machine, the Clearpath Husky robotic platform, is equipped with three IP cameras with different orientations for the surveillance tasks of face recognition, human activity recognition, autonomous navigation and 3D reconstruction of its environment. Combining the computer vision algorithms with a robotic machine has given birth to the Robust Artificial Intelligence-based Defense Electro-Robot (RAIDER). The end purpose of the RAIDER is to conduct a patrolling routine on a single floor of a building several times a day. As the RAIDER travels down the corridors, off-line algorithms use two of the RAIDER's side-mounted cameras to perform 3D reconstruction from monocular vision, updating a 3D model to the most current state of the indoor environment. Using frames from the front-mounted camera, positioned at human eye level, the system performs face recognition with real-time training of unknown subjects. A human activity recognition algorithm will also be implemented, in which each detected person is assigned to a set of action classes picked to classify ordinary and harmful student activities in a hallway setting. The system is designed to detect changes and irregularities within an environment as well as to familiarize itself with regular faces and actions to distinguish potentially dangerous behavior. In this paper, we present the various algorithms and their modifications which, when implemented on the RAIDER, serve the purpose of indoor surveillance.
Computer aided decision support system for cervical cancer classification
Rahmadwati Rahmadwati, Golshah Naghdy, Montserrat Ros, et al.
Conventional analysis of a cervical histology image, such as a pap smear or a biopsy sample, is performed manually by an expert pathologist. This involves inspecting the sample for cellular-level abnormalities and determining the spread of the abnormalities. Cancer is graded based on the spread of the abnormal cells. This is a tedious, subjective and time-consuming process with considerable variation in diagnosis between experts. This paper presents a computer aided decision support system (CADSS) tool to help pathologists in their examination of cervical cancer biopsies. The main aim of the proposed CADSS system is to identify abnormalities and quantify cancer grading in a systematic and repeatable manner. The paper proposes three different methods and presents and compares their results using 475 images of cervical biopsies, which include normal, three stages of pre-cancer, and malignant cases. This paper explores the various components of an effective CADSS: image acquisition, pre-processing, segmentation, feature extraction, classification, grading and disease identification. Cervical histology images are captured using a digital microscope. The images are captured at sufficient resolution to retain enough information for effective classification. Histology images of cervical biopsies consist of three major sections: background, stroma and squamous epithelium. Most diagnostic information is contained within the epithelium region. This paper presents two levels of segmentation: global (macro) and local (micro). At the global level the squamous epithelium is separated from the background and stroma. At the local or cellular level, the nuclei and cytoplasm are segmented for further analysis. Image features that influence the pathologists' decision during the analysis and classification of a cervical biopsy are the nuclei's shape and spread, the ratio of the areas of nuclei and cytoplasm, and the texture and spread of the abnormalities. Similar features are extracted for the automated classification process. This paper presents various feature extraction methods including colour, shape and texture using Gabor wavelets, as well as various quantitative metrics. The generated features are used to classify cells or regions into normal and abnormal categories. Following the classification process, the cancer is graded based on the spread of the abnormal cells. This paper presents the results of the grading process over five stages of the cancer spectrum.
Automatic and robust extrinsic camera calibration for high-accuracy mobile mapping
Werner Goeman, Koen Douterloigne, Peter Bogaert, et al.
A mobile mapping system (MMS) is the geoinformation community's answer to the exponentially growing demand for various geospatial data captured by multiple sensors with increasingly higher accuracy. As mobile mapping technology is pushed to explore its use for various applications on water, rail, or road, the need emerges for an external sensor calibration procedure that is portable, fast and easy to perform. This way, sensors can be mounted and demounted depending on the application requirements without the need for time-consuming calibration procedures. A new methodology is presented to provide a high-quality external calibration of cameras that is automatic, robust and foolproof. The MMS uses an Applanix POSLV420, which is a tightly coupled GPS/INS positioning system. The cameras used are Point Grey color video cameras synchronized with the GPS/INS system. The method uses a portable, standard ranging pole which needs to be positioned on a known ground control point. For calibration, a well-studied absolute orientation problem needs to be solved. Here, a mutual-information-based image registration technique is studied for automatic alignment of the ranging pole. Finally, a few benchmarking tests under various lighting conditions prove the methodology's robustness by showing absolute stereo measurement accuracies of a few centimeters.
Adaptive noise suppression technique for dense 3D point cloud reconstructions from monocular vision
Yakov Diskin, Vijayan K. Asari
Mobile vision-based autonomous vehicles use video frames captured from multiple angles to construct a 3D model of their environment. In this paper, we present a post-processing adaptive noise suppression technique to enhance the quality of the computed 3D model. Our near real-time reconstruction algorithm uses each pair of frames to compute the disparities of tracked feature points, translating the distance in pixels that a feature has traveled within the frame into a real-world depth value. The tracked feature points are then plotted to form a dense, colored point cloud. Due to inevitable small camera vibrations and mismatches within the feature tracking algorithm, the point cloud model contains a significant number of misplaced points that appear as noise. The proposed noise suppression technique utilizes the spatial information of each point to unify points of similar texture and color into objects while simultaneously removing noise not associated with any nearby object. The noise filter collapses all points of similar depth into 2D layers throughout the point cloud model. By applying erosion and dilation techniques we eliminate the unwanted floating points while retaining the points of larger objects. To reverse this flattening, we transform each 2D layer back into the 3D model, allowing points to return to their original positions without the attached noise components. We evaluate the resulting noise-free point cloud by using an unmanned ground vehicle to perform obstacle avoidance tasks. The contribution of the noise suppression technique is measured by evaluating the accuracy of the 3D reconstruction.
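As a rough, hedged illustration of the layer-based cleanup described above, the sketch below bins points by depth, rasterizes each layer, and uses morphological opening (erosion followed by dilation) to drop isolated floating points; the layer width, cell size and structuring element are invented placeholders rather than the authors' values.

```python
# Minimal sketch of a layer-based point cloud cleanup: bin points by depth,
# rasterize each depth layer, and apply morphological opening to discard
# isolated "floating" points. Parameters are illustrative assumptions.
import numpy as np
from scipy.ndimage import binary_opening

def denoise_point_cloud(points, layer_width=0.1, cell=0.05, structure=np.ones((3, 3))):
    """points: (N, 3) array of x, y, z coordinates; returns a boolean keep-mask."""
    keep = np.zeros(len(points), dtype=bool)
    z = points[:, 2]
    for z0 in np.arange(z.min(), z.max() + layer_width, layer_width):
        in_layer = (z >= z0) & (z < z0 + layer_width)
        if not in_layer.any():
            continue
        xy = points[in_layer, :2]
        ij = np.floor((xy - xy.min(axis=0)) / cell).astype(int)
        grid = np.zeros(ij.max(axis=0) + 1, dtype=bool)
        grid[ij[:, 0], ij[:, 1]] = True
        opened = binary_opening(grid, structure=structure)   # removes isolated cells
        keep[np.flatnonzero(in_layer)] = opened[ij[:, 0], ij[:, 1]]
    return keep
```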
Computer Vision Techniques and Applications
Rotation invariant fast features for large-scale recognition
Gabriel Takacs, Vijay Chandrasekhar, Sam Tsai, et al.
We present an end-to-end feature description pipeline which uses a novel interest point detector and Rotation-Invariant Fast Feature (RIFF) descriptors. The proposed RIFF algorithm is 15× faster than SURF [1] while producing large-scale retrieval results that are comparable to SIFT [2]. Such high-speed features benefit a range of applications from Mobile Augmented Reality (MAR) to web-scale image retrieval and analysis.
Improved coding for image feature location information
Sam S. Tsai, David Chen, Gabriel Takacs, et al.
In mobile visual search applications, an image-based query is typically sent from a mobile client to the server. Because of bit-rate limitations, the query should be as small as possible. When performing image-based retrieval with local features, there are two types of information: the descriptors of the image features and the locations of the image features within the image. Location information can be used to check the geometric consistency of the set of features and thus improve retrieval performance. To compress the location information, location histogram coding is an effective solution. We present a location histogram coder that reduces the bitrate by 2.8× compared to a fixed-rate scheme and by 12.5× compared to a floating-point representation of the locations. A drawback is the large context table, which can be difficult to store in the coder and requires a large amount of training data. We propose a new sum-based context for coding the location histogram map. We show that it can reduce the context size by up to 200× while performing as well as or better than previously proposed location histogram coders.
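The following minimal sketch illustrates, under assumptions, what a location histogram map and a sum-based context might look like; the block size, neighbourhood and helper names are hypothetical, and the entropy coder itself is omitted.

```python
# Sketch of a location histogram map and a "sum of causal neighbours" context,
# in the spirit of a sum-based context model. Block size and neighbourhood are
# illustrative; the actual arithmetic/entropy coder is not shown.
import numpy as np

def location_histogram(keypoints_xy, image_shape, block=8):
    """Quantize (x, y) feature locations into a per-block count map."""
    h, w = image_shape
    hist = np.zeros((h // block + 1, w // block + 1), dtype=np.int32)
    for x, y in keypoints_xy:
        hist[int(y) // block, int(x) // block] += 1
    return hist

def sum_context(binary_map, r, c):
    """Context = sum of already-coded (causal) neighbours in a 3x3 window."""
    ctx = 0
    for dr, dc in [(-1, -1), (-1, 0), (-1, 1), (0, -1)]:
        rr, cc = r + dr, c + dc
        if 0 <= rr < binary_map.shape[0] and 0 <= cc < binary_map.shape[1]:
            ctx += int(binary_map[rr, cc])
    return ctx   # a small integer, so the context table stays compact
```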
On coding of images and SIFT feature descriptors
We offer a probabilistic interpretation of the meaning of SIFT image features. This allows us to derive formulae connecting SIFT feature values to the parameters of gradient distributions. We also study KL-distances between gradient distributions and establish their connection to the values in SIFT descriptors.
Encoding scene structures for video compression
Georgios Georgiadis, Avinash Ravichandran, Stefano Soatto, et al.
We describe an approach to partition a video stream into structure regions that are temporally encoded and disjoint from texture regions that are synthesized so as to preserve the statistical properties of the original data stream. Structures encode regions of an image that can be put into correspondence in different images of the same scene, and are encoded via a dictionary that takes into account spatial and temporal regularities. Textures are synthesized in a manner that preserves perceptual similarity.
An automatic identification and monitoring system for coral reef fish
Joseph Wilder, Chetan Tonde, Ganesh Sundar, et al.
To help gauge the health of coral reef ecosystems, we developed a prototype of an underwater camera module to automatically census reef fish populations. Recognition challenges include pose and lighting variations, complicated backgrounds, within-species color variations and within-family similarities among species. An open frame holds two cameras, LED lights, and two ‘background’ panels in an L-shaped configuration. High-resolution cameras send sequences of 300 synchronized image pairs at 10 fps to an on-shore PC. Approximately 200 sequences containing fish were recorded at the New York Aquarium’s Glover’s Reef exhibit. These contained eight ‘common’ species with 85–672 images, and eight ‘rare’ species with 5–27 images that were grouped into an ‘unknown/rare’ category for classification. Image pre-processing included background modeling and subtraction, and tracking of fish across frames for depth estimation, pose correction, scaling, and disambiguation of overlapping fish. Shape features were obtained from PCA analysis of perimeter points, color features from opponent color histograms, and ‘banding’ features from DCT of vertical projections. Images were classified to species using feedforward neural networks arranged in a three-level hierarchy in which errors remaining after each level are targeted by networks in the level below. Networks were trained and tested on independent image sets. Overall accuracy of species-specific identifications typically exceeded 96% across multiple training runs. A seaworthy version of our system will allow for population censuses with high temporal resolution, and therefore improved statistical power to detect trends. A network of such devices could provide an ‘early warning system’ for coral ecosystem collapse.
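As a hedged illustration of the opponent-colour histogram features mentioned above, the sketch below uses a standard opponent transform and a fixed bin count; these choices are assumptions and are not necessarily those used by the authors.

```python
# Illustrative opponent-colour histogram features computed per fish image.
import numpy as np

def opponent_histograms(rgb, bins=16):
    """rgb: float image in [0, 1], shape (H, W, 3); returns concatenated histograms."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    o1 = (r - g) / np.sqrt(2)             # red-green opponency
    o2 = (r + g - 2 * b) / np.sqrt(6)     # yellow-blue opponency
    o3 = (r + g + b) / np.sqrt(3)         # intensity
    ranges = [(-1 / np.sqrt(2), 1 / np.sqrt(2)),
              (-2 / np.sqrt(6), 2 / np.sqrt(6)),
              (0.0, np.sqrt(3))]
    feats = []
    for chan, rng in zip((o1, o2, o3), ranges):
        h, _ = np.histogram(chan, bins=bins, range=rng, density=True)
        feats.append(h)
    return np.concatenate(feats)   # feature vector for the classifier hierarchy
```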
OCR enhancement through neighbor embedding and fast approximate nearest neighbors
Generic optical character recognition (OCR) engines often perform very poorly in transcribing scanned low resolution (LR) text documents. To improve OCR performance, we apply the Neighbor Embedding (NE) single-image super-resolution (SISR) technique to LR scanned text documents to obtain high resolution (HR) versions, which we subsequently process with OCR. For comparison, we repeat this procedure using bicubic interpolation (BI). We demonstrate that mean-square errors (MSE) in NE HR estimates do not increase substantially when NE is trained in one Latin font style and tested in another, provided both styles belong to the same font category (serif or sans serif). This is very important in practice, since for each font size, the number of training sets required for each category may be reduced from dozens to just one. We also incorporate randomized k-d trees into our NE implementation to perform approximate nearest neighbor search, and obtain a 1000× speed-up of our original NE implementation, with negligible MSE degradation. This acceleration also made it practical to combine all of our size-specific NE Latin models into a single Universal Latin Model (ULM). The ULM eliminates the need to determine the unknown font category and size of an input LR text document and match it to an appropriate model, a very challenging task, since the dpi (dots per inch) of the input LR image is generally unknown. Our experiments show that OCR character error rates (CER) were over 90% when we applied the Tesseract OCR engine to LR text documents (scanned at 75 dpi and 100 dpi) in the 6-10 pt range. By contrast, using k-d trees and the ULM, CER after NE preprocessing averaged less than 7% at 3× (100 dpi LR scanning) and 4× (75 dpi LR scanning) magnification, over an order of magnitude improvement. Moreover, CER after NE preprocessing was more than 6 times lower on average than after BI preprocessing.
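For illustration, the sketch below shows a k-d-tree-based (approximate) nearest-neighbour lookup of the kind used in the neighbour-embedding step; the patch dimensionality, K, and the eps tolerance are placeholder assumptions, and the full NE reconstruction is not reproduced.

```python
# Sketch of the nearest-neighbour step in neighbour-embedding super-resolution
# using a k-d tree for (approximate) search.
import numpy as np
from scipy.spatial import cKDTree

def find_ne_neighbors(lr_patches, train_lr_patches, k=5, eps=0.5):
    """Return indices of the K closest training LR patches for each input patch.

    lr_patches, train_lr_patches: arrays of shape (N, d) of vectorized patches.
    eps > 0 enables approximate search (distances within a (1 + eps) factor).
    """
    tree = cKDTree(train_lr_patches)
    _, idx = tree.query(lr_patches, k=k, eps=eps)
    return idx   # the NE step then solves for reconstruction weights over these K

# Usage with random stand-ins for real patch dictionaries:
train = np.random.rand(10000, 25)   # 5x5 LR training patches
query = np.random.rand(200, 25)
print(find_ne_neighbors(query, train).shape)   # (200, 5)
```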
Characterization and identification of smoke plume for early forest fire detection
John Saghri, John Jacobs, Daniel Kohler, et al.
Characterization and discrimination of fire smoke for a land-based early forest fire detection system are discussed. Preliminary results of several fire plume identification schemes applied to multispectral video data obtained from a number of controlled fire experiments are presented. The temporal, spectral, and spatial signatures of the fire are exploited. The methods discussed include: (1) range filtering followed by entropy filtering of the infrared (IR) video data, (2) dual range moving average differencing followed by principal component analysis (PCA) of IR video data, and (3) PCA of visible color video data followed by texture analysis and segmentation. The three schemes presented are tailored to detect the fire core, the heat plume, and the smoke, respectively.
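A minimal, assumption-laden sketch of scheme (1), range filtering followed by local entropy filtering of an IR frame, is given below; the window sizes and the histogram-based entropy estimate are illustrative choices rather than the authors' exact processing chain.

```python
# Sketch of local range filtering followed by local entropy filtering of an IR frame.
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter, generic_filter

def local_entropy(values, bins=16):
    """Shannon entropy of the values inside one filter window (values in [0, 1])."""
    hist, _ = np.histogram(values, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def range_then_entropy(ir_frame, win=9):
    """ir_frame: 2-D float array scaled to [0, 1]; returns an entropy map."""
    rng = maximum_filter(ir_frame, size=win) - minimum_filter(ir_frame, size=win)
    rng = rng / (rng.max() + 1e-12)
    # High local entropy of the range image flags the turbulent plume texture.
    return generic_filter(rng, local_entropy, size=win)
```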
Onboard pattern recognition for autonomous UAV landing
Chen-Ko Sung, Florian Segor
The civil security and supervision system AMFIS was developed at the Fraunhofer IOSB as a mobile support system using multiple UAVs for rescue forces in accidents or disasters. To gain a higher level of autonomy for these UAVs, different onboard process chains of image exploitation for landmark tracking and of control technologies for UAV navigation were implemented and examined to achieve a redundant and reliable UAV precision landing. First experiments have allowed us to validate the process chains and to develop a demonstration system for landmark tracking that prevents or minimizes confusion during landing.
Mobile Video and Application
Adapting video delivery based on motion triggered visual attention
Velibor Adzic, Hari Kalva, Lai-Tee Cheok
Cues from the human visual system (HVS) can be used for further optimization of compression in modern hybrid video coding platforms. We present work that explores and exploits motion-related attentional limitations. Algorithms for exploiting motion-triggered attention were developed and compared with an MPEG-4 AVC/H.264 encoder with various settings at different bitrate levels. For sequences with high motion activity our algorithm provides up to 8% bitrate savings.
Content adaptive enhancement of video images
Vladimir Lachine, Louie Lee, Gregory Smith
Digital video products such as TVs, set-top boxes and DV players have circuits that enhance the quality of incoming video content. Users may control the parameters of these circuits according to the video source for optimum quality. However, there is a need for a procedure that can adjust these parameters automatically without user interaction. A three-stage method for content adaptive enhancement of video images (CAEVI) in display processors is proposed. The first stage measures video signal statistics such as intensity and frequency histograms over the image's active area. The following stage generates control parameters for the image processing blocks after analysis of the measured statistics. One of four quality classes (low, medium, high or special) is assigned to the incoming video, and a set of predefined control parameters for this class is selected. At the third stage, the set of control parameters is applied to the corresponding image processing blocks to reduce noise, improve signal transitions, enhance spatial details, contrast, brightness and saturation, and resample the video image. Video signal statistics are measured and accumulated for each frame, and control parameters are gradually adjusted on a scene basis. The measuring and processing blocks are implemented in hardware to provide real-time response. The image analysis and quality classification algorithm is implemented in embedded software for flexibility. The proposed method has been implemented in a video processor as the "Auto HQV" feature. The method was originally developed for TVs and is currently being adapted for handheld devices.
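The toy sketch below illustrates the measure-then-classify idea in simplified form; the statistics and thresholds are invented placeholders and do not correspond to the actual "Auto HQV" implementation.

```python
# Toy measure-then-classify step: compute simple statistics over the active area
# and map them to a quality class. Thresholds are invented placeholders.
import numpy as np

def classify_frame(luma, hf_threshold=0.02, contrast_threshold=60):
    """luma: 8-bit luminance of the active picture area; returns a quality class."""
    contrast = np.percentile(luma, 95) - np.percentile(luma, 5)
    # Crude high-frequency estimate from horizontal pixel differences.
    hf_energy = np.mean(np.abs(np.diff(luma.astype(np.float32), axis=1))) / 255.0
    if hf_energy < hf_threshold and contrast < contrast_threshold:
        return "low"
    if hf_energy < hf_threshold or contrast < contrast_threshold:
        return "medium"
    return "high"   # a "special" class would need additional content analysis
```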
Error-resilient video coding for wireless video telephony applications
In this paper, we present an error-resilient video coding scheme for wireless video telephony applications that uses feedback to limit error propagation. In conventional feedback-based error-resilient schemes, error propagation can significantly degrade visual quality when the feedback delay is on the order of a few seconds. We propose a coding structure based on multiple description coding that mitigates error propagation during the feedback delay, and uses feedback to adapt its coding structure so as to effectively limit error propagation. We demonstrate the effectiveness of our approach at different error rates when compared to conventional coding schemes that use feedback.
A new context-model for the pseudo-distance technique in lossless compression of color-mapped images
In this work, we propose a method that utilizes a new context model along with a pseudo-distance technique for the compression of color-mapped images. Graphics Interchange Format (GIF) and Portable Network Graphics (PNG) are two of the best-known and most frequently used techniques for the compression of color-mapped images. Several techniques achieve better compression results than GIF and PNG; however, most of them need two passes over the image data, while others do not run in linear time. The pseudo-distance technique runs in linear time and requires only one pass. We show that using the proposed context model along with the pseudo-distance technique yields better results than both PNG and GIF.
A novel convergence control method for toed-in stereo camera systems
In this paper, we present a novel convergence control method for toed-in stereo camera systems. The proposed method automatically computes a convergence angle for both the left and right cameras to locate a target object at the image center. Unlike other image-based auto-convergence algorithms, the proposed method controls the yaw angle of the stereo cameras, and thus makes the disparity of the target object zero while capturing stereoscopic images. The proposed algorithm is based on the fact that an object at the convergence position has zero disparity in stereoscopic images under the toed-in camera configuration. As a result, we can avoid the accommodation-convergence conflict while watching the target object in stereoscopic images. Experimental results demonstrate that the proposed method effectively estimates convergence angles for target objects at different distances.
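For a toed-in rig with baseline b and a target at distance Z along the rig axis, the standard zero-disparity geometry gives an inward yaw of atan((b/2)/Z) per camera; the short sketch below computes this textbook relation and is not code from the paper.

```python
# Worked example of the zero-disparity geometry for a toed-in stereo rig.
import math

def convergence_angle_deg(baseline_m, target_distance_m):
    """Inward yaw per camera that places the target at zero disparity."""
    return math.degrees(math.atan((baseline_m / 2.0) / target_distance_m))

print(convergence_angle_deg(0.065, 2.0))   # ~0.93 degrees for a 65 mm baseline at 2 m
```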
Design of rate control for wireless video telephony applications
Zhifeng Chen, Yuriy A. Reznik
We propose a design of a rate control algorithm for low-delay video transmission over wireless channels. Our objective is to meet delay constraints and to make sure that the decoder buffer does not overflow or underflow. We approach this problem analytically, by studying the leaky bucket model in the variable rate transmission scenario, and deriving sufficient conditions for meeting our objective. We then employ these conditions in the design of the rate control algorithm. We report results obtained by using this algorithm with an MPEG-4 AVC/H.264 encoder and LTE channel simulator as a test platform.
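As a hedged illustration of the leaky-bucket reasoning, the toy check below tracks buffer fullness against overflow for a sequence of coded frame sizes and per-frame channel budgets; all numbers are invented and the analytical conditions derived in the paper are not reproduced.

```python
# Toy leaky-bucket check: the buffer fills by the size of each coded frame and
# drains at the (possibly time-varying) channel rate; rate control must keep
# the fullness within [0, B]. Numbers are illustrative placeholders.
def leaky_bucket_ok(frame_bits, channel_bits_per_frame, buffer_size_bits, initial_fullness=0):
    fullness = initial_fullness
    for coded, drained in zip(frame_bits, channel_bits_per_frame):
        fullness += coded                        # encoder adds the coded frame
        if fullness > buffer_size_bits:          # overflow -> stall or frame loss
            return False
        fullness = max(0, fullness - drained)    # channel removes up to its budget
    return True

# Example: 30 frames of ~8 kbit over a 10 kbit/frame channel with a 40 kbit buffer.
print(leaky_bucket_ok([8000] * 30, [10000] * 30, 40000))   # True
```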
Compression performance comparison in low delay real-time video for mobile applications
This article compares the performance of several current video coding standards under low-delay, real-time conditions in a resource-constrained environment. The comparison is performed using the same content and a mix of objective and perceptual quality metrics. The metric results for the different coding schemes are analyzed from the point of view of user perception and quality of service. Multiple standards are compared: MPEG-2, MPEG-4 and MPEG-4 AVC, as well as H.263. The metrics used in the comparison include SSIM, VQM and DVQ. Subjective evaluation and quality of service are discussed from the point of view of perceptual metrics and their incorporation in the coding scheme development process. The performance and the correlation of the results are presented as a predictor of the performance of video compression schemes.
Poster Session
Center location of circular targets with surface fitting method
The circular target is one of the most commonly used artificial markers in machine vision. An alternative method for locating the center of circular targets with sub-pixel accuracy, based on surface fitting, is presented in this paper. The gray-level distribution around the image of the circular target is modeled starting from a one-dimensional step edge smoothed with a Gaussian filter, and then extended to the two-dimensional case by means of a variable substitution of the elliptical rotation. The resulting surface model is a non-elementary function, so an approximate expression is subsequently found to make the numerical computation tractable. The parameters of the surface model are estimated with algebraic least-squares fitting, from which the accurate center location can be calculated. The experimental results show the proposed method is more robust to image degradation compared with the most commonly used method.
Correction of circular center deviation in perspective projection
Circular targets are widely used in machine vision, and the localization of the circle center plays a crucial role in machine vision applications. In the process of camera imaging, circles project as ellipses in the camera's image plane because of the perspective transformation. The center of the ellipse usually does not coincide with the projected center of the circle, leading to a deviation of the circle center. Based on perspective transformation and analytic geometry, we present a new approach in which concentric circular targets are adopted and the true projected position of the circular target can be determined accurately. Both simulation and experimental results show that the proposed method is valid and robust. The true positions of the circular centers can be localized by the proposed method without center deviation.
Automatic registration of range images combined with the system calibration and global ICP
Xiaoli Liu, Xinhua He, Zeyi Liu, et al.
In this paper, we propose an approach for the automatic, fast registration of range images captured by a 3D optical measurement system. The measurement system consists of multiple 3D sensors, distributed separately from top to bottom, which measure the object from different views. A one-axis turntable rotates the object about its axis through eight angular positions. In each orientation, multiple range images of the object are obtained with the measurement system, and all range images must then be registered into a uniform coordinate frame. First, we establish an in-situ 3D calibration target in the measurement volume, consisting of a number of marker points. The coordinates of those marker points are obtained by photogrammetry and are then employed to determine the locations and orientations of the 3D sensors, which are used to register the range images taken by the multiple sensors at one angular view. In addition, the registration of the range images across the eight angles is achieved by calibrating the rotation axis. Finally, a global iterative closest point method is proposed to attain the fine registration of all range images. The experimental results demonstrate the validity of the registration approach.
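The sketch below is a generic, minimal point-to-point ICP iteration (nearest neighbours plus a Kabsch rigid fit) given for illustration; it omits the multi-sensor calibration, turntable-axis registration and the global coupling that the paper describes.

```python
# Minimal point-to-point ICP iteration: nearest neighbours + Kabsch alignment.
import numpy as np
from scipy.spatial import cKDTree

def rigid_fit(P, Q):
    """Least-squares rotation R and translation t with q_i ~ R @ p_i + t."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:     # avoid reflections
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    return R, cq - R @ cp

def icp(source, target, iterations=20):
    """Align source (N, 3) to target (M, 3); returns the transformed source points."""
    tree = cKDTree(target)
    src = source.copy()
    for _ in range(iterations):
        _, idx = tree.query(src)             # closest target point for every source point
        R, t = rigid_fit(src, target[idx])
        src = src @ R.T + t
    return src
```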
Utilization of Solar Dynamics Observatory space weather digital image data for comparative analysis with application to Baryon Oscillation Spectroscopic Survey
V. Shekoyan, S. Dehipawala, Ernest Liu, et al.
Digital solar image data is available to users with access to standard, mass-market software. Many scientific projects utilize the Flexible Image Transport System (FITS) format, which requires specialized software typically used in astrophysical research. Data in the FITS format includes photometric and spatial calibration information, which may not be useful to researchers working with self-calibrated, comparative approaches. This project examines the advantages of using mass-market software with readily downloadable image data from the Solar Dynamics Observatory for comparative analysis over the use of specialized software capable of reading data in the FITS format. Comparative analyses of brightness statistics that describe the solar disk in the study of magnetic energy, using algorithms included in mass-market software, have been shown to give results similar to analyses using FITS data. The entanglement of magnetic energy associated with solar eruptions, as well as the development of such eruptions, has been characterized successfully using mass-market software. The proposed approach would help to establish a publicly accessible computing network that could assist in exploratory studies of all FITS data. Advances in computer, cell phone and tablet technology could readily incorporate such an approach for the enhancement of high school and first-year college space weather education on a global scale. Application to ground-based data such as that contained in the Baryon Oscillation Spectroscopic Survey is discussed.
Video-based face identification using unconstrained non-linear composite filters
Everardo Santiago-Ramírez, J.-A. Gonzalez-Fraga, J.-I. Ascencio-Lopez, et al.
This paper considers the face identification task in video sequences where the individual's face presents variations such as expression, pose, scale, shadow/lighting and occlusion. The principles of Synthetic Discriminant Functions (SDF) and K-law filters are used to design an adaptive unconstrained correlation filter (AUNCF). We developed a face tracking algorithm which, together with a face recognition algorithm, was carefully integrated into a video-based face identification method. First, a manually selected face in the first video frame is identified. Then, in order to build an initial correlation filter, the selected face is distorted to generate a training set. Finally, the face tracking task is performed using the initial correlation filter, which is updated throughout the video sequence. The efficiency of the proposed method is shown by experiments on video sequences presenting different facial variations. The proposed method correctly identifies and tracks the face under observation in the tested video sequences.
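For illustration, the following sketch performs generic frequency-domain correlation of a frame with a filter, the basic operation underlying correlation-filter tracking; the SDF/K-law filter synthesis and the AUNCF update are not reproduced, and the simple matched filter shown is only a stand-in.

```python
# Generic frequency-domain correlation of a frame with a filter; the peak of the
# correlation plane gives the estimated face position. The composite-filter
# synthesis is intentionally not reproduced here.
import numpy as np

def matched_filter(template):
    """Simplest possible stand-in 'filter': the FFT of a single training image."""
    return np.fft.fft2(template)

def correlate(frame, filter_freq):
    """frame: grayscale image; filter_freq: filter already in the frequency domain."""
    F = np.fft.fft2(frame, s=filter_freq.shape)
    plane = np.abs(np.fft.ifft2(F * np.conj(filter_freq)))
    peak = np.unravel_index(np.argmax(plane), plane.shape)
    return plane, peak
```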
A comparison of autonomous techniques for multispectral image analysis and classification
Multispectral imaging has given rise to important applications related to the classification and identification of objects in a scene. Because multispectral instruments can be used to estimate the reflectance of materials in the scene, these techniques constitute fundamental tools for materials analysis and quality control. During the last years, a variety of algorithms have been developed to work with multispectral data, whose main purpose has been to perform the correct classification of the objects in the scene. The present study introduces a brief review of some classical techniques, as well as a novel technique, that have been used for such purposes. The use of principal component analysis and K-means clustering as important classification algorithms is discussed. Moreover, a recent method based on the min-W and max-M lattice auto-associative memories, originally proposed for endmember determination in hyperspectral imagery, is introduced as a classification method. Besides a discussion of their mathematical foundation, we emphasize their main characteristics and the results achieved for two exemplar images composed of objects similar in appearance but spectrally different. The classification results show that the first components computed from principal component analysis can be used to highlight areas with different spectral characteristics. In addition, the use of lattice auto-associative memories provides good results for materials classification even in cases where some similarities appear in their spectral responses.
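A hedged sketch of the classical PCA-plus-K-means pipeline discussed above is given below, using scikit-learn; the numbers of components and clusters are placeholders, and the lattice auto-associative memory method is not shown.

```python
# Generic PCA + K-means classification of a multispectral cube reshaped to
# (pixels, bands). Component and cluster counts are illustrative choices.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def classify_multispectral(cube, n_components=3, n_clusters=4):
    """cube: (H, W, B) reflectance data; returns an (H, W) label map."""
    h, w, b = cube.shape
    X = cube.reshape(-1, b).astype(np.float64)
    scores = PCA(n_components=n_components).fit_transform(X)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(scores)
    return labels.reshape(h, w)
```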
Multifocus image fusion using Zernike moments
C. Toxqui-Quitl, E. Velázquez-Ramírez, A. Padilla-Vivanco, et al.
A multifocus image fusion method using orthogonal moments is presented. The focus measure is based on the computation of Zernike and harmonic moments of an image function. The algorithm divides the input images into blocks and evaluates the contrast of each block. From this, the boundaries between focused and defocused regions are determined, and the method selects the better-focused regions to create the final focused image. The method is based on orthogonal basis functions, which are used as the moment weighting kernels. Fusion results show that Zernike-Fourier moments can achieve a high-quality fusion, while harmonic moments achieve a good fusion with a simple average of moments.
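The block-wise fusion skeleton below illustrates the selection logic described above, but with a simple variance focus measure standing in for the Zernike and harmonic moment contrast used in the paper; the block size is an arbitrary choice.

```python
# Block-wise multifocus fusion skeleton with a variance focus measure standing
# in for the moment-based contrast described in the abstract.
import numpy as np

def fuse_multifocus(img_a, img_b, block=16):
    """img_a, img_b: grayscale images of the same scene with different focus."""
    fused = np.empty_like(img_a)
    h, w = img_a.shape
    for r in range(0, h, block):
        for c in range(0, w, block):
            a = img_a[r:r + block, c:c + block]
            b = img_b[r:r + block, c:c + block]
            # Keep the block with the higher focus measure.
            fused[r:r + block, c:c + block] = a if a.var() >= b.var() else b
    return fused
```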
Graphical user interface (GUIDE) for the control of two methods of recovery profiles for tridimensional objects
Marco Antonio Canchola Chávez, Estela López Olazagasti, Gustavo Ramírez Zavaleta, et al.
The recovery of an object's profile is of great interest in various technical, metrological and medical applications. In this work we present a system in which, through a single graphical interface, we can set up fringe generation, acquisition and image processing for two different fringe projection methods: the phase-shifting method and the so-called Takeda method. Both techniques aim to obtain surface profiles through phase recovery. The proposed system has the advantage that there is no need for decoupled systems, one for fringe projection and image acquisition and another for processing. We present some preliminary results obtained using the proposed system.
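As a hedged illustration of the Takeda (Fourier-transform) phase-recovery step named above, the sketch below isolates the carrier lobe of a fringe image, shifts it to DC and takes the phase of the inverse transform; the carrier bin and band half-width are assumed to be known from the projected grid, and this is not the authors' implementation.

```python
# Sketch of Fourier-transform (Takeda-style) phase extraction from a fringe image.
import numpy as np

def takeda_phase(fringe_rows, carrier_bin, half_width):
    """fringe_rows: (H, W) fringe image; carrier_bin > half_width is assumed.

    Returns the unwrapped phase per row, which encodes the surface profile.
    """
    spectrum = np.fft.fft(fringe_rows, axis=1)
    filtered = np.zeros_like(spectrum)
    band = slice(carrier_bin - half_width, carrier_bin + half_width + 1)
    filtered[:, band] = spectrum[:, band]              # keep only the +f0 lobe
    shifted = np.roll(filtered, -carrier_bin, axis=1)  # move the carrier to DC
    analytic = np.fft.ifft(shifted, axis=1)
    return np.unwrap(np.angle(analytic), axis=1)
```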
Static sign language recognition using 1D descriptors and neural networks
José F. Solís, Carina Toxqui, Alfonso Padilla, et al.
A framework for static sign language recognition using descriptors that represent 2D images as 1D data, together with artificial neural networks, is presented in this work. The 1D descriptors were computed by two methods: the first consists of a rotational correlation operator [1], and the second is based on contour analysis of the hand shape. One of the main problems in sign language recognition is segmentation; most papers rely on specially colored gloves or backgrounds for hand shape analysis. In order to avoid the use of gloves or special clothing, a thermal imaging camera was used to capture the images. Static signs for the digits 1 to 9 of American Sign Language were used, and a multilayer perceptron reached 100% recognition with cross-validation.
Wiener filtering in the process of dark current suppression
Jan Švihlík, Frantisek Mojžíš, Karel Fliegel, et al.
This paper is devoted to the suppression of dark current in astronomical and multimedia images using Wiener filtering. We consider the dark current, represented by the dark frame, as white impulsive noise generated in the CCD sensor. The Wiener filter is then set up in accordance with second sample moments measured on the CCD over the chosen temperature range of 268 K to 293 K. Furthermore, the temperature dependency of the second sample moments was fitted by exponential regression. Hence, we are able to find the sample moment and suppress the dark current at a given temperature. The measurements were made with an SBIG ST8 camera.
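The sketch below illustrates, under stated assumptions, the described idea of tying the Wiener filter's noise term to temperature: the calibration moments are invented placeholders, the exponential regression follows the abstract, and scipy.signal.wiener is used as a generic Wiener filter rather than the authors' implementation.

```python
# Sketch: exponential fit of noise moments vs. temperature, then Wiener filtering
# with the predicted noise power. Calibration values are invented placeholders.
import numpy as np
from scipy.signal import wiener

# Placeholder calibration: second sample moments measured at a few temperatures (K).
temps = np.array([268.0, 278.0, 288.0, 293.0])
moments = np.array([2.1, 4.3, 9.0, 13.5])
b, log_a = np.polyfit(temps, np.log(moments), 1)   # exponential regression: m = a * exp(b*T)

def suppress_dark_current(image, temperature_k, window=5):
    noise_power = np.exp(log_a + b * temperature_k)  # predicted moment at this temperature
    return wiener(image.astype(np.float64), mysize=window, noise=noise_power)
```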
A fast matching algorithm based on local gradient histograms
Image matching is an important task in image processing. Basically two different problems are distinguished: detection of a reference image in a scene and estimation of its exact position. Recently many matching algorithms have been proposed. In this work, we propose a hybrid matching algorithm based on recursive calculation of local gradient histograms and pyramidal representation of matched images. The proposed algorithm is fast and invariant to affine transformations such as rotation, translation, and scaling. Computer simulation results obtained with the suggested algorithm are presented and compared with those of common matching techniques.
Face recognition in real uncontrolled environment with correlation filters
Susana Garduño-Massieu, Vitaly Kober
Face recognition is a task that humans perform daily and effortlessly. In pattern recognition and computer vision, there has been increasing interest in automatic face recognition over the past years. Facial recognition systems face challenging problems owing to inherent variations in image acquisition factors such as nonuniform illumination, pose changes, occlusion and ageing. Numerous techniques have been proposed for face recognition in still images. Despite recent achievements in this area, the problem of reliable facial recognition in a real uncontrolled scene remains open. This work introduces an algorithm that is based on composite correlation filters and does not require prior face segmentation. Optimal filters with respect to the discrimination capability criterion are derived and used to synthesize a single composite filter that can be used for distortion-invariant face recognition. Computer simulation results obtained with the suggested algorithm are presented and discussed.
Comparison of stereoscopic technologies in various configurations
Karel Fliegel, Stanislav Vítek, Tomáš Jindra, et al.
The aim of this paper is twofold. In the first part of the paper we present results of subjective quality assessment based comparison of stereoscopic technologies in various configurations. Subjective assessment has been done on a limited set of observers while using a database of stereoscopic test videos of various source types. There is also comparison of results obtained with the same stereoscopic content from the two cooperating test laboratories. The results can be used to address different aspects of viewing experience, especially comparing passive and active stereoscopic display technologies. The second part of the paper is focused on preliminary experimental results analyzing the vergence-accommodation conflict present in current stereoscopic systems. Simultaneous measurement of the vergence and accommodation has been done with observers viewing a real scene and its stereoscopic reproduction.
Reconstruction of tridimensional objects with two different textures using Gaussian model
Luis David Lara-Rodríguez, Elizabeth López-Meléndez, Jorge M. Ibarra Galitzia, et al.
In this work we present the use of two Gaussian models that describe the reflectance of two different textures, for the reconstruction of tridimensional objects made of these textures. The textures of the objects are segmented using a combination of image processing techniques. These segmentations are correlated with the Gaussian model for the corresponding texture to reconstruct the tridimensional object. We show our preliminary experimental results and discuss the advantages and disadvantages of the Gaussian models, which are compared with the traditional representation of the inverse square law of light.
Application of real-time single camera SLAM technology for image-guided targeting in neurosurgery
Yau-Zen Chang, Jung-Fu Hou, Yi Hsiang Tsao, et al.
In this paper, we propose an application of augmented reality technology for targeting tumors or anatomical structures inside the skull. The application combines MonoSLAM (single-camera simultaneous localization and mapping) with computer graphics. A stereo vision system is developed to construct geometric data of the human face for registration with CT images. Reliability and accuracy are enhanced by the use of fiduciary markers fixed to the skull. MonoSLAM keeps track of the current location of the camera with respect to an augmented reality (AR) marker using an extended Kalman filter, and the fiduciary markers provide a reference when the AR marker is invisible to the camera. The relationship between the markers on the face and the augmented reality marker is obtained through a registration procedure using the stereo vision system and is updated on-line. A commercially available Android-based tablet PC equipped with a 320×240 front-facing camera was used for the implementation. The system is able to provide a live view of the patient overlaid with solid models of tumors or anatomical structures, as well as the otherwise invisible part of the tool inside the skull.
Vessel classification in overhead satellite imagery using learned dictionaries
Katie Rainey, Shibin Parameswaran, Josh Harguess, et al.
Recognition and classification of vessels in maritime imagery is a challenging problem with applications to security and military scenarios. Aspects of this problem are similar to well-studied problems in object recognition, but it is in many ways more complex than a problem such as face recognition. A vessel's appearance can vary significantly from image to image depending on factors such as lighting condition, viewing geometry, and sea state, and there is often wide variation between ships of the same class. This paper explores the efficacy of several object recognition algorithms at classifying ships and other ocean vessels in commercial panchromatic satellite imagery. The recognition algorithms tested include traditional classification methods as well as more recent methods utilizing sparse matrix representations and dictionary learning. The impacts on classification accuracy of various pre-processing steps on vessel imagery are explored, and we discuss how these algorithms are being used in existing systems to detect and classify vessels in satellite imagery.
Light field optical flow for refractive surface reconstruction
This paper discusses a method to reconstruct a transparent flow surface from a single camera shot with the aid of a micro-lens array. An intentionally prepared high-frequency background placed behind the refractive flow is captured, and a curl-free optical flow algorithm is applied between pairs of images taken by different micro-lenses. The computed raw optical flow vector is a blend of motion parallax and the background deformation due to the underlying flow. Subtracting the motion parallax, which is obtained by calibration, from the total optical flow vector yields the background deformation vector. The deflection vectors in each image are used to reconstruct the flow profile. A synthetic data set of fuel injection was used to evaluate the accuracy of the proposed algorithm, and good agreement was achieved between the test and reconstructed data. Finally, real light field data of hot air created by a lighter flame is used to reconstruct and show a hot-air plume surface.