Proceedings Volume 6508

Visual Communications and Image Processing 2007


View the digital version of this volume at SPIE Digital Library.

Volume Details

Date Published: 28 January 2007
Contents: 14 Sessions, 106 Papers, 0 Presentations
Conference: Electronic Imaging 2007
Volume Number: 6508

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 6508
  • Video Coding I
  • Image and Video Analysis
  • Special Session: The Mathematics of Imaging
  • Special Session: Collaborative Object Tracking
  • Distributed Video Coding
  • Wavelet Representation and Coding
  • Image Registration and Recognition
  • Special Session: Next-Generation Video Coding Technologies
  • Media Communication and Networking
  • Imaging Systems
  • Video Coding II
  • Motion Estimation
  • Poster Session
Front Matter: Volume 6508
Front Matter: Volume 6508
This PDF file contains the front matter associated with SPIE Proceedings Volume 6508, including the Title Page, Copyright information, Table of Contents, Introduction (if any), and the Conference Committee listing.
Video Coding I
Real-time video coding under power constraint based on H.264 codec
Li Su, Yan Lu, Feng Wu, et al.
In this paper, we propose a joint power-distortion optimization scheme for real-time H.264 video encoding under a power constraint. First, the power constraint is translated into a complexity constraint based on DVS technology. Second, a computation allocation model (CAM) with virtual buffers is proposed to facilitate optimal allocation of the constrained computational resources to each frame. Third, a complexity-adjustable encoder based on optimal motion estimation and mode decision is proposed to meet the allocated resources. The proposed scheme takes advantage of new features of H.264/AVC video coding tools, such as the early termination strategy in fast ME. Moreover, it avoids the high overhead of parametric power-control algorithms and achieves fine complexity scalability over a wide range with stable rate-distortion performance. The proposed scheme also shows the potential for further reducing computation and power consumption at the decoder without any change to existing decoders.
A low bit-rate video coding approach using modified adaptive warping and long-term spatial memory
Ying Chen, Clyde Lettsome, Mark Smith, et al.
In this paper, an H.264/AVC video coding strategy is introduced that employs a spatial-temporal video sequence representation in which video frames are coded at a low spatial sampling rate and reference I frames are coded at high spatial resolution. High spatial frequency information is re-synthesized at the receiver side using an adaptive motion estimation and warping method. The approach as presented is shown to improve coding quality for sequences with low to moderate motion.
Rate-prediction structure complexity analysis for multi-view video coding using hybrid genetic algorithms
Yebin Liu, Qionghai Dai, Zhixiang You, et al.
Efficient exploitation of temporal and inter-view correlation is critical to multi-view video coding (MVC), and the key lies in designing the prediction chain structure according to the varying patterns of correlation. In this paper, we propose a novel prediction structure model for designing optimal MVC coding schemes, along with an in-depth tradeoff analysis between compression efficiency and prediction structure complexity under certain standard functionalities. By representing the entire set of possible chain structures rather than a few typical ones, the proposed model can give efficient MVC schemes that adapt to the required structure complexity and to the video source characteristics (the number of views and the degrees of temporal and inter-view correlation). To handle the large-scale optimization problem posed by the model, we deploy a hybrid genetic algorithm, which yields satisfactory results in our simulations.
Comparison of standard-based H.264 error-resilience techniques and multiple-description coding for robust MIMO-enabled video transmission
Milos Tesanovic, David R. Bull, Dimitris Agrafiotis, et al.
MIMO (multiple-input multiple-output) systems offer the potential for increased throughput and enhanced quality of service for multimedia transmission. The underlying multipath environment requires new error-resilience techniques if these benefits are to be fully exploited. Different MIMO architectures produce error patterns of somewhat diverse characteristics. This paper proposes the use of multiple-description coding (MDC) as an approach that outperforms standard-based error-resilience techniques in the majority of these cases. Results obtained with a random packet-error generator are extended through the use of realistic MIMO channel scenarios and argue in favour of deploying an MDC-based video transmission system. Singular value decomposition (SVD) is used to create orthogonal sub-channels within a MIMO system which provide, depending on their respective gains and fading characteristics, an efficient means of mapping video content. Results indicate improvements in the average PSNR of decoded test sequences of up to 3 dB (5 dB in the region of high PERs) compared to standard single-description video transmission, supported by significant subjective quality enhancements.
Spatial and temporal models for texture-based video coding
In this paper, we investigate spatial and temporal models for texture analysis and synthesis. The goal is to use these models to increase the coding efficiency for video sequences containing textures. The models are used to segment texture regions in a frame at the encoder and synthesize the textures at the decoder. These methods can be incorporated into a conventional video coder (e.g., H.264), where the regions modeled by textures are not coded in the usual manner; instead, texture model parameters are sent to the decoder as side information. We show that this approach can reduce the data rate by as much as 15%.
Content-adaptive motion estimation for efficient video compression
Motion estimation is the most important step in video compression. Most current video compression systems use forward motion estimation, in which motion information is derived at the encoder and sent to the decoder over the channel. Backward motion estimation does not derive an explicit representation of motion at the encoder; instead, the encoder implicitly embeds the motion information in an alternative subspace. Recently, an algorithm that adopts least-square prediction (LSP) for backward motion estimation has shown great potential to further improve coding efficiency. Forward and backward motion estimation each have advantages and disadvantages, and each is suited to particular categories of content. In this paper, we propose a novel approach that combines forward and backward motion estimation in one framework to adaptively exploit the local motion characteristics of an arbitrary video sequence, thus achieving better coding efficiency. We refer to this as Content-Adaptive Motion Estimation (CoME). The encoder in the proposed system adjusts the motion estimation method in a rate-distortion-optimized manner. According to the experimental results, CoME reduces the data rate in both lossless and lossy compression.
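The forward side of the forward/backward distinction above can be made concrete with the classic brute-force block-matching search. This is a generic textbook illustration of forward motion estimation, not the paper's CoME framework; the block size, search range, and SAD cost below are common illustrative choices, not the authors' parameters.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return np.abs(a.astype(np.int32) - b.astype(np.int32)).sum()

def forward_block_motion(cur, ref, block=8, search=4):
    """Exhaustive forward motion estimation: for each block of the
    current frame, find the displacement into the reference frame that
    minimizes SAD within +/- `search` pixels."""
    h, w = cur.shape
    vectors = {}
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            cur_blk = cur[by:by + block, bx:bx + block]
            best, best_mv = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if 0 <= y <= h - block and 0 <= x <= w - block:
                        cost = sad(cur_blk, ref[y:y + block, x:x + block])
                        if best is None or cost < best:
                            best, best_mv = cost, (dy, dx)
            vectors[(by, bx)] = best_mv
    return vectors
```

In a real codec this exhaustive search is replaced by fast ME with early termination (as the H.264 abstract earlier in this session notes); the motion vectors found here are what a forward scheme must transmit, and what a backward scheme avoids sending.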
Image and Video Analysis
Feature point tracking combining the Interacting Multiple Model filter and an efficient assignment algorithm
David Marimon, Yousri Abdeljaoued, Bruno Palacios, et al.
An algorithm for feature point tracking is proposed. The Interacting Multiple Model (IMM) filter is used to estimate the state of a feature point. The problem of data association, i.e., establishing which feature point to use in the state estimator, is solved by an assignment algorithm. A track management method is also developed; in particular, a track continuation method and a track quality indicator are presented. The evaluation of the tracking system on real sequences shows that the IMM filter combined with the assignment algorithm outperforms the Kalman filter used with the Nearest Neighbour (NN) filter, in terms of both data association performance and robustness to sudden feature point manoeuvres.
Trajectory-based ball detection and tracking with aid of homography in broadcast tennis video
Xinguo Yu, Nianjuan Jiang, Ee Luang Ang
Ball detection and tracking in broadcast tennis video (BTV) is a crucial but challenging task in tennis video semantic analysis. The challenges arise from camera motion as well as other causes, such as the presence of many ball-like objects and the small size of the tennis ball. The trajectory-based approach we proposed in previous papers mainly counteracted the challenges from causes other than camera motion and achieved good performance. This paper proposes an improved trajectory-based ball detection and tracking algorithm for BTV that uses homography to counteract the challenges caused by camera motion, bringing several new merits. First, it acquires an accurate homography, which transforms each frame into a "standard" frame. Second, it achieves higher accuracy in ball identification. Third, it obtains the ball's projected position in the real world instead of its location in the image. Last, it also identifies the landing frames and positions of the ball. The experimental results show that the improved algorithm obtains not only higher accuracy in ball identification and ball position, but also the ball's landing frames and positions. With the intent of using homography to improve video-based event detection for smart homes, we also conducted experiments on acquiring the homography for home surveillance video.
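The "transform each frame into the standard frame" step reduces to applying a 3x3 homography to image points in homogeneous coordinates. The sketch below assumes the homography H has already been estimated (the estimation itself is the paper's contribution and is not reproduced here):

```python
import numpy as np

def apply_homography(H, pts):
    """Map 2D points through a 3x3 homography H using homogeneous
    coordinates: [x', y', w']^T = H [x, y, 1]^T, then divide by w'."""
    pts = np.asarray(pts, dtype=float)
    ones = np.ones((pts.shape[0], 1))
    homog = np.hstack([pts, ones]) @ H.T   # one row per point
    return homog[:, :2] / homog[:, 2:3]    # perspective divide
```

With H mapping image coordinates to court coordinates, the same call turns a detected ball location in any frame into a position in the common "standard" frame, which is what makes landing positions comparable across frames.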
A robustified hidden Markov model for visual tracking with subspace representation
This paper describes a new, robustified Hidden Markov Model for target tracking using a subspace representation. The Hidden Markov Model (HMM) provides a powerful framework for the probabilistic modelling of observations and states, and visual tracking problems are often cast as inference problems within the HMM framework. Probabilistic Principal Component Analysis (PPCA), a classic subspace representation method, is a popular tool for appearance modelling because it provides a compact representation of high-dimensional data. Previous subspace-based tracking algorithms assume the image observations were generated from a Gaussian distribution parameterized by principal components. One drawback of a Gaussian density model is that atypical observations cannot be modelled well; hence, such models are very sensitive to outliers. To address this problem, we propose to augment the HMM with a set of latent variables {w_i} (i = 1, ..., t) that adjust the shape of the observation distribution. By carefully choosing the distribution of the {w_i}, we obtain a more robust observation distribution with heavier tails than a Gaussian. Numerical experiments demonstrate the effectiveness of this new framework in cases where the target objects are corrupted by noise or occlusion.
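The heavy-tail mechanism the abstract describes, latent scale variables reshaping a Gaussian observation density, is the same construction that yields the Student-t distribution as a Gaussian scale mixture: if x | w ~ N(mu, sigma^2 / w) and w ~ Gamma(nu/2, nu/2), integrating out w gives a Student-t. The following is a generic illustration of that effect, not the authors' exact model; it shows that an observation far in the tail is far less improbable under the t than under the Gaussian.

```python
import math

def gauss_logpdf(x, mu=0.0, sigma=1.0):
    """Log-density of N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))

def student_t_logpdf(x, nu=3.0, mu=0.0, sigma=1.0):
    """Log-density of a Student-t with nu degrees of freedom, obtained
    by integrating out a Gamma-distributed latent precision scale w."""
    z = (x - mu) / sigma
    return (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1.0) / 2.0 * math.log1p(z * z / nu))
```

An observation six standard deviations out (an occlusion, say) has log-likelihood about -18.9 under the Gaussian but only about -6.1 under the t with nu = 3, so a single outlier no longer dominates the state estimate.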
Optimal multiple sprite generation based on physical camera parameter estimation
We present a robust method of low computational complexity for estimating the physical camera parameters, intrinsic and extrinsic, of scene shots captured by cameras that pan, tilt, rotate, and zoom. These parameters are then used to split a sequence of frames into several subsequences in an optimal way for generating multiple sprites. Here, optimal means minimal memory usage while maintaining or even improving the reconstruction quality of the scene background. Since wide angles between two frames of a scene shot cause geometric distortions under a perspective mapping, it is necessary to split the shot into several subsequences. In our approach it is not mandatory that all frames of a subsequence be adjacent in the original scene; the angle-based classification also allows frame reordering, which makes our approach very powerful.
Geometrical image filtering with connected operators and image inpainting
This paper deals with the joint use of connected operators and image inpainting for image filtering. Connected operators filter the image by merging its flat zones while preserving contour information. Image inpainting restores the values of an image for a destroyed or consciously masked subregion of the image domain. In the present paper, it will be shown that image inpainting can be combined with connected operators to perform an efficient geometrical filtering technique. First, connected operators are presented and their drawbacks for certain applications are highlighted. Second, image inpainting methodology is introduced and a structural image inpainting algorithm is described. Finally, a general filtering scheme is proposed to show how the drawbacks of connected operators can be efficiently solved by structural image inpainting.
Maximum-entropy expectation-maximization algorithm for image processing and sensor networks
In this paper, we propose a maximum-entropy expectation-maximization algorithm. We use the proposed algorithm for density estimation. The maximum-entropy constraint is imposed in order to ensure smoothness of the estimated density function. The exact derivation of the maximum-entropy expectation-maximization algorithm requires determination of the covariance matrix combined with the maximum entropy likelihood function, which is difficult to solve directly. We therefore introduce a new lower-bound for the EM algorithm derived by using the Cauchy-Schwartz inequality to obtain a suboptimal solution. We use the proposed algorithm for function interpolation and image segmentation. We propose the use of the EM algorithm for image recovery from randomly sampled data and signal reconstruction from randomly scattered sensors. We further propose to use our approach to maximum-entropy expectation-maximization (MEEM) in all of these applications. Computer simulation experiments are used to demonstrate the performance of our algorithm in comparison to existing methods.
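For orientation, the baseline that MEEM constrains is the ordinary EM algorithm for mixture density estimation. Below is a minimal 1D two-component sketch of that baseline, without the maximum-entropy smoothness term or Cauchy-Schwartz lower bound, which are the paper's contributions:

```python
import numpy as np

def em_gmm_1d(x, n_iter=50):
    """Plain EM for a two-component 1D Gaussian mixture: the standard
    density-estimation baseline that MEEM augments."""
    mu = np.array([x.min(), x.max()], dtype=float)   # spread-out init
    var = np.array([x.var(), x.var()], dtype=float)
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component per sample.
        diff = x[:, None] - mu[None, :]
        resp = pi * np.exp(-0.5 * diff**2 / var) / np.sqrt(2.0 * np.pi * var)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: reestimate mixture weights, means, and variances.
        nk = resp.sum(axis=0)
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu[None, :])**2).sum(axis=0) / nk + 1e-9
    return pi, mu, var
```

The maximum-entropy constraint in the paper regularizes exactly this kind of iteration so that the estimated density stays smooth when the samples are sparse or randomly scattered, as in the sensor-network application.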
Special Session: The Mathematics of Imaging
Mathematical aspects of shape analysis for object recognition
In this paper we survey some of the mathematical techniques that have led to useful new results in shape analysis and their application to a variety of object recognition tasks. In particular, we will show how these techniques allow one to solve a number of fundamental problems related to object recognition for configurations of point features under a generalized weak perspective model of image formation. Our approach makes use of progress in shape theory and includes the development of object-image equations for shape matching and the exploitation of shape space metrics (especially object-image metrics) to measure matching up to certain transformations. This theory is built on advanced mathematical techniques from algebraic and differential geometry which are used to construct generalized shape spaces for various projection and sensor models. That construction in turn is used to find natural metrics that express the distance (geometric difference) between two configurations of object features, two configurations of image features, or an object and an image pair. Such metrics are believed to produce the most robust tests for object identification; at least as far as the object's geometry is concerned. Moreover, these metrics provide a basis for efficient hashing schemes to do identification quickly, and they provide a rigorous foundation for error and statistical analysis in any recognition system. The most important feature of a shape theoretic approach is that all of the matching tests and metrics are independent of the choice of coordinates used to express the feature locations on the object or in the image. In addition, the approach is independent of the camera/sensor position and any camera/sensor parameters. Finally, the method is also independent of object pose or image orientation. This is what makes the results so powerful.
Challenges in 3DTV image processing
André Redert, Robert-Paul Berretty, Chris Varekamp, et al.
Philips provides autostereoscopic three-dimensional display systems that will bring the next leap in visual experience, adding true depth to video systems. We identified three challenges specifically for 3D image processing: 1) bandwidth and complexity of 3D images, 2) conversion of 2D to 3D content, and 3) object-based image/depth processing. We discuss these challenges and our solutions via several examples. In conclusion, the solutions have enabled the market introduction of several professional 3D products, and progress is made rapidly towards consumer 3DTV.
The algebra and statistics of generalized principal component analysis
We consider the problem of simultaneously segmenting data samples drawn from multiple linear subspaces and estimating model parameters for those subspaces. This "subspace segmentation" problem naturally arises in many computer vision applications such as motion and video segmentation, and in the recognition of human faces, textures, and range data. Generalized Principal Component Analysis (GPCA) has provided an effective way to resolve the strong coupling between data segmentation and model estimation inherent in subspace segmentation. Essentially, GPCA works by first finding a global algebraic representation of the unsegmented data set, and then decomposing the model into irreducible components, each corresponding to exactly one subspace. We provide a summary of important algebraic properties and statistical facts that are crucial for making GPCA both efficient and robust, even when the given data are corrupted with noise or contaminated by outliers. We demonstrate the effectiveness of GPCA using a large testbed of synthetic and real experiments.
Segmentation of multivariate mixed data via lossy coding and compression
Harm Derksen, Yi Ma, Wei Hong, et al.
In this paper, based on ideas from lossy data coding and compression, we present a simple but surprisingly effective technique for segmenting multivariate mixed data that are drawn from a mixture of Gaussian distributions or linear subspaces. The goal is to find the optimal segmentation that minimizes the overall coding length of the segmented data, subject to a given distortion. We show that deterministic segmentation minimizes an upper bound on the (asymptotically) optimal solution. The proposed algorithm does not require any prior knowledge of the number or dimension of the groups, nor does it involve any parameter estimation. Simulation results reveal intriguing phase-transition behaviors of the number of segments when changing the level of distortion or the amount of outliers. Finally, we demonstrate how this technique can be readily applied to segment real imagery and bioinformatic data.
Integral invariants for 3D curves: an inductive approach
Shuo Feng, Irina A. Kogan, Hamid Krim
In this paper we obtain, for the first time, explicit formulae for integral invariants for curves in 3D with respect to the special and the full affine groups. Using an inductive approach we first compute Euclidean integral invariants and use them to build the affine invariants. The motivation comes from problems in computer vision. Since integration diminishes the effects of noise, integral invariants have advantage in such applications. We use integral invariants to construct signatures that characterize curves up to the special affine transformations.
Variable elimination for 3D from 2D
Ji Zhang, Mireille Boutin, Daniel G. Aliaga
Accurately reconstructing the 3D geometry of a scene or object observed on 2D images is a difficult problem: there are many unknowns involved (camera pose, scene structure, depth factors) and solving for all these unknowns simultaneously is computationally intensive and suffers from numerical instability. In this paper, we algebraically decouple some of the unknowns so that they can be solved for independently. Decoupling the pose from the other variables has been previously discussed in the literature. Unfortunately, pose estimation is an ill-conditioned problem. In this paper, we algebraically eliminate all the camera pose parameters (i.e., position and orientation) from the structure-from-motion equations for an internally calibrated camera. We then also fully eliminate the structure coordinates from the equations. This yields a very simple set of homogeneous polynomial equations of low degree involving only the depths of the observed points. When considering a small number of tracked points and pictures (e.g., five points on two pictures), these equations can be solved using the sparse resultant method.
Special Session: Collaborative Object Tracking
Bayesian distributed articulated object tracking using multiple collaborative trackers
In this paper, we propose two novel articulated object tracking approaches. The Decentralized Articulated Object Tracking approach avoids the common practice of using a high-dimensional joint state representation for articulated object tracking; instead, it introduces a decentralized scheme and models the inter-part interaction within an innovative Bayesian framework. To handle severe self-occlusions, we further extend the first approach by modeling high-level inter-unit interactions and develop the Hierarchical Articulated Object Tracking algorithm within a consistent hierarchical framework. Preliminary experimental results demonstrate the superior performance of the proposed approaches on real-world video sequences.
Tracking people in mixed modality systems
Yuri Ivanov, Alexander Sorokin, Christopher Wren, et al.
In traditional surveillance systems, tracking of objects is achieved by means of image and video processing. The disadvantage of such systems is that an object can only be tracked if it is observed by a video camera, and the geometry of indoor spaces typically requires a large number of cameras to provide the coverage necessary for robust video-based tracking. An increased number of video streams in turn increases the computational burden on the surveillance system. In this paper we present an approach to tracking in mixed-modality systems with a variety of sensors. The system described here includes over 200 motion sensors as well as 6 moving cameras. We track individuals in the entire space and across cameras using contextual information available from the motion sensors. The motion sensors allow us to almost instantaneously find plausible tracks in a very large volume of data, spanning months, which would be virtually impossible for traditional video search approaches. We also describe a method for detecting when the tracking system is unreliable so that the data can be presented to a human operator for disambiguation.
Multiple hypothesis shape tracking using particle filtering and Hough-based observation models
Alessio Dore, Majid Asadi, Carlo S. Regazzoni
In recent years, the particle filter algorithm has been extensively proposed and employed for visual tracking of multiple moving objects under different assumptions. This wide usage is due to its capability of performing recursive multiple-hypothesis state estimation for non-linear, non-Gaussian motion and observation models. In this paper, a method based on the particle filter framework is proposed for multiple-object tracking, exploiting a target representation consisting of position and shape described as a fixed-dimensionality vector composed of a fixed number of grouped target corners. However, the application domains of visual tracking algorithms are usually characterized by non-rigid objects and high occlusion rates, causing new corners to appear and others to disappear at each frame. To cope with this problem, a voting method (the Generalized Hough Transform) is employed to estimate the likelihood function used to weight the propagated particles (i.e., multiple corner configurations describing shapes) by means of the corners extracted from the currently observed image. This method, together with the high dimensionality of the state representation, constitutes the two main particularities of the presented particle filter. The proposed algorithm has been tested in a real-world domain, and the experiments indicate good results in tracking both rigid and non-rigid objects.
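The recursive multiple-hypothesis estimation these abstracts refer to is the standard SIR (sampling-importance-resampling) particle-filter loop: propagate particles through a motion model, reweight them by the observation likelihood, and resample. Here is a minimal 1D sketch with Gaussian models standing in for the paper's corner-based shape state and Hough-based likelihood:

```python
import numpy as np

def particle_filter_step(particles, weights, observation,
                         motion_std=1.0, obs_std=0.5, rng=None):
    """One SIR step: predict with a random-walk motion model, reweight
    by a Gaussian observation likelihood, then resample by weight."""
    rng = rng or np.random.default_rng()
    # Predict: each particle is a hypothesis, propagated independently.
    particles = particles + rng.normal(0.0, motion_std, size=particles.shape)
    # Update: particles near the observation gain weight.
    weights = weights * np.exp(-0.5 * ((observation - particles) / obs_std) ** 2)
    weights = weights / weights.sum()
    # Resample: multinomial resampling (systematic would reduce variance).
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))
```

In the paper's setting the particle is a whole corner configuration rather than a scalar, and the likelihood is computed by Generalized Hough voting against the corners extracted from the current frame, but the predict-weight-resample cycle is the same.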
Collaborative tracking of objects in EPTZ cameras
Faisal Bashir, Fatih Porikli
This paper addresses the issue of multi-source collaborative object tracking in high-definition (HD) video sequences. Specifically, we propose a new joint tracking paradigm for the multiple stream electronic pan-tilt-zoom (EPTZ) cameras. These cameras are capable of transmitting a low resolution thumbnail (LRT) image of the whole field of view as well as a high-resolution cropped (HRC) image for the target region. We exploit this functionality to perform joint tracking in both low resolution image of the whole field of view as well as high resolution image of the moving target. Our system detects objects of interest in the LRT image by background subtraction and tracks them using iterative coupled refinement in both LRT and HRC images. We compared the performance of our joint tracking system with that of tracking only in the HD mode. The results of our experiments show improved performance in terms of higher frame rates and better localization.
Particle filter-based camera tracker fusing marker and feature point cues
David Marimon, Yannick Maret, Yousri Abdeljaoued, et al.
This paper presents a video-based camera tracker that combines marker-based and feature point-based cues within a particle filter framework. The framework relies on their complementary performances. On the one hand, marker-based trackers can robustly recover camera position and orientation when a reference (marker) is available but fail once the reference becomes unavailable. On the other hand, filter-based camera trackers using feature point cues can still provide predicted estimates given the previous state. However, the trackers tend to drift and usually fail to recover when the reference reappears. Therefore, we propose a fusion where the estimate of the filter is updated from the individual measurements of each cue. The particularity of the fusion filter is to manipulate different sorts of cues in a single framework. The framework keeps a single motion model and its prediction is corrected by one cue at a time. More precisely, the marker-based cue is selected when the reference is available whereas the feature point-based cue is selected otherwise. The filter's state is updated by switching between two different likelihood distributions. Each likelihood distribution is adapted to the type of measurement (cue). Evaluations on real cases show that the fusion of these two approaches outperforms the individual tracking results.
Distributed Video Coding
Robust distributed multi-view video compression for wireless camera networks
We propose a novel method of exploiting inter-view correlation among cameras with overlapping views in order to deliver error-resilient video in a distributed multi-camera system. The main focus of this work is robustness, which is critically needed in a wireless setting. Our system has low encoding complexity, is robust while satisfying tight latency constraints, and requires no inter-sensor communication. We build on and generalize PRISM [Puri2002], an earlier proposed single-camera distributed video compression system. Specifically, decoder motion search, a key attribute of single-camera PRISM, is extended to the multi-view setting to include decoder disparity search based on two-view camera geometry. Our proposed system, dubbed PRISM-MC (PRISM multi-camera), achieved PSNR gains of up to 1.7 dB over a PRISM-based simulcast solution in experiments over a wireless channel simulator.
Hybrid key/Wyner-Ziv frames with flexible macroblock ordering for improved low delay distributed video coding
D. Agrafiotis, P. Ferré, D. R. Bull
This paper proposes a concealment-based approach to generating the side information and estimating the correlation noise for low-delay, pixel-based, distributed video coding. The proposed method employs a macroblock pattern similar to the one used in the dispersed-type FMO of H.264 to group the macroblocks of each frame into intra-coded (key) and Wyner-Ziv groups. Temporal concealment is then used at the decoder to "conceal" the missing macroblocks, i.e., to estimate the side information by predicting the Wyner-Ziv macroblocks. The actual intra coded/decoded macroblocks are used to estimate the correlation noise. The results indicate significant performance improvements relative to existing motion-extrapolation-based approaches (up to 25% bit rate reduction).
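The dispersed-FMO-style grouping the method borrows from H.264 is, in its simplest two-group form, a checkerboard: alternating macroblocks go to the key (intra) group and the Wyner-Ziv group, so every WZ macroblock has key neighbours available for concealment. A minimal sketch of that two-group case (H.264's dispersed slice-group map is more general):

```python
def dispersed_groups(mb_rows, mb_cols):
    """Checkerboard macroblock grouping: group 0 = key (intra-coded),
    group 1 = Wyner-Ziv. Each WZ macroblock then has horizontally and
    vertically adjacent key macroblocks to drive temporal concealment."""
    return [[(r + c) % 2 for c in range(mb_cols)] for r in range(mb_rows)]
```

Because the two groups interleave, the decoder can treat the WZ macroblocks like "lost" blocks surrounded by decoded key blocks, which is exactly what lets standard temporal concealment generate the side information.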
Distributed video coding based on constrained rate adaptive low density parity check codes
In this paper, we present a distributed video coding scheme based on zero-motion identification at the decoder and constrained rate-adaptive low-density parity-check (LDPC) codes. A zero-motion-block identification mechanism is introduced at the decoder, which takes the characteristics of the video sequence into account. The constrained error-control decoder can use the bits in the zero-motion blocks as a constraint to achieve better decoding performance and further improve overall video compression efficiency. The proposed scheme exploits temporal and spatial redundancy only at the decoder side, without introducing any additional processing at the encoder, which keeps the encoding complexity as low as possible for a given compression efficiency. As a powerful alternative to Turbo codes, LDPC codes are applied in our scheme. Since video data are highly non-ergodic, we use rate-adaptive LDPC codes to track the variation of the achievable compression rate. We propose a constrained LDPC decoder that both improves decoding efficiency and speeds the convergence of the iterative decoding. Simulations demonstrate that the scheme yields significant performance improvements, and the proposed constrained LDPC decoder may benefit other applications as well.
Unequal error protection using Wyner-Ziv coding for error resilience
Compressed video is very sensitive to channel errors: a few bit losses can derail the entire decoding process. Protecting compressed video is therefore imperative for visual communications. Since different elements in a compressed video stream vary in their impact on the quality of the decoded video, unequal error protection can provide efficient protection. This paper describes an unequal error protection method for protecting the data elements in a video stream via a Wyner-Ziv encoder consisting of a coarse quantizer and a Turbo-code-based lossless Slepian-Wolf encoder. Data elements that significantly impact the visual quality of the decoded video, such as the modes and motion vectors used by H.264, are given more parity bits than the coarsely quantized transform coefficients. When the transmitted sequence is corrupted by transmission errors, this yields better decoded video quality than equal error protection.
Generalized in-scale motion compensation framework for spatial scalable video coding
Ruiqin Xiong, Jizheng Xu, Feng Wu, et al.
In existing video coding schemes with spatial scalability based on a pyramid frame representation, such as the ongoing H.264/MPEG-4 SVC (scalable video coding) standard, a video frame at a high resolution is mainly predicted either from the lower-resolution image of the same frame or from temporally neighboring frames at the same resolution. Most of these prediction techniques fail to exploit the two correlations simultaneously and efficiently. This paper extends the in-scale prediction technique developed for wavelet video coding to a generalized in-scale motion compensation framework for H.264/MPEG-4 SVC. In this framework, for a video frame at a high-resolution layer, the lowpass content is predicted from the information already coded in the lower-resolution layer, while the highpass content is predicted by exploiting the neighboring frames at the current resolution. In this way, both the cross-resolution correlation and the temporal correlation are exploited simultaneously, which leads to much higher prediction efficiency. Preliminary experimental results demonstrate that the proposed framework improves the spatial scalability performance of the current H.264/MPEG-4 SVC. The improvement is especially significant for high-fidelity video coding. In addition, the proposed framework has another advantage over the wavelet-based in-scale scheme: it can support arbitrary down-sampling and up-sampling filters.
Fast prediction algorithm of adaptive GOP structure for SVC
Yi-Hau Chen, Chia-Hua Lin, Ching-Yeh Chen, et al.
Adaptive group-of-picture (GOP) structure is an important encoding tool in multi-level motion-compensated temporal filtering coding schemes. Compared to the conventional fixed-GOP scheme, it can dynamically adapt the GOP size to each sequence's characteristics to enhance coding performance. However, the existing adaptive GOP structure (AGS) algorithm proposed in JSVM incurs a huge computational cost. In this paper, a fast AGS prediction algorithm is proposed. First, based on the relationship among coding performance, GOP size, and the corresponding intra block ratio, a sub-GOP size prediction model for different decomposition levels is developed from the encoded intra block ratio. Then, a prediction scheme is proposed to implement AGS with this model: it predicts the following sub-GOP size from the current sub-GOP's information instead of searching all possible sub-GOP compositions. The experimental results show that the proposed algorithm with a linear threshold achieves almost the same coding performance as the AGS in JSVM while requiring only one-fourth of the computational complexity for a 4-level interframe coding scheme.
Wavelet Representation and Coding
Scalable direction representation for image compression with direction-adaptive discrete wavelet transform
Tao Xu, Chuo-Ling Chang, Bernd Girod
The direction-adaptive discrete wavelet transform (DA-DWT) locally adapts the filtering direction to the geometric flow in the image. DA-DWT image coders have been shown to achieve a rate-distortion performance superior to non-adaptive wavelet coders. However, since the direction information must always be signalled regardless of the total bit rate, performance at very low bit rates can suffer. In this paper, we propose two scalable direction representations: the layered scheme, which is similar to the scalable motion vector representation in scalable video coding, and the level-unit scheme, which provides finer granularity on top of the layered scheme. Experimental results indicate that the proposed level-unit scheme achieves the desired performance at both low and high bit rates. A significant improvement in image quality (about 3-5 dB) is observed at very low bit rates, relative to non-scalable coding of the direction information.
Video coding with fully separable wavelet and wavelet packet transforms
Three-dimensional (t+2D) wavelet coding schemes have been demonstrated to be efficient techniques for video compression applications. However, the separable wavelet transform used for removing spatial redundancy provides a limited representation of 2D texture because of the spatial isotropy of the wavelet basis functions. In this case, anisotropic transforms, such as fully separable wavelet transforms (FSWT), offer a solution for spatial decorrelation. FSWT inherits the separability, computational simplicity, and filter-bank characteristics of the standard 2D wavelet transform, but improves the representation of directional textures, such as those found in the temporal detail frames of t+2D decompositions. The extension of both classical wavelet and wavelet-packet transforms to fully separable decompositions preserves the low complexity and best-basis selection algorithms of the originals. We apply these transforms in t+2D video coding schemes and compare them with classical decompositions.
The edge driven oriented wavelet transform: an anisotropic multidirectional representation with oriented lifting scheme
Guillaume Jeannic, Vincent Ricordel, Dominique Barba
In spite of its success, the standard 2-D discrete wavelet transform (2D-DWT) is not completely adapted to representing image entities such as edges or oriented textures. Indeed, the DWT is limited by the spatial isotropy of its basis functions, which cannot take advantage of the regularity of edges; moreover, an edge that is neither vertical nor horizontal is represented by many of these wavelet basis functions, which means that the DWT does not provide a sparse representation for such discontinuities. Several representations have been proposed to overcome this shortcoming. Some of them handle more orientations at the price of redundancy (e.g. ridgelets, curvelets, contourlets), and their implementations are not trivial or require 2-D non-separable filtering. We present two oriented lifting-based schemes using separable filtering, driven by edge extraction and inspired by bandelets and curved wavelets. An image is decomposed into a quadtree according to the orientation of its edge elements. For each leaf, a wavelet transform is performed along the most regular orientation and then along its orthogonal direction. Different adapted filters may be used for these two directions in order to achieve anisotropic filtering. Our method also permits perfect reconstruction and critical sampling.
Image Registration and Recognition
Multimodal image registration based on edges and junctions
This paper proposes an edge-based multimodal image registration approach. It aims to address image registration as a whole rather than tackle each of its elements independently. One-pixel-wide curves are first extracted from the images, and junctions along the curves are detected. Then, each curve is divided into subsegments, which serve as matching primitives. A similarity metric based on the number of matched pairs of subsegments is proposed, and experimental results show that the presented approach is a robust and effective tool for multimodal image registration.
Automatic target segmentation in color dental images
Jiebo Luo, Mark Bolin
Automatic target segmentation is critical to computerized dental imaging systems, which are designed to reduce human effort and error. We have developed an automatic algorithm that is capable of outlining an intra-oral reference bar and the tooth of interest. In particular, the algorithm first locates the reference bar using unique color and shape cues. The located reference bar provides an estimate for the tooth of interest in terms of both its scale and location. Next, the estimate is used to initialize a trained active shape model (ASM) consisting of the bar and the tooth. Finally, a search process is performed to find a match between the ASM and the local image structures. Experimental results have shown that our fully automatic algorithm provides accurate segmentation of both the reference bar and the tooth of interest, and it is insensitive to lighting, tooth color, and tooth-shape variations.
Comparison of compression algorithms' impact on fingerprint and face recognition accuracy
A. Mascher-Kampfer, Herbert Stögner, Andreas Uhl
The impact of using different lossy compression algorithms on the matching accuracy of fingerprint and face recognition systems is investigated. In particular, we relate rate-distortion performance as measured in PSNR to the matching scores obtained by the recognition systems. JPEG2000 and SPIHT are correctly predicted by PSNR to be the compression algorithms best suited for use in fingerprint and face recognition systems. Fractal compression is identified as least suited for use in the investigated recognition systems, although PSNR suggests that JPEG should deliver worse recognition results in the case of face imagery. JPEG compression performs surprisingly well at high bitrates in face recognition systems, given the low PSNR performance observed.
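For reference, the PSNR figure used in such rate-distortion comparisons is the standard peak-signal-to-noise-ratio definition, 10 log10(MAX^2/MSE). A minimal pure-Python sketch (a textbook definition, not code from the paper):

```python
import math

def psnr(original, degraded, max_val=255):
    """Peak signal-to-noise ratio in dB between two equal-size images,
    given here as flat lists of pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(original, degraded)) / len(original)
    if mse == 0:
        return float("inf")  # identical images
    return 10 * math.log10(max_val ** 2 / mse)

orig = [100, 120, 130, 140]
comp = [101, 119, 131, 139]   # each pixel off by 1 -> MSE = 1
print(round(psnr(orig, comp), 2))  # → 48.13
```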
Special Session: Next-Generation Video Coding Technologies
Film grain noise modeling in advanced video coding
A new technique for film grain noise extraction, modeling, and synthesis is proposed and applied to the coding of high-definition video in this work. Film grain noise is viewed by people in the movie industry as part of the artistic presentation. On the one hand, since film grain noise can boost the natural appearance of pictures in high-definition video, it should be preserved in high-fidelity video processing systems. On the other hand, coding video with film grain noise is expensive. It is therefore desirable to extract the film grain noise from the input video as a pre-processing step at the encoder, and to re-synthesize it and add it back to the decoded video as a post-processing step at the decoder. Under this framework, the coding gain of the denoised video is higher while the quality of the final reconstructed video is still well preserved. Following this idea, we present a method to remove film grain noise from image/video without distorting the original content. In addition, we describe a parametric model containing a small set of parameters to represent the extracted film grain noise. The proposed model generates film grain noise that is close to the real one in terms of power spectral density and cross-channel spectral correlation. Experimental results demonstrate the efficiency of the proposed scheme.
Advances in hybrid video coding
Thomas Wedi, Steffen Wittmann, Torsten Palfner, et al.
Hybrid video coding has been known for a long time and is applied in all video coding standards, such as MPEG-x and H.26x. This paper shows that there is still considerable potential for further coding efficiency improvements. The paper starts with an overview of state-of-the-art hybrid video coding schemes such as H.264/AVC. Thereafter, our advances on the main building blocks of H.264/AVC are presented, which significantly improve the coding efficiency. For instance, intra prediction is improved by changing scan directions and thus providing better reference pixels for specific prediction directions. Adaptive filtering and high-precision motion compensation improve the motion-compensated prediction. Furthermore, the combination of transformation, quantization, and entropy coding of the prediction error is improved using an advanced frequency-selective coding technique. With the transmission of post-filter hints it is possible to design a Wiener post-filter that significantly enhances the picture quality. Finally, texture synthesis techniques are used to improve the subjective quality for specific textures with homogeneous statistical characteristics. This paper presents the above-mentioned techniques in detail. Depending on the input sequence and the bit rate, the objective and/or subjective gains compared to H.264/AVC are quite significant.
Next generation video coding for mobile applications: industry requirements and technologies
Madhukar Budagavi, Minhua Zhou
Handheld battery-operated consumer electronics devices such as camera phones, digital still cameras, digital camcorders, and personal media players have become very popular in recent years. Video codecs are extensively used in these devices for video capture and/or playback. The annual shipment of such devices already exceeds a hundred million units and is growing, which makes the requirements of mobile battery-operated video devices very important to video coding research and development. This paper highlights the following unique set of requirements for video coding in these applications: low power consumption, high video quality at low complexity, and low cost. It motivates the need for a new video coding standard that enables better trade-offs of power consumption, complexity, and coding efficiency to meet the challenging requirements of portable video devices. This paper also provides a brief overview of some of the video coding technologies being presented in the ITU-T Video Coding Experts Group (VCEG) standardization body for computational complexity reduction and coding efficiency improvement in a future video coding standard.
Adaptive filtering for cross-view prediction in multi-view video coding
PoLin Lai, Yeping Su, Peng Yin, et al.
We consider the problem of coding multi-view video that exhibits mismatches in frames from different views. Such mismatches can be caused by heterogeneous cameras and/or different shooting positions of the cameras. In particular, we consider focus mismatches across views, i.e., different portions of a video frame can undergo different blurriness/sharpness changes with respect to the corresponding areas in frames from the other views. We propose an adaptive filtering approach for cross-view prediction in multi-view video coding. The disparity fields are exploited as an estimate of scene depth. An expectation-maximization (EM) algorithm is applied to classify the disparity vectors into groups. Based on the classification result, a video frame is partitioned into regions with different scene-depth levels. Finally, for each scene-depth level, a two-dimensional filter is designed to minimize the average residual energy of cross-view prediction for all blocks in the class. The resulting filters are applied to the reference frames to generate better matches for cross-view prediction. Simulation results show that, when encoding across views, the proposed method achieves up to 0.8 dB gain over current H.264 video coding.
RD-optimized competition scheme for efficient motion prediction
J. Jung, G. Laroche, B. Pesquet
H.264/MPEG4-AVC is the latest video codec provided by the Joint Video Team, which gathers ITU-T and ISO/IEC experts. Technically, there are no drastic changes compared to its predecessors H.263 and MPEG-4 part 2. It nevertheless significantly reduces the bitrate and seems to be progressively adopted by the market. The gain mainly results from the addition of efficient motion compensation tools: variable block sizes, multiple reference frames, 1/4-pel motion accuracy, and powerful Skip and Direct modes. A close study of the distribution of bits in the bitstream reveals that motion information can represent up to 40% of the total bitstream. As a consequence, reducing the motion cost is a priority for future enhancements. This paper proposes a competition-based scheme for motion prediction. It affects the selection of the motion vectors, based on a modified rate-distortion criterion, for the Inter modes and for the Skip mode. Combined spatial and temporal predictors take advantage of temporal redundancies where the spatial median usually fails. An average 7% bitrate saving compared to a standard H.264/MPEG4-AVC codec is reported. In addition, on-the-fly adaptation of the set of predictors is proposed and preliminary results are provided.
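The competition among motion vector predictors can be illustrated with a toy Lagrangian cost J = D + λR, where D is the prediction distortion (e.g. SAD) and R the bits needed to code the motion vector against each candidate predictor. The bit-cost model below is a hypothetical stand-in for real entropy coding, not the paper's criterion:

```python
def mv_bits(mv, predictor):
    """Rough bit cost of coding mv differentially w.r.t. a predictor:
    larger residuals cost more bits (toy model, not real CAVLC/CABAC)."""
    dx = abs(mv[0] - predictor[0])
    dy = abs(mv[1] - predictor[1])
    return 1 + 2 * dx.bit_length() + 2 * dy.bit_length()

def best_predictor(mv, sad, predictors, lam=4.0):
    """Index of the predictor minimizing the Lagrangian cost J = D + lambda*R."""
    costs = [sad + lam * mv_bits(mv, p) for p in predictors]
    return min(range(len(costs)), key=costs.__getitem__)

# spatial median vs. temporal (collocated) predictor in competition
predictors = [(2, 0), (8, 3)]            # [spatial_median, temporal]
print(best_predictor((8, 2), sad=150, predictors=predictors))  # → 1
```

Here the temporal predictor wins because it is much closer to the actual motion vector, exactly the case where the spatial median fails.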
High-definition video coding with super-macroblocks
A high-definition video coding technique using super-macroblocks is investigated in this work. Our research is motivated by the observation that the macroblock-based partition in H.264/AVC may not be efficient for high-definition video, since the maximum macroblock size of 16 x 16 is relatively small compared with the whole image size. In the proposed super-macroblock based video coding scheme, the original block size MxN in H.264 is scaled to 2Mx2N. Along with the super-macroblock prediction framework, a low-complexity 16 x 16 discrete cosine transform (DCT) is proposed; compared with the 1D 8-point DCT, only 16 additional additions are required for the 1D 16-point transform. Furthermore, an adaptive scheme is proposed for selecting the best coding mode and the best transform size. Experimental results show that the super-macroblock coding scheme achieves a higher coding gain.
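A separable 2D DCT of the kind extended here to 16 x 16 can be sketched as a 1D transform applied to the rows and then to the columns. This is the textbook orthonormal DCT-II, not the paper's low-complexity fast factorization:

```python
import math

def dct_1d(x):
    """Orthonormal 1D DCT-II (direct O(n^2) form, for illustration only)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out

def dct_2d(block):
    """Separable 2D DCT: 1D transform on rows, then on columns."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

flat = [[10] * 16 for _ in range(16)]   # constant 16x16 block
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 1))  # all energy in the DC coefficient: 160.0
```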
Media Communication and Networking
Cross-layer optimization for wireless video communication
With the rapid growth of wireless networks and increasing popularity of portable video devices, wireless video communication is poised to become the enabling technology for many multimedia applications over wireless networks. Real-time wireless video transmission typically has requirements on quality of service (QoS). However, wireless channels are unreliable and the channel capacities are time-varying, which may cause severe degradation to video presentation quality. In addition, for portable devices, video compression and wireless transmission are tightly coupled through the constraints on data rate, power, and delay. These issues make it particularly challenging to design an efficient real-time video compression and wireless transmission system on a portable device. In this paper, we take a cross-layer approach to this problem; our objective is to maximize the video quality under the constraints of resource and delay. Specifically, we minimize the end-to-end video distortion under the constraints of resource and delay, over the parameters in physical, link, and application (video) layers. This formulation is general and capable of capturing the fundamental aspects of the design of wireless video communication systems. Based on this formulation, we study how the resources could be intelligently allocated to maximize the video quality and analyze the performance limits of the wireless video communication system under resource constraints.
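The constrained-allocation formulation above can be illustrated with a toy exhaustive search over per-layer operating points, each a (power, distortion) pair; the numbers and layer names are made up for illustration:

```python
from itertools import product

def allocate(options, power_budget):
    """Cross-layer search: pick one (power, distortion) operating point per
    layer so total distortion is minimized under a total power budget."""
    best = None
    for combo in product(*options):
        power = sum(p for p, _ in combo)
        dist = sum(d for _, d in combo)
        if power <= power_budget and (best is None or dist < best[0]):
            best = (dist, combo)
    return best

# hypothetical per-layer (power, distortion) points: physical, link, application
phy = [(1, 9), (3, 4)]
link = [(1, 6), (2, 3)]
app = [(2, 8), (4, 2)]
print(allocate([phy, link, app], power_budget=7))
```

Real systems replace the exhaustive search with Lagrangian or dynamic-programming methods, but the trade-off structure is the same.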
A more aggressive prefetching scheme for streaming media delivery over the Internet
Efficient delivery of streaming media content over the Internet has become an important area of research as such content rapidly gains popularity. Many research works have studied this problem based on the client-proxy-server structure and proposed various mechanisms, such as proxy caching and prefetching, to address it. While the existing techniques can improve the performance of accesses to reused media objects, they are not as effective in reducing the startup delay for first-time accessed objects. In this paper, we address this issue by proposing a more aggressive prefetching scheme to reduce the startup delay of first-time accesses. In the proposed scheme, proxy servers aggressively prefetch media objects before they are requested. We make use of servers' knowledge of access patterns to ensure the accuracy of prefetching, and we minimize the prefetched data size by prefetching only the initial segments of media objects. Results of trace-driven simulations show that the proposed prefetching scheme can reduce the ratio of delayed requests by up to 38% with a very marginal increase in traffic.
Rate-smoothed encoding for real-time video streaming applications
For real-time video streaming applications over constant bit rate channels, it is highly desirable that video signals be encoded with not only good average quality but also smooth quality over time. However, when the network resource is sufficiently large and the video quality has reached the target quality, quality smoothing is no longer necessary; instead, rate smoothing is desired, both to avoid overusing network resources and to achieve a smoothed traffic rate. In this paper, we propose a novel real-time rate-smoothed encoding scheme based on low-pass filtering. Both theoretical analysis and experimental results show that the proposed rate-smoothed encoding scheme can achieve a target average quality while significantly reducing the peak rate and the rate variance. We further propose a joint quality- and rate-smoothed encoding scheme, which provides adaptive smoothing according to different situations. Experimental results show that the proposed joint smoothing scheme strikes an optimal balance between quality fluctuation and rate fluctuation, and hence improves the overall system performance.
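The low-pass filtering idea can be sketched with first-order exponential smoothing of per-frame bit budgets; this is an illustrative stand-in for the paper's filter, not its actual design:

```python
def smooth_rates(frame_bits, alpha=0.25):
    """First-order low-pass (exponential smoothing) of per-frame bit budgets.
    Smaller alpha -> heavier smoothing, lower peak rate and rate variance."""
    out = []
    s = frame_bits[0]                   # initialize at the first frame's rate
    for b in frame_bits:
        s = alpha * b + (1 - alpha) * s
        out.append(s)
    return out

bursty = [100, 400, 100, 400, 100, 400]
smoothed = smooth_rates(bursty)
print(max(smoothed) - min(smoothed) < max(bursty) - min(bursty))  # → True
```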
Joint source-channel rate allocation in parallel channels
A novel rate-optimal allocation algorithm is proposed for the parallel transmission of scalable images in multi-channel systems. Scalable images are transmitted via fixed-length packets. The proposed algorithm selects a subchannel as well as a channel code rate for each packet, based on the signal-to-noise ratios (SNR) of the subchannels. The resulting scheme provides unequal error protection (UEP) of the source bits. Applications to JPEG2000 transmission show that significant UEP gains are achieved over equal error protection (EEP) schemes.
Constant quality JPEG2000 rate control for digital cinema
Michael D. Smith, John Villasenor
An uncompressed Digital Cinema Distribution Master (DCDM) image typically has dimensions of up to 4096x2160 (4K) or 2048x1080 (2K), with 12-bit pixel data for each of the X'Y'Z' color planes. At a frame rate of 24 frames per second, this gives uncompressed data rates of 7.6 and 1.9 Gbps for 4K and 2K, respectively. Even after compression, average data rates in the hundreds of Mbits/sec are encountered. Recently, the Society of Motion Picture and Television Engineers (SMPTE) has chosen JPEG2000 as the standard to be used for digital cinema compression. Thus, methods to appropriately trade off rate and quality for JPEG2000-compressed movies will have high importance in the next several years as systems are designed and deployed. In this paper we describe a new distortion-based framework for rate control that enables a JPEG2000 encoder to achieve a user-specified quality, and therefore makes it possible to produce constant quality from frame to frame in an image sequence. The new method makes direct use of the same JPEG2000 coding pass data as the traditional approaches, and thus can easily be adopted at the back end of JPEG2000 encoding engines. We compare the new method with two other common rate control techniques for JPEG2000.
On preserving robustness-false alarm tradeoff in media hashing
S. Roy, X. Zhu, J. Yuan, et al.
This paper discusses one of the important issues in generating a robust media hash. The robustness of a media hashing algorithm is primarily determined by three factors: (1) the robustness-false alarm tradeoff achieved by the chosen feature representation, (2) the accuracy of the bit extraction step, and (3) the distance measure used to measure similarity (dissimilarity) between two hashes. The robustness-false alarm tradeoff in feature space is measured by a similarity (dissimilarity) measure, and it defines a limit on the performance of the hashing algorithm. The distance measure used to compute the distance between the hashes determines how well this tradeoff in the feature space is preserved through the bit extraction step. Hence the bit extraction step is crucial in defining the robustness of a hashing algorithm. Although this is widely recognized as an important requirement, to our knowledge no work in the existing literature elucidates the efficacy of its algorithm in terms of improving this tradeoff compared to other methods. This paper demonstrates the kind of robustness-false alarm tradeoff achieved by existing methods and proposes a hashing method that clearly improves this tradeoff.
Imaging Systems
Edge-based automatic white balancing with linear illuminant constraint
Homer H. Chen, Chun-Hung Shen, Pei-Shan Tsai
Automatic white balancing is an important function for digital cameras. It adjusts the color of an image to make it look as if it were taken under canonical light. White balance is usually achieved by estimating the chromaticity of the illuminant and then using the resulting estimate to compensate the image. The grey world method is the basis of most automatic white balance algorithms. It generally works well but fails when the image contains a large object or background with a uniform color. The algorithm proposed in this paper solves the problem by considering only pixels along edges and by imposing an illuminant constraint that confines the possible colors of the light source to a small range during the estimation of the illuminant. By considering only edge points, we reduce the impact of the dominant color on the illuminant estimation and obtain a better estimate. By imposing the illuminant constraint, we further minimize the estimation error. The effectiveness of the proposed algorithm is tested thoroughly. Both objective and subjective evaluations show that the algorithm is superior to other methods.
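The edge-restricted grey-world estimate can be sketched as below: channel gains are derived from the mean R, G, B over edge pixels only. The `gamut` clamp is a crude illustrative stand-in for the paper's illuminant constraint, and all names and numbers are hypothetical:

```python
def edge_gray_world(pixels, edge_mask, gamut=None):
    """Estimate illuminant gains from the mean R,G,B over edge pixels only
    (grey-world assumption restricted to edges)."""
    rs = [p[0] for p, e in zip(pixels, edge_mask) if e]
    gs = [p[1] for p, e in zip(pixels, edge_mask) if e]
    bs = [p[2] for p, e in zip(pixels, edge_mask) if e]
    mr, mg, mb = (sum(c) / len(c) for c in (rs, gs, bs))
    # gains that map the edge-average colour to neutral grey (G as reference)
    gain_r, gain_b = mg / mr, mg / mb
    if gamut:  # toy illuminant constraint: keep gains in a plausible range
        lo, hi = gamut
        gain_r = min(max(gain_r, lo), hi)
        gain_b = min(max(gain_b, lo), hi)
    return gain_r, gain_b

pixels = [(200, 100, 50), (40, 60, 30), (90, 80, 70), (10, 10, 10)]
edges = [True, True, True, False]     # the flat dark pixel is ignored
print(tuple(round(g, 2) for g in edge_gray_world(pixels, edges)))  # → (0.73, 1.6)
```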
Highly automated image recomposition: the picture you wish you had taken
Jiebo Luo, Phoury Lei
Have you ever lamented, "I wish I had taken a different picture of that 'Kodak Moment' when everyone was smiling and no one blinked"? With image recomposition, we strive to deliver the "one-click fix" experience to customers so they can easily create the perfect pictures that they never actually took. To accomplish this, a graphical user interface was created that integrates existing and new algorithms, including face detection, facial feature location, face recognition, expression recognition, face appearance and pose matching, and seamless blending. Advanced modes include face relighting. This system is capable of performing image recomposition from a mixture of videos and still photos, with ease of use and a high degree of automation.
Symmetric trinocular dense disparity estimation for car surrounding camera array
This paper presents a novel dense disparity estimation method called symmetric trinocular dense disparity estimation, along with a car surrounding camera array application that uses the proposed algorithm to improve driving safety. The symmetric trinocular property is derived to show the benefit of performing disparity estimation with three cameras. A 1D fast search algorithm is described to overcome the slowness of the original full search algorithms; it utilizes the horizontal displacement property of the cameras to further check the correctness of the disparity vector. The experimental results show that the symmetric trinocular property improves the quality and smoothness of the disparity vectors.
Surveillance system with mega-pixel scalable transcoder
This paper presents a video surveillance system that displays mega-pixel streams effectively, while transmitting and processing the streams efficiently with limited resources such as bandwidth, computing power and display resolution. The proposed system stores high-resolution and high-quality video data and associated object metadata, which includes ROI (Region-of-Interest) information. To satisfy such resource constraints and display important parts in detail without missing the overall scene context, the stored images are efficiently transcoded in the compressed-domain based on the ROI information, display resolution and available bandwidth. Simulation results confirm the effectiveness of the proposed system in terms of objective measures and subjective evaluation.
A 2-D gel electrophoresis DNA image analysis algorithm with automatic thresholding
Naima Kaabouch, Richard R. Schultz
Polymerase chain reaction (PCR) and gel electrophoresis are two widely used techniques for genetic studies that require the bench scientist to perform many tedious manual steps. Advances in automation are making these techniques more accessible, but detection and image analysis still remain labor-intensive. Although several commercial software packages are now available, DNA image analysis still requires some intervention by the user, and thus a certain level of image processing expertise. To allow researchers to speed up their analyses and obtain more repeatable results, we present a fully automated, highly accurate image analysis system for DNA or protein studies. The proposed system is based mainly on four steps: automatic thresholding, shifting, filtering, and processing. The automatic thresholding used to equalize the gray values of the gel electrophoresis image background is one of the key and novel operations in this algorithm. An enhancement is also used to improve poor-quality images that have faint DNA bands. Experimental results show that the proposed method eliminates defects due to noise for good- and average-quality gel electrophoresis images, while it also improves the appearance of poor-quality images.
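Classic Otsu thresholding is a common baseline for this kind of automatic threshold selection; the sketch below is the standard histogram-based algorithm, not necessarily the authors' method:

```python
def otsu_threshold(hist):
    """Otsu's method: pick the gray level maximizing between-class variance
    on a grayscale histogram (list of pixel counts per level)."""
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t, h in enumerate(hist):
        w0 += h                       # background weight up to level t
        if w0 == 0:
            continue
        w1 = total - w0               # foreground weight
        if w1 == 0:
            break
        sum0 += t * h
        m0 = sum0 / w0                # background mean
        m1 = (sum_all - sum0) / w1    # foreground mean
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# toy bimodal histogram: background around level 1, bands around level 6
hist = [5, 20, 5, 0, 0, 4, 15, 6]
print(otsu_threshold(hist))  # → 2
```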
Automatic estimation and compensation of geometric distortions in video copies
B. Chupeau, A. Massoudi, F. Lefèbvre
The proliferation of pirate copies of feature films on peer-to-peer networks has aroused great interest in countermeasures such as the insertion of (invisible) forensic marks in projected movies to deter their illegal capture. The registration of pirate copies with the original content is, however, a prerequisite for the recovery of such embedded messages, as severe geometric distortions often occur in illegally camcorded content. After a brief state of the art in image registration, the paper details an algorithm for video registration, focusing on the compensation of geometric distortions. Control points are automatically extracted from the original and copy pictures, followed by pre- and post-matching filtering steps to discard irrelevant control points and erroneous matched pairs of control points, respectively. This enables the accurate numerical estimation of an 8-parameter homographic distortion model, used to register the copy frames with the original reference grid. This image registration algorithm is inserted into a general video registration scheme. Results are presented on both natural and synthetic test material.
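Fitting an 8-parameter homography from matched control-point pairs reduces to a linear system via the standard direct linear transform (DLT) equations; a self-contained sketch of this generic step (not the paper's full pipeline):

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a square system."""
    n = len(A)
    M = [row[:] + [bv] for row, bv in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_homography(src, dst):
    """8-parameter homography h0..h7 (h8 fixed to 1) from 4 point pairs."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    return solve(A, b)

# sanity check: a pure translation by (5, -2) should be recovered exactly
src = [(0, 0), (1, 0), (0, 1), (1, 1)]
dst = [(x + 5, y - 2) for x, y in src]
h = fit_homography(src, dst)
print([round(v, 6) for v in h])
```

With more than four matched pairs, as in a real registration scheme, the same equations would be solved in a least-squares sense instead.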
Video Coding II
A novel statistical learning-based rate distortion analysis approach for multiscale binary shape coding
Zhenzhong Chen, King Ngi Ngan
In this paper, we propose a statistical learning based approach to analyze the rate-distortion characteristics of multiscale binary shape coding. We employ the polynomial kernel function and incorporate rate-distortion related features in our support vector regression. The ε-insensitive loss function is chosen to improve estimation robustness, and parameter tuning is also studied. Moreover, we discuss feature selection, which helps to improve the estimation accuracy. Compared to the traditional method, our proposed framework provides better rate-distortion estimation not only for simple shapes but also for complex shapes.
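The ε-insensitive loss used in support vector regression penalizes only errors that fall outside a tube of width ε around the prediction, which is what gives the regression its robustness to small deviations; a minimal sketch of the loss itself:

```python
def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Epsilon-insensitive loss from support vector regression:
    errors within the eps tube cost nothing, larger errors cost linearly."""
    return [max(0.0, abs(t - p) - eps) for t, p in zip(y_true, y_pred)]

print(eps_insensitive_loss([1.0, 2.0, 3.0], [1.05, 2.5, 2.0], eps=0.1))
# → [0.0, 0.4, 0.9]
```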
Drift-compensated coding optimization for fast bit-rate reduction transcoding
Peng Zhang, Xiangyang Ji, Wen Gao, et al.
Bit rate adaptation is one of the basic problems in universal multimedia access, and in many cases there is a strong requirement for a very fast transcoding implementation. The open-loop transcoding architecture is usually the desired solution with the lowest computational complexity; however, severe drift error makes it infeasible for most applications. In this paper, we propose a drift-compensated coding optimization scheme by which generated bitstreams can be effectively transcoded to a lower bitrate by an open-loop transcoder with little drift propagation. The encoder integrates a virtual open-loop transcoder, in which drift error is effectively compensated by suitably adjusting the RD-based mode selection and motion estimation for each macroblock. Simulation results show that, compared with traditional coding, the proposed mode selection scheme yields better coding efficiency when rate-reduction transcoding to a low bitrate occurs. Meanwhile, it does not degrade the coding efficiency in comparison with normal single-layer coding in H.264/AVC.
A fast inter mode decision algorithm in H.264/AVC for IPTV broadcasting services
Geun-Yong Kim, Bin-Yeong Yoon, Yo-Sung Ho
The new video coding standard H.264/AVC employs the rate-distortion optimization (RDO) method for choosing the best coding mode. However, since RDO increases the encoder complexity tremendously, it is not suitable for real-time applications such as IPTV broadcasting services, and a fast mode decision algorithm is needed to reduce encoding time. In this paper, we propose a fast mode decision algorithm that considers the quantization parameter (QP), because we have observed that the frequency of best modes depends on QP. In order to exploit this characteristic, we use the coded block pattern (CBP), which has the value "0" when all quantized discrete cosine transform (DCT) coefficients are zero. We also use both early SKIP mode and early 16x16 mode decisions. Experimental results show that the proposed algorithm reduces the encoding time by 74.6% for the baseline profile and 72.8% for the main profile, compared to the H.264/AVC reference software.
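A minimal sketch of the early-SKIP test implied above, under the hypothetical assumption that SKIP is declared when the CBP is zero and the motion vector equals its prediction; the exact conditions in the paper may differ.

```python
def early_skip_decision(cbp, mv, predicted_mv, ref_idx=0):
    """Hypothetical early-SKIP check: a 16x16 macroblock whose coded
    block pattern is 0 (all quantized DCT coefficients zero) and whose
    motion vector equals the predicted vector is coded as SKIP without
    evaluating the remaining inter/intra modes."""
    return cbp == 0 and mv == predicted_mv and ref_idx == 0
```

When such a test fires early, the costly RDO search over all other modes is skipped for that macroblock, which is where the encoding-time savings come from.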
Motion Estimation
Enhanced SAD reuse fast motion estimation
Kai Lam Tang, King N. Ngan
H.264 is a new video coding standard that outperforms previous video coding standards by using many advanced coding techniques. Variable block size motion estimation (ME) is one of the techniques that contributes to the excellent performance of H.264, but it is computationally intensive. In this paper, a fast SAD-reuse ME algorithm is proposed that reuses SADs within the same macroblock and applies pattern-based ME and refinement search to reduce the computational complexity of variable block size ME. Experimental results show that the proposed algorithm reduces the ME time by more than 90% on average, with only slight degradation of coding performance in terms of PSNR and bitrate compared with Fast Full Search (FFS).
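The SAD-reuse idea can be sketched as follows (a simplified stand-in for the paper's algorithm): the SADs of the sixteen 4x4 sub-blocks of a macroblock are computed once, and the SAD of any larger partition is obtained by summing the cached values instead of revisiting the pixels.

```python
import numpy as np

def sad_4x4_grid(cur, ref):
    """SAD of each 4x4 sub-block of a 16x16 macroblock, computed once."""
    d = np.abs(cur.astype(np.int64) - ref.astype(np.int64))
    # axes after reshape: (block_row, row_in_block, block_col, col_in_block)
    return d.reshape(4, 4, 4, 4).sum(axis=(1, 3))

def partition_sad(grid, top, left, h_blocks, w_blocks):
    """SAD of a larger partition reusing the cached 4x4 SADs, e.g. an
    8x8 partition is the sum of four cached 4x4 values."""
    return int(grid[top:top + h_blocks, left:left + w_blocks].sum())
```

With this cache, evaluating all seven H.264 partition shapes at one search point costs sixteen 4x4 SADs plus a few additions, rather than a fresh pixel pass per shape.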
Motion estimation performance models with application to hardware error tolerance
Hye-Yeon Cheong, Antonio Ortega
The progress of VLSI technology towards deep sub-micron feature sizes, e.g., sub-100 nanometer technology, has created a growing impact of hardware defects and fabrication process variability, which lead to reductions in yield rate. To address these problems, a new approach, system-level error tolerance (ET), has been recently introduced. Considering that a significant percentage of the entire chip production is discarded due to minor imperfections, this approach is based on accepting imperfect chips that introduce imperceptible/acceptable system-level degradation; this leads to increases in overall effective yield. In this paper, we investigate the impact of hardware faults on video compression performance, with a focus on the motion estimation (ME) process. More specifically, we provide an analytical formulation of the impact of single and multiple stuck-at-faults within ME computation. We further present a model for estimating the system-level performance degradation due to such faults, which can be used for the error-tolerance-based decision strategy of accepting a given faulty chip. We also show how different faults and ME search algorithms compare in terms of error tolerance and define the characteristics of search algorithms that lead to increased error tolerance. Finally, we show that different hardware architectures performing the same metric computation have different error tolerance characteristics, and we present the optimal ME hardware architecture in terms of error tolerance. While we focus on ME hardware, our work could also be applied to systems (e.g., classifiers, matching pursuits, vector quantization) where a selection is made among several alternatives (e.g., class label, basis function, quantization codeword) based on which choice minimizes an additive metric of interest.
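The kind of analysis described above can be mimicked with a toy model, assuming a single stuck-at-1 fault on one output bit of the SAD unit; the paper's formulation covers general single and multiple stuck-at faults and real datapaths.

```python
def faulty_sad(sad, stuck_bit, stuck_value=1):
    """Model a stuck-at fault on one output bit of the SAD datapath:
    that bit always reads stuck_value regardless of the true SAD."""
    if stuck_value:
        return sad | (1 << stuck_bit)
    return sad & ~(1 << stuck_bit)

def best_candidate(sads, stuck_bit=None):
    """Motion search picks the candidate minimising the (possibly
    faulty) metric; a fault can therefore change the chosen vector."""
    metric = sads if stuck_bit is None else [faulty_sad(s, stuck_bit) for s in sads]
    return min(range(len(sads)), key=lambda i: metric[i])
```

Comparing the argmin with and without the injected fault over many candidate sets is one way to quantify how often a given fault changes the selected motion vector.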
Super-resolution based on region-matching motion estimation
Osama A. Omer, Toshihisa Tanaka
We consider the problem of recovering a high-resolution (HR) frame from a sequence of low-resolution (LR) frames. Designing a super-resolution (SR) algorithm for arbitrary video sequences is challenging: video frames in general cannot be related through a global parametric transformation, due to arbitrary individual pixel movement between frame pairs, so a local motion model must be used for frame alignment. Accurate alignment is the key to the success of reconstruction-based super-resolution algorithms. Motivated by this challenge, we propose to employ a region-matching technique for image registration. The proposed algorithm consists of an alignment step that produces a blurred version of the HR frame and a restoration step that estimates the HR frame. The experimental results of the proposed algorithm are compared with results obtained using affine, block matching, and optical flow motion models. It is shown that the use of region matching for SR is very promising in producing higher-quality images.
Poster Session
An improved adaptive interpolation approach for H.264
D. Wu, K. P. Lim, T. K. Chiew, et al.
As one important component in the H.264 video encoder, interpolation of half and quarter pixels is also computationally intensive. Compared to integer-pixel motion estimation, "finer" interpolation provides better block matching; however, this good motion compensation performance is obtained at the expense of increased complexity. Based on our previous work, this paper presents an improved fast and adaptive interpolation method that further reduces the complexity of the video encoding process. Experimental results on typical video sequences demonstrate that the proposed method increases encoder speed by 10% to 22% compared with our previous work, without any PSNR loss or bit rate increase.
A comparative study of image compression based on directional wavelets
Kun Li, Wenli Xu, Qionghai Dai, et al.
The discrete wavelet transform is an effective tool for generating scalable streams, but it cannot efficiently represent edges that are not aligned in the horizontal or vertical direction, while natural images often contain rich edges and textures of this kind. Hence, intensive research has recently focused on directional wavelets, which can effectively represent the directional attributes of images. There are two categories of directional wavelets: redundant wavelets (RW) and adaptive directional wavelets (ADW). One representative redundant wavelet is the dual-tree discrete wavelet transform (DDWT), while adaptive directional wavelets can be further categorized into two types: with or without side information. In this paper, we briefly introduce directional wavelets and compare their directional bases and image compression performance.
Locally adaptive reconstruction of lost low-frequency coefficients in wavelet coded images
Joost Rombaut, Aleksandra Pižurica, Wilfried Philips
In packet-switched networks such as the Internet, packets may get lost during transmission due to, e.g., network congestion, which degrades the quality of the original signal. As video communication is a bandwidth-consuming application, the original data are first compressed, and this compression step increases the impact of information loss even more. In wavelet-based image and video coding, the low-frequency data are the most important: loss of low-frequency coefficients results in annoying black holes in the received images and video. This effect can be countered by post-processing error concealment, in which a lost coefficient is estimated from its neighboring coefficients. In this paper we present a locally adaptive interpolation method for lost low-frequency coefficients. For each lost low-frequency coefficient, we estimate the optimal interpolation direction (horizontal or vertical) using novel error measures, thereby preserving the edges in the reconstructed image much better. Compared to older techniques of similar complexity, our scheme reconstructs images with the same or better quality. This is reflected in both the visual and the numerical results: there is an increase of up to 4.4 dB compared to bilinear concealment. The proposed scheme is fast and simple, which makes it suitable for real-time applications.
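A minimal version of the directional choice can be sketched as follows; the neighbour-difference test used here is a simple stand-in for the paper's novel error measures.

```python
def conceal_lost_coefficient(grid, r, c):
    """Estimate a lost low-frequency coefficient grid[r][c] from its
    four neighbours, interpolating along the direction whose neighbours
    agree best, so edges are interpolated along rather than across."""
    left, right = grid[r][c - 1], grid[r][c + 1]
    up, down = grid[r - 1][c], grid[r + 1][c]
    if abs(left - right) <= abs(up - down):  # horizontal neighbours consistent
        return (left + right) / 2.0
    return (up + down) / 2.0
```

Near a vertical edge the horizontal neighbours disagree strongly, so the rule falls back to vertical interpolation and the edge survives concealment.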
Pose estimation from video sequences based on Sylvester's equation
In this paper, we introduce a method to jointly track object motion and estimate pose within the framework of particle filtering. We focus on direct estimation of the 3D pose from a 2D image sequence. The Scale-Invariant Feature Transform (SIFT) is used to extract feature points in the images. We show that pose estimation from the corresponding feature points can be formulated as a solution to Sylvester's equation, which we solve using the Kronecker product method to determine the pose state. We demonstrate that the classical Singular Value Decomposition (SVD) approach to pose estimation provides a solution to Sylvester's equation in 3D-3D pose estimation. The proposed approach to the solution of Sylvester's equation is therefore equivalent to the classical SVD method for 3D-3D pose estimation, yet it can also be used for pose estimation from 2D image sequences. Finally, we rely on computer simulation experiments to demonstrate the performance of our algorithm on video sequences.
A fast and efficient method to protect color images
In this paper, we propose a method to embed the color information of an image in a corresponding grey-level image. The objective of this work is to allow free access to the grey-level image while granting access to the color image only to those who own a secret key. The method consists of three major steps: a fast color quantization, an optimized ordering, and an adapted data hiding. The principle is to build an index image which is, at the same time, a semantically intelligible grey-level image. In order to obtain this particular index image, which should be robust to data hiding, a layer-running algorithm is applied to sort the K colors of the palette. The major contributions of this paper are the fast color quantization, the optimized layer-running algorithm, the color palette compression, and the adapted data hiding.
3D-face model tracking based on a multi-resolution active search
This paper deals with face tracking under difficult conditions (non-calibrated camera, strong head motions) using a deformable 3D model. Under those conditions, the proposed approach is able to detect and track a face. The novelty lies mainly in a multi-resolution Active Model search, which makes it possible to capture strong head motions. Results show an improvement of the multi-resolution technique over the single-resolution one. Near real-time results are also provided.
Region-based hidden Markov models for image categorization and retrieval
Fei Li, Qionghai Dai, Wenli Xu
Hidden Markov models (HMMs) have been widely used in various fields, including image categorization and retrieval. Most existing methods train HMMs on low-level features of image blocks; however, block-based features cannot reflect high-level semantic concepts well. This paper proposes a new method to train HMMs on region-based features, which can be obtained after image segmentation. Our work can be characterized by two key properties: (1) a region-based HMM is adopted to achieve better categorization performance, as region-based features accord better with human perception; (2) a multi-layer semantic representation (MSR) is introduced and coupled with the region-based HMM in a long-term relevance feedback framework for image retrieval. The experimental results demonstrate the effectiveness of our proposal in both categorization and retrieval.
A new support tool for machine learning and pattern recognition using tracking and motion segmentation
E. Bichot, O. Masset, L. Mascarilla, et al.
A new support tool using object tracking and motion-based segmentation is developed for machine learning and pattern recognition. In the learning step, an object of interest is tracked while learning is performed on segmented frames. In the recognition step, the target is tracked until favorable conditions allow identification. This tool is used in the context of the Aqu@thèque project, which includes an automatic fish recognition system. Tracking is a difficult task, especially for real-world images. Particle filtering methods that incorporate a motion-based segmentation measurement in the importance sampling step improve performance.
Pet fur color and texture classification
Object segmentation is important in image analysis for imaging tasks such as image rendering and image retrieval. Pet owners have been known to be quite vocal about how important it is to render their pets perfectly. We present here an algorithm for pet (mammal) fur color classification and an algorithm for pet (animal) fur texture classification. Pet fur color classification can be applied as a necessary condition for identifying the regions in an image that may contain pets, much like skin tone classification for human flesh detection. As a result of evolution, the fur coloration of all mammals is produced by a natural organic pigment called melanin, which has only a very limited color range. We have conducted a statistical analysis and concluded that, after proper color quantization, mammal fur colors can only be levels of gray or one of two colors. This pet fur color classification algorithm has been applied to peteye detection. We also present an algorithm for animal fur texture classification using the recently developed multi-resolution directional sub-band Contourlet transform. The experimental results are very promising, as these transforms can identify regions of an image that may contain the fur of mammals, the scales of reptiles, the feathers of birds, etc. Combining the color and texture classification, one can obtain a set of strong classifiers for identifying possible animals in an image.
A simple reversed-complexity Wyner-Ziv video coding mode based on a spatial reduction framework
Debargha Mukherjee, Bruno Macchiavello, Ricardo L. de Queiroz
A spatial-resolution reduction based framework for incorporation of a Wyner-Ziv frame coding mode in existing video codecs is presented, to enable a mode of operation with low encoding complexity. The core Wyner-Ziv frame coder works on the Laplacian residual of a lower-resolution frame encoded by a regular codec at reduced resolution. The quantized transform coefficients of the residual frame are mapped to cosets to reduce the bit-rate. A detailed rate-distortion analysis and procedure for obtaining the optimal parameters based on a realistic statistical model for the transform coefficients and the side information is also presented. The decoder iteratively conducts motion-based side-information generation and coset decoding, to gradually refine the estimate of the frame. Preliminary results are presented for application to the H.263+ video codec.
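The coset step can be illustrated with a scalar sketch, assuming a simple modulo binning with M cosets; the paper's coset construction over quantized transform coefficients follows the same principle but is not reproduced exactly.

```python
def coset_index(q, num_cosets):
    """Encoder side: transmit only q mod M instead of the full
    quantized coefficient, saving bits."""
    return q % num_cosets

def coset_decode(index, side_info_q, num_cosets):
    """Decoder side: among the values sharing the received coset index,
    pick the one closest to the side-information estimate."""
    base = side_info_q - (side_info_q - index) % num_cosets
    candidates = (base - num_cosets, base, base + num_cosets)
    return min(candidates, key=lambda v: abs(v - side_info_q))
```

Decoding succeeds whenever the side information is within half a coset spacing of the true value, which is why better side information permits fewer cosets and a lower rate.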
Motion refined medium granular scalability
Zhengguo Li, Wei Yao, Susanto Rahardja
In this paper, we propose a scheme to obtain a good tradeoff between motion information and residual information for medium granular scalability (MGS). In this scheme, both motion information and residual information are refined at enhancement layers when the scalable bit rate range is wide, whereas only residual information is refined when the range is narrow. In other words, for a wide bit rate range there can be more than one motion vector field (MVF), with one generated at the base layer and others at enhancement layers; for a narrow range, only one MVF is necessary. The layers can either share one MVF or each have their own, depending on the bit rate range across layers. Unlike in coarse granular scalability (CGS), the correlation between two adjacent MVFs in MGS is very strong. Hence MGS can be provided in the most important bit rate range to achieve a better tradeoff between motion and residual information and a finer granularity in that range, while CGS can be applied in less important bit rate ranges to give a coarse granularity. Experimental results show that the coding efficiency can be improved by up to 1 dB compared with the existing SNR scalability scheme at high bit rates.
An efficient multi-frame dynamic search range motion estimation for H.264
Qi-Chao Sun, Jing Wang, Xin-Hao Chen, et al.
H.264/AVC achieves higher compression efficiency than previous video coding standards, but at the cost of increased complexity due to the use of variable block size motion estimation and long-term memory motion compensated prediction (LTMCP). In this paper, an efficient multi-frame dynamic-search-range motion estimation algorithm is proposed that dynamically adjusts the spatial and temporal search ranges according to the video content. The algorithm can be applied on top of many other fast motion estimation (fast ME) algorithms. Compared with the constant-search-range scheme used by the multi-frame UMHexagonS algorithm, the proposed algorithm is up to 4.86 times faster, with negligible degradation of video quality.
An area-efficient VLSI architecture for AVS intra frame encoder
Ke Zhang, Lu Yu
In this paper, we propose a VLSI architecture for an AVS intra frame encoder. The reconstruction loop hinders the exploration of parallelism and becomes the critical path in an intra frame encoder. A First Selection Then Prediction (FSTP) method is proposed to break the loop and enable parallel processing of intra mode selection and reconstruction on neighboring blocks. In addition, area-efficient modules were developed: a configurable intra predictor supports all the intra prediction modes, and a CA-2D-VLC engine with an area-efficient Exp-Golomb encoder meets the encoding speed demand at comparably low hardware cost. Synthesized with a 0.18 µm CMOS standard-cell library, the overall hardware cost of the proposed intra frame encoder is 89k logic gates at a clock frequency constraint of 125 MHz. The proposed encoder can sustain real-time encoding of 720x576 4:2:0 25 fps video at a working frequency of 54 MHz.
Fast luminance and chrominance correction based on motion compensated linear regression for multi-view video coding
Luminance and chrominance correction (LCC) is important in multi-view video coding (MVC) because it provides better rate-distortion performance when encoding video sequences captured by ill-calibrated multi-view cameras. This paper presents a robust and fast LCC algorithm based on motion-compensated linear regression, which reuses the motion information from the encoder. We adopt the linear weighted prediction model in H.264/AVC as our LCC model. In our experiments, the proposed LCC algorithm outperforms the basic histogram matching method by up to 0.4 dB with only a small computational overhead and zero external memory bandwidth. The dataflow of this method is therefore suitable for low-bandwidth/low-power VLSI design for future multi-view applications.
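Under the linear weighted-prediction model named above, the correction reduces to a least-squares line fit over motion-compensated sample pairs; this sketch (with made-up samples) illustrates the idea rather than the paper's exact procedure.

```python
import numpy as np

def fit_lcc(ref_samples, cur_samples):
    """Least-squares fit of the weighted-prediction style model
    cur ~ weight * ref + offset over motion-compensated pixel pairs;
    the fitted (weight, offset) then corrects the reference view."""
    weight, offset = np.polyfit(ref_samples, cur_samples, 1)
    return weight, offset
```

Because the pairs come from motion compensation already performed by the encoder, the fit adds little computation and needs no extra memory traffic.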
Complexity control of fast motion estimation in H.264/MPEG-4 AVC with rate-distortion-complexity optimization
Mo Wu, Søren Forchhammer, Shankar M. Aghito
A complexity control algorithm for H.264 advanced video coding is proposed. The algorithm can control the complexity of integer inter motion estimation for a given target complexity. The rate-distortion-complexity performance is improved by a complexity prediction model, a simple analysis of past statistics, and a control scheme. The algorithm also works well under scene-change conditions. Test results for coding interlaced video (720x576 PAL) are reported.
Wyner-Ziv residual coding for wireless multi-view system
Zhipeng Jin, Mei Yu, Gangyi Jiang, et al.
For a wireless multi-view video system, whose storage and computation capabilities are very limited, it is essential to have an encoder with low power consumption and low complexity. In this paper, a DCT-domain Wyner-Ziv residual coding scheme with low encoding complexity is proposed for wireless multi-view video coding (WZRC-WMS). The scheme is designed to encode the residual frames of each view independently, without any motion or disparity estimation at the encoder, so as to shift the large computational burden to the decoder. At the decoder, the proposed scheme performs joint decoding with side information interpolated from the current view and adjacent views. Experimental results show that the proposed WZRC-WMS scheme outperforms H.263+ interframe coding by about 1.9 dB in rate-distortion performance, while its encoding complexity is only 1/17 that of H.264 interframe coding.
Progressive image transmission with RCPT protection
In this paper, a joint source-channel coding scheme is proposed for progressive image transmission over channels with both random bit errors and packet loss, using rate-compatible punctured turbo codes (RCPT) protection only. Two technical components that differ from existing methods are presented. First, a data frame is divided into multiple CRC blocks before being coded by a turbo code. This secures a high turbo coding gain, which is proportional to the data frame size, while the beginning blocks of a frame may still be usable even if decoding of the entire frame fails. Second, instead of employing product codes, we use only RCPT, along with an interleaver, to protect images over channels with combined distortion including random errors and packet loss. With this setting, the effect of packet loss is equivalent to randomly puncturing the turbo codes. As a result, the optimal allocation of channel code rates is required for the random errors only, which largely reduces the complexity of the optimization process. The effectiveness of the proposed scheme is demonstrated with extensive simulation results.
A sender-driven time-stamp controlling based dynamic light field streaming service
Zhun Han, Qionghai Dai, Yebin Liu
Light field rendering (LFR) now plays a very important role in free-viewpoint video (FVV) services, a new type of multimedia. Supporting light field video (LFV) streaming over IP networks is a very challenging research area. This paper presents a sender-driven streaming service that can deliver dynamic light field video to multiple users over broadband IP networks using a time-stamp controlling algorithm. Results show that a system built on our algorithm can support more than 50 users within 100 Mb of server-side bandwidth.
A fast and quality-preserving method for H.264 encoding using dynamic SAD maps
D. Wu, T. K. Chiew, K. P. Lim, et al.
It is well known that motion estimation is the most computationally intensive processing unit of the H.264 video encoder, and various fast motion estimation algorithms have been proposed to reduce its complexity. Generally, these approaches achieve speedup by reducing the number of candidate search points within the search window. In this paper, we propose a new method that uses Sum-of-Absolute-Differences mapping (SAD map) to dynamically cache SAD values and then reuse them for different block sizes. Experimental results on standard video sequences verify that the proposed method increases encoder speed by up to 15% without any loss in PSNR or increase in bit rate. Due to its generic nature, this method can be applied to any fast motion estimation method, although it is especially effective with the full search motion estimation method.
Impact of rate control tools on very fast non-embedded wavelet image encoders
O. López, M. Martínez-Rach, J. Oliver, et al.
Recently, there has been increasing interest in the design of very fast wavelet image encoders focused on applications (interactive real-time image/video applications, GIS systems, etc.) and devices (digital cameras, mobile phones, PDAs, etc.) where coding delay and/or available computing resources (working memory and processing power) are critical for proper operation. Most of these fast wavelet image encoders are non-embedded in order to reduce complexity, so no rate-control tools are available for scalable coding applications. In this work, we analyze the impact of simple rate-control tools for these encoders in order to determine whether including rate-control functionality is worthwhile with respect to popular embedded encoders like SPIHT and JPEG2000. We perform the study by adding rate control to the non-embedded LTW encoder, showing that despite the increase in complexity, LTW remains competitive with SPIHT and JPEG2000 in terms of R/D performance, coding delay, and memory consumption.
Non-rigid object tracker based on a robust combination of parametric active contour and point distribution model
Joanna Isabelle Olszewska, Tom Mathes, Christophe De Vleeschouwer, et al.
Our study considers the development of a reliable tracker for non-rigid objects evolving on cluttered backgrounds in crowded scenes captured by moving cameras. For this purpose, we propose an original method that combines two approaches, based respectively on parametric active contours (PAC) and on a point distribution model (PDM). The PAC tracker relies on an effective and efficient implementation of a contour convergence mechanism that brings a smooth contour to the edges of the target in real time. The PDM approach collects feature points in the region delineated by the PAC tracker to build and update a model of the target in terms of a feature point distribution. When a novel frame is considered, its feature points are matched with the PDM model. The matching information is used to initialize the novel PAC, whose convergence identifies the points that are relevant for updating the PDM for the next frame. Hence, the two approaches complement each other: the a priori information provided by the PDM makes the system robust to occlusions, while the deformation of the PAC increases its robustness to target appearance changes. Simulations on real-world video sequences demonstrate the performance of our approach.
Optimal packet scheduling and rate control for video streaming
Eren Gürses, Gozde Bozdagi Akar, Nail Akar
In this paper, we propose a new low-complexity retransmission based optimal video streaming and rate adaptation algorithm. The proposed OSRC (Optimal packet Scheduling and Rate Control) algorithm provides average reward optimal solution to the joint scheduling and rate control problem. The efficacy of the OSRC algorithm is demonstrated against optimal FEC based schemes and results are verified over TFRC (TCP Friendly Rate Control) transport with ns-2 simulations.
Evaluation of a combined pre-processing and H.264-compression scheme for 3D integral images
To provide sufficient 3D-depth fidelity, integral imaging (II) requires an increase in spatial resolution of several orders of magnitude over today's 2D images. We have recently proposed a pre-processing and compression scheme for still II-frames based on forming a pseudo video sequence (PVS) from sub-images (SI), which is then coded using the H.264/MPEG-4 AVC video coding standard; the scheme has shown good performance on a set of reference images. In this paper we first investigate how five different ways of selecting the SIs when forming the PVS affect the scheme's compression efficiency. We also study how the II-frame structure relates to the performance of a PVS coding scheme. Finally, we examine the nature of the coding artifacts that are specific to the evaluated PVS schemes. We conclude that, for all except the most complex reference image, all evaluated SI selection orders significantly outperform JPEG 2000, achieving compression ratios of up to 342:1 while still keeping PSNR > 30 dB. We can also confirm that, when selecting a PVS scheme, the one resulting in a higher PVS picture resolution should be preferred to maximize compression efficiency. Our study of the coded II-frames also indicates that the SI-based PVS, contrary to other PVS schemes, tends to distribute its coding artifacts more homogeneously over all 3D-scene depths.
Key frame extraction from unstructured consumer video clips
Christophe Papin, Jiebo Luo
We present a key frame extraction method dedicated to summarizing unstructured consumer video clips acquired from digital cameras. Analysis of spatio-temporal changes over time provides meaningful information about the scene and the cameraman's general intent. First, camera and object motion are estimated and used to derive motion descriptors. A video is segmented into homogeneous segments based on major types of camera motion (e.g., pan, zoom, pause, steady). Dedicated rules are used to extract candidate key frames from each segment, and confidence measures are computed for the candidates to enable ranking by semantic relevance. This method is scalable, so we can produce any desired number of key frames from the candidates. We demonstrate the effectiveness of our method by comparing results with ground truth agreed upon by multiple judges.
Multiresolution mesh segmentation based on surface roughness and wavelet analysis
Céline Roudet, Florent Dupont, Atilla Baskurt
Over the last decades, three-dimensional objects have begun to compete with traditional multimedia (images, sounds, and videos) and are used by more and more applications. The common model used to represent them is a surface mesh, due to its intrinsic simplicity and efficacy. In this paper, we present a new algorithm for the segmentation of semi-regular triangle meshes via multiresolution analysis. Our method uses several measures that reflect the roughness of the surface for all meshes resulting from the decomposition of the initial model into different fine-to-coarse multiresolution meshes. The geometric data decomposition is based on the lifting scheme. Using this formulation, we have compared various interpolating prediction operators, with or without an update step. For each resolution level, the resulting approximation mesh is then partitioned into classes of almost constant roughness by a clustering algorithm; the resulting classes gather regions with the same visual appearance in terms of roughness. The last step consists in decomposing the mesh into connected groups of triangles using region growing and merging algorithms. These connected surface patches are of particular interest for adaptive mesh compression, visualization, indexing, or watermarking.
Occlusion and split detection and correction for object tracking in surveillance applications
Carlos Vázquez, Mohammed Ghazal, Aishy Amer
This paper proposes a novel algorithm for the real-time detection and correction of occlusion and split in feature-based tracking of objects for surveillance applications. The proposed algorithm detects sudden variations of the spatio-temporal features of objects in order to identify possible occlusion or split events. The detection is followed by a validation stage that uses past tracking information to prevent false detection of occlusion or split. Special care is taken in cases of heavy occlusion, when there is a large superposition of objects; in this case the system relies on the long-term temporal behavior of objects to avoid updating the video object features with unreliable (e.g., shape and motion) information. Occlusion is corrected by separating the occluded objects. For the detection of splits, in addition to the analysis of spatio-temporal changes in object features, our algorithm analyzes the temporal behavior of split objects to discriminate between errors in segmentation and real separation of objects, such as the deposit of an object. Split is corrected by physically merging the objects detected to be split. To validate the proposed approach, objective and visual results are presented. Experimental results show the ability of the proposed algorithm to detect and correct both split and occlusion of objects. The proposed algorithm is most suitable for video surveillance applications due to its good performance under multiple, heavy, and total occlusion; its distinction between real object separation and faulty object splits; its handling of simultaneous occlusion and split events; and its low computational complexity.
Optimal reverse frame selection for stored video delivery under constrained resources
In this paper, we present an optimal reverse frame selection (RFS) algorithm based on dynamic programming for delivering stored video under both bandwidth and buffer-size constraints. Our objective is to find a feasible set of frames that maximizes the video's accumulated motion metric without violating any constraint. We further extend RFS to the problem of video delivery over VBR channels, where the channel bandwidth is both limited and time-varying. In particular, we first run RFS offline for several bandwidth samples; the computational complexity is modest and scalable with the aid of frame-size stuffing and non-optimal state elimination. During online streaming, we only need to retrieve the optimal frame selection path from the pre-generated offline results, and this applies to any VBR channel that can be modelled as a piecewise-CBR channel. Experimental results demonstrate the good performance of the proposed algorithm.
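The frame-selection core can be sketched as a knapsack-style dynamic program (a minimal sketch assuming a single aggregate bit budget; the actual RFS algorithm also handles buffer constraints, reverse-order selection, and VBR channels):

```python
def select_frames(sizes, metrics, budget):
    """Pick a subset of frames maximizing the total motion metric under a
    bit budget -- a knapsack-style dynamic program (simplified sketch)."""
    # best maps bits-used -> (best total metric, frame indices achieving it)
    best = {0: (0.0, [])}
    for i, (size, metric) in enumerate(zip(sizes, metrics)):
        new_best = dict(best)
        for used, (score, chosen) in best.items():
            b = used + size
            if b <= budget:
                cand = (score + metric, chosen + [i])
                # Keep the better metric for this exact bit usage.
                if b not in new_best or cand[0] > new_best[b][0]:
                    new_best[b] = cand
        best = new_best
    score, chosen = max(best.values(), key=lambda sc: sc[0])
    return score, sorted(chosen)
```

For example, with frame sizes [4, 3, 2, 1], motion metrics [5, 4, 3, 2], and a budget of 6 bits, the optimum keeps the last three frames.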
Post-processing for decoding without update step in motion-compensated lifted wavelet video coding
Motion-compensated temporal filtering is an open-loop coding technique, which generally employs a motion-compensated update step. Although the update step is essential at the encoder for good rate-distortion efficiency, it might be skipped at the decoder for benefits like lower complexity and lower playout delay. Previous investigations showed that skipping the update step at the decoder results in some quality degradation at high rates. In this paper we analyze how this degradation arises and also propose a simple method to reduce this degradation. The proposed solution can also be implemented as a post-processing procedure after conventional decoding without the update step. Experimental results show that the degradation in quality is reduced by half. For our t+2D wavelet coder, this gives a gain of approximately 1.0 - 1.5 dB at high bit-rates.
Wavelet-based multiple description coding of 3-D geometry
Andrey Norkin, M. Oguz Bici, Gozde Bozdagi Akar, et al.
In this work, we present a multiple description coding (MDC) scheme for reliable transmission of compressed three dimensional (3-D) meshes. It trades off reconstruction quality for error resilience to provide the best expected reconstruction of 3-D mesh at the decoder side. The proposed scheme is based on multiresolution geometry compression achieved by using wavelet transform and modified SPIHT algorithm. The trees of wavelet coefficients are divided into sets. Each description contains the coarsest level mesh and a number of tree sets coded with different rates. The original 3-D geometry can be reconstructed with acceptable quality from any received description. More descriptions provide better reconstruction quality. The proposed algorithm provides flexible number of descriptions and is optimized for varying packet loss rates (PLR) and channel bandwidth.
Hysteresis-based selective Gaussian-mixture model for real-time background update
We propose a novel Mixture-of-Gaussians (MOG)-based real-time background update technique. The proposed technique consists of a new selective matching scheme based on the combined approaches of component ordering and winner-takes-all. This matching scheme not only selects the most probable component for the first matching attempt with new pixel data, greatly improving performance, but also simplifies pixel classification and component replacement in the case of no match. Further performance improvement is achieved by a new, simple, and functional component variance adaptation formula. In addition, the proposed hysteresis-based component matching and temporal motion-history schemes improve segmentation quality. Component hysteresis matching improves detected foreground object blobs by reducing cracks and added shadows, while the motion history preserves the integrity of moving object boundaries, both with minimal computational overhead. The proposed background update technique implicitly handles both gradual illumination change and temporal clutter. The problem of shadows and ghosts is partially addressed by the proposed hysteresis-based matching scheme, while persistent sudden illumination changes and camera movements are addressed at the frame level depending on the percentage of pixels classified as foreground. We implemented three different state-of-the-art background update techniques and compared their segmentation quality and computational performance with those of the proposed technique. Experimental results on reference outdoor sequences and real traffic surveillance streams show that the proposed technique improves segmentation accuracy for extracting moving objects of interest compared to the reference techniques.
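A minimal per-pixel sketch of the Mixture-of-Gaussians update with winner-takes-all matching described above (grayscale, fixed learning rate; the hysteresis and motion-history schemes are omitted, and all parameter values are illustrative):

```python
class MoGPixel:
    """Per-pixel Mixture-of-Gaussians background model (grayscale sketch)."""

    def __init__(self, k=3, alpha=0.05, match_sigmas=2.5):
        self.k = k                      # maximum number of components
        self.alpha = alpha              # learning rate
        self.match_sigmas = match_sigmas
        self.comps = []                 # each entry: [weight, mean, variance]

    def update(self, x):
        """Update the model with sample x; return True if x is foreground."""
        # Selective matching: try components in decreasing-weight order and
        # stop at the first match (winner-takes-all).
        for c in sorted(self.comps, key=lambda c: -c[0]):
            if (x - c[1]) ** 2 <= (self.match_sigmas ** 2) * c[2]:
                c[0] += self.alpha * (1.0 - c[0])    # raise the weight
                c[1] += self.alpha * (x - c[1])      # adapt the mean
                # Variance adaptation, floored to avoid degenerate components.
                c[2] = max(4.0, c[2] + self.alpha * ((x - c[1]) ** 2 - c[2]))
                self._renormalize()
                # A low-weight match means the pixel is (still) foreground.
                return c[0] < 0.5
        # No match: replace the weakest component with a fresh one.
        if len(self.comps) >= self.k:
            self.comps.remove(min(self.comps, key=lambda c: c[0]))
        self.comps.append([self.alpha, float(x), 30.0])
        self._renormalize()
        return True

    def _renormalize(self):
        total = sum(c[0] for c in self.comps)
        for c in self.comps:
            c[0] /= total
```

After a stable background value has been observed for a while, it is classified as background, while a sudden jump in intensity is flagged as foreground.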
View interpolation by inverse filtering: generating the center view using multiview images of circular camera array
Akira Kubota, Kazuya Kodama, Yoshinori Hatori
This paper deals with a view interpolation problem using multiple images captured with a circular camera array. An inverse filtering method for reconstructing a virtual image at the center of the camera array is proposed. First, we generate a candidate image by adding all corresponding pixel values based on multiple layers assumed in the scene. Second, we model a linear relationship between the desired virtual image and the candidate image, and then derive the inverse filter that reconstructs the virtual image. Since the inverse filter can be derived independently of the scene structure, the proposed method requires no depth estimation. Simulation results using synthetic images show that increasing the number of cameras and layers improves the quality of the virtual image.
A novel framework for improving bandwidth utilization for VBR video delivery over wide-area networks
There are great challenges in streaming variable-bit-rate video over wide-area networks due to the significant variation of network conditions, and the utilization of the precious wide-area bandwidth in such streaming systems is often low. In this paper, we propose a novel framework that improves bandwidth utilization from a new perspective. Instead of focusing on the performance of each individual media stream, we aim to improve the overall bandwidth utilization of the video streaming system. We exploit the unoccupied bandwidth in ongoing streams and use it to deliver prefetched data that can facilitate future streaming. Preliminary results show that our mechanism has great potential to improve both the overall bandwidth utilization and the caching performance of the proxy servers in the streaming system.
H.264/AVC error resilient video streaming using leaky prediction
In this paper, we investigate the efficacy of the weighted prediction feature of the H.264/AVC video coding standard for error-resilient video streaming. Leaky prediction has been proposed for scalable and non-scalable video coding to effectively combat transport errors; however, all prior results are based on non-standard coding methods, and no results have been available on the effectiveness of leaky prediction as supported by a video coding standard. The weighted prediction feature in H.264/AVC was originally designed to improve coding efficiency, especially in the presence of fading in video sequences. This paper presents a performance analysis of the H.264/AVC weighted prediction feature that balances the trade-off between coding efficiency and error resilience. A theoretical analysis of the rate-distortion performance of leaky prediction is provided, and closed-form rate-distortion functions are derived for the error-free and error-drift scenarios. The theoretical results conform well to the operational results for different choices of the leaky factor.
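The drift behavior underlying the analysis can be sketched with a one-line recursion: with leaky factor alpha realized via weighted prediction, a one-time decoder-side error e0 decays geometrically, e_n = alpha^n * e0 (a standard property of leaky prediction, shown here as an illustration; this is not the paper's closed-form rate-distortion derivation):

```python
def leaky_drift(alpha, error0, frames):
    """Propagation of a reference mismatch under leaky prediction.

    With prediction P_n = alpha * R_{n-1}, a one-time decoder-side
    mismatch error0 is attenuated by the leaky factor at every frame,
    so after n frames the residual drift is alpha**n * error0.
    Returns the list [e_0, e_1, ..., e_frames].
    """
    errors = [error0]
    for _ in range(frames):
        errors.append(alpha * errors[-1])
    return errors
```

With alpha = 1 (plain motion-compensated prediction) the error persists indefinitely; smaller alpha trades coding efficiency for faster drift decay, which is exactly the trade-off the paper analyzes.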
A fast and accurate characteristic-based rate-quantization model for video transmission
Din-Yuen Chan, Wei-Ta Chien, Chun-Yuan Chang, et al.
In this paper, we analyze the limitations of ρ-domain rate-quantization (R-Q) models. We show that a characteristic-based R-Q model can be derived from the ρ-domain to the q-domain. Experimental data show that such a characteristic-based R-Q model provides a more accurate estimate of the actual bitrate than existing models, at both the frame level and the macroblock (MB) level. In addition, a simple analysis of the computational complexity of our quantization-free characteristics-extraction framework shows that our model is faster than existing variance-based and ρ-domain-based R-Q models.
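As background, the classic ρ-domain model estimates rate as a linear function of the fraction ρ of zero-quantized coefficients, R = θ(1 − ρ); a toy sketch (θ and the dead-zone-style zeroing rule below are illustrative, not the paper's characteristic-based model):

```python
def rho_domain_rate(coeffs, q, theta):
    """rho-domain rate estimate for one block (illustrative sketch).

    rho is the fraction of transform coefficients quantized to zero at
    quantization step q; the classic linear model estimates the rate
    as R = theta * (1 - rho), with theta a content-dependent constant.
    """
    zeros = sum(1 for c in coeffs if abs(c) < q)
    rho = zeros / len(coeffs)
    return theta * (1.0 - rho)
```

Coarser quantization drives more coefficients to zero, raising ρ and lowering the estimated rate, which is why ρ is such a convenient control variable for rate models.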
A double motion-compensated orthogonal transform with energy concentration constraint
Markus Flierl, Bernd Girod
This paper discusses a transform for successive pictures of an image sequence which strictly maintains orthogonality while permitting 2-hypothesis motion compensation between pairs of pictures. We extend previous work on an orthogonal transform for image sequences which uses only 1-hypothesis motion compensation between pairs of pictures. This work is motivated by the well known fact that the motion-compensated lifted Haar wavelet maintains orthogonality only approximately. In the case of zero motion fields, the motion-compensated lifted Haar wavelet is known to be orthogonal. But for complex motion fields with many multi-connected and unconnected pixels, the motion-compensated lifted Haar wavelet cannot accurately maintain its transform property and, hence, suffers a performance degradation. The presented double motion-compensated orthogonal transform strictly maintains orthogonality for any motion field. For the transform, each pixel in the high-band is compensated by a linear combination of two motion-compensated pixels chosen from the corresponding low-band. That is, each pixel in the high-band is associated with two motion vectors. This is in contrast to the previously presented single motion-compensated orthogonal transform where each pixel in the high-band is compensated by only one motion-compensated pixel chosen from the corresponding low-band. In terms of energy concentration, the double motion-compensated orthogonal transform outperforms the single motion-compensated orthogonal transform and compares favorably with the double motion-compensated lifted Haar wavelet.
Image compression using constrained relaxation
In this work, we develop a new data representation framework for image compression, called constrained relaxation. Our basic observation is that an image is not a random 2-D array of pixels: its pixels must satisfy a set of imaging constraints in order to form a natural image. Therefore, one of the major tasks in image representation and coding is to efficiently encode these imaging constraints. The proposed data representation and image compression method not only achieves more efficient compression than state-of-the-art H.264 intra-frame coding, but also provides much greater resilience to wireless transmission errors through an internal error-correction capability.
Rate-distortion optimized color quantization for compound image compression
Wenpeng Ding, Yan Lu, Feng Wu, et al.
In this paper, we present a new image compression scheme specially designed for computer-generated compound color images. First, we classify the image content into two kinds, text/graphics content and picture content, and two different compression schemes are applied to blocks of the two types. We propose a two-stage segmentation scheme that combines block-feature thresholding with rate-distortion optimization. The text/graphics compression scheme consists of two parts: color quantization and lossless coding of the quantized images. The input images are first color-quantized into codebooks and labels, introducing constrained distortion into the color-quantized images; the generated labels and codebooks are then losslessly compressed. We propose a rate-distortion optimized color quantization algorithm for text/graphics content, which introduces distortion into the text content while minimizing the bit rate produced by the subsequent lossless entropy coder. The picture content is compressed using conventional image coding algorithms such as JPEG. The results show that the proposed scheme achieves better coding performance than other image compression algorithms such as JPEG2000 and DjVu.
Dynamic GOP structure for scalable video coding
MPEG is currently developing a new scalable video coding (SVC) standard intended to provide compression efficiency similar to that of current non-scalable video coding standards. The extension of H.264/AVC hybrid video coding using motion-compensated temporal filtering (MCTF) is the current solution. This paper presents a dynamic group of pictures (GOP) structure that improves the quality of SVC-coded video by using variable GOP sizes along the video sequence, without the restriction of a fixed GOP size or a limited adaptive GOP size. The dynamic GOP structure keeps temporal importance information of frames in the encoded video, which helps coding efficiency and provides better user perception. An effective GOP size determination method is also proposed. The proposed scheme has been validated by experimental results.
The wavelet-based multi-resolution motion estimation using temporal aliasing detection
Teahyung Lee, David V. Anderson
In this paper, we propose a new algorithm for wavelet-based multi-resolution motion estimation (MRME) using temporal aliasing detection (TAD). In wavelet-transformed image/video signals, temporal aliasing becomes severe as object motion increases, causing the performance of conventional MRME algorithms to drop. To overcome this problem, we perform temporal aliasing detection and MRME simultaneously, instead of using a temporal anti-aliasing filter that would alter the original signal. We show that this technique gives competitive or better rate-distortion (RD) performance for slowly varying or simply moving video signals compared to conventional MRME with an increased search area (SA).
Adaptive P2P video streaming via packet labeling
We consider the scenario of video streaming in peer-to-peer networks. A single media server delivers the video content to a large number of peer hosts by taking advantage of their forwarding capabilities. We propose a scheme that enables the peers to efficiently distribute the media stream among themselves. Each peer connects to the streaming server via multiple multicast trees, which provide robustness in the event of peer disconnection. Moreover, adaptive forwarding of the media content at each peer is enabled by labeling the packets with their importance for the reconstruction of the media stream. We study the performance of the proposed scheme as a function of system parameters such as the play-out delay of the media application, the peer population size, and the number of multicast trees employed by the scheme. We show that by prioritizing the forwarding of individual packets at each peer, improved performance is achieved over conventional peer-to-peer systems where no such prioritization is deployed. The performance gains are particularly significant for low-delay applications and large peer populations.
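The importance-labeled forwarding can be sketched as a greedy drop policy at a peer's uplink (a hypothetical model: the packet fields and the policy below are illustrative, not the paper's exact scheme):

```python
def forward_packets(packets, capacity):
    """Priority-aware forwarding at a peer (illustrative sketch).

    Each packet carries an importance label for reconstructing the
    stream; when the uplink capacity cannot carry everything, the most
    important packets are forwarded first and the rest are dropped.
    """
    ranked = sorted(packets, key=lambda p: -p["importance"])
    kept, used = [], 0
    for p in ranked:
        if used + p["size"] <= capacity:
            kept.append(p)
            used += p["size"]
    return kept
```

Under congestion, a peer without such labeling would drop packets indiscriminately; with labeling, the base-quality packets survive and only enhancement data is lost.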
Using machine learning for fast intra MB coding in H.264
Hari Kalva, Lakis Christodoulou
H.264 is a highly efficient and complex video codec. The complexity of the codec makes it difficult to use all its features in resource-constrained mobile devices. This paper presents a machine learning approach to reducing the complexity of intra encoding in H.264. Determining the macroblock coding mode requires substantial computational resources in H.264 video encoding. The goal of this work is to reduce MB mode selection from a search operation, as is done in encoders today, to a computation. We have developed a machine-learning-based methodology that computes the MB coding mode instead of searching for the best match, reducing the complexity of Intra 16x16 coding by 17 times and of Intra 4x4 MB coding by 12.5 times. The proposed approach uses simple mean-value metrics at the block level to characterize the coding complexity of a macroblock. A generic J4.8 classifier is used to build the decision trees that quickly determine the mode. We present a methodology for intra MB coding. The results show that the intra MB mode can be determined with over 90% accuracy. The proposed approach can also be used to determine MB prediction modes, with an accuracy between 70% and 80%.
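The block-level mean-value features can be illustrated with a toy sketch (hypothetical: the quadrant-mean feature, the single threshold, and the one-node "tree" below merely stand in for the paper's full feature set and trained decision trees):

```python
def intra_mode_features(mb):
    """Mean-value features for a 16x16 luma macroblock (sketch).

    mb is a 16x16 list-of-lists of luma samples. Each 8x8 quadrant is
    summarized by its mean, and the spread of those means serves as a
    cheap homogeneity feature for the mode decision.
    """
    means = []
    for by in (0, 8):
        for bx in (0, 8):
            block = [mb[y][bx + x] for y in range(by, by + 8) for x in range(8)]
            means.append(sum(block) / 64.0)
    spread = max(means) - min(means)
    return means, spread

def predict_mode(mb, threshold=10.0):
    """Toy one-node 'decision tree': flat MBs -> Intra16x16, else Intra4x4.
    The threshold is hypothetical; a trained tree learns such splits."""
    _, spread = intra_mode_features(mb)
    return "Intra16x16" if spread < threshold else "Intra4x4"
```

The point of the approach is exactly this shape of computation: a few cheap arithmetic features plus a tree of threshold comparisons, instead of an exhaustive rate-distortion search over all modes.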
Subband motion compensation for spatially scalable video coding
Rong Zhang, Mary L. Comer
In this paper, a new enhancement-layer motion compensation technique, referred to as subband motion compensation, is proposed for spatially scalable video coding. This approach is proposed as an alternative to a technique that we call pyramid motion compensation in this paper and that is referred to as inter-layer residual prediction in the H.264/MPEG4-AVC scalable extension, the Scalable Video Coding (SVC) standard. The main difference between these two techniques lies in how they use base-layer information to encode the enhancement layer. Experimental results comparing the two approaches show that, for enhancement-layer encoding, the pyramid method performs better when the corresponding base layer is encoded at a lower bitrate, while the subband method outperforms the pyramid method when the base layer has a higher bitrate. This motivates future work that adaptively chooses between the two methods at the macroblock level, or even at the transform coefficient level, for spatially scalable video coding.
Non-linear up-sampling for image coding in a spatial pyramid
A locally adaptive up-sampling method that improves the efficiency of a spatially scalable representation of images in a spatial pyramid is presented. While linear methods use a globally optimized up-sampling filter design, the presented method locally switches between enhancement of significant structures and smoothing of flat regions that are dominated by noise. It is based on a locally adaptive Wiener filter expression that can be implemented by the bilateral filter. The performance of the method is assessed in a scenario resembling its possible use in MPEG's and ITU-T's current joint activity on scalable video coding (SVC).
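The bilateral filter at the heart of the method combines a spatial weight with a range (intensity) weight, so it smooths flat noisy regions while leaving significant edges intact; a 1-D sketch (parameter values are illustrative, not those of the paper):

```python
import math

def bilateral_1d(signal, sigma_s=1.0, sigma_r=10.0, radius=2):
    """1-D bilateral filter (sketch of edge-preserving smoothing).

    Each sample is replaced by a weighted average of its neighbours,
    with weights that fall off with both spatial distance (sigma_s)
    and intensity difference (sigma_r), so samples across an edge
    contribute almost nothing and the edge is preserved.
    """
    out = []
    for i, x in enumerate(signal):
        num = den = 0.0
        for j in range(max(0, i - radius), min(len(signal), i + radius + 1)):
            w = math.exp(-((i - j) ** 2) / (2 * sigma_s ** 2)
                         - ((x - signal[j]) ** 2) / (2 * sigma_r ** 2))
            num += w * signal[j]
            den += w
        out.append(num / den)
    return out
```

On a clean step edge, the samples on each side are averaged only with their own side, so the step survives filtering almost unchanged.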
Collusion attack to a scalable AND-ACC fingerprinting scheme
Yongdong Wu, Zhigang Zhao
This paper presents a collusion attack to a fingerprinting scheme. To this end, it creates a pirated copy by adaptively separating the traitors into groups and then applying either LCCA or majority attack. Since the pirated copy discloses no watermarks of the traitors, the fingerprinting scheme fails to trace the traitors. Our experiments demonstrate that the attack is effective.
Improvements of multiple FGS layers coding for low-delay applications in SVC
Yanyan Zheng, Xiangyang Ji, Feng Wu, et al.
For low-delay applications in Scalable Video Coding, there are two alternative coding strategies based on AR-FGS for compressing multiple FGS layers; however, the coding efficiency of both suffers significantly from inherent drifting errors. In this paper, a more efficient multiple-FGS-layer coding structure is presented, which provides higher coding performance over a wide bitrate range together with stronger error resilience. This is achieved by incorporating partially reconstructed enhancement-layer references, instead of completely reconstructed ones, into the motion-compensated prediction loop of the FGS layers. As a result, prediction drift is effectively reduced, especially at middle bitrates. Furthermore, by selecting enhancement-layer references of different quality generated with a cycle-based reconstruction mechanism, more flexible video quality can be supported, better matching varied practical application requirements.
Achieving H.264-like compression efficiency with distributed video coding
Recently, a new class of distributed source coding (DSC) based video coders has been proposed to enable low-complexity encoding. However, to date, these low-complexity DSC-based video encoders have been unable to compress as efficiently as motion-compensated predictive coding based video codecs, such as H.264/AVC, due to insufficiently accurate modeling of video data. In this work, we examine achieving H.264-like high compression efficiency with a DSC-based approach without the encoding complexity constraint. The success of H.264/AVC highlights the importance of accurately modeling the highly non-stationary video data through fine-granularity motion estimation. This motivates us to deviate from the popular approach of approaching the Wyner-Ziv bound with sophisticated capacity-achieving channel codes that require long block lengths and high decoding complexity, and instead focus on accurately modeling video data. Such a DSC-based, compression-centric encoder is an important first step towards building a robust DSC-based video coding framework.