Proceedings Volume 4310

Visual Communications and Image Processing 2001

View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 29 December 2000
Contents: 14 Sessions, 95 Papers, 0 Presentations
Conference: Photonics West 2001 - Electronic Imaging 2001
Volume Number: 4310

Table of Contents

  • Enhancement and Restoration
  • Image Coding
  • Image Analysis
  • Video Coding Algorithms
  • Stereo and Multiview Processing
  • Video Coding Implementations
  • Motion Estimation
  • Error-Resilient Coding
  • Image Sequence Analysis
  • Video Coding Algorithms
  • Internet Video
  • Face Tracking and Recognition
  • Wireless Video
  • Posters I: Processing and Analysis of Visual Information
  • Posters II: Visual Communication
Enhancement and Restoration
New fuzzy filter for Gaussian noise reduction
Dimitri Van De Ville, Mike Nachtegael, D. Van der Weken, et al.
A new fuzzy filter is presented for the noise reduction of images corrupted with additive Gaussian noise. The filter consists of two stages. The first stage computes a fuzzy gradient for eight different directions around the currently processed pixel. The second stage uses the fuzzy gradient to perform fuzzy smoothing by taking different contributions of neighboring pixel values. The two stages are both based on fuzzy rules which make use of membership functions. The filter can be applied iteratively to effectively reduce heavy noise. The shape of the membership functions is adapted according to the remaining noise level after each iteration, making use of the distribution of the homogeneity in the image. The fuzzy operators are implemented by the classical min/max. Experimental results are obtained to show the feasibility of the proposed approach. These results are also compared to other filters by numerical measures and visual inspection.
Spatially adaptive regularized iterative high-resolution image reconstruction algorithm
Won Bae Lim, Min Kyu Park, Moon Gi Kang
High resolution images are often required in applications such as remote sensing, frame freeze in video, military and medical imaging. Digital image sensor arrays, which are used for image acquisition in many imaging systems, are not dense enough to prevent aliasing, so the acquired images will be degraded by aliasing effects. To prevent aliasing without loss of resolution, a dense detector array is required. However, such an array may be very costly or unavailable, so many imaging systems are designed to allow some level of aliasing during image acquisition. The purpose of our work is to reconstruct an unaliased high resolution image from the acquired aliased image sequence. In this paper, we propose a spatially adaptive regularized iterative high resolution image reconstruction algorithm for blurred, noisy and down-sampled image sequences. The proposed approach is based on a Constrained Least Squares (CLS) high resolution reconstruction algorithm, with spatially adaptive regularization operators and parameters. These regularization terms are shown to improve the reconstructed image quality by forcing smoothness, while preserving edges in the reconstructed high resolution image. Accurate sub-pixel motion registration is the key to the success of the high resolution image reconstruction algorithm. However, sub-pixel motion registration may have some level of registration error. Therefore, a reconstruction algorithm which is robust against the registration error is required. The registration algorithm uses a gradient based sub-pixel motion estimator which provides shift information for each of the recorded frames. The proposed algorithm is based on a technique of high resolution image reconstruction, and it solves spatially adaptive regularized constrained least square minimization functionals. In this paper, we show that the reconstruction algorithm gives dramatic improvements in the resolution of the reconstructed image and is effective in handling the aliased information. The proposed algorithm is also shown to be robust in the presence of severe registration error. Experimental results are provided to illustrate the performance of the proposed reconstruction algorithm. Comparative analyses with other reconstruction methods are also provided.
Restoration of compressed video using temporal information
Mark A. Robertson, Robert L. Stevenson
This paper proposes a Bayesian method for the restoration of video sequences compressed using the discrete cosine transform (DCT). Two elements, both part of the Bayesian observation model, distinguish the proposed algorithm from the majority of other methods in the literature. The proposed algorithm incorporates temporal information from nearby frames -- past, present and future -- when forming an estimate of the current frame. Furthermore, this work uses a spatially-varying noise model to account for the noise introduced by quantization of the DCT coefficients. These two aspects of the observation model are used in conjunction with a Huber-Markov Random Field (HMRF) model to form a Bayesian estimate of each frame in the compressed video sequence.
Novel sequential error concealment techniques using orientation adaptive interpolation
Xin Li, Michael T. Orchard
This paper introduces a new framework of sequential error concealment techniques for block-based image coding systems. Unlike previous approaches which simultaneously recover the pixels inside the missing block, we propose to recover them in a sequential fashion. The structure of sequential recovery enhances the capability of handling complex texture patterns in the image and serious block loss situations during transmission. Under the framework of sequential recovery, we present a novel spatially adaptive scheme to interpolate the missing pixels along the edge orientation. We also study the problem of how to fully exploit the information from the available surrounding neighbors under the sequential constraint. Experimental results show that the novel sequential recovery techniques are superior to most existing parallel recovery techniques in terms of both subjective and objective quality of reconstructed images.
Image Coding
Lossless and near-lossless image compression with successive refinement
We present a technique that provides progressive transmission and near-lossless compression in one single framework. The proposed technique produces a bitstream that results in progressive reconstruction of the image just like what one can obtain with a reversible wavelet codec. In addition, the proposed scheme provides near-lossless reconstruction with respect to a given bound after each layer of the successively refinable bitstream is decoded. We formulate the image data compression problem as one of asking the optimal questions to determine, respectively, the value or the interval of the pixel, depending on whether one is interested in lossless or near-lossless compression. New prediction methods based on the nature of the data at a given pass are presented and links to the existing methods are explored. The trade-off between non-causal prediction and data precision is discussed within the context of successive refinement. Context selection for prediction in different passes is addressed. Finally, experimental results for both lossless and near-lossless cases are presented, which are competitive with the state-of-the-art compression schemes.
Near-lossless compression by relaxation-labeled 3D prediction
In this work, near-lossless compression, i.e., yielding strictly bounded reconstruction error, is proposed for high-quality data compression. An interframe causal DPCM scheme is presented for interframe compression of remotely sensed optical data, both multispectral and hyperspectral, as well as of volumetric medical data. The proposed encoder relies on a classified linear-regression prediction, followed by context-based arithmetic coding of the outcome prediction errors. It provides outstanding performance, both for reversible and for irreversible, i.e., near-lossless, compression. Coding times are affordable thanks to fast convergence of training. Decoding is always performed in real time.
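As a rough illustration of what "strictly bounded reconstruction error" means in a DPCM loop, the sketch below quantizes each prediction residual with a uniform step of 2*delta + 1, which keeps every reconstructed sample within delta of the original. It is a minimal sketch under stated assumptions, not the encoder of this paper: the previous-sample predictor stands in for the classified linear-regression predictor, the arithmetic coder is omitted, and the names `near_lossless_dpcm`, `decode` and `delta` are hypothetical.

```python
def near_lossless_dpcm(samples, delta):
    """Encode integer samples with a previous-sample predictor (an assumption).

    Returns (indices, reconstruction). Each reconstructed value differs
    from the original by at most `delta`; delta = 0 gives lossless coding.
    """
    step = 2 * delta + 1
    indices = []
    recon = []
    prev = 0  # last reconstructed sample, which the decoder can track too
    for x in samples:
        err = x - prev
        # Uniform quantizer with bin width 2*delta + 1, symmetric around zero.
        if err >= 0:
            q = (err + delta) // step
        else:
            q = -((-err + delta) // step)
        indices.append(q)
        # Predict from the reconstruction, not the original, so that
        # quantization errors do not accumulate.
        prev = prev + q * step
        recon.append(prev)
    return indices, recon


def decode(indices, delta):
    """Rebuild the samples from the quantizer indices alone."""
    step = 2 * delta + 1
    prev = 0
    out = []
    for q in indices:
        prev += q * step
        out.append(prev)
    return out
```

Predicting from the reconstructed value rather than the original is what keeps the encoder and decoder in step and makes the per-sample bound hold.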
Adaptive approximation image coding models
Rodrigo Montufar-Chaveznava, Francisco Garcia-Ugalde
In this work we present some image coding models based on adaptive approximation techniques. The image coding models presented are based on Matching Pursuit and High Resolution Pursuit, which are the most popular adaptive approximation techniques. These models have a similar computational complexity and structure. The models expand an image along an overcomplete dictionary. The dictionary is selected according to a best basis metric or a training strategy. From such an expansion, the model selects the coefficients that correspond to the most important image structures. Selected coefficients are quantized as soon as they are chosen, in order to minimize error propagation along the process. These coefficients represent an optimal image decomposition, or a reduced image representation. This representation, in some way, corresponds to a coded image with a high compression rate. A simple reconstruction algorithm recovers the original image with a high visual quality.
Image coding using a colored pattern appearance model
Guoping Qiu
With desktop imaging devices becoming ubiquitous, effectively managing the images in large collections has become a challenge. The requirements for a modern imaging system now demand not only efficient storage (low bit rate coding), but also easy manipulation, indexing and retrieval of images. In this paper, we introduce a new method for color image coding based on a visual appearance model of local color image patterns. The visual appearance of small image patterns is characterized by their spatial pattern, color direction and local energy strength. To encode the local visual appearance, an approach based on vector quantization (VQ) is introduced. A separate VQ is designed for the spatial pattern and color direction respectively. It is shown that the method not only achieves good image coding results in terms of rate distortion criterion, it also enables content-based retrieval to be performed in the compressed domain easily and conveniently.
Lifting kernel-based sprite codec
Aravind R. Dasu, Sethuraman Panchanathan
The International Standards Organization (ISO) has proposed a family of standards for compression of image and video sequences, including JPEG, MPEG-1 and MPEG-2. The latest MPEG-4 standard adds many new dimensions to coding and manipulation of visual content. A video sequence usually contains a background object and many foreground objects. Portions of this background may not be visible in certain frames due to the occlusion of the foreground objects or camera motion. MPEG-4 introduces the novel concepts of Video Object Planes (VOPs) and Sprites. A VOP is a visual representation of real world objects with shapes that need not be rectangular. A sprite is a large image composed of pixels belonging to a video object visible throughout a video segment. Since a sprite contains all parts of the background that were visible at least once, it can be used for direct reconstruction of the background Video Object Plane (VOP). Sprite reconstruction is dependent on the mode in which it is transmitted. In the Static sprite mode, the entire sprite is decoded as an Intra VOP before decoding the individual VOPs. Since sprites consist of the information needed to display multiple frames of a video sequence, they are typically much larger than a single frame of video. Therefore a static sprite can be considered as a large static image. In this paper, a novel solution to address the problem of spatial scalability has been proposed, in which the sprite is encoded using the Discrete Wavelet Transform (DWT). A lifting kernel method of DWT implementation has been used for encoding and decoding sprites. Modifying the existing lifting scheme while keeping it shape adaptive results in reduced complexity. The proposed scheme has the advantages of (1) avoiding the need for any extensions to image or tile border pixels, which makes it superior to the DCT based low latency scheme (used in the current MPEG-4 verification model), and (2) mapping the in-place computed wavelet coefficients into a zero tree structure without actually rearranging them, thereby saving allocation of additional memory. The proposed solutions provide efficient implementation of the sprite coder, making possible a VLSI realization with reduced real estate.
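To make the lifting idea concrete, here is a minimal sketch of one level of the integer Haar transform written as a predict step followed by an update step. It is an assumption-laden illustration, not the modified shape-adaptive kernel of this paper: the Haar filter, the one-dimensional signal, and the function names `haar_lift_forward` and `haar_lift_inverse` are all chosen for brevity, and the lists are copied rather than updated in place.

```python
def haar_lift_forward(signal):
    """One level of the integer Haar transform via lifting.

    `signal` is a list of integers of even length.
    Returns (approx, detail).
    """
    even = signal[0::2]
    odd = signal[1::2]
    # Predict step: each odd sample is predicted by its even neighbour.
    detail = [o - e for e, o in zip(even, odd)]
    # Update step: lift the even samples so they carry floor((e + o) / 2).
    approx = [e + (d >> 1) for e, d in zip(even, detail)]
    return approx, detail


def haar_lift_inverse(approx, detail):
    """Undo the lifting steps in reverse order; reconstruction is exact."""
    even = [s - (d >> 1) for s, d in zip(approx, detail)]
    odd = [d + e for e, d in zip(even, detail)]
    out = []
    for e, o in zip(even, odd):
        out.extend([e, o])
    return out
```

Because each step only adds a function of one half of the samples to the other half, the inverse simply subtracts the same quantity, which is why lifting stays exactly reversible in integer arithmetic and can be computed in place.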
Image Analysis
Nonlinear prediction for Gaussian mixture image models
Jun Zhang, Dehong Ma
Prediction is an essential operation in many image processing applications, such as object detection and image and video compression. When the image is modeled as Gaussian, the optimal predictor is linear and easy to obtain. However, image texture and clutter are often non-Gaussian and in such cases, optimal predictors are difficult to find. In this paper, we have derived an optimal predictor for an important class of non-Gaussian image models, the block-based multivariate Gaussian mixture model. This predictor has a special non-linear structure: it is a linear combination of the neighboring pixels but the combination coefficients are also functions of the neighboring pixels, not constants. The efficacy of this predictor is demonstrated in object recognition experiments where the prediction error image is used to identify 'hidden' objects. Results indicate that when the background texture is non-linear, i.e., with fast-switching gray-level patches, it performs significantly better than the optimal linear predictor.
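The structure described above matches the standard minimum mean-square-error predictor for a Gaussian mixture, written out below. The notation is an assumption made for illustration, since the abstract does not give the paper's own symbols: x is the pixel to predict, y is the vector of neighboring pixels, and component k has weight \pi_k, means \mu_{x,k}, \mu_{y,k} and covariance blocks \Sigma_{xy,k}, \Sigma_{yy,k}.

```latex
\hat{x}(y) = \mathbb{E}[x \mid y]
           = \sum_{k=1}^{K} w_k(y)\,
             \Bigl[ \mu_{x,k} + \Sigma_{xy,k}\,\Sigma_{yy,k}^{-1}\,(y - \mu_{y,k}) \Bigr],
\qquad
w_k(y) = \frac{\pi_k\,\mathcal{N}(y;\,\mu_{y,k},\,\Sigma_{yy,k})}
              {\sum_{j=1}^{K} \pi_j\,\mathcal{N}(y;\,\mu_{y,j},\,\Sigma_{yy,j})}
```

Each bracketed term is the linear predictor of one Gaussian component, and the weights w_k(y) are posterior component probabilities that depend on the neighbors themselves, which is what makes the overall predictor non-linear. With K = 1 the weight is constant and the expression reduces to the usual linear predictor.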
Feature representation and compression for content-based retrieval
Hua Xie, Antonio Ortega
In semantic content-based image/video browsing and navigation systems, efficient mechanisms to represent and manage a large collection of digital images/videos are needed. Traditional keyword-based indexing describes the content of multimedia data through annotations such as text or keywords extracted manually by the user from a controlled vocabulary. This textual indexing technique lacks the flexibility to satisfy the various kinds of queries requested by database users and also requires a huge amount of work for updating the information. Current content-based retrieval systems often extract a set of features such as color, texture, shape, motion, speed, and position from the raw multimedia data automatically and store them as content descriptors. This content-based metadata differs from text-based metadata in that it supports wider varieties of queries and can be extracted automatically, thus providing a promising approach for efficient database access and management. When the raw data volume grows very large, explicitly extracting the content information and storing it as metadata along with the images will improve querying performance since metadata requires much less storage than the raw image data and thus will be easier to manipulate. In this paper we maintain that storing metadata together with images will enable effective information management and efficient remote query. We also show, using a texture classification example, that this side information can be compressed while guaranteeing that the desired query accuracy is satisfied. We argue that the compact representation of the image contents not only significantly reduces the storage and transmission rate requirements, but also facilitates certain types of queries. Algorithms are developed for optimized compression of this texture feature metadata given that the goal is to maximize the classification performance for a given rate budget.
Joint compression-classification with quantizer/classifier dimension mismatch
Naveen Srinivasamurthy, Antonio Ortega
In this paper an algorithm is presented to design encoders that achieve good compression and classification. The goal is to minimize the classification error introduced by quantizing the data using encoders operating on low dimension inputs, which are subsets of the high dimension data used by the classifier for classification. In the encoder design, information from the other dimensions of the vector is used to develop efficient encoders which are capable of achieving lower classification error for a given distortion. The design allows a trade-off between distortion and classification costs, providing more flexibility in the overall system design. The algorithm is tested on Gaussian mixture data, which is classified using a classifier which takes as input vectors of quantized values. The proposed technique can trade performance to achieve lower complexity, which is desirable in devices having limited computational resources. For 4 dimensional Gaussian mixture data the misclassification was about 2.2% more than that achieved by using encoders of the same dimension as the classifier, while the encoding complexity was reduced by more than a factor of 2.
Framework for cooperative segmentation based on the multiagent paradigm
Ludovic Pithon, Saida Bouakaz, Salima Hassas
In the image interpretation domain, one of the most delicate stages is the partitioning of pixels into principal components that correspond to the parts of the image. However, the approaches used for segmentation either are highly specific or require many sensitive parameters, which makes their use in the general case very difficult. Using them in inappropriate cases leads to very poor performance in terms of result quality and relevance. Broadly speaking, to obtain an acceptable segmentation result, we have to choose the best-adapted method and tune it very carefully. However, even though the need for an adaptable methodology has long been recognized, few proposals have appeared. In this paper we present a cooperative and concurrent framework for image segmentation. Our approach integrates several region-based methods and detector-based boundary finding using a Multi-Agent System (MAS). Thanks to natural MAS properties, the image analysis is done at several semantic levels. The novelty of our approach is that it is a multidirectional cooperation whereby each separate method improves its own result as well as the global one through mutual cooperation and information sharing.
Unsupervised segmentation algorithm based on an iterative spectral dissimilarity measure for hyperspectral imagery
We present an adaptive unsupervised segmentation technique, in which spectral features are obtained and processed without a priori knowledge of the spectral characteristics. The proposed technique is based on an iterative method, in which segmentation at a given iteration depends closely on the segmentation results at the previous iteration. The hyperspectral images are first coarsely segmented and then the segmentation is successively refined via an iterative spectral dissimilarity measure. The algorithm also provides reduced computational complexity and improved segmentation performance. The algorithm consists of (1) an initial segmentation based on a fixed spectral dissimilarity measure and the k-means algorithm, and (2) subsequent adaptive segmentation based on an iterative spectral dissimilarity measure over a local region whose size is reduced progressively. The iterative use of a local spectral dissimilarity measure provided a set of values that can discriminate among different materials. The proposed unsupervised segmentation technique proved to be superior to other unsupervised algorithms, especially when a large number of different materials are mixed in complex hyperspectral scenes.
Video Coding Algorithms
Adaptive block transforms for hybrid video coding
Mathias Wien, Claudia Mayer
Today's standard video coders employ the hybrid coding scheme on a macroblock basis. In these coders blocks of 16 × 16 and 8 × 8 pixels are used for motion compensation of non-interlaced video. The Discrete Cosine Transform (DCT) is then applied to the prediction error on blocks of size 8 × 8. The emerging coding standard H.26L employs a set of seven different block sizes for motion compensation. The size of these blocks varies from 4 × 4 to 16 × 16. The block sizes smaller than 8 × 8 imply that the 8 × 8 DCT cannot be used for transform coding of the prediction error. In the current test model an integer approximation of the 4 × 4 DCT matrix is employed. In this paper the concept of Adaptive Block Transforms is proposed. In this scheme, the transform block size is adapted to the block sizes used for motion compensation. The transform exploits the maximum possible signal length for transform coding without exceeding the compensated block boundaries. The proposed scheme is integrated into the H.26L test model. New integer approximations of the 8 × 8 and 16 × 16 DCT matrices are introduced. Like the TML 4 × 4 transform, the coefficient values of these matrices are restricted to a limited range. The results presented here are based on an entropy estimation. They reveal an increased rate/distortion performance of approximately 1.1 dB for high rates on the employed test sequences.
Preprocessing of compressed digital video
Pre-processing algorithms improve on the performance of a video compression system by removing spurious noise and insignificant features from the original images. This increases compression efficiency and attenuates coding artifacts. Unfortunately, determining the appropriate amount of pre-filtering is a difficult problem, as it depends on both the content of an image and the target bit-rate of the compression algorithm. In this paper, we explore a pre-processing technique that is loosely coupled to the quantization decisions of a rate control mechanism. This technique results in a pre-processing system that operates directly on the Displaced Frame Difference (DFD) and is applicable to any standard-compatible compression system. Results explore the effect of several standard filters on the DFD. An adaptive technique is then considered.
Embedded-to-lossless coding of motion-compensated prediction residuals in lossless video coding
Lossless video coding is useful in applications where no loss of information or visual quality is tolerable. In embedded to lossless coding an encoded video stream can be decoded into any bit rate up to the lossless bit rate, which is quite useful in numerous applications. In this paper, the research we present is based on lossless video coding, which uses motion compensated prediction to eliminate temporal redundancy. We specifically address the problem of embedded to lossless coding of the motion compensated prediction residuals. Since the statistical properties of the residuals are different from still images, the best wavelet bases for still images do not perform well for residuals. Since residuals contain a higher portion of high frequency information in high motion regions, a fixed transform for the entire frame is not very efficient. We introduce a spatially adaptive wavelet transform method, which takes the local frame statistics into account before choosing the wavelet base. This transform technique provided the best performance for most of the test frames.
Hybrid video coding based on high-resolution displacement vectors
Common motion compensated hybrid video coding standards such as H.263, MPEG-1, MPEG-2, MPEG-4 are based on a fractional-pel displacement vector resolution of 1/2-pel. Recent approaches like MPEG-4 ACE and H.26L (TML5) use a 1/4-pel displacement vector resolution. In order to estimate and compensate fractional-pel displacements, the image signal has to be interpolated. Therefore different interpolation filters are used in the standards. In this paper an enhanced motion compensated hybrid video codec is presented, which is based on high-resolution displacement vectors. For this purpose, displacement vector resolutions of 1/8- and 1/16-pel are used in order to improve the motion compensated prediction and the coding efficiency. The coding results for different resolutions are presented and the dependence on different interpolation filters is analyzed. It turned out that the higher the displacement vector resolution is, the higher the influence of the filter on the coding gain is. A gain up to 3.0 dB PSNR is obtained compared to a hybrid video codec which is based on 1/2-pel resolution and bilinear interpolation, like H.263, MPEG-1,2,4. Compared to 1/4-pel displacement vector resolution and a Wiener interpolation filter, as used in MPEG-4 ACE and H.26L (TML5), a gain up to 1.0 dB PSNR is obtained.
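The 1/2-pel bilinear baseline mentioned above can be sketched as follows. This is a simplified illustration rather than the bit-exact procedure of any of the named standards or of the codec in this paper: coordinates are expressed in half-pel units, the rounding rule is plain round-to-nearest, the caller must keep all accesses inside the frame, and the names `sample_half_pel` and `predict_block` are hypothetical.

```python
def sample_half_pel(frame, y2, x2):
    """Return the value at position (y2 / 2, x2 / 2) of a 2-D integer frame.

    Coordinates are in half-pel units: even values land on integer
    pixels, odd values fall halfway between two pixels. The caller must
    keep the coordinates non-negative and inside the frame.
    """
    y0, x0 = y2 // 2, x2 // 2
    y1 = y0 + (y2 % 2)
    x1 = x0 + (x2 % 2)
    total = frame[y0][x0] + frame[y0][x1] + frame[y1][x0] + frame[y1][x1]
    # Average of the four (possibly coinciding) neighbours, rounded to nearest.
    return (total + 2) // 4


def predict_block(ref, top, left, size, mv_y2, mv_x2):
    """Motion-compensated prediction of a size x size block.

    (mv_y2, mv_x2) is the displacement vector in half-pel units.
    """
    return [
        [
            sample_half_pel(ref, 2 * (top + r) + mv_y2, 2 * (left + c) + mv_x2)
            for c in range(size)
        ]
        for r in range(size)
    ]
```

Finer resolutions such as 1/8- or 1/16-pel follow the same pattern with a denser coordinate grid, and, as the abstract points out, the choice of interpolation filter matters more as the grid gets finer.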
Scene level rate control algorithm for MPEG-4 video coding
Object-based coding approaches, such as the MPEG-4 standard approach, where a video scene is composed of several video objects, require that the rate control be performed at two levels: the scene rate control and the object rate control. In this context, this paper presents a new scene level and object level rate control algorithm for low delay MPEG-4 video encoding capable of performing bit allocation for the several VOs in the scene, encoded at different VOP rates, and aiming at obtaining a better trade-off between spatial and temporal quality for the overall scene. The proposed approach combines rate-distortion modeling using model adaptation by least squares estimation and adaptive bit allocation to 'shape' the encoded data so that the overall subjective quality of the encoded scene is maximized.
Stereo and Multiview Processing
Interpolation of three views based on epipolar geometry
Makoto Kimura, Hideo Saito
In this paper, we propose a method for generating arbitrary view images by interpolating images between three cameras using epipolar geometry. Projective geometry has recently been used in the field of computer vision, because projective geometry can be determined more easily than Euclidean geometry. In the proposed method, three input camera images are rectified so that the vertical and horizontal directions can be completely aligned to the epipolar planes between the cameras. This rectification provides a Projective Voxel Space (PVS), in which the three axes are aligned with the directions of the cameras' projections. Such alignment simplifies the procedure for projection and back projection between the 3D space and the image planes. First, we apply shape-from-silhouette, taking advantage of PVS. The consistency of color values between the images is then evaluated for final determination of the object surface voxels. In this way, consistent matching across the three images is estimated and images can be interpolated from the matching information. Synthesized images are based on the 3D shape in PVS, so the occlusion of the object is reproduced in the generated images, yet the method requires only weak calibration.
Structure analysis of natural scenes using census transform and region competition
Kunio Yamada, Tadashi Ichikawa, Takeshi Naemura, et al.
In this paper we propose an approach to generate a panorama depth map which is dense enough to be used as a template for cutting out a texture image by depth, i.e. structuring the scene into layers to construct a shared 3-D image space. For depth estimation, we make use of the census transform for robust determination of correspondence in a stereo pair. To interpolate unknown disparities, we introduce a process influenced by the K-means algorithm. For densification of the depth data, we make use of Region Competition. During the stitching, the confidence of the data is improved for the overlapped areas by multiple evaluation of the disparity data.
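For readers unfamiliar with the census transform, the sketch below shows its usual form: each pixel is replaced by a bit string recording which neighbors are darker than it, and candidate stereo matches are scored by Hamming distance. The 3 × 3 window, the single-pixel matching cost, the rectified-pair assumption, and the names `census_3x3`, `hamming` and `best_disparity` are illustrative assumptions, not details taken from this paper.

```python
def census_3x3(img, y, x):
    """Return the 8-bit census signature of the interior pixel (y, x).

    A bit is set when the corresponding neighbour is darker than the
    centre; neighbours are scanned row by row.
    """
    center = img[y][x]
    bits = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            bits = (bits << 1) | (1 if img[y + dy][x + dx] < center else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


def best_disparity(left, right, y, x, max_disp):
    """Disparity in [0, max_disp] with the smallest census Hamming cost.

    Assumes a rectified pair, so the match for left (y, x) lies at
    right (y, x - d). Ties go to the smallest disparity.
    """
    ref = census_3x3(left, y, x)
    best_d, best_cost = 0, None
    for d in range(max_disp + 1):
        if x - d < 1:  # the 3x3 window would leave the image
            break
        cost = hamming(ref, census_3x3(right, y, x - d))
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

Because the signature depends only on the ordering of intensities, it is unchanged by gain and offset differences between the two cameras, which is the usual reason for calling census-based matching robust.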
Fusion-based adaptive regularized smoothing for 3D reconstruction from image sequences
Junghoon Jung, TaeYong Kim, Younhui Jang, et al.
This paper presents a regularized smoothing algorithm for 3D reconstruction from image sequences. Depth data estimated from a stereo pair or multiple image frames can easily be corrupted by various types of noise such as quantization and imperfect matching. We propose a regularized image restoration algorithm which enhances the surface of depth maps based on spatially adaptive image fusion. We can also enhance the resolution of the surfaces and preserve discontinuities.
Photorealistic interactive virtual environment generation using multiview cameras
Namgyu Kim, Woontack Woo, Makoto Tadenuma
In this paper, we report on a convenient and unified framework for generating a photo-realistic interactive virtual environment (piVE) using heterogeneous multiview cameras, without using bluescreen techniques or special rendering hardware. In spite of the rapid growth of computer hardware, rendering a photo-realistic virtual environment on the fly is still a challenging problem. With the proposed framework, exploiting stereo images/videos, piVE can be rendered in real time without using an expensive high-end computer with rendering hardware. The proposed framework consists of three main parts, i.e. (1) photo-realistic virtual space generation exploiting a camera with a stereoscopic adapter, (2) generation of a video avatar (a special object representing the user) by exploiting a multiview camera, and (3) graphics object rendering according to the given camera parameters and the user's interaction. We also address z-keying issues among background video, graphics objects and video avatar.
Computed 3D models for very low bit-rate video coding
Franck Galpin, Luce Morin
This article deals with video sequence compression at very low bit rates. We assume a static scene captured with a monocular moving camera. We propose a method using several 3D models in order to reconstruct the original video sequence or a virtual one. A complete scheme for video sequence compression is presented and compared with classical compression schemes.
Multiview image coding with depth maps and 3D geometry for prediction
Marcus Magnor, Peter Eisert, Bernd Girod
Block-based disparity compensation is an efficient prediction scheme for encoding multi-view image data. Available scene geometry can be used to further enhance prediction accuracy. In this paper, three different strategies are compared that combine prediction based on depth maps and 3-D geometry. Three real-world image sets are used to examine prediction performance for different coding scenarios. Depth maps and geometry models are derived from the calibrated image data. Bit-rate reductions up to 10% are observed by suitably augmenting depth map-based with geometry-based prediction.
Video Coding Implementations
SAMPEG: a scene-adaptive parallel MPEG-2 software encoder
This paper presents a fully software-based MPEG-2 encoder architecture, which uses scene-change detection to optimize the Group-of-Picture (GOP) structure for the actual video sequence. This feature enables easy, lossless edit cuts at scene-change positions and it also improves overall picture quality by providing good reference frames for motion prediction. Another favorable aspect is the high coding speed obtained, because the encoder is based on a novel concept for parallel MPEG coding on SMP machines. This concept allows the use of advanced frame-based coding algorithms for motion estimation and adaptive quantization, thereby enabling high-quality software encoding in real-time. Our proposal can be combined with the conventional parallel computing approach on slice basis, to further improve parallelization efficiency. The concepts in the current SAMPEG implementation for MPEG-2 are directly applicable to MPEG-4 encoders.
Adaptive parallel video coding algorithm
Kwong-Keung Leung, Nelson Hon Ching Yung, Paul Y. S. Cheung
Parallel encoding of video inevitably gives varying frame rate performance due to dynamically changing video content and motion field, since the encoding process of each macro-block, especially motion estimation, is data dependent. A multiprocessor schedule optimized for a particular frame with certain macro-block encoding times may not be optimal for another frame with different encoding times, which degrades the performance of the parallelization. To tackle this problem, we propose a method based on a batch of near-optimal schedules generated at compile-time and a run-time mechanism to select the schedule giving the shortest predicted critical path length. This method has the advantage of being near-optimal using compile-time schedules while involving only run-time selection rather than re-scheduling. Implementation on the IBM SP2 multiprocessor system using 24 processors gives an average speedup of about 13.5 (frame rate of 38.5 frames per second) for a CIF sequence consisting of segments of 6 different scenes. This is equivalent to an average improvement of about 16.9% over the single schedule scheme with schedule adapted to each of the scenes. Using an open test sequence consisting of 8 video segments, the average improvement achieved is 13.2%, i.e. an average speedup of 13.3 (35.6 frames per second).
Memory bandwidth efficient two-layer reduced-resolution decoding of high-definition video
This paper addresses the problem of efficiently decoding high-definition (HD) video for display at a reduced resolution. The decoder presented in this paper is intended for applications that are constrained not only in memory size, but also in peak memory bandwidth. This is the case, for example, during decoding of a high-definition television (HDTV) channel for picture-in-picture (PIP) display, if the reduced-resolution PIP-channel decoder is sharing memory with the full-resolution main-channel decoder. The most significant source of video quality degradation in a reduced-resolution decoder is prediction drift, which is caused by the mismatch between the full-resolution reference frames used by the encoder and the subsampled reference frames used by the decoder. To mitigate the visually annoying effects of prediction drift, the decoder described in this paper operates at two different resolutions -- a lower resolution for B pictures, which do not contribute to prediction drift, and a higher resolution for I and P pictures. This means that the motion-compensation unit (MCU) essentially operates at the higher resolution, but the peak memory bandwidth is the same as that required to decode at the lower resolution. Storage of additional data, representing the higher resolution for I and P pictures, requires a relatively small amount of additional memory as compared to decoding at the lower resolution. Experimental results will demonstrate the improvement in video quality achieved by the addition of the higher-resolution data in forming predictions for P pictures.
New cost-effective VLSI implementation of 2D discrete cosine transform and its inverse
Danian Gong, Yun He
The two-dimensional discrete cosine transform (2-D DCT) has been chosen as the basis in almost all of the recent international image and video coding standards. This paper first categorizes the 2-D DCT and inverse DCT (IDCT) architectures. Then a new VLSI architecture for 2-D DCT/IDCT without transpose memory is proposed. The proposed 2-D DCT/IDCT architecture eliminates special transpose circuits and uses general memory modules to store the intermediate results after the row-wise transform. The row-wise and column-wise transforms are performed with the different data flows provided by the configurable computation units and data alignment module. An accuracy testing system is set up to search for the optimum word-length parameters. Based on the accuracy testing system, the proposed architecture achieves the smallest word-length compared with reported 2-D DCT architectures.
VHDL design for hardware assistance of fractal image compression
Andrew J. Erickson, Muhammad E. Shaaban, Kenneth W. Hsu, et al.
Fractal image compression has several useful properties, including resolution independence, high compression ratios, rapid decoding, and subjectively good image quality. Despite this, its adoption has been limited, largely because of the very high computational burden of compression. In this paper, an original ASIC design is described which performs the comparisons required for fractal image compression of grayscale images. The design is based around a pipeline of many individual block comparison units through which the set of domain blocks is propagated. Each unit performs the required comparisons between a single range block and the series of domain blocks, and stores the best possible match. The ASIC is designed using synthesizable VHDL, allowing it to be targeted to virtually any digital technology. The small portions of the design which are specific to a particular application or technology, such as the bus interfaces and memory blocks, have been kept separate from the application-independent portions of the design. Simulations of the design suggest that an actual hardware implementation would be about one thousand times faster than a general purpose microprocessor based on similar IC technology, reducing the time required to optimally compress a 256 × 256 image using 8 × 8 range blocks from a few minutes to a fraction of a second.
Complexity analysis of two-pass algorithm and elliptical weighted average filter for VLSI implementation of perspective texture warping
Sethuraman Panchanathan, Karthik Ramaswamy, Jian-Jun Fang, et al.
In this paper we present the Elliptical Weighted Average (EWA) filtering algorithm and an optimized implementation of a two-pass algorithm used in digital image and video warping. Two-pass algorithms are well suited for hardware implementation due to their reduced complexity in using 1-D re-sampling and anti-aliasing filters. However, their primary disadvantage is the need for a large buffer to store the temporary image, since warping is performed in two passes. The size of the temporary buffer is equal to or greater than the size of the input image. A dedicated hardware implementation of this algorithm implies a huge cost in terms of on-chip real estate. In our approach, Wolberg-Boult's resampling algorithm is modified to use only two rows of temporary buffer, thereby making the algorithm more amenable to hardware implementation. We present the complexity analysis based on the number of arithmetic and logic operations (add, shift, compare, multiply, clip and divide) per macroblock. The EWA filter is the most cost-effective high-quality filtering method because point inclusion testing can be done with one function evaluation and the filter weights can be stored in lookup tables to reduce computation. For mapping the quadrilaterals, four equations were needed for the four lines of the quadrilaterals, which was computationally complex, with the computational cost directly proportional to the number of input pixels accessed. We also present the complexity analysis per macroblock.
Motion Estimation
Estimation of large-amplitude motion and disparity fields: application to intermediate view reconstruction
This paper describes a method for establishing dense correspondence between two images in a video sequence (motion) or in a stereo pair (disparity) in the case of large displacements. In order to deal with large-amplitude motion or disparity fields, multi-resolution techniques such as block matching and optical flow have been used in the past. Although quite successful, these techniques cannot easily cope with motion/disparity discontinuities as they do not explicitly exploit image structure. Additionally, their computational complexity is high; block matching requires examination of numerous vector candidates while optical flow-based techniques are iterative. In this paper, we propose a new approach that addresses both issues. The approach combines feature matching with Delaunay triangulation, and thus reliable long-range correspondences result while the computational complexity is not high (sparse representation). In the proposed approach, feature points are found first using a simple intensity corner detector. Then, correspondence pairs between two images are found by maximizing cross-correlation over a small window. Finally, the Delaunay triangulation is applied to the resulting points, and a dense vector field is computed by planar interpolation over Delaunay triangles. The resulting vector field is continuous everywhere, and thus does not reflect motion or depth discontinuities at object boundaries. In order to improve the rendition of such discontinuities, we propose to further divide Delaunay triangles whenever the displacement vectors within a triangle do not allow a good intensity match. The approach has been extensively tested on stereoscopic images in the context of intermediate view reconstruction, where the quality of estimated disparity fields is critical for final image rendering.
The first results are very encouraging as the reconstructed images are of high quality, especially at object boundaries, and the computational complexity is lower than that of multi-resolution block matching.
Motion vector field improvement for picture rate conversion with reduced halo
Mark J. W. Mertens, Gerard de Haan
The quality of the interpolated images in picture rate upconversion depends predominantly on the accuracy of the motion vector fields. Block-based motion estimators (MEs) typically yield incorrect vectors in occlusion areas, which leads to an annoying halo in the upconverted video sequences. In the past we have developed a cost-effective block-based motion estimator, the 3D Recursive Search ME, and an improved-accuracy version for tackling occlusion, the tritemporal ME. In this article we describe how the vector field from this tritemporal ME is further improved by a retimer, using information from a foreground/background detector. More accurate motion vector fields are also important to other applications (e.g. video compression, 3D, scene analysis ...).
Fast adaptive diamond search algorithm for block-matching motion estimation using spatial correlation
Sang-Gon Park, Dong-Seok Jeong
In this paper, we propose a fast adaptive diamond search algorithm (FADS) for block matching motion estimation. Many fast motion estimation algorithms reduce the computational complexity by the UESA (Unimodal Error Surface Assumption), where the matching error monotonically increases as the search moves away from the global minimum point. Recently, many fast BMAs (Block Matching Algorithms) make use of the fact that global minimum points in real world video sequences are centered at the position of zero motion. But these BMAs, especially for large motion, are easily trapped in local minima and result in poor matching accuracy. So, we propose a new motion estimation algorithm using the spatial correlation among the neighboring blocks. We move the search origin according to the motion vectors of the spatially neighboring blocks and their MAEs (Mean Absolute Errors). The computer simulation shows that the proposed algorithm has almost the same computational complexity as DS (Diamond Search), but enhances PSNR. Moreover, the proposed algorithm gives almost the same PSNR as that of FS (Full Search), even for large motion, with half the computational load.
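As a rough illustration of the origin-shifting idea in the abstract above, the sketch below picks a search origin from the neighboring blocks. The selection rule (take the neighbor vector with the lowest MAE, and fall back to zero motion when no neighbor is available) and the name `predict_search_origin` are assumptions made for illustration; the abstract does not specify the actual FADS rule.

```python
def predict_search_origin(neighbors):
    """Pick a search origin from spatially neighboring blocks.

    neighbors: list of ((dx, dy), mae) pairs, one per already-coded
    neighboring block. Hypothetical rule: reuse the motion vector of the
    neighbor whose match had the smallest mean absolute error.
    """
    if not neighbors:
        # No neighbor information: start at zero motion.
        return (0, 0)
    best_vector, _best_mae = min(neighbors, key=lambda item: item[1])
    return best_vector


# Example: three neighbors with their motion vectors and MAEs.
origin = predict_search_origin([((4, -1), 6.2), ((5, 0), 3.1), ((0, 0), 9.8)])
```

A diamond search would then start from `origin` instead of from `(0, 0)`, which is how a shifted origin can help with large motion.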
Markovian motion field regularization based on the Gauss-Laguerre transform
Marco Carli, Giovanni Jacovitti, Alessandro Neri
In this paper we address the classical problem of estimating, from a pair of consecutive frames of a video sequence, the motion (or velocity field, or optical flow) produced by planar translations and rotations, in the context of the Gauss-Laguerre Transform (GLT) theory. This contribution extends some previous works of the authors on wavelet-based Optimum Scale-Orientation Independent Pattern Recognition. In particular, here we make use of an orthogonal system of Laguerre-Gauss wavelets. Each wavelet represents the image by translated, dilated and rotated versions of a complex waveform whereas, for a fixed resolution, this expansion provides a local representation of the image around any point. In addition, each waveform is self-steerable, i.e. it rotates by simple multiplication with a complex factor. These properties allow us to derive an iterative joint translation and rotation field Maximum Likelihood (ML) estimation procedure based on a bank of CHWs. In this contribution the coarse estimate obtained by the memoryless, point-wise ML estimator is refined by resorting to a compound Markovian model that takes into account the spatial continuity of the motion field associated with a single object, by heavily penalizing abrupt changes in the motion intensity and direction that do not coincide with intensity discontinuities (i.e. mostly object boundaries).
Error-Resilient Coding
Error statistical analysis of H.263 syntactical elements
The purpose of our work is to report a solution for the problem of the Joint Source-Channel Coding (JSCC) of the H.263 bit stream to be transmitted through error-prone channels. Our investigations led us to classify the H.263 syntactical elements into several classes of different significance. By identifying the most sensitive elements we developed a data partitioning (DP) technique which exhibits improved error resilience. A good reconstructed video quality is obtained for a constant 64 kbit/s transmission rate using Rate Compatible Punctured Convolutional (RCPC) codes of different rates for forward error protection. By exploiting the different syntactical element sensitivities we present an Unequal Error Protection (UEP) scheme that surpasses the optimal Equal Error Protection (EEP). The forward error correction adopted has resulted in PSNR improvements of over 20 dB for bit error rates higher than 4 × 10⁻³.
Reliable video communication over lossy packet networks using multiple state encoding and path diversity
Video communication over lossy packet networks such as the Internet is hampered by limited bandwidth and packet loss. This paper presents a system for providing reliable video communication over these networks, where the system is composed of two subsystems: (1) multiple state video encoder/decoder and (2) a path diversity transmission system. Multiple state video coding combats the problem of error propagation at the decoder by coding the video into multiple independently decodable streams, each with its own prediction process and state. If one stream is lost the other streams can still be decoded to produce usable video, and furthermore, the correctly received streams provide bidirectional (previous and future) information that enables improved state recovery for the corrupted stream. This video coder is a form of multiple description coding (MDC), and its novelty lies in its use of information from the multiple streams to perform state recovery at the decoder. The path diversity transmission system explicitly sends different subsets of packets over different paths, as opposed to the default scenarios where the packets proceed along a single path, thereby enabling the end-to-end video application to effectively see an average path behavior. We refer to this as path diversity. Generally, seeing this average path behavior provides better performance than seeing the behavior of any individual random path. For example, the probability that all of the multiple paths are simultaneously congested is much less than the probability that a single path is congested. The resulting path diversity provides the multiple state video decoder with an appropriate virtual channel to assist in recovering from lost packets, and can also simplify system design, e.g. FEC design. We propose two architectures for achieving path diversity, and examine the effectiveness of path diversity in communicating video over a lossy packet network.
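The congestion argument in the abstract above can be checked with a line of arithmetic. The sketch below assumes that paths become congested independently of one another, which the abstract does not state; the function name and the example probabilities are illustrative only.

```python
from math import prod


def prob_all_congested(path_probs):
    """Probability that every path is congested at the same time.

    path_probs: per-path congestion probabilities. Assumes the paths are
    statistically independent, which real Internet paths may not be.
    """
    return prod(path_probs)


# Two paths, each congested 10% of the time (made-up numbers).
single_path = 0.10
both_paths = prob_all_congested([0.10, 0.10])
```

Under the independence assumption, the joint probability falls geometrically with the number of paths, so `both_paths` is ten times smaller than `single_path` in this example.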
Coordinated packet-level protection with a corruption model for robust video transmission
Jin-Gyeong Kim, JongWon Kim, Jitae Shin, et al.
In this work, each packet is associated with a varying degree of significance based on the impact of its loss, which is estimated with a corruption model. For a compressed video stream, initial and propagation errors are first described by the macroblock (MB)-based corruption model by taking into account error concealment, temporal dependency, and loop filtering. Next, at the packet-level, a corruption model using the relative priority index (RPI) is constructed by integrating MB-based models. Furthermore, the effect of multiple-packet loss is analyzed and the associated RPI is calculated to improve the model accuracy at a higher packet loss rate. With the RPI-based corruption model for each packet, coordinated delivery of packetized video over QoS networks is investigated. Finally, a per-packet optimization framework with unequal error protection (UEP) is realized to consider both end-to-end video performance and pricing. The performance of the proposed solution is verified with intensive simulations.
Investigation of robust video streaming using a wavelet-based rate-scalable codec
Gregory W. Cook, Eduardo Asbun, Edward J. Delp III
In this paper we investigate the use of a fully rate scalable wavelet codec known as SAMCoW (Scalable Adaptive Motion Compensated Wavelet) for use in robust video streaming. We develop a theory based on the notion of additive temporal distortion to predict the performance of the bit stream under error conditions. Due to the regular nature of SAMCoW, a closed-form solution is found and compared experimentally to a SAMCoW stream in a simulated channel.
Image Sequence Analysis
Image sequence stabilization by low-pass filtering of interframe motion
Sarp Ertuerk
An image sequence stabilization system that removes translational jitter while preserving intentional camera pan is presented. The video sequence is processed to acquire global camera translations from frame to frame (global interframe motion vectors) by motion estimation. The resulting motion vectors are accumulated to construct an absolute frame position vs. frame number signal. This signal is low-pass filtered to remove high frequency components caused by jitter, and retain low frequency parts representing the intentional camera pan. Correction vectors for image frames are obtained by subtracting the absolute frame position from the low-pass filtered value, and stabilization is achieved by the corresponding translation of image frames.
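The pipeline in the abstract above (accumulate, low-pass filter, subtract) can be sketched for one translation axis as follows. The first-order recursive filter and the `alpha` parameter are assumptions chosen for brevity; the abstract does not say which low-pass filter is used.

```python
def stabilize(interframe_motion, alpha=0.1):
    """Compute per-frame correction offsets along one axis.

    interframe_motion: global translation from each frame to the next.
    alpha: smoothing factor of an assumed first-order low-pass filter
           (smaller values smooth more strongly).
    Returns (positions, smoothed, corrections).
    """
    # Accumulate interframe motion into an absolute frame position signal.
    positions = []
    total = 0.0
    for step in interframe_motion:
        total += step
        positions.append(total)

    # Low-pass filter the position signal to keep the intentional pan.
    smoothed = []
    state = positions[0] if positions else 0.0
    for position in positions:
        state = alpha * position + (1.0 - alpha) * state
        smoothed.append(state)

    # Correction = low-pass filtered value minus absolute frame position.
    corrections = [s - p for s, p in zip(smoothed, positions)]
    return positions, smoothed, corrections


positions, smoothed, corrections = stabilize([1.0, -1.0, 1.0, -1.0], alpha=0.5)
```

Each frame would then be translated by its correction offset; a two-dimensional version would apply the same steps to the horizontal and vertical components separately.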
Multisensor track fusion
Katerin Romeo, Piet B. W. Schwering, Marcel G. J. Breuers
The complementary nature of sensors in the visible light and in the infrared attracts research interest in multi-sensor image sequence analysis. The information made available in each wavelength can be combined. The combination of multi-sensor data with the temporal information makes target detection more robust against different sensor artifacts. In this paper we analyze the moving object segmentation and tracking with the fusion of multi-sensor data in two levels: in the detection level and in the decision level. In the first level the accuracy of the detection of all moving objects is analyzed while in the decision level the moving objects are classified as target or non-target, with the computed information in different wavelengths. Performance evaluation is done through ROC curves and with synthetic degradation methods. In the detection level approach registration techniques are used to transmit detected moving object coordinates from the visible to the infrared band and in the opposite direction. The results are compared to the detection rate in the destination band without fusion. In the decision level tracks from different sensors are fused and evaluated considering new ROC curves. The first promising results of the algorithm applied to the experimental data and the algorithm evaluation are presented.
Extracting meaningful regions for content-based retrieval of image and video
Yun Luo, Yujin Zhang, Yongying Gao, et al.
A meaningful region is an intermediate level between the original image and the objects of interest in the image. This level is an effective visual level for the representation of images, and the successful extraction of meaningful regions from images helps to perform semantic segmentation. This paper proposes a scheme for roughly extracting meaningful regions in an image. By using multi-dimensional low-level feature analysis, the local level of reliability of different features can be determined in order to adaptively weight the contribution of each feature to the segmentation process. Since a large variance in one feature always indicates that this feature would distinguish different objects clearly, a new weighted non-parametric clustering algorithm in the density space is implemented with suitably decided weights for different features. This permits us to utilize all the features efficiently and to extract semantic meaning from images. The technique is proposed together with a retrieval application for landscape images, in which object recognition plays an important role: the extracted meaningful regions should be merged into objects so that subtler semantic meaning can be obtained. Experiments on extracting meaningful regions both from still images and video clips are carried out with some satisfactory results.
Video object extraction based on adaptive background and statistical change detection
This paper introduces a system for video object extraction useful for general applications where foreground objects move within a slowly changing background. Surveillance of indoor and outdoor sequences is a typical example. The originality of the approach resides in two related components. First, the statistical change detection used in the system does not require any sophisticated parametric tuning as it is based on a probabilistic method. Second, the change is detected between a current instance of the scene and a reference that is updated continuously to take into account slow variation of the background. Simulation results show that the proposed scheme performs well in extracting video objects, with stability and good accuracy, while being of relatively low complexity.
Tracking of deformable objects
Parimal Aswani, K. K. Wong, Man Nang Chong
Tracking of moving objects in image sequences is needed for several video processing applications such as content-based coding, object-oriented compression, object recognition and, more recently, video object plane extraction in MPEG-4 coding. Tracking is a natural follow-up of motion-based segmentation. It is a fast and efficient method to achieve coherent motion segments along the temporal axis. Segmenting out moving objects for each and every frame in a video sequence is a computationally expensive approach. Thus, for better performance, semi-automatic segmentation is an acceptable compromise as automatic segmentation approaches rely heavily on prior assumptions. In semi-automatic segmentation approaches, motion segmentation is performed only on the initial frame and the moving object is tracked in subsequent frames using tracking algorithms. In this paper, a new model for object tracking is proposed, where the image features -- edges, intensity pattern, object motion and initial keyed-in contour (by the user) -- form the prior and likelihood model of a Markov Random Field (MRF) model. Iterated Conditional Mode (ICM) is used for the minimization of the global energy for the MRF model. The motion segment for each frame is initialized using the segment information from the previous frame. For the initial frame, the motion segment is obtained by manually keying in the object contour. The motion segments obtained using the proposed model are coherent and accurate. Experimental results on tracking using the proposed algorithm for different sequences -- Bream, Alexis and Claire -- are presented in this paper. The results obtained are accurate and can be used for a variety of applications including MPEG-4 Video Object Plane (VOP) extraction.
Long-transition analysis for post shot-boundary detection
Wei Jyh Heng, King N. Ngan
With the introduction of the standard in video indexing, research in shot boundary detection has recently become popular. While new solutions for long transition detection have recently emerged, there is not much literature that focuses on what to do with the frames within the transition when they are detected. After the approximate position of the long transition is detected, the raw sequence cannot be used for segmentation and indexing due to the special effects incorporated. Here, a technique is presented that aims at extracting extra information from the transition after the existence of a long transition is confirmed. This technique consists of four stages, namely, shot boundary refinement, shot type determination, frame reconstruction for soft transition and pixel classification for hard transition. This paper gives an overview as well as the performance of each stage. This technique allows detected transitions to be analyzed without human intervention.
Video classification in user profile generation for personalized broadcast services
Automatic generation of user profiles, as specified in the MPEG-7 user preference description scheme, for personalized broadcast services is investigated in this work. Our research has focused on categorization of user-favored video into different semantically meaningful classes. This knowledge is then used in media filtering guidance and user-preferred AV content selection. Several visual and motion features are extracted from source video sequences, such as the number of intra-coded macroblocks, the macroblock motion information, temporal variances and shot activity histograms, for classification purposes. Moreover, to further improve the accuracy of classification results, a 'fuzzy nearest prototype classifier' is applied in this work. It is shown by experimental results that the proposed classification scheme is efficient and accurate.
Video Coding Algorithms
Experiments in MPEG-4 content authoring, browsing, and streaming
Atul Puri, Robert L. Schmidt, Andrea Basso, et al.
In this paper, within the context of the MPEG-4 standard, we report on preliminary experiments in three areas -- authoring of MPEG-4 content, a player/browser for MPEG-4 content, and streaming of MPEG-4 content. MPEG-4 is a new standard for coding of audiovisual objects; the core of the MPEG-4 standard is complete while amendments are in various stages of completion. MPEG-4 addresses compression of audio and visual objects, their integration by scene description, and interactivity of users with such objects. MPEG-4 scene description is based on a VRML-like language for 3D scenes, extended to 2D scenes, and supports integration of 2D and 3D scenes. This scene description language is called BIFS. First, we introduce the basic concepts behind BIFS and then show, with an example, textual authoring of the different components needed to describe an audiovisual scene in BIFS; the textual BIFS is then saved as compressed binary file/s for storage or transmission. Then, we discuss a high-level design of an MPEG-4 player/browser that uses the main components from authoring, such as the encoded BIFS stream, the media files it refers to, and the multiplexed object descriptor stream, to play an MPEG-4 scene. We also discuss our extensions to such a player/browser. Finally, we present our work in streaming of MPEG-4 -- the payload format, modification to the client MPEG-4 player/browser, server-side infrastructure and example content used in our MPEG-4 streaming experiments.
Internet Video
Interactive browsing of 3D environment over the Internet
Cha Zhang, Jin Li
In this paper, we describe a system for wandering in a realistic environment over the Internet. The environment is captured by the concentric mosaic, compressed via the reference block coder (RBC), and accessed and delivered over the Internet through the virtual media (Vmedia) access protocol. Capturing the environment through the concentric mosaic is easy. We mount a camera at the end of a level beam, and shoot images as the beam rotates. The huge dataset of the concentric mosaic is then compressed through the RBC, which is specifically designed for both high compression efficiency and just-in-time (JIT) rendering. Through the JIT rendering function, only a portion of the RBC bitstream is accessed, decoded and rendered for each virtual view. A multimedia communication protocol -- the Vmedia protocol -- is then proposed to deliver the compressed concentric mosaic data over the Internet. Only the bitstream segments corresponding to the current view are streamed over the Internet. Moreover, the delivered bitstream segments are managed by a local Vmedia cache so that frequently used bitstream segments need not be streamed over the Internet repeatedly, and the Vmedia is able to handle an RBC bitstream larger than its memory capacity. A Vmedia concentric mosaic interactive browser is developed where the user can freely wander in a realistic environment, e.g., rotate around, walk forward/backward and sidestep, even under a tight bandwidth of 33.6 kbps.
VBR transcoding architecture for video streaming
Yue Yu, Chang Wen Chen
The delivery of high-quality video programs stored on a video server through heterogeneous networks is a fast-emerging service and has attracted much attention recently. A transcoding technique that converts a pre-encoded video sequence from a high bitrate to a low bitrate is a key component in such a system. Based on our previously proposed VBR coding for fixed storage application, we propose in this paper a VBR transcoding architecture to accomplish video streaming over heterogeneous networks. First, we assume that the pre-encoded bitstream is generated using the proposed VBR encoding scheme at a relatively high bitrate. In addition, the relationship between the number of bits for each frame and all possible quantization factors, i.e. the rate (R) and quantization (Q) function, is also generated at the video server. Once the users provide their access requirements, the video server will optimally compute the appropriate quantization factors for each frame according to the constraints of desired bitrate and finite buffer size. Then the compressed bitstream will be decoded to the DCT domain and requantized with the updated quantization factors. Because the generation of the R-Q function is based on the original frame and is not related to the quantization factors, it is possible to reuse the motion information embedded in the compressed bitstream and implement the transcoding in the DCT domain. The computational expense required by this proposed transcoder is much lower than that required by schemes which decode the compressed bitstream to the pixel domain and transcode to the desired bitrate using a complete encoder. Furthermore, this VBR transcoding scheme is able to generate a CBR-like output while retaining all advantages of VBR encoding. This facilitates the delivery of VBR bitstreams through various existing CBR channels.
Experimental results demonstrate that our proposed VBR transcoding is not only capable of achieving higher mean PSNR and more consistent decoded visual quality, but also requires much less computational expense compared with a conventional pixel-domain CBR transcoder.
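The DCT-domain requantization step mentioned in the abstract above can be sketched as follows. This is a minimal sketch assuming a plain uniform quantizer with a single step size per block; real MPEG quantizers add weighting matrices and dead zones, and the function name `requantize` is hypothetical.

```python
def requantize(levels, q_old, q_new):
    """Requantize DCT levels from step size q_old to step size q_new.

    levels: integer quantization levels read from the input bitstream.
    Assumes a uniform quantizer (coefficient = level * step), which is a
    simplification of what a real codec does.
    """
    out = []
    for level in levels:
        coeff = level * q_old                  # reconstruct the DCT coefficient
        mag = int(abs(coeff) / q_new + 0.5)    # round half away from zero
        out.append(mag if coeff >= 0 else -mag)
    return out


# Coarsen a few levels from step 4 to step 8 (illustrative values).
new_levels = requantize([10, -3, 1, 0], q_old=4, q_new=8)
```

No inverse DCT or motion estimation appears in this step, which is the reason a DCT-domain transcoder can be cheaper than a full decode and re-encode.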
Fine-grained loss protection for robust Internet video streaming
Mihaela van der Schaar, Hayder Radha
Several embedded video coding schemes have been recently developed for multimedia streaming over IP. In particular, Fine-Granular-Scalability (FGS) video coding has been recently adopted by the MPEG-4 standard as the core video-compression method for streaming applications. From its inception, the FGS scalability structure was designed to be packet-loss resilient especially under Unequal Packet-loss Protection (UPP). In this paper, we introduce the notion of Fine Grained Loss Protection (FGLP), which provides UPP within the FGS enhancement-layer, and we develop an analytical framework for evaluating FGLP bounds. Based on these bounds, we show the impact of applying fine-grained protection to the FGS enhancement-layer for different types of video sequences and over a wide range of bit-rates and packet-loss ratios. Subsequently, the FGLP framework has been implemented using forward error correction codes and its performance for Internet video streaming has been evaluated. As illustrated by our extensive simulation results, fine-grained loss protection for the FGS enhancement-layer can provide significant resilience under moderate-to-high packet-loss ratios (e.g. 5 - 10%).
Polyphase down-sampling multiple-description coding for IP transmission
Marcello Caramma, Marco Fumagalli, Rosa C. Lancini
Recently the problem of transmitting data over heterogeneous networks has received considerable attention. Our specific interest is the communication of still images. In this work we present a Multiple Description system based on a Polyphase DownSampling algorithm (PDMD) that splits an image source into an arbitrary number of balanced descriptions. Our goal is to obtain both the best reconstruction quality from the received descriptions (even if only one is received) and a reconstruction that is subjectively indistinguishable from the original if all the descriptions are available.
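The polyphase splitting idea can be sketched as follows. This is a minimal illustration on nested Python lists, assuming a regular sampling grid with factors `fy` and `fx`; the function names are hypothetical and the coding of each description, as well as any interpolation of missing samples, is left out.

```python
def polyphase_split(image, fy=2, fx=2):
    """Split a 2D image (list of rows) into fy * fx polyphase descriptions.

    Description (dy, dx) keeps every fy-th row starting at dy and every
    fx-th column starting at dx, so all descriptions have similar content.
    """
    return {
        (dy, dx): [row[dx::fx] for row in image[dy::fy]]
        for dy in range(fy)
        for dx in range(fx)
    }


def polyphase_merge(descriptions, height, width, fy=2, fx=2):
    """Re-interleave the received descriptions; missing samples stay None."""
    out = [[None] * width for _ in range(height)]
    for (dy, dx), sub in descriptions.items():
        for i, row in enumerate(sub):
            for j, value in enumerate(row):
                out[dy + i * fy][dx + j * fx] = value
    return out


image = [[r * 4 + c for c in range(4)] for r in range(4)]
parts = polyphase_split(image)
full = polyphase_merge(parts, 4, 4)                          # all received
partial = polyphase_merge({(0, 0): parts[(0, 0)]}, 4, 4)     # one received
```

With every description present the merge is lossless; with a subset, the `None` positions mark samples a decoder would have to estimate from their received neighbours.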
Scalable image coding for interactive image communication over networks
Sung Ho Yoon, Ji Hyun Lee, Winser E. Alexander
This paper presents a new, scalable coding technique that can be used in interactive image/video communications over the Internet. The proposed technique generates a fully embedded bit stream that provides scalability with high quality for the whole image and it can be used to implement region based coding as well. The embedded bit stream is comprised of a basic layer and many enhancement layers. The enhancement layers add refinement to the quality of the image that has been reconstructed using the basic layer. The proposed coding technique uses multiple quantizers with thresholds (QT) for layering and it creates a bit plane for each layer. The bit plane is then partitioned into sets of small areas to be coded independently. Run length and entropy coding are applied to each of the sets to provide scalability for the entire image resulting in high picture quality in the user-specific area of interest (ROI). We tested this technique by applying it to various test images and the results consistently show high level of performance.
Scalable Internet video streaming with differential JPEG-2000 video codec and content-aware rate control
Lifeng Zhao, JongWon Kim, C.-C. Jay Kuo
A scalable video streaming system that includes video preprocessing, differential JPEG-2000 (DJ2K) video codec, and content-aware rate control is proposed. The proposed streaming solution tackles the problem of maintaining smooth video quality even when there is a significant change in the source content or the transmission bandwidth. The wavelet-based JPEG-2000 video codec provides temporal, spatial and SNR scalabilities. A rate controller, which is separated from the encoder, can deal with diverse demands of a wide variety of users and various traffic conditions in a heterogeneous network by leveraging offered scalabilities and content-awareness. Simulations are conducted to demonstrate various advantages of the proposed solution.
Face Tracking and Recognition
Tracking faces of arbitrary views for video annotation
We propose an omni-face tracking system for video annotation in this paper, which is designed to find faces from arbitrary views in complex scenes. The face detector first locates potential faces in the input by performing skin-tone detection. The subsequent processing consists of two largely independent components, the frontal face module and the side-view face module, responsible for finding frontal-view and side-view faces, respectively. The frontal face module uses a region-based approach wherein regions of skin-tone pixels are analyzed for gross oval shape and the presence of facial features. In contrast, the side-view face module follows an edge-based approach to look for curves similar to a side-view profile. To extract the trajectories of faces, the temporal continuity between consecutive frames within the video shots is considered to speed up the tracking process. The main contribution of this work is being able to find faces irrespective of their poses, whereas contemporary systems deal with frontal-view faces only. Information regarding human faces is encoded in XML format for semantic video content representation. The effectiveness of human faces for video annotation is demonstrated in a TV program classification system that categorizes the input video clip into predefined types. It is shown that the classification accuracy is improved markedly by the use of face information.
Face detection and tracking using edge orientation information
Bernhard Froeba, Christian Kueblbeck
An important topic in face recognition as well as in video coding or multi-modal human machine interfaces is the automatic detection of faces or head-and-shoulder regions in visual scenes. The algorithms therefore should be computationally fast enough to allow an online detection and parallel processing of the detected objects. In this paper we describe our ongoing work on face detection using an approach that models the face appearance by edge orientation information. We will show that edge orientation is a powerful local image feature to describe objects like faces or body parts. We will present a simple and efficient method for template matching and object modeling based solely on edge orientation information. We also show how to obtain an optimal face model in the edge orientation domain from a set of training images. Unlike many approaches that model the gray level appearance of the face, our approach is computationally very fast. It takes less than 0.1 seconds on a Pentium II 500 MHz for a 190 X 140 image to be processed using a multi-resolution search with three resolution levels. We demonstrate the capability of our detection method on an image database of 7000 images taken from more than 200 different people. The variations in head size, lighting and background are considerable. The obtained detection rate is more than 97% on that database. We have also extended the algorithm for face tracking. The tracking capabilities are shown using results from a real-time implementation of the proposed algorithm.
Simplified gaze-correction method using 3D mesh warping
Insuh Lee, Byeungwoo Jeon
Under the typical video communication configuration in which a camera is placed on top or at a lateral side of a monitor, face-to-face video communication has an inherent difficulty of poor eye contact since the users stare at the monitor screen rather than looking directly at the camera lens. In this paper, we propose an image warping technique for gaze-correction which performs 3D warping of the face object in the given image by a certain correction angle. The correction angle, which is the angle between the direction of eye gaze and that to the camera, is estimated in an unsupervised way by using an eye tracking technique. Experimental results with real image data show much enhanced naturalness, which face-to-face video communication has to offer.
Emotion-independent face recognition
Liyanage C. De Silva, Kho Guan Poh Esther
Current face recognition techniques tend to work well when recognizing faces under small variations in lighting, facial expression and pose, but deteriorate under more extreme conditions. In this paper, a face recognition system to recognize faces of known individuals, despite variations in facial expression due to different emotions, is developed. The eigenface approach is used for feature extraction. Classification methods include Euclidean distance, back propagation neural network and generalized regression neural network. These methods yield 100% recognition accuracy when the training database is representative, containing one image representing the peak expression for each emotion of each person apart from the neutral expression. The feature vectors used for comparison in the Euclidean distance method and for training the neural network must be all the feature vectors of the training set. These results are obtained for a face database consisting of only four persons.
Wireless Video
New robust coding scheme for wireless video transmission based on coefficient sampling
Min-Sup Kim, Seong-Dae Kim
Burst error is the most difficult problem to cope with in wireless video transmission. In this paper, a new robust coding method is proposed, which is based on the spreading of burst error. We propose a coefficient sampling method for spreading, and show that a projections onto convex sets (POCS) method fits well into this coefficient sampling method as a restoration method. The simulation results show that the proposed method restores the lost information very well.
Proxy-based approaches for IDCT acceleration
Wendi Pan, Antonio Ortega, Ibrahim N. Hajj-Ahmad, et al.
In the boundary between wired and wireless worlds, proxy-based processing has been recognized as an efficient approach to provide client-specific adaptation to improve the overall performance. For the portable video decoding devices with wireless access ability, fast, low-power decoding is desirable. In this paper, we propose a novel proxy-based framework that is capable of accelerating the IDCT operations performed at the client. We introduce a proxy-specific IDCT algorithm that is not only suitable for the proxy framework, but has fine-grained complexity scalability as well. We then demonstrate the effectiveness of the proxy by two examples. The first example is a simple proxy that is assigned the task of IDCT block classification, which is performed at the client along with the IDCT operation conventionally. Experimental results clearly show the complexity advantage of this method. The second example is a more active proxy. Adaptation to the image statistics is introduced. Simulations of the optimization process based on the average complexity criteria show that the client complexity can be further reduced, while maintaining a low side information overhead. Both examples provide the trade-off between the client-proxy bandwidth increment and the IDCT complexity reduction at the client.
Reliable video transmission over fading wireless channel using joint source-channel adaptation
This research presents a robust video transmission scheme over fading wireless channels that relies on a coordinated protection effort in handling channel and source variations dynamically. Given the priority of each source packet and the estimated channel condition, an adaptive protection scheme based on joint source-channel criteria is investigated via proactive forward error correction (FEC). A product code is developed based on both the Reed-Solomon (RS) code and rate-compatible punctured convolutional (RCPC) codes. We explore the dynamic programming algorithm in matching the relative priority of source packets to instantaneous channel conditions. To obtain a realistic joint source-channel adaptation scheme, special attention has been paid to the channel status feedback in terms of accuracy and delay, the product code tradeoff, and the involved packetization efficiency. The performance improvement due to adaptation via a dynamic programming solution is demonstrated by simulating the wireless transmission of error resilient ITU-T H.263+ video.
Channel-adaptive unequal error protection for scalable video transmission over wireless channel
Guijin Wang, Qian Zhang, Wenwu Zhu, et al.
Scalable video delivery over wireless links is a very challenging task due to the time-varying characteristics of wireless channels. This paper proposes a channel-adaptive error control scheme for efficient video delivery, which consists of dynamic channel estimation and channel-adaptive Unequal Error Protection (UEP). In our proposed channel-adaptive UEP scheme, a bit allocation algorithm is presented to periodically allocate the available bits among different video layers based on varying channel conditions so as to minimize the end-to-end distortion. Simulation results show that our proposed scheme is efficient under various channel conditions.
Posters I: Processing and Analysis of Visual Information
Sketch on dynamic gesture tracking and analysis exploiting vision-based 3D interface
Woontack Woo, Namgyu Kim, Karen Wong, et al.
In this paper, we propose a vision-based 3D interface exploiting invisible 3D boxes, arranged in the personal space (i.e. reachable space by the body without traveling), which allows robust yet simple dynamic gesture tracking and analysis, without exploiting complicated sensor-based motion tracking systems. Vision-based gesture tracking and analysis is still a challenging problem, even though we have witnessed rapid advances in computer vision over the last few decades. The proposed framework consists of three main parts, i.e. (1) object segmentation without bluescreen and 3D box initialization with depth information, (2) movement tracking by observing how the body passes through the 3D boxes in the personal space and (3) movement feature extraction based on Laban's Effort theory and movement analysis by mapping features to meaningful symbols using time-delay neural networks. Exploiting depth information using multiview images improves the performance of gesture analysis by reducing the errors introduced by simple 2D interfaces. In addition, the proposed box-based 3D interface lessens the difficulties in both tracking movement in 3D space and in extracting low-level features of the movement. Furthermore, the time-delay neural networks lessen the difficulties in movement analysis by training. Due to its simplicity and robustness, the framework will provide interactive systems, such as ATR I-cubed Tangible Music System or ATR Interactive Dance system, with improved quality of the 3D interface. The proposed simple framework also can be extended to other applications requiring dynamic gesture tracking and analysis on the fly.
Research environment for developing and testing object tracking algorithms
Todd Schoepflin, Christopher Lau, Rohit Garg, et al.
We present an integrated research environment (RAVEN) that we have developed for the purpose of developing and testing object tracking algorithms. As a Windows application, RAVEN provides a user interface for loading and viewing video sequences and interacting with the segmentation and object tracking algorithms, which are included at run time as plug-ins. The plug-ins interact with RAVEN via a programming interface, enabling algorithm developers to concentrate on their ideas rather than on the user interface. Over the past two years, RAVEN has greatly enhanced the productivity of our researchers, enabling them to create a variety of new algorithms and extend RAVEN's capabilities via plug-ins. Examples include several object tracking algorithms, a live-wire segmentation algorithm, a methodology for the evaluation of segmentation quality, and even a mediaprocessor implementation of an object tracker. Once an algorithm has been implemented, RAVEN makes it easy to present the results since it provides several mask display modes and output options for both image and video. We have found that RAVEN facilitates the entire research process, from prototyping an algorithm to visualization of the results to a mediaprocessor implementation.
Curvature analysis approach to shape coding using B-splines
This work presents an algorithm for efficient shape coding using cubic B-splines. In the framework of object-based layered coding of image sequences, shape information is essential for content-based access to video objects, and its efficient encoding needs to be investigated. We present a rate and distortion controlled algorithm for video object shape approximation by a variable number of cubic B-spline segments and motion compensated inter-frame coding of B-spline control points. Rate-distortion efficiency of the proposed algorithm is compared to MPEG-4 context arithmetic encoding and two stage motion compensated chain coding.
Segmentation and compression techniques for 3D animation models based on motion trajectory in the spherical coordinate system
Jeong-Hwan Ahn, Yo-Sung Ho
In recent days, applications using 3D animation models are increasing. Since the 3D animation model contains a huge amount of information, data compression is needed for efficient storage or transmission. Although there have been various proposals for 3D model coding, most works have considered only static connectivity and geometry information. Only a few studies have been presented for 3D animation models. This paper presents a coding scheme for 3D animation models using a new 3D segmentation algorithm. For an accurate segmentation, we take advantage of temporal coherence in the generic animated 3D model. After the motion vector of each vertex is mapped onto the surface of the unit sphere in the spherical coordinate system, we partition the surface of the sphere equally to have the same area. We then reconstruct in-between 3D models using the reconstructed key frame and an affine motion model for each segmented unit.
Extraction of texture regions using region-based local correlation
Sang Yong Seo, Chae Whan Lim, Young Deok Chun, et al.
We present an efficient algorithm using a region-based texture feature for the extraction of texture regions. The key idea of this algorithm is based on the fact that most of the variations of local correlation coefficients (LCCs) according to different orientations are clearly larger in texture regions than in shade regions. An object image is first segmented into homogeneous regions. The variations of LCCs are next averaged in each segmented region. Based on the averaged variations of LCCs, each region is then classified as a texture or shade region. The threshold for classification is found automatically by an iterative threshold selection technique. In order to evaluate the performance of the proposed algorithm, we use six test images (Lena, Woman, Tank, Jet, Face and Tree) of 256 X 256 8-bit pixels. Experimental results show that the proposed feature suitably extracts the regions that appear visually as texture regions.
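The abstract does not say which iterative threshold selection technique is used. As one plausible reading, the sketch below uses the common two-class mean-splitting iteration (often called the isodata or Ridler-Calvard method) on a list of per-region averaged LCC variations. The function names and the sample values are illustrative assumptions, not data from the paper.

```python
def iterative_threshold(values, tol=1e-6, max_iter=100):
    """Iteratively place the threshold midway between the two class means."""
    t = sum(values) / len(values)  # start from the global mean
    for _ in range(max_iter):
        low = [v for v in values if v <= t]
        high = [v for v in values if v > t]
        if not low or not high:
            break  # all values fell on one side; keep the current threshold
        new_t = 0.5 * (sum(low) / len(low) + sum(high) / len(high))
        converged = abs(new_t - t) < tol
        t = new_t
        if converged:
            break
    return t


def classify_regions(region_variations):
    """Label each region 'texture' (large LCC variation) or 'shade'."""
    t = iterative_threshold(list(region_variations.values()))
    return {name: ("texture" if v > t else "shade")
            for name, v in region_variations.items()}


# Hypothetical averaged LCC variations for six segmented regions.
regions = {"r0": 0.10, "r1": 0.20, "r2": 0.15, "r3": 0.80, "r4": 0.90, "r5": 0.85}
labels = classify_regions(regions)
```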
New real-time algorithm for multiview disparity-based 3D-object segmentation
Yuecheng Zhang, Yun He
With the development of video communication systems, more and more attention has been paid to the implementation of a variety of content-related functionality. Thus, object segmentation algorithms that are both semantic and automatic are greatly needed. In this paper a new real-time algorithm for 3D-object segmentation based on disparity is proposed. Its kernel disparity matching methodology is developed on the basis of a general pixel-based segmentation algorithm known as Dynamic Programming on Disparity Space Image. With the introduction of weighted feature span, the computational complexity is noticeably reduced. The proposed algorithm is tested on both a real captured stereo sequence and a synthetic multiview sequence, and gives satisfactory results in generality, quality and complexity.
360-deg panoramic camera using a mirror rotation mechanism
Toshiyasu Nakao, Atsushi Kashitani
In this paper, we describe a new panoramic camera that incorporates a 2-axes mirror rotation mechanism and image mosaicing software to get high-resolution panoramic images in a short time. The mirror is located in front of the camera via the mirror rotation mechanism to move the camera's view. Partial images taken with the mirror rotation are merged into a high-resolution panoramic image by mosaicing software. The 2-axes mirror rotation mechanism consists of a mirror pedestal and a cam. The image mosaicing software projects each partial image onto a projection surface around the rotation center of the mirror. In that process, the projection center is shifted from the original principal point of the lens or the viewpoint to the mirror rotation center, and the projection positions for each pixel of partial images are calculated by using the limiting point. With these features, our panoramic camera has four advantages: (1) accurate, continuous, high-resolution, large and endless (360 degrees wide) panoramic images; (2) short panoramic image acquisition time with fast mirror rotation; (3) small body size; and (4) endless (360 degrees wide) fast mirror rotation. We also describe its prototype and its application to the Internet broadcasting of a tennis game.
Developing an integrated video analysis system
Ankush Mittal, Loong-Fah Cheong
Matching the similarity between two units of data occurs as a frequent task in video or image analysis. The parameters of matching techniques are level of abstraction of features, distance measures and normalization of features, if supported, or else the method of relatively weighing the features. Most multimedia analysis systems employ only low level features with distance measures similar to Euclidean distance, with no method to automatically generate the weights of the features and thus are ineffective in replenishing suitable matches to the user's demands. We argue for shifting the burden of mapping the feature space with relevant categories from the user to the multimedia analysis system. In this paper, a Bayesian Framework is presented where the evaluation of the parameters of classification and especially the relevancy of each feature with respect to each class is performed automatically. The probabilistic framework is extended to work well for generalized multi-modal distribution of a particular class over the feature space. Theoretical foundation is developed to provide simultaneously existing multiple views to an image or a video sequence. The low-level features can be synthesized with intelligent association to furnish high-level features, which could be more meaningful to the user. The significance of this work is presented by comparing with a system which employs an unsophisticated approach similar to common systems where feature vector of query image and feature vector of template image are compared by means of weighted Euclidean distance. The superiority of our approach is presented over the database consisting of 300 video sequences comprising of diverse video classes.
User-assisted segmentation algorithm using B-spline curves
Daehee Kim, Yo-Sung Ho
Most automatic segmentation techniques have difficulties in extracting individual video objects in a single frame. They are somewhat premature to obtain desirable segmentation results from various kinds of image sequences. However, if the user can provide semantic information of video objects for the first frame in a user-assisted manner, improved segmentation results can be obtained in the following picture frames. The user-assisted or semi-automatic approach for video segmentation is more practical in generating VOPs of moving objects. In this paper, we propose a new semi-automatic video segmentation scheme using an active contour algorithm. In most active contour algorithms, samples of the contour convey insufficient information about the curve shape between samples due to finite difference approximation. In addition, since most active contour algorithms have been developed for images of homogeneous simple background, they are not applicable to moving objects in the complex background. Therefore, this paper addresses the B-spline representation of curves and the morphological feature extraction for video object segmentation.
Object segmentation using skin color and motion information
Eui-Yoon Chung, Ho-Keun Lee, Hyung Suk Kim, et al.
This paper presents an effective object segmentation method for object-oriented coding. The process is composed of facial region detection using skin color and changed region detection using motion information. The image is then segmented between moving objects and background using motion estimation. The facial regions are detected using skin color, the data for which is obtained from the u, v image. This data is then used as a threshold to segment the facial region. The image is also segmented into changed regions using motion information. After combining these results, the final image is segmented between moving objects and background. This method is more efficient due to two factors, color and motion. Experimental results show that the proposed method can significantly improve the image quality.
Efficient algorithm for video sequence matching using the Hausdorff distance and the directed divergence
Sang Hyun Kim, Rae-Hong Park
To manipulate large video databases, effective video indexing and retrieval are required. While most algorithms for video retrieval can be used for frame-wise user query or video content query, video sequence matching has not been investigated much. In this paper, we propose an efficient algorithm to match video sequences using the modified Hausdorff distance, and a video indexing method using the directed divergence of histograms between successive frames. To effectively match the video sequences and to reduce the computational complexity, we use the key frames extracted by the cumulative directed divergence, and compare the sets of key frames using the Hausdorff distance. Experimental results show that the proposed video sequence matching and video indexing algorithms using the Hausdorff distance and the directed divergence yield remarkably high accuracy and performance compared with conventional algorithms such as histogram difference or histogram intersection methods.
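The two measures named in this abstract can be sketched in a few lines. The code below assumes that "directed divergence" means the Kullback-Leibler directed divergence between normalized histograms and that the "modified Hausdorff distance" is the mean-of-minimum-distances variant. The abstract states neither definition, so both are assumptions, and the function names are hypothetical.

```python
import math


def directed_divergence(p, q, eps=1e-12):
    """Directed (Kullback-Leibler) divergence from histogram p to histogram q."""
    sp, sq = float(sum(p)), float(sum(q))
    total = 0.0
    for a, b in zip(p, q):
        if a > 0:  # empty bins of p contribute nothing
            pa, qb = a / sp, b / sq
            total += pa * math.log((pa + eps) / (qb + eps))
    return total


def modified_hausdorff(set_a, set_b, dist):
    """Modified Hausdorff distance between two sets of key-frame features."""
    def directed(xs, ys):
        # average, over xs, of the distance to the closest element of ys
        return sum(min(dist(x, y) for y in ys) for x in xs) / len(xs)
    return max(directed(set_a, set_b), directed(set_b, set_a))


# Toy usage with scalar "features" standing in for key frames.
d_hist = directed_divergence([1, 1], [3, 1])
d_seq = modified_hausdorff([0, 10], [0, 10, 20], dist=lambda x, y: abs(x - y))
```

In a full system the `dist` argument would compare key-frame histograms rather than scalars; the scalar case is used here only to keep the example easy to check by hand.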
Motion estimation by Fermat number transform
Nam-Ho Kim, Samuel Moon-Ho Song
A new block-matching algorithm, based on the Fermat Number Transform (FNT), is presented. It declares the most correlated-block as the best matching block, as opposed to declaring the block with the least sum of differences between the blocks. The proposed number theoretic approach significantly reduces the computational complexity for the estimation process.
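To illustrate correlation-based matching with a Fermat Number Transform, the sketch below computes a 1D circular cross-correlation exactly in integer arithmetic modulo the Fermat prime 65537 and picks the shift with the highest correlation. It is a didactic O(n²) transform on a 1D signal, assuming non-negative inputs whose correlation values stay below the modulus. The paper's actual 2D formulation and fast algorithm are not given in the abstract, and all names here are hypothetical.

```python
P = 65537  # Fermat number F4 = 2**16 + 1, which is prime


def fnt(x, root):
    """Number theoretic transform modulo P using the given root of unity."""
    n = len(x)
    return [sum(x[k] * pow(root, j * k, P) for k in range(n)) % P
            for j in range(n)]


def circular_correlation_fnt(a, b):
    """Return c with c[s] = sum_m a[(m + s) % n] * b[m], computed via the FNT."""
    n = len(a)
    assert n == len(b) and n & (n - 1) == 0, "length must be a power of two"
    root = pow(3, (P - 1) // n, P)   # 3 is a primitive root modulo 65537
    inv_root = pow(root, P - 2, P)   # modular inverse of the root
    inv_n = pow(n, P - 2, P)         # modular inverse of the length
    fa = fnt(a, root)
    fb = fnt(b, inv_root)            # plays the role of the complex conjugate
    prod = [(u * v) % P for u, v in zip(fa, fb)]
    return [(v * inv_n) % P for v in fnt(prod, inv_root)]


signal = [5, 1, 0, 2, 7, 3, 0, 1]
template = [signal[(m + 3) % 8] for m in range(8)]  # signal circularly shifted by 3
corr = circular_correlation_fnt(signal, template)
best_shift = corr.index(max(corr))                  # most correlated alignment
```

Because all arithmetic is modular integer arithmetic, the result matches the direct sum exactly, with no floating-point rounding as in an FFT-based correlation.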
Fast-block-matching motion estimation algorithm using optimal search patterns
Dong-Keun Lim, Yo-Sung Ho
For video compression, motion estimation is popularly employed to exploit temporal correlation existing in video sequences. If we employ the full search block matching algorithm for estimating motion vectors, it requires very heavy computational complexity. Although several fast block matching algorithms have been proposed to solve this problem, they sacrifice their reconstructed image quality. In this paper, we derive optimal search patterns for fast block matching motion estimation. By analyzing the block matching algorithm as a function of the block size and the shape, we find optimal search patterns for initial motion estimation. The proposed idea can provide an analytical ground for the current MPEG-2 proposals. In addition, we propose a new fast motion estimation algorithm using adaptive search patterns, considering matching criteria and statistical properties of object displacement. In order to select an appropriate search pattern, we exploit the relationship between the motion vector and the frame difference of each block. By changing the search pattern adaptively, we can improve the motion prediction accuracy, while reducing the required computational complexity compared to other fast block matching algorithms.
Simultaneous digital focusing and motion blur removal using segmentation-based adaptive regularization
SangKyu Kang, Jihong Min, Joon-Ki Paik
Recently, many image processing systems are required to offer high-quality images. For example, when we use a surveillance system with a digital camcorder and a digital video recorder, it is highly probable that the acquired image suffers from various image degradations, such as motion blur and out-of-focus blur. With such degradation, we cannot obtain important information. This is mainly caused by the limited performance of the image formation system. In this work, we investigate the causes of focus blur and motion blur. With the simultaneous formulation of the corresponding degradation, we propose a spatially adaptive regularization algorithm for restoring out-of-focus and motion blurred images. Accordingly, we present a method to estimate blur parameters and a segmentation method for spatially adaptive processing.
3D modeling using hierarchical feature point and spatio-temporal relationship
Ho-Keun Lee, Sun-Kyu Kwon, Hee-Soo Kim, et al.
This paper proposes a new 3D modeling technique based on feature points using spatio-temporal relationship. Normally, the generation of a 3D model from a real scene requires the computation of the depth of the model vertices from a dense correspondence map of the whole image. This is very time consuming, plus it is also quite difficult to achieve an accurate depth. The proposed method can generate a 3D model of an object based on identifying the correspondence of certain feature points without the need for a dense correspondence map. The proposed method consists of three parts: the extraction of an object, the extraction of feature points, and hierarchical 3D modeling using the classified feature points. This method is effective in generating a 3D model and expressing the smoothness of plain regions and sharpness of edges.
Adaptive regularized image interpolation using data fusion and steerable constraints
Jeong-Ho Shin, Joon-Ki Paik, Jeffery R. Price, et al.
This paper presents an adaptive regularized image interpolation algorithm for blurred and noisy low resolution image sequences, which is developed in a general framework based on data fusion. This framework can preserve the high frequency components along the edge orientation in a restored high resolution image frame. This multiframe image interpolation algorithm is composed of two levels of fusion algorithm. One is to obtain enhanced low resolution images as input data for the adaptive regularized image interpolation based on data fusion. The other one is to construct the adaptive fusion algorithm based on regularized image interpolation using steerable orientation analysis. In order to apply the regularization approach to the interpolation procedure, we first present an observation model of the low resolution video formation system. Based on the observation model, we can obtain an interpolated image which minimizes the residual between the high resolution and the interpolated images under a priori constraints. In addition, by combining spatially adaptive constraints, directional high frequency components are preserved with efficiently suppressed noise. In the experimental results, interpolated images using the conventional algorithms are shown in order to compare the conventional algorithms with the proposed adaptive fusion based algorithm. Experimental results show that the proposed algorithm has the advantage of preserving directional high frequency components and suppressing undesirable artifacts such as noise.
Fast adaptive-spatial-varying filtering for inverse halftoning
Ming Sun Fu, Oscar Chi Lim Au
Inverse halftoning is a process to recover multi-tone images from halftone images. Since halftoning is not an invertible process, there is no clear best or unique approach for inverse halftoning. In this paper, we propose a low complexity, high quality Fast Adaptive-spatial-varying Filtering (FAF) for inverse halftoning, which is robust to different kinds of halftone images. FAF is a two-step algorithm combining spatial-variant and spatial-invariant filtering. Firstly, it uses a spatial-invariant filter to suppress the halftone noise and then generates the target image. Secondly, it uses a spatial-variant filter to filter the halftone image based on the target image. By combining these two filtering steps, FAF can filter the noise and preserve edges of the halftone image at the same time, which is essential for effective inverse halftoning.
Enhancement of out-of-focus images using fusion-based PSF estimation and restoration
Joonshik Yoon, Jeong-Ho Shin, Joon-Ki Paik
In this paper, we propose an enhancement algorithm for out-of-focus images using fusion-based point-spread-function (PSF) estimation and restoration. The proposed algorithm can produce an in-focus image by using only digital image processing techniques, and it requires neither infrared light/ultrasound nor a focusing lens assembly operated by electrically powered movement of the focusing lens. In order to increase accuracy in estimating the PSF of the defocused image, the proposed algorithm finds true and linear edges by using the Canny edge detector, which is an optimal edge detector with good localization, estimates the step response across the edge for each pixel, computes the one-dimensional step response by averaging the step responses, estimates the two-dimensional PSF from the averaged step response, and then provides an in-focus image by an image restoration filter based on the estimated PSF. Finally, we execute a fusion process, which can enhance the quality of the fused image by fusing restored images. There is a limit to the amount of out-of-focus blur that can be recovered by the proposed algorithm. Moreover, the proposed algorithm operates under the assumption that an input image contains at least one piece-wise linear boundary between an object and background. In spite of the above-mentioned limitations, the proposed algorithm can produce a focused image of acceptable quality by using only digital image processing.
Posters II: Visual Communication
Regularized dequantizers for DCT-based transform coding of images
Gunho Lee, Sinae Kim, Samuel Moon-Ho Song
We have recently proposed a new dequantization scheme for DCT-based transform coding based on regularization principles. The new approach sharply reduced blocking artifacts in decoded images, and the performance of the new dequantization scheme has been evaluated against the standard JPEG, MPEG and H.263+ in terms of the peak signal-to-noise ratio (PSNR), along with our own definition of the blockiness measure (BM). Basically, the proposed dequantizer maps the received data to within the range +/- (quantizer spacing/2), so that the final decompressed image is 'smooth' in the sense of minimizing the cost functional including the stabilizing term weighted by a regularization parameter. In this paper, we focus on several important aspects of this regularized dequantizer, namely the selection of the regularization parameter and the convergence of the dequantization algorithm.
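The range constraint mentioned in this abstract can be illustrated with a short sketch. It assumes a uniform quantizer in which index k reconstructs to k times the step size; the function name and the sample values are hypothetical, and the regularized cost functional itself is not reproduced here.

```python
def project_to_quantization_cell(estimate, index, step):
    """Clamp a smoothed coefficient estimate to the cell of its received index.

    Assumption: a uniform quantizer, so index k reconstructs to k * step and
    any coefficient consistent with the received data lies within step / 2
    of that reconstruction point.
    """
    center = index * step
    low, high = center - step / 2.0, center + step / 2.0
    return min(max(estimate, low), high)


# Hypothetical values: received index 3 with step 16 gives the cell [40, 56].
print(project_to_quantization_cell(61.0, 3, 16))  # above the cell, so clamped
print(project_to_quantization_cell(44.5, 3, 16))  # inside the cell, unchanged
```

A smoothing step followed by this kind of projection keeps the decoded coefficient consistent with what the decoder actually received.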
Portable whiteboard system with vision input
Ferdinand Hendriks, Xiping Wang, Belle L. Tseng, et al.
We present a novel whiteboard system that uses one or more active input devices. The system is especially suitable for situations in which several users provide input on a large writing surface which also serves as a projection surface. That surface can have a general orientation. We have implemented a system that uses an inexpensive, infrared-emitting stylus and an off-the-shelf videoconferencing camera fitted with an IR transmitting/visible blocking filter to capture handwritten strokes. The system uses a calibration method based on projective mapping, thus allowing off-axis camera placement. We also propose a self-calibrating system in which the capture device shares its optical system with the projector. In a system with multiple local users the question of 'who wrote what' becomes relevant. Therefore, we address the question of attaching identity to stroke information.
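As a rough illustration of why a projective mapping permits off-axis camera placement, the sketch below applies a 3x3 homography to a camera pixel to obtain a board coordinate. The matrix values are hypothetical; the abstract does not give the calibration procedure, so estimating the matrix from calibration points is left out.

```python
def apply_homography(h, x, y):
    """Map camera pixel (x, y) to board coordinates with a 3x3 matrix h."""
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return u / w, v / w


# Hypothetical matrix with a perspective term in the last row, as would
# arise when the camera views the board at an angle.
H = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.5, 0.0, 1.0]]
print(apply_homography(H, 2.0, 4.0))
```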
Model-based coding for multiobject sequence
Yunhai Liu, Lu Yu, Qingdong Yao
Model-based coding has been studied mostly on video sequences with only one person. Normally there are several persons in videoconference scenes. In this paper, model-based coding for multi-object (or multi-person) sequences is explored. A multi-scale 3D wire-frame of the head has been constructed to fit different head sizes in the scene. The object is composed of several components such as the head, shoulder and arm. Wire-frames of the shoulder and arm are built in this contribution. By connecting different components according to physical structure, a more complete wire-frame of the upper body with the head is built. The ways in which motion parameters are transferred through the different components are determined. The bit-rate of a three-object sequence is compared with that of a one-object sequence with the same image format. The model-based coding efficiency of the multi-object sequence is higher than that of the one-object sequence.
Adaptation of facial synthesis to parameter analysis in MPEG-4 visual communication
Lu Yu, Jingyu Zhang, Yunhai Liu
In MPEG-4, Facial Definition Parameters (FDPs) and Facial Animation Parameters (FAPs) are defined to animate a facial object. Most of the previous facial animation reconstruction systems were focused on synthesizing animation from manually or automatically generated FAPs but not the FAPs extracted from natural video scene. In this paper, an analysis-synthesis MPEG-4 visual communication system is established, in which facial animation is reconstructed from FAPs extracted from natural video scene.
Fractal-based gradient match and side match vector quantization for image coding
Hsuan-Ting Chang, Tuan Yung Han
In this paper, we propose the fractal-based gradient match vector quantizers (FGMVQs) and the fractal-based side match vector quantizers (FSMVQs) for the image coding framework. The proposed schemes are based upon the non-iterative fractal block coding (FBC) technique and the concepts of the gradient match vector quantizers (GMVQs) and the side match vector quantizers (SMVQs). Unlike the ordinary GMVQs and SMVQs, the super codebooks in the proposed FGMVQs and FSMVQs are generated from the affine-transformed domain blocks in the non-iterative FBC technique. The codewords in the state codebook are dynamically extracted from the super codebook with the side-match and gradient-match criteria. The redundancy in affine-transformed domain blocks is greatly reduced and the compression ratio can be significantly increased. Our simulation results show that about 10% - 20% bit rates in the non-iterative FBC techniques are saved by using the proposed FGMVQs and FSMVQs.
Motion vector synthesis algorithm for MPEG-2-to-MPEG-4 transcoder
Kuniaki Takahashi, Kazushi Satoh, Teruhiko Suzuki, et al.
We have developed a Motion Vector (MV) Synthesis algorithm for an MPEG2-to-MPEG4 transcoder. MPEG2 bitstream parameters are utilized to adaptively scale and refine the MPEG2 MVs to synthesize MPEG4 MVs. The simulation results show that our MV Synthesis algorithm produces results that are equivalent to VM Full Search in both coding efficiency and subjective image quality, with significant reduction in complexity.
Predictive motion vector field adaptive search technique (PMVFAST): enhancing block-based motion estimation
Motion Estimation (ME) is an important part of most video encoding systems, since it could significantly affect the output quality of an encoded sequence. Unfortunately this feature requires a significant part of the encoding time especially when using the straightforward Full Search (FS) algorithm. In this paper a new algorithm is presented named as the Predictive Motion Vector Field Adaptive Search Technique (PMVFAST), which significantly outperforms most if not all other previously proposed algorithms in terms of Speed Up performance. In addition, the output quality of the encoded sequence in terms of PSNR is similar to that of the Full Search algorithm. The proposed algorithm relies mainly upon very robust and reliable predictive techniques and early termination criteria, which make use of parameters adapted to the local characteristics of a frame. Our experiments verify the superiority of the proposed algorithm, not only versus several other well-known fast algorithms, but also in many cases versus even the Full Search algorithm.
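To make the contrast between Full Search and a predictive, early-terminating search concrete, here is a minimal sketch. It is not PMVFAST: it is a plain exhaustive search with one hypothetical motion vector predictor tested first and a fixed stopping threshold, whereas the abstract describes thresholds adapted to local frame characteristics. All function and parameter names are assumptions.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between an n x n block and its displaced match."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def block_search(cur, ref, bx, by, n, search_range, predictor=(0, 0), stop_threshold=0):
    """Return (best_mv, best_sad) for the block whose top-left corner is (bx, by).

    The predictor is tested first; if the best SAD so far is at or below
    stop_threshold the search ends early, otherwise every in-bounds
    candidate in the window is examined (Full Search).
    """
    height, width = len(ref), len(ref[0])

    def in_bounds(dx, dy):
        return (0 <= bx + dx and bx + dx + n <= width
                and 0 <= by + dy and by + dy + n <= height)

    best_mv, best_sad = (0, 0), sad(cur, ref, bx, by, 0, 0, n)
    if in_bounds(*predictor):
        cost = sad(cur, ref, bx, by, predictor[0], predictor[1], n)
        if cost < best_sad:
            best_mv, best_sad = tuple(predictor), cost
    if best_sad <= stop_threshold:
        return best_mv, best_sad  # early termination: skip the full window
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            if in_bounds(dx, dy):
                cost = sad(cur, ref, bx, by, dx, dy, n)
                if cost < best_sad:
                    best_mv, best_sad = (dx, dy), cost
    return best_mv, best_sad
```

When the predictor is good, the early exit evaluates one or two candidates instead of the whole window, which is where the speed-up of predictive schemes comes from.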
Robust video coding for underwater transmission
Tim Collins, Phillip Atkins, Jon Davies, et al.
A robust image compression algorithm has been developed for use in underwater video transmission. The hostile nature of the medium leads to high error rates, normally requiring a severe overhead in terms of forward error correction bits. The new algorithm reduces this requirement by means of error resilient organization of the compressed image data and variable rate protection coding. Simulation results for the standard 512 × 512 Lena image are presented as well as practical results from recent sea-trials.
New object-based variational approach for MPEG-2 data recovery over lossy packet networks
Joel Jung, Marc Antonini, Michel Barlaud
In this paper, we propose an object-based variational approach for the decoding of MJPEG and MPEG1-2 video sequences. This new method improves the visual quality of the reconstructed sequence by reducing the impact of cell losses, due to the transmission of the sequences over lossy packet networks. Generally, two kinds of approaches are suggested to tackle this problem: a preventative 'network' approach tends to provide a more robust bit-stream in order to help the recovery of the lost data, for instance by adding redundancy to the bitstream, and a concealment 'image' approach, that tends to reduce visual impact resulting from the packet loss. The proposed method combines 'network' information provided by the transmission protocol, with an adapted 'image' decoding algorithm. Several experimental results demonstrate the efficiency of the decoding method: lost areas are largely recovered, compared to a standard decoding, both on the background and on the objects.
Fully scalable video codec
Shu-Wei Liu, Chi-Hui Huang, Ja-Ling Wu
In this paper, we present a video coder which achieves various kinds of continuous scalabilities simultaneously, including spatial scalability, temporal scalability and SNR scalability. By applying a wavelet transform that maps integers to integers in the spatial and temporal dimensions, it is possible to perfectly reconstruct the original video sequences. Because of the energy compaction in the temporal domain, using the wavelet transform alone does not work well, so we replace it with motion estimation applied at the Low-Low (LL) band to eliminate ringing effects. Following the 2-D SPECK algorithm to represent the wavelet (or residual) coefficients based on bit plane quantization (according to magnitude comparisons), a precise rate control and a self-adjusting rate allocation are achieved. In order to provide all kinds of scalabilities, a multiresolution encoding/decoding concept is presented and implemented, in which the significant wavelet coefficients corresponding to different resolutions in the original embedded bit stream are interleaved. In the multiresolution decoding process, the user can specify his (or her) own desired spatial and temporal resolutions, and the server will produce the combined bit stream according to the user requirement. In addition to spatial and temporal scalabilities, SNR scalability can be achieved by fitting the exact target bitrate to the available network bandwidth.
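The perfect-reconstruction property of an integer-to-integer wavelet can be seen in a small sketch. The abstract does not say which transform is used; the example below assumes the S-transform (an integer Haar variant) purely for illustration, and the function names are hypothetical.

```python
def forward_s_transform(signal):
    """Split an even-length list of integers into integer low and high bands."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    low = [(a + b) >> 1 for a, b in pairs]   # floor of the pair average
    high = [a - b for a, b in pairs]         # pair difference
    return low, high


def inverse_s_transform(low, high):
    """Recover the original integers exactly from the two bands."""
    out = []
    for l, h in zip(low, high):
        a = l + ((h + 1) >> 1)  # restores the bit dropped by the floor above
        out.extend([a, a - h])
    return out


samples = [5, 2, 2, 5, 4, 2, 0, 255]
low, high = forward_s_transform(samples)
print(low, high)
print(inverse_s_transform(low, high) == samples)
```

Because every step uses integer arithmetic only, no rounding error accumulates, and the same pairwise step can be applied along rows, columns, or the time axis.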
New programmable video signal processor for motion estimation and motion compensation
Danian Gong, Yun He
A new programmable and parallel video signal processor (PVSP) is proposed to implement a class of fast block matching algorithms (BMA). Five parameters are used to embody the various fast BMAs and a generic framework for the fast BMA is designed. Then PVSP is designed and the BMA algorithms are mapped onto it. The key modules of PVSP are discussed in detail. PVSP uses horizontal and vertical mapping strategies to implement byte-aligned and circular addressing. Under these two mapping strategies the memory system of PVSP can be configured flexibly. The regular, iterative and low-delay tree adder without any pipeline stages is chosen as the key computation component to implement the BMA. With a scalable tree adder, PVSP can be easily expanded to meet higher computing requirements. PVSP is estimated to have approximately 30 kGates and 40 kb SRAM and it can work at a frequency of 133 MHz. It can support the motion estimation and motion compensation tasks in real time for an MPEG2 MP@ML encoder.
Objective measurement scheme for perceived picture quality degradation caused by MPEG encoding without any reference pictures
Osamu Sugimoto, Ryoichi Kawada, Masahiro Wada, et al.
The rapid progress in digital transmission technology has spurred demand for developing a technology that supports monitoring of video transmissions. Automating the picture quality assessment process is in particular demand because it currently depends on subjective assessments by human operators and places a heavy burden on them. The authors therefore propose an objective picture quality measurement method that works without reference pictures. In this proposed method, invisible markers are embedded into original pictures by use of the spread spectrum data hiding method. Since the markers are widely spread over the frequency domain, the degradation caused by MPEG compression can be estimated by detecting the extent of marker degradation. Our method is usable regardless of the kind of picture, bitrate, and number of stages of tandem codec connections. The degradation in picture quality caused by the embedded markers is quite small and not perceivable by the human eye. The proposed method is therefore applicable to a wide range of visual transmission services.
Loop/post filter to suppress blocking and ringing artifacts for H.26L video codec
Min-Cheol Hong, Hurnsoo Hahn
In this paper, an efficient separable one-dimensional loop and post-filtering algorithm is presented to suppress blocking and ringing artifacts in H.26L compressed video. A new one-dimensional pixel-based regularized smoothing functional is defined, and the regularization parameters controlling the degree of smoothness in the two neighboring directions are determined from information available in the encoder and decoder. The proposed loop/post filter differs from typical regularization approaches in that the regularized smoothing functional is defined on a pixel basis for easy and fast implementation. Therefore, neither matrix inversion nor iterative techniques, both of which are computationally very expensive, are required. Also, by using a look-up table to determine the regularization parameters, the recovered image can be obtained at lower computational cost. The experimental results show the capability of the proposed algorithm.
Advanced deinterlacing techniques with the use of zonal-based algorithms
This paper describes a new, highly efficient deinterlacing approach based on motion estimation and compensation techniques. The proposed technique benefits mainly from the motion vector properties of zonal-based algorithms, such as the Advanced Predictive Diamond Zonal Search (APDZS) and the Predictive Motion Vector Field Adaptive Search Technique (PMVFAST), and from multihypothesis motion compensation, but also from an additional motion classification phase in which, depending on the motion of a pixel, additional spatial and temporal information is considered to further improve performance. Extensive simulations demonstrate the efficacy of these algorithms, especially when compared to standard deinterlacing techniques such as the line doubling and line averaging algorithms.
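For reference, the two baseline techniques named at the end of this abstract can be sketched in a few lines. This shows only the baselines, not the proposed motion-compensated method; it assumes the input is a single field holding the even lines of the frame, stored as a list of rows of integer pixel values.

```python
def line_doubling(field):
    """Fill each missing line by repeating the field line above it."""
    frame = []
    for line in field:
        frame.append(list(line))
        frame.append(list(line))
    return frame


def line_averaging(field):
    """Fill each missing line with the average of the field lines above and below."""
    frame = []
    for i, line in enumerate(field):
        frame.append(list(line))
        # The last missing line has no line below it, so reuse the line above.
        below = field[i + 1] if i + 1 < len(field) else line
        frame.append([(a + b) // 2 for a, b in zip(line, below)])
    return frame


field = [[10, 20], [30, 40]]
print(line_doubling(field))
print(line_averaging(field))
```

Both methods use only one field, so they halve vertical resolution in static areas, which is the weakness that motion-compensated approaches aim to address.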
Efficient architecture of binary motion estimation for MPEG-4 shape coding
Yi-Chu Wang, Hao-Chieh Chang, Wei-Ming Chao, et al.
This paper presents an efficient architecture of binary motion estimation (BME) for MPEG-4 shape coding. This architecture, called DDBME, mainly consists of a data-dispatch-based 1-D systolic array and a 16 × 32 bit search range buffer. In DDBME, a bit-parallelism technique is applied to the SAD calculation of the block matching algorithm. To support bit-data parallel processing efficiently, bit addressing must be taken into consideration. The data dispatch technique is applied to the 1-D array through hardwired data flow routing so that the bit addressing operations are efficiently reduced. The DDBME operating at 7.29 MHz can meet the real-time requirement for encoding an MPEG-4 shape sequence at core profile level 2, i.e., 2 VOs in CIF format at 30 fps, assuming each frame contains 30% boundary macroblocks on average. For the same real-time specification, optimized software running on a RISC processor (Ultra Sparc, 300 MHz) achieves only 1/20 of the performance.
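A software analogy may help show why bit parallelism suits the SAD of binary shape data: for 0/1 pixels the absolute difference equals XOR, so a whole packed row can be compared in one operation and the differing bits counted. The sketch below is only an illustration of that idea, not the DDBME hardware; the packing convention and function names are assumptions.

```python
def pack_row(bits):
    """Pack a list of 0/1 pixels into one integer, first pixel in the top bit."""
    word = 0
    for bit in bits:
        word = (word << 1) | (bit & 1)
    return word


def binary_sad(block_a, block_b):
    """SAD of two binary blocks, each given as a list of packed row words."""
    # For binary pixels |a - b| == a XOR b, so SAD is the count of set bits.
    return sum(bin(a ^ b).count("1") for a, b in zip(block_a, block_b))


current = [pack_row([1, 1, 0, 0]), pack_row([0, 0, 1, 1])]
candidate = [pack_row([1, 0, 0, 0]), pack_row([1, 1, 1, 1])]
print(binary_sad(current, candidate))
```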
Efficient requantization method for INTRA MB in heterogeneous transcoding
Kwang-deok Seo, Jae-Kyoon Kim, Kook-yeol Yoo, et al.
In this paper, we propose an efficient requantization method for INTRA Macroblocks (MB) in heterogeneous transcoding from MPEG-1 to MPEG-4 simple profile. The quantizer for MPEG-1 INTRA MB usually uses a quantization matrix while the quantizer for MPEG-4 simple profile does not. As a result, the quantization step sizes of the two quantizers may not be the same even for the same quantization parameter. Due to this mismatch in the quantization step size, the transcoded MPEG-4 sequence suffers serious quality degradation and the number of bits produced by transcoding increases relative to the original MPEG-1 video sequence. To solve these problems, we propose an efficient method to find a near-optimum reconstruction level in the transcoder. We also present a PDF (probability distribution function) estimation model for the original DCT coefficients of the MPEG-1 video sequence, which is required for the proposed requantization. Experimental results show that the proposed method gives a 0.3 to 0.7 dB improvement in PSNR over the conventional method, even at a bit rate about 5 to 7% lower than that of the conventional method.