Proceedings Volume 2419

Digital Video Compression: Algorithms and Technologies 1995

Arturo A. Rodriguez, Robert J. Safranek, Edward J. Delp
View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 17 April 1995
Contents: 10 Sessions, 50 Papers, 0 Presentations
Conference: IS&T/SPIE's Symposium on Electronic Imaging: Science and Technology 1995
Volume Number: 2419

Table of Contents

  • Scene Change Detection and Video Indexing
  • Low Bit Rate Coding
  • MPEG Video
  • Motion Estimation Techniques I
  • Motion Estimation Techniques II
  • Video Analysis
  • Coding Methods and Techniques
  • Poster Session
  • Wavelet Coding
  • Coding Techniques and Implementations
Scene Change Detection and Video Indexing
Scene change detection and content-based sampling of video sequences
Behzad Shahraray
Digital images and image sequences (video) are a significant component of multimedia information systems, and by far the most demanding in terms of storage and transmission requirements. Content-based temporal sampling of video frames is proposed as an efficient method for representing the visual information contained in the video sequence by using only a small subset of the video frames. This involves the identification and retention of frames at which the content of the scene is `significantly' different from the previously retained frames. It is argued that the criteria used to measure the significance of a change in the contents of the video frames are subjective, and performing the task of content-based sampling of image sequences, in general, requires a high level of image understanding. However, a significant subset of the points at which the contextual information in the video frames changes significantly can be detected by a `scene change detection' method. The definition of a scene change is generalized to include not only the abrupt transitions between shots, but also gradual transitions between shots resulting from video editing modes, and inter-shot changes induced by camera operations. A method for detecting abrupt and gradual scene changes is discussed. Criteria for detecting scene changes induced by camera operations are proposed. Scene matching is proposed as a means of achieving further reductions in the storage and transmission requirements.
Scene change detection in an MPEG-compressed video sequence
Jianhao Meng, Yujen Juan, Shih-Fu Chang
An algorithm is proposed for the detection of abrupt scene changes and special editing effects such as dissolves in a compressed MPEG/MPEG-2 bitstream with minimal decoding of the bitstream. Scene changes are easily detected with DCT DC coefficients and motion vectors. By performing minimal decoding on the compressed bitstream, the processing speed for searching a video database of compressed image sequences can be dramatically improved. In addition, the algorithm may be applied to video scene browsing and video indexing.
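The abstract above does not state the decision statistic, so the following is only a minimal sketch of the general idea of flagging abrupt changes from DC coefficients. It assumes each frame has already been reduced to a flat list of per-block DC values; the names `dc_difference` and `detect_abrupt_changes` and the fixed `threshold` are hypothetical and are not taken from the paper.

```python
def dc_difference(prev_dc, curr_dc):
    """Mean absolute difference between two equally sized lists of DC values."""
    if len(prev_dc) != len(curr_dc) or not prev_dc:
        raise ValueError("DC lists must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(prev_dc, curr_dc)) / len(prev_dc)


def detect_abrupt_changes(dc_frames, threshold):
    """Return every index i whose frame differs sharply from frame i - 1.

    dc_frames: list of frames, each a list of per-block DC values (assumed input).
    threshold: hypothetical cut-off on the mean absolute DC difference.
    """
    changes = []
    for i in range(1, len(dc_frames)):
        if dc_difference(dc_frames[i - 1], dc_frames[i]) > threshold:
            changes.append(i)
    return changes


if __name__ == "__main__":
    frames = [
        [10, 10, 10, 10],
        [11, 10, 9, 10],
        [200, 190, 210, 205],  # abrupt change in content
        [201, 191, 209, 205],
    ]
    print(detect_abrupt_changes(frames, threshold=50))
```

A single fixed threshold would not catch gradual effects such as dissolves; those need a statistic accumulated over several frames.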
Scene decomposition of MPEG-compressed video
Hain-Ching H. Liu, Gregory L. Zick
This paper presents video processing techniques for indexing MPEG video sequences. Parameters encoded in P- and B-pictures are used to detect scene changes. In the MPEG format, P- and B-pictures consist of two types of information: difference matrix and motion vector(s) for every macroblock (MB). Different types of MBs indicate the relationship between the current picture and its reference picture(s). The proposed techniques take advantage of those parameters encoded in the MPEG video streams to detect scene changes. Since motion information in the MPEG format is used, these novel techniques are reliable, accurate, and fast. Those techniques and algorithms are presented in detail and examples are provided.
Temporal segmentation of videos: a new approach
Mourad Cherfaoui, Christian Bertin
Several works have been carried out to achieve automatic video segmentation into meaningful parts. Video shots usually correspond to these meaningful parts. Many techniques have already been proposed [1, 2, 3, 4, 5]. Currently, finer segmentation of videos based on camera movements is sought. Figure 1 shows a video segmented as usual and in a finer way. Segmentation based on camera operations allows us to better describe video contents for video-database applications [5]. It also makes it possible to achieve semantic coding for very low bitrate video compression. In this paper, we propose a method that achieves automatic video segmentation based on camera movements. In section 2, we define a camera operations model. Using this model, we present in section 3 our video segmentation method. Experimental results are presented in section 4. Finally, we make a quick comparison of our method with the one presented in [6].
Low Bit Rate Coding
Single-frame prediction for high video compression
In this abstract, we present a novel technique to encode video sequences that performs a region-based motion compensation of each frame to be encoded so as to generate a predicted frame. The set of regions to be motion compensated for a given frame has been obtained through a quadtree segmentation of the motion field estimated between a single reference frame (representing a typical projection of the scene) and the frame to be encoded. This way, no DPCM loop in the temporal domain is introduced, avoiding the feedback of the quantization errors. Under the assumption that the projection of the scene on the image plane remains nearly constant, only slight deformations of the reference frame occur from one frame to the next, so that very limited information needs to be coded: (1) the segmentation shape; (2) the motion information. Temporal correlation is used to predict both types of information so as to further reduce any remaining redundancy. As the segmentation may not be perfect, spatial correlation may still exist between neighboring regions. This is used in the strategy designed to encode the motion information. The motion and segmentation information are estimated on the basis of a two-stage process using the frame to be encoded and the reference frame: (1) a hierarchical top-down decomposition, followed by (2) a bottom-up merging strategy. This procedure can be nicely embedded in a quadtree representation, which ensures a computationally efficient but rather robust segmentation strategy. We show how the proposed method can be used to encode QCIF video sequences with a reasonable quality at a 10 frame/s rate using roughly 20 kbit/s. Different schemes for prediction are compared, pointing out the advantage of the single reference frame for both prediction and compensation.
Variable block size video coding with motion prediction and motion segmentation
Kui Zhang, Miroslaw Bober, Josef Kittler
In this paper we are concerned with the efficient coding of image sequences for video-conference applications. In such sequences, large image regions usually undergo a uniform translational motion. Consequently, to maximize the coding efficiency and quality, the codec should be able to segment and estimate multiple translational motions accurately and reliably. Following the above premise, we propose an algorithm which combines several known and new techniques. Firstly, a traditional variable block size motion compensation was used, but employing a novel robust motion estimation algorithm. The algorithm can estimate multiple motions to a sub-pixel accuracy and also provides a reliable motion segmentation. Whenever there exist multiple motions within a block, the motion boundary is recovered and approximated by a straight line. Also, an inter-block motion prediction is used to achieve a further improvement of the compression ratio. A comparison with the H.261 scheme shows that the proposed algorithm produces better results both in terms of PSNR and bit-rate. To judge the contribution of the motion segmentation to the overall performance, experiments have been carried out with a variant of the algorithm where only single motion within any block is allowed. This incapacitated variant emulates a commonly used approach for variable block size coding. The comparison of the proposed and incapacitated variants shows that the use of motion segmentation can lower the bit rate and deliver a better visual quality of the reconstructed image sequence.
Ordered Kohonen vector quantization for very low rate interframe video coding
Hui Liu
The new interframe video coding algorithm is presented using the topological ordering property of a self-organizing vector quantization (VQ). This algorithm utilizes the Kohonen learning algorithm to train a super VQ codebook which transforms the statistical characteristics of the training motion-compensated frame difference video signals into a 2D topologically ordered array. A new finite state VQ (FSVQ) scheme is proposed to make use of the correspondence between the image interblock correlation and the geometrical closeness of the codevectors in the ordered super codebook. A small state codebook is dynamically predicted purely based on the positions of codevectors used to encode the neighboring image blocks in the current frame as well as in the previous frame. Thus, this new FSVQ significantly reduces the computational complexity at the encoder and preserves the advantages of a simple VQ decoder. The experimental results show that the prediction accuracy ranges from 70 to 95%, depending on the moving information in a frame. It achieves an average bit rate of 0.082 bits per pixel with high image quality (37.86 dB) for the standard test image sequence `Miss America.' This algorithm is amenable to VLSI implementation because of its simple design, low memory requirement, and low computational complexity.
Segmentation-based scheme for very low bit rate video coding
Vasudev Bhaskaran, Wei Li, Murat Kunt
A video coding/decoding system for very low bitrate video has been developed. This hybrid coding scheme employs intraframe and interframe coding techniques. For intraframe coding, a wavelet-based method is used. For interframe coding, motion-estimation is employed to compute the displaced frame differences (DFD) and the DFD image is coded using a segmentation based method wherein the displaced frame is segmented into active and inactive regions. To meet the low bit rate requirements, the motion vectors are processed so as to reduce their contribution to the overall bitrate. Preprocessing techniques are also employed to generate a smooth motion-vector field. To reduce the coding artifacts, post-processing techniques have been developed for use at the decoder. In this paper, we present simulation results for several typical video sequences coded at 16 kbits/sec and at 32 kbits/sec.
Active compression: a framework for video conferencing at very low bit rates
Videoconferencing is an application of video compression that is rapidly expanding in terms of market acceptance and importance. A number of recent papers discuss methods for very low bit rate videoconferencing between 10 and 20 kbps. In addition, telephone lines with state-of-the-art data modems are capable of transmitting data between 20 and 28 kbps. Since videoconferencing requires the transmission and reception of audio and video, the video component is likely to be restricted to about 20 kbps when using telephone lines. While these recent papers indicate acceptable quality for sequences such as `Miss America' for these very low bit rates, they also indicate higher data rates required for other sequences that are characterized by excessive movement and/or details. In this paper, we introduce a framework in which sequences that fall in this category are processed so that they better match the characteristics of the coder. However, this cannot be accomplished without reducing the amount of information in the sequence. The key is to do this in such a way as to produce the least subjectively perceptible distortion. Our experiments have shown that the subjective quality of the resulting sequence after active coding and decoding is better than that of the one resulting from standard coding and decoding.
MPEG Video
ISO-IEC MPEG-2 software video codec
Stefan Eckart, Chad E. Fogg
Part 5 of the International Standard ISO/IEC 13818 `Generic Coding of Moving Pictures and Associated Audio' (MPEG-2) is a Technical Report, a sample software implementation of the procedures in parts 1, 2 and 3 of the standard (systems, video, and audio). This paper focuses on the video software, which gives an example of a fully compliant implementation of the standard and of a good video quality encoder, and serves as a tool for compliance testing. The implementation and some of the development aspects of the codec are described. The encoder is based on Test Model 5 (TM5), one of the best, published, non-proprietary coding models, which was used during MPEG-2 collaborative stage to evaluate proposed algorithms and to verify the syntax. The most important part of the Test Model is controlling the quantization parameter based on the image content and bit rate constraints under both signal-to-noise and psycho-optical aspects. The decoder has been successfully tested for compliance with the MPEG-2 standard, using the ISO/IEC MPEG verification and compliance bitstream test suites as stimuli.
Performance evaluation of MPEG-2 for HDTV
Daniel Lauzon, Andre Vincent, Limin Wang
In this paper, we examine the subjective and the objective performance of MPEG-2 for HDTV. Tests were conducted on seven HDTV sequences selected to cover a wide range of conditions in terms of scene content, complexity and motion speed and direction. The selected material was digitized in 4:2:2 format with HDTV resolution and coded using MPEG-2 Main Profile/High Level syntax. Formal subjective assessment was performed by non-expert viewers on the sequences coded at a bit rate of 18 Mbits/s. Since MPEG-2 allows a great flexibility at the encoding end, we also examined the impact of various MPEG-2 encoding parameters on the quality of the reconstructed HDTV video sequences. The parameters include bit rate, the structure of picture organization, as well as temporal processing.
Impact of scan conversion methods on the performance of scalable video coding
Eric Dubois, Nadia Baaziz, Marwan Matta
The ability to flexibly access coded video data at different resolutions or bit rates is referred to as scalability. We are concerned here with the class of methods referred to as pyramidal embedded coding in which specific subsets of the binary data can be used to decode lower-resolution versions of the video sequence. Two key techniques in such a pyramidal coder are the scan-conversion operations of down-conversion and up-conversion. Down-conversion is required to produce the smaller, lower-resolution versions of the image sequence. Up-conversion is used to perform conditional coding, whereby the coded lower-resolution image is interpolated to the same resolution as the next higher image and used to assist in the encoding of that level. The coding efficiency depends on the accuracy of this up-conversion process. In this paper techniques for down-conversion and up-conversion are addressed in the context of a two-level pyramidal representation. We first present the pyramidal technique for spatial scalability and review the methods used in MPEG-2. We then discuss some enhanced methods for down- and up-conversion, and evaluate their performance in the context of the two-level scalable system.
Forward-adaptive quantization with optimal overhead cost for image and video coding with applications to MPEG video coders
Antonio Ortega, Kannan Ramchandran
We address the problem of optimal forward-adaptive quantization in the video and image coding framework. In this framework, as is consistent with that of most practical coders like MPEG, the encoder has the capability of changing the quantizer periodically (e.g. at a macroblock interval in MPEG). In this paper, we formulate an optimal strategy, based on dynamic programming, for updating the quantizer choice for coding an image or video signal. While in some coding environments the overhead needed to specify the quantizer used by each block is equal for every choice of quantizer, in other situations (e.g. MPEG) the overhead cost is higher if the quantizer changes from one block to the next. We concentrate on the latter case which will be more likely encountered in situations where the overhead represents a significant fraction of the overall rate, as can be the case if a low bit rate is used (e.g. error frames in a typical motion-compensated video coder). We provide empirical evidence of the performance gain that can be obtained when applying our optimal algorithm to typical motion-compensated prediction error frames in MPEG, showing how the popular Viterbi algorithm can be used to find the optimal solution.
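The dynamic-programming idea in this abstract can be illustrated with a small Viterbi-style search. This is a sketch under stated assumptions rather than the authors' formulation: `block_costs[b][q]` is a hypothetical precomputed cost (for example distortion plus a multiplier times rate) of coding block `b` with quantizer `q`, and `switch_cost` is a hypothetical fixed overhead charged whenever the quantizer changes between consecutive blocks.

```python
def choose_quantizers(block_costs, switch_cost):
    """Pick one quantizer per block, minimising total cost plus switching overhead.

    block_costs[b][q]: assumed cost of coding block b with quantizer q.
    switch_cost: assumed overhead paid when consecutive blocks use different quantizers.
    Returns (total_cost, path) where path[b] is the chosen quantizer index.
    """
    if not block_costs:
        return 0.0, []
    n_q = len(block_costs[0])
    best = list(block_costs[0])  # best[q]: cheapest cost of any path ending in q
    back = []                    # back[b - 1][q]: best predecessor of q at block b
    for costs in block_costs[1:]:
        new_best, pointers = [], []
        for q in range(n_q):
            # Cheapest predecessor state, including the overhead of changing to q.
            prev_q = min(range(n_q),
                         key=lambda p: best[p] + (switch_cost if p != q else 0))
            step = switch_cost if prev_q != q else 0
            new_best.append(best[prev_q] + step + costs[q])
            pointers.append(prev_q)
        best = new_best
        back.append(pointers)
    # Trace the optimal path backwards from the cheapest final state.
    q = min(range(n_q), key=lambda k: best[k])
    total = best[q]
    path = [q]
    for pointers in reversed(back):
        q = pointers[q]
        path.append(q)
    path.reverse()
    return total, path


if __name__ == "__main__":
    costs = [[1, 5], [4, 3], [1, 5]]
    print(choose_quantizers(costs, switch_cost=3))
    print(choose_quantizers(costs, switch_cost=0))
```

With a large switching overhead the search keeps one quantizer throughout, and with zero overhead it picks the cheapest quantizer for each block independently, which is the trade-off the abstract describes.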
Rate quantization modeling for rate control of MPEG video coding and recording
Wei Ding, Bede Liu
For MPEG video coding and recording applications, it is important to select quantization parameters at slice and macroblock levels to produce nearly constant quality image for a given bit count budget. A well designed rate control strategy can improve overall image quality for video transmission over a constant-bit-rate channel and fulfill editing requirement of video recording, where a certain number of new pictures are encoded to replace consecutive frames on the storage media using at most the same number of bits. In this paper, we developed a feedback method with a rate-quantization model, which can be adapted to changes in picture activities. The model is used for quantization parameter selection at the frame and slice level. Extra computations needed are modest. Experiments show the accuracy of the model and the effectiveness of the proposed rate control method. A new bit allocation algorithm is then proposed for MPEG video coding.
Motion Estimation Techniques I
Motion-compensated interpolation using trajectories with acceleration
Michel Chahine, Janusz Konrad
This paper is primarily concerned with motion-compensated interpolation of video sequences using multiple images. Due to the extended temporal support of such motion compensation, linear (constant-velocity) trajectory model is often inappropriate, for example due to insufficient temporal sampling. Recently, we have proposed a quadratic (constant-acceleration) trajectory model and a framework for the computation of its parameters. The approach is based on Markov random field models that lead to a regularized formulation solved by multiresolution deterministic relaxation. In this paper, we demonstrate advantages of using accelerated motion over linear trajectories in a plausible application using natural data. We apply the estimated trajectories to motion-compensated interpolation over multiple frames of progressive and interlaced video sequences. The experimental results for `Miss America' and `Femme et arbre' (interlaced) show, respectively, a 4 and 2 dB average improvement in the PSNR of the reconstruction error when quadratic trajectories are used instead of the linear ones. It is interesting to note that in `Miss America' the most significant improvements can be observed in the area of the mouth and the eyes which are in fact likely to exhibit acceleration. We envisage an application of the proposed method to post-processing in very low bit rate video coding.
Projection methods in motion estimation and compensation
Ton Kalker, Martin Vetterli
The standard approach to exploiting motion fields in block-based hybrid video coding schemes is motion compensation of the current frame using the motion field, followed by DCT coding of the residue. In this paper we argue that this separation of approximation methods (non-linear prediction followed by transform coding) is unfortunate in low bit-rate applications. In low bit-rate applications, only a limited set of DCT coefficients is retained. Expressing the end result of motion estimation/compensation followed by separate residue coding in terms of basic linear algebra, we find a sub-optimal linear approximation scheme. In this paper we replace motion compensation and separate residue coding by (optimal) orthogonal projection. We show that for low bit-rate applications, the orthogonal projection method performs better than conventional methods. In one extreme case, it is better to use projection with only 15 DCT coefficients retained, than to use separate residue coding with all 64 DCT coefficients retained. A critical ingredient in the proposed scheme is the choice of an orthogonal basis in the vector spaces involved. In the current scheme, these bases are determined locally, implying a high computational complexity. Possible directions for the reduction of this computational complexity are discussed.
Performance evaluation of spatial dynamic motion compensation algorithms
Henry R. Wu, Andrew P. Paplinski, Q. X. Jian, et al.
This paper presents a new method, called Spatial Dynamic Motion Compensation (SDMC), which applies the concept of global motion to the coding of motion trajectory information for the purpose of side information reduction in digital video signal compression. The experimental results of the SDMC algorithms presented in this paper have shown that the new method yields significant improvements in terms of bit rate reduction of motion information over the method used in MPEG-1, while maintaining a comparable reconstructed picture quality, when the video sequences possess significant global translational motion.
Motion Estimation Techniques II
Multiresolution framework for backward motion compensation
Aria Nosratinia, Michael T. Orchard
Hierarchical decomposition of images and their relationship with motion fields continues to be a hotly pursued topic, and the role of backward motion information in coding is beginning to capture the interest of the video coding community. This paper simultaneously addresses some of the fundamental issues in multi-resolution and backward motion systems. From a coding viewpoint, a multi-resolution motion hierarchy should be coupled with an estimation system that deals with a maximally subsampled wavelet decomposition of the frames, to avoid redundancy of representation. Through a frequency domain argument, we expose the difficulties associated with such an approach, and in fact show that a band-to-band motion compensated estimation in a wavelet domain is not possible. This analysis leads to an alternative approach for estimation of detail bands. The resulting estimation errors were coded through a zerotree quantizer. We circumvented the causality problem associated with determination of zerotree information in a recursive coder through using a suitable substitute zerotree. Simulations show that a prototype coder of this type is very competitive, with a performance similar to forward (block-based) coders. The results show that the commonly held belief by many, that backward (pel-based, or pel-recursive) coding algorithms are inherently inferior to block-based methods, is not true, and will hopefully spawn interest and open a debate on the role of backward motion information in video coding.
Pyramid decompositions and hierarchical motion compensation
David Houlding, Jacques Vaisey
Block-based motion compensation (MC) is a fundamental component of most video compression algorithms; however, the `optimal' full search is computationally expensive and, as a result, fast search methods are preferred. This paper investigates the use of Gaussian and Laplacian pyramids in fast hierarchical MC algorithms. We compare the two types of pyramids as a function of different decimation and interpolation filters. In addition, the algorithm performance is compared to that of the full search and other `state of the art' fast methods. It is shown that the Gaussian pyramid is superior to the Laplacian, and that the hierarchical techniques outperform the other fast methods by approximately 7% at the same complexity.
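To make the full-search versus hierarchical-search contrast concrete, here is a minimal two-level sketch. It is not the authors' algorithm: a 2x2 box average stands in for the Gaussian decimation filters the paper studies, the matching cost is a plain sum of absolute differences, and all function names are hypothetical. It also assumes even block positions and sizes so that coordinates halve cleanly.

```python
def downsample(img):
    """Halve resolution by averaging 2x2 neighbourhoods (box-filter stand-in)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]


def sad(cur, ref, top, left, dy, dx, size):
    """Sum of absolute differences, or None if the displaced block leaves ref."""
    if not (0 <= top + dy and top + dy + size <= len(ref)
            and 0 <= left + dx and left + dx + size <= len(ref[0])):
        return None
    return sum(abs(cur[top + y][left + x] - ref[top + dy + y][left + dx + x])
               for y in range(size) for x in range(size))


def search(cur, ref, top, left, size, centre, radius):
    """Exhaustive search for the best (dy, dx) within +/-radius of centre."""
    best, best_cost = None, None
    for dy in range(centre[0] - radius, centre[0] + radius + 1):
        for dx in range(centre[1] - radius, centre[1] + radius + 1):
            cost = sad(cur, ref, top, left, dy, dx, size)
            if cost is not None and (best_cost is None or cost < best_cost):
                best, best_cost = (dy, dx), cost
    return best


def hierarchical_search(cur, ref, top, left, size, radius):
    """Coarse search at half resolution, then a +/-1 refinement at full resolution."""
    coarse = search(downsample(cur), downsample(ref),
                    top // 2, left // 2, size // 2, (0, 0), radius // 2)
    centre = (2 * coarse[0], 2 * coarse[1])
    return search(cur, ref, top, left, size, centre, 1)


if __name__ == "__main__":
    def g(y, x):
        return y * y + 3 * x * x + y * x  # smooth synthetic intensity pattern

    ref = [[g(y, x) for x in range(16)] for y in range(16)]
    cur = [[g(y + 2, x - 2) for x in range(16)] for y in range(16)]  # true motion (2, -2)
    print(search(cur, ref, 4, 4, 4, (0, 0), 4))
    print(hierarchical_search(cur, ref, 4, 4, 4, 4))
```

For a search radius of 4, the full search evaluates up to 81 candidates per block, while this two-level version evaluates up to 25 at quarter cost each plus 9 at full resolution.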
Fast motion vector estimation with a Markov model for MPEG
Sungook Kim, C.-C. Jay Kuo
This paper presents a new approach for motion vector estimation. We first propose a stochastic model to describe the temporal correlation of motion vectors. We show that the optimal motion vectors can be obtained through a maximum a posteriori sequence estimation method. This method is however not practically implementable due to its high computational complexity and many unknown modeling parameters. However, motivated by this theoretical framework, we propose a modified algorithm which is simple, accurate, and fast. First a set of good motion vector candidates is determined. By examining the distribution of these motion vector candidates, we can estimate the noise level as well as select the best motion vector by using the temporal correlations. Then, the next search window can be predicted by examining the trend of the motion vector variation and the noise level (a higher noise level leading to a larger search window). In this way, we can reduce the number of search operations to less than 0.5% of that required by a full block search. The excellent performance of the proposed algorithm is demonstrated through extensive experiments.
Simple way to improve perceived quality of motion-compensated prediction images
Albert A. Deknuydt, Stefaan Desmet, Luc Van Eycken, et al.
Almost all current video codecs are of the hybrid type and thus use motion compensated prediction. The motion vectors are assigned to rectangular blocks rather than pixels, because vector search algorithms are rather calculation intensive and because transmitting vectors for all pixels requires a substantial bit rate all by itself. However, if we see the motion vectors we are willing to transmit as a sampling of an underlying motion vector field, it is obvious that we can reconstruct this underlying field better than by simply using the zeroth order approximation (block-wise constant assumption), which is generally taken. In this paper, a simple alternative to this block-wise constant assumption is proposed. Transmitted vectors are seen as samples of a rather smooth realistic motion field. This assumes a more intelligent motion vector scheme than plain MAD. Vectors used for non-sampling points are calculated by interpolation with vectors of neighboring blocks. Interpolation is done so that the transition between block boundaries is made as continuous as possible. This is achieved by a simple second order surface matching. To prove the validity of the scheme, several simulations were done. First, the perceived visual quality of the motion compensated prediction was evaluated. The scheme results in visually more acceptable predicted images, as their characteristic block appearance is greatly reduced. Secondly, the signal to noise ratio of the prediction was compared. Differences with the standard scheme are relatively small, and in either direction. Finally, a complete codec was tested with both prediction schemes.
Video Analysis
Segmentation of frames in a video sequence using motion and other attributes
Edmond Chalom, V. Michael Bove Jr.
Motion-compensated video coders typically segment a scene into arbitrary tiles, resulting in a compressed bitstream which is not physically or semantically related to the scene structure. This paper presents a method for segmenting video frames and coding motion of regions, where the regions are defined in terms of a number of different properties. The goal is a video coder which gives good compression while identifying coherent regions in a manner useful for both human users and automated scene-understanding processes. Both a supervised and an unsupervised clustering algorithm are used to segment an image sequence; both algorithms make use of multiple features including motion, texture, position, and color. By utilizing both the structure and motion information, we preserve the semantic/structural content of the different regions, and simultaneously remove the redundancy (in successive frames) by describing the motion information in each region with a six-parameter affine model. In the supervised clustering algorithm, the first frame is manually segmented and used as training data. The classification of subsequent frames is done automatically, by using a MAP estimate, and modeling the n-dimensional feature-space as jointly Gaussian. The unsupervised algorithm is an iterative process that reassigns the classification of each point to the region corresponding to the nearest mean among each region of the segmentation from the previous iteration. In both algorithms, the distance and/or the mean is an n-dimensional measurement, n being the number of features used.
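The unsupervised step described above, reassigning each point to the region with the nearest mean from the previous iteration, can be sketched as follows. This is an illustration only: it assumes each pixel is already described by an n-dimensional feature list (motion, texture, position, color), uses plain squared Euclidean distance, and the names `reassign`, `n_regions`, and `max_iter` are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two n-dimensional feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def reassign(features, labels, n_regions, max_iter=20):
    """Iteratively move each point to the region whose mean is nearest.

    features: one n-dimensional feature vector per point (assumed input).
    labels: initial region index per point, e.g. from a previous frame.
    """
    labels = list(labels)
    for _ in range(max_iter):
        # Region means are computed from the previous iteration's labels.
        means = {}
        for r in range(n_regions):
            members = [f for f, lab in zip(features, labels) if lab == r]
            if members:  # regions that lost all their points are dropped
                means[r] = mean(members)
        new_labels = [min(means, key=lambda r: squared_distance(f, means[r]))
                      for f in features]
        if new_labels == labels:
            break  # converged: no point changed region
        labels = new_labels
    return labels


if __name__ == "__main__":
    feats = [[0, 0], [0, 1], [10, 10], [10, 11], [1, 0]]
    print(reassign(feats, [0, 0, 1, 1, 1], n_regions=2))
```

In practice the features would need scaling so that no single attribute dominates the distance; the abstract does not say how the authors weight them.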
Mosaic-based video compression
Michal Irani, Stephen Hsu, P. Anandan
We describe a technique for video compression, based on a mosaic image representation obtained from all frames in a scene sequence, giving a panoramic view of the scene. We describe two types of mosaics, static and dynamic, which are suited for storage and transmission applications, respectively. In each case, the mosaic construction process aligns the images using a global parametric motion transformation, usually canceling the effect of camera motion on the dominant portion of the scene. The residual motions that are not compensated by the parametric motion are then analyzed for their significance and coded. The mosaic representation exploits large scale spatial and temporal correlations in image sequences. In many applications where there is significant camera motion (e.g., remote surveillance), it performs substantially better than traditional interframe compression methods, and offers the potential for very low bitrate transmission. In storage applications, such as digital libraries and video editing environments, it has the additional benefit of enabling direct access and retrieval of single frames at a time.
Encoding motion and approximate segmentation in a slicing floorplan tree structure
Haim Schweitzer, Yanjun Zhang
A new encoding scheme for motion and approximate scene segmentation in video sequences is described. The proposed segmentation partitions the scene into variable size rectangles that can be encoded efficiently in a data structure known as `a slicing floorplan tree' in VLSI design. The motion in each rectangle is represented by size parameters. We describe experiments comparing the compactness of this encoding to motion encoding in MPEG-1, and its quality to motion estimates obtained by standard computer vision techniques. These experiments establish two facts. (1) The proposed representation is flexible enough to describe motion as accurately as the output produced by the best motion algorithms developed by computer vision researchers. (2) The proposed representation is much more compact than the representation of motion in MPEG-1.
Multiresolutional region-based segmentation scheme for stereoscopic image compression
Sriram Sethuraman, Mel Siegel, Angel G. Jordan
Stereoscopic image sequence transmission over existing monocular digital transmission channels, without seriously affecting the quality of one of the image streams, requires a very low bit-rate coding of the additional stream. Fixed block-size based disparity estimation schemes cannot achieve such low bit-rates without causing severe edge artifacts. Also, textureless regions lead to spurious matches which hamper the efficient coding of block disparities. In this paper, we propose a novel disparity-based segmentation approach, to achieve an efficient partition of the image into regions of more or less fixed disparity. The partitions are edge based, in order to minimize the edge artifacts after disparity compensation. The scheme leads to disparity fields that preserve disparity discontinuities, yet are smoother and more accurate than those of fixed block-size based schemes. The smoothness and the reduced number of block disparities lead to efficient coding of one image of a stereo pair given the other. The segmentation is achieved by performing a quadtree decomposition, with the disparity compensated error as the splitting criterion. The multiresolutional recursive decomposition offers a computationally efficient and non-iterative means of improving the disparity estimates while preserving the disparity discontinuities. The segmented regions can be tracked temporally to achieve very high compression ratios on a stereoscopic image stream.
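The quadtree decomposition driven by a splitting criterion can be sketched generically. In this hedged example the caller supplies `error_fn(top, left, size)`, a hypothetical stand-in for the disparity-compensated error of a block; `threshold` and `min_size` are also assumptions, and the toy error in the usage example is simply the intensity range within the block.

```python
def quadtree(top, left, size, error_fn, threshold, min_size):
    """Recursively split a square block while its error exceeds the threshold.

    error_fn(top, left, size): caller-supplied error of the block (assumed).
    Returns the leaf blocks as (top, left, size) tuples.
    """
    if size <= min_size or error_fn(top, left, size) <= threshold:
        return [(top, left, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += quadtree(top + dy, left + dx, half,
                               error_fn, threshold, min_size)
    return leaves


if __name__ == "__main__":
    # Toy 8x8 image that is flat except for one pixel near the top-left corner.
    image = [[0] * 8 for _ in range(8)]
    image[1][1] = 9

    def block_range(top, left, size):
        values = [image[y][x]
                  for y in range(top, top + size)
                  for x in range(left, left + size)]
        return max(values) - min(values)

    for leaf in quadtree(0, 0, 8, block_range, threshold=0, min_size=2):
        print(leaf)
```

Only the quadrant containing the outlier is refined, so flat areas stay as large blocks and cost few parameters, which matches the motivation given in the abstract.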
Classification of objects in a video sequence
Bruno Carpentieri, James A. Storer
In this paper we review the Split-Merge video displacement estimation technique and show how this can be used, given a classification with integer labels of the objects in the first frame of a video sequence, to track this classification along the sequence while doing motion estimation.
Coding Methods and Techniques
Volume data compression using smoothed particle transformation
Mikio Nagasawa
In 3D entertainment oriented games, most geometrical objects are represented by polygon sets. Thus, the handling of polygons is highly optimized in the hardware of graphic workstations. However, 3D scientific simulations do not ordinarily use polygon data but rather volume data in its original form, such as the scalar array p(x,y,z). If we had an effective volume data compression algorithm and a standard way of representing this data, it could be used in a more efficient manner in the network environment. This paper presents a Smoothed Particle Transformation (SPT) algorithm that represents volume data at several levels of compressed resolution, starting from an original array description of a given variable. Representing data at various levels of detail is important for archiving huge data sets in the network environment. SPT contributes not only to data compression but also to resolution rearrangement. As a result, it speeds up the visualization of transferred data, especially for the direct volume rendering of compressed data.
Algorithm for fast fractal image compression
John Kominek
Fractal image compression is a promising new technology that may successfully provide a codec for PC-to-PC video communications. Unfortunately, the large amount of computation needed for the compression stage is a major obstacle that needs to be overcome. This paper introduces the Fast Fractal Image Compression algorithm, a new approach to breaking the `speed problem' that has plagued previous efforts. For still images, experiments show that at comparable quality levels the FFIC algorithm is 5 to 50 times faster than the current state of the art. Such an improvement brings real-time video applications within the reach of fractal mathematics.
Software-codec-based full motion video conferencing on the PC using visual pattern image sequence coding
Barry S. Barnett, Alan Conrad Bovik
This paper presents a real time full motion video conferencing system based on the Visual Pattern Image Sequence Coding (VPISC) software codec. The prototype system hardware is comprised of two personal computers, two camcorders, two frame grabbers, and an ethernet connection. The prototype system software has a simple structure. It runs under the Disk Operating System, and includes a user interface, a video I/O interface, an event driven network interface, and a free running or frame synchronous video codec that also acts as the controller for the video and network interfaces. Two video coders have been tested in this system. Simple implementations of Visual Pattern Image Coding and VPISC have both proven to support full motion video conferencing with good visual quality. Future work will concentrate on expanding this prototype to support the motion compensated version of VPISC, as well as encompassing point-to-point modem I/O and multiple network protocols. The application will be ported to multiple hardware platforms and operating systems. The motivation for developing this prototype system is to demonstrate the practicality of software based real time video codecs. Furthermore, software video codecs are not only cheaper, but are more flexible system solutions because they enable different computer platforms to exchange encoded video information without requiring on-board protocol compatible video codec hardware. Software based solutions enable true low cost video conferencing that fits the `open systems' model of interoperability that is so important for building portable hardware and software applications.
Knowledge-based approach to JPEG acceleration
Konrad Froitzheim, Heiner Wolf
JPEG picture compression and related algorithms are not only used in still picture compression, but also to a growing degree for moving picture compression in telecommunication applications. Real-time JPEG compression and decompression are crucial in these scenarios. We present a method to significantly improve the performance of software based JPEG decompression. Key to these performance gains are adequate knowledge of the structure of the JPEG coded picture information and transfer of structural information between consecutive processing steps. Our implementation achieved an 80% performance increase decompressing typical JPEG video streams.
New generation of real-time software-based video codec: popular video coder II (PVC-II)
Ho-Chao Huang, Ja-Ling Wu
A new generation of real-time software-based video coder, the Popular Video Coder II (PVC-II), is presented in the paper. The PVC-II simplifies the traditional video coder by removing the transform and the motion estimation parts and modifies the quantizer and entropy coder. Moreover, the PVC-II improves the coding performance of its previous version, the Popular Video Coder, by introducing several newly developed efficient coding techniques, such as the adaptive quantizer, the adaptive resolution reduction and the fixed-model intraframe DPCM, into the codec. The coding speed, compression ratio and picture quality of the PVC-II are good enough for applying it to various real-time multimedia applications. Since no compression hardware is needed for the PVC-II to encode and decode video data, the cost and complexity of developing multimedia applications, such as video phone and multimedia e-mail systems, can be greatly reduced.
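Intraframe DPCM, one of the techniques named in the abstract above, can be sketched generically as follows. The abstract does not give the PVC-II predictor or quantizer, so the previous-pixel predictor, the fixed uniform step size, the initial prediction of 128, and the function names are all assumptions chosen for illustration.

```python
def dpcm_encode(row, step=8):
    """Closed-loop DPCM of one pixel row: quantize the difference between each
    pixel and the decoder-side reconstruction of the previous pixel."""
    pred, codes = 128, []  # 128 is an assumed mid-gray starting prediction
    for pixel in row:
        q = (pixel - pred + step // 2) // step  # nearest quantizer index
        codes.append(q)
        pred = max(0, min(255, pred + q * step))  # track what the decoder will see
    return codes


def dpcm_decode(codes, step=8):
    """Rebuild the row by accumulating the dequantized differences."""
    pred, out = 128, []
    for q in codes:
        pred = max(0, min(255, pred + q * step))
        out.append(pred)
    return out


row = [130, 140, 150, 149, 90]
codes = dpcm_encode(row)
decoded = dpcm_decode(codes)
```

Because the encoder predicts from the reconstructed value rather than the original pixel, quantization error does not accumulate along the row; in this example every decoded pixel stays within half a step of the original.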
Poster Session
Wavelet-based scalable image compression
In this paper, we present scalable image compression algorithms based on the wavelet transform. Recently, the International Organization for Standardization (ISO) has proposed the JPEG standard for still image compression. The JPEG standard not only provides the basic features of compression (baseline algorithm) but also provides the framework for reconstructing images in different picture qualities and sizes. These features are referred to as SNR and spatial scalability, respectively. Spatial scalability can be implemented using the hierarchical mode in the JPEG standard. However, the standard does not specify the downsampling filters to be used for obtaining the progressively smaller images. A straightforward implementation would employ mean downsampling filters. However, such a filter does not perform very well in extracting the features from the full size image, resulting in poor quality images and a lower compression ratio. We present a wavelet transform based technique for achieving spatial scalability (within the framework of hierarchical mode). Our simulation results confirm the substantial performance improvement and superior subjective image quality obtained using the proposed technique. Most importantly, the wavelet based technique does not require any modifications to existing JPEG decoders.
Wavelet Coding
Optimal wavelet tree pruning for image coding
Yew Hock Ang, M. Bi, Sim Heng Ong
In this paper, an optimal image coding scheme based on wavelet decomposition and vector quantization is proposed. The selection of wavelet transformed coefficients for encoding is performed using an optimal tree-pruning algorithm. Optimum selection of wavelet coefficients is achieved by minimizing the overall residual quantization error of the pruned tree. The pruning process takes into consideration the image structure and exploits the spatial masking effect of the human visual system. This reduces the overall coding bit rate without sacrificing the perceptual quality in the reconstructed image. Vector quantization of the selected wavelet coefficients (pruned-tree) is performed using our proposed multiresolution product codebook. The optimal design of the product codebook is characterized by its minimized and equally distributed quantization distortions of individual sub-codebooks.
Efficient algorithm for video compression using the wavelet transform
Vitor Mendes Silva, Luis A. S. V. de Sa
An effective video coding algorithm is described. It is based on a new quadtree block merging algorithm and wavelet decomposition of video signals using orthonormal and biorthogonal filter banks. With the merging algorithm the motion compensated predicted images are represented by a small set of rectangular regions in order to improve the coding efficiency and to minimize the border effects associated with the wavelet transform. A new solution to the problem of processing finite length signals by wavelet transforms, based on time-varying perfect reconstruction filter banks, will be developed. Results of coding simulations using the merging algorithm with 8 x 8 and 16 x 16 blocks are presented. Also, subjects such as the choice of the optimal filter bank, boundary effects, optimal scalar quantization of the wavelet coefficients and prediction error filtering by wavelet decompositions will be discussed.
Coding Techniques and Implementations
Runlength encoding of quantized discrete cosine transform (DCT) coefficients
Viresh Ratnakar, Ephraim Feig, Eric Viscito, et al.
Runlength encoding is used in image and video compression methods to efficiently store quantized Discrete Cosine Transform coefficients. The coefficients for each block are scanned in a zig-zag fashion, and runs of zeros are entropy coded. In this paper we present a comparison of the bit-rate resulting from runlength encoding with the bit-rate calculated as the coefficient-wise sum of entropies. Our experiments with several images show that the two are very close in practice. This is a useful result, for example, for designing quantization matrices to meet any bit-rate requirement. We also present an analytical framework to study these bit-rates. We consider two variants of runlength encoding. In the first one, the symbols that are entropy-coded are (runlength, value) pairs. In the second variant, which is the one used in JPEG, values are grouped together into categories based on magnitude.
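The zig-zag scan and the first variant described above, in which the symbols are (runlength, value) pairs, can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names and the `EOB` end-of-block marker are assumptions, the DC coefficient is treated like any other coefficient for simplicity, and the entropy coding stage is omitted.

```python
EOB = "EOB"  # hypothetical end-of-block marker for trailing zeros


def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag scan order."""
    # Walk anti-diagonals in order; odd diagonals run top-right to bottom-left,
    # even diagonals run bottom-left to top-right.
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def run_length_pairs(block):
    """Convert a square block of quantized coefficients into
    (zero_run, value) pairs followed by EOB if the block ends in zeros."""
    coeffs = [block[r][c] for r, c in zigzag_order(len(block))]
    pairs, run = [], 0
    for v in coeffs:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    if run:
        pairs.append(EOB)
    return pairs


block = [
    [5, 3, 0, 0],
    [2, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]
pairs = run_length_pairs(block)
```

In the second variant mentioned in the abstract, the `value` in each pair would be replaced by a magnitude category plus extra bits, which shrinks the symbol alphabet seen by the entropy coder.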
Parallel implementation of an MPEG-1 encoder: faster than real time
In this paper we present an implementation of an MPEG1 encoder on the Intel Touchstone Delta and Intel Paragon parallel computers. We describe the unique aspects of mapping the algorithm onto the parallel machines and present several versions of the algorithms. We will show that I/O contention can be a bottleneck relative to performance. We will also describe how the Touchstone Delta and Paragon can be used to compress video sequences faster than real-time.
Scan image compression-encryption hardware system
Nikolaos G. Bourbakis, R. Brause, C. Alexopoulos
This paper deals with the hardware design of an image compression/encryption scheme called SCAN. The scheme is based on the principles and ideas reflected by the specification of the SCAN language. SCAN is a fractal based context-free language which sequentially accesses the data of a 2D array by describing and generating a wide range (near (nxn)) of space filling curves (or SCAN patterns) from a short set of simple ones. The SCAN method uses the algorithmic description of each 2D image as combinations of SCAN patterns for the compression and encryption of the image data. Note that each SCAN letter or word accesses the image data in a different order (or sequence); thus the application of a variety of SCAN words associated with the compression scheme will produce various compressed versions of the same image. The compressed versions are compared in memory size, and the one with the smallest size in bits could be used for the image compression/encryption. Note that the encryption of the image data is a result of the great number of possible space filling curves which could be generated by SCAN. Since the software implementation of the SCAN compression/encryption scheme requires some time, a hardware design and implementation of the SCAN scheme is necessary in order to reduce the image compression/encryption time to real time. The development of such an image compression/encryption system will have a significant impact on the transmission and storage of images. It will be applicable in multimedia and in the transmission of images through communication lines.
Fast VLSI architecture for 8 x 8 2D DCT
Hughes de Perthuis, E. Bercovici, A. de Grandmaison, et al.
The Discrete Cosine Transform (DCT) is one of the most popular lossy techniques used today in video compression schemes. It takes advantage of the properties of natural images: because they are continuous over small areas (typically 8 x 8 pixels), they admit a more compact description in the frequency domain than in the spatial domain. Coupled quantization brings further compression gain, since the high frequencies of the image, to which the human eye is less sensitive, can be degraded more heavily. The drawback is that the DCT puts heavy stress on computational resources and can be a bottleneck to cheap real-time video. We introduce a VLSI architecture which combines excellent performance with a small die size, using an algorithm which maps very well onto silicon. Through a reordering of the samples, the regularity and complexity of the computations involved are greatly improved. This allows the process to be divided into two parallel parts, one for even samples, the other for odd ones. As the number of coefficients required is decreased, fixed multipliers can be used. A simple join of the two parts' results, followed by a normalization merged with quantization, gives the 8 x 8 2D DCT after a total of 64 cycles.
Coding Methods and Techniques
Block adaptive classified vector quantization
Huy So Peter Truong, Stephen C.Y. Ho
As vector quantization (VQ) under low bit-rate constraint suffers serious reconstruction degradation, block adaptive classified VQ has been considered as an effective scheme not only to improve reconstructed image quality but also to reduce processing time. These desirable properties are achieved mainly by combining adaptive block segmentation with classified VQ, which results in a better adaptation of the coding process to the nature of images. Central to this scheme is the classification process. Its operation is based on both transform and spatial domains to achieve reasonable classification accuracy and simplicity. In making adaptive segmentation decisions, low-ordered DCT coefficients and dynamic range of pixel values are used to establish variable size blocks. For classified VQ, low-ordered DCT coefficients and contrast sensitivity are employed to characterize various edge orientations and positions within image blocks, respectively. An alternative approach for determining suitable codebook sizes to be used with classified VQ has been investigated with favorable trade-off between overall distortion and processing time. Following the block segmentation and classification, all image blocks are coded with VQ in the spatial domain to take advantage of its better rate-distortion performance. Reconstructed images with peak signal-to-noise ratios ranging from 28.2 to 35.3 dB have been obtained at coding rates between 0.27 and 0.46 bit-per-pixel.
Poster Session
High performance software MPEG video player for PCs
Stefan Eckart
This presentation describes the implementation of the video part of a high performance software MPEG player for PCs, capable of decoding both video and audio in real-time on a 90 MHz Pentium system. The basic program design concepts, the methods to achieve high performance, the quality versus speed trade-offs employed by the program, and performance figures, showing the contribution of the different decoding steps to the total computational effort, are presented. Several decoding stages work on up to four data words in parallel by splitting the 32 bit ALU into four virtual 8 bit ALUs. Care had to be taken to avoid arithmetic overflow in these stages. The 8 x 8 inverse DCT is based on a table driven symmetric forward-mapped algorithm which splits the IDCT into four 4 x 4 DCTs. In addition, the IDCT has been combined with the inverse quantization into a single computational step. The display process uses a fast 4 x 4 ordered dither algorithm in YUV space to quantize the 24 bit 4:2:0 YUV output of the decoder to the 8 bit color lookup table hardware of the PC.
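The idea of treating one 32 bit word as four virtual 8 bit ALUs, and the overflow care it requires, can be illustrated with two well-known bit tricks. The abstract does not say which operations the player packs or how, so the functions below are an assumed, generic sketch: the masks keep a carry from one 8 bit lane from spilling into its neighbor.

```python
LOW7 = 0x7F7F7F7F    # low seven bits of every 8 bit lane
HIGH1 = 0x80808080   # top bit of every 8 bit lane
NO_LSB = 0xFEFEFEFE  # every lane with its lowest bit cleared


def pack(lanes):
    """Pack four 0..255 values into one 32 bit word, lane 0 in the low byte."""
    return lanes[0] | (lanes[1] << 8) | (lanes[2] << 16) | (lanes[3] << 24)


def unpack(word):
    """Split a 32 bit word back into its four 8 bit lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def packed_add_u8(a, b):
    """Lane-wise addition modulo 256 using a single 32 bit add."""
    low = (a & LOW7) + (b & LOW7)         # carries cannot leave a lane
    return low ^ ((a ^ b) & HIGH1)        # fold the top bits back in without carry


def packed_avg_u8(a, b):
    """Lane-wise floor((a + b) / 2) without ever forming a 9 bit sum."""
    return (a & b) + (((a ^ b) & NO_LSB) >> 1)


sums = unpack(packed_add_u8(pack([255, 128, 64, 1]), pack([1, 128, 64, 2])))
avgs = unpack(packed_avg_u8(pack([255, 3, 0, 200]), pack([1, 4, 0, 100])))
```

The averaging form is the kind of operation that appears in half-pixel motion compensation, where the sum of two pixels would otherwise overflow an 8 bit lane.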
Reconstruction artifacts in digital video compression
Michael Yuen, Henry R. Wu
This paper surveys the visual distortions introduced by a compression scheme into the reconstruction of a video sequence. Specifically, the paper will concentrate on systems utilizing motion compensation (MC), differential pulse code modulation (DPCM), and the discrete cosine transform (DCT). Such systems are exemplified by the CCITT H.261 and MPEG standards. In addition to the artifacts that have already been widely considered, such as `blocking' and `mosquito' effects, new classifications of artifacts will be presented. A concise characterization and demonstration of each artifact will be provided. This will include the specification of the distribution of the artifact within the reconstructed frames, and its correlation with the local spatial/temporal features within the sequence. Also noted will be the specific causes of the artifacts, with relation to the MC/DPCM/DCT components. Since the human visual system is, typically, the final judge of the quality of the reconstructed sequence, it is also important to note the level of severity of the artifacts that make the artifacts visually noticeable.
Wavelet Coding
Feature-preserving wavelet scheme for low bit rate coding
Po-Yuen Cheng, C.-C. Jay Kuo
In this research, we propose a novel low bit rate video compression scheme that consists of three major components: hybrid wavelet-JPEG compression, feature-preserving compression via multizone decomposition and window selection and tracking.
Poster Session
Real-time MPEG-1 software decoding on HP workstations
Vasudev Bhaskaran, Konstantinos Konstantinides, Ruby B. Lee
An MPEG1 codec that is capable of real-time playback of video and audio on HP's RISC-based workstations has been developed. Real-time playback is achieved by examining the complete MPEG1 decoding pipeline and optimizing the algorithms used for the various stages of the video and audio decompression process. For video decompression, efficient implementations are derived by examining the Huffman decoder, inverse quantizer and inverse DCT as a single system. For audio decompression, by viewing the subband synthesis function in MPEG1 layer I and layer II decoding as a DCT, a fast 32-point DCT suitable for use in MPEG1 audio decompression has been developed. Besides algorithmic enhancements, in order to achieve real-time performance, minor changes to the CPU architecture and the display pipeline architecture were needed. The integration of algorithmic and architectural enhancements results in real-time playback of MPEG1 video and audio on HP's RISC-based workstations.
DCT-based scheme for lossless image compression
Giridhar D. Mandyam, Nasir Ahmed, Neeraj Magotra, et al.
In this paper, a new method to achieve lossless compression of 2D images based on the discrete cosine transform (DCT) is proposed. This method quantizes the high energy DCT coefficients in each block, finds an inverse DCT from only these quantized coefficients, and forms an error residual sequence to be coded. Furthermore, a simple delta modulation scheme is performed on the coefficients that exploits correlation between high energy DCT coefficients in neighboring blocks of an image. The resulting sequence is compressed by using an entropy coder, and simulations show the results to be promising and more effective than just simply performing entropy coding on original raw image data.
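The core loop described above (quantize the high energy DCT coefficients, invert only those, and code the residual) can be sketched in one dimension. This is an assumed simplification, not the authors' method: it uses a 1D orthonormal DCT on a single block, a fixed step size, and hypothetical function names, and it omits the delta modulation between neighboring blocks and the final entropy coder.

```python
import math


def _scale(k, n):
    """Orthonormal DCT-II scaling factor for coefficient k of an n-point block."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct(x):
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                               for i in range(n)) for k in range(n)]


def idct(coeffs):
    n = len(coeffs)
    return [sum(_scale(k, n) * coeffs[k] * math.cos(math.pi * (i + 0.5) * k / n)
                for k in range(n)) for i in range(n)]


def predict(quantized, n, step):
    """Integer approximation rebuilt from the kept, dequantized coefficients only."""
    coeffs = [quantized.get(k, 0) * step for k in range(n)]
    return [round(v) for v in idct(coeffs)]


def encode(block, keep=2, step=4):
    """Keep the `keep` largest-magnitude coefficients and the integer residual."""
    coeffs = dct(block)
    top = sorted(range(len(coeffs)), key=lambda k: -abs(coeffs[k]))[:keep]
    quantized = {k: round(coeffs[k] / step) for k in top}
    approx = predict(quantized, len(block), step)
    residual = [b - a for b, a in zip(block, approx)]
    return quantized, residual


def decode(quantized, residual, step=4):
    approx = predict(quantized, len(residual), step)
    return [a + r for a, r in zip(approx, residual)]


block = [52, 55, 61, 66, 70, 61, 64, 73]
quantized, residual = encode(block)
restored = decode(quantized, residual)
```

Reconstruction is exact because the decoder recomputes the same integer approximation from the same quantized coefficients; the compression gain would come from the residual having a narrower distribution than the raw samples.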
Wavelet codec for image sequence coding at very low bit rate with low latency
Steffen Abraham, Fritz Seytter
Wavelet theory provides an alternative approach to traditional DCT-based coding with properties that suggest an advantage over traditional schemes, especially for high compression ratios. This paper presents the design of an image coder for very low bit rates (8 to 64 kbit/s) which can be used for videophone applications. Apart from image quality, coding delay is a critical factor. The coder performs a hybrid motion compensation/transform coding approach using a warping motion compensation to completely avoid blocking artifacts. Experiments using standard videophone test sequences show that the obtained image quality is similar to the best available results from DCT coders.
Very low bit rate coding for PSTN videotelephony on personal computer: part 2
Jean-Claude Schmitt, Gerard Eude
Part 1 of this paper was presented at the 1994 SPIE/IS&T Symposium. It described dedicated hardware executing a proprietary VCS1 algorithm (based on COST211ter Group specifications). This hardware was connected via a serial link to a Macintosh computer in order to build a video telephony application on the Public Switched Telephone Network in a Macintosh environment. This codec has evolved to support TMN4, which is today's state of the art from the ITU-T group for very low bit rate coding. Due to the low frame rate of encoded pictures and the increasing power available on personal computers, the implementation of such an algorithm in software is now possible. The software decoder part is now running in real time on different Macintosh personal computers. In this paper, the decoder implementation is described and the time spent in the different parts of the decoding process is discussed. Two comparisons are made: on the one hand, the decoder written in C versus 680x0 assembly language; on the other hand, the decoder implemented on 680x0 class machines versus PowerMacintosh class machines.
Performance of real-time software-only H.261 codec on the Power Macintosh
Hsi-Jung Wu, Katherine S. Wang, James O. Normile, et al.
The widespread use of teleconferencing as a major mode of remote communication has until now been stymied by the costs associated with deploying the specialized hardware required to achieve good performance. Another constraint has been the incompatibility among the various systems that are available. These facts, coupled with the rapid increase in computational power available on desktop systems, convinced us of the value of a standards-based software solution. Leveraging the performance of the PowerPC RISC processors, we have implemented a software-only realization of the CCITT H.261 video coding standard. In this paper, we will discuss the performance of the software codec, which has been optimized for the Power Macintosh 8100. Over ISDN and loaded Ethernet, the codec provides good visual quality in terms of spatial quality and frame rate over a range of bit rates (less than 64 to 384 Kbits/s). We will outline the structure of the software codec and discuss its performance.
Multiplication free scaled 8 x 8 DCT algorithm with 530 additions
Leonid Kasperovich
The known idea of rational DCT computation employs approximate multiplication; overall accuracy is not affected because of the rounding and truncation that are intrinsic to the quantization process in image/video compression. Besides higher computational efficiency, the multiplication free scheme can take advantage of the full-length processor word of 32-bit or the newest 64-bit microprocessors by being applied to packed pixel pairs. In this paper we propose a scaled DCT algorithm that uses 30 `essential' multiplications by √2 and 64 `non-essential' multiplications by three constants (which are close to 5, 3 and 2). The number of additions in the approximation version of the algorithm is considerably smaller than in the known multiplication free 8 x 8 DCT computational scheme. Another advantage of the algorithm is its symmetry, i.e. almost all the computational modules used in the forward and inverse DCT implementations are the same rather than transposed ones. The performance of software implementations of an MJPEG codec for the Pentium PC, based on the algorithm presented, as well as its applicability to high resolution software-only playback in various video standards, is also discussed.
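Approximating a multiplication with shifts and additions can be illustrated on the factor √2. The rational constant 181/128 and the function name below are assumptions chosen for illustration; the abstract does not state which approximations the algorithm actually uses.

```python
def mul_sqrt2(x):
    """Approximate x * sqrt(2) as x * 181 / 128 using only shifts and adds.

    181 = 128 + 32 + 16 + 4 + 1, so the product needs four additions,
    and the division by 128 is a right shift by 7.
    """
    return ((x << 7) + (x << 5) + (x << 4) + (x << 2) + x) >> 7


approx = mul_sqrt2(1000)
```

Since 181/128 = 1.4140625 differs from √2 by about 0.00015, the error for 8 bit inputs is dominated by the final truncation, which is the kind of error the abstract argues is absorbed by quantization.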
Coding Techniques and Implementations
New inner product algorithm of the two-dimensional DCT
Bela Feher
The 2D discrete cosine transform (2D DCT) is one of the most effective methods in image data compression. In this paper an inner product algorithm for the 8 x 8 2D DCT implementation is presented. The proposed direct 2D inner product algorithm exploits redundancies down to the bit level, and results in minimal hardware complexity. The basic algorithm separates the computation into 8 subtransforms, according to the different cosine function values. Utilizing the odd-even property of the DCT, every transformed coefficient is expressed as a 4-point inner product operation. The inner product processors are realized by an efficient distributed vector multiplication arrangement. All of the numerical parameters are built into the inner product processors, so the arithmetic complexity is partly transferred to the internal topology of the units. The selected globally parallel, locally serial implementation style features basic serial processing elements and low communication cost. It is ideal for FPGA implementation, where the available chip area is a priori partitioned between logical and routing resources. The fully concurrent bit-serial pipeline architecture needs fewer than 1000 arithmetic primitives. Assuming a 30 MHz bit clock rate in the Xilinx FPGA, the available throughput is 1 million 2D 8 x 8 DCT transforms/sec.
Wavelet Coding
Scalable image compression using combined wavelet transform and vector quantization
Sethuraman Panchanathan, A. Jain, N. Gamaz
In this paper, we propose a vector quantization (VQ) based scheme for scalable image compression. Scalability is a generic feature referring to image representation in different sizes (spatial scalability) and/or picture qualities (SNR scalability). VQ is a powerful technique for low bit rate image compression. However, the conventional VQ approach does not provide a scalable bitstream. We propose a VQ based algorithm (SVQ) to achieve spatial scalability. In the SVQ technique, a pyramidal structure of three layers is built by applying 2D wavelet downsampling filters to the input image. The label stream is then made scalable such that a smaller-size image can be obtained by decoding a portion of the bitstream. This image can be further enhanced in size by progressively decoding the remaining bits. This algorithm ensures partial decodability of VQ labels by using separate codebooks, one for each spatial resolution. We then propose a combination of the wavelet transform and the SVQ technique, called WSVQ, to exploit the cross-correlations among wavelet sub-bands. Simulation results confirm the substantial reductions in bit rate and superior subjective image quality at each spatial resolution using the proposed algorithm, at a significantly reduced computational complexity.