Proceedings Volume 2952

Digital Compression Technologies and Systems for Video Communications


View the digital version of this volume at SPIE Digital Library.

Volume Details

Date Published: 16 September 1996
Contents: 14 Sessions, 70 Papers, 0 Presentations
Conference: Advanced Imaging and Network Technologies 1996
Volume Number: 2952

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Video Coding
  • Pre- and Postprocessing
  • Image/Video Coding and Processing
  • Additional Papers
  • Image/Video Coding and Processing
  • Additional Papers
  • Image/Video Coding and Processing
  • Video Transmission
  • Medical and Very High Quality Images
  • Object-Based Motion Compensation
  • Poster Session I: Image/Video Coding
  • Motion Estimation and Compensation
  • Poster Session I: Image/Video Coding
  • MPEG-2 Optimization and Applications
  • Untitled Session
  • Image Compression
  • Additional Papers
  • Image Compression
  • Poster Session II: Optimization, Implementation, and Applications of CODECs
  • Fractal-Based Coding
Video Coding
Image sequence segmentation for object-oriented coding
Achim Ibenthal, Sven Siggelkow, Rolf-Rainer Grigat
An algorithm for the segmentation of image sequences is presented, with particular attention to the requirements of object-oriented coding. A fundamental requirement of such applications is the temporal stability of the segmentation. Compared to existing approaches, this is improved here by including motion estimation in the segmentation process. Additionally, a hierarchical approach enables efficient predictive coding on the one hand and semantic data access on the other. As a direct result of using full color information in the segmentation process, the chrominance information can be coded at extremely high compression ratios: instead of full-resolution chrominance information, only a few mean chrominance vectors need to be transferred (corresponding to a compression factor of about 1000). Additionally, object shapes must be coded, but this has to be done for grayscale images anyway.
Multiview video coding using a multistate dynamic programming disparity estimation algorithm
Nikos Grammalidis, Michael G. Strintzis
An efficient disparity estimation and occlusion detection and characterization algorithm for multiocular systems is presented. A dynamic programming algorithm, using a multiview matching cost as well as purely geometrical constraints, is used to provide an estimate of the disparity field and to identify occluded areas. An important advantage of this approach is that not only are the occluded points simultaneously detected, but they are also characterized by the number of views where each point is occluded. Specifically, a 'state' map describing the number of matches for each imaged pixel and thus identifying occluded points in the multiview sequence is produced. The disparity and state information is then coded using a directional coding technique and applied to obtain virtual images from intermediate viewpoints. Experimental results, obtained using a four-view image sequence, illustrate the performance of the proposed technique.
Estimation of eye and mouth corner point positions in a knowledge-based coding system
Automatic extraction of facial feature points is one of the main problems in semantic coding of videophone sequences at very low bit rates. In this contribution, an approach for estimating the eye and mouth corner point positions is presented. For this purpose, the location information of the face model is exploited to define search areas for the estimation of the eye and mouth corner point positions. Then, the eye and mouth corner point positions are estimated based on a template matching technique with eye and mouth corner templates. Finally, in order to verify these estimated corner point positions, some geometric conditions between the corner point positions and the center point positions of the eyes and the mouth are exploited. The proposed algorithm has been applied to the test sequences Claire and Miss America with a spatial resolution corresponding to CIF and a frame rate of 10 Hz.
Object-based system for stereoscopic videoconferencing with viewpoint adaptation
This paper describes algorithms that were developed for a stereoscopic videoconferencing system with viewpoint adaptation. The system identifies foreground and background regions, and applies disparity estimation to the foreground object, namely the person sitting in front of a stereoscopic camera system with rather large baseline. A hierarchical block matching algorithm is employed for this purpose, which takes into account the position of high-variance feature points and the object/background border positions. Using the disparity estimator's output, it is possible to generate arbitrary intermediate views from the left- and right-view images. We have developed an object-based interpolation algorithm, which produces high-quality results. It takes into account the fact that a person's face has a more or less convex surface. Interpolation weights are derived both from the position of the intermediate view, and from the position of a specific point within the face. The algorithms have been designed for a realtime videoconferencing system with telepresence illusion. Therefore, an important aspect during development was the constraint of hardware feasibility, while sufficient quality of the intermediate view images had still to be retained.
Pre- and Postprocessing
Quality improvement of low-data-rate compressed video signals by pre- and postprocessing
Robert Kutka, Andre Kaup, M. Hager
In this paper a technique for improving the image quality of block-based video coders is presented which combines the following pre- and postprocessing steps. Before the coding process, the image sharpness is enhanced by applying a special prefilter to the original image; this compensates for a possible degradation of image sharpness by the coder. At the decoder, the output is first smoothed by a lowpass filter, which reduces artifacts such as block discontinuities. Secondly, the quantization error of the DCT coefficients is reduced by predicting AC coefficients from the mean values of the neighboring image blocks; this step improves the luminance function within the blocks. While the original sequences show coarse blocking patterns, the images processed by this technique look smooth overall, with well-preserved local edge structures. The tool supports all standardized DCT codecs, such as MPEG-1/2 or H.261/3.
Optimal quad-tree-based motion estimator
Guido M. Schuster, Aggelos K. Katsaggelos
In this paper we propose an optimal quad-tree (QT)-based motion estimator for video compression. It is optimal in the sense that for a given bit budget for encoding the displacement vector field (DVF) and the QT segmentation, the scheme finds a DVF and a QT segmentation which minimizes the energy of the resulting displaced frame difference (DFD). We find the optimal QT decomposition and the optimal DVF jointly using the Lagrangian multiplier method and a multilevel dynamic program. The resulting DVF is spatially inhomogeneous since large blocks are used in areas with simple motion and small blocks in areas with complex motion. We present results with the proposed QT-based motion estimator which show that for the same DFD energy the proposed estimator uses about 30% fewer bits than the commonly used block matching algorithm.
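As a rough illustration of the Lagrangian formulation described above (not the authors' implementation; the block-splitting rule, search range, rate model, and lambda below are illustrative assumptions), a quad-tree node is kept or split according to whichever choice minimizes J = D + lambda * R:

```python
import numpy as np

def sad(a, b):
    """Displaced frame difference energy, here a sum of absolute differences."""
    return np.abs(a.astype(int) - b.astype(int)).sum()

def qt_cost(cur, ref, y, x, size, lam, rate_per_node=20):
    """Recursively decide 'keep this block' vs. 'split into four' by comparing
    Lagrangian costs J = D + lambda * R (toy search range and rate model)."""
    best_d = min(
        sad(cur[y:y+size, x:x+size], ref[y+dy:y+dy+size, x+dx:x+dx+size])
        for dy in range(-2, 3) for dx in range(-2, 3)
        if 0 <= y+dy and y+dy+size <= ref.shape[0]
        and 0 <= x+dx and x+dx+size <= ref.shape[1])
    leaf_cost = best_d + lam * rate_per_node        # code one vector for this block
    if size <= 4:
        return leaf_cost
    half = size // 2
    split_cost = lam * rate_per_node + sum(         # pay for the split, then recurse
        qt_cost(cur, ref, y+oy, x+ox, half, lam)
        for oy in (0, half) for ox in (0, half))
    return min(leaf_cost, split_cost)
```

Sweeping lambda trades DFD energy against the bit budget for the vector field and segmentation, which is the quantity the paper optimizes jointly.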
Temporal filtering of coded video
Iain E. Garden Richardson, Andrew G. Turnbull, Martyn J. Riley
We describe a novel approach to filtering coded video data. MPEG video is filtered to reduce frame rate and bandwidth by dropping coded B-pictures. The choice of B-pictures to be dropped is made based on an 'activity measure' which indicates the amount of motion within the video scene. We present preliminary results which indicate that this method produces a filtered sequence with higher visual quality than a sequence produced by dropping B-pictures at regular intervals.
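A toy sketch of the idea (the activity measure, threshold, and data layout are placeholders, not the authors'): coded B-pictures whose measured activity falls below a threshold are dropped, so low-motion frames are removed before high-motion ones.

```python
def filter_b_pictures(pictures, activity, threshold):
    """Drop B-pictures with low scene activity; keep all I- and P-pictures.
    `pictures` is a list of (picture_type, coded_bytes) tuples and `activity`
    a parallel list of per-picture activity values (e.g. motion-vector energy)."""
    kept = []
    for (ptype, coded), act in zip(pictures, activity):
        if ptype == 'B' and act < threshold:
            continue  # dropping a B-picture never breaks other pictures' references
        kept.append((ptype, coded))
    return kept
```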
Parameter estimation in regularized reconstruction of BDCT compressed images for reducing blocking artifacts
High compression ratios for both still images and sequences of images are usually achieved by discarding information represented by block discrete cosine transform (BDCT) coefficients which is considered unimportant. This compression procedure yields images that exhibit annoying block artifacts. In this paper we examine the reconstruction of BDCT compressed images which results in the removal of the blocking artifact. The method we propose for the reconstruction of such images is based on a hierarchical Bayesian approach. With such an approach, image and degradation models are required. In addition, unknown hyperparameters, usually the noise and image variances, have to be estimated in advance or simultaneously with the reconstructed image. We show how to introduce knowledge about these parameters into the reconstruction procedure. The proposed algorithm is tested experimentally.
Iterative regularized error concealment algorithm
In this paper, we propose an iterative regularized error concealment algorithm. The coded image can be degraded by channel errors, network congestion, and switching system problems, so information loss may result in seriously degraded images. When the structure of the image and video codec is hierarchical, the degradation may be worse because of the inter-dependence of the coded information. In order to solve the error concealment problem for compressed images, we use an iterative regularized algorithm. We analyze the need for an oriented high-pass operator we introduce and the requirement to change the initial condition when all the quantized DCT coefficients in a block are lost. Several experimental results are presented.
Image/Video Coding and Processing
Comparison of lossless coding techniques for screened continuous-tone images
Koen N.A. Denecker, Steven Van Assche, Wilfried R. Philips, et al.
Screening of color-separated continuous-tone photographic images produces large high-resolution black-and-white images (up to 5000 dpi). Storing such images on disk or transmitting them to a remote imagesetter is an expensive and time-consuming task, which makes lossless compression desirable. Since a screened photographic image may be viewed as a rotated rectangular grid of large half-tone dots, each made up of a number of microdots, we suspect that compression results obtained on the CCITT test images might not apply to high-resolution screened images and that the default parameters of many existing compression algorithms may not be optimal. In this paper we compare the performance, on high-resolution screened images, of lossless one-dimensional general-purpose byte-oriented statistical and dictionary-based coders as well as lossless coders designed for compression of two-dimensional bilevel images. The general-purpose coders are GZIP (LZ77 by GNU), TIFF LZW and STAT (an optimized PPM compressor by Bellard). The non-adaptive two-dimensional black-and-white coders are TIFF Group 3 and TIFF Group 4 (formerly published fax standards by CCITT). The adaptive two-dimensional coders are BILEVEL coding (by Witten et al.) and JBIG (the latest fax standard). First we compared the methods without tuning their parameters. We found that both in compression ratio (CR) and speed, JBIG (CR 7.3) was best, followed by STAT (CR 6.3) and BILEVEL coding (CR 6.0). Some results are remarkable: STAT works very well despite its one-dimensional approach; JBIG beats BILEVEL coding on high-resolution images though BILEVEL coding is better on the CCITT images; and finally, TIFF Group 4 (CR 3.2) and TIFF Group 3 (CR 2.7) cannot compete with any of these three methods. Next, we fine-tuned the parameters for JBIG and BILEVEL coding, which increased their compression ratios to 8.0 and 6.7 respectively.
Additional Papers
Gradual cut detection using low-level vision for digital video
Jae-Hyun Lee, Yeun-Sung Choi, Ok-bae Jang
Digital video computing and organization is one of the important issues in multimedia systems, signal compression, and databases. Video should be segmented into shots to be used for identification and indexing. This approach requires a suitable method to automatically locate cut points in order to separate the shots in a video. Automatic cut detection to isolate shots in a video has received considerable attention due to its many practical applications: video databases, browsing, authoring systems, retrieval, and movies. Previous studies are based on a set of difference measures of the content changes between video frames, but they cannot detect gradual special effects such as dissolve, wipe, fade-in, fade-out, and structured flashing. In this paper, a new cut detection method for gradual transitions based on computer vision techniques is proposed. Experimental results on commercial video are then presented and evaluated.
Image/Video Coding and Processing
Performance analysis of a multivideo traffic communication network
Raymond Wai-Leung Cheung, Peter C. K. Liu
In this paper, we investigate a method to find the cell loss probability versus buffer size for different types of video traffic in a statistical multiplexer. The fluid-flow model has been used to estimate the average cell loss of videophone traffic; this approach works well for video traffic whose mean and variance are small. Conversely, the mean and variance of video traffic such as CATV and Studio_TV are comparatively large when compared with those of videophone traffic. The fluid-flow approach cannot be used to determine the average cell loss probabilities for video traffic with a high mean rate because of the unbalancing of the equation roots. However, a motion video source can be represented by a mixture of Gaussian densities, and the sequence of coded bit rates can be modeled more accurately by an autoregressive process whose parameters vary in time according to the state of a discrete Markov chain. This implies that a bursty traffic source can be represented by a number of equal virtual sources with relatively small mean and variance. The fluid-flow model works satisfactorily with such a virtual source because of its appropriate mean and variance. When the correlation between these equal virtual sources is properly identified using a discrete-time Markov chain, the result generated by one virtual source can be used to predict the total cell loss probability of the bursty traffic source. The same technique can be applied to other video traffic whose mean and variance are comparatively large. Thus, the average cell loss probabilities against buffer size in a heterogeneous traffic network can be determined.
Intraframe compression of radar image sequences for ship traffic control applications
Alessandro Andreadis, Giuliano Benelli, Andrea Garzelli, et al.
In this paper, an intraframe scheme for high compression of X-band radar images for ship traffic control is proposed. We used a proprietary radar simulator which generates maritime scenarios as seen by one or more radar sites. We propose a modified adaptive discrete cosine transform (MADCT) scheme which allows us to classify each 8 by 8 image block by means of a threshold criterion based on ac and dc activity. The strategy of transmission of the DCT coefficients, the recovering process of blocks incorrectly discarded, and the bit-allocation phase have been properly designed to fit with the particular application. Accurate experimental results, in terms of PSNR and compression ratio, prove the superiority of the novel scheme with respect to standard coding techniques.
Additional Papers
Proposal for JPEG: thumbnail-based image access/retrieval
Harvey A. Cohen
Image compression codecs should be adapted to the practical reality that most images are first viewed as thumbnail images before the full-size image is accessed. Existing JPEG variants, progressive and hierarchical, could be used for thumbnail-based image access, but would involve a total re-engineering of existing libraries. An alternate proposal is made for a new variant of JPEG coding based on a reversible transformation of the JPEG output of existing encoders, so that existing decoders/encoders can be utilized, and existing JPEG libraries can be modified with no alteration to the quality of full-size images after decompression. In the proposed scheme, the image code is partitioned into a thumb part and the remainder, the FF part. The thumb part is sufficient for producing an image thumbnail, while the thumb part together with the FF part is required for full image reconstruction.
Image/Video Coding and Processing
Image quality prediction for bit-rate allocation
Pascal Fleury, Touradj Ebrahimi
Recent developments in image coding tend to promote schemes consisting of a great variety of coding algorithms applied to different parts of the image to be coded. This results in an improved rate-distortion behavior of the global system. The selection of the optimal coding method for a given region of an image is still a computationally intensive task, as most current schemes need to compute the result for all algorithms and choose the most suitable one. This paper investigates a method to predict coding performance. The prediction is based on features extracted from the input image. The computation of those features, as well as the prediction itself, is computationally much less expensive than the exhaustive selection. The prediction system is based on neural networks; selected image features and a set of representations of those features form the input of the network. The low computation cost also enables a dynamic distribution of the fixed bit rate over the different parts of the image, and therefore an algorithm capable of allocating bits to reach constant quality over the whole image.
Measurements of JND property of HVS and its applications to image segmentation, coding, and requantization
Day-Fann Shen, Shih-Chang Wang
In this paper, we measure the gray-level JND (just noticeable difference) property of the human visual system directly under various viewing conditions. We then develop three image processing applications using the measured JND data. First, a JND-based image segmentation algorithm for coding purposes is proposed. The algorithm operates on a pyramid data structure and uses the JND property as the merge criterion, which is computationally simple yet proven effective and robust in segmenting various images such as Lena, Salesman, etc. Second, the blocky artifacts normally seen in segmented images can be reduced by encoding the difference image between the original image and its segmented version; with slight modifications, the JND-based segmentation algorithm can effectively segment the difference image for the proposed two-pass progressive image coding. Finally, the measured JND data show that 55 gray levels per pixel are sufficient to represent an image under normal viewing conditions and that 64 gray levels are sufficient under any viewing condition. An image requantization algorithm is then proposed and its effectiveness verified.
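As a minimal sketch of a JND-based merge test of the kind described (the table lookup and region representation here are assumptions, not the paper's measured data), two neighbouring regions are merged when the difference of their mean gray levels is below the JND at that background luminance:

```python
def should_merge(region_a, region_b, jnd_table):
    """Merge two neighbouring regions when the difference of their mean gray
    levels is not noticeable.  `jnd_table[g]` is the measured JND (in gray
    levels) at background gray level g; regions are flat lists of gray values."""
    mean_a = sum(region_a) / len(region_a)
    mean_b = sum(region_b) / len(region_b)
    background = int(round((mean_a + mean_b) / 2))
    return abs(mean_a - mean_b) <= jnd_table[background]
```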
Subjective assessment of concatenated compression systems
Constantin Glasman, Victor Andronov, Oleg Vasilyev, et al.
Traditional objective measurements are of limited effectiveness in predicting the quality of compressed images. Subjective assessment is currently the most reliable method of evaluating the performance of compression systems, but it is time-consuming, which is why it is important to simplify the subjective measurement procedure. One way to achieve this is the method of paired comparisons. This paper describes applications of the method of paired comparisons to the assessment of concatenated compression systems.
Two-layer video coding using pyramid structure for ATM networks
Chang-Bum Lee, Seung Hoon Hong, Rae-Hong Park
In the transmission of image sequences over ATM networks, channel sharing efficiency under packet loss conditions is important. Two-layer video coding methods have been proposed as one possible approach; these methods transmit video information over the network with different levels of protection against packet loss. In this paper, a two-layer coding method using a pyramid structure is proposed, and several realizations of two-layer video coding methods are presented and their performances compared.
Noise reduction in DCT-coded images by estimating optimal quantized values
Masayuki Tanimoto, Kiyohito Narita, Toshiaki Fujii, et al.
This paper proposes a new scheme to reduce the quantization noise in DCT coded images by estimating optimal quantized values. In the proposed scheme, the quantized value of DCT coefficients is shifted from the middle of the quantization step to the mean value of the amplitude distribution of DCT coefficients in the quantization step. The values of the shift minimizing the quantization noise power are obtained experimentally. A simple scheme approximating these values is derived so that the modification factors of the quantized values can be estimated at the decoder. About 0.5 dB improvement of SNR is achieved experimentally by the proposed scheme. By smoothing the boundaries of DCT blocks in the decoded images further, 1.0 dB improvement of SNR is achieved and visually better images are obtained.
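For intuition (an illustrative model only; the paper obtains the shift values experimentally and approximates them at the decoder), if DCT coefficient amplitudes are assumed to follow a one-sided exponential density, the reconstruction value that minimizes the quantization noise power in a bin is the conditional mean, which lies below the bin midpoint:

```python
import numpy as np

def conditional_mean_reconstruction(k, Q, b):
    """MSE-optimal reconstruction level for a coefficient quantized into the
    bin [k*Q, (k+1)*Q), assuming an amplitude density proportional to exp(-b*x).
    The result is shifted from the bin midpoint toward zero."""
    lo, hi = k * Q, (k + 1) * Q
    num = (lo + 1.0 / b) * np.exp(-b * lo) - (hi + 1.0 / b) * np.exp(-b * hi)
    den = np.exp(-b * lo) - np.exp(-b * hi)
    return num / den

# Example: the bin [8, 16) with decay rate 0.2 reconstructs near 11.0,
# not at the bin midpoint 12.
print(conditional_mean_reconstruction(1, 8, 0.2))
```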
Signal extensions in perfect reconstruction filter banks
Joao Carlos Silvestre, Luis A. S. V. de Sa
The wavelet transform is defined for infinite-length signals. In practice we only have finite-length signals, so signals must be extended before they can be transformed. The question is how to extend the signal to minimize signal end effects, or how to find the signal extension that preserves the transform signal length. In this paper we discuss the problem of signal extension in perfect reconstruction filter banks.
Video Transmission
Real-time MPEG-1 video transmission over local area networks
Anastasios N. Delopoulos, Dimitrios Kalogeras, Vassilios Alexopoulos, et al.
This work presents the architecture of an MPEG-1 stream transmission system suitable for point-to-point transfer of live video and audio over TCP/IP local area networks. The hardware and software modules of the system are presented as well. Experimental results on the statistical behavior of the generated and transmitted MPEG-1 stream are reported.
Lessons learned on reliable transmission of real-time MPEG-2 streams over ATM
Andrea Basso, Mehmet Reha Civanlar, Glenn L. Cash
This paper describes a system that has been designed and built at AT&T Bell Labs for studying the transmission of real-time MPEG-2 video over ATM networks for multicast applications. The set-up comprises a hardware real-time MPEG-2 video, audio and system encoder, an ATM network adaptation module for MPEG-2 transport over AAL-5, an ATM switch, a software system decoder and a hardware elementary stream decoder. The MPEG-2 transport stream has been characterized in terms of robustness to errors. This preliminary study showed the greater importance of the structural information of the stream (PES packet headers, TS headers, sequence and picture headers, etc.) relative to the coded video data (motion vectors, DCT coefficients, etc.). A brief study of current MPEG-2 hardware decoding architectures allowed us to better understand the effects of bit-stream errors on the resulting video quality. In our experiments, while the loss of some structural data such as picture start codes led the hardware decoder to lose synchronization or to freeze, the loss of video data only affected the image quality. Furthermore, the recovery times from a loss of synchronization were orders of magnitude higher than the recovery from some video data loss. An error-resilient real-time software transport stream decoder has been developed. In multiplex-wide operations (i.e. operations on the entire transport stream) it takes advantage of ring buffers and manages the timing information appropriately. In video-stream-specific operations it uses resynchronization mechanisms at the picture level which exploit the redundancy of the PES and transport stream syntax. Furthermore, timed data transfers between the system decoder and the elementary stream decoder are employed. Experiments show that proper use of these methods can significantly improve the system performance.
Digital network for video surveillance and video distribution
Herman A.T. Claus
Siemens Atea n.v. developed a digital 600 Mbps network based on a counter-rotating optical ring. On this network a video subsystem has been developed, based on M-JPEG compression technology. Videostream switching has been integrated in the network. The resulting system can be used for video collection (surveillance) and for video distribution. This article describes some of the technical features of the system, and technology choices that have been made.
Digital watermarking of raw and compressed video
Frank H. Hartung, Bernd Girod
Embedding information into multimedia data is a topic that has gained increasing attention recently. For video broadcast applications, watermarking of video, and especially of already encoded video, is interesting. We present a scheme for robust interoperable watermarking of MPEG-2 encoded video. The watermark is embedded either into the uncoded video or into the MPEG-2 bitstream, and can be retrieved from the decoded video. The scheme working on encoded video is of much lower complexity than a complete decoding process followed by watermarking in the pixel domain and re-encoding. Although an existing MPEG-2 bitstream is partly altered, the scheme avoids drift problems. The scheme has been implemented and practical results show that a robust watermark can be embedded into MPEG encoded video which can be used to transmit arbitrary binary information at a data rate of several bytes/second.
Medical and Very High Quality Images
Medical image compression using block-based transform coding techniques
Peter De Neve, Wilfried R. Philips, Jeroen Van Overloop, et al.
The JPEG lossy compression technique has several disadvantages for medical imagery (at higher compression ratios), mainly due to block distortion. We therefore investigated two methods, the lapped orthogonal transform (LOT) and the DCT/DST coder, for use on medical image data. These techniques are block-based but they reduce the block distortion by spreading it out over the entire image. These compression techniques were applied to four different types of medical images (an MRI image, an x-ray image, an angiogram and a CT scan). They were then compared with results from JPEG and variable-block-size DCT coders. At a first stage, we determined the optimal block size for each image and for each technique. It was found that for a specific image, the optimal block size was independent of the different transform coders. For the x-ray image, the CT scan and the angiogram an optimal block size of 32 by 32 was found, while for the MRI image the optimal block size was 16 by 16. Afterwards, for all images the rate-distortion curves of the different techniques were calculated, using the optimal block size. The overall conclusion from our experiments is that the LOT is the best transform among the ones investigated for compressing medical images of many different kinds. However, JPEG should be used for very high image qualities, as it then requires almost the same bit rate as the LOT while requiring fewer computations.
Medical image compression in teleradiology using low-bit-rate communication networks
Man-Bae Kim, Bo-Sik Yeoun, Yong-Duk Cho, et al.
Teleradiology is defined as the practice of radiology at a distance. Medical images are acquired at one location and are transmitted to one or more distant sites where they are displayed for diagnosis. Timely availability of medical images via a variety of communication networks is one of the primary goals of teleradiology. In this paper, we propose a medical image compression method that can be effectively utilized in teleradiology systems using low-bit-rate communication networks. For this purpose, we make use of regions of interest (ROIs) that may be clinically important in medical images. Our study shows that the proposed compression method can reduce the transmission time significantly if the ratio of ROI to image area is small. For example, if twenty percent of an image belongs to the ROI (an ROI ratio of 0.2), the compression ratio is increased by a factor of about three compared with lossless compression, and the transmission time is accordingly reduced by the same factor. In addition, by preserving the clinically important regions, the risk of incorrect diagnosis is much lower than with lossy compression.
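The factor-of-three figure can be checked with simple arithmetic (the specific ratios below are illustrative assumptions, not figures from the paper): coding the ROI losslessly at about 2:1 and the remaining 80% of the image lossily at a much higher ratio roughly triples the overall ratio relative to lossless coding of the whole image.

```python
def overall_compression_ratio(roi_fraction, roi_cr, background_cr):
    """Overall ratio when a fraction of the image (the ROI) is compressed at
    roi_cr and the rest at background_cr.  Per unit of original size the
    compressed size is roi_fraction/roi_cr + (1 - roi_fraction)/background_cr."""
    compressed = roi_fraction / roi_cr + (1.0 - roi_fraction) / background_cr
    return 1.0 / compressed

# Illustrative numbers: lossless ROI at 2:1, lossy background at 20:1, ROI ratio 0.2
# -> overall ratio ~7.1, i.e. roughly 3.5x the 2:1 of all-lossless coding.
print(overall_compression_ratio(0.2, 2.0, 20.0))
```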
Virtual zero-tree wavelet transform coding of superhigh-definition images
Qi Wang, Mohammad Ghanbari
A fully compatible and scalable coding scheme for super high definition images has been developed. Wavelet decomposition is performed such that the low-frequency band is of CIF order, which forms a low-resolution core of the full-sized video and is coded by MPEG. The high-frequency wavelet coefficients are coded by the proposed virtual zerotree, which encodes the wavelets more efficiently than the simple zerotree. Also embedded in the scheme is hierarchical motion compensation, which gives high performance with a large compensation range, requires minimal computation, and reduces overhead motion information. With its scalability, data prioritization and precise bit-rate control, the scheme is suitable for coding SHD video in multimedia applications.
Superhigh-definition-image special effect system
Ryuta Suzuki, Okikazu Tanno, Kiyotaka Kato, et al.
This paper introduces a super high definition (SHD) image special effects system, focusing on the system architecture and key hardware technologies. Recently, super high definition imaging has been investigated as a next-generation communication medium beyond HDTV. The requirement placed on an SHD imaging system by industries such as content production, entertainment, and electronic museums is to process images with special effects within a reasonably quick response time. Image processing operations such as dissolve, fade in/out, wipe, and filtering are indispensable for professional editing systems as well as for entertainment or museum use; they are useful techniques for generating seamless scene changes. However, handling SHD images is difficult, because only 2.8 nanoseconds per pixel are available for real-time image processing. In this system, the following key technologies are employed: GaAs-based VLSI technologies, a parallel processing architecture, a switching data path architecture with a triplet frame buffer, and a 2-input 1-output high-speed ALU architecture. The system can be applied to several applications such as remote editing and printing, CSCW (computer-supported cooperative work) for design, remote education, electronic museums, and electronic catalogues.
Reversible interframe compression of high-quality color images of paintings
Andrea Abrardo, Luciano Alparone, Franco Bartolini, et al.
Reversible compression of color images is gaining ever-increasing attention from multimedia publishing industries for collections of works of art. In fact, the availability of high-resolution high-quality multispectral scanners demands robust and efficient coding techniques capable of capturing inter-band redundancy without destroying the underlying intra-band correlation. Although DPCM schemes (e.g., lossless JPEG) are employed for reversible compression, their straightforward extension to true-color (e.g., RGB, XYZ) image data usually leads to a negligible coding gain or even to a performance penalty with respect to individual coding of each color component. Previous closest neighbor (PCN) prediction has recently been proposed for lossless data compression of multispectral images, in order to take advantage of inter-band data correlation. The basic idea of predicting the value of the current pixel in the current band on the basis of the best zero-order predictor for the previously coded band has been applied by extending the set of predictors to those adopted by lossless JPEG. On a variety of color images, one of which was acquired directly from a painting by the VASARI scanner at the Uffizi Gallery at very high resolution (20 pel/mm, 8 MSB for each of the XYZ color components), experimental results show that the method is suitable for inter-frame decorrelation and outperforms lossless JPEG and, to a lesser extent, PCN.
Object-Based Motion Compensation
Vertex tracking for grid-based motion compensation
Karsten Schroeder
To overcome some of the well-known artefacts stemming from block-based motion compensation, grid-based techniques have been proposed in the past as a promising alternative to block matching. While the latter is restricted to simple translational motion, grid-based compensation employs e.g. an affine model when triangular meshes are assumed. The theoretically superior model, however, will perform worse at object boundaries where the connectivity constraint of the meshes causes smoothing of the underlying discontinuous motion vector field. This effect can be diminished by providing an individual grid for each object in a scene. The crucial part of grid-based motion prediction is the technique of tracking mesh vertices. In contrast to block matching, motion estimation of vertices cannot be done independently without sacrificing prediction gain. This paper discusses different algorithms for vertex tracking. The issue of tracking at object boundaries and the influence of the resampling algorithm on the prediction gain are addressed in detail. Object grids, carrying both shape and motion information, are evaluated further in terms of shape coding efficiency and temporal scalability. Both aspects become important when aiming at functional coding for low bit rates, as it is currently being investigated in the framework of MPEG-4.
Segmentation-based video codec with a block-based fallback mode
Stefaan Desmet, Albert A. Deknuydt, Luc Van Eycken, et al.
This paper describes a possible strategy for migrating from block-based coding towards object-based coding. We use the results of an intelligent motion estimation algorithm to define a low-resolution (block-based) segmentation. The information from previous segmentation results, together with the motion vector field, the image data and spatial relations, is used to define the cost of belonging to a given object. A separate selection procedure is used to generate the high-resolution (pixel-based) objects. Afterwards some shape simplification is performed to generate the final objects.
Model-based hierarchical motion compensation technique
Jun-Seo Lee, Sang Uk Lee
In this paper, a new motion compensation technique is proposed to enhance the performance of the conventional motion compensation technique based on the block matching algorithm, whose main features are a translational motion model and a uniformly spaced motion field. The proposed MC technique consists of two main stages: split and merge. In the split stage, a deformable square-block-based motion estimation technique is employed to describe complex motion. In the merge stage, a rate-distortion-optimal quadtree decomposition technique is employed to adapt to the nonuniform characteristics of motion in the sequence. Simulation results show that the proposed technique yields about 1 - 2 dB PSNR gain over the BMA at the same bit rate.
Fast motion detection and compensation method based on hybrid mapping parameter estimation and hierarchical structure in object-oriented coding
Chang-Bum Lee, Rae-Hong Park
This paper investigates motion estimation and compensation in object-oriented analysis-synthesis coding. Object-oriented coding employs a mapping parameter technique for estimating motion information in each object. The mapping parameter technique using gradient operators requires high computational complexity. The main objective of this paper is to propose a hybrid mapping parameter estimation method using the hierarchical structure in object-oriented coding. The hierarchical structure employed constructs a low-resolution image. Then six mapping parameters for each object are estimated from the low-resolution image and these parameter values are verified based on the displaced frame difference (DFD). If the verification test succeeds, the parameters and object boundaries are coded. Otherwise, eight mapping parameters are estimated in a low-resolution image and the verification test is again applied to an image reconstructed by the estimated parameters. If it succeeds, the parameters and object boundaries are coded; otherwise, the regions are coded by second-order polynomial approximation. Theoretical analysis and computer simulation show that the peak signal-to-noise ratio (PSNR) of the image reconstructed by the proposed method lies between those of images reconstructed by the conventional 6- and 8-parameter estimation methods, with a reduction of the computation time by a factor of about four.
Poster Session I: Image/Video Coding
Performance analysis of 3D subband coding for low-bit-rate video
Andre Mainguy, Limin Wang
Two prevalent approaches for video coding are hybrid motion compensated DCT coding (MC/DCT) and subband coding. Hybrid MC/DCT coding has been adopted in present standards for low bit rate digital video compression such as ITU-T Recommendations H.261 and H.263. One problem with hybrid MC/DCT coding is that blocking artifacts in the reconstructed video sequences are prominent at low bit rates due to block segmentation of the image. Unlike block transform coding, subband coding does not suffer from these 'blocking' effects. A significant issue for the subband video coder is to fully exploit the temporal redundancy prevailing in video images for efficient video coding. More recent studies have addressed this problem using the three-dimensional (3-D) subband framework. In this study, a packet wavelet processing scheme is implemented to exploit temporal redundancy in video sequences. A bit allocation strategy is proposed and applied to the coding of the temporal subbands performed in an embedded fashion. The coding performance of the resulting 3-D wavelet subband video coder is compared with the H.261 coder at a bit rate of 384 kbps and CIF resolution, and with the H.263 coder at 64 kbps and QCIF resolution. Test sequences are selected to cover a reasonable range of scene contents.
Experimental comparison of segmented image coding and warped polynomial image coding
Wilfried R. Philips, Jeroen Van Overloop
This paper compares two second-generation image coding methods. The first method is segmented image coding (SIC), which segments images into regions of stationary image intensity and approximates the gray values in each region by a bi-variate polynomial. The second method is warped image coding (WIC), which represents images in terms of image-dependent base functions, but which may also be viewed as an adaptive subsampling method. The paper explains the most important features of both techniques and compares them experimentally at a bit rate of 0.3 bit per pixel on the 'peppers' image. The results show that WIC produces the best subjective image quality. On the other hand, WIC is computationally much more demanding than the particular SIC method considered in this paper. Both SIC and WIC produce better images than JPEG at the considered rate.
Reference-frame representation in the transform domain for H.261 video compression
Leonid Kasperovich
To deal with the temporal redundancy of an input video sequence, the H.261 standard specifies the use of displaced frame differences with respect to reference frames and, optionally, motion compensation. Usually, the encoding process is performed in a loop comprising reconstruction of a reference frame in the spatial domain, computation of the difference between the next input video frame and its reference, and DCT compression of that displaced frame difference. In this paper we show that the reference frames can instead be reconstructed in the transform domain, with no impact on computational accuracy or the output bitstream. In this way, we represent all reference frames in the transform domain in the form of dequantized DCT coefficients, so the next inter-frame is the difference between the next DCT-transformed input picture and the current reference frame. This inter-frame is then quantized and entropy coded as usual. The output bitstream remains H.261-compliant, while the compression ratio and the quality of the decompressed video are the same as in a conventional implementation. We also present performance results for a software codec running on a Pentium PC.
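The key point is that the DCT is a linear transform, so (ignoring motion compensation) the displaced frame difference can be formed directly between DCT coefficients and the reference never needs to be reconstructed in the pixel domain. A minimal sketch of that loop follows; the whole-frame transform and the `quantize`/`dequantize` callables are illustrative stand-ins for the block-wise H.261 operations.

```python
from scipy.fft import dctn

def encode_inter_frame_dct_domain(frame, ref_dct, quantize, dequantize):
    """Form the inter-frame residual in the transform domain.  Because the DCT
    is linear, DCT(frame - ref) == DCT(frame) - DCT(ref), so the reference is
    kept as dequantized DCT coefficients (no motion compensation assumed)."""
    frame_dct = dctn(frame, norm='ortho')       # transform the input picture
    residual_dct = frame_dct - ref_dct          # DCT-domain frame difference
    levels = quantize(residual_dct)             # quantize, then entropy-code as usual
    new_ref_dct = ref_dct + dequantize(levels)  # update the reference in the DCT domain
    return levels, new_ref_dct
```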
Lossless image compression based on a generalized recursive interpolation
Bruno Aiazzi, Pasquale S. Alba, Luciano Alparone, et al.
A variety of image compression algorithms exists for applications where reconstruction errors are tolerated. When lossless coding is mandatory, compression ratios greater than 2 or 3 are hard to obtain. DPCM techniques can be implemented in a hierarchical way, thus producing high-quality intermediate versions (tokens) of the input images at increasing spatial resolutions. Data retrieval and transmission can be achieved in a progressive fashion, either by stopping the process at the requested resolution level, or by recognizing that the image being retrieved is no longer of interest. However, progressiveness is usually realized with a certain performance penalty with respect to the reference DPCM (i.e., 4-pel optimum causal AR prediction). A generalized recursive interpolation (GRINT) algorithm is proposed and shown to be the most effective progressive technique for compression of still images. The main advantage of the novel scheme with respect to the standard hierarchical interpolation (HINT) is that interpolation is performed in a separable fashion from all error-free values, thereby reducing the variance of interpolation errors. Moreover, the introduction of a parametric half-band interpolation filter produces further benefits and allows generalized interpolation. An adaptive strategy consists of measuring image correlation both along rows and along columns and interpolating first along the direction of minimum correlation. The statistics of the different subband-like sets of interpolation errors are modeled as generalized Gaussian PDFs, and individual codebooks are fitted for variable length coding. The estimate of the shape factor of the PDF is based on a novel criterion matching the entropy of the theoretical and actual distributions. Performances are evaluated by comparing GRINT with HINT, and a variety of other multiresolution techniques. Optimum 4-pel causal DPCM and lossless JPEG are also considered for completeness of comparisons, although they are not progressive. For the examined images GRINT is always superior. Only optimum DPCM provides comparable results; GRINT is, however, progressive and yields error-free tokens at any resolution level.
Hybrid block-matching algorithm for motion estimation
Tsang-Long Pao, Jia-Shian Wu
Motion compensation is a key operation in video compression to remove the temporal redundancy in a video sequence. One application example is the MPEG video compression standard. The most commonly used motion estimation algorithm is the block matching algorithm, due to its regularity. Full search is the most straightforward block matching algorithm and can always locate the optimal motion vector. However, its computational complexity makes it impractical in real-time applications. Fast algorithms require less computation, but the obtained motion vectors are suboptimal. In this paper, a hybrid block matching algorithm is proposed. In this algorithm, average intensities of groups of pixels are used to roughly estimate the motion first. Then, the fast search algorithm is applied in a reduced search region centered around the result of the first pass. Experimental results show that the estimation accuracy is quite close to that of the full search algorithm while the computational complexity is only slightly higher than that of fast algorithms.
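A rough sketch of the two-pass idea (block size, window sizes, and the coarse measure are assumptions, not the paper's parameters): a coarse motion estimate is found by comparing averages of pixel groups over the full search window, and a small exact search then refines it around that estimate.

```python
import numpy as np

def hybrid_block_match(cur_blk, ref, y, x, search=8, refine=2, group=4):
    """Two-pass block matching: (1) coarse search on group-averaged intensities
    over the full window, (2) exact SAD search in a small region around the
    coarse result.  All parameters are illustrative."""
    def shrink(b):
        # average intensities of group x group pixel groups
        h, w = b.shape[0] // group, b.shape[1] // group
        return b[:h*group, :w*group].reshape(h, group, w, group).mean(axis=(1, 3))

    def candidates(rng):
        for dy in range(-rng, rng + 1):
            for dx in range(-rng, rng + 1):
                yy, xx = y + dy, x + dx
                if 0 <= yy and yy + cur_blk.shape[0] <= ref.shape[0] \
                        and 0 <= xx and xx + cur_blk.shape[1] <= ref.shape[1]:
                    yield dy, dx, ref[yy:yy + cur_blk.shape[0], xx:xx + cur_blk.shape[1]]

    small_cur = shrink(cur_blk)
    dy0, dx0 = min((np.abs(small_cur - shrink(blk)).sum(), dy, dx)
                   for dy, dx, blk in candidates(search))[1:]
    y, x = y + dy0, x + dx0   # recentre the exact search on the coarse estimate
    dy1, dx1 = min((np.abs(cur_blk.astype(int) - blk.astype(int)).sum(), dy, dx)
                   for dy, dx, blk in candidates(refine))[1:]
    return dy0 + dy1, dx0 + dx1
```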
Hierarchical hybrid video coding: displaced frame difference coding
Frank Mueller, Klaus Illgner
A new method for coding of displaced frame differences (DFD) is proposed. It is in its main aspects close to the classical pyramid approach of Burt and Adelson. In particular a least squares L2 Laplacian pyramid is employed which decomposes the DFD into several levels with differing spatial resolution. This pyramid is quantized and coded following a layered quantization approach together with a layered coding method based on conditional arithmetic coding. The DFD encoder outputs an embedded bit stream. Thus the coder control may truncate the bitstream at any point, and can keep a fixed rate. Simulation results show promising rate-distortion performance for low bit rate video coding.
Least-squares spline interpolation for image data compression
Michele Buscemi, Rossella Fenu, Daniel D. Giusto, et al.
A new interpolation algorithm for 2D data is presented that is based on the least-squares minimization and the use of splines. This interpolation technique is then integrated into a double source decomposition scheme for image data compression. First, a least-squares interpolation is implemented and applied to a uniform sampling image. Second, the splines and the analysis of the entropy allow us to reconstruct the final image. Experimental results show that the proposed image interpolation algorithm is very efficient. The major advantages of this new method over traditional block-coding techniques are the absence of the tiling effect and a more effective exploitation of interblock correlation.
Object-based very low bit rate coding with robust global motion estimation and background/foreground separation
Apostolos Saflekos, Michael G. Strintzis, Jenny Benois-Pineau, et al.
We present a two-step segmentation scheme for the very low bit rate coding of general purpose video scenes. Our objective is to determine the regions of interest before the actual segmentation procedure, so as to reduce the computational overhead introduced by this relatively complex process, and to avoid the phenomenon of over-segmentation of the scene. Simulation results show that the proposed scheme results in a radical reduction of the number of discrete spatio-temporal regions, while the background is identified as one uniform region, even when it is characterized by complex global motion.
Color image coding using block truncation and vector quantization
Mehmet Celenk, Jinshi Wu
In this paper, we describe an adaptive coding method for color images of natural scenes. It is based on the block truncation coding (BTC) and vector quantization (VQ) methods, which attempt to retain important visual characteristics of an image without discarding any important details. The proposed algorithm is an iterative procedure developed by extending the within-group variance and the information distance measurements to color images. It attempts to minimize one of these two measurements within m by m local windows so that the selected criterion results in the best compression rate for a given color image. This adaptive operation of the algorithm makes it particularly suitable for unsupervised parallel implementation. Once the window size is determined for an input image, subimages within such windows are divided into two color classes using the least-mean-square (LMS) algorithm. Each color cluster within a window is represented by its mean color vector. A linear vector quantizer is then used to further compress the coded outputs of local windows to achieve the lowest compression rate for the input image. This results in lower bit rates (as low as 1.0 bit per pixel for the R, G, B color images used in the experiments) and reconstruction errors (as low as 7.0%) with some perceivable errors.
Fast motion estimation algorithm using spatial correlation of motion field and hierarchical search
Byung Cheol Song, Kyoung Won Lim, Jong Beom Ra
A new block matching algorithm, especially suited to large search areas, is proposed. The algorithm uses the spatial correlation of the motion field and hierarchical search. Motion vectors of causally neighboring blocks can be credible motion vector candidates for the current block if the motion field has high spatial correlation. However, they are not helpful for searching complex or random motion. Our hierarchical scheme consists of a higher-level search that uses the motion vectors of neighboring blocks for continuous motion and evenly distributed motion vector candidates for random or complex motion, and a lower-level search for the final motion vector refinement. Compared with the conventional hierarchical BMA, the scheme reduces the local minimum phenomenon. It also alleviates the error propagation due to the use of spatial correlation when complex motion is involved. Simulation results show that the proposed algorithm drastically reduces the computational complexity to about 3.6% of that of FS-BMA, with a minor PSNR degradation of 0.29 dB even in the worst case.
Reduction of specific distortions in image vector quantization
Hazem Ahmad Munawer
In this paper, to reduce the specific distortions that accompany the vector quantization (VQ) process, we present the following VQ algorithms: (1) wavelet VQ (WVQ), (2) residual and hybrid residual VQ, and (3) VQ with a sorted codebook (VQ-SCB). Simulation results for the above-mentioned algorithms show that a considerable reduction of the VQ-specific distortions (PSNR of 27.7 - 30.6 dB) can be achieved at bit rates R of 0.4 - 0.7 b/p.
Motion Estimation and Compensation
Bidirectional geometric transform motion compensation for low-bit-rate video coding
Sergio M. M. de Faria, Mohammad Ghanbari
Motion compensation with spatial transformation has proven to be more efficient than the conventional block matching technique. In this paper we report on the bidirectional use of this technique to further improve the coding efficiency of interframe coders. Bidirectional block matching used in the standard video codecs, such as MPEG and H.263, has improved the coding efficiency of these codecs over their H.261 predecessors. This is because, with unidirectional motion compensation, the motion of uncovered objects cannot be compensated unless future frames can be accessed. In our codec the motion estimation is carried out with the spatial transform. To reduce the motion vector overhead we have segmented the image and quantized the neighboring motion vectors. We report on the optimum number of bidirectionally motion-compensated pictures for various bit rates, ranging from 32 kbit/s to 64 kbit/s.
Multiple-candidate hierarchical block matching with inherent smoothness properties
Serafim N. Efstratiadis, T. Karampatzakis, Haralambos Sahinoglou, et al.
In this paper, we present a multiple candidate hierarchical block matching (MCHBM) estimation approach of the apparent motion vector field (MVF) in image sequences. In contrast to the standard hierarchical block matching (HBM) approach, which considers only the best solution at each level (single candidate case), MCHBM considers the H best candidate solution vectors and the associated matching error at that level. Then, the H selected candidate solution vectors are projected to the next higher resolution level in order to serve as initial estimates for the search process at that level, which improves all H estimates. Thus, at the highest resolution level, the final vector is selected by taking into account the global suitability of the vector and not just the local error. The resulting MVF approximates the true motion by avoiding local minima which lead to solutions that differ from the true MVF. The multiple candidate approach is considered in combination with the overlapped multi-grid and multi-resolution HBM estimation methods. The final algorithm has very good smoothness properties regardless of the application of any additional magnitude and/or smoothness constraints. Experimental results on video-conference image sequences demonstrate the improved performance of the proposed methods.
Frame-rate conversion based on acceleration and motion-based segmentation
Peter Csillag, Lilla Boroczky
Frame rate conversion requires interpolation of image frames at time instances where the original sequence has not been sampled. This can be done with high quality only by means of motion-compensated algorithms; therefore the knowledge of the motion present in the sequence is essential. This motion information has to be determined from the image sequence itself. In this paper a motion-based image segmentation algorithm is proposed, and its application to motion-compensated (MC) frame rate conversion is presented. The segmentation algorithm, which can trace multiple rigid objects with translational movement, is based on vector quantization of the estimated motion field, determining a set of global motion vectors and segmenting the images into multiple moving areas. Then, the spatial order of the objects (which one is in front of the other) is determined. Interpolation is performed based on the results of the segmentation, the set of motion vectors, and the proper handling of covered and uncovered areas. Furthermore, an accelerated motion model developed previously by the authors is applied, in order to further improve the performance of the MC frame rate converter.
Poster Session I: Image/Video Coding
Hierarchical hybrid video coding: motion estimation and motion vector field coding
Klaus Illgner, Frank Mueller
In this and an accompanying paper in the same proceedings, a hierarchical video coding scheme is presented which is designed mainly for video communications at very low bit rates and is therefore based on the hybrid coding principle. Both the displacement vector field and the displaced frame differences are decomposed into a Laplacian-type pyramid, and these pyramids are encoded in an embedded fashion using zero-trees and conditioning contexts, respectively. The focus of this paper is on motion estimation and the design of the motion vector field coding. In particular, the aspect of estimating and coding vector fields of different resolutions is investigated to achieve optimal coding efficiency in situations where the available data rate is not sufficient to code higher-resolution vector fields completely.
MPEG-2 Optimization and Applications
Rate-reduction techniques for MPEG-2 video bit streams
Pedro Assuncao, Mohammad Ghanbari
Rate control is usually an intrinsic function of the video coding algorithm, responsible for matching the output bit stream to the available bandwidth of the communication channel. However, many of the forthcoming video services will use pre-encoded video, in compressed format, for storage and transmission purposes. In this case, the rate control function should work on the compressed bit stream by reducing its bit rate according to the network demand, which can be either static or dynamic. Bit rate reduction of compressed video is needed not only when pre-encoded video is transmitted, but also in video multicasting, where several users receive the same bit stream through different channels. A similar problem results from interactivity with the user in video-on-demand services, where each user is offered the possibility of choosing the amount of bandwidth assigned to its channel. In this paper we analyze rate reduction techniques for compressed video in terms of complexity and final picture quality. We propose a new scheme, working entirely in the frequency domain, for bit rate reduction of compressed video. It is a low-delay, drift-free video transcoder that outperforms a re-encoding system and performs very close to a normal encoder operating on uncompressed video.
Feedback-control scheme for low-latency constant-quality MPEG-2 video encoding
Andrea Basso, Ismail Dalgic, Fouad A. Tobagi, et al.
This paper describes a coding control scheme for MPEG-2 which keeps the perceived video quality constant and is suitable for real-time, low-latency encoding. We have chosen a proportional-integral-derivative (PID) scheme for the controller and adopt the same approach in designing the particular PID feedback function. The main reason for using a PID feedback function is the good trade-off that it offers between computational complexity, ease of design, and performance. As the quality measure we use a new video quality metric called the moving picture quality metric (MPQM). This metric models the human visual system and matches subjective evaluations well, outperforming existing quality metrics for video. Simulation results are shown for typical video sequences. A comparison with CBR encoding is also presented.
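For orientation, a minimal PID loop of the kind described could look as follows; the gains, the clamping to the MPEG-2 quantiser_scale range and the use of a generic per-frame quality score in place of MPQM are illustrative assumptions, not the paper's tuned design.

    class PIDRateController:
        def __init__(self, target_quality, kp=0.8, ki=0.1, kd=0.2):
            # target_quality: the constant quality level to be maintained
            self.target = target_quality
            self.kp, self.ki, self.kd = kp, ki, kd
            self.integral = 0.0
            self.prev_error = 0.0

        def update(self, measured_quality, current_qp):
            # positive error means quality is below target -> finer quantization
            error = self.target - measured_quality
            self.integral += error
            derivative = error - self.prev_error
            self.prev_error = error
            correction = (self.kp * error + self.ki * self.integral
                          + self.kd * derivative)
            new_qp = current_qp - correction
            return max(1, min(31, round(new_qp)))   # MPEG-2 quantiser_scale range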
MPEG-2 video coding with an adaptive selection of scanning path and picture structure
Minhua Zhou, Jan L.P. De Lameillieure, Ralf Schaefer
In MPEG-2 video coding an interlaced frame can be encoded either as a frame picture or as two field pictures. The selection of the picture structure (frame/field) has a strong impact on picture quality. In order to achieve the best possible picture quality, an adaptive scheme is proposed in this paper to select the optimal picture structure on a frame-by-frame basis. The selection of the picture structure is performed in connection with that of the optimal scanning path. First, the scanning path (zig-zag scan/alternate scan) is chosen based on a post-analysis of the DCT coefficients. Second, the optimal picture structure is selected for the next frame according to the chosen scanning path, i.e. the zig-zag scan corresponds to the frame picture structure, while the alternate scan corresponds to the field picture structure. Furthermore, the TM5 buffer control algorithm is extended to support coding with an adaptive frame/field picture structure. Finally, simulation results verify the proposed adaptive scheme.
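The flavour of such a decision can be sketched as follows; the vertical-versus-horizontal energy ratio and the threshold used here are only stand-ins for the post-analysis of the DCT coefficients described in the paper.

    import numpy as np

    def choose_picture_structure(dct_blocks, threshold=1.3):
        # dct_blocks: array of shape (N, 8, 8) with coefficients of the
        # previously coded frame; strong high-vertical-frequency energy is
        # taken as a crude sign of interlace motion
        vertical = np.sum(dct_blocks[:, 4:, :4].astype(float) ** 2)
        horizontal = np.sum(dct_blocks[:, :4, 4:].astype(float) ** 2)
        if vertical > threshold * horizontal:
            return "alternate_scan", "field_pictures"
        return "zigzag_scan", "frame_pictures"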
Untitled Session
Application of superhigh-definition images to teleradiology and telepathology
Junji Suzuki
It was recognized early on that the digitization of medical information would advance the efficiency of diagnostic technology. However, the digitization of image data, which makes up the majority of medical information, is dependent on advances in technologies such as input, processing, transmission, storage, and display. Insufficient advances in these technologies have effectively limited the digitization of image data for medical use. The result has been non-networked systems or LANs confined to a single hospital. Such isolated systems integrate only portions of digital medical images such as x-ray computed tomography (CT), magnetic resonance (MR), and computed radiography (CR). Fortunately, recent advances in the areas of super-high-definition image I/O, high-quality encoding, ATM-based high-speed transmission, and high-capacity storage have turned the tide in favor of the digitization and networking of all medical information. This paper focuses on the digitization and networking of medical image information used within hospitals and provides a multifaceted study of the technologies necessary for these advances. This allows us to discuss the present state of related technical developments and the level that has been attained so far. In addition, we have targeted the image information that demands the highest level of quality (radiological and pathological images) for application in medical diagnosis using super-high-definition images, which have a spatial resolution of at least 2048 by 2048 pixels and a temporal resolution of at least 60 frames per second with progressive scanning. We cover the concrete issues, and approaches to their solutions, that must be investigated when building and networking such a digital system.
Image Compression
Evaluation of lossless compression of Teletext and graphics images for TV systems
Peter H. N. de With, Mihaela v.d. Schaar-Mitrea
The diversity in the nature of TV images is increasing due to the mixing of graphics data with natural scenery. In this paper we study the lossless coding of such graphics data with the objective of arriving at recommendations for video compression of such images. For this purpose, a comparison has been made between contour coding, template coding and run-length coding of graphical imagery. It is concluded that template coding is suitable for TXT (Teletext)-like images but is not practical for more generic graphics images. For menus and similar imagery, contour and run-length coding provide interesting options.
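As a reminder of the simplest of the three candidates, a run-length encoder for one scan line takes only a few lines of Python; the (value, run) pair output is an illustrative format, and practical Teletext-oriented coders would add entropy coding of the pairs.

    def run_length_encode(row):
        # row: list of pixel values of one scan line; graphics material such
        # as menus produces long constant runs, which is what makes this work
        runs = []
        if not row:
            return runs
        value, count = row[0], 1
        for pixel in row[1:]:
            if pixel == value:
                count += 1
            else:
                runs.append((value, count))
                value, count = pixel, 1
        runs.append((value, count))
        return runs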
Additional Papers
Rate-distortion optimal thresholding in SNR scalability based on 2D dynamic programming
Jan L.P. De Lameillieure
Thresholding is a technique for suppressing small transform coefficients in DCT-based coding. Recently, dynamic programming has been applied to optimize thresholding in a rate-distortion sense. This contribution investigates the extension of this method to SNR scalability. Because of the tight coupling between the base and enhancement layers in SNR scalability, a 2D dynamic programming algorithm has been developed.
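The single-layer building block of such an optimization can be sketched as a 1-D recursion over the coefficients of one block; the rate model is left as a caller-supplied function, the end-of-block cost is omitted, and the coupling of base and enhancement layers into a 2-D dynamic program, which is the contribution of the paper, is not shown.

    def threshold_block(coeffs, rate, lam):
        # coeffs: quantized coefficients of one block in scan order
        # rate(run, level): model of the run-level code length
        # lam: Lagrangian multiplier weighting rate against distortion
        n = len(coeffs)
        energy = [c * c for c in coeffs]          # cost of dropping a coefficient
        best = {-1: 0.0}                          # -1 means "nothing kept yet"
        for j in range(n):
            if coeffs[j] == 0:
                continue
            best[j] = min(
                cost
                + sum(energy[k] for k in range(i + 1, j) if coeffs[k] != 0)
                + lam * rate(j - i - 1, coeffs[j])
                for i, cost in best.items()
            )
        # close the block: everything after the last kept coefficient is dropped
        return min(
            cost + sum(energy[k] for k in range(i + 1, n) if coeffs[k] != 0)
            for i, cost in best.items()
        )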
Image Compression
Autosophy information theory provides lossless data and video compression based on the data content
A new autosophy information theory provides an alternative to the classical Shannon information theory. Using the new theory in communication networks provides both a high degree of lossless compression and virtually unbreakable encryption codes for network security. The bandwidth in a conventional Shannon communication is determined only by the data volume and the hardware parameters, such as image size, resolution, or frame rate in television; the data content, or what is shown on the screen, is irrelevant. In contrast, the bandwidth in autosophy communication is determined only by the data content, such as novelty and movement in television images; it is the data volume and hardware parameters that become irrelevant. Basically, the new communication methods use prior 'knowledge' of the data, stored in a library, to encode subsequent transmissions. The more 'knowledge' is stored in the libraries, the higher the potential compression ratio. 'Information' is redefined as that which is not already known by the receiver; everything already known is redundant and need not be re-transmitted. In a perfect communication each transmission code, called a 'tip,' creates a new 'engram' of knowledge in the library, whereby each tip transmission can represent any amount of data. Autosophy theories provide six separate learning modes, or omnidimensional networks, all of which can be used for data compression. The new information theory reveals the theoretical flaws of other data compression methods, including the Huffman, Ziv-Lempel and LZW codes, and commercial compression standards such as V.42bis and MPEG-2.
Poster Session II: Optimization, Implementation, and Applications of CODECs
Three-step search motion estimation chip for MPEG-2 applications
You-Ming Chiu, Liang-Gee Chen, Yung-Ping Lee, et al.
In this paper, a hardware implementation of a 9-PE architecture for the three-step search block-matching motion estimation algorithm is proposed. With intelligent data arrangement and memory configuration, the proposed architecture can meet the requirements of low cost, high speed, and low memory bandwidth. In 0.8-micrometer CMOS technology, the proposed chip requires a die size of 6.90 by 5.98 mm and is able to operate at a clock rate of more than 50 MHz.
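A software reference of the three-step search that such a 9-PE architecture evaluates in parallel is given below; the 16x16 block size and the initial step of 4 (giving a search range of about +/-7 pixels) are typical choices assumed here, not necessarily the chip's parameters.

    import numpy as np

    def sad(block, candidate):
        # sum of absolute differences, the usual block-matching cost
        return int(np.abs(block.astype(int) - candidate.astype(int)).sum())

    def three_step_search(cur, ref, bx, by, bsize=16, step=4):
        # at each step the centre and its eight neighbours at the current
        # step size are tested (nine candidates, matching the nine PEs),
        # the centre moves to the best match and the step size is halved
        block = cur[by:by + bsize, bx:bx + bsize]
        best_dy, best_dx = 0, 0
        while step >= 1:
            best_cost, best_cand = None, (best_dy, best_dx)
            for dy in (-step, 0, step):
                for dx in (-step, 0, step):
                    y, x = by + best_dy + dy, bx + best_dx + dx
                    if y < 0 or x < 0 or y + bsize > ref.shape[0] or x + bsize > ref.shape[1]:
                        continue
                    cost = sad(block, ref[y:y + bsize, x:x + bsize])
                    if best_cost is None or cost < best_cost:
                        best_cost, best_cand = cost, (best_dy + dy, best_dx + dx)
            best_dy, best_dx = best_cand
            step //= 2
        return best_dy, best_dx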
VLSI systems for image compression: a power-consumption/image-resolution trade-off approach
Javier Bracamonte, Michael Ansorge, Fausto Pellandini
Low power consumption is a requirement for any battery-powered portable equipment. When designing ASICs for image and video compression, emphasis has mainly been placed on building circuits that are fast enough to sustain the high data throughput associated with image and video processing. The imminent development of portable systems featuring full multimedia applications adds the low-power constraint to the design of VLSI circuits for this kind of application. Several techniques, such as lowering the supply voltage, architectural parallelization and pipelining, have been proposed in the literature to achieve low power consumption. In this paper we report a VLSI circuit featuring a user-controllable power-management technique that trades image quality for power consumption in a transform-based algorithm.
Improving the efficiency of MPEG-2 coding by means of film unsteadiness correction
Theodore Vlachos
The performance of MPEG-2 coders with regard to bit-rate reduction of unsteady film sequences is assessed. Sequences impaired both by hop-and-weave and by twin-lens flicker artifacts are considered. It is shown that the removal of such artifacts improves the performance of an MPEG-2 coder significantly. In the spatial domain, improvements are due to the suppression of high frequencies artificially boosted by twin-lens flicker. In the temporal domain, the correction of hop-and-weave unsteadiness simplifies the motion estimation process considerably. As a consequence, the coding effort is focused mainly on picture content rather than being consumed by the unsteadiness artifacts. Our results show a considerable improvement in subjective picture quality and in measured mean-square error for a constant bit-rate reduction ratio. Moreover, improved bandwidth efficiency can be achieved for the same level of objective or subjective distortion.
Application-driven computation of optimum quantization tables for DCT-based block coders
Nikos G. Panagiotidis, Anastasios N. Delopoulos, Stefanos D. Kollias
In this paper we propose a method for computing optimal quantization tables for specific images. The main criterion is the allocation of bandwidth to frequency subspaces of the DCT domain according to power metrics obtained from the transform coefficients. The choice of the weights determines the subjective importance of each frequency coefficient as well as its contribution to the finally perceived image. The simultaneous requirement that the quantization tables yield data compression comparable to that achieved by the baseline JPEG scheme at various quality factors (QF) imposes an additional constraint on the proposed model.
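A minimal sketch of such an image-adaptive table, assuming an inverse-power allocation rule and a mean-step rescaling as placeholders for the paper's actual criterion and its JPEG-comparable rate constraint, could look as follows.

    import numpy as np

    def quantization_table_from_power(dct_blocks, weights, target_mean_q=16.0):
        # dct_blocks: array (N, 8, 8) of DCT coefficients of the image
        # weights: 8x8 perceptual weights (subjective importance per frequency)
        power = np.mean(dct_blocks.astype(float) ** 2, axis=0)   # mean power per frequency
        raw = 1.0 / (weights * np.sqrt(power) + 1e-6)            # coarser where power is low
        table = raw * (target_mean_q / raw.mean())               # crude proxy for the rate constraint
        return np.clip(np.rint(table), 1, 255).astype(int)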
Segmentation to diminish the mosaic effect
Xavier Jove-Boix, M. E. Santamaria, Elisa Martinez, et al.
In block coding techniques a harmful effect is the visible gap between blocks. This mosaic effect is most noticeable in smooth areas, and the subjective quality of block-coded images largely depends on it. Along region boundaries and edge zones the mosaic effect is less injurious than in smooth areas; if the differences between blocks coincide with contour areas, the visual effect almost disappears. The objective is therefore to select blocks whose boundaries follow contours. This raises another problem: it is not possible to select a regular block that matches an image region exactly, so the coding method has to be modified to use irregular blocks. A large-zone segmentation provides regions of uniform characteristics whose boundaries are edge areas. There are several methods of segmentation; we test statistical and morphological methods and allow slow variation of the selected characteristics within every region. Conventional segmentation coding methods use small regions and store their average gray level, whereas our method stores the most suitable characteristics for every large region. Sometimes a region is expanded to a regular zone so that transform coding, such as the DCT, can be used.
Transmission of hypertextual information using MPEG-2 private data
Luigi Atzori, Paolo Dettori, Massimiliano Di Gregorio, et al.
In this paper, we design a new system for the transmission of multimedia information within the digital TV channel in a broadcast fashion. The proposed scheme is based on some DSM-CC functions, together with an additional syntax that allows us to convey all the data required to manage the multimedia information at the receiving side. The transport structure used is the well-known MPEG-2 Transport Stream, which is the most widespread platform for new digital television systems. The multimedia information conveyed in such a model is similar to that of the WWW, that is, a hypertextual file system in which text is integrated with images, sound and animation. Encoding software has been realized that codes the file system by means of the DSM-CC operations and produces a transport stream.
H.263 coding optimization: temporal subsampling driven by motion analysis
Giuseppe Russo, Stefania Colonnese, S. Rossi, et al.
The paper describes a coding optimization strategy in conformity with the recent ITU-T H.263 Recommendation for videophone sequences at bit rates below 64 kbit/s. The optimization algorithm jointly selects the temporal position of the frames to be coded and the coding mode (I, P or PB) of the selected frames. The decision is based on the observation, over a group of frames, of an 'activity' parameter representing the variation of each frame with respect to the last coded one. The proposed strategy produces coded sequences with average frame rates lower than those produced by a non-optimized coder, and better visual quality of the individual frames. However, the evaluation of the activity parameters, and the observation of several candidate frames, requires greater delay, buffer size and complexity of the coding algorithm.
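The temporal subsampling part of such a strategy can be illustrated by a simple threshold rule: a frame is coded only when its activity with respect to the last coded frame is large enough. The activity function and the threshold below are illustrative stand-ins, and the joint selection of the coding mode (I, P or PB) is not modelled.

    def select_frames(frames, activity, threshold):
        # frames: list of frames of a group; activity(a, b) measures the
        # variation of frame b with respect to the last coded frame a
        coded = [0]                      # the first frame is always coded
        last = frames[0]
        for i, frame in enumerate(frames[1:], start=1):
            if activity(last, frame) > threshold:
                coded.append(i)
                last = frame
        return coded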
Efficient VLSI architecture for block-matching motion estimation
Han Kyu Lee, Jinwoong Kim, Young-Mi Ohk, et al.
Motion estimation reduces temporal redundancies in a video sequence and is the most demanding part of video source encoders in which motion-compensated transform coding is used. The block matching algorithm requires a large amount of computation, but its regular data flow structure makes it well suited to implementation on various parallel processing architectures. In this paper, we present a new VLSI architecture for block-matching motion estimation in video encoding systems, based on linear systolic arrays. The proposed architecture offers (1) fully pipelined operation that achieves 100 percent efficiency of the processing elements, (2) an efficient data input scheme for high-input-rate video encoding systems, (3) glueless interfaces for easy extension of the search range by cascading multiple chips, and (4) a very regular and modular structure well suited to ASIC implementation.
Dynamic bandwidth allocation for an MPEG-2 multiencoder video system
Luis Miguel Lopes Teixeira, Maria Teresa Andrade
An efficient algorithm for dynamically multiplexing MPEG-2 encoded video sources is presented. Sources are grouped into classes according to combined levels of spatial detail and amount of movement. Simulations were performed using different combinations of sources belonging to distinct classes, different bit rates and different GOP structures. The implications of a real implementation are analyzed and a modular architecture is proposed. Simulation results are presented and discussed, showing that sequences with higher spatial detail and motion exhibit the largest quality improvements. These results are hardly affected by the non-alignment, at the GOP level, of the video sequences.
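A proportional allocation rule gives the flavour of such a multiplexer; the complexity measure, the per-source floor and the purely proportional split are illustrative assumptions, not the algorithm evaluated in the paper.

    def allocate_bit_rates(complexities, total_rate, min_rate=0.5e6):
        # complexities: one estimate per encoder (e.g. spatial detail x motion)
        # total_rate: aggregate channel rate in bit/s to be shared
        floor_total = min_rate * len(complexities)
        spare = max(total_rate - floor_total, 0.0)
        total_c = sum(complexities) or 1.0
        return [min_rate + spare * c / total_c for c in complexities]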
High-performance variable length decoder with two-word bit-stream segmentation
Michael Bakhmutsky
Implementing a high-performance variable length decoder (VLD) presents a major challenge in building an MPEG-2 compliant HDTV video decoder. The capability of the VLD to process macroblocks in real time can save memory and simplify decoder architectures. For an MPEG-2 Main Profile at High Level compliant HDTV video decoder, this means that the VLD must be able to decode macroblocks at rates exceeding 100 million code words per second. Partitioning the system at the VLD level increases decoder complexity and memory utilization. It is therefore desirable to conceive a 'one-piece' VLD capable of performing the required operations economically and in real time. The process of decoding entropy-encoded variable length bit streams is inherently serial in nature, so parallel processing of the bit stream between resynchronization points is limited in the VLD. A unique technique for high-speed parallel bit stream processing is described. This technique is based on a non-traditional two-word bit stream segmentation method optimized for high-speed word-length decoding. Applied to the main body of the bit stream, it produces excellent performance results in both consumer and professional profiles of MPEG, where decoder partitioning at the VLD level might otherwise be the norm.
HCPN: multimedia synchronization with user interaction and hypermedia
Michael Junke Hu, K. S. Choo
The proliferation of multimedia hardware and software products in the 1990s, together with recent investigations of formal models of synchronous multimedia systems, has helped researchers better understand the fundamental difference between a comprehensive hypermedia system and a hypertext system. In this paper, a formal hypermedia model is proposed and discussed. The hypermedia composition Petri net (HCPN) model extends current research on the formal specification of multimedia systems into the hypermedia framework. Apart from accommodating associative links that connect one piece of multimedia entity to another, the HCPN model supports the spatial context and temporal coordination related to each hypermedia link, which is essential for a comprehensive hypermedia service. Furthermore, on-line reorganizing and rescheduling mechanisms are provided for dynamic coarse-grained and fine-grained hypermedia services. HCPN is currently being implemented in a prototype system. The model development and its implementation are presented in this paper.
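The place/transition machinery underlying a Petri-net-based model can be shown in a few lines; this is only the generic firing rule, with places and transitions interpreted loosely as presentation states and synchronization points, and none of the HCPN extensions (spatial context, hypermedia links, rescheduling) is modelled here.

    def enabled(marking, transition):
        # a transition is enabled when every input place holds enough tokens
        return all(marking.get(p, 0) >= n for p, n in transition["in"].items())

    def fire(marking, transition):
        # consume input tokens, produce output tokens
        if not enabled(marking, transition):
            return marking
        new = dict(marking)
        for p, n in transition["in"].items():
            new[p] = new.get(p, 0) - n
        for p, n in transition["out"].items():
            new[p] = new.get(p, 0) + n
        return new

    # example: video and audio must both finish before the next scene starts
    sync = {"in": {"video_done": 1, "audio_done": 1}, "out": {"next_scene": 1}}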
MPEG-2 variable bit rate coding algorithm: analysis and modeling
Antonio Chimienti, Marco Conti, Enrico Gregori, et al.
Variable bit rate (VBR) video is currently by far the most interesting and challenging real-time application for B-ISDN/ATM environments. To define bandwidth allocation schemes which provide an adequate quality of service (QoS) for VBR applications and minimize the waste of bandwidth, the effects of video applications on the network must be investigated. While the modeling of VBR video sources has recently received significant attention, there presently exists no widely accepted model which lends itself to mathematical analysis. Currently MPEG is the reference standard for moving video compression. In the first part of the paper a VBR implementation of the MPEG-2 coding algorithm is described and a comparison with the constant bit rate (CBR) algorithm is performed. To investigate the impact that the coding parameters have on the characteristics of the traffic delivered to the network, a long sample of the movie 'The Sheltering Sky' has been coded with two different quantizer steps. The generated trace was used to obtain a detailed statistical analysis of the traffic generated by an MPEG-2 encoder. Starting from this statistical analysis, an analytically tractable model is developed, analyzed and used to study bandwidth allocation problems.
Fractal-Based Coding
Multiresolution transform and its application to video coding
Traditional wavelet edge detection and encoding schemes preserve the shape features of objects effectively at a variety of spatial scales and also allow an efficient means of image and video compression. However, these schemes also remove texture from imagery, and thus image reproduction quality suffers. Fractal encoding techniques, on the other hand, generate high compression ratios but tend to introduce blocky artifacts. We therefore describe a video encoding method that combines the shape-preserving capability of wavelets with the compression qualities of fractal coding in a hybrid multiresolution technique that achieves high compression and selective, accurate feature preservation, and is computationally efficient.
Structure-based fractal image coding
Bronislav Titkov, Anatoli Tikhotskij, Alexandr Myboroda, et al.
An image coding scheme using partitioned iterated function systems (a fractal codec) is presented. It extends the usual block-based algorithms with quadtree partitioning by allowing splits with arbitrarily shaped masks. Besides the general advantages of fractal coding, such as very high compression and scalable decoding, this provides for shape-adaptive coding. Furthermore, the use of orthogonal luminance transformations results in code suitable for browsing. The codec currently uses up to 256 masks at each level of the quadtree partition (up to four levels), which may be applied twice. The mask set was experimentally optimized on a set of images. The high degree of flexibility is handled by the known 'multidimensional nearest neighbor search' together with an estimation of useful masks based on a block-structure analysis. Fractal referencing to parts of the image already coded allows an additional reduction of the data rate. The control takes into account the compression efficiency of the different types of approximation. The bit stream of the encoded image is structured, pre-encoded and selectively entropy coded by arithmetic coding. Remaining blocking artifacts are concealed by a generally applicable method called bending, which is designed to avoid the blurring caused by filtering.
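The core matching step of block-based fractal coding, stripped of the arbitrarily shaped masks, quadtree control and orthogonal luminance transformations that characterize the scheme above, can be sketched as a least-squares search over a pool of already shrunk domain blocks.

    import numpy as np

    def encode_range_block(range_block, domain_pool):
        # find the domain block and affine luminance map r ~ s*d + o that
        # minimize the squared error; domain blocks are assumed to be
        # already downsampled to the range-block size
        r = range_block.astype(float).ravel()
        best = None
        for idx, domain in enumerate(domain_pool):
            d = domain.astype(float).ravel()
            var = d.var()
            s = ((d - d.mean()) * (r - r.mean())).mean() / var if var > 0 else 0.0
            s = max(-1.0, min(1.0, s))          # bound the contrast factor
            o = r.mean() - s * d.mean()
            err = float(((s * d + o - r) ** 2).sum())
            if best is None or err < best[0]:
                best = (err, idx, s, o)
        return best[1:]                          # (domain index, contrast, offset)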
Simplified method of testing for convergence in fractal image coding schemes
Peter Siepen
In fractal image coding the original image is approximated by the unique fixed point of a contractive affine transformation. To ensure convergence at the decoder when scaling coefficients larger than one are admitted, an eigenvalue calculation of the transformation matrix is necessary during the encoding process. Due to the huge dimension of the transformation matrix, this eigenvalue calculation is in general computationally infeasible. This paper presents a method to reduce the dimension of this matrix dramatically. The result is a simple rule for creating the reduced matrix directly, without using the original matrix. Based on this rule a hierarchical method is presented, which allows rather general fractal coding schemes to be tested for convergence.
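For orientation, the decoder iteration and the standard convergence condition that has to be verified can be written generically (this is textbook fixed-point material, not the paper's reduction rule):

    x^{(k+1)} = A\,x^{(k)} + b, \qquad
    \lim_{k \to \infty} x^{(k)} = x^{\ast} = (I - A)^{-1} b
    \quad \Longleftrightarrow \quad \rho(A) < 1 .

Here A is the linear part of the fractal transformation, assembled from the scaling coefficients and the domain-to-range pixel mappings, b collects the offsets, and \rho(A) denotes the spectral radius. Bounding every scaling coefficient by one makes \rho(A) < 1 automatic; admitting larger coefficients is exactly what forces the eigenvalue check, and the dimension reduction described above is what makes that check tractable.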