Proceedings Volume 3309

Visual Communications and Image Processing '98

Sarah A. Rajala, Majid Rabbani
View the digital version of this volume at SPIE Digital Library.

Volume Details

Date Published: 9 January 1998
Contents: 19 Sessions, 104 Papers, 0 Presentations
Conference: Photonics West '98 Electronic Imaging 1998
Volume Number: 3309

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Image Coding I
  • Motion Estimation I
  • Video Coding
  • Pre- and Post-processing
  • Filtering/Interpolation
  • Model-based Coding
  • Image and Video Coding Standards
  • Poster Presentations on Motion Estimation II
  • Fractal/Subband Coding
  • Implementations and Architectures
  • Very Low Bit-rate Coding
  • Poster Presentations on Filtering
  • Video and Database Management
  • Image Coding II
  • Image Sequence Analysis
  • Image Coding III
  • Stereoscopic Data Processing/Analysis
  • Wavelet Coding
  • Motion Estimation/Detection
Image Coding I
Edge-assisted upper-bands coding techniques
Ramon Llados-Bernaus, Robert L. Stevenson
This paper introduces a series of techniques aimed at facilitating the encoding of the upper bands of a subband decomposition. The fundamental idea behind these techniques is that the energy in the upper bands is concentrated around the discontinuities of the lower band. Therefore, for image compression the output of an edge detector applied on the lower band can be employed to locate the upper band pels with the largest amplitudes. Moreover, the percentage of pels declared as edges in each one of the upper bands can be interpreted as an additional degree of freedom in the definition of a subband compression scheme, in a similar way to the bit allocation between bands. The performance of the proposed techniques for various edge detectors and different filter banks is studied. We introduce an algorithm to distribute a given number of edge pels between the different upper bands in order to minimize the total distortion. A morphological operator is presented as a method to further increase the efficiency of the edge-based approach. The influence of the baseband distortion on the performance of the proposed techniques is also analyzed. Finally, we estimate the savings in transmitted bit rate of an edge-based compression system with respect to traditional schemes.
Joint optimal object shape estimation and encoding
Lisimachos P. Kondi, Fabian W. Meier, Guido M. Schuster, et al.
A major problem in object oriented video coding and MPEG-4 is the encoding of object boundaries. In our previous work, we presented efficient methods for the lossy encoding of object boundaries which were optimal in the rate distortion sense. In this paper, we extend our work to utilize both the original image and an initial segmentation to obtain the optimal shape representation based on the criteria we impose. The boundary detection and encoding problems are considered simultaneously. If there is low confidence in the location of the boundary, a large approximation error is allowed when encoding the boundary and vice versa. Experimental results demonstrate the effectiveness of the proposed algorithm.
Symmetric padding for content-based 2D-DCT coding
Satoshi Misaka, Yuichiro Nakaya, Taizo Kinoshita
Content-based coding, which independently codes the objects included in a picture, has recently attracted considerable attention for its property of enabling object-based editing of still and motion pictures. In content-based coding, it is required to code arbitrarily shaped objects. Therefore, it is necessary to apply 2D-DCT to 8 X 8 blocks that include object boundaries. Padding is a technique that enables coding of such blocks by assigning imaginary values to the pixels that are not included in the object. Additionally, padding prevents the increase of the high frequency DCT coefficients which is caused by the discontinuous object boundary. In this paper, a new padding method named Symmetric Padding, which provides high coding efficiency with a simple copy-and-paste procedure, is proposed. Additionally, a content-based 2D-DCT coding method, which changes the padding method and the scanning method according to the features of the object shape, is proposed. Due to the increase of the number of zero DCT coefficients, the proposed method shows better coding performance than the conventional method, especially at high bit rates.
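The copy-and-paste character of boundary padding can be illustrated with a small sketch. The function below fills the pixels of one block row that lie outside the object by mirroring object pixels across the object boundary; the function name, the 1-D formulation, and the handling of interior gaps are illustrative assumptions, not the authors' exact procedure.

```python
def symmetric_pad_row(row, mask):
    """Fill pixels outside the object (mask == 0) by mirroring object
    pixels across the object boundary, so the padded row avoids a hard
    discontinuity at the object edge (a 1-D sketch of symmetric padding)."""
    row = list(row)
    n = len(row)
    inside = [i for i in range(n) if mask[i]]
    if not inside:
        return row                      # no object pixels: leave row as-is
    lo, hi = inside[0], inside[-1]
    for i in range(n):
        if not mask[i]:
            if i < lo:                  # left of the object: mirror about lo
                j = min(2 * lo - i, hi)
            elif i > hi:                # right of the object: mirror about hi
                j = max(2 * hi - i, lo)
            else:                       # gap inside the object: copy an edge pixel
                j = lo if i - lo < hi - i else hi
            row[i] = row[j]
    return row
```

Applied to an 8-pixel row whose object occupies the first four pixels, the padded half mirrors the object values, so the 2D-DCT sees no artificial step at the boundary.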
Embedded still image coder with rate-distortion optimization
Jin Li, Shawmin Lei
It is well known that the fixed rate coder achieves optimality when all coefficients are coded with the same rate-distortion (R-D) slope. In this paper, we show that the performance of the embedded coder can be optimized in a rate-distortion sense by coding the coefficients with decreasing R-D slope. We denote such a coding strategy as rate-distortion optimized embedding (RDE). RDE allocates the available coding bits first to the coefficient with the steepest R-D slope, i.e. the largest distortion decrease per coding bit. The resultant coding bitstream can be truncated at any point and still maintain an optimal R-D performance. To avoid the overhead of coding order transmission, we use the expected R-D slope, which can be calculated from the coded bits and is available in both the encoder and the decoder. With the probability estimation table of the QM-coder, the calculation of the R-D slope can be just a look-up table operation. Experimental results show that the rate-distortion optimization significantly improves the coding efficiency in a wide range of coding rates.
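The ordering rule at the heart of RDE can be sketched as a greedy selection by R-D slope. In this sketch each coding unit is given directly as a (distortion decrease, bits) pair; the actual coder estimates an expected slope from already-coded bits via the QM-coder's probability table, which is not modeled here.

```python
import heapq

def rd_optimized_order(units):
    """Return the indices of coding units in decreasing order of R-D
    slope (distortion decrease per coding bit), so that truncating the
    emitted stream at any point keeps the steepest-slope units.
    `units` is a list of (distortion_decrease, bits) pairs, bits > 0."""
    heap = [(-d / b, i) for i, (d, b) in enumerate(units)]
    heapq.heapify(heap)                 # min-heap on negated slope
    order = []
    while heap:
        _slope, i = heapq.heappop(heap)
        order.append(i)
    return order
```

Coding the units in this order makes every prefix of the bitstream an (approximately) R-D optimal truncation.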
Adaptive embedding for reduced-complexity image and video compression
An embedded coding algorithm creates a compressed bit stream which can be truncated to produce reduced resolution versions of the original image. This property allows such algorithms to precisely achieve fixed bit rates, reducing or eliminating the need for rate control in video transmission applications. Furthermore, embedded bit streams have a certain degree of innate error resistance and can also be used to facilitate unequal error protection. Unfortunately, embedded compression algorithms are relatively slow. In this paper, we introduce the concept of adaptive embedding in an effort to address this problem. Such embedding increases the speed of the algorithm by reducing the number of separate resolution layers contained within the bit stream. Since it is often impossible to effectively use the many layers created by existing embedded coders, sacrificing some of them to speed up the processing may be quite acceptable. We show here that adaptive embedding increases execution speed by 28 percent for fixed-rate video compression with only a 2 percent reduction in rate-distortion performance. Finally, we also introduce an alternate form of lossless compression which increases execution speed by another 6-10 percent at the expense of reconstruction quality.
Robust embedded zerotree wavelet coding algorithm
Sujitha Thillainathan, David R. Bull, Cedric Nishan Canagarajah
Error-resilience is an important feature of any image or video coding algorithm associated with transmission over noisy or multipath channels. In this paper, we present a robust coding algorithm based on a modified version of the zerotree coding technique. The algorithm provides significantly improved error-resilience with minimum added redundancy while still retaining the efficiency and scalability of the original technique.
Motion Estimation I
Novel computationally scalable algorithm for motion estimation
Krisda Lengwehasatit, Antonio Ortega, Andrea Basso, et al.
Because motion estimation represents a major computational load in typical video encoding systems, there has been extensive research into fast motion estimation techniques. Given the nature of the process, two major classes of complexity reduction techniques have been proposed. These seek to speed up search times by (i) reducing the cost of each matching operation or (ii) reducing the number of points considered in the search region. In fast matching (FM) techniques, a typical approach is to compute the cost function based on a subset of pixels in a block. In fast search (FS) approaches, the complexity reduction comes from restricting the number of points in the search region, based on fixed rules or on initialization based on motion vectors already computed for other blocks or the previous frame. In this paper we use as a baseline algorithm the initialize-refine technique, which belongs to the FS class. We concentrate on the case of real time software video encoding, which allows the flexibility of using variable complexity algorithms. Thus, we modify our baseline algorithm using a Lagrange multiplier approach that allows us to explicitly take into account the trade-offs between search complexity and residual frame energy. Furthermore, we combine this algorithm with a novel fast matching method for SAD estimation which allows us to estimate the SAD based on successive subsets of pixels in a particular block. This method naturally possesses computational scalability, because the matching can be stopped after any subset, and it gives us one more degree of freedom to control the complexity/residual energy trade-off. We show that the combined algorithm achieves reductions of around 25 percent in computation time with respect to the original algorithm without SAD estimation.
These results are further improved by designing a test structure that is optimized for typical sequences and where tests for an early termination of the matching process are only included if they are thought to be worthwhile in terms of the overall complexity.
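The benefit of evaluating the SAD over successive pixel subsets is that the matching loop can terminate as soon as a candidate is provably worse than the best match so far. A minimal sketch, with a per-pixel check standing in for the paper's subset-by-subset tests:

```python
def sad_with_early_exit(block_a, block_b, best_so_far):
    """Accumulate the sum of absolute differences, abandoning the
    candidate as soon as the partial sum already exceeds the best SAD
    found so far (the SAD can only grow as more pixels are added)."""
    sad = 0
    for a, b in zip(block_a, block_b):
        sad += abs(a - b)
        if sad >= best_so_far:
            return best_so_far          # cannot beat the current best
    return sad
```

Candidates far from the true match are usually rejected after only a few pixels, which is where the computational saving comes from.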
Hierarchical motion estimation using binary pyramid with three-scale tilings
Xudong Song, Ya-Qin Zhang, Tihao Chiang
In this paper, a hierarchical motion estimation algorithm using a binary pyramid (HMEBP) with 3-scale tilings is proposed. In the HMEBP scheme, motion estimation is performed using three block sizes in the real domain at the topmost layer. At the intermediate layers, each candidate motion vector is refined in the binary domain at three different scales and the best motion vector from each scale is propagated to the next layer for further refinement. At the lowest layer, one motion vector is selected for refinement for each macroblock based on minimizing the motion-compensated predicted error. The proposed technique greatly reduces computational complexity compared with the full search because motion estimation in the binary domain only involves Boolean logic operations. This results in a substantial reduction in hardware complexity. Simulations on three MPEG sequences show the performance of the HMEBP is comparable with that of the full search.
Hierarchical block-matching algorithm using partial distortion criterion
A new block matching algorithm (BMA) especially appropriate for large search areas is proposed. Motion vectors of causally adjacent blocks can be credible motion vector candidates in continuous motion fields. However, they are not helpful for searching complex or random motions. In order to remedy this problem, we propose a new two-step hierarchical block matching algorithm using spatial correlation in a motion field. In the first step, the candidates for an initial estimate consist of four motion vectors of adjacent blocks for searching continuous motion, and regularly sub-sampled points for searching complex or random motions. In the second step, the estimate is refined within a smaller search area by using full search BMA (FS-BMA). The straightforward application of the first step, however, tends to break data flow regularity due to random locations of four adjacent motion vectors. Therefore, in order to maintain consistent data flow in examining the four adjacent vectors, we introduce a partial mean absolute difference which is calculated by using a partial searching block rather than the whole block. Simulation results show that, in comparison with FS-BMA, the proposed algorithm reduces the computational complexity to 5.9 percent with negligible PSNR degradations. Furthermore, due to its regular data-flow, our scheme is especially suitable for hardware implementation.
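The two-step structure can be sketched as below: a small candidate set is evaluated first, then a full search in a small window refines the best candidate. The candidate list, block layout, and window size are illustrative, and plain SAD stands in for the paper's partial mean absolute difference.

```python
def two_step_search(cur, ref, bx, by, bs, candidates, refine=1):
    """Two-step block matching sketch.  Step 1: evaluate a small set of
    candidate vectors (e.g. neighbours' vectors plus subsampled grid
    points).  Step 2: full search in a small window around the winner.
    `cur`/`ref` are 2-D pixel arrays (lists of rows); the signature is
    illustrative."""
    def sad(dx, dy):
        s = 0
        for y in range(by, by + bs):
            for x in range(bx, bx + bs):
                ry, rx = y + dy, x + dx
                if not (0 <= ry < len(ref) and 0 <= rx < len(ref[0])):
                    return float("inf")     # vector points outside the frame
                s += abs(cur[y][x] - ref[ry][rx])
        return s
    best = min(candidates, key=lambda v: sad(*v))       # step 1
    window = [(best[0] + dx, best[1] + dy)              # step 2: refine
              for dx in range(-refine, refine + 1)
              for dy in range(-refine, refine + 1)]
    return min(window, key=lambda v: sad(*v))
```

Because step 2 only searches a small window, the total number of SAD evaluations stays far below that of a full search over a large area.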
Simple illumination-corrected vector search algorithm
Albert A. Deknuydt, Stefaan Desmet, Luc Van Eycken
Plain block matching, and many improved block-based matching techniques, fail to determine real physical motion when there is a marked change in luminance. Such changes occur often, but most of the time, the block average changes only a few units per frame, if the frame rate is reasonably high. This is not enough to completely upset plain block matching. When special effects are present, or frame rates are low, the effect becomes more important. Then vectors returned by plain block matching try to match the change in average luminance. What happens as a result of this depends on the exact codec used. Some coders will find the resulting total difference energy too large, and switch to intra mode coding. An example of this behavior is the problem most MPEG encoders have with photographic flashlights. A more intelligent codec might still code in inter mode, but will have to recode all details in the block, because of the wrong vector. If we could use a vector that keeps track of real physical motion and a codec that can deal with DC changes separately, we would be much better off. In this paper we describe a simple way to correct for these illumination changes, without adding any significant computational burden to the encoder, and show that its addition to an improved block matching algorithm generates more realistic motion vectors.
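In the spirit of the correction described here, one simple fix is to remove each block's mean before matching, so a uniform luminance change no longer biases the vector search, and the DC offset can be coded separately. A sketch, not necessarily the authors' exact formulation:

```python
def illumination_corrected_sad(block_a, block_b):
    """SAD between two blocks after subtracting each block's mean, so a
    constant luminance offset between the blocks contributes nothing
    to the match cost."""
    mean_a = sum(block_a) / len(block_a)
    mean_b = sum(block_b) / len(block_b)
    return sum(abs((a - mean_a) - (b - mean_b))
               for a, b in zip(block_a, block_b))
```

A candidate block that differs from the current block only by a flash-like brightness step now scores a perfect match, leaving the true motion vector free to win.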
Motion estimation and compensation based on region-constrained warping prediction
Dong-Il Chang, Joon Hyun Sung, Jeong Kwon Kim, et al.
The visually annoying artifacts resulting from the block matching algorithm (BMA), blocky artifacts, become noticeable in applications for low bit rates. Warping prediction (WP) based schemes can remove the blocky artifacts of BMA successfully, but they also produce severe prediction errors around the boundaries of moving objects. Since the errors around the boundaries of objects are visually sensitive, they may sometimes look more annoying than blocky artifacts. The lack of ability to model motion discontinuities is the major reason for the errors from WP. Motion discontinuities usually exist in practical video sequences, so that it is required to develop a more reliable motion estimation and compensation scheme for low bit rate applications. In this paper, we propose a new WP scheme, named region constrained warping prediction (RCWP), which places motion discontinuities according to the segmentation results. In RCWP, there is mutual dependency between the estimated motion field and the segmentation mask. Because of the mutual dependency, an iterative refinement process is also introduced. Experimental results have shown that the proposed algorithm can provide much better subjective and objective performance than the BMA and the conventional warping prediction.
New algorithm for motion estimation on interlaced video
Many video processing algorithms can profit from motion information. Therefore, motion estimation is often an integral part of advanced video processing algorithms. This paper focuses on the estimation of true-motion vectors, which are required for scan-rate conversion. Two recent motion estimator methods will be discussed. By combining these two methods, the major drawbacks of the individual MEs are eliminated. The new resulting motion estimator proves to be superior to alternatives in an evaluation.
Video Coding
Joint block-based video source/channel coding for packet-switched networks
Raynard O. Hinds, Thrasyvoulos N. Pappas, Jae S. Lim
Block-based video coders rely on motion compensated block prediction for data compression. With the introduction of video coding over packet-switched networks such as the Internet and the resulting packet loss that occurs on congested networks, coding mode selection for each macroblock is significant in determining the overall distortion on the decoded video sequence. In this work, we examine the problem of mode selection for macroblocks in the presence of loss and a channel rate constraint. We present and evaluate several methods for mode selection that attempt to minimize perceptual distortion from packet loss. We formulate a simplified problem which is useful for gaining some insight, and present an efficient algorithm for finding an optimal solution.
Multiple-reference-picture video coding using polynomial motion models
Thomas Wiegand, Eckehard G. Steinbach, Axel Stensrud, et al.
We present a new video coding scheme that uses several reference frames for improved motion-compensated prediction. The reference pictures are warped versions of the previously decoded frame, generated by applying polynomial motion compensation. In contrast to global motion compensation, where typically one motion model is transmitted, we show that in the general case more than one motion model is of benefit in terms of coding efficiency. In order to determine the multiple motion models we employ a robust clustering method based on the iterative application of the least median of squares estimator. The approach is incorporated into an H.263-based video codec and embedded into a rate-constrained motion estimation and macroblock mode decision framework. It is demonstrated that adaptive multiple reference picture coding in general improves rate-distortion performance. PSNR gains of 1.2 dB in comparison to the H.263 codec for the high global and local motion sequence Stefan and 1 dB for the sequence Mobile and Calendar, which contains no global motion, are reported. These PSNR gains correspond to bit-rate savings of 21 percent and 30 percent compared to the H.263 codec, respectively. The average number of motion models selected by the encoder for our test sequences is between 1 and 7 depending on the actual bit-rate.
Adaptive prediction models for optimization of video encoding
Axel Brinkmann, Jose Ignacio Ronda, Angel Pacheco, et al.
The increasing demand for high quality digital video broadcasting systems has driven the development of several coding standards, the motion picture experts group (MPEG) standard 4 being the most recent. Since these standards only define the bitstream syntax, such essential features of the coder as controlling the output bitrate have to be implemented individually. An important problem arising here is the optimization of the bitrate-distortion trade-off, since in general the resulting bitstream has to meet the requirements of a communication channel with limited bandwidth. Obviously the output bitrate can be influenced by adjusting the quantization of the coder; however, a general functional relation to predict bitrate and distortion at a given quantization for a frame does not exist, since these parameters depend strongly on the properties of the actual video sequence. With the help of a predictor adapting to a specific video sequence, it would be possible to run an algorithm that calculates the optimal quantization parameters causing the coder to meet the bitrate requirements of the given channel. The main objective of this paper is to design a model of an MPEG-4 coder with reduced complexity that is capable of predicting the functional relation between quantization, bitrate and distortion for the next frame to be coded of a given video sequence. This model will be implemented as a neural network forming a two-layer perceptron.
Scalable high-definition video coding
Gary Lilienfield, John W. Woods
A new source coding algorithm is proposed which delivers an encoded bit stream that is scalable in both spatial resolution and frame rate. Motion compensated temporal filtering is combined with a spatial subband/wavelet pyramid to provide an efficient 3D multiresolution representation that yields perceptually acceptable low frame rate sequences. Error feedback hierarchical coding is used to eliminate the propagation of coding errors between spatiotemporal resolutions in order to achieve near optimal results for each subvideo. The proposed order of refinement addresses the non-commutative property of motion compensated temporal filtering and spatial subband/wavelet analysis. A new algorithm based on adaptive conditional arithmetic coding of quantizer significance maps is introduced and is shown to increase coding efficiency. Experimental results demonstrate a significant improvement in performance over other published algorithms. The algorithm is found to be more robust with respect to variations in the type of motion present in video data. The code-and-refine nature of the hierarchical algorithm makes it possible to use the algorithm as a scalable extension to other source coding algorithms. Finally, the complexity of the algorithm is modest and is well suited to a parallel implementation.
Mobile videophone systems for radio speech channels
Lee David Scargall, Satnam Singh Dlay
A suite of bandwidth-efficient image codecs is presented for use in second-generation wireless systems, such as the American IS-54 and IS-95 systems, the Pan-European GSM system and the Japanese digital cellular system. The proposed codecs are configured to operate at 9.6 kbit/s and are suitable for Quarter Common Intermediate Format videophone sequences, scanned at 10 frames per second. The new image codecs employ the orthonormal wavelet transform to decompose the displaced frame difference data for each frame into four frequency subbands. The wavelet coefficients within the frequency subbands are then encoded using vector quantization. Comparison measures are undertaken for the two-stage pairwise nearest neighbor (PNN) algorithm and the designs are rated upon their ability to coherently reconstruct an efficient codebook from a training sequence of vector coefficients. It was found that the two-stage PNN algorithm constitutes a valuable compromise in terms of computational complexity with only negligible performance loss. When the codecs were configured to operate at 9.6 kbit/s, the average peak signal to noise ratios of the two-stage PNN and the adaptive algorithms were in excess of 28 dB and 30 dB respectively.
Backward context adaptive interframe coding
Wenqing Jiang, Antonio Ortega
This paper presents a backward context adaptive coder for motion-compensated difference frames based on a mixture density model. In this model, each pixel is assumed to be generated from a random source with probability distribution conditioned on the interframe context which consists of the local intensity context and the local displacement vector context. To estimate this conditional probability distribution, a backward adaptive nonparametric approach is chosen in our work due to its fast adaptivity to the input data characteristics. Once this probability distribution is found, each pixel can then be coded by using the corresponding entropy coder. As an application, we modified Telenor's TMN5 H.263 algorithm based on this context adaptive coding idea and implemented a hybrid interframe coder. Our simulation results show that the coding performance at 8kbps is comparable to that of H.263 without obvious blocking effects and our coder can also be implemented with lower complexity.
Color-based classifier for region identification in video
Richard P. Schumeyer, Kenneth E. Barner
Content-based coding has been proposed by several authors and members of the MPEG-4 community as a solution to very low rate video coding. Using content coding, a video sequence is decomposed into objects that may be encoded independently. Such a scheme requires a fast and accurate segmentation algorithm to identify various objects in the scene. In this paper we propose, develop, and analyze a color-based segmentation algorithm. One application of interest is coding of sign language video sequences. The requirements for accurate perception of sign language differ from those of traditional head-and-shoulders videoconferencing sequences. We propose a content-based coding method in which perceptually important regions in an image are identified, and more resources are allocated to these regions. Since the face, hands and arms are important components of sign language, regions are defined that encompass these features. The dynamic segmentation algorithm identifies flesh regions using statistical methods operating on image color distributions. A method for performing the segmentation in the perceptually linear LAB space using data captured in the YCbCr space is developed. Results of encoding sign language sequences using the proposed content-based methods illustrate the improved quality that can be achieved at the same bit rate when compared to a uniform algorithm.
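The flesh-region test can be sketched as a chrominance threshold. The CbCr bounds below are common illustrative values for skin tones, not the statistical model or LAB-space processing the paper actually uses.

```python
def flesh_mask(pixels, cb_range=(77, 127), cr_range=(133, 173)):
    """Label a pixel as flesh when its chrominance falls inside a
    rectangular CbCr region; `pixels` is a list of (Y, Cb, Cr) tuples.
    The default bounds are illustrative assumptions."""
    return [cb_range[0] <= cb <= cb_range[1] and
            cr_range[0] <= cr <= cr_range[1]
            for (_y, cb, cr) in pixels]
```

Working on Cb and Cr alone makes the test largely insensitive to brightness, which is why chrominance-based skin detection is a popular first stage before statistical refinement.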
SNR scalable video coder using progressive transmission of DCT coefficients
Marshall A. Robers, Lisimachos P. Kondi, Aggelos K. Katsaggelos
The importance of signal-to-noise ratio (SNR) scalable video compression algorithms has increased in the past few years. This emergence corresponds with the vast increase of products and applications requiring the transmission of digital video streams. These new applications, including video telephony/teleconferencing, video surveillance/public safety, and video-on-demand, require limiting the bandwidth of the compressed bitstream to less than the capacity of the transmission channel. However, the channel capacity is frequently unknown at the time of compression, especially when the stream is to be broadcast to many users over heterogeneous channels. SNR scalable compression allows a single compression to provide bitstreams of multiple qualities. In this fashion, the transmitted bitrate can match the available channel(s) without requiring multiple encodings. In this paper, we present a novel approach to SNR scalable video compression. Our approach combines two separate methodologies for dividing the blocks of discrete cosine transform (DCT) coefficients. The flexible combination of these approaches allows each DCT block to be divided into a fixed number of scans while also controlling the size of each scan. Thus, the transmitted stream can contain any subset of scans from the overall compressed version and thereby both the transmitted bitrate and the quality or SNR are allowed to vary.
Compression of mixed video and graphics images for TV systems
The diversity in TV images has increased with the growing application of computer graphics. In this paper we study a coding system that supports both the lossless coding of such graphics data and regular lossy video compression. The lossless coding techniques are based on runlength and arithmetic coding. For video compression, we introduce a simple block predictive coding technique featuring individual pixel access, so that it enables a gradual shift from lossless coding of graphics to the lossy coding of video. An overall bit rate control completes the system. Computer simulations show a very high quality with a compression factor between 2 and 3.
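The run-length part of the lossless path can be sketched in a few lines: flat synthetic regions, which dominate graphics content, collapse into short (value, count) runs. The arithmetic-coding stage that would follow in the described system is omitted here.

```python
def run_length_encode(pixels):
    """Encode a row of pixels losslessly as [value, count] runs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1            # extend the current run
        else:
            runs.append([p, 1])         # start a new run
    return runs
```

A row of identical graphics pixels reduces to a single run, while natural video, with few exact repeats, would instead take the lossy block-predictive path.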
Pre- and Post-processing
Block effect reduction for the two-layer video codec based on MPEG-2
Taehwan Shin, Tae-Sun Choi
In this paper, we describe the two-layer video codec based on MPEG-2 and its layering algorithms, where we exploit the property that the human visual system is more sensitive to low frequency coefficients than high frequency coefficients in the DCT domain. We propose a new block effect reduction algorithm, called content-based AC correction, which predicts AC coefficients considering the image content itself in the DCT domain, because the block effect occurs in a block which has few AC coefficients in the DCT domain. First, this algorithm detects which block has a block effect. Secondly, this algorithm detects if a block is located near an edge or a boundary of a video object. If the block is located near any edge or boundary, a new AC correction algorithm is used. Otherwise, the traditional AC correction algorithm is used.
Elimination of blocky artifacts in predictive video coding
Yoon-Seok Jung, Geun-Soo Park, Samuel Moon-Ho Song
In a number of image compression standards, the motion vectors are generally obtained by the well-known block matching algorithm (BMA), and the error image, along with the computed motion vectors, is encoded. Unfortunately, this widely used approach generates artificial block boundary discontinuities, called blocky artifacts, between the blocks. Since the blocky artifacts are caused by synthesizing the predicted frame using one constant motion vector per block, we propose an algorithm that interpolates the motion vectors before the construction of the predicted image. Naturally, using spatially smooth motion vectors completely eliminates the blocky artifact. However, we can no longer use the motion vectors as provided by the BMA. The optimum motion vectors must minimize the norm of the error image. The proposed algorithm computes the optimum motion vectors, with the interpolation process built into the algorithm. To obtain spatially smooth motion vectors, we use a band-limited interpolation, and thus, we refer to our algorithm as the band-limited motion compensation (BLMC). Our simulations indicate that the BLMC completely eliminates the blocky artifacts, as expected, and in addition provides a higher peak signal-to-noise ratio in comparison to the traditional BMA based motion compensation (BMC) as well as the overlapped BMC.
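The idea of giving every pixel its own smooth vector can be sketched with bilinear interpolation between the four surrounding block vectors; the paper's actual interpolator is band-limited, which this simple sketch does not reproduce.

```python
def interpolate_motion(v00, v10, v01, v11, fx, fy):
    """Bilinearly interpolate four neighbouring block motion vectors at
    fractional position (fx, fy) in [0, 1] x [0, 1], yielding a
    spatially smooth per-pixel vector field."""
    def lerp(a, b, t):
        return a + (b - a) * t
    vx = lerp(lerp(v00[0], v10[0], fx), lerp(v01[0], v11[0], fx), fy)
    vy = lerp(lerp(v00[1], v10[1], fx), lerp(v01[1], v11[1], fx), fy)
    return (vx, vy)
```

Because the interpolated field varies continuously across block boundaries, the synthesized prediction no longer exhibits the per-block constant-vector steps that cause blocky artifacts.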
Reduction of coding artifacts at low bit rates
Thomas Meier, King N. Ngan, Gregory A. Crebbin
Many video and image coding standards such as MPEG, H.261, H.263, and JPEG are based on the discrete cosine transform (DCT). They subdivide images or frames into blocks that are encoded independently. DCT coefficients that are considered to be less important are discarded. The achievable compression ratios are normally limited by visible discontinuities along block boundaries, the so-called blocking artifacts. In this paper, a postprocessing algorithm based on Markov random fields (MRF) is proposed that efficiently removes blocking effects, while retaining the sharpness of the image and without introducing new artifacts. To prevent blurring of dominant edges, the decoded image is segmented into regions which are then enhanced separately. The segmentation is performed in two steps. A novel texture detector first identifies all texture regions, before the remaining monotone areas are partitioned by an MRF segmentation algorithm. A new edge component has been incorporated to improve the detection of dominant edges. The proposed enhancement stage calculates the MAP estimate of the unknown original image, which is modeled by an MRF. An efficient implementation is presented, and experiments demonstrate that our proposed postprocessor gives very good results both objectively and subjectively.
Deblocking filter with two separate modes in block-based video coding
Sung Deuk Kim, Jaeyoun Yi, Jong Beom Ra
This paper presents a method to remove blocking artifacts in low bit-rate block-based video coding. The proposed algorithm has two separate filtering modes, which are selected by pixel behavior around the block boundary. In each mode, proper 1D filtering operations are performed across the block boundary along horizontal and vertical directions, respectively. In the first mode, corresponding to flat regions, a strong filter is applied inside the block as well as on the block boundary, because the flat regions are more sensitive to the human visual system and the artifacts propagated from the previous frame due to motion compensation are distributed inside the block. In the second mode, corresponding to other regions, a sophisticated smoothing filter, which is based on the frequency information around block boundaries, is used to reduce blocking artifacts adaptively without introducing undesired blur. Even though the proposed deblocking filter is quite simple, it improves both subjective and objective image quality for various image features.
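The two-mode structure can be sketched on a single row of pixels straddling a block boundary; the flatness test, thresholds, and filter taps below are illustrative stand-ins for the paper's mode decision and smoothing filters.

```python
def deblock_1d(left, right, flat_thresh=2, strong_k=3):
    """Two-mode 1-D deblocking sketch for the pixels on either side of a
    block boundary.  If the boundary neighbourhood is flat, a strong
    averaging filter is applied over several pixels inside both blocks;
    otherwise only the two boundary pixels are softened."""
    samples = left[-2:] + right[:2]                 # boundary neighbourhood
    flat = max(samples) - min(samples) <= flat_thresh
    left, right = list(left), list(right)
    if flat:        # mode 1: strong smoothing inside the blocks
        avg = sum(left[-strong_k:] + right[:strong_k]) / (2 * strong_k)
        for i in range(1, strong_k + 1):
            left[-i] = avg
        for i in range(strong_k):
            right[i] = avg
    else:           # mode 2: soften only the boundary step
        a, b = left[-1], right[0]
        left[-1], right[0] = (3 * a + b) / 4, (a + 3 * b) / 4
    return left, right
```

Across a strong edge the default mode leaves almost everything untouched, which is how the filter avoids blurring real image detail while still flattening artificial block steps.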
JPEG image enhancement based on adaptive learning
Guoping Qiu, Hsiao-Pei Lee
In this paper, a new technique is developed to enhance the quality of JPEG compressed images. Based on the principle of learning by examples, the new method constructs the enhancement system adaptively. In order not to incur overheads, the processing system is designed to have only 7 parameters, which can also be estimated off-line. Extensive simulations have been performed, and the results show significant improvements in both the subjective and objective quality of JPEG compressed images. It is also observed that the new technique performs competitively with state-of-the-art techniques and is computationally more efficient.
Filtering/Interpolation
icon_mobile_dropdown
Effect of image stabilization on the performance of the MPEG-2 video coding algorithm
A. Tanju Erdem, Cigdem Eroglu
The effect of image stabilization on the performance of the MPEG-2 video coding algorithm is investigated. It is shown that image stabilization prior to MPEG-2 coding of an unsteady image sequence does increase the quality of the compressed video considerably. The quality improvement is explained by the fact that an actual zero motion vector is favored over a zero differential motion vector in the MPEG-2 video coding scheme. The bits saved in coding the motion information in P frames are then utilized in the coding of the DCT data in I frames. The temporal prediction of the macroblocks in P and B frames is also improved because of the increased quality of the compressed I frames and because an image stabilization algorithm can compensate for displacements with better than 1/2-pixel accuracy.
Generalized image degradation model for removing motion blur in image sequence
Yoo Chan Choung, Jeong-Ho Shin, Joon-Ki Paik
Moving pictures inevitably suffer from motion blur caused by the relative motion between a camera and objects. Such degradation is due to the non-ideal operation of a shutter in the imaging instrument. In particular, motion blur is usually space-variant because the moving objects have different orientations and velocities. The purpose of this paper is to give a precise model of the space-variant motion blur, and to propose an adaptive image restoration technique which removes the corresponding motion blur. The main concern of the proposed model is to explain the phenomenon that occurs in the boundary region of moving objects. In the process of estimating the degradation, a hierarchical motion estimation method is used. The adaptive restoration algorithm then shows subjectively acceptable performance in eliminating the space-variant motion blur.
3D interpolation for the digital restoration of 35-mm film
Heimo Mueller-Seelich, Walter Plaschzug, Klaus Glatz
The celebration of the Centenary of Cinema in 1995 was the occasion to initiate new developments for the preservation of the international cinematic heritage and the restoration of old damaged films. 'Classical' film restoration is based on special printing machines to improve the quality of copies. Only a small class of artifacts can be removed with such a process because the unit of manipulation is always a whole image sequence. With the help of digital image processing techniques the restoration process can be adapted for each frame or even each pixel. This creates new possibilities for the restoration of films beyond repair, especially nitrate-based films produced before 1954 and early color films. This paper presents a short overview of a system for the digital restoration of image sequences, currently under development in the EUREKA project LIMELIGHT. After an introduction to the technical objectives and key figures, the restoration process is described for the case of 35-mm film. Algorithms for the detection of artifacts such as dust, image vibrations, scratches, distorted frames and brightness variations, based on a morphological detector which uses spatial properties and a dynamic detector based on motion analysis, are presented. Furthermore, an algorithm for 3D image interpolation used for the removal of scratches and subtitles is described. The main problem is the reconstruction of the missing image content over more than one frame in the same spatial location. Application examples for each defect class are given.
General framework of image sequence interpolation
Jeong-Ho Shin, Yoo Chan Choung, Joon-Ki Paik
Image interpolation is widely used in various image processing applications. In this paper, we propose a general framework for performing image sequence interpolation and a novel image sequence interpolation technique using spatio-temporal processing, which can magnify an image to a higher resolution than conventional methods. The proposed algorithm is also shown to efficiently reduce noise by using motion-compensated temporal filtering. The efficiency of the proposed algorithm is demonstrated through several experimental results.
Model-based Coding
icon_mobile_dropdown
Locally accurate motion estimation for object-based video coding
Michael K. Steliaros, Graham R. Martin, Roger A. Packwood
We describe new motion coding algorithms that develop fixed size block matching (FSBM) into variable sized block matching (VSBM) and a modified approach (MVSBM) which can exploit irregularly shaped areas of uniform motion. New generations of video coding standards handle arbitrarily shaped visual objects as well as frame-based input. We explain how MVSBM strategies work when combining shaped data and block-based algorithms. Locally accurate motion information is produced by combining otherwise ambiguous estimates produced by the small-area matching required to detect locally diverse movements. The success of various prediction strategies indicates that the motion information is well behaved and thus likely to be accurate, given complex natural image sequence source material. MVSBM is evaluated by forcing it to perform with the same quality prediction as FSBM, then comparing the number of bits required by each technique. FSBM vector coding methods are taken from H.263 and MPEG-4 for comparison with those developed for MVSBM; extra compression phases are developed for MVSBM by utilizing the greater structure of the representation. Results are presented for three MPEG-4 test sequences, 'Container', 'Weather' and 'Stefan'. We show bit savings of 67 percent for 'Container', and 13 percent for 'Stefan' with more complex object activity.
Region-based motion estimation with uncovered region detection
In natural video sequences, object movement causes regions to be covered or uncovered. Conventional algorithms for region-based motion estimation do not take these regions into full account. Uncovered regions seriously decrease the accuracy of motion estimation. This paper presents an algorithm for increasing the motion estimation accuracy. This algorithm detects uncovered regions and uses them to improve image segmentation. Experimental results show that the presented algorithm is effective in reducing the displaced frame difference, without introducing any extra information for coding applications.
Region-based video coder using edge flow segmentation and hierarchical affine region matching
Debargha Mukherjee, Yining Deng, Sanjit K. Mitra
The essential motivation towards an object-based approach to video coding is the possibility of an object-based coding scheme. In this work we present a region-based video coder which uses a segmentation map obtained from the previous reconstructed frame, thereby eliminating the need to transmit expensive shape information to the decoder. While the inspiration for this work is derived from previous work by Yokoyama et al., there are major differences between our work and the earlier effort in the segmentation scheme employed, the motion model, and the handling of overlapped and uncovered regions. We use an edge-flow-based segmentation scheme, which appears to produce consistent segmentation results over a variety of natural images. Since it combines luminance, chrominance and texture information for image segmentation, it is well suited to segmenting real-world images. For motion compensation, we choose an affine model, and use hierarchical region matching for accurate affine parameter estimation. Heuristic techniques are used to eliminate overlapped and uncovered regions after motion compensation. Extensive coding results of our implementation are presented.
Object-oriented motion compensation for very low bitrate coding applying content-based triangle meshes
Martina Eckert, Javier Villar, Jose Ignacio Ronda, et al.
The two most important aims in video coding are to achieve good quality on the one hand and a low bit-rate on the other. To combine these almost contradictory requirements it is necessary to use a motion compensation technique which has good subjective transformation characteristics and does not require the transmission of much more than motion information. This paper presents a motion compensation scheme based on 2D triangle meshes with irregularly spread node points. The nodes are selected with the same method on the encoder and decoder sides, so that the only information that has to be transmitted is the motion vectors, and very low bit-rates can be achieved. In a second approach, this model is extended to a technique which adapts particular meshes to the objects in the scene and performs individual transformations. In this way it is possible to compensate motion discontinuities and to improve subjective and objective quality over traditional motion compensation schemes without overly costly computational procedures.
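The core operation in triangle-mesh motion compensation is recovering an affine warp from the motion of a triangle's three nodes. As a hedged illustration (not the paper's implementation), the six affine parameters follow from solving a small linear system over the three vertex correspondences:

```python
import numpy as np

def triangle_affine(src, dst):
    """Solve for the affine map taking the three source vertices 'src'
    to the destination vertices 'dst' of a mesh triangle.
    src, dst: 3x2 arrays of (x, y) node coordinates.
    Returns (A, t) such that dst_i = A @ src_i + t."""
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    # Homogeneous system: [x y 1] @ M = [x' y'] for each of the 3 vertices
    S = np.hstack([src, np.ones((3, 1))])
    M = np.linalg.solve(S, dst)   # 3x2 parameter matrix
    A, t = M[:2].T, M[2]
    return A, t
```

Each mesh triangle gets its own (A, t); interior pixels are then warped with that map, which is what lets irregular meshes follow object deformation with only node motion vectors transmitted.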
Accurate segmentation and estimation of parametric motion fields for object-based video coding using mean field theory
Radhakrishnan Haridasan, John S. Baras
We formulate the problem of decomposing a scene into its constituent objects as one of partitioning the current frame into the objects comprising it. The motion parameter is modeled as a non-random but unknown quantity and the problem is posed as one of maximum likelihood (ML) estimation. The MRF potentials which characterize the underlying segmentation field are defined so that the spatio-temporal segmentation is constrained by the static image segmentation of the current frame. To compute the motion vector and the segmentation simultaneously we use the expectation-maximization (EM) algorithm. The E-step of the EM algorithm, which computes the conditional expectation of the segmentation field, now reflects interdependencies more accurately because of neighborhood interactions. We take recourse to mean field theory to compute the expected value of the conditional MRF. Robust M-estimation methods are used in the M-step. To allow for motions of large magnitude, image frames are represented at various scales and the EM procedure is embedded in a hierarchical coarse-to-fine framework. Our formulation results in a highly parallel algorithm that computes robust and accurate segmentations as well as motion vectors for use in low bit rate video coding.
Image and Video Coding Standards
icon_mobile_dropdown
H.263+ rate control via variable frame rates and global bit allocation
Two new rate control algorithms for H.263+ are proposed in this work. Almost all rate control algorithms studied before operate at a constant frame rate. However, at very low bit rates, video quality can be degraded severely in order to support a fixed frame rate. Our first rate control algorithm preserves good video quality by adjusting the frame rate under a given channel condition while maintaining motion smoothness. Our second rate control algorithm is a global bit allocation scheme which guarantees reasonable sub-optimality with low computational complexity. Experimental results are provided to demonstrate the performance of the proposed algorithms.
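A variable-frame-rate rate controller of the kind described can be sketched with a virtual buffer model. This is a simplified illustration, not the paper's algorithm: the drain model and skip threshold below are assumptions, and real controllers also adapt the quantizer rather than only skipping frames.

```python
def simulate_rate_control(frame_bits, channel_rate, fps, skip_threshold):
    """Toy variable-frame-rate control: a virtual buffer drains at the
    channel rate; when its fullness exceeds skip_threshold, the encoder
    skips frames (lowering the effective frame rate) until it recovers.
    frame_bits: bits each frame would cost if coded.
    Returns the list of coded-frame indices."""
    buffer = 0.0
    drain = channel_rate / fps        # bits drained per frame interval
    coded = []
    for i, bits in enumerate(frame_bits):
        buffer = max(0.0, buffer - drain)
        if buffer > skip_threshold:
            continue                   # skip this frame
        buffer += bits
        coded.append(i)
    return coded
```

With 1000-bit frames over a 5000 bit/s channel at a nominal 10 fps, the controller settles into skipping roughly every other frame once the buffer fills.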
Advanced rate control for MPEG-4 coders
Jose Ignacio Ronda, Martina Eckert, Sven Rieke, et al.
The interest developed in recent years in coded representations of video signals allowing independent manipulation of semantic picture elements has become one of the guidelines for the new ISO standard MPEG-4. The real-time generation of a bit-stream according to this new paradigm, which can be described as the multiplexing of a set of independently coded, arbitrarily shaped video objects, poses new requirements regarding transmission through either fixed- or variable-rate channels. In particular, adequate control algorithms should support the encoding of objects with different quality requirements and be robust with respect to rapid changes in the size and shape of the objects. After formalizing this new rate-control problem in terms consistent with these requirements, this paper focuses on the design of the corresponding rate-control algorithms in the real-time case, introducing and evaluating an approach relying on modeling of the source and the application of optimization criteria based on rate-distortion concepts for tuning the different object qualities. The experimental work presented, which corresponds to the recent MPEG-4 video verification model coder specification, apart from making apparent the superiority of global control over individual control of the coding of each object, allows a comparison of the advantages and drawbacks of the different optimization objectives.
New MPEG-2 rate control algorithms based on motion estimation statistics
Jungwoo Lee
A new rate control algorithm based on motion estimation error is presented. The algorithm consists of two steps. The first step deals with the target bit allocation for each frame. The complexity measure which determines the target bit allocation is calculated by using the motion estimation error statistics. The second step is to compute the bit spending profile within a frame. A nonlinear profile based on motion estimation statistics is used to allocate bits for each macroblock more efficiently. Experimental results show that the performance in terms of PSNR is significantly improved over a baseline rate control (TM5) algorithm. Compared to the TM5 baseline algorithm, the new algorithm has very little added complexity because it uses existing information from motion estimation.
Temporal error concealment technique for I-pictures in an MPEG-2 video decoder
Susanna Aign
For digital broadcasting of TV signals over various transmission media like satellite, cable, and terrestrial channels, the MPEG-2 video source coding algorithm will be considered. Since the video signal is very sensitive to channel disturbances due to variable length coding, a bit-error rate of less than 10^-8 has to be guaranteed. Therefore, a powerful error protection scheme is applied in the European TV broadcasting standards. However, in the case of bad reception conditions, e.g. deep fades or impulsive noise, remaining errors may still occur even in the most highly protected video signal. With remaining errors in the video signal, a good quality of service cannot be guaranteed. Therefore, post-processing techniques such as error concealment and iterative decoding at the receiver side are desirable. The aim of this article is to study different error concealment techniques for handling residual channel errors at the receiver side, taking into account a real transmission medium. For I-pictures, a motion-compensated temporal error concealment algorithm is proposed in which motion vectors are displaced along the motion direction.
Performance evaluation of the MPEG-4 visual coding standard
Atul Puri, Robert L. Schmidt, Barry G. Haskell
We first present an overview of the MPEG-4 video standard and its relationship to other existing as well as evolving video standards. MPEG-4 video, while introducing a new paradigm of treating each object in a scene independently, utilizes the traditional motion-compensated DCT framework for coding of each object. Thus, while introducing new object-based coding functionality, it is also capable of providing traditional frame-based coding. Furthermore, it supports advanced functionalities such as efficient coding of background as a sprite, robustness to channel errors, and spatial and temporal scalability of arbitrarily shaped objects. Next, we evaluate the statistical performance of MPEG-4 video under a number of selected conditions and compare it, depending on the application, with the H.263, MPEG-1 and MPEG-2 standards. For each traditional application, based on our limited set of experiments, MPEG-4 video appears to provide equal or better performance when compared to the most suitable existing standard addressing that application area. For the new object-based applications, MPEG-4 video incurs additional coding costs when coding arbitrarily shaped objects; with further optimization, however, this increased cost may be offset by improved tradeoffs in coding quality control, channel bandwidth, and decoding resource adaptation.
Foreground/background video coding using H.261
Douglas Chai, King N. Ngan
This research addresses the use of face segmentation to improve the subjective quality of videophone sequences encoded by an H.261-compliant coder. In this approach, each frame of the input video sequence is first segmented into two regions, namely, foreground and background. These regions are then encoded using the same coder but with different rate control and quantization levels. In this way, the image quality of the more important foreground region can be improved by encoding it with more bits at the expense of background image quality. In this paper, we present a summary of the face segmentation technique required by this approach and then describe the implementation of this foreground/background (FB) coding scheme on the H.261 framework. The discriminatory quantization process and a new rate control strategy for the FB regions are also discussed. In addition, the performance of the FB coding scheme was evaluated by experimental results obtained on typical videophone sequences.
Poster Presentations on Motion Estimation II
icon_mobile_dropdown
Using raw MPEG motion vectors to determine global camera motion
Maurizio Pilu
This paper presents a simple and effective method to determine global camera motion using raw MPEG-1 motion vector information obtained straight from real MPEG-1 streams, such as those of the new HITACHI MP-EG1A digital camcorder. The simple approach we have experimented with robustly fits a global affine optic flow model to the motion vectors. Other more robust methods are also proposed. In order to cope with the group-of-frames (GOF) discontinuity of the MPEG stream, B frames are used backward to determine the 'missing link' to a previous GOF, thereby ensuring continuity of the motion estimation across a reasonable number of frames. As a testbed, we have applied the method to the image mosaicing problem, for which interesting results have been obtained. Although several other methods exist to perform camera motion estimation, the approach presented here is particularly interesting because it exploits 'free' information present in MPEG streams and bypasses the highly expensive correlation process.
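Fitting a global affine flow model to a field of macroblock motion vectors reduces to ordinary least squares. The sketch below is a minimal, non-robust version of that fit (the paper emphasizes a robust fit; adding outlier rejection on top of this is straightforward but omitted here):

```python
import numpy as np

def fit_global_affine(positions, vectors):
    """Least-squares fit of the 6-parameter affine flow model
        u = a0 + a1*x + a2*y,   v = a3 + a4*x + a5*y
    to macroblock motion vectors.
    positions: Nx2 macroblock centers (x, y); vectors: Nx2 (u, v).
    Returns the parameter vector [a0, a1, a2, a3, a4, a5]."""
    P = np.asarray(positions, float)
    V = np.asarray(vectors, float)
    X = np.column_stack([np.ones(len(P)), P[:, 0], P[:, 1]])
    a_u, *_ = np.linalg.lstsq(X, V[:, 0], rcond=None)
    a_v, *_ = np.linalg.lstsq(X, V[:, 1], rcond=None)
    return np.concatenate([a_u, a_v])
```

A pure camera pan, for instance, yields constant (u, v) everywhere, so the fit returns only the translation terms with zero linear coefficients.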
Model-based Coding
icon_mobile_dropdown
3D-model-based nonrigid motion estimation for multiview image sequence compression
Ioannis Kompatsiaris, Dimitrios Tzovaras, Michael G. Strintzis
This paper describes a procedure for model-based coding of all channels of a multiview image sequence. The 3D model is initialized by accurate adaptation of a 2D wireframe model to the foreground object of one of the views. The rigid 3D motion is estimated for each triangle, and spatial neighborhood homogeneity constraints are used to improve the reliability and efficiency of the estimation and to smooth the motion field produced. A novel technique is used to estimate the flexible motion of the wireframe nodes from the rigid 3D motion of each triangle and the flexible deformation of each node of the wireframe. The performance of the resulting 3D flexible motion estimation method is evaluated experimentally.
Poster Presentations on Motion Estimation II
icon_mobile_dropdown
Epipolar constrained motion estimation for reconstruction from video sequences
Lionel Oisel, Etienne Memin, Luce Morin, et al.
In this paper we present a method for matching two different views of a static scene without any calibration information on the camera. To that end, we use a technique derived from optical flow estimation which takes the epipolar constraint into account. The epipolar geometry is computed directly from image data. We assume the dense disparity field to be smooth in any planar region. Smoothness is enforced using a regularization method. By adding a robust M-estimator on the smoothness term, the resulting model implicitly takes depth discontinuities into account. We use a multiresolution scheme that allows large displacements to be recovered. Results are shown on real pairs of images.
Feature-accelerated block matching
Bo Tao, Michael T. Orchard
We study the relationship between local features and block matching in this paper. We show that the use of local features can greatly improve block matching results by introducing several fast block matching algorithms. The first algorithm is based on pixel decimation. We show that pixels with larger gradient magnitude have larger motion compensation error; therefore, for pixel-decimation-based fast block matching, it is beneficial to subsample the block by selecting the pixels with the largest gradient magnitude. Such a gradient-assisted adaptive pixel selection strategy greatly outperforms two other subsampling procedures proposed in the previous literature. Fast block matching can also achieve the optimal performance obtained using full search. We present a family of such fast block matching algorithms using various local features, such as block mean and variance. Our algorithm reduces computation by more than 80 percent while achieving the same performance as the full search. This presents a new approach to fast block matching algorithm design.
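The gradient-assisted pixel selection idea can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the gradient operator (forward differences) and the number of retained pixels are assumptions.

```python
import numpy as np

def gradient_selected_pixels(block, keep=16):
    """Return flat indices of the 'keep' pixels with the largest
    gradient magnitude inside a block (simple forward differences,
    last row/column padded by replication)."""
    gx = np.abs(np.diff(block, axis=1, append=block[:, -1:]))
    gy = np.abs(np.diff(block, axis=0, append=block[-1:, :]))
    mag = (gx + gy).ravel()
    return np.argsort(mag)[-keep:]

def decimated_sad(block, candidate, idx):
    """Sum of absolute differences evaluated only on the selected
    pixel subset, instead of all pixels of the block."""
    return np.abs(block.ravel()[idx] - candidate.ravel()[idx]).sum()
```

For an 8x8 block, matching on 16 selected pixels instead of 64 cuts the per-candidate SAD cost by 75 percent, while the high-gradient pixels retained are exactly the ones that dominate the motion compensation error.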
Fractal/Subband Coding
icon_mobile_dropdown
Fractal motion compensation
Andreas Pommer, Christian Hufnagl, Andreas Uhl
In this work we apply techniques from classical fractal still-image coding to block-matching motion compensation algorithms for digital video compression. In particular, the method of adapting the gray values in image blocks of the current frame to those in blocks of the reference frame shows promising performance.
Poster Presentations on Motion Estimation II
icon_mobile_dropdown
Robust estimation of FOE (focus of expansion) from unreliable motion flows
Mun-Sup Song, Man-Bae Kim
We present a recursive estimation technique for recovering the FOE from unreliable motion or optical flow. The estimation of the FOE is important for the analysis of camera motion, especially when the camera motion is purely translational. Our work is based on the observation that there is a strong dependence between FOE estimation and motion flows: as the FOE depends on the motion flow, a good motion flow can in turn be obtained from an accurate FOE. We assume that the camera motion is purely translational and that there is no object motion in the scene. The technique used for the elimination of unreliable motion flows is the orthogonal regression method. We combine FOE estimation with this elimination algorithm for unreliable motion flows. Experiments using both simulated and real scenes show that the proposed method works robustly even when the percentage of outliers varies.
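Under pure translation, every flow vector lies on a line through the FOE, which gives a linear system for the FOE position. The sketch below is a minimal least-squares version of that geometric fact only; it omits the paper's recursive orthogonal-regression elimination of outlier flows.

```python
import numpy as np

def estimate_foe(points, flows):
    """Least-squares focus of expansion. Each (point p, flow f) pair
    gives one linear constraint n . (foe - p) = 0 with n perpendicular
    to the flow, since the flow line passes through the FOE.
    points, flows: Nx2 arrays. Returns the estimated FOE (x, y)."""
    P = np.asarray(points, float)
    F = np.asarray(flows, float)
    n = np.column_stack([-F[:, 1], F[:, 0]])   # perpendicular to each flow
    b = (n * P).sum(axis=1)                    # n . p
    foe, *_ = np.linalg.lstsq(n, b, rcond=None)
    return foe
```

With noisy or outlier-contaminated flows, the residual of each constraint is exactly the quantity a robust elimination step would threshold before re-solving.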
Generalized motion compensation for drift reduction
The most straightforward approach to obtaining a down-converted image sequence is to decimate each frame after it has been fully decoded. To reduce memory requirements and other costs incurred by this approach, a down-conversion decoder would perform the decimation within the decoding loop. In this way, predictions are made from a low-resolution reference which has experienced a considerable loss of information. Additionally, the predictions must be made from a set of motion vectors which correspond to the full-resolution image sequence. Given these conditions, it is desirable to optimize the performance of the motion compensation process. In this paper we show that the optimal set of filters for performing the low-resolution motion compensation is dependent on the choice of down-conversion. To demonstrate the usefulness of these results, a sample set of motion compensation filters for each class of down-conversion is calculated. The results are incorporated into a low-resolution decoder and comparisons of each down-conversion class are made. Simulation results reveal that the filters which were based on multiple-block down-conversion can reduce the amount of prediction drift found in the single-block down-conversion by as much as 35 percent.
Implementations and Architectures
icon_mobile_dropdown
Complexity analysis of the emerging MPEG-4 standard as a basis for VLSI implementation
Peter M. Kuhn, Walter Stechele
A complexity analysis of the video part of the emerging ISO/IEC MPEG-4 standard was performed as a basis for HW/SW partitioning for VLSI implementation of a portable MPEG-4 terminal. While the computational complexity of previously standardized video coding schemes was predictable for I-, P- and B-frames over time, the support of arbitrarily shaped visual objects as well as various coding options within MPEG-4 now introduce content-dependent computational requirements with significant variance. In this paper the results of a time-dependent complexity analysis of the encoding and decoding process of a binary shape coded video object (VO), and a comparison with a rectangular-shaped VO, are given for the complete codec as well as for the individual tools of the encoding and decoding process. It is shown that the average MB complexity per arbitrarily shaped P-VOP exhibits significant variation over time for the encoder and minor variation for the decoder.
Low-delay MPEG-2 video coding
Tri D. Tran, Lurng-Kuo Liu, Peter H. Westerink
High-quality, low-delay MPEG-2 video coding can be achieved by avoiding the use of intra (I) and bidirectionally predicted (B) pictures. Such coding requires intra macroblock refreshing techniques for resilience to channel error propagation and for compliance with the accuracy requirements of the MPEG-2 standard. This paper describes some of these techniques and presents software simulation results of their performance in terms of image quality and their robustness to transmission channel errors.
Nondisruptive RTSP video over the Internet using a modem connection
Yon Jun Chung, C.-C. Jay Kuo
The lack of a delay constraint in the transmission control protocol/Internet protocol (TCP/IP) scheme is a barrier to time-based media transmission over the Internet. Recently, this barrier has been overcome with the introduction of protocols such as the Real-time Transport Protocol (RTP) and the Real Time Streaming Protocol (RTSP). However, these new protocols do not address the issue of smooth time-based media delivery in a server/client environment. Approaches such as buffer control can alleviate the non-smooth delivery problem to a certain extent, but they are vulnerable to long propagation delays. In this work, we propose a new approach for non-disruptive video playback on the client with a minimal amount of buffering. This is achieved through a combination of video frame insertion and multitasking threads for data reception and video playback. The major contribution of this work lies in the system integration of networking and video compression techniques.
Analysis of memory bandwidth requirements for the H.263 video codec
Bhanu Kapoor
Memory bandwidth is emerging as the fundamental impediment to higher-performance and lower-power computer and communication systems. In this paper, we present an analysis of memory bandwidth requirements for the H.263 video codec algorithms. We provide data and insight into how the choice of cache parameters affects the external bandwidth requirements of video. We make use of memory traces generated by running Telenor's H.263 video encoder and decoder software implementations to simulate a large number of cache configurations. In the area of analysis of video algorithms, this paper focuses on the following issues: we provide a study of how varying cache size, block size, associativity, replacement policy, and organization parameters such as split versus unified caches affects memory bandwidth requirements. A comparative study of encoder and decoder bandwidth requirements is presented. We also study the various advanced encoding options provided with the H.263 standard in this light. Based on our study, we provide guidelines for traffic-directed memory system design.
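Trace-driven cache simulation of the kind described boils down to replaying a byte-address trace against a cache model and converting misses into external bandwidth. As a hedged sketch (a direct-mapped cache only; the paper also varies associativity and replacement policy):

```python
def miss_bandwidth(addresses, cache_lines=256, block_size=32):
    """Direct-mapped cache simulation: replay a byte-address trace,
    count misses, and return the external memory bandwidth consumed,
    i.e. misses * block_size bytes fetched from memory."""
    tags = [None] * cache_lines
    misses = 0
    for addr in addresses:
        block = addr // block_size        # memory block number
        line = block % cache_lines        # direct-mapped placement
        if tags[line] != block:
            tags[line] = block            # fill the line
            misses += 1
    return misses * block_size
```

Sweeping `cache_lines` and `block_size` over such a trace is exactly how the external-bandwidth-versus-cache-parameter curves in a study like this are produced.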
Standard-based software-only video conferencing codec on Ultra SPARC
Wei Ding, Alex Zhi-Jian Mou, Daniel S. Rice
Even though CIF-resolution video decoding in software has become popular, as evidenced by the popularity of MPEG-1 software decoders, video encoding at CIF resolution still needs a hardware solution, especially for motion estimation. In this paper, we report our work on a high-quality, high-performance software-only H.261 video codec, with real-time transport protocol packetization capability, for H.323 video conferencing over a LAN. The codec is implemented entirely in software using the Ultra SPARC multimedia instruction extension, the Visual Instruction Set (VIS). The encoder can perform change detection, motion estimation, motion-compensated spatio-temporal pre-filtering for noise reduction, and adaptive quantization in real time. Thus high-quality video can be obtained at a rate of 128 kbit/s or even lower. It is capable of performing simultaneous encoding and decoding of near-CIF-resolution video at 15 frames per second. The design of the encoder structure, the data layout, and the various techniques and algorithms developed can be extended to an H.263 codec.
Very Low Bit-rate Coding
icon_mobile_dropdown
Efficient spatiotemporal segmentation for very low bitrate video coding
Jason Handcock, Cedric Nishan Canagarajah, Wolfgang Tellert, et al.
The goal of low bit rate video coding is to minimize the number of bits required to represent a video sequence. Conventional video coding schemes remove spatial and temporal redundancy by using block-based motion compensation and transform coding such as the DCT. However, the disadvantage of these techniques is that blocking artifacts occur at high compression ratios because the block structures are not adapted to the image contents. In this paper an efficient joint spatio-temporal segmentation scheme is proposed to segment and represent image contents based on spatial characteristics and motion vectors. Two spatio-temporal segmentation algorithms have been proposed. The first uses arbitrarily shaped regions and the other an efficient quad-tree structure which is well suited to very low bit rate video coding. The experimental results show that the proposed scheme produces significant improvements in both objective and subjective quality at low bit rates when compared with fixed block-size segmentation.
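The quad-tree variant above adapts block size to image content by splitting only where a homogeneity test fails. As a hedged illustration (the variance criterion and thresholds below are assumptions, not the paper's exact test, which also uses motion vectors):

```python
import numpy as np

def quadtree(img, y=0, x=0, size=None, thresh=10.0, min_size=2):
    """Recursive quad-tree segmentation of a square image: a block is
    kept as a single leaf if its intensity variance is below 'thresh',
    otherwise it is split into four quadrants and recursed.
    Returns a list of (y, x, size) leaf blocks."""
    if size is None:
        size = img.shape[0]
    block = img[y:y + size, x:x + size]
    if size <= min_size or block.var() <= thresh:
        return [(y, x, size)]
    h = size // 2
    leaves = []
    for dy in (0, h):
        for dx in (0, h):
            leaves += quadtree(img, y + dy, x + dx, h, thresh, min_size)
    return leaves
```

Homogeneous areas end up covered by a few large leaves while detailed areas get small ones, which is what makes the quad-tree representation cheap to signal at very low bit rates.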
Region-based multivector motion estimation for efficient very low bitrate video coding
Luis Salgado, Jose Ignacio Ronda, Jose Manuel Menendez, et al.
A new multivector motion estimation and compensation strategy, particularly suitable for region-based coding, is introduced. Region motion is described through a variable number of motion vectors applied to specific region control points. Motion estimation is carried out by applying locally translational models to the control points, while more complex region motion models are used for compensation. No information about these control points needs to be transmitted, as their determination is based on information available at the decoder. The application of this strategy within the context of a region-based hybrid video codec operating on arbitrarily shaped regions is presented; efficiency is improved by eliminating the transmission of any image region description. Results showing very good quality images at transmission bit rates on the order of 40 kb/s are presented.
Motion parametric modeling for very low bitrate video coding
Jose Manuel Menendez, Luis Salgado, Enrique Rendon, et al.
A camera motion compensation procedure based on parametric modeling is introduced. The global movement of the camera is estimated and compensated starting from an analysis of the sequence's motion vector field. This modeling scheme is applied to a segmentation-based hybrid video codec, allowing the system to maintain similar bit rates when transmitting video sequences acquired by either static or moving cameras.
Layered coding system for very low bitrate videophone
Tao Xie, Yun He, Chengjian Weng, et al.
This paper presents a layered video-coding scheme for very low bit rate videophone. The lower two layers of this scheme are compatible with ITU-T H.263 in order to interoperate with other schemes. The first layer is normal H.263, which assumes no a priori knowledge of the image content and is therefore robust to arbitrary content. The second layer introduces the typical head-shoulder assumption and provides subjectively improved quality of the reconstructed image. The third layer is based on the same model as layer two, but further improves the subjective quality by employing the shape information of the objects of interest. Manipulation functionality is supported in the third layer. When the typical head-shoulder hypothesis is true, the coding scheme can upgrade its coding layer to obtain better subjective quality. On the other hand, the coding scheme can fall back to the robust lower layers when the a priori knowledge is not available. In this paper, a novel motion estimation criterion is also introduced which can be integrated into most present strategies to improve overall performance, and a B-spline approach is introduced to encode the object contour instead of chain code.
AR modeling and low-bit-rate encoding of motion-compensated frame differences
Michael Bruenig, Wolfgang Niehsen
The hybrid coding scheme is modified: the discrete cosine transform used for encoding displaced frame differences is replaced by a predefined set of transform matrices based on autoregressive models of up to fourth order. The autoregressive models are parameterized using reflection coefficients. It is shown that coding efficiency can be improved, even though side information identifying the chosen transform has to be transmitted to the decoder.
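The conversion from reflection coefficients to direct-form AR predictor coefficients that such a parameterization relies on can be sketched with the standard Levinson step-up recursion; this is a generic construction, not the authors' specific transform-matrix design, and the sign convention is an assumption:

```python
def reflection_to_ar(ks):
    """Step-up (Levinson) recursion: reflection coefficients k_1..k_p
    to direct-form AR predictor coefficients a_1..a_p.

    Assumed sign convention: A(z) = 1 + a_1 z^-1 + ... + a_p z^-p,
    with a_m^(m) = k_m at each model order m.
    """
    a = []
    for m, k in enumerate(ks, start=1):
        # a_i^(m) = a_i^(m-1) + k_m * a_{m-i}^(m-1), and a_m^(m) = k_m
        a = [a[i] + k * a[m - 2 - i] for i in range(m - 1)] + [k]
    return a
```

With at most a fourth-order model, only four reflection coefficients per block would need to be signaled under this parameterization.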
Poster Presentations on Filtering
Neural net classification and LMS reconstruction to halftone images
Pao-Chi Chang, Che-Sheng Yu
The objective of this work is to reconstruct high quality gray-level images from halftone images, i.e., the inverse halftoning process. We develop high performance halftone reconstruction methods for several commonly used halftone techniques. For better reconstruction quality, image classification based on halftone technique is placed before the reconstruction process, so that the halftone reconstruction can be fine-tuned for each technique. The classification is based on enhanced 1D correlation of halftone images and processed with a three-layer back-propagation neural network. This classification method reached 100 percent accuracy in our experiments on a limited set of images processed by dispersed-dot ordered dithering, clustered-dot ordered dithering, constrained average, and error diffusion methods. For image reconstruction, we apply the least-mean-square (LMS) adaptive filtering algorithm, which seeks to determine the optimal filter weights and mask shapes; as a result, it yields very good reconstructed image quality. Error diffusion yields the best reconstructed quality among the halftone methods. In addition, the LMS method generates optimal image masks which are significantly different for each halftone method. These optimal masks can also be applied to more sophisticated reconstruction methods as the default filter masks.
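The LMS weight-adaptation idea described above can be sketched as follows, assuming a small square filter mask and grayscale images in [0, 1]; the mask size, step size, and training loop are illustrative choices, not the authors' settings:

```python
import numpy as np

def lms_train(halftone, original, mask=3, mu=0.005, epochs=5):
    """Learn inverse-halftoning filter weights with the LMS rule.

    halftone, original: 2D float arrays in [0, 1], same shape.
    Returns the learned mask of filter weights.
    """
    r = mask // 2
    w = np.zeros((mask, mask))
    w[r, r] = 1.0                          # start from the identity filter
    pad = np.pad(halftone, r, mode="edge")
    for _ in range(epochs):
        for i in range(halftone.shape[0]):
            for j in range(halftone.shape[1]):
                window = pad[i:i + mask, j:j + mask]
                y = np.sum(w * window)     # filter output at (i, j)
                e = original[i, j] - y     # reconstruction error
                w += mu * e * window       # LMS weight update
    return w
```

Training one such mask per halftone technique is what makes the up-front classification step pay off.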
Iterative regularized mixed-norm image restoration algorithm
This paper introduces a regularized mixed-norm image restoration algorithm. A functional which combines the least mean squares (LMS), the least mean fourth (LMF), and a smoothing functional is proposed. A function of the kurtosis is used to determine the relative importance between the LMS and the LMF functionals, and a function of the previous two functionals and the smoothing functional is utilized for determining the regularization parameter. The two parameters are chosen in such a way that the proposed functional is convex, so that a local minimizer becomes a global minimizer. The novelty of the proposed algorithm is that no knowledge of the noise distribution is required, and the relative contribution of the LMS, the LMF, and the smoothing functional is adjusted based on the partially restored image.
Graphic/nongraphic segmentation for multistandard compression
Robrecht Jacques, Luc Van Eycken
In the field of compression research a number of different types of compression algorithms have been developed. These algorithms are best suited to the type of data they were developed for, but fail, either in compression ratio or in image quality, if an image of a completely different type is coded. Object-based coding schemes allow algorithms to code different objects in different ways. In MPEG-4 these objects are used for coding, but they can also be used for object manipulation, indexing, or image analysis. The remaining problem is finding a segmentation that gives meaningful objects. This object segmentation is usually based on color and texture information or on motion; segmentation based on the type of data is mostly ignored. In this paper a segmentation scheme is presented that is based on the type of data present in the image. Two very distinct data types are considered: graphical and non-graphical data. Based on a co-occurrence matrix of the Peano-scan of the color space, the algorithm segments the image and produces two images: one with graphical data that will be coded by JBIG and one with non-graphical data to be coded with JPEG.
2D adaptive prediction-based Gaussianity tests in microcalcification detection
With the increasing use of Picture Archiving and Communication Systems, computer-aided diagnosis (CAD) methods will be more widely utilized. In this paper, we develop a CAD method for the detection of microcalcification clusters in mammograms, which are an early sign of breast cancer. The method we propose makes use of 2D adaptive filtering and a Gaussianity test recently developed by Ojeda et al. for causal invertible time series. The first step of this test is adaptive linear prediction. It is assumed that the prediction error sequence has a Gaussian distribution, as mammogram images do not contain sharp edges. Since microcalcifications appear as isolated bright spots, the prediction error sequence contains large outliers around microcalcification locations. The second step of the algorithm is the computation of a test statistic from the prediction error values to determine whether the samples are from a Gaussian distribution. The Gaussianity test is applied over small, overlapping square regions, and the regions in which the Gaussianity test fails are marked as suspicious. Experimental results obtained from a mammogram database are presented.
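The two-step structure above (prediction residuals, then a Gaussianity check on overlapping blocks) might be sketched as below; the kurtosis-based statistic and the threshold are stand-ins for the test of Ojeda et al., which the abstract does not specify:

```python
import numpy as np

def excess_kurtosis(e):
    """Sample excess kurtosis of a residual block (near 0 for Gaussian data;
    large positive values indicate heavy tails / outliers)."""
    e = e - e.mean()
    m2 = np.mean(e ** 2)
    m4 = np.mean(e ** 4)
    return m4 / (m2 ** 2 + 1e-12) - 3.0

def flag_suspicious(residual, block=8, thresh=1.5):
    """Mark overlapping square blocks whose residuals look non-Gaussian."""
    flags = []
    h, w = residual.shape
    step = block // 2                       # 50% overlap between regions
    for i in range(0, h - block + 1, step):
        for j in range(0, w - block + 1, step):
            if excess_kurtosis(residual[i:i + block, j:j + block]) > thresh:
                flags.append((i, j))        # top-left corner of suspicious region
    return flags
```

An isolated bright spot inflates the fourth moment of its block far more than the second, so the blocks containing it fail the test while smooth background passes.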
Multiplier-free linear-phase filter banks for image and video compression
David W. Redmill, David R. Bull
The discrete wavelet transform (DWT) has been widely proposed for image and video compression, and in many systems the DWT represents a large proportion of the codec complexity. In order to reduce the complexity, and as a result the manufacturing cost, it is desirable to choose a filter bank which has low complexity without compromising compression performance. This paper presents several multiplier-free filter banks for use within image and video compression systems. The filter banks are compared to alternative designs which are known to be well suited to image and video compression; the proposed filters offer almost identical performance with a significant reduction in complexity.
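A well-known multiplier-free design of this kind is the integer 5/3 (LeGall) filter bank implemented by lifting, shown here as an illustrative example rather than one of the paper's proposed banks; only additions and shifts are used, and reconstruction is exact:

```python
def analysis_53(x):
    """One analysis level of the integer 5/3 filter bank via lifting.

    Uses only adds and shifts (no multipliers) with symmetric border
    extension. Returns (lowpass, highpass) integer subbands.
    """
    n = len(x)
    assert n % 2 == 0 and n >= 4
    ext = lambda i: x[2 * n - 2 - i] if i >= n else x[i]
    # predict step: highpass = odd sample minus average of even neighbors
    d = [x[2 * i + 1] - ((ext(2 * i) + ext(2 * i + 2)) >> 1)
         for i in range(n // 2)]
    dm = lambda i: d[0] if i < 0 else d[i]      # mirrored highpass at border
    # update step: lowpass = even sample plus rounded quarter of highpass
    s = [x[2 * i] + ((dm(i - 1) + d[i] + 2) >> 2) for i in range(n // 2)]
    return s, d

def synthesis_53(s, d):
    """Inverse lifting: undoes the update then the predict step exactly."""
    n = 2 * len(s)
    dm = lambda i: d[0] if i < 0 else d[i]
    even = [s[i] - ((dm(i - 1) + d[i] + 2) >> 2) for i in range(n // 2)]
    eext = lambda i: even[(2 * n - 2 - i) // 2] if i >= n else even[i // 2]
    odd = [d[i] + ((even[i] + eext(2 * i + 2)) >> 1) for i in range(n // 2)]
    x = [0] * n
    x[0::2], x[1::2] = even, odd
    return x
```

Because the synthesis steps subtract exactly what the analysis steps added, the integer round-off introduced by the shifts cancels and the transform is perfectly invertible.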
Information-efficient decompositions
Rachel Alter-Gartenberg, Stephen K. Park
Digital image decomposition is yet another in a series of operators that have been migrated from the realm of digital signal processing (DSP) to the realm of digital image processing (DIP) without checking the validity of some of their basic assumptions. In particular, 2D image decomposition techniques often ignore the basic difference between DSP and DIP acquisition systems. Whereas 1D acquisition is designed to ensure sufficient sampling, digital cameras are inherently designed to undersample. Therefore the assumptions of a band-limited input target and a perfect sinc interpolator as the output-device response are valid only for 1D signal decomposition. This paper ties the decomposition and reconstruction design theory to the continuous-target/discrete-processing/continuous-image theory of 2D sampled images. It extends the traditional theory of image decomposition to include the effects of acquisition and display. It shows that the acquired information, not the signal's entropy, dictates the trade-off between data transmission and visual quality, and it suggests an information bit-allocation tool for the case of insufficient sampling.
Control grid interpolation using motion discontinuity patterns
Joon Hyun Sung, Joon-Ho Chang, Dong-Il Chang, et al.
Control grid interpolation (CGI) was suggested to overcome drawbacks of the block matching algorithm such as blocking artifacts. In CGI, a smooth motion field is obtained because motion vectors are spatially interpolated from the control points. However, when there are two or more objects moving in different directions, distortion may appear near the motion boundaries due to the continuity of the motion field. To cope with this problem, we use motion discontinuity patterns, and the motion vectors are transformed according to the motion discontinuity. To determine the motion discontinuity pattern, similar vectors are grouped together and the grouping pattern is checked. Since this determination requires only simple calculations on motion vectors that are available at the decoder, no side information is necessary and the additional complexity is negligible. The modified CGI with motion discontinuity outperforms conventional CGI in objective and subjective quality.
Video and Database Management
Motion-based video object indexing using multiresolution analysis
Jeho Nam, Ahmed H. Tewfik
In this paper, we describe an efficient video indexing scheme based on the motion behavior of video objects for fast content-based browsing and retrieval in a video database. The proposed novel method constructs a dictionary of prototype objects. The first step in our approach extracts moving objects by analyzing layered images constructed from the coarse data in a 3D wavelet decomposition of the video sequence. These images capture motion information only. Moving objects are modeled as collections of interconnected rigid polygonal shapes in the motion sequences that we derive from the wavelet representation. The motion signatures of the objects are computed from the rotational and translational motions associated with the elemental polygons that form the objects. These signatures are finally stored as potential query terms.
Content-based storage and retrieval scheme for image and video databases
Nicos Herodotou, Konstantinos N. Plataniotis, Anastasios N. Venetsanopoulos
In this paper, a technique is presented to locate and track the facial areas in image and video databases. The extracted facial regions are used to obtain a number of features that are suitable for content-based storage and retrieval. The proposed face localization method consists of essentially two components: i) a color processing unit, and ii) a shape and color analysis module. The color processing component utilizes the distribution of skin tones in the HSV color space to obtain an initial set of candidate regions or objects. The latter shape and color analysis module is used to correctly identify the facial regions when falsely detected objects are extracted. A number of features such as hair color, skin tone, and face location and size are subsequently determined from the extracted facial areas. The hair and skin colors provide useful descriptions related to the human characteristics, while the face location and size can reveal information about the activity within the scene and the type of image. These features can be effectively combined with others and employed in user queries to retrieve particular facial images.
Image segmentation using intensity and color information
Yuichi Kanai
We have developed an advanced segmentation algorithm using color information as well as intensity information. Combining both kinds of information yields robust and perceptually better segmentation results. Our segmentation algorithm consists of joint marker extraction, region growing, and region merging. We have introduced a new algorithm for extracting markers out of images using both color and intensity information. Morphological open-close by reconstruction filters are applied for intensity-based marker extraction. In color-based marker extraction, quantized HSV color values are employed. Joint markers are defined as the sum of both kinds of markers. The region growing process is applied after the marker extraction process until all of the uncertain pixels belong to one of the marker regions. Our proposed process is based on a watershed algorithm, which is a powerful morphological decision tool. After the region growing process, region merging using color information is employed. This process is applied in order to reduce the number of segmented regions while preserving meaningful information. Finally, our experimental results are shown using the 'akiyo' and 'foreman' sequences.
Image Coding II
Comparison of five popular lossless image compressors
Theodore R. Goodman, Alexander I. Drukarev, Glen G. Langdon Jr.
Many choices are available for lossless image compression. Each technique offers different tradeoffs between compression ratio, execution speed, and flexibility. Performance tradeoffs typically depend on user-specified scaling parameters and on the type of image. This investigation focuses on five well-established methods for performing lossless image compression: 'deflate' compression, Lempel-Ziv-Welch, the Graphics Interchange Format, the Portable Network Graphics format, and baseline JPEG-LS. Several classes of imagery are covered, and the merits of three color-mapping techniques are examined.
Near-lossless image compression techniques
Methods of near-lossless image compression based on the criterion of maximum allowable deviation of pixel values are described in this paper. Predictive and multiresolution techniques for performing near-lossless compression are investigated. A procedure for near-lossless compression using a modification of lossless predictive coding techniques to satisfy the specified tolerance is described. Simulation results with modified versions of two of the best lossless predictive coding techniques known, CALIC and JPEG-LS, are provided. It is shown that the application of lossless coding based on reversible transforms in conjunction with pre-quantization is inferior to predictive techniques for near-lossless compression. A partially embedded two-layer scheme is proposed in which an embedded multiresolution coder generates a lossy base layer, and a simple but effective context-based lossless coder codes the difference between the original image and the lossy reconstruction. Simulation results show that this lossy-plus-lossless technique yields compression ratios very close to those obtained with predictive techniques, while providing the feature of a partially embedded bit stream.
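The core of near-lossless predictive coding, quantizing prediction residuals so that no pixel deviates by more than a tolerance delta, can be sketched as below; the left-neighbor predictor is a deliberately simple stand-in for the CALIC/JPEG-LS context predictors:

```python
import numpy as np

def near_lossless_encode(img, delta):
    """Closed-loop predictive coding with maximum per-pixel error `delta`.

    A simple previous-pixel predictor (hypothetical, for illustration)
    replaces the context predictors of CALIC/JPEG-LS; the residual
    quantizer is the standard uniform one with step 2*delta + 1.
    Returns (quantized residuals to entropy-code, decoder reconstruction).
    """
    step = 2 * delta + 1
    h, w = img.shape
    q = np.zeros((h, w), dtype=int)     # quantized residual indices
    rec = np.zeros((h, w), dtype=int)   # reconstruction as the decoder sees it
    for i in range(h):
        for j in range(w):
            # predict from already-reconstructed values (closed loop),
            # so the error bound holds end to end
            pred = rec[i, j - 1] if j > 0 else (rec[i - 1, j] if i > 0 else 128)
            e = int(img[i, j]) - int(pred)
            q[i, j] = (e + delta) // step if e >= 0 else -((-e + delta) // step)
            rec[i, j] = pred + q[i, j] * step
    return q, rec
```

With delta = 0 the scheme degenerates to exact lossless predictive coding; larger delta trades a bounded pixel error for smaller residual entropy.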
Buffer control of DCT-based intrafield video coding
ShengQiang Lou, Fukan Huang, Liangzhu Zhou, et al.
In this paper, a new buffer control strategy for DCT-based intrafield video encoders is presented. It selects the encoding parameters according to rate-quantization (R-Q) models for three kinds of blocks: edge blocks, texture blocks, and smooth blocks, and adapts to scene changes by adjusting the R-Q models' parameters. A simple and effective criterion is presented for classifying the blocks; the classification it produces is close to the result obtained by quantizing the number of bits needed to code a block into three levels. The method can be applied to other coding techniques such as H.261 and MPEG, and the extra computation it requires is modest. Experiments show that this buffer control policy can keep the output bit rate steady even at scene changes.
Optimal piecewise linear image coding
Dietmar Saupe
Piecewise linear (PL) image coding proceeds in three steps: 1) a digital image is converted into a 1D signal using a scanning procedure, for example by scanning lines in a zig-zag or Hilbert order. 2) The signal is approximated by the graph of a piecewise linear function, which consists of a finite number of connected line segments. 3) Entropy encoding of the sequence of segment end points; in this step differential coding can be used for one or both coordinate sequences of the end points. To ensure a desired approximation quality a constraint is imposed, e.g., on the root-mean-square error of the PL signal. In this paper we consider uniform approximation. Two problems are addressed: first, an optimal PL approximation in the sense of a minimal number of segments is to be obtained. Second, when entropy coding of the segments is used, how can one jointly optimize the variable length code and the PL approximation, yielding a better or even minimal rate without violating the uniform error bound? The first problem is solved by dynamic programming; the second is approached by using Huffman coding and an annealing procedure in which the design of the Huffman tables and the dynamic programming are alternately iterated using a cost function that reflects the codeword lengths of the current variable length code. This algorithm is guaranteed to converge to a minimum length code. We describe the algorithms and implementation issues, compare two different scanning procedures, the zig-zag line scan and the Hilbert scan, and report results for encoding various test images.
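The first problem, fewest segments under a uniform error bound, admits a simple dynamic program; the sketch below assumes segment end points coincide with sample points, a simplification of the general formulation:

```python
def min_segments(y, eps):
    """Minimal-segment piecewise linear fit of samples y under a uniform
    error bound eps, by dynamic programming over candidate breakpoints.

    Simplifying assumption: segments join actual sample points.
    Returns the chosen breakpoint indices (first and last always included).
    """
    n = len(y)
    INF = float("inf")

    def fits(a, b):
        # every sample strictly between a and b must lie within eps
        # of the chord joining (a, y[a]) and (b, y[b])
        for k in range(a + 1, b):
            interp = y[a] + (y[b] - y[a]) * (k - a) / (b - a)
            if abs(y[k] - interp) > eps:
                return False
        return True

    best = [INF] * n            # best[i]: fewest segments covering y[0..i]
    prev = [-1] * n             # predecessor breakpoint for backtracking
    best[0] = 0
    for i in range(1, n):
        for j in range(i):
            if best[j] + 1 < best[i] and fits(j, i):
                best[i] = best[j] + 1
                prev[i] = j
    path, i = [n - 1], n - 1
    while prev[i] != -1:
        i = prev[i]
        path.append(i)
    return path[::-1]
```

The quadratic number of (j, i) pairs, each with a linear feasibility check, makes this O(n^3) as written; the joint optimization with a variable length code replaces the unit segment cost here by the current codeword lengths.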
Improved compression technique for multipass color printers
Chris Honsinger
A multipass color printer prints a color image by printing one color plane at a time in a prescribed order, e.g., in a four-color system, the cyan plane may be printed first, the magenta next, and so on. It is desirable to discard the data for each color plane once it has been printed, so that data for the next plane may be downloaded. In this paper, we present a compression scheme that allows the release of a color plane's memory, but still takes advantage of the correlation between the color planes. The compression scheme is based on a block adaptive technique for decorrelating the color planes, followed by a spatial lossy compression of the decorrelated data. A preferred method of lossy compression is the DCT-based JPEG compression standard, as it is shown that the block adaptive decorrelation operations can be performed efficiently in the DCT domain. The results of the compression technique are compared to those of using JPEG on RGB data without any decorrelating transform. In general, the technique is shown to improve the compression performance over a practical range of compression ratios by at least 30 percent in all images, and up to 45 percent in some images.
Image Sequence Analysis
Motion-based video segmentation using continuation method and robust cost functions
Cheng-Hong Yang, Janusz Konrad
We propose a new approach to spatial segmentation of video sequences that is based on motion attributes. The approach, similarly to some previous efforts, uses Markov random field models and maximum a posteriori probability estimation. Our approach is novel in three ways. First, we propose a general formulation for joint motion estimation. Secondly, instead of the usual quadratic models we propose a robust estimation criterion that eliminates the impact of outliers on the estimates. Thirdly, since solving the segmentation problem directly in the space of discrete labels is difficult, we opt for a continuation method over a Gaussian pyramid. Thus, the estimation process starts as a motion estimation and then slowly converges towards a motion-based segmentation by 'hardening' the smoothness constraint. The final result is a quasi-segmentation, i.e., the estimated vector field is continuous but almost piecewise constant, and must undergo subsequent quantization. We show experimental results on two natural image sequences; the resulting quasi-segmentations clearly extract moving objects. The method may serve as an initial stage for joint motion estimation and segmentation, or may produce final segmentations if suitable post-processing is applied.
Fast algorithm for subpixel-accuracy image stabilization for digital film and video
Cigdem Eroglu, A. Tanju Erdem
This paper introduces a novel method for subpixel-accuracy stabilization of unsteady digital films and video sequences. The proposed method offers a near-closed-form solution for estimating the global subpixel displacement between two frames that causes their misregistration. The criterion function used is the mean-squared error over the displaced frames, in which image intensities at subpixel locations are evaluated using bilinear interpolation. The proposed algorithm is both faster and more accurate than the search-based solutions found in the literature. Experimental results demonstrate the superiority of the proposed method to spatio-temporal differentiation and surface fitting algorithms as well. Furthermore, the proposed algorithm is designed to be insensitive to frame-to-frame intensity variations. It is also possible to estimate any affine motion between two frames by applying the proposed algorithm at three non-collinear points in the unsteady frame.
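The MSE criterion with bilinear interpolation can be illustrated as follows; this brute-force grid search is only a reference implementation of the criterion, not the paper's near-closed-form solution:

```python
import numpy as np

def bilinear(img, y, x):
    """Intensity at fractional coordinates (y, x) via bilinear interpolation."""
    i, j = int(y), int(x)
    a, b = y - i, x - j
    return ((1 - a) * (1 - b) * img[i, j] + (1 - a) * b * img[i, j + 1]
            + a * (1 - b) * img[i + 1, j] + a * b * img[i + 1, j + 1])

def shifted_mse(ref, cur, dy, dx):
    """Criterion: MSE between cur and ref displaced by subpixel (dy, dx)."""
    h, w = ref.shape
    err, cnt = 0.0, 0
    for i in range(h):
        for j in range(w):
            y, x = i + dy, j + dx
            if 0 <= y < h - 1 and 0 <= x < w - 1:   # interpolation neighbors exist
                e = cur[i, j] - bilinear(ref, y, x)
                err += e * e
                cnt += 1
    return err / cnt

def estimate_shift(ref, cur, span=1.0, step=0.25):
    """Grid search for the subpixel displacement minimizing the MSE."""
    cands = [k * step for k in range(int(-span / step), int(span / step) + 1)]
    return min(((shifted_mse(ref, cur, dy, dx), dy, dx)
                for dy in cands for dx in cands))[1:]
```

Evaluating the criterion at three non-collinear points, as the abstract notes, is enough to recover an affine motion rather than a pure translation.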
Disparity estimation with modeling of occlusion and object orientation
Andre Redert, Chun-Jen Tsai, Emile A. Hendriks, et al.
Stereo matching is fundamental to applications such as 3D visual communications and depth measurement. There are several different approaches towards this objective, including feature-based, block-based, and pixel-based methods. Most approaches use regularization to obtain reliable disparity fields. Generally speaking, when smoothing is applied to the estimated depth field, it results in a bias towards surfaces that are parallel to the image plane; this is called the fronto-parallel bias. Recent pixel-based approaches claim that no disparity smoothing is necessary: occlusions and objects are explicitly modeled. But these models interfere with each other in the case of slanted objects and result in a fragmented disparity field. In this paper we propose a disparity estimation algorithm with explicit modeling of object orientation and occlusion. The algorithm incorporates adjustable resolution and accuracy, and smoothing can be applied without introducing the fronto-parallel bias. The experiments show that the algorithm is very promising.
Simultaneous moving and uncovered-background pixel detection and parameter estimation in video sequences
Kristine E. Matthews, Nader M. Namazi
Two essential aspects of uncovered-background prediction and motion compensation for image sequence coding are the segmentation of an image frame into regions of uncovered-background, covered-background, moving, and stationary pixels, and the estimation of motion parameters. We have developed and investigated a method which simultaneously estimates motion and sequence parameters and provides image segmentation from noisy image sequences. The method segments an image frame in an image sequence into regions of moving, stationary, covered-background, and uncovered-background pixels relative to a reference frame. The basis of our method is the expectation-maximization algorithm for maximum-likelihood estimation. We previously presented our method under assumptions which imposed severe restrictions on the image composition and the object motion and did not consider covered-background pixels. Those cases, though unrealistic, were illustrative of the mathematical formulation of our method. In this paper we remove the limiting restrictions on the image composition, and we introduce a region for covered-background pixels.
Reconstruction of 3D affine and Euclidean mesh models from video
3D mesh modeling offers several advantages over 2D mesh modeling for object-based video manipulation including digital postprocessing and editing. This paper proposes a new method for automatic reconstruction of 3D affine and Euclidean mesh representations of objects from video sequences. Our approach is to first design a 2D mesh by selecting node points from a set of salient points followed by constrained Delaunay triangulation. Next, an improved 2D mesh tracking scheme is proposed that estimates motion vectors at the node points by global motion estimation followed by a local refinement scheme. A 3D mesh is then reconstructed by estimating the depth at these node points in an affine space using computer vision techniques. A Euclidean reconstruction is also computed by imposing additional constraints. Experimental results are provided to demonstrate the accuracy of the reconstructions.
Image Coding III
Vector quantization and coding of the chromatic information in an image
Eric Dubois, Jamal Fadli, Daniel Lauzon
This paper presents a new method for coding the chromatic component of a color image that exploits the piecewise- constant nature of chromatic information. The image is first transformed to a color space in which chromatic information is nearly piecewise constant. The chromatic component is then represented by entries from a codebook of 2D chromatic vectors adapted to the given image. Both memoryless quantization and quantization with spatial memory are considered. Finally, the field of labels is coded using a suitable lossless code with memory; we have used a context- dependent arithmetic code. Experimental results showing rate-distortion performance of the method under various conditions are presented.
Fast PNN algorithm for design of VQ initial codebook
Day-Fann Shen, Kuo-Shu Chang
The PNN algorithm is excellent for obtaining an initial codebook in VQ design. However, its drawback is computational complexity, especially when the training set is large. In this paper, we explore the characteristics of the PNN algorithm and propose a fast PNN algorithm that uses memory to reduce the computational complexity from O(L^3) to O(L^2).
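The flavor of a memory-assisted fast PNN can be sketched as below: a cached nearest-neighbor table avoids recomputing all pairwise merge costs after every merge. This is a generic simplification (the refresh rule shown may occasionally pick a slightly suboptimal merge order), not the authors' exact algorithm:

```python
import numpy as np

def pnn_codebook(vectors, k):
    """Pairwise-nearest-neighbor codebook design with a cached
    nearest-neighbor table, merging clusters until k remain."""
    cents = [np.asarray(v, float) for v in vectors]
    sizes = [1] * len(cents)
    alive = [True] * len(cents)

    def cost(a, b):
        # classic PNN merge cost: size-weighted squared centroid distance
        diff = cents[a] - cents[b]
        return sizes[a] * sizes[b] / (sizes[a] + sizes[b]) * float(diff @ diff)

    def nearest(a):
        return min((cost(a, b), b)
                   for b in range(len(cents)) if alive[b] and b != a)

    nn = {a: nearest(a) for a in range(len(cents))}   # the "memory"
    remaining = len(cents)
    while remaining > k:
        a = min(nn, key=lambda i: nn[i][0])           # cheapest cached merge
        _, b = nn[a]
        # merge cluster b into a (size-weighted centroid)
        cents[a] = (sizes[a] * cents[a] + sizes[b] * cents[b]) / (sizes[a] + sizes[b])
        sizes[a] += sizes[b]
        alive[b] = False
        del nn[b]
        remaining -= 1
        # refresh only the entries invalidated by the merge
        for i in list(nn):
            if i == a or nn[i][1] in (a, b):
                nn[i] = nearest(i)
    return [cents[i] for i in range(len(cents)) if alive[i]]
```

Keeping one cached nearest neighbor per cluster is what replaces the full O(L^2) pair scan per merge of the naive method, giving the O(L^2) total the abstract claims.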
Lattice vector quantization with reduced or without look-up table
Patrick Rault, Christine M. Guillemot
This paper describes a new vector indexing algorithm for lattice vector quantization (LVQ). The technique applies to a large class of lattices, such as Zn, An, Dn, or E8, widely used in signal compression. Relying on a partitioning of the source events based on the notion of leaders, it allows vector look-up table size to be traded for arithmetic operations. At the cost of a very small number of integer arithmetic operations, the algorithm leads to a very significant reduction of the vector look-up tables, which in turn reduces encoder and decoder complexity. The introduction of the concept of 'absolute' leaders, and of the corresponding coding and decoding algorithms, provides additional flexibility in trading table size for arithmetic operations. The association of these vector indexing techniques with product codes, in the framework of LVQ, leads to increased compression performance.
Hierarchical coding of 2D mesh geometry and motion
This paper proposes methods for construction and compression of hierarchical representations of 2D moving meshes. This representation consists of a hierarchy of Delaunay meshes, using image-based and shape-based criteria for mesh geometry simplification. The hierarchical compression technique is based on a nearest-neighbor ordering of mesh node points. This ordering serves to define the mesh boundary as well as a spatial prediction relation on the nodes, which is employed for differential node point location and motion vector coding. The proposed compression methods allow progressive transmission and quality scalability. The proposed hierarchical mesh representation can be applied in object-based video coding, storage and manipulation. Experimental results are provided comparing the compression performance of non-hierarchical coding and hierarchical coding, and evaluating the trade-off between mesh-based video object rendering quality and hierarchical mesh bitrate.
Low-bit-rate subband image coding with matching pursuits
Hamid Rabiee, S. R. Safavian, Thomas R. Gardos, et al.
In this paper, a novel multiresolution algorithm for low bit-rate image compression is presented. High quality low bit-rate image compression is achieved by first decomposing the image into approximation and detail subimages with a shift-orthogonal multiresolution analysis. Then, at the coarsest resolution level, the coefficients of the transformation are encoded by an orthogonal matching pursuit algorithm with a wavelet packet dictionary. Our dictionary consists of convolutional splines of up to order two for the detail and approximation subbands. The intercorrelation between the various resolutions is then exploited by using the same bases from the dictionary to encode the coefficients of the finer resolution bands at the corresponding spatial locations. To further exploit the spatial correlation of the coefficients, the embedded zerotree wavelet (EZW) algorithm was used to identify the potential zero trees. The coefficients of the representation are then quantized, arithmetic encoded at each resolution, and packed into a scalable bit stream structure. Our new algorithm is highly bit-rate scalable, and performs better than segmentation-based matching pursuit and EZW encoders at lower bit rates, based on subjective image quality and peak signal-to-noise ratio.
Compression and progressive transmission of images
Kunal Mukherjee, Amar Mukherjee
To be able to compact large amounts of multimedia data and route it through a busy network at interactive rates has emerged as one of the biggest technological challenges of our times. Recently, there has been much activity in the areas of theoretical compression models using wavelets, evaluation of suitable wavelets for compression, fast real-time compression/decompression systems, and parallelized VLSI algorithms. Little work has been done towards integrating these developments into a tightly coupled optimized scheme. We take a unified approach to developing a real-time compression/transmission system using a tight coupling of hierarchical vector quantization (HVQ) on discrete wavelet transformed images. We simultaneously optimize for speed, performance, and scalability on several fronts, e.g. choice of wavelet, parallelizability, and efficient VLSI implementation. In doing so we demonstrate a speedup of O(log L) as well as reduced storage. To achieve this we argue that the simplest wavelets, i.e. the Haar bases, suffice for our scheme, because HVQ retains detail coefficients. We also show how to integrate the algorithm into the parallel graphics library, in order to achieve parallelized compression and progressive transmission of images.
Stereoscopic Data Processing/Analysis
icon_mobile_dropdown
Stereo sequence coding
Qin Jiang, Monson H. Hayes III
A stereo sequence coding algorithm is presented and evaluated in this paper. The left image stream is coded independently by an MPEG-type coding scheme. In the right image stream, only reference frames are coded, by the subspace projection technique. The remaining frames in the right image stream are neither coded nor transmitted at the encoder; they are reconstructed from reference frames at the decoder. A frame estimation and interpolation technique is developed to exploit the great redundancy within stereo sequences in order to reconstruct these frames of the right image stream at the decoder. In the reconstructed frames, uncovered occlusion regions are filled by a disparity-based technique. The intra coding and residual coding are based on subband coding techniques. The motion and disparity fields are estimated by block-based matching with a multiresolution structure, and coded by an entropy coding technique. Two stereo sequences are used to test our coding algorithm. Experimental results show that the frame estimation and interpolation technique works well perceptually and that our stereo sequence coding scheme is effective in achieving high compression ratios.
Dependent quantization for stereo image coding
Woontack Woo, Antonio Ortega
In this paper, we address the problem of optimal bit allocation for stereo images. Conventional rate-distortion based methods have mainly concentrated on minimizing total distortion within a given bit budget by independently encoding each image. However, stereo image coding, like video coding, requires a dependent bit allocation framework to further improve encoding performance, because binocular and spatial dependencies are introduced by the disparity estimation and the differential pulse code modulation of the disparity vector field. We first formulate the dependent bit allocation problem for stereo image coding and extend it to blockwise dependent bit allocation. We then focus on blockwise dependent quantization, because using open-loop disparity estimation decouples the dependent bit allocation problem into two independent problems: disparity estimation and dependent quantization. The encoding complexity and delay in the dependent quantization framework can be significantly reduced by exploiting the unidirectional binocular dependency. An optimal set of quantizers can be selected using the Viterbi algorithm. For a given set of three quantization scales, the proposed scheme provides higher PSNR: about 3 dB compared to JPEG without disparity compensation, and 0.5 dB compared to optimal independent blockwise quantization with disparity compensation. The proposed scheme can help develop a fast and efficient bit allocation strategy, serve as a benchmark for practical rate control schemes, or be used in asymmetric applications that may involve offline encoding, such as CD-ROM, DVD, video-on-demand, etc.
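The Viterbi search over quantizer choices can be sketched as a standard trellis dynamic program. The block costs and dependency penalties below are hypothetical stand-ins for the Lagrangian rate-distortion terms in the paper:

```python
def viterbi_quantizers(cost, trans):
    """cost[i][q]: cost of coding block i with quantizer q;
    trans[qp][q]: extra cost when block i-1 used qp and block i uses q
    (the unidirectional dependency). Returns the minimum-cost quantizer
    sequence and its total cost."""
    n_blocks, n_q = len(cost), len(cost[0])
    best = list(cost[0])          # best cost of any path ending in state q
    back = []                     # backpointers, one list per block
    for i in range(1, n_blocks):
        prev = best
        best, ptr = [], []
        for q in range(n_q):
            c, p = min((prev[qp] + trans[qp][q], qp) for qp in range(n_q))
            best.append(c + cost[i][q])
            ptr.append(p)
        back.append(ptr)
    q = min(range(n_q), key=lambda k: best[k])
    path = [q]
    for ptr in reversed(back):    # trace back the optimal sequence
        q = ptr[q]
        path.append(q)
    return path[::-1], min(best)

path, total = viterbi_quantizers([[1, 5], [5, 1], [1, 5]], [[0, 2], [2, 0]])
print(path, total)  # [0, 0, 0] 7
```

The trellis has one state per quantizer and one stage per block, so the search is linear in the number of blocks rather than exponential, which is exactly why the unidirectional dependency matters.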
Two-dimensional sequence compression using MPEG
Ming-Hoe Kiu, Xiao-Song Du, Robert J. Moorhead II, et al.
This paper presents a technique to use the MPEG-2 compression scheme to compress a 2D array of images. The technique operates over two spatial dimensions, instead of the normal temporal dimension. This compression is crucial to interactive image-based rendering techniques, which allow virtual walk-throughs of texture-rich scenes. Since the MPEG schemes can achieve high compression with minimal image degradation, the large storage problems are reduced. This increases the applicability of image-based rendering on standard workstations or personal computers. Results are compared against previously presented compression schemes for image-based rendering.
Block-based winner-takes-all reconstruction of intermediate stereoscopic images
Abdol-Reza Mansouri, Janusz Konrad
This paper addresses the issue of the reconstruction of intermediate views from a pair of stereoscopic images. Such a reconstruction is needed for the enhancement of depth perception in stereoscopic systems, e.g., 'continuous look-around' or adjustment of the virtual camera baseline. The algorithm proposed here addresses the issue of blur: unlike typical reconstruction algorithms, which perform averaging between disparity-compensated left and right images, the new algorithm uses non-linear filtering via a winner-takes-all strategy. The image under reconstruction is assumed to be a tiling by fixed-size blocks that come from various positions of either the left or right image using disparity compensation. The tiling map is modeled by a binary decision field, while the disparity model is based on a smoothness constraint. The models are combined through a maximum a posteriori probability criterion. The intermediate intensities, disparities and the binary decision field are estimated jointly using the expectation-maximization algorithm. The proposed algorithm is compared experimentally with a reference block-based algorithm employing linear filtering. Although the improvements are localized and often subtle, they demonstrate that high-quality intermediate view reconstruction for complex scenes is feasible if the camera convergence angle is small.
Perceptual evaluation of coded stereo image pairs
Haluk B. Aydinoglu, Monson H. Hayes III
A methodology to quantify the perceptual quality of coded stereo image pairs is developed. It can determine the performance of stereo image coding algorithms with respect to each other, as well as with respect to the original stereo pairs. In addition, the statistical significance of the comparisons can be obtained. The results can also be used to quantify the distortion in depth information, since the perceptual quality of stereo pairs is directly related to depth information.
Wavelet Coding
Nonseparable two-dimensional multiwavelet transform for image coding and compression
Daniel Wajcer, David Stanhill, Yehoshua Y. Zeevi
In most cases of 2D wavelet applications in image processing, and in coding and compression, separable filters constructed as a tensor product of two 1D filters are used. This approach imposes limitations on the design of such filters. We therefore present a method of designing non-separable, orthogonal 2D wavelet functions and filter banks. We also show how to obtain orthogonal filter banks which have linear phase and any number of vanishing moments or approximation order. After applying the 2D wavelet transform to an image, we use vector quantization (VQ) and zerotrees for coding the wavelet coefficients in an efficient way. For VQ, we use entropy-constrained vector quantization. The bit allocation for each resolution level is determined automatically during the training process by a Lagrange multiplier method. It is shown that it is better to combine coefficients that relate to different wavelet functions at the same location into the same vector, rather than combining neighboring coefficients of the same wavelet function. We also present a different method for the coefficient coding, exploiting the similarity between the subbands to increase the efficiency of data compression. This method is based on the embedded zerotree wavelet algorithm, but introduces some modifications: we operate on vectors instead of dealing with each coefficient individually, and the method is adjusted to the specific structure of the transform we use. Quincunx multiwavelets with up to third-order polynomial approximation, corresponding to filters with very small support, are considered in detail.
Low-delay embedded 3D wavelet color video coding with SPIHT
Beong-Jo Kim, William A. Pearlman
In this paper, a modification of 3D SPIHT, the 3D extension to image sequences of the 2D SPIHT still image coder, is presented. It allows more flexibility in choosing the number of frames to be processed at one time by introducing an unbalanced tree structure. Simulation shows that 3D SPIHT with reduced coding latency still achieves coding results comparable to MPEG-2, and exhibits more uniform PSNR fluctuations. In addition, extension to color video coding is accomplished without explicit rate allocation, and can be applied to any color-plane representation.
Poster Presentations on Filtering
Optimized linear-phase filter banks for wavelet image coding
Xuguang Yang, Kannan Ramchandran
In this paper, we address the issue of designing a two- channel linear phase biorthogonal filter bank that maximizes the two most desired properties for the wavelet transform in image coding applications, namely, orthogonality and energy compaction. Proper cost functions are formulated for these two criteria and an efficient signal-adaptive optimization algorithm is proposed. Our algorithm is motivated by a number of interesting properties of the correlation matrix of typical image signals, and uses lifting operations to efficiently represent the degrees of freedom subject to perfect reconstruction conditions. In addition, it offers a successive tradeoff between our two optimization goals. Experimental results on the popular Daubechies 9-7 and 10-18 filter banks reveal that considerable improvements in terms of both orthogonality and energy compaction can be achieved through the proposed optimization technique.
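Lifting operations of the kind used here to parameterize the perfect-reconstruction degrees of freedom are invertible by construction: the synthesis side simply undoes the update step, then the predict step. A small integer sketch (a 5/3-style predict/update pair on an even-length signal, not the paper's optimized filters):

```python
def lift_forward(x):
    """One lifting stage on an even-length signal: split into even/odd
    samples, predict each odd from its even neighbours (detail), then
    update the evens from the details (approximation)."""
    even, odd = x[::2], x[1::2]
    last = len(even) - 1
    detail = [o - (even[i] + even[min(i + 1, last)]) // 2
              for i, o in enumerate(odd)]
    approx = [e + (detail[max(i - 1, 0)] + detail[i] + 2) // 4
              for i, e in enumerate(even)]
    return approx, detail

def lift_inverse(approx, detail):
    """Undo the update, then the predict, then re-interleave."""
    last = len(approx) - 1
    even = [a - (detail[max(i - 1, 0)] + detail[i] + 2) // 4
            for i, a in enumerate(approx)]
    odd = [d + (even[i] + even[min(i + 1, last)]) // 2
           for i, d in enumerate(detail)]
    x = []
    for e, o in zip(even, odd):
        x.extend([e, o])
    return x

x = [10, 12, 14, 11, 9, 8, 30, 2]
print(lift_inverse(*lift_forward(x)) == x)  # True
```

Because each lifting step is subtracted back out exactly, any choice of predict/update coefficients keeps perfect reconstruction, which is what makes lifting a convenient parameterization for the optimization described above.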
Wavelet Coding
Adaptive multiresolution motion estimation techniques for wavelet-based video coding
The wavelet transform is an important tool for image and video coding applications. Several motion estimation techniques have been proposed in the wavelet domain. The coarse-to-fine motion estimation techniques generally have a lower complexity, at the expense of inaccurate estimation. On the other hand, the fine-to-coarse motion estimation techniques provide superior estimation, but at a higher complexity. In this paper, we propose an efficient video coder in the wavelet domain. First, we propose an adaptive resolution selection for motion estimation, where a lowpass subband at an appropriate scale is employed for coarse motion estimation. The motion vectors of subbands at higher/lower resolutions are then predicted from the coarse motion vectors and are further refined using a small search window. Second, we propose an adaptive bit allocation technique where the bits are allocated optimally between the motion vectors and the displaced frame difference. This is performed by minimizing a cost function based on the Lagrange multiplier method. Simulation results show that the proposed video coder provides superior coding performance compared to other multiresolution techniques proposed in the literature.
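The refinement step described above, predicting a vector from a coarser level and then searching only a small window around it, can be sketched as follows. Frames are plain 2D lists and the predicted vector is supplied by the caller; this is an illustrative sum-of-absolute-differences search, not the paper's full coder:

```python
def sad(ref, cur, rx, ry, cx, cy, bs):
    """Sum of absolute differences between the bs x bs block of `cur`
    at (cx, cy) and the block of `ref` at (rx, ry)."""
    return sum(abs(ref[ry + j][rx + i] - cur[cy + j][cx + i])
               for j in range(bs) for i in range(bs))

def search_vector(ref, cur, cx, cy, bs, pred, radius):
    """Refine the predicted vector `pred` by exhaustive SAD search in a
    small window of +/- radius around it. Returns (cost, dx, dy)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(pred[1] - radius, pred[1] + radius + 1):
        for dx in range(pred[0] - radius, pred[0] + radius + 1):
            rx, ry = cx + dx, cy + dy
            if 0 <= rx <= w - bs and 0 <= ry <= h - bs:
                cost = sad(ref, cur, rx, ry, cx, cy, bs)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best

# Synthetic gradient frame shifted by (1, 1): true vector is (-1, -1)
ref = [[y * 31 + x * 7 for x in range(8)] for y in range(8)]
cur = [[(y - 1) * 31 + (x - 1) * 7 for x in range(8)] for y in range(8)]
print(search_vector(ref, cur, 3, 3, 2, (0, 0), 2))  # (0, -1, -1)
```

Searching a radius-2 window costs 25 SAD evaluations per block instead of the hundreds needed for a full-frame search, which is the complexity saving the coarse prediction buys.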
Representing multiple regions of interest with wavelets
Andrew T. Duchowski
Gaze-contingent systems minimize bandwidth requirements by displaying high-resolution imagery within eye-slaved regions of interest (ROIs) of limited spatial extent. The parafoveal resolution transitions must be sufficiently smooth to render degradation effects imperceptible. In this paper, a wavelet image filtering scheme is presented which preserves high resolution in ROIs matching foveal vision, and gradually degrades resolution in the periphery. The method permits the representation of multiple ROIs extending previous work based on MIP-mapping to the wavelet domain. Degradation is achieved through wavelet coefficient scaling following Voronoi partitioning of the image plane. Three variants of peripheral degradation are offered, including one matching human visual acuity. Reconstruction examples of images processed with the Haar and Daubechies-6 wavelets are provided. Results from gaze-contingent experiments, conducted to test perceived impairment of degraded image sequences, are summarized suggesting imperceptible degradation effects.
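A hypothetical version of the distance-based coefficient scaling (a simple linear falloff rather than the acuity-matched variants offered in the paper) could be:

```python
import math

def roi_scale(pos, roi_centers, full_res_radius, falloff):
    """Scale factor for a wavelet coefficient at image position `pos`:
    1.0 inside the nearest region of interest, decaying linearly to 0.0
    over `falloff` pixels. Taking the nearest center makes the Voronoi
    partition of the image plane implicit. All parameters are
    illustrative, not the paper's calibrated values."""
    d = min(math.hypot(pos[0] - cx, pos[1] - cy) for cx, cy in roi_centers)
    if d <= full_res_radius:
        return 1.0
    return max(0.0, 1.0 - (d - full_res_radius) / falloff)

print(roi_scale((0, 0), [(0, 0), (100, 100)], 5, 10))   # 1.0
print(roi_scale((0, 10), [(0, 0), (100, 100)], 5, 10))  # 0.5
print(roi_scale((0, 40), [(0, 0), (100, 100)], 5, 10))  # 0.0
```

Multiplying each wavelet coefficient by the scale at its spatial support gives full resolution inside each ROI and a smooth peripheral degradation, with the coefficient positions mapped through the subband's dyadic scaling.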
Motion Estimation/Detection
Moving-object detection from MPEG coded data
Yasuyuki Nakajima, Akio Yoneyama, Hiromasa Yanagihara, et al.
We describe a method of moving-object detection directly from MPEG coded data. Since motion information in MPEG coded data is determined from a coding-efficiency point of view, it does not always provide the real motion information of objects. We use a wide variety of coding information, including motion vectors and DCT coefficients, to estimate real object motion. Since such information can be obtained directly from the coded bitstream, very fast operation can be expected. Moving objects are detected basically by analyzing motion vectors and the spatio-temporal correlation of motion in P- and B-pictures. Moving objects are also detected in intra macroblocks by analyzing the coding characteristics of intra macroblocks in P- and B-pictures and by investigating temporal motion continuity in I-pictures. The simulation results show that successful moving-object detection has been performed at the macroblock level using several test sequences. Since the proposed method is very simple and requires much less computational power than conventional object detection methods, it has a significant advantage as a motion analysis tool.
Tracking a moving stimulus by active vision
Edouard Leclercq, Olga Cachard, Paul Leber, et al.
Our research deals with the development of a binocular system which is able to react to a stimulus and to track it. In this paper we present results which concentrate on a monocular tracking system. The imposed constraints are as follows: it must be a simple system, react quickly, and be based on the biological model of vision. The developed model consists of three modules: the optical mechanism to acquire the visual information, the processing module which processes the information and allows the location of the moving object, and the command module of the eye which is necessary to place the fovea on this object. The results obtained from the analyses of scenes with a unique moving object enable us to validate the chosen command. Then, we applied the system to several moving stimuli varying in size and velocity. One of the advantages of our processing module is the possibility of selecting one stimulus out of n to track. The developed system achieves good precision: the object is always localized by the fovea, even after a change of direction.
Luminance and texture variation analysis for motion detection
We present a statistical algorithm for motion detection in sequences of images acquired by a fixed camera. This algorithm combines two methods of segmentation. The first, called 'unrefined', intended for a preliminary determination of zones in movement, is based on a segmentation by division using an original operator that makes use of three successive images. The second uses a Markovian approach combined with second-order statistical parameters. This algorithm is characterized by its robustness, its simplicity, and its short calculation time.
Semiautomatic video layering using 2D mesh tracking
Candemir Toklu, A. Murat Tekalp, A. Tanju Erdem
We describe a method for video layering using 2D tracking. We represent the video as sets of non-overlapping layers, and model the motion and shape of each layer by a 2D mesh. Assuming that the object boundaries are marked interactively on some key-frames, we find the segmentation maps in all other frames using only the YUV data and track the 2D meshes corresponding to video layers during their lifespan.
Noniterative motion estimation for overlapped block motion compensation
Bo Tao, Michael T. Orchard
We study motion estimation in overlapped block motion compensation. Due to the interaction between neighboring motion vectors, it remains an open problem how to find the optimal motion vector set minimizing the motion compensation error. In this paper we present a non-iterative motion estimation algorithm to search for a sub-optimal solution. It utilizes the relationship between block motion estimates by exploiting known motion estimates in the causal past and predicting the unknown future. Our algorithm significantly outperforms other known non-iterative algorithms, including conventional block matching and windowed block matching. Furthermore, while costing only a fraction of the computation needed by iterative algorithms, our non-iterative algorithm obtains most of the gain realizable by using the iterative algorithms. The experimental results show that iteration is not necessary to achieve large gains, in contrast to common belief.
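Overlapped compensation relies on windows whose shifted copies sum to one, so that the blended predictions from neighboring motion vectors stay unbiased. A simple 1D triangular window with that overlap-add property (illustrative only, not the paper's window):

```python
def obmc_window(bs):
    """Length-2*bs triangular window for block size bs. Copies shifted
    by bs overlap-add to exactly 1, so overlapping block predictions
    weighted by this window blend without changing the mean level."""
    ramp = [(i + 0.5) / bs for i in range(bs)]
    return ramp + ramp[::-1]

w = obmc_window(4)
print(w)
# overlap region of two adjacent windows sums to 1 everywhere
print(all(abs(w[i] + w[i + 4] - 1.0) < 1e-12 for i in range(4)))  # True
```

It is exactly this coupling through the shared window support that makes the vectors interact: changing one block's vector changes the prediction error inside its neighbors' overlap regions, which is why a purely independent per-block search is suboptimal.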
Fractal/Subband Coding
Fractal image coding based on replaced domain pools
Masaki Harada, Toshiaki Fujii, Tadahiko Kimoto, et al.
Fractal image coding based on iterated function systems has been attracting much interest because of the possibility of drastic data compression. It performs compression by using the self-similarity included in an image. In the conventional schemes, under the assumption of self-similarity in the image, each range block is mapped from the larger domain block considered the most suitable to approximate it. However, even if the exact self-similarity of an image is found at the encoder, it hardly holds at the decoder, because the domain pool of the encoder is different from that of the decoder. In this paper, we propose a fractal image coding scheme that uses domain pools replaced with decoded or transformed values to reduce the difference between the domain pool of the encoder and that of the decoder. The proposed scheme performs two-stage encoding. The domain pool is replaced with decoded non-contractive blocks first, and then with transformed values for contractive blocks. It is expected that the proposed scheme reduces the errors of contractive blocks in the reconstructed image while those of non-contractive blocks are kept unchanged. The experimental results show the effectiveness of the proposed scheme.
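The core of each range-domain mapping in fractal coding is a contractive affine fit: shrink the larger domain block to range size, then find the gray-level scale and offset that best approximate the range block. A minimal sketch (the replaced-domain-pool logic of the paper is omitted):

```python
def shrink(block):
    """Spatially contract a 2N x 2N domain block to N x N by
    averaging each 2x2 cell."""
    n = len(block) // 2
    return [[(block[2*j][2*i] + block[2*j][2*i+1] +
              block[2*j+1][2*i] + block[2*j+1][2*i+1]) / 4
             for i in range(n)] for j in range(n)]

def fit_map(domain, rng):
    """Least-squares scale s and offset o so that s * shrink(domain) + o
    approximates the range block `rng` in a mean-squared sense."""
    d = [v for row in shrink(domain) for v in row]
    r = [v for row in rng for v in row]
    n = len(d)
    sd, sr = sum(d), sum(r)
    sdd = sum(v * v for v in d)
    sdr = sum(a * b for a, b in zip(d, r))
    denom = n * sdd - sd * sd
    s = (n * sdr - sd * sr) / denom if denom else 0.0
    o = (sr - s * sd) / n
    return s, o

domain = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
rng = [[0.5 * v + 10 for v in row] for row in shrink(domain)]
print(fit_map(domain, rng))  # (0.5, 10.0)
```

Keeping |s| below 1 makes each map contractive, which is what guarantees the decoder's iteration converges; the paper's point is that the fixed point differs when the decoder iterates from a different domain pool than the encoder searched.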
Predictive and direct fractal quantization of the wavelet packet domain
Matthias Reichl, Andreas Uhl
In this work we discuss several approaches for designing fractal quantizers in the context of hybrid wavelet-fractal image compression algorithms. Moreover different subband- structures are compared concerning their suitability for subsequent fractal quantization.
Coding of fingerprint images using binary subband decomposition and vector quantization
Oemer Nezih Gerek, Enis A. Cetin
In this paper, compression of binary digital fingerprint images is considered. High compression ratios for fingerprint images are essential for handling the huge number of images in databases. In our method, the fingerprint image is first processed by a binary nonlinear subband decomposition filter bank, and the resulting subimages are coded using vector quantizers designed for quantizing binary images. It is observed that the discriminating properties of the fingerprint images are preserved at very low bit rates. Simulation results are presented.
Time-varying subband image coding with efficient reduction of higher-order redundancy
Benoit Maison, Luc Vandendorpe
A region-adaptive subband coding algorithm is studied. The shape of the regions is not given beforehand, but is the result of a joint optimization with the set of coding operators. A simple space-varying M x M-band subband decomposition technique with instantaneous switching is utilized so that each M by M image block can be allocated to one of N concurrent encoders. The joint optimization is iterative and switches back and forth between optimization of the region shapes and of the coding operators defined by a set of subband filters and entropy coding tables (quantization is uniform and constant). From an information-theoretic viewpoint, this procedure corresponds to the modeling of higher-order redundancy by means of finite multidimensional mixtures. The algorithm is tested on natural images and several conclusions are drawn. Region-adaptive coding presents a significant advantage compared to the equivalent single-coder system. Although the optimal regions exhibit a distinctive structure, it is very different from any high-level object-based segmentation. Finally, the efficiency of the approach lies mainly in its region-adaptive entropy coding capability. Adaptation of the transform operator itself appears to be less important. Keywords: Image Compression, Subband Coding, Adaptive, Mixture Distributions, HOS, Nonlinear Modeling
Poster Presentations on Filtering
Progressive ROI coding and diagnostic quality for medical image compression
This work addresses the delicate problem of lossy compression of medical images. More specifically, a selective allocation of coding resources is introduced, based on the concept of 'diagnostic interest' and an interactive methodology based on a new measure of 'diagnostic quality'. The selective allocation of resources is made possible by an a priori selection of regions of specific interest for diagnostic purposes. The idea is to change the precision of the representation, in a transformed domain, of regions of particular interest, through a weighting procedure by an on-line user-defined quantization matrix. The overall compression method is multi-resolution, provides for an embedded generation of the bit stream and guarantees a good rate-distortion trade-off, at various bit rates, with spatially varying reconstruction quality. This work also analyzes the delicate issue of professional usage of lossy compression in a PACS environment. The proposed compression methodology gives interesting insights in favor of using lossy compression in a controlled fashion by the expert radiologist. Most of the ideas presented in this work have been confirmed by extensive experimental simulations involving medical expertise.