Proceedings Volume 3653

Visual Communications and Image Processing '99

View the digital version of this volume at SPIE Digital Library.

Volume Details

Date Published: 28 December 1998
Contents: 22 Sessions, 149 Papers, 0 Presentations
Conference: Electronic Imaging '99 1999
Volume Number: 3653

Table of Contents

  • Image/Video Coding Standards
  • Special Session: Robust Image and Video Transmission Part I
  • Pre and Post Processing, Filtering
  • Special Session: Robust Image and Video Transmission Part II
  • Wavelet/Subband Coding
  • Poster Presentations on Image Processing Applications
  • Poster Presentations on Motion and Object-Based Coding
  • Video Coding
  • Stereo and 3D Imaging
  • Poster Presentations on Image Recovery
  • Poster Presentations on Implementations
  • Special Session: Internet Video
  • Architectures and Implementations
  • Feature Extraction
  • Poster Presentations on Image and Video Coding
  • Image Coding
  • Content-Based, Model-Based, Object-Based Coding
  • Poster Presentations on Segmentation and Synthetic View
  • Lossless Compression
  • Special Session: SHD and Electronic Cinema
  • Motion Estimation
  • Special Session: Multimedia Indexing, Retrieval, and Presentation
Image/Video Coding Standards
Scene description, composition, and playback systems for MPEG-4
Atul Puri, Robert L. Schmidt, Barry G. Haskell
We discuss a few selected aspects of MPEG-4 Systems, which include a representation for describing multimedia scenes, delivery of coded multimedia objects and the scene description, composition, and synchronized playback. We first present an overview of BIFS, the MPEG-4 scene description language, as well as a brief background of VRML, on which BIFS is based. We then discuss the basics of a (nonadaptive) MPEG-4 playback system, including issues in BIFS and media decoding. The state of the software implementation in MPEG-4 of the 2D scene player and the 3D scene player is then discussed. Thus, the novel aspects of the MPEG-4 Systems version 1 work, which is near completion, are reviewed. We also discuss related aspects of the ongoing work on version 2 of MPEG-4 Systems. This includes additional work on scene description, referred to as advanced BIFS, and work on a Java-based extension of MPEG-4 Systems to an adaptive system, referred to as MPEG-J. The work on advanced BIFS extends the version 1 scene description by a number of additional capabilities. The work on MPEG-J involves the design of an architecture to control the version 1 system, and a set of APIs. Thus, the status of some of the ongoing work on MPEG-4 Systems version 2 is also reviewed.
Real-time motion-based H.263+ frame rate control
Most existing H.263+ rate control algorithms, e.g., the one adopted in the near-term test model (TMN8), focus on macroblock-layer rate control and low latency under the assumptions of a constant frame rate and a constant bit rate (CBR) channel. These algorithms do not accommodate transmission bandwidth fluctuation efficiently, and the resulting video quality can be degraded. In this work, we propose a new H.263+ rate control scheme which supports the variable bit rate (VBR) channel through the adjustment of the encoding frame rate and quantization parameter. A fast algorithm for the encoding frame rate control based on the inherent motion information within a sliding window in the underlying video is developed to efficiently pursue a good tradeoff between spatial and temporal quality. The proposed rate control algorithm also takes the time-varying bandwidth characteristic of the Internet into account and is able to accommodate the change accordingly. Experimental results are provided to demonstrate the superior performance of the proposed scheme.
Improved H.263 video codec with motion-based frame interpolation
A fast block-based motion frame interpolation (FMCI) and an adaptive frame skipping scheme (AFS) are proposed for the H.263/H.263+ decoder and encoder, respectively, in this research. The proposed FMCI decoder can successfully interpolate non-coded frames so that a video clip with skipped frames can be played without jerkiness. In the FMCI decoder, the block-based motion fields from the encoder are directly utilized to generate interpolated frames without performing further motion search. Thus, the computational complexity of the interpolation operation is significantly reduced. With AFS, the encoder can adaptively choose the frame skip number based on the prediction result of the embedded FMCI. AFS generates a bit stream with a variable frame rate, which can increase the coding efficiency and enhance the performance of the FMCI interpolation at the decoder. FMCI works with any bit stream generated by the standard H.263/H.263+ encoder with or without incorporating AFS.
Special Session: Robust Image and Video Transmission Part I
Tools for robust image and video coding in JPEG-2000 and MPEG-4 standards
Jie Liang, Raj Talluri
In this paper, we review the tools for error-resilient image and video coding that have already been incorporated in the upcoming ISO/IEC MPEG-4 and JPEG-2000 video and image coding standards. We also review the ongoing work currently being conducted on the evolving JPEG-2000 standard. The methodology adopted by the MPEG and JPEG standards bodies for developing error resilience tools is also discussed. Finally, we provide performance data on the effectiveness of these error resilience tools under various channel conditions.
Joint source coding, transport processing, and error concealment for H.323-based packet video
Qin-Fan Zhu, Louis Kerofsky
In this paper, we investigate how to adapt different parameters in H.263 source coding, transport processing and error concealment to optimize end-to-end video quality at different bitrates and packet loss rates for H.323-based packet video. First, different intra coding patterns are compared and we show that the contiguous rectangle or square block pattern offers the best performance in terms of video quality in the presence of packet loss. Second, the optimal intra coding frequency is found for different bitrates and packet loss rates. The optimal number of GOB headers to be inserted in the source coding is then determined. The effect of transport processing strategies such as packetization and retransmission is also examined. For packetization, the impact of packet size and the effect of macroblock segmentation on picture quality are investigated. Finally, we show that the dejitter buffering delay can be used to advantage for packet loss recovery with video retransmission without incurring any extra delay.
Bidirectional synchronization and hierarchical error correction for robust image transmission
HongZhi Li, Chang Wen Chen
In this paper, we present a novel joint source and channel image coding scheme for noisy channel transmission. The proposed scheme consists of two innovative components: (1) intelligent bi-directional synchronization, and (2) layered bit-plane error protection. The bi-directional synchronization is able to recover the coding synchronization when any single synchronization code, or even two consecutive synchronization codes, are corrupted by the channel noise. With synchronized partition, unequal error protection for each bit-plane can be designed to suit a wide range of channel environments. The hierarchical error protection strategy is based on the analysis of bit-plane error sensitivity, aiming at achieving an optimal joint source and channel coding when the compressed image data are transmitted over noisy channels. Experimental results over extensive channel simulations show that the proposed scheme outperforms the approach proposed by Sherwood and Zeger, who have reported the best numerical results in the literature.
Macroscopic multistage image compression for robust transmission over noisy channels
Greg Sherwood, Kenneth Zeger
We propose a macroscopic multistage compression system to provide progressive and robust transmission of images across noisy channels with varying statistics. Each stage encodes the residual image of the previous stage. The choice of source coder and transmission rate at each stage are design parameters. The multistage structure allows the use of efficient unequal error protection channel coding and introduces source redundancy to enable graceful degradation under severe channel conditions. Both bit errors and packet losses are considered. Specific examples are provided to demonstrate the performance of the proposed method.
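The multistage idea described above, where each stage encodes the residual left by the previous stage, can be illustrated with a minimal sketch. The abstract does not name the source coder used at each stage, so a uniform scalar quantizer stands in for it here; the function names, the sample values, and the step sizes are all hypothetical and chosen only for illustration.

```python
def uniform_quantize(values, step):
    """Stand-in source coder (an assumption): uniform scalar quantization."""
    return [round(v / step) * step for v in values]


def multistage_encode(samples, steps):
    """Code `samples` in stages; each stage codes the residual of the previous ones."""
    residual = [float(s) for s in samples]
    stages = []
    for step in steps:
        approx = uniform_quantize(residual, step)
        stages.append(approx)
        # What this stage failed to represent is handed to the next stage.
        residual = [r - a for r, a in zip(residual, approx)]
    return stages


def multistage_decode(stages, received):
    """Sum only the stages that arrived; received[i] is True if stage i was received."""
    out = [0.0] * len(stages[0])
    for stage, ok in zip(stages, received):
        if ok:
            out = [o + s for o, s in zip(out, stage)]
    return out


def max_abs_error(a, b):
    """Largest absolute sample difference between two equal-length sequences."""
    return max(abs(x - y) for x, y in zip(a, b))


# Hypothetical pixel values and per-stage quantizer steps (coarse to fine).
samples = [12, 200, 97, 255, 0, 143, 64, 31]
steps = [64.0, 16.0, 4.0]
stages = multistage_encode(samples, steps)

# Progressive decoding: the error shrinks as more stages are received.
for k in range(1, len(stages) + 1):
    received = [i < k for i in range(len(stages))]
    print(k, max_abs_error(samples, multistage_decode(stages, received)))
```

In this sketch a decoder that receives only the early stages still reconstructs a coarse image, which is one simple way to picture the progressive behavior the abstract describes; the channel coding and unequal error protection are not modeled.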
Error-resilient wavelet image coding with fast resynchronization
Te-Chung Yang, Sunil Kumar, C.-C. Jay Kuo
An improved self-synchronizing Huffman code is proposed to decrease the error propagation length in the compressed bit stream caused by errors during transmission. After regaining synchronization, the decoder may still not be able to align each symbol with its correct location due to the wrong number of previously decoded symbols in the error propagation region. A scheme to identify the probable error location and then move symbols towards their correct positions is also proposed by exploiting the correlation between coefficients of the current subband and their parent subband. Experiments under different error conditions are performed, and it is demonstrated that the proposed error-resilient techniques provide a more robust codec with very little sacrifice in coding efficiency.
Pre and Post Processing, Filtering
Total variational blind image restoration from image sequences
Shang-Hong Lai, Yuntao Cui
Blind image restoration aims to recover the original images from the blurred images when the blurring function in the image formation process is unknown. In this paper, we present an efficient and practical blind image restoration algorithm based on total variational (TV) regularization. The TV regularization employs the TV norm on the images for the smoothness constraint, while the traditional regularization uses the H1 norm for the smoothness constraint. The TV regularization provides a larger functional space for the image functions and is known for allowing discontinuities in the image function to be recovered. The blur functions considered in this paper are combinations of a Gaussian defocus blur and a uniform motion blur, each of which can be approximated by a parametric function of one or two parameters. The use of this parametric form intrinsically imposes a constraint on the blur function. The small number of parameters involved in the parametric blur function makes the resulting optimization problem tractable. The above formulation for the restoration from a single image is then extended to the blind restoration from an image sequence by introducing motion parameters into the multi-frame data constraints. An iterative alternating numerical algorithm is developed to solve the nonlinear optimization problems. Each iteration of the alternating numerical algorithm involves Fourier-preconditioned conjugate gradient iterations to update the restored image and quasi-Newton steps to update the blur and motion parameters. Some experimental results are shown to demonstrate the usefulness of our algorithm.
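For reference, the two smoothness terms contrasted in this abstract are commonly written as below. These are the standard textbook definitions, not the paper's own formulation: the symbols u (restored image), g (observed image), h (blur kernel), and λ (regularization weight) are notation assumed here, and the abstract does not give the exact functional the authors minimize.

```latex
\begin{aligned}
R_{H^1}(u) &= \int_\Omega |\nabla u(x)|^2 \, dx
  && \text{(quadratic penalty, smooths across edges)} \\
R_{TV}(u)  &= \int_\Omega |\nabla u(x)| \, dx
  && \text{(total variation, admits discontinuities)} \\
\hat{u}    &= \arg\min_{u} \; \tfrac{1}{2}\, \| h * u - g \|_2^2 + \lambda\, R(u)
  && \text{(generic regularized restoration)}
\end{aligned}
```

Because the TV term grows only linearly with the gradient magnitude, a sharp edge is penalized no more than a gradual ramp of the same height, which is why TV regularization is known for recovering discontinuities.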
Algorithm for estimating the degree of camera shaking and noise corruption
Byung-Chul Choi, Ji-Woong Choi, Moon Gi Kang
While acquiring the image, the shaking of the acquiring device or of the object seriously damages the acquired image. This phenomenon, which decreases the distinctness of the image, is called motion blur. In this paper, a newly defined function is introduced for finding the degree and the length of the motion blur. The domain of this function is called the Peak-trace domain. In the Peak-trace domain, the noise dominant region -- for calculating the noise variance -- and the signal dominant region -- for extracting the degree and the length of the motion blur -- are defined. Using the information of the Peak-trace in the signal dominant region, we can quickly find the direction of the motion blur with noise immunity. A new weighted least mean square method helps extract the Peak-trace more precisely. After getting the direction of the motion blur, we can find the degree of the motion blur very quickly using a one-dimensional cepstrum method. In our experiment, we could efficiently restore the damaged image using the information obtained by the above-mentioned method.
Low-complexity postprocessing of wavelet-coded images via robust estimation and nonlinear filtering
Mei-Yin Shen, C.-C. Jay Kuo
A postprocessing algorithm for compression artifact reduction in low-bit-rate wavelet coding is proposed in this work. We first formulate the artifact reduction problem as a robust estimation problem. Under this framework, the artifact-free image is obtained by minimizing a cost function that accounts for the smoothness constraint as well as image fidelity. To compute the estimate, computationally intensive algorithms such as simulated annealing and gradient descent search are often adopted. To reduce the computational complexity, a nonlinear filtering technique is proposed in this work to find the approximate global minimum with a lower computational cost. We have performed our experiments on images coded by the JPEG-2000 standard and observed that the proposed method is effective in reducing the severe ringing artifact while maintaining low complexity and low memory bandwidth.
Iterative blocking artifact reduction using a wavelet transform
Ick Hoon Jang, Hyun-Joo So, Nam Chul Kim
We propose an iterative algorithm for reducing the blocking artifact in block transform-coded images by using a wavelet transform (WT). An image is considered as a set of one-dimensional (1-D) horizontal and vertical signals and 1-D WT is utilized in which the mother wavelet is the first order derivative of a Gaussian-like function. The blocking artifact is reduced by removing the blocking component that causes the variance at the block boundary position in the first scale wavelet domain to be abnormally higher than those at the other positions using a minimum mean square error (MMSE) filter in the wavelet domain. This filter minimizes the MSE between the ideal blocking component-free signal and the restored signal in the neighborhood of block boundaries in the wavelet domain. It also uses local variance in the wavelet domain for pixel adaptive processing. The filtering and the projection onto a convex set of quantization constraint are iteratively performed in alternating fashion. Experimental results show that the proposed method yields not only a PSNR improvement of about 0.46 - 1.26 dB, but also subjective quality nearly free of the blocking artifact and edge blur.
Novel compressed image improving technique by blending a reference image
Supatana Auethavekiat, Kiyoharu Aizawa, Mitsutoshi Hatori
A novel image improving algorithm for compressed image sequences by blending with reference images is presented. The reference image is defined as a high-quality still image of the same scene. Compressed images are improved by blending the data directly with the reference image in both the DCT and spatial domains. The amount of blending is controlled by the resemblance between the reference and the compressed image. Experiments conducted on a sequence of H.263-compressed images are presented. Prior information about the compression technique is not required, so the method should be applicable to other compression techniques as well.
Set theoretic inverse halftoning method for grayscale and color images
Gozde Bozkurt, Enis A. Cetin
Inverse halftoning is the problem of recovering a continuous-tone image from a given halftoned image. In this paper, a new inverse halftoning method which uses a set theoretic formulation is introduced. The new method exploits the prior information at hand, and uses space-domain projections, frequency-domain projections, and space-scale domain projections to obtain a feasible reconstruction of the continuous-tone image. The proposed method is also extended for the inverse halftoning of color error-diffused images.
Sampling conditions for anisotropic diffusion
Multi-resolution image analysis utilizes subsampled image representations for applications such as image coding, hierarchical image segmentation and fast image smoothing. An anti-aliasing filter may be used to ensure that the sampled signals adequately represent the frequency components/features of the higher resolution signal. Sampling theories associated with linear anti-aliasing filtering are well-defined and conditions for nonlinear filters are emerging. This paper analyzes sampling conditions associated with anisotropic diffusion, an adaptive nonlinear filter implemented by partial differential equations (PDEs). Sampling criteria will be defined within the context of edge causality, and conditions will be prescribed that guarantee removal of all features unsupported in the sample domain. Initially, sampling definitions will utilize a simple, piecewise linear approximation of the anisotropic diffusion mechanism. Results will then demonstrate the viability of the sampling approach through the computation of reconstruction errors. Extension to more practical diffusion operators will also be considered.
Special Session: Robust Image and Video Transmission Part II
Use of FEC coding to improve statistical multiplexing performance for video transport over ATM networks
The use of forward error-control (FEC) coding, possibly in conjunction with ARQ techniques, has emerged as a promising approach for video transport over ATM networks for cell-loss recovery and/or bit error correction, such as might be required for wireless links. Although FEC provides cell-loss recovery capabilities, it also introduces transmission overhead which can possibly cause additional cell losses. A methodology is described to maximize the number of video sources multiplexed at a given quality of service (QoS), measured in terms of decoded cell loss probability, using interlaced FEC codes. The transport channel is modelled as a block interference channel (BIC) and the multiplexer as a single-server, deterministic-service, finite-buffer system supporting N users. Based upon an information-theoretic characterization of the BIC and large deviation bounds on the buffer overflow probability, the described methodology provides theoretically achievable upper limits on the number of sources multiplexed. Performance of specific coding techniques using interlaced nonbinary Reed-Solomon (RS) codes and binary rate-compatible punctured convolutional (RCPC) codes is illustrated.
Robustness of adaptive quantization for subband coding of images
Hong Man, Mark J. T. Smith, Faouzi Kossentini
In this paper, we present a generalized framework for the design of adaptive quantization that is able to achieve a good balance between high compression performance and channel error resilience. The unique feature of our proposed adaptive quantization technique is that it improves the channel error resilience of the compression system. It also provides a simple way to perform bit stream error sensitivity analysis, which previously was only available for fixed rate quantization schemes. The coder automatically classifies the compressed data sequence into separated subsequences with different error sensitivity levels, which enables a good adaptation to different channel models according to their noise statistics and error protection schemes. Two sets of adaptive quantization examples are provided for subband coding of images. The first set is based on a layered quantization/coding approach where our technique directly quantizes the subband coefficients. The other set is designed for a conventional subband coding system with optimal bit allocation and fixed rate quantization at each subband. Under this second structure, the technique performs lossless compression on quantized subband coefficients. Experimental results have shown that our coders can obtain high quality compression performance with significantly improved resilience to channel errors.
Error-resilient video coding in H.263+ against error-prone mobile channel
Dong-Seek Park, Jeong-Hoon Park, Jong Dae Kim, et al.
This paper presents a new error-resilient scheme for the as-yet-defined H.263++ (backward-compatible with both H.263+ and H.263) standard of Question 15/Study Group 16 in the International Telecommunication Union (ITU). The key idea is the layered protection of the video bitstream in the context of a data partitioning scheme, which can be realized by adaptively inserting part of the encoded video stream as redundancy according to the channel status and the importance of the syntax. Since the additional data at each layer helps decoding with increased reliability, picture quality can be improved. Also, the number of frames decoded is increased with the aid of the additional layer.
Uniform-threshold TCQ with block classification for image transmission over noisy channels
A combined source-channel coding scheme without explicit error protection is proposed to transmit images over noisy channels. Major components of the proposed coding scheme include 2-D DCT with block classification, fixed-length uniform threshold trellis coded quantization (UTTCQ), an optimal bit allocation algorithm and noise reduction (NR) filters. The integration of these components allows us to organize the compressed bitstream in such a way that it is less sensitive to channel noise, and hence achieves data compression and error resilience at the same time. This paper reports our recent study on incorporating block classification into the integrated scheme. Experimental results show that, in the case of noise-free channels and at a bit rate of 0.5 bpp, an improvement of 2.33 dB can be achieved with the classification. In the case of noisy channels, the gain decreases as the bit error rate increases, down to an average improvement of 0.46 dB at a BER of 0.1. Our proposed system uses no error protection, no synchronization codewords and no entropy coding. However, it shows a decent compression ratio and graceful degradation with respect to increasing channel errors.
Loading algorithms for subband coded image transmission using multicarrier modulation
Haito Zheng, Kuo Juey Ray Liu
We present a new parallel framework for multimedia data transmission over spectrally shaped channels using multicarrier modulation. Source images are first decomposed into a set of layers with different perceptual importance. Unlike the traditional approaches, the layers are transmitted simultaneously, each occupying a number of subchannels. We develop parallel loading algorithms which distribute the transmitted power and data rate among the subchannels efficiently, to provide unequal error protection. A power-optimized serial transmission system is also developed. Simulation results show that the parallel transmission system achieves significant performance improvement compared to the optimized serial transmission and the existing loading algorithms developed for data transmission. The proposed algorithms are well suited for both Additive Gaussian White Noise (AGWN) channels and spectrally shaped channels which are typical in ADSL systems. Performance and numerical comparisons under different parameters are also presented.
True motion vectors for robust video transmission
In this paper, we make use of true motion vectors for better error concealment. Error concealment in video is intended to recover the loss due to channel noise by utilizing available picture information. In our work, we do not change the syntax and thus no additional bits are required. This work focuses on improving the error concealment with transmitted true motion vectors. That is, we propose a 'true' motion estimation at the encoder while using a post-processing error concealment scheme that exploits motion interpolation at the decoder. Given the location of the lost regions and various temporal error concealment techniques, we demonstrate that our true motion vectors perform better than the motion vectors found by minimal-residue block-matching. Additionally, we propose a new error concealment technique that improves reconstruction quality when the previous frame has been heavily damaged. It has been observed that in the case of a heavily damaged frame, better predictions can be made from the past reference frame, rather than the current reference frame which is damaged. This is accomplished by extrapolating the decoded motion vectors so that they correspond to the past reference frame.
Robust video compression for time-varying wireless channels
We investigate joint source-channel coding for transmission of video over time-varying channels. We assume that the channel state is known at the receiver, but only a statistical description of the time-varying nature of the channel is available at the transmitter. A multimode coder is proposed to efficiently quantize the input video, and generate a quasi fixed-length bit stream of unequal importance. We vary the error protection offered to the individual bits, by matching it to both their importance and the channel noise statistics. Based on the channel state, the decoder makes the best estimate of the source vector from the received codeword. We present a design algorithm which optimizes the overall rate-distortion performance of the system. Simulation results show that the proposed system outperforms a reference scheme where the multimode (source) codes and the channel codes were designed separately. Further, both the multimode coding schemes provide substantial gains over fixed length JSCC coding.
Effect of error distribution in channel coding failure on MPEG wireless transmission
P. Max Robert, Ahmed M. Darwish, Jeffrey H. Reed
This paper examines the interaction between digital video and channel coding in a wireless communication system. Digital video is a high-bandwidth, computationally intensive application. The recent allocation of large tracts of spectrum by the FCC has made possible the design and implementation of personal wireless digital video devices for several applications, from personal communications to surveillance. A simulation tool was developed to explore the video/channel coding relationship. This tool simulates a packet-based digital wireless transmission in various noise and interference environments. The basic communications system models the DAVIC (Digital Audio-Visual Council) layout for the LMDS (Local Multipoint Distribution Service) system and includes several error control algorithms and a packetizing algorithm that is MPEG-compliant. The Bit-Error-Rate (BER) is a basic metric used in digital communications system design. This work presents simulation results that prove that BER is not a sufficient metric to predict video quality based on channel parameters. Evidence will be presented to show that the relative positioning of bit errors, regardless of absolute positioning, and the relative occurrence of these bit error bursts are the main factors that must be observed in a physical layer to design a digital video wireless system.
MPEG-compliant joint source/channel coding using DCT and substream scheduling for packet networks
Seong-Whan Kim, Shan Suthaharan, Geun-Ho Lee, et al.
QoS guarantees in real-time communication are important for multimedia applications. An architectural framework for multimedia networks based on substreams or flows is effectively exploited for combining source and channel coding for multimedia data. But the existing frame-by-frame approach, which includes Moving Picture Experts Group (MPEG) coding, cannot be neglected because it is a standard. In this paper, firstly, we designed an MPEG transcoder which converts an MPEG coded stream into variable rate packet sequences to be used for our joint source/channel coding (JSCC) scheme. Secondly, we designed a classification scheme to partition the packet stream into multiple substreams which have their own Quality of Service (QoS) requirements. Finally, we designed a management (reservation and scheduling) scheme for substreams to support better perceptual video quality, such as a bound on end-to-end jitter. We have shown that our JSCC scheme is better than two other popular techniques by simulation and real video experiments in a TCP/IP environment.
Error-resilient video transmission over the Internet
Fabrice Le Leannec, Christine M. Guillemot
Targeting multimedia communications over the Internet, this paper describes a technique for improving the packet loss resiliency of compressed video streams. Aiming at the best trade-off between compression efficiency and packet loss resiliency, a procedure for adapting the video coding modes to varying network characteristics is introduced. The coding mode selection is based on a rate-distortion procedure with global distortion metrics incorporating channel characteristics in the form of a two-state Markov model. This procedure has been incorporated in an MPEG-4 video encoder. It has been observed that, in error-free environments, the channel adaptive mode selection technique allows a significant gain with respect to simple conditional replenishment. On the other hand, under loss conditions, it is shown that this procedure significantly improves the encoder's performance with respect to the original MPEG-4 encoder, to approach the robustness of conditional replenishment mechanisms.
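A two-state Markov channel of the kind mentioned above is often used to model bursty packet loss. The sketch below is a generic illustration, not the model from the paper: it assumes that every packet sent in the "bad" state is lost and none in the "good" state, and the function names and transition probabilities are hypothetical.

```python
import random


def simulate_two_state_losses(n_packets, p_good_to_bad, p_bad_to_good, seed=0):
    """Return a list of booleans (True = packet lost) from a two-state Markov chain.

    Assumption: a packet is lost exactly when the channel is in the bad state.
    """
    rng = random.Random(seed)
    bad = False  # start in the good state
    losses = []
    for _ in range(n_packets):
        losses.append(bad)
        if bad:
            # Remain in the bad state with probability 1 - p_bad_to_good.
            bad = rng.random() >= p_bad_to_good
        else:
            # Enter the bad state with probability p_good_to_bad.
            bad = rng.random() < p_good_to_bad
    return losses


def stationary_loss_rate(p_good_to_bad, p_bad_to_good):
    """Long-run fraction of time the chain spends in the bad state."""
    return p_good_to_bad / (p_good_to_bad + p_bad_to_good)


def mean_burst_length(p_bad_to_good):
    """Expected number of consecutive lost packets (geometric sojourn time)."""
    return 1.0 / p_bad_to_good


# Hypothetical channel parameters, chosen only for illustration.
losses = simulate_two_state_losses(200_000, p_good_to_bad=0.05, p_bad_to_good=0.45)
empirical_rate = sum(losses) / len(losses)
print(empirical_rate, stationary_loss_rate(0.05, 0.45), mean_burst_length(0.45))
```

An encoder that knows these two transition probabilities can estimate how likely a given packet is to be lost, and how long a loss burst tends to last, when weighing coding modes in a rate-distortion decision.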
Wavelet/Subband Coding
Evaluation of color-embedded wavelet image compression techniques
Martha Saenz, Paul Salama, Ke Shen, et al.
Color embedded image compression is investigated by means of a set of core experiments that seek to evaluate the advantages of various color transformations, spatial orientation trees and the use of monochrome embedded coding schemes such as EZW and SPIHT. In order to take advantage of the interdependencies of the color components for a given color space, two new spatial orientation trees that relate frequency bands and color components are investigated.
Embedded and efficient low-complexity hierarchical image coder
We propose an embedded hierarchical image coding algorithm of low complexity. It exploits two fundamental characteristics of an image transform -- the well defined hierarchical structure, and energy clustering in frequency and in space. The image coding algorithm developed here, apart from being embedded and of low complexity, is very efficient and is comparable to the best known low-complexity image coding schemes available today.
Wavelet image coding using trellis-coded space-frequency quantization
Recent progress in wavelet image coding has brought the field to maturity. Major developments in the process are rate-distortion (R-D) based wavelet packet transformation, zerotree quantization, subband classification and trellis-coded quantization, and sophisticated context modeling in entropy coding. Drawing from past experience and recent insight, we propose a new wavelet image coding technique with trellis-coded space-frequency quantization (TCSFQ). TCSFQ aims to explore space-frequency characterizations of wavelet image representations via R-D optimized zerotree pruning, trellis-coded quantization, and context modeling in entropy coding. Experiments indicate that the TCSFQ coder achieves twice as much compression as the baseline JPEG coder does at the same peak signal-to-noise ratio (PSNR), making it better than all other coders described in the literature.
Activity-selective SPIHT coding
Marcia G. Ramos, Sheila S. Hemami
An embedded image coder that provides very effective progressive image transmission and a high degree of spatial scalability while achieving superior visual performance to SPIHT is presented. A new partitioning structure combined with scale-based wavelet weights achieves a significance reordering that codes more significant information belonging to the lower frequency bands earlier in the bitstream when compared to SPIHT. This reordering provides substantial improvements in progressive transmission, spatial scalability, and PSNR, especially at low bit rates. Further visual gains are achieved by exploiting human visual system (HVS) characteristics to weight wavelet coefficients according to their perceptual importance, causing a reordering of the coefficients. The activity selective SPIHT coder provides higher perceptual quality and higher PSNRs than SPIHT at low bit rates, and comparable visual quality with slightly lower PSNRs at high bit rates.
Progressive coding of medical volumetric data using three-dimensional integer wavelet packet transform
Zixiang Xiong, Xiaolin Wu, David Y. Yun, et al.
We examine progressive lossy to lossless compression of medical volumetric data using three-dimensional (3D) integer wavelet packet transforms and set partitioning in hierarchical trees (SPIHT). To achieve good lossy coding performance, we describe a 3D integer wavelet packet transform that allows implicit bit shifting of wavelet coefficients to approximate a 3D unitary transformation. We also address context modeling for efficient entropy coding within the SPIHT framework. Both lossy and lossless coding performances are better than those reported recently in reference one.
Regularized edge-preserving subband coding scheme
Sung Wai Hong, Paul Bao
In this paper, we introduce a new edge-preserving image compression technique based on the wavelet transform and an iterative constrained least squares regularization approach. This approach treats the reconstruction of an image from lossy compression as a process of image restoration. It utilizes the edge information detected from the source image as a priori knowledge for the subsequent reconstruction. In order to offset the overall bit rate incurred by the additional edge information, a simple vector quantization scheme is proposed to classify the edge bit-plane patterns into a number of binary codevectors. The experiment showed that the proposed approach could definitely improve both the objective and subjective quality of the reconstructed image by recovering more image details and edges.
Poster Presentations on Image Processing Applications
Image quality assessment for an Intel digital imaging chip set prototype
Lawrence A. Booth Jr., Phillip G. Austin, Caren Firsty, et al.
Image quality assessments for Intel's digital imaging chip set prototype are made using objective and subjective image quality assessment criteria. Objective criteria such as signal to noise ratio, linearity, color error, dynamic range, and resolution are used to provide quantitative metrics for engineering development. Subjective criteria such as mean observer scores derived from single stimulus and paired comparison adjectival ratings provide overall product image quality assessments that are used to determine product acceptability for product marketing analysis. These metrics, along with the subjective assessment, serve as development tools which allow the product development team to focus on the critical areas which improve the image quality of the product.
Online elimination of reflected images to generate high-quality images
Noboru Ohnishi, Masaki Iwase, Tsuyoshi Yamamura, et al.
We often see scenes where an object's image is reflected on window glass and overlaps with an image of another object behind the glass. This paper proposes on-line methods for eliminating images reflected specularly on a smooth surface such as glass and plastic. Our methods are based on the optical property that light reflected on glass is polarized, while light transmitted through glass is less polarized. It is possible to eliminate reflected light with a polarizing filter. The polarization direction, however, changes even for planar glass and is not easily determined without information about the position and orientation of the glass and objects relative to the camera. Our method uses a series of images obtained by rotating a polarizing filter placed in front of a camera. Reflected images are removed by selecting the minimum image intensity among the series of images for each pixel. We propose two methods for estimating the minimum image; one is a min-max method and the other is a parameter estimation method. We conducted experiments with real images and compared the performances of the two methods. As a result, we could generate high quality images without reflected images at a semi-video rate of 15 frames per second.
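The per-pixel selection step described above can be illustrated with a minimal sketch. It assumes the frames are already aligned grayscale images stored as nested lists; the function name `min_over_rotations` and the sample values are hypothetical, and the sketch takes a plain minimum rather than reproducing either of the two estimation methods named in the abstract.

```python
def min_over_rotations(frames):
    """Return the per-pixel minimum across equally sized grayscale frames.

    frames: list of 2D lists (rows of intensities), one per polarizer angle.
    """
    if not frames:
        raise ValueError("need at least one frame")
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [min(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]


# Hypothetical 2x2 frames captured at three polarizer angles.
frames = [
    [[120, 80], [200, 90]],
    [[100, 95], [180, 60]],
    [[110, 70], [210, 75]],
]
clean = min_over_rotations(frames)
print(clean)
```

Because the minimum is taken independently at each pixel, the polarizer angle that best suppresses the reflection may differ from pixel to pixel, which matches the observation that the polarization direction is not constant over the glass.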
Practical approach to the registration of multiple frames of video images
Ikram E. Abdou
Image registration deals with finding the geometric mapping, which, when applied to a region of interest (ROI) in one image, adjusts it such that it matches as closely as possible a corresponding ROI in a reference image. In this paper we address the application of image registration in the processing of multiple frames of video sequences. We discuss three levels of difficulty in the image registration problem: mathematical, statistical, and structural. Based on this analysis and the understanding of the requirements in our application, we select region-based methods for registering the video images, and describe these methods in detail. We extend the study to subpixel image registration and propose a new and more accurate subpixel registration method based on cross-spectrum interpolation. To demonstrate the performance of these methods, we apply them to the registration of real data.
Analysis and improvement of iterative image interpolation using asymmetric regularization
Jeong-Ho Shin, Junghoon Jung, Joon-Ki Paik
This paper presents an adaptive regularized image interpolation algorithm, which can restore high frequency details in the original high resolution image. In order to apply the regularization approach to the interpolation procedure, we first present a two-dimensional separable image degradation model for a low resolution imaging system. Based on the image degradation model, we obtain an interpolated image which minimizes both the residual between the high resolution and the interpolated images and the a priori constraints. In addition, by using spatially adaptive constraints and regularization parameters, directional high frequency components are preserved while noise is efficiently suppressed. We also analyze convergence of the proposed adaptive iterative algorithm. As a result, the step length of the adaptive algorithm should be less than that of the non-adaptive algorithm, and the ratio of the two quantities is proportional to the number of different constraints used in the adaptive algorithm. In the experimental results, interpolated images using the conventional algorithms are shown in order to compare the conventional algorithms with the proposed adaptive algorithm. Moreover, we provide experimental results which are classified into non-adaptive and adaptive algorithms. Based on the experimental results, the proposed algorithm provides a better interpolated image than the conventional non-adaptive interpolation algorithms in the sense of both subjective and objective criteria. More specifically, the proposed algorithm has the advantage of preserving directional high frequency components and suppressing undesirable artifacts such as noise.
Registration of synthetic aperture radar images using a multiresolution Radon transform
Timothy Myles Payne, Julian F.A. Magarey, Garry N. Newsam
This paper describes a new algorithm for registration of synthetic aperture radar (SAR) images. The algorithm is based on cross-correlation in the Radon transform domain, chosen principally because of the prevalence of line-like features in SAR images. The distributed nature of such features, and their persistence in the image independently of 'look angle,' make Radon-domain correlation appropriate for the peculiar challenge of SAR image registration. Furthermore, 2D cross-correlation in Radon space may be efficiently implemented as 1D convolution followed by backprojection. To handle local variations caused by terrain elevations and errors in global parameters, we use a coarse-to-fine matching strategy based on a novel multiresolution Radon transform pyramid. This may be efficiently constructed from an initial fine partition of the image into disjoint tiles, using alternating grouping and decimation steps. The whole algorithm is linear in the number of such tiles. Test results demonstrate that the new algorithm performs comparably to pixel-similarity-based registration when the look angle is the same, and much better for pairs with different look angles.
Polynomial methods for SfM assuming central projection
Sahar M. Ghanem, Mohammed A. Ismail, Soheir A. Bassiony
We are concerned with determining the three-dimensional structure and motion of objects in space from images, known as the structure from motion (SfM) problem. We focus on a two-frame feature-based SfM problem. The reformulation of the problem using central projection as the projection model is demonstrated. A new capability of estimating focal length is introduced in addition to structure and motion parameters, which allows for processing images from an uncalibrated camera. The use of polynomial systems of equations to formulate the problem is explained. The formulation of polynomial systems using the central projection model adds a restriction that the depth of at least one of the feature points must be known. Thus it is assured that we solve for the true structure and translation (not merely up to a scale factor).
New shift, scaling, and derivative properties for the DCT
Roger Reeves
The DCT is used in image and video compression standards JPEG, MPEG and H.261. A set of properties for shifting and scaling by fractional amounts, and taking linear operations such as differentiation and integration is described. The properties take as input the DCT coefficients of a sampled signal, subject them to a linear transform, and return the DCT coefficients of a sampled signal which has been subjected to the corresponding operation. The properties are derived by considering the inverse discrete transform as a sum of continuous basis functions. Mathematically, the properties are equivalent to taking the inverse transform of the DCT coefficients, reconstructing the continuous signal using an infinite sum of sinc functions, performing the desired operation (shift, scale, differentiate, integrate) on the reconstructed signal, resampling the result, and then taking the DCT of the resulting samples. It is proved that such an approach is valid for the type 2 DCT, a 2D version of which is used in JPEG, MPEG and H.261. A consequence of this method is that the original signal is assumed to be symmetrically extended and periodically repeated with period 2N, where N is the size of the DCT. Operations which result in points outside the DCT window in the reconstructed signal will return a point on the symmetric extension of the signal. In most cases this will result in an error being introduced because no actual information exists on what the value of the signal should be except within the DCT window. This approach has an exact analog in the signal domain, and can also be incorporated into the reverse transform. The techniques may prove useful in compressed domain processing applications, and are interesting because they allow operations from the continuous domain such as integration and differentiation to be interpreted in the discrete domain, using the sampling theorem.
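The idea of treating the inverse transform as a sum of continuous basis functions can be sketched for a 1D shift. The code below is an illustration only: it assumes an unnormalized type 2 DCT, goes through the sample domain instead of building the equivalent coefficient-domain matrix, and the names `dct2`, `evaluate`, and `shifted_dct` are hypothetical.

```python
import math


def dct2(x):
    """Unnormalized type 2 DCT: X[k] = sum_n x[n] * cos(pi * k * (n + 0.5) / N)."""
    size = len(x)
    return [
        sum(x[n] * math.cos(math.pi * k * (n + 0.5) / size) for n in range(size))
        for k in range(size)
    ]


def evaluate(coeffs, t):
    """Evaluate the inverse transform at a real-valued position t."""
    size = len(coeffs)
    total = coeffs[0] / size
    for k in range(1, size):
        total += (2.0 / size) * coeffs[k] * math.cos(math.pi * k * (t + 0.5) / size)
    return total


def shifted_dct(coeffs, shift):
    """DCT coefficients of the signal resampled at positions n + shift."""
    size = len(coeffs)
    return dct2([evaluate(coeffs, n + shift) for n in range(size)])


signal = [1.0, 4.0, 2.0, 8.0]
coeffs = dct2(signal)
half_sample_shift = shifted_dct(coeffs, 0.5)
```

Evaluating outside the window shows the symmetric extension mentioned in the abstract: in this sketch `evaluate(coeffs, -1)` equals the first sample and `evaluate(coeffs, 4)` equals the last one.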
Poster Presentations on Motion and Object-Based Coding
Packed binary representations for fast motion estimation on general-purpose architectures
Sriram Sethuraman, Ravi Krishnamurthy
Reduced representations have been used to decrease the memory bandwidth requirements of fast motion estimation schemes. Usually, this is achieved on special-purpose architectures that exploit the reduced representations to do several distortion calculations in parallel. In this paper, we present a generic fast implementation that is suitable for various general-purpose architectures. The algorithm uses a novel data structure that is based on packing and 'overlapping' the reduced representation data into the native word size of the processor. Efficient motion estimation schemes to minimize the memory bandwidth between the processor and cache by exploiting this data structure are developed. These schemes can be tailored with ease to suit different general-purpose processors and media processors.
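A rough sketch of why packed binary data suits general-purpose processors: once each pixel is reduced to one bit and a row is packed into a word, a whole row of distortions is one XOR followed by a bit count. The one-bit-per-pixel reduction, the use of Python integers as words, and the names `pack_row`, `window`, and `binary_distortion` are assumptions for illustration; the paper's actual data layout is not given in the abstract.

```python
def pack_row(bits):
    """Pack a sequence of 0/1 values into one integer, first bit most significant."""
    word = 0
    for bit in bits:
        word = (word << 1) | (bit & 1)
    return word


def window(word, row_width, offset, block_width):
    """Extract block_width bits starting at pixel offset from a packed row."""
    shift = row_width - offset - block_width
    return (word >> shift) & ((1 << block_width) - 1)


def binary_distortion(cur_rows, ref_rows, row_width, offset, block_width):
    """Count mismatching bits between a current block and a reference window."""
    total = 0
    for cur, ref in zip(cur_rows, ref_rows):
        total += bin(cur ^ window(ref, row_width, offset, block_width)).count("1")
    return total


# Hypothetical 8-pixel-wide reference rows and a 4-pixel-wide current block.
ref_rows = [pack_row([0, 1, 1, 0, 1, 0, 0, 1]), pack_row([1, 1, 0, 0, 0, 1, 1, 0])]
cur_rows = [pack_row([1, 0, 1, 0]), pack_row([0, 0, 0, 1])]
costs = [binary_distortion(cur_rows, ref_rows, 8, off, 4) for off in range(5)]
best_offset = costs.index(min(costs))
print(costs, best_offset)
```

Sliding the search window is a shift of the same packed word, so neighboring candidates reuse data that is already in a register.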
Fast full-search block matching based on combined SAD and MSE measures
Michael Bruenig, Wolfgang Niehsen
A new fast block matching algorithm is presented. The sum of absolute differences (SAD) and the mean square error (MSE) are used to find a suitable motion vector. A lower bound for both error measures is exploited to reduce the number of search positions and therefore the computational requirements. The error measures for the remaining search positions are calculated simultaneously so that the computational load for these calculations only slightly increases. The algorithm is compared to a fast full search block matching algorithm based on the same concept but only using the SAD or the MSE as the matching criterion. It is shown that the algorithm using both error measures combines the advantages of the algorithms that use only the SAD or only the MSE.
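How a lower bound removes search positions without changing the full-search result can be sketched as follows. The abstract does not state which bound the authors use; this sketch assumes the triangle-inequality bound |sum(block) - sum(candidate)| <= SAD, handles only the SAD, and uses hypothetical names and flattened blocks.

```python
def sad(a, b):
    """Sum of absolute differences between two flattened blocks."""
    return sum(abs(p - q) for p, q in zip(a, b))


def full_search(block, candidates):
    """Return (best_index, best_sad, evaluated) over flattened candidate blocks."""
    block_sum = sum(block)
    best_index, best_sad, evaluated = None, float("inf"), 0
    for index, cand in enumerate(candidates):
        # Assumed bound: |sum(block) - sum(cand)| <= SAD(block, cand),
        # so a candidate whose bound is not below the best SAD cannot win.
        if abs(block_sum - sum(cand)) >= best_sad:
            continue
        evaluated += 1
        cost = sad(block, cand)
        if cost < best_sad:
            best_index, best_sad = index, cost
    return best_index, best_sad, evaluated


block = [10, 12, 11, 9]
candidates = [[10, 12, 11, 10], [50, 52, 51, 49], [10, 12, 11, 9], [0, 0, 0, 0]]
result = full_search(block, candidates)
print(result)
```

In a real encoder the candidate sums would be precomputed for the whole search area, so the bound costs far less than the SAD it avoids.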
Object-based wavelet compression using coefficient selection
Lifeng Zhao, Ashraf Ali Kassim
In this paper, we present a novel approach to code image regions of arbitrary shapes. The proposed algorithm combines a coefficient selection scheme with traditional wavelet compression for coding arbitrary regions and uses a shape adaptive embedded zerotree wavelet coding (SA-EZW) to quantize the selected coefficients. Since the shape information is implicitly encoded by the SA-EZW, our decoder can reconstruct the arbitrary region without separate shape coding. This makes the algorithm simple to implement and avoids the problem of contour coding. Our algorithm also provides a sufficient framework to address content-based scalability and improved coding efficiency as described by MPEG-4.
Video object analysis for content-based video coding
Yun He, Cheng Du, Tao Xie
Video object analysis is a key technique for content-based coding. Based on different video object descriptions, different coding schemes are followed to obtain the optimum coding performance. To further explore the functionality of content-based coding, existing methods for video object analysis are reviewed and compared. A concept of layered video object analysis is suggested and its relation to layered coding is addressed in this paper.
Estimation of feature parameters in model-based image coding system for video conference
Lu Yu, Yunhai Liu, Qingdong Yao
Model-based image coding is a well-known solution for image communication at very low bit-rate. However, very complex techniques and a large amount of computation are involved in these systems. It is especially difficult to automatically extract Facial Definition Parameters (FDPs) and Facial Animation Parameters (FAPs), which are defined in MPEG-4, from 2D images to represent 3D moving objects. In this paper, an algorithm using intra- and inter-frame information to estimate feature parameters is proposed. It utilizes spatial information (edge information) as well as the temporal difference between successive frames. The combined use of these two kinds of information makes the system more robust. Physiological symmetry and proportion is another kind of knowledge used here to make the system less computationally intensive.
Facial expression recognition and model-based regeneration for distance teaching
Liyanage C. De Silva, V. V. Vinod, Kuntal Sengupta
This paper presents a novel idea for a visual communication system, which can support distance teaching using a network of computers. Here the authors' main focus is to enhance the quality of distance teaching by reducing the barrier between the teacher and the student, which is formed due to the remote connection of the networked participants. The paper presents an effective way of improving the teacher-student communication link of an IT (Information Technology) based distance teaching scenario, using facial expression recognition results and face global and local motion detection results of both the teacher and the student. It presents a way of regenerating the facial images for the teacher-student down-link, which can enhance the teacher's facial expressions and which also can reduce the network traffic compared to usual video broadcasting scenarios. At the same time, it presents a way of representing a large volume of facial expression data of the whole student population (in the student-teacher up-link). This up-link representation helps the teacher to receive instant feedback on his talk, as if he were delivering a face to face lecture. In conventional video tele-conferencing types of applications, this task is nearly impossible, due to the huge volume of upward network traffic. The authors utilize several of their previously published results for most of the image processing components that need to be investigated to complete such a system. In addition, some of the remaining system components are covered by several ongoing works.
Efficient parallel algorithm for hierarchical block-matching motion estimation
Charalampos Konstantopoulos, Andreas I. Svolos, Christos Kaklamamis
Motion estimation is an integral part of most of the video coding schemes that have been proposed in the literature. It is also the most computationally intensive part in these schemes and thus is usually implemented on high performance parallel architectures. In this paper, we deal with a multiresolution (hierarchical) block matching motion estimation algorithm. Specifically, we parallelize this algorithm on a hypercube based multiprocessor. As this algorithm presents a non regular data flow, it could not be easily implemented on systolic arrays. In contrast, the use of such an advanced network as the hypercube overcomes the problem of the non regular data flow, thereby providing high performance. Another important point in our study is that our multiprocessor is assumed to be fine grained, unlike most of the multiprocessors that have been proposed for video coding schemes. The constraint of limited local memory in each processor leads to frequent interprocessor communication and thus the employed techniques should be carefully selected in order to lower the communication overhead. Coarse grained architectures do not have this kind of problem because each processor can take most of the data it will need throughout the algorithm execution from the beginning. This greatly reduces the communication overhead, and thus the algorithm design is rather straightforward in this case.
Video Coding
Very low bit-rate vector-quantized video codecs
Lee David Scargall, Satnam Singh Dlay
In this paper, an image sequence coding scheme for very low bit-rate video coding is presented. We examine the performance of various codebooks to remove the spatial redundancy within the difference frame. When the codec is configured to operate at 10.1 kbit/s, average PSNR values in excess of 32.86 dB and 25.6 dB are achieved for the 'Miss America' and 'Carphone' sequences respectively. We also present a new methodology for adaptive vector quantization (AVQ), where the codebook is updated with new vectors. The new vectors replace less significant ones in the codebook based on a novel scoring criterion that utilizes a forgetting factor and codebook half-life. The proposed method gives rise to an additional performance enhancement of around 1 dB over conventional techniques of AVQ. Furthermore, the methods do not suffer from blocking effects due to the inherent properties of both the temporal and spatial coding.
Bitplane coding of DCT coefficients for image and video compression
Fan Ling, Weiping Li, Hongqiao Sun
In the current image and video coding standards, such as MPEG-1, MPEG-2, MPEG-4, H.261, H.263, and JPEG, quantized DCT coefficients are entropy coded using a so-called run_value coding technique. A problem with this technique is that the statistics of the run_value symbols are highly dependent on the quantization step size and the dynamic range of the DCT coefficients. Therefore, a single fixed entropy coding table cannot achieve the optimal coding efficiency for all possible quantization step sizes and all possible dynamic ranges of the DCT coefficients. Bitplane coding of the DCT coefficients is a new coding scheme that overcomes this problem. It provides a better performance than run_value coding under all conditions.
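The bitplane view of quantized coefficients can be made concrete with a small sketch. It only shows the decomposition of magnitudes into planes (most significant first) plus a sign list; how the planes are entropy coded is not described in the abstract and is not attempted here. The function names and sample coefficients are hypothetical.

```python
def to_bitplanes(coefficients):
    """Split absolute values into bit-planes (most significant first) plus signs."""
    magnitudes = [abs(c) for c in coefficients]
    n_planes = max(magnitudes, default=0).bit_length()
    planes = [
        [(m >> p) & 1 for m in magnitudes]
        for p in range(n_planes - 1, -1, -1)
    ]
    signs = [1 if c < 0 else 0 for c in coefficients]
    return planes, signs


def from_bitplanes(planes, signs):
    """Rebuild the coefficients from the planes received so far and the signs."""
    magnitudes = [0] * len(signs)
    for plane in planes:
        magnitudes = [(m << 1) | bit for m, bit in zip(magnitudes, plane)]
    return [-m if s else m for m, s in zip(magnitudes, signs)]


# Hypothetical quantized DCT coefficients in scan order.
quantized = [10, 0, -3, 1, 0, 0, 0, 0]
planes, signs = to_bitplanes(quantized)
restored = from_bitplanes(planes, signs)
print(len(planes), planes[0], restored)
```

The number of planes adapts to the largest magnitude in the block, which is one way to see why this representation is less tied to a particular quantization step size or dynamic range than a fixed table of run_value symbols.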
Rate control for non-real-time video encoding
IMing Pao, Ming-Ting Sun
In streaming video applications, video sequences are encoded off-line and stored in a server. Users may access the server over a constant bit-rate channel such as Public Switched Telephone Network (PSTN) or Integrated Service Digital Network (ISDN). Examples of the streaming video are video on demand, archived video news, and non-interactive distance learning. Before the playback, part of the video bit-stream is pre-loaded in the decoder buffer to ensure that every frame can be decoded at the scheduled time. For these streaming video applications, since the delay (latency) is not a critical issue and the whole video sequence is available to the encoder, a more sophisticated bit-allocation scheme can be used to achieve better video quality. During the encoding process for streaming video, two constraints need to be considered: the maximum pre-loading time that the video viewers are willing to accept and the physical buffer-size at the receiver (decoder) side. In this paper, we propose a rate-control scheme that uses statistical information of the whole video sequence as a guidance to generate better video quality for video streaming involving constant bit-rate channels. Simulation results show video quality improvements over the regular H.263 TMN8 encoder.
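The two constraints named above (pre-loading time and decoder buffer size) can be checked for a given bit allocation with a simple buffer simulation. This is a sketch under simplifying assumptions: the channel delivers bits at a constant rate for the whole session, each frame is removed instantaneously at its decode time, and the end of transmission is ignored. The function name and all numbers are hypothetical.

```python
def check_decoder_buffer(frame_bits, channel_rate, frame_rate, preload_seconds, buffer_bits):
    """Simulate decoder buffer fullness for a constant bit-rate channel.

    Returns (ok, trace); trace holds the fullness just before each frame is decoded.
    """
    bits_per_interval = channel_rate / frame_rate
    fullness = channel_rate * preload_seconds
    trace = []
    for size in frame_bits:
        trace.append(fullness)
        if fullness > buffer_bits:
            return False, trace  # overflow: more bits arrived than the buffer holds
        if size > fullness:
            return False, trace  # underflow: the frame has not fully arrived yet
        fullness = fullness - size + bits_per_interval
    return True, trace


# Hypothetical allocation: 64 kbit/s channel, 10 frames/s, 64 kbit buffer.
frame_bits = [20000, 4000, 4000, 30000]
short_preload = check_decoder_buffer(frame_bits, 64000, 10, 0.5, 64000)
long_preload = check_decoder_buffer(frame_bits, 64000, 10, 1.0, 64000)
print(short_preload[0], long_preload[0])
```

In this example the large fourth frame underflows the buffer with a 0.5 s preload but decodes on time with a 1.0 s preload, which is the kind of trade-off an off-line bit allocation can plan around.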
Rate-constrained video coding using a flexible representation of motion
Daniel Lauzon, Eric Dubois
This paper describes a technique for representing motion information in a video coder. We present a novel way of representing motion, based on a dictionary of motion models, as well as related estimation techniques. Motion fields are represented by low-order polynomial-based models and a discrete label field. We develop an adaptive context-based entropy coding technique for the label field. In the paper, we address issues relating to rate-distortion optimal coding. Simulations based on a software implementation of the technique are compared to similar results for classical block-based motion compensation and coding techniques.
Adaptive scalar quantization for perceptual coding in the wavelet domain
Sam J. Liu
A generic transform/wavelet compression system is illustrated in Figure 1. In this system, a transform such as the DCT or a wavelet decomposition is first applied to the image. The resulting transform/wavelet coefficients are then quantized and coded to produce the compressed bitstream. The goal of perceptual image coding is not to achieve the best SNR but to produce an image with the best visual fidelity at the target bitrate. To accomplish this goal, the quantizer needs to be both frequency and spatially adaptive. As an example of frequency adaptation, DCT based compression systems such as JPEG use a frequency weighted table, Q-table, to adjust the relative value of the quantization stepsizes. The weights typically increase with respect to the frequency to better match the noise masking property of the human visual system (HVS). Similarly, perceptual weighting has also been successfully applied to wavelet based coders, where the frequency bands are weighted based on a wavelet Q-table. As an example of spatial adaptation, JPEG-Part3 and MPEG allow the Q-table to be scaled by a factor on a block by block basis. Spatial adaptation by scale factor modulation plays two important roles. First it allows the encoder to adaptively quantize each block based on the local spatial characteristics such as edge, texture, flat region, etc. Second it allows the encoder to achieve the target bitrate in a single pass, which is a very desirable feature in a practical coder. In light of this, the wavelet Q-table should also be allowed to change spatially in order to better reflect the HVS's response to various coefficient types, edge, texture, etc. This paper describes how spatial and frequency adaptive scalar quantization can be used in a wavelet coding framework, both embedded and non-embedded, to achieve perceptual coding and single-pass rate-control. The proposed quantization strategy can be easily supported syntactically, yet it offers a powerful tool to optimize wavelet coders.
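The combination of a frequency-dependent Q-table with a spatial scale factor can be sketched as a uniform scalar quantizer whose step is the product of the two. The band labels, table values, and function names below are hypothetical, and the sketch does not reproduce the paper's actual quantizer design.

```python
def quantize(coefficients, band_of, q_table, scale):
    """Uniform scalar quantization with step = q_table[band] * scale."""
    return [
        int(round(c / (q_table[band_of[i]] * scale)))
        for i, c in enumerate(coefficients)
    ]


def dequantize(indices, band_of, q_table, scale):
    """Reconstruct coefficient values from quantization indices."""
    return [q * q_table[band_of[i]] * scale for i, q in enumerate(indices)]


# Hypothetical wavelet Q-table: coarser steps for higher-frequency bands.
q_table = {"LL": 4.0, "HL": 8.0, "HH": 16.0}
band_of = ["LL", "HL", "HH"]
coefficients = [101.0, -37.0, 20.0]

fine = quantize(coefficients, band_of, q_table, scale=1.0)
coarse = quantize(coefficients, band_of, q_table, scale=2.0)
print(fine, coarse, dequantize(coarse, band_of, q_table, 2.0))
```

Raising the scale factor for a region coarsens every band there at once, which is how a single per-region parameter can serve both perceptual adaptation and rate control.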
Invertible three-dimensional analysis/synthesis system for video coding with half-pixel-accurate motion compensation
Three-dimensional subband coding with motion compensation (MC-3DSBC) has been demonstrated [1-4] to be an efficient technique for video coding applications. With half-pixel-accurate motion compensation, images need to be interpolated for motion-compensated (MC) temporal filtering. The resulting analysis/synthesis system is not invertible. In this paper, we propose a new three-dimensional analysis/synthesis system which guarantees perfect reconstruction and has a nonrecursive coding structure. We replaced the analysis/synthesis system of [1] by the new scheme. The resulting coding system does not have distortion from the analysis/synthesis system and allocates bits among classes of 3-D subbands optimally in the sense of the rate-distortion function. The experimental results show that the proposed video coding system improves on [1] by 0.3 - 2.0 dB in PSNR and on TM5 MPEG [10] by 2.1 - 3.0 dB in PSNR over a range of bit rates.
Dense motion field and uncovered region estimations for low-bit-rate video coding application
Keng-Pang Lim, Man Nang Chong, Amitabha Das
This paper presents a novel video-coding framework that uses a dense motion field for efficient motion-compensated video coding. A new stochastic technique that is robust in overcoming the problem of occlusion is first used to estimate the dense motion field from the past-reconstructed frames. Using the continuum of motion in our proposed framework, the current dense motion field is predicted by projecting the estimated motion field. No motion information needs to be coded since the decoder re-generates this motion information from its reconstructed frames. By making use of the predicted motion field, a novel uncovered background prediction is proposed to further improve the forward motion-compensated prediction. The algorithm is tested extensively on a number of standard video-conferencing sequences. With the same Peak Signal to Noise Ratio (PSNR) performance, an average compression gain of 17% is achieved as compared with the compression ratio of the H.263 algorithm that uses the half-pel overlapped block-matching motion-compensated prediction.
Toward a fair comparison of the coding efficiency of interlaced and progressive video
Although comparisons of the effectiveness of MPEG-2 coding on interlaced and progressive sources have been reported in the literature, we think some very important aspects are missing in the research so far. Particularly, the differences in resulting blocking artifacts are neglected, while usually only scenes with abundant vertical detail are evaluated. From our experiments, we conclude that the general opinion concerning the effectiveness of MPEG-2 coding on interlaced picture material is likely biased by the focus on challenging sequences only, while the omission of blockiness metrics in the evaluation significantly increases this bias further.
Stereo and 3D Imaging
Modified overlapped block disparity compensation for stereo image coding
Woontack Woo, Antonio Ortega
In this paper, we propose a modified overlapped block matching (OBM) scheme for stereo image coding. The OBM scheme has been introduced in video coding, as a promising way to reduce blocking artifacts by using multiple vectors for a block, while maintaining the advantages of the fixed size block matching framework. However, OBM has its own limitations, even though it overcomes some drawbacks of block matching schemes. For example, to estimate an optimal displacement vector (DV) field, OBM requires complicated iterations. In addition, OBM does not always guarantee a consistent DV field, even through several iterations, because the estimation considers only the magnitude of the prediction error as a measure. Therefore, we propose a modified OBM scheme, which allows both consistent disparity estimation and efficient disparity compensation, without several iterations. In the proposed scheme, the computational burden resulting from iterations is reduced using 'open-loop' coding, which decouples the encoding into estimation and compensation. The consistent disparity estimation is performed by using a causal MRF model and a half-pixel search, while maintaining (or reducing) the energy level of disparity compensated difference frame. The compensation efficiency is improved by interpolating the reference image in half pixel accuracy and by applying OBM in part. To prove the efficiency of the proposed OBM scheme, we provide some experimental results, which show that the proposed scheme achieves higher PSNR, about 0.5 - 1 dB, as well as better perceptual quality, at a fraction of the computation, as compared to a conventional OBM.
Stereo correspondence using geometric relational matching
Sekhavat Sharghi, Farhad A. Kamangar
A new geometric relational matching approach is proposed to solve the stereo correspondence problem. First, distinct features are extracted in a pair of stereo images using a feature extractor. Then a newly developed window-based feature point detector is used to detect feature points from the extracted features in both images. Pairs of feature points are connected to form straight lines in both images. A match function representing the requirements of the epipolar and disparity constraints in both images is proposed for straight line matching. Important information can be obtained from the parameter values attached to each line, such as distance and orientation. Information contained in the match function is used to determine straight-line correspondence. The method described here takes a unique approach to match straight lines. After that, straight-line correspondence is established using the match function values in the left image and corresponding ones in the right image. Triplets of matched points are used to construct a model polygon in the left image. Then the entire right image is searched by an exhaustive search method to find a matching polygon. The computational complexity of the proposed method is proportional to the number of detected feature points in the image pair. Experimental results indicate that the method performs well for a variety of stereo images, and it is suitable for many applications.
Adaptive optimal quantization for 3D mesh representation in the spherical coordinate system
Jeong-Hwan Ahn, Yo-Sung Ho
In recent days, applications using 3D models are increasing. Since the 3D model contains a huge amount of information, compression of the 3D model data is necessary for efficient storage or transmission. In this paper, we propose an adaptive encoding scheme to compress the geometry information of the 3D model. Using the Levinson-Durbin algorithm, the encoder first predicts vertex positions along a vertex spanning tree. After each prediction error is normalized, the prediction error vector of each vertex point is represented in the spherical coordinate system (r, θ, φ). Each r is then quantized by an optimal uniform quantizer. Each pair (θ, φ) is then successively encoded by partitioning the surface of the sphere according to the quantized value of r. The proposed scheme demonstrates improved coding efficiency by exploiting the statistical properties of r and (θ, φ).
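The representation step can be illustrated with a short sketch that converts a prediction error vector to spherical coordinates and uniformly quantizes the radius. The angle convention (θ as polar angle from the z axis, φ as azimuth), the step size, and the function names are assumptions; the sphere-surface partitioning for (θ, φ) and the choice of the optimal step are not reproduced.

```python
import math


def to_spherical(x, y, z):
    """Convert Cartesian (x, y, z) to (r, theta, phi); theta is the polar angle."""
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(z / r) if r > 0 else 0.0
    phi = math.atan2(y, x)
    return r, theta, phi


def from_spherical(r, theta, phi):
    """Convert (r, theta, phi) back to Cartesian coordinates."""
    return (
        r * math.sin(theta) * math.cos(phi),
        r * math.sin(theta) * math.sin(phi),
        r * math.cos(theta),
    )


def quantize_uniform(value, step):
    """Round a nonnegative value to the nearest multiple of step."""
    index = math.floor(value / step + 0.5)
    return index, index * step


# Hypothetical normalized prediction error vector.
error = (0.1, 0.2, 0.2)
r, theta, phi = to_spherical(*error)
index, r_hat = quantize_uniform(r, 0.1)
approx = from_spherical(r_hat, theta, phi)
print(index, r_hat, approx)
```

Separating the magnitude r from the direction (θ, φ) lets the angular resolution depend on the quantized radius, which is the dependency the abstract exploits.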
Low-bit-rate representation of cylindrical volume grids using Chebyshev bases: direct section computation, synthesis, and reconstruction
Ranjit P. Desai, Jai P. Menon
A large class of high-speed visualization applications use image acquisition and 3D volume reconstruction techniques in cylindrical sampling grids; these include real-time 3D medical reconstruction, and reverse engineering. This paper presents the novel use of Chebyshev bases in such cylindrical grid-based volume applications, to allow efficient computation of cross-sectional planes of interest and partial volumes without the computationally expensive step of volume rendering, for subsequent transmission in constrained bitrate environments. This has important consequences for low-bitrate applications such as video-conferencing and internet-based visualization environments, where interaction and fusion between independently sampled heterogeneous data streams (images, video and 3D volumes) from multiple sources is beginning to play an important part. Volumes often embody widely varying physical signals such as those acquired by X-rays and ultrasound sensors, in addition to standard CCD cameras. Several benefits of Chebyshev expansions such as fast convergence, bounded error, computational efficiency, and their optimality for cylindrical grids are taken into account. In addition, our method exploits knowledge about the sampling strategy (e.g. position and trajectory of the sensor) used to acquire the original ensemble of images, which in turn makes the overall approach very amenable to internet-based low-bitrate applications.
Wavelet-based progressive view morphing
Dan Xu, Paul Bao
This paper presents a new view synthesis technique using morphing and the 2-D discrete wavelet transform. The method relies entirely on pairs of known images, without camera calibration or depth information. First, we estimate the fundamental matrix relating any pair of images. Second, using the fundamental matrix, any pair of image planes can be rectified to be parallel so that corresponding points lie on the same scanline. This makes it possible to generate new views by linear interpolation. Third, the pre-warped images are decomposed into a hierarchical structure with the wavelet transform. Corresponding coefficients of the two decomposed images are then linearly interpolated to form the multiresolution representation of an intermediate view. Any quantization technique can be embedded here to compress the coefficients in depth. The compressed format is very suitable for storage and communication. Fourth, for display, the compressed images are decoded and an inverse wavelet transform is applied. Finally, we use a post-warping procedure to transform the interpolated view to its desired position. A nice feature of the wavelet transform is its multiresolution representation, which allows generated views to be refined progressively and hence makes the method suitable for communication.
Real-time video-based rendering for augmented spatial communication
Takeshi Naemura, Hiroshi Harashima
In the field of 3-D image communication and virtual reality, it is very important to establish a method of displaying arbitrary views of a 3-D scene. 3-D geometric models of scene objects are certainly very useful for this purpose, since computer graphics techniques can synthesize arbitrary views of the models. It is, however, not so easy to obtain models of objects in the physical world. In order to avoid this problem, a new technique, called image-based rendering, has been proposed for interpolating between views by warping input images, using depth information or correspondences between multiple images. To date, most of the work on this new technique has concentrated on static scenes or objects. In order to cope with 3-D scenes in motion, we must establish ways of processing multiple video sequences in real time and of constructing an accurate camera array system. In this paper, the authors propose a real-time method of rendering arbitrary views of 3-D scenes in motion. The proposed method realizes a sixteen-camera array system with software adjustment support and a video-based rendering system. According to the observer's viewpoint, appropriate views of 3-D scenes are synthesized in real time. Experimental results show the potential applicability of the proposed method to augmented spatial communication systems.
Poster Presentations on Image Recovery
Efficient restoration and MPEG-2 encoding of old video archives
Parimal Aswani, Man Nang Chong
Most current implementations of MPEG-2 encoders have little provision for removing noise from corrupted movie sequences. The presence of noise in video sequences decreases the encoding efficiency and results in poorly reconstructed/decoded images. This paper presents an advanced MPEG-2 encoder that has a built-in provision for noise removal. The proposed MPEG-2 encoder can successfully remove most of the common types of artifacts found in old video archives, such as dirt, sparkle, and scratches, and store the video as an MPEG-2 compressed bit-stream. The video restoration algorithm used is a Gaussian weighted, bi-directional 3D autoregressive model. Since both the video restoration and MPEG-2 encoding algorithms use bi-directional motion vectors in image sequences to align moving objects, significant speedup can be achieved if these two algorithms are integrated in such a way that the computation of the motion vectors is minimized. In this paper, we propose an efficient method of developing such an MPEG-2 encoder with a built-in capability of restoring old video archives. The proposed advanced MPEG-2 encoder is fully implemented on an UltraSPARC workstation.
Robust image compression with packetization: the JPEG-2000 case
Iole Moccagatta, Osama K. Al-Shaykh, Homer Chen
Multimedia applications running over wireless or other error-prone transmission media require compression algorithms that are resilient to channel degradation. This paper presents a data packetization approach to make the emerging ISO JPEG-2000 image compression standard resilient to transmission errors. The proposed technique can be easily extended to other wavelet-based image codec schemes. Extensive simulation results show that, with the proposed approach, a decoder is able to recover up to 8.5 dB in PSNR with a minimum overhead, and without affecting coding efficiency and spatial/quality scalability. Finally, the proposed approach supports unequal error protection of the wavelet subbands.
Robust transform image coder over noisy channel
Chi-Hsi Su, Hsueh-Ming Hang, Che-Ho Wei
In this paper, we propose a robust quantizer design for image coding. Because the bits representing the reconstruction levels are transmitted directly to the channel, the proposed quantizer can be viewed as a compound of a quantizer, a VLC coder, and a channel coder. The conventional combined source/channel design produces a source coder designed for a channel with a specific channel noise. Our proposed quantizer is designed within a noise range. In comparison with the ordinary JPEG coder, simulation results show that our proposed scheme has a much more graceful distortion behavior within the designed noise range.
Motion vector recovery for error concealment
Jae-Won Suh, Eung-Tae Kim, Seung-Jong Choi, et al.
This paper describes an error concealment algorithm to reduce the effect of channel errors in the bitstreams generated by motion compensated video coding algorithms such as MPEG. When channel errors are introduced during transmission and cannot be corrected properly, we can apply an error concealment technique to repair damaged portions of the picture by exploiting the spatial and temporal redundancies in the received and reconstructed video signal. In motion compensated video coding, if some bits are lost or received with errors, not only the current picture will be corrupted, but also errors will propagate to succeeding frames. In this paper, we analyze the effect of channel errors in MPEG-2 bitstreams, and propose an idea for recovering lost or erroneously received motion vectors. Extended luminance intensity value of the lost block is used for motion estimation at the decoder side. Simulation results show that the proposed algorithm achieves good performance in PSNR and provides good subjective image quality.
Fast image restoration for reducing blocking artifacts
Sang Kwang Lee, Tae Yong Kim, Joon-Ki Paik, et al.
DCT-based coding techniques for image data compression are widely used owing to good performance with moderate hardware complexity. In very low bit rate applications, however, block-based image compression techniques usually exhibit significant quality degradation, known as the blocking artifact. In this paper, we propose an adaptive fast image restoration method that is suitable for reducing the blocking artifact. The proposed restoration filter is based on the observation that the quantization operation is a nonlinear and many-to-one mapping operator. We have developed an approximated version of the constrained optimization technique for image restoration by removing the nonlinear and space-varying degradation operator. The proposed method can be used as a post-processor at the decoder of video coding systems for digital TV, video on demand (VOD), or digital versatile disc (DVD) applications.
Object-based analysis of motion blur and its removal by considering occluded boundaries
Yoo Chan Choung, Jeong-Ho Shin, Joon-Ki Paik
An image frame in image sequences, in general, suffers from degradation due to spatially varying motions. In this paper, we propose a new image degradation model for space-variant motion blur and a spatially adaptive image restoration algorithm to remove such motion blur. For the proposed image degradation model, we mathematically analyze boundary effect which arises on the border of two image segments with different motions. We extend the already proposed model for a moving object in an arbitrary direction. In order to represent the point spread function (PSF) for motion blur in an arbitrary direction, we develop a method which distributes energy of samples in the PSF into the neighboring integer grids. In order to remove the above mentioned motion blur, we propose an object-based adaptive regularized image restoration algorithm. Both in synthetically and naturally motion blurred images, the proposed image restoration algorithm gives acceptable performance in removing motion blur and, as a result, in restoring important features, such as numbers and characters, which cannot be recognized in the input blurred image.
Classification-based adaptive regularization for fast deblocking
Tae Yong Kim, Sang Kwang Lee, Joon-Ki Paik, et al.
In this paper we propose an iterative image restoration method using block edge classification for reducing block artifacts in compressed images. In order to efficiently reduce block artifacts, each block is classified as an edge or non-edge block, and an adaptive regularized iterative restoration method is used. The proposed restoration method is based on the observation that the quantization operation in a series of coding processes is a nonlinear and many-to-one mapping operator. We then propose an adaptive iterative image restoration algorithm for removing the nonlinear and space-varying degradation. With some minor modifications the proposed image restoration method can be used for postprocessing reconstructed image sequences in HDTV, DVD, or video conferencing systems.
Poster Presentations on Implementations
Efficient modeling architecture for real-time content-based arithmetic coding
Hao-Chieh Chang, Liang-Gee Chen
In this paper, we describe an efficient modeling architecture for content-based arithmetic coding of bi-level images. The architecture uses a delay-line buffer to maintain the input pixels such that the pixels in the buffer can be efficiently re-used. In addition, the delay-line buffer can be easily extended and reconfigured for constructing various 'contents.' The experimental results show that the bottleneck resulting from constructing content for each pixel in a bi-level image can be overcome by the proposed architecture.
Object tracking and creation of linking information for distributed movie-based web-browsing system
Atsunobu Hiraiwa, Keisuke Fuse, Naohisa Komatsu, et al.
This paper proposes a new approach to automatically extract an accurate object from video streams. The new approach provides a useful tool for creating linking information for a distributed movie-based Web-browsing system, and consists of a skip-labeling algorithm for feature-based segmentation and a shrink-merge tracking algorithm for tracking an object. The skip-labeling algorithm can be used to segment an image into integrated regions of the same feature. The segmented regions belong to a texture area such as waves or forest. The shrink-merge tracking algorithm is executed, based on the time continuity of moving objects, using morphological image processing such as dilation and erosion. The dilation and erosion are repeatedly executed using projection processing, in which the object area in the next frame is derived from the object area in the current frame. The shrink-merge tracking algorithm can also project the area of a rotating object in the current frame onto the rotating object containing the newly appearing regions in the next frame. The newly automated object extraction method works satisfactorily for objects that move non-linearly within video streams including MPEG and Motion JPEG, and works satisfactorily over approximately 450 frames, each with a full frame size of 704 × 480 pixels at a video frame rate of 30 fps. This paper finally demonstrates that object-based linking information for a movie-based Web-browsing system contains information on objects obtained by fully automated extraction from video streams.
Optimization of software-based real-time H.263 video encoding
Shahriar Akramullah, Ishfaq Ahmad, Ming Lei Liou
In this work, our goal is to develop a real-time software-based H.263 video encoder using a single-processor system. This requires optimizing the execution speed of the code which, in turn, needs optimization at various design phases, including algorithmic enhancements, efficient implementations of these algorithms, and taking advantage of certain architectural features of the machine. We present an H.263 video encoder implemented on a single Sun UltraSPARC-1 workstation. In order to exploit the architectural features of the machine, we make use of a low-level machine primitive, namely, Sun UltraSPARC's Visual Instruction Set (VIS). Using VIS, we accelerate the computation in a SIMD fashion, increase the utilization of available registers in the processor, and remove register contentions between data and control variables. We have achieved a reasonably high frame encoding rate of more than 12 frames per second for QCIF resolution of video with high perceptual quality, which is sufficient for most of the GSTN-based video telephony applications. Extensive benchmarking experiments have been carried out to study the performance of the encoder. We have taken into account the effects of the optional H.263 coding modes on PSNR, bit rate and encoding speed. Based on these effects, suggestions are made to decide the optimum coding options.
Generalized parallelization methodology for video coding
Kwong-Keung Leung, Nelson Hon Ching Yung
This paper describes a generalized parallelization methodology for mapping video coding algorithms onto a multiprocessing architecture, through systematic task decomposition, scheduling and performance analysis. It exploits data parallelism inherent in the coding process and performs task scheduling based on task data size and access locality with the aim of hiding as much communication overhead as possible. Utilizing Petri nets and task graphs for representation and analysis, the method enables parallel video frame capturing, buffering and encoding without extra communication overhead. The theoretical speedup analysis indicates that this method offers excellent communication hiding, resulting in system efficiency well above 90%. An H.261 video encoder has been implemented on a TMS320C80 system using this method, and its performance was measured. The theoretical and measured performances are similar in that the measured speedup of the H.261 encoder is 3.67 and 3.76 on four PPs for QCIF and 352 × 240 video, respectively. These correspond to frame rates of 30.7 frames per second (fps) and 9.25 fps, and system efficiencies of 91.8% and 94%, respectively. As it is, this method is particularly efficient for platforms with a small number of parallel processors.
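The reported efficiency figures follow from the usual definition of parallel efficiency as speedup divided by the number of processors. The short Python sketch below reproduces that arithmetic for the two measured speedups on four processors; the function name is illustrative, and the definition is assumed to be the one the authors used.

```python
def parallel_efficiency(speedup, processors):
    """Parallel efficiency: measured speedup divided by processor count."""
    return speedup / processors

# Measured speedups quoted in the abstract, both on four parallel processors.
measurements = {"QCIF": 3.67, "352x240": 3.76}
efficiencies = {name: parallel_efficiency(s, 4) for name, s in measurements.items()}

for name, eff in efficiencies.items():
    print(name, eff)
```

The two values come out at roughly 0.9175 and 0.94, consistent with the 91.8% and 94% quoted in the abstract.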
Continuous-media communication method for minimizing playback interruptions
Kazuhiro Yoshida, Hiroyuki Kimiyama, Kazutoshi Nishimura
Video-on-demand service will be carried on future broadband networks. For this service to gain acceptance, methods of maintaining acceptable quality of service must be implemented. However, this is made difficult by the variation in the delay time caused by other communication traffic on the same network. We propose a continuous-media communication method for minimizing playback interruptions. In our method, the server predicts the delay time and sends data earlier by an amount equal to the delay time, so that the data is received before it should be played back and the number of playback interruptions is reduced. In this way, our method can guarantee the quality of service using end-to-end communication without reserving the required bandwidth in advance. We evaluated our method by simulation under two conditions. In the first simulation, we measured the playback interruption rate when there was background traffic. In the second simulation, we measured the rate when we used the delay times measured in actual networks as the simulated delay time. The results showed that using our method reduced the playback interruption rate by 44 - 82% in the first simulation and by 40 - 100% in the second simulation, compared with using no control.
Hardware/software design implementation of feature detection for a reconfigurable processor
Philip P. Dang, Paul M. Chau
Image processing algorithms are suitable for reconfigurable architectures due to their matrix structures, inherent parallelism and need for flexibility and processing speed. This paper describes a method to implement feature detection on the ReConfigurable Processor (RCP). The RCP is an FPGA-based system, which was built by the VLSI-RCP Research Group at UCSD and L3 Communications. The design is based on the Altera FLEX 10K70. The architecture used to implement the feature detector on the RCP, together with its software and hardware implementation, will be discussed.
Special Session: Internet Video
Improving H.263+ scalability performance for very low bit-rate applications
Lily Liuyang Yang, Fernando C. M. Martins, Thomas R. Gardos
In this paper, we discuss the coding efficiency of the SNR enhancement layer scalability as supported by H.263+. We show that for the typical Internet connections, sub-56 Kbps channels, a significant amount of overhead is imposed by the H.263+ layered approach. This overhead precludes the efficient use of H.263+ SNR and spatial scalability in very low data rates. We provide a detailed analysis of the overhead in layered bitstreams and propose coding modifications that significantly reduce the overhead.
Video coding for multiple target audiences
Alan Lippman
We explain some of the mechanisms used by the SureStream(TM) method in RealSystem(TM) G2 software for streaming video over the Internet. We focus on the dynamic behavior of the system under changing Internet bandwidth conditions. Our approach measures available bandwidth and switches between separate (non-layered) video encodings to match the channel capacity. The choice of bitrates, appropriate rate control methods, and details of switching between each bitrate will be the main topics of this paper. Our goal is to present one approach to this problem and the rationale behind some of the decisions we made, in the hopes of encouraging progress in the development of the best possible video streaming experience over the Internet. To avoid complexity we have left out discussion of audio and the interaction between audio streaming and video streaming.
Expanding network video capacity with delay-cognizant video coding
Yuan-Chi Chang, David G. Messerschmitt, Thom Carney
Prior work on statistical multiplexing of variable-bit-rate network video shows that higher video capacity (more video connections) can be supported if connections have smoother traffic profiles. For delay-critical applications like videoconferencing, smoothing a compressed bit stream indiscriminately is not an option because excess delay would be introduced. In this paper, we present an application of delay-cognizant video coding (DCVC) to expand the network video capacity by performing traffic smoothing discriminatively. DCVC segments the raw video data and generates two compressed video flows with differential delay requirements, a delay-critical flow and a delay-relaxed flow. The delay-critical flow carries less video information and is thus less bursty. The delay-relaxed flow complements the first flow and the magnitude of its bursts can be reduced by traffic smoothing. We demonstrate that at equal visual quality measured in PSNR, the network video capacity can be increased by as much as 50 percent through the two-flow discriminative traffic smoothing.
Large-scale experiments on low-bit-rate multimedia broadcast
Zon-Yin Shae, Xiping Wang, Stephen P. Wood
This paper contains our experience with low bit rate multimedia streaming and broadcast, as applied to the Internet/Intranet, and focuses on two of the enabling technologies: 100% Java clients and broadcast reflectors. Interpreted Java is slower than compiled C/C++ and Java platforms do not currently support video and audio synchronization. Various techniques to improve Java performance and to reduce code size are provided in detail. A novel video and audio synchronization mechanism for the pure Java environment is devised and instigated. This paper also describes a hierarchical reflector network architecture which, superimposed on the Internet, is a practical alternative for broadcasting of events to massive client audiences when multicast support of such audiences in the current Internet is questionable and remains untested.
Tracking of multiple semantic video objects for Internet applications
Chuang Gu, Ming-Chieh Lee
This paper introduces a novel tracking system for generic semantic video objects using backward region-based classification. It consists of five elementary steps: region pre-processing, region extraction, region-based motion estimation, region classification and region post-processing. Region pre-processing simplifies the input data. Region extraction finds the basic elements for classification. Region-based motion estimation provides the trajectory information about each basic element. Region classification determines the interior/exterior parts of semantic video objects. Finally, region post-processing cleans the results. We will show solid performance of this generic tracking system with pixel-wise accuracy for Internet applications.
Dynamic resource allocation for VBR video transmission
Hsiu-Chi Yang, Hsueh-Ming Hang
The goal of this paper is to provide a feasible and flexible mechanism for variable bit rate (VBR) video transmission and to achieve high network utilization with statistical Quality of Service (QoS). In this paper, we employ a piece-wise constant rate smoothing algorithm to smooth the video coder outputs and propose a simple algorithm to determine the renegotiation schedule for the smoothed streams. In order to transmit video streams with renegotiation-based VBR service, we suggest a connection admission control (CAC) based on the Chernoff bound using a simple yet quite accurate 'binomial' traffic model. The experimental results show that our proposed method provides an easy and robust mechanism to support real-time video transmission in both homogeneous and heterogeneous connection environments.
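To make the idea of Chernoff-bound admission control concrete, here is a minimal Python sketch under simplifying assumptions that are not taken from the paper: all sources are homogeneous on/off sources, each active independently with probability p, and link capacity is expressed as the number of sources that can be active at once. The function names and the overflow target eps are hypothetical.

```python
import math

def chernoff_overflow_bound(n, p, capacity_units):
    """Chernoff upper bound on P(X >= capacity_units) for X ~ Binomial(n, p).

    Uses exp(-n * D(a || p)) with a = capacity_units / n, valid for a > p.
    Requires 0 < p < 1 and n >= 1.
    """
    a = capacity_units / n
    if a <= p:
        return 1.0   # capacity at or below the mean load: bound is trivial
    if a > 1.0:
        return 0.0   # n sources can never reach the capacity
    if a == 1.0:
        return p ** n  # all n sources must be active at once
    kl = a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))
    return math.exp(-n * kl)

def admit(n_current, p, capacity_units, eps):
    """Admit one more connection if the bound stays within the target eps."""
    return chernoff_overflow_bound(n_current + 1, p, capacity_units) <= eps

# Example: 40 sources already admitted, activity 0.2, capacity for 20 active sources.
decision = admit(40, 0.2, 20, 1e-3)
```

Because the Chernoff bound overestimates the true overflow probability, a controller built this way errs on the side of rejecting connections.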
Packet-loss-resilient Internet video streaming
Bernd Girod, Klaus Werner Stuhlmueller, M. Link, et al.
This paper describes a transmission scheme for Internet video streaming that provides an acceptable video quality over a wide range of connection qualities. The proposed system consists of a scalable video coder which uses a fully standard-compatible H.263 coder in its base layer. The scalable video coder is combined with unequal error protection using Reed-Solomon codes applied across packets. We present and verify a two-state Markov model for packet losses over Internet connections. The relation between packet loss and picture quality at the decoder for an unequally protected layered video stream is derived. Experimental results show that, with our approach, the picture quality of a streamed video degrades gracefully as the packet loss probability of an Internet connection increases.
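A two-state Markov loss model of the kind mentioned above can be sketched in a few lines of Python. The parameterization below (a "good" state that delivers packets, a "bad" state that drops them, and the two transition probabilities) is the common Gilbert-style form and is an assumption; the abstract does not give the authors' parameters, and the names are illustrative.

```python
import random

def simulate_losses(n, p_good_to_bad, p_bad_to_good, seed=0):
    """Simulate n packets; returns a list of booleans, True meaning lost."""
    rng = random.Random(seed)
    lost = False  # start in the good (no-loss) state
    trace = []
    for _ in range(n):
        if lost:
            # stay in the bad state with probability 1 - p_bad_to_good
            lost = rng.random() >= p_bad_to_good
        else:
            lost = rng.random() < p_good_to_bad
        trace.append(lost)
    return trace

def stationary_loss_rate(p_good_to_bad, p_bad_to_good):
    """Long-run fraction of lost packets for the two-state chain."""
    return p_good_to_bad / (p_good_to_bad + p_bad_to_good)

def mean_burst_length(p_bad_to_good):
    """Expected number of consecutive losses once a burst starts."""
    return 1.0 / p_bad_to_good

trace = simulate_losses(200_000, 0.05, 0.45)
empirical_rate = sum(trace) / len(trace)
```

Unlike an independent-loss model with the same average rate, this chain produces bursts of consecutive losses, which is what makes interleaving error protection across packets attractive.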
Transporting H.320 video conference traffic to the Internet
Chin-Fu Ku, Hung-Yu Ko, Jeng-Wei Lin, et al.
Although H.320 is one of the most popular ITU-T standards for video conference systems, H.323 is receiving wide acceptance in the Internet community. In this paper, we study the problem of transporting video conference traffic to and from the Internet. Some characteristics of the problem are as follows: an H.323 video stream is VBR while an H.320 video stream is CBR; H.323 is byte-oriented while H.320 is bit-oriented; audio and video packets are transmitted independently in H.323 while they are multiplexed together in H.320; and the probability of packet loss in an H.323 network is much higher than in an H.320 ISDN circuit-switched network. In this paper, we present our designs and some preliminary experimental results in dealing with these issues.
Network-adaptive video coding and transmission
Kay Sripanidkulchai, Tsuhan Chen
In visual communication, the conditions of the network, such as the delay, delay jitter, and packet loss rate, have a strong impact on the video quality. It would be useful to have a feedback channel from the client to the server to indicate the network conditions periodically, and a smart mechanism for coding and transmitting video that can adapt to these conditions. For example, when the network is congested, instead of sending all the packets that have a high probability of being lost in the network, we can selectively drop some packets at the server (such as dropping packets for bidirectionally predicted frames). While intuitive, it is difficult to illustrate the effectiveness of adaptation using a single video server-client pair. A practical simulation would require multiple video servers and clients, and only then will the benefit of adaptation show up as advantageous utilization of network resources to provide good video quality. In this paper, we will introduce our methods of adaptation and present experimental and simulation results.
Architectures and Implementations
Novel embedded compression algorithm for memory reduction in MPEG codecs
Advanced digital compression systems, like the MPEG standard, are entering the consumer market. However, consumer acceptance of these technologies relies considerably on the possibility of substantially reducing the implementation costs. In this paper we study a low-cost and high-quality embedded compression system for reducing the memory requirements of an MPEG-2 decoder by a factor of 4 - 6. The proposed embedded codec is based on a low-cost transform coding scheme and employs a modified feedforward coding mechanism to ensure a fixed compression factor. A novel quantization technique is developed, which prevents the error accumulation resulting from multiple encodings of the same pixel data. The decrease in image quality caused by the embedded compression is minimal (i.e. less than 1 dB) for MPEG-2 coded sequences at 4 - 9 Mbit/s. The proposed algorithm can also be successfully applied for memory reduction of MPEG-2 encoders and H.263 codecs.
Motion-estimation/motion-compensation hardware architecture for a scene-adaptive algorithm on a single-chip MPEG-2 MP@ML video encoder
Koyo Nitta, Toshihiro Minami, Toshio Kondo, et al.
This paper proposes a unique motion estimation and motion compensation (ME/MC) hardware architecture for a scene-adaptive algorithm. The most significant feature is the independence of the two modules for the ME/MC. This enables the encoder to analyze the statistics of a scene before encoding it and to control the whole encoding process adaptively according to the scene. The scene-adaptive controls involve changing various encoding parameters, such as the search area or selection criteria, in the slice cycle or even in the macroblock cycle. The search area of our ME/MC architecture is ±211.5 horizontally and ±113.5 vertically by the area hopping method. The architecture is loaded on a single-chip MPEG-2 MP@ML encoder.
Flexible low-power VLSI architecture for MPEG-4 motion estimation
Peter M. Kuhn, Ulrich Niedermeier, Liang-Fang Chao, et al.
This paper discusses VLSI architectural support for motion estimation (ME) algorithms within the H.263 and MPEG-4 video coding standards under low power constraints. A high memory access bandwidth and a high number of memory modules are mainly responsible for high power consumption in various motion estimation architectures. Therefore the aim of the presented VLSI architecture was to gain high efficiency at low memory bandwidth requirements for the computationally demanding algorithms, as well as to support several motion estimation algorithmic features with little additional area overhead. Besides full search ME with [-16, +15] and [-8, +7] pel search areas, the presented VLSI architecture supports MPEG-4 ME for arbitrarily shaped objects, advanced prediction mode, 2:1 pel subsampling, 4:1 pel subsampling, 4:1 alternate pel subsampling, Three Step Search (TSS), preference of the zero-MV, R/D-optimized ME and half-pel ME. A special data-flow design is used within the proposed architecture which allows up to 16 absolute difference calculations to be performed in parallel, while loading only up to 2 bytes in parallel from each of the current block and search area memories per clock cycle. This VLSI architecture was implemented using a VHDL-synthesis approach and resulted in a size of 22.8 kgates (without RAM) at 100 MHz (min.) using a 0.25 micrometer commercial CMOS library.
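For readers unfamiliar with the computation such architectures accelerate, the following Python sketch shows plain full-search block matching with the sum of absolute differences (SAD) as the cost. It is a software reference for the textbook algorithm only, not a model of the paper's data flow; the function names, frame layout (lists of rows), and the symmetric search range are assumptions.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between an n x n block of cur at (bx, by)
    and the block of ref displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total

def full_search(cur, ref, bx, by, n, r):
    """Test every displacement in [-r, r] x [-r, r]; return (cost, dx, dy)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            # skip candidates whose reference block falls outside the frame
            if by + dy < 0 or by + dy + n > height:
                continue
            if bx + dx < 0 or bx + dx + n > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best

# Synthetic frames: cur is ref shifted by 2 pixels in x and 1 pixel in y.
ref = [[x + 10 * y for x in range(8)] for y in range(8)]
cur = [[(x + 2) + 10 * (y + 1) for x in range(8)] for y in range(8)]
best_match = full_search(cur, ref, bx=2, by=2, n=2, r=2)
```

Each candidate costs n × n absolute differences, which is why hardware designs parallelize the inner loops and why subsampling and Three Step Search reduce the number of evaluations.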
Scalable architecture of real-time MP@HL MPEG-2 video encoder for multiresolution video
Kazuhito Suguri, Takeshi Yoshitome, Mitsuo Ikeda, et al.
We have proposed a new system architecture for an MPEG-2 video encoder designed for high-resolution video. The system architecture uses the spatially parallel encoding approach and has scalability for the target video resolution to be encoded. Three new techniques have been introduced to the system. The first is a general video interface that supports multiple video formats. The second is a bitstream generation control scheme suitable for the spatially parallel encoding approach. The third is a simple data sharing mechanism for all encoding units. With these techniques, the system achieves both scalability and high encoding efficiency. Video encoding systems based on this system architecture will enable high quality video encoding to be used for visual applications for commercial and personal use at reasonable system cost.
New systolic array architecture for vector median filters
Long-Wen Chang
In digital image and audio processing, the scalar median filter is very effective in removing impulse noise and preserves the edge in the signal. Neuvo extended the scalar median and introduced vector median, which processes the vector signal. Since the vector median utilizes the correlation between different components, it is better than componentwise scalar median for color image processing. In this paper, a new systolic array architecture for computing the vector median of a series of vector signals is proposed. In pipeline processing of a sequence of vector signals it can output a vector median every clock with clock time just about computing one multiplication.
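As a reference point for what the proposed array computes, the vector median of a window is commonly defined as the input vector whose summed distance to all other vectors in the window is smallest. The Python sketch below implements that definition directly with the Euclidean norm; it is a quadratic-time software illustration, not the systolic architecture, and the choice of norm is an assumption.

```python
import math

def vector_median(vectors):
    """Return the input vector minimizing the sum of Euclidean distances
    to all vectors in the window (ties go to the earliest vector)."""
    def total_distance(v):
        return sum(math.dist(v, w) for w in vectors)
    return min(vectors, key=total_distance)

# A small RGB window containing one impulse-like outlier.
window = [(10, 10, 10), (12, 11, 10), (11, 10, 12), (255, 0, 255), (10, 12, 11)]
filtered = vector_median(window)
```

Because the output is always one of the input vectors, the filter never creates colors that were absent from the window, unlike a componentwise scalar median.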
Feature Extraction
Multiscale wavelet feature detection applied to Landsat images
Landsat images have well-defined homogeneous regions owing to the unique spectral characteristics of different crops. Discontinuity detection and multi-scale feature detection were combined to delineate boundaries in Landsat images as a new approach to image classification: High Scale Discontinuity Detection.
Automatic facial feature extraction by genetic algorithms
Ja-Ling Wu, Chun-Hung Lin
An automatic facial feature extraction algorithm is presented in this paper. The algorithm is composed of two main stages: the face region estimation stage and the feature extraction stage. In the face region estimation stage, a second-chance region growing method is adopted to estimate the face region of a target image. In the feature extraction stage, genetic search algorithms are applied to extract the facial feature points within the face region. It is shown by simulation results that the proposed algorithm can automatically and exactly extract facial features with limited computational complexity.
Image segmentation and object extraction based on geometric features of regions
Toru Tamaki, Tsuyoshi Yamamura, Noboru Ohnishi
We propose a method for segmenting a color image into object-regions, each of which corresponds to the projection of one object in the scene onto the image plane. In conventional segmentation methods, it is not easy to extract an object-region as one region. Our proposed method uses geometric features of regions. First, the image is segmented into small regions. Next, geometric features such as inclusion, area ratio, smoothness, and continuity are calculated for each region. Then the regions are merged together based on the geometric features. This merging enables us to obtain an object-region even if the surface of the object is textured with a variety of reflectances, which is not taken into account in conventional segmentation methods. We show experimental results demonstrating the effectiveness of the proposed method.
Use of multiple visual features for object tracking
Ajith A. Pasqual, Kiyoharu Aizawa, Mitsutoshi Hatori
In this paper we present a method of using multiple visual attributes (features) that are present in moving objects for carrying out object tracking, by way of feature substitution. The proposed method, in principle, can make use of many visual cues available from a scene such as texture, color, velocity (monocular features) and disparity, vergence (binocular features). For the present experiments we make use of 3 features, namely, texture, optical flow and color as the main visual features and defocus of objects (blur) as a supportive feature. At any instant, tracking is carried out using only one feature and this feature is monitored closely for failures. The feature is substituted with another suitable feature only upon the failure or high uncertainty of the current feature. In case of tracking using texture alone, we make use of a histogram-based technique called Histogram Intersection Value. This technique is not only computationally simple but provides very good results whenever texture is suitable for tracking. Experimental results with real image sequences show the validity of the proposed method.
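To illustrate the kind of histogram comparison the abstract refers to, the sketch below computes a classic normalized histogram intersection between a model histogram and a candidate histogram. It is an assumption that the paper's Histogram Intersection Value follows this form; the helper names, bin count, and sample values are hypothetical, and the model histogram is assumed to be non-empty.

```python
def histogram(values, bins, max_value=256):
    """Count integer samples in [0, max_value) into equally wide bins."""
    counts = [0] * bins
    for v in values:
        counts[v * bins // max_value] += 1
    return counts

def histogram_intersection(h_model, h_candidate):
    """Overlap of two histograms, normalized by the model histogram's mass."""
    overlap = sum(min(a, b) for a, b in zip(h_model, h_candidate))
    return overlap / sum(h_model)

# Hypothetical gray-level samples from a tracked patch and a candidate patch.
model = histogram([0, 10, 200, 250], bins=4)
candidate = histogram([100, 120, 130, 140], bins=4)
score_same = histogram_intersection(model, model)
score_diff = histogram_intersection(model, candidate)
```

A score near 1 suggests the candidate patch matches the model, while a score near 0 could serve as the failure signal that triggers substitution of another feature.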
Log-derivative matching method for pattern comparison
Yasuko Takahashi, Hisako Tanaka, Akio Shio, et al.
A new pattern comparison method called LDM (log-derivative matching), based on the calculus of object reflectance, is proposed. We introduce a log-derivative operator for the local operator, and correlation for global integration. We show two facts about LDM: (1) Under a few assumptions on illumination change, our log-derivative operator minimizes the influence of illumination. (2) The LDM method can be used for pattern comparison. Experimental results and also a mathematical analysis show that the proposed method permits pattern matching even under strong shadow.
Unsupervised image segmentation using a mean field decomposition of a posteriori probability
Hideki Noda, Mehdi N. Shirazi, Bing Zhang, et al.
This paper proposes a Markov random field (MRF) model-based method for unsupervised segmentation of images consisting of multiple textures. To model such textured images, a hierarchical MRF is used with two layers, the first layer representing an unobservable region image and the second layer representing multiple textures which cover each region. This method uses the Expectation-Maximization (EM) method for model parameter estimation, where in order to overcome the well-known computational problem in the expectation step, we approximate the Baum function using mean-field-based decomposition of a posteriori probability. Given provisionally estimated parameters at each iteration in the EM method, a provisional segmentation is carried out using the local a posteriori probability (LAP) of each pixel's region label, which is derived by mean-field-based decomposition of the a posteriori probability of the whole region image. Simulation results show that the use of LAPs is essential to perform a good image segmentation.
Poster Presentations on Image and Video Coding
Quincunx filter lifting scheme for image coding
This paper introduces a new construction of the quincunx wavelet transform. This new transform is a bidimensional extension of the factorization of the wavelet transform into lifting steps for finite and symmetrical low-pass filters. The aim of this method is to deal with quincunx images by appropriate transforms while using the advantages offered by the lifting scheme. Indeed, quincunx sampling is of great interest for image coding applications. For example, recent satellite remote sensors return quincunx-sampled images. Moreover, quincunx sampling allows the image to be decomposed into two channels and yields a multiresolution analysis twice as accurate as the dyadic one.
Visual-pattern-based color image compression
A novel color image coding technique based on visual patterns is presented. Visual patterns, a concept first introduced by Chen and Bovik, are image blocks representing visually meaningful information. A method has been developed to extend the concept of visual patterns (originally developed for grayscale images) to color image coding. A mapping criterion has been developed to map small image blocks to a set of predefined, universal visual patterns in a uniform color space. Source coding and color quantization are applied to achieve efficient coding. Compression ratios between 40:1 and 60:1 (0.6 - 0.4 bpp) have been achieved; subjective as well as objective measures show that the new method is comparable to state-of-the-art techniques such as JPEG.
Multiple description coding via polyphase transform and selective quantization
Wenqing Jiang, Antonio Ortega
In this paper, we present an efficient Multiple Description Coding (MDC) technique to achieve robust communication over unreliable channels such as a lossy packet network. We first model such unreliable channels as erasure channels and then we present an MDC system using polyphase transform and selective quantization to recover channel erasures. Different from previous MDC work, our system explicitly separates description generation and redundancy addition, which greatly reduces the implementation complexity, especially for systems with more than two descriptions. Our system also realizes a Balanced Multiple Description Coding (BMDC) framework which can generate descriptions of statistically equal rate and importance. This property is well matched to communication systems with no priority mechanisms for data delivery, such as today's Internet. We then study, for a given total coding rate, the problem of optimal bit allocation between source coding and redundancy coding to achieve the minimum average distortion for different channel failure rates. With a high-resolution quantization assumption, we give optimal redundancy bit rate allocations for both scalar i.i.d. sources and vector i.i.d. sources for independent channel failures. To evaluate the performance of our system, we provide an image coding application with two descriptions and our simulation results are better than the best MDC image coding results reported to date. We also provide image coding examples with 16 descriptions to illustrate the simplicity and effectiveness of our proposed MDC system.
Novel bit allocation method for the motion-compensated interframe coding in the sense of optimality
Wook-Joong Kim, Seong-Dae Kim
In this work, we present a novel method for the bit allocation problem that aims to minimize overall distortion subject to a bit rate constraint. It has been proved that the optimal solution can be found by using the Lagrangian method with dynamic programming. However, the optimal bit allocation for block-based interframe coding is practically unattainable because of the interframe dependency of macroblocks caused by motion compensation. In order to reduce the computational burden while keeping the result close to optimal, we propose an alternative method. We derive a partitioned form of the bit allocation problem: a frame-level problem and one-frame macroblock-level problems. Then we use a two-phase optimization technique with an interframe dependency model and a rate-distortion model.
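The Lagrangian formulation mentioned above can be sketched for the simplest case, in which blocks are independent. This is exactly the assumption that interframe dependency breaks, so the code below is not the paper's two-phase method. Each block offers a few hypothetical (rate, distortion) operating points; for a multiplier `lam`, each block picks the point minimizing D + lam * R, and a bisection on `lam` searches for an allocation that meets the budget. All names and numbers are illustrative.

```python
def allocate(blocks, lam):
    """For each block, pick the (rate, distortion) point minimizing D + lam * R."""
    return [min(points, key=lambda p: p[1] + lam * p[0]) for points in blocks]

def allocate_for_budget(blocks, budget, iterations=50):
    """Bisect on the multiplier until the total rate fits the budget."""
    lo, hi = 0.0, 1e6  # hi is assumed large enough to force the lowest rates
    for _ in range(iterations):
        mid = (lo + hi) / 2
        total_rate = sum(rate for rate, _ in allocate(blocks, mid))
        if total_rate > budget:
            lo = mid  # too many bits spent: penalize rate more heavily
        else:
            hi = mid  # feasible: try a smaller penalty
    return allocate(blocks, hi)

# Hypothetical operating points (rate in bits, distortion) for two blocks.
blocks = [
    [(1, 100), (2, 40), (4, 10)],
    [(1, 50), (2, 30), (4, 25)],
]
choice = allocate_for_budget(blocks, budget=5)
```

The search only reaches operating points on the convex hull of each block's rate-distortion curve, so the budget may not be used exactly; here the selected allocation spends 4 of the 5 available bits.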
Hybrid coding of video with spatiotemporal scalability using subband decomposition
Marek Domanski, Adam Luczak, Slawomir Mackowiak, et al.
The paper deals with scalable coding of video with SDTV or HDTV resolution. A new technique of scalable coding is proposed for bitrates of about 3 - 10 Mbps. The technique has been implemented for BT.601 resolution and progressive scan, therefore problems related to an interlaced scan are omitted here. The goal is to improve spatial scalability of MPEG-2 by introducing spatio-temporal scalability. The technique proposed needs less coding overhead than in MPEG-2 spatially scalable scheme and an enhancement layer bitstream with its bitrate not less than the bitrate in a base layer. The solution proposed in the paper is based on both temporal and spatial resolution reduction performed for data transmitted in a base layer. The temporal resolution reduction is obtained by placing each second frame (B-frame) in the enhancement layer. The enhancement layer includes also high-frequency spatial subbands from other frames. A variant of the system based on three-dimensional spatio-temporal analysis is also described. In both cases the assumption is that a base layer is fully MPEG-2 compatible.
Transcoding DV into MPEG-2 in the DCT domain
Donyeon Kim, Bumsik Youn, Yoonsik Choe
Transcoding Digital Video (DV) for Digital Video Cassette Recorder (DVCR) into MPEG-2 intra coding is performed in the DCT domain to reduce conversion steps. Multiplying matrix by transformed data is used for 4:1:1-to-4:2:0 chroma format conversion and 2-4-8 DCT mode to 8-8 DCT mode conversion for parallel processing. M_quant of MPEG-2 rate control is computed in the DCT domain. For MPEG-2 inter coding, fast motion estimations taking advantage of data in the DCT domain are studied for transcoding. Among them, ME with overlapped search range shows better PSNR performance than ME without overlapping.
Virtually lossless compression of medical images through classified prediction and context-based arithmetic coding
This paper proposes a method to achieve a virtually lossless compression of medical images. An image is normalized to the standard deviation of its noise, which is adaptively estimated in an unsupervised fashion. The resulting bit map is encoded without any further loss. The compression algorithm is based on a classified linear-regression prediction followed by context-based arithmetic coding of the outcome residuals. Images are partitioned into blocks, e.g., 16 × 16, and a minimum mean square error (MMSE) linear predictor is calculated for each block. Given a preset number of classes, a Fuzzy-C-Means algorithm produces an initial guess of classified predictors to be fed to an iterative procedure which classifies pixel blocks while simultaneously refining the associated predictors. All the predictors are transmitted along with the label of each block. Coding times are affordable thanks to the fast convergence of the iterative algorithms. Decoding is always performed in real time. The compression scheme provides impressive performance, especially when applied to X-ray images.
Encoding of multi-alphabet sources by binary arithmetic coding
Muling Guo, Takahumi Oka, Shigeo Kato, et al.
When encoding a multi-alphabet source, the multi-alphabet symbol sequence can be encoded directly by a multi-alphabet arithmetic encoder, or the sequence can first be converted into several binary sequences, each of which is then encoded by a binary arithmetic encoder, such as the L-R arithmetic coder. Arithmetic coding, however, requires arithmetic operations for each symbol and is computationally heavy. In this paper, a binary representation method using a Huffman tree is introduced to reduce the number of arithmetic operations, and a new probability approximation for L-R arithmetic coding is further proposed to improve the coding efficiency when the probability of the LPS (Least Probable Symbol) is near 0.5. Simulation results show that our proposed scheme has high coding efficiency and can reduce the number of coding symbols.
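The idea of binarizing a multi-alphabet source with a Huffman tree can be sketched as follows: each symbol becomes the sequence of left/right decisions on its path through the tree, so frequent symbols need fewer binary coding steps. This is a minimal sketch, assuming a plain Huffman construction; the function name and the probabilities are hypothetical, and the L-R arithmetic coder and the paper's probability approximation are not modeled.

```python
import heapq
from itertools import count

def huffman_codes(probabilities):
    """Map each symbol to a bit string of tree decisions (Huffman construction)."""
    tiebreak = count()  # unique counter so the dicts are never compared
    heap = [(p, next(tiebreak), {s: ""}) for s, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, codes0 = heapq.heappop(heap)
        p1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

# Hypothetical 4-symbol source.
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
codes = huffman_codes(probs)
# Average number of binary decisions (binary coding steps) per source symbol.
avg_decisions = sum(probs[s] * len(codes[s]) for s in probs)
```

With a fixed-length binarization, each of these four symbols would need 2 binary decisions; the tree-based binarization averages fewer for this skewed source.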
New color image compression algorithm based on DCT and hierarchical data structure
Alan C. K. Ng, Bing Zeng
Nowadays, since bandwidth is very expensive in most real-world telecommunications, we need to compress the amount of image data being transmitted. The main drawback of lossy compression is that we cannot completely restore the original images. We have to make a compromise between bit rate and distortion. In this paper, we propose a new scalable image compression algorithm using (1) a Quad Tree data structure, (2) DCT and (3) Hierarchical DPCM encoding. For high compression, we are able to achieve a compression factor of 48 (at 0.25 bpp), with a PSNR of about 32 dB. For low-distortion compression, we obtain a high PSNR of 49 dB at 2.5 bpp, which is a promising result. Due to the hierarchical data structure, the encoded image can be transmitted progressively for practical Internet applications.
Two-dimensional optimum band partition based on band blocks for subband image coding
Masashi Kameda, Kohhei Ohtake, Makoto M. Miyahara
In subband coding, it is important (1) to decompose an input signal into an adequate set of subbands considering the properties of the input signal and (2) to assign adequate bits in proportion to the power of each subband signal. Investigation of the auto-correlation characteristics of several kinds of images reveals an isotropic correlation model. Based on this theoretical image model, we have derived the optimum band partition that minimizes the quantization noise power of the reconstructed signal at the receiving end. In order to apply the above theoretical optimum band partition to image coding, we propose an optimum band partition scheme based on the idea of a set of band blocks, and we present a calculation algorithm with low computational complexity to determine the optimum band partition. Also, we propose two methods to exploit the non-stationary nature of images, and the filter bank configuration to realize the optimum band partition including these methods. The proposed optimum band partition shows better results than the block DCT, obtaining a higher compression rate and higher image quality.
Lossless coding method for black-ink signals of high-quality printing images
Shigeo Kato, Muling Guo, Madoka Hasegawa
Digital color images in the printing field are usually extra-high-quality images, which have many gray levels and high resolution. In order to transmit and store such images efficiently, compression techniques are needed. Printing color images are usually represented by four primary colors, namely the Cyan, the Magenta, the Yellow and the Black (Black-Ink) signals. The black signal is, however, quite different from the other three primary color signals in its statistical characteristics. Such specific characteristics should be used to compress the black signals of the printing color images. In this paper, we propose a new coding scheme for the black signals of printing color images. In the proposed scheme, first, eight prediction functions, including those used in JPEG spatial mode, are applied to the three primary color signals other than the Black one. Secondly, a suitable prediction function is selected from the eight prediction functions by calculating the summation of absolute prediction errors of the three primary color signals, and searching for the minimum summation among the eight prediction functions. Finally, prediction errors are separately encoded by contexts of reference pixels. Simulation results show that the proposed scheme has a high compression ratio for the black-ink signals.
34/45-Mbps 3D HDTV digital coding scheme using modified motion compensation with disparity vectors
Sei Naito, Shuichi Matsumoto
This paper describes a digital compression coding scheme for transmitting three dimensional stereo HDTV signals with full resolution at bit-rates around 30 to 40 Mbps to be adapted for PDH networks of the CCITT 3rd digital hierarchy, 34 Mbps and 45 Mbps, SDH networks of 52 Mbps and ATM networks. In order to achieve a satisfactory quality for stereo HDTV pictures, three advanced key technologies are introduced into the MPEG-2 Multi-View Profile, i.e., a modified motion compensation using disparity vectors estimated between the left and right pictures, an adaptive rate control using a common buffer memory for left and right pictures encoding, and a discriminatory bit allocation which results in the improvement of left pictures quality without any degradation of right pictures. From the results of coding experiment conducted to evaluate the coding picture achieved by this coding scheme, it is confirmed that our coding scheme gives satisfactory picture quality even at 34 Mbps including audio and FEC data.
Piecewise linear compression scheme for PC video cameras
Jun Li, Iskender Agi
To keep cost low, most of the commercial color camera products for PC video conferencing employ a single CCD or CMOS imager. Usually, a mosaic color filter array is applied to the CCD or CMOS focal plane array image sensor in order to extract color information from the image through post-processing. Thus the output data of each color component is interlaced in every output line from the imager. This paper proposes a new scan-line based image compression technique for PC video conferencing. It takes advantage of the PC's advanced computation power and provides end users the capability of performing image decompression and color interpolation on demand. Without doing any color interpolation before image compression, piecewise linear compression is performed on raw line data. It generates a single control point set for all color components of the same line and packs the control points into a compact format so that a small lookup table for variable-length Huffman encoding can be applied. The simplicity of piecewise-linear approximation and variable-length encoding leads to a low-cost ASIC implementation and a fast image compression solution with improved compression ratio, better color fidelity, and the capability to implement decoding in parallel.
Image Coding
Fractal image coding by an approximation of the collage error
Ismail Salih, Stanley H. Smith
In fractal image compression an image is coded as a set of contractive transformations, and is guaranteed to generate an approximation to the original image when iteratively applied to any initial image. In this paper we present a method for mapping similar regions within an image by an approximation of the collage error; that is, range blocks can be approximated by a linear combination of domain blocks.
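For orientation, the collage error for one range block can be sketched with the textbook single-domain affine fit: find a scale and offset so that `scale * domain + offset` best matches the range block in the least-squares sense. This is an assumption for illustration only; the paper approximates the collage error and uses linear combinations of domain blocks, neither of which is reproduced here. Blocks are flattened to lists of equal length, and all names and values are hypothetical.

```python
def fit_block(domain, rng):
    """Least-squares scale/offset mapping a domain block onto a range block.

    Returns (scale, offset, collage_error), where the error is the summed
    squared difference between the mapped domain block and the range block.
    """
    n = len(domain)
    mean_d = sum(domain) / n
    mean_r = sum(rng) / n
    var_d = sum((d - mean_d) ** 2 for d in domain)
    cov = sum((d - mean_d) * (r - mean_r) for d, r in zip(domain, rng))
    scale = cov / var_d if var_d else 0.0  # flat domain block: offset only
    offset = mean_r - scale * mean_d
    error = sum((scale * d + offset - r) ** 2 for d, r in zip(domain, rng))
    return scale, offset, error

# Hypothetical (already downsampled) domain block and a range block.
scale, offset, error = fit_block([0, 2, 4, 6], [10, 11, 12, 13])
```

An encoder would typically evaluate this fit for many candidate domain blocks and keep the one with the smallest collage error, subject to the scale being contractive.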
Mixed raster content (MRC) model for compound image compression
Ricardo L. de Queiroz, Robert R. Buckley, Ming Xu
This paper will describe the Mixed Raster Content (MRC) method for compressing compound images, containing both binary text and continuous-tone images. A single compression algorithm that simultaneously meets the requirements for both text and image compression has been elusive. MRC takes a different approach. Rather than using a single algorithm, MRC uses a multi-layered imaging model for representing the results of multiple compression algorithms, including ones developed specifically for text and for images. As a result, MRC can combine the best of existing or new compression algorithms and offer different quality-compression ratio tradeoffs. The algorithms used by MRC set the lower bound on its compression performance. Compared to existing algorithms, MRC has some image-processing overhead to manage multiple algorithms and the imaging model. This paper will develop the rationale for the MRC approach by describing the multi-layered imaging model in light of a rate-distortion trade-off. Results will be presented comparing images compressed using MRC, JPEG and state-of-the-art wavelet algorithms such as SPIHT. MRC has been approved or proposed as an architectural model for several standards, including ITU Color Fax, IETF Internet Fax, and JPEG 2000.
Image coding approach based on multiscale matching pursuits operation
Hui Li, Ingo Wolff
A new image coding technique based on the Multiscale Matching Pursuits (MMP) approach is presented. Using a pre-defined dictionary set, which consists of a limited amount of elements, the MMP approach can decompose/encode images on different image scales and reconstruct/decode the image by the same dictionary. The MMP approach can be used to represent different scale image texture as well as the whole image. Instead of the pixel-based image representation, the MMP method represents the image texture as an index of a dictionary and thereby can encode the image with low data volume. Based on the MMP operation, the image content can be coded in an order from global to local and detail.
Scalable image coding with fine granularity based on hierarchical mesh
Patrick Lechat, Nathalie Laurent, Henri Sanson
This paper presents a method for still image encoding based on a hierarchical mesh representation. Contrary to most classical coding schemes, which transform the signal into the frequency domain and quantize it, our method performs a purely spatial, content-adaptive representation. The main goals are both spatial and SNR scalability, progressive bitstream transmission, and efficient support for motion estimation and compensation. The technique presented consists in approximating the image by a triangular mesh covering the whole image domain, which allows the finite element method to be used. Mesh nodes carry both position information and photometric data (YUV), and the Lagrangian affine interpolation model defined on triangular elements enables image approximation everywhere. To achieve scalability and content adaptivity, the base-level mesh is iteratively subdivided by splitting each triangle into 4 new ones. Furthermore, to decrease the coding rate, mesh node positions and values are quantized and differentially encoded across mesh levels. A quad tree built during mesh subdivision selects and sorts data to be sent to the bitstream, given a quality criterion per tree node. In this way, the most important information is sent first, delivering a rough image representation; further differential values are then transmitted to enhance the representation quality.
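The 1-to-4 triangle split that drives the mesh hierarchy can be sketched as below. The sketch assumes the split is made at edge midpoints, which the abstract does not state, and it omits the photometric node values, quantization, and the quad tree selection. Function names and the base triangle are hypothetical.

```python
def midpoint(a, b):
    """Midpoint of two 2D points."""
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)

def subdivide(triangle):
    """Split one triangle into 4 children using its edge midpoints."""
    a, b, c = triangle
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]

def refine(mesh, levels):
    """Apply the 1-to-4 split to every triangle, `levels` times."""
    for _ in range(levels):
        mesh = [child for tri in mesh for child in subdivide(tri)]
    return mesh

# Hypothetical base mesh with a single triangle, refined by two levels.
base_mesh = [((0.0, 0.0), (4.0, 0.0), (0.0, 4.0))]
fine_mesh = refine(base_mesh, levels=2)
```

Each level quadruples the triangle count, and every parent has exactly four children, which is what makes a quad tree a natural index for deciding which refinements to transmit.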
Visual progressive coding
Jin Li
The embedded coder has the attractive feature that the coded bitstream can be truncated at any point and still be decoded into a perceptible image. It is common in conventional coding to improve the subjective quality of the coded image by adjusting the quantization step size inversely proportional to a set of visual weights. However, such a scheme will not be effective in the embedded coder, as different viewing conditions may be called for at different stages of embedding. In this paper, we propose visual progressive coding (VIP), which uses the visual weights to determine the order of embedding, rather than to requantize the transform coefficients. VIP can change the weights halfway through by redefining the order of embedding according to the active weights. With such a 'reordering by weight' strategy, it is feasible to adjust the visual weights flexibly during the embedding process, and improve the subjective appearance of the image over the entire bit rate range.
Interscale prediction and subband decomposition for still image coding
Felix Henry, Pierre Duhamel
It is commonly admitted that relying only on interscale redundancies, like pure fractal coders do, leads to inferior rate-distortion performance when compared to more classical designs (involving transform, quantization, and entropy coding). We address the problem of local efficiency of interscale coding in digital images. We design an interscale coding technique that predicts vectors of subband coefficients between adjacent resolution levels. This interscale coding scheme is applied in cooperation with zeroing and lattice vector quantization (LVQ) within the same compression scheme. To this end, we propose a rate allocation algorithm adapted to a signal partitioned into small vectors. The rate allocator chooses the best method according to the rate-distortion compromise. Side information is transmitted in a compressed form to indicate which of the three decoding methods is chosen on each block (zeroing, interscale prediction, or LVQ). Experiments show that interscale coding can locally outperform other methods. Unfortunately, due to the side information, its impact is globally negative. Switching off interscale coding leads to improved performance. Thus, we show in a rigorous framework that interscale prediction of blocks is not recommended for natural image coding.
Content-Based, Model-Based, Object-Based Coding
Hierarchical 2D content-based mesh tracking for video object
Ning Zhuang, Peter J. L. van Beek, Isil Celasun, et al.
This paper proposes methods for tracking of hierarchical 2D content-based mesh representations. We introduce new techniques to maintain the initial mesh hierarchy and topology during tracking by imposing certain constraints at each stage of the procedure. Experimental results are presented to compare the tracking performance of hierarchical versus non-hierarchical mesh representations. The results show that hierarchical tracking outperforms single-level tracking in case there is significant motion.
Extraction of moving objects for content-based video coding
Thomas Meier, King N. Ngan
This paper considers video object plane (VOP) segmentation for the content-based video coding standard MPEG-4. To provide multimedia applications with new functionalities, such as content-based interactivity and scalability, the new video coding standard MPEG-4 relies on a content-based representation. To take advantage of these functionalities, a prior decomposition of sequences into semantically meaningful, physical objects is required. We formulate this problem as one of separating foreground objects from the background based on motion information. For the object of interest, a two-dimensional binary model is derived and tracked throughout the sequence. The model points consist of edge pixels detected by the Canny operator. To accommodate rotation and changes in shape of the tracked object, the model is updated every frame. These binary models then guide the actual VOP extraction. Due to the excellent edge localization properties of the Canny operator, the resulting VOP contours are very accurate. Both the model initialization and update stage exploit motion information. The main assumption underlying our approach is the existence of a dominant global motion that can be assigned to the background. Areas that do not follow this background motion indicate the presence of independently moving physical objects. Two methods to identify such objects are presented. The first one employs a morphological motion filter with a new filtering criterion that measures the deviation of the locally estimated optical flow from the corresponding global motion. The second method computes a change detection mask by taking the difference between consecutive frames. The first version is more suitable for sequences involving little motion, whereas the second version is stronger at dealing with fast moving objects.
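The second method above, a change detection mask from consecutive frame differences, can be sketched in its most basic form. The sketch assumes a static background (no global motion compensation, which the paper relies on) and a fixed threshold; the function name, threshold, and frames are hypothetical.

```python
def change_detection_mask(prev_frame, curr_frame, threshold=20):
    """Mark pixels whose absolute gray-level change exceeds the threshold."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]

# Hypothetical 3x4 gray-level frames: a bright object appears in the middle.
prev_frame = [
    [50, 50, 50, 50],
    [50, 50, 50, 50],
    [50, 50, 50, 50],
]
curr_frame = [
    [52, 49, 50, 51],
    [50, 200, 210, 50],
    [48, 50, 50, 50],
]
mask = change_detection_mask(prev_frame, curr_frame)
```

Small fluctuations from noise stay below the threshold, so only the two pixels covered by the new object are flagged.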
Progressive mesh-based coding of arbitrary-shaped video objects
Corinne Le Buhan Jordan, Touradj Ebrahimi, Murat Kunt
While the emerging MPEG-4 standard has raised the need for efficient object-based compression, the future MPEG-7 standard motivates further research in the field of progressive, quality-scalable, and semantic representations for indexing and retrieval applications in particular. In this paper a mesh-based video compression scheme is proposed that integrates shape, motion and texture representations in a consistent way and provides a content-based, quality-scalable, separate bitstream syntax. A complete video compression scheme is designed based on a content-based triangular mesh model combined with a progressive geometrical shape representation. In this context, different node motion estimation methods are discussed as well as residual texture representation by means of transform coding. Lastly, the adaptation of such a mesh-based video representation to achieve progressive compression is also investigated.
Region-based color image segmentation scheme
Nicolaos Ikonomakis, Konstantinos N. Plataniotis, Anastasios N. Venetsanopoulos
A color image segmentation technique is presented for use in coding and/or compression of video-conferencing sequences. The proposed technique utilizes the perceptual HSI (hue, saturation, intensity) color space. The effectiveness of the scheme is improved by first splitting the pixels in the image into chromatic and achromatic regions using a classification method. A region growing scheme is then applied to each of the sets of chromatic and achromatic pixels to segment the image. For the achromatic pixels a simple intensity difference metric is used. For the chromatic pixels three distance metrics were compared. Results are shown for three video-conferencing type images.
Fast and accurate moving object extraction technique for MPEG-4 object-based video coding
A fast and robust video segmentation technique is proposed to generate a coding-optimized binary object mask in this work. The algorithm exploits the color information in the L*u*v* space, and combines it with the motion information to separate moving objects from the background. A non-parametric gradient-based iterative color clustering algorithm, called the mean shift algorithm, is first employed to provide robust homogeneous color regions according to dominant colors. Next, moving regions are identified by a motion detection method, which is developed based on the frame intensity difference to circumvent the motion estimation complexity for the whole frame. Only moving regions are analyzed by a region-based affine motion model, and tracked to increase the temporal and spatial consistency of extracted objects. The final shape is optimized for MPEG-4 coding efficiency by using a variable bandwidth region boundary. The shape coding efficiency can be improved up to 30% with negligible loss of perceptual quality. The proposed system is evaluated for several typical MPEG-4 test sequences. It provides consistent and accurate object boundaries throughout the entire test sequences.
2D shape estimation for moving objects with a moving camera and cast shadows
Roland Mech, Juergen Stauder
The estimation of the 2D shape of moving objects in a video image sequence is required for many applications, e.g. for so-called content-based functionalities of ISO/MPEG-4, for object-based coding, and for automatic surveillance. Many real sequences are taken by a moving camera and show moving objects as well as their cast shadows. In this paper, an algorithm for 2D shape estimation for moving objects is presented that, for the first time, explicitly considers both a moving camera and moving cast shadows. The algorithm consists of five steps: Estimation and compensation of possibly apparent camera motion, detection of possibly apparent scene cuts, generation of a binary mask by detection of temporal signal changes after camera motion compensation, elimination of mask regions corresponding to moving cast shadows and uncovered background, and finally, adaptation of the mask to luminance edges of the current frame. For identification of moving cast shadows, three criteria evaluate static background edges, uniform change of illumination, and shadow penumbra. The proposed algorithm yields accurate segmentation results for sequences taken by a static or moving camera, in absence and in presence of moving cast shadows. Parts of this algorithm have been accepted for the informative part of the description of the forthcoming international standard ISO/MPEG-4.
Spatiotemporal segmentation
Cassandra T. Swain, Atul Puri
This paper presents a spatio-temporal approach to segmenting moving foreground from background based on focus, color, and motion. Focus and/or color are used to segment the foreground from the background spatially, and motion is used for temporal segmentation. Results indicate that moving foreground can be segmented from stationary foreground. Results also demonstrate that using both spatial and temporal information reduces the processed regions in the image.
3D motion estimation for articulated human templates using a sequence of stereoscopic image pairs
Sebastian Weik, Oliver Niemeyer
This contribution describes an approach towards 3D teleconferencing. Textured, 3D anthropomorphic models are used in a virtual environment to give the impression of physical closeness. The requirements for such a conferencing system are on the one hand textured, articulated 3D models of the conferees. For high realism a flexible deformation model has been integrated in the 3D models. On the other hand these models have to be animated in the virtual meeting room according to the motion parameters of the real conferees. Therefore motion estimation has to be performed. To avoid wiring of the persons this has to be done optically. In this approach a gradient based motion tracker has been implemented. No markers or optical tracking points are needed to extract the hierarchic motion parameters of the conferee. It works on a stereoscopic image sequence and employs the flexible, articulated anthropomorphic model of the conferee. The motion hierarchy of the articulated model is used to reduce the degrees of freedom and to make the estimation more robust.
Poster Presentations on Segmentation and Synthetic View
Vehicle detection and classification using robust shadow feature
Chae Whan Lim, Jong-Sun Park, Chang-Sup Lee, et al.
We propose an efficient vehicle detection and classification algorithm using a shadow-robust feature for electronic toll collection. The local correlation coefficient between wavelet-transformed input and reference images is used as such a feature, which takes advantage of textural similarity. The usefulness of the proposed feature is analyzed qualitatively by comparing the feature with the local variance of a difference image, and is verified by measuring the improvements in the separability of vehicles from shadowy or shadowless road for a real test image. Experimental results from field tests show that the proposed vehicle detection and classification algorithm performs well even under abrupt intensity changes due to the characteristics of the sensor and the occurrence of shadows.
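To illustrate why a local correlation coefficient tolerates shadows, the sketch below computes the Pearson correlation between a reference window and an input window. It is a simplified illustration only: it works on raw intensities rather than wavelet coefficients, and the function name `local_correlation` and the sample values are assumptions.

```python
import math


def local_correlation(reference, observed):
    """Pearson correlation coefficient between two equally sized windows.

    Windows are flat lists of intensities (assumed layout). Returns 0.0
    when either window is constant, since correlation is undefined there.
    """
    n = len(reference)
    mean_r = sum(reference) / n
    mean_o = sum(observed) / n
    cov = sum((r - mean_r) * (o - mean_o) for r, o in zip(reference, observed))
    var_r = sum((r - mean_r) ** 2 for r in reference)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_r == 0 or var_o == 0:
        return 0.0
    return cov / math.sqrt(var_r * var_o)


if __name__ == "__main__":
    road = [10, 20, 30, 40]
    shadowed_road = [5, 10, 15, 20]   # same texture, half the brightness
    vehicle = [40, 10, 35, 5]         # different texture
    print(local_correlation(road, shadowed_road))
    print(local_correlation(road, vehicle))
```

A shadow that scales brightness leaves the texture pattern, and hence the correlation, essentially unchanged, whereas a vehicle replaces the texture and lowers it.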
Robust region-merging technique for video sequences: spatiotemporal segmentation
Riccardo Leonardi, Pier Angelo Migliorati, Giuseppe Tofanicchio
The segmentation of video sequences into regions underlying a coherent motion is one of the most important processing steps in video analysis and coding. In this paper, we propose a reliability measure that indicates to what extent an affine motion model represents the motion of an image region. This reliability measure is then proposed as a criterion to coherently merge moving image regions in a Minimum Description Length (MDL) framework. To overcome the chicken-and-egg problem of region-based motion estimation and segmentation, the motion field estimation and the segmentation task are treated separately. After global motion compensation, a local motion field estimation is carried out starting from a translational motion model. Concurrently, an algorithm based on a Markov Random Field model provides an initial static image partition. The motion estimation and segmentation problem is then formulated in view of the MDL principle. A merging stage based on a directed weighted graph gives the final spatio-temporal segmentation. The simulation results show the effectiveness of the proposed algorithm.
Content-based object segmentation in video sequences
Constantinos Tsougarakis, Sethuraman Panchanathan
Motivated by the emerging video coding standard MPEG-4, this paper proposes a solution to address the problems of unsupervised object segmentation in images and video sequences, using color information. Although the human visual system is very sensitive to edge information, image segmentation is one of the most fundamental and yet complex tasks in image processing. The correct classification of different elements in a scene into different objects and the accurate extraction of their contours is crucial, since the performance of an object-tracking algorithm relies mainly on the segmentation results. In this paper we propose a novel algorithm for unsupervised object segmentation based on the combination of a region growing algorithm, clustering and a morphological opening by reconstruction operator using the inherent color information present in the image, in order to create a robust segmentation tool.
Motion segmentation and cloud tracking on noisy infrared image sequences
Ronan Fablet, Philippe Rostaing, Christophe Collet
Aerial surveillance is an issue of key importance for warship protection. In addition to radar systems, infrared surveillance sensors represent an interesting alternative for remote observation. In this paper, we study such a system and propose an original approach to the tracking of complex cloudy patterns in noisy infrared image sequences. We have paid particular attention to robustness with regard to perturbations likely to occur (noise, 'lining effects', etc.). Our approach relies on robust parametric motion estimation and an original regularization scheme that handles the appearance and the disappearance of objects in the scene. Numerous experiments performed on outdoor infrared image sequences underline the efficiency of the proposed method.
Selective tracking of stimuli by impulse Retina
Emmanuel Marilly, Alain Mercier, Christophe Coroyer, et al.
In the context of mobile robotics, we have developed a Foveal Visual Pre-processor (F.V.P.) that detects and extracts motion as well as shape. This F.V.P. performs selective tracking of stimuli. The main interest of this model, inspired by the vertebrate retina, is its response to stationary or moving stimuli: they can be distinguished according to both their shapes and velocities. This model is adaptive, and its multi-resolution characteristics allow the detection of a wide range of velocities. The developed model consists of three parts: the Retina module, which is in charge of the visual processing; the process module, which processes the neural signals and extracts the velocity vector and the spatial frequencies (i.e., the shape of the stimulus); and the control module, which directs the fovea onto the stimulus and tracks it. The results obtained from the analyses of scenes with, as a first step, a single moving object have validated the chosen control. One of the interests of our model is the possibility of selecting one stimulus and tracking it. The developed system has good precision and the object is always caught by the fovea. Indeed, like the biological retina, our sensor adapts to various conditions of illumination and is robust to noise.
Extraction of the front vehicle using projected disparity map
Nobuhiro Tsunashima, Masato Nakajima
To prevent rear-end collisions, it is necessary to measure the distance between a vehicle and the vehicle in front of it. In this paper we describe a new technique for measuring this distance using stereoscopic images. A vehicle is represented by object points that are at the same distance from the stereo cameras. Therefore, a disparity map calculated from the stereo images is projected in the ordinate direction. We call this map a 'Projected Disparity Map.' In the projected disparity map the disparities of the vehicle are translated into a straight line, so we can detect the front vehicle by extracting that straight line. To verify this method we applied it to stereoscopic images taken on an expressway. The experiments demonstrated the effectiveness of the method.
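One plausible reading of the projection described above is a per-column histogram of disparities: for each image column, count how many pixels take each disparity value. The sketch below follows that reading. The function name `projected_disparity_map`, the integer-valued disparities, and the sample data are assumptions made for illustration.

```python
def projected_disparity_map(disparity, max_disparity):
    """Project a disparity map along the vertical (ordinate) direction.

    `disparity` is a list of rows of integer disparities (assumed layout).
    The result has one row per disparity value and one column per image
    column; each cell counts the pixels in that column with that disparity.
    """
    width = len(disparity[0])
    projected = [[0] * width for _ in range(max_disparity + 1)]
    for row in disparity:
        for x, d in enumerate(row):
            if 0 <= d <= max_disparity:
                projected[d][x] += 1
    return projected


if __name__ == "__main__":
    # A "vehicle" with constant disparity 3 occupies the two middle columns.
    disparity = [
        [0, 0, 0, 0],
        [0, 3, 3, 0],
        [0, 3, 3, 0],
        [1, 3, 3, 1],
    ]
    for d, counts in enumerate(projected_disparity_map(disparity, 3)):
        print(d, counts)
```

Because all points on the vehicle share one disparity, their counts pile up in a single row of the projected map, which is the straight line the abstract refers to.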
New depth-cue-based algorithm for background-foreground segmentation
Kuntal Sengupta, Liyanage C. De Silva
In this paper, we present a method for segmenting the interesting foreground from the background using a novel depth-cue-based algorithm. The inputs to the algorithm are two pairs of images: the first is the stereo pair corresponding to the background only (called the background pair), and the second is the stereo pair taken when the object(s) of interest are present in front of the background (called the composite pair). Since we use stereo images rather than monocular images, we can utilize the fact that the interesting foreground has a depth/disparity value which is different from the corresponding values for the background. Under poor lighting conditions, or when lighting conditions change continuously, it may be quite unreliable to extract the foreground by subtracting the composite image from its background counterpart and then thresholding. Also, the camera noise is usually unknown. Instead, we compute the disparity image corresponding to the background stereo pair, and validate the disparity values for the composite pair. A point belonging to the foreground will certainly have a higher disparity value. Based on the novel depth-cue-based measure introduced in this paper, it would fail the validation process and hence would be classified as a foreground pixel. The other notable point is that the computationally expensive stereo matching process is performed offline, and hence the segmentation process is quite fast.
Capturing wide-view images with uncalibrated cameras
Vincent van de Laar, Kiyoharu Aizawa, Mitsutoshi Hatori
This paper describes a scheme to capture a wide-view image using a camera setup with uncalibrated cameras. The setup is such that the optical axes are pointed in divergent directions. The direction of view of the resulting image can be chosen freely in any direction between these two optical axes. The scheme uses eight-parameter perspective transformations to warp the images, the parameters of which are obtained by using a relative orientation algorithm. The focal length and scale factor of the two images are estimated by using Powell's multi-dimensional optimization technique. Experiments on real images show the accuracy of the scheme.
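For readers unfamiliar with the eight-parameter perspective transformation, the sketch below applies one to a single point using the common form x' = (a x + b y + c) / (g x + h y + 1), y' = (d x + e y + f) / (g x + h y + 1). The parameter ordering and the function name `warp_point` are assumptions; the abstract does not state the authors' parameterization.

```python
def warp_point(params, x, y):
    """Apply an eight-parameter perspective (projective) transform to a point.

    `params` is (a, b, c, d, e, f, g, h) in the assumed convention
    x' = (a*x + b*y + c) / w, y' = (d*x + e*y + f) / w, w = g*x + h*y + 1.
    """
    a, b, c, d, e, f, g, h = params
    w = g * x + h * y + 1.0
    if w == 0:
        raise ValueError("point maps to infinity under this transform")
    return ((a * x + b * y + c) / w, (d * x + e * y + f) / w)


if __name__ == "__main__":
    translation = (1, 0, 5, 0, 1, -3, 0, 0)     # pure shift by (5, -3)
    perspective = (1, 0, 0, 0, 1, 0, 0.001, 0)  # mild foreshortening along x
    print(warp_point(translation, 2, 4))
    print(warp_point(perspective, 100, 50))
```

With g = h = 0 the transform reduces to an affine map; the two extra parameters are what allow images taken along divergent optical axes to be brought into a common view.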
Spherical visual system for real-time virtual reality and surveillance
Su-Shing Chen
A spherical visual system has been developed for full-field, web-based surveillance, virtual reality, and roundtable video conferencing. The hardware is a CycloVision parabolic lens mounted on a video camera. The software was developed at the University of Missouri-Columbia. The mathematical model was developed by Su-Shing Chen and Michael Penna in the 1980s. The parabolic image, capturing the full (360 degrees) hemispherical field of view (except the north pole), is transformed into the spherical model of Chen and Penna. In the spherical model, images are invariant under the rotation group and are easily mapped to the image plane tangent to any point on the sphere. The projected image is exactly what the usual camera produces at that angle. Thus a real-time full spherical field video camera is developed by using two parabolic lenses.
Morphological image segmentation preserving semantic object shapes
Hyun Sang Park, Jong Beom Ra
As an attempt to achieve realistic image segmentation, an efficient segmentation algorithm is proposed. The proposed method aims to represent homogeneous visual objects with few regions while preserving the semantic contents of an image as well as possible. This strategy is valid, since homogeneous visual objects occupy most parts of the entire image domain in a typical 'head and shoulder' video sequence and the raggedness within them is much more objectionable than in complex visual objects. For this objective, we adopt a bottom-up approach using spatial domain information only. For precise initial image segmentation, an efficient marker extraction algorithm utilizing marker clusters is employed. An ordered and classified region-merging algorithm is then suggested and applied to reduce the number of redundant regions within visual objects. Finally, we eliminate redundant small regions heuristically, according to their topological locations. The experimental results show the realistic segmentation of an image with a marginal number of regions. In particular, homogeneous visual objects are represented with a few regions. Thus, the proposed method is highly applicable to high-level computer vision problems as well as object-based video coders.
Mutual conversion between 3D images and light rays
Keisuke Takeuchi, Takeshi Naemura, Hiroshi Harashima
In the field of 3-D imaging, several kinds of input/output methods have been developed and are still making rapid progress. Considering this situation, it is desirable that the format of 3-D data be independent of input/output methods. For this purpose, ray-based representation has been proposed. In this method, 3-D physical space is represented by rays which propagate in the space. If all light rays are completely described, 3-D space can be reproduced correctly from light ray data. However, we can only obtain sample data of light rays, e.g. multiview images. Moreover, the parameters which represent the position and direction of light rays are also sampled. If the sampling of ray parameters is not proper, it is probable that the original images are not reproduced correctly from the light rays. In this paper, we discuss the effects of the sampling in mutual conversion between multiview images and light rays. Furthermore, we present sampling methods to reproduce original images correctly for several camera arrangements.
Lossless compression
Transform-based lossless coding
Xin Li, Michael T. Orchard
Recent developments in the implementation of integer-to-integer transforms provide a new basis for transform-based lossless coding. Although it shares many features with popular transform-based lossy coding, there are also a few discrepancies between them because of different coding rules. In this paper we discuss several important discrepancies, including the evaluation of decorrelating performance, the implementation of the transform, and the criteria for choosing a transform. We aim at a better understanding of applying linear transforms in the lossless coding scenario.
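As a concrete instance of an integer-to-integer transform, the sketch below implements the well-known S-transform (a reversible integer version of the two-point Haar transform) in lifting form. It is offered as a generic illustration and is not claimed to be the transform studied by the authors; the function names are hypothetical.

```python
def s_transform_forward(a, b):
    """Map two integers to an integer (low, high) pair, reversibly.

    high is the difference a - b; low equals floor((a + b) / 2).
    Python's >> on ints is an arithmetic shift, so it floors for negatives.
    """
    high = a - b
    low = b + (high >> 1)
    return low, high


def s_transform_inverse(low, high):
    """Undo s_transform_forward exactly, with no rounding loss."""
    b = low - (high >> 1)
    a = b + high
    return a, b


if __name__ == "__main__":
    low, high = s_transform_forward(5, 2)
    print(low, high)
    print(s_transform_inverse(low, high))
```

The inverse subtracts exactly the rounded quantity that the forward step added, which is why the rounding never destroys information. That property is what makes such transforms usable for lossless coding.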
Pel-adaptive lossless predictive coding based on image segmentation
Lossless image coding, which can recover the original image from its compressed signal, is required in the fields of medical imaging, fine arts, printing, and any application demanding high image fidelity. MAR (Multiplicative Autoregressive) predictive coding is an efficient lossless compression scheme. In this method, prediction coefficients are fixed within each block of the subdivided image and cannot be adapted to local statistics efficiently. Furthermore, side-information such as prediction coefficients must be transmitted to the decoder for each block. In this paper, we propose an improved MAR coding method based on image segmentation. The proposed MAR predictor can be adapted to local statistics of the image efficiently. This coding method does not need to transmit side-information to the decoder at each pixel. The effectiveness of the proposed model is shown through experiments using SHD images.
Comparison of lossless compression techniques for prepress color images
Steven Van Assche, Koen N.A. Denecker, Wilfried R. Philips, et al.
In the pre-press industry color images have both a high spatial and a high color resolution. Such images require a considerable amount of storage space and impose long transmission times. Data compression is desired to reduce these storage and transmission problems. Because of the high quality requirements in the pre-press industry only lossless compression is acceptable. Most existing lossless compression schemes operate on gray-scale images. In this case the color components of color images must be compressed independently. However, higher compression ratios can be achieved by exploiting inter-color redundancies. In this paper we present a comparison of three state-of-the-art lossless compression techniques which exploit such color redundancies: IEP (Inter-color Error Prediction) and a KLT-based technique, which are both linear color decorrelation techniques, and Interframe CALIC, which uses a non-linear approach to color decorrelation. It is shown that these techniques are able to exploit color redundancies and that color decorrelation can be done effectively and efficiently. The linear color decorrelators provide a considerable coding gain (about 2 bpp) on some typical prepress images. The non-linear interframe CALIC predictor does not yield better results, but the full interframe CALIC technique does.
Lossless compression scheme of superhigh-definition images by partially decodable Golomb-Rice code
Shigeo Kato, Madoka Hasegawa, Muling Guo
Multimedia communication systems using super high definition (SHD) images are widely desired in various fields such as medical imaging, digital museums, digital libraries and so on. There are, however, many requirements in SHD image communication systems, because of the high pixel accuracy and high resolution of an SHD image. We considered the mandatory functions that should be realized in SHD image application systems, summarized as three items, i.e., reversibility, scalability and progressibility. This paper proposes an SHD image communication system based on reversibility, scalability and progressibility. To realize reversibility and progressibility, a lossless wavelet transform coding method is introduced as a coding model. To realize scalability, a partially decodable entropy code is proposed. In particular, we focus on a partially decodable coding method for realizing the scalability function in this paper.
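For background, a standard Golomb-Rice code with parameter k writes the quotient of a non-negative integer by 2^k in unary and the remainder in k binary digits. The sketch below shows only this standard code; the partially decodable variant proposed by the authors is not described in the abstract and is not reproduced here. Function names and the bit-string representation are assumptions.

```python
def rice_encode(value, k):
    """Encode a non-negative integer as a Golomb-Rice codeword (bit string).

    The quotient value >> k is sent in unary (ones terminated by a zero),
    followed by the k low-order bits of the value.
    """
    if value < 0 or k < 0:
        raise ValueError("value and k must be non-negative")
    bits = "1" * (value >> k) + "0"
    if k:
        bits += format(value & ((1 << k) - 1), "0{}b".format(k))
    return bits


def rice_decode(bits, k):
    """Decode one Golomb-Rice codeword located at the start of `bits`."""
    quotient = bits.index("0")  # number of leading ones
    remainder = int(bits[quotient + 1:quotient + 1 + k], 2) if k else 0
    return (quotient << k) | remainder


if __name__ == "__main__":
    codeword = rice_encode(9, 2)
    print(codeword, rice_decode(codeword, 2))
```

Small values get short codewords, which suits the sharply peaked residuals that a lossless wavelet or predictive coder typically produces.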
Special Session: SHD and Electronic Cinema
HDTV versus electronic cinema
We are on the brink of transforming the movie theatre with electronic cinema. Technologies are converging to make true electronic cinema, with a 'film look,' possible for the first time. In order to realize the possibilities, we must leverage current technologies in video compression, electronic projection, digital storage, and digital networks. All these technologies have only recently improved sufficiently to make their use in the electronic cinema worthwhile. Video compression, such as MPEG-2, is designed to overcome the limitations of video, primarily limited bandwidth. As a result, although HDTV offers a serious challenge to film-based cinema, it falls short in a number of areas, such as color depth. Freed from the constraints of video transmission, and using the recently improved technologies available, electronic cinema can move beyond video. Although movies will have to be compressed for some time, what is needed is a concept of 'cinema compression,' rather than video compression. Electronic cinema will open up vast new possibilities for viewing experiences at the theater, while at the same time offering up the potential for new economies in the movie industry.
4K x 2K pixel color video pickup system
Masayuki Sugawara, Kohji Mitani, Hiroshi Shimamoto, et al.
This paper describes the development of an experimental super-high-definition color video camera system. During the past several years there has been much interest in super-high-definition images as the next-generation image media. One of the difficulties in implementing a super-high-definition motion imaging system is constructing the image-capturing section (camera). Even state-of-the-art semiconductor technology cannot realize an image sensor with enough pixels and a high enough output data rate for super-high-definition images. The present study is an attempt to fill the gap in this respect. The authors intend to solve the problem by using a new imaging method in which four HDTV sensors are attached to new color separation optics so that their pixel sample pattern forms a checkerboard pattern. A series of imaging experiments demonstrates that this technique is an effective approach to capturing super-high-definition moving images in the present situation where no image sensors exist for such images.
Super-high-definition digital movie system
Tatsuya Fujii, Mitsuru Nomura, Junji Suzuki, et al.
We have developed a digital movie communication system for SHD images. The system can transmit extra-high-quality digital full-color movies of 2048 by 2048 pixel resolution using 622 Mbps ATM transmission systems, and display them at a frame rate of 60 frames per second. The system consists of an image data server, ATM transport interfaces, a real-time JPEG decoder and an SHD image display frame buffer. The motion SHD images have such high quality that the system has been designed for professional movie applications such as tele-medicine, education and commercial movie theaters. The image data server is constructed on a workstation to store motion image data and transmit them to the decoder via 622 Mbps ATM links. The real-time decoder is a parallel DSP system that decodes the received movie data stream into raw SHD movie images. The frame buffer is connected to the decoder via optical links offering a total of 12 Gbps to show complete movie images on CRTs or large-size projectors.
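A quick back-of-the-envelope calculation helps relate the figures quoted above. The sketch assumes 24 bits per pixel for "full color" and treats the 622 Mbps link rate as if it were all payload; neither assumption comes from the abstract, so the resulting compression ratio is only indicative.

```python
WIDTH = 2048                 # pixels, from the abstract
HEIGHT = 2048                # pixels, from the abstract
FRAMES_PER_SECOND = 60       # from the abstract
BITS_PER_PIXEL = 24          # assumption: 8 bits for each of three colors
LINK_BPS = 622_000_000       # nominal ATM link rate, overhead ignored

# Uncompressed video rate in bits per second.
raw_bps = WIDTH * HEIGHT * BITS_PER_PIXEL * FRAMES_PER_SECOND

# Minimum compression ratio needed to fit the raw stream into one link.
min_compression_ratio = raw_bps / LINK_BPS

if __name__ == "__main__":
    print("raw rate: {:.2f} Gbps".format(raw_bps / 1e9))
    print("compression needed: about {:.1f}:1".format(min_compression_ratio))
```

Under these assumptions the uncompressed stream is roughly 6 Gbps, which is consistent with the decoder needing a multi-gigabit optical connection to the frame buffer while the network side relies on JPEG compression.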
Next-generation tele-immersive devices for desktop transoceanic collaboration
Andrew Johnson, Jason Leigh, Thomas A. DeFanti, et al.
Tele-Immersion is the combination of collaborative virtual reality and audio/video teleconferencing. With a new generation of high-speed international networks and high-end virtual reality devices spread around the world, effective trans-oceanic tele-immersive collaboration is now possible. But in order to make these shared virtual environments more convenient workspaces, a new generation of desktop display technology is needed.
High-description image acquisition for digital archiving of rare books
Masaaki Kashimura, Toshifumi Nakajima, Taizo Maeda, et al.
In this paper, we first give an outline of the activity of the Humanities Media Interface (HUMI) Project. This project was established by Keio University for the purpose of digitally archiving rare books held in the Keio University Library, and of realizing a research-oriented digital library. We then introduce our way of acquiring super-high-definition images of rare books, and propose an image compensation method for obtaining a frontal view of a page using the 3-D information extracted from the shape of the top line of the page area depicted in the image. Our approach to acquiring a higher resolution image by joining close-up partial images of a page is also introduced. The proposed image adjustment method is extended to partial images of a page as a preprocess before joining them together. In the experiment, well-adjusted and joined page images could be obtained.
Motion Estimation
Block-matching subpixel motion estimation from noisy undersampled frames: an empirical performance evaluation
Sean Borman, Mark A. Robertson, Robert L. Stevenson
The performance of block-matching sub-pixel motion estimation algorithms under the adverse conditions of image undersampling and additive noise is studied empirically. This study is motivated by the requirement for reliable sub-pixel accuracy motion estimates for motion compensated observation models used in multi-frame super-resolution image reconstruction. Idealized test functions which include translational scene motion are defined. These functions are sub-sampled and corrupted with additive noise and used as source data for various block-matching sub-pixel motion estimation techniques. Motion estimates computed from this data are compared with the a-priori known motion which enables an assessment of the performance of the motion estimators considered.
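The abstract does not name the specific estimators that were compared. As one common example of block-matching with sub-pixel refinement, the sketch below finds the best integer shift of a 1D block by the sum of absolute differences (SAD) and then refines it by fitting a parabola through the three errors around the minimum. All function names and the 1D setting are assumptions made for brevity.

```python
def sad(block, signal, start):
    """Sum of absolute differences between `block` and `signal` at `start`."""
    return sum(abs(b - signal[start + i]) for i, b in enumerate(block))


def parabolic_offset(e_minus, e_zero, e_plus):
    """Vertex of the parabola through errors at shifts -1, 0, +1."""
    denom = e_minus - 2 * e_zero + e_plus
    if denom <= 0:           # flat or non-convex error surface: no refinement
        return 0.0
    return 0.5 * (e_minus - e_plus) / denom


def estimate_shift(block, signal, origin, search):
    """Estimate the shift of `block` relative to `origin` within +/- `search`.

    The caller must ensure every candidate window lies inside `signal`.
    """
    errors = {d: sad(block, signal, origin + d)
              for d in range(-search, search + 1)}
    best = min(errors, key=errors.get)
    if -search < best < search:
        return best + parabolic_offset(errors[best - 1], errors[best],
                                       errors[best + 1])
    return float(best)


if __name__ == "__main__":
    signal = [0, 0, 0, 1, 5, 9, 5, 1, 0, 0, 0, 0]
    block = [1, 5, 9, 5, 1]   # located at index 3, i.e. origin 2 plus 1
    print(estimate_shift(block, signal, origin=2, search=2))
```

Undersampling and additive noise distort the error values around the minimum, which is why refinements of this kind degrade under the conditions the study examines.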
Analysis of subpixel motion estimation
The use of interpolation filters in a motion estimator to realize sub-pixel shifts may lead to unintentional preferences for some velocities over others. In this paper we analyze this phenomenon, focusing on the case of interlaced image data, where the problem leads to the most pronounced errors. Linear interpolators, applied either directly or indirectly using generalized sampling, are discussed. The conclusions are applicable to any type of motion estimator.
New metric to detect wipes and other gradual transitions in video
Stuart J. Golin
This paper proposes a metric sensitive to evolutionary changes in video, responding strongly to systematic changes, and weakly to 'random' object motion. The metric was developed to assist in the detection of gradual transitions between two video shots, which is the focus of the paper. The metric presupposes quantities that vary monotonically with the relative fraction of two video shots, such as the bins of a color histogram for an 'ideal' wipe, or the pixel values for an 'ideal' dissolve. Common dissimilarity measures based on these quantities, such as their L2-norm, have a very useful property: Dnet, the net dissimilarity between two frames in the transition region, is much larger than Dcum, the cumulative dissimilarity between all adjacent frames between those two frames. The proposed metric is the Video-Evolution Ratio (VER): Dnet/Dcum. The VER is derived for some ideal cases. In this paper, the VER is based on histograms. As such, it is particularly sensitive to wipes, but it is also sensitive to most dissolves, fades, and cuts. It can detect gradual transitions between very similar shots.
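The ratio Dnet/Dcum can be sketched on color histograms as follows. The choice of the squared L2 distance is an assumption made here: with the plain L2 norm the triangle inequality would cap the ratio at 1, whereas the squared distance lets a steady monotonic change produce a ratio that grows with the number of frames. Function names and the toy histograms are hypothetical.

```python
def squared_l2(h1, h2):
    """Squared Euclidean distance between two histograms of equal length."""
    return sum((a - b) ** 2 for a, b in zip(h1, h2))


def video_evolution_ratio(histograms):
    """Dnet / Dcum over a window of per-frame histograms.

    Dnet compares the first and last frame; Dcum sums the dissimilarity
    between all adjacent frames in the window. Returns 0.0 for a static
    window, where Dcum is zero.
    """
    d_net = squared_l2(histograms[0], histograms[-1])
    d_cum = sum(squared_l2(a, b) for a, b in zip(histograms, histograms[1:]))
    return d_net / d_cum if d_cum > 0 else 0.0


if __name__ == "__main__":
    # Idealized wipe: bin counts move steadily from shot A to shot B.
    wipe = [[4, 0], [3, 1], [2, 2], [1, 3], [0, 4]]
    # Back-and-forth change with no net evolution.
    jitter = [[4, 0], [3, 1], [4, 0], [3, 1], [4, 0]]
    print(video_evolution_ratio(wipe))
    print(video_evolution_ratio(jitter))
```

Systematic change accumulates in Dnet, while changes that cancel out contribute only to Dcum, so the ratio separates gradual transitions from 'random' motion.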
Segmentation-based motion estimation for video processing using object-based detection of motion types
In this paper, novel techniques for image segmentation and explicit object-matching-based motion estimation are presented. The principal aims of this work are to reconstruct motion-compensated images without introducing significant artifacts and to introduce an explicit object-matching and noise-robust segmentation technique which has low computational cost and regular operations. A main feature of the new motion estimation technique is its tolerance of image segmentation errors such as the fusion or separation of objects. In addition, motion types inside recognized objects are detected. Depending on the detected object motion types, either 'object/unique motion-vector' relations or 'object/several motion-vectors' relations are established. For example, in the case of translation and rotation, objects are divided into different regions and a 'region/one motion vector' relation is achieved using interpolation techniques. Further, the suitability (computational cost) of the proposed methods for online applications (e.g. image interpolation) is shown. Experimental results are used to evaluate the performance of the proposed methods and to compare them with block-based motion estimation techniques. At this stage of our work, the segmentation part is based on intensity and contour information (scalar segmentation). For further stabilization of the segmentation, and hence of the estimation process, the integration of other statistical properties of objects (e.g. texture) (vector segmentation) is the subject of our current research.
Fast motion estimation algorithm for MPEG-2 video encoding
Minhua Zhou
In this paper a fast motion estimation algorithm is proposed for MPEG-2 video encoding. This algorithm is based on a hybrid use of the block matching technique and the gradient technique. To estimate a motion vector, the block matching technique is used to select the initial vector from the spatially neighboring macroblocks and the temporally co-located macroblock; the gradient technique is then applied to obtain the final vector by a pixel-recursive refinement of the initial vector. In addition, an adaptive scheme to determine the search range is developed to enable low-latency motion estimation when the gradient technique is involved. The simulation results reveal that the proposed algorithm achieves high speed while preserving the coding efficiency compared with the full search block matching technique.
Fast motion estimation using circular zonal search
Block-based motion estimation is widely used for exploiting temporal correlation within an image sequence. However, the full search algorithm, which is considered to be optimal, is computationally intensive. In this paper, a new fast motion estimation method for video coding is presented. It will be shown that the new algorithm is not only much faster than traditional algorithms, but in some cases can achieve much better visual quality, even compared with the 'optimal' full search algorithm.
Special Session: Multimedia Indexing, Retrieval, and Presentation
Binary Format for Scene (BIFS): combining MPEG-4 media to build rich multimedia services
Julien Signes
In this paper, we analyze the design concepts and some technical details behind the MPEG-4 standard, particularly the scene description layer, commonly known as the Binary Format for Scene (BIFS). We show how MPEG-4 may ease multimedia proliferation by offering a unique, optimized multimedia platform. Lastly, we analyze the potential of the technology for creating rich multimedia applications on various networks and platforms. An e-commerce application example is detailed, highlighting the benefits of the technology. Compression results show how rich applications may be built even on very low bit rate connections.
Self-describing schemes for interoperable MPEG-7 multimedia content descriptions
Seungyup Paek, Ana Belen Benitez, Shih-Fu Chang
In this paper, we present the self-describing schemes for interoperable image/video content descriptions, which are being developed as part of our proposal to the MPEG-7 standard. MPEG-7 aims to standardize content descriptions for multimedia data. The objective of this standard is to facilitate content-focused applications like multimedia searching, filtering, browsing, and summarization. To ensure maximum interoperability and flexibility, our descriptions are defined using the eXtensible Markup Language (XML), developed by the World Wide Web Consortium. We demonstrate the feasibility and efficiency of our self-describing schemes in our MPEG-7 testbed. First, we show how our scheme can accommodate image and video descriptions that are generated by a wide variety of systems. Then, we present two systems being developed that are enabled and enhanced by the proposed approach for multimedia content descriptions. The first system is an intelligent search engine with an associated expressive query interface. The second system is a new version of MetaSEEk, a metasearch system for mediation among multiple search engines for audio-visual information.
Hierarchical video summarization
Krishna Ratakonda, M. Ibrahim Sezan, Regis J. Crinon
We address the problem of key-frame summarization of video in the absence of any a priori information about its content. This is a common problem that is encountered in home videos. We propose a hierarchical key-frame summarization algorithm where a coarse-to-fine key-frame summary is generated. A hierarchical key-frame summary facilitates multi-level browsing where the user can quickly discover the content of the video by accessing its coarsest but most compact summary and then view a desired segment of the video in increasingly more detail. At the finest level, the summary is generated on the basis of color features of video frames, using an extension of a recently proposed key-frame extraction algorithm. The finest level key-frames are recursively clustered using a novel pairwise K-means clustering approach with a temporal consecutiveness constraint. We also address summarization of MPEG-2 compressed video without fully decoding the bitstream. Finally, we propose efficient mechanisms that facilitate decoding the video when the hierarchical summary is utilized in browsing and playback of video segments starting at selected key-frames.
Spatiotemporal indexing of video in the wavelet domain
Automatic video indexing is an important feature in video database applications. Several techniques have appeared in recent literature for detecting object motion and camera operations present in a video. However, most of these techniques operate in the spatial domain. Since video is likely to be stored in compressed form, it is crucial to develop detection techniques which can operate on the compressed data. The wavelet transform has recently emerged as a powerful tool for efficient compression and indexing. In this paper, we present a technique for temporal indexing using multiresolution motion vectors in a wavelet framework. We note that several approaches for indexing the spatial content of video have already been proposed in the literature. A combination of spatial and temporal indices constitutes a spatio-temporal index of video in the wavelet domain.