Image Coding I
Fast-lapped transform for image coding
Ricardo L. de Queiroz,
Trac D. Tran
This paper introduces a class of linear-phase lapped biorthogonal transforms with basis functions of variable length. A lattice is used to enforce both the linear-phase and perfect-reconstruction properties, as well as to provide a fast and efficient transform implementation for image coding applications. In the proposed formulation, which we call the fast lapped transform (FLT), the higher-frequency filters (basis functions) are those of the DCT, which are compact in order to limit ringing. The lower-frequency filters (basis functions) are overlapped to represent smooth signals while avoiding blocking artifacts. Most of the FLT computation is spent in the DCT stage, which can be implemented through fast algorithms, while only a few more operations are needed to implement the extra stages. For example, compared to the DCT, an FLT with good performance can be implemented with only 8 extra additions and 6 extra multiplications for an 8-sample block. Yet, image coding examples show that the FLT is far superior to the DCT and close to the 9/7-tap biorthogonal wavelet in subjective coding performance.
Online rate control in digital cameras for near-constant distortion based on minimum/maximum criterion
We address the problem of online rate control in digital cameras, where the goal is to achieve near-constant distortion for each image. Digital cameras usually have a pre-determined number of images that can be stored in the given memory size and require limited time delay and constant quality for each image. Due to time-delay restrictions, each image should be stored before the next image is received. Therefore, we need to define an online rate control that is based on the amount of memory used by previously stored images, the current image, and the estimated rate of future images. In this paper, we propose an algorithm for online rate control in which an adaptive reference, a 'buffer-like' constraint, and a minimax criterion (as a distortion metric to achieve near-constant quality) are used. The adaptive reference is used to estimate future images, and the 'buffer-like' constraint is required to keep enough memory for future images. We show that using our algorithm to select the online bit allocation for each image in a randomly given set of images provides near-constant quality. Also, we show that our result is near-optimal when a minimax criterion is used, i.e., it achieves a performance close to that obtained by applying an off-line rate control that assumes exact knowledge of the images. Suboptimal behavior is only observed in situations where the distribution of images is not truly random (e.g., if most of the 'complex' images are captured at the end of the sequence). Finally, we propose a T-step delay rate control algorithm and, using the result of the 1-step delay rate control algorithm, show that it removes the suboptimal behavior.
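As a rough illustration of the 1-step-delay idea, here is a minimal Python sketch (not the authors' implementation); the linear complexity-scaled rate model and the running-mean adaptive reference are assumptions made for the example.

```python
import numpy as np

def online_bit_allocation(image_complexities, total_budget, n_images):
    """Hypothetical sketch of 1-step-delay online rate control.

    Each image is allocated bits before the next arrives, using only
    the memory already consumed and an adaptive reference that stands
    in for the (unknown) future images.
    """
    used = 0.0
    allocations = []
    reference = float(image_complexities[0])  # crude initial reference
    for k, c in enumerate(image_complexities):
        remaining_images = n_images - k
        remaining_bits = total_budget - used
        # 'Buffer-like' constraint: allocate around the average share of
        # the remaining budget, scaled by relative complexity.
        fair_share = remaining_bits / remaining_images
        r = fair_share * (c / reference)
        r = min(r, remaining_bits)          # keep memory for future images
        allocations.append(r)
        used += r
        # Adaptive reference: running mean of complexities seen so far.
        reference = (reference * (k + 1) + c) / (k + 2)
    return allocations

budget = 8e6  # bits of memory, assumed
complexities = np.random.default_rng(0).uniform(0.5, 2.0, 20)
bits = online_bit_allocation(complexities, budget, len(complexities))
print(f"spent {sum(bits)/budget:.1%} of memory over {len(bits)} images")
```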
Image Coding II
Method for variable quantization in JPEG for improved perceptual quality
One of the main limitations of the ubiquitous JPEG standard is the fact that visible artifacts can often appear in the decompressed image at moderate to high compression ratios. This is especially true for parts of the image containing graphics, text, or some other such synthesized component. Artifacts are also common in smooth regions and in image blocks containing a single dominant edge. One approach to deal with this problem is to change the 'coarseness' of quantization as a function of image characteristics in the block being compressed. The latest extension of the JPEG standard, called JPEG Part-3, provides the necessary syntax for supporting this process by means of scale factors that can be used to uniformly vary the quantization step sizes on a block-by-block basis. However, the standard does not recommend any specific technique or algorithm for determining scale factors. This paper proposes a simple algorithm for computing scale factors for the quantization tables used in the JPEG compression standard. The algorithm classifies each image block according to its activity and type, which index into look-up tables that provide the scale factor to be used for the block. The look-up tables are designed experimentally to yield perceptually lossless compression for the target device under consideration.
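A minimal Python sketch of the classify-and-look-up step described above; the activity/type thresholds and the scale-factor table entries are invented placeholders, since the paper designs its tables experimentally per target device.

```python
import numpy as np

# Hypothetical look-up table: (activity class, block type) -> scale factor.
SCALE_LUT = {
    ("low",  "smooth"): 0.5, ("low",  "edge"): 0.6, ("low",  "texture"): 0.8,
    ("high", "smooth"): 0.7, ("high", "edge"): 0.9, ("high", "texture"): 1.4,
}

def classify_block(block):
    """Classify an 8x8 block by activity (variance) and type."""
    activity = "high" if np.var(block) > 100.0 else "low"
    gy, gx = np.gradient(block.astype(float))
    grad = np.abs(gy) + np.abs(gx)
    if grad.mean() < 2.0:
        btype = "smooth"
    elif grad.max() > 8.0 * max(grad.mean(), 1e-6):
        btype = "edge"      # a single dominant edge
    else:
        btype = "texture"
    return activity, btype

def scale_factor(block):
    return SCALE_LUT[classify_block(block)]

block = np.random.default_rng(1).integers(0, 255, (8, 8))
print("scale factor:", scale_factor(block))
```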
Reconstruction of bilevel images from a low-quality JPEG
Mihai Sipitca,
David W. Gillman,
Lyman Hurd
Using concepts from projection onto convex sets (POCS), we develop algorithms for reconstructing bilevel images from a low quality JPEG. The constraint that the image is bilevel is not convex, so simple iteration does not suffice, but we develop techniques for enforcing the constraints gradually (soft-thresholding) and for using randomness to improve convergence (random thresholding). Our final algorithm succeeded in finding a bilevel image satisfying our constraints in every tested case.
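The alternation the abstract describes can be sketched in a few lines. The toy below assumes a single uniform quantizer step Q instead of the full JPEG pipeline, and the specific soft/random thresholding schedule is an illustrative guess, not the authors' exact rule.

```python
import numpy as np
from scipy.fft import dctn, idctn

Q = 40.0  # assumed uniform quantization step

def project_quantization(img, qcoeffs):
    """Projection onto the convex set of images whose 8x8 DCT coefficients
    lie in the quantization bins observed in the decoded JPEG."""
    out = np.empty_like(img, dtype=float)
    for y in range(0, img.shape[0], 8):
        for x in range(0, img.shape[1], 8):
            c = dctn(img[y:y+8, x:x+8], norm="ortho")
            lo = (qcoeffs[y:y+8, x:x+8] - 0.5) * Q
            hi = (qcoeffs[y:y+8, x:x+8] + 0.5) * Q
            out[y:y+8, x:x+8] = idctn(np.clip(c, lo, hi), norm="ortho")
    return out

def soft_random_threshold(img, strength, rng):
    """Push pixels toward the bilevel set {0, 255}; the bilevel constraint
    is not convex, so the pull is applied gradually (soft-thresholding)
    with random dither to aid convergence (random thresholding)."""
    thresh = 127.5 + rng.uniform(-20, 20, img.shape) * (1 - strength)
    target = np.where(img > thresh, 255.0, 0.0)
    return (1 - strength) * img + strength * target

rng = np.random.default_rng(0)
truth = (rng.random((32, 32)) > 0.5) * 255.0
qcoeffs = np.empty_like(truth)
for y in range(0, 32, 8):
    for x in range(0, 32, 8):
        qcoeffs[y:y+8, x:x+8] = np.round(
            dctn(truth[y:y+8, x:x+8], norm="ortho") / Q)

img = np.full((32, 32), 127.0)
for it in range(30):
    img = project_quantization(img, qcoeffs)
    img = soft_random_threshold(img, strength=(it + 1) / 30, rng=rng)
print("bilevel violation:", np.abs(np.minimum(img, 255 - img)).max())
```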
Fast JPEG encoding for color fax using HVQ
Ricardo L. de Queiroz,
Patrick Fleckenstein
We derive a JPEG compliant image compressor which is based on hierarchical vector quantization (HVQ). The goal is to reduce complexity while increasing compression speed. For each block, the DCT DC coefficient is encoded in the regular way while the residual is mapped through HVQ to a pre-computed bit-stream corresponding to the compressed DCT AC coefficients. Approximation quality is generally good for high compression ratios. Color Fax is the primary application target for the proposed system.
Optimal bit allocation for channel-adaptive multiple description coding
Multiple Description Coding (MDC) techniques have been explored in recent years as an alternative to other methods to provide robustness to multimedia information in the presence of losses. In an MDC approach, some redundancy is preserved in the source coding so that, after appropriate packetization, if packet losses occur it is possible to recover by exploiting the redundancy (statistical or deterministic) between what was received and what was lost. While MDC techniques have shown some promising results, one potential drawback is the fact that changing their redundancy level may entail significant changes to the system. Since the level of redundancy should be adjusted to match the specific channel conditions, the difficulty in adapting can be a significant problem for time-varying transmission scenarios. As an example, MDC techniques based on transform coding would require a modification of the transform at encoder and decoder each time the channel conditions change. In our previous work, we proposed a simple approach for MDC that involves using a polyphase transform and deterministic redundancy (e.g., each sample of input data is transmitted several times, with different coding rates). This approach is useful in that it greatly simplifies the design of an MDC scheme, since the rate allocation determines the amount of redundancy. Moreover, it provides a great deal of flexibility, as it enables the choice of redundancy to be almost arbitrary. To demonstrate the effectiveness of our system, we introduce an optimal bit allocation algorithm that allows us to select the amount of redundancy to be introduced in the signal that best matches a given target packet loss rate. It is clear that such a trade-off exists, as the level of redundancy should increase when the packet loss rate increases, at the cost of some degradation in the corresponding error-free performance. Our results show significant differences between optimal and suboptimal choices of redundancy. Moreover, given that the decoder remains unchanged when the bit allocation changes, it is possible to adapt very simply to changes in channel behavior without requiring a change in the packet sizes or the structure of the decoder.
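A minimal sketch of a two-description polyphase scheme with deterministic redundancy, in the spirit described above; the fine/coarse quantizer steps stand in for the rate allocation that the paper's optimizer would choose.

```python
import numpy as np

def quantize(x, step):
    return np.round(x / step) * step

def mdc_polyphase_encode(x, fine_step=1.0, coarse_step=8.0):
    """Hypothetical two-description polyphase MDC: each description
    carries one polyphase component at a fine rate plus the other,
    redundant copy at a coarse rate. coarse_step sets the redundancy."""
    even, odd = x[0::2], x[1::2]
    d0 = (quantize(even, fine_step), quantize(odd, coarse_step))
    d1 = (quantize(odd, fine_step), quantize(even, coarse_step))
    return d0, d1

def mdc_decode(d0, d1):
    if d0 is not None and d1 is not None:  # both arrived: use fine copies
        even, odd = d0[0], d1[0]
    elif d0 is not None:                   # one loss: coarse fallback
        even, odd = d0[0], d0[1]
    elif d1 is not None:
        odd, even = d1[0], d1[1]
    else:
        raise ValueError("both descriptions lost")
    x = np.empty(even.size + odd.size)
    x[0::2], x[1::2] = even, odd
    return x

x = np.random.default_rng(2).normal(0, 10, 64)
d0, d1 = mdc_polyphase_encode(x)
print("both received, MSE:", np.mean((mdc_decode(d0, d1) - x) ** 2))
print("one lost, MSE:   ", np.mean((mdc_decode(d0, None) - x) ** 2))
```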
Image Coding III
Visual masking in wavelet compression for JPEG-2000
We describe a nonuniform quantization scheme for JPEG2000 that leverages the masking properties of the visual system, in which visibility to distortions declines as image energy increases. Derivatives of contrast transducer functions convey visual threshold changes due to local image content (i.e. the mask). For any frequency region, these functions have approximately the same shape, once the threshold and mask contrast axes are normalized to the frequency's threshold. We have developed two methods that can work together to take advantage of masking. One uses a nonlinearity interposed between the visual weighting and uniform quantization stage at the encoder. In the decoder, the inverse nonlinearity is applied before the inverse transform. The resulting image-adaptive behavior is achieved with only a small overhead (the masking table), and without adding image assessment computations. This approach, however, underestimates masking near zero crossings within a frequency band, so an additional technique pools coefficient energy in a small local neighborhood around each coefficient within a frequency band. It does this in a causal manner to avoid overhead. The first effect of these techniques is to improve the image quality as the image becomes more complex, and these techniques allow image quality increases in applications where using the visual system's frequency response provides little advantage. A key area of improvement is in low amplitude textures, in areas such as facial skin. The second effect relates to operational attributes, since for a given bitrate, the image quality is more robust against variations in image complexity.
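The interposed point nonlinearity can be illustrated compactly. The power-law shape and the exponent below are assumptions for the sketch; the paper derives its curves from contrast transducer functions and a per-band masking table.

```python
import numpy as np

def masking_forward(c, t=1.0, a=0.7):
    """Hypothetical masking nonlinearity applied between visual weighting
    and uniform quantization: coefficients are compressed relative to a
    visibility threshold t, so larger (better-masked) coefficients end up
    quantized more coarsely. a < 1 sets the strength."""
    return np.sign(c) * t * (np.abs(c) / t) ** a

def masking_inverse(y, t=1.0, a=0.7):
    """Inverse nonlinearity, applied at the decoder before the
    inverse transform."""
    return np.sign(y) * t * (np.abs(y) / t) ** (1.0 / a)

step = 1.0  # uniform quantizer step after the nonlinearity
c = np.array([0.3, 1.0, 4.0, 16.0, 64.0])
rec = masking_inverse(np.round(masking_forward(c) / step) * step)
print(np.abs(rec - c))   # quantization error grows with magnitude
```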
Error-resilient EBCOT image coding with content classification
EBCOT, the baseline algorithm of the JPEG-2000 final draft, is an efficient image coding technique. It is inherently more error-resilient than many other wavelet-based schemes due to its independent coding of blocks in each subband. However, the loss of the data of a block in any lower-frequency subband can still degrade the perceptual image quality considerably. If robust entropy codes are used, information about the image content can help recover damaged regions of blocks. This paper discusses the use of reversible variable-length codes (RVLC) and data partitioning, instead of arithmetic codes, for coding the lower-frequency subbands in EBCOT. The selection of RVLC is based on image content, classified by activity and shape. We have observed that the proposed approach incurs very little additional bit-rate overhead and, with the help of content information, improves performance in the presence of errors.
Compression of concentric mosaic scenery with alignment and 3D wavelet transform
As a new scene representation scheme, the concentric mosaic offers a quick way to capture and model a realistic 3D environment by capturing a large number of photographs of the scene. Novel views can be rendered by patching vertical slits of the captured shots. The amount of data in a concentric mosaic is huge. In this work, we compress the concentric mosaic image array with a 3D wavelet scheme. The proposed scheme first aligns the mosaic images and then applies a 3D wavelet transform on the aligned mosaic image array. After that, the wavelet coefficients in each subband are split into cubes, each of which is encoded independently with an embedded block coder. The various cube bitstreams are then assembled to form the final compressed bitstream. Experimental results show that the proposed 3D wavelet coder achieves good compression performance.
Mesh-based scalable image coding with rate-distortion optimization
Recent developments in video coding research deal with the use of hierarchical and/or adaptive meshes for video representation. Concurrently, transmitted bit rates have to be reduced to fit the available network bandwidth. Some previous works deal with adaptive node sampling according to image content. However, the proposed adaptive hierarchical approaches do not optimize a compromise between distortion and bit rate: the coding cost of the representation is often stated but not taken into account as a constraint. Compared to these methods, this paper proposes an adaptive hierarchical mesh-based representation whose splitting criterion optimizes both the coding cost and the image rendering. Jointly, node value optimization, adaptive quantization, a low-cost coding tree, and a wavelet approach are presented. To illustrate the proposed methods, experimental results are shown and compared to the JPEG picture coding format.
Selective splitting approach to entropy-constrained single/multistage vector quantizer design
A practical tool is proposed to improve the design of various vector quantizer (VQ) structures. Particular emphasis is placed on the design of entropy-constrained VQ (ECVQ), and entropy-constrained multi-stage VQ (EC-MSVQ), whose optimization is notoriously difficult. Traditional design techniques involve an indirect approach where a fixed rate quantizer is gradually modified into a variable rate VQ. We propose a direct design procedure based on selective codevector splitting. The codebooks are grown, using splitting according to a rate-distortion Lagrangian trade-off, to the desired operating point of average bit rate. Extensive simulations in image and video compression are presented and show consistent, significant improvement over standard techniques. For example, in ECVQ design for compression of video residuals, PSNR gains of about 1.0 dB were achieved.
Video Coding
DSP-based real-time video encoding
Minhua Zhou,
Raj Talluri
This paper describes the implementation of H.263 real-time video encoding on the TI TMS320C6x. This series of DSPs utilizes a common core based on VelociTI, the advanced Very Long Instruction Word (VLIW) DSP architecture, which makes them ideal for high-performance embedded multimedia applications. We discuss in detail the methodologies used to structure the video coding algorithm in order to exploit the DSP architecture. In particular, a novel DSP-friendly motion estimation algorithm has been developed to achieve a good trade-off between coding efficiency and coding complexity. This algorithm plays a key role in the realization of real-time video encoding on DSPs. On the EVM board of this DSP (CPU frequency 167 MHz), we were able to demonstrate H.263 baseline video encoding of CIF (352 x 288) at 1 Mbit/s and a speed of about 30 fps. Multimedia applications such as consumer set-top boxes, videophones, videoconferencing, and network cameras will benefit from this performance.
Constrained variable-bit-rate control algorithm for MPEG-2 encoder
The main objective of variable bit rate (VBR) control for a video encoder is to maintain picture quality during compression. Our constrained VBR control algorithm uses an external output buffer level to provide feedback to the encoder. The algorithm predicts the next buffer level based on the current buffer level and the weighted average picture size of the different picture types. The buffer level is fed back to the encoder, and the quantizer scale is adjusted accordingly. If the buffer is near empty, the optimized quantizer scale is used. If the buffer is near full, the quantizer scale is increased aggressively to guarantee that the buffer does not overflow. This VBR encoder algorithm was implemented and compared with a CBR encoder algorithm. Several simulation results show that the VBR encoder provides better and more uniform picture quality than the CBR encoder at the same bit rate. For a desired picture quality, our VBR encoder can achieve more compression. If a network can support VBR output, our constrained VBR control encoder performs better than the CBR control encoder.
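A schematic Python rendering of the feedback rule described above; the buffer thresholds, the quadratic ramp, and the MPEG-2-style quantizer range 1-31 are assumptions for illustration, not the paper's exact control law.

```python
def vbr_quantizer_scale(buffer_level, buffer_size, q_optimal, q_max=31):
    """Hypothetical constrained-VBR rule: hold the optimized quantizer
    scale while the buffer is comfortable, then increase it aggressively
    as the buffer approaches full so it can never overflow."""
    fullness = buffer_level / buffer_size
    if fullness < 0.5:                 # near empty: best quality
        return q_optimal
    ramp = (fullness - 0.5) / 0.5      # 0 at half full, 1 at full
    return min(q_max, round(q_optimal + ramp * ramp * (q_max - q_optimal)))

def predict_buffer(level, weights, sizes, drain):
    """Predict the next buffer level from the current level and the
    weighted average coded sizes of the I/P/B picture types."""
    expected = sum(w * s for w, s in zip(weights, sizes))
    return max(0.0, level + expected - drain)

level = predict_buffer(2e6, (0.2, 0.5, 0.3), (8e5, 3e5, 1e5), drain=4e5)
print(vbr_quantizer_scale(level, buffer_size=4e6, q_optimal=6))
```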
Conditional entropy coding of DCT coefficients for video compression
Mihai Sipitca,
David W. Gillman
We introduce conditional Huffman encoding of DCT run-length events to improve the coding efficiency of low- and medium-bit rate video compression algorithms. We condition the Huffman code for each run-length event on a classification of the current block. We classify blocks according to coding mode and signal type, which are known to the decoder, and according to energy, which the decoder must receive as side information. Our classification schemes improve coding efficiency with little or no increased running time and some increased memory use.
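The table-switching idea can be sketched as follows; the class set (coding mode x energy class) matches the abstract, but the training counts and the resulting per-class tables are toy placeholders.

```python
import heapq
from collections import Counter

def huffman_code(freqs):
    """Build a Huffman code book from symbol frequencies."""
    heap = [(f, i, (sym,)) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    code = {sym: "" for sym in freqs}
    i = len(heap)
    while len(heap) > 1:
        f0, _, s0 = heapq.heappop(heap)
        f1, _, s1 = heapq.heappop(heap)
        for s in s0: code[s] = "0" + code[s]
        for s in s1: code[s] = "1" + code[s]
        heapq.heappush(heap, (f0 + f1, i, s0 + s1)); i += 1
    return code

# Hypothetical per-class statistics of (run, level) events. The energy
# class is sent as side information; the coding mode and signal type are
# already known to the decoder.
training = {
    ("intra", "high"): Counter({(0, 1): 5, (0, 2): 4, (1, 1): 3, (3, 2): 2}),
    ("intra", "low"):  Counter({(0, 1): 9, (1, 1): 2, (2, 1): 1}),
    ("inter", "high"): Counter({(0, 1): 6, (0, 3): 3, (2, 1): 2}),
    ("inter", "low"):  Counter({(0, 1): 12, (1, 1): 1}),
}
tables = {cls: huffman_code(f) for cls, f in training.items()}

def encode_block(events, mode, energy_class):
    table = tables[(mode, energy_class)]   # conditioned Huffman table
    return "".join(table[e] for e in events)

print(encode_block([(0, 1), (1, 1)], "intra", "low"))
```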
Optimized bit allocation for scalable wavelet video coding
The hybrid coding scheme is employed in all established coding standards. A forward motion vector field is estimated and applied for motion compensation. The remaining prediction error and the motion vectors are transmitted to the decoder. The discrete cosine transform is used for transform coding of the prediction error. The extension of this coding scheme to scalability is not easily achieved, and the performance of standard video coders when using scalability options can often be reduced to that of simulcast coding. In this paper, a hierarchical spatially scalable wavelet video coder is presented. A backward motion compensation scheme is used, and no motion vectors have to be transmitted. In this coding scheme, the lowpass band of the decomposition is coded first without motion compensation using DPCM. The coarser levels of the signal decomposition, which are known to both the encoder and the decoder, are then employed for motion estimation of the next levels. Due to the hierarchical structure of the presented scheme, the impact of quantization is strongly interdependent between the decomposition levels. In contrast to still image coding, quantization affects the reconstruction quality as well as the motion estimation and motion compensation efficiency. In this paper, the bit allocation for the decomposition levels of the hierarchical wavelet video coder is investigated.
Embedded wavelet video coding with error concealment
We present an error-concealed embedded wavelet (ECEW) video coding system for transmission over the Internet or wireless networks. This system consists of two types of frames: intra (I) frames and inter, or predicted (P), frames. Inter frames are constructed from the residual frames formed by variable block-size multiresolution motion estimation (MRME). Motion vectors are compressed by arithmetic coding. The image data of intra frames and residual frames are coded by error-resilient embedded zerotree wavelet (ER-EZW) coding. ER-EZW coding partitions the wavelet coefficients into several groups, and each group is coded independently, so the propagation effect of an error is confined to a single group; in conventional EZW coding, any single error may render the bitstream totally undecodable. To further reduce the error damage, we use error concealment at the decoding end. In intra frames, erroneous wavelet coefficients are replaced by their neighbors. In inter frames, erroneous blocks of wavelet coefficients are replaced by data from the previous frame. Simulations show that ECEW outperforms the same system without error concealment by 7 to 8 dB at a bit error rate of 10^-3 in intra frames, and the improvement is still 2 to 3 dB at the higher error rate of 10^-2 in inter frames.
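A minimal sketch of the two concealment rules at the decoder (neighbor averaging for intra frames, co-located replacement for inter frames); the group/block geometry is simplified to fixed 2D arrays for illustration.

```python
import numpy as np

def conceal_intra(coeffs, bad):
    """Replace erroneous wavelet coefficients with the average of their
    error-free neighbors (a sketch of intra-frame concealment)."""
    out = coeffs.copy()
    for y, x in zip(*np.nonzero(bad)):
        ys, xs = slice(max(y - 1, 0), y + 2), slice(max(x - 1, 0), x + 2)
        good = ~bad[ys, xs]
        out[y, x] = coeffs[ys, xs][good].mean() if good.any() else 0.0
    return out

def conceal_inter(coeffs, prev_coeffs, bad_blocks, bs=8):
    """Replace erroneous blocks of coefficients with the co-located
    blocks from the previous frame (inter-frame concealment)."""
    out = coeffs.copy()
    for by, bx in bad_blocks:
        out[by*bs:(by+1)*bs, bx*bs:(bx+1)*bs] = \
            prev_coeffs[by*bs:(by+1)*bs, bx*bs:(bx+1)*bs]
    return out

rng = np.random.default_rng(3)
c = rng.normal(size=(32, 32))
bad = rng.random((32, 32)) < 0.01
print("intra residual:", np.abs(conceal_intra(c, bad) - c)[bad].mean())
print("inter block mean:", conceal_inter(c, c * 0.9, [(1, 2)])[8:16, 16:24].mean())
```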
Design of wavelet-based scalable video codec
This paper presents a new framework for wavelet-based scalable video coding. Like some other video coders, this new video coder encodes some frames (I-frames) independently and other frames (P-frames) predictively. The I-frame is encoded by a scalable image coder with the LCBiD coefficient coding scheme, a wavelet-based image coder of very low implementation complexity. A P-frame depends only on the frame just encoded. One major difficulty in designing a scalable video codec is how to obtain a highly scalable video stream while still efficiently exploiting the inter-frame dependency. This paper describes a scheme referred to as Layered Motion Compensation and Coding (LMCC) to resolve this conflict. With LMCC, motion compensation is performed in the coefficient domain. The prediction residue is calculated in a way that the layered structure is strictly enforced across the entire Group of Pictures (GOP), no matter how many frames there are in a GOP. The residue coding scheme is also based on LCBiD, with the introduction of conditional sign-bit encoding to maintain the layered structure obtained in the motion compensation stage and achieve an SNR-scalable video stream. Resolution scalability is realized by coding each subband of each frame independently.
MPEG-4
MPEG-4 playback of multimedia objects searchable by their description
Atul Puri,
Robert L. Schmidt,
Qian Huang,
et al.
In this paper, from the standpoint of their potential for multimedia applications/services, we examine key new elements of the MPEG-4 standard as well as concepts leading to the ongoing work on the MPEG-7 standard. We first examine generic requirements common to several practical future multimedia applications/services. We then identify and examine key new features of the MPEG-4 standard such as image/video objects, facial animation, advanced sound, integration with the web, multi-user worlds, and Java control. Next, we briefly review the goal, requirements, and progress of the ongoing work on the MPEG-7 standard. We then discuss the player we intend to use, which integrates a number of key features of MPEG-4 with search and retrieval concepts similar to those of the MPEG-7 standard, as well as how to create content for the player. Next, we discuss an example multimedia application that we have developed. Finally, we summarize the key highlights of the paper.
Analysis of object segmentation methods for VOP generation in MPEG-4
The recent audio-visual standard MPEG-4 emphasizes content-based information representation and coding. Rather than operating at the level of pixels, MPEG-4 operates at a higher level of abstraction, capturing information based on the content of a video sequence. Video object plane (VOP) extraction is an important step in defining the content of any video sequence, except in the case of authored applications that involve the creation of video sequences using synthetic objects and graphics. The generation of VOPs from a video sequence involves segmenting the objects from every frame of the video sequence. The problem of object segmentation is also being addressed by the computer vision community, where the major challenge faced by researchers is to define object boundaries such that they are semantically meaningful. Finding a single robust solution for this problem that can work for all kinds of video sequences remains a challenging task. The object segmentation problem can be simplified by imposing constraints on the video sequences; these constraints largely depend on the type of application where the segmentation technique will be used. The purpose of this paper is twofold. In the first section, we summarize the state-of-the-art research on this topic and analyze the various VOP generation and object segmentation methods that have been presented in the recent literature. In the next section, we focus on the different types of video sequences, the important cues that can be employed for efficient object segmentation, the different object segmentation techniques, and the types of techniques that are well suited to each type of application. A detailed analysis of these approaches is given from the perspective of the accuracy of the object boundaries, robustness towards different kinds of video sequences, ability to track objects through the video sequences, and the complexity involved in implementing these approaches, along with other limitations. In the final section, we concentrate on the specific problems that require special attention and discuss the scope and direction for further research.
New video object segmentation technique based on flow-thread features for MPEG-4 and multimedia systems
Ho-Chao Huang,
Yung-Chieh Lin,
Yi-Ping Hung,
et al.
In this paper, we present a novel technique for video object (VO) segmentation. Compared to the existing VO segmentation methods, our method has the advantage that it does not decompose the VO segmentation problem into an initial image segmentation problem (segmenting a single image frame) followed by a temporal tracking problem. Instead, motion information contained in a finite duration of the image sequence is considered simultaneously. Given a video sequence, our method first estimates motion vectors between consecutive images, and then constructs the flow-thread for each pixel based on the estimated motion vectors. Here, a flow-thread is a series of pixels obtained by tracing the motion vectors along the image sequence. Next, we extract a set of flow-thread features (ft-features) from each flow-thread, which is then used to classify the associated pixel into the VO it belongs to. The segmentation results obtained by our unsupervised method look promising and the processing speed is fast enough for practical uses.
Motion Estimation
Fast motion estimation for frame-rate conversion
Because different multimedia applications and transmission channels require different resolutions, frame rates/structures, and bit rates, there is often a need to transcode stored compressed video to suit the needs of these various applications. This paper is concerned with fast motion estimation for frame rate/structure conversion. We propose several novel algorithms that exploit the correlation between the motion vectors in the original video and those in the transcoded video, achieving much higher quality than existing fast search algorithms at much lower complexity.
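One plausible reading of motion-vector reuse for frame-rate conversion is sketched below: scale the decoded vector to the new temporal distance and refine it with a tiny local search. The +/-1 refinement radius and the SAD criterion are assumptions, not the paper's specific algorithms.

```python
import numpy as np

def sad(a, b):
    return int(np.abs(a.astype(int) - b.astype(int)).sum())

def refine_mv(cur, ref, by, bx, mv, radius=1, bs=16):
    """Reuse a motion vector decoded from the original bitstream as the
    search center, then refine it with a tiny local search - a sketch of
    fast re-estimation for frame-rate conversion."""
    block = cur[by:by+bs, bx:bx+bs]
    best, best_mv = None, mv
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = by + mv[0] + dy, bx + mv[1] + dx
            if 0 <= y <= ref.shape[0] - bs and 0 <= x <= ref.shape[1] - bs:
                cost = sad(block, ref[y:y+bs, x:x+bs])
                if best is None or cost < best:
                    best, best_mv = cost, (mv[0] + dy, mv[1] + dx)
    return best_mv

def scale_mv(mv, src_dist, dst_dist):
    """Scale a decoded motion vector to the new temporal distance."""
    return (round(mv[0] * dst_dist / src_dist),
            round(mv[1] * dst_dist / src_dist))

rng = np.random.default_rng(4)
ref = rng.integers(0, 255, (64, 64), dtype=np.uint8)
cur = np.roll(ref, (2, 3), axis=(0, 1))        # true motion is (-2, -3)
print(refine_mv(cur, ref, 16, 16, scale_mv((-4, -6), 2, 1)))
```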
Global/local motion-compensated frame interpolation for low-bit-rate video
A new motion-compensated frame interpolation scheme for low-bitrate video based on the ITU-T H.263/H.263+ standard is investigated in this research. The proposed scheme works solely on the decoded bitstream with a block-based approach to achieve interpolation results. It is composed of two main modules: a background/foreground segmentation module and a hybrid motion-compensated frame interpolation module. The background/foreground segmentation module uses a global motion model to estimate the background motion and an iterative background update to refine the segmentation. The hybrid motion-compensated frame interpolation module is employed to reconstruct the background and foreground, respectively. Global motion compensation and frame interpolation are applied to background blocks, where either the 6-parameter affine or the 8-parameter perspective model is used to reduce the computational complexity and implement perspective correction, while local motion compensation and frame interpolation with localized triangular patch mapping are applied to the foreground area. Experiments show that the proposed scheme can achieve higher overall visual quality compared to conventional block-based frame interpolation schemes.
Fast full-search block matching using subblocks and successive approximation of the error measure
A fast full search block matching algorithm is developed. The matching criterion is the sum of absolute differences or the mean square error. The algorithm evaluates lower bounds for the matching criteria for subdivided blocks in order to reduce the number of search positions. It also uses the lower bounds for a fast calculation of the matching criterion for the remaining search positions. The computational complexity of the algorithm is evaluated and compared to the three-step search strategy. The search result of the algorithm is identical to the search result of the exhaustive search.
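The sum-based lower bound |sum(A) - sum(B)| <= SAD(A, B) can be illustrated directly. This sketch applies the bound at the whole-block level only; the paper tightens it recursively on subblocks and also reuses the bounds to speed up the remaining SAD evaluations.

```python
import numpy as np

def sad(a, b):
    return np.abs(a - b).sum()

def full_search_se(cur_block, ref, by, bx, radius=8):
    """Full-search block matching with a successive-elimination-style
    lower bound: candidates whose bound already exceeds the best SAD
    are skipped without evaluating the full matching criterion."""
    bs = cur_block.shape[0]
    block_sum = cur_block.sum()
    # integral image gives every candidate-window sum in O(1)
    ii = np.pad(ref, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    best, best_mv, evaluated = None, (0, 0), 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = by + dy, bx + dx
            if not (0 <= y <= ref.shape[0]-bs and 0 <= x <= ref.shape[1]-bs):
                continue
            win_sum = ii[y+bs, x+bs] - ii[y, x+bs] - ii[y+bs, x] + ii[y, x]
            if best is not None and abs(block_sum - win_sum) >= best:
                continue                       # eliminated by the bound
            evaluated += 1
            cost = sad(cur_block, ref[y:y+bs, x:x+bs])
            if best is None or cost < best:
                best, best_mv = cost, (dy, dx)
    return best_mv, evaluated

rng = np.random.default_rng(5)
ref = rng.integers(0, 255, (64, 64)).astype(int)
cur = np.roll(ref, (1, -2), axis=(0, 1))
mv, n = full_search_se(cur[24:40, 24:40], ref, 24, 24)
print(mv, "SADs evaluated:", n)   # same minimum as exhaustive search
```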
Image and Video Transmission I
Signal processing for Internet video streaming: a review
Despite its commercial success, video streaming remains a black art owing to its roots in proprietary commercial development. As such, many challenging technological issues that need to be addressed are not even well understood. The purpose of this paper is to review several important signal processing issues related to video streaming and put them in the context of a client-server based media streaming architecture on the Internet. Such a context is critical, as we shall see that a number of solutions proposed by signal processing researchers are simply unrealistic for real-world video streaming on the Internet. We identify a family of viable solutions and evaluate their pros and cons. We further identify areas of research that have received less attention and point to the problems for which a better solution is eagerly sought by the industry.
Simple video format for mobile applications
With the advent of pervasive computing, there is a growing demand for enabling multimedia applications on mobile devices. Large numbers of pervasive computing devices, such as personal digital assistants (PDAs), hand-held computers (HHCs), smart phones, portable audio players, automotive computing devices, and wearable computers, are gaining access to online information sources. However, pervasive computing devices are often constrained along a number of dimensions, such as processing power, local storage, display size and depth, connectivity, and communication bandwidth, which makes it difficult to access rich image and video content. In this paper, we report on our initial efforts in designing a simple scalable video format with low decoding and transcoding complexity for pervasive computing. The goal is to enable image and video access for mobile applications such as electronic catalog shopping, video conferencing, remote surveillance, and video mail using pervasive computing devices.
Robust optimization solution to the data hiding problem using distributed source coding principles
Inspired by a recently proposed constructive framework for the distributed source coding problem, we propose a powerful constructive approach to the watermarking problem, emphasizing the dual roles of 'source codes' and 'channel codes.' In our framework, we explore various source and channel codes to achieve watermarks that are robust to attackers in terms of maximizing the distortion between the corrupted coded-source signal and the original signal while holding the distortion between the coded-source signal and the original signal constant. We solve the resulting combinatorial optimization problem using an original technique based on robust optimization and convex programming.
Image and Video Transmission II
Context-based multiple bit-stream image transmission over noisy channels
In this paper, we propose a novel combined source and channel coding scheme for image transmission over noisy channels. The main feature of the proposed scheme is a systematic decomposition of image sources so that unequal error protection can be applied according to not only bit error sensitivity but also visual context importance. The wavelet transform is adopted to hierarchically decompose the image. The association between the wavelet coefficients and what they represent spatially in the original image is fully exploited. Such decomposition generates wavelet blocks that can be classified based on their corresponding image context. The classification produces wavelet trees in each class with similar context and statistics and therefore enables high performance source compression using SPIHT. The channel coding assigns unequal error protection to different classes and to different bit planes so that the image transmission scheme is robust in terms of both subjective and objective visual quality. To further improve the quality of the received image, a post-processing method was proposed to restore the degradation due to the channel decoding residual error. Experimental results show that the proposed scheme has a good performance for image transmission over noisy channels. In particular, the reconstructed images consistently illustrate better visual quality.
Video delivery over wireless channel with dynamic QoS control
Sherry Wang,
Harold Zheng,
John A. Copeland
The combination of the bit-rate variability of compressed video traffic and the inferior quality of a wireless fading channel causes low bandwidth efficiency, high data error rates, and long delays for wireless multimedia applications. In this paper, we analyze the drawbacks of conventional SR-ARQ for the transmission of time-constrained traffic such as video, and we show the necessity of enhancing the existing data link protocols. Consequently, a new QoS-aware SR-ARQ (QSR-ARQ) scheme is proposed. QSR-ARQ utilizes the layer property of MPEG video frames, as well as the varied QoS requirements of different data sections, to improve protocol performance. A time-spreading technique based on the clients' buffering capacity is also used in the study. This technique not only reduces the bit-rate fluctuation of video traffic but also alleviates the impact of burst link errors on MPEG video. Comprehensive computer simulations were used to compare the performance of the proposed QSR-ARQ with a conventional SR-ARQ using original video traffic and pre-processed versions. The results prove the applicability and effectiveness of QSR-ARQ and the time-spreading technique for transmitting MPEG video with QoS guarantees in a wireless mobile environment. QSR-ARQ can also be applied to component-based video compression applications.
Wireless image transmission using multiple-description-based concatenated codes
This work introduces a multiple-description product code which aims at optimally generating multiple, equally-important wavelet image descriptions. The codes used are a concatenated channel code including a row (outer) code based on RCPC codes with CRC error detection and a source-channel column (inner) code consisting of the scalable SPIHT image coder and an optimized array of unequal protection Reed-Solomon erasure-correction codes. By systematically matching the unequal protection codes to the embedded source bitstream using a simple, fast optimizer that can run in real time, we allow image quality to degrade gracefully as fade worsens and maximize expected image quality at the receiver. This approach to image transmission over fading channels offers significant improvements in both peak and expected image quality when compared to current state-of-the-art techniques. Our packetization scheme is also naturally suited for hybrid packet-network/wireless channels, such as those used for wireless Internet access.
Joint source-channel coding with allpass filtering source shaping for image transmission over noisy channels
In this paper, we propose a fixed-length robust joint source-channel coding (JSCC) scheme for image transmission over noisy channels. Three channel models are studied: binary symmetric channels (BSC) and additive white Gaussian noise (AWGN) channels for memoryless channels, and Gilbert-Elliott channels (GEC) for bursty channels. We derive an explicit operational rate-distortion (R-D) function, which represents an end-to-end error measurement that includes errors due to both quantization and channel noise. In particular, we are able to incorporate the channel transition probability and channel bit error rate into the R-D function in the case of bursty channels. With the operational R-D function, bits are allocated not only among different subsources but also between source coding and channel coding so that, under a fixed transmission rate, an optimum tradeoff between source coding accuracy and channel error protection can be achieved. This JSCC scheme is also integrated with allpass filtering source shaping to further improve the robustness against channel errors. Experimental results show that the proposed scheme can achieve not only high PSNR performance but also excellent perceptual quality. Compared with state-of-the-art JSCC schemes, the proposed scheme outperforms most of them, especially when channel mismatch occurs.
Joint source-channel coding for scalable video
An approach towards joint source-channel coding for wireless video transmission at low bit rates is proposed. An SNR-scalable video coder is used, and a different amount of error protection is allowed for each scalable layer. Our problem is to allocate the available bit rate across scalable layers and, within each layer, between source and channel coding, while minimizing the end-to-end distortion of the received video sequence. The distortion is due to both source coding (quantization) errors and channel noise errors, and is measured in terms of the mean squared error (MSE). The optimization algorithm utilizes rate-distortion characteristic plots. These plots show the contribution of each layer to the total distortion as a function of the source rate of the layer and the residual bit error rate (the error rate that remains after the use of channel coding). The plots are obtained experimentally using representative video sequences and show the sensitivity of the source encoder and decoder to channel errors. Our algorithm is operationally optimal given the rate-distortion characteristic plots. These plots are used in conjunction with plots that show the bit error rate achieved by the allowable channel coding schemes for given channel conditions in order to obtain the operational rate-distortion curve of each layer. Then, dependent Lagrangian optimization is used to determine the overall bit allocation across all layers.
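A skeletal version of Lagrangian bit allocation over per-layer rate-distortion points, with bisection on the slope; the operating points below are fabricated for the example, and the sketch treats layers independently rather than modeling the inter-layer dependence the paper accounts for.

```python
# Hypothetical per-layer operating points: (source_rate, distortion)
# pairs measured at a fixed residual bit error rate, as in the plots.
LAYERS = [
    [(100, 40.0), (200, 22.0), (400, 12.0), (800, 7.0)],   # base layer
    [(100, 20.0), (200, 12.0), (400,  7.0), (800, 4.5)],   # enhancement
]

def allocate(layers, budget):
    """Lagrangian bit allocation: for a given slope lam, each layer picks
    the point minimizing D + lam * R; lam is bisected until the total
    rate meets the budget."""
    def pick(lam):
        choice = [min(pts, key=lambda p: p[1] + lam * p[0]) for pts in layers]
        return choice, sum(r for r, _ in choice)
    lo, hi = 0.0, 1.0
    while pick(hi)[1] > budget:   # grow lam until the budget is feasible
        hi *= 2
    for _ in range(50):           # bisect to the smallest feasible slope
        mid = (lo + hi) / 2
        if pick(mid)[1] > budget:
            lo = mid
        else:
            hi = mid
    return pick(hi)[0]

print(allocate(LAYERS, budget=1000))
```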
Reliable video transmission over fading channels via channel state estimation
Transmission of continuous media such as video over time-varying wireless communication channels can benefit from the use of adaptation techniques in both source and channel coding. An adaptive feedback-based wireless video transmission scheme is investigated in this research, with special emphasis on feedback-based adaptation. To be more specific, an interactive adaptive transmission scheme is developed by letting the receiver estimate the channel state information and send it back to the transmitter. By utilizing the feedback information, the transmitter is capable of adapting the level of protection by changing the flexible RCPC (rate-compatible punctured convolutional) code ratio depending on the instantaneous channel condition. The wireless channel is modeled as a fading channel, where the long-term and short-term fading effects are modeled as log-normal fading and Rayleigh flat fading, respectively. The channel state (mainly the long-term fading portion) is tracked and predicted by using an adaptive LMS (least mean squares) algorithm. Utilizing the delayed feedback on the channel condition, the adaptation performance of the proposed scheme is first evaluated in terms of error probability and throughput. It is then extended to incorporate variable-size packets of ITU-T H.263+ video with the error resilience option. Finally, the end-to-end performance of wireless video transmission is compared against several non-adaptive protection schemes.
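A small sketch of the receiver-side LMS tracking step; the normalized-LMS update, filter order, and synthetic log-normal shadowing trace are assumptions for the example.

```python
import numpy as np

def lms_predict_fading(obs, order=4, mu=0.05):
    """One-step LMS prediction of the slow (log-normal) fading level in dB
    from its recent history - a sketch of the receiver-side channel-state
    estimator whose output would be fed back to the transmitter to select
    the RCPC code ratio for the next packets."""
    w = np.zeros(order)
    preds = np.zeros(len(obs))
    for n in range(order, len(obs)):
        x = obs[n-order:n][::-1]            # most recent sample first
        preds[n] = w @ x
        err = obs[n] - preds[n]
        w += mu * err * x / (x @ x + 1e-9)  # normalized LMS update
    return preds, w

rng = np.random.default_rng(6)
t = np.arange(500)
shadow_db = 3 * np.sin(2 * np.pi * t / 120) + rng.normal(0, 0.3, t.size)
preds, w = lms_predict_fading(shadow_db)
print("prediction MSE:", np.mean((preds[50:] - shadow_db[50:]) ** 2))
```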
Rendering
Progressive rendering with multiresolution 3D meshes
A progressive 3D mesh rendering technique is investigated in this research. This method renders a multi-resolution 3D triangle mesh by exploiting the smooth transition of an object in the spatial and temporal domains. To be more specific, when a vertex-split technique is used to obtain a refined mesh, we consider a new rendering technique that automatically determines which parts of the newly generated polygons are visible and performs the rendering accordingly. Since the dynamic splitting process can be performed iteratively many times, the error of a progressively rendered result tends to accumulate along the process. To control the visual quality of the rendered image, we can set a threshold or a constant number of steps after which the 3D model is rendered from scratch, which we call a graphic refresh. Experimental results are given to show the performance of the proposed progressive rendering algorithm.
Error-resilient coding technique for 3D graphic models
Existing techniques for coding a general 3D graphic model are very sensitive to errors; the coded bitstream can be ruined by even a single bit error in the topology structure part, making reconstruction of the original mesh difficult or even impossible. In this research, we propose a new approach for error-resilient coding of 3D graphic models that reconstructs the original model to the maximum extent from corrupted 3D mesh data while maintaining satisfactory compression performance. In the proposed scheme, the 3D mesh is first divided into a number of small pieces by using any standard mesh segmentation algorithm. A new structure, the joint boundary, is then derived from the divided pieces; it is used to stitch the segmented pieces back together to form the original model. We develop a new coding algorithm for the joint boundary. The coded joint-boundary topology and the first 3 bit-planes of the coded joint-boundary geometry are protected using a forward error correction (FEC) code, so they can be decoded free of error. Consequently, they provide the anchor vertices and anchor links in 3D space and form the key structure for error detection, recovery, and concealment of the corresponding pieces. It is demonstrated that the proposed coding scheme achieves very good results for bit error rates (BER) below 10^-3 while maintaining high coding efficiency.
Spatially adaptive regularized pel-recursive motion estimation based on the EM algorithm
Pel-recursive motion estimation is a well-established approach to motion estimation. However, in the presence of noise it becomes an ill-posed problem that requires regularization. In the past, regularization for pel-recursive estimation was addressed in an ad hoc manner. In this paper, a Bayesian estimation framework is used to deal with this issue. More specifically, motion vectors and regularization parameters are estimated in an iterative fashion by means of the Expectation-Maximization (EM) algorithm and a Gaussian data model. The proposed algorithm utilizes local image properties to regularize the motion vector estimates, following a spatially adaptive approach. Numerical experiments are presented that demonstrate the merits of the proposed algorithm.
Video Processing
Majority-selection de-interlacing: an advanced motion-compensated spatiotemporal interpolation technique for interlaced video
De-interlacing of interlaced video doubles the number of lines per picture. As the video signal is sub-Nyquist sampled in the vertical and temporal dimensions, standard up-conversion or interpolation filters cannot be applied. This may explain the large number of de-interlacing algorithms that have been proposed in the literature, ranging from simple intra-field de-interlacing methods to advanced motion-compensated (MC) methods. MC de-interlacing methods are generally far superior to non-MC ones. However, it seems difficult to combine robustness of an MC de-interlacing algorithm against incorrect motion vectors with the ability to preserve high spatial frequencies. The Majority-Selection de-interlacer, as proposed in this paper, provides a means to combine several strengths of individual de-interlacing algorithms into a single output signal.
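One way to picture majority selection is as a per-pixel median over candidate de-interlacers. The three candidates below (line average, field insertion, and a motion-compensated sample) are a plausible set for the sketch, not necessarily the paper's exact mix.

```python
import numpy as np

def deinterlace_majority(field, prev_frame, mc_frame):
    """Majority-selection sketch: interpolate each missing line with three
    candidate de-interlacers and output their per-pixel median. The median
    keeps the MC quality where vectors are good and falls back gracefully
    where they are wrong."""
    h, w = prev_frame.shape
    out = prev_frame.astype(float)
    out[0::2] = field                     # existing (even) lines are kept
    for y in range(1, h - 1, 2):          # reconstruct the odd lines
        line_avg = 0.5 * (out[y - 1] + out[y + 1])   # intra-field
        insertion = prev_frame[y].astype(float)      # temporal
        mc = mc_frame[y].astype(float)               # motion-compensated
        out[y] = np.median(np.stack([line_avg, insertion, mc]), axis=0)
    return out

rng = np.random.default_rng(7)
frame = rng.integers(0, 255, (16, 16)).astype(float)
field = frame[0::2]
result = deinterlace_majority(field, frame, frame)
print("max error on odd lines:", np.abs(result[1:15:2] - frame[1:15:2]).max())
```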
Postprocessing of interframe coded images based on convex projection and regularization
In order to reduce blocking artifacts in inter-frame coded images, we propose a new image restoration algorithm which directly processes differential images before reconstruction. We note that blocking artifacts in inter-frame coded images are caused by both the 8 x 8 DCT and 16 x 16 macroblock-based motion compensation, while those of intra-coded images are caused by the 8 x 8 DCT only. Based on this observation, we propose a new degradation model for differential images and a corresponding restoration algorithm that utilizes additional constraints and convex sets for discontinuity inside blocks. The proposed restoration algorithm is a modified version of standard regularization that incorporates spatially adaptive lowpass filtering with consideration of edge directions, utilizing a subset of the DCT coefficients. Most video coding standards adopt a hybrid structure of block-based motion compensation and the block discrete cosine transform (BDCT). For this reason, blocking artifacts occur both on block boundaries and in block interiors. For more complete removal of both kinds of blocking artifacts, the restored differential image must satisfy two constraints: directional discontinuities on block boundaries and in block interiors. These constraints have been used to define convex sets for restoring differential images.
Removing blocking effects and dropouts in DCT-based video sequences
In this paper, we propose an object-based variational approach for the decoding of DV, M-JPEG, and MPEG-1/2 video sequences. This new method improves visual quality by considering two kinds of artifacts simultaneously: artifacts due to compression, such as blocking effects and quantization noise, and defects due to acquisition, transmission, or storage, such as dropouts and banding. Generally, methods for improving visual quality in video sequences consider these two kinds of artifacts separately and usually consist of post-processing techniques applied to a single kind of artifact. The proposed method adopts a global approach to decoding. It deals with the minimization of half-quadratic criteria, which allows the simultaneous estimation and restoration of backgrounds on the one hand, and the detection of moving objects on the other. Each background and each object is processed separately, according to its spatial and temporal properties, to effectively remove blocking effects, quantization noise, and dropouts. Several experimental results are presented in this paper. They demonstrate the efficiency of the decoding method: blocking effects are largely removed and missing data are significantly reduced compared to standard decoding, resulting in greater visual quality of the sequence.
Object-based postprocessing of block motion fields for video applications
It is likely that block-matching techniques for motion estimation will continue to be used in many applications. In this paper, a novel object-based approach for the enhancement of motion fields generated by block matching is proposed. Block matching is first applied in parallel with a fast spatial image segmentation. Then, a rule-based object postprocessing strategy is used, where each object is partitioned into sub-objects and each sub-object motion histogram is first analyzed separately. The sub-object treatment is particularly useful when image segmentation errors occur. Then, using plausibility histogram tests, object motions are classified as translational or non-translational. For non-translational motion, a single motion vector per sub-object is first assigned. The motion vectors of the sub-objects are then examined according to plausibility criteria and adjusted in order to create smooth motion inside the whole object. As a result, blocking artifacts are reduced and a more accurate estimation is achieved. Another interesting result is that motion vectors are implicitly assigned to pixels of covered/exposed areas. In the paper, a performance comparison of the new approach and block-matching methods is given. Furthermore, a fast unsupervised image segmentation method of reduced complexity aimed at separating objects is proposed. This method is based on a binarization method and morphological edge detection. The binarization combines local and global texture-homogeneity tests based on special homogeneity masks which implicitly take possible edges into account for object separation. The paper also contributes a novel formulation of binary morphological erosion and dilation and of binary edge detection. The presented segmentation uses few parameters, which are automatically adjusted to the amount of noise in the image and to the local standard deviation.
Image and Video Segmentation and Retrieval I
CBIR: from low-level features to high-level semantics
The performance of a content-based image retrieval (CBIR) system is inherently constrained by the features adopted to represent the images in the database. Use of low-level features cannot give satisfactory retrieval results in many cases, especially when the high-level concepts in the user's mind are not easily expressible in terms of low-level features. Therefore, whenever possible, textual annotations should be added, extracted, and/or processed to improve retrieval performance. In this paper, a hybrid image retrieval system is presented to provide the user with the flexibility of using both high-level semantic concepts/keywords and low-level feature content in the retrieval process. The emphasis is put on a statistical algorithm for semantic grouping in the concept space through relevance feedback in the image space. Under this framework, the system can also incrementally learn the user's search habits/preferences in terms of semantic relations among concepts, and use this information to improve the performance of subsequent retrieval tasks. This algorithm can eliminate the need for a stand-alone thesaurus, which may be too large in size and contain too much redundant information to be of practical use. Simulated experiments are designed to test the effectiveness of the algorithm. An intelligent dialogue system, to which this algorithm can contribute as part of the knowledge acquisition module, is also described as a front end for the CBIR system.
Model-based video segmentation for vision-augmented interactive games
This paper presents an architecture and algorithms for model-based video object segmentation and its application to vision-augmented interactive games. We are especially interested in real-time, low-cost, vision-based applications that can be implemented in software on a PC. We use different models for the background and a player object. The object segmentation algorithm is performed at two different levels: pixel level and object level. At the pixel level, the segmentation algorithm is formulated as a maximum a posteriori probability (MAP) problem. The statistical likelihood of each pixel is calculated and used in the MAP problem. Object-level segmentation is used to improve segmentation quality by utilizing information about the spatial and temporal extent of the object. The concept of an active region, defined from the motion histogram and trajectory prediction, is introduced to indicate the possibility of a video object region for both background and foreground modeling. It also reduces the overall computational complexity. In contrast with other applications, the proposed video object segmentation system is able to create background and foreground models on the fly, even without introductory background frames. Furthermore, we apply different rates of self-tuning to the scene model so that the system can adapt to the environment when there is a scene change. We applied the proposed video object segmentation algorithms to several prototype virtual interactive games, in which a player can immerse himself/herself inside a game and virtually interact with other animated characters in real time without being constrained by helmets, gloves, special sensing devices, or the background environment. Potential applications of the proposed algorithms include human-computer gesture interfaces and object-based video coding such as MPEG-4.
Fast and adaptive semantic object extraction from video
Semantic video object identification and extraction is an important component of content-based multimedia applications such as editing, coding, and retrieval. A smart interactive video object generation (SIVOG) system based on adaptive processing and semantic user interaction was developed in our previous work. In this work, SIVOG is further improved to efficiently process video content based on the semantic object's spatial and temporal characteristics. The enhanced SIVOG system adaptively selects processing regions based on the object shape, and temporal skipping and interpolation procedures are applied to objects with slow motion activity. The system can extract simple semantic objects in real time with pixel-wise accuracy. Fast, accurate, and consistent results were obtained when the system was evaluated with several MPEG-4 test sequences.
Image-object extraction using a genetic-programming-based object model
This paper presents a new algorithm for a person extraction system in video. Generally, segmentation schemes are based on criteria related to homogeneous properties of image features, such as color and motion. However, typical semantic objects comprise multiple regions with different properties, and this severely affects segmentation results. In this paper, we propose a method to extract the block-based boundaries of semantic objects as one of the key components of our system. The method relies on the idea of integrating manipulations of image features at an initial level with no semantics (e.g., color) with an object model at a higher level with semantics. To do so, we use genetic programming (GP) to create the object model from a set of training images. A maximum a posteriori (MAP) estimation procedure is applied so that the object model and the image features are integrated. In the testing process, we fuse two segmentation results: the block-based contour extracted with the MAP procedure and the arbitrarily shaped regions obtained with a color segmentation scheme. Thus, the final contour of an object is acquired. In our experiment, the proposed algorithm is applied to extract the head and body of a person.
Kernel-based multiple-cue algorithm for object segmentation
Jian Wang,
Ze-Nian Li
This paper proposes a novel algorithm to solve the problem of segmenting foreground moving objects from the background scene. The major cue used for object segmentation is motion information, which is initially extracted from MPEG motion vectors. Since MPEG motion vectors are generated for simple video compression without any consideration of visual objects, they may not correspond to the true motion of the macroblocks. We propose a Kernel-based Multiple Cue (KMC) algorithm to deal with this inconsistency of MPEG motion vectors and use multiple cues to segment moving objects. KMC detects and calibrates camera movements and then finds the kernels of moving objects. The segmentation starts from these kernels, which are textured regions with credible motion vectors. Besides motion information, it also makes use of color and texture to help achieve a better segmentation. Moreover, KMC can keep track of the segmented objects over multiple frames, which is useful for object-based coding. Experimental results show that KMC combines temporal and spatial information in a graceful way, which enables it to segment and track moving objects under different camera motions. Future work includes object segmentation in the compressed domain, motion estimation from raw video, etc.
Image and Video Segmentation and Retrieval II
Automatic segmentation for very low bit-rate video coding
Stefaan Desmet,
Albert A. Deknuydt,
Luc Van Eycken
Object-based coding is a new technique being investigated to achieve high compression ratios for video sequences. The classical approach to video coding is a four-step algorithm: segmentation, motion estimation and compensation, coding of the segmentation information, and finally coding of the prediction errors. In order to reduce the number of bits needed for coding the objects, we propose another approach in which we combine the first three steps: the motion estimation and segmentation are performed together while, at the same time, important information is collected for the coding of the segmentation information.
Object-oriented hybrid segmentation using stereo images
In this paper, we develop a theoretical framework for coherent image segmentation using stereo images. Robust segmentation is performed by combining multiple cues such as shape, intensity (color) and depth. Though image segmentation has been an active research field over the last few decades, segmentation based on an individual cue has several well-known drawbacks. For example, intensity-based schemes tend to generate detailed but inaccurate edges, and motion-based schemes only help segment moving objects. In addition, depth-based schemes may not yield satisfactory segmentation results because disparity estimation itself is a well-known ill-posed problem. Therefore, the main issue in segmentation is how to combine various cues to achieve robust segmentation results. In the proposed scheme, robust and consistent segmentation is achieved by properly combining several cues using an MRF/GRF model. We first estimate the intensity edges of the image and then re-evaluate the edges based on disparity edge information. In turn, the resulting intensity edges can help estimate an accurate disparity field. In addition, occlusion areas can be segmented by properly combining the intensity edges of the stereo images.
Video structuring for multimedia applications
Amit Chakraborty
In recent years, videos have become immensely popular. They are generated at an enormous rate every day by a variety of sources such as defense/civilian satellites, scientific experiments, biomedical imaging, industrial inspections, home entertainment systems, etc. This large amount of video data makes browsing and annotation a tedious and hard job when done by just fast-forwarding and rewinding. Organizing this information into well-structured databases is of crucial importance in order to be able to use these videos in a meaningful way. The user can then readily retrieve those sections of a video that he or she is interested in without having to go through all the videos involved. In this paper we use an integrated method that computes several metrics from the video frames, including the interframe difference, the histogram difference between frames and the time derivative of the intensity variance, and then uses probabilistic reasoning to break the video into shots. Our method works very well for videos with a mixture of abrupt and gradual scene changes.
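The combination step is compact enough to sketch. The Python fragment below (hypothetical names; simple majority voting over fixed thresholds stands in for the paper's probabilistic reasoning, and the threshold values are purely illustrative) computes the three metrics mentioned above and declares a cut when at least two of them fire:

```python
import numpy as np

def shot_boundary_scores(frames):
    """Per-transition metrics for cut detection (hypothetical sketch).

    `frames` is an iterable of 2-D uint8 grayscale arrays.  Returns three
    parallel lists: mean interframe difference, normalized histogram
    difference, and the time derivative of the intensity variance."""
    diffs, hist_diffs, var_derivs = [], [], []
    prev, prev_var = None, None
    for f in frames:
        f = f.astype(np.float64)
        var = f.var()
        if prev is not None:
            diffs.append(np.abs(f - prev).mean())
            h1, _ = np.histogram(prev, bins=64, range=(0, 255))
            h2, _ = np.histogram(f, bins=64, range=(0, 255))
            hist_diffs.append(np.abs(h1 - h2).sum() / f.size)
            var_derivs.append(abs(var - prev_var))
        prev, prev_var = f, var
    return diffs, hist_diffs, var_derivs

def detect_cuts(frames, t_diff=20.0, t_hist=0.5, t_var=100.0):
    # Declare a shot boundary when at least two of the three metrics
    # exceed their (illustrative) thresholds; a crude stand-in for the
    # probabilistic reasoning described in the abstract.
    d, h, v = shot_boundary_scores(frames)
    cuts = []
    for i, (a, b, c) in enumerate(zip(d, h, v)):
        if (a > t_diff) + (b > t_hist) + (c > t_var) >= 2:
            cuts.append(i + 1)  # boundary before frame i+1
    return cuts
```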
Object tracking under compressed domain
Currently, most approaches to object tracking operate in the spatial domain, using optical flow and depth or model-based methods, and need to decompress the video sequences before further processing. The computation for decompression is expensive and ill-suited to real-time control. Although some researchers do perform object tracking in the compressed domain, they only use part of the DCT values of the I frames in a video sequence, which does not take full advantage of the information available in the compressed domain. In this paper we consider a new method for object tracking which uses only the information supplied in the compressed domain by the MPEG encoder. The main scheme is to obtain the motion vectors of P and B frames directly from the MPEG video without decompressing it, and then to cluster objects based on the motion vectors. In particular, camera motion is also taken into account, since the camera's motion can influence the objects' motion and the segmentation results dramatically. Experiments based on the method described above have been carried out on several videos. The results obtained indicate that tracking objects in the compressed domain is very promising.
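As a rough illustration of the compressed-domain idea, the sketch below assumes the motion vectors have already been parsed from the P/B frames into an array; a per-component median stands in for whatever camera-motion model the authors actually use:

```python
import numpy as np

def segment_moving_blocks(mvs, thresh=2.0):
    """Separate macroblocks into background and moving-object blocks
    from decoded motion vectors alone (illustrative sketch).

    mvs: (H, W, 2) array of per-macroblock motion vectors.  The global
    (camera) motion is approximated by the per-component median; blocks
    whose residual motion exceeds `thresh` are labelled as object blocks."""
    global_motion = np.median(mvs.reshape(-1, 2), axis=0)
    residual = mvs - global_motion              # camera-motion compensation
    moving = np.linalg.norm(residual, axis=2) > thresh
    return moving, global_motion
```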
Minimum description length region tracking with level sets
This paper addresses the problem of tracking an arbitrary region in a sequence of images, given a pre-computed velocity field. Such a problem is of importance in applications ranging from video surveillance to video database search. The algorithm presented here formulates tracking as an estimation problem. We propose, as our estimation criterion, a precise description length measure that quantifies tracking performance. In this context, tracking is naturally formulated as minimum description length estimation. The solution to this estimation problem is given by particular evolution equations for the region boundary. The implicit representation of the region boundary by the zero level set of a smooth function yields an equivalent set of partial differential equations and the added benefit of topology independence; regions may split (e.g., for divergent velocity fields) or merge (e.g. for convergent velocity fields) during tracking, clearly a desirable feature in real-world applications. We illustrate the performance of the proposed algorithm on a number of real images with natural motion.
Image Processing
Multiscale scheme for image magnification
Using the wavelet transform (WT), a given signal is decomposed into a succession of embedded approximations and detail coefficients. Observation of the details shows that similarities can be noticed across scales, in particular for the transitions (edges in an image). A wavelet-based magnification that both increases the resolution of an image and adds high-frequency information is proposed in this paper. From a non-subsampled WT, the zero-crossings of the detail coefficients provide a consistent representation. From these coefficients, a prediction of high-frequency coefficients is possible via the computation of local Lipschitz exponents, but it requires an interpolation because the number of detail coefficients remains constant. The proposed magnification is based on Mallat's decimated algorithm. As this transformation is not shift-invariant, the local laws cannot be computed; the prediction is therefore realized via the learning of representative edge signatures. A multiscale database is constructed from the edge zero-crossings. The magnification quality is evaluated by application to synthetic and noisy images.
Wavelet-domain edge modeling with applications to image interpolation
Bo Tao,
Michael T. Orchard
Edge modeling in the wavelet domain is important to efficient image compression and accurate image interpolation among many other applications. In this work, we first analyze the properties of an edge in different scales. By using simple models for edges and wavelet filters, we show how edge coefficients in one scale relate to edge coefficients in another scale in both magnitude and phase, which we call scale coherence. Real image data are used to verify the goodness of the model. We further propose to use spatial coherence, i.e. the structural correlation of edge signals with their spatial neighbors, as additional tools to locate an edge. The model enables us to predict the edge coefficients in a high-frequency band by inspecting their counterparts in previous high-frequency bands and in their spatial neighborhood, and is applied to image interpolation.
Stochastic wavelet-based image modeling using factor graphs and its application to denoising
In this work, we introduce a hidden Markov field model for wavelet image coefficients within a subband and apply it to the image denoising problem. Specifically, we propose to model wavelet image coefficients within subbands as Gaussian random variables with parameters determined by the underlying hidden Markov process. Our model is inspired by the recent estimation-quantization (EQ) image coder and its excellent performance in compression. To reduce the computational complexity, we apply a novel factor graph framework to combine two 1-D hidden Markov chain models to approximate a hidden Markov random field (HMRF) model. We then apply the proposed models for wavelet image coefficients to perform an approximate minimum mean square error (MMSE) estimation procedure to restore an image corrupted by additive white Gaussian noise. Our results are among the state of the art in the field, and they indicate the promise of the proposed modeling techniques.
Motion Estimation and Video Coding
Three-dimensional motion and deformation estimation of deformable mesh
Albert A. Deknuydt,
Stefaan Desmet,
Kris Cox,
et al.
Recently, real-time capture of dynamic 3D objects has become feasible. The dynamic models obtained by various techniques come in the form of separate, highly detailed 3D meshes with texture at video rates. These represent such an amount of data as to hamper manipulation, editing and rendering. Data-compression techniques can alleviate this problem. Independent decimation of the separate meshes is an inferior solution for what is really a time-varying mesh: firstly, it causes unnatural flickering, and secondly, it leaves the inter-mesh correlation unexploited. Therefore, a hybrid technique may be a better solution. It consists of an 'intra' compression scheme working on a still mesh, a 3D motion estimator/predictor, and a coder for the prediction errors and side information (motion vectors and mesh segmentation). We describe a technique to segment a deforming mesh into regions with locally uniform motion. We start by interpreting the motion as samples of a 3D vector field. At each point, we estimate the translation, rotation and divergence of the vector field. As human faces are rather incompressible, we ignore the divergence component. Then, we cluster the population with the criterion of similar translation and rotation. Results show that this allows a deforming human face to be segmented into approximately 200 regions of locally uniform rigid motion, while keeping the motion prediction error under 5 percent. This is good enough for efficient compression.
Generation and tracking of mesh objects in image sequences
JuiTai Ko,
Bih-Wei Shyr,
Sheng-Jyh Wang
In this paper, we propose a new scheme that automatically generates a hierarchical mesh structure for a real image and then uses nodal block matching to track this mesh structure in an image sequence. First, a three-layer pyramid is built using a multi-resolution approach. For each layer, after extracting the high-curvature features, a linking procedure and a splitting process are applied to generate a compact set of representative points. By adopting the constrained Delaunay triangulation algorithm, the selected points can be triangulated into meshes. Starting from the coarsest layer and moving to the finest layer, we further eliminate duplicate nodes and then form a hierarchical mesh structure. Based on the hierarchical mesh structure and the intensity values at the mesh nodes, we can progressively reconstruct an image with simple linear interpolation. Moreover, the motion of a mesh is tracked using a coarse-to-fine approach to lower the computational complexity.
Matching algorithm based on Gödel coding scheme
Neslisah Dicle,
Volkan Atalay
We describe an algorithm for the correspondence of line features between two consecutive images. The algorithm is based on Gödel coding of the features and singular value decomposition. First, line segments are extracted by the Canny operator followed by the end-point-fit method. Line segments are represented by the coordinates of their midpoints and the angle of a perpendicular line from a reference point. Then, a proximity matrix is constructed following the minimal mapping theory of Ullman: if two line segments are correlated, the corresponding matrix element is the Gödel-coded difference of their features; otherwise, the element is assigned a maximum number. Finally, singular value decomposition is applied to the proximity matrix. The Gödel-coded differences strengthen the method because not only the norms of the vectors are compared for matching but also their unique Gödel numbers are involved. The proposed algorithm is implemented and tested on both calibrated and uncalibrated stereo image pairs, and the matching results are promising.
Multiple motion segmentation with level sets
Motion segmentation of an image sequence belongs to the most difficult and important problems in video processing and compression, and in computer vision. In this paper, we consider the problem of segmenting an image into multiple regions possibly undergoing different motions. To this end we use level sets of functions evolving according to certain partial differential equations. Contrary to numerous other motion segmentation algorithms based on level sets, we compute accurate motion boundaries without relying on intensity boundaries as an accessory. This will be illustrated on examples where intensity boundaries are hardly visible and yet motion boundaries are accurately identified. The main benefit of the level set representation is in its ability to handle variations in the topology of the level sets. As a result, it is only necessary to know the total number of distinct motion classes and their parameters. We describe an automatic initialization procedure that is based on feature point correspondences and K-means clustering in a 6-parameter space of affine parameters. We illustrate the performance of the proposed algorithm on real images with both real and synthetic motion.
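The initialization step described above is straightforward to sketch: fit a 6-parameter affine model over each feature point's neighborhood, then run K-means in the parameter space. The fragment below is a minimal sketch under those assumptions; the function names are hypothetical, and a plain least-squares fit replaces any robust estimator the authors may use:

```python
import numpy as np

def local_affine_params(src, dst, k=8):
    """6-parameter affine motion fitted over each point's k nearest
    neighbours; src/dst are (N, 2) arrays of corresponding points."""
    n = len(src)
    params = np.empty((n, 6))
    for i in range(n):
        idx = np.argsort(((src - src[i]) ** 2).sum(1))[:k]
        A = np.zeros((2 * k, 6))
        b = dst[idx].ravel()                 # x'0, y'0, x'1, y'1, ...
        A[0::2, 0:2] = src[idx]; A[0::2, 2] = 1.0   # x' = a x + b y + tx
        A[1::2, 3:5] = src[idx]; A[1::2, 5] = 1.0   # y' = c x + d y + ty
        params[i], *_ = np.linalg.lstsq(A, b, rcond=None)
    return params

def kmeans(x, k, iters=50, seed=0):
    """Plain K-means in the 6-D affine parameter space."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        labels = ((x[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = x[labels == j].mean(0)
    return labels, centers
```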
Poster Session
Efficient algorithm for identifying dependency regions for fast fractal image decoding
This paper describes a novel fractal coding scheme that significantly improves the efficiency of fractal image decoding. Removing a great number of the contractive transforms required by the decoding process can significantly reduce the decoding time. The proposed decoding scheme effectively finds dependency regions, whose range blocks are decoded by only one contractive transformation, from an encoded image. The experimental results show the significance of our proposed scheme in improving the efficiency of the fractal decoding process.
Modified symmetrical reversible variable length code and its theoretical bounds
Chien-Wu Tsai,
Ja-Ling Wu,
Shu-Wei Liu
Reversible variable length codes (RVLCs) have been adopted in the emerging video coding standards H.263+ and MPEG-4 to enhance their error-resilience capability, which is important and essential in error-prone environments. The most appealing advantage of symmetrical RVLCs over asymmetrical RVLCs is that only one code table is required for both forward and backward decoding, whereas asymmetrical RVLCs require two code tables. In this paper, we propose a simple and efficient algorithm that can produce a symmetrical RVLC from a given Huffman code, and we also discuss theoretical bounds of the proposed symmetrical RVLCs.
Scalable wavelet video coding using long-term memory motion-compensated prediction
Temporal redundancies are the key to high compression ratios in video coding. In order to improve the prediction gain of motion compensation, the concept of long-term memory motion-compensated prediction has been developed: more frames than just the previously decoded frame can be taken into account for motion compensation. Usually the motion is estimated in the encoder, where all unencoded frames are accessible. We investigate the applicability of long-term memory motion-compensated prediction to a scalable wavelet video coding scheme using backward motion compensation, where the motion is estimated in both the encoder and the decoder.
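Conceptually, long-term memory prediction widens the search to a set of reference pictures, and the motion information gains a picture index. The following is a brute-force sketch of that idea under an assumed interface, using an exhaustive SAD search rather than the authors' estimator:

```python
import numpy as np

def ltm_motion_search(block, refs, x, y, search=8):
    """Best match for `block` (B x B) over several reference frames.

    refs: list of 2-D arrays (most recent first); (x, y) is the block's
    top-left corner.  Returns (frame_index, dx, dy, sad): the reference
    picture index becomes part of the motion information, as in
    long-term memory motion-compensated prediction."""
    B = block.shape[0]
    best = (0, 0, 0, np.inf)
    for t, ref in enumerate(refs):
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                yy, xx = y + dy, x + dx
                if 0 <= yy and 0 <= xx and \
                   yy + B <= ref.shape[0] and xx + B <= ref.shape[1]:
                    cand = ref[yy:yy + B, xx:xx + B]
                    sad = np.abs(cand.astype(int) - block.astype(int)).sum()
                    if sad < best[3]:
                        best = (t, dx, dy, sad)
    return best
```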
New class of VQ codebook design algorithms using adjacency maps
Andreas Constantinou,
David R. Bull,
Cedric Nishan Canagarajah
We propose a new class of vector quantization (VQ) codebook design algorithms which alleviate many of the drawbacks associated with the well-known LBG algorithm and its variants. We introduce the notion of an adjacency map (AM), which provides a heuristic template for improved codebook design by reducing the search space required for exhaustive optimization, while providing solutions close to the global optimum, independent of the initial codewords or the target codebook size. An iterative adjacency merge (IAM) algorithm is presented, which outperforms the pairwise-nearest-neighbor (PNN) approach through conformance to the minimum adjacency map. Additionally, an exhaustive search algorithm is presented that reduces the search complexity to the minimum without introducing heuristics.
Wavelet image coding using intercontext arithmetic adaptation
Nikolaos V. Boulgouris,
Dimitrios Vyzovitis,
Michael G. Strintzis
In this paper, we present a novel approach to overcoming the context dilution problem in context-based entropy coding of images. We propose a family of algorithms that employ similarity among context models to improve their adaptation rate. The proposed scheme employs nonconventional updates of the probability tables kept by the context entropy coder, which extend the notion of symbol occurrence. Preliminary experimental results, obtained using wavelet-transformed images, indicate that the basic algorithm indeed improves the performance of conventional context modelers by enhancing the model adaptation rate, and achieves efficiency competitive with well-established algorithms.
Embedded DCT and wavelet methods for fine granular scalable video: analysis and comparison
Video transmission over bandwidth-varying networks is becoming increasingly important due to emerging applications such as streaming of video over the Internet. The fundamental obstacle in designing such systems resides in the varying characteristics of the Internet (i.e., bandwidth variations and packet-loss patterns). In MPEG-4, a new SNR scalability scheme, called fine-granular scalability (FGS), which is able to adapt in real time (i.e., at transmission time) to Internet bandwidth variations, is currently under standardization. The FGS framework consists of a non-scalable motion-predicted base layer and an intra-coded fine-granular scalable enhancement layer. For example, the base layer can be coded using a DCT-based, MPEG-4 compliant, highly efficient video compression scheme. Subsequently, the difference between the original and the decoded base layer is computed, and the resulting FGS residual signal is intra-frame coded with an embedded scalable coder. In order to achieve high coding efficiency when compressing the FGS enhancement layer, it is crucial to analyze the nature and characteristics of the residual signals common to the SNR scalability framework (including FGS). In this paper, we present a thorough analysis of SNR residual signals by evaluating their statistical properties, compaction efficiency and frequency characteristics. The signal analysis reveals that the energy compaction of the DCT and wavelet transforms is limited and that the frequency characteristic of SNR residual signals decays rather slowly. Moreover, the blockiness artifacts of the low bit-rate coded base layer result in artificial high frequencies in the residual signal. Subsequently, a variety of wavelet and embedded DCT coding techniques applicable to the FGS framework are evaluated, and their results are interpreted based on the identified signal properties. As expected from the theoretical signal analysis, the rate-distortion performances of the embedded wavelet and DCT-based coders are very similar. However, improved results can be obtained for the wavelet coder by deblocking the base layer prior to the FGS residual computation. Based on the theoretical analysis and our measurements, we conclude that, for an optimal complexity versus coding-efficiency trade-off, only a limited wavelet decomposition (e.g., 2 stages) needs to be performed for the FGS residual signal. Also, it was observed that the good rate-distortion performance of a coding technique for a certain image type (e.g., natural still images) does not necessarily translate into similarly good performance for signals with different visual characteristics and statistical properties.
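The layering structure itself is compact enough to sketch. The fragment below is a structural illustration only: the bitplane split is shown in the pixel domain, whereas FGS operates on transform coefficients, and all entropy coding is omitted. It shows how the enhancement layer can be truncated at an arbitrary number of bitplanes:

```python
import numpy as np

def fgs_enhancement_bitplanes(original, base_decoded, num_planes=8):
    """FGS-style enhancement layer: the residual between the original
    and the decoded base layer is split into bitplanes, most
    significant first, so the stream can be cut at any point."""
    residual = original.astype(np.int16) - base_decoded.astype(np.int16)
    sign = np.signbit(residual)
    mag = np.abs(residual).astype(np.uint8)
    planes = [(mag >> p) & 1 for p in range(num_planes - 1, -1, -1)]
    return sign, planes        # transmit sign once, then planes in order

def reconstruct(base_decoded, sign, planes, received):
    """Decode using only the first `received` bitplanes."""
    mag = np.zeros_like(base_decoded, dtype=np.int16)
    for i, plane in enumerate(planes[:received]):
        mag |= plane.astype(np.int16) << (len(planes) - 1 - i)
    residual = np.where(sign, -mag, mag)
    out = base_decoded.astype(np.int16) + residual
    return np.clip(out, 0, 255).astype(np.uint8)
```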
Lossless image compression by adaptive contextual encoding
This paper deals with the reversible intraframe compression of grayscale images. With reference to a spatial DPCM scheme, prediction may be accomplished in a space-varying fashion following two main strategies: adaptive, i.e., with predictors recalculated at each pixel position, and classified, in which image blocks or pixels are preliminarily labeled into a number of statistical classes, for which minimum-MSE (MMSE) predictors are calculated. In this paper, a trade-off between the above two strategies is proposed, which relies on a classified linear-regression prediction obtained through fuzzy techniques and is followed by context-based statistical modeling of the resulting prediction errors to enhance entropy coding. A thorough performance comparison with the most advanced methods in the literature highlights the advantages of the fuzzy approach.
Semi-optimized padding method for arbitrarily shaped image coding
Koh'ichi Takagi,
Atsushi Koike,
Shuichi Matsumoto
In this paper, we propose a new 'padding method,' a technique that allows encoding with a square block by assigning imaginary values to the pixels outside the object. We show our method to be more effective in coding performance than conventional ones. We theoretically study the padding method from the viewpoint of coding performance, or rate-distortion. We first define criteria functions to evaluate each padding method, and then concretely propose how to determine the padded pixel values. We select a criterion for the distribution of coefficients after applying the DCT and for the quantization error of the DCT coefficients after quantization. The pixel values to be filled into the background pixels can then be calculated so that both the entropy and the quantization error are low for the given quantizer. We tested our padding method by computer simulation using square test blocks (8 x 8 pixels). The results showed that our proposed method gains on average 2 to 3 dB over the entire bit-rate range compared to the MPEG-4 VM padding method. We then applied the padding method to entire test images (having no segmentation information) and found that the coding performance increased by about 0.5 to 0.6 dB compared to the method applying a typical 2D DCT.
Exploiting the third dimension in the lossless and near-lossless compression of medical volume images and video
Dirk De Rycke,
Steven Van Assche,
Wilfried R. Philips,
et al.
Recent advances in digital technology have caused a huge increase in the use of 3D image data. In order to cope with large storage and transmission requirements, data compression is necessary. Although lossy techniques have been shown to achieve higher compression ratios than lossless techniques, the latter are sometimes required, e.g., in medical environments. Many lossless image compressors exist, but most of them do not exploit interframe correlations. In this paper we extend and refine a recently proposed technique which combines intraframe prediction and interframe modeling but whose performance was still significantly worse than that of state-of-the-art intraframe methods. After adding techniques often used in those state-of-the-art schemes and other refinements, a fair comparison with state-of-the-art intraframe coders is made. It shows that the refined method achieves considerable gains on video and medical images compared to these purely intraframe methods. The method also shows some good properties, such as graceful compression-ratio degradation when the interframe gap (medical volume data) or interframe delay (video) increases.
Achieving idempotence in near-lossless JPEG-LS
The lossless and near-lossless image compression standard JPEG-LS, while offering state-of-the-art compression performance with low complexity, fails to be idempotent in near-lossless mode (i.e., images degrade upon successive compression/decompression cycles). This paper identifies the cause and presents two solutions. First, it presents a modification to the compressor and decompressor that maintains or improves the error bounds and achieves idempotence. Second, it describes a preprocessor that acts upon any image and returns one on which JPEG-LS does perform idempotently, at the expense of doubling the guaranteed error bound on a small subset of pixels (typically below 0.5%).
Adaptive segmentation of wavelet transform coefficients for video compression
Piotr Wasilewski
This paper presents a video compression algorithm suitable for inexpensive real-time hardware implementation. The algorithm utilizes the discrete wavelet transform (DWT) with a new adaptive spatial segmentation algorithm (ASSA). The algorithm was designed to obtain better or similar decompressed video quality compared to the H.263 recommendation and the MPEG standard using lower computational effort, especially at high compression rates. The algorithm was optimized for hardware implementation in low-cost field programmable gate array (FPGA) devices. The luminance and chrominance components of every frame are encoded with a 3-level wavelet transform with a biorthogonal filter bank. The low-frequency subimage is encoded with an ADPCM algorithm. For the high-frequency subimages, the new adaptive spatial segmentation algorithm is applied. It divides images into rectangular blocks that may overlap each other, with the width and height of the blocks set independently. There are two kinds of blocks: low variance blocks (LVB) and high variance blocks (HVB). The positions of the blocks and the values of the WT coefficients belonging to the HVB are encoded with modified zero-tree algorithms; the LVB are encoded with their mean value. The results obtained show that the presented algorithm gives similar or better quality of decompressed images compared to H.263, by up to 5 dB in PSNR.
Component ratio preserving compression for remote sensing applications
This paper presents a new distortion measure for multi-band image vector quantization. The distortion measure penalizes the deviation in the ratios of the components. We design a VQ coder for the proposed ratio distortion measure. We then give experimental results that demonstrate that the new VQ coder yields better component ratio preservation than conventional techniques. For sample images, the proposed scheme outperforms SPIHT, JPEG and conventional VQ in color ratio preservation.
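The abstract does not give the exact functional form of the distortion measure, so the following is only one plausible instance, combining squared error with a squared log-ratio penalty; `alpha`, `eps` and the function names are illustrative assumptions:

```python
import numpy as np

def ratio_distortion(x, c, eps=1e-6, alpha=1.0):
    """Distortion between source vector x and codeword c that also
    penalises deviation in the ratios of the (spectral) components.
    The log-ratio term is invariant to common scaling of x and c,
    which is what 'ratio preservation' asks for."""
    mse = np.sum((x - c) ** 2)
    ratio = np.sum(np.log((x + eps) / (c + eps)) ** 2)
    return mse + alpha * ratio

def quantize(x, codebook, **kw):
    # Nearest codeword under the ratio-preserving measure.
    d = [ratio_distortion(x, c, **kw) for c in codebook]
    return int(np.argmin(d))
```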
Efficient shape coding algorithm by quadtree decomposition for MPEG-4
Chil-Cheang Ma,
Mei-Juan Chen
Due to the rapid growth of video coding techniques, object-oriented video compression has played an important role in recent years. MPEG-4 Visual is the first international standard allowing the transmission of arbitrarily shaped video objects. In this paper, we propose a new shape coding algorithm called the quadtree-based binary shape coding (QTSC) algorithm. The proposed method is based on the simple characteristics of the quadtree decomposition to improve coding efficiency. Quadtree (QT) decomposition has been used as a part of image sequence compression algorithms; it is a simple technique used to obtain an image representation at different resolution levels. An improved quadtree shape coding (IQTSC) algorithm that further increases coding performance is also presented. We compare the proposed method with the bitmap-based and contour-based methods for MPEG-4. Simulation results show that the coding performance of the improved scheme is better than that of conventional shape coding methods. The proposed methods are shown to be simple and efficient solutions suitable for hardware implementation.
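The underlying quadtree idea can be illustrated directly on a binary alpha block: uniform quadrants terminate the recursion, mixed quadrants split into four. This shows only the decomposition, not the QTSC bitstream syntax or its entropy coding; the mask is assumed to be a power-of-two square:

```python
import numpy as np

def quadtree_code(mask):
    """Recursively code a square binary alpha block: '0' = all
    transparent, '1' = all opaque, '(' + four children + ')' otherwise."""
    if mask.min() == mask.max():
        return str(int(mask.flat[0]))
    h = mask.shape[0] // 2
    quads = (mask[:h, :h], mask[:h, h:], mask[h:, :h], mask[h:, h:])
    return "(" + "".join(quadtree_code(q) for q in quads) + ")"

# e.g. a 4x4 block whose lower-right quadrant is opaque:
m = np.zeros((4, 4), dtype=np.uint8); m[2:, 2:] = 1
print(quadtree_code(m))  # prints "(0001)"
```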
Using a model of the human visual system to identify and enhance object contours in natural images
Segmentation of natural images depends on the ability to identify continuous contours that define the boundaries between objects. However, in many natural images (especially those captured in environments where the illumination is largely ambient) continuous contours can be difficult to identify. In spite of this, the human visual system efficiently perceives the contours along the boundaries of occluding objects. In fact, optical illusions, such as the Kanizsa triangle, demonstrate that the human visual system can 'see' object boundaries even when spatial intensity contrasts are totally absent from an image. In searching for the mechanism that generates these 'subjective contours', neurological researchers have found that the 2D image on the retina is mapped onto Layer 4 of the primary visual cortex (V1) and that there are lateral connections within the 6 layers of V1 that might subserve contour completion. This paper builds on a previous model of the early visual system (including the retina, the LGN and the simple cells of V1) by adding lateral interconnections to demonstrate how these interconnections might provide contour completion. Images are presented to show how this model enhances the detection of continuous spatial contours, thus contributing to the segmentation of natural images.
MRF-based texture segmentation using wavelet decomposed images
One difficulty of textured image segmentation in the past was the lack of computationally efficient models that can capture the statistical regularities of textures over large distances. Recently, to overcome this difficulty, Bayesian approaches capitalizing on the computational efficiency of multiresolution representations have received attention. Most previous research has been based on multiresolution stochastic models which use the Gaussian pyramid decomposition as the image decomposition scheme. In this paper, motivated by the nonredundant, directionally selective, and highly discriminative nature of the wavelet representation, we present an unsupervised textured image segmentation algorithm based on multiscale stochastic modeling over the wavelet decomposition of the image. The model, using doubly stochastic Markov random fields (MRFs), captures intrascale statistical dependencies over the observed image's wavelet decomposition, and intrascale and interscale statistical dependencies over the corresponding multiresolution region image (an unobserved image which contains the classification of pixels in the image). For the sake of computational efficiency, versions of the expectation-maximization (EM) algorithm and the maximum a posteriori (MAP) estimate, which are based on the mean-field decomposition of the a posteriori probability, are used for estimating the model parameters and the segmented image, respectively.
Using density and spatial cues for clustering image pixels in real time
The goal of our work is efficient clustering of object pixels from a sequence of live images for use in real-time applications including object recognition and tracking. We propose a novel approach to clustering object pixels into separate objects using density and spatial cues. The suggested method runs in linear time, accounts for image noise and yields real-time performance.
Feasibility of using a stabilizer in constraining the motion of epicardial tissues during MIDCAB surgery: analysis using image processing techniques
Vijay K. Subramaniam,
Sathyanarayana S. Rao,
Kiran Pallegadda
This paper deals with the motion analysis and measurement of epicardial tissues of the heart during MIDCAB (minimally invasive direct coronary artery bypass) surgery and subsequently determines the effectiveness of a stabilizer used during this process. It involves measurement of the movement of muscles within and outside the stabilizer using image processing techniques, based on which the surgeons can determine the effectiveness of the stabilizer. We used a localization approach to measure the displacement of muscles. Comparisons were made with previous approaches that utilized global invariants such as the weighted PSNR method. It is found that the standard deviation is lower with the localization approach, which means the accuracy of measurement obtained is better.
Implementation and analysis of an optimized rainfalling watershed algorithm
In this paper we discuss a new implementation of a floating-point-based rainfalling watershed algorithm. First, we analyze and compare our proposed algorithm and its implementation with two implementations based on the well-known discrete Vincent-Soille flooding watershed algorithms. Next, we show that by carefully designing and optimizing our algorithm, a memory (bandwidth) efficient and high-speed implementation can be realized. We report timing and memory usage results for different compiler settings, computer systems and algorithmic parameters. Our optimized implementation turns out to be significantly faster than the two Vincent-Soille based implementations with which we compare it. Finally, we include some segmentation results to illustrate that visually acceptable and almost identical segmentation results can always be obtained for all algorithms being compared, and we also explain how, in combination with other pre- or post-processing techniques, the problem of oversegmentation (a typical problem of all raw watershed algorithms) can be (partially) overcome. All these properties make our proposed implementation an excellent candidate for use in various practical applications where high-speed performance and/or efficient memory usage is needed.
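For readers unfamiliar with the rainfalling formulation, the unoptimised sketch below conveys the idea: each pixel drains along the path of steepest descent to a local minimum, and pixels draining to the same minimum form one catchment basin. Plateaus are handled naively here, unlike a careful implementation, and no label-path memoisation tricks of the paper are reproduced:

```python
import numpy as np

def rainfalling_watershed(relief):
    """Label each pixel of a 2-D float relief image (e.g. a gradient
    magnitude) with the catchment basin it drains into."""
    H, W = relief.shape
    labels = -np.ones((H, W), dtype=int)
    nbrs = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
            (0, 1), (1, -1), (1, 0), (1, 1)]
    next_label = 0
    for y in range(H):
        for x in range(W):
            path, cy, cx = [], y, x
            while labels[cy, cx] < 0:
                path.append((cy, cx))
                best, bval = None, relief[cy, cx]
                for dy, dx in nbrs:   # steepest strictly-lower neighbour
                    ny, nx = cy + dy, cx + dx
                    if 0 <= ny < H and 0 <= nx < W and relief[ny, nx] < bval:
                        best, bval = (ny, nx), relief[ny, nx]
                if best is None:      # local minimum: open a new basin
                    labels[cy, cx] = next_label
                    next_label += 1
                    break
                cy, cx = best
            lab = labels[cy, cx]
            for py, px in path:       # whole path drains to this basin
                labels[py, px] = lab
    return labels
```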
Blocking artifact removal based on blockiness estimation
Qinggang Zhou,
Chris Basoglu,
Woobin Lee
We present a fast and robust blocking artifact removal method for images coded using block transforms. While other artifact removal methods apply complicated smoothing schemes to the entire image, this method estimates the level of blockiness and applies a simple low-pass filter only to the blocking artifacts. In this paper, we first formulate a function that describes the relative gradient continuities of the pixel values. This function makes use of the characteristics of the blocking artifact, such as its position and magnitude, to distinguish real edges from blockiness; it is mostly continuous in smooth areas but discontinuous in blocky areas. The results of the function are compared to an empirically obtained threshold to determine the existence of a blocking artifact. Once the artifact is detected, any smoothing method can be applied. On test images coded with the JPEG standard, our method visually removed almost all of the blocking artifacts. The signal-to-noise ratio also improved but, more importantly, the subjective quality of the images processed with our method was noticeably better than that of other methods. In addition, our method did not degrade image areas where artifacts were not present. The false detection rate of our method was found to be less than 1% on the test images, thereby preserving the true edges in the image.
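A minimal version of the boundary test might look as follows: compare the luminance step across each 8-pixel block boundary with the gradients just inside the adjacent blocks, and flag the boundary when the ratio exceeds an empirical threshold. The actual detection function in the paper is more elaborate; the names and threshold below are illustrative:

```python
import numpy as np

def blocky_boundaries(img, block=8, ratio=3.0):
    """Flag vertical block boundaries whose luminance step is large
    relative to the gradients just inside the neighbouring blocks.
    Returns one boolean per boundary column, averaged over rows."""
    img = img.astype(np.float64)
    flags = []
    for x in range(block, img.shape[1] - 1, block):
        step = np.abs(img[:, x] - img[:, x - 1])            # across boundary
        inner = 0.5 * (np.abs(img[:, x - 1] - img[:, x - 2]) +
                       np.abs(img[:, x + 1] - img[:, x]))    # inside blocks
        rel = step / (inner + 1.0)      # relative gradient continuity
        flags.append(rel.mean() > ratio)
    return flags
```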
Improved method for digital image manipulation: storage, transmission, and display
We present and analyze a new digital image manipulation method. Our main goals are to optimize the use of resources for image generation, storage, transmission, processing and display, and to display images with high quality. The proposed method consists of generating, storing, transmitting and processing images with fewer image samples (the manipulation resolution) than screen pixels (the display resolution). The display resolution is greater than the manipulation resolution, and in the last stage of the proposed method, the image display stage, we use a high-quality reconstruction technique to generate the new pixels. In this work, we use the two-dimensional normalized sampled finite sinc reconstructor (NSFSR 2-D). We make qualitative and quantitative analyses of the proposed and currently used methods and observe two important situations. Using the same image display resolution in both methods, the proposed method has a smaller image manipulation resolution and resource usage, while the image display quality is similar. Using the same image manipulation resolution and different image display resolutions, both methods have the same image manipulation resource usage, and the proposed method has much better image display quality. We conclude that the proposed image manipulation method achieves better overall image quality and drastically reduces resource usage, such as network bandwidth, processing and storage capacity. Thus, our main goals were achieved.
Multiframe combination and blur deconvolution of video data
In this paper we present a technique that may be applied to surveillance video data to obtain a higher-quality image from a sequence of lower-quality images. The increase in quality is derived through a deconvolution of optical blur and/or an increase in spatial sampling. To process sequences of real forensic video data, three main steps are required: frame and region selection, displacement estimation, and original image estimation. A user-identified region of interest (ROI) is compared to other frames in the sequence. The areas that are suitable matches are identified and used for displacement estimation. The calculated displacement vector images describe the transformation of the desired high-quality image to the observed low-quality images. The final stage is based on the projection onto convex sets (POCS) super-resolution approach of Patti, Sezan, and Tekalp. This stage performs a deconvolution using the observed image sequence, the displacement vectors, and an a priori known blur model. A description of the algorithmic steps is provided, and an example input sequence with its corresponding output image is given.
2D wavelet feature detection for defining curved boundaries in Landsat images
Multiscale feature detection will be extended over a large region of Landsat images to define boundaries between homogeneous regions formed by individual crops. It is expected that it will be possible to define a grid between homogeneous regions defined by both man-made boundaries (2-D edges) and river beds (2-D curves) which define the availability of water. This approach might be usefully applied to remote sensing images based on other wavelengths (i.e. IR or laser remote sensing).
Facial motion parameter estimation and error criteria in model-based image coding
Model-based image coding has been given extensive attention due to its high subjective image quality and low bit-rates, but the estimation of object motion parameters is still a difficult problem, and there are no proper error criteria for quality assessment that are consistent with visual properties. This paper presents an algorithm for facial motion parameter estimation based on feature point correspondence and gives motion parameter error criteria. The facial motion model comprises three parts: the global 3-D rigid motion of the head, non-rigid translational motion in the jaw area, and local non-rigid expression motion in the eye and mouth areas. The feature points are automatically selected by a function of edges, brightness and end-nodes outside the blocks of the eyes and mouth, and the number of feature points is adjusted adaptively. The jaw translational motion is tracked by the changes of the feature point positions of the jaw. The areas of non-rigid expression motion can be rebuilt using a block-pasting method. An approach to estimating the motion parameter error based on the quality of the reconstructed image is suggested, and an area error function and the error function of the contour transition-turn rate are used as quality criteria. The criteria properly reflect the image geometric distortion caused by errors in the estimated motion parameters.
Triangle mesh-based motion compensation scheme with shape-adaptive wavelet transform
Martina Eckert,
Damian Ruiz,
Jose Ignacio Ronda,
et al.
In this paper, we present a video coding scheme which combines motion compensation (MC) based on irregular triangle meshes, the wavelet transform (DWT) and zerotree coding. This scheme includes the possibility of using the wavelet transform either conventionally (over the whole frame) or in a shape-adaptive version (SADWT) applied to selected regions of the error image. We propose to transform regions formed by groups of triangles situated over parts of the error image with a high variance level. In this way it is possible to restrict the transmitted error information to the areas where most motion takes place, which is especially useful at low bit-rates. As the regions can also correspond to object shapes, this method also favors the object-based approach of our scheme. The transformation method yields a number of coefficients exactly equal to the number of pixels in the arbitrarily shaped region, while zerotree coding is performed over the whole frame. Results of applying this technique in a low bit-rate environment are presented at the end of the paper.
Fast four-step search algorithm using UESA and quadrant selection approach for motion estimation
Motion estimation is widely used by various video coding standards. Full search is the most straightforward and optimal block matching algorithm, but its huge computational complexity is its major drawback. To overcome this problem, several fast block matching motion estimation algorithms have been reported. In this paper, a fast four-step search algorithm based on the strict application of the unimodal error surface assumption is proposed. A quadrant selection approach is adopted to reduce the computational complexity. The algorithm is adaptive in the sense that it can be stopped at the second or third step, depending on the motion content of the block, based on the half-stop technique. Simulation results show that the number of search points in our algorithm is almost half that of the conventional four-step search algorithm. The total number of search points varies from 7 to 17 in our proposed algorithm; the worst-case computational requirement is only 17 block matches. Our algorithm is robust, as its performance is independent of the motion of the image sequences. It also possesses the regularity and simplicity desirable for hardware implementation.
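For reference, the conventional four-step search that the paper improves upon can be sketched as follows. This is the baseline 4SS with early termination under the unimodal error surface assumption, not the proposed quadrant-selection variant:

```python
import numpy as np

def sad(a, b):
    return np.abs(a.astype(int) - b.astype(int)).sum()

def four_step_search(cur, ref, x, y, B=16):
    """Classic four-step search around block (x, y): three steps on a
    9-point pattern with spacing 2, recentring on the best point, then
    a final 3x3 step with spacing 1.  When the centre wins early, the
    search jumps straight to the final small step."""
    block = cur[y:y + B, x:x + B]
    def cost(dx, dy):
        yy, xx = y + dy, x + dx
        if yy < 0 or xx < 0 or yy + B > ref.shape[0] or xx + B > ref.shape[1]:
            return np.inf
        return sad(block, ref[yy:yy + B, xx:xx + B])
    cx = cy = 0
    for step in range(4):
        s = 1 if step == 3 else 2
        pts = [(cx + i * s, cy + j * s) for i in (-1, 0, 1) for j in (-1, 0, 1)]
        dx, dy = min(pts, key=lambda p: cost(*p))
        if (dx, dy) == (cx, cy) and step < 3:
            pts = [(cx + i, cy + j) for i in (-1, 0, 1) for j in (-1, 0, 1)]
            return min(pts, key=lambda p: cost(*p))
        cx, cy = dx, dy
    return cx, cy
```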
Fast block motion estimation algorithm based on combined subsamplings on pixels and search candidates
Block motion estimation is one of the key technologies in video compression and has been widely adopted by several existing international video coding standards. Many popular block motion estimation methods, including the three-step search (TSS), new three-step search (NTSS), and four-step search (4SS), have assumed that the error surface is unimodal over the search area or that the motion vector is center-biased. However, these assumptions do not hold for most MPEG-1,2 video frames. As a result, schemes relying on these assumptions exhibit degraded performance when applied to MPEG-1,2 video frames. In this paper, we propose a fast block matching motion estimation scheme based on an integration of pixel subsampling and search candidate subsampling. Compared with the well-known TSS algorithm, the proposed scheme visits more candidates, so it can largely avoid being trapped in local minima and is therefore more robust. Experimental results using typical MPEG-1 video frames show that the proposed algorithm can achieve better PSNR as well as higher speed-up ratios than the well-known TSS. In addition, the combined subsampling scheme has a regular structure that facilitates easy hardware implementation.
Fast block-matching algorithm using threshold-based half stop, cross search, and partial distortion elimination
The new three-step search (NTSS) algorithm obtains good picture quality in predicted images with, on average, reduced computation. To further reduce the computation while keeping the error performance of NTSS, this paper proposes a fast NTSS algorithm using the unimodal error surface assumption, the correlation of causally adjacent matching errors, the partial distortion elimination (PDE) algorithm and a cross search algorithm. The proposed algorithm prunes the less important checking points of the first step of NTSS by using the initial sum of absolute differences (SAD) and an adaptive SAD threshold. Instead of checking seventeen candidate points in the first step as in NTSS, our search algorithm starts with nine checking points according to the result of a comparison between the initial SAD and the adaptive threshold, which is derived from the causally adjacent SADs. For further computational reduction without any degradation in prediction quality, we employ PDE and a cross search algorithm. Because the threshold adapts to the characteristics of each sequence, the algorithm can be applied to a wide variety of applications. Experimentally, our algorithm shows good performance in terms of the PSNR of predicted images and the average number of checking points per block compared with the conventional NTSS and TSS algorithms.
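Of the ingredients listed above, partial distortion elimination is the simplest to illustrate: the SAD is accumulated row by row and the candidate is abandoned as soon as the partial sum already exceeds the best match found so far. A minimal sketch:

```python
import numpy as np

def sad_with_pde(block, cand, best_so_far):
    """Row-by-row SAD with partial distortion elimination.  Returns the
    full SAD, or None if the candidate was abandoned early."""
    total = 0
    for r in range(block.shape[0]):
        total += int(np.abs(block[r].astype(int) - cand[r].astype(int)).sum())
        if total >= best_so_far:
            return None          # cannot beat the current best; stop early
    return total

# typical use inside a search loop:
#   d = sad_with_pde(block, cand, best_sad)
#   if d is not None and d < best_sad:
#       best_sad, best_mv = d, (dx, dy)
```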
Motion-based segmentation for object-based video coding and indexing
A 'region-based' approach to the problem of motion estimation and segmentation in video sequences is presented. The devised algorithm requires an initial still-picture partition and a dense optical flow: affine region motion parameters are robustly estimated from pixel motion vectors on color-homogeneous regions, which are further merged on a motion-homogeneity criterion and temporally tracked. Computer simulation results and comparisons with other approaches are given. Applications to object-based representation, manipulation and coding, as well as indexing of video, are discussed.
Fast block search using Haar decomposition
Jarkko Kari,
Gang Liang,
Simant Dube,
et al.
We propose and analyze a new technique of fast block matching for motion estimation applications. The technique is based on computing lower bounds for the mean square difference between blocks in their Haar decomposition, using only as few Haar coefficients as is necessary. The algorithm always finds the optimal solution under the mean square error metric. Experiments show a significant speed-up over the exhaustive search algorithm.
Efficient motion estimation algorithm for video transcoding
Mei-Juan Chen,
Ming-Chung Chu,
Chih-Wei Pan
In multimedia applications, it is often necessary to adapt the bit-rate of coded video bit streams to the available bandwidth of various communication channels. Because different networks may have different bandwidths, a gateway can include a transcoder to adapt the video bit-rates in order to provide video services to users on different networks. In transcoding, motion estimation is usually not performed in the transcoder because of its heavy computational complexity. To speed up the operation, a video transcoder usually reuses the decoded motion vectors from the incoming bit stream. Previously, the bilinear interpolation and forward dominant vector selection (FDVS) methods were proposed to reuse motion vectors. In this paper, we propose a new algorithm called activity dominant vector selection (ADVS), which utilizes the quantized discrete cosine transform (DCT) coefficients of residual blocks for composing a motion vector from the incoming ones. In addition, a new motion vector refinement algorithm called variable step-size search (VSS) is presented. The performance is improved while maintaining low computational complexity.
Efficient image denoising using side information
Recently, a constructive practical framework for the problem of source coding with side information was proposed. In this work, we address the problem of denoising images with side information at the decoder. We approach this by adding a digital side channel and decoding the digital bits with the help of a noisy image (the side information) at the decoder. The encoder compresses the image using the knowledge that the decoder has access to a noisy version of it. We propose a rate allocation technique to optimally allocate the rate among the wavelet coefficients of the image. With a transmission rate of 0.8175 bits/source sample, we get a gain of 1.665 dB over conventional source coding techniques. This is achieved by modifying only 9% of the wavelet coefficients in the conventional source coder.
Automatic face detection and tracking for H.263-compatible region-of-interest coding
In this paper, an H.263-compatible region-of-interest coding system is presented. A face detection and tracking algorithm is applied to find the region of interest (ROI). The input image is filtered by a region-adaptive lowpass filter which blurs the image outside the facial region. This region-adaptive lowpass filtering leads to a graceful degradation of the image quality outside the ROI, whereas the ROI remains unchanged. Since no modification is necessary at the encoder, the ROI preprocessing step is compatible with any existing implementation of the H.263 encoder. For face detection, color information is integrated into a detection algorithm based on principal components analysis. Once a face is detected, tracking is based on color information. The ROI filter can be combined with any video sequence coder. Simulation results are given using the H.263 standard. The bitrate can be significantly reduced while retaining high perceptual quality.
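The preprocessing idea is simple to sketch: blur everything outside the ROI, leave the ROI untouched, and feed the result to an unmodified encoder. The fragment below uses a box filter as a stand-in for whatever lowpass the authors actually apply; the function name and kernel size are illustrative:

```python
import numpy as np

def roi_preprocess(frame, roi_mask, k=5):
    """Blur a grayscale frame outside the (face) ROI with a k x k box
    filter; ROI pixels pass through unchanged.  The output can be fed
    to any standard encoder, since only the input image is modified."""
    f = frame.astype(np.float64)
    pad = k // 2
    padded = np.pad(f, pad, mode="edge")
    blurred = np.zeros_like(f)
    for dy in range(k):              # box filter via shifted sums
        for dx in range(k):
            blurred += padded[dy:dy + f.shape[0], dx:dx + f.shape[1]]
    blurred /= k * k
    return np.where(roi_mask, f, blurred).astype(frame.dtype)
```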
Efficient multimedia distribution framework for Internet video
Jia Yao,
Jozsef Vass,
Yan Huang,
et al.
Multimedia distribution over the Internet is becoming increasingly popular. Since the Internet was designed for computer data communication, satisfying the different characteristics and requirements of multimedia streams poses significant challenges. For effective and efficient Internet video streaming, many issues (e.g., multiresolution representation, multicast transmission, error control, synchronization, etc.) must be addressed. In this paper, a novel framework for Internet video streaming is proposed. For video compression, our previously developed three-dimensional significance-linked connected component analysis (3D-SLCCA) codec is applied. 3D-SLCCA provides high coding efficiency, multiresolution video representation, transmission error resilience, and low computational complexity. For audio coding, the GSM standard is used. For error control, retransmission and error concealment are jointly applied. Multiresolution multicast transmission is implemented by assigning different multicast group addresses to different video layers; each receiver thus subscribes to the maximum number of layers that both its hardware resources and its network capacity can handle. By using a hierarchically structured multicast tree, each node is responsible for caching packets, collecting NACK packets, and sending repair packets. This not only significantly reduces the latency but also efficiently solves the 'ACK implosion' problem. As opposed to data transmission, reliable multicast support is not required from the network infrastructure. Based on timing constraints and the importance of lost packets, each receiver decides whether to ask for retransmission or to apply error concealment. Finally, synchronization is accomplished by using the timestamp mechanism of RTP. When the network does not support multicasting, proxy servers implemented on workstations can be used to perform similar functionalities.
Video communications over wireless ATM networks
Jozsef Vass,
Yan Huang,
Xinhua Zhuang
An adaptive and integrated video communication system is proposed for wireless ATM. Video received from the wireline source is adapted at the base station to both the hardware capabilities of the mobile host and time-varying wireless channel conditions. Following the application-level framing principle, source coding, channel coding, and packetization are jointly implemented as part of the application, and only simple services are requested from the underlying wireless network infrastructure. Highly efficient and robust source coding, channel coding, and packetization techniques are also proposed. For source coding, we propose to use our three-dimensional significance-linked connected component analysis video codec. For channel coding and packetization, both intracell and interlaced (intercell) forward error correction are applied. Furthermore, the time-varying channel characteristics are exploited by adaptively allocating the total bit budget between source coding and channel coding. Performance evaluation demonstrates the effectiveness of the proposed wireless video communication system.
Performance analysis of image transmission over wireless communication channels
The transmission of real-time images and video over wireless communication channels is still a challenging problem. Digitally compressed images are sensitive to the bit errors typical of wireless communications. Moreover, the bandwidth at the air interface is currently a limiting factor because the first and second generations of mobile phone standards mainly support voice communications. In this paper, we present our study of real-time image traffic over a radio link; the aim of this research is videophone applications. In this study, we use the discrete wavelet transform (DWT) to compress images and a code division multiple access (CDMA) link to transfer images over wireless communication channels. The results of the experiment show that it is possible to transfer 4 QCIF images per second over a CDMA link with minor degradation in image quality. This study was conducted by the VLSI Signal, Image and Video Processing Research Laboratory at the University of California, San Diego (UCSD).
Dynamic quality-of-service management in wireless multimedia networks
The provisioning of quality of service (QoS) in future wireless communication networks is a complex problem due to the presence of changing network connectivity, user mobility, and shared, noisy, highly variable and limited communication links. In this paper, we propose a QoS management framework in wireless multimedia networks. A new comprehensive service model considering both traffic characteristics and user mobility for wireless multimedia networks is proposed. Based on this proposed service model, adaptive resource scheduling, admission control and resource reservation schemes are applied appropriately. Simulation results show that the proposed scheme can achieve higher network utilization and the resulting multimedia traffic can get better quality of service guarantees at different levels.
Comparison of multiple-description coding and layered coding based on network simulations
Layered coding has been proposed as a method of 'quality adaptation' for the Internet's best-effort service model. The disadvantage of layered coding is that if the base layer packets are lost, the enhancement layers are rendered useless. To achieve error-free transmission of the base layer, ARQ could be used, but this limits the performance of layered coding due to the strict timing constraints of real-time transmission. In this paper we compare multiple description coding, an alternative scalable scheme, without retransmission against layered coding with retransmission for a wide range of scenarios. These scenarios include networks with no feedback support, networks with long RTTs (WANs) and applications with low latency requirements.
Differentiated QoS-aware priority handoff in cell-based multimedia wireless network
One key issue of providing multimedia services over a mobile wireless network is the quality of service (QoS) support in the presence of changing network connectivity. The trend of using pico-cells in wireless networks to gain more spatial efficiency increases the rate of call handoffs when mobile users move from one cell to another. Frequent handoffs make it very difficult to support QoS for multimedia applications. In this research, we investigate a potential solution to meet the challenge of seamless resource transition during frequent handoffs by combining the differentiated QoS service model and the priority handoff mechanism. We perform simulations with OPNET. Results show a tradeoff between system utilization and handoff blocking rates for different QoS classes.
Performance of syntax-based error detection in H.263 video coding: a quantitative analysis
In this work we evaluate the effectiveness of syntax-based error detection in the context of H.263 video transmission over a noisy, error-prone channel. We assume transmission is carried out over a non-protected channel, i.e., no information about error occurrence is fed to the application by the transport layer. More specifically, the probability of missing an error, or of revealing one when no error is actually present, is calculated. Particular care is taken to distinguish between macroblock (MB) data errors and header errors, since they have a different impact on reconstructed video quality. We also investigate the possibility of interpreting errors by relying on syntax information only, e.g., trying to discriminate between errors affecting motion vectors and errors altering the value of DCT coefficients. Extensive testing, performed at different rates and with channels characterized by different error probabilities, suggests the necessity of more reliable error detection approaches.
Real-time object tracking and human face detection in cluttered scenes
Show abstract
This paper presents a real-time video surveillance system capable of tracking multiple persons and locating faces in moderately complex scenes. Rather than using heavily parameterized models to track foreground regions, we model objects by the bounding boxes that contain them. The algorithm integrates dynamic reference-frame differencing with coarse motion estimation to overcome the occlusion problems encountered in multiple-object tracking. Change detection is performed by differencing the current frame against a dynamic reference frame that is adaptively updated over time to account for background changes, illumination variations, and the like. Video object segmentation maps this binary change-detection map to an indexed segmentation map by using coarse directional information in addition to the size and position of connected foreground regions. We employ adaptive linear predictive filtering of the bounding-box model, in conjunction with motion displacement estimates, to accurately track multiple occluding objects. Once the video is segmented into foreground and background areas, we search within a subset of the foreground bounding boxes using chrominance histogram matching to detect facial regions.
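The change-detection step described above can be sketched as thresholded differencing against a reference that is blended toward the current frame only in unchanged regions. The adaptation rate and threshold below are illustrative values, not the paper's.

```python
import numpy as np

ALPHA = 0.05        # reference adaptation rate (assumed)
THRESHOLD = 25.0    # per-pixel change threshold, 8-bit grey levels (assumed)

def detect_changes(frame, reference):
    """Return a binary change map and the updated reference frame."""
    diff = np.abs(frame.astype(np.float32) - reference)
    change_map = diff > THRESHOLD
    # Blend the current frame into the reference only where no change was
    # detected, so moving objects do not pollute the background model.
    reference = np.where(change_map, reference,
                         (1 - ALPHA) * reference + ALPHA * frame)
    return change_map, reference
```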
Rotational invariant similarity measurement for content-based image indexing
Yong Man Ro,
Kiwon Yoo
Show abstract
We propose a similarity-matching technique for content-based image retrieval that is invariant to image rotation. Since the image content used for indexing and retrieval may be extracted at an arbitrary orientation from a still image or from a key frame of video, rotation invariance of the feature description is important for general application of content-based image indexing and retrieval. In this paper, we propose a rotation-invariant similarity measurement combined with texture features based on the human visual system (HVS). To limit computational complexity, we employ a hierarchical similarity-distance search. To verify the method, experiments are performed on the MPEG-7 data set.
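One standard way to obtain rotation invariance from an angularly partitioned texture descriptor is to take the minimum distance over circular shifts of the angular bins, since rotating the image cyclically permutes them. The sketch below illustrates that idea only; the paper's HVS-based features and hierarchical search are not reproduced.

```python
import numpy as np

def rotation_invariant_distance(f_query, f_db):
    """f_query, f_db: (n_angles, n_radial) feature matrices (assumed layout)."""
    n_angles = f_query.shape[0]
    # Minimum Euclidean distance over all cyclic rotations of the angular bins.
    return min(np.linalg.norm(np.roll(f_query, s, axis=0) - f_db)
               for s in range(n_angles))
```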
Video indexing using object motion map
So-Yeon Kim,
Yong Man Ro
Show abstract
We propose an object motion map for video indexing. Efficient extraction and indexing of object motion is an important issue in the content-based indexing of video data. Within each shot, object motion is clustered and extracted by removing background motion from the global motion. To index the object motion, we divide the motion space, in polar coordinates, into i magnitude levels and j directions, so that i x j regions are obtained to represent the various motion types; the object motion map is built on this polar division. The proposed map captures both global and local motion information while requiring little storage for indexing. To evaluate the performance of the proposed technique, experiments are performed on a video database consisting of the MPEG-1 video sequences in the MPEG-7 test set.
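A minimal sketch of such an i x j polar motion map follows: each motion vector is binned by magnitude level and direction sector, giving a compact histogram descriptor. The bin counts and magnitude ceiling are illustrative choices, not the paper's parameters.

```python
import numpy as np

def object_motion_map(vectors, i_mag=4, j_dir=8, max_mag=16.0):
    """vectors: (N, 2) array of object motion vectors (dx, dy)."""
    dx, dy = vectors[:, 0], vectors[:, 1]
    mag = np.minimum(np.hypot(dx, dy), max_mag - 1e-6)   # clamp to last bin
    ang = np.mod(np.arctan2(dy, dx), 2 * np.pi)
    m_idx = (mag / max_mag * i_mag).astype(int)          # magnitude level
    d_idx = (ang / (2 * np.pi) * j_dir).astype(int)      # direction sector
    hist = np.zeros((i_mag, j_dir))
    np.add.at(hist, (m_idx, d_idx), 1)
    return hist / max(len(vectors), 1)                   # normalized map
```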
Facial animation reconstruction from FAP
Show abstract
MPEG-4 defines two sets of parameters: Facial Definition Parameters (FDPs) and Facial Animation Parameters (FAPs). The FDPs are used to customize the decoder's proprietary face model to a particular face, or to download a face model along with the information about how to animate it. The FAPs are based on the study of minimal facial actions and are closely related to muscle actions; they represent a complete set of basic facial actions and therefore allow the representation of most facial expressions. In this paper, we propose a simple key-point displacement-controlling muscle model that describes how the adjacent facial tissue moves with the key points, and use it to reconstruct facial animation from FAPs.
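The essence of a key-point displacement-controlled model is that each mesh vertex follows nearby key points with a weight that decays with distance. The sketch below uses a Gaussian falloff with a fixed influence radius; both are our assumptions, not the paper's exact muscle model.

```python
import numpy as np

def deform(vertices, key_points, key_disp, radius=0.1):
    """vertices: (V,3) mesh; key_points: (K,3); key_disp: (K,3) FAP-driven
    key-point displacements. Radius and falloff are illustrative."""
    out = vertices.copy()
    for kp, d in zip(key_points, key_disp):
        dist = np.linalg.norm(vertices - kp, axis=1)
        w = np.exp(-(dist / radius) ** 2)   # influence decays with distance
        out += w[:, None] * d
    return out
```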
Compression issues in multiview autostereo displays
Show abstract
Image compression for multi-view autostereoscopic displays is one of the major factors governing the development and acceptance of 3D technology. For example, the multi-view autostereo display developed at Cambridge uses between six and twenty-eight distinct views of the scene, each view being a complete image taken from a particular viewpoint. Since these images require very high transmission bandwidth and a large amount of storage, it is of prime importance to use compression methods that exploit the redundancy in the viewpoint direction. In this paper an initial investigation of how this third dimension can be exploited is presented. Entropy measures for multi-view images are derived, and it is shown that exploiting the similarities between views gives lower entropy, indicating higher achievable compression. Because the parallel-axis camera geometry used for the autostereo display produces only horizontal shifts between stereo images, we also investigate hierarchical row decomposition, using correlation measures to estimate disparity shifts and mean-squared-error measures to reduce the search space.
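Because the views differ only by horizontal shifts, disparity can be estimated row by row by maximizing the correlation between overlapping row segments, as in the sketch below. The search range is an illustrative choice, and the paper's MSE-based pruning of the search space is omitted.

```python
import numpy as np

def row_disparity(row_left, row_right, max_shift=32):
    """Pick the horizontal shift maximizing normalized correlation."""
    best_shift, best_corr = 0, -np.inf
    n = len(row_left)
    for s in range(max_shift + 1):
        a = row_left[s:].astype(np.float64)
        b = row_right[:n - s].astype(np.float64)
        a, b = a - a.mean(), b - b.mean()
        denom = np.sqrt((a * a).sum() * (b * b).sum())
        corr = (a * b).sum() / denom if denom > 0 else 0.0
        if corr > best_corr:
            best_shift, best_corr = s, corr
    return best_shift
```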
DVD-RAM-based network storage system
Tetsuya Ura,
Takaya Tanabe,
Manabu Yamamoto
Show abstract
A network storage system with a high transfer rate and high capacity has been developed. This system, DVD-RAIL (Digital Versatile Disk-Redundant Array of Inexpensive Libraries), consists of six small DVD-RAM libraries and a RAILcontroller, which uses the RAID4 algorithm. Each library has two DVD-RAM drives, a robotic changer, and slots for storing up to 150 DVD-RAM disks. The system can thus handle up to 900 disks, corresponding to about 2 TB of storage. Data are transferred to and from the libraries in parallel, so the transfer rate exceeds 6 MB/s. The redundant RAIL architecture provides high reliability, enabling the system to continue working even if an error occurs in one of the libraries. The RAILcontroller handles all allocation and parallel transmission, so the system behaves as one large library. Evaluation showed that the system can distribute high-definition moving pictures at over 20 Mbps and that a transfer rate of over 50 Mbps may be feasible.
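With six libraries under RAID4, a natural layout is five data stripes plus one dedicated XOR parity stripe, so any single library failure is recoverable. The sketch below shows the parity arithmetic; the block size and layout details are our assumptions.

```python
# XOR parity as used by RAID4: parity = XOR of all data stripes, and any
# lost stripe = XOR of the survivors plus the parity.

def xor_blocks(blocks):
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

data_stripes = [b"stripe-%d" % k for k in range(5)]  # one stripe per library
parity = xor_blocks(data_stripes)

# Recover stripe 2 after a library failure from the survivors plus parity.
survivors = [s for k, s in enumerate(data_stripes) if k != 2]
recovered = xor_blocks(survivors + [parity])
assert recovered == data_stripes[2]
```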
Non-uniform shape-preserving subdivision scheme for surface interpolation
Show abstract
Recursive subdivision of 3-D meshes with arbitrary topology is widely used in computer graphics and CAD/CAM systems. In this paper, we present a non-uniform subdivision algorithm based on the modified Butterfly subdivision scheme. We adopt several efficient refinement criteria based on different kinds of viewing information, such as the viewing frustum, surface orientation, screen-space visibility error, and local mesh flatness. We further generalize the modified Butterfly scheme to model the natural features of 3-D objects (such as creases, cusps, and darts) by deriving a set of subdivision rules. To produce the desired piecewise-smooth surfaces from a recursive subdivision process, we use tagged meshes to model sharp features and classify the edge set into three categories: normal edges, sharp edges, and near-crease edges. An interactive subdivision system is constructed so that users can easily specify sharp features.
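For reference, the classic Butterfly stencil computes each new edge point from eight neighboring vertices with tension parameter w = 1/16, as sketched below for a regular mesh region. The modified scheme's special rules for extraordinary vertices and the tagged sharp-edge rules described above are omitted.

```python
def butterfly_edge_point(a, b, c, d, e, f, g, h, w=1.0 / 16.0):
    """a, b: edge endpoints; c, d: apexes of the two adjacent triangles;
    e, f, g, h: the four outer 'wing' vertices. Points are 3-tuples."""
    def lin(points, coeffs):
        # Linear combination of 3-D points.
        return tuple(sum(co * p[i] for co, p in zip(coeffs, points))
                     for i in range(3))
    # New point = 1/2(a+b) + 2w(c+d) - w(e+f+g+h).
    return lin([a, b, c, d, e, f, g, h],
               [0.5, 0.5, 2 * w, 2 * w, -w, -w, -w, -w])
```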
Facial expression recognition on a people-dependent personal facial expression space (PFES)
Show abstract
In this paper, a person-specific facial expression recognition method based on a Personal Facial Expression Space (PFES) is presented. Multidimensional scaling maps facial images to points in the lower-dimensional PFES. Because the space is built from peak-instant expression images of a specific person, it reflects the individuality of that person's facial expressions. In constructing a PFES, the whole normalized facial image is treated as a single pattern, without block segmentation, and the differences of its 2-D DCT coefficients from those of the same person's neutral facial image are used as features. In the early part of the paper, therefore, the separation characteristics of facial expressions in the frequency domain are analyzed using a still-image database consisting of neutral, smile, anger, surprise, and sadness images for each of 60 Japanese males (300 facial images). The results show that the facial expression categories are well separated in the low-frequency domain, so the PFES is constructed by multidimensional scaling on these low-frequency DCT-coefficient differences. On the PFES, the trajectory of a person's facial image sequence can be computed in real time, and facial expressions are recognized from this trajectory. Experimental results show the effectiveness of the method.
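The feature-extraction step described above can be sketched as follows: take the 2-D DCT of the whole normalized face, difference it against the neutral face, and keep only a low-frequency block. The 16 x 16 cutoff is an illustrative choice, not the paper's.

```python
import numpy as np
from scipy.fft import dctn

def expression_features(face, neutral_face, low=16):
    """face, neutral_face: equally sized 2-D grayscale arrays."""
    diff = dctn(face.astype(np.float64), norm='ortho') - \
           dctn(neutral_face.astype(np.float64), norm='ortho')
    return diff[:low, :low].ravel()   # low-frequency coefficients only
```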
Image-fusion-based adaptive regularization for image expansion
Show abstract
This paper presents a regularized image-sequence interpolation algorithm that restores high-frequency details by fusing low-resolution frames. Image fusion makes it possible to use several data sets corresponding to the same scene, obtaining better resolution and more information about the scene than any single data set provides. Based on a mathematical model of image degradation, we obtain an interpolated image that minimizes the residual between the high-resolution and interpolated images subject to a prior constraint. In addition, spatially adaptive regularization parameters preserve directional high-frequency components while efficiently suppressing noise. Experimental results, grouped into non-fusion and fusion algorithms, show that the proposed algorithm yields a better interpolated image than conventional interpolation algorithms by both subjective and objective criteria; in particular, it preserves high-frequency components while suppressing undesirable artifacts such as noise.
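In standard regularized-restoration notation, a functional of the kind described above can be written as follows; the symbols are ours and need not match the paper's exact formulation:

```latex
\hat{x} \;=\; \arg\min_{x} \; \sum_{k} \bigl\lVert y_k - H_k x \bigr\rVert^2
\;+\; \lambda(x)\,\bigl\lVert C x \bigr\rVert^2
```

Here the y_k are the observed low-resolution frames, H_k models the degradation from the high-resolution image x to frame k (blur, subsampling, and inter-frame motion), C is a high-pass operator encoding the prior smoothness constraint, and lambda(x) is the spatially adaptive regularization parameter that balances data fidelity against the prior.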