Methods for low bit-rate video compression: some issues and answers
Author(s):
Russell M. Mersereau;
Sepideh H. Fatemi;
Craig H. Richardson;
Kwan K. Truong
This paper looks at some of the challenges associated with building a coder for video telephony that operates at bit rates below 20 kbps. It then proceeds to present a novel hierarchically structured coder that provides a mechanism for addressing many of these difficulties. The hierarchical structure employs several different simple predictors that are ordered according to their coding gains. This structure is contrasted with traditional predictive coders for video head and shoulders sequences and its performance is demonstrated with several examples.
Current status of the MPEG-4 standardization effort
Author(s):
Dimitris Anastassiou
The Moving Pictures Experts Group (MPEG) of the International Standardization Organization has initiated a standardization effort, known as MPEG-4, addressing generic audiovisual coding at very low bit rates (up to 64 kbit/s) with applications in videotelephony, mobile audiovisual communications, video database retrieval, computer games, video over the Internet, remote sensing, etc. This paper gives a survey of the status of MPEG-4, including its planned schedule and initial ideas about requirements and applications. A significant part of this paper summarizes an incomplete draft version of a `requirements document' which presents specifications of desirable features at the video, audio, and system level of the forthcoming standard. Very low bit-rate coding algorithms are not described, because no endorsement of any particular algorithm, or class of algorithms, has yet been made by MPEG-4, and several seminars held concurrently with MPEG-4 meetings have not so far provided evidence that such high-performance coding schemes are achievable.
Very low bit-rate (VLBR) coding schemes: a new algorithmic challenge?
Author(s):
Claude Labit;
Jean-Pierre Leduc
The huge increase in communication services has prompted the investigation of new algorithmic avenues to solve the perpetual rate/distortion trade-off in video coding applications. The problem is nowadays posed under very severe constraints: bit-rate budgets are shrinking, with domestic communication networks operating at around 8 kbit/s. Two approaches are explored. The first consists of extending the standards already developed for image communication through successive improvements and alternatives inside their algorithmic frameworks. The second proposes a complete change of the basic tools and a design of hybrid schemes by introducing new methodological techniques which have already been validated by the computer vision community but never introduced, for compatibility or complexity reasons, into encoding strategies. This tutorial paper presents these different issues and discusses their advantages. Some experiments performed in our laboratory to encode time-varying image sequences efficiently are finally presented.
Impact of human visual perception of color on very low bit-rate image coding
Author(s):
Sarah A. Rajala
One of the keys to obtaining acceptable quality imagery/video encoded at very low bit rates is to transmit only that information which is critical to human perception. To achieve this goal successfully, one must not only understand the human visual system, but also be able to utilize this knowledge in the design of the codec. This paper presents an overview of the properties associated with color science and human visual perception, and how they can make an impact on very low bit-rate image coding.
Very low bit-rate video coding using matching pursuits
Author(s):
Ralph A. Neff;
Avideh Zakhor;
Martin Vetterli
The term matching pursuits refers to a greedy algorithm which matches signal structures to a large, diverse dictionary of functions. The technique was proposed by Mallat and Zhang with an application to signal analysis. In this paper, we show how matching pursuits can be used to effectively code the motion residual in a hybrid video coding system at bit rates below 20 kbit/s. One advantage of this technique at low bit rates is that bits are assigned progressively to high energy areas in the motion residual. The proper choice of a dictionary set can lead to other advantages. For instance, a large dictionary with a wide variety of structures can represent a residual signal using fewer coefficients than the DCT basis. Also, a dictionary which is not block-based can reduce block distortions common to low bit rate DCT systems. Experimental results are presented in which the DCT residual coder from a standard coding system is replaced by a matching pursuit coder. These results show a substantial improvement in both PSNR and perceived visual quality. Further improvements result when the matching pursuit coder is paired with a smooth motion model using overlapping motion blocks.
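As a rough illustration of the greedy matching pursuit step described above, the sketch below works on a hypothetical 1D toy signal with a random unit-norm dictionary; it is not the paper's 2D motion-residual coder, and the dictionary and signal are assumptions.

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy matching pursuit: repeatedly pick the dictionary atom with the
    largest inner product with the current residual, record (index, coefficient),
    and subtract its contribution. dictionary: (n_total_atoms, signal_len), rows unit-norm."""
    residual = np.asarray(signal, dtype=float).copy()
    code = []
    for _ in range(n_atoms):
        corr = dictionary @ residual            # inner products with every atom
        k = int(np.argmax(np.abs(corr)))        # best-matching atom
        code.append((k, float(corr[k])))
        residual -= corr[k] * dictionary[k]     # peel off its contribution
    return code, residual

def reconstruct(code, dictionary, length):
    x = np.zeros(length)
    for k, c in code:
        x += c * dictionary[k]
    return x

# toy usage: the signal is a combination of two atoms, so a few pursuit
# steps capture most of its energy
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 32))
D /= np.linalg.norm(D, axis=1, keepdims=True)
y = 2.0 * D[5] - 1.0 * D[40]
code, res = matching_pursuit(y, D, n_atoms=4)
print(code[:2], round(float(np.linalg.norm(res)), 4))
```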
Video sequence quantizer for very low bit-rate moving picture coding
Author(s):
Akio Yamada;
Mutsumi Ohta;
Takao Nishitani
This paper describes a novel intelligent low bit-rate coding technique for motion pictures, named the Video Sequence Quantizer (VSQ). VSQ is a form of semantic coding. The concept is very simple: the encoder extracts motion information about target objects and transmits it as a few control parameters. The decoder holds several video sequences in its database and outputs them selectively according to the transmitted parameters. It can reproduce natural movements with a simple operation compared with another semantic coding scheme, computer-graphics model-based coding. We clarify the concept of VSQ and also apply it to a very low bit-rate TV phone system. Computer simulation of the TV phone has been carried out using twelve video sequences; it can naturally reproduce the speaker's movements at 80 bit/s. VSQ is a very simple but basic concept for video coding and will also be useful for mobile and multimedia communications.
Coding of significant features in very low bit-rate video systems
Author(s):
Josep R. Casas;
Luis Torres
Small visual features are often meaningful for the subjective quality of the coded images in very-low bit-rate video systems. If some significant features of the original images are coded, large improvements in the perceived quality can be achieved at low cost. Perceptual factors must be taken into account in order to select only the most significant features for coding. Perceptual coding is the key issue in second generation video coding systems in order to reach the lowest bit-rates with acceptable quality. In this paper, some solutions are proposed for the extraction, selection and coding of small image features in video sequences. The extraction step is based on morphological tools and the selection is performed according to explicit perceptual criteria. The selected features are motion compensated and then coded with an efficient technique derived from the READ coding (2D run-length) taking references in the temporal dimension. The results at very low bit-rates validate the use of perceptual factors in advanced second generation coding techniques.
Morphological pattern restoration: optimal structuring elements
Author(s):
Dan Schonfeld
In this paper, we derive the optimal structuring elements of morphological filters in image restoration. The expected pattern transformation of random sets is presented. An estimation theory framework for random sets is subsequently proposed. This framework is based on the least mean difference (LMD) estimator. The LMD estimator is defined to minimize the cardinality of the expected pattern transformation of the set-difference of the parameter and the estimate. Several important results for the determination of the LMD estimator are derived. The LMD structuring elements of morphological filters in image restoration are finally derived.
Homotopy and critical morphological sampling
Author(s):
Dinei A. F. Florencio;
Ronald W. Schafer
In pattern recognition tasks, it is often convenient to alter the sampling rate of the signal, either to convert to a more appropriate scale/resolution, or to produce an image pyramid and perform the task in a multiresolution fashion. In many of these applications, the MSE criterion optimized by the Shannon Sampling Theorem is not appropriate, and other sampling strategies should be considered. In this paper we present a new Critical Sampling Theorem and extend the results to the case where the connection between several parts of the signal (i.e., the homotopy of the set) is of primary interest. The results are presented for binary signals on a hexagonal grid. An extension to square grids with some specific connectivity criteria is also presented. The results show that it is possible to preserve homotopy while using a sampling density 3 to 4 times smaller than required by previous results. This can be used to reduce the sample density or to improve detail preservation, and has the potential of improving many multiresolution techniques.
Greedy and A*-searching-based approach to the finding of optimal binary morphological filter
Author(s):
Chin-Chuan Han;
Kuo-Chin Fan
In this paper, a greedy and A*-based searching algorithm is proposed to find the optimal morphological filter for binary images. According to the Matheron representation, the minimum mean square error (MSE) estimator is defined as a union of multiple erosions. Unfortunately, finding the optimal solution is a long and time-consuming search procedure because the MSE values must be computed over all possible structuring element combinations and compared. In this paper, the search for the solution is reduced to the problem of obtaining a path with minimal cost from the root node to one vertex of an error code graph. Two graph searching techniques, the greedy and A* algorithms, are applied to avoid searching the extremely large search space. Experimental results illustrate the efficiency and performance of the proposed method.
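As a simplified illustration only, the sketch below designs a union-of-erosions estimator greedily, adding at each step the candidate structuring element that most reduces the empirical MSE against a clean reference image; the paper's error-code graph and A* search are not reproduced, and `candidates` is a hypothetical list of small boolean structuring elements.

```python
import numpy as np
from scipy.ndimage import binary_erosion

def greedy_union_of_erosions(noisy, clean, candidates, max_terms=3):
    """Greedy design of a binary morphological filter of the Matheron form
    (union of erosions of the noisy image), minimising the empirical MSE,
    which for binary images is simply the fraction of mismatched pixels."""
    estimate = np.zeros_like(clean, dtype=bool)
    chosen = []
    for _ in range(max_terms):
        best_se, best_mse = None, np.mean(estimate != clean)
        for se in candidates:
            trial = estimate | binary_erosion(noisy, structure=se)
            mse = np.mean(trial != clean)
            if mse < best_mse:
                best_se, best_mse = se, mse
        if best_se is None:                      # no candidate improves the MSE further
            break
        chosen.append(best_se)
        estimate |= binary_erosion(noisy, structure=best_se)
    return chosen, estimate
```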
Generalized morphological center: idempotence
Author(s):
Mohammed A. Charif-Chefchaouni;
Dan Schonfeld
In this paper, we present an investigation of the generalized morphological center. The generalized morphological center is introduced. The idempotence of morphological filters is investigated. Conditions are subsequently derived for the idempotence of the generalized morphological center. Several important examples of idempotent generalized morphological centers are finally presented.
Skeleton generation using mathematical morphology
Author(s):
C. K. Lee;
Siu Pang Wong
In this paper, we investigate thinning algorithms based on the morphological hit/miss transform. Our target is to extract skeletons of closed-loop objects in noisy images. After the thinning process, many skeletal legs are generated and some of them may be very long, requiring much time to be shortened. First, we propose three algorithms which can speed up the skeletal-leg shortening process in different situations and investigate their performance on a single skeletal leg. Secondly, we suggest a method of classifying an image using its skeletal-leg patterns. Finally, we investigate the efficiency of the three proposed algorithms on different kinds of skeletal-leg patterns and propose a decision method to select the most efficient algorithm for a particular image.
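A minimal sketch of hit/miss-based thinning, assuming the standard pair of thinning structuring elements and their rotations and SciPy's binary_hit_or_miss; the skeletal-leg shortening algorithms discussed in the paper are not reproduced.

```python
import numpy as np
from scipy.ndimage import binary_hit_or_miss

# Foreground ("hit") and background ("miss") masks of the two classical
# thinning elements; the remaining six are their 90-degree rotations.
HIT1  = np.array([[0, 0, 0], [0, 1, 0], [1, 1, 1]], dtype=bool)
MISS1 = np.array([[1, 1, 1], [0, 0, 0], [0, 0, 0]], dtype=bool)
HIT2  = np.array([[0, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=bool)
MISS2 = np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]], dtype=bool)

def thin_once(img):
    """One thinning pass: remove every pixel matched by any of the 8 hit/miss pairs."""
    out = img.copy()
    for hit, miss in [(HIT1, MISS1), (HIT2, MISS2)]:
        for k in range(4):
            matched = binary_hit_or_miss(out, np.rot90(hit, k), np.rot90(miss, k))
            out &= ~matched
    return out

def thin(img, max_iter=100):
    """Iterate until the skeleton stops changing."""
    img = np.asarray(img, dtype=bool)
    for _ in range(max_iter):
        new = thin_once(img)
        if np.array_equal(new, img):
            break
        img = new
    return img
```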
Obtaining surface orientation of texture image using mathematical morphology
Author(s):
Jun-Sik Kwon;
HyunKi Hong;
Jong Soo Choi
In this paper, we present a new morphological approach to obtaining the surface orientation from the variation of a texture image caused by projective distortions. The perspective effect and foreshortening are considered as the projective distortions. Under the assumption that the surface of the texture image is planar, we apply mathematical morphology in order to compute the 3D surface orientation. Centroids of texels can be obtained sequentially, from the farthest texel to the nearest, by recursive erosions, so the entire texture image is segmented into several sub-regions. We compute a major axis of each sub-region from its centroids and an average slope from all major axes. Intersections between the line perpendicular to the average slope and the major axes are obtained. Using the relations between the intersections and the size of the structuring elements in each sub-region, we can compute the vanishing point. The entire region is rearranged using the major axes of the sub-regions and the lines converging to the vanishing point. We obtain the surface orientation from the vanishing point and the perpendicular line. We have demonstrated experimental results with an artificial image and natural images. In these experiments, we have ascertained that the proposed algorithm is more effective than the algorithm using the aggregation transform with texel edges.
Multistage vector quantizer design using competitive neural networks
Author(s):
Syed A. Rizvi;
Nasser M. Nasrabadi
This paper presents a new technique for designing a jointly optimized Multi-stage Vector Quantizer, also known as a Residual Vector Quantizer (RVQ). In the conventional stage-by-stage design procedure, each stage codebook is optimized for that particular stage distortion and does not consider the distortion from the subsequent stages. However, the overall performance can be improved if each stage codebook is optimized by minimizing the distortion from the subsequent stage quantizers as well as the distortion from the previous stage quantizers. This can only be achieved when the stage codebooks are jointly designed for each other. In this paper, the proposed codebook design procedure is based on a multi-layer competitive neural network where each layer of this network represents one stage of the RVQ. The weights connecting these layers form the corresponding stage codebooks of the RVQ. The joint design problem of the RVQ's codebooks is formulated as a nonlinearly constrained optimization task based on a Lagrangian error function. The proposed procedure seeks a locally optimal solution by iteratively solving the equations derived from this Lagrangian error function. Simulation results show an improvement in the performance of an RVQ designed using the proposed joint optimization technique as compared to the stage-by-stage design, where both the Generalized Lloyd Algorithm and the Kohonen Learning Algorithm were used to design each stage codebook independently, as well as the conventional joint-optimization technique.
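For reference, the sketch below shows plain two-stage residual (multistage) VQ encoding and decoding with given stage codebooks; the paper's actual contribution, the joint Lagrangian/neural-network design of those codebooks, is not reproduced here.

```python
import numpy as np

def nearest(codebook, x):
    """Index of the codeword closest to x in squared-error distance."""
    return int(np.argmin(np.sum((codebook - x) ** 2, axis=1)))

def rvq_encode(x, cb1, cb2):
    i1 = nearest(cb1, x)          # stage 1 quantises the input vector
    residual = x - cb1[i1]        # error passed to the next stage
    i2 = nearest(cb2, residual)   # stage 2 quantises the stage-1 residual
    return i1, i2

def rvq_decode(i1, i2, cb1, cb2):
    return cb1[i1] + cb2[i2]      # reconstruction is the sum of stage codewords
```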
Sequential vector quantization of perceptually windowed DCT coefficients
Author(s):
Dong-Wook Kang;
Jun Seok Song;
Hee Bok Park;
ChoongWoong Lee
In order to provide high quality reconstructed images at a reduced transmission bit rate, taking into account both the statistical redundancy and the perceptual characteristics of the DCT coefficients, we apply perceptual windowing and sequential vector quantization to the encoding of the coefficients: first, the coefficients are decomposed into 16 slightly overlapping subvectors so that each subvector has a reasonable dimension and conveys key information about one directional image component, and then the decomposed subvectors are quantized in a sequential manner. The proposed scheme is good at encoding images over a wide range of transmission bit rates, which can be easily controlled by adjusting only the stopping criterion of the sequential vector quantizer.
Subband finite-state vector quantization
Author(s):
Ruey-Feng Chang;
Yu-Len Huang
Subband coding and vector quantization have been shown to be effective methods for coding images at low bit rates. In this paper, we propose a new subband finite-state vector quantization scheme that combines the SBC and FSVQ. A frequency band decomposition of the image is carried out by means of 2D separable quadrature mirror filters, which split the image spectrum into 16 subbands. In general, the 16 subbands can be encoded by intra-band VQ or inter-band VQ. We will use the inter-band VQ to exploit the correlations among the subband images. Moreover, the FSVQ is used to improve the performance by using the correlations of the neighboring samples in the same subband. It is well known that the inter-band VQ scheme has several advantages over coding each subband separately. Our subband-FSVQ scheme not only has all the advantages of the inter-band VQ scheme but also reduces the bit rate and improves the image quality. Comparisons are made between our scheme and some other coding techniques. The new scheme yields a good peak signal-to-noise ratio performance in the region between 0.30 and 0.31 bit per pixel, both for images inside and outside a training set of five 512 X 512 monochrome images. In the experiments, the improvement of our scheme over the ordinary VQ without SBC is up to 3.42 dB and over the inter-band VQ is up to 1.20 dB at nearly the same bit rate for the image Lena. The PSNR of the encoded image Lena using the proposed scheme is 32.1 dB at 0.31 bit per pixel.
Side-match vector quantization design for noisy channel
Author(s):
Chung Jung Kuo;
Chang S. Lin
Side-match vector quantization is an attractive finite-state coding technique for images. This research shows that side-match vector quantization behaves like a catastrophic code, where an infinite number of decoding errors can be caused by a finite number of channel errors. A 1D side-match vector quantization is first proposed to limit the error propagation across different rows of blocks when the channel is noisy. Then three tests are proposed for noisy-row detection with minimum overhead information: the 1D side-match vector quantization decoding test, the double-row test, and the triple-row test. Finally, a modified Viterbi algorithm is proposed to decode images encoded by the 1D side-match vector quantization.
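A minimal sketch of the generic side-match idea for 4x4 blocks, assuming a hypothetical master codebook of shape (N, 4, 4); the paper's 1D variant, the row tests and the modified Viterbi decoding are not reproduced.

```python
import numpy as np

def side_match_encode(block, upper, left, master, state_size=16):
    """Select a state codebook from the master codebook using the borders of the
    already-decoded upper and left neighbour blocks, then quantise within it.
    Returns the chosen master-codebook index (in a real SMVQ only its position
    inside the small state codebook is transmitted)."""
    top_ref, left_ref = upper[-1, :], left[:, -1]         # borders the decoder also knows
    side_err = (np.sum((master[:, 0, :] - top_ref) ** 2, axis=1) +
                np.sum((master[:, :, 0] - left_ref) ** 2, axis=1))
    state = np.argsort(side_err)[:state_size]             # best side-matching codewords
    dists = np.sum((master[state] - block) ** 2, axis=(1, 2))
    return int(state[int(np.argmin(dists))])
```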
Compressed maximum descent algorithm for codebook generation
Author(s):
Chrissavgi Dre;
Stavroula Giannopoulou;
Costas E. Goutis
A vital step in building a vector quantizer is to generate an optimal codebook. Among the algorithms presented in the literature, the Maximum Descent (MD) algorithm appears to be a promising alternative codevector generation technique to the generalized Lloyd (LBG) algorithm when dealing with vector quantization of images. In this paper, a novel vector quantization codebook generation approach is presented. The algorithm uses an MD codebook as an initial codebook, and a compression of this codebook is then achieved based on a simple feature clustering technique. According to this technique, we attempt to arrange the codevectors of the MD codebook in such a way that a prefixed number of clusters results. The centroids of the resulting clusters form a reduced MD codebook. Using this new technique we can produce codebooks with about 0.2-0.6 dB improvement in peak signal-to-noise ratio and a reduction of 10%-20% in codebook size compared to the LBG algorithm.
Classified wavelet transform coding of images using vector quantization
Author(s):
Young Huh;
J. J. Hwang;
C. K. Choi;
Ricardo L. de Queiroz;
K. R. Rao
The discrete wavelet transform (DWT) has recently emerged as a powerful technique for image compression in conjunction with a variety of quantization schemes. In this paper, a new image coding scheme--classified wavelet transform/vector quantization (DWT/CVQ)--is proposed to efficiently exploit the correlation among different DWT layers with the aim of improving performance. In this scheme, DWT coefficients are rearranged to form small blocks composed of the corresponding coefficients from all the subbands. The block matrices are classified into four classes depending on their directional activities, i.e., the energy distribution along each direction. These are further divided adaptively into subvectors depending on the DWT coefficient statistics, as this allows an efficient distribution of bits. The subvectors are then vector quantized. Simulation results show that under this technique the reconstructed images preserve detail and structure in a subjective sense compared to other approaches at a bit rate of 0.3 bit/pel.
Efficient algorithm for two-dimensional finite impulse response (FIR) filtering and system identification
Author(s):
George-Othon Glentis;
Cornelis H. Slump;
Otto E. Herrmann
In this paper a novel algorithm is presented for efficient 2D Least Squares FIR filtering and system identification. Filter masks of general boundaries are allowed. Efficient order-updating recursions are developed by exploiting the spatial shift-invariance property of the 2D data set. In contrast to the existing column(row)-wise 2D recursive schemes based on the Levinson-Wiggins-Robinson multichannel algorithm, the proposed technique offers the greatest maneuverability in the 2D index space in a computationally efficient way. This flexibility can be taken advantage of if the shape of the 2D mask is not known a priori and has to be dynamically configured. The recursive character of the algorithm allows for a continuous reshaping of the filter mask. The search for the optimal filter mask essentially reconfigures the mask to achieve an optimal match. The optimum determination of the mask shape offers important advantages in 2D system modeling, filtering and image restoration.
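A batch least-squares sketch of 2D FIR system identification over an arbitrary (possibly non-rectangular) mask; it uses a direct numpy solve rather than the paper's order-recursive algorithm, and the toy system below is an assumption.

```python
import numpy as np

def identify_2d_fir(x, d, mask):
    """Least-squares identification of a 2D FIR filter with support `mask`
    (a list of (dy, dx) offsets). x: input image, d: observed output image.
    Returns {offset: coefficient} minimising the squared prediction error."""
    H, W = x.shape
    pad = max(max(abs(dy), abs(dx)) for dy, dx in mask)
    xp = np.pad(x, pad)
    # each column holds one shifted copy of the input, restricted to the original frame
    cols = [xp[pad + dy: pad + dy + H, pad + dx: pad + dx + W].ravel() for dy, dx in mask]
    A = np.stack(cols, axis=1)
    h, *_ = np.linalg.lstsq(A, d.ravel(), rcond=None)
    return dict(zip(mask, h))

# toy usage: recover a 3-tap L-shaped filter from input/output data
rng = np.random.default_rng(0)
x = rng.standard_normal((64, 64))
mask = [(0, 0), (0, 1), (1, 0)]
true_h = [1.0, -0.5, 0.25]
xp = np.pad(x, 1)
d = sum(c * xp[1 + dy: 65 + dy, 1 + dx: 65 + dx] for (dy, dx), c in zip(mask, true_h))
print(identify_2d_fir(x, d, mask))    # recovers {(0,0): 1.0, (0,1): -0.5, (1,0): 0.25}
```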
Globally optimal smoothing functional for multichannel image restoration
Author(s):
Moon Gi Kang;
Aggelos K. Katsaggelos
It is expected that a globally optimal restored multichannel image should be superior to a suboptimally restored image without the use of cross-channel information. In this paper, a regularized multichannel image restoration approach is proposed, which is based on the minimum multichannel regularized noise power criterion. Furthermore, no prior knowledge about the variance of the noise at each channel and a bound on the high frequency energy of the image are assumed, but this information is estimated based on the partially restored result at each step. The multichannel smoothing functional to be minimized is formulated to have a global minimizer with the proper choice of the multichannel regularization functionals. With this algorithm, the regularization functional for each channel is determined by incorporating not only within-channel information but also cross-channel information. It is also shown that the proposed multichannel smoothing functional is convex, and therefore, has a global minimizer. The proposed multichannel algorithm not only does not depend on initial conditions but is also shown to be much more computationally efficient than existing algorithms.
Hierarchical Bayesian approach to image restoration and the iterative evaluation of the regularization parameter
Author(s):
Rafael Molina;
Aggelos K. Katsaggelos
In an image restoration problem we usually have two different kinds of information. In the first stage, we have knowledge about the structural form of the noise and local characteristics of the image. These noise and image models normally depend on unknown hyperparameters. The hierarchical Bayesian approach adds a second stage by putting a hyperprior on these hyperparameters, through which information about these hyperparameters is included. In this work we relate the hierarchical Bayesian approach to image restoration to an iterative approach for estimating these hyperparameters in a deterministic way.
Jittered sampling on a hexagonal lattice
Author(s):
JoAnn B. Koskol
Incorporating spatiotemporal jitter into the scanning of time-varying imagery on a rectangular lattice affects the severity of baseband noise under the conditions of sampling at or below the Nyquist rate. Sampling spatiotemporal images on a hexagonal lattice has been shown to provide a 13.4% reduction in the sample rate over that of a comparable rectangular system. The relationship between the increased performance due to sampling on a hexagonal lattice and the performance degradation due to the presence of jitter in the scanning process is investigated in this paper. Spatiotemporal pixel jitter is included in the scanning of time-varying imagery on a hexagonal grid. The aliasing noise is dependent on the statistical properties of the jitter as well as the pattern used to scan the image. The signal-to-aliasing noise power ratio is used as a performance measure.
Dynamic range compression of video-phone images by simulating additional diffuse scene illumination in their video signals
Author(s):
Werner Blohm
The mismatch between dynamic ranges of intensities in non-uniformly illuminated videophone scenes and of intensities reproducible at display devices is addressed. Displayed portrait images with no details visible in some facial regions result from this mismatch and lead in turn to a reduced impression of telepresence. A novel image processing approach for dynamic range compression of video phone portrait images is presented in this paper. Unlike conventional methods, it provides compressed images with a most natural appearance. The basic idea is to simulate a more uniform scene illumination in the video signals of videophone portrait images. This requires a determination of the scene's reflectance function. An extension of the classical lightness approach of Land and McCann to curved surfaces is proposed for this task.
Decision-based median filter using local signal statistics
Author(s):
Dinei A. F. Florencio;
Ronald W. Schafer
Noise removal is important in many applications. When the noise has impulsive characteristics, linear techniques do not perform well, and the median filter or its derivatives are often used. Although median-based filters preserve edges reasonably well, they tend to remove some of the finer details in the image. Switching schemes--where the filter is switched between two or more filters--have been proposed, but they usually lack a decision rule efficient enough to yield good results on different regions of the image. In this paper we present a strategy to overcome this problem. A decision rule based on the second-order local statistics of the signal (within a window) is used to switch between the identity filter and a median filter. The results on a test image show an improvement of around 4 dB over the median filter alone, and 2 dB over other techniques.
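A minimal sketch of such a switching rule, assuming a simple k-sigma test on second-order local statistics (window mean and variance) rather than the paper's exact decision rule.

```python
import numpy as np
from scipy.ndimage import median_filter, uniform_filter

def switching_median(img, window=3, k=2.0):
    """Decision-based median filter: replace a pixel by the local median only when
    it deviates from the local mean by more than k local standard deviations;
    otherwise apply the identity filter (keep the pixel)."""
    img = np.asarray(img, dtype=float)
    mean = uniform_filter(img, window)
    sq_mean = uniform_filter(img ** 2, window)
    std = np.sqrt(np.maximum(sq_mean - mean ** 2, 0.0))
    med = median_filter(img, size=window)
    outlier = np.abs(img - mean) > k * std        # second-order local statistic decision
    return np.where(outlier, med, img)
```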
Reconstruction algorithm for error diffused halftones using binary permutation filters
Author(s):
Yeong-Taeg Kim;
Gonzalo R. Arce
This paper describes an inverse halftoning algorithm to reconstruct a continuous-tone image given its error diffused halftone. We develop a modular class of non-linear filters, denoted as a class of binary permutation filters, which can reconstruct the continuous-tone information preserving image details and edges which provide important visual cues. The proposed non-linear reconstruction algorithm is based on the space-rank ordering of the halftone samples, which is provided by the multiset permutation of the `on' pixels in a halftone observation window. By varying the space-rank order information utilized in the estimate, for a given window size, we obtain a wide range of filters. A constrained LMS type algorithm is employed to design optimal reconstruction filters which minimize the reconstruction mean squared error. We present simulations showing that the proposed class of filters is modular, robust to image source characteristics, and that the results produce high visual quality image reconstruction.
Extended MPEG-2 rate control by analytical modeling of feedback system
Author(s):
Hiroshi Watanabe;
Yoshinori Ito
A rate control scheme for video coding using an analytical model is described. The video coding and rate control considered in this paper are MPEG-2 and its Test Model, since they are known to provide better performance than conventional ones. The rate control can be modeled as a linear feedback system under some statistical assumptions. With this model, we can adjust the feedback response of the system. Communication applications require a nearly constant number of bits per picture, since this gives a shorter delay by using a smaller area of the buffer memory. Thus, a method to improve the ability to track the target bit count over a certain period is proposed. It generates a more precise bit amount for a group of pictures (GOP) than the normal case. The deviation from the target value at the GOP level can be lowered to less than 1/70 in the simulations. The method can be applied to any video coding scheme that determines a quantization step at every macroblock.
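An illustrative feedback loop only (not the MPEG-2 Test Model): the quantizer step is driven by the deviation of a virtual buffer from its per-picture target, and `bits_of` is an assumed toy rate-versus-quantizer model.

```python
def simulate_rate_control(n_pictures=30, target_bits=40000, gain=0.05, q0=16.0):
    """Linear-feedback rate control sketch: the buffer integrates the bit-count
    error, and the quantiser step is nudged proportionally to that error."""
    def bits_of(q):                       # toy model: bits roughly inversely
        return 600000.0 / q               # proportional to the quantiser step
    buffer_fullness, q, produced = 0.0, q0, []
    for _ in range(n_pictures):
        bits = bits_of(q)
        produced.append(bits)
        buffer_fullness += bits - target_bits                        # channel drains target_bits/picture
        q = max(1.0, q + gain * (buffer_fullness / target_bits) * q) # feedback update
    return produced

print([round(b) for b in simulate_rate_control()[:5]])
```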
Performance evaluation of nonscalable MPEG-2 video coding
Author(s):
Robert L. Schmidt;
Atul Puri;
Barry G. Haskell
The second phase of the ISO Moving Picture Experts Group audio-visual coding standard (MPEG-2) is nearly complete, and this standard is expected to be used in a wide range of applications at a variety of bit rates. While the standard specifies the syntax of the compressed bitstream and the semantics of the decoding process, it allows considerable flexibility in the choice of encoding parameters and options, enabling appropriate tradeoffs in performance versus complexity as might be suitable for an application. First, we present a review of the profile and level structure in MPEG-2, which is the key to enabling use of coding tools in MPEG-2. Next, we include a brief review of the tools for nonscalable coding within the MPEG-2 standard. Finally, we investigate via simulations the tradeoffs in coding performance with the choice of various parameters and options, so that within the encoder complexity that can be afforded, an encoder design with good performance tradeoffs can be accomplished. Simulations are performed on standard TV and HDTV resolution video of various formats and at many bit rates using the nonscalable (single layer) video coding tools of the MPEG-2 standard.
Trick mode solutions for MPEG tape recording
Author(s):
Emmanuel D. Frimout;
Jan Biemond;
Reginald L. Lagendijk
In this paper several formatting solutions which provide trick mode (fast forward, fast reverse) support for helical scan recording are proposed. First, two basic methods, where the trick mode signal is built up from semi-random sections of the normal stream, are discussed. These low complexity methods form the basis for the extension to more advanced solutions that employ a dedicated bit stream for trick modes. A method of formatting this dedicated trick stream on a recorder without phase lock between the scanner and the tape during fast forward is presented by means of three separate case studies. In order to guarantee that the trick mode stream is read at the chosen speed, multiple copies of the same stream need to be placed at well chosen places on the tape. This solution is extremely attractive for systems where a limited number of speed-up levels is required. Recorders that can perform phase locking during fast forward yield a particular advantage for high speed-up factors (n = 12, 18, ...) and allow for the support of more distinct speed-up levels.
Rate control strategy based on human visual sensitivity for MPEG video coder
Author(s):
Seungkwon Paek;
Jungsuk Kang;
Yang-Seock Seo
The Moving Picture Experts Group (MPEG) has standardized the bit-stream syntax for the coded representation of video. This means that MPEG specifies only a decoding method and allows much flexibility in encoding methods. Therefore the picture quality of the reconstructed video sequences depends considerably on the rate control strategy used in the encoder. We propose a new rate control strategy conforming to the MPEG syntax that improves the reconstructed picture quality. The new rate control strategy allocates the target number of bits and assigns quantization step sizes adaptively, based on human visual sensitivity and the complexity of the picture to be coded. The proposed rate control strategy consists of the following steps. First, a 16 X 16 macroblock is classified into one of 8 macroblock classes based on human visual sensitivity to the luminance and color components in the macroblock, and then an 8 X 8 block is classified into one of several block classes by its variance. Next, the target number of bits is allocated to a block and the quantization step size is assigned to a macroblock. The results of subjective tests showed that the proposed rate control strategy improves the picture quality of the reconstructed video sequences considerably over the conventional strategy. In particular, with the proposed strategy the subjective picture quality between frames is more constant and hence less degraded than with conventional rate control strategies.
Projection-based decoding of low bit-rate MPEG data
Author(s):
Yongyi Yang;
Nikolas P. Galatsanos
In low bit-rate applications, MPEG-based compressed video exhibits annoying coding artifacts. In this paper, image recovery algorithms based on the theory of projections onto convex sets are proposed to decode video images from MPEG data without coding artifacts. According to this approach, each video frame is reconstructed using not only the transmitted data but also other prior knowledge which is available and not explicitly used by the conventional MPEG decoding algorithm. Numerical experiments demonstrate that the proposed algorithms yield images superior to those from conventional MPEG decoders.
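A toy POCS iteration, assuming the whole image is treated as a single transform block and using only two convex sets: the set of images whose DCT coefficients fall in the transmitted quantization bins, and the set of images with pixel values in [0, 255]. A real MPEG decoder would work on 8x8 blocks and could add further constraint sets.

```python
import numpy as np
from scipy.fft import dctn, idctn

def pocs_decode(coeff_idx, step, n_iter=20):
    """Alternately project onto the quantisation-bin set (C1) and the
    amplitude-range set (C2), starting from the conventional midpoint decode."""
    lo = (coeff_idx - 0.5) * step                      # quantisation-bin bounds
    hi = (coeff_idx + 0.5) * step
    x = idctn(coeff_idx * step, norm='ortho')          # conventional decode
    for _ in range(n_iter):
        c = dctn(x, norm='ortho')
        x = idctn(np.clip(c, lo, hi), norm='ortho')    # projection onto C1
        x = np.clip(x, 0.0, 255.0)                     # projection onto C2
    return x

# hypothetical usage: quantise a random "image", then refine the standard decode
rng = np.random.default_rng(0)
img = rng.uniform(0, 255, (16, 16))
step = 32.0
idx = np.round(dctn(img, norm='ortho') / step)
rec = pocs_decode(idx, step)
```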
Error accumulation in hybrid DPCM/DCT video coding
Author(s):
Limin Wang
This paper studies the problem of error accumulation with hybrid DPCM/DCT video coding. It is shown that the error accumulation results from quantization of the temporal prediction errors in the DCT domain. When the temporal prediction error for a coefficient is smaller than the quantization dead zone for a number of successive frames, error accumulates for that coefficient. This is particularly the case for the high order DCT coefficients, which, in general, exhibit small amplitude and low temporal correlation. To alleviate the error accumulation, it is proposed that a loop filter be inserted in the hybrid coding loop to remove the temporal prediction for the coefficients whose temporal prediction errors contain more information than the coefficients themselves.
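The accumulation effect can be reproduced with an assumed toy simulation (values not from the paper): a slowly drifting coefficient whose frame-to-frame prediction error stays inside the quantizer dead zone is coded as zero each frame, so the decoder-side error keeps growing.

```python
import numpy as np

def dead_zone_quantize(e, step):
    """Uniform quantiser with a dead zone of width 2*step around zero."""
    return 0.0 if abs(e) < step else step * round(e / step)

rng = np.random.default_rng(1)
step = 8.0
true_coeff = 3.0 + np.cumsum(rng.normal(0.0, 1.0, 30))  # drifting high-order DCT coefficient
recon, errors = 0.0, []
for c in true_coeff:
    e = c - recon                         # temporal prediction error (prediction = last recon)
    recon += dead_zone_quantize(e, step)  # decoder adds back only the quantised error
    errors.append(c - recon)              # accumulated reconstruction error
print(np.round(errors, 2))                # grows until |error| finally exceeds the dead zone
```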
Hybrid motion compensation of interlaced video using motion field estimation
Author(s):
David A. Hargreaves;
Jacques Vaisey
Motion compensation is a powerful tool used in the compression of video. While much work has been done on various techniques of progressive motion compensation, interlaced motion compensation has not been considered in as much detail. This paper proposes several new techniques of interlaced motion compensation using multiple fields and de-interlacing interpolation. A novel technique, hybrid motion compensation, is presented that uses a linear combination of previous fields to construct better block matches. In addition, the effects of several different interpolation techniques, including spatial and Bayesian motion estimation, are considered.
Classification and codebook design for a vector quantizer
Author(s):
Mahmoud K. Quweider;
Ezzatollah Salari
This paper presents a classified vector quantizer based on Peano scanning. The Peano scan, which is used to reduce the dimensionality of the data, provides a 1D algorithm to classify an image block. The class of a block is determined from its Peano scan value using a look-up table (LUT) of representative Peano scan values and their associated classes. The Peano scan algorithm is easily implemented in hardware, and the class can be determined in time logarithmic in the number of entries in the LUT when using a binary search on the sorted LUT. Moreover, the class look-up table is easily implemented in real time. An effective algorithm to generate all the class codebooks simultaneously in a systematic way, based on the greedy tree-growing algorithm, is also presented. Monochromatic images encoded in the range of 0.625-0.813 with 16-dimensional input vectors are shown to preserve edge integrity and quality as determined by subjective and objective measures.
Adaptive quadtree-based side-match finite-state vector quantization
Author(s):
Ruey-Feng Chang;
Wei-ming Chen
Vector quantization (VQ) is an effective image coding technique at low bit rate. Side-match finite-state vector quantizer (SMVQ) exploits the correlations between the neighboring blocks (vectors) to avoid large gray level transition across block boundaries. In this paper, a new adaptive quadtree-based side-match finite-state vector quantizer (QBSMVQ) has been proposed. In QBSMVQ, the blocks are classified into two main classes, edge blocks and nonedge blocks, to avoid selecting a wrong state codebook for an input block. In order to improve the image quality, edge vectors are reclassified into sixteen classes. Each class uses a master codebook that is different from the codebook of other classes. In our experiments, results are given and comparisons are made between the new scheme and ordinary SMVQ coding techniques. As will be shown, the improvement of QBSMVQ over the ordinary SMVQ is up to 3.13 dB at nearly the same bit rate. Moreover, the improvement over the ordinary VQ can be up to 4.30 dB at the same bit rate for the image Lena. Further, block boundaries and edge degradation are less visible because of edge-vector classification. Hence, the perceived improvement in quality over ordinary SMVQ is even greater for human sight.
Entropy-constrained mean-gain-shape vector quantization for image compression
Author(s):
Michael L. Lightstone;
Sanjit K. Mitra
A method for optimal variable rate mean-gain-shape vector quantization (MGSVQ) is presented with application to image compression. Conditions are derived within an entropy-constrained product code framework that result in an optimal bit allocation between mean, gain, and shape vectors at all rates. An extension to MGSVQ called hierarchical mean-gain-shape vector quantization (HMGSVQ) is similarly introduced. By considering statistical dependence between adjacent means, this method is able to provide improvement in rate-distortion performance over traditional MGSVQ, especially at low bit rates. Simulation results are provided to demonstrate the rate-distortion performance of MGSVQ and HMGSVQ for image data.
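A minimal sketch of the mean-gain-shape product-code decomposition and encoding, assuming hypothetical scalar step sizes for the mean and gain and a given unit-norm shape codebook; the entropy-constrained bit-allocation conditions derived in the paper are not shown.

```python
import numpy as np

def mgs_decompose(block):
    """Split a (flattened) block into mean m, gain g and unit-norm shape s,
    so that block = m + g * s."""
    m = float(block.mean())
    r = block - m
    g = float(np.linalg.norm(r))
    s = r / g if g > 0 else np.zeros_like(r)
    return m, g, s

def mgs_encode(block, mean_step, gain_step, shape_codebook):
    """Scalar-quantise mean and gain, nearest-neighbour VQ for the shape."""
    m, g, s = mgs_decompose(block)
    m_hat = mean_step * round(m / mean_step)
    g_hat = gain_step * round(g / gain_step)
    idx = int(np.argmin(np.linalg.norm(shape_codebook - s, axis=1)))
    return m_hat, g_hat, idx

def mgs_decode(m_hat, g_hat, idx, shape_codebook):
    return m_hat + g_hat * shape_codebook[idx]
```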
New vector coding schemes for image and video compression
Author(s):
Weiping Li;
John P. Wus;
Ya-Qin Zhang
Vector transform coding (VTC) has been shown to be a promising new technique for image and video compression. Vector transformation (VT) reduces inter-vector correlation and preserves intra-vector correlation much better than scalar-based transforms such as the discrete cosine transform (DCT), so that vector quantization (VQ) in the VT domain can be made more efficient. In addition to finding a good vector transform, another important aspect of VTC is the codebook structure and bit allocation in the VT domain. Vectors with different indices in the VT domain have very different characteristics. Given a total number of bits, the question is how to construct the codebooks and allocate bits to the vectors so that distortion is minimized. A new multi-layered codebook structure and a dynamic bit-allocation scheme have been developed. The main advantage of this method over a fixed bit-allocation scheme is that distortion is controlled by dynamically allocating more bits to vectors causing larger distortions and fewer bits to vectors causing smaller distortions. A second scheme, vector subband coding (VSC), extends subband coding to the vector case so that scalar-based operations are replaced by vector-based operations. In VSC, an image is first decomposed into a set of vector subbands (VSBs) using a vector filter bank (VFB), and then VQ is performed on all vectors in each VSB. The proposed VFB not only reduces inter-vector and inter-band correlation, but also preserves intra-vector correlation. This property makes the subsequent VQ much more efficient and allows large coding gain over conventional subband coding.
Optimal codebook design for a trellis-searched vector quantization
Author(s):
Seung Jun Lee;
Yong Chang Seo;
ChoongWoong Lee
This paper proposes a new trellis-searched vector quantization scheme with an optimal codebook design algorithm. The proposed vector quantizer employs a conditional index encoder to efficiently exploit high interblock correlation, where each index is adaptively encoded using a different Huffman table selected according to the index of the previously decoded block. We also modify the Viterbi algorithm in order to find the optimal path through the trellis of indices. For optimal codebook design an iterative descent method is used, which is very similar to that for ECVQ (entropy-constrained vector quantization) codebook design. Simulation results show that at the same bit rate the proposed method provides higher PSNR performance than conventional ECVQ by approximately 1.0-3.0 dB when applied to image coding.
Entropy-constrained vector quantization of images in the transform domain
Author(s):
Jong Seok Lee;
Rin Chul Kim;
Sang Uk Lee
In this paper, two image coding techniques employing an entropy constrained vector quantizer (ECVQ) in the transform domain are presented. In both techniques, the transformed DCT coefficients are rearranged into the Mandala blocks for vector quantization. The first technique is based on the unstructured ECVQ designed separately for each Mandala block, while the second technique employs a structured ECVQ, i.e., an entropy constrained lattice vector quantizer (ECLVQ). In the ECLVQ, unlike the conventional lattice VQ combined with entropy coding, we take into account both the distortion and entropy in the encoding. Moreover, in order to improve the performance further, the ECLVQ parameters are optimized according to the input image statistics. Also we reduce the size of the variable word-length code table, by decomposing the lattice codeword into its magnitude and sign information. The performances of both techniques are evaluated on the real images, and it is found that the proposed techniques provide 1 - 2 dB gain over the DCT-classified VQ at bit rates in the range of 0.3 - 0.5 bits per pixel.
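The entropy-constrained selection rule underlying both techniques can be sketched as below, with a hypothetical four-word codebook and its variable code lengths standing in for the real ones: the encoder minimizes a Lagrangian cost that trades distortion against rate.

```python
import numpy as np

def ecvq_encode(x, codebook, code_lengths, lam):
    """Pick the codeword minimising J = ||x - c_i||^2 + lambda * length_i."""
    dist = np.sum((codebook - x) ** 2, axis=1)    # squared-error distortion per codeword
    return int(np.argmin(dist + lam * code_lengths))

codebook = np.array([[0., 0.], [1., 1.], [4., 4.], [8., 8.]])
code_lengths = np.array([1., 2., 3., 3.])         # bits of the variable-length codes
x = np.array([3.5, 3.6])
print(ecvq_encode(x, codebook, code_lengths, lam=0.0))   # pure nearest neighbour -> index 2
print(ecvq_encode(x, codebook, code_lengths, lam=20.0))  # heavy rate penalty -> index 0
```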
Fractal-based hybrid compression schemes
Author(s):
Viresh Ratnakar;
Ephraim Feig;
Prasoon Tiwari
Fractal compression has not lived up to its promise as a high-quality low bit-rate image compression scheme. The existing algorithms for finding self-mapping contractive transforms for images are computationally expensive and offer a poor rate-quality tradeoff. In this paper we explore the error images resulting from a simple fractal compression scheme. We use a set of fractal maps as a predictor for the image, and store the error-image using the Discrete Cosine Transform (DCT). Our experiments show that such a composite scheme has worse rate-quality tradeoff than DCT alone.
Fractal image coding system based on an adaptive side-coupling quadtree structure
Author(s):
Chewn-Jye Shy;
Hong-Yuan Mark Liao;
Chen-Kuo Tsao;
Ming-Yang Chern
A new fractal-based image compression system based on a so-called Adaptive Side-Coupling Quadtree (ASCQ) structure is proposed. The proposed system consists of three processes: a preprocessing process, a compression process, and a decompression process. In the compression process, the original image is represented by an ASCQ structure. The set of Iterated Function System (IFS) codes, which is a powerful tool usually derived in the encoding process, can be calculated directly from this tree structure. Using these IFS codes, an image which is similar to the original one can be reconstructed. The proposed ASCQ structure not only represents the original image but also contains both the domain and range pools in it. The number of fractal codes in an ASCQ structure will be different when it is extracted from different original images. Since the proposed ASCQ is adaptive in its structure, it is beneficial for representing those images that contain more homogeneous regions. Also, the inherent dynamic structure of its range pool and domain pool, which is different from the fixed pools proposed by Jacquin, will certainly shorten the searching time and speed up the image retrieval process. We also propose a First-N-Maximum function as a decision function to choose those blocks in the domain pool that are most similar to the terminal ones. Experimental results reflect that the ASCQ structure is indeed an efficient structure for the fractal-based image compression system.
Fractal-based modeling and interpolation of non-Gaussian images
Author(s):
Stephen M. Kogon;
Dimitris G. Manolakis
In modeling terrain images corresponding to infrared scenes, it has been found that the images are characterized by a long-range dependence structure and high variability. The long-range dependence manifests itself in a `1/f' type behavior in the power spectral density and statistical self-similarity, both of which suggest the use of a stochastic fractal model. The traditional stochastic fractal model is fractional Brownian motion, which assumes the increment process arises from a Gaussian distribution. This model has been found to be rather limiting due to this restriction and is therefore incapable of modeling processes possessing high variability and emanating from long-tailed non-Gaussian distributions. Stable distributions have been shown to be good models of such behavior and have been incorporated into the stochastic fractal model, resulting in the fractional Levy stable motion model. The model is demonstrated on a terrain image and is used in an interpolation scheme to improve the resolution of the image.
Biorthogonal wavelets for image compression
Author(s):
Eric Majani
Biorthogonal wavelets or filter banks are shown to be superior in coding gain performance to orthogonal ones for logarithmic subband decompositions (limited to iterative decomposition of the downsampled output of the analysis low-pass filter). As a consequence, for logarithmic decompositions, the optimal filter is not an ideal filter. This is shown for maximally regular biorthogonal and orthogonal filters, as well as filters designed to optimize the subband coding gain.
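As a numeric illustration of the coding-gain criterion, the sketch below computes the subband coding gain of an orthonormal Haar octave (logarithmic) tree on a correlated toy signal; it does not reproduce the paper's biorthogonal-versus-orthogonal comparison.

```python
import numpy as np

def haar_octave(x, levels):
    """Orthonormal Haar analysis iterated on the low-pass branch.
    Returns [detail_1, ..., detail_L, approx_L]."""
    bands, a = [], np.asarray(x, dtype=float)
    for _ in range(levels):
        a = a[: len(a) - len(a) % 2]
        lo = (a[0::2] + a[1::2]) / np.sqrt(2.0)
        hi = (a[0::2] - a[1::2]) / np.sqrt(2.0)
        bands.append(hi)
        a = lo
    bands.append(a)
    return bands

def coding_gain(bands):
    """Subband coding gain over PCM for an orthonormal decomposition:
    weighted arithmetic mean of band variances over their weighted geometric mean,
    weights being each band's fraction of the total samples."""
    n = np.array([len(b) for b in bands], dtype=float)
    w = n / n.sum()
    var = np.array([np.mean(b ** 2) for b in bands])
    return float((w * var).sum() / np.prod(var ** w))

# strongly correlated AR(1)-like toy signal, so the octave split yields a gain > 1
rng = np.random.default_rng(0)
e = rng.standard_normal(4096)
x = np.empty_like(e)
x[0] = e[0]
for i in range(1, len(e)):
    x[i] = 0.95 * x[i - 1] + e[i]
print(coding_gain(haar_octave(x, 3)))
```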
Wavelet image compression using IIR minimum variance filters, partition priority, and multiple distribution entropy coding
Author(s):
Dimitrios Tzovaras;
Serafim N. Efstratiadis;
Michael G. Strintzis
Image compression methods for progressive transmission using optimal subband/wavelet decomposition, partition priority coding (PPC) and multiple distribution entropy coding (MDEC) are presented. In the proposed coder, hierarchical wavelet decomposition of the original image is achieved using wavelets generated by IIR minimum variance filters. The smoothed subband coefficients are coded by an efficient triple state DPCM coder and the corresponding prediction error is Lloyd-Max quantized. The detail coefficients are coded using a novel hierarchical PPC (HPPC) approach. That is, given a suitable partitioning of their absolute range, the detail coefficients are ordered based on their decomposition level and magnitude, and the address map is appropriately coded. Finally, adaptive MDEC is applied to both the DPCM and HPPC outputs by considering a division of the source of the quantized coefficients into multiple subsources and adaptive arithmetic coding based on their corresponding histograms.
Low bit-rate design considerations for wavelet-based image coding
Author(s):
Michael L. Lightstone;
Eric Majani
Biorthogonal and orthogonal filter pairs derived from the class of binomial product filters are considered for wavelet transform implementation with the goal of high performance lossy compression. To help narrow the potential candidate filters, a number of design objectives based on filter frequency response and orthonormality are introduced with final selection being determined by experimental rate-distortion performance. While image data compression is specifically addressed, many of the proposed techniques are applicable to other coding applications.
Shuffle-based M-band filter bank design
Author(s):
Hakan Caglar;
Oktay Alkin;
Emin Anarim;
Bulent Sankur
This paper presents a new design technique for obtaining optimum M-channel orthogonal subband coders where M = 2^i. All filters that constitute the subband coder are FIR filters with linear phase. We carry out the design in the time domain, based on time-domain orthonormality constraints that the filters must satisfy. Once a suitable low-pass filter h0(n) is found, the remaining (M-1) filters of the coder are obtained through the use of shuffling operators on that filter. Since all resulting subband filters use the same numerical coefficient values (in different shift positions), this technique leads to a set of filters that can be implemented very efficiently. If, on the other hand, maximization of the coding gain is a more important consideration than efficient implementation, the set of impulse responses obtained through shuffling can be further processed to remove the correlation between the subbands. This process leads to a new set of orthonormal linear-phase filters that no longer share the same numerical coefficient values. In general, the coding gain performance of this new set is better than that of the initial design. This uncorrelated decomposition can be thought of as a counterpart of the Karhunen-Loeve transform in an M-channel filter bank.
Wavelet image modeling and its application in model-based image coding
Author(s):
Ling Guan
This paper describes an image modeling scheme based on wavelet signal decomposition and statistical texture analysis, and its application in model-based image coding. In this approach, the image being considered is first decomposed into octal signal bands which contain different spectral components of the image. Based on the properties of the signal decomposition, statistical texture analysis techniques are hierarchically employed to separate the image into different categories with distinct modeling parameters. Coding is performed based on the texture image model.
Robust tracking of head and arm for model-based image coding
Author(s):
Robert Hsu;
Hiroshi Harashima
Model-based image coding is potentially a powerful technique for compressing scenes dominated by head-and-shoulders images, as in a videotelephone scene with a closeup of the human upper body. In this paper we describe a Kalman filtering based technique to robustly recover the 3D structure and kinematics of the head and arm in view from optical flow. Because of its explicit modeling of measurement noise and modeling uncertainty, the extended Kalman filter has been shown to improve the robustness of the shape-from-motion technique. The robustness of the recursive estimation technique is further enhanced by (1) confining feature tracking within the neighborhoods of a skeleton sketch of the head and arm, (2) formulating the measurement noise of the optical flow as a function of the optical flow's confidence measures, and (3) fusing into the recursive estimation a set of constraints which govern the kinematic linkages of an articulated arm.
Fast automatic face feature points extraction for model-based image coding
Author(s):
Xiangwen Wang;
Defu Cai
A new scheme based on a priori face knowledge and a shift-template method for fast face feature point extraction is presented in this paper. Fairly good accuracy and speed in detecting the feature points of the eyebrows, eyes, nose and mouth have been achieved with a pair of complementary templates. Computer simulation shows that the scheme is very suitable for very low bit-rate model-based image coding in real-time applications.
Hierarchical image sequence model for segmentation: application to region-based sequence coding
Author(s):
Ferran Marques;
Victor Vera;
Antoni Gasull
This paper studies the performance of an image sequence model based on Compound Random Fields when used for segmentation purposes. Using a hierarchical model allows the texture and contour information within the sequence to be characterized separately. Moreover, temporal and spatial contour behavior can also be described independently. This separated characterization makes it possible to impose constraints on the kind of contours to be obtained and to introduce a priori knowledge into the segmentation procedure. The way to exploit these features in an object-based sequence coding scheme is analyzed. The influence of the model parameters on the segmentation results is analyzed, in order to achieve segmentations which can be easily coded. The main sought-after characteristics are smooth spatial contours, slow temporal variations and homogeneous textures. Two different segmentation algorithms are used for this analysis, a recursive and a nonrecursive one.
Error-correcting form class matching for form reading system
Author(s):
Xingyuan Li;
Jiarong Hong
In this paper, we describe a method to train and classify form classes. We consider the form shape as a special graph, and propose a representation of a form class and an efficient matching algorithm. The matching algorithm is simple and can find the optimal map. Once the form class has been trained, our approach can correct some errors in the input form; thus a high tolerance to noise in the image is obtained.
Using a generalized neuron evolution rule to resolve spatial deformation in pattern recognition
Author(s):
Andrew C. C. Cheng;
Ling Guan
In common associative neural network practice, the updating rule remains unchanged during neuron evolution. Once the evolution is stuck in a local minimum, simulated annealing is used to resolve the problem. In this paper, the local minimum problem in associative neural networks is treated from a novel perspective. In particular, a generalized neuron evolution algorithm is introduced. It is shown in the paper that the probability of local minima associated with different learning schemes having the same locations on the surfaces of their respective energy functions is extremely small. Thus, once evolution terminates in a local minimum associated with one learning scheme, another learning scheme is invoked to continue the evolution. Numerical examples are used to demonstrate the performance of this approach. The results show that the local minimum problem is effectively resolved.
Robust object recognition using symmetry
Author(s):
Nicholas P. Walmsley;
K. M. Curtis
This paper describes an object recognition technique based on the concept of local symmetry. A technique is described which can detect 1D features under situations of extreme low contrast. The effectiveness of the technique is shown through application to DNA autoradiographs. The technique is then expanded so that 2D objects can be located and then recognized. In the 2D case the resulting output of the technique is a locus which can be compared to those in a database. It is further shown how the technique is robust against changes in both scale and orientation.
Associative memory model for searching an image database by image snippet
Author(s):
Javed I. Khan;
David Y. Yun
This paper presents an associative memory called multidimensional holographic associative computing (MHAC), which can potentially be used to perform feature-based image database queries using an image snippet. MHAC has the unique capability to selectively focus on specific segments of a query frame during associative retrieval. As a result, this model can perform a search on the basis of featural significance described by a subset of the snippet pixels. This capability is critical for visual query in image databases because quite often the cognitive index features in the snippet are statistically weak. Unlike conventional artificial associative memories, MHAC uses a two-level representation and incorporates additional meta-knowledge about the reliability status of the segments of information it receives and forwards. In this paper we present an analysis of the focus characteristics of MHAC.
Aerial image: from straight lines to rectangles
Author(s):
Sergey V. Ablameyko;
Dmitry M. Lagunovsky
A fast algorithm to detect rectangles in aerial images is suggested. A contour image is used as input to the algorithm. Primitive lines are extracted in the contour image and joined into line segments by a cluster analysis method. A line-merging algorithm is developed, and an algorithm to detect rectangles from the extracted straight lines is suggested. The developed algorithm achieves a good trade-off between computational time and line quality.
Classification of compressed images using Fourier-transform-based invariants
Author(s):
Zamir Abraham;
Moshe Elbaz;
Jacob Rubenstein;
Yehoshua Y. Zeevi
Show Abstract
A novel approach for classification of deformed images is presented. It is shown that by using the Fourier phase of the deformed images, a set of invariants can be calculated and used for identification of each image. Phase-only, as is well known, is also good for compression and reconstruction of images. Computational results are presented and discussed.
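As a rough illustration of why Fourier phase carries enough information for identification, the sketch below reconstructs an image from its phase alone (unit-magnitude spectrum). It is a generic phase-only demonstration, not the authors' invariant construction; the test image is a hypothetical stand-in.

```python
# Phase-only reconstruction: a generic illustration (not the paper's invariants).
import numpy as np

def phase_only_reconstruction(image):
    """Keep the Fourier phase, replace the magnitude with a constant, and invert."""
    spectrum = np.fft.fft2(image)
    unit_spectrum = np.exp(1j * np.angle(spectrum))   # discard magnitude, keep phase
    return np.real(np.fft.ifft2(unit_spectrum))

if __name__ == "__main__":
    test_image = np.random.rand(64, 64)               # hypothetical stand-in image
    print(phase_only_reconstruction(test_image).shape)
```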
Joint motion/disparity estimation for stereo image sequences
Author(s):
Sotiris Malassiotis;
Michael G. Strintzis
Show Abstract
An algorithm is described for the joint estimation of motion and disparity vector fields from stereoscopic image sequences. Gibbs-Markov random fields are used to model local interaction processes. Interaction of neighboring motion and disparity vectors across a discontinuity line is prohibited via hidden Markov chains signaling discontinuities in the vector fields. The coherence of the motion and disparity vector fields is exploited by means of the epipolar constraint and the so-called `loop constraint'. A simulated annealing algorithm is employed to find the global maximum of the posterior probability.
Gibbs random field model based 3D motion estimation from video sequences
Author(s):
A. Aydin Alatan;
Levent Onural
Show Abstract
In contrast to previous global 3D motion concepts, a Gibbs random field based method, which models local interactions between motion parameters defined at each point on the object, is proposed. An energy function, which gives the joint probability distribution of motion vectors, is constructed. The energy function is minimized in order to find the most likely motion vector set. Some convergence problems, due to the ill-posedness of the problem, are overcome by using the concept of hierarchical rigidity. In hierarchical rigidity, the objects are assumed to be almost rigid at the coarsest level and this rigidity is weakened at each level until the finest level is reached. The propagation of motion information between levels is encouraged. At the finest level, each point has a motion vector associated with it and the interaction between these vectors is described by the energy function. The minimization of the energy function is achieved by using hierarchical rigidity, without becoming trapped in a local minimum. The results are promising.
Kalman filter for improving optical flow accuracy along moving boundaries
Author(s):
J. N. Pan;
Yun-Qing Shi
Show Abstract
Optical flow is an important source of information about the motion and structure of objects in the 3D world. Once the optical flow field is computed accurately, the measurement of image velocity can be used widely in many tasks in the computer vision area. Current computer vision techniques require that the relative errors in the optical flow be less than 10%. However, reducing error in optical flow determination is still a difficult problem. In this paper, we propose a Kalman filter for improving accuracy in determining optical flow along moving boundaries. Firstly, a quantitative analysis of the error decreasing rate in iteratively determining optical flow using the correlation-based technique is given. It concludes that this error decreasing rate varies across different regions of the image plane: it is larger for regions where the intensity varies more drastically, and smaller for those where the intensity varies more smoothly. This indicates that the number of iterations used in optical flow determination should not be uniform across image regions. That is, along moving boundaries, where the intensity usually changes more sharply, fewer iterations are needed than in other regions. This is reasonable. In fact, the confidence measure is usually high along moving boundaries since richer information exists there. Therefore, an optical flow algorithm needs fewer iterations along moving boundaries than in other areas, so that the better estimates of optical flow along boundaries can be propagated into other areas instead of being blurred by them. Secondly, we propose a Kalman filter to apply the different numbers of necessary iterations in determining optical flow, thereby deblurring boundaries and enhancing accuracy. Loosely speaking, the idea is that whenever the standard deviation of the optical flow at a pixel is less than a certain criterion, i.e., good accuracy has been achieved, the Kalman filter no longer updates the optical flow at that pixel, thus conserving accuracy along moving boundaries. Assuming that the estimated optical flow field is contaminated by Gaussian white noise, we give appropriate consideration to the system and measurement noise covariance matrices, Q and R, respectively. In this way, the Kalman filter is used to eliminate noise, raise accuracy and refine accuracy along discontinuities. Finally, an experiment is presented to demonstrate the efficiency of our Kalman filter. Two objects are considered: one is stationary, while the other is in translation. Unified optical flow field (UOFF) quantities are determined by using the proposed technique. The 3D position and speeds are then estimated by using the UOFF approach. Results obtained both with and without the Kalman filter are given. An improvement of more than 10% is achieved in this experiment. It is expected that the more moving boundaries there are in the scene, the more effectively the scheme works.
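The gating idea described above can be sketched as a per-pixel scalar Kalman update that is frozen once the local standard deviation falls below a threshold. The random-walk state model and the values of Q, R and the threshold below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def gated_kalman_update(flow, variance, measurement, Q=0.01, R=0.25, std_threshold=0.1):
    """Per-pixel scalar Kalman update of one optical-flow component.
    Pixels whose estimate is already accurate (std below threshold) are frozen,
    which preserves the sharper estimates along moving boundaries."""
    pred_var = variance + Q                         # time update (random-walk model)
    gain = pred_var / (pred_var + R)                # Kalman gain
    updated = flow + gain * (measurement - flow)    # measurement update
    updated_var = (1.0 - gain) * pred_var
    freeze = np.sqrt(variance) < std_threshold      # gating: stop updating good pixels
    return (np.where(freeze, flow, updated),
            np.where(freeze, variance, updated_var))
```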
Parallel motion estimation using an annealed Hopfield neural network
Author(s):
Yungsik Kim
Show Abstract
An annealed Hopfield neural network has previously been shown to solve an image segmentation problem, and good image segmentation was successfully achieved. In this paper, a new motion estimation algorithm using an annealed Hopfield neural network is developed. The motion estimation process can simply be described as finding the corresponding pixels in consecutive images. An optimization function E_me1 = E_g that achieves this simple process is defined first. This optimization function finds the motion vector for a given pixel in a frame by finding the corresponding pixel in the next frame. However, the image sequence usually contains noise, and in this case finding the corresponding pixels alone does not work well in estimating the correct vector field. To make the motion vectors smooth within a moving object and to make the motion vectors differ between objects moving in different directions, weak continuity constraint terms, E_me2 = E_d + E_s + E_p, are added to the previously defined optimization function E_me1, resulting in E_me = E_me1 + E_me2. E_me2 controls the smoothness of the detected motion vectors within objects as well as maintaining the motion vector boundaries between objects moving in different directions. Simulations are done for a synthetic image sequence and a real image sequence.
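A minimal sketch of an energy of the form E_me = E_me1 + E_me2, i.e. a pixel-matching term plus a motion-field smoothness term, is given below; the weighting lam and the first-difference smoothness penalty are assumptions and do not reproduce the paper's E_d, E_s and E_p terms.

```python
import numpy as np

def motion_energy(frame1, frame2, flow, lam=0.5):
    """E_me = E_me1 (matching cost) + lam * E_me2 (smoothness of the motion field).
    flow has shape (H, W, 2): x- and y-displacement per pixel."""
    h, w = frame1.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xm = np.clip(xs + flow[..., 0].round().astype(int), 0, w - 1)
    ym = np.clip(ys + flow[..., 1].round().astype(int), 0, h - 1)
    e_data = np.sum((frame1 - frame2[ym, xm]) ** 2)          # E_me1: matching term
    e_smooth = np.sum(np.diff(flow, axis=0) ** 2) + \
               np.sum(np.diff(flow, axis=1) ** 2)            # E_me2: smoothness term
    return e_data + lam * e_smooth
```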
Calculating time-to-collision with real-time optical flow
Author(s):
Ted A. Camus
Show Abstract
Currently, two major limitations to applying computer vision to real-time robotic vision tasks are robustness in unsimulated and uncontrolled environments, and the computational resources required for real-time operation. In particular, many current visual motion detection algorithms (optical flow) are not suited for practical applications such as crash detection because they either require highly specialized hardware or take up to several minutes on a scientific workstation. A recent optical flow algorithm [C94], however, has been shown to run in real time on a standard scientific workstation and yields very accurate time-to-contact calculations.
Fast multiresolution motion estimation scheme for a wavelet transform video coder
Author(s):
Sethuraman Panchanathan;
Eric Chan;
Xiping Wang
Show Abstract
The wavelet transform has recently emerged as a promising technique for video processing applications due to its flexibility in representing nonstationary video signals. In this paper, we propose a fast multiresolution motion estimation (FMRME) scheme based on the wavelet transform for video compression. In FMRME, the orientation subimages at each level of the wavelet pyramid are first combined into a single (all-orientation) subimage, and motion estimation is then performed on the newly formed subimage. This contrasts with the MRME scheme (recently reported in the literature), where motion estimation is performed separately on all the individual wavelet subimages. The motion vectors of an all-orientation subimage at a lower level are predicted from the motion vectors at the higher (preceding) level and are refined at each step. The proposed scheme reduces the search time for motion vectors by 66% compared to MRME. In addition, the FMRME scheme has the advantage of significantly reduced side information for the description of motion vectors. Simulations show that the FMRME scheme yields considerable reductions in the bit rate, which result in significant improvements in the coding performance of FMRME-based wavelet coding compared to MRME-based wavelet coding for video compression.
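The sketch below illustrates the two FMRME ingredients described above: combining the three orientation subimages of a level into one all-orientation subimage, and refining a motion vector predicted from the coarser level with a small local search. The block size, search radius and simple additive combination are assumptions, not the paper's exact parameters.

```python
import numpy as np

def all_orientation(lh, hl, hh):
    """Combine the three orientation subimages of one pyramid level into a single
    all-orientation subimage (simple additive combination assumed here)."""
    return lh + hl + hh

def refine_vector(ref, cur, block_xy, predicted, radius=2, bs=8):
    """Refine a motion vector predicted from the coarser level by a small search."""
    y, x = block_xy
    target = cur[y:y + bs, x:x + bs]
    best, best_err = tuple(predicted), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            vy, vx = predicted[0] + dy, predicted[1] + dx
            yy, xx = y + vy, x + vx
            if 0 <= yy <= ref.shape[0] - bs and 0 <= xx <= ref.shape[1] - bs:
                err = np.sum(np.abs(target - ref[yy:yy + bs, xx:xx + bs]))
                if err < best_err:
                    best, best_err = (vy, vx), err
    return best
```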
Motion-compensated spatio-temporal interpolation for frame rate up-conversion of interlaced or progressive image sequences
Author(s):
Teh-Tzong Chao;
Chang-Lin Huang
Show Abstract
This paper proposes a new motion-compensated frame (field) interpolation algorithm for frame (field) rate up-conversion, which allows us to interpolate frames (or pairs of fields) between two originally consecutive frames (fields) of a digital television sequence while preserving the stationary background. First, for the interlaced format, a de-interlacing process is used to reduce the motion range by converting the interlaced format to a progressive one. A video scene can be temporally categorized by a change detector into changed and unchanged regions. Each changed region is further separated into moving objects, covered regions and uncovered regions. To interpolate the intermediate field (frame), we have developed a direct motion interpolation method and an indirect motion interpolation method to fill the moving object areas in the changed regions, and then apply a forward/backward motion extrapolation method to fill the covered/uncovered regions. Finally, hybrid repetition is used to interpolate the unchanged regions. In the experiments, we show the interpolated fields and frames for two standard image sequences.
Efficient algorithm for frame rate conversion
Author(s):
Vincent Lin;
Sau-Gee Chen
Show Abstract
Frame rate conversion is widely used in transformations among various video standards. This paper describes an efficient frame rate conversion method using block motion compensation. In this approach, a predicted motion vector and a motion vector preselecting process are used. This results in lower complexity, while maintaining a higher SNR compared with the existing efficient method.
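For context, a minimal sketch of motion-compensated frame interpolation, the core operation in this kind of frame rate conversion, is given below; it assumes a dense per-pixel motion field and simple averaging, and does not include the paper's vector prediction and preselection steps.

```python
import numpy as np

def mc_interpolate(prev, nxt, flow):
    """Interpolate a mid frame by sampling the previous and next frames halfway
    along each pixel's motion vector and averaging (a common up-conversion idea)."""
    h, w = prev.shape
    ys, xs = np.mgrid[0:h, 0:w]
    half = (flow / 2.0).round().astype(int)       # half of the per-pixel motion
    xp = np.clip(xs - half[..., 0], 0, w - 1)
    yp = np.clip(ys - half[..., 1], 0, h - 1)
    xn = np.clip(xs + half[..., 0], 0, w - 1)
    yn = np.clip(ys + half[..., 1], 0, h - 1)
    return 0.5 * (prev[yp, xp] + nxt[yn, xn])
```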
Robust 3D feature detection using dictionary-based relaxation
Author(s):
Nigel Sharp;
Edwin R. Hancock
Show Abstract
Three-dimensional line segment extraction and grouping from image sequences
Author(s):
Gian Luca Foresti
Show Abstract
The problem of grouping 3D coplanar line segments obtained from a single view is addressed. The proposed method is efficient and has been tested on both synthetic and real images. First, a Hough-based algorithm is used to detect 2D line segments in a sequence of images representing a 3D scene. Secondly, the 3D coordinates of the line segments are estimated, at each time instant, by means of an extended Kalman filter, based on the displacements (u,v) of the line segment endpoints on the image plane. Finally, 3D coplanar segments are grouped by a 3D voting approach. The novelty of this method lies in the possibility of using a simple voting scheme similar to that associated with the standard Hough transform for line extraction, where each edge point votes for a sheaf of rectilinear lines. In the proposed approach, each line segment votes for a sheaf of planes.
Dialogic modification of picture by interactive process of processing and painting
Author(s):
Noritaka Kimura;
Hiroki Ino;
Yo Murao;
Hajime Enomoto
Show Abstract
Dialogic modification, which uses picture processing and painting in combination, is proposed as an effective method. Picture painting is realized by a sequence of basic processes specifying points, lines, luminance and chrominance. Picture processing consists of processes that extract feature objects. The modification processes combine both kinds of processes so that they can interact. Within these modification processes, a client can select an interactive scheme of switching between processing and painting. Dialogic modification is defined as a set of modification processes given by communication sequences between a client and a server. An example is illustrated in which a client intends to change a highlight position to generate a new picture required by a change of viewpoint. The modification methods are classified and used. In dialogic modification, the selection of a revised sequence is carried out through communication between a client and a server. To realize this communication, a common platform is introduced as an interface in the color picture processing and painting system based on the extensible window-based elaboration language. The results show that dialogic modification of pictures by an interactive process of processing and painting is an effective method.
Service management in the picture processing and painting system having extensible functions
Author(s):
Ikuo Hirai;
Yasuhide Miyamoto;
Yo Murao;
Hajime Enomoto
Show Abstract
An excellent system environment is required for realizing various services for picture processing and painting, such as geometrical feature processing and modification processing with susceptible constraints. Service management plays an important role in this system environment in order to integrate the processing and painting of many kinds of pictures, including moving pictures. Integration of service management is indispensable for realizing interactive processes such as our Extensible WELL (Window-based ELaboration Language). Our system can handle all kinds of domain-specific languages by using the concept of a common platform, including an object network, and of a communication manager. Every object network specifies a consecutive process, and the communication manager controls cooperation between a client and several servers.
Adaptive multiresolution Hermite-binomial filters for image edge and texture analysis
Author(s):
Irene Yu-Hua Gu
Show Abstract
A new multiresolution image analysis approach using adaptive Hermite-Binomial filters is presented in this paper. According to the local image structural and textural properties, the analysis filter kernels are made adaptive in both their scales and orders. Applications of such an adaptive filtering approach, including image texture resolution analysis, multiscale image edge curve estimation and adaptive edge-texture-based image compression, are then presented. Simulation results on image texture resolution and edge curve estimation, as well as image compression, are included.
New texton encoder for rotation-invariant texture processing
Author(s):
Sharma V.R. Madiraju;
Chih-Chiang Liu
Show Abstract
This paper presents a robust texture encoding technique for texture processing using a covariance-based approach. This encoder produces a texture code which is invariant to local and global textural rotations. Our proposed method uses six statistical features obtained from two rotation-invariant local descriptors. The covariance technique is used to compute local descriptors and to index roughness, anisotropy, or general textural differences. Texture classification and segmentation are carried out to evaluate the proposed encoding scheme. Classification and segmentation results for synthetic and natural textures are presented to show the robustness of our method.
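A hedged sketch of covariance-based, rotation-invariant local descriptors is given below: the eigenvalues of a smoothed gradient covariance (structure tensor) are invariant to image rotation and can index roughness and anisotropy. This is an illustrative stand-in, not the authors' six-feature encoder.

```python
import numpy as np
from scipy import ndimage

def rotation_invariant_descriptors(image, sigma=2.0):
    """Eigenvalues of the smoothed gradient covariance (structure tensor) at each
    pixel; they are invariant to rotation of the image patch."""
    gy, gx = np.gradient(image.astype(float))
    jxx = ndimage.gaussian_filter(gx * gx, sigma)
    jxy = ndimage.gaussian_filter(gx * gy, sigma)
    jyy = ndimage.gaussian_filter(gy * gy, sigma)
    trace = jxx + jyy
    det = jxx * jyy - jxy ** 2
    disc = np.sqrt(np.maximum(trace ** 2 - 4 * det, 0.0))
    lam1, lam2 = (trace + disc) / 2.0, (trace - disc) / 2.0
    # e.g. roughness ~ lam1 + lam2, anisotropy ~ (lam1 - lam2) / (lam1 + lam2 + 1e-9)
    return lam1, lam2
```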
Edge detection by nonlinear dynamics
Author(s):
Yiu-fai Wong
Show Abstract
We demonstrate how the formulation of a nonlinear scale-space filter can be used for edge detection and junction analysis. By casting edge-preserving filtering in terms of maximizing information content subject to an average cost function, the computed cost at each pixel location becomes a local measure of edgeness. This computation depends on a single scale parameter and the given image data. Unlike previous approaches which require careful tuning of the filter kernels for various types of edges, our scheme is general enough to be able to handle different edges, such as lines, step edges, corners and junctions. Anisotropy in the data is handled automatically by the nonlinear dynamics.
Directional moving averaging interpolation for bilinear texture mapping
Author(s):
Kou-chang Chen;
Chang-Lin Huang
Show Abstract
In image warping, post-filtering is necessary to avoid aliasing effects. Here, we develop a hybrid space-variant directional filtering approach for post-processing. In order to reduce the computation, we classify the filter areas into three block types: constant, oriented, and irregular. The constant and irregular blocks are handled by local average filtering and elliptical weighted average filtering, respectively. For the oriented blocks, we propose a new filtering method called the elliptical weighted adaptive directional moving average filter. In the experiments, we show that our method can correct distorted images and gives better results, with better subjective quality, especially at high magnification ratios such as four times or more.
Super-resolution estimation of edge images
Author(s):
E. Fussfeld;
Yehoshua Y. Zeevi
Show Abstract
A hidden Markov model, which describes the evolution of a (binary) edge image along the resolution axis, is presented. The model integrates two layers: a hidden layer consists of sources having the ability to `breed' along the resolution axis according to a Markovian rule; a second layer consists of a Gibbs random field which is defined by all the sources. The available image is a realization of this field. After fitting such a model to a given pyramid, it is possible to estimate super-resolution images by synthesizing additional levels of the process which created the pyramid. The hidden Markov model is found to be a useful tool, allowing us to incorporate selected properties in the process of evolution along the resolution axis, while simultaneously providing an interpretation of this process. The properties incorporated into the model significantly influence the super-resolution image.
Texture classification using morphological gradient texture spectrum
Author(s):
Jia-hong Lee;
Yuang-cheh Hsueh
Show Abstract
In this paper, a new notion called Morphological Gradient Texture Spectrum (MGTS) is proposed for texture analysis and classification. The MGTS is used as a discriminating tool in texture analysis and classification. To evaluate the performance of the proposed method in the discrimination of textures, we perform a supervised classification procedure to classify texture images extracted from Brodatz's album. A simple measure is defined to determine the distance between two MGTS. Promising results indicate that our method is efficient in texture classification. We also evaluate two basic texture features, namely, coarseness and directionality from the MGTS for visual-perceptual feature extraction.
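A simplified sketch of the MGTS idea, assuming an 8-bit gray-scale image, is shown below: a morphological gradient is followed by a normalized histogram used as a texture spectrum, and two spectra are compared with an L1 distance. The histogram here stands in for the paper's texture spectrum, and the distance measure is an assumption.

```python
import numpy as np
from scipy import ndimage

def mgts(image, window=3, bins=64):
    """Morphological gradient followed by a normalized histogram ('spectrum').
    Assumes an 8-bit gray-scale image; a simplified stand-in for the MGTS."""
    grad = ndimage.morphological_gradient(image.astype(float), size=(window, window))
    spectrum, _ = np.histogram(grad, bins=bins, range=(0.0, 255.0), density=True)
    return spectrum

def mgts_distance(s1, s2):
    """Simple L1 distance between two spectra (the actual measure is an assumption)."""
    return np.sum(np.abs(s1 - s2))
```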
Texture classification using C-matrix and the fuzzy min-max neural network
Author(s):
Youn-Jen Chang;
Hsiao-Rong Tyan;
Hong-Yuan Mark Liao
Show Abstract
In this paper, we propose a new texture classification method. Previously, for texture analysis and classification, the gray tone co-occurrence matrix was adopted most frequently. However, due to the complexity in its derivation process, it is not the best choice if the processing time is a major concern. In this work, we propose a more compact matrix called C-matrix to solve the above problem. The proposed C-matrix characterizes both qualitative and quantitative properties between each pixel and its neighbors in an image. Based on this matrix, a set of statistical features can be defined. These features are then fed into a trained fuzzy neural network for texture classification. Experimental results based on two ground surface images are reported to corroborate the proposed theory.
Three-dimensional object reconstruction from a monocular image
Author(s):
A. Sdigui;
G. Barta;
M. Benjelloun
Show Abstract
In this paper, we propose a geometrical method for the reconstruction of particular objects, typically polyhedra whose faces are parallelepipeds. The originality of this approach lies in the use of Pluckerian coordinates for the parameterization of the lines supporting segments. The intrinsic relationships between the 3D coordinates of a line segment and its projection on the retinal plane have enabled us to determine the 3D structure from 2D data together with partial knowledge of the object. We have developed the 3D reconstruction method and present results.
Automated counting of pedestrians
Author(s):
Graham G. Sexton;
Xiaowei Zhang
Show Abstract
Recent developments in automatic building management systems and in road traffic monitoring systems have highlighted a need for accurate real-time data relating to the occupancy and flow of pedestrians in relatively unconstrained physical areas. In buildings where pedestrian flow can be constrained, either light beams or turnstiles have been used to count people, but in less constrained areas counting has been achieved by manual means, i.e., hand counting of live or recorded scenes. The ESPRIT PEDMON (PEDestrian MONitoring) project is focused on the development of technologies to enable the automated counting of pedestrians in unconstrained areas, with the target of producing a cost-effective real-time demonstration at the project's conclusion. A number of mechanisms are being investigated within the project, and image processing is taking a leading role in the development of wide-area monitoring and counting systems. This report describes the image processing work performed by the authors in fulfillment of the project.
Low-delay center-bridged and distributed combining schemes for multipoint videoconferencing
Author(s):
Ting-Chung Chen;
Wen-deh Wang
Show Abstract
Multipoint videoconferencing is a natural evolution of two-point videoconferencing and can increase its value to users. Currently, ITU-T's SG8 and SG15 are working on multipoint control related issues; ANSI's T1A1.5 is also working in this area. The video coding and related communication standards of the upcoming MPEG4 and H.26P will all include multipoint communication capability. This paper investigates transport structures and the associated combining schemes that can be used to support multipoint videoconferencing. Since low delay is a major issue for multipoint interactive communications, a distributed structure which renders the lowest delay along with many other system advantages is especially investigated. We first analyze the insertion delay of a QCIF combiner bridge which has been proposed for multipoint videoconferencing. Partial input-output pipelining has been used to reduce the delay. To reduce the insertion delay caused by unevenly distributed inputs, a parallel parsing scheme is proposed. This parallel parsing scheme allows low complexity inputs not to be held up by high complexity inputs and can reduce insertion delay significantly. An efficient delay-reduction algorithm using intraslice coding was also cited in a previous proposal. As a comparison, we describe low delay pel-domain transcoding schemes which have similar delay performance to coded-domain combining but have much higher complexity. We also describe a recent proposal which eliminates most of the insertion delay but requires major changes to existing standards and to encoder and decoder implementations. The performance of a distributed transport structure in providing multipoint video services is then investigated. Using the fact that bandwidth usage is an important factor in estimating network complexity, it is shown that a distributed transport structure saves 40 to 62.5% of the bandwidth for 4-point conferencing and also renders the shortest delay when compared with other center-bridged structures. We then discuss multipoint control and decoder implementations using the distributed transport structure. It is shown that a parallel decoding structure can be used to combine multiple inputs to realize a short delay which is identical to the delay of a two-point conference. The complexity of this special decoder, though, is larger than that of a regular decoder. A cascaded combiner-decoder at the user site is then shown to make a good compromise between implementation complexity, bandwidth efficiency and delay performance.
Image vector quantization with channel coding
Author(s):
Otar G. Zumburidze;
Hazem Ahmad Munawer
Show Abstract
In this work, we present several methods for the reduction of specific distortions such as the `blocking effect', `staircase structure' and `detail blurring' that accompany the vector quantization (VQ) process in coding both still and moving pictures. These methods include: (1) an overlap method for forming and rebuilding blocks (vectors); (2) classified vector quantization based on a composite source model, providing better perceptual quality of reconstructed images; (3) adaptive VQ with the Discrete Cosine-III Transform, which gives better performance than VQ with the standard Discrete Cosine Transform; (4) a 3D vector quantizer giving the possibility of exploiting both intra- and inter-frame correlations in image sequence coding, and thus resulting in higher reconstruction quality, even at a very low bit rate. Moreover, a large codebook built partially outside training is used to encode the image sequence. The 3D VQ makes it possible to reduce the annoying floating noise in image sequence coding. Finally, we propose an algorithm for VQ with channel coding (for a Gaussian channel). In the first experiment we used Phase Shift Keying (PSK); in the other experiment we used Minimum Shift Keying and error-correcting convolutional codes (66,35), which gives substantially better performance than PSK.
Pseudonoise-coded image communications
Author(s):
Tohru Kohda;
A. Oshiumi;
Akio Tsuneda;
K. Ishii
Show Abstract
Baseband image communication systems using code division multiple access are proposed. We investigate such a system using fixed-length pseudonoise (PN) codes and one using variable- length PN codes. We can find that the number of channels can be reduced when the latter is employed. Such a situation leads us to employ chaotic sequences, whose code-lengths can be arbitrarily chosen, for such variable-length PN codes.
Information theoretical assessment of visual communication with subband coding
Author(s):
Zia-ur Rahman;
Carl L. Fales;
Friedrich O. Huck
Show Abstract
A well-designed visual communication channel is one which transmits the most information about a radiance field with the fewest artifacts. The role of image processing, encoding and restoration is to improve the quality of visual communication channels by minimizing the error in the transmitted data. Conventionally this role has been analyzed strictly in the digital domain neglecting the effects of image-gathering and image-display devices on the quality of the image. This results in the design of a visual communication channel which is `suboptimal.' We propose an end-to-end assessment of the imaging process which incorporates the influences of these devices in the design of the encoder and the restoration process. This assessment combines Shannon's communication theory with Wiener's restoration filter and with the critical design factors of the image gathering and display devices, thus providing the metrics needed to quantify and optimize the end-to-end performance of the visual communication channel. Results show that the design of the image-gathering device plays a significant role in determining the quality of the visual communication channel and in designing the analysis filters for subband encoding.
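The restoration component referred to above is the classical frequency-domain Wiener filter; a minimal sketch is given below, where the optical transfer function and the signal and noise power spectra are assumed to be known. It illustrates only the restoration step, not the end-to-end channel assessment of the paper.

```python
# Classical frequency-domain Wiener restoration filter (illustrative only).
import numpy as np

def wiener_restore(degraded, otf, signal_power, noise_power):
    """Apply W = conj(H) * S / (|H|^2 * S + N) in the frequency domain.
    otf: frequency response of the blur (same shape as the image);
    signal_power, noise_power: assumed power spectra (arrays or scalars)."""
    spectrum = np.fft.fft2(degraded)
    wiener = np.conj(otf) * signal_power / (np.abs(otf) ** 2 * signal_power + noise_power)
    return np.real(np.fft.ifft2(wiener * spectrum))
```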
High-data-rate DCT/IDCT architecture by parallel processing
Author(s):
Tae-Yong Kim;
Lee-Sup Kim;
Jae-Kyoon Kim
Show Abstract
A new DCT/IDCT architecture capable of handling higher input/output data rates is proposed. In the proposed architecture, the 8-point input data vector for the DCT/IDCT is divided into two 4-point data vectors, the even part and the odd part. These two parts are processed in parallel. As a result, the 8-point DCT/IDCT is completed in 4 clock cycles, while conventional DCT/IDCT processors need 8 clock cycles. Therefore, our novel DCT/IDCT architecture achieves twice the data rate, which is useful for applications such as real-time HDTV. For the purpose of reducing the hardware size, we replaced the Modified Booth Multiplier with a Pre-Rounded Multiplier, in which some lower significant bits of the partial sums are rounded before summation. To achieve high data rates, the multipliers and accumulators are composed of Carry Save Adders and Pipeline Registers. Although the proposed DCT/IDCT architecture has a larger chip size than one based on the Distributed Arithmetic method, the size is reasonable in 1.0 micrometer CMOS technology. In spite of the larger chip size, the proposed architecture achieves higher data rates and high accuracy. The high regularity of the proposed architecture is also appropriate for VLSI implementation.
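The even/odd decomposition exploited by the architecture can be checked numerically: the even DCT coefficients come from a 4-point DCT-II of the butterfly sums and the odd ones from a 4-point DCT-IV of the differences. The sketch below verifies this identity in software; it is not a model of the hardware pipeline.

```python
import numpy as np
from scipy.fft import dct

def dct8_even_odd(x):
    """8-point DCT-II from two parallel 4-point stages (even/odd split):
    even coefficients = 4-point DCT-II of the sums u[n] = x[n] + x[7-n],
    odd  coefficients = 4-point DCT-IV of the differences v[n] = x[n] - x[7-n]."""
    x = np.asarray(x, dtype=float)
    u = x[:4] + x[7:3:-1]
    v = x[:4] - x[7:3:-1]
    out = np.empty(8)
    out[0::2] = dct(u, type=2)   # X[0], X[2], X[4], X[6]
    out[1::2] = dct(v, type=4)   # X[1], X[3], X[5], X[7]
    return out

# numerical check against the direct 8-point DCT-II (unnormalized convention)
x = np.random.rand(8)
assert np.allclose(dct8_even_odd(x), dct(x, type=2))
```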
Multiresolution modulation for efficient broadcast of information
Author(s):
Mika Grundstrom;
Markku Renfors
Show Abstract
Methods for reliable transmission of information, especially digital television, are considered. In the broadcast channel, several different receiver configurations and channel conditions make an optimization of the channel coding practically impossible. To efficiently utilize the available spectrum and to allow robust reception in adverse channel conditions, joint source-channel coding is applied. This is achieved utilizing multiresolution modulation combined with unequal error protection in the channel coding part and data prioritization in the source coder. These design parameters for the joint system are considered. The emphasis of this paper is on the modulation part. Multiresolution 32-QAM is presented. Simulations show good performance in the additive white Gaussian noise channel and, moreover, results in a multipath fading channel are encouraging as far as the high-priority level of the data is concerned.
Image resolution improvement by swing CCD array imager in two dimensions and its signal processing
Author(s):
Ping Li;
Yumei Wen
Show Abstract
This paper presents the principle of enhancing the spatial 2D resolution of a CCD array imager and discusses the attainable limit on the maximum resolution. The paper also reports on image signal processing for 2D high-definition pictures and on a hanging-up piezoelectric bimorph construction designed to improve both the horizontal and vertical resolution. The regenerated signal is compatible with the current TV mode.
Super-exponential method for blur identification and image restoration
Author(s):
Thomas J. Kostas;
Laurent M. Mugnier;
Aggelos K. Katsaggelos;
Alan V. Sahakian
Show Abstract
This paper examines a super-exponential method for blind deconvolution. Possibly non-minimum-phase point spread functions (PSFs) are identified. The PSF is assumed to be low-pass in nature. No other prior knowledge of the PSF or the original image is necessary to assure convergence of the algorithm. Results are shown using synthetically degraded satellite images in order to demonstrate the accuracy of the PSF estimates. In addition, radiographic images are restored with no knowledge of the PSF of the x-ray imaging system. These experiments suggest a promising application of this algorithm to a variety of blur identification problems.
Coding of an autostereoscopic 3D image sequence
Author(s):
Toshiaki Fujii;
Hiroshi Harashima
Show Abstract
This paper is concerned with the data compression and interpolation of multi-view images. In this paper, we formulate the compression and interpolation problem of 3D images and propose a novel coding scheme for autostereoscopic 3D images. First, we introduce a general representation of an autostereoscopic 3D image and explain the concept of `space coding'. In this concept, a multi-view image can be regarded as sampled data of the `space', and the objective of transmitting a multi-view image can be considered as the reconstruction of the `space' at the remote site. From this viewpoint, we then examine 3D structure extraction coding. In the experiment, we assume a simple wireframe model and evaluate the coding efficiency in terms of the bit rate and the SNR of decoded images. Finally, we report an experiment in which two sets of multi-view images were used as original images and the amount of data was reduced to 1/20 with an SNR of 34 dB.
Vector field visualization by projection pursuit: analysis of projection indices
Author(s):
G. Harikumar;
Yoram Bresler
Show Abstract
We address the problem of the display of vector-valued images, also known as vector fields or multiparameter images, in which a vector of data, rather than a scalar, is associated with each pixel of a pixel grid. Each component of the vector field defines a gray-scale image on the pixel grid. Vector fields usually arise when more than one physical property (henceforth an attribute) is measured for the object being imaged. Mapping the measured values of any one attribute to gray-scale on a pixel grid defines an image. The collection of these images then defines a vector field, with a vector of attributes corresponding to each pixel. The object usually consists of several disjoint regions, each made up of a region type, which we shall call a class. A set of attributes separates the classes (and therefore also the regions) if, for any two different classes, there exists an attribute in the set that has distinct values for the two. We refer to a pixel grid where the pixels are labeled as belonging to different regions as an Underlying Image (UI). Vector fields arise naturally in any application that involves the display of multiparameter data, or multisensor fusion, as surveyed in [1], the classical example being multispectral satellite images. Once we have a vector field of an object consisting of measurements of a set of attributes that separate all of the classes in the object, we have the potential to distinguish between all of its regions. Regardless of the ultimate goal of the specific imaging application, most applications share a common intermediate goal, which is the one we address: to present a specialist human observer with an image that will maximize his chances to classify correctly the image pixels into different regions, that is, to segment the image. Unfortunately, the human observer is inefficient at assimilating and integrating multidimensional data, and traditional methods like the parallel display of the component images are often misleading, as demonstrated by an example in [1]. Another technique for vector field visualization is automatic image segmentation, which suffers from a serious drawback. Our premise is that the specialist human observer should be able to bring to bear all his prior knowledge and experience, which are difficult to capture in a computer program, on the final analysis process. This specialized prior knowledge could include information about the spatial structure of the different regions in the image, the contrast levels between the various regions and the within-region variability. An automatically segmented image would deprive the specialist of a chance to apply this knowledge, and result in inferior performance in many situations. Instead, we have recently proposed [1] an approach involving the fusion of the vector field into a single most informative gray-scale image. This is done by finding linear combinations of the component images of the vector field that yield images with high discriminating power (see also [2, 3]). We note that, although the problem is closely related to the classical problem of optimum feature extraction, our approach is non-classical: it explicitly takes into account the spatial structure of the data. This paper concentrates on the quantitative analysis and objective performance evaluation of the proposed algorithms.
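As a hedged illustration of fusing a vector field into a single gray-scale image by a linear combination of its components, the sketch below projects each pixel vector onto the leading Fisher (LDA) direction estimated from a few labeled pixels. This is one common projection index; it is not necessarily the index analyzed in the paper.

```python
import numpy as np

def fisher_fusion(vector_field, labels):
    """Project each pixel vector of a (H, W, C) field onto the leading Fisher
    discriminant direction estimated from labeled pixels (labels: (H, W) ints,
    -1 = unlabeled), producing a single gray-scale image."""
    h, w, c = vector_field.shape
    pixels = vector_field.reshape(-1, c).astype(float)
    lab = labels.ravel()
    classes = [k for k in np.unique(lab) if k >= 0]
    overall_mean = pixels[lab >= 0].mean(axis=0)
    sw = np.zeros((c, c))                     # within-class scatter
    sb = np.zeros((c, c))                     # between-class scatter
    for k in classes:
        xk = pixels[lab == k]
        centered = xk - xk.mean(axis=0)
        sw += centered.T @ centered
        m = (xk.mean(axis=0) - overall_mean)[:, None]
        sb += len(xk) * (m @ m.T)
    evals, evecs = np.linalg.eig(np.linalg.pinv(sw) @ sb)
    direction = np.real(evecs[:, np.argmax(np.real(evals))])
    return (pixels @ direction).reshape(h, w)
```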
Multiresolution block coding method for visualization of compressed images in multimedia applications
Author(s):
Oemer Nezih Gerek;
Enis A. Cetin
Show Abstract
Multimedia and Picture Archiving and Communication System (PACS) applications require efficient ways of handling images for communication and visualization. In many Visual Information and Management Systems (VIMS), it may be required to get quick responses to queries. Usually, a VIMS database holds a huge number of images and may return many images for each query. For example, in a PACS, the VIMS provides 10 to 100 images for a typical query. Only a few of these images may actually be needed. In order to find the useful ones, the user has to preview each image by fully decompressing it. This is neither computationally efficient nor user friendly. In this paper, we propose a scheme which provides a magnifying-glass type of previewing feature. With this method, multiresolution previewing without decompressing the whole image is possible. Our scheme is based on block transform coding, which is the most widely used technique in image and video coding. In the first step of our scheme, all of the queried images are displayed at the lowest possible resolution (constructed from the DC coefficients of the coded blocks). If the user requests more information for a region of a particular image by specifying its size and place, then that region is hierarchically decompressed and displayed. In this way, large amounts of computation and bandwidth usage are avoided and a good user interface is achieved. This method changes the ordering strategy of the transform coefficients and thus reduces the compression ratio; however, this effect is small.
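A minimal sketch of the lowest-resolution preview step is given below: a thumbnail is built from the DC coefficient of each 8x8 block. For illustration the DCT is computed on the fly from a raw image; in the actual scheme the DC values would be read directly from the compressed bitstream.

```python
import numpy as np
from scipy.fft import dctn

def dc_thumbnail(image, block=8):
    """Lowest-resolution preview built only from the DC coefficient of each block."""
    h, w = image.shape
    h8, w8 = h // block, w // block
    thumb = np.empty((h8, w8))
    for i in range(h8):
        for j in range(w8):
            blk = image[i * block:(i + 1) * block, j * block:(j + 1) * block]
            thumb[i, j] = dctn(blk, norm='ortho')[0, 0]   # DC coefficient ~ block mean
    return thumb
```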
Design of gain-optimized perfect reconstruction regular lattice filter banks
Author(s):
Patrick Waldemar;
Tor A. Ramstad
Show Abstract
This paper considers perfect reconstruction lattice filter banks. When optimizing for coding gain, the purpose is to find a simple perfect reconstruction structure with few multiplications and reasonable gain. We present such a system, which also possesses a certain regularity when expanding from N to 2N channels. Results are presented, including the obtained gain, the number of filter multiplications, and the filter magnitude responses. The results show that the system gives strange filter responses, but good coding gain considering the number of multiplications.
Reconstruction of a high-resolution image from multiple-degraded misregistered low-resolution images
Author(s):
Brian C. Tom;
Aggelos K. Katsaggelos
Show Abstract
In applications that demand highly detailed images, it is often not feasible, or sometimes not even possible, to acquire images of such high resolution using hardware alone (high-precision optics and charge-coupled devices). Instead, image processing approaches can be used to construct a high resolution image from multiple, degraded, low resolution images. It is assumed that the low resolution images have been subsampled (thus introducing aliasing) and displaced by sub-pixel shifts with respect to a reference frame. Therefore, the problem can be divided into three sub-problems: registration (estimating the shifts), restoration, and interpolation. None of the methods which have appeared in the literature solve the registration and restoration sub-problems simultaneously. This is sub-optimal, since the registration and restoration steps are inter-dependent. Based on previous restoration and identification work using the Expectation-Maximization algorithm, the proposed approach estimates the sub-pixel shifts and conditional mean (restored images) simultaneously. In addition, the registration and restoration sub-problems are cast in a multi-channel framework to take advantage of the cross-channel information. Experimental results show the validity of this method.
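For orientation, the sketch below shows the naive shift-and-add alternative that a joint registration/restoration approach improves upon: known sub-pixel shifts are used to place each low-resolution frame on a high-resolution grid and the contributions are averaged. The integer rounding and wrap-around indexing are simplifying assumptions.

```python
import numpy as np

def shift_and_add(low_res_frames, shifts, factor=2):
    """Naive shift-and-add super-resolution: place each low-resolution pixel on a
    high-resolution grid according to its (known) sub-pixel shift and average."""
    h, w = low_res_frames[0].shape
    hi = np.zeros((h * factor, w * factor))
    count = np.zeros_like(hi)
    for frame, (dy, dx) in zip(low_res_frames, shifts):
        ys = (np.arange(h) * factor + int(round(dy * factor))) % (h * factor)
        xs = (np.arange(w) * factor + int(round(dx * factor))) % (w * factor)
        hi[np.ix_(ys, xs)] += frame
        count[np.ix_(ys, xs)] += 1
    return hi / np.maximum(count, 1)
```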
Application of the Gibbs-Bogoliubov-Feynman inequality in mean field calculations for Markov random fields
Author(s):
Jun Zhang
Show Abstract
Recently, there has been growing interest in the use of mean field theory (MFT) in Markov random field (MRF) model-based estimation problems. In many image processing and computer vision applications, the MFT approach can provide comparable performance to that of the simulated annealing, but requires much less computational effort and has easy hardware implementation. The Gibbs-Bogoliubov-Feynman inequality from statistical mechanics provides a general, systematic, and optimal approach for deriving mean field approximations. In this paper, this approach is applied to two important classes of MRF's, the compound Gauss-Markov model and the vector Ising model. The results obtained are compared and related to other methods of deriving mean field equations. Experimental results are also provided to demonstrate the efficacy of this approach.
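For reference, the Gibbs-Bogoliubov-Feynman inequality in its standard statistical-mechanics form is reproduced below; the notation is the conventional one and is not taken verbatim from the paper.

```latex
% Gibbs-Bogoliubov-Feynman inequality (standard form; notation assumed here).
% H is the true Hamiltonian, H_0 a trial Hamiltonian, F and F_0 the corresponding
% free energies, and <.>_0 the expectation under the Gibbs distribution of H_0.
\[
  F \;\le\; F_0 + \langle H - H_0 \rangle_0 ,
  \qquad
  F = -\frac{1}{\beta}\ln Z , \quad Z = \sum_{x} e^{-\beta H(x)} .
\]
% Mean field equations follow by minimizing the right-hand side over the free
% parameters of H_0 (e.g. site-wise external fields for an Ising-type MRF).
```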
Two-dimensional nonseparable perfect reconstruction (PR) filter bank design based on the Bernstein polynomial approximation
Author(s):
Kerem A. Harmanci;
Emin Anarim;
Hakan Caglar;
Bulent Sankur
Show Abstract
This study presents the design of 2D nonseparable Perfect Reconstruction Filter Banks (PRFB) for two different sampling lattices: quincunx and rectangular. In the quincunx case, z-domain PR conditions are mapped into the Bernstein-x domain. The desired power spectrum of the 2D nonseparable filter is approximated using Bernstein polynomials. Since we introduce a mapping from the complex periodic domain to a real polynomial domain, PRFB design in the Bernstein-x domain is much easier to handle. The parametric solution for the 2D nonseparable design technique is obtained with the desired regularity for quincunx sampling lattices. This technique allows us to design 2D wavelet transforms. For rectangular downsampling, the use of signed shuffling operations to obtain a PRFB from a low-pass filter enables the reduction of the PR conditions. This design technique leads to an efficient implementation structure, since all the filters in the bank have the same coefficients up to sign and position changes. This structure overcomes the high complexity problem that is the major shortcoming of 2D nonseparable filter banks. The designed filter banks are tested on 2D image models and real images in terms of compaction performance. It is shown that nonseparable designs can outperform separable ones in data compression applications.
Artificial intelligence (AI)-based relational matching and multimodal medical image fusion: generalized 3D approaches
Author(s):
Stevan M. Vajdic;
Henry E. Katz;
Andrew R. Downing;
Michael J. Brooks
Show Abstract
A 3D relational image matching/fusion algorithm is introduced. It is implemented in the domain of medical imaging and is based on Artificial Intelligence paradigms--in particular, knowledge base representation and tree search. The 2D reference and target images are selected from 3D sets and segmented into non-touching and non-overlapping regions, using iterative thresholding and/or knowledge about the anatomical shapes of human organs. Selected image region attributes are calculated. Region matches are obtained using a tree search, and the error is minimized by evaluating a `goodness' of matching function based on similarities of region attributes. Once the matched regions are found and the spline geometric transform is applied to regional centers of gravity, images are ready for fusion and visualization into a single 3D image of higher clarity.
Comparative study on computed tomography algorithms
Author(s):
Nasser Zayed;
Bryan Lawton
Show Abstract
This study uses Computed Tomography (CT) for reconstructing images of solid propellant rocket motors during static firing tests. Implementation, verification and comparison of four CT algorithms are presented. These four algorithms are: the Algebraic Reconstruction Technique, Linear Superposition with Compensation, and the Fourier convolution technique with parallel beams and with fan beams. The phantom used in the comparison between algorithms is similar in cross-section to a solid propellant rocket motor. A comparison of the algorithms' ability to detect artifacts is made. Also, a comparison is made using data obtained by optical tomography of the absorption coefficient inside a 20 mm gas gun barrel. Finally, the running time versus the number of projections, the number of ray sums, and the resolution is studied.
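As a small, hedged illustration of the filtered back-projection (Fourier convolution, parallel-beam) route, the sketch below reconstructs a hypothetical square phantom with scikit-image; it does not reproduce the rocket-motor phantom or the fan-beam and algebraic variants compared in the paper.

```python
import numpy as np
from skimage.transform import radon, iradon

# Hypothetical square phantom, parallel-beam projections, ramp-filtered back-projection.
phantom = np.zeros((128, 128))
phantom[32:96, 32:96] = 1.0                      # illustrative test object
theta = np.linspace(0.0, 180.0, 90, endpoint=False)
sinogram = radon(phantom, theta=theta)           # simulated parallel-beam projections
reconstruction = iradon(sinogram, theta=theta)   # filtered back-projection
print(np.abs(reconstruction - phantom).mean())   # rough reconstruction error
```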
Fully automatic method for detecting center and tracking boundary points in left ventricle echocardiograms
Author(s):
Xiaowei Zhang;
John P. Oakley
Show Abstract
In this paper we present a new, fully automatic method to quantify the motion of the left ventricle in a 2D echocardiographic image sequence. The approach is based on a matched filter which is designed to give a strong response at the curved boundary. This type of filter is used to detect boundary points on the left ventricle cavity wall for any selected (tangent) direction with high reliability in the presence of noise. The algorithm is divided into two stages. In the first stage, the estimated cavity center point is identified by processing a sequence of frames corresponding to about one half of the heartbeat cycle. In the second stage, two boundary points are located in each frame of the image sequence. The algorithm has been applied to two different image sequences, a short-axis view and a long-axis view, both of which are quite noisy. The results show accurate tracking of specific tangent points on the endocardial boundary, without any user intervention being required.
Rate-constrained motion estimation
Author(s):
Bernd Girod
Show Abstract
A theoretical framework for rate-constrained motion estimation is introduced. The best trade-off between displacement vector rate and prediction error rate is shown to be the point where the partial derivatives of the multivariate distortion-rate function are equal. Low bit-rate coders are severely rate-constrained, and coarser motion compensation is appropriate for lower bit rates. A new region-based motion estimator builds a tree of nested regions from back to front, where each region is encoded by a closed B-spline contour. The rate constraint is a natural termination criterion when building the region tree. Experimental results show that rate-constrained motion compensation is superior to a full-search block matching scheme when incorporated into a motion-compensating hybrid coding scheme, yielding up to 3 dB SNR improvement at 15 kbps.
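The rate-constrained trade-off can be sketched as a Lagrangian block search minimizing J = D + lambda * R; the SAD distortion, the crude vector-rate model and the parameter values below are assumptions, and the paper's region-based estimator with B-spline contours is not reproduced here.

```python
import numpy as np

def rd_block_match(ref, cur, y, x, bs=16, search=8, lam=10.0, predictor=(0, 0)):
    """Rate-constrained block matching: minimize J = D + lambda * R, where the
    vector rate R is approximated by the cost of coding the difference from a
    predicted vector (a common approximation, not the paper's coder)."""
    target = cur[y:y + bs, x:x + bs]
    best, best_cost = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if not (0 <= yy <= ref.shape[0] - bs and 0 <= xx <= ref.shape[1] - bs):
                continue
            d = np.sum(np.abs(target - ref[yy:yy + bs, xx:xx + bs]))       # distortion
            r = abs(dy - predictor[0]) + abs(dx - predictor[1]) + 1        # crude rate model
            cost = d + lam * r
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best
```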
Model-based image coding
Author(s):
Kiyoharu Aizawa
Show Abstract
Historically, progress in image coding techniques has been made by incorporating results from other fields such as information theory. Most of the existing coding methods, such as predictive coding, transform coding, and vector quantization, belong to information-theory-based methods in which image signals are considered as random signals and compressed by exploiting their stochastic properties. Apart from these information-theoretic coding methods, research on new approaches such as model-based coding (which is related to both image analysis and computer graphics) has recently intensified. An essential difference between conventional coding methods and these new approaches is the image model they assume. Contrary to conventional coding methods, which efficiently represent 2-D waveforms of image signals, model-based coding represents image signals using structural image models which take into account the 3-D properties of the scene in some sense. There have been two major approaches to model-based image coding. One approach uses general models such as planar patches and smooth surfaces to represent the 3-D properties of the objects [11, 19, 28, 31, 33, 36, 39, 40, 43]. The other approach utilizes detailed parameterized 3-D object models such as a parameterized facial model [2, 6, 13, 15, 17, 22, 24, 26, 32, 35, 41, 47, 48, 49, 51, 54]. The former approach has been discussed in the context of advanced motion compensation. The latter approach is clearly different from the conventional approaches, since it uses an explicit 3-D model, encodes images with computer-vision-oriented techniques and reproduces images with computer-graphics-oriented techniques. Both approaches can potentially achieve very low bit rate video compression. In the following sections, image coding schemes are described from the point of view of their associated image models. Among our works related to these paradigms at the Univ. of Tokyo [2, 3, 4, 8, 9, 10, 36, 41, 42, 40], 3-D model-based coding and 2-D deformable-triangle-based motion compensation are described.
High-compression video coding using generic vector mapping
Author(s):
Ya-Qin Zhang;
Weiping Li
Show Abstract
High-compression video coding at very low bit rates has recently received much attention in academia, industry, and standard bodies. ISO/IEC/JTC1/SC29 has recently started a new initiative for very low bitrate coding of audio and video information (MPEG 4). ITU-T (formerly CCITT) SG 15/WP15/1, targeted primarily at video-telephony applications, has outlined near-term (H.VLC/N) specifications for audio/video coding, data interface, multiplexing, error control, modem, and overall system integration. This paper presents a new approach for very low bit rate video coding based on generic vector mapping and quantization (GVMQ). The vector mapping (VM) can be considered as an extension to the existing pixel-based mapping techniques, in which pixel-based operation is replaced by vector-based operations. Since VM intends to preserve the internal correlation structure within a vector while decorrelating different vectors, the subsequent VQ becomes highly efficient, recalling the fact that better VQ performance can be achieved when inter-vector correlation is reduced while intra-vector correlation is preserved. Two examples of the GVMQ for image coding are Vector Transform Coding (VTC) and Vector Subband Coding (VSC), developed previously by the authors. VTC and VSC can also be considered as extensions of DCT and subband coding to vector forms. From the preliminary study, we feel that GVMQ presents a promising approach to very low bit rate coding. It can be considered as a bridge between the conventional coding schemes and the object-oriented representation. The scheme is being further optimized for submission to relevant standard bodies including ISO/IEC/JTC1/SC29 and ITU-T SG 15.
MPEG-4: very low bit rate coding for multimedia applications
Author(s):
Isabelle Corset;
Sylvie Jeannin;
Lionel Bouchard
Show Abstract
Following the development of digital communications, focus is now on interactive multimedia services on telephone networks. To provide such applications, very low bit rate video coding algorithms have to be designed and optimized. A standardization process is being started within MPEG under the MPEG-4 denomination, to build a generic scheme for bit rates from 10 to 64 kbit/s. The Laboratoires d'Electronique Philips are currently involved in European projects which are devoting their efforts to the elaboration of a common European proposal towards MPEG-4. This paper gives an overview of the research status on very low bit rate video coding within Philips and within Europe. Most of the European studies are concerned with region-based compression techniques; emphasis will therefore be put on the two major algorithms enhancing this technique.
Joint contour-based and motion-based image sequence segmentation for TV image coding at very low bit rate
Author(s):
Jenny Benois-Pineau;
Ling Wu Zhang;
Dominique Barba
Show Abstract
This paper describes a spatio-temporal segmentation of image sequences for object-oriented low bit rate image coding. The spatio-temporal segmentation is obtained by merging spatially homogeneous regions into motion-homogeneous ones. Each of these regions is characterized by a motion parameter vector and structural information which represents a polygonal approximation of the region border. The segmentation of the current frames in the sequence is obtained by prediction and refinement. A predictive coding scheme is developed. Some estimates of the quantity of information are given.
Model-based image sequence coding using interframe AU correlation for very low bit rate transmission
Author(s):
Defu Cai;
Huiying Liang;
Xiangwen Wang
Show Abstract
Model-based image coding opens up a wide range of applications in very/ultra low bit rate transmission, such as video telephone/teleconference and multimedia communication. This paper proposes a new scheme of model-based image sequence coding based on interframe AU correlation which is suitable for very/ultra low bit rate transmission. A new criterion based on the Cauchy-Schwarz inequality for facial expressions, a fast automatic face feature point extraction method for real-time applications, and a method for separating the AUs of a face under various expressions are given. In this paper, a deformable model controlled by motion compensation, expressive facial images synthesized from given AUs, and a knowledge base are realized. The results of computer simulation are given.
Progressive optimization in subband trees
Author(s):
Mehmet V. Tazebay;
Ali Naci Akansu
Show Abstract
This paper shows that performance improvements are possible if different filter banks are used at the different levels, or even in the different nodes, of a subband tree. A progressive optimization technique is proposed for fine-tuning the product subband filters in a tree structure. The time and frequency properties of the product subband filters are evaluated and comparisons are made.
Progressive orthogonal tilings of the time-frequency plane
Author(s):
Manuel A. Sola;
Sebastia Sallent-Ribes
Show Abstract
This paper proposes a fast splitting algorithm (FSA) for a signal which, when combined with an optimality criterion defined in the frequency domain, leads to coherent tilings of the time-frequency plane. For a given set of basis regions formed by allowed subsets of the signal and a cost function defined over this set, we find the minimum cost cover of the signal by means of a fast algorithm. We show how, when an additive cost measure is defined over the subband decomposition induced by a given filter bank, the method admits a solution in the form of a progressive orthogonal tiling. When progressive conditions are verified, this method can be modelled with simple structures such as trellis diagrams or ordinary Petri nets. The extension of this method to bidimensional signals and conditions for fast algorithms are also discussed. The set of partitions obtained by the double-tree algorithm is included in those considered by the FSA, allowing the latter better signal analysis. We also present two approaches that properly reduce the complexity of the FSA while maintaining the improvements of the method. The first is based on constraining the set of basis regions to those with dyadic support, while the second bounds the maximum support of the basis regions.
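A simplified sketch of a minimum-cost cover with an additive cost over a dyadic-segment tree is given below, in the spirit of best-basis selection; the real FSA also admits non-dyadic basis regions, which this simplified version does not.

```python
# Minimum-cost cover of a signal by dyadic segments under an additive cost
# (best-basis-style recursion; a simplified stand-in for the FSA).
def best_cover(signal, cost, min_len=8):
    """Return (total_cost, list_of_segments) covering [0, len(signal))."""
    def solve(lo, hi):
        here = cost(signal[lo:hi])               # cost of keeping the whole segment
        if hi - lo <= min_len:
            return here, [(lo, hi)]
        mid = (lo + hi) // 2
        lc, lseg = solve(lo, mid)                # best cover of the left half
        rc, rseg = solve(mid, hi)                # best cover of the right half
        return (here, [(lo, hi)]) if here <= lc + rc else (lc + rc, lseg + rseg)
    return solve(0, len(signal))

# usage: best_cover(x, cost=lambda seg: sum(1 for v in seg if abs(v) > 0.1))
```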
Optimal biorthogonal wavelet analysis
Author(s):
Michael G. Strintzis
Show Abstract
The paper investigates the design of filter banks generating the optimal signal representation by M-band one-dimensional and multi-dimensional biorthogonal wavelet frames. Criterion of optimality is the minimization of the average mean-square approximation error at each level of the decomposition. Preliminary results of the application of such filter banks to two- dimensional image compression are included.
Discretization of the Gabor-type scheme by sampling of the Zak transform
Author(s):
Meir Zibulski;
Yehoshua Y. Zeevi
Show Abstract
The matrix algebra approach was previously applied in the analysis of the continuous Gabor representation in the Zak transform domain. In this study we analyze the discrete and finite (periodic) scheme by the same approach. A direct relation that exists between the two schemes, based on the sampling of the Zak transform, is established. Specifically, we show that sampling of the Gabor expansion in the Zak transform domain yields a discrete scheme of representation. Such a derivation yields a simple relation between the schemes by means of the periodic extension of the signal. We show that in the discrete Zak domain the frame operator can be expressed by means of a matrix-valued function which is simply the sampled version of the matrix-valued function of the continuous scheme. This result establishes a direct relation between the frame properties of the two schemes.
Structure of the Gabor matrix and efficient numerical algorithms for discrete Gabor expansions
Author(s):
Sigang Qiu;
Hans Georg Feichtinger
Show Abstract
The standard way to obtain suitable coefficients for the (non-orthogonal) Gabor expansion of a general signal, for a given Gabor atom g and a pair of lattice constants in the (discrete) time-frequency plane, requires computing the dual Gabor window function g~ first. In this paper, we present an explicit description of the sparsity and the block and banded structure of the Gabor frame matrix G. On this basis, efficient algorithms are developed for computing g~ by solving the linear equation g~ * G = g with the conjugate-gradient method. Using the dual Gabor wavelet, a fast Gabor reconstruction algorithm with very low computational complexity is proposed.
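As a rough illustration of the dual-window computation described above, the Python sketch below builds the Gabor frame operator explicitly from all time-frequency shifted atoms and solves for the dual window with SciPy's conjugate-gradient solver. It ignores the sparsity and banded structure that the paper exploits, so it is only a reference implementation of the underlying linear system, not the authors' fast algorithm; the window, signal length, and lattice constants are arbitrary choices.

```python
import numpy as np
from scipy.sparse.linalg import cg

def gabor_atoms(g, a, b):
    """All time-frequency shifted atoms g_{m,n}[k] = g[k - n*a] * exp(2j*pi*m*b*k / L)."""
    L = len(g)
    atoms = []
    for n in range(L // a):                          # circular time shifts by n*a
        shifted = np.roll(g, n * a)
        for m in range(L // b):                      # modulations by m*b / L
            mod = np.exp(2j * np.pi * m * b * np.arange(L) / L)
            atoms.append(shifted * mod)
    return np.array(atoms)                           # one atom per row

def dual_gabor_window(g, a, b):
    """Solve S g_dual = g, where S is the (Hermitian) Gabor frame operator."""
    A = gabor_atoms(g, a, b)
    S = A.T @ A.conj()                               # frame operator, Hermitian PSD
    g_dual, info = cg(S, g.astype(complex), atol=1e-12)
    if info != 0:
        raise RuntimeError("CG did not converge; the lattice may not generate a frame")
    return g_dual

# Example: Gaussian window on an oversampled lattice (L / (a*b) = 1.5 > 1)
L, a, b = 144, 12, 8
g = np.exp(-np.pi * ((np.arange(L) - L / 2) / 18.0) ** 2)
g_dual = dual_gabor_window(g, a, b)
```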
New family of orthonormal transforms with time localizability based on DFT
Author(s):
Kook-Yeon Kwak;
Richard A. Haddad
Show Abstract
The library of orthonormal bases includes the Short-Time Fourier Transform and Wavelet Transform bases. The widely used DFT lacks flexibility in dealing with signals possessing a time-varying spectrum; such signals require a variable time-frequency localizability feature for adequate resolution. In this paper, we introduce new orthonormal transforms which are constructed from the DFT yet have a variable time-frequency localizability feature and fast computation. We create a framework for constructing the Time Localization Linear Transform (TLLT) upon the concept that time localizability can be recovered by inverse transforming a selected partial collection of forward transform coefficients. By coding the coefficients of the TLLT, we can combine the adaptability of vector quantization with the speed of a linear transform.
Design of filters for minimum-variance multiresolution analysis and subband decomposition
Author(s):
Christos G. Chrysafis;
Michael G. Strintzis
Show Abstract
The paper investigates the design of minimum-variance perfect-reconstruction filter banks for use in image compression applications. These filter banks are formed so that the variance of the difference between the input and output of each band is minimized. This has the effect of minimizing the error occurring when only one of the filter banks is retained and the rest are omitted. The paper presents a method for designing such banks so as to achieve in each band minimum variance and a linear-phase realization appropriate for application to images. Some results of the actual implementation of these filter banks for image compression are included.
Contour simplification by a new nonlinear filter for region-based coding
Author(s):
Chuang Gu;
Murat Kunt
Show Abstract
The guiding principle of this study is to find an optimum way to simplify the contours produced by a second-generation coding scheme based on morphological segmentation. For this purpose, evaluations of existing methods for contour simplification are carried out first. Based on properties of human visual perception, a new nonlinear filter using a majority operation is designed to simplify the contours in order to obtain an optimum compromise between the cost of contour coding and visual quality. Applications to region-based still image coding and video coding are demonstrated. Experimental results show an average 20% reduction in bits for contour coding while keeping good visual quality.
Ordering color maps for lossless compression
Author(s):
Nasir D. Memon;
Sibabrata Ray
Show Abstract
Linear predictive techniques perform poorly when used with color-mapped images, as there is no linear relationship between neighboring pixel values. Re-ordering the color table, however, can lead to a lower entropy of prediction errors. The problem of ordering the color table such that the sum of absolute prediction errors is minimized turns out to be intractable. In fact, even for the simplest prediction scheme, which uses the value of the previous pixel to predict the current pixel, the problem of obtaining an optimal ordering turns out to be the optimal linear arrangement problem, which can be abstracted as a graph problem and is known to be intractable. We give two heuristics for the problem and use them for ordering the color table of a color-mapped image. The first heuristic is based on the well-known network flow problem and is computationally expensive. The second heuristic involves successive transposition of color table entries and is simple in terms of implementation and time complexity. Simulation results comparing the two heuristics are presented. Application of the ordering techniques to lossless compression of gray-scale image data is also presented: re-ordering intensity values sometimes leads to significant improvements in compression rates, for example improvements of almost one bit per pixel with the well-known USC-Girl image. Simulation results for a set of standard images are presented.
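A minimal Python sketch of the second heuristic mentioned above: color-table entries are successively transposed, and a swap is kept whenever it lowers the total absolute previous-pixel prediction error over a raster scan. The sweep schedule and stopping rule here are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def prediction_cost(indices, perm):
    """Sum of |p[i] - p[i-1]| after relabelling the pixel indices through perm (raster scan)."""
    relabelled = perm[indices.ravel()].astype(np.int64)
    return np.abs(np.diff(relabelled)).sum()

def reorder_palette(indices, n_colors, sweeps=5):
    perm = np.arange(n_colors)                    # perm[old_index] = new_index
    best = prediction_cost(indices, perm)
    for _ in range(sweeps):
        improved = False
        for i in range(n_colors):
            for j in range(i + 1, n_colors):
                perm[i], perm[j] = perm[j], perm[i]        # try a transposition
                cost = prediction_cost(indices, perm)
                if cost < best:
                    best, improved = cost, True            # keep the swap
                else:
                    perm[i], perm[j] = perm[j], perm[i]    # undo the swap
        if not improved:
            break
    return perm, best
```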
Self-referred texture coding for segmentation-based codec
Author(s):
Philippe Salembier;
Felipe Lopez
Show Abstract
This paper describes a texture coding technique mainly suitable for segmentation-based coding schemes. The main features of the proposed technique are its efficiency in terms of bits per pixel for homogeneous regions and its ability to deal with local inhomogeneities that may be present in the image. The basic idea of the coding strategy is to divide the image into blocks and to classify the blocks into two categories: referable and nonreferable. Referable means that the block can be approximated by one block of the already transmitted texture; nonreferable is defined as the opposite. Nonreferable blocks are transmitted with a general-purpose coding scheme (for example a DCT-based technique), while referable blocks are transmitted by means of a simple transition vector indicating which sample of the transmitted texture has to be translated. We show that this technique is suitable for texture but produces distortions for strong contours. As a result, we propose to use it within a segmentation-based coding scheme where contours are transmitted by another strategy. Finally, the application to sequence coding is discussed. It is shown that this technique is particularly attractive for coding the prediction error within a motion-compensated video coding scheme.
Fixed binary linear quadtree coding scheme for spatial data
Author(s):
Henry Ker-Chang Chang;
Jayh-Woei Chang
Show Abstract
In this paper, a new linear quadtree construction based on binary coding is proposed. The proposed fixed binary linear quadtree (FBLQ) coding scheme emphasizes reducing the length of the bit string needed to encode spatial data. A decomposition discrimination process is designed to separate the data representation into two lists: an identifier list is first used to indicate whether or not the four subquadrants of a decomposition should be further decomposed, and once a subquadrant is found to have uniform intensity, i.e. the region belongs entirely to either the object or the background, a bit of either one or zero is appended to a color list. With both the identifier and color lists, a compact representation is obtained for coding the spatial data of a binary image. Two sets of binary images with various resolution factors are tested in the experiments, and a theoretical analysis for the worst-case image is also derived. Both the empirical results and the theoretical analysis demonstrate that the proposed FBLQ coding scheme requires less storage than the other three methods considered.
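The Python sketch below shows the FBLQ idea in its simplest form: one identifier bit per decomposition node marks whether it is subdivided, and one color bit per uniform quadrant records object or background. The exact bit layout and packing used in the paper are not reproduced; this is only an illustration.

```python
import numpy as np

def fblq_encode(img, identifiers=None, colors=None):
    """img: square binary array with power-of-two side length."""
    if identifiers is None:
        identifiers, colors = [], []
    if img.min() == img.max():                 # uniform quadrant: no further split
        identifiers.append(0)
        colors.append(int(img.flat[0]))        # one color bit (object or background)
    else:                                      # mixed quadrant: split into four
        identifiers.append(1)
        h = img.shape[0] // 2
        for quad in (img[:h, :h], img[:h, h:], img[h:, :h], img[h:, h:]):
            fblq_encode(quad, identifiers, colors)
    return identifiers, colors

# Example: an 8x8 image with a 4x4 object in the middle
img = np.zeros((8, 8), dtype=np.uint8)
img[2:6, 2:6] = 1
ident, col = fblq_encode(img)
print(len(ident), "identifier bits,", len(col), "color bits")
```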
Image coding using a weak membrane model of images
Author(s):
Tolga Acar;
Muhittin Gokmen
Show Abstract
Object boundaries detected by edge detection algorithms provide a rich, meaningful and sparse description of an image. In this study, we develop an image compression algorithm based on such a sparse description, obtained using a weak membrane model of the image. In this approach, the image is modelled as a collection of smooth regions separated by edge contours. This model allows us to determine edge contours, represented as line processes, by minimizing a nonconvex energy functional associated with a membrane, and to reconstruct the original image using the same model. Thus, unlike previous work in which edges are first obtained by a convolution-based edge detector and the surface is then reconstructed by a completely different process such as interpolation, in our approach the same process is used both for detecting edges and for reconstructing surfaces from them. We code the line processes using run-length coding and the sparse data around the line processes using entropy coding. We evaluate the performance of the algorithm qualitatively and quantitatively on various synthetic and real images, and show that good-quality images can be obtained at moderate compression ratios such as 5:1, while the ratio may reach up to 20:1 for some images.
Low bit-rate video compression based on maximum a posteriori (MAP) recovery techniques
Author(s):
Taner Ozcelik;
Aggelos K. Katsaggelos
Show Abstract
Most of the existing video coding algorithms produce highly visible artifacts in the reconstructed images as the bit-rate is lowered. These artifacts are due to the information loss caused by the quantization process. Since these algorithms treat decoding as simply the inverse process of encoding, these artifacts are inevitable. In this paper, we propose an encoder/decoder paradigm in which both the encoder and decoder solve an estimation problem based on the available bitstream and prior knowledge about the source image and video. The proposed technique makes use of a priori information about the original image through a non-stationary Gauss-Markov model. Utilizing this model, a maximum a posteriori estimate is obtained iteratively using mean field annealing. The fidelity to the data is preserved by projecting the image onto a constraint set defined by the quantizer at each iteration. The performance of the proposed algorithm is demonstrated on an H.261-type video codec. It is shown to be effective in improving the reconstructed image quality considerably while reducing the bit-rate.
Multigrid techniques and wavelet representations in image superresolution
Author(s):
Mariappan S. Nadar;
Bobby R. Hunt;
Philip J. Sementilli
Show Abstract
The Expectation Maximization algorithm for Poisson data, the Poisson-MLE algorithm, and the Simultaneous Multiplicative Algebraic Reconstruction Technique are three iterative solutions to minimum Kullback-Leibler (KL) distance methods. It has been noted empirically that the performance of the three minimum KL distance methods relies on the sparseness of the object. In a previous work, ad hoc object representation schemes were reported that improved the performance of these algorithms for objects with a significantly high background. In addition to this limitation on the nature of the object estimated, these algorithms have a slow convergence rate. Multigrid methods and wavelet decompositions are two closely related concepts. Multigrid methods were proposed to improve the convergence rates of some iterative methods by appending corrections from coarse grids to an approximate estimate at the fine grid. Wavelet representations, on the other hand, have achieved tremendous success in signal compression applications. This is a direct consequence of the fact that the wavelet transform redistributes the energy in the signal to a small number of transform coefficients, thus making the wavelet representation approximately sparse. In addition, the spaces spanned by the wavelet bases consist of elements with a significant number of near-zero sample values. In this paper we expound on the similarities and differences of the two concepts as they pertain to the imaging equation. This leads to a multigrid formulation based on the wavelet subspaces. The goal of this paper is to use this new formulation to overcome two deficiencies of minimum KL distance methods, viz., ringing artifacts due to significant background values in the object and slow convergence of the iterative methods.
Spatially adaptive multiscale image restoration using the wavelet transform
Author(s):
Mark R. Banham;
Aggelos K. Katsaggelos
Show Abstract
In this paper, we present a new adaptive approach to the restoration of noisy blurred images which is particularly effective at producing sharp deconvolution while suppressing the noise in the flat regions of an image. This is accomplished through a multiscale Kalman smoothing filter applied to a prefiltered observed image in the discrete, separable, 2D wavelet domain. The prefiltering step involves a constrained least squares filter based on a very small choice for the regularization parameter, producing an under-regularized restored image. This leads to a reduction in the support of the required state vectors in the wavelet domain, and an improvement in the computational efficiency of the multiscale filter. The proposed method has the benefit that the majority of the regularization, or noise suppression, of the restoration is accomplished by the efficient multiscale filtering of wavelet detail coefficients ordered on quadtrees. Not only does this permit adaptivity to the local edge information in the image, but it leads to potential parallel implementation schemes. In particular, this method changes filter parameters depending on scale, local signal-to-noise ratio, and orientation. Because the wavelet detail coefficients are a manifestation of the multiscale edge information in an image, this algorithm may be viewed as an `edge-adaptive' multiscale restoration approach.
Image enhancement of the galaxy VV371c using the 2D fast wavelet transform
Author(s):
James F. Scholl
Show Abstract
The fast wavelet transform for images developed by Mallat is a useful and powerful tool for image processing, with its main applications being compression, feature extraction and image enhancement. In particular, since this algorithm segments an image with respect to both spatial frequency and orientation, image enhancement becomes more precise and efficient. This idea is demonstrated by the removal of background noise and flaws from two digitized images of the faint galaxy VV371c. This object was first studied in the early 1980's using older and less sophisticated computer technologies and image processing techniques. We present results of the wavelet based image analysis of VV371c which yield new conclusions as to this galaxy's structure and morphological classification.
Multiscale filtering method for derivative computation
Author(s):
Bingcheng Li;
Songde Ma
Show Abstract
In this paper, we propose a multiscale filtering method to compute derivatives of any order. As a special case, we consider the computation of second derivatives and show that the difference of two smoothers with the same kernel but different scales constitutes a Laplacian operator and has a zero crossing at a step edge. Selecting a Gaussian function as the smoother, we show that the DoG (difference of Gaussians) is itself a zero-crossing edge extractor and need not approximate the LoG (Laplacian of Gaussian). At the same time, we show that even though the DoG with bandwidth ratio 0.625 (1:1.6) is the optimal approximation to the LoG, it is not optimal for edge detection. Finally, selecting an exponential function as the smoothing kernel, we obtain a Laplacian-of-exponential (LoE) operator, and it is shown theoretically and experimentally that the LoE has high edge detection performance; furthermore, its computation is efficient and its computational complexity is independent of the filter kernel bandwidths.
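The zero-crossing claim above is easy to illustrate. The Python/SciPy sketch below (an illustration, not the authors' code) forms the difference of two Gaussian smoothings at a 1:1.6 scale ratio and marks sign changes between neighbouring pixels as edge candidates.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_zero_crossings(image, sigma=1.0, ratio=1.6):
    """Difference of two Gaussian smoothings; zero crossings flag step edges."""
    img = image.astype(float)
    dog = gaussian_filter(img, sigma) - gaussian_filter(img, ratio * sigma)
    signs = np.sign(dog)
    edges = np.zeros(dog.shape, dtype=bool)
    # A pixel is an edge candidate if the DoG changes sign against a neighbour.
    edges[:-1, :] |= (signs[:-1, :] * signs[1:, :]) < 0   # vertical neighbours
    edges[:, :-1] |= (signs[:, :-1] * signs[:, 1:]) < 0   # horizontal neighbours
    return edges, dog
```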
Regularized constrained total least-squares image restoration
Author(s):
Vladimir Z. Mesarovic;
Nikolas P. Galatsanos;
Aggelos K. Katsaggelos
Show Abstract
In this paper the problem of restoring an image distorted by a linear space-invariant point-spread function (psf) which is not exactly known is formulated as the solution of a perturbed set of linear equations. The regularized constrained total least-squares (RCTLS) method is used to solve this set of equations. Using the diagonalization properties of the discrete Fourier transform (DFT) for circulant matrices, the RCTLS estimate is computed in the DFT domain. This significantly reduces the computational cost of this approach and makes its implementation possible for large images. Numerical experiments for different psf approximations are performed to check the effectiveness of the RCTLS approach for this problem. Objective and subjective comparisons are presented with the linear minimum mean-squared-error and the regularized least-squares estimates, for 2D images, that verify the superiority of the RCTLS approach.
Image restoration using spectrum estimation
Author(s):
Ki-Woon Na;
Joon-Ki Paik
Show Abstract
A stochastic approach to image restoration is proposed using various spectrum estimation techniques. In order to estimate the original image from the observed image, the minimum mean-square-error filter, or Wiener filter, is known to be optimal in the sense of minimizing the mean-square error. The optimality of the Wiener filter, however, holds only when the power spectra of the original image and the noise are given in addition to the transfer function of the imaging system. In practice, this information about the original image is generally not available. In the present paper, the additive noise is assumed to be white with known variance, and the Wiener filter is implemented using various estimation techniques for the original spectrum. The proposed method shows significant improvement over conventional methods, such as the Wiener filter using a constant signal-to-noise power ratio, particularly for images with low signal-to-noise ratio.
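The approach lends itself to a short sketch. The Python snippet below is illustrative only: it implements the frequency-domain Wiener filter under the stated assumptions of a known blur and white noise of known variance, but the original-image spectrum is estimated with a crude periodogram stand-in rather than the estimators studied in the paper.

```python
import numpy as np

def wiener_restore(observed, psf, noise_var):
    """Wiener restoration; psf is assumed zero-padded with its peak at the array origin."""
    M, N = observed.shape
    H = np.fft.fft2(psf, s=(M, N))
    G = np.fft.fft2(observed)
    # Periodogram of the observation with the white-noise floor removed, used as a
    # rough estimate of the unknown original-image power spectrum.
    P_obs = np.abs(G) ** 2 / (M * N)
    P_f = np.maximum(P_obs - noise_var, 1e-12) / np.maximum(np.abs(H) ** 2, 1e-12)
    W = np.conj(H) * P_f / (np.abs(H) ** 2 * P_f + noise_var)
    return np.real(np.fft.ifft2(W * G))
```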
Digital filtering of images using the discrete sine or cosine transform
Author(s):
Stephen A. Martucci
Show Abstract
The discrete sine and cosine transforms (DSTs and DCTs) are powerful tools for image compression. They would be even more useful if they could be used to perform convolution. As we have recently shown, these transforms do possess a convolution-multiplication property and can therefore be used to perform digital filtering. The convolution is of a special type, called symmetric convolution. This paper reviews the concepts of symmetric convolution and then elaborates on how to use the operation, and therefore DSTs and DCTs, to implement digital filters for images. An indication of how to compute the transforms and of the complexity of their use is also given.
Video coding using wavelet transform, windowed motion compensation, and conditional entropy coding
Author(s):
Ulrik Hindo;
Susanne Schjorring
Show Abstract
In this paper a hybrid video coding scheme tailored to the wavelet transform is presented. Conventional block-based motion compensation introduces blocking effects in the prediction error signal which are expensive to code with a wavelet transform. Therefore, windowed motion compensation using overlapping blocks is used to smooth the transitions between blocks. A method for coding the quantized wavelet coefficients is also presented: the positions of non-zero coefficients are coded using conditional entropy coding, exploiting the correlation between coefficients inside a subband as well as between coefficients in different subbands. Simulation results are given for different prediction methods and different methods of coding the quantized wavelet coefficients when coding an R.601 signal at 4 Mbit/s.
Pixel-based compensated video compression
Author(s):
Navid Haddadi;
C.-C. Jay Kuo
Show Abstract
A linear pixel-based motion model across several frames is proposed for compensated video coding, in which a single motion field is used for the prediction of several future frames. Experimental results show that the computed motion field can be used to obtain very good predictions. The compressibility of the motion field is addressed: it is shown that large regions of the image may be approximated at lower resolutions, and that a small prediction error can be maintained over all predicted frames by changing the resolution of the motion field.
Fractal (self-VQ) encoding of video sequences
Author(s):
Yuval Fisher;
Daniel N. Rogovin;
Tsae-Pyng Janice Shen
Show Abstract
We present results of a scheme to encode video sequences of digital image data based on a quadtree still-image fractal method. The scheme encodes each frame using image pieces, or vectors, from its predecessor; hence it can be thought of as a VQ scheme in which the code book is derived from the previous image. We present results showing: near real-time (5 - 12 frames/sec) software-only decoding; resolution independence; high compression ratios (25 - 244:1); and low compression times (2.4 - 66 sec/frame) as compared with standard fixed image fractal schemes.
Adaptive three-dimensional motion-compensated wavelet transform for image sequence coding
Author(s):
Jean-Pierre Leduc
Show Abstract
This paper describes a 3D spatio-temporal coding algorithm for the bit-rate compression of digital image sequences. The coding scheme is based on several distinctive features, namely a motion representation with a four-parameter affine model, a motion-adapted temporal wavelet decomposition along the motion trajectories, and a signal-adapted spatial wavelet transform. The motion estimation is performed on the basis of four-parameter affine transformation models, also called similitudes; this transformation takes into account translations, rotations and scalings. The temporal wavelet filter bank exploits bi-orthogonal linear-phase dyadic decompositions. The 2D spatial decomposition is based on dyadic signal-adaptive filter banks with either para-unitary or bi-orthogonal bases. The adaptive filtering is carried out according to a performance criterion to be optimized under constraints, in order to maximize the compression ratio at the expense of graceful degradations of the subjective image quality. The major principle of the present technique is, in the analysis process, to extract and separate the motion contained in the sequences from the spatio-temporal redundancy and, in the compression process, to take into account the rate-distortion function on the basis of spatio-temporal psycho-visual properties so as to achieve the most graceful degradations. To complete the description of the coding scheme, the compression procedure is composed of scalar quantizers, which exploit the 3D spatio-temporal psycho-visual properties of the Human Visual System, and of entropy coders, which finalize the bit-rate compression.
Hybrid VQ of video sequences using quadtree motion segmentation
Author(s):
Wenhua Li;
Ezzatollah Salari
Show Abstract
A new hybrid coding scheme for video sequences is presented. With the introduction of a fast quadtree motion segmentation algorithm, motion vectors are estimated with variable size block matching which produces better performance considering both overhead motion information and motion compensated prediction error. Small blocks containing high motion activities are intraframe vector quantized, whereas large blocks representing smooth motion areas are first decimated and then interframe vector quantized. Simulation results demonstrate that the proposed scheme performs very well.
Adaptive perceptual quantization using a neural network for video coding
Author(s):
Byung-Sun Choi;
H. D. Cho;
Kyoung Won Lim;
Kangwook Chun;
Jong Beom Ra
Show Abstract
This paper describes a new adaptive quantization algorithm for video sequence coding, which can reflect perceptual characteristics of macroblocks by using a neural network classifier. A multilayer perceptron is adopted as the neural network structure, and the feature parameters and target classes of training macroblocks are prepared for learning. The coding performance based on the neural network classifier is investigated by computer simulation. In comparison with both the non-adaptive quantization scheme and the adaptive one in MPEG-2 TM5, the proposed scheme is shown to enhance perceptual quality in video coding.
Human visual system-based block classification algorithm for image sequence coders
Author(s):
Albert A. Deknuydt;
Stefaan Desmet;
Luc Van Eycken;
Andre J. Oosterlinck
Show Abstract
For a long time, most hybrid codecs for moving images have been block based. Each block is coded rather independently, spending bits according to its statistically relevant contents; but statistically relevant data is not necessarily visually relevant. To overcome this, visual parameters are sometimes incorporated into the coding process; the visual weighting matrix often used in DCT coding is one example. However, this approach often leads to complications, since the coding algorithm as such is almost never suitable for extracting or interpreting visual relevance: DCT coefficients, for instance, do not represent a visual phenomenon. The problem can be dealt with more thoroughly by completely separating the process into a visual analysis step and a coding step. The analysis step can then assign attributes to visual features, irrespective of the actual coding algorithm, and its results can be used by the coding engine to choose a suitable algorithm, to adapt parameters, or to choose a local quality criterion. In this way, coding can be better adapted to local visual features. This paper describes a codec using such a classification algorithm and presents some results.
Adaptive frame type selection for low bit-rate video coding
Author(s):
Jungwoo Lee;
Bradley W. Dickinson
Show Abstract
In this paper, we present an adaptive frame type selection algorithm for motion compensation, which is applied to low bit-rate video coding. In the adaptive scheme, the number of reference frames for motion compensation is determined by a scene change detection algorithm using temporal segmentation. To choose the distance measure for the temporal segmentation, three histogram-based measures and one variance-based measure were tested and compared. The reference frame positions may be determined by an exhaustive search algorithm, which is computationally complex; the complexity can be reduced by using a binary search algorithm which exploits the monotonicity of the distance measure with respect to the reference frame interval. Variable target bit allocation for each picture type in a group of pictures is used to allow a variable number of reference frames under the constraint of a constant channel bit rate. Simulation results show that the reference frame positioning scheme compares favorably with the fixed interpolation structure at bit rates of 64 kb/s and 14.4 kb/s.
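A hedged Python sketch of the binary-search idea: assuming the distance measure grows monotonically with the reference frame interval, the farthest frame still below a scene-change threshold can be located with O(log N) distance evaluations instead of an exhaustive scan. The histogram distance used here is only one of the measures compared in the paper, and the 8-bit intensity range is an assumption.

```python
import numpy as np

def hist_distance(frame_a, frame_b, bins=64):
    """Normalized histogram difference between two frames (8-bit intensities assumed)."""
    ha, _ = np.histogram(frame_a, bins=bins, range=(0, 255), density=True)
    hb, _ = np.histogram(frame_b, bins=bins, range=(0, 255), density=True)
    return 0.5 * np.abs(ha - hb).sum()

def farthest_similar_frame(frames, ref_idx, threshold):
    """Largest index whose distance to the reference stays within the threshold."""
    lo, hi = ref_idx, len(frames) - 1
    while lo < hi:                              # invariant: frames[lo] is within threshold
        mid = (lo + hi + 1) // 2
        if hist_distance(frames[ref_idx], frames[mid]) <= threshold:
            lo = mid
        else:
            hi = mid - 1
    return lo
```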
Improved delta modulation algorithm for image coding
Author(s):
Tian-Hu Yu;
Sanjit K. Mitra
Show Abstract
In a delta modulation (DM) system, the difference between two consecutive pixel intensities is coded by a one-bit quantizer. The performance of a basic DM system depends on only two quantities: the sampling rate and the quantization step size. In order to reach a low bit rate while preserving decoded image quality, these two quantities have to be designed carefully. The performance of a DM system can be improved by using a delayed coding scheme and/or by adapting the step size; however, such a DM system becomes so complex that it loses the advantage of simplicity. In this paper, we propose a modified DM algorithm that keeps the advantage of simplicity while improving the performance of the DM system. Based on the characteristics of the human visual system, we introduce a nonlinear DM scheme that adapts a relative step size, in which Roberts' pseudo-noise technique is employed to keep the proposed system stable. In addition, we show that the proposed system with a 2D predictor improves the subjective quality of the decoded image.
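For reference, the basic non-adaptive delta modulator that the paper starts from can be written in a few lines of Python; the proposed modifications (relative step size, Roberts' pseudo-noise, 2D prediction) are not shown here.

```python
import numpy as np

def dm_encode(line, step, start=128.0):
    """One bit per pixel: step the running reconstruction up or down toward the input."""
    bits = np.empty(len(line), dtype=np.uint8)
    recon = start                               # initial value shared with the decoder
    for i, x in enumerate(line):
        bits[i] = 1 if x >= recon else 0
        recon += step if bits[i] else -step
    return bits

def dm_decode(bits, step, start=128.0):
    out = np.empty(len(bits))
    recon = start
    for i, b in enumerate(bits):
        recon += step if b else -step
        out[i] = recon
    return out
```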
Optimized nonorthogonal transforms for image compression
Author(s):
Onur G. Guleryuz;
Michael T. Orchard
Show Abstract
This paper proposes an algorithm for improving the performance of standard block transform coding algorithms by better exploiting the correlations between transform coefficients. While standard algorithms focus on decorrelating coefficients within each block, the new approach focuses on exploiting correlations between coefficients of different blocks. Interblock correlations are minimized by linearly estimating coefficients from previously transmitted neighboring block coefficients. The use of linear estimators between blocks leads to a nonorthogonal representation of the image. Quantization issues relating to this nonorthogonal transform are addressed, and an image coding implementation is simulated. Simulations demonstrate that large improvements are observed over standard block transform coding systems, over a wide range of bitrates.
Design of psychovisual quantizers for a visual subband image coding
Author(s):
Abdelhakim Saadane;
Hakim Senane;
Dominique Barba
Show Abstract
In order to specify the optimal psychovisual quantizer associated with a given visual subband decomposition scheme, a new methodology has been developed. Psychovisual experiments based on the visibility of the quantization noise have been conducted; the complex signals used are characterized by the local band-limited contrast. The main finding of this study is that, in the chosen contrast space, the quantizers are of a linear type. The quantization interval contrasts obtained with a given observer are 0.039 for the subband called III-1 (radial selectivity 1.5 to 5.7 cycles/degree, angular selectivity -22.5 to 22.5 degrees), 0.031 for the subband called IV-1 (radial selectivity 5.7 to 14.1 cycles/degree, angular selectivity -15 to 15 degrees), and 0.117 for the subband called V-1 (radial selectivity 14.1 to 28.2 cycles/degree, angular selectivity -15 to 15 degrees). To evaluate the importance of the angular aspect in this approach, further measurements have been made on subband IV-2 (radial selectivity 5.7 to 14.1 cycles/degree, angular selectivity 15 to 30 degrees); linearity is again observed, and the quantization interval contrast, for the same observer, is 0.122.
Picture quality evaluation based on error segmentation
Author(s):
Wen Xu;
Gert Hauske
Show Abstract
A segmentation-based error metric (SEM) is proposed to evaluate the quality of pictures with impairments resulting from typical source coding algorithms and channel interference. After appropriate visual preprocessing, the error picture is segmented into errors on the picture's own edges, errors representing exotic or spurious edges, and remaining errors in flat regions, describing, respectively, edge errors such as blurring, exotic structures such as blocking and contouring, and residual errors such as random noise. Error parameters, or distortion factors, are derived by appropriate summation over the segmented components, and the distortion metric is built by combining the parameters using a generalized multiple linear regression procedure. Tests with a picture database consisting of impairments from various picture coding techniques applied to different types of pictures have shown that the SEM yields very promising results: the correlation coefficient with subjective ratings was 0.875, whereas the widely used PSNR had a correlation of only 0.653. In addition, it is also possible to classify the type and amount of individual distortions.
Error concealment technique using projection data for block-based image coding
Author(s):
Kyeong-Hoon Jung;
Joon-Ho Chang;
ChoongWoong Lee
Show Abstract
A new error concealment technique for block-based image coding, such as JPEG or MPEG, is described in this paper. We propose a projective interpolation scheme, which is based on the fact that a 2D image object can be well reconstructed from its projection data. In our projective interpolation scheme, the projection data and the interpolation direction are estimated using the boundary pixels surrounding a missing block, and the missing block is then interpolated using bilinear reconstruction. The edge pattern of the missing block is also examined and used together with the projective interpolation scheme so as to guarantee the continuity of edges. The proposed algorithm achieves superior subjective as well as objective quality compared with conventional schemes; the improvement in subjective quality is mainly due to the fact that edges are well preserved and blocking artifacts are negligible. The proposed algorithm is expected to be useful as an error resilience scheme in any type of block-based image processing and applicable to various image coding schemes.
Projection-based spatially adaptive reconstruction of block transform compressed images
Author(s):
Yongyi Yang;
Nikolas P. Galatsanos;
Aggelos K. Katsaggelos
Show Abstract
In conventional block-transform coding, the compressed images are decoded using only the transmitted transform data. In this paper, we formulate image decoding as an image recovery problem. According to this approach, the decoded image is reconstructed using not only the transmitted data, but in addition, the prior knowledge that images before compression do not display blocking artifacts. A spatially-adaptive image recovery algorithm is proposed based on the theory of projections onto convex sets. Numerical experiments demonstrate that the proposed algorithm yields images superior to those from both the JPEG deblocking recommendation and a projection-based image decoding approach.
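One ingredient of such recovery formulations, the projection that keeps the decoded image consistent with the transmitted data, can be sketched as follows in Python/SciPy. It assumes uniform quantization of 8x8 DCT coefficients with step q; the convex smoothness constraint and the rest of the iterative algorithm are not shown.

```python
import numpy as np
from scipy.fft import dctn, idctn

def project_onto_quantizer(image, dequantized_coeffs, q, block=8):
    """Clip each block's DCT coefficients back into the quantization cells implied
    by the transmitted (dequantized) values, then return to the spatial domain."""
    out = image.astype(float).copy()
    H, W = out.shape
    for r in range(0, H, block):
        for c in range(0, W, block):
            C = dctn(out[r:r+block, c:c+block], norm='ortho')
            Q = dequantized_coeffs[r:r+block, c:c+block]
            C = np.clip(C, Q - q / 2.0, Q + q / 2.0)      # stay inside the cell
            out[r:r+block, c:c+block] = idctn(C, norm='ortho')
    return out
```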
Registration of multiple image sets with thin-plate spline
Author(s):
Liang He;
James C. Houk
Show Abstract
A thin-plate spline method for spatial warping was used to register multiple image sets during 3D reconstruction of histological sections. In a neuroanatomical study, the same labeling method was applied to several turtle brains. Each case produced a set of microscopic sections. Spatial warping was employed to map data sets from multiple cases onto a template coordinate system. This technique enabled us to produce an anatomical reconstruction of a neural network that controls limb movement.
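A minimal sketch of thin-plate-spline warping with given landmark correspondences, using SciPy's radial basis interpolator; the choice of library is an assumption, and the neuroanatomical pipeline around the warp is not reproduced.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def tps_warp(points_src, points_dst, query_xy):
    """Map query coordinates from a source section into the template coordinate system.

    points_src, points_dst: (n, 2) corresponding landmarks; query_xy: (m, 2) points to warp.
    """
    warp = RBFInterpolator(points_src, points_dst, kernel='thin_plate_spline')
    return warp(query_xy)
```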
Medical image compression using b-splines and vector quantization
Author(s):
Javad Alirezaie;
John A. Robinson
Show Abstract
A lossy image compression technique, incorporating least squares cubic spline pyramids, vector quantization, predictive coding and arithmetic coding was developed for the compression and reconstruction of Magnetic Resonance Images. Typical results of 29.76 dB Peak Signal-to-Noise ratio (PSNR) for 0.45 bits per pixel (bpp) compression, and 27.91 dB PSNR for 0.33 bpp, compare very favorably with other, recently reported, medical image compression results. Furthermore, block artifacts are absent from the recovered pictures.
New approach to inter-slice coding of magnetic resonance images
Author(s):
Nader Mohsenian;
Aria Nosratinia;
Bede Liu;
Michael T. Orchard
Show Abstract
In this paper we present an inter-slice coding scheme for magnetic resonance (MR) image sequences which is based on a triangular matching technique. This scheme provides a powerful tool for MR image sequence compression, since there exists a great deal of deformation between contiguous slices of 3D MR data. The proposed inter-slice coding scheme takes advantage of triangular deformation to remove redundancies in the third dimension. A residual image is formed by estimating and compensating for the triangular deformation in the previous slice and subtracting this from the original slice. The DCT is used to de-correlate the residual signal. DC components are encoded losslessly using DPCM and Huffman codes. Different banks of entropy-constrained scalar quantizers are designed for the AC components; each AC coefficient is encoded using an appropriate scalar codebook subject to a bit-allocation procedure. The new inter-slice coding scheme is tested on an MR heart sequence and shows superior results over intra-slice and other inter-slice schemes. The compressed images displayed on the monitor are indistinguishable from the originals at our lowest bit rates.
Early detection of cancerous changes by mammogram comparison
Author(s):
Dragana P. Brzakovic;
Nenad S. Vujovic;
Milorad Neskovic
Show Abstract
This paper describes an approach to mammogram analysis which detects cancerous signs by comparing newly acquired mammograms with previous screenings of the same patient. The comparison is carried out regionally between appropriate mammograms in three sequential steps: (1) mammogram registration, (2) mammogram partitioning, and (3) analysis and comparison of regional intensity statistics. The registration is established pairwise between the new mammogram and the corresponding one acquired in the previous screening; its objective is to allow interpretation of the newly acquired mammograms in terms of the earlier screenings. The newly acquired mammogram is partitioned using a hierarchical region-growing technique that models relationships between pixels by a fuzzy membership function. By changing the parameters of the function, the obtained regions are split further (when needed) to allow analysis of smaller regions; region splitting is required in order to achieve higher sensitivity to small changes such as the appearance of microcalcifications. The intensity statistics of a region in the newly acquired mammogram are compared to those of the corresponding region in the previous mammogram, under the assumption that intensity differences in the case of cancerous changes are primarily related to higher intensities. The approach was tested on twelve cases represented by 96 mammograms; in all cases it performed satisfactorily.
Computation-efficient algorithms for image registration
Author(s):
Xin Qiu;
Chai Huat Chong;
Chien-Min Huang;
Christopher M. U. Neale
Show Abstract
Two computation-efficient algorithms are presented for automatically generating matching pixels for image registration. The first is an efficient Sequential Spatial Search Algorithm: a window selection and searching approach, combined with a post-processing technique, is used to restrict the image search area in order to increase the reliability of pixel matching. The second is a Hierarchical Fuzzy C-Varieties-based algorithm: the translation and rotation between two images are estimated based on a combination of the closest match in both the centroid value and the membership count, and the linear transformation between the images is determined through a match-and-search procedure. Both methods resulted in good image mosaicking quality even for large rotations and translations.
Subband analysis and synthesis of volumetric medical images using wavelet
Author(s):
Chang Wen Chen;
Ya-Qin Zhang;
Kevin J. Parker
Show Abstract
We present in this paper a study of subband analysis and synthesis of volumetric medical images using 3D separable wavelet transforms. With 3D wavelet decomposition, we are able to investigate the image features at different scale levels that correspond to certain characteristics of biomedical structures contained in the volumetric images. The volumetric medical images are decomposed using 3D wavelet transforms to form a multi-resolution pyramid of octree structure. We employ a 15-subband decomposition in this study, where band 1 represents the subsampled original volumetric images and other subbands represent various high frequency components of a given image. Using the available knowledge of the characteristics of various medical images, an adaptive quantization algorithm based on clustering with spatial constraints is developed. Such adaptive quantization enables us to represent the high frequency subbands at low bit rate without losing clinically useful information. The preliminary results of analysis and synthesis show that, by combining the wavelet decomposition with the adaptive quantization, the volumetric biomedical images can be coded at low bit rate while still preserving the desired details of biomedical structures.
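The 15-subband structure mentioned above can be reproduced with any separable 3D DWT; the sketch below uses PyWavelets (an assumed implementation choice): one level of decomposition of the volume yields 8 subbands, and decomposing the lowpass band once more gives 8 + 7 = 15 in total.

```python
import pywt

def decompose_15_band(volume, wavelet='haar'):
    """Two-level separable 3D wavelet decomposition of a volumetric image."""
    level1 = pywt.dwtn(volume, wavelet)      # 8 subbands, keys 'aaa', 'aad', ..., 'ddd'
    lowpass = level1.pop('aaa')              # band 1 (subsampled original) is split again
    level2 = pywt.dwtn(lowpass, wavelet)     # 8 more subbands at the coarser scale
    return level2, level1                    # 8 + 7 = 15 subbands altogether
```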
New version of Fourier convolution algorithm in computed tomography technique
Author(s):
Nasser Zayed;
Bryan Lawton
Show Abstract
A new version of the Fourier convolution algorithm in computed tomography is created by modifying the filter function, considering the phantom as a matrix of square pixels and taking the ray-sum width equal to the pixel side. Each ray sum covers a number of pixels partially, which creates a weighting factor: the correlation vector between the scanning beams and the phantom pixels. Each pixel is reconstructed more than once in each projection. A comparison of the reconstructed values between images obtained using the Shepp-Logan filter and the modified filter is presented, and the Fourier algorithm and the new version are compared in terms of image quality and running time.
Comparison of motion vector coding techniques
Author(s):
Philippe Guillotel;
Christophe Chevance
Show Abstract
This paper deals with the comparison of different motion vector coding methods suitable for motion-compensated image communication systems. Both waveform and description coding techniques are presented and applied to a dense motion field. Only lossless compression is considered, and a reduction factor between 3:1 and 9:1 of the raw bit rate of the motion vectors can be reached. Although some of these techniques have already been used for the coding of motion information, it is of great interest to compare them under the same conditions, with the intent of applying them to professional motion-compensated applications as opposed to the particular case of motion-compensated picture coding.
Segment-based video coding using an affine motion model
Author(s):
Hirohisa Jozawa
Show Abstract
This paper describes a segment-based motion compensation (MC) method in which object motion is described using an affine motion model. In the proposed MC method, the shape of the objects is obtained using 5D K-means clustering (employing three color components and two coordinates). Motion estimation and compensation employing the affine motion model are performed on each arbitrarily shaped object in the current frame. By using the affine motion model, rotation and scaling of objects can be compensated for in addition to translational motion. The residual between the current frame and the predicted one is DCT-coded and multiplexed with the motion and shape parameters. Simulation results show that the interframe prediction error is significantly reduced compared to the conventional block-based method. The SNR of the coded sequences obtained using the proposed method (modified MPEG-1 with segment-based affine MC) is about 0.8 - 2 dB higher than that using the conventional one (original MPEG-1 with block-based MC).
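To illustrate the affine motion model referred to above (not the authors' estimator), the Python sketch below fits the six affine parameters of one object by least squares from assumed point correspondences and then motion-compensates every pixel of the object with the same transform, which covers rotation and scaling as well as translation. The correspondence search and the segmentation step are not shown.

```python
import numpy as np

def fit_affine(prev_pts, curr_pts):
    """Least-squares affine map sending current coordinates to previous coordinates.

    prev_pts, curr_pts: (n, 2) arrays of corresponding (x, y) points inside one object.
    Returns a 2x3 matrix A with [x_prev, y_prev]^T = A @ [x_curr, y_curr, 1]^T.
    """
    X = np.hstack([curr_pts, np.ones((len(curr_pts), 1))])    # (n, 3)
    B, *_ = np.linalg.lstsq(X, prev_pts, rcond=None)          # (3, 2)
    return B.T                                                # (2, 3)

def compensate(prev_frame, object_mask, A):
    """Predict the object's pixels in the current frame from the previous frame."""
    pred = np.zeros_like(prev_frame, dtype=float)
    ys, xs = np.nonzero(object_mask)
    src = A @ np.stack([xs, ys, np.ones_like(xs)])            # where each pixel came from
    sx = np.clip(np.round(src[0]).astype(int), 0, prev_frame.shape[1] - 1)
    sy = np.clip(np.round(src[1]).astype(int), 0, prev_frame.shape[0] - 1)
    pred[ys, xs] = prev_frame[sy, sx]                         # nearest-neighbour warp
    return pred
```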
Coding of edge blocks in digital video sequence coding
Author(s):
Jae Dong Kim
Show Abstract
A new edge block coding mode for digital video coding is presented. For the detailed reconstruction of edge blocks, an adaptive block truncation coding is proposed using the properties of block truncation coding and the intensity sensitivity of the human visual system. To use this algorithm in video sequence coding, an edge block detection mechanism is devised, and pixel reallocation, scan mode selection, 1D DPCM, and 2D run-length coding are explained. Measured by objective and subjective criteria, the new algorithm is shown to perform better than other algorithms.
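For context, plain block truncation coding reduces each block to a bit plane plus two reconstruction levels chosen to preserve the block mean and standard deviation. The Python sketch below shows only this baseline, not the adaptive, HVS-driven variant proposed in the paper.

```python
import numpy as np

def btc_encode(block):
    """Classic BTC: a bit plane plus two levels preserving the first two moments."""
    mean, std = block.mean(), block.std()
    plane = block >= mean                          # one bit per pixel
    q, m = int(plane.sum()), block.size
    if q in (0, m):                                # flat block: both levels equal the mean
        return plane, mean, mean
    low  = mean - std * np.sqrt(q / (m - q))
    high = mean + std * np.sqrt((m - q) / q)
    return plane, low, high

def btc_decode(plane, low, high):
    return np.where(plane, high, low)
```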
Video segmentation using spatial proximity, color, and motion information for region-based coding
Author(s):
Won Hak Hong;
Nam Chul Kim;
Sang-Mi Lee
Show Abstract
An efficient video segmentation algorithm with a homogeneity measure that incorporates spatial proximity, color, and motion information simultaneously is presented for region-based coding. The procedure toward complete segmentation consists of two steps: primary segmentation and secondary segmentation. In the primary segmentation, an input image is finely segmented by FSCL. In the secondary segmentation, the many small and similar regions generated in the preceding step are eliminated or merged by a fast RSST. Experiments show that the proposed algorithm produces efficient segmentation results, and a video coding system using it yields visually acceptable quality and a PSNR of 36 - 37 dB at a very low bit rate of about 13.2 kbit/s.
Application of dynamic Huffman coding to image sequence compression
Author(s):
Byeungwoo Jeon;
Juha Park;
Jechang Jeong
Show Abstract
In many image sequence compression applications, Huffman coding is used to reduce statistical redundancy in quantized transform coefficients. The Huffman codeword table is often pre-defined to reduce coding delay and table transmission overhead. Local symbol statistics, however, may differ greatly from the global statistics manifested in the pre-defined table. In this paper, we propose a dynamic Huffman coding method which can adaptively modify the given codeword and symbol association according to the local statistics. Over a certain set of blocks, local symbol statistics are observed and used to re-associate the symbols with the codewords in such a way that shorter codewords are assigned to more frequent symbols; this modified code table is used to code the next set of blocks. A parameter controls the relative sensitivity to the local statistics versus the global statistics. By performing the same modification to the code table using the decoded symbols, the receiving side can keep up with the code table changes, so no table modification information needs to be transmitted and there is no extra transmission overhead in employing this method.
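A hedged sketch of the re-association step in Python: the codewords of the pre-defined table are kept, but after each set of blocks the symbols are re-mapped so that locally frequent symbols receive the shortest codewords. The blending parameter here is an illustrative assumption standing in for the sensitivity parameter mentioned above.

```python
def reassociate(codewords, global_freq, local_counts, lam=0.5):
    """Re-map symbols to the existing prefix-free codewords by blended frequency.

    codewords: dict symbol -> bitstring from the pre-defined Huffman table.
    global_freq, local_counts: dicts of symbol statistics.
    """
    mixed = {s: (1 - lam) * global_freq.get(s, 0.0) + lam * local_counts.get(s, 0)
             for s in codewords}
    codes_by_length = sorted(codewords.values(), key=len)           # shortest first
    symbols_by_freq = sorted(mixed, key=mixed.get, reverse=True)    # most frequent first
    return dict(zip(symbols_by_freq, codes_by_length))

# Both encoder and decoder run this on the symbols of the previous block set,
# so no table updates need to be transmitted.
```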
Reversible compression of a video sequence
Author(s):
Nasir D. Memon;
Khalid Sayood
Show Abstract
In addition to spatial redundancies, a sequence of video images contains spectral and temporal redundancies, which standard lossless compression techniques fail to take into account. In this paper we propose and investigate lossless compression schemes for a video sequence. Prediction schemes are presented that exploit temporal and spectral correlations as well as spatial correlations. These schemes are based on the notion of a scan model which we defined in our earlier work. A scan model effectively captures the inherent structure in an image, and using optimal scan models from spectrally and temporally adjacent frames to perform prediction in the current frame provides an effective means of utilizing spectral and temporal correlations. We also present a simpler approximation to this technique that selects an appropriate predictor from a set by making use of information in neighboring frames. Besides effective prediction techniques, we also include a simple error modeling step that takes into account prediction errors made in spectrally and/or temporally adjacent pixels in order to efficiently encode the prediction residual. Implementation results on standard test sequences indicate that significant improvements can be obtained by the proposed techniques.
Wavelet transforms in a JPEG-like image coder
Author(s):
Ricardo L. de Queiroz;
C. K. Choi;
Young Huh;
J. J. Hwang;
K. R. Rao
Show Abstract
The discrete wavelet transform is incorporated into the JPEG baseline coder for image coding. The discrete cosine transform is replaced by an association of two-channel filter banks connected hierarchically. The scanning and quantization schemes are devised and the entropy coder used is exactly the same as used in JPEG. The result is a still image coder that outperforms JPEG while retaining its simplicity and most of its existing building blocks. Objective results and reconstructed images are presented.
Fractal-based image compression: a fast algorithm using wavelet transform
Author(s):
Yonghong Tang;
William G. Wee
Show Abstract
Deterministic fractals have been successfully applied to gray-level image compression. In this approach, an image is represented by a set of affine transforms, each of which maps one subimage to another subimage. The affine transforms are found by exhaustive search over the whole collection of subimages, which is very time-consuming. The objectives of this paper are to demonstrate the applicability of the wavelet transform (WT) in the searching process and to show the time saving of using the WT as compared to exhaustive searching. The wavelet transform provides a multiscale description of an image based on local `detail signals' at each resolution scale. We propose a fast algorithm that takes advantage of the structural information of the subimages by searching through the wavelet coefficient space instead of the gray-level space. The wavelet transform is computed only once and can be done very rapidly using short FIR filters. We show experimentally that the approach is applicable and that there is a time saving of 77% over the exhaustive searching method.
Fractal transform coding of color images
Author(s):
Bernd Huertgen;
Paul Mols;
Stephan F. Simon
Show Abstract
The paper reports on investigations concerning the application of block-oriented fractal coding schemes to the encoding of color images. Correlations between the different color planes can be exploited for data compression. For this purpose, the similarities between the fractal transform parameters at one block location but in different color planes are considered in a blockwise manner. The starting point is the fractal code for one block in the dominant color plane, which serves as a prediction for the code of the corresponding block in the other planes. Starting from this prediction, the dependent codes can be derived by a successive refinement strategy. Since the fractal code for the dominant color plane and the refinement information for determining the codes for the other planes can be represented with fewer bits than independently calculated codes, a coding gain can be achieved.
New compression method for multiresolution coding algorithms
Author(s):
Ching-Yang Wang;
Tsann-Shyong Liu;
Long-Wen Chang
Show Abstract
Multiresolution coding algorithms, such as Laplacian-Gaussian coding, subband coding or wavelet transform coding, have become very popular because they can achieve low bit rates and have the property of progressive transmission. In this paper, we propose a new image compression scheme for multiresolution coding algorithms. First, we decompose an image into its multiresolution representation by Laplacian-Gaussian pyramid coding, subband image coding or wavelet transform image coding. It is observed that the energy of the high subbands is mainly concentrated around the edges of the original image. A simple direct VQ encoding of the high subbands does not make full use of their sparsely distributed nature and wastes many bits encoding blocks with very low variance. In this paper, we apply a busy/smooth block detector to the lowest subband to predict whether the corresponding high-subband blocks contain edges. If a block in the lowest subband is an edge block, vector quantization is applied to encode the corresponding blocks in the high subbands; otherwise these blocks are discarded. Our simulation results show that the proposed coding scheme can achieve a low bit rate without visually degrading the image quality.
Layer coder for hierarchical rate-distortion optimal coding of images
Author(s):
Stephan F. Simon;
Bernd Huertgen
Show Abstract
Hierarchical coding of images is often thought to require a higher bitrate than non-hierarchical coding. It is shown for a stationary, correlated Gaussian image source and the mean squared error fidelity measure that a hierarchical description can be achieved without additional rate. A coder model, based on the optimum forward channel, is developed which allows operating on the rate-distortion bound at every stage of the hierarchy. For the layer coders, of which the hierarchical system is composed, an explicit coding strategy is derived prescribing the amounts of rate required for spatial refinement and SNR refinement of already coded parts.
Stochastic determination of optimal wavelet compression strategies
Author(s):
Donald E. Waagen;
Jeffrey D. Argast;
John R. McDonnell
Show Abstract
Wavelet theory provides an attractive approach to signal and image compression. This work investigates a new wavelet transform coefficient selection approach for efficient image compression. For a desired image compression ratio (50:1), wavelet scale thresholds are derived via a multiagent stochastic optimization process. Previous work has demonstrated an interscale relationship between the stochastically optimized wavelet coefficient thresholds. Based on the experimental results, a deterministic wavelet coefficient selection criterion is hypothesized and the constants of the equation are statistically derived. Experimental results of the stochastic optimization and deterministic approaches are compared and contrasted with results from previously published wavelet coefficient threshold strategies.
Low-complexity high-quality low bit-rate image compression scheme based on wavelets
Author(s):
Chih-Shin Chuang;
Gin-Kou Ma
Show Abstract
Hierarchical image decompositions have been widely studied in conjunction with different quantizers for image coding. This paper presents a new approach to still image compression based on the combination of scalar quantizers, a vector quantizer, variable-length coders, and a hierarchical wavelet decomposition. Our simulation results show that, with a proper choice of quantization threshold, the proposed scheme can yield a higher compression ratio than the previously developed tree-structured quantization scheme, while the quality of the reproduced image remains very competitive and the computational complexity is lower. An adaptive algorithm which automatically calculates the quantization step in order to meet a predetermined bit rate is also presented. Compared with JPEG, this scheme yields better perceptual quality of the reproduced image at very low bit rates.
Very large scale integration (VLSI) realization of a hierarchical MPEG-2 TV and HDTV decoder
Author(s):
Maati Talmi;
M. Block;
P. Kanold;
Thorsten Selinger;
M. Schubert;
W. Yan;
W. Zhang;
M. Zieger
Show Abstract
This paper reports on the VLSI realization of a hierarchical MPEG-2 HDTV video decoder based on the Spatially Scalable Profile at High-1440 Level. The decoder is being built at the Heinrich-Hertz-Institut within the ongoing joint R&D project `Hierarchical Digital TV Transmission' (HDTVT) and will be demonstrated during the international exhibition IFA 1995 in Berlin. One goal of the project is to demonstrate a compatible approach to HDTV, with TV-HDTV compatibility through spatial scalability. The decoder can be used in several scenarios, among which are terrestrial broadcasting with graceful degradation and portable reception (through spatial and SNR scalability), besides the less demanding cable and satellite scenarios. Two chips are currently under development to achieve an integrated hardware solution for the hierarchical MPEG-2 video source decoder. These chips are presented and the overall system architecture chosen for the HDTVT decoder is explained. The design approach utilizes logic synthesis, and an on-board DSP-based functional test is implemented to support the field trials. The paper shows that multi-chip HDTV decoders are feasible today; given the technological progress expected in the years to come, optimized single-chip decoders with less external memory should be realizable by about 1998.
Very large scale integration (VLSI) architecture for motion estimation and vector quantization
Author(s):
Shih-Yu Huang;
Kuen-Rong Hsieh;
Jia-Shung Wang
Show Abstract
A hardware design, called the ME/MRVQ, which combines the functions of motion estimation (ME) and mean/residual vector quantization (MRVQ), is proposed in this paper to improve the coding quality of MPEG. In the ordinary MPEG design, the ME hardware is idle while coding I-pictures. In general, the process of motion estimation is quite similar to vector quantization coding: both search for the best representative from a set of exemplars for every input block. In our design, we perform an extra VQ coding step in intraframe compression, utilizing the idle motion estimation hardware. A new intraframe coding scheme, called the cascading MRVQ-DCT, is proposed to incorporate this hardware design. The cascading MRVQ-DCT performs VQ coding before executing DCT, quantization and VLC coding. It has the advantage of coding the shade vectors by VQ and the edge vectors by DCT, so the blocking effects of the DCT coder at low bit rates are alleviated. The coding efficiency of the cascading MRVQ-DCT scheme is investigated by software simulations. It is shown that the rate-distortion performance is uniformly improved.
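A minimal sketch of the mean/residual VQ step that the cascading MRVQ-DCT scheme applies before the DCT. The random codebook and 8x8 block size are assumptions for illustration; codebook training and the DCT/VLC stages are omitted.

```python
import numpy as np

def mrvq_encode_block(block, codebook):
    """Mean/residual VQ: transmit the block mean plus the index of the
    nearest residual codevector (nearest in the squared-error sense)."""
    mean = block.mean()
    residual = (block - mean).ravel()
    dists = ((codebook - residual) ** 2).sum(axis=1)
    return mean, int(np.argmin(dists))

def mrvq_decode_block(mean, index, codebook, shape=(8, 8)):
    """Reconstruct the block as mean plus the selected residual codevector."""
    return mean + codebook[index].reshape(shape)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    codebook = rng.normal(scale=10.0, size=(256, 64))   # hypothetical 256-entry codebook
    block = rng.normal(loc=120.0, scale=15.0, size=(8, 8))
    mean, idx = mrvq_encode_block(block, codebook)
    rec = mrvq_decode_block(mean, idx, codebook)
    print("block MSE after MRVQ:", round(float(((block - rec) ** 2).mean()), 2))
```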
Single-chip highly parallel architecture for image processing applications
Author(s):
Johannes Kneip;
Karsten Roenner;
Peter Pirsch
Show Abstract
For real-time implementation of image processing applications, a general-purpose Single Instruction/Multiple Data multiprocessor is proposed. The processor consists of an array of data paths embedded in a two-stage memory hierarchy, built of a shared memory with conflict-free parallel access in the shape of a matrix and a local cache that is autonomously addressable by the data paths. The array is controlled by a Reduced Instruction Set Controller with a load/store architecture and a fixed-field-coded very long instruction word. A six-stage instruction pipeline leads to a low cycle time for the processor. To provide the necessary flexibility of the array processor even for the parallel processing of complex algorithms, a three-stage autonomous control hierarchy for the processing units has been implemented. This concept leads to a high-level-language-programmable homogeneous architecture with sustained performance on a wide spectrum of image processing algorithms. For an array of 16 processing units at 100 MHz clock frequency, an arithmetic processing power of 2.0 - 2.4 gigaoperations per second is achieved for several algorithms.
Real-time architecture for large displacements estimation
Author(s):
Lirida Alves de Barros;
Nicolas Demassieux
Show Abstract
In this paper, we present a new motion estimation architecture for large-displacement estimation. An efficient differential block-recursive algorithm is used to direct the search for a match and to save computation power. Multiresolution and multiprediction approaches accelerate the convergence of the algorithm. Data multiplexing and pipelining allow real-time processing for standard video formats such as CCIR 601.
Hierarchical heterogeneous multiprocessor system for real-time motion picture coding
Author(s):
Sang Hoon Choi;
Choon Lee;
Young Gil Kim;
Seong Won Ryu;
Kyu Tae Park
Show Abstract
A multiprocessor system for high-speed processing of hybrid image coding algorithms such as H.261, MPEG or digital HDTV is presented in this study. Using a combination of highly parallel 32-bit microprocessors and function-specific devices controlled by a microprogrammed controller, a new processing module is designed for a high-performance coding system. In this study, we use 32-bit transputers for overall control and for operations requiring a small amount of computation, such as quantization, and function-specific devices for repeated operations such as DCT and motion estimation. We constructed the motion picture coder using geometric parallel processing across processing modules and algorithmic parallel processing within each module, since a single module alone cannot execute hybrid coding algorithms at high speed.
Scaling method for linear-phase lattice filters for multiresolution image coding
Author(s):
P. Desneux;
J. Y. Mertes;
Benoit M. M. Macq;
Jean-Didier Legat
Show Abstract
Subband coding is one of the most widely used techniques for image compression. The coding consists of splitting the image into different frequency bands. These subbands are then coded independently of each other, with most of the bits allocated to the subbands containing most of the energy. The subband decomposition is usually achieved by signal filtering, and the multiresolution transform is such a filtering technique. In most cases, filter design and optimization do not take hardware implementation constraints into account. However, these constraints can lead to a loss of performance in the compression algorithms due to roundoff noise or coefficient quantization. In this article, we show the influence of finite-precision effects on results achieved with a multiresolution architecture, using a transform based on separable linear-phase filters implemented in their lattice form. We also propose a scaling method which aims to minimize the computation noise and to avoid overflows. This scaling method has been simulated and gives useful results for the design of an integrated circuit implementing the multiresolution transform.
Low-power design of wavelet processors
Author(s):
Yi Kang;
Belle W. Y. Wei;
Teresa H.-Y. Meng
Show Abstract
This paper describes a VLSI architecture for low-power image compression applications. It implements a block-based wavelet transform with emphasis on minimal memory requirements, power consumption, and compression performance. The blocked transform partitions the original image into 16 X 16 data blocks with repeated boundary pixels for edge smoothing. Each block undergoes four octaves of 2D wavelet decomposition, and results in 13 subbands. The first octave uses Daubechies-4 transform, and the second, third, and fourth use Haar functions. The architecture consists of four 8-bit X 7-bit multiplier and accumulator units and one 13-bit adder. It implements 2D wavelet transforms directly with minimal memory and a small chip area. At 60 MHz, it processes a 512 X 512 gray-scale image in 23 ms and 18 ms for forward and inverse transforms respectively, satisfying the full-motion video requirement of 30 frames per second.
Architecture and modeling of a parallel digital processor based image processing system
Author(s):
David Andrew Hartley;
Shirish P. Kshirsagar
Show Abstract
The paper describes an image processing system which uses both shared memory and message passing. Shared memory is used in conjunction with a high-speed parallel bus to transfer image data; message passing is used for general inter-processor communication. A prototype system based upon the Texas Instruments TMS320C40 digital signal processor is currently in the final stages of construction. A Petri Net model of the communication aspects of the TMS320C40 processor has been developed. Features of the Petri Net software are discussed and the raw communication performance of the TMS320C40 is shown. The modeling of four- and sixteen-processor systems applied to 2D FFT transforms is described.
Temporal prediction of block motion vectors with local ambiguity-based adaptivity
Author(s):
Stephen O'Halek;
Ken D. Sauer
Show Abstract
The computational burden of full search block matching algorithms for motion estimation in video coding can be reduced through exploitation of spatial and temporal correlation of the motion vectors. This paper describes a simple adaptive temporal prediction scheme for block motion vectors. The first-order temporal predictor determines the center of the search area for a conventional block match, but with substantially reduced search radius. The abbreviated search reduces computation by about 75% for blocks whose motion is successfully predicted. Adaptivity is introduced through the notion of ambiguity in the predicted block match. Those blocks whose matching cost function shows too great an ambiguity in the neighborhood of the best match are instead estimated by the conventional full search. While performance of the algorithm is dependent on sequence content, it offers an attractive choice in the computational cost/performance tradeoff for simple motion compensation.
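A rough sketch, under assumed parameters (search radii and a 1.1 ambiguity ratio), of the idea described above: search a small window centered on the temporally predicted vector and fall back to a conventional full search when the cost surface around the best match is too ambiguous. This is not the authors' exact predictor or threshold.

```python
import numpy as np

def sad(a, b):
    return float(np.abs(a.astype(int) - b.astype(int)).sum())

def block_match(cur, ref, x, y, bs, center, radius):
    """Exhaustive SAD search within +/-radius around a given center vector."""
    best, costs = None, []
    block = cur[y:y + bs, x:x + bs]
    for dy in range(center[1] - radius, center[1] + radius + 1):
        for dx in range(center[0] - radius, center[0] + radius + 1):
            yy, xx = y + dy, x + dx
            if 0 <= xx and 0 <= yy and xx + bs <= ref.shape[1] and yy + bs <= ref.shape[0]:
                c = sad(block, ref[yy:yy + bs, xx:xx + bs])
                costs.append(c)
                if best is None or c < best[0]:
                    best = (c, (dx, dy))
    return best, costs

def predicted_search(cur, ref, x, y, bs, prev_mv, small_r=2, full_r=8, ambiguity=1.1):
    """Abbreviated search around the temporally predicted vector; if the best
    cost is not clearly below the runner-up (ambiguous match), fall back to a
    conventional full search."""
    (cost, mv), costs = block_match(cur, ref, x, y, bs, prev_mv, small_r)
    runner_up = sorted(costs)[1] if len(costs) > 1 else cost
    if runner_up < ambiguity * cost:                # cost surface too flat
        (cost, mv), _ = block_match(cur, ref, x, y, bs, (0, 0), full_r)
    return mv, cost

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    ref = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
    cur = np.roll(ref, shift=(2, 3), axis=(0, 1))   # known motion (dx=-3, dy=-2)
    mv, cost = predicted_search(cur, ref, x=16, y=16, bs=8, prev_mv=(-3, -2))
    print("estimated vector:", mv, "cost:", cost)
```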
Improved fast block matching algorithm in the feature domain
Author(s):
Yiu-Hung Fok;
Oscar Chi Lim Au
Show Abstract
A fast block matching algorithm in the feature domain was proposed by Fok and Au with a computation reduction factor of N/2 for a search block size of N X N. Although the algorithm achieves close-to-optimal results, it requires a large amount of memory to store the features. This paper presents three improved fast block matching algorithms in the integral-projection feature domain which also reduce the computation significantly but with a considerably lower memory requirement. With a search block size of N X N, two of our algorithms retain a computation reduction factor of N/2 while the third achieves a computation reduction factor of N. All three algorithms achieve close-to-optimal performance in the mean absolute difference sense.
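The specific algorithms of the paper are not reproduced; the sketch below only illustrates the integral-projection idea, replacing the N x N pixel SAD by the SAD of row and column sums. In a real coder the projections would be precomputed once per frame, which is where the memory/computation trade-off discussed above arises.

```python
import numpy as np

def projections(block):
    """Integral projections: row sums and column sums of a block."""
    return block.sum(axis=1), block.sum(axis=0)

def projection_sad(block, candidate):
    """Approximate the pixel-domain SAD with the SAD of the two projections;
    for an N x N block this costs O(N) per comparison instead of O(N^2)."""
    r1, c1 = projections(block.astype(int))
    r2, c2 = projections(candidate.astype(int))
    return float(np.abs(r1 - r2).sum() + np.abs(c1 - c2).sum())

def search(cur, ref, x, y, bs=16, radius=7):
    block = cur[y:y + bs, x:x + bs]
    best = (np.inf, (0, 0))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            cand = ref[y + dy:y + dy + bs, x + dx:x + dx + bs]
            if cand.shape == block.shape:
                cost = projection_sad(block, cand)
                if cost < best[0]:
                    best = (cost, (dx, dy))
    return best[1]

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    ref = rng.integers(0, 256, size=(64, 64))
    cur = np.roll(ref, shift=(1, -2), axis=(0, 1))    # true motion dx=2, dy=-1
    print("estimated vector:", search(cur, ref, x=24, y=24))
```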
Boundary-control point based motion representation scheme with its application in video coding
Author(s):
Jin Li;
Xinggang Lin;
Youshou Wu
Show Abstract
In this paper, a boundary-control point (BCP) based motion representation scheme is proposed. In this scheme, the dense motion field is described by the object boundary and the motion vectors located at predefined grid positions, the control points. Our scheme differs from the conventional block-based motion representation in that the motion field has more degrees of freedom: it can represent complex motion, e.g., translation, rotation, zooming and deformation. The motion field is also generally continuous, with discontinuities only at object boundaries, so the distortion after motion compensation is mainly geometrical deformation, which is relatively insensitive to the human eye compared with the blocking effect caused by the conventional block-based scheme. A pixel threshold criterion is also proposed to evaluate the BCP-based motion compensated prediction (MCP) image and to determine whether the MCP error needs to be transmitted. Finally, a BCP-based video encoder is constructed. With nearly the same decoded signal-to-noise ratio, about 20 - 55% bit-rate saving can be achieved compared with MPEG-I, while the subjective quality of the BCP-based scheme is better than that of MPEG-I. The new scheme is also quite unlike model-based schemes in that it needs no complex scene analysis. Some promising experimental results are shown.
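As a hedged illustration of a control-point motion representation (not the authors' exact BCP scheme, which also exploits object boundaries), the sketch below bilinearly interpolates motion vectors given at a coarse grid of control points into a dense, continuous motion field able to describe zoom-like motion.

```python
import numpy as np

def dense_field_from_control_points(grid_mvs, spacing):
    """Bilinearly interpolate control-point motion vectors to a dense field.

    grid_mvs : (Gy, Gx, 2) array of vectors at control points placed every
               `spacing` pixels; the object-boundary handling of the paper
               (discontinuities only across boundaries) is omitted here."""
    gy, gx, _ = grid_mvs.shape
    h, w = (gy - 1) * spacing + 1, (gx - 1) * spacing + 1
    field = np.zeros((h, w, 2))
    for y in range(h):
        for x in range(w):
            i, j = y // spacing, x // spacing
            i1, j1 = min(i + 1, gy - 1), min(j + 1, gx - 1)
            fy, fx = (y % spacing) / spacing, (x % spacing) / spacing
            top = (1 - fx) * grid_mvs[i, j] + fx * grid_mvs[i, j1]
            bot = (1 - fx) * grid_mvs[i1, j] + fx * grid_mvs[i1, j1]
            field[y, x] = (1 - fy) * top + fy * bot
    return field

if __name__ == "__main__":
    # A zoom-like field: vectors point outward, growing with distance from center.
    ys, xs = np.meshgrid(np.arange(5), np.arange(5), indexing="ij")
    ctrl = 0.5 * np.stack([xs - 2, ys - 2], axis=-1).astype(float)
    dense = dense_field_from_control_points(ctrl, spacing=16)
    print("dense field shape:", dense.shape)   # (65, 65, 2), continuous motion
```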
Minimum mean square error (MMSE) design of generalized interpolation filters for the motion processing of interlaced images
Author(s):
Luc Vandendorpe;
Paul Delogne;
Laurent Cuvelier;
Benoit Maison
Show Abstract
In [1, 2], it has been shown how vertical motion can be handled when processing interlaced images with sub-pixel accuracy. The description was based on a generalized version of the sampling theorem. In [3, 4], this formalism was used to solve the problem of image deinterlacing. In [5], we showed how the interpolation filters associated with non-uniform periodic sampling can be approximated by means of a minimum mean squared error design. In the present paper, we show how to apply this type of design to the interpolation operation involved in the motion estimation/compensation process on the one hand, and in the deinterlacing process on the other. We show how to compute the filter coefficients as functions of the motion and the statistical properties of the images. The results obtained for this type of design are compared with those obtained by means of the design methods presented in [1, 2, 3, 4].
Hierarchical motion estimation algorithm with smoothness constraints and postprocessing
Author(s):
Kan Xie;
Luc Van Eycken;
Andre J. Oosterlinck
Show Abstract
Obtaining accurate and reliable motion parameters from an image sequence is a key problem in many applications of image processing, image recognition and image coding. Based on an investigation and comparison of existing motion estimation (or optical flow) algorithms, a pel-based motion estimation algorithm with smoothness constraints is developed. The performance of a motion estimation algorithm depends strongly on the measurement windows, which seem to vary with different types of images. In order to further improve the accuracy and reliability of motion estimation, a hierarchical motion estimation algorithm is proposed in this paper to cope with different sizes, patterns and complicated motions of objects in the scene. This hierarchical estimation algorithm can conveniently be applied to motion-compensated (MC) interframe interpolation coding, since the motion information can be estimated and transmitted as in a block-based forward MC prediction coder and the motion vector fields can then be hierarchically refined to pel resolution at the receiver to realize high-quality MC interpolation. The experimental results show that the proposed algorithm has excellent performance, giving not only very small prediction errors but also quite smooth and accurate motion estimates.
Motion restoration: a method for object and global motion estimation
Author(s):
Jih-Shi Su;
Hsueh-Ming Hang;
David W. Lin
Show Abstract
A new technique called motion restoration method for estimating the global motion due to zoom and pan of the camera is proposed. It is composed of three steps: (1) block-matching motion estimation, (2) object assignment, and (3) global motion restoration. In this method, each image is first divided into a number of blocks. Step (1) may employ any suitable block-matching motion estimation algorithm to produce a set of motion vectors which capture the compound effect of zoom, pan, and object movement. Step (2) groups the blocks which share common global motion characteristics into one object. Step (3) then extracts the global motion parameters (zoom and pan) corresponding to each object from the compound motion vectors of its constituent blocks. The extraction of global motion parameters is accomplished via singular value decomposition. Experimental results show that this new technique is efficient in reducing the entropy of the block motion vectors for both zooming and panning motions and may also be used for image segmentation.
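A minimal sketch of step (3) under a simplified model: block motion vectors are assumed to follow v = s*p + t (s a zoom-related scale, t the pan), and the parameters are recovered by linear least squares (np.linalg.lstsq, which performs an SVD internally). The object-assignment step and the authors' exact parameterization are omitted.

```python
import numpy as np

def fit_zoom_pan(positions, vectors):
    """Fit v = s * p + t to block motion vectors by linear least squares."""
    n = len(positions)
    # Unknowns: [s, tx, ty]; rows: vx = s*x + tx and vy = s*y + ty
    A = np.zeros((2 * n, 3))
    b = np.zeros(2 * n)
    A[0::2, 0], A[0::2, 1] = positions[:, 0], 1.0
    A[1::2, 0], A[1::2, 2] = positions[:, 1], 1.0
    b[0::2], b[1::2] = vectors[:, 0], vectors[:, 1]
    (s, tx, ty), *_ = np.linalg.lstsq(A, b, rcond=None)
    return s, (tx, ty)

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    pos = rng.uniform(-100, 100, size=(50, 2))            # block centers
    true_s, true_t = 0.05, np.array([3.0, -1.0])          # 5% zoom plus pan
    vec = true_s * pos + true_t + rng.normal(scale=0.2, size=pos.shape)
    s, t = fit_zoom_pan(pos, vec)
    print("zoom factor ~", round(1 + s, 3), "pan ~", np.round(t, 2))
```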
Development and picture quality evaluation of a prototype Hi-Vision coding system for facility monitoring
Author(s):
Masamichi Hasegawa;
Yuji Omori;
Yasuhiro Kosugi;
Hideo Shimizu
Show Abstract
A Hi-Vision video coding system has been developed to realize a high-definition video surveillance system for observing various objects, such as important facilities, their construction works, details of their component blocks and machinery setup. This system transmits Hi-Vision video images, with their high resolution and wide viewing angle, over 32 Mbps lines only. With this system, human visual power can be utilized to an extent that conventional NTSC video systems do not allow. This paper describes the motivation for the development, the system configuration and test results of remote surveillance of an operating power plant.
Real-time motion compensating TV-to-HDTV converter
Author(s):
Martin Hahn;
Stefan Wolf;
Maati Talmi;
Michael Karl
Show Abstract
This paper describes the realization of a TV-interlaced to HDTV-interlaced real-time format converter for studio applications. The conversion is performed by motion-compensated 3D interpolation. The estimation of motion is based on hierarchical block matching. Reliability checking of motion vectors is applied to achieve high picture quality. Furthermore, various picture classification algorithms are utilized to improve the reliability of the motion vectors. This format converter has been developed using field programmable gate arrays, digital signal processors and specially designed VLSI chips to reduce the amount of hardware. These VLSI chips have been realized as semicustom and fullcustom chips. Besides their employment within the format converter, they are suitable for various applications in video processing.
Analysis of joint bit-rate control in multiprogram image coding
Author(s):
Gertjan Keesman;
David Elias
Show Abstract
Variable bit-rate compression is known to be more efficient than constant bit-rate (CBR) compression. However, since most transmission media convey a constant bit rate, most applications use CBR compression. In multi-program applications such as satellite and cable television, a technique called joint bit-rate control can be used. With joint bit-rate control, the bit rates of individual programs are allowed to vary but the sum of all bit rates is kept constant. Since joint control is involved, the encoders should be geographically close. Joint bit-rate control is considered to be different from the technique called statistical multiplexing which has been proposed in combination with ATM networks. In this paper, the benefits of joint bit-rate control are studied. This study involves both a theoretical analysis and experimental verification. Both theory and practice indicate that joint bit-rate control gives a significant gain even for a small number of programs. For instance, extra programs can be transmitted over the same channel if joint bit-rate control is used.
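A toy sketch of the joint bit-rate control idea: a fixed multiplex rate is redistributed among programs in proportion to a per-program complexity measure, so individual rates vary while their sum stays constant. The complexity measure and the per-program floor are assumptions, not the paper's allocation rule.

```python
import numpy as np

def joint_rate_control(complexities, total_rate, min_rate=0.5):
    """Split a fixed channel rate among programs in proportion to their
    current coding complexity, with a guaranteed floor per program."""
    c = np.asarray(complexities, dtype=float)
    spare = total_rate - min_rate * len(c)
    rates = min_rate + spare * c / c.sum()
    return rates                       # Mbit/s per program, sums to total_rate

if __name__ == "__main__":
    # Four programs on one 24 Mbit/s multiplex; a sports feed is hardest to code.
    print(np.round(joint_rate_control([8.0, 2.0, 3.0, 1.5], total_rate=24.0), 2))
```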
Robust algorithms for image transmission over asynchronous transfer mode (ATM) networks
Author(s):
Sethuraman Panchanathan;
A. Jain
Show Abstract
Video compression is becoming increasingly important with the advent of broadband networks and compression standards. Asynchronous transfer mode (ATM) is fast emerging as a means of transferring packetized voice, video and data on high-speed networks. However, transmission of data over ATM networks may result in the loss of packets when the required bandwidth exceeds the available bandwidth or when there is a buffer overflow at the receiver. Compressed video transmission over ATM therefore results in degraded picture quality because of cell losses. In this paper, we present novel techniques based on vector quantization which not only provide good coding performance but are also robust to cell losses in ATM networks. Our simulation results indicate that the proposed techniques provide a significant improvement in image quality over a wide range of cell loss rates.
Two-layer video codec design for asynchronous transfer mode (ATM) networks
Author(s):
Shang-Pin Chang;
Hsueh-Ming Hang
Show Abstract
Asynchronous-transfer-mode (ATM) transmission has been adopted by most computer networks and by the broadband integrated-services digital network. Much research has been conducted on the transmission of video services over ATM networks. Previous studies often concentrate on designing video coders or on designing network regulation policies that reduce the effect of packet loss. But the ultimate goal should be to reduce the overall distortion of the reconstructed images at the receiver, and this distortion contains two components: (1) source coding error due to compression and (2) channel error due to network packet loss. In general, a high output rate at the source encoder leads to a smaller compression error; however, this high bit rate may also increase packet loss and thus increase the channel error. In this paper, a popular two-layer coding structure is considered and the optimal quantizer step size for the enhancement layer is studied under the combined source-plus-channel distortion. Through both theoretical analysis and image simulation, we indeed find an optimal operating point that achieves the lowest total mean squared error at the receiver.
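The paper's analysis is not reproduced; the sketch below only illustrates the existence of an interior optimum for the enhancement-layer step size, using assumed toy models for the source distortion, the rate of a uniform quantizer and the packet-loss probability. None of these functional forms come from the paper.

```python
import numpy as np

def total_distortion(q, sigma2=100.0, channel_capacity=4.0):
    """Toy source-plus-channel model (all functional forms are assumptions):
    source MSE of a uniform quantizer D_s = q^2/12, enhancement-layer rate
    R = 0.5*log2(sigma2/D_s) bits/sample, and a packet-loss probability that
    grows once R exceeds the channel capacity. A lost packet is assumed to
    drop the whole enhancement layer (MSE = sigma2)."""
    d_s = q * q / 12.0
    rate = max(0.0, 0.5 * np.log2(sigma2 / d_s)) if d_s < sigma2 else 0.0
    p_loss = 1.0 - np.exp(-max(0.0, rate - channel_capacity))
    return (1.0 - p_loss) * d_s + p_loss * sigma2

if __name__ == "__main__":
    steps = np.linspace(0.5, 30.0, 200)
    d = [total_distortion(q) for q in steps]
    best = steps[int(np.argmin(d))]
    print("step minimizing source-plus-channel MSE:", round(float(best), 2))
```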
Hierarchical texture segmentation with wavelet packet transform and adaptive smoothing
Author(s):
Yu-Chuan Lin;
Tianhorng Chang;
C.-C. Jay Kuo
Show Abstract
A new texture segmentation algorithm based on the wavelet transform and a hierarchical fuzzy processing technique is proposed in this research. We perform a tree-structured wavelet transform to decompose a textured image adaptively into subimages corresponding to different frequency channels, and assign a fuzzy membership function to every spatial location by examining the averaged energy value of a local window centered at that point. A procedure to determine the proper smoothing window size is described. The fuzzy membership function in different subimages is then integrated hierarchically to lead to the final segmentation result. Numerical experiments are given to demonstrate the performance of our proposed algorithm.
Improved image segmentation techniques for hybrid waveform/object-oriented coding
Author(s):
Peter P. Kauff;
U. Goelz;
Silko Kruse;
S. Rauthenberg
Show Abstract
A current topic in image coding is to investigate how the efficiency of conventional waveform coding can be improved by utilizing methods from image analysis, image segmentation, image understanding and computer vision. A very promising approach to this problem is object-oriented analysis-synthesis coding and its combination with conventional waveform codecs (hybrid waveform/object-oriented coding). One of the most important key components needed for such a hybrid system is reliable and generic image segmentation. In this context, this contribution presents an algorithm for segmenting objects in front of moving backgrounds. The algorithm is based on four successive processing steps which are explained in detail: pre-segmentation using a Hough transform of conventionally estimated vector fields, global parameter mapping within the pre-segmented background area, compensation of the global background motion, and thresholding and post-processing of the remaining DFD signal.
Applications of Gibbs random field in image processing: from segmentation to enhancement
Author(s):
Jiebo Luo;
Chang Wen Chen;
Kevin J. Parker
Show Abstract
The Gibbs random field (GRF) has proved to be a simple and practical way of parameterizing the Markov random field, which has been widely used to model an image or an image-related process in many image processing applications. In particular, the Gibbs random field can be employed to construct an efficient Bayesian estimator that often yields optimal results. In this paper, we describe how the Gibbs random field can be efficiently incorporated into optimization processes in several representative applications, ranging from image segmentation to image enhancement. One example is the segmentation of a CT volumetric image sequence, in which the GRF is incorporated into K-means clustering to enforce neighborhood constraints. Another example is artifact removal in DCT-based low bit-rate image compression, in which the GRF is used to design an enhancement algorithm that smooths artificial block boundaries as well as ringing patterns while still preserving the image details. The third example is an integration of the GRF into wavelet subband coding of video signals, in which the high-frequency bands are segmented with spatial constraints specified by a GRF, while the subsequent enhancement of the decompressed images is accomplished with a smoothing function specified by another, corresponding GRF. With these diverse examples, we demonstrate that various features of images can all be properly characterized by a Gibbs random field. The specific form of the Gibbs random field can be selected according to the characteristics of an individual application. We believe that the Gibbs random field is a powerful tool for exploiting the spatial dependency in various images, and is applicable to many other image processing tasks.
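As a hedged illustration of how a Gibbs prior enforces neighborhood constraints in segmentation (in the spirit of the GRF-constrained K-means example above, though not the authors' algorithm), the sketch below runs iterated conditional modes with a Potts prior on a two-class synthetic image. The class means, noise level and beta value are assumptions.

```python
import numpy as np

def icm_potts(labels, data_cost, beta=1.5, iters=5):
    """Iterated conditional modes with a Potts (Gibbs) prior: each pixel takes
    the label minimizing its data cost plus beta times the number of
    disagreeing 4-neighbours."""
    h, w = labels.shape
    k = data_cost.shape[2]
    lab = labels.copy()
    for _ in range(iters):
        for y in range(h):
            for x in range(w):
                costs = data_cost[y, x].copy()
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        costs += beta * (np.arange(k) != lab[yy, xx])
                lab[y, x] = int(np.argmin(costs))
    return lab

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    # Two-class synthetic image: left half ~20, right half ~200, plus heavy noise.
    img = np.hstack([np.full((32, 16), 20.0), np.full((32, 16), 200.0)])
    img += rng.normal(scale=60.0, size=img.shape)
    means = np.array([20.0, 200.0])                     # K-means-style class means
    data_cost = (img[..., None] - means) ** 2 / (2 * 60.0 ** 2)
    init = np.argmin(data_cost, axis=2)
    smoothed = icm_potts(init, data_cost, beta=2.0)
    print("noisy label errors :", int((init[:, :16] != 0).sum() + (init[:, 16:] != 1).sum()))
    print("smoothed errors    :", int((smoothed[:, :16] != 0).sum() + (smoothed[:, 16:] != 1).sum()))
```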
Image segmentation via piecewise constant regression
Author(s):
Scott Thomas Acton;
Alan Conrad Bovik
Show Abstract
We introduce a novel unsupervised image segmentation technique that is based on piecewise constant (PICO) regression. Given an input image, a PICO output image for a specified feature size (scale) is computed via nonlinear regression. The regression effectively provides the constant region segmentation of the input image that has a minimum deviation from the input image. PICO regression-based segmentation avoids the problems of region merging, poor localization, region boundary ambiguity, and region fragmentation. Additionally, our segmentation method is particularly well-suited for corrupted (noisy) input data. An application to segmentation and classification of remotely sensed imagery is provided.
Segmentation for handwritten characters overlapping a tabular formed slip by global interpolation
Author(s):
Satoshi Naoi;
Yoshinobu Hotta;
Maki Yabuki;
Atuko Asakawa
Show Abstract
The global interpolation we previously proposed evaluates segment pattern continuity and connectedness to produce characters with smooth edges while correctly interpolating blank or missing segments, e.g., when extracting a handwritten character overlapping one box border. In this paper, we extend the method to separate handwritten characters overlapping a tabular-form slip. We solve two problems to realize this: (1) precise matching among blank segments of adjacent characters for interpolation, and (2) deciding the reinterpolation area when adjacent character strings are close to each other. Precise matching can be achieved by finding the exact terminal points of blank or missing segments. We make efficient use of the image removed within a border: the contour of the character segment in the removed border image is tracked from the intersection of the character and the border toward the center of the border. The reinterpolation area is decided adaptively, not from a single box border size, but by estimating a character size for each character string after removing the borders of the tabular-form slip. When adjacent character strings are close to each other, the strings cannot be separated by computing their horizontal projection values. We therefore compute a weighted horizontal projection value whose weight is approximated by a convex function, that is, the peak is proportional to each labeled segment size and is placed at the center of gravity of the labeled segment. Some experimental results show the effectiveness of our method.
Texture segmentation parallel model based on the symmetric and asymmetric Gabor filters
Author(s):
Yan Zhou;
Harold G. Longbotham;
Hong Deng
Show Abstract
In this paper a parallel model for texture segmentation is presented. The model is built on a basic texture segmentation model. In the basic model, a bank of filters, symmetric and asymmetric Gabor filters, is used to describe texture properties in both the frequency and the spatial domains. Since a textured image contains a large amount of information, detecting texture boundaries takes a long time. In the parallel model, the image filtering operations are executed in parallel by different processors. Since the operations are independent and almost identical, the speedup should be very high. The examples show that the parallel model is very effective for the texture segmentation operation.
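A small sketch of the filtering stage of the basic model: a symmetric (cosine) or asymmetric (sine) Gabor kernel is applied via the FFT, and the local energy of the response separates a striped texture from noise. The kernel parameters are assumptions, and the parallel dispatch of filters across processors is not shown.

```python
import numpy as np

def gabor_kernel(size, freq, theta, sigma, symmetric=True):
    """Spatial Gabor kernel; the cosine (symmetric) or sine (asymmetric)
    carrier corresponds to the two filter types used in the paper."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * freq * xr) if symmetric else np.sin(2 * np.pi * freq * xr)
    return envelope * carrier

def filter_image(img, kernel):
    """FFT-based filtering with circular boundary handling."""
    h, w = img.shape
    kh, kw = kernel.shape
    padded = np.zeros((h, w))
    padded[:kh, :kw] = kernel
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))  # center at (0, 0)
    return np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(padded)))

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    # Two textures side by side: vertical stripes on the left, noise on the right.
    xs = np.tile(np.arange(64), (64, 1))
    img = np.hstack([np.sin(2 * np.pi * xs[:, :32] / 8.0), rng.normal(size=(64, 32))])
    k = gabor_kernel(size=15, freq=1 / 8.0, theta=0.0, sigma=3.0)
    energy = filter_image(img, k) ** 2
    print("mean Gabor energy  left:", round(float(energy[:, :32].mean()), 3))
    print("mean Gabor energy right:", round(float(energy[:, 32:].mean()), 3))
```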
Detection of the number of image regions by minimum bias/variance criterion
Author(s):
Yue Joseph Wang;
Tianhu Lei;
Joel M. Morris
Show Abstract
An unsupervised stochastic model-based image analysis technique requires the model parameters to be estimated directly from the observed image. A new approach is presented to the problem of detecting the number of statistically distinct regions in an image, based on the application of a new information-theoretic criterion called the minimum bias/variance criterion (MBVC). Unlike the conventional approximation- and coding-based approaches introduced by Akaike and by Rissanen, the new criterion reformulates the problem explicitly as one of balancing model bias and variance, in which the number of image regions is obtained simply by minimizing the MBVC value. Simulation results that illustrate the performance of the new method for detecting the number of regions in an image are presented for both synthetic and medical images.
Simultaneous matching of multiple templates in complex images
Author(s):
Raghunath K. Rao;
Jezekiel Ben-Arie
Show Abstract
Our recently developed Expansion Matching (EXM) is an effective method for matching templates and optimizes a new similarity measure called the Discriminative Signal-to-Noise Ratio (DSNR). DSNR optimization leads to sharp matching peaks and minimal off-center response in the matching result. Consequently, EXM yields superior performance to the widely used correlation matching (also known as matched filtering), especially in conditions of noise, superposition and severe occlusion. This paper extends the real-valued EXM formulation to simultaneous matching of multiple templates in the complex image domain. Complex-valued template matching is relevant to matching frequency-domain templates and edge gradient images, and can be extended to multispectral images as well. Here, a single filter is designed to simultaneously match a set of given complex templates with optimal DSNR, while eliciting user-defined center responses for each template. For the simplified case of matching a single real-valued template, this EXM result reduces exactly to the Minimum Squared Error restoration filter assuming the template as the blurring function. In addition, the single-template EXM matching result also corresponds to the coefficients of a non-orthogonal expansion of the input image using shifted versions of the template as basis functions.
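For the single real-valued template case mentioned above, the abstract notes that EXM reduces to a minimum-squared-error restoration filter with the template as the blurring function. The frequency-domain sketch below follows that interpretation only; the regularization constant eps stands in for the noise-dependent term and is an assumption, and the multi-template complex formulation is not implemented.

```python
import numpy as np

def exm_like_match(image, template, eps=1e-2):
    """Single-template matching via the restoration interpretation:
    H = conj(T) / (|T|^2 + eps) in the frequency domain, where T is the
    template spectrum. Plain correlation would use H = conj(T) alone."""
    h, w = image.shape
    th, tw = template.shape
    t = np.zeros((h, w))
    t[:th, :tw] = template
    t = np.roll(t, (-(th // 2), -(tw // 2)), axis=(0, 1))   # center template at origin
    T = np.fft.fft2(t)
    H = np.conj(T) / (np.abs(T) ** 2 + eps)
    return np.real(np.fft.ifft2(np.fft.fft2(image) * H))

if __name__ == "__main__":
    rng = np.random.default_rng(6)
    image = rng.normal(scale=0.1, size=(64, 64))
    template = rng.normal(size=(9, 9))
    image[20:29, 30:39] += template          # embed the template (top-left at row 20, col 30)
    response = exm_like_match(image, template)
    peak = np.unravel_index(np.argmax(response), response.shape)
    print("peak at (y, x):", peak)           # expected near (24, 34), the template center
```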
Segmentation and modeling of textured images through combined second- and third-order statistical models
Author(s):
Tania Stathaki;
Anthony G. Constantinides
Show Abstract
In this paper the problem of texture image analysis in the presence of noise is examined from a higher-order statistical perspective. The objective is to develop analysis techniques through which robust texture characteristics are extracted and used for texture modelling and segmentation. The approaches taken involve the use of autoregressive models whose parameters are derived, first, from joint and weighted second- and third-order cumulants and, second, as the solution of a weighted overdetermined set of equations in which the weights are appropriate functions of the eigenvalue spread. The required segmentation of such 2D random fields is effected through the additional stage of a neural network having as inputs the extracted autoregressive parameters. The paper describes the fundamental issues of the various components of the approach.
Image sequence coding using 3D scene models
Author(s):
Bernd Girod
Show Abstract
The implicit and explicit use of 3D models for image sequence coding is discussed. For implicit use, a 3D model can be incorporated into motion-compensating prediction. A scheme is proposed that estimates the displacement vector field with a rigid-body motion constraint by recovering epipolar lines from an unconstrained displacement estimate and then repeating block matching along the epipolar line. Experimental results show that an improved displacement vector field can be obtained with a rigid-body motion constraint. As an example of explicit use, various results with a facial animation model for videotelephony are discussed. A 13 X 16 B-spline mask can be adapted automatically to individual faces and is used to generate facial expressions based on FACS. A depth-from-defocus range camera suitable for real-time facial motion tracking is described. Finally, the real-time facial animation system `Traugott' is presented, which has been used to generate several hours of broadcast video. Experiments suggest that a videophone system based on facial animation might require a transmission bitrate of 1 kbit/s or below.
Image sequence recognition
Author(s):
Guido Tascini;
Primo Zingaretti
Show Abstract
Image sequence recognition is an interesting problem that arises in various areas of computer vision, and in particular in mobile robot vision. Typical for this purpose is motion estimation from a series of frames. Many motion estimation techniques are described in the literature [10, 13, 22]. The approaches are normally divided into two categories: pixel-based methods and feature-based methods. In both, motion is estimated in two steps: (1) 2D motion analysis (feature-based) or estimation (pixel-based), and (2) 3D motion estimation. The pixel-based, or flow-based, method uses local changes in light intensity to compute optical flow at each image point and then derives 3D motion parameters. The feature-based method, which is the one we adopt, first extracts features such as corners, points of curvature and lines. Features that have been used include sharp changes in curvature [15], global properties of moving objects [18], lines and curves [16], and centroids [6]. Second, it establishes the correspondences of these features between two successive frames (the correspondence problem), and finally it computes motion parameters and object structure from the correspondences (the structure-from-motion problem). Motion correspondence is the most difficult problem: occlusion masks the features and noise creates difficulties. Given n frames taken at different time instants and m points in each frame, motion correspondence maps a point in one frame to a point in the next frame such that no two points map onto the same point. The combinatorial explosiveness of the problem has to be constrained; Rangarajan and Shah [19] propose the proximal uniformity constraint: given the location of a point in a frame, its location in the next frame lies in the proximity of its previous location. Even though the problem has not yet been solved, many solutions have been proposed for 3D motion estimation, assuming that the correspondences have been established [12, 13, 24]. Regularization theory has also been proposed for the numerical improvement of the solution of both feature-based and pixel-based problems. From the human standpoint, a vision system may be viewed as performing the following tasks in sequence: detection, tracking and recognition. Detection arises at the cortex level; it is followed by tracking of objects while simultaneously attempting to recognize them. From the machine standpoint, the movement detection phase may be viewed as a useful means of focusing the system's attention, thus reducing the search space of the recognition algorithms. Particular attention has to be paid to detecting moving objects in the presence of a moving background from a monocular image sequence. Several researchers have faced this problem [14, 21]. When the images are taken from a moving vehicle (for instance with translational movement), it is necessary to distinguish between real and apparent movement. The stationary objects of the scene appear to move along paths radiating from the point toward which we are moving (the focus of expansion). By applying a transformation to the image, called Complex Logarithmic Mapping (see Frazier and Nevatia [8]), it is possible to convert the problem from one of detecting motion along both the X and Y axes to one of detecting motion along an angular axis. After performing horizontal edge detection, if we observe motion of edges in the vertical direction we can conclude that there is a moving object in the scene. Our approach is feature based, and a series of considerations are necessary to understand the solution adopted.
We regard edges, corners or whole regions as features. The choice of a feature depends on how easily it can be retrieved in successive frames to form a correspondence chain. Corner detection may be based on revealing sharp changes in the direction of the intensity gradient; Rangarajan et al. [20] describe the construction of a set of operators to detect corners. Since corners are the most commonly used features, particular attention has been devoted to correspondences among points. Two approaches may be adopted for this: (1) with matching, in which two point patterns from two consecutive images are matched (elastic matching [25]); (2) without matching, using the criteria of proximity and regularity of point trajectories. Our approach uses two types of matching: geometric and relational. Geometric matching uses parametrized geometric models and may be viewed as a parametrized optimization problem. Relational matching uses relational representations and may be viewed as the problem of detecting isomorphism among graphs.