Proceedings Volume 1818

Visual Communications and Image Processing '92

Petros Maragos
View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 1 November 1992
Contents: 19 Sessions, 149 Papers, 0 Presentations
Conference: Applications in Optical Science and Engineering 1992
Volume Number: 1818

Table of Contents

  • Hierarchical Image Coding I
  • Motion Analysis
  • Feature Extraction and Image Restoration
  • Video Coding I
  • Image Restoration
  • Morphological and Related Nonlinear Filters
  • Vector Quantization
  • Motion Analysis and Video Coding
  • Morphological Image Processing I
  • Hierarchical Image Coding II
  • Video Coding II
  • Morphological Image Processing II
  • Fractals and Wavelets
  • Architectures for Image and Video Processing
  • Video Coding III
  • Image Segmentation
  • Biomedical Image Processing
  • Image Coding and Model-Based Analysis
Hierarchical Image Coding I
Image subband coding with adaptive filter banks
John Hakon Husoy, Sven Ole Aase
A fundamental problem with both FIR and IIR based image subband coders operating at low bit rates is the trade-off between ringing noise and blocking artifacts. The important parameter in this respect is the impulse response length of the filter bank channels. A short impulse response length is suitable for modeling image areas with a high degree of detail, but suffers from blocking effects. On the other hand, a long impulse response does not exhibit blocking effects, but will lead to ringing noise in the vicinity of edges. In this paper we show that IIR filter banks based on all pass filter building blocks can be made spatially adaptive without compromising their perfect reconstruction property. This rather surprising property is used to simultaneously minimize ringing artifacts and blocking effects in a subband coder.
Ladder structures for multidimensional linear-phase perfect reconstruction filter banks and wavelets
A. A. C. M. Kalker, I. A. Shah
The design of multidimensional filter banks and wavelets has been an area of active research for use in video and image communication systems. At the same time, efficient structures for the implementation of such filters are of importance. In 1-D, the well-known lattice structure and the recently introduced ladder structure are attractive. However, their extensions to higher dimensions (m-D) have been limited. In this paper we reintroduce the ladder structure, with the purpose of extending the structure to m-D using the McClellan transform.
Comparative study of several nonlinear image interpolation schemes
Bing Zeng, Anastasios N. Venetsanopoulos
Image interpolation using non-linear approaches is investigated in this paper. We study this problem by concentrating on two sub-sampling schemes that are most commonly adopted in practice: (1) the quincunx lattice and (2) the every other row and every other column (EORC) lattice. For each sub-sampling lattice, we propose four interpolation structures and compare them by considering the computational complexity and testing with both natural and synthetic images. Comparisons of the proposed non-linear interpolators with some linear approaches are also presented. From the results obtained in this paper, we draw the general conclusion that non-linear interpolation should be preferred in practice.
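Illustrative sketch (not one of the four interpolation structures proposed in the paper): for the quincunx lattice, each discarded pixel (those with odd i+j) can be filled with the median of its four retained horizontal/vertical neighbours. Array sizes and the edge handling below are arbitrary choices; interior pixels of a smooth ramp are recovered exactly, while only the padded border deviates.
```python
import numpy as np

def quincunx_median_interpolate(img):
    """Fill pixels discarded by quincunx subsampling (i+j odd) with the
    median of their four horizontal/vertical neighbours, which are retained."""
    img = img.astype(float)
    out = img.copy()
    padded = np.pad(img, 1, mode="edge")
    rows, cols = img.shape
    i, j = np.indices((rows, cols))
    missing = (i + j) % 2 == 1                  # quincunx: keep (i+j) even
    up    = padded[0:rows,   1:cols + 1]
    down  = padded[2:rows + 2, 1:cols + 1]
    left  = padded[1:rows + 1, 0:cols]
    right = padded[1:rows + 1, 2:cols + 2]
    neigh = np.stack([up, down, left, right])
    out[missing] = np.median(neigh, axis=0)[missing]
    return out

# toy usage: subsample a ramp image on the quincunx lattice and re-interpolate
x = np.arange(64, dtype=float)[None, :] * np.ones((64, 1))
i, j = np.indices(x.shape)
sub = x.copy()
sub[(i + j) % 2 == 1] = 0.0                     # discarded samples (value irrelevant)
rec = quincunx_median_interpolate(sub)
print(np.abs(rec - x)[1:-1, 1:-1].max())        # interior error is 0.0
```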
Flexible design of perfect reconstruction FIR diamond subband filters with transformations
David B. H. Tay, Nick G. Kingsbury
In this paper we present a method for designing 2-D linear-phase FIR diamond subband filters having the perfect reconstruction property. It is based on a transformation-of-variable technique and is equivalent to the generalized McClellan transformation. We present methods to design a whole class of transformations. The method provides the flexibility of controlling the frequency characteristics of the filters with ease. With this method the problem consists of two parts: design of the transformation and design of the 1-D filters. The filters designed with this method can be implemented efficiently by employing separable processing along the diagonal directions. Several numerical design examples are presented to illustrate the flexibility of the design method.
Hierarchical image coding with diamond-shaped subbands
Xiaohui Li, Jie Wang, Peter H. Bauer, et al.
We present a sub-band image coding/decoding system using a diamond-shaped pyramid frequency decomposition to more closely match visual sensitivities than conventional rectangular bands. Filter banks are composed of simple, low order IIR components. The coder is especially designed to function in a multiple resolution reconstruction setting, in situations such as variable capacity channels or receivers, where images must be reconstructed without the entire pyramid of sub-bands. We use a nonlinear interpolation technique for lost subbands to compensate for loss of aliasing cancellation.
Analysis and design of multidimensional nonuniform band filter banks
Thomas R. Gardos, Kambiz Nayebi, Russell M. Mersereau
We present a method for designing multi-dimensional multi-rate nonuniform band FIR filter banks with integer or rational decimation factors. The analysis is based on the periodically time-varying (PTV) impulse responses of the filter bank branches. A constraint is developed, in terms of the analysis and synthesis filters, which ensures perfect reconstruction of the input signal. This forms the basis for the design method which uses constrained optimization techniques. We review impulse response analysis of filter banks and use Smith normal form decomposition to simplify their structure. We show how these impulse response analysis techniques allow us to consider filter banks that previous techniques fail to address.
Motion Analysis
Motion estimation: the concept of velocity bandwidth
Regis J. Crinon, Wojciech J. Kolodziej
This paper shows that, like spatial and temporal frequencies, velocity is subject to aliasing. The analysis is based on studying a two-dimensional sinusoidal signal moving at constant velocity. It is shown that, given the spatio-temporal grid used to sample the image, the displacement of a sinusoidal signal must remain confined to a well-defined two-dimensional domain to avoid velocity aliasing. Analytical derivations of the aliasing-free domains are provided for various sampling lattices such as the rectangular field-interlaced and offset field-interlaced (quincunx) sampling grids. The paper concludes with a few suggestions regarding how the concept of velocity bandwidth can be helpful in the estimation of object displacement in digital video sequences. In particular, bounds for the search domains used in multiscale block-matching-based motion estimation algorithms are derived.
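A small numerical illustration of velocity aliasing, as a simplified 1-D analogue of the paper's 2-D analysis (frame length and frequencies below are assumptions): a phase-based estimator observes the per-frame phase shift 2*pi*u*d only modulo 2*pi, so displacements with |u*d| > 1/2 alias to a wrong apparent velocity.
```python
import numpy as np

def apparent_displacement(u, d):
    """Phase-based displacement estimate for a sinusoid of spatial frequency
    u (cycles/sample) that actually moved d samples between two frames.
    The phase shift 2*pi*u*d is only observable modulo 2*pi, so the estimate
    wraps (velocity aliasing) once |u*d| exceeds 1/2."""
    n = np.arange(256)
    frame0 = np.cos(2 * np.pi * u * n)
    frame1 = np.cos(2 * np.pi * u * (n - d))       # content shifted by d samples
    F0, F1 = np.fft.rfft(frame0), np.fft.rfft(frame1)
    k = int(round(u * 256))                        # FFT bin of the sinusoid
    dphi = np.angle(F1[k] * np.conj(F0[k]))        # wrapped phase shift
    return -dphi / (2 * np.pi * u)

u = 0.125                                          # 1 cycle every 8 samples
for d in (1.0, 3.0, 5.0):                          # |u*d| = 0.125, 0.375, 0.625
    print(d, apparent_displacement(u, d))
# the d = 5 case aliases: 5 - 1/u = -3 samples is reported instead
```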
Motion estimation using frequency components
Hsueh-Ming Hang, Yung-Ming Chou, Tzy-Hong S. Chao
Motion estimation techniques are widely used in today's video coding systems. The most often-used techniques are the block (template) matching method and the differential (pel-recursive) method. In this paper, we would like to study this topic from a viewpoint different from the above methods to explore the fundamental limits and tradeoffs in image motion estimation. The underlying principles behind the two conflicting requirements, accuracy and unambiguity, become clear when they are analyzed using this tool--frequency component analysis. This analysis may lead us to invent new motion estimation algorithms and suggest ways to improve the existing algorithms.
New pel-recursive motion estimation algorithms based on novel interpolation kernels
Michael T. Orchard
Pel-recursive motion estimation algorithms are an attractive alternative to block-matching motion compensation algorithms for video coding because (a) they do not require that motion information be transmitted over the channel, and (b) they allow the reconstruction of continuously varying motion fields. Unfortunately, the high computational complexity of these algorithms and their difficulty in tracking varying motion fields, discontinuities in motion fields, and noisy image sequences have led most current video coding algorithms to use block-based rather than pel-recursive approaches to motion estimation and compensation. This paper presents a new, discrete formulation of pel-recursive motion estimation which allows more flexibility in trading off computational complexity for prediction accuracy, and which permits the design of hybrid motion-estimation algorithms sharing characteristics of both pel-recursive and block-matching approaches. Using the discrete formulation, we define three novel approaches to motion estimation, one in the form of a conventional pel-recursive algorithm and two incorporating various amounts of block-based motion information from the encoder. We present simulations comparing their performance with both standard pel-recursive and block-matching motion estimation algorithms, demonstrating significant improvements in prediction accuracy.
Multigrid block-matching motion estimation with an adaptive local mesh refinement
Frederic Dufaux, Murat Kunt
Block-based motion estimation and compensation have been efficiently applied in block-based video coding schemes, such as those using the DCT, and have been adopted in standards such as CCITT/H.261 and ISO/MPEG-II. However, the use of block-based motion estimation techniques in subband coding schemes represents a contradiction, and may introduce block artifacts. In this paper, we present a multigrid block-matching motion estimation technique giving a more accurate motion field, in the sense of the true motion in the scene, while reaching near-optimal solutions in minimizing the energy of the prediction error, when compared to classical full-search block matching. In addition, a locally adaptive mesh refinement procedure is introduced, allowing more accurate motion fields on the edges of moving objects without increasing the side information required to transmit the motion vectors. The local mesh refinement coarsely approximates a segmentation of the motion field, and reduces block artifacts when the motion estimation is used in a subband coding scheme.
Generalized block-matching motion estimation
Vassilis E. Seferidis, Mohammad Ghanbari
A general approach to block-matching motion estimation is introduced. It handles the complex motion found in broadcast television signals very well by comparing each block of the current frame with a deformed quadrilateral of the previous one. The calculation of the extra motion information requires additional operations, which increase the computational load, but the improved prediction reduces the bit rate, especially in scenes with complex motion and deformations of the moving objects.
Affine models for motion and shape recovery
Chiou-Shann Fuh, Petros Maragos
This paper presents an affine model for 3-D motion and shape recovery using two perspective views and their relative 2-D displacement field. The 2-D displacement vectors are estimated as parameters of a 2-D affine model that generalizes standard block matching by allowing affine shape deformations of image blocks and affine intensity transformations. The matching block size is effectively found via morphological size histograms. The parameters of the 3-D affine model are estimated using a least-squares algorithm that requires solving a system of linear equations with rank three. Some stabilization of the recovered motion parameters under noise is achieved through a simple form of MAP estimation. A multi-scale searching in the parameter space is also used to improve accuracy without high computational cost. Experiments on applying these affine models to various real world image sequences demonstrate that they can estimate dense displacement fields and recover motion parameters and object shape with relatively small errors.
Toward an estimation of three-dimensional motion in a 3DTV image sequence II: extension to curved surfaces
This article deals with 3D scene analysis for coding purposes and is part of general research in 3D television. The method proposed here attempts, through dynamic monocular analysis, to estimate three-dimensional motion and to determine the structure of the observed objects. Motion and structure estimation is achieved by means of a differential method. Images are segmented according to spatio-temporal criteria, using a hierarchical method of quad-tree type with overlap. Segmentation and estimation are performed jointly.
Feature Extraction and Image Restoration
Filters for directly detecting surface orientation in an image
A new approach to finding the 3D orientation of a textured planar surface is presented. By decomposing image space in a novel way that reflects the structure of the problem to be solved, a natural set of filters indexed by 3D orientation is obtained. The filters are applied at a single point in the image and the maximally responding filter is found. Its 3D orientation indices specify the orientation of the surface. The set of filters is large, making this 'pure' approach computationally expensive, so a method for using Gabor filters to select a subset of the 3D orientation detecting filters is presented. The result is a computationally efficient, practical algorithm. Only filter applications at a single point and simple operations on the filter outputs are needed. This method makes use of texture gradient information without the need for combining explicit measurements from multiple image points. The surface texture is assumed to be locally homogeneous, but not necessarily isotropic. The algorithm has an average error of 4 degrees in slant and tilt on a set of twelve images of real textured surfaces. The idea of decomposing the space of images according to the structure of the 3D information to be computed is a powerful one that we expect will apply to other vision problems. Also, the image space decomposition and filter functions developed here can serve as a model for surface orientation perception in biological vision.
Color texture representation using multiscale feature boundaries
Jacob Scharcanski, Jeff K. Hovis, Helen C. Shen
This paper analyzes the problem of representing the structural aspect of texture images. To represent a texture image, we propose an ensemble composed of feature boundaries detected by multi-scale operators. This ensemble is built by the integration of the detected boundaries in different scales. Using this representation, textures may be discriminated by a simple similarity measure.
Tracking of unresolved targets in infrared imagery using a projection-based method
Jae-Ho Choi, Sarah A. Rajala
The conventional two-dimensional Hough transform technique is generalized into a projection-based transform method by using the modified Radon transform for estimating three-dimensional target tracks embedded in a time-sequential set of image frames. The targets of concern are dim, unresolved point targets moving along straight paths across the same field of view. Since the target signal-to-noise ratio is low and the spatial extent of the target is less than a pixel, one must rely on integration over a target track which spans many image frames. Instead of processing the entire 3-D data set, a set of projections is taken using the modified 3-D Radon transform. The projection frames are processed further to extract the track parameters using the Hough transform. This projection-based method not only lowers the data dimensionality but also maintains an estimation performance comparable to that of using the entire 3-D data, by successfully incorporating all available knowledge obtained from the set of projections. Simulation results are presented for synthetic and real infrared image sequences containing synthetically generated 3-D target tracks under various signal-to-noise conditions.
Multidimensional energy operator for image processing
Petros Maragos, Alan Conrad Bovik, Thomas F. Quatieri
The 1-D nonlinear differential operator Ψ(f) = (f')^2 - f·f'' has recently been introduced to signal processing and has been found very useful for estimating the parameters of sinusoids and the modulating signals of AM-FM signals. It is called an energy operator because it can track the energy of an oscillator source generating a sinusoidal signal. In this paper we introduce the multidimensional extension Φ(f) = ||∇f||^2 - f·∇²f of the 1-D energy operator and briefly outline some of its applications to image processing. We discuss some interesting properties of the multidimensional operator and develop demodulation algorithms to estimate the amplitude envelope and instantaneous frequencies of 2-D spatially-varying AM-FM signals, which can model image texture. The attractive features of the multidimensional operator and the related amplitude/frequency demodulation algorithms are their simplicity, efficiency, and ability to track instantaneously varying spatial modulation patterns.
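A minimal numerical sketch of these operators using simple finite differences on an assumed synthetic cosine texture; the discrete 1-D form x[n]^2 - x[n-1]x[n+1] is the usual discrete counterpart of Ψ, and for A·cos(u·x + v·y) the multidimensional operator should hover near A²(u² + v²).
```python
import numpy as np

def energy_1d(x):
    """Discrete 1-D energy operator  Psi(x)[n] = x[n]^2 - x[n-1]*x[n+1].
    For x[n] = A*cos(w*n + p) this equals A^2 * sin(w)^2 ~ A^2 * w^2 for small w."""
    psi = np.zeros_like(x, dtype=float)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi

def energy_2d(f):
    """Multidimensional extension  Phi(f) = ||grad f||^2 - f * laplacian(f),
    approximated with central differences."""
    fy, fx = np.gradient(f.astype(float))
    lap = (np.roll(f, 1, 0) + np.roll(f, -1, 0)
           + np.roll(f, 1, 1) + np.roll(f, -1, 1) - 4.0 * f)
    return fx ** 2 + fy ** 2 - f * lap

# 1-D check: Psi of A*cos(w*n) equals A^2*sin(w)^2 exactly for the discrete form
print(energy_1d(1.5 * np.cos(0.2 * np.arange(50)))[25], (1.5 * np.sin(0.2)) ** 2)

# 2-D check: a cosine texture f = A*cos(u*x + v*y); Phi should be ~ A^2*(u^2 + v^2)
y, x = np.indices((128, 128))
A, u, v = 2.0, 0.3, 0.2
f = A * np.cos(u * x + v * y)
phi = energy_2d(f)
print(phi[32:96, 32:96].mean(), A ** 2 * (u ** 2 + v ** 2))
```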
Restoration with equivalence to nonorthogonal image expansion for feature extraction and edge detection
Raghunath K. Rao, Jezekiel Ben-Arie
This paper discusses two additional applications of our newly developed expansion matching scheme: edge detection and feature extraction. Expansion matching optimizes a novel matching criterion called the Discriminative Signal-to-Noise Ratio (DSNR) and has recently been shown to robustly recognize templates under conditions of noise, severe occlusion, and superposition. The DSNR criterion is better suited to practical conditions than the traditional SNR since it considers as 'noise' even the off-center response of the filter to the signal itself. In this paper, we introduce a new optimal DSNR edge detector based on the expansion filter for an edge model. This edge detector is compared with the widely used Canny edge detector (CED). Experimental comparisons show that our edge detector is superior to the CED in terms of DSNR even under very noisy signal conditions. Expansion matching is also successfully used for extracting features from images. One application that is described is the extraction of corners from images. Another application of expansion matching that is outlined here is generic face recognition.
Adaptive-neighborhood image processing
Raman B. Paranjape, Rangaraj M. Rangayyan, William M. Morrow, et al.
A number of locally adaptive, non-linear techniques for image enhancement have recently been developed. These methods typically involve convolution between the image and a rectangular, fixed-size, sliding window positioned over each pixel in the image, whose coefficients depend on the image statistics under the sliding window. These nonlinear techniques, however, share a fundamental shortcoming that can reduce their utility: they are based on an assumption of stationarity of the image. This is, in general, not a good assumption, as image characteristics often change abruptly. Image segmentation can be used to break up an image into relatively stationary regions so that different types of filters may be applied to statistically different regions of the image; however, this can result in unwarranted enhancement of edges between segmented regions. In this paper we present a new paradigm for image processing operations where, unlike fixed-neighborhood methods, enhancement operations are based on the characteristics of an adaptive neighborhood determined individually for each pixel in the image. The adaptive neighborhood, just like the fixed neighborhood, surrounds the pixel to be enhanced, but the shape and area covered by the adaptive neighborhood depend on the local characteristics of the image rather than being arbitrarily defined. Images enhanced using adaptive neighborhoods are superior to those enhanced using fixed neighborhoods, as the adaptive-neighborhood image processing (ANIP) techniques tune themselves to the contextual details in the image. A major advantage of the adaptive-neighborhood techniques is that edges in the images are not degraded, since the adaptive neighborhoods tend not to transcend real edges or boundaries in the image. We have, over the past decade, presented a series of articles describing various adaptive-neighborhood enhancement techniques in which neighborhoods are allowed to overlap. In this paper, we provide an overall unifying viewpoint for this work, so that adaptive-neighborhood image processing may be viewed as a new paradigm for image processing operations.
Video Coding I
Wavelet transform image sequence coder using nonstationary displacement estimation
Mark R. Banham, James C. Brailean, Aggelos K. Katsaggelos
In this paper, we present a novel coding technique which makes use of the nonstationary characteristics of an image sequence displacement field to estimate and encode motion information. In addition, we develop a wavelet transform approach using cross-scale vector quantization to encode single frames during periods of high motion and scene changes. The objective of this design is to demonstrate the coding potential of a newly developed motion estimator called the Compound Linearized MAP (CLMAP) estimator. This estimator can be used as a means for producing motion vectors which may be regenerated at the decoder with a coarsely quantized error term created in the encoder. The motion estimator generates highly accurate motion estimates. This permits the elimination of a separately coded displaced frame difference (DFD) and coded motion vectors. We exploit both the advantages of the nonstationary motion estimator and the edge preserving quality of the wavelet based still frame coder to improve the visual quality of reconstructed video-conferencing image sequences, at low bit rates.
Image sequence coding using a three-dimensional wavelet packet and adaptive selection
Touradj Ebrahimi, Murat Kunt
An alternative technique for wavelet packet decomposition of signals is introduced. This technique allows a time- (space-) variant partitioning of the joint domain while being efficient in both computational performance and memory requirements. The proposed wavelet packet decomposition is then applied in two codecs, based on motion-compensated 2-D wavelet packet decomposition and on 3-D wavelet packet decomposition with adaptive subband selection. The performances of these codecs in terms of the quality of the reconstructed signals are compared in the framework of video compression. Simulations show that good quality results can be obtained with both systems, while the latter offers slightly better efficiency in terms of PSNR and visual quality.
Study of nonseparable subband filters for video coding
Ya-Qin Zhang, Weiping Li
In this paper, the hexagonal subband filter banks developed in [SIMO 90] are applied to motion video sequences by incorporating multi-resolution motion compensation and vector quantization. A bit allocation scheme is obtained in terms of both the vector variance distribution and correlation properties within a vector. A classified vector quantizer is used to quantize the motion-compensated residual subbands using a multiresolution codebook, which is obtained by training the corresponding subbands in the training sequence.
Image sequence coding using quincunx wavelet transform, motion compensation, and lattice vector quantization
Thierry Gaidon, Michel Barlaud, Pierre Mathieu
This paper is concerned with image sequence coding based on motion estimation-compensation using a pel-recursive technique. Motion estimation, achieved by minimization of a functional, is improved by the incorporation of a discontinuity constraint on the optical flow. The prediction errors are vector quantized using lattices. Recent work enabled us to use truncated lattices (D4, E8, L16, ...) in pyramidal form to construct the codebooks. Coding results are presented for real image sequences.
Motion-compensated transform coding technique employing subband decomposition
Hoon Paek, Rin Chul Kim, Sang Uk Lee
In this paper, by combining motion-compensated transform coding with the sub-band decomposition technique, we present a motion-compensated sub-band coding technique (MCSBC) for image sequence coding. Several problems related to the MCSBC, such as a scheme for motion compensation in each sub-band, the optimal bit rate allocation to each sub-band, and the efficient VWL coding of the DCT coefficients in each sub-band, are also discussed. For efficient coding, motion estimation and compensation are performed only on the LL sub-band, but the discrete cosine transform (DCT) is employed to encode all sub-bands in our approach. The transform coefficients in each sub-band are then scanned in a different manner depending on the energy distributions in the DCT domain, and coded using separate 2-D Huffman code tables, which are optimized to the probability distributions of each sub-band. The performance of the proposed MCSBC technique is extensively examined by computer simulations on HDTV image sequences. The simulation results reveal that the proposed MCSBC technique outperforms the other coding techniques, especially the well-known motion-compensated transform coding technique, by about 1.5 dB in terms of average peak signal-to-noise ratio.
Motion-adaptive subband coding of interlaced video
The application of subband coding to video signals is hindered by the fact that these signals usually have the interlaced format. For proper preservation of the motion in an interlaced video signal, field-based subband coding is preferred. In areas of the video frames without motion, however, frame-based subband coding leads to a lower bit rate than field-based coding. This paper proposes a method to combine the advantages of field-based and frame-based subband coding. The method is called motion-adaptive subband coding. It first divides each video frame into moving and non-moving areas. The moving areas are processed with field-based subband coding, the non-moving ones with frame-based subband coding. Special attention is paid to the transitions between moving and non-moving areas. It is shown that the proposed motion-adaptive subband coding system has a higher performance in terms of bit rate versus picture quality than purely field-based or frame-based subband coding.
Video compression using lapped transforms for motion estimation/compensation and coding
Robert W. Young, Nick G. Kingsbury
Many conventional video coding schemes, such as the CCITT H.261 recommendation, are based on the independent processing of non-overlapping image blocks. An important disadvantage with this approach is that blocking artifacts may be visible in the decoded frames. In this paper, we propose a coding scheme based entirely on the processing of overlapping, windowed data blocks, thus eliminating blocking effects. Motion estimation and compensation are both performed in the frequency domain using a complex lapped transform (CLT), which may be viewed as a complex extension of the lapped orthogonal transform (LOT). The motion compensation algorithm is equivalent to overlapped compensation in the spatial domain, but also allows image interpolation for sub-pel displacements and sophisticated loop filters to be conveniently applied in the frequency domain. For inter- and intra-frame coding, we define the modified fast lapped transform (MFLT). This is a modified form of the LOT, which entirely eliminates blocking artifacts in the reconstructed data. The transform is applied in a hierarchical structure, and performs better than the discrete cosine transform (DCT) for both coding modes. The proposed coder is compared with the H.261 scheme, and is found to have significantly improved performance.
Image Restoration
Enhancement of low-dosage cine-angiographic image sequences using a modified expectation maximization algorithm
Clinical x-ray image sequences acquired through fluoroscopy systems may be corrupted by quantum mottle--a Poisson-distributed, signal-dependent noise that arises with a controlled x-ray dosage reduction in an attempt to lower the exposure to the patient and the medical staff. In this paper, an approach to temporally filter this sequence is presented. It relies on a joint estimation of the signal and the displacement field through a maximum likelihood approach. Implementation is done via a modified EM algorithm to facilitate a more tractable solution.
New variants of the POCS method using affine subspaces of finite codimension with applications to irregular sampling
Hans Georg Feichtinger, C. Cenker, M. Mayer, et al.
The POCS method (projection onto convex sets) has been proposed as an efficient way of recovering a band-limited signal from irregular sampling values. However, both the ordinary POCS method (which uses one sampling point at a time, i.e., consists of a succession of projections onto affine hyperplanes) and the one-step method (which uses all sampling values at the same time) become extremely slow if the number of sampling points gets large. Already for midsize 2-D problems (e.g., 128 X 128 images) one may easily run into memory problems. Based on the theory of pseudo-inverse matrices, new efficient variants of the POCS method (so to speak, intermediate versions) are described, which make use of a finite number of sampling points at each step. Depending on the computational environment, appropriate strategies for designing those families of sampling points (either many families with few points, or few families with many points, overlapping families or disjoint ones...) have to be found. We also report on numerical results for these algorithms.
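For orientation, here is a sketch of the basic alternating-projection (POCS) recovery of a band-limited 1-D signal from irregular samples, not the finite-codimension variants proposed in the paper; signal length, bandwidth, and sample count are arbitrary assumptions.
```python
import numpy as np

def pocs_bandlimited(sample_idx, sample_val, n, bandwidth, iters=500):
    """Alternating projections: (1) re-impose the known irregular sample values,
    (2) project onto the band-limited subspace by zeroing out-of-band FFT bins."""
    x = np.zeros(n)
    keep = np.abs(np.fft.fftfreq(n)) <= bandwidth
    for _ in range(iters):
        x[sample_idx] = sample_val            # projection onto the data constraint
        X = np.fft.fft(x)
        X[~keep] = 0.0                        # projection onto band-limited signals
        x = np.fft.ifft(X).real
    return x

# toy usage: a band-limited signal observed at 60 random positions out of 256
rng = np.random.default_rng(0)
n, bw = 256, 0.05                             # band: |f| <= 0.05 cycles/sample
low = np.abs(np.fft.fftfreq(n)) <= bw
spec = np.zeros(n, complex)
spec[low] = rng.standard_normal(low.sum()) + 1j * rng.standard_normal(low.sum())
truth = np.fft.ifft(spec).real                # real part is still band-limited
idx = np.sort(rng.choice(n, 60, replace=False))
rec = pocs_bandlimited(idx, truth[idx], n, bw)
print(np.abs(rec - truth).max())              # shrinks toward 0 as iterations grow
```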
High-quality image magnification applying Gerchberg-Papoulis iterative algorithm with discrete cosine transform
Eiji Shinbori, Mikio Takagi
A new image magnification method, called 'IM-GPDCT' (image magnification applying the Gerchberg-Papoulis (GP) iterative algorithm with the discrete cosine transform (DCT)), is described and its performance evaluated. This method markedly improves the image quality of a magnified image using a concept which restores the spatial high frequencies that are conventionally lost due to the use of a low-pass filter. These frequencies are restored using two known constraints applied during iterative DCT: (1) correct information in a passband is known and (2) the spatial extent of an image is finite. Simulation results show that the IM-GPDCT outperforms three conventional interpolation methods from both a restoration error and an image quality standpoint.
System for the removal of impulsive noise in image sequences
Anil Christopher Kokaram, Peter J. W. Rayner
This paper presents a system for the restoration of image sequences that are degraded by impulsive noise such as scratches or dropouts. The proposed system uses a multilevel block matching algorithm to estimate the motion between frames and considers the use of an impulsive noise detector to improve the quality of restoration as compared to a global median operation. The detector considers the temporal continuity of motion compensated image information and makes a decision as to whether a suspected discontinuity is due to an impulsive distortion or occlusion in the sequence. When a corrupted portion of the image is detected a motion compensated median filter is used to remove the distortion. The paper introduces an extended multistage filter for image sequence processing. It is found that the use of the detector cannot adversely affect the filtered image when compared to the globally filtered image, and the detail preservation is generally better. The speed of processing is also increased since the number of median filtering operations is considerably reduced.
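A much-simplified sketch of the detect-then-filter idea described above, without the multilevel block-matching motion compensation the system relies on (frames are assumed already aligned, and the detection threshold is an arbitrary choice):
```python
import numpy as np

def remove_impulses(prev, curr, nxt, thresh=30.0):
    """Flag pixels of `curr` that disagree strongly, and in the same direction,
    with BOTH temporal neighbours (likely scratches/dropouts rather than motion),
    then replace only the flagged pixels with a spatio-temporal median."""
    prev, curr, nxt = (f.astype(float) for f in (prev, curr, nxt))
    d1, d2 = curr - prev, curr - nxt
    detected = (np.abs(d1) > thresh) & (np.abs(d2) > thresh) & (d1 * d2 > 0)
    stack = np.stack([prev, nxt,
                      np.roll(curr, 1, 0), np.roll(curr, -1, 0),
                      np.roll(curr, 1, 1), np.roll(curr, -1, 1), curr])
    med = np.median(stack, axis=0)
    out = curr.copy()
    out[detected] = med[detected]
    return out

# toy usage on a smooth frame corrupted by 100 dropouts
base = np.add.outer(np.arange(64.0), np.arange(64.0)) * 2.0
corrupted = base.copy()
rng = np.random.default_rng(1)
corrupted.flat[rng.choice(base.size, 100, replace=False)] = 255.0
restored = remove_impulses(base, corrupted, base)
print(np.abs(corrupted - base).max(), np.abs(restored - base).max())
```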
Spatiochromatic model for color image enhancement
Stuart Wolf, Ran Ginosar, Yehoshua Y. Zeevi
Traditionally the development of color image enhancement techniques has not taken the design of the human visual system into account. Color images have been treated in much the same way as achromatic images and only brightness has been enhanced. In this study we first discuss an opponent color model of the early visual system and then we show the benefit of performing spatial enhancement on both brightness and chroma in the context of the model. The opponent transform is calculated such that cone redundancies are reduced and information compressed, similar to the way it is performed in the visual system. This is done using a multispectral Karhunen-Loeve transform. This results in an achromatic and two opponent chromatic channels, one red-green and the other blue-yellow. Laplacian based edge enhancement is then performed independently on each channel. The spatial enhancement of chroma highlights small color detail and enhances color edges thus increasing image detail in areas where achromatic enhancement has no effect. Due to spatial limitations of the visual system, the best enhanced image is obtained when both achromatic and chromatic edge enhancement are used together.
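The sketch below illustrates the overall flow under stated assumptions: an image-derived Karhunen-Loeve (PCA) transform of the R, G, B channels stands in for the opponent transform, and a plain Laplacian unsharp term stands in for the paper's edge enhancement.
```python
import numpy as np

def klt_opponent_enhance(rgb, alpha=0.5):
    """Decorrelate R,G,B with a Karhunen-Loeve (PCA) transform -- yielding one
    achromatic-like and two opponent-like channels -- sharpen each channel with
    a Laplacian term, then transform back.  A rough sketch of the idea only."""
    h, w, _ = rgb.shape
    X = rgb.reshape(-1, 3).astype(float)
    mean = X.mean(axis=0)
    cov = np.cov((X - mean).T)
    _, vecs = np.linalg.eigh(cov)          # columns are the KLT basis vectors
    chans = ((X - mean) @ vecs).reshape(h, w, 3)
    for c in range(3):
        f = chans[..., c]
        lap = (np.roll(f, 1, 0) + np.roll(f, -1, 0)
               + np.roll(f, 1, 1) + np.roll(f, -1, 1) - 4 * f)
        chans[..., c] = f - alpha * lap    # Laplacian-based edge enhancement
    out = chans.reshape(-1, 3) @ vecs.T + mean
    return out.reshape(h, w, 3)

img = np.random.default_rng(0).uniform(0, 255, (32, 32, 3))
print(klt_opponent_enhance(img).shape)
```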
Classification of distorted images by summation kernels
Victor A. Segalescu, Joseph Segman, Yehoshua Y. Zeevi
For planar deformation generated by commutative Lie transform groups, there exist integration kernels such that the associated integral transforms are invariant under the deformation, in a similar sense to the action of the Fourier transform on shifted functions. In many applications it is useful to have similar invariant summation kernels for some specific representation coefficients. We derive such summation kernels from the existing integration kernels.
Regularized multichannel restoration of color images using cross-validation
Wenwu Zhu, Nikolas P. Galatsanos, Aggelos K. Katsaggelos
Multichannel images are the multiple image planes (channels) obtained by imaging the same scene using multiple sensors. The validity of multichannel restoration where both the within and between channel relations are incorporated has already been established using both stochastic and deterministic restoration filters. However, it has been demonstrated that stochastic multichannel filters are extremely sensitive to the estimates of the between channel statistics. In this paper deterministic multichannel filters are proposed that do not utilize any prior knowledge about the multichannel image and the noise. Regularization based on the multichannel Cross-Validation function is used to obtain these filters. Their relation to stochastic multichannel restoration filters is examined and a technique to estimate the variance of the multichannel noise is proposed. Finally, experiments are shown where proposed filters and noise variance estimators are tested using color images.
Morphological and Related Nonlinear Filters
Adaptive morphological filters for color image enhancement
P. Deng-Wong, Fulin Cheng, Anastasios N. Venetsanopoulos
The structuring element in conventional morphological operations has a fixed size and a fixed shape. As a result, images processed by morphological filters suffer from loss of detail or pattern alteration. These deficiencies are intensified when processing color images: when the three independently processed channels are overlaid to form the final color image output, the artificial patterns created in each of the channels add up to form an unnatural appearance. In this paper, we examine the new opening operator (NOP) and the new closing operator (NCP), whose structuring element has the ability to alter its shape according to the local geometric features of the image. The design algorithms and their use of geometric properties to reduce computational complexity are discussed in detail. Finally, the performance of a morphological filter constructed from the NOP and the NCP is compared to that of conventional closing-opening filters and the vector median filter in processing color images contaminated with noise. It is shown that the adaptive morphological filter removes the noise as effectively as the vector median filter, while providing the best detail preservation among all outputs.
Statistical analysis of median type and morphological filters
Jukka Neejarvi, Lasse Koskinen, Yrjo A. Neuvo
In this paper, we analyze statistical properties of 1-D median-type and morphological filters. Analytical formulas for the expectation of the dilation and closing in the case of i.i.d. Laplacian noise, and explicit formulas for the expectation and variance of the morphological filters in the case of i.i.d. uniformly distributed noise, are given. Noise attenuation figures of the filters for Gaussian, Laplacian, and uniformly distributed noise are shown. It is shown that the noise attenuation of the morphological filters varies considerably depending on the distribution of the noise. We also analyze the five-point Morphological-FIR-Median Hybrid (MFMH) filter and its special case, the three-point Morphological-Median Hybrid (MMH) filter structure. Responses of the MFMH and MMH filters to rectangular pulses are shown, and the behavior of the filters around noisy edges is studied. The performance of the MFMH filter near noisy edges is better than that of the FMH filter, and the properties of the filter are adjustable in greater detail. In addition, an illustrative example of the noise attenuation performance of median-type and morphological filters on an image is shown.
Weighted vector median operation for filtering multispectral data
Risto Wichman, Kai Oistamo, Qin Liu, et al.
Weighted vector median operation for filtering multispectral data is proposed. The operation shares the good qualities of vector median filters, and in addition, brings in extra freedom in the filter design. The proposed generalization of the vector median filters combines component weights, that facilitate the weighting of input signal channels separately, and distance weights that operate on the distances between the input vectors. With this structure the possible differences in the noise statistics in the different signal components can be taken into account. A number of computer simulations with color images is performed to study the properties of the new filtering scheme.
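A small sketch of a weighted vector median of the kind described above, combining distance weights and component weights; the window contents and weight values are made-up examples.
```python
import numpy as np

def weighted_vector_median(vectors, distance_weights=None, component_weights=None):
    """Weighted vector median of a set of multispectral samples.
    `vectors` is (N, C): N samples in the filter window, C components each.
    Returns the input vector minimizing the weighted sum of distances to all
    the others; component_weights scale each channel inside the distance."""
    X = np.asarray(vectors, dtype=float)
    n, c = X.shape
    dw = np.ones(n) if distance_weights is None else np.asarray(distance_weights, float)
    cw = np.ones(c) if component_weights is None else np.asarray(component_weights, float)
    diffs = X[:, None, :] - X[None, :, :]                 # (N, N, C) pairwise differences
    dists = np.sqrt(((diffs * cw) ** 2).sum(axis=2))      # component-weighted L2 distances
    costs = dists @ dw                                    # cost of choosing each candidate
    return X[np.argmin(costs)]

# usage on a 3x3 window of RGB pixels, emphasizing the centre sample
window = np.array([[10, 12, 9], [11, 13, 10], [200, 30, 20],   # one outlier
                   [12, 12, 11], [11, 14, 10], [10, 13, 9],
                   [13, 11, 10], [12, 12, 12], [11, 12, 10]])
print(weighted_vector_median(window, distance_weights=[1, 1, 1, 1, 3, 1, 1, 1, 1]))
```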
Compared performances of morphological, median type, and running mean filters
Demin Wang, Joseph Ronsin, Veronique Haese-Coat
In this paper, the performances of basic morphological filters are evaluated. Very simple output distribution expressions for erosions and openings are given for independent non-identically distributed inputs. The output means and variances for input signals plus white Gaussian, bi-exponential, and uniform noise are then analyzed and computed. These results are used to compare the performances of morphological filters with those of median filters, alpha-trimmed mean filters, ranked-order filters, and running mean filters. The comparisons show that morphological filters achieve the best edge preservation for all three kinds of noise. Regarding noise suppression, morphological filters are the best for uniform noise, median filters are optimal for bi-exponential noise, and running mean filters are optimal for Gaussian noise. The performance of alpha-trimmed mean filters falls between that of median and linear filters, while the performance of ranked-order filters is a compromise between that of erosions (or dilations) and median filters.
Topological considerations on gray-level skeletonization
Joaquin Madrid, Russell M. Mersereau, Norberto F. Ezquerra
Analyzing the topological properties of algebraic opening and closing, we characterized the skeletonization of gray-level signals as a homotopic transformation, i.e., a continuous deformation of the signal into a simple representation that preserves the connectivity of the signal features. To perform such a transformation, we interpret the morphological opening and closing as algorithmic approximations, in which the structuring element imposes the connectivity criteria on the results. Our implementation is a variation of Serra's residual of openings, and is valid for any topology on the working space. We show a particular application and discuss the design of the appropriate structuring elements.
Optimal morphological filters for discrete random sets under a union or intersection noise model
Nicholaos D. Sidiropoulos, John S. Baras, Carlos A. Berenstein
We consider the problem of optimal binary image restoration under a union or intersection noise model. Union noise is well suited to model random clutter (obscuration), whereas intersection noise is a good model for random sampling. Our approach is random set-theoretic, i.e. digital images are viewed as realizations of a uniformly bounded discrete random set. First we provide statistical proofs of some 'folk theorems' of Morphological filtering. In particular, we prove that, under some reasonable worst-case statistical scenarios, Morphological openings, closings, unions of openings, and intersections of closings, can be viewed as MAP estimation of the signal based on the noisy observation. Then we propose a 'generic' procedure for the design of optimal Morphological filters for independent union or intersection noise.
Morphological bounds on nonlinear filters
Mohammed A. Charif-Chefchaouni, Dan Schonfeld
In this paper, we present a general theory on the morphological bounds of nonlinear filters. Conditions for the existence of various morphological bounds on nonlinear filters are proposed. Several fundamental morphological bounds on nonlinear filters are derived. Extensions of morphological bounds on the iterations of a nonlinear filter are also derived. Criteria for a root of a nonlinear filter are derived. Finally, conditions for the convergence of the iteration of a nonlinear filter are proposed.
Vector Quantization
Man-machine interaction in the 21st century--new paradigms through dynamic scene analysis and synthesis (Keynote Speech)
Thomas S. Huang, Michael T. Orchard
The past twenty years have witnessed a revolution in the use of computers in virtually every facet of society. While this revolution has been largely fueled by dramatic technological advances, the efficient application of this technology has been made possible through advances in the paradigms defining the way users interact with computers. Today's massive computational power would probably have limited sociological impact if users still communicated with computers via the binary machine language codes used in the 1950's. Instead, this primitive paradigm was replaced by keyboards and ASCII character displays in the 1970's, and the 'mouse' and multiple-window bit-mapped displays in the 1980's. As continuing technological advances make even larger computational power available in the future, advanced paradigms for man-machine interaction will be required to allow this power to be used efficiently in a wide range of applications. Looking ahead into the 21st century, we see paradigms supporting radically new ways of interacting with computers. Ideally, we would like these interactions to mimic the ways we interact with objects and people in the physical world, and, to achieve this goal, we believe that it is essential to consider the exchange of video data into and out of the computer. Paradigms based on visual interactions represent a radical departure from existing paradigms, because they allow the computer to actively seek out information from the user via dynamic scene analysis. For example, the computer might enlarge the display when it detects that the user is squinting, or it might reorient a three-dimensional object on the screen in response to detected hand motions. This contrasts with current paradigms in which the computer relies on passive switching devices (keyboard, mouse, buttons, etc.) to receive information. Feedback will be provided to the user via dynamic scene synthesis, employing stereoscopic three-dimensional display systems. To exploit the synergism between analysis and synthesis, we will need a common data representation used by both. To illustrate, we give some typical scenarios which could be made possible by these new paradigms.
Lattice vector quantization approach to image coding
Iole Moccagatta, Murat Kunt
In this paper we present an image coding technique which applies a lattice quantizer on the coefficient domain generated by a Gabor-like wavelet transform. The spatial correlation is reduced by this perceptual subband/wavelet transform with efficient implementation. Due to its visual significance, the DC component is PCM coded. The coefficients are then quantized by a block quantizer whose points lie on a lattice, followed by ideal entropy coding. Different truncated lattices have been tested, namely An, Dn, and E8. Simulation results using grayscale images have shown good performance for bit rates lower than 0.5 bits/pixel when the truncated lattice D16 is used.
Adaptive entropy-constrained lattice vector quantization for multiresolution image coding
Marc Antonini, Michel Barlaud, Thierry Gaidon
In many different fields, digitized images are replacing conventional analog images such as photographs or X-rays. The volume of data required to describe such images greatly slows down transmission and makes storage prohibitively costly. The information contained in the images must therefore be compressed by extracting only the visible elements, which are then encoded. The quantity of data involved is thus substantially reduced. High compression rates can be achieved using the wavelet transform and vector quantization (VQ) of the wavelet coefficient subimages. In this paper, we propose a new scheme to vector quantize real Laplacian or generalized Gaussian sources using a multidimensional compandor and lattice vector quantization. We propose an approximation formula to compute the number of points contained in an n-dimensional hypercube--or truncated lattice--when using uniform source data. We also propose an analytical expression for the distortion gain when a uniform source, rather than a Laplacian one, is quantized.
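As background for the lattice quantization step, the sketch below finds the nearest point of the D_n lattice (integer vectors with even coordinate sum) using the standard Conway-Sloane rounding rule; a full coder would additionally scale by a step size, truncate the lattice, and entropy code the result, as the paper describes.
```python
import numpy as np

def nearest_Dn(x):
    """Nearest point of the D_n lattice (integer vectors with even coordinate
    sum) to x: round every coordinate, and if the sum comes out odd, re-round
    the coordinate with the largest rounding error the other way."""
    x = np.asarray(x, dtype=float)
    f = np.rint(x)
    if int(f.sum()) % 2 != 0:
        i = np.argmax(np.abs(x - f))              # worst-rounded coordinate
        f[i] += 1.0 if x[i] > f[i] else -1.0      # round it in the opposite direction
    return f

print(nearest_Dn([0.9, 0.4, -0.1, 0.2]))          # -> [1. 1. 0. 0.], an even-sum point
```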
Vector clustering in symmetry-folded spaces for image vector quantization
Fabio Lavagetto
A significant amount of residual redundancy in vector quantized images can be further reduced by taking into account symmetries such as those presented by rotated or flipped versions of the same vector. Decorrelating data blocks with respect to this kind of symmetry allows for more efficient vector clustering, thus increasing the compression performance of the coder. In this paper an innovative though intuitively simple technique is presented for image vector quantization operating on a symmetry-folded space. The symmetry invariance of the vector space is obtained by rotating and flipping each block in order to minimize a suitable functional: by means of this normalization procedure, rotated or flipped versions of the same block are mapped onto the same point in the folded space, thus clustering together previously scattered vectors. Fewer bits are therefore needed to quantize the redistributed vectors without increasing the level of distortion or, conversely, a lower distortion is achieved at an unchanged bit rate. In order to invert the normalization procedure, some side information must be delivered to the decoder to drive the inverse operations of block rotation and flipping. However, because of the high level of spatial redundancy, this inverse operation can be efficiently predicted by analyzing nearby blocks. Only the prediction error is therefore coded and sent as side information. Experimental results have shown that for many image classes the entropy of the prediction error is fairly small in comparison with the bit saving stemming from the reduced size of the vectors and of the reconstruction look-up table.
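A minimal sketch of the folding step: each block is mapped to a canonical representative of its orbit under the eight rotations/flips, so rotated or flipped copies land on the same folded vector. The lexicographic ordering used here as the 'suitable functional' is just one simple choice, not necessarily the paper's.
```python
import numpy as np

def fold_block(block):
    """Return a canonical representative of the block's orbit under the 8
    rotations/flips, plus the index of the chosen transform (the side
    information a decoder would need to undo the normalization)."""
    b = np.asarray(block)
    candidates = []
    for flipped in (b, np.fliplr(b)):
        for k in range(4):
            candidates.append(np.rot90(flipped, k))
    idx = min(range(8), key=lambda i: tuple(candidates[i].ravel()))
    return candidates[idx], idx

# two blocks that are rotated copies of each other fold to the same vector
a = np.array([[0, 1], [2, 3]])
fa, ia = fold_block(a)
fb, ib = fold_block(np.rot90(a))
print(np.array_equal(fa, fb), ia, ib)
```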
Adaptive entropy-constrained predictive vector quantization of image with a classifier and a variable vector dimension scheme
Rin Chul Kim, Sang Uk Lee
In this paper, an entropy constrained predictive vector quantizer (ECPVQ) for image coding is described, and an adaptive ECPVQ (AECPVQ) technique to take into account the local characteristics of the input image is proposed. The adaptation is achieved by employing a classifier and the variable vector dimension scheme. In the proposed AECPVQ coder, separate predictors and codebooks are prepared for each class. The 6 X 6 input block is classified into one of the predetermined 6 classes according to the distribution of the feature vector in the DCT domain. Then, the input block is partitioned into several small vectors by the proposed variable vector dimension scheme to take into account the orientation of edge and the variances for each class. The vectors in each class are encoded using the corresponding codebook and the predictor. The computer simulation result shows that the proposed AECPVQ outperforms the conventional ECPVQ in terms of both the subjective quality and peak signal to noise ratio. For example, the AECPVQ enjoys a 1.5 dB gain over the ECPVQ at 0.7 bits/pel on the Lena image.
Neural net approach to predictive vector quantization
Nader Mohsenian, Nasser M. Nasrabadi
A new predictive vector quantization (PVQ) technique, capable of exploring the nonlinear dependencies in addition to the linear dependencies that exist between adjacent blocks of pixels, is introduced. Two different classes of neural nets form the components of the PVQ scheme. A multi-layer perceptron is embedded in the predictive component of the compression system. This neural network, using the non-linearity condition associated with its processing units, can perform as a non-linear vector predictor. The second component of the PVQ scheme vector quantizes (VQ) the residual vector that is formed by subtracting the output of the perceptron from the original wave-pattern. Kohonen Self-Organizing Feature Map (KSOFM) was utilized as a neural network clustering algorithm to design the codebook for the VQ technique. Coding results are presented for monochrome 'still' images.
Lapped block decoding for vector quantization of images
Siu-Wai Wu, Allen Gersho
We have investigated an improved decoding paradigm for vector quantization of images. In this new decoding method, the dimension of the code vectors in the decoder is higher than that of the input vectors at the encoder, so that the area covered by each output vector extends beyond the input block of pixels into its neighborhood. The image is reconstructed as an overlapping patchwork of output blocks (code vectors), where the pixel values in the lapped region are obtained by summing the corresponding elements of the overlapping code vectors. With a properly designed decoder codebook, this lapped block decoding technique is able to improve the performance of VQ by exploiting the interblock correlation at the decoder. We have developed a recursive algorithm for designing a locally optimal decoder codebook from a training set of images, given a fixed VQ encoder. Computer simulation with both full search VQ and tree structured VQ encoders demonstrated that, compared to conventional VQ decoding, this new decoding technique reproduces images with not only higher SNR, but also better perceptual quality.
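A sketch of the lapped decoding mechanics under assumed sizes (4x4 encoder blocks, 8x8 decoder codevectors): each VQ index selects an enlarged codevector and the overlapping contributions are summed. A real decoder codebook would be trained for a fixed encoder as the paper describes; a random one is used here only to show the reconstruction step.
```python
import numpy as np

def lapped_decode(indices, decoder_codebook, image_shape, block=4, overlap=2):
    """Overlap-add reconstruction: each index selects a decoder codevector of
    size (block + 2*overlap)^2 that is accumulated over a region extending
    `overlap` pixels into the neighbouring blocks."""
    H, W = image_shape
    out = np.zeros((H + 2 * overlap, W + 2 * overlap))
    for bi, row in enumerate(range(0, H, block)):
        for bj, col in enumerate(range(0, W, block)):
            cv = decoder_codebook[indices[bi, bj]]
            out[row:row + block + 2 * overlap,
                col:col + block + 2 * overlap] += cv
    return out[overlap:-overlap, overlap:-overlap]

# toy usage with a random codebook
rng = np.random.default_rng(0)
codebook = rng.standard_normal((256, 8, 8))             # block=4, overlap=2
idx = rng.integers(0, 256, size=(8, 8))                 # 8x8 blocks -> 32x32 image
print(lapped_decode(idx, codebook, (32, 32)).shape)
```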
Image coding based on classified lapped orthogonal transform vector quantization
Suresh Venkatraman, Jae Yeal Nam, K. R. Rao
Classified transform coding of images using vector quantization has proved to be an efficient technique. Transform vector quantization combines the energy compaction properties of transform coding and the superior performance of vector quantization. Classification improves the reconstructed image quality considerably because of adaptive bit allocation. Block transform coding of images, traditionally using DCT, produces an undesirable effect called the blocking effect. In this paper a classified transform vector quantization technique using the lapped orthogonal transform (LOT/VQ) is presented. Image blocks are transformed using the LOT and are classified into four classes based on their structural properties. These are further divided adaptively into subvectors depending on the LOT coefficient statistics as this allows efficient distribution of bits. These subvectors are then vector quantized. The LOT/VQ is an efficient image coding algorithm which also reduces the blocking effect significantly. Coding tests using computer simulation show the effectiveness of this technique.
Color image coding using variable blocksize vector quantization in (R,G,B) domains
Ching-Yang Wang, Long-Wen Chang
Vector quantization (VQ) is a popular image coding scheme with a high data compression ratio. However, its major drawback is that the edges in the reconstructed image are very poor if the codebook size isn't large enough. In this paper, we use variable blocksize vector quantization (VBVQ) instead of a fixed blocksize VQ to code color images in the (R, G, B) domain. In order to alleviate block effects and preserve edges well, we choose 4 X 4 X 3 as the largest block and 1 X 1 X 3 as the smallest block. For each blocksize we generate a codebook using the LBG algorithm. The image is first decomposed into variable blocks and each block uses its corresponding codebook for vector quantization. In our simulation results, the bit rate ranges from 0.4 bpp (bits per pixel) to 1.1 bpp with a PSNR (peak signal-to-noise ratio) of the Y component from 28 dB to 36 dB. Our VBVQ outperforms conventional fixed blocksize 2 X 2 X 3 VQ with 512 codevectors in the codebook.
Motion Analysis and Video Coding
Motion estimation involving discontinuities in a multiresolution scheme
Michel Barlaud, Laure Blanc-Feraud, Jean-Marc Collin
In this paper, the problem of motion estimation is formulated mathematically and two classical methods are reviewed. Focus is then placed on a slightly different method which offers the advantage of stable convergence while providing a good approximation of the solution. Traditionally, the solution has been stabilized by regularization, as proposed by Tikhonov, i.e., by assuming a priori the smoothness of the solution. This hypothesis cannot be made globally over a field of motion vectors. Hence we propose a regularization process involving motion discontinuities, based on a Markov random field (MRF) model of motion. A new regularization function involving discontinuities is defined. Since the criterion is no longer quadratic, a deterministic relaxation method can be applied to estimate the global minimum. This relaxation scheme is based on the minimization of a sequence of quadratic functionals which tend toward the criterion. The algorithms presented were tested on two sequences: SPHERE, a synthetic sequence, and INTERVIEW, a real sequence.
Simple method to segment motion field for video coding
Bede Liu, King-Wai Chow, Andre Zaccarin
This paper describes a motion-compensated video coding method that employs a simple segmentation scheme to achieve sub-block resolution of the motion field. This increased accuracy of the motion estimation reduces the energy of the residue, or displaced frame difference, by as much as 30%. Including the overhead to send the side information, the average bit rate is reduced by approximately 10%.
Fast block-matching algorithm by successive refinement of matching criterion
Kangwook Chun, Jong Beom Ra
This paper describes a new fast block matching algorithm alleviating the local minimum problem based on successive refinement of motion vector candidates. The proposed algorithm employs a layered structure. At the first layer, a full search is performed with an approximated matching criterion to obtain a candidate set for the motion vector within a short computation time. In each successive searching process, the matching criterion becomes refined and the search is performed with it only for the candidate set obtained at the preceding layer to refine the candidates. By repeating this process, at the last layer, only a single motion vector can be selected from a few candidates using the conventional MAD (Mean Absolute Difference) criterion without approximation. Since a full search is performed with a coarse matching criterion at the first layer, the proposed algorithm can diminish the local minimum problem in the existing fast search algorithms and also reduce the computation time drastically compared with a brute force search.
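A two-layer sketch of the successive-refinement idea (the paper uses a deeper hierarchy of progressively refined criteria): a full search with a coarsely subsampled SAD keeps a few candidate displacements, which are then re-evaluated with the exact criterion. Block size, search range, and candidate count below are assumptions.
```python
import numpy as np

def sad(block, ref, step=1):
    """Sum of absolute differences, optionally on a subsampled pixel grid."""
    return np.abs(block[::step, ::step] - ref[::step, ::step]).sum()

def refine_search(curr, prev, top, left, bsize=16, srange=7, keep=8):
    """Layer 1: full search with a coarse (subsampled) SAD keeps `keep`
    candidates.  Layer 2: the exact SAD picks the final motion vector."""
    block = curr[top:top + bsize, left:left + bsize].astype(float)
    cands = []
    for dy in range(-srange, srange + 1):
        for dx in range(-srange, srange + 1):
            r, c = top + dy, left + dx
            if 0 <= r and r + bsize <= prev.shape[0] and 0 <= c and c + bsize <= prev.shape[1]:
                ref = prev[r:r + bsize, c:c + bsize].astype(float)
                cands.append((sad(block, ref, step=4), dy, dx))     # coarse criterion
    cands.sort(key=lambda t: t[0])
    best = min(cands[:keep],
               key=lambda t: sad(block,
                                 prev[top + t[1]:top + t[1] + bsize,
                                      left + t[2]:left + t[2] + bsize].astype(float)))
    return best[1], best[2]

# toy usage: the previous frame shifted by (down 2, left 3) is recovered as (-2, 3)
rng = np.random.default_rng(0)
prev = rng.uniform(0, 255, (64, 64))
curr = np.roll(np.roll(prev, 2, axis=0), -3, axis=1)
print(refine_search(curr, prev, top=24, left=24))
```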
Overlapped block motion compensation
Cheung Auyeung, James J. Kosmach, Michael T. Orchard, et al.
Block motion compensation is a critical component of most efficient video compression algorithms. Applications range from real-time low bit-rate teleconferencing codecs to moderate and high bit-rate coding standards currently being considered for CD-ROM video storage and packet video. Recent research has suggested that blocking artifacts can be reduced by applying block motion compensation on overlapped blocks in image frames. In this paper, we formulate overlapped block motion compensation as an optimal linear estimator of pixel intensities given the neighboring block motion estimates in the frame. Applying this framework, we propose a procedure for designing optimal windows for overlapped block motion compensation. When applying block motion compensation with overlapped blocks, the optimal motion vector to apply at each block in an image depends on the values of motion vectors in a non-causal neighborhood of that block. Thus, unlike for standard block-based compensation, optimal motion estimates cannot be computed with exhaustive-search block-matching algorithms, even with the overlap window incorporated. Instead, we define a simple iterative procedure for computing optimal motion estimates for overlapped block motion compensation. The performance gains achieved by both optimal motion estimation and optimal overlapped windows are demonstrated in simulations. Together, reductions of up to 30% in displaced-frame-difference energy are demonstrated compared with standard block motion compensation.
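In generic form (our notation), the overlapped prediction described above computes each pixel as a window-weighted combination of the motion-compensated values obtained with the vectors of the blocks whose overlapping windows cover it:

$$ \hat{I}_t(\mathbf{p}) \;=\; \sum_{k \in \mathcal{N}(\mathbf{p})} w_k(\mathbf{p})\, I_{t-1}\bigl(\mathbf{p} + \mathbf{v}_k\bigr), \qquad \sum_{k \in \mathcal{N}(\mathbf{p})} w_k(\mathbf{p}) = 1, $$

where N(p) indexes the blocks overlapping pixel p, v_k are their motion vectors, and w_k are the window weights. The paper's contributions are the design of w_k as an optimal linear estimator and an iterative motion estimation that accounts for the non-causal coupling between neighboring vectors.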
Global-local motion estimation in multilayer video coding
Paola Formenti, Pier Angelo Migliorati, L. Sorcinelli, et al.
In this paper we describe an algorithm for image interpolation that takes into account the global motion of the camera. A description of the motion field in terms of pan and zoom parameters (global motion) and displacements of independently moving objects (local motion) is carried out. First, the estimated global parameters are used to compensate the images for the camera motion. Then the missing images are interpolated using the local motion information. As a last step, the images are reconstructed at their correct dimensions using the global motion information. The performance of the proposed algorithm has been tested within a multilayer video coding scheme. The simulation results show the effectiveness of the proposed motion-compensated interpolation algorithm that uses a global-local description of the motion field.
Spatially adaptive subsampling of HDTV using motion information
Ricardo A. F. Belfor, Reginald L. Lagendijk, Jan Biemond
For the transmission of HDTV signals, data reduction is necessary. In currently implemented systems this data reduction is achieved using sub-Nyquist sampling for the stationary part of the image sequence. If the concept of sub-Nyquist sampling is to be extended to moving parts of the scene, the problem of critical velocities is introduced. We propose to solve this problem by shifting the sampling lattice according to the motion, in such a way that no discarded pixels lie in the direction of the displacements. As such, this method can be called motion-compensated sub-Nyquist sampling. We show how this algorithm can be extended to incorporate fractional accuracy of the motion estimation and combinations with other subsampling structures. The control structure is based on the results of an error analysis of motion-compensated interpolation schemes. The experimental results show an improved performance compared with fixed subsampling and nonadaptive sub-Nyquist sampling.
Efficient MPEG motion compensation scheme by motion trajectory tracking method
Xiaobing Lee, Peifang Zhou, Alberto Leon-Garcia
This paper presents an efficient motion compensation scheme based on 3-D Motion Trajectory Tracking (MTT) for MPEG (Moving Picture Experts Group) video codec implementations. A frame of a video sequence can be segmented into non-motion, traceable-motion, rapid-motion, and new-object regions. A particular macroblock of the frame is located within one of these regions. A 3-D motion trajectory vector can track the motion of this macroblock frame by frame, in parallel with the pre-filtering and frame-grouping processes, as long as it lies in the non-motion or traceable-motion region, or occasionally the rapid-motion region. The 2-D motion displacements of the macroblocks near this trajectory can be approximated from the deviations of the 3-D motion trajectory, since there are strong temporal correlations among such 2-D displacements. The motion estimation (ME) of MPEG encoding can then be performed over a smaller window predicted from the pre-calculated 3-D motion trajectory vectors. For an arbitrary number $N$ of skipped frames, this approach to MPEG ME has only twice the complexity of motion compensation between two adjacent frames (as in H.261-type codecs), rather than $(N+1)^2/N + 2\,[N^2 + (N-1)^2 + \cdots + 2^2 + 1]/N$ times that complexity. Therefore, a real-time, low-cost MPEG codec is feasible even with the maximum number of skipped frames.
Multirate image sequence coding with quadtree segmentation and backward motion compensation
Ligang Lu, William A. Pearlman
In this paper, we present a new image sequence coding scheme which employs backward motion compensation, quadtree segmentation, and pruned tree-structured vector quantization. Based only on previously reconstructed frames, the backward motion compensation technique eliminates the need to transmit the motion displacement vectors as side information, and thus achieves a bit-rate saving. Quadtree segmentation is used to exploit the regional characteristics of the signal. To take advantage of the fact that many large areas of the motion-compensated frame difference contain only low activity, the motion-compensated frame difference is decomposed into large, low-activity blocks and small, high-activity blocks. The large blocks are encoded by their means while the small blocks are encoded by pruned tree-structured vector quantization (PTSVQ) at different rates. PTSVQ is both a multi-rate and variable-rate coding technique and has the fast codebook search property due to its tree structure. It has been shown recently that PTSVQ may outperform full-search unstructured VQ in the low bit-rate range. Excellent results have been obtained in computer simulation of this new scheme. When tested on the Salesman sequence, this interframe coding technique achieved an average peak signal-to-noise ratio of 40.55 dB at an average bit rate of 0.43 bits per pixel, which indicates that the proposed scheme is suitable for low-rate video applications.
Morphological Image Processing I
Morphological multiscale image segmentation
Philippe Salembier, Jean C. Serra
This paper deals with a morphological approach to the problem of unsupervised image segmentation. The proposed technique relies on a multiscale approach which allows hierarchical processing of the data, ranging from the most global scale to the most detailed one. At each scale, the algorithm relies on four steps: preprocessing, feature extraction, decision, and quality estimation. The goal of the preprocessing step is to simplify the original signal, which is too complex to be processed at once. Morphological filters by reconstruction are very attractive for this purpose because they simplify without corrupting the contour information. The feature extraction step extracts the pertinent parameters for assessing the degree of homogeneity of the regions. For this purpose, morphological techniques extracting flat or contrasted regions are very efficient. The decision step defines precisely the contours of the regions. This decision is achieved by a watershed algorithm. Finally, the quality estimation is used to compute the information that has to be further processed by the next scale to improve the segmentation result. The estimation is based on a region modeling procedure. The resulting segmentation is very robust and can deal with very different types of images. Moreover, the first levels give segmentation results with few regions but precisely located contours.
Hausdorff-metric continuity of projection-generated Fourier descriptors
The present paper concerns Fourier descriptors resulting from waveforms generated by geometric projection: a pattern A is projected on a line at angle θ, and the pattern's waveform is given by Proj(A, θ), the length of the pattern's projection on the line. Rotating the line (varying θ) generates the waveform, and the pattern's descriptors are found by appropriately normalizing its DFT. Of interest is the behavior of projection-generated descriptors relative to the Hausdorff metric commonly employed in mathematical morphology, specifically, continuity of the descriptors relative to the Hausdorff metric. The fundamental proposition states that, as a mapping from the space of nonempty compact sets under the Hausdorff metric into the space of complex-valued sequences under the supremum norm, the projection-generated Fourier-descriptor transform is continuous. So long as we concern ourselves with nonempty compact sets, the basic morphological operations of erosion, dilation, opening, and closing are upper semicontinuous with respect to the Hausdorff metric; indeed, dilation is continuous. Hence, application of a morphological filter followed by computation of the projection-generated descriptors produces an upper semicontinuous operation (continuous in the case of dilation). Besides the general theory, the paper includes quantitative bounds on the descriptors for important morphological filters acting on noise images.
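For a finite point set, Proj(A, θ) is simply the width of the set along direction θ, so the waveform and its descriptors are easy to sketch. The sampling density and the normalization by the DC term below are illustrative choices, not necessarily those used in the paper.

```python
import numpy as np

def projection_waveform(points, n_angles=64):
    """Proj(A, theta): length of the projection of point set A onto a line at angle theta."""
    thetas = np.linspace(0.0, 2 * np.pi, n_angles, endpoint=False)
    dirs = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)   # (n_angles, 2)
    proj = points @ dirs.T                                       # (n_points, n_angles)
    return proj.max(axis=0) - proj.min(axis=0)

def fourier_descriptors(points, n_angles=64):
    """DFT of the projection waveform, normalized by its DC term so that the
    descriptors are invariant to uniform scaling (one plausible normalization)."""
    w = projection_waveform(points, n_angles)
    F = np.fft.fft(w)
    return np.abs(F[1:]) / np.abs(F[0])

# A square and a rotated copy give the same descriptor magnitudes.
square = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])
alpha = 5 * (2 * np.pi / 64)            # an integer multiple of the angular sampling step
c, s = np.cos(alpha), np.sin(alpha)
rotated = square @ np.array([[c, -s], [s, c]]).T
print(np.allclose(fourier_descriptors(square), fourier_descriptors(rotated)))
```

Because rotating the pattern only shifts the waveform circularly in θ, the descriptor magnitudes are rotation invariant (exactly so when the rotation is a multiple of the angular sampling step, as in the test above).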
Optoelectronic systems for high-speed threshold-decomposition image processing
In threshold-decomposition image processing, an input grayscale image is decomposed into a sequence of binary slices, each corresponding to a different threshold value. Each slice is processed in some way and the results are added. In an optoelectronic system the processing operation often involves a spatial filtering operation, most likely followed by a nonlinearity. Ranked-order filtering of grayscale images can be performed with digital accuracy. Relatively simple optoelectronic implementation of threshold-dependent processing is also possible. Texture segmentation can be performed independent of illumination level.
Multiresolution morphological analysis of document images
Dan S. Bloomberg
An image-based approach to document image analysis is presented that uses shape and textural properties interchangeably at multiple scales. Image-based techniques permit a relatively small number of simple and fast operations to be used for a wide variety of analysis problems with document images. The primary binary image operations are morphological and multiresolution. The generalized opening, a morphological operation, allows extraction of image features that have both shape and textural properties and that are not limited by properties related to image connectivity. Reduction operations are necessary due to the large number of pixels at scanning resolution, and threshold reduction is used for efficient and controllable shape and texture transformations between resolution levels. Aspects of these techniques, which include sequences of threshold reductions, are illustrated by problems such as text/halftone segmentation and word-level extraction. Both the generalized opening and these multiresolution operations are then used to identify italic and bold words in text. These operations are performed without any attempt at identification of individual characters. Their robustness derives from the aggregation of statistical properties over entire words. However, the analysis of the statistical properties is performed implicitly, in large part through nonlinear image processing operations. The approximate computational cost of the basic operations is given, and the importance of operating at the lowest feasible resolution is demonstrated.
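As a small illustration of the threshold reduction mentioned above (our sketch, not the paper's code), a 2x reduction maps each 2x2 tile of a binary image to a single pixel that is ON when the tile contains at least t ON pixels; varying t trades off preservation of thin strokes against suppression of sparse texture.

```python
import numpy as np

def threshold_reduce_2x(binary_img, t=1):
    """2x threshold reduction: each output pixel is ON iff its 2x2 input tile
    contains at least `t` ON pixels (t=1 acts like dilation plus subsampling,
    t=4 like erosion plus subsampling)."""
    h, w = binary_img.shape
    tiles = binary_img[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return (tiles.sum(axis=(1, 3)) >= t).astype(np.uint8)

img = np.zeros((8, 8), dtype=np.uint8)
img[2:5, 2:5] = 1                        # a small 3x3 component
print(threshold_reduce_2x(img, t=1))     # preserved at half resolution
print(threshold_reduce_2x(img, t=4))     # only fully covered tiles survive
```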
Lossy encoding of document images with the continuous skeleton
Jonathan W. Brandt, V. Ralph Algazi
We examine a new approach to document image compression based on the continuous skeleton shape representation. In particular, we exploit the structure-extracting property of the skeleton transformation to devise a code which allows graceful degradation of reproduction quality in exchange for dramatic increases in compression. At the source, the contours which make up the skeleton-based shape description are approximated by B-spline functions and then quantized and encoded. At the receiver, the skeleton contours are reconstructed and then used to regenerate the document. We apply the method to a standard ensemble of facsimile documents and compare the compression results with conventional lossless methods. The result is that significant compression gains can be achieved while incurring nearly imperceptible loss in the reproduction. This approach is motivated by the principle that the proper level at which to encode a document is at the object or component level, rather than at the pixel level. Approximations made to the descriptions of these objects are much less noticeable than comparable approximations made to the raw pixel data. Thus, adopting this object-centered viewpoint toward document compression allows us to introduce errors progressively, and often unobtrusively.
Character recognition using min-max classifiers designed via an LMS algorithm
Ping-Fai Yang, Petros Maragos
In this paper we propose a least mean square (LMS) algorithm for the practical training of the class of min-max classifiers. These are lattice-theoretic generalizations of Boolean functions and are also related to feed-forward neural networks and morphological signal operators. We applied the LMS algorithm to the problem of handwritten character recognition. The database consists of segmented and cleaned digits. Features extracted from the digits include Fourier descriptors and morphological shape-size histograms. Experimental results using the LMS algorithm for handwritten character recognition are promising. In our initial experiments, we applied the min-max classifier to binary classification of '0' and '1' digits. By preprocessing the feature vectors, we were able to achieve an error rate of 1.75% on a training set of size 1200 (600 of each digit) and an error rate of 4.5% on a test set of size 400 (200 of each). These figures are comparable to those obtained by two-layer neural nets trained using back propagation. The major advantage of min-max classifiers compared to neural networks is their simplicity and the faster convergence of their training algorithm.
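One common way to write such a min-max classifier (our notation; the paper's exact parameterization may differ) is as a maximum of minima over selected literals:

$$ f(x_1, \ldots, x_n) \;=\; \bigvee_{j=1}^{K} \; \bigwedge_{i \in S_j} \ell_i, \qquad \ell_i \in \{\, x_i,\; 1 - x_i \,\}, $$

where ∨ denotes max and ∧ denotes min. On binary inputs this reduces to a Boolean sum-of-products expression, while on real-valued features it acts like a flat morphological erosion/dilation, which is the link to morphological signal operators; the LMS procedure adjusts the parameters of such an expression with gradient-type updates of a squared-error cost.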
Skeleton-chain coding for Chinese characters
Long-Wen Chang, Shui-Kung Chuang, Shang-Shung Yu
The compression of Chinese characters is very important for Chinese office automation and desktop publishing. In this paper, various methods are used to compress 13051 Chinese characters of size 64 X 64 losslessly. A new method, based on coding the skeleton points of each Chinese character, is proposed. The skeleton points are found by morphological operations and are coded using a combination of Huffman coding, Elias coding, and chain coding. A better compression rate is achieved than with conventional methods.
Hierarchical Image Coding II
Statistic model for coding subband images using VQ and arithmetic coding
Andre Nicoulin, Marco Mattavelli
A new entropy coding algorithm for the compression of subband images is presented. By combining vector quantization (VQ) and scalar quantization (SQ) with entropy coding, the proposed scheme exploits the remaining statistical dependencies among the subband samples and keeps optimal control of local distortion through scalar quantization. The system is based on a statistical model which uses VQ information to generate low-entropy probability tables for an arithmetic coder. The bit rate can be shared between the VQ rate and the SQ rate, allowing many possible configurations in terms of performance and implementation complexity. The proposed system shows improved performance when compared with other existing methods.
Perfect reconstruction conditions for subband decomposition with generalized subsampling
Masoud R. K. Khansari, Alberto Leon-Garcia
The operations of block sub-sampling and up-sampling are defined. Perfect reconstruction conditions for 2-band subband coding using these block sampling operations are then derived. It is shown that if the analysis filter banks do not have a common zero, there always exists a subsampling strategy such that perfect reconstruction is possible. The results are then generalized to the case of periodically time-varying filter banks.
Multiplierless suboptimal PR-QMF design
Ali Naci Akansu
This paper searches for suboptimal multiplierless orthonormal PR-QMF solutions. It is shown that multiplierless PR-QMFs exist and that they perform comparably to or better than the known filter banks and DCT-based codecs in the objective and subjective tests performed. They are very efficient to implement in VLSI. It is expected that these PR-QMFs will find applications in real-time image and video coding.
Short-tap and linear-phase PR filter banks for subband coding of images
Jiro Katto, Kunitoshi Komatsu, Yasuhiko Yasuda
This paper presents a filter design algorithm for realizing a highly efficient image codec based on subband decomposition, and also describes unified frameworks for interpreting various image coding techniques. First, various image coding techniques based on linear transforms are treated in a unified manner, in which both the matrix representation in the time domain and the multirate filter bank concept are introduced. Reconsideration of DPCM is stressed here, because subband coding and transform coding have already been formulated in some common frameworks. Neither the filter's frequency response nor its orthogonality is taken into consideration, because they do not necessarily play an important role in our formulation, particularly in the case of DPCM and the SSKF. Secondly, statistical optimization of short-tap and linear-phase PR (perfect reconstruction) filter banks is considered. The UCG (unified coding gain) that we proposed at VSIP '91 as a performance measure of the energy compaction properties of a multirate filter bank is extended to multiple-layer cases, and several new examples are presented. The short-tap structure leads to low computational complexity, and the linear-phase property contributes not only to removing phase distortion but also to solving the so-called border problem. Simulation results are also shown, and the validity of our approach is confirmed.
Layered coder using subband approach
King N. Ngan, Khok Khee Pang
For service interworking, scalability is an essential feature of a generic video coder. Layered coding achieves scalability by splitting the video information into different layers which can be coded independently. In this respect, subband coding is well suited to layering. In this paper, we present a layered coding architecture based on subband coding. The base layer provides an image of lower resolution while the enhancement layer (together with the base layer) gives a higher resolution image. The base layer can be transmitted in a high-priority channel in an ATM environment to provide cell loss resilience.
Image compression using subband/wavelet transform and adaptive multiple-distribution entropy coding
Serafim N. Efstratiadis, Bruno Rouchouze, Murat Kunt
Image compression methods using the subband/wavelet transform and adaptive multiple-distribution entropy coding (AMDEC) are presented. The methods are suitable for the coding of still images as well as motion-compensated prediction error images for video compression. First, the subband/wavelet transform coefficients are uniformly quantized. For still image coding, perception-based weighted quantization is used. Space-filling scanning, subband partitioning, and classification methods are used to divide the original source of quantized coefficients into a number of subsources with corresponding distributions. A hierarchical partition priority coding (PPC) approach is followed; that is, given a suitable partitioning of their range, the transform coefficients are ordered based on their magnitude. AMDEC is applied to the output of PPC, which contains magnitude and location information, using adaptive arithmetic coding based on the histograms of the various subsources. Experimental results on standard monochromatic images and video-conference image sequences demonstrate the strong performance of the proposed AMDEC methods.
Video coding with motion-compensated subband/wavelet decomposition
Hirohisa Jozawa, Hiroshi Watanabe
A hybrid video coding scheme using motion-compensated subband decomposition is proposed. In the proposed method, interframe prediction is performed on the subband-domain data. Subband data for interframe prediction are obtained by applying the analysis filter to image blocks already shifted by the amount of estimated motion. Simulation results show that the efficiency of subband coding is significantly improved by the proposed hybrid coding method. Motion-compensated subband decomposition is well suited to subband/wavelet filter banks.
Design and analysis of directional 2D nonseparable perfect reconstruction filter banks for subband coding of images and video signals
Chang-Lin Huang, Chen-Chang Lien
In this paper, we develop a directional 2-D non-separable filter bank which can perform perfect reconstruction of the downsampled subband signals. The filter bank represents a union of two powerful image and video processing tools: directional decomposition and subband decomposition. The subband decomposition is implemented by: (1) shifting the input signal and the subband signals; (2) using a tree-structured diamond-shaped prefilter followed by downsampling on quincunx grids; and (3) applying four types of parallelogram prefilters followed by four different downsampling matrices, respectively. This paper addresses the design and implementation of two-channel filter banks for such applications. The two-band subsystem in the tree-structured filter bank is analyzed and proved to provide perfect reconstruction of the downsampled subband signals. Our method is computationally very simple in designing the analysis/synthesis subfilters for the filter bank, without using any nonlinearly constrained numerical optimization. Finally, we use conventional 1-D analysis/synthesis filters as prototypes and then apply the McClellan transform to obtain the specific 2-D diamond-shaped and parallelogram-shaped subfilters.
Video Coding II
Image analysis for adaptive noise reduction in super high-definition image coding
V. Ralph Algazi, Todd Randall Reed, Gary E. Ford, et al.
The encoding of super high definition (SHD) images presents new problems with regard to the effect of noise on image quality and on coding performance. Although the information content of images decreases with increasing resolution, the noise introduced in the image acquisition or scanning process remains at a high level, independent of resolution. Although this noise may not be perceptible in the original image, it will affect the quality of the encoded image if the encoding process introduces correlation and structure in the coded noise. Further, coder performance will be affected by the noise even if the noise is not perceived. Therefore, there is a need to reduce the noise by pre-processing the SHD image, so as to maintain image quality and improve the encoding process. The reduction of noise cannot be performed by low-pass filtering operations that would degrade image quality. We therefore apply image analysis for adaptive noise removal. We first discuss the information-theoretic issues concerning the effect of noise on coders. We then consider adaptive noise removal techniques for the perceptually transparent and very high quality coding of still SHD images.
Video coding with adaptive quantization and rate control
Caspar Horne, Atul Puri
Adaptive quantization and rate control are key components in the performance of video coding schemes. Within the framework of the ongoing second phase of the Moving Picture Experts Group standard (MPEG-2), various improvements in the adaptive quantization and rate control concepts developed for MPEG-1 based coding are possible. In this paper, we first review the adaptive quantization schemes developed so far for MPEG-based coding. We then analyze the performance of one such scheme and compare its behavior with that expected from a good adaptive quantization scheme. Next, we propose several modifications that have the potential to substantially improve the performance of this scheme.
Entropy-constrained quantization for existing video decoders
Yong Han Kim, Katsutoshi Sawada
Once video coding standards are finalized, decoders are completely specified so that encoders from different providers can maintain interoperability. However, some of the encoding parameters, such as the decision thresholds of the embedded quantizers and the buffer-control schemes, remain at the designer's discretion. Taking advantage of this freedom, this paper concentrates on the design of encoders for existing decoders. More specifically, an entropy-constrained design approach is described for the quantizer decision thresholds within encoders, while the reconstruction levels and the variable-length code (VLC) table remain unchanged. The efficiency of the new method is demonstrated through an example of the well-known Lloyd-Max quantizers operating on broad-tailed, generalized-Gaussian-distributed memoryless sources.
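A minimal sketch of the entropy-constrained encoding rule implied above, assuming fixed reconstruction levels and VLC codeword lengths (the 4-level quantizer, the code lengths, and the Lagrange multiplier below are hypothetical):

```python
import numpy as np

def ecq_index(x, recon_levels, code_lengths, lam):
    """Entropy-constrained encoding rule: with the reconstruction levels and the
    VLC codeword lengths fixed (as they are by the standard decoder), pick the
    index minimizing squared error plus lambda times the rate.  The decision
    thresholds are whatever boundaries this rule induces."""
    costs = (x - recon_levels) ** 2 + lam * code_lengths
    return int(np.argmin(costs))

# Illustrative (hypothetical) 4-level quantizer and code lengths.
levels = np.array([-3.0, -1.0, 1.0, 3.0])
lengths = np.array([3, 1, 2, 3], dtype=float)   # short code for the most probable level
for x in (-0.2, 0.4, 1.2):
    print(x, "->", ecq_index(x, levels, lengths, lam=2.0))
```

The decision thresholds are simply the boundaries induced by this rule; note in the example how the short codeword pulls the threshold between indices 1 and 2 away from the midpoint of the reconstruction levels, so the input 0.4 is still mapped to index 1.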
Adaptive error concealment algorithm for MPEG compressed video
Huifang Sun, Joel Zdepski
This paper presents an adaptive error concealment technique for MPEG (Moving Picture Experts Group) compressed video. Error concealment algorithms are essential for many practical video transmission scenarios characterized by occasional data loss due to thermal noise, channel impairments, network congestion, etc. Such scenarios of current importance include terrestrial (simulcast) HDTV, teleconferencing via packet networks, and TV/HDTV over fiber-optic ATM (asynchronous transfer mode) systems. In view of the increasing importance of MPEG video for many of these applications, a number of error concealment approaches for MPEG have been developed and are currently being evaluated in terms of their complexity vs. performance trade-offs. Here, we report the results of recent work on a specific adaptive algorithm that provides excellent robustness properties for MPEG-1 video transmitted on either one- or two-tier transmission media. Receiver error concealment is intended to ameliorate the impact of lost video data by exploiting available redundancy in the decoded picture. The concealment process must be supported by an appropriate transport format which helps to identify the image pixel regions that correspond to lost video data. Once the image regions (i.e., macroblocks, slices, etc.) to be concealed are identified, a combination of temporal and spatial replacement techniques may be applied to fill in the lost picture elements. The specific details of the concealment procedure depend upon the compression algorithm being used and on the level of algorithmic complexity permissible within the decoder. Simulation results obtained from a detailed end-to-end model that incorporates MPEG compression/decompression and a custom cell-relay (ATM type) transport format are reported briefly.
Segmentation-based motion estimation and residual coding for packet video: a goal-oriented approach
Day-Fann Shen, Sarah A. Rajala
Image transmission via packet-switched networks has a significant impact on encoded image data. To develop an efficient image codec for packet video, the goals of image coding are redefined and formulated as an optimization problem. Guided by these goals, a set of design requirements is derived and a new segmentation-based coding technique is developed. This approach features region-based motion estimation, region-based residual coding, and region-based single-frame coding. The performance of the proposed algorithm is evaluated and a packet-loss compensation algorithm is presented. As a result, good image quality at very low bit rates can be achieved.
Joint source coding and packetization for video transmission over ATM networks
Qin-Fan Zhu, Yao Wang, Leonard G. Shaw
DCT-based image and video coding methods are tailored to fit the ATM environment. Coding, packetization, and reconstruction methods are designed jointly to achieve a good compromise among compression gain, system complexity, processing delay, error concealment capability, and reconstruction quality. The JPEG and MPEG algorithms for image and video compression are modified to incorporate even-odd block interleaving in the spatial domain and DCT coefficient segmentation in the frequency domain to conceal the errors due to packet loss. When combined with proper layered transmission, the proposed system can handle very high packet loss rates at only a slight cost in compression gain, system complexity, and processing delay.
Complete bit rate models of digital TV and HDTV codecs for transmission on ATM networks
This paper aims to describe digital TV and HDTV codecs as variable bit-rate sources to be transmitted on ATM networks. The bit rates are studied and modeled at different time scales, ranging from the sub-image level (a few microseconds) up to the program level (several hours). At the lowest level, corresponding to the microscopic time constants of the network, the cell interarrival times are considered. At higher levels, corresponding to the macroscopic time scales of the image encoding algorithm, the bit or cell rates are considered as counting processes over different time spans. To validate the theoretical approach, an experiment has been set up in which a digital TV codec processes 25 hours of real TV material recorded on D1 tapes. The bit rates and the cell interarrival times have been collected, and statistics are computed to validate the proposed models.
Morphological Image Processing II
Euclidean skeletons and conditional bisectors
Hugues Talbot, Luc M. Vincent
This paper deals with the determination of skeletons and conditional bisectors in discrete binary images using the Euclidean metric. The algorithm proceeds in two steps: first, the centers of the Euclidean maximal discs (CMD) included in the set to be skeletonized are characterized and robustly identified. Second, a firefront propagation is simulated starting from the set boundaries, in which pixels which are not centers of maximal discs and are not crucial to homotopy preservation are removed. Not only is the resulting algorithm fast and accurate, it allows the computation of a vast variety of skeletons. Furthermore, it can be extended to provide conditional bisectors of any angular parameter θ. This leads to the introduction of a new morphological transformation, the bisector function, which synthesizes the information contained in all the θ-conditional bisectors. The interest of all these skeleton-like transformations is illustrated on the segmentation of binary images of glass fibers.
Fast image compression using distance function on curved space
In this paper, a new fast image compression method utilizing an improved version of the Distance Function On Curved Space (DTOCS) is presented. The maxima of the distances are used directly to select control points. Also, a new concept, a varying structuring element with a curvature constant, is applied to the calculation of distances. Its influence on the compression results is studied.
Bit plane decomposition and shape analysis for morphological skeletonization
Tun-Wen Pai, John H. L. Hansen
This paper addresses the problem of structuring element selection in the context of a morphological grayscale image communication system. The morphological skeleton representation in discrete space provides a means of lossless coding, and the coding efficiency is further improved in its minimized version by choosing a more appropriate structuring element. For an image with a consistent shape distribution, such as a texture pattern, a more efficient and useful skeleton representation is expected. Analysis of simulated and natural image patterns shows that the number of activated points in a morphological skeleton ranges between 30 and 327 for different structuring elements. A procedure is proposed which allows the selection of a more effective structuring element from a basis set of structuring elements. The decision process of the multiprototype pattern classification is based on the minimum-distance measurement between the chain-code edge vector of the object and the basis set of structuring elements. For a grayscale image communication scheme, the binary morphological skeleton transformation provides a progressive transmission framework. This framework is based on bit-plane decomposition with Gray code mapping. The progressive communication system is useful for searching image databases over a narrowband communication channel. Once the image of interest is found, the progressive communication system can provide complete knowledge of the image without loss of any information. The proposed bit-plane skeleton transmission system achieves data compression ratios from 2.36 to 4.28 in this study, while the original image can be reconstructed exactly using the entire set of decomposed bit planes.
Shape description by a distribution function based on morphological decomposition
Tadahiko Kimoto, Motohiro Asai, Yasuhiko Yasuda
A new method of describing the shape of a silhouette for data compression is proposed. A silhouette is decomposed into a union of ellipsoids. The shape decomposition algorithm is based on mathematical morphology. By this algorithm, the ellipsoids are determined in descending order of size. The extracted ellipsoids are represented in a tree structure according to both size and adjacency. The morphological closing operation is used to measure the distance between two ellipsoids. In this tree representation, the ellipsoids are classified into two categories: those that expand the internal structure of the region, and those that are located to fill the gaps between other ellipsoids. The tree indicates the order in which the ellipsoids progressively reproduce the internal structure of the region. Also, a sub-tree defines a partial structure of the region. The strategy for achieving data compression is to truncate the sequence of ellipsoids. To fill the gaps caused by the discarded ellipsoids, each ellipsoid is replaced by one defined by a density distribution function, called a metaellipsoid. Simulation results have shown that the area of the difference between the region reproduced from the set of metaellipsoids and the original region is approximately one-half of that obtained with the union of the ordinary ellipsoids, for the same amount of data.
Morphological shape description using geometric spectrum on multidimensional binary images
Frank Yeong-Chya Shih, Christopher Chamin Pu
In this paper we present a useful morphological shape description tool, called the geometric spectrum or G-spectrum, for quantifying geometric features of multidimensional binary images. The basis of this tool is the cardinality of a set of non-overlapping segments in an image obtained using morphological operations. The G-spectrum preserves translation invariance. With a chosen set of isotropic structuring elements, the G-spectrum also preserves rotation invariance. After normalization, the G-spectrum can also preserve scaling invariance. The properties and proofs of the G-spectrum are discussed.
Classifier system for learning spatial representations based on a Pebble_Pond morphological wave-propagation algorithm
Michael M. Skolnick
Pebble_Pond performs morphologically based wave propagation on an input set of points on the plane, with the points corresponding to the locations of detected image features. The waves are allowed to pass through each other, resulting in a complex evolving state space from which a diverse class of non-planar spatial measures and structures can be obtained, e.g., all k nearest neighbors, k-th order Voronoi tessellations, and k-th order Gabriel graphs. One perspective on Pebble_Pond is that it transforms spatial structure into temporal structure. That is, at each iteration of the wave propagation, measures on the state space reflect spatial structure at the scale corresponding to the current iteration. Thus, at each iteration, all measures obtained (in parallel) from the state space report on all spatial relations falling within the distance that the waves have propagated. This paper investigates how particular measures of the underlying state space can be used to provide basic input to a spatial learning system. The learning system described is based on induction via classifier systems that are modified by bucket-brigade and genetic algorithms. Classifier systems are especially useful in environments where pattern recognition actions at given time steps need to be linked with related pattern recognition actions occurring at later time steps. This paper describes how a classifier system can be built which exploits the parallelism and temporal ordering of the measures that arise from the basic Pebble_Pond algorithm.
Texture classification by gray-scale morphological granulometries
Yidong Chen, Edward R. Dougherty
Binary morphological granulometric size distributions were conceived by Matheron as a way of describing image granularity (or texture). Since each normalized size distribution is a probability density, feature vectors of granulometric moments result. Recent application has focused on taking local size distributions around individual pixels so that the latter can be classified by surrounding texture. The present paper investigates the extension of the local-classification technique to gray-scale textures. It does so by using forty-two granulometric features, half generated by opening granulometries and a dual half generated by closing granulometries. After training and classification on both dependent and independent data, feature extraction (compression) is accomplished by means of the Karhunen-Loeve transform. The effect of randomly placed Gaussian noise is investigated.
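A minimal sketch of how gray-scale granulometric features can be computed (assuming square flat structuring elements and scipy; the sizes and the moment count here are illustrative, not the forty-two features used in the paper):

```python
import numpy as np
from scipy import ndimage

def granulometric_moments(img, max_size=8, n_moments=3):
    """Pattern-spectrum features: volume removed by gray-scale openings of
    increasing size, normalized to a probability density, then summarized by
    its first moments (one common way to build granulometric feature vectors)."""
    volumes = [img.sum()] + [ndimage.grey_opening(img, size=(k, k)).sum()
                             for k in range(2, max_size + 1)]
    removed = -np.diff(volumes).astype(float)      # volume lost at each scale
    density = removed / removed.sum()              # normalized size distribution
    sizes = np.arange(2, max_size + 1, dtype=float)
    return [float((density * sizes ** m).sum()) for m in range(1, n_moments + 1)]

rng = np.random.default_rng(2)
texture = ndimage.grey_dilation(rng.random((64, 64)), size=(3, 3))
print(granulometric_moments(texture))
```

For the local-classification setting described above, the same computation would be carried out in a window around each pixel, with a dual set of features obtained by replacing openings with closings.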
Fractals and Wavelets
Fast algorithm to select maps in an iterated function system fractal model
Greg Vines, Monson H. Hayes III
A new algorithm is proposed for determining the interpolation points for an Iterated Function System (IFS) model for one-dimensional data. The algorithm quickly selects points which are shown to provide favorable results when compared to an exhaustive search of all possible points. The algorithm is based on a recent proof which relates the fixed points of the IFS maps to the extremum points of the attractor of the IFS. The resulting algorithm greatly reduces the search time for the best interpolation points, and results are given comparing the proposed algorithm to exhaustive searches for a small number of maps on a series of test files.
Image classification and segmentation using multichannel fractal modeling
Dimitrios K. Kaloyeras, Stefanos D. Kollias
Multichannel fractal modeling of color images is proposed in this paper as an efficient tool for image classification and segmentation. An extension of the fractal dimension to appropriate matrix forms is used for this purpose, and it is shown to lead to more accurate representations of the original images. Various factors affecting these representations are investigated, and artificial neural networks are used to classify the derived feature sets and segment the images.
Fractional Brownian motion: a maximum-likelihood estimator for blurred data
Rachid Harba, William J. Ohley, Stephan Hoefer
Fractional Brownian motion is a useful tool for describing many objects and phenomena. But in the case of real data, the estimation of the H parameter is corrupted by noise and sometimes blur. Maximum likelihood estimation of H can take these perturbations into account. This communication deals with the problem of blur, which is modeled by a low-pass filter. It is then possible to rewrite the autocorrelation function of the data, and the estimation of H is performed. The Cramer-Rao lower bound (CRLB) is stated. Finally, synthetic data demonstrate that the estimation of H is possible even if the signal is blurred. The variance of the estimates is compared with the CRLB and shows the quality of the results.
Fast algorithm for lapped nonorthogonal transform: application to the image Gabor transform
Michel Poize, Marc Renaudin, Patrick Venier
A fast algorithm to solve the problem of expanding a one- or two-dimensional finite and discrete signal into a lapped non-orthogonal time-modulated set of functions is described, thus providing a solution to the particular problem of the Gabor transform of images. Starting with current methods, such as Bastiaans' auxiliary functions and the Zak transform, we describe a new algorithm resulting in a significant decrease in CPU time for image Gabor coefficients with only a slight approximation. The complexity of this algorithm is equivalent to the block Fourier transform plus 2 to 4 operations for each pixel. The global complexity is thus O(M), compared with the most rapid current method at O(M log M). The algorithm is presented in two formalisms, the transform formalism and the filtering formalism, in order to show the interdependence of the two approaches. Finally, the theoretical results are demonstrated by an implementation study of the new fast algorithm.
Gabor representation with oversampling
Meir Zibulski, Yehoshua Y. Zeevi
An approach for characterizing the properties of the basis functions of the Gabor representation in the context of oversampling is presented. The approach is based on the concept of frames and utilizes the Piecewise Zak Transform (PZT). The frame operator associated with the Gabor-type frame, the so-called Weyl-Heisenberg frame, is examined for a rational oversampling rate by representing the frame operator as a matrix-valued function in the PZT domain. Completeness and frame properties of the Gabor representation functions are examined in relation to the properties of the matrix-valued function. The frame bounds are calculated by means of the eigenvalues of the matrix-valued function, and the dual-frame, which is used in calculation of the expansion coefficients, is expressed by means of the inverse matrix.
Image representation by wavelet-type transforms based on the similarities group
Joseph Segman, Victor A. Segalescu, Yehoshua Y. Zeevi
In this paper we discuss the wavelet approach to image representation according to the action of the similarities group, i.e., according to the action of radial scaling, angular rotation and shifting. We find sufficient conditions for a discrete set of such wavelet-type functions in order to constitute a frame of L2(R2) images. Finally, an iterative algorithm to compute the desired representation coefficients is presented, and a lower bound on the rate of convergence of the algorithm is offered.
Novel approach to template recognition by image expansion with nonorthogonal wavelets
Jezekiel Ben-Arie, Raghunath K. Rao
This article concentrates on three related issues: a generalized scheme for non-orthogonal image representation, expansion matching of templates and implementation of expansion with restoration techniques. First, the requirements for general representation of discrete L2(R) signals by expansion with non-orthogonal basis functions (BFs) are examined. It is proved that both circulant and truncated self-similar BFs in a dense configuration have to satisfy only minor conditions to serve as complete bases for discrete L2(R) signals. A novel Discriminative Signal-to-Noise Ratio (DSNR) is then defined. The DSNR is more relevant to template matching since it considers as 'noise' even the filter's off-center response to the template. Maximization of this performance criterion leads to an expansion scheme that employs a BF set that consists of template-similar basis functions. In addition, it is proved that such an expansion matching is precisely equivalent to minimum squared error restoration with Wiener filters, thus enabling an efficient implementation of our technique.
Architectures for Image and Video Processing
Systolic array architecture for real-time Gabor decomposition
Giridharan Iyengar, Sethuraman Panchanathan
In this paper, we propose a combined systolic array--content addressable memory architecture for image compression using Gabor decomposition. Gabor decomposition is attractive for image compression since the basis functions match the human visual profiles. Gabor functions also achieve the lowest bound on the joint entropy of data. However these functions are not orthogonal and hence an analytic solution for the decomposition does not exist. Recently it has been shown that Gabor decomposition can be computed as a multiplication between a transform matrix and a vector of image data. Systolic arrays are attractive for matrix multiplication problems and content addressable memories (CAM) offer fast means of data access. For an n X n image, the proposed architecture for Gabor decomposition consists of a linear systolic array of n processing elements each with a local CAM. Simulations and complexity studies show that this architecture can achieve real-time performance with current technology. This architecture is modular and regular and hence it can be implemented in VLSI as a codec.
Systolic array for real-time morphological image processing
Elias S. Manolakos, Jinhong K. Guo
In this paper we systematically synthesize a mesh-type VLSI application-specific array architecture for real-time morphological image processing. The array can perform dilation, erosion, or successive combinations of the two (opening and closing) at video rates. The 2-D image can be gray-level (or binary) and the structuring element of any arbitrary shape, as long as its bounding box is known. A partitioning scheme is proposed that can be used to decompose and match a large-size problem to a smaller target array.
Efficient bit-level systolic arrays for QMF banks
Chia-Wen Lin, Yung-Chang Chen, Chin-Liang Wang
In this paper, various systolic arrays are proposed for application to quadrature mirror filter (QMF) banks. A word-level systolic array is first presented to realize QMF banks. It is subsequently refined to a bit-level array with bit-parallel arithmetic via the well-known two-level pipelining technique and is then converted to bit-serial form using the bit-serial inner-product array proposed by Wang et al. By applying the polyphase representation and fully exploiting the special relations among the QMFs, the whole filter bank (aside from the memory cost) can be constructed using only about one half of the hardware expense of a prototype filter. In comparison with the direct realization using the polyphase representation, the number of systolic multiplier-accumulators (SMAs) required for our architecture is halved. Thus, both the chip area and the transistor count are reduced. As a result, with today's commercial CMOS technology, the whole filter bank can be implemented within a single chip for various video applications.
High-speed binary-image processor for reduction conversion and image rotation
Hiroyuki Matsumoto, Ikuro Oyaizu
A high-speed binary-image processor has been developed to perform reduction conversion while suppressing image degradation. The processor adopts two newly developed reduction- conversion methods: thin-line preservation reduction and multilevel display reduction, and also incorporates a 90-degree rotation function indispensable to image processing. For A4-size image data having a resolution of 200 dpi, the processor performs 1/3 reduction in 95 msec and rotation by 90 degrees in 66 msec.
Implementation of vector quantization techniques in the transform domain
Herbert Plansky
This article presents a coding scheme using variable block-size transform coding together with vector quantization (VQ). The coding scheme yields satisfactory picture quality at bit rates of about 0.3 - 0.6 bit/pel (coding of the luminance signal only). The coding technique is suited for multimedia, computer, and distribution applications due to its asymmetry in complexity and its inherent hierarchical structure. The picture is segmented into rectangles of different sizes. These rectangles are transformed by a two-dimensional DCT and coded by VQ based on analysis in the spatial and transform domains. A decomposition scheme of the rectangles into vectors, adapted to non-stationary signals such as edges, is introduced. Computer simulations compare the results of constant and variable block-size TVQ. The second part of the article discusses the problems of implementing VQ that result from large code tables and high computational complexity. Strategies for avoiding these drawbacks are presented.
VLSI Reed-Solomon decoder
Yong Hwan Kim, Young Mo Chung, Sang Uk Lee
In this paper, a VLSI architecture for a Reed-Solomon (RS) decoder based on the Berlekamp algorithm is proposed. The proposed decoder provides both erasure and error correcting capability. In order to reduce the chip area, we reformulate the Berlekamp algorithm. The proposed algorithm possesses a recursive structure, so the number of cells for computing the errata locator polynomial can be reduced. Moreover, in our approach, only one finite field multiplication per clock cycle is required for implementation, providing an improvement in decoding speed. The overall architecture features a parallel and pipelined structure, making real-time decoding possible. It is shown that the proposed VLSI architecture is more efficient in terms of VLSI implementation than the architecture based on the recursive Euclid algorithm.
Hierarchical multiprocessor system for video signal processing
Joerg Wilberg, Matthias Schoebinger, Peter Pirsch
The architecture of a hierarchical multiprocessor (MP) system for video coding is discussed. The topmost level of the proposed MP system consists of identical, bus-connected processing elements (PEs). A heterogeneous MIMD (multiple instruction, multiple data) architecture is proposed for the PE. The PE contains a shared local memory and processing units which are adapted to specific tasks. A strategy for optimizing the efficiency (defined as the inverse of area X processing time) at different levels of the hierarchy is proposed. This makes it possible to realize an H.261 video codec with only a few, if not a single, PE. The efficiency of each PE can be increased if multiple data blocks are processed concurrently within the PE (macropipelining). On the basis of a 1.0 micrometer CMOS technology, a single PE (clock rate 50 MHz) can process the H.261 video codec (except variable-length coding and decoding) for CIF images at a frame rate of 18 Hz. Assuming a 0.6 micrometer CMOS technology, a single PE is expected to process frame rates of 30 Hz. A rough estimate of the silicon area for this technology is on the order of 100 mm2.
Open architecture television for motion-compensated coding
V. Michael Bove Jr., Edmond Chalom
Open Architecture Television centers on the development of digital image representations that allow video to be displayed at resolutions and frame rates that do not necessarily match the numerical parameters of the camera. We begin an investigation of frame-rate conversion in the reconstruction of motion-compensated image sequences, and suggest a simple change in the generation of the motion vectors which enhances the quality of the images produced.
Video Coding III
Interlaced image sequence coding for digital TV using adaptive quantization
Bruno Rouchouze, Touradj Ebrahimi, Frederic Dufaux, et al.
A method for interlaced image sequence coding for digital TV is presented. In order to obtain a more efficient compression, only the fields of one parity are compressed instead of compressing the frames resulting from interlaced to progressive format change. The fields of the other parity are predicted using spatio-temporal interpolation based on the corresponding decoded fields, and the prediction error is coded and transmitted. In this way, the decoder can reconstruct the odd and even parity fields with a reduced transmission cost. Experimental results, where the proposed interlaced coding method is applied to Gabor-like wavelet transform coding of MPEG2 image sequences, show a very good performance of the proposed scheme.
Video coding using switched motion compensation and high-order entropy coding
Ting-Chung Chen, Shawmin Lei
High quality video transport is a key to the successful offering of networked multimedia or entertainment video services, especially under stringent bandwidth limitations. Video coding techniques that improve visual quality are thus of much interest. This paper reports a video coding scheme designed to transmit high quality HDTV at about 15 Mbps, CCIR standard video at about 4 Mbps, or common intermediate format (CIF) video at about 1 Mbps. To improve on the known art of video compression, two techniques were explored further. One reduces the temporal redundancy of the input information, essentially by using a switched motion compensation technique. The other takes further advantage of spatial correlations in lossless spatial coding, essentially by using high-order entropy coding of the quantized symbols. Two representative motion compensation techniques, conventional frame-based motion compensation and a new field-adjusted motion compensation (FAMC) currently being proposed to the MPEG-2 committee, were analyzed and compared with the proposed switched motion compensation. Based on the mean square prediction error of four test sequences, the new scheme achieves a consistent gain, at twice the processing complexity of the conventional scheme but at only 5% of the processing complexity of FAMC. This forward directional prediction is especially suitable for low-delay coding in interactive visual communications, even though it can also incorporate bi-directional prediction to improve performance. The high-order entropy coding technique explored in this paper gives a 27% lower rate than conventional run-length/Huffman coding. The required number of high-order entropy code tables is reduced by 97% by our code table reduction technique. Compared to the results we reported previously, the required number of code tables is further reduced by about 40% by merging similar codebooks from different components of the video source. The combined scheme using block-switched motion compensation and high-order entropy coding reduces the total rate by 25% compared with the compression scheme we tested before, and achieves comparable, very good visual quality.
Motion-adaptive quantization in transform coding for exploiting motion masking effect
Nan Li, Stefaan Desmet, Albert A. Deknuydt, et al.
In this paper, we investigate a transform quantization scheme for exploiting the motion masking effect (MME). The study is carried out by adapting the quantizer with respect to the movement in the image. The strategy is based on a velocity-dependent HVS model. Through simulation, motion-adaptive quantization (MAQ) is shown to be effective for intra-field coding. A scheme for motion-compensated DCT coding with MAQ is then presented. Technical points essential to its realization are explained.
Scalable video coding in frequency domain
Scalable video coding is important in a number of applications where video needs to be decoded and displayed at a variety of resolution scales. It is more efficient than simulcasting, in which all desired resolution scales are coded completely independently of one another within the constraint of a fixed available bandwidth. In this paper, we focus on scalability using the frequency domain approach. We employ the framework proposed for the ongoing second phase of the Moving Picture Experts Group standard (MPEG-2) to study the performance of one such scheme and investigate improvements aimed at increasing its efficiency. Practical issues related to multiplexing the encoded data of various resolution scales to facilitate decoding are considered. Simulations are performed to investigate the potential of a chosen frequency domain scheme. Various prospects and limitations are also discussed.
Segmentation-based texture coding algorithm for packet video: a goal-oriented approach
Day-Fann Shen, Sarah A. Rajala
The design of an image coder for packet-switched transmission is formulated as a minimization problem. A general set of design requirements is derived and used to design a segmentation-based texture coding algorithm. The segmentation process is performed on a pyramid data structure and uses the just noticeable difference (JND) of the human visual system as the merge criterion. To reduce the bit rate while maintaining image quality, each region is classified as either texture or non-texture. Texture regions are approximated by a one-dimensional polynomial, while the non-texture regions are approximated by the region's mean intensity. A set of parameters for bit-rate/image-quality tuning is identified, and their effects are evaluated on the LENA and HOUSE test images.
Efficient image sequence coding by vector quantization of spatiotemporal bandpass outputs
Bernhard Wegmann, Christoph Zetzsche
A coding scheme for image sequences is designed in analogy to human visual information processing. We propose a feature-specific vector quantization method applied to a multi-channel representation of image sequences. The vector quantization combines the corresponding local/momentary amplitude coefficients of a set of three-dimensional analytic band-pass filters selective for spatiotemporal frequency, orientation, direction, and velocity. Motion compensation and decorrelation between successive frames are achieved implicitly by applying non-rectangular subsampling to the 3D band-pass outputs. The nonlinear combination of the outputs of filters which are selective for constantly moving one-dimensional (i.e., spatially elongated) image structures allows a classification of the local/momentary signal features with respect to their intrinsic dimensionality. Based on statistical investigations, a natural hierarchy of signal features is provided. This is then used to construct an efficient encoding procedure, into which the different sensitivities of human vision to the various signal features can easily be incorporated. As a first example, all multi-dimensional vectors are mapped to constantly moving 1D structures.
Improved disparity estimation for the coding of stereoscopic television
Andreas C. Kopernik, Danielle Pele
An algorithm is presented that computes the disparity vector fields of arbitrary stereo image sequences with high accuracy and without the assumption of a known camera geometry. The disparity vectors are used for the reconstruction of the right image sequence from the left one in a 3DTV transmission system, thus replacing the full transmission of both stereo channels by a stereo-compensated scheme. Compared to classical approaches in robot vision, however, the disparity vector fields have to be dense and sufficiently smooth while maintaining high accuracy to allow for efficient coding and good reconstruction quality. In keeping with the overlying source coding scheme, the analysis remains block-based. An initial estimate of the disparity is obtained by a correlation of the image phase. Combining an analysis of texture variance and contour information, either a re-estimation using a more sophisticated image analysis or a surface interpolation is then applied to selected regions of high reconstruction error. In regions of sufficient texture structure, an additional differential identification of affine or quadratic block models is performed. Continuity over time and ordering constraints from stereopsis serve as further optimization criteria.
Image Segmentation
Comparison of three color image segmentation algorithms in four color spaces
John M. Gauch, Chi Wan Hsia
In this paper, we describe how three conventional segmentation methods can be generalized to handle color images. The segmentation algorithms we consider are: (1) seed based region growing, (2) directional derivative based edge detection, and (3) recursive split and merge. We then evaluate the effectiveness of these techniques using color difference metrics associated with four different color spaces and a variety of real and synthetic color images. The four color spaces we evaluate are: (1) the spectral primary system (RGB), (2) the NTSC transmission system (YIQ), (3) the hue saturation and brightness system (HLS), and (4) the CIE perceptually uniform space (LAB). We compare these segmentation results using real and synthetic color images which have been 'hand segmented' to determine true object boundaries.
Semantic segmentation of videophone image sequences
Peter J. L. van Beek, Marcel J. T. Reinders, Bulent Sankur, et al.
A system for segmentation of head-and-shoulder scenes into semantic regions, to be applied in a model-based coding scheme for video telephony, is described. The system is conceptually divided into three levels of processing and uses successive semantic regions of interest to locate the speaker, the face, and the eyes automatically. Once candidate regions have been obtained by the low-level segmentation modules, higher-level modules perform measurements on these regions and compare them with expected values to extract the specific region searched for. Fuzzy membership functions are used to allow deviations from the expected values. The system is able to locate the facial region and the eye regions satisfactorily.
Temporal segmentation method for video sequence
Masahiro Shibata
This report describes a method for extracting a hierarchical structure of video sequences in the time domain as a means for video browsing. First, a descriptive method for video content is proposed. Next, a hierarchical segmentation method based on the descriptive method is discussed and results of an experiment are also introduced. The experiment has proved the effectiveness of the proposed methods.
Unsupervised/supervised texture segmentation and its application to real-world data
Devesh Patel, T. John Stonham
We present a texture segmentation technique which can be adapted to a broad category of applications. A Texture Co-occurrence Spectrum is generated for each texture sample by extracting information from all directions around a pixel. A combined unsupervised/supervised clustering algorithm then groups the co-occurrence spectra in feature space into clusters representing homogeneous textured regions. The method as presented is applied to, and shown to be capable of segmenting, natural texture composites and real-world images such as silica particle micrographs and aerial images.
Image segmentation by region-contour cooperation as a basis for efficient coding scheme
This paper describes a method of image coding based on contour detection and a region-growing procedure. Contour detection allows the most evident frontiers of homogeneous regions to be found. The region-growing procedure serves to close contours and to obtain a more precise segmentation. The goal of this segmentation is to obtain regions with borders which are easy to code. A split-and-merge procedure and a quad-tree representation are therefore used, which give a more regular closure of contours than the usual contour-based methods. The particularity of the method is the choice of growing centers. They are placed on the skeleton of non-closed regions and used in the split-and-merge procedure of region growing. The center placement scheme aims to obtain a uniform distribution of centers inside a contour, which results in an approximately constant growing speed.
Efficient image partition algorithm based on edge information
Jose Vicente, Ronald W. Schafer
The iterative algorithm presented in this paper partitions an image into piecewise-constant regions. Each iteration consists of three steps. The first step extracts edges from the image. The extracted edges, which must exhibit high connectivity, are computed using a Laplacian-like morphological edge detector in the first iteration and a simple gradient thresholding in subsequent iterations. After the first iteration, the edge detector operates on a piecewise-constant image for which the edge detection problem is well defined and well posed. In the second step, a fast one-pass averaging of connected pixels within closed boundaries defines the regions in the image. Finally, edge pixels (both true and spurious edge pixels) are each assigned to an underlying region. The advantages of this algorithm are: (1) both local and region-based information are incorporated, (2) convergence occurs in a few iterations, and (3) the component operations are relatively simple.
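The three steps of one iteration can be sketched roughly as follows; the sketch substitutes a plain gradient threshold for the morphological edge detector of the first iteration and uses generic connected-component tools, so it illustrates the flow rather than the paper's exact operators.

```python
# Hedged sketch of one iteration: (1) detect edges, (2) average the connected
# non-edge pixels inside closed boundaries, (3) assign each edge pixel to the
# nearest underlying region. Threshold and operators are assumptions.
import numpy as np
from scipy import ndimage

def partition_iteration(img, edge_thresh=20.0):
    gy, gx = np.gradient(img.astype(float))
    edges = np.hypot(gx, gy) > edge_thresh                  # step 1: edge map
    labels, n = ndimage.label(~edges)                       # step 2: connected regions
    means = ndimage.mean(img, labels, index=np.arange(1, n + 1))
    out = np.zeros(img.shape, dtype=float)
    out[labels > 0] = means[labels[labels > 0] - 1]
    # step 3: give every edge pixel the mean of the nearest non-edge region
    nearest = labels[tuple(ndimage.distance_transform_edt(
        labels == 0, return_distances=False, return_indices=True))]
    out[labels == 0] = means[nearest[labels == 0] - 1]
    return out
```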
Biomedical Image Processing
Convergence analysis of the active contour model with applications to medical images
We propose an algorithm that reconstructs the central layer of thick contours and maps it onto the unit interval. We base our algorithm on an active contour model and analyze its behavior in both the spatial and frequency domains. We prove conditions on the parameters of the formulation that guarantee a good solution and demonstrate the importance of the choice of these parameters through a series of experiments.
Multiple simultaneous excitation electrical impedance tomography using low cross-correlated signals
Orkun Hasekioglu, M. Kemal Kiymik
Electrical impedance tomography (EIT) is a non-destructive technique for imaging the electrical impedance cross-sections of a given phantom. Conventionally, in transverse-plane impedance imaging, electrical signals (usually sinusoidal) are sequentially applied to electrodes located circularly around the phantom at uniform angles. For each excitation, the signal amplitudes and phases are measured at the other remaining electrodes. These measurements are then used to compute an impedance cross-section map of the phantom. This generic technique is not sufficiently fast to map rapidly varying impedance cross-sections, since all the electrodes need to be excited sequentially while amplitude and phase measurements are made at each of the other electrodes. To circumvent this problem, it is suggested to use multiple simultaneous excitations with low cross-correlation signals instead of single sequential excitations applied one at a time. As a result, the data acquisition time can be reduced by a factor of roughly the number of electrodes.
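The separation principle behind the simultaneous excitation can be illustrated with a small numerical sketch: if the excitation waveforms have low mutual cross-correlation, correlating a measured electrode signal with each waveform recovers the individual contributions. The waveform choice (pseudo-random ±1 sequences) is an assumption, and the impedance reconstruction step itself is not shown.

```python
# Hedged sketch: separate simultaneous excitations by correlation, assuming
# nearly uncorrelated waveforms; the actual EIT reconstruction is omitted.
import numpy as np

rng = np.random.default_rng(0)
n_electrodes, n_samples = 16, 4096

# Pseudo-random +/-1 sequences have low mutual cross-correlation.
excitations = rng.choice([-1.0, 1.0], size=(n_electrodes, n_samples))

# Unknown per-excitation gains seen at one measurement electrode.
true_gains = rng.normal(size=n_electrodes)
measured = true_gains @ excitations + 0.01 * rng.normal(size=n_samples)

# Correlate with each excitation waveform to recover its contribution.
recovered = measured @ excitations.T / n_samples
print(float(np.abs(recovered - true_gains).max()))   # small residual error
```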
Enhancement of digital angiographic images with misregistration correction and spatially adaptive matched filtering
V. V. Digalakis, Dimitris G. Manolakis, P. Lazaridis, et al.
In this paper, we report a new method for the improvement of the perceptual quality of digital angiographic images. The method consists of two steps. In the first step, we process the angiographic image sequence using a new algorithm for the correction of motion artifacts. In the second step, we apply a novel spatially adaptive temporal filter to the registered image sequence. We demonstrate the improvement in image quality over existing methods using actual patient data.
Fuzzy iterative image segmentation with recursive merging
Nozha Boujemaa, Georges Stamon, Jacques Lemoine
The problem of segmenting images with fuzzy clustering is considered. A new approach called 'gradual focusing β-decision' that proceeds in two steps is proposed. First, the most 'ambiguous' pixels are separated from the remaining ones. Next, fine boundary segmentation is obtained by focusing only on the ambiguous zone. Since our approach takes into account global information as well as local information, an accurate and smooth result is obtained.
Design and analysis of medical images pyramid coding
YihChuan Lin, Shen-Chuan Tai
In this paper, a new efficient technique for medical image compression is presented. It allows interactive picture transmission over low-bandwidth channels, and achieves a high compression ratio as well as good image fidelity. A method, which considers the intra-correlation properties, for designing an optimal pyramid generation/synthesis subsystem is first presented. On this basis, a pyramid structure which decomposes the input image waveform into four pyramids is presented. In order to retain the information important to the human eye, we adopt a different quantizer and coding algorithm for each pyramid according to human visual perceptual properties. Since the lower pyramids of most images are sparse and locally concentrated, i.e., many areas of the lower pyramids are inactive, vector quantization accompanied by a quadtree representation is used. The higher pyramids are coded by a simple variable-length coding algorithm; since the human eye is more sensitive to low-frequency changes, we apply a finer quantizer to encode these pyramids. To generate the pyramid structure, a 2-D quadrature mirror filter (2-D QMF) is used. Simulation results show that a reproduction of a medical image judged faithful to the original can be obtained with an SNR of 34.16 dB at a bit rate of 0.479 bpp.
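For reference, one analysis/synthesis level of a separable two-band decomposition of the kind a 2-D QMF performs can be sketched as follows; the Haar pair is used purely for brevity and is not the filter pair proposed in the paper, and the per-pyramid quantizers are omitted.

```python
# Hedged sketch: one level of a separable two-band split into four subbands,
# with its exact inverse, using the Haar pair for brevity.
import numpy as np

def haar_analysis_2d(img):
    """Split an even-sized image into LL, LH, HL, HH subbands."""
    x = img.astype(float)
    lo = (x[:, 0::2] + x[:, 1::2]) / np.sqrt(2.0)   # filter + decimate rows
    hi = (x[:, 0::2] - x[:, 1::2]) / np.sqrt(2.0)
    ll = (lo[0::2, :] + lo[1::2, :]) / np.sqrt(2.0)  # filter + decimate columns
    lh = (lo[0::2, :] - lo[1::2, :]) / np.sqrt(2.0)
    hl = (hi[0::2, :] + hi[1::2, :]) / np.sqrt(2.0)
    hh = (hi[0::2, :] - hi[1::2, :]) / np.sqrt(2.0)
    return ll, lh, hl, hh

def haar_synthesis_2d(ll, lh, hl, hh):
    """Perfect-reconstruction inverse of haar_analysis_2d."""
    lo = np.empty((ll.shape[0] * 2, ll.shape[1]))
    hi = np.empty_like(lo)
    lo[0::2, :], lo[1::2, :] = (ll + lh) / np.sqrt(2.0), (ll - lh) / np.sqrt(2.0)
    hi[0::2, :], hi[1::2, :] = (hl + hh) / np.sqrt(2.0), (hl - hh) / np.sqrt(2.0)
    out = np.empty((lo.shape[0], lo.shape[1] * 2))
    out[:, 0::2], out[:, 1::2] = (lo + hi) / np.sqrt(2.0), (lo - hi) / np.sqrt(2.0)
    return out
```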
Virtual symmetry reconstruction algorithm for limited-view computed tomography
Yang-ming Zhu, Tian-ge Zhuang, Laigao Michael Chen
A virtual symmetry reconstruction (VSR) algorithm for limited-view image reconstruction is proposed. The algorithm can be applied to cases where a missing view angle of up to π/2 is allowed, as in some nondestructive evaluation/testing (NDE/NDT) applications. To reconstruct the image, we first assume that the object is virtually symmetrical about the x-axis. The missing π/2 of projection data is compensated for using the proposed formula, and an image symmetrical about the x-axis is then reconstructed via the filtered backprojection algorithm. Similarly, we assume that the object is symmetrical about the y-axis and reconstruct another image symmetrical about the y-axis. From these two virtually symmetrical images, the desired image reflecting the real object is finally obtained by combining them. Simulations show that VSR is suited to some NDE/NDT applications.
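A rough sketch of the symmetry-based compensation for the x-axis pass is given below: for a parallel-beam geometry with projection angles spanning [0, π), a projection of an x-symmetric object at angle θ equals the projection at π − θ with the detector axis reversed. The index conventions depend on the scanner geometry and are assumptions; the y-axis pass, the filtered backprojection itself, and the final combination step are omitted.

```python
# Hedged sketch of filling missing projections under an assumed x-axis
# symmetry, before applying standard filtered backprojection.
import numpy as np

def fill_missing_by_x_symmetry(sinogram, measured):
    """sinogram: (K, S) array, row k is the projection at angle k*pi/K.
    measured: boolean mask of length K marking the available projections."""
    K = sinogram.shape[0]
    filled = sinogram.copy()
    for k in np.flatnonzero(~measured):
        mirror = (K - k) % K                    # angle pi - theta_k
        if measured[mirror]:
            filled[k] = sinogram[mirror, ::-1]  # reverse the detector axis
    return filled
```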
Image Coding and Model-Based Analysis
Color space choice for nearly reversible image compression
Albert A. Deknuydt, J. Smolders, Luc Van Eycken, et al.
In applications where nearly reversible color image compression is required, the choice of an appropriate color space is a major factor determining the attainable compression ratio. The optimal choice would satisfy the following conditions: (1) decorrelate the data as much as possible; (2) minimize the total number of bits in the data path needed for a certain quality level (assuming a truly reversible component compression scheme); (3) require no calculations with critical accuracy. We compared several known linear color transformations (under the assumption that the original data were represented in RGB) with respect to these conditions. Decorrelation and entropy reduction were calculated. The number of bits required to guarantee a certain quality level, measured in ΔLab units, was determined. The effects of transformation coefficient quantization were checked. As a result, a simple transformation needing no multipliers at all is proposed.
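As an example of the class of transforms the conclusion alludes to (and not the specific transform proposed in the paper), the reversible component transform later standardized in JPEG 2000 decorrelates RGB using only additions and shifts and is exactly invertible in integer arithmetic:

```python
# Illustrative multiplier-free, integer-reversible color transform (the JPEG
# 2000 RCT), shown only as an instance of the class discussed above.
import numpy as np

def rct_forward(r, g, b):
    """Integer RGB -> (Y, Cb, Cr) using shifts and adds only."""
    y = (r + 2 * g + b) >> 2
    cb = b - g
    cr = r - g
    return y, cb, cr

def rct_inverse(y, cb, cr):
    """Exact integer inverse of rct_forward."""
    g = y - ((cb + cr) >> 2)
    r = cr + g
    b = cb + g
    return r, g, b

rgb = np.random.default_rng(1).integers(0, 256, size=(3, 64, 64))
assert all(np.array_equal(a, b)
           for a, b in zip(rgb, rct_inverse(*rct_forward(*rgb))))
```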
Symmetry-based image coding
Pierangela Cicconi, Riccardo Leonardi, Murat Kunt
A novel coding technique which proposes the use of symmetry to reduce redundancy in images is presented. Axes of symmetry are extracted using the principal axes of inertia theory, and the technique is extended to non-symmetric images by the introduction of a coefficient of symmetry. One part of the image is then linearly predicted with respect to the chosen axis. The method is implemented in a block-based fashion in order to adapt to local symmetries in the image data. An image representation and a coding strategy are illustrated, and results are presented on real static images.
Orthogonal and nonorthogonal pyramid building in television image coding
Nodar G. Kharatishvili, Joseph Ronsin, Xu Shengwang, et al.
The paper considers orthogonal and non-orthogonal pyramid building for television image coding. Non-orthogonal structures of the Laplacian type, based on two- and three-dimensional Laplacian pyramids with three levels, are studied. The effectiveness of using quincunx subsampling within the frame with optimal quantization is shown. The building of a 'branched pyramid' that enables improved coefficient compression is proposed. Results of pyramid coding of the coefficients obtained after a linear orthogonal Walsh transformation of original TV images are given.
Adaptive transform image coding based on variable-shape block segmentation with smoothing filter
Susumu Itoh, Ichiro Matsuda, Toshio Utsunomiya
This paper describes a new transform image coding scheme which transfigures square blocks into variable-shape blocks so that their boundaries run parallel to the principal contours in an image, in order to diminish the coding noise peculiar to transform coding, such as mosquito noise and blocking effects. The scheme reduces the additional information needed to encode block shapes by limiting all of them to quadrilaterals. Moreover, a smoothing filter is introduced for the purpose of reducing approximation errors, particularly at block boundaries. After determining and encoding the block shapes, the scheme transmits the mean value of each block by DPCM and reproduces the interpolation image at both coder and decoder. The interpolation residual signals are then selectively encoded by the mean-separated KLT, whose orthonormal basis functions are derived individually for each block from the isotropic model for the autocorrelation functions of images. Simulation results indicate that this scheme is superior to square-block-based coding schemes in both coding efficiency and image quality.
Technique of eye animation generated by CG and evaluation of eye contact using eye animation
Kiyohiro Morii, Takanori Satoh, Nobuji Tetsutani, et al.
We describe a real-time animation technique to generate blinking and gaze shift, while still considering convergence, using a Graphic Workstation. Moreover, we evaluate the feeling of eye contact using this eye animation. In our experiment, we subjectively evaluate gaze generated by CG using 2-D or 3-D display, and compare it with the gaze of an actual person. We also perform an experiment on the time required for perception of eye contact when the eyes of the CG image are moving.
Facial image synthesis by hierarchical wire frame model
Yasuichi Kitamura, Yoshio Nagashima, Jun Ohya, et al.
We have studied the generation of realistic computer-graphics facial actions synchronized with actual facial actions. This paper describes a method of extracting facial feature points and reproducing facial actions for a virtual space teleconferencing system that achieves a realistic virtual presence. First, we need an individual facial wire frame model; we use a 3D digitizer or both front and side images of the face. Second, we trace the feature points, the points around the eyes and the mouth. For this purpose, we watch the eye regions and the mouth region: if they move, the intensity of the image changes and we are able to find the eyes and the mouth. From facial action images we cannot extract the deformation of the facial skin; only the movement of the eye and mouth regions in 2D space can be extracted by tracing them in the front view of the face. We therefore propose a new hierarchical wire frame model that can represent facial actions including wrinkles. The lower layer of the wire frame moves according to the movement of the feature points. The upper layer slides over the lower layer and is deformed based on the movement of the lower layer. By applying this method to a telecommunication system, we confirm very realistic facial action in virtual space.
Videophone coding based on 3D modeling of facial muscles
Fabio Lavagetto, Sergio Curinga
In this paper an innovative approach to videophone coding is described, based on a 3D model of a human face with a complex muscle structure capable of faithfully reproducing the basic facial expressions and mimics. The muscle structure is organized as a set of interconnected fibers covering the whole surface of the face and characterized by predefined mechanical properties. Muscles can be activated through the direct stimulation of each individual fiber or, indirectly, by interaction with adjacent stimulated fibers. Through the analysis algorithms performed at the transmitter, the input video sequence is processed to extract a set of suitable facial control features and to estimate the muscle parameters. These parameters, namely the values of the estimated muscle stimuli, are then quantized, coded, and transmitted to the receiver, where they are applied to the model to synthesize the corresponding facial expression. The muscle structure is described together with the coding/decoding algorithms, which have been implemented on a multiprocessor AT&T Pixel Machine. Some samples of the experimental results are also presented to show the significant subjective quality of the reconstructed sequence.
Section
Determining properties of materials by using thermal imaging
Claude L. Caillas, Thierry Porcher
One of the important issues in low-level vision is recovering the properties of materials from images. This paper presents a physical model that describes how planar, horizontal surfaces appear in thermal images according to the reflectivity, emissivity, and thermal inertia of materials. A numerical algorithm, taking into account meteorological parameters, is presented in order to calculate the temperature of surfaces as a function of the physical properties of materials. The temperature is calculated for every possible combination of material properties. We then use a least-squares fitting method to match the modeled and experimental temperatures, which leads to the determination of material properties. Applying this method to an experimental scene composed of several kinds of sand led to reasonable quantitative results. Although reasonable, these results are not perfectly reliable due to the inevitable assumptions of the model. From there, we argue that one can derive qualitative results that are more reliable than the quantitative ones.
Color change method for objects in images
Shoji Suzuki, Masanaga Tokuyo, Masahiro Mori
The color change method we developed uses a reflection model of the object's surface and constructs a new color mixture model for the object-background boundary to help preserve the natural appearance of the objects involved. The reflection model is based on object color, illumination, and surface reflection. The object color can be changed while preserving the reflection; however, a problem occurs that makes the boundary color appear discontinuous and the object edge overly prominent. When images are digitized, light from the object and the background mix at the boundary, so the object merges naturally into the background. To solve the problem and preserve the naturalness of the original, we propose a boundary-color mixture model that maintains the color blending ratio of the original image and avoids color-related changes in the boundary smoothness.
Color edge detectors based on multivariate ordering
Panagiotis E. Trahanias, Anastasios N. Venetsanopoulos
Color edge detection is approached in this paper using vector order statistics. Based on the R-ordering method, a class of color edge detectors is defined. These detectors function as vector operators. Specific edge detectors can be obtained as special cases of this class. Various such detectors are defined and analyzed. Experimental results show the noise robustness of the vector order statistics operators. A quantitative evaluation and comparison to other color edge detectors favors our approach.
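One simple member of this detector class can be sketched as follows: the color vectors in a small window are R-ordered by their aggregate distance to the other vectors, and the edge magnitude is the distance between the highest- and lowest-ranked vectors. The window size and this particular variant are assumptions rather than the paper's exact operators.

```python
# Hedged sketch of a vector-range style edge detector based on R-ordering.
import numpy as np

def vector_range_edge(img, win=3):
    """img: (H, W, 3) float array; returns an (H, W) edge-magnitude map."""
    h, w, _ = img.shape
    r = win // 2
    out = np.zeros((h, w))
    for y in range(r, h - r):
        for x in range(r, w - r):
            vecs = img[y - r:y + r + 1, x - r:x + r + 1].reshape(-1, 3)
            d = np.linalg.norm(vecs[:, None, :] - vecs[None, :, :], axis=-1)
            order = np.argsort(d.sum(axis=1))        # R-ordering by aggregate distance
            out[y, x] = np.linalg.norm(vecs[order[-1]] - vecs[order[0]])
    return out
```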
Restoration of gray-level picture with less blur using local information
Kiyoshi Tanaka, Yasuhiro Nakamura, Kineo Matsui
This paper presents a restoration scheme for gray-level images which can generate a more smoothly dithered and less blurred image. Our scheme embeds local-area information for each dithering block into the image in order to obtain a high-quality image. When a dithered image is sent by facsimile in this scheme, the received image can be output as a normal hardcopy and displayed on the terminal CRT as a clearer softcopy using the embedded information.
Frequency domain adaptive iterative image restoration and evaluation of the regularization parameter
In this paper a nonlinear frequency-domain adaptive regularized iterative image restoration algorithm is proposed, in which the regularization parameter is frequency dependent and is updated at each iteration step. The development of the algorithm is based on a set-theoretic regularization approach, where bounds on the error residual and the stabilizing functional are updated in the frequency domain at each iteration step. Sufficient conditions for the convergence of the algorithm are derived and experimental results are shown.
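A generic member of this family of algorithms can be sketched as a frequency-domain regularized filter whose regularization weight is re-estimated at every iteration; the simple residual-based update below is a placeholder and does not reproduce the paper's set-theoretic bound updates.

```python
# Hedged sketch of adaptive, frequency-dependent regularized restoration.
import numpy as np

def adaptive_restore(y, h_psf, c_lap, n_iter=20):
    """y: degraded image; h_psf: blur PSF; c_lap: regularizing high-pass
    operator (e.g. a discrete Laplacian), both zero-padded to y's size
    and centered."""
    Y = np.fft.fft2(y)
    H = np.fft.fft2(np.fft.ifftshift(h_psf))
    C = np.fft.fft2(np.fft.ifftshift(c_lap))
    X = Y.copy()                                    # initial estimate
    for _ in range(n_iter):
        # Frequency-dependent regularization weight, re-estimated each step
        # from the current residual (a simple noise-to-signal surrogate).
        alpha = np.abs(Y - H * X) ** 2 / (np.abs(X) ** 2 + 1e-8)
        X = np.conj(H) * Y / (np.abs(H) ** 2 + alpha * np.abs(C) ** 2 + 1e-8)
    return np.real(np.fft.ifft2(X))
```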
Study on image data compression by using neural network
Zhong Zheng, Masayuki Nakajima, Takeshi Agui
The properties of neural networks employed in image data compression are studied, and a method for increasing the compression capability is proposed. Since multiple-gray-level images contain a large quantity of data, the poor mapping capacity of a single neural network is the main factor limiting the data compression capability. In the proposed method, an image is first divided into subimages, that is, blocks. These blocks are then divided into several classes, and several independent neural networks are adaptively assigned to the blocks according to their classes. Since the mapping capacity is proportional to the number of neural networks, and no extra data are introduced, the compression capability is increased efficiently by our method. Computer simulation results show that the signal-to-noise ratio (SNR) of the reconstructed images is increased by about 1 to 2 dB by our method, and the visual image quality in particular is improved.
Optimum lattice multiresolution transform for image compression
Benoit M. M. Macq, J. Y. Mertes, Serge Comes, et al.
This paper presents a new design criterion intended to make multiresolution filter banks achieve the best quality/compression trade-off in the framework of scene-adaptive coding. The optimization adapts the filter parameters to the codec features and to the statistics of 2-D sources. The aim of the paper is the comparison of several multiresolution linear transforms implemented by separable FIR filter banks. Some of them are well-known transforms while others are optimized in the light of our criterion. Objective criteria and visual tests are taken into consideration.
Modeling and analysis of quantization errors in two-channel subband filter structures
Necdet Uzun, Richard A. Haddad
This paper is concerned with the analysis and modeling of the effects of quantization of subband signals in a two channel filter bank. We derive equations for the autocorrelation and power spectral density (PSD) of the reconstructed signal y(n) in terms of the analysis/synthesis filters, the PSD of the input, and the quantizer model. Formulas for the mean-square error (MSE) and for compaction gain are obtained in terms of these parameters. We assume the filter bank is perfect reconstruction (PR) (but not necessarily paraunitary) in the absence of quantization and transmission errors. These formulas set the stage for filter optimization (maximization of compaction gain and minimization of MSE) subject to PR and bit constraints.
Error-free transform coding by maximum-error-limited quantization of transform coefficients
Gopinath R. Kuduvalli, Rangaraj M. Rangayyan
In this paper, we describe a new technique for encoding transform coefficients for error-free image compression with very little overhead in transmitting the error images. In traditional schemes for error-free transform coding, a fixed total number of bits is allocated for encoding the transform coefficients, and the difference image is encoded by a lossless encoding method. The total number of bits allocated to encode the transform coefficients is varied until the total average bit rate is minimized. In the present scheme, the quantization error for each transform coefficient is monitored, and additional bits are allocated to reduce this quantization error to less than a pre-specified maximum value. It is shown that the reconstructed image from transform coefficients quantized using this scheme will consist of only a negligible number of pixels in error by more than the pre-specified maximum value. This results in very little overhead for encoding the error image. The results of application of this method on a standard image and digitized mammograms are presented.
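The bit-allocation idea can be sketched directly: quantize each coefficient with a base step size, then keep refining any coefficient whose quantization error still exceeds the pre-specified maximum. The refinement rule (halving the step, i.e., one extra bit per pass) is an assumption.

```python
# Hedged sketch of maximum-error-limited quantization of transform coefficients.
import numpy as np

def max_error_limited_quantize(coeffs, base_step, max_err):
    """Returns (quantized indices, per-coefficient step sizes, extra bits used)."""
    steps = np.full(coeffs.shape, float(base_step))
    extra_bits = np.zeros(coeffs.shape, dtype=int)
    idx = np.round(coeffs / steps)
    while True:
        err = np.abs(coeffs - idx * steps)
        too_coarse = err > max_err
        if not too_coarse.any():
            break
        # One extra bit halves the step size for the offending coefficients.
        steps[too_coarse] /= 2.0
        extra_bits[too_coarse] += 1
        idx = np.round(coeffs / steps)
    return idx.astype(int), steps, extra_bits
```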
Fast feature-matching algorithm of motion compensation for hierarchical video CODEC
Xiaobing Lee
The objective of this investigation is to develop a fast block matching scheme which uses feature matching to estimate the motion displacement (motion vector) of inter-frame/field blocks in MPEG, ATM, or HDTV video sequences. We study two basic feature models, the sign truncated feature (STF) and the maximum feature (MF), corresponding to the reduced-mean and wavelet hierarchical CODEC structures. The kernel operator works on 2 X 2 pixels and produces a feature vector FV (mean, model_pattern). A 16 X 16 macroblock of pixels can be iteratively represented as a three-layered feature vector structure. In the higher resolution layers, the model_pattern alone is sufficient to describe the pixel phase correlations within the block for exclusive matching decisions. Only one or four bits are needed for each 2 X 2 pixel block in the feature vector matching. The reduced data representations make it possible to implement a real-time full-range search within a large search window (+/- 32 to +/- 64 pixels). This feature representation expresses well the pixel correlations, edges, and texture information of the tested blocks. By matching the feature correlations rather than the summed pixel-by-pixel intensity differences between the current block and the reference block of the previous/future video frame, it is possible to significantly reduce the basic matching complexity by re-using the previous results of the feature extraction computations, with lower data fetching requirements. In addition, half-pixel accuracy motion estimation can be achieved. The proposed feature matching algorithm is suitable for pipelined and parallel processing to approach a real-time VLSI implementation. It can be more than 10 times faster than conventional block matching techniques with the same full-range search scheme.
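The reduced-representation matching idea can be illustrated with a sketch in which each 2 X 2 cell is summarized by its mean and candidate displacements are compared on the feature planes rather than pixel by pixel; the STF/MF pattern bits of the paper are not reproduced, so the plain mean feature is an assumption.

```python
# Hedged sketch of block matching on a reduced (2x2-mean) feature plane,
# which cuts the per-candidate comparison cost roughly by four.
import numpy as np

def mean_features(frame):
    """Collapse each 2x2 cell of an even-sized frame to its mean."""
    f = frame.astype(np.float32)
    return 0.25 * (f[0::2, 0::2] + f[0::2, 1::2] + f[1::2, 0::2] + f[1::2, 1::2])

def feature_block_match(cur_feat, ref_feat, by, bx, bsize=8, search=16):
    """Block matching on the feature plane; bsize and search are given in
    feature cells (half the pixel-domain block size and search range)."""
    block = cur_feat[by:by + bsize, bx:bx + bsize]
    best_err, best_mv = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if 0 <= y <= ref_feat.shape[0] - bsize and 0 <= x <= ref_feat.shape[1] - bsize:
                err = np.abs(block - ref_feat[y:y + bsize, x:x + bsize]).sum()
                if err < best_err:
                    best_err, best_mv = err, (2 * dy, 2 * dx)   # back to pixel units
    return best_mv, best_err
```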
Compatible HDTV/TV hierarchical scheme for secondary distribution of TV and HDTV signals
Christine M. Guillemot
This paper describes a procedure for coding video in the HDTV Interlaced format (HDI) at rates of 34 Mbit/s while satisfying the constraint of an embedded, specified constant-bit-rate stream compatible with an Interlaced Enhanced Definition TV signal (EDI). The term compatibility is used here to imply both (a) resolution compatibility and (b) that a decoder conforming to a specific syntax should be able to decode an easily extractable stream from the information intended for a higher quality decoder, a feature that is sometimes referred to as backward compatibility. Resolution compatibility, while easily achieved for progressively scanned signals, is more intricate in the case of interlaced signals and requires careful consideration in the design of the procedure used to generate the lower resolution (or compatible) signal. Instead of performing a temporal filtering, which has the drawback of blurring moving contours, the proposed approach addresses these issues in a spatial hierarchical coding scheme in which special care is given to the choice of filters, in order to obtain a compatible signal with a good trade-off between vertical and temporal definition and with an appropriate suppression of aliasing. The hierarchical decomposition is carried out on a field basis, and two constrained filter banks are designed to deal with the phase shift inherent to field processing.
Transmission of compressed video over radio links
Neil MacDonald
A system for transmitting a compressed video signal over a radio link has been developed. The scheme involves altering the output bitstream of an H.261 videocodec to allow data transmission over a Digital European Cordless Telecommunications (DECT) demonstrator system. The system was used to demonstrate acceptable picture quality at bit rates between 32 kbit/s and 284 kbit/s. The high error rate possible for a radio link requires that techniques for improving the error resilience of the coding scheme be investigated. The resilience may be improved by forward error correction, interleaving of the data stream and by repeating blocks which have been corrupted by transmission. The viability of the system has been proven by hardware trials but further study is required before the system is usable on a real network.
Multipoint audiographics teleconferencing system with privacy feature
Ikuro Oyaizu, Kiyoto Tanaka, Toshikazu Yamaguchi, et al.
A multipoint audiographics teleconferencing system with a privacy feature is realized with the development of a personal computer expansion card for high-speed, real-time encryption and decryption of multimedia data that includes audio, images and handwritten data in parallel with ISDN communications.
Multipoint teleconference architecture for CCITT standard video conference terminals
Susumu Oka, Yasuo Misawa
In this paper we describe an architecture and an implementation method for multipoint teleconference systems as one of the most important applications of image communications. We studied a centralized architecture using Multipoint Control Units (MCUs) as service providers. In order to apply this architecture to large-scale systems we adopted a hierarchical star configuration for inter-MCUs connections. Also we have developed a composite mechanism of international standard protocols and our original protocol for high performance services. We have built a prototype teleconference system which can provide a variety of services including several procedures for opening conferences and various types of conference modes. These services can be used not only by our original videoconference terminals but also by international standard terminals and ordinary audio telephones.
Super high-definition image handling system
Ryuta Suzuki, Hideo Ohira, Minoru Wada, et al.
We have developed a super high definition image handling system using new progressive technologies. The system supports display, editing, storage, transmission, and image compression for images of over 2K by 2K pixel resolution. Sufficient performance has been achieved in image handling speed and picture quality. The system is expected to be used in areas such as electronic publishing, biomedical imaging, electronic museums, image filing, image communication, and remote plant control. It is also expected to serve as a next-generation hypermedia platform.
Video indexing using motion vectors
Akihito Akutsu, Yoshinobu Tonomura, Hideo Hashimoto, et al.
This paper presents a video indexing method that uses motion vectors to 'identify' video sequences. To visualize and interactively control video sequences, we propose a new video index and corresponding icons. The index is based on the identification of discrete cut points and camera operations made possible by analyzing motion vectors. Simulations and experiments confirm the practicality of the index and icons.
Concurrent schema of cooperative picture-painting system
Yo Moriya, Naoto Niwa, Murao Yo, et al.
This paper proposes a concurrent schema using an extended Petri net. In general, a Petri net offers a way to describe concurrency in processing, and a detailed and complete representation of the picture-painting system can be defined exactly with a Petri net. However, the standard formulation lacks the brevity required to understand the behavior of the system. To remove these problems, extensions of the Petri net, such as the use of multiple tokens, multiple arcs, and multiple transitions, are devised. Moreover, together with a layered system, a set of layered, extended Petri nets provides a way to represent the complex behavior of the painting system simply and visually.
Super high-definition still/motion images digitizing system and standard test images
Isao Furukawa, Kazunobu Kashiwabuchi, Sadayasu Ono
This paper presents a very high resolution image digitizing system that handles both still and motion images, and the standard test images captured by the system. The system consists of a high resolution CCD line scanner and a high precision film transport unit. Resolution characteristics are directly measured using test charts, and a resolution of over 2048 pixels is obtained in both the vertical and horizontal directions. Various corrections, such as shading, gamma, and color correction as well as edge enhancement, are applied, and good reproductions of the original photographic images are obtained. Sixteen standard still images have been digitized and prepared for coding simulations and other applications. In the near future, motion standard test images will be digitized.
Study for high-resolution color-image-coding approach for office system
Kazuhiro Suzuki, Yutaka Koshi, Syunichi Kimura, et al.
Distributed office systems employ image exchange facilities such as image input/output, softcopy handling, and image filing services. This paper presents a study of a segmented image exchange approach well suited to block-based image coding and image processing schemes. The feasibility study shows that the segmented image formatting criterion provides a well-designed quantization-table set for a DCT coding scheme. A 400 spi full-tone image coding simulation of the segmented adaptive quantization for DCT coding shows better SNR and perceptually better image quality than the conventional scheme. To select the quantization table in the decoder, this scheme requires a 3-bit header for every coded block; however, it enables additional image processing features in the reconstructed image.
ASIC chip for real-time vector quantization of video signals
Calvin C.K. Chan, Chi-Kit Ma, Anthony Fong
The demanding computational requirements of a vector quantization (VQ) encoder have hindered its application to real-time video coding. The complexity of the VQ encoder can be greatly reduced by using a multiplicationless image vector quantization (MLIVQ) technique. A real-time vector quantizer architecture implementing this novel algorithm on a single ASIC chip is presented in this paper. Running at a clock rate of 33 MHz and using a three-stage pipeline architecture, the proposed hardware implementation is capable of real-time compression of motion video with 512 X 512 pixels per frame at a refresh rate of 30 frames per second. Compared with traditional full-search VQ, the MLIVQ technique achieves a large reduction in hardware complexity while introducing an insignificant degradation in picture quality.
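The abstract does not detail the MLIVQ algorithm itself, but the kind of simplification such an ASIC exploits can be illustrated by a nearest-codeword search that uses the sum of absolute differences, so the encoder needs only adders and comparators:

```python
# Hedged sketch of a multiplication-free nearest-codeword search (SAD metric);
# the paper's actual MLIVQ algorithm is not reproduced here.
import numpy as np

def vq_encode_sad(blocks, codebook):
    """blocks: (N, k) int array of image vectors; codebook: (M, k) int array.
    Returns the index of the closest codeword (by SAD) for every block."""
    indices = np.empty(len(blocks), dtype=int)
    for n, v in enumerate(blocks):
        dists = np.abs(codebook - v).sum(axis=1)   # additions/subtractions only
        indices[n] = int(np.argmin(dists))
    return indices
```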
Design of multiprocessor DSP chip set for superhigh-definition image processing
Tatsuya Fujii, Tomoko Sawabe, Mitsuru Nomura, et al.
This paper details the design of a DSP chip set for NOVI-III, a massively parallel digital signal processing system for super high definition (SHD) image coding. As a first step toward real-time SHD image processing, we implemented the JPEG coding algorithm on the highly parallel DSP system called NOVI-II HiPIPE and evaluated its processing performance. The result shows that a real-time CODEC must have an average computational performance of over 100 GFLOPS to process SHD motion images. Based on our experience with still SHD image coding, a new DSP chip set is being designed for NOVI-III. The chip set consists of three main chips. The first is a vector processor that, in its original form, had a peak performance of 120 MFLOPS; it is being redesigned to achieve a peak performance of 540 MFLOPS with 0.5 µm fabrication technology. The second is an intercommunication network switch that has six 400 Mbps data links. The third is a RISC-type DSP core which controls the vector processor, the communication switch, and the internal memories. This DSP core also has a special function that efficiently performs the bit operations required by the Huffman coding process.
Modified frame memories for motion estimation: the picture-processing RAM
Peter Schiefer
A modified frame memory, the Picture Processing RAM (PPRAM), is proposed for motion estimation. Because of the high input data flow of compression algorithms, the PPRAM is designed to perform the necessary calculations alongside the frame memory cells on the same device. Only a small number of logic circuits, less than 5% of the total number of transistors, has to be added to a regular frame memory. This significantly reduces the number of transistors required for the compression algorithm as well as the input data rate.