Proceedings Volume 2727

Visual Communications and Image Processing '96

Rashid Ansari, Mark J. T. Smith
View the digital version of this volume at SPIE Digital Library.

Volume Details

Date Published: 27 February 1996
Contents: 18 Sessions, 145 Papers, 0 Presentations
Conference: Visual Communications and Image Processing '96
Volume Number: 2727

Table of Contents

All links to SPIE Proceedings will open in the SPIE Digital Library.

Sessions
  • Image Coding I
  • Video Coding I
  • Object Recognition
  • Vector Quantization
  • Motion Estimation/Representation
  • Feature Detection
  • Wavelet/Multiresolution Image Compression
  • Motion Analysis and Estimation
  • Image Segmentation
  • Low-Bit-Rate Video Coding
  • Image/Video Analysis
  • Architectures and Implementation
  • Image Coding II
  • Image Sequence Segmentation
  • Interpolation/Reconstruction/Filtering
  • Video Coding II
  • Fractal Methods
  • Enhancement/Restoration
Image Coding I
Image coding for content-based retrieval
Mitchell D. Swanson, Srinath Hosur, Ahmed H. Tewfik
We develop a new coding technique for content-based retrieval of images and text documents which minimizes a weighted sum of the expected compressed file size and query response time. Files are coded into three parts: (1) a header consisting of concatenated query term codewords, (2) locations of the query terms, and (3) the remainder of the file. The coding algorithm specifies the relative position and codeword length of all query terms. Our approach leads to a progressive refinement retrieval by successively reducing the number of searched files as more bits are read. It also supports progressive transmission.
SAR image data compression algorithm for clipping service applications
Jin-Woo Nahm, Mark J. T. Smith
This paper addresses the SAR image data communications problem between airborne sensors and ground stations. Channel bandwidths are typically constrained, so a compression-transmission-decompression strategy is necessary to facilitate real-time transmission. In the sections that follow, we introduce a new compression-decompression algorithm for transmitting sensor data at low bit rates. The algorithm involves the computation and use of pre-transmission estimates of target locations. These estimates are used to allocate bits preferentially to enhance the quality of the image representation in areas where the probability of targets is high. It is shown that this algorithm is effective in improving the subjective quality of the transmitted images as well as the receiver-end target recognition performance.
Stereo image compression with disparity compensation using the MRF model
Woontack Woo, Antonio Ortega
In coming years there will be an increasing demand for realistic 3-D display of scenes using such popular approaches as stereo or multi-view images. As the amount of information displayed increases, so does the need for digital compression to ensure efficient storage and transmission of the sequences. In this paper, we introduce a new approach to stereo image compression based on the MRF model and MAP estimation. The basic strategy is to encode the right image as a reference, then estimate the disparity between blocks in the right and left images and transmit the disparity along with the error between the disparity-compensated left image and the original. This approach has been used in the literature and is akin to the block matching technique used for motion compensation in video coders. Its main drawback is that as the block size becomes smaller, the overhead required to transmit the disparity map becomes too large. Also, simple block matching algorithms frequently fail to provide good matches because the correspondences are locally ambiguous due to noise, occlusion, and repetition or lack of texture. The novelty in our work is that to compute the disparity map we introduce an MRF model with its corresponding energy equation. This allows us to incorporate smoothness constraints, to take occlusion into account, and to minimize the effect of noise in the disparity map estimation. Obtaining a smooth disparity map is beneficial as it reduces the overhead required to transmit it. It is also useful for video coding, since the robustness against noise ensures that disparity maps in successive frames will be very similar. We describe this new formulation in detail and provide compression results.
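The disparity-search step can be sketched in a few lines. The toy below (NumPy; the function name and the scalar smoothness weight `lam` are our own assumptions) replaces the full MRF energy with a one-dimensional penalty against the previous block's disparity, which already illustrates how a smoothness term reduces disparity-map overhead:

```python
import numpy as np

def block_disparity(left, right, block=8, max_disp=16, lam=2.0):
    """Per-block horizontal disparity from right to left image.

    Minimizes SAD plus a simple smoothness penalty against the previous
    block's disparity -- a 1-D stand-in for the full MRF energy.
    """
    h, w = right.shape
    disp = np.zeros((h // block, w // block), dtype=int)
    for by in range(h // block):
        prev_d = 0
        for bx in range(w // block):
            y, x = by * block, bx * block
            ref = right[y:y + block, x:x + block].astype(float)
            best_d, best_cost = 0, np.inf
            for d in range(0, max_disp + 1):
                if x + d + block > w:
                    break
                cand = left[y:y + block, x + d:x + d + block].astype(float)
                cost = np.abs(ref - cand).sum() + lam * abs(d - prev_d)
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[by, bx] = best_d
            prev_d = best_d
    return disp
```

In the paper's formulation the smoothness and occlusion terms live in a 2-D MRF energy minimized over the whole field; this sketch only shows why a smoother field costs fewer bits to transmit.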
Code-excited linear predictive coding of multispectral MR images
Jian-Hong Hu, Yao Wang, Patrick Cahill
This paper reports a multispectral code-excited linear predictive coding method for the compression of well-registered multispectral MR images. Different linear prediction models and adaptation schemes have been compared. The method that uses a forward-adaptive autoregressive (AR) model has proven to achieve a good compromise between performance, complexity, and robustness. This approach is referred to as the MFCELP method. Given a set of multispectral images, the linear predictive coefficients are updated over non-overlapping square macroblocks. Each macroblock is further divided into several microblocks, and the best excitation signal for each microblock is determined through an analysis-by-synthesis procedure. To satisfy the high quality requirement for medical images, the error between the original images and the synthesized ones is further coded using a vector quantizer. The MFCELP method has been applied to 26 sets of clinical MR neuro images (20 slices/set, 3 spectral bands/slice, 256 by 256 pixels/image, 12 bits/pixel). It provides a significant improvement over the discrete cosine transform (DCT) based JPEG method, a wavelet transform based embedded zero-tree wavelet (EZW) coding method, as well as the MSARMA method we developed before.
Improved Joint Bi-level Image Experts Group (JBIG) data compression of continuous-tone images
Liang Shen, Rangaraj M. Rangayyan
Lossless compression techniques are essential in some applications, such as archival and communication of medical images. In this paper, an improvement on the Joint Bi-level Image Experts Group (JBIG) method for continuous-tone image compression is proposed, which is basically an innovative combination of multiple decorrelation procedures, namely a lossless JPEG (Joint Photographic Experts Group)-based predictor, a transform-based inter-bit-plane decorrelator, and a JBIG-based intra-bit-plane decorrelator. On the JPEG standard set of 23 continuous-tone test images, our improved JBIG coding scheme outperformed lossless JPEG coding, JBIG coding, and the best mode of CREW coding (compression with reversible embedded wavelets) by 0.56 (8 bits/component images only), 0.14, and 0.12 bits per pixel on average, respectively. Our compression technique could be easily incorporated into currently existing JBIG-based products. A proper high-order entropy estimation algorithm is also presented, which indicates the potentially achievable lower-bound bit rate, and should be useful in decorrelation analysis as well as in the design of cascaded decorrelators.
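The inter-bit-plane decorrelation idea can be illustrated with Gray coding, a standard pre-processing step before bit-plane (JBIG-style) coding. This is a generic sketch of that family of techniques, not the authors' specific transform-based decorrelator; `img` is assumed to hold unsigned integers:

```python
import numpy as np

def to_gray_bitplanes(img, bits=8):
    """Split an integer image into bit planes after Gray-coding each pixel.

    Gray coding reduces inter-bit-plane correlation, which helps a
    bi-level coder such as JBIG compress each plane separately.
    """
    gray = img ^ (img >> 1)                 # binary -> Gray code
    planes = [(gray >> b) & 1 for b in range(bits)]
    return planes                           # planes[0] = least significant

def from_gray_bitplanes(planes):
    """Reassemble the bit planes and undo the Gray coding."""
    gray = np.zeros_like(planes[0], dtype=np.uint16)
    for b, p in enumerate(planes):
        gray |= p.astype(np.uint16) << b
    binary = gray.copy()
    shift = 1
    while shift < 16:                       # Gray -> binary: XOR down shifts
        binary ^= binary >> shift
        shift <<= 1
    return binary
```

Each plane is a bi-level image that can then be handed to a JBIG coder; the round trip is exactly lossless, as required for the medical-imaging use case above.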
Image reconstruction through projection of wavelet coefficients
Chul-Woo Kim, HyoJoon Kim, ChoongWoong Lee
This paper proposes an image reconstruction algorithm that adopts a projection scheme in the wavelet transform domain of the image signal. The wavelet-decomposed image is encoded by projecting the coefficients along one of eight directions, choosing the direction that approximates the original transform coefficients most closely. The projection data are vector quantized using separate codebooks for each decomposition level and orientation. Experimental results reveal that the proposed scheme shows excellent performance in terms of PSNR and also yields good subjective quality.
Video Coding I
Error-resilient decoding of randomly impaired MPEG-2 bit stream
JongWon Kim, Jong-Wook Park, Sang Uk Lee
This paper presents an error-resilient decoding technique based on error detection and DCT coefficient recovery for randomly impaired MPEG-2 bit streams. First, to simulate a more realistic and severe environment, we do not rely on the network support of the packet transport/adaptation protocol for error detection. Instead, the macroblocks corrupted by random bit errors are identified by a layered error detection algorithm. Next, assuming a smoothness constraint on image intensity, an objective function describing the inter-sample variations at the boundaries between the lost macroblock and the adjacent macroblocks is defined, and the damaged DCT coefficients are optimally recovered by solving a set of linear equations. Computer simulation results show that the quality of a recovered image is significantly improved even when the bit error rate is as high as 10^-5.
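The recovery step reduces to a small linear system. The sketch below (our own simplification, not the paper's exact objective) keeps only the lowest 2x2 DCT coefficients of the lost block and solves, in the least-squares sense, for the coefficients that make the block's border pixels match the neighbouring rows and columns:

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal 1-D DCT-II matrix; rows are basis vectors."""
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0] /= np.sqrt(2.0)
    return m

def recover_block(top, bottom, left, right, n=8, keep=2):
    """Recover a lost n x n block from its four neighbouring rows/columns.

    One linear equation per border pixel: the block's border (built from
    the keep x keep low-frequency DCT basis images) should match the
    adjacent pixels. Solved by linear least squares.
    """
    D = dct_matrix(n)
    basis = [np.outer(D[u], D[v]) for u in range(keep) for v in range(keep)]
    sel = [lambda B: B[0, :], lambda B: B[-1, :],
           lambda B: B[:, 0], lambda B: B[:, -1]]
    A = np.vstack([np.column_stack([s(b) for b in basis]) for s in sel])
    y = np.concatenate([top, bottom, left, right])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return sum(c * b for c, b in zip(coef, basis))
```

The paper's smoothness objective penalizes inter-sample variation across the macroblock boundary directly and recovers all damaged coefficients; this toy only shows why boundary continuity pins down low-frequency coefficients through linear equations.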
Deinterlacing algorithm based on sparse wide vector correlations
Yeong-Taeg Kim
In this paper, we propose a new deinterlacing algorithm based on sparse wide vector correlations, an extension of the deinterlacing algorithm previously proposed by the author, aimed at reducing the hardware complexity in applications. The proposed algorithm is developed mainly for the format conversion problem encountered in current HDTV systems, but is also applicable to the double-rate conversion problem in NTSC systems. By exploiting edge-oriented spatial interpolation based on wide vector correlations, visually annoying interlacing artifacts such as serrated lines, line crawling, line flicker, and large-area flicker can be remarkably reduced, since the use of wide vector correlations increases the range of orientations that can be detected. By introducing sparse vectors, the hardware complexity of realizing the algorithm can be significantly reduced. Simulations are also provided indicating that the proposed algorithm achieves performance comparable to that of the deinterlacing algorithm based on full wide vector correlations.
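The edge-oriented interpolation at the core of such deinterlacers can be sketched per pixel. The toy below (our own simplification: it correlates single samples rather than wide, sparsely sampled vectors) probes a few directions, picks the one whose above/below pair agrees best, and averages along it:

```python
import numpy as np

def ela_deinterlace_line(above, below, offsets=(-2, -1, 0, 1, 2)):
    """Edge-oriented interpolation of one missing scan line.

    For each pixel, pick the direction (offset) whose above/below sample
    pair matches best, then average along that direction. Widening the
    offset set -- or correlating whole vectors of pixels, as in the
    paper -- widens the range of detectable edge orientations.
    """
    w = len(above)
    out = np.empty(w)
    for x in range(w):
        best, best_diff = 0, np.inf
        for d in offsets:
            xa, xb = x + d, x - d
            if 0 <= xa < w and 0 <= xb < w:
                diff = abs(above[xa] - below[xb])
                if diff < best_diff:
                    best_diff, best = diff, d
        out[x] = 0.5 * (above[x + best] + below[x - best])
    return out
```

On a diagonal edge this reconstructs the missing line's step at the correct intermediate position, where plain vertical averaging would blur it.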
Eigenfeatures coding of videoconferencing sequences
Marcia G. Ramos, Sheila S. Hemami
This paper presents a video coding technique that achieves high visual quality at very low bit rates. Each video frame is divided into two regions, consisting of a background area and a visually important feature to be coded at higher bit rates. The feature is tracked from frame to frame and it is coded using a set of features that are extracted from a training set. The set of features, which will be referred to as eigenfeatures, is stored both at the encoder and decoder sites. The technique is based on the eigenfaces method, and achieves high visual quality at high feature compression ratios (around 200 for the salesman sequence and 1000 for the Miss America sequence) with considerably less computational complexity than the eigenfaces method. Using this technique for the feature together with H.261 for the background allows a reduction of up to 70% in the bit rate compared to using H.261 alone.
Rate control using spline-interpolated R-D characteristics
Liang-Jin Lin, Antonio Ortega, C.-C. Jay Kuo
Digital video's increased popularity has been driven to an extent by a flurry of recently proposed international standards. In most standards, the rate control scheme, which plays an important role in improving and stabilizing the decoding and playback quality, is not defined. Several techniques have been proposed that aim at the best possible quality for a given channel rate and buffer size. These approaches are complex in that they require the R-D characteristics of the input data to be measured. In this paper, we propose a method to approximate the rate and distortion functions that reduces the complexity of the optimization procedures while making a minimal number of a priori assumptions about the source data. In the proposed method, the R-D characteristics of image frames are approximated by spline interpolation functions, and the inter-frame dependency (for P or B frames in MPEG) is modelled by a linear-constant function. Application to a gradient-based rate-control scheme for MPEG shows that, for a typical MPEG encoder, the proposed model achieves the same performance at only about 10 to 15 percent of the computational cost.
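The interpolation idea can be sketched with a natural cubic spline (the abstract does not specify the exact spline form, so this is an assumption): measure the rate at a handful of quantizer settings, then evaluate the fitted spline everywhere else instead of re-running the encoder.

```python
import numpy as np

def cubic_spline(xs, ys):
    """Natural cubic spline through (xs, ys); returns an evaluator S(x).

    In the rate-control setting, xs would be the few quantiser settings
    at which rate (or distortion) was actually measured, and S(x)
    approximates the R-D curve at every other operating point.
    """
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    n = len(xs)
    h = np.diff(xs)
    # tridiagonal system for second derivatives M (natural: M[0]=M[-1]=0)
    A = np.zeros((n, n))
    rhs = np.zeros(n)
    A[0, 0] = A[-1, -1] = 1.0
    for i in range(1, n - 1):
        A[i, i - 1], A[i, i], A[i, i + 1] = h[i - 1], 2 * (h[i - 1] + h[i]), h[i]
        rhs[i] = 6 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    M = np.linalg.solve(A, rhs)

    def ev(x):
        i = int(np.clip(np.searchsorted(xs, x) - 1, 0, n - 2))
        t = x - xs[i]
        b = (ys[i + 1] - ys[i]) / h[i] - h[i] * (2 * M[i] + M[i + 1]) / 6
        return ys[i] + b * t + M[i] * t ** 2 / 2 + (M[i + 1] - M[i]) * t ** 3 / (6 * h[i])

    return ev
```

The optimizer then works on the smooth surrogate curve; only the few knot measurements require actual encoding passes, which is where the 85-90% cost saving comes from.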
Adaptive 3D subband video coding
Yong-Kwan Kim, Rin Chul Kim, Sang Uk Lee
In this paper, we propose a temporally adaptive three-dimensional subband coding (3D SBC) technique to effectively exploit the temporal activity in the input video. Using a rate-distortion performance measure, we show that the optimal number of temporal subbands can be easily determined. The base temporal subband, which concentrates most of the energy, is encoded using an H.261-like motion-compensated discrete cosine transform technique. In the higher temporal subbands, two-dimensional adaptive wavelet packet bases are employed to exploit the varying energy distributions due to the moving components. In encoding the subbands, we employ adaptive scanning methods, followed by uniform step-size quantization with variable length coding, and a coded/not-coded flag reduction technique based on the quadtree structure. Simulation results show that the proposed 3D SBC provides about 0.29 to 3.14 dB PSNR gain over the H.261 and temporally fixed 3D SBC techniques.
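The temporal stage of a 3D subband split can be illustrated with the simplest filter pair, the Haar transform (a stand-in; the paper's filters and adaptive band structure are not specified in the abstract): pairs of frames become a low band (temporal average, where energy concentrates) and a high band (temporal difference).

```python
import numpy as np

def temporal_haar(frames):
    """One level of temporal subband analysis over a frame sequence.

    Frame pairs are split into a low band (scaled average) and a high
    band (scaled difference); for static content the high band is zero,
    which is why the base band gets the motion-compensated coder.
    """
    f = np.asarray(frames, dtype=float)
    lo = (f[0::2] + f[1::2]) / np.sqrt(2.0)
    hi = (f[0::2] - f[1::2]) / np.sqrt(2.0)
    return lo, hi

def temporal_haar_inverse(lo, hi):
    """Perfectly reconstruct the frame sequence from the two bands."""
    f = np.empty((2 * len(lo),) + lo.shape[1:])
    f[0::2] = (lo + hi) / np.sqrt(2.0)
    f[1::2] = (lo - hi) / np.sqrt(2.0)
    return f
```

Cascading the split on the low band gives deeper temporal decompositions; the adaptive scheme above chooses how many levels to apply based on the rate-distortion measure.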
Performance analysis of an adaptive transform coder using uncovered background predictor
Keng-Pang Lim, Man Nang Chong, Amitabha Das
The drawbacks of current transform-based coding algorithms used in the CCITT H.261 video coding standard are evaluated. The inefficiencies of CCITT H.261 based algorithms are: (1) the image is partitioned into fixed blocks of 8 by 8 pixels for processing, so the encoded information has a fixed data structure which may not be efficient in terms of minimal data representation, and the fixed block-size approach is not adaptive to changes in scene activity; (2) the use of a single last frame for predictive conditional replenishment coding is not effective and is highly susceptible to accumulated interframe coding error. In this paper, we attempt to improve CCITT H.261 based coding algorithms by addressing these inefficiencies. Two major developments have been accomplished for efficient coding of video signals. First, an adaptive video coding algorithm is developed and implemented with the aim of obtaining the desirable attributes of an efficient data structure for the encoded information. Next, a background predictor is combined with the adaptive video coding algorithm to improve the efficiency of predictive conditional replenishment coding in H.261 based algorithms. The results obtained from the proposed adaptive transform coder with uncovered background predictor compare favorably with those of H.261 based algorithms. The new method yields a significantly higher compression ratio than the fixed block-size algorithm, while its peak signal-to-noise ratio is only a fraction of a decibel lower. The results were obtained by testing the new algorithm on standard test sequences.
Constant quality MPEG coder with bit stream shaping and peak rate control
Patrick J. van der Meer, Reginald L. Lagendijk, Jan Biemond
Variable bit rate transmission opens the way to constant quality video coding. However, this requires a different approach from traditional constant bit rate coding techniques, since constant distortion does not yield constant quality. We introduce a method to determine the maximum distortion locally, and a technique to minimize the bit rate subject to this local maximum distortion. It is also shown that with bit stream shaping and peak bit rate control, the bit rate will always be lower than that of a CBR source with similar quality in the most difficult scenes.
Object Recognition
Gaussian kernels for affine-invariant iconic representation and object recognition by multidimensional indexing
Jezekiel Ben-Arie, Zhiqian Wang, Raghunath K. Rao
This paper describes an approach for affine-invariant object recognition by iconic recognition of image patches that correspond to object surfaces that are roughly planar. Each surface is recognized separately invariant to its 3D pose, employing novel affine-invariant spectral signatures (AISSs). The 3D-pose invariant recognition is achieved by convolving the image with a novel configuration of Gaussian kernels and extracting local spectral signatures. The local spectral signature of each image patch is then matched against a set of iconic models using multi-dimensional indexing (MDI) in the frequency domain. Affine-invariance of the signatures is achieved by a new configuration of Gaussian kernels with modulation in two orthogonal axes. The proposed configuration of kernels is Cartesian with varying aspect ratios in two orthogonal directions. The kernels are organized in subsets where each subset has a distinct orientation. Each subset spans the entire frequency domain and provides invariance to slant, scale and limited translation. The union of differently oriented subsets is utilized to achieve invariance in two additional degrees of freedom, i.e. rotation and tilt. Hence, complete affine-invariance is achieved by the proposed set of kernels. The indexing method provides robustness to partial distortion, background clutter, noise, illumination effects and lower image resolution. The localized nature of the Gaussian kernels allows independent recognition of adjacent shapes that correspond to object parts which could have different poses. The method has yielded high recognition rates in experiments over a wide range of slant, scale, rotation, and tilt with a library of 26 gray-level and infra-red models, in the presence of noise, clutter and other degradations.
New construction method of bidirectional associative memory for object recognition
Te-Shin Chen, Kang-Ping Lin, Being-Tau Chung
A new construction method of bi-directional associative memory (BAM) for image pattern/object recognition is proposed in this paper. The strategy of the method is based on combining the major information of each object from both the spatial domain and the frequency domain. The BAM model is constructed from input object features together with the input object's Fourier spectrum. After the BAM is constructed, each similar input pattern can be retrieved or recognized automatically using the two-layer neural network. The presented construction method allows any new input object to be stored and added to the original BAM model without an extra training procedure. Experimental results indicate that the system exhibits good recognition ability for input objects with noise. The method also provides a means of distortion correction for deformed objects. The system has been implemented on a PC-based computer.
Moving object discrimination and tracking for unmanned surveillance system
Young Ho Kim, Kyu-Won Lee, Jun Geun Jeon, et al.
An efficient algorithm for discriminating and tracking a moving object for an unmanned surveillance system is proposed. The shape discrimination process uses a boundary map to determine whether the moving object is the target to be tracked. In order to obtain the target motion information after tracking has started (by removing the background motion from the scene), we use an extension of the directional selectivity theory for the motion model, together with a ramp-shaped profile of the intensity change at every edge point. We are currently designing a real-time dedicated system for moving target discrimination and tracking using high-speed programmable logic devices. Simulations of discriminating and tracking a moving object, limited in this paper to a walking person, are performed successfully using the proposed algorithm.
Cascade fuzzy ART: a new extensible database for model-based object recognition
Hai-Lung Hung, Hong-Yuan Mark Liao, Shing-Jong Lin, et al.
In this paper, we propose a cascade fuzzy ART (CFART) neural network which can be used as an extensible database in a model-based object recognition system. The proposed CFART networks can accept both binary and continuous inputs. Besides, it preserves the prominent characteristics of a fuzzy ART network and extends the fuzzy ART's capability toward a hierarchical class representation of input patterns. The learning processes of the proposed network are unsupervised and self-organizing, which include coupled top-down searching and bottom-up learning processes. In addition, a global searching tree is built to speed up the learning and recognition processes.
Improved adaptive boundary tracing using 2D dynamic programming
He Wang, Tian-ge Zhuang, Dazong Jiang, et al.
Edge detection is one of the most important steps in object recognition and in the 3D display of medical images. Although many techniques have been presented to extract edges, few of them can rapidly produce results that are thin, connected, closed, and optimal, and much of this work is still done manually. In this paper, after a review of previous work on edge detection, an improved dynamic programming (DP) based algorithm is proposed. Starting from an initial cost matrix derived from gradient information, a local window technique is introduced into traditional 2D DP to speed up the algorithm, with the optimal parameters determined adaptively within the window. As the local window moves forward along the boundary, the contour of a defined object is traced segment by segment, and finally an optimal and connected result is acquired. Experiments on synthetic images with and without noise provide a quantitative evaluation of the performance and computational complexity of the algorithm. In practical application, medical images are processed to show that the improved DP approach can achieve fast and accurate edge tracing.
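The underlying 2D DP can be sketched without the local-window speedup (our simplification): find the minimum-cost connected left-to-right path through a cost matrix that is low on the boundary, e.g. the negated gradient magnitude.

```python
import numpy as np

def trace_boundary(cost):
    """Minimum-cost left-to-right path through a cost matrix via DP.

    The path may move at most one row per column, so the result is a
    connected, single-pixel-wide contour: one row index per column.
    """
    rows, cols = cost.shape
    acc = cost.copy().astype(float)            # accumulated path cost
    back = np.zeros((rows, cols), dtype=int)   # backpointers
    for c in range(1, cols):
        for r in range(rows):
            lo, hi = max(0, r - 1), min(rows, r + 2)
            prev = acc[lo:hi, c - 1]
            k = int(np.argmin(prev))
            acc[r, c] = cost[r, c] + prev[k]
            back[r, c] = lo + k
    # backtrack from the cheapest endpoint in the last column
    path = [int(np.argmin(acc[:, -1]))]
    for c in range(cols - 1, 0, -1):
        path.append(int(back[path[-1], c]))
    return path[::-1]
```

The local-window refinement in the paper restricts this search to a moving window around the boundary so the full matrix never has to be processed at once.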
Texture classification based upon spatial autocorrelation
Ahmed Bounekkar, Michel P. Lamure, Nicolas Nicoloyannis
In this paper, we propose a method of texture classification based upon contiguity concepts and Moran's spatial autocorrelation coefficient. First, we introduce the essential concepts of contiguity and Moran's coefficient. We then propose a classification of a set of 30 textures based on computation of these Moran coefficients, and discuss the results obtained.
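Moran's coefficient for an image is straightforward to compute. The sketch below assumes 4-neighbour (rook) contiguity with unit weights; the paper may use other contiguity structures:

```python
import numpy as np

def morans_i(img):
    """Moran's spatial autocorrelation coefficient of a 2-D array.

    Uses 4-neighbour (rook) contiguity with weight 1 per adjacent pair.
    I is near +1 for smooth images, near -1 for checkerboard-like ones.
    """
    x = img.astype(float) - img.mean()
    n = x.size
    # sum of w_ij * x_i * x_j over horizontally and vertically contiguous
    # pairs (factor 2: each unordered pair is counted in both directions)
    num = 2 * ((x[:, :-1] * x[:, 1:]).sum() + (x[:-1, :] * x[1:, :]).sum())
    h, w = img.shape
    w_sum = 2 * (h * (w - 1) + (h - 1) * w)    # total weight
    return (n / w_sum) * num / (x ** 2).sum()
```

Computed over texture patches (possibly at several lags or contiguity orders), such coefficients form the feature vectors on which the classification operates.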
Personal system for practical face recognition
Kazue Fukushima, Harumi Kawamura, Makoto Kosugi, et al.
We propose a personal image processing system for face recognition. It locates a subject's face in the image and recognizes the person using only simple commercial devices such as an NTSC video camera and a personal computer. This system can locate a subject's face and recognize the person in real-time without imposing strict conditions on the background, by using mosaic pattern matching. We describe the system configuration and its algorithm, and show experimental results for various faces and environments.
Color scene recognition using relational distance measurement
In this paper, we present a method for the recognition of color images of outdoor scenes using a relational graph description and distance measurement. Color scenes of interest are modeled by considering 3-D to 2-D constraints, regions' color properties, and regions' adjacency relations. The scene models are stored in a database in region adjacency graph format. Each node of the graph represents an object or surface in the scene and includes the object's or surface's color as its unary property. Edges of the graph correspond to the binary adjacency relations between objects or surfaces, in accordance with the 3-D to 2-D mapping constraints. For the recognition of a color image of unknown origin, the uniformly colored object areas in the image are extracted using color clustering and a linear discriminant. Textured surfaces are then segmented using the Julesz conjecture. The extracted object regions are refined in the spatial plane to eliminate fine-grain segmentation results and are used as the nodes of the image region adjacency graph, with each node assigned its region's color information as a unary property. If two object regions share a common boundary, they are considered adjacent and their respective nodes are connected by an edge. A relational homomorphism (i.e., a mapping function) from the image graph to the scene graph is determined considering the unary properties and binary relations in the respective graph representations. The relational-distance measure is used for matching the relational graphs of the input scenes and the respective images, and the scene graph with the minimum structural error is selected as the best match for the image being processed.
Vector Quantization
Classified residue vector quantization by visual patterns
K. W. Chan, Kwok-Leung Chan
A new classified residue vector quantization (CRVQ) scheme based on visual patterns (VPs), requiring no side information, was developed. The original image was first decomposed into a low frequency component (LFC), which was highly correlated with the original, and a residue. The residue was classified neither by itself nor by the original, but by the LFC. With 15 VPs and 4 variance classes, the visual quality was enhanced by 0.5 to nearly 1.5 dB, without any penalty in bit rate.
Algorithms for intervector dependency utilization in variable-rate tree-structured vector quantization
Ligang Lu, William A. Pearlman
In this paper we improve and extend our previous results on finite-state variable-rate tree-structured vector quantization (FSVRTSVQ) by developing two new algorithms to increase its coding efficiency. The two algorithms utilize the inter-codeword dependency and the tree-structure property of the codebook to achieve bit rate reduction. Evaluation of both algorithms on various synthetic and real sources shows bit rate savings of as much as 32.2% over pruned tree-structured vector quantization (PTSVQ).
Entropy-constrained finite-state residual vector quantization: a new scheme for low-bit-rate coding
Finite-state vector quantization (FSVQ) is known to give better performance than memoryless vector quantization (VQ). Recently, a new scheme that incorporates finite memory into a residual vector quantizer (RVQ) has been developed, referred to as finite-state RVQ (FSRVQ). FSRVQ gives better performance than conventional FSVQ with a substantial reduction in memory requirements, and its codebook search complexity is also reduced in comparison with that of the conventional FSVQ scheme. This paper presents a new variable-rate VQ scheme called entropy-constrained finite-state residual vector quantization (EC-FSRVQ). EC-FSRVQ is designed by incorporating a constraint on the output entropy of an FSRVQ during the design process. The scheme is intended for low bit rate applications due to its low codebook search complexity and memory requirements. Experimental results show that EC-FSRVQ outperforms JPEG at low bit rates.
New vector transform for image coding
Shipeng Li, Weiping Li
Vector quantization (VQ) always outperforms scalar quantization. Recently, vector transform coding (VTC) has been introduced to better combine signal processing with vector quantization, and has been shown to achieve better performance in image coding. How much performance advantage, in terms of rate-distortion, can a vector transform coding scheme gain over other coding schemes? What is the optimal vector transform (VT) under a complexity constraint on the VQ? These are the questions we try to answer in this paper. Based on results from high-resolution (asymptotic in rate) quantization theory, we obtain a general rate-distortion formula for signal processing combined with vector quantization for a first-order Gauss-Markov source. We prove that VTC indeed performs better than other existing coding schemes of the same or lower complexity under the rate-distortion measure. A new mirror-sampling-based vector transform which involves only additions and subtractions is proposed. For the high-rate case, we show that the new VTC scheme achieves the optimal performance under the complexity constraint. A 2D version of the new vector transform is applied to image coding, and the results show that it consistently outperforms the subsampling-based vector transform.
Optimized multilevel codebook searching algorithm for vector quantization in image coding
Hugh Q. Cao, Weiping Li
An optimized multi-level codebook searching (MCS) algorithm for vector quantization is presented in this paper. Although it belongs to the category of fast nearest neighbor searching (FNNS) algorithms for vector quantization, the MCS algorithm is not a variation of any existing FNNS algorithm (such as the k-d tree, partial-distance, or triangle-inequality searching algorithms). A multi-level search theory is introduced, and the problem of implementing this theory is solved by a specially defined irregular tree structure built from a training set. This irregular tree structure differs from the tree structures used in TSVQ, pruned-tree VQ, and quadtree VQ. Strictly speaking, it is not a tree, since it allows a node to have more than one set of parents; it is a directed graph. This is the essential difference between the MCS algorithm and other TSVQ algorithms, and it ensures better performance. An efficient design procedure is given to find the optimized irregular tree for a practical source. Simulation results of applying the MCS algorithm to image VQ show that it can reduce the searching complexity to less than 3% of exhaustive-search vector quantization (ESVQ) (4096 codevectors, dimension 16) while introducing negligible error (0.064 dB degradation from ESVQ). Simulation results also show that the searching complexity increases nearly linearly with bit rate.
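For context, one of the classical FNNS baselines named above, partial-distance search, fits in a few lines (this is the baseline, not the MCS algorithm itself): accumulate the squared distance dimension by dimension and abandon a codevector as soon as its running sum exceeds the best distance found so far.

```python
import numpy as np

def partial_distance_search(codebook, x):
    """Nearest codevector by partial-distance search.

    Exact result (same as exhaustive search), but most candidates are
    rejected after only a few of the vector's dimensions are examined.
    """
    best_i, best_d = 0, np.inf
    for i, c in enumerate(codebook):
        d = 0.0
        for xj, cj in zip(x, c):
            d += (xj - cj) ** 2
            if d >= best_d:      # cannot beat the current best: bail out
                break
        else:                    # completed all dimensions: new best
            best_i, best_d = i, d
    return best_i, best_d
```

Structured searches such as MCS go further by pruning whole groups of codevectors at once via the (directed-graph) hierarchy, at the cost of no longer being guaranteed exact.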
Variable-length tree-structured subvector quantization
It is demonstrated in this paper that the encoding complexity advantage of a variable-length tree-structured vector quantizer (VLTSVQ) can be enhanced by encoding low dimensional subvectors of a source vector instead of the source vector itself at the nodes of the tree structure without significantly sacrificing coding performance. The greedy tree growing algorithm for the design of such a vector quantizer codebook is outlined. Different ways of partitioning the source vector into its subvectors and several criteria of interest for selecting the appropriate subvector for making the encoding decision at each node are discussed. Techniques of tree pruning and resolution reduction are applied to obtain improved coding performance at the same low encoding complexity. Application of an orthonormal transformation such as KLT or subband transformation to the source and the implication of defining the subvectors from orthogonal subspaces are also discussed. Finally, simulation results on still images and an AR(1) source are presented to confirm our propositions.
Variable block-size interpolative vector quantization
Krit Panusopone, K. R. Rao
Conventional fixed block-size vector quantization (VQ) usually operates on low-dimensional data to limit the computational load, and suffers from blocking effects at low bit rates. To handle this problem, the input data are arranged into variable-dimension vectors so that the correlation between any two vectors is weak. This paper uses quadtree partitioning to form variable block-size regions. Instead of taking all the data of a segmented area directly, a constant number of pixels is selectively subsampled from each terminal node to form a VQ input vector. With this improvement, a single universal codebook can handle all kinds of image data. At the decoder, the reduced-dimension vector is interpolated back to its full-resolution information. Simulation results show that the reconstructed images preserve fine, pleasing quality in both edge and background regions, and the search time for the VQ coder is also reduced significantly. Furthermore, a PSNR comparison of the reconstructed images reveals better performance for the proposed method than for the traditional fixed block-size scheme at low bit rates.
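The quadtree partitioning step can be sketched with a variance-based split criterion (the threshold and criterion are our assumptions; the paper does not give them in the abstract):

```python
import numpy as np

def quadtree_blocks(img, min_size=4, var_thresh=100.0):
    """Variable block-size segmentation by quadtree splitting.

    Recursively splits a square, power-of-two sized image into four
    quadrants while a block's variance exceeds `var_thresh`; returns
    the leaves as (row, col, size) tuples. Active areas end up as small
    blocks, flat areas as large ones.
    """
    leaves = []

    def split(r, c, s):
        block = img[r:r + s, c:c + s]
        if s > min_size and block.var() > var_thresh:
            h = s // 2
            for dr in (0, h):
                for dc in (0, h):
                    split(r + dr, c + dc, h)
        else:
            leaves.append((r, c, s))

    split(0, 0, img.shape[0])
    return leaves
```

Each leaf would then be subsampled down to a fixed number of pixels so that one universal codebook serves every block size, as described above.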
Variable-rate lattice VQ algorithm for vector subband coding
Hugh Q. Cao, Cindy C. Wang, Weiping Li
In this paper, a variable-rate lattice VQ algorithm for vector subband coding is proposed. It is a modification of the two-stage vector quantization-lattice vector quantization (VQ-LVQ) algorithm introduced by J. Pan and T. R. Fischer. The modified algorithm targets low bit rate coding instead of the moderate-to-high bit rate range of the original design. It has been applied to a low bit rate vector subband coding scheme. Simulation results show that it reduces the memory requirement while offering even better rate-distortion performance (up to 1 dB) than regular unstructured VQ.
Motion Estimation/Representation
Subpixel motion estimation in DCT domain
Ut-Va Koc, Kuo Juey Ray Liu
Currently existing subpixel motion estimation algorithms require interpolation of inter-pixel values. However, it is shown in this paper that, by introducing the concept of pseudo phase shifts in the DCT coefficients of a moving object's intensity, subpixel displacements can be estimated from two consecutive frames in the DCT domain without interpolating inter-pixel values. Specifically, under a certain condition, subpixel motion information is preserved in the DCT coefficients of a shifted signal, and extraction of this subpixel motion is based upon the subpel sinusoidal orthogonal principles. Furthermore, the resulting algorithms are flexible and scalable in terms of estimation accuracy and have very low computational complexity, O(N^2), compared to O(N^4) for the full-search block matching approach and its subpixel versions. Above all, motion estimation in the DCT domain instead of the spatial domain simplifies the conventional standard-compliant video coder, especially the heavily loaded feedback loop in the conventional design, resulting in a fully DCT-based, high-throughput, standard-compliant video codec. In addition, the computation of pseudo phases is local, so a highly parallel architecture is feasible for the DCT-based algorithms. A set of experiments on a number of video sequences demonstrates that the DCT-based subpixel motion estimation schemes provide performance comparable to the block matching approach.
Rate-constrained contour-representation for region-based motion compensation
Klaus Werner Stuhlmueller, Albert Salai, Bernd Girod
A new region-based coding scheme is introduced. It uses generalized parabolic blending curves for contour coding which lead to visually preferable smooth contours and support good local controllability. A rate-distortion criterion is applied to decide which regions should be used for motion compensation and for optimizing the region contours. The conditions for optimal regions in a rate-distortion sense are derived. Simulation results are presented. A gain of about 2 dB in PSNR for region-based motion compensation with optimized regions over block-based motion compensation at very low bit-rates is reported.
Closed-form solutions for polygon-based node motion estimation
Motion compensation using 2-D mesh models requires computation of the parameters of a spatial transformation within each mesh element (patch). It is well known that the parameters of an affine (bilinear or perspective) mapping can be uniquely estimated from three (four) node-point motion estimates. This paper presents closed-form overdetermined solutions for least squares estimation of the motion parameters, which also preserve mesh-connectivity using node-based connectivity constraints. In particular, two new algorithms are presented: the first, based on dense motion estimates, can be viewed as post-processing of the dense motion field for the most compact representation in terms of irregularly spaced samples, while the second, based on spatio-temporal intensity gradients, offers closed-form solutions for direct estimation of the best node-point motion vectors. We show that the performance of the proposed closed-form solutions is comparable to that of alternative search-based solutions at a fraction of the computational cost.
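For the affine case, the node-point fit is an ordinary linear least-squares problem. The sketch below (our own naming, using NumPy's `lstsq`) recovers the six affine parameters exactly from three non-collinear nodes and gives the overdetermined least-squares fit when more node motion estimates are available:

```python
import numpy as np

def affine_from_nodes(src, dst):
    """Least-squares fit of the six affine parameters mapping src node
    points to dst node points (>= 3 non-collinear points required)."""
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    # One row [x, y, 1] per node; both output coordinates solved at once.
    A = np.column_stack([src, np.ones(len(src))])
    params, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return params  # 3x2 matrix: coefficients of x, y, and the translation

def apply_affine(params, pts):
    """Map points through the fitted affine transformation."""
    pts = np.asarray(pts, float)
    return np.column_stack([pts, np.ones(len(pts))]) @ params
```

With exactly three nodes the normal equations have a unique solution; with more, the same call returns the least-squares estimate the paper's closed-form solutions generalize.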
Motion-based representation of video sequences using variable block sizes
Johanna V. Gisladottir, Kannan Ramchandran, Michael T. Orchard
In standard video coding algorithms image frames are represented partly through motion information and partly through intensity information. Through motion information, a new frame is described with respect to the previous frame in the video sequence, and through intensity information it is described without reference to the previous frame. As long as consecutive frames are related through motion, motion-based information generally yields more efficient representation than intensity-based information. This observation has motivated the development of a new coding scheme in which image sequences are represented solely in terms of their motion fields. In the new coding scheme every pixel in a new image frame is assigned a motion vector that fully conveys the new pixel intensity. The resulting full resolution motion field fully characterizes the new frame. The motion field is then coded and sent to the decoder. Such a motion based representation has been shown to be more efficient than standard hybrid representations. In this paper the motion-based coding scheme is generalized to allow variable block sizes. Image frames are represented in terms of quadtrees that describe their division into blocks of varying sizes. Simulations show that variable block sizes allow for more efficient motion-based coding of video sequences than fixed block sizes do.
Image restoration-based template matching with application to motion estimation
Mun Gi Choi, Nikolas P. Galatsanos, Dan Schonfeld
In this paper, we investigate the relationship between the image restoration and template matching problems. A new formulation of template matching is presented. According to this formulation, the relationship between image restoration and template matching is captured by the observation that the restored template can be used to ascertain its location. As a consequence, the duality between image restoration and template matching is established. An alternative approach is also presented that interprets template matching as a special case of blurred image restoration. This approach is based on the observation that template matching can be viewed as image restoration of the degraded version of an unknown image blurred by a linear point spread function -- the template. It is shown that, unlike in image restoration, all second-order statistics required for the implementation of linear minimum mean-square error (LMMSE) template matching can be easily estimated. LMMSE template matching is subsequently applied to block-matching motion estimation from degraded image sequences. Our experimental results indicate, first, that LMMSE template matching is superior to traditional block matching and, second, that certain applications, for example multichannel image sequence restoration, greatly benefit from LMMSE-based template matching.
Lossy techniques for motion vector encoding
Mihai Sipitca, Vijay K. Madisetti
In this paper we investigate the advantages of lossy encoding of the motion vectors (MVs). This is done in the context of accommodating the larger amount of MV information that results when a smaller macroblock (MB) size is used. After estimating the improvement in the displaced frame difference (DFD) variance associated with decreasing the MB size to 8 for the salesman QCIF sequence, we examine the lossy DCT performance in the context of the rate-distortion trade-off. A theoretical argument in favor of the Walsh-Hadamard transform (WHT) is made, and the experimental results back up the assumption that it performs better than the DCT. The influence of the kernel size is also investigated. A combination of the WHT with lossy techniques embedded in the MV estimation process, which improve the MV field redundancy, is shown to perform best for the lossy transform case. We then examine VQ encoding of the MVs, and the results are compared with the transform case in terms of rate-distortion performance and flexibility.
Field and frame-based motion estimator with a very flexible search range
Jin Suk Kwak, Jinwoong Kim, Kichul Kim
In this paper, a low-latency, high-throughput semi-systolic array architecture is proposed which can perform both field-based and frame-based motion estimation within a single data-flow. Fully pipelined operation for forward and backward motion estimation is provided without any time delay. The proposed architecture has the following properties: (1) The search-area data is broadcast to every processing element at the same time, and the reference block data is transferred from one processing element to the next at every clock cycle. This data-flow minimizes the initial delay and reduces the amount of hardware for storing the search-area data. (2) Each processing element computes multiple mean absolute differences (MADs) one by one. Therefore, by using the processing elements repeatedly, motion estimation with a very flexible search range can be performed without any additional hardware. (3) Using the fact that the MAD for a frame reference block can be obtained as the sum of the two MADs for its top and bottom field blocks, the architecture performs both field-based and frame-based motion estimation within a single data-flow.
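The decomposition exploited in property (3) is easy to verify numerically: with a sum-of-absolute-differences cost (the MAD differs only by a normalization constant), the frame cost is exactly the top-field cost plus the bottom-field cost. A minimal sketch:

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return int(np.abs(a.astype(int) - b.astype(int)).sum())

def fields(block):
    """Top field = even lines, bottom field = odd lines of an interlaced block."""
    return block[0::2], block[1::2]

def frame_cost_from_fields(ref, cand):
    """Frame-block cost assembled from the two field-block costs."""
    rt, rb = fields(ref)
    ct, cb = fields(cand)
    return sad(rt, ct) + sad(rb, cb)   # equals sad(ref, cand)
```

Because every frame line belongs to exactly one field, the two field sums partition the frame sum, which is what lets the architecture reuse the same processing elements for both modes.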
Motion-compensated partition coding
This paper deals with the coding of the partition information resulting from the segmentation of video sequences. Motion compensation of partition sequences is described as an efficient inter-frame mode of coding. It involves the prediction of the partition, the computation of the partition compensation error, the simplification of the error and its transmission. The major issues and processing steps of a general motion compensation loop for partitions are presented and discussed.
Feature Detection
Generalized feature detection using the Karhunen-Loeve transform and expansion matching
Zhiqian Wang, Raghunath K. Rao, Dibyendu Nandy, et al.
This paper presents a novel generalized feature extraction method based on the expansion matching (EXM) method and the Karhunen-Loeve (KL) transform. This yields an efficient method to locate a large variety of features with a reduced number of filtering operations. The EXM method is used to design optimal detectors for different features. The KL representation is used to define an optimal basis for representing these EXM feature detectors with minimum truncation error. Input images are then analyzed with the resulting KL bases. The KL coefficients obtained from the analysis are used to efficiently reconstruct the response due to any combination of feature detectors. The method is applied to real images and successfully extracts a variety of arc and edge features, as well as complex junction features formed by combining two or more arc or line features.
Automated extraction of the LV from 3D cardiac ultrasound scans
Edward A. Ashton, Daniel Phillips, Kevin J. Parker
In this paper, a novel technique is presented for the extraction of features from 3D medical image sequences. This technique involves grayscale segmentation, followed by application of a 3D deformable model algorithm which smooths the data and compensates for drop-out regions in the segmentation. These properties are particularly desirable in the application studied here: the extraction of the left ventricle (LV) from a 3D ultrasound scan. The algorithm is shown to produce a good reconstruction of the LV, as well as an accurate measurement of its volume.
Optimal filter for detection of stellate lesions and circumscribed masses in mammograms
Thor Ole Gulsrud, Sissel Kjode
This paper presents a new method for segmentation of digital mammograms based on the use of an optimal filter for feature extraction. The filter is optimized with respect to the Fisher criterion, i.e., the mean feature values for the textures to be discriminated are maximally separated while the feature variances are low. Our optimal filter is used in the detection of two types of tumors: circumscribed masses and stellate lesions. We take advantage of the similarity between stellate lesions and circumscribed masses and use the same filter in the segmentation of both types of abnormalities. Results of an experimental study using a set of 60 mammograms -- 22 containing circumscribed masses, 19 containing stellate lesions, and 19 containing no tumors -- are presented.
Optimal filtering scheme for unsupervised texture feature extraction
Trygve Randen, Vidar Alvestad, John Hakon Husoy
In this paper a technique for unsupervised optimal feature extraction and segmentation for textured images is presented. The image is first divided into cells of equal size, and similarity measures on the autocorrelation functions for the cells are estimated. The similarity measures are used for clustering the image into clusters of cells with similar textures. Autocorrelation estimates for each cluster are then estimated, and two-dimensional texture feature extractors using filters, optimal with respect to the Fisher criterion, are constructed. Further, a model for the feature response at and near the texture borders is developed. This model is used to estimate whether the positions of the detected edges in the image are biased, and a scheme for correcting such bias using morphological dilation is devised. The article is concluded with experimental results for the proposed unsupervised texture segmentation scheme.
Maximum-weight bipartite matching technique and its application in image feature matching
Yong-Qing Cheng, Victor Wu, Robert Collins, et al.
An important and difficult problem in computer vision is to determine 2D image feature correspondences over a set of images. In this paper, two new affinity measures for image points and lines from different images are presented, and are used to construct unweighted and weighted bipartite graphs. It is shown that the image feature matching problem can be reduced to an unweighted matching problem in the bipartite graphs. It is further shown that the problem can be formulated as the general maximum-weight bipartite matching problem, thus generalizing the above unweighted bipartite matching technique.
Optimal algorithm for detecting two-dimensional images
Richard J. Qian, Thomas S. Huang
In this paper, we present a new two-dimensional (2D) edge detection algorithm. The algorithm detects edges in 2D images by a curve-segment-based edge detection functional that uses the zero crossing contours of the Laplacian of Gaussian (LoG) as initial conditions to approach the true edge locations. We prove that the proposed edge detection functional is optimal in terms of signal-to-noise ratio and edge localization accuracy for detecting general 2D edges. In addition, the detected edge candidates preserve the nice scaling behavior held uniquely by the LoG zero crossing contours in scale space. The algorithm also provides: (1) an edge regularization procedure that enhances the continuity and smoothness of the detected edges; (2) an adaptive edge thresholding procedure, based on a robust global noise estimation approach and two physiologically motivated criteria, to help generate edge detection results similar to those perceived by human visual systems; and (3) a scale space combination procedure that reliably combines edge candidates detected at different scales.
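The algorithm's starting point, the LoG zero-crossing contours, can be sketched as follows (using SciPy's `gaussian_laplace`; the magnitude tolerance is our own addition to suppress sign jitter in flat regions):

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def log_zero_crossings(img, sigma=2.0, tol=1e-6):
    """Candidate edge map: pixels where the Laplacian-of-Gaussian response
    changes sign against a horizontal or vertical neighbour."""
    r = gaussian_laplace(np.asarray(img, float), sigma)
    zc = np.zeros(r.shape, dtype=bool)
    h = ((np.signbit(r[:, :-1]) != np.signbit(r[:, 1:]))
         & (np.abs(r[:, :-1] - r[:, 1:]) > tol))
    v = ((np.signbit(r[:-1, :]) != np.signbit(r[1:, :]))
         & (np.abs(r[:-1, :] - r[1:, :]) > tol))
    zc[:, :-1] |= h
    zc[:-1, :] |= v
    return zc
```

These contours serve only as initial conditions; the paper's functional then moves the curve segments toward the true edge locations.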
Algorithm to locate the nodal points of a projected mesh
Charalabos Brilakis
An algorithm is proposed to automate the classical image processing method for analyzing the three-dimensional (3-D) shape of an object's surface using shape from stereo. An orthogonal mesh (a two-dimensional cross grating) is projected onto the object's surface and recorded as images with two cameras. For correspondence between the surface of the object and the images, the grating line numbers are used. Therefore, the pixel positions of the nodal points of the mesh (the intersections of the center lines of the grating lines) must be located. These are the interest points used for correspondence. The algorithm proposed in this paper addresses the problem of locating these points. The traditional thresholding-and-thinning procedure does not perform well; therefore, the points are located as the centers of the intersection areas.
Automatic assessment of the appearance of seam pucker on textiles using Hough transform
Koji Ichikawa, Tsunehiro Aibara, Maki Muranaka, et al.
Two methods for automated assessment of the appearance of seam pucker, based on the Hough transform, are proposed. We treat this as a pattern recognition problem. From given standard photographs of suits, which are classified into five classes, we determine a template pattern for each class. These patterns are well separated in the feature space. Although there are several items to be assessed, we focus our attention on the seam of the back of suits. Using a few test samples, we conducted an assessment experiment. The results suggest the possibility of practical use.
Multiresolution sequential image change detection with wavelets
Yawgeng A. Chau, Jar-Chi Shee
Multiresolution image change detection based on the wavelet expansion is addressed. The multiresolution change detection is modeled as a sequential hypothesis test problem. A modified multistage truncated sequential probability ratio test (TSPRT) is developed for the change detection problem. With the multistage TSPRT, the devised change detection scheme employs multiresolution images with increasing sample sizes. Maximum likelihood (ML) estimation is used to obtain the mean, variance, and relevant correlation coefficients of the image signals for the test. To determine the thresholds of the TSPRT, a suboptimal technique based on constant false-alarm and miss probabilities for the hypothesis test is considered. Experimental results are presented to illustrate the performance of the developed multiresolution change detection scheme; they show that the algorithm can accurately reveal the changed areas in a consecutive image sequence.
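The classical single-stage Wald SPRT, the building block that the multistage truncated variant extends, can be sketched as follows. The Wald threshold approximations below are textbook defaults, not the paper's suboptimal CFAR design, and the Gaussian means and variance stand in for the ML estimates:

```python
import math

def sprt(samples, mu0, mu1, sigma, alpha=0.01, beta=0.01):
    """Wald SPRT between H0: N(mu0, sigma^2) ("no change") and
    H1: N(mu1, sigma^2) ("change"). Accumulates the log-likelihood
    ratio over samples until a decision threshold is crossed."""
    upper = math.log((1 - beta) / alpha)    # decide H1 above this
    lower = math.log(beta / (1 - alpha))    # decide H0 below this
    llr = 0.0
    for n, x in enumerate(samples, 1):
        llr += ((x - mu0) ** 2 - (x - mu1) ** 2) / (2 * sigma ** 2)
        if llr >= upper:
            return 'H1', n
        if llr <= lower:
            return 'H0', n
    return 'undecided', len(samples)
```

The multistage scheme in the paper applies such a test per resolution level, truncating it and passing undecided regions to the next, finer level with a larger sample size.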
Wavelet/Multiresolution Image Compression
Efficient coding of wavelet trees and its applications in image coding
Bin Zhu, En-hui Yang, Ahmed H. Tewfik, et al.
We propose in this paper a novel lossless tree coding algorithm. The technique is a direct extension of the bisection method, the simplest case of the complexity reduction method proposed recently by Kieffer and Yang for lossless data string coding. A reduction rule is used to obtain the irreducible representation of a tree, and this irreducible tree is entropy-coded instead of the input tree itself. The reduction is reversible, and the original tree can be fully recovered from its irreducible representation. More specifically, we search for equivalent subtrees from top to bottom. When equivalent subtrees are found, a special symbol is appended to the value of the root node of the first equivalent subtree, the root node of the second subtree is assigned the index which points to the first subtree, and all other nodes in the second subtree are removed. This procedure is repeated until the tree cannot be reduced further, which yields the irreducible tree, or irreducible representation, of the original tree. The proposed method can effectively remove the redundancy in an image and results in more efficient compression. It is proved that as the tree size approaches infinity, the proposed method offers optimal compression performance. It is generally more efficient in practice than direct coding of the input tree. The proposed method can be directly applied to code wavelet trees in non-iterative wavelet-based image coding schemes. A modified method is also proposed for coding wavelet zerotrees in embedded zerotree wavelet (EZW) image coding. Although its coding efficiency is slightly reduced, the modified version maintains exact control of the bit rate and the scalability of the bit stream in EZW coding.
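The flavour of the reduction can be illustrated on tuple-encoded trees. This sketch uses hashing of whole subtrees and a hypothetical `'ref'` marker for the special symbol, so it shows the repeated-subtree idea and its reversibility rather than the authors' exact reduction rule:

```python
def reduce_tree(tree):
    """Replace each repeated subtree with a reference to its first
    occurrence. A tree is a tuple (value, child, ...); a leaf is (value,).
    Returns the reduced tree plus the table of first occurrences."""
    seen = {}    # subtree -> index of its first occurrence
    table = []   # first occurrences, in discovery order

    def walk(node):
        if node in seen:
            return ('ref', seen[node])   # special symbol pointing back
        if len(node) > 1:
            reduced = (node[0],) + tuple(walk(c) for c in node[1:])
        else:
            reduced = node
        seen[node] = len(table)
        table.append(node)
        return reduced

    return walk(tree), table

def expand(node, table):
    """Invert the reduction: the original tree is fully recoverable."""
    if node[0] == 'ref':
        return table[node[1]]
    if len(node) > 1:
        return (node[0],) + tuple(expand(c, table) for c in node[1:])
    return node
```

In a real coder the `'ref'` marker would be a reserved symbol outside the node alphabet, and the reduced tree, not the original, is what gets entropy-coded.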
Robust image communication using subband coding and multilevel modulation
John M. Lervik, Tor A. Ramstad
Visual communication systems can exploit the advantages of both analog transmission and digital compression: analog transmission allows the quality of the recovered data to degrade gracefully as channel conditions vary, whereas digital compression allows powerful data reduction. In this paper, a system optimized for bandwidth- and power-limited channels which combines analog and digital communication principles is proposed. The system achieves high bandwidth compression efficiency by combining subband signal decomposition and dynamic bandwidth allocation with 81-PAM transmission. Furthermore, graceful degradation is obtained by finding mappings from the subband samples to the channel space which minimize the impact of channel errors on the reconstructed signal. The optimized mapping system outperforms a conventional system using random mappings. In addition, it is shown that the proposed system performs better, in a rate-distortion sense for a given channel signal-to-noise ratio, than a similar subband system using conventional bit allocation, optimized mappings, and 64-PAM. A novel definition for measuring system efficiency, based on comparisons to pure analog PAM transmission, is used to evaluate the proposed system. The ideas presented in this paper can be used for future terrestrial TV broadcasting and mobile communication systems.
New results for fractal/wavelet image compression
Gregory Caso, C.-C. Jay Kuo
In this research, we perform a multiresolution analysis of the mappings used in fractal image compression. We derive the transform-domain structure of the mappings and demonstrate a close connection between fractal image compression and wavelet transform coding using the Haar basis. We show that under certain conditions, the mappings correspond to a hierarchy of affine mappings between the subbands of the transformed image. Our analysis provides new insights into the mechanism underlying fractal image compression, leads to a non-iterative transform-domain decoding algorithm, and suggests a transform-domain encoding method with extensions to wavelets other than the Haar transform. As a result, we also propose a novel transform-domain encoding scheme whereby the image is hierarchically encoded, starting with a coarse approximation, and detail is added through a causal sequence of affine mappings between the subbands. This new approach is not only theoretically elegant but also useful for embedded representation, i.e., images can be decoded at an intermediate resolution.
Hierarchical weighted vector quantization (HWVQ) for embedded wavelet coding
We propose a coding scheme that performs vector quantization (VQ) of the wavelet transform coefficients of an image. The proposed scheme uses different vector dimensions for different wavelet subbands and also different codebook sizes so that more bits are assigned to those subbands that have more energy. Another element of the proposed method is that the vector codebooks used are obtained recursively from the image that is to be compressed. By ordering the bit-stream properly, we can maintain the embedding property, since the wavelet coefficients are ordered according to their energy. Preliminary numerical experiments are presented to demonstrate the performance of the proposed method.
Multiresolution transform and its application to image coding
Traditional wavelet edge detection and encoding schemes preserve shape features of objects effectively at a variety of spatial scales and also allow an efficient means of image and video compression. However, these schemes also remove texture from imagery, and thus image reproduction quality suffers. Fractal encoding techniques, on the other hand, achieve high compression ratios but tend to introduce blocky artifacts. We therefore describe an encoding method that combines the shape-preserving capability of wavelets with the compression qualities of fractal coding in a hybrid multiresolution technique that achieves high compression and selective, accurate feature preservation, and is computationally efficient.
Wavelet domain textual coding of Ottoman script images
Oemer Nezih Gerek, Enis A. Cetin, Ahmed H. Tewfik
Image coding using the wavelet transform, DCT, and similar transform techniques is well established. On the other hand, these coding methods neither take into account the special characteristics of the images in a database nor are they suitable for fast database search. In this paper, the digital archiving of Ottoman printings is considered. Ottoman documents are printed in Arabic letters. Witten et al. describe a scheme based on finding the characters in binary document images and encoding the positions of the repeated characters. This method efficiently compresses document images and is suitable for database search, but it cannot be applied to Ottoman or Arabic documents because the concept of a character is different in Ottoman and Arabic: typically, one has to deal with compound structures consisting of a group of letters, so the matching criterion must be defined over those compound structures. Furthermore, the text images are gray tone or color images for Ottoman scripts, for the reasons described in the paper. In our method the compound structure matching is carried out in the wavelet domain, which reduces the search space and increases the compression ratio. In addition to the wavelet transformation, which corresponds to linear subband decomposition, we also used nonlinear subband decomposition. The filters in the nonlinear subband decomposition have the property of preserving edges in the low resolution subband image.
Volumetric medical image compression with three-dimensional wavelet transform and octave zerotree coding
Compression of 3D or 4D medical image data has become imperative for clinical picture archiving and communication systems (PACS), telemedicine and telepresence networks. While lossless compression is often desired, lossy compression techniques are gaining acceptance for medical applications, provided that clinically important information can be preserved in the coding process. We present a comprehensive study of volumetric image compression with three-dimensional wavelet transform, adaptive quantization with 3D spatial constraints, and octave zerotree coding. The volumetric image data is first decomposed using 3D separable wavelet filterbanks. In this study, we adopt a 3-level decomposition to form a 22-band multiresolution octree pyramid. An adaptive quantization with 3D spatial constraints is then applied to reduce the statistical and psychovisual redundancies in the subbands. Finally, to exploit the dependencies among the quantized subband coefficients resulting from the 3D wavelet decomposition, an octave zerotree coding scheme is developed. The proposed volumetric image compression scheme is applied to a set of real CT medical data. Significant coding gain is achieved, demonstrating the effectiveness of the proposed scheme for medical as well as other applications.
Joint wavelet-based coding and packetization for video transport over packet-switched networks
Hung-ju Lee
In recent years, wavelet theory applied to image, audio, and video compression has been studied extensively. However, pursuing compression ratio alone, without considering the underlying networking systems, is unrealistic, especially for multimedia applications over networks. In this paper, we present an integrated approach that attempts to preserve the advantages of wavelet-based image coding while providing a degree of robustness to lost packets over packet-switched networks. Two different packetization schemes, the intrablock-oriented (IAB) and interblock-oriented (IRB) schemes, used in conjunction with wavelet-based coding, are presented. Our approach is evaluated under two different packet loss models with various packet loss probabilities, through simulations driven by real video sequences.
Motion Analysis and Estimation
Estimation of accelerated motion for motion-compensated frame interpolation
Peter Csillag, Lilla Boroczky
In motion-compensated processing of image sequences, e.g. in frame interpolation, frame rate conversion, deinterlacing, motion blur correction, image sequence restoration, slow-motion replay, etc., knowledge of the motion is essential. In these applications, motion information has to be determined from the image sequence. Most motion estimation algorithms use only a simple motion model and assume linear, constant-speed motion. The contribution of our paper is the development of an algorithm for modeling and estimating accelerated motion trajectories, based on a second-order motion model. This model is more general and much closer to the real motion present in natural image sequences. The parameters of the accelerated motion are determined from two consecutive motion fields, which are estimated from three consecutive image frames using a multiresolution pel-recursive Wiener-based motion estimation algorithm. The proposed algorithm was successfully tested on artificial image sequences with synthetic motion, as well as on real-life videophone and videoconferencing sequences, in a frame interpolation environment.
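Under a second-order model with unit frame interval, the two consecutive displacement fields determine per-pixel velocity and acceleration in closed form; the notation in this minimal sketch is our own:

```python
def accel_params(d1, d2):
    """Second-order trajectory p(s) = p(0) + v*s + (a/2)*s^2 gives
    d1 = p(0) - p(-1) = v - a/2   (motion field, frame t-1 -> t)
    d2 = p(1) - p(0) = v + a/2    (motion field, frame t -> t+1),
    so velocity and acceleration follow from the two fields."""
    v = (d1 + d2) / 2.0
    a = d2 - d1
    return v, a

def displacement_at(v, a, s):
    """Displacement from frame t to the intermediate instant t + s
    (0 <= s <= 1), as needed for motion-compensated frame interpolation."""
    return v * s + 0.5 * a * s * s
```

With `a = 0` this degenerates to the usual constant-speed interpolation; a nonzero `a` bends the trajectory toward the true accelerated path.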
Reducing rate/complexity in video coding by motion estimation with block adaptive accuracy
Jordi Ribas-Corbera, David L. Neuhoff
Classical block-based motion-compensated video coders need to find and code a motion field with one motion vector per image block. All motion vectors are computed and encoded with the same fixed accuracy, typically full-pixel or half-pixel accuracy. Higher motion vector accuracies have been shown to significantly reduce the total bit rate in some video sequences, but motion estimation at such subpixel accuracies is computationally expensive and is usually not performed in practice. In this paper we show that computing and encoding different motion vectors with different accuracies in the same frame can significantly reduce the total bit rate, and that the complexity of the block-adaptive motion estimation procedure can be as low as that of classical motion estimation at full-pixel accuracy. Our new block-adaptive motion estimator uses a simple technique to decide how accurately to compute the motion vector for each block. This technique results from an analysis of the effect of motion vector accuracy on the total bit rate, which indicates that the motion vectors of more highly textured blocks must be computed more accurately, and that at higher levels of compression lower motion vector accuracies suffice. We implement two video coders based on our technique, present results on real video frames, and describe the rate/complexity benefits of our procedure.
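The decision rule can be caricatured with a simple texture measure. The gradient-based measure and both thresholds below are hypothetical, meant only to illustrate the qualitative finding that textured blocks deserve finer accuracy and that heavier compression relaxes it:

```python
import numpy as np

def mv_accuracy(block, quant_step, t_low=5.0, t_high=20.0):
    """Pick a motion-vector accuracy (in pels) for one block:
    more texture -> finer accuracy; coarser quantization -> coarser accuracy."""
    texture = (np.mean(np.abs(np.diff(block, axis=0)))
               + np.mean(np.abs(np.diff(block, axis=1))))
    effective = texture / quant_step   # heavier compression lowers the stakes
    if effective > t_high:
        return 0.25   # quarter-pel
    if effective > t_low:
        return 0.5    # half-pel
    return 1.0        # full-pel
```

Because the decision uses only the current block's pixels, it adds almost nothing to the cost of the search itself, which is how the overall complexity can stay near that of full-pixel estimation.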
Jointly optimal forward-backward motion-compensated prediction for video signals
Rajesh Rajagopalan, Michael T. Orchard
Motion compensation of video has been studied within two general frameworks: forward prediction, in which motion parameters are computed at the encoder and transmitted over the channel, and backward prediction, in which motion parameters are computed at the decoder. Recent work has proposed a promising hybrid forward-backward approach (some parameters are transmitted, others are computed at the decoder) that exploits the best features of both. However, the forward information used in that work consisted of standard block matching motion vectors, and that work did not address how to optimize motion parameters in the context of both forward and backward information. In this work, we propose a jointly optimal forward-backward motion compensation approach. Applications of our approach to inter-frame prediction and denoising are presented. Experimental results demonstrate excellent performance gains; in particular, prediction efficiency improves by 26% on average compared to block matching at half-pel accuracy, and denoising performance is close to optimal.
Optimal unified approach to warping and overlapped block estimation in video coding
Aria Nosratinia, Michael T. Orchard
Among the class of block-based motion estimators, warping prediction and overlapped block motion estimation have emerged recently as two of the most effective inter-frame estimation algorithms. Warping estimation is based on linear operations in the motion field, while overlapped block estimators are based on a linear sum of motion-based estimators in the intensity field. In this paper, we propose a unified framework for estimation of pixel intensities based jointly on warping and overlapped blocks. We motivate our estimator through a discussion of ambiguities in an incomplete (sparsely sampled) motion field, observing that different object motions call for resolution of motion ambiguities in either the motion or the intensity domain. We offer a means of optimizing the joint estimator simultaneously in the intensity and motion domains, thus guaranteeing improved performance compared to warping and overlapped block estimators, which the joint model contains as special cases. Furthermore, the joint framework provides an excellent vehicle for studying the interactions and relative merits of warping and overlapped block estimators in the presence of various motion scenarios.
Fast motion vector estimation for video coding based on multiresolution-spatiotemporal correlations
Junavit Chalidabhongse, Sungook Kim, C.-C. Jay Kuo
In this paper, we propose a new fast algorithm for motion vector (MV) estimation based on the correlations of MVs in spatially and temporally adjacent blocks as well as in hierarchically related blocks. The main idea is to select a good set of MV candidates using information obtained at the coarser resolution level and from spatio-temporal blocks at the same level, and then perform a further local search to refine the MV result. The experimental results show that the proposed fast algorithm achieves a speed-up factor ranging from 150 to 310 with only a 2-7% MSE increase in comparison with the full-search block matching algorithm on typical test video sequences.
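The candidate-then-refine strategy described above can be sketched serially as follows. The candidate set, block size, and one-pixel refinement step are illustrative assumptions; the paper's actual candidate selection uses specific hierarchical and spatio-temporal correlations not modeled here:

```python
import numpy as np

def sad(ref, cur, bx, by, dx, dy, b=8):
    """SAD of the block at (bx, by) in `cur` against `ref` displaced by (dx, dy)."""
    y, x = by + dy, bx + dx
    if y < 0 or x < 0 or y + b > ref.shape[0] or x + b > ref.shape[1]:
        return float("inf")
    return int(np.abs(ref[y:y + b, x:x + b].astype(int)
                      - cur[by:by + b, bx:bx + b].astype(int)).sum())

def fast_mv(ref, cur, bx, by, candidates):
    """Evaluate only a small candidate set (e.g. MVs of spatially and
    temporally adjacent blocks plus the coarser-level MV), then refine
    the winner with a +/-1 pixel local search until it stops improving."""
    best = min(candidates, key=lambda mv: sad(ref, cur, bx, by, *mv))
    best_cost = sad(ref, cur, bx, by, *best)
    improved = True
    while improved:
        improved = False
        for ddx in (-1, 0, 1):
            for ddy in (-1, 0, 1):
                mv = (best[0] + ddx, best[1] + ddy)
                c = sad(ref, cur, bx, by, *mv)
                if c < best_cost:
                    best, best_cost, improved = mv, c, True
    return best
```

Because only a handful of displacements are ever evaluated per block, the cost is a small fraction of a full search, which is the source of the speed-up factors the abstract reports.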
Motion segmentation in RGB image sequence based on hidden MRF and 6D Gaussian distribution
Adam Kurianski, Takeshi Agui, Hiroshi Nagahashi
The problem of motion segmentation in RGB image sequences is addressed. The proposed algorithm is based on local motion modeling and a pixel-labeling approach. The information vector used for labeling consists of six components: three color components and three color differences. To develop the labeling algorithm, a statistical model of the motion sequence based on a six-variate Gaussian distribution is chosen. Moreover, a hidden Markov random field (MRF) framework is employed in order to carry out the segmentation more accurately. Experimental results of applying the method to an RGB sequence showing a woman's turning head are included and discussed.
Myocardial motion estimation using elastic mapping technique
Kang-Ping Lin, Te-Shin Chen
This paper presents a self-organizing method for estimating a transformation that can elastically map one subject's MR cardiac image to an image of the same slice acquired at a different point in the heartbeat sequence. One slice image, called the input image, is elastically deformed to match the other, called the reference image. Both images are divided into several smaller areas of equal size. A local correspondence measure is used to estimate the best matching position by moving individual areas of the input image over the reference image within a search neighborhood. Based on the local correspondence, coarse displacement vectors for each area are determined from the position difference between the original and new area centers. The estimated vectors provide a complete transformation that matches the entire input image to the reference image. As the process is repeated, a better transformation is obtained that improves the matching. This algorithm has been tested on simulated 2D deformed images and found to be successful for MR cardiac images.
Motion estimation with variable velocity bandwidth
Regis J. Crinon, Wojciech J. Kolodziej
A new phase-based motion estimation algorithm is introduced. The estimation is performed such that only non-aliased temporal frequencies are included in the calculation of the translation vector describing the displacement of an object in a digital video sequence. A full description of the algorithm is presented. It is based on applying a two-dimensional fast Fourier transform to two consecutive video fields or frames. Simulation results illustrate the capabilities of the algorithm in an object-tracking application featuring fast frame-to-frame motion. It is shown that the proposed algorithm outperforms a conventional phase-based motion estimation algorithm.
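For context, the conventional phase-based estimator that the paper improves upon recovers a translation from the normalized cross-power spectrum of two frames. A minimal sketch of that baseline follows; the treatment of aliased temporal frequencies, which is the paper's contribution, is not modeled:

```python
import numpy as np

def phase_correlation(f1, f2):
    """Estimate the integer translation mapping frame f1 onto frame f2
    via basic phase correlation: the phase of the cross-power spectrum
    inverse-transforms to an impulse at the displacement."""
    F1, F2 = np.fft.fft2(f1), np.fft.fft2(f2)
    cross = F2 * np.conj(F1)
    cross /= np.abs(cross) + 1e-12            # keep phase information only
    surface = np.real(np.fft.ifft2(cross))    # impulse at the displacement
    dy, dx = np.unravel_index(np.argmax(surface), surface.shape)
    if dy > f1.shape[0] // 2:                 # wrap large indices to signed shifts
        dy -= f1.shape[0]
    if dx > f1.shape[1] // 2:
        dx -= f1.shape[1]
    return dy, dx
```

Restricting which spectral bins contribute to the peak, as the paper proposes, would amount to masking `cross` before the inverse transform.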
Image Segmentation
Image segmentation based on multiscale random field models
Ahmet Mufit Ferman, Erdal Panayirci
Recently a new approach to Bayesian image segmentation has been proposed by Bouman and Shapiro, based on a multiscale random field (MSRF) model along with a sequential MAP (SMAP) estimator as an efficient and computationally feasible alternative to MAP segmentation. Their method, however, is restricted to image models whose observed pixels are conditionally independent given their class labels. In this paper, we follow this approach and extend the SMAP method to a more general class of random field models. The proposed scheme is recursive, yields the exact MAP estimate, and is readily applicable to a broad range of image models. We present simulations on synthetic images and conclude that the generalized algorithm performs better and requires much less computation than maximum likelihood segmentation.
Segmentation of scanned document images for efficient compression
Hei Tao Fung, Kevin J. Parker
A scanned, complex document image may be composed of text, graphics, halftones, and pictures whose layout is unknown. In this paper, we propose a novel segmentation scheme for scanned document images that facilitates their efficient compression. Our scheme segments an input image into binarizable and non-binarizable components. By a binarizable component we mean a region that can be represented by no more than two gray levels (or colors) with acceptable perceptual quality; a non-binarizable component is a region that must be represented by more than two gray levels (or colors) to achieve acceptable perceptual quality. Once the components are identified, the binarizable components can be thresholded and compressed as a binary image using an efficient binary encoding scheme, together with the gray values represented by the black and white pixels of the binary image. The non-binarizable components can be compressed using another suitable encoding scheme.
Automated connectivity-based thresholding segmentation of midsagittal brain MR images
Chulhee Lee, Michael A. Unser, Terence A. Ketter
In this paper, we propose an algorithm for automated segmentation of midsagittal brain MR images. First, we apply thresholding to obtain binary images. From the binary images, we locate some landmarks. Based on the landmarks and anatomical information, we preprocess the binary images to eliminate small regions and remove the skull, which substantially simplifies the subsequent operations. We perform segmentation in the binary image as much as possible and then return to the gray scale image to solve problematic areas. We propose a new connectivity-based thresholding segmentation to separate brain regions from surrounding tissues. Experiments show promising results.
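The first two stages described above, thresholding followed by connectivity analysis of the resulting binary image, can be sketched as follows. The 4-connectivity and the idea of keeping the largest component as the brain region are simplifying assumptions; the paper's connectivity-based thresholding and landmark handling are more elaborate:

```python
import numpy as np
from collections import deque

def threshold_and_label(img, thresh):
    """Binarize an image, then label 4-connected foreground components
    by breadth-first search; small components can then be discarded and
    the largest one retained as the region of interest."""
    binary = img > thresh
    labels = np.zeros(img.shape, int)
    current = 0
    for sy, sx in zip(*np.nonzero(binary)):
        if labels[sy, sx]:
            continue                      # pixel already belongs to a component
        current += 1
        q = deque([(sy, sx)])
        labels[sy, sx] = current
        while q:
            y, x = q.popleft()
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < img.shape[0] and 0 <= nx < img.shape[1]
                        and binary[ny, nx] and not labels[ny, nx]):
                    labels[ny, nx] = current
                    q.append((ny, nx))
    return labels, current
```

Working on the labeled binary image first, and returning to gray values only for problematic areas, is what keeps the subsequent operations cheap.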
Heuristic segmentation and feature extraction algorithm for optical character recognition of Arabic script
Fatos T. Yarman-Vural, A. Atici
In this paper, a heuristic method is developed for segmentation, feature extraction, and recognition of Arabic script. The study is part of a large project for the transcription of documents in the Ottoman Archives. A geometrical and topological feature analysis method is developed for the segmentation and feature extraction stages. A chain code transformation is applied to the main strokes of the characters, which are then classified by a hidden Markov model (HMM) in the recognition stage. Experimental results indicate that the performance of the proposed method is impressive, provided that the thinning process does not yield spurious branches.
Robust parallel clustering algorithm for image segmentation
Jose Gerardo Tamez-Pena, Arnulfo Perez
This paper describes a hierarchical parallel implementation of two clustering algorithms applied to the segmentation of multidimensional images and range images. The proposed hierarchical parallel implementation results in a fast, robust segmentation algorithm that can be applied to a number of practical computer vision problems. The clustering process is divided into two basic steps. First, a fast sequential clustering algorithm performs a simple analysis of the image data, which results in a suboptimal classification of the image features. Second, the resulting clusters are analyzed using the minimum volume ellipsoid estimator, and similar clusters are merged using the number and shape of the ellipsoidal clusters that best represent the data. Both algorithms are implemented on a parallel computer architecture that speeds up the classification task. The hierarchical clustering algorithm is compared against the fuzzy k-means clustering algorithm, showing that both approaches give comparable segmentation results. The hierarchical parallel implementation is tested on synthetic multidimensional images and real range images.
Image processing algorithm design and implementation for real-time autonomous inspection of mixed waste
Robert J. Schalkoff, Khaled M. Shaaban, Albrecht E. Carver
The ARIES #1 vision system is used to acquire drum surface images under controlled conditions and subsequently perform autonomous visual inspection leading to a classification as 'acceptable' or 'suspect.' Specific topics described include vision system design methodology, algorithmic structure, hardware processing structure, and image acquisition hardware. Most of these capabilities were demonstrated at the ARIES Phase II Demo which was held on November 30, 1995. Finally, Phase III efforts are briefly addressed.
Generalized connected operators
Albert Oliveras, Philippe Salembier
This paper deals with the notion of connected operators. These operators are becoming popular in image processing because they have the fundamental property of simplifying the signal while preserving the contour information. In a first step, we recall the basic notions involved in binary and gray level connected operators. Then, we show how one can extend and generalize these operators. We focus on two important issues: the connectivity and the simplification criterion. We show in particular how to create connectivities that are either more or less strict than the usual ones and how to build new criteria.
Low-Bit-Rate Video Coding
Low-complexity smoother (LO-COST) for low-bit-rate codecs
Brian C. Davison
A new low-complexity postprocessing algorithm is presented for enhancing noisy images in real-time videoconferencing applications. The algorithm analyzes the image from the perspective of the human viewer, classifies pixels according to local characteristics, and applies combinations of linear and non-linear filters to each class. Edge pixels are sharpened, pixels near edges are median filtered to mitigate mosquito noise, and pixels in uniform regions are smoothed to dampen blocking artifacts. The algorithm requires limited processing and can be used independently of the videoconferencing standard in use. Objective analysis shows that errors are decreased for a majority of pixels. In subjective tests using low bit-rate coding standards, the processing was judged to significantly improve the image quality of the decoded sequences in all cases.
Fast and efficient mode and quantizer selection in the rate distortion sense for H.263
Guido M. Schuster, Aggelos K. Katsaggelos
In this paper, a fast and efficient method for selecting the encoding modes and the quantizers for the ITU H.263 standard is presented. H.263 is a very low bit rate video coder which produces satisfactory results at bit rates around 24 kbits/second for low motion quarter common intermediate format (QCIF) color sequences such as 'mother and daughter.' Two major target applications for H.263 are video telephony using public switched telephone network lines and video telephony over wireless channels. In both cases, the channel bandwidth is very small, hence the efficiency of the video coder needs to be as high as possible. The presented algorithm addresses this problem by finding the smallest frame distortion for a given frame bit budget. The presented scheme is based on Lagrangian relaxation and dynamic programming (DP). It employs a fast evaluation of the operational rate distortion curve in the DCT domain and a fast iterative search which is based on a Bezier function.
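The budget-constrained selection problem solved above by Lagrangian relaxation can be illustrated in miniature. The sketch below assumes independent blocks, each with a hypothetical table of (rate, distortion) options per mode/quantizer choice, and bisects on the Lagrange multiplier; the paper's method additionally uses dynamic programming to handle dependencies between blocks and a fast DCT-domain rate-distortion evaluation:

```python
def lagrangian_select(options, budget, iters=60):
    """Choose one (rate, distortion) option per block so that the total
    rate meets a frame bit budget: minimize D + lambda*R independently
    per block and bisect on lambda until the rate constraint is met."""
    def solve(lam):
        picks = [min(opts, key=lambda rd: rd[1] + lam * rd[0])
                 for opts in options]
        return picks, sum(r for r, _ in picks)

    lo, hi = 0.0, 1e6                  # hi assumed large enough to be feasible
    for _ in range(iters):
        mid = (lo + hi) / 2
        _, rate = solve(mid)
        if rate > budget:
            lo = mid                   # over budget: penalize rate more heavily
        else:
            hi = mid                   # feasible: try a smaller penalty
    return solve(hi)[0]
```

Larger multipliers push every block toward cheaper, coarser quantizers, which is how a single scalar sweeps out the operational rate-distortion curve.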
Region-based representation of video sequences with uniform background motion for a content-based image coding
Jenny Benois-Pineau, Apostolos Saflekos, Dominique Barba
Content-based image coders are now the center of attention for the emerging MPEG-4 standard. A method based on spatio-temporal segmentation for motion image coding is developed in this paper. The method is designed for sequences characterized by homogeneous global motion (camera motion) and the presence of semantic objects having their own motions. A key property of the method is the fit of the moving borders to the spatial contours of regions, which allows for high-quality predicted images without any error encoding.
Efficient segmentation-based motion compensation for low-bit-rate video coding
Mihai Sipitca, Vijay K. Madisetti
In this paper, we investigate the performance of a segmentation-based motion compensation algorithm for videoconferencing which uses the motion information obtained from a set of analog position sensors. The position sensors provide high resolution motion information which is used to perform an affine transformation-based motion compensation. The simulation results are presented in the form of displaced frame difference (DFD) variance improvement versus the usual block matching algorithm. The proposed algorithm has the following advantages: (1) a very small amount of motion information needs to be transmitted, (2) the error energy is concentrated mainly in the boundary regions which makes it less noticeable, (3) different regions of the frame can be transmitted with different quality, and (4) a smaller computational load than advanced techniques is required.
Application of motion-compensated prediction to coding ultrasound video
Alen Docef, Mark J. T. Smith
B-mode ultrasound video is a very effective diagnostic modality. Popular image sequence coding techniques based on motion compensation such as MPEG and H.263 give poor results when applied to ultrasound video, because of the noisy nature of the signal. By using motion compensation in conjunction with a model-based ultrasound representation, we improve the quality of reconstructed images significantly.
Detection and encoding of model failures in very low bit rate video coding
Taner Ozcelik, Aggelos K. Katsaggelos
One of the challenging problems for most existing video codecs is the detection and encoding of the information pertaining to model failure areas, i.e., areas where motion compensation is insufficient. The insufficiency may have several causes, such as background uncovered by moving objects, complex motion, etc. Existing approaches to the detection and encoding of model failures are closely tied to the encoding scheme in which they are built, particularly to the specific motion estimation algorithm used; therefore, generalization of these algorithms to other coding techniques is not possible. On the other hand, efficient encoding of the position and intensity field information in these areas is also crucial to the performance of very low bit rate codecs, and existing approaches fail to meet the target bit rates with satisfactory image quality. In this paper, a new method to detect model failure areas is described. The model failure areas are detected based on a motion-compensated prediction of the current frame, independently of the motion estimation algorithm, so the proposed method can be used with any type of coding scheme. In addition, efficient and robust encoding of the boundary and intensity information is described. Simulation results demonstrate that with the described method, the requirements of very low bit rate coding can be satisfactorily met.
Perceptually tuned low-bit-rate video codec for ATM networks
Chun-Hsien Chou
In order to maintain high visual quality when transmitting low bit-rate video signals over asynchronous transfer mode (ATM) networks, a layered coding scheme that incorporates the human visual system (HVS), motion compensation (MC), and conditional replenishment (CR) is presented in this paper. An empirical perceptual model is proposed to estimate the spatio-temporal just-noticeable distortion (STJND) profile for each frame, by which perceptually important (PI) prediction-error signals can be located. Because of the limited channel capacity of the base layer, only the coded data of the motion vectors, the PI signals within a small strip of the prediction-error image and, if there are remaining bits, the PI signals outside the strip are transmitted by the cells of the base-layer channel. The rest of the coded data are transmitted by the second-layer cells, which may be lost due to channel error or network congestion. Simulation results show that the visual quality of the reconstructed CIF sequence is acceptable when the base-layer channel is allocated a capacity of 2 × 64 kbps and the cells of the second layer are all lost.
Irregular triangular mesh representation based on adaptive control point removal
Kangwook Chun, Byeungwoo Jeon, Jae Moon Jo
Several new approaches are being investigated in conjunction with low bit rate coding, such as MPEG-4, to overcome the limitations imposed by block-based image compression. One solution is to use 'warping' prediction (or spatial transformation) based on a set of control points, where one of the most important issues is how to place the control points adequately without destroying salient features such as edges and corners. In this paper, we propose a new image representation scheme based on an irregular triangular mesh structure in which, taking the salient features into account, a considerably reduced number of control points is adaptively selected out of an initial set of uniformly distributed control points. A new criterion based on local representation error is defined for use in successive control point removal exploiting global image features, thus providing better image representation. Computer simulation has shown that the proposed scheme gives significantly improved image representation performance compared with the conventional scheme based on regular meshes, in both objective and subjective quality.
Simplification of 3D scanned head data for use in real-time model-based coding systems
Ricardo Lopez, Thomas S. Huang
In this paper we present an algorithm for reducing a set of high-density scanned range data to a simplified polygonal mesh. Of major interest is the application of this algorithm to Cyberware 3D range data of human heads to produce simple yet accurate wireframe approximations for use in model-based video coding systems. The objective is to decimate the range data while maintaining acceptable levels of resolution over critical sections of the face, such as areas of high curvature (nose, mouth) and sections with fine detail (eyes). Areas such as foreheads and cheeks, which are relatively smooth, are represented with lower geometric detail. The algorithm employs a quadtree-based representation of the range data, and subsequent mergings of the leaf nodes are determined by a multi-variable cost function. Factors taken into account in creating the cost function are fitting error, polygon aspect ratios, and improvement in mesh simplification. This method has been tested on a database of Cyberware head data and the results are presented at the end of the paper. Experiments show that the algorithm provides considerable data reduction, and the resulting simplified wireframe is accurate enough for use in a real-time model-based coding system.
Image/Video Analysis
Hybridized edge preservation coefficient for anisotropic diffusion
Auapong Yaicharoen, Scott Thomas Acton
This paper addresses the construction of the edge-preservation coefficient of anisotropic diffusion for image enhancement. We evaluate the performance of existing diffusion techniques in the presence of several corruptive processes. The results from this study were used to design a hybrid algorithm which capitalizes on the strengths of the current diffusion coefficients. The new edge-preserving algorithm adaptively determines the presence of noise or edges at each image location and selects the appropriate diffusion coefficient. The results generated by this algorithm exhibit improvements in mean squared error, signal-to-noise ratio, and visual quality. A comparative study shows the performance of the standard diffusion techniques and the hybrid algorithm for images corrupted by Laplacian-distributed, Gaussian-distributed, and 'salt and pepper' impulse noise.
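The standard diffusion scheme whose coefficients the hybrid algorithm selects among can be sketched in one explicit update step. The exponential edge-stopping function and the parameter values below are the classical Perona-Malik choices, used here as an assumed baseline rather than the paper's hybrid coefficient:

```python
import numpy as np

def perona_malik_step(img, kappa=20.0, dt=0.2):
    """One explicit step of Perona-Malik anisotropic diffusion with the
    edge-stopping coefficient g(d) = exp(-(d/kappa)^2): smooth where
    gradients are small, preserve edges where they are large."""
    img = img.astype(float)
    # four directional differences, with replicated borders
    n = np.vstack([img[:1], img[:-1]]) - img
    s = np.vstack([img[1:], img[-1:]]) - img
    e = np.hstack([img[:, 1:], img[:, -1:]]) - img
    w = np.hstack([img[:, :1], img[:, :-1]]) - img
    g = lambda d: np.exp(-(d / kappa) ** 2)
    # dt <= 0.25 keeps the explicit 4-neighbor scheme stable
    return img + dt * (g(n) * n + g(s) * s + g(e) * e + g(w) * w)
```

A hybrid scheme in the spirit of the paper would replace the single `g` with a per-pixel choice among several coefficients, driven by a local noise/edge decision.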
Application of blue noise mask in color halftoning
Meng Yao, Kevin J. Parker
Color halftoning using a conventional screen requires rotating the screen by different angles for different color planes to avoid Moire patterns. An obvious advantage of halftoning with a blue noise mask (BNM) is that there are no screen angles or Moire patterns. However, a simple strategy of employing the same BNM on all color planes is unacceptable in cases where a small registration error can cause objectionable color shifts. In a previous paper, we proposed shifting or inverting the BNM for different color planes. The shifting technique can, at certain shift values, introduce low-frequency content into the halftone image, whereas the inverting technique can be used on only two color planes. In this paper, we propose a technique that uses four distinct BNMs correlated in such a way that the low-frequency noise resulting from the interaction between the BNMs is significantly reduced.
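Applying any such mask is a pointwise screening operation: each pixel is compared against the mask threshold at that position, with the mask tiled across the image. This minimal sketch shows the mechanism for a single plane with an arbitrary dither mask; constructing the four correlated blue noise masks is the paper's contribution and is not modeled here:

```python
import numpy as np

def halftone_with_mask(gray, mask):
    """Screen a grayscale image against a dither mask: a pixel is turned
    on where its value exceeds the (tiled) mask threshold. With a blue
    noise mask the resulting dot pattern has no screen angles."""
    h, w = gray.shape
    mh, mw = mask.shape
    tiled = np.tile(mask, (h // mh + 1, w // mw + 1))[:h, :w]
    return (gray > tiled).astype(np.uint8)
```

For color work, each plane would be screened against its own mask; the correlation between the masks controls the low-frequency noise their interaction produces.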
Quantitative analysis of blue noise mask generation
Meng Yao, Lan Gao, Kevin J. Parker
The blue noise mask (BNM) is a stochastic screen that produces visually pleasing blue noise. In its construction, a filter is applied to a given dot pattern to identify clumps in order to add or remove dots and thereby generate a correlated binary pattern for the next level. But up to now, all the filters were selected on a qualitative basis. There is no reported work describing precisely how the filtering and selection of dots affects the perceived error of the binary pattern. In this paper, we give a strict mathematical analysis of the BNM construction based on a human visual model, which provides insights to the filtering process and also prescribes the locations of the dots that will result in a binary pattern of minimum perceived error when swapped. The analysis also resolves some unexplained issues noticed by other researchers.
Adaptive image matching in the subband domain
Hualu Wang, Shih-Fu Chang
In this paper we discuss image matching by correlation in the subband domain with prospective applications. Theoretical proof is given to show that the correlation of two signals equals the weighted sum of the correlations of their decomposed subband signals. We propose an adaptive method to compute image correlation directly in the subband domain, which avoids decoding of the compressed data. Compared with pixel-domain correlation, this method reduces computation by more than ten times with satisfactory accuracy. We also compare the effects of template size, number of iterations of subband decomposition, and filter type on the speed and accuracy. Complexity estimations and test results are given. In addition, several techniques that involve image correlation are investigated for application in image matching.
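The underlying property, that correlation is preserved (with appropriate weights) across a subband decomposition, can be checked for the zero-lag case with an orthonormal Haar filter bank. This is an illustration of the principle using one assumed filter choice, not the paper's adaptive matching method:

```python
import numpy as np

def haar_analysis(x):
    """One level of an orthonormal Haar filter bank: lowpass and
    highpass subbands, each downsampled by two."""
    x = np.asarray(x, float)
    lo = (x[0::2] + x[1::2]) / np.sqrt(2)
    hi = (x[0::2] - x[1::2]) / np.sqrt(2)
    return lo, hi

# For an orthonormal bank, the inner product (zero-lag correlation) of two
# signals equals the sum of the inner products of their subband signals,
# so matching scores can be accumulated directly in the subband domain.
```

Because compressed-domain data are already in subband form, accumulating correlations subband by subband avoids the inverse transform entirely, which is the source of the reported speed-up.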
Temporally coherent 3D video sequence analysis and modeling
Andreas C. Kopernik, W. Tuleweit, S. Wagenbreth
This paper covers two modules of an analysis-synthesis chain for 3D video that were optimized to support more consistent temporal scene modeling. The first section addresses the disparity analysis of the scene; the second addresses the synthesis of a 3D-adapted wireframe representing the visual scene surface, based on the disparity analysis results. The assumption of parallel stereo camera axes, or the rectification of the stereo images prior to analysis, allows the application of stereopsis constraints such as uniqueness and ordering, and allows correspondence analysis to focus on single image scanlines. This led to the development of dynamic programming search algorithms for ordering the scanline matching results during the disparity estimation process, whether region-based or feature-based. The approach presented in this paper expands the focus of the ordering method from a single scanline to a neighborhood region and combines this with a temporal recursion, in order to include as much information as possible in the ordering process before starting the disparity diffusion. The wireframe approximation of the visual surface of 3D video traditionally starts with a 2D network construction based on the projection of the segmented surface labels into the image plane. The 3D adaptation of this network is performed afterwards, which in certain image regions results in a degeneration of the mesh that cannot be controlled. For this reason, the second section of this paper suggests an algorithm that performs the triangulation directly in 3D space, based on the 3D visual surface, in order to achieve maximum regularity of the wireframe.
Cardiac dynamic analysis using hierarchical shape models and Gaussian curvature recovery: an integrated approach
We present in this paper a scheme to analyze left ventricle motion over a cardiac cycle through the integration of hierarchical surface fitting and point correspondence estimation. Hierarchical surface fitting is a coarse-to-fine analysis scheme and has been successfully applied to cine-angiographic cardiac images. In this study, hierarchical surface fitting and motion analysis is applied to a set of CT images with true volumetric nature. We also incorporate an additional global deformation, long-axis bending, into the shape model to reflect the curved nature of the left ventricle's long axis. With the dense volumetric data, we are able to use higher-order spherical harmonics in the analysis of the local deformations. The fitted surface allows a complete recovery of the Gaussian curvature of the shape. The point correspondence is estimated through analysis of the first fundamental form and the Gaussian curvature computed from the fitted shape, assuming conformal motion. The overall coarse-to-fine hierarchical analysis and the parametric nature of the fitted surface enable us to compute the Gaussian curvature analytically and to gain a clear and complete description of left ventricle dynamics based on shape evolution over the cardiac cycle. Results based on a set of 16 CT volumes show that this hierarchical surface fitting and motion analysis scheme is promising for cardiac analysis.
Real-time facial expression detection based on frequency domain transform
Kazuyuki Ebihara, Jun Ohya, Fumio Kishino
A new method for the real-time detection of facial expressions from time-sequential images is proposed. Unlike the current implementation for virtual space teleconferencing, the proposed method does not need tape marks pasted on the face to detect expressions in real time. In the proposed method, four windows are applied to four areas of the face image: the left eye, right eye, mouth, and forehead. Each window is divided into blocks of 8 by 8 pixels. The discrete cosine transform (DCT) is applied to each block, and the feature vector of each window is obtained by taking the summations of the DCT energies in the horizontal, vertical, and diagonal directions. To convert the DCT features to virtual tape mark movements, we represent the displacement of a virtual tape mark by a polynomial of the DCT features for the three directions. We apply a genetic algorithm to training facial expression image sequences to find the optimal set of coefficients that minimizes the difference between the real and converted displacements of the virtual tape marks. Experimental results show the effectiveness of the proposed method.
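The per-block feature extraction step described above can be sketched as follows. The exact grouping of DCT coefficients into horizontal, vertical, and diagonal energies is an assumption here (low-order row, low-order column, and mixed terms, DC excluded); the paper's grouping may differ:

```python
import numpy as np

def dct2(block):
    """2D DCT-II of a square block via the orthonormal DCT matrix."""
    n = block.shape[0]
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    C = np.sqrt(2.0 / n) * np.cos((2 * i + 1) * k * np.pi / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C @ block @ C.T

def directional_energies(block):
    """Sum DCT coefficient energies of an 8x8 block by direction
    (DC excluded), giving a 3-component feature similar in spirit to
    the window features described in the abstract."""
    e = dct2(np.asarray(block, float)) ** 2
    horiz = e[0, 1:].sum()     # variation along rows only
    vert = e[1:, 0].sum()      # variation along columns only
    diag = e[1:, 1:].sum()     # mixed / diagonal terms
    return horiz, vert, diag
```

Summing these per-block features over each facial window yields the compact vectors that the polynomial mapping converts to virtual tape mark displacements.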
Optimal binary image design based on the discrete cosine transform
Buh-Yun Lee, Tsann-Shyong Liu, Long-Wen Chang
Halftoning is a technique for displaying a continuous gray-tone image on a bilevel device. Conventionally, halftoning is done in the spatial domain. Here, a new halftoning technique in the frequency domain is proposed. It chooses the bilevel image that best displays the original continuous-tone image by minimizing the weighted mean square error between them, with weight coefficients based on a human-vision criterion in the frequency domain of the discrete cosine transform. The simulation results indicate that our algorithm produces visually better bilevel images without false contours.
Architectures and Implementation
Cellular neural network architecture for Gibbs random field-based image segmentation
We describe in this paper a novel cellular connectionist neural network model for the implementation of clustering-based Bayesian image segmentation with Gibbs random field spatial constraints. The success of such an algorithm is largely due to the neighborhood constraints modeled by the Gibbs random field. However, iterative enforcement of the neighborhood constraints in the Bayesian estimation generally demands tremendous computational power, which hinders the real-time application of Bayesian image segmentation algorithms. The cellular connectionist model proposed in this paper aims at implementing Bayesian image segmentation with real-time processing potential. With a cellular neural network architecture mapped onto the image spatial domain, the powerful Gibbs spatial constraints are realized through interactions among neurons connected through their spatial cellular layout. This network model is structurally similar to the conventional cellular network; however, its processing elements are functionally more versatile, in order to meet the challenging needs of Bayesian image segmentation based on Gibbs random fields. We prove that this cellular neural network converges to the desired steady state with a properly designed update scheme. An example of CT volumetric medical image segmentation is presented to demonstrate the potential of this network for a specific image segmentation application.
Architectures for high-speed re-synchronization using parallel pattern matching
Sanghoon Lee, Soon Hwa Jang, Soon Hong Kwon
A variable-length coder adds a fixed-length synchronization code to the bit stream to enable fast resynchronization after a transmission error or a random access. In this paper, we present architectures for high-speed resynchronization to a desired bit pattern, such as a synchronization code, in an input bit stream. We describe a hardware architecture for parallel pattern matching and apply it to the variable-length decoder (VLD) of a video decoder. The hardware architectures are constructed to minimize the time taken by one stage of the pattern-matching system, so that transmission errors and random access can be handled rapidly through high-speed resynchronization.
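As a functional reference for the hardware above, resynchronization reduces to locating the first occurrence of the fixed sync pattern in the incoming bit stream. The serial model below shows the behavior only; the paper's architecture evaluates many alignments in parallel per clock, and the pattern shown in the usage is an arbitrary example, not a real standard's sync code:

```python
def find_sync(bits, pattern):
    """Return the index of the first occurrence of a fixed sync pattern
    in a bit stream (given as strings of '0'/'1'), or -1 if absent.
    A parallel matcher checks many of these alignments per cycle."""
    n, m = len(bits), len(pattern)
    for i in range(n - m + 1):
        if bits[i:i + m] == pattern:
            return i
    return -1
```

Once the pattern is found, the decoder restarts variable-length decoding at the bit following the sync code.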
Automatic scale detection
Dongwei Chen, John G. Harris, Andrew F. Laine
Most image processing and computer vision algorithms assume knowledge of the proper spatial scale in the image. We propose an automatic technique that adaptively sets the proper spatial scale based on the size of local features in the image. We discuss an analog VLSI circuit implementation that uses this method to automatically determine the proper scale based on a dynamic system model that continuously searches scale space. Though this technique is motivated by analog circuit concepts, we show results for a digital implementation for image enhancement of mammograms. Digital implementations must discretize the scale space and perform an optimization over this finite set.
Portable and scalable MPEG-2 video encoder on parallel and distributed computing systems
Shahriar Akramullah, Ishfaq Ahmad, Ming Lei Liou
Traditionally, real-time video compression, because of its enormous computing requirements, has been done using special-purpose hardware, while software-based solutions have been intended primarily for non-real-time applications. In this paper, we present a portable and scalable implementation of the MPEG-2 video encoder, using parallel processing, that can be used for both real-time and non-real-time applications. The portability allows it to run on a wide variety of platforms, including a number of high-performance parallel computers as well as networks of workstations. The scalability allows the user to control the parallelism, enabling the encoder to run on a few fast workstations with coarse granularity or on a massively parallel architecture with fine granularity. An important feature of our implementation is that we use a data-parallel approach and exploit parallelism within each frame, unlike previous parallel video coding implementations. This makes our encoder suitable for real-time applications, where the complete video sequence may not be present on disk and may become available frame by frame over time. The encoder also provides control over various parameters such as the number of processors, the size of the motion-search window, buffer management, and bit rate. Our implementation is flexible and allows fast and new algorithms to replace the current algorithms at different stages of the encoder. Experiments have been conducted on two parallel processing systems: the Intel Paragon XP/S and the Intel iPSC/860 hypercube. The networks of workstations used include SUN and HP workstations connected via Ethernet and FDDI, respectively. Comparisons of execution times, speedups, and frame encoding rates on these systems are provided. Using maximum parallelism, with one block per processor, an encoding rate higher than real time (30 frames/sec) has been achieved on the Intel Paragon.
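The within-frame data-parallel idea can be sketched as partitioning one frame into slices and dispatching them to workers, so encoding never has to wait for future frames. The function names, thread-based pool, and horizontal slicing are illustrative assumptions, not the paper's message-passing implementation:

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def encode_frame_data_parallel(frame, n_workers, encode_block):
    """Data-parallel sketch: split a single frame into horizontal slices and
    encode each slice concurrently, mirroring a one-block-per-processor
    decomposition at coarse granularity."""
    slices = np.array_split(frame, n_workers, axis=0)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        coded = list(pool.map(encode_block, slices))
    return coded  # per-slice results, in raster order
```

Granularity is controlled by `n_workers`: a few large slices for workstations, many small blocks for a massively parallel machine.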
High-level design methodology for the implementation of image processing ASICs
Mohamed A. Wahab, Iain W. Shewring, S. John Rees
This paper presents an integrated design methodology for the development of high-level image processing algorithms and their ASIC implementation. A commercially available DSP development system is utilized to implement this strategy. A 2-D DCT algorithm is used as an example to illustrate the smooth transition between high-level algorithm development and hardware synthesis.
Comparison of block-matching algorithms for VLSI implementation
Sheu-Chih Cheng, Hsueh-Ming Hang
This paper presents an evaluation of several block-matching motion estimation algorithms from a system-level VLSI design viewpoint. Because a straightforward block-matching algorithm (BMA) demands a very large amount of computing power, many fast algorithms have been developed. However, these fast algorithms are often designed to merely reduce arithmetic operations without considering their overall performance in VLSI implementation. In this paper, three criteria are used to compare various block-matching algorithms: (1) silicon area, (2) input/output requirement, and (3) image quality. Several well-known motion estimation algorithms are analyzed under the above criteria. The advantages/disadvantages of these algorithms are discussed. Although our analysis is limited by the preciseness of our silicon area estimation model, it should provide valuable information in selecting a BMA for VLSI implementation.
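As a baseline for such comparisons, the straightforward full-search BMA with a sum-of-absolute-differences (SAD) criterion can be sketched as follows; the function names and the SAD criterion are generic assumptions, since the paper evaluates several algorithms rather than one:

```python
import numpy as np

def full_search(cur_block, ref_frame, x, y, search_range):
    """Exhaustive block matching: minimize SAD over every candidate
    displacement within +/- search_range of the block position (x, y)."""
    N = cur_block.shape[0]
    best, best_sad = (0, 0), np.inf
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ry, rx = y + dy, x + dx
            if ry < 0 or rx < 0 or ry + N > ref_frame.shape[0] or rx + N > ref_frame.shape[1]:
                continue  # candidate falls outside the reference frame
            sad = np.abs(cur_block.astype(int)
                         - ref_frame[ry:ry+N, rx:rx+N].astype(int)).sum()
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad
```

The `(2r+1)^2` candidate loop is what dominates both the arithmetic count and the silicon-area/IO figures the paper compares.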
Parallel processor for motion estimation
Emmanuel J.-M. Hanssens, Jean-Didier Legat
A parallel processor for real-time motion estimation algorithms has been developed. It consists of several clusters of basic processing elements connected to a transfer controller that is attached to an external RAM. The architecture is parallel, allowing each cluster to work on non-overlapping image segments. It is also associative, in that the execution of a global instruction step by a processing element depends on local conditions; moreover, lateral communications between clusters are provided. These last two features are essential for motion vector field regularization processes. The feasibility of the architecture has been evaluated with an advanced block-matching-based motion estimation algorithm, the ABMA. Running at a clock rate of 50 MHz, a group of 12 processing elements can execute the ABMA in real time on a Common Intermediate Format (CIF) image (288 by 352 pixels) at 10 Hz. A custom VLSI test circuit, consisting of one processing element and one transfer controller, has been designed in a 1-micrometer technology; the total silicon area of the test circuit is 41 mm2.
Improvement of VLSI architecture for two-dimensional discrete cosine transform and its inverse
Kyeounsoo Kim, Soon Hwa Jang, Soon Hong Kwon, et al.
This paper presents a VLSI architecture for the 2-D discrete cosine transform (DCT) and its inverse that improves on existing designs in complexity and speed. In the proposed architecture, an accuracy compensator and a bit-serial transposition network are newly introduced. The scheme can easily be applied to previously developed 2-D DCT/IDCT architectures, revising them into faster and simpler designs without changing the existing structure. Another key characteristic is that the scheme overcomes the restriction on computational resolution imposed by finite word-length calculation. The bit-serial transposition network requires fewer registers and simpler routing than existing architectures. The proposed architecture operates at over 100 MHz in a 0.6-micrometer 3-metal CMOS technology.
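The row-column decomposition that such 2-D DCT architectures implement (a transposition network sits between the two 1-D passes) can be sketched in software with an orthonormal DCT matrix; this is a generic reference model, not the paper's bit-serial hardware:

```python
import numpy as np

def dct_matrix(N=8):
    """Orthonormal DCT-II basis matrix: C[k, n] = a_k * cos(pi*(2n+1)k / 2N)."""
    n = np.arange(N)
    C = np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
    C[0] *= 1 / np.sqrt(2)
    return C * np.sqrt(2 / N)

def dct2(block):
    """Separable 2-D DCT: a 1-D transform on rows, then on columns.
    Hardware realizes the middle transposition with a dedicated network."""
    C = dct_matrix(block.shape[0])
    return C @ block @ C.T
```

Because the matrix is orthonormal, the inverse is simply `C.T @ X @ C`, which is the reference against which finite word-length accuracy compensation is judged.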
Reconfigurable image coprocessor for mathematical morphology
Mohamed A. Wahab, Julian M. Holden, S. John Rees
The execution of morphological algorithms places a large computational burden on the main processor in an image processing system and requires dedicated hardware. This paper proposes a versatile reconfigurable morphological coprocessor, which is implemented using a static RAM-based FPGA device. The basic architecture and several alternative implementations are presented to illustrate the flexibility of the coprocessor whose architecture is modified under the control of the system's processor without any physical design changes.
Image Coding II
Layered coding of check images using foreground and background segmentation
Ali Susanto, Yao Wang, Edward K. Wong
An emerging trend in the banking industry is to digitize check storage, processing, and transmission. One bottleneck in this process is the extremely large size of digitized checks. A check image is usually comprised of a foreground overlaid on top of a background. For most banking functions, only the foreground carries useful information and should be specified accurately. The background either does not need to be retained, or can be represented with less precision, depending on the underlying banking requirements and procedures. Recognizing this special characteristic of check images, we propose a layered coding approach. The first layer consists of the binary foreground map. The second layer contains the gray or color values of the foreground pixels. The third layer retains a coarse representation of the background. The fourth layer comprises the error image between the original and the image decompressed from the first three layers. The methods for segmenting the foreground and for coding the different layers are presented. The proposed layered coding scheme can yield a more accurate representation of a check image, especially the foreground, than the JPEG baseline algorithm at the same compression ratio. Furthermore, it facilitates progressive retrieval or transmission of check images in compressed form.
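The layered decomposition can be sketched as below; global thresholding and 4x subsampling are crude stand-ins for the paper's actual segmentation and background coding, and all names are illustrative:

```python
import numpy as np

def layer_check_image(img, thresh):
    """Toy layered decomposition of a check image: a binary foreground map
    (layer 1), foreground pixel values (layer 2), and a coarse subsampled
    background (layer 3). The residual layer 4 is omitted here."""
    fg = img < thresh                 # assume dark ink on a light background
    fg_values = img[fg]               # exact gray values, coded losslessly
    coarse_bg = img[::4, ::4].copy()  # assumed 4x subsampled background
    return fg, fg_values, coarse_bg
```

The point of the layering is that layers 1 and 2 can be coded losslessly while layer 3 tolerates heavy compression.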
Optimal lossy segmentation encoding scheme
Guido M. Schuster, Aggelos K. Katsaggelos
In this paper, we present a fast and optimal method for the lossy encoding of object boundaries which are given as 8-connect chain codes. We approximate the boundary by a polygon and consider the problem of finding the polygon which can be encoded with the smallest number of bits for a given maximum distortion. To this end, we derive a fast and optimal scheme which is based on a shortest path algorithm for a weighted directed acyclic graph. We further investigate the dual problem of finding the polygonal approximation which leads to the smallest maximum distortion for a given bit rate. We present an iterative scheme which employs the above mentioned shortest path algorithm and prove that it converges to the optimal solution. We then extend the proposed algorithm to the encoding of multiple object boundaries and introduce a vertex encoding scheme which is a combination of an 8-connect chain code and a run-length code. We present results of the proposed algorithm using objects from the 'Miss America' sequence.
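The shortest-path formulation can be sketched as a dynamic program over a DAG whose edge (i, j) exists when every boundary point between vertices i and j lies within the distortion bound of segment i-j. The perpendicular-distance distortion measure and function names below are illustrative assumptions:

```python
import math

def polygon_approx(points, dmax):
    """Minimum-vertex polygonal approximation of an open boundary: shortest
    path in a DAG where edge (i, j) is admissible iff all intermediate
    points lie within dmax of the segment points[i]-points[j]."""
    n = len(points)

    def fits(i, j):
        (x1, y1), (x2, y2) = points[i], points[j]
        L = math.hypot(x2 - x1, y2 - y1) or 1.0
        return all(abs((y2 - y1) * (x - x1) - (x2 - x1) * (y - y1)) / L <= dmax
                   for x, y in points[i + 1:j])

    cost = [math.inf] * n
    cost[0] = 0
    prev = [0] * n
    for j in range(1, n):
        for i in range(j):
            if cost[i] + 1 < cost[j] and fits(i, j):
                cost[j], prev[j] = cost[i] + 1, i
    path = [n - 1]
    while path[-1]:
        path.append(prev[path[-1]])
    return path[::-1]  # indices of the retained polygon vertices
```

Minimizing vertex count stands in for minimizing bits; the paper's rate-optimal version weights each edge by its actual codeword length.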
Simultaneous parameter estimation and image segmentation for image sequence coding
Kristine E. Matthews, Nader M. Namazi
We previously proposed and demonstrated the feasibility of a method for segmenting an image in a sequence of images into regions of stationary, moving, and uncovered background pixels and simultaneously estimating parameters of each region. The basis of our method is the expectation-maximization (EM) algorithm for maximum-likelihood estimation. We view the intensity difference between image frames as the incomplete data and the intensity difference with the region identifier as the complete data. Our previous work focused primarily on the viability of the method and considered only moving and stationary pixels. In particular, we estimated the DCT coefficients of the motion field for the moving pixels allowing motion-compensated reconstruction of image frames. In this paper we extend our previous formulation to include uncovered background pixels, and we present results showing image segmentation and parameter convergence.
Controlled redundancy for image coding and high-speed transmission
Nicolas Normand, Jeanpierre V. Guedon, Olivier Philippe, et al.
The goal of this paper is to describe a new fully reversible image transform specifically designed for image coding and transmission in a context of possible loss of information, as encountered in ATM networks. The so-called Mojette transform is based on an exact discrete Radon transform which allows for a natural redundancy of the initial image information. The inverse Mojette transform is particularly flexible and efficient: any portion of the coded image can be replaced by another one if transmission problems occur. Furthermore, the transform is also well suited to the two-layer paradigm used for video transmission on ATM networks, by using a multiscale and/or quadtree decomposition in the transform domain. Finally, we discuss the merits of the transform for both image storage and transmission in order to show its ability to operate as the core representation of digital images.
Edge-adaptive JPEG image compression
Marcia G. Ramos, Sheila S. Hemami
Digital image compression algorithms have become increasingly popular due to the need to achieve cost-effective solutions in transmitting and storing images. In order to meet various transmission and storage requirements, the compression algorithm should allow a range of compression ratios, thus providing images of different visual quality. This paper presents a modified JPEG algorithm that provides better visual quality than the Q-factor scaling method commonly used with JPEG implementations. The quantization step sizes are adapted to the activity level of the block, and the activity selection is based on an edge-driven quadtree decomposition of the image. This technique achieves higher visual quality than standard JPEG compression at the same bit rate.
Reversible transform coding of images
Kunitoshi Komatsu, Kaoru Sezaki
Although reversible predictive coding and reversible subband coding already exist for the reversible coding of gray-level still images, few reversible methods have been proposed for transform coding. In this paper, we therefore propose several reversible transform coding methods. If conventional transform coding is used as is, the number of levels of the transform coefficients must be made very large in order to reconstruct the input signal without distortion. We therefore propose transform codings that are reversible while keeping the number of levels of the transform coefficients small. We propose reversible coding methods that correspond to the discrete Walsh-Hadamard, Haar, and cosine transforms. Furthermore, we propose a method that uses the n-th order difference, a method in which the number of levels of the transform coefficients equals that of the input signal, and a reversible overlapped transform coding method. Simulations show that the compression efficiency of the proposed methods is almost the same as that of predictive coding.
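A standard example of this kind of integer-to-integer reversibility is the S-transform, a reversible form of the Haar transform pair; this is a generic sketch of the idea, not necessarily the exact construction used in the paper:

```python
def s_transform(a, b):
    """Reversible integer Haar ('S') transform of a sample pair: the
    difference d plus a floor-mean s, both integers, with no level growth
    in s beyond the input range."""
    d = a - b
    s = b + (d >> 1)   # floor mean; recoverable exactly from (s, d)
    return s, d

def inverse_s_transform(s, d):
    """Exact inverse: the floor discarded by s is recovered from d's parity."""
    b = s - (d >> 1)
    return b + d, b
```

The trick generalizing to the Walsh-Hadamard and DCT cases is the same: carry the rounding information implicitly so the integer transform is bijective.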
Adaptive cosine transform image coding with variable block size and constant block distortion
An adaptive discrete cosine transform (DCT) image coding system is implemented with the same average distortion designated for each variable size image block. The variable block size segmentation is performed using a quadtree data structure by dividing the perceptually more important regions of an image into smaller size blocks compared to the size of blocks containing lesser amounts of spatial activity. Due to the nonstationarity of real-world images, each image block is described by a space-varying nonstationary Gauss-Markov random field. The space-varying autoregressive parameters are estimated using an on-line modified least-squares estimator. For each assumed space-varying nonstationary image block, a constant average distortion is assigned and the code rate for each image block is allowed to vary in order to meet the fixed distortion criterion. Simulation results show that reconstructed images coded at low average distortion, based on an assumed space-varying nonstationary image model, using variable size blocks and with variable bit rate per block possess high-quality subjective (visual) and objective (measured) quality at low average bit rates. Performance gains are achieved due to the distortion being distributed more uniformly among the blocks as compared with fixed-rate, stationary image transform coding schemes.
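The quadtree segmentation step can be sketched as a recursive split driven by an activity measure; plain block variance is used below as a stand-in for the paper's perceptual-importance criterion (an assumption), and all names are illustrative:

```python
import numpy as np

def quadtree(img, y, x, size, min_size, var_thresh):
    """Variable-block-size segmentation: subdivide a block into four
    quadrants while its variance (spatial activity) exceeds var_thresh
    and the block is larger than min_size. Returns (y, x, size) leaves."""
    blk = img[y:y + size, x:x + size]
    if size <= min_size or blk.var() <= var_thresh:
        return [(y, x, size)]
    h = size // 2
    leaves = []
    for dy in (0, h):
        for dx in (0, h):
            leaves += quadtree(img, y + dy, x + dx, h, min_size, var_thresh)
    return leaves
```

Each leaf then receives its own DCT and a rate chosen to hit the common target distortion.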
Image Sequence Segmentation
Regions merging based on robust statistical testing
Fabrice Moscheni, Frederic Dufaux
This paper addresses the problem of finding moving objects present in an image sequence. More specifically, a method is proposed to merge regions based on a coherent motion criterion. A modified Kolmogorov-Smirnov test is proposed which exploits both the motion information present in the residual distribution and the motion information of the motion parameter space. Therefore, all the available motion information is used. Moreover, the proposed test is consistent with robust motion estimation. Using the modified Kolmogorov-Smirnov test, the graph of the relationships between the different regions is built. The graph also integrates spatial information, as only adjacent regions are allowed to merge. Two graph clustering rules are proposed which enable us to robustly define the moving objects. The proposed method does not require any user input. Simulation results demonstrate the efficiency of the proposed method.
Moving target extraction algorithm for selective coding
Chan-Sik Kim, Sang-Yeon Kim, Jong-Bae Lee, et al.
In this paper, we propose a novel moving target extraction algorithm for selective coding which does discriminately encode target from the background region in a general tactical scene. Our algorithm has 4 stages. In the first stage, we perform global motion estimation and compensation. In the next stage, we segment the motion vector field using region growing technique. After that, we calculate the change detection mask which is a set of changed regions. Finally, we extract the moving target(s) by using the motion segmentation information and the change detection mask obtained from the previous stages. Simulation results show that our algorithm has better results than the existing methods for tactical scenes.
Hybrid image segmentation using watersheds
Kostas Haris, Serafim N. Efstratiadis, Nicos Maglaveras, et al.
A hybrid image segmentation algorithm is proposed which combines edge- and region-based techniques through the morphological algorithm of watersheds. The algorithm consists of the following steps: (1) edge-preserving statistical noise reduction, (2) gradient approximation, (3) detection of watersheds on gradient magnitude image, and (4) hierarchical region merging (HRM) in order to get semantically meaningful segmentations. The HRM process uses the region adjacency graph (RAG) representation of the image regions. At each step, the most similar pair of regions is determined (minimum cost RAG edge), the regions are merged and the RAG is updated. Traditionally, the above is implemented by storing all the RAG edges in a priority queue (heap). We propose a significantly faster algorithm which maintains an additional graph, the most similar neighbor graph, through which the priority queue size and processing time are drastically reduced. The final segmentation is an image partition which, through the RAG, provides information that can be used by knowledge-based high level processes, i.e. recognition. In addition, this region based representation provides one-pixel wide, closed, and accurately localized contours/surfaces. Due to the small number of free parameters, the algorithm can be quite effectively used in interactive image processing. Experimental results obtained with 2D MR images are presented.
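The HRM loop can be sketched with a priority queue that invalidates stale entries lazily instead of deleting them. The similarity measure (mean-intensity difference) and stopping rule below are illustrative assumptions, and the paper's most-similar-neighbor-graph speedup is omitted:

```python
import heapq

def hierarchical_merge(means, edges, stop_cost):
    """Greedy hierarchical region merging on a region adjacency graph:
    repeatedly merge the most similar adjacent pair (smallest difference
    of region means) until the cheapest valid merge exceeds stop_cost.
    Outdated heap entries are re-queued with their current cost."""
    means = list(means)
    parent = list(range(len(means)))   # union-find over region labels
    size = [1] * len(means)

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    heap = [(abs(means[a] - means[b]), a, b) for a, b in edges]
    heapq.heapify(heap)
    while heap:
        cost, a, b = heapq.heappop(heap)
        ra, rb = find(a), find(b)
        if ra == rb:
            continue                               # already merged
        cur = abs(means[ra] - means[rb])
        if cur != cost:
            heapq.heappush(heap, (cur, ra, rb))    # stale cost: re-queue
            continue
        if cost > stop_cost:
            break                                  # cheapest merge too costly
        total = size[ra] + size[rb]
        means[ra] = (means[ra] * size[ra] + means[rb] * size[rb]) / total
        parent[rb] = ra
        size[ra] = total
    return [find(i) for i in range(len(means))]
```

The heap here is the "priority queue of RAG edges" the abstract mentions; the paper's additional most-similar-neighbor graph keeps this queue much smaller.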
Structural motion segmentation for compact image sequence representation
Cha Keon Cheong, Kiyoharu Aizawa, Takahiro Saito, et al.
This paper addresses the problem of extracting structural motion information for compact image sequence representation. In order to extract a meaningful scene structure from an image sequence, the global motion and region shapes of moving objects are taken into consideration. First, intraframe segmentation is carried out with edges detected from zero-crossings of a wavelet transform, and local motions are estimated using a gradient-based method. Moving regions are then extracted using the local motion based on the intraframe segmentation. Second, moving regions are roughly separated into the regions of the moving objects based on probabilistic clustering with mixture models, using the optical flow and the image intensity for each region of the intraframe segmentation. Motion segmentation is finally obtained by iterated estimation of affine motion parameters and region reassignment according to a criterion, using a Gauss-Newton iterative optimization algorithm.
Fine segmentation of image objects by means of active contour models using information derived from morphological transformations
Silko Kruse, Peter P. Kauff
A fine-segmentation system for precisely locating object contours is described. The system consists of three stages, where in the last stage an active contour model (ACM) is employed to attain a final refinement of the results. Unlike other ACM techniques the approach proposed here uses information provided by morphological operators. In this way two different segmentation paradigms are combined.
Three-dimensional segmentation of multiview images based on disparity estimation
Takeshi Naemura, Masahide Kaneko, Hiroshi Harashima
This paper presents new methods for partitioning a set of multi-view images into 3-D regions corresponding to objects in the scene, in order to parse raw multi-view data into a 3-D region-based structured representation. For this purpose, color, position, and disparity information at each pixel are incorporated as an attribute vector into the segmentation process. We propose three methods, all based on the K-means clustering algorithm. The first method is sensitive to the estimation error of disparity at each pixel, as it is formulated assuming that the estimated disparity is accurate. We solve this problem in the second method by prohibiting estimated disparity from being used in calculating the distance between attribute vectors. Finally, a third method is proposed to reduce the computational cost of the segmentation process. As each 3-D region has a one-to-one correspondence to an object or surface in the scene, a 3-D region-based structured representation of multi-view images is useful and powerful for data compression, view interpolation, structure recovery, and so on. The experimental results show the potential applicability of the method to next-generation 3-D image communication systems.
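The clustering core of the first method can be sketched as plain K-means over per-pixel attribute vectors (e.g. color, position, disparity stacked into one vector). The farthest-point initialization is an assumption made here for determinism, not the paper's choice:

```python
import numpy as np

def kmeans(X, k, iters=100):
    """Plain K-means on attribute vectors X (n_samples x n_features).
    Initialization: first sample, then repeatedly the sample farthest
    from all chosen centers (a deterministic, assumed variant)."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d))])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        new = np.array([X[labels == j].mean(0) if (labels == j).any() else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers
```

The paper's second method amounts to dropping the disparity component from the distance computation while keeping it in the cluster description.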
Video sequence segmentation based on rate-distortion theory
Josep Ramon Morros, Ferran Marques, Montse Pardas, et al.
This paper describes a coding-oriented segmentation technique for video schemes using an optimization strategy to address the problem of bit allocation. The optimization is based on the rate-distortion theory. Our purpose is to define a method to obtain an 'optimal' partition together with the best coding technique for each region of this partition so that the result is optimal in a rate-distortion sense.
Image segmentation based on an adaptive 3D analysis of the CIE-L*a*b* color space
Wolfgang Woelker
An algorithm for segmentation of color images based on an adaptive clustering method analyzing the three-dimensional CIE-L*a*b* color space is presented. The color information of every picture element (PEL) is transferred to and analyzed in a three-dimensional color histogram. The algorithm detects clusters of similar colors and describes their position and size within the color histogram with an enclosing cuboid. Searching for cluster kernels can be done automatically or with interactively given color probes. The extension of the enclosing cuboid can be limited to a maximum Euclidean distance (ΔE) within the CIE-L*a*b* color space.
Interpolation/Reconstruction/Filtering
Spatially adaptive interpolation of digital images using fuzzy inference
Hou Chun Ting, Hsueh-Ming Hang
This paper presents a novel adaptive interpolation method for digital images. This new method dramatically reduces the blurring and jaggedness artifacts on high-contrast edges that are generally found in images interpolated using conventional methods. This high performance is achieved via two proposed operators: a fuzzy-inference-based edge-preserving interpolator and an edge-shifted matching scheme. The former synthesizes the interpolated pixel to match the image's local characteristics; hence, edge integrity can be retained. However, due to its small support, it does not work well on sharply curved edges that make very sharp angles against one of the coordinate axes. Therefore, the edge-shifted matching technique is developed to identify precisely the orientation of sharply curved edges. By combining these two techniques, the subjective quality of the interpolated images is significantly improved, particularly along high-contrast edges. Both synthesized images (such as letters) and natural scenes have been tested, with very promising results.
Multichannel image interpolation algorithms
Jong Ho Paik, Joon-Ki Paik, Jung-Hyun Hwang, et al.
In this paper we propose a set of interpolation algorithms for improving the resolution of multichannel images, which are assumed to carry a greater amount of information than a single-channel or still image. First, a multichannel interpolation algorithm on the rectangular grid is proposed. Second, a multichannel interpolation on the generalized grid for real test images is proposed. Third, a computationally efficient algorithm suitable for real-time image processing is also proposed, based on image registration and interpolation followed by IIR filtering along the time axis. It is demonstrated that the proposed algorithms can serve as a theoretical basis for the interpolation of dynamic image sequences, and experimental results for each of the proposed algorithms are also presented.
Image interpolation as a boundary value problem
Hei Tao Fung, Kevin J. Parker
Image interpolation is the determination of unknown pixels based on some known pixels. Conventional interpolation methods, such as pixel replication, bilinear interpolation, and cubic spline interpolation, assume that the known pixels are located regularly on a Cartesian mesh. They cannot be easily extended to other cases where the configurations of the known pixels are different. We propose a novel formulation of the image interpolation problem to deal with the more general cases, such as the case where a region of the image is missing and the case where the known pixels are irregularly placed. The interpolation problem is formulated as a boundary value problem involving the Laplace equation, with the known pixels as the boundary conditions. The matrix equation resulting from the formulation has a unique solution. It can be solved efficiently by successive over-relaxation (SOR) iteration. The advantage of the proposed interpolation method lies in its flexibility in handling the general cases of interpolation.
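A minimal sketch of this formulation: unknown pixels satisfy the discrete Laplace equation with the known pixels as boundary conditions, solved by SOR. The 4-neighbor stencil, relaxation factor, and iteration count below are assumptions:

```python
import numpy as np

def laplace_inpaint(img, known, omega=1.8, iters=2000):
    """Fill unknown pixels by solving the discrete Laplace equation with
    known pixels as boundary conditions, via successive over-relaxation."""
    u = img.astype(float).copy()
    u[~known] = u[known].mean()     # arbitrary initial guess for unknowns
    for _ in range(iters):
        for y in range(1, u.shape[0] - 1):
            for x in range(1, u.shape[1] - 1):
                if known[y, x]:
                    continue        # known pixels act as fixed boundary values
                nb = 0.25 * (u[y-1, x] + u[y+1, x] + u[y, x-1] + u[y, x+1])
                u[y, x] += omega * (nb - u[y, x])   # over-relaxed update
    return u
```

Because the update only touches unknown pixels, exactly the same loop handles a missing region, scattered known samples, or a regular mesh.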
Multiresolution total least squares reconstruction algorithm based on wavelet in medical optical tomography
Wenwu Zhu, Yao Wang, Jun Zhang
In this paper, we present a wavelet-based multiresolution total least squares (TLS) approach to solve the perturbation equation encountered in medical optical tomography. In this scheme, the unknown image, the data, and the weight matrix are all represented by wavelet expansions, yielding a multiresolution representation of the original Rayleigh quotient function in the wavelet domain. This transformed Rayleigh quotient function is then minimized using a multigrid scheme, by which an increasing portion of the wavelet coefficients of the unknown image is solved in successive approximations. One can also quickly identify regions of interest (ROI) from a coarse-level reconstruction and restrict the reconstruction at the following finer resolutions to those regions. At each resolution level, a TLS solution is obtained iteratively using a conjugate gradient (CG) method. Compared to a previously reported one-grid iterative TLS algorithm, the multigrid method requires substantially shorter computation time under the same reconstruction quality criterion.
Road centerline reconstruction from sequential images based on shape from sequences
Chuang Tao
The reconstruction of 3D road centerlines is cast as the problem of solving an energy-minimizing 3D B-spline shape model. The reconstruction is described as a process whereby a 3D road centerline shape model is deformed gradually, driven by forces arising from object space (internal energy) and image sequences (external energy). Recent test results demonstrate that this approach functions reliably even in situations where navigation errors exist and the road condition is far from ideal.
Trinocular image analysis for virtual frame reconstruction
Gwenaelle Le Mestre, Danielle Pele
This paper addresses the problem of reconstructing intermediate viewpoints from the analysis of trinocular images. Each image in the triplet plays a symmetrical role, i.e., three disparity maps are computed based on a multiresolution correlation algorithm. Then the three depth maps referring to the virtual camera are computed and fused; for that purpose, a fusion criterion based on similarity, reliability, and visibility is defined. The depth map is then projected to reconstruct an intermediate view. Results on real image triplets are presented.
Adaptive multichannel filters for color image processing
Konstantinos N. Plataniotis, Dimitrios Androutsos, Anastasios N. Venetsanopoulos
A new family of adaptive nonlinear filters that use fuzzy membership functions based on different distance measures is proposed for color image processing. The proposed filters constitute a fuzzy generalization of well known multichannel filters. The principle behind the new filters is explained and comparisons with other nonlinear filters are provided. Color images corrupted with different types of noise are used to assess the performance of the proposed filters. Simulation results indicate that the new filters offer some flexibility and have excellent performance.
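A common base for such multichannel filters is the vector median: the window sample minimizing the aggregate distance to all other samples in the window. The fuzzy membership weighting that generalizes it in the paper is not reproduced here; this is only the classical building block:

```python
import numpy as np

def vector_median(window):
    """Vector median of a set of color vectors (n_samples x n_channels):
    returns the sample with the smallest sum of Euclidean distances to
    all other samples, which rejects impulsive color outliers."""
    d = np.linalg.norm(window[:, None] - window[None], axis=-1).sum(1)
    return window[np.argmin(d)]
```

Replacing the hard argmin with fuzzy membership weights over the distances is the direction the paper's generalization takes.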
Optimization of multiplierless two-dimensional digital filters
S. Sriranganathan, David R. Bull, David W. Redmill
Circularly symmetric and diamond-shaped low-pass linear phase FIR filters are designed using coefficients comprising the sum or difference of two signed power-of-two (SPT) terms. A minimax error criterion is adopted in conjunction with an optimization process based on the use of genetic algorithms (GAs). The results presented are compared with those obtained using various other design methods, including simulated annealing, linear programming and simple rounding of an optimum (continuous) minimax solution. The filters designed using GAs exhibit superior performance to those designed using other methods.
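Restricting a coefficient to the sum or difference of two signed power-of-two (SPT) terms can be sketched by brute-force enumeration; the search depth and exhaustive strategy here are illustrative, since the paper optimizes whole filters with a GA rather than rounding coefficients independently:

```python
def nearest_spt2(c, max_shift=8):
    """Quantize a coefficient to the nearest value expressible as the sum
    or difference of at most two signed powers of two (down to 2^-max_shift),
    the coefficient form that makes a filter multiplierless."""
    terms = [s * 2.0 ** -k for k in range(max_shift + 1) for s in (1, -1)] + [0.0]
    best = min((abs(c - (a + b)), a + b) for a in terms for b in terms)
    return best[1]
```

Each SPT coefficient turns a multiplication into two shifts and an add, which is the hardware saving the paper's designs exploit.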
Video Coding II
Rate control algorithm for co-located variable bit-rate MPEG-2 video encoders
Sanghoon Lee, Seop Hyeong Park, Sang Hoon Lee
In this paper, we propose a rate control algorithm for co-located multiple VBR MPEG-2 video encoders in multi-program environments. The proposed algorithm aims to improve the average picture quality, as well as to equalize the quality of the pictures under buffer constraints. We control each encoder's rate by referring to the transmission rate over the network and observing the traffic of the other video sources through the fullness of a virtual buffer in the system. To maintain uniform picture quality, we keep the variance of the quantization parameters as small as possible within a picture. Experimental results show that the proposed algorithm gives more than 1 dB improvement in PSNR compared to the rate control algorithm in MPEG-2 TM5. The scheme can be applied to traditional constant-bandwidth channels as well as ATM networks by adapting both CBR and VBR multiplexing transmission rates.
Three-dimensional subband coding of video using the zero-tree method
In this paper, a simple yet highly effective video compression technique is presented. The zerotree method of Said, an improved version of Shapiro's original algorithm, is applied and extended to three dimensions to encode image sequences. A three-dimensional subband transformation of the image sequence is first performed, and the transformed information is then encoded using the zerotree coding scheme. The algorithm achieves results comparable to MPEG-2 without the complexity of motion compensation. The reconstructed image sequences have no blocking effects at very low rates, and the transmission is progressive.
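The first stage, a separable three-dimensional subband split, can be sketched with one level of Haar analysis applied along each axis in turn; the Haar kernel is an illustrative stand-in for whatever filter bank the paper uses:

```python
import numpy as np

def haar_1d(x, axis):
    """One level of orthonormal Haar analysis along one axis:
    returns the low-pass (average) and high-pass (difference) bands."""
    e = np.take(x, range(0, x.shape[axis], 2), axis=axis)
    o = np.take(x, range(1, x.shape[axis], 2), axis=axis)
    return (e + o) / np.sqrt(2), (e - o) / np.sqrt(2)

def subband_3d(video):
    """One level of separable 3-D subband analysis over (t, y, x):
    splitting along each axis in turn yields 8 subbands, LLL first."""
    bands = [video]
    for axis in range(3):
        bands = [b for band in bands for b in haar_1d(band, axis)]
    return bands
```

Zerotree coding then exploits the fact that for typical video nearly all energy concentrates in the LLL band, so high-band coefficients form large zero trees.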
Overflow-free video coders: properties and optimal control design
Jose Ignacio Ronda, Fernando Jaureguizar, Narciso N. Garcia
The control of video coders to adapt their variable output bit rate to the input requirements of communication channels constitutes a well-known problem which attracts considerable attention from researchers in the field. The importance of the problem stems from the freedom that international standards (such as MPEG) allow in the design of the rate control algorithm, which makes it one of the main sources of quality differences between same-standard coders from different manufacturers. In real-time applications, the main concern in controlling the coder is to ensure buffer overflow-free operation with limited look-ahead in the analysis of the sequence, while optimizing the quality of the encoded signal. Stochastic optimal control constitutes a systematic approach to obtaining the control policy which minimizes a long-term average cost function. Provided that the coder can be modeled as a stochastic system, the application of this theory becomes a way to avoid heuristic design. Within this frame, in this paper we concentrate on obtaining those policies which ensure that no buffer overflow occurs (overflow-safe policies). The condition for their existence is provided, along with their main properties and the definition of a stochastic optimal control problem which can be solved using standard algorithms in order to obtain optimal overflow-safe policies. Results of applying the approach to the control of a standard coder are provided.
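The overflow-free constraint itself has a simple operational reading: simulate the encoder buffer under a policy's per-frame bit allocation and check that occupancy never exceeds the buffer size. This toy leaky-bucket check is only an illustration of the constraint, not the paper's stochastic-control formulation:

```python
def overflow_safe(bits_per_frame, channel_rate, buffer_size):
    """Simulate the encoder buffer: each frame deposits its bits, the
    channel drains channel_rate bits per frame interval. A bit allocation
    is overflow-safe iff occupancy never exceeds buffer_size."""
    occ = 0
    for b in bits_per_frame:
        occ = max(0, occ + b - channel_rate)   # clamp at empty (no underflow check)
        if occ > buffer_size:
            return False
    return True
```

The paper's contribution is characterizing policies for which this property holds for every realization of the stochastic source, not just one trace.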
Data compression for structured video using support layer
Hiroshi Watanabe
A new approach is proposed that enables data compression of the support layer while preserving the layered representation of 'structured video'. The proposed video coding scheme consists of interframe prediction and arbitrary-shape coding of the prediction error. Each support layer can be updated independently without sending any additional information to the decoder, enabling the decoder to synthesize arbitrarily shaped spatio-temporal images at a high compression ratio.
SBASIC video coding and its 3D-DCT extension for MPEG-4 multimedia
Atul Puri, Robert L. Schmidt, Barry G. Haskell
Due to the need to interchange video data in a seamless and cost-effective manner, interoperability between applications, terminals and services has become increasingly important. The ISO Moving Picture Experts Group (MPEG) has developed the MPEG-1 and MPEG-2 audio-visual coding standards to meet these challenges; these standards support a range of applications at bitrates from 1 Mbit/s to 100 Mbit/s. In the meantime, however, a new breed of applications has arisen which demands higher compression, more interactivity and increased error resilience. These applications are expected to be addressed by the next-phase standard, called MPEG-4, which is currently in progress. We discuss the various functionalities expected to be offered by the MPEG-4 standard, along with the development plan and the framework used for evaluating video coding proposals in the recent first evaluation tests. Having clarified the requirements, functionalities and development process of MPEG-4, we propose a generalized approach to video coding referred to as adaptive scalable interframe coding (ASIC) for MPEG-4. Using this generalized approach we develop a video coding scheme suitable for MPEG-4-based multimedia applications in the bitrate range of 320 kbit/s to 1024 kbit/s. The proposed scheme, referred to as source and bandwidth adaptive scalable interframe coding (SBASIC), not only builds on the proven framework of motion-compensated DCT coding and scalability but also introduces several new concepts. SNR and MPEG-4 subjective evaluation results are presented to show the good performance achieved by SBASIC. Next, the extension of SBASIC by motion-compensated 3D-DCT coding is discussed. It is envisaged that this extension, when complete, will further improve the coding efficiency of SBASIC.
Layered transmission of audio/video signals
Masoud Sajadieh, F. R. Kschischang, Alberto Leon-Garcia
In a broadcast channel, a single transmitter communicates with a number of receivers having different channel capacities available to them. A typical example of such a channel is the over-the-air TV broadcast. For the two-receiver case, we examine the performance of some common orthogonal transmission methods in terms of their achievable rates. Layered transmission, whereby the information intended for the better channel is superimposed on the portion of information common to both receivers, can provide an optimal solution. On this premise, we construct a bi-rate transmission model for a Gaussian broadcast channel in which receivers are distributed according to an exponential pdf. A basic grade of service is thus maintained throughout the entire coverage area, in addition to a higher-quality video offered to receivers with better reception conditions. The performance evaluation indicates that this model offers a higher per capita data rate compared to conventional single-rate transmission systems. The multirate paradigm exhibits a stepwise degradation, mitigating the sharp cutoff threshold of current digital broadcast systems. This multirate broadcast approach is very promising for multiresolution transmission of digital HDTV.
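For a two-receiver Gaussian broadcast channel, the achievable rate pair under superposition coding follows directly from the standard capacity formulas. The power split `beta` and the noise levels below are illustrative values, not from the paper.

```python
from math import log2

def superposition_rates(P, beta, N1, N2):
    """Achievable rate pair (bits/channel use) for superposition coding:
    beta is the power fraction given to the refinement layer for the
    better receiver (noise N1 < N2); the base layer, decodable by both
    receivers, sees the refinement layer as extra noise."""
    R_base = 0.5 * log2(1 + (1 - beta) * P / (beta * P + N2))  # both users
    R_refine = 0.5 * log2(1 + beta * P / N1)                   # good user only
    return R_base, R_refine

base, refine = superposition_rates(P=10.0, beta=0.2, N1=0.5, N2=4.0)
print(f"base layer:  {base:.3f} bit/use")
print(f"refinement:  {refine:.3f} bit/use")
```

Sweeping `beta` traces out the stepwise trade-off the abstract describes: more power to the refinement layer raises the better receiver's rate at the cost of the common base grade.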
Artificial test pattern generation for digital video coding systems
Howard C. Edinger Jr., Huifang Sun
A novel method of developing test patterns for digital video coding systems is presented. The method is illustrated for an MPEG-2 encoder where, subject to fixed encoding parameters, the input frames are identical to the expected output frames of reconstructed video. It is based upon a method of developing test bitstreams for MPEG decoders that is also described. Both methods rely upon bit-accurate modeling of the system under test by a software codec, usually a C program. Natural video may be used to test digital video systems; however, encoded natural images seldom provide adequate coverage of either fixed-length or variable-length binary codeword spaces. Test bitstreams may be readily constructed for decoders to cover specific codeword subspaces, but they decode as noticeably artificial video frames. 'Artificial' test patterns are obtained for an encoder by first decoding a specially constructed test bitstream, then holding its higher-level encoding parameters fixed and iteratively encoding the 'artificial' video frames until convergence is reached, i.e., input frames equal output frames. Convergence occurs after a few to dozens of iterations, and the resulting 'artificial' test patterns retain substantial coverage.
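The encode-until-convergence loop can be sketched with a toy codec. Plain uniform quantization stands in for the MPEG-2 encoder here, so convergence is reached almost immediately; a real codec needs the iterations the abstract describes.

```python
import numpy as np

def toy_encode(frame, step=16):
    """Stand-in for a lossy video encoder: uniform quantization."""
    return np.round(frame / step).astype(int)

def toy_decode(code, step=16):
    return code * step

def artificial_test_pattern(seed_frame, step=16, max_iter=50):
    """Re-encode the decoded frame until input == output, mimicking the
    convergence procedure: the result is a frame that reconstructs to
    itself under the (fixed-parameter) codec."""
    frame = seed_frame
    for i in range(max_iter):
        rec = toy_decode(toy_encode(frame, step), step)
        if np.array_equal(rec, frame):
            return frame, i        # frame is its own reconstruction
        frame = rec
    raise RuntimeError("no convergence")

seed = np.random.default_rng(1).integers(0, 256, size=(8, 8))
pattern, iters = artificial_test_pattern(seed)
print("converged after", iters, "iteration(s)")
```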
Fractal Methods
Hybrid fractal image coding method
Zhengbing Zhang, Yaoting Zhu, Guang-Xi Zhu, et al.
In recent years, fractal image compression has attracted great attention because of its potential for high compression ratios. In previously published encoding techniques, an image is usually partitioned into nonoverlapping blocks, and each block is encoded by a self-affine mapping from a larger block. Encoding a block generally requires a costly search. Our experiments show that there exist blocks which cannot be well matched with any larger block under a self-affine transform; encoding such blocks with existing fractal methods may result in relatively low local fidelity. In this paper, we propose a hybrid fractal encoding method based on DCT and self-affine transforms to improve local fidelity and encoding speed. The concept of short-distance piecewise self-similarity (SDPS) is defined. Blocks possessing SDPS are encoded with a near-center self-affine transform method; other blocks are encoded with a quasi-JPEG algorithm. Our method combines the advantage of fractal coding, its potential for high compression ratios, with the advantage of the JPEG algorithm, which provides high fidelity at low and medium compression.
Genetic algorithms for fast search in fractal image coding
David W. Redmill, David R. Bull, Ralph R. Martin
This paper demonstrates the application of genetic algorithms (GAs) to the real-time search problem of fractal image compression. An approach using GAs has been simulated and compared with both an exhaustive search method and a heuristic multi-grid method. Results, for various block sizes, show that the GA based approach offers a computationally more efficient search than either of the other methods.
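A minimal GA search over domain-block positions might look like the sketch below, assuming 2x2 averaging as the spatial contraction and mutation-only evolution (no crossover); all parameters and the fitness definition are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(42)
image = rng.random((64, 64))
range_block = image[10:18, 22:30]          # 8x8 range block to match
B, D = 8, 16                               # range / domain block sizes

def shrink(block):                         # 2x2 averaging: domain -> range size
    return block.reshape(B, 2, B, 2).mean(axis=(1, 3))

def fitness(pos):
    x, y = pos
    dom = image[x:x + D, y:y + D]
    return -np.mean((shrink(dom) - range_block) ** 2)   # higher is better

def ga_search(pop_size=20, generations=30, mut=4):
    """Evolve a population of (row, col) domain offsets: keep the best
    half each generation and mutate it to refill the population."""
    lim = image.shape[0] - D
    pop = rng.integers(0, lim + 1, size=(pop_size, 2))
    for _ in range(generations):
        scores = np.array([fitness(p) for p in pop])
        elite = pop[np.argsort(scores)[::-1][: pop_size // 2]]   # selection
        kids = elite + rng.integers(-mut, mut + 1, size=elite.shape)
        pop = np.vstack([elite, np.clip(kids, 0, lim)])          # mutation
    scores = np.array([fitness(p) for p in pop])
    return pop[np.argmax(scores)], -scores.max()

best, mse = ga_search()
print("best domain offset:", best, "MSE:", round(float(mse), 5))
```

The GA evaluates far fewer candidates than an exhaustive scan of all domain positions, which is the source of the speedup the paper reports.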
Fractal-like video coding with weighted summation
Karl Bochez, Masahide Kaneko, Hiroshi Harashima
In the present paper we introduce a fast 3D fractal encoding method based on tree compression, achieving a good compression ratio at very low bit rates. The algorithm can also be transposed to the space of wavelet coefficients.
Fractal image coding using rate-distortion optimized matching pursuit
Mohammad Gharavi-Alkhansari, Thomas S. Huang
Matching pursuit is a general and flexible method for solving an optimization problem that is of interest in signal analysis, coding, control theory and statistics. In this paper, principles of rate-distortion optimal coding are used in combination with the matching pursuit algorithm to obtain an enhanced fractal image coding method. The advantages of such a method over traditional fractal image coders are described, and compression results are presented.
Perceptually lossless fractal image compression
Huawu Lin, Anastasios N. Venetsanopoulos
According to the collage theorem, the encoding distortion for fractal image compression is directly related to the metric used in the encoding process. In this paper, we introduce a perceptually meaningful distortion measure based on the human visual system's nonlinear response to luminance and on visual masking effects. Blackwell's psychophysical raw data on contrast threshold are first interpolated as a function of background luminance and visual angle, and are then used as an error upper bound for perceptually lossless image compression. For a variety of images, experimental results show that the algorithm produces a compression ratio of 8:1 to 10:1 without introducing visual artifacts.
Reducing the codebook size in fractal image compression by geometrical analysis
Julien Signes
In most IFS-based image coding schemes, the searched codebook of domain blocks is independent of the input image, identical for all range blocks of the same size, and not optimized. To get a good match between range and domain blocks, a huge codebook is used, which is inefficient in terms of both computational complexity and output bit rate. We propose to design an optimal reduced codebook for the whole image through a geometrical analysis of the basic scheme. We then compare this method with a 'local' codebook method, which leads us to some conclusions about the differences between IFS-based and vector quantization (VQ) based coding schemes.
Edge detection based on scale fractal dimension
Donghui Xue, Yaoting Zhu, Guang-Xi Zhu, et al.
In this paper we propose the new concept of scale fractal dimension for fractal characterization. We point out that the scale fractal dimension supplies more information and gives a more accurate description of natural fractals than the commonly used fractal dimension does. The variation of the scale fractal dimension for different kinds of images is analyzed, and a new metric based on it is defined for edge detection. An edge detection algorithm based on the scale fractal dimension is then given, and its performance is discussed.
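As background, the standard box-counting estimate of fractal dimension, a single-number relative of the authors' scale-dependent measure, can be computed as follows; the set of box sizes is illustrative.

```python
import numpy as np

def box_counting_dimension(mask, sizes=(1, 2, 4, 8, 16)):
    """Estimate the box-counting fractal dimension of a binary mask by
    a least-squares fit of log N(s) against log(1/s), where N(s) is the
    number of s x s boxes containing foreground pixels."""
    counts = []
    n = mask.shape[0]
    for s in sizes:
        boxed = mask[:n - n % s, :n - n % s].reshape(n // s, s, n // s, s)
        counts.append(np.count_nonzero(boxed.any(axis=(1, 3))))
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)),
                          np.log(counts), 1)
    return slope

# Sanity checks: a filled square has dimension 2, a straight line 1.
square = np.ones((64, 64), dtype=bool)
line = np.zeros((64, 64), dtype=bool)
line[32, :] = True
print(round(box_counting_dimension(square), 2))   # 2.0
print(round(box_counting_dimension(line), 2))     # 1.0
```

A scale-dependent variant would report the local slope at each box size rather than one global fit, which is the extra information the abstract refers to.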
Fractal-based motion estimation for image sequence coding
Kwok-Leung Chan, Graham R. Martin
In this investigation, motion estimation is carried out on three image sequences using a block matching approach. Each frame of the image sequence is partitioned into a number of fixed-size blocks, and for each block the fractal dimension is calculated. For each block in the current frame, the best-matching block in the previous frame is identified using a novel two-pass searching scheme. In the first pass, the fractal dimension is calculated at nine positions within the search space, and the coarse position of the corresponding block is identified from the similarity of the fractal dimensions. In the second pass, a grey-level exhaustive search around the coarse position determines the exact position of the corresponding block. The search is skipped if the block has negligible movement. Preliminary results show that the new motion estimation method requires much less computation than the exhaustive search technique and provides a better estimate than the three-step search method, especially for large search spaces.
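The two-pass idea can be sketched as follows. Block variance stands in for the fractal dimension as the cheap first-pass statistic, and the window sizes are illustrative; the paper's actual statistic and search layout differ.

```python
import numpy as np

def two_pass_search(prev, cur_block, top, left, search=8):
    """Two-pass block matching: nine coarse offsets are ranked by a
    cheap texture statistic (variance, standing in for the fractal
    dimension), then an exhaustive SAD search runs in a small window
    around the coarse winner."""
    B = cur_block.shape[0]
    target = cur_block.var()
    coarse = [(dy, dx) for dy in (-search, 0, search)
                       for dx in (-search, 0, search)]
    def texture_diff(d):
        y, x = top + d[0], left + d[1]
        return abs(prev[y:y + B, x:x + B].var() - target)
    dy0, dx0 = min(coarse, key=texture_diff)          # pass 1: coarse position
    best, best_sad = (dy0, dx0), np.inf
    for dy in range(dy0 - 2, dy0 + 3):                # pass 2: local exhaustive
        for dx in range(dx0 - 2, dx0 + 3):
            y, x = top + dy, left + dx
            sad = np.abs(prev[y:y + B, x:x + B] - cur_block).sum()
            if sad < best_sad:
                best, best_sad = (dy, dx), sad
    return best

rng = np.random.default_rng(7)
prev = rng.random((64, 64))
true_dy, true_dx = 8, -8                   # simulated motion
cur_block = prev[24 + true_dy:32 + true_dy, 24 + true_dx:32 + true_dx]
print(two_pass_search(prev, cur_block, top=24, left=24))   # (8, -8)
```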
Enhancement/Restoration
Iterative algorithm for improving the resolution of video sequences
Brian C. Tom, Aggelos K. Katsaggelos
This paper introduces an iterative and temporally recursive technique to improve the spatial resolution of a video sequence. Such iterative techniques have a number of advantages, among which are the ability to incorporate prior information into the restoration and the elimination of matrix inverses. At each iteration, a residual (error) term is added to the current estimate of the high-resolution frame. This residual term is based on a model which mathematically describes the relationship between the low-resolution images and their corresponding high-resolution images. The model incorporates the motion between frames and, in the most general case, takes into account occlusions and newly uncovered areas. Experimental results are presented which demonstrate the capabilities of the proposed approach. Keywords: video enhancement, interpolation, motion compensation, subsampling
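The residual update at the heart of such schemes can be sketched for a single frame, assuming the degradation model is plain 2x2 box averaging and ignoring motion and occlusions.

```python
import numpy as np

def down(x):                                   # model: blur + decimate (2x2 avg)
    return x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2).mean(axis=(1, 3))

def up(e):                                     # back-project: nearest neighbour
    return np.kron(e, np.ones((2, 2)))

def iterative_sr(y, n_iter=100, lam=1.0):
    """Iterative back-projection: at each step the low-resolution
    residual is up-projected and added to the high-resolution estimate."""
    x = up(y)                                  # initial estimate
    for _ in range(n_iter):
        residual = y - down(x)                 # error in the low-res domain
        x = x + lam * up(residual)
    return x

truth = np.arange(64, dtype=float).reshape(8, 8)
y = down(truth)                                # simulated low-res observation
x_hat = iterative_sr(y)
print(round(float(np.abs(down(x_hat) - y).max()), 6))  # 0.0 (data-consistent)
```

With several motion-shifted low-resolution frames instead of one, the same residual update recovers detail beyond the observation grid, which is the sequence case the paper addresses.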
Modified CPI filter algorithm for removing salt-and-pepper noise in digital images
Nelson Hon Ching Yung, Andrew H. S. Lai, Kim Ming Poon
In this paper, the theoretical aspects, implementation issues, and performance analysis of a modified CPI filter algorithm are presented. The original CPI algorithm identifies corrupted pixels by interrogating subimages and considering the intensity spread of pixel values within the subimage when making a decision; the modified algorithm similarly takes into account the subimage gray-level distribution across the whole gray scale. It works on the assumption that, to decide which group in the subimage is corrupted, the multiple-feature histogram representing the subimage gray-level distribution must be transformed into a two-feature histogram, so that these two features can be mapped onto the two available pixel classes. The transformation is performed using a 1-sigma decision about the mean intensity of the subimage: pixels falling inside the sigma bounds are considered uncorrupted, and the rest corrupted. A performance analysis of the modified CPI, original CPI, average, median and sigma algorithms is given for images corrupted by salt-and-pepper noise of impulsive and Gaussian nature, and by gray noise, over signal-to-noise ratios (SNR) from +50 dB to -50 dB. The results show that, like the original CPI algorithm, the modified CPI algorithm exhibits a number of desirable features. Firstly, due to its pixel identification property, it removes noise better than conventional filter algorithms. Secondly, most features in the original image are preserved in the restored image, compared with, say, the median filter. Thirdly, iterative filtering of a noisy image using the CPI algorithm is possible.
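The 1-sigma decision can be sketched as follows. The window size and the replacement rule (mean of the 'uncorrupted' pixels) are simplifying assumptions for illustration, not the full CPI algorithm.

```python
import numpy as np

def cpi_like_filter(img, win=3):
    """Sigma-decision sketch: within each window, pixels outside one
    standard deviation of the window mean are treated as corrupted and
    replaced by the mean of the remaining ('uncorrupted') pixels."""
    out = img.astype(float).copy()
    r = win // 2
    H, W = img.shape
    for i in range(r, H - r):
        for j in range(r, W - r):
            sub = img[i - r:i + r + 1, j - r:j + r + 1].astype(float)
            mu, sigma = sub.mean(), sub.std()
            good = sub[np.abs(sub - mu) <= sigma]
            if np.abs(img[i, j] - mu) > sigma and good.size:
                out[i, j] = good.mean()
    return out

clean = np.full((9, 9), 100.0)
noisy = clean.copy()
noisy[4, 4] = 255.0                        # a single salt impulse
restored = cpi_like_filter(noisy)
print(noisy[4, 4], "->", restored[4, 4])   # 255.0 -> 100.0
```

Because uncorrupted pixels fall inside the sigma bounds, they pass through unchanged, which is why the scheme preserves image features better than a blanket median filter.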
Study on improving iterative image restoration
Joon Il Moon, Joon-Ki Paik
In the present paper we propose two new improved iterative restoration algorithms. One accelerates convergence of the steepest descent method using improved search directions. The other is a fast iterative algorithm based on preconditioning. The preconditioner can be implemented with an FIR filter structure, so the method can be applied in practice with a manageable amount of computation. Experimental results show good improvement in convergence speed. Although the proposed methods have the disadvantage that they cannot be embedded in adaptive restoration, they can be used as pre-processing for adaptive algorithms.
Color image enhancement in a new color space
Tian-Hu Yu
A practical approach to developing color image enhancement algorithms is to transform the R, G, and B components of each pixel into another set of color coordinates with which luminance, hue, and saturation can be described. Each of the new coordinates is processed with its own enhancement algorithm, and the enhanced coordinates are then inverse-transformed to R, G, and B components for display. In this paper, we introduce a new color space based on a concept of nonlinear color differences. The transformations between the proposed color space and the RGB color space are much simpler than those between the LHS and RGB color spaces. Moreover, the chromatic components are totally uncorrelated with the achromatic component, so that hue and saturation can be exactly described. With the proposed color space, obtained by modifying the YIQ color space, we develop color image processing algorithms for luminance component equalization, saturation equalization, and color emphasis.
Projection onto the narrow quantization constraint set for postprocessing of scalar quantized images
Dongsik Kim, Seop Hyeong Park
Since the postprocessing of image data using a priori information depends on the constraints imposed on the decoded images, it is important to utilize constraints which are best suited to postprocessing techniques. Among the constraint sets, the quantization constraint set (QCS) is commonly used in various algorithms, especially those based on the theory of projections onto convex sets. In general, the QCS is the closure of the corresponding known quantization region, since such a QCS is the smallest set that is easily predictable at the decoder and always includes the original image before quantization. Our work, however, has revealed that the ordinary QCS is not optimal in the sense of minimum mean square error. Surprisingly, under certain conditions the optimum is always obtained when the boundary of the constraint set is narrower than that of the ordinary QCS. In this paper we propose the narrow quantization constraint set (NQCS) as a substitute for the ordinary QCS. We also present mathematical analysis and simulations which demonstrate that the NQCS works better than the ordinary QCS on natural images.
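The narrowed projection itself is a one-line clip; the coefficients, step size, and narrowing factor `alpha` below are illustrative values.

```python
import numpy as np

def project_nqcs(x, decoded, step, alpha=0.5):
    """Clip each postprocessed coefficient to an interval of width
    alpha*step centred on its decoded reconstruction level.  alpha = 1
    gives the ordinary QCS; alpha < 1 gives the narrowed set (NQCS)."""
    half = 0.5 * alpha * step
    return np.clip(x, decoded - half, decoded + half)

step = 8.0
decoded = np.array([0.0, 16.0, -8.0, 24.0])        # quantizer reconstructions
post = decoded + np.array([3.0, -5.0, 2.0, 1.0])   # after postprocessing
print(project_nqcs(post, decoded, step, alpha=1.0))  # ordinary QCS
print(project_nqcs(post, decoded, step, alpha=0.5))  # narrowed set
```

Narrowing pulls the postprocessed values back toward the reconstruction levels, which is the mechanism behind the MSE improvement reported in the abstract.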
Image enhancement for low-bit-rate JPEG and MPEG coding via postprocessing
Yung-Kai Lai, Jin Li, C.-C. Jay Kuo
JPEG and MPEG compression standards are based on block discrete cosine transform (BDCT) and, when the bit rate becomes low, visually annoying blocking artifacts appear in decompressed images and videos. In this work, we attempt to characterize and quantify the blocking artifact, and then propose an iterative method for its removal by using block classification and space-frequency filtering. The proposed new method is better than the POCS (projection onto convex sets) method in terms of convergence rate, visual appearance and objective mean square error measure.
Multichannel image restoration based on a pseudo-block-diagonalized Wiener filter
Ki-Woon Na, Joon-Ki Paik
In the present paper we propose a multichannel image restoration method based on a pseudo-block-diagonalized Wiener filter. The proposed algorithm assumes that a multichannel imaging system includes both intra- and inter-channel degradation, and it utilizes correlation information among channels in the restoration process. Three spectrum estimation techniques are also proposed for implementing the restoration algorithm. The proposed method can be implemented in the Fourier transform domain, and it neither assumes the spectral or spatial separability used by Hunt nor computes the inverse of a huge block matrix.
Globally optimal smoothing functional for edge-enhancing regularized image restoration
In regularized image restoration a solution is sought which preserves the fidelity to the noisy and blurred image data and also satisfies some constraints which represent our prior knowledge about the original image. A standard expression of this prior knowledge is that the original image is smooth. The regularization parameter balances these two requirements, i.e., fidelity to the data and smoothness of the solution. The smoothness requirement on the solution, however, results in a globally smooth image, i.e., no attention is paid to the preservation of the high spatial frequency information (edges). One approach towards the solution of this problem is the introduction of spatial adaptivity. A different approach is presented in this paper. According to this approach besides the constraint which bounds from above the energy of the restored image at high frequencies, a second constraint is used. With this constraint the high frequency energy of the restored image is also bounded from below. This means that very smooth solutions are not allowed, thus preserving edges and fine details in the restored image. Extending our previous work, we propose a nonlinear formulation of the regularization functional and derive an iterative algorithm for obtaining the unique minimum of this functional. The regularization parameters are evaluated simultaneously with the restored image, in an iterative fashion based on the partially restored image.
Removing noninformative subsets during vectorization of gray-level line patterns
Sergey V. Ablameyko, Carlo Arcelli, G. Ramella
A two-stage procedure for direct vectorization of gray-level line patterns, that bypasses the binarization phase, is illustrated. The first stage extracts the gray-skeleton of the pattern as the set of one-pixel thick subsets placed in correspondence with locally higher intensity regions. The second stage removes skeleton subsets that are noninformative for the task at hand, by taking into account parameters tailored to the problem domain. Criteria and operations employed to modify respectively peripheral and internal skeleton branches are described.
Image Coding I
Audiovisual signal compression: the 64/P codecs
Nikil S. Jayant
Video codecs operating at integral multiples of 64 kbps are well known in visual communications technology as p * 64 systems (p equals 1 to 24). Originally developed as a class of ITU standards, these codecs have served as core technology for videoconferencing, and they have also influenced the MPEG standards for addressable video. Video compression in the above systems is provided by motion compensation followed by discrete cosine transform coding and quantization of the residual signal. Notwithstanding the promise of higher bit rates in emerging generations of networks and storage devices, there is a continuing need for facile audiovisual communications over voiceband and wireless modems. Consequently, video compression at bit rates lower than 64 kbps is a widely sought capability. In particular, video codecs operating at rates in the neighborhood of 64, 32, 16, and 8 kbps seem to have great practical value, being matched respectively to the transmission capacities of basic rate ISDN (64 kbps) and of voiceband modems that represent high (32 kbps), medium (16 kbps) and low-end (8 kbps) grades in current modem technology. The purpose of this talk is to describe the state of video technology at these transmission rates, without getting too literal about the specific speeds mentioned above. In other words, we expect codecs designed for non-submultiples of 64 kbps, such as 56 kbps or 19.2 kbps, as well as for sub-multiples of 64 kbps, depending on varying constraints on modem rate and the transmission rate needed for the voice-coding part of the audiovisual communications link. The MPEG-4 video standards process is a natural platform on which to examine current capabilities in sub-ISDN rate video coding, and we shall draw appropriately from this process in describing video codec performance.
Inherent in this summary is a reinforcement of motion compensation and DCT as viable building blocks of video compression systems, although there is a need for improving signal quality even in the very best of these systems. In a related part of our talk, we discuss the role of preprocessing and postprocessing subsystems which serve to enhance the performance of an otherwise standard codec. Examples of these (sometimes proprietary) subsystems are automatic face-tracking prior to the coding of a head-and-shoulders scene, and adaptive postfiltering after conventional decoding, to reduce generic classes of artifacts in low bit rate video. The talk concludes with a summary of technology targets and research directions. We discuss targets in terms of four fundamental parameters of coder performance: quality, bit rate, delay and complexity; and we emphasize the need for measuring and maximizing the composite quality of the audiovisual signal. In discussing research directions, we examine progress and opportunities in two fundamental approaches for bit rate reduction: removal of statistical redundancy and reduction of perceptual irrelevancy; we speculate on the value of techniques such as analysis-by-synthesis that have proved to be quite valuable in speech coding, and we examine the prospect of integrating speech and image processing for developing next-generation technology for audiovisual communications.
Wavelet/Multiresolution Image Compression
High-resolution radar imaging with applications to astronomy
David C. Munson Jr.
Synthetic aperture radar (SAR) is an example of a computed imaging system that is capable of synthesizing imagery having extraordinary resolution. This is accomplished by coherently processing radar returns collected from many different spatial locations. SAR has many cousins, including computed tomography, magnetic resonance imaging, holography, interferometric radio astronomy, and x-ray crystallography. Each of these systems acquires partial Fourier data and then employs sophisticated digital processing techniques to form 2-D, 3-D, and 4-D images. In this talk, we first provide a brief overview of Fourier-based computed imaging and then focus on the application of SAR to radar astronomy. Conventional radar imaging has been used to map the radar reflectivity of the Moon, interior planets, and asteroids, employing range-Doppler techniques. Although this form of processing has proven highly successful over the years, it is problematic for extremely fine-resolution applications. During the data collection interval (typically several minutes), the object's rotation may cause scatterers to migrate through range-Doppler resolution cells. In addition, during extended observation times, the apparent rotation rate of the object may change, and the Doppler cell boundaries may actually move. Both situations contribute to spatially varying smearing in the image when using conventional processing on high-resolution data. To overcome this problem, we propose the application of imaging algorithms from spotlight-mode SAR. When viewed in a tomographic context, this form of processing recognizes that each range-bin datum (after range compression) represents a superposition of the reflectivity of all illuminated scatterers at that range; this is approximately a linear projection. From the projection-slice theorem, the Fourier transform of each returned signal gives a slice of the 2-D Fourier transform of the reflectivity for a 2-D surface. 
Collecting data from many angles, as the object rotates, provides Fourier data on a nearly polar grid. Simple polar-to-Cartesian interpolation, followed by a 2-D FFT, produces the high-resolution image. We show results of applying such processing to Lunar data acquired at Arecibo Observatory and demonstrate that the SAR-based processing method is superior to conventional range-Doppler processing. Some of our more recent work is focused on imaging of 3-D objects having unknown and possibly highly irregular shapes. This work is motivated by the problem of imaging asteroids.
Image Coding II
Computer vision challenges and technologies for agile manufacturing
Perry A. Molley
Sandia National Laboratories, a Department of Energy laboratory, is responsible for maintaining the safety, security, reliability, and availability of the nuclear weapons stockpile for the United States. Because of the changing national and global political climates and inevitable budget cuts, Sandia is changing the methods and processes it has traditionally used in the product realization cycle for weapon components. Because of the increasing age of the nuclear stockpile, it is certain that the reliability of these weapons will degrade with time unless eventual action is taken to repair, requalify, or renew them. Furthermore, due to the downsizing of the DOE weapons production sites and loss of technical personnel, the new product realization process is being focused on developing and deploying advanced automation technologies in order to maintain the capability for producing new components. The goal of Sandia's technology development program is to create a product realization environment that is cost effective, with improved quality and reduced cycle time for small lot sizes. The new environment will rely less on the expertise of humans and more on intelligent systems and automation to perform the production processes. The systems will be robust in order to provide maximum flexibility and responsiveness for rapidly changing component or product mixes. An integrated enterprise will allow ready access to and use of information for effective and efficient product and process design. Concurrent engineering methods will allow a speedup of the product realization cycle, reduce costs, and dramatically lessen the dependency on creating and testing physical prototypes. Virtual manufacturing will allow production processes to be designed, integrated, and programmed off-line before a piece of hardware ever moves. The overriding goal is to be able to build a large variety of new weapons parts on short notice. 
Many of these technologies that are being developed are also applicable to commercial production processes and applications. Computer vision will play a critical role in the new agile production environment for automation of processes such as inspection, assembly, welding, material dispensing and other process control tasks. Although there are many academic and commercial solutions that have been developed, none have had widespread adoption considering the huge potential number of applications that could benefit from this technology. The reason for this slow adoption is that the advantages of computer vision for automation can be a double-edged sword. The benefits can be lost if the vision system requires an inordinate amount of time for reprogramming by a skilled operator to account for different parts, changes in lighting conditions, background clutter, changes in optics, etc. Commercially available solutions typically require an operator to manually program the vision system with features used for the recognition. In a recent survey, we asked a number of commercial manufacturers and machine vision companies the question, 'What prevents machine vision systems from being more useful in factories?' The number one (and unanimous) response was that vision systems require too much skill to set up and program to be cost effective.