Facial image tracking system architecture utilizing real-time labeling
Author(s):
Yuichi Fujino;
Takeshi Ogura;
Toshiaki Tsuchiya
This paper proposes a new moving-object tracking method based on local spiral labeling with a CAM (Content Addressable Memory). The local spiral labeling method was proposed in order to overcome one of the shortcomings of TV telephones. Conventional labeling, however, requires a large amount of processing time and memory to compute the connectivity relations between label numbers. Because a CAM can search and write multiple memory contents simultaneously, it is well suited to real-time labeling. This paper presents the new labeling algorithm, called local spiral labeling, a real-time labeling scheme utilizing CAM, and a prototype human head tracking system built with 0.5-micrometer BiCMOS gate-array technology.
Very low bit-rate video coding with object-based motion compensation and orthogonal transform
Author(s):
Yutaka Yokoyama;
Yoshihiro Miyamoto;
Mutsumi Ohta
This paper presents a new object-based video coding algorithm for very low bit-rate video transmission. The algorithm employs an object-based approach to motion compensation, so that block distortion does not appear. The moving objects are detected from locally decoded pictures, so information about object contours need not be sent to the decoder. This technique drastically reduces the amount of code to be transmitted. Motion-compensated prediction errors are encoded by transform coding, but only large error regions are encoded. Computer simulation results at 16 kbps show that the subjective quality of a picture decoded by this algorithm is substantially better than that of a conventional algorithm.
Model-based segmentation of the tongue surface using a modified scale space filter
Author(s):
Juergen Kelch;
Berthold B. Wein
Ultrasonic imaging is used to detect disturbed tongue movements during swallowing and articulation. On these real-time B-mode sonographic images, which represent a median sagittal plane through the tongue and the floor of the mouth, the physician normally marks the tongue surface (dorsum) manually to stress the shape information. Our work presents a solution for extracting the tongue dorsum automatically. We use a modified scale space filter for basic segmentation. This edge detector is based on coarse-to-fine tracking by varying the smoothing parameter of the Laplacian-of-Gaussian (LoG) filter. In this way, contour segments of the tongue dorsum and of other objects are extracted. A model of the tongue supports identification of the tongue segments and interpolation of the surface in spatiotemporal space. Some segments include two or more objects, such as liquid and tongue. For this reason, we model the tongue as a chain of elliptical structure elements. This model emphasizes a direction, allowing the orientation of the tongue to be detected, and is flexible enough to form any shape. The structure elements are matched to the scale space segments by correlation. A trainable cost-path classifier selects the topological connections of the structure elements, which are linked by spline interpolation. Finally, virtual three-dimensional views of the contour surface in spatiotemporal space are generated at different azimuthal angles for visualization.
Symmetrical segmentation-based image coding
Author(s):
Christina Saraceno;
Riccardo Leonardi
An image coding technique based on symmetry extraction and a Binary Space Partitioning (BSP) tree representation for still pictures is presented. Axes of symmetry, detected through a principal-axis-of-inertia approach and a coefficient-of-symmetry measure, are used to recursively divide an input image into a finite number of convex regions. This recursive partitioning results in the BSP tree representation of the image data. A partition step occurs whenever the current left/right node of the tree cannot be represented `symmetrically' by its counterpart, i.e., the right/left node. The splitting process may also end whenever the region associated with a given node has homogeneous characteristics or its size falls below a certain threshold. Given a BSP tree partition for an input image and the `seed' leaf nodes (i.e., those that cannot be generated by mirroring their counterparts), the remaining leaf nodes of the tree are reconstructed using a predictive scheme with respect to the `seed' leaf nodes.
Time-varying homotopy and the animation of facial expressions for 3D virtual space teleconferencing
Author(s):
Souichi Kajiwara;
Hiromi T. Tanaka;
Yasuichi Kitamura;
Jun Ohya;
Fumio Kishino
A homotopy describes the transformation of one arbitrary curve into another that shares the same endpoints. In this paper, we propose a deformable cylinder model, based on homotopy, in which an arbitrary surface interpolated between two contours via a blending function is transformed into another surface over time. We then show how this homotopic deformation can be applied to the realistic animation of human faces in a virtual space teleconferencing system. Specifically, we show that facial expressions such as wrinkling of the forehead and opening and closing of the mouth can be synthesized and animated in real time through 3D homotopic deformations.
Parallel approach to character recognition and its VLSI implementation
Author(s):
Heng-Da Cheng;
C. Xia;
C. N. Zhang
In this paper, a novel parallel approach called the VH2D (Vertical-Horizontal-2-Diagonals) method is proposed. In this method, projections of a character image are taken from four directions: vertical, horizontal, and the two diagonals, producing four subfeature vectors for each character. The four subfeature vectors are adjusted according to the character's center and combined into a complete feature vector. The feature dictionary consists of 3000 Chinese characters. Experiments have shown that the approach achieves extremely high recognition accuracy. Pipelining and parallelism are applied in the proposed approach, and the time complexity of the proposed algorithm is O(N). A simple VLSI architecture composed of four linear arrays of processing elements for the VH2D approach is briefly presented.
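A minimal sketch of the four-direction projection idea, assuming a binary character image stored as a NumPy array; the centering adjustment and dictionary matching described in the abstract are omitted.

```python
import numpy as np

def vh2d_features(char_img):
    """Project a binary character image along four directions (vertical,
    horizontal, and the two diagonals) and concatenate the sub-feature vectors."""
    img = np.asarray(char_img, dtype=float)
    v = img.sum(axis=0)                       # vertical projection (per column)
    h = img.sum(axis=1)                       # horizontal projection (per row)
    rows, cols = np.indices(img.shape)
    d1 = np.bincount((rows + cols).ravel(), weights=img.ravel())   # diagonal (r + c constant)
    d2 = np.bincount((rows - cols + img.shape[1] - 1).ravel(),
                     weights=img.ravel())                          # diagonal (r - c constant)
    return np.concatenate([v, h, d1, d2])

# toy example: a 5x5 "plus" shaped character
char = np.zeros((5, 5)); char[2, :] = 1; char[:, 2] = 1
print(vh2d_features(char))
```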
Extraction of two-dimensional arbitrary shapes using a genetic algorithm
Author(s):
Tomoharu Nagao;
Takeshi Agui;
Hiroshi Nagahashi
A method is proposed to extract two-dimensional shapes which are similar to a given model shape from a binary image composed of black and white pixels. This extraction problem is equivalent to the problem of determining the position, size, and rotational angle of each similar shape in the binary image. The model shape is transformed with four space transformation parameters, xc, yc, M, and θ, and the transformed model shape is overlapped with the binary image. Parameters xc and yc denote the xy coordinates of the center of gravity of the transformed model shape, M is the magnification ratio, and θ is the rotational angle. The research goal here is to find the space transformation parameter set that gives the maximum matching rate between the transformed model shape and a similar shape in the binary image. A genetic algorithm (GA), a kind of search or optimization algorithm, is employed for this problem. In this method, several virtual living things whose chromosomes represent space transformation parameters are randomly generated in a computer and evolved according to the GA. As a result of iterating generations, an evolved individual corresponding to the best space transformation parameter set is obtained. The algorithm and several experimental results are described.
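As an illustration of the search formulation (not the authors' exact GA), here is a minimal sketch in which a chromosome encodes (xc, yc, M, θ) and the fitness is the fraction of transformed model points that land on black pixels; the selection and mutation details are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def transform_model(model_pts, xc, yc, M, theta):
    """Scale, rotate, and translate a set of model contour points (x, y)."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return (M * model_pts) @ R.T + np.array([xc, yc])

def fitness(image, model_pts, chrom):
    """Matching rate: fraction of transformed model points hitting black pixels."""
    pts = np.rint(transform_model(model_pts, *chrom)).astype(int)
    h, w = image.shape
    inside = (pts[:, 0] >= 0) & (pts[:, 0] < w) & (pts[:, 1] >= 0) & (pts[:, 1] < h)
    hits = image[pts[inside, 1], pts[inside, 0]].sum()
    return hits / len(model_pts)

def ga_search(image, model_pts, pop=50, gens=100):
    """Very small GA over chromosomes (xc, yc, M, theta)."""
    h, w = image.shape
    P = np.column_stack([rng.uniform(0, w, pop), rng.uniform(0, h, pop),
                         rng.uniform(0.5, 2.0, pop), rng.uniform(0, 2 * np.pi, pop)])
    for _ in range(gens):
        f = np.array([fitness(image, model_pts, c) for c in P])
        parents = P[np.argsort(f)[-pop // 2:]]                      # truncation selection
        children = parents + rng.normal(0, [2, 2, 0.05, 0.1], parents.shape)  # mutation
        P = np.vstack([parents, children])
    f = np.array([fitness(image, model_pts, c) for c in P])
    return P[np.argmax(f)]
```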
Audio-visual speech recognition for a vowel discrimination task
Author(s):
Peter L. Silsbee;
Alan Conrad Bovik
Among the various methods which have been proposed to improve the robustness and accuracy of automatic speech recognition (ASR) systems, lipreading has received very little attention. In this paper, we provide motivation for the use of lipreading. A novel speaker-dependent lipreading system is developed which uses hidden Markov modeling, a well-known and highly successful technique for audio-based ASR. It is used in conjunction with an audio ASR system to improve the accuracy of the latter, especially under degraded acoustical conditions. Error reductions of 30% to over 60% result.
Classification and compression of digital newspaper images
Author(s):
Wey-Wen Cindy Jiang;
H. E. Meadows
An improved scheme for newspaper block segmentation and classification is described. The newspaper image is first segmented into blocks using three passes of a run-length smoothing algorithm. Blocks may have any shape and need not be non-overlapping rectangles. The height H between the top-line and base-line of lower-case letters, and the number of pixels whose values differ from those of their four neighboring pixels, are measured for simple and reliable block classification. Blocks of different types are compressed based on their own characteristics. Unlike conventional methods, halftone image blocks are treated differently from black-and-white graphic blocks for better compression. A lossless compression scheme for halftoned images is proposed. Reconstruction of gray tones from halftone images, employing information from both smooth and edge areas, is presented.
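A minimal sketch of the second classification feature, interpreted here as the count of pixels that differ from at least one of their four neighbours; the thresholds and the text-height measurement are not specified in the abstract and are omitted.

```python
import numpy as np

def neighbour_difference_count(block):
    """Count pixels whose value differs from at least one of their four
    (up/down/left/right) neighbours -- a simple busyness measure that helps
    separate halftone blocks from text and graphics blocks."""
    b = np.asarray(block, dtype=int)
    diff = np.zeros_like(b, dtype=bool)
    diff[1:, :]  |= b[1:, :]  != b[:-1, :]   # compare with pixel above
    diff[:-1, :] |= b[:-1, :] != b[1:, :]    # compare with pixel below
    diff[:, 1:]  |= b[:, 1:]  != b[:, :-1]   # compare with pixel to the left
    diff[:, :-1] |= b[:, :-1] != b[:, 1:]    # compare with pixel to the right
    return int(diff.sum())
```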
Joint multigrid estimation of 2D motion and object boundaries using boundary patterns
Author(s):
Ralf Buschmann
For the improvement of joint estimation of 2D motion and 2D object boundaries, an optimization procedure based on multigrid and pattern processing is presented. In this procedure an iterative optimization is applied using grids of different resolution, starting with a global estimation of the motion on the coarsest grid. Improvements are achieved by the introduction of boundary patterns, which are obtained from boundary estimates on coarser grids and used in the finer ones. It is shown that, compared to a hierarchical block-matching technique, the developed technique leads to a significant reduction of the mean square error between compensated and original images at object boundaries.
Fractal approach to low-rate video coding
Author(s):
Bernd Huertgen;
Peter Buettgen
This paper presents a method for fast encoding and decoding of image sequences based on fractal coding theory and the hybrid coding concept. The DPCM loop accounts for statistical dependencies of natural image sequences in the temporal direction. Those regions of the original image where the prediction, i.e., motion estimation and compensation, fails are encoded using an advanced fractal coding scheme originally developed for encoding still images. Similar to conditional replenishment coders, regions of the original image rather than of the residual image are encoded. The introduction of a fractal coding scheme instead of the commonly used DCT turns out to be advantageous especially at very low bit rates (8 - 64 kbit/s). In order to increase reconstruction quality, encoding speed, and compression ratio, some additional features, such as hierarchical codebook search and multilevel block segmentation, are proposed.
Sequence coding based on the fractal theory of iterated transformation systems
Author(s):
Emmanuel Reusens
A new scheme for sequence coding based on the theory of iterated function systems is presented. The method relies on a 3D approach in which the sequence is adaptively partitioned. Each partition block can be coded either by using spatial self-similarities or by exploiting temporal redundancies. The proposed system shows very good performance when compared to other existing methods.
Localized fractal dimension measurement in digital mammographic images
Author(s):
Christine J. Burdett;
Mita D. Desai
This paper investigates a novel image processing tool for differentiating between malignant and benign lesions in digitized mammograms. The new technique makes use of localized measurements of fractal parameters, calculated through the use of Gabor filters, and provides a means of quantifying the intrinsic roughness of the intensity surface of the digitized mammographic data. Since benign lesions are usually smoothly marginated while malignant lesions are characterized by indistinct, rough, spiculated borders, the premise is that a benign lesion will have a lower fractal dimension than a malignant lesion. The technique allows spatio-spectrally accurate fractal parameter measurements to be made by taking fractal measurements over different scales. This is done by decomposing the image into N bandpass channels. The local fractal dimension can then be measured from the spectral samples by finding the best linear fit (linear regression) to the data set, from which the fractal parameters of interest are computed. Conjoint resolution can be obtained by selecting the channel filters to be Gabor functions, or frequency-translated Gaussian functions. Results of this technique as applied to lesions in digitized mammograms are presented, using mammogram x-rays digitized to 12 bits of gray-scale resolution.
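A minimal sketch of this style of local fractal-dimension estimate, assuming a difference-of-Gaussians channel decomposition (a crude stand-in for the Gabor channels of the paper) and the usual fractional-Brownian-surface relation D = (8 - beta)/2 between the spectral slope and the fractal dimension; both choices are assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_fractal_dimension(image, sigmas=(1, 2, 4, 8, 16)):
    """Per-pixel fractal dimension estimate from the local energy of a set of
    bandpass channels: the log energy is regressed against log centre frequency,
    the slope gives the spectral exponent beta, and D = (8 - beta) / 2."""
    img = np.asarray(image, dtype=float)
    log_f, log_e = [], []
    for s in sigmas:
        band = gaussian_filter(img, s) - gaussian_filter(img, 2 * s)   # bandpass channel
        energy = gaussian_filter(band ** 2, 2 * s) + 1e-12             # local channel energy
        log_f.append(np.log(1.0 / s))                                  # proxy for log centre frequency
        log_e.append(np.log(energy))
    log_f = np.array(log_f)
    log_e = np.stack(log_e)                                            # shape (channels, H, W)
    fm = log_f - log_f.mean()
    # per-pixel least-squares slope of log-energy versus log-frequency
    slope = (fm[:, None, None] * (log_e - log_e.mean(axis=0))).sum(axis=0) / (fm ** 2).sum()
    beta = -slope
    return (8.0 - beta) / 2.0
```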
Computational morphology and representation of operators between complete lattices
Author(s):
Clara Cuciurean-Zapan;
Edward R. Dougherty
The representations of translation-invariant mappings in the context of computational morphology are in terms of elementary binary erosions and dilations. The general representations in the context of complete lattices are in terms of abstract dilations and antidilations. The present paper considers the relationship between the specialized computational and general lattice representations. The computational and lattice-based representations are directly demonstrated to be equivalent in the computational setting.
Morphological pyramids for image coding
Author(s):
Jose Crespo;
Jean C. Serra
This paper presents a method to compute a partition intended to be used in a region-based coding system where the number of regions is a design constraint. A new flat zone approach is introduced that (a) selects a subset of those flat zones (regions) that are present in the output of a multiscale connected filter, and that (b) assigns the rest of the regions to one among those selected by using flat zone merging procedures. The filters used are connected sequential alternating filters and a new connected filter recently presented. These are flat operators, invariant under anamorphosis, that act on a function by extending its flat zones exclusively. A new theoretical proposition is stated. Our segmentation approach behaves like a connected operator and it preserves the flat zone inclusion property between the input and the output partition. Moreover, regions computed at the highest resolution level can be imposed in successive coarser levels. Among the flat zones which are present at the output of a connected filter, the most significant ones with respect to several criteria are distinguished. This region selection is based solely on an ordering of the regions. Absolute thresholding is avoided, and the proposed approach computes with linear dynamic changes.
Application algorithm of mathematical morphology for SMT
Author(s):
Jun-Sik Kwon;
Jong Soo Choi
In this paper, we propose a new visual positioning and inspection algorithm. The positioning algorithm computes the center position and rotation angle of a surface mount device. The inspection algorithm is capable of detecting the location of broken or bent leads. A morphological opening is applied to obtain a leadless image of the quad flat package or the small outline package, and the leadless image can then be treated like an image of a square-shaped passive device. The center position and the rotated orientation are found from four corner points obtained from the morphological skeleton subsets of the leadless image. After finding them, the hit-and-miss transform is utilized to extract the feature of the corner identification (ID). The final orientation is determined correctly using the results detected by the hit structuring element and the miss structuring element which satisfy the feature of the corner ID. The morphological inspection is executed in two steps, i.e., a rough inspection and a detailed inspection. The former is performed before finding the center and the orientation, and the latter after finding them.
Forward rate control for MPEG recording
Author(s):
Emmanuel D. Frimout;
Jan Biemond;
Reginald L. Lagendijk
The MPEG video coding algorithm is used in a large variety of video recording applications. A key constraint for video coding algorithms for consumer (tape) recorder applications is the bit stream editability requirement; i.e., it must be possible to replace N consecutive frames by N new consecutive frames on the storage media, using at most the same number of bits. In this paper this constraint is satisfied by the use of a forward rate control mechanism, such that each group of pictures (GoP) will be encoded using a fixed number of bits (within a tolerance margin to be minimized). The problem of performing a forward state allocation (quantizer step allocation) is limited to the picture level by performing a pre-allocation, assigning a fraction of the available bits to each of the frames of a GoP. The state allocation per picture amounts to the correct selection of the quantization step size for all slices. This is done by forming parametric models for both the rate (R) and the distortion (D), such that for a particular slice, the R-D curve can be predicted. Using the R-D curves of every slice of the picture, the state allocation can be performed. With the described algorithm the GoP rate error is within 4% in the stationary mode; if a non-stationary mode that includes a re-allocation based on feedback information is added, the error is within 1%.
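A minimal sketch of the idea of predicting each slice's rate from a parametric R-D model and then choosing a quantizer step that meets a pre-allocated per-picture bit budget; the model form R(q) = a/q + b and the scan over quantizer steps are assumptions, not the paper's exact formulation.

```python
def fit_rate_model(samples):
    """Least-squares fit of R(q) = a / q + b from (q, bits) measurements."""
    n = len(samples)
    sx = sum(1.0 / q for q, _ in samples)
    sy = sum(r for _, r in samples)
    sxx = sum((1.0 / q) ** 2 for q, _ in samples)
    sxy = sum(r / q for q, r in samples)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

def pick_quantizer(slice_models, picture_budget, q_min=1, q_max=31):
    """Choose one quantizer step for all slices of a picture so that the
    predicted total rate stays within the pre-allocated budget."""
    def predicted(q):
        return sum(a / q + b for a, b in slice_models)
    for q in range(q_min, q_max + 1):   # scan from finest to coarsest quantization
        if predicted(q) <= picture_budget:
            return q                    # finest step that still meets the budget
    return q_max

# toy usage: two slices with previously measured (q, bits) points
models = [fit_rate_model([(2, 5200), (4, 2700), (8, 1500)]),
          fit_rate_model([(2, 3100), (4, 1700), (8, 1000)])]
print(pick_quantizer(models, picture_budget=4000))
```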
Transmission error detection, resynchronization, and error concealment for MPEG video decoder
Author(s):
Sang Hoon Lee;
Jong Sun Yoon;
Seong Hwan Jang;
Soon Hwa Jang
In this paper, we propose an error detection and resynchronization algorithm for the VLC decoding process and an error concealment technique for an MPEG video decoder. A transmission error causes error propagation in compressed video data, which in turn leads to objectionable degradation in picture quality. In order to minimize the effect of error propagation, the decoder should resynchronize and recover the differentially coded symbols as fast as possible. The proposed algorithm is capable of resynchronizing the VLC by the next macroblock at the latest. Concerning the error concealment algorithm, more emphasis is placed on the concealment of erroneous intra-mode MBs. The proposed technique shows better performance than previous techniques, especially for I-pictures and scene-change pictures.
Real-time testbed for MPEG-1 video compression
Author(s):
Christian Bouville;
A. Bouriel;
R. Brusq;
P. Houlier
The MPEG-1 (ISO-11172) video compression standard allows considerable latitude in codec design in both aspects of processing algorithms and hardware. This flexibility is difficult to exploit because it implies hours of computer simulation to find a good compromise between the application requirements, the picture quality, and the hardware complexity. To cope with this problem, we have developed an MPEG-1 programmable coding testbed that allows real time experimentation of the coding process in a wide range of application profiles. The coder architecture is highly modular and consists of several processing units which communicate through high speed video data paths and low speed transputer links for control data. Each of these processing units has an MIMD architecture made up of a linear structure of processing modules. These processing modules combine transputers for the control tasks and DSPs for intensive computing tasks (or block matching accelerators for the motion estimation unit). The power of the current version of the system is sufficient to process MPEG SIF video format in real time. The implementation of other low bit rate video compression applications is currently being considered.
Block loss for ATM video
Author(s):
Sze Keong Chan;
Alberto Leon-Garcia
In BISDN, the asynchronous transfer mode (ATM) requires all information to be represented as a sequence of standard data units called cells. Cell loss is inherent in ATM networks due to cell header corruption and buffer overflow in the network. Several studies have shown that cell losses in an ATM network are bursty. In this work, we encoded real video sequences with a variable bit-rate (VBR) version of the H.261 video encoder to determine the relationship between blocks in a video frame and the number of ATM cells generated. We then considered the impact of bursty cell losses on the image block loss probability. Block loss distributions are given for different codec and channel parameters. We also obtained block loss results using a cell loss correction scheme. Three sequences were analyzed to obtain the cumulative block loss probability distribution. Similar maximum and minimum block loss probability values were obtained for each sequence. The block loss probability distribution varies according to the amount and type of motion present in each sequence. We show that the block loss is confined to one group of blocks (GOB). The maximum block loss probability can be two orders of magnitude larger than the channel cell loss probability. By using the cell loss correction scheme, block loss was reduced to a level equivalent to reducing the cell loss probability by five orders of magnitude.
Optimal delayed-coding of video sequences subject to a buffer-size constraint
Author(s):
David W. Lin;
Ming-Hong Wang;
Jiann-Jone Chen
A video encoder has the task of producing lowest-distortion coded video subject to some constraints on delay, rate, and buffer conditions. We present a general optimization approach to this problem in a framework of delayed coding and we motivate a certain formulation of the optimization objective. Two forms of distortion measures are considered, namely, the maximum distortion and the total distortion, each defined over a segment of the video to be coded. These distortion measures are chosen for their mathematical tractability and practical importance. A solution (computational algorithm) for each case is described. Subject to some conditions, the solutions may be suboptimal. Simulation results show an improved performance with this approach compared to a simple typical approach which varies the quantization scale linearly with the encoder buffer level.
Entropy criterion for optimal bit allocation between motion and prediction error information
Author(s):
Fabrice Moscheni;
Frederic Dufaux;
Henri Nicolas
Motion estimation and compensation techniques are widely used in video coding. This paper addresses the problem of the trade-off between the motion and the prediction error information. Under some realistic hypotheses, the transmission cost of these two components can be estimated. Therefore, we obtain a criterion which controls the motion estimation process in order to optimize its performance. As a particular application, this criterion is applied to the split procedure of an adaptive multigrid block matching technique. Simulation results are presented, showing the significant improvements due to the method.
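A minimal sketch of the kind of entropy-based split test described above: a block is split only if the estimated motion-vector cost plus prediction-error entropy of the four sub-blocks is lower than that of the parent block. The zeroth-order entropy estimate and the fixed per-vector bit cost are assumptions, not the paper's criterion.

```python
import numpy as np

def empirical_entropy(residual, bins=256):
    """Zeroth-order entropy (bits/pixel) of a prediction-error block."""
    hist, _ = np.histogram(residual, bins=bins, range=(-255, 255))
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

def should_split(parent_residual, child_residuals, mv_bits=12):
    """Compare the estimated cost (motion bits + error entropy * pixels)
    of one parent vector against four child vectors."""
    parent_cost = mv_bits + empirical_entropy(parent_residual) * parent_residual.size
    child_cost = sum(mv_bits + empirical_entropy(r) * r.size for r in child_residuals)
    return child_cost < parent_cost
```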
Effective fuzzy logic approach to image enhancement
Author(s):
Zhiwei Zhao;
Xueqin Li;
Heng-Da Cheng
This paper presents a fuzzy enhancement algorithm for image enhancement based on a fuzzy membership function and a contrast intensification transformation function using the average fuzzy membership grade with neighbors. Our algorithm has the following features: (1) fully employs the information of neighbors and utilizes the two-step nonlinear transformations, (2) reduces noise by using a smoothing operator in fuzzy singletons, and (3) keeps the edges intact.
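A minimal sketch of this general fuzzy-enhancement pipeline, assuming the classical S-type membership mapping with fuzzifiers fd and fe and the standard contrast intensification (INT) operator applied to the average membership grade over a 3x3 neighbourhood; the paper's exact membership and transformation functions are not reproduced.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fuzzy_enhance(image, fd=128.0, fe=1.0, iterations=2):
    """Map grey levels to fuzzy membership grades, average the grades over a
    3x3 neighbourhood, apply the INT contrast-intensification operator, and
    map the grades back to grey levels."""
    g = np.asarray(image, dtype=float)
    gmax = g.max()
    mu = (1.0 + (gmax - g) / fd) ** (-fe)           # membership grades in (0, 1]
    mu = uniform_filter(mu, size=3)                 # average grade with neighbours
    for _ in range(iterations):                     # INT operator
        mu = np.where(mu <= 0.5, 2 * mu ** 2, 1 - 2 * (1 - mu) ** 2)
    out = gmax - fd * (mu ** (-1.0 / fe) - 1.0)     # defuzzification (inverse mapping)
    return np.clip(out, 0, gmax)
```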
Enhancement of retinal images: a critical evaluation of the technology
Author(s):
Eli Peli
Evaluation of retinal images is essential to modern ophthalmic care. With the advent of image processing equipment, digital recording and processing of retinal images is starting to replace the standard film based fundus photography. The ability to enhance images is cited as one of the major benefits of this expensive technology. This paper critically reviews the practices employed in the image enhancement literature. It is argued that the papers published to date have not presented convincing evidence regarding the diagnostic value of retinal image enhancement. The more elaborate studies in radiology suggest, at best, modest diagnostic improvement with enhancement. The special difficulties associated with the demonstration of an improved diagnosis in ophthalmic imaging are discussed in terms of the diagnostic task and the selection of study populations.
Adaptive, model-based restoration of textures by generalized Wiener filtering
Author(s):
Ravi Krishnamurthy;
John W. Woods;
Joseph M. Francos
We consider the adaptive restoration of inhomogeneous textured images degraded by linear blur and additive white Gaussian noise. The method consists of segmenting the image into individual homogeneous textures and restoring each texture separately. The individual textures are assumed to be realizations of 2-D Wold-decomposition based regular, homogeneous random fields which may possess deterministic components. The conventional Wiener filter assumes that the spectral distribution of the signal is absolutely continuous and, therefore, cannot be directly used to restore the individual textures. A generalized Wiener filter accommodates the unified texture model and is shown to yield minimum mean-squared error estimates for fields with discontinuous spectral distributions. Texture discrimination is performed by obtaining maximum a posteriori estimates for the label field using simulated annealing. The performance of our segmentation algorithm is investigated in the presence of noise.
Continuous presence video bridging based on H.261 standard
Author(s):
Ting-Chung Chen;
Shawmin Lei;
Ming-Ting Sun
Multi-point videoconferencing provides the full benefits of teleconferencing but also raises more involved technical issues. This paper presents a detailed analysis of a continuous presence video bridge using the H.261 video coding standard. We first describe the architecture and the required operations of a coded-domain bridge using H.261. We then derive bounds on the bridge delay and the required buffer size for the implementation of the bridge. The delay and the buffer occupancy of the video bridge depend on the order, complexity, and bit-distribution of the input video sources. To investigate a typical case, we simulate the delay and the buffer occupancy of a video bridge, and we also provide a heuristic method to estimate the delay in a typical case. Several techniques are discussed to minimize the bridge delay and the buffer size. Finally, we simulate intra-slice coding and show that the delay and the buffer size can be reduced significantly using this technique.
Multifocus synthesis and its application to 3D image capturing
Author(s):
Hirohisa Yamaguchi
A new technique for high-resolution image synthesis, called multifocus synthesis, is presented. As an important extension, 3-D real-world image capture is discussed. In this approach, the object image is taken at a number of focal distances by a single camera placed at a fixed position. Each of these images is then converted into a multiresolution representation using optimized QMFs. The resulting volume of coefficients is then analyzed and 3-D distance information is computed.
Nonlinear regression for image enhancement via generalized deterministic annealing
Author(s):
Scott Thomas Acton;
Alan Conrad Bovik
We introduce new classes of image enhancement techniques that are based on optimizing local characteristics of the image. Using a new optimization technique for nonconvex combinatorial optimization problems, generalized deterministic annealing (GDA), we compute fuzzy nonlinear regressions of noisy images with respect to characteristic image sets defined by certain local image models. The image enhancement results demonstrate the powerful approach of nonlinear regression and the low-cost, high-quality optimization of GDA.
Error concealment techniques for an all-digital high-definition television system
Author(s):
Aradhana Narula;
Jae S. Lim
Broadcasting High Definition Television (HDTV) requires the transmission of an enormous amount of information within a highly restricted bandwidth. Adhering to the transmission constraints, channel errors are inevitable. This paper proposes error concealment techniques to remove subjective effects of transmission errors. Error concealment techniques for several data parameters transmitted for the compressed representation of HDTV are considered. Specifically, we address errors in motion vectors and DCT coefficients. The concealment techniques estimate the true value of the corrupted parameters by exploiting the spatial and temporal correlation within the image sequence. In general, the error concealment techniques are found to be extremely successful in removing degradations from the decoded image.
Enhancement of images corrupted with signal dependent noise: application to ultrasonic imaging
Author(s):
Mehmet Alper Kutay;
Mustafa Karaman;
Gozde Bozdagi
An adaptive filter for smoothing images corrupted by signal-dependent noise is presented. The filter is mainly developed for speckle suppression in medical B-scan ultrasonic imaging. The filter is based on mean filtering of the image using appropriately shaped and sized local kernels. Each filtering kernel, fitted to the local homogeneous region, is obtained through region growing based on local statistics. Performance of the proposed scheme has been tested on a B-scan image of a standard tissue-mimicking ultrasound resolution phantom. The results indicate that the filter effectively reduces the speckle while preserving the resolvable details. The performance figures obtained through computer simulations on the phantom image are compared with those of some existing speckle suppression schemes.
Improved interpolation, motion estimation, and compensation for interlaced pictures
Author(s):
Paul Delogne;
Laurent Cuvelier;
Benoit Maison;
Beatrice Van Caillie;
Luc Vandendorpe
This paper deals with the problem of motion estimation and compensation with sub-pixel accuracy when interlaced moving sequences are processed. Provided a translational motion exists between consecutive frames (or that the image can be partitioned into sets of points with such a motion) the exact formulas for the prediction of pixels are derived. It is shown that in order to obtain a correct prediction of any pixel in one frame, it is required to use the information of both fields of the previous frame. The ideal interpolation filters are of infinite length. Therefore, the question of designing finite length filters is addressed and alternative design methods are proposed. The efficiency of the method is then measured on typical sequences.
Hierarchical edge-based block motion estimation for video subband coding at low bit-rates
Author(s):
Joon-Hyeon Jeon;
Jae-Kyoon Kim
In this paper, a new hierarchical edge-based block motion estimation scheme is proposed for exact motion estimation in the passband structure. The basic idea is to estimate an initial motion at the top-most layer of the motion estimation pyramid in order to increase the accuracy of the motion displacement at the next lower layers. This estimate is based on a block-matching method using block-edge patterns. In addition, we present a technique for classifying a block-edge pattern using only two parameters, which are obtained by calculating two highpass subband signals at the top-most layer of the motion estimation pyramid. Using this scheme, the coding performance is improved by the exact motion estimation. We also present a method to encode the residual error frames in a passband structure. Simulation results show that the performance of the scheme with the edge-based initial motion estimation is better than without it, especially when a high degree of motion exists.
Fast computation of motion vectors for MPEG
Author(s):
Navid Haddadi;
C.-C. Jay Kuo
Motion compensated video coding in the MPEG standard relies on the knowledge of a single motion vector per 16 X 16 block of pixels called the macroblock. While a brute force approach known as the full search block matching algorithm (BMA) or its variations has been commonly adopted in computing the motion vector in most implementations of the MPEG standard, we study a gradient based method in this work. The proposed method is based on our previous results on multiresolution computation of a discontinuous optical flow field, and some modifications are introduced in this research for efficient computation. Classical motion compensated coding methods approximate the motion field with a piecewise constant function. In contrast, our algorithm approximates the motion field by a piecewise linear function over small triangular subregions. Hence, the resulting algorithm is not only attractive from a computational point of view, but also it provides a better model of the motion field which may result in better compression factors than BMA. Experimental results on some standard test images are reported.
Stereo-enhanced displacement estimation by genetic block matching
Author(s):
Ruggero E. H. Franich;
Reginald L. Lagendijk;
Jan Biemond
In this paper we introduce a block based vector field estimation technique which is based upon a genetic algorithm and which exploits the dual sensor nature of a stereoscopic signal in order to accelerate its convergence. This vector field estimation technique has been designed to produce smooth vector fields at a small block size without sacrificing accuracy. Conversely (false) accuracy does not impinge upon the smoothness of the vector field.
Hierarchical variable block size motion estimation technique for motion sequence coding
Author(s):
JongWon Kim;
Sang Uk Lee
Recently, a variable block size (VBS) motion estimation technique has been employed to improve the performance of motion compensated transform coding (MCTC). This technique allows larger blocks to be used where smaller blocks provide little gain, saving bit rate while reserving smaller blocks for areas containing more complex motion. However, there has been little effort in investigating an efficient VBS motion structure for further reducing the motion vector coding rate. Hence, in this paper, a new VBS motion estimation technique based on a hierarchical structure is proposed, which improves the motion vector encoding efficiency and also reduces the number of motion vectors to be transmitted. Intensive computer simulations on several moving image sequences show that MCTC employing the VBS motion estimation provides a performance improvement of 0.7 to approximately 1.0 dB, in terms of PSNR, compared to fixed block size motion estimation.
Recursive MAP displacement estimation and restoration of noisy-blurred image sequences
Author(s):
James C. Brailean;
Aggelos K. Katsaggelos
In this paper, we develop a recursive model-based maximum a posteriori (MAP) estimator that simultaneously estimates the displacement vector field (DVF) and intensity field from a noisy-blurred image sequence. Current motion-compensated spatio-temporal filters treat the estimation of the DVF as a preprocessing step. Thus, no attempt is made to verify the accuracy of these estimates prior to their use in the filter. By simultaneously estimating these two fields, information is made available to each filter regarding the reliability of the estimates provided by the other filter. Nonstationary models are used for both the DVF and the intensity field in the proposed estimator, thus avoiding the smoothing of boundaries present in both.
Maximum a posteriori displacement field estimation in quantum-limited image sequences
Author(s):
Cheuk L. Chan;
James C. Brailean;
Aggelos K. Katsaggelos;
Alan V. Sahakian
In this paper, we develop an algorithm for obtaining the maximum a posteriori (MAP) estimate of the displacement vector field (DVF) from two consecutive image frames of an image sequence acquired under quantum-limited conditions. The estimation of the DVF has applications in temporal filtering, object tracking, and frame registration in low-light level image sequences as well as low-dose clinical x-ray image sequences. The quantum-limited effect is modeled as an undesirable, Poisson-distributed, signal-dependent noise artifact. The specification of priors for the DVF allows a smoothness constraint for the vector field. In addition, discontinuities of the field are taken into account through the introduction of a line process for neighboring vectors. A Bayesian formulation is used in this paper to estimate the DVF and a block component algorithm is employed in obtaining a solution. Several experiments involving a phantom sequence show the effectiveness of this estimator in obtaining the DVF under severe quantum noise conditions.
Fractional pixel motion estimation
Author(s):
Smita Gupta;
Allen Gersho
In this paper, we present a novel method for sub-pixel motion estimation. Compared to traditional methods, our technique offers significantly reduced complexity by eliminating the explicit interpolation of pixel values at a finer sampling grid and avoiding the distortion evaluation for each candidate sub-pixel motion vector. Simulation results confirm that its performance is virtually indistinguishable from that of a traditional sub-pixel motion estimator. To illustrate applications of the new method, we have developed a variable resolution motion estimator, whose performance is evaluated by replacing the fixed resolution motion estimator of Simulation Model 3 of the MPEG-1 video compression standard with the variable resolution motion estimator.
Development of a VLSI chip set for H.261/MPEG-1 video codec
Author(s):
Eishi Morimatsu;
Osamu Kawai;
Kiyoshi Sakai;
Kiichi Matsuda;
Hideki Miyasaka;
Hirokazu Fukui;
Yasuhiro Sakawaki;
Kazuo Kaneko;
Katsuhiro Eguchi
A VLSI chip set fully compatible with both CCITT/H.261 and ISO/MPEG-1 has been developed. The chip set is composed of three chips, the MC-LSI, COD-LSI, and DEC-LSI, which realize real-time coding of moving pictures based on the international standard coding algorithms. A real-time decoder can also be realized with a single DEC-LSI chip. Each chip contains 140,000 to 160,000 gates in 0.8-micrometer CMOS technology and operates at a 27 MHz clock rate. The chip set performs full-frame coding/decoding of CIF and SIF and operates at transmission bit rates of up to 6.3 Mb/s. The chip set has been installed in a prototype video codec controlled by a PC (FM-TOWNS) and confirmed to work successfully.
Adaptive DPCM for broadcast-quality TV transmission of composite NTSC signal
Author(s):
Toru Shibuya;
Norio Suzuki;
Noboru Kawayachi;
Toshio Koga
This paper proposes a DPCM algorithm for broadcast-quality transmission of the composite NTSC color TV signal at the DS-3 rate (45 Mb/s). Recently, many algorithms have been proposed for TV coding, most of them based on orthogonal transforms such as the DCT. The DCT is appropriate for high compression; DPCM, on the other hand, is the best candidate for highest-quality transmission, though it might need a comparatively high transmission rate. DPCM can be applied directly to the composite NTSC signal without separation/synthesis of the chrominance and luminance components before/after compression. In the case of cascaded connections, coding noise does not accumulate, unlike with transform coding. It has been confirmed, using a complete codec running at the DS-3 rate, that the proposed adaptive DPCM is very effective even for composite NTSC signals including the CCIR standard sequences. This adaptive DPCM contributes to the rapid progress of a digital broadcast TV transmission network using a high-speed medium such as SONET as well as DS-3 service.
Fast approximate Karhunen-Loeve transform with applications to digital image coding
Author(s):
Leu-Shing Lan;
Irving S. Reed
The Karhunen-Loeve transform (KLT) is known to be the optimal transform for data compression. However, since it is signal dependent and lacks a fast algorithm, it is not used in practice. In this paper, a fast approximate Karhunen-Loeve transform (AKLT) is presented. This new transform is derived using the perturbation theory of linear operators. Both the forward and inverse AKLT are analytically derived in closed form. In addition, fast computational algorithms are developed for both the forward and inverse transforms. The computational complexity of the AKLT is of order N log2 N, the same as that of the DCT, the transform presently used in industrial practice. Performance comparisons for a first-order Markov sequence reveal that the AKLT performs better than the DCT in its energy compaction and signal decorrelation capabilities. Experiments on real images also demonstrate a definite superiority of the AKLT over the DCT when an adaptive scheme is used.
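A minimal numerical sketch (not the AKLT itself) comparing the exact KLT, obtained from the eigenvectors of a first-order Markov (AR(1)) covariance matrix, with the DCT-II in terms of energy compaction; the block size, correlation coefficient, and compaction measure are arbitrary choices for illustration.

```python
import numpy as np

N, rho = 8, 0.95
R = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))   # AR(1) covariance matrix

# exact KLT basis: eigenvectors of R, ordered by decreasing eigenvalue
eigval, eigvec = np.linalg.eigh(R)
klt = eigvec[:, ::-1].T

# orthonormal DCT-II basis
k, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
dct = np.sqrt(2.0 / N) * np.cos(np.pi * k * (2 * n + 1) / (2 * N))
dct[0] /= np.sqrt(2.0)

def compaction(T, R, m=2):
    """Fraction of total energy captured by the m largest coefficient variances."""
    var = np.sort(np.diag(T @ R @ T.T))[::-1]
    return var[:m].sum() / var.sum()

print("KLT:", compaction(klt, R), " DCT:", compaction(dct, R))
```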
Low-complexity field-based processing for high-quality video compression
Author(s):
Ashok K. Rao;
K. Yang;
Sanjai Bhargava;
Kou-Hu Tzou
A motion estimation/compensation scheme which is very effective for interlaced sequences with fast motion is presented. This field-based processing technique is MPEG-2 compliant and has lower memory and processing requirements than those of the standard bi-directional frame-based approach. For effective compression of different video sources, a novel temporal processing architecture is described which can switch in real-time between different MPEG-2 compliant field/frame processing modes.
Image sequence coding using the zero-tree method
Author(s):
Gourav Bhutani;
William A. Pearlman
A simple yet effective image sequence coding technique based on the zero-tree method is studied. The frames of the image sequence are decomposed hierarchically into 5 levels and motion compensation is performed on each subband separately. The displaced frame difference (DFD) is then encoded using the zero-tree method. Simulations done on the first 60 frames of the Miss America sequence using backward motion compensation with pel recursive motion estimation technique have yielded an average PSNR of 40.52 dB at a rate of 0.3 bpp and 43.68 dB at 1.0 bpp. This technique has the advantage that it works well with most images as no training set is required. There is precise control of the coding rate and the transmission is progressive.
Performance of MPEG codecs in the presence of errors
Author(s):
Ya-Qin Zhang;
Xiaobing Lee
MPEG is emerging as a major international standard for applications in compressed video storage, transmission, and interactive communications. MPEG will be used in applications which involve imperfect channel conditions or storage defects. This work studies the effects of different types of errors on the compressed MPEG bit streams and possible approaches to minimize such effects. The use of forward error correction, block interleaving, error concealment, and their inter-relationship and applicability to MPEG bit streams, were investigated.
Distortion measure for blocking artifacts in images based on human visual sensitivity
Author(s):
Shanika A. Karunasekera;
Nick G. Kingsbury
A visual model which gives a distortion measure for blocking artifacts in images is presented. Given the original and the reproduced image as inputs, the model output is a single numerical value which quantifies the visibility of blocking error in the reproduced image. The model is derived from the human visual sensitivity to the horizontal and vertical edge artifacts which result from blocking. Psychovisual experiments have been carried out, using a novel experimental technique, to measure the sensitivity to edge artifacts as a function of edge length, edge amplitude, background luminance, and background activity. The model parameters are estimated from these sensitivity measures. The final model has been tested on real images, and the results show that the error predicted by the model correlates well with the subjective ranking.
Recursive multiscale error-diffusion technique for digital halftoning
Author(s):
Ioannis Katsavounidis;
C.-C. Jay Kuo
The technique of mapping a given gray level to an arrangement of dots that renders the desired gray level is called halftoning. In this research, we propose a new digital halftoning algorithm based on an approach called recursive multiscale error diffusion. Our main assumption is that the intensity rendered by a raster of dots is proportional to the number of dots on that raster. Analogously, the intensity of the corresponding region of the input image is simply the integral of the (normalized) gray level over the region. The two intensities should be matched as closely as possible. It is shown that the area of integration plays an important role in how well the two intensities can be matched, and since the area of integration corresponds to different resolutions (and therefore to different viewing distances), we address the problem of matching the intensities as closely as possible at every resolution. Advantages of our method include very good performance, versatility, and ease of hardware implementation.
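A minimal sketch of the matching criterion described above (not the recursive diffusion algorithm itself): at several resolutions it compares the integrated grey level of the input with the dot count of a halftone over the same blocks; the dyadic block sizes are an assumption.

```python
import numpy as np

def multiscale_intensity_error(gray, halftone, levels=4):
    """For dyadic block sizes, compare the integral of the normalised grey level
    over each block with the number of 'on' dots in the same block, and return
    the mean absolute mismatch per level."""
    g = np.asarray(gray, dtype=float) / 255.0
    h = np.asarray(halftone, dtype=float)          # dots in {0, 1}
    errors = []
    for level in range(levels):
        b = 2 ** level                             # block side (1, 2, 4, ...)
        H, W = (g.shape[0] // b) * b, (g.shape[1] // b) * b
        gi = g[:H, :W].reshape(H // b, b, W // b, b).sum(axis=(1, 3))
        hi = h[:H, :W].reshape(H // b, b, W // b, b).sum(axis=(1, 3))
        errors.append(float(np.abs(gi - hi).mean()))
    return errors
```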
Scene-adaptive motion interpolation structures based on temporal masking in human visual perception
Author(s):
Jungwoo Lee;
Bradley W. Dickinson
In this paper we present a novel technique to dynamically adapt motion interpolation structures by temporal segmentation. The interval between two reference frames is adjusted according to the temporal variation of the input video. The difficulty of bit rate control for this dynamic group of pictures (GOP) structure is resolved by taking advantage of temporal masking in human vision. Six different frame types are used for efficient bit rate control, and telescopic search is used for fast motion estimation because frame distances between reference frames are dynamically varying. Constant picture quality can be obtained by variable bit rate coding using this approach and the statistical bit rate behavior of the coder is discussed. Advantages for low bit rate coding and storage media applications and implications for HDTV coding are discussed. Simulations on test video including HDTV sequences are presented for various GOP structures and different bit rates, and the results compare favorably with those for conventional fixed GOP structures.
Regularized reconstruction to remove blocking artifacts from block discrete cosine transform compressed images
Author(s):
Yongyi Yang;
Nikolas P. Galatsanos;
Aggelos K. Katsaggelos
In most block-transform based codecs (coder-decoder) the compressed image is reconstructed using only the transmitted data. In this paper, the reconstruction is formulated as a regularized image recovery problem where both the transmitted data and prior knowledge about the properties of the original image are used. This is accomplished by minimizing an objective function, using iterative algorithms, which captures the smoothness properties of the original image. Experimental results are presented which demonstrate that the proposed regularized algorithms yield reconstructed images with superior quality, both visually and using objective distance metrics, to that of traditional decoders that use only the transmitted transform coefficients.
Technique for image interpolation using polynomial transforms
Author(s):
Boris Escalante-Ramirez;
Jean-Bernard Martens
We present a new technique for image interpolation based on polynomial transforms. This is an image representation model that analyzes an image by locally expanding it into a weighted sum of orthogonal polynomials. In the discrete case, the image segment within every window of analysis is approximated by a finite set of polynomial coefficients. We show how the problem of interpolating an image can be approached by interpolating the smooth window function used to locally analyze the image. Comparison with the existing interpolation techniques illustrates the good performance of the method presented in this paper.
Information loss recovery for block-based image coding techniques: a fuzzy logic approach
Author(s):
Xiaobing Lee;
Ya-Qin Zhang;
Alberto Leon-Garcia
A new technique to recover information loss in a block-based image coding system is developed in this paper. The proposed scheme is based on fuzzy logic reasoning and can be divided into three main steps: (1) hierarchical compass interpolation/extrapolation in the spatial domain for initial recovery of lost blocks that mainly contain low-frequency information such as smooth background; (2) coarse spectra interpretation by fuzzy logic reasoning for recovery of lost blocks that contain high-frequency information such as complex textures and fine features; (3) sliding window iteration in both the spatial and spectral domains to efficiently integrate the results obtained in steps (1) and (2) such that optimal results are achieved in terms of surface continuity at block boundaries and the established inference rules. The proposed method, suitable for recovering both isolated and contiguous block losses, provides a new approach to error concealment for block-based image coding systems such as the JPEG coding standard and vector quantization based coding algorithms. The principle of the proposed scheme can also be applied to block-based video compression schemes such as the H.261, MPEG, and HDTV standards. Simulation results are presented to illustrate the effectiveness of the proposed method.
Image sequence coding by multigrid motion estimation and segmentation-based coding of prediction errors
Author(s):
Wei Li;
Frederic Dufaux
This paper presents an innovative video coding system which appropriately processes motion information through an improved motion estimation algorithm and a segmentation-based coding of displaced frame differences (DFD). The proposed multigrid motion estimation algorithm with an adaptive mesh leads to more uniform and accurate motion vectors and to lower overhead information. A morphological segmentation algorithm is proposed to code the DFD by sending contours and quantized high-energy regions. The method is coding oriented and has the potential for graceful degradation. Simulation results show the outstanding performance of the proposed codec for image sequences in CCIR 601 format.
Integral projection methods for block motion estimation
Author(s):
Brian Schwartz;
Ken D. Sauer
Block-based motion estimation is included in most video coding systems, despite the high computational cost of direct block matching techniques. Several schemes have been advanced in recent years to simplify motion estimation while maintaining minimal error in the motion-compensated predicted image. We present in this paper a block motion estimation approach which is based on exhaustive search with integral projections. The projection method is much less computationally costly than block matching, and has a prediction accuracy of competitive quality with both full block matching and other efficient techniques. Our algorithm also takes advantage of the similarity of motion vectors in adjacent blocks in typical imagery, by subsampling the motion vector field.
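A minimal sketch of block matching with integral projections, assuming the match criterion is the sum of absolute differences between the row and column projections of the current block and of each candidate block; the search range and block size are parameters, and the motion-vector-field subsampling of the paper is omitted.

```python
import numpy as np

def projections(block):
    """Row and column integral projections of a block."""
    return block.sum(axis=1), block.sum(axis=0)

def projection_block_match(cur, ref, bx, by, bsize=16, search=7):
    """Exhaustive-search motion vector for the block at (by, bx) of `cur`,
    matching integral projections instead of full pixel blocks."""
    cur_r, cur_c = projections(cur[by:by + bsize, bx:bx + bsize].astype(float))
    best, best_mv = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + bsize > ref.shape[0] or x + bsize > ref.shape[1]:
                continue                        # candidate falls outside the frame
            ref_r, ref_c = projections(ref[y:y + bsize, x:x + bsize].astype(float))
            cost = np.abs(cur_r - ref_r).sum() + np.abs(cur_c - ref_c).sum()
            if cost < best:
                best, best_mv = cost, (dy, dx)
    return best_mv
```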
Efficient motion-estimated vector quantizer
Author(s):
Inseop Lee;
Jong Gyu Kim;
Souguil J.M. Ann
In this paper, we propose an efficient motion-estimated vector quantizer (MEVQ) which reduces the full-search time of a VQ encoder by means of motion estimation. In an image sequence, successive frames are usually highly correlated. Using the frame difference, each block of a frame can be classified as belonging to a moving area or a static area, and motion estimation yields a motion vector for each moving block. MEVQ uses two codebooks: a main codebook containing all codewords and a sub-codebook that is a subset of the main codebook. For static areas, MEVQ searches the sub-codebook instead of the main codebook; for moving blocks that have motion vectors, it searches the sub-codebook with the help of the motion vectors; for the remaining moving blocks, for which no motion vector is found, it searches the main codebook. According to our computer simulations, MEVQ greatly reduces the search time compared with normal full-search VQ without any significant degradation.
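A minimal sketch of the codebook-selection logic described above, assuming per-block flags for the static/moving classification and for motion-vector availability; the construction of the sub-codebook and the nearest-neighbour search are simplified placeholders.

```python
import numpy as np

def nearest_codeword(block, codebook):
    """Full search over a codebook (rows are codewords)."""
    d = ((codebook - block.ravel()) ** 2).sum(axis=1)
    return int(np.argmin(d))

def mevq_encode(block, is_static, has_motion_vector, main_cb, sub_cb):
    """Search only the smaller sub-codebook for static blocks and for moving
    blocks with a usable motion vector; fall back to the main codebook otherwise."""
    if is_static or has_motion_vector:
        return ("sub", nearest_codeword(block, sub_cb))
    return ("main", nearest_codeword(block, main_cb))
```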
Motion analysis in 3D subband coder
Author(s):
King N. Ngan;
Weng Leong Chooi;
Khok Khee Pang
A block-based motion model capable of determining velocity and orientation of motion of a sequence is developed in this paper. Based on the model, three classes of motion orientation, namely horizontal, vertical and diagonal motion in the velocity range of 1 - 8 pixels/frame, are identified. The classification is valuable in coding whereby a reduced motion search is adequate to obtain the best prediction. A low-complexity motion detection scheme is also considered here which takes advantage of the 3D subband decomposition architecture employed.
Improving block-based motion estimation by the use of global motion
Author(s):
Caspar Horne
The current video coding standards, such as MPEG-1 and MPEG-2, all rely on block-based motion compensation to reduce temporal redundancy. Motion vectors used for the compensation are computed by block matching. The computational load of computing these vectors is very high, and as a result, the range of the motion estimation is limited by hardware constraints. In image sequences where the actual motion exceeds this range, the video quality of the decoded sequences will suffer. By allowing the motion estimation search window to be displaced by a certain offset with respect to the position of the block to be coded, a much larger range of motion can be compensated effectively. We describe a method for computing this offset, the global motion present in a given frame, and show that this measure can be used as an estimate of the displacement to apply to the position of the search window in the next frame. Experimental data are included, and simulation results show that the use of global motion can give significant improvements in the average number of bits produced per frame at constant quality.
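A minimal sketch of the window-offset idea described above, assuming the global motion of a frame is summarised by the component-wise median of its block motion vectors and then used to displace the search window in the next frame; the paper's actual global-motion computation may differ.

```python
import numpy as np

def global_motion_offset(motion_vectors):
    """Summarise a frame's block motion vectors (N x 2 array of (dy, dx))
    into a single global offset for the next frame's search windows."""
    mv = np.asarray(motion_vectors, dtype=float)
    return np.round(np.median(mv, axis=0)).astype(int)

def search_window(block_y, block_x, offset, search=7):
    """Top-left and bottom-right corners of the displaced search window."""
    oy, ox = offset
    return (block_y + oy - search, block_x + ox - search,
            block_y + oy + search, block_x + ox + search)

# usage: vectors from the previous frame displace this frame's windows
offset = global_motion_offset([(6, -10), (7, -12), (5, -11), (0, 0)])
print(offset, search_window(64, 128, offset))
```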
New hardware implementation of fast vector median filters
Author(s):
Long-Wen Chang;
J. S. Lee
Median filtering is a nonlinear filtering technique known for preserving sharp changes in signals and being particularly effective in removing impulse noise. It has been widely used in speech and image processing since it was first described by Tukey. Recently, Neuvo introduced the concept of the vector median. In this approach, the samples of vector-valued input signals are processed as vectors instead of componentwise as scalars. The vector median exploits the correlation between different components, which makes it well suited to HDTV color image processing. In this paper, we first discuss the difference between the componentwise median filter and the vector median filter. Then, we propose several new vector median filters to increase the signal-to-noise ratio. A fast computational architecture for these algorithms is also addressed.
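A minimal sketch of the basic vector median operation (the sample in the window minimising the sum of distances to all other samples), applied here with the L2 norm over a square window of a colour image; the new filter variants and the fast hardware architecture of the paper are not reproduced.

```python
import numpy as np

def vector_median(window_vectors):
    """Return the vector minimising the sum of L2 distances to all others."""
    v = np.asarray(window_vectors, dtype=float)                  # (n, channels)
    dists = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=2).sum(axis=1)
    return v[np.argmin(dists)]

def vector_median_filter(image, size=3):
    """Apply the vector median over a size x size window of a colour image."""
    img = np.asarray(image, dtype=float)
    pad = size // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.empty_like(img)
    H, W, _ = img.shape
    for y in range(H):
        for x in range(W):
            win = padded[y:y + size, x:x + size].reshape(-1, img.shape[2])
            out[y, x] = vector_median(win)
    return out
```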
Adaptive interpolation of images with application to interlaced-to-progressive conversion
Author(s):
Stefan Thurnhofer;
Michael L. Lightstone;
Sanjit K. Mitra
We propose a novel method for image interpolation which adapts to the local characteristics of the image in order to facilitate perfectly smooth edges. Features are classified into three categories (constant, oriented, and irregular). For each class we use a different zooming method that interpolates this feature in a visually optimized manner. Furthermore, we employ a nonlinear image enhancement which extracts perceptually important details from the original image and uses these in order to improve the visual impression of the zoomed images. Our results compare favorably to standard lowpass interpolation algorithms like bilinear, diamond-filter, or B-spline interpolation. Edges and details are much sharper and aliasing effects are eliminated. In the frequency domain we can clearly see that our adaptive algorithm not only suppresses the undesired spectral components that are folded down in the upsampling process. It is also capable of replacing them with new estimates, which accounts for the increased image sharpness. One application of this interpolation method is spatial interlaced-to-progressive conversion. Here, it again yields more pleasing images than comparable algorithms.
Novel approaches to multichannel filtering for image texture segmentation
Author(s):
Trygve Randen;
John Hakon Husoy
Show Abstract
Several approaches to multichannel filtering for texture classification and segmentation with Gabor filters have been proposed. The rationale presented for the use of the Gabor filters is their relation to models for the early vision of mammals as well as their jointly optimal resolution in time and frequency. In this work we present a critical evaluation of the Gabor filters as opposed to filter banks used in image coding -- in both full-rate and critically sampled realizations. In the critically sampled case, tremendous computational savings can be realized. We further evaluate the commonly used octave-band decomposition versus alternative decompositions. We conclude that for a texture segmentation task it is possible to use a wider range of filters than just the Gabor class, that alternative decompositions can be used, and, most importantly, that subsampled filters can be used.
Optimized interpolation filters for compatible pyramidal coding of TV and HDTV
Author(s):
Laurent Cuvelier;
Benoit M. M. Macq;
Benoit Maison;
Luc Vandendorpe
Show Abstract
This paper deals with the question of optimizing the filters in the upsampling stage of a TV/HDTV compatible pyramidal coder. From a coding-gain point of view, both the decimation and upsampling filters should be optimized. In the framework of compatible coding, the choice of the decimation filter is influenced not only by the coding efficiency but also by the compatible image quality. Therefore, assuming this filter has been fixed, we analyze the question of optimizing the upsampling filter in order to obtain the highest coding gain. This question is addressed for a mean squared error (MSE) criterion. In addition, assuming the base-layer (TV) signal is quantized, the influence of the quantization noise on the optimal interpolation filter is investigated, and the problem is again handled for the MSE criterion. As the statistical properties of pictures are required in the optimization, a model is then developed to compute these properties when there is motion. The model takes into account the processing of progressive sources and, for interlaced sequences, the independent processing of fields or the processing of merged fields. Results are then derived for the three types of processing.
Modeling and analysis of floating point quantization errors in subband filter structures
Author(s):
Necdet Uzun;
Richard A. Haddad
Show Abstract
This paper is concerned with the analysis and modeling of the effects of floating-point quantization of subband signals in a two-channel filter bank. It represents an extension of previous work on optimal fixed-point quantizers. In the present case, the quantization noise is modeled as multiplicative noise, as compared with the additive noise representation for the fixed-point case. We derive equations for the autocorrelation and power spectral density (PSD) of the reconstructed signal y(n) in terms of the analysis/synthesis filters, the PSD of the input, and the quantizer model. Formulas for the mean-square error and for compaction gain are obtained in terms of these parameters. We assume the filter bank is perfect reconstruction (PR) (but not necessarily paraunitary) in the absence of quantization and transmission errors. The autocorrelation function of the output y(n) is generally non-stationary. However, it is cyclostationary: it is stationary over the even-indexed samples and over the odd-indexed samples separately, but not over both. By averaging the autocorrelation for n even and for n odd, we obtain a stationary autocorrelation and its associated PSD. This cyclostationary analysis is used to compute the quantization noise component in the output for any PR subband structure.
Optimal subband filters to maximize coding gain
Author(s):
Masayuki Tanimoto;
Akio Yamada;
Norio Wakatsuki
Show Abstract
The optimal analysis/synthesis filters giving the maximum coding gain in subband schemes are derived. The optimal analysis filters consist of an emphasis of the picture signal followed by ideal band-splitting. The characteristic of the emphasis is determined by the spectrum of the picture signal. A large improvement in coding gain is achieved by the subband scheme with the optimal subband filters obtained here. An approximate emphasis characteristic determined from a spectrum model of picture signals can be used, and the ideal band-splitting filters can be replaced by conventional subband filters, since the degradation of coding gain due to these approximations is small. Computer simulation of super HD image coding by the proposed scheme is performed. The SN ratio of the reconstructed image is increased and edges are reconstructed very well compared to the conventional subband scheme. The proposed scheme is well suited to super HD image coding since the improvement in SN ratio is large for images with high correlation between neighboring pixels.
Reversible image compression via multiresolution representation and predictive coding
Author(s):
Amir Said;
William A. Pearlman
Show Abstract
In this paper a new image transformation suited for reversible (lossless) image compression is presented. It uses a simple pyramid multiresolution scheme which is enhanced via predictive coding. The new transformation is similar to the subband decomposition, but it uses only integer operations. The number of bits required to represent the transformed image is kept small through careful scaling and truncations. The lossless coding compression rates are smaller than those obtained with predictive coding of equivalent complexity. It is also shown that the new transform can be effectively used, with the same coding algorithm, for both lossless and lossy compression. When used for lossy compression, its rate-distortion function is comparable to other efficient lossy compression methods.
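To illustrate the kind of integer-only, perfectly reversible decomposition the abstract refers to, here is a minimal S-transform-style averaging/differencing step in Python. It is not the authors' transform (their scheme adds careful scaling, truncation, and a predictive-coding stage); it only demonstrates that truncated integer operations can still be inverted exactly.

```python
import numpy as np

def s_forward(x):
    """One reversible, integer-only decomposition step on a 1-D signal of
    even length: truncated pairwise averages (low band) and pairwise
    differences (high band)."""
    a = x[0::2].astype(np.int64)
    b = x[1::2].astype(np.int64)
    low = (a + b) >> 1           # floor of the average (integer)
    high = a - b                 # difference
    return low, high

def s_inverse(low, high):
    """Exact inverse of s_forward: the truncation is recovered from the
    parity information carried by the difference."""
    b = low - (high >> 1)
    a = b + high
    x = np.empty(2 * low.size, dtype=np.int64)
    x[0::2], x[1::2] = a, b
    return x

# Round-trip check on a random 8-bit signal
x = np.random.randint(0, 256, size=512)
low, high = s_forward(x)
assert np.array_equal(s_inverse(low, high), x)
```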
Multiresolution coder with edge sensitivity
Author(s):
Nikhil D. Sinha;
Richard A. Haddad
Show Abstract
In this paper we propose a new method to code the higher bands in a multiresolution decomposition. These coefficients contain the perceptually meaningful edge information in an image, which is usually hard to code due to the inherent non-stationary nature of the signal. The multiresolution decomposition itself has the advantage that it is maximally decimated and the edge extraction operator is the same as the higher-band filters. Since the bands are directional, the edge extraction process shows a preference for edges that have specific orientations; to take advantage of this directional preference we scan the bands in these directions only and code the data as one-dimensional streams. The performance of the coder is promising, though several improvements are being made -- specifically, to code edge data obtained from scanning in the direction tangent to the edge profile rather than in the directions dictated by the filtering process.
Systolic implementation of a bidimensional lattice filter bank for multiresolution image coding
Author(s):
P. Desneux;
Jean-Didier Legat;
Benoit M. M. Macq;
J. Y. Mertes
Show Abstract
In this paper, we present a systolic architecture based on the lattice structure of filters. The main characteristic of this architecture is its systolism: computations are pipelined in many identical, locally interconnected processing elements (PEs). These PEs are simple and can run at a high clock frequency while remaining active throughout the process, so the speed of the circuit can be increased. The implementation of the filters through VLSI techniques is facilitated by the repetitive nature of the elements. In section 2, we describe the multiresolution scheme and the lattice structures. Although the lattice structure is an efficient remedy for the finite length of the multipliers, special attention has to be paid to the computation noise that appears when the datapath is limited to a finite width. The goal of the related study is to keep this computation noise below the quantization noise (coming from the quantizers) at a reasonable cost. In section 3, we present the basic processing element and its use among the different stages of the filter. Section 4 deals with the finite representation of the data throughout the datapath.
Efficient coding of residual images
Author(s):
Josep R. Casas;
Luis Torres;
Matilde Jareno
Show Abstract
In progressive image coding, the cost in bits of the successive residual images being encoded does not always correspond to the subjective importance of those components. The original idea of this paper arose from the need to increase the efficiency of coding the last Laplacian levels in linear decompositions for pyramidal coding. The same principle has been applied to a non-linear image decomposition in which a segmentation-based progressive scheme is used for coding purposes. The `post-it' method for extracting details based on mathematical morphology proposed by Meyer has been modified in order to improve the performance of the subsequent coding, and a suitable technique for coding the extracted details, derived from an extension of run-length coding, is then applied. In both the linear and non-linear cases, the results of this `detail-coding' method are compared against the conventional progressive coding technique, i.e., pyramidal coding or pure segmentation-based contour/texture coding.
MPEG-like pyramidal video coder
Author(s):
Rakeshkumar Gandhi;
Limin Wang;
Sethuraman Panchanathan;
Morris Goldberg
Show Abstract
In this paper, we propose an MPEG-like video coder using a 3-dimensional (3D) adaptive pyramid. In the 3D adaptive pyramid, temporal or spatial contraction is selected using the corresponding prediction differences. A video codec is designed using the 3D adaptive pyramid and intra-frame vector quantization. Encoding errors introduced at the upper levels are fed forward to the lower levels of the pyramid, which makes lossless coding possible. The proposed pyramidal video coder has a number of similarities with the MPEG-I video compression standard. Simulation results for the CCITT standard test sequences indicate that the 3D adaptive pyramid reduces the lossless bit rate by a factor of two. Excellent subjective quality as well as objective quality (a PSNR of 36.6 dB) is obtained at a bit rate less than the T1 rate (i.e., 1.544 Mb/s). Furthermore, smooth transitions are achieved at scene changes without sacrificing picture quality.
Spatial-domain resolution-scalable video coding
Author(s):
Atul Puri;
Andria H. Wong
Show Abstract
Scalable video coding is important in a number of applications where video needs to be decoded and displayed at a variety of resolution scales. In this paper, we investigate the spatial-domain approach for spatial resolution scalability. We employ the framework proposed for the ongoing second phase of Moving Picture Experts Group (MPEG-2) standard to introduce a general coding structure and discuss key issues in progressive-to-interlace and interlace-to-interlace scalable coding. We discuss the technique of spatio-temporal weighted prediction and show detailed encoder and decoder structures employing this technique. In our simulations we focus on 2-layer progressive-to-interlace spatially scalable coding and compare its performance with the simulcast technique at different bit-rates. Next, we evaluate the performance of spatially scalable coding for different picture types. Finally, we examine the efficiency of a chosen set of weight codes in spatio-temporal weighted prediction and propose improvements.
Multiresolution tree architecture with its application in video sequence coding: a new result
Author(s):
Jin Li;
Xinggang Lin;
Youshou Wu
Show Abstract
Motion estimation and compensation are important tasks in image sequence coding. In this paper, we present a motion estimation scheme with a multiresolution tree structure and hierarchical motion vector search. Experiments and analysis show that this scheme is not only computationally efficient but also robust. The multiresolution tree structure is further utilized in a variable-block-size image sequence coding scheme that incorporates visual spatio-temporal characteristics. Both DCT and quadtree approaches are used to encode the motion-compensated prediction error. Although the signal-to-noise ratio of the quadtree-coded image is a little lower, the subjective quality around sharp edges is much better. Extensive simulations comparing the scheme with the MPEG-1 standard give quite promising results.
Parallel image segmentation using a Hopfield neural network with annealing schedule for neural gains
Author(s):
Yungsik Kim;
Sarah A. Rajala
Show Abstract
Neural network architectures have been proposed as new computer architectures, and a Hopfield neural network has been shown to find good solutions very quickly for complex optimization problems. It should be noted, however, that a Hopfield neural network with fixed neural gains is only guaranteed to find locally optimal solutions, not the global optimum. Image segmentation, like other engineering problems, can be formalized as an optimization problem and implemented using neural network architectures if an appropriate optimization function is defined. To achieve a good image segmentation, the global or nearly global optimum of the appropriate optimization function needs to be found. In this paper, we propose a new neural network architecture for image segmentation, `an annealed Hopfield neural network,' which incorporates an annealing schedule for the neural gains. We implemented image segmentation using this annealed Hopfield neural network with an optimization function proposed by Blake and Zisserman and achieved good image segmentation in detecting horizontal and vertical boundaries. We then propose an extended optimization function to achieve better performance in detecting sharp corners and diagonally oriented boundaries. Finally, simulation results on synthetic and real images are shown and compared with the general-purpose mean field annealing technique.
Novel cluster-based probability model for texture synthesis, classification, and compression
Author(s):
Kris Popat;
Rosalind W. Picard
Show Abstract
We present a new probabilistic modeling technique for high-dimensional vector sources, and consider its application to the problems of texture synthesis, classification, and compression. Our model combines kernel estimation with clustering, to obtain a semiparametric probability mass function estimate which summarizes -- rather than contains -- the training data. Because the model is cluster based, it is inferable from a limited set of training data, despite the model's high dimensionality. Moreover, its functional form allows recursive implementation that avoids exponential growth in required memory as the number of dimensions increases. Experimental results are presented for each of the three applications considered.
Logic neural network-based segmentation system with variable-sensitivity characteristics
Author(s):
Devesh Patel;
G. Tambouratzis;
T. John Stonham
Show Abstract
A logic artificial neural network paradigm is used to cluster texture spectra in feature space to achieve image segmentation. The features are grouped such that they represent regions of textural homogeneity in the image. They are extracted from small local areas of the image. The strategy results in a feature-spectrum transformation of the image. The logic neural network is characterized by short training and operating times compared to analogue neural networks. The network runs in an unsupervised mode, which removes the need for external supervision during operation. The variable-sensitivity characteristics of the network are illustrated with the aid of natural texture image composites.
Nonlinear feature extraction for perceptually-based low-bit-rate image coding
Author(s):
Michael L. Lightstone;
Stefan Thurnhofer;
Sanjit K. Mitra
Show Abstract
A novel feature extraction technique based on a new class of nonlinear filters is presented for perceptually based, low bit rate image compression. In addition to having remarkable computational efficiency, the filters adaptively extract perceptually relevant edge and texture information with less amplification of noise than conventional linear filters. A two-band nonlinear decomposition based on the filters is constructed so that this feature information can be separated from the image. Separate coding strategies are examined that effectively quantize these features and the more stationary, low frequency residual information. The resulting compression scheme is compared with traditional methods of image compression and is shown to exhibit clear advantages in perceptual reconstruction quality.
Noncausal predictive image coding
Author(s):
Peifang Zhou;
Masoud R. K. Khansari;
Alberto Leon-Garcia
Show Abstract
This paper presents an application of Markov random field theory to image coding. First, we use Markov random fields to model the correlation in the image intensity field. We then propose a noncausal predictive image coding scheme in which the estimate of the present pixel is based on both past and future neighboring pixels. A sequential iterative decoding algorithm is extended from 1D to 2D to perfectly reconstruct the image from the estimation residuals at the decoder. We also develop a fast whirlpool algorithm to speed up the decoding. Open-loop and closed-loop quantizer structures are implemented for noncausal prediction, and their performance is compared with conventional DPCM predictive coding.
Subjective mode regions of picture and painting of 3D pictures
Author(s):
Teruki Yamamoto;
Takaaki Fukuyama;
Yo Murao;
Hajime Enomoto
Show Abstract
Usually, pictures are represented as the set of pairs composed of the location (x, y) of a point and its intensity level f(x, y). For the analysis of the scalar function f(x, y), the theory of structure lines is effective, because these lines have features that remain invariant under variations of the environment in which a real object gives rise to the corresponding picture. The division line is especially important among the structure lines because it separates convex and concave domains. We therefore propose the concept of the subjective mode region in order to analyze the structure of a picture. A subjective mode region is defined as a region enclosed by some division lines. Subjective mode regions are thus specified by neighborhood mode relations of luminance value or chrominance separation. The concept of the subjective mode region is useful for generating a stereoscopic picture of a complicated scene such as a human face. A stereoscopic picture is generated by applying simple geometrical models to the subjective mode regions.
Adaptive Bayesian approach for color image segmentation
Author(s):
Michael M. Chang;
Andrew J. Patti;
M. Ibrahim Sezan;
A. Murat Tekalp
Show Abstract
A Bayesian segmentation algorithm to separate color images into regions of distinct colors is presented. The algorithm takes into account the local color variations in the image in an adaptive manner. A Gibbs random field (GRF) is used as the a priori probability model for the segmentation process to impose a spatial connectivity constraint. We study the performance of the proposed algorithm in different color spaces and its application in reduced data rendering of color images. Experimental results and discussion are included.
Perceptual grouping by local group-tree search
Author(s):
Jun Zhang;
Yong Yan
Show Abstract
Perceptual grouping, or perceptual organization, is the process of grouping local image features, such as line segments and regions, into groups (`perceptual chunks') that are likely to have come from the same object. Such an operation is essential to reliable and robust object recognition, since the local features are often fragmented and cannot be matched directly to object models. In this paper, a novel approach to the problem of perceptual grouping is described. In this approach, perceptual grouping is accomplished in two steps. First, connections between local features are established or rejected according to whether such connections would lead to good global groupings. This is done by performing a tree search of all the possible global groups associated with each potential connection. Second, perceptual groups are generated by propagating local connections and by local competition. The efficacy of this approach is demonstrated on the grouping of line segments in synthetic and real-world images.
Segmentation and image enhancement using coupled anisotropic diffusion equations
Author(s):
Eric J. Pauwels;
Marc Proesmans;
Luc J. Van Gool;
Theo Moons;
Andre J. Oosterlinck
Show Abstract
This paper introduces a number of systems of coupled, non-linear diffusion equations and investigates their role in edge-preserving smoothing and noise suppression. The basic idea is that several maps describing the image undergo coupled evolution towards an equilibrium state representing the enhanced image. These maps could contain, e.g., intensity, local edge strength, range, or another quantity. All these maps, including the edge map, contain continuous rather than all-or-nothing information, following a strategy of least commitment. Each of the approaches has been developed and tested on a parallel transputer network.
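A toy sketch of one such pair of coupled evolution equations (the systems in the paper differ in detail): an intensity map u diffuses with a conductance controlled by a continuous edge-strength map v, while v relaxes towards a soft function of the local gradient magnitude of u. The time step, coupling constants, and conductance form below are assumptions chosen only for illustration.

```python
import numpy as np

def coupled_diffusion(img, n_iter=100, dt=0.1, k=10.0, mu=0.5):
    """Evolve an intensity map u and a continuous edge map v jointly:
    u undergoes diffusion damped where v is large (near edges), and v
    relaxes towards a soft, not all-or-nothing, edge indicator."""
    u = img.astype(float).copy()
    v = np.zeros_like(u)                         # edge strength in [0, 1)
    for _ in range(n_iter):
        gy, gx = np.gradient(u)
        mag2 = gx ** 2 + gy ** 2
        c = (1.0 - v) ** 2                       # conductance: edges block diffusion
        div = np.gradient(c * gx, axis=1) + np.gradient(c * gy, axis=0)
        u += dt * div                            # edge-preserving smoothing of u
        v += mu * (mag2 / (mag2 + k ** 2) - v)   # v tracks a soft edge measure
    return u, v
```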
Integration of segmentation and stereo matching
Author(s):
Yaonan Zhang
Show Abstract
Segmentation and stereo matching are difficult problems in computer vision. One possible solution is to solve them in an integrated manner, as described in this paper. After region-based segmentation, a candidate stereo matching is carried out, which assigns corresponding regions from one image to the other by shape-based matching. During the next segmentation, stereo information is included: when considering the merging of a region with its neighboring regions, the corresponding regions in the candidate matching pools are extracted and a new measurement is calculated, based on intensity and shape information from both images. The global matching finally combines other constraints, such as uniqueness, ordering, and topological relations, to obtain a unique matching. The developed algorithm has successfully reconstructed disparity maps for test images. It is concluded that the method is well suited to solving segmentation and stereo matching together.
Knowledge-based videotelephone sequence segmentation
Author(s):
Manuel Menezes de Sequeira;
Fernando Manuel Ber Pereira
Show Abstract
This paper presents a robust knowledge-based segmentation algorithm for videotelephony sequences ranging from studio-based to mobile. It is able to divide each image in a sequence into non-overlapping head, body, and background areas. Its robustness stems from its ability to cope with the peculiarities of mobile sequences, which have very detailed, moving backgrounds as well as strong camera movements (originating from vibration in car videotelephones or from small hand movements in hand-held videotelephones). The proposed algorithm uses edge detection, detection of changed areas (due to the speaker's motion), and the redundancy associated with the speaker's position as the basis for the segmentation. Geometrical knowledge-based techniques are then used to define the complete regions. The algorithm includes a quality estimation and control procedure, which enables it to decide whether to accept or reject the current segmentation and whose output can be fed to the videotelephone coder.
Hierarchical multiresolution texture image segmentation
Author(s):
Kidiyo Kpalma;
Veronique Haese-Coat;
Joseph Ronsin
Show Abstract
In this work we present a method for multiresolution texture image segmentation via supervised Bayesian classification. This method makes use of a function called the classification index CI (Indice de Classification, IC). This function measures the confidence in a pixel's classification after it has been classified. The CI measure is based on the location of the pixel's attribute vector in the observation space relative to the classes to which it may be assigned. To reduce the time used in the classification process, we propose a new method (ASH: Algorithme de Segmentation Hierarchique) that segments the multiresolution images hierarchically.
Finite-state residual vector quantizer for image coding
Author(s):
Steve Shih-Yu Huang;
Jia-Shung Wang
Show Abstract
Finite-state vector quantization (FSVQ) has been shown in recent years to be a high-quality, low-bit-rate coding scheme. An FSVQ achieves the efficiency of a small-codebook (state codebook) VQ while maintaining the quality of a large-codebook (master codebook) VQ. However, the large master codebook becomes a primary limitation of FSVQ when implementation is taken into account: a large amount of memory is required to store the master codebook, and much effort is spent maintaining the state codebooks if the master codebook becomes too large. This problem can be partially solved by the mean/residual technique (MRVQ), in which the block means and the residual vectors are coded separately. A new hybrid coding scheme, called finite-state residual vector quantization (FSRVQ), is proposed in this paper to exploit the advantages of both FSVQ and MRVQ. The codewords in FSRVQ are designed after removing the block means so as to reduce the codebook size. The block means are predicted from the neighboring blocks to reduce the bit rate. Additionally, the predicted means are added to the residual vectors so that the state codebooks can be generated entirely. Experimental results indicate that FSRVQ uniformly outperforms both ordinary FSVQ and MRVQ.
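A minimal sketch of the mean/residual part of such a scheme (the finite-state mechanism and the codebook design are omitted); the block size, the causal mean predictor, and the exhaustive nearest-codeword search below are illustrative assumptions:

```python
import numpy as np

def nearest_codeword(vec, codebook):
    """Index of the codeword (row of `codebook`) closest to `vec`."""
    return int(np.argmin(np.sum((codebook - vec) ** 2, axis=1)))

def encode_mean_residual(img, codebook, bs=4):
    """For each bs x bs block: compute its mean, predict that mean from the
    already-coded left and upper neighbours, and quantize the mean-removed
    residual vector with the (residual) codebook.  Per block, one codeword
    index plus the mean prediction error would be transmitted."""
    H, W = img.shape
    means = np.zeros((H // bs, W // bs))
    indices = np.zeros((H // bs, W // bs), dtype=int)
    mean_errors = np.zeros_like(means)
    for bi in range(H // bs):
        for bj in range(W // bs):
            block = img[bi * bs:(bi + 1) * bs, bj * bs:(bj + 1) * bs].astype(float)
            m = block.mean()
            preds = []
            if bj > 0:
                preds.append(means[bi, bj - 1])
            if bi > 0:
                preds.append(means[bi - 1, bj])
            pred = float(np.mean(preds)) if preds else 128.0   # border default
            means[bi, bj] = m
            mean_errors[bi, bj] = m - pred
            indices[bi, bj] = nearest_codeword((block - m).ravel(), codebook)
    return indices, mean_errors
```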
Variable-rate multistage vector quantization of multispectral imagery with greedy bit allocation
Author(s):
Smita Gupta;
Allen Gersho
Show Abstract
Multispectral satellite images of the earth consist of sets of images obtained by sensing electromagnetic radiation in different spectral bands for each geographical region. We have applied a variable rate multistage vector quantizer for the compression of multispectral imagery. Spectral and spatial correlation are simultaneously exploited by forming vectors from 3-dimensional data blocks. The wide variation in entropy across the data set is efficiently exploited by an adaptive bit allocation algorithm based on a greedy approach where the rate-distortion trade-off is locally optimized for each successive encoding stage. Simulation results on an image set acquired by a Thematic Mapper scanner are presented. A substantial improvement is obtained over prior vector quantization based coders for multispectral data compression.
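The greedy allocation itself can be sketched independently of the VQ details: given, for every vector (or block) to be coded, the distortion reduction and rate cost of each successive encoding stage, stages are added one at a time wherever the local distortion-reduction-per-bit is largest, until the budget is spent. The data layout and function name below are assumptions for illustration.

```python
import heapq

def greedy_allocate(delta_d, delta_r, budget):
    """delta_d[i][s] / delta_r[i][s]: distortion reduction / rate cost of
    adding stage s to unit i (both positive, with efficiency assumed to
    decrease from stage to stage).  Returns the number of stages allocated
    to each unit and the total rate spent."""
    heap = []
    for i in range(len(delta_d)):
        if delta_d[i]:
            heapq.heappush(heap, (-delta_d[i][0] / delta_r[i][0], i, 0))
    stages = [0] * len(delta_d)
    spent = 0.0
    while heap:
        _, i, s = heapq.heappop(heap)            # best distortion/rate ratio
        if spent + delta_r[i][s] > budget:
            continue                             # this unit gets no more stages
        spent += delta_r[i][s]
        stages[i] = s + 1
        if s + 1 < len(delta_d[i]):              # queue this unit's next stage
            heapq.heappush(heap, (-delta_d[i][s + 1] / delta_r[i][s + 1], i, s + 1))
    return stages, spent
```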
Semiadaptive vector quantization and its application in medical image compression
Author(s):
Jian-Hong Hu;
Yao Wang;
Patrick Cahill
Show Abstract
In this paper, we introduce a semi-adaptive vector quantization (SAVQ) method, which combines the traditional VQ scheme using a fixed codebook with the locally adaptive VQ (LAVQ) method, which dynamically constructs a codebook from the input data stream. The codebook in SAVQ consists of two parts: a fixed part designed from training signals, as in VQ, and an adaptive part that is updated based on the input vectors to be compressed. The proposed method is more effective than VQ and LAVQ for semi-stationary signals that have patterns common across different images as well as features specific to a particular image. Such is the case with medical images, which have similar tissue characteristics across different images as well as local variations that are patient and pathology dependent. The SAVQ, VQ, and LAVQ methods have been applied to multispectral magnetic resonance brain images. SAVQ achieves higher compression ratios than the VQ and LAVQ methods over a wide range of reproduction quality, with more significant improvement in the mid-to-high quality range. Furthermore, under the same quality criterion, SAVQ requires a much smaller codebook than VQ, making it less time and memory demanding. Readings by neuroradiologists have suggested that images produced by SAVQ at compression ratios up to 40 (for MRI data with 3 or 4 images/set, 256 X 256 pixels/image, and 16 bits/pixel) are acceptable for primary reading.
Next-state functions for finite-state vector quantization
Author(s):
Nasser M. Nasrabadi;
Nader Mohsenian;
Hon-Tung Mak;
Syed A. Rizvi
Show Abstract
In this paper, a finite-state vector quantizer called Dynamic Finite-State Vector Quantization (DFSVQ) is investigated with regard to its subcodebook construction. In DFSVQ, each input vector is encoded by a small codebook, called the subcodebook, which is created from a much larger codebook called the supercodebook. The subcodebook is constructed by selecting (through a reordering procedure) a set of appropriate codevectors from the supercodebook. The performance of the DFSVQ depends on this reordering procedure; therefore, several reordering procedures are introduced and their performance is evaluated in this paper. The reordering procedures investigated are the conditional histogram, address prediction, vector prediction, nearest neighbor design, and the frequency of usage of codevectors. The reordering procedures are evaluated by comparing their hit ratios (the number of blocks encoded by the subcodebook) and their computational complexity. Experimental results are presented for both still images and video. It is found that for still images the conditional histogram performs best and for video the nearest neighbor design performs best.
Robust estimation of motion vector fields with discontinuity and occlusion using local outliers rejection
Author(s):
Siu-Leong Iu
Show Abstract
This paper proposes the use of local outlier rejection to effectively solve two fundamental problems in estimating motion vector fields: motion vector discontinuity and occlusion. Unlike other approaches, such as those using a `line process' or an `occlusion process,' this approach does not introduce extra unknowns and parameters. Since the objective function of this approach can be calculated locally for each pixel, a highly parallel implementation can be achieved. For the problem of motion vector discontinuity, we argue that since the outliers occur at motion boundaries, we should reject them locally. We reject outliers at each pixel by thresholding, with a threshold reference based on the surrounding neighbors. Thus, our approach is not sensitive to the amplitude of motion, and the normal smoothness assumptions can be fully used in smooth areas. Also, since we apply outlier rejection to the motion model instead of to the measurements, we can reject outliers even when there are more outliers than non-outliers for a given pixel, such as at a moving corner in an 8-neighborhood system. The proposed approach has been extended to the problem of occlusion using three frames. Simulated annealing with a Gibbs sampler is used to solve the minimization problem. Experiments on synthetic and real image sequences have been conducted to illustrate the effectiveness of the proposed approach.
Psychovisual lattice vector quantization in subband image coding
Author(s):
Dominique Barba;
Albrecht Neudecker
Show Abstract
In this paper, we present a way of incorporating some important features of the human visual system in the coding of images within the framework of subband decomposition and of lattice vector quantization (VQ). As lattice VQ uses regular distribution of nodes, we use a non-linear pre-normalization of the signals in each subband, taking into account the visibility of impairments in the subbands and particularly the masking effects. Visually optimized subband lattice VQ allows us to encode complex images at bit rates between 0.4 to 0.6 bit/pixel without any visible distortion on a high-quality display.
Image sequence coding using frame-adaptive vector quantization
Author(s):
Fayez M. Idris;
Sethuraman Panchanathan
Show Abstract
Vector quantization (VQ) is a promising technique for low bit rate image coding. Recently, image sequence coding algorithms based on VQ have been reported in the literature. We note that image sequences are highly nonstationary and generally exhibit variations from frame to frame and from scene to scene; hence, using a fixed VQ codebook to encode the different frames/sequences cannot guarantee a good coding performance. Several adaptive techniques which improve the coding performance have been reported. However, we note that most adaptive techniques result in further increases in the computational complexity and/or the bit rate. In this paper, a new frame-adaptive VQ technique for image sequence coding (SC-FAVQ) is presented. This technique exploits the inter/intraframe correlations and provides frame adaptability at a reduced complexity. In addition, a dynamic self-organized codebook is used to track the local statistics from frame to frame. Computer simulations using standard CCITT image sequences demonstrate the superior coding performance of SC-FAVQ.
Wavelet deformable model for shape description and multiscale elastic matching
Author(s):
Chun-Hsiung Chuang;
C.-C. Jay Kuo
Show Abstract
In this research, we propose a hierarchical wavelet curve descriptor which decomposes a planar curve into components of different scales so that the coarsest scale components carry the global approximation information while other finer scale components contain the local detailed information. Furthermore, we interpret the wavelet coefficients as random variables, and use the deformable stochastic wavelet descriptor to model a group of shapes which have the same topological structure but may differ slightly due to local deformation. We show that this descriptor can be conveniently used in multiscale elastic matching. Local deformation can be more effectively represented by the wavelet descriptor than the conventional Fourier descriptor, since wavelet bases are well localized in both the spatial and frequency domains. Experimental results are given to illustrate the performance of the proposed wavelet descriptor, where we use a model-based approach to extract the contour of an object from noisy images.
Efficient signal extension for subband/wavelet decomposition of arbitrary-length signals
Author(s):
Herjan J. Barnard;
Jos H. Weber;
Jan Biemond
Show Abstract
Compression of digital signals is often performed with a two-band subband/wavelet decomposition scheme. Conventional tree-structured schemes of depth k that are based on this two-channel scheme require an input signal whose length is a multiple of 2^k. Normally, if the input signal does not meet this condition, samples are added to it until the requirement is met. However, these extra samples lead to an increase in data. In this paper a new method is presented that is based on an efficient way of signal extension. With this method, signals of arbitrary length N can be decomposed into subbands up to an arbitrary level without an increase in data. Furthermore, a new alternative boundary extension method for filtering even-length signals with symmetric odd-length filters is presented. This so-called symmetric-periodic extension is closely related to the new efficient signal extension method and has the advantage of having periodicity 2N. In this paper all signal extensions are explained visually with diagrams to clearly demonstrate the perfect reconstruction conditions.
Image coding based on zerocrossing and energy information
Author(s):
Andrea Basso;
Alexander M. Geurtz;
Murat Kunt
Show Abstract
In this paper we propose a complete subband-based codec for image compression purposes, using a new 2D algorithm based on zero-crossing and energy information. Although signal representation based on sign information is attractive from a coding point of view, two major problems inhibited its practical use, namely the large amount of side information needed for obtaining a stable reconstruction and the efficient treatment of inherently 2D signals. The new algorithm significantly improves the reconstruction results obtained in our previous work, thanks to its improved convergence and its capabilities to deal with the two-dimensional nature of the subband image signals. The zero-crossing information is compressed in a lossless way using the JBIG standard and the energy information in a lossy way using an arithmetic coder. Good results have been obtained both at the reconstruction and at the coding level, which show the feasibility of the global scheme.
New efficient methods for Gabor analysis
Author(s):
Hans Georg Feichtinger;
Ole Christensen
Show Abstract
In this paper we describe new methods to obtain (non-orthogonal) Gabor expansions of discrete and finite signals. By this we understand the expansion of a signal of a given length n into a (finite) series of coherent building blocks obtained from a Gabor atom through discrete time- and frequency-shift operators. Although bump-type atoms are natural candidates, the approach is not restricted to such building blocks. Also, the set of time/frequency shift operators does not have to be a (product) lattice but just an ordinary (additive) subgroup of the time/frequency plane, which is naturally identified with the two-dimensional n X n cyclic group. Indeed, such non-separable subgroups turn out to be more interesting for the efficient determination of a suitable set of coefficients for the coherent expansion. For this purpose it is enough to determine the so-called dual Gabor atom. The existence and basic properties of this dual atom are known in the case of lattice groups from ordinary frame theory. More importantly, we demonstrate that the use of the conjugate gradient method drastically reduces the computational complexity of determining it. The required Gabor coefficients are then simply obtained as short-time Fourier coefficients of the given signal, with the dual atom as the moving window.
Wavelet transform coding using NIVQ
Author(s):
Xiping Wang;
Sethuraman Panchanathan
Show Abstract
The discrete wavelet transform is an ideal tool for multi-resolution representation of image signals. Some promising results have recently been reported on the application of the wavelet transform to image compression. In this paper, we propose a new wavelet coding technique for image compression. The proposed scheme has the advantages of improved coding performance and reduced computational complexity. The input image is first decomposed into a pyramid structure with three layers using a 2-D wavelet transform. A block size of 2^(3-m) (m = 1, 2, 3) is used for each orientation sub-image at the m-th layer to form 64-D vectors by combining the corresponding blocks in all the sub-images. The 64-D vectors are then encoded using 16-D non-linear interpolative vector quantization (NIVQ). At the decoder, the indices are used to reconstruct the 64-D vectors directly from a 64-D codebook designed using a non-linear interpolative technique. The proposed scheme not only exploits the correlation among the wavelet sub-images but also preserves the high-frequency sub-images. Simulation results show that a reconstructed image of superior quality can be obtained at a compression ratio of about 100:1.
Vector quantization of wavelet coefficients for super-high-definition image
Author(s):
Yoshiaki Kato;
Yoshihisa Yamada;
Hideo Ohira;
Tokumichi Murakami
Show Abstract
In this paper, we describe a new coding scheme for super high definition (SHD) images using vector quantization of wavelet coefficients. This coding scheme includes hierarchical blocking and adaptive coding techniques to optimize the coding efficiency. The SHD image is decomposed into multi-resolution layers with a biorthogonal wavelet filter. The general ideas of the adaptive coding are to remove the redundancy due to correlation between different layers and to minimize the perceptual error through adaptive bit allocation. To achieve a high compression ratio, the wavelet coefficients, except for the low-frequency bands, are coded using vector quantization. Binary-tree codebooks are created for each resolution layer. The depth of the tree search is also adaptively controlled according to the magnitude of the input vector. To evaluate the picture quality, objective and subjective tests were performed on the reconstructed SHD images. The results of our method give better SNR and picture quality compared with JPEG baseline coding.
Isochronous LAN-based full-motion video and image server-client system with constant distortion-adaptive DCT coding
Author(s):
Xiaonong Ran;
Walter Roland Friedrich
Show Abstract
A distributed real-time video playback application is presented, using an isochronous Ethernet to interconnect video clients to dedicated video servers. Special display functions such as seeking, pausing, stepping, and regular/fast playing forward and backward are supported; they are all emulated in the server through control messages received from the clients. The video compression scheme is DCT-based and entropy-coded, with adaptivity to the local statistical properties. The important property of the coding system is the constant quality of the decoded frames, which is achieved by minimizing the average bit rate subject to a constraint on an estimate of the quantization error. This coding scheme also gives a higher PSNR performance compared with JPEG.
Implementation of destination-address block-location using an SIMD machine
Author(s):
Tianhao Ding;
Joe Zheng
Show Abstract
The implementation of destination address block location using an SIMD machine is described. Both system architecture and software considerations are presented. The results and the processing times for regular envelopes are included to demonstrate that implementation on an SIMD machine is very cost-effective in image-based applications with real-time requirements. The conclusion is made that a mixed mode of SIMD and MIMD is the best approach to the efficient implementation of destination address block location.
Stroke extraction from Chinese characters by improved SLSA algorithm
Author(s):
Fu Jin;
Zheng-Ming Chai;
Xiaoqing Ding
Show Abstract
When a syntactic pattern recognition approach is used to recognize handwritten Chinese characters, the strokes of the characters must be extracted first. In 1987, Gu and Wang presented a stroke-extraction algorithm named the straight line sequence approximation (SLSA) algorithm. SLSA does not need a thinning process, so it is very fast. However, in the crossing regions among strokes, SLSA often extracts false strokes and causes stroke splitting. The improved SLSA algorithm described in this paper aims at solving this problem.
Two-stage hierarchical search and motion vector smoothing in motion estimation
Author(s):
Long-Wen Chang;
Jiunn Yueh Ho
Show Abstract
Motion estimation is very important in video-phone, video-conference, and HDTV applications, which will become part of everyday life in the near future. Conventionally, a full-search algorithm is used because its computational regularity is suitable for VLSI implementation. However, with a search range of 16 X 16 pixels around the search center, it already requires 256 processors. Increasing the search range of a full search for HDTV applications increases the number of processors further, which makes VLSI implementation very difficult. This paper proposes a two-stage hierarchical search algorithm to overcome this difficulty. In the first stage, the k best motion vectors are found; they are then fine-tuned in the second stage. Both stages use full search and can easily be implemented in VLSI. Unlike multi-layer hierarchical search techniques, the proposed two-stage algorithm does not require a subsampling process and can be implemented with a much simpler hardware circuit.
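A rough sketch of a two-stage search of this kind, assuming grayscale frames stored as NumPy arrays and an interior block; the SAD criterion, the coarse grid used to generate the k candidates, and the refinement range are illustrative assumptions, not the paper's exact stage definitions:

```python
import numpy as np

def sad(a, b):
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def two_stage_search(cur, ref, by, bx, bsize=16,
                     coarse_range=16, coarse_step=4, fine_range=2, k=4):
    """Stage 1: full search on a coarse grid of displacements, keeping the
    k best candidate vectors.  Stage 2: a small full search around each
    candidate; the overall best vector is returned."""
    H, W = ref.shape
    block = cur[by:by + bsize, bx:bx + bsize]

    def cost(dy, dx):
        ry, rx = by + dy, bx + dx
        if ry < 0 or rx < 0 or ry + bsize > H or rx + bsize > W:
            return None
        return sad(block, ref[ry:ry + bsize, rx:rx + bsize])

    coarse = []
    for dy in range(-coarse_range, coarse_range + 1, coarse_step):
        for dx in range(-coarse_range, coarse_range + 1, coarse_step):
            c = cost(dy, dx)
            if c is not None:
                coarse.append((c, dy, dx))
    candidates = sorted(coarse)[:k]              # k best coarse vectors

    best = min(candidates)
    for _, cy, cx in candidates:                 # fine-tune each candidate
        for dy in range(cy - fine_range, cy + fine_range + 1):
            for dx in range(cx - fine_range, cx + fine_range + 1):
                c = cost(dy, dx)
                if c is not None and c < best[0]:
                    best = (c, dy, dx)
    return (best[1], best[2]), best[0]
```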
Adaptive model for mixed binary image coding
Author(s):
Takahiro Hongu;
Takeshi Agui;
Hiroshi Nagahashi
Show Abstract
In recent facsimile applications, mixed documents comprising characters and photographs are commonly handled. Following this trend, the `joint bi-level image group (JBIG)' of ISO/IEC/JTC1/SCWG9 and CCITT/SG VIII prepared an international standard for the encoding of binary images obtained by quantizing mixed documents into binary levels. This JBIG coding scheme consists of two parts: (1) a modeling part based on a binary Markov model referring to 10 pixels surrounding the current pixel to be encoded, and (2) a coding part based on an adaptive arithmetic compression coder. This paper presents an adaptive model for mixed binary images which achieves higher compression efficiency than the typical Markov model of the JBIG scheme. We describe its two significant characteristics: a generalized model and area classification. The generalized model refers not only to neighboring pixels, like the typical Markov model, but also to a predicted gray level calculated pixel by pixel from previously scanned pixels near the current pixel. The area classification divides mixed binary images containing both characters and halftone images into two types of areas. In this adaptive model, the typical Markov model and the generalized model for halftone images are selected according to the type of area. As a result, the adaptive model reduces the Markov entropy of mixed binary images and achieves high compression efficiency.
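To make the modeling part concrete, here is a toy context model with adaptive counts of the kind that would drive an arithmetic coder; the 10-pixel template layout, the add-one initial counts, and the function names are assumptions for illustration, not the actual JBIG template or the paper's generalized model:

```python
import numpy as np

# A 10-pixel causal template, as (row offset, column offset) relative to the
# current pixel (illustrative layout only).
TEMPLATE = [(-2, -1), (-2, 0), (-2, 1),
            (-1, -2), (-1, -1), (-1, 0), (-1, 1), (-1, 2),
            (0, -2), (0, -1)]

def context_index(img, y, x):
    """Pack the 10 template pixels into an integer context in [0, 1023]."""
    H, W = img.shape
    ctx = 0
    for dy, dx in TEMPLATE:
        yy, xx = y + dy, x + dx
        bit = int(img[yy, xx]) if 0 <= yy < H and 0 <= xx < W else 0
        ctx = (ctx << 1) | bit
    return ctx

def conditional_probabilities(img):
    """Adaptive estimate of P(pixel = 1 | context) over a binary image,
    using simple counts with add-one initialization.  These conditional
    probabilities are what the arithmetic coder (omitted here) consumes."""
    ones = np.ones(1 << 10)
    totals = 2.0 * np.ones(1 << 10)
    H, W = img.shape
    for y in range(H):
        for x in range(W):
            ctx = context_index(img, y, x)
            ones[ctx] += int(img[y, x])
            totals[ctx] += 1.0
    return ones / totals
```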
Parallel VLSI-oriented algorithm and architecture for computing histogram of images
Author(s):
Heng-Da Cheng;
Xueqin Li;
Lifeng Wang
Show Abstract
The histogram of an image conveys information about the brightness and contrast of the image and is used to manipulate these features. The histogram has many applications in image processing and may be needed at different processing stages. In this paper, we propose a parallel algorithm for computing the histogram of limited-width (such as gray-level) values. The essential parallelism and simplicity of the proposed algorithm make it easy to implement using a VLSI array architecture. Each pixel only needs to perform addition and comparison, and to communicate only with its immediate neighbor pixels during the entire computation period. The histograms for pre-load and in-load images can be computed using the proposed architecture. The time complexity of the proposed algorithm is O(N), compared with O(N^2) on a uniprocessor, where N is the dimension of the image plane. The algorithm partition issue has also been studied.
Pyramid AR model to generate fractal Brownian random field
Author(s):
Bingcheng Li;
Songde Ma
Show Abstract
The fractal Brownian random (FBR) field is an extension of fractional Brownian motion (FBM) and has been successfully used in image analysis, the generation of natural scenes, fractal geometry, and other areas. However, its implementation is difficult, which limits its applications. In this paper, by extending the AR model, we propose a new approach, the pyramid-AR-model approach, to implement FBR fields. The new method has the same computational complexity as Fournier's, but it can generate FBR fields much more accurately.
Replacing mouse and trackball with tracked line of gaze
Author(s):
William E. Schroeder
Show Abstract
Foster-Miller has recently demonstrated a prototype eye-slaved target acquisition interface for the AEGIS workstation system, the `Visual Trackball.' In the proof-of-principle stage it functioned: hands-free, head-free (no head tracker or `fiducial mark'), immediately at a high level without training, losing and reacquiring the eye (when the user looked away), with glasses (but not sharp-division bifocals) and contacts, and, most importantly, twice as fast as a trackball in a controlled test (for novice and experienced users). Progress was made on two obstacles to a practical eye-tracking computer interface: (1) moving a research technology out of the laboratory into the real world, and (2) using a sensor (the eye) as a communications channel. By taking subjects at random and asking them to perform a simulation of the AEGIS target acquisition task with no preparation and little training, much real-world experience was gained about glasses, eyelashes and eyelids, and people's ability to compensate for system and procedural deficiencies. Problem behaviors were identified for later work. The prototype had some `customizable' features. Three methods of feeding back the calculated eye gaze were tried. Smoothing of the raw eye-gaze data was also adjustable. Scope exists for improved function by fitting these factors to the user.
Head-mounted computer interface based on eye tracking
Author(s):
William E. Schroeder
Show Abstract
A head-mounted computer interface using eye tracking as a `pointer' was demonstrated at the Third Annual Weapons Technology Review and Training Exposition in San Diego in January 1992, and is under further development for military and civilian applications. In this interface, the computer interprets eye-gaze direction just as it would input from a mouse or trackball. This technology makes possible computer-human interaction with many ideal aspects: complete portability, hands-free operation, and silent, secure operation anywhere and at any time. Commercial systems using control inputs from the tracked eye are available and effective within their limits, but size, weight, and cost have been barriers to wider use. This paper describes the latest head-mounted eye tracking display (HETD) development prototype. We are exploring the technological limits of current and near-term lighting, detector, and lens technology to project the feasibility of this device.
Segmentation evaluation and comparison: a study of various algorithms
Author(s):
Yujin Zhang
Show Abstract
An objective and quantitative study of several representative segmentation algorithms is presented. In this study, the measurement accuracy of object features from the segmented images is taken to judge the quality of segmentation results and to assess the performance of applied algorithms. Moreover, some synthetic images are specially generated and used in test experiments. This evaluation and comparison study reveals the behavior of those algorithms within various situations, provides their performance ranking under real-like conditions, gives some limits and/or constraints for employing those algorithms in different applications, as well as indicates several potential directions for improving their performance and initiating new developments. Since the investigated algorithms are selected from different technique groups, this study also shows that the presented approach would be valid and effective for treating a wide range of segmentation algorithms.
Streak splicing using medial-axis transform techniques
Author(s):
Daniel J. Falabella
Show Abstract
One of the objectives of image processing is to detect and locate meaningful discontinuities in the gray levels of an image. The usual approach taken is referred to as edge detection. Streak detection presents a related but sometimes different problem. Here we are concerned not only with the edges but also with the properties and features of the portion of the image contained within the boundaries of those edges. This paper looks at particular methods which may be applied to the location and characterization of such streaks and to the possible splicing together of different streak segments. Particular emphasis is placed on the development of an invariant property referred to as the Preservation of Perimeter Property of an object (PPP). Coupled with this is the use of the Medial Axis Transformation (MAT) as a tool in the process. The MAT produces what is known as the skeleton of the object. Integrating a new approach for finding the Euclidean Elevation Map (EEM) with the MAT is a key element in the development phases. The approach taken represents a compromise between producing an exact bitmap of the skeleton and developing a faster approximation of the skeleton that incorporates usable structural information.
Method for vectorizing line-structured engineering drawings based on window features extraction
Author(s):
Xiangyang Wang;
Xinggang Lin;
Youshou Wu
Show Abstract
This paper presents a method for vectorizing line-structured engineering drawings based on window feature extraction. A line-structured engineering drawing is composed of straight lines and curves (which may have different widths) as well as their ends, corners, and crossings (we call these feature points). In the paper we use 2-dimensional black run-lengths to trace and separate different lines. We present feature-point extraction criteria to detect feature points. When a feature point is detected, a small rectangular window is opened around it. After an adaptive window-enlarging algorithm is applied, a window of proper size and position, which we call a window feature, is obtained. In this way, we can see whether it is really a feature point (end, corner, crossing) or just noise. We define 40 window features, and with these window features we can process and vectorize all the complicated cross points in mechanical drawings. Finally, we give some experimental examples of vectorizing mechanical drawings.
Three-dimensional region matching and tracking in a long binocular image sequence
Author(s):
K. Kaoula;
M. Benjelloun;
Bernard Dubuisson
Show Abstract
In this paper, a new approach to the tracking and matching processes is proposed. These two usually distinct operations are merged into a single one, and the two binocular image sequences are treated as a single entity. This is done by using a clustering-based approach. We discuss the determination of the best-fitting invariant 3D region descriptors and the corresponding feature space. An optimal configuration of the binocular system is deduced from these considerations. This tracking and matching strategy allows the simultaneous processing of 2 X N binocular frames and the handling of occlusion phenomena, nonrigid motion segmentation, and object deformation.
Image segmentation by wavelet-based automatic threshold selection
Author(s):
Jean-Christophe Olivo
Show Abstract
A segmentation method using a peak analysis algorithm for threshold selection is presented. It is based on the detection of the zero-crossings and the local extrema of a wavelet transform, which give a complete characterization of the peaks in the histogram. These values are used for the unsupervised selection of a sequence of thresholds describing a coarse-to-fine analysis of the histogram variation. Results of applying the proposed technique to different images are presented.
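The paper detects peaks through the zero-crossings and local extrema of a wavelet transform of the histogram; the rough sketch below substitutes Gaussian smoothing at several scales as a stand-in, to illustrate only the coarse-to-fine, zero-crossing-based threshold selection idea. The scales and helper names are assumptions.

```python
import numpy as np

def smoothed_histogram(hist, sigma):
    """Convolve a gray-level histogram with a sampled Gaussian of scale sigma."""
    radius = int(4 * sigma)
    t = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-t ** 2 / (2.0 * sigma ** 2))
    g /= g.sum()
    return np.convolve(hist, g, mode='same')

def coarse_to_fine_thresholds(hist, sigmas=(16, 8, 4, 2)):
    """For each scale, from coarse to fine, locate histogram valleys as the
    points where the derivative of the smoothed histogram crosses zero from
    negative to positive; each scale yields a candidate set of thresholds."""
    hist = np.asarray(hist, dtype=float)
    thresholds = {}
    for sigma in sigmas:
        d = np.gradient(smoothed_histogram(hist, sigma))
        thresholds[sigma] = [i for i in range(1, len(d))
                             if d[i - 1] < 0 <= d[i]]
    return thresholds
```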
Fast two-dimensional entropic thresholding algorithm
Author(s):
Wen-Tsuen Chen;
Chia-Hsien Wen;
Chin-Wen Yang
Show Abstract
Two-dimensional entropic thresholding is one of the important thresholding techniques for image segmentation. Usually, the global threshold vector is selected from L^2 (gray level, local average) pairs through a `maximum' optimization procedure with O(L^4) computation complexity. This paper proposes a fast two-phase 2D entropic thresholding algorithm. In order to reduce the computation time, we estimate 9L^(2/3) candidate threshold vectors from a quantized version of the original image in advance. The global threshold vector is then obtained by checking the candidates only. The optimal computation complexity is O(L^(8/3)), obtained by quantizing the gray levels into L^(2/3) levels. Experimental results show that the processing time of each image is reduced from more than two hours to about two minutes. The required memory space is also greatly reduced.
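A NumPy-only sketch of the two-phase idea: the criterion is the standard two-region 2D entropic criterion on the (gray level, local average) histogram, phase 1 searches a quantized version of that histogram exhaustively, and phase 2 re-evaluates only the full-resolution pairs near the winning quantized cell. The quantization factor, window size, and the refinement neighbourhood are illustrative assumptions rather than the paper's exact candidate-generation rule.

```python
import numpy as np

def entropy_criterion(P, s, t, eps=1e-12):
    """Sum of the entropies of the normalized low-low and high-high quadrants
    of the 2-D histogram P for the threshold vector (s, t)."""
    A, B = P[:s, :t], P[s:, t:]
    pa, pb = A.sum(), B.sum()
    if pa < eps or pb < eps:
        return -np.inf
    ha = -np.sum((A / pa) * np.log(A / pa + eps))
    hb = -np.sum((B / pb) * np.log(B / pb + eps))
    return ha + hb

def fast_2d_entropic_threshold(img, L=256, q=8, ksize=3):
    """Phase 1: exhaustive criterion search on a q-times-coarser histogram.
    Phase 2: full-resolution evaluation of only the (gray, local average)
    pairs inside the winning quantized cell and its neighbours."""
    r = ksize // 2
    p = np.pad(img.astype(float), r, mode='edge')
    avg = np.zeros(img.shape, dtype=float)
    for dy in range(ksize):
        for dx in range(ksize):
            avg += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    avg = np.clip(np.round(avg / ksize ** 2), 0, L - 1).astype(int)
    g = img.astype(int)                           # assumes 8-bit gray levels

    P = np.zeros((L, L))
    np.add.at(P, (g.ravel(), avg.ravel()), 1.0)   # joint histogram
    P /= P.sum()
    Lq = L // q
    Pq = P.reshape(Lq, q, Lq, q).sum(axis=(1, 3)) # quantized histogram

    # phase 1: exhaustive search over quantized threshold vectors
    _, sq, tq = max((entropy_criterion(Pq, s, t), s, t)
                    for s in range(1, Lq) for t in range(1, Lq))

    # phase 2: refine over the original pairs near the quantized optimum
    best = (-np.inf, None, None)
    for s in range(max(1, (sq - 1) * q), min(L, (sq + 2) * q)):
        for t in range(max(1, (tq - 1) * q), min(L, (tq + 2) * q)):
            c = entropy_criterion(P, s, t)
            if c > best[0]:
                best = (c, s, t)
    return best[1], best[2]
```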
Detecting L-, T-, and X-junctions from low-order image derivatives
Author(s):
Toshiki Iso;
Masahiko Shizawa
Show Abstract
The proposed one-shot algorithm requires only first- and second-order image derivatives in local neighborhoods to detect multiple orientations and to classify L-, T-, and X-junctions. The algorithm has two stages. The first stage detects multiple orientations with a parametrized filter based on the principle of superposition, and the second stage classifies L-, T-, and X-junctions with first-order image derivatives in local areas in order to evaluate the borders. The computational cost is therefore low, and the method is robust against noise. The algorithm gave successful results for both artificial and real images.
Adaptive color subsampling of images
Author(s):
Anil M. Murching;
John W. Woods
Show Abstract
New methods for the subsampling of color images are presented. A block-based orthogonalizing transform is used to represent the RGB image in an efficient domain. This transformation offers substantial advantages when preceded by a segmentation of the image based on chrominance content. An adaptive scheme, based on a threshold test on the local variance, is proposed for subsampling the resulting chrominance-related components. The proposed subsampling scheme outperforms conventional methods based on transforms such as YCrCb, in terms of an improved quality versus sample-rate (or bit-rate) trade-off. It is possible to achieve 4:4:4 quality at near 4:2:0 rates. A block-based strategy to code the resulting color image components, based on the JPEG baseline method, is also presented.
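A toy sketch of the variance-based decision alone (the orthogonalizing transform, segmentation, and JPEG-based coding are not shown); the block size, threshold value, and output format are assumptions for illustration:

```python
import numpy as np

def adaptive_subsample(chroma, bsize=2, var_threshold=25.0):
    """For each bsize x bsize block of a chrominance-related component,
    keep all samples if the local variance exceeds the threshold (detailed
    region), otherwise keep only the block mean (flat region)."""
    H, W = chroma.shape
    coded = []                                  # (row, col, payload) per block
    for bi in range(0, H, bsize):
        for bj in range(0, W, bsize):
            block = chroma[bi:bi + bsize, bj:bj + bsize].astype(float)
            if block.var() > var_threshold:
                coded.append((bi, bj, block.copy()))          # full resolution
            else:
                coded.append((bi, bj, float(block.mean())))   # subsampled
    return coded
```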
Fractal-based image coding with polyphase decomposition
Author(s):
Kwo-Jyr Wong;
Ching-Han Lance Hsu;
C.-C. Jay Kuo
Show Abstract
A new method for fractal image compression that applies Jacquin's algorithm to a polyphase-decomposed image is proposed in this research to increase the encoding efficiency. Using a (P X P) : (1 X 1) polyphase decomposition with P = 2^n, we divide an image into P X P subimages and then apply the Jacquin compression algorithm to these subimages independently. We show that the resulting scheme can improve the coding speed by a factor of P^2 at some sacrifice in decompressed image quality. Moreover, since the subimages are very similar to each other, we may focus on a small subset of subimages, seek the appropriate domain block for their range blocks, and record the address mapping, scaling, and offset information. To encode the remaining subimages, we simply determine the scaling and the offset based on the same set of address mappings found previously. A set of numerical experiments with various parameters, including the polyphase decimation factor P, the size D (or R) of the domain (or range) blocks, and the size s of the search step, is performed to illustrate the tradeoff between speed, image quality, and compression rate.
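A minimal sketch of the (P X P) : (1 X 1) polyphase decomposition itself follows, assuming the image dimensions are divisible by P; each subimage could then be handed to a Jacquin-style fractal coder independently, as the abstract proposes.

import numpy as np

def polyphase_decompose(image, P=2):
    # Subsample rows and columns with every offset (i, j), 0 <= i, j < P.
    return [image[i::P, j::P] for i in range(P) for j in range(P)]

def polyphase_recompose(subimages, P=2):
    h, w = subimages[0].shape
    out = np.zeros((h * P, w * P), dtype=subimages[0].dtype)
    for k, sub in enumerate(subimages):
        i, j = divmod(k, P)
        out[i::P, j::P] = sub
    return out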
Multiscale stochastic approach to object detection
Author(s):
Daniel R. Tretter;
Charles A. Bouman
Show Abstract
We present a method for object detection based on a novel multiscale stochastic model together with Bayesian estimation techniques. This approach results in a fast, general algorithm which may be easily trained for specific objects. The object model is based on a stochastic tree structure in which each node is an important subassembly of the three-dimensional object. Each node or subassembly is modeled using a Gaussian pyramid decomposition. The objective of the algorithm is then to estimate the unknown position of each subassembly and to determine the presence of the object. We use a fast multiscale search technique to compute the sequential MAP (SMAP) estimate of the unknown position, scale factor, and 2-D rotation for each subassembly. The search is carried out in a manner similar to a sequential likelihood ratio test, where the process advances in scale rather than time. We use a similar search to estimate the model parameters for a given object from a set of training images.
Hand gesture recognition using a stick figure model
Author(s):
Kuplong Yunibhand;
Hirotsugu Kinoshita;
Yoshinori Sakai
Show Abstract
A method for automated 2D human body description (stick figure) based on segmentation is described. Segmentation is based on contrast, motion, and knowledge of body structure. The human body (upper part), segmented into 6 parts, is represented by an adjacency graph. The regions assigned to each part are modeled as a Markov random field (MRF) on the graph, and the body part detection problem is then formulated as a maximum a posteriori (MAP) estimation. In addition, an algorithm for generating stick figures without excessive computation is proposed.
Adaptive rate control algorithm for DPCM/DCT hybrid video codec adopting bidirectional prediction
Author(s):
Seong Hwan Jang;
Seop Hyeong Park
Show Abstract
In DPCM/DCT hybrid coding algorithms such as MPEG, the rate control strategy is the key to reconstructing images with high quality. In this paper, we propose a new rate control algorithm for DPCM/DCT hybrid video codecs which is capable of handling scene-change pictures and of maintaining small variations of the reference quantization parameter within a picture. If a scene change is detected at a P picture, the number of target bits is increased enough to reconstruct the image with high quality. In order to allocate the appropriate number of bits to each macroblock, the number of bits generated by a macroblock is estimated before quantization. Experimental results show that the proposed algorithm gives more than 1.5 dB improvement in PSNR at scene-change pictures compared to the rate control algorithm described in TM5. It is also shown that the proposed algorithm reduces the variation of the reference quantization parameter within a picture compared to the rate control algorithm in MPEG2 video TM5. This results in a more uniform distribution of PSNR across the macroblocks of a picture.
Region-based dichromatic estimation method for illumination color estimation
Author(s):
Harumi Kawamura;
Sadayuki Hongo;
Isamu Yoroizawa
Show Abstract
We propose the region-based dichromatic (RBD) estimation method for illumination color estimation. The method realizes illumination color estimation and, at the same time, illumination-independent color object segmentation, even if the highlights in the image are generated by several objects of different colors. The dichromatic reflection theory, proposed by S. A. Shafer and T. Kanade, yields a method that sometimes outputs an incorrect illumination color when the color of a high-luminance matte region is similar to the illumination color. The proposed method solves this problem by extracting highlights as bicomponent regions. The RBD estimation method proceeds as follows: (1) extraction of high-luminance regions, (2) grouping of similar color regions based on color difference in the L*u*v* uniform color space, (3) extraction of highlight regions which contain two reflection components, (4) illumination color estimation by finding the point that minimizes the sum of the distances to all highlight lines in each highlight region. The experimental results for real images show that the RBD estimation method estimates the illumination color more correctly than the conventional method.
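The following sketch illustrates step (4) only, under the assumption that each highlight region has already been reduced to a highlight line given by a point p_i and a unit direction d_i in color space; the illumination color is then estimated as the least-squares point closest to all lines. The line fitting and the L*u*v* grouping of steps (1)-(3) are not shown.

import numpy as np

def closest_point_to_lines(points, directions):
    # points: (N, 3) array of points on the highlight lines;
    # directions: (N, 3) array of unit direction vectors of those lines.
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for p, d in zip(points, directions):
        M = np.eye(3) - np.outer(d, d)   # projector orthogonal to the line
        A += M
        b += M @ p
    return np.linalg.solve(A, b)         # least-squares illumination color estimate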
Pedestrian counting system robust against illumination changes
Author(s):
Atsushi Sato;
Kenji Mase;
Akira Tomono;
Ken'ichiro Ishii
Show Abstract
This paper presents a real-time pedestrian counting system based on x-t spacetime image analysis. The system counts the number of pedestrians through the combination of three processes: moving-object extraction, object counting by region labeling, and direction detection by flow estimation. In order to extract objects reliably under various illumination changes, we propose new moving-object extraction methods that extract moving regions, with their shadows eliminated, by using adaptive background image reconstruction and color information processing of the images. For labeling and flow estimation, we use the Ortho-sectioning method, which analyzes extracted regions on a spacetime slice image of the original three-dimensional volume. Experimental results confirm the robustness of the system against illumination changes.
Human image tracking technique applied to remote collaborative environments
Author(s):
Yoshio Nagashima;
Gen Suzuki
Show Abstract
To support various kinds of collaboration over long distances using visual telecommunication, it is necessary to transmit visual information related to the participants and topical materials. When people collaborate in the same workspace, they use visual cues such as facial expressions and eye movement. The realization of coexistence in a collaborative workspace requires support of these visual cues. Therefore, it is important that the facial images be large enough to be useful. During collaboration, especially dynamic collaborative activities such as equipment operation or lectures, the participants often move within the workspace. When people move frequently or over a wide area, the need for automatic human tracking increases. Depending on the movement area of the person or the resolution of the extracted area, we have developed a memory tracking method and a camera tracking method for automatic human tracking. Experimental results using a real-time tracking system show that the extracted area follows the movement of the human head fairly well.
Prefilter approach for the design of the lapped orthogonal transform basis
Author(s):
ChangWoo Lee;
Sang Uk Lee
Show Abstract
In this paper, we propose a new method to design the LOT basis with a view to maximizing the transform coding gain. In our approach, only the linear-phase basis is considered, since a nonlinear-phase basis is inappropriate for image coding. The proposed design technique for the LOT basis is based on decomposing the transform matrix into an orthogonal matrix and a prefilter matrix. Based on this decomposition, the prefilter matrix and the orthogonal matrix are designed separately. It is shown that the proposed LOT yields improved coding gain compared to the conventional LOT.
Foreign object detection via texture recognition and a neural classifier
Author(s):
Devesh Patel;
I. Hannah;
E. R. Davies
Show Abstract
It is rare to find pieces of stone, wood, metal, or glass in food packets, but when they occur, these foreign objects (FOs) cause distress to the consumer and concern to the manufacturer. When x-ray imaging is used to detect FOs within food bags, hard contaminants such as stone or metal appear darker, whereas soft contaminants such as wood or rubber appear slightly lighter than the food substrate. In this paper we concentrate on the detection of soft contaminants such as small pieces of wood in bags of frozen corn kernels. Convolution masks are used to generate textural features which are then classified into corresponding homogeneous regions of the image using an artificial neural network (ANN) classifier. The separate ANN outputs are combined using a majority operator, and region discrepancies are removed by a median filter. Comparisons with classical classifiers showed the ANN approach to have the best overall combination of characteristics for our particular problem. The detected boundaries are in good agreement with the visually perceived segmentations.
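A minimal sketch of the feature-generation stage follows, assuming Laws-style 5 x 5 convolution masks (the paper does not list its masks, so these are stand-ins). Each pixel receives a vector of local texture-energy responses that could then be fed to the ANN classifier described above.

import numpy as np
from scipy.ndimage import convolve, uniform_filter

L5 = np.array([1, 4, 6, 4, 1], float)     # level
E5 = np.array([-1, -2, 0, 2, 1], float)   # edge
S5 = np.array([-1, 0, 2, 0, -1], float)   # spot

def texture_features(image, window=15):
    image = image.astype(float)
    feature_planes = []
    for row in (L5, E5, S5):
        for col in (L5, E5, S5):
            mask = np.outer(row, col)                     # 5x5 separable texture mask
            response = convolve(image, mask, mode='reflect')
            energy = uniform_filter(np.abs(response), size=window)
            feature_planes.append(energy)
    return np.stack(feature_planes, axis=-1)              # (H, W, 9) features per pixel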
Wavelets for sign language translation
Author(s):
Beth J. Wilson;
Gretel Anspach
Show Abstract
Wavelet techniques are applied to help extract the relevant parameters of sign language from video images of a person communicating in American Sign Language or Signed English. The compression and edge detection features of two-dimensional wavelet analysis are exploited to enhance the algorithms under development to classify the hand motion, hand location with respect to the body, and handshape. These three parameters have different processing requirements and complexity issues. The results are described for applying various quadrature mirror filter designs to a filterbank implementation of the desired wavelet transform. The overall project is to develop a system that will translate sign language to English to facilitate communication between deaf and hearing people.
Signal-adapted transform coding of sequences
Author(s):
Benoit M. M. Macq;
Serge Comes;
J. Y. Mertes;
Maria Paula Queluz
Show Abstract
In standard DCT coding schemes such as MPEG, sequence compression is achieved by motion compensation, transformation, quantization, and entropy coding. In this paper, we follow the same path while adapting the elements of the coding scheme to the image signal. The motion compensation is achieved by a block-matching method in which the block size is adapted to the signal. Great attention has been paid to the relevance of the motion field. Combined with the motion compensation, the two fields of each frame are merged, taking into account the measured motion vectors, to compose a pseudo-progressive frame. The encoding is applied to this `motion-compensated progressive' frame. A wavelet decomposition is then applied to each (inter or intra) frame. Such a transform, intrinsically possessing linear-phase and perfect reconstruction properties, has been optimized to maximize a perceptually weighted coding gain. The wavelet coefficients are then vector-quantized in order to reach the maximum perceptual SNR; frequency weighting is taken into account. The relevance of the measured vector field allows a precise spatio-temporal quantization optimization. The vectors are entropy coded, taking into account the remaining inter-band dependence, with an adapted entropy code. Results obtained from 1 Mbit/s to 8 Mbit/s are shown for moving sequences at the conference.
Layered video coder with self-concealed capability using frequency scanning technique
Author(s):
Liem H. Kieu;
King N. Ngan
Show Abstract
In this paper, the frequency scanning and Modified Universal Variable Length Coding (MUVLC) techniques are examined as a means of improving the cell-loss resilience of video codecs. An appropriate implementation of these techniques for effective use in a pyramid layered coding scheme is described. Simulation results are presented which show the superior performance of this slice-based coding technique in comparison to the conventional block scanning and Variable Length Coding (VLC) technique of similar coding efficiency.
Alternative morphological thinning algorithm
Author(s):
Frank C. Glazer
Show Abstract
Digital skeletons, generated by thinning algorithms, are used in the characterization and analysis of the shape of objects in binary images. Many proposed algorithms have been presented without a formal analysis, making it difficult to determine their correctness, convergence, and accuracy. A notable exception is provided by morphological thinning algorithms based on mathematical morphology. Jang and Chin have presented a precise definition of digital skeletons and a mathematical method for the analysis of morphological thinning algorithms. They used this analysis to design a new thinning algorithm. For some applications, the skeletons produced by their algorithm may be excessively branched due to insignificant details on the border of the objects. This report presents an alternative operator, defined within Jang and Chin's morphological method, that is less sensitive to such border noise.
Study on a very low bit-rate video coding algorithm
Author(s):
Masahisa Kawashima;
Fumihiko Numata;
Hideyoshi Tominaga
Show Abstract
A very low bit-rate video coding algorithm is proposed. The algorithm is a hybrid of motion compensation and DCT (discrete cosine transform) coding. A progressive transmission scheme with frequency-oriented coding of the DCT coefficients is proposed to cope with partial freeze due to buffer overflow. In addition, some methods to improve motion estimation are proposed. The simulation results show that the proposed algorithm produces perceptually better decoded images than the traditional RM8 (reference model 8) algorithm.
Image compression with the wavelet transform
Author(s):
Jeffrey D. Argast;
Malan D. Rampton;
Xin Qiu;
Todd K. Moon
Show Abstract
We present a new algorithm for coefficient selection during image compression using the wavelet transform for decomposition and reconstruction. This new algorithm assigns a weight to each scale so that coefficients from a high resolution scale are more likely to be discarded before coefficients from a low resolution scale. Results show our algorithm introduces less error into the reconstructed image than the sort method developed by DeVore et al.
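A minimal sketch of scale-weighted coefficient selection, written against the PyWavelets API as an assumption: detail bands at finer scales are down-weighted so that, for a fixed retention budget, fine-scale coefficients are discarded before coarse-scale ones. The specific weights and wavelet are illustrative, not the authors' choices.

import numpy as np
import pywt

def scale_weighted_compress(image, wavelet='db4', levels=4, keep_fraction=0.05):
    coeffs = pywt.wavedec2(image.astype(float), wavelet, level=levels)
    # Assumed weighting: detail level k (1 = coarsest) gets weight 2**(levels - k),
    # so the finest scale is the first to fall below the global threshold.
    pooled = []
    for k, detail in enumerate(coeffs[1:], start=1):
        w = 2.0 ** (levels - k)
        pooled.extend(np.abs(band).ravel() * w for band in detail)
    threshold = np.quantile(np.concatenate(pooled), 1.0 - keep_fraction)

    kept = [coeffs[0]]                        # always keep the approximation band
    for k, detail in enumerate(coeffs[1:], start=1):
        w = 2.0 ** (levels - k)
        kept.append(tuple(np.where(np.abs(band) * w >= threshold, band, 0.0)
                          for band in detail))
    return pywt.waverec2(kept, wavelet)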
Joint blind equalization, carrier recovery, and timing recovery for HDTV modem
Author(s):
Yong-Seok Choi;
Deok Soo Han;
Hyun Ha Hwang
Show Abstract
We introduce a high-speed modem structure based on adaptive blind equalization coupled with carrier and timing recovery for a QAM-based HDTV modem receiver. The adaptive equalization, called the modified `stop-and-go' algorithm (MSGA), uses a combination of the constant modulus algorithm (CMA) and the `stop-and-go' algorithm (SGA). The CMA initializes the equalizer coefficients; once initialized, the CMA is switched to the MSGA to complete equalizer convergence. In order to obtain faster convergence, the Sato-like errors used in the SGA are replaced by the errors used for updating coefficients in the CMA. The carrier recovery coupled with the adaptive equalization uses the decision-directed technique. The calculation of the absolute sum of errors gives a simple and reliable way of switching from the CMA to the MSGA in the adaptive equalizer's coefficient-updating procedure. The nonlinear spectral-line extraction scheme, which consists of a prefilter and a squarer followed by a narrow bandpass filter, is used for timing recovery. In the timing recovery scheme, we propose a zero-crossing detection technique for extracting the exact symbol timing phase from the output of the narrow bandpass filter. Simulation results reveal that the proposed structure is robust against additive white Gaussian noise, impulsive noise, multipath, and carrier phase errors such as frequency offset, phase offset, and phase jitter.
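A minimal sketch of the CMA start-up stage of such an equalizer follows; the switch to the modified stop-and-go algorithm and the carrier and timing loops are not shown, and the tap count, step size, and dispersion constant R2 are illustrative assumptions.

import numpy as np

def cma_equalize(x, num_taps=31, mu=1e-4, R2=1.0):
    # x: received complex baseband samples; returns the equalized output.
    w = np.zeros(num_taps, complex)
    w[num_taps // 2] = 1.0                       # center-spike initialization
    y = np.zeros(len(x) - num_taps, complex)
    for n in range(len(y)):
        u = x[n:n + num_taps][::-1]              # regressor, most recent sample first
        y[n] = np.dot(w.conj(), u)
        e = y[n] * (np.abs(y[n]) ** 2 - R2)      # constant-modulus error term
        w -= mu * e.conj() * u                   # stochastic-gradient tap update
    return y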
Regularized iterative image restoration based on an iteratively updated convex smoothing functional
Author(s):
Moon Gi Kang;
Aggelos K. Katsaggelos
Show Abstract
The determination of the regularization parameter is an important issue in regularized image restoration, since it controls the trade-off between fidelity to the data and smoothness of the solution. A number of approaches have been developed in determining this parameter. In this paper, we propose the use of a regularization functional instead of a constant regularization parameter. The properties such a regularization functional should satisfy are investigated, and two specific forms of it are proposed. An iterative algorithm is proposed for obtaining a restored image. The regularization functional is defined in terms of the restored image at each iteration step, therefore allowing for the simultaneous determination of its value and the restoration of the degraded image. Both proposed iteration adaptive regularization functionals are shown to result in a smoothing functional with a global minimum, so that its iterative optimization does not depend on the initial conditions. The convergence of the algorithm is established and experimental results are shown.
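The following numerical sketch shows the flavor of an iteration in which the regularization weight is recomputed from the current restored image rather than fixed in advance. The blur operator (a symmetric uniform filter, so it is its own adjoint), the Laplacian regularization operator, and the specific form of the weight are assumptions for illustration, not the authors' functional.

import numpy as np
from scipy.ndimage import uniform_filter, laplace

def restore(y, iterations=200, beta=0.05):
    blur = lambda img: uniform_filter(img, size=5)   # symmetric blur: self-adjoint
    x = y.astype(float).copy()
    for _ in range(iterations):
        residual = y - blur(x)
        cx = laplace(x)
        # Iteration-adaptive regularization weight, recomputed from the current
        # iterate (illustrative form): it grows as the data term shrinks relative
        # to the roughness of the estimate.
        alpha = np.sum(residual ** 2) / (np.sum(cx ** 2) + np.sum(y.astype(float) ** 2))
        x = x + beta * (blur(residual) - alpha * laplace(cx))
    return x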
Robust motion vector prediction algorithms with application to very low bit rate image sequence coding
Author(s):
Taner Ozcelik;
Aggelos K. Katsaggelos
Show Abstract
In this paper a new motion compensated (MC) predictive coding method for image sequences at very low bit rates is presented. This method utilizes a prediction of the displacement vector field (DVF) in order to produce an MC prediction error which is coded and transmitted. Assuming that the two previous frames are available both at the transmitter and the receiver, the DVF corresponding to the previous frame is estimated. Then, based on this estimate, a temporal prediction of the DVF at the current frame is obtained. Using the assumption that the motion is constant along its trajectory, an auto-regressive (AR) model based method for performing such a prediction is proposed. According to the proposed scheme there is no need to transmit the DVF. Since the transmission of the DVF represents a sizeable overhead in very low bit rate coding of video signals, this method constitutes a major contribution to bit rate reduction and quality improvement. Using the predicted DVF, the MC prediction error is obtained, which is then transform and entropy coded. The proposed algorithm is experimentally tested on standard video-conferencing image sequences and compared to previously reported methods which transmit the motion vectors. Significantly improved results are obtained compared to the methods that transmit the motion vectors, in terms of reduced bit rate and improved quality of the reconstructed image sequence.
Adaptive subband coding of full motion video
Author(s):
Kamran Sharifi;
Leping Xiao;
Alberto Leon-Garcia
Show Abstract
In this paper a new algorithm for digital video coding is presented that is suitable for digital storage and video transmission applications in the range of 5 to 10 Mbps. The scheme is based on frame differencing and, unlike recent proposals, does not employ motion estimation and compensation. A novel adaptive grouping structure is used to segment the video sequence into groups of frames of variable sizes. Within each group, the frame difference is taken in a closed loop Differential Pulse Code Modulation (DPCM) structure and then decomposed into different frequency subbands. The important subbands are transformed using the Discrete Cosine Transform (DCT) and the resulting coefficients are adaptively quantized and runlength coded. The adaptation is based on the variance of sample values in each subband. To reduce the computation load, a very simple and efficient way has been used to estimate the variance of the subbands. It is shown that for many types of sequences, the performance of the proposed coder is comparable to that of coding methods which use motion parameters.
Fast gray-level morphological transforms with any structuring element
Author(s):
Christopher Gratin;
Hugues Talbot
Show Abstract
This paper presents efficient algorithms to perform standard morphological operations on gray-level images, such as erosions and dilations, with any structuring element. In the first section the general case is studied. A solution taking advantage of overlapping areas of the structuring elements and involving either a hierarchical approach or a simple sort of the pixels of the image is described, along with a comparison of this algorithm with existing methods. The proposed method shows significant improvement over these methods. Some means of accelerating the algorithm further are also indicated. In the second section, the particular case of the line segment as structuring element is studied, and a new method for dealing with these structuring elements, oriented in any direction, is proposed; it features a constant, optimal computing time with respect to the length of the structuring elements. Extensions to other types of structuring elements using the described technique are also proposed.
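For reference, the operations being accelerated can be expressed with SciPy's stock gray-level morphology routines and an arbitrary (here line-segment) footprint; the paper's fast overlap-exploiting algorithm itself is not reproduced here, and the structuring-element parameters are illustrative.

import numpy as np
from scipy.ndimage import grey_dilation, grey_erosion

def line_footprint(length, angle_deg):
    # Boolean footprint of a line segment at the given orientation.
    t = np.deg2rad(angle_deg)
    half = length // 2
    size = 2 * half + 1
    se = np.zeros((size, size), bool)
    for s in range(-half, half + 1):
        r = half + int(round(s * np.sin(t)))
        c = half + int(round(s * np.cos(t)))
        se[r, c] = True
    return se

image = np.random.randint(0, 256, (128, 128)).astype(np.uint8)
se = line_footprint(length=15, angle_deg=30)
dilated = grey_dilation(image, footprint=se)
eroded = grey_erosion(image, footprint=se)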
Integration of picture painting process by domain-specific extensible system
Author(s):
Yasuhide Miyamoto;
Hiroki Ino;
Yo Murao;
Hajime Enomoto;
Yo Moriya;
Minoru Kamoshida
Show Abstract
A painting system has been developed to generate arbitrary still and moving pictures from within one system. The system is built on a new language, Extensible WELL (Window-based Elaboration Language). Extensible WELL is organized to be usable as a language for any specified domain. An appropriate interface is required between Extensible WELL and a language for a specified domain. It is expressed as a schema corresponding to the processes in the specified domain, and is called the object network. Two kinds of object networks are provided for the painting processes for still and moving pictures; they are combined and then added to Extensible WELL. Thus, based on Extensible WELL, the painting processes for still and moving pictures are integrated into a single system.
Interactive specification and data schema for picture painting process
Author(s):
Yuji Hashimoto;
Takeshi Murano;
Yo Murao;
Hajime Enomoto
Show Abstract
In a service process of a software system, the intention of a client is satisfied through cooperative communication between the client and the server. For a description of the specification of the system, it is necessary that this communication be explicitly represented as an interaction. This is called interactive specification. In the interaction of the picture painting process, the cooperative communication between client and server takes place through queries of the picture data in the database. Picture data carry much more information than numerical or character data. This paper defines the data schema for picture painting so that the client can store, retrieve, and process the data in the database efficiently. This data schema also has an extended relational data model with a hierarchical structure along the above process. Two types of schema are described in this paper. One is the denotation schema of the interactive specification. The other is the data schema for the picture painting process. They share the same idea of description.
Flexible search-based approach for morphological shape decomposition
Author(s):
Joseph M. Reinhardt;
William E. Higgins
Show Abstract
Mathematical morphology is well-suited to capturing geometric information. Hence, morphology-based approaches have been popular for object shape representation. The two primary morphology-based approaches, the morphological skeleton and the morphological shape decomposition (MSD), each represent an object as a collection of disjoint sets. A practical shape representation scheme, though, should give a representation that is computationally efficient to use. Unfortunately, little work has been done for the morphological skeleton and the MSD to address efficiency. We propose a flexible search-based shape decomposition scheme that typically gives more efficient representations than the morphological skeleton or MSD. Our method decomposes an object into a number of simple components based on homothetics of a set of structuring elements. To form the representation, the components are combined using set union and set difference operations. We use three constituent component types and a thorough cost-based search strategy to find efficient representations. We also consider allowing some object representation error, which may yield even more efficient representations.
Comparison of ISO MPEG1 and MPEG2 video-coding standards
Author(s):
Andria H. Wong;
Cheng-Tie Chen
Show Abstract
The MPEG1 video coding standard is an international standard primarily aimed at storage applications and was adopted in 1992. As a second work item of the ISO MPEG committee, the MPEG2 standard is being considered for broader applications. In this paper, we investigate the advantages of the more complicated MPEG2 standard over MPEG1. An overview of the technology of the two standards and their major differences is given. Simulation results are presented to compare the performance of the two standards in terms of quality, quantization, bit count, and buffer content. It is shown that MPEG2 yields better performance, especially for highly interlaced video sequences.
New multialphabet multiplication-free arithmetic codes
Author(s):
Shawmin Lei
Show Abstract
Arithmetic coding is a powerful lossless data compression technique that has attracted much attention in recent years. It provides more flexibility and better efficiency than the celebrated Huffman coding does. However, the multiplications needed in its encoding and decoding algorithms are very undesirable. Rissanen and Mohiuddin have proposed a simple scheme to avoid the multiplications. We found that the performance of their proposed scheme may degrade significantly in some cases. In this paper, we propose a multiplication-free multialphabet arithmetic code which can be shown to have only minor performance degradation in all cases. In our proposed scheme, each multiplication is replaced by a single shift-and-add. We prove, by both theoretical analysis and simulation results, that the degradation of the proposed multiplication-free scheme is always several times smaller than that of the Rissanen-Mohiuddin scheme.
Stochastic optimal control of variable bit-rate video coders
Author(s):
Jose Ignacio Ronda;
Fernando Jaureguizar;
Narciso N. Garcia
Show Abstract
Transmission of variable bit-rate encoded video over either constant or variable bit-rate communication channels imposes restrictions on the shape of the bit generation process in the encoder. In order for these restrictions to be met, a coder-network interface system consisting of a buffer and a coder controller device has to be included. In this paper we show how the availability of appropriate statistical models of the behavior of the source-coder system allows the controller design problem to be formalized as a stochastic optimal control problem, which can be solved by direct application of dynamic programming algorithms. Focusing on the adaptation of a hybrid DCT coder to a fixed bit-rate channel, examples of the three stages of the process (source-coder modeling, problem definition, and optimal policy computation) are provided.
Efficient coding method for stereo image pairs
Author(s):
Wenhua Li;
Ezzatollah Salari
Show Abstract
In this paper, a new coding method for stereo image pairs is presented. Disparity vectors are estimated with a variable-size block matching algorithm which requires fewer overhead bits for transmission. The different statistical properties of original images and disparity-compensated images are discussed. The efficiency of DCT compression is fully utilized by adaptively switching between the original image data and the disparity-compensated image data. Simulation results demonstrate that the proposed scheme performs well.
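A minimal sketch of disparity estimation by block matching with the sum of absolute differences follows, assuming a rectified pair and fixed-size blocks; the paper's variable-size blocks and adaptive DCT switching are not reproduced, and the block size and search range are illustrative.

import numpy as np

def block_disparity(left, right, block=16, max_disp=32):
    # left, right: rectified grayscale images of identical size.
    h, w = left.shape
    disp = np.zeros((h // block, w // block), int)
    for by in range(h // block):
        for bx in range(w // block):
            y, x = by * block, bx * block
            ref = left[y:y + block, x:x + block].astype(int)
            best, best_d = np.inf, 0
            for d in range(0, min(max_disp, x) + 1):      # candidate disparities
                cand = right[y:y + block, x - d:x - d + block].astype(int)
                sad = np.abs(ref - cand).sum()            # sum of absolute differences
                if sad < best:
                    best, best_d = sad, d
            disp[by, bx] = best_d
    return disp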
High-quality subband image coding of TV signals at 5 Mbit/s with motion compensation interpolation and visually optimized scalar quantization
Author(s):
Jose Hanen;
Dominique Barba
Show Abstract
To obtain a high level of quality when coding a TV color video signal at a high compression rate, spatial subband decomposition and differential coding with motion compensation and psychovisual quantization have previously been used successfully for coding a color television sequence at a rate of 10 Mbit/s. However, these techniques are not efficient enough to achieve a rate of 5 Mbit/s without a significant reduction of the spatial resolution and of the quantization quality. We present a subband coding method which maintains both the full spatial resolution and the quantization quality. This method is based on a frame-skipping algorithm; it primarily uses motion compensation techniques for the interpolation of non-coded frames and the prediction of coded frames, and secondarily a spatial subband decomposition of the prediction error before a coding step which uses a set of visually optimized scalar quantizers; the quantization in the spatial frequency domain is adapted to visual perception by incorporating some important aspects of the human visual system (frequency dependence and masking).
Multichannel regularized iterative restoration of image sequences
Author(s):
Mun Gi Choi;
Ozan E. Erdogan;
Nikolas P. Galatsanos;
Aggelos K. Katsaggelos
Show Abstract
The recent advances in visual communications make restoration of image sequences an increasingly important problem. In addition, this problem finds applications in other fields such as robot guidance and target tracking. Restoring the individual frames of an image sequence independently is a suboptimal approach because the between-frame relations of the image sequence are not explicitly incorporated into the restoration algorithm. In this paper we address this problem by proposing a family of multichannel algorithms that restore the multiple time frames (channels) simultaneously. This is accomplished by using a multichannel regularized formulation in which the regularization operator captures both within-frame and between-frame (channel) properties of the image sequence. More specifically, this operator captures both the spatial within-frame smoothness and the temporal between-frame smoothness along the direction of the motion. We propose a number of different methods to define multichannel regularization operators and a family of algorithms to iteratively obtain the restored images. We also present experiments that demonstrate that the proposed approach produces significant improvements over traditional independent-frame restoration of image sequences.
Structural limitations of self-affine and partially self-affine fractal compression
Author(s):
Jaroslaw Domaszewicz;
Vinay A. Vaishampayan
Show Abstract
Fractal image compression using self-affine transformations has recently drawn considerable attention. Although some elements of the technique have a well-established foundation, many issues remain unclear. We consider the attractors that are obtained by varying the parameters of the contractive transformation. We show that the parameters can be divided into two groups and that if the parameters in the first group are fixed, the set of attractors obtained by varying the parameters in the second group is a vector space. Based on this observation, an improvement to the collage coding technique for encoding data is obtained. We then present a coder, referred to as the classified transform coder, which is structurally limited in the same way as the fractal coder. However, in the classified transform coder, the design of the pool of subspaces is directly addressed. Finally, some performance results are presented for the classified transform coder.
Estimation of visual bandwidths and their impact in image decomposition and coding
Author(s):
Abdelhakim Saadane;
Dominique Barba;
Hakim Senane
Show Abstract
In order to characterize the spatial frequency mechanisms of the visual system, we measured the visibility threshold elevation as a function of the spatial frequency of cosine maskers. The stimulus and the maskers used were spatially localized and temporally weighted. The results show that the relative bandwidth (defined as the ratio between the estimated bandwidth and the frequency of the masker) varies from 3 at low masker frequencies (1 cycle/degree) to 1.15 at high masker frequencies (10 cycles/degree). This is consistent with a model having five classes of spatial frequency mechanisms covering the band 0 - 30 cycles/degree. These results allow the definition of a subband decomposition of images into twenty-one `visual components.'
Tracking of global motion and facial expressions of a human face in image sequences
Author(s):
Marcel J. T. Reinders;
F. A. Odijk;
Jan C. A. van der Lubbe;
Jan J. Gerbrands
Show Abstract
We present a system in which the global motion (3D rotation and translation) and the local motion (facial expressions) of the face of a talking person are estimated automatically from an input image sequence. First, the shapes of the facial features, such as the eyes and mouth, are robustly extracted from the images. Then, based on the extracted shapes of the facial features, the global motion and facial expressions are estimated. No human assistance is necessary throughout the process. The system relies on the use of a priori knowledge about the scene and facial motions. In the feature extraction scheme this a priori knowledge is modeled by a descriptive tree of the scene and geometric shape representations of the facial features. The global motion can be easily obtained from these facial features. In the local motion estimation scheme the a priori knowledge is modeled in a belief network in which knowledge about muscle actuators is represented, e.g., the interactions between the muscle actuators and their visible manifestations. A few experiments are included to illustrate the system.
Real-time reprogrammable low-level image processing: edge detection and edge tracking accelerator
Author(s):
M. Meribout;
Kun Mean Hou
Show Abstract
Currently, segmentation algorithms in image processing involve a compromise between real-time video-rate processing and accuracy. In this paper, we present an efficient non-recursive filter derived from the Deriche filter. The algorithm is implemented in hardware using FPGA technology, which permits video-rate edge detection. In addition, the FPGA board is used as an edge-tracking accelerator; it allows us to greatly reduce execution time by avoiding scanning the whole image. We also present the architecture of our vision system, which is dedicated to building a 3D scene every 200 ms.
Tradeoffs in the design of wavelet filters for image compression
Author(s):
Patrice Onno;
Christine M. Guillemot
Show Abstract
This paper addresses the problem of joint optimization of the wavelet transform, quantization, and data rate allocation according to mathematical criteria for high compression efficiency of image coding algorithms. The relevance of some filter bank properties for compression purposes is evaluated. Using lattice structures, a large number of orthogonal and biorthogonal wavelet filter banks, with different properties of regularity, coding gain, phase linearity, and cross-correlation between adjacent bands, are designed. Scalar and lattice vector quantization is then optimized adaptively to the filter bank characteristics and to the signal statistics. An appropriate choice of transition bandwidth, decreasing the energy around the Nyquist frequency without constraints of `zeros' at ω = π, provides, by the maximum selectivity criterion, filter banks close in performance to the filters that we found optimum and designed to satisfy either the maximum coding gain or the minimum cross-correlation criterion. For a lower transition bandwidth, the increased regularity has the effect of increasing the coding gain, which reaches a maximum for the maximally regular Daubechies filters. When comparing coding results obtained with the optimal orthogonal wavelet filter bank to those provided by a maximally frequency-selective biorthogonal solution of the same regularity, it is observed that, for a comparable peak SNR, contours are better reconstructed with the biorthogonal solutions.
Temporal redundancy reduction using a motion model hierarchy and tracking for image sequence coding
Author(s):
Henri Nicolas;
Fabrice Moscheni
Show Abstract
This paper introduces an improved variable-size block-based motion estimation algorithm relying on a hierarchy of motion representations. The most efficient of these representations is chosen through an evaluation constraint which models the total bit rate. The proposed coding technique exploits the concepts of temporal tracking and localization of the displaced frame difference energy. Moreover, only the smallest regions of the displaced frame difference are coded and transmitted. Simulations show a significant improvement in performance.
Active mesh: a video representation scheme for feature seeking and tracking
Author(s):
Yao Wang;
Ouseb Lee
Show Abstract
This paper introduces a representation scheme for images and video sequences using nonuniform samples embedded in a mesh structure. It describes a video sequence by the nodal positions and colors in a starting frame, followed by the nodal displacements in the following frames. The nodal points are more densely distributed in regions containing interesting features such as edges and corners, and are dynamically updated to follow the same features in successive frames. They are determined automatically by maximizing feature (e.g., gradient) magnitudes at nodal points, while minimizing interpolation errors within individual elements, and matching errors between corresponding elements. In order to avoid the mesh elements becoming overly deformed, a penalty term is also incorporated which measures the irregularity of the mesh structure. The notions of shape functions and master elements commonly used in the finite element method have been employed to simplify the numerical calculation of the energy functions and their gradients. The proposed representation is motivated by the active contour or snake model proposed by Kass, Witkin, and Terzopoulos. The current representation retains the salient merit of the original model as a feature tracker based on local and collective information, while facilitating more accurate image interpolation and prediction.
Using subjective redundancy for DCT coding of moving images
Author(s):
Nan Li;
Stefaan Desmet;
Albert A. Deknuydt;
Luc Van Eycken;
Andre J. Oosterlinck
Show Abstract
An investigation is carried out on employing subjective redundancy to improve DCT coding with respect to non-trackable motion in video signals. The human visual system (HVS) is exploited through two aspects of its behavior: the sensitivity to a visual stimulus and visual masking. Several measures that can exploit these aspects are discussed and tested. The results show that: (1) during non-trackable motion, the HVS's response to the reconstruction error is reduced because of the loss of sensitivity and the increase of the masking effect; (2) as far as the sensitivity is concerned, adjusting the quantization weighting function with respect to the motion is unlikely to bring substantial improvement; (3) the magnitude of the visual masking of the reconstruction error by the image content is positively related to the spatial masking function and to the relative speed between the eyes and the image. An example of application to motion-compensated DCT coding is presented.
Image restoration using vector classified adaptive filtering
Author(s):
Paul Richardson
Show Abstract
This paper describes a novel adaptive filtering technique for image reconstruction, `Vector Classified Adaptive Filtering' (VCAF), and provides comparative results for reconstructing images corrupted by additive Gaussian noise. Each sample in the image being reconstructed is first classified by mapping a classification vector onto a codebook, as is done in Vector Quantization (VQ) image coding. The classification vector is a set of samples from the neighborhood of the one being reconstructed, and the codebook is a set of vectors designed off line using representative images, a model of the distortion function, and VQ codebook design techniques. Once the classification of the local region has been determined, an optimal reconstruction filter is used to estimate the correct sample value. In this paper, Wiener filtering techniques and a priori information from representative images are used to design least-squares optimal reconstruction filters for each class in the codebook. Thus VCAF is a highly adaptive filter that incorporates a priori knowledge of typical image statistics in the form of a classification codebook and an optimal reconstruction filter for each class. Experimental results indicate that VCAF performs very well with respect to the techniques with which it was compared, specifically non-adaptive Wiener filtering and edge-orientation-adaptive Wiener filtering.
Data compression methods for document image including screened halftones
Author(s):
Takeshi Masui;
Yasuhiro Nasu;
Eiji Shimizu
Show Abstract
We have developed a new binarization method and a coding scheme for document images, which are bi-level images that include screened halftones such as dithered images and error-diffused images. The screened-halftone areas of a document image are not well suited to conventional compression techniques. We propose a new data compression method for screened halftones. Our proposed method makes use of region segmentation, a block coding operation, and an arithmetic coding operation. In this paper, we describe these new data compression methods for document images that include screened halftones.
New approach for applying high-order entropy coding to image data
Author(s):
Steve S. Yu;
Nikolas P. Galatsanos
Show Abstract
Entropy coding is a well-known method for exploiting statistical redundancy in order to compress image data. Information theory indicates that the coding efficiency can be improved by utilizing high-order entropy coding (HEC). However, due to the high complexity of the implementation and the difficulties in estimating the high-order statistics during the coding process, high-order entropy coding has not been widely used. Conditional coding of an Lth-order Markov source requires 2^(KL) code tables with 2^K probabilities in each table. In this paper, we present a new approach called binary decomposed high-order entropy coding (BDHEC) that significantly reduces the complexity of implementing HEC techniques. Furthermore, it increases the accuracy of estimating the statistical model and thus also improves the effectiveness of HEC for practical applications. The novelty of this approach is that the K-bit grayscale image, with M = 2^K representation levels, is decomposed into M binary sub-images, each corresponding to one of the M possible pel values. Since each sub-image has only two representation levels, K is reduced to 1, the smallest possible value. Thus, when high-order conditional entropy coding is applied to these sub-images instead of the original image, the implementation complexity is significantly reduced and the accuracy of estimating the statistical model is increased. Theoretical analysis and experimental results are presented, which verify the value of BDHEC.
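A minimal sketch of the binary decomposition at the heart of this approach: a K-bit image with M = 2^K gray levels is split into M binary indicator sub-images, one per gray level. The high-order conditional entropy coder applied to each sub-image is not shown.

import numpy as np

def binary_decompose(image, K=8):
    # One binary indicator sub-image per gray level of the K-bit image.
    M = 2 ** K
    return [(image == level).astype(np.uint8) for level in range(M)]

def binary_recompose(sub_images):
    out = np.zeros(sub_images[0].shape, np.uint8)
    for level, mask in enumerate(sub_images):
        out[mask == 1] = level
    return out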
New class of 2D multirate filter bank for digital image coding
Author(s):
Leu-Shing Lan;
Irving S. Reed
Show Abstract
A new type of perfect reconstruction filter bank is presented in this paper. This new class of filter bank simultaneously has the following distinct features: linear phase, distortion-free operation, an IIR-FIR filter pair, closed-form expressions, orthogonal projection, and factorization-free design. In contrast, other existing filter banks do not possess all these properties simultaneously. For example, the classic Quadrature Mirror Filter (QMF) does not have the perfect reconstruction property. Another example is the Conjugate Quadrature Filter (CQF), which lacks the linear-phase property. A qualitative comparison of different types of filter banks is also included. This newly developed perfect reconstruction IIR-FIR filter bank is then extended to the 2-D case. Both separable and non-separable filters are considered. In addition, both rectangular and non-rectangular downsampling schemes are discussed. Numerical optimization procedures are used to compute the optimal filter coefficients. The results are listed in tables for quick reference.
Transform-domain postprocessing of DCT-coded images
Author(s):
Chung-Nan Tien;
Hsueh-Ming Hang
Show Abstract
Data compression algorithms are developed to transmit massive image data over limited channel capacity. When the channel rate is not sufficient to transmit good quality compressed images, a degraded image is reconstructed at the decoder. In this situation, a postprocessor can be used to improve the received image quality. Ideally, the objective of postprocessing is to restore the original pictures from the received distorted pictures. However, when the received pictures are heavily distorted, there may not be enough information to restore the original images. Then, what a postprocessor can do is to reduce the subjective artifacts rather than to minimize the differences between the received and the original images. In this paper, we propose two postprocessing techniques, namely, error pattern compensation and inter-block transform coefficient adjustment. Since Discrete Cosine Transform (DCT) coding is widely adopted by the international image transmission standards, our postprocessing schemes are proposed in the DCT domain. When the above schemes are applied to highly distorted images, quite noticeable subjective improvement can be observed.
Affine-transform-based image vector quantizer
Author(s):
Madaparthi B. Brahmanandam;
Sethuraman Panchanathan;
Morris Goldberg
Show Abstract
In this paper, we propose an affine transform based vector quantization (ATVQ) technique for image coding applications. Vector quantization (VQ) is intrinsically superior to predictive coding, transform coding, and other suboptimal and ad hoc procedures. The limitation of VQ is the very large codebook that must be generated and stored. The proposed affine transform based vector quantization technique addresses this problem. The image to be coded is partitioned into disjoint square blocks. Each block is regarded as a vector and is encoded by searching through a set of affine transforms and a codebook of templates. The transform-template pair that reconstructs an approximation of the input vector with minimum distortion is selected. The parameters and the index of the affine transform and the index of the template constitute the codeword of the input vector. In decoding, the image vector is reconstructed by applying the inverse of the affine transform to the template. ATVQ can reconstruct more input vectors without any distortion than conventional VQ can, using the same codebook. Simulation results show that the technique performs well using a universal codebook. This technique is also suitable for progressive image transmission, as its performance is good at very low bit rates.
Hierarchical image sequence coding with tree-structured vector quantization
Author(s):
Ulug Bayazit;
William A. Pearlman
Show Abstract
The paper presents two different approaches to image sequence coding which exploit the spatial frequency statistics as well as the spatial and temporal correlation present in the video signal. The first approach is the pyramidal decomposition of the Motion Compensated Frame Difference (MCFD) signal in the frequency domain and the subsequent coding by unbalanced Tree Structured Vector Quantizers (TSVQ) designed to match the statistics of the frequency bands. The type of TSVQ used in this study possesses the advantage of low computational complexity with coding performance comparable to full-search vector quantization. The second approach is similar except that the order of motion estimation/compensation and pyramidal decomposition is interchanged.
Evaluation of M-band orthonormal filter banks: hierarchical and direct structures
Author(s):
Adil Benyassine;
Ali Naci Akansu
Show Abstract
This paper compares the objective and subjective performance of direct and hierarchical subband decomposition schemes for image coding. It is observed that the hierarchical subband schemes outperform the direct structures in image coding. It is also shown that the dyadic tree subband codec outperforms the full-tree case for the image coding experiments performed in this study.
Flicker-free field-sequential stereoscopic TV system compatible with current PAL system
Author(s):
Guoyu Yang;
XiaoYun Shen;
Hongbo Sun;
Chang Li
Show Abstract
After a comparison with several typical field-sequential stereoscopic TV systems, a flicker-free field-sequential stereoscopic TV system compatible with the current PAL system is proposed. In addition to the frame memory technique, composite sync resetting and line interpolation techniques are adopted in this system; therefore, it has the advantages of a fine and smooth scanning structure, high vertical resolution, and easy application.
Psychovisual-based distortion measure for monochrome image compression
Author(s):
Navin Chaddha;
Teresa H.-Y. Meng
Show Abstract
In this paper we describe a quantitative distortion measure for judging the quality of compressed monochrome images based on a psychovisual model. Our model follows human visual perception in that the distortion as perceived by a human viewer is dominated by the compression error uncorrelated with the local features of the original image. We performed subjective tests to obtain ranking results for images compressed using different compression algorithms and compared the results with the rankings obtained using our distortion measure and other existing mean-square-error-based distortion measures. We found that our distortion measure's ranking matches the subjective ranking perfectly, whereas the mean-square error and its variants are only 60% correct on average.
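As a rough illustration of a feature-weighted distortion measure in this spirit, the sketch below down-weights the squared error where the original image has strong local activity (which masks the error) and counts it more heavily in flat regions; the window size and the specific weighting are assumptions for illustration, not the authors' psychovisual model.

import numpy as np
from scipy.ndimage import uniform_filter

def feature_weighted_distortion(original, compressed, window=7, k=0.01):
    original = original.astype(float)
    squared_error = (original - compressed.astype(float)) ** 2
    mean = uniform_filter(original, size=window)
    local_variance = uniform_filter(original ** 2, size=window) - mean ** 2
    weight = 1.0 / (1.0 + k * np.maximum(local_variance, 0.0))   # local activity masks the error
    return float(np.mean(weight * squared_error))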
Finding the optimal wavelet functions for image representation
Author(s):
David Stanhill;
Yehoshua Y. Zeevi
Show Abstract
A complete and unique parametrization is given for all compactly supported, orthogonal, two-dimensional wavelet functions for the case of the quincunx decimation matrix. Once we have a complete parametrization, we search for optimal wavelet functions. Here we concentrate on the task of image representation and compression. We suggest three different criteria for measuring optimality. The first is tailored to each specific image, the second measures optimality in a more statistical sense, and the third uses the general quality of vanishing moments. We compare the three criteria, which at first seem very different in nature, and show that they give very similar results.
Video coding using the MPEG-2 compression standard
Author(s):
Atul Puri
Show Abstract
The second phase of the Moving Picture Experts Group (MPEG-2) is ongoing and so far has resulted in completion of algorithmic tools for video coding with this standard. This standard is generic in nature and is organized around the concept of profiles and levels to address a wide range of applications and bit-rates. It not only supports coding of interlaced and progressive format video as a single layer, but also supports scalable video coding. The MPEG-2 standard, like the MPEG-1 standard, specifies a syntax for the bitstream and the semantics of the decoding process; encoders can be designed for best tradeoff of performance versus complexity, based on the application. We describe the various tools available in the MPEG-2 standard and discuss the potential of each tool for improving coding performance or enabling new applications.
Workstation-PC multipoint video teleconferencing implementation on packet networks
Author(s):
Mehmet Reha Civanlar;
J. P. Worth;
Glenn L. Cash
Show Abstract
The increasing availability of high speed local area networks (LANs) and the recent developments in image coding hardware that follow the international standardizations have made multipoint desktop video teleconferencing feasible using mostly off-the-shelf hardware and software. We have implemented a multipoint video teleconferencing system using workstations and personal computers (PCs) on a fiber distributed data interface (FDDI) LAN. The system is based on motion JPEG image coding and uses commercially available hardware and software wherever possible. In this paper, we outline our design experiences.
Multiresolution coding for digital transmission
Author(s):
Sheng-Wei Lin;
WenThong Chang
Show Abstract
Joint consideration of signal bit-stream priority in source coding and of the Euclidean distance between signal points in modulation achieves stepwise degradation in picture quality. In this multiresolution strategy, system performance is affected by how the original images are decomposed and by how many signal levels are used in modulation. With a 1:2 bit-rate ratio between the two components in multiresolution 64-QAM transmission, we investigate a subband coder, a pyramid coder, and a decomposed DCT coder for multiresolution source coding. Optimum quantization is employed, with the nonuniform quantizer matched to a Laplacian distribution. Simulation reveals that the decomposed DCT coder is a better candidate for multiresolution transmission, having the advantages that the source coder is simple and not sensitive to bit-ratio variation.
Matrix algebra approach to Gabor-type image representation
Author(s):
Meir Zibulski;
Yehoshua Y. Zeevi
Show Abstract
Properties of basis functions which constitute a finite scheme of discrete Gabor representation are investigated. The approach is based on the concept of frames and utilizes the Piecewise Finite Zak Transform (PFZT). The frame operator associated with the Gabor-type frame is examined by representing it as a matrix-valued function in the PFZT domain. The frame property of the Gabor representation functions is examined in relation to the properties of the matrix-valued function. The frame bounds are calculated by means of the eigenvalues of the matrix-valued function, and the dual frame, which is used in the calculation of the expansion coefficients, is expressed by means of the inverse matrix. DFT-based algorithms for computation of the expansion coefficients and for the reconstruction of signals from these coefficients are generalized for the case of oversampling of the Gabor space. It is illustrated by an example that a better reconstruction is obtained from the same number of coefficients in the case of oversampling.