Proceedings Volume 1199

Visual Communications and Image Processing IV


View the digital version of this volume at SPIE Digital Library.

Volume Details

Date Published: 1 November 1989
Contents: 1 Session, 157 Papers, 0 Presentations
Conference: 1989 Symposium on Visual Communications, Image Processing, and Intelligent Robotics Systems
Volume Number: 1199

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
All Papers
Multiresolution Techniques For Image Representation, Analysis, And 'Smart' Transmission
Peter J. Burt
Important new techniques for representing and analyzing image data at multiple resolutions have been developed over the past several years. Closely related multiresolution structures and procedures have been developed more or less independently in diverse scientific fields. For example, pyramid and subband representations have been applied to image compression, and promise excellent performance and flexibility. Similar pyramid structures have been developed as models for the neural coding of images within the human visual system. The pyramid has been developed in the computer vision field as a general framework for implementing highly efficient algorithms, including algorithms for motion analysis and object recognition. In this paper I review these multiresolution techniques and discuss how they may be usefully combined in the future. Methods used in image compression, for example, should match the requirements of human perception, and future 'smart' transmission systems will need to perform rapid analysis in order to selectively encode the most critical information in a scene.
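The pyramid representations surveyed in this abstract can be illustrated in a few lines of code. The following is a minimal sketch of a Gaussian/Laplacian pyramid in Python, assuming grayscale float images and a 5-tap binomial kernel; it illustrates the general technique only and is not code from the paper.

```python
# Minimal Gaussian/Laplacian pyramid sketch (illustrative only; not the paper's code).
import numpy as np

KERNEL = np.array([1., 4., 6., 4., 1.]) / 16.0   # separable binomial low-pass

def _blur(img):
    # Separable convolution with reflected borders.
    pad = 2
    tmp = np.pad(img, pad, mode='reflect')
    tmp = np.apply_along_axis(lambda r: np.convolve(r, KERNEL, mode='same'), 1, tmp)
    tmp = np.apply_along_axis(lambda c: np.convolve(c, KERNEL, mode='same'), 0, tmp)
    return tmp[pad:-pad, pad:-pad]

def gaussian_pyramid(img, levels=4):
    pyr = [img.astype(float)]
    for _ in range(levels - 1):
        pyr.append(_blur(pyr[-1])[::2, ::2])      # blur then subsample by 2
    return pyr

def laplacian_pyramid(img, levels=4):
    gp = gaussian_pyramid(img, levels)
    lp = []
    for fine, coarse in zip(gp[:-1], gp[1:]):
        up = np.repeat(np.repeat(coarse, 2, axis=0), 2, axis=1)[:fine.shape[0], :fine.shape[1]]
        lp.append(fine - _blur(up))               # band-pass residual at this scale
    lp.append(gp[-1])                             # coarsest low-pass level
    return lp

if __name__ == '__main__':
    image = np.random.rand(64, 64)
    print([level.shape for level in laplacian_pyramid(image)])
```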
Some New Techniques in Displacement Estimation Algorithms
Kan Xie, Luc Van Eycken, Andre Oosterlinck
In this paper, some new techniques in displacement estimation algorithms are presented. The concepts of the correlation of motion vectors and motion tendency estimation are introduced to improve the performance of conventional displacement estimation algorithms. A simple formula for the pel-recursive algorithm has been derived from the theoretical differential method, and the tenable and convergence conditions for this formula are discussed as well. In actual image processing, "oscillation" and "instability" may occur in places where the formula is invalid, and the displacement estimate may be far from the actual one. Some new measures and techniques are introduced to alleviate these effects. Starting from published displacement estimation algorithms, a group of improved pel-recursive algorithms has been developed. In this paper, the emphasis is put on the evaluation of displacement estimation algorithms rather than the actual realization of a coding scheme using these algorithms. The performance of the proposed algorithms is evaluated and compared with that of the conventional algorithms. The experiments show that a substantial improvement has been obtained in both estimation accuracy and convergence rate. Current results indicate a reduction of the average number of iterations by a factor of 7-20, and a reduction of the match entropy by a factor of 3-20 on the image sequences examined.
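For readers unfamiliar with pel-recursive estimation, the sketch below shows a classical gradient-based (Netravali-Robbins-type) update in Python. It illustrates only the baseline idea, not the authors' improved formula; the gain and iteration count are illustrative assumptions.

```python
# Toy pel-recursive displacement update in the spirit of classical gradient-based
# (Netravali-Robbins-type) estimators; this is NOT the authors' improved formula.
# The gain eps and iteration count are illustrative assumptions.
import numpy as np

def pel_recursive_update(prev, curr, x, y, d=(0.0, 0.0), eps=0.05, iters=10):
    """Iteratively refine the displacement d = (dx, dy) at pixel (x, y)."""
    prev = np.asarray(prev, dtype=float)
    curr = np.asarray(curr, dtype=float)
    d = np.array(d, dtype=float)
    for _ in range(iters):
        xp = int(round(x - d[0]))          # displaced position in the previous frame
        yp = int(round(y - d[1]))
        if not (1 <= xp < prev.shape[1] - 1 and 1 <= yp < prev.shape[0] - 1):
            break
        dfd = curr[y, x] - prev[yp, xp]    # displaced frame difference
        gx = 0.5 * (prev[yp, xp + 1] - prev[yp, xp - 1])   # spatial gradient of the
        gy = 0.5 * (prev[yp + 1, xp] - prev[yp - 1, xp])   # previous frame at (xp, yp)
        d -= eps * dfd * np.array([gx, gy])   # steepest descent on the squared DFD
    return d
```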
Lossless Compression Of Block Motion Information In Motion Compensated Video Coding
Ali N. Akansu, Jung Hui Chien, M. S. Kadur
It is well known that motion compensation is a powerful approach to video coding. The redundancy within adjacent video frames is exploited by motion compensated interframe prediction. The motion information and the prediction error are transmitted to the receiver to reconstruct the video frames. The motion information must be compressed losslessly. Several lossless data compression techniques are tested and compared here for the motion information. It is observed that more than 50% compression of the motion data is achieved. This corresponds to approximately 0.05 bits/pel for the motion information in the tested scheme, which is about 10-15% of the total bits used in low bit rate video applications.
Multipredictive Motion Estimation Scheme With A Prediction Along Motion Axis
B. Choquet, D. Pele
This paper describes a motion estimation algorithm and its implementation for television pictures (temporal conversion, de-interlacing, prediction, reconstruction, coding). The inter-image processing techniques that we have developed are based on a motion estimation algorithm which determines a motion vector at each pixel that is to be interpolated. The motion estimation algorithm is a differential, iterative, pel-recursive method (Walker/Rao algorithm). The motion field is improved by a multi-predictive procedure with a prediction along the motion axis.
A Multiple-Frame Pel-Recursive Wiener-Based Displacement Estimation Algorithm
Serafim N. Efstratiadis, Aggelos K. Katsaggelos
In this paper, a multiple frame formulation of the pel-recursive Wiener-based displacement estimation algorithm [1] is presented. The derivation of the algorithm is based on the assumption that both the so-called update of the initial estimate of the displacement vector and the linearization error are samples of stochastic processes. A linear least-squares estimate of the update of the initial estimate of the displacement vector from the previous frame to the current is provided, based on w observations in a causal window W of each of the v previous frames. The sensitivity of the pel-recursive algorithms in the areas where occlusion occurs is studied and their performance is improved with adaptive regularization of the inverse problem that is involved. Based on our experiments with typical video-conferencing scenes, we concluded that the multiple frame Wiener-based algorithm performs better than the two-frame Wiener-based pel-recursive algorithm with respect to robustness, stability, and smoothness of the velocity field.
Spatio-Temporal Motion Compensated Noise Filtering Of Image Sequences
A. K. Katsaggelos, J. N. Driessen, S. N. Efstratiadis, et al.
In this paper the filtering of noise in image sequences using spatio-temporal motion compensated techniques is considered. Noise in video signals degrades both the image quality and the performance of subsequent image processing algorithms. Although the filtering of noise in single images has been studied extensively, there have been few results in the literature on the filtering of noise in image sequences. A number of filtering techniques are proposed and compared in this work. They are grouped into recursive spatio-temporal and motion compensated filtering techniques. A 3-D point estimator which is an extension of a 2-D estimator due to Kak [5] belongs in the first group, while a motion compensated recursive 3-D estimator and 2-D estimators followed by motion compensated temporal filters belong in the second group. The motion in the sequences is estimated using the pel-recursive Wiener-based algorithm [8] and the block-matching algorithm. The methods proposed are compared experimentally on the basis of the signal-to-noise ratio improvement and the visual quality of the restored image sequences.
Video Noise Reduction In Telecine Systems
Glenn Kennel, Mysore Raghuveer
A telecine is a motion picture scanner used to generate high quality video recordings of motion pictures. The noise characteristics of a telecine system were analyzed to determine a suitable scheme for reducing video noise across non-moving portions of a temporal sequence of images. It was found that while in principle median filtering is better than a scheme such as averaging, the improvement is not substantial and the simpler approach of averaging can very well be used.
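The averaging-versus-median comparison discussed in this abstract can be reproduced generically in a few lines. The sketch below, an illustration rather than the telecine system's implementation, filters a stack of co-registered frames temporally and compares the RMS error of the two schemes on synthetic noise.

```python
# Illustrative comparison of temporal averaging vs. median filtering across a
# stack of co-registered frames (non-moving regions). Generic sketch only; not
# the telecine system's actual implementation.
import numpy as np

def temporal_average(frames):
    # frames: array of shape (T, H, W); averaging reduces noise variance by 1/T.
    return frames.mean(axis=0)

def temporal_median(frames):
    # Median is more robust to outliers (e.g. film dirt) but costlier to compute.
    return np.median(frames, axis=0)

if __name__ == '__main__':
    rng = np.random.default_rng(0)
    clean = rng.uniform(0, 255, size=(64, 64))
    frames = clean + rng.normal(0, 10, size=(8, 64, 64))    # 8 noisy repeats
    for name, filtered in [('average', temporal_average(frames)),
                           ('median', temporal_median(frames))]:
        rmse = np.sqrt(np.mean((filtered - clean) ** 2))
        print(name, round(float(rmse), 2))
```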
3D Ultrasonic Images Filtering
Bruno Ayral, Bertrand Vachon
This paper presents a simple but efficient filtering method suited to the features of 3D ultrasonic images. It is based upon the order statistics filtering principle. The main idea is to use only a combination of 2D filters to process a 3D image, thus saving a large amount of processing time. This method is possible because of certain properties of ultrasonic images.
Evaluation Of Order Statistic Filters Using A Modified Signal To Noise Ratio
Renee L. J. Martens, A. N. Venetsanopoulos
One major problem in evaluating image filters is the lack of a quality measure that is objective and quantitative yet reflects the demands of the human visual system. This study uses two new quality measures that attempt to address this problem. The peak-to-peak signal-to-noise ratio (PSNR) for edge areas demonstrates how the edge areas are affected by the different filters, while the PSNR for the homogeneous areas similarly demonstrates the effect in the homogeneous areas. In some applications the edge areas are more important than the homogeneous areas, and vice versa. The results of the research therefore contribute to the study of filter effectiveness in terms of specific applications. In addition, the paper demonstrates the effect of different image types on the performance of order statistic filters. The study compares the following: 1) the adaptive trimmed mean filter [1], 2) the signal adaptive median filter [2], 3) the adaptive double window modified trimmed mean filter [3], 4) the adaptive window edge detecting median filter [45], 5) the median filter [6], 6) the α-trimmed mean filter [7], and 7) the contraharmonic mean [8]. The seven filters are compared using the PSNR, the PSNR for edge areas, and the PSNR for homogeneous areas. The edge areas and homogeneous areas are determined through the use of a simple range edge detector on the original noise-free image. Four different images are used for comparison: 1) 'Lenna', 2) 'Geometrical', 3) 'Harbor' and 4) 'Jones'. Three noise types are used: 1) impulsive, 2) additive and 3) a mixture of impulsive and additive noise.
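The region-restricted PSNR measures are straightforward to compute once edge and homogeneous masks are available. Below is a minimal Python sketch using a simple range edge detector on the noise-free original, as the abstract describes; the window size and threshold are illustrative assumptions.

```python
# Sketch of region-restricted PSNR: compute PSNR separately over edge and
# homogeneous pixels, with masks from a simple range edge detector applied to
# the noise-free original. Window size and threshold below are assumptions.
import numpy as np

def local_range(img, k=3):
    # Max minus min over a k x k neighborhood (simple range edge detector).
    H, W = img.shape
    pad = k // 2
    p = np.pad(img.astype(float), pad, mode='edge')
    mx = np.full((H, W), -np.inf)
    mn = np.full((H, W), np.inf)
    for dy in range(k):
        for dx in range(k):
            win = p[dy:dy + H, dx:dx + W]
            mx = np.maximum(mx, win)
            mn = np.minimum(mn, win)
    return mx - mn

def psnr(ref, test, mask=None, peak=255.0):
    err = (ref.astype(float) - test.astype(float)) ** 2
    if mask is not None:
        err = err[mask]
    return 10.0 * np.log10(peak ** 2 / err.mean())

if __name__ == '__main__':
    rng = np.random.default_rng(1)
    original = np.zeros((64, 64)); original[:, 32:] = 200.0   # one vertical edge
    filtered = original + rng.normal(0, 5, original.shape)
    edges = local_range(original) > 50          # threshold is an assumption
    print('PSNR (edges):', round(float(psnr(original, filtered, edges)), 1))
    print('PSNR (homogeneous):', round(float(psnr(original, filtered, ~edges)), 1))
```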
Robust Object Reconstruction From Noisy Observations
G. Sundaramoorthy, M. R. Raghuveer, S. A. Dianat
The problem of reconstructing a moving object from multiple snapshots contaminated by noise arises in many imaging applications. Many techniques have been proposed for noise elimination that rely either on measurements of the autocorrelation and/or power spectrum of the observations, or on the assumption that the additive noise is white. The power spectrum is affected by additive noise, and in many situations, the noise is spatially or temporally correlated. The above techniques are sensitive to deviations from assumptions. The bispectrum is identically zero for random processes with symmetric distributions regardless of spatial or temporal correlations. This property along with its ability to retain phase and magnitude information, have led researchers to propose bispectral techniques for estimating parameters of random signals in noise. The bispectrum is also insensitive to translational motion. If these properties are to be taken advantage of to solve the moving object (deterministic signal in noise) reconstruction problem, it is necessary to obtain good estimates of the bispectrum of the object from the noisy observations. In order to do this it is necessary to restrict bispectrum estimation to a certain region of the frequency plane. A consequence of this is that several techniques proposed for bispectral analysis of random signals cannot be used. The paper develops new approaches which enable signal reconstruction from bispectrum measurements made over the restricted region. Simulations of application of these techniques to moving object reconstruction, data transmission over channels with jitter and noise, and image restoration, show that they are more robust with respect to the statistics of the contaminating noise than methods based on autocorrelation.
Optimal Detection Methods for the Restoration of Images Degraded by Signal Dependent Noise
Kenneth E. Barner, Gonzalo R. Arce
The restoration of images degraded by signal dependent noise has traditionally been approached from an estimation framework. These techniques, however, are heavily dependent on a complete and accurate statistical representation of an image field and tend to blur regions where this representation is inaccurate. In this paper we introduce restoration techniques based on optimal vector detection methods. The introduced restoration methods are derived from a Bayesian framework and reduce to an M-ary hypothesis test among a representative set, or codebook, of vectors. These techniques remove the signal dependent noise while retaining the structure required for accurate image representation.
High Speed Kalman Filtering For Image Restoration
Jin Yun Zhang, Willem Steenaart
In this paper a fast Kalman filter is developed for digital image restoration. Optimal restoration schemes for two-dimensional images degraded by both blur and noise create dimensionality problems which, in turn, lead to intensive computation. When the original image model and the degradation model are both represented by Roesser's 2-D SISO state-space models, a simple composite dynamic structure based on a cascade technique is obtained. From this composite model, the Kalman filtering equations are established by defining a proper state vector. The speed of the recursive estimation procedure can be improved by processing the image along the diagonal direction. Furthermore, a dedicated VLSI array processor for high speed processing is proposed.
Characteristics Of The Motion Sensitive Spatiotemporal Filters
David Yu-Shan Fong, Carlos A. Pomalaza-Raez
The spatiotemporal filters modeled after the motion sensitive mechanism found in the biological vision system are analyzed. The analysis is supported by computer simulation of the filter behavior. The spatiotemporal filter under investigation is sensitive to motion in the preferred direction. It also blocks input with uniform intensity and allows only the transition in a leading edge to pass through. A preliminary examination of the frequency domain response suggests a tightly coupled relationship between the spatial frequency and the temporal frequency. The similarities and differences between the response of this class of filters and that of the Delayed Difference of Gaussian are also noted.
Hierarchic and Parallel Processing Schemes for Arbitrary Multirate Filter Banks
Bennett Levitan, Gershon Buchsbaum
This paper presents relations between hierarchic and parallel implementations of multirate filter banks (MRBs). An MRB consists of a set of filters that produce several reduced sampling-rate, or spatial scale, versions of a signal. Because image properties exist at all spatial scales, and different significance is placed on each of these scales, MRBs are very useful for image coding and analysis. Hierarchic processors, in which outputs are computed with a cascade of filters acting on a signal, have been shown in some special cases to implement MRBs more efficiently than parallel processors. However, no general theory for the hierarchic implementation of MRBs exists. We derive conversions between parallel multirate filter banks (PMRBs) and hierarchic multirate filter banks (HMRBs). The theory provides closed-form equations for finding the PMRB equivalent to a given HMRB provided a "commutation condition" (subsection 2.2.2) on the decimation and interpolation coefficients is satisfied. Closed-form solutions for the HMRB equivalent to a given PMRB can be found provided both the commutation condition and a "frequency-preservation condition" (subsection 2.3) on the PMRB filters are met. MRBs that do not satisfy the frequency preservation condition cannot be implemented with an HMRB. We use a two-dimensional HMRB that allows arbitrary, non-decreasing rational-number reductions in sampling rate between successive outputs, arbitrary LTI filtering between outputs and low-pass filtering to prevent aliasing from the sampling-rate changes. Finally, we consider the special case of scaled Gaussian filters.
Optimal Morphological Filters for Pattern Restoration
Dan Schonfeld, John Goutsias
A theoretical analysis of morphological filters for the "optimal" restoration of noisy binary images is presented. The problem is formulated in a general form, and an "optimal" solution is obtained by using fundamental tools from mathematical morphology and decision theory.
Generic Ribbons: A Morphological Approach Towards Natural Shape Decomposition
Ziheng Zhou, Anastasios N. Venetsanopoulos
In this paper, we propose a shape decomposition scheme which decomposes complex shapes into their natural components. Each component is represented by a shape primitive called a generic ribbon. A generic ribbon is defined by sweeping a structuring element along a specified trajectory, which is extracted from the morphological skeleton of the shape through skeleton decomposition. Five decomposition rules are devised to facilitate a natural decomposition. The decomposed generic ribbons form a hierarchical representation in which each component is described by a feature vector placed at a certain level according to its significance in the representation. The hierarchical structure makes the representation reliable and simplifies the matching process during recognition.
Mathematical Morphology and Its Application in Machine Vision
David G. Daut, Dongming Zhao
Mathematical morphology provides an efficient tool for image analysis. We study the problem of flaw detection in materials which are represented by very poor contrast digital images. An algorithm for flaw detection in the case of glass matte surfaces has been developed. The object skeletons within the binary images are obtained, and directional connectivity information in the skeletons is used to discriminate noise patterns from flaws according to specified criteria. After the discrimination process, the remaining skeletons correspond to flaws and can be employed to recover the shape of the flaws. An alarm flag may be turned on if the sizes of the detected flaws are found to exceed industrial standards. In the case of a grayscale image, the image is converted to a binary version by using an adaptive threshold algorithm, and then the algorithm for binary images is applied. Experimental results have been obtained for both binary and grayscale digital image data.
Representation Of Geometric Features, Tolerances, And Attributes Using Mathematical Morphology
Frank Y. Shih
The integration of computer-aided design (CAD) and computer-aided manufacturing (CAM) for modeling the geometry of rigid solid objects is becoming increasingly important in mechanical and civil engineering, architecture, computer graphics, computer vision, and other fields that deal with spatial phenomena. At the heart of such systems are symbolic structures (representations) designating "abstract solids" (subsets of Euclidean space) that model physical solids. The mathematical framework for modeling solids is Mathematical Morphology, which is based on set-theoretic operations. Using mathematical morphology as a tool, our theoretical research aims at studying representation schemes for the dimension and tolerance of geometric structure. The paper is divided into three major parts. The first part defines a mathematical framework, mathematical morphology, for characterizing the dimension and tolerance of solid objects. The second part then adopts the framework to represent some illustrative two-dimensional and three-dimensional objects. The third part describes the added tolerance information used to control the quality of the parts and their interchangeability among assemblies. With the help of variational information, we know how to manufacture, how to set up, and how to inspect to ensure that the products are within the required tolerance range.
Mathematical Morphology on Gradient Space Surface Tessellation
Meng-Ling Hsiao
The use of element-type representations to describe 3-D surfaces is encouraged in many applications such as machine vision, 3-D visualization, scene understanding, and finite-element engineering analysis. In this paper, a new method for 3-D surface tessellation from a perspective 2.5-D range map is proposed. Mathematical morphology operations are applied to conduct the surface tessellation on a frame-by-frame basis. The perspective 2.5-D range map, which describes the surface altitude in a 3-D orthographic projection model, can be either acquired from a range finder or generated from a true 3-D voxel data set. The surface tessellation is achieved by applying the Delaunay triangulation method to the intrinsic images. The intrinsic information, the gradients of points on the surface, can be computed directly from the range map using neighborhood operations. Based on differential geometry, the directional derivatives along two orthogonal axes in the image are used as the intrinsic images. The lines of curvature along the principal directions can be detected from these intrinsic images by a sequence of boolean lattice mathematical morphology operations, dilation, and image algebra operations. Using the lines-of-curvature images, a set of seed points can be obtained by intersecting the lines of curvature along the principal directions with maximum and minimum normal curvature. Finally, a set of triangular elements results from applying the Delaunay triangulation method to those seed points.
Morphological Algorithms For The Analysis Of Pavement Structure
Dimitri A. Grivas, Michael M. Skolnick
The applicability of morphological image processing techniques for the description of condition and analysis of pavement surfaces is examined. Morphological techniques can be used in the measurement of pavement media consisting of grain (aggregates) and binding substances (bituminous or Portland cement mixtures). Measurements of size and size distributions on surface features related to texture and distresses can be obtained via morphological opening and closing transformations and distributions. When correlated with actual physical measurements of such quantities, the presented morphological measures of size and size distributions may prove to be useful in characterizing the surface condition of both asphalt and concrete pavement structures.
A Method For Evaluating The Efficiency Of Block Classification In Multiple Mode Video Compression
Tom Lookabaugh, Dan Klenke, Bowonkoon Chitprasert, et al.
Many contemporary video compression algorithms classify each block of image data into one of a small number of modes and then apply a coding technique determined by the mode. The design and optimization of this classifier is a central problem in designing the overall video compression algorithm. We derive a theoretical bound on performance of block classification for independently coded blocks by minimizing a Lagrangian functional of rate and distortion. This bound can be used to evaluate the efficiency of simple classification schemes. As an example, we analyzed a heuristically optimized classification scheme for a simple three mode video coding system and showed that its performance was within 5% of the Lagrangian minimization. We also demonstrate how our technique can be used to improve classification logic.
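The Lagrangian selection principle behind the bound (choose, per block, the mode minimizing J = D + λR) can be illustrated with a toy example. The three modes and their rate-distortion values below are invented numbers for illustration, not the paper's data or its three-mode system.

```python
# Toy illustration of Lagrangian block-mode selection: for each block choose
# the coding mode minimizing J = D + lambda * R. The modes and their (R, D)
# values are invented for illustration; they are not the paper's data.

def select_mode(rd_per_mode, lam):
    """rd_per_mode: dict mode -> (rate_bits, distortion). Returns the best mode."""
    return min(rd_per_mode, key=lambda m: rd_per_mode[m][1] + lam * rd_per_mode[m][0])

if __name__ == '__main__':
    block_rd = {
        'skip':        (1,  400.0),   # replicate previous block: cheap, high distortion
        'motion_comp': (12,  90.0),   # send a motion vector plus a small residual
        'intra':       (48,  20.0),   # code the block from scratch
    }
    for lam in (0.5, 5.0, 50.0):
        print('lambda =', lam, '->', select_mode(block_rd, lam))
```

Sweeping λ trades rate against distortion: small λ favors the expensive low-distortion mode, large λ favors the cheap mode, which is exactly the trade-off the Lagrangian bound formalizes.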
A Pc-Compatible, Multiprocessor Workstation For Video, Data, And Voice Communication
Joe W. Duran, Michael Kenoyer
We present an architecture that combines video, voice, and data capabilities in a PC-compatible workstation. This architecture has been realized as an actual workstation product which provides the user with many communications abilities, including full motion video/audio communication over digital networks. The current version executes a transform based video compression algorithm, using a 16 by 16 pixel discrete cosine transform. During video phone calls, users may transfer computer files, execute programs, send snapshots of computer screens, send FAX messages, and transfer data from other devices through serial data ports built into the workstation. The design also supports storing and retrieving video sequences (with sound) and still pictures on magnetic or optical discs.
A Multiprocessor Based Low Bitrate Video Codec Architecture and a Well-Adapted Two Phase Software Coding Strategy
Thomas Kummerow, Peter Mohr, Peter Weis
The first part of the paper describes a specialized multiprocessor environment for hybrid coding of visual communications signals in the range from ISDN basic access to primary rate transmission channels. Most important is a proprietary 4-wide SIMD parallel video processor with 80 MIPS. The second part deals with the software philosophy of the codec. It uses preanalysis and prebuffering in the first phase of coding a frame. In the second phase, limited processing power and available channel bits are distributed optimally over time and over changed areas of one frame. Codec delay is halved with respect to conventional codecs.
A Comparison of Techniques for Estimating Block Motion in Image Sequence Coding
Michael Orchard
Block motion estimation for motion compensation is a key component of a wide variety of video coding schemes. Applications range from real-time low bit rate teleconferencing codecs to moderate bit rate methods currently being proposed for the coding of video on CD-ROM's. The criteria used for evaluating the performance of block motion estimation algorithms vary significantly with the application. While low computational complexity is critical for real-time codec applications, it is only a secondary consideration for video coding on CD-ROM's where the encoding is performed off-line. Similarly, while consistency of motion estimates is not an important performance criterion in some applications, it is important in applications using motion interpolation to reduce the temporal frame rate. This paper compares five motion estimation algorithms in terms of three performance criteria: energy of the displaced frame difference (DFD), computational complexity, and entropy of the motion field. The energy of the DFD is a measure of the bit rate needed to reconstruct the current frame after motion compensation. The entropy of the motion field is a measure of the bit rate needed to send motion vectors. In addition, it is usually reasonable to assume that most moving objects in an image produce smooth variations in the motion field across the pixels representing those objects. Since smoothly varying motion fields have low entropy, this assumption leads to an interpretation of the entropy of the motion field as a measure of the consistency of the motion estimate. This interpretation is supported by simulation results.
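As background for the comparison, the sketch below shows a generic full-search block-matching estimator that minimizes the sum of absolute differences (SAD). The block size and search range are illustrative assumptions; the five algorithms compared in the paper are not reproduced here.

```python
# Generic full-search block-matching sketch (sum of absolute differences).
# Block size and search range are illustrative assumptions.
import numpy as np

def block_match(prev, curr, block=16, search=7):
    """Return per-block motion vectors (dy, dx) minimizing the SAD."""
    H, W = curr.shape
    vecs = np.zeros((H // block, W // block, 2), dtype=int)
    for by in range(0, H - block + 1, block):
        for bx in range(0, W - block + 1, block):
            cur_blk = curr[by:by + block, bx:bx + block].astype(float)
            best, best_v = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y0, x0 = by + dy, bx + dx
                    if y0 < 0 or x0 < 0 or y0 + block > H or x0 + block > W:
                        continue   # candidate block falls outside the frame
                    ref_blk = prev[y0:y0 + block, x0:x0 + block].astype(float)
                    sad = np.abs(cur_blk - ref_blk).sum()
                    if sad < best:
                        best, best_v = sad, (dy, dx)
            vecs[by // block, bx // block] = best_v
    return vecs
```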
A Multi-Media Teleconference Terminal Controlling Quantity Of Flow In Packet Transmission.
Naobumi Kanemaki, Fumio Kishino, Katsutoshi Manabe
This paper describes a new teleconference terminal constructed on the X.25 protocol for high speed packet switching over 1.5 Mb/s subscriber lines. To develop a high performance teleconferencing system, the terminal incorporates three transmission techniques: variable bit rate coding and packet priority assignment of video signals; combining left and right audio channels and assignment of audio packet priority; and, under severe traffic congestion, the dropping of video data packets if there are no low priority audio packets that can be dropped. If a sending terminal detects traffic congestion severe enough to prevent transmission of all packets, the least significant audio packets are dropped until the audio signal rate falls to 32 kb/s. Next, video data packets are dropped if the traffic congestion is very severe. Sound localization, important in conferences, is maintained even under heavy traffic congestion. The influence of packet loss on picture and audio quality is described, and decoded pictures suffering various packet losses are investigated. With this newly developed teleconferencing terminal, consistent and high quality multi-media services can be assured.
A Low-Rate Video Coding Based on DCT/VQ
Joon Maeng, David Hein
The Discrete Cosine Transform (DCT) is an efficient coding method for compressing video-conferencing images. It has well-known artifacts: blocking effects and mosquito effects. These artifacts are very visible at low bit rates such as 56 Kbps and 112 Kbps. A coding method based on the Discrete Cosine Transform and Vector Quantization (VQ) is presented in this paper. The incoming image is split into two frequency bands. The low frequency band is coded by the 8x8 DCT with motion compensated inter-frame coding, and the high band is coded by 4x4 Vector Quantization. The decoder filters the low band after reconstruction and combines it with the high band to restore the image. The simulation results show a great improvement in subjective picture quality. No blocking effects are visible and the mosquito effects are greatly reduced. Since only the low band is filtered by the low pass filter, no high frequency components are lost in the process of filtering. The combined image of the high and low bands maintains edge sharpness and has a smooth texture.
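A minimal 8x8 block DCT of the kind used for the low band is sketched below, built from the orthonormal DCT-II matrix. It is a generic illustration, not the paper's codec; the subband split, motion compensation and VQ stages are omitted.

```python
# Minimal 8x8 block DCT-II sketch; generic illustration, not the paper's codec.
import numpy as np

def dct_matrix(n=8):
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C *= np.sqrt(2.0 / n)
    C[0, :] = np.sqrt(1.0 / n)
    return C                         # orthonormal: C @ C.T is the identity

def blockwise_dct(img, n=8):
    C = dct_matrix(n)
    H, W = img.shape
    out = np.zeros((H, W), dtype=float)
    for y in range(0, H - n + 1, n):
        for x in range(0, W - n + 1, n):
            out[y:y + n, x:x + n] = C @ img[y:y + n, x:x + n] @ C.T
    return out

if __name__ == '__main__':
    frame = np.random.rand(64, 64)
    coeffs = blockwise_dct(frame)
    print(coeffs.shape, round(float(coeffs[0, 0]), 3))   # DC term of the first block
```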
Knowledge-based Coding of Facial Images based on the Analysis of Feature-Points Motion and Isodensity Lines
Takashi Mizuno, Osamu Nakamura, Toshi Minami
In the study of facial motion coding for practical use in TV telephone and TV conference systems, knowledge-based coding has attracted researchers' attention as a concrete example of next-generation image coding schemes [1]-[3]. In such schemes, a facial image of a person without any specific expression (the model image) is transmitted over a telephone line, and a wireframe of the face (the model wireframe) is constructed and stored in the receiving terminal at the beginning of the conversation. During the conversation, movements in the speaker's facial features are measured, and displacements of feature points on the speaker's face are transmitted to the receiving terminal, where the speaker's facial image is reconstructed by transforming the model wireframe according to these data.
Polynomial Image Algebra Approach For Image Processing
Prabir Bhattacharya, Kai Qian
This paper describes a polynomial approach to the representation of binary and gray images for machine vision. We show that most standard image processing can be done by template polynomial operations. Further, we also develop some operators which rely on the intrinsic properties of polynomials and can be applied at a considerable advantage using the polynomial representation of images.
Automatic Site Recognition and Localisation
P. Thevenoux, B. Zavidovique, G. Stamon
LOAS is a model-based interpretation system designed to recognize and to locate sites using a 3D model. Two concurrent sensors of different types provide two views of the same area from different points of view. This redundancy is an important source of 3D recognition. The system is organized around two major phases: model description, and 2D data segmentation versus 3D scene description. Attention is focused on using 3D information as soon as possible before the matching, which is performed on a subgraph pyramid basis.
A Comparison of Matrix Texture Features Using a Maximum Likelihood Texture Classifier
Jon R. Berry Jr., John Goutsias
The performance of various matrix features in classifying synthetic and natural textures is compared by using the features directly in a maximum likelihood texture classifier (MLTC). The matrix texture features under consideration include the spatial gray level dependence matrix (SGLDM), the neighboring gray level dependence matrix (NGLDM) and the neighboring spatial gray level dependence matrix (NSGLDM). By adopting the MLTC we avoid the various problems associated with the use of scalar features extracted from the matrices under consideration, while we obtain excellent classification results.
A Binary Image Representation Scheme Using Irredundant Translation Invariant Data Structure
M. C. Zhang, S. Chen
In this paper, we propose a novel 3-D binary image representation method using an irredundant, translation invariant data structure. In comparison with existing techniques (e.g. the quadtree), this approach has the advantages of translation invariance and spatial efficiency.
A Markov Random Field Model-Based Approach To Image Interpretation
J. Zhang, J. W. Modestino
In this paper, a Markov random field (MRF) model-based approach to automated image interpretation is described and demonstrated as a region-based scheme. In this approach, an image is first segmented into a collection of disjoint regions which form the nodes of an adjacency graph. Image interpretation is then achieved through assigning object labels, or interpretations, to the segmented regions, or nodes, using domain knowledge, extracted feature measurements and spatial relationships between the various regions. The interpretation labels are modeled as a MRF on the corresponding adjacency graph and the image interpretation problem is formulated as a maximum a posteriori (MAP) estimation rule. Simulated annealing is used to find the best realization, or optimal MAP interpretation. Through the MRF model, this approach also provides a systematic method for organizing and representing domain knowledge through the clique functions of the pdf of the underlying MRF. Results of image interpretation experiments performed on synthetic and real-world images using this approach are described and appear promising.
Elementary Holograms And Neurocomputer Architecture For Pattern Recognition
Walter Schempp
The massively parallel organization principles which distinguish neural systems from the von Neumann architecture of standard digital computer hardware are one of the main reasons for the growing interest in neurocomputers. Based on a unified nilpotent harmonic analysis approach to artificial neural network models implemented with coherent optical, optoelectronic, or analog electronic neurocomputer architectures, the paper establishes a new identity for the matching polynomials of complete bichromatic graphs which connect neurons located in the neural plane. The key idea is to identify in a first step the hologram plane with the three-dimensional Heisenberg nilpotent Lie group quotiented by its one-dimensional center, then to restrict in a second step the holographic transform to the holographic lattices which form two-dimensional pixel arrays inside the hologram plane, and finally to recognize in a third step the hologram plane as a neural plane. The quantum mechanical treatment of optical holography is imperative in microoptics or amacronics since atoms coherently excited by short laser pulses may be as large as some transistors of microelectronic circuits and the pathways between them inside the hybrid VLSI neurochips.
Automatic Modelling of Rigid 3D Objects Using an Analysis by Synthesis System
Hans Busch
An important computer vision task is the modeling of 3D objects for computer graphics and animation. This paper presents a method for generating a 3D wireframe model automatically from a number of views of the original object. The silhouettes of the objects are used to determine the intersecting volume, which represents a rough estimate of the object shape in voxel representation. A triangulation algorithm converts the model into a wireframe surface representation to enhance the quality of the model's surface and to allow texture mapping. A shape from motion/stereo algorithm is used to enhance the model shape. Calibrated cameras are used to obtain true dimensions.
A Stereo-Based Approach to Face Modeling for the ATR Virtual Space Conferencing System
Gang Xu, Hiroshi Agawa, Yoshio Nagashima, et al.
The goal of this research is to generate three-dimensional facial models from facial images and to synthesize images of the models virtually viewed from different angles, which is an integral component of the ATR virtual space conferencing system project. We take a stereo-based approach. Since there is a great gap between the images and a 3D model, we argue that it is necessary to have a base face model to provide a framework. The base model is built by carefully selecting and measuring a set of points on the extremal boundaries that can be readily identified from the stereo output and another set of points inside the boundaries that can be easily determined given the boundary points. A front view and a side view of a face are employed. First the extremal boundaries are extracted or interpolated, and features such as eyes, nose and mouth are extracted. The extracted features are then matched between the two images, and their 3D positions calculated. Using these 3D data, the prepared base face model is modified to approximate the face. Finally the points on the modified 3D model are assigned intensity values derived from the original stereo images, and images are synthesized assuming new virtual viewing angles. The originality and significance of this work lies in that the system can generate a face model without a human operator's interaction with the system as in other conventional face modeling techniques.
Application Of Mathematical Morphology To Handwritten ZIP Code Recognition
Andrew M. Gillies, Paul D. Gader, Michael P. Whalen, et al.
This paper describes applications of mathematical morphology to a system for recognizing handwritten ZIP Codes. It discusses morphological techniques used for preprocessing address block images, locating address block lines, splitting touching characters, and identifying handwritten numerals. These techniques combine mathematical morphology, hierarchical matching of object models to symbolic image representations, and a strategy of propagating multiple hypotheses. The various submodules of the system have been trained on over two thousand real address block images and tested on one thousand representative images. On the one thousand test images, the system correctly located 82.5 percent, correctly identified 45.6 percent, and incorrectly classified only 0.8% of the ZIP Codes. This system performance level could lead to a significant cost savings in mail piece sorting.
Shape Features Using Curvature Morphology
F. Leymarie, M. D. Levine
The notion of curvature of planar curves has emerged as one of the most powerful for the representation and interpretation of objects in an image. Although curvature extraction from a digitized object contour would seem to be a rather simple task, few methods exist that are at the same time easy to implement, fast, and reliable in the presence of noise. In this paper we first briefly present a scheme for obtaining the discrete curvature function of planar contours based on the chain-code representation of a boundary. Secondly, we propose a method for extracting important features from the curvature function such as extrema or peaks, and segments of constant curvature. We use mathematical morphological operations on functions to achieve this. Finally, on the basis of these morphological operations, we suggest a new scale-space representation for curvature named the Morphological Curvature Scale-Space. Advantages over the usual scale-space approaches are shown.
Multi-Scale Fractal and Correlation Signatures for Image Screening and Natural Clutter Suppression
T. Peli, V. Tom, B. Lee
The fractal and correlation signatures of grayscale imagery are used to distinguish man-made areas of activity from natural background in surveillance imagery. The signatures are generated via computation of generalized dimensions as a function of scale and space. Image screening and natural clutter suppression methods can exploit the behavior of fractal and correlation signatures for generic geometric objects versus the predictable fractal behavior of natural backgrounds. Several different image screening methods have been investigated, using these multi-scale signatures to eliminate large areas of natural terrain from consideration and to cue image analysts to areas of interest.
Measuring Fractal Dimension: Morphological Estimates And Iterative Optimization
Petros Maragos, Fang-Kuo Sun
An important characteristic of fractal signals is their fractal dimension. For arbitrary fractals, an efficient approach to evaluate their fractal dimension is the covering method. In this paper we unify many of the current implementations of covering methods by using morphological operations with varying structuring elements. Further, in the case of parametric fractals depending on a parameter that is in one-to-one correspondence with their fractal dimension, we develop an optimization method, which starts from an initial estimate and by iteratively minimizing a distance between the original function and the class of all such functions, spanning the quantized parameter space, converges to the true fractal dimension.
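The covering idea can be sketched for a 1-D signal: dilate and erode the graph with flat structuring elements of growing scale, measure the covering area, and fit a slope in log-log coordinates. The scales and test signal below are illustrative assumptions, and the sketch does not include the paper's iterative optimization step for parametric fractals.

```python
# Sketch of the morphological covering method for the fractal dimension of a
# 1-D signal. Scales and the test signal are illustrative assumptions; the
# paper's iterative optimization for parametric fractals is not shown.
import numpy as np

def flat_dilate(x, s):
    # Grey-scale dilation (running max) with a flat element of half-width s.
    return np.array([x[max(0, i - s):i + s + 1].max() for i in range(len(x))])

def flat_erode(x, s):
    return np.array([x[max(0, i - s):i + s + 1].min() for i in range(len(x))])

def covering_dimension(x, scales=(1, 2, 4, 8, 16)):
    areas = [np.sum(flat_dilate(x, s) - flat_erode(x, s)) for s in scales]
    # A(s) ~ s**(2 - D)  =>  log(A(s) / s**2) ~ -D * log(s) + const
    logs = np.log(np.asarray(scales, float))
    logA = np.log(np.asarray(areas, float) / np.asarray(scales, float) ** 2)
    return -np.polyfit(logs, logA, 1)[0]

if __name__ == '__main__':
    rng = np.random.default_rng(0)
    signal = np.cumsum(rng.normal(size=4096))   # Brownian path: D = 1.5 in theory
    print(round(float(covering_dimension(signal)), 2))
```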
Estimating of Fractal and Correlation Dimension from 2D and 3D-Images
Ari M. Vepsalainen, Jun Ma
If the recursive algorithm or the dynamical system that generate the fractal shape is known, then the fractal dimension can be calculated. Many natural shapes are fractals. However, one can usually not estimate exactly the corresponding dimension from images, because the images are formed of finite amount of values at discrete points. In here, one algorithm to estimate fractal dimension and two algorithms to estimate the correlation dimension, which is a lower bound of fractal dimension, from a 2D or 3D image are introduced. The calculated dimension could be used in order to solve many practical problems. For example, one could use the dimension image on icebreakers in order to determine the type of ice and in order to find cracks in the ice using satellite images. Also one can use dimension image in image compression and in pattern recognition.
Pixel Classification By Morphologically Derived Texture Features
Edward R. Dougherty, Jeff B. Pelz
Local granulometric size distributions are generated by performing a granulometry on an image and keeping local pixel counts in a neighborhood of each pixel at the completion of each successive opening. Normalization of the resulting size distributions yields a probability density at each pixel. These densities contain texture information local to each pixel. Pixels can be classified according to the moments of the densities. Further refinement can be accomplished by employing several structuring-element sequences in order to generate a number of granulometries, each revealing different texture qualities. Classification is accomplished by comparing observed moments to those representing a database of textures. The database moments are actually random variables dependent on random texture processes, and the method employed in the present paper involves the comparison of observed moments to the means of database-texture moments.
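A minimal sketch of local granulometric size densities and their moments is given below, using flat square structuring elements and a uniform local window. The element sizes and window are illustrative assumptions, not the paper's settings or structuring-element sequences.

```python
# Sketch of local granulometric size densities: apply openings with growing
# structuring elements, keep local sums of what each opening removes, and
# normalize to a per-pixel size density. Sizes and window are assumptions.
import numpy as np
from scipy import ndimage

SIZES = (1, 3, 5, 7, 9)          # assumed structuring-element sizes

def local_granulometric_density(img, sizes=SIZES, window=15):
    img = img.astype(float)
    prev = img
    losses = []
    for s in sizes:
        opened = ndimage.grey_opening(img, size=(s, s))
        removed = prev - opened                          # mass removed at this size
        losses.append(ndimage.uniform_filter(removed, size=window))
        prev = opened
    dens = np.stack(losses, axis=-1)                     # shape (H, W, n_sizes)
    return dens / (dens.sum(axis=-1, keepdims=True) + 1e-12)

def density_moments(dens, sizes=SIZES):
    s = np.asarray(sizes, float)
    mean = (dens * s).sum(axis=-1)
    var = (dens * (s - mean[..., None]) ** 2).sum(axis=-1)
    return mean, var                                     # per-pixel texture features
```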
Representation Theorems In A L-Fuzzy Set Theory Based Algebra For Morphology
Divyendu Sinha, Charles R. Giardina
Perhaps the most promising area of morphological image processing is that dealing with morphological filters. An image-to-image mapping is called a morphological filter if it is increasing and translation compatible. In other words, morphological filters preserve the natural set theoretic ordering and are space invariant. Examples of such filters are the convex hull operator, the topological closure, the umbra transform and various other topological algorithms [2-4]. Matheron showed that all such mappings can be characterized by the set theoretic operations erosion and dilation. His representation theorems have been adapted to digital images with great success [4].
Digital Filters Based On Threshold Decomposition And Boolean Functions
Jaakko Astola, Lasse Koskinen, Olli Yli-Harja, et al.
In this paper we discuss certain limitations which are inherent in filters that are based on threshold decomposition and Boolean functions. We also define a generalization of stack filters by dropping the requirement of positivity in the defining Boolean function. We show that while some of the nice properties of stack filters are lost the extra design freedom makes it possible to construct filters which have certain attractive properties that are not attainable with stack filters.
Parallel Architecture for Mathematical Morphology
F. Lenoir, S. Bouzar, M. Gauthier
The image processor presented here was designed for road traffic measurement and uses four-pixel parallelism to process images. It can perform morphological transformations on four, two or one pixels in at most 0.3 ms for a 256 x 256 image. It processes binary and grey level images with neighbourhood and arithmetical processors. Its recursive part allows quick reconstructions.
Morphological Networks
Stephen S. Wilson
The standard operations in mathematical morphology involve erosions, dilations, openings, and closings which have been defined on binary and grey scale images. A generalization of standard morphology is discussed and leads to a new operation: weighted rank order filters. A different type of generalization leads to morphology in a vector space. A combination of the two ideas is developed, and involves a sequence or network of layers of weighted rank order filters on vector spaces which have properties very similar to multi-layer neural networks. A weighted rank order cell has a non-linear soft threshold response at the output. Due to the nature of rank order filtering, a unique supervised training procedure can be defined which allows weights in hidden layers to be trained as quickly and easily as those in the output layer.
Adaptive One Dimensional DCT-VQ For Motion Compensated Video Coding
Ali N. Akansu, Jung Hui Chien
The motion compensated frame difference (MCFD) signals of video have much less correlation than still frames. This change in source statistics reduces the energy compaction power of the 2D DCT. A 1D DCT of the MCFD signal based on a Hadamard matrix decomposition is proposed in this paper. The 1D DCT case is compared with the 8x8 2D DCT case. The results indicate that both perform practically the same, but the former is computationally simpler than the latter. The transform coefficients are split into bands, and each band is grouped based on the variances. VQ of the transform coefficient bands is employed. A PSNR of 35.77 dB at 0.3245 bits/pixel is obtained with the proposed scheme, which is also perceptually as good as the 2D DCT case.
Low Bit Rate Teleconferencing Video Signal Data Compression
C. Manikopoulos, H. Sun, G. Antoniou
A new high compression algorithm for teleconferencing image sequence data is proposed. It is based on vector quantization (VQ) operating in the spatial domain, and it is structured as a combination of an intraframe finite state (FS) algorithm with state label entropy encoding, followed by an interframe one operating on a bundle of frames. The intraframe algorithm encodes the head frame of the bundle; the interframe algorithm follows, operating on the remaining frames. Then the encoding process restarts with a new bundle of frames. Simulation experiments have been carried out for a videoconference image sequence consisting of 20 frames of 112x96 pixels, with 25% average block motion. The representation vectors were of 2 x 2 x 4 resolution. The results obtained show that for a PSNR of 32 dB the required bit rate is 0.08 to 0.10 bits/pixel.
A New Coding Method For Low Bit-Rate Video Signals
G. Tu, L. Van Eycken, A. Oosterlinck
In this paper, a hybrid motion evaluation coder for low bit-rate (64 Kb/s) video sequence transmission is studied. More than one previous frame is used to estimate the temporal motion behavior in the motion evaluation process (rather than conventional motion estimation methods, in which only the immediately preceding frame is used). A motion sequence is therefore established alongside the image data sequence. By investigating and analysing the previous frames of this motion sequence, which are available to both the coder and the decoder, the moving objects can be identified from the background. The current image frame to be encoded is then divided into two types of regions: the background region, which may not be transmitted at all but simply repeated with the data of the previously decoded frame, and the moving region, which is encoded by compensating the motion information. The moving region is reconstructed at the normal frame sampling frequency, while the background data are treated coarsely by refreshing them at a much lower frequency. The relative impact of the moving region with regard to the background region is also evaluated for assigning bits to the two separated regions. As long as this impact is significant enough, the background data may remain unchanged. The final reconstructed image frame is then the coarsely processed background superimposed with the decoded moving objects.
Color Image Display with a Limited Palette Size
Charles Bouman, Michael Orchard
Many image display devices limit the colors which can be simultaneously displayed to a small set of colors called a color palette. Usually, each element of the color palette can be selected by the user from a wide variety of available colors. Such device restrictions make it particularly difficult to display natural color images since these images usually contain a wide range of colors. The color quantization problem can be considered in two parts: the selection of an optimal color palette and the optimal mapping of each pixel of the image to a color from the palette. This paper applies a hierarchical tree structure to the design of the color palette. The tree structure allows palette colors to be properly allocated to areas of the color space which are densely populated and, in addition, greatly reduces the computational requirements of the palette design and pixel mapping tasks. Methods are presented for incorporating performance measures which reflect subjective evaluations of image quality. Incorporation of these measures into the quantization problem results in significantly improved image quality. The algorithms presented in the paper produce high quality displayed images with minimum artifacts and require far less computation than previously proposed algorithms using unstructured color palettes.
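As a loosely related illustration of tree-structured palette design, the sketch below uses the classical median-cut splitting rule followed by nearest-palette-color mapping. It is not the paper's hierarchical algorithm and does not include its subjective quality measures.

```python
# Minimal median-cut palette sketch: recursively split the color cloud along
# its widest axis at the median, then map pixels to the nearest palette color.
# This classical rule is only loosely related to the paper's hierarchical tree.
import numpy as np

def median_cut(pixels, n_colors=16):
    boxes = [pixels]
    while len(boxes) < n_colors:
        # Split the box with the largest spread, along its widest channel.
        i = int(np.argmax([np.ptp(b, axis=0).max() for b in boxes]))
        box = boxes.pop(i)
        ch = int(np.argmax(np.ptp(box, axis=0)))
        order = np.argsort(box[:, ch])
        half = len(order) // 2
        boxes += [box[order[:half]], box[order[half:]]]
    return np.array([b.mean(axis=0) for b in boxes])     # palette colors

def map_to_palette(pixels, palette):
    d = ((pixels[:, None, :] - palette[None, :, :]) ** 2).sum(axis=-1)
    return np.argmin(d, axis=1)                          # palette index per pixel

if __name__ == '__main__':
    rng = np.random.default_rng(0)
    img = rng.integers(0, 256, size=(32, 32, 3)).astype(float)
    pix = img.reshape(-1, 3)
    palette = median_cut(pix, 8)
    indices = map_to_palette(pix, palette)
    print(palette.shape, indices.shape)
```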
Screening Pictorial Images with Gray Level Reproducibility
Joseph Shou-Pyng Shu
This paper describes an improved digital screening technique to convert a continuous-tone image into a halftone (binary) image for pictorial image reproduction. The technique first generates a digital screen in gray scale (e.g., from 0 to 255) based on a set of mathematical equations; it then adaptively thresholds the screen at each pixel, using a threshold value equal to the pixel gray intensity of the input image, to produce the output image, which contains standard halftone dots. The screened image holds as many gray shades (e.g., 255) as the input image by yielding different halftone dot sizes, so that the binary image looks as if it has gray scales. The screen can be generated for either linear or nonlinear transformations from image gray tones into spatial sizes of halftone dots in order to preserve or to enhance low frequency information (i.e., the average gray tone within the area of a halftone cell), respectively. In addition, the technique preserves sharp edges in the input image by producing asymmetric halftone dots. This technique works over a wide range of screen frequencies (i.e., halftone dots per inch) and scan frequencies (i.e., pixels per inch). Scanned and processed images are illustrated.
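The core screening operation, comparing each input pixel against a tiled gray-scale screen, can be sketched as follows. The 4x4 Bayer matrix used here is a standard illustrative screen, not the screen generated by the paper's equations.

```python
# Minimal screening sketch: threshold each pixel of a continuous-tone image
# against a tiled dither screen to obtain a binary halftone. The 4x4 Bayer
# matrix is a standard illustrative screen, not the paper's screen design.
import numpy as np

BAYER_4x4 = np.array([[ 0,  8,  2, 10],
                      [12,  4, 14,  6],
                      [ 3, 11,  1,  9],
                      [15,  7, 13,  5]], dtype=float)

def halftone(img, screen=BAYER_4x4):
    # Scale the screen to the image's gray range and tile it over the image.
    H, W = img.shape
    thresh = (screen + 0.5) / screen.size * 255.0
    ty = int(np.ceil(H / screen.shape[0]))
    tx = int(np.ceil(W / screen.shape[1]))
    tiled = np.tile(thresh, (ty, tx))[:H, :W]
    return (img.astype(float) > tiled).astype(np.uint8)   # 1 = white dot

if __name__ == '__main__':
    ramp = np.tile(np.linspace(0, 255, 64), (64, 1))
    print(halftone(ramp).mean())   # average dot coverage roughly follows the ramp
```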
Digital Color Map Compression By Classified Vector Quantization
Harsha Potlapalli, Herb Barad, Andrew Martinez
This paper shows that combined color component coding can be performed for the compression of color maps. Classification, which was done to remove the printing irregularities in the maps, reduces the number of colors to a much smaller number which can be stored in a look-up table. Thus, the three bands that form the color map are replaced by a pseudo-color map. The pixel indices in this map are only pointers to their actual colors stored in the look-up table. Combined component coding increases the compression achieved with conventional coding by a factor of three.
Performance Analysis Of An Intraframe Quadrature Mirror Filter Subband Coding System
S. Brofferio, E. Marcozzi, L. Mori, et al.
A subband coding system for high quality video signals suitable for ATM packet communication has been investigated. Monochrome signals at a transmission rate of 0.84 bit/pixel have been simulated. The performances of classical linear phase Quadrature Mirror Filters and of quasi-linear phase and minimum phase perfect-reconstruction analysis-synthesis QMFs using matched quantizers are identical for the 16-coefficient filters; the 8-coefficient linear phase filter with matched quantizer performs slightly better than the perfect reconstruction ones.
Two Approaches To Transform Coding Using Classified Vector Quantization
G. Tu, L. Van Eycken, A. Oosterlinck
In this paper, two approaches to classified transform coding using vector quantization are presented and discussed. The image data, which are known to be statistically uncertain, are classified before any further quantization and coding operations. The local image activities can be ascertained either by using the neighboring decoded data (i.e. the finite-state method) or by examining the current data (i.e. the activity-detection method). The finite-state method, which is preferably implemented using small transform blocks (typically 4 x 4) because the inter-block correlations decrease for larger block dimensions, provides satisfactory data classification without requiring extra indication bits. The activity-detection method, however, is more suited to larger transform blocks in order to keep the number of extra indication bits at a reasonable level. Other practical considerations for both methods are also given.
Software Tools for a Real-Time Processing System : ISIS : Image System Interactive Software : OSIRIS : Open Software-Reconfigurable Interactive Real-time Imaging System
B. de Rouvray, O. Farabet
Real-time signal analysis has evolved significantly. In image processing, some commercially available products now reach real-time performance. Software has to be designed to make these hardware capabilities easy to use. This paper presents an image processing system working at video frequency. Its architecture and the software developed to control it are detailed. It is then shown, with a simple example, how the speed-ups provided by this user-friendly software allow applications to be developed quickly.
Real Time Movement Detection - An Algorithm For The Image Pipeline Processors µPD7281
Bento A. Correia, Fernando D. Carvalho
This paper describes an algorithm developed for real time movement detection, designed to maximize the parallel processing possibilities of the data flow microprocessors μPD7281 (NEC). The image processing board used in this work is the French PC-OEIL, which has four ImPPs, their companion chip called Magic, 1 Mbyte of image memory and additional circuitry for image acquisition, digitization and display. The algorithm is based on the comparison between two images, in order to generate a third one that contains the enhanced differences. The proposed method efficiently eliminates the usual noise associated with video signals and their digital conversion. It also generates a number that indicates the amount of global variation that has been detected. The high degree of compactness that has been achieved allows the generation of all that information in a single access to the image memories. The algorithm was implemented by direct assembler programming and has been applied with very good results to surveillance tasks.
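A generic frame-differencing detector with simple noise suppression is sketched below in Python. It follows the spirit of the comparison-based algorithm described but is not the μPD7281 assembler implementation, and the threshold and region-size parameters are assumptions.

```python
# Generic frame-differencing sketch for movement detection with simple noise
# suppression (threshold plus small-region cleanup). Not the µPD7281 assembler
# implementation; the threshold and min_pixels parameters are assumptions.
import numpy as np
from scipy import ndimage

def detect_motion(frame_a, frame_b, thresh=20, min_pixels=9):
    diff = np.abs(frame_a.astype(int) - frame_b.astype(int))
    mask = diff > thresh                      # suppress sensor/digitization noise
    # Remove isolated noisy detections: keep connected regions of a minimum size.
    labels, n = ndimage.label(mask)
    sizes = ndimage.sum(mask, labels, index=np.arange(1, n + 1))
    keep = np.isin(labels, 1 + np.flatnonzero(sizes >= min_pixels))
    activity = int(keep.sum())                # global amount of detected change
    return keep, activity
```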
Vector Quantization With Replenishment Technique For Video Signal Coding
H. Sun, C. N. Manikopoulos
The digital representation of a video signal or image sequence requires a very large number of bits. The goal of image coding is to reduce this number as much as possible and reconstruct a faithful duplicate of the original pictures. Vector quantization has been applied to the coding of image sequences. An effective approach is to define the vector in the spatial domain and to directly exploit the temporal redundancy, that is, the replenishment technique. In an extension of the frame replenishment concept, the changes arising between consecutive frames can be coded by updating both the frame label map and the codebook in vector quantization. In this paper, we propose two new label replenishment algorithms which are based on vector quantization. The first is the encoding of the frame label map. In the label replenishment scheme, an extra bit is used as side information to indicate whether the label needs to be replenished. The resulting frame label map can be seen as a new binary image. This binary image is encoded by run-length coding. The second algorithm is label combination coding. In the moving portion of an image sequence, neighboring blocks in the previous frame may remain neighbors as they move together. Based upon this observation, a rate reduction can be obtained by coding the combination of the neighboring blocks. The simulation results show that a very low bit rate with good reproduction quality is obtained for image sequence coding.
Discriminant Operators For The Emulation Of The Visual Receptive Fields
M. M. Gupta, S. K. Hungenahally
Low-level visual information processing and information extraction are important aspects of any machine vision system. The biological retina is a robust low-level visual preprocessor for the extraction of information in the visual channel. Extraction of transition and edge information from signals and visual images in the presence of noise is an important class of problems in both signal and image processing. In this paper we describe a class of discriminant operators for the extraction of transition information from noisy one-dimensional signals. The discriminant operator, briefly introduced in this paper, attempts to emulate the robust information extraction properties of the receptive fields found in the retina and in the visual channel leading to the visual cortex. The discriminant operator (D operator) has differential properties with inherent aggregation. The differentiation property helps to extract transition and edge information, while the aggregation operation provides noise immunity. This theory will be used in the development of robust algorithms for robotics, medical image processing and vision prosthesis.
Application of Polynomial Transforms in Image Deblurring
Jean-Bernard Martens
In this paper, we present an algorithm for deblurring and interpolating digital images. Deblurring is an ill-posed problem, which can be made well-behaved by introducing assumptions about the input image. The algorithm described in this paper assumes that images can be described by local polynomial expansions. The mapping between an image and the coefficients of the polynomial approximation is called a polynomial transform. It is shown that, by using this image approximation, deblurring becomes equivalent to estimating the polynomial transform coefficients of the image. In practice, the noise and sampling of the blurred image limit the order of the coefficients that can be estimated reliably.
Highly Accurate 3-D Position Measurement and Non-Linear Coordinate Transformation
Hiroshi Naruse, Yoshihiko Nomura, Michihiro Sagara
Important preprocessing steps in computer vision are measuring the position of an object and removing the lens distortion that hampers recognition of the object from its image. First, the slit-ray projection method is investigated and some ideas for increasing measurement accuracy are proposed. A normal-distribution curve is fitted to the pixel positions and their intensities by the least-squares method; because of this, the slit-ray center position can be determined with a high accuracy of 0.05 pixel. A method for correcting the distortion of the slit-ray intensity caused by non-uniform reflection is described. It involves compensating the intensity on the basis of a uniform reference light. Optimum measurement conditions for the slit-ray width and the regression range are obtained by examining the relation between intensity fluctuations and measurement error. In order to increase 3-D measurement accuracy, methods for calibrating various parameters are described. Next, by combining three ideas, i.e., referencing the shift quantities, processing pixel groups en bloc and separating the transformation into two directions, the non-linear coordinate transformation time is reduced. This is effective for the speedy correction of image distortion.
Principles and Applications of a New Cooperative Segmentation Methodology
P. Bonnin, B. Zavidovique
This paper illustrates three principles of a generic segmentation method and their application to two particular cases of scene analysis. These cases correspond to different applications, such as target tracking from infrared imagery, described in [BPZ89], or the 3D reconstruction of indoor/outdoor scenes in man-made environments from visible imagery, described in [BBHZ89]. The first principle of the proposed segmentation methodology is to consider, from the outset, future implementation on parallel computers of either the MIMD (Multiple Instruction Multiple Data) or SIMD (Single Instruction Multiple Data) type; the developed algorithms must exploit the possibilities of these computers. The second principle is the early introduction of a priori knowledge, so that the segmentation method is specific to a particular case of scene analysis. It relies on both a physical model related to image formation (depending entirely on the spectral domain, here IR or visible) and a conceptual model related to the application, here tracking or 3D reconstruction. This introduction of knowledge allows the system to segment "finely" only certain interesting parts of the image. The third principle is to enforce cooperative or guided segmentation. For instance, robust segmentation algorithms often require simultaneous information about the homogeneity and disparity properties of the image. The proposed method, which implies close cooperation between edge and region detectors, enhancing these properties, satisfies this requirement.
Image Estimation And Missing Observations Reconstruction By Means Of A KALMAN Like Filter
M. Dirickx, A. Acheroy
The purpose of the presented method is noise reduction and the estimation of missing frames in interlaced images. When all the frames are present, there are two possible semi-causal optimal Kalman filters whose equations are reducible only if the image formation can be described by a first-order separable Markov process: the first filter is causal in the direction of the rows and non-causal in the direction of the columns; the second is causal in the direction of the columns and non-causal in the direction of the rows. In the case of interlaced images with one missing frame, only every other line is observed and only the second Kalman filter is reducible.
Local Thresholding Technique Based On A Run Representation
V. Eramo, C. Raspollini
This paper describes a technique for local thresholding of large digital images obtained from line drawings of low quality. It is based on two main steps. First, the gray-level image is thresholded with a global value to discard those pixels which surely belong to a non-significant background. A suitable run representation is used to compress the resulting image. Then, for each run, a set of connected runs is used to compute a local threshold value. This threshold is applied to the pixels of the current run to obtain the final binary image.
Fast Contour Line Extraction Algorithm Observing Line Continuation
Takashi Nishimura, Tsuyoshi Fujimoto
This paper describes a new contour line extraction algorithm that yields one-pixel-wide contour lines with excellent continuity and low noise from gray-scale images. Conventional methods are extremely slow because the line thinning process, needed after line extraction, is iterative. The new algorithm effectively integrates local maximal point extraction and thresholding by checking line continuity. Since the two processes, point extraction and thresholding, are carried out simultaneously, the processing speed of this algorithm is about 14 times faster than a conventional method which combines Prewitt's differential type operation with Deutsch's thinning algorithm.
Histogram Equalization Utilizing Spatial Correlation For Image Enhancement
M. Kamel, Lian Guan
Histogram equalization (HE) techniques are widely used for image enhancement because of their simplicity and effectiveness. Most existing HE techniques assume image pixels to be randomly distributed over the image space. Since adjacent image pixels are in general highly correlated, it is more reasonable to design HE methods that utilize this correlation. In this paper, we present a method of HE that takes the spatial correlation among pixels into account. The co-occurrence of the gray values of adjacent pixels is calculated to form conditional probabilities of each gray level with respect to the other gray levels in the image. The HE mapping is then obtained using these probabilities. We call this method Conditional Histogram Equalization (CHE). Experimental results show that the proposed method generates images that are visually more pleasing than those generated by conventional HE techniques. The results also show that this method avoids the problem of over-stretching the contrast in images with highly peaked histograms.
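The co-occurrence bookkeeping behind such a scheme can be sketched as follows. The weighting used here (down-weighting gray levels that almost always co-occur with themselves, i.e. levels that dominate flat regions) is an assumption made for illustration; the paper's exact CHE mapping may differ.

```python
import numpy as np

def cooccurrence(img, levels=256):
    """Counts of horizontally and vertically adjacent gray-level pairs."""
    C = np.zeros((levels, levels), dtype=np.int64)
    pairs = [(img[:, :-1], img[:, 1:]),   # horizontal neighbours
             (img[:-1, :], img[1:, :])]   # vertical neighbours
    for a, b in pairs:
        np.add.at(C, (a.ravel(), b.ravel()), 1)
    return C

def conditional_probabilities(C):
    """P(neighbour = j | pixel = i), rows summing to 1 where defined."""
    row_sums = C.sum(axis=1, keepdims=True)
    return np.divide(C, row_sums, out=np.zeros_like(C, dtype=float),
                     where=row_sums > 0)

def conditional_hist_equalize(img, levels=256):
    """Illustrative CHE variant: gray levels with very high self-co-occurrence
    (dominant levels of flat regions) are down-weighted before the usual
    cumulative mapping.  The weighting is an assumption for this sketch,
    not the weighting defined in the paper."""
    C = cooccurrence(img, levels)
    P = conditional_probabilities(C)
    hist = np.bincount(img.ravel(), minlength=levels).astype(float)
    weights = 1.0 - np.diag(P)            # small for highly peaked, flat-region levels
    weighted = hist * weights
    if weighted.sum() == 0:               # degenerate image: fall back to plain HE
        weighted = hist
    cdf = np.cumsum(weighted) / weighted.sum()
    mapping = np.round((levels - 1) * cdf).astype(np.uint8)
    return mapping[img]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    img = np.clip(rng.normal(120, 10, (64, 64)), 0, 255).astype(np.uint8)
    out = conditional_hist_equalize(img)
    print(img.min(), img.max(), "->", out.min(), out.max())
```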
Subband Encoding of Video Sequences
John W. Woods, T. Naveen
Several methods of subband coding of video sequences are studied. The subband decomposition can be used in coding the error frames as well as in estimating the motion in the image sequence. When motion compensation was not used, subband coding was found to offer numerical and subjective improvement over DPCM coding alone. Robust motion estimation using pel-recursive motion estimation was performed with the QMF pyramid in a hierarchical structure. A method for subband coding the residue in a hierarchical motion estimation environment is given. This method gave stable and better SNRs than non-subband coding schemes. Finally, motion-compensated interpolation and extrapolation are studied using subband coding and motion estimation schemes.
A Pyramidal Image Coder with Contour-Based Interpolative Vector Quantization
Yo-Sung Ho, Allen Gersho
The Laplacian pyramid is a versatile data structure that represents an image as a sequence of spatially filtered and decimated versions of the original image. To obtain an expanded approximation from the processed decimated samples, linear interpolation is commonly used because of its computational simplicity; however, linear interpolation does not consider the gray-level change in the neighborhood of each interpolated pixel, and thus creates unpleasant staircase effects near edges and visually annoying blocking effects in shaded regions. This paper proposes a new pyramidal image coding algorithm using contour-based interpolative vector quantization (VQ) to obtain a perceptually better interpolated image through the use of edge information at each level of the pyramid. The basic idea is to code the image using VQ on a level-by-level basis in the hierarchical structure. The decimation implies that the redundancy existing over large image areas can be efficiently exploited by VQ of low dimension. To improve the quality of the reconstructed image, an adaptive corrector stage is added at the final stage of the pyramid to selectively encode the residual errors as needed. The locations for correction are hierarchically predicted using the previously encoded data at the higher levels. The scheme is a very efficient combination of VQ and contour-based interpolation in the hierarchical pyramid structure, and is amenable to progressive image transmission. Experimental results show that good reconstructed images are obtained at 0.25 bits per pixel with reasonable coding complexity.
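For reference, a minimal sketch of the Laplacian-pyramid decomposition and its exact reconstruction is given below; the contour-based interpolation, the VQ stages and the corrector of the proposed coder are not reproduced, and the 5-tap binomial kernel is simply a common choice.

```python
import numpy as np
from scipy.ndimage import convolve

# 5-tap binomial kernel commonly used for Gaussian pyramids
_K1 = np.array([1., 4., 6., 4., 1.]) / 16.0
_KERNEL = np.outer(_K1, _K1)

def reduce_(img):
    """Low-pass filter and decimate by 2 in each direction."""
    return convolve(img, _KERNEL, mode="nearest")[::2, ::2]

def expand(img, shape):
    """Upsample by 2 (zero insertion) and interpolate with the same kernel."""
    up = np.zeros(shape)
    up[::2, ::2] = img
    return convolve(up, 4.0 * _KERNEL, mode="nearest")

def laplacian_pyramid(img, levels=4):
    """Return [L0, L1, ..., top] where Li = Gi - expand(G(i+1))."""
    pyr, g = [], img.astype(float)
    for _ in range(levels - 1):
        g_next = reduce_(g)
        pyr.append(g - expand(g_next, g.shape))
        g = g_next
    pyr.append(g)                     # coarsest Gaussian level
    return pyr

def reconstruct(pyr):
    g = pyr[-1]
    for lap in reversed(pyr[:-1]):
        g = lap + expand(g, lap.shape)
    return g

if __name__ == "__main__":
    img = np.random.default_rng(2).random((64, 64))
    pyr = laplacian_pyramid(img, levels=4)
    # exact up to floating-point round-off, since each level stores its own residual
    print(np.abs(reconstruct(pyr) - img).max())
```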
Subband Coding of Image Sequences at Low Bit Rates
J. Biemond, P. H. Westerink, F. Muller
In this paper a low bit rate subband coding scheme for image sequences is described. Typically, the scheme is based on temporal DPCM in combination with an intraframe subband coder. In contrast to previous work, however, the subbands are divided into blocks onto which conditional replenishment is applied, while a bit allocation algorithm divides the bits among the blocks assigned for replenishment. A solution is given for the "dirty window" effect by setting blocks to zero that were assigned to be replenished but received no bits. The effect of motion compensation and the extension to color images are discussed as well. Finally, several image sequence coding results are given for a bit rate of 300 Kb/s.
Subband Coding of Images Employing an Efficient Parallel Filter Bank
John Hakon Husoy, Tor A. Ramstad
The principle of subband coding has recently been successfully applied for the purpose of image data compression. The analysis/synthesis filter banks and the subband signal encoder/decoder are the main parts of a subband coder. Previously the filter banks in image subband coders have been designed with FIR filters organized in a tree structure. In this paper we propose a novel structure that deviates from previously reported structures in these respects: 1) The filter bank is organized as a parallel structure. 2) The filters employed are 1st order IIR allpass filters featuring very low computational complexity. 3) The resulting subbands are complex. 4) The filter bank features exact reconstruction in the absence of encoding/decoding. In the paper we present the filter bank structure. The computational complexity is evaluated and shown to be considerably lower than that of existing structures. The problem of extension of the finite length signals in a subband coder is mentioned along with a method, recursively formulated circular convolution, for dealing with it. We have incorporated this filter bank structure into a complete image subband coder. Finally, coding results obtained by computer simulation are presented.
Image Coding on a Hexagonal Pyramid with Noise Spectrum Shaping
B. Mahesh, W. A. Pearlman
A hexagonally sampled image is split into a low pass band and nine pass bands of one octave width and sixty degrees angular orientation. The conditions to be satisfied by the filter banks for perfect reconstruction are presented. The human visual system's response to stimuli at differing spatial frequencies is then employed to shape the coding noise spectrum. Rate is allocated under a frequency weighted mean squared error distortion measure. A framework is presented for doing this employing either the power spectral density of the image or the variance of the subbands. Both adaptive and non-adaptive entropy coding simulations are carried out under a Laplacian source distribution. Transparent coding results are presented at rates below 1 bit per pixel.
Multirate Monochrome Image Compression Based On A Novel Deconvolution Of The Analysis/Synthesis Filter Banks
Mark J. T. Smith, Steven L. Eddins
This paper reports on the authors' recent work in developing a new multirate method for image coding at low bit rates. The method follows the approach which was introduced at ISCAS 89 [1] and is based on a multirate maximally decimated filter bank structure. Difficulties and solutions related to the performance of the system are discussed in this paper. The proposed structure attempts to preserve the perceptually significant characteristics of the reconstructed image in the presence of quantization. Methods for coding the new multirate representation are still under development. However, preliminary coding results are shown in this paper using vector quantization and adaptive differential PCM. Results based on a more efficient contour encoding method will be presented at the conference.
Sub-Band Coding For ATV Signals Based On Spatial Domain Considerations
Ting-Chung Chen, Paul E. Fleischer
The effect of different finite impulse response analysis/synthesis filter pairs on sub-band coding performance is investigated based on spatial domain considerations. Deterministic as well as real video signals are used for rate and distortion evaluations of various filter pairs. A sub-band coding scheme suitable for advanced television (ATV) transmission in Broadband ISDN is then described based on the above simulations. The scheme uses the shortest kernel (2-tap) quadrature mirror filter (QMF) pairs for signal decomposition and reconstruction. The short span of the filter not only avoids quantization noise spread but also allows simple implementation in the high rate, high quality application considered here. Through successive QMF operations, the signal is decomposed into six bands to achieve low noise coding at the desired rate. The resultant coding scheme is identical to a Hadamard transform (HT) with non-constant block sizes. In addition to excellent coding performance, the hardware architecture is also much simpler than conventional transform coding or sub-band coding schemes.
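Up to scaling, a 2-tap QMF pair is the Haar sum/difference pair, so one separable analysis/synthesis level of the kind the scheme builds on can be sketched as follows; the six-band ATV decomposition and the quantizers are not reproduced here.

```python
import numpy as np

INV_SQRT2 = 1.0 / np.sqrt(2.0)

def haar_split_1d(x, axis):
    """One 2-tap QMF (Haar) analysis step along one axis: low band, high band."""
    a = np.take(x, np.arange(0, x.shape[axis], 2), axis=axis)
    b = np.take(x, np.arange(1, x.shape[axis], 2), axis=axis)
    return (a + b) * INV_SQRT2, (a - b) * INV_SQRT2

def haar_merge_1d(lo, hi, axis):
    """Exact synthesis of the 2-tap QMF split."""
    a = (lo + hi) * INV_SQRT2
    b = (lo - hi) * INV_SQRT2
    out_shape = list(lo.shape)
    out_shape[axis] *= 2
    out = np.empty(out_shape, dtype=lo.dtype)
    idx_even = [slice(None)] * lo.ndim
    idx_odd = [slice(None)] * lo.ndim
    idx_even[axis] = slice(0, None, 2)
    idx_odd[axis] = slice(1, None, 2)
    out[tuple(idx_even)] = a
    out[tuple(idx_odd)] = b
    return out

def subband_split_2d(img):
    """One separable level: LL, LH, HL, HH sub-bands."""
    lo, hi = haar_split_1d(img.astype(float), axis=1)   # horizontal split
    ll, lh = haar_split_1d(lo, axis=0)                  # vertical split of the low band
    hl, hh = haar_split_1d(hi, axis=0)                  # vertical split of the high band
    return ll, lh, hl, hh

if __name__ == "__main__":
    img = np.random.default_rng(3).random((8, 8))
    ll, lh, hl, hh = subband_split_2d(img)
    lo = haar_merge_1d(ll, lh, axis=0)
    hi = haar_merge_1d(hl, hh, axis=0)
    rec = haar_merge_1d(lo, hi, axis=1)
    print(np.abs(rec - img).max())     # perfect reconstruction up to round-off
```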
On Implementing The JPEG Still-Picture Compression Algorithm
R. Aravind, G. L. Cash, J. P. Worth
In this paper, we discuss an implementation of a compression technique for still images that is currently being proposed as a standard by the Joint Photographic Experts Group (JPEG), a specialists group working under the aegis of the International Organization for Standardization (ISO) and the CCITT. We discuss first the overall configuration of the compression system, and then we describe the general implementation of the encoder and decoder. The system we pay most attention to is a hierarchical system which encodes an image in three stages. Last, we focus on a DSP-based system that can perform the decoding operation in real-time for a channel data-rate of 64 kbps. This hardware has been built with only off-the-shelf boards. It is capable of decoding and displaying a coarse first-stage image compressed to 0.25 bits/pixel in about one second, a second-stage update compressed to 0.5 bits/pixel in roughly two seconds, and finally a third-stage update compressed to 1.5 bits/pixel (for a total overall bit-rate of 2.25 bits/pixel) in five seconds.
Reliable Bayes Classification Using Multiple Description Segments And Its Applications In Scene Analysis
Nelson H. C. Yung, Kevin Jones
This paper presents the theory and realisation of the Multiple Description Segment (MDS) technique used in a commercially available integrated vision environment. It was applied to a sequence of scenes of an office corridor. The preliminary classification results show confident decisions by the classifier on well-defined objects and on objects with high complexity and noise interference. Objects that are very similar (but different in functionality) were also recognised correctly. The merits and pitfalls of the technique and the future direction of development are discussed.
A Note on Detecting Dominant Points
Nirwan Ansari, Edward J. Delp
Detecting dominant points is a crucial preprocessing step for shape recognition and point-based motion estimation. Polygonal approximation has been a commonly used approach to detecting dominant points. This paper presents two alternatives which detect stable dominant points.
Two-Parameter Cubic Convolution For Image Reconstruction
Stephen E. Reichenbach, Stephen K. Park
This paper presents an analysis of a recently proposed two-parameter piecewise-cubic convolution algorithm for image reconstruction. The traditional cubic convolution algorithm is a one-parameter, interpolating function. With the second parameter, the algorithm can also be approximating. The analysis leads to a Taylor series expansion for the average square error due to sampling and reconstruction as a function of the two parameters. This analysis indicates that the additional parameter does not improve the reconstruction fidelity: the optimal two-parameter convolution kernel is identical to the optimal kernel for the traditional one-parameter algorithm. Two methods for constructing the optimal cubic kernel are also reviewed.
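As an illustration of a two-parameter piecewise-cubic kernel, the sketch below uses the well-known Mitchell-Netravali (B, C) parameterization; this is not necessarily the parameterization analyzed by the authors, but it shows how the second parameter turns the interpolating kernels (B = 0, with B = 0, C = 0.5 giving the Catmull-Rom kernel) into approximating ones.

```python
import numpy as np

def mitchell_netravali(x, B=1/3, C=1/3):
    """Two-parameter piecewise-cubic reconstruction kernel (support |x| < 2).

    B = 0 gives the familiar one-parameter interpolating cubics
    (B = 0, C = 0.5 is the Catmull-Rom kernel); B > 0 gives an
    approximating, smoothing kernel.
    """
    x = np.abs(np.asarray(x, dtype=float))
    k = np.zeros_like(x)
    near = x < 1
    far = (x >= 1) & (x < 2)
    k[near] = ((12 - 9 * B - 6 * C) * x[near] ** 3
               + (-18 + 12 * B + 6 * C) * x[near] ** 2
               + (6 - 2 * B)) / 6.0
    k[far] = ((-B - 6 * C) * x[far] ** 3
              + (6 * B + 30 * C) * x[far] ** 2
              + (-12 * B - 48 * C) * x[far]
              + (8 * B + 24 * C)) / 6.0
    return k

def reconstruct_1d(samples, t, B=1/3, C=1/3):
    """Reconstruct a value at continuous position t from unit-spaced samples."""
    n = np.arange(len(samples))
    return float(np.sum(samples * mitchell_netravali(t - n, B, C)))

if __name__ == "__main__":
    samples = np.sin(0.4 * np.arange(16))
    # interpolating kernel (B = 0) versus approximating kernel (B > 0)
    print(reconstruct_1d(samples, 7.0, B=0.0, C=0.5))   # passes through sample 7
    print(reconstruct_1d(samples, 7.0, B=1/3, C=1/3))   # smoothed estimate
```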
TLS Grammars : Inference, Parsing, and Some Results
Jacques Blanc-Talon, Bertrand Zavidovique
This paper describes a grammatical 2D pattern recognition process. In order to build the inference process on stable features of the object being studied, we introduce a new type of grammar, the so-called TLS grammar. Attributes and related operators on chains, which are defined in this 2D application as particular links between confirmed segments, are introduced inside the grammar productions and processed within the parsing step. The grammar rules are weighted with probabilities, which are dynamically modified during the parsing step. Finally, experimental results on real patterns are discussed.
Recognition of Equations Using a Two-Dimensional Stochastic Context-Free Grammar
P. A. Chou
We propose using two-dimensional stochastic context-free grammars for image recognition, in a manner analogous to using hidden Markov models for speech recognition. The value of the approach is demonstrated in a system that recognizes printed, noisy equations. The system uses a two-dimensional probabilistic version of the Cocke-Younger-Kasami parsing algorithm to find the most likely parse of the observed image, and then traverses the corresponding parse tree in accordance with translation formats associated with each production rule to produce eqn | troff commands for the imaged equation. In addition, it uses two-dimensional versions of the Inside/Outside and Baum re-estimation algorithms for learning the parameters of the grammar from a training set of examples. Parsing the image of a simple noisy equation currently takes about one second of CPU time on an Alliant FX/80.
Real-Time Video Signal Processing System For Dynamic Images
Nobuyuki Yagi, Ryoichi Yajima, Kazumasa Enami, et al.
A real-time video signal processing system (the Picot system) has been developed which processes multiple color time-varying images at video rate and carries out various image processing functions such as edge detection and geometrical transformation. The system is a multi-processor system comprising several hundred processors, divided into cascaded clusters, each of which has 16 processors. Each processor has an image memory that stores all data required for its own processing, thus eliminating memory-access conflicts. The processors and cluster inputs/outputs are connected by a crossbar network, which realizes all combinations of connections, and the processors operate in both parallel and pipeline fashion. The micro-program-controlled system has a control mechanism and arithmetic functions suitable for image signal processing. Performance of the system improves in proportion to the number of processors. The processors and network are implemented almost entirely as LSIs using CMOS gate-array technology. The Picot system can be applied to various fields such as medical imaging, robot vision, video CODECs, broadcast video production, and so on.
A Double Precision High Speed Convolution Processor
F. Larochelle, J. F. Coté, A. S. Malowany
There exist several convolution processors on the market that can process images at video rate. However, none of these processors operates in floating-point arithmetic. Unfortunately, many image processing algorithms presently under development are inoperable in integer arithmetic, forcing researchers to use regular computers. To solve this problem, we designed a specialized convolution processor that operates in double-precision floating-point arithmetic with a throughput several thousand times higher than that obtained on a regular computer. Its high performance is attributed to a VLSI double-precision convolution systolic cell designed in our laboratories. A 9x9 systolic array carries out, in a pipelined manner, every arithmetic operation. The processor is designed to interface directly with the VME bus. A DMA chip is responsible for bringing the original pixel intensities from the memory of the computer to the systolic array and for returning the convolved pixels back to memory. A special use of 8K RAMs allows an inexpensive and efficient way of delaying the pixel intensities in order to supply the right sequence to the systolic array. On-board circuitry converts pixel values into floating-point representation when the image is originally represented with integer values. An additional systolic cell, used as a pipelined adder at the output of the systolic array, offers the possibility of combining images together, which allows a variable convolution window size and color image processing.
NTSC-CIF Mutual Conversion Processor
Shinji Nishimura, Hideo Kuroda, Yutaka Suzuki, et al.
An NTSC-CIF mutual conversion processor has been developed that consists of just four VLSI chips, four external frame memories and five field memories. The new processor can reversibly convert NTSC signals into any one of the four CIF formats. The processor replaces conventional systems consisting of several hundred chips, with commensurately lower cost and power consumption. Many circuit devices were developed for the processor. A new PLL circuit, similar to a "tanlock loop" for color demodulation, detects the phase error between the input and local color sub-carriers by looking up a ROM table. By limiting the detectable phase-error range to +/- 45 degrees, the ROM capacity is drastically reduced to 1/146 of that needed for +/- 90 degrees. In order to avoid picture degradation, parallel filters with different bandwidths are adopted. Coefficient sets of the horizontal and vertical filters are shared by the four formats. A pre-filter smooths motion and reduces noise. It utilizes double-buffered frame memories to minimize the required external memory capacity. The VLSI chips are constructed with advanced Bi-CMOS technology. A 0.8-micron design rule is used and the die size is 144 mm2. A 208-pin PGA package is used.
Fractal-Based Pattern Recognition And Its Applications To Cell Image Analysis
Peng Zhang, Herb Barad, Andrew Martinez
The fractal dimension of a surface is a useful measure of the roughness of the surface. A method of estimating fractal dimension using mathematical morphology is derived and is applied to cell image analysis. By utilizing the fractal dimension property, the cells can be automatically classified as labeled or unlabeled cells.
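One standard way of estimating the fractal dimension of a gray-level surface with mathematical morphology is the covering ("blanket") method: dilate and erode the surface with structuring elements of increasing size and fit a line to the resulting surface-area estimates in log-log coordinates. The sketch below follows that generic recipe with flat square structuring elements; it is not claimed to be the authors' exact estimator.

```python
import numpy as np
from scipy.ndimage import grey_dilation, grey_erosion

def blanket_fractal_dimension(img, scales=(1, 2, 3, 4, 5, 6)):
    """Estimate the fractal dimension of a gray-level surface by the
    morphological covering ('blanket') method.

    For each scale r the surface is dilated and eroded with a flat
    (2r+1) x (2r+1) structuring element; the volume between the two
    blankets, divided by 2r, approximates the surface area A(r).
    For a fractal surface A(r) ~ r**(2 - D), so D follows from the
    slope of log A(r) against log r.
    """
    f = img.astype(float)
    log_r, log_a = [], []
    for r in scales:
        size = (2 * r + 1, 2 * r + 1)
        upper = grey_dilation(f, size=size)
        lower = grey_erosion(f, size=size)
        area = (upper - lower).sum() / (2.0 * r)
        log_r.append(np.log(r))
        log_a.append(np.log(area))
    slope, _ = np.polyfit(log_r, log_a, 1)
    return 2.0 - slope

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    rough = rng.random((128, 128)) * 50                        # very rough surface
    smooth = np.outer(np.linspace(0, 50, 128), np.ones(128))   # planar surface
    print(blanket_fractal_dimension(rough))                    # close to 3
    print(blanket_fractal_dimension(smooth))                   # close to 2
```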
Using Fractal And Morphological Criteria For Automatic Classification Of Lung Diseases
Jacques Levy Vehel
Medical images are difficult to analyze by means of classical image processing tools because they are very complex and irregular. Such shapes are obtained, for instance, in nuclear medicine with the spatial distribution of activity for organs such as the lungs, liver, and heart. We have tried to apply two different theories to these signals: fractal geometry, which deals with the analysis of complex irregular shapes that cannot be well described by classical Euclidean geometry, and integral geometry, which treats sets globally and allows robust measures to be introduced. We have computed three parameters on three kinds of lung SPECT images (normal, pulmonary embolism and chronic disease): the commonly used fractal dimension (FD), which gives a measurement of the irregularity of the 3D shape; the generalized lacunarity dimension (GLD), defined as the variance of the ratio of the local activity to the mean activity, which is sensitive only to the distribution and size of gaps in the surface; and the Favard length, which gives an approximation of the surface of a 3-D shape. The results show that each slice of the lung, considered as a 3D surface, is fractal and that the fractal dimension is the same for each slice and for the three kinds of lungs; the lacunarity and the Favard length, on the other hand, are clearly different for normal lungs, pulmonary embolisms and chronic diseases. These results indicate that automatic classification of lung SPECT images can be achieved, and that a quantitative measurement of the evolution of the disease could be made.
Low-Level Segmentation Of Multidimensional Medical Images: An Expert System
Sai Prasad Raya, Gabor T. Herman
The basic framework of an expert rule-based segmentation system designed to handle some of the low-level segmentation problems in medical imaging is presented. The system is modular in nature consisting of two associative memories and a number of independent modules specialized in different aspects of segmentation. The basic control structure is that of a production system in which knowledge is formulated into condition-action rules. Some preliminary results pertaining to low-level segmentation of structures of the brain via magnetic resonance imaging are also presented.
Estimation Of Heart Wall Velocity Using The Wigner Distribution
Ernest M. Stokely, Malani Mandumula
The Wigner distribution (WD) has been suggested as an alternative to the Fourier transform as a spatiotemporal-frequency method for measuring optical flow. The WD has the advantage of pixel-by-pixel estimation of the local velocity field in an image set; however, the performance of the WD in estimating apparent local velocity in a nonhomogeneous optical flow field is uncertain. This paper substantiates the high spatial resolution of the WD in estimating the optical flow in an image with a highly nonhomogeneous velocity field. A simple method using linear regression is presented for estimating the velocity components at a pixel from the WD. One of the characteristics of the WD, the lack of a superposition property, gives rise to a cross term in the WD of the sum of two image sets that can lead to serious errors in the estimation of optical flow when a simple constant background is added to the image set. These properties are demonstrated with simulated data and with data from a scintigraphic imaging study of the human heart. The results show that the value of the estimated velocity vector is influenced by the location of the point of inspection. Sketches of the local correlation function for a 1-D moving waveform lend insight into the relationship between the placement of the point of inspection and the estimated velocity.
Semi-Automatic Extraction Of The Left-Ventricular Chamber From 3-D CT Cardiac Images
William E. Higgins, Namsik Chung, Erik L. Ritman
Given a high-resolution three-dimensional (3-D) volumetric image (volume) of the heart, one can estimate the volume and 3-D spatial distribution of left ventricular (LV) myocardial muscle mass. The first stage of this problem is to extract the LV chamber. The prevalent techniques for solving this problem require manual editing of the data on a computer console. Unfortunately, manual editing is subject to operator errors and biases, only draws upon two-dimensional views, and is extremely time consuming. We describe a semi-automatic method for extracting the volume and shape of the LV chamber from a 3-D CT image (or volume) of the heart. For a given volume, the operator first performs some simple manual edits. Then, an automated stage, which incorporates concepts from 3-D mathematical morphology and topology and the maximum-homogeneity filter, extracts the LV chamber. The method gives more consistent measurements and demands considerably less operator time than manual slice-editing.
Statistical Refinement Of Transmission Computed Tomograms In High Photon Counting Noise
Ken Sauer, Bede Liu
Tomographic reconstruction is conventionally performed by algorithms derived from deterministic views of the Radon transform inversion problem. Probably the two best-known approaches are convolution backprojection (CBP) and the algebraic reconstruction technique (ART). ART treats the image pixel values as unknowns in a very large set of linear equations. The iterative steps in ART can be thought of as a succession of projections onto convex sets, each of which includes all images satisfying one of the observed projection values. Standard ART includes no noise compensation, while the commercially more widely used CBP includes compensation for photon counting noise in projection data but implicitly assumes stationarity. In many common imaging problems, the consequences of this assumption are not serious, since the object may be relatively homogeneous, or corruption in the image may be dominated by effects not directly associated with photon counting noise. But reconstructions of nonhomogeneous objects from low-dosage transmission data may involve nonstationary, nonisotropic noise patterns which result from the data dependence of the noise statistics in the projections.
Resolution Enhancement of Magnetic Resonance Images Using an Iterative Data Refinement Technique
D. W. Ro, G. T. Herman, P. M. Joseph
Iterative Data Refinement (abbreviated IDR) is a general procedure which encompasses many special procedures for image reconstruction and for related problems. It is a procedure for estimating data that would have been collected by an idealized measuring device from data that were collected by an actual measuring device. Such approaches have been applied successfully in areas of reconstruction in x-ray tomographic radiology. In fact, IDR is general enough to encompass standard approaches to data recovery, such as the Error-Reduction and the Hybrid Input-Output methods. Along similar lines, IDR provides a common framework within which new algorithms can be developed for improved magnetic resonance imaging (MRI). We have applied and implemented the approach of IDR to a specific problem in MRI, namely to the correction of spatially-dependent blurs due to short local transverse relaxation (T2) values. The algorithm is designed to reconstruct T2-weighted spin density images with improved spatial resolution. The practical computational significance of using the IDR approach will be illustrated by the reconstruction of mathematical phantoms. We have found that over-relaxation of the algorithm improves computational speed by up to a factor of five.
Combined-Coding Techniques For Radiographic Image Data Compression
Ya-Qin Zhang, Murray H. Loew, Raymond L. Pickholtz
A new coding scheme, combined coding (CC), is proposed in this paper. The main idea behind CC is to divide the image into two image sets with different statistical properties and then exploit those properties to code the sets more efficiently. Three variations are distinguished and discussed, two of which are identified as CC scheme 1 and CC scheme 2. For CC scheme 1, i.e., run-length mapping followed by entropy coding for the upper image set (UIS) and Peano-differential mapping followed by Ziv-Lempel coding for the lower image set (LIS), the experimental results are tabulated and compared with other methods. For CC scheme 2, i.e., the same coding method for the UIS and transform coding for the LIS, both the rate-distortion performance and actual coding results are given. It is noted that CC significantly reduces the "blocking effect", a property that is very useful for clinical decisions. This scheme seems to provide a good tradeoff and an efficient solution for practical image transmission and archival applications.
Application Of Entropy-Constrained Vector Quantization To Waveform Coding Of Images
P. A. Chou
An algorithm recently introduced to design vector quantizers for optimal joint performance with entropy codes is applied to waveform coding of monochrome images. Experiments show that when such entropy-constrained vector quantizers (ECVQs) are followed by optimal entropy codes, they outperform standard vector quantizers (VQs) that are also followed by optimal entropy codes, by several dB at equivalent bit rates. Two image sources are considered in these experiments: twenty-five 256 x 256 magnetic resonance (MR) brain scans produced by a General Electric Signa at Stanford University, and six 512 x 512 (luminance component) images from the standard USC image database. The MR images are blocked into 2 x 2 components, and the USC images are blocked into 4 x 4 components. Both sources are divided into training and test sequences. Under the mean squared error distortion measure, entropy-coded ECVQ shows an improvement over entropy-coded standard VQ by 3.83 dB on the MR test sequence at 1.29 bit/pixel, and by 1.70 dB on the USC test sequence at 0.40 bit/pixel. Further experiments, in which memory is added to both ECVQ and VQ systems, are in progress.
A Hashing-Based Search Algorithm for Coding Digital Images by Vector Quantization
Chen-Chau Chu
This paper describes a fast algorithm to compress digital images by vector quantization. Vector quantization relies heavily on searching to build codebooks and to classify blocks of pixels into code indices. The proposed algorithm uses hashing, localized search, and multi-stage search to accelerate the searching process. The average of pixel values in a block is used as the feature for hashing and intermediate screening. Experimental results using monochrome images are presented. This algorithm compares favorably with other methods with regard to processing time, and has comparable or better mean square error measurements than some of them. The major advantages of the proposed algorithm are its speed, good quality of the reconstructed images, and flexibility.
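The role of the block average as a hashing and screening feature can be sketched as follows; the bucket width, search window and fallback rule are illustrative choices rather than the parameters used in the paper.

```python
import numpy as np

class MeanHashedCodebook:
    """Vector-quantizer search accelerated by hashing on the block mean."""

    def __init__(self, codebook, bucket_width=4.0):
        self.codebook = np.asarray(codebook, dtype=float)   # shape (N, L)
        self.bucket_width = bucket_width
        self.buckets = {}
        for idx, mean in enumerate(self.codebook.mean(axis=1)):
            key = int(mean // bucket_width)
            self.buckets.setdefault(key, []).append(idx)

    def encode(self, vector, window=1):
        """Return the index of the (approximately) closest codeword."""
        v = np.asarray(vector, dtype=float)
        key = int(v.mean() // self.bucket_width)
        candidates = []
        for k in range(key - window, key + window + 1):
            candidates.extend(self.buckets.get(k, []))
        if not candidates:                       # fallback: exhaustive search
            candidates = range(len(self.codebook))
        candidates = np.fromiter(candidates, dtype=int)
        dists = np.sum((self.codebook[candidates] - v) ** 2, axis=1)
        return int(candidates[np.argmin(dists)])

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    codebook = rng.integers(0, 256, size=(256, 16)).astype(float)  # 4x4 blocks
    vq = MeanHashedCodebook(codebook)
    block = rng.integers(0, 256, size=16).astype(float)
    print(vq.encode(block))
```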
System for Lossless Digital Image Coding/Decoding
H. H. Torbey, H. E. Meadows
This study revisits simple methods, subsampling and modulo differencing, and combines them in a system suitable for operation in ISDN as well as non-ISDN environments with good overall performance. The modulo difference of selected pairs of pixels in a digital image enables the design of a hierarchical code which represents the original digital image without any loss of information. The pixels are selected by pairing consecutive non-overlapping pixels in the original image and then in successive subsampled versions of it. The subsampling is 2:1 and, like the pairing, is alternated horizontally and vertically. This algorithm uniquely combines hierarchical and parallel processing. Further processing of the hierarchical code through an entropy coder achieves good compression. If a Huffman code is used, default tables can be designed with minimal penalty. Parallel processing allows the coder to operate at various speeds, up to real time, and the hierarchical data structure enables operation in a progressive transmission mode. The algorithm's inherent simplicity and the use of the modulo difference operation make this coding scheme computationally simple and robust to noise and to errors in coding or transmission. The system can be implemented economically in software as well as in hardware. Coder and decoder are symmetrical. Compression results can be slightly improved by using 2-D prediction, but at the cost of an increase in system complexity and sensitivity to noise.
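One horizontal level of the pairing and modulo-difference idea can be sketched as below: each pair of consecutive pixels is replaced by one retained pixel (forming the 2:1 subsampled image passed to the next, vertically paired level) and the modulo-256 difference of the pair, and the step is exactly invertible. The full hierarchy and the entropy coder are omitted, and the function names are illustrative.

```python
import numpy as np

def split_level_horizontal(img):
    """One lossless pairing step along the rows.

    Consecutive non-overlapping pixel pairs (a, b) are replaced by the
    retained pixel a (the 2:1 horizontally subsampled image) and the
    modulo-256 difference (b - a) mod 256.  Assumes an even width.
    """
    a = img[:, 0::2].astype(np.uint8)
    b = img[:, 1::2].astype(np.uint8)
    diff = (b.astype(np.int16) - a.astype(np.int16)) % 256
    return a, diff.astype(np.uint8)

def merge_level_horizontal(a, diff):
    """Exact inverse of split_level_horizontal."""
    b = (a.astype(np.int16) + diff.astype(np.int16)) % 256
    out = np.empty((a.shape[0], 2 * a.shape[1]), dtype=np.uint8)
    out[:, 0::2] = a
    out[:, 1::2] = b.astype(np.uint8)
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(6)
    img = rng.integers(0, 256, size=(8, 16), dtype=np.uint8)
    sub, diff = split_level_horizontal(img)   # the next level would pair 'sub' vertically
    assert np.array_equal(merge_level_horizontal(sub, diff), img)
    # For natural images 'diff' is strongly peaked near 0 and 255 (small positive
    # and negative differences), which is what the entropy coder exploits.
```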
Adaptive Thresholding and Sketch-Coding of Grey Level Images
Thrasyvoulos N. Pappas
We consider the problem of obtaining a binary sketch from a grey level image and coding it at very low bit rates. Halftone representations of images have too many transitions from level to level, and cannot be coded at similar rates. We use an image segmentation algorithm, presented in a previous paper, to obtain the binary sketch. It is an adaptive thresholding scheme that uses spatial constraints and takes into consideration the local intensity characteristics of the image to obtain the sketch. The algorithm is applied to a variety of images. In particular, it is applied to images of human faces. The sketches preserve most of the information necessary for recognition. They are similar to what an artist would sketch with a few brushstrokes. The sketches are coded by tracing the boundaries of the black or white regions. For the human face images, our coding scheme typically requires less than 0.1 bit/pixel. This is substantially lower than what waveform coding techniques require. We also compare our technique to other thresholding schemes as well as edge detectors, and demonstrate its advantages.
Block Transform Image Coding By Multistage Vector Quantization With Optimal Bit Allocation
Limin Wang, Morris Goldberg
In this paper, we introduce a multistage vector quantization technique (MVQ-OBA) applied to the transform coefficients, in which the effective number of bits assigned to each coefficient is proportional to the coefficient variance. An optimal bit allocation map for a given bit rate, {Bij}, is first found based on the variances of the transform coefficients. The optimal bit allocation map {Bij} is then sliced into a set of bit allocation planes {Bk,ij, k = 0, 1, ...} by applying a set of thresholds {Tk, k = 0, 1, ...}. Here, Bk,ij indicates the number of bits assigned to coefficient (i, j) at stage k. The transformed image is then vector quantized on a stage-by-stage basis where, at each stage k, only the (residual error) coefficients assigned a non-zero number of bits are combined into vectors and vector quantized with a codebook of size 2^(Σi,j Bk,ij). Since only part of the coefficients are included in the vectors and a relatively small codebook is used at each stage, the overhead required for transmitting the codebook is significantly reduced. Furthermore, as MVQ-OBA operates in a multistage manner in which the information transmitted up to each stage corresponds to an approximation of the image, it is well suited for progressive image transmission.
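A rough sketch of the allocation and slicing steps is given below. The variance-based allocation rule is the textbook one, and the particular slicing rule (each stage takes the portion of Bij lying between consecutive thresholds) is an assumption consistent with the description above, not necessarily the authors' rule.

```python
import numpy as np

def optimal_bit_allocation(variances, avg_bits):
    """Textbook variance-based bit allocation for transform coefficients:
    B_ij = avg_bits + 0.5 * log2(var_ij / geometric mean of variances),
    rounded and clipped to non-negative integers.  (The paper's exact
    allocation may differ.)"""
    v = np.maximum(np.asarray(variances, dtype=float), 1e-12)
    b = avg_bits + 0.5 * np.log2(v / np.exp(np.mean(np.log(v))))
    return np.maximum(np.round(b), 0).astype(int)

def slice_into_planes(bit_map, thresholds):
    """Slice the bit-allocation map {B_ij} into per-stage planes {B_k,ij}.

    With decreasing thresholds T_0 > T_1 > ... ending at 0, stage k receives
    the portion of B_ij lying between T_k and the previous threshold, so the
    planes sum back to B_ij.  (An illustrative slicing rule, assumed here.)"""
    planes = []
    prev = np.inf
    for t in thresholds:
        plane = np.clip(np.minimum(bit_map, prev) - t, 0, None).astype(int)
        planes.append(plane)
        prev = t
    return planes

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    variances = rng.gamma(2.0, 50.0, size=(8, 8))   # mock DCT coefficient variances
    b = optimal_bit_allocation(variances, avg_bits=1.0)
    planes = slice_into_planes(b, thresholds=[4, 2, 0])
    assert np.array_equal(sum(planes), b)
    for k, plane in enumerate(planes):
        print(f"stage {k}: codebook size 2**{int(plane.sum())}")
```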
Discrete Cosine Transform Image Coding With Sliding Block Codes
Ajay Divakaran, William A. Pearlman
A transform trellis coding scheme for images is presented. A two-dimensional discrete cosine transform is applied to the image, followed by a search on a trellis-structured code. This code is a sliding block code that utilizes a constrained-size reproduction alphabet. The transform coding divides the image into blocks. The non-stationarity of the image is counteracted by grouping these blocks into clusters through a clustering algorithm and then encoding the clusters separately. Mandela-ordered sequences are formed from each cluster, i.e., identically indexed coefficients from each block are grouped together to form one-dimensional sequences. A separate search ensues on each of these Mandela-ordered sequences. Padding sequences are used to improve the trellis search fidelity; they absorb the error caused by the building up of the trellis to full size. The simulations were carried out on a 256x256 image ('LENA'). The results are comparable to those of existing schemes. The visual quality of the image is enhanced considerably by the padding and clustering.
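The Mandela ordering itself is just a regrouping of the block-DCT output so that identically indexed coefficients from all blocks form one 1-D sequence each; a sketch of that regrouping (without the clustering or the trellis search) follows.

```python
import numpy as np
from scipy.fftpack import dct

def block_dct(img, bs=8):
    """2-D DCT-II applied independently to each bs x bs block."""
    h, w = img.shape
    blocks = img.reshape(h // bs, bs, w // bs, bs).swapaxes(1, 2)  # (nby, nbx, bs, bs)
    return dct(dct(blocks, axis=-1, norm='ortho'), axis=-2, norm='ortho')

def mandela_order(coeff_blocks):
    """Group identically indexed coefficients from every block into one 1-D
    sequence per coefficient index: output shape (bs*bs, n_blocks)."""
    nby, nbx, bs, _ = coeff_blocks.shape
    # rows index the coefficient position (i, j); columns index the block
    return coeff_blocks.reshape(nby * nbx, bs * bs).T

if __name__ == "__main__":
    img = np.random.default_rng(8).random((32, 32))
    coeffs = block_dct(img, bs=8)
    sequences = mandela_order(coeffs)          # 64 sequences of 16 samples each
    print(sequences.shape)                     # (64, 16): sequence 0 is the DC terms
    assert np.allclose(sequences[0], coeffs[..., 0, 0].ravel())
```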
Two Channel Image Coding Scheme
Diego P. de Garrido, Paulo Roberto R. L. Nunes, Jacques Szczupak
In this paper we present a two-channel image compression technique which achieves good compression rates and image quality. The technique consists of splitting the image into two parts: a low-pass part, which represents the general area brightness without sharp contours, and a high-pass part containing mainly sharp edge information. This decomposition into low- and high-frequency components was first presented in the so-called Synthetic Highs System, which elegantly exploits the properties of the visual system and consists of detecting and coding edges separately from the low-frequency components. The method proposed here follows the general idea of the Synthetic Highs System, although no edge detection is involved. Most of the contribution of our scheme relates to the way the two frequency bands are coded: transform coding for the low-pass part and vector quantization for the high-pass part. The system was tested using several monochromatic images of sizes 256 x 256 and 512 x 512; good results were obtained in terms of visual quality and peak signal-to-noise ratio. Blocking effects were not observed for the images tested.
Multi-Stage Vector Quantization Based On The Self-Organization Feature Maps
J. Li, C. N. Manikopoulos
A neural network clustering algorithm, the Self-Organizing Feature Map (SOFM) proposed by Kohonen, is used to design a vector quantizer. The SOFM algorithm differs from the LBG algorithm in that the former forms a codebook adaptively rather than iteratively. For every input vector, the weights between the input nodes and the corresponding output node are updated by encouraging a shift toward the center of gravity within the influence region. Some important properties are discussed, demonstrated by examples, and compared with the LBG algorithm. Based on this clustering algorithm, a very practical image sequence coding scheme is proposed, which consists of two cascaded neural networks. The first-stage network is adapted with every frame so that the coder can quickly track local changes in the picture. Simulation results show that quite robust performance, with high signal-to-noise ratio, can be achieved using the absolute-value distortion measure. Additionally, the cascaded scheme substantially reduces the computational complexity.
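The adaptive codebook formation can be sketched with a small 1-D SOFM: for each training vector the best-matching codeword and its map neighbors are pulled toward the input, with the learning rate and neighborhood width shrinking over time. The map topology, schedules and parameter values below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def sofm_codebook(training_vectors, map_size=16, epochs=5, seed=0):
    """Train a 1-D self-organizing feature map whose weight vectors serve as a
    VQ codebook.  The winner and its map neighbors move toward each input;
    the learning rate and neighborhood width decay with time."""
    rng = np.random.default_rng(seed)
    dim = training_vectors.shape[1]
    weights = rng.random((map_size, dim)) * training_vectors.max()
    positions = np.arange(map_size)
    n_steps = epochs * len(training_vectors)
    step = 0
    for _ in range(epochs):
        for x in rng.permutation(training_vectors):
            progress = step / n_steps
            lr = 0.5 * (1.0 - progress)                       # decaying learning rate
            sigma = max(map_size / 4.0 * (1.0 - progress), 0.5)
            winner = np.argmin(np.sum((weights - x) ** 2, axis=1))
            influence = np.exp(-((positions - winner) ** 2) / (2 * sigma ** 2))
            weights += lr * influence[:, None] * (x - weights)
            step += 1
    return weights

def encode(vectors, codebook):
    """Map each vector to the index of its nearest codeword."""
    d = np.sum((vectors[:, None, :] - codebook[None, :, :]) ** 2, axis=2)
    return np.argmin(d, axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(9)
    train = rng.random((2000, 16)) * 255          # e.g. 4x4 image blocks
    codebook = sofm_codebook(train, map_size=32)
    print(encode(train[:10], codebook))
```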
Picture Coding With Switchable Dynamic Quantizers
Chung H. Lu, Shing-Sun Yu
This paper presents a picture coding scheme with switchable dynamic quantizers. The image data compression technique is a hybrid of subsampling, pyramid coding, block truncation coding, interpolative coding, dynamic quantization and differential quantization. In addition to the differential quantizer, there are three other quantization modes. According to image activity, each block of the subsampled image can be quantized and coded dynamically in one of four modes. The coding technique can be adapted for progressive transmission of images and can be extended to color images.
Machine Vision Algorithms on the NCUBE/10
David R. Kaeli
This paper presents work in progress on benchmarking machine vision algorithms on the NCUBE/10 system. Early results show that speedups of O(log n) are achievable on low-level floating-point algorithms.
Design and VLSI Implementation of Efficient Systolic Array Architectures for High-Speed Digital Signal Processing Applications
D. V. Poornaiah, M. O. Ahmad
In this paper, we present a class of systolic array architectures (SAAs) that can be used to efficiently implement many computationally intensive DSP functions. A salient feature of the proposed architectures is that, depending on the application, an SAA design intended for a particular DSP function and arithmetic type can be adapted to perform another DSP function and arithmetic operation with minimal hardware modification, involving a suitable interconnection of the basic building blocks: the inner-product-step-processor (IPSP) cells. This facilitates the automatic selection of a particular SAA design from the algorithmic description of a given DSP function. Furthermore, the proposed SAAs either totally eliminate or minimize the use of the separate adder modules that are normally used along with multiplier units to perform the inner-product-step computations involved in 1- and 2-dimensional DSP functions. Finally, the use of the proposed schemes results in reductions in computation time, area, and the number of cell types, making them highly attractive for VLSI implementation.
Vector-Centered CAM Architecture For Image Coding Using Vector Quantization
S. Panchanathan, M. Goldberg
In this paper, a vector-centered CAM architecture for image coding using vector quantization is presented. In vector quantization (VQ), a set of representative vectors (a codebook) is generated from a training set of vectors. The input vectors to be coded are quantized to the closest codeword of the codebook and the corresponding index (label) of the codeword is transmitted. Thus, VQ essentially involves a search operation to obtain the best match. Traditionally, the search mechanism is implemented sequentially, where each vector is compared with the codewords one at a time. For K input vectors of dimension L and a codebook of size N, the search complexity is O(KLN), which is compute-intensive, making real-time implementation of the VQ algorithm difficult. A content-addressable memory (CAM) based architecture (where the data are accessed simultaneously and in parallel on the basis of their content), which exploits parallelism in the directions of L and K and results in a real-time implementation of VQ, has been reported. There, the CAM cell is essentially pixel-centered and hence K*L cells have to be organized in L parallel modules with K cells per module, which implies a large hardware complexity. Furthermore, the results of the search operation in the individual modules have to be combined, which involves a high communication overhead. In this paper, we propose a vector-centered CAM cell which stores the entire vector (Vij) in a cyclic stack. The individual pixels in the vector are then circulated into the search portion of the cell. The results of the search operation are stored in a response register which is also organized within the cell. The proposed design has the advantages of reduced hardware complexity, low communication overhead and modularity, which make VLSI implementation of the architecture possible. An analysis of the savings in hardware complexity and communication overhead is also presented.
A 2-D Convolver Architecture For Real-Time Image Processing
David Landeta, Chris W. Malinowski
This paper presents a novel architecture for two VLSI ICs, an 8-bit and a 12-bit version, which execute real-time 3x3-kernel image convolutions in roughly 10 ms per 512x512-pixel frame (at a 30 MHz external clock rate). The ICs are capable of performing "on-the-fly" convolutions of images without any need for external input image buffers. Both symmetric and asymmetric coefficient kernels are supported, with coefficient precision up to 12 bits. Nine on-chip multiplier-accumulators maintain double-precision accuracy for maximum precision of the results and minimum roundoff noise. In addition, an on-chip ALU can be switched into the pixel datapath to perform simultaneous pixel-point operations on the incoming data. Thus, operations such as thresholding, inversion, shifts, and double-frame arithmetic can be performed on the pixels with no extra speed penalty. Flexible internal datapaths of the processors provide easy means of cascading several devices if larger image arrays need to be processed. Moreover, larger convolution kernels, such as 6x6, can easily be supported with no speed penalty by employing two or more convolvers. On-chip delay buffers can be programmed to any desired raster line width up to 1024 pixels. The delay buffers may also be bypassed when direct "sum-of-products" operation of the multipliers is required, such as when external frame-buffer address sequencing is desired. These features make the convolvers suitable for applications such as affine and bilinear interpolation, one-dimensional convolution (FIR filtering), and matrix operations. Several examples of applications illustrating stand-alone and cascade-mode operation of the ICs are discussed.
Multiprocessor DSP With Multistage Switching Network And Its Scheduling For Image Processing
Yasuyuki Okumura, Kazunari Irie, Ryozo Kishimoto
This paper proposes a dynamic load balancing method using a multistage switching network to solve one of the greatest problems in multiprocessor DSPs for image processing: load concentration on certain processors. This method balances the processing load by distributing the total load among processor elements whose loads are small. The load distribution is performed by the multistage switching network, which transmits the load quantity information within the network. A scheduling method for a motion picture coding algorithm using the multiprocessor DSP is also proposed. The scheduling method takes full advantage of the multistage switching network functions in distributing the processing load and sorting the processed results. Using computer simulation, the multiprocessor DSP's performance is shown to be double that of a conventional multiprocessor DSP, when an initially unbalanced load is allocated to the processors, as in picture coding for TV conferences.
Color-Edge Detectors for a VLSI Convolver
M. E. Malowany, A. S. Malowany
Two color edge-detection algorithms are presented for high-speed processing of RGB image planes from a color camera using a high-performance convolution processor board. The first uses a 3x3 Laplacian operator and the second uses a cross operator composed of horizontal (1x9) and vertical (9x1) difference-operator components. In this paper, we examine the performance of these algorithms on artificially generated color images with different amounts of added noise. This approach is placed in context by reviewing the work of other researchers in color machine vision. The experiments described in this paper were carried out in a Sun workstation environment on simulated color input to test the algorithms. This work is part of the effort to integrate a new high-performance convolver board and color camera into our robotic workcell environment.
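A rough software analogue of the first detector is sketched below: each of the R, G and B planes is convolved with a 3x3 Laplacian kernel and the per-plane responses are combined into a single edge map. The magnitude-sum combination rule and the threshold are assumptions made for the sketch, not necessarily the authors' choices.

```python
import numpy as np
from scipy.ndimage import convolve

# one common form of the 3x3 Laplacian kernel
LAPLACIAN_3X3 = np.array([[0,  1, 0],
                          [1, -4, 1],
                          [0,  1, 0]], dtype=float)

def color_laplacian_edges(rgb, threshold=30.0):
    """Apply a 3x3 Laplacian to each color plane and combine the responses.

    rgb: array of shape (H, W, 3).  Returns a binary edge map.  Summing the
    per-plane magnitudes is one simple combination rule, assumed here.
    """
    rgb = rgb.astype(float)
    response = np.zeros(rgb.shape[:2])
    for c in range(3):
        response += np.abs(convolve(rgb[..., c], LAPLACIAN_3X3, mode="nearest"))
    return response > threshold

if __name__ == "__main__":
    rng = np.random.default_rng(10)
    img = np.zeros((64, 64, 3))
    img[:, 32:, 0] = 200.0                       # a vertical red edge
    img += rng.normal(0, 5, img.shape)           # additive noise, as in the tests
    edges = color_laplacian_edges(img)
    print(edges[:, 30:35].sum(), "edge pixels near the boundary")
```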
A Modular Ring Architecture for Large Scale Neural Network Implementations
Lance B. Jump, Panos A. Ligomenides
Constructing fully parallel, large scale, neural networks is complicated by the problems of providing for massive interconnectivity and of overcoming fan in/out limitations in area-efficient VLSI/WSI realizations. A modular, bus switched, neural ring architecture employing primitive ring (pRing) processors is proposed, which solves the fan in/out and connectivity problems by a dynamically reconfigurable communication ring that synchronously serves identical, radially connected, processing elements. It also allows cost versus performance trade-offs by the assignment of variable numbers of logical neurons to each physical processing element.
A Multi-Processor Accelerator for 2D Image Handling
J. S. Sheblee, C. R. Allen, C. E. Goutis
The transmission and automatic compaction of complex imagery on local area networks is a rapidly developing field, driven by the needs of industry, commerce and medicine, which require image-based data on existing LAN computing systems. This paper describes an image processing accelerator architecture for application in the field of pictorial archiving and communication systems (PACS). The image handling system is designed to speed up the performance of personal computers, and is being tested using data derived from endoscopic examination of patients attending the outpatient clinic in the department of Ear, Nose and Throat of a local hospital.
Image Processing by the Human Eye
Larry N. Thibos
Image processing by the eye is treated as a classical example of concatenated linear filters followed by a sampling operation. The first filter is optical and is characterized by an optical point-spread function. The second filter is neural and is characterized by the neural point-spread function, which is shown to be related to the receptive fields of retinal neurons. Sampling renders the internal "neural image" a discrete signal subject to the effects of aliasing. Conditions responsible for aliasing are formulated in terms of the amount of overlap of retinal samplers. Evidence of aliasing in human vision is presented along with a simulation of an aliased neural image in the peripheral visual field.
A Statistical Framework for Robust Fusion of Depth Information
Laurence T. Maloney, Michael S. Landy
We describe a simple statistical framework intended as a model of how depth estimates derived from consistent depth cues are combined in biological vision. We assume that the rule of combination is linear, and that the weights assigned to estimates in the linear combination are variable. These weight values corresponding to different depth cues are determined by ancillary measures, information concerning the likely validity of different depth cues in a particular scene. The parameters of the framework may be estimated psychophysically by procedures described. The conditions under which the framework may be regarded as normative are discussed.
Visual Issues In The Use Of A Head-Mounted Monocular Display
Eli Peli
A miniature display device, recently available commercially, is aimed at providing a portable, inexpensive means of visual information communication. The display is head-mounted in front of one eye, with the other eye's view of the environment unobstructed. Various visual phenomena are associated with this design. The consequences of these phenomena for visual safety, comfort, and efficiency of the user were evaluated. (1) The monocular, partially occluded mode of operation interrupts binocular vision. Presenting disparate images to each eye results in binocular rivalry. The two images may appear superimposed, with one image perceived with greater clarity or completely dominant. Most observers can use the display comfortably in this rivalrous mode. In many cases, it is easier to use the display in a peripheral position, slightly above or below the line of sight, thus permitting normal binocular vision of the environment. (2) As a head-mounted device, the displayed image is perceived to move during head movements due to the response of the vestibulo-ocular reflex. These movements affect the visibility of small letters during active head rotations and sharp accelerations. Adaptation is likely to reduce this perceived image motion. No evidence of postural instability or motion sickness was noted as a result of these conflicts between visual and vestibular inputs. (3) Small displacements of the image are noted even without head motion, resulting from eye movements and the virtual lack of display persistence. These movements are noticed spontaneously by few observers and are unlikely to interfere with use of the display in most tasks.
Morphological Convolution Operations For Image Processing
M. M. Gupta, B. De Baets
Mathematical morphological operations are an important class of operations in image processing, the development of machine vision systems, and other similar applications. In this paper we present a general framework for morphological convolution operations. In the conventional convolution integral (summation) there are several basic mathematical steps (such as shifting, multiplication and integration) which are used to convolve the input signals or images with the weighting function (kernel) of a dynamic system. In this study we generalize these operational steps by replacing the multiplication operation with a broader confluence operation, and the integration operation with an aggregation operation. Some of these operations are similar to those found in the biological vision system at both the retinal and visual cortex levels. This generalization leads to both deterministic and fuzzy types of morphological operations.
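The generalization can be sketched in a few lines (an illustrative reading of the idea, not the authors' formalism): replacing the multiply/sum pair of ordinary convolution with an addition/maximum pair turns the same sliding-window template into a grayscale dilation.

```python
# Generalized 1-D "convolution": the confluence and aggregation operations are
# passed in as functions. (multiply, sum) gives ordinary linear filtering;
# (add, max) gives grayscale dilation by a structuring function.
import numpy as np

def generalized_convolution(signal, kernel, confluence, aggregation):
    n, k = len(signal), len(kernel)
    padded = np.pad(signal, k // 2, mode="edge")
    out = np.empty(n)
    for i in range(n):
        window = padded[i:i + k]
        out[i] = aggregation([confluence(w, g) for w, g in zip(window, kernel[::-1])])
    return out

x = np.array([0.0, 1.0, 3.0, 2.0, 0.0, 0.0, 4.0])
smooth = generalized_convolution(x, np.array([0.25, 0.5, 0.25]),
                                 confluence=lambda a, b: a * b, aggregation=sum)
dilate = generalized_convolution(x, np.zeros(3),
                                 confluence=lambda a, b: a + b, aggregation=max)
print(smooth)    # linear smoothing
print(dilate)    # local maximum over a flat 3-sample structuring element
```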
Color Vision: Machine And Human
Jussi Parkkinen, Timo Jaaskelainen
Color vision is nowadays an important and intensively studied field in both machine vision and human vision research. There are numerous (three-dimensional) color coordinate systems used in machine vision, and most of them are based on simplified models of human vision. Though often sufficient, a three-dimensional color representation does not always give the needed accuracy. To achieve better performance, multispectral imaging and analysis methods are needed. A pattern recognition based color analysis method, the subspace approach, is described in this paper. This method is applicable to color discrimination, recognition, and classification. Eigen-spectra information of natural colors is compared to anatomical and physiological data on the color vision mechanism. Interesting similarities were observed.
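A bare-bones version of a subspace classifier of this kind is sketched below on synthetic 31-band spectra; the data, subspace dimension and class structure are assumptions for illustration only.

```python
# Subspace classification of reflectance spectra: each class is represented by the
# span of its leading eigen-spectra, and a sample goes to the class whose subspace
# captures the most of its energy. Spectra here are synthetic stand-ins.
import numpy as np

def class_subspace(spectra, dim):
    """Orthonormal basis (columns) for the leading eigen-spectra of one color class."""
    _, _, vt = np.linalg.svd(np.asarray(spectra, dtype=float), full_matrices=False)
    return vt[:dim].T

def classify(sample, bases):
    scores = [np.linalg.norm(B.T @ np.asarray(sample, dtype=float)) for B in bases]
    return int(np.argmax(scores))                     # largest projected energy wins

rng = np.random.default_rng(1)
class_a = np.linspace(0.0, 1.0, 31) + rng.normal(scale=0.1, size=(20, 31))   # 31-band spectra
class_b = np.linspace(1.0, 0.0, 31) + rng.normal(scale=0.1, size=(20, 31))
bases = [class_subspace(class_a, dim=3), class_subspace(class_b, dim=3)]
print(classify(class_a[0], bases), classify(class_b[0], bases))              # expected: 0 1
```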
Algorithms From The fdsk-Model Of Paradigmatic Recognition
Panos A. Ligomenides
Rational decision making (human and robotic) depends critically on the availability of comprehensible internal models of the decision making world. Recognition of the spatio-temporal patterns of numerically or linguistically quantified attributes and relations, by assessment of conformity/resemblance to holistic perceptual elastic constraints of selected norms, is essential for the formulation of comprehensible internal models. The "formal description schema" (fdsk) model has been proposed for perceptual recognition and inferential capture of real-time experiential knowledge from underconstrained and often indeterminate sensory data [10]. In this paper, the man-machine interactive paradigmatic approach to modeling human perception of conformity/resemblance is reviewed and algorithms for real-time recognition are presented. Fully parallel implementation using neural networks is also discussed.
Numerical Analysis of Image Patterns
Alan C. Bovik, Nanda Gopal, Tomas Emmoth
We find similarities between spatial pattern analysis and other low-level cooperative visual processes. Numerical algorithms for computing intrinsic scene attributes, e.g. shape-from-X (shading, texture, etc.) and optical flow typically involve estimating generalized orientation components via iterative constraint propagation. Smoothing or regularizing terms imposed on the constraint equations often enhance the uniqueness / stability (well-posedness) of the solutions. The numerical approach to visual pattern analysis developed here proceeds analogously via estimation of emergent 2-D image frequencies. Unlike shape-from-X or optical flow paradigms, constraints are derived from the responses of multiple oriented spatial frequency channels rather than directly from the image irradiance measurements. By using channel filters that are sufficiently concentrated in both space and frequency, highly accurate spatial frequency estimates are computed on a local basis. Two methods are proposed. In the first, constrained estimates of the emergent image frequencies are obtained by resolving the responses of multiple channel filters in a process similar to photometric stereo. The second approach formulates the estimation of frequencies as an extremum problem regularized by a smoothing term. An iterative constraint propagation algorithm is developed analogous to those used in variational / relaxational approaches to shape-from-X (shading, texture) and optical flow. Examples illustrate each approach using synthetic and natural images.
Image Motion Processing in Biological and Computer Vision Systems
Abdesselam Bouzerdoum, Robert B. Pinter
Motion modeling in biological and computer vision systems is divided into two categories: intensity-based schemes and feature-matching schemes. Some models from each category are discussed. Intensity-based models are further subdivided into global and local models. Moreover, a new motion detection model that belongs to the family of intensity-based schemes is introduced. The model operates on the same basic structure as Reichardt's correlation model, that is, a nonlinear asymmetric interaction between signals from two adjacent channels. However, the new model differs from that of Reichardt in the nature and origin of the nonlinear interaction: it is of the inhibitory type, originating from the biophysical mechanism of shunting inhibition. Our model detects the motion of features such as edges and bars fairly well. Furthermore, its mean response to a moving grating of low contrast is equivalent to that of the Reichardt correlation model.
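For reference, a minimal version of the classic Reichardt correlation detector (the baseline structure the new model starts from) is sketched below on a synthetic drifting bar; the paper's shunting-inhibition interaction is not reproduced here.

```python
# Classic Reichardt correlator on a 1-D image sequence: each half-detector multiplies
# the delayed (low-pass) signal of one receptor with the undelayed signal of its
# neighbor, and the two half-detectors are subtracted to give a signed motion output.
import numpy as np

def reichardt(frames, alpha=0.5):
    """frames: (T, N) intensities. Returns a (T, N-1) opponent signal (rightward > 0)."""
    frames = np.asarray(frames, dtype=float)
    delayed = np.zeros_like(frames)
    for t in range(1, frames.shape[0]):               # first-order low-pass acts as the delay
        delayed[t] = alpha * frames[t] + (1 - alpha) * delayed[t - 1]
    left, right = frames[:, :-1], frames[:, 1:]
    dleft, dright = delayed[:, :-1], delayed[:, 1:]
    return dleft * right - dright * left

# A bright bar drifting rightward by one pixel per frame.
T, N = 8, 12
seq = np.zeros((T, N))
for t in range(T):
    seq[t, 2 + t] = 1.0
out = reichardt(seq)
print(out.sum(axis=1))        # nonnegative, and positive once the delay has built up
```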
Nonseparable QMF Pyramids
Eero P. Simoncelli, Edward H. Adelson
It is widely recognized that effective image processing and machine vision must involve the use of information at multiple scales, and that models of human vision must be multi-scale as well. The most commonly used image representations are linear transforms, in which an image is decomposed into a sum of elementary basis functions. Besides being well understood, linear transformations which can be expressed in terms of convolutions provide a useful model of early processing in the human visual system. The following properties are valuable for linear transforms that are to be used in image processing and vision modelling:
Subband Coding Of Images Using Singular Value Decomposition (SVD)
Tian-Hu Yu, Sanjit K. Mitra
The singular value decomposition method is proposed for the coding of the subband images generated via a two-dimensional (2-D) quadrature mirror filter analysis bank. This approach permits easy determination of the optimal bit allocation, and by decomposing an image into singular vectors, a 2-D coding problem is transformed into a 1-D coding problem, for which a stable linear prediction algorithm exists. In addition, the determination of the SVD of the subimages is much simpler than that of the original image, due to their considerably smaller sizes. Simulation results are presented to illustrate the performance characteristics of the proposed approach.
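The central step, representing a small subband by its leading singular vectors so that 2-D blocks become 1-D vectors, can be illustrated as follows (a sketch on a synthetic 16x16 "low band"; the bit-allocation and linear-prediction stages of the paper are omitted):

```python
# Truncated SVD of a small subband image: the (u_i, s_i, v_i) triples are the 1-D
# quantities a coder would quantize; here we only check the reconstruction quality.
import numpy as np

def svd_truncate(subband, k):
    u, s, vt = np.linalg.svd(subband, full_matrices=False)
    approx = (u[:, :k] * s[:k]) @ vt[:k]
    return u[:, :k], s[:k], vt[:k], approx

rng = np.random.default_rng(2)
rows, _ = np.mgrid[0:16, 0:16]
subband = np.sin(rows / 3.0) + 0.1 * rng.normal(size=(16, 16))   # smooth stand-in for a low band
u, s, vt, approx = svd_truncate(subband, k=4)
rel_err = np.linalg.norm(subband - approx) / np.linalg.norm(subband)
print(np.round(s, 2), round(float(rel_err), 3))   # a few singular values carry most of the energy
```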
Perfect Reconstruction Filter Banks with Rational Sampling Rates in One and Two Dimensions
Jelena Kovacevic, Martin Vetterli
Multirate filter banks with integer sampling rate changes are used widely in subband coding and transmultiplexing. A more general scheme is obtained when one allows arbitrary rational sampling rate changes, leading to non-uniform division of the frequency spectrum. This paper shows how to obtain arbitrary non-uniform perfect reconstruction filter banks by building trees of divisions into two (possibly unequal) channels. The construction uses the commutativity of subsampling and upsampling. The commutativity result, which is well known in one dimension, is then extended to two dimensions.
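The commutativity result used here can be checked numerically in a few lines (a sketch with arbitrary data; L = 2 and M = 3 are coprime, which is exactly the condition under which the two orderings agree):

```python
# Upsampling by L and downsampling by M commute when gcd(L, M) = 1.
import numpy as np

def upsample(x, L):
    y = np.zeros(len(x) * L)
    y[::L] = x                      # insert L-1 zeros between samples
    return y

def downsample(x, M):
    return x[::M]                   # keep every M-th sample

x = np.arange(1.0, 25.0)            # arbitrary test sequence of length 24
a = upsample(downsample(x, 3), 2)   # downsample by 3, then upsample by 2
b = downsample(upsample(x, 2), 3)   # upsample by 2, then downsample by 3
print(np.allclose(a, b))            # True for coprime factors such as 2 and 3
```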
Pyramid Coding Of Images Using Visual Criterion
R. P. Rao, W. A. Pearlman
An image pyramid incorporating properties of the human visual system is developed and is used for compressing images. Generation of the pyramid is done in two stages: in the first stage quadrature mirror filters (QMFs) are used to decompose the image; in the second stage directional "dome" filters are applied to the subbands generated by the QMFs. The dome filter is designed such that its impulse response function resembles the receptive field of the human cortical cell. Perfect reconstruction is possible by simply interpolating, filtering and adding the various subbands. Optimal quantization of the oriented pyramid components is done based on sensitivity of the visual system. Simulation results are presented for the "lena" image.
Applications of Polynomial Transforms in Image Coding and Computer Vision
Jean-Bernard Martens
In this paper, we describe how the recently developed technique of polynomial transforms can be applied in image coding. Two related algorithms, based on two-dimensional (2D) and one-dimensional (1D) polynomial approximations respectively, are presented. It is subsequently shown how the latter coding algorithm could be improved by including ideas from the field of computer vision.
HDTV Subband/DCT Coding Using IIR Filter Banks: Coding Strategies
Rashid Ansari, Antonio Fernandez, Sang H. Lee
The results of a study of HDTV subband coding using an Infinite-duration Impulse Response (IIR) filter bank are described. The coding scheme consists of using Discrete Cosine Transform (DCT) coding in the lowest frequency band and runlength/entropy coding in the high frequency bands. Coding strategies for representing the information in the different subbands, and the tradeoffs involved in apportioning the budget of bits among the different subbands, are described. Issues in the selection and implementation of the analysis and synthesis filter banks are discussed. The effect of varying the order of the analysis and synthesis filters on the reconstruction of the signal is also examined.
Experiments on Image Compression Using Morphological Pyramids
Fang-Kuo Sun, Petros Maragos
In this paper, the concept of morphological pyramids for image compression and progressive transmission is discussed. Experimental results from applying these pyramids to three real images, a satellite cloud image, a tank image from aerial photography and an NMR skull image, are presented. For lossless compression, no reduction in the total (first-order) entropies of the error pyramids derived from the original images is observed in any of the three cases. However, high quality reconstructions of the original images from the corresponding error pyramids can be achieved with significant reduction in total entropies.
Flexible Segmentation and Matching for Optical Character Recognition
San-Wei Sun, S. Y. Kung
This paper presents a flexible image segmentation and feature matching method based on dynamic programming techniques to resolve the spatial deformation of Optical Character Recognition (OCR) problems. A 2-subcycle thinning algorithm is presented to extract a character skeleton which is 8-connected. In addition, two feature extraction methods are devised, which will extract the projected 1-D profiles of stroke distributions and 2-D background distribution respectively. The performance of the scheme is superior to that of an equally divided segmentation scheme.
Clustering and Classification for Chinese Character Recognition
Bor-Shenn Jeng, San-Wei Sun, Chun-Jen Lee, et al.
The paper evaluates the applicability and results of several clustering and classification algorithms for optical Chinese character recognition. Emphasis is placed on k-means clustering algorithms, neural network classification, and a hidden Markov model matching scheme. Some experimental results of the algorithms are also presented.
The Region And Recognition-Based Segmentation Method Used For Text In Mixed Chinese And English Characters
Yu Ping Lan, Bor Shenn Jeng, Sheng Hua Lu, et al.
Traditional methods for character segmentation are suitable only for some special cases, such as text with fixed pitches and sizes, and cannot handle text with overlapped, inclined or touching characters. We suggest a method which combines the advantages of both region-based and recognition-based methods to process texts which are composed of mixed English and Chinese characters. The method can distinguish different kinds of characters and determine whether or not they are overlapped, inclined, or touching.
Stroke-Order Independent On-Line Recognition Of Handwritten Chinese Characters
Chang-Keng Lin, Bor-Shenn Jeng, Chun-Jen Lee
This paper proposes an on-line handwritten Chinese character recognition system based on stroke-sequence feature extraction. The character to be recognized can be stroke-order and stroke-number free, tolerant of combined strokes, and flexible in size, within the constraints of normal handwriting. First, a recognizer using a finite state matching mechanism is used to extract primitive strokes, represented as a stroke string, from the input character. Second, a recognizer using a modified dynamic programming matching method is employed to perform recognition with the stroke-string features. Reference patterns have been generated for 2500 Chinese characters with stroke numbers ranging from 1 to 29. The recognition results are based upon 1800 handwritten characters written by 10 people. The obtained recognition rate is 94.5%, and the cumulative classification rate within the four most similar candidate characters is up to 98.7%. Finally, a secondary recognition mechanism is used to further distinguish among the candidates, raising the final recognition rate to 99%.
Cartographic Character Recognition
Howard Rafal, Matthew Ward
This work details a methodology for recognizing text elements on cartographic documents. Cartographic character recognition differs from traditional OCR in that many fonts may occur on the same page, text may have any orientation, text may follow a curved path, and text may be interfered with by graphics. The technique presented reduces the process to three steps: blobbing, stringing, and recognition. Blobbing uses image processing techniques to turn the gray level image into a binary image and then separates the image into probable graphic elements and probable text elements. Stringing relates the text elements into words. This is done by using proximity information of the letters to create string contours. These contours also help to retrieve orientation information of the text element. Recognition takes the strings and associates a letter with each blob. The letters are first approximated using feature descriptions, resulting in a set of possible letters. Orientation information is then used to refine the guesses. Final recognition is performed using elastic matching. Feedback is employed at all phases of execution to refine the processing. Stringing and recognition give information that is useful in finding hidden blobs. Recognition helps make decisions about string paths. Results of this work are shown.
Representation And Retrieval Of Symbolic Pictures Using Generalized 2D Strings
Shi-Kuo Chang, Erland Jungert, Y. Li
We present a methodology for pictorial database design, based upon a new spatial knowledge structure. This spatial knowledge structure consists of an image database, symbolic projections representing the spatial relations among objects or sub-objects in an image, and rules to derive complex spatial relations from the generalized 2D string representation of the symbolic projections. The most innovative aspect of this spatial knowledge structure is the use of symbolic projections to represent pictorial knowledge as generalized 2D strings. Since spatial knowledge is encoded into strings, inference rules can be applied for spatial reasoning.
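The basic idea of symbolic projection can be sketched with a toy scene (hypothetical object names and centroids; the cutting mechanisms and full generalized-2D-string operators of the paper are not shown): sorting the objects along each axis yields the two 1-D strings.

```python
# Symbolic projection of a toy scene: the u string orders objects left-to-right,
# the v string bottom-to-top; "<" encodes the spatial order along each axis.
objects = {"house": (3, 7), "tree": (1, 2), "car": (6, 1)}   # name -> (x, y) centroid

def projection_string(objs, axis):
    ordered = sorted(objs, key=lambda name: objs[name][axis])
    return " < ".join(ordered)

u = projection_string(objects, axis=0)
v = projection_string(objects, axis=1)
print(u)   # tree < house < car
print(v)   # car < tree < house
```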
On the Performance of Stochastic Model-Based Image Segmentation
Tianhu Lei, Wilfred Sewchand
A new stochastic model-based image segmentation technique for X-ray CT images has been developed and has been extended to the more general nondiffraction CT images, which include MRI, SPECT, and certain types of ultrasound images [1,2]. The nondiffraction CT image is modeled by a finite normal mixture. The technique utilizes an information theoretic criterion to detect the number of region images, uses the Expectation-Maximization algorithm to estimate the parameters of the image, and uses the Bayesian classifier to segment the observed image. How likely is this technique to over- or under-estimate the number of region images? What is the probability of error in the segmentation it produces? This paper addresses these two problems and is a continuation of [1,2].
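The modeling, estimation and classification steps can be sketched on synthetic 1-D "pixel" intensities (a toy two-class mixture; the information-theoretic detection of the number of region images discussed in the paper is not included):

```python
# Fit a finite normal mixture with EM, then segment by the Bayes (maximum posterior)
# rule. Data are synthetic gray levels from two Gaussian "tissue" classes.
import numpy as np

def em_gmm(x, k, iters=50):
    mu = np.percentile(x, np.linspace(10, 90, k))     # spread the initial means
    var = np.full(k, x.var())
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibility of each component for each sample
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / len(x)
    return pi, mu, var

rng = np.random.default_rng(3)
pixels = np.concatenate([rng.normal(40, 5, 400), rng.normal(90, 8, 600)])
pi, mu, var = em_gmm(pixels, k=2)
post = pi * np.exp(-0.5 * (pixels[:, None] - mu) ** 2 / var) / np.sqrt(var)
labels = post.argmax(axis=1)                          # Bayes classification of each pixel
print(np.round(mu), np.bincount(labels))              # means near (40, 90); roughly a 400/600 split
```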
A Model-Fitting Approach To Cluster Validation With Application To Stochastic Model-Based Image Segmentation
J. W. Modestino, J. Zhang
An unsupervised stochastic model-based image segmentation technique requires the model parameters for the various image classes in an observed image to be estimated directly from the image. In this work, a clustering scheme is used for the model parameter estimation. Most of the existing clustering procedures require prior knowledge of the number of classes which is often, as in unsupervised image segmentation, unavailable and has to be estimated. The problem of determining the number of classes directly from observed data is known as the cluster validation problem. For unsupervised image segmentation, the solution of this problem directly affects the quality of the segmentation. In this work, we propose a model-fitting approach to the cluster validation problem based upon Akaike's Information Criterion (AIC). The explicit evaluation of the AIC is achieved through an approximate maximum-likelihood (ML) estimation algorithm. We demonstrate the efficacy and robustness of the proposed approach through experimental results for both synthetic mixture data and image data.
Segment Coding and Automated Document Recognition
T. Y. Wang, C. C. Lee
This paper presents a very simple document recognition method called "segment coding". The method entails partitioning a document into square segments of a given size and encoding each segment according to the ratio of the numbers of black and white pixels inside the segment. The segment code is used as the document feature for recognition. We show an experimental prototype which performs extremely well. We also show some analytical results, including system parameter optimization and system performance versus database size. It is shown that the system is fast, flexible, extremely accurate, and can accommodate a huge number of documents without significantly degrading recognition accuracy.
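Because the method is so simple, it can be sketched almost directly (a toy version on a synthetic binary page; the segment size, number of quantization levels and the matching stage are illustrative assumptions):

```python
# Segment coding: tile a binary page into fixed-size square segments and encode each
# segment by a quantized black-pixel ratio; the code vector is the document feature.
import numpy as np

def segment_code(binary_image, seg=8, levels=16):
    h, w = binary_image.shape
    h, w = h - h % seg, w - w % seg                                 # drop partial border segments
    tiles = binary_image[:h, :w].reshape(h // seg, seg, w // seg, seg)
    black_ratio = tiles.mean(axis=(1, 3))                           # black-pixel fraction per segment
    return np.minimum((black_ratio * levels).astype(int), levels - 1).ravel()

rng = np.random.default_rng(4)
page = (rng.random((64, 48)) < 0.2).astype(np.uint8)                # stand-in for a scanned page (1 = black)
code = segment_code(page)
print(code.shape, code[:10])                                        # 48 segment codes for this page
```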
Token-Based Optimized Correspondence Processing For The Segmentation Of Time-Varying Images
Dominique Moulet, Dominique Barba
This work deals with the improvement of a new method for the temporal tracking of spatial segmentations in image sequences. This is one of the keys to performing time-varying complex image analysis and movement interpretation. Our method connects along the time axis characteristic points derived from the different contours generated by the spatial segmentation. These points are time-linked together by a correspondence process which consists of three main phases: a prediction step, a matching process and a refinement phase.
Comparison Of Coding Performance Of Image Transforms Under Vector Quantization Of Optimized Subbands
Yonggang Du, Jurgen Halfmann
To exploit the remaining statistical dependencies between the spectral coefficients of an image transform, techniques applying vector quantization (VQ) to small subbands of the spectral domain have been introduced in recent years. Unfortunately, up to now these techniques have been viewed only as a straightforward extension of traditional transform coding (TC). The interplay of the VQ and the TC is more or less disregarded, with the consequence that statements made about traditional transform coding are likely to be carried over to the new VQ-based schemes. Thus, the well known superiority of the DCT - although only proven in connection with scalar quantization (SQ) - might be the reason why almost all new subband VQ based coders prefer the DCT, too. It is the purpose of this paper to introduce a method which enables a theoretical comparison of the coding performance of image transforms, especially for the case of subband VQ. It will be shown that this comparison can only be meaningful if the partition of the spectral domain is optimized. For this optimization a suboptimal iterative algorithm is described. Based on this algorithm it is shown that the performance difference between the DCT and the Hadamard transform is considerably reduced even at a strongly limited VQ expense. Finally, the theoretical results are confirmed by an experimental bit rate evaluation of two subband VQ coders, one using the DCT and the other the Hadamard transform, but otherwise identical in structure.
A Framework For High-Compression Coding Of Color Images
F. De Natale, G. Desoli, D. Giusto, et al.
In this paper some improvements to the vector quantization technique are presented. The development of high performance coding techniques for color images is of major importance for the high compression coding framework previously realized at DIBE. First, the Linde-Buzo-Gray (LBG) algorithm was improved by introducing a non-supervised neural classifier to choose the starting vector codebook; this strategy allows one to reach better solutions than a random initial choice. Then a 'split & merge' algorithm for codebook vectors was tested in order to obtain a more uniform distribution of the samples of the training sequence among the alphabet letters. Concerning the coding process, a predictor was designed which allows a considerable reduction in the redundancy remaining after vector quantization, by exploiting the residual 'inter-block' spatial correlation. During the encoding/decoding process, the next block can be predicted on the basis of the neighboring blocks, using four spatial-transition probability matrices. The system performance, in terms of SNR, ranges from 28 to 31 dB for a bit rate of about 0.15 bpp; it should be pointed out that the introduction of the predictor produces a sharp reduction of the bit rate without loss of information.
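For orientation, the plain LBG (generalized Lloyd) codebook training that these improvements build on is sketched below on 2x2 blocks of a synthetic image; the neural initialization, split & merge step and inter-block predictor of the paper are not reproduced.

```python
# Baseline LBG training: alternate nearest-neighbor partitioning of the training
# vectors with centroid updates of the codebook.
import numpy as np

def lbg(training_vectors, codebook_size, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    codebook = training_vectors[rng.choice(len(training_vectors), codebook_size, replace=False)]
    for _ in range(iters):
        d = ((training_vectors[:, None, :] - codebook[None]) ** 2).sum(axis=2)
        nearest = d.argmin(axis=1)                     # assign each vector to its closest codeword
        for k in range(codebook_size):
            members = training_vectors[nearest == k]
            if len(members):
                codebook[k] = members.mean(axis=0)     # move the codeword to the cell centroid
    return codebook

rng = np.random.default_rng(8)
image = rng.random((32, 32))
blocks = image.reshape(16, 2, 16, 2).swapaxes(1, 2).reshape(-1, 4)   # 2x2 blocks as 4-vectors
print(lbg(blocks, codebook_size=16).shape)                            # (16, 4)
```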
Contour-Based Postprocessing of Coded Images
Yo-Sung Ho, Allen Gersho
A new contour-based postprocessing scheme for encoded images that reduces quantization artifacts near edges and improves perceptual quality is proposed. Based on extracted contour information, our scheme selectively processes the image along edges. This scheme works very effectively by removing distortions around edges while at the same time avoiding edge blurring. Furthermore, edges are sharpened by independent directional filtering of each side of the edge, thereby enhancing perceptual quality. We demonstrate a visually distinctive improvement in perceptual quality for images coded at low rates based on vector quantization and transform coding techniques.
Using M-Transformation to Improve Performance of DCT Zonal Coding
Yoshitaka Morikawa, Nobumoto Yamane, Hiroshi Hamada
DCT zonal coding compares unfavorably with DCT threshold coding in terms of coding performance. This is due to the quantization losses of the Max quantizer caused by the long-tailed distributions of DCT coefficients. This paper proposes a method to compensate for these losses and improve the performance of zonal coding by applying an M-transformation to the DCT coefficient sample sequences. Simulation results show that the attained improvements are substantial.
Fast Image Coding Using Simple Image Patterns
Dapang Chen, Alan C. Bovik
We present a very fast image coding algorithm that employs the recently-developed visual pattern image coding (VPIC) algorithm embedded in a multi-resolution (pyramid) structure. At each level in the hierarchy, the image is coded by the VPIC algorithm [1]. The low-resolution images coded at the upper levels of the pyramid are used to augment coding of the higher-resolution images. The interaction between the different resolution levels is both simple and computationally efficient, and yields a significant increase in compression ratio relative to simple VPIC with improved image quality and with little increase in complexity. The resulting hierarchical VPIC (HVPIC) algorithm achieves compression ratios of about 24:1 and in the implementation demonstrated here, requires only 22.8 additions and 3.84 multiplications per image pixel.
Simultaneous Blur Identification And Image Restoration Using The EM Algorithm
A. K. Katsaggelos, K. T. Lay
Algorithms for the simultaneous identification of the blur and the restoration of a noisy blurred image are presented in this paper. The original image and the additive noise are modeled as zero-mean Gaussian random processes, which are characterized by their covariance matrices. The covariance matrices are unknown parameters. The blurring process is specified by its point spread function, which is also unknown. Maximum likelihood estimation is used to find these unknown parameters. In turn, the EM algorithm is exploited to find the maximum likelihood estimates. In applying the EM algorithm, the original image is chosen to be part of the complete data; its estimate, which represents the restored image, is computed in the E-step of the EM iterations. Explicit iterative expressions are derived for the estimation of relevant parameters. Experiments with simulated and photographically blurred images are shown.
Projection Techniques For Image Restoration
Christine I. Podilchuk, Richard J. Mammone
In this paper we compare the use of three different projection techniques for image recovery. The three methods include modified versions of the row-action and block-action projection techniques of Kaczmarz as well as a new iterative projection method which projects onto the set of least squares solutions. The performance characteristics of the three techniques are demonstrated using computer simulations.
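The row-action building block shared by these methods is Kaczmarz's projection step, sketched here on a small consistent toy system (a random matrix standing in for a blur operator; the block-action and least-squares variants compared in the paper are omitted):

```python
# Cyclic Kaczmarz iteration: project the current estimate onto the hyperplane of
# each row a_i^T x = b_i in turn.
import numpy as np

def kaczmarz(A, b, sweeps=50, relax=1.0):
    x = np.zeros(A.shape[1])
    for _ in range(sweeps):
        for a_i, b_i in zip(A, b):
            x += relax * (b_i - a_i @ x) / (a_i @ a_i) * a_i
    return x

rng = np.random.default_rng(5)
x_true = rng.normal(size=6)
A = rng.normal(size=(12, 6))                  # toy consistent, overdetermined system
b = A @ x_true
print(np.round(kaczmarz(A, b) - x_true, 6))   # error is essentially zero on a consistent system
```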
Use of Convex Projections for Image Recovery in Remote Sensing and Tomography
Henry Stark, Peyma Oskoui
We consider the problem of reconstructing remotely obtained images from image-plane detector arrays. While the individual detectors may be larger than the blur spot of the imaging optics, high-resolution reconstructions can be obtained by scanning or rotating the image with respect to the detector. As an alternative to matrix inversion or least-squares estimation the method of convex projections is proposed. We derive the appropriate algorithm and discuss the availability of prior knowledge. The application of the algorithm to tomography is briefly reviewed.
Standard Image Recovery Methods In The Iterative Data Refinement Framework
Gabor T. Herman
Iterative Data Refinement (IDR) is a general procedure for estimating data that would have been collected by an ideal measuring device from data that were collected by an actual measuring device. An example is in Computerized Tomography (CT), where we have a mathematical procedure to reconstruct the x-ray attenuation coefficient at individual points inside the human body from (the ideal) data obtained by passing monoenergetic x-rays through the body and measuring the percentage of energy that gets through. Unfortunately, x-ray tubes deliver polyenergetic x-rays and the actual measurements only approximate what is assumed by the mathematics of CT, resulting in images whose quality is noticeably worse than those reconstructed from ideal data. This is one of the applications where the efficacy of IDR has been demonstrated: it can be used to estimate the data that would be obtained from an ideal monoenergetic x-ray tube from the data that are obtained from an actual polyenergetic x-ray tube. In fact, IDR is general enough to encompass such well-accepted image recovery methods as the Gerchberg-Saxton algorithm and the error reduction and hybrid input-output methods of Fienup. The generalizations provided by IDR give new insights into the nature of such algorithms and, in particular, allow us to introduce the notion of relaxation into them, resulting in many cases in much improved computational behavior.
Image Representation by Localized Phase
Yehoshua Y. Zeevi, Moshe Porat
Localized phase is extracted from images represented in the combined frequency-position space. It is shown that images represented by localized phase-only information reproduce the edge relationships adequately while compressing the gray level information, unlike the localized magnitude-only representation, which distorts the edge information. We address the issue of the number of quantization levels required for adequate representation by localized phase, and present an analytical expression for the resultant image as well as computational examples. We show that image reconstruction from localized phase only is more efficient than image reconstruction from global (Fourier) phase in that the number of required computer operations is reduced and the rate of convergence is improved. The computation efficiency can be further improved by implementation on a highly parallel architecture.
Phase Recovery Via Mathematical Programming
Arnold Lent
The Gerchberg-Saxton problem of phase recovery is identified with the canonical problem of locating a point of maximum norm in a given convex set. Some heuristic algorithms are proposed.
Applications Of Image Recovery Techniques In The Earth Sciences
Robert F. Brammer
This paper focuses on the broad range of applications of image recovery techniques in the earth sciences. The objectives of this paper are threefold. First, the paper is a survey of several significant problems in the earth sciences for which image recovery techniques are uniquely useful. Second, the paper describes some important trends in the application of image recovery techniques in these areas. Finally, several significant research areas for which the current generation of image recovery techniques are not adequate to solve important problems are summarized with important directions indicated for future developments.
Efficient Algorithms for Least-Squares Restoration
Cheung Auyeung, Russell M. Mersereau
This paper introduces a new algorithm for the restoration of black and white photographic images that have been distorted by linear blurs and additive noise. The algorithm is iterative and particularly efficient for one-dimensional causal distortions, such as motion blur, and one- and two-dimensional symmetric distortions, such as out-of-focus blurs. It involves solving sets of Toeplitz or block Toeplitz linear equations. The algorithm is illustrated using an example involving motion blur.
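A toy instance of the kind of Toeplitz system involved is shown below (a noise-free 1-D symmetric blur, inverted with a standard fast Toeplitz solver; the regularization a practical restoration needs against noise is deliberately left out):

```python
# Write a 1-D symmetric blur (kernel [0.25, 0.5, 0.25]) as a Toeplitz system A f = g
# and solve it with a Levinson-type Toeplitz solver.
import numpy as np
from scipy.linalg import solve_toeplitz, toeplitz

n = 64
col = np.zeros(n)
col[0], col[1] = 0.5, 0.25                   # first column of the symmetric blur matrix A
f_true = np.zeros(n)
f_true[20:40] = 1.0                          # a simple "bar" signal to restore
g = toeplitz(col) @ f_true                   # noise-free blurred observation
f_hat = solve_toeplitz((col, col), g)        # restored signal
print(round(float(np.max(np.abs(f_hat - f_true))), 6))   # essentially zero without noise
```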
Optimal Sampling And Reconstruction Of MRI Signals Resulting From Sinusoidal Gradients
Avideh Zakhor
The fundamental operations of Magnetic Resonance Imaging (MRI) can be formulated, for a large number of methods, as sampling the object distribution in the Fourier spatial-frequency domain, followed by processing the digitized data to produce a digital image. In these methods, controllable gradient fields determine the points in the spatial-frequency domain which are sampled at any given time during the acquisition of the Free Induction Decay (FID) signal. Unlike the constant gradient case, in which equally spaced samples of the FID signal in time correspond to uniform samples in the Fourier domain, for time-varying gradients linear sampling in time corresponds to nonlinear sampling in the Fourier domain, and therefore straightforward inverse Fourier transformation is not sufficient for obtaining samples of the object distribution. MRI methods using time-varying gradients, such as sinusoids, are particularly important from a practical point of view, since they require considerably shorter data acquisition times. In this paper, we derive the optimum continuous time filter and its various discrete time implementations for FID signals resulting from sinusoidal gradients. In doing so we find that the estimation error associated with implementations based on linear temporal sampling, or equivalently nonlinear spatial frequency sampling, is smaller than that of nonlinear temporal sampling. In addition, we show that the optimal maximum likelihood estimator for sinusoidal gradients has a higher error variance than that for constant gradients.
Sampling and Reconstruction of Non-Bandlimited Signals
James J. Clark
We examine the problem of sampling and reconstructing non-bandlimited signals with sample sets of finite density. Three approaches are considered in the paper. The first approach, based on a method of Clark, involves time warping, or demodulating, a class of generalized phase modulated signals into bandlimited signals, which can then be sampled and reconstructed with the standard Shannon sampling theory. The second method applies Kramer's generalization of Shannon's theory, and it is seen that the reconstruction processes derived from this application hold, in general, for non-bandlimited signals. The final approach combines the first approach and a special case of the second approach in which non-harmonic Fourier kernels are used. This approach allows the specification of a sampling and reconstruction process for certain classes of non-bandlimited signals for which uniform sampling is used.
F-CORE: A Fourier Based Image Compression And Reconstruction Technique
Evangelia Micheli-Tzanakou, Gayle M. Binge
A data compression and reconstruction technique is presented based on a Fourier transformation. Correlations of up to 99.8% have been achieved between the original and the reconstructed image. In this method the complex Fourier coefficients are calculated and sorted from maximum to minimum. A certain small percentage of these coefficients is retained along with their coordinates. An inverse FFT is used to reconstruct a first approximation of the image and after that algebraic reconstruction techniques are used for refinements on the image.
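The first stage (retain the largest-magnitude Fourier coefficients and inverse transform) can be sketched as follows on a smooth synthetic image; the retained fraction is an arbitrary choice and the algebraic-reconstruction refinement is not shown.

```python
# Keep only the largest-magnitude 2-D FFT coefficients (and their coordinates),
# then reconstruct a first approximation with an inverse FFT.
import numpy as np

def fourier_compress(image, keep_fraction=0.05):
    F = np.fft.fft2(image)
    n_keep = max(1, int(keep_fraction * F.size))
    idx = np.argsort(np.abs(F).ravel())[-n_keep:]    # coordinates of the retained coefficients
    F_kept = np.zeros_like(F)
    F_kept.ravel()[idx] = F.ravel()[idx]
    return np.fft.ifft2(F_kept).real

x, y = np.mgrid[0:64, 0:64]
image = np.sin(x / 5.0) + np.cos(y / 9.0)            # smooth synthetic test image
approx = fourier_compress(image)
corr = np.corrcoef(image.ravel(), approx.ravel())[0, 1]
print(round(float(corr), 4))                         # correlation close to 1 for this smooth image
```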
Color Video Coding For Packet Switched Network: Replenishment Adaptive VQ
Yushu Feng, Nasser M. Nasrabadi
Prior work has shown that vector quantization is a powerful tool in image coding. By combining an adaptive vector quantization technique with decimation and directional non-linear interpolation, a higher compression ratio and good coded-image quality can be obtained. Since the local information content of an image varies widely from one region to another, we use an adaptive method that varies the coding resolution according to a local estimate of information content. This implies a variable-rate coding technique; a single coding stage handles both high- and low-detail regions. The process is: (1) decimation of the image vectors; (2) adaptive vector quantization to obtain a high-quality reconstructed (subsampled) image; (3) non-linear interpolation (see figure 1). This paper describes a new adaptive vector quantization scheme suitable for color packet video. In the proposed technique, a large codebook is divided into two parts, called the higher- and lower-priority sections, representing common and specific characteristics of images. Entries in the two sections are reorganized and exchanged as a function of the usage of the codevectors. The rate and the extent of adaptation are dictated by the update interval and the desired level of quality, respectively, without requiring any transmission of the vectors themselves. This method increases coding efficiency and provides a perceptually more consistent image quality throughout all areas of the reconstructed image. Simulation results show that for color images the bit rate is about 0.15-0.25 bpp and the SNR is 27-30 dB.
Study On Non-Real-Time Motion Picture Coding For Read-Only Digital Storage Media
Yoshihiro Miyamoto, Mutsumi Ohta, Takao Omachi
Motion picture coding algorithms for read-only digital storage media (DSM), such as CD-ROM, are studied. For such media, it is possible to carry out a non-real-time analysis in advance of encoding and to use the result to improve coding efficiency. In this paper, two methods are proposed for such improvement: prefiltering and background prediction. Computer simulation results show that both methods can improve coding efficiency, by approximately 20-30% over conventional prediction coding, when applied to an image sequence with still camera-work. Moreover, it is shown that background prediction can also improve coding efficiency for a panning image sequence.
17 Mbit/s Algorithm for Secondary Distribution of 4.2.2 Video Signals
Jean Pierre Henot
The transmission of digital video signals to the end user, known as secondary distribution by the CCITT, is one of the key points of Broadband ISDN (B-ISDN). Sufficient picture quality can be obtained at a bitrate of 17 Mbit/s (or even lower) by using a hybrid DCT scheme working on blocks of 8 pels by 8 lines of the same frame (instead of lines of the same field), associated with block matching motion estimation and compensation. The optimization of the block matching algorithm (BMA) and the adaptation of the hybrid DCT scheme to blocks designed on a frame basis are presented hereafter.
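The block-matching component can be illustrated with an exhaustive-search sketch on a synthetic frame pair (8x8 blocks and a ±4-pel search window chosen for illustration; the hybrid DCT loop and the frame-based block design of the paper are not included):

```python
# Exhaustive block matching: for each 8x8 block of the current frame, find the
# displacement into the previous frame with the smallest sum of absolute differences.
import numpy as np

def block_match(prev, curr, block=8, search=4):
    h, w = curr.shape
    vectors = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            target = curr[by:by + block, bx:bx + block]
            best, best_mv = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if 0 <= y <= h - block and 0 <= x <= w - block:
                        sad = np.abs(prev[y:y + block, x:x + block] - target).sum()
                        if sad < best:
                            best, best_mv = sad, (dy, dx)
            vectors[by // block, bx // block] = best_mv
    return vectors

rng = np.random.default_rng(6)
prev = rng.random((32, 32))
curr = np.roll(prev, shift=(2, 3), axis=(0, 1))        # frame shifted down 2 and right 3
print(block_match(prev, curr)[1, 1])                   # interior block: [-2 -3], pointing back into prev
```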
A Low-Bit-Rate Video Codec Using Block-List-Transform
Hsueh-Ming Hang, Barry G. Haskell, Robert L. Schmidt, et al.
A simple but efficient video codec is built for low-bit-rate videophone applications. The goal of this project is to construct a video codec at low cost with good performance. In order to reduce hardware complexity, a simple interframe predictive coding structure is selected. The input pictures are partitioned into 2-D blocks, and only the blocks with significant frame differences are coded and transmitted to the decoder. The block-list-transform (BLT) coding algorithm, a generalized DPCM, is used to encode the frame differences between the previously reconstructed picture and the current picture. Preliminary results using this coding system at 64 kbits/sec show reasonable coding performance on typical videophone sequences.
Image Sequence Coding by Octrees
Riccardo Leonardi
This work addresses the problem of representing an image sequence as a set of octrees. The purpose is to generate a flexible data structure to model video signals, for applications such as motion estimation, video coding and/or analysis. An image sequence can be represented as a 3-dimensional causal signal, which becomes a 3-dimensional array of data when the signal has been digitized. If it is desirable to track long-term spatio-temporal correlation, a series of octree structures may be embedded on this 3D array. Each octree looks at a subset of data in the spatio-temporal space. At the lowest level (leaves of the octree), adjacent pixels of neighboring frames are captured. A combination of these is represented at the parent level of each group of 8 children. This combination may result in a more compact representation of the information of these pixels (coding application) or in a local estimate of some feature of interest (e.g., velocity, classification, object boundary). This combination can be iterated bottom-up to get a hierarchical description of the image sequence characteristics. A coding strategy using such a data structure involves the description of the octree shape using one bit per node except for leaves of the tree located at the lowest level, and the value (or parametric model) assigned to each one of these leaves. Experiments have been performed to represent Common Image Format (CIF) sequences.
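The bottom-up combination step can be sketched for a single 8x8x8 spatio-temporal block (the mean is used here as a stand-in for whatever combination or parametric model a particular application would assign to parent nodes):

```python
# Reduce a spatio-temporal cube level by level: each parent holds a combination
# (here the mean) of its 8 children, giving a coarse-to-fine description.
import numpy as np

def octree_levels(block):
    """block: a cube whose side is a power of two. Returns the levels, finest first."""
    levels = [np.asarray(block, dtype=float)]
    while levels[-1].shape[0] > 1:
        b = levels[-1]
        s = b.shape[0] // 2
        levels.append(b.reshape(s, 2, s, 2, s, 2).mean(axis=(1, 3, 5)))   # combine 8 children
    return levels

rng = np.random.default_rng(7)
clip = rng.random((8, 8, 8))                     # 8 frames of an 8x8 image patch
levels = octree_levels(clip)
print([lvl.shape for lvl in levels])             # [(8, 8, 8), (4, 4, 4), (2, 2, 2), (1, 1, 1)]
```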
An Interframe Coding Scheme for Packet Video
A. Puri, R. Aravind
Interframe coding schemes, though sensitive to the propagative effects of channel errors, are much more efficient than intraframe coding schemes when using similar data-compression techniques. Moreover, since in interframe coding the instantaneous data rate may vary drastically, it can benefit from the bits-on-demand property of packet networks. We propose a layered coding structure that contains a self-contained basic interframe codec as the core, and supplementary additive extended codecs. The interframe codec uses a slightly modified version of the motion-compensated DCT coding scheme proposed by the CCITT in Reference Model 8 for p×64 kbit/s videophone. It encodes a lower spatial resolution image sequence, with an adequate temporal resolution. The encoded data it produces is assigned the highest priority. The extended codecs allow enhancement of temporal and spatial resolutions, at the cost of an increase in the data rate. They encode the interpolative differences between the upsampled local decoded output of the basic codec and the higher resolution input. The encoded data of the extended codecs is assigned lower priority.
Post Filtering for Cell Loss Concealment in Packet Video
Kou-Hu Tzou
A packet network provides an effective way of multiplexing variable-rate sources, such as video and voice, by taking into account the source statistics. However, network congestion will result in packet (cell) losses which may substantially degrade video quality. In this paper, we investigate the effects of cell losses on a Lapped Orthogonal Transform (LOT) coding system and methods to restore the degraded image. Due to the overlapping nature of the LOT, an image block can be partially reconstructed if the corresponding coefficient block is missing. The degraded image block is modeled as a blurred original block corrupted by a correlated noise. A minimum mean-squared-error restoration filter is designed using a first-order Markov model for the image. The restored image shows noticeable quality improvement.
Using Local Orientational Information As Image Primitive For Robust Object Recognition
Peter Seitz
Orientational information can replace traditional edges as the basic image feature ("primitive") for object recognition. A comparison of five different orientation operators on 3 x 3 windows is carried out, and it is found that these operators have similar performance. A first attempt at object recognition searches for the minimum root mean square deviation of orientation in a picture. This technique shows better object discrimination than the traditional normalized cross-correlation of grey-level images. Additionally, the parameters of the Gaussian distribution of orientational correlation can be accurately predicted by a simple theoretical model. The orientational correlation technique shows difficulties in recognizing geometrically distorted and partially occluded objects. For this reason a very robust algorithm for the recognition of simple objects is developed, based only on orientational information as the image feature, and local polar coordinates for the model of the object. Practical examples taken under difficult, natural conditions illustrate the reliable performance of the proposed algorithm, and it becomes apparent that orientational information is indeed a powerful image primitive.
Model Generation And Inexact Matching For Object Identification
Chung Lin Huang
In this paper we present a hierarchical inexact matching method to identify 3-D objects. The 3-D objects are described by a relational model consisting of the global relation, the topological relation, and the geometrical relation. When identifying an unknown object, firstly its global, topological, and geometrical relations are generated, secondly its global relation is used as a key to search for the closest group of clusters, thirdly its topological relation is matched with the representative of each cluster in the same group to find the most similar cluster, and finally the geometrical relation match takes place against the objects in this cluster for the best match. The hierarchical relational model generation and inexact matching are both simple and powerful. The matching procedure incorporating weights, thresholds, and constraint relations is highly flexible.
Normalized Interval Vertex Descriptors and Application to the Global Shape Recognition Problem
Ramon Parra-Loera, Wiley E. Thompson, Gerald M. Flachs
The general global 2-D shape recognition problem for objects in digital scenes is discussed. A new technique -- Normalized Interval Vertex Descriptors (NIVD) -- for representing contours is introduced to the field of object recognition. This technique, which is derived from the physical characteristics of the silhouette of the object (corners and sides), is shown to be effective in recognizing objects that can be more or less represented or approximated by polygons. Typical examples of these classes are man-made objects like planes, missiles, mechanical parts, etc. Normalized Interval Vertex Descriptors provide an accurate representation of objects which is robust to three main sources of classification error, namely scale change, translation and rotation. Because of this robustness and other advantages of the implementation, its use proves to be very effective for recognizing objects in arbitrary positions within the field of view (FOV) and for objects at varying distances from the sensor or for various sensor zoom factors. The compactness of the representation, on the other hand, allows the implementation of faster object recognition systems.
Recovery of Straight Homogeneous Generalized Cylinders Using Contour and Intensity Information
Ari D. Gross, Terrance E. Boult
Generalized cylinders have been the focus of considerable vision research. Straight homogeneous generalized cylinders are a class of generalized cylinders whose cross sections are scaled versions of a reference curve. In this paper, a general method for recovering straight homogeneous generalized cylinders is outlined. The method proposed in this paper combines constraints derived from both contour and intensity information. First, a method of "ruling" straight homogeneous generalized cylinder (SHGC) images is given. Next, these ruled images are studied to determine what parameters of the underlying shape can be computed. We show that there exist equivalence classes of SHGCs that produce the same set of contours, both extremal and discontinuous; thus one cannot recover an SHGC from contour information alone. We also show that the sign and magnitude of the Gaussian curvature at a point vary among members of a contour-equivalent class. This motivates the need for incorporating additional information into the recovery process. Finally, we derive a method for recovering the tilt of the object using the ruled SHGC contour and intensity values along cross-sectional geodesics.