Proceedings Volume 1360

Visual Communications and Image Processing '90: Fifth in a Series

Murat Kunt
View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 1 September 1990
Contents: 28 Sessions, 169 Papers, 0 Presentations
Conference: Visual Communications and Image Processing '90 1990
Volume Number: 1360

Table of Contents

  • Human Visual System and Neural-Network-Based Processing
  • Massively Parallel Computer Architectures
  • Nonlinear Image Processing
  • Mathematical Morphology and Fractals
  • VLSI Implementation and System Architectures I
  • VLSI Implementation and System Architectures II
  • 3-D Image Processing
  • Image Sequence Coding I
  • Hierarchical Video Coding
  • Hierarchical Image Coding
  • Digital Image Processing in Medicine I
  • Image Sequence Coding III
  • Digital Image Processing in Medicine II
  • HDTV
  • Parallel Processing
  • Image Coding and Transmission I
  • Image Coding and Transmission II
  • Edge/Boundary Detection
  • Neuromorphology of Biological Vision I
  • Neuromorphology of Biological Vision II
  • Image Sequence Coding II
  • Segmentation/Classification
  • Texture
  • Image Restoration
  • Digital Image Processing
  • JPEG/MPEG Algorithms and Implementation
  • Vision Science and Technology for Space
  • Pattern Recognition
Human Visual System and Neural-Network-Based Processing
Human visual quality criterion
Serge Comes, Benoit M. M. Macq
This paper presents a human visual quality criterion for still monochrome pictures. The main idea is based on the decomposition of the signal into channels tuned to location in the visual field, orientation, and frequency. The specific sensitivity of each channel and self-masking are taken into account to evaluate the quality of the picture.
Multiscale image coding using the Kohonen neural network
Marc Antonini, Michel Barlaud, Pierre Mathieu, et al.
This paper proposes a new method for image coding involving two steps. First, we use a 'Dual Recursive Wavelet' Transform in order to obtain a set of subclasses of images with better characteristics than the original image (lower entropy, edge discrimination, etc.). Second, according to Shannon's rate-distortion theory, the wavelet coefficients are vector quantized using the Kohonen Self-Organizing Feature Maps. We compare this training method with the well-known LBG algorithm.
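As a rough illustration of the vector quantization step (not code from the paper), the following Python sketch trains a one-dimensional Kohonen self-organizing map on a set of coefficient vectors and then maps vectors to codebook indices; the learning-rate and neighbourhood schedules are illustrative assumptions.

```python
import numpy as np

def train_kohonen_codebook(vectors, codebook_size=64, epochs=10,
                           lr0=0.5, radius0=None, seed=0):
    """Train a 1-D Kohonen self-organizing map as a vector quantizer.

    `vectors` is an (N, d) array of training vectors (e.g. wavelet
    coefficient blocks).  Returns a (codebook_size, d) codebook.
    """
    rng = np.random.default_rng(seed)
    n, d = vectors.shape
    codebook = vectors[rng.choice(n, codebook_size, replace=False)].astype(float)
    if radius0 is None:
        radius0 = codebook_size / 4
    total_steps = epochs * n
    t = 0
    for _ in range(epochs):
        for x in vectors[rng.permutation(n)]:
            lr = lr0 * (1 - t / total_steps)                # decaying learning rate
            radius = max(1.0, radius0 * (1 - t / total_steps))
            winner = np.argmin(np.sum((codebook - x) ** 2, axis=1))
            # Gaussian neighbourhood on the 1-D map index axis.
            dist = np.abs(np.arange(codebook_size) - winner)
            h = np.exp(-(dist ** 2) / (2 * radius ** 2))
            codebook += lr * h[:, None] * (x - codebook)
            t += 1
    return codebook

def quantize(vectors, codebook):
    """Map each vector to the index of its nearest codeword."""
    d2 = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d2, axis=1)
```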
Target cuing: a heterogeneous neural network approach
Howard M. McCauley
Autonomous analysis of complex image data is a critical technology in today’s world of expanding automation. The growth of this critical field is slowed by problems in traditional image analysis methods. Traditional methods lack the speed, generality, and robustness that many modern image analysis problems require. While neural networks promise to improve traditional techniques, homogeneous neural network systems have difficulty performing all the diverse analysis required of an autonomous system. This paper proposes a dual-staged, heterogeneous neural network approach to image analysis; specifically, a way to solve the target cuing problem.
Bilevel quantization using dithering and Hopfield theory
Bilevel quantization using dither is useful with coarse quantizers. We study it from a statistical viewpoint and, furthermore, in terms of the physical energy theory known as the Hopfield neural network or Ising spin system. These formulations are internally equivalent and should be related to the optimum convergence, i.e., the minimum quantization error, of dithering. In this paper, we show this relationship and derive theoretical improvements.
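A minimal sketch of the dithered one-bit quantizer discussed above; the 0.5 threshold and the dither amplitude are illustrative assumptions, not values from the paper.

```python
import numpy as np

def bilevel_quantize_with_dither(image, amplitude=0.5, seed=0):
    """Bilevel quantization of an image in [0, 1] with additive dither.

    Adding zero-mean dither before the one-bit threshold decorrelates the
    quantization error from the signal, which is the effect the paper
    relates to energy minimization in a Hopfield/Ising formulation.
    """
    rng = np.random.default_rng(seed)
    dither = rng.uniform(-amplitude, amplitude, size=image.shape)
    return (image + dither >= 0.5).astype(np.uint8)
```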
New bidirectional neural network and application to binary image recognition
Shengwei Zhang, Anthony G. Constantinides, Lihe Zou
A new bidirectional neural network is proposed based on convex projection theory. The neurons in the network are divided into two classes, clamped and floating neurons. The states of clamped neurons are preassigned to fixed values and provide the network stimulus; the states of floating neurons change in accordance with those of other neurons and provide the network response. Steady-state solutions under synchronous operation are presented in a closed-form formula. An adaptive learning algorithm is discussed which does not need matrix inverse computation and thus saves much learning time. Experiments in storing and retrieving binary images are carried out on a database composed of 26 upper-case and lower-case English characters.
NNE-CA: the implementation of neural network emulator board
Kicheol Park, Kugchan Cha, Jong Soo Choi
A Neural Network Emulator named NNE-CA was implemented using five recently developed general-purpose digital signal processors (DSP56001). The NNE-CA is a MIMD (Multiple Instruction and Multiple Data) type parallel processor with the five DSPs coupled in a ring-star topology. Its host computer is an MVME147, which uses the VME bus. Vector quantization using a self-organizing neural network, a resistive net for shape from shading, and back-propagation algorithms are simulated on the NNE-CA.
Multistaged neural network architecture for position invariant shape recognition
Jay I. Minnix, Eugene S. McVey, Rafael M. Inigo
This paper presents a pattern recognition system that self-organizes to recognize objects by shape as part of an Integrated Visual Network (IVN) for autonomous flight control. The system uses a multistaged hierarchical neural network that exhibits insensitivity to the location of the object in the visual field. The network's three layers perform the functionally disjoint tasks of preprocessing (dynamic thresholding), invariance (position normalization), and recognition (identification of the shape). The Preprocessing stage uses a single layer of elements to dynamically threshold the grey level input image into a binary image. The Invariance stage is a multilayered neural network implementation of a modified Walsh-Hadamard transform that generates a representation of the object that is invariant with respect to the object’s position. The Recognition stage is a modified version of Fukushima's Neocognitron that identifies the position normalized representation by shape. The inclusion of the Preprocessing and Invariance stages allows reduction of the massively replicated processing structures used for translation invariance in the Neocognitron. This system offers roughly the same translation invariance capabilities as the Neocognitron with a dramatic reduction in the number of elements and the network's interconnection complexity.
Image compression using a neural network with learning capability of variable function of a neural unit
Ryuji Kohno, Mitsuru Arai, Hideki Imai
This paper proposes image compression using an advanced neural network in which both the variable input-output function of each neural unit and the weight coefficients of the neural connections can be learned according to the information source and application. Since the neural network has improved learning capability for local nonlinearity of the information source, its application to the compression of nonlinear information such as images is investigated. A learning algorithm and adaptive control schemes for the input-output functions are derived. Simulation results show that the neural network can achieve higher SNR and shorter learning time than a conventional network having only variable weights.
Massively Parallel Computer Architectures
ASP: a parallel computing technology
R. M. Lea
ASP modules constitute the basis of a parallel computing technology platform for the rapid development of a broad range of numeric and symbolic information processing systems. Based on off-the-shelf general-purpose hardware and software modules, ASP technology is intended to increase productivity in the development (and competitiveness in the marketing) of cost-effective low-MIMD/high-SIMD Massively Parallel Processors (MPPs). The paper discusses the ASP module philosophy and demonstrates how ASP modules can satisfy the market, algorithmic, architectural and engineering requirements of such MPPs. In particular, two specific ASP modules, based on VLSI and WSI technologies, are studied as case examples of ASP technology; the latter reporting 1 TOPS/ft³, 1 GOPS/W and 1 MOPS/$ as ball-park figures-of-merit of cost-effectiveness.
Benchmarking the ASP for computer vision
Argy Krikelis
As high-performance sequential and parallel computing systems become increasingly accessible to large portions of the computing community, how well these systems perform on particular applications becomes of greater interest.
SCC-100 parallel processor for real-time imaging
William J. Jacobi, William B. Kendall, Leo A. Wadsworth
The SCC-100 parallel processor utilizes a fully-programmable, 32-bit MIMD architecture optimized for image and signal processing. Applications include image registration, clutter suppression, velocity filtering, multispectral processing, medical imaging and computer vision research as well as radar and sonar signal processing. The first SCC-100 processor, with 19 nodes and a peak throughput in excess of 1 GFLOPS, was recently delivered. A micro-miniature version using hybrid wafer-scale integration is currently under development for space applications.
MEGA Node: an implementation of a coarse-grain totally reconfigurable parallel machine
M. Bertrand Blum, Caroline Burrer
The MEGA Node is a loosely coupled, highly parallel computer based on transputers. One of its main characteristics is its ability to change the topology of the network using an electronic switch. It covers a range from 128 to 1024 "worker processors", delivering from 550 to 4400 Mflops peak performance. To achieve this performance, a hierarchical structure has been adopted. This highly parallel machine originated in the Esprit I P1085 "Supernode" project. The software has to support a wide spectrum of users, ranging from those who wish to obtain maximum performance from the machine to those who wish to use it as a general-purpose multi-user parallel machine. This paper describes the different ways to use the MEGA Node and the software environments provided to satisfy all kinds of users. The Helios environment is a good example of how an operating system can control this machine, particularly the networking management and the fundamental problem of mapping. The MEGA Node has already been used for a wide range of applications such as signal/image processing (high and low level), image synthesis, scientific and engineering number-crunching, neural network simulation and logic simulation. Only a few of them are discussed in this paper: medical image analysis and vision, and ray tracing.
Nonlinear Image Processing
Median-based algorithms for image sequence processing
Bilge Alp, Petri Haavisto, Tiina Jarske, et al.
In this paper several digital video signal processing algorithms that use the median operation are presented. Both numerical and visual evaluations of the resulting image sequences clearly show that the median operation has many desirable properties for digital image sequence processing applications.
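For illustration, here is a minimal Python sketch of one of the simplest median-based temporal operations, a pixelwise three-frame median; it is a generic example, not a specific algorithm from the paper.

```python
import numpy as np

def temporal_median3(prev_frame, cur_frame, next_frame):
    """Pixelwise 3-tap temporal median over consecutive frames.

    Removes impulsive noise while leaving stationary areas untouched;
    moving edges are preserved better than with temporal averaging.
    """
    stack = np.stack([prev_frame, cur_frame, next_frame], axis=0)
    return np.median(stack, axis=0)
```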
Nonlinear quincunx interpolation filtering
Arto H.T. Lehtonen, Markku Renfors
Quincunx sub-sampling and interpolation techniques are widely used in many image and video signal bandwidth compression techniques, like HDTV transmission systems. In this paper we consider the basic quincunx interpolation problem: interpolation of the missing samples of an orthogonally sampled image after quincunx sub-sampling with a factor of two. New median-based nonlinear interpolation techniques are developed and compared with traditional linear designs. It is shown that nonlinear interpolators result in considerably lower hardware complexity than good quality linear designs and, in many respects, even better image quality can be achieved.
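A minimal sketch of the basic idea, interpolating each missing quincunx sample as the median of its four retained horizontal and vertical neighbours; border handling is skipped and the more elaborate median/linear hybrid designs of the paper are not reproduced here.

```python
import numpy as np

def quincunx_median_interpolate(image, missing_mask):
    """Fill quincunx-removed samples with the median of 4 neighbours.

    `missing_mask` is True at positions removed by 2:1 quincunx
    sub-sampling; each such pixel has four retained horizontal/vertical
    neighbours, whose median (numpy averages the two middle values)
    replaces it.  Border pixels are left untouched for simplicity.
    """
    out = image.copy().astype(float)
    rows, cols = image.shape
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            if missing_mask[r, c]:
                neigh = [image[r - 1, c], image[r + 1, c],
                         image[r, c - 1], image[r, c + 1]]
                out[r, c] = np.median(neigh)
    return out
```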
Minimax optimization over the class of stack filters
Moncef Gabbouj, Edward J. Coyle
A new optimization theory for stack filters is presented in this paper. This new theory is based on the minimax error criterion rather than the mean absolute error (MAE) criterion used in [8]. In the binary case, a methodology will be designed to find the stack filter that minimizes the maximum absolute error between the input and the output signals. The most interesting feature of this optimization procedure is the fact that it can be solved using a linear program (LP), just as in the MAE case [8]. One drawback of this procedure is the problem of randomization due to the loss of structure in the constraint matrix of the LP. Several sub-optimal solutions will be discussed and an algorithm to find an optimal integer solution (still using an LP) under certain conditions will be provided. When generalizing to multiple-level inputs, complexity problems arise and two alternatives will be suggested. One of these approaches assumes a parameterized stochastic model for the noise process, and the LP is to pick the stack filter which minimizes the worst effect of the noise on the input signal.
Morphological filtering of noisy images
Lasse Koskinen, Jaakko T. Astola, Yrjo A. Neuvo
In this paper we analyze the statistical properties of discrete morphological filters. These properties are important when we are applying morphological filters to noisy images. Analytical expressions for the output distribution of basic discrete morphological filters are derived using the theory of stack filters.
Morphological filtering and iteration
Henk J. A. M. Heijmans
The construction of morphological filters, i.e., morphological operators which are increasing and idempotent, is an important problem in mathematical morphology. In this paper it is described how one can construct openings, closings and other morphological filters by iteration. It turns out that a sort of order continuity is required to guarantee that iteration leads to idempotence. An important class of operators which yield morphological filters after iteration are the so-called pointwise monotone operators. In the last section the abstract theory is illustrated by a number of concrete examples.
Quantitative comparison of median-based filters
T. George Campbell, Hans du Buf
Median-based filtering techniques are compared by considering a test image which contains a central disk-shaped region with a step or a ramp edge against a uniform background. Free parameters are the amplitude of the added Gaussian noise, the edge slope, and the number of filtering iterations. The quantitative comparison measure is the normalized squared error between the filtered noisy image and the noise-free image, computed separately on the flat image regions and on the transition region.
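A sketch of such a comparison measure, a normalized squared error restricted to a region mask; the exact normalization used in the paper may differ.

```python
import numpy as np

def normalized_squared_error(filtered, reference, region_mask=None):
    """Normalized squared error between a filtered image and the
    noise-free reference, optionally restricted to a region mask
    (e.g. the flat background or the edge transition zone)."""
    f = filtered.astype(float)
    r = reference.astype(float)
    if region_mask is not None:
        f, r = f[region_mask], r[region_mask]
    return np.sum((f - r) ** 2) / np.sum(r ** 2)
```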
Nonlinear spatial filtering of FLIR images
Maria Jose Perez-Luque, Carlos Munoz, Narciso N. Garcia
FLIR images contain a high level of noise, mainly caused by the acquisition system. Different classical nonlinear filters have been proposed to reduce this noise. Here, those having the best performance have been selected, studied in depth, and evaluated using statistical and subjective criteria. The best filters were AM and DWMTM, on which a detailed comparison study was carried out, with the AM filter finally chosen.
Mathematical Morphology and Fractals
Links: definition and properties
Jean C. Serra
In the lattice of the operators acting on a complete lattice, a link λ is the infimum of an increasing mapping and a decreasing mapping. The links generate a complete lattice and are characterized, in the case of complete distributivity, by the implication A ≤ X ≤ B ⇒ λ(A) ∧ λ(B) ≤ λ(X). They allow minimal representation of a large class of mappings by means of hit-or-miss operators. Special attention is devoted to the residuals, i.e., the set difference between the identity and increasing mappings. These particular links provide a common form for skeletons, ultimate erosions and conditional bisectors.
Minimal search for the optimal mean-square digital gray-scale morphological filter
To characterize optimal mean-square morphological filters, it is first necessary to interpret morphological operations in a functional manner appropriate to the theory of statistical estimation. The present paper takes such an approach in the case of digital N-observation grayscale filters, these being defined via the Matheron representation. Having obtained the optimality criterion, we are led to the characterization of a minimal search space, the nodes of the space being potential erosion structuring elements. More precisely, there exists a set of structuring elements which will always contain elements forming the basis for an optimal MS filter. Moreover, the set, called the fundamental set, is minimal, in the sense that no element can be deleted from it without possibly yielding a set not containing the optimal structuring element for a single-erosion filter.
Fractal image coding based on a theory of iterated contractive image transformations
Arnaud E. Jacquin
The notion of fractal image compression arises from the fact that the iteration of simple deterministic mathematical procedures can generate images with infinitely intricate geometries, known as fractal images [2]. The purpose of research on fractal-based digital image coding is to solve the inverse problem of constraining this complexity to match the complexity of real-world images. In this paper, we propose a fractal image coding technique based on a mathematical theory of iterated transformations [1, 2, 3, 7] which encompasses deterministic fractal geometry. Initial results were reported in [7, 8]. The main characteristics of this technique are that (i) it is fractal in the sense that it approximates an original image by a fractal image, and (ii) it is a block-coding technique which relies on the assumption that image redundancy can be efficiently exploited through blockwise self-transformability. We therefore refer to it as fractal block-coding. The coding-decoding system is based on the construction, for each original image to encode, of a specific image transformation which, when iterated on any initial image, produces a sequence of images that converges to a fractal approximation of the original. We show how to design such coders and thoroughly describe the implementation of a system for monochrome still images. Extremely promising coding results were obtained.
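To make the decoding side concrete, the following sketch iterates a generic blockwise contractive transform of the kind described above. The layout of the code (a domain-block origin, a contrast scaling and a brightness offset per range block) is an illustrative assumption, not the paper's exact scheme.

```python
import numpy as np

def decode_fractal(code, image_shape, range_size=8, iterations=10):
    """Iterate a blockwise contractive transform until it converges.

    `code` is assumed to map each range-block origin (r, c) to
    (dr, dc, scale, offset): the origin of a domain block of twice the
    range size, a contrast scaling (|scale| < 1) and a brightness
    offset.  Starting from any image, iterating the transform
    converges to its fixed point, the fractal approximation.
    """
    img = np.zeros(image_shape)
    for _ in range(iterations):
        new = np.zeros_like(img)
        for (r, c), (dr, dc, scale, offset) in code.items():
            dom = img[dr:dr + 2 * range_size, dc:dc + 2 * range_size]
            # Spatial contraction: average 2x2 neighbourhoods.
            dom = dom.reshape(range_size, 2, range_size, 2).mean(axis=(1, 3))
            new[r:r + range_size, c:c + range_size] = scale * dom + offset
        img = new
    return img
```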
Determining watersheds in digital pictures via flooding simulations
Pierre Soille, Luc M. Vincent
The watershed transformation is a very powerful image analysis tool provided by mathematical morphology. However, most existing watershed algorithms are either too time consuming or insufficiently accurate. The purpose of this paper is to introduce a new and flexible implementation of this transformation. It is based on a progressive flooding of the picture and it works for n-dimensional images. Pixels are first sorted in increasing order of their gray values. Then, the successive gray levels are processed in order to simulate the flooding propagation. A distributive sorting technique combined with breadth-first scannings of each gray level allows extremely fast computation. Furthermore, the present algorithm is very general since it deals with any kind of digital grid and its extension to general graphs is straightforward. Its interest with respect to image segmentation is illustrated by the extraction of geometrical shapes from a noisy image, the separation of 3-dimensional overlapping particles and the segmentation of a digital elevation model using watersheds on images and graphs.
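A simplified Python sketch of flooding-based watershed computation in the spirit described above; it omits the explicit marking of watershed pixels where two basins meet and the general-graph extension.

```python
import numpy as np
from collections import deque

def watershed_flood(image):
    """Simplified watershed by immersion (no explicit watershed lines).

    Pixels are first sorted by grey value (np.unique gives the sorted
    levels); levels are then flooded in increasing order.  At each
    level, pixels adjacent to an already labelled basin inherit its
    label (breadth-first), and the remaining pixels of the level
    become new minima, i.e. new basins.  Pixels reachable from two
    basins simply keep the label that reaches them first.
    """
    labels = np.zeros(image.shape, dtype=int)
    next_label = 0
    rows, cols = image.shape

    def neighbours(r, c):
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                yield rr, cc

    for level in np.unique(image):
        level_pixels = list(zip(*np.nonzero(image == level)))
        # 1) grow existing basins into this level, breadth-first
        queue = deque(p for p in level_pixels
                      if any(labels[n] for n in neighbours(*p)))
        while queue:
            r, c = queue.popleft()
            if labels[r, c]:
                continue
            lab = next((labels[n] for n in neighbours(r, c) if labels[n]), 0)
            if lab == 0:
                continue
            labels[r, c] = lab
            for n in neighbours(r, c):
                if labels[n] == 0 and image[n] == level:
                    queue.append(n)
        # 2) remaining pixels of this level are new minima -> new basins
        for r, c in level_pixels:
            if labels[r, c] == 0:
                next_label += 1
                labels[r, c] = next_label
                queue = deque([(r, c)])
                while queue:
                    rr, cc = queue.popleft()
                    for n in neighbours(rr, cc):
                        if labels[n] == 0 and image[n] == level:
                            labels[n] = next_label
                            queue.append(n)
    return labels
```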
Digital Euclidean skeletons
Fernand Meyer
The digitization of the skeleton is the source of many difficulties. The skeleton of a binary set can be defined as the set of crest points and their upstream on the distance function [10, 12]. The final result depends on the quality of the distance function which has been used. The result is particularly coarse with the hexagonal distance function. The use of a Euclidean distance function yields a beautiful skeleton, once some subtle interactions with the grid have been solved.
Mathematical morphology on the sphere
Jos B.T.M. Roerdink
In this paper we show how the classical morphological definitions for Euclidean space can be extended to the sphere, where the rotation group replaces the Euclidean translation group. The main problem to overcome is the non-commutativity of the rotation group. Some examples of morphological transformations on the sphere are given. To visualize the results we use a projection of the sphere onto a plane.
Antiskeleton: some theoretical properties and application
Michel Schmitt
Noting that the skeleton is defined by means of the distance function to the nearest point, an analogous notion named the anti-skeleton is derived by substituting the distance to the farthest point for that distance. A complete parallel is drawn between the mathematical definitions and properties of the skeleton and the anti-skeleton. An example of its use in image analysis is then presented.
Image mosaic and interpolation by multiresolution morphological pyramids
Soo-Chang Pei, Hann-Bao Tsai
The morphological pyramid is a new, computationally efficient algorithm for generating a multidimensional bandpass representation of an image. This image pyramid offers a flexible, convenient multiresolution format that mirrors the multiple scales of processing in the human visual system. Here the morphological pyramid is used for image mosaicking and smear removal by multiresolution interpolation; some image examples are given to show the effectiveness of this approach.
Subband image decomposition by mathematical morphology
Soo-Chang Pei, Fei-Chin Chen
An efficient subband image decomposition method based on mathematical morphology is described in this paper. It decomposes the input signal spectrum into four subband images using two separable structuring elements; each band image can then be decimated and coded effectively for data transmission. This subband pyramid scheme preserves the number of pixels of the original image, and the data structure itself is very compact. The advantages of morphology over a linear filtering approach are its direct geometric interpretation, simplicity, and efficiency in implementation. Some image examples are given to show the effectiveness of this approach.
VLSI Implementation and System Architectures I
Reconfigurable architecture for real-time 3-D parameter estimation from image sequences
F. M. Hugen, Maarten J. Korsten, Zweitze Houkes
In this paper a multiprocessor architecture is proposed which is capable of handling real-time parameter estimation algorithms for estimating 3D body parameters from 2D image sequences. The implicit parallelism of the algorithm is used to obtain a highly efficient architecture while retaining modularity and flexibility.
Real-time VLSI architecture for geometric image transformations
Min Zhao, Jean Gobert, Olivier Schirvanian, et al.
Many applications in image processing, such as digitized angiography and scan conversion in medical imaging, sensor distortion correction or image registration, require real-time geometric transformations. Therefore, the Laboratoires d'Electronique Philips (LEP) and TELECOM Paris University are currently developing a chip performing in real time a large class of geometric image transformations with third-degree polynomials. We present here the VLSI architecture of this chip. The different problems associated with real-time image processing are discussed, and some new architectural concepts, local memories combined with incremental calculation on a block and processing by patches, are used to overcome these problems. The chip operates on frame sizes of up to 1024x1024 pixels with a spatial resolution of 1/16 pixel at a maximum rate of 30 images per second.
CCD focal-plane real-time image processor
A focal-plane-array chip designed for real-time, general-purpose, image preprocessing is reported. A 48 X 48 pixel detector array and a 24 X 24 processing element processor array are monolithically integrated on the chip. The analog, charge-coupled device-based VLSI chip operates in the charge domain and has sensing, storing, and computing capabilities. The chip was fabricated with a double-poly, double-metal process in a commercial CCD foundry. The simulation of an edge detection algorithm implemented by the chip is presented. An overview of the chip performance is described as well.
Mapping technique for VLSI/WSI implementation of multidimensional systolic arrays
Mohamed B. Abdelrazik
This paper describes a mapping technique for transforming a linear systolic array into multidimensional systolic arrays in order to achieve high speed with less overhead. The technique is systematic and would therefore be useful for logic synthesis. Its application in DSP and numerical computations reduces the design time, which results in low design cost. The technique produces various structures (semi-systolic, quasi-systolic and pure systolic arrays) which can be considered application-specific array processors.
Mixed digital/analog VLSI array architectures for image processing
Mani Soma, Thomas Alexander
This paper presents the design of a mixed digital/analog array to implement several signal processing algorithms used in coordinate rotation for image processing. The advantages of hybrid arrays in which the processor element can be either analog or digital depending on a specific computing requirement are discussed. The interfaces between processor elements of different types are considered and several circuits are designed for this task. The proposed mixed array architecture is shown to possess strong potential in other high-performance computing applications.
Modular Image Processor: an efficient chip set for real-time image processing
Hughes Waldburger, Jean-Yves Dufour, Gilles Concordel
With the breakthroughs achieved in both memory capacity and very large scale integration, a large class of sophisticated algorithms can now be implemented in real time, at video rate. This is especially true for low-level operations, which operate on the video image itself. At this level, most of the operations can be carried out using a restricted group of basic functions. This has led to the realisation of a coherent VLSI chip set named MIP (Modular Image Processor), carried out in the framework of a EUREKA project. The methodology which has governed the design, the resulting chip set, as well as one of its applications are presented.
Analog parallel processor hardware for high-speed pattern recognition
Taher Daud, Raoul Tawel, Harry Langenbacher, et al.
We report on a VLSI-based analog processor for fully parallel, associative, high-speed pattern matching. The processor consists of two main components: an analog memory matrix for storage of a library of patterns, and a winner-take-all (WTA) circuit for selection of the stored pattern that best matches an input pattern. An inner product is generated between the input vector and each of the stored memories. The resulting values are applied in parallel to a WTA network for determination of the closest match. Patterns with up to 22% overlap are successfully classified with a WTA settling time of less than 10 µs. Applications such as star pattern recognition and mineral classification with bounded overlap patterns have been successfully demonstrated. This architecture has a potential for an overall pattern matching speed in excess of 10⁹ bits per second for a large (32-bit x 1000-pattern) memory.
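The matching stage reduces to inner products followed by a winner-take-all selection, as in this minimal digital sketch of the analog computation (not the chip's implementation).

```python
import numpy as np

def wta_classify(input_pattern, stored_patterns):
    """Winner-take-all matching: inner products against all stored
    patterns are computed in parallel (a matrix-vector product here),
    and the index of the largest one is the best match."""
    scores = stored_patterns @ input_pattern   # one inner product per memory
    return int(np.argmax(scores))
```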
Foveating vision systems architecture: image acquisition and display
Yehoshua Y. Zeevi, Ran Ginosar
Both biological and man-made (usually computer-based) vision systems have to acquire visual data and process it in real time at a formidable bit rate. To cope with this fundamental problem, biological systems of higher species employ mechanisms of data reduction already at the level of visual data acquisition.
VLSI Implementation and System Architectures II
Parallel architecture for real-time video communications
Luis A. S. V. de Sa, Vitor Mendes Silva, Fernando Perdigao, et al.
A video codec based on several parallel digital signal processors is described. The digital signal processors (DSPs) can be easily programmed to implement the H.261 algorithm and are organized as a single instruction multiple data (SIMD) computing architecture. Both the encoder and the decoder divide a picture into regions consisting of horizontal strips and use one local processor per region. These local processors code (decode) one horizontal strip of data which, using the terminology of the H.261 standard, corresponds to two groups of blocks (GOBs). They also communicate with a central processor which multiplexes (demultiplexes) the coded data from (for) the processors in the encoder (decoder). In the case of the encoder, the central processor also controls a data buffer for bit-rate adaptation. Lateral communication between adjacent processors is implemented to allow comparisons between blocks situated in neighbouring regions, as required by most motion estimation algorithms.
VLSI components for a 560-Mbit/s HDTV codec
Klaus Grueger, Peter Pirsch, Josef Kraus, et al.
The hardware implementation of a DPCM coding algorithm with 2D prediction, noise shaping and fixed codeword length for the transmission of HDTV signals at a 560 Mbit/s bit rate has been investigated. In order to reduce timing requirements, an architecture with parallel processing elements and a modified DPCM structure is proposed. DPCM and FIFO circuits, as major components of such a codec, have been designed for VLSI realization using 1.2 µm CMOS technology.
VLSI-architectures for the hierarchical block-matching algorithm for HDTV applications
Luc P.L. De Vos
This paper describes VLSI-architectures for HDTV-suitable implementations of two well-known hierarchical block-matching algorithms: the three-step search algorithm and the one-at-a-time search algorithm. The architectures exploit the regularity and design efficiency of systolic arrays, combined with a decision-driven on-chip data handling. They are capable of treating (16*16)-blocks, with a maximum displacement of +/-14 pixels, at 50MHz pixel rate. Another important feature is the small input-data bandwidth, which keeps to a minimum the requirements to external memory units. Transistor count and chip area estimations show that the architectures can be realized with today's CMOS technologies.
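For reference, a software sketch of the classic three-step (logarithmic) search for one block; the SAD criterion and the 4/2/1 step schedule are common textbook choices and are not necessarily those of the chip architecture described above.

```python
import numpy as np

def three_step_search(cur, ref, br, bc, block=16, steps=(4, 2, 1)):
    """Logarithmic block-matching search (classic three-step variant).

    Around the current best displacement, nine candidates spaced by the
    current step size are evaluated; the step is then reduced.  The
    default 4/2/1 schedule covers displacements of up to +/-7; a larger
    range such as +/-14 pixels simply needs an extra, larger first
    step (e.g. 8/4/2/1).
    """
    rows, cols = ref.shape

    def sad(dr, dc):
        r, c = br + dr, bc + dc
        if r < 0 or c < 0 or r + block > rows or c + block > cols:
            return np.inf
        cur_blk = cur[br:br + block, bc:bc + block].astype(float)
        ref_blk = ref[r:r + block, c:c + block].astype(float)
        return np.abs(cur_blk - ref_blk).sum()

    best = (0, 0)
    for step in steps:
        candidates = [(best[0] + i * step, best[1] + j * step)
                      for i in (-1, 0, 1) for j in (-1, 0, 1)]
        best = min(candidates, key=lambda d: sad(*d))
    return best    # estimated (row, column) displacement
```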
VLSI architecture and implementation of a multifunction, forward/inverse discrete cosine transform processor
Masanori Maruyama, H. Uwabu, I. Iwasaki, et al.
The Discrete Cosine Transform (DCT) is considered to be the most effective transform coding technique for image and video compression. In this paper, a new implementation of an experimental prototype multi-function DCT/IDCT (Inverse DCT) chip is reported. The chip is based on a distributed arithmetic architecture. The main features of the chip include: 1) the DCT and the IDCT are integrated in the same chip; 2) the chip achieves high accuracy, exceeding the stringent requirements of a proposed CCITT standard; 3) it achieves a high operating speed of 27 MHz, and is thus applicable to a wide range of real-time image and video applications; 4) the internal clock frequency is the same as the pixel rate; and 5) with an on-chip zigzag scan converter and an adder/subtractor, it is multifunctional and useful in a DPCM configuration. The chip is implemented with standard cells and contains about 156k transistors.
3-D Image Processing
Determining vanishing points using Hough transform
Evelyne Lutton, Henri Maitre, Jaime Lopez-Krahe
We propose a method to locate three vanishing points on an image corresponding to three orthogonal directions of the scene. This method is based upon the Hough Transform. With the help of a correction of errors due to the limitation of the retina, and errors due to the detection inaccuracy of the image segments, this method was improved to work on noisy images of real complex scenes.
Monocular correspondence detection for symmetrical objects by template matching
G. Vilmar, Philipp W. Besslich Jr.
We describe a possibility to reconstruct 3-D information from a single view of a 3-D bilaterally symmetric object. The symmetry assumption allows us to obtain a "second view" from a different viewpoint by a simple reflection of the monocular image. We therefore have to solve the correspondence problem in a special case where known feature-based or area-based binocular approaches fail. In principle, our approach is based on frequency-domain template matching of the features on the epipolar lines. During a training period, our system "learns" the assignment of correspondence models to image features. The object shape is interpolated when no template matches the image features. This is an important advantage of the methodology, because no "real world" image satisfies the symmetry assumption perfectly. To simplify the training process, we used single views of human faces (e.g. passport photos), but our system is trainable on any other kind of objects.
3-D reconstruction using a limited number of projections
Catherine Klija, Blandine Lavayssiere
We present a new iterative algorithm for 3D reconstruction under constraints from a limited number of radiographic projections, with adjustment of the constraints during the iterations. The first step of the algorithm is a classical iterative ART-type reconstruction method (Algebraic Reconstruction Technique), which provides a rough volumetric reconstruction of a 3D zone containing a flaw. This reconstructed zone is then modelled by a Markov Random Field (MRF), which allows us to estimate some 3D support and orientation constraints using a Bayesian restoration method. This fundamental step is important in the sense that it allows the introduction of local geometric a priori knowledge concerning the flaws. The next step consists in reintroducing these strong constraints into the reconstruction algorithm. Only a few iterations of the algorithm are necessary to improve the quality of the reconstructed 3D zone. Simulated radiographic projections allow the performance of the algorithm to be evaluated.
Mathematical morphology for 3-D object segmentation and partial matching
Isabelle Bloch-Boulanger, Henri Maitre, Francis J. M. Schmitt
We propose here a pre-processing of 3D shapes which allows greater importance to be given, during the matching of two objects, to the match between certain parts of the surface of one object and certain parts of the surface of the other. The method works with objects defined on a digital grid and consists of a segmentation step to separate the main components of the shapes, and a distance computation step to determine matching weights to be assigned to the surface points. Several new algorithms are developed: a 3D segmentation algorithm based on mathematical morphology, a fast method to compute geodesic distances on a 3D surface, and a simple sphere creation algorithm on a digital grid. The method has been applied to chemical molecules.
Image Sequence Coding I
Motion-estimation for coding of moving video at 8 kbit/s with Gibbs-modeled vectorfield smoothing
Christoph Stiller
A new approach to motion estimation for hybrid image sequence coding is presented. Instead of minimizing the displaced frame difference (DFD), the estimator introduced in this paper maximizes the probability of determining the 'true' physical motion of the scene. The probability expression is derived from two models, one for the statistics of the prediction error image and one for the interdependency of vectors in a vectorfield. The physical vectorfield is smoother than the vectorfield of a DFD estimator, and stronger statistical bindings between vectors exist. Therefore a coding algorithm for the vectorfield which combines contour coding of regions of similar displacement with predictive coding of the vectors inside each region proves efficient. This allows the estimator to work with a decreased blocksize and (even in the DFD sense) to supply a distinctly improved displacement compensation without spending more data rate on displacement compensation than the DFD estimator.
Some variants of universal pattern-matching interframe coding
Takahiro Saito, Ryuji Abe, Takashi Komatsu, et al.
The work herein applies the concept of irreversible coding via copying to low-rate video compression and devises some new variants of universal pattern-matching interframe coding (PMIC), which have the additional effect of generalizing the definition of the search area used in conventional block-matching motion compensation. We experimentally show the performance gain provided by this generalization within the framework of irreversible coding via copying, and demonstrate that PMIC is useful and promising as a basic means for low-rate video compression.
Video coding using a pyramidal Gabor expansion
Touradj Ebrahimi, Todd Randall Reed, Murat Kunt
A compression technique based on an expansion is presented. The elementary functions of the expansion form a class of pyramidal Gabor functions covering the frequency domain in octave bands. The image sequence is coded by differentially coding selected coefficients of this expansion. Simulation results show sequences reconstructed with good quality for a bit rate less than 64 Kbit/s.
Block testing in a variable resolution spatially interpolative moving image sequence coder
Peter J. Cordell, Roger John Clarke
We have modified the interpolation and quad-tree based still picture Recursive Binary Nesting algorithm to code moving sequences. Central to the efficiency of this is the mechanism by which the extent of the quad-tree decomposition is controlled. Termed block testing, this forms the subject of this paper.
Effective exploitation of background memory for coding of moving video using object mask generation
Wolfgang Guse, Michael Gilge, Bernd Huertgen
A lot of different algorithms for background prediction and memory control have been investigated mainly in the field of low bit-rate coding. Two problems arise with background prediction: 1. How to tell the receiver which parts of the prediction image may be taken from the background memory? 2. How to distinguish at the receiver’s side between object, stationary, covered and uncovered background? This paper presents a method for pel-accurate distinction between moving object and background without transmitting a single bit. In addition an object-oriented motion estimation and compensation is introduced.
Very low rate coding of motion video using 3-D segmentation with two change-detection masks
Sang-Mi Lee, Nam Chul Kim, Hyon Son
A new 3-D segmentation-based coding technique is proposed for transmitting motion video with reasonably acceptable quality even at a very low bit rate. Only meaningful motion areas are extracted by using two change detection masks, and the current frame is segmented directly rather than a difference frame, so that good image quality can be obtained at high compression ratios. In the experiments, the Miss America sequence is reconstructed with visually acceptable quality at the very high compression ratio of 360:1.
Visual pattern image sequence coding
Peter L. Silsbee, Alan Conrad Bovik, Dapang Chen
Visual pattern image coding, or VPIC [1-5], is an important new digital image coding process that possesses significant advantages relative to all other existing technologies: VPIC is capable of coding (i) with visual fidelity comparable to the best available techniques; (ii) at very high compressions exceeding the best available technologies: compressions in the range 30:1 to 40:1 are obtained routinely; (iii) with absolutely unprecedented coding efficiency: coding/decoding via VPIC is completely linear with respect to image size and entails a complexity 1-2 orders of magnitude faster than any prior high compression strategy; and (iv) with configurability. In the current work, the VPIC coding framework developed initially for single images is extended to image sequences. The algorithm for image sequence coding presented here, termed Visual Pattern Image Sequence Coding (VPISC), exploits all of the advantages of "static" VPIC in the reduction of information from an additional (temporal) dimension to achieve unprecedented image sequence coding performance in terms of coding complexity, compression, and visual fidelity.
Hierarchical Video Coding
Subband coding of video employing efficient recursive filter banks and advanced motion compensation
John Hakon Husoy, Hakon Gronning, Tor A. Ramstad
Filter banks based on recursive filters have recently been applied with success to subband coding of still images [1, 2]. Their main feature, low computational complexity, makes these filter banks extremely attractive for video coding applications. In this paper we demonstrate the suitability of these filter banks in low bit rate image sequence coders operating in the 64-200 kbps range. In particular we compare the performance of two types of recursive filter banks with that of a more traditional one based on quadrature mirror FIR filters (QMF). It is shown that the performance of our IIR-based systems is comparable to that of the more complex FIR-based systems. Also, we show that the encoding strategy of CCITT's reference model no. 7 can be adapted successfully to fit into our subband coding scheme. Finally we discuss alternative approaches to motion estimation in hybrid image sequence coders and show preliminary results in the context of our video subband coder.
Interframe hierarchical address-vector quantization
A new interframe coding technique called Interframe Hierarchical Address-Vector Quantization (IHA-VQ) is presented in this paper. It exploits the local characteristics of a moving image's motion-compensated (via block matching) difference signals by using quadtree segmentation to divide each signal into large, uniform regions and smaller, highly detailed regions. The detailed regions are encoded by Vector Quantization (VQ), and the larger, low-detail regions are replenished from the previous image. IHA-VQ also exploits the correlation between the small blocks by encoding the addresses of neighboring vectors using several codebooks of address-codevectors. The IHA-VQ technique is applied to a test sequence of images in a computer simulation. The sequence was encoded at an average bit rate of 0.607 bits per pixel (bpp), with a Signal-to-Noise Ratio (SNR) averaging 39.5 dB. By changing the quadtree segmentation parameter, a much lower bit rate, 0.237 bpp, is achieved for a lower SNR, 37.73 dB.
Refinement system for hierarchical video coding
Frank Bosveld, Reginald L. Lagendijk, Jan Biemond
The Broadband Integrated Services Digital Network (BISDN), based on lightwave technology, is expected to become the all-purpose exchange area communications network of the future. All digital video services are integrated, with applications ranging from videophone and teleconferencing to digital TV (signals according to CCIR rec. 601) and High Definition TV (HDTV) distribution. A desirable feature of the various video services is upward and downward compatibility in resolution, in order to guarantee a free exchange of services, transmitters and receivers. This paper proposes an n-level progressive hierarchical intraframe coding scheme based on subband coding. In this scheme several spatial low-resolution services are available as subsets of the coded HDTV data, which can be received directly at lower bit rates. Progressive coding of the HDTV signal is employed in order to prevent quantization errors from propagating to higher resolution signals. Special attention is given to the design of the quantizers required for the progressive coding, and to the incorporation of side panel coding.
Design of an HDTV subband codec considering CMOS-VLSI constraints
Ulrike Pestel-Schiller, Bernd Schmale
A coding technique for the transmission of HDTV signals at a bit rate of 140 Mbit/s is presented. Since downward compatibility to standard TV is desired, a four-band subband approach has been taken. Because of the intended realization of the codec using CMOS-VLSI technology, the subband filter bank and the coding algorithm are designed considering hardware constraints and system restrictions of the subband filter bank. The design results in Quadrature Mirror Filters realizable with simple integer arithmetic. An adaptive intrafield coding algorithm is used for coding the four subbands. Separable QMFs with 10 coefficients in vertical and 14 coefficients in horizontal filtering are preferred to QMFs with fewer coefficients because of the coding gain.
Hierarchical Image Coding
Image subband coding using an efficient recursive filter bank with complex signals
Hakon Gronning, John Hakon Husoy, Tor A. Ramstad
The principle of subband coding has recently been applied to image data compression. In this paper we present new results pertaining to our subband coder for images, featuring an exact reconstruction parallel complex filter bank based on IIR filters. This filter bank is characterized by its low complexity. The main advantage of subband coding is the possibility of coding each subband separately so that the perceptually important coding noise is minimized. In this paper we exploit the nature of the complex-valued subband signals of this particular coder by coding the modulus and phase rather than the real and imaginary parts. We determine the tolerable quantization noise levels for the modulus and phase signals in all subbands. These empirically derived data are employed in the coding simulations presented. Finally we present coding results for bit rates between 0.25 and 1.0 bits/pel.
Perfect reconstruction binomial QMF-wavelet transform
Ali Naci Akansu, Richard A. Haddad, Hakan Caglar
This paper describes a class of orthogonal binomial filters which provide a set of basis functions for a bank of perfect reconstruction Finite Impulse Response Quadrature Mirror Filters (FIR-QMF). These Binomial QMFs are shown to be the same filters as those derived from a discrete orthonormal wavelet approach by Daubechies [13]. The proposed filters can be implemented very efficiently with output scaling, but otherwise no multiply operations. The compaction performance of the proposed signal decomposition technique is computed and shown to be better than that of the DCT for the AR(1) signal models, and also for standard test images.
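Since the abstract states that the 4-tap binomial QMF coincides with Daubechies' orthonormal filter, a two-band perfect-reconstruction split can be sketched with the D4 coefficients as follows; the circular signal extension is an implementation assumption made here for brevity, not the paper's scheme.

```python
import numpy as np

# 4-tap lowpass (D4) and the corresponding alternating-sign highpass filter.
S3 = np.sqrt(3.0)
H = np.array([1 + S3, 3 + S3, 3 - S3, 1 - S3]) / (4 * np.sqrt(2.0))
G = np.array([(-1) ** n * H[len(H) - 1 - n] for n in range(len(H))])

def analysis(x):
    """One level of a two-band split with circular extension."""
    L, N = len(x), len(H)
    low = np.array([sum(H[n] * x[(2 * k + n) % L] for n in range(N))
                    for k in range(L // 2)])
    high = np.array([sum(G[n] * x[(2 * k + n) % L] for n in range(N))
                     for k in range(L // 2)])
    return low, high

def synthesis(low, high, L):
    """Perfect reconstruction: transpose of the orthogonal analysis operator."""
    x = np.zeros(L)
    for k in range(L // 2):
        for n in range(len(H)):
            x[(2 * k + n) % L] += H[n] * low[k] + G[n] * high[k]
    return x

# Round-trip check on a random signal of even length.
sig = np.random.default_rng(0).standard_normal(16)
lo, hi = analysis(sig)
assert np.allclose(synthesis(lo, hi, len(sig)), sig)
```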
Generalized quad-trees: a unified approach to multiresolution image analysis and coding
Roland Wilson, Martin Todd, Andrew D. Calway
This paper is an attempt to bring together a number of current ideas in multiresolution image processing into a single framework. The unification is achieved by introducing a two-level image model, comprising a quadtree containing parameters which control the evolution of the image as a sequence of successive refinements through scale-space. Examples of the method's application to image coding and analysis are used to illustrate the principles and show its usefulness.
Three-dimensional adaptive Laplacian pyramid image coding
Sebastia Sallent, Luis Torres, L. Gils
This paper describes Three-Dimensional Laplacian Pyramid Coding as an extension of previously reported work on static images. The algorithms that we present, when applied to image sequences, give high compression rates and are very useful for videotelephone or videoconference applications. The image sequence is split into subsequences composed of a small number of images, to which the three-dimensional algorithms are then applied. The coding scheme is based on the construction of spatial/temporal pyramid structures defined on arbitrary sampling lattices. A three-dimensional treatment of sampling rate conversion using matrix notation is also presented.
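A minimal sketch of the underlying two-dimensional Laplacian pyramid construction; the 2x2 averaging and nearest-neighbour expansion used here are simplifying assumptions, and the paper extends the decomposition to the temporal dimension with adaptive coding.

```python
import numpy as np

def laplacian_pyramid(image, levels=4):
    """Build a simple Laplacian pyramid.

    Each level stores the difference between the current image and an
    expanded version of its half-resolution copy (2x2 block averaging
    down, nearest-neighbour expansion up); the coarsest low-pass image
    is kept as the last level, so the image can be recovered by
    expanding and summing the levels back up.
    """
    pyr = []
    cur = image.astype(float)
    for _ in range(levels):
        r, c = (cur.shape[0] // 2) * 2, (cur.shape[1] // 2) * 2
        low = cur[:r, :c].reshape(r // 2, 2, c // 2, 2).mean(axis=(1, 3))
        expanded = np.repeat(np.repeat(low, 2, axis=0), 2, axis=1)
        lap = cur.copy()
        lap[:r, :c] -= expanded
        pyr.append(lap)        # bandpass (difference) level
        cur = low
    pyr.append(cur)            # coarsest low-pass level
    return pyr
```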
Digital Image Processing in Medicine I
Three-dimensional reconstruction and lateral views in optical microscopy
Tullio Tommasi, Bruno Bianco, Vittorio Murino, et al.
The three-dimensional properties of an optical microscope are analyzed and a defocusing technique is proposed to recover the spatial distribution of the specimens under investigation. Limitations and real capabilities of the 3D reconstruction are pointed out. As a result, lateral views are obtained by means of operations in the spatial frequency domain. In such a way, it is possible to represent side views of an object within the angular aperture range of the microscope. A theory concerning image formation is discussed and simulations of side view reconstructions are reported.
Improved resolution of medical 3-D x-ray computed-tomographic images
Christophe Odet, Gilles Jacquemod, Francoise Peyrin, et al.
Many applications require the acquisition of 3D CT images made of a stack of 2D X-ray slices. The different slices are obtained by displacement of the patient table, which sets the resolution in the z direction, perpendicular to the slice plane. This resolution is physically limited by the width of the detector, over which the X-ray radiation is integrated. These detectors inherently carry out a low-pass filtering process and introduce aliasing effects. As a consequence, the reconstruction of a slice in an arbitrary direction can present artefacts. We therefore propose to apply a superresolution technique based on a deconvolution method associated with oversampling using subpixel motion, in order to improve the resolution in the z direction. This method requires the acquisition of overlapping slices, which are then processed directly in the spatial domain, which is particularly advantageous in terms of computation time. Its application provides thinner and more accurate new slices. After the presentation of the method, results obtained on experimental images from a conventional CT scanner are presented.
Extraction of morphometric information from dual echo magnetic resonance brain images
Tamas Sandor, Ferenc A. Jolesz, J. Tieman, et al.
Hierarchical image segmentation techniques have been applied to spin-echo magnetic resonance images of the brain. From T1- and T2-weighted clinical images, areas representing gray matter, white matter and cerebrospinal fluid were segmented. In the segmentations, a priori anatomic information was used and the procedures were performed in an ordered anatomical sequence.
Diagnostic digital image processing of human corneal endothelial cell patterns
This paper describes the application of digital image processing to the morphological evaluation of human corneal endothelial cell patterns. This morphological evaluation is used in both basic and clinical ophthalmology. There is interest in investigating the fundamental processes involved in the pattern formation and the variation in the pattern over time. Digital image processing and analysis of the binary segmented images of cell boundaries can be used to determine, for each cell border, features such as: area, perimeter, aspect ratio, roundness, form factor, fractal dimension, length, breadth, width, equivalent diameter, x and y center of gravity, orientation angle, and moment angle. The question of the nature of the physical processes which result in the dynamic polygonal cell patterns is presented.
Region-oriented 3-D segmentation of NMR datasets: a statistical model-based approach
Til Aach, Herbert Dawid
We present a three-stage method for segmenting NMR datasets of the head into 3D regions corresponding to different brain matter classes, liquid-containing structures, cranium, and background. Our technique works from the beginning with 3D regions, whose internal 'grey' values as well as shapes are described by stochastic models. The first phase starts by assuming the entire dataset to consist of only one region, and then recursively extracts those areas which are not compatible with this hypothesis. During this step, special emphasis is given to the problem of accurately locating the region surfaces. In the second stage, a Bayes classifier groups the regions into different categories, such as brain matter, liquid, cranium, and background. Classification errors are corrected largely automatically during the third stage by applying simple knowledge about the topological relationships between the classes.
Enhancement and segmentation for NMR images of blood flow in arteries
Guang-Zhong Yang, Peter Burger
The widespread prevalence of atherosclerotic vascular disease has given rise to the need for a simple, noninvasive imaging examination of the cardiovascular performance of patients. The potential of using Magnetic Resonance (MR) imaging to quantify flow in vivo has far-reaching possibilities for the future of preventive medicine. In this paper we address the problem of using MR velocity imaging to analyse the flow boundaries in human arteries, which are of great importance to the early diagnosis of occlusive diseases. A flow-related enhancement process is introduced in this paper. It is designed to suppress the residuals and the noisy background of the MR velocity images caused by misregistration, tissue movement and an uneven magnetic field, and provides a great improvement in signal-to-noise ratio. From the enhanced image, the main flow areas can be delineated by a thresholding process which defines the kernel of the flow. The boundaries of the kernel region are then dynamically guided by a defined flow boundary localization process to their final positions. The results of the application of this coarse-to-fine process show its robustness and effectiveness for the determination of blood flow boundaries from very low quality MR velocity images.
Image Sequence Coding III
Enhancement and delineation of lung tumors in local x-ray chest images
Fang Zhou, Hongbo Zhou, Wu Xiang-Qi, et al.
The diagnostic quality of radiographic images can be improved with some basic image processing algorithms. This is shown by a two-step procedure on X-ray chest images in this paper. First, a revised adaptive unsharp masking method is applied which sharpens some boundary features while smoothing others. Then, an extended Sobel edge detection and region growing process follows which further delineates the suspected lung nodules. The final results give a clearer display of the desired features.
Digital Image Processing in Medicine II
Analysis of x-ray hand images for bone age assessment
Joan Serrat, Jordi M. Vitria, Juan Jose Villanueva
In this paper we describe a model-based system for the assessment of skeletal maturity on hand radiographs by the TW2 method. The problem consists of classifying each of a set of bones appearing in an image into one of several stages described in an atlas. A first approach, consisting of independent pre-processing, segmentation and classification phases, is also presented. However, it is only well suited to well-contrasted, low-noise images without superimposed bones, where edge detection by zero crossings of second directional derivatives is able to extract all bone contours, perhaps with small gaps, and few false edges on the background. Hence, the use of all available knowledge about the problem domain is needed to build a rather general system. We have designed a rule-based system to narrow down the range of possible stages for each bone and to guide the analysis process. It calls procedures written in conventional languages for matching stage models against the image and extracting the features needed in the classification process.
ITS: a practical picture archiving and communication system
Tianhu Lei, Wilfred Sewchand
In the design and implementation of Picture Archiving and Communication Systems (PACS), two types of problems have to be solved. The first is in the aspect of image data formatting and the second is in the image transmission. Radiography, X-ray Computed Tomography (CT), radionuclide emission tomography, magnetic resonance imaging (MRI), and ultrasonic imaging are the widely used medical imaging. Microscopic images are the examples of bioimaging. Some of these biomedical imaging modalities, such as X-ray CT, MRI, etc, directly create digitized images, which are stored in computer disks. Some of them produce non-digitized images, which are recorded on films or video-tapes. Using scanners (flatbed, overhead, slide) or video camera, the non-digitized images can be digitized. Due to the diversity of physical principles and image reconstruction procedures of these imaging modalities, the image data formats of the digitized biomedical images are extremely different. In addition, the data size of the digitized images are usually very large, especially when the higher resolution is required. For instance, a transaxial thoracic image, created by the 3-rd generation of X-ray scanner, has about 150 X 103 bytes, a digitized radiographic film (14” X 17”), in the moderate resolution (300 pixel-per-inch), occupies about 20 X 106 bytes. Therefore, seeking an unified image data format and compressing the image data for digitized biomedical images are the first issue in developing PACS . Computer networks provide the effective means for exchanging images. Currently, mainframe computers are linked by networks such as Bitnet, ARPAnet, etc. Personal (micro) computers (including workstations) are linked by LAN etc. Micro-mainframe connectivity is also available. In order to establish the communications among these computers, the same protocol is required. Once the compressed images are captured, opening, z.e., redisplaying the captured images requires the same or compatible applications (softwares). For the further processing and analysis, these compressed images may be required to change to the standard text data format (e.g. ASC) to meet a variety of needs of different computer systems and processing techniques. It is evident that speed in image transmission is another key factor, especially when image data size is very large. Thus, speeding up image transmission, decoding the compressed images, and selecting protocol and application are the second issue in developing a PACS . The problems addressed above have been considered and solved. An Image Transfer System, ITS, has been developed in the University of Maryland School of Medicine (Baltimore) and tested between micro- to micro-computers, micro- to mainframe computers, and mainframe to mainframe computers. 1. The Tag Image File Format, TIFF, is utilized as an unified data format for image data compression and transmission. TIFF was originated in 1986 and some ideas behind it came from the ACR/NEMA communication protocol for radiological images. We extend it to more general medical images . Comparing with traditional data file formats which use a fixed-position organization, TIFF employs a different structure based on tagged information. This structure is commonly used in data base design. Each tag of TIFF consists of several fields and a pointer. It describes not only the height and width of image, but also contain resolution information, support grayscale a d color data, and allow for private and public data compression schemes. 
Although the tag structure of TIFF may seem to add an unnecessary layer of overhead, it provides a key advantage over a fixed-position format: new tags can be added without requiring all supporting software to be rewritten. Standard data compression schemes are defined within TIFF. For bilevel images, a simple run-length encoding is used; for grayscale and color images, a form of LZW (Lempel-Ziv-Welch) encoding is used. According to our results, the compression ratio is up to 3.65. 2. Special firmware and a selected protocol (for which the University of Maryland School of Medicine intends to file for patent protection) are used to transfer images in TIFF format between micro- and micro-computers and between micro- and mainframe computers. This firmware provides an image transfer speed four times faster than normal data transfer. Once the transferred images are captured, they can be directly displayed and printed; using a phototypesetting printer such as a Linotronic, photo-quality prints can be produced. Using several public software packages, the captured images can be directly manipulated. A decoding technique which translates the coded, compressed TIFF image data into standard text image data has also been developed. Thus, almost every image processing and analysis approach, e.g., edge detection and region segmentation at the lower level and object recognition and labeling at the higher level, whether on a mainframe or a micro-computer, can use these image data for further processing. Our research and development showed that ITS has the following advantages: A. it can accept both digitized and non-digitized images, as inputs, from different biomedical imaging modalities; B. various image data formats can be converted to a unified format, TIFF, which is extensible, portable, and revisable; C. it achieves a high data compression ratio, up to 3.65; D. it achieves a fast transmission speed, four times faster than normal data transfer; E. its communication protocol and applications are simple and public; F. its decompression (i.e., decoding) procedure allows the captured image to be processed on almost every computer system and by existing processing/analysis techniques; G. it is very user-friendly and does not require special computer training; H. it is also extremely cost-effective. ITS shows great promise as a practical Picture Archiving and Communication System.
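To make the compression step above concrete, here is a minimal sketch of a generic run-length encoder and decoder for a bilevel scan line in Python. It illustrates the idea only; it is not the specific run-length scheme defined by TIFF, and the function names are our own.

    # Minimal run-length coding sketch for a bilevel (0/1) scan line.
    # This is a generic illustration, not the TIFF-defined scheme.
    def rle_encode(line):
        """Return a list of (value, run_length) pairs."""
        runs = []
        prev, count = line[0], 1
        for pixel in line[1:]:
            if pixel == prev:
                count += 1
            else:
                runs.append((prev, count))
                prev, count = pixel, 1
        runs.append((prev, count))
        return runs

    def rle_decode(runs):
        """Expand (value, run_length) pairs back into a scan line."""
        line = []
        for value, count in runs:
            line.extend([value] * count)
        return line

    line = [0, 0, 0, 1, 1, 0, 0, 0, 0, 1]
    assert rle_decode(rle_encode(line)) == line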
Open system architecture for distributed image-reference database in radiological applications
Alessandro Bellini, Giacomo Bucci
Discussed are the most relevant architectural aspects of the so-called Image Reference Data Base (IRDB), that is, the distributed information system which is under development within the framework of the TELEMED Project of the RACE programme (project no. R1086) of the Commission of the European Community. The TELEMED Project has the objective of developing a pilot application for experimenting with Integrated Broadband Communication of medical, anatomical and radiological information. A major design objective has been the definition of an open system architecture permitting a multisite implementation of the IRDB. To this end, the proposed architecture builds a federation of IRDBs and provides a logical view of each individual IRDB as if it were a partition of a single database.
HDTV
Motion field restoration using vector median filtering on high-definition television sequences
Tero Koivunen, Ari Nieminen
Field rate upconversion is used in many modern television systems in order to improve picture quality. Motion estimation by motion vector computation is used to improve motion portrayal. Usually, motion-compensated upconversion requires some further processing of the standard 50 Hz motion vectors. In this paper, a nonlinear algorithm, called the vector median, is introduced for this purpose. The suggested postprocessing was applied to the High Definition Multiplexed Analogue Components (HD-MAC) system introduced by the Eureka 95 project.
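For readers unfamiliar with the operation, the vector median of a window of motion vectors is the vector in the window that minimizes the summed distance to all the others. The sketch below uses that common definition with an L2 distance; the choice of norm is our assumption, since the abstract does not state which one is used.

    import numpy as np

    def vector_median(vectors):
        """Return the vector in `vectors` minimizing the summed L2 distance
        to all other vectors in the window (a common definition of the
        vector median; the norm choice here is an assumption)."""
        v = np.asarray(vectors, dtype=float)                  # shape (N, 2)
        dists = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=2)
        return v[np.argmin(dists.sum(axis=1))]

    window = [(1, 0), (1, 1), (8, -7), (1, 0), (0, 1)]        # one outlier vector
    print(vector_median(window))                              # -> [1. 0.]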
Motion-adaptive four-channel HDTV subband/DCT coding
Guenter Schamel
A new subband technique for coding high-definition television (HDTV) signals is presented. Intrafield processing is sufficient because of motion adaptive subband suppression. The scheme can be implemented in hardware very similar to two-dimensional subband coding schemes.
Compression and channel-coding algorithms for high-definition television signals
Luciano Alparone, Giuliano Benelli, A. F. Fabbri
In this paper, results of investigations into the effects of channel errors on the transmission of images compressed by techniques based on the Discrete Cosine Transform (DCT) and Vector Quantization (VQ) are presented. Since compressed images are heavily degraded by noise in the transmission channel, more seriously in the case of VQ-coded images, theoretical studies and simulations are presented in order to define and evaluate this degradation. Some channel coding schemes are proposed in order to protect the information during transmission: Hamming codes (7,4), (15,11) and (31,26) have been used for DCT-compressed images, and more powerful codes, such as the Golay (23,12) code, for VQ-compressed images. The performance attainable with soft-decoding techniques is also evaluated; better quality images have been obtained than with classical hard-decoding techniques. All tests simulate the transmission of a digital image derived from an HDTV signal over an AWGN channel with PSK modulation.
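As an illustration of the block codes mentioned above, the following sketch implements a systematic Hamming (7,4) encoder and single-error-correcting decoder. The particular generator and parity-check matrices are our own choice of a standard systematic form, not necessarily the ones used in the paper.

    import numpy as np

    # Systematic Hamming (7,4) code: 4 data bits -> 7-bit codeword,
    # correcting any single bit error (illustrative matrix choice).
    G = np.array([[1,0,0,0,1,1,0],
                  [0,1,0,0,1,0,1],
                  [0,0,1,0,0,1,1],
                  [0,0,0,1,1,1,1]])
    H = np.array([[1,1,0,1,1,0,0],
                  [1,0,1,1,0,1,0],
                  [0,1,1,1,0,0,1]])

    def encode(data4):
        return np.dot(data4, G) % 2

    def decode(word7):
        syndrome = np.dot(H, word7) % 2
        if syndrome.any():                     # locate and flip the erroneous bit
            for i in range(7):
                if np.array_equal(H[:, i], syndrome):
                    word7 = word7.copy()
                    word7[i] ^= 1
                    break
        return word7[:4]

    data = np.array([1, 0, 1, 1])
    noisy = encode(data)
    noisy[2] ^= 1                              # single channel error
    assert np.array_equal(decode(noisy), data)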
Source coding of HDTV with compatibility to TV
Gradual introduction of HDTV is considered to be important. In this paper, a bit-rate reduction system is introduced, which decreases the bit rate of digital HDTV from 664 Mbit/s to about 80 Mbit/s, while ensuring compatibility with TV. The system is based on first subband splitting the interlaced HDTV signal into an interlaced compatible TV signal and three surplus signals. Then, the compatible TV signal is coded with intraframe DCT coding, whereas the surplus signals are coded with quantization, variable-length coding and runlength coding.
Parallel Processing
Xputer use in image processing and digital signal processing
Reiner W. Hartenstein, A. G. Hirschbiel, K. Lemmert, et al.
For a number of real-time applications extremely high throughput (up to several kiloMIPS) is needed at very low hardware cost. For at least another decade this mostly will be possible only with dedicated hardware, but not with programmable von-Neumann-type universal hardware.
Parallel architectures for the postprocessing of SAR images
Luciano Alparone, Federico Boragine, Stefano Fini, et al.
In this paper we study the performance that can be obtained by mapping image processing algorithms onto parallel architectures. We have focused our attention on a particular problem: a processing chain to detect ship wakes in SAR images, which allowed us to test both low- and medium-level algorithms. Because of the noisy nature of SAR images, great attention has been paid to the choice of the most appropriate filtering technique, based both on theoretical considerations about the nature of the noise and on experimental results on filtered images. The chain has been implemented on a sequential machine (a MicroVAX II) and on several parallel architectures based on eight IMS T800 transputers. The performance obtained is discussed and some general conclusions are drawn.
Transputer-based embedded system for METEOSAT image data compression
Menno H.J.B. Versteeg, R. A. Hogendoorn, A. Monkel
Under contract with the European Space Operations Centre (ESOC), the National Aerospace Laboratory of the Netherlands (NLR) has realized a system to demonstrate the feasibility of using data compression on the European METEOrological SATellite (METEOSAT) dissemination link. The compression method used doubles the amount of image data that can be disseminated to users via the existing channel without introducing any errors in the reconstructed image after decompression. The compression and decompression functions are performed in real time during METEOSAT dissemination (166 kbit/s). For this project a multiprocessor system based on the INMOS Transputer has been selected to provide the required processing power.
Image Coding and Transmission I
Source coding of super high definition images with discrete cosine transform
Mitsuru Nomura, Tetsuro Fujii, Naohisa Ohta
This paper discusses the bit-rate compression of super high definition images with Discrete Cosine Transform (DCT). Super high definition images with more than 2048x2048 pixels of resolution are introduced as the next generation image category beyond HDTV. In order to develop bit-rate reduction algorithms, an image I/O system for super high definition still images is assembled. A traditional DCT based coding algorithm called Scene Adaptive Coder (SAC) is applied to super high definition images and related problems are clarified. A new coding algorithm is proposed, which takes human visual perception characteristics into consideration, and its coding performance is examined for super high definition images.
New variable-rate VQ coding scheme applied in HDTV
Yushu Feng, Ker Zhang
In this paper, a new variable block size VQ coding scheme is proposed. Most natural images can be divided into regions of high and low detail. Traditional coding schemes do not adapt the block size to the space-varying characteristics of natural images, but break the image into blocks of a fixed size prior to processing. The block size in transform coding is usually too large to handle partially high-detail regions satisfactorily. On the other hand, the block size in a conventional vector quantizer is too small to take advantage of large homogeneous regions in an image. For this reason, it is very attractive to implement a variable block size image coding scheme which allows a greater variation of the number of bits spent per unit area according to the local detail.
Clustering algorithm for entropy-constrained vector-quantizer design
Weiler A. Finamore, Diego Pinto de Garrido, William A. Pearlman
A clustering algorithm for the design of efficient vector quantizers to be followed by entropy coding is proposed. The algorithm generates a sequence of quantizers with rates close to the theoretical rate-distortion bound. A fast version of this algorithm can be used as an alternative to the entropy-constrained vector quantizer technique proposed by Chou, Lookabaugh and Gray.
Variable block-sized vector quantization of gray-scale images with unconstrained tiling
Jerrold L. Boxerman, Ho John Lee
This paper discusses the design and performance evaluation of a variable block-sized vector quantizer with unconstrained tiling (UTVQ), a novel block coder that segments the input image into an unconstrained tiling of various-sized, non-overlapping rectangular regions, each of which is then coded with a vector quantizer. UTVQ is compared with the more constrained quadtree vector quantizer (QTVQ) approach. A transmission scheme is described which reduces the overhead required to convey UTVQ segmentation geometry. Several UTVQ segmentation strategies are presented, and mean-residual codebook generation methods with appropriate side stream compression techniques are discussed. UTVQ outperforms QTVQ in a projected rate sense, since it can segment an image into fewer blocks, each of which satisfies a specified homogeneity constraint. Furthermore, UTVQ outperforms QTVQ in a rate-distortion sense for certain applications. The maximum separation which we have observed between our experimental UTVQ rate-distortion curves and those for the QTVQ system is 0.4 dB.
Image Coding and Transmission II
New technique of linear-phase QMF filter design for subband coding
Joon-Hyeon Jeon, Jae-Kyoon Kim
In this paper, a new technique for designing linear phase QMF filters is proposed. These filters have frequency responses with sharp and symmetrical transition characteristics, and the capabilities of perfect reconstruction and simple implementation. By applying video signals to basic sub-band systems, the performance of the designed QMF pair and other QMF pairs [1,2] is compared in terms of alias-cancellation/reconstruction errors and Peak Signal-to-Noise Ratios (PSNRs). It is shown that the proposed QMF pair performs better than the short kernel filter pair [1] and similarly to the long kernel filter pair [2].
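To illustrate the alias-cancellation structure that QMF banks rely on, the sketch below builds the standard two-channel pair h1[n] = (-1)^n h0[n] from a prototype low-pass filter, together with synthesis filters g0 = h0 and g1 = -h1, and checks the overall distortion and aliasing transfer functions. The Haar prototype used here is only an illustration, not one of the filters designed in the paper.

    import numpy as np

    # Two-channel QMF bank: h1[n] = (-1)^n h0[n], g0 = h0, g1 = -h1.
    # With decimation by 2, the reconstructed signal satisfies
    #   Y(z) = T(z) X(z) + A(z) X(-z),
    #   T(z) = 0.5*(H0 G0 + H1 G1),   A(z) = 0.5*(H0(-z) G0 + H1(-z) G1).
    # The code checks that A(z) = 0 (alias cancellation) and, for the Haar
    # prototype used here as an example, that T(z) is a pure delay.
    h0 = np.array([1.0, 1.0]) / np.sqrt(2.0)   # illustrative prototype low-pass filter
    n = np.arange(len(h0))
    h1 = ((-1.0) ** n) * h0                    # mirror high-pass filter
    g0, g1 = h0, -h1                           # synthesis filters

    T = 0.5 * (np.convolve(h0, g0) + np.convolve(h1, g1))
    A = 0.5 * (np.convolve(((-1.0) ** n) * h0, g0) +
               np.convolve(((-1.0) ** n) * h1, g1))

    print("distortion T(z):", T)   # [0. 1. 0.] -> a pure one-sample delay
    print("aliasing   A(z):", A)   # all zeros -> aliasing cancelled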
Gain-adaptive trained transform trellis code for images
Dong-Youn Kim, William A. Pearlman
There exists a transform trellis code that is optimal for stationary Gaussian sources and the squared-error distortion measure at all rates. In this paper, we train an asymptotically optimal version of such a code to obtain one that is better matched to the statistics of real-world data. The training algorithm uses the M-algorithm to search the trellis codebook and the LBG algorithm to update it. To adapt the codebook to the varying input data, we use two gain-adaptive methods. Gain-adaptive scheme 1, which normalizes each input block by its gain factor, is applied to images at a rate of 0.5 bits/pixel. When each block is encoded at the same rate, the nonstationarity among the block variances leads to a variation in the resulting distortion from one block to another. To alleviate this non-uniformity across the encoded image, we design four clusters from the block power, in which each cluster has its own trellis codebook and a different rate. The rate of each cluster is assigned by requiring a constant per-letter distortion. This gain-adaptive scheme 2 produces good visual and measurable quality at low rates.
DS-DCT: the double-shift DCT image coding for low-bit-rate image transmission
Defu Cai, Yan-Ping Chen
Traditional block transform coding schemes generate the artifact called blocking effects near block boundaries, which degrades low-bit-rate coded images. In this paper, the Double Shift DCT (DS-DCT) image coding scheme is presented. The pixels in the spatial domain and the coefficients in the transform domain are shifted according to an equal-pitch principle before the DCT transformation. As a result, the blocking effects are reduced to a very low level. At the same time, the energy within subblocks is more compacted, so that the benefit of higher data compression accompanies the reduced blocking effects. The DS-DCT results in smaller image reconstruction errors, better signal-to-noise ratio and more data compression than the adaptive DCT image coding scheme of Chen [1].
Efficient error-resilient codes for sparse image coefficients
Niann Tsyk Cheng, Nick G. Kingsbury
In many source and data compression schemes, information relating to the positions of high energy samples or areas of importance often needs to be relayed to the decoder. The error resilient positional code (ERPC) is an efficient fixed-rate coding scheme for encoding such positional information, or equivalently, sparse binary data patterns. It has also been designed with good robustness to channel errors: its performance degrades gracefully with worsening channel conditions, without the possibility of breakdown or loss of sync. In this paper, the coding efficiency of the ERPC is compared to a few other standard schemes; as well as being efficient, its error extension is shown to be low and non-catastrophic. The ERPC is then applied to an efficient, error-robust adaptive image coding example based on an SBC/VQ codec capable of operating in harsh channel conditions without the aid of channel coding.
Optimum quantization for subband coders
Luc Vandendorpe, Benoit M. M. Macq
Subband coding of images, which can be seen as an extension of orthogonal transformations, was introduced by Woods and O'Neil [8]. In [8], the subbands are further encoded by DPCM. In this study, it is proposed to use subband coding as a stand-alone coding technique. The emphasis is put on the question of subband quantization. A methodology is proposed in which the quantization step of each subband is chosen so as to minimize a quantization noise power weighted by a sensitivity function of the eye.
Statistical dependence between orientation filter outputs used in a human-vision-based image code
Bernhard Wegmann, Christoph Zetzsche
We present an image coding scheme based on the properties of the early stages of the human visual system. The image signal is decomposed via even and odd symmetric, frequency and orientation selective band-pass filters in analogy to the quadrature phase simple cell pairs in the visual cortex. The resulting analytic signal is transformed into a local amplitude and local phase representation in order to achieve a better match to its signal statistics. Both intra-filter dependencies of the analytic signal and inter-filter dependencies between different orientation filters are exploited by a suitable vector quantization scheme. Inter-orientation filter dependencies are demonstrated by means of a statistical evaluation of the multidimensional probability density function. The results can be seen as an empirical confirmation of the suitability of vector quantization in subband coding. Instead of generating a code book by use of a conventional design algorithm, we suggest a feature-specific partitioning of the multidimensional signal space matched to the properties of human vision. Using this coding scheme, satisfactory image quality can be obtained with about 0.78 bit/pixel.
Coding gains of pyramid structures in progressive image transmission
Seop Hyeong Park, Sang Uk Lee
In this paper, we present two basic results on pyramid coding for progressive image transmission. We first show that in a hierarchical pyramid structure, transform coding can provide a gain over a simple scalar quantizer; the optimum bit allocation strategy among the Laplacian pyramid levels for transform coding is also described. Secondly, based on the use of transform coding, we show that, especially at low bit rates, a hierarchical pyramid structure provides a significant gain over encoding the input image directly. This is particularly attractive for progressive transmission using a hierarchical pyramid structure.
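For readers unfamiliar with the structure referred to above, the following is a minimal sketch of building and exactly reconstructing a Laplacian pyramid, assuming a simple box blur and factor-2 decimation in place of the (unspecified) filters used by the authors.

    import numpy as np

    # Minimal Laplacian pyramid sketch (box blur + factor-2 decimation are
    # illustrative choices; the paper's filters are not specified here).
    def blur(img):
        p = np.pad(img, 1, mode='edge').astype(float)
        return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] +
                p[1:-1, 2:] + p[1:-1, 1:-1]) / 5.0

    def down(img):
        return blur(img)[::2, ::2]

    def up(img, shape):
        big = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
        return blur(big[:shape[0], :shape[1]])

    def build_pyramid(img, levels=3):
        gauss, lap = img.astype(float), []
        for _ in range(levels):
            small = down(gauss)
            lap.append(gauss - up(small, gauss.shape))  # band-pass residual
            gauss = small
        lap.append(gauss)                               # coarsest level
        return lap

    def reconstruct(lap):
        img = lap[-1]
        for residual in reversed(lap[:-1]):
            img = residual + up(img, residual.shape)
        return img

    x = np.random.rand(64, 64)
    assert np.allclose(reconstruct(build_pyramid(x)), x)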
Image data compression using hybrid POLA-VQ technique
Ching-Long Lee, Rong-Hauh Ju, Tsann-Shyong Liu, et al.
Vector Quantization (VQ) is a well-known technique for reducing the transmission bit rate or storage requirements in image coding. It exploits the high correlation and spatial redundancy between neighboring pixels to minimize the mean square error distortion. In this paper, we present a simple and efficient image coding method called hybrid predictive ordering linear approximation and VQ (POLA-VQ). Taking the previously received scan line as a reference, the pixels of the present scan line are placed in decreasing order of amplitude. The regression line of the ordered pixels is estimated. The linear approximation is refined by computing a one-bit error pattern for the scan line and by selecting an optimal codevector from the codebook. We found that acceptable coded-image quality at low bit rates can be achieved with our approach.
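A minimal sketch of the predictive-ordering and linear-approximation step described above, assuming (as an illustration) that the ordering of the current scan line is taken from the amplitude ranking of the previously received line and that the ordered samples are then approximated by a least-squares regression line; the one-bit error pattern and VQ refinement stage are omitted.

    import numpy as np

    def pola_approximation(prev_line, curr_line):
        """Predictive-ordering linear approximation of one scan line.

        The decoder knows `prev_line`, so the ordering (by decreasing
        amplitude of the previous line) can be reproduced without side
        information; only the two regression coefficients need be sent.
        This is an illustrative sketch of the idea, not the full POLA-VQ
        codec."""
        order = np.argsort(-np.asarray(prev_line))       # decreasing amplitude
        ordered = np.asarray(curr_line, dtype=float)[order]
        t = np.arange(len(ordered))
        slope, intercept = np.polyfit(t, ordered, 1)     # regression line
        approx_ordered = slope * t + intercept
        approx = np.empty_like(approx_ordered)
        approx[order] = approx_ordered                   # undo the ordering
        return approx, (slope, intercept)

    prev = np.array([10, 80, 200, 190, 60, 20])
    curr = np.array([12, 85, 210, 185, 55, 25])
    approx, coeffs = pola_approximation(prev, curr)
    print(np.round(approx), coeffs)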
Edge/Boundary Detection
Object contours and boundaries in color images
Remi Ronfard
This paper illustrates the practical use of deformable contour models in the context of multi-spectral image segmentation. A discussion of color boundary energies is made, and an algorithm (Boundary Search) is presented that adaptively fits a contour line through locally optimized boundary points.
Attributed tree data structure for representing the descriptions of object contours in images
Z. Ren, Walter Ameling, Peter F. Jensch
The shape of medical objects, such as human bones or skeletons, conveys information about the object's anatomic structures and their pathological states. The sectional contours of the objects, which are observable in computer tomograms, carry descriptive clues for analyzing shape. To build and represent descriptions of contours, one should derive stable structural elements and their sensitive attributes from the contours. Sectional contours of medical objects are highly curved and complex; therefore, they should be described in a higher-order parameter space rather than just a linear one. These considerations have guided us to examine the scale-space transformation, and to develop methods for expressing, describing, and extracting structures in the scale-space images of the contours. In this paper we introduce a tree-based data structure for representing descriptions of structures in scale space, and further demonstrate its application to shape analysis.
Two design techniques for 2-D FIR LoG filters
Pierre Siohan, Danielle Pele, Valery Ouvrard
One of the most popular techniques for edge detection is based on the convolution of the image with a LoG operator, namely the Laplacian of Gaussian, characterized by a space constant
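For reference, the sketch below samples the standard closed-form Laplacian-of-Gaussian kernel for a given space constant sigma; it shows the operator itself, not the FIR design techniques proposed in the paper.

    import numpy as np

    def log_kernel(sigma, size=None):
        """Sampled Laplacian-of-Gaussian kernel with space constant sigma.
        (Standard closed form; size defaults to about 3 sigma each side.)"""
        if size is None:
            size = 2 * int(np.ceil(3 * sigma)) + 1
        r = np.arange(size) - size // 2
        x, y = np.meshgrid(r, r)
        s2 = (x ** 2 + y ** 2) / (2.0 * sigma ** 2)
        kernel = -(1.0 / (np.pi * sigma ** 4)) * (1.0 - s2) * np.exp(-s2)
        return kernel - kernel.mean()            # enforce zero DC response

    k = log_kernel(sigma=2.0)
    print(k.shape, abs(k.sum()) < 1e-12)         # (13, 13) True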
Adaptable edge quality metric
Robin N. Strickland, Dunkai K. Chang
A new quality metric for evaluating edges detected by digital image processing algorithms is presented. The metric is a weighted sum of measures of edge continuity, smoothness, thinness, localization, detection, and noisiness. Through a training process, we can design weights which optimize the metric for different users and applications. We have used the metric to compare the results of ten edge detectors applied to edges degraded by varying degrees of blur and varying degrees and types of noise. As expected, the more nearly optimal Difference-of-Gaussians (DOG) and Haralick methods outperform the simpler gradient detectors. At high signal-to-noise ratios (SNRs), Haralick's method is the best choice, although it exhibits a sudden drop in performance at lower SNRs. The DOG filter's performance degrades almost linearly with SNR and maintains a reasonably high level at lower SNRs. The same relative performances are observed as blur is varied. For most of the detectors tested, performance drops with increasing noise correlation; noise correlated in the same direction as the edge is the most destructive of the noise types tested.
Neuromorphology of Biological Vision I
Neuromorphology of biological vision: a basis for machine vision
Madan M. Gupta
A machine that can see, perceive and recognize images and objects will be an integral part of future autonomous, intelligent robotic systems for applications in industrial automation, remote sensing, medical imaging, and space exploration [1-10]. Basic research on and development of such a machine pose a scientific challenge to both the academic and industrial communities.
Color-subspace-based color-coordinate system
Jussi P. S. Parkkinen, Jarmo Hallikainen, Timo Jaeaeskelaeinen
Color is a sensation normally produced by light. For reasonable handling of the spectrum reflected from the surface of a sample, one needs a finite-dimensional coordinate system. There are several different bases and frameworks for defining color coordinate systems. One goal is to find a uniform coordinate system which can also emulate human color vision mechanisms. In this report we describe the color-subspace color coordinate system, which is shown to be an adaptive and accurate coordinate system for different uses. The subspace approach to the human color vision system is also presented. This model is based on the Young-Helmholtz theory, with opponent colors modelling color preprocessing at the LGN level. It is shown that this approach allows an adaptive way to define a color coordinate system for many different purposes. The proposed coordinate system gives a unified view of the color definition and color representation problem.
GRUPO: a 3-D structure recognition system
We have developed a system, Generalized cylinder Recognition Using Perceptual Organization (GRUPO), that performs model-based recognition of the projections of generalized cylinders. Motivated by psychological theory, the approach uses perceptual organization, the grouping of structurally significant features, to limit the object and viewpoint search spaces in recognition. The system receives feature data from a segmentation based on perceptual organization and ranks the object space according to estimates of conditional object probabilities. Depth information is not used in the approach. To complete the recognition system, several problems were solved. For modeling, theoretical contributions include a proof for the invariance of discontinuities to projection, a method to find the axis of symmetry [1], and a technique for determining self-occlusion. For the recognition process, solutions to search administration, feature matching, probabilistic search of the object space, and final template matching have been developed. The theory has been implemented and tested on synthetic data.
Neuromorphology of Biological Vision II
Global stability in nonlinear lateral inhibition
Gerard F. McLean, M. Ed Jernigan
Pinter's model of lateral inhibition using nonlinear local interactions adapts to local signal characteristics to perform range compression and edge enhancement. The model does this with no forced choice or thresholding, making it suitable for implementation as a generalised image preprocessor. However, the global behaviour of the model is problematic, as its nonlinear and recurrent structure demands that iterative solution procedures be used. Using such techniques, the model proposed by Pinter has been shown to be extremely sensitive to parameter specifications, often providing an unstable signal response. In this paper, we investigate the structure of alternatives to the Pinter model which include locally adaptive gain functions. We show that the alternative models display similar adaptive behaviour and decreased sensitivity to gain selection compared to the original Pinter model.
Dynamic neural network for visual memory
Madan M. Gupta, George K. Knopf
A dynamic neural network with neural computing units that exhibit hysteresis phenomena is proposed as a mechanism for visual memory. The neural network, named the PN-processor, is loosely based on a mathematical theory proposed by Wilson and Cowan to describe the functional dynamics of cortical nervous tissue. The individual neural computing units of the network are programmed to exhibit localized hysteresis phenomena. This neural network structure is capable of storing visual information without physical changes to its synaptic connections. External stimuli move the network's neural activity around a high-dimensional phase space of state attractors until the overall response is stabilized. Once stabilized, the response remains unperturbed by weak or familiar stimuli and is changed only by a sufficiently strong new input. In this paper we briefly describe several aspects of this type of visual memory.
Binocular fusion inferences in a log-polar decision space
Norman C. Griswold, Nasser Kehtarnavaz
In the technical discipline of computer vision, the concept of using stereo cameras for depth perception has been motivated by the fact that, in human vision, one percept can arise from two retinal images as a result of the process of "fusion" in the visual cortex. If this concept is to be applied to computer-controlled machines, it must be assumed that knowledge of the psychological process is sufficient to emulate, at least in a weak sense, the human sensory phenomenon called vision. Typically, models for this emulation process have been computationally intensive. They contained or required massive amounts of data and thus resulted in very time-consuming analytic solutions for depth or range information. The visual process, however complex, does not operate independently of other brain functions. For instance, the superior colliculus of the brain is used to generate information about visual field motion and many visual reflexes. The superior colliculus is also a multi-modal sensory area which includes auditory and somesthetic information [1]. This visual system-brain function connection must be kept in mind when one is attempting to utilize general visual system models to control manipulative processes in order to accomplish a task. In this paper we investigate the utilization of binocular fusion to determine range and heading angle for the specialized control task of guidance for an autonomous vehicle in a convoy following application. It is therefore necessary to develop the vision model, to define the decision space and to use multi-modal information when appropriate. Both parallel and convergent vision models have been utilized for this task. A comparison of calibration results, the mathematical advantages of a log-polar decision space and the resulting real-time structure for controlling a vehicle in a convoy mode are given from actual experimental evaluation.
Dense color stereo
John Raymond Jordan III, Alan Conrad Bovik
Motivated by the observation that chromatic (color) information is a salient property of surfaces in many natural scenes, we investigate the use of chromatic information in dense stereo correspondence - a topic which has never been investigated. In this regard, the chromatic photometric constraint, which is used to specify a mathematical optimality criterion for solving the dense stereo correspondence problem, is developed. The result is a theoretical construction for developing dense stereo correspondence algorithms which use chromatic information. The efficacy of using chromatic information via this construction is tested by developing a new dense stereo algorithm — the Dense Color Stereo Algorithm. The results of applying intensity and chromatic versions of the Dense Color Stereo Algorithm to several stereo image pairs demonstrate that the use of chromatic information can significantly improve the performance of dense stereo correspondence. These results complement our prior studies of the utility of chromatic information in edge-based stereo correspondence, where it was also shown to play a significant role.
Image invariance with changes in distance: the effect of a nonuniform visual system
Eli Peli, Jian Yang, Robert B. Goldstein
Invariant perception of objects is desirable. Contrast constancy assures invariant appearance of suprathreshold image features as they change their distance from the observer. Fully robust size invariance also requires equal contrast thresholds across all spatial frequencies and eccentricities so that near-threshold image features do not appear or disappear with distance changes. This clearly is not the case, since contrast thresholds increase exponentially with eccentricity. We showed that a less stringent constraint actually may be realized. Angular size and eccentricity of image features covary with distance changes. Thus the threshold requirement for invariance could be approximately satisfied if contrast thresholds were to vary as the product of spatial frequency and eccentricity from the fovea. Measurements of observers' orientation discrimination contrast thresholds fit this model well over spatial frequencies of 1 - 16 cycles/degree and for retinal eccentricities up to 23 degrees. Measurements of observers' contrast detection thresholds from three different studies provided an even better fit to this model over even wider spatial frequency and retinal eccentricity ranges. The fitting variable, the fundamental eccentricity constant, was similar for all three studies (0.036, 0.036, 0.030, respectively). The eccentricity constant for the orientation discrimination thresholds was higher (0.048 and 0.050 for two observers, respectively). We simulated the appearance of images with a nonuniform visual system by applying the proper threshold at each eccentricity and spatial frequency. The images exhibited only small changes over a simulated 4-octave distance range. However, the change in simulated appearance over the same distance range was dramatic for patients with central visual field loss. The changes of appearance across the image as a function of eccentricity were much smaller than in previous simulations, which used data derived from visual cortex anatomy rather than direct measurements of visual function. Our model provides a new tool for analyzing the visibility of displays and for designing displays of equal or varied visibility.
Image Sequence Coding II
Coding of moving video at 1 Mbit/s: movies on CD
Bernd Huertgen, Michael Gilge, Wolfgang Guse
A coder concept for coding digital television sequences at 1 Mbit/s is presented. Starting from the standard hybrid codec for videophone at 64 kbit/s, an advanced concept has been investigated which considers the special properties of real-world image sequences such as unrestricted motion, zoom/pan, changes of illumination and scene cuts. These effects, which cannot be considered for videophone applications because of the extremely low available data rate, make it necessary to extend the standard 64 kbit/s hybrid coding scheme. To reduce the amount of data, the original image sequence in CCIR Rec. 601 format is spatially and temporally subsampled. The resulting sequence is in accordance with the Common Intermediate Format (CIF) but consists of 25 noninterlaced frames/second. Motion estimation and compensation are improved by applying an algorithm for illumination correction. Additionally, zoom and pan, which are widely used in common TV scenes, are compensated by processing the motion vector field of the previous motion estimation. Scene cuts are detected and handled by intraframe coding of the next frame. In addition, every 10th frame is intraframe coded to allow fast forward and reverse searching to start at any arbitrary point within the sequence.
Vector quantization with 3-D gradient motion compensation
Choon Lee, Morton Nadler
Several new methods for vector quantization of image sequences are developed in this paper. The 3-D gradient operator [1] and a newly developed stochastic gradient operator were used to estimate the motion vector in the present frame on a pixel-by-pixel basis. Two new methods of block motion estimation using the pixel motion vectors were studied: the methods either use the pixel motion vectors directly on the moving block or calculate a block motion vector from the pixel motion vectors. For both methods, the differences between the prediction block and the moving block are vector quantized. The gradual deterioration of the image frames due to incorrect matching of code vectors and difference blocks could be suppressed with the stochastic gradient operator and a simple smoothing filter for the prediction values. Good quality images for video conferencing applications could be obtained at a bit rate in the neighborhood of 0.15.
Two-layers constant-quality video coding for ATM environments
Fernando Manuel Ber Pereira, Lorenzo Masera
The near-future Broadband Integrated Services Digital Network (B-ISDN) represents a new milestone in telecommunications, allowing service integration and simplifying the global communications structure. Since one of the most important services in the B-ISDN will be video communications, it is essential to identify the characteristics of the video coding schemes which exploit, with the best performance, the new network capabilities. This paper presents some ideas and results related to one interesting coding scheme suited to the B-ISDN: the two-layer coding scheme.
Image sequence representation using polar-separable filters
T. George Campbell, Todd Randall Reed, Murat Kunt
An orthogonal transform using polar-separable filters is introduced. The conditions for the perfect reconstruction property to hold are discussed. A method for constructing orthogonal pairs of directional operators is derived. An example of a two-band orthogonal perfect-reconstruction fan filter pair is presented, and an orthogonal directional four-band filter bank is shown.
Image Sequence Coding III
Pyramidal encoding for packet video transmission
Luis Salgado, Alberto Sanz
Pyramidal encoding schemes are proposed as an approximation to a subband coding approach in a packet-switched transmission environment. In this paper, the method is introduced and justified, extending the classic approach to pyramidal encoding with a pruning mechanism which reduces the information to be coded substantially (by roughly 3:1) with a very limited quality loss. The different parameters associated with the coding structure are studied, in particular the behaviour of the input buffer in which the subbands are assembled and where important delays have to be accounted for. Results are supplied showing the most meaningful design implications that have been found. A detailed description of the simulator which has been developed is also given, explaining in detail how the asynchrony of the overall process is handled.
Image Sequence Coding II
Visual model weighted DCT vector quantization for variable bit-rate video coding
Fabio Lavagetto, Sandro Zappatore
In this paper a code-bit allocation policy is presented for the vector quantization of video data, based on the frequency response of the Human Visual System (HVS). The Discrete Cosine Transform is employed for data decorrelation, and a frame-adaptive vector quantization mechanism is subsequently applied to encode the DCT coefficients. The HVS response is taken into account in the code-bit allocation by sizing the reconstruction look-up table of each vector quantizer. Before vector quantization, the DCT coefficients are zig-zag reordered and thresholded; the quantized coefficients are subsequently run-length encoded. The resulting coder has the basic characteristics of being block-oriented, layered and variable bit rate (VBR): promising results in applications within packet-switched networks are foreseen. Preliminary results are presented.
Encoding of sign language image sequences at very low rate
Chang-Lin Huang, Chih Hung Wu
Sign language is a language used by the hearing impaired. This report offers a means of visual communication over a very low bandwidth communication network (i.e., telephone lines), whose purpose is to provide visual telecommunication among the deaf community. We develop a system which consists of edge detection, binarization, vectorization, inter-frame correspondence, component trajectory finding, and hybrid encoding. This system is able to transmit a sequence of sign language images at higher resolution and a lower transmission rate.
Real-time facial action image synthesis system driven by speech and text
Shigeo Morishima, Kiyoharu Aizawa, Hiroshi Harashima
Automatic facial motion image synthesis schemes and a real-time system design are presented. The purpose of these schemes is to realize an intelligent human-machine interface or intelligent communication system with talking head images. The human face is reconstructed with a 3D surface model and a texture mapping technique on the terminal display. Facial motion images are synthesized naturally by transformation of the lattice points on wire frames. Two types of motion drive methods, text-to-image conversion and speech-to-image conversion, are proposed in this paper. In the former, the synthesized head can speak given texts naturally; in the latter, mouth and jaw motions can be synthesized in time with the speech signal of the speaker behind it. These schemes were implemented on a parallel image computer, and a real-time image synthesizer could output facial motion images to the display at video rate.
Image modeling for digital TV codecs
This paper describes source models of bit rate for digital TV codecs based on the Discrete Cosine Transform as a decorrelation technique. The models are based on cyclostationary processes, whose properties are preserved over the whole TV transmission chain. The influence of image statistics on the coders, the transmission networks and the decoders is therefore presented. Emphasis is given to variable bit rate codecs transmitting over ATM networks.
Video signal processing using vector median
Kai Oistamo, Yrjo A. Neuvo
In the vector median approach, the samples of the vector-valued input signal are processed as vectors, as opposed to componentwise scalar processing. VM-type filters utilize the correlation between different components of the video signal, which makes them attractive in video signal processing. In this paper the performance of vector median filters for color video signals is investigated. Vector median FIR median hybrid (VFMH) filters for cross-luma and cross-color cancellation are introduced. In the VFMH filter, the vector median operation is combined with linear substructures, resulting in improved cross-color and cross-luma attenuation and very good noise attenuation.
Image analysis for face modeling and facial image reconstruction
Hiroshi Agawa, Gang Xu, Yoshio Nagashima, et al.
We have studied a stereo-based approach to three-dimensional face modeling and the reconstruction of facial images as virtually viewed from different angles. This paper describes the system, especially the image analysis and facial shape feature extraction techniques, which use information about the color and position of the face and face components together with image histogram and line segment analysis. Using these techniques, the system can extract the facial features precisely, automatically, and independently of facial image size and face tilting. In our system, input images viewed from the front and side of the face are processed as follows: the input images are first transformed into a set of color pictures with significant features. Regions are segmented by thresholding or slicing after analyzing the histograms of the pictures. Using knowledge about the color and positions of the face, face and hair regions are obtained and facial boundaries extracted. Feature points along the obtained profile are extracted using information about curvature amplitude and sign, and knowledge about the distances between the feature points. In the facial areas which include facial components, regions are again segmented by the same techniques with color information from each face component. The component regions are recognized using knowledge of facial component position. In each region, the pictures are filtered with various differential operators, which are selected according to each picture and region. Thinned images are obtained from the filtered images by various image processing and line segment analysis techniques. Then, feature points of the front and side views are extracted. Finally, the size and position differences and facial tilting between the two input images are compensated for by matching the common feature points in the two views. Thus, the three-dimensional data of the feature points and the boundaries of the face are acquired. Two base face models, representing a typical Japanese man and woman, are prepared, and the model of the same sex is modified with 3D data from the extracted feature points and boundaries in a linear manner. The images, as virtually viewed from different angles, are reconstructed by mapping facial texture onto the modified model.
General motion estimation and segmentation
Siu-Fan Wu, Josef Kittler
An algorithm is proposed to estimate the general motion of multiple moving objects in an image sequence. The general motion is described by a general motion model which is specified by a number of parameters called motion parameters. By estimating the individual motion of the objects, segmentation according to motion is achieved at the same time. The algorithm works directly on an image sequence without using any higher-level information such as corners and edges. Furthermore, it is not necessary to go through the estimation of optical flow as an intermediate step, avoiding the error caused by estimating optical flow.
Adaptive algorithms for pel-recursive displacement estimation
Lilla Boroczky, Johannes Nicolaas Driessen, Jan Biemond
In this paper, adaptive algorithms for pel-recursive displacement estimation are introduced. The proposed algorithms are similar in form to the original Wiener-based algorithm; the extensions are the appropriate tuning of the so-called damping parameter and the use of a linear search technique. The proposed techniques maintain the order of complexity of the original Wiener-based displacement estimator, but they are more robust and exhibit a higher rate of convergence. The adaptation of the damping parameter in a Total Least Squares sense improves the performance of the estimator with only a modest increase in computational complexity. The application of a bisection linear search technique is the most effective extension, but also the most computationally intensive.
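To make the role of the damping parameter concrete, the sketch below implements a generic Wiener-style pel-recursive iteration: displaced frame differences collected in a small window are linearized with the spatial gradients of the previous frame, and the damped normal equations are solved at each step. The window size, the fixed damping value mu and the bilinear interpolation are our own simplifications for illustration, not the authors' adaptive algorithm.

    import numpy as np

    def bilinear(img, y, x):
        """Bilinear sample of img at the non-integer location (y, x)."""
        y0, x0 = int(np.floor(y)), int(np.floor(x))
        fy, fx = y - y0, x - x0
        return ((1 - fy) * (1 - fx) * img[y0, x0] + (1 - fy) * fx * img[y0, x0 + 1] +
                fy * (1 - fx) * img[y0 + 1, x0] + fy * fx * img[y0 + 1, x0 + 1])

    def pel_recursive(prev, curr, y, x, d0=(0.0, 0.0), mu=0.1, iters=10, half=2):
        """Generic damped (Wiener-style) pel-recursive displacement estimate
        at pixel (y, x); mu plays the role of the damping parameter."""
        gy_img, gx_img = np.gradient(prev.astype(float))
        d = np.array(d0, dtype=float)
        offsets = [(oy, ox) for oy in range(-half, half + 1)
                            for ox in range(-half, half + 1)]
        for _ in range(iters):
            G = np.array([[bilinear(gy_img, y + oy - d[0], x + ox - d[1]),
                           bilinear(gx_img, y + oy - d[0], x + ox - d[1])]
                          for oy, ox in offsets])
            e = np.array([curr[y + oy, x + ox] -
                          bilinear(prev, y + oy - d[0], x + ox - d[1])
                          for oy, ox in offsets])                     # displaced frame differences
            d -= np.linalg.solve(G.T @ G + mu * np.eye(2), G.T @ e)   # damped update
        return d

    yy, xx = np.mgrid[0:64, 0:64]
    prev = np.sin(2 * np.pi * xx / 16.0) + np.cos(2 * np.pi * yy / 12.0)
    curr = np.roll(prev, shift=(1, 2), axis=(0, 1))        # true displacement (dy, dx) = (1, 2)
    print(np.round(pel_recursive(prev, curr, 31, 34), 2))  # close to [1. 2.]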
Distributed detection methods for displacement estimation
Serafim N. Efstratiadis, Aggelos K. Katsaggelos
In this paper, a distributed detection approach to displacement estimation in image sequences is presented. The method is derived from a Bayesian framework and reduces to an M-ary hypothesis test among a representative set of possible displacement vectors. It is shown that the mean-squared-error-based block-matching (BM) algorithm is a special case of this general approach. In our approach, at each point of the current frame a set of overlapping localized detectors outputs a number of estimates for the displacement vector. A distributed detection network is then adopted for the fusion of these estimates. Since the computational load is high, suboptimal but computationally efficient solutions are proposed. The above method gives a more accurate estimate of the displacement field and is shown to be more robust in the presence of occlusion and noise than the BM algorithm. Experimental results on video-conference image sequences are presented.
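Since the paper uses mean-squared-error block matching as its baseline, a minimal exhaustive-search block-matching sketch is given below; the block size and search range are arbitrary illustration values.

    import numpy as np

    def block_match(prev, curr, y, x, block=8, search=4):
        """Exhaustive-search block matching: return the (dy, dx) offset into
        the previous frame whose block best matches (minimum MSE) the
        current-frame block at (y, x)."""
        target = curr[y:y + block, x:x + block].astype(float)
        best, best_mse = (0, 0), np.inf
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                py, px = y + dy, x + dx
                if py < 0 or px < 0 or py + block > prev.shape[0] or px + block > prev.shape[1]:
                    continue                           # candidate falls outside the frame
                cand = prev[py:py + block, px:px + block].astype(float)
                mse = np.mean((target - cand) ** 2)
                if mse < best_mse:
                    best, best_mse = (dy, dx), mse
        return best

    prev = np.random.rand(64, 64)
    curr = np.roll(prev, shift=(-2, 3), axis=(0, 1))   # curr[y, x] = prev[y + 2, x - 3]
    print(block_match(prev, curr, 24, 24))             # -> (2, -3)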
Segmentation/Classification
Surface defect detection using adaptive image modeling
This paper deals with surface defect detection. The approach investigated here attempts to detect grey level as well as texture defects. The defects are regarded as being characterized by abrupt, local and unpredictable changes in an image. On the other hand, a defect-free surface is assumed to be regular and homogeneous, possibly with smooth and slow variation in its features. The detection is based on adaptive image modeling. It is shown that a classical autoregressive model is not really suitable for this detection problem, and two modified models are therefore proposed. Their advantage lies in taking into account the grey level value as well as the texture information. The model adaptation can be stated as a joint optimization problem with a constraint. Two different algorithms are defined and tested: the first performs the adaptation in a recursive way while enforcing the constraint at each step, whereas the second imposes the constraint only at the optimum. The performance of each algorithm is assessed with a statistical test using synthetic images. Finally, it is shown how this adaptive modeling technique can be applied to practical defect detection problems. Several cases are presented and discussed.
Thresholding three-dimensional image
Yujin Zhang, Jan J. Gerbrands, Eric Backer
In this paper, we present a general methodology for extending different 2-D thresholding techniques to meet 3-D requirements. Several typical thresholding techniques have been implemented in their 3-D forms. The extended techniques have been tested using 3-D synthetic images and the results are evaluated. Problems related to the extension and improvements made in the implementation are also discussed.
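As one concrete example of the kind of 2-D-to-3-D extension discussed, the sketch below applies Otsu's global threshold selection to an entire voxel volume rather than a single slice. Otsu's method is used here purely as a familiar stand-in, since the abstract does not say which techniques were extended.

    import numpy as np

    def otsu_threshold_3d(volume, bins=256):
        """Global Otsu threshold computed over a whole 3-D volume.
        The only change from the 2-D case is that the histogram is
        accumulated over all voxels instead of the pixels of one slice."""
        hist, edges = np.histogram(volume.ravel(), bins=bins)
        p = hist.astype(float) / hist.sum()
        centers = 0.5 * (edges[:-1] + edges[1:])
        w0 = np.cumsum(p)                              # class probabilities
        w1 = 1.0 - w0
        m0 = np.cumsum(p * centers) / np.maximum(w0, 1e-12)
        m1 = (np.sum(p * centers) - np.cumsum(p * centers)) / np.maximum(w1, 1e-12)
        between = w0 * w1 * (m0 - m1) ** 2             # between-class variance
        return centers[np.argmax(between)]

    # Synthetic volume: dark background with a brighter noisy cube inside.
    vol = np.random.normal(0.2, 0.05, size=(32, 32, 32))
    vol[8:24, 8:24, 8:24] += 0.5
    t = otsu_threshold_3d(vol)
    print(round(float(t), 2), (vol > t).mean())  # threshold between the two modes; ~1/8 of voxels above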
Edge point/region cooperative segmentation specific to 3-D scene reconstruction application
Patrick J. Bonnin, Bertrand Zavidovique
This paper describes the method and the most recently implemented modules within the framework of an ongoing study in the field of 3D indoor/outdoor scene reconstruction in man-made environments. It deals more specifically with a cooperative segmentation methodology, very much application driven, and designed from a rough geometrical modeling of such scenes as sets of polyhedra. First, the principles of the segmentation methodology are given. Then, the information transfers between the two kinds of feature extractors, region (relating to homogeneity properties) and edge (relating to local difference properties), are pointed out; they allow an easy merge of the different types of data. Finally, the advantages of this method compared to other classical ones are underlined.
Sequential classification into m multivariate populations using the information based on small samples
In this paper, the classification problem is considered when the alternative distributions have given functional forms but with unspecified parameters. No systematic attempt seems to have been made to offer solutions for small samples. Here a way of eliminating nuisance parameters from the classification problem is proposed, based on a statistic transforming the original observations, the transformation being chosen so that the distribution of this statistic does not depend on the nuisance parameters. The price paid for the elimination of the nuisance parameters is the fact that instead of the original n observations we are left with n1 (n1
Texture
Adaptive classification of textured images using moments and autoregressive models
Levon Sukissian, Andreas Tirakis, Stefanos D. Kollias
An adaptive approach to the classification of textured images is presented, based on the extraction of appropriate features from images. Autoregressive linear prediction models, as well as moments of images, are the features examined and compared in the paper. Classification is achieved in an adaptive way, using an artificial feedforward neural network which is trained by examples, using an efficient variant of the backpropagation learning algorithm. It is also shown that an adaptive least squares estimation algorithm can be appropriately interwoven with the network, resulting in an on-line adaptive classification scheme. Simulation results are given which illustrate the performance of the presented method.
Classification of textures in aerial images
Gerard Brunet, Jean Devars
A method is presented here for texture segmentation in aerial images, using expert knowledge for the selection of an automatically adapted and optimized set of references. The software is modular and can be combined with a segmentation based on a connected-components labeling algorithm working on closed contours. The system is versatile, and applications concerning composite textures have shown that the method is reliable.
Frequency and orientation selective texture measures using linear symmetry and Laplacian pyramid
An efficient new method for computing texture features based on dominant local orientation is introduced. The features are computed as the Laplacian pyramid is built up. At each level of the Laplacian pyramid, the linear symmetry feature is computed. This feature is anisotropic and estimates the optimal local orientation in the Least Square Error (LSE) sense. This corresponds to the orientation interpolation obtained from the filter responses of a polar Gabor decomposition of the local image. The linear symmetry feature consists of two components: the local orientation estimate and its confidence measure based on the error. Since the latter is a measure of evidence for the existence of a definite direction, the linear symmetry feature computed as proposed has the property of orientation selectivity within each frequency channel. The algorithm is based on convolutions with simple separable filters and pixel-wise non-linear arithmetic operations. These properties allow highly parallel implementation, for example on a pyramid machine, yielding real-time applications. Experiments based on test images of natural textures are presented.
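A common way to realize this kind of least-squares orientation estimate is to average the double-angle representation of the image gradient over a neighborhood: the argument of the averaged complex number gives the dominant gradient orientation (the texture or stripe direction is perpendicular to it) and its magnitude serves as a confidence value. The sketch below follows that standard construction; the box averaging window and finite-difference gradients are our own simplifications, not the separable filters of the paper.

    import numpy as np

    def linear_symmetry(img, half=3):
        """Dominant local orientation and a confidence value, computed by
        averaging the double-angle representation of the image gradient
        over a (2*half+1)^2 box window (borders wrap for simplicity)."""
        gy, gx = np.gradient(img.astype(float))
        z = (gx + 1j * gy) ** 2                       # double-angle representation
        z_avg = np.zeros_like(z)
        for oy in range(-half, half + 1):
            for ox in range(-half, half + 1):
                z_avg += np.roll(np.roll(z, oy, axis=0), ox, axis=1)
        z_avg /= (2 * half + 1) ** 2
        orientation = 0.5 * np.angle(z_avg)           # dominant gradient orientation, modulo pi
        confidence = np.abs(z_avg)                    # high where one orientation dominates
        return orientation, confidence

    yy, xx = np.mgrid[0:64, 0:64]
    stripes = np.sin((xx + yy) / 3.0)                 # stripes at 135 deg, gradients at 45 deg
    ori, conf = linear_symmetry(stripes)
    print(np.degrees(ori[32, 32]).round(1), conf[32, 32].round(3))  # 45.0 and a clearly nonzero confidence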
Texture classification using transform vector quantization
Gerard F. McLean
This paper presents a method for the classification and coding of textures based on the use of transform vector quantization. Techniques for texture classification and vector quantization similarly process small, non-overlapping blocks of image data which are extracted independently from the image. Local spatial frequency features have been identified as appropriate for texture classification, indicating that a transform vector quantization scheme should be capable of characterizing and classifying textured regions. A data set consisting of 7 natural textures is used to demonstrate the utility of this approach. The experimental results show acceptable classification rates and suggest avenues for future research that should yield significant improvements.
Image Restoration
Tutorial review of recent developments in digital image restoration
M. Ibrahim Sezan, A. Murat Tekalp
In this review, we consider the three fundamental aspects of image restoration: (i) modeling, (ii) model identification methods, and (iii) restoration methods. Modeling refers to determining a model of the relationship between the ideal image and the observed degraded image, as well as modeling the ideal image itself, on the basis of a priori information. Model parameters are determined by various identification methods. Restoration algorithms are discussed in two categories: general algorithms and specialized algorithms. We also furnish a brief discussion of present and future research directions.
Maximum-likelihood blur identification
Reginald L. Lagendijk, Jan Biemond
In this paper we discuss the use of maximum likelihood estimation procedures for the identification of unknown blur from a blurred image. The main focus is on the problem of estimating the coefficients of relatively large point-spread functions, and on the estimation of the support size of point-spread functions in general. Two improved blur identification techniques are proposed, both based on the expectation-maximization algorithm. In the first method we describe the point-spread function by a parametric model, while in the second method resolution pyramids are employed to identify the point-spread function in a hierarchical manner.
Optimal constraint parameter estimation for constrained image restoration
Stanley J. Reeves, Russell M. Mersereau
Because of the presence of noise in blurred images, an image restoration algorithm must constrain the solution to achieve stable restoration results. Such constraints are often introduced by biasing the restoration toward the minimizer of a given functional. However, a proper choice of the degree of bias is critical to the success of this approach. Generally, the appropriate bias cannot be chosen a priori and must be estimated from the blurred and noisy image. Cross-validation is introduced as a method for estimating the optimal degree of bias for a general form of the constraint functional. Results show that this constraint is capable of improving restoration results beyond the capabilities of the traditional Tikhonov constraint.
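To make the idea of choosing the degree of bias by cross-validation concrete, the sketch below applies generalized cross-validation (GCV) to pick the Tikhonov regularization weight for a simple 1-D circular deconvolution in the Fourier domain. This is a textbook illustration of cross-validated parameter selection under our own test setup, not the more general constraint functional of the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    # Simple 1-D test problem: smooth signal, circular blur, additive noise.
    N = 256
    t = np.arange(N)
    x = np.sin(2 * np.pi * t / 64) + 0.5 * np.sin(2 * np.pi * t / 17)
    h = np.zeros(N)
    h[:9] = 1.0 / 9.0                                   # 9-tap moving-average blur
    y = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h))) + 0.05 * rng.standard_normal(N)

    Y, H = np.fft.fft(y), np.fft.fft(h)
    H2 = np.abs(H) ** 2

    def gcv(lam):
        """Generalized cross-validation score for the Tikhonov estimator
        X_hat = conj(H) Y / (|H|^2 + lam), evaluated in the Fourier domain."""
        w = lam / (H2 + lam)                            # Fourier multiplier of the residual operator
        resid = np.sum(w ** 2 * np.abs(Y) ** 2) / N     # ||(I - A)y||^2 via Parseval
        return resid / (np.sum(w) / N) ** 2

    lams = np.logspace(-6, 1, 60)
    lam_best = lams[np.argmin([gcv(l) for l in lams])]
    x_hat = np.real(np.fft.ifft(np.conj(H) * Y / (H2 + lam_best)))
    print(lam_best, np.mean((x_hat - x) ** 2))          # chosen weight and reconstruction MSE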
Multiple input adaptive image restoration algorithms
In this paper, image restoration applications in which multiple distorted versions of the same original image are available are considered. A general adaptive restoration algorithm is derived based on a set-theoretic regularization technique. The adaptivity of the algorithm is introduced in two ways: a) by a constraint operator which incorporates properties of the response of the human visual system into the restoration process, and b) by a weight matrix which assigns greater importance in the deconvolution process to areas of high spatial activity than to areas of low spatial activity. Different degrees of trust are assigned to the various distorted images depending on the amount of noise. Experimental results obtained by an iterative implementation of the proposed algorithms are presented.
Robust estimation of local orientations in images using a multiresolution approach
Roland Wilson, S. C. Clippingdale, A. H. Bhalerao
The problem of estimating feature orientation from noisy image data is addressed using a multiresolution technique. It is shown that by an appropriate choice of representation of orientation, it is possible to employ simple linear smoothing methods to reduce estimation noise. The smoothing is done using a combination of scale-space recursive filtering and iterative estimation, giving significant improvements in estimated orientations at low computational cost. Applications to enhancement and segmentation are presented.
Image restoration using biorthogonal wavelet transform
Jean-Michel Bruneau, Michel Barlaud, Pierre Mathieu
A well-known way to solve ill-posed problems in image restoration is to use regularization techniques. The purpose of this paper is to propose a new scheme for digital image restoration based on a regularization method and on the biorthogonal wavelet transform. We show that, in cases where the blur function can be considered a scaling function of a biorthogonal multiresolution analysis, it is possible to obtain an efficient family of regularization operators from the convolution operator alone.
Stochastic model-based approach for simultaneous restoration of multiple misregistered images
Chukka Srinivas, Mandayam D. Srinath
In this paper, the problem of restoration of multiple misregistered, blurred and noisy images is considered. Motivation for this problem comes from the larger problem of construction of a high resolution restored image from the observed set of low-resolution images. The global approach is to decompose the problem into a sequence of sub-problems, namely registration, restoration and interpolation. The restoration problem is addressed here. For simultaneous restoration of the observed images, a parametric stochastic model for the image correlations is derived and used to obtain a linear minimum mean square error (LMMSE) estimate of the original image at each sensor. The resulting restored low-resolution images are integrated to construct a single restored high resolution image. Simulation results are presented to illustrate the performance of the proposed restoration scheme.
Optical methods for iterative image restoration
In this paper we propose a new, totally optical approach to the implementation of iterative algorithms for image restoration. A phase conjugating mirror (PCM) is employed as an amplifying element to implement a coherent optical feedback system capable of performing iterative operations on a spatially modulated input beam. Three systems are suggested and their optical analysis is performed. The results obtained indicate that each system is capable of restoring linearly degraded images.
Digital Image Processing
Stability analysis of multichannel linear-predictive systems
Yusuf Ozturk, Huseyin Abut
In this study we investigate the stability problems observed in multichannel multidimensional linear predictive modeling of images. Morf et al. [3] have shown that, given a positive definite autocorrelation matrix, the singular values of the matrix H = ρq+1 · HERM(ρq+1) must lie inside the unit circle for a stable solution, where ρq+1 is the normalized partial correlation matrix and HERM(.) denotes the Hermitian operator. We have employed this stability criterion to modify the multichannel Levinson algorithm [1,2] so that stable linear prediction coefficients are obtained. Since the procedure involves block-by-block processing of image intensity values, blocks of 32x32 pixels were defined as analysis windows. A two-step stabilization method has been developed for these windows and applied to the multichannel multidimensional linear prediction of monochromatic imagery. The first step is based on heuristic notions and is employed to obtain strictly positive definite multichannel autocorrelation matrices R[q]. The second step forces the singular values of H to reside inside the unit circle, satisfying the stability criterion reported in [3].
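A minimal sketch of the stability test and a naive repair step, assuming the normalized partial correlation matrix is available as a dense array: the singular values of H = ρq+1 · HERM(ρq+1) are checked and ρq+1 is rescaled when the criterion is violated. This is only an illustration, not the paper's two-step stabilization procedure.

    # Sketch: checking the stability criterion of [3] on a multichannel partial
    # correlation (reflection) matrix and rescaling it when the criterion is
    # violated. Illustrative only; not the paper's two-step method.
    import numpy as np

    def stabilize(rho, margin=1e-3):
        """rho: normalized partial correlation matrix of order q+1."""
        H = rho @ rho.conj().T                        # H = rho . HERM(rho)
        smax = np.linalg.svd(H, compute_uv=False).max()
        if smax >= 1.0:                               # singular value outside the unit circle
            rho = rho * np.sqrt((1.0 - margin) / smax)
        return rho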
Group delay equalization of multidimensional recursive filters
Fleur T. Tehrani, Robert E. Ford
An effective optimization algorithm is developed to design multidimensional recursive phase equalizers. The optimization algorithm uses an iterative procedure to minimize the maximum group delay ripple. The criterion used to guarantee the stability of the equalizer is based on the DeCarlo-Strintzis theorem. With this technique, the algorithm can be used in higher dimensions with less intensive computation. Illustrative examples are included to show the effectiveness of the algorithm.
Perceptually relevant model for aliasing in the triplet-stripe filter CCD image sensor
Rob A. Beuker, F. W. Hoeksema
In this paper we present a signal-theoretical model of the aliasing in the triplet stripe filter CCD image sensor. In this sensor, the colour filters are placed in a repetitive pattern of three columns on the monochrome sensor. The projected scene is therefore sampled only every third column, and aliasing can result. We use the signal-theoretical model to predict the visibility of the aliasing.
Gram-Gabor approach to optimal image representation
Moshe Porat, Yehoshua Y. Zeevi
The Gram determinant technique is applied to signal representation by non-orthogonal bases. A special case of image representation in biological and machine vision using Gabor elementary functions (GEFs) is considered. It is shown that, in general, the Gram determinant approach is a better way to compute the expansion coefficients than the one using bi-orthonormal auxiliary functions. An optimal representation by finite sets of coefficients is attained without significant computational effort, and the resulting reconstruction error converges monotonically as basis components are added to the reconstruction set. The Gram approach also appears to be in better accord with biological findings regarding information processing along the visual pathway than the conventional bi-orthogonal scheme.
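The computational core of the Gram approach is a small linear system: with Gram matrix G_ij = <g_i, g_j> and projections b_i = <g_i, f>, the expansion coefficients satisfy Gc = b. The 1-D example below is illustrative only (the atoms, sizes and function names are assumptions); the paper works with 2-D Gabor elementary functions.

    # Sketch: expansion coefficients over a non-orthogonal set via the Gram matrix.
    import numpy as np

    def gabor_atom(n, x0, f0, sigma=8.0):
        x = np.arange(n)
        return np.exp(-((x - x0) ** 2) / (2 * sigma ** 2)) * np.exp(2j * np.pi * f0 * x)

    def expansion_coefficients(f, atoms):
        G = np.array([[np.vdot(gi, gj) for gj in atoms] for gi in atoms])  # Gram matrix
        b = np.array([np.vdot(gi, f) for gi in atoms])                     # projections
        return np.linalg.solve(G, b)

    n = 128
    atoms = [gabor_atom(n, x0, f0) for x0 in (32, 64, 96) for f0 in (0.05, 0.1)]
    f = 0.7 * atoms[0] + 0.3 * atoms[3]            # a signal built from two atoms
    c = expansion_coefficients(f, atoms)           # recovers roughly 0.7 and 0.3
    reconstruction = sum(ci * gi for ci, gi in zip(c, atoms))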
Fast method of geometric picture transformation using logarithmic number systems and its applications for computer graphics
Tomio Kurokawa, Takanari Mizukoshi
Logarithmic arithmetic (LA) is a very fast computational method for real numbers, and its precision is much better than that of floating-point arithmetic of equivalent word length and range. This paper shows a method of fixed-point computation by LA in which numeric conversion is performed before and after the LA computations. It is used to handle the discrete coordinate addresses and pixel intensity data of digital images. Geometric transformation is a typical application of the method. Linear (affine) and nonlinear transformations with three interpolation schemes, nearest neighbor, bilinear and cubic convolution, are demonstrated in LA, processing coordinate addresses and pixel intensities as fixed-point numbers. Experiments with a program on a 16-bit personal computer showed that both quality and speed are surprisingly high, the latter being comparable to that obtained with a floating-point hardware chip. Other applications, such as curve drawing, three-dimensional computer graphics and fractal image generation, also give excellent results.
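The basic mechanism, that multiplication becomes addition once values are held as scaled integer logarithms, can be sketched as follows. The word length, the conversion functions and the example values are assumptions for illustration; the paper's conversion and interpolation details are not reproduced.

    # Sketch: logarithmic-number-system style arithmetic for positive fixed-point
    # values; multiplication reduces to addition of integer log words.
    import numpy as np

    F = 12                                    # fractional bits of the log word

    def to_log(x):
        """Positive value -> integer base-2 logarithm with F fractional bits."""
        return int(np.round(np.log2(x) * (1 << F)))

    def from_log(lx):
        return 2.0 ** (lx / float(1 << F))

    a, b = 181.0, 0.4375                      # e.g. a pixel value and a filter weight
    prod = from_log(to_log(a) + to_log(b))    # multiply by adding log words
    print(prod, a * b)                        # agree to within fixed-point precision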
Space-variant filtering of images through a hybrid implementation of the Wigner distribution function
The 4-D Wigner distribution function (WDF) of 2-D images has been generated, filtered, and inverted through a hybrid optical-digital processor. In this way, different spatially variant filtered images have been obtained. The processor generates optically and sequentially the spatial samples of the distribution. Each of these samples can be modified in a different way by multiplying it by a different mask. The filtered WDF is also inverted optically. A digital image processor then selects the information associated with each filtered image point.
Curved shadow generation by ray tracing in the plane
Alade Tokuta
We present a method for generating shadows in a static environment represented by polygonal or parametric surfaces and illuminated by one or more movable point light sources. Concepts of ray tracing are modified and utilized in shadow determination. Ray-patch intersections are determined by using contour integration rather than purely numerical or subdivision techniques, and thus the expense of long computation is somewhat lessened. Preprocessing steps are used to ensure that the number of such calculations is limited. Shadow pairs are detected, and pairs that cannot interact to produce shadows are detected and discarded. Coherence properties are used to limit the number of rays processed. The approach combines elements of the shadow z-buffer in the generation process and, to a lesser extent, that of projected polygons in scanline rendering as a preprocessing step. The algorithm is easily integrated with a scanline z-buffer algorithm and thus retains the benefits of a z-buffer. Distributed light sources are modelled as arrays of point light sources, and the method allows the computation of umbra-penumbra effects.
Computation network for visible surface reconstruction
Hung-Chang Hsieh, WenThong Chang
A network computation method for the reconstruction of visible surfaces is presented. The surface reconstruction problem stated in this paper is to produce a full surface representation from part of its information. The problem is modeled as the minimization of a variational functional. To transform the problem into the discrete domain, the Rayleigh-Ritz method is introduced. The tensor product method is employed to derive a set of linear equations. A network composed of a number of iteration cells is then proposed to solve this set of equations. During the iteration process of the network, each cell interacts with its neighboring cells. The surface is reconstructed when the iteration converges.
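A minimal sketch of the "network of iteration cells" idea, assuming the simplest membrane-type functional: each cell repeatedly replaces its value by the average of its neighbours, while cells holding data are clamped. The paper's Rayleigh-Ritz and tensor-product derivation of the linear equations is not reproduced here.

    # Sketch: surface reconstruction from sparse samples by local relaxation.
    import numpy as np

    def reconstruct(samples, known, n_iter=2000):
        """samples: 2-D array of data values; known: boolean mask of valid samples."""
        z = np.where(known, samples, samples[known].mean())
        for _ in range(n_iter):
            avg = 0.25 * (np.roll(z, 1, 0) + np.roll(z, -1, 0) +
                          np.roll(z, 1, 1) + np.roll(z, -1, 1))   # neighbouring cells
            z = np.where(known, samples, avg)                      # clamp data cells
        return z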
JPEG/MPEG Algorithms and Implementation
Color image transmission system for prepress with ADCT compression algorithm
Hiroyuki Hasegawa, Masashi Sugiura, Hideyuki Ono, et al.
This paper outlines the compression and transmission system developed for high-quality color image data for prepress. The system achieves highly efficient compression of color image data by using the DCT (Discrete Cosine Transform)-SQ (Scalar Quantization) coding scheme adopted as the basic algorithm of the ISO international standard for color still-picture coding. The system allows selection of a compression ratio in the range 1/2 to 1/50. Evaluations show that picture quality adequate for printing is obtained at compression ratios ranging from 1/5 to 1/10. By using newly developed high-speed CODEC LSIs, the system enables real-time data transmission over 64 kbit/s digital lines.
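A minimal sketch of the DCT-SQ step on a single 8x8 block, with one uniform quantization step standing in for the system's actual tables and rate control, which are not reproduced here.

    # Sketch: 2-D DCT followed by scalar quantization on one 8x8 block.
    import numpy as np
    from scipy.fftpack import dct, idct

    def dct2(b):  return dct(dct(b, axis=0, norm='ortho'), axis=1, norm='ortho')
    def idct2(b): return idct(idct(b, axis=0, norm='ortho'), axis=1, norm='ortho')

    def code_block(block, step=16.0):
        coeffs = dct2(block.astype(float) - 128.0)    # level shift, then 2-D DCT
        return np.round(coeffs / step).astype(int)    # uniform scalar quantization

    def decode_block(q, step=16.0):
        return np.clip(idct2(q * step) + 128.0, 0, 255)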
Design of a multifunction video decoder based on a motion-compensated predictive-interpolative coder
Kun-Min Yang, Sharad Singhal, Didier J. LeGall
It is now possible to encode VCR quality video and stereo audio at only 1.5 Mbit/s. In addition, standards have been defined for compressing full color still images and teleconferencing video at bit rates of 64 to 1920 Kbit/s. Finally there is great interest in the next generation of PCs which will incorporate multimedia displays and have capabilities to edit, store and transmit video and images over communication networks. Although the standards defined for video teleconferencing and those being defined for storage of video and images are different, they still have substantial parts that are common. In this paper, we describe the design of a multi-function decoder that is capable of decoding bit streams from the different encoders. By sharing functional modules that are common to the different algorithms, the decoder can cope with the different standards with only a minimal increase in complexity required over that needed for any one standard. In addition, it allows transparent display of video information coded at different frame rates and using different aspect ratios, thus facilitating exchange of information between NTSC and PAL-based systems as well as film material.
Video coding for digital storage media using hierarchical intraframe scheme
Kazuto Kamikura, Hiroshi Watanabe
A video coding algorithm for digital storage media (DSM) is currently being studied by the Moving Picture Experts Group of ISO. This paper proposes two new techniques for improving the retrieval functions and coding efficiency of DSM. The first is a hierarchical intraframe coding scheme for flexible fast forward/reverse playback mode implementation, which is a requirement peculiar to DSM. The second is a global motion compensation scheme to improve coding efficiency by compensating for camera movement, e.g. pan and zoom, using three parameters. Simulation results show the feasibility of the proposed techniques.
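The paper compensates pan and zoom with three parameters; one plausible parameterization is a single zoom factor plus horizontal and vertical pan offsets, which is what the sketch below assumes. The model, the function name and the resampling choices are assumptions, not taken from the paper.

    # Sketch: global motion compensation under an assumed three-parameter model
    # x' = z*x + px, y' = z*y + py (zoom z, pan px, py).
    import numpy as np
    from scipy.ndimage import map_coordinates

    def global_compensate(reference, z, px, py):
        """Predict the current frame by resampling 'reference' under the model."""
        rows, cols = reference.shape
        y, x = np.mgrid[0:rows, 0:cols].astype(float)
        src_x = z * x + px                    # source position of each pixel
        src_y = z * y + py
        return map_coordinates(reference, [src_y, src_x], order=1, mode='nearest')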
Principal devices and hardware volume estimation for moving picture decoder for digital storage media
Makiko Konoshima, Osamu Kawai, Kiichi Matsuda
The ISO/MPEG has been discussing the establishment of a standard for moving-picture coding to ensure that digital storage media can store picture information efficiently. This paper introduces a trial design for the source decoder of simulation model 2 in the ISO/MPEG video subgroup. The paper describes MPEG simulation model decoding and the Fujitsu video signal processor (VSP-1) and DCT LSI chips. We designed a source decoder using these two LSI chips, memory, and standard logic ICs. The trial design showed that three VSP-1s and one DCT LSI chip are sufficient to build a source decoder.
Vision Science and Technology for Space
Digital image gathering and minimum mean-square error restoration
Stephen K. Park, Stephen E. Reichenbach
Most digital image restoration algorithms are inherently incomplete because they are conditioned on a discrete-input, discrete-output model which only accounts for blurring during image gathering and additive noise. For those restoration applications where sampling and reconstruction (display) are important, the restoration algorithm should be based on a more comprehensive end-to-end model which also accounts for the potentially important noise-like effects of aliasing and the low-pass filtering effects of interpolative reconstruction. In this paper we demonstrate that, although the mathematics of this more comprehensive model is more complex, the increase in complexity is not so great as to prevent a complete development and analysis of the associated minimum mean-square error (Wiener) restoration filter. We also survey recent results related to the important issue of implementing this restoration filter, in the spatial domain, as a computationally efficient small convolution kernel.
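For reference, the classical frequency-domain form of the minimum mean-square error (Wiener) restoration filter under a circulant blur model is sketched below. The end-to-end effects analyzed in the paper (sampling, aliasing, display reconstruction) and its small-kernel spatial-domain implementation are not included.

    # Sketch: classical Wiener restoration with known blur and power spectra.
    import numpy as np

    def wiener_restore(y, h, s_ff, s_nn):
        """y: observed image, h: blur PSF, s_ff/s_nn: signal/noise power spectra."""
        Y = np.fft.fft2(y)
        H = np.fft.fft2(h, s=y.shape)
        W = np.conj(H) * s_ff / (np.abs(H) ** 2 * s_ff + s_nn)
        return np.real(np.fft.ifft2(W * Y))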
Information theoretical assessment of image gathering and coding for digital restoration
Friedrich O. Huck, Sarah John, Stephen E. Reichenbach
In this paper we are concerned with the end-to-end performance of image gathering, coding, and restoration as a whole rather than as a chain of independent tasks. Our approach evolves from the pivotal relationship that exists between the spectral information density of the transmitted signal and the restorability of images from this signal. The information theoretical assessment accounts for the information density and efficiency of the acquired signal as a function of the image-gathering system design and the radiance-field statistics, and for the information efficiency and data compression that can be gained by combining image gathering with coding to reduce the signal redundancy and irrelevancy. The redundancy reduction is concerned mostly with the statistical properties of the acquired signal, and the irrelevancy reduction is concerned mostly with the visual properties of the scene and the restored image. The results of this assessment lead to intuitively appealing insights about image gathering and coding for digital restoration. Foremost is the realization that images can be restored with better quality and from less data as the information efficiency of the transmitted data is increased, provided that the restoration correctly accounts for the image gathering and coding processes and effectively suppresses the image-display degradations. High information efficiency, in turn, can be attained only by minimizing image-gathering degradations as well as signal redundancy. Another important realization is that the critical constraints imposed on both image gathering and natural vision limit the maximum acquired information density to ~4 binary information units (bifs). This information density requires ~5-bit encoding for transmission and recording when lateral inhibition is used to compress the dynamic range of the signal (irrelevancy reduction). This number of encoding levels is close (perhaps fortuitously) to the upper limit of the ~40 intensity levels that each nerve fiber can transmit, via pulses, from the retina to the visual cortex within ~1/20 sec to avoid prolonging reaction times. If the data are digitally restored as an image on film for 'best' visual quality, then the information density may often be reduced to ~3 bifs or even less, depending on the scene, without incurring perceptual degradations because of the practical limitations that are imposed on the restoration. These limitations are not likely to be found in the nervous system of human beings, so that the higher information density of ~4 bifs that the eye can acquire probably contributes effectively to the improvement in visual quality that we always experience when we view a scene directly rather than through the media of image gathering and restoration.
Image coding by edge primitives
Rachel Alter-Gartenberg, Ramkumar Narayanswamy
The visual system is most sensitive to structure. Moreover, many other attributes of the scene are preserved in the retinal response to edges. In this paper we present a new coding process that is based on models suggested for retinal processing in human vision. This process extracts and codes only features that are preserved by the response of these filters to an edge (edge primitives). The decoded image, obtained by recovering the intensity levels between the outlined boundaries, using only the edge primitives, attains high structural fidelity. We demonstrate that a wide variety of images can be represented by their edge primitives with high compression ratios, typically two to three orders of magnitude, depending on the target. This method is particularly advantageous when high structural fidelity representation must be combined with high data compression.
Photon detection with parallel-asynchronous processing
Darryl D. Coon, A. G. Unil Perera
A new approach to photon detection with a parallel asynchronous signal processor is described. The visible or infrared photon detection capability of the silicon p+-n-n+ detectors and the parallel asynchronous processing are addressed separately. This approach would permit an independent analog processing channel to be dedicated to every pixel. A laminar architecture consisting of a stack of planar arrays of the devices would form a two-dimensional array processor with a 2-D array of inputs located directly behind a focal plane detector array. A 2-D image data stream would propagate in neuronlike asynchronous pulse coded form through the laminar processor. Such systems would integrate image acquisition and image processing. Acquisition and processing would be performed concurrently as in natural vision systems. The possibility of multispectral image processing is addressed.
Vision-based planetary rover navigation
Brian H. Wilcox
Future missions to the moon or Mars will use planetary rovers for exploration or construction tasks. Operation of these rovers as unmanned robotic vehicles with some form of semi-autonomous control is desirable to reduce the cost and increase the capability and safety of many types of missions. A program to develop semi-autonomous navigation (SAN) was initiated in late 1988 at the Jet Propulsion Laboratory (JPL) under sponsorship of the Office of Aeronautics and Exploration Technology of the National Aeronautics and Space Administration (NASA). A testbed vehicle with all necessary power, sensing, and computing resources on-board to demonstrate SAN has been developed and initial testing has been completed. Vision-based navigation techniques used on this vehicle and planned for implementation are described.
Expert imaging system
George B. Westrom
One of the most impressive features of the human visual system is its innate ability to extract only the most important visual information and to disregard that which is insignificant or superfluous. In this paper we introduce an Expert Imaging System (EIS) concept that emulates some qualities of human vision and can be used for remote sensing on Mars or for Space Station activities.
Hybrid image processing
A combination of digital and optical image processing techniques makes the best use of each processing domain: Fourier optics offers speed, while digital processing offers algorithmic complexity. One connection between the domains is digital image operations that produce shift invariance to image changes of a priori known character. Our video-rate image coordinate transformation is an enabling technology for real-time hybrid image pattern recognition. We outline some of the techniques and devices that are unique to the Johnson Space Center program.
Pattern Recognition
Retrieval of script information appearing on bank checks for automatic reading purposes
J. Christophe Salome, Manuel Leroux, Herve Oiry, et al.
Within the framework of the project aiming at the automatic processing of postal cheques in French Post financial establishments, the French Post Technical Research Department (SRTP) has started studying the automatic reading of cursive script amounts on cheques. We present the total processing sequence and in particular the part related to the retrieval of script information on the cheque. Indeed, pre-printed information (guiding lines, slashes, acronyms, typed information) and script information, which is the one to be identified, coexist on the cheque. The problem consists in retrieving the information to be processed from the cheque without damaging that information.
Use of discrete-state Markov process for Chinese-character recognition
Bor-Shenn Jeng, Chun-Hsi Shih, San-Wei Sun, et al.
In this paper, an intelligent optical Chinese character recognition system using a discrete-state Markov process is developed to solve the input problem of Chinese characters. The doubly stochastic process encodes the distortion and similarity among patterns of a class through a stochastic modeling and evaluation approach. A simple feature, the histogram of projected profiles, is employed in our experiments. Each character class is modeled by a 7-state Markov process, and the recognition rate for 5401 multi-font character classes reaches 97% on average for the inside test and 92% on average for the outside test.
Underground radar system utilizing pattern-recognition technique in the frequency domain
Yuji Nagashima, Jun-ichi Masuda, Ryosuke Arioka, et al.
This paper describes a new signal processing technique to improve the detection accuracy of an underground radar system (pulse radar). A remarkable characteristic of this new signal processing is the extraction of a target echo using frequency-domain information. The observation signal is divided into several segments, which are transformed into the corresponding frequency region for evaluation of the spectrum distribution and computation of the frequency parameter values for that distribution. The target signal can be extracted by comparing the observed parameter values with the reference frequency parameter values of target signals. This technique clearly discriminates the targets from other objects.
Rotation- and translation-invariant pattern recognition based on distance transformations
Juergen Moeschen, Bedrich J. Hosticka
In this paper a statistical approach to pattern recognition, based on a distance transformation, is presented. The algorithm enables multiclass rotation- and translation-invariant pattern recognition and can optionally be made scale invariant. Besides the theory of the algorithm, we present some practical application examples and propose a system architecture for a high-speed hardware implementation including an ASIC architecture.
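To illustrate the role of the distance transformation, the sketch below scores a set of template edge points against an image edge map with a chamfer-style average distance. It is only an illustration; the paper's statistical, rotation- and translation-invariant formulation is not reproduced.

    # Sketch: chamfer-style matching score based on a Euclidean distance transform.
    import numpy as np
    from scipy.ndimage import distance_transform_edt

    def chamfer_score(image_edges, template_points):
        """image_edges: boolean edge map; template_points: (N, 2) integer (row, col)."""
        dist = distance_transform_edt(~image_edges)    # distance to nearest edge pixel
        r, c = template_points[:, 0], template_points[:, 1]
        return dist[r, c].mean()                       # small score = good fit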
Description and matching of density variation for personal identification through facial images
Kazushige Takahashi, Tsuyoshi Sakaguchi, Toshi Minami, et al.
Footprint image processing expert system with friendly user interface
Yasuyuki Hattori, Toshi Minami, Osamu Nakamura
This paper describes an image processing expert system for footprint images used in criminal investigations, and the user interface to this system. The purpose of the system is to increase the efficiency of footprint image processing through improved footprint image quality and automatic footprint pattern extraction. In footprint image processing, the most difficult issue is to carry through the complex processing sequence required by the variety of pickup methods, which depend on the condition of the pickup place, the various footprint patterns, noise and so on. Because the system automatically constructs usable processing sequences for input images by using knowledge of image processing and has a friendly user interface, not only police investigators but also general police officers can carry out image processing easily. For pattern extraction, a practical processing sequence for various footprint images was constructed through simulation on 37 footprint images, and the usefulness of the expert system was ascertained experimentally.
Extraction of arbitrary shapes from a noisy binary image using pseudo view field tracer
Tomoharu Nagao, Takeshi Agui, Masayuki Nakajima
We propose a new method to extract arbitrary shapes, such as lines, circles, ellipses and other complex shapes, from noisy binary images. In this method, shapes in a given binary image are traced by a tracer named PVFT (Pseudo View Field Tracer). The movement of the PVFT is similar to that of the view field of a person who recognizes arbitrary shapes with a restricted view field. That is, the PVFT selects line segments of a shape in a noisy image and traces them. Moreover, the movement of the PVFT is controlled according to the shape to be extracted.
2-D invariant color pattern recognition using complex log-mapping transform
Soo-Chang Pei, Min-Jor Hsu
This paper describes a complex log mapping approach to 2-D color object recognition. First, a 3-D vector color image is transformed into a 2-D vector image by projection onto a color plane; then the complex log mapping transform is applied to this 2-D vector color image for invariant pattern recognition. Extensive computer simulations show that this method can effectively recognize a scale- and rotation-changed color object among similarly shaped objects of different colors, as well as among objects of different shapes.
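A minimal sketch of the complex logarithmic (log-polar) mapping itself, which turns rotation and scaling about the image centre into translations of the mapped image. The sampling grid and function name are assumptions; the paper's projection of 3-D colour vectors onto a colour plane is omitted.

    # Sketch: complex log (log-polar) mapping of a single-channel image.
    import numpy as np
    from scipy.ndimage import map_coordinates

    def complex_log_map(image, n_r=64, n_theta=128):
        rows, cols = image.shape
        cy, cx = (rows - 1) / 2.0, (cols - 1) / 2.0
        u = np.linspace(0.0, np.log(min(cy, cx)), n_r)               # log-radius axis
        v = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)   # angle axis
        r, theta = np.meshgrid(np.exp(u), v, indexing='ij')
        src_y = cy + r * np.sin(theta)
        src_x = cx + r * np.cos(theta)
        return map_coordinates(image, [src_y, src_x], order=1, mode='nearest')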
Image Sequence Coding III
Asynchronous conditional-replenishment video codec for 64-kbits/s channels
Jose Manuel Menendez, Carlos Munoz
A low-cost, low-bit-rate video codec for the transmission of traffic images over 64 kbit/s channels is presented. To minimize hardware complexity and data volume, a predictive coding structure with conditional replenishment has been selected. Input pictures are tessellated into 2-D blocks and compared with a reference picture, and only those blocks with significant differences are transmitted.
Fast codebook search algorithm in vector quantization
Jordi Huguet, Luis Torres
We present a simple but effective algorithm to accelerate the encoding process in a vector quantization scheme when an MSE criterion is used. A considerable reduction in the number of operations is achieved. The algorithm was first designed for image vector quantization, in which the samples of the image signal (pixels) are positive, although it can be used with any positive-negative signal with only a minor modification.
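One widely used way to cut the MSE codebook search short is partial distortion elimination: stop accumulating a codeword's distortion as soon as it exceeds the best distortion found so far. The sketch below shows that standard acceleration only as context; it is not necessarily the algorithm proposed in the paper.

    # Sketch: nearest-codeword search with partial distortion elimination.
    import numpy as np

    def nearest_codeword(x, codebook):
        best_index, best_dist = 0, np.inf
        for i, c in enumerate(codebook):
            d, rejected = 0.0, False
            for xj, cj in zip(x, c):
                d += (xj - cj) ** 2
                if d >= best_dist:          # partial sum already exceeds the best
                    rejected = True
                    break
            if not rejected:
                best_index, best_dist = i, d
        return best_index, best_dist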
Digital HDTV compression at 44 Mbps using parallel motion-compensated transform coders
Hsueh-Ming Hang, Riccardo Leonardi, Barry G. Haskell, et al.
High Definition Television (HDTV) promises to offer wide-screen pictures of much better quality than today's television. However, without compression a digital HDTV channel may require up to one Gbit/s of transmission bandwidth. We suggest a parallel processing structure, using the proposed international standard for visual telephony (CCITT Px64 kbit/s standard) as processing elements, to compress digital HDTV pictures. The basic idea is to partition an HDTV picture into smaller sub-pictures and then compress each sub-picture using a CCITT Px64 kbit/s coder, which with today's technology is cost-effective only on small pictures. Since each sub-picture is processed by an independent coder, without coordination these coded sub-pictures may have unequal picture quality. To maintain a uniform-quality HDTV picture, the following two issues are studied: (1) the sub-channel control strategy (bits allocated to each sub-picture), and (2) the quantization and buffer control strategy for each individual sub-picture coder. Algorithms to resolve these problems and their computer simulations are presented.
Hybrid DCT encoding of TV and HDTV: a comparative study
Narciso N. Garcia, Jose Ignacio Ronda, Fernando Jaureguizar
The different nature of the TV and HDTV signals, which hold dissimilar features from both the statistical and perceptual points of view, implies that the same behavior cannot be expected from the encoding procedure. Here, a comparison is shown of the performance of a hybrid DCT bit-rate compression system when applied to information of each type. The study is carried out taking as a basis the statistics of the output data which result from the processing of corresponding HDTV and TV sequences, and it is described by several sets of data, relevant at different stages of the encoding process, such as DCT-domain energy distributions, quality versus bit-rate curves, and final symbol statistics.
Development of video teleconference system
Yoshiaki Kobayashi, Kentaro Toudo
Broadband transmission has become feasible at virtually any location through the use of optical fiber cables deployed in various modes, such as optical fibers running in the overhead ground wires of power transmission lines, and overhead or underground optical fiber cables running in parallel with power transmission and distribution lines. Advances in such infrastructures have brought about greater needs for video transmission, above all video information switching and transmission. Although research and development on broadband ISDN (B-ISDN) is progressing in many organizations, solutions to the associated cost and application problems must be sought to satisfy immediate needs. This environment called for the development of a video information transmission and switching system as an alternative to B-ISDN. This paper presents a video teleconference system developed in the form of a video switching network system.
Motion-compensated adaptive interframe/intraframe prediction
Kan Xie, Luc Van Eycken, Andre J. Oosterlinck
Motion-compensated (MC) interframe prediction, which has been used widely in low bit-rate television coding, is an efficient method to reduce the temporal redundancy in a sequence of television signals. However, if there is violent or complicated motion in the scene, as in broadcast television, the performance of MC interframe prediction degrades seriously while intraframe prediction performs better. An adaptive prediction scheme will therefore provide a lower and more stable transmission rate than simple MC interframe prediction or intraframe prediction alone. In this paper, a motion-compensated adaptive inter/intraframe prediction scheme based on the block-based motion estimation algorithm we proposed earlier [11] is presented. Adaptivity is achieved block-wise by comparing the prediction errors of the MC interframe predictor and the intraframe predictor; the predictor which gives the smaller error is chosen as the actual predictor. To further compress the transmission bit rate, a variable-length encoding is designed to encode the prediction errors and the motion information according to their actual distributions. Simulation results show that this adaptive prediction scheme is very efficient for different kinds of pictures containing slow to very violent motion, and a high compression ratio can be achieved with very good picture quality.
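The block-wise mode decision can be sketched as follows, using a crude mean-value intraframe predictor as a stand-in; the actual predictors, thresholds and variable-length coding of the paper are not reproduced.

    # Sketch: block-wise adaptive inter/intra predictor selection by comparing
    # squared prediction errors.
    import numpy as np

    def choose_mode(current_block, mc_prediction):
        inter_err = current_block.astype(float) - mc_prediction
        intra_pred = current_block.mean()              # stand-in intraframe predictor
        intra_err = current_block.astype(float) - intra_pred
        if np.sum(inter_err ** 2) <= np.sum(intra_err ** 2):
            return 'inter', inter_err
        return 'intra', intra_err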
Analysis of a 2-D DCT image coding scheme with motion compensation and vector quantization
Romualdo Picco, Fabio Luigi Bellifemine, Antonio Chimienti
A new effective adaptive coding scheme for image data compression is presented. It is based on a reduction of the spatial, temporal and statistical redundancy. An interframe motion compensation technique was used to take advantage of the temporal correlation of the images. A unitary 2D-DCT was performed on each 8x8 pixel block in order to exploit spatial correlation, to decorrelate the block elements and to make the relative quantization error subjectively less disturbing. The rate-distortion function of a Laplacian source was used as a classification strategy, relating a block to a coding bit rate in accordance with its energy. The a.c. transform coefficients were vector quantized using a pyramidal vector quantizer based on the cubic lattice. Compression ratios in the range 8 to 12 give distribution quality for colour image sequences according to CCIR Rec. 601 (12-20 Mbit/s) or for HDTV sequences (50-70 Mbit/s). Experimental results obtained by computer simulation are presented, and the influence of the motion compensation and the vector quantization on the objective error is analysed.
Image Sequence Coding II
Control analysis of video packet loss in ATM networks
Duan-Shin Lee, Kou-Hu Tzou, San-Qi Li
In this paper we study the video packet loss due to excessive queueing delay in a single statistical multiplexer. Because of the real-time nature of video service, packets exceeding a time constraint are declared lost at the destination. Any packet arriving during a period when the queue length exceeds the threshold determined by the time constraint will be dropped at the destination; thus, packet losses occur in clusters. We measure the quality of the received pictures by the expected underload period and the expected number of high-priority arrivals during an overload period. The former quantity measures the frequency of packet dropping due to excessive delay, while the latter is an indicator of the picture area affected. We analyze and compare two system schemes: the first drops late packets only at the destination, and the second blocks arrivals in front of the multiplexer once the packets exceed the permissible delay. Comparison of the two schemes based on the two measures mentioned above indicates that the second scheme is superior to the first. In order to further improve the video service quality, a simple congestion control based on the dynamics of the queue length is proposed. Our analysis shows that the proposed control scheme significantly extends the expected underload period.
Pattern Recognition
Zak-Gabor representation of images
Izidor Gertner, Yehoshua Y. Zeevi
A mathematical approach to image analysis and synthesis in the combined frequency- position space is presented. The formalism is based on the Weil map - Zak transform, which provides one of the most fundamental tools for studies of nonstationary processes and images as such. Algorithms suitable for computation of Gabor expansion coefficients are presented.
JPEG/MPEG Algorithms and Implementation
Comparing motion-interpolation structures for video coding
Atul Puri, R. Aravind
This paper considers the topic of video compression at bit-rates around 1 Mbps for digital storage/playback applications. The backbone algorithm employed is the well-known motion-compensated prediction-error coder. This approach is considerably enhanced by the incorporation of conditional motion-compensated interpolation (CMCI), where some of the video frames are encoded with prediction-error coding and the remainder are encoded with motion-compensated interpolation and interpolation-error coding. The advantage of this combined technique is that interpolation errors can be coded with much fewer bits than prediction errors for similar reconstructed-picture quality. We consider two different techniques for estimating motion in CMCI- coded frames. In order to facilitate direct interaction with the bit-stream, the input sequence is divided into groups such that one frame in each group is intra-coded. The arrangement of predictive- and CMCI-coded frames can be varied; a study of these arrangements is the major focus of this paper. Structures with one-, two- and four-frame interpolation are compared. Our results indicate that one-frame interpolation is inferior to two-frame interpolation on complex test material. Four-frame interpolation is comparable in performance to two-frame interpolation as long as the motion is compensable.
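The arrangements under study can be sketched as frame-type patterns within a group whose first frame is intra-coded. The I/P/B labels below follow common MPEG usage, and the mapping of the spacing parameter to the paper's one-, two- and four-frame structures is an assumption made only for illustration.

    # Sketch: frame-type pattern of one group of pictures; m is the spacing between
    # anchor frames, so m-1 frames between anchors are coded by interpolation.
    def gop_pattern(group_size=12, m=2):
        pattern = []
        for k in range(group_size):
            if k == 0:
                pattern.append('I')        # intra-coded entry point of the group
            elif k % m == 0:
                pattern.append('P')        # prediction-error coded
            else:
                pattern.append('B')        # interpolation-error coded
        return ''.join(pattern)

    # gop_pattern(12, 1) -> 'IPPPPPPPPPPP'
    # gop_pattern(12, 2) -> 'IBPBPBPBPBPB'
    # gop_pattern(12, 4) -> 'IBBBPBBBPBBB'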
Hierarchical Image Coding
Image representation using binary space partitioning trees
Hayder Radha, Riccardo Leonardi, Bruce Naylor, et al.
Representation of two and three-dimensional objects by tree structures has been used extensively in solid modeling, computer graphics, computer vision and image processing. (See for example [Mantyla] [Chen] [Hunter] [Rosenfeld] [Leonardi].) Quadtrees, which are used to represent objects in 2-D space, and octrees, which are the extension of quadtrees in 3-D space, have been studied thoroughly for applications in graphics and image processing.
JPEG/MPEG Algorithms and Implementation
Encoding of motion video sequences for the MPEG environment using arithmetic coding
Eric Viscito, Cesar A. Gonzales
In this paper, we describe a motion-compensated hybrid DCT/DPCM algorithm suitable for compressing motion video sequences at 1-1.5 Mbits/second. The algorithm is compatible with the currently emerging MPEG video compression standard with one exception: the entropy coding of the DCT coefficients, motion vector data and all side information is accomplished via arithmetic coding rather than Huffman coding. The algorithm can be made completely MPEG-compatible with the addition of an arithmetic-Huffman transcoder. Experimental results demonstrate the effect of the proposed changes on the compression efficiency.