Proceedings Volume 1453

Human Vision, Visual Processing, and Digital Display II


View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 1 June 1991
Contents: 5 Sessions, 36 Papers, 0 Presentations
Conference: Electronic Imaging '91, 1991
Volume Number: 1453

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
Sessions:
  • Quality of Displayed Information
  • Perceptual Processing of Spatial and Spatio-Temporal Images
  • Model-Based Image Coding, Compression, and Enhancement
  • Biologically Based Machine Vision
  • Machine and Human Color Vision
  • Biologically Based Machine Vision
Quality of Displayed Information
Evaluation of the effect of noise on subjective image quality
Peter G. J. Barten
Basic considerations are given about the effect of static and dynamic noise on the contrast sensitivity of the eye, and a numerical evaluation is made of published data on this subject. The results of this analysis are used to extend the square-root integral method for the evaluation of subjective image quality so that it can also describe the effect of noise. Using this adapted version of the square-root integral, calculations are made for published experiments on pictures with various amounts of white noise. A comparison between measurements and calculations shows that the image quality calculated in this way correlates very well with the measurements.
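For reference, the square-root integral in its commonly cited form (Barten's original, without the noise extension developed in this paper) trades off the display modulation transfer M(u) against the modulation threshold of the eye M_t(u) at each spatial frequency u:

```latex
\mathrm{SQRI} \;=\; \frac{1}{\ln 2}\int_{0}^{u_{\max}}
\sqrt{\frac{M(u)}{M_t(u)}}\;\frac{\mathrm{d}u}{u}
```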
Image quality measurements with a neural brightness perception model
Timothy A. Grogan, Mei Wu
A computational model for the human perception of image brightness has been advanced by Cohen, Grossberg, and Todorovic. The research describes how this model can be used to assess perceived image quality. The implementation of the model is extended to allow the processing of larger images and an increased dynamic range of the gray scale. The model is validated by examining the simulation of some classical brightness perception phenomena including the Herman grid illusion, and the Craik-O'Brien-Cornsweet effect. Results of a comparative evaluation of three halftoning algorithms are offered which indicate that the model is useful for the evaluation of image processing algorithms. Human subjects ranked the quality of the images halftoned with each of three different algorithms at three different viewing distances. Objective measures of the halftoned images were obtained after preprocessing to account for the different viewing distances. The ranking of the objective measures did not correspond to those of the majority of the human observers. However, after processing by the brightness perception model, ranking of the objective measures do correspond with the rankings assigned by human observers.
Subjective evaluation of scale-space image coding
Six experiments are described in which the perceived quality of scale-space-coded color images was assessed by means of numerical category scaling. 'Scale space' is a pyramidal multiresolution image-coding technique in which data reduction is accomplished by quantizing the prediction error signals on different scales of the pyramid. This coding technique has been applied to the luminance as well as the chrominance components of color images of three static complex scenes. The main coding parameters were the degree of uniform quantization and the scale level at which the prediction error was quantized. The results show that the magnitude of impairment due to quantizing the prediction error signal on a given scale of the luminance component depends on the scale level as well as on the content of the scene; that high-resolution color information does not contribute to image quality; that perceptually distinct impairments combine according to a Minkowski metric with an exponent slightly above 2; and that perceptually similar impairments combine according to a Minkowski metric with an exponent slightly above 1. The applicability of Allnatt's 'law of subjective addition' is discussed.
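As a rough illustration of the coding scheme under evaluation, the sketch below implements a toy pyramidal coder in which each scale is predicted from the next-coarser one and only the uniformly quantized prediction error is retained. The quantization step q and the zoom-based resampling are illustrative assumptions, not the paper's filters or parameters.

```python
import numpy as np
from scipy.ndimage import zoom

def scale_space_codec(img: np.ndarray, levels: int = 4, q: float = 8.0):
    """Toy pyramidal coder: each level is predicted by interpolating the
    next-coarser level, and only the uniformly quantized prediction error
    is kept. Works for power-of-two image sizes."""
    pyr = [img.astype(float)]
    for _ in range(levels - 1):
        pyr.append(zoom(pyr[-1], 0.5))       # coarser scale
    recon = pyr[-1]                          # coarsest level sent as-is
    for level in range(levels - 2, -1, -1):
        pred = zoom(recon, 2.0)              # predict from coarser scale
        err_q = q * np.round((pyr[level] - pred) / q)
        recon = pred + err_q                 # decoder adds quantized error
    return recon

img = np.add.outer(np.arange(64.0), np.arange(64.0))
print(np.abs(scale_space_codec(img) - img).max())  # quantization error
```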
Quality factors of handwritten characters based on human visual perception
Takahito Kato, Mitsuho Yamada
Although various handwritten Kanji character recognition techniques have been developed, they are not yet satisfactory. Handwritten characters can vary so much in shape that the recognition rate depends greatly on their quality. It is therefore very useful to know how humans recognize handwritten characters. Accordingly, we made two kinds of evaluations. First, we carried out a subjective experiment to ascertain whether humans use common criteria for judging character quality. Japanese adults were asked to rate character quality on a 5-point scale; the results suggested that the subjects used common criteria. Second, to find the characters' key parts, we analyzed the subjects' eye movements as the quality was evaluated. The experiments revealed that the parts on which the subjects' attention was concentrated were areas of high stroke density, the positioning and spacing of strokes, and character composition. Based on the results of these experiments, we have proposed new measures for stroke, pixel, line width, aspect ratio, and character balance. These measures were shown to be useful factors for determining the quality of handwritten Kanji characters.
Objective evaluation of the feeling of depth in 2-D or 3-D images using the convergence angle of the eyes
Mitsuho Yamada, Nobuyuki Hiruma, Haruo Hoshino
Measurement of the convergence angle of a viewer's eyes can be useful in evaluating the perception of depth while watching, for example, HDTV or 3D stereoscopic pictures. First, a target was moved away from the subject to determine convergence characteristics in a natural scene. The convergence angle changed as the target moved, and the distance calculated from the intersection of the sight lines of both eyes corresponded well to the actual position of the target. Second, polarized binocular stereoscopic pictures were used. The parallax of a CG pattern changed at 5-second intervals toward the foreground or background in steps, from 0 to one of three parallax amplitudes. The convergence distribution widened as the parallax increased, and this spread was large when the target was presented in the foreground. Next, a program prepared stereoscopically was presented in 3D with parallax and in 2D without parallax. The results indicated that convergence varied even when viewing in 2D, and that the convergence distribution was larger in 3D than in 2D. A possible cause of the variation with the 2D picture is the feeling of depth produced by the 2D picture itself.
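The geometry behind the method is simple: for symmetric fixation, the convergence angle follows from the interpupillary distance and the target distance. A minimal sketch, assuming a hypothetical 63 mm interpupillary distance (not a value from the paper):

```python
import math

def convergence_angle_deg(target_distance_m: float, ipd_m: float = 0.063) -> float:
    """Convergence angle of the two sight lines for a target at a given
    distance, assuming symmetric fixation."""
    return math.degrees(2.0 * math.atan(ipd_m / (2.0 * target_distance_m)))

# The angle shrinks rapidly with distance, which is why convergence is most
# informative for near targets.
for d in (0.5, 1.0, 2.0, 5.0):
    print(f"{d:4.1f} m -> {convergence_angle_deg(d):5.2f} deg")
```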
New approach to palette selection for color images
We apply the vector quantization algorithm proposed by Equitz to the problem of efficiently selecting colors for a limited image palette. The algorithm performs the quantization by merging pairwise nearest neighbor (PNN) clusters. Computational efficiency is achieved by using k-dimensional trees to perform fast PNN searches. In order to reduce the number of initial image colors, we first pass the image through a variable-size cubical quantizer. The centroids of the colors that fall in each cell are then used as sample vectors for the merging algorithm. Tremendous computational savings are achieved by this initial step with very little loss in visual quality. To account for the high sensitivity of the human visual system to quantization errors in smoothly varying regions of an image, we incorporate activity measures both at the initial quantization step and at the merging step, so that quantization is fine in smooth regions and coarse in active regions. The resulting images are of high visual quality. The computation times are substantially smaller than those of the iterative Lloyd-Max algorithm and are comparable to a binary splitting algorithm recently proposed by Bouman and Orchard.
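A minimal sketch of the PNN merging step, under illustrative assumptions (the activity measures described in the abstract are omitted, and the counts would come from the cubical pre-quantizer cells):

```python
import numpy as np
from scipy.spatial import cKDTree

def pnn_palette(colors: np.ndarray, counts: np.ndarray, k: int) -> np.ndarray:
    """Greedy pairwise-nearest-neighbor merging down to k palette colors.
    Equitz's merge cost for clusters i, j is
    (n_i * n_j / (n_i + n_j)) * ||c_i - c_j||^2; a k-d tree finds each
    cluster's nearest neighbor. Rebuilding the tree on every merge keeps
    the sketch short but is slower than an incremental search."""
    colors = np.asarray(colors, dtype=float).copy()
    counts = np.asarray(counts, dtype=float).copy()
    while len(colors) > k:
        tree = cKDTree(colors)
        dists, idx = tree.query(colors, k=2)   # idx[:, 1]: nearest other cluster
        nn = idx[:, 1]
        cost = counts * counts[nn] / (counts + counts[nn]) * dists[:, 1] ** 2
        i = int(np.argmin(cost))
        j = int(nn[i])
        w = counts[i] + counts[j]
        colors[i] = (counts[i] * colors[i] + counts[j] * colors[j]) / w
        counts[i] = w
        colors = np.delete(colors, j, axis=0)
        counts = np.delete(counts, j)
    return colors
```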
Alignment and amplification as determinants of expressive color
Nathaniel Jacobson, William J. Bender, Uri Feldman
Color alignment and amplification are explored within the context of a windowed system in a computer workstation. Color alignment allows precise specification of color relatedness while, at the same time, accounting for interaction of surrounding visual field and state of visual adaptation. Alignments represent the basic unit of color experience, and are formed by any two hues in color space. An experiment which utilizes the color alignment model developed by Jacobson and Bender is presented. The experiment measures preference for type of alignment between text color, background color, and highlight color, in a workstation. Results of this experiment provide guidelines for effective selection of colors for window, font, and highlight for any given application within the windowed system. Of particular interest is the incorporation of color alignments into the window manager, so that local colors can be coordinated with global colors to provide a variety of expressive signals.
Minimum resolution for human face detection and identification
Ashok Samal
Our goal is to build an automated system for face recognition. Such a system in a realistic application is likely to hold thousands, possibly millions, of faces. Hence, it is essential to have a compact representation for a face, and an important issue is the minimum spatial and grayscale resolution necessary for a pattern to be detected as a face and then identified. Several experiments were performed to estimate these limits using a collection of 64 faces imaged under very different conditions. All experiments were performed using human observers. The results indicate that there is enough information in 32 x 32 x 4 bpp images for human eyes to detect and identify the faces. Thus an automated system could represent a face using only 512 bytes.
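The 512-byte figure is direct arithmetic on the stated format:

```latex
32 \times 32\ \text{pixels} \times 4\ \text{bits/pixel}
  = 4096\ \text{bits} = 512\ \text{bytes}
```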
Perceptual Processing of Spatial and Spatio-Temporal Images
Oh say, can you see? The physiology of vision
Richard A. Young
How can we see? One answer lies in the receptive fields of visual cells in our eyes and brain. A 'receptive field' in the simplest terms is a map of the regions in space where light can affect a cell's electrical output. Millions of such fields analyze and filter the patterns of light that impinge on the retina. I will illustrate the major anatomical structures and physiological processes underlying such fields, in the primary visual pathway from the eye to the brain. An understanding of such fields is critically important, since their output provides the basis upon which conscious visual perception can eventually be constructed by higher brain processes. That is, perception itself is derived from the information as filtered and analyzed by such fields. Complete spatio-temporal receptive fields of simple cells in the visual cortex of monkeys were recently recorded using white-noise analysis techniques by Dan Pollen and colleagues. I discuss various models of the shapes of these fields (Gaussian derivative, Gabor, edge-and-line-detector, and difference-of-offset-Gaussian) in the context of these new data. The Gaussian derivative model provided the simplest and most concise description of the receptive fields among the models tested. Gaussian derivative machine vision spatio-temporal filters, based upon the biological data, produced robust estimates of the spatial and temporal derivatives of the image. These should prove suitable for form, motion, color, and stereo analysis, using only linear, separable filters or their linear combinations. So a partial answer to 'How can we see?' may be that receptive fields in the early visual system may serve as robust derivative analyzers in space and time.
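A minimal sketch of the Gaussian derivative filter family the abstract favors, in the spatial case only and with a hypothetical sigma; separable row/column convolutions give derivative estimates of the kind described:

```python
import numpy as np
from scipy.ndimage import convolve1d

def gaussian_derivative_1d(sigma: float, order: int) -> np.ndarray:
    """1-D Gaussian (order 0) or Gaussian derivative (order 1 or 2) kernel."""
    r = int(np.ceil(4 * sigma))
    x = np.arange(-r, r + 1, dtype=float)
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    g /= g.sum()
    if order == 0:
        return g
    if order == 1:
        return -x / sigma ** 2 * g
    if order == 2:
        return (x ** 2 - sigma ** 2) / sigma ** 4 * g
    raise ValueError("order must be 0, 1, or 2")

def image_dx(image: np.ndarray, sigma: float = 1.5) -> np.ndarray:
    """Estimate the x-derivative: differentiate along rows, smooth columns."""
    tmp = convolve1d(image.astype(float), gaussian_derivative_1d(sigma, 1), axis=1)
    return convolve1d(tmp, gaussian_derivative_1d(sigma, 0), axis=0)
```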
Packing geometry of human cone photoreceptors: variations with eccentricity and evidence for local anisotropy
Kenneth R. Sloan Jr., Christine A. Curcio
Disorder in the packing geometry of the human cone mosaic is believed to help alleviate spatial aliasing effects. In order to characterize cone packing geometry we gathered positions of cone inner segments at 7 locations along 4 primary and 2 oblique meridians in an adult human retina. We generated statistical descriptors based on the distribution of distances and angles to Voronoi neighbors. Parameters of a compressed-jittered model were fit to the actual mosaic. Local anisotropies were investigated using correlograms. We find that: (1) median distance between Voronoi neighbors increases with eccentricity, but the minimum distance is constant (6-8 micrometers) across peripheral retina; (2) the cone mosaic is most orderly at the edge of the foveal rod-free zone; (3) in periphery, cone spacing is 10-15% less in one direction than in the orthogonal direction; (4) cone spacing is minimal perpendicular to meridians emanating from the foveal center. The nearly constant minimum distance implies that high spatial frequencies may be sampled even in peripheral retina. Local anisotropy of the cone mosaic is qualitatively consistent with the meridional resolution effect previously described for the discrimination of gratings.
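The distance part of such descriptors is straightforward to compute: Voronoi neighbors of a 2-D point set are exactly the pairs joined by Delaunay edges. A sketch with synthetic positions (the paper uses digitized cone centers, and also analyzes neighbor angles, which are omitted here):

```python
import numpy as np
from scipy.spatial import Delaunay

def neighbor_distances(points: np.ndarray) -> np.ndarray:
    """Distances between Voronoi neighbors (= Delaunay edges) of 2-D points."""
    tri = Delaunay(points)
    edges = set()
    for simplex in tri.simplices:
        for a in range(3):
            i, j = sorted((simplex[a], simplex[(a + 1) % 3]))
            edges.add((i, j))
    e = np.array(sorted(edges))
    return np.linalg.norm(points[e[:, 0]] - points[e[:, 1]], axis=1)

pts = np.random.rand(500, 2) * 100.0
d = neighbor_distances(pts)
print(f"median {np.median(d):.2f}, min {d.min():.2f} (same units as input)")
```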
Network compensation for missing sensors
A network learning algorithm is presented that computes interpolation functions that can compensate for weakened, jittered, or missing elements of a sensor array. The algorithm corrects errors in translation invariance, so prior knowledge of the input images is not required.
Static and dynamic spatial resolution in image coding: an investigation of eye movements
Lew B. Stelmach, Wa James Tam, Paul J. Hearty
Large savings in bandwidth can be achieved when an image is displayed at full resolution at the center of gaze, and at lower resolution outside this central area. However, these savings require real-time monitoring of the observer's eye position and real-time processing of the image. Hence, such techniques are limited to a single viewer. It would be useful if a reduction in bandwidth similar to that obtained with a single viewer could be achieved with multiple viewers, and without real-time monitoring of eye movements. It is clear that this technique would be feasible only if different viewers looked at the same part of an image at the same time. In the present research, twenty-four observers viewed 15 forty-five-second clips of NTSC video while their direction of gaze was monitored. The goal of the research was to assess whether information about viewing behavior could be used for image coding. Our analysis of the viewing behavior showed that there was a substantial degree of agreement among viewers in terms of where they looked. We conclude that incorporating information about viewing behavior into video coding schemes may result in appreciable bandwidth savings.
Detecting spatial and temporal dot patterns in noise
Bruce Drum
The visual system can be thought of as an image processor that first reduces the dynamic retinal image to a temporal succession of noisy but redundant arrays of retinal ganglion cell signals and then reconstructs from these signals a stable representation of the external world. The process by which this reconstruction takes place is still poorly understood. An obvious requirement, however, is the capability to reject the noise in the individual neural signals. I am investigating the visual system's noise rejection capabilities by determining how much noise must be added to dot patterns to reduce them to detection threshold. The stimuli are patches of nonrandom dots surrounded by dynamic random dots of the same mean luminance and contrast. The nonrandomness, or coherence, of the stimulus patterns is controlled by randomizing a known percentage of stimulus dots in each frame of the dynamic display. The stimulus patterns can be limited to either spatial or temporal information. In addition to coherence, the size, duration, and retinal location of the stimulus can be varied, as well as the temporal frequency, dot size, contrast, and mean luminance of the entire display. Coherence thresholds are generally elevated by any operation that reduces the number of ganglion cells responding to the stimulus, either by reducing the stimulus area or duration or by limiting the response to a subset of ganglion cells (e.g., the receptive field overlap or response redundancy factor can be reduced by preferentially stimulating only one functional ganglion cell type, or by testing glaucoma patients with partially destroyed ganglion cell layers). The visual system thus appears to reduce noise effects by integrating neural responses that are correlated in either space or time.
Observer performance in dynamic displays: effect of frame rate on visual signal detection in noisy images
James Stuart Whiting, David A. Honig, Edward Carterette, et al.
An observer's ability to detect low-contrast features (signals) within an image is an important measure of image quality. A theory exists for describing the relationship between measurable image parameters and the detectability of simple visual signals such as squares or disks in single images. This signal detection theory has been successfully applied to many practical visual tasks, yielding fundamental relationships between noise, contrast, and detectability for intensifying screen/x-ray film combinations in conventional radiology [2], and for quantization noise [3], image processing [4], and window/level settings [5,6] in digital displays. We are aware of no studies examining signal detectability in dynamically displayed medical images, despite the importance of these displays for many imaging modalities. Examples of dynamic displays in medical imaging include x-ray fluoroscopy, cardiac cineangiography, real-time two-dimensional ultrasound (2D-Echo), rapid-sequence nuclear magnetic resonance imaging (cine MRI), radioisotope ventriculography, and ultrafast computed tomography (UFCT). The goal of the present study was to quantify the psychophysical parameters which affect observer performance in dynamically displayed sequences of noisy images.
Perceiving the coherent movements of spatially separated features
Lyn Mowafy, Joseph S. Lappin
When a partially occluded object is represented in an image, it is defined by a set of spatially separated features that may be registered at different spatial scales. To understand the image, human vision must organize these fragmented optical features into common and distinct object surfaces. Although the common fate of moving features is often considered a primary source of reliable information for image segmentation, little is known of the visual system's capacity to discriminate the coherent relative movements of spatially separated features. In a series of experiments, observers viewed elements whose movements were correlated (in direction and magnitude) or were uncorrelated. Our results indicate that observers can discriminate the two types of movement about as well as they can detect any movement at all. Moreover, the ability to perceive coherent movements is maintained under a variety of conditions, including differences in the elements' spatial frequency content, spatial position and contrast, and temporal phase shifts between the spatially correlated displacements. These results suggest that coherent relative motion may be a fundamental source of information exploited by vision, despite considerable variability in the spatial and temporal characteristics of the individual features.
Model-Based Image Coding, Compression, and Enhancement
"Perfect" displays and "perfect" image compression in space and time
Several topics connecting basic vision research to image compression and image quality are discussed: (1) A battery of about 7 specially chosen simple stimuli should be used to tease apart the multiplicity of factors affecting image quality. (2) A 'perfect' static display must be capable of presenting about 135 bits/min^2. This value is based on the need for 3 pixels/min and 15 bits/pixel. (3) Image compression allows the reduction from 135 to about 20 bits/min^2 for perfect image quality; 20 bits/min^2 is the information capacity of human vision. (4) A presumed weakness of the JPEG standard is that it does not allow for Weber's Law nonuniform quantization. We argue that this is an advantage rather than a weakness. (5) It is suggested that all compression studies should report two numbers separately: the amount of compression achieved from quantization and the amount from redundancy coding. (6) The DCT, wavelet, and viewprint representations are compared. (7) Problems with extending perceptual losslessness to moving stimuli are discussed. Our approach of working with a 'perfect' image on a 'perfect' display with 'perfect' compression is not directly relevant to the present situation with severely limited channel capacity. Rather than studying perceptually lossless compression, we must carry out research to determine what types of lossy transformations are least disturbing to the human observer. Transmission of 'perfect', lossless images will not be practical for many years.
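The figure in point (2) follows from the stated requirements, 3 pixels per minute of arc in each dimension at 15 bits per pixel:

```latex
(3\ \text{pixels/min})^2 \times 15\ \text{bits/pixel}
  = 9 \times 15\ \text{bits/min}^2 = 135\ \text{bits/min}^2
```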
Study on objective fidelity for progressive image transmission systems
Eleanor Lin, Cheng-Chang Lu
With the continuing growth of modern communication technologies, the demand for image transmission is increasing rapidly. Transmitting a high-resolution image over low-bandwidth communication lines usually requires a great amount of time, and user interaction with such a transmission environment can become frustrating. The problem can be eased somewhat by transmitting a series of low-resolution approximations which outline the final image. Progressive image transmission is being applied in interactive image communications that transmit over low-bandwidth channels. The progressive build-up is a way of reordering the coded data prior to transmission, so that a rough approximation appears first, followed by successively more detailed versions of the final image. In this paper, we investigate the objective fidelity of several widely used progressive gray-level image transmission algorithms, including both time- and frequency-domain approaches, and also propose some efficient implementations to improve their performance without increasing computational complexity.
Quantization of color image components in the DCT domain
Heidi A. Peterson, Huei Peng, J. H. Morgan, et al.
Quantitative evaluation of image enhancement algorithms
Hong-Qian Lu
Functional evaluation of image processing algorithms determines how well they process images. The evaluation is therefore closely related to the assessment of image quality. The commonly used metrics are signal-to-noise ratio, mean square error, absolute error, and correlation. Unfortunately, these measures cannot adequately describe the visual quality of a processed image and hence may not properly evaluate the algorithms. This paper presents a new quantitative method for evaluating image enhancement (noise reduction) filters by proposing a new image quality metric. The approach is based on a psycho-visual study of the noise sensitivity of human vision and a study of the performance of enhancement filters. The image quality metric is defined with respect to image intensity changes, called the spatial activity. The functional evaluation of an enhancement filter uses this quality metric and includes quantitative measures of the filter's noise removal ability and the filter-caused distortion. It is shown that, to produce good visual image quality, a filter should have both better noise reduction ability in low spatial activity regions and less distortion in high spatial activity regions.
Converting non-interlaced to interlaced images in YIQ and HSI color spaces
Eric B. Welch, Robert J. Moorhead II, John K. Owens
Directly converting images from the noninterlaced format used by high-resolution monitors to the interlaced NTSC standard can result in several visually disturbing artifacts. One of the most noticeable is a flickering effect visible on sharp horizontal edges, due to the interlace of the NTSC video signal. This flickering can be significantly reduced using an appropriately designed spatial filter that removes abrupt horizontal changes in the image without excessive blurring. Given the availability of application-specific integrated circuits that convert images between different color spaces, it is practical to filter in any of several color spaces. Therefore, our current research compares traditional interfield spatial filtering in the YIQ color space to new spatial filtering techniques developed by the authors using the HSI color space. Results based on an extended version of the CCIR impairment scale are presented. The present experiments are based on renderings of computation grids obtained from the NSF/MSU Engineering Research Center for Computational Field Simulation. The paper also demonstrates how a red fringe occurs in wire grid images with gray-scale backgrounds when these images are spatially filtered in the HSI color space, and a means of correcting this red shift is demonstrated.
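A minimal sketch of the kind of vertical prefilter involved, assuming a hypothetical [1 2 1]/4 kernel applied to a single luminance channel (the paper's filters operate in the YIQ and HSI color spaces):

```python
import numpy as np
from scipy.ndimage import convolve1d

def deflicker(frame: np.ndarray, weights=(0.25, 0.5, 0.25)) -> np.ndarray:
    """Vertical low-pass prefilter applied before interlacing, a common way
    to tame one-line horizontal edges that flicker at the NTSC field rate.
    Stronger kernels blur more; the design problem is the trade-off the
    abstract describes."""
    return convolve1d(frame.astype(float), np.asarray(weights, dtype=float),
                      axis=0)
```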
Model-based halftoning
Thrasyvoulos N. Pappas, David L. Neuhoff
New model-based approaches to digital halftoning are proposed. They are intended primarily for laser printers, which generate 'distortions' such as 'dot overlap'. Conventional methods, such as clustered-dot ordered dither, resist distortions at the expense of spatial and gray-scale resolution. Our methods rely on printer models that predict distortions and, rather than merely resisting them, exploit them to increase gray-scale and spatial resolution. We propose a general framework for printer models, and find a specific model of laser printers. As an example of model-based halftoning we propose a modification of error diffusion, which is often considered the best halftoning method for CRT displays with no significant distortions. The new version exploits the printer model to extend the benefits of error diffusion to printers. Experiments show that it provides high-quality reproductions with reasonable complexity. The quality of printed images obtained using the new technique on a 300 dots/inch printer is comparable to the quality of images obtained with traditional techniques (e.g., 'classical' screening) on a 400 dots/inch printer. Model-based halftoning can be especially useful in transmission of high-quality documents using high-fidelity gray-scale image encoders. As we show in a companion paper, in such cases halftoning is performed at the receiver, just before printing. Apart from coding efficiency, this approach permits the halftoner to be tuned to the individual printer, whose characteristics may vary considerably from those of other printers, for example, write-black vs. write-white laser printers.
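A sketch of the error diffusion baseline with a hook where a printer model would enter. The `perceived` callback is a placeholder assumption, not the paper's model: a real dot-overlap model depends on the neighborhood of printed dots, and the identity default reduces this to conventional Floyd-Steinberg.

```python
import numpy as np

def error_diffusion(gray: np.ndarray, perceived=lambda dot: dot) -> np.ndarray:
    """Floyd-Steinberg error diffusion on a grayscale image in [0, 1].
    In model-based halftoning the error is computed against the gray level
    the printed dot is predicted to yield (dot overlap enlarges it), rather
    than against the ideal binary value."""
    img = gray.astype(float).copy()
    out = np.zeros_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            out[y, x] = 1.0 if img[y, x] >= 0.5 else 0.0
            err = img[y, x] - perceived(out[y, x])
            if x + 1 < w:
                img[y, x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    img[y + 1, x - 1] += err * 3 / 16
                img[y + 1, x] += err * 5 / 16
                if x + 1 < w:
                    img[y + 1, x + 1] += err * 1 / 16
    return out
```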
Biologically Based Machine Vision
Computational model of an integrated vision system
William Uttal, Thomas Shepherd, Robb E. Lovell, et al.
This paper reports progress in the development of an integrated computational model of visual perception. Ultimately, the system is planned to include as many of the stages of perceptual processing as possible. The portions described in this report include a powerful texture segmenter based on the spatial averaging of the outputs of a cluster of fallible texture analyzers, the reconstruction of three-dimensional forms from sparse point samples, and the location-, magnification-, and rotation-invariant recognition of forms based on both their local features and the arrangement of the local features.
Modeling inner and outer plexiform retinal processing using nonlinear coupled resistive networks
Andreas G. Andreou, Kwabena A. Boahen
The retina transduces all visual information that reaches the brain. From an engineering point of view, its function is to reduce the bandwidth required to transmit images to the brain by rejecting irrelevant information. Indeed, the retina is primarily sensitive to temporal and spatial changes in the image, and not to the absolute level of illumination. This preprocessing greatly reduces the size of the optic nerve and makes higher-level processing more effective. However, any process that discards information must necessarily create ambiguities: two different stimuli may evoke the same response, and one stimulus thus creates an illusion of the other. Vision researchers have discovered many illusions. Any model that seeks to account for the behavior of the eye-brain system must explain this large phenomenological database in a unified and biologically plausible fashion. Grossberg has proposed a model that succeeds in the first respect; that is, he provides a unified mechanistic explanation for optical illusions [1]. Grossberg succeeds where others have failed because his model takes into account interactions between the processes that control perception of form and appearance. As it turns out, these interacting processes offset each other's complementary inadequacies, producing emergent properties that cannot be explained by focusing on any one process alone. Grossberg's model has three interacting processes. The first process enhances discontinuities (edges) in the image and, at the same time, discounts the illuminant. This process is implemented using on-cells with lateral inhibitory connections whose outputs resemble those of the retinal bipolar cells. The second process does the actual edge detection; it is realized by three hierarchical layers of cells. The third process smooths variations in brightness using a syncytium of cells between which signals diffuse freely. Afferent inputs produced by the first process (on-cells) are averaged by the third process (syncytium) within boundaries generated by the second process (edge detection) to generate the final brightness percept. In Grossberg's work, results of computer simulations demonstrating the performance of this model for various 1-D and 2-D images are presented. The model gives the correct brightness percepts for several classic illusions, such as brightness constancy, brightness contrast, the Craik-O'Brien-Cornsweet effect, the Koffka-Benussi ring, evenly and unevenly illuminated Mondrians, and more recent illusions such as the Kanizsa-Minguzzi anomalous brightness differentiation. That a simple mechanistic model can explain all these illusions should not be surprising; they are produced by a single (highly evolved) underlying biological structure. This paper describes a physical model [2] which implements the above mechanisms using two resistive networks (grids). The first network forms a spatial average of the input luminance signals, mimicking the retinal horizontal cells. The second network implements the syncytium using nonlinear conductances. The current in these conductances saturates when the voltage across them becomes large, automatically segmenting the image. In the retina, this mechanism is probably mediated by the gap junctions. Our model extends Mahowald and Mead's biologically inspired silicon retina [2] to include inner-plexiform processing.
It is simple and robust, having only three levels and six parameters (which are actual conductances and currents) compared to six levels and over twenty parameters for Grossberg's model. We have simulated our model on a computer (about 400 lines of C code) and used it to duplicate the results of [1] using images with up to 40 x 40 pixels. Brightness percepts produced by the model for various illusions will be presented. Since the model has a simple and regular structure, requiring only nearest-neighbor connections, it can be efficiently implemented in analog VLSI. It should be possible to realize a 200 x 200 pixel retina in a state-of-the-art CMOS process. Of course, the silicon retina will operate in real time; its dynamic properties could be compared with available neurophysiological data. This paper is organized as follows: the new model is presented in the next section (Section 2). In Section 3, we describe the software implementation. Results from the simulations are presented in Section 4. In Section 5, we argue that the syncytium is realized by the amacrine cells in the inner-plexiform layer of the retina and show that the model's predictions are consistent with results from motion experiments. Our concluding remarks are in Section 6.
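A minimal sketch of the saturating-conductance idea behind the second network, in one dimension and with illustrative parameter values (not those of the paper):

```python
import numpy as np

def syncytium_relax(v: np.ndarray, vt: float = 0.05, g: float = 0.25,
                    iters: int = 50) -> np.ndarray:
    """Relaxation of a 1-D resistive grid with saturating link conductances.
    Current through each link is g * vt * tanh(dv / vt): for small voltage
    differences the grid diffuses (smooths within regions), while large
    brightness steps pass only the saturated current, so edges erode far
    more slowly, which is the segmenting behavior described above."""
    v = v.astype(float).copy()
    for _ in range(iters):
        dv = np.diff(v)                        # voltage across each link
        i_link = g * vt * np.tanh(dv / vt)     # saturating link current
        v[:-1] += i_link                       # current into node from right
        v[1:] -= i_link                        # equal current out of neighbor
    return v

step = np.concatenate([np.linspace(0.0, 0.2, 30), np.linspace(0.8, 1.0, 30)])
out = syncytium_relax(step)
print(out[28:32].round(2))                     # the large step survives
```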
New human vision system model for spatio-temporal image signals
Toshikazu Matsui, Shuzo Hirahara
A new mathematical model of the human vision system has been formulated. This model simulates the human vision spatio-temporal responses quantitatively. The physiological characteristics of the human visual nerve system, including the X- and Y-ganglion cells, have been added to a vision model which was formerly proposed for still images. The new model has been applied to flickering sinusoidal waves, and the model's spatio-temporal frequency characteristics agreed well with experimental results.
Color character recognition method based on a model of human visual processing
Kazuo Yamaba, Yoichi Miyake
Color is one of the most important types of visual information in various environments such as space, underwater, and so on. This area of study, however, is still young; in particular, the use of color vision in a color bar-code detecting system is a new area. Neural net models, on the other hand, have been studied for many years in the hope of achieving human-like performance. This paper deals with a color character recognition method based on a model of human visual processing. To realize a higher level of color character recognition, a new apparatus was constructed, consisting mainly of a color CCD camera corresponding to the human eye, an image board, a neuro board, and a microcomputer. In the experiment, ten characters were selected from a normal typewriter, and five colors were used as character and background colors under the illumination of a fluorescent lamp. The neuro board consists of an image pipeline processor, and its network has a three-layer hierarchical structure trained by the back-propagation method. The experiment was carried out employing models of human visual processing. The results show that the system can recognize characters and colors at the same time.
Simulation of parvocellular demultiplexing
Eugenio Martinez-Uriegas
A massively parallel wiring of chromatic and achromatic receptive fields, of the type presumably present at some level in the visual cortex, is simulated starting from pseudo-random L/M cone lattices. It is shown that there is considerable variability in the relative L/M cone input to achromatic processes, but there is also an intrinsic invariant in the chromatic decoding. With the usual caution required when linking simple models derived from first principles to the behavior of complex biological systems, the results are useful for understanding, on one hand, the variability of luminous spectral sensitivity for achromatic visual tasks under different spatial conditions, and on the other, the observed stability of unique yellow across individuals with normal color vision who nevertheless have very different proportions of L to M cones. The results are also consistent with those given by non-random, non-massive demultiplexing examples previously reported.
Vision-based model of artificial texture perception
Anne M. Landraud
A robust operator for natural-texture recognition, which works as a model of visual texture perception and enables scale- and orientation-independent image interpretation, is presented. Visual cell behavior is modeled as a set of frequency- and orientation-selective filters, in accordance with cortical channel characteristics and with signal theory. The bandpasses of our frequency- and orientation-separable filters agree closely with psychophysiological experiments. Such a frequency filter may be any function with the same frequency properties as those of the visual system. Each 2D bandpass filter is constructed by crossing a 1D frequency-selective filter with a 1D orientation-selective filter. The output energies of these 2D filters enable a texture characterization which is especially interesting for a continuous multiresolution representation. This method allows us to compare textures with different orientations and scales by means of a simple translation in the response plane. The advantage of our model is its ability to process continuous variations while avoiding information redundancy.
Mean-field stereo correspondence for natural images
William N. Klarquist, Scott Thomas Acton, Joydeep Ghosh
This paper presents a new cooperative technique for solving the dense stereo correspondence problem in natural images using mean field theory (MFT). Given a gray scale stereo image pair, the disparity map for the scene is modeled as a locally interconnected network of graded neurons. The network encodes the correspondence problem as an energy function composed of terms representing disparity uniqueness, disparity continuity, and system stability evaluated at each neuron. A MFT approximation to the simulated annealing process commonly used to locate the minimum energy solution for the disparity map is introduced and developed. Results using this approach are compared with those from a standard simulated annealing algorithm and demonstrate a significant improvement in rate of convergence with comparable solution quality.
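A generic sketch of deterministic mean-field annealing on a toy 1-D energy, showing how the sigmoid thermal-average update replaces simulated annealing's stochastic accept/reject step; the bias-plus-smoothness energy here is illustrative, not the paper's disparity energy with its uniqueness terms:

```python
import numpy as np

def mft_anneal(bias: np.ndarray, smooth: float = 1.0,
               T0: float = 2.0, steps: int = 60) -> np.ndarray:
    """Mean-field annealing on a chain of graded neurons v in (0, 1) with
    energy E = -sum_i bias_i*v_i - smooth * sum_i v_i*v_{i+1}. Each sweep
    sets every neuron to its thermal average, the sigmoid of its local
    field h_i = bias_i + smooth*(v_{i-1} + v_{i+1}), while the temperature
    decays; simulated annealing would instead sample states at each T."""
    v = np.full_like(bias, 0.5, dtype=float)
    T = T0
    for _ in range(steps):
        field = bias.copy()
        field[:-1] += smooth * v[1:]
        field[1:] += smooth * v[:-1]
        v = 1.0 / (1.0 + np.exp(-field / T))
        T *= 0.95
    return v

# Neighbors with strong positive bias pull a weakly biased neuron along:
print(mft_anneal(np.array([1.0, -0.2, 1.0, 1.0, -0.5])).round(2))
```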
Machine and Human Color Vision
Color and Grassmann-Cayley coordinates of shape
Alexander P. Petrov
A new concept of surface color is developed, and the variety of all perceived colors is proved to be a 9-D set of 3 x 3 matrices corresponding to different surface colors. We consider the Grassmann manifold of orbits Q = {B·h + c : det B ≠ 0}, where c is an arbitrary vector of colorimetric space, B is a 3 x 3 matrix, and h(x,y) is a color image of a Lambertian surface assumed to be a linear vector-function of the normal vectors. Different orbits Q(n) correspond to different shapes, but they are invariant under color and illuminant transformations. Coordinates of an orbit in the manifold can be computed as 3 x 3 (sometimes 2 x 2) determinants whose elements are the values of some linear functionals (receptive fields) of h(x,y). Based on this approach, a shape-from-shading algorithm was developed and successfully tested on three-color images of various real objects (an egg, cylinders and cones made of paper, etc.).
Supervised color constancy for machine vision
Carol L. Novak, Steven A. Shafer
In machine vision, color constancy is the ability to match object colors in images taken under different colors of illumination. This is a difficult problem because the image color depends upon the spectral reflectance function of the object and the spectral distribution function of the incident light, both of which are generally unknown. Previous methods have represented these functions with a small number of basis functions and used some sort of reference knowledge to calculate the coefficients. Most of these methods have the weakness that the reference property may not actually hold for all images, or that it provides too few constraints to allow an adequate recovery of the functions. We present here a method for color constancy that uses a color chart of known spectral characteristics to give stronger reference criteria, with a large enough number of colors to provide the constraints needed to calculate the illuminant to the desired degree of accuracy. We call this approach 'supervised color constancy', since the process is supervised by a picture of a reference color chart. We demonstrate two methods for computing supervised color constancy, one using least squares estimation, the other using a neural network. We present results for a simulated experiment on the calculation of the spectral power distribution of an unknown illuminant.
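A minimal sketch of the least-squares variant under illustrative assumptions (synthetic spectra and hypothetical dimensions, not the paper's data; the neural network alternative is not shown). With known sensor sensitivities and chart reflectances, each illuminant basis coefficient enters the chart responses linearly, so all patches stack into one overdetermined linear system:

```python
import numpy as np

# Model: sensor responses to patch i are P[:, i] = S @ diag(L) @ R[:, i],
# with S the known 3 x W sensor sensitivities, R the known W x N chart
# reflectances, and L the unknown W-vector illuminant, L = B @ a for a
# low-dimensional basis B (W x M).
rng = np.random.default_rng(0)
W, N, M = 31, 24, 3                      # wavelengths, patches, basis dim
S = rng.random((3, W))                   # "known" sensor sensitivities
R = rng.random((W, N))                   # "known" chart reflectances
B = rng.random((W, M))                   # illuminant basis
a_true = np.array([1.0, 0.4, -0.2])
L_true = B @ a_true

P = S @ (L_true[:, None] * R)            # 3 x N observed chart responses

# Column m of A holds the responses that basis illuminant B[:, m] would
# produce; solve A @ a = p by least squares.
A = np.stack([(S @ (B[:, m][:, None] * R)).ravel() for m in range(M)], axis=1)
a_hat, *_ = np.linalg.lstsq(A, P.ravel(), rcond=None)
print(np.allclose(a_hat, a_true))        # True: exact in the noise-free case
```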
Photometric models in multispectral machine vision
The performance of several tasks in multispectral computer vision involves assumptions about the reflection of light from surfaces. These tasks include color constancy (visual representation of spectral reflectances independent of the illuminant spectrum), object-based image segmentation, and deduction of the shape of a surface from its shading. Most color-constancy theories implicitly assume Lambertian, coplanar reflecting surfaces, a distant viewer, and a distant light source that may have many components that are spatially and spectrally distinct. Object-based-segmentation theories allow curved surfaces, each of whose scattering kernels is the sum of a few separable terms (each of which is the product of a wavelength-dependent part and a geometry-dependent part). There is no restriction on the distances of light sources or observer. However, for these theories the illuminant angular/spectral distribution must consist of only one or two separable terms. Finally, A. Petrov's shape-from-shading theory allows the light source to have nearly arbitrary spectral and spatial composition, but requires the surface scattering kernels to have Lambertian dependence on the surface normal. The present paper compares these photometric models.
Large and small color differences: predicting them from hue scaling
Hoover Chan, Israel Abramov, James Gordon
Color appearance can be specified by a procedure of direct hue scaling. In this procedure, subjects look at a stimulus and simply state the proportions of their sensations using the four unique hue names red, yellow, green, and blue; for completeness, they also state the apparent saturation. Observers can scale stimuli quickly and reliably, even if they are relatively inexperienced. Thus stimuli can be rescaled whenever viewing conditions change such that a new specification of appearance is required. The scaled sensory values elicited by a set of stimuli are used to derive the locations of the stimuli on a color diagram that is based on appearance and which we term a Uniform Appearance Diagram (UAD). The orthogonal axes of this space are red-green and yellow-blue; the location of a stimulus specifies its hue, and its distance from the origin specifies its apparent saturation. We have investigated the uniformity of this space by using a subject's UAD, for a particular set of viewing conditions, to predict both small and large color differences under comparable viewing conditions. For small-scale differences we compared wavelength discrimination functions derived from UADs with those obtained by direct adjustment of a bipartite field. For large-scale differences, subjects rated the degree of similarity of pairs of different wavelengths; these ratings were compared with the distances separating the same pairs of wavelengths on a UAD. In both cases, the agreement was very good, implying that UADs are metrically uniform. Thus, UADs could be used to adjust the hues in a pseudo-color display so that all transitions would be equally perceptible or would differ by specified amounts.
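One plausible reading of the UAD construction, sketched below; this is an illustrative interpretation of the abstract, not the authors' published formula:

```python
def uad_coordinates(red: float, yellow: float, green: float, blue: float,
                    saturation: float) -> tuple[float, float]:
    """Map hue-scaling proportions (percentages summing to 100) plus
    apparent saturation (0-100) to a point on a Uniform Appearance
    Diagram: the red-green and yellow-blue axes carry the signed hue
    proportions, and distance from the origin carries saturation."""
    rg = (red - green) / 100.0
    yb = (yellow - blue) / 100.0
    norm = abs(rg) + abs(yb) or 1.0        # avoid division by zero
    s = saturation / 100.0
    return (s * rg / norm, s * yb / norm)

# A strongly saturated orange-ish stimulus (70% red, 30% yellow):
print(uad_coordinates(red=70, yellow=30, green=0, blue=0, saturation=80))
```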
Visual processing, transformability of primaries, and visual efficiency of display devices
William A. Thornton
This paper follows on paper 1250-06 in the SPIE Proceedings of the 1990 conference 'Perceiving, measuring, and using color', Michael H. Brill, Chair/Editor, February 15-16, 1990. Visual efficiency of display devices, as of all other applications of strong visible coloration, demands an understanding of the true spectral response of normal human vision and a colorimetry that usefully represents visual processing. The true spectral response of vision is generally considered unknown, while visual processing, in the sense represented by colorimetry, is thought to be adequately understood. It is the intent of this paper and its antecedent to suggest that both views are mistaken: first, the spectral response is now fairly dependably known; second, colorimetry works too poorly to support the contention that visual processing is understood well enough for that purpose. Fortunately, it appears that much of this understanding can be gained through psychophysics, without a good knowledge of the inner workings of the visual system. We repeat here some evidence of the true spectral response, and introduce new evidence on some ways in which visual processing does not occur. Colorimetry (measurement of color and brightness) does not correlate well with what the normal human observer sees. In researching the reasons for the discrepancies, we use (1) accurate, absolute spectral power distributions to arrive at the power content of viewed lights, (2) a visual colorimeter coupled by a quartz light-pipe to a precise spectroradiometer, (3) three disparate primary sets of three spectral lights, and (4) six normal acute observers. Last year, we demonstrated large errors in computed chromaticity; this year we show that the errors are not largely due to the Standard Observer, but to the 'color-matching function' (CMF) and to its uses. Last year we suggested trouble with transformation of primaries; this year we show graphic evidence that transformation of primaries fails. The repercussions of these facts, the failure of CMFs as weighting functions and of transformation of primaries, are unpleasant to contemplate. Avenues of possible improvement of the colorimetry situation are mentioned.
Measurements of lightness: dependence on the position of a white in the field of view
John J. McCann, Robert Savoy
We used quantitative lightness matching to measure changes in achromatic lightness as a function of the separation between a bright White (1000 mL) and a nearby 3 mL or 300 mL Test Field. The experiments varied the size, angular subtense, and separation of the White area. All variables showed some influence on the lightness matched to the Test Field. Of particular interest are substantial changes in lightness from White areas at 7.5 degrees of separation. This paper describes the quantitative results and their implications for models of lightness.
Apparent contrast and surface color in complex scenes
Lawrence Arend
The influence of sensory processes on the perception of pictures has long interested graphics scientists and engineers. Adaptation, illumination, and surround variables affect chromatic and achromatic apparent contrast and other aspects of color appearance. My recent experiments on apparent surface colors in complex patterns have led to a model of surface appearance in which early visual processes (e.g., adaptation, contrast) are only the first stage. Their role in surface perception is to encode, relatively accurately, the physical contrasts in the retinal image. Higher-order processes then compute surface properties from these contrast signals. It is, however, well known from neurophysiological and psychophysical measurements that early processes only approximate ideal encoding of image contrasts: constant response amplitudes require larger luminance contrasts at low mean luminances. I have recently measured local apparent contrasts, lightnesses (apparent reflectances), and brightnesses (apparent luminances) in complex patterns at a variety of luminances that occur frequently in modern display devices. Apparent contrast decreased at low luminances, but this did not distort apparent reflectances (as one might expect from a number of recent lightness models). These results have several interesting implications for imaging applications.
Biologically Based Machine Vision
Accurate image simulation by hemisphere projection
Buming Bian, Norman Wittels
Radiosity methods produce very striking interreflection effects for an enclosed diffuse environment. The quality of the images synthesized by radiosity methods depends on the calculation accuracy of the form factor, a geometric factor depending only on the relative orientation and position of two surfaces. Hemicube projection has been proposed to check the visibility between surfaces and calculate the form factor efficiently, but there is no analytical solution due to the double area integration. This paper presents a new form of the form factor and gives a theoretical derivation from the point of view of illumination engineering. The new form factor contains only one area integration, so it can be computed analytically. An energy conservation condition is applied to verify the correctness of the new model. The error introduced by using reciprocity as part of the form factor calculation has been eliminated, but the principle of reciprocity is maintained. The hemisphere projection method is used as an analytical solution of the new form factor. Finally, images generated by the new radiosity method are presented. Interreflection effects can be clearly seen around the intersections of the surfaces.
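For context, the classical form factor that hemicube methods approximate involves a double area integral; the paper's contribution is a reformulation with a single area integration that admits the hemisphere-projection solution:

```latex
F_{1\to 2} \;=\; \frac{1}{A_1}\int_{A_1}\!\int_{A_2}
\frac{\cos\theta_1 \,\cos\theta_2}{\pi\, r^2}\,\mathrm{d}A_2\,\mathrm{d}A_1
```

Here r is the distance between the two differential areas and theta_1, theta_2 are the angles between the line joining them and the respective surface normals.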