Proceedings Volume 4662

Human Vision and Electronic Imaging VII


View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 30 May 2002
Contents: 9 Sessions, 51 Papers, 0 Presentations
Conference: Electronic Imaging 2002
Volume Number: 4662

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Keynote Session
  • Eye Movements and Scene Analysis
  • Quantitative Models of Human Vision for Coding and Quality I
  • Quantitative Models of Human Vision for Coding and Quality II
  • Perception, Visualization, and Graphics I
  • Perception, Visualization, and Graphics II
  • Image Analysis, Perception, and Digital Libraries
  • Retinex at 40
  • Quantitative Models of Human Vision for Coding and Quality II
  • Demonstrations and Posters
Keynote Session
Adaptation, high-level vision, and the phenomenology of perception
To what extent do we have shared or unique visual experiences? This paper examines how the answer to this question is constrained by known processes of visual adaptation. Adaptation constantly recalibrates visual sensitivity so that our vision is matched to the stimuli that we are currently exposed to. These processes normalize perception not only to low-level features in the image, but to high-level, biologically relevant properties of the visual world. They can therefore strongly impact many natural perceptual judgments. To the extent that observers are exposed to and thus adapted by a different environment, their vision will be normalized in different ways and their subjective visual experience will differ. These differences are illustrated by considering how adaptation can influence human face perception. To the extent that observers are exposed and adapted to common properties in the environment, their vision will be adjusted toward common states, and in this respect they will have a common visual experience. This is illustrated by reviewing the effects of adaptation on the perception of image blur. In either case, it is the similarities or differences in the stimuli - and not the intrinsic similarities or differences in the observers - which determine the relative states of adaptation. Thus at least some aspects of our private internal experience are controlled by external factors that are accessible to objective measurement.
Identifying perceptually significant features for recognizing faces
Pawan Sinha
The human visual system possesses a remarkable ability to detect and identify faces even under degraded viewing conditions. The fundamental challenge in understanding this ability lies in determining which facial attributes the visual system uses for these tasks. Here we describe experiments designed to probe the limits of these abilities and determine the relative contributions of internal versus external facial features for the detection and identification tasks. The results provide strong constraints and guidelines for computational models of face perception.
Eye Movements and Scene Analysis
Effects of 180-deg. image rotation on eye movement pattern
Dimitri A. Chernyak, Michela Azzariti, Lawrence W. Stark
This study quantitatively compares the eye movement (EM) patterns of subjects viewing static pictures for a short period of time. Each image stimulus is viewed in its original appearance and under a linear transformation - rotation by 180 degrees. Eye movements for the original and transformed images are compared in terms of similarity in the position of fixations (SP factor) and in their sequence (SS factor). The stimuli come from four distinct groups. The first group contains pseudo-natural images that have the typical natural-image Fourier power distribution (1/f) and a random phase factor; this creates a cloud-like pattern without any particular shape outlines. The second group contains single objects that might appear in the environment in almost any orientation; such objects do not possess any intrinsic polarity such as up and down (for example, a bundle of keys). The third group contains single objects with well-defined polarity (for example, a tree). Finally, the fourth category contains scenes with multiple objects and well-defined polarity (for example, a picture of a room). We investigate the effect of the transformation on the EM pattern for each category and evaluate the similarity of viewing strategies for individual subjects.
How people look at pictures before, during, and after scene capture: Buswell revisited
A wearable eye tracker was used to record photographers' eye movements while they took digital photographs of person, sculpture, and interior scenes. Eye movement sequences were also recorded as the participants selected and cropped their images on a computer. Preliminary analysis revealed that during image capture people spend approximately the same amount of time looking at the camera regardless of the scene being photographed. The time spent looking at either the primary object or the surround differed significantly across the three scenes. Results from the editing phase support previous reports that observers fixate on semantic-rich regions in the image, which, in this task, were important in the final cropping decision. However, the spread of fixations, edit time, and number of crop windows did not differ significantly across the three image classes. This suggests that, unlike image capture, the cropping task was highly regular and less influenced by image content.
Shape perception in pictures: eye movements during local surface attitude probing
Andrea J. van Doorn, Theo Boersema, Huib de Ridder, et al.
Perceived local attitudes of the surface of a photographed object are measured by superimposing a small ellipse upon the picture and asking observers to adjust shape and orientation of the ellipse such that it looks as if it is a circle painted on the surface. Previous studies show that observers need global information when judging this fit, although the task is local probing. This study investigates how observers gather global stimulus information, by monitoring eye-scan patterns during task performance. It appears that for the vast majority of the settings all fixations of the subjects fall within a small area around the center of the ellipse. Thus, the global information that observers need to perform the task is almost never acquired during fixations outside a small area around the ellipse. This indicates the importance of peripheral information for shape perception.
Gaze-contingent real-time simulation of arbitrary visual fields
Jeffrey S. Perry, Wilson S. Geisler
We describe an algorithm and software for creating variable resolution displays in real time, contingent upon the direction of gaze. The algorithm takes as input a video sequence and an arbitrary, real-valued, two-dimensional map that specifies a desired amount of filtering (blur) at each pixel location relative to the direction of gaze. For each input video image the following operations are performed: (1) the image is coded as a multi-resolution pyramid, (2) the gaze direction is measured, (3) the resolution map is shifted to the gaze direction, (4) the desired filtering at each pixel location is achieved by interpolating between levels of the pyramid using the resolution map, and (5) the interpolated image is displayed. The transfer function associated with each level of the pyramid is calibrated beforehand so that the interpolation produces exactly the desired amount of filtering at each pixel. This algorithm produces precise, artifact-free displays in 8-bit grayscale or 24-bit color. The software can process live or prerecorded video at over 60 frames per second on ordinary personal computers without special hardware. The direction of gaze for each processed video frame may be taken from an eye tracker, from a sequence of directions saved on disk, or from another pointing device (such as a mouse). The software is demonstrated by simulating the visual fields of normal observers and of patients with low vision. We are currently using the software to precisely control retinal stimulation during complex tasks such as extended visual search.
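The pyramid-plus-interpolation scheme of steps (1)-(5) can be sketched compactly. The following is a minimal illustration, not the authors' calibrated implementation: the pyramid is approximated by progressively blurred full-resolution copies, and the resolution map is a simple linear falloff with eccentricity around a hypothetical gaze point; the function name and parameter values are illustrative.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def foveate(image, gaze_xy, num_levels=6, falloff=0.004):
        """Blur a 2D grayscale image more with distance from gaze_xy."""
        img = image.astype(float)
        h, w = img.shape
        # Stand-in "pyramid": progressively blurred copies at full resolution.
        levels = np.stack([img] + [gaussian_filter(img, 2.0 ** k)
                                   for k in range(1, num_levels)])
        # Real-valued resolution map: desired pyramid level at each pixel.
        ys, xs = np.mgrid[0:h, 0:w]
        ecc = np.hypot(xs - gaze_xy[0], ys - gaze_xy[1])
        level = np.clip(ecc * falloff * num_levels, 0, num_levels - 1)
        lo = np.floor(level).astype(int)
        hi = np.minimum(lo + 1, num_levels - 1)
        frac = level - lo
        # Step (4): per-pixel interpolation between the bracketing levels.
        return (1 - frac) * levels[lo, ys, xs] + frac * levels[hi, ys, xs]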
Quantitative Models of Human Vision for Coding and Quality I
Practical applications that require some of the more advanced features of current visual models
While the use of visual models for assessing all aspects of the imaging chain is steadily increasing, one hindrance is the complexity of these models. This has impact in two ways - not only does it take longer to run the more complex visual model, making it difficult to place into optimization loops, but it also takes longer to code, test, and calibrate the model. As a result, a number of shortcut models have been proposed and used. Some of the shortcuts involve more efficient frequency transforms, such as using a Cartesian separable wavelet, while other types of shortcuts involve omitting the steps required to simulate certain visual mechanisms, such as masking. A key example of the latter is spatial CIELAB, which only models the opponent color CSFs and does not model the spatial frequency channels. Watson's recent analysis of the Modelfest data showed that while a multi-channel model did give the best performance, versions dispensing with the complex frequency bank and just using frequency attenuation did nearly as well. Of course, the Modelfest data addressed detection of a signal on a uniform field, so no masking properties were probed. On the other end of complexity is the model by D'Zmura, which not only includes radial and orientation channels, but also the interactions between the channels in both luminance and color. This talk will dissect several types of practical distortions that require more advanced visual models. One of these will be the need for orientation channels to predict edge jaggies due to aliasing. Other visual mechanisms in search of an exigent application that we will explore include cross luminance-chrominance masking and facilitation, local contrast, and cross-channel masking.
Approaching a unified model of pattern detection and brightness perception
In order to develop a human vision model to simulate both grating detection and brightness perception, we have chosen four visual functional components. They include a front-end low-pass filter, a cone-type-dependent local compressive nonlinearity described by a modified Naka-Rushton equation, a cortical representation of the image in the Fourier domain, and a frequency-dependent compressive nonlinearity. The model outputs were fitted to contrast sensitivity functions over 7 mean illuminance levels ranging from 0.0009 to 900 trolands simultaneously, with a set of 6 free parameters. The fits account for 97.8% of the total variance in the reported experimental data. Furthermore, the same model was used to simulate contrast and brightness perception. Visual patterns that produce simultaneous contrast or the crispening effect were used as input images to the model. The outputs are consistent with the perceived brightness, using the same set of parameter values as in the above-mentioned fits. The model also simulated the perceived contrast contours seen in a frequency-modulated grating and the whiteness percepts at different adaptation levels. In conclusion, a model based on simple visual properties is promising as the basis of a unified model of pattern detection and brightness perception.
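For reference, the unmodified Naka-Rushton equation has the standard form below (the paper's specific modification is not reproduced here); R_max is the maximum response, I the local cone input, sigma the semi-saturation constant, and n the compressive exponent:

    \[ R(I) \;=\; R_{\max}\,\frac{I^{\,n}}{I^{\,n} + \sigma^{\,n}} \]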
Functional roles of center-surround interactions in visual image processing
The perception of a visual stimulus is affected by the presence of other stimuli in its surround region: the perceived central contrast can be suppressed or enhanced by the surrounds. Our previous psychophysical results showed that both surround suppression and enhancement exist in foveal vision, while only suppression exists in peripheral vision. Moreover, the suppression in the periphery is much stronger than that in the fovea. In this report, we build an image processing model of the visual system with lateral connections embedded. We first adjusted the model parameters so that the model matched the performance of human subjects in contrast perception experiments in the fovea and periphery, respectively. With those parameters, we then analyzed the functions of lateral connections in image perception. We found that: 1) with foveal parameters, lateral interactions served the purpose of gain control and image regularization - the contrast response in the fovea was modulated by the global image through lateral connections; 2) with peripheral parameters, lateral interactions resulted in image boundary segmentation - the response to a uniform image region was suppressed while the response to boundary regions remained. The results suggest that visual images are encoded differently by foveal and peripheral vision. The possible impacts of this feature on image compression are discussed.
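A common way to write such lateral interactions in a contrast-response model is divisive modulation of the center response by surround contrast; the generic form below only illustrates the idea and is not necessarily the authors' parameterization (c_c and c_s are center and surround contrasts; w > 0 yields suppression, w < 0 enhancement):

    \[ R(c_c, c_s) \;=\; \frac{c_c^{\,p}}{\sigma^{\,q} + c_c^{\,q} + w\,c_s^{\,q}} \]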
Additivity models for suprathreshold distortion in quantized wavelet-coded images
Damon M. Chandler, Sheila S. Hemami
The additivity of wavelet subband quantization distortions was investigated in an unmasked detection task and in masked detection and discrimination tasks. Contrast thresholds were measured for both simple targets (artifacts induced by uniform quantization of individual discrete wavelet transform subbands) and compound targets (artifacts induced by uniform quantization of pairs of discrete wavelet transform subbands) in the presence of no mask and eight different natural image maskers. The results were used to assess summation between wavelet subband quantization distortions on the orientation and spatial-frequency dimensions. In the unmasked detection experiment, subthreshold quantization distortions pooled in a nonlinear fashion, and the amount of summation agreed with that of previous summation-at-threshold experiments (β = 2.43; relative sensitivity = 1.33). In the masked detection and discrimination experiments, suprathreshold quantization distortions pooled in a linear fashion. Summation increased as the distortions became increasingly suprathreshold but quickly settled to near-linear values. Summation on the spatial-frequency dimension was greater than summation on the orientation dimension for all suprathreshold contrasts. A high degree of uncertainty imposed by the natural image maskers precludes quantifying an absolute measure of summation.
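The summation analysis rests on the standard Minkowski pooling rule, written out here for clarity with c_i the contrast of distortion component i and c_{T,i} its individual threshold; β = 1 corresponds to fully linear summation and β → ∞ to no summation:

    \[ \left( \sum_i \left( \frac{c_i}{c_{T,i}} \right)^{\beta} \right)^{1/\beta} = 1 \quad \text{at threshold.} \]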
Perceived overall contrast and quality of the tone scale rendering for natural images
Past research has demonstrated the complexity of perceived contrast as an attribute of natural images. This attribute is affected by the tone reproduction characteristics of an imaging system, the observer's viewing environment, and the scene itself. The development of digital photography, with new tools that affect tone reproduction, creates a need for further insight into the parameters that influence the perception of contrast, so that this attribute can be computationally predicted and optimized for a large variety of individual images. To elucidate the relationship between perceived overall contrast and image properties, we performed a series of experiments in which observers estimated perceived overall contrast, defined as an integrated impression of differences in lightness, for images of natural scenes presented on a monitor screen. The ratings were used to develop a computational prediction based on the assessment of features which, hypothetically, could be used by the subjects' visual system when evaluating perceived overall contrast.
Image quality assessment by expert and non-expert viewers
The difference between expert and non-expert viewers in assessing image quality is evaluated in two experiments. The assessment performance, in terms of discrimination ability and reproducibility, is measured for both groups. The results of these experiments suggest that both groups of viewers exhibit the same assessment behavior when judging the level of a given image quality attribute, such as sharpness. When judging overall image quality, however, expert viewers seem to weight the various attributes differently than non-expert viewers.
Quantitative Models of Human Vision for Coding and Quality II
Extending the Modelfest image/threshold database into the spatio-temporal domain
Thom Carney, Stanley A. Klein, Brent Beutter, et al.
Models that predict human performance on narrow classes of visual stimuli abound in the vision science literature. However, the vision and applied imaging communities need robust general-purpose, rather than narrow, computational human visual system (HVS) models to evaluate image fidelity and quality and ultimately improve imaging algorithms. Of the general-purpose early HVS models that currently exist, direct model comparisons on the same data sets are rarely made. The Modelfest group was formed several years ago to solve these and other vision modeling issues. The group has developed a database of static spatial test images with threshold data that is posted on the Web for modellers to use in HVS model design and testing. The first phase of data collection was limited to detection thresholds for static gray scale 2D images. The current effort will extend the database to include thresholds for selected grayscale 2D spatio-temporal image sequences. In future years, the database will be extended to include discrimination (masking) for dynamic, color and gray scale image sequences. The purpose of this presentation is to invite the Electronic Imaging community to participate in this effort and to inform them of the developing data set, which is available to all interested researchers. This paper presents the display specifications, psychophysical methods and stimulus definitions for the second phase of the project, spatio-temporal detection. The threshold data will be collected by each of the authors over the next year and presented on the Web along with the stimuli.
Quality of video affected by packet loss distortion compared to the predictions of a spatio-temporal model
Kjell E. Brunnstrom, Bo N. Schenkman
The image quality of video sequences as judged by human observers may be predicted by simulation models. The results of an image discrimination model are discussed and compared to the results of the peak signal-to-noise ratio. The image discrimination model simulates how people analyze spatio-temporal information, and it predicts the detection of distortion. The peak signal-to-noise ratio measures the physical difference between an original and a distorted image sequence. To test whether the model could be used without modification to predict image quality in small moving images, a perceptual experiment was carried out with ten observers. Image quality judgments were measured for five different video scenes using eight category scales and magnitude estimation. Packet loss during transmission was simulated for two H.263 coders, one with two layers and one with only one layer. The reliability of the judgments was generally high. The judged image quality depended on the type of scene and the coder. There was a strong inter-correlation between the category scales. Both magnitude estimations of image quality and ratings on a category scale for image quality could to some extent be predicted by the model, but there was no advantage for the visual model.
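For reference, the peak signal-to-noise ratio used as the physical baseline is the standard definition for 8-bit frames, with the mean squared error taken between original pixels x_k and distorted pixels x̂_k over the sequence:

    \[ \mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\mathrm{MSE}}, \qquad \mathrm{MSE} = \frac{1}{N} \sum_{k=1}^{N} \left( x_k - \hat{x}_k \right)^2 \]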
Towards a spatio-chromatic standard observer for detection
The aim of the ColorFest is to extend the original ModelFest (http://vision.arc.nasa.gov/modelfest/) experiments to build a spatio-chromatic standard observer for the detection of static coloured images. The two major issues that need to be addressed are (1) the contrast sensitivity functions for the three chromatic mechanisms and (2) how the output of these channels is combined. We measured detection thresholds for stimuli modulated along different colour directions and for a wide range of spatial frequencies. The three main directions (an achromatic direction, a nominally isoluminant red-green direction, and the tritanopic confusion line) and four intermediate colour directions were used. These intermediate directions were the vector sums of the thresholds along the main directions. We evaluate two models. Detection performance is described by a linear transformation C defining the chromatic tuning and a diagonal matrix S reflecting the sensitivity of the chromatic mechanisms for a particular spatial frequency. The output of the three chromatic mechanisms is combined according to a Minkowski metric (General Separable Model), or according to a Euclidean Distance measure (Ellipsoidal Separable Model). For all three observers the ellipsoidal model fits as well as the general separable model. Estimating the chromatic tuning improves the model fit for one observer.
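In the notation of the abstract, the two candidate models can be written compactly: a stimulus with cone-contrast vector Δc is at detection threshold when the pooled response of the three mechanisms reaches unity. This is a standard formulation consistent with the description above; the exponent symbol β is introduced here for illustration:

    \[ r = S\,C\,\Delta c, \qquad \Big( \sum_{i=1}^{3} |r_i|^{\beta} \Big)^{1/\beta} = 1, \]

where a general β gives the general separable (Minkowski) model and β = 2 gives the ellipsoidal separable (Euclidean) model.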
Perceptual color image quality metric using adequate error pooling for coding scheme evaluation
We propose a bivariant visual quality criterion for the evaluation of coding schemes. This criterion is based on human visual system properties in order to correspond as closely as possible with human judgments. Unlike some other objective criteria, it does not use any information about the type of degradation introduced by the coding scheme. The criterion has two main stages. The first computes a visual representation of the errors between two images, distributed over the color, spatial, and frequency dimensions. This stage is entirely based on results from psychophysical experiments conducted in our laboratory. The second stage pools the errors over color, frequency, and space to obtain the overall visual quality difference between the two images. Since we have previously shown the importance of this stage, we propose an original approach, extended here to color images. In particular, we describe how to take color information into account in a visual quality criterion. We compare the criterion's results with human judgments on a database of images distorted by three types of compression schemes (JPEG, JPEG2000, and a ROI-based algorithm), using metrics defined by the Video Quality Experts Group. The results indicate that the criterion provides good prediction accuracy, monotonicity, and consistency. The proposed approach is thus a useful alternative tool for image coding researchers.
Perception, Visualization, and Graphics I
Using animation quality metric to improve efficiency of global illumination computation for dynamic environments
Karol Myszkowski, Takehiro Tawara, Hans-Peter Seidel
In this paper, we consider applications of perception-based video quality metrics to improve the performance of global lighting computations for dynamic environments. For this purpose we extend the Visible Difference Predictor (VDP) developed by Daly to handle computer animations. We incorporate into the VDP the spatio-velocity CSF model developed by Kelly. The CSF model requires data on the velocity of moving patterns across the image plane. We use the 3D image warping technique to compensate for the camera motion, and we conservatively assume that the motion of animated objects (usually strong attractors of the visual attention) is fully compensated by the smooth pursuit eye motion. Our global illumination solution is based on stochastic photon tracing and takes advantage of temporal coherence of lighting distribution, by processing photons both in the spatial and temporal domains. The VDP is used to keep noise inherent in stochastic methods below the sensitivity level of the human observer. As a result a perceptually-consistent quality across all animation frames is obtained.
Perception, Visualization, and Graphics II
Conveying 3D shape with texture: recent advances and experimental findings
If we could design the perfect texture pattern to apply to any smooth surface in order to enable observers to more accurately perceive the surface's shape in a static monocular image, taken from an arbitrary generic viewpoint under standard lighting conditions, what would the characteristics of that texture pattern be? To gain insight into this question, our group has developed an efficient algorithm for synthesizing a high-resolution texture pattern, derived from a provided 2D sample, over an arbitrary doubly curved surface, in such a way that the orientation of the texture is constrained to follow a specified underlying vector field over the surface, at a per-pixel level, without evidence of seams or projective distortion artifacts. In this paper, we report the findings of a recent experiment in which we use this new texture synthesis method to assess the shape-information-carrying capacity of two different types of directional texture patterns (unidirectional and bidirectional) under three different orientation conditions (following the first principal direction, following a constant uniform direction, or swirling sinusoidally in the surface). In a four-alternative forced-choice task, we asked participants to identify the quadrant in which two B-spline surfaces, illuminated from different random directions and simultaneously and persistently displayed, differed in their shapes. We found, after all subjects had gained sufficient training in the task, that accuracy increased fairly consistently with increasing magnitude of surface shape disparity, but that the characteristics of this increase differed under the different texture orientation conditions. Subjects were able to more reliably perceive smaller shape differences when the surfaces were textured with a pattern whose orientation followed one of the principal directions than when the surfaces were textured with a pattern that either gradually swirled in the surface or followed a constant uniform direction in the tangent plane regardless of the surface shape characteristics. These findings appear to support our hypothesis that anisotropic textures aligned with the first principal direction may facilitate shape perception, for a generic view, by making more reliable information about the extent of the surface curvature explicitly available to the observer than would be available if the texture pattern were oriented in any other way.
Two-stage color palettization for error diffusion
Niloy J. Mitra, Maya R. Gupta
Image-adaptive color palettization chooses a reduced number of colors to represent an image. Palettization is one way to decrease storage and memory requirements for low-end displays. Palettization is generally approached as a clustering problem, where one attempts to find the k palette colors that minimize the average distortion over all the colors in an image. This would be the optimal approach if the image were to be displayed with each pixel quantized to the closest palette color. However, to improve image quality, palettization may be followed by error diffusion. In this work, we propose a two-stage palettization where the first stage finds some m << k clusters, and the second stage chooses palette points that cover the spread of each of the m clusters. After error diffusion, this method leads to better image quality at less computational cost and with faster display speed than full k-means palettization.
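A hedged sketch of this two-stage idea follows: a coarse k-means pass finds m color clusters, then the palette budget is spent covering each cluster's spread along its principal axis. The spread rule (±2σ along the leading eigenvector) and the parameter defaults are illustrative guesses, not the authors' exact procedure; k is assumed divisible by m.

    import numpy as np
    from scipy.cluster.vq import kmeans2

    def two_stage_palette(pixels, m=4, k=16):
        """pixels: (n, 3) float RGB array; returns an approx. k-color palette."""
        pixels = pixels.reshape(-1, 3).astype(float)
        _, labels = kmeans2(pixels, m, minit='++')       # stage 1: m clusters
        palette = []
        for c in range(m):                               # stage 2: cover spread
            pts = pixels[labels == c]
            if len(pts) < 2:                             # skip empty clusters
                continue
            mean = pts.mean(axis=0)
            # Principal axis of the cluster from its covariance matrix.
            evals, evecs = np.linalg.eigh(np.cov(pts.T))
            axis = evecs[:, -1] * np.sqrt(max(evals[-1], 0.0))
            # Place palette points spanning +-2 sigma along the principal axis.
            for t in np.linspace(-2, 2, k // m):
                palette.append(mean + t * axis)
        return np.clip(np.array(palette), 0, 255)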
Graduated profiling: enumerating and generating perceptual colormaps for uncalibrated computer displays
The importance of using perceptual colormaps for visualizing numerical data is well established in scientific visualization, computer graphics, color science, and related areas of research. In practice, however, the use of perceptual colormaps tends to be the exception rather than the rule. In general it is difficult for end-users to find suitable colormaps. In addition, even when such colormaps are available, the inherent variability in color reproduction among computer displays makes it very difficult for users to verify that these colormaps do indeed preserve their perceptual characteristics when used on different displays. Generally, verification requires display profiling (evaluating the display's color reproduction characteristics) using a colorimeter or a similar measuring device. With the growth of the Internet, and the resulting proliferation of remote, client-based displays, the profiling problem has become even more difficult, and in many cases impossible. We present a method for enumerating and generating perceptual colormaps in a way that ensures that the perceptual characteristics of the colormaps are maintained over a wide range of displays. This method constructs colormaps that are guaranteed to be 'perceptually correct' for a given display by using whatever partial profile information of the display is available. We use the term 'graduated profiling' to describe this method of partial profiling.
Image Analysis, Perception, and Digital Libraries
Perceptual image analysis for graphical rendering and digital libraries
When fabricating a large-area holographic diffuser by cascading holograms of diffusion patterns with a specified area, it was found that a fluctuation of diffraction intensity occurred at the boundaries of two adjacent exposed areas. It was confirmed experimentally that this problem could be alleviated by controlling the intensity distribution of the reference laser beam around the boundary of each exposure area. Using this method, the fluctuation was reduced considerably. For better compensation of the non-uniformity, a theoretical analysis of the diffusion process of the photopolymer is discussed and compared with the experimental results.
Real-time high-performance attention focusing in outdoors color video streams
When confronted with cluttered natural environments, animals still perform orders of magnitude better than artificial vision systems in visual tasks such as orienting, target detection, navigation, and scene understanding. To better understand biological visual processing, we have developed a neuromorphic model of how visual attention is attracted towards conspicuous locations in a visual scene. It replicates processing in the dorsal ('where') visual stream of the primate brain. The model includes a bottom-up (image-based) computation of low-level color, intensity, orientation, and flicker features, as well as a nonlinear spatial competition that enhances salient locations in each feature channel. All feature channels feed into a unique scalar 'saliency map' which controls where attention is focused next. In this article, we discuss a parallel implementation of the model which runs at 30 frames/s on a 16-CPU Beowulf cluster, and the role of flicker (temporal derivative) cues in computing salience. We show how our simple within-feature competition for salience effectively suppresses strong but spatially widespread motion transients resulting from egomotion. The model robustly detects salient targets in live outdoor video streams, despite large variations in illumination, clutter, and rapid egomotion. The success of this approach suggests that neuromorphic vision algorithms may prove unusually robust for outdoor vision applications.
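The within-feature competition described above can be illustrated on a single channel. The sketch below computes center-surround differences on an intensity channel and applies a crude global competition that promotes maps with one strong peak over maps with many; the scale choices and the competition rule are simplifications for illustration, not the authors' implementation:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def intensity_salience(img, pairs=((2, 5), (2, 6), (3, 6), (3, 7))):
        """img: 2D float array; returns a normalized salience map."""
        img = img.astype(float)
        sal = np.zeros_like(img)
        for c, s in pairs:
            center = gaussian_filter(img, 2.0 ** c * 0.5)
            surround = gaussian_filter(img, 2.0 ** s * 0.5)
            fmap = np.abs(center - surround)      # center-surround contrast
            # Global competition: weight maps whose maximum stands out from
            # the mean activity (suppresses spatially widespread transients).
            fmap *= (fmap.max() - fmap.mean()) ** 2
            sal += fmap
        return sal / (sal.max() + 1e-12)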
Automatic identification of regions of interest with application to the quantification of DNA damage in cells
Visual systems that have evolved in nature appear to exercise a mechanism that places emphasis upon areas in a scene without necessarily recognising objects that lie in those areas. This paper describes the application of a new model of visual attention to the automatic assessment of the degree of damage in cultured human lung fibroblasts. The visual attention estimator measures the dissimilarity between neighbourhoods in the image, giving higher visual attention values to neighbouring pixel configurations that do not match identical positional arrangements in other, randomly selected neighbourhoods in the image. A set of tools has been implemented that processes images and produces corresponding arrays of attention values. Additional functionality provides a measure of DNA damage in images of treated lung cells affected by ultraviolet light. The unpredictability of the image attracts visual attention, with the result that greater damage is reflected by higher attention values. Results are presented that indicate that the ranking provided by the visual attention estimates compares favourably with an expert's visual assessment of the degree of damage. Potentially, visual attention estimates may provide an alternative method of calculating the efficacy of genotoxins or modulators of DNA damage in treated human cells.
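The neighbourhood-mismatch mechanism described above lends itself to a direct, if slow, sketch: each pixel's attention score is the fraction of randomly selected neighbourhoods elsewhere in the image that fail to match its own neighbourhood in the same positional arrangement. The patch size, trial count, and matching tolerance below are illustrative values, not the paper's:

    import numpy as np

    def attention_map(img, half=1, trials=50, tol=10, seed=0):
        """img: 2D uint8 array; returns per-pixel attention in [0, 1]."""
        rng = np.random.default_rng(seed)
        h, w = img.shape
        att = np.zeros((h, w))
        for y in range(half, h - half):
            for x in range(half, w - half):
                patch = img[y-half:y+half+1, x-half:x+half+1].astype(int)
                misses = 0
                for _ in range(trials):
                    yy = rng.integers(half, h - half)
                    xx = rng.integers(half, w - half)
                    other = img[yy-half:yy+half+1, xx-half:xx+half+1].astype(int)
                    # A mismatching neighbourhood raises visual attention.
                    if np.abs(patch - other).max() > tol:
                        misses += 1
                att[y, x] = misses / trials
        return att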
Using a computational model of human color vision to perform object segmentation in natural images
John Arthur Black Jr., Karthikeyan Vaithianathan, Sethuraman Panchanathan
Most recent attempts at object segmentation have been based on object motion in video sequences. However, object segmentation in still images is more difficult. Without motion cues, other cues must be found. Edge detection algorithms are able to extract object contours from images, and were once thought to hold promise for object segmentation in still images. However, additional processing is needed to distinguish between object contours and other edges, such as those produced by textures. The alternative method of region growing (based on luminance or color) has also proven rather ineffective for object segmentation in natural images. In contrast, humans are very successful at object segmentation in still images, suggesting that a model of the early human visual system (HVS) might reveal useful methods for more robust object segmentation in still images. The research results presented in this paper are derived from an HVS model that includes models of Type 1 and Type 2 color contrast cells, and double opponent color contrast cells. By combining the outputs of these cells with edge detected images, object contours can be better distinguished from other contours (such as texture contours and shadow contours) thus providing enhanced object segmentation in cluttered images.
ISee: perceptual features for image library navigation
Aleksandra Mojsilovic, Jose Gomes, Bernice E. Rogowitz
To develop more satisfying image navigation systems, we need tools to construct a semantic bridge between the user and the database. In this paper we present an image indexing scheme and a query language which allow the user to introduce a cognitive dimension to the search. At an abstract level, this approach consists of: 1) learning the natural language that humans use to communicate their semantic experience of images, 2) understanding the relationships between this language and objective, measurable image attributes, and 3) developing the corresponding feature extraction schemes. In our previous work we conducted a number of subjective experiments in which we asked human subjects to group images and then explain verbally why they did so. The results of this study indicated that part of the abstraction involved in image interpretation is often driven by semantic categories, which can be broken into more tangible semantic entities, i.e., objective semantic indicators. By analyzing our experimental data, we identified some candidate semantic categories (e.g., portraits, people, crowds, cityscapes, landscapes), discovered their underlying semantic indicators (e.g., skin, sky, water, object), and derived important low-level image descriptors accounting for our perception of these indicators. In our recent work we have used these findings to develop a set of image features that match the way humans communicate image meaning, and a semantic-friendly query language for browsing and searching diverse collections of images. We have implemented our approach in an Internet search engine, ISee, and tested it on a large number of images. The results we obtained are very promising.
Retinex at 40
Visual cortex and the Retinex algorithm
Jack D. Cowan, Paul C. Bressloff
Optical imaging has revealed that the visual cortex is the site of numerous functional maps associated with visual objects such as their position in the visual field, the local orientation of their contours, their texture and surface properties, and some aspects of their color. We show how such functional or feature maps may be used to analyze how visual objects may be represented by integrated neural population activity, and how such activity may embody algorithms similar to the Retinex algorithm.
Contribution of local and global cone-contrasts to color appearance: a Retinex-like model
Anya C. Hurlbert, Christopher J. L. Wolf
Recent psychophysical experiments demonstrate that for simple configurations, colour appearance is largely determined by the ratios of within-type cone excitations (cone contrasts) between a target surface and its immediate background. Other experiments demonstrate that both the mean and variance of the cone excitations from remote surfaces may influence the colour of a target surface. The relative contribution of local and remote surfaces to the colour appearance of a centrally viewed target also depends on adaptational state and, therefore, on stimulus duration. Cone-contrast models of colour appearance that include the influence of cone excitations from local and global surfaces may be viewed as modern-day successors of the Retinex model for colour constancy. Here we describe psychophysical experiments of colour matching under simulated illumination changes, and examine the effects of the size and configuration of local and remote chromatic elements in a complex background on the colour appearance of a central target. We compare the observed colour matches with predictions from a standard Retinex model and from a modified Retinex-like model with weighting factors on the distance-order of chromatic edges.
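The within-type cone contrasts referred to above have the standard definition below, for each cone class i (L, M, or S), where E_i denotes the cone excitation produced by the target and by its background:

    \[ C_i = \frac{E_i^{\mathrm{target}} - E_i^{\mathrm{background}}}{E_i^{\mathrm{background}}}, \qquad i \in \{L, M, S\} \]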
Experimental studies of instantaneous color constancy: dynamic color matching under rapid changes of illuminant
John L. Barbur, Darryl de Cunha, Cristyn B. Williams, et al.
We have extended the experiments of McCann et al. (1976) by incorporating the Mondrian stimulus into a dynamic colour matching (DCM) technique that allows the subject to accurately match the colour of any test patch under sequential changes of illuminant. We have also studied how scattered light affects the measured instantaneous colour constancy (ICC) index. The results show that correction for forward light scatter in the eye can significantly increase the measured ICC index. The changes in the perceived colour of a central test stimulus as a result of surround illuminant changes were investigated in a number of successful binocular and dichoptic experiments. The contribution made by distant patches to ICC was found to be small, with the immediate surround (i.e., less than 2 degrees separation) contributing over 50% of the constancy effect. A number of subjects with partial loss of the ability to see and discriminate colours, caused by damage to ventromedial pre-striate visual cortex, were also investigated. In order to establish the site of ICC mechanisms, the dynamic colour matching technique was modified to make it suitable for studies in patients with unilateral damage to the primary visual cortex.
Conditions for perceptual transparency
Stephen Westland, Osvaldo da Pos, Caterina Ripamonti
We review the conditions that are necessary for the perception of transparency and describe the spatiochromatic constraints for achromatic and chromatic transparent displays. These constraints can be represented by the convergence model and are supported by psychophysical data. We present an alternative representation of the constraints necessary for transparency perception that is based on an analogy with a model of colour constancy and the invariance of cone-excitation ratios. Recent psychophysical experiments are described that suggest that displays where the cone-excitation ratios are invariant produce a stronger impression of transparency than displays where the cone excitations are convergent. We argue that the spatial relations in an image are preserved when a Mondrian-like surface is partially covered by a transparent filter, and therefore show an intriguing link between transparency perception and colour constancy. Finally, we describe experiments to relate the strength of the transparency percept to the number of unique patches in the image display. We find that the greater the number of surfaces in the display that are partially covered by a transparent filter, the stronger the impression of transparency.
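The cone-excitation-ratio invariance at the heart of this account can be stated in one line: for any two surfaces x and y in the display and each cone class i, the ratio of excitations is approximately unchanged when both surfaces are covered by the transparent filter, with primes denoting excitations under the filter:

    \[ \frac{E_i(x)}{E_i(y)} \;\approx\; \frac{E_i'(x)}{E_i'(y)} \]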
Demonstration of red/white projections and rod-Lcone color
John J. McCann, Jeanne L. Benton, Suzanne P. McKee
In the late 1950s Edwin Land, while developing instant color film, repeated James Clerk Maxwell's 1861 three-color projection experiments. By accident, a two-color red-and-white projection appeared on the screen. Fascinated by the multicolored images that he saw, Land studied the phenomenon extensively, published a series of papers, and developed, with Texas Instruments, a prototype red-and-white television system. This talk will demonstrate Land's original red-and-white projections using equipment on loan from the Rowland Institute. In the late 1960s and early 1970s, McKee, Benton, and McCann investigated color images from stimuli that excited only rods and long-wave (L) cones. They used dark adaptation curves, flicker-fusion rates, the Stiles-Crawford effect, and apparent sharpness to differentiate rod and M/S-cone responses. They showed that color from rods and L cones under the right stimulus conditions is nearly identical to cone-cone color. This talk will also demonstrate color from rod-L-cone interactions.
Capturing a black cat in shade: the past and present of Retinex color appearance models
As part of the symposium 'Retinex at 40', this paper recounts the research on capturing real-life scenes, calculating appearances, and rendering sensations on film and other limited-dynamic-range media. It describes the first patents; a hardware display used in Land's Ives Medal address in 1968; the first computer simulations, using 20-by-24-pixel arrays; psychophysical experiments and computational models of color constancy and dynamic-range compression; and the Frankle-McCann computationally efficient Retinex processing of 512-by-512 images. It includes several modifications of the approach, among them recent modifications and gamut-mapping applications. This paper emphasizes the need for parallel studies of psychophysical measurements of human vision and computational models of imaging systems.
Improving the Retinex algorithm for rendering wide dynamic range photographs
Digital photography systems often render an image from a scene-referred description with very wide dynamic range to an output-referred description of much lesser dynamic range. Global tone maps are often used for this purpose, but can fail when called upon to perform a large amount of range compression. A luminance formulation of the Retinex ratio-reset-product-average algorithm produces a smoothly changing contrast mask of great benefit, but it too can fail where high contrast edges are encountered. A slight but critical modification to the Retinex equation - introducing a ratio modification operator - changes the nature of the generated contrast mask so that it is simultaneously smooth in regions of small contrast ratios, but extremely sharp at high contrast edges. A mask produced in this way compresses large and undesirable contrast ratios while preserving, or optionally enhancing, small ratios critical to the sensation of image contrast. Processed images may appear to have a greater contrast despite having a shorter global contrast range. Adjusting the new operator prior to processing gives control of the degree of compression at high contrast edges. Changing the operator during processing gives control over spatial frequency response.
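One ratio-product-reset-average comparison step, with a simple stand-in for the ratio modification operator, can be sketched in log space as below. The clamp that limits large log-ratios only illustrates the kind of operator described; the paper's actual operator and its tuning are not reproduced here. log_img is assumed normalized so its maximum is 0 (log of 1):

    import numpy as np

    def retinex_step(log_est, log_img, shift, ratio_limit=0.1):
        """One Frankle-McCann-style comparison along a (dy, dx) shift."""
        shifted_img = np.roll(log_img, shift, axis=(0, 1))
        shifted_est = np.roll(log_est, shift, axis=(0, 1))
        ratio = log_img - shifted_img              # log of the neighbor ratio
        # Ratio modification (stand-in): limit large contrast ratios so the
        # mask stays smooth in low-contrast regions yet stops at strong edges.
        ratio = np.clip(ratio, -ratio_limit, ratio_limit)
        product = shifted_est + ratio              # ratio-product in log space
        product = np.minimum(product, 0.0)         # reset: clip at the maximum
        return 0.5 * (log_est + product)           # average with old estimate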
Modifications to Retinex to relax RESET nonlinearity and implement segmentation constraints
This paper addresses the Frankle-McCann Retinex algorithm by altering its properties to provide a distance-weighting function to every ratio-product path from a source to a destination pixel. The algorithm is further modified to permit the hard RESET function to be replaced by a piece-wise smoother function that allows ratio-product propagation to slightly exceed the maximum brightness in each visual channel. Investigations of how segmentation can aid in reducing the computational complexity and provide a more realistic white balance are presented.
Tuning Retinex parameters
Brian V. Funt, Florian Ciurea, John J. McCann
Our goal is to specify the Retinex model as precisely as possible. The core Retinex computation is clearly specified in our recent MATLAB implementation; however, there remain several free parameters which introduce significant variability into the model's predictions. In this paper, we extend previous work on specifying these parameters. In particular, instead of looking for fixed values for the parameters, we establish methods which automatically determine values for them based on the input image. These methods are tested on the McCann-McKee-Taylor asymmetric matching data along with some previously unpublished data that include simultaneous contrast targets.
Color correction between gray world and white patch
Color equalization algorithms exhibit a variety of behaviors described in two differing types of models: Gray World and White Patch. These two models are considered alternatives to each other in methods of color correction. They are the basis for two human visual adaptation mechanisms: Lightness Constancy and Color Constancy. The Gray World approach is typical of the Lightness Constancy adaptation because it centers the histogram dynamic, working the same way as the exposure control on a camera. Alternatively, the White Patch approach is typical of the Color Constancy adaptation, searching for the lightest patch to use as a white reference similar to how the human visual system does. The Retinex algorithm basically belongs to the White Patch family due to its reset mechanism. Searching for a way to merge these two approaches, we have developed a new chromatic correction algorithm, called Automatic Color Equalization (ACE), which is able to perform Color Constancy even if based on Gray World approach. It maintains the main Retinex idea that the color sensation derives from the comparison of the spectral lightness values across the image. We tested different performance measures on ACE, Retinex and other equalization algorithms. The results of this comparison are presented.
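For contrast, the two classical corrections that bracket ACE can be written in a few lines each; these are textbook versions of Gray World and White Patch, not the ACE algorithm itself. The image is assumed to be a float RGB array in [0, 255]:

    import numpy as np

    def gray_world(img):
        """Scale each channel so the image mean becomes neutral gray."""
        means = img.reshape(-1, 3).mean(axis=0)
        return np.clip(img * (means.mean() / means), 0, 255)

    def white_patch(img):
        """Scale each channel so its brightest value maps to white."""
        maxima = img.reshape(-1, 3).max(axis=0)
        return np.clip(img * (255.0 / maxima), 0, 255)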
Post-filtering for color appearance in synthetic-image tone reproduction
Daniele Marini, Alessandro Rizzi, Maurizio Rossi
In the Photorealistic Image Synthesis process the spectral content of the synthetic scene is carefully reproduced, and the final output contains the exact spectral intensity light field of the perceived scene. This is the first important step toward the goal of producing a synthetic image that is indistinguishable from the actual one, but the real scene and its synthetic reproduction should be studied under the same conditions, in order to make a correct comparison and evaluate the degree of photorealism. To simplify this goal, a synthetic observer could be employed to compensate differences in the viewing conditions, since a real observer cannot enter into a synthetic world. Various solutions have been proposed to this end. Most of them are based more on perceptive measures of the Human Visual System (HVS) under controlled conditions rather than on the HVS behaviour under real conditions, e.g., observing a common image and not a controlled black and white striped pattern. Another problem in synthetic image generation is the visualization phase, or tone reproduction, whose purpose is to display the final result of the simulation model on a monitor screen or on a printed paper. The tone reproduction problem consists of finding the best solution to compress the extended dynamic range of the computed light field into the limited range of the displayable colors. We would like to propose a working hypothesis to solve the appearance and the tone reproduction problems in the synthetic image generation, integrating the Retinex model into the photorealistic image synthesis context, including in this way a model of the human visual system in the synthesis process.
Simple spatial processing for color mappings
Color is commonly treated as an autonomous entity, and color processing and color mapping are predominantly done as point operations, linking one color or color description to another. This point-wise approach clearly does not capture the spatial dependencies of color that we humans experience. However, spatial models are rather computationally expensive, which has led to the continued predominance of point-wise color processing methods. This paper examines what advantages can be gained from a most simplistic spatial approach.
Retinex processing for automatic image enhancement
In the last published concept (1986) for a Retinex computation, Edwin Land introduced a center/surround spatial form, which was inspired by the receptive field structures of neurophysiology. With this as our starting point, we have over the years developed this concept into a full-scale automatic image enhancement algorithm - the Multi-Scale Retinex with Color Restoration (MSRCR) - which combines color constancy with local contrast/lightness enhancement to transform digital images into renditions that approach the realism of direct scene observation. The MSRCR algorithm has proven to be quite general-purpose, and very resilient to common forms of image pre-processing such as reasonable ranges of gamma and contrast stretch transformations. More recently we have been exploring the fundamental scientific implications of this form of image processing, namely: (i) the visual inadequacy of the linear representation of digital images, (ii) the existence of a canonical or statistical ideal visual image, and (iii) new measures of visual quality based upon the insights derived from our extensive experience with MSRCR-enhanced images. The last of these serves as the basis for future schemes for automating visual assessment - a primitive first step in bringing visual intelligence to computers.
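The multi-scale Retinex core that MSRCR builds on is compact enough to sketch: a log ratio of each channel to Gaussian-blurred surrounds at several scales, averaged. The color restoration step and the final gain/offset of MSRCR are omitted here, and the scale values below are common illustrative choices rather than the paper's:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def multi_scale_retinex(channel, sigmas=(15, 80, 250)):
        """channel: 2D array of one color channel; returns the MSR output."""
        channel = channel.astype(float) + 1.0            # avoid log(0)
        out = np.zeros_like(channel)
        for sigma in sigmas:
            surround = gaussian_filter(channel, sigma)   # center/surround
            out += np.log(channel) - np.log(surround)    # single-scale Retinex
        return out / len(sigmas)                         # average over scales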
Perceived image quality basis for image enhancement techniques
This paper discusses the use of basic image quality metrics to construct systematic procedures for the enhancement of digital-image sharpness, and describes in detail how these same procedures may be extended to the problem of the satisfactory rendering of images covering a wide range of brightness levels (the extended-latitude problem). It is demonstrated that in both cases the solution includes the possibility of naturally-adaptive and continuously-variable enhancement techniques which are both simple in operation and undemanding of computational resources.
Variational framework for Retinex
Ron Kimmel, Michael Elad, Doron Shaked, et al.
Retinex theory addresses the problem of separating the illumination from the reflectance in a given image and thereby compensating for non-uniform lighting. This is in general an ill-posed problem. In this paper we propose a variational model for the Retinex problem that unifies previous methods. Like previous algorithms, it assumes spatial smoothness of the illumination field. In addition, knowledge of the limited dynamic range of the reflectance is used as a constraint in the recovery process. A penalty term is also included, exploiting a priori knowledge of the nature of the reflectance image. The proposed formulation adopts a Bayesian viewpoint of the estimation problem, which leads to an algebraic regularization term that contributes to better conditioning of the reconstruction problem. Based on the proposed variational model, we show that the illumination estimation problem can be formulated as a Quadratic Programming optimization problem. An efficient multi-resolution algorithm is proposed. It exploits the spatial correlation in the reflectance and illumination images. Applications of the algorithm to various color images yield promising results.
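Up to notation, the functional proposed in this line of work has the form below, where s is the log of the input image, l the log illumination to be recovered, and α, β weighting parameters; the constraint l ≥ s encodes the limited dynamic range of the reflectance (reflectance at most 1):

    \[ \min_{l}\; \int_{\Omega} \Big( |\nabla l|^{2} + \alpha\,(l - s)^{2} + \beta\,|\nabla (l - s)|^{2} \Big)\, d\Omega \quad \text{subject to } l \ge s. \]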
Analysis and generalization of Retinex by recasting the algorithm in wavelets
The Retinex algorithm, in its incarnation as McCann'99, presents an interesting mix of a locally connected iterative algorithm and a multiresolution analysis of the image. By recasting the algorithm, using wavelets, the behavior of the algorithm comes to light. This allows generalizations to be proposed, by changes in both the multiresolution structure and the iterative update structure.
Retinex processing from the fMRI study on V4: artistic research of colored picture using functional MRI
Yasuyo G. Ichihara, Satoshi Nakadomari, Hiroaki Takeuchi, et al.
Artists can imagine two kinds of coloured scenes with no recognizable objects. One is the abstract picture, such as a colour Mondrian, which includes geometric patterns: rectangles, circles, and crosses. The other is the decorative texture, such as the traditional Japanese cloud pattern, which is a colour camouflage pattern and does not include geometric patterns. We created these two kinds of colour dot pattern stimuli, composed of four iso-luminant colours of equal area, and an achromatic version. These functional magnetic resonance imaging stimuli reveal multiple colour-sensitive areas in human ventral occipitotemporal cortex. The results showed that area V4 is highly activated by abstract-picture stimuli such as rectangle patterns and spirals but little activated by decorative-texture stimuli such as random colour dot pictures and cloud patterns. We suggest that V4 is activated by figures composed of colour dots with eye-like shapes such as disks, crosses, gratings, spirals, and windmill-like figures, and that V4 has a low response to camouflage figures in which the colour dots do not include eye-like shapes.
Quantitative Models of Human Vision for Coding and Quality II
Visual perception studies to improve the perceived sharpness of television images
William E. Glenn
In this paper several properties of visual perception are used to describe the perceived sharpness of present HDTV transmission and display formats. A method is described that uses these properties to improve perceived sharpness without increasing the transmission bit rate. Because of the oblique effect in vision and the statistical orientation of lines in scenes, diagonal sampling reduces the required number of pixels in an image. Quantitatively, our measurements show that the number of pixels is reduced by a factor of 1.4 for the same perceived sharpness. Interlaced scanning reduces vertical resolution for several reasons involving spatial and temporal masking effects in visual perception. Progressive scan avoids these limitations. In addition, by taking advantage of the octave-wide tuning bands in visual perception, our measurements show that the perceived resolution in the vertical direction for a progressive scan can be double that of an interlaced scan. By using diagonal sampling, a 1920×1080 image with progressive scan at 60 frames per second requires the same transmission bit rate as a 1920×1080 cardinally sampled image scanned interlaced at 30 frames per second. This results in an image that appears much sharper than the 1080-line interlaced format, without the interlace artifacts.
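The bit-rate equivalence follows from the raw sample counts, assuming diagonal (quincunx) sampling retains half the samples of the cardinal grid:

    \[ \underbrace{\tfrac{1}{2} \times 1920 \times 1080 \times 60}_{\text{diagonal, progressive}} \;=\; \underbrace{1920 \times 1080 \times 30}_{\text{cardinal, interlaced}} \;=\; 62{,}208{,}000 \ \text{samples/s}. \]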
Demonstrations and Posters
Texture resynthesis using principal component analysis
We present a method for analyzing and resynthesizing inhomogeneously textured regions in images for the purpose of advanced compression. First, the user defines image blocks so that they cover regions with homogeneous texture. These blocks are each transformed in turn; for the transform we use the so-called Principal Component Analysis (PCA). After the transform into the new domain, we statistically analyze the resulting coefficients. To resynthesize new texture, we generate random numbers that exactly meet these statistics. Using the inverse transform, the random coefficients are finally transformed back into the spatial domain. The visual appearance of the resulting artificial texture matches the original to a very high degree.
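The analyze-randomize-invert loop can be sketched directly. In this hedged illustration the per-component statistics are summarized as a mean and standard deviation and new coefficients are drawn from a Gaussian; the paper's exact statistical model may differ:

    import numpy as np

    def resynthesize(blocks, seed=0):
        """blocks: (n, bh, bw) array of texture blocks; returns n new blocks."""
        rng = np.random.default_rng(seed)
        X = blocks.reshape(len(blocks), -1).astype(float)        # row per block
        mean = X.mean(axis=0)
        U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)  # PCA basis
        coeffs = U * s                                           # coefficients
        # Draw random coefficients matching each component's statistics.
        new = rng.normal(coeffs.mean(axis=0), coeffs.std(axis=0), coeffs.shape)
        # Inverse transform back into the spatial domain.
        return (new @ Vt + mean).reshape(blocks.shape)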
Detection and tracking of facial features under a complex background
Li Zhuang, Guang-you Xu, Haizhou Ai, et al.
A coarse-to-fine facial feature detection and tracking system for use against complex backgrounds is introduced in this paper. The system uses stereo cameras for video input. Using a stereovision technique, the face is roughly and quickly segmented from the complex background. Then a multiple-template matching method is applied to find the accurate face region within this rough segmentation. Facial organ candidates are extracted from the detected face region at a specific Sobel-filter scale, called the organ scale. Finally, the eyes, nose, and mouth corners are detected. Techniques for checking and correcting errors in facial feature detection based on multiple cues are developed to make the algorithm more robust for facial feature detection and tracking in video sequences. Experiments on 189 video sequences demonstrate the system's effectiveness.
Method for color gamut compression based on subjective evaluation
Mariko Takahashi, Narihiro Matoba, Hiroaki Sugiura
As color devices such as displays, printers, and digital cameras have come into common use, it is well known that the colors reproduced on two devices differ from each other. This difference necessitates color matching techniques, in particular gamut compression, which maps the colors of a device with a larger gamut onto a device with a smaller gamut. In this paper we introduce a modified method of gamut compression using a correlate of modified perceptible difference. Previously, we introduced a color difference in HVC color space, together with sensitivity coefficients, to improve the subjective quality of gamut compression, i.e., of mapping out-of-gamut colors to in-gamut colors. The sensitivity coefficients for Hue, Value, and Chroma in HVC color space were determined from subjective evaluations between different color devices and then applied to gamut compression; subjective experiments showed that this method was more effective than the conventional method. Here we extend the correlate with cross terms between Hue, Value, and Chroma, use analysis of variance to examine the relations among the cross terms of each attribute, and apply the modified color difference to gamut compression.
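A minimal sketch of such a sensitivity-weighted HVC color-difference correlate with cross terms follows; the quadratic form and all coefficient values are illustrative assumptions, not the paper's fitted values.

```python
import math

# Illustrative coefficients only; the paper determines its sensitivity
# coefficients (and cross-term weights) from subjective evaluations,
# which are not reproduced here.
W = {'h': 1.0, 'v': 2.0, 'c': 0.8, 'hv': 0.1, 'vc': 0.1, 'hc': 0.05}

def weighted_hvc_difference(dh, dv, dc, w=W):
    """Quadratic color-difference correlate in HVC with cross terms."""
    q = (w['h'] * dh**2 + w['v'] * dv**2 + w['c'] * dc**2
         + w['hv'] * dh * dv + w['vc'] * dv * dc + w['hc'] * dh * dc)
    return math.sqrt(max(q, 0.0))  # guard against negative cross terms
```

Gamut compression would then map each out-of-gamut color to the in-gamut candidate that minimizes this correlate.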
Recognition of stereoscopic images among elderly people
Masako Omori, Tomoyuki Watanabe, Masaru Miyao, et al.
We tested 130 subjects, including elderly people, using two types of stereogram. One was a 3D image of a repeating parallel pattern showing balloons, from a software program called Stretch Eye. This program adopts a shift method in which the balloons diverge at exactly the point that produces a single shift between the right and left eyes, so that they appear more distant than the monitor screen; the Stretch Eye image was shown on a color LCD. The other was a paper stereogram. Both used the same image of balloons. Using these two types of 3D image, we analyzed the recognition of stereoscopic images among elderly people. The subjects were 130 people aged 18 to 86 years, including 60 over 60 years of age. The subjects' cataract cloudiness (CC) and pupillary distance were measured, and comparisons were carried out between the two targets, the paper stereogram and the color LCD. Subjects were divided into four groups according to CC severity. Two-way ANOVA was used to compare the influence of target type, age, and cataract cloudiness on the ability, distance, and time of stereoscopic recognition, with two dependent variables, recognition speed (RS) and recognition distance (RD), as measures of the subjects' stereoscopic recognition performance.
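For readers unfamiliar with the procedure, a two-way ANOVA of this form can be run as in the sketch below; the data frame, column names, and values are hypothetical stand-ins, not the study's data.

```python
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Hypothetical data: one row per trial, with recognition speed (RS),
# the display target, and a (simplified, two-level) CC group.
df = pd.DataFrame({
    'RS':     [1.2, 0.9, 2.4, 3.1, 1.0, 1.4, 2.8, 3.5],
    'target': ['LCD', 'LCD', 'paper', 'paper'] * 2,
    'cc':     ['mild'] * 4 + ['severe'] * 4,
})

# Two-way ANOVA: main effects of target and CC group plus interaction.
model = ols('RS ~ C(target) * C(cc)', data=df).fit()
print(anova_lm(model))
```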
Categorical color response of chromatic lights in the entire visual field
Masato Sakurai, Chieko Sakamoto, Miyoshi Ayama, et al.
A categorical color-naming experiment was carried out across the entire visual field using five test stimuli, R, Y1, Y2, G, and B, which appear red, dark yellow, bright yellow, green, and blue, respectively, at the center of the visual field. Observers reported the color appearance of the test stimuli using one of 13 color terms (the 11 basic categorical color terms plus yellowish-green and aqua-blue). The test stimuli were presented at 0 deg and at eccentricities from 10 to 80 deg in 10-deg steps along each of 8 directions. The constant-color-name region, in which the same color name was used as at 0 deg, was clearly larger for Y2 and B than for R and G, with Y1 in between. For monocular viewing, the constant-color-name region extended approximately 30 deg in the nasal and upper visual fields, 40 deg in the lower, and 70 deg in the temporal. Outside this region the categorical color responses became unstable, and in the far periphery achromatic color names such as white or gray often appeared for all of the test stimuli.
Effects evoked by luminance and color of PC monitors on pupillary responses and retinal illuminances
Ernesto Suaste-Gomez, Arturo Zuoiga, Rosalinda Martinez
One of the main characteristics of the pupillary reaction is the change it undergoes due to the luminance and color of PC monitors. In this study we evaluated retinal illuminance in trolands, i.e., the total luminous flux falling on the surface of the retina. The entire chromaticity spectrum, wavelengths from 390 to 660 nm, was considered by means of the RGB system and the XYZ color-matching functions, and the stimuli were displayed on several PC monitors. The stimuli were presented on the PC monitor in three forms: full-field chromatic stimulation, a 2-deg foveal target on a white background, and the same target on a blue background, each background subtending 15 deg. In addition, the luminance of each monitor was measured in cd/m2. Changes in pupil area (mm2) were quantified by image processing of video-oculography recordings, made with a video camera using infrared sensors and illumination, a necessary condition when the observer is dark-adapted under scotopic or photopic conditions. Retinal illuminances from 390 to 660 nm were obtained for ten subjects, including two persons with color vision deficiencies.
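Retinal illuminance in trolands is conventionally the product of stimulus luminance and pupil area; the sketch below applies that definition with placeholder numbers, not the study's measurements.

```python
def trolands(luminance_cd_m2: float, pupil_area_mm2: float) -> float:
    """Retinal illuminance in trolands: T = L (cd/m^2) * A (mm^2)."""
    return luminance_cd_m2 * pupil_area_mm2

# Placeholder values, not the paper's measurements: a monitor patch of
# 80 cd/m^2 viewed through a 12.6 mm^2 pupil (~4 mm diameter).
print(trolands(80.0, 12.6))  # -> 1008.0 Td
```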
Individual differences in visual behavior in simulated flight
Flying an aircraft is highly visually demanding. Mapping pilots' visual behaviour is important both for evaluating the cockpit interface and for integrating it effectively with future adaptive interfaces and decision support systems. Pilots' visual behaviour was studied in two experiments. In the first, commercial aviation pilots flew a commercial aviation scenario while eye point of gaze and eye blinks were recorded. In the second, military pilots flew an air-to-air combat scenario and their visual behaviour was video recorded. Both experiments show individual differences in the pilots' visual behaviour. In the second experiment, two categories of eye blinks were found that may help explain these individual differences. One category comprises the systematic eye blinks that occur when the eye point of gaze shifts between head-up and head-down positions; the other could be related to other factors, such as mental workload or visual demands.
Interactive voxel graphics in virtual reality
Bill Brody, Glenn G. Chappell, Chris Hartman
Interactive voxel graphics in virtual reality poses significant research challenges in terms of interface, file I/O, and real-time algorithms. Voxel graphics itself is not new; it is the focus of a good deal of scientific visualization. Interactive voxel creation and manipulation is a more innovative concept. Scientists are understandably reluctant to manipulate data: they collect or model it. The scientific analogy to interactive graphics is the generation of initial conditions for some model, used as a method to test that model. We, however, are in the business of creating new data in the form of graphical imagery; in our endeavor, science is a tool rather than an end. Nevertheless, there is a whole class of interactions, and associated data-generation scenarios, that are natural to our way of working and also appropriate to scientific inquiry. Annotation by sketching or painting, to point out and distinguish interesting and important information, is significant for science as well as for art. Annotation in 3D is difficult without a good 3D interface, and interactive graphics in virtual reality is an appropriate approach to this problem.