Proceedings Volume 5666

Human Vision and Electronic Imaging X

View the digital version of this volume at SPIE Digital Library.

Volume Details

Date Published: 18 March 2005
Contents: 15 Sessions, 59 Papers, 0 Presentations
Conference: Electronic Imaging 2005
Volume Number: 5666

Table of Contents

All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Keynote Session
  • Perspectives in Vision Science
  • Memorial Session in Honor of Bela Julesz
  • Perceptual Image Compression and Quality
  • Image Quality of System Tonescale
  • Perceptual Image Compression and Quality
  • Image Quality of System Tonescale
  • Perceptual Image Compression and Quality
  • Image Quality of System Tonescale
  • Reference-Free Image Quality
  • Perceptually Based Techniques for Digital Libraries
  • Perceptual Image Representation and Display
  • Poster Session
  • Perceptual Image Analysis
  • Perceptual Image Representation and Display
  • Portraying Visual Reality: The Limits of Perception and Representation in Art
  • Poster Session
  • Memorial Session in Honor of Bela Julesz
  • Poster Session
  • Perceptual Image Analysis
  • Special Session: VALVE I
  • Special Session: VALVE II
  • Special Session: VALVE III
  • Special Session: VALVE IV
  • Perceptual Image Analysis
Keynote Session
A spatial standard observer for visual technology
The Spatial Standard Observer (SSO) was developed in response to a need for a simple, practical tool for measurement of visibility and discriminability of spatial patterns. The SSO is a highly simplified model of human spatial vision, based on data collected in a large cooperative multi-lab project known as ModelFest. It incorporates only a few essential components, such as a local contrast transformation, contrast sensitivity function, local masking, and local pooling. The SSO may be useful in a wide variety of applications, such as evaluating vision from unmanned aerial vehicles, measuring visibility of damage to aircraft and to the shuttle orbiter, predicting outcomes of corrective laser eye surgery, inspection of displays during the manufacturing process, estimation of the quality of compressed digital video, evaluation of legibility of text, and predicting discriminability of icons or symbols in a graphical user interface. In this talk I will describe the development of the SSO, and will discuss in detail a number of these potential applications.
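The abstract names the SSO's stages without giving its calibrated parameters. The following is a minimal sketch of such a pipeline (local contrast, CSF filtering, Minkowski pooling) with placeholder filter and pooling values rather than the ModelFest-fitted ones, and with the masking stage omitted:

```python
# Minimal sketch of an SSO-style visibility measure, assuming placeholder
# parameters; the calibrated ModelFest values are not reproduced here, and
# the local-masking stage is omitted.
import numpy as np

def csf(fx, fy, peak=4.0):
    """Toy band-pass contrast sensitivity function (frequencies in cyc/deg)."""
    f = np.sqrt(fx**2 + fy**2)
    return f * np.exp(-f / peak)

def sso_visibility(test, reference, pix_per_deg=32.0, beta=2.9):
    """Pooled visibility of the difference between two luminance images."""
    # 1. local contrast: difference image normalized by mean luminance
    contrast = (test - reference) / reference.mean()
    # 2. CSF filtering in the frequency domain
    ny, nx = contrast.shape
    fx = np.fft.fftfreq(nx, d=1.0 / pix_per_deg)
    fy = np.fft.fftfreq(ny, d=1.0 / pix_per_deg)
    FX, FY = np.meshgrid(fx, fy)
    filtered = np.fft.ifft2(np.fft.fft2(contrast) * csf(FX, FY)).real
    # 3. Minkowski pooling over space (beta is an assumed exponent)
    return (np.abs(filtered) ** beta).mean() ** (1.0 / beta)

ref = np.full((64, 64), 100.0)             # uniform 100 cd/m^2 field
tst = ref.copy(); tst[28:36, 28:36] += 5   # small luminance increment
print(sso_visibility(tst, ref))
```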
Celestial illusions and ancient astronomers: Aristarchus and Eratosthenes
When the moon is at half phase, one would expect that a line starting from the moon’s center and perpendicular to the “shadow diameter” would, if extended, pass through the center of the light source, namely, the sun. It turns out that, when the sun is visible, this extended line appears to aim significantly above the sun, which is the essence of the “half-moon illusion”. The explanation advanced here is that this is not an optical illusion; instead, it can be explained by the relative sizes and distances of the earth, moon, and sun, and it hinges on the fact that the sunrays are nearly parallel with respect to the earth-moon system. It turns out that the ancients knew and used this near-parallelism of the sunrays. Eratosthenes, for example, used a simple but ingenious scheme to obtain a good estimate of the earth’s circumference. An interesting question is: How did the ancients arrive at the conclusion that the sunrays are nearly parallel? This was probably a corollary, based on the immense size of the sun and its huge distance from the earth, as estimated by, among others, Aristarchus of Samos by a brilliantly simple method.
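Eratosthenes' scheme reduces to one line of arithmetic once the sunrays are taken as parallel. A sketch using the traditionally reported figures:

```python
# Eratosthenes' estimate, using the traditionally reported figures: at noon
# on the solstice the sun is overhead at Syene, while at Alexandria a gnomon
# casts a shadow at about 7.2 degrees. With parallel sunrays, that angle
# equals the arc between the two cities.
shadow_angle_deg = 7.2        # 1/50 of a full circle
syene_to_alexandria = 5000    # reported distance, in stadia

circumference = (360.0 / shadow_angle_deg) * syene_to_alexandria
print(circumference)          # 250000 stadia
```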
Perspectives in Vision Science
Do humans discount the illuminant?
In constancy experiments, humans report very small changes in appearance with substantial illumination changes. Hermann von Helmholtz introduced the term “discounting the illuminant” to describe 19th century thinking about underlying mechanisms of constancy. It uses an indirect approach. Since observers see objects as constant, observers “must” be able to detect the spatial and spectral changes in illumination and automatically compensate by altering the signals from the quanta catches of retinal receptors. Instead of solving the problem directly by calculating an object’s reflectance from the array of scene radiances, Helmholtz chose to solve the problem of identifying the illumination. Twentieth century experiments by Hubel and Wiesel, Campbell, Land, and Gibson demonstrate the power of mechanisms using spatial comparisons. This paper analyses a series of different experiments looking for unequivocal evidence that either supports “discounting the illuminant” or supports spatial comparisons as the underlying mechanism of constancy.
A model of the formation of a self-organized cortical representation of color
A. Ravishankar Rao, Guillermo Cecchi, Charles Peck, et al.
In this paper we address the problem of understanding the cortical processing of color information. Unravelling the cortical representation of color is a difficult task, as the neural pathways for color processing have not been fully mapped, and there are few computational modelling efforts devoted to color. Hence, we first present a conjecture for an ideal target color map based on principles of color opponency, and constraints such as retinotopy and the two-dimensional nature of the map. We develop a computational model for the cortical processing of color information that seeks to produce this target color map in a self-organized manner. The input model consists of a luminance channel and opponent color channels, comprising red-green and blue-yellow signals. We use an optional stage consisting of applying an antagonistic center-surround filter to these channels. The input is projected to a restricted portion of the cortical network in a topographic way. The units in the cortical map receive the color opponent input, and compete amongst each other to represent the input. This competition is carried out through the determination of a local winner. By simulating a self-organizing map for color according to this scheme, we are largely able to achieve the desired target color map. According to recent neurophysiological findings, there is evidence for the representation of color mixtures in the cortex, which is consistent with our model. Furthermore, an orderly traversal of stimulus hues in the CIE chromaticity map corresponds to an orderly spatial traversal in primate cortical area V2. Our experimental results are also consistent with this biological observation.
A study of human recognition rates for foveola-sized image patches selected from initial and final fixations on calibrated natural images
Ian van der Linde, Umesh Rajashekar, Lawrence K. Cormack, et al.
Recent years have seen a resurgent interest in eye movements during natural scene viewing. Aspects of eye movements that are driven by low-level image properties are of particular interest due to their applicability to biologically motivated artificial vision and surveillance systems. In this paper, we report an experiment in which we recorded observers’ eye movements while they viewed calibrated greyscale images of natural scenes. Immediately after viewing each image, observers were shown a test patch and asked to indicate if they thought it was part of the image they had just seen. The test patch was either randomly selected from a different image from the same database or, unbeknownst to the observer, selected from either the first or last location fixated on the image just viewed. We find that several low-level image properties differ significantly with the observers’ ability to successfully designate each patch. We also find that the differences between patch statistics for first and last fixations are small compared to the differences between hit and miss responses. The goal of the paper was to measure, in a non-cognitive natural setting, the image properties that facilitate visual memory, and additionally to observe the role played by the temporal location (first or last fixation) of the test patch. We propose that a memorability map of a complex natural scene may be constructed to represent the low-level memorability of local regions, in a similar fashion to the familiar saliency map, which records bottom-up fixation attractors.
Towards a multilevel cognitive probabilistic representation of space
Adriana Tapus, Shrihari Vasudevan, Roland Siegwart
This paper addresses the problem of perception and representation of space for a mobile agent. A probabilistic hierarchical framework is suggested as a solution to this problem. The method proposed is a combination of probabilistic belief with “Object Graph Models” (OGM). The world is viewed from a topological perspective, in terms of objects and the relationships between them. The hierarchical representation that we propose permits efficient and reliable modeling of the information that the mobile agent would perceive from its environment. The integration of both navigational and interactional capabilities through efficient representation is also addressed. Experiments on a set of images taken from the real world that validate the approach are reported. This framework draws on the general understanding of human cognition and perception and contributes towards the overall effort to build cognitive robot companions.
Memorial Session in Honor of Bela Julesz
Top-down processes in perceiving false depth and motion for faces and scenes
There are at least two broad classes of three-dimensional (3D) stimuli that tend to be perceived in illusory reverse depth: hollow masks and “reverspectives”, the latter having been invented by Patrick Hughes in 1964. Because of the depth inversion, these stimuli appear to move when observers move in front of them. The illusion is diminished significantly when a hollow mask is inverted, as compared to an upright mask; the same trend is observed with inverted reverspectives, as compared to upright reverspectives, but the inversion effect is weaker than that in faces. The inversion effect can be attributed to top-down influences in perception, and the results point to a stronger role of such influences for the perception of faces than scenes.
Local cross-correlation model of stereo correspondence
Martin S. Banks, Sergei Gepshtein, Heather F. Rose
As the disparity gradient of a stimulus increases, human observers’ ability to solve the correspondence problem and thereby estimate the disparities becomes poorer. It finally fails altogether when a critical gradient, the disparity-gradient limit (Burt & Julesz, 1980), is reached. We investigated the cause of the disparity-gradient limit. As part of this work, we developed a local cross-correlator similar to ones proposed in the computer vision literature and similar to the disparity-energy model of neurons in area V1. Like humans, the cross-correlator exhibits poorer performance as the disparity gradient increases. We also conducted a psychophysical experiment in which observers were presented with sawtooth waveforms defined by disparity. They made spatial phase discriminations. We presented different corrugation spatial frequencies and amplitudes, and measured observers’ ability to discriminate the two phases. Coherence thresholds (the proportion of signal dots at threshold relative to the total number of dots in the stimulus) were well predicted by the disparity gradient and not by either the spatial frequency or amplitude of the corrugation waveform. Thus, human observers and a local cross-correlator exhibit similar behavior, which suggests that humans use such an algorithm to estimate disparity. As a consequence, disparity estimation is done with local estimates of constant disparity (piecewise frontal), which places a constraint on the highest possible stereo resolution.
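A minimal sketch of such a local cross-correlator, for rectified image rows; the window size and search range here are arbitrary choices, and the point (x, y) is assumed to lie far enough from the image borders:

```python
# Window-based normalized cross-correlation for 1-D disparity estimation,
# assuming rectified stereo images; window and search range are arbitrary.
import numpy as np

def local_disparity(left, right, x, y, half=4, max_disp=8):
    """Disparity at (x, y) that maximizes normalized cross-correlation."""
    patch = left[y - half:y + half + 1, x - half:x + half + 1].ravel()
    patch = patch - patch.mean()
    best, best_score = 0, -np.inf
    for d in range(-max_disp, max_disp + 1):
        cand = right[y - half:y + half + 1,
                     x - d - half:x - d + half + 1].ravel()
        cand = cand - cand.mean()
        denom = np.linalg.norm(patch) * np.linalg.norm(cand)
        score = patch @ cand / denom if denom > 0 else -np.inf
        if score > best_score:
            best, best_score = d, score
    return best
```

Note that the fixed window implicitly assumes constant disparity across the patch, which is exactly the piecewise-frontal constraint discussed in the abstract.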
The riches of the cyclopean paradigm
The cyclopean paradigm introduced by Bela Julesz remains one of the richest probes into the neural organization of sensory processing, by virtue of both its specificity for purely stereoscopic form and the sophistication of the processing required to retrieve it. The introduction of the sinusoidal stereograting showed that the perceptual limitations of human depth processing are very different from those for monocular form. Their use has also revealed the existence of hypercyclopean form channels selective for specific aspects of the monocularly invisible depth form. The natural extension of stereogratings to patches of stereoGabor ripple has allowed the measurement of the summation properties for depth structure, which is specific for narrow horizontal bars in depth. Consideration of the apparent motion between two cyclopean depth structures reveals the existence of a novel surface correspondence problem operating for cyclopean surfaces over time after the binocular correspondence has been solved. Such concepts imply that much remains to be discovered about cyclopean stereopsis and its relationship to 3D form perception from other depth cues.
Perceptual Image Compression and Quality
Sharpening image motion based on the spatio-temporal characteristics of human vision
Moving objects in films often appear normal or even sharper than they actually are, a phenomenon called motion sharpening. We sought to clarify which spatio-temporal frequency components of a moving image are sharpened when the pattern is moving. We applied various spatio-temporal filters to moving natural images and evaluated the perceived sharpness and smoothness of motion by comparing them to a stationary image. On each trial, subjects adjusted three parameters of the still image: overall luminance contrast, the slope of the amplitude function in the spatial frequency domain, and cut-off spatial frequency. We found the strongest motion sharpening when image frames were spatially band-reject filtered. In addition, spatially low-pass filtered movies induced stronger motion sharpening than spatially high-pass filtered movies. When temporal filters were applied, perceived sharpness became stronger when the movies were temporally low-pass filtered. A high-pass temporal filter drastically reduced the perceived quality of image motion. Our results demonstrate that the perceived contrast of higher spatial frequency components in moving images is enhanced by the interaction between different spatio-temporal frequency channels in the motion sharpening process. The results suggest that it is possible to compress and enhance moving images by removing higher spatio-temporal frequency information.
Spatial quantization via local texture masking
Wavelet-based transform coding is well known for its utility in perceptual image compression. Psychovisual modeling has led to a variety of perceptual quantization schemes for efficient at-threshold compression. Successfully extending these models to supra-threshold compression, however, is a more difficult task. This work attempts to bridge the gap between at-threshold modeling and supra-threshold compression by combining a spatially-selective quantization scheme designed for at-threshold compression with simple MSE-based rate-distortion optimization. A psychovisual experiment is performed to determine how textured image regions can be used to mask quantization-induced distortions. Texture masking results from this experiment are used to derive a spatial quantization scheme, which hides distortion in high-contrast image regions. Unlike many spatial quantizers, this technique requires explicit side information to convey the contrast thresholds used to generate step sizes. A simple coder is presented that applies spatially-selective quantization to meet rate constraints near and above threshold. This coder leverages the side information to reduce the rate required to code the quantized data. Compression examples are compared with JPEG-2000 examples with visual frequency weighting. When matched for rate, the spatially quantized images are highly competitive with, and in some cases superior to, the JPEG-2000 results in terms of visual quality.
Image Quality of System Tonescale
Perceptual evaluation of tone mapping operators with real-world scenes
Akiko Yoshida, Volker Blanz, Karol Myszkowski, et al.
A number of successful tone mapping operators for contrast compression have been proposed due to the need to visualize high dynamic range (HDR) images on low dynamic range devices. They were inspired by fields as diverse as image processing, photographic practice, and modeling of the human visual system (HVS). The variety of approaches calls for a systematic perceptual evaluation of their performance. We conduct a psychophysical experiment based on a direct comparison between the appearance of real-world scenes and HDR images of these scenes displayed on a low dynamic range monitor. In our experiment, HDR images are tone mapped by seven existing tone mapping operators. The primary interest of this psychophysical experiment is to assess the differences in how tone mapped images are perceived by human observers and to find out which attributes of image appearance account for these differences when tone mapped images are compared directly with their corresponding real-world scenes rather than with each other. The human subjects rate image naturalness, overall contrast, overall brightness, and detail reproduction in dark and bright image regions with respect to the corresponding real-world scene. The results indicate substantial differences in perception of images produced by individual tone mapping operators. We observe a clear distinction between global and local operators in favor of the latter, and we classify the tone mapping operators according to naturalness and appearance attributes.
Perceptual Image Compression and Quality
Delicate visual artifacts of advanced digital video processing algorithms
With the advent of digital TV, sophisticated video processing algorithms have been developed to improve the rendering of motion or colors. However, the perceived subjective quality of these new systems sometimes conflicts with the objectively measurable improvement we expect to get. In this presentation, we show examples where algorithms should visually improve the skin tone rendering of decoded pictures under normal conditions, but surprisingly fail when the quality of MPEG encoding drops below a just-noticeable threshold. In particular, we demonstrate that simple objective criteria used for the optimization, such as SAD, PSNR or histograms, sometimes fail, partly because they are defined on a global scale, ignoring local characteristics of the picture content. We then integrate a simple human visual model to measure potential artifacts with regard to spatial and temporal variations of the objects' characteristics. Tuning some of the model's parameters allows us to correlate the perceived quality with compression metrics of various encoders. We show the evolution of our reference parameters with respect to the compression ratios. Finally, using the output of the model, we can control the parameters of the skin tone algorithm to reach an improvement in overall system quality.
Analysis of psychological factors for quality assessment of interactive multimodal service
Kazuhisa Yamagishi, Takanori Hayashi
We propose a subjective quality estimation model for interactive multimodal services. First, psychological factors of an audiovisual communication service are extracted by using the semantic differential (SD) technique and factor analysis. Forty subjects participate in subjective tests and perform point-to-point conversational tasks on a PC-based TV phone that exhibits various network qualities. Those subjects assess those qualities on the basis of 25 pairs of adjectives. Two psychological factors, i.e., an aesthetic feeling and a feeling of activity, are extracted from the results. Moreover, overall audiovisual quality can be estimated by evaluating scores of the above two psychological factors. Second, quality impairment factors affecting these two psychological factors are analyzed. We find that the aesthetic feeling is mainly affected by IP packet loss and video coding bit rate, and the feeling of activity depends on delay time and video frame rate. We then propose an opinion model derived from the relationships among quality impairment factors, psychological factors, and overall quality. Our calculations indicate that the estimation error of the proposed model is almost equivalent to the statistical reliability of the subjective score. Finally, using the proposed model, we discuss guidelines for quality design of interactive audiovisual communication services.
Audiovisual quality evaluation of low-bitrate video
Stefan Winkler, Christof Faller
Audiovisual quality assessment is a relatively unexplored topic. We designed subjective experiments for audio, video, and audiovisual quality using content and encoding parameters representative of video for mobile applications. Our focus was the MPEG-4 AVC (a.k.a. H.264) and AAC coding standards. Our goals in this study are two-fold: we want to understand the interactions between audio and video in terms of perceived audiovisual quality, and we use the subjective data to evaluate the prediction performance of our non-reference video and audio quality metrics.
Image Quality of System Tonescale
Can gamut mapping quality be predicted by colour image difference formulae?
Modern digital imaging workflows typically involve a large number of different imaging technologies and media. In order to assure the quality of such workflows, there is a need to quantify how reproduced images have been changed by the reproduction process, and how much these changes are perceived by the human eye. The goal of this study is to investigate whether current color image difference formulae can be used to this end, specifically with regards to the image degradations induced by color gamut mapping. We have applied image difference formulae based on CIELAB, S-CIELAB, and iCAM to a set of images, which have been processed by several state-of-the-art color gamut mapping algorithms. The images have also been evaluated by psychophysical experiments on a CRT monitor. We have not found any statistically significant correlation between the calculated color image differences and the visual evaluations. We have examined the experimental results carefully, in order to understand the poor performance of the color difference calculations, and to identify possible strategies for improving the formulae. For example, S-CIELAB and iCAM were designed to take into account factors such as spatial properties of human vision, but there might be other important factors to be considered to quantify image quality. Potential factors include background/texture/contrast sensitivity effect, human viewing behaviour/area of interest, and memory colors.
Perceptual Image Compression and Quality
Perceptual analysis of video impairments that combine blocky, blurry, noisy, and ringing synthetic artifacts
Mylene C. Q. Farias, John M. Foley, Sanjit K. Mitra
In this paper we present the results of a psychophysical experiment which measured the overall annoyance and artifact strengths of videos with different combinations of blocky, blurry, noisy, and ringing synthetic artifacts inserted in limited spatio-temporal regions. The test subjects were divided into two groups, which performed different tasks - 'Annoyance Judgment' and 'Strength Judgment'. The 'Annoyance' group was instructed to search each video for impairments and make an overall judgment of their annoyance. The 'Strength' group was instructed to search each video for impairments, analyze the impairments into individual features (artifacts), and rate the strength of each artifact using a scale bar. An ANOVA of the overall annoyance judgments showed that the artifact physical strengths had a significant effect on the mean annoyance value. It also showed interactions between the video content (original) and 'noisiness strength', 'original' and 'blurriness strength', 'blockiness strength' and 'noisiness strength', and 'blurriness strength' and 'noisiness strength'. In spite of these interactions, a weighted Minkowski metric was found to provide a reasonably good description of the relation between individual defect strengths and overall annoyance. The optimal value found for the Minkowski exponent was 1.03 and the best coefficients were 5.48 (blockiness), 5.07 (blurriness), 6.08 (noisiness), and 0.84 (ringing). We also fitted a linear model to the data and found coefficients equal to 5.10, 4.75, 5.67, and 0.68, respectively.
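Taking the reported exponent and weights at face value, the fitted metric can be evaluated as below; this uses one common form of the weighted Minkowski combination, which may differ from the paper's exact normalization:

```python
# Overall annoyance from individual artifact strengths via a weighted
# Minkowski metric, using the exponent and weights reported above; this
# common form may differ from the paper's exact normalization.
coeffs = {"blockiness": 5.48, "blurriness": 5.07,
          "noisiness": 6.08, "ringing": 0.84}
p = 1.03

def predicted_annoyance(strengths):
    """strengths: artifact name -> measured perceptual strength."""
    return sum(coeffs[k] * s ** p for k, s in strengths.items()) ** (1.0 / p)

print(predicted_annoyance({"blockiness": 0.5, "blurriness": 0.2,
                           "noisiness": 0.1, "ringing": 0.4}))
```

With an exponent this close to 1, the combination is nearly linear, which is consistent with the similar coefficients obtained from the linear fit.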
Image Quality of System Tonescale
Predicting visible differences in high dynamic range images: model and its calibration
New imaging and rendering systems commonly use physically accurate lighting information in the form of high-dynamic range (HDR) images and video. HDR images contain actual colorimetric or physical values, which can span 14 orders of magnitude, instead of the 8-bit renderings found in standard images. The additional precision and quality retained in HDR visual data is necessary to display images on advanced HDR display devices, capable of showing contrast of 50,000:1, as compared to the contrast of 700:1 for LCD displays. With the development of high-dynamic range visual techniques comes a need for automatic visual quality assessment of the resulting images. In this paper we propose several modifications to the Visual Difference Predictor (VDP). The modifications improve the prediction of perceivable differences in the full visible range of luminance and under adaptation conditions corresponding to real-scene observation. The proposed metric takes into account aspects of high-contrast vision, such as scattering of light in the optics (OTF), nonlinear response to light over the full range of luminance, and local adaptation. To calibrate our HDR VDP we perform experiments using an advanced HDR display, capable of displaying a range of luminance close to that found in real scenes.
Reference-Free Image Quality
Reduced-reference image quality assessment using a wavelet-domain natural image statistic model
Zhou Wang, Eero P. Simoncelli
Reduced-reference (RR) image quality measures aim to predict the visual quality of distorted images with only partial information about the reference images. In this paper, we propose an RR quality assessment method based on a natural image statistic model in the wavelet transform domain. In particular, we observe that the marginal distribution of wavelet coefficients changes in different ways for different types of image distortions. To quantify such changes, we estimate the Kullback-Leibler distance between the marginal distributions of wavelet coefficients of the reference and distorted images. A generalized Gaussian model is employed to summarize the marginal distribution of wavelet coefficients of the reference image, so that only a relatively small number of RR features are needed for the evaluation of image quality. The proposed method is easy to implement and computationally efficient. In addition, we find that many well-known types of image distortion lead to significant changes in wavelet coefficient histograms, and thus are readily detectable by our measure. The algorithm is tested with subjective ratings of a large image database that contains images corrupted with a wide variety of distortion types.
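A sketch of the core computation, using a direct histogram estimate of the Kullback-Leibler distance per wavelet subband; the paper instead summarizes the reference marginal with a generalized Gaussian fit so that only a few features need to be transmitted. Assumes the PyWavelets package:

```python
# Histogram-based Kullback-Leibler distance between the wavelet-coefficient
# marginals of a reference and a distorted image; a generalized Gaussian
# summary of the reference side is replaced by a direct histogram here for
# brevity. Assumes the PyWavelets package (pywt).
import numpy as np
import pywt

def subband_kld(ref, dist, wavelet="db2", level=3, bins=64):
    total = 0.0
    ref_c = pywt.wavedec2(ref, wavelet, level=level)
    dst_c = pywt.wavedec2(dist, wavelet, level=level)
    for r_detail, d_detail in zip(ref_c[1:], dst_c[1:]):
        for r, d in zip(r_detail, d_detail):   # H, V, D orientations
            lo = min(r.min(), d.min()); hi = max(r.max(), d.max())
            p, edges = np.histogram(r, bins=bins, range=(lo, hi), density=True)
            q, _ = np.histogram(d, bins=edges, density=True)
            mask = (p > 0) & (q > 0)           # avoid log(0)
            w = np.diff(edges)[mask]
            total += np.sum(w * p[mask] * np.log(p[mask] / q[mask]))
    return total
```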
Reference-free objective quality metrics for MPEG-coded video
With the growth of digital video delivery, there is an increasing demand for better and more efficient ways to measure video quality. Most existing video quality metrics are reference-based approaches that are not suitable to measure the video quality perceived by the end user without access to reference videos. In this paper, we propose a reference-free video quality metric for MPEG coded videos. It predicts subjective quality ratings using both reference-free MPEG artifact measures and MPEG system parameters (known or estimated). The advantage of this approach is that it does not need a precise separation of content and artifact or the removal of any artifacts. By exploring the correlations among different artifacts and system parameters, our approach can remove content dependency and achieve an accurate estimate of the subjective ratings.
No reference video quality estimation based on human visual system for 2.5/3G devices
2.5/3G devices should achieve satisfactory QoS, overcoming the drawbacks of mobile standards. In-service/blind quality monitoring is essential in order to improve perceptual quality according to the Human Visual System (HVS). Several techniques have been proposed for image/video quality assessment. A novel no-reference quality index which uses an effective HVS model is proposed. Luminance masking, the Contrast Sensitivity Function, and temporal masking are taken into account with fast in-service algorithms. The proposed index is able to assess blockiness distortion with a fast image-domain measure. Compression/post-processing blurring effects are measured with a standard approach. Moving-artifact distortion is evaluated taking into account the standard deviation with respect to a natural image statistical model. Several distortion effects in wireless noisy channels with low video-streaming/playback bit rates (e.g., edge busyness and image persistence) are evaluated. A multi-level pooling algorithm (block, temporal-window, frame, and sequence levels) is used. Validation tests have been developed in order to assess index performance and computational complexity. The final measure provides a human-like threshold effect and high correlation with subjective data. Low-complexity algorithms can be derived for real-time, HVS-based QoS management for low-power consumer devices. Different distortion effects (e.g., ringing and jerkiness) can be easily included.
Perceptually Based Techniques for Digital Libraries
Mimicking human texture classification
In an attempt to mimic human (color) texture classification with a clustering algorithm, three lines of research were pursued, using a test set of 180 texture images (both color and gray-scale equivalents) drawn from the OuTex and VisTex databases. First, a k-means algorithm was applied with three feature vectors, based on color/gray values, four texture features, and their combination. Second, 18 participants clustered the images using a newly developed card sorting program. The mutual agreement between the participants was 57% and 56%, and between the algorithm and the participants it was 47% and 45%, for color and gray-scale texture images respectively. Third, in a benchmark, 30 participants judged the algorithm's clusters of gray-scale textures as more homogeneous than those of colored textures. However, high interpersonal variability was present for both the color and the gray-scale clusters. So, despite the promising results, it is questionable whether average human texture classification can be mimicked (if it exists at all).
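A sketch of the first line of research, clustering images by k-means on simple color and texture features; the features below are illustrative stand-ins, not the paper's exact four texture measures. Assumes scikit-learn:

```python
# k-means clustering of texture images on simple feature vectors; the
# features are illustrative stand-ins, not the paper's exact measures.
# Assumes scikit-learn and a list of RGB uint8 image arrays.
import numpy as np
from sklearn.cluster import KMeans

def features(img):
    """Concatenate mean color with crude texture statistics."""
    gray = img.mean(axis=2)
    gx = np.abs(np.diff(gray, axis=1)).mean()   # horizontal detail energy
    gy = np.abs(np.diff(gray, axis=0)).mean()   # vertical detail energy
    return np.array([*img.reshape(-1, 3).mean(axis=0),  # mean R, G, B
                     gray.std(), gx, gy])

def cluster_textures(images, k=8, seed=0):
    X = np.stack([features(im) for im in images])
    X = (X - X.mean(axis=0)) / X.std(axis=0)    # normalize features
    return KMeans(n_clusters=k, random_state=seed, n_init=10).fit_predict(X)
```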
Experimental determination of visual color and texture statistics for image segmentation
We consider the problem of segmenting images of natural scenes based on color and texture. A recently proposed algorithm combines knowledge of human perception with an understanding of signal characteristics in order to segment natural scenes into perceptually/semantically uniform regions. We conduct subjective tests to determine key parameters of this algorithm, which include thresholds for texture classification and feature similarity, as well as the window size for texture estimation. The goal of the tests is to relate human perception of isolated (context-free) texture patches to image statistics obtained by the segmentation procedure. The texture patches correspond to homogeneous texture and color distributions and were carefully selected to cover the entire parameter space. The parameter estimation is based on fitting statistical models to the texture data. Experimental results demonstrate that this perceptual tuning of the algorithm leads to significant improvements in segmentation performance.
Perceptual Image Representation and Display
The utility of the Perspecta 3D volumetric display for completion of tasks
Thomas R. Tyler, Andy Novobilski, Joseph Dumas, et al.
This paper explores the hypothesis that the depth cues and display quality of a 3D volumetric display provides advantages for learning simple tasks. Experimental data generated by human subjects using the Perspecta 3D Volumetric Display are compared to like data generated using a 2D flat screen liquid crystal display (LCD). These data show that the Perspecta display provides advantages over the LCD display with respect to peak performance of simple tasks.
Poster Session
Methods study for the relocation of visual information in central scotoma cases
Anne-Catherine Scherlen, Vincent Gautier
In this study we test the benefit to reading performance of different ways of relocating the visual information hidden under a scotoma. Relocation (or unmasking) compensates for the loss of information and keeps the patient from developing reading strategies that are poorly adapted to the task. Eight healthy subjects were tested on a reading task while central scotomas of various sizes were simulated. We then evaluated reading speed (words/min) under three relocation methods, in which the masked information was relocated: to both sides of the scotoma; to the right of the scotoma; or, for only the letters essential to word recognition, to the right of the scotoma. We compared these reading speeds against the pathological condition, i.e., without relocating visual information. Our results show that the unmasking strategy improves reading speed when all the visual information is unmasked to the right of the scotoma, and only for large scotomas. Taking word morphology into account, the perception of only certain letters outside the scotoma can be sufficient to improve reading speed. A deeper study of reading processes in the presence of a scotoma will then open new perspectives for visual information unmasking. The multidisciplinary competences brought by engineers, ophthalmologists, linguists, and clinicians would allow the reading benefit brought by unmasking to be optimized.
Perceptual Image Analysis
A human visual model-based approach of the visual attention and performance evaluation
In this paper, a coherent computational model of visual selective attention for color pictures is described and its performance is precisely evaluated. The model, based on several important behaviors of the human visual system, is composed of four parts: visibility, perception, perceptual grouping, and saliency map construction. This paper focuses mainly on performance assessment, through extended subjective and objective comparisons with real fixation points captured by an eye-tracking system while observers viewed the images in a task-free mode. From this ground truth, qualitative and quantitative comparisons have been made in terms of the linear correlation coefficient (CC) and the Kullback-Leibler divergence (KL). On a set of 10 natural color images, the results show that the linear correlation coefficient and the Kullback-Leibler divergence are about 0.71 and 0.46, respectively. The CC and KL measures with this model are improved by about 4% and 7%, respectively, compared to the best model proposed by L. Itti. Moreover, by comparing the ability of our model to predict eye movements produced by an average observer, we can conclude that our model succeeds quite well in predicting the spatial locations of the most important areas of the image content.
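The two scores used for the evaluation can be computed as below, given a model saliency map and a fixation density map of equal size; the direction of the KL divergence is a convention choice:

```python
# CC and KL scores between a model saliency map and a fixation density map
# (two nonnegative arrays of the same shape).
import numpy as np

def correlation_coefficient(saliency, fixations):
    return np.corrcoef(saliency.ravel(), fixations.ravel())[0, 1]

def kl_divergence(saliency, fixations, eps=1e-12):
    """KL(fixations || saliency), after normalizing both maps to sum to 1."""
    p = fixations / fixations.sum()
    q = saliency / saliency.sum()
    return np.sum(p * np.log((p + eps) / (q + eps)))
```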
Perceptual Image Representation and Display
Stylized rendering for multiresolution image representation
By integrating stylized rendering with an efficient multiresolution image representation, we enable the user to control how compression affects the aesthetic appearance of an image. Adopting a point-based rendering approach to progressive image transmission and compression, we represent an image by a sequence of color values. To best approximate the image at progressive levels of detail, a novel, adaptive farthest point sampling algorithm balances global coverage with local precision. Without storing any spatial information apart from the aspect ratio, the spatial position of each color value is inferred from the preceding members of the sampling sequence. Keeping track of the spatial influence of each sample on the rendition, a progressively generated discrete Voronoi diagram forms the common foundation for our sampling and rendering framework. This framework allows us to extend traditional photorealistic methods of image reconstruction by scattered data interpolation to encompass non-photorealistic rendering. It supports a wide variety of artistic rendering styles based on geometric subdivision or parametric procedural textures. Genetic programming enables the user to create original rendering styles through interactive evolution by aesthetic selection. Comparing our results with JPEG, we conclude with a brief overview of the implications of using non-photorealistic representations for highly compressed imagery.
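A sketch of plain greedy farthest point sampling on a pixel grid; the paper's adaptive variant also balances local approximation error against this global coverage, which is omitted here:

```python
# Greedy farthest-point sampling over a pixel grid; the adaptive variant
# described above also weighs local precision, omitted in this sketch.
import numpy as np

def farthest_point_samples(h, w, n, seed=0):
    rng = np.random.default_rng(seed)
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.column_stack([ys.ravel(), xs.ravel()]).astype(float)
    first = rng.integers(len(pts))
    chosen = [first]
    d = np.linalg.norm(pts - pts[first], axis=1)   # distance to nearest sample
    for _ in range(n - 1):
        nxt = int(d.argmax())                      # farthest from all chosen
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(pts - pts[nxt], axis=1))
    return pts[chosen].astype(int)

print(farthest_point_samples(8, 8, 4))
```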
Vision, healing brush, and fiber bundles
Todor Georgiev
The Healing Brush is a tool introduced for the first time in Adobe Photoshop (2002) that removes defects in images by seamless cloning (gradient domain fusion). The Healing Brush algorithms are built on a new mathematical approach that uses Fibre Bundles and Connections to model the representation of images in the visual system. Our mathematical results are derived from first principles of human vision, related to adaptation transforms of von Kries type and Retinex theory. In this paper we present the new result of Healing in arbitrary color space. In addition to supporting image repair and seamless cloning, our approach also produces the exact solution to the problem of high dynamic range compression and can be applied to other image processing algorithms.
Portraying Visual Reality: The Limits of Perception and Representation in Art
Where should you sit to watch a movie?
Martin S. Banks, Heather F. Rose, Dhanraj Vishwanath, et al.
When a picture is viewed from positions other than its center of projection, there can be large changes specified in the retinal image, yet the perceived spatial layout and shape of objects do not seem to change. We have shown that compensation for oblique viewing occurs provided that the viewer can estimate the slant and tilt of the picture surface accurately (Vishwanath, Girshick, & Banks, 2004). Compensation is nearly veridical with binocular viewing at close range. Compensation generally does not occur with monocular viewing through a small aperture; instead, the percept is dictated by the shape of the retinal image. The mechanism for compensation appears to operate locally; that is, separately for each part of the picture. Our findings help to explain invariance for incorrect viewing positions, and other phenomena like perceived distortions with wide fields of view and the anamorphic effect. Our findings also have relevance to the design of displays. We will discuss, for example, how the viewer’s position ought to affect percepts depending on the shape of the display surface.
A horopter for two-point perspective
Linear perspective is constructed for a particular viewing location with respect to the scene being viewed and, importantly, the location of the canvas between the viewer and the scene. Conversely, both the scene and the center of projection may be reconstructed with some knowledge of the structure of the scene. For example, if it is known that the objects depicted have symmetrical features, such as equiangular corners, the center of projection is constrained to a single line (or point) in space. For one-point perspective (with a single vanishing point for all lines that are not parallel to the canvas plane), the constraint line runs from the vanishing point perpendicular to the canvas. For two-point perspective, in which the objects depicted are oblique to the canvas, the constraint line is a semicircle joining the two vanishing points. A viewer located at any point on the circle will see the depicted objects as rectangular and symmetric, and will have no grounds for knowing that the perspective was not constructed for this viewing location (unless there are objects that are known to be square, i.e., a further symmetry constraint on the object structures). This semi-circular line of rectangular validity forms a kind of horopter for two-point perspective. Moving around this semi-circular line for an architectural scene gives the viewer the odd impression of the architecture reforming itself in credible fashion to form an array of equally plausible structures.
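The semicircular locus follows from the inscribed angle theorem. In plan (top-down) view, let V1 and V2 be the two vanishing points and O the center of projection; since the two depicted edge directions are orthogonal, O must see V1 and V2 at a right angle:

```latex
\[
  \angle\, V_1 O V_2 = 90^{\circ}
  \quad\Longleftrightarrow\quad
  \left\lVert O - \tfrac{1}{2}(V_1 + V_2) \right\rVert
  = \tfrac{1}{2}\,\lVert V_1 - V_2 \rVert ,
\]
```

that is, by Thales' theorem, O lies on the circle with V1V2 as diameter, and the half of that circle in front of the canvas is the semicircular line of rectangular validity described above.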
Quantitative analysis of qualitative images
David Hockney, Charles M. Falco
We show optical evidence that demonstrates artists as early as Jan van Eyck and Robert Campin (c1425) used optical projections as aids for producing their paintings. We also have found optical evidence within works by later artists, including Bermejo (c1475), Lotto (c1525), Caravaggio (c1600), de la Tour (c1650), Chardin (c1750) and Ingres (c1825), demonstrating a continuum in the use of optical projections by artists, along with an evolution in the sophistication of that use. However, even for paintings where we have been able to extract unambiguous, quantitative evidence of the direct use of optical projections for producing certain of the features, this does not mean that paintings are effectively photographs. Because the hand and mind of the artist are intimately involved in the creation process, understanding these complex images requires more than can be obtained from only applying the equations of geometrical optics.
Asymmetry in Lotto carpets and its implication for Hockney's optical projection theory
Recently the artist David Hockney theorized that some European painters as early as 1420 used concave mirrors and, later, converging lenses, to project real inverted images onto their canvases or other supports which they then traced and painted over. We consider a specific painting adduced as the primary evidence for this bold theory by Hockney and his collaborator, thin-film physicist Charles Falco: Lorenzo Lotto’s Husband and wife (c. 1543). These projection theorists attribute perspective anomalies in the painting to Lotto repositioning a concave mirror, specifically to overcome its limitations in depth of field. Their analysis rests thoroughly and crucially upon the assumption that the physical carpet pattern was symmetric. We point to a study of “Lotto carpets” surviving in museum collections that shows that these comparison carpets are significantly asymmetric. There seems to be no persuasive independent evidence to support the projection proponents’ assumption that these carpets, hand-knotted by children in 16th-century Turkey, were symmetric. Moreover, the angular asymmetries in these surviving carpets are nearly the same as those corresponding to the anomalies in the painting, strongly suggesting that these “anomalies” are in fact due to inherent carpet asymmetries, not to changes in configuration of an optical projector. We show that a non-optical explanation can fit the visual evidence with a precision roughly equal to that of the projection theory, but without the need to invoke a complicated, undocumented optical system. Finally, had Lotto used such an optical projector, we would expect both the general historical documentary record and Lotto’s own writings to indicate as much; however, no such corroborating evidence exists. We conclude by rejecting the numerous claims of “proof” that Lotto used optical projections when executing this painting.
Poster Session
Blink duration measurement system for drowsiness detection using image processing
Takayuki Kageyama, Masami Kato
Support systems for automobile drivers have been recently developed. One support system prevents drivers from sleeping at the wheel, but an adequate system has not yet been commercialized. To detect the drowsiness level, methods can be divided into two categories. One category uses vehicle information, such as monitoring the car’s distance to the lane marker to detect how a driver is maneuvering the steering wheel. The other category uses physical information, such as brain waves, the electrical potential of the skin or heart, eye motion, and blinking information. Among these methods, blinking information is said to reflect drowsiness most easily. Many reports say that when a driver is drowsy, the blinking duration tends to be long. So, we propose a method to prevent drivers from sleeping by detecting the blinking duration. Because this method only uses simple image processing, the algorithm is not complex. We used a digital camera and PC because we wanted to use hardware that everybody can easily prepare. The algorithm involves face localization, then eye localization, followed by measurement of the blinking duration. We tested our system on a number of frames, and the correct blinking duration was usually detected.
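A sketch of the final measurement step, given a per-frame eye-openness signal produced by the face and eye localization stages; the signal and threshold below are illustrative assumptions:

```python
# Blink-duration measurement from a per-frame eye-openness signal (e.g.,
# eyelid gap after face and eye localization); the signal and threshold
# here are illustrative assumptions.
import numpy as np

def blink_durations(openness, fps, closed_thresh=0.3):
    """Return the duration in seconds of each run of closed-eye frames."""
    closed = np.asarray(openness) < closed_thresh
    padded = np.concatenate([[False], closed, [False]]).astype(int)
    edges = np.flatnonzero(np.diff(padded))   # starts and ends of runs
    starts, ends = edges[0::2], edges[1::2]
    return (ends - starts) / fps

signal = [1, 1, .2, .1, .2, 1, 1, .2, .2, .1, .1, .2, 1]   # two blinks
print(blink_durations(signal, fps=30))   # the second blink is longer
```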
Human recognition by body shape features
Ming Du, Ling Guan
Non-invasive biometrics is of particular importance because of its applications in surveillance environments. Although traditional research in this field has mostly focused on gait recognition, features based on human body shape are an alternative we can rely on. Here we propose a body-shape-based identification system and explore its distinguishing power in biometrics. Robust image processing procedures such as the Wiener filter are implemented to extract binary silhouettes from frontal-view human walking video. The Kalman filter, usually adopted as a powerful tool to facilitate tracking in computer vision applications, here functions as a reliable estimator to recover body shape information from the corrupted observations. The dynamically extracted static feature vectors are then compared to templates to achieve identification. We provide experimental results to demonstrate the performance of our system.
An artifacts-based video quality metric using fuzzy logic
Wei Dai, Zhen Cai, William E. Lynch
In this paper, a Quality Metric (QM) called the Artifacts-based Video Quality Metric (AVQM) is proposed. Three avenues of innovation are exploited. 1) The context of the video is used so that artifacts in different surroundings are treated differently. 2) A measurement is made of the block-flashing artifact. 3) A non-linear method of combining artifact measures using fuzzy methods is employed. Simulation results indicate that these significantly improve performance.
Measuring the negative impact of frame dropping on perceptual visual quality
Zhongkang Lu, Weisi Lin, Boon Choong Seng, et al.
The work presented in the paper includes two parts: first, we measured the detectability and annoyance of frame dropping's effect on perceived visual quality under different motion and frame-size conditions. Then, a new logistic function and an effective yet simple motion content representation were selected to model, in one formula, the relationship among motion, frame rate, and the negative impact of frame dropping on visual quality. The high Pearson and Spearman correlations between the MOS and the predicted MOSp, as well as the results of the other two error metrics, confirm the success of the selected logistic function and motion content representation.
Minimization of color shift generated in RGBW quad structure
Hong Chul Kim, Jae Kyeong Yun, Heume-Il Baek, et al.
The purpose of RGBW Quad Structure Technology is to realize higher brightness than that of a normal panel (RGB stripe structure) by adding a white sub-pixel to the existing RGB stripe structure. However, there is a side effect called 'color shift' that results from the increased brightness. This side effect degrades general color characteristics due to changes in hue, brightness, and saturation as compared with the existing RGB stripe structure. In particular, skin-tone colors show a tendency to get darker in contrast to the normal panel. We have tried to minimize the color shift through the use of an LUT (Look Up Table) for linear arithmetic processing of input data, data bit expansion to 12 bits to minimize arithmetic tolerance, and a brightness weight of the white sub-pixel on each R, G, B pixel. The objective of this study is to minimize and keep the Δu'v' value (commonly used to represent a color difference), the quantitative measure of the color difference between the RGB stripe structure and the RGBW quad structure, below the 0.01 level (currently 0.02 or higher), using the Macbeth ColorChecker as a general reference for color characteristics.
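For reference, Δu'v' is the Euclidean distance in the CIE 1976 uniform chromaticity scale diagram, with u' and v' computed from tristimulus values:

```latex
\[
  u' = \frac{4X}{X + 15Y + 3Z}, \qquad
  v' = \frac{9Y}{X + 15Y + 3Z},
\]
\[
  \Delta u'v' \;=\; \sqrt{\,(u'_{1} - u'_{2})^{2} + (v'_{1} - v'_{2})^{2}\,} .
\]
```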
Using perceptually based face indexing to facilitate human-computer collaborative retrieval
John Arthur Black Jr., Laurent Bonnasse, Pallavi Satyan, et al.
While methods such as Principal Component Analysis, Linear/Fisher Discriminant Analysis, and Hidden Markov Models provide useful similarity measures between face images, they are not based on factors that humans use to perceive facial similarity. This can make it difficult for humans to work collaboratively with face retrieval systems. For example, if a witness to a crime uses a query-by-example paradigm to retrieve the face of the perpetrator from a database of mug-shots, and if the similarity measures used for retrieval are not based on facial features that are salient or important to humans, the retrievals will likely be of limited value. Based on the observation that humans tend to name things that are particularly salient or important to them, this research uses words (such as bearded, bespectacled, big eared, blond, buck-toothed, bug-eyed, curly-haired, dimpled, freckled, gap-toothed, long-faced, snub-nosed, thin-lipped, or wrinkled) to manually index face images. Pair-wise similarity values are then derived from the resulting feature vectors and are compared to ground-truth similarity values, which have been established by having humans hierarchically sort the same set of face images. This comparison indicates which words are most important for indexing the face images, allows the computation of a weighting factor for each word to enhance the overall quality of indexing, and suggests which facial features might provide a more intuitive basis for evaluating similarity.
Investigating affective color association of media content in language and perception based on online RGB experiment
As an investigation of color categorization in language and perception, this research surveyed the affective associations between certain colors and different media contents (movie genres). Compared to scientific graphics, entertainment graphics are designed to deliver emotionally stimulating content to audiences. Based on an online color survey of 19 subjects, this study investigated whether or not subjects had different color preferences for diverse movie genres. Instead of providing a predefined, limited number of color chips (or pictures) as stimuli, this study asked the subjects to visualize their own images of movie genres and to select their preferred colors through an RGB color palette. To compare the distributions across movie genres, the user-selected colors were mapped on the CIE chromaticity diagram. The subjects also described both their preferred color naming for different movie genres and three primary color names for their favorite movie genre. The results showed that the subjects had different color associations with specific movie genres, and certain genres showed higher individual differences. Regardless of genre differences, the subjects selected blue, red, or green as the three primary color names representing their favorite movie genres. The results support Berlin & Kay’s eleven color terms.
Memorial Session in Honor of Bela Julesz
Spatio-temporal interactions that promote the smoothness constraint for binocular matches
Clifton M. Schor, Zhi-Lei Zhang
Early in his career, Bela Julesz introduced the stereo matching problem while working at Bell Labs on an encryption project. The common belief at that time was based on Wheatstone’s proposal that 2-D space perception of form preceded coding of disparity for 3-D space perception. However, with the random-dot stereogram, Julesz demonstrated that stereoscopic depth could be perceived in the absence of any identifiable objects or perspective cues available to either eye alone. This work inspired many algorithms for binocular matching including the smoothness constraint. Wheatstone’s and Julesz’s proposals as to whether binocular matches are solved at a low level, prior to form perception, or after form is perceived are still debated. We have examined spatio-temporal interactions that promote binocular matches and yield percepts of smooth surfaces in depth. We identified low-level processes for estimating depth differences between surface patches that require their proximity in both time and space, and a high level process that minimizes their depth differences when surface texture of adjacent patches appears to belong to the same surface. This suggests that the stereo-matching solution is influenced by a priori assumptions about the surface configuration of the scene and by monocular and binocular spatial cues.
Poster Session
Annoyance of individual artifacts in MPEG-2 compressed video and their relation to overall annoyance
Chin Chye Koh, Sanjit K. Mitra, John M. Foley, et al.
In this work we describe a study on a limited number of artifacts in MPEG-2 compressed video, with the primary aim of analyzing how annoying these compression artifacts are. More specifically, the objectives were: (1) to determine how the subjective annoyances of individual artifacts contribute to the overall annoyance, (2) to obtain the subjective ranks of the artifacts, and (3) to determine what relationships exist between annoyance values and annoyance ranks. To this end, a psychophysical experiment was carried out in which observers provided us with their subjective assessment of video sequences. The results showed that at low compression bit-rates, the blocking artifacts were the most annoying, whereas at higher compression bit-rates the ringing artifacts were the most annoying. The blocking artifact had the highest mean annoyance rank across all videos and compression bit-rates. Mean annoyance values were ordered differently from mean annoyance ranks due to very high annoyance values associated with some artifacts. Overall annoyance was related to the total squared error by a logistic function. Individual artifact annoyance was related to overall annoyance by a weighted Minkowski metric.
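A sketch of fitting the logistic mapping from total squared error to mean annoyance; the three-parameter form below is one common choice, not necessarily the paper's exact parameterization, and the data points are illustrative placeholders. Assumes SciPy:

```python
# Fitting a logistic mapping from total squared error to mean annoyance;
# this three-parameter form is one common choice, and the data points are
# illustrative placeholders, not the paper's measurements. Assumes SciPy.
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, a, b, c):
    return a / (1.0 + np.exp(-(x - b) / c))

tse = np.array([1e3, 5e3, 1e4, 5e4, 1e5, 5e5])   # total squared error
mav = np.array([ 5., 12., 22., 55., 78., 92.])   # mean annoyance values
params, _ = curve_fit(logistic, np.log10(tse), mav, p0=[100., 4., 1.])
print(params)
```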
Perceptual Image Analysis
Pictorial relief for equiluminant images
Andrea J. van Doorn, Huib de Ridder, Jan J. Koenderink
Pictorial relief depends strongly on “cues” in the image. For isoluminant renderings some cues are missing, namely all information that is related to luminance contrast (e.g., shading, atmospheric perspective). It has been suggested that spatial discrimination and especially pictorial space suffer badly in isoluminant conditions. We have investigated the issue through quantitative measurement of pictorial depth-structure under normal and isoluminant conditions. As stimuli we used monochrome halftone photographs, either as such, or “transposed” to Red/Green or Green/Red hue modulations. We used two distinct methods, one to probe pictorial pose (by way of correspondences settings between pictures of an object in different poses), the other to probe pictorial depth (by way of attitude settings of a gauge figure to a perceptual “fit”). In both experiments the depth reconstructions for Red/Green, Green/Red and monochrome conditions were very similar. Moreover, observers performed equally well in Red/Green, Green/Red and monochrome conditions. Thus, the general conclusion is that observers did not do markedly worse with the isoluminant Red/Green and Green/Red transposed images. Whereas the transposed images certainly looked weird, they were easily interpreted. Much of the structure of pictorial space was apparently preserved. Thus the notion that spatial representations are not sustained under isoluminant conditions should be applied with caution.
Special Session: VALVE I
Perceiving simulated ego-motions in virtual reality: comparing large screen displays with HMDs
Bernhard E. Riecke, Joerg Schulte-Pelkum, Heinrich H. Buelthoff
In this keynote I will present some of the work from our virtual reality laboratory at the Max Planck Institute for Biological Cybernetics in Tübingen. Our approach to understanding the brain is to study human information processing in experimental settings that are as close as possible to our natural environment. Using computer graphics and virtual reality technology, we can now study perception not only in a well-controlled natural setting but also in a closed perception-action loop, in which the actions of the observer change the input to the senses. In psychophysical studies we have shown that humans can integrate multimodal sensory information in a statistically optimal way, with cues weighted according to their reliability. A better understanding of multimodal sensor fusion will allow us to build new virtual reality platforms in which the design effort devoted to visual, auditory, haptic, vestibular, and proprioceptive simulation is apportioned according to the weight of each cue in multimodal sensor fusion.
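The statistically optimal weighting referred to here is usually formalized as maximum-likelihood cue combination, in which each cue's weight is proportional to its reliability (inverse variance). The sketch below uses illustrative numbers, not data from these studies:

```python
import numpy as np

def combine_cues(estimates, variances):
    """Reliability-weighted (maximum-likelihood) fusion of cue estimates."""
    est = np.asarray(estimates, float)
    var = np.asarray(variances, float)
    r = 1.0 / var                      # reliabilities
    w = r / r.sum()                    # normalized weights
    fused = (w * est).sum()
    fused_var = 1.0 / r.sum()          # fused estimate is never less reliable
    return fused, fused_var

# e.g., a visual cue says 10.0 (variance 1.0), a vestibular cue says 12.0
# (variance 4.0): combine_cues([10.0, 12.0], [1.0, 4.0]) -> (10.4, 0.8)
```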
Importance of perceptual representation in the visual control of action
Jack M. Loomis, Andrew C. Beall, Jonathan W. Kelly, et al.
In recent years, many experiments have demonstrated that optic flow is sufficient for visually controlled action, with the suggestion that perceptual representations of 3-D space are superfluous. In contrast, recent research in our lab indicates that some visually controlled actions, including some thought to be based on optic flow, are indeed mediated by perceptual representations. For example, we have demonstrated that people are able to perform complex spatial behaviors, like walking, driving, and object interception, in virtual environments which are rendered visible solely by cyclopean stimulation (random-dot cinematograms). In such situations, the absence of any retinal optic flow that is correlated with the objects and surfaces within the virtual environment means that people are using stereo-based perceptual representations to perform the behavior. The fact that people can perform such behaviors without training suggests that the perceptual representations are likely the same as those used when retinal optic flow is present. Other research indicates that optic flow, whether retinal or a more abstract property of the perceptual representation, is not the basis for postural control, because postural instability is related to perceived relative motion between self and the visual surroundings rather than to optic flow, even in the abstract sense.
Pilot behavior and course deviations during precision flight
Jeffrey B. Mulligan, Xavier L. C. Brolly
In the fall of 2003, a series of flight tests was performed in the Tullahoma, Tennessee area to assess the ability of non-instrument-rated helicopter pilots to fly precision routes with the aid of a Global Positioning System (GPS) receiver. The navigation performance of the pilot subjects was assessed from GPS recordings of the flight trajectory, while pilot behavior was recorded using four video cameras, two of them attached to a goggle frame worn by the pilot. This paper describes the processing methods developed for these data and presents some preliminary results.
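As an illustration of how course deviation might be scored from such recordings (this is a hedged sketch, not the paper's processing pipeline), one can compute the signed perpendicular distance of each GPS fix from a planned straight leg, using a flat-earth approximation that is adequate over short legs:

```python
import numpy as np

def cross_track_errors(track, leg_start, leg_end):
    """track: (N, 2) array of (lat, lon) fixes in degrees; leg_*: (lat, lon)."""
    track, leg_start, leg_end = map(np.asarray, (track, leg_start, leg_end))
    m_per_deg = np.array([111_320.0,                                      # latitude
                          111_320.0 * np.cos(np.radians(leg_start[0]))])  # longitude
    p = (track - leg_start) * m_per_deg       # fixes in meters, leg start at origin
    u = (leg_end - leg_start) * m_per_deg
    u = u / np.linalg.norm(u)                 # unit vector along the planned leg
    return u[0] * p[:, 1] - u[1] * p[:, 0]    # signed cross-track distance (m)

# e.g., cross_track_errors([[35.38, -86.25]], (35.35, -86.30), (35.45, -86.20))
```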
Special Session: VALVE II
Intercepting moving targets: why the hand's path depends on the target's velocity
Eli Brenner, Jeroen B.J. Smeets
In order to intercept a moving target, one must reach some position at the same moment as the target. Considering that moving towards such a position takes time, it seems obvious that one must determine well in advance where one can best intercept the target. However, experiments on hitting moving targets have shown that the hand takes different paths when intercepting targets that move at different velocities, even when the targets are hit at the same position. This is particularly evident at high target velocities, which seems strange, because the benefit of anticipating the target's velocity should be largest for fast targets. Here we propose that the curvature of the path may differ intentionally for different target velocities in order to maximize the chance of hitting the target. Arriving at the target with a velocity that matches the target's can reduce the consequence of certain temporal errors. In particular, if the path curves in a way that makes the component of the hand's final velocity orthogonal to the hitting direction exactly match the velocity of the target, then no additional error arises from arriving at the target slightly earlier or later than expected. On the other hand, moving along a curved path is likely to increase the spatial errors. We argue that a compromise between these two influences could account for the differences between paths towards fast and slow targets.
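This timing argument can be stated in one line (notation ours, not the authors'):

```latex
% If the hand arrives at the planned interception point with velocity
% component v_h orthogonal to the hitting direction while the target moves
% at velocity v_t, a timing error \Delta t displaces the contact point by
\[
  e(\Delta t) = \left( v_t - v_h \right) \Delta t ,
\]
% which vanishes when v_h = v_t: matching the orthogonal velocity component
% buys immunity to temporal errors, at the cost of a longer, curved path.
```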
Separating the edge-based detection of object motion from the detection of objectless motion energy: implications for visually guided locomotion
Evidence is provided for independent motion pathways that can serve to discriminate the motion of objects from the optic flow produced by the perceiver's egomotion, the latter based on detecting motion energy. Motion energy models are founded on the idea that low-level motion perception entails the detection of spatiotemporal changes in raw luminance (i.e., oriented energy), irrespective of the boundaries that segregate objects from their background and/or delineate the parts of objects. In the current study, it was shown that the distinction between motion based on detecting an object's edges and motion based on detecting motion energy corresponds to Wertheimer's distinction between beta motion and objectless phi motion. Evidence came from a stimulus for which luminance increments spread in one direction, but in a way that created stimulus information specifying successive edge motions in the opposite direction. Objectless phi motion is perceived only for brief frame durations (high speeds). Beta motion is perceived for relatively long frame durations (slower speeds) when luminance contrast decreases at one edge and simultaneously increases at another. These results, which cannot be accounted for by attentive feature tracking, indicate that there are independent mechanisms for detecting object motion and detecting objectless motion energy.
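A minimal sketch of the kind of motion-energy computation referred to here, in the style of Adelson and Bergen's model: quadrature pairs of oriented spatiotemporal filters are squared and summed, giving a phase-independent (hence "objectless") directional signal. Filter parameters are illustrative, and the sign conventions merely fix which direction each pair prefers:

```python
import numpy as np
from math import factorial
from scipy.signal import fftconvolve

x = np.arange(-8, 9)                        # space (pixels)
t = np.arange(15)                           # time (frames)

even_s = np.exp(-x**2 / 8.0) * np.cos(2*np.pi*x/8)   # quadrature spatial pair
odd_s  = np.exp(-x**2 / 8.0) * np.sin(2*np.pi*x/8)

def temporal(n, k=1.0):                     # biphasic temporal impulse response
    kt = k * t
    return kt**n * np.exp(-kt) * (1/factorial(n) - kt**2/factorial(n + 2))

fast_t, slow_t = temporal(3), temporal(5)

# Separable space-time filters, combined into direction-selective pairs.
ef, es = np.outer(fast_t, even_s), np.outer(slow_t, even_s)
of, os_ = np.outer(fast_t, odd_s), np.outer(slow_t, odd_s)
right = (of + es, -ef + os_)                # quadrature pair, one direction
left  = (-of + es, ef + os_)                # quadrature pair, other direction

def energy(stim, pair):                     # phase-independent motion energy
    return sum(fftconvolve(stim, f, mode='same')**2 for f in pair)

# Drifting grating as a test stimulus (time is the first axis).
T, X = np.meshgrid(np.arange(64), np.arange(128), indexing='ij')
stim = np.cos(2*np.pi*(X/8.0 - T/8.0))
opponent = energy(stim, right) - energy(stim, left)   # net directional signal
```

Note that the energy stage responds to raw oriented luminance change wherever it occurs, with no reference to edges or object boundaries, which is exactly the property the experiments above exploit.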
Achieving near-correct focus cues in a 3D display using multiple image planes
Simon J. Watt, Kurt Akeley, Ahna R. Girshick, et al.
Focus cues specify inappropriate 3-D scene parameters in conventional displays because the light comes from a single surface, independent of the depth relations in the portrayed scene. This can lead to distortions in perceived depth, as well as discomfort and fatigue due to the differing demands on accommodation and vergence. Here we examine the efficacy of a stereo-display prototype designed to minimize these problems by using multiple image planes to present near-correct focus cues. Each eye's view is the sum of several images presented at different focal distances. Image intensities are assigned according to the dioptric distance of each image plane from the portrayed object, determined along visual lines. The stimulus to accommodation is more consistent with the portrayed depth than in conventional displays, but it still differs from the stimulus in equivalent real scenes. Compared to a normal, fixed-distance display, observers showed improved stereoscopic performance on several psychophysical tasks, including speed of fusing stereoscopic images, precision of depth discrimination, and accuracy of perceived depth estimates. The multiple-image-planes approach thus provides a practical solution to some shortcomings of conventional displays.
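One simple assignment rule consistent with this description splits each visual line's intensity between the two image planes that bracket the portrayed point, interpolating linearly in dioptric distance. This is a sketch under that assumption; the plane placements in the example are illustrative, not the prototype's exact geometry:

```python
import numpy as np

def plane_weights(point_diopters, plane_diopters):
    """Split unit intensity across image planes for one portrayed point."""
    d = np.asarray(plane_diopters, float)          # e.g., [2.0, 1.5, 1.0, 0.5] D
    w = np.zeros_like(d)
    if point_diopters >= d.max():                  # nearer than the nearest plane
        w[d.argmax()] = 1.0
    elif point_diopters <= d.min():                # farther than the farthest plane
        w[d.argmin()] = 1.0
    else:
        order = np.argsort(d)                      # planes from far to near
        i = np.searchsorted(d[order], point_diopters)
        far, near = order[i - 1], order[i]         # the two bracketing planes
        f = (point_diopters - d[far]) / (d[near] - d[far])
        w[near], w[far] = f, 1.0 - f               # linear in diopters
    return w                                       # weights sum to 1

# e.g., plane_weights(1.2, [2.0, 1.5, 1.0, 0.5]) -> 40% at 1.5 D, 60% at 1.0 D
```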
Visual illusions: pointing the finger at the Judd illusion
Andrew Dunn, Peter Thompson
Attempts to demonstrate dual-route processing of vision for action and vision for perception have yielded mixed results. Early work suggested that motor actions, unlike perceptual responses, were unaffected by visual illusions. However, it has been argued that these experiments were methodologically flawed and that the evidence actually supports a unitary-representation account. We have examined perception of, and pointing at, the Judd illusion. In Experiment 1 we compared immediate object-directed pointing with perceptual line-matching at the ends and unmarked midpoints of the left- and right-facing versions of the Judd illusion. In Experiment 2 we compared immediate and delayed (4 s) pointing and line-matching performance using the right-facing version. We found that although both pointing and matching were affected by the illusion, the pattern and magnitude of the errors differed across modality. Immediate pointing was generally less accurate (bigger errors) than line-matching; delayed pointing accuracy improved (errors were reduced) while line-matching accuracy remained unchanged. We argue that these data fit neither the unitary nor the standard dual-route account, and are best understood in the context of a two-stage dual-route model. We suggest that looking for differences in the pattern of results may be a more useful approach than focusing on null effects in the motor task.
Special Session: VALVE III
Spatial awareness in immersive virtual environments revealed in open-loop walking
Kathleen A. Turano, Sidhartha Chaudhury
As we move, we receive feedback from environmental information and internal self-motion cues (proprioception). This co-variation serves to calibrate our action system with respect to the environment and is integral to knowing where we are within a body-scaled space. While the calibration established in the real world is robust enough to support walking without vision to a previously seen target, we propose that the action system needs to be recalibrated when scenes are viewed in a virtual environment (VE). We present results from experiments in which subjects walked without vision to targets in briefly displayed scenes from virtual and real environments. The only feedback from external sources was a single beep emitted at the end of a trial, implicitly signaling the target distance. Unlike performance in the real world, in the initial trials within the VE the subjects' egocentric reference frame shifted in concert with the changing scene context. Over time, subjects became less dependent on the unreliable scene context and performance in the VE approached that in the real world. The change in behavior over time is consistent with subjects adopting a more consistent external cue (the beep) to calibrate their action systems. Supported by NIH EY07839 to KAT.
Walking simulator for evaluation of ophthalmic devices
James Barabas, Russell L. Woods, Eli Peli
Simulating mobility tasks in a virtual environment reduces risk to research subjects and allows improved experimental control and measurement. We are currently using a simulated shopping-mall environment (in which subjects walk on a treadmill in front of a large projected video display) to evaluate a number of ophthalmic devices developed at the Schepens Eye Research Institute for people with vision impairment, particularly visual field defects. We have conducted experiments on subjects' perception of "safe passing distance" when walking towards stationary obstacles. The subjects' binary responses about potential collisions are analyzed by fitting a psychometric function, which yields an estimate of the perceived safe passing distance and of the variability of the responses. The system also supports simulation of visual field defects using head and eye tracking, enabling a better understanding of the impact of visual field loss. The technical infrastructure for our simulated walking environment includes a custom eye- and head-tracking system, a gait-feedback system to adjust treadmill speed, and a handheld 3-D pointing device. Images are generated by a graphics workstation from a model incorporating photographs of storefronts from an actual shopping mall, where concurrent validation experiments are being conducted.
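The psychometric analysis described can be sketched as a maximum-likelihood logistic fit to the binary collision judgments. All data values below are hypothetical:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data: lateral clearance (m), trials per level, "collision" counts.
clearance = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 1.0])
n_trials  = np.array([10, 10, 10, 10, 10, 10, 10, 10])
n_collide = np.array([10,  9,  8,  6,  4,  2,  1,  0])

def neg_log_lik(params):
    pse, width = params
    p = 1.0 / (1.0 + np.exp((clearance - pse) / width))   # P("collide")
    p = np.clip(p, 1e-9, 1 - 1e-9)
    return -(n_collide * np.log(p) + (n_trials - n_collide) * np.log(1 - p)).sum()

fit = minimize(neg_log_lik, x0=[0.4, 0.1], method='Nelder-Mead')
pse, width = fit.x
# pse: clearance judged unsafe half the time (perceived safe passing distance);
# width: steepness of the transition, reflecting response variability.
```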
Matching visual and nonvisual signals: evidence for a mechanism to discount optic flow during locomotion
Adrian Thurrell, Adar Pelah
We report on recent experiments investigating the Arthrovisual Locomotor Effect (ALE), a postulated mechanism, based on non-visual signals, that discounts or removes self-generated visual motion signals during locomotion. We show that perceptual matches made by standing subjects to a constant-velocity optic-flow stimulus viewed while walking on a treadmill are reduced linearly with walking speed, a measure of the reported ALE. The degree of reduction in perceived speed depends on the similarity of the motor activity to natural locomotion; for the four activities tested, ALE strength was ranked as follows: Walking > Cycling > Hand Pedalling > Finger Tapping = 0. Other variations and important controls for the ALE are described.
Mixed visual reference frames: perceiving nonretino-centric visual quantities in a retino-centric frame
A. V. van den Berg, R. van Ee, Andre J. Noest
It is a useful competence to see motion relative to the head or to the external world, although those quantities are not directly given on the retina. The same holds for judging the shape of an object. We argue that the required transformations can be, and are, done independently of the associated direction transformations. This involves perceptual channels with retinal apertures but non-retinocentric motion or shape sensitivity. To arrive at units that perform such a mixed transformation, the substructure of the retinotopic receptive field must be dynamically adjusted using extra-retinal signals (or equivalent measures such as vertical disparity). Here we show that detectors tuned to disparity curvature × retinal direction can extract (metric) object curvature from the retinal disparity field in one step. We point out the correspondence to a previously proposed model of heading detection, which contains detectors that become tuned to head-centric flow by dynamically changing their preferred flow-field structure within a retinal aperture, depending on the eye movement.
Special Session: VALVE IV
The use of visual and nonvisual cues in updating the perceived position of the world during translation
Laurence R. Harris, Richard T. Dyde, Michael R. Jenkin
During self-motion, the perceived positions of objects remain fixed in perceptual space. This requires that their perceived positions be updated relative to the viewer. Here we assess the roles of visual and non-visual information in this spatial updating. To investigate the role of visual cues, observers sat in an enclosed, immersive virtual environment formed by six rear-projection screens. A simulated room was presented stereographically and shifted relative to the observer. A playing card, whose movement was phase-locked to the room, floated in front of the subject, who judged whether the card was displaced more or less than the room. Surprisingly, perceived stability occurred not when the card's movement matched the room's displacement, but when perspective alignment was maintained and parallax between the card and the room was removed. The role of the complementary non-visual cues was investigated by physically moving subjects in the dark. Subjects judged whether a floating target was displaced more or less than if it were earth-stable. To be judged earth-stationary, the target had to move in the same direction as the observer, more so if the movement was passive. We conclude that veridical spatial updating simultaneously requires visual and non-visual cues to self-motion and active involvement in the movement.
Perception of object movement during self-movement
Simon K. Rushton, Paul A. Warren
Motion of the image of an object across the retina may be due to movement of the object, movement of the observer, or a combination of the two. The human brain has a well-documented sensitivity to "flow", the characteristic pattern of retinal motion resulting from movement of the observer's eye through the environment (self-movement). If the pattern of flow due to self-movement could be parsed out, then any remaining retinal motion could be attributed to movement of an object within the environment (object-movement). We review the results of three recent studies on motion detection, induced movement, and visual search. The results of all three studies are compatible with the flow-parsing hypothesis described above. The commonly held assumption that the primary role of flow processing is the guidance of locomotion has been disputed; here we suggest an alternative role, in which flow processing does not control locomotion but compensates for it.
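At its core, the flow-parsing hypothesis is a subtraction: discount the flow predicted from self-movement and attribute the residual to objects. The sketch below is our illustration (with a pure-translation flow field expanding from a focus of expansion), not the authors' model; all names and numbers are assumptions:

```python
import numpy as np

def self_flow(points, foe, rate):
    """Radial expansion flow from forward translation, about the FOE."""
    return rate * (points - foe)

def parse_flow(retinal_flow, points, foe, rate):
    """Residual motion after discounting self-movement -> object motion."""
    return retinal_flow - self_flow(points, foe, rate)

pts = np.array([[2.0, 1.0], [-1.0, 3.0]])            # image positions (deg)
measured = np.array([[0.25, 0.10], [-0.10, 0.80]])   # measured flow (deg/s)
residual = parse_flow(measured, pts, foe=np.array([0.0, 0.0]), rate=0.1)
# residual ~ [[0.05, 0.0], [0.0, 0.5]]: the second point carries object motion.
```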
Investigations on the interactions between vision and locomotion using a treadmill virtual environment
William B. Thompson, Sarah H. Creem-Regehr, Betty J. Mohler, et al.
Treadmill-based virtual environments have the potential to allow near-natural locomotion through large-scale simulated spaces. To be effective, such devices need to provide users with visual and biomechanical sensations of walking that are sufficiently accurate to evoke perception-action couplings comparable to those occurring in the real world. We are exploring this problem using a custom-built, computer-controlled treadmill with a 6' by 10' walking surface, coupled to computer graphics presented on wide-field-of-view back-projection screens. The system can also apply forces to the user to simulate walking on slopes and the effects of changes in walking speed. We have demonstrated the effectiveness of this system by showing that the perceptual-motor calibration of human locomotion in the real world can be altered by prior walking in the treadmill virtual environment when the visual flow associated with self-motion is mismatched relative to biomechanical walking speed. The perceptual-motor coupling that we have achieved is sufficient to allow investigation of a number of open questions, including the effect of walking on slopes on the visual estimation of slant, and visual influences on gait and walking speed.
Virtual odometry from visual flow
Markus Lappe, Harald Frenz, Thomas Buehrmann, et al.
We investigate how visual motion registered during one's own movement through a structured world can be used to gauge travel distance. Estimating absolute travel distance from the visual flow induced in the optic array of a moving observer is problematic because optic flow speeds co-vary with the dimensions of the environment and are thus subject to an environment-specific scale factor. Discriminating the distances of two simulated self-motions of different speed and duration is nevertheless reliably possible from optic flow if the visual environment is the same for both motions, because the scale factors then cancel. Here we ask whether a distance estimate obtained from optic flow can be transformed into a spatial interval in the same visual environment. Subjects viewed a simulated self-motion sequence on a large (90° by 90°) projection screen or in a computer-animated virtual environment (CAVE) with completely immersive, stereographic, head-yoked projection that extended 180° horizontally and included the floor space in front of the observer. The sequence depicted self-motion over a ground plane covered with random dots. Simulated distances ranged from 1.5 to 13 meters, with variable speed and duration of movement. After the movement stopped, the screen depicted a stationary view of the scene and two horizontal lines appeared on the ground in front of the observer. The subject adjusted one of the lines so that the spatial interval between the lines matched the distance traveled during the movement simulation. Adjusted interval size was linearly related to simulated travel distance, suggesting that observers could obtain a measure of distance from the optic flow. The slope of the regression was 0.7; thus subjects underestimated distance by 30%. This result was similar for stereoscopic and monoscopic conditions. We conclude that optic flow can be used to derive an estimate of travel distance, but this estimate is subject to scaling when compared to static intervals in the environment, irrespective of stereoscopic depth cues.
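The scale-cancellation argument can be made explicit (notation ours):

```latex
% With an unknown environmental scale factor $s$ (eye height, dot spacing),
% flow speed during self-motion at speed $v(t)$ is $\dot\theta(t) \propto v(t)/s$,
% so a flow-based distance estimate carries the unknown scale,
\[
  \hat d \;\propto\; s \int \dot\theta(t)\, dt ,
  \qquad\text{but}\qquad
  \frac{\hat d_1}{\hat d_2} \;=\; \frac{\int \dot\theta_1(t)\, dt}{\int \dot\theta_2(t)\, dt} ,
\]
% i.e., $s$ cancels whenever two motions occur in the same environment,
% permitting reliable discrimination without absolute calibration.
```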
The perception of linear self-motion
Frank H. Durgin, Laura F. Fox, Evan Schaffer, et al.
VR lends itself to the study of intersensory calibration in self-motion perception. However, proper calibration of visual and locomotor self-motion in VR is made complicated by the compression of perceived distance and by unfamiliar modes of locomotion. Although adaptation is fairly rapid with exposure to novel sensorimotor correlations, here we show that good initial calibration is found when both (1) the virtual environment is richly structured in near space (1 m) and (2) locomotion is on solid ground. Previously it had been observed that correct visual speeds seem too slow when walking on a treadmill. Several principles may be involved, including inhibitory sensory prediction (Durgin et al., in press), distance compression, and missing peripheral flow in the reduced FOV (Banton et al., in press). However, though a richly-structured near-space environment provides higher rates of peripheral flow, it does not improve calibration when walking on a treadmill. Conversely, walking on solid ground does not improve calibration in an empty (though well-textured) virtual hallway. Because walking on solid ground incorporates well-calibrated mechanisms that can assess speed of self-motion independent of vision, our observations suggest that near space is also better calibrated in our HMD. Near-space obstacle avoidance systems may also be involved.
Perceptual Image Analysis
Transient-based image segmentation: top-down surround suppression in human V1
Previously we studied the effect of the spatiotemporal pattern of transients on perceptual organization. Transient synchrony/asynchrony was critical in novel illusions of contextual motion (Likova & Tyler, 2002, 2003a, b). We found that strong image segmentation can be generated from transient asynchronies in fields of homogeneous visual noise, a phenomenon that we term 'Structure-from-Transients' (SfT). Here we used fMRI to reveal the cortical mechanisms involved in SfT. The stimuli were random-dot fields of 30 x 40°, replaced by uncorrelated dots every 500 ms. Asynchronous updates in subregions of the random-dot field result in SfT. Exp. 1: Figure/ground organization was generated in the test stimuli by transient asynchrony between a figure area (a horizontal noise strip, 8 x 40°) and its surround. In the null stimuli, however, the transient changes were synchronized, generating no SfT. Thus the global percept switched from figure/ground (test) to a homogeneous field (null) every 9 s, in 36 blocks per scan. Exp. 2: Figure/ground organization was eliminated by segmenting the field into equal horizontal SfT stripes. We found a dramatic reorganization of the cortical activation pattern with manipulation of the perceptual SfT organization. Exp. 1 revealed excitation of hMT/V5+ and figure/ground-specific top-down suppression of the background region in V1. Both were abolished by eliminating the figure/ground organization with multiple SfT stripes, which instead activated the higher dorsal and ventral tier retinotopic areas. The results support a recurrent architecture with functional feedback loops, exhibiting complex spatiotemporal behavior when a figure/ground organization is extracted from its specific 'generator'. Our study reveals that, at a global level, the brain makes important use of asynchrony as a relation that structures the spatiotemporal visual input.
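The figure/ground manipulation can be sketched in a few lines: the figure strip differs from the ground only in the timing of its dot-field refreshes. Frame sizes, rates, and the update offset below are illustrative stand-ins for the stimulus parameters above, and setting the offset to zero yields the synchronized null stimulus:

```python
import numpy as np

h, w = 240, 320                         # field size in pixels (illustrative)
frame_ms, update_ms, offset_ms = 50, 500, 250
figure_rows = slice(h//2 - 30, h//2 + 30)   # horizontal figure strip

def fresh_noise(shape, rng):
    return rng.random(shape) < 0.5      # binary random-dot field

rng = np.random.default_rng(0)
ground = fresh_noise((h, w), rng)
figure = fresh_noise((h, w), rng)
frames = []
for i in range(40):
    t = i * frame_ms
    if t % update_ms == 0:
        ground = fresh_noise((h, w), rng)       # ground refresh
    if (t - offset_ms) % update_ms == 0:
        figure = fresh_noise((h, w), rng)       # figure refresh, out of phase
    frame = ground.copy()
    frame[figure_rows] = figure[figure_rows]
    frames.append(frame)
# Every static frame is homogeneous noise; the strip segregates perceptually
# only because its transients are asynchronous with the surround's.
```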