Proceedings Volume 8651

Human Vision and Electronic Imaging XVIII


View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 26 March 2013
Contents: 15 Sessions, 52 Papers, 0 Presentations
Conference: IS&T/SPIE Electronic Imaging 2013
Volume Number: 8651

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 8651
  • Keynote Session
  • Lightness and Color
  • Vision and the Evolution of Technology
  • Early Vision Image Quality I
  • Early Vision Image Quality II
  • Higher-Level Issues in Image Quality I
  • Higher-Level Issues in Image Quality II
  • Perception and Natural Environments: Image Statistics, Texture, and Features I
  • Perception and Natural Environments: Image Statistics, Texture, and Features II
  • Attention and Saliency: From Perception to Applications
  • Eye Movements and Visual Tasks in Complex Environments
  • 3D Attention and Visual Tracking
  • Art and Perception
  • Interactive Paper Session
Front Matter: Volume 8651
Front Matter: Volume 8651
This PDF file contains the front matter associated with SPIE Proceedings Volume 8651, including the Title Page, Copyright information, Table of Contents, and the Conference Committee listing.
Keynote Session
World, environment, Umwelt, and innerworld: a biological perspective on visual awareness
The world is all physical reality (Higgs bosons, and so forth), the “environment” is a geographical locality (your city, …), the “Umwelt” is the totality of possible actions of the environment on the sensitive body surface of an agent (you, your dog, …) and the possible actions of the agent on the environment (mechanical, chemical, …), whereas the “innerworld” is what it is for the agent to be, that is, awareness. Awareness is pre-personal, proto-conscious, and (perhaps) proto-rational. The various “worlds” described above are on distinct ontological levels. The world and the environment are studied in the exact sciences; the Umwelt is studied by physiology and ethology. Ethology is like behavioristic psychology, with the difference that it applies to all animals. It skips the innerworld, e.g., it considers speech to be a movement of air molecules. The innerworld can only be known through first-person reports, and thus is intrinsically subjective. It can only be approached through “experimental phenomenology”, which is based on intersubjectivity among humans. In this setting speech may mean something in addition to the movements of molecules. These views lead to a model of vision as an “optical user interface”, which has consequences for many applications.
Does evolution favor true perceptions?
Donald D. Hoffman, Manish Singh, Justin Mark
Does natural selection favor veridical perceptions, those that more accurately depict the objective environment? Vision researchers often claim that it does. But this claim, though influential, has not been adequately tested. In this paper we formalize the claim and a few alternatives. We then discuss the results of evolutionary games and genetic algorithms that indicate that veridical perceptions can be driven to extinction by non-veridical strategies that are tuned to fitness rather than to objective reality. This suggests that natural selection need not favor veridical perceptions, and that the effects of natural selection on perception deserve further study. We then address the question: How can an organism act effectively in an environment that it does not perceive veridically? (Acknowledgement: Brian Marion and Kyle Stevens are collaborators in this research.)
Lightness and Color
Mapping luminance onto lightness in vision and art
Alan Gilchrist
Most natural images span a large range of luminance values, often a thousand-to-one and sometimes a million-to-one (Heckaman and Fairchild, 2009). This luminance range must be mapped by the visual system onto a scale of perceived gray shades (called lightness) with a range of roughly thirty-to-one (90% to 3% reflectance). For the painter who wants to represent this scene on a canvas, the challenge is a bit different. The high-range scene must be represented using pigments with a range of only thirty-to-one. Let’s begin with vision. Even without a high range in the scene, understanding the mapping of luminance onto lightness has proven elusive. But we can think of the problem as having two parts: Anchoring and scaling.
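To make the anchoring step concrete, here is a minimal sketch (not Gilchrist's model) of the classic "anchoring to white" rule: the highest luminance in the scene is assigned white (~90% reflectance), and every other surface is scaled by its luminance ratio. All parameter values are illustrative.

```python
import numpy as np

def anchor_to_white(luminance, white=0.90, black=0.03):
    """Toy anchoring rule: the highest scene luminance maps to white
    (~90% reflectance); other surfaces are scaled by their luminance
    ratio and clipped at the bottom of the perceptual gray scale."""
    reflectance = white * luminance / luminance.max()
    return np.clip(reflectance, black, white)

# A 1000:1 luminance range compresses onto the ~30:1 lightness scale.
scene = np.array([1000.0, 500.0, 50.0, 1.0])   # cd/m^2
print(anchor_to_white(scene))                  # [0.9  0.45 0.045 0.03]
```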
Human lightness perception is guided by simple assumptions about reflectance and lighting
Lightness constancy is the remarkable ability of human observers to perceive surface reflectance accurately despite variations in illumination and context. Two successful approaches to understanding lightness perception that have developed along independent paths are anchoring theory and Bayesian theories. Anchoring theory is a set of rules that predict lightness percepts under a wide range of conditions. Some of these rules are counterintuitive and difficult to motivate, e.g., a rule that large surfaces tend to look lighter than small surfaces. Bayesian theories are formulated as probabilistic assumptions about lights and objects, and they model percepts as rational inferences from sensory data. Here I reconcile these two seemingly divergent approaches by showing that many rules of anchoring theory follow from simple probabilistic assumptions about lighting and reflectance. I describe a simple Bayesian model that makes maximum a posteriori interpretations of luminance images, and I show that this model predicts many of the phenomena described by anchoring theory, including anchoring to white, scale normalization, and rules governing glow. Thus anchoring theory can be formulated naturally in a Bayesian framework, and this approach shows that many seemingly idiosyncratic properties of human lightness perception are actually rational consequences of simple assumptions about lighting and reflectance.
Spatial imaging in color and HDR: Prometheus unchained
The Human Vision and Electronic Imaging Conferences (HVEI) at the IS&T/SPIE Electronic Imaging meetings have brought together research in the fundamentals of both vision and digital technology. This conference has incorporated many color disciplines that have contributed to the theory and practice of today's imaging: color constancy, models of vision, digital output, high-dynamic-range imaging, and the understanding of perceptual mechanisms. Before digital imaging, silver halide color was a pixel-based mechanism. Color films are closely tied to colorimetry, the science of matching pixels in a black surround. The quanta catch of the sensitized silver salts determines the amount of colored dyes in the final print. The rapid expansion of digital imaging over the past 25 years has eliminated the limitations of using small local regions in forming images. Spatial interactions can now generate images more like vision. Since the 1950s, neurophysiology has shown that post-receptor neural processing is based on spatial interactions. These results reinforced the findings of 19th century experimental psychology. This paper reviews the role of HVEI in color, emphasizing the interaction of research on vision and the new algorithms and processes made possible by electronic imaging.
Vision and the Evolution of Technology
Visual stimuli: past and present
Gerald Westheimer
The fundamental properties of light and the principles of the structure and function of the visual system were discovered at a time when the only light sources were the sun and the flame of a candle. Contributions by Newton, Huygens, Thomas Young and Purkinje, Helmholtz’s ophthalmoscope – all preceded the first incandescent filament. Light bulbs, Xenon arcs, lasers, light-emitting diodes (LEDs), computer monitors then sequentially enlarged the arsenal, and so did the steps from Nicol prism to Polaroid in polarizing light, and from glass and interference filters to laser light in generating monochromatic light. Technological advances have a deep impact on the research topics at any one time, expanding their scope. In particular, utilization of computers now allows the generation and manipulation of targets permitting questions to be approached that could not have been envisaged at the dawn of the technological era of vision research. Just beyond the immediate grasp of even the most thoughtful vision scientist, however, is the concern that stimulus sets originating in mathematicians’ and physicists’ toolboxes fail to capture some essential ingredients indigenous to human vision. The quest to study vision with stimuli in its own terms continues.
Emergent technologies: 25 years
This paper surveys the technologies that have emerged over the 25 years since the Human Vision and Electronic Imaging conference began, technologies that the conference has been a part of, and that have been a part of the conference. It also looks at those technologies emerging today, such as social networks, haptic technologies, and still-emerging imaging technologies, and at what we might expect in the future. Twenty-five years is a long time, and it is not without difficulty that we remember what was emerging in the late 1980s. The first commercial digital still camera was not yet on the market, although there were hand-held electronic cameras. Personal computers were not displaying standardized images, and image quality was not something that could be discussed in a standardized fashion, if only because image compression algorithms would not be standardized for several more years. Standards for movie compression were even further away, and no personal computer on the horizon could have displayed such video. Image comparison and search, which later became an emergent technology and filled many sessions, was not yet possible; nor was the current emerging technology of social networks, since the World Wide Web was still several years away. Printer technology was still devising dithers and image-size manipulations, work that would consume many years, as it would for scanning technology, and image quality for both was a major issue involving dithers and Fourier noise. From these humble beginnings to the current changes in computing and in the meaning of both electronic devices and human interaction with them, we trace a course through changing technology that holds some features constant for many years, while others come and go.
Perceptual approaches to finding features in data
Electronic imaging applications hinge on the ability to discover features in data. For example, doctors examine diagnostic images for tumors, broken bones and changes in metabolic activity. Financial analysts explore visualizations of market data to find correlations, outliers and interaction effects. Seismologists look for signatures in geological data to tell them where to drill or where an earthquake may begin. These data are very diverse, including images, numbers, graphs, 3-D graphics, and text, and are growing exponentially, largely through the rise in automatic data collection technologies such as sensors and digital imaging. This paper explores important trends in the art and science of finding features in data, such as the tension between bottom-up and top-down processing, the semantics of features, and the integration of human- and algorithm-based approaches. This story is told from the perspective of the IS&T/SPIE Conference on Human Vision and Electronic Imaging (HVEI), which has fostered research at the intersection between human perception and the evolution of new technologies.
Early Vision Image Quality I
Is image quality a function of contrast perception?
Andrew M. Haun, Eli Peli
In this retrospective we trace in broad strokes the development of image quality measures based on the study of the early stages of the human visual system (HVS), where contrast encoding is fundamental. We find that while presenters at the Human Vision and Electronic Imaging meetings have frequently strived to find points of contact between the study of human contrast psychophysics and the development of computer vision and image quality algorithms, progress has not always been made on these terms, although an indirect impact of vision science on more recent image quality metrics can be observed.
Visible contrast energy metrics for detection and discrimination
Contrast energy was proposed by Watson, Barlow, and Robson (Science, 1983) as a useful metric for representing luminance contrast target stimuli because it represents the detectability of the stimulus in photon noise for an ideal observer. We propose here the use of visible contrast energy metrics for detection and discrimination among static luminance patterns. The visibility is approximated with spatial frequency sensitivity weighting and eccentricity sensitivity weighting. The suggested weighting functions revise the Standard Spatial Observer (Watson and Ahumada, J. Vision, 2005) for luminance contrast detection, extend it into the near periphery, and provide compensation for duration. Under the assumption that detection is limited only by internal noise, both detection and discrimination performance can be predicted by metrics based on the visible energy of the difference images.
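As a rough sketch of the idea (using a generic Mannos-Sakrison-style CSF rather than the paper's revised Standard Spatial Observer, and omitting the eccentricity and duration weighting), a visible-energy metric for a difference image might look like this:

```python
import numpy as np

def visible_contrast_energy(diff_image, pixels_per_degree=60.0):
    """Sum of CSF-weighted squared spectral amplitudes of a difference
    image (via Parseval).  Uses a generic Mannos-Sakrison-style CSF,
    not the revised Standard Spatial Observer; eccentricity and
    duration weighting are omitted."""
    h, w = diff_image.shape
    fy = np.fft.fftfreq(h) * pixels_per_degree       # cycles/degree
    fx = np.fft.fftfreq(w) * pixels_per_degree
    f = np.hypot(*np.meshgrid(fy, fx, indexing="ij"))
    csf = 2.6 * (0.0192 + 0.114 * f) * np.exp(-(0.114 * f) ** 1.1)
    spectrum = np.fft.fft2(diff_image) * csf         # weight by visibility
    return np.sum(np.abs(spectrum) ** 2) / (h * w)
```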
Early Vision Image Quality II
Initial spatio-temporal domain expansion of the Modelfest database
Thom Carney, Sahar Mozaffari, Sean Sun, et al.
The first Modelfest group publication appeared in the SPIE Human Vision and Electronic Imaging conference proceedings in 1999. "One of the group's goals is to develop a public database of test images with threshold data from multiple laboratories for designing and testing human visual system (HVS) models." After extended discussions the group selected a set of 45 static images thought to best meet that goal and collected psychophysical detection data, which is available on the Web and presented in the 2000 SPIE conference proceedings. Several groups have used these datasets to test spatial modeling ideas. Further discussions led to the preliminary stimulus specification for extending the database into the temporal domain, which was published in the 2002 conference proceedings. After a hiatus of 12 years, some of us have collected spatio-temporal thresholds on an expanded stimulus set of 41 video clips; the original specification included 35 clips. The principal change involved adding one additional spatial pattern beyond the three originally specified. The stimuli consisted of four spatial patterns: a Gaussian blob, a 4 c/d Gabor patch, an 11.3 c/d Gabor patch, and a 2D white-noise patch. Across conditions the patterns were temporally modulated over a range of approximately 0-25 Hz, plus temporal edge and pulse modulation conditions. The display and data collection specifications were as specified by the Modelfest group in the 2002 conference proceedings. To date seven subjects have participated in this phase of the data collection effort, one of whom also participated in the first phase of Modelfest. Three of the spatio-temporal stimuli were identical to conditions in the original static dataset. Small differences in the thresholds were evident and may point to a stimulus limitation. The temporal CSF peaked between 4 and 8 Hz for the 0 c/d (Gaussian blob) and 4 c/d patterns. The 4 c/d and 11.3 c/d Gabor temporal CSF was low pass while the 0 c/d pattern was band pass. This preliminary expansion of the Modelfest dataset needs the participation of additional laboratories to evaluate the impact of different methods on threshold estimates and to increase the subject base. We eagerly await the addition of new data from interested researchers. It remains to be seen how accurately general HVS models will predict thresholds across both Modelfest datasets.
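For intuition, a sketch of one such spatio-temporal stimulus, a counterphase-flickered Gabor patch, follows; the display parameters (pixels per degree, frame rate, envelope width) are illustrative, not the Modelfest specification.

```python
import numpy as np

def gabor_clip(sf_cpd=4.0, tf_hz=8.0, size_deg=2.0, dur_s=1.0, ppd=64, fps=120):
    """One Modelfest-style stimulus: a vertical Gabor patch counterphase-
    modulated in time.  sf_cpd: spatial frequency (cycles/degree);
    tf_hz: temporal frequency (Hz).  Display parameters are illustrative."""
    n = int(size_deg * ppd)
    x = (np.arange(n) - n / 2) / ppd                    # position in degrees
    xx, yy = np.meshgrid(x, x)
    carrier = np.cos(2 * np.pi * sf_cpd * xx)           # vertical grating
    envelope = np.exp(-(xx**2 + yy**2) / (2 * (size_deg / 6) ** 2))
    t = np.arange(int(dur_s * fps)) / fps
    flicker = np.sin(2 * np.pi * tf_hz * t)             # counterphase modulation
    return flicker[:, None, None] * (carrier * envelope)[None, :, :]
```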
A database of local masking thresholds in natural images
The ability of an image region to hide or mask a target signal continues to play a key role in the design of numerous image-processing and vision applications. However, one of the challenges in designing an effective model of masking for natural images is the lack of ground-truth data. To address this issue, this paper describes a psychophysical study designed to obtain local contrast detection thresholds (masking maps) for a database of natural images. Via a three-alternative forced-choice experiment, we measured the thresholds for detecting 3.7 cycles/deg vertically oriented log-Gabor targets placed within each 85×85-pixel patch (1.9 deg patch) of 15 natural images from the CSIQ image database [Larson and Chandler, JEI, 2010]. Thus, for each image, we obtained a masking map in which each entry in the map denotes the RMS contrast threshold for detecting the log-Gabor target at the corresponding spatial location in the image. Here, we describe the psychophysical procedures used to collect the thresholds, we provide analyses of the results, and we provide some outcomes of predicting the thresholds via basic low-level features, a computational masking model, and two modern image-quality assessment algorithms.
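The thresholds in each masking map are expressed as RMS contrast, which for a luminance patch is simply:

```python
import numpy as np

def rms_contrast(patch):
    """RMS contrast of a luminance patch: the standard deviation of
    luminance divided by its mean (the unit of the masking-map entries)."""
    L = np.asarray(patch, dtype=float)
    return L.std() / L.mean()
```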
Interplay between JPEG-2000 image coding and quality estimation
Guilherme O. Pinto, Sheila S. Hemami
Image quality and utility estimators aspire to quantify the perceptual resemblance and the usefulness of a distorted image when compared to a reference natural image, respectively. Image coders, such as JPEG-2000, traditionally aspire to allocate the available bits to maximize the perceptual resemblance of the compressed image when compared to a reference uncompressed natural image. Specifically, this can be accomplished by allocating the available bits to minimize the overall distortion, as computed by a given quality estimator. This paper applies five image quality and utility estimators, SSIM, VIF, MSE, NICE and GMSE, within a JPEG-2000 encoder for rate-distortion optimization to obtain new insights on how to improve JPEG-2000 image coding for quality and utility applications, as well as to improve the understanding of the quality and utility estimators used in this work. This work develops a rate-allocation algorithm for arbitrary quality and utility estimators within the Post-Compression Rate-Distortion Optimization (PCRD-opt) framework in JPEG-2000 image coding. Performance of the JPEG-2000 image coder when used with a variety of utility and quality estimators is then assessed. The estimators fall into two broad classes, magnitude-dependent (MSE, GMSE and NICE) and magnitude-independent (SSIM and VIF). They further differ on their use of the low-frequency image content in computing their estimates. The impact of these computational differences is analyzed across a range of images and bit rates. In general, performance of the JPEG-2000 coder below 1.6 bits/pixel with any of these estimators is highly content dependent, with the most relevant content being the amount of texture in an image and whether the strongest gradients in an image correspond to the main contours of the scene. Above 1.6 bits/pixel, all estimators produce visually equivalent images. As a result, the MSE estimator provides the most consistent performance across all images, while specific estimators are expected to provide improved performance for images with suitable content.
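For readers unfamiliar with PCRD-opt, the sketch below shows the core Lagrangian idea under simplifying assumptions (each code block offers precomputed (rate, distortion) truncation points, and total distortion is additive); the paper's actual algorithm generalizes this to arbitrary quality and utility estimators.

```python
def allocate(blocks, budget):
    """Toy PCRD-style rate allocation.  Each block is a list of
    (rate, distortion) truncation points (including a zero-rate point);
    for a Lagrange multiplier lam each block independently minimizes
    D + lam*R, and lam is bisected until the total rate meets the budget.
    The distortion column can come from any estimator (MSE, 1-SSIM, ...)."""
    def pick(lam):
        choice = [min(pts, key=lambda p: p[1] + lam * p[0]) for pts in blocks]
        return choice, sum(p[0] for p in choice)

    lo, hi = 0.0, 1e9
    for _ in range(60):                  # bisect on the rate-distortion slope
        mid = (lo + hi) / 2
        if pick(mid)[1] > budget:
            lo = mid                     # over budget: penalize rate more
        else:
            hi = mid
    return pick(hi)[0]

blocks = [[(0, 100.0), (10, 40.0), (20, 15.0)],
          [(0, 80.0), (8, 30.0), (16, 10.0)]]
print(allocate(blocks, budget=24))       # -> [(10, 40.0), (8, 30.0)]
```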
Higher-Level Issues in Image Quality I
From image quality to atmosphere experience: how evolutions in technology impact experience assessment
Image quality is a concept that has long served very well to optimize display performance and signal quality. New technological developments, however, have forced the community to look into higher-level concepts that capture the full experience. Terms such as naturalness and viewing experience were used to optimize the full experience of 3D displays and Ambilight TV. These higher-level concepts capture differences in image quality as well as differences in perceived depth or perceived viewing field. With the introduction of solid-state lighting, further enhancing the multimedia experience, yet more advanced quality evaluation concepts will be needed in the future to optimize the overall experience.
Preference limits of the visual dynamic range for ultra high quality and aesthetic conveyance
Scott Daly, Timo Kunkel, Xing Sun, et al.
A subjective study was conducted to investigate the preferred maximum and minimum display luminances in order to determine the dynamic ranges for future displays. Two studies address the diffuse reflective regions, and a third study tested preferences for highlight regions. Preferences, as opposed to detection thresholds, were studied to provide results more directly relevant to the viewing of entertainment or art. Test images were specifically designed to test these limits without the perceptual conflicts that usually occur in these types of studies. For the diffuse range, we found that a display with a dynamic range having luminances between 0.1 and 650 cd/m² matches the average preferences. However, to satisfy 90% of the population, a dynamic range from 0.005 to ~3,000 cd/m² is needed. Since a display should be able to produce values brighter than the diffuse white maximum, as in specular highlights and emissive sources, the highlight study concludes that even the average preferred maximum luminance for highlight reproduction is ~4,000 cd/m².
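For context, a quick derivation (not in the abstract) of what that 90%-satisfaction range amounts to:

```latex
\frac{3000\ \mathrm{cd/m^2}}{0.005\ \mathrm{cd/m^2}} = 6 \times 10^{5} \approx 2^{19.2}
```

i.e., roughly 19 photographic stops, versus about 12.7 stops (0.1 to 650 cd/m², a 6500:1 ratio) for the average preference.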
Quantifying image quality in graphics: perspective on subjective and objective metrics and their performance
We explore three problems related to quality assessment in computer graphics: the design of efficient user studies; scene-referred metrics for comparing high-dynamic-range images; and the comparison of metric performance on a database of computer-graphics distortions. This paper summarizes the most important observations from investigating these problems and gives a high-level perspective on the problem of quality assessment in graphics.
Visualizing lighting with images: converging between the predictive value of renderings and photographs
Performing psychophysical experiments to investigate lighting perception can be expensive and time consuming if complex lighting systems need to be implemented. In this paper, display-based experiments are explored as a cost effective and less time consuming alternative to real-world experiments. The aim of this work is to better understand the upper limit of prediction accuracy that can be achieved when presenting an image on a display rather than the real-world scene. We compare the predictive value of photographs and physically-based renderings on a number of perceptual lighting attributes. It is shown that the photographs convey statistically the same lighting perception as in a real-world scenario. Initial renderings have an inferior performance, but are shown to converge towards the performance of the photographs through iterative improvements.
Higher-Level Issues in Image Quality II
A survey on 3D quality of experience and 3D quality assessment
The field of algorithmically assessing the 3D quality of experience and/or 3D quality is an extremely challenging one, making it a fertile ground for research. The complexity of the problem, coupled with our yet nascent understanding of 3D perception and the increasing commercial shift toward 3D entertainment, makes the area of 3D QA interesting, formidable and practically relevant. This article undertakes a brief review of the recent research in the area of 3D visual quality of experience and quality assessment. We first review literature in the field of quality of experience, which encompasses geometry, visual discomfort, etc., and then perform a similar review in the field of quality assessment, which encompasses distortions such as blur, noise, compression, etc. We describe algorithms and databases that have been proposed in the literature for these purposes. We conclude with a short description of a recent resource, the LIVE 3D IQA database, the first quality assessment database to provide researchers with access to true depth information for each of the stereo pairs, obtained from a high-precision range scanner.
Visual quality beyond artifact visibility
The electronic imaging community has devoted a lot of effort to the development of technologies that can predict the visual quality of images and videos, as a basis for the delivery of optimal visual quality to the user. These systems have for the most part been based on a visibility-centric approach, assuming that the more visible the artifacts, the higher the annoyance they provoke and the lower the visual quality. Despite the remarkable results achieved with this approach, a number of recent studies have suggested that the visibility-centric approach to visual quality might have limitations, and that other factors might influence the overall quality impression of an image or video, depending on cognitive and affective mechanisms that work on top of perception. In particular, interest in the visual content, engagement, and context of usage have been found to impact the overall quality impression of the image/video. In this paper, we review these studies and explore the impact that affective and cognitive processes have on visual quality. In addition, as a case study, we present the results of an experiment investigating the impact of aesthetic appeal on visual quality, and we show that users tend to be more demanding in terms of visual quality when judging beautiful images.
Subjective matters: from image quality to image psychology
From the advent of digital imaging through several decades of studies, the human vision research community systematically focused on perceived image quality and on digital artifacts due to resolution, compression, gamma, dynamic range, capture and reproduction noise, blur, etc., to help overcome existing technological challenges and shortcomings. Technological advances have made digital images and digital multimedia nearly flawless in quality, and ubiquitous and pervasive in usage, providing us with the exciting but at the same time demanding possibility of turning to the domain of human experience, including higher psychological functions such as cognition, emotion, awareness, social interaction, consciousness and Self. In this paper we outline the evolution of human-centered multidisciplinary studies related to imaging and propose steps and potential foci for future research.
Perception and Natural Environments: Image Statistics, Texture, and Features I
The rough side of texture: texture analysis through the lens of HVEI
We take a look at texture analysis research over the past 25 years, from the perspective of the Human Vision and Electronic Imaging conference. We consider advances in the understanding of human perception of textures and the development of texture analysis algorithms for practical applications. We cover perceptual models and algorithms for image halftoning, texture discrimination, texture segmentation, texture analysis/synthesis, perceptually and structurally lossless compression, content-based retrieval, and sense substitution.
Optimizing visual performance by adapting images to observers
Michael A. Webster, Igor Juricevic
Visual adaptation is widely assumed to optimize visual performance, but demonstrations of functional benefits beyond the case of light adaptation remain elusive. The failure to find marked improvements in visual discriminations with contrast or pattern adaptation may occur because these become manifest only over timescales that are too long to probe by briefly adapting observers. We explored the potential consequences of color contrast adaptation by instead “adapting” images to simulate how they should appear to observers under theoretically complete adaptation to different environments, and then used a visual search task to measure the ability to detect colors within the adapted images. Color salience can be markedly improved for extreme environments to which the observer is not routinely exposed, and may also be enhanced even among naturally occurring outdoor environments. The changes in performance provide a measure of how much, in theory, the visual system can be optimized for a given task and environment, and can reveal the extent to which differences in the statistics of the environment or the sensitivity of the observer are important in driving the states of adaptation. Adapting the images also provides a potential practical tool for optimizing performance in novel visual contexts, by rendering image information in a format for which the visual system is already calibrated.
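A crude stand-in for this "adapting the image" operation, assuming adaptation is modeled as per-channel gain and contrast normalization (the paper's simulations operate in a cone-opponent color space; plain per-channel statistics are used here purely for illustration):

```python
import numpy as np

def adapt_image(img, env_mean, env_std, ref_mean, ref_std):
    """Remap each channel so that the statistics of the adapting
    environment (env_*) are normalized to those of a reference
    environment (ref_*): a gain-control caricature of 'theoretically
    complete adaptation'.  img: H x W x 3 array; the *_mean / *_std
    arguments are per-channel statistics (shape (3,))."""
    return (img - env_mean) / env_std * ref_std + ref_mean
```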
Perception and Natural Environments: Image Statistics, Texture, and Features II
Efficient image representations and features
Michael Dorr, Eleonora Vig, Erhardt Barth
Interdisciplinary research in human vision and electronic imaging has greatly contributed to the current state of the art in imaging technologies. Image compression and image quality are prominent examples, and the progress made in these areas relies on a better understanding of what natural images are and how they are perceived by the human visual system. A key research question has been: given the (statistical) properties of natural images, what are the most efficient and perceptually relevant image representations, and what are the most prominent and descriptive features of images and videos? We give an overview of how these topics have evolved over the 25 years of HVEI conferences and how they have influenced the current state of the art. There are a number of striking parallels between human vision and electronic imaging. The retina performs lateral inhibition; one of the early coders used a Laplacian pyramid. Primary visual cortical areas have orientation- and frequency-selective neurons; the current JPEG standard defines similar wavelet transforms. The brain uses a sparse code; engineers are currently excited about sparse coding and compressed sensing. Some of this cross-fertilization has indeed happened at the HVEI conferences, and we would like to distill it.
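The Laplacian pyramid mentioned as an early engineering parallel can be sketched in a few lines (Burt and Adelson's construction, simplified):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def laplacian_pyramid(img, levels=4, sigma=1.0):
    """Simplified Laplacian pyramid: each level keeps the band-pass
    residual between the image and its low-pass version (a center-
    surround operation, like retinal lateral inhibition), then the
    low-pass image is decimated for the next octave."""
    pyramid, current = [], np.asarray(img, dtype=float)
    for _ in range(levels):
        low = gaussian_filter(current, sigma)
        pyramid.append(current - low)       # band-pass detail at this scale
        current = low[::2, ::2]             # downsample by 2
    pyramid.append(current)                 # final low-pass residual
    return pyramid
```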
Highly overcomplete sparse coding
This paper explores sparse coding of natural images in the highly overcomplete regime. We show that as the overcompleteness ratio approaches 10x, new types of dictionary elements emerge beyond the classical Gabor function shape obtained from complete or only modestly overcomplete sparse coding. These more diverse dictionaries allow images to be approximated with lower L1 norm (for a fixed SNR), and the coefficients exhibit steeper decay. We also evaluate the learned dictionaries in a denoising task, showing that higher degrees of overcompleteness yield modest gains in performance.
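The coding step behind such experiments is L1-penalized inference; a minimal ISTA sketch is shown below (learning the dictionary itself would alternate this inference with gradient updates of D, which is omitted here):

```python
import numpy as np

def ista(D, x, lam=0.1, n_iter=200):
    """L1-penalized sparse inference (ISTA): minimize
    0.5*||x - D@a||**2 + lam*||a||_1 for a (possibly highly
    overcomplete) dictionary D of unit-norm columns."""
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - D.T @ (D @ a - x) / L        # gradient step on the data term
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return a
```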
Blind image quality assessment without training on human opinion scores
Anish Mittal, Rajiv Soundararajan, Gautam S. Muralidhar, et al.
We propose a family of image quality assessment (IQA) models based on natural scene statistics (NSS) that can predict the subjective quality of a distorted image without reference to a corresponding distortionless image, and without any training on human opinion scores of distorted images. These “completely blind” models compete well with standard non-blind image quality indices in terms of subjective predictive performance when tested on the large publicly available LIVE Image Quality database.
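A representative NSS feature used by such opinion-unaware models is the mean-subtracted, contrast-normalized (MSCN) transform; a sketch follows (the window parameters are typical values, not necessarily the paper's):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn(img, sigma=7/6, c=1.0):
    """Mean-subtracted contrast-normalized (MSCN) coefficients: luminance
    is normalized by a local Gaussian-weighted mean and standard
    deviation; the statistics of the result deviate predictably from
    the natural-image case under distortion."""
    img = np.asarray(img, dtype=float)
    mu = gaussian_filter(img, sigma)
    var = gaussian_filter(img * img, sigma) - mu * mu
    return (img - mu) / (np.sqrt(np.maximum(var, 0.0)) + c)
```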
Attention and Saliency: From Perception to Applications
Saliency identified by absence of background structure
Visual attention is commonly modelled by attempting to characterise objects using features that make them special or in some way distinctive in a scene. These approaches have the disadvantage that it is never certain what features will be relevant in an object that has not been seen before. This paper provides a brief outline of the approaches to modeling human visual attention together with some of the problems that they face. A graphical representation for image similarity is described that relies on the size of maximally associative structures (cliques) that are found to be reflected in pairs of images. While comparing an image with itself, the similarity mechanism is shown to model pop-out effects when constraints are placed on the physical separation of pixels that correspond to nodes in the maximal cliques. Background regions are found to contain structure in common that is not present in the salient regions which are thereby identified by its absence. The approach is illustrated with figures that exemplify asymmetry in pop-out, the conjunction of features, orientation disturbances and the application to natural images.
Investigation of eye-catching colors using eye tracking
Mokryun Baik, Hyeon-Jeong Suk, Jeongmin Lee, et al.
An eye tracking experiment was conducted to investigate the relationship between eye gazing movements and the color attributes to support the creation of effective communication and increase aesthetic satisfaction. With consideration to the context of smart phones, the study focused on icon arrays, and thus each stimulus set was composed of 25 color square patches arrayed in the format of a 5 by 5 grid. The experiment was divided into three parts, each examining one specific attribute of color, while controlling its other attributes. Fifteen college students were recruited, among whom all partook in all three parts. In Part I, hue difference was examined. Each stimulus set contained 25 hues under a fixed tone. It was revealed that subjects were more attentive to warm colors than to cool colors, particularly when warm colors were arranged along the horizontal and vertical axes; In Part II, the experiment dealt with tone difference. 25 tone variations for red, green and blue were provided as stimulus sets. However, the result indicated that changes in tone does not have a significant influence on subjects’ initial attention; Lastly, in Part III, combinations of colors were examined to determine whether color contrast influenced participants’ attention in a manner different from that of single colors. Among them, icons with complementary contrast gained the greatest attention. Throughout the experiments, the background was applied with either black or white; however a contrast effect between background and foreground was not noticeable.
Can relative skill be determined from a photographic portfolio?
Abhishek Agrawal, Vittal Premachandran, Rajesh Somavarapu, et al.
In this study, our primary aim is to determine empirically the role that skill plays in determining image aesthetics, and whether it can be deciphered from the ratings given by a diverse group of judges. To this end, we have collected and analyzed data from a large number of subjects (168 in total) on a set of 221 images taken by 33 photographers with different photographic skill and experience. We also experimented with the rating scales used by previous studies in this domain by introducing a binary rating system for collecting judges’ opinions. The study also demonstrates the use of Amazon Mechanical Turk as a crowd-sourcing platform for collecting scientific data and evaluating the skill of the judges participating in the experiment. We use a variety of performance and correlation metrics to evaluate the consistency of ratings across different rating scales and compare our findings. A novel feature of our study is an attempt to define a threshold based on the consistency of ratings when judges rate duplicate images. Our conclusion deviates from earlier findings and our own expectations, with ratings not being able to determine skill levels of photographers to a statistically significant level.
Eye Movements and Visual Tasks in Complex Environments
Binocular eye movements in health and disease
Binocular eye movements form a finely-tuned system that requires accurate coordination of oculomotor dynamics, supports the vergence movements for tracking the fine binocular disparities required for 3D vision, and is particularly susceptible to disruption by brain injury and other neural dysfunctions. Saccadic dynamics for a population of 84 diverse participants show tight coefficients of variation of 2-10% of the mean value of each parameter. Significantly slower dynamics were seen for vertical upward saccades. Binocular coordination of saccades was accurate to within 1-4%, implying the operation of brainstem coordination mechanisms rather than independent cortical control of the two eyes. A new principle of oculomotor control, reciprocal binocular inhibition, is introduced to complement Sherrington’s and Hering’s Laws. This new law accounts for the fact that symmetrical vergence responses are about five times slower than saccades of the same amplitude, although a comprehensive analysis of asymmetrical vergence responses revealed unexpected variety in vergence dynamics. This analysis of the variety of human vergence responses thus contributes substantially to the understanding of the oculomotor control mechanisms underlying the generation of vergence movements and of the deficits in oculomotor control resulting from mild traumatic brain injury.
Reflexive and voluntary control of smooth eye movements
Jeffrey B. Mulligan, Scott B. Stevenson, Lawrence K. Cormack
An understanding of visually evoked smooth eye movements is required to predict the visibility and legibility of moving displays, such as might be encountered in vehicles like aircraft and automobiles. We have studied the response of the oculomotor system to various classes of visual stimuli, and analyzed the results separately for horizontal and vertical version (in which the two eyes move together), and horizontal and vertical vergence (where they move in opposite directions). Of the four types of motion, only vertical vergence cannot be performed under voluntary control, and certain stimuli (all having relatively long latencies) are incapable of evoking it. In another experiment, we instructed observers to track one of two targets, and measured weak but reliable responses to the unattended target, in which the long-latency component of the response is abolished. Our results are consistent with a system containing two distinct processes, a fast reflexive process which responds to a restricted class of stimuli, and a slower voluntary process capable of following anything that can be seen, but incapable of controlling vertical vergence.
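The version/vergence decomposition used in this analysis is simple enough to state directly; for gaze signals from the two eyes:

```python
def version_vergence(left, right):
    """Decompose binocular position signals into version (conjugate:
    the eyes moving together) and vergence (disjunctive: the eyes
    moving in opposite directions).  Works per axis (horizontal or
    vertical), e.g. on arrays of gaze angles in degrees."""
    version = 0.5 * (left + right)
    vergence = left - right
    return version, vergence
```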
Simple gaze-contingent cues guide eye movements in a realistic driving simulator
Laura Pomarjanschi, Michael Dorr, Peter J. Bex, et al.
Looking at the right place at the right time is a critical component of driving skill. Therefore, gaze guidance has the potential to become a valuable driving assistance system. In previous work, we have already shown that complex gaze-contingent stimuli can guide attention and reduce the number of accidents in a simple driving simulator. We here set out to investigate whether cues that are simple enough to be implemented in a real car can also capture gaze during a more realistic driving task in a high-fidelity driving simulator. We used a state-of-the-art, wide-field-of-view driving simulator with an integrated eye tracker. Gaze-contingent warnings were implemented using two arrays of light-emitting diodes horizontally fitted below and above the simulated windshield. Thirteen volunteer subjects drove along predetermined routes in a simulated environment populated with autonomous traffic. Warnings were triggered during the approach to half of the intersections, cueing either towards the right or to the left. The remaining intersections were not cued, and served as controls. The analysis of the recorded gaze data revealed that the gaze-contingent cues did indeed have a gaze guiding effect, triggering a significant shift in gaze position towards the highlighted direction. This gaze shift was not accompanied by changes in driving behaviour, suggesting that the cues do not interfere with the driving task itself.
Designing an obstacle display for helicopter operations in degraded visual environment
Patrizia M. Knabl, Niklas Peinecke
Flying in a degraded visual environment is an extremely challenging task for a helicopter pilot. The loss of the outside visual reference causes impaired situation awareness, high workload and spatial disorientation, leading to incidents like obstacle or ground hits. DLR is working on identifying ways to reduce this problem by providing the pilot with additional information from fused sensor data. Therefore, different display design solutions were developed. In a first study, the design focused on the use of a synthetic head-down display, considering different representations for obstacles, color coding and terrain features. Results show a subjective preference for the most detailed obstacle display, while objective results reveal better performance for the slightly less detailed display. In a second study, symbology for a helmet-mounted display was designed and evaluated in a part-task simulation. Design considerations focused on different obstacle representations as well as attentional and perceptual aspects associated with the use of helmet-mounted displays. Results are consistent with the first experiment, indicating that the display subjectively favored does not necessarily contribute to the best performance in detection. However, when additional tasks have to be performed, the level of clutter seems to impair the ability to respond correctly to secondary tasks. Thus the favored display type nonetheless seems to be the most promising solution, since it is accompanied by the overall best objective results, integrating both detection of obstacles and the ability to perform additional tasks.
3D Attention and Visual Tracking
Visual storytelling in 2D and stereoscopic 3D video: effect of blur on visual attention
Quan Huynh-Thu, Cyril Vienne, Laurent Blondé
Visual attention is an inherent mechanism that plays an important role in human visual perception. As our visual system has limited capacity and cannot efficiently process the information from the entire visual field, we focus our attention on specific areas of interest in the image for detailed analysis of these areas. In the context of media entertainment, the viewers’ visual attention deployment is also influenced by the art of visual storytelling. To date, visual editing and composition of scenes in stereoscopic 3D content creation still mostly follow those used in 2D. In particular, out-of-focus blur is often used in 2D motion pictures and photography to drive the viewer’s attention towards a sharp area of the image. In this paper, we study specifically the impact of defocused foreground objects on visual attention deployment in stereoscopic 3D content. For that purpose, we conducted a subjective experiment using an eye tracker. Our results bring more insight into the deployment of visual attention in stereoscopic 3D content viewing, and provide further understanding of the differences in visual attention behavior between 2D and 3D. Our results show that a traditional 2D scene compositing approach such as the use of foreground blur does not necessarily produce the same effect on visual attention deployment in 2D and 3D. Implications for stereoscopic content creation and visual fatigue are discussed.
Using natural versus artificial stimuli to perform calibration for 3D gaze tracking
Christophe Maggia, Nathalie Guyader, Anne Guérin-Dugué
The presented study tests which type of stereoscopic image, natural or artificial, is better suited to performing efficient and reliable calibration in order to track the gaze of observers in 3D space using a classical 2D eye tracker. We measured horizontal disparities, i.e., the difference between the x coordinates of the two eyes obtained using a 2D eye tracker. This disparity was recorded for each observer and for several target positions the observer had to fixate. Target positions were equally distributed in 3D space: some on the screen (null disparity), some behind the screen (uncrossed disparity), and others in front of the screen (crossed disparity). We tested different regression models (linear and non-linear) to explain either the true disparity or the depth with the measured disparity. Models were tested and compared on their prediction error for new targets at new positions. First, we found that we obtained more reliable disparity measures when using natural stereoscopic images rather than artificial ones. Second, we found that overall a non-linear model was more efficient. Finally, we discuss the fact that our results were observer dependent, with variability between observers' behavior when looking at 3D stimuli. Because of this variability, we propose computing observer-specific models to accurately predict gaze position when exploring 3D stimuli.
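A minimal sketch of such a per-observer calibration, assuming a polynomial regression from measured disparity to depth (the numbers below are made up for illustration):

```python
import numpy as np

def fit_disparity_model(measured_disp, true_depth, degree=2):
    """Per-observer calibration: polynomial regression from the measured
    horizontal disparity (x_left - x_right from the 2D eye tracker) to
    target depth.  Returns a predictor for new gaze samples."""
    return np.poly1d(np.polyfit(measured_disp, true_depth, degree))

# Calibration targets at known depths (numbers are made up):
disp  = np.array([-0.8, -0.4, 0.0, 0.5, 1.0])    # measured disparity (deg)
depth = np.array([-20., -10.,  0., 12., 30.])    # depth re: screen (cm)
predict_depth = fit_disparity_model(disp, depth)
print(predict_depth(0.2))                        # depth estimate for a new sample
```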
Study of center-bias in the viewing of stereoscopic image and a framework for extending 2D visual attention models to 3D
Junle Wang, Matthieu Perreira Da Silva, Patrick Le Callet, et al.
Compared to the good performance that can be achieved by many 2D visual attention models, predicting the salient regions of a 3D scene is still challenging. An efficient way to achieve this can be to exploit existing models designed for 2D content. However, the visual conflicts caused by binocular disparity and the changes of viewing behavior in 3D viewing need to be dealt with. To cope with these, the present paper proposes a simple framework for extending 2D attention models to 3D images, as well as an evaluation of center-bias in the 3D-viewing condition. To validate the results, a database was created containing the eye movements of 35 subjects recorded during free viewing of eighteen 3D images and their corresponding 2D versions. Fixation density maps indicate a weaker center-bias in the viewing of 3D images. Moreover, objective metric results demonstrate the efficiency of the proposed framework and the large added value of center-bias when it is taken into account in the computational modeling of 3D visual attention.
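One hypothetical instance of such a framework, combining a 2D saliency map with a depth prior and a center-bias Gaussian (the weights and the combination rule here are assumptions for illustration, not the paper's fitted model):

```python
import numpy as np

def saliency_3d(sal2d, depth, sigma_frac=0.3, depth_weight=0.5):
    """Modulate a 2D saliency map by a 'nearer is more salient' depth
    prior and a center-bias Gaussian.  sal2d and depth are H x W maps;
    sigma_frac and depth_weight are illustrative parameters."""
    h, w = sal2d.shape
    yy, xx = np.mgrid[0:h, 0:w]
    center = np.exp(-(((yy - h / 2) / (sigma_frac * h)) ** 2 +
                      ((xx - w / 2) / (sigma_frac * w)) ** 2) / 2)
    near = 1.0 - (depth - depth.min()) / (np.ptp(depth) + 1e-9)
    sal = sal2d * ((1 - depth_weight) + depth_weight * near) * center
    return sal / sal.max()
```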
How visual attention is modified by disparities and textures changes?
Dar'ya Khaustova, Jérome Fournier, Emmanuel Wyckens, et al.
The 3D image/video quality of experience is a multidimensional concept that depends on 2D image quality, depth quantity, and visual comfort. The relationship between these parameters is not yet clearly defined. From this perspective, we aim to understand how texture complexity, depth quantity and visual comfort influence the way people observe 3D content in comparison with 2D. Six scenes with different structural parameters were generated using Blender software. For these six scenes, the following parameters were modified: texture complexity and the amount of depth, the latter by changing the camera baseline and the convergence distance at the shooting side. Our study was conducted using an eye tracker and a 3DTV display. During the eye-tracking experiment, each observer freely examined images with different depth levels and texture complexities. To avoid memory bias, we ensured that each observer saw each scene's content only once. Collected fixation data were used to build saliency maps and to analyze differences between the 2D and 3D conditions. Our results show that the introduction of disparity shortened saccade length; however, fixation durations remained unaffected. An analysis of the saliency maps did not reveal any differences between 2D and 3D conditions for the viewing duration of 20 s. When the whole period was divided into smaller intervals, we found that for the first 4 s the introduced disparity was conducive to the selection of salient regions. However, this contribution is quite minimal if the correlation between saliency maps is analyzed. Nevertheless, we did not find that discomfort (or comfort) had any influence on visual attention. We believe that existing metrics and methods are depth insensitive and do not reveal such differences. Based on the analysis of heat maps and paired t-tests of inter-observer visual congruency values, we deduced that the selected areas of interest depend on texture complexities.
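The saliency (heat) maps here are built from fixations in the standard way, by accumulating a Gaussian at each fixation point; a sketch, with the kernel width assumed to approximate 1 degree of visual angle:

```python
import numpy as np

def fixation_map(fixations, shape, sigma_px=35):
    """Empirical saliency (heat) map: accumulate a Gaussian at each
    fixation point.  sigma_px is assumed to correspond to roughly
    1 degree of visual angle for the display geometry."""
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    sal = np.zeros(shape)
    for x, y in fixations:                   # fixation coordinates in pixels
        sal += np.exp(-((xx - x) ** 2 + (yy - y) ** 2) / (2 * sigma_px ** 2))
    return sal / sal.max()
```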
Art and Perception
Copy-paste in depth
Whereas pictorial space plays an important role in art-historical discussions, there is little research on the quantitative structure of pictorial spaces. Recently, a number of methods have been developed, one of which relies on size constancy: two spheres are rendered in the image while the observer adjusts their relative sizes such that they appear to have similar sizes in pictorial space. This method is based on pair-wise comparisons, resulting in n(n-1)/2 trials for n samples. Furthermore, it renders a probe in the image that does not conform to the style of the painting: it mixes computer graphics with a painting. The method proposed here uses probes that are already in the scene, without violating the painting's style. An object is copied from the original painting and shown in a different location. The observer can adjust the scaling such that the two objects (one originally in the painting, the other copy-pasted) appear to have equal sizes in pictorial space. Since the original object serves as a reference, the number of trials increases with n instead of n², as in the original method. We measured the pictorial spaces of two paintings using our method, one Canaletto and one Breughel. We found that observers typically agreed well with one another; coefficients of determination as high as 0.9 were found when the probe was a human, while other probes scored somewhat (but significantly) lower. These initial findings appear very promising for the study of pictorial space.
Drawing accuracy measured using polygons
Linda Carson, Matthew Millard, Nadine Quehl, et al.
The study of drawing, for its own sake and as a probe into human visual perception, generally depends on ratings by human critics and self-reported expertise of the drawers. To complement those approaches, we have developed a geometric approach to analyzing drawing accuracy, one whose measures are objective, continuous and performance-based. Drawing geometry is represented by polygons formed by landmark points found in the drawing. Drawing accuracy is assessed by comparing the geometric properties of polygons in the drawn image to the equivalent polygon in a ground truth photo. There are four distinct properties of a polygon: its size, its position, its orientation and the proportionality of its shape. We can decompose error into four components and investigate how each contributes to drawing performance. We applied a polygon-based accuracy analysis to a pilot data set of representational drawings and found that an expert drawer outperformed a novice on every dimension of polygon error. The results of the pilot data analysis correspond well with the apparent quality of the drawings, suggesting that the landmark and polygon analysis is a method worthy of further study. Applying this geometric analysis to a within-subjects comparison of accuracy in the positive and negative space suggests there is a trade-off on dimensions of error. The performance-based analysis of geometric deformations will allow the study of drawing accuracy at different levels of organization, in a systematic and quantitative manner. We briefly describe the method and its potential applications to research in drawing education and visual perception.
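A Procrustes-style sketch of the four-way error decomposition follows; it is one plausible formalization of the size/position/orientation/shape split, not necessarily the authors' exact measures:

```python
import numpy as np

def polygon_error(drawn, truth):
    """Decompose the discrepancy between a drawn polygon and the
    equivalent polygon in the ground-truth photo (both k x 2 landmark
    arrays) into position (centroid offset), size (scale ratio),
    orientation (best-fit rotation angle), and shape (residual
    Procrustes distance)."""
    cd, ct = drawn.mean(axis=0), truth.mean(axis=0)
    position_error = np.linalg.norm(cd - ct)          # centroid offset
    d, t = drawn - cd, truth - ct
    sd, st = np.linalg.norm(d), np.linalg.norm(t)
    size_ratio = sd / st                              # 1.0 = correct size
    U, _, Vt = np.linalg.svd(d.T @ t)                 # Kabsch rotation fit
    R = (U @ Vt).T
    if np.linalg.det(R) < 0:                          # disallow reflections
        Vt[-1] *= -1
        R = (U @ Vt).T
    orientation_error = np.degrees(np.arctan2(R[1, 0], R[0, 0]))
    shape_error = np.linalg.norm(t / st - (d / sd) @ R.T)
    return position_error, size_ratio, orientation_error, shape_error
```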
Fractals in art and nature: why do we like them?
Branka Spehar, Richard P. Taylor
Fractals have experienced considerable success in quantifying the visual complexity exhibited by many natural patterns, and continue to capture the imagination of scientists and artists alike. Fractal patterns have also been noted for their aesthetic appeal, a suggestion further reinforced by the discovery that the poured patterns of the American abstract painter Jackson Pollock are also fractal, together with the findings that many forms of art resemble natural scenes in showing scale-invariant, fractal-like properties. While some have suggested that fractal-like patterns are inherently pleasing because they resemble natural patterns and scenes, the relation between the visual characteristics of fractals and their aesthetic appeal remains unclear. Motivated by our previous findings that humans display a consistent preference for a certain range of fractal dimension across fractal images of various types, we turn to scale-specific processing of visual information to understand this relationship. Whereas our previous preference studies focused on fractal images consisting of black shapes on white backgrounds, here we extend our investigations to include grayscale images in which the intensity variations exhibit scale invariance. This scale invariance is generated using a 1/f frequency distribution and can be tuned by varying the slope of the rotationally averaged Fourier amplitude spectrum. Thresholding the intensity of these images generates black and white fractals with equivalent scaling properties to the original grayscale images, allowing a direct comparison of preferences for grayscale and black and white fractals. We found no significant differences in preferences between the two groups of fractals. For both sets of images, visual preference peaked for images with amplitude spectrum slopes from 1.25 to 1.5, confirming and extending the previously observed relationship between the fractal characteristics of images and visual preference.
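The stimulus construction is easy to sketch: synthesize noise with a 1/f^slope rotationally averaged amplitude spectrum, then threshold it to obtain a black-and-white fractal with matched scaling (the slope value below is illustrative, chosen within the preferred 1.25-1.5 range):

```python
import numpy as np

def scale_invariant_image(n=256, slope=1.4, seed=0):
    """Grayscale noise whose rotationally averaged Fourier amplitude
    spectrum falls as 1/f**slope, plus a thresholded black-and-white
    version with matched scaling properties."""
    rng = np.random.default_rng(seed)
    fy = np.fft.fftfreq(n)[:, None]
    fx = np.fft.fftfreq(n)[None, :]
    f = np.hypot(fy, fx)
    f[0, 0] = 1.0                            # avoid division by zero at DC
    amplitude = 1.0 / f ** slope
    amplitude[0, 0] = 0.0                    # zero-mean image
    phase = np.exp(2j * np.pi * rng.random((n, n)))
    gray = np.real(np.fft.ifft2(amplitude * phase))
    bw = gray > np.median(gray)              # thresholded B/W counterpart
    return gray, bw
```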
Interactive Paper Session
Picture perception and visual field
Andrea J. van Doorn, Huib de Ridder, Jan Koenderink
Looking at a picture fills part of the visual field. In the case of straight photographs there is a notion of the “field of view” of the camera at the time of exposure. Is there a corresponding notion for the perception of the picture? In most cases the part of the visual field (as measured in degrees) filled by the picture will be quite different from the field of view of the camera. The case of works of art is even more complicated: there need not even exist a well-defined central viewpoint. With several examples we show that there is essentially no notion of a corresponding “field of view” in pictorial perception. This is even the case for drawings in conventional linear perspective. Apparently the “mental eye” of the viewer is often unrelated to the geometry of the camera (or the perspective center used in drawing). Observers often substitute templates instead of attempting an analysis of perspective.
Measurements of achromatic and chromatic contrast sensitivity functions for an extended range of adaptation luminance
Inspired by the ModelFest and ColorFest data sets, a contrast sensitivity function was measured for a wide range of adapting luminance levels. The measurements were motivated by the need to collect visual performance data for natural viewing of static images at a broad range of luminance levels, such as can be found with high dynamic range displays. The detection of sine gratings with a Gaussian envelope was measured for the achromatic color axis (black to white), two chromatic axes (green to red and yellow-green to violet) and two mixed chromatic and achromatic axes (dark-green to light-pink, and dark-yellow to light-blue). The background luminance varied from 0.02 to 200 cd/m². The spatial frequency of the gratings varied from 0.125 to 16 cycles per degree. More than four observers participated in the experiments, and each individually determined the detection threshold for each stimulus using at least 20 trials of the QUEST method. Compared to popular CSF models, we observed a larger sensitivity drop at higher frequencies and significant differences in sensitivity in the luminance range between 0.02 and 2 cd/m². Our measurements for the chromatic CSF show a significant drop in sensitivity with luminance, but little change in the shape of the CSF. The drop of sensitivity at high frequencies is significantly weaker than reported in other studies and assumed in most chromatic CSF models.
Viewer preferences for adaptive playout
Adaptive media playout techniques are used to avoid buffer underflow in a dynamic streaming environment where the available bandwidth may fluctuate. In this paper we report human perceptions from audio quality studies that we performed on speech and music samples under adaptive audio playout. Test methods based on the ITU-R BS.1534-1 recommendation were used. Studies were conducted for both slow playout and fast playout. Two scales, a coarse scale and a finer scale, were used for the slow and fast audio playout factors. Results from our study can be used to determine acceptable slow and fast playout factors for speech and music content. An adaptive media playout algorithm could use knowledge of these upper and lower bounds on playback speeds to decide its adaptive playback schedule.
The effect of familiarity on perceived interestingness of images
Sharon Lynn Chu, Elena Fedorovskaya, Francis Quek, et al.
We present an exploration of familiarity as a meaningful dimension for the individualized adaptation of media-rich interfaces. In particular, we investigate the effect of digital images personalized for familiarity on users’ perceived interestingness. Two dimensions of familiarity, facial familiarity and familiarity with image context, are manipulated. Our investigation consisted of three studies: the first two address how morphing technology can be used to convey meaningful familiarity, and the third studies the effect of such familiarity on users’ sense of interestingness. Four levels of person familiarity varying in degree of person knowledge (Self, Friend, Celebrity, and Stranger) and two levels of context familiarity varying in frequency of exposure (Familiar and Unfamiliar contexts) were considered. Experimental results showed significant main effects of both context and person familiarity. Our findings deepen understanding of the critical element of familiarity in HCI and its relationship to the interestingness of images, and can have great impact on the design of media-rich systems.
Quantifying patterns of dynamics in eye movement to measure goodness in organization of design elements in interior architecture
Hasti Mirkia, Arash Sangari, Mark Nelson, et al.
Architecture brings together diverse elements to enhance the observer’s sense of esthetics and the convenience of functionality. Architects often conceptualize the synthesis of design elements to invoke the observer’s sense of harmony and positive affect. How does an observer’s brain respond to harmony of design in interior spaces? One implicit consideration by architects is the role of guided visual attention as observers navigate indoors. Prior visual experience of natural scenes provides the perceptual basis for the Gestalt of design elements. In contrast, the Gestalt of organization in design varies according to the architect’s decisions. We outline a quantitative theory to measure success in utilizing the observer’s psychological factors to achieve the desired positive affect, together with a unified framework for the perception of geometry and motion in interior spaces that integrates affective and cognitive aspects of human vision in the context of anthropocentric interior design. The affective criteria are derived from contemporary theories of interior design. Our contribution is to demonstrate that the neural computations underlying an observer’s eye movements can be used to elucidate harmony in the perception of form, space, and motion, and thus provide a measure of goodness of interior design. Through mathematical modeling, we argue for the plausibility of the relevant hypotheses.
Development of a human vision simulation camera and its application: implementation of specific color perception
Hiroshi Okumura, Shoichiro Takubo, Shoichi Ozaki, et al.
The authors have developed HuVisCam, a human vision simulation camera that can simulate not only the Purkinje effect for mesopic and scotopic vision but also dark and light adaptation, as well as abnormal miosis and abnormal mydriasis caused by mydriatic medicine or nerve agents. The camera consists of a bandpass pre-filter, a color USB camera, an illuminator, and a small computer. In this article, an improvement of HuVisCam for specific color perception is discussed. For persons with normal color perception, a function simulating various types of specific color perception is provided. In addition, for persons with specific color perception, a color-information analyzing function is also provided.
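As a rough illustration of how specific color perception can be simulated in software (HuVisCam's internal model is not reproduced here), the sketch below applies the widely cited Viénot-Brettel-Mollon linear approximation for protanopia in LMS cone space.

```python
# Widely cited linear approximation for protanopia (Vienot, Brettel &
# Mollon): linear RGB -> LMS cone space, reconstruct the missing L response
# from M and S, then map back. This is not HuVisCam's internal model.
import numpy as np

RGB2LMS = np.array([[17.8824,   43.5161,  4.11935],
                    [3.45565,   27.1554,  3.86714],
                    [0.0299566, 0.184309, 1.46709]])
PROTAN = np.array([[0.0, 2.02344, -2.52581],   # L rebuilt from M and S
                   [0.0, 1.0,      0.0],
                   [0.0, 0.0,      1.0]])
SIM = np.linalg.inv(RGB2LMS) @ PROTAN @ RGB2LMS

def simulate_protanopia(img):
    """img: float array of shape (..., 3), linear RGB in [0, 1]."""
    return np.clip(img @ SIM.T, 0.0, 1.0)
```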
IMF-based chaotic characterization of AP and ML visually-driven postural responses
Hanif Azhar, Guillaume Giraudet, Jocelyn Faubert
The objective was to analyze visually driven postural responses and characterize any non-linear behaviour. We recorded physiological responses for two adults, 260 trials each. The subjects maintained quiet stance while fixating for four seconds within an immersive room (EON Icube), where the visual reference stimulus, a virtual platform, randomly oscillated with Gaussian motion at orientations of 90° and 270° for antero-posterior (AP) responses and 0° and 180° for medio-lateral (ML) responses, at three different frequencies (0.125, 0.25, and 0.5 Hz). We obtained stationary derivatives of the posture time series by taking intrinsic mode functions (IMFs). The phase-space plots of the IMFs show evidence of non-linear attractors in both ML and AP. The slope of the correlation integral with increasing embedding dimension is similar to that of random white noise for ML, and similar to that of a non-linear chaotic series for AP. Recurrence plots likewise indicate more non-linearity in AP than in ML: the patterns of dots after the 200th time stamp (near onset) appear aperiodic in AP. At larger temporal windows, AP entropy tends more toward that of a chaotic series than ML entropy does. Overall, there are stronger non-linear components in AP than in ML regardless of the speed condition.
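Two of the nonlinear tools named above, time-delay (phase-space) embedding and the recurrence plot, can be sketched as follows; the embedding parameters and test signal are illustrative, not those used in the study.

```python
# Time-delay embedding and recurrence plot for a (detrended, IMF-like)
# signal; embedding dimension, delay, and threshold are illustrative.
import numpy as np

def embed(x, dim=3, tau=5):
    """Delay-embed a 1-D series into a dim-dimensional phase space."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])

def recurrence_matrix(x, dim=3, tau=5, eps=0.1):
    pts = embed(x, dim, tau)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    return (d < eps).astype(int)   # 1 where the trajectory revisits itself

signal = np.sin(np.linspace(0, 20 * np.pi, 1000))  # stand-in for an IMF
R = recurrence_matrix(signal, eps=0.2)             # plot R to see structure
```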
Application of imaging technology for archaeology researches: framework design for connectivity analysis in pieces of Jomon pottery
Kimiyoshi Miyata, Ryota Yajima, Kenichi Kobayashi
Jomon pottery is a kind of earthenware produced in the Jomon period in Japan. Pieces of pottery are found in excavations at archaeological sites, but their original whole shapes have been lost because the vessels are broken and separated into many small fragments. In the archaeological investigation process, reconstructing the whole shape of a vessel is an important and difficult task because there are many pieces and the number of possible combinations among them is huge. In this paper, a framework for applying imaging technology is explained first; then connectivity analysis among pieces of Jomon pottery is addressed in order to reduce the trial and error needed to find connectable combinations of pieces. Authentic pieces are selected and photographed with a digital camera, and each piece in the image is labeled so that statistical information can be calculated for the connectivity analysis. A coefficient expressing the connectivity of pieces is defined and calculated to indicate the probability of connection among the pieces in the image. Experimental results showed that the correct pieces could be detected using this coefficient.
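The paper's connectivity coefficient is not reproduced here; the hypothetical stand-in below ranks candidate pairs by similarity of a simple boundary descriptor, only to illustrate how such a coefficient can prune the combination space before manual matching.

```python
# Toy stand-in for a connectivity coefficient (not the paper's definition):
# rank candidate pairs by cosine similarity of a centroid-to-boundary
# distance histogram. Assumes each piece mask avoids the image border.
import numpy as np

def boundary_descriptor(mask, bins=32):
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()
    interior = mask & np.roll(mask, 1, 0) & np.roll(mask, -1, 0) \
                    & np.roll(mask, 1, 1) & np.roll(mask, -1, 1)
    by, bx = np.nonzero(mask & ~interior)          # boundary pixels
    r = np.hypot(by - cy, bx - cx)                 # centroid-to-boundary radii
    hist, _ = np.histogram(r, bins=bins, density=True)
    return hist

def connectivity(mask_a, mask_b):
    """Toy score in [0, 1]: higher means 'try joining these pieces first'."""
    a, b = boundary_descriptor(mask_a), boundary_descriptor(mask_b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```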
Top-down visual search in Wimmelbild
Julia Bergbauer, Sibel Tari
Wimmelbild, which means “teeming figure picture,” is a popular genre of visual puzzle: abundant masses of small figures are brought together in complex arrangements to make one scene, and the viewer plays a picture-hunt game. We discuss what types of computations or processes could underlie the discovery of figures that are hidden due to the distracting influence of the context. One thing is certain: the processes are unlikely to be purely bottom-up. One possibility is to re-arrange parts and see what happens; as this idea is linked to creativity, there are abundant examples of unconventional part re-organization in modern art. A second possibility is to define what to look for, that is, to formulate the search as a top-down process. We address top-down visual search in Wimmelbild with the help of diffuse distance and curvature coding fields.
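Diffuse distance fields of the kind invoked here can be obtained by solving a screened Poisson equation over the image domain; the minimal Jacobi-iteration sketch below uses an illustrative grid, smoothing parameter, and iteration count, not the authors' settings.

```python
# Screened Poisson ("diffuse distance") field over a binary shape mask.
import numpy as np

def diffuse_field(shape_mask, rho=16.0, iters=2000):
    """Solve (laplacian - 1/rho**2) v = -mask with v = 0 on the border."""
    v = np.zeros(shape_mask.shape, dtype=float)
    m = shape_mask.astype(float)
    k = 1.0 / rho**2
    for _ in range(iters):
        nb = (np.roll(v, 1, 0) + np.roll(v, -1, 0) +
              np.roll(v, 1, 1) + np.roll(v, -1, 1))
        v = (nb + m) / (4.0 + k)                        # Jacobi update
        v[0, :] = v[-1, :] = v[:, 0] = v[:, -1] = 0.0   # Dirichlet border
    return v                                  # smooth, distance-like field
```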
Visual discrimination and adaptation using non-linear unsupervised learning
Sandra Jiménez, Valero Laparra, Jesus Malo
Understanding human vision involves not only empirical descriptions of how it works, but also organization principles that explain why it does so. Identifying the guiding principles behind visual phenomena requires learning algorithms that optimize specific goals. Moreover, these algorithms have to be flexible enough to account for the non-linear and adaptive behavior of the system. For instance, linear redundancy-reduction transforms certainly explain a wide range of visual phenomena. However, the generality of this organization principle is still in question [10]: not only may additional constraints such as energy cost be relevant, but statistical independence may also not be the best solution for making optimal inferences in squared-error terms. Moreover, linear methods cannot account for the non-uniform discrimination in different regions of the image and color space: linear learning methods necessarily disregard the non-linear nature of the system. In order to account for the non-linear behavior, principled approaches commonly resort to the trick of using (already non-linear) parametric expressions taken from empirical models. Such approaches do not actually explain the non-linear behavior; they merely fit it to image statistics. In summary, a proper explanation of the behavior of the system requires flexible unsupervised learning algorithms that (1) are tunable to different, perceptually meaningful goals, and (2) make no assumptions about the non-linearity. Over the last few years we have worked on learning algorithms of this kind based on non-linear ICA [18], Gaussianization [19], and principal curves. In this work we stress that these methods can be tuned to optimize different design strategies, namely statistical independence, error minimization under quantization, and error minimization under truncation. We then show (1) how to apply these techniques to explain a number of visual phenomena, and (2) suggest the underlying organization principle in each case.
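One of the named tools, iterative Gaussianization, admits a very compact sketch: alternate marginal Gaussianization with a random rotation until the joint density approaches a Gaussian. The bare-bones version below is our illustration, not the authors' implementation.

```python
# Bare-bones iterative Gaussianization: alternate marginal Gaussianization
# (empirical CDF + inverse normal CDF) with a random rotation.
import numpy as np
from scipy import stats

def gaussianize(X, n_layers=20, seed=0):
    """X: (n_samples, n_dims) data matrix; returns transformed samples."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    for _ in range(n_layers):
        ranks = np.argsort(np.argsort(X, axis=0), axis=0) + 1  # 1..n per dim
        X = stats.norm.ppf(ranks / (n + 1))      # marginals become Gaussian
        Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
        X = X @ Q                                # rotation mixes dimensions
    return X
```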
Chromatic induction and contrast masking: similar models, different goals?
Sandra Jiménez, Xavier Otazu, Valero Laparra, et al.
Normalization of signals coming from linear sensors is a ubiquitous mechanism of neural adaptation [1]. Local interaction between sensors tuned to a particular feature at a certain spatial position and neighboring sensors explains a wide range of psychophysical facts, including (1) masking of spatial patterns, (2) non-linearities of motion sensors, (3) adaptation of color perception, (4) brightness and chromatic induction, and (5) image quality assessment. Although the above models have formal and qualitative similarities, this does not necessarily mean that the mechanisms involved pursue the same statistical goal. For instance, in the case of chromatic mechanisms (disregarding spatial information), different parameters in the normalization give rise to optimal discrimination or adaptation, and different non-linearities may give rise to error minimization or component independence. In the case of spatial sensors (disregarding color information), a number of studies have pointed out the benefits of masking in statistical-independence terms. However, such a statistical analysis has not been performed for spatio-chromatic induction models, in which chromatic perception depends on spatial configuration. In this work we investigate whether successful spatio-chromatic induction models [6] increase component independence, as previously reported for masking models. A mutual information analysis suggests that seeking an efficient chromatic representation may explain the prevalence of induction effects in spatially simple images.
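The normalization referred to above has a standard textbook form, with each sensor's response divided by a pooled measure of its neighbors' activity; a generic sketch follows, in which the exponent, saturation constant, and pooling weights are illustrative rather than fitted values.

```python
# Generic divisive normalization: each sensor's energy is divided by a
# pooled measure of its neighbors' activity. gamma, beta, and W are
# illustrative.
import numpy as np

def divisive_normalization(s, W, beta=0.1, gamma=2.0):
    """s: non-negative linear sensor outputs (n,); W: (n, n) pool weights."""
    e = s**gamma
    return e / (beta + W @ e)

s = np.array([0.2, 1.0, 0.4])
W = np.full((3, 3), 1.0 / 3.0)      # uniform pooling over all three sensors
print(divisive_normalization(s, W))
```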
Aesthetics and entropy II: a critical examination
The proposal to use entropy as a metric for optimization of image processing has been subjected to further critical examination, on the basis of experiments with contrast adjustment, HDR imaging, bimodal brightness distributions, and unsharp masking. Consequently our original expectation that entropy may be a directly useful response metric for optimizing image processing now appears to us to be naïve and limited in its applicability. One purpose of the present investigation is to ascertain the nature of these limits. We also infer from the unsharp masking studies that the human visual system (HVS) has evolved not so much to maximize information captured from the visual field as to enhance compressibility and to effect image simplification.