World, environment, Umwelt, and innerworld: a biological perspective on visual awareness
Author(s):
Jan J. Koenderink
The world is all physical reality (Higgs bosons, and so forth), the “environment” is a geographical locality (your city, …), the “Umwelt” is the totality of possible actions of the environment on the sensitive body surface of an agent (you, your dog, …) and of possible actions of the agent on the environment (mechanical, chemical, …), whereas the “innerworld” is what it is for the agent to be, that is, awareness. Awareness is pre-personal, proto-conscious, and (perhaps) proto-rational. The various “worlds” described above are on distinct ontological levels. The world and the environment are studied in the exact sciences; the Umwelt is studied by physiology and ethology. Ethology is like behavioristic psychology, with the difference that it applies to all animals. It skips the innerworld; e.g., it considers speech to be a movement of air molecules. The innerworld can only be known through first-person reports and is thus intrinsically subjective. It can only be approached through “experimental phenomenology”, which is based on intersubjectivity among humans. In this setting speech may mean something in addition to the movements of molecules. These views lead to a model of vision as an “optical user interface”, with consequences for many applications.
Does evolution favor true perceptions?
Author(s):
Donald D. Hoffman;
Manish Singh;
Justin Mark
Does natural selection favor veridical perceptions, those that more accurately depict the objective environment? Vision researchers often claim that it does. But this claim, though influential, has not been adequately tested. In this paper we formalize the claim and a few alternatives. We then discuss the results of evolutionary games and genetic algorithms that indicate that veridical perceptions can be driven to extinction by non-veridical strategies that are tuned to fitness rather than to objective reality. This suggests that natural selection need not favor veridical perceptions, and that the effects of natural selection on perception deserve further study. We then address the question: How can an organism act effectively in an environment that it does not perceive veridically? (Acknowledgement: Brian Marion and Kyle Stevens are collaborators in this research.)
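The extinction result can be illustrated with a toy two-strategy replicator simulation; the payoff values below are hypothetical stand-ins chosen only to show the mechanism, not the evolutionary games or genetic algorithms used in the paper.

```python
# Toy two-strategy replicator dynamics: strategy 0 ("veridical") perceives the
# objective state at a cost; strategy 1 ("fitness-tuned") acts on fitness alone.
# The payoff matrix is hypothetical, chosen only to illustrate the mechanism.

def replicator(payoff, x, steps=2000, dt=0.01):
    """Evolve the population share x of strategy 0 under replicator dynamics."""
    for _ in range(steps):
        f0 = payoff[0][0] * x + payoff[0][1] * (1 - x)  # mean payoff, veridical
        f1 = payoff[1][0] * x + payoff[1][1] * (1 - x)  # mean payoff, fitness-tuned
        mean = x * f0 + (1 - x) * f1
        x += dt * x * (f0 - mean)
    return x

# The fitness-tuned strategy skips the cost of estimating the truth,
# earning one extra payoff unit in every encounter.
payoff = [[3.0, 1.0],
          [4.0, 2.0]]

share = replicator(payoff, x=0.5)
print(f"veridical share after evolution: {share:.2e}")
```

With these payoffs the fitness-tuned strategy earns strictly more against either opponent, so the veridical share decays toward zero, which is the qualitative pattern the paper reports.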
Mapping luminance onto lightness in vision and art
Author(s):
Alan Gilchrist
Most natural images span a large range of luminance values, often a thousand-to-one and sometimes a million-to-one (Heckaman and Fairchild, 2009). This luminance range must be mapped by the visual system onto a scale of perceived gray shades (called lightness) with a range of roughly thirty-to-one (90% to 3% reflectance). For the painter who wants to represent this scene on a canvas, the challenge is a bit different. The high-range scene must be represented using pigments with a range of only thirty-to-one. Let’s begin with vision. Even without a high range in the scene, understanding the mapping of luminance onto lightness has proven elusive. But we can think of the problem as having two parts: Anchoring and scaling.
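The anchoring part of this mapping can be sketched in a few lines: the highest luminance in a scene is assigned white (90% reflectance) and the other luminances follow in proportion, clipped at the bottom of the perceived gray scale. The scene values below are illustrative, not measured data.

```python
# Anchoring to white: the highest luminance in the scene maps to white
# (90% reflectance); other surfaces follow in proportion to their luminance
# ratio to that anchor, clipped at the darkest perceivable gray.

WHITE = 90.0  # percent reflectance perceived as white
BLACK = 3.0   # approximate bottom of the perceived gray scale

def anchor_to_white(luminances):
    """Map luminances to perceived reflectances by anchoring the maximum to white."""
    anchor = max(luminances)
    return [max(BLACK, WHITE * lum / anchor) for lum in luminances]

scene = [800.0, 400.0, 80.0, 8.0]  # cd/m^2, a 100:1 luminance range
print(anchor_to_white(scene))
```

Note how a 100:1 luminance range is compressed into the roughly 30:1 reflectance scale, with the lowest value clipped to black; the scaling problem is precisely what such a simple rule leaves open.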
Human lightness perception is guided by simple assumptions about reflectance and lighting
Author(s):
Richard F. Murray
Lightness constancy is the remarkable ability of human observers to perceive surface reflectance accurately despite
variations in illumination and context. Two successful approaches to understanding lightness perception that have
developed along independent paths are anchoring theory and Bayesian theories. Anchoring theory is a set of rules that
predict lightness percepts under a wide range of conditions. Some of these rules are counterintuitive and difficult to
motivate, e.g., a rule that large surfaces tend to look lighter than small surfaces. Bayesian theories are formulated as
probabilistic assumptions about lights and objects, and they model percepts as rational inferences from sensory data.
Here I reconcile these two seemingly divergent approaches by showing that many rules of anchoring theory follow from
simple probabilistic assumptions about lighting and reflectance. I describe a simple Bayesian model that makes
maximum a posteriori interpretations of luminance images, and I show that this model predicts many of the phenomena
described by anchoring theory, including anchoring to white, scale normalization, and rules governing glow. Thus
anchoring theory can be formulated naturally in a Bayesian framework, and this approach shows that many seemingly
idiosyncratic properties of human lightness perception are actually rational consequences of simple assumptions about
lighting and reflectance.
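The MAP idea can be sketched with a toy one-dimensional version: each luminance is the product of an illumination level and a reflectance constrained to [0.03, 0.9], and a weak prior favoring dim illumination makes the brightest surface come out white. The priors and numbers here are illustrative stand-ins, not the model's actual assumptions.

```python
import math

# Toy MAP interpreter: luminance = illumination * reflectance, with reflectance
# bounded to [0.03, 0.9] and a weak prior preferring dim illumination.
# All priors and values are stand-ins for illustration only.

R_MIN, R_MAX = 0.03, 0.9

def map_illumination(luminances, candidates):
    """Grid-search the illumination level with the highest posterior score."""
    best_e, best_score = None, -math.inf
    for e in candidates:
        refl = [lum / e for lum in luminances]
        if any(r < R_MIN or r > R_MAX for r in refl):
            continue  # interpretation inconsistent with the reflectance range
        score = -0.1 * math.log(e)  # weak prior favoring dim illumination
        if score > best_score:
            best_e, best_score = e, score
    return best_e

luminances = [90.0, 45.0, 9.0]
e_hat = map_illumination(luminances, [100.0 + 10.0 * k for k in range(91)])
reflectances = [round(lum / e_hat, 3) for lum in luminances]
print(e_hat, reflectances)
```

Because the prior prefers the dimmest illumination consistent with the reflectance bounds, the highest luminance is interpreted as white (0.9 reflectance): an anchoring-to-white rule emerging from simple probabilistic assumptions, in the spirit of the abstract.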
Spatial imaging in color and HDR: Prometheus unchained
Author(s):
John J. McCann
The Human Vision and Electronic Imaging Conferences (HVEI) at the IS&T/SPIE Electronic Imaging meetings
have brought together research in the fundamentals of both vision and digital technology. This conference has
incorporated many color disciplines that have contributed to the theory and practice of today's imaging: color
constancy, models of vision, digital output, high-dynamic-range imaging, and the understanding of perceptual
mechanisms. Before digital imaging, silver halide color was a pixel-based mechanism. Color films are closely tied to
colorimetry, the science of matching pixels in a black surround. The quanta catch of the sensitized silver salts
determines the amount of colored dyes in the final print. The rapid expansion of digital imaging over the past 25
years has eliminated the limitations of using small local regions in forming images. Spatial interactions can now
generate images more like vision. Since the 1950s, neurophysiology has shown that post-receptor neural processing
is based on spatial interactions. These results reinforced the findings of 19th century experimental psychology. This paper reviews the role of HVEI in color, emphasizing the interaction of research on vision and the new algorithms and processes made possible by electronic imaging.
Visual stimuli: past and present
Author(s):
Gerald Westheimer
The fundamental properties of light and the principles of the structure and function of the visual system were discovered at a time when the only light sources were the sun and the flame of a candle. Contributions by Newton, Huygens, Thomas Young and Purkinje, Helmholtz’s ophthalmoscope – all preceded the first incandescent filament. Light bulbs, Xenon arcs, lasers, light-emitting diodes (LEDs), computer monitors then sequentially enlarged the arsenal, and so did the steps from Nicol prism to Polaroid in polarizing light, and from glass and interference filters to laser light in generating monochromatic light. Technological advances have a deep impact on the research topics at any one time, expanding their scope. In particular, utilization of computers now allows the generation and manipulation of targets permitting questions to be approached that could not have been envisaged at the dawn of the technological era of vision research. Just beyond the immediate grasp of even the most thoughtful vision scientist, however, is the concern that stimulus sets originating in mathematicians’ and physicists’ toolboxes fail to capture some essential ingredients indigenous to human vision. The quest to study vision with stimuli in its own terms continues.
Emergent technologies: 25 years
Author(s):
Hawley K. Rising III
This paper surveys the technologies that have emerged over the 25 years since the Human Vision and Electronic Imaging conference began, technologies that the conference has both followed and helped shape, and looks at those emerging today, such as social networks, haptic technologies, and still-maturing imaging technologies, as well as what we might expect in the future. Twenty-five years is a long time, and it is not without difficulty that we remember what was emerging in the late 1980s. The first commercial digital still camera was not yet on the market, although there were hand-held electronic cameras. Personal computers did not display standardized images, and image quality could not be discussed in a standardized fashion, if only because image compression algorithms would not be standardized for several more years. Standards for movie compression were even further away, and no personal computer on the horizon could have displayed such movies. Image comparison and search, which later became an emergent technology and filled many sessions, was not yet possible, nor was the currently emerging technology of social networks: the World Wide Web was still several years away. Printer technology was still devising dithers and image-size manipulations that would consume many years, as would scanning technology, and image quality for both was a major issue for dithers and Fourier noise. From these humble beginnings to the current developments that are changing computing and the meaning of both electronic devices and human interaction with them, we trace a course through changing technology in which some features stay constant for many years while others come and go.
Perceptual approaches to finding features in data
Author(s):
Bernice E. Rogowitz
Electronic imaging applications hinge on the ability to discover features in data. For example, doctors examine diagnostic images for tumors, broken bones and changes in metabolic activity. Financial analysts explore visualizations of market data to find correlations, outliers and interaction effects. Seismologists look for signatures in geological data to tell them where to drill or where an earthquake may begin. These data are very diverse, including images, numbers, graphs, 3-D graphics, and text, and are growing exponentially, largely through the rise in automatic data collection technologies such as sensors and digital imaging. This paper explores important trends in the art and science of finding features in data, such as the tension between bottom-up and top-down processing, the semantics of features, and the integration of human- and algorithm-based approaches. This story is told from the perspective of the IS&T/SPIE Conference on Human Vision and Electronic Imaging (HVEI), which has fostered research at the intersection between human perception and the evolution of new technologies.
Is image quality a function of contrast perception?
Author(s):
Andrew M. Haun;
Eli Peli
In this retrospective we trace in broad strokes the development of image quality measures based on the study of the early
stages of the human visual system (HVS), where contrast encoding is fundamental. We find that presenters at the
Human Vision and Electronic Imaging meetings have frequently strived to find points of contact between the study of
human contrast psychophysics and the development of computer vision and image quality algorithms. Progress has not
always been made on these terms, although indirect impact of vision science on more recent image quality metrics can be
observed.
Visible contrast energy metrics for detection and discrimination
Author(s):
Albert J. Ahumada;
Andrew B. Watson
Contrast energy was proposed by Watson, Barlow, and Robson (Science, 1983) as a useful metric for representing luminance contrast target stimuli because it represents the detectability of the stimulus in photon noise for an ideal observer. We propose here the use of visible contrast energy metrics for detection and discrimination among static luminance patterns. The visibility is approximated with spatial-frequency sensitivity weighting and eccentricity sensitivity weighting. The suggested weighting functions revise the Standard Spatial Observer (Watson and Ahumada, J. Vision, 2005) for luminance contrast detection, extend it into the near periphery, and provide compensation for duration. Under the assumption that detection is limited only by internal noise, both detection and discrimination performance can be predicted by metrics based on the visible energy of the difference images.
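A minimal sketch of such a metric: contrast samples are weighted by a spatial-frequency sensitivity function and their squared sum is scaled by area and duration. The CSF shape and all constants below are generic illustrations, not the revised Standard Spatial Observer weighting.

```python
import math

# Sketch of a visible contrast energy computation: CSF-weighted contrast is
# squared, summed over space, and scaled by pixel area and stimulus duration.
# The CSF and all constants here are generic stand-ins for illustration.

def csf(freq_cpd):
    """Generic band-pass contrast sensitivity, peaking near 4 cycles/deg."""
    return freq_cpd * math.exp(-freq_cpd / 4.0)

def visible_contrast_energy(contrast_samples, freq_cpd, pixel_area_deg2, duration_s):
    """Sum of squared CSF-weighted contrast, scaled by area and duration."""
    w = csf(freq_cpd)
    return sum((w * c) ** 2 for c in contrast_samples) * pixel_area_deg2 * duration_s

# A faint 4 c/deg grating patch; a detection metric would compare this
# energy against a threshold energy.
samples = [0.01 * math.sin(2 * math.pi * k / 8) for k in range(64)]
energy = visible_contrast_energy(samples, freq_cpd=4.0, pixel_area_deg2=1e-4, duration_s=0.2)
print(f"visible contrast energy: {energy:.3e}")
```

The same physical contrast yields less visible energy at frequencies where sensitivity is low, which is what lets the metric predict detectability rather than raw stimulus strength.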
Initial spatio-temporal domain expansion of the Modelfest database
Author(s):
Thom Carney;
Sahar Mozaffari;
Sean Sun;
Ryan Johnson;
Sharonya Shrivastava;
Priscilla Shen;
Emma Ly
The first Modelfest group publication appeared in the SPIE Human Vision and Electronic Imaging conference
proceedings in 1999. "One of the group's goals is to develop a public database of test images with threshold data from
multiple laboratories for designing and testing human visual system (HVS) models." After extended discussions the group
selected a set of 45 static images thought to best meet that goal and collected psychophysical detection data, which are
available on the Web and were presented in the 2000 SPIE conference proceedings. Several groups have used these datasets
to test spatial modeling ideas. Further discussions led to the preliminary stimulus specification for extending the database
into the temporal domain, which was published in the 2002 conference proceedings.
After a hiatus of 12 years, some of us have collected spatio-temporal thresholds on an expanded stimulus set of
41 video clips; the original specification included 35 clips. The principal change involved adding one additional spatial
pattern beyond the three originally specified. The stimuli consisted of four spatial patterns: a Gaussian blob, a 4 c/d Gabor
patch, an 11.3 c/d Gabor patch, and a 2D white noise patch. Across conditions the patterns were temporally modulated over
a range of approximately 0-25 Hz; temporal edge and pulse modulation conditions were also included. The display and data
collection specifications were as specified by the Modelfest groups in the 2002 conference proceedings.
To date seven subjects have participated in this phase of the data collection effort, one of whom also
participated in the first phase of Modelfest. Three of the spatio-temporal stimuli were identical to conditions in the
original static dataset. Small differences in the thresholds were evident and may point to a stimulus limitation. The
temporal CSF peaked between 4 and 8 Hz for the 0 c/d (Gaussian blob) and 4 c/d patterns. The 4 c/d and 11.3 c/d Gabor
temporal CSF was low pass while the 0 c/d pattern was band pass.
This preliminary expansion of the Modelfest dataset needs the participation of additional laboratories to
evaluate the impact of different methods on threshold estimates and increase the subject base. We eagerly await the
addition of new data from interested researchers. It remains to be seen how accurately general HVS models will predict
thresholds across both Modelfest datasets.
A database of local masking thresholds in natural images
Author(s):
Md Mushfiqul Alam;
Kedarnath P. Vilankar;
Damon M. Chandler
The ability of an image region to hide or mask a target signal continues to play a key role in the design of numerous image-processing and vision applications. However, one of the challenges in designing an effective model of masking for natural images is the lack of ground-truth data. To address this issue, this paper describes a psychophysical study designed to obtain local contrast detection thresholds (masking maps) for a database of natural images. Via a three-alternative forced-choice experiment, we measured the thresholds for detecting 3.7 cycles/deg vertically oriented log-Gabor targets placed within each 85×85-pixel patch (1.9 deg patch) of 15 natural images from the CSIQ image database [Larson and Chandler, JEI, 2010]. Thus, for each image, we obtained a masking map in which each entry in the map denotes the RMS contrast threshold for detecting the log-Gabor target at the corresponding spatial location in the image. Here, we describe the psychophysical procedures used to collect the thresholds, we provide analyses of the results, and we provide some outcomes of predicting the thresholds via basic low-level features, a computational masking model, and two modern image-quality assessment algorithms.
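For reference, the quantity reported in each masking-map entry is an RMS contrast; a minimal sketch of that computation for a single patch (the luminance values are illustrative):

```python
import math

# RMS contrast of a luminance patch: the standard deviation of luminance
# divided by the mean luminance. Values below are illustrative only.

def rms_contrast(patch):
    """Return std(luminance) / mean(luminance) for a flat list of samples."""
    mean = sum(patch) / len(patch)
    var = sum((p - mean) ** 2 for p in patch) / len(patch)
    return math.sqrt(var) / mean

patch = [100.0, 110.0, 90.0, 105.0, 95.0]
print(f"RMS contrast: {rms_contrast(patch):.4f}")
```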
Interplay between JPEG-2000 image coding and quality estimation
Author(s):
Guilherme O. Pinto;
Sheila S. Hemami
Image quality and utility estimators aspire to quantify the perceptual resemblance and the usefulness of a
distorted image when compared to a reference natural image, respectively. Image-coders, such as JPEG-2000,
traditionally aspire to allocate the available bits to maximize the perceptual resemblance of the compressed
image when compared to a reference uncompressed natural image. Specifically, this can be accomplished by
allocating the available bits to minimize the overall distortion, as computed by a given quality estimator. This
paper applies five image quality and utility estimators, SSIM, VIF, MSE, NICE and GMSE, within a JPEG-2000
encoder for rate-distortion optimization to obtain new insights on how to improve JPEG-2000 image coding for
quality and utility applications, as well as to improve the understanding about the quality and utility estimators
used in this work.
This work develops a rate-allocation algorithm for arbitrary quality and utility estimators within the Post-
Compression Rate-Distortion Optimization (PCRD-opt) framework in JPEG-2000 image coding. Performance
of the JPEG-2000 image coder when used with a variety of utility and quality estimators is then assessed.
The estimators fall into two broad classes, magnitude-dependent (MSE, GMSE and NICE) and magnitude-independent
(SSIM and VIF). They further differ on their use of the low-frequency image content in computing
their estimates. The impact of these computational differences is analyzed across a range of images and bit rates.
In general, performance of the JPEG-2000 coder below 1.6 bits/pixel with any of these estimators is highly content
dependent, with the most relevant content being the amount of texture in an image and whether the strongest
gradients in an image correspond to the main contours of the scene. Above 1.6 bits/pixel, all estimators produce
visually equivalent images. As a result, the MSE estimator provides the most consistent performance across all
images, while specific estimators are expected to provide improved performance for images with suitable content.
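The per-block step at the heart of the PCRD-opt framework can be sketched as a Lagrangian choice among truncation points; the rate-distortion points below are made up for illustration, not taken from a real JPEG-2000 codestream.

```python
# Schematic of Lagrangian rate allocation in the PCRD-opt spirit: each
# code-block offers (rate, distortion) truncation points, and for a given
# lambda every block independently picks the point minimizing D + lambda * R.
# The R-D points below are invented for illustration.

def allocate(blocks, lam):
    """Per-block choice of the truncation point minimizing D + lam * R."""
    choices = []
    for points in blocks:
        best = min(points, key=lambda rd: rd[1] + lam * rd[0])
        choices.append(best)
    return choices

# (rate_bits, distortion) truncation points; distortion falls as rate rises.
blocks = [
    [(0, 100.0), (10, 40.0), (20, 10.0)],
    [(0, 50.0), (8, 30.0), (16, 5.0)],
]

low_lam = allocate(blocks, lam=0.1)    # cheap bits: spend more, lower distortion
high_lam = allocate(blocks, lam=10.0)  # expensive bits: truncate early
print(low_lam, high_lam)
```

Sweeping the multiplier trades rate against the chosen estimator's distortion, which is what allows an arbitrary quality or utility estimator to be plugged into the allocation in place of MSE.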
From image quality to atmosphere experience: how evolutions in technology impact experience assessment
Author(s):
Ingrid Heynderickx;
Huib de Ridder
Image quality is a concept that long served well to optimize display performance and signal quality. New
technological developments, however, forced the community to look into higher-level concepts to capture the full
experience. Terms such as naturalness and viewing experience were used to optimize the full experience of 3D displays and
Ambilight TV. These higher-level concepts capture differences in image quality and differences in perceived depth or
perceived viewing field. With the introduction of solid-state lighting, which further enhances the multimedia experience, yet
more advanced quality evaluation concepts will be needed in the future to optimize the overall experience.
Preference limits of the visual dynamic range for ultra high quality and aesthetic conveyance
Author(s):
Scott Daly;
Timo Kunkel;
Xing Sun;
Suzanne Farrell;
Poppy Crum
A subjective study was conducted to investigate the preferred maximum and minimum display luminances in order to
determine the dynamic ranges for future displays. Two studies addressed the diffuse reflective regions, and a third study
tested preferences for highlight regions. Preferences, as opposed to detection thresholds, were studied to provide results more directly relevant to the viewing of entertainment or art. Test images were specifically designed to test these limits without the perceptual conflicts that usually occur in these types of studies. For the diffuse range, we found that a display with a dynamic range spanning luminances between 0.1 and 650 cd/m2 matches the average preferences. However, to satisfy 90% of the population, a dynamic range from 0.005 to ~3,000 cd/m2 is needed. Since a display should be able to produce values brighter than the diffuse white maximum, as in specular highlights and emissive sources, the highlight study concludes that even the average preferred maximum luminance for highlight reproduction is ~4,000 cd/m2.
Visualizing lighting with images: converging between the predictive value of renderings and photographs
Author(s):
Ulrich Engelke;
Mariska G. M. Stokkermans;
Michael J. Murdoch
Performing psychophysical experiments to investigate lighting perception can be expensive and time-consuming
if complex lighting systems need to be implemented. In this paper, display-based experiments are explored
as a cost-effective and less time-consuming alternative to real-world experiments. The aim of this work is to
better understand the upper limit of prediction accuracy that can be achieved when presenting an image on a
display rather than the real-world scene. We compare the predictive value of photographs and physically-based
renderings on a number of perceptual lighting attributes. It is shown that the photographs convey statistically
the same lighting perception as in a real-world scenario. Initial renderings have an inferior performance, but are
shown to converge towards the performance of the photographs through iterative improvements.
A survey on 3D quality of experience and 3D quality assessment
Author(s):
Anush K. Moorthy;
Alan C. Bovik
The field of algorithmically assessing 3D quality of experience and/or 3D quality is extremely challenging,
making it fertile ground for research. The complexity of the problem, coupled with our yet nascent
understanding of 3D perception and the increasing commercial shift toward 3D entertainment makes the area
of 3D QA interesting, formidable and practically relevant. This article undertakes a brief review of the recent
research in the area of 3D visual quality of experience and quality assessment. We first review literature in the
field of quality of experience which encompasses geometry, visual discomfort etc., and then perform a similar
review in the field of quality assessment which encompasses distortions such as blur, noise, compression etc. We
describe algorithms and databases that have been proposed in the literature for these purposes. We conclude
with a short description of a recent resource, the LIVE 3D IQA database, the first quality assessment
database that provides researchers with access to true depth information, obtained from a high-precision
range scanner, for each of the stereo pairs.
Visual quality beyond artifact visibility
Author(s):
Judith A. Redi
The electronic imaging community has devoted a lot of effort to the development of technologies that can predict the
visual quality of images and videos, as a basis for the delivery of optimal visual quality to the user. These systems have
been based for the most part on a visibility-centric approach, assuming that the more visible the artifacts, the greater the
annoyance they provoke and the lower the visual quality. Despite the remarkable results achieved with this approach,
a number of recent studies suggested that the visibility-centric approach to visual quality might have limitations, and
that other factors might influence the overall quality impression of an image or video, depending on cognitive and
affective mechanisms that work on top of perception. In particular, interest in the visual content, engagement, and context of usage have been found to impact the overall quality impression of the image or video. In this paper, we review these studies and explore the impact that affective and cognitive processes have on visual quality. In addition, as a case study, we present the results of an experiment investigating the impact of aesthetic appeal on visual quality, and we show that users tend to be more demanding in terms of visual quality when judging beautiful images.
Subjective matters: from image quality to image psychology
Author(s):
Elena A. Fedorovskaya;
Huib De Ridder
From the advent of digital imaging through several decades of studies, the human vision research community systematically focused on perceived image quality and digital artifacts due to resolution, compression, gamma, dynamic range, capture and reproduction noise, blur, etc., to help overcome existing technological challenges and shortcomings. Technological advances have made digital images and digital multimedia nearly flawless in quality, and ubiquitous and pervasive in usage; they provide us with the exciting but at the same time demanding possibility of turning to the domain of human experience, including higher psychological functions such as cognition, emotion, awareness, social interaction, consciousness, and the Self. In this paper we outline the evolution of human-centered multidisciplinary studies related to imaging and propose steps and potential foci of future research.
The rough side of texture: texture analysis through the lens of HVEI
Author(s):
Thrasyvoulos N. Pappas
We take a look at texture analysis research over the past 25 years, from the perspective of the Human Vision
and Electronic Imaging conference. We consider advances in the understanding of human perception of textures
and the development of texture analysis algorithms for practical applications. We cover perceptual models
and algorithms for image halftoning, texture discrimination, texture segmentation, texture analysis/synthesis,
perceptually and structurally lossless compression, content-based retrieval, and sense substitution.
Optimizing visual performance by adapting images to observers
Author(s):
Michael A. Webster;
Igor Juricevic
Visual adaptation is widely assumed to optimize visual performance, but demonstrations of functional benefits beyond the case of light adaptation remain elusive. The failure to find marked improvements in visual discriminations with contrast or pattern adaptation may occur because these become manifest only over timescales that are too long to probe by briefly adapting observers. We explored the potential consequences of color contrast adaptation by instead “adapting” images to simulate how they should appear to observers under theoretically complete adaptation to different environments, and then used a visual search task to measure the ability to detect colors within the adapted images. Color salience can be markedly improved for extreme environments to which the observer is not routinely exposed, and may also be enhanced even among naturally occurring outdoor environments. The changes in performance provide a measure of how much in theory the visual system can be optimized for a given task and environment, and can reveal the extent to which differences in the statistics of the environment or the sensitivity of the observer are important in driving the states of adaptation. Adapting the images also provides a potential practical tool for optimizing performance in novel visual contexts, by rendering image information in a format that the visual system is already calibrated for.
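The image-adaptation step can be sketched as a von Kries-style rescaling in which complete adaptation to an environment maps that environment's mean color to neutral; the pixel values and environment statistics below are illustrative, not those used in the study.

```python
# Minimal sketch of "adapting" an image to an environment: complete adaptation
# is modeled as rescaling each color channel by the environment's mean
# response (a von Kries-style normalization). All values are illustrative.

def adapt_image(pixels, env_means):
    """Rescale each channel so the adapting environment maps to neutral 1.0."""
    return [tuple(v / m for v, m in zip(px, env_means)) for px in pixels]

# A reddish environment: after adaptation, reds are attenuated and a formerly
# inconspicuous greenish target gains salience.
pixels = [(0.8, 0.4, 0.3), (0.4, 0.6, 0.3)]
reddish_env = (0.8, 0.4, 0.3)
adapted = adapt_image(pixels, reddish_env)
print([tuple(round(v, 3) for v in px) for px in adapted])
```

A pixel matching the environment mean becomes neutral, while a pixel that departs from it is pushed away from neutral; this is the mechanism by which adapting images can increase the salience of colors the environment does not contain.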
Efficient image representations and features
Author(s):
Michael Dorr;
Eleonora Vig;
Erhardt Barth
Interdisciplinary research in human vision and electronic imaging has greatly contributed to the current state of the art in imaging technologies. Image compression and image quality are prominent examples, and the progress made in these areas relies on a better understanding of what natural images are and how they are perceived by the human visual system. A key research question has been: given the (statistical) properties of natural images, what are the most efficient and perceptually relevant image representations, and what are the most prominent and descriptive features of images and videos?
We give an overview of how these topics have evolved over the 25 years of HVEI conferences and how they have influenced the current state of the art. There are a number of striking parallels between human vision and electronic imaging: the retina performs lateral inhibition, and one of the early coders used a Laplacian pyramid; primary visual cortical areas have orientation- and frequency-selective neurons, and the JPEG 2000 standard defines similar wavelet transforms; the brain uses a sparse code, and engineers are currently excited about sparse coding and compressed sensing. Some of this has indeed happened at the HVEI conferences, and we would like to distill that.
Highly overcomplete sparse coding
Author(s):
Bruno A. Olshausen
This paper explores sparse coding of natural images in the highly overcomplete regime. We show that as the overcompleteness ratio approaches 10x, new types of dictionary elements emerge beyond the classical Gabor function shape obtained from complete or only modestly overcomplete sparse coding. These more diverse dictionaries allow images to be approximated with lower L1 norm (for a fixed SNR), and the coefficients exhibit steeper decay. We also evaluate the learned dictionaries in a denoising task, showing that higher degrees of overcompleteness yield modest gains in performance.
Blind image quality assessment without training on human opinion scores
Author(s):
Anish Mittal;
Rajiv Soundararajan;
Gautam S. Muralidhar;
Alan C. Bovik;
Joydeep Ghosh
We propose a family of image quality assessment (IQA) models based on natural scene statistics (NSS) that can predict the subjective quality of a distorted image without reference to a corresponding distortionless image, and without any training on human opinion scores of distorted images. These `completely blind' models compete well with standard non-blind image quality indices in terms of subjective predictive performance when tested on the large publicly available `LIVE' Image Quality database.
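A typical starting point for such NSS features is the mean-subtracted, contrast-normalized (MSCN) transform, whose statistics are highly regular for natural images and disturbed by distortion. A one-dimensional sketch with an illustrative window size and stabilizing constant:

```python
import math

# MSCN coefficients: each sample has its local mean removed and is divided by
# the local standard deviation (plus a stabilizing constant). A 1-D sketch
# with a 3-sample window; the window and constant are illustrative choices.

def mscn(signal, c=1.0):
    """Return (x - local mean) / (local std + c) using a 3-sample window."""
    out = []
    for i in range(len(signal)):
        win = signal[max(0, i - 1):i + 2]
        mu = sum(win) / len(win)
        sigma = math.sqrt(sum((v - mu) ** 2 for v in win) / len(win))
        out.append((signal[i] - mu) / (sigma + c))
    return out

# For natural signals the MSCN histogram is near-Gaussian; distortions change
# its shape, which is what a completely blind model can measure.
coeffs = mscn([100.0, 102.0, 98.0, 101.0, 99.0])
print([round(v, 3) for v in coeffs])
```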
Saliency identified by absence of background structure
Author(s):
Fred W. M. Stentiford
Visual attention is commonly modelled by attempting to characterise objects using features that make them special or in some way distinctive in a scene. These approaches have the disadvantage that it is never certain which features will be relevant in an object that has not been seen before. This paper provides a brief outline of the approaches to modelling human visual attention, together with some of the problems that they face. A graphical representation for image similarity is described that relies on the size of maximally associative structures (cliques) that are found to be reflected in pairs of images. When an image is compared with itself, the similarity mechanism is shown to model pop-out effects, provided constraints are placed on the physical separation of pixels that correspond to nodes in the maximal cliques. Background regions are found to contain structure in common that is not present in the salient regions, which are thereby identified by its absence. The approach is illustrated with figures that exemplify asymmetry in pop-out, the conjunction of features, orientation disturbances, and the application to natural images.
Investigation of eye-catching colors using eye tracking
Author(s):
Mokryun Baik;
Hyeon-Jeong Suk;
Jeongmin Lee;
Kyungah Choi
An eye tracking experiment was conducted to investigate the relationship between eye gazing movements and color attributes, to support the creation of effective communication and increase aesthetic satisfaction. With consideration to the context of smart phones, the study focused on icon arrays, and thus each stimulus set was composed of 25 color square patches arrayed in a 5 by 5 grid. The experiment was divided into three parts, each examining one specific attribute of color while controlling the others. Fifteen college students were recruited, all of whom partook in all three parts. In Part I, hue difference was examined: each stimulus set contained 25 hues under a fixed tone. It was revealed that subjects were more attentive to warm colors than to cool colors, particularly when warm colors were arranged along the horizontal and vertical axes. In Part II, the experiment dealt with tone difference: 25 tone variations for red, green and blue were provided as stimulus sets. The results indicated that changes in tone do not have a significant influence on subjects’ initial attention. Lastly, in Part III, combinations of colors were examined to determine whether color contrast influenced participants’ attention in a manner different from that of single colors; icons with complementary contrast gained the greatest attention. Throughout the experiments, the background was either black or white; however, a contrast effect between background and foreground was not noticeable.
Can relative skill be determined from a photographic portfolio?
Author(s):
Abhishek Agrawal;
Vittal Premachandran;
Rajesh Somavarapu;
Ramakrishna Kakarala
Show Abstract
In this study, our primary aim is to determine empirically the role that skill plays in determining image aesthetics, and
whether it can be deciphered from the ratings given by a diverse group of judges. To this end, we have collected and
analyzed data from a large number of subjects (168 in total) on a set of 221 images taken by 33 photographers having
different photographic skill and experience. We also experimented with the rating scales used by previous studies in this
domain by introducing a binary rating system for collecting judges’ opinions. The study also demonstrates the use of
Amazon Mechanical Turk as a crowd-sourcing platform in collecting scientific data and evaluating the skill of the judges
participating in the experiment. We use a variety of performance and correlation metrics to evaluate the consistency of
ratings across different rating scales and compare our findings. A novel feature of our study is an attempt to define a
threshold based on the consistency of ratings when judges rate duplicate images. Our conclusion deviates from earlier
findings and our own expectations, with ratings not being able to determine skill levels of photographers to a statistically
significant level.
Binocular eye movements in health and disease
Author(s):
Christopher W. Tyler
Show Abstract
Binocular eye movements form a finely tuned system that requires accurate coordination of oculomotor dynamics,
supports the vergence movements for tracking the fine binocular disparities required for 3D vision, and is
particularly susceptible to disruption by brain injury and other neural dysfunctions. Saccadic dynamics for a population
of 84 diverse participants show tight coefficients of variation of 2-10% of the mean value of each parameter.
Significantly slower dynamics were seen for vertical upward saccades. Binocular coordination of saccades was accurate
to within 1-4%, implying the operation of brainstem coordination mechanisms rather than independent cortical control of
the two eyes. A new principle of oculomotor control, reciprocal binocular inhibition, is introduced to complement
Sherrington’s and Hering’s Laws. This new law accounts for the fact that symmetrical vergence responses are about five
times slower than saccades of the same amplitude, although a comprehensive analysis of asymmetrical vergence
responses revealed unexpected variety in vergence dynamics. This analysis of the variety of human vergence responses
thus contributes substantially to the understanding of the oculomotor control mechanisms underlying the generation of
vergence movements and of the deficits in the oculomotor control resulting from mild traumatic brain injury.
Reflexive and voluntary control of smooth eye movements
Author(s):
Jeffrey B. Mulligan;
Scott B. Stevenson;
Lawrence K. Cormack
Show Abstract
An understanding of visually evoked smooth eye movements is required to predict the visibility and legibility of
moving displays, such as might be encountered in vehicles like aircraft and automobiles. We have studied the
response of the oculomotor system to various classes of visual stimuli, and analyzed the results separately for
horizontal and vertical version (in which the two eyes move together), and horizontal and vertical vergence (where
they move in opposite directions). Of the four types of motion, only vertical vergence cannot be performed under
voluntary control, and certain stimuli (all having relatively long latencies) are incapable of evoking it. In another
experiment, we instructed observers to track one of two targets, and measured weak but reliable responses to the
unattended target, in which the long-latency component of the response is abolished. Our results are consistent
with a system containing two distinct processes, a fast reflexive process which responds to a restricted class
of stimuli, and a slower voluntary process capable of following anything that can be seen, but incapable of
controlling vertical vergence.
Simple gaze-contingent cues guide eye movements in a realistic driving simulator
Author(s):
Laura Pomarjanschi;
Michael Dorr;
Peter J. Bex;
Erhardt Barth
Show Abstract
Looking at the right place at the right time is a critical component of driving skill. Therefore, gaze guidance has the potential to become a valuable driving assistance system. In previous work, we have already shown that complex gaze-contingent stimuli can guide attention and reduce the number of accidents in a simple driving simulator. We here set out to investigate whether cues that are simple enough to be implemented in a real car can also capture gaze during a more realistic driving task in a high-fidelity driving simulator. We used a state-of-the-art, wide-field-of-view driving simulator with an integrated eye tracker. Gaze-contingent warnings were implemented using two arrays of light-emitting diodes horizontally fitted below and above the simulated windshield. Thirteen volunteer subjects drove along predetermined routes in a simulated environment populated with autonomous traffic. Warnings were triggered during the approach to half of the intersections, cueing either to the right or to the left. The remaining intersections were not cued, and served as controls.
The analysis of the recorded gaze data revealed that the gaze-contingent cues did indeed have a gaze guiding effect, triggering a significant shift in gaze position towards the highlighted direction. This gaze shift was not accompanied by changes in driving behaviour, suggesting that the cues do not interfere with the driving task itself.
Designing an obstacle display for helicopter operations in degraded visual environment
Author(s):
Patrizia M. Knabl;
Niklas Peinecke
Show Abstract
Flying in a degraded visual environment is an extremely challenging task for a helicopter pilot. The loss of the outside visual reference causes impaired situation awareness, high workload and spatial disorientation, leading to incidents like obstacle or ground hits. DLR is working on identifying ways to reduce this problem by providing the pilot with additional information from fused sensor data. Therefore, different display design solutions were developed. In a first study, the design focused on the use of a synthetic head-down display, considering different representations for obstacles, color coding and terrain features. Results show a subjective preference for the most detailed obstacle display, while objective results reveal better performance for the slightly less detailed display. In a second study, symbology for a helmet-mounted display was designed and evaluated in a part-task simulation. Design considerations focused on different obstacle representations as well as attentional and perceptual aspects associated with the use of helmet-mounted displays. Results are consistent with the first experiment, indicating that the display subjectively favored does not necessarily yield the best detection performance. However, when additional tasks have to be performed, the level of clutter seems to impair the ability to respond correctly to secondary tasks. Thus the favored display type nonetheless seems to be the most promising solution, since it is accompanied by the best overall objective results, integrating both detection of obstacles and the ability to perform additional tasks.
Visual storytelling in 2D and stereoscopic 3D video: effect of blur on visual attention
Author(s):
Quan Huynh-Thu;
Cyril Vienne;
Laurent Blondé
Show Abstract
Visual attention is an inherent mechanism that plays an important role in the human visual perception. As our visual
system has limited capacity and cannot efficiently process the information from the entire visual field, we focus our
attention on specific areas of interest in the image for detailed analysis of these areas. In the context of media
entertainment, the viewers’ visual attention deployment is also influenced by the art of visual storytelling. To date,
visual editing and composition of scenes in stereoscopic 3D content creation still mostly follow the conventions used in 2D. In
particular, out-of-focus blur is often used in 2D motion pictures and photography to drive the viewer’s attention towards
a sharp area of the image. In this paper, we study specifically the impact of defocused foreground objects on visual
attention deployment in stereoscopic 3D content. For that purpose, we conducted a subjective experiment using an eye tracker.
Our results bring more insights on the deployment of visual attention in stereoscopic 3D content viewing, and
provide further understanding on visual attention behavior differences between 2D and 3D. Our results show that a
traditional 2D scene compositing approach such as the use of foreground blur does not necessarily produce the same
effect on visual attention deployment in 2D and 3D. Implications for stereoscopic content creation and visual fatigue are
discussed.
Using natural versus artificial stimuli to perform calibration for 3D gaze tracking
Author(s):
Christophe Maggia;
Nathalie Guyader;
Anne Guérin-Dugué
Show Abstract
The presented study tests which type of stereoscopic image, natural or artificial, is better suited for performing efficient and reliable calibration in order to track the gaze of observers in 3D space using a classical 2D eye tracker. We measured the horizontal disparities, i.e., the difference between the x coordinates of the two eyes obtained using a 2D eye tracker. This disparity was recorded for each observer and for several target positions the observer had to fixate. Target positions were equally distributed in 3D space: some on the screen (with null disparity), some behind the screen (uncrossed disparity) and others in front of the screen (crossed disparity). We tested different regression models (linear and non-linear) to explain either the true disparity or the depth from the measured disparity. Models were tested and compared on their prediction error for new targets at new positions. First of all, we found that we obtained more reliable disparity measures when using natural stereoscopic images rather than artificial ones. Second, we found that overall a non-linear model was more efficient. Finally, we discuss the fact that our results were observer-dependent, with variability between observers’ behavior when looking at 3D stimuli. Because of this variability, we propose to compute observer-specific models to accurately predict gaze position when exploring 3D stimuli.
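A minimal sketch of the calibration step described above, assuming a simple least-squares fit from the disparity measured by the 2D eye tracker to the true target disparity; the paper also tests non-linear models, which are not reproduced here, and the numbers below are illustrative.

```python
def fit_linear(measured, true):
    """Least-squares line y = a*x + b mapping measured disparity
    (eye-tracker x-coordinate difference) to true target disparity."""
    n = len(measured)
    mx = sum(measured) / n
    my = sum(true) / n
    a = sum((x - mx) * (y - my) for x, y in zip(measured, true)) \
        / sum((x - mx) ** 2 for x in measured)
    return a, my - a * mx

def rms_error(model, measured, true):
    """Prediction error of the calibration model on held-out targets."""
    a, b = model
    return (sum((a * x + b - y) ** 2
                for x, y in zip(measured, true)) / len(measured)) ** 0.5

# Illustrative calibration targets: crossed (<0), null (0) and
# uncrossed (>0) disparities, as in the target layout of the study.
train_measured = [-20.0, -10.0, 0.0, 10.0, 20.0]
train_true = [-22.0, -11.0, 0.0, 11.0, 22.0]
model = fit_linear(train_measured, train_true)
```

Models of this form can then be compared, per observer, on their prediction error for new target positions.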
Study of center-bias in the viewing of stereoscopic image and a framework for extending 2D visual attention models to 3D
Author(s):
Junle Wang;
Matthieu Perreira Da Silva;
Patrick Le Callet;
Vincent Ricordel
Show Abstract
Compared to the good performance that can be achieved by many 2D visual attention models, predicting salient regions of a 3D scene is still challenging. An efficient way to achieve this can be to exploit existing models designed for 2D content. However, the visual conflicts caused by binocular disparity and changes of viewing behavior in 3D viewing need to be dealt with. To cope with these, the present paper proposes a simple framework for extending
2D attention models to 3D images, as well as evaluates center-bias in the 3D-viewing condition. To validate the results, a database was created containing the eye movements of 35 subjects recorded during free viewing of eighteen 3D images and their corresponding 2D versions. Fixation density maps indicate a weaker center-bias in the viewing of 3D images. Moreover, objective metric results demonstrate the efficiency of the proposed model and the large added value of center-bias when it is taken into account in computational modeling of 3D visual attention.
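One way such a framework might combine the ingredients mentioned above is sketched below: a 2D saliency map is modulated by depth and by a (weakened) center-bias prior. The multiplicative combination and the Gaussian prior are assumptions for illustration; the paper's actual model is not detailed in the abstract.

```python
import math

def saliency_3d(sal2d, depth, sigma=0.5):
    """Combine a 2D saliency map with depth and a center-bias prior.

    sal2d, depth: equal-sized 2D lists with values in [0, 1]
    (depth: 1 = near). sigma: width of the center-bias Gaussian as a
    fraction of the image size (larger sigma = weaker center-bias,
    as observed for 3D viewing). All choices here are illustrative.
    """
    h, w = len(sal2d), len(sal2d[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # center-bias: Gaussian falloff from the image center
            cx, cy = (x - w / 2) / w, (y - h / 2) / h
            bias = math.exp(-(cx * cx + cy * cy) / (2 * sigma ** 2))
            row.append(sal2d[y][x] * depth[y][x] * bias)
        out.append(row)
    return out
```

Weakening the center-bias term (increasing sigma) for 3D content would reflect the weaker center-bias reported in the fixation density maps.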
How visual attention is modified by disparities and textures changes?
Author(s):
Dar'ya Khaustova;
Jérome Fournier;
Emmanuel Wyckens;
Olivier Le Meur
Show Abstract
The 3D image/video quality of experience is a multidimensional concept that depends on 2D image quality, depth quantity and visual comfort. The relationship between these parameters is not yet clearly defined. From this perspective, we aim to understand how texture complexity, depth quantity and visual comfort influence the way people observe 3D content in comparison with 2D. Six scenes with different structural parameters were generated using Blender software. For these six scenes, the following parameters were modified: texture complexity and the amount of depth, changing the camera baseline and the convergence distance at the shooting side. Our study was conducted using an eye tracker and a 3DTV display. During the eye-tracking experiment, each observer freely examined images with different depth levels and texture complexities. To avoid memory bias, we ensured that each observer saw each scene content only once. Collected fixation data were used to build saliency maps and to analyze differences between 2D and 3D conditions. Our results show that the introduction of disparity shortened saccade length; however, fixation durations remained unaffected. An analysis of the saliency maps did not reveal any differences between 2D and 3D conditions for the viewing duration of 20 s. When the whole period was divided into smaller intervals, we found that for the first 4 s the introduced disparity was conducive to the selection of salient regions. However, this contribution is quite minimal if the correlation between saliency maps is analyzed. Nevertheless, we did not find that discomfort (or comfort) had any influence on visual attention. We believe that existing metrics and methods are depth-insensitive and do not reveal such differences. Based on the analysis of heat maps and paired t-tests of inter-observer visual congruency values, we deduced that the selected areas of interest depend on texture complexities.
Copy-paste in depth
Author(s):
Maarten W. A. Wijntjes
Show Abstract
Whereas pictorial space plays an important role in art-historical discussions, there is little research on the quantitative structure of pictorial spaces. Recently, a number of methods have been developed, one of which relies on size constancy: two spheres are rendered in the image while the observer adjusts their relative sizes such that they appear to have similar sizes in pictorial space. This method is based on pair-wise comparisons, resulting in n(n-1)/2 trials for n samples. Furthermore, it renders a probe in the image that does not conform to the style of the painting: it mixes computer graphics with a painting. The method proposed here uses probes that are already in the scene, and so does not violate the painting's style. An object is copied from the original painting and shown in a different location. The observer can adjust the scaling such that the two objects (one originally in the painting, and the other copy-pasted) appear to have equal sizes in pictorial space. Since the original object serves as a reference, the number of trials increases with n instead of n^2, as is the case for the original method. We measured the pictorial spaces of two paintings using our method, one Canaletto and one Breughel. We found that observers typically agreed well with each other; coefficients of determination as high as 0.9 were found when the probe was a human, while other probes scored somewhat (but significantly) lower. These initial findings appear very promising for the study of pictorial space.
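The gain in efficiency is easy to make concrete: with pair-wise comparisons the number of trials grows quadratically in the number of sample points, whereas with a fixed copy-pasted reference it grows linearly.

```python
def pairwise_trials(n):
    """Trials needed when every pair of probes is compared once."""
    return n * (n - 1) // 2

def reference_trials(n):
    """Trials needed when each probe is compared to a single
    copy-pasted reference object, as in the proposed method."""
    return n

# For 20 sample points the saving is already substantial.
print(pairwise_trials(20), reference_trials(20))  # 190 20
```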
Drawing accuracy measured using polygons
Author(s):
Linda Carson;
Matthew Millard;
Nadine Quehl;
James Danckert
Show Abstract
The study of drawing, for its own sake and as a probe into human visual perception, generally depends on ratings by human critics and self-reported expertise of the drawers. To complement those approaches, we have developed a geometric approach to analyzing drawing accuracy, one whose measures are objective, continuous and performance-based. Drawing geometry is represented by polygons formed by landmark points found in the drawing. Drawing accuracy is assessed by comparing the geometric properties of polygons in the drawn image to the equivalent polygon in a ground truth photo. There are four distinct properties of a polygon: its size, its position, its orientation and the proportionality of its shape. We can decompose error into four components and investigate how each contributes to drawing performance. We applied a polygon-based accuracy analysis to a pilot data set of representational drawings and found that an expert drawer outperformed a novice on every dimension of polygon error. The results of the pilot data analysis correspond well with the apparent quality of the drawings, suggesting that the landmark and polygon analysis is a method worthy of further study. Applying this geometric analysis to a within-subjects comparison of accuracy in the positive and negative space suggests there is a trade-off on dimensions of error. The performance-based analysis of geometric deformations will allow the study of drawing accuracy at different levels of organization, in a systematic and quantitative manner. We briefly describe the method and its potential applications to research in drawing education and visual perception.
Fractals in art and nature: why do we like them?
Author(s):
Branka Spehar;
Richard P Taylor
Show Abstract
Fractals have experienced considerable success in quantifying the visual complexity exhibited by many natural patterns,
and continue to capture the imagination of scientists and artists alike. Fractal patterns have also been noted for their
aesthetic appeal, a suggestion further reinforced by the discovery that the poured patterns of the American abstract
painter Jackson Pollock are also fractal, together with the findings that many forms of art resemble natural scenes in
showing scale-invariant, fractal-like properties. While some have suggested that fractal-like patterns are inherently
pleasing because they resemble natural patterns and scenes, the relation between the visual characteristics of fractals and
their aesthetic appeal remains unclear. Motivated by our previous findings that humans display a consistent preference
for a certain range of fractal dimension across fractal images of various types we turn to scale-specific processing of
visual information to understand this relationship. Whereas our previous preference studies focused on fractal images
consisting of black shapes on white backgrounds, here we extend our investigations to include grayscale images in which
the intensity variations exhibit scale invariance. This scale-invariance is generated using a 1/f frequency distribution and
can be tuned by varying the slope of the rotationally averaged Fourier amplitude spectrum. Thresholding the intensity of
these images generates black and white fractals with equivalent scaling properties to the original grayscale images,
allowing a direct comparison of preferences for grayscale and black and white fractals. We found no significant
differences in preferences between the two groups of fractals. For both set of images, the visual preference peaked for
images with the amplitude spectrum slopes from 1.25 to 1.5, thus confirming and extending the previously observed
relationship between fractal characteristics of images and visual preference.
Picture perception and visual field
Author(s):
Andrea J. van Doorn;
Huib de Ridder;
Jan Koenderink
Show Abstract
Looking at a picture fills part of the visual field. In the case of straight photographs there is a notion of the “Field of View” of the camera at the time of exposure. Is there a corresponding notion for the perception of the picture? In most cases the part of the visual field (as measured in degrees) filled by the picture will be quite different from the field of view of the camera. The case of works of art is even more complicated: there need not even exist a well-defined central viewpoint. With several examples we show that there is essentially no notion of a corresponding “field of view” in pictorial perception. This is even the case for drawings in conventional linear perspective. Apparently the “mental eye” of the viewer is often unrelated to the geometry of the camera (or the perspective center used in drawing). Observers often substitute templates instead of attempting an analysis of perspective.
Measurements of achromatic and chromatic contrast sensitivity functions for an extended range of adaptation luminance
Author(s):
Kil Joong Kim;
Rafal Mantiuk;
Kyoung Ho Lee
Show Abstract
Inspired by the ModelFest and ColorFest data sets, a contrast sensitivity function was measured for a wide range
of adapting luminance levels. The measurements were motivated by the need to collect visual performance data
for natural viewing of static images at a broad range of luminance levels, such as can be found in the case of high
dynamic range displays. The detection of sine-gratings with Gaussian envelope was measured for achromatic
color axis (black to white), two chromatic axes (green to red and yellow-green to violet) and two mixed chromatic
and achromatic axes (dark-green to light-pink, and dark yellow to light-blue). The background luminance varied
from 0.02 to 200 cd/m2. The spatial frequency of the gratings varied from 0.125 to 16 cycles per degree. More
than four observers participated in the experiments and they individually determined the detection threshold
for each stimulus using at least 20 trials of the QUEST method. As compared to the popular CSF models, we
observed a steeper sensitivity drop at higher frequencies and significant differences in sensitivities in the luminance
range between 0.02 and 2 cd/m2. Our measurements for chromatic CSF show a significant drop in sensitivity with
luminance, but little change in the shape of the CSF. The drop of sensitivity at high frequencies is significantly
weaker than reported in other studies and assumed in most chromatic CSF models.
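The stimulus class used in the measurements, a sine grating with a Gaussian envelope, can be generated as follows; parameter names and values are illustrative, not those of the experiment, and the output is a luminance modulation around the adapting background rather than an absolute luminance.

```python
import math

def gabor_patch(size=64, cpd=4.0, deg=2.0, contrast=0.5, sigma_deg=0.5):
    """Sine grating with a Gaussian envelope.

    size: patch width in pixels; deg: patch width in visual degrees;
    cpd: spatial frequency in cycles per degree; sigma_deg: envelope
    width in degrees. Returns modulation values in [-contrast, contrast].
    """
    patch = []
    for y in range(size):
        row = []
        for x in range(size):
            # pixel position in degrees, centred on the patch
            dx = (x - size / 2) * deg / size
            dy = (y - size / 2) * deg / size
            carrier = math.sin(2 * math.pi * cpd * dx)
            envelope = math.exp(-(dx * dx + dy * dy)
                                / (2 * sigma_deg ** 2))
            row.append(contrast * carrier * envelope)
        patch.append(row)
    return patch
```

Sweeping cpd over 0.125 to 16 cycles per degree and lowering contrast until detection threshold corresponds to one point of the contrast sensitivity function at a given adapting luminance.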
Viewer preferences for adaptive playout
Author(s):
Sachin Deshpande
Show Abstract
Adaptive media playout techniques are used to avoid buffer underflow in a dynamic streaming environment where the
available bandwidth may be fluctuating. In this paper we report human perceptions from audio quality studies that we
performed on speech and music samples for adaptive audio playout. Test methods based on ITU-R BS. 1534-1
recommendation were used. Studies were conducted for both slow playout and fast playout. Two scales - a coarse scale
and a finer scale - were used for the slow and fast audio playout factors. Results from our study can be used to determine
acceptable slow and fast playout factors for speech and music content. An adaptive media playout algorithm could use
knowledge of these upper and lower bounds on playback speeds to decide its adaptive playback schedule.
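A minimal sketch of how an adaptive playout algorithm might use such bounds. The buffer thresholds are hypothetical, and the slow/fast factors stand in for the acceptability limits the listening tests would supply; none of these values come from the paper.

```python
def playout_factor(buffer_s, low=2.0, high=6.0, slow=0.85, fast=1.15):
    """Pick a playout-speed factor from the current buffer level (s).

    slow/fast are assumed to be the perceptually acceptable bounds
    determined from the subjective study; low/high are illustrative
    buffer watermarks.
    """
    if buffer_s < low:
        return slow    # stretch playout to avoid buffer underflow
    if buffer_s > high:
        return fast    # speed up to drain an over-full buffer
    return 1.0         # normal-speed playout
```

The key design point is that the adaptation never leaves the interval [slow, fast], so playback speed stays within the range listeners rated as acceptable.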
The effect of familiarity on perceived interestingness of images
Author(s):
Sharon Lynn Chu;
Elena Fedorovskaya;
Francis Quek;
Jeffrey Snyder
Show Abstract
We present an exploration of familiarity as a meaningful dimension for the individualized adaptation of media-rich
interfaces. In this paper, we investigate in particular the effect of digital images personalized for familiarity on users’
perceived interestingness. Two dimensions of familiarity, facial familiarity and familiarity with image context, are
manipulated. Our investigation consisted of three studies: the first two address how morphing technology can be used to
convey meaningful familiarity, and the third studies the effect of such familiarity on users’ sense of interestingness. Four
levels of person familiarity varying in degree of person knowledge, and two levels of context familiarity varying in
frequency of exposure, were considered: Self, Friend, Celebrity, and Stranger in Familiar and Unfamiliar contexts.
Experimental results showed significant main effects of context and person familiarity. Our findings deepen
understanding of the critical element of familiarity in HCI and its relationship to the interestingness of images, and can
have great impact for the design of media-rich systems.
Quantifying patterns of dynamics in eye movement to measure goodness in organization of design elements in interior architecture
Author(s):
Hasti Mirkia;
Arash Sangari;
Mark Nelson;
Amir H. Assadi
Show Abstract
Architecture brings together diverse elements to enhance the observer’s measure of esthetics and the convenience of
functionality. Architects often conceptualize synthesis of design elements to invoke the observer’s sense of harmony and
positive affect. How does an observer’s brain respond to harmony of design in interior spaces? One implicit
consideration by architects is the role of guided visual attention by observers while navigating indoors. Prior visual
experience of natural scenes provides the perceptual basis for Gestalt of design elements. In contrast, Gestalt of
organization in design varies according to the architect’s decision. We outline a quantitative theory to measure the
success in utilizing the observer’s psychological factors to achieve the desired positive affect. We outline a unified
framework for perception of geometry and motion in interior spaces, which integrates affective and cognitive aspects of
human vision in the context of anthropocentric interior design. The affective criteria are derived from contemporary
theories of interior design. Our contribution is to demonstrate that the neural computations in an observer’s eye
movement could be used to elucidate harmony in perception of form, space and motion, thus a measure of goodness of
interior design. Through mathematical modeling, we argue the plausibility of the relevant hypotheses.
Development of a human vision simulation camera and its application: implementation of specific color perception
Author(s):
Hiroshi Okumura;
Shoichiro Takubo;
Shoichi Ozaki;
Takeru Kawasaki;
Indra Nugraha Abdullah;
Kohei Arai;
Osamu Fukuda
Show Abstract
The authors have developed HuVisCam, a human vision simulation camera that can simulate not only the Purkinje
effect for mesopic and scotopic vision, but also dark and light adaptation and the abnormal miosis and abnormal
mydriasis caused by the influence of mydriatic medicine or nerve agents. The camera consists of a bandpass
pre-filter, a color USB camera, an illuminator and a small computer. In this article, an improvement of HuVisCam
for specific color perception is discussed. For persons with normal color perception, a simulation function for
various types of specific color perception is provided. In addition, for persons with specific color perception, a
color information analyzing function is also provided.
IMF-based chaotic characterization of AP and ML visually-driven postural responses
Author(s):
Hanif Azhar;
Guillaume Giraudet;
Jocelyn Faubert
Show Abstract
The objective was to analyze visually driven postural responses and characterize any non-linear behaviour. We recorded physiological responses for two adults, 260 trials each. The subjects maintained quiet stance while fixating for four seconds within an immersive room, the EON Icube, where the reference to the visual stimuli, i.e., the virtual platform, randomly oscillated with Gaussian motion at orientations of 90° and 270° for antero-posterior (AP), and 0° and 180° for medio-lateral (ML), at three different frequencies (0.125, 0.25, and 0.5 Hz). We obtained stationary derivatives of the posture time series by taking the intrinsic mode functions (IMFs). The phase space plot of the IMFs shows evidence of the existence of non-linear attractors in both ML and AP. The correlation integral slope with increasing embedding dimension is similar to random white noise for ML, and similar to a non-linear chaotic series for AP. Next, recurrence plots indicate the existence of more non-linearity for AP than for ML. The patterns of the dots after the 200th time stamp (near onset) appear to be aperiodic in AP. At higher temporal windows, AP entropy tends more toward a chaotic series than that of ML. There are stronger non-linear components in AP than in ML regardless of the speed conditions.
Application of imaging technology for archaeology researches: framework design for connectivity analysis in pieces of Jomon pottery
Author(s):
Kimiyoshi Miyata;
Ryota Yajima;
Kenichi Kobayashi
Show Abstract
Jomon pottery is a kind of earthenware produced during the Jomon period in Japan. Pottery is found through excavations at archaeological sites; however, the original whole shapes have been lost because the vessels are broken and separated into small pieces. In the archaeological investigation process, reproducing the whole shape of the pottery is one of the important and difficult tasks, because there are many pieces and the number of possible combinations among them is huge. In this paper, a framework for the application of imaging technology is explained first; then, connectivity analysis among the pieces of Jomon pottery is focused on, to reduce the amount of trial and error needed to find connectable combinations of pieces. Authentic pieces are chosen and photographed with a digital camera, and each piece in the image is labeled to calculate the statistical information used in the analysis of connectivity. A coefficient expressing the connectivity of the pieces is defined and calculated to indicate the probability of connection among the pieces in the image. Experimental results showed that the correct pieces could be detected by using the coefficient.
Top-down visual search in Wimmelbild
Author(s):
Julia Bergbauer;
Sibel Tari
Show Abstract
Wimmelbild, which means “teeming figure picture”, is a popular genre of visual puzzles. Abundant masses of small
figures are brought together in complex arrangements to make one scene in a Wimmelbild. It is a picture-hunt game. We
discuss what type of computations/processes could possibly underlie the discovery of figures that are hidden due to
a distractive influence of the context. One thing that is certain is that the processes are unlikely to be purely
bottom-up. One possibility is to re-arrange parts and see what happens. As this idea is linked to creativity, there are
abundant examples of unconventional part re-organization in modern art. A second possibility is to define what to look
for, that is, to formulate the search as a top-down process. We address top-down visual search in Wimmelbild with the
help of diffuse distance and curvature coding fields.
Visual discrimination and adaptation using non-linear unsupervised learning
Author(s):
Sandra Jiménez;
Valero Laparra;
Jesus Malo
Show Abstract
Understanding human vision not only involves empirical descriptions of how it works, but also organization
principles that explain why it does so. Identifying the guiding principles of visual phenomena requires learning
algorithms to optimize specific goals. Moreover, these algorithms have to be flexible enough to account for the
non-linear and adaptive behavior of the system.
For instance, linear redundancy reduction transforms certainly explain a wide range of visual phenomena.
However, the generality of this organization principle is still in question [10]: it is not only that additional
constraints such as energy cost may be relevant as well, but also that statistical independence may not be the best
solution for making optimal inferences in squared-error terms. Moreover, linear methods cannot account for the
non-uniform discrimination in different regions of the image and color space: linear learning methods necessarily
disregard the non-linear nature of the system. Therefore, in order to account for the non-linear behavior,
principled approaches commonly apply the trick of using (already non-linear) parametric expressions taken from
empirical models. Therefore these approaches are not actually explaining the non-linear behavior, but just
fitting it to image statistics. In summary, a proper explanation of the behavior of the system requires flexible
unsupervised learning algorithms that (1) are tunable to different, perceptually meaningful, goals; and (2) make
no assumption on the non-linearity.
Over the last years we have worked on this kind of learning algorithm, based on non-linear ICA [18], Gaussianization [19],
and principal curves. In this work we stress the fact that these methods can be tuned to
optimize different design strategies, namely statistical independence, error minimization under quantization, and
error minimization under truncation. We then (1) show how to apply these techniques to explain a number of
visual phenomena, and (2) suggest the underlying organization principle in each case.
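As a rough, hypothetical illustration of the Gaussianization idea (in the spirit of rotation-based iterative Gaussianization; the fixed 45° rotation and rank-based marginal transform below are simplifying assumptions for the sketch, not the authors' algorithm):

```python
import math
from statistics import NormalDist

def marginal_gaussianize(xs):
    """Map one marginal to N(0,1) via its empirical CDF (rank transform)."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    nd = NormalDist()
    g = [0.0] * n
    for rank, i in enumerate(order):
        g[i] = nd.inv_cdf((rank + 0.5) / n)
    return g

def rotate(pts, theta):
    """Rotate 2-D points by angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in pts]

def gaussianize_2d(pts, iters=4):
    """Iterative Gaussianization sketch: alternate marginal
    Gaussianization with a (here fixed) rotation, then do a final
    marginal pass so both marginals end up rank-Gaussian."""
    for _ in range(iters):
        xs = marginal_gaussianize([p[0] for p in pts])
        ys = marginal_gaussianize([p[1] for p in pts])
        pts = rotate(list(zip(xs, ys)), math.pi / 4)
    xs = marginal_gaussianize([p[0] for p in pts])
    ys = marginal_gaussianize([p[1] for p in pts])
    return list(zip(xs, ys))
```

After enough iterations the joint distribution approaches an isotropic Gaussian, at which point the marginals are statistically independent.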
Chromatic induction and contrast masking: similar models, different goals?
Author(s):
Sandra Jiménez;
Xavier Otazu;
Valero Laparra;
Jesús Malo
Show Abstract
Normalization of signals coming from linear sensors is a ubiquitous mechanism of neural adaptation [1]. Local
interaction between sensors tuned to a particular feature at a certain spatial position and neighboring sensors explains
a wide range of psychophysical facts, including (1) masking of spatial patterns, (2) non-linearities of motion
sensors, (3) adaptation of color perception, (4) brightness and chromatic induction, and (5) image quality
assessment.
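The normalization referred to here is conventionally written in divisive form, r_i = s_i^g / (b + Σ_j w_ij s_j^g). A minimal sketch, with illustrative constants b and g that are our assumptions rather than fitted parameters from the paper:

```python
def divisive_normalization(responses, weights, b=0.1, g=2.0):
    """Divisive normalization: r_i = s_i^g / (b + sum_j w_ij * s_j^g).
    `responses` are non-negative linear sensor outputs; `weights[i][j]`
    couples sensor i to neighbor j. b (semi-saturation) and g (exponent)
    are illustrative constants."""
    powered = [s ** g for s in responses]
    out = []
    for i, si in enumerate(powered):
        pool = sum(w * sj for w, sj in zip(weights[i], powered))
        out.append(si / (b + pool))
    return out
```

Masking falls out of the form directly: raising a neighbor's response inflates the pool in the denominator and suppresses the normalized response.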
Although the above models have formal and qualitative similarities, this does not necessarily mean that the
mechanisms involved pursue the same statistical goal. For instance, in the case of chromatic mechanisms
(disregarding spatial information), different parameters in the normalization give rise to optimal discrimination
or adaptation, and different non-linearities may give rise to error minimization or component independence.
In the case of spatial sensors (disregarding color information), a number of studies have pointed out the benefits
of masking in statistical-independence terms. However, such statistical analysis has not been performed for
spatio-chromatic induction models, where chromatic perception depends on spatial configuration.
In this work we investigate whether successful spatio-chromatic induction models [6] increase component independence
in a way similar to that previously reported for masking models. Mutual information analysis suggests that
seeking an efficient chromatic representation may explain the prevalence of induction effects in spatially simple
images.
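A histogram (plug-in) estimate of mutual information is the kind of quantity such an analysis would compute between channel responses; the uniform binning scheme and bit units below are our assumptions, not the paper's procedure:

```python
import math
from collections import Counter

def mutual_information(xs, ys, bins=8):
    """Plug-in (histogram) estimate of I(X;Y) in bits.
    Each variable is binned uniformly over its own range."""
    def binned(v):
        lo, hi = min(v), max(v)
        w = (hi - lo) / bins or 1.0  # constant signal -> single bin
        return [min(int((x - lo) / w), bins - 1) for x in v]
    bx, by = binned(xs), binned(ys)
    n = len(xs)
    pxy = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    mi = 0.0
    for (i, j), c in pxy.items():
        pij = c / n
        mi += pij * math.log2(pij * n * n / (px[i] * py[j]))
    return mi
```

If normalization increases component independence, the mutual information between the transformed channels should drop relative to the linear responses.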
Aesthetics and entropy II: a critical examination
Author(s):
M. R. V. Sahyun
Show Abstract
The proposal to use entropy as a metric for optimization of image processing has been subjected to further critical examination, on the basis of experiments with contrast adjustment, HDR imaging, bimodal brightness distributions, and unsharp masking. Consequently our original expectation that entropy may be a directly useful response metric for optimizing image processing now appears to us to be naïve and limited in its applicability. One purpose of the present investigation is to ascertain the nature of these limits. We also infer from the unsharp masking studies that the human visual system (HVS) has evolved not so much to maximize information captured from the visual field as to enhance compressibility and to effect image simplification.
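The metric under examination is zeroth-order Shannon entropy of the brightness histogram; a minimal sketch for 8-bit grayscale (note it ignores spatial structure, which is one source of the limitations the abstract reports):

```python
import math
from collections import Counter

def image_entropy(pixels):
    """Shannon entropy (bits/pixel) of a grayscale pixel histogram.
    `pixels` is any iterable of intensity values."""
    pixels = list(pixels)
    n = len(pixels)
    counts = Counter(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A perfectly flat histogram over all 256 levels gives the maximum of 8 bits/pixel; a constant image gives 0 bits/pixel, so contrast adjustments and tone mapping move this number in ways that need not track perceived quality.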