Seven challenges for image quality research
Author(s):
Damon M. Chandler;
Md Mushfiqul Alam;
Thien D. Phan
Image quality assessment has been a topic of recent intense research due to its usefulness in a wide variety of
applications. Owing in large part to efforts within the HVEI community, image-quality research has particularly
benefited from improved models of visual perception. However, over the last decade, research in image quality
has largely shifted from the previous broader objective of gaining a better understanding of human vision, to
the current limited objective of better fitting the available ground-truth data. In this paper, we discuss seven
open challenges in image quality research. These challenges stem from lack of complete perceptual models
for: natural images; suprathreshold distortions; interactions between distortions and images; images containing
multiple and nontraditional distortions; and images containing enhancements. We also discuss challenges related
to computational efficiency. The objective of this paper is not only to highlight the limitations in our current
knowledge of image quality, but to also emphasize the need for additional fundamental research in quality
perception.
Audiovisual focus of attention and its application to Ultra High Definition video compression
Author(s):
Martin Rerabek;
Hiromi Nemoto;
Jong-Seok Lee;
Touradj Ebrahimi
Using Focus of Attention (FoA) as a perceptual process in image and video compression is a well-known approach to increasing coding efficiency. It has been shown that foveated coding, in which compression quality varies across the image according to the region of interest, is more efficient than coding in which all regions are compressed in the same way. However, widespread use of such foveated compression has been prevented by two main conflicting factors, namely the complexity and the efficiency of algorithms for FoA detection. One way around these is to use as much information as possible from the scene. Since most video sequences have associated audio, and in many cases the audio is correlated with the visual content, audiovisual FoA can improve the efficiency of the detection algorithm while remaining of low complexity. This paper discusses a simple yet efficient audiovisual FoA algorithm based on the correlation of dynamics between audio and video signal components. The results of the audiovisual FoA detection algorithm are subsequently taken into account for foveated coding and compression. This approach is implemented in an H.265/HEVC encoder, producing a bitstream that is fully compliant with any H.265/HEVC decoder. The influence of audiovisual FoA on the perceived quality of high and ultra-high definition audiovisual sequences is explored, and the gain in compression efficiency is analyzed.
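As a toy illustration of the correlation-of-dynamics idea described above (the feature choices and function names are our assumptions, not the paper's implementation), one can rank spatial regions by how strongly their motion envelope tracks the audio energy envelope:

```python
import numpy as np

def audiovisual_foa(audio_energy, region_motion):
    """Rank spatial regions by how well their motion dynamics track
    the audio energy envelope (Pearson correlation per region).

    audio_energy : (T,) audio loudness per frame
    region_motion: (R, T) motion magnitude per region per frame
    Returns the index of the region most correlated with the audio.
    """
    a = audio_energy - audio_energy.mean()
    corrs = []
    for m in region_motion:
        v = m - m.mean()
        denom = np.linalg.norm(a) * np.linalg.norm(v)
        corrs.append(float(a @ v / denom) if denom > 0 else 0.0)
    return int(np.argmax(corrs)), corrs

# toy example: region 1 moves in sync with the audio, region 0 does not
t = np.linspace(0, 1, 100)
audio = np.sin(2 * np.pi * 3 * t) + 1.5
regions = np.stack([np.random.default_rng(0).random(100),
                    np.sin(2 * np.pi * 3 * t) + 0.2])
best, corrs = audiovisual_foa(audio, regions)
print(best)  # -> 1
```

In a real encoder the selected region would then receive the highest quality in the foveated bit allocation.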
Influence of audio triggered emotional attention on video perception
Author(s):
Freddy Torres;
Hari Kalva
Perceptual video coding methods attempt to improve compression efficiency by discarding visual information not
perceived by end users. Most of the current approaches for perceptual video coding only use visual features ignoring the
auditory component. Many psychophysical studies have demonstrated that auditory stimuli affect our visual perception.
In this paper we present our study of audio-triggered emotional attention and its applicability to perceptual video
coding. Experiments with movie clips show that the reaction time to detect video compression artifacts was longer when
video was presented with the audio information. The results reported are statistically significant with p=0.024.
3D sound and 3D image interactions: a review of audio-visual depth perception
Author(s):
Jonathan S. Berry;
David A. T. Roberts;
Nicolas S. Holliman
There has been much research concerning visual depth perception in 3D stereoscopic displays and, to a lesser extent, auditory depth perception in 3D spatial sound systems. With 3D sound systems now available in a number of different forms, there is increasing interest in the integration of 3D sound systems with 3D displays. It therefore seems timely to review key concepts and results concerning depth perception in such display systems. We first present overviews of both visual and auditory depth perception, before focussing on cross-modal effects in audio-visual depth perception, which may be of direct interest to display and content designers.
Roughness vs. contrast in natural textures
Author(s):
René van Egmond;
Huib de Ridder;
Thrasyvoulos N. Pappas;
Pubudu M. Silva
We investigate the effect of contrast enhancement on the subjective roughness of visual textures. Our analysis
is based on subjective experiments with seventeen images from the CUReT database in three variants: original,
synthesized textures, and contrast-enhanced synthesized textures. In Experiment 1, participants were asked
to adjust the contrast of a synthesized image so that it became similar in roughness to the original image. A
new adaptive procedure that extends the staircase paradigm was used for efficient placement of the stimuli. In
Experiment 2, the subjective roughness and the subjective contrast of the original, synthesized, and contrast-enhanced
synthesized images were determined using a pairwise comparison paradigm. The results of the two
experiments show that although contrast enhancement of a synthesized image results in a similar subjective
roughness as the original, the subjective contrast of that image is considerably higher than that of the original
image. Future research should give more insight into the interaction between roughness and contrast.
An investigation of visual selection priority of objects with texture and crossed and uncrossed disparities
Author(s):
Dar'ya Khaustova;
Jérôme Fournier;
Emmanuel Wyckens;
Olivier Le Meur
The aim of this research is to understand the difference in visual attention to 2D and 3D content depending on texture
and amount of depth. Two experiments were conducted using an eye-tracker and a 3DTV display. Collected fixation data
were used to build saliency maps and to analyze the differences between 2D and 3D conditions. In the first experiment
51 observers participated in the test. Using scenes that contained objects with crossed disparity, it was discovered that
such objects are the most salient, even if observers experience discomfort due to the high level of disparity. The goal of
the second experiment was to determine whether depth is a determinative factor for visual attention. During the experiment, 28
observers watched scenes that contained objects with crossed and uncrossed disparities. We evaluated the features
influencing the saliency of objects in stereoscopic conditions using content with low-level visual features. Using
univariate tests of significance (MANOVA), we found that texture is more important than depth for the selection of
objects. Objects with crossed disparity are significantly more important for selection processes when compared to 2D.
However, objects with uncrossed disparity have the same influence on visual attention as 2D objects. Analysis of eye
movements indicated that there is no difference in saccade length. Fixation durations were significantly higher in
stereoscopic conditions for low-level stimuli than in 2D. We believe that these experiments can help to refine existing
models of visual attention for 3D content.
Memory texture as a mechanism of improvement in preference by adding noise
Author(s):
Yinzhu Zhao;
Naokazu Aoki;
Hiroyuki Kobayashi
According to color research, people have memory colors for familiar objects, which correlate with high color preference.
As a similar concept to this, we propose memory texture as a mechanism of texture preference by adding image noise
(1/f noise or white noise) to photographs of seven familiar objects. Our results showed that (1) memory texture differed
from real-life texture; (2) no consistency was found between memory texture and real-life texture; (3) a correlation existed
between memory texture and preferred texture; and (4) the type of image noise more appropriate for texture
reproduction differed by object.
Computer vision enhances mobile eye-tracking to expose expert cognition in natural-scene visual-search tasks
Author(s):
Tommy P. Keane;
Nathan D. Cahill;
John A. Tarduno;
Robert A. Jacobs;
Jeff B. Pelz
Mobile eye-tracking provides a rare opportunity to record and elucidate cognition in action. In our research,
we are searching for patterns in, and distinctions between, the visual-search performance of experts and novices in the
geosciences. Traveling to regions shaped by various geological processes as part of an introductory field studies
course in geology, we record the prima facie gaze patterns of experts and novices when they are asked to determine the
modes of geological activity that have formed the scene-view presented to them. Recording eye video and scene video
in natural settings generates complex imagery that requires advanced applications of computer vision research to generate
registrations and mappings between the views of separate observers. By developing such mappings, we could then place
many observers into a single mathematical space where we can spatio-temporally analyze inter- and intra-subject fixations,
saccades, and head motions. While working towards perfecting these mappings, we developed an updated experiment
setup that allowed us to statistically analyze intra-subject eye-movement events without the need for a common domain.
Through such analyses we are finding statistical differences between novices and experts in these visual-search tasks. In
the course of this research we have developed a unified, open-source, software framework for processing, visualization,
and interaction of mobile eye-tracking and high-resolution panoramic imagery.
An adaptive hierarchical sensing scheme for sparse signals
Author(s):
Henry Schütze;
Erhardt Barth;
Thomas Martinetz
In this paper, we present Adaptive Hierarchical Sensing (AHS), a novel adaptive hierarchical sensing algorithm for sparse signals. For a given but unknown signal with a sparse representation in an orthogonal basis, the sensing task is to identify its non-zero transform coefficients by performing only a few measurements. A measurement is simply the inner product of the signal and a particular measurement vector. During sensing, AHS partially traverses a binary tree and performs one measurement per visited node. AHS is adaptive in the sense that after each measurement, depending on the measurement value, a decision is made whether the entire subtree of the current node is further traversed or omitted. In order to acquire an N-dimensional signal that is K-sparse, AHS performs O(K log(N/K)) measurements. With AHS, the signal is easily reconstructed by a basis transform without the need to solve an optimization problem. When sensing full-size images, AHS can compete with a state-of-the-art compressed sensing approach in terms of reconstruction performance versus number of measurements. Additionally, we simulate the sensing of image patches by AHS and investigate the impact of the choice of the sparse coding basis as well as the impact of the tree composition.
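The adaptive tree traversal can be sketched as follows (a simplified simulation in which each "measurement" returns the aggregate magnitude of a subtree; the pruning rule and names are illustrative, not the authors' exact algorithm):

```python
import numpy as np

def ahs_sense(coeffs, lo, hi, threshold, found, count):
    """Traverse a binary tree over coefficient indices [lo, hi).

    Each visited node costs one 'measurement' (simulated here as the
    aggregate magnitude of the subtree); a subtree is pruned when its
    measurement falls below the threshold, so only branches holding
    significant coefficients are explored further.
    """
    count[0] += 1                      # one measurement per visited node
    energy = np.abs(coeffs[lo:hi]).sum()
    if energy < threshold:             # nothing significant below: prune
        return
    if hi - lo == 1:                   # leaf: non-zero coefficient found
        found[lo] = coeffs[lo]
        return
    mid = (lo + hi) // 2
    ahs_sense(coeffs, lo, mid, threshold, found, count)
    ahs_sense(coeffs, mid, hi, threshold, found, count)

# toy K-sparse signal (K=2) in an N=16 dimensional basis
c = np.zeros(16)
c[3], c[12] = 5.0, -4.0
found, count = {}, [0]
ahs_sense(c, 0, 16, threshold=1.0, found=found, count=count)
print(sorted(found))   # -> [3, 12]
print(count[0])        # far fewer than the 31 nodes of the full tree
```

No optimization problem is solved at any point: once the non-zero coefficients are identified, an inverse basis transform reconstructs the signal, as the abstract notes.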
Referenceless perceptual fog density prediction model
Author(s):
Lark Kwon Choi;
Jaehee You;
Alan C. Bovik
We propose a perceptual fog density prediction model based on natural scene statistics (NSS) and “fog aware” statistical
features, which can predict the visibility in a foggy scene from a single image without reference to a corresponding
fogless image, without side geographical camera information, without training on human-rated judgments, and without
dependency on salient objects such as lane markings or traffic signs. The proposed fog density predictor only makes use
of measurable deviations from statistical regularities observed in natural foggy and fog-free images. A fog aware
collection of statistical features is derived from a corpus of foggy and fog-free images by using a space domain NSS
model and observed characteristics of foggy images such as low contrast, faint color, and shifted intensity. The proposed
model not only predicts perceptual fog density for the entire image but also provides a local fog density index for each
patch. The predicted fog density of the model correlates well with the measured visibility in a foggy scene as measured
by judgments taken in a human subjective study on a large foggy image database. As one application, the proposed
model accurately evaluates the performance of defog algorithms designed to enhance the visibility of foggy images.
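As a rough illustration of what "fog aware" space-domain statistics might look like (the specific features below are our simplification, not the model's actual NSS feature set), foggy patches exhibit low contrast, low colorfulness, and a raised mean intensity:

```python
import numpy as np

def fog_aware_features(rgb_patch):
    """Compute three simple statistics for an RGB patch: local
    contrast (std of luminance), colorfulness (std across channels),
    and mean intensity. Fog lowers the first two and raises the third.
    """
    lum = rgb_patch.mean(axis=2)
    contrast = float(lum.std())
    colorfulness = float(rgb_patch.std(axis=2).mean())
    mean_intensity = float(lum.mean())
    return contrast, colorfulness, mean_intensity

rng = np.random.default_rng(1)
clear = rng.random((32, 32, 3))      # high-variance fog-free patch
foggy = 0.2 * clear + 0.7            # washed out and brightened
c_clear = fog_aware_features(clear)
c_foggy = fog_aware_features(foggy)
print(c_foggy[0] < c_clear[0])   # lower contrast under fog -> True
print(c_foggy[2] > c_clear[2])   # raised intensity under fog -> True
```

Deviations of such statistics from those of a fog-free corpus are the kind of signal a referenceless predictor can exploit per patch.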
Dynamics of backlight luminance for using smartphone in dark environment
Author(s):
Nooree Na;
Jiho Jang;
Hyeon-Jeong Suk
This study developed dynamic backlight luminance, which gradually changes as time passes for comfortable use of a
smartphone display in a dark environment. The study was carried out in two stages. In the first stage, a user test was
conducted to identify the optimal luminance by assessing the facial squint level, subjective glare evaluation, eye blink
frequency and users’ subjective preferences. Based on the results of the user test, the dynamics of backlight luminance
was designed. It has two levels of luminance: the optimal level for initial viewing to avoid sudden glare or fatigue to
users' eyes, and the optimal level for constant viewing, which is comfortable, but also bright enough for constant reading
of the displayed material. The luminance for initial viewing starts from 10 cd/m², and it gradually increases to 40 cd/m²
over 20 seconds for users’ visual comfort at constant viewing. In the second stage, a validation test on the dynamics of
backlight luminance was conducted to verify the effectiveness of the developed dynamics. It involved users' subjective
preferences, eye blink frequency, and brainwave analysis using electroencephalography (EEG) to confirm that the
proposed dynamic backlighting enhances users' visual comfort and visual cognition, particularly when using smartphones
in a dark environment.
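The two-level luminance profile reported above (10 cd/m² initially, 40 cd/m² after 20 seconds) can be sketched as a simple ramp; the linear shape is our assumption, since the abstract only states the endpoints and duration:

```python
def backlight_luminance(t_seconds, start=10.0, end=40.0, ramp=20.0):
    """Dynamic backlight: start at 10 cd/m2 to avoid sudden glare,
    then ramp gradually to 40 cd/m2 over 20 s for constant viewing.
    (A linear ramp is our assumption; the study only reports the two
    optimal levels and the transition duration.)
    """
    if t_seconds >= ramp:
        return end
    return start + (end - start) * t_seconds / ramp

print(backlight_luminance(0))    # 10.0
print(backlight_luminance(10))   # 25.0
print(backlight_luminance(30))   # 40.0
```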
Effects of image size and interactivity in lighting visualization
Author(s):
Michael J. Murdoch;
Mariska G. M. Stokkermans
Rendered images of varied lighting conditions in a virtual environment have been shown to provide a perceptually
accurate visual impression of those in a real environment, providing a valuable tool set for the development and
communication of new lighting solutions. In order to further improve this tool set, an experiment was conducted to
assess the impact of image size and viewing interactivity on perceptual accuracy. It was found that a high-quality TV-sized
display outperforms a smaller laptop screen and a larger projected image on most measures, and that the expected
value of the interactive panoramic format was masked by the fatigue of using it repeatedly.
On the delights of being an ex-cataract patient: Visual experiences before and after cataract operations; what they indicate
Author(s):
Floris L. van Nes
This paper is about changes in the author's visual perception over most of his lifetime, but in particular in the period
before and after cataract operations. The author was myopic (-3D) until the operations, and emmetropic afterwards –
with mild astigmatic aberrations that can be compensated with cylindrical spectacles, but in his case rarely are, because
of the convenience of not needing to wear distance glasses in daily life anymore. The perceptual changes concern color
vision, stereopsis and visual acuity. The post-cataract changes were partly expected, for example less yellow and more
blue images, but partly wholly unexpected, and accompanied by feelings of excitement and pleasure; even delight. These
unexpected changes were a sudden, strongly increased depth vision and the sensation of seeing suddenly sharper than
ever before, mainly at intermediate viewing distances. The visual acuity changes occur when, exceptionally, he puts on his
distance glasses. All these sensations lasted, or last, only for a short time. Those concerning stereopsis were dubbed
'super depth', and were confined to the first two months after the second cataract operation. Those concerning acuity
were termed 'super-sharpness impression' (SSI). These can be elicited more or less at will by putting on the spectacles
described, but then disappear again even though the spectacles are kept on. Ten other ex-cataract patients have been
interviewed on their post-operation experiences. The 'super-depth' and SSI experiences may be linked to assumed
neurophysiological mechanisms such as the concept of Bayesian reweighting of perceptual criteria.
X-Eye: A reference format for eye tracking data to facilitate analyses across databases
Author(s):
Stefan Winkler;
Florian M. Savoy;
Ramanathan Subramanian
Datasets of images annotated with eye tracking data constitute important ground truth for the development of saliency models, which have applications in many areas of electronic imaging. While comparisons and reviews of saliency models abound, similar comparisons among the eye tracking databases themselves are rare. In an earlier paper, we reviewed the content and purpose of over two dozen databases available in the public domain and discussed their commonalities and differences. A major issue is that the formats of the various datasets vary a lot owing to the nature of tools used for eye movement recordings, and often specialized code is required to use the data for further analysis. In this paper, we therefore propose a common reference format for eye tracking data, together with conversion routines for 16 existing image eye tracking databases to that format. Furthermore, we conduct a few analyses on these datasets as examples of what X-Eye facilitates.
Modeling the leakage of LCD displays with local backlight for quality assessment
Author(s):
Claire Mantel;
Jari Korhonen;
Jesper Melgaard Pedersen;
Søren Bech;
Ehsan Nadernejad;
Nino Burini;
Søren Forchhammer
The recent technique of local backlight dimming has a significant impact on the quality of images displayed on an LCD screen with local LED dimming. It therefore represents a necessary step in the quality assessment chain, independent of the other processes applied to images. This paper investigates the modeling of one of the major spatial artifacts produced by local dimming: leakage. Leakage appears in dark areas when the backlight level is too high for the LC cells to block sufficiently, so that the final displayed brightness is higher than it should be.
A subjective quality experiment was run on videos displayed on an LCD TV with local backlight dimming, viewed from 0° and 15° angles. The subjective results are then compared with objective data using different leakage models: constant over the whole display or horizontally varying, and with three leakage factors (no leakage, and leakage measured at 0° and 15°, respectively). Results show that for dark sequences, accounting for the leakage artifact in the display model is a definite improvement. Approximating leakage as constant over the screen seems valid when viewing from a 15° angle, while a horizontally varying model might prove useful for 0° viewing.
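A minimal display model with a leakage factor, in the spirit of the models investigated here (the exact model form is our illustration), shows why dimming the local backlight reduces the artifact in dark areas:

```python
def displayed_luminance(backlight, lc_transmittance, leakage=0.002):
    """Simple display model with leakage: even a fully closed LC cell
    transmits a fraction 'leakage' of the backlight, so dark pixels
    appear brighter than intended when the local backlight is high.
    (The model form and the leakage value are our illustration.)
    """
    effective_t = leakage + (1.0 - leakage) * lc_transmittance
    return backlight * effective_t

# a black pixel (transmittance 0) over a bright vs a dimmed segment
print(displayed_luminance(500.0, 0.0))  # bright backlight: visible leakage
print(displayed_luminance(50.0, 0.0))   # local dimming: leakage reduced
```

In this form, a viewing-angle-dependent or horizontally varying `leakage` value reproduces the model variants compared in the experiment.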
On improving the pooling in HDR-VDP-2 towards better HDR perceptual quality assessment
Author(s):
Manish Narwaria;
Matthieu Perreira Da Silva;
Patrick Le Callet;
Romuald Pepion
High Dynamic Range (HDR) signals capture much higher contrasts as compared to the traditional 8-bit low dynamic
range (LDR) signals. This is achieved by representing the visual signal via values that are related to real-world
luminance, instead of the gamma-encoded pixel values used in LDR. Therefore, HDR signals cover a larger
luminance range and tend to have more visual appeal. However, due to the higher luminance conditions, the existing
methods cannot be directly employed for objective quality assessment of HDR signals. For that reason, the HDR Visual
Difference Predictor (HDR-VDP-2) has been proposed. HDR-VDP-2 is primarily a visibility prediction metric i.e.
whether the signal distortion is visible to the eye and to what extent. Nevertheless, it also employs a pooling function to
compute an overall quality score. This paper focuses on the pooling aspect in HDR-VDP-2 and employs a
comprehensive database of HDR images (with their corresponding subjective ratings) to improve the prediction accuracy
of HDR-VDP-2. We also discuss and evaluate the existing objective methods and provide a perspective towards better
HDR quality assessment.
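To make the pooling step concrete, here is a generic Minkowski pooling sketch of the kind commonly used to collapse a per-pixel visibility map into one score (HDR-VDP-2's actual pooling is more elaborate, combining multiple scales and orientations; this is not the metric's formula):

```python
import numpy as np

def minkowski_pool(distortion_map, p=3.0):
    """Pool a per-pixel distortion/visibility map into a single score
    via Minkowski summation; a larger exponent p weights the worst
    regions more heavily. (Generic sketch for illustration only.)
    """
    d = np.asarray(distortion_map, dtype=float).ravel()
    return float((np.abs(d) ** p).mean() ** (1.0 / p))

uniform = np.full(100, 0.5)
peaky = np.zeros(100); peaky[:5] = 2.0
print(minkowski_pool(uniform, p=1.0))  # p=1 reduces to the mean: 0.5
print(minkowski_pool(peaky, p=4.0) > minkowski_pool(peaky, p=1.0))
```

Tuning such a pooling function against subjective ratings, as this paper does for HDR-VDP-2, is what turns a visibility predictor into a quality predictor.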
Theory and practice of perceptual video processing in broadcast encoders for cable, IPTV, satellite, and internet distribution
Author(s):
S. McCarthy
This paper describes the theory and application of a perceptually-inspired video processing technology that was recently
incorporated into professional video encoders now being used by major cable, IPTV, satellite, and internet video service
providers. We will present data that show that this perceptual video processing (PVP) technology can improve video
compression efficiency by up to 50% for MPEG-2, H.264, and High Efficiency Video Coding (HEVC). The PVP
technology described in this paper works by forming predicted eye-tracking attractor maps that indicate how likely it
is that a freely viewing person would look at a particular area of an image or video. We will introduce in this paper
the novel model and supporting theory used to calculate the eye-tracking attractor maps. We will show how the
underlying perceptual model was inspired by electrophysiological studies of the vertebrate retina, and will explain how
the model incorporates statistical expectations about natural scenes as well as a novel method for predicting error in
signal estimation tasks. Finally, we will describe how the eye-tracking attractor maps are created in real time and used
to modify video prior to encoding so that it is more compressible but not noticeably different than the original
unmodified video.
Temporal perceptual coding using a visual acuity model
Author(s):
Velibor Adzic;
Robert A. Cohen;
Anthony Vetro
This paper describes research and results in which a visual acuity (VA) model of the human visual system (HVS) is used
to reduce the bitrate of coded video sequences, by eliminating the need to signal transform coefficients when their
corresponding frequencies will not be detected by the HVS. The VA model is integrated into the state of the art HEVC
HM codec. Compared to the unmodified codec, up to 45% bitrate savings are achieved while maintaining the same
subjective quality of the video sequences. Encoding times are reduced as well.
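A toy stand-in for the idea of dropping transform coefficients the HVS cannot resolve (the frequency index and cutoff rule below are illustrative, not the paper's VA model or its HEVC integration):

```python
import numpy as np

def acuity_prune(dct_block, cutoff):
    """Zero out transform coefficients whose spatial frequency
    (indexed here simply as u+v within the block) exceeds a visual
    acuity cutoff, so they need not be signaled in the bitstream.
    (Toy illustration; a real VA model derives the cutoff from
    viewing conditions and local motion.)
    """
    out = dct_block.copy()
    n = out.shape[0]
    for u in range(n):
        for v in range(n):
            if u + v > cutoff:
                out[u, v] = 0.0
    return out

block = np.arange(16.0).reshape(4, 4)
pruned = acuity_prune(block, cutoff=2)
print(int((pruned != 0).sum()))  # -> 5 low-frequency coefficients kept
```

Coefficients removed this way cost no bits, which is the source of the bitrate savings reported above.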
Characterizing perceptual artifacts in compressed video streams
Author(s):
Kai Zeng;
Tiesong Zhao;
Abdul Rehman;
Zhou Wang
To achieve optimal video quality under bandwidth and power constraints, modern video coding techniques employ lossy coding schemes, which often create compression artifacts that may lead to degradation of perceptual video quality. Understanding and quantifying such perceptual artifacts play important roles in the development of effective video compression, streaming and quality enhancement systems. Moreover, the characteristics of compression artifacts evolve over time due to the continuous adoption of novel coding structures and strategies during the development of new video compression standards. In this paper, we reexamine the perceptual artifacts created by standard video compression, summarizing commonly observed spatial and temporal perceptual distortions in compressed video, with emphasis on the perceptual temporal artifacts that have not been well identified or accounted for in previous studies. Furthermore, a floating effect detection method is proposed that not only detects the existence of floating, but also segments the spatial regions where floating occurs.
Zero shot prediction of video quality using intrinsic video statistics
Author(s):
Anish Mittal;
Michele A. Saad;
Alan C. Bovik
We propose a no reference (NR) video quality assessment (VQA) model. Recently, ‘completely blind’ still picture quality analyzers have been proposed that do not require any prior training on, or exposure to, distorted images or human opinions of them. We have been trying to bridge an important but difficult gap by creating a ‘completely blind’ VQA model. The principle of this new approach is founded on intrinsic statistical regularities that are observed in natural videos. This results in a video ‘quality analyzer’ that can predict the quality of distorted videos without any external knowledge about the pristine source, anticipated distortions or human judgments. Hence, the model is zero shot. Experimental results show that, even with such paucity of information, the new VQA algorithm performs better than the full reference (FR) quality measure PSNR on the LIVE VQA database. It is also fast and efficient. We envision that the proposed method is an important step towards making real-time monitoring of ‘completely blind’ video quality feasible.
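The 'completely blind' principle can be sketched in a NIQE-like fashion: fit a multivariate Gaussian to NSS features of pristine content and score test content by its distance from that model (a simplification of our own, not the paper's exact formulation):

```python
import numpy as np

def zero_shot_quality(feats_pristine, feats_test):
    """'Completely blind' quality index: fit a multivariate Gaussian
    to NSS features of pristine videos, then score a test video by
    the Mahalanobis-style distance of its mean features from that
    model. Larger distance means lower predicted quality. No human
    opinion scores or example distortions are used anywhere.
    """
    mu = feats_pristine.mean(axis=0)
    cov = np.cov(feats_pristine, rowvar=False)
    d = feats_test.mean(axis=0) - mu
    return float(np.sqrt(d @ np.linalg.pinv(cov) @ d))

rng = np.random.default_rng(2)
pristine = rng.normal(0.0, 1.0, size=(200, 4))   # toy NSS features
good = rng.normal(0.0, 1.0, size=(50, 4))        # similar statistics
distorted = rng.normal(3.0, 1.0, size=(50, 4))   # shifted statistics
print(zero_shot_quality(pristine, good) <
      zero_shot_quality(pristine, distorted))    # -> True
```

Because only statistics of pristine content are modeled, the predictor is zero shot in the sense the abstract describes.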
Personalized visual aesthetics
Author(s):
Edward A. Vessel;
Jonathan Stahl;
Natalia Maurer;
Alexander Denker;
G. Gabrielle Starr
How is visual information linked to aesthetic experience, and what factors determine whether an individual finds a
particular visual experience pleasing? We have previously shown that individuals’ aesthetic responses are not
determined by objective image features but are instead a function of internal, subjective factors that are shaped by a
viewer’s personal experience. Yet for many classes of stimuli, culturally shared semantic associations give rise to similar
aesthetic taste across people. In this paper, we investigated factors that govern whether a set of observers will agree in
which images are preferred, or will instead exhibit more “personalized” aesthetic preferences. In a series of experiments,
observers were asked to make aesthetic judgments for different categories of visual stimuli that are commonly evaluated
in an aesthetic manner (faces, natural landscapes, architecture or artwork). By measuring agreement across observers,
this method was able to reveal instances of highly individualistic preferences. We found that observers showed high
agreement on their preferences for images of faces and landscapes, but much lower agreement for images of artwork and
architecture. In addition, we found higher agreement for heterosexual males making judgments of beautiful female faces
than of beautiful male faces. These results suggest that preferences for stimulus categories that carry evolutionary
significance (landscapes and faces) come to rely on similar information across individuals, whereas preferences for
artifacts of human culture such as architecture and artwork, which have fewer basic-level category distinctions and
reduced behavioral relevance, rely on a more personalized set of attributes.
Identifying image preferences based on demographic attributes
Author(s):
Elena A. Fedorovskaya;
Daniel R. Lawrence
The intent of this study is to determine what sorts of images are considered more interesting by which demographic
groups. Specifically, we attempt to identify images whose interestingness ratings are influenced by the demographic
attribute of the viewer’s gender. To that end, we use the data from an experiment where 18 participants (9 women and 9
men) rated several hundred images based on “visual interest” or preferences in viewing images. The images were
selected to represent the consumer “photo-space” - typical categories of subject matter found in consumer photo
collections. They were annotated using perceptual and semantic descriptors.
In analyzing the image interestingness ratings, we apply a multivariate procedure known as forced classification, a
feature of dual scaling (a discrete analogue of principal components analysis, similar to correspondence analysis). This
particular analysis of ratings (i.e., ordered-choice or Likert) data enables the investigator to emphasize the effect of a
specific item or collection of items. We focus on the influence of the demographic item of gender on the analysis, so
that the solutions are essentially confined to subspaces spanned by the emphasized item. Using this technique, we can
know definitively which images’ ratings have been influenced by the demographic item of choice. Subsequently,
images can be evaluated and linked, on one hand, to their perceptual and semantic descriptors, and, on the other hand, to
the preferences associated with viewers’ demographic attributes.
Chamber QoE: a multi-instrumental approach to explore affective aspects in relation to quality of experience
Author(s):
Katrien De Moor;
Filippo Mazza;
Isabelle Hupont;
Miguel Ríos Quintero;
Toni Mäki;
Martín Varela
Evaluating (audio)visual quality and Quality of Experience (QoE) from the user’s perspective, has become a key element
in optimizing users’ experiences and their quality. Traditionally, the focus lies on how multi-level quality features are
perceived by a human user. The interest has, however, gradually expanded towards human cognitive, affective and
behavioral processes that may impact on, be an element of, or be influenced by QoE, and which have been under-investigated
so far. In addition, there is a major discrepancy between the new, broadly supported and more holistic
conceptualization of QoE proposed by Le Callet et al. (2012) and traditional, standardized QoE assessment. This paper
explores ways to tackle this discrepancy by means of a multi-instrumental approach. More concretely, it presents results
from a lab study on video quality (N=27), aimed at going beyond the dominant QoE assessment paradigm and at
exploring affective aspects in relation to QoE and in relation to perceived overall quality. Four types of data were
collected: ‘traditional’ QoE self-report measures were complemented with ‘alternative’, emotional state- and user
engagement-related self-report measures to evaluate QoE. In addition, we collected EEG (physiological) data, gaze-tracking
data, and facial expression (behavioral) data. The video samples used in the test were longer in duration than is
common in standard tests, allowing us to study, e.g., more realistic experiences and deeper user engagement. Our findings
support the claim that the traditional QoE measures need to be reconsidered and extended with additional, affective state-related measures.
Alone or together: measuring users' viewing experience in different social contexts
Author(s):
Yi Zhu;
Ingrid Heynderickx;
Judith A. Redi
In the past decades, a lot of effort has been invested in predicting the users’ Quality of Visual Experience (QoVE) in
order to optimize online video delivery. So far, the objective approaches to measure QoVE have been mainly based on
an estimation of the visibility of artifacts generated by signal impairments at the moment of delivery and on a prediction
of how annoying these artifacts are to the end user. Recently, it has been shown that other aspects, such as user interest
or viewing context, also have a crucial influence on QoVE. Social context is one of these aspects, but it has been poorly
investigated in relation to QoVE so far. In this paper, we report the outcomes of an experiment that aims at unveiling the
role that social context, and in particular co-located co-viewing, plays within the visual experience and the annoyance of
coding artifacts. The results show that social context significantly influences users’ QoVE, whereas the appearance of
artifacts does not impact the viewing experience, although users can still notice them. The results suggest that
quantifying the impact of social context on user experience is of major importance to accurately predict QoVE towards
video delivery optimization.
Would you hire me? Selfie portrait images perception in a recruitment context
Author(s):
F. Mazza;
M. P. Da Silva;
P. Le Callet
Human content perception has been underlined to be important in multimedia quality evaluation. Recently, aesthetic considerations have become a subject of research in this field. First attempts in aesthetics took into account perceived low-level features, especially those taken from photography theory. However, these proved insufficient to characterize human content perception. More recently, image psychology has started to be considered as a higher-level cognitive feature impacting user perception. In this paper we follow this idea by introducing social cognitive elements. Our experiments focus on the influence of different versions of portrait pictures in contexts where they are shown alongside completely unrelated information; this can happen, for example, in social network interactions between users, where profile pictures appear alongside almost every user action. In particular, we tested this impact on resumes, comparing professional portraits with self-shot pictures. Moreover, as we ran the tests in crowdsourcing, we also discuss the use of this methodology for such tests. Our final aim is to analyse the impact of social biases on multimedia aesthetics evaluation and how this bias influences the messages that accompany pictures, as on public online platforms and social networks.
Assessing the impact of image manipulation on users' perceptions of deception
Author(s):
Valentina Conotter;
Duc-Tien Dang-Nguyen;
Giulia Boato;
María Menéndez;
Martha Larson
Generally, we expect images to be an honest reflection of reality. However, this assumption is undermined by new image editing technology, which allows for easy manipulation and distortion of digital content. Our understanding of the implications related to the use of manipulated data is lagging behind. In this paper we propose to exploit crowdsourcing tools in order to analyze the impact of different types of manipulation on users’ perceptions of deception. Our goal is to gain significant insights into how different types of manipulations impact users’ perceptions and how the context in which a modified image is used influences human perception of image deceptiveness. Through an extensive crowdsourcing user study, we aim at demonstrating that the problem of predicting user-perceived deception can be approached by automatic methods. Analysis of results collected on the Amazon Mechanical Turk platform highlights how deception is related to the level of modification applied to the image and to the context within which modified pictures are used. To the best of our knowledge, this work represents the first attempt to address the image editing debate using automatic approaches and going beyond the investigation of forgeries.
Spectral compression: Weighted principal component analysis versus weighted least squares
Author(s):
Farnaz Agahian;
Brian Funt;
Seyed Hossein Amirshahi
Two weighted compression schemes, Weighted Least Squares (wLS) and Weighted Principal Component Analysis
(wPCA), are compared by considering their performance in minimizing both spectral and colorimetric errors of
reconstructed reflectance spectra. A comparison is also made among seven different weighting functions incorporated
into ordinary PCA/LS to give selectively more importance to the wavelengths that correspond to higher sensitivity in the
human visual system. Weighted compression is performed on reflectance spectra of 3219 colored samples (including
Munsell and NCS data) and spectral and colorimetric errors are calculated in terms of CIEDE2000 and root mean square
errors. The results obtained indicate that wLS outperforms wPCA in weighted compression with more than three basis
vectors. Weighting functions based on the diagonal of Cohen’s R matrix lead to the best reproduction of color
information under both A and D65 illuminants particularly when using a low number of basis vectors.
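The algebra of the two weighted schemes can be sketched as follows. The reflectance data and the weighting function below are random placeholders (the paper uses 3219 measured Munsell/NCS spectra and weights derived from visual sensitivity, e.g. the diagonal of Cohen's R matrix), so only the fitting procedure, not the numbers, is representative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins for the measured reflectance data: 3219 spectra sampled
# at 31 bands (400-700 nm), and a toy visual-sensitivity weighting function.
wavelengths = np.arange(400, 710, 10)
spectra = rng.random((3219, wavelengths.size))            # rows = samples
weights = np.exp(-0.5 * ((wavelengths - 550) / 80) ** 2)  # peaks near 550 nm

k = 3  # number of basis vectors

# Weighted PCA: run PCA on the weighted spectra, undo the weight afterwards.
Xw = spectra * weights
mean_w = Xw.mean(axis=0)
_, _, Vt_w = np.linalg.svd(Xw - mean_w, full_matrices=False)
basis_w = Vt_w[:k]
coeff_w = (Xw - mean_w) @ basis_w.T
recon_wpca = (coeff_w @ basis_w + mean_w) / weights

# Weighted LS: keep an ordinary PCA basis, but fit each spectrum's
# coefficients by minimizing the *weighted* squared residual.
mean0 = spectra.mean(axis=0)
_, _, Vt0 = np.linalg.svd(spectra - mean0, full_matrices=False)
basis = Vt0[:k]
W = np.diag(weights)
A = basis @ W @ basis.T                  # k x k normal-equation matrix
B = basis @ W @ (spectra - mean0).T
coeff_wls = np.linalg.solve(A, B).T
recon_wls = coeff_wls @ basis + mean0

rmse = lambda a, b: float(np.sqrt(np.mean((a - b) ** 2)))
print("wPCA RMSE:", rmse(spectra, recon_wpca))
print("wLS  RMSE:", rmse(spectra, recon_wls))
```

The key structural difference is visible here: wPCA bakes the weights into the basis itself, while wLS keeps an unweighted basis and moves the weighting into the per-spectrum coefficient fit.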
Creating experimental color harmony map
Author(s):
Christel Chamaret;
Fabrice Urban;
Josselin Lepinel
Starting in the 17th century with Newton, color harmony is a topic that has not yet reached a consensus on definition,
representation or modeling. Previous work highlighted specific characteristics of color harmony for combinations of
color doublets or triplets by means of human ratings on a harmony scale. However, there has been no
investigation involving complex stimuli or pointing out how harmony is spatially located within a picture. The
modeling of such a concept, as well as a reliable ground truth, would be of high value for the community, since the
applications are wide and concern several communities, from psychology to computer graphics. We propose a
protocol for creating color harmony maps from a controlled experiment. Through an eye-tracking protocol, we
focus on the identification of disharmonious colors in pictures. The experiment was composed of a free-viewing
pass, in order to let the observer become familiar with the content, before a second pass where we asked observers
“to search for the most disharmonious areas in the picture”. Twenty-seven observers participated in the experiment,
which was composed of a total of 30 different stimuli. The high inter-observer agreement, as well as a cross-validation,
confirms the validity of the proposed ground truth.
Exploring the use of memory colors for image enhancement
Author(s):
Su Xue;
Minghui Tan;
Ann McNamara;
Julie Dorsey;
Holly Rushmeier
Memory colors refer to those colors recalled in association with familiar objects. While some previous work has introduced this concept to assist digital image enhancement, its basis, i.e., on-screen memory colors, has not been appropriately investigated. In addition, the resulting adjustment methods were not evaluated from a perceptual point of view. In this paper, we first perform a context-free perceptual experiment to establish the overall distributions of screen memory colors for three pervasive objects. Then, we use a context-based experiment to locate the most representative memory colors; at the same time, we investigate the interactions of memory colors between different objects. Finally, we show a simple yet effective application using representative memory colors to enhance digital images. A user study is performed to evaluate the performance of our technique.
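The enhancement idea can be sketched as pulling the pixels of a familiar object toward its representative memory color. The mask, the memory color, and the blend strength below are illustrative assumptions, not values measured in the paper's experiments:

```python
import numpy as np

# Hypothetical input: an RGB image in [0, 1] and a mask marking a familiar
# object (here, pretend the top half is "sky").
rng = np.random.default_rng(5)
image = rng.random((32, 32, 3))
mask = np.zeros((32, 32), dtype=bool)
mask[:16, :] = True

memory_color = np.array([0.45, 0.70, 0.95])   # assumed "sky blue"

def enhance(img, region, target, strength=0.3):
    """Blend masked pixels toward the target memory color; leave the rest."""
    out = img.copy()
    out[region] = (1 - strength) * img[region] + strength * target
    return np.clip(out, 0.0, 1.0)

enhanced = enhance(image, mask, memory_color)
assert enhanced.shape == image.shape
```

A real pipeline would detect the object automatically and pick the target from the measured memory-color distribution rather than a hard-coded constant.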
Perceptual evaluation of colorized nighttime imagery
Author(s):
Alexander Toet;
Michael J. de Jong;
Maarten A. Hogervorst;
Ignace T. C. Hooge
We recently presented a color transform that produces fused nighttime imagery with a realistic color appearance
(Hogervorst and Toet, 2010, Information Fusion, 11-2, 69-77). To assess the practical value of this transform we
performed two experiments in which we compared human scene recognition for monochrome intensified (II) and
longwave infrared (IR) imagery, and color daylight (REF) and fused multispectral (CF) imagery. First we investigated
the amount of detail observers can perceive in a short time span (the gist of the scene). Participants watched brief image
presentations and provided a full report of what they had seen. Our results show that REF and CF imagery yielded the
highest precision and recall measures, while both II and IR imagery yielded significantly lower values. This suggests that
observers have more difficulty extracting information from monochrome than from color imagery. Next, we measured
eye fixations of participants who freely explored the images. Although the overall fixation behavior was similar across
image modalities, the order in which certain details were fixated varied. Persons and vehicles were typically fixated first
in REF, CF and IR imagery, while they were fixated later in II imagery. In some cases, color remapping II imagery and
fusion with IR imagery restored the fixation order of these image details. We conclude that color remapping can yield
enhanced scene perception compared to conventional monochrome nighttime imagery, and may be deployed to tune
multispectral image representation such that the resulting fixation behavior resembles the fixation behavior for daylight
color imagery.
Reaching into Pictorial Spaces
Author(s):
Robert Volcic;
Dhanraj Vishwanath;
Fulvio Domini
While binocular viewing of 2D pictures generates an impression of 3D objects and space, viewing a picture monocularly through an aperture produces a more compelling impression of depth and the feeling that the objects are “out there”, almost touchable. Here, we asked observers to actually reach into pictorial space under both binocular- and monocular-aperture viewing. Images of natural scenes were presented at different physical distances via a mirror-system and their retinal size was kept constant. Targets that observers had to reach for in physical space were marked on the image plane, but at different pictorial depths. We measured the 3D position of the index finger at the end of each reach-to-point movement.
Observers found the task intuitive. Reaching responses varied as a function of both pictorial depth and physical distance. Under binocular viewing, responses were mainly modulated by the different physical distances. Instead, under monocular viewing, responses were modulated by the different pictorial depths. Importantly, individual variations over time were minor, that is, observers conformed to a consistent pictorial space. Monocular viewing of 2D pictures thus produces a compelling experience of an immersive space and tangible solid objects that can be easily explored through motor actions.
A framework for the study of vision in active observers
Author(s):
Carlo Nicolini;
Carlo Fantoni;
Giovanni Mancuso;
Robert Volcic;
Fulvio Domini
We present a framework for the study of active vision, i.e., the functioning of the visual system during actively
self-generated body movements. In laboratory settings, human vision is usually studied with a static observer
looking at static or, at best, dynamic stimuli. In the real world, however, humans constantly move within dynamic
environments. The resulting visual inputs are thus an intertwined mixture of self- and externally-generated
movements. To fill this gap, we developed a virtual environment integrated with a head-tracking system in which
the influence of self- and externally-generated movements can be manipulated independently. As a proof of
principle, we studied perceptual stationarity of the visual world during lateral translation or rotation of the head.
The movement of the visual stimulus was thus parametrically tethered to self-generated movements. We found
that estimates of object stationarity were less biased and more precise during head rotation than translation.
In both cases the visual stimulus had to partially follow the head movement to be perceived as immobile. We
discuss a range of possibilities for our setup, among which is the study of shape perception in active and passive
conditions, where the same optic flow is replayed to stationary observers.
Shading and shadowing on Canaletto's Piazza San Marco
Author(s):
Maarten W. A. Wijntjes;
Huib de Ridder
Whereas the 18th-century painter Canaletto was a master of linear perspective for architectural elements, he seems to have had considerable difficulty with the linear perspective of shadows. A common trick to avoid shadow perspective problems is to set the (solar) illumination direction parallel to the projection screen. In one painting where Canaletto clearly used this trick, we investigated whether he followed this choice of light direction consistently in how he shaded the persons. We approached this question with a perceptual experiment in which we measured perceived light directions in isolated details of the painting. Specifically, we controlled whether observers could see only the (cast) shadow, only the shading, or both. We found different trends in all three conditions. The results indicate that Canaletto probably used different shading than the parallel light direction would predict. We interpret the results as a form of artistic freedom that Canaletto used to shade the persons individually.
3D space perception as embodied cognition in the history of art images
Author(s):
Christopher W. Tyler
Embodied cognition is a concept that provides a deeper understanding of the aesthetics of art images. This study considers the role of embodied cognition in the appreciation of 3D pictorial space, 4D action space, its extension through mirror reflection to embodied self-cognition, and its relation to the neuroanatomical organization of the aesthetic response.
Color visualization of cyclic magnitudes
Author(s):
Alfredo Restrepo;
Viviana Estupiñán
We exploit the perceptual, circular ordering of the hues in a technique for the visualization of cyclic variables. The hue is thus meaningfully used for the indication of variables such as the azimuth and the units of the measurement of time. The cyclic (or circular) variables may be of either the continuous type or the discrete type; among the former is azimuth, and among the latter are the musical notes and the days of the week. A correspondence between the values of a cyclic variable and the chromatic hues, where the natural circular ordering of the variable is respected, is called a color code for the variable. We base such a choice of hues on an assignment of the unique hues red, yellow, green and blue, or one of the 8 even permutations of this ordered list, to 4 cardinal values of the cyclic variable, suitably ordered; color codes based on only 3 cardinal points are also possible. Color codes, being intuitive, are easy to remember. A possibly low accuracy when reading instruments that use this technique is compensated by fast, ludic and intuitive readings; also, the use of a referential frame makes readings precise. An achromatic version of the technique, which can be used by dichromatic people, is proposed.
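A minimal color code can be sketched by mapping the cyclic variable onto the hue circle. The anchoring below (hue 0, i.e. red, for the first cardinal value, and an even spacing of the hues) is an illustrative assumption, not the authors' exact assignment of the four unique hues:

```python
import colorsys

# Discrete example: the seven days of the week spread evenly around the
# hue circle.
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def color_code(value, period):
    """Map a cyclic value in [0, period) to an RGB triple via the hue circle."""
    hue = (value % period) / period
    return colorsys.hsv_to_rgb(hue, 1.0, 1.0)

for i, day in enumerate(DAYS):
    r, g, b = color_code(i, len(DAYS))
    print(f"{day}: ({r:.2f}, {g:.2f}, {b:.2f})")

# A continuous cyclic variable such as azimuth works the same way:
north = color_code(0, 360)    # -> (1.0, 0.0, 0.0), i.e. red
```

Because the map wraps around (`value % period`), the natural circular ordering of the variable is respected: the code for 360° of azimuth is identical to the code for 0°.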
Quality evaluation of stereo 3DTV systems with open profiling of quality
Author(s):
Sara Kepplinger;
Nikolaus Hottong
The current work describes two evaluations, in two different locations, investigating possible differences in the experience of
quality of stereo 3DTV systems. The work presents the use of the Open Profiling of Quality method, which
allows going beyond the distinctive features considered up to now (e.g., glasses wear comfort, brightness). During
the first evaluation, standardized display settings were used for each tested system. In the second study, all systems were
tested with their factory settings. Other factors, such as test stimuli, play-out technology, laboratory settings, and
viewing position, were strictly standardized. Additionally, influencing factors such as spectacle frames and display design
were minimized by using the same eyeglass frames (but different technology) and hiding the display chassis. The
results of both evaluations show distinct influences of display technology on quality perception. This is affirmed by the
quality-describing attributes derived from the Open Profiling of Quality method, beyond the quantitative quality rating.
This influence has to be considered within subjective evaluation of quality in order to support test-retest reliability and
user-centered approaches to quality evaluation of stereo 3D visualization. Different quality perception of different
display technologies was confirmed even under different TV settings.
MPEG-4 AVC saliency map computation
Author(s):
M. Ammar;
M. Mitrea;
M. Hasnaoui
A saliency map provides information about the regions inside some visual content (image, video, ...) at which a human
observer will spontaneously look. For saliency map computation, current research studies consider the uncompressed
(pixel) representation of the visual content and extract various types of information (intensity, color, orientation, motion
energy), which are then fused. This paper goes one step further and computes the saliency map directly from the
MPEG-4 AVC stream syntax elements, with minimal decoding operations. In this respect, an a priori in-depth study of
the MPEG-4 AVC syntax elements is first carried out so as to identify the entities that appeal to visual attention.
Secondly, the MPEG-4 AVC reference software is complemented with software tools allowing the parsing of these elements
and their subsequent usage in objective benchmarking experiments. This way, it is demonstrated that an MPEG-4 AVC
saliency map can be given by a combination of static saliency and motion maps.
This saliency map is experimentally validated under a robust watermarking framework. When included in an m-QIM
(multiple-symbol Quantization Index Modulation) insertion method, average PSNR gains of 2.43 dB, 2.15 dB, and 2.37
dB are obtained for data payloads of 10, 20 and 30 watermarked blocks per I frame, i.e. about 30, 60, and 90 bits/second,
respectively. These quantitative results are obtained by processing 2 hours of heterogeneous video content.
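The final fusion step, combining a static map and a motion map, can be sketched as follows. The maps below are random placeholders standing in for values derived from AVC syntax elements (e.g., residual energy and motion-vector magnitudes per macroblock), and the convex combination with equal weights is an illustrative assumption, not the paper's fitted fusion rule:

```python
import numpy as np

rng = np.random.default_rng(1)
static_map = rng.random((36, 64))   # e.g., one value per 16x16 macroblock
motion_map = rng.random((36, 64))   # e.g., from motion-vector magnitudes

def normalize(m):
    """Rescale a map to [0, 1] so static and motion terms are comparable."""
    lo, hi = m.min(), m.max()
    return (m - lo) / (hi - lo) if hi > lo else np.zeros_like(m)

def fuse(static, motion, alpha=0.5):
    """Convex combination of the normalized static and motion saliency."""
    return alpha * normalize(static) + (1 - alpha) * normalize(motion)

saliency = fuse(static_map, motion_map)
print(saliency.shape)
```

Working per macroblock, as above, matches the granularity at which AVC syntax elements are naturally available, which is what lets the map be computed with minimal decoding.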
Visual manifold sensing
Author(s):
Irina Burciu;
Adrian Ion-Mărgineanu;
Thomas Martinetz;
Erhardt Barth
We present a novel method, Manifold Sensing, for the adaptive sampling of the visual world based on manifolds of increasing but low dimensionality that have been learned with representative data. Because the data set is adapted during sampling, every new measurement (sample) depends on the previously acquired measurements. This leads to an efficient sampling strategy that requires a low total number of measurements. We apply Manifold Sensing to object recognition on UMIST, Robotics Laboratory, and ALOI benchmarks. For face recognition, with only 30 measurements - this corresponds to a compression ratio greater than 2000 - an unknown face can be localized such that its nearest neighbor in the low-dimensional manifold is almost always the actual nearest image. Moreover, the recognition rate obtained by assigning the class of the nearest neighbor is 100%. For a different benchmark with everyday objects, with only 38 measurements - in this case a compression ratio greater than 700 - we obtain similar localization results and, again, a 100% recognition rate.
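The recognition step behind this scheme can be sketched as nearest-neighbor lookup in a learned low-dimensional space. Here PCA stands in for the learned manifolds, the data are random placeholders for a face benchmark, and the adaptive, measurement-by-measurement sampling of the paper is not reproduced:

```python
import numpy as np

rng = np.random.default_rng(4)
n_classes, per_class, dim, k = 5, 10, 1024, 8   # illustrative sizes

gallery = rng.random((n_classes * per_class, dim))
labels = np.repeat(np.arange(n_classes), per_class)

# Learn the low-dimensional manifold (here, a PCA basis) from the gallery.
mean = gallery.mean(axis=0)
_, _, Vt = np.linalg.svd(gallery - mean, full_matrices=False)
basis = Vt[:k]                                   # k-dimensional manifold
embedded = (gallery - mean) @ basis.T

def recognize(probe):
    """Project a probe into the manifold; return its nearest neighbor's label."""
    z = (probe - mean) @ basis.T
    dists = np.linalg.norm(embedded - z, axis=1)
    return labels[np.argmin(dists)]

# A probe identical to a gallery image must recover its own label.
assert recognize(gallery[23]) == labels[23]
```

The compression ratios quoted in the abstract come from replacing the full image (`dim` values) with the `k` manifold coordinates actually measured.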
Visually lossless coding based on temporal masking in human vision
Author(s):
Velibor Adzic;
Howard S. Hock;
Hari Kalva
This paper presents a method for perceptual video compression that exploits the phenomenon of backward temporal
masking. We present an overview of visual temporal masking and discuss models to identify portions of a video
sequence masked due to this phenomenon exhibited by the human visual system. A quantization control model based
on the psychophysical model of backward visual temporal masking was developed. We conducted two types of
subjective evaluations and demonstrated that the proposed method achieves up to 10% bitrate savings on top of a
state-of-the-art encoder with visually identical video. The proposed methods were evaluated using an HEVC encoder.
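The quantization-control idea can be sketched as follows: frames just before a detected scene cut are perceptually masked (backward masking) and can tolerate coarser quantization. The cut threshold, the masked span, and the QP offset below are illustrative assumptions, not the paper's fitted psychophysical model:

```python
def masked_frames(frame_diffs, cut_threshold=40.0, span=2):
    """Flag the `span` frames preceding each detected scene cut as masked."""
    masked = [False] * len(frame_diffs)
    for i, d in enumerate(frame_diffs):
        if d > cut_threshold:                 # large difference => scene cut
            for j in range(max(0, i - span), i):
                masked[j] = True
    return masked

def adjust_qp(base_qp, masked, qp_offset=4):
    """Raise the QP (coarser quantization) on masked frames only."""
    return [base_qp + qp_offset if m else base_qp for m in masked]

diffs = [3.0, 2.5, 4.0, 55.0, 3.1, 2.9]       # frame 3 starts a new shot
qps = adjust_qp(26, masked_frames(diffs))
print(qps)  # -> [26, 30, 30, 26, 26, 26]
```

Because only the QP decision changes, the bitstream stays fully decodable by any standard decoder, which is how savings can be claimed "on top of" the encoder.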
Face detection on distorted images using perceptual quality-aware features
Author(s):
Suriya Gunasekar;
Joydeep Ghosh;
Alan C. Bovik
We quantify the degradation in performance of a popular and effective face detector when human-perceived image quality
is degraded by distortions due to additive white Gaussian noise, Gaussian blur or JPEG compression. It is observed
that, within a certain range of perceived image quality, a modest increase in image quality can drastically improve face
detection performance. These results can be used to guide resource or bandwidth allocation in a communication/delivery
system that is associated with face detection tasks. A new face detector based on QualHOG features is also proposed
that augments face-indicative HOG features with perceptual quality-aware spatial Natural Scene Statistics (NSS) features, yielding improved tolerance against image distortions. The new detector provides statistically significant improvements over a strong baseline on a large database of face images representing a wide range of distortions. To facilitate this study, we created a new Distorted Face Database, containing face and non-face patches from images impaired by a variety of common distortion types and levels. This new dataset is available for download and further experimentation at
www.ideal.ece.utexas.edu/~suriya/DFD/.
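The feature construction can be sketched as concatenating gradient-orientation (HOG-style) features with quality-aware NSS features. Both computations below are simplified stand-ins: a single global orientation histogram rather than a full blocked HOG, and two moments of MSCN coefficients (the BRISQUE-style normalization) rather than the full NSS feature set:

```python
import numpy as np

rng = np.random.default_rng(2)
patch = rng.random((64, 64))            # stand-in for a grayscale face patch

def hog_features(img, n_bins=9):
    """Global histogram of gradient orientations (a crude HOG stand-in)."""
    gy, gx = np.gradient(img)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)   # unsigned orientation in [0, pi)
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-12)

def nss_features(img, eps=1e-6):
    """Mean and variance of MSCN coefficients, via brute-force 7x7 windows."""
    pad = np.pad(img, 3, mode="reflect")
    mu = np.empty_like(img)
    sd = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            win = pad[i:i + 7, j:j + 7]
            mu[i, j] = win.mean()
            sd[i, j] = win.std()
    mscn = (img - mu) / (sd + eps)            # divisive normalization
    return np.array([mscn.mean(), mscn.var()])

qualhog = np.concatenate([hog_features(patch), nss_features(patch)])
print(qualhog.shape)  # -> (11,)
```

A classifier trained on the concatenated vector sees both face-shape evidence and distortion evidence, which is the intuition behind the improved tolerance to impairments.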
Consciousness and stereoscopic environmental imaging
Author(s):
Steve Mason
The question of human consciousness has intrigued philosophers and scientists for centuries: its nature, how we perceive
our environment, how we think, our very awareness of thought and self. It has been suggested that stereoscopic vision is
“a paradigm of how the mind works”.1 In depth perception, the laws of perspective are known, reasoned, committed to
memory from an early age; stereopsis, on the other hand, is a 3D experience governed by strict laws but actively joined
within the brain: one sees it without explanation. How do we, in fact, process two different images into one 3D model
within the mind, and does an awareness of this process give us insight into the workings of our own consciousness?
To translate this idea to imaging, I employed ChromaDepth™ 3D glasses that rely on light being refracted in a different
direction for each eye: colors of differing wavelengths appear at varying distances from the viewer, resulting in a 3D
space. This involves neither calculation nor the manufacture of two images or views.
Environmental spatial imaging was developed: a 3D image was generated that literally surrounds the viewer. The
image was printed and adhered to a semi-circular mount; the viewer then entered the interior to experience colored
shapes suspended in a 3D space, with an apparent loss of the surface, or picture plane, upon which the image is rendered. By
focusing our awareness through perception-based imaging we are able to gain a deeper understanding of how the brain
works, how we see.
Bivariate statistical modeling of color and range in natural scenes
Author(s):
Che-Chun Su;
Lawrence K. Cormack;
Alan C. Bovik
The statistical properties embedded in visual stimuli from the surrounding environment guide and affect the evolutionary processes of the human visual system. There are strong statistical relationships between co-located luminance/chrominance and disparity bandpass coefficients in natural scenes. However, these statistical relationships have so far only been developed into point-wise statistical models, although there exist spatial dependencies between adjacent pixels in both 2D color images and range maps.
Here we study the bivariate statistics of the joint and conditional distributions of spatially adjacent bandpass responses on both luminance/chrominance and range data of naturalistic scenes. We deploy bivariate generalized Gaussian distributions to model the underlying statistics. The analysis and modeling results show that there exist important and useful statistical properties of both joint and conditional distributions, which can be reliably described by the corresponding bivariate generalized Gaussian models. Furthermore, by utilizing these robust bivariate models, we are able to incorporate measurements of bivariate statistics between spatially adjacent luminance/chrominance and range information into various 3D image/video and computer vision applications, e.g., quality assessment, 2D-to-3D conversion, etc.
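The measurement step can be sketched as pairing each bandpass coefficient with its spatial neighbor and examining the joint sample. The image below is a smoothed random field standing in for a natural luminance map, the bandpass filter is a simple difference-of-boxes rather than the paper's filters, and a full bivariate generalized Gaussian fit is beyond this sketch; the empirical correlation already exposes the adjacent-coefficient dependency being modeled:

```python
import numpy as np

rng = np.random.default_rng(3)
img = rng.random((128, 128))
for _ in range(4):                      # crude low-pass to add spatial structure
    img = (img + np.roll(img, 1, 0) + np.roll(img, 1, 1)) / 3.0

# Bandpass response: image minus the mean of its 4-connected neighbors.
local_mean = (np.roll(img, 1, 0) + np.roll(img, -1, 0) +
              np.roll(img, 1, 1) + np.roll(img, -1, 1)) / 4.0
band = img - local_mean

# Joint sample: each coefficient paired with its horizontal neighbor.
x = band[:, :-1].ravel()
y = band[:, 1:].ravel()
rho = np.corrcoef(x, y)[0, 1]
print(f"adjacent-coefficient correlation: {rho:.3f}")
```

The same pairing applied to range-map bandpass coefficients yields the luminance/range joint samples whose distributions the bivariate generalized Gaussian models describe.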