Proceedings Volume 9394

Human Vision and Electronic Imaging XX


Volume Details

Date Published: 10 April 2015
Contents: 12 Sessions, 53 Papers, 0 Presentations
Conference: SPIE/IS&T Electronic Imaging 2015
Volume Number: 9394

Table of Contents

  • Front Matter: Volume 9394
  • Keynote Session
  • New Frontiers in Perceptual Image Quality: New Trends, Emerging Technologies, and Novel Evaluation Techniques
  • Perception of Texture, Gloss, and Color in Materials: Joint Session with Conferences 9394 and 9398
  • Keynote: Joint Session with Conferences 9394 and 9395
  • New Frontiers in Perceptual Image Quality: Natural Scenes and Higher-Level Statistical Features
  • Lighting, Light, and Lightness
  • Color in New Technologies from Mobile to Cinema: Joint Session with Conferences 9394 and 9395
  • Attention and Visual Saliency
  • Perceptual Dynamics in Visualization and Computer Graphics
  • Digital Humanities: Imaging, Visualization, and Analytics in the Social Sciences
  • Interactive Paper Session
Front Matter: Volume 9394
Front Matter: Volume 9394
This PDF file contains the front matter associated with SPIE Proceedings Volume 9394, including the Title Page, Copyright information, Table of Contents, Authors, and Conference Committee listing.
Keynote Session
Cognitive psychology meets art: exploring creativity, language, and emotion through live musical improvisation in film and theatre
Mónica López-González
Creativity is primarily defined as a mental phenomenon that engages multiple cognitive processes to generate novel and useful solutions to problems. There are two core problem-solving modes: long-term, deliberate, and methodical versus short-term and spontaneous. Despite behavioral models integrating the multiple activities (e.g., technical and financial issues, emotional responses) arising within and the socio-cultural effects surrounding the long-term creative process in various artistic disciplines, no systematic study exists of short-term improvisatory behavior in response to emotional stimuli within such ecologically valid contexts as film and theatre. In this paper I present and discuss the novel use of one cinematic and one theatrical project that investigate spontaneous creative thinking and emotion perception, particularly as it pertains to the in-the-moment expressive translation of emotional scenic variables, such as actors’ movements and dialogue, into musical language. Both projects explore the six universal human emotions (anger, disgust, fear, happiness, sadness, surprise) and were performed and recorded with live improvised music by professional jazz musicians. Combining visual scene analysis with musical feature analysis of the improvised scores, I propose a cognitive feedback model of spontaneous creative emotional innovation that integrates music, spoken language, and emotional expression within the context of live music scoring in both film and theatre. This work also serves as an appeal for more arts-based cognitive research and interdisciplinary methods in the study of human intelligence.
New Frontiers in Perceptual Image Quality: New Trends, Emerging Technologies, and Novel Evaluation Techniques
Use of a local cone model to predict essential CSF light adaptation behavior used in the design of luminance quantization nonlinearities
The human visual system’s luminance nonlinearity ranges continuously from square-root behavior in the very dark, through gamma-like behavior in dim ambient light and cube-root behavior in office lighting, to logarithmic behavior at daylight levels. Early display quantization nonlinearities were developed based on luminance bipartite JND data. More advanced approaches considered spatial frequency behavior, and used the Barten light-adaptive Contrast Sensitivity Function (CSF), modelled across a range of light adaptation, to determine the luminance nonlinearity (e.g., DICOM, referred to as a GSDF, or grayscale display function). A recent approach for a GSDF, also referred to as an electrical-to-optical transfer function (EOTF), using that light-adaptive CSF model improves on this by tracking the CSF for the most sensitive spatial frequency, which changes with adaptation level. We explored the cone photoreceptor’s contribution to the behavior of this maximum sensitivity of the CSF as a function of light adaptation, despite the CSF’s frequency variations and the fact that the cone’s nonlinearity is a point process. We found that parameters of a local cone model could fit the maximum sensitivity of the CSF model, across all frequencies, and are within the ranges of parameters commonly accepted for psychophysically tuned cone models. Thus, a link has been made between the spatial frequency and luminance dimensions for a key neural component. This provides a better theoretical foundation for the recently designed visual signal format using the aforementioned EOTF.
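The abstract does not reproduce the cone model's equations; a minimal sketch of the kind of fit it describes, assuming a Naka-Rushton-style local adaptation nonlinearity with illustrative (not the paper's) parameter values, might look like this:

```python
import numpy as np

def cone_response(L, L_adapt, n=0.9, sigma0=1.0, k=0.5):
    """Naka-Rushton-style cone nonlinearity with local adaptation.
    L: stimulus luminance (cd/m^2); L_adapt: adaptation luminance;
    n, sigma0, and k are illustrative free parameters of the fit."""
    sigma = sigma0 + k * L_adapt           # semi-saturation tracks adaptation
    return L**n / (L**n + sigma**n)

def peak_sensitivity(L_adapt, delta=1e-3, **kw):
    """Sensitivity proxy: slope of the cone response to a small contrast
    increment around the adaptation point (stand-in for the CSF peak)."""
    r0 = cone_response(L_adapt, L_adapt, **kw)
    r1 = cone_response(L_adapt * (1 + delta), L_adapt, **kw)
    return (r1 - r0) / delta

# Sweep adaptation from dim interiors to daylight levels
for La in [0.1, 1.0, 10.0, 100.0, 1000.0]:
    print(f"L_adapt = {La:7.1f} cd/m^2  peak sensitivity ~ {peak_sensitivity(La):.4f}")
```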
Display device-adapted video quality-of-experience assessment
Abdul Rehman, Kai Zeng, Zhou Wang
Today's viewers consume video content from a variety of connected devices, including smart phones, tablets, notebooks, TVs, and PCs. This imposes significant challenges for managing video traffic efficiently to ensure an acceptable quality-of-experience (QoE) for the end users as the perceptual quality of video content strongly depends on the properties of the display device and the viewing conditions. State-of-the-art full-reference objective video quality assessment algorithms do not take into account the combined impact of display device properties, viewing conditions, and video resolution while performing video quality assessment. We performed a subjective study in order to understand the impact of aforementioned factors on perceptual video QoE. We also propose a full reference video QoE measure, named SSIMplus, that provides real-time prediction of the perceptual quality of a video based on human visual system behaviors, video content characteristics (such as spatial and temporal complexity, and video resolution), display device properties (such as screen size, resolution, and brightness), and viewing conditions (such as viewing distance and angle). Experimental results have shown that the proposed algorithm outperforms state-of-the-art video quality measures in terms of accuracy and speed.
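SSIMplus itself is not specified in the abstract; as an illustration of the kind of device and viewing-condition information such a measure must account for, a small sketch (with hypothetical device numbers) converting screen size, resolution, and viewing distance into angular sampling density is shown below:

```python
import math

def pixels_per_degree(diagonal_in, resolution, viewing_distance_in):
    """Angular sampling density of a display as seen by the viewer.
    diagonal_in: screen diagonal in inches; resolution: (width_px, height_px);
    viewing_distance_in: viewer-to-screen distance in inches."""
    w_px, h_px = resolution
    pitch = diagonal_in / math.hypot(w_px, h_px)               # inches per pixel
    deg_per_px = 2 * math.degrees(math.atan(pitch / (2 * viewing_distance_in)))
    return 1.0 / deg_per_px

# The same 1080p stream is sampled very differently on a phone and a TV
print(pixels_per_degree(5.5, (1920, 1080), 12))    # hypothetical smartphone at 12"
print(pixels_per_degree(55.0, (1920, 1080), 100))  # hypothetical living-room TV at 100"
```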
About subjective evaluation of adaptive video streaming
The usage of HTTP Adaptive Streaming (HAS) technology by content providers is increasing rapidly. With the video content available in multiple qualities, HAS makes it possible to adapt the quality of the downloaded video to the current network conditions, providing smooth video playback. However, the time-varying video quality by itself introduces a new type of impairment. The quality adaptation can be done in different ways. In order to find the best adaptation strategy, maximizing the users' perceptual quality, it is necessary to investigate the subjective perception of adaptation-related impairments. However, the novelty of these impairments and their comparably long duration make most of the standardized assessment methodologies less suited for studying HAS degradations. Furthermore, in traditional testing methodologies, the quality of the video in audiovisual services is often evaluated separately and not in the presence of audio. Nevertheless, jointly evaluating the audio and the video within a subjective test is a relatively under-explored research field. In this work, we address the research question of determining an appropriate assessment methodology to evaluate sequences with time-varying quality due to adaptation. This was done by studying the influence of different adaptation-related parameters through two subjective experiments using a methodology developed to evaluate long test sequences. In order to study the impact of the presence of audio on quality assessment by the test subjects, one of the experiments was done in the presence of audio stimuli. The experimental results were subsequently compared with another experiment using the standardized single stimulus Absolute Category Rating (ACR) methodology.
A transformation-aware perceptual image metric
Petr Kellnhofer, Tobias Ritschel, Karol Myszkowski, et al.
Predicting human visual perception has several applications such as compression, rendering, editing, and retargeting. Current approaches, however, ignore the fact that the human visual system compensates for geometric transformations, e.g., we see that an image and a rotated copy of it are identical. Instead, they will report a large, false-positive difference. At the same time, if the transformations become too strong or too spatially incoherent, comparing two images indeed gets increasingly difficult. Between these two extrema, we propose a system to quantify the effect of transformations, not only on the perception of image differences, but also on saliency. To this end, we first fit local homographies to a given optical flow field and then convert this field into a field of elementary transformations such as translation, rotation, scaling, and perspective. We conduct a perceptual experiment quantifying the increase in difficulty when compensating for elementary transformations. Transformation entropy is proposed as a novel measure of complexity in a flow field. This representation is then used for applications such as comparison of non-aligned images, where transformations cause threshold elevation, and detection of salient transformations.
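The paper's full conversion from local homographies to elementary transformations is not given in the abstract; a minimal sketch of one standard way to split the linear part of a local transform into a rotation and scales (using a hypothetical example) is:

```python
import numpy as np

def rotation_and_scale(A):
    """Split a 2x2 local linear transform into a rotation angle and two
    scale factors via the SVD (A = U @ diag(s) @ Vt, rotation = U @ Vt)."""
    U, s, Vt = np.linalg.svd(A)
    R = U @ Vt
    if np.linalg.det(R) < 0:          # fold a reflection into a negative scale
        U[:, -1] *= -1
        s[-1] *= -1
        R = U @ Vt
    angle = np.degrees(np.arctan2(R[1, 0], R[0, 0]))
    return angle, s

# Example: a 30-degree rotation combined with uniform 1.5x scaling
theta = np.radians(30)
A = 1.5 * np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
print(rotation_and_scale(A))          # ~ (30.0, [1.5, 1.5])
```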
Designing a biased specification-based subjective test of image quality
Amy R. Reibman
Specification-based subjective tests (SBSTs) form the basis for almost all performance analysis of image and video quality estimators (QEs). Our ability to compare the efficacy of different QEs across a wide range of applications depends on careful design of these SBSTs, so that the conclusions drawn about how well a QE performs are reliable and accurate. In this paper, we explore methods to design biased SBSTs for image and video QEs. A biased SBST will produce an estimate of the performance of a given QE that is systematically different from its actual performance. We demonstrate by proof of concept that it is possible to create SBSTs that generate misleading or biased Pearson or Spearman correlation coefficients between subjective and objective scores, and we present some diagnostics that begin to evaluate when influential observations have been included in the SBST. Understanding how to create biased tests is a first step toward the overall goal of creating more effective unbiased SBSTs.
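As a toy illustration of how test design can bias the reported correlations (synthetic data, not the paper's method), selecting only stimuli on which a quality estimator happens to agree with the subjective scores inflates both the Pearson and Spearman coefficients:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(0)
mos = rng.uniform(1, 5, 200)                 # hypothetical subjective scores
qe = mos + rng.normal(0, 0.8, 200)           # hypothetical objective QE scores

print("full test set:   Pearson %.2f  Spearman %.2f"
      % (pearsonr(mos, qe)[0], spearmanr(mos, qe)[0]))

keep = np.abs(qe - mos) < 0.4                # biased stimulus selection
print("biased test set: Pearson %.2f  Spearman %.2f"
      % (pearsonr(mos[keep], qe[keep])[0], spearmanr(mos[keep], qe[keep])[0]))
```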
Towards a comprehensive model for predicting the quality of individual visual experience
Recently, a lot of effort has been devoted to estimating the Quality of Visual Experience (QoVE) in order to optimize video delivery to the user. For many decades, existing objective metrics mainly focused on estimating the perceived quality of a video, i.e., the extent to which artifacts due to e.g. compression disrupt the appearance of the video. Other aspects of the visual experience, such as enjoyment of the video content, were, however, neglected. In addition, typically Mean Opinion Scores were targeted, deeming the prediction of individual quality preferences too hard of a problem. In this paper, we propose a paradigm shift, and evaluate the opportunity of predicting individual QoVE preferences, in terms of video enjoyment as well as perceived quality. To do so, we explore the potential of features of different nature to be predictive for a user’s specific experience with a video. We consider thus not only features related to the perceptual characteristics of a video, but also to its affective content. Furthermore, we also integrate in our framework the information about the user and use context. The results show that effective feature combinations can be identified to estimate the QoVE from the perspective of both the enjoyment and perceived quality.
Quality labeled faces in the wild (QLFW): a database for studying face recognition in real-world environments
The varying quality of face images is an important challenge that limits the effectiveness of face recognition technology when applied in real-world applications. Existing face image databases do not consider the effect of distortions that commonly occur in real-world environments. This database (QLFW) represents an initial attempt to provide a set of labeled face images spanning the wide range of quality, from no perceived impairment to strong perceived impairment for face detection and face recognition applications. Types of impairment include JPEG2000 compression, JPEG compression, additive white noise, Gaussian blur and contrast change. Subjective experiments are conducted to assess the perceived visual quality of faces under different levels and types of distortions and also to assess the human recognition performance under the considered distortions. One goal of this work is to enable automated performance evaluation of face recognition technologies in the presence of different types and levels of visual distortions. This will consequently enable the development of face recognition systems that can operate reliably on real-world visual content in the presence of real-world visual distortions. Another goal is to enable the development and assessment of visual quality metrics for face images and for face detection and recognition applications.
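The database's exact distortion parameters are in the paper; a sketch of how the listed distortion types could be applied to a face image (hypothetical file name and levels, JPEG2000 omitted) is:

```python
import io
import numpy as np
from PIL import Image, ImageFilter, ImageEnhance

def distort(img, kind, level):
    """Apply one of the QLFW-style distortion types at a given level."""
    if kind == "jpeg":                                   # JPEG compression
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=level)      # e.g. level in 10..90
        buf.seek(0)
        return Image.open(buf).convert("RGB")
    if kind == "blur":                                   # Gaussian blur
        return img.filter(ImageFilter.GaussianBlur(radius=level))
    if kind == "noise":                                  # additive white noise
        arr = np.asarray(img, dtype=np.float32)
        arr += np.random.normal(0.0, level, arr.shape)   # level = sigma
        return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
    if kind == "contrast":                               # contrast change
        return ImageEnhance.Contrast(img).enhance(level)
    raise ValueError(kind)

face = Image.open("face.jpg").convert("RGB")             # hypothetical input file
distort(face, "blur", 2.0).save("face_blur.png")
```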
Parameterized framework for the analysis of visual quality assessments using crowdsourcing
Anthony Fremuth, Velibor Adzic, Hari Kalva
The ability to assess the quality of new multimedia tools and applications relies heavily on the perception of the end user. In order to quantify this perception, subjective tests are required to evaluate the effectiveness of new technologies. However, the standard for subjective user studies requires a highly controlled test environment and is costly in terms of both money and time. To circumvent these issues, we use crowdsourcing platforms such as CrowdFlower and Amazon's Mechanical Turk. The reliability of the results depends on factors that are not controlled and can be considered “hidden”. We use a pre-test survey to collect responses from subjects that reveal some of these hidden factors. Using statistical analysis, we build a parameterized model allowing for proper adjustment of the collected test scores.
What do you think of my picture? Investigating factors of influence in profile images context perception
F. Mazza, M. P. Da Silva, P. Le Callet, et al.
Multimedia quality assessment has been an important research topic during the last decades. The original focus on artifact visibility has been extended over the years to aspects such as image aesthetics, interestingness, and memorability. More recently, Fedorovskaya proposed the concept of 'image psychology': this concept focuses on additional quality dimensions related to human content processing. While these additional dimensions are very valuable in understanding preferences, it is very hard to define, isolate, and measure their effect on quality. In this paper we continue our research on face pictures by investigating which image factors influence context perception. We collected the perceived fit of a set of images to various content categories. These categories were selected based on current typologies in social networks. Logistic regression was adopted to model category fit based on image features. In this model we used both low-level and high-level features, the latter focusing on complex features related to image content. In order to extract these high-level features, we relied on crowdsourcing, since computer vision algorithms are not yet sufficiently accurate for the features we needed. Our results underline the importance of some high-level content features, e.g. the dress of the portrayed person and the scene setting, in categorizing images.
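The exact feature set is described in the paper; a minimal sketch of the modeling step, fitting a logistic regression that predicts whether an image fits a category from a mix of low- and high-level features (all data hypothetical), could be:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
# Hypothetical feature matrix: one row per image, columns mixing low-level
# features (e.g. mean luminance, colorfulness) with crowdsourced high-level
# features (e.g. formal dress, indoor setting).
X = rng.normal(size=(200, 5))
# Hypothetical binary labels: does the image fit a given profile category?
y = (X[:, 3] + 0.5 * X[:, 1] + rng.normal(0, 0.5, 200) > 0).astype(int)

model = LogisticRegression().fit(X, y)
print("per-feature weights:", model.coef_.round(2))   # which features drive category fit
```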
Perception of Texture, Gloss, and Color in Materials: Joint Session with Conferences 9394 and 9398
Texture, illumination, and material perception
Sylvia C. Pont, Andrea J. van Doorn, Maarten W. A. Wijntjes, et al.
In this paper we present an overview of our research into perception and biologically inspired modeling of illumination (flow) from 3D textures and the influence of roughness and illumination on material perception. Here 3D texture is defined as an image of an illuminated rough surface. In a series of theoretical and empirical papers we studied how we can estimate the illumination orientation (in the image plane) from 3D textures of globally flat samples. We found that the orientation can be estimated well by humans and computers using an approach based on second-order statistics. This approach makes use of the dipole-like structures in 3D textures that result from the illumination of bumps and troughs. For 3D objects, the local illumination direction varies over the object, resulting in surface illuminance flow. This again results in image illuminance flow in the image of a rough 3D object: the observable projection in the image of the field of local illumination orientations. Here we present results on image illuminance flow analysis for images from the Utrecht Oranges database, the CUReT database, and two vases. These results show that the image illuminance flow can be estimated robustly for various rough materials. In earlier studies we have shown that the image illuminance flow can be used to make shape and illumination inferences. Recently, in psychophysical experiments we found that adding 3D texture to a matte spherical object improves judgments of the direction and diffuseness of its illumination by human observers. This shows that human observers indeed use the illuminance flow as a cue for the illumination.
Effects of contrast adjustment on visual gloss of natural textures
We propose a novel subband-based S-curve transformation for increasing the perceived contrast of images, and use it to explore the relation between perceived gloss and perceived contrast of natural textures. The proposed transformation makes minimal assumptions on lighting conditions and does not require prior knowledge of surface geometry. Through a series of subjective experiments with both complex real-world textures and synthesized Lambertian surfaces, we show that there is a strong and robust correlation between perceived contrast and perceived gloss, regardless of the composition of the texture. We also show that contrast modification of an image with near-frontal illumination can compensate for the change in perceived gloss due to an oblique illumination (of the same texture at the same viewing angle).
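The paper's transformation operates on subbands; a simplified single-subband sketch of the idea, using an illustrative tanh S-curve rather than the authors' curve, is:

```python
import numpy as np
from scipy import ndimage

def s_curve(x, gain=3.0):
    """Odd sigmoid applied to a zero-mean subband: boosts mid-amplitude
    contrasts and saturates large ones (illustrative form only)."""
    return np.tanh(gain * x) / np.tanh(gain)

def boost_contrast(gray, sigma=2.0):
    """Split the image into a low-pass base and a detail subband, push the
    detail through the S-curve, and recombine."""
    base = ndimage.gaussian_filter(gray, sigma)
    detail = gray - base
    return np.clip(base + s_curve(detail), 0.0, 1.0)

texture = np.random.rand(128, 128)       # stand-in for a texture image in [0, 1]
enhanced = boost_contrast(texture)
```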
A subjective study and an objective metric to quantify the granularity level of textures
Texture granularity is an important visual characteristic that is useful in a variety of applications, including analysis, recognition, and compression, to name a few. A texture granularity measure can be used to quantify the perceived level of texture granularity. The granularity level of a texture is influenced by the size of the texture primitives. A primitive is defined as the smallest recognizable repetitive object in the texture. If the texture has large primitives, then the perceived granularity level tends to be lower as compared to a texture with smaller primitives. In this work we present a texture granularity database, referred to as GranTEX, which consists of 30 textures with varying levels of primitive sizes and granularity levels. The GranTEX database consists of both natural and man-made textures. A subjective study is conducted to measure the perceived granularity level of the textures present in the GranTEX database. An objective metric that automatically measures the perceived granularity level of textures is also presented as part of this work. It is shown that the proposed granularity metric correlates well with the subjective granularity scores.
Texture synthesis models and material perception in the visual periphery
Benjamin Balas
The feature vocabularies used to support texture synthesis algorithms are increasingly being used to examine various aspects of human visual perception. These algorithms offer both a rich set of features that are typically sufficient to capture the appearance of complex natural inputs and a means of carrying out psychophysical experiments using synthetic textures as a proxy for the transformations ostensibly carried out by the visual system when processing natural images using summary statistics. Texture synthesis algorithms have recently been successfully applied to a wide range of visual tasks, including texture perception, visual crowding, and visual search, among others. Presently, we used both nonparametric and parametric texture synthesis models to investigate the nature of material perception in the visual periphery. We asked participants to classify images of four natural materials (metal, stone, water, and wood) briefly presented in the visual periphery and compared the errors made under these viewing conditions to the errors made when judging the material category of synthetic images made from the original targets. We found that the confusions made under these two scenarios were substantially different, suggesting that these particular models do not appear to account for material perception in the periphery.
Keynote: Joint Session with Conferences 9394 and 9395
Next gen perception and cognition: augmenting perception and enhancing cognition through mobile technologies
Mobile technologies are now ubiquitous, and the complexity of problems is continuously increasing. In the context of the advancement of engineering, we explore in this paper possible reasons that could cause a saturation in technology evolution, namely the ability to solve problems based on previous results and the ability to express solutions in a more efficient way, concluding that 'thinking outside of the brain', as in solving engineering problems that are expressed in a virtual medium due to their complexity, would benefit from mobile technology augmentation. This could be the necessary evolutionary step that provides the efficiency required to solve new complex problems (addressing the 'running out of time' issue) and removes the barrier to communicating results (addressing the human 'perception/expression imbalance' issue). Some consequences are discussed, as in this context artificial intelligence becomes an automation tool and aid rather than a necessary next evolutionary step. The paper concludes that research in modeling as a problem-solving aid and in data visualization as a perception aid, augmented with mobile technologies, could be the path to an evolutionary step in advancing engineering.
New Frontiers in Perceptual Image Quality: Natural Scenes and Higher-Level Statistical Features
Feature maps driven no-reference image quality prediction of authentically distorted images
Deepti Ghadiyaram, Alan C. Bovik
Current blind image quality prediction models rely on benchmark databases comprised of singly and synthetically distorted images, thereby learning image features that are only adequate to predict human perceived visual quality on such inauthentic distortions. However, real world images often contain complex mixtures of multiple distortions. Rather than a) discounting the effect of these mixtures of distortions on an image's perceptual quality and considering only the dominant distortion or b) using features that are only proven to be efficient for singly distorted images, we deeply study the natural scene statistics of authentically distorted images, in different color spaces and transform domains. We propose a feature-maps-driven statistical approach which avoids any latent assumptions about the type of distortion(s) contained in an image, and focuses instead on modeling the remarkable consistencies in the scene statistics of real world images in the absence of distortions. We design a deep belief network that takes model-based statistical image features derived from a very large database of authentically distorted images as input and discovers good feature representations by generalizing over different distortion types, mixtures, and severities, which are later used to learn a regressor for quality prediction. We demonstrate the remarkable competence of our features for improving automatic perceptual quality prediction on a benchmark database and on the newly designed LIVE Authentic Image Quality Challenge Database and show that our approach of combining robust statistical features and the deep belief network dramatically outperforms the state-of-the-art.
Combining full-reference image visual quality metrics by neural network
Vladimir V. Lukin, Nikolay N. Ponomarenko, Oleg I. Ieremeiev, et al.
The task of assessing the full-reference visual quality of images is considered. The correlation between the obtained array of mean opinion scores (MOS) and the corresponding array of given metric values characterizes the correspondence of a considered metric to the HVS. For the largest openly available database intended for metric verification, TID2013, the Spearman correlation is about 0.85 for the best existing HVS-metrics. One simple way to improve the efficiency of assessing the visual quality of images is to combine several metrics. Our work addresses the possibility of using neural networks for this purpose. As learning data, we have used metric values for images of the TID2013 database, employed as the network inputs. A randomly selected half of the 3000 images of TID2013 has been used at the learning stage, whilst the other half has been used for assessing the quality of the neural-network-based HVS-metric. Six metrics that together “cover” all types of distortions well (FSIMc, PSNR-HMA, PSNR-HVS, SFF, SR-SIM, and VIF) have been selected. As the result of NN learning, the Spearman correlation between the NN output and the MOS for the verification set of TID2013 reaches 0.93 for the best configuration of the NN. This is considerably better than for any particular metric employed as an input (FSIMc is the best among them). An analysis of the designed metric's efficiency is carried out, and its advantages and drawbacks are demonstrated.
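The best NN configuration is reported in the paper; a small sketch of the general recipe (random half of TID2013 for learning, the six metric values as inputs, Spearman correlation on the other half), with placeholder data standing in for the real metric values and MOS, might be:

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Placeholder arrays: one row per TID2013 image, columns = the six metric
# values (FSIMc, PSNR-HMA, PSNR-HVS, SFF, SR-SIM, VIF); mos = opinion scores.
metrics = rng.normal(size=(3000, 6))
mos = metrics @ rng.normal(size=6) + rng.normal(0, 0.3, 3000)

idx = rng.permutation(3000)                      # random half-split
train, test = idx[:1500], idx[1500:]

nn = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
nn.fit(metrics[train], mos[train])
rho = spearmanr(nn.predict(metrics[test]), mos[test])[0]
print("Spearman on the verification half: %.3f" % rho)
```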
Geometrical and statistical properties of vision models obtained via maximum differentiation
We examine properties of perceptual image distortion models, computed as the mean squared error in the response of a two-stage cascaded image transformation. Each stage in the cascade is composed of a linear transformation, followed by a local nonlinear normalization operation. We consider two such models. For the first, the structure of the linear transformations is chosen according to perceptual criteria: a center-surround filter that extracts local contrast, and a filter designed to select visually relevant contrast according to the Standard Spatial Observer. For the second, the linear transformations are chosen based on a statistical criterion, so as to eliminate correlations estimated from responses to a set of natural images. For both models, the parameters that govern the scale of the linear filters and the properties of the nonlinear normalization operation are chosen to achieve minimal/maximal subjective discriminability of pairs of images that have been optimized to minimize/maximize the model, respectively (we refer to this as MAximum Differentiation, or “MAD”, optimization). We find that both representations substantially reduce redundancy (mutual information), with a larger reduction occurring in the second (statistically optimized) model. We also find that both models are highly correlated with subjective scores from the TID2008 database, with slightly better performance seen in the first (perceptually chosen) model. Finally, we use a foveated version of the perceptual model to synthesize visual metamers. Specifically, we generate an example of a distorted image that is optimized so as to minimize the perceptual error over receptive fields that scale with eccentricity, demonstrating that the errors are barely visible despite a substantial MSE relative to the original image.
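The filters used in the paper (center-surround and Standard Spatial Observer, or decorrelating filters) are not reproduced here; a generic sketch of the two-stage cascade structure, with a simple difference-of-Gaussians linear stage, divisive normalization, MSE between responses, and illustrative parameter values, is:

```python
import numpy as np
from scipy import ndimage

def stage(x, sigma_filter, sigma_norm, c):
    """One cascade stage: band-pass linear filtering followed by local
    divisive normalization (all parameter values illustrative)."""
    lin = ndimage.gaussian_filter(x, sigma_filter) - ndimage.gaussian_filter(x, 2 * sigma_filter)
    energy = ndimage.gaussian_filter(lin ** 2, sigma_norm)    # local activity pool
    return lin / np.sqrt(c + energy)

def perceptual_distortion(img_a, img_b):
    """Mean squared error between the two-stage responses of two images."""
    ra, rb = img_a, img_b
    for sf, sn, c in [(1.0, 3.0, 0.01), (2.0, 6.0, 0.01)]:
        ra, rb = stage(ra, sf, sn, c), stage(rb, sf, sn, c)
    return float(np.mean((ra - rb) ** 2))

ref = np.random.rand(64, 64)                     # stand-in reference image
print(perceptual_distortion(ref, ref + 0.05 * np.random.randn(64, 64)))
```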
Relations between local and global perceptual image quality and visual masking
Md Mushfiqul Alam, Pranita Patil, Martin T. Hagan, et al.
Perceptual quality assessment of digital images and videos is important for various image-processing applications. For assessing image quality, researchers have often used the idea of visual masking (or distortion visibility) to design image-quality predictors, specifically for near-threshold distortions. However, it is still unknown how, when assessing the quality of natural images, local distortion visibilities relate to local quality scores. Furthermore, the mechanism by which local quality scores are summed to predict global quality scores is also crucial for better prediction of perceptual image quality. In this paper, the local and global qualities of six images at six distortion levels were measured using subjective experiments. A Gabor-noise target was used as the distortion in the quality-assessment experiments to be consistent with our previous study [Alam, Vilankar, Field, and Chandler, Journal of Vision, 2014], in which the local root-mean-square contrast detection thresholds for detecting the Gabor-noise target were measured at each spatial location of the undistorted images. Comparison of the results of this quality-assessment experiment and the previous detection experiment shows that masking predicted the local quality scores more than 95% correctly above a 15 dB threshold, within 5% of the subject scores. Furthermore, it was found that an approximately squared summation of local quality scores predicted the global quality scores suitably (Spearman rank-order correlation 0.97).
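One common reading of the reported "approximately squared summation" is Minkowski pooling with an exponent of two; a minimal sketch of that pooling step (with hypothetical local scores) is:

```python
import numpy as np

def pool_local_quality(local_scores, p=2.0):
    """Minkowski-style pooling of local quality scores into a global score;
    p = 2 corresponds to an (approximately) squared summation."""
    local_scores = np.asarray(local_scores, dtype=float)
    return float(np.mean(local_scores ** p) ** (1.0 / p))

local = np.array([3.2, 4.1, 2.5, 4.8, 3.9])   # hypothetical per-patch scores
print(pool_local_quality(local))
```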
Building structural similarity database for metric learning
We propose a new approach for constructing databases for training and testing similarity metrics for structurally lossless image compression. Our focus is on structural texture similarity (STSIM) metrics and the matched-texture compression (MTC) approach. We first discuss the metric requirements for structurally lossless compression, which differ from those of other applications such as image retrieval, classification, and understanding. We identify “interchangeability” as the key requirement for metric performance, and partition the domain of “identical” textures into three regions, of “highest,” “high,” and “good” similarity. We design two subjective tests for data collection: the first relies on ViSiProG to build a database of “identical” clusters, and the second builds a database of image pairs with “highest,” “high,” “good,” and “bad” similarity labels. The data for the subjective tests are generated during the MTC encoding process, and consist of pairs of candidate and target image blocks. The context of the surrounding image is critical for training the metrics to detect lighting discontinuities, spatial misalignments, and other border artifacts that have a noticeable effect on perceptual quality. The identical texture clusters are then used for training and testing two STSIM metrics. The labelled image pair database will be used in future research.
Lighting, Light, and Lightness
Effect of daylight on atmosphere perception: comparison of a real space and visualizations
Mariska G. M. Stokkermans, Yuexu Chen, Michael J. Murdoch, et al.
The perceived atmosphere in a space is to a large extent determined by the illumination of the space, which usually is a combination of artificial lighting and daylight naturally entering the room. In this study we investigated how the presence of daylight affects the perceived atmosphere of a light ambience both in a real illuminated space and in a visualization of the space. The perceptual accuracy of these visualizations has been demonstrated in previous studies for artificial interior lighting, but not yet for the contribution of daylight to the interior lighting. Our results showed only a relatively small effect of the contribution of daylight both on the perception of light and the perception of the atmosphere of an interior light ambience. Possibly, adaptation plays an important role in this finding. Further, we demonstrated that the perceptual accuracy of visualizations containing daylight was for the majority of the light and atmosphere perception attributes similar to visualizations showing only artificial interior lighting.
The role of natural lighting diffuseness in human visual perception
Yaniv Morgenstern, Wilson S. Geisler, Richard F. Murray
The pattern of the light that falls on the retina is a conflation of real-world sources such as illumination and reflectance. Human observers often contend with the inherent ambiguity of the underlying sources by making assumptions about what real-world sources are most likely. Here we examine whether the visual system’s assumptions about illumination match the statistical regularities of the real world. We used a custom-built multidirectional photometer to capture lighting relevant to the shading of Lambertian surfaces in hundreds of real-world scenes. We quantify the diffuseness of these lighting measurements, and compare them to previous biases in human visual perception. We find that (1) natural lighting diffuseness falls over the same range as previous psychophysical estimates of the visual system’s assumptions on diffuseness, and (2) natural lighting almost always provides lighting direction cues that are strong enough to override the human visual system’s well known assumption that light tends to come from above. A consequence of these findings is that what seem to be errors in visual perception are often actually byproducts of the visual system knowing about and using reliable properties of real-world lighting when contending with ambiguous retinal images.
The influence of lighting on visual perception of material qualities
We studied whether lighting influences the visual perception of material scattering qualities. To this aim we made an interface or “material probe”, called MatMix 1.0, in which we used optical mixing of four canonical material modes. The appearance of a 3D object could be adjusted by interactively adjusting the weights of the four material components in the probe. This probe was used in a matching experiment in which we compared material perception under generic office lighting with that under three canonical lighting conditions. For the canonical materials, we selected matte, velvety, specular and glittery, representing diffuse, asperity, forward, and specular micro facet scattering modes. For the canonical lightings, we selected ambient, focus and brilliance lighting modes. In our matching experiment, observers were asked to change the appearance of the probe so that the material qualities of the probe matched that of the stimuli. From the matching results, we found that our brilliance lighting brought out the glossiness of our stimuli and our focus lighting brought out the velvetiness of our stimuli most similarly to office lighting. We conclude that the influence of lighting on material perception is material-dependent.
Effect of fixation positions on perception of lightness
Matteo Toscani, Matteo Valsecchi, Karl R. Gegenfurtner
Visual acuity, luminance sensitivity, contrast sensitivity, and color sensitivity are maximal in the fovea and decrease with retinal eccentricity. Therefore every scene is perceived by integrating the small, high resolution samples collected by moving the eyes around. Moreover, when viewing ambiguous figures the fixated position influences the dominance of the possible percepts. Therefore fixations could serve as a selection mechanism whose function is not confined to finely resolve the selected detail of the scene. Here this hypothesis is tested in the lightness perception domain. In a first series of experiments we demonstrated that when observers matched the color of natural objects they based their lightness judgments on objects’ brightest parts. During this task the observers tended to fixate points with above average luminance, suggesting a relationship between perception and fixations that we causally proved using a gaze contingent display in a subsequent experiment. Simulations with rendered physical lighting show that higher values in an object’s luminance distribution are particularly informative about reflectance. In a second series of experiments we considered a high level strategy that the visual system uses to segment the visual scene in a layered representation. We demonstrated that eye movement sampling mediates between the layer segregation and its effects on lightness perception. Together these studies show that eye fixations are partially responsible for the selection of information from a scene that allows the visual system to estimate the reflectance of a surface.
Color in New Technologies from Mobile to Cinema: Joint Session with Conferences 9394 and 9395
Reducing observer metamerism in wide-gamut multiprimary displays
Emerging electronic display technologies for cinema and television such as LED, OLED, laser and quantum dot are permitting greatly enhanced color gamuts via increasingly narrow-band primary emission spectra. A recent standard adopted for Ultra High Definition television, ITU-R Rec. 2020, promotes RGB primary chromaticities coincident with the spectral locus. As displays trend towards larger gamuts in the traditional 3-primary design, variability in human color sensing is exacerbated. Metameric matches to aim stimuli for one particular observer may yield a notable color mismatch for others, even if all observers are members of a color-normal population. Multiprimary design paradigms may hold value for simultaneously enhancing color gamut and reducing observer metamerism. By carefully selecting primary spectra in systems employing more than 3 emission channels, intentional metameric performance can be controlled. At Rochester Institute of Technology, a prototype multiprimary display has been simulated to minimize observer metamerism and observer variability according to custom indices derived from emerging models for human color vision. The constructed display is further being implemented in observer experiments to validate practical performance and confirm these vision and metamerism models.
Gamut extension for cinema: psychophysical evaluation of the state of the art and a new algorithm
Syed Waqas Zamir, Javier Vazquez-Corral, Marcelo Bertalmío
Wide-gamut digital display technology, in order to show its full potential in terms of colors, is creating an opportunity to develop gamut extension algorithms (GEAs). To this end, in this work we present two contributions. First, we report a psychophysical evaluation of GEAs specifically for cinema, using a digital cinema projector under cinematic (low ambient light) conditions; to the best of our knowledge this is the first evaluation of this kind reported in the literature. Second, we propose a new GEA by introducing simple but key modifications to the algorithm of Zamir et al. This new algorithm performs well in terms of skin tones and memory colors, with results that look natural and are free from artifacts.
Attention and Visual Saliency
Modeling the importance of faces in natural images
B. Jin, G. Yildirim, C. Lau, et al.
In this work we study the varying importance of faces in images. Face importance is found to be affected by the size and number of the faces present. We collected a dataset of 152 face images with faces of varying size and number. We conducted a crowdsourcing experiment in which we asked people to label the important regions of the images. Analyzing the results of the experiment, we propose a simple face-importance model, which is a 2D Gaussian function, to quantitatively represent the influence of the size and number of faces on the perceived importance of faces. The face-importance model is then tested for the application of salient-object detection. For this application, we create a new salient-objects dataset, consisting of both face images and non-face images, and again collect the ground truth through crowdsourcing. We demonstrate that our face-importance model helps us to better locate the important, and thus salient, objects in the images and outperforms state-of-the-art salient-object detection algorithms.
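The fitted parameters of the face-importance Gaussian are given in the paper; its general shape, with hypothetical means and spreads over relative face size and face count, can be sketched as:

```python
import numpy as np

def face_importance(size_frac, n_faces, mu=(0.15, 1.0), sigma=(0.1, 2.0)):
    """2D Gaussian over (relative face size, number of faces).
    mu and sigma here are hypothetical, not the paper's fitted values."""
    return float(np.exp(-0.5 * (((size_frac - mu[0]) / sigma[0]) ** 2 +
                                ((n_faces - mu[1]) / sigma[1]) ** 2)))

print(face_importance(0.15, 1))   # one medium-sized face: high importance
print(face_importance(0.02, 9))   # many tiny faces: much lower importance
```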
Bridging the gap between eye tracking and crowdsourcing
Pierre Lebreton, Toni Mäki, Evangelos Skodras, et al.
Visual attention constitutes a very important feature of the human visual system (HVS). Every day, when watching videos, images, or browsing the Internet, people are confronted with more information than they are able to process, and analyze only part of the information in front of them. In parallel, crowdsourcing has become a particularly hot topic, making it possible to scale subjective experiments to a large crowd with diversity in terms of nationality, social background, age, etc. This paper describes a novel framework that aims to bridge these two fields by providing a new way of measuring the user's experience in a subjective crowdsourcing experiment. This study goes beyond self-reported methods and provides a new kind of information in the context of crowdsourcing: visual attention. The results show that it is possible to estimate visual attention in a non-intrusive manner, without using self-reported methods or specialized equipment, with a precision as high as 14.1% in the horizontal axis and 17.9% in the vertical axis. This accuracy is sufficient for many kinds of measurements that can be efficiently executed only in non-controlled environments.
Visual saliency in MPEG-4 AVC video stream
M. Ammar, M. Mitrea, M. Hasnaoui, et al.
Visual saliency maps have already proved their efficiency in a large variety of image/video communication application fields, ranging from selective compression and channel coding to watermarking. Such saliency maps are generally based on different visual characteristics (like color, intensity, orientation, motion, etc.) computed from the pixel representation of the visual content. This paper summarizes and extends our previous work devoted to the definition of a saliency map extracted solely from the MPEG-4 AVC stream syntax elements. The MPEG-4 AVC saliency map thus defined is a fusion of static and dynamic maps. The static saliency map is in its turn a combination of intensity, color, and orientation feature maps. Despite the particular way in which all these elementary maps are computed, the fusion technique allowing their combination plays a critical role in the final result and is the object of the proposed study. A total of 48 fusion formulas (6 for combining static features and, for each of them, 8 to combine static with dynamic features) are investigated. The performances of the obtained maps are evaluated on a public database organized at IRCCyN, by computing two objective metrics: the Kullback-Leibler divergence and the area under the curve.
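The 48 fusion formulas themselves are enumerated in the paper; two representative examples of how a static and a dynamic saliency map could be combined after normalization are sketched below:

```python
import numpy as np

def normalize(m):
    """Rescale a saliency map to [0, 1] before fusion."""
    m = m.astype(float)
    return (m - m.min()) / (m.max() - m.min() + 1e-12)

def fuse(static_map, dynamic_map, mode="weighted", alpha=0.5):
    """Two of the many possible static/dynamic fusion formulas."""
    s, d = normalize(static_map), normalize(dynamic_map)
    if mode == "weighted":
        return alpha * s + (1 - alpha) * d     # linear blending
    if mode == "max":
        return np.maximum(s, d)                # pointwise maximum
    raise ValueError(mode)

static = np.random.rand(36, 64)                # stand-ins for the feature maps
dynamic = np.random.rand(36, 64)
fused = fuse(static, dynamic, mode="max")
```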
Learning visual balance from large-scale datasets of aesthetically highly rated images
Ali Jahanian, S.V.N. Vishwanathan, Jan P. Allebach
The concept of visual balance is innate for humans, and influences how we perceive visual aesthetics and cognize harmony. Although visual balance is a vital principle of design and is taught in schools of design, it is barely quantified. On the other hand, with the emergence of automatic/semi-automatic visual design for self-publishing, learning visual balance and modeling it computationally may enhance the aesthetics of such designs. In this paper, we present how the quest to understand visual balance inspired us to revisit one of the well-known theories in visual arts, the so-called theory of “visual rightness”, elucidated by Arnheim. We define Arnheim’s hypothesis as a design mining problem with the goal of learning visual balance from the work of professionals. We collected a dataset of 120K images that are aesthetically highly rated, from a professional photography website. We then computed factors that contribute to visual balance based on the notion of visual saliency. We fitted a mixture of Gaussians to the saliency maps of the images and obtained the hotspots of the images. Our inferred Gaussians align with Arnheim’s hotspots and confirm his theory. Moreover, the results support the viability of the center of mass, symmetry, as well as the Rule of Thirds in our dataset.
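A rough sketch of the hotspot-extraction step, fitting a mixture of Gaussians to pixel locations sampled in proportion to saliency (the saliency model and the number of components used in the paper may differ), is:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def saliency_hotspots(saliency, n_components=3, n_samples=5000, seed=0):
    """Fit a Gaussian mixture to locations sampled according to saliency;
    the component means serve as the image 'hotspots'."""
    rng = np.random.default_rng(seed)
    h, w = saliency.shape
    idx = rng.choice(h * w, size=n_samples, p=saliency.ravel() / saliency.sum())
    ys, xs = np.unravel_index(idx, (h, w))
    pts = np.column_stack([xs, ys]).astype(float)
    gmm = GaussianMixture(n_components=n_components, random_state=seed).fit(pts)
    return gmm.means_                          # hotspot centers in (x, y)

saliency = np.random.rand(240, 320)            # stand-in for a saliency map
print(saliency_hotspots(saliency))
```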
Assessing the influence of combinations of blockiness, blurriness, and packet loss impairments on visual attention deployment
In digital video systems, impairments introduced during the capture, coding/decoding, delivery, and display processes might reduce the perceived quality of the visual content. Recent developments in the area of visual quality have focused on trying to incorporate aspects of gaze patterns into the design of visual quality metrics, mostly using the assumption that visual distortions appearing in less salient areas might be less visible and, therefore, less annoying. Most of these studies, however, have considered the presence of a single artifact (e.g. blockiness or blur) impairing the image. In practice, this is not the case, as multiple artifacts may overlap, and their combined appearance may be strong enough to deviate saliency from its natural pattern. In this work, our focus is on measuring the impact and influence of combinations of artifacts on video saliency. For this purpose, we tracked the eye movements of participants in a subjective quality assessment experiment during a free-viewing task and a quality assessment task. The results show that gaze locations change from pristine videos to impaired videos. These changes seem to be more related to the quality level and content of the videos than to the specific combination of artifacts.
Perceptual Dynamics in Visualization and Computer Graphics
Hue tinting for interactive data visualization
‘Hue tinting’ is a set of visualization interactions that make it possible to use color in ways that are meaningful and specific to visualization tasks. ‘Hue tinting’ interactions address the problem of how to best choose colors to show numbers while 1) using hue to mark only relevant aspects of the data and 2) minimizing color-related problems such as brightness distortion. Most visualization systems make it difficult to use hue variation to identify and distinguish between meaningful features in a dataset without distorting the form or structure of the data. Like colorizing a black-and-white photograph, hue tinting lets you use color to select, identify, and mark relevant portions of your data without distorting the brightness of the underlying grayscale visualization. Hue tinting a specific range of data values provides a direct method for validation and compliance testing. Hue tinting a specific region of an image provides a direct method for identifying and measuring features in a dataset, such as the range of power levels at a given frequency range.
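As an illustration of the idea (not the tool's actual implementation), tinting only a chosen range of data values while leaving the underlying grayscale brightness essentially untouched can be sketched with an HSV manipulation:

```python
import numpy as np
import matplotlib.colors as mcolors

def hue_tint(gray, lo, hi, hue=0.6, sat=0.8):
    """Colorize only the data values in [lo, hi]; V (brightness) keeps the
    original grayscale value, so the image form is approximately preserved."""
    gray = np.clip(gray, 0.0, 1.0)
    h = np.full_like(gray, hue)
    s = np.where((gray >= lo) & (gray <= hi), sat, 0.0)   # hue only in range
    return mcolors.hsv_to_rgb(np.dstack([h, s, gray]))

data = np.random.rand(100, 100)          # stand-in for a grayscale visualization
rgb = hue_tint(data, lo=0.7, hi=0.9)     # mark only the highest values in blue
```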
On the visualization of tetrachromatic images
Alfredo Restrepo, Edisson Maldonado
How can a person with normal trichromatic vision attempt to see a tetrachromatic image? We propose two techniques, both of which use the time dimension. This makes it difficult to use the techniques to visualise tetrachromatic videos, so we deal here only with the visualisation of tetrachromatic (static) images. In one of the visualisation techniques we use "movies" (as in a Matlab movie) made of frames that are trichromatic renderings of tetrachromatically processed versions of the tetrachromatic image to be visualised, where a parameter in 4-Runge space [1] varies across the versions. In the other technique, we give a "dynamic texture" to regions of pixels having a UV or, alternatively, an IR component. The texture can be, for example, a superposed "granularity" (texture) that "vibrates" in time, where the density of the grains depends on the amount of UV (or NIR) present in the tetrachromatic colour.
Evaluating the perception of different matching strategies for time-coherent animations
Javier Villegas, Ronak Etemadpour, Angus Graeme Forbes
This paper introduces new terminology to describe the perceptual qualities of non-photorealistic animation sequences created using an analysis/synthesis approach to rendering. Specifically, we propose the use of different matching optimization criteria as part of the creative control for generating animated sequences, or stylized video, and we explore the perceptual differences that are obtained when different optimization criteria are used. Additionally, metrics are introduced that describe the strengths and weaknesses of each of these matching strategies. Moreover, we show that these metrics may be useful for future evaluations of stylized video. We examine a series of sequences generated using different matching algorithms based on these metrics, and a user evaluation with 30 participants demonstrates that our objective metrics are perceptually relevant.
Perceptual evaluation of visual alerts in surveillance videos
Bernice E. Rogowitz, Mercan Topkara, William Pfeiffer, et al.
Visual alerts are commonly used in video monitoring and surveillance systems to mark events, presumably making them more salient to human observers. Surprisingly, the effectiveness of computer-generated alerts in improving human performance has not been widely studied. To address this gap, we have developed a tool for simulating different alert parameters in a realistic visual monitoring situation, and have measured human detection performance under conditions that emulated different set-points in a surveillance algorithm. In the High-Sensitivity condition, the simulated alerts identified 100% of the events with many false alarms. In the Lower-Sensitivity condition, the simulated alerts correctly identified 70% of the targets, with fewer false alarms. In the control condition, no simulated alerts were provided. To explore the effects of learning, subjects performed these tasks in three sessions, on separate days, in a counterbalanced, within subject design. We explore these results within the context of cognitive models of human attention and learning. We found that human observers were more likely to respond to events when marked by a visual alert. Learning played a major role in the two alert conditions. In the first session, observers generated almost twice as many False Alarms as in the No-Alert condition, as the observers responded pre-attentively to the computer-generated false alarms. However, this rate dropped equally dramatically in later sessions, as observers learned to discount the false cues. Highest observer Precision, Hits/(Hits + False Alarms), was achieved in the High Sensitivity condition, but only after training. The successful evaluation of surveillance systems depends on understanding human attention and performance.
Digital Humanities: Imaging, Visualization, and Analytics in the Social Sciences
Examples of challenges and opportunities in visual analysis in the digital humanities
Holly Rushmeier, Ruggero Pintus, Ying Yang, et al.
The massive digitization of books and manuscripts has converted millions of works that were once only physical into electronic documents. This conversion has made it possible for scholars to study large bodies of work, rather than just individual texts. This has offered new opportunities for scholarship in the humanities. Much previous work on digital collections has relied on optical character recognition and focused on the textual content of books. New work is emerging that is analyzing the visual layout and content of books and manuscripts. We present two different digital humanities projects in progress that present new opportunities for extracting data about the past, with new challenges for designing systems for scholars to interact with this data. The first project we consider is the layout and spectral content of thousands of pages from medieval manuscripts. We present the techniques used to study content variations in sets of similar manuscripts, and to study material variations that may indicate the location of manuscript production. The second project is the analysis of representations in the complete archive of Vogue magazine over 120 years. We present samples of applying computer vision techniques to understanding the changes in representation of women over time.
Temporal evolution of brain reorganization under cross-modal training: insights into the functional architecture of encoding and retrieval networks
This study is based on the recent discovery of massive and well-structured cross-modal memory activation generated in the primary visual cortex (V1) of totally blind people as a result of novel training in drawing without any vision (Likova, 2012). This unexpected functional reorganization of primary visual cortex was obtained after only a week of training with the novel Cognitive-Kinesthetic Method, and was consistent across pilot groups of different categories of visual deprivation: congenitally blind, late-onset blind, and blindfolded (Likova, 2014). These findings led us to implicate V1 as the implementation of the theoretical visuo-spatial ‘sketchpad’ for working memory in the human brain. Since neither the source nor the subsequent ‘recipient’ of this non-visual memory information in V1 is known, these results raise a number of important questions about the underlying functional organization of the respective encoding and retrieval networks in the brain. To address these questions, an individual totally blind from birth was given a week of Cognitive-Kinesthetic training, accompanied by functional magnetic resonance imaging (fMRI) both before and just after training, and again after a two-month consolidation period. The results revealed a remarkable temporal sequence of training-based response reorganization in both the hippocampal complex and the temporal-lobe object processing hierarchy over the prolonged consolidation period. In particular, a pattern of profound learning-based transformations in the hippocampus was strongly reflected in V1, with the retrieval function showing massive growth as a result of the Cognitive-Kinesthetic memory training and consolidation, while the initially strong hippocampal response during tactile exploration and encoding became non-existent. Furthermore, after training, an alternating patch structure in the form of a cascade of discrete ventral regions underwent radical transformations to reach complete functional specialization in terms of either encoding or retrieval as a function of the stage of learning. Moreover, several distinct patterns of learning-evolution emerged within the patches as a function of their anatomical location, implying a complex reorganization of the object processing sub-networks through the learning period. These first findings of complex patterns of training-based encoding/retrieval reorganization thus have broad implications for a newly emerging view of perception/memory interactions and their reorganization through the learning process. Note that the temporal evolution of these forms of extended functional reorganization could not be uncovered with the conventional assessment paradigms used in traditional approaches to functional mapping, which may therefore have to be revisited. Moreover, as the present results are obtained in learning under life-long blindness, they imply modality-independent operations, transcending the usual tight association with visual processing. The present approach of memory-drawing training in blindness has the dual advantage of being both non-visual and a causal intervention, which makes it a promising ‘scalpel’ with which to disentangle interactions among diverse cognitive functions.
Interactive Paper Session
Do curved displays make for a more pleasant experience?
Nooree Na, Kyeong Ah Jeong, Hyeon-Jeong Suk
This study investigated the benefits of a curved display compared to a flat display and proposed the optimal radius of curvature for a monitor. The study was carried out in two steps. For identifying the optimal radius, a bendable monitor prototype was used to enable subjects to adjust the display radius manually. Each subject was instructed to search for an optimal radius according to individual preference and visual comfort. Six different themes were applied for the display content. The subjects also reported the radius in which a visual distortion occurred. As a result, it was found that curvature with a radius equal to 600 mm to 700 mm is optimal for a 23-inch diagonal display, while 700 mm to 800 mm is appropriate for a 27-inch diagonal display. Moreover, when the radius of curvature was smaller than 600 mm, a majority reported distortion regardless of the display size. Next, a validation test confirmed that the subjects read the texts faster on the curved display than on the flat display. Based on the empirical results of two experiments, the excellence of a curved monitor in terms of visual comfort, preference, and immersion was verified.
The importance of accurate convergence in addressing stereoscopic visual fatigue
Visual fatigue (asthenopia) continues to be a problem in extended viewing of stereoscopic imagery. Poorly converged imagery may contribute to this problem. In 2013, the Author reported that, in a study sample, a surprisingly high number of 3D feature films released as stereoscopic Blu-rays contained obvious convergence errors [1]. The placement of stereoscopic image convergence can be an "artistic" call, but upon close examination, the sampled films seemed to have simply missed their intended convergence location. This failure may be because some stereoscopic editing tools do not have the necessary fidelity to enable a 3D editor to obtain a high degree of image alignment or set an exact point of convergence. Compounding this matter further is the fact that a large number of stereoscopic editors may not believe that pixel-accurate alignment and convergence are necessary. The Author asserts that setting a pixel-accurate point of convergence on an object at the start of any given stereoscopic scene will improve the viewer's ability to fuse the left and right images quickly. The premise is that stereoscopic performance (acuity) increases when an accurately converged object is available in the image for the viewer to fuse immediately. Furthermore, this increased viewer stereoscopic performance should reduce the amount of visual fatigue associated with longer-term viewing because less mental effort will be required to perceive the imagery. To test this concept, we developed special stereoscopic imagery to measure viewer visual performance with and without specific objects for convergence. The Company Team conducted a series of visual tests with 24 participants between 25 and 60 years of age. This paper reports the results of these tests.
Improvement in perception of image sharpness through the addition of noise and its relationship with memory texture
Xiazi Wan, Hiroyuki Kobayashi, Naokazu Aoki
In a preceding study, we investigated the effects of image noise on the perception of image sharpness using white noise and one- and two-dimensional single-frequency sinusoidal patterns as stimuli. This study extends the preceding one by evaluating natural color images rather than black-and-white patterns. The results showed that the effect of noise in improving the perception of image sharpness is more evident in blurred images than in sharp images, consistent with the results of the preceding study. In another preceding study, we proposed "memory texture" to explain the preferred granularity of images, as a concept analogous to "memory color" for preferred color reproduction. We observed individual differences in the type of memory texture (white or 1/f noise) for each object. This study discusses the relationship between the improvement in sharpness perception obtained by adding noise and memory texture, taking these individual differences into account. We found that memory texture is one of the elements that affect sharpness perception.
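As an illustration of the kind of stimulus manipulation described above, the sketch below generates 1/f (pink) spatial noise and blends it into a grayscale image at a chosen amplitude; the function names and the amplitude value are illustrative stand-ins, not taken from the study.

```python
# Illustrative sketch: generate 1/f spatial noise and add it to an image.
import numpy as np

def one_over_f_noise(h, w, rng=np.random.default_rng(0)):
    """Return a zero-mean, unit-variance 1/f noise field of size (h, w)."""
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    f = np.sqrt(fy**2 + fx**2)
    f[0, 0] = 1.0                                   # avoid division by zero at DC
    spectrum = (rng.normal(size=(h, w)) + 1j * rng.normal(size=(h, w))) / f
    noise = np.real(np.fft.ifft2(spectrum))
    return (noise - noise.mean()) / noise.std()

def add_noise(image, amplitude=0.05):
    """image: grayscale float array in [0, 1]; amplitude is illustrative."""
    return np.clip(image + amplitude * one_over_f_noise(*image.shape), 0.0, 1.0)
```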
Depth image enhancement using perceptual texture priors
A depth camera is widely used in various applications because it provides a depth image of the scene in real time. However, due to limited power consumption, depth cameras suffer from severe noise and are incapable of providing high-quality 3D data. Although a smoothness prior is often employed to suppress the depth noise, it discards geometric details, degrading the distance resolution and hindering realism in 3D content.

In this paper, we propose a perceptual depth image enhancement technique that automatically recovers the depth details of various textures, using a statistical framework inspired by the human mechanism of perceiving surface details through texture priors. We construct a database of high-quality normals. Based on recent studies in human visual perception (HVP), we select pattern density as the primary feature for classifying textures. Based on the classification results, we match and replace the noisy input normals with high-quality normals from the database. As a result, our method provides a high-quality depth image that preserves surface details. We expect our work to be effective in enhancing the details of depth images from 3D sensors and in providing a high-fidelity virtual reality experience.
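The following sketch illustrates the matching-and-substitution idea under stated assumptions: each noisy patch is summarized by a crude pattern-density feature and its normals are replaced by those of the nearest high-quality exemplar in a database. The feature definition, function names, and threshold are hypothetical stand-ins, not the authors' implementation.

```python
# Hedged sketch of database matching by a pattern-density feature.
import numpy as np

def pattern_density(patch):
    """Crude density feature: fraction of strong gradient responses in the patch."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    return (mag > mag.mean()).mean()

def enhance_normals(noisy_patches, db_features, db_normals):
    """db_features: (K,) densities of database patches; db_normals: matching normal maps."""
    out = []
    for patch in noisy_patches:
        idx = np.argmin(np.abs(db_features - pattern_density(patch)))
        out.append(db_normals[idx])                 # substitute high-quality normals
    return out
```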
A patch-based cross masking model for natural images with detail loss and additive defects
Visual masking is an effect whereby the content of an image reduces the detectability of a target signal hidden within it. The effect of visual masking has found application in numerous image processing and vision tasks. In the past few decades, considerable research has been conducted on visual masking, based on models optimized for artificial targets placed upon unnatural masks. Over the years, there has been a tendency to apply masking models to predict natural image quality and the detection threshold of distortions presented in natural images. However, to our knowledge, few studies have examined how well masking models generalize to different types of distortion presented in natural images. In this work, we measure the ability of natural image patches to mask three different types of distortion, and analyse the performance of the conventional gain control model in predicting the distortion detection threshold. We then propose a new masking model, in which detail loss and additive defects are modeled in two parallel vision channels and interact with each other via a cross masking mechanism. We show that the proposed cross masking model adapts better to the various image structures and distortions found in natural scenes.
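For reference, a compact sketch of the conventional contrast gain-control response that serves as the baseline masking model is given below; the exponents and saturation constant are illustrative textbook values, not the parameters fitted in the paper.

```python
# Sketch of a standard contrast gain-control response (illustrative parameters).
import numpy as np

def gain_control_response(target, neighborhood, p=2.4, q=2.0, b=0.035):
    """Excitatory target response divided by pooled neighborhood activity."""
    excitation = np.abs(target) ** p
    inhibition = b ** q + np.sum(np.abs(neighborhood) ** q)
    return excitation / inhibition
```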
Influence of high ambient illuminance and display luminance on readability and subjective preference
Katrien De Moor, Börje Andrén, Yi Guo, et al.
Many devices, such as tablets, smartphones, notebooks, and fixed and portable navigation systems, are used on a (nearly) daily basis, both in indoor and outdoor environments. It is often argued that contextual factors, such as the ambient illuminance in relation to characteristics of the display (e.g., surface treatment, screen reflectance, display luminance, …), may have a strong influence on the use of such devices and the corresponding user experience. However, the current understanding of these influence factors is still rather limited. In this work, we therefore focus in particular on the impact of lighting and display luminance on readability, visual performance, subjective experience, and preference. A controlled lab study (N=18) with a within-subjects design was performed to evaluate two car displays (one glossy and one matte) in conditions that simulate bright outdoor lighting. Four ambient luminance levels and three display luminance settings were combined into seven experimental conditions. More concretely, we investigated for each display: (1) whether and how readability and visual performance varied with the different combinations of ambient luminance and display luminance, and (2) whether and how they influenced the subjective experience (through self-reported valence, annoyance, and visual fatigue) and preference. The results indicate a limited, yet negative influence of increased ambient luminance and reduced contrast on visual performance and readability for both displays. Similarly, we found that self-reported valence decreases, and annoyance and visual fatigue increase, as the contrast ratio decreases and ambient luminance increases. Overall, the impact is clearer for the matte display than for the glossy display.
A no-reference bitstream-based perceptual model for video quality estimation of videos affected by coding artifacts and packet losses
K. Pandremmenou, M. Shahid, L. P. Kondi, et al.
In this work, we propose a No-Reference (NR) bitstream-based model for predicting the quality of H.264/AVC video sequences affected by both compression artifacts and transmission impairments. The proposed model is based on a feature extraction procedure in which a large number of features are calculated from the packet-loss-impaired bitstream. Many of the features are proposed for the first time in this work, and the specific feature set as a whole is applied for the first time to NR video quality prediction. All feature observations are taken as input to the Least Absolute Shrinkage and Selection Operator (LASSO) regression method. LASSO identifies the most important features, and using only these, it is possible to estimate the Mean Opinion Score (MOS) with high accuracy. Indicatively, only 13 features are needed to produce a Pearson Correlation Coefficient of 0.92 with the MOS. Interestingly, the performance statistics we computed when assessing our method for predicting the Structural Similarity Index and the Video Quality Metric are equally good. Thus, the experimental results verified the suitability of the features selected by LASSO as well as the ability of LASSO to make accurate predictions through sparse modeling.
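A minimal sketch of the feature-selection step, assuming a matrix of bitstream features and corresponding MOS values, is shown below; the synthetic data, feature count, and variable names are placeholders rather than the authors' feature set.

```python
# Sketch: LASSO selects a sparse subset of bitstream features for MOS prediction.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                      # placeholder bitstream features
y = X[:, :13] @ rng.normal(size=13) + 0.1 * rng.normal(size=200)  # synthetic MOS

model = LassoCV(cv=5).fit(X, y)                     # cross-validated choice of the L1 weight
selected = np.flatnonzero(model.coef_)              # indices of features LASSO retained
print(f"{selected.size} features kept; R^2 = {model.score(X, y):.2f}")
```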
Saliency detection for videos using 3D FFT local spectra
Zhiling Long, Ghassan AlRegib
Bottom-up spatio-temporal saliency detection identifies perceptually important regions of interest in video sequences. The center-surround model has proven useful for visual saliency detection. In this work, we explore using 3D FFT local spectra as features for saliency detection within the center-surround framework. We develop a spectral-location-based decomposition scheme to divide a 3D FFT cube into two components, one related to temporal changes and the other related to spatial changes. Temporal saliency and spatial saliency are detected separately, using features derived from each spectral component through a simple center-surround comparison method. The two detection results are then combined to yield a saliency map. We apply the same detection algorithm to different color channels (YIQ) and incorporate the results into the final saliency determination. The proposed technique is tested with the public CRCNS database. Both visual and numerical evaluations verify the promising performance of our technique.
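A rough sketch of the spectral-decomposition idea is shown below: the 3D FFT magnitude of a local spatio-temporal cube is split into temporally and spatially dominated bins, and a simple center-surround comparison is made on the resulting energies. The exact decomposition and comparison rules used by the authors may differ.

```python
# Sketch: split a local 3D FFT spectrum into temporal/spatial energy and compare.
import numpy as np

def local_spectrum_features(cube):
    """cube: (T, H, W) grayscale block; returns (temporal_energy, spatial_energy)."""
    spec = np.abs(np.fft.fftn(cube))
    ft = np.fft.fftfreq(cube.shape[0])[:, None, None]      # temporal frequencies
    fy = np.fft.fftfreq(cube.shape[1])[None, :, None]
    fx = np.fft.fftfreq(cube.shape[2])[None, None, :]
    temporal_mask = np.abs(ft) > np.sqrt(fy**2 + fx**2)    # bins dominated by temporal change
    return spec[temporal_mask].sum(), spec[~temporal_mask].sum()

def center_surround_saliency(center_cube, surround_cube):
    """Simple center-surround comparison of the two spectral components."""
    ct, cs = local_spectrum_features(center_cube)
    st, ss = local_spectrum_features(surround_cube)
    return abs(ct - st) + abs(cs - ss)
```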
Perceived interest versus overt visual attention in image quality assessment
Ulrich Engelke, Wei Zhang, Patrick Le Callet, et al.
We investigate the impact of overt visual attention and perceived interest on the prediction performance of image quality metrics. To this end, we performed two experiments to capture these mechanisms: an eye gaze tracking experiment and a region-of-interest selection experiment. Perceptual relevance maps were created from both experiments and integrated into the design of the image quality metrics. Correlation analysis shows that there is indeed added value in integrating these perceptual relevance maps. We reveal that the improvement in prediction accuracy is not statistically different between fixation density maps from eye gaze tracking data and region-of-interest maps, thus indicating the robustness of different perceptual relevance maps for the performance gain of image quality metrics. Interestingly, however, we found that thresholding region-of-interest maps into binary maps significantly deteriorates the prediction performance gain for image quality metrics. We provide a detailed analysis and discussion of the results, as well as of the conceptual and methodological differences between capturing overt visual attention and perceived interest.
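In its simplest form, integrating a perceptual relevance map (a fixation density map or a region-of-interest map) into a quality metric amounts to weighted pooling of a local quality or distortion map; the sketch below shows this generic scheme, not the specific integration strategy evaluated in the paper.

```python
# Generic sketch: relevance-weighted pooling of a local quality map.
import numpy as np

def weighted_quality(local_quality_map, relevance_map):
    """Both arrays share the image's spatial shape; relevance values are non-negative."""
    w = relevance_map / (relevance_map.sum() + 1e-12)
    return np.sum(local_quality_map * w)
```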
A tone mapping operator based on neural and psychophysical models of visual perception
Praveen Cyriac, Marcelo Bertalmio, David Kane, et al.
High dynamic range imaging techniques involve capturing and storing real-world radiance values that span many orders of magnitude. However, common display devices can usually reproduce intensity ranges of only two to three orders of magnitude. Therefore, in order to display a high dynamic range image on a low dynamic range screen, the dynamic range of the image needs to be compressed without losing details or introducing artefacts, a process called tone mapping. A good tone mapping operator must be able to produce a low dynamic range image that matches as closely as possible the perception of the real-world scene. We propose a two-stage tone mapping approach, in which the first stage is a global method for range compression based on the gamma curve that best equalizes the lightness histogram, and the second stage performs local contrast enhancement and color induction using neural activity models for the visual cortex.
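A minimal sketch of the first, global stage, assuming luminance normalized to [0, 1], is shown below: the gamma whose output histogram is closest to flat is selected by brute-force search. The search grid and error measure are illustrative choices, not necessarily those used by the authors.

```python
# Sketch: pick the gamma that best equalizes (flattens) the lightness histogram.
import numpy as np

def best_gamma(luminance, gammas=np.linspace(0.1, 1.0, 91), bins=256):
    """luminance: HDR luminance normalized to [0, 1]; returns the best gamma."""
    best, best_err = gammas[0], np.inf
    for g in gammas:
        hist, _ = np.histogram(luminance ** g, bins=bins, range=(0, 1), density=True)
        err = np.sum((hist - 1.0) ** 2)             # distance from a flat (uniform) histogram
        if err < best_err:
            best, best_err = g, err
    return best
```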
Illuminant color estimation based on pigmentation separation from human skin color
Satomi Tanaka, Akihiro Kakinuma, Naohiro Kamijo, et al.
Humans possess a visual mechanism called "color constancy" that maintains the perceived colors of an object across various light sources. An effective color constancy algorithm has been proposed that uses the human facial color in a digital color image; however, this method produces erroneous estimates because of differences in individual facial colors. In this paper, we present a novel color constancy algorithm based on skin color analysis, a method that separates the skin color into melanin, hemoglobin, and shading components. We exploit the stationary property of Japanese facial color, which is calculated from the melanin and hemoglobin components. As a result, we propose a method that uses the subject's facial color in the image without depending on individual differences among Japanese facial colors.
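One common formulation of the pigmentation-separation step, which may differ from the authors' exact procedure, models skin color in negative-log RGB space as a linear mixture of melanin, hemoglobin, and shading vectors and recovers the per-pixel densities by least squares; the density vectors below are purely illustrative.

```python
# Hedged sketch: linear unmixing of skin color into pigment densities in log space.
import numpy as np

# Hypothetical (illustrative) unit density vectors for melanin and hemoglobin.
MELANIN = np.array([0.74, 0.55, 0.39])
HEMOGLOBIN = np.array([0.42, 0.69, 0.58])
SHADING = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)    # uniform attenuation direction

def separate_pigments(rgb):
    """rgb: (N, 3) skin pixels in (0, 1]; returns (N, 3) melanin/hemoglobin/shading densities."""
    density = -np.log(np.clip(rgb, 1e-6, 1.0))      # optical density (Beer-Lambert style)
    basis = np.stack([MELANIN, HEMOGLOBIN, SHADING], axis=1)
    coeffs, *_ = np.linalg.lstsq(basis, density.T, rcond=None)
    return coeffs.T
```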
Evaluation of color encodings for high dynamic range pixels
Traditional Low Dynamic Range (LDR) color spaces encode a small fraction of the visible color gamut, which does not encompass the range of colors produced on upcoming High Dynamic Range (HDR) displays. Future imaging systems will require encoding a much wider color gamut and luminance range. Such a wide color gamut can be represented using floating-point HDR pixel values, but those are inefficient to encode. They also lack the perceptual uniformity of the luminance and color distribution, which is provided (approximately) by most LDR color spaces. Therefore, there is a need to devise an efficient, perceptually uniform, and integer-valued representation for high dynamic range pixel values. In this paper we evaluate several methods for encoding color HDR pixel values, in particular for use in image and video compression. Unlike other studies, we test both luminance and color difference encoding in rigorous 4AFC threshold experiments to determine the minimum bit-depth required. Results show that the Perceptual Quantizer (PQ) encoding provides the best perceptual uniformity in the considered luminance range; however, the gain in bit-depth is rather modest. More significant differences can be observed between color difference encoding schemes, among which YDuDv encoding seems to be the most efficient.
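For reference, the Perceptual Quantizer encoding evaluated here follows the SMPTE ST 2084 inverse EOTF, which maps absolute luminance to a perceptually near-uniform code value before integer quantization. The sketch below uses the standard ST 2084 constants; the final rounding step is a simplification of a full encoding pipeline.

```python
# Sketch: PQ (SMPTE ST 2084) inverse EOTF followed by simple integer quantization.
import numpy as np

M1, M2 = 2610 / 16384, 2523 / 4096 * 128
C1, C2, C3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32

def pq_encode(luminance_cd_m2, bit_depth=12):
    """Encode absolute luminance (0..10000 cd/m^2) into integer PQ code values."""
    y = np.clip(luminance_cd_m2 / 10000.0, 0.0, 1.0) ** M1
    v = ((C1 + C2 * y) / (1.0 + C3 * y)) ** M2      # PQ code value in [0, 1]
    return np.round(v * (2 ** bit_depth - 1)).astype(int)

print(pq_encode(np.array([0.1, 100.0, 1000.0])))    # example luminances in cd/m^2
```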
Using false colors to protect visual privacy of sensitive content
Many tools have been proposed for protecting visual privacy, but those available today lack all or some of the important properties expected of such tools. Therefore, in this paper, we propose a simple yet effective method for privacy protection based on false color visualization, which maps the color palette of an image to a different color palette, possibly after a compressive point transformation of the original pixel data, distorting the details of the original image. This method does not require any prior detection of faces or other sensitive regions and, hence, unlike typical privacy protection methods, it is less sensitive to inaccurate computer vision algorithms. It is also secure, as the look-up tables can be encrypted; reversible, as table look-ups can be inverted; flexible, as it is independent of format or encoding; adjustable, as the final result can be computed by interpolating the false color image with the original using different degrees of interpolation; less distracting, as it does not create visually unpleasant artifacts; and selective, as it better preserves the semantic structure of the input. Four different color scales and four different compression functions, on one of which the proposed method relies, are evaluated via objective (three face recognition algorithms) and subjective (50 human subjects in an online study) assessments using faces from the public FERET dataset. The evaluations demonstrate that the DEF and RBS color scales lead to the strongest privacy protection, while the compression functions add little to the strength of privacy protection. Statistical analysis also shows that recognition algorithms and human subjects perceive the proposed protection similarly.
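A minimal illustration of the false-color mapping is given below: an optional compressive point transform, a palette look-up, and adjustable interpolation with the original. The DEF and RBS color scales evaluated in the paper are not standard library colormaps, so a generic colormap is used here purely as a stand-in.

```python
# Illustrative sketch of false-color privacy protection (not the authors' exact pipeline).
import numpy as np
import matplotlib.pyplot as plt

def false_color_protect(image, colormap="viridis", gamma=0.5, alpha=1.0):
    """image: float RGB array in [0, 1]. gamma < 1 is a compressive point
    transform; alpha controls interpolation between protected and original."""
    gray = image.mean(axis=-1) ** gamma             # compress, then collapse to one channel
    mapped = plt.get_cmap(colormap)(gray)[..., :3]  # palette look-up (the "false colors")
    return alpha * mapped + (1.0 - alpha) * image   # adjustable protection strength
```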
The visual light field in paintings of Museum Prinsenhof: comparing settings in empty space and on objects
Tatiana Kartashova, Huib de Ridder, Susan F. te Pas, et al.
The aim of this study was to investigate whether inferences of light in the empty space of a painting and on objects in that painting are congruent with each other. We conducted an experiment in which we tested the perception of light qualities (direction, intensity of directed and ambient components) in two conditions: (a) at a position in empty space in a painting and (b) on the convex object that was replaced by the probe in the first condition. We found that the consistency of directional settings, both between conditions and within paintings, is highly dependent on painting content, specifically on the number of qualitatively different light zones [1] in a scene. For uniform lighting, observers are very consistent, but when two or more light zones are present in a painting, the individual differences become prominent. We discuss several possible explanations of these results, the most plausible of which is that human observers are blind to complex features of a light field [2].
Using V1-based models for difference perception and change detection
P. Y. Chua, K. Kwok
Using V1-based models, it is possible to investigate the features of human visual processing that influence difference perception and change detection. The V1-based models were built on the basic constructs of the human visual system, incorporating mechanisms of visual processing such as colour opponency, receptive field tuning, contrast sensitivity, linear and non-linear behaviour, and response pooling. Three studies were conducted to investigate the use of such models in difference perception and change detection. These studies demonstrate the various applications of human vision models and highlight several key considerations that need to be made when using them.
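One building block of such V1-based models, receptive-field tuning with rectification and spatial pooling, can be sketched as a Gabor filter bank; the tuning parameters and pooling rule below are illustrative, not those used in the studies.

```python
# Sketch: oriented Gabor filter bank with rectified, spatially pooled responses.
import numpy as np
from scipy.signal import fftconvolve

def gabor(size=21, wavelength=6.0, theta=0.0, sigma=3.0):
    """Build a single even-symmetric Gabor receptive field."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + yr**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / wavelength)

def v1_responses(image, orientations=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    """Rectified, spatially pooled responses of the oriented filter bank."""
    return [np.mean(np.abs(fftconvolve(image, gabor(theta=t), mode="same")))
            for t in orientations]
```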