Proceedings Volume 7867

Image Quality and System Performance VIII

View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 24 January 2011
Contents: 14 Sessions, 42 Papers, 0 Presentations
Conference: IS&T/SPIE Electronic Imaging 2011
Volume Number: 7867

Table of Contents

All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 7867
  • Image Quality in Print
  • Image Quality Attributes Characterization and Measurement I
  • Image Quality Attributes Characterization and Measurement II
  • System Performance: Capture
  • System Performance: Video
  • Image Quality Issues in Digital Photography: Joint Session with Conference 7876
  • High Dynamic Range Imaging: Joint Session with Conference 7876
  • Perceptual Image Quality Experimentation
  • Discussion on Web-based IQ Testing
  • System Performance: Security
  • Visual Attention, Saliency, and Quality I: Joint Session with Conference 7865
  • Visual Attention, Saliency, and Quality II: Joint Session with Conference 7865
  • Interactive Paper Session
Front Matter: Volume 7867
Front Matter: Volume 7867
This PDF file contains the front matter associated with SPIE Proceedings Volume 7867, including the Title Page, Copyright information, Table of Contents, and the Conference Committee listing.
Image Quality in Print
Image quality metrics for the evaluation of print quality
Image quality metrics have become increasingly popular in the image processing community. However, so far, no one has been able to define an image quality metric well correlated with perceived overall image quality. One of the causes is that image quality is multi-dimensional and complex. One approach to bridge the gap between perceived and calculated image quality is to reduce the complexity of image quality by breaking the overall quality into a set of quality attributes. In our research we have presented a set of quality attributes built on existing attributes from the literature. The six proposed quality attributes are: sharpness, color, lightness, artifacts, contrast, and physical. This set keeps the dimensionality to a minimum. An experiment validated the quality attributes as suitable for image quality evaluation. The process of applying image quality metrics to printed images is not straightforward, because image quality metrics require a digital input. A framework has been developed for this process, which includes scanning the print to get a digital copy, image registration, and the application of image quality metrics. With quality attributes for the evaluation of image quality and a framework for applying image quality metrics, a selection of suitable image quality metrics for the different quality attributes has been carried out. Each of the quality attributes has been investigated, and an experimental analysis carried out to find the most suitable image quality metrics for the given quality attributes. For the sharpness attribute, the Structural SIMilarity index (SSIM) by Wang et al. (2004) is the most suitable; for the other attributes, further evaluation is required.
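As an illustration of the final step of such a framework, the sketch below computes a mean SSIM score between the original digital image and a registered scan of the print. It is a minimal Python rendering of SSIM as defined by Wang et al. (2004), assuming both inputs are grayscale float arrays in [0, 1] of identical shape; it is not the authors' exact processing chain.

# Minimal mean-SSIM sketch (illustrative, not the authors' implementation).
import numpy as np
from scipy.ndimage import uniform_filter

def mean_ssim(original, scanned, win=7, k1=0.01, k2=0.03):
    """original, scanned: registered grayscale float arrays in [0, 1]."""
    c1, c2 = k1 ** 2, k2 ** 2                      # stabilizing constants, L = 1
    mu_x = uniform_filter(original, win)           # local means
    mu_y = uniform_filter(scanned, win)
    var_x = uniform_filter(original ** 2, win) - mu_x ** 2
    var_y = uniform_filter(scanned ** 2, win) - mu_y ** 2
    cov_xy = uniform_filter(original * scanned, win) - mu_x * mu_y
    ssim_map = ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
               ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return ssim_map.mean()                         # pooled sharpness score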
Hyper error map based document stitching
Chengwu Cui
The hyper error map is proposed as a metric to guide the selection of the stitch seam in order to minimize and avoid document stitching artifacts. An efficient hyper error map computation scheme based on edge filtering is also shown. Examples of actual stitching artifacts are presented and used to demonstrate the effectiveness of the hyper error map. The hyper error map is further discussed in relation to the hyperacuity of the visual system, which indicates that other aspects of human perception should also be incorporated in visual modeling for image processing.
Quantification of perceived macro-uniformity
Ki-Youn Lee, Yousun Bang, Heui-Keun Choh
Macro-uniformity refers to the subjective impression of overall uniformity in a print sample. Through the efforts of the INCITS W1.1 team, macro-uniformity has been categorized into five types of attributes: banding, streaks, mottle, gradients, and moiré patterns, and ruler samples have been generated with perceptual scales. The W1.1 macro-uniformity ruler is useful for judging the level of print defects, but it is not easy to reproduce samples having the same perceptual scales at different times in different places. An objective quantification method is more helpful and convenient for developers to analyze print quality and design printing system components. In this paper, we propose a method for measuring the perceived macro-uniformity of a given print using a flat-bed scanner. First, banding, 2D noise, and gradients are measured separately and converted to perceptual scales based on the subjective results for each attribute. The correlation coefficients between the measured values of the attributes and the perceptual scales are 0.92, 0.97, and 0.86, respectively. Another subjective test is performed to find the relationship between the overall macro-uniformity and the three attributes. The weighting factors are obtained from the experimental results, and the final macro-uniformity grade is determined as a weighted sum of the attributes.
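A minimal sketch of that final combination step, assuming the three attribute measurements have already been converted to perceptual scales; the weights shown are placeholders, not the factors fitted in the paper.

# Illustrative weighted-sum pooling of per-attribute perceptual scale values.
def macro_uniformity_grade(banding, noise_2d, gradient, weights=(0.5, 0.3, 0.2)):
    """All inputs are perceptual-scale values; weights are placeholder values."""
    w_b, w_n, w_g = weights
    return w_b * banding + w_n * noise_2d + w_g * gradient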
Comparing hardcopy and softcopy results in the study of the impact of workflow on perceived reproduction quality of fine art images
Susan Farnand, Jun Jiang, Franziska Frey
A project, supported by the Andrew W. Mellon Foundation, is currently underway to evaluate current practices in fine art image reproduction, determine the image quality generally achievable, and establish a suggested framework for art image interchange. To determine the image quality currently being achieved, experimentation has been conducted in which a set of objective targets and pieces of artwork in various media were imaged by participating museums and other cultural heritage institutions. Prints and images for display made from the delivered image files at the Rochester Institute of Technology were used as stimuli in psychometric testing in which observers were asked to evaluate the prints as reproductions of the original artwork and as stand-alone images. The results indicated that there were limited differences between assessments made using displayed images relative to printed reproductions. Further, the differences between rankings made with and without the original artwork present were much smaller than expected.
Using metrics to assess the ICC perceptual rendering intent
Kristyn Falkenstern, Nicolas Bonnier, Marius Pedersen, et al.
Increased interest in color management has resulted in more options for users to choose from for their color management needs. We propose an evaluation process that uses metrics to assess the quality of ICC profiles, specifically for the perceptual rendering intent. The primary objective of the perceptual rendering intent, unlike the media-relative intent, is a preferred reproduction rather than an exact match. Profile vendors commonly quote a CIE ΔE*ab color difference to define the quality of a profile. With the perceptual rendering intent, this may or may not correlate with the preferred reproduction. For this work we compiled a comprehensive list of quality aspects used to evaluate the perceptual rendering intent of an ICC printer profile. The aspects are used as tools to individually judge the different qualities that define the overall strength of a profile. The proposed workflow uses metrics to assess each aspect and delivers a relative comparison between different printer profile options. The aim of the research is to improve the current methods used to evaluate a printer profile while reducing the amount of time required.
Image Quality Attributes Characterization and Measurement I
Development of a perceptually calibrated objective metric of noise
A system simulation model was used to create scene-dependent noise masks that reflect current performance of mobile phone cameras. Stimuli with different overall magnitudes of noise and with varying mixtures of red, green, blue, and luminance noises were included in the study. Eleven treatments in each of ten pictorial scenes were evaluated by twenty observers using the softcopy ruler method. In addition to determining the quality loss function in just noticeable differences (JNDs) for the average observer and scene, transformations for different combinations of observer sensitivity and scene susceptibility were derived. The psychophysical results were used to optimize an objective metric of isotropic noise based on system noise power spectra (NPS), which were integrated over a visual frequency weighting function to yield perceptually relevant variances and covariances in CIE L*a*b* space. Because the frequency weighting function is expressed in terms of cycles per degree at the retina, it accounts for display pixel size and viewing distance effects, so application-specific predictions can be made. Excellent results were obtained using only L* and a* variances and L*a* covariance, with relative weights of 100, 5, and 12, respectively. The positive a* weight suggests that the luminance (photopic) weighting is slightly narrow on the long wavelength side for predicting perceived noisiness. The L*a* covariance term, which is normally negative, reflects masking between L* and a* noise, as confirmed in informal evaluations. Test targets in linear sRGB and rendered L*a*b* spaces for each treatment are available at http://www.aptina.com/ImArch/ to enable other researchers to test metrics of their own design and calibrate them to JNDs of quality loss without performing additional observer experiments. Such JND-calibrated noise metrics are particularly valuable for comparing the impact of noise and other attributes, and for computing overall image quality.
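The final combination step can be illustrated as below. This simplified Python sketch applies the relative weights of 100, 5, and 12 quoted above directly to raw CIELAB patch statistics; the published metric instead derives the variances and covariance by integrating noise power spectra against a visual frequency weighting, and the calibration to JNDs is not reproduced here.

# Simplified combination of CIELAB noise statistics (illustrative only).
import numpy as np

def weighted_noise_metric(patch_lab):
    """patch_lab: (H, W, 3) array of a nominally uniform patch in CIE L*a*b*."""
    L = patch_lab[..., 0].ravel()
    a = patch_lab[..., 1].ravel()
    var_L, var_a = np.var(L), np.var(a)
    cov_La = np.cov(L, a)[0, 1]
    # Relative weights 100, 5, 12 as quoted in the abstract.
    return 100.0 * var_L + 5.0 * var_a + 12.0 * cov_La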
Perceptually relevant evaluation of noise power spectra in adaptive pictorial systems
Noise Power Spectra (NPS) are traditionally measured using uniform areas of tone. Adaptive algorithms, such as noise reduction, demosaicing, and sharpening, can modify their behavior based on underlying image structure. In particular, noise reduction algorithms may suppress noise more strongly in perfectly uniform areas than they would in areas with modest variations, as found in actual pictorial images, and so yield unrepresentative NPS. This phenomenon is similar in nature to the susceptibility of high-contrast edges to adaptive sharpening and the subsequent over-estimation of the effective pictorial modulation transfer function by some targets. Experimentation is described that examines the effect of modern adaptive noise reduction algorithms on the NPS of images containing ramps of varying gradient. Gradients are chosen based on a survey of consumer images, from areas where noise is typically noticeable, such as blue sky, walls, and faces. Although a loss in performance of adaptive noise reduction is observed as gradients increase, the effect is perceptually small when weighted according to the frequency of occurrence of the gradients in pictorial imaging. The significant additional complexity of measuring gradient-based NPS does not appear to be justified; measuring NPS from uniform areas of tone should suffice for most perceptual work.
A novel perceptual image quality measure for block based image compression
Tamar Shoham, Dror Gill, Sharon Carmel
The challenge of finding a reliable, real-time, automatic perceptual evaluation of image quality has been tackled continuously by researchers worldwide. Existing methods often have high complexity, or are dependent on setup specifics such as image size, or else have low correlation with subjective quality. We propose a novel, easy to compute, image quality score which reliably measures artifacts introduced in block based coding schemes. The proposed score, named BBCQ (Block Based Coding Quality) lies in the range 0-1 with 1 indicating identical images, and is composed of three components. These components are based on a pixel-wise error using PSNR, evaluation of added artifactual edges along coding block boundaries and a measure of the texture distortion. These three measures are calculated on image tiles, whose size depends on image resolution, and are combined using a weighted geometric average. The obtained local scores, one per image tile, may then be used for local quality evaluation, or pooled into a single overall image quality score. The proposed quality score enables reliable, real-time, automatic perceptual evaluation of the quality of block-based coded images. BBCQ has been successfully integrated into an automatic, perceptually lossless, JPEG recompression system.
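A hedged sketch of the pooling structure described above, assuming the three per-tile component scores are already available as NumPy arrays with values in [0, 1]; the component definitions and weights are illustrative placeholders, not the published BBCQ parameters.

# Illustrative weighted geometric pooling of per-tile component scores.
import numpy as np

def bbcq_pool(pixel_score, edge_score, texture_score, weights=(0.5, 0.25, 0.25)):
    """Each argument: NumPy array of per-tile scores in [0, 1]."""
    w1, w2, w3 = weights
    per_tile = (pixel_score ** w1) * (edge_score ** w2) * (texture_score ** w3)
    return per_tile.mean()            # pool local scores into one overall score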
Predicting preferred coring level to reduce toner scatter in electrophotographic printing
The electrophotographic process depends on a complex interplay between electrostatically charged toner particles, the developer roller, and the organic photoconductor during development; and between the toner particles, the organic photoconductor, and the paper during transfer. The task of controlling the imaging process is made even more challenging by the fact that colorant planes are developed independently and in succession. At high colorant levels, toner particles for a given colorant plane may be strongly repelled by toner that has already been deposited for previously developed colorant planes. The result is scattering of toner away from the edges of thin lines and character strokes. In previous work, we have proposed a coring method to reduce the occurrence of the toner scatter, and conducted psychophysical experiments to determine the preferred level of coring as a function of line width and colorant level. In this paper, we apply the edge transition width (ETW) metric to physically measure the impact of toner scatter on the sharpness of edges of lines and character strokes. We consider ETW both with and without coring, and compare it to the results from our earlier psychophysical experiments.
Image Quality Attributes Characterization and Measurement II
A universal reference-free blurriness measure
The perceptual quality of digital imagery is of great interest in many applications. Blur artifacts can be among the most annoying in processed images and video sequences. In many applications of perceptual quality assessment, a reference is not available; therefore, no-reference blurriness measures are of interest. In this paper, we present a universal, reference-free blurriness measurement approach. While some other methods are designed for a particular source of blurriness, such as block-based compression, the proposed approach is universal in that it should work for any source of blur. The proposed approach models the gradient image of the given image as a Markov chain and utilizes transition probabilities to compute a blurriness measure. This is the first time that transition probabilities have been applied to perceptual quality assessment. Specifically, we first compute the transition probabilities for selected pairs of gradient values and then combine these probabilities, using a pooling strategy, to formulate the blurriness measure. Experimental studies compare the proposed method to state-of-the-art reference-free blurriness measurement algorithms and show that the proposed method outperforms the commonly used measures.
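A minimal sketch of the transition-probability idea, assuming a grayscale image and horizontally adjacent gradient magnitudes quantized into a small number of bins; the bin count and any subsequent pooling of probabilities into a single blurriness score are illustrative choices, not the authors' exact formulation.

# Estimate transition probabilities between quantized gradient magnitudes
# of horizontally adjacent pixels (illustrative sketch).
import numpy as np

def gradient_transition_matrix(image, n_bins=16):
    grad = np.abs(np.diff(image.astype(float), axis=1))          # horizontal gradient
    q = np.minimum((grad / (grad.max() + 1e-12) * n_bins).astype(int), n_bins - 1)
    counts = np.zeros((n_bins, n_bins))
    np.add.at(counts, (q[:, :-1].ravel(), q[:, 1:].ravel()), 1)  # adjacent pairs
    row_sums = counts.sum(axis=1, keepdims=True)
    return counts / np.maximum(row_sums, 1)                      # P(next | current)

A blurriness score could then, for example, pool the probabilities of remaining in low-gradient states, which tend to grow as an image is blurred.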
Issues in the design of a no-reference metric for perceived blur
Developing an objective metric that automatically quantifies perceived image quality degradation induced by blur is highly beneficial for current digital imaging systems. In many applications, these objective metrics need to be of the no-reference (NR) type, which implies that quality prediction is based on the distorted image only. Recent progress in the development of NR blur metrics is evident from many promising methods reported in the literature. However, there is still room for improvement in the design of an NR metric that reliably predicts the extent to which humans perceive blur. In this paper, we address some important issues relevant to the design as well as the application of an NR blur metric. The purpose of the paper is not to describe a particular metric, but rather to explain current concerns and difficulties in this field and to outline how these issues may be accounted for in the design of future metrics.
Evaluating super resolution algorithms
This study intends to establish a sound testing and evaluation methodology, based upon human visual characteristics, for assessing image restoration accuracy, in addition to comparing the subjective results with predictions made by objective evaluation methods. In total, six different super resolution (SR) algorithms were selected: iterative back-projection (IBP), robust SR, maximum a posteriori (MAP), projections onto convex sets (POCS), non-uniform interpolation, and a frequency domain approach. The performance comparison between the SR algorithms in terms of their restoration accuracy was carried out both subjectively and objectively. The former methodology relies upon the paired comparison method, which involves the simultaneous scaling of two stimuli with respect to image restoration accuracy. For the latter, both conventional image quality metrics and color difference methods were implemented. Consequently, POCS and non-uniform interpolation outperformed the others in an ideal situation, while restoration-based methods appear more accurate with respect to the HR image in a real-world case where prior information about the blur kernel remains unknown. However, the noise-added image could not be restored successfully by any of these methods. The latest International Commission on Illumination (CIE) standard color difference equation, CIEDE2000, was found to predict the subjective results accurately and outperformed conventional methods for evaluating the restoration accuracy of the SR algorithms.
Image quality assessment based on distortion identification
Aladine Chetouani, Azeddine Beghdadi
A new global full-reference image quality system based on a classification and fusion scheme is proposed. It consists of several steps. The first step is devoted to the identification of the type of degradation contained in a given image, based on a Linear Discriminant Analysis (LDA) classifier using some common Image Quality Metrics (IQMs) as feature inputs. An IQM per degradation (IQM-D) is then used to estimate the quality of the image. For a given degradation type, the appropriate IQM-D is derived by combining the top three best IQMs using an Artificial Neural Network model. The performance of the proposed scheme is evaluated first in terms of correct degradation identification. Then, for each distortion type, the image quality estimation is evaluated in terms of good correlation with the subjective judgments using the TID 2008 image database.
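A hedged sketch of the two-stage structure (classification, then distortion-specific quality estimation), using scikit-learn's LDA; feature extraction and the Artificial Neural Network fusion of the top three IQMs are omitted, and the per-distortion metrics are passed in as placeholder callables.

# Two-stage assessor: LDA identifies the distortion type, then a
# distortion-specific metric produces the quality estimate (illustrative).
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def build_assessor(train_features, train_labels, per_distortion_metric):
    """per_distortion_metric: dict mapping a distortion label to a callable
    taking (reference, distorted) and returning a quality score."""
    clf = LinearDiscriminantAnalysis().fit(train_features, train_labels)

    def assess(features, reference, distorted):
        distortion = clf.predict([features])[0]           # step 1: identify degradation
        score = per_distortion_metric[distortion](reference, distorted)  # step 2
        return distortion, score

    return assess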
System Performance: Capture
Image quality evaluation of light field photography
Qiang Fu, Zhiliang Zhou, Yan Yuan, et al.
Light field photography captures the 4D radiance information of a scene. Digital refocusing and digital correction of aberrations can be performed after the photograph is taken. However, capturing a 4D light field is costly, and trade-offs between different image quality metrics should be made and evaluated. This paper explores the effects of light field photography on image quality by quantitatively evaluating some basic criteria for an imaging system. A simulation approach was first developed by ray-tracing a designed light field camera. A standard test chart following ISO 12233 was provided as the input scene. A sequence of light field raw images was acquired and then processed by light field rendering methods. Through-focus visual resolution and MTF were calculated and analyzed. As a comparison, the same tests were performed on the same main lens system configured for conventional photography. An experimental light field system was built and its performance was tested. This work helps in better understanding the pros and cons of light field photography in contrast with conventional imaging methods and in finding ways to optimize the joint digital-optical design of the system.
Feature-based automatic color calibration for networked camera system
Shoji Yamamoto, Keisuke Taki, Norimichi Tsumura, et al.
In this paper, we have developed a feature-based automatic color calibration method using area-based detection and adaptive nonlinear regression. Simple chartless color matching is achieved by exploiting the image areas that overlap between cameras. Accurate detection of common objects is achieved by area-based detection that combines MSER with SIFT. Adaptive color calibration using the colors of the detected objects is performed by a nonlinear regression method. This method can indicate the contribution of an object's color to the calibration, and this function is used to automatically notify the user of the selection. Experimental results show that the accuracy of the calibration improves gradually. This method is suitable for practical multi-camera color calibration if enough samples are obtained.
Estimation error in image quality measurements
The development and adoption of standard image quality measurement and analysis methods have helped in the evaluation of both competing imaging products and technologies. Inherent in the interpretation of results from any particular evaluation, however, are the variation of the method itself and of the sampling of test images, equipment, and test conditions. Here we take a statistical approach to measurement variation and interpret the objective as being the estimation of particular system or image properties based on data collected as part of standard testing. Measurement variation was investigated for two signal-transfer methods commonly used for digital camera and scanner evaluation: the ISO 12233 slanted-edge spatial frequency response, and the dead-leaves method for texture-MTF evaluation being developed by the Camera Phone Image Quality (CPIQ) Initiative. In each case, the variation due to the selection of analysis regions was computed by repeated analysis. The slanted-edge methods indicated a relative error in the range of 1-3% depending on the nature of the region selection. For the dead-leaves method, the amplitude spectrum (square root of the noise-power spectrum) showed a relative error of approximately 4-6%; however, this can be reduced by applying spectral estimation methods commonly used in image noise analysis.
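The statistical quantity reported above can be sketched as the relative error of repeated measurements, for example SFR values at a chosen frequency computed from repeatedly selected analysis regions; this is a generic illustration, not the exact estimator used in the paper.

# Relative error (sample standard deviation over mean) across repeated regions.
import numpy as np

def relative_error(measurements):
    """measurements: per-region values of, e.g., SFR at a chosen frequency."""
    m = np.asarray(measurements, dtype=float)
    return m.std(ddof=1) / m.mean()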
System Performance: Video
LCD displays performance comparison by MTF measurement using the white noise stimulus method
The number of images produced to be viewed as soft copies on output displays is increasing significantly. This growth occurs at the expense of images targeted for hard-copy versions on paper or other physical supports. Even in the case of high-quality hard-copy production, people working in professional imaging use different displays for selecting, editing, processing, and showing images, from laptop screens to specialized high-end displays. The quality performance of these devices is therefore crucial in the chain of decisions taken in image production. Metrics of this quality performance can help in equipment acquisition. Different metrics and methods have been described to determine the quality performance of CRT and LCD computer displays in the clinical area. One of the most important metrics in this field is the device's spatial frequency response, obtained by measuring the modulation transfer function (MTF). This work presents a comparison between the MTFs of three different LCD displays, the Apple MacBook Pro 15", the Apple LED Cinema Display 24", and the Apple iPhone 4, measured by the white noise stimulus method in the vertical and horizontal directions. Additionally, different displays show particular pixel structure patterns. In order to identify this pixel structure, a set of high-magnification images is taken from each display and related to the respective vertical and horizontal MTFs.
Improving the quality of H.264/AVC by using a new rate-quantization model
Rate control plays a key role in video coding standards. Its goal is to achieve good quality at a given target bit-rate. In H.264/AVC, the rate control algorithm for both Intra- and Inter-frames suffers from some defects. In Intra-frame rate control, the initial quantization parameter (QP) is mainly adjusted according to a global target bit-rate and the length of the GOP. This determination is inappropriate and generates errors in the whole video sequence. For an Inter coding unit (frame or macroblock), the use of the MAD (Mean Absolute Difference) as a complexity measure remains inefficient, resulting in improper QP values because the MAD captures image characteristics only locally. QP miscalculations may also result from the linear prediction model, which assumes similar complexity from one coding unit to another. To overcome these defects, we propose in this paper a new Rate-Quantization (R-Q) model resulting from extensive experiments. It is divided into two models. The first one is an Intra R-Q model used to determine an optimal initial quantization parameter for Intra-frames. The second one is an Inter R-Q model that aims at determining the QP of an Inter coding unit according to the statistics of the previously coded ones. It does not use any complexity measure and replaces both the linear and quadratic models used in the H.264/AVC rate controller. Objective and subjective simulations have been carried out using the JM15.0 reference software. Compared to the latter, the global R-Q model (Intra and Inter models combined) improves the coding efficiency in terms of PSNR, objectively (up to +2.01 dB), subjectively (by psychophysical experiments), and in terms of computational complexity.
A novel method for no-reference video quality assessment using block modes and quantization parameters of H.264/AVC
Inkyung Park, Taeyoung Na, Munchurl Kim
Video quality assessment is an important tool for guaranteeing video services at a required level of quality. Although subjective quality assessment is more reliable than objective quality assessment because it reflects the Human Visual System (HVS), it is a time-consuming and very expensive approach and is not appropriate for real-time applications. Therefore, much research has been devoted to objective video quality assessment instead of subjective video quality assessment. Among the three kinds of objective assessment approaches (full-reference, reduced-reference, and no-reference methods), the no-reference method has drawn much attention because it does not require any reference. The encoding parameters are good features to use for a no-reference model because the encoded bitstreams carry plenty of information about the video contents and it is easy to extract coding parameters to assess visual quality. In this paper, we propose a no-reference quality metric using two kinds of coding parameters in H.264/AVC: quantization and block mode parameters. These parameters are extracted and computed from H.264/AVC bitstreams, without relying on pixel domain processing. We design a linear quality metric composed of these two parameters. The weight values of the parameters are estimated using linear regression with the results of a subjective quality assessment obtained using the DSIS (Double Stimulus Impairment Scale) method of ITU-R BT.500-11.
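A minimal sketch of fitting such a linear model, assuming per-sequence average QP values, a scalar block-mode feature, and DSIS scores are available as arrays; the actual parameter definitions and regression setup in the paper may differ.

# Fit Q = w0 + w1*QP + w2*block_mode by least squares (illustrative).
import numpy as np

def fit_linear_quality_model(qp_values, block_mode_values, subjective_scores):
    qp = np.asarray(qp_values, dtype=float)
    bm = np.asarray(block_mode_values, dtype=float)
    X = np.column_stack([np.ones_like(qp), qp, bm])
    weights, *_ = np.linalg.lstsq(X, np.asarray(subjective_scores, float), rcond=None)
    return weights                                  # w0, w1, w2

def predict_quality(weights, qp, block_mode):
    return weights[0] + weights[1] * qp + weights[2] * block_mode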
Prioritization of application layer FEC information for IP television services QoS
E. Mammi, G. Russo, P. Talone
In the digital television world, an important transformation is represented by television-over-IP services. One of the key factors enabling the spread of television over IP is quality. Packet loss is probably the main source of service degradation for these services. The proposed approach combines the use of AL-FEC with the set-up of a transport quality mechanism based on FEC packet prioritization. AL-FEC packets are assigned a transfer priority higher than that of the media packets transferred under the best-effort paradigm, thus reducing the number of FEC packet losses in congested routers. In this way the error correction capability is improved. Furthermore, as the FEC stream is usually a percentage of the media stream, the choice of applying prioritization to the FEC stream and not to the whole media stream reduces the impact of prioritizing television service traffic on other types of traffic concurrent on the same link. The tests have been performed on a simulated network and on a real IP test-bed. The results show the effectiveness of the proposed approach with respect to the un-prioritized one, allowing higher video quality to be obtained at the same packet loss rate.
Image Quality Issues in Digital Photography: Joint Session with Conference 7876
Reference image method for measuring quality of photographs produced by digital cameras
Mikko Nuutinen, Olli Orenius, Timo Säämänen, et al.
Objective image quality metrics can be based on test targets or algorithms. Traditionally, the image quality of digital cameras has been measured using test targets. Test-target measurements are tedious and require a controlled laboratory environment. Algorithm metrics can be divided into three groups: full-reference (FR), reduced-reference (RR), and no-reference (NR). FR metrics cannot be applied to the computation of image quality captured by digital cameras because pixel-wise reference images are missing. NR metrics are applicable only when the distortion type is known and the distortion space is low-dimensional. RR metrics provide a trade-off between NR and FR metrics. An RR metric does not require a pixel-wise reference image; it only requires a set of extracted features. With the aid of RR features, it is possible to avoid problems related to NR metrics. In this study, we evaluate the applicability of RR metrics to measuring the image quality of natural images captured by digital cameras. We propose a method in which reference images are captured using a reference camera. The reference images represented natural reproductions of the views under study. We tested our method using three RR metrics proposed in the literature. The results suggest that the proposed method is promising for measuring the quality of natural images captured by digital cameras for the purpose of camera benchmarking.
RAW camera DPCM compression performance analysis
Katherine Bouman, Vikas Ramachandra, Kalin Atanassov, et al.
The MIPI standard has adopted DPCM compression for RAW data images streamed from mobile cameras. This DPCM is line-based and uses a simple one- or two-pixel predictor. In this paper, we analyze the DPCM compression performance in terms of MTF degradation. To test this scheme's performance, we generated Siemens star images and binarized them to two-level images. The two intensity values were chosen such that their intensity difference corresponds to the pixel differences that result in the largest relative errors in the DPCM compressor (e.g., a pixel transition from 0 to 4095 corresponds to an error of 6 between the DPCM-compressed value and the original pixel value). The DPCM scheme introduces different amounts of error based on the pixel difference. We passed these modified Siemens star chart images to the compressor and compared the compressed images with the original images using IT3 MTF response plots for slanted edges. Further, we discuss the influence of the PSF on the DPCM error and its propagation through the image processing pipeline.
High Dynamic Range Imaging: Joint Session with Conference 7876
Brightness, lightness, and specifying color in high-dynamic-range scenes and images
Mark D. Fairchild, Ping-Hsu Chen
Traditional color spaces have been widely used in a variety of applications including digital color imaging, color image quality, and color management. These spaces, however, were designed for the domain of color stimuli typically encountered with reflecting objects and image displays of such objects. This means the domain of stimuli with luminance levels from slightly above zero to that of a perfect diffuse white (or display white point). This limits the applicability of such spaces to color problems in HDR imaging. This is caused by their hard intercepts at zero luminance/lightness and by their uncertain applicability for colors brighter than diffuse white. To address HDR applications, two new color spaces were recently proposed, hdr-CIELAB and hdr-IPT. They are based on replacing the power-function nonlinearities in CIELAB and IPT with more physiologically plausible hyperbolic functions optimized to most closely simulate the original color spaces in the diffuse reflecting color domain. This paper presents the formulation of the new models, evaluations using Munsell data in comparison with CIELAB, IPT, and CIECAM02, two sets of lightness-scaling data above diffuse white, and various possible formulations of hdr-CIELAB and hdr-IPT to predict the visual results.
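A hedged sketch of the kind of hyperbolic (Michaelis-Menten) lightness function the abstract refers to, which saturates gracefully above diffuse white instead of clipping; the exponent, semi-saturation constant, and scaling are illustrative values, not the optimized hdr-CIELAB or hdr-IPT parameters.

# Generic hyperbolic lightness function (illustrative, not the published model).
def hyperbolic_lightness(Y, Y_white, exponent=0.58, semi_sat=0.18, scale=100.0):
    y = Y / Y_white                       # relative luminance, may exceed 1 in HDR
    return scale * (y ** exponent) / (y ** exponent + semi_sat ** exponent)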
Evaluating HDR photos using Web 2.0 technology
Guoping Qiu, Yujie Mei, Jiang Duan
High dynamic range (HDR) photography is an emerging technology that has the potential to dramatically enhance the visual quality and realism of digital photos. One of the key technical challenges of HDR photography is displaying HDR photos on conventional devices through tone mapping or dynamic range compression. Although many different tone mapping techniques have been developed in recent years, evaluating tone mapping operators has proven to be extremely difficult. Web 2.0, social media, and crowdsourcing are emerging Internet technologies which can be harnessed to harvest the brainpower of the masses to solve difficult problems in science, engineering, and business. Paired comparison is used in the scientific study of preferences and attitudes and has been shown to be capable of obtaining an interval-scale ordering of items along a psychometric dimension such as preference or importance. In this paper, we exploit these technologies for evaluating HDR tone mapping algorithms. We have developed a Web 2.0 style system that enables Internet users from anywhere to evaluate tone-mapped HDR photos at any time. We adopt a simple paired comparison protocol: Internet users are presented with a pair of tone-mapped images and are simply asked to select the one that they think is better or to click a "no difference" button. These user inputs are collected on the web server and analyzed by a rank aggregation algorithm which ranks the tone-mapped photos according to the votes they received. We present experimental results which demonstrate that the emerging Internet technologies can be exploited as a new paradigm for evaluating HDR tone mapping algorithms. The advantages of this approach include the potential of collecting large numbers of user inputs under a variety of viewing environments, rather than limited user participation under controlled laboratory environments, thus enabling more robust and reliable quality assessment. We also present data analysis to correlate user-generated qualitative indices with quantitative image statistics, which may provide useful guidance for developing better tone mapping operators.
Perceptual Image Quality Experimentation
Just noticeable difference vs. visual difference: hypotheses and how to verify their validity
The issue of accurate color reproduction is a hot topic, closely linked to the problem of accurate measurement of the human visual threshold, or "Just Noticeable Difference" (JND). Since most imaging scientists believe that JND experiments are too complicated and costly, "Visual Difference" (dV) experiments have gained high popularity. Typically the results of dV experiments are extended in place of JND, and many scientists use dV interchangeably with JND. For example, the current standard color difference formula, CIEDE2000, was constructed on a dataset from dV experiments. However, in order for the dV-to-JND transition to be correct, several assumptions must be made, which haven't actually been proven. This paper proposes a relatively inexpensive experiment that will allow precise JND measurement, which in turn will allow the assumptions behind the dV-to-JND transition to be verified and perhaps offer a better dataset for the development of a more robust and solid color difference formula.
Device-dependent scene-dependent quality predictions using effective pictorial information capacity
Kyung Hoon Oh, Sophie Triantaphillidou, Ralph E. Jacobson
This study aims to introduce improvements in the predictions of device-dependent image quality metrics (IQMs). A validation experiment was first carried out to test the success of such a metric, the Effective Pictorial Information Capacity (EPIC), using results from subjective tests involving 32 test scenes replicated with various degrees of sharpness and noisiness. The metric was found to be a good predictor when tested against average ratings but, as expected for device-dependent metrics, it predicted less successfully the perceived quality of individual, non-standard scenes with atypical spatial and structural content. Improvement in predictions was attempted by using a modular image quality framework and its implementation with the EPIC metric. This involves modeling a complicated set of conditions, including classifying scenes into a small number of groups. The scene classification employed for this purpose uses objective scene descriptors which correlate with subjective criteria on scene susceptibility to sharpness and noisiness. The implementation thus allows automatic grouping of scenes and calculation of the metric values. Results indicate that model predictions were improved. Most importantly, they were shown to correlate equally well with subjective quality scales of standard and non-standard scenes. The findings indicate that a device-dependent, scene-dependent image quality model can be achieved.
Discussion on Web-based IQ Testing
Social image quality
Guoping Qiu, Ahmed Kheiri
Current subjective image quality assessments have been developed in laboratory environments, under controlled conditions, and are dependent on the participation of limited numbers of observers. In this research, with the help of Web 2.0 and social media technology, a new method for building a subjective image quality metric has been developed in which the observers are Internet users. A website with a simple user interface that enables Internet users from anywhere, at any time, to vote for the better-quality version of a pair of versions of the same image has been constructed. Users' votes are recorded and used to rank the images according to their perceived visual qualities. We have developed three rank aggregation algorithms to process the recorded pair comparison data: the first uses a naive approach, the second employs a Condorcet method, and the third uses Dykstra's extension of the Bradley-Terry method. The website has been collecting data for about three months and had accumulated over 10,000 votes at the time of writing this paper. Results show that the Internet and its allied technologies, such as crowdsourcing, offer a promising new paradigm for image and video quality assessment, where hundreds of thousands of Internet users can contribute to building more robust image quality metrics. We have made the Internet-user-generated social image quality (SIQ) data of a public image database available online (http://www.hdri.cs.nott.ac.uk/siq/) to provide the image quality research community with a new source of ground truth data. The website continues to collect votes and will include more public image databases; it will also be extended to include videos to collect social video quality (SVQ) data. All data will be publicly available on the website in due course.
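As an illustration of the rank aggregation step, the sketch below fits a basic Bradley-Terry model to a pairwise win matrix using minorization-maximization iterations. It assumes "no difference" votes are discarded and is a generic implementation, not the paper's naive, Condorcet, or Dykstra-extension aggregators.

# Generic Bradley-Terry fit from a pairwise win matrix (illustrative).
import numpy as np

def bradley_terry(wins, n_iter=100):
    """wins[i, j] = number of votes preferring image i over image j (no ties)."""
    n = wins.shape[0]
    comparisons = wins + wins.T            # total comparisons between each pair
    totals = wins.sum(axis=1)              # total wins per image
    p = np.ones(n)
    for _ in range(n_iter):
        denom = np.array([
            sum(comparisons[i, j] / (p[i] + p[j]) for j in range(n) if j != i)
            for i in range(n)
        ])
        p = (totals + 1e-9) / np.maximum(denom, 1e-9)
        p /= p.sum()
    return p                               # higher value = more preferred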
System Performance: Security
Utility studies for security encoded office documents: experimental design challenges
Chris A. Deller, Geoff J. Woolfe
We have developed a methodology to study the usability of documents. This methodology has been applied to studying the impact of visible security patterns on the usability of typical office documents. Two specific information retrieval tasks were examined: the retrieval of text-based information from a written report and the retrieval of numerical information from tables and graphs. The methodology we have developed aims to minimize sources of uncontrolled variability in the measurements while simultaneously avoiding systematic bias from learning effects and maintaining task equivalence across all documents. We believe the methodology developed in this work may prove useful in future studies of document usability.
Printed fingerprints: a framework and first results towards detection of artificially printed latent fingerprints for forensics
Stefan Kiltz, Mario Hildebrandt, Jana Dittmann, et al.
In Schwarz [1], an amino acid model for printing latent fingerprints onto porous surfaces is introduced, motivated by the need for reproducibility tests of different development techniques for forensic investigations. However, this technique also enables the fabrication of artificial traces, constituting a possible threat to security and motivating the need for research into appropriate detection techniques. In this paper, a new framework for modelling the properties of a generic fingerprint examination process is introduced. Based on the framework, examination properties and detection properties are derived by a subjective evaluation. We suggest a first formalisation of exemplary properties, which can be easily extended to fit different needs. We present a first experimental setup, limited to two printers and the Schwarz amino acid model, using absorbing and non-absorbing materials, with first results that show tendencies and underline the necessity for further research.
Monitoring image quality for security applications
This work focuses on the definition of a procedure for the qualification of coding schemes for video surveillance applications. It consists of developing and benchmarking tools that learn from the expertise of police and security departments. This expertise is modeled through a campaign of subjective measurements that analyzes the way experts perform security tasks such as face or license plate recognition, event detection, and so on. The results of this test will be used to tune and construct a hybrid metric based on the detection of basic artifacts due to compression and transmission.
Video quality and interpretability study using SAMVIQ and Video-NIIRS
Darrell L. Young, Jeff Ruszczyk, Tariq Bakir
The effect of various video encoders and compression settings is examined using a subjective task-based performance metric, the Video National Imagery Interpretability Rating Scale (Video-NIIRS), and a perceptual quality metric, the Subjective Assessment Methodology of Video Image Quality (SAMVIQ). Subjective results are compared to objective measurements.
Visual Attention, Saliency, and Quality I: Joint Session with Conference 7865
Weighted-MSE based on saliency map for assessing video quality of H.264 video streams
H. Boujut, J. Benois-Pineau, O. Hadar, et al.
The human visual system is very complex and has been studied for many years, specifically for the purpose of efficient encoding of visual content, e.g., video content from digital TV. There is physiological and psychological evidence indicating that viewers do not pay equal attention to all exposed visual information, but focus only on certain areas known as the focus of attention (FOA) or saliency regions. In this work, we propose a novel saliency-based objective quality assessment metric for assessing the perceptual quality of decoded video sequences affected by transmission errors and packet losses. The proposed method weights the Mean Square Error (MSE), yielding a Weighted MSE (WMSE), according to the saliency map calculated at each pixel. Our method was validated through subjective quality experiments.
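A minimal sketch of a saliency-weighted MSE of the general form described, assuming a per-pixel saliency map with values in [0, 1]; the saliency model itself and any normalization used in the paper are not reproduced.

# Saliency-weighted MSE (illustrative form).
import numpy as np

def weighted_mse(reference, decoded, saliency):
    """reference, decoded, saliency: arrays of identical shape; saliency in [0, 1]."""
    err = (reference.astype(float) - decoded.astype(float)) ** 2
    return (saliency * err).sum() / np.maximum(saliency.sum(), 1e-12)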
Visual Attention, Saliency, and Quality II: Joint Session with Conference 7865
Using performance efficiency for testing and optimization of visual attention models
Brian J. Stankiewicz, Nathan J. Anderson, Richard J. Moore
When developing a predictive tool for human performance, one needs clear metrics to evaluate the model's performance. In the area of Visual Attention Modeling (VAM), one typically compares eye-tracking data collected from a group of human observers to the predictions made by a model. To evaluate the performance of these models, one typically uses signal detection analysis (the Receiver Operating Characteristic, ROC), which measures the predictive power of the system by comparing the model's predictions for an image to human eye-tracking data. These ROC curves take into account the model's hit and false alarm rates and, by averaging over a set of test images, provide a final measure of the system's performance. In releasing a commercial visual attention system, we have spent considerable effort in developing metrics that allow for regression testing and that are useful for optimizing our visual attention model, taking into account the Upper Theoretical Performance Limit for an image or class of images. We describe how the Upper Theoretical Performance Limit is calculated and how regression testing and parameter optimization benefit from this approach.
Naturalness and interestingness of test images for visual quality evaluation
Raisa Halonen, Stina Westman, Pirkko Oittinen
Balanced and representative test images are needed to study perceived visual quality in various application domains. This study investigates naturalness and interestingness as image quality attributes in the context of test images. Taking a top-down approach we aim to find the dimensions which constitute naturalness and interestingness in test images and the relationship between these high-level quality attributes. We compare existing collections of test images (e.g. Sony sRGB images, ISO 12640 images, Kodak images, Nokia images and test images developed within our group) in an experiment combining quality sorting and structured interviews. Based on the data gathered we analyze the viewer-supplied criteria for naturalness and interestingness across image types, quality levels and judges. This study advances our understanding of subjective image quality criteria and enables the validation of current test images, furthering their development.
Interactive Paper Session
Potential of face area data for predicting sharpness of natural images
Mikko Nuutinen, Olli Orenius, Timo Säämänen, et al.
Face detection techniques are used for many different applications. For example, face detection is a basic component in many consumer still and video cameras. In this study, we compare the performance of face area data and freely selected local area data for predicting the sharpness of photographs. The local values were collected systematically from images, and for the analyses we selected only the values with the highest performance. The objective sharpness metric was based on the statistics of the wavelet coefficients for the selected areas. We used three image contents whose subjective sharpness values had been measured. The image contents were captured by 13 cameras, and the images were evaluated by 25 subjects. The quality of the cameras ranged from low-end mobile phone cameras to low-end compact cameras. The image contents simulated typical photos that consumers take with their mobile phones. The face area sizes on the images were approximately 0.4, 1.0 or 4.0 %. Based on the results, the face area data proved to be valuable for measuring the sharpness of the photographs if the face size was large enough. When the face area size was 1.0 or 4.0 %, the performance of the measured sharpness values was equal to or better than the sharpness values measured from the best local areas. When the face area was too small (0.4 %), the performance was low compared with the best local areas.
A video quality assessment model based on the MPEG-7 descriptor
Masaharu Sato, Yuukou Horita
Our research is focused on examining a video quality assessment model based on the MPEG-7 descriptor. This model consists of two parts: "Frame Quality estimation processing" and "Video Quality estimation processing". The Video Quality estimation in the proposed model uses five values (average value, worst value, best value, standard deviation, and frame rate) derived from the estimated Frame Quality and the input video sequence. Two coding methods (WMV9 and H.264) are used to verify the proposed model's prediction accuracy. As a result, the Video Quality estimation has high prediction accuracy (correlation: 0.94, average error: 0.20, maximum error: 0.68, and outlier ratio: 0.23).
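A hedged sketch of the pooling structure described, combining the five named features linearly; the coefficients are placeholders, since the fitted mapping is not given in the abstract.

# Pool per-frame quality estimates into the five named features and combine
# them linearly (illustrative coefficients only).
import numpy as np

def video_quality(frame_scores, frame_rate,
                  coeffs=(0.6, 0.2, 0.1, -0.3, 0.02), bias=0.0):
    s = np.asarray(frame_scores, dtype=float)
    features = np.array([s.mean(), s.min(), s.max(), s.std(), frame_rate])
    return bias + float(np.dot(coeffs, features))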
Image quality: a tool for no-reference assessment methods
Silvia Corchs, Francesca Gasparini, Fabrizio Marini, et al.
In this work we propose an image quality assessment tool. The tool is composed of different modules that implement several No-Reference (NR) metrics (i.e., metrics for which the original or ideal image is not available). Different types of image quality attributes can be taken into account by the NR methods, such as blurriness, graininess, blockiness, lack of contrast, and lack of saturation or colorfulness, among others. Our tool aims to give a structured view of a collection of objective metrics that are available for the different distortions within an integrated framework. As each metric corresponds to a single module, our tool can be easily extended to include new metrics or to replace some of them. The software permits the metrics to be applied not only globally but also locally to different regions of interest in the image.
Extending video quality metrics to the temporal dimension with 2D-PCR
Christian Keimel, Martin Rothbucher, Klaus Diepold
The aim of any video quality metric is to deliver a quality prediction similar to the video quality perceived by human observers. One way to design such a model of human perception is by data analysis. In this contribution we intend to extend this approach to the temporal dimension. Even though video obviously consists of spatial and temporal dimensions, the temporal aspect is often not considered well enough. Instead of including this third dimension in the model itself, metrics are usually only applied on a frame-by-frame basis and then temporally pooled, commonly by averaging. We therefore propose to skip the temporal pooling step and use the additional temporal dimension in the model-building step of the video quality metric. We propose to use the two-dimensional extension of the PCR, the 2D-PCR, in order to obtain an improved model. We conducted extensive subjective tests with different HDTV video sequences at 1920×1080 and 25 frames per second. For verification, we performed a cross-validation to get a measure of the real-life performance of the acquired model. Finally, we show that the direct inclusion of the temporal dimension of video in the model building significantly improves the overall prediction accuracy of the visual quality.
ImQual: a web-service dedicated to image quality evaluation and metrics benchmark
Quality assessment is becoming an important issue in the framework of image and video processing. Images are generally intended to be viewed by human observers, and thus consideration of visual perception is an intrinsic aspect of the effective assessment of image quality. This observation has been made for different application domains such as printing, compression, transmission, and so on. Recently, hundreds of research papers have proposed objective quality metrics dedicated to several image and video applications. With this abundance of quality tools, it is more important than ever to have a set of rules/methods allowing the efficiency of a given metric to be assessed. In this direction, technical groups such as VQEG (Video Quality Experts Group) or JPEG AIC (Advanced Image Coding) have focused their interest on the definition of test plans to measure the impact of a metric. Following this wave in the image and video community, we propose in this paper a web-service or web-application dedicated to the benchmarking of quality metrics for image compression, open to all possible extensions. This application is intended to be the reference tool for the JPEG committee in order to ease the evaluation of new compression technologies. It is also seen as a global aid for our community, saving researchers time when evaluating their algorithms for watermarking, compression, enhancement, and so on. As an illustration of the web-application, we propose a benchmark of many well-known metrics on several image databases to provide a small overview of its possible use.
Optimal front light design for reflective displays under different ambient illumination
Sheng-Po Wang, Ting-Ting Chang, Chien-Ju Li, et al.
The goal of this study is to find the optimal luminance and color temperature of front lights for reflective displays under different ambient illumination by conducting a series of psychophysical experiments. A color- and brightness-tunable front light device with ten LED units was built and calibrated to present 256 luminance levels and 13 different color temperatures at a fixed luminance of 200 cd/m2. The experimental results revealed the best luminance and color temperature settings for human observers under different ambient illuminants, which could also assist e-paper manufacturers in designing front light devices and presenting the best image quality on reflective displays. Furthermore, a similar experimental procedure was conducted using a new flexible e-signage display developed by ITRI, and an optimal front light device for the new display panel has been designed and utilized.
Comparison of HDTV formats in a consumer environment
Christian Keimel, Arne Redl, Klaus Diepold
High definition television (HDTV) has become quite common in many homes. Still, there are two different formats used currently in commercial broadcasting: one interlaced format, 1080i50/60, and one progressive format, 720p50/60.There have already been quite a few contributions comparing the visual quality of these formats subjectively under common standard conditions. These conditions, however, dont necessarily represent the viewing conditions in the real-life consumer environment. In this contribution we therefore decided to do a comparison under conditions more representative of the consumer environment with respect to display and viewing conditions. Furthermore, we decided to select not specially prepared test sequences, but real-life content and coding conditions. As we were not interested in the influence of the transmission errors, we captured the sequences directly in the play-out centre of a cable network provider in both 1080i50 and 720p50. Also we captured for comparison the same content in digital PAL-SDTV. We conducted extensive subjective tests with overall 25 test subjects and a modified SSIS method. The results show that both HDTV formats outperform SDTV significantly. Although 720p50 is perceived to have a better quality than 1080i50, this difference is not significant in a statistical sense. This supports the validity of previous contributions results, gained in standard conditions, also for the real-life consumer environment.