Automatic quality prediction of authentically distorted pictures
Social media and rapid advances in camera and mobile device technology have led to the creation and consumption of a seemingly limitless supply of visual content. However, the vast majority of these digital images are captured by casual amateur photographers whose unsure hands and eyes often introduce annoying artifacts during acquisition. In addition, subsequent storage and transmission of visual media can further degrade their visual quality.
Recent developments in visual modeling have elucidated the impact of visual distortions on perception of such pictures and videos. They have laid the foundation for automatic and accurate metrics that can identify and predict the quality of visual media as perceived by human observers.1 To address this problem, several objective blind or no-reference (NR) image quality assessment (IQA) algorithms have been developed to predict the perceptual quality of a given (possibly distorted) image without additional information.2–7 Such quality metrics could be used to monitor and control multimedia services on networks and devices or to prioritize quality of transmission over speed, for example.
Real-world images are usually afflicted by mixtures of distortions that differ significantly from the single, unmixed distortions contained in restrictive and unrepresentative legacy databases.9–12 We recently designed a unique and challenging image data set with associated human opinion scores called the Laboratory for Image and Video Engineering (LIVE) authentic image quality challenge database8 (see Figure 1). Using this LIVE challenge database, we have been developing a robust blind IQA model for images suffering from real-world, authentic distortions. We call our model the ‘feature maps driven referenceless image quality evaluation engine’ (FRIQUEE) index. FRIQUEE outperforms other state-of-the-art blind IQA algorithms on both the LIVE legacy IQA9 and the LIVE challenge database8 (see Table 1).
Table 1. Pearson linear correlation coefficient (PLCC) and Spearman rank-order correlation coefficient (SROCC) between predicted and subjective quality scores for several blind IQA algorithms.

| Algorithm | PLCC | SROCC |
|---|---|---|
| FRIQUEE | 0.7062 | 0.6824 |
| BRISQUE4 | 0.6104 | 0.6018 |
| DIIVINE3 | 0.5577 | 0.5094 |
| BLIINDS-II2 | 0.4496 | 0.4049 |
| NIQE5 | 0.4776 | 0.4210 |
| S3 index18 | 0.3243 | 0.3054 |
| C-DIIVINE7 | 0.6632 | 0.6350 |
We have empirically observed that the presence of different mixtures of authentic distortions affects an image's scene statistics1 differently than when those distortions occur in isolation.13 FRIQUEE follows a feature-maps-driven statistical approach, avoiding any latent assumption about the type of distortion(s) contained in an image and focusing instead on the remarkable regularities of the scene statistics of real-world images in the absence of distortions. From these feature maps we capture a large and comprehensive collection of perceptually relevant, reliable, 'quality-sensitive' statistical image features that provide greater discriminative power on ensembles of authentic image distortions than the most successful NR IQA models. We also leverage the capability of a powerful deep belief network14 (DBN) to learn a model that maps the image features to complex feature representations, which in turn predict the subjective quality scores more accurately than the state-of-the-art models.
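To make this learning stage concrete, the following Python sketch (not the authors' implementation) uses stacked restricted Boltzmann machines from scikit-learn as a rough stand-in for unsupervised DBN pre-training, followed by a support vector regressor. The feature dimensionality, layer sizes, hyperparameters, and data below are illustrative placeholders.

```python
# Rough stand-in for the FRIQUEE learning stage (illustrative only): stacked
# RBMs approximate unsupervised DBN pre-training, and an SVR maps the learned
# representation to quality scores. All sizes and data below are placeholders.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.neural_network import BernoulliRBM
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.random((200, 560))         # 200 images x 560 handcrafted features (placeholder)
mos = rng.uniform(0, 100, 200)     # placeholder mean opinion scores

model = make_pipeline(
    MinMaxScaler(),                                   # RBMs expect inputs in [0, 1]
    BernoulliRBM(n_components=256, n_iter=20, learning_rate=0.05, random_state=0),
    BernoulliRBM(n_components=128, n_iter=20, learning_rate=0.05, random_state=0),
    SVR(kernel='rbf', C=1.0),
)
model.fit(X, mos)
predicted_quality = model.predict(X[:5])
```

In FRIQUEE itself, the inputs are the handcrafted statistical features described next rather than random placeholders.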
We process every image by first transforming it into the RGB, LMS, and CIE LAB color spaces, then deriving a set of feature maps by applying operations (including steerable pyramid decomposition in the complex domain,7 difference of Gaussians,16 and Laplacian decomposition17) to the luminance component and to the four chroma components, i.e., A and B from the LAB space and M and S from the LMS color space (see Figure 2). The definition of each feature map is driven by established perceptual models of the transformations that occur at various stages of visual processing.
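The following Python sketch (using NumPy, SciPy, and scikit-image) illustrates how a few such feature maps might be derived; the filter scales and the XYZ-to-LMS conversion matrix (Hunt-Pointer-Estevez) are assumptions made for illustration, and the complex steerable pyramid decomposition is omitted.

```python
# Illustrative derivation of a few feature maps (not the full FRIQUEE set).
import numpy as np
from scipy.ndimage import gaussian_filter, laplace
from skimage import color

def feature_maps(rgb):
    """rgb: floating-point image in [0, 1] with shape (H, W, 3)."""
    lab = color.rgb2lab(rgb)               # CIE LAB: L (luminance), A and B (chroma)
    xyz = color.rgb2xyz(rgb)
    # Assumed Hunt-Pointer-Estevez matrix mapping XYZ to LMS cone responses.
    hpe = np.array([[ 0.38971, 0.68898, -0.07868],
                    [-0.22981, 1.18340,  0.04641],
                    [ 0.00000, 0.00000,  1.00000]])
    lms = xyz @ hpe.T

    maps = {}
    luma = lab[..., 0]
    # Band-pass luminance map: difference of Gaussians at two illustrative scales.
    maps['luma_dog'] = gaussian_filter(luma, 1.0) - gaussian_filter(luma, 2.0)
    # Second-derivative (Laplacian) detail map of the smoothed luminance.
    maps['luma_laplacian'] = laplace(gaussian_filter(luma, 1.0))
    # The same operators can be applied to the chroma components A, B, M, and S.
    for name, chroma in (('A', lab[..., 1]), ('B', lab[..., 2]),
                         ('M', lms[..., 1]), ('S', lms[..., 2])):
        maps[name + '_dog'] = gaussian_filter(chroma, 1.0) - gaussian_filter(chroma, 2.0)
    return maps
```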
We then perform perceptually significant debiasing and divisive normalization operations15 on each feature map and model the statistical regularities and irregularities exhibited by their histograms using a generalized Gaussian distribution (in real or complex domains)19 or an asymmetric generalized Gaussian distribution.20 We compute model parameters, such as the shape and variance of these fits, and sample statistics, such as kurtosis and skewness, and use them as image features. A DBN, when trained with these image features as input, generates deep feature representations. These deep features are then used to train a support vector regressor (SVR) that predicts the quality of a test image from its deep features (see Figure 3).
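A minimal sketch of this feature-extraction step is shown below, assuming the local mean-subtraction and divisive-normalization operation and the moment-matching generalized Gaussian fit that are standard in this family of models; the asymmetric and complex-domain fits, the DBN, and the SVR are omitted here.

```python
# Sketch of per-map statistical features: divisive normalization, a moment-
# matching generalized Gaussian fit, and simple sample statistics.
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.special import gamma as gamma_fn
from scipy.stats import kurtosis, skew

def divisive_normalize(fmap, sigma=7.0 / 6.0, eps=1e-3):
    """Subtract a local Gaussian-weighted mean and divide by the local deviation."""
    mu = gaussian_filter(fmap, sigma)
    var = gaussian_filter(fmap * fmap, sigma) - mu * mu
    return (fmap - mu) / (np.sqrt(np.abs(var)) + eps)

def fit_ggd(x):
    """Moment-matching estimate of the shape and variance of a zero-mean
    generalized Gaussian distribution."""
    x = x.ravel()
    rho = np.mean(x ** 2) / (np.mean(np.abs(x)) ** 2 + 1e-12)
    shapes = np.arange(0.2, 10.0, 0.001)
    ratios = gamma_fn(1.0 / shapes) * gamma_fn(3.0 / shapes) / gamma_fn(2.0 / shapes) ** 2
    shape = shapes[np.argmin(np.abs(ratios - rho))]
    return shape, np.var(x)

def map_features(fmap):
    """'Quality-sensitive' features extracted from one feature map."""
    n = divisive_normalize(fmap)
    shape, variance = fit_ggd(n)
    return [shape, variance, skew(n.ravel()), kurtosis(n.ravel())]
```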
We evaluate the performance of several NR IQA algorithms by computing the correlation between the scores predicted by each algorithm and the ground-truth subjective scores (see Table 1, which shows that our proposed model, combining robust features and a DBN, outperforms several other IQA models on unseen test data).
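For instance, the two correlation measures reported in Table 1 can be computed directly with SciPy; the scores in the sketch below are placeholders rather than actual model outputs.

```python
# Pearson (PLCC) and Spearman (SROCC) correlations between predicted and
# ground-truth subjective scores; the arrays here are illustrative placeholders.
import numpy as np
from scipy.stats import pearsonr, spearmanr

predicted = np.array([62.1, 45.3, 78.9, 30.2, 55.0])   # model predictions
mos       = np.array([60.0, 50.2, 75.5, 28.7, 58.3])   # mean opinion scores

plcc, _ = pearsonr(predicted, mos)
srocc, _ = spearmanr(predicted, mos)
print('PLCC = %.4f, SROCC = %.4f' % (plcc, srocc))
```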
By identifying and addressing the challenges of perceptual quality prediction for images containing mixtures of authentic distortions, we have shown with FRIQUEE that novel techniques can extend the quality-prediction power of state-of-the-art IQA models. To further improve the prediction power of blind IQA models on real-world distortions, we plan to incorporate high-level cognitive factors such as semantic content, attention, and aesthetic quality, and to use machine learning to accommodate the skewed real-world distribution of distortions.21–23
Deepti Ghadiyaram is a PhD candidate interested in the intersection of image/video processing and machine learning.
Alan Bovik is the Cockrell Family Regents Endowed Chair Professor and director of LIVE.