Medical images are routinely used to assist diagnosis, and while image-based tools are becoming more common, physicians conduct the majority of diagnostic imaging tasks by visual inspection. Even though reading medical images is now a common occurrence in health care settings worldwide, there is still much to learn about how this process works. The lack of a complete understanding limits our ability to optimize image acquisition and processing and to predict when a new imaging technology will lead to more accurate diagnoses than an existing standard.1–3
Figure 1 illustrates this gap in understanding with a simple example. Each vertical column of the panel consists of two images. One contains a Gaussian blob target at the center of the image, and the other does not. Going through the panel and picking out the image in each column that has the target is a very short version of a classical two-alternative forced-choice detection task. Most people score many or all of the 10 trials correctly. Intuitively, it would seem correct to choose whichever image has the higher intensity at its center. However, a simple matched filter model4 that makes decisions on that basis alone scores only six of the 10 trials correctly, and similar findings have been borne out in much larger studies. It is more likely that observers arrive at the correct answer by forming a contrast between the central pixels and some local background around them. The details of how a visual contrast like this is computed are important for understanding how factors like noise and image processing will affect diagnostic accuracy.
Figure 1. Forced-choice task panel. The target profile (A) is present in one of the images in each column of the task panel (B). In a two-alternative forced-choice task, the observer is asked to identify which image of each pair contains the target profile (answer key at the end of the article).
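A two-alternative forced-choice trial with a linear model observer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the image size, target width, target amplitude, and noise level are hypothetical values, and the background is taken to be white Gaussian noise, a case in which the matched filter is known to be the ideal linear strategy. The noise statistics of Figure 1 are not specified here, so the sketch is not expected to reproduce the six-out-of-10 result quoted above.

```python
import numpy as np

def gaussian_blob(size=32, sigma=3.0, amplitude=1.0):
    """Gaussian target profile centred in a size x size image."""
    coords = np.arange(size) - (size - 1) / 2.0
    yy, xx = np.meshgrid(coords, coords, indexing="ij")
    return amplitude * np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))

def make_2afc_trials(target, n_trials, noise_sd, rng):
    """Return (present, absent) stacks of noisy images, one pair per trial."""
    shape = (n_trials,) + target.shape
    present = target + rng.normal(0.0, noise_sd, shape)
    absent = rng.normal(0.0, noise_sd, shape)
    return present, absent

def proportion_correct(template, present, absent):
    """Linear observer: choose the image with the larger template response."""
    r_present = np.tensordot(present, template, axes=([1, 2], [0, 1]))
    r_absent = np.tensordot(absent, template, axes=([1, 2], [0, 1]))
    return float(np.mean(r_present > r_absent))

# Hypothetical experiment parameters (not taken from the article).
rng = np.random.default_rng(0)
target = gaussian_blob()
present, absent = make_2afc_trials(target, n_trials=2000, noise_sd=4.0, rng=rng)

# Matched filter: the template is the target profile itself.
pc_matched = proportion_correct(target, present, absent)
print(f"matched-filter proportion correct: {pc_matched:.3f}")
```

Replacing `target` in the last call with any other template (for example, one with a negative surround) gives the score of a different linear strategy on the same trials.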
To better understand how visual tasks such as this are performed, researchers have developed a methodology for visual perception experiments, called ‘classification images,’ that has vastly improved our ability to determine how observers carry out such tasks.5, 6 Fundamentally, the approach compares the images that an observer has scored incorrectly with those the observer has scored correctly. The method requires that performance in the task be limited by noise in the images (i.e., random variability in the intensity of the image pixels). It produces a map that can be thought of as displaying the weight that the observer applied to each pixel in the image when performing the task. At the Vision and Image Understanding Laboratory at the University of California-Santa Barbara, we have been using this classification-image methodology to better understand how human observers perceive images in the presence of image degradations and noise. Our larger goal is to improve models of visual performance for optimizing imaging systems for specific tasks.
Figure 2. Effect of image contrast. The task consisted of detecting a Gaussian blob profile (A) embedded in low-contrast or high-contrast noise. Example images (B) show target-present images in both the low-contrast and high-contrast conditions (target contrast was adjusted for 80% correct performance). The observed classification images (C) show a smaller central region of positive weighting and a stronger inhibitory surround at low contrast. This is seen more clearly in the frequency plots, which show low-frequency suppression for images at lower contrast. These data were part of a larger published study.7
Figure 3. Effect of noise texture. The task consisted of detecting a Gaussian blob profile (A) embedded in (high-contrast) white noise or textured noise with a low-pass spectrum. Example images (B) show target-present images in both the white noise and low-pass noise conditions (target contrast was adjusted for 80% correct performance). The observed classification images (C) show a smaller central region of positive weighting in low-pass noise. The frequency weights of the classification images show evidence of shifting to higher frequencies in the presence of low-pass textured noise. These data were part of a larger published study.8
More technically, the approach is similar to reverse correlation techniques6, 9 used to map out the receptive fields of neurons in early visual pathways. The difference is that elements of the image are correlated with the behavioral response of the observer rather than the firing rate of a specific neuron. When the observer is well modeled as a noisy linear template, the classification image produces an unbiased estimate of the linear pixel weights of the template.5, 10, 11 When the linear template model is not a complete description of the observer, the classification image may still be useful as a way to compare a model to data from human observers. Higher-order generalizations of the classification-image methodology can be implemented through the use of Wiener and Volterra kernels.12 The methodology is fairly demanding (we typically use a minimum of 2000 trials for tasks similar to that in Figure 1), but the information gained from the results is generally worth the effort.
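The noisy-linear-template idea can be made concrete with a small simulation. The sketch below is a simplified illustration, not the estimator used in the cited studies. It assumes white Gaussian image noise, a hypothetical center-surround template, additive internal noise, and made-up parameter values. The classification image is taken as the mean noise difference (target-present minus target-absent image) on correct trials minus the same mean on incorrect trials, which under these assumptions is proportional to the observer's template on average.

```python
import numpy as np

def gaussian_blob(size, sigma):
    """Unit-amplitude Gaussian profile centred in a size x size image."""
    coords = np.arange(size) - (size - 1) / 2.0
    yy, xx = np.meshgrid(coords, coords, indexing="ij")
    return np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))

def simulate_observer(template, signal, n_trials, noise_sd, internal_sd, rng):
    """Simulate 2AFC responses of a noisy linear-template observer.

    Returns the per-trial noise difference (present minus absent) and a
    boolean array marking the trials the observer scored correctly.
    """
    shape = (n_trials,) + signal.shape
    noise_present = rng.normal(0.0, noise_sd, shape)
    noise_absent = rng.normal(0.0, noise_sd, shape)
    delta_noise = noise_present - noise_absent
    # Template response to (present image - absent image), plus internal noise.
    decision = np.tensordot(signal + delta_noise, template, axes=([1, 2], [0, 1]))
    decision += rng.normal(0.0, internal_sd, n_trials)
    return delta_noise, decision > 0

def classification_image(delta_noise, correct):
    """Mean noise difference on correct trials minus that on incorrect trials."""
    return delta_noise[correct].mean(axis=0) - delta_noise[~correct].mean(axis=0)

# Hypothetical setup: 16 x 16 images, center-surround template, 5000 trials.
rng = np.random.default_rng(1)
size = 16
signal = 0.5 * gaussian_blob(size, sigma=2.0)
template = gaussian_blob(size, 1.5) - 0.25 * gaussian_blob(size, 3.0)

delta_noise, correct = simulate_observer(
    template, signal, n_trials=5000, noise_sd=1.0, internal_sd=1.0, rng=rng
)
cimg = classification_image(delta_noise, correct)

# Compare the estimate with the (normally unknown) true template.
similarity = np.corrcoef(cimg.ravel(), template.ravel())[0, 1]
print(f"proportion correct: {correct.mean():.3f}")
print(f"correlation with true template: {similarity:.3f}")
```

In a real experiment the template is unknown and `correct` comes from the observer's recorded responses; the simulation only shows why averaging the noise by response outcome recovers the weighting pattern.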
Figures 2 and 3 show examples of how we have used the classification-image methodology to examine basic questions about how human observers extract information from images. Figure 2 shows an analysis of a simple manipulation of image contrast. A range of contrasts was used in the study,7 two of which are shown in Figure 2(B) (root-mean-square contrast of 25% and 2.5%). The classification images in Figure 2(C), from a single subject in the study, show how the observer weights different parts of the image to perform this detection task. Closer inspection of the two classification images reveals some differences between the two conditions, specifically a narrower central region and some evidence of inhibitory side lobes at lower contrasts. These effects are shown more clearly in a spatial-frequency analysis of the classification images. The frequency plots suggest a transition to a more band-pass detection scheme as contrast decreases.
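One common way to carry out a spatial-frequency analysis of this kind is to average the amplitude of the two-dimensional Fourier transform over rings of constant radial frequency. The sketch below assumes that approach; the binning, image size, and the two example templates are hypothetical choices for illustration and are not the analysis settings of the cited study.

```python
import numpy as np

def gaussian_blob(size, sigma):
    """Unit-amplitude Gaussian profile centred in a size x size image."""
    coords = np.arange(size) - (size - 1) / 2.0
    yy, xx = np.meshgrid(coords, coords, indexing="ij")
    return np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))

def radial_frequency_profile(image, n_bins=8):
    """Radially averaged amplitude spectrum of a square image.

    Returns bin centres (cycles per pixel, 0 to 0.5) and the mean
    Fourier amplitude within each radial-frequency bin.
    """
    size = image.shape[0]
    amplitude = np.abs(np.fft.fft2(image)).ravel()
    freqs = np.fft.fftfreq(size)  # cycles per pixel
    fy, fx = np.meshgrid(freqs, freqs, indexing="ij")
    radius = np.hypot(fx, fy).ravel()

    edges = np.linspace(0.0, 0.5, n_bins + 1)
    which = np.digitize(radius, edges) - 1
    keep = which < n_bins  # drop frequencies at or beyond 0.5 cycles per pixel
    sums = np.bincount(which[keep], weights=amplitude[keep], minlength=n_bins)
    counts = np.bincount(which[keep], minlength=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, sums / counts

# Two hypothetical templates: a plain blob and a blob with an inhibitory surround.
size = 32
lowpass_template = gaussian_blob(size, 2.0)
bandpass_template = gaussian_blob(size, 1.5) - 0.25 * gaussian_blob(size, 3.0)

centers, lowpass_profile = radial_frequency_profile(lowpass_template)
_, bandpass_profile = radial_frequency_profile(bandpass_template)

print("peak frequency, plain blob:      ", centers[np.argmax(lowpass_profile)])
print("peak frequency, blob + surround: ", centers[np.argmax(bandpass_profile)])
```

The plain blob has its largest weights in the lowest frequency bin, whereas the template with an inhibitory surround peaks at a higher frequency, which is the band-pass signature described above.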
Figure 3 shows the effect of noise texture on detection of a Gaussian target. We used multiple tasks and several noise textures in the study.8 The figure shows the results for a single subject, comparing white noise, which varies independently from pixel to pixel, with low-pass noise, which has long-range positive correlations between pixels. The classification images in Figure 3(C) show a narrowing of the central region of positive weights in the low-pass texture, and the frequency plots suggest a shift to higher spatial frequencies to perform the task.
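The difference between the two noise textures can be illustrated by filtering white noise in the Fourier domain. The sketch below assumes a Gaussian-shaped low-pass filter with a hypothetical cutoff; the spectrum actually used in the cited study is not given here, so this is only a stand-in that produces positively correlated pixels.

```python
import numpy as np

def white_noise(size, rng):
    """White Gaussian noise: independent from pixel to pixel."""
    return rng.normal(0.0, 1.0, (size, size))

def lowpass_noise(size, cutoff, rng):
    """Low-pass textured noise made by filtering white noise.

    `cutoff` is the width (cycles per pixel) of an assumed Gaussian filter.
    The result is rescaled to unit standard deviation.
    """
    freqs = np.fft.fftfreq(size)
    fy, fx = np.meshgrid(freqs, freqs, indexing="ij")
    filt = np.exp(-(fx**2 + fy**2) / (2.0 * cutoff**2))
    shaped = np.fft.ifft2(np.fft.fft2(white_noise(size, rng)) * filt).real
    return shaped / shaped.std()

def neighbor_correlation(image):
    """Sample correlation between horizontally adjacent pixels."""
    left = image[:, :-1].ravel()
    right = image[:, 1:].ravel()
    return float(np.corrcoef(left, right)[0, 1])

rng = np.random.default_rng(2)
r_white = neighbor_correlation(white_noise(64, rng))
r_lowpass = neighbor_correlation(lowpass_noise(64, cutoff=0.05, rng=rng))

print(f"adjacent-pixel correlation, white noise:    {r_white:.3f}")
print(f"adjacent-pixel correlation, low-pass noise: {r_lowpass:.3f}")
```

Adjacent pixels are essentially uncorrelated in the white noise and strongly correlated in the filtered noise, which is why a target must be judged against its local background in the textured condition.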
In both of these examples, the classification images show how the visual system adapts to the statistical environment of the images when an observer performs a task. We have begun to apply this approach to more clinically relevant images such as x-ray mammograms,13, 14 and to more realistic tasks such as forced localization.15 It is our hope that a fuller understanding of how human observers perform visual tasks will lead to better models of observer performance for the purpose of optimizing and validating medical imaging technology.
(Answer key to Figure 1. b = bottom image; t = top image. 1 – b; 2 – t; 3 – b; 4 – t; 5 – b; 6 – t; 7 – b; 8 – t; 9 – t; 10 – t.)
Craig K. Abbey
University of California-Santa Barbara (UCSB)
Santa Barbara, CA
Craig Abbey is a researcher in the Department of Psychological and Brain Sciences at UCSB. His interests focus on the transfer of diagnostic information in medical imaging systems.
1. H. H. Barrett, Objective assessment of image quality: effects of quantum noise and object variability, J. Opt. Soc. Am. A 7, p. 1266-1278, 1990.
2. A. Burgess, Image quality, the ideal observer, and human performance of radiologic decision tasks, Acad. Radiol. 2, p. 522-526, 1995.
3. R. F. Wagner, G. G. Brown, Unified SNR analysis of medical imaging systems, Phys. Med. Biol. 30, p. 489-518, 1985.
4. H. H. Barrett, J. Yao, J. P. Rolland, K. J. Myers, Model observers for assessment of image quality, Proc. Nat'l Acad. Sci. USA 90, p. 9758-9765, 1993.
5. A. J. Ahumada Jr., Classification image weights and internal noise level estimation, J. Vis. 2, p. 121-131, 2002.
6. R. F. Murray, Classification images: a review, J. Vis. 11, p. 1-25, 2011.
7. C. K. Abbey, M. P. Eckstein, Frequency tuning of perceptual templates changes with noise magnitude, J. Opt. Soc. Am. A Opt. Image Sci. Vis. 26, p. B72-83, 2009.
8. C. K. Abbey, M. P. Eckstein, Classification images for simple detection and discrimination tasks in correlated noise, J. Opt. Soc. Am. A Opt. Image Sci. Vis. 24, p. B110-124, 2007.
9. P. Z. Marmarelis, K.-I. Naka, White-noise analysis of a neuron chain: an application of the Wiener theory, Science 175, p. 1276-1278, 1972.
10. C. K. Abbey, M. P. Eckstein, Classification image analysis: estimation and statistical inference for two-alternative forced-choice experiments, J. Vis. 2, p. 66-78, 2002.
11. C. K. Abbey, M. P. Eckstein, Optimal shifted estimates of human-observer templates in two-alternative forced-choice experiments, IEEE Trans. Med. Imag. 21, p. 429-440, 2002.
12. M. Schetzen, The Volterra and Wiener Theories of Nonlinear Systems, Wiley, New York, 1980.
13. C. Castella, C. K. Abbey, M. P. Eckstein, F. R. Verdun, K. Kinkel, F. O. Bochud, Human linear template with mammographic backgrounds estimated with a genetic algorithm, J. Opt. Soc. Am. A Opt. Image Sci. Vis. 24, p. B1-12, 2007.
14. C. Castella, M. P. Eckstein, C. K. Abbey, K. Kinkel, F. R. Verdun, R. S. Saunders, E. Samei, F. O. Bochud, Mass detection on mammograms: influence of signal shape uncertainty on human and model observers, J. Opt. Soc. Am. A 26, p. 425-436, 2009.
15. C. K. Abbey, M. P. Eckstein, High human-observer efficiency for forced-localization tasks in correlated noise, Proc. SPIE 7627, p. 76270R, 2010. doi:10.1117/12.843653