Medical Imaging 1996: Image Perception

Volume Details

Date Published: 27 March 1996

Contents: 4 Sessions, 21 Papers, 0 Presentations

Conference: Medical Imaging 1996

Volume Number: 2712

Modeling Visual Signal Detection
Human Performance Evaluation
Human Performance Evaluation: Image Compression and Image Displays
Poster Session

Modeling Visual Signal Detection

Detectability of moving objects in fluoroscopy

Ping Xue, Richard Aufrichtig, David L. Wilson

We investigate the effect of motion on the detectability of low-contrast objects using two fluoroscopic imaging techniques: 30 acq/sec (pulsed-30) and 15 acq/sec (pulsed-15). We measure detectability using a new reference/test, adaptive 9-alternative forced-choice method. For pulsed-30 (reference) and pulsed-15 (test), we measure absolute detectabilities, an equivalent perception dose (EPD) for the test technique, and response times. Computer-generated phantoms are x-ray projections of cylinders that mimic arteries, catheters, and guide wires. For the larger cylinders, motion increases absolute detectability. For the smaller cylinders, motion decreases detectability. Results from four subjects show that the dose savings for pulsed-15 are around 18%, with relatively little effect of velocity or object size. In general, subjects take slightly longer to respond with low-acquisition-rate fluoroscopy.

Detection and contrast discrimination of moving signals in uncorrelated Gaussian noise

Miguel P. Eckstein, James Stuart Whiting, James P. Thomas

We investigate human visual detection and contrast discrimination of a moving Gabor signal in spatiotemporal white noise. We measure performance as a function of signal contrast for detection and contrast discrimination in a 4-alternative forced-choice task. Observers were instructed and trained to maintain their gaze on a fixation point at all times during the experiment. The effect of signal contrast on human detection and contrast discrimination performance (d') for a moving signal in spatiotemporal noise is similar to that found for a stationary signal in spatial noise. It can be described by a linear function with a positive x-intercept for detection and a zero intercept for contrast discrimination. The difference in x-intercepts between the detection and contrast discrimination tasks is consistent with signal uncertainty. The improvement in performance with increasing number of frames is different for the detection and contrast discrimination tasks. Results show a performance improvement with number of frames that saturates much later (750 - 800 msec) than would be expected from the early temporal filters (100 - 150 msec). Observers are more efficient at detecting a stationary signal than a moving signal (when no eye tracking is allowed) in spatiotemporal noise. In an additional experiment where the signal interframe displacement was increased, observer performance (d') decreased with increasing interframe displacement, dropping 50% for an interframe displacement of 70 min. of arc. This shows that human performance for detection of a moving signal is affected by the specific characteristics of the signal motion.

Effect of additive noise, signal contrast, and feature motion on visual detection in structured noise

James Stuart Whiting, Miguel P. Eckstein, Craig A. Morioka, et al.

This paper investigates signal detectability in a fixed structured background as a function of signal contrast, additive white noise and feature motion. We use a 4 AFC (alternative forced choice) detection task where the signal appeared at the center of one of four identical, clearly visible, simulated cylindrical artery segments. All four segments moved identically relative to the background in 32-frame image sequences displayed at 15 frames per second. The background in one condition was uniform and in a second condition was structured noise consisting of a single frame randomly selected from a group of clinical x-ray coronary angiograms. We studied two display formats, the 'moving artery' display in which the background was stationary and the cylinders moved back and forth, simulating the motion of the coronary arteries, and the 'stabilized artery' in which each frame of the sequence was translated to keep the cylinders stationary, while the background moved back and forth. The signal to be detected was a disk superimposed at the center of one of the simulated cylinders. Signal energy and the variance of additive Gaussian spatiotemporal white noise were manipulated. For each level of additive white noise the threshold signal energy for detection (at the 82% correct performance level) was determined. There was no time limit to reach decision. For all conditions the threshold signal energy increased linearly with added white noise variance, with a positive y-intercept. The presence of the structured background increased both the y-intercept and the slope of this relationship between threshold energy and added white noise variance. Thus, the presence of the structured background had a multiplicative effect, as well as an additive effect, on the degradation of performance due to added white noise. 
The multiplicative effect might be modeled by an increase in induced internal noise (noise proportional to the external noise) with the presence of the structured background. Such an effect, if it occurs in the setting of clinical coronary angiography, would cause changes in radiation exposure (and thus quantum noise) to affect visual perception more than expected from experiments with white noise alone. One possible mechanism for this effect may be that the added random noise interferes with the observer's use of spatiotemporal correlations to 'subtract' or 'read around' the structured noise.
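
The linear relationship described in this abstract can be written as E_t = a + b * sigma^2, where E_t is the threshold signal energy and sigma^2 is the added white noise variance. A minimal sketch of how the intercept a and slope b might be estimated by least squares is given below. The function name fit_line and all numbers are hypothetical and chosen only for illustration; they are not data from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Illustrative, made-up values (arbitrary units), not measurements.
noise_var = [0.0, 1.0, 2.0, 4.0]
uniform_thresholds = [1.0 + 0.5 * v for v in noise_var]
structured_thresholds = [2.0 + 0.8 * v for v in noise_var]

a_u, b_u = fit_line(noise_var, uniform_thresholds)
a_s, b_s = fit_line(noise_var, structured_thresholds)

# A larger intercept corresponds to an additive effect of the background,
# a larger slope to a multiplicative effect on the cost of added noise.
print("uniform:    intercept=%.3f slope=%.3f" % (a_u, b_u))
print("structured: intercept=%.3f slope=%.3f" % (a_s, b_s))
```

In this reading, comparing the two fitted lines separates the additive effect (change in intercept) from the multiplicative effect (change in slope).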

Detection of clusters of simulated calcifications in lumpy noise backgrounds

Philip F. Judy

The effect of the lump amplitude of lumpy backgrounds on human observers' ability to find clusters of simulated calcifications was measured. Clusters of simulated calcifications were randomly located in 56-pixel-radius circular areas of lumpy backgrounds to which Gaussian noise was added. Each cluster consisted of projections of five 2.5-pixel-radius spheres randomly located within 10 pixels of the cluster center. The lumpy backgrounds were produced by adding 50 Gaussian lumps, each with a standard deviation of 7 pixels, at random locations in the circular areas. Stimulus sets were rendered on the gray-scale monitor of a computer workstation. Three observers searched each area for the cluster. They indicated the location most likely to contain a cluster with a mouse pointer and rated the likelihood that the indicated location actually contained a cluster. The Gaussian noise was adjusted so that the fraction of clusters found was approximately 0.5 for the three levels of amplitude of the lumpy backgrounds and the uniform backgrounds. The detectability, d', was calculated from the fraction of clusters found using an M-alternative forced choice model. A cluster was scored as found (a hit) if the center of the cluster was contained within a specified area surrounding the observer-indicated location. The observer detection loss ratio, the ratio of d' to SNR-SKE, decreased from 0.6 for the uniform noise background to 0.25 for the lumpy backgrounds. Observers' ability to find clusters of simulated calcifications is significantly decreased by lumpy backgrounds.
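
As an illustration of how d' can be derived from a fraction correct under an M-alternative forced choice model, the sketch below uses the standard equal-variance Gaussian relation Pc = integral of phi(x - d') * Phi(x)^(M-1) dx and inverts it by bisection. The abstract does not state the value of M or the numerical method used, so the function names, the integration settings, and the example value of M are assumptions.

```python
import math


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def pc_mafc(d_prime, m, half_width=8.0, steps=4000):
    """Proportion correct in an M-AFC task for a given d'.

    Trapezoid rule over t in [-half_width, half_width], where the
    decision variable at the signal location is x = d' + t.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        t = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * normal_pdf(t) * normal_cdf(d_prime + t) ** (m - 1)
    return total * h


def dprime_from_pc(pc, m, tol=1e-6):
    """Invert pc_mafc by bisection; requires 1/m < pc < 1."""
    if not (1.0 / m < pc < 1.0):
        raise ValueError("pc must lie strictly between chance (1/m) and 1")
    lo, hi = 0.0, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pc_mafc(mid, m) < pc:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Example: fraction found of 0.5 with a hypothetical M of 100 locations.
print(dprime_from_pc(0.5, 100))
```

Dividing the resulting d' by the signal-known-exactly SNR would then give a ratio of the kind the abstract reports.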

Observer signal-to-noise ratios for the ML-EM algorithm

Craig K. Abbey, Harrison H. Barrett, Donald W. Wilson

We have used an approximate method developed by Barrett, Wilson, and Tsui for finding the ensemble statistics of the maximum likelihood-expectation maximization algorithm to compute task-dependent figures of merit as a function of stopping point. For comparison, human-observer performance was assessed through conventional psychophysics. The results of our studies show the dependence of the optimal stopping point of the algorithm on the detection task. Comparisons of human and various model observers show that a channelized Hotelling observer with overlapping channels is the best predictor of human performance.

Human Performance Evaluation

Comparison of observer performance for real and simulated nodules in chest radiography

Ehsan Samei, Michael J. Flynn, Gordon H. Beute, et al.

Observer performance studies are frequently performed to evaluate and compare various radiographic techniques with regard to detection of subtle lung nodules using simulated nodules. The fidelity of the simulated nodules and their equivalency to the real lesions have not been investigated in prior studies. We have developed simulated Teflon nodules with contrast characteristics emulating real nodules found in a database of graded chest radiographs. This paper presents a summary of the method used to make the phantoms, describes our technique for positioning the phantoms in human subject studies, and reports the preliminary results of an observer performance study for detection of the simulated nodules compared to real lesions.

Defining the perceptual processes involved with mammographic diagnostic errors

Mark D. Mugglestone, Alastair G. Gale, Helen C. Cowley, et al.

Screening for breast cancer using mammography is currently the most sensitive technique for detecting early signs of this disease. Interpreting mammograms is a complex process and in common with other areas of visual inspection some errors of over and under-reading occur. A study is reported which investigates the nature of these errors in relation to both the initial overall global impression and the subsequent detailed visual search of the mammogram. Results demonstrate the importance of detailed visual search in mammography to detect possible abnormalities and the role of search factors in error occurrence.

Visual search in angiograms: does geometry play a role in saliency?

Jannick P. Rolland, Christopher S. Helvig

Quantifying and modeling how the human eye searches medical images is important. Equally detectable lesions may not be equally salient in a visual search. This research investigates the effect of geometry on the saliency of stenoses in angiograms. A previous experiment suggested that stenoses located in areas of high curvature along vessels would be less salient. In this paper, we measure the saliency of stenoses in two steps: first, we measure stenoses amplitude detection thresholds for stenoses at three values of curvature along vessels; second, after adjusting the degree of stenosis at these curvatures to multiples of the measured thresholds to achieve equal detectability, we measure the saliency of the stenoses in a visual search experiment. Median reaction time is used as a measure of saliency. We found that thresholds increase as curvature increases. This finding explains the decrease in saliency found in a previous investigation. Results also show that median reaction time is constant across different curvatures for 70% and higher degrees of stenosis, indicating that saliency is independent of geometry for clinically significant angiographic lesions.

Nature of expertise in searching mammograms for breast masses

Calvin F. Nodine, Harold L. Kundel, Sherri C. Lauver, et al.

Eye positions of observers representing four levels of training and experience (mammographers, mammography residents, mammographic technologists, and laypersons) were compared to a random search model as they examined a set of nine two-view digital mammogram pairs for breast masses. Analysis of time-to-hit data revealed that mammographers' training and experience combined to produce the most efficient search patterns as measured by the fastest search times to detect breast masses on two views. Scanning patterns of mammography residents and mammographic technologists were less efficient due to wider dispersion of visual attention that was divided between potential breast masses and perturbations in breast parenchyma. Because laypersons lacked training in radiology, bright blobs in the breast image were considered to be intuitively valid target candidates, and these features distracted search by capturing visual attention.

Influence of experience on scanning strategies in mammography

Elizabeth A. Krupinski

The goal of this study was to determine if there are significant differences in the ways in which experienced and inexperienced radiologists search mammograms for the detection of lesions. The eye position of six radiologists (3 staff mammographers, 3 radiology residents) was recorded as they searched mammograms for masses and microcalcifications. True and false positive decisions were associated with prolonged gaze durations; false-negative decisions were associated with longer gaze durations than true-negatives. Readers with more experience tended to detect lesions earlier in search than readers with less experience; but those with less experience tended to spend more time overall searching the images and cover more image area than those with more experience. Mammographic search for readers with different degrees of experience can be characterized by gaze durations, scan paths and detection times.

Mammographic training sets for improving breast cancer detection

Helen C. Cowley, Alastair G. Gale, A. R. M. Wilson

The PERFORMS^C (personal performance in mammographic screening) self-assessment program has demonstrated large individual differences in both cancer and feature detection. It has also illustrated national differences in feature detection: certain features are undetected and misinterpreted more frequently than others. These differences are also found in the breast screening program. Subsequently, revision training sets concentrating on these specific features have been developed. The use of these has been shown to increase feature detection and thereby cancer detection.

Human Performance Evaluation: Image Compression and Image Displays

Evaluation of the diagnostic quality of chest images compressed with JPEG and wavelet techniques: a preliminary study

Cathlyn Y. Wen, Fleming Yuan Ming Lure, Roger S. Gaborski

Image compression reduces the amount of space necessary to store digital images and allows quick transmission of images to other hospitals, departments, or clinics. However, the degradation of image quality due to compression may not be acceptable to radiologists or it may affect diagnostic results. A preliminary study with small-scale test procedures was conducted using several chest images with common lung diseases, compressed with JPEG and wavelet techniques at various ratios. Twelve board-certified radiologists were recruited to perform two types of experiments. In the first part of the experiment, the presence of lung disease on six images was rated by radiologists. Images presented were either uncompressed or compressed at 32:1 or 48:1 compression ratios. In the second part of the experiment, radiologists were asked to make subjective ratings by comparing the image quality of the uncompressed version of an image with the compressed version of the same image, and then judging the acceptability of the compressed image for diagnosis. The second part examined a finer range of compression ratios (8:1, 16:1, 24:1, 32:1, 44:1, and 48:1). In all cases, radiologists were able to make an accurate diagnosis on the given images with little difficulty, but image degradation perceptibility increased as the compression ratio increased. At higher compression ratios, JPEG images were judged to be less acceptable than wavelet-based images; however, radiologists believed that all the images were still acceptable for diagnosis. Results of this study will be used for later comparison with large-scale studies.

Nodule detection performance in compressed chest CT images

Uri Feldman, Philip F. Judy, Steven E. Seltzer, et al.

An investigation was performed to evaluate objectively observers' ability to find lung nodules on compressed spiral computerized tomographic (CT) images of the chest. A set of 80 images from 13 patients served as backdrops. One simulated nodule of either 3.0, 3.4, 4.0, or 5.0 mm in diameter was inserted into each image. These 80 images were viewed on a computer screen in two formats: compressed with a wavelet transform coder at a compression rate of 40:1, and in the uncompressed 8 bit-per-pixel format, windowed down from the 12 bit-per-pixel originals. The images were presented one at a time in random order, as two conditions of 80 images each. Six observers searched for lung nodules on both the original and compressed formats. The tasks were to locate the nodule in each image, and, using a five category rating scale, to indicate the confidence that the indicated location contained a nodule. The results indicate that all observers detected a higher fraction of nodules in the original images than in the compressed images. Even though the compressed images were described by the observers as unacceptable for clinical use because they contained numerous artifacts, the percentage of 4 and 5 mm nodules found in the compressed images was high. Directions of further research include measurement of detection performance at lower compression rates, identification of compression artifacts that get confused with nodules, and analysis of the confidence ratings.

Contrast-detail analysis of the effect of image compression on computed tomographic images

Larry T. Cook, Glendon G. Cox, Michael F. Insana, et al.

Three compression algorithms were compared by using contrast-detail (CD) analysis. Two phantoms were designed to simulate computed tomography (CT) scans of the head. The first was based on CT scans of a plastic cylinder containing water. The second was formed by combining a CT scan of a head with a scan of the water phantom. The soft tissue of the brain was replaced by a subimage containing only water. The compression algorithms studied were the full-frame discrete cosine (FDCT) algorithm, the Joint Photographic Experts Group (JPEG) algorithm, and a wavelet algorithm. Both the wavelet and JPEG algorithms affected regions of the image near the boundary of the skull. The FDCT algorithm propagated false edges throughout the region interior to the skull. The wavelet algorithm affected the images less than the other compression algorithms. The presence of the skull especially affected observer performance on the FDCT compressed images. All of the findings demonstrated a flattening of the CD curve for large lesions. The results of a compression study using lossy compression algorithms are dependent on the characteristics of the image and the nature of the diagnostic task. Because of the high density bone of the skull, head CT images present a much more difficult compression problem than chest x-rays. We found no significant differences among the CD curves for the tested compression algorithms. Key Words: Image compression, contrast-detail analysis.

Optimizing the tonescale display of computed radiography images

Walter Huda, Richard M. Slone, Beverly A. Hoyle, et al.

Computed radiography (CR) systems transform exposures incident on the imaging plates to image densities using examination specific tonescale display algorithms. A procedure is proposed which permits CR users to optimize these algorithms based on the premise that image contrast should be optimized. This procedure was applied to portable chest x-rays on a medical intensive care unit. Changes made to the display parameters supplied by the manufacturer resulted in an improved quality of the displayed image.

Measurements of the perceived dynamic range of a medical imaging workstation

Robert S. Kenney, David S. Channin M.D., Fred W. Prior

Murch and Weiman have demonstrated that greater than 11 bits of contrast information are perceivable by a human observer. Digital display controllers with 10 or 12 bit digital to analog converters are becoming available. Before attempting to determine if these technologies improve the clinical effectiveness of medical imaging workstations it is first necessary to determine if measurable differences can be produced in the perceived dynamic range (PDR) of the displays. A set of experiments have been performed to determine a baseline PDR for an 8- bit per pixel display. This data will be used as the control for future measurements at 10 bits per pixel. The experimental design includes all psychovisual factors that affect an observer's perception of contrast. Stimulus display duration, physical size of the stimulus and training factors were all studied and controlled in the experiments. Simple images are used to avoid complicating the observer's task and display time is kept short to prevent adaptation and boredom effects. Data was collected using four non-radiologists and four radiologists. Each subject had at least normal corrected vision and wore his corrective lenses during each session. All experiments were conducted on a SUN SPARC workstation using an Image Systems (M21P-47SO1-2KHB) portrait monitor driven by a modified DOME Imaging Systems (Md2/SUN) 10-bit, grayscale, video board initially configured to run in 8-bit mode. Specially developed software was used to control the experiments and to gather and analyze the data. Pizer and Chan's methodology for computing PDR was adapted for the above hardware and software environment. A rating experiment was used to determine the just noticeable difference in contrast for a given reference intensity. Integration over the range of the monitor provides the PDR for that display for one observer. This data is then averaged with all other observations to determine a baseline PDR. 
These experiments allow for the determination of a baseline PDR for comparison with future hardware configurations. All calibration, control and analysis software is in place such that new hardware can be easily evaluated.
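
One way to read the integration step described in this abstract is as a count of just noticeable differences (JNDs) across the monitor's luminance range, PDR = integral of dI / JND(I) from I_min to I_max. The sketch below follows that reading; it is an assumption about the general idea, not a reproduction of Pizer and Chan's procedure or of the authors' software. The function names and the Weber-law JND model used in the example are hypothetical.

```python
import math


def perceived_dynamic_range(jnd, i_min, i_max, steps=10000):
    """Number of JND steps between i_min and i_max.

    jnd(i) returns the just noticeable difference at reference
    intensity i, in the same units as i. Trapezoid rule on 1/jnd.
    """
    h = (i_max - i_min) / steps
    total = 0.0
    for k in range(steps + 1):
        i = i_min + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight / jnd(i)
    return total * h


def weber_jnd(i, fraction=0.02):
    """Hypothetical JND model: a fixed fraction of the reference intensity."""
    return fraction * i


# Illustrative intensity range in arbitrary units, not monitor measurements.
pdr = perceived_dynamic_range(weber_jnd, 1.0, 100.0)
print("PDR in JND steps: %.2f" % pdr)
print("closed form:      %.2f" % (math.log(100.0) / 0.02))
```

For a Weber-law JND the integral has the closed form ln(I_max / I_min) / fraction, which gives a quick check on the numerical result. Averaging such per-observer values would correspond to the baseline described above.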

Subjective and objective evaluation of image sharpness: behavior of the region-based image edge profile acutance measure

Silvia Delgado-Olabarriaga, Rangaraj M. Rangayyan

We recently proposed a region-based measure of image edge profile acutance to characterize the sharpness of a region of interest. In this paper we study the capability of the acutance measure to analyze relative sharpness in the presence of blurring and noise by comparing acutance to other measures of distortion and to subjective evaluation. The purpose of the experiment was to organize an image set in increasing order of sharpness with results obtained by objective image quality measures (acutance, mean squared error, normalized error, and normalized mean squared error) and to compare the results with subjective evaluation. A psychometric experiment was developed to perform sorting according to the subjective notion of sharpness. The region-based image edge profile acutance measure provided results that agree more closely with subjective evaluation of relative sharpness than the other measures studied. The acutance measure also exhibited a good level of immunity to noise, whereas the other measures provided ordering according to noise rather than sharpness.

Poster Session

Decision support in screening mammography

Sean M. Hammond, Ian R. L. Davies, Paul T. Sowden, et al.

We are developing a statistical decision support system for use in screening mammography, and here we report on the rationale underlying its design, and on some preliminary tests of the system. A single expert radiologist described 200 mammograms, with known outcome, in terms of 38 critical features. We then compared discriminant function analysis (DFA), logistic regression (LR) and a backpropagation neural network (BNN) on their performance in classifying the 200 mammograms as normal or abnormal. All three approaches achieved greater than 90% correct classification, but DFA had low sensitivity and LR had a 9% miss rate, whereas the BNN detected all the cancers. External evaluation of LR and BNN on a new set of 167 mammograms showed that specificity was still high (greater than 96%) but sensitivity was less than 85%. We propose developing a system combining LR and BNN.

Design of training schedules for medical-image readers: the effects of variability and difficulty levels of the training stimuli

Ian R. L. Davies, Penny Roling, Paul T. Sowden, et al.

We report two studies on the design of training schedules for medical-image readers. Experiment 1 required subjects to learn to classify ultrasound images of six different phantoms; one group of subjects trained on images that varied on just two of the four possible dimensions of variation on each day, while the other group trained on images that varied on all four dimensions of variation each day. Performance improved on each day, and learning transferred to novel stimuli shown on successive days. Initially performance was worse for the high-variation group, but by the final session they had reached the same level as the restricted variation group. Within-category similarity increased after training and inter-category similarity decreased. In the second experiment, the stimuli were x-rays of perspex blocks with holes drilled in one of five possible locations. The subject's task was to search for the image-feature produced by the hole (a dark spot). One group of subjects judged 'easy' images for the first four days, and then switched to judgments of 'difficult' holes on the final day, while a second group underwent the reverse order. Although both groups improved, the group that had trained on the easy stimuli showed positive transfer to the more difficult stimuli, but the group that trained on the difficult stimuli showed no transfer to the easy stimuli.

Improvement in screening performance: the importance of appropriate feedback in screening mammography

Paul T. Sowden, Regina Pauli, Penny Roling, et al.

After a brief formal training period, screening mammographers are expected to further improve their skill as a result of learning from experience. However, it has been found that little improvement in screening performance occurs after initial training. One possible reason for this lack of improvement is that mammographers often receive no individual feedback on their performance during routine screening practice. While exposure to stimuli without feedback is sufficient to induce some types of learning, in other cases learning as a result of simple experience may not occur. In this paper we report two studies designed to investigate the importance of feedback for the learning of two visual detection tasks. Experiment one investigated whether the provision of feedback improved observers' ability to detect target features in x-rays. Results indicated that in the absence of feedback no improvement in detection accuracy occurred, but that when feedback was provided performance improved. Experiment two examined the effects of feedback on the learning of a probabilistic target detection task using computer-generated images. Observers' detection performance improved with practice. This improvement was not contingent upon observers receiving feedback, although there was a trend for observers who received trial-by-trial feedback to exhibit better overall performance.

Optimizing the processing and presentation of PPCR imaging

Andrew G. Davies, Arnold R. Cowen, Geoff J. S. Parkin, et al.

Photostimulable phosphor computed radiography (CR) is becoming an increasingly popular image acquisition system. The acceptability of this technique, diagnostically, ergonomically, and economically, is highly influenced by the method by which the image data is presented to the user. Traditional CR systems utilize an 11" by 14" film hardcopy format, and can place two images per exposure onto this film, which does not correspond to sizes and presentations provided by conventional techniques. It is also the authors' experience that the image enhancement algorithms provided by traditional CR systems do not provide optimal image presentation. An alternative image enhancement algorithm was developed, along with a number of hardcopy formats, designed to match the requirements of the image reporting process. The new image enhancement algorithm, called dynamic range reduction (DRR), is designed to provide a single presentation per exposure, maintaining the appearance of a conventional radiograph, while optimizing the rendition of diagnostically relevant features within the image. The algorithm was developed on a Sun SPARCstation, but later ported to a Philips' EasyVisionRAD workstation. Print formats were developed on the EasyVision to improve the acceptability of the CR hardcopy. For example, for mammographic examinations, four mammograms (a cranio-caudal and medio-lateral view of each breast) are taken for each patient, with all images placed onto a single sheet of 14" by 17" film. The new composite format provides a more suitable image presentation for reporting, and is more economical to produce. It is the use of enhanced image processing and presentation which has enabled all mammography undertaken within the general infirmary to be performed using the CR/EasyVisionRAD DRR/3M 969 combination, without recourse to conventional film/screen mammography.
