Proceedings Volume 3663

Medical Imaging 1999: Image Perception and Performance

View the digital version of this volume at SPIE Digital Library.

Volume Details

Date Published: 24 May 1999
Contents: 9 Sessions, 39 Papers, 0 Presentations
Conference: Medical Imaging '99
Volume Number: 3663

Table of Contents

  • Display Parameters and Performance I
  • Display Parameters and Performance II
  • ROC and Other Performance Methodologies
  • Perceptual Processes and Performance
  • Technology Assessment and Observer Performance I
  • Technology Assessment and Observer Performance II
  • Modeling Visual Signal Detection I
  • Modeling Visual Signal Detection II
  • Poster Session
  • Modeling Visual Signal Detection II
  • Poster Session
Display Parameters and Performance I
Observer performance assessment of JPEG-compressed high-resolution chest images
Walter F. Good, Glenn S. Maitz, Jill L. King, et al.
The JPEG compression algorithm was tested on a set of 529 chest radiographs that had been digitized at a spatial resolution of 100 micrometers and a contrast sensitivity of 12 bits. Images were compressed using five fixed 'psychovisual' quantization tables, which produced average compression ratios from 15:1 to 61:1, and were then printed onto film. Six experienced radiologists read all cases from the laser-printed film in each of the five compressed modes as well as in the uncompressed mode. For comparison, observers also read the same cases at reduced pixel resolutions of 200 and 400 micrometers. The task involved detecting masses, pneumothoraces, interstitial disease, alveolar infiltrates and rib fractures. Over the range of compression ratios tested, for images digitized at 100 micrometers, we were unable to demonstrate any statistically significant decrease (p > 0.05) in observer performance as measured by ROC techniques. However, the observers' subjective assessments of image quality decreased significantly as image resolution was reduced, and showed a decreasing but nonsignificant trend as the compression ratio increased. The apparent discrepancy between our failure to detect a reduction in observer performance and other published studies is likely due to: (1) the higher resolution at which we digitized our images; (2) the higher signal-to-noise ratio of our digitized films compared with typical CR images; and (3) our particular choice of an optimized quantization scheme.
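The lossy step in JPEG divides the 8×8 block DCT coefficients by a quantization table and rounds the results. Coarser tables zero out more coefficients and so give higher compression. The pure-Python sketch below shows that step. The gradient block and the tables built by the hypothetical `make_qtable` helper are stand-ins, not the psychovisual tables used in the study. Real JPEG also level-shifts the samples and entropy-codes the result, and both steps are omitted here.

```python
import math

N = 8  # JPEG codes images in 8x8 pixel blocks


def dct_matrix(n=N):
    """Orthonormal DCT-II basis matrix, C[k][x]."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
                     for x in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def make_qtable(base, slope):
    """Hypothetical table: the quantizer step grows with spatial frequency u + v."""
    return [[base + slope * (u + v) for v in range(N)] for u in range(N)]


def quantize_block(block, qtable):
    """Forward 2-D DCT (C X C^T), then divide by the table and round."""
    c = dct_matrix()
    coeffs = matmul(matmul(c, block), transpose(c))
    return [[round(coeffs[u][v] / qtable[u][v]) for v in range(N)] for u in range(N)]


def dequantize_block(q, qtable):
    """Rescale by the table and apply the inverse DCT (C^T F C)."""
    c = dct_matrix()
    coeffs = [[q[u][v] * qtable[u][v] for v in range(N)] for u in range(N)]
    return matmul(matmul(transpose(c), coeffs), c)


def rms_error(a, b):
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return math.sqrt(sum(diffs) / len(diffs))


def count_zeros(q):
    return sum(v == 0 for row in q for v in row)


# A smooth 12-bit test block (values 2000..2105).
block = [[2000 + 10 * x + 5 * y for x in range(N)] for y in range(N)]
fine, coarse = make_qtable(1, 0), make_qtable(1, 20)
q_fine, q_coarse = quantize_block(block, fine), quantize_block(block, coarse)
print(count_zeros(q_fine), count_zeros(q_coarse))
print(rms_error(block, dequantize_block(q_coarse, coarse)))
```

Because the DCT is orthonormal, the pixel-domain RMS error equals the RMS rounding error in the coefficient domain. With a table of all ones, that error can never exceed half a grey level.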
Acceptable compression ratios for CR images for PACS archives
Bijoy M. Misra, Philip F. Judy, Francine L. Jacobson, et al.
We report a preliminary ROC study to determine the acceptable levels of image compression for PACS archives. CR images of 1760 × 2140 pixels and 10-bit depth are studied. The experiment uses a wavelet algorithm for image compression and printed films for image viewing. The 'internal standard' experiment yields an acceptable compression ratio of 6 for an imperceptible difference (d_a' = 1) between the compressed and uncompressed images. Such a ratio would lead to a storage reduction factor of 9.6 for these images. The information capacity of the CR images may be extrapolated to be 40 bits per millimeter of viewing area.
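The two figures in the abstract agree if the uncompressed 10-bit pixels are archived in 16-bit words: 6 × 16 / 10 = 9.6. The abstract does not state this storage convention, so the sketch below treats it as an assumption.

```python
def storage_reduction(compression_ratio, bits_per_pixel, container_bits=16):
    """Storage saving relative to an uncompressed archive.

    ASSUMPTION (not stated in the abstract): uncompressed pixels are stored in
    16-bit words, so a 10-bit image already carries 6 unused bits per pixel.
    """
    return compression_ratio * container_bits / bits_per_pixel


def uncompressed_megabytes(width, height, container_bits=16):
    return width * height * container_bits / 8 / 1e6


raw_mb = uncompressed_megabytes(1760, 2140)  # about 7.53 MB per image
factor = storage_reduction(6, 10)            # 6 * 16 / 10 = 9.6
print(f"{raw_mb:.2f} MB -> {raw_mb / factor:.2f} MB (factor {factor:.1f})")
```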
Effect of denoising on the consistency of manually defined borders
Xuli Zong, Edward A. Geiser, Donald A. Conetta M.D.
This paper presents experimental results and analysis from a study of the effect of denoising on the consistency and reliability of manually defined borders in echocardiograms. Denoising and image enhancement were performed on a test set of 60 sequences from an echocardiographic database exhibiting diverse image quality. Both endocardial and epicardial borders of end-diastolic and end-systolic frames were manually identified on original and enhanced images, based on image perception and wall-motion information. Statistical analysis was performed on the identified borders. The results show that denoising and image enhancement improve the consistency of the manually defined borders for this data set.
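The abstract does not name its consistency statistic. One simple possibility is the coefficient of variation of the area enclosed by repeated tracings of the same border. It is sketched here only as an illustration. The function names and the circular test borders are hypothetical.

```python
import math
import statistics


def polygon_area(points):
    """Shoelace formula for the area enclosed by a closed border [(x, y), ...]."""
    n = len(points)
    twice_area = sum(points[i][0] * points[(i + 1) % n][1]
                     - points[(i + 1) % n][0] * points[i][1] for i in range(n))
    return abs(twice_area) / 2.0


def coefficient_of_variation(values):
    """Population SD divided by the mean; 0 means perfectly repeatable tracings."""
    return statistics.pstdev(values) / statistics.fmean(values)


def circle(r, n=360, cx=0.0, cy=0.0):
    return [(cx + r * math.cos(2 * math.pi * k / n),
             cy + r * math.sin(2 * math.pi * k / n)) for k in range(n)]


# Hypothetical repeated tracings of one end-diastolic endocardial border (pixels).
tracings = [circle(40), circle(42), circle(39)]
areas = [polygon_area(t) for t in tracings]
print([round(a) for a in areas], round(coefficient_of_variation(areas), 3))
```

Under this measure, better consistency after denoising would appear as a smaller coefficient of variation for tracings made on the enhanced images.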
Clinical validation studies on clinical workstation prototypes
Bert Verdonck, Karel C. Strasters, Frans A. Gerritsen
This paper reports on our experience with setting up and analyzing clinical validation studies for new or improved medical image processing algorithms. This is illustrated with two specific examples: (1) clinical validation of a motion correction algorithm to improve the image quality of digital subtraction angiography (DSA) imaging and (2) clinical validation of an overview image reconstruction algorithm for translated X-ray image sequences (bolus chase reconstruction, spine imaging, colon image map or leg bone imaging).
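In its simplest textbook form, motion correction in DSA shifts the mask image to best match the live image before subtraction. The sketch below searches exhaustively over global integer-pixel shifts. It is not the algorithm validated in the paper. Clinical DSA software typically also uses logarithmic subtraction and sub-pixel or local registration.

```python
import random


def shift_image(img, dy, dx, fill=0.0):
    """Translate img by (dy, dx) pixels: out[y][x] = img[y - dy][x - dx]."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = img[sy][sx]
    return out


def best_shift(mask, live, max_shift=3):
    """Integer (dy, dx) minimising the mean squared live-mask difference on the overlap."""
    h, w = len(mask), len(mask[0])
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, n = 0.0, 0
            for y in range(max(0, dy), min(h, h + dy)):
                for x in range(max(0, dx), min(w, w + dx)):
                    d = live[y][x] - mask[y - dy][x - dx]
                    total += d * d
                    n += 1
            score = total / n
            if best is None or score < best[0]:
                best = (score, dy, dx)
    return best[1], best[2]


def subtract(live, mask):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(live, mask)]


rng = random.Random(0)
mask = [[rng.uniform(0, 1000) for _ in range(12)] for _ in range(12)]
live = shift_image(mask, 2, -1)  # simulated patient motion, no contrast agent
dy, dx = best_shift(mask, live)
residual = subtract(live, shift_image(mask, dy, dx))
print((dy, dx), max(abs(v) for row in residual for v in row))
```

Without the shift, the subtraction would leave motion artifacts. With the recovered shift, the residual of this synthetic pair vanishes.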
Effects of radiation dose and display contrast on low-contrast phantom image visibility
Charles C. Chamberlain, Walter Huda, Andrij R. Wojtowycz
Computed radiography (CR) images were generated of a low-contrast phantom containing 5 mm diameter disks. The radiation exposure incident on the imaging plate was varied from approximately 0.1 mR to approximately 10 mR, and the phantom images were printed to film using a range of display contrast settings. Changing the radiation exposure by two orders of magnitude had only a modest effect on disk detection performance (approximately 20%), much smaller than predicted by signal detection theory for the perception of noise-limited objects. For images generated at approximately 1 and 10 mR, increasing the display contrast markedly improved disk detection performance (approximately 50%). There was approximate agreement between the experimental data and the corresponding theoretical predictions for the detection of contrast-limited objects. For the contrast-detail phantom employed in this study, disk detection was primarily contrast limited, and image noise was relatively unimportant. Lesion detection with an anthropomorphic phantom containing a structured background would be unlikely to change this conclusion, since noise is expected to matter most for low-contrast objects viewed against a uniform background. Contrast enhancement, rather than increased radiation exposure, is therefore the method of choice for improving the detection of 5 mm diameter low-contrast lesions in CR images.
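The "noise-limited" prediction can be made concrete with the Rose model. The threshold contrast is C_T = k / sqrt(q·A), where q is the photon fluence (proportional to exposure), A is the disk area and k is a threshold signal-to-noise ratio, often quoted as about 5. The sketch below computes only ratios, so the fluence unit is arbitrary. It shows why a 100× exposure change should have shifted the threshold contrast tenfold under this model, far more than the roughly 20% change observed.

```python
import math


def rose_threshold_contrast(fluence_per_mm2, area_mm2, k=5.0):
    """Rose model: smallest detectable contrast C_T = k / sqrt(fluence * area)."""
    return k / math.sqrt(fluence_per_mm2 * area_mm2)


disk_area = math.pi * 2.5 ** 2  # 5 mm diameter disk, about 19.6 mm^2
f_per_mR = 1.0                  # arbitrary fluence units per mR; only ratios matter

ratio = (rose_threshold_contrast(0.1 * f_per_mR, disk_area)
         / rose_threshold_contrast(10 * f_per_mR, disk_area))
print(f"predicted C_T ratio, 0.1 mR vs 10 mR: {ratio:.1f}")
```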
Past and future of radiologic error
Paul J. Friedman
Although radiographs are extremely important in clinical medicine, their interpretation is susceptible to significant error: mistakes in detection and interpretation result in a false-negative rate of 30-40%. This figure has not changed despite nearly two generations of research.
Display Parameters and Performance II
Radiologists' ability to use computer-aided diagnosis (CAD) to improve breast biopsy recommendations
An important issue in developing computer-aided diagnosis (CAD) methods is demonstrating that a computer aid improves diagnostic accuracy when used by radiologists. A related issue is understanding how radiologists' decision-making is influenced by the computer aid. We conducted an observer performance study designed to address these issues for a computerized scheme that classifies clustered microcalcifications in mammograms as malignant or benign. Results showed that radiologists' diagnostic performance improved significantly when they used the computer aid. The results also showed that radiologists were able to incorporate the computer aid, a quantitative analysis of the mammogram, effectively in their decision-making.
Computer algorithm for automated detection and quantification of microaneurysms and hemorrhages (HMAs) in color retinal images
Samuel C. Lee, Yiming Wang, Elisa T. Lee
This paper presents a computer algorithm for automatic quantification of HMAs in color retinal images. The algorithm begins with an image quality test. If the image is determined to be useful (normal), image processing and pattern recognition techniques are then applied. The image processing techniques are designed to achieve three purposes: image enhancement, noise removal and, most importantly, image normalization. These steps are followed by the detection of (1) the optic disc and macula, (2) flame and blot hemorrhages, and (3) dot hemorrhages and microaneurysms. A special polar coordinate system centered at the macula is proposed. Such a coordinate system is particularly attractive for describing the location of a lesion relative to the center of the macula. In addition, it can be viewed as a 'spider net' and used to catch large hemorrhages, e.g., flame and blot hemorrhages, the way a spider net catches insects. The spider net, however, does not work for detecting microaneurysms and dot hemorrhages, because they are too small to be caught by the net. A method specially designed for detecting microaneurysms and dot hemorrhages is therefore presented. It uses a sequence of seven binary images obtained from the preprocessed, normalized image by automatic global thresholding, together with a set of matched filters that use only binary coefficients, to differentiate HMAs from blood vessels. Finally, the system prints a list of all HMAs detected, with their sizes and locations. More than four hundred color fundus photographs, including standard fundus photographs, were used to test the system. The sensitivity of the system can be adjusted by the user. Comparison of the computer-detected and quantified HMAs with manual counts showed that the results are quite satisfactory.
We therefore conclude that, with its sensitivity adjusted to match human experts, this system can provide an automatic, objective and repeatable way to quantify HMAs accurately.
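The abstract describes a polar coordinate system centered at the macula that also serves as a "spider net". The sketch below converts pixel positions to macula-centered polar coordinates and assigns each lesion to a (ring, sector) cell. The paper does not give its ring width or sector count, so those parameters and the helper names are hypothetical.

```python
import math


def macula_polar(x, y, macula):
    """(r, theta) of pixel (x, y) relative to the macula centre; theta in degrees [0, 360).

    Image rows grow downwards, so theta increases clockwise on screen.
    """
    dx, dy = x - macula[0], y - macula[1]
    return math.hypot(dx, dy), math.degrees(math.atan2(dy, dx)) % 360.0


def net_cell(x, y, macula, ring_width=50.0, n_sectors=8):
    """Hypothetical 'spider net' cell: (ring index, sector index)."""
    r, theta = macula_polar(x, y, macula)
    return int(r // ring_width), int(theta // (360.0 / n_sectors))


def lesions_per_cell(centroids, macula, **net):
    """Count detected lesion centroids falling in each net cell."""
    counts = {}
    for x, y in centroids:
        cell = net_cell(x, y, macula, **net)
        counts[cell] = counts.get(cell, 0) + 1
    return counts


print(lesions_per_cell([(110, 100), (120, 100), (100, 130)], (100, 100),
                       ring_width=25, n_sectors=8))
```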
ROC and Other Performance Methodologies
Using incomplete and imprecise localization data on images to improve estimates of detection accuracy
Richard G. Swensson, Glenn S. Maitz, Jill L. King, et al.
We tested new analytic procedures for combining an observer's image ratings of lesion likelihood with localization reports that are incomplete (unavailable on images rated as 'normal') and/or imprecise (possibly scored as 'correct' by chance), and for fitting a constrained ROC formulation to the rating data alone. Eight radiologist readers in a previous study had rated the likelihood of nodular lesions on each of 250 chest-film cases (39 with subtle nodules, 36 with 'typical' nodules and 175 normal cases) that were presented in two display modes (original films or video workstation). Ratings in the four positive categories (2 to 5) were accompanied by reports that grossly localized the suspected nodules to one of 7 film regions (upper, middle or lower portion of the left or right lung field, or retrocardiac), but there was no localization for cases rated as 'normal' (category 1). In each of 29 data sets, we estimated the area under the ROC curve (Az) and its standard error using three different fits: (1) the usual ROC formulation, (2) the constrained ROC formulation and (3) the new procedure that included incomplete and imprecise localization data (I&I). Estimates of Az from the usual and constrained ROC fits were quite similar unless the standard ROC exhibited an upward 'hook', but standard errors of Az were always the same or smaller for the constrained ROC fit. The I&I fit that included localization data often estimated Az to be either larger or smaller than the usual or constrained ROC fits that considered only the rating data, but its Az had substantially smaller standard errors in 28 of the 29 sets of observer data.
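As background for Az and the "hook", here is a sketch of the conventional binormal ROC model (the "usual ROC formulation" is assumed to be binormal; the abstract does not say so). In that model TPF = Φ(a + b·Φ⁻¹(FPF)) and Az = Φ(a / √(1 + b²)). When b ≠ 1, the fitted curve crosses the chance line near one end, which produces the hook. It does not implement the constrained or I&I fits described above.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def binormal_az(a, b):
    """Area under the binormal ROC curve TPF = Phi(a + b * Phi^-1(FPF))."""
    return STD_NORMAL.cdf(a / math.sqrt(1.0 + b * b))


def binormal_tpf(fpf, a, b):
    return STD_NORMAL.cdf(a + b * STD_NORMAL.inv_cdf(fpf))


print(round(binormal_az(2.0, 1.0), 4))       # equal-variance case, d' = 2
print(binormal_tpf(0.001, 1.0, 2.0) < 0.001)  # b > 1: curve dips below chance near the origin
```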
Differential receiver operating characteristic (DROC) method: rationale and results of recent experiments
Dev Prasad Chakraborty, Nelson Scott Howard, Harold L. Kundel
The Differential Receiver Operating Characteristic (DROC) method has recently been proposed as a more sensitive way of determining which of two modalities has the higher Az value. Unlike the Receiver Operating Characteristic (ROC) method, in which each image is interpreted on its own, the DROC experiment requires the reader to interpret pairs of images of the same patient, one from each of two modalities termed A and B. This study reports on the methodological improvements we have made, and the experiments we have conducted, since the method was last proposed. The original B-modality image set consisted of 60 digitized normal mammograms. We simulated abnormal images from these originals by superimposing speck-like patterns resembling clustered microcalcifications. Applying a wavelet compression-decompression program to these two image sets yielded the corresponding A-modality image sets. An image creation and display program with a user-friendly interface was written. Five readers interpreted the images in alternating ROC and DROC sessions. In the ROC sessions, the diagnosis (abnormal/normal) and an associated confidence level (0-100) were indicated for each image. In the DROC sessions, two decisions and associated confidence levels were indicated for each image pair: the diagnosis (abnormal/normal) and a preference (A/B) for the modality that yielded the higher confidence level for the diagnosis decision. Both DROC and ROC showed that Az(A) < Az(B), demonstrating that the observers could readily detect the degradation introduced by the compression. In addition, the DROC critical ratio was larger than the corresponding ROC critical ratio for all observers. This confirmed earlier published results, which were obtained with noise-addition processing and a non-clinical simulation.
The combined experiments continue to indicate that DROC has the potential advantage over ROC of increased sensitivity to image quality differences. Issues of bias and their effect on ROC and DROC readings are discussed, and suggestions are made for further improvements to the DROC methodology. The increased sensitivity potentially offered by DROC could transform future observer performance studies: fewer cases and readers may be needed for DROC studies to achieve power equivalent to that of ROC studies. DROC would thus enable more convenient testing of imaging modalities, allowing design engineers to optimize imaging variables more quickly.
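A critical ratio here is taken to be the difference in Az divided by its standard error, a z-statistic. The sketch below computes it and its two-sided p-value. It also shows why a larger critical ratio could translate into fewer cases. That step assumes the standard error shrinks as 1/√(number of cases), which is an assumption of the sketch. The numbers are hypothetical.

```python
import math


def critical_ratio(az_a, az_b, se_of_difference):
    """z-statistic for the difference between two modality Az values."""
    return (az_b - az_a) / se_of_difference


def two_sided_p(z):
    """Two-sided p-value for a standard normal z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def relative_cases_needed(cr_new, cr_old):
    """ASSUMPTION: SE of the difference scales as 1/sqrt(cases), so matching the old
    method's critical ratio needs (cr_old / cr_new)^2 times as many cases."""
    return (cr_old / cr_new) ** 2


z = critical_ratio(0.80, 0.86, 0.02)  # hypothetical Az values and SE
print(round(z, 2), round(two_sided_p(z), 4))
print(relative_cases_needed(3.0, 2.0))  # DROC CR 3.0 vs ROC CR 2.0, both hypothetical
```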
Monte Carlo validation of a multireader method for receiver operating characteristic discrete rating data: split-plot experimental design
Donald D. Dorfman, Kevin S. Berbaum, Russell V. Lenth, et al.
The major purpose of this paper was to evaluate the Dorfman/Berbaum/Metz (DBM) method for analyzing multireader receiver operating characteristic (ROC) discrete rating data from reader split-plot and case split-plot designs. It is not always appropriate or practical for readers to interpret imaging studies of the same patients in all modalities. In split-plot designs, either a different sample of readers or a different sample of cases is assigned to each modality. For each type of split-plot design, a series of null-case Monte Carlo simulations was conducted. The results suggest that the DBM method provides trustworthy alpha levels with discrete ratings when the ROC area is not too large and the case and reader sample sizes are not too small. In other situations, the test tends to be somewhat conservative. Our Monte Carlo simulations show that the DBM multireader method can be validly extended to reader split-plot and case split-plot designs.
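The logic of a null-case Monte Carlo check is to simulate data with no true difference, apply the test repeatedly, and compare the rejection rate with the nominal alpha. The sketch below applies that logic to a simple two-sample z-test. It does not reproduce the DBM procedure, which analyzes jackknife pseudovalues with ANOVA. The sample sizes and seed are arbitrary.

```python
import math
import random
import statistics


def z_test_p(x, y, sigma=1.0):
    """Two-sided p-value of a two-sample z-test with known common SD `sigma`."""
    diff = statistics.fmean(x) - statistics.fmean(y)
    se = sigma * math.sqrt(1.0 / len(x) + 1.0 / len(y))
    return math.erfc(abs(diff / se) / math.sqrt(2.0))


def empirical_alpha(test, n_reps=4000, n=20, alpha=0.05, seed=1):
    """Null-case Monte Carlo: both samples come from the same N(0, 1)
    distribution, so every rejection is a type I error."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if test(x, y) < alpha:
            rejections += 1
    return rejections / n_reps


alpha_hat = empirical_alpha(z_test_p)
print(f"empirical alpha = {alpha_hat:.3f} (nominal 0.05)")
```

A procedure is called "conservative" when its empirical alpha falls noticeably below the nominal level, as the abstract reports for some conditions.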
Gains in accuracy from averaging ratings of abnormality
Richard G. Swensson, Jill L. King, David Gur, et al.
Six radiologists used continuous scales to rate 529 chest-film cases for likelihood of five separate types of abnormalities (interstitial disease, nodules, pneumothorax, alveolar infiltrates and rib fractures) in each of six replicated readings, yielding 36 separate ratings of each case for the five abnormalities. Analyses for each type of abnormality estimated the relative gains in accuracy (area below the ROC curve) obtained by averaging the case-ratings across: (1) six independent replications by each reader (30% gain), (2) six different readers within each replication (39% gain) or (3) all 36 readings (58% gain). Although accuracy differed among both readers and abnormalities, ROC curves for the median ratings showed similar relative gains in accuracy. From a latent-variable model for these gains, we estimate that about 51% of a reader's total decision variance consisted of random (within-reader) errors that were uncorrelated between replications, another 14% came from that reader's consistent (but idiosyncratic) responses to different cases, and only about 35% could be attributed to systematic variations among the sampled cases that were consistent across different readers.
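As a rough illustration of why averaging across readers helps more than averaging within a reader, the sketch below takes the three variance fractions from the abstract. It assumes they are independent and additive, which the abstract does not state. Averaging then divides each component by the number of independent draws it receives, and the shared case component is never reduced. This is not the paper's latent-variable model and does not convert variances into Az gains.

```python
# Fractions of one reader's total decision variance, from the abstract.
CASE = 0.35         # systematic case effects shared by all readers
READER_CASE = 0.14  # a reader's consistent but idiosyncratic case responses
RANDOM = 0.51       # within-reader noise, uncorrelated between replications


def averaged_variance(n_readers, n_replications):
    """Variance of a case score averaged over n_readers x n_replications readings.

    ASSUMPTION: the three components are independent and additive.
    """
    return (CASE
            + READER_CASE / n_readers
            + RANDOM / (n_readers * n_replications))


for label, readers, reps in [("single reading", 1, 1),
                             ("1 reader x 6 replications", 1, 6),
                             ("6 readers x 1 replication", 6, 1),
                             ("6 readers x 6 replications", 6, 6)]:
    print(f"{label:28s} {averaged_variance(readers, reps):.3f}")
```

Under this simple model, the ordering of the remaining variance matches the ordering of the reported gains (30%, 39%, 58%).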
Dynamic viewing protocols for diagnostic image comparison
David H. Foos, Richard M. Slone, Bruce R. Whiting, et al.
There is an ongoing need to evaluate the impact of various digital image processing and display variables on diagnostic image quality. In most cases, evaluation involves comparing images, often multiple versions of the same image. To improve speed and sensitivity, new protocols were developed to enhance a radiologist's ability to detect subtle changes in images and to provide a standard means of quantifying differences. The protocols use rapid sequential display of registered images on a single high-resolution CRT (a.k.a. flicker) and 2X magnification to increase observer sensitivity. The flicker technique was implemented as an image comparison workstation (ICW) designed to facilitate the evaluation of different image processing options. The ICW provides interactive control of the flicker rate between image pairs (up to 5 Hz), the degree of image magnification (1X to 4X), and the selection of the region of interest (ROI). Three protocols were developed based on the flicker technique: two forced-choice protocols and a rank-ordering protocol employing a reference set composed of images with varying degrees of spatial-resolution degradation. All three protocols were exercised in an observer study whose goal was to establish visually lossless compression levels for JPEG and a wavelet-transform-based algorithm. The results indicate that, for high-resolution, digitally acquired posteroanterior (PA) chest radiographs presented to observers at 2X magnification on a 2K × 2.5K addressable-pixel monochrome display, the visually lossless thresholds for both JPEG and wavelet compression occur in the range of 1.5 to 2.0 bits per pixel (approximately 10:1). These results are a conservative estimate of the visually lossless threshold because of the sensitivity of the experimental methodology.
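In a forced-choice flicker trial, an observer who cannot see the compression artifacts is reduced to guessing. A natural test, assumed here rather than taken from the paper, is therefore an exact one-sided binomial test against chance. The decision rule and the counts below are hypothetical.

```python
from math import comb


def one_sided_binomial_p(correct, trials, chance=0.5):
    """P(X >= correct) when X ~ Binomial(trials, chance), i.e. pure guessing."""
    return sum(comb(trials, k) * chance ** k * (1 - chance) ** (trials - k)
               for k in range(correct, trials + 1))


def visually_lossless(correct, trials, alpha=0.05):
    """Hypothetical decision rule: call a compression level visually lossless
    when observers do not identify the compressed image better than chance."""
    return one_sided_binomial_p(correct, trials) >= alpha


print(visually_lossless(12, 20))  # 12/20 correct is consistent with guessing
print(visually_lossless(17, 20))  # 17/20 correct is not
```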
Perceptual Processes and Performance
Eye-position study of the effects of a verbal prompt and pictorial backgrounds on the search for lung nodules in chest radiographs
Harold L. Kundel, Calvin F. Nodine, Lawrence C. Toto
Peripherally inconspicuous nodules on chest radiographs are frequently missed by competent readers. To find a peripherally inconspicuous nodule, the reader must inspect the nodule site with central vision and decide whether the features at the site are sufficiently characteristic to report a nodule. The experiment reported here examined the effect of a nodule prompt, and of distraction by unrelated native abnormalities, on the location and recognition of inconspicuous lung nodules on chest x-ray images. On two occasions separated by 3 years, 4 radiologists had their eye position recorded while viewing 24 chest radiographs, 12 with prominent native abnormalities and 12 with no abnormalities. An inconspicuous nodule was simulated in the lungs of half of the radiographs at the first viewing and in the other half at the second viewing. For the first viewing, the readers were instructed to report any abnormalities. For the second viewing, the readers were told to report any abnormalities, including nodules. A nodule prompt triggers a scanning strategy that sends central vision to high-probability nodule sites early in search and at the same time relaxes the criteria used to evaluate nodule features, resulting in more true positives and false positives without a change in absolute detectability. Prominent native abnormalities, unrelated to nodules, do not affect the search strategy but competitively inhibit the nodule feature recognition mechanism.
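The finding of "more true positives and false positives without a change in absolute detectability" is the signature of a criterion shift in signal detection theory. The sketch below uses the standard equal-variance Gaussian model with a hypothetical d' and criteria. Lowering the criterion raises both rates, while d' recovered from the rates stays the same.

```python
from statistics import NormalDist

Z = NormalDist()


def rates(d_prime, criterion):
    """Equal-variance Gaussian model: noise ~ N(0, 1), signal ~ N(d', 1);
    the reader reports a nodule when the evidence exceeds `criterion`."""
    tpf = 1.0 - Z.cdf(criterion - d_prime)
    fpf = 1.0 - Z.cdf(criterion)
    return tpf, fpf


def d_prime_from(tpf, fpf):
    """Recover detectability from a (TPF, FPF) pair."""
    return Z.inv_cdf(tpf) - Z.inv_cdf(fpf)


strict = rates(1.5, 1.5)    # unprompted reading (hypothetical values)
relaxed = rates(1.5, 0.75)  # prompted reading: lower criterion
print([round(v, 3) for v in strict], [round(v, 3) for v in relaxed])
print(round(d_prime_from(*strict), 3), round(d_prime_from(*relaxed), 3))
```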
Medical image compression using attention modeling
This paper describes the use of a simple model of visual importance in digital images, based on the use of contrast statistics for prediction of areas of detail which influence observer attention, to allow image compression to be undertaken at varying quality within an individual image. The effects of this strategy on expert observer perception of the reconstructed images, based on subjective image quality judgment and analysis of eye-position patterns, are considered by applying the model to sample mammograms, cervical and thoracic X-rays. The results indicate that consistent observer performance can be attained with images compressed in this way, at compression rates in excess of those reported for uniformly compressed images.
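The abstract describes contrast statistics used to predict attention-attracting detail, which then sets the compression quality region by region. A minimal sketch of that idea follows, assuming block RMS contrast (SD / mean) as the statistic and a single threshold. The block size, threshold and quality levels are all hypothetical and are not the paper's model.

```python
import statistics


def block_contrast(img, y0, x0, size):
    """RMS contrast (SD / mean) of one size x size block."""
    values = [img[y][x] for y in range(y0, y0 + size) for x in range(x0, x0 + size)]
    mean = statistics.fmean(values)
    return statistics.pstdev(values) / mean if mean else 0.0


def quality_map(img, size=8, threshold=0.05, high_q=90, low_q=50):
    """Hypothetical per-block quality setting: high quality where local
    contrast suggests detail likely to attract attention, low elsewhere."""
    h, w = len(img), len(img[0])
    return [[high_q if block_contrast(img, y, x, size) > threshold else low_q
             for x in range(0, w - size + 1, size)]
            for y in range(0, h - size + 1, size)]


# Toy 16x16 image: flat left half, fine checkerboard texture on the right.
img = [[100 if x < 8 else (120 if (x + y) % 2 == 0 else 80) for x in range(16)]
       for y in range(16)]
print(quality_map(img))
```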
Development of the eye-movement response in the trainee radiologist
David S. Wooding, Geraint M. Roberts, Jane Phillips-Hughes
To explore the initial response of the visual system to radiological images in groups of individuals with increasing degrees of radiological training and experience, the locations of fixations made during visual inspection of digitized chest radiographs were examined for 4 groups of observers: 10 experienced radiologists, 9 first-year 'novice' radiologists, 11 'trainee' radiologists in the second and third years of their training, and 7 naive controls. Each observer viewed 12 digitized chest radiographs (6 normal and 6 showing some abnormality) on a VDU for 8 s each. Eye movements were recorded throughout, and observers indicated via a button box whether they thought each radiograph was normal or abnormal. A least squares index was used to quantify the similarity in fixation location between pairs of eye movement traces over the first 1.5 and 3 seconds of an inspection. The resulting similarities were then averaged to give intra- and inter-group similarities in fixation location. The fixation locations of experienced radiologists were found to be highly similar as a group, as were those of the novices. While the fixation locations of controls showed less similarity, it was the fixations of trainees that were the least similar (i.e. showed the most variability) within their group. The fixation locations of novices were more similar to those of radiologists than the controls' were, and less similar to those of controls than the controls' were to each other. However, rather than showing that the fixation locations of individuals become increasingly similar to those of radiologists as training progresses, the data show that the more variable fixation locations of trainees are less similar to those of radiologists than are those of any other group, even the controls. Control observers examine everyday images in a similar way to one another, and the same holds for radiological images.
Experienced radiologists view radiological images in a similar way to each other, but their training has made them differ from controls. In becoming experienced radiologists, trainees may move through a developmental phase characterized by more idiosyncratic eye movements, which become less similar to those of controls or experienced radiologists than they previously were. With experience, the eye movements of trainee radiologists may become more similar to those of both groups, but the transition from novice to experienced radiologist is not a simple one: it involves a period of some disorder.
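The abstract does not define its least squares index. The sketch below is one plausible reading, offered only as an illustration. It computes the mean squared distance from each fixation to the nearest fixation in the other trace, symmetrized, restricted to fixations starting within a time window, and averaged over all pairs in a group. Lower values mean more similar fixation locations. The function names and toy coordinates are hypothetical.

```python
import itertools
import math


def lsq_index(trace_a, trace_b):
    """Hypothetical least-squares dissimilarity between two fixation traces:
    mean squared distance from each fixation to the nearest fixation in the
    other trace, averaged over both directions (0 means identical locations)."""
    def one_way(ps, qs):
        return sum(min(math.dist(p, q) ** 2 for q in qs) for p in ps) / len(ps)
    return 0.5 * (one_way(trace_a, trace_b) + one_way(trace_b, trace_a))


def within_window(trace, t_max):
    """Keep (x, y) of fixations (x, y, onset_s) that start before t_max seconds."""
    return [(x, y) for x, y, t in trace if t < t_max]


def mean_within_group(traces):
    """Average pairwise dissimilarity within a group (lower = more alike)."""
    pairs = list(itertools.combinations(traces, 2))
    return sum(lsq_index(a, b) for a, b in pairs) / len(pairs)


tight = [[(100, 100), (200, 150)], [(102, 98), (198, 152)], [(99, 101), (203, 149)]]
spread = [[(100, 100), (200, 150)], [(300, 40), (50, 220)], [(180, 260), (20, 30)]]
print(round(mean_within_group(tight), 1), round(mean_within_group(spread), 1))
```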
Chronometric analysis of mammography expertise
Claudia Mello-Thoms, Calvin F. Nodine, Harold L. Kundel
This paper studies the effects of training and experience on decision time and performance in mammography. We compared the performance of three groups of observers representing different levels of expertise (dedicated breast imagers, or mammographers; radiology residents undergoing a mammography rotation; and mammography technologists) when reading a test set containing benign and malignant lesions as well as lesion-free images. We show that the number of cases read significantly affects performance, as measured by the area under the AFROC curve. We also show that different levels of expertise have different decision structures during the time course of image viewing. In particular, mammographers should stop reading an image after 60-80 seconds, because by that point they have found all of the true targets present, and thereafter they are much more likely to make a mistake. Residents and technologists, on the other hand, are plagued by mistakes throughout the time course of image viewing.
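The abstract does not describe its time-course analysis in detail. One simple way to look at decision structure over viewing time is to tabulate cumulative true- and false-positive reports per time bin and count errors made after the last true target was found. The sketch below does this with a hypothetical report log.

```python
def cumulative_by_time(decisions, bin_s=20, t_max=120):
    """Cumulative counts of true- and false-positive reports by viewing time.

    decisions: list of (time_in_seconds, "TP" or "FP") for one observer group.
    """
    rows = []
    for edge in range(bin_s, t_max + 1, bin_s):
        tp = sum(1 for t, kind in decisions if kind == "TP" and t <= edge)
        fp = sum(1 for t, kind in decisions if kind == "FP" and t <= edge)
        rows.append((edge, tp, fp))
    return rows


def errors_after_last_hit(decisions):
    """False positives reported after the final true positive was found."""
    hits = [t for t, kind in decisions if kind == "TP"]
    if not hits:
        return 0
    last_hit = max(hits)
    return sum(1 for t, kind in decisions if kind == "FP" and t > last_hit)


# Hypothetical report log, for illustration only.
log = [(12, "TP"), (25, "TP"), (40, "FP"), (55, "TP"), (90, "FP"), (110, "FP")]
for row in cumulative_by_time(log):
    print(row)
print("FPs after last TP:", errors_after_last_hit(log))
```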
Influence of monitor luminance and tone scale on observers' search and dwell patterns
The goal of this study was to measure the influence of monitor display luminance and tone scale on diagnostic performance and visual search behavior. Radiologists viewed 50 pairs of mammograms in two experiments. The first experiment changed the monitor's tone scale from its default (non-linearized) scale to the DICOM Barten scale (perceptually linearized). The second experiment compared an 80 ftL monitor with a 140 ftL monitor. Eye position was recorded. Performance was higher with the Barten scale than with the default curve, and higher with the 140 ftL monitor than with the 80 ftL monitor. Viewing time did not differ between tone scales, but was significantly shorter at 140 ftL than at 80 ftL. The number of fixation clusters generated was higher for the 80 ftL monitor and the default tone scale; the difference was significant for lesion-free images. Median decision dwell times were longer in the default and 80 ftL conditions. Display luminance and tone scale thus affect diagnostic and search performance when monitors are used. Lower luminance levels and a non-linearized display prolong search and recognition of normal, lesion-free areas compared with lesion-containing areas: radiologists have more trouble deciding that areas are truly normal. We recommend that monitor luminance and choice of tone scale be taken into account when deciding to use CRT monitors for viewing radiographic images in the clinical environment.
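The DICOM "Barten" tone scale is the Grayscale Standard Display Function (GSDF) of DICOM PS3.14. It maps a just-noticeable-difference (JND) index j = 1..1023 to luminance, from about 0.05 to about 4000 cd/m². The sketch below evaluates the GSDF, inverts it by bisection, and counts the JNDs a display can span. This is one way to see why a brighter monitor offers more perceptible grey levels. The minimum luminance is a hypothetical value, because the abstract reports only maximum luminance.

```python
import math

# DICOM PS3.14 Grayscale Standard Display Function (Barten model) coefficients.
A, B, C, D, E = -1.3011877, -2.5840191e-2, 8.0242636e-2, -1.0320229e-1, 1.3646699e-1
F, G, H, K, M = 2.8745620e-2, -2.5468404e-2, -3.1978977e-3, 1.2992634e-4, 1.3635334e-3
CD_M2_PER_FTL = 3.4262591  # 1 foot-lambert expressed in cd/m^2


def gsdf_luminance(j):
    """Luminance in cd/m^2 at JND index j (1..1023)."""
    x = math.log(j)
    num = A + C * x + E * x ** 2 + G * x ** 3 + M * x ** 4
    den = 1 + B * x + D * x ** 2 + F * x ** 3 + H * x ** 4 + K * x ** 5
    return 10 ** (num / den)


def jnd_index(luminance, lo=1.0, hi=1023.0):
    """Invert the (monotonic) GSDF by bisection, returning a fractional index."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if gsdf_luminance(mid) < luminance:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def jnds_available(l_min, l_max):
    return jnd_index(l_max) - jnd_index(l_min)


L_MIN = 1.0  # hypothetical minimum luminance in cd/m^2; not given in the abstract
for ftl in (80, 140):
    l_max = ftl * CD_M2_PER_FTL
    print(f"{ftl} ftL = {l_max:.0f} cd/m^2 -> {jnds_available(L_MIN, l_max):.0f} JNDs")
```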
Breast cancer screening: comparison of radiologists' performance in a self-assessment scheme and in actual breast screening
Helen C. Cowley, Alastair G. Gale
The PERFORMS self-assessment scheme is used by the UK Breast Screening Programme as an educational tool. From this scheme, radiologists can gain insight into their own sensitivity, specificity, feature detection and cancer detection performance. Such data may, however, be questionable if they are not well related to the radiologist's performance in actual breast screening. Data from the scheme were therefore compared with data on actual breast screening performance. Some correlations in performance were found, indicating that continued use of the scheme is important for identifying areas of individual difficulty.
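Comparing scheme and screening performance comes down to per-reader sensitivity and specificity plus a correlation across readers. The sketch below uses Pearson's r as the correlation, an assumption since the abstract does not name the statistic used. The per-reader values are invented, not PERFORMS data.

```python
import math


def sensitivity(tp, fn):
    return tp / (tp + fn)


def specificity(tn, fp):
    return tn / (tn + fp)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented per-reader values, for illustration only (not PERFORMS data).
scheme_sensitivity = [sensitivity(18, 2), sensitivity(15, 5), sensitivity(17, 3)]
screening_sensitivity = [0.82, 0.70, 0.79]
print(round(pearson_r(scheme_sensitivity, screening_sensitivity), 2))
```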
Technology Assessment and Observer Performance I
Psychophysical evaluation of the image quality of a dynamic flat-panel digital x-ray image detector using the threshold contrast detail detectability (TCDD) technique
We are currently in an era of active development of the digital X-ray imaging detectors that will serve the radiological community in the new millennium. Rigorous comparative physical evaluations of such devices are therefore becoming increasingly important from both the technical and clinical perspectives. The authors have been actively involved in the evaluation of a clinical demonstration version of a flat-panel dynamic digital X-ray image detector (FDXD). Results of the objective physical evaluation of this device have been presented elsewhere at this conference. The imaging performance of the FDXD under radiographic exposure conditions has been reported previously; this paper presents a psychophysical evaluation of the FDXD operating under continuous fluoroscopic conditions. The evaluation technique employed was the threshold contrast detail detectability (TCDD) technique, which enables image quality to be measured on devices operating in the clinical environment. This approach addresses image quality in the context of both the image acquisition and display processes, and uses human observers to measure performance. The Leeds test objects TO[10] and TO[10+] were used to obtain comparative measurements of performance on the FDXD and two digital spot fluorography (DSF) systems, one using a Plumbicon camera and the other a state-of-the-art CCD camera. Measurements were taken at detector entrance exposure rates of 6, 12, 25 and 50 µR/s. To facilitate comparisons between the systems, all fluoroscopic image processing, such as noise reduction, was disabled during the experiments. At the highest dose rate, the FDXD significantly outperformed the DSF comparison systems in the TCDD comparisons. At 25 and 12 µR/s all three systems performed equivalently, and at the lowest exposure rate the FDXD was inferior to the two DSF systems.
At standard fluoroscopic exposures, the FDXD performed equivalently to the DSF systems in the TCDD comparisons. This suggests that the FDXD would perform adequately in a clinical fluoroscopic environment, and our initial clinical experience supports this. Noise reduction processing of the fluoroscopic data acquired on the FDXD was also found to further improve its TCDD performance. The FDXD therefore combines acceptable fluoroscopic performance with excellent radiographic (snapshot) imaging fidelity, raising the possibility of developing a universal X-ray detector based on FDXD technology. It is also envisaged that fluoroscopic performance will be improved by digital image enhancement techniques specifically tailored to the characteristics of the FDXD detector.
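Results from Leeds TO[10]-type test objects are often summarised by a threshold detection index H_T = 1 / (C_T·√A), where C_T is the threshold contrast and A is the detail area in mm². Higher H_T means better low-contrast detectability. We believe this is the conventional definition, but it should be checked against the test-object documentation. The threshold contrasts below are invented, not readings from the paper.

```python
import math


def detail_area_mm2(diameter_mm):
    return math.pi * (diameter_mm / 2.0) ** 2


def threshold_index(threshold_contrast, diameter_mm):
    """H_T = 1 / (C_T * sqrt(A)); a higher H_T means better low-contrast detectability."""
    return 1.0 / (threshold_contrast * math.sqrt(detail_area_mm2(diameter_mm)))


def index_curve(measurements):
    """measurements: list of (diameter_mm, threshold_contrast) for one system and exposure rate."""
    return [threshold_index(c, d) for d, c in measurements]


# Invented readings at one exposure rate (not data from the paper).
system_a = [(0.5, 0.20), (1.0, 0.08), (2.0, 0.03), (4.0, 0.015)]
system_b = [(0.5, 0.25), (1.0, 0.10), (2.0, 0.04), (4.0, 0.020)]
better_everywhere = all(ha > hb for ha, hb in zip(index_curve(system_a), index_curve(system_b)))
print(better_everywhere)
```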
Evaluation of angiograms obtained from a laser-based x-ray source in DESA regime
Ernest M. Scalzetti, Andrzej Krol, George M. Gagne, et al.
The contrast resolution of angiograms created using a laser-based x-ray source in the Dual Energy Subtraction Angiography (DESA) regime has been investigated and compared with the contrast of angiograms obtained using an x-ray tube-based clinical angiography unit in DSA mode. Contrast-detail phantoms and rats with opacified vascular structures were imaged. A table-top terawatt laser was used (10¹⁹ W cm⁻², 150 fs or 450 fs per pulse). For iodine contrast agent, an iodine filter was used with a BaF2 target to obtain images with mean x-ray energy below the iodine K-edge, and a La target with a La filter was used to obtain images with mean x-ray energy above the iodine K-edge. For barium contrast agent, an Nd filter was used with an Nd target to obtain images with mean x-ray energy below the Ba K-edge, and a Gd target with an Nd filter was used to obtain images with mean x-ray energy above the Ba K-edge. Laser-based DESA with properly selected targets was found to provide better contrast than standard x-ray tube-based DSA. We conclude that the laser-based x-ray source shows promise for angiography in the DESA regime, provided that sufficient x-ray flux can be delivered by the laser.
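The target choices work because each pair brackets the contrast agent's K-edge. The lower-energy target emits just below the edge, where the agent attenuates weakly. The higher-energy target emits just above it, where attenuation jumps. The check below uses tabulated K-edge and Kα1 line energies as a simplification. Real laser-plasma spectra also contain Kβ lines and a continuum, which the filters shape, and the abstract refers to mean energies rather than line energies.

```python
# Tabulated energies in keV (standard X-ray data).
K_EDGE = {"I": 33.169, "Ba": 37.441}
K_ALPHA1 = {"Ba": 32.194, "La": 33.442, "Nd": 37.361, "Gd": 42.996}

# (below-edge target element, above-edge target element) per contrast agent,
# as listed in the abstract; Ba stands in for the BaF2 target.
TARGET_PAIRS = {"I": ("Ba", "La"), "Ba": ("Nd", "Gd")}


def brackets_k_edge(agent):
    """True if the targets' K-alpha1 lines fall on either side of the agent's K-edge."""
    low, high = TARGET_PAIRS[agent]
    return K_ALPHA1[low] < K_EDGE[agent] < K_ALPHA1[high]


for agent, (low, high) in TARGET_PAIRS.items():
    print(agent, brackets_k_edge(agent),
          f"{K_EDGE[agent] - K_ALPHA1[low]:.3f} keV below, "
          f"{K_ALPHA1[high] - K_EDGE[agent]:.3f} keV above")
```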
LROC analysis of human detection performance in PET and time-of-flight PET
Previous investigations of time-of-flight positron emission tomography (TOFPET) have shown that stochastic image noise can be reduced when the reconstruction accounts for differences in the detection times of coincident photons. Factors that influence this reduction include the sensitivity and the spatial and temporal resolution of the TOFPET detectors. Within a simplified time-of-flight imaging model, we considered how these factors affect task performance for human observers. The task was detection of mediastinal 'hot' tumors in simulated chest images. Fourteen simulated TOFPET systems and two simulated PET systems were considered. Images were reconstructed using filtered backprojection (FBP) for PET and a modified FBP for TOFPET. We used localization receiver operating characteristic (LROC) methodology, in which observers must both detect and locate the tumors. The LROC study gives insight into how TOFPET detector characteristics would need to improve for observer task performance to be on a par with PET. Our results were also compared with a theoretical result from the literature.
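LROC data can be summarised with a nonparametric area under the LROC curve. This estimator counts (signal-present, signal-absent) image pairs in which the signal-present image received the higher rating and the reported location was correct. Ties count one half. It is offered as one common estimator under that assumption; the abstract does not say which LROC analysis the authors used. The function name `lroc_area` and the toy ratings are hypothetical.

```python
import numpy as np

def lroc_area(scores_present, loc_correct, scores_absent):
    """Nonparametric area under the LROC curve.

    A (present, absent) pair scores 1 if the signal-present image is rated higher
    AND its reported location was correct, 1/2 for a rating tie with correct
    location, and 0 otherwise.
    """
    sp = np.asarray(scores_present, dtype=float)[:, None]
    sa = np.asarray(scores_absent, dtype=float)[None, :]
    wins = (sp > sa) + 0.5 * (sp == sa)               # Heaviside step, ties = 1/2
    hit = np.asarray(loc_correct, dtype=bool)[:, None]
    return float((wins * hit).mean())

# Hypothetical ratings: three tumor-present images (one mislocalised), two tumor-absent images.
a_lroc = lroc_area([0.9, 0.8, 0.4], [True, False, True], [0.5, 0.3])
a_roc = lroc_area([0.9, 0.8, 0.4], [True, True, True], [0.5, 0.3])
```

When every localization is correct the estimator reduces to the ordinary Wilcoxon ROC area (5/6 here). The mislocalised case lowers the LROC area to 0.5.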
Diagnostic performance and image quality assessment in teledermatology
Elizabeth A. Krupinski, Behjamin W. LeSueur, Lansing G. Ellsworth M.D., et al.
Commercially available compact digital cameras make digital photography available for telemedicine. The goal of this study was to evaluate the accuracy of dermatological diagnoses based on digital camera photos, compared with in-person diagnoses. A total of 308 subjects were recruited from a university dermatology clinic. Each patient was examined in person by one of three dermatologists, who provided the clinical diagnosis. Digital photos were then obtained for all patients. The three dermatologists reviewed the images on a computer monitor and provided a diagnosis and a confidence rating. Agreement between in-person and digital photo diagnoses was 80%. Intra-dermatologist agreement averaged 84%. Using the photo images, decision confidence was rated as very definite to definite 70% of the time. Agreement between monitor readings and biopsy results averaged about 75%. Image resolution and color were rated as good to excellent 83% and 93% of the time, respectively. Store-and-forward teledermatology using digital photography yields high-quality images, and its diagnostic accuracy rates correlate well with in-person clinical diagnoses and biopsy results.
Classification of mammographic patterns: beyond fraction of dense tissue
Philip F. Judy, Richard Nawfel, Francine L. Jacobson, et al.
Women whose mammograms radiologists classify as dense have been found to have an increased risk of breast cancer. This investigation asked whether human readers are willing and able to reliably compare five attributes of pairs of mammograms matched by a quantitative estimate of the fraction of dense tissue (FDT). Forty pairs of CC projections were digitized and presented on a computer workstation. The two mammograms in each pair had the same FDT as measured by a visual threshold procedure, and each breast image was from a different woman. Readers rated the differences in five attributes:

(1) fraction of dense tissue;
(2) fraction of the dense tissue that is homogeneous;
(3) fraction of ductal dense tissue;
(4) prominence of scalloping of dense tissue;
(5) prominence of subareolar structures.

The ratings were replicated to evaluate their reliability. Spearman rank-order correlations of the replicated measurements ranged from 0.65 to 0.89 (p < 0.0001). Homogeneous dense tissue ratings were negatively correlated with ductal dense tissue ratings (-0.59, p = 0.0001). The scalloping prominence rating was not significantly correlated with the other attributes. Ratings of all attributes except scalloping were significantly correlated with differences in the mean gray level of the breast parenchyma. Readers can make reliable judgments about differences in the attributes of mammograms matched by FDT. The negative correlation between the homogeneous dense and ductal dense tissue ratings suggests that these two tissue types compete for perceived dense breast area. Scalloping is uncorrelated with the other image attributes, so further investigation of scalloping as an independent breast-cancer risk factor is warranted.
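Reliability of replicated ratings was summarised with Spearman rank-order correlations. Spearman's rho is the Pearson correlation of the ranks, with tied values given their average rank. The sketch below uses plain NumPy for transparency; `scipy.stats.spearmanr` computes the same statistic. The rating vectors are hypothetical.

```python
import numpy as np

def average_ranks(x):
    """Ranks 1..n, with tied values replaced by their mean rank."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind="mergesort")] = np.arange(1, len(x) + 1)
    for v in np.unique(x):
        tie = x == v
        ranks[tie] = ranks[tie].mean()
    return ranks

def spearman_rho(a, b):
    """Spearman rank-order correlation = Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])

# Hypothetical first and replicated ratings of five mammogram pairs.
rho = spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```

Two adjacent swaps in five items give rho = 1 - 6·4/(5·24) = 0.8. Because only ranks matter, any monotone rescaling of a reader's rating scale leaves rho unchanged.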
Technology Assessment and Observer Performance II
Effect of monitor luminance on the detection of a solitary pulmonary nodule: ROC analysis
Koun-Sik Song, Jin Seong Lee, Hae Young Kim, et al.
We compared the detectability of solitary pulmonary nodules (SPNs) in chest radiographs displayed on gray-scale monitors of different luminance. From the long-term archive of the Asan Medical Center PACS, 40 normal chest PA images and 40 chest PA images with SPNs were retrieved into short-term storage. All chest PA images were acquired on a Fuji FCR 9501 or 9500 HQ, down-sampled from 4k to 2k pixel resolution, and archived to ODJ with a 10:1 compression ratio. The mean nodule diameter was 12 mm (range 8 to 20 mm). Nodules were located within the free lung fields (10 cases), overlapped the ribs (13 cases), or overlapped the hilum, heart, or subphrenic areas (17 cases). The monitors compared were the Image Systems M21P2KHBMAX with 100 fL brightness and the M21PMAX with 65 fL brightness. After randomization, eight board-certified radiologists independently recorded the presence or absence of nodules on a worksheet. All radiologists first interpreted the images on the low-brightness monitors and, 10 days later, the images on the high-brightness monitors. Data were gathered using five rating categories, and ROC analysis was used to compare the area under the ROC curve for the low- and high-brightness monitors. The mean area under the ROC curve was 0.8597 for the low-brightness monitor and 0.8734 for the high-brightness monitor. The high-brightness monitor was slightly superior, but the difference was not statistically significant (p = 0.3). Further studies are required on other subtle lung diseases and on long-term physiological effects.
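With five rating categories, a simple nonparametric estimate of the area under the ROC curve is the Wilcoxon-Mann-Whitney statistic. It equals the trapezoidal area under the empirical ROC curve. The abstract does not say whether a binormal fit or a nonparametric method was used, so treat this as an illustrative alternative, not the authors' analysis. The ratings below are hypothetical.

```python
import numpy as np

def empirical_auc(ratings_abnormal, ratings_normal):
    """Wilcoxon-Mann-Whitney estimate of ROC area from ordinal (e.g. 1-5) ratings.

    Equals the probability that a nodule case is rated higher than a normal case,
    counting rating ties as one half.
    """
    a = np.asarray(ratings_abnormal, dtype=float)[:, None]
    n = np.asarray(ratings_normal, dtype=float)[None, :]
    return float(((a > n) + 0.5 * (a == n)).mean())

# Hypothetical five-category ratings (5 = definitely nodule) from one reader.
auc = empirical_auc([5, 4, 4, 3, 2], [1, 2, 2, 3, 1])
```

Coarse rating scales produce many ties, which is why the half-credit term matters. Here 22.5 of the 25 case pairs are ordered correctly, giving an area of 0.9.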
Observer performance using CRT monitors with different phosphors
Hans Roehrig, Elizabeth A. Krupinski, Mahesh Sivarudrappa
The goal of this study was to compare observer performance on two monitors, one with a P45 phosphor and the other with a P104 phosphor. The two phosphors have distinctly different physical properties that, among other things, affect the noise properties of the display. Noise differences change the signal-to-noise ratio and hence may significantly affect observers' detection performance. A complete physical analysis was performed on the two monitors, and a JND study was conducted to measure observer performance. A series of grating patterns was generated for display on the two monitors. Observers reported whether each pattern was vertical, horizontal, or blank. Observer performance was better with the P45 than with the P104 phosphor. This result was supported by the physical evaluation, which showed poorer results for the P104 on various parameters. The type of phosphor therefore affects the physical characteristics of the display, which in turn affect observer performance. Two clinical areas that might be particularly affected by phosphor differences are nodule detection in chest images and mass detection in mammograms. Low-contrast targets are the most affected by the differences in noise (SNR) between monitor phosphors.
Web-based tool for subjective observer ranking of compressed medical images
Steven G. Langer, Brent K. Stewart, Rex K. Andrew
While evaluating various compression schemes for ultrasound teleradiology applications, we found that paper-based data collection was time-consuming and error-prone. We sought a method that would let participating radiologists view the ultrasound video clips (compressed to varying degrees) at their desks. The method also had to let observers enter their evaluations and, when finished, automatically submit the data to our statistical analysis engine. The World Wide Web offered a ready solution. We built a web page containing 18 embedded AVI video clips. The clips represent 6 distinct anatomical areas, compressed by various methods and amounts and randomly distributed through the page. Beside each video, a series of questions asks observers to rank (1 to 5) their ability to answer diagnostically relevant questions. When finished, the observer presses 'Submit', and a tab-delimited text file is created that can be imported into an Excel workbook. Kappa analysis is then performed, and the resulting plots show observer preferences.
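The tab-delimited submission format makes the analysis step easy to script. The sketch below reads such a file with the standard `csv` module and computes an unweighted Cohen's kappa between two observers. The column names (`clip`, `obs_A`, `obs_B`) and the use of unweighted kappa are assumptions, because the abstract specifies neither the file layout nor the kappa variant.

```python
import csv
import io
from collections import Counter

def cohens_kappa(r1, r2):
    """Unweighted Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(r1)
    p_obs = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    # Agreement expected by chance from each rater's marginal frequencies.
    p_exp = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical tab-delimited submission: one row per clip, one column per observer.
tsv = "clip\tobs_A\tobs_B\n1\t5\t5\n2\t4\t4\n3\t3\t4\n4\t2\t2\n"
rows = list(csv.DictReader(io.StringIO(tsv), delimiter="\t"))
kappa = cohens_kappa([int(r["obs_A"]) for r in rows], [int(r["obs_B"]) for r in rows])
```

Here raw agreement is 75% and chance agreement is 25%, so kappa is 2/3. For ordinal 1-to-5 rankings, a weighted kappa that gives partial credit to near-misses may be more appropriate.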
Modeling Visual Signal Detection I
Evaluation of keyhole MR imaging with a human visual response model
Kyle A. Salem, Jeffrey L. Duerk, Michael Wendt, et al.
As a first step toward a methodology for optimizing the many parameters in keyhole and other fast imaging techniques, we applied an accepted human visual system (HVS) perceptual difference model to simulated keyhole images. A series of 'gold-standard' full k-space images was acquired during the insertion of a needle into ex vivo bovine liver. Keyhole imaging increases the image frame rate by sub-sampling k-space; it was simulated from these image data. The HVS model created a map of the likelihood of visible differences between a simulated keyhole image and the corresponding full k-space acquisition. Visible-difference degradation was compared with a mean squared error (MSE) metric for both entire images and regions of interest around the needle tip. The HVS model's output, a spatial map of perceptual differences, proved useful because it accurately located image differences. According to the perceptual model, a k-space stripe parallel to the direction of insertion best preserved the quality of the entire image. For a region of interest around the needle, a perpendicular stripe gave the lowest image error. The HVS model agreed well with informal human inspection. For example, high-frequency noise in the image substantially changes the MSE metric, yet neither the visual model nor inspection shows a perceivable image difference. Inspection also confirmed that the direction of k-space sub-sampling matters. Examination of rotated k-space stripes showed that a step of 45 degrees is preferred: larger steps caused high initial error, while smaller steps took too long to traverse k-space. Our experience indicates that the HVS model is a promising, objective tool for the automated evaluation and optimization of keyhole imaging sequences. It may provide a rational method for optimizing the many potential techniques and the essentially unlimited parameter space of fast MR imaging.
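The keyhole simulation can be sketched as k-space substitution. The central stripe of k-space comes from the current (dynamic) frame and the periphery from the reference acquisition. The stripe's orientation is chosen by the axis along which it spans the centre. This minimal sketch assumes Cartesian k-space and magnitude images. It computes only the MSE comparison, not the HVS model; the function names and the toy "needle" are hypothetical.

```python
import numpy as np

def keyhole_frame(reference_img, current_img, width, axis=0):
    """Keyhole reconstruction: central k-space stripe from the current frame,
    periphery from the reference. `axis` selects the stripe orientation."""
    k_ref = np.fft.fftshift(np.fft.fft2(reference_img))
    k_cur = np.fft.fftshift(np.fft.fft2(current_img))
    lo = k_ref.shape[axis] // 2 - width // 2
    stripe = [slice(None), slice(None)]
    stripe[axis] = slice(lo, lo + width)
    k_key = k_ref.copy()
    k_key[tuple(stripe)] = k_cur[tuple(stripe)]        # update only the central stripe
    return np.abs(np.fft.ifft2(np.fft.ifftshift(k_key)))

def mse(a, b):
    return float(np.mean((a - b) ** 2))

rng = np.random.default_rng(0)
ref = rng.uniform(0.0, 1.0, size=(64, 64))   # hypothetical pre-insertion frame
cur = ref.copy()
cur[20:44, 31:33] += 1.0                     # hypothetical needle inserted along axis 0
err_axis0 = mse(keyhole_frame(ref, cur, 16, axis=0), cur)
err_axis1 = mse(keyhole_frame(ref, cur, 16, axis=1), cur)
```

Comparing `err_axis0` and `err_axis1` (and their values in a region around the needle tip) mirrors the stripe-orientation comparison described above. A full-width stripe reproduces the current frame exactly, and a zero-width stripe returns the reference.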
Effect of image compression in model and human performance
We applied three model observers to predict the effect of JPEG image compression on human visual detection of a simulated lesion (clinically known as a thrombus) in single-frame digital x-ray coronary angiograms. The models were a non-prewhitening matched filter with an eye filter (NPWE), the Hotelling observer, and the channelized Hotelling observer. Because the model observers' absolute performance exceeds that of humans, model performance was degraded to match human performance by injecting internal noise proportional to the external noise. All three model observers predicted the degradation in human performance as a function of JPEG compression reasonably well. The NPWE and channelized Hotelling models (with internal noise proportional to the external noise) were better predictors than the Hotelling model.
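For any linear model observer with template w, detectability is d' = wᵀΔs / sqrt(Var(wᵀn)). If the internal noise variance is a fixed fraction α of the external-noise variance of the template response, d' simply drops by 1/sqrt(1 + α). The sketch shows this for a non-prewhitening template (w = Δs, without the eye filter) and a Hotelling template (w = K⁻¹Δs). The 1D covariance, lesion profile and α are hypothetical; channelized Hotelling is not shown.

```python
import numpy as np

def linear_dprime(template, delta_s, cov, alpha=0.0):
    """d' of a linear observer; internal noise variance = alpha * external variance."""
    mean_diff = template @ delta_s           # mean response difference, signal vs. no signal
    ext_var = template @ cov @ template      # response variance due to image noise
    return float(mean_diff / np.sqrt(ext_var * (1.0 + alpha)))

def hotelling_template(delta_s, cov):
    """Hotelling (prewhitening) template w = K^-1 delta_s."""
    return np.linalg.solve(cov, delta_s)

n = 16
idx = np.arange(n)
cov = 0.6 ** np.abs(idx[:, None] - idx[None, :])      # hypothetical correlated noise
delta_s = np.exp(-0.5 * ((idx - 7.5) / 1.5) ** 2)     # hypothetical lesion profile
d_hot = linear_dprime(hotelling_template(delta_s, cov), delta_s, cov)
d_npw = linear_dprime(delta_s, delta_s, cov)          # NPW without an eye filter
d_hot_internal = linear_dprime(hotelling_template(delta_s, cov), delta_s, cov, alpha=1.0)
```

The Hotelling d' is always at least the NPW d', by the Cauchy-Schwarz inequality. With α = 1 every model's d' falls by a factor of sqrt(2), which is how a single α can be tuned to bring model and human performance into line.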
Visual discrimination model for digital mammography
Numerous studies have been conducted to determine experimentally the effects of image processing and display parameters on the diagnostic performance of radiologists. Comprehensive optimization of imaging systems for digital mammography based solely on measurements of reader performance is impractical, however, due to the large number of interdependent variables to be tested. A reliable, efficient alternative is needed to improve the evaluation and optimization of new imaging technologies. The Sarnoff JNDmetrix™ Visual Discrimination Model (VDM) is a computational, just-noticeable difference model of human vision that has been applied successfully to predict performance in various nonmedical detection and rating tasks. To test the applicability of the VDM to specific detection tasks in digital mammography, two observer performance studies were conducted. In the first study, effects of display tone scale and peak luminance on the detectability of microcalcifications were evaluated. The VDM successfully predicted improvements in reader performance for perceptually linearized tone scales and higher display luminances. In the second study, the detectability of JPEG and wavelet compression artifacts was evaluated, and performance ratings were again found to be highly correlated with VDM predictions. These results suggest that the VDM would be useful in the assessment and optimization of new imaging and compression technologies for digital mammography.
Signal detection in a lumpy background: effects of providing more information to the human than just raw data
In this paper we present a modification of the standard two-alternative forced-choice (2AFC) experiment that attempts to help human observers detect signals by providing redundant information. We call the standard experiment 2AFC_RAW and the new experiment 2AFC_FILTER. In 2AFC_FILTER, the observer sees the pair of raw data images (as in 2AFC_RAW) plus filtered versions of the raw data. The rationale is that humans might benefit from generic pre-processing of the data into multiple images, each extracting different information. We defined two 2AFC_FILTER experiments, each using Laguerre-Gauss functions as the filters; they differed in their defining Gaussian envelope. We tested human performance for a variety of image classes in 2AFC_RAW and both 2AFC_FILTER experiments, using the same raw data in each. Human performance increased significantly from the 2AFC_RAW to the 2AFC_FILTER experiment. The choice of filters also mattered: human performance was better when the Gaussian envelope of the Laguerre-Gauss functions matched the signal.
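Filtered companion images of the kind shown in 2AFC_FILTER can be produced by convolving the raw image with rotationally symmetric Laguerre-Gauss functions. The sketch assumes the parameterisation often used for channelized observers, u_p(r) = (sqrt(2)/a) exp(-pi r²/a²) L_p(2 pi r²/a²), where a sets the Gaussian envelope. The abstract does not give the authors' exact normalisation, orders or widths. The function names are hypothetical, and convolution is circular (via FFT) for simplicity.

```python
import numpy as np
from numpy.polynomial import laguerre

def laguerre_gauss(p, a, size):
    """Laguerre-Gauss function of order p with Gaussian width a (pixels), centred at size//2."""
    y, x = np.indices((size, size)) - size // 2
    g = 2.0 * np.pi * (x**2 + y**2) / a**2
    coeffs = np.zeros(p + 1)
    coeffs[p] = 1.0                                   # select the single polynomial L_p
    return (np.sqrt(2.0) / a) * np.exp(-g / 2.0) * laguerre.lagval(g, coeffs)

def filter_bank(image, orders, a):
    """One filtered copy of a square image per Laguerre-Gauss order (circular convolution)."""
    F = np.fft.fft2(image)
    out = []
    for p in orders:
        kernel = np.fft.ifftshift(laguerre_gauss(p, a, image.shape[0]))  # centre -> (0, 0)
        out.append(np.real(np.fft.ifft2(F * np.fft.fft2(kernel))))
    return out

u0, u1 = laguerre_gauss(0, 10.0, 64), laguerre_gauss(1, 10.0, 64)
impulse = np.zeros((64, 64))
impulse[10, 10] = 1.0
filtered = filter_bank(impulse, [0, 1, 2], a=10.0)
```

With this normalisation the functions are orthonormal, so the discrete sums of u0², u1² and u0·u1 are close to 1, 1 and 0. Matching `a` to the signal size corresponds to the better-performing condition reported above.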
Further investigation of the effect of phase spectrum on visual detection in structured backgrounds
Medical images can be described by their power and phase spectra, so it is of interest to know how these components influence human observers in detection tasks. The power spectrum appears to correctly describe the useful statistical properties of computer-generated noise images (such as white noise, filtered white noise, or lumpy backgrounds). This might not be the case for patient structured images. The present study investigates the role of stationarity and of the power and phase spectra in two types of medical images (mammography and angiography). We consider different categories of images that all have the same mean and power spectrum. Two-alternative forced-choice experiments are performed on patient structured images, random-phase images, filtered white noise, and clustered lumpy backgrounds. Clustered lumpy backgrounds contain visible structures similar to those seen on real mammograms and, unlike real patient structure, are stationary by construction. We show that model observers can take the non-stationarity of real images into account in two ways. The safest and easiest is to apply the model template directly to the images. The other is to correct the performance computed from global quantities with a factor that accounts for local statistics in the area of interest. Finally, patient structured backgrounds are not fully described by their power spectrum, and we show that human observers can use some of the information in the phase spectrum.
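A random-phase image with the same mean and power spectrum as a patient image can be made in the Fourier domain. Keep the amplitude |F| and replace the phase with that of the FFT of real white noise. That phase is Hermitian-symmetric, so the inverse transform is real. The DC term is copied from the original to preserve the mean. This is a minimal sketch of one common construction, not necessarily the one used in the study. The function name and test image are hypothetical.

```python
import numpy as np

def random_phase(image, rng):
    """Image with the same Fourier amplitude (power spectrum) and mean, but random phase."""
    F = np.fft.fft2(image)
    W = np.fft.fft2(rng.standard_normal(image.shape))   # Hermitian, so the result stays real
    G = np.abs(F) * W / np.abs(W)                        # original amplitude, random phase
    G[0, 0] = F[0, 0]                                    # keep the DC term: mean is preserved
    return np.real(np.fft.ifft2(G))

rng = np.random.default_rng(0)
img = rng.uniform(0.0, 1.0, size=(32, 32))
img[8:24, 8:24] += 2.0                                   # hypothetical "structure"
scrambled = random_phase(img, rng)
```

The scrambled image keeps every second-order (power-spectrum) statistic of the original but loses the phase-borne structure. Comparing human performance on the two therefore isolates the contribution of the phase spectrum.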
Modeling Visual Signal Detection II
Estimation of human-observer templates in two-alternative forced-choice experiments
A method is presented for directly estimating the weights, or 'linear template', used by an observer performing a signal-known-exactly detection task in a two-alternative forced-choice (2-AFC) experiment. The approach generalizes prior work by Ahumada, and Beard and Ahumada, to 2-AFC experiments and correlated image noise, and yields an unbiased estimate of the observer template. The estimation procedure is checked against a known linear detection strategy, and human-observer templates estimated from some preliminary psychophysical experiments are shown.
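The idea of checking an estimator against a known linear strategy can be sketched as a classification-image style estimator. Average the noise difference (signal interval minus other interval) over correct trials, subtract its average over incorrect trials, and, for correlated Gaussian noise, apply K⁻¹. With Gaussian noise that difference is proportional to K w, so K⁻¹ recovers w. This simplified estimator is an illustration under those assumptions and is not claimed to be the paper's exact unbiased estimator. The simulated observer, signal and internal-noise level are hypothetical.

```python
import numpy as np

def estimate_template(delta_noise, correct, cov=None):
    """Estimate a linear observer's template from 2-AFC trials.

    delta_noise: (trials, pixels) noise in the signal interval minus the other interval.
    correct: boolean array, True where the observer chose the signal interval.
    cov: optional pixel noise covariance K; the conditional mean difference is
         proportional to K w for Gaussian noise, so K^-1 is applied to recover w.
    """
    diff = delta_noise[correct].mean(axis=0) - delta_noise[~correct].mean(axis=0)
    if cov is not None:
        diff = np.linalg.solve(cov, diff)
    return diff / np.linalg.norm(diff)

# Check against a known (hypothetical) linear observer in white noise.
rng = np.random.default_rng(1)
n_pix, n_trials = 16, 20000
w_true = np.exp(-0.5 * ((np.arange(n_pix) - 7.5) / 2.0) ** 2)
w_true /= np.linalg.norm(w_true)
signal = w_true.copy()                                # so that w_true . signal = 1
noise_sig = rng.standard_normal((n_trials, n_pix))
noise_alt = rng.standard_normal((n_trials, n_pix))
internal = 0.5 * rng.standard_normal(n_trials)        # hypothetical internal noise
correct = (signal + noise_sig) @ w_true + internal > noise_alt @ w_true
w_hat = estimate_template(noise_sig - noise_alt, correct)
```

With these settings the simulated observer scores about 75% correct. The cosine similarity between `w_hat` and `w_true` is close to 1, which is the kind of sanity check described in the abstract.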
Quantitative image quality of spatially filtered x-ray fluoroscopy
One potential way to lower x-ray fluoroscopy dose without compromising image quality is to acquire images at a decreased exposure rate and filter them digitally to reduce noise. We investigated the effect of noise-reduction spatial filtering on the detection of stationary cylinders that mimicked arteries, catheters, and guide wires in x-ray imaging, in both single frames and image sequences. We simulated ideal edge-preserving spatial filters by filtering only the noise and then adding the targets for detection. The filters were three center-weighted averagers that reduce pixel noise variance by factors of 0.75, 0.5, and 0.25. Detection performance in unfiltered and spatially filtered noisy image sequences and single frames was measured using a reference/test, 9-alternative, adaptive forced-choice method. The performance level was fixed, and results were obtained as signal contrast sensitivity. In single images, filtering had no significant effect on detection at any filtering level. In image sequences, however, filtering improved detectability by as much as 23%, yielding a potential x-ray dose saving of 34%. Comparison with the prewhitening matched filter model indicated that human observers detect more efficiently in spatially filtered image sequences than in white-noise sequences. We conclude that edge-preserving spatial filtering is more effective in sequences than in single frames. Such filtering can potentially improve image quality in noisy image sequences such as x-ray fluoroscopy.
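Two quantities in this abstract can be checked with short arithmetic. First, assume detectability scales with the square root of exposure, as in quantum-limited imaging (an assumption the abstract does not state). Then a detectability gain g allows exposure to fall to 1/g² of its original value, and a 23% gain gives a saving of 1 - 1/1.23² ≈ 34%, consistent with the quoted figure. Second, a normalised averaging kernel reduces white-noise variance by the sum of its squared weights. The 3×3 centre-weighted parameterisation below (centre weight c, remainder shared equally by eight neighbours) is hypothetical, since the abstract does not give the kernels.

```python
import math
import numpy as np

def dose_saving(detectability_gain):
    """Fractional dose saving if d' scales as sqrt(dose) (quantum-limited assumption)."""
    return 1.0 - 1.0 / detectability_gain ** 2

def center_weighted_kernel(variance_factor):
    """Hypothetical 3x3 averager with centre weight c and eight neighbours sharing 1 - c.

    Output white-noise variance = input variance * sum(w**2) = c**2 + (1 - c)**2 / 8,
    so solve 9 c**2 - 2 c + (1 - 8 v) = 0 for the larger root (valid for v >= 1/9).
    """
    disc = 4.0 - 36.0 * (1.0 - 8.0 * variance_factor)
    c = (2.0 + math.sqrt(disc)) / 18.0
    kernel = np.full((3, 3), (1.0 - c) / 8.0)
    kernel[1, 1] = c
    return kernel

saving = dose_saving(1.23)
kernels = {v: center_weighted_kernel(v) for v in (0.75, 0.5, 0.25)}
```

The kernel weights sum to 1, so mean signal is preserved while noise variance drops to the requested factor. Because the study filtered only the noise and then added the targets, its filters behaved as ideal edge-preserving filters; a plain averager applied to a real image would also blur edges.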
Detection of lesions in mammographic structure
Arthur E. Burgess, Francine L. Jacobson, Philip F. Judy
This paper reports very surprising results from recent work on the detection of real lesions in digitized mammograms. The experiments used a novel procedure with hybrid images. The lesions (signals) were real tumor masses extracted from radiographs of breast tissue specimens. In the detection experiments, the tumors were added to digitized normal mammographic backgrounds. For lesions larger than approximately 1 mm in diameter, contrast thresholds increased with increasing lesion size. Earlier work with white noise, radiographic image noise, computed tomography (CT) noise, and some types of patient structure has accustomed us to a particular relationship between lesion size and contrast at constant detectability. In all previous contrast/detail (CD) diagrams, the contrast threshold decreases as lesion size increases and flattens at large lesion sizes. The CD diagram for lesion detection in mammographic structure is completely different. We show that this follows from the power-law dependence of the spectral density of projected breast tissue structure on spatial frequency. Mammographic tissue structure power spectra have the form P(f) = B/f^β, with an average exponent β of approximately 3 (range 2 to 4), and are approximately isotropic (small angular dependence). We present results of two-alternative forced-choice (2AFC) signal detection experiments using 4 tumor lesions and one mathematically generated signal. These results are for an unbiased selection of mammographic backgrounds. Investigating detectability in various classes of mammographic backgrounds may further explain how breast structure affects lesion detectability; this will be the subject of future research.
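The exponent β in P(f) = B/f^β can be estimated by a straight-line fit of log power against log spatial frequency. The sketch generates a synthetic isotropic field with a known β by shaping white noise in the Fourier domain and then recovers β by least squares over a frequency band. The field is periodic by construction. Real mammographic regions are not periodic and would normally be windowed or detrended before the FFT. The band limits and function names are hypothetical, and square images are assumed.

```python
import numpy as np

def radial_freq(n):
    """Radial spatial frequency (cycles/pixel) on an n x n FFT grid."""
    fx = np.fft.fftfreq(n)
    return np.hypot(*np.meshgrid(fx, fx, indexing="ij"))

def power_law_field(n, beta, rng):
    """Periodic, isotropic Gaussian field with power spectrum proportional to 1/f**beta."""
    f = radial_freq(n)
    f[0, 0] = 1.0                            # placeholder to avoid 0**(-beta)
    amp = f ** (-beta / 2.0)
    amp[0, 0] = 0.0                          # zero-mean field
    white = np.fft.fft2(rng.standard_normal((n, n)))
    return np.real(np.fft.ifft2(white * amp))

def estimate_beta(img, f_lo=0.02, f_hi=0.4):
    """Least-squares fit of log P(f) = log B - beta * log f over a frequency band."""
    P = np.abs(np.fft.fft2(img - img.mean())) ** 2
    f = radial_freq(img.shape[0])
    band = (f > f_lo) & (f < f_hi)
    slope, _ = np.polyfit(np.log(f[band]), np.log(P[band]), 1)
    return -slope

beta_hat = estimate_beta(power_law_field(256, 3.0, np.random.default_rng(2)))
```

Fitting individual periodogram values in log space adds a constant bias to the intercept (log B) but not to the slope. The estimate of β is therefore unaffected, and with tens of thousands of frequency samples it is tight.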
Poster Session
Appearance matching of radiographic images using lightness index
Eiji Ogawa, Kazuo Shimura
Image appearance is closely related to the luminance dependence of human visual characteristics. Radiographic images are displayed on CRTs of various luminance as well as on high-luminance light boxes. We studied a tone scale that can make images appear more consistent across devices with different luminance. Radiologists probably diagnose images based on the relation between the brightness of a region of interest and that of the surrounding area. Lightness is defined as the brightness of a region of interest relative to the maximum luminance level of the image. We think a lightness index can be used to match the appearance of radiographic images across displays. Lightness matching can be achieved with a tone scale whose gradient, plotted as log output luminance against input data level, is the same on every display system. In this paper we call this a 'lightness-equivalent' characteristic. We compared the appearance consistency of images displayed with a log-luminance linear tone scale, which realizes the lightness-equivalent characteristic, with that of images displayed with a perceptually linear tone scale. In the evaluation, the log-luminance linear tone scale gave almost the same appearance across devices with different luminance. The perceptually linear tone scale, on the other hand, gave lower visual contrast on the lower-luminance device than on the higher-luminance device, which may have led observers to perceive the images as different.
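The lightness-equivalent idea can be made concrete. If log10(L) is linear in the input level d with the same gradient G on every display, then L(d)/L_max = 10^(-G(d_max - d)), which does not depend on L_max. Relative luminance (lightness) is therefore identical on bright and dim displays. The 10-bit input range, the 2.5-decade span and the two peak luminances below are hypothetical values chosen for illustration.

```python
import numpy as np

def log_linear_luminance(d, l_max, gradient, d_max=1023):
    """Luminance for input level d when log10(L) is linear in d with slope `gradient`."""
    return l_max * 10.0 ** (-gradient * (d_max - np.asarray(d, dtype=float)))

levels = np.arange(1024)
G = 2.5 / 1023                                     # hypothetical: 2.5 decades over 10-bit input
bright = log_linear_luminance(levels, 500.0, G)    # hypothetical 500 cd/m^2 display
dim = log_linear_luminance(levels, 100.0, G)       # hypothetical 100 cd/m^2 display
lightness_bright = bright / bright.max()
lightness_dim = dim / dim.max()
```

Every input level maps to the same lightness on both displays, and the luminance ratio between any two levels is the same everywhere. A perceptually linear (for example, DICOM-GSDF-like) tone scale does not have this property when peak luminance differs.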
Reconfigurable parallel processor for noise suppression
Michael Cuviello, Philip P. Dang, Paul M. Chau
Different kinds of noise in digital images often call for different filtering techniques to correct the image optimally. Software makes it easy to implement a variety of filters but pays a speed penalty because the filter calculations run serially. Conversely, an ASIC implementation gains speed through parallel processing, at the cost of extra hardware to implement each of several filters individually. Advances in Field Programmable Gate Array (FPGA) technology offer a middle ground: the speed of an ASIC combined with the reprogrammability of software on a general-purpose CPU or DSP. In this paper, we present an FPGA-based reconfigurable system that can perform an assortment of noise-filtering algorithms on the same hardware. Filtering of Gaussian and salt-and-pepper noise is evaluated on this system.
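Hardware filters are commonly checked against a software reference model. The sketch below is such a model for two typical 3×3 neighbourhood filters: a median filter, a usual choice for salt-and-pepper noise, and a mean filter, a simple smoother for Gaussian noise. The abstract does not name the filters actually implemented on the FPGA, so these choices and the function name `filter3x3` are assumptions. `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def filter3x3(img, mode):
    """Reference 3x3 neighbourhood filter with edge replication at the borders."""
    padded = np.pad(img, 1, mode="edge")
    windows = sliding_window_view(padded, (3, 3))       # shape (H, W, 3, 3)
    if mode == "median":      # rank filter: rejects isolated salt-and-pepper outliers
        return np.median(windows, axis=(-2, -1))
    if mode == "mean":        # linear smoother: reduces Gaussian noise variance
        return windows.mean(axis=(-2, -1))
    raise ValueError(f"unknown mode: {mode}")

img = np.full((8, 8), 100.0)
img[3, 4] = 255.0                                       # a single "salt" pixel
median_out = filter3x3(img, "median")
mean_out = filter3x3(img, "mean")
```

Each output pixel depends only on its own 3×3 window, so all pixels can be computed independently. That independence is what a parallel hardware implementation exploits, and swapping the per-window operation is what a reconfigurable design changes. The median removes the outlier completely, while the mean spreads it into its neighbours.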
Modeling Visual Signal Detection II
Producing lesions for hybrid mammograms: extracted tumors and simulated microcalcifications
Arthur E. Burgess, Sankar Chakraborty
Experimental and theoretical investigations of signal detection in medical imaging are increasingly based on realistic images. In this presentation, we describe techniques for producing realistic breast tumor masses and microcalcifications. The mass lesions were obtained from 24 specimen radiographs of surgically removed breast tissue destined for pathological evaluation. A variety of masses were represented, including lobular and spiculated ductal carcinomas as well as fibroadenomas. Mass sizes ranged from 4 to 18 mm. The specimens included only a small amount of attached normal tissue, so tumor boundaries could be identified subjectively. A simple, interactive quadratic-surface method was used for background subtraction, yielding an isolated tumor image. Individual microcalcifications were generated using a 3D stochastic growth algorithm. Starting from a central seed cell, adjacent cells were randomly filled until the 3D object contained a randomly selected number of filled cells. The object was then projected to 2D, smoothed, and sampled. Varying the rules that control stochastic growth generates a large variety of realistic microcalcification shapes. Microcalcification clusters (MCCs) can then be generated randomly, based on the statistical properties of clusters described by LeFebvre et al.
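The microcalcification recipe (seed, random growth, projection, smoothing, sampling) can be sketched as follows. Assumptions not given in the abstract: growth proceeds to face-adjacent cells of a randomly chosen filled cell; projection sums along one axis (thickness in cells); smoothing uses a [1, 2, 1]/4 kernel on each axis; sampling keeps every second pixel. Grid size, cell count and function names are hypothetical.

```python
import numpy as np

STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def grow_calcification(n_cells, grid=15, rng=None):
    """3D stochastic growth from a central seed until n_cells cells are filled."""
    assert 1 <= n_cells <= grid ** 3
    rng = rng if rng is not None else np.random.default_rng()
    vol = np.zeros((grid, grid, grid), dtype=bool)
    c = grid // 2
    vol[c, c, c] = True                                # central seed cell
    filled = [(c, c, c)]
    while len(filled) < n_cells:
        x, y, z = filled[rng.integers(len(filled))]    # random already-filled cell
        dx, dy, dz = STEPS[rng.integers(len(STEPS))]   # random face-adjacent neighbour
        nb = (x + dx, y + dy, z + dz)
        if all(0 <= v < grid for v in nb) and not vol[nb]:
            vol[nb] = True
            filled.append(nb)
    return vol

def project_smooth_sample(vol, step=2):
    """Project along z, smooth with [1, 2, 1]/4 on each axis, then subsample."""
    img = vol.sum(axis=2).astype(float)                # projected thickness (cells)
    kernel = np.array([0.25, 0.5, 0.25])
    for axis in (0, 1):
        img = np.apply_along_axis(np.convolve, axis, img, kernel, mode="same")
    return img[::step, ::step]

vol = grow_calcification(40, rng=np.random.default_rng(3))
calc_img = project_smooth_sample(vol)
```

Changing the growth rules alters the shapes produced. For example, biasing `STEPS` toward one direction yields elongated particles and restricting growth to low-neighbour cells yields branching ones. This matches the flexibility described above, though the specific rules here are illustrative.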
Poster Session
Image compression and feature stabilization of dynamically displayed coronary angiograms
Jay L. Bartroff, Craig A. Morioka, James Stuart Whiting, et al.
Eigler et al. (1994) proposed an optimized display for coronary angiograms (stabilized display). Each image of the sequence is digitally shifted so that the feature of interest within an artery stays fixed at the center of the screen while the background moves. We measured the effect of JPEG and CREW (wavelet-based) image compression on the detectability of a simulated morphological feature (filling defect) in the stabilized display and compared it with the conventional moving-artery display. Our results show that 15:1 JPEG compression does not significantly degrade human performance for either display, whereas 19:1 CREW compression does. The stabilized display significantly improved performance relative to the conventional moving-artery display for uncompressed and 15:1 JPEG images, but not for 19:1 CREW images.
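The stabilized display amounts to translating each frame so that the tracked feature lands at the screen centre. The sketch assumes the feature's (row, column) position in each frame is already known, for example from manual or automated tracking. It uses a circular shift (`np.roll`) for brevity; a real display would more likely pad or crop instead of wrapping. The function name and toy sequence are hypothetical.

```python
import numpy as np

def stabilize(frames, feature_positions):
    """Shift each frame so the tracked feature sits at the image centre."""
    out = []
    for frame, (r, c) in zip(frames, feature_positions):
        cr, cc = frame.shape[0] // 2, frame.shape[1] // 2
        out.append(np.roll(frame, (cr - r, cc - c), axis=(0, 1)))  # feature -> (cr, cc)
    return np.stack(out)

# Hypothetical sequence: a single bright "feature" pixel moving with the heart.
positions = [(2, 3), (4, 5), (6, 1)]
frames = np.zeros((3, 9, 9))
for i, (r, c) in enumerate(positions):
    frames[i, r, c] = 1.0
stable = stabilize(frames, positions)
```

After stabilization the feature stays at pixel (4, 4) in every frame while everything else moves. This is the display condition whose interaction with JPEG and CREW compression the study measured.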