Proceedings Volume 5034

Medical Imaging 2003: Image Perception, Observer Performance, and Technology Assessment

View the digital version of this volume at SPIE Digital Library.

Volume Details

Date Published: 22 May 2003
Contents: 9 Sessions, 59 Papers, 0 Presentations
Conference: Medical Imaging 2003
Volume Number: 5034

Table of Contents

  • Perception and Softcopy Displays
  • Technology and Observer Performance
  • Observer Models I: Clinical Applications
  • Observer Models II
  • Technology and Image Quality Assessment
  • Statistics and Methodology
  • Image Interpretation
  • Observer Models III
  • Poster Session
Perception and Softcopy Displays
2-AFC observer study of digital stereomammography
A two-alternative forced choice (2-AFC) observer study has been performed to estimate the dose needed for detection of small cancers by digital stereomammography compared to projection mammography. Monoscopic and stereoscopic images simulating four different values of signal-to-noise ratio (SNR) were interleaved and repeated for three different object diameters. Five hundred images per condition were read by four observers, and the fraction of correct answers and the corresponding d' were calculated for each condition. Preliminary results indicate that stereoscopy could be performed at 1.5 ± 0.2 times the dose of monoscopic viewing. In our previous contrast-detail (C-d) study, we observed a factor of 1.1 ± 0.1. The two experiments differ in target properties and decision tasks. In the C-d experiment, the viewer was asked to determine whether suprathreshold objects with an SNR of 5-6 were visualized with a well-defined edge and shape. In the 2-AFC experiment, the viewer was asked to detect an object at the limits of visibility (SNR range of 1 to 3). Experiments to elucidate the differences in observer performance are planned.
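In a 2-AFC experiment, the fraction of correct answers, Pc, is conventionally converted to a detectability index with d' = √2 · Φ⁻¹(Pc), where Φ is the standard normal cumulative distribution function. The sketch below assumes that standard equal-variance Gaussian model; the function names and the 380-of-500 example are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def dprime_from_2afc(p_correct: float) -> float:
    """d' from 2-AFC proportion correct (assumed equal-variance Gaussian model)."""
    if not 0.0 < p_correct < 1.0:
        raise ValueError("proportion correct must lie strictly between 0 and 1")
    return sqrt(2.0) * _STD_NORMAL.inv_cdf(p_correct)


def pc_from_dprime(d_prime: float) -> float:
    """Inverse mapping: expected 2-AFC proportion correct for a given d'."""
    return _STD_NORMAL.cdf(d_prime / sqrt(2.0))


# Hypothetical condition: 380 correct answers out of 500 trials.
pc = 380 / 500
print(f"Pc = {pc:.3f}, d' = {dprime_from_2afc(pc):.2f}")
```

Chance performance (Pc = 0.5) maps to d' = 0, and the two functions are inverses of each other, which makes a quick sanity check when tabulating d' per condition.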
Optimized soft-copy display of digitized mammograms
Ton A. A. J. Roelofs, Sander van Woudenberg, Jan H. C. L. Hendriks, et al.
Digitization and CRT display reduce the sharpness of mammograms. To ensure image quality on a CRT comparable to that of the original films, a modified unsharp-masking (USM) algorithm is proposed to correct for this reduction. This study evaluates the clinical value of this algorithm and determines the optimal settings of its parameters. Eight complete mammographic cases were processed by the modified USM algorithm with 19 settings of three parameters, resulting in 152 stimuli. All cases showed a clearly visible mass; five also contained microcalcifications. The modification of the standard USM algorithm consisted of selectively enhancing low contrasts. Moreover, the USM enhancement was made grey-value dependent to avoid clipping. Four experienced screening radiologists and four physicists with experience in mammographic imaging rated all mammograms on a 1-10 point scale according to image quality and suitability for diagnosis. The images were presented in random order. Before the experiment started, a subset of the images was shown to familiarize the observers with the range of images and parameter settings. For a contrast enhancement factor of about 0.4, the processed mammograms were judged significantly better than the original digitized mammograms (P < .001). Differences between the results for the radiologists and the physicists were small.
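Standard unsharp masking adds a scaled high-pass component back to the image: out = I + k·(I − blur(I)). The study's specific modification (selective enhancement of low contrasts) is not described in enough detail to reproduce, so the sketch below shows only plain USM with one hypothetical grey-value-dependent weight, w(g) = 4g(1 − g), that fades the enhancement toward black and white to reduce clipping. The box blur, its radius, and the weighting function are assumptions; k = 0.4 merely echoes the factor of about 0.4 quoted above, and whether that factor corresponds to k in this formula is also an assumption.

```python
import numpy as np


def box_blur(img: np.ndarray, radius: int) -> np.ndarray:
    """Separable box blur with edge padding; the output has the same shape as img."""
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    padded = np.pad(img, radius, mode="edge")
    # 'valid' convolution along each axis trims the padding back off.
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)


def unsharp_mask(img: np.ndarray, k: float = 0.4, radius: int = 8) -> np.ndarray:
    """Unsharp masking for an image normalised to [0, 1].

    Hypothetical grey-value dependence: the gain is scaled by w(g) = 4 g (1 - g),
    which is 1 at mid-grey and 0 at black and white. This is NOT the modified USM
    algorithm of the study, only an illustration of the general idea.
    """
    img = np.asarray(img, dtype=float)
    detail = img - box_blur(img, radius)   # high-pass component
    weight = 4.0 * img * (1.0 - img)       # assumed grey-value-dependent weight
    return np.clip(img + k * weight * detail, 0.0, 1.0)
```

Setting k = 0 returns the input unchanged, and a uniform image has no detail to enhance, so both are easy checks before trying parameter sweeps like the 19 settings used in the study.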
Human vision model to predict observer performance: detection of microcalcifications as a function of monitor phosphor
The goal was to develop an efficient method of optimizing CRT monitor performance for digital mammography. The Sarnoff JNDmetrix vision model is based on just-noticeable-difference measurement and frequency-channel vision-modeling principles. Given two images as input, the model returns accurate, robust estimates of their discriminability. Model predictions were compared with human performance. Mammographic images with microcalcifications were viewed by six radiologists, once on a monitor with a P45 phosphor and once on one with a P104 phosphor. Results were compared with the output of the model, which was used to predict differences in the perceptibility of calcifications from luminance data measured with a high-resolution CCD camera. Human performance was best with high-contrast clusters and declined with each decrease in contrast. Performance was better with the P45 than with the P104 for targets at all contrast levels. The JNDmetrix model predicted the same pattern of results, and the correlation between human and model observer performance was very high. We have demonstrated the utility of using a vision model to accurately predict human detection performance. The type of phosphor in a monitor influences observer performance, at least for the detection of microcalcifications. The main reason is that, although the P104 has a higher luminance, the P45 has a higher signal-to-noise ratio.
Optimized perception of lesion growth in mammograms using digital display
Saskia van Engeland, Peter R. Snoeren, Nico Karssemeijer, et al.
In this study we investigate two ways of presenting prior and current mammograms on a mammography workstation: next to each other, and alternating on the same display (toggle). The experiment consisted of 420 trials with prior-current mammogram pairs, displayed on a dedicated mammography workstation. In a two-alternative forced-choice (2AFC) experiment, observers were asked to select the image containing the larger lesion. The stimuli were created by pasting extracted lesions into normal mammograms. Results showed that the observers performed more accurately in selecting the larger lesion when using the toggle option.
Accuracy of transcribing locations on mammograms: implications for the user interface of a system to record and assess breast screening decisions
James W. Hatton, David S. Wooding, Alastair G. Gale, et al.
A series of studies was carried out to determine the accuracy of transcribing feature locations in a 'reference' mammogram to a scaled, and sometimes simplified, 'copy' of the same image. A computer monitor displayed two images of a mammogram. The reference image was presented with 256 colours (greyscale), and, adjacent to it, the copy image was presented with its height and width scaled to 0.25, 0.50 or 0.75 of the original, in one of four forms: blank image, outline, 16-colour palette, or 256-colour palette. Participants were required to locate a target on the reference image and indicate its location on the copy image. Results demonstrated that the accuracy of transcribing target locations increased with image detail and scale. Consequently, it is possible to determine which image characteristics are important in situations where a small representational image is used to record patient information, such as the user interface of a tablet device used to record breast screening decisions.
Technology and Observer Performance
Effect of radiologists' variability on the performance of computer classification of malignant and benign calcifications in mammograms
Yulei Jiang, M. Fernanda Salfity, Vicky Chen, et al.
In developing a computer technique to classify clustered microcalcifications as malignant or benign, we previously indicated to the computer, manually, the location of every individual calcification and found the computer to be more accurate than radiologists. In this study, we investigate whether radiologists can provide minimal input to the computer and still obtain consistent computer classification results. Radiologists were instructed to draw a rectangle enclosing all calcifications and to indicate the approximate number of calcifications (<6, 6-10, 10-30, or >30). The computer used these two pieces of information to detect the individual calcifications and then to classify the calcifications as malignant or benign based only on those calcifications it had detected. At the 2002 RSNA conference, we showed 18 cases, with standard and magnification-view mammograms, to 38 self-reported breast-imaging radiologists (12 of whom read all 18 cases). The standard deviation in the location of their rectangles (averaged over all cases) was approximately 3 mm, the standard deviation in the linear dimension of the rectangles was 6 mm, and the standard deviation in the computer-estimated likelihood of malignancy was 17%. These results indicate that radiologists are able to provide consistent input to the computer, which in turn produces reasonably consistent computer classification results.
See-through head-worn display of patient monitoring data to enhance anesthesiologists' response to abnormal clinical events
David F. Ormerod, Brian K. Ross M.D., A. Naluai-Cecchini
One obstacle to safety in the operating room is anesthesiologist distraction: having to shift attention back and forth between the patient and the vital-sign monitor while performing either routine or emergency procedures. The purpose of this study was to measure the decrease in anesthesiologist distraction made possible by a head-mounted, see-through personal display (HMD) using retinal scanning technology. With the head-up display, anesthesiologists were able to focus their attention exclusively on the patient and the task at hand. The HMD (Nomad) reduced the number of times the anesthesiologist had to shift attention to less than a third (17 times versus 58 times). This allowed them to spend more time focused on the patient.
Diagnostic performance of radiologists with and without different CAD systems for mammography
Adele Lauria, Maria Evelina Fantacci, Ubaldo Bottigli, et al.
The purpose of this study is to evaluate the variation in performance, in terms of sensitivity and specificity, of two radiologists with different levels of experience in mammography, with and without the assistance of two different CAD systems. The CAD systems considered are SecondLook™ (CADx Medical Systems, Canada) and CALMA (Computer Assisted Library in MAmmography). The first is a commercial system; the second is the result of a research project supported by INFN (Istituto Nazionale di Fisica Nucleare, Italy). The characteristics of both have already been reported in the literature. To compare the results with and without these tools, a dataset of 70 images of patients with cancer (biopsy proven) and 120 images of healthy breasts (with three-year follow-up) was collected. All the images were digitized and analysed by both CAD systems; then two radiologists, with 6 and 2 years of experience in mammography respectively, independently made their diagnoses without and with the support of the two CAD systems. In this work, the variations in sensitivity and specificity and the area under the ROC curve (Az) are reported. The results show that the use of a CAD system allows a substantial increase in sensitivity and a less pronounced decrease in specificity. The extent of these effects depends on the experience of the readers and is comparable for the two CAD systems considered.
Observer Models I: Clinical Applications
Detection in power-law noise: spectrum exponents and CD diagram slopes
Normal mammographic image backgrounds have approximately isotropic power spectra of the form P(f) = K/f^β, where f is radial frequency. The values of the exponent, β, range from 1.5 to 3.5, with an average of about 2.8. The ideal observer model predicts that, for signals with certain properties, the slope, m, of the log-log contrast-detail (CD) diagram is given by m = 0.5(β − 2). Previously, we reported results for detection of a model mass (designer nodule) in filtered noise with an exponent of 3. The model and human observer CD slopes were 0.5 and 0.45, respectively. Here, we report preliminary results for human and model observer 2AFC detection of a simple signal in filtered noise with exponents from 1.5 to 3.5. Our results are in good agreement with the prediction of the above equation. We will also describe results of 2AFC detection experiments done using "twin" noise backgrounds with identical noise realizations in the two backgrounds. We could not replicate the results of Johnson et al.: for 1/f^3 noise, they found a CD slope of −0.59, while we found +0.37.
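The slope relation is easy to tabulate, and filtered power-law noise of this kind can be synthesised by shaping white Gaussian noise in the Fourier domain so that its amplitude falls as f^(−β/2), which gives power proportional to 1/f^β. This is a minimal sketch under those assumptions (square images, circularly symmetric filter, DC removed, unit variance); it is not the authors' stimulus-generation code.

```python
import numpy as np


def cd_slope(beta: float) -> float:
    """Predicted log-log contrast-detail slope, m = 0.5 (beta - 2)."""
    return 0.5 * (beta - 2.0)


def power_law_noise(n: int, beta: float, seed=None) -> np.ndarray:
    """n x n zero-mean, unit-variance noise with isotropic power spectrum ~ K / f**beta."""
    rng = np.random.default_rng(seed)
    white = rng.standard_normal((n, n))
    freqs = np.fft.fftfreq(n)                      # cycles per pixel
    f = np.hypot(freqs[None, :], freqs[:, None])   # radial frequency
    f[0, 0] = 1.0                                  # placeholder so 0 ** negative is avoided
    amplitude = f ** (-beta / 2.0)                 # power = amplitude**2 ~ f**(-beta)
    amplitude[0, 0] = 0.0                          # drop the DC term (zero mean)
    field = np.fft.ifft2(np.fft.fft2(white) * amplitude).real
    return field / field.std()


for beta in (1.5, 2.0, 2.8, 3.0, 3.5):
    print(f"beta = {beta:.1f} -> predicted CD slope m = {cd_slope(beta):+.2f}")
```

With this relation, β = 2 gives m = 0 (threshold contrast independent of signal size), and β = 3 gives m = 0.5, the model slope quoted above.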
Optimizing lens-coupled digital radiographic imaging systems based on model observers' performance
Liying Chen, Harrison H. Barrett
Recent advances in model observers that predict human perceptual performance now make it possible to optimize medical imaging systems for human task performance. We illustrate the procedure by considering the design of a lens for use in an optically coupled digital mammography system. The channelized Hotelling observer is used to model human performance, and the channels chosen are differences of Gaussians (DOGs). The task is detection of a lesion at a random but known location in a clustered lumpy background mimicking breast tissue. The entire system is simulated with a Monte Carlo application based on physical principles, but the main system component under study is the lens that couples a fluorescent screen to a CCD detector. The larger the aperture, the larger the fraction of light coupled to the CCD, but the more severe the aberrations, and hence the worse the image blur. Thus, when the stop size changes, the detectability of the signal (lesion) by human observers for this task also changes. The SNR of the channelized Hotelling observer is used to quantify this detectability. In this paper, plots of the channelized Hotelling SNR show the trade-off between coupling efficiency and blur in a task-based manner. In this way, the channelized Hotelling SNR is used as a merit function for lens design.
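The channelized Hotelling figure of merit can be written SNR² = Δv̄ᵀ S⁻¹ Δv̄, where v = Uᵀg are the channel outputs for image g, Δv̄ is the difference of the class means of v, and S is the average of the two class covariance matrices. The sketch below estimates that quantity from sample images using radially symmetric DOG channels defined in the frequency domain. The DOG parameterisation (σ_j = σ₀α^j with assumed values σ₀ = 0.015 cycles/pixel, α = 1.4, Q = 2, five channels) and the toy Gaussian-signal-in-white-noise data are illustrative assumptions, not the lens-design simulation of the paper.

```python
import numpy as np


def dog_channels(n, n_channels=5, sigma0=0.015, alpha=1.4, q=2.0):
    """Spatial DOG channel templates centred in an n x n image, shape (n_channels, n*n).

    Channel j has radial frequency response
    C_j(rho) = exp(-0.5 (rho / (q s_j))^2) - exp(-0.5 (rho / s_j)^2), s_j = sigma0 * alpha**j,
    with rho in cycles/pixel. All parameter values are assumptions.
    """
    freqs = np.fft.fftfreq(n)
    rho = np.hypot(freqs[None, :], freqs[:, None])
    templates = []
    for j in range(n_channels):
        s = sigma0 * alpha**j
        response = np.exp(-0.5 * (rho / (q * s)) ** 2) - np.exp(-0.5 * (rho / s) ** 2)
        # Real, even response -> real spatial kernel; fftshift centres it at (n//2, n//2).
        spatial = np.fft.fftshift(np.fft.ifft2(response).real)
        templates.append(spatial.ravel())
    return np.array(templates)


def cho_snr(present, absent, channels):
    """Channelized Hotelling SNR from signal-present and signal-absent image stacks."""
    v1 = present.reshape(len(present), -1) @ channels.T   # channel outputs, signal present
    v0 = absent.reshape(len(absent), -1) @ channels.T     # channel outputs, signal absent
    dv = v1.mean(axis=0) - v0.mean(axis=0)
    s = 0.5 * (np.cov(v1, rowvar=False) + np.cov(v0, rowvar=False))
    return float(np.sqrt(dv @ np.linalg.solve(s, dv)))


# Hypothetical toy data: Gaussian "lesion" at the image centre in white noise.
rng = np.random.default_rng(1)
n, n_imgs = 64, 300
yy, xx = np.mgrid[:n, :n] - n // 2
signal = np.exp(-(xx**2 + yy**2) / (2 * 3.0**2))
present = rng.standard_normal((n_imgs, n, n)) + signal
absent = rng.standard_normal((n_imgs, n, n))
print(f"CHO SNR ~ {cho_snr(present, absent, dog_channels(n)):.2f}")
```

In a design study of the kind described above, images simulated for each stop size would replace the toy data, and the resulting SNR values would be compared across settings.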
Computer-aided detection of masses in mammography using a Laguerre-Gauss channelized Hotelling observer
We propose to investigate the use of a Laguerre-Gauss channelized Hotelling observer (LG-CHO) as the basis of a computer-aided detection scheme for masses in mammography. A database of 1320 regions of interest was selected from the DDSM database collected by the University of South Florida. The breakdown of the cases was 656 normal, 307 benign, and 357 cancer. For the detection task, cancer and benign cases were considered positive and normal cases were considered negative. A 25-channel LG-CHO was designed to best classify regions as containing a mass or not. Application of this LG-CHO to the database gave an area under the ROC curve of 0.936 and a partial area of 0.648. Additionally, at 98% sensitivity the classifier had a specificity of 44.8% and a positive predictive value of 64.2%. Preliminary results suggest that an LG-CHO can provide a strong backbone for a CAD scheme to help radiologists with detection. The approach could be incorporated into a larger CAD system for higher performance, either as a false-positive reduction scheme or as an initial filter for mass detection.
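The operating-point figures quoted above follow from simple counting. With 664 positive regions (307 benign plus 357 cancer) and 656 normal regions, 98% sensitivity and 44.8% specificity imply about 651 true positives and 362 false positives, i.e. a positive predictive value of roughly 0.642, consistent with the reported 64.2%. The sketch below shows that arithmetic and, for illustration, one way to choose a threshold on classifier scores for a target sensitivity; the function names and the thresholding rule are hypothetical.

```python
import numpy as np


def ppv(sensitivity, specificity, n_pos, n_neg):
    """Positive predictive value from an operating point and class counts."""
    tp = sensitivity * n_pos
    fp = (1.0 - specificity) * n_neg
    return tp / (tp + fp)


def operating_point(pos_scores, neg_scores, target_sensitivity):
    """Threshold reaching at least the target sensitivity, and the resulting specificity.

    Regions with score >= threshold are called positive (hypothetical rule).
    """
    pos = np.sort(np.asarray(pos_scores, dtype=float))
    # Allow at most floor((1 - target) * n_pos) positives to fall below the threshold.
    n_missed = int(np.floor((1.0 - target_sensitivity) * len(pos)))
    threshold = pos[n_missed]
    sens = np.mean(pos >= threshold)
    spec = np.mean(np.asarray(neg_scores, dtype=float) < threshold)
    return threshold, sens, spec


print(f"PPV = {ppv(0.98, 0.448, 664, 656):.3f}")  # PPV = 0.642
```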
Bit-plane-channelized Hotelling observer for predicting task performance using lossy-compressed images
A technique for assessing the impact of lossy wavelet-based image compression on signal detection tasks is presented. A medical image’s value is based on its ability to support clinical decisions such as detecting and diagnosing abnormalities. The image quality of compressed images is, however, often stated in terms of mathematical metrics such as mean square error. The presented technique provides a more suitable measure of image degradation by building on the channelized Hotelling observer model, which has been shown to predict human performance of signal detection tasks in noise-limited images. The technique first decomposes an image into its constituent wavelet subband coefficient bit-planes. Channel responses for the individual subband bit-planes are computed, combined, and processed with a Hotelling observer model to provide a measure of signal detectability versus compression ratio. This allows a user to determine how much compression can be tolerated before signal detectability drops below a certain threshold.
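The first step described above, splitting quantised subband coefficients into bit-planes, can be illustrated independently of any codec. The sketch assumes integer coefficients in sign-magnitude form with planes ordered from most to least significant; it does not compute the wavelet transform or the Hotelling channel responses, and it is not the authors' implementation.

```python
import numpy as np


def bit_planes(coeffs: np.ndarray, n_planes: int):
    """Split integer coefficients into a sign array and magnitude bit-planes (MSB first)."""
    coeffs = np.asarray(coeffs, dtype=np.int64)
    sign = np.sign(coeffs)
    mag = np.abs(coeffs)
    if mag.max(initial=0) >= 2**n_planes:
        raise ValueError("n_planes is too small for these coefficients")
    planes = [(mag >> b) & 1 for b in range(n_planes - 1, -1, -1)]
    return sign, np.stack(planes)


def reconstruct(sign: np.ndarray, planes: np.ndarray, keep: int) -> np.ndarray:
    """Rebuild coefficients from the `keep` most significant planes.

    Dropping low-order planes mimics the coarser quantisation that an embedded
    bit-plane coder produces at higher compression ratios (an assumption here).
    """
    n_planes = planes.shape[0]
    mag = np.zeros(planes.shape[1:], dtype=np.int64)
    for i in range(keep):
        mag += planes[i].astype(np.int64) << (n_planes - 1 - i)
    return sign * mag
```

Comparing detectability computed from `reconstruct(..., keep)` for decreasing `keep` gives a rough analogue of detectability versus compression.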
Performance comparisons of planar and volumetric observers for lesion detection in PET scanning
This work presents initial results of comparisons between planar and volumetric observer detection-task performance for both human and model observers. Positron emission tomography (PET) acquires and reconstructs tomographic images as contiguous volumetric (3D) images. Consequently, physicians typically interpret these images by searching the image volume using linked orthogonal planar images in the three standard orientations (transverse, sagittal, and coronal). Most observer studies, however, have used planar images for evaluation. For human observer ROC studies, an observer scoring tool similar to the display tool used in clinical PET oncology imaging has been developed. For model observer studies, the non-prewhitening matched filter (NPWMF) and the channelized Hotelling observer (CHO) were used to compute detectabilities as figures of merit for class separation. For the volumetric (3D) model observers, the entire image volume is used with appropriate 3D templates. For the planar (2D) model observers, the transaxial plane centered on the target sphere is extracted and analyzed using 2D templates. Multiple realizations were generated using a non-Monte Carlo analytic simulator, which allows a feasible simulation time and statistically accurate noise properties. For comparison, the correlations between each model observer's performance and human observer performance were computed. The results showed that the 3D model observers have a higher correlation with human observers than the 2D observers do when axial smoothing is not applied. With axial smoothing, however, the correlation of the 2D model observers generally increased to the level of the 3D model observer correlations with the human observers.
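For a non-prewhitening matched filter with template s applied to data with noise covariance K, the detectability is d'² = (sᵀs)² / (sᵀKs), which reduces to d' = ‖s‖/σ for uncorrelated noise of variance σ². The sketch below evaluates that formula for a hypothetical spherical target, once with the full 3D template and once with only the transaxial slice through the target centre. It assumes white noise, which reconstructed PET images do not have, so it only shows why a 3D template collects more signal; it does not reproduce the human-observer correlations reported above.

```python
import numpy as np


def npw_dprime_white(template: np.ndarray, sigma: float) -> float:
    """NPW detectability in white noise: d' = (s.s) / sqrt(s.K.s) with K = sigma^2 I."""
    s = template.ravel()
    return float(s @ s / np.sqrt(sigma**2 * (s @ s)))


# Hypothetical geometry and contrast, not taken from the study.
n, radius, contrast, sigma = 33, 4.0, 0.2, 1.0
z, y, x = np.mgrid[:n, :n, :n] - n // 2
sphere = contrast * ((x**2 + y**2 + z**2) <= radius**2)   # 3D target template
central_slice = sphere[n // 2]                            # 2D transaxial plane through centre

d3 = npw_dprime_white(sphere, sigma)
d2 = npw_dprime_white(central_slice, sigma)
print(f"3D NPW d' = {d3:.2f}, 2D NPW d' = {d2:.2f}")
```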
Observer Models II
Variance of the channelized-Hotelling observer from a finite number of trainers and testers
In this paper we analyze the bootstrap and shuffle methods for estimating the mean and variance of the performance of the channelized-Hotelling observer given a finite number of images for training and testing. This background is needed to understand the role of the bootstrap and shuffle methods in new and more complicated models of bias and variance. We assess the accuracy and precision of the bootstrap and shuffle estimates by comparing them to Monte Carlo estimates. The comparisons show that the shuffle estimate of the mean and the bootstrap estimate of the variance are unbiased.
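As a point of reference for the resampling idea, the sketch below shows a generic nonparametric bootstrap of the test set: the empirical AUC (the Mann-Whitney statistic) is recomputed on resampled signal-present and signal-absent scores to estimate its mean and variance. It assumes scores from an already-trained observer and so captures only test-set variability; it does not implement the shuffle method or the training-set component analysed in the paper, and the function names are hypothetical.

```python
import numpy as np


def empirical_auc(pos: np.ndarray, neg: np.ndarray) -> float:
    """Mann-Whitney estimate of the AUC; ties count one half."""
    diff = pos[:, None] - neg[None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))


def bootstrap_auc(pos, neg, n_boot=2000, seed=None):
    """Bootstrap mean and variance of the empirical AUC, resampling each class separately."""
    rng = np.random.default_rng(seed)
    pos, neg = np.asarray(pos, dtype=float), np.asarray(neg, dtype=float)
    aucs = np.empty(n_boot)
    for b in range(n_boot):
        p = rng.choice(pos, size=len(pos), replace=True)
        q = rng.choice(neg, size=len(neg), replace=True)
        aucs[b] = empirical_auc(p, q)
    return aucs.mean(), aucs.var(ddof=1)


# Hypothetical observer scores for 100 signal-present and 100 signal-absent test images.
rng = np.random.default_rng(0)
mean_auc, var_auc = bootstrap_auc(rng.normal(1.0, 1.0, 100), rng.normal(0.0, 1.0, 100),
                                  n_boot=500, seed=1)
print(f"bootstrap mean AUC = {mean_auc:.3f}, variance = {var_auc:.5f}")
```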
Comparison of human- and model-observer LROC studies
Howard C. Gifford, P. Hendrik Pretorius, Michael A. King
We have investigated whether extensions of linear model observers can predict human performance in a localization ROC (LROC) study. The specific task was detection of gallium-avid tumors in SPECT images of a mathematical phantom, and the study was intended to quantify the effect of improved detector energy resolution on scatter-corrected images. The basis for our model observers is the latent perception measurement postulated for the LROC model. This measurement is obtained by cross-correlating the image with a kernel, and the LROC rating and localization data are the max and argmax, respectively, of this measurement made at all relevant search locations. The particular model observers tested were the nonprewhitening (NPW), channelized NPW (CNPW), and channelized Hotelling (CH) observers. Specification of the observer's search region was also part of the task definition, and several variations were considered that could approximate the training of human observers. The best agreement with the human observers was found with the CNPW observer, suggesting that the ability of human observers to prewhiten images may be degraded when the detection task requires signal localization.
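The LROC rule described above (cross-correlate the image with a kernel, then take the maximum and its location over the search region as the rating and the localization) can be sketched directly. The kernel, the search mask, and the circular FFT-based correlation are assumptions made for brevity; the channelized variants studied in the paper would replace the plain kernel.

```python
import numpy as np


def lroc_response(image, kernel, search_mask):
    """Return (rating, (row, col)): max and argmax of a kernel cross-correlation.

    `kernel` is image-sized and centred at pixel (0, 0) with wrap-around, so the
    FFT-based correlation is circular. Only pixels where `search_mask` is True
    are candidate locations.
    """
    corr = np.fft.ifft2(np.fft.fft2(image) * np.conj(np.fft.fft2(kernel))).real
    corr = np.where(search_mask, corr, -np.inf)
    row, col = np.unravel_index(np.argmax(corr), corr.shape)
    return float(corr[row, col]), (int(row), int(col))


# Hypothetical example: Gaussian kernel, signal placed at (20, 37) in white noise.
n = 64
d = np.minimum(np.arange(n), n - np.arange(n))       # wrap-around distance from pixel 0
kernel = np.exp(-(d[:, None] ** 2 + d[None, :] ** 2) / (2 * 2.0**2))
rng = np.random.default_rng(0)
image = np.roll(kernel, (20, 37), axis=(0, 1)) + 0.1 * rng.standard_normal((n, n))
rating, location = lroc_response(image, kernel, np.ones((n, n), dtype=bool))
print(rating, location)
```

Restricting `search_mask` is one simple way to express the different search-region definitions mentioned above.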
Optimization of model observer performance for signal known exactly but variable tasks leads to optimized performance in signal known statistically tasks
Miguel P. Eckstein, Yani Zhang, Binh Pham, et al.
Previous work has shown that model observers can be used for automated optimization of human performance in clinically relevant detection tasks where the signal does not vary and is known to the observers (signal known exactly, SKE). In the present study, we investigate whether model observers can be used for automated optimization of a more clinically realistic task in which the signal varies in shape and size from trial to trial and is not known to the observer (signal known statistically, SKS). We specifically test the hypothesis that optimizing a model observer in a computationally more tractable task, in which the signal varies from trial to trial but is known to the observer (signal known exactly but variable, SKEV), leads to improved model and human performance in the SKS task. We optimized the JPEG 2000 encoder options to maximize the performance of a particular model observer (non-prewhitening with an eye filter, NPWE) for an SKEV task, using hybrid test images that combine simulated signals and patient x-ray coronary angiograms. We then show that JPEG 2000 encoder settings optimized for NPWE SKEV performance lead to improved NPWE performance in the clinically more realistic SKS task. A follow-up psychophysical study showed that human performance in the SKEV and SKS tasks improved by 18-24% with the encoder options resulting from NPWE SKEV performance optimization. These findings suggest that model observer performance in the computationally more tractable SKEV task can be used to optimize human performance in the more clinically realistic SKS task using real anatomic backgrounds.
Assessing the accuracy of estimates of the likelihood ratio
There are many methods to estimate, from ensembles of signal-present and signal-absent images, the area under the receiver operating characteristic curve for an observer in a detection task. For the ideal observer on realistic detection tasks, all of these methods are time consuming due to the difficulty in calculating the ideal-observer test statistic. There are relations, in the form of equations and inequalities, that can be used to check these estimates by comparing them to other quantities that can also be estimated from the ensembles. This is especially useful for evaluating these estimates for any possible bias due to small sample sizes or errors in the calculation of the likelihood ratio. This idea is demonstrated with a simulation of an idealized single photon emission detector array viewing a possible signal in a two-dimensional lumpy activity distribution.
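One well-known relation of this kind is that the likelihood ratio Λ(g) = p(g|H₁)/p(g|H₀) has expectation 1 under the signal-absent hypothesis, and 1/Λ has expectation 1 under the signal-present hypothesis (when both densities share the same support). The sketch below checks these identities on a toy Gaussian problem where Λ is known in closed form; in practice Λ would come from the estimation procedure being tested, and a sample mean far from 1 would flag bias. The toy model is an assumption, not the paper's SPECT simulation, and the relations used in the paper may differ.

```python
import numpy as np


def log_likelihood_ratio(x: np.ndarray, s: np.ndarray) -> np.ndarray:
    """Exact log LR for a known signal s in white Gaussian noise of unit variance."""
    return x @ s - 0.5 * (s @ s)


rng = np.random.default_rng(0)
s = np.full(4, 0.5)                                 # hypothetical signal, ||s|| = 1
n = 200_000
absent = rng.standard_normal((n, s.size))           # samples under H0
present = rng.standard_normal((n, s.size)) + s      # samples under H1

# Identities any correct LR must satisfy, up to sampling error:
check_h0 = np.exp(log_likelihood_ratio(absent, s)).mean()     # E[LR | H0] = 1
check_h1 = np.exp(-log_likelihood_ratio(present, s)).mean()   # E[1/LR | H1] = 1
print(f"E[LR | H0] ~ {check_h0:.3f}, E[1/LR | H1] ~ {check_h1:.3f}")
```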
Laguerre-Gauss basis functions in observer models
Observer models based on linear classifiers with basis functions (channels) are useful for evaluating detection performance with medical images. They allow spatial-domain calculations with a covariance matrix of tractable size. The term “channelized Fisher-Hotelling observer” will be used here; the model is also called the “channelized Hotelling observer”. There is an infinite number of basis function (channel) sets that could be employed. Examples of channel sets that have been used include difference-of-Gaussians (DOG) filters, difference-of-Mesa (DOM) filters, and Laguerre-Gauss (LG) basis functions. Another option, sums of LG functions (LGS), will also be presented here. This set has the advantage of having no DC response. The effect of the number of images used to estimate model observer performance will be described for both filtered 1/f^3 noise and GE digital mammogram backgrounds. Finite sample image sets introduce both bias and variance into the estimate. The results presented here agree with previous work on linear classifiers. The LGS basis set gives a small but statistically significant reduction in bias; however, this may not be of much practical benefit. Finally, the effect of varying the number of basis functions included in the set will be addressed. It was found that four LG bases or three LGS bases are adequate.
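A commonly used form of the Laguerre-Gauss channels is u_j(r) = (√2/a) exp(−πr²/a²) L_j(2πr²/a²), where L_j is the Laguerre polynomial of order j and a is a width parameter; with this normalisation the channels are orthonormal. The sketch below builds such channels on a pixel grid using NumPy's Laguerre-series evaluator and checks orthonormality numerically. The width a = 30 pixels is an assumption, and four channels are used only to match the number found adequate above; the LGS variant is not implemented because its construction is not given here.

```python
import numpy as np
from numpy.polynomial import laguerre


def lg_channels(n: int, n_channels: int, a: float) -> np.ndarray:
    """Laguerre-Gauss channels centred in an n x n image, shape (n_channels, n, n)."""
    y, x = np.mgrid[:n, :n] - (n - 1) / 2.0
    arg = 2.0 * np.pi * (x**2 + y**2) / a**2
    gauss = (np.sqrt(2.0) / a) * np.exp(-arg / 2.0)   # (sqrt(2)/a) exp(-pi r^2 / a^2)
    channels = []
    for j in range(n_channels):
        coeffs = np.zeros(j + 1)
        coeffs[j] = 1.0                               # selects L_j in the Laguerre series
        channels.append(gauss * laguerre.lagval(arg, coeffs))
    return np.array(channels)


chans = lg_channels(n=128, n_channels=4, a=30.0)      # a = 30 pixels is an assumed width
gram = np.einsum("iyx,jyx->ij", chans, chans)         # pixel sums approximate inner products
print(np.round(gram, 3))
```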
Technology and Image Quality Assessment
Measuring CRT display image quality: effects of phosphor type, pixel contrast, and luminance
There is much interest in methodology for accurate measurement of display image quality that is suitable for evaluation and quality control purposes. This work aimed to assess the image quality of two mammography cathode ray tube (CRT) displays for the detection of small targets simulating microcalcifications. Twenty-five test patterns containing single-pixel targets with variable background values and contrasts, as well as uniform-background test patterns, were generated. Each test pattern was displayed on a P-45 and a P-104 five-megapixel monitor, and images were acquired with a CCD camera. An existing method for measuring the signal-to-noise ratio (SNR) of targets on uniform backgrounds was adapted by including a crucial step to suppress the spurious effects of the display's raster lines. SNR was found to scale linearly with target contrast and to increase with background luminance. The P-45 phosphor was superior at low luminance. Preliminary indications are that this method may be the preferred way to evaluate AMLCD displays.
Automated scoring of CDMAM: a dose study
Ruben Rico, Serge L. Muller, Guillaume Peter, et al.
Digital mammography detectors have a large linear dynamic range that allows dose optimization. The CDMAM phantom is widely used in image quality assessment of mammography systems. Currently, human readers score the phantom, which is time-consuming and leads to intra- and inter-observer variability. We discuss the design and validation of an algorithm, based on the non-prewhitening matched filter observer, that is intended to replace human observers in the task of scoring the CDMAM 3.4 phantom. Agreement between the algorithm and human observers was evaluated through a dose study, and a correlation was established between the scores obtained by the two. The algorithm shows higher sensitivity and better discrimination of dose variations, which makes automated scoring of the CDMAM phantom a potential tool, free of observer variability, for quality control in screening mammography.
Spatial noise and threshold contrasts in LCD displays
This paper presents the results of initial physical and psychophysical evaluations of the noise of high-resolution LCDs. Five LCDs, having four different pixel structures, were evaluated. Spatial as well as temporal noise was measured physically with a high-performance CCD camera. Human contrast sensitivity in the presence of spatial noise was determined psychophysically using periodic stimuli (square-wave patterns) as well as aperiodic stimuli (squares). For the measurements of human contrast sensitivity, all LCDs were calibrated to the DICOM Part 14 Grayscale Standard Display Function (GSDF). The results demonstrate that spatial noise is the dominant noise in all LCDs, while temporal noise plays only a minor part. The magnitude of the spatial noise of LCDs lies between that of CRTs with a P104 phosphor and that of CRTs with a P45 phosphor. Of particular importance for LCD noise is the contribution of the pixel structure to the noise power spectrum, which shows up as sharp spikes at spatial frequencies beyond the LCDs' Nyquist frequency. The paper does not establish the importance of these spikes for human contrast sensitivity.
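A standard estimate of the two-dimensional noise power spectrum from uniform-field images is NPS(u, v) = (Δx Δy / (Nx Ny)) · ⟨|FFT[I − Ī]|²⟩, averaged over regions of interest. The sketch below implements that estimate and checks the Parseval property that integrating the NPS over frequency returns the pixel variance. The ROI size, the pixel pitch, and the simple per-ROI mean subtraction (no polynomial detrending) are assumptions; this is not the measurement pipeline used for the LCDs above.

```python
import numpy as np


def nps_2d(rois: np.ndarray, pixel_pitch: float) -> np.ndarray:
    """2D NPS from a stack of uniform-field ROIs with shape (n_roi, ny, nx).

    Units are signal^2 * length^2. Zero frequency sits at index [0, 0];
    apply np.fft.fftshift for display.
    """
    rois = np.asarray(rois, dtype=float)
    _, ny, nx = rois.shape
    detrended = rois - rois.mean(axis=(1, 2), keepdims=True)   # remove each ROI's mean
    spectra = np.abs(np.fft.fft2(detrended)) ** 2             # fft2 acts on the last two axes
    return spectra.mean(axis=0) * pixel_pitch**2 / (nx * ny)


rng = np.random.default_rng(0)
pitch = 0.2                                         # hypothetical pixel pitch, mm
rois = rng.normal(100.0, 2.0, size=(16, 64, 64))    # simulated uniform-field ROIs
nps = nps_2d(rois, pitch)
df = 1.0 / (64 * pitch)                             # frequency bin width, cycles/mm
print(f"integrated NPS = {nps.sum() * df * df:.3f}, "
      f"mean ROI variance = {rois.var(axis=(1, 2)).mean():.3f}")
```

Periodic pixel structure in a real display would appear as isolated peaks in `np.fft.fftshift(nps)`, which is the kind of spike described above.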
Search for optimal tube voltage for image plate radiography
Anders Tingberg, David Sjostrom
Purpose: To search for the tube voltage that results in the highest clinical image quality per unit of effective dose for chest and pelvis radiography, respectively, using image plates. Methods: Two anthropomorphic phantoms were imaged with several different tube voltages. For the chest phantom, the tube voltage was varied between 70 and 150 kV, and for the pelvis phantom between 50 and 102 kV. For each examination, the mAs settings were chosen so that the effective dose to the phantom was the same regardless of the tube voltage. The clinical image quality of the resulting images was evaluated by a panel of experienced radiologists using visual grading analysis of defined anatomical structures taken from the European Image Criteria. Images produced with the standard tube voltage settings (125 kV for chest and 70 kV for pelvis) were used as reference. These two kV settings were previously used for screen-film radiography. Results: For both the chest and the pelvis examinations, the image quality at a constant level of effective dose increased when the tube voltage was reduced. Conclusions: The image quality of image plate radiography can be increased by lowering the tube voltage compared with that used for screen-film radiography.
Impedance measurements for early detection of breast cancer in younger women: a preliminary assessment
Jules H. Sumkin M.D., Alexander Stojadinovic M.D., Michelle Huerbin, et al.
The purpose of this preliminary investigation is to explore the possibility that electrical impedance measurements of the breast can ultimately be used to screen younger women for early detection of breast cancer. As part of a comprehensive protocol to compare different modalities, participating women undergo a series of diagnostic examinations, including impedance measurements, under IRB-approved protocols. The results of the frequency-dependent algorithm are compared with the results of other imaging modalities, as well as with the diagnostic outcome when available. In a preliminary series of 83 patients (divided into two groups) with varying risk levels, a significant correlation between impedance measurements and results from other diagnostic modalities was observed. The specific algorithm, developed for high specificity, resulted in an overall specificity of 90 percent. The technologists found the procedure “simple,” “fast,” and “easy to use.” The interpretation of the results is straightforward. Our preliminary assessment is encouraging and indicates that the system may prove extremely useful for the purpose for which it was designed. Further technical improvements and clinical assessments are underway.
Statistics and Methodology
Proposed solution to the FROC problem and an invitation to collaborate
There is a need to develop observer performance methodologies that are not restricted to binary tasks, as is the widely used receiver operating characteristic (ROC) method. Clinical tasks involving lesion localization do not fit the ROC paradigm. The alternative approaches, namely free-response ROC (FROC) and localization ROC (LROC), have their own limitations: they neglect intra-image correlations, with consequently questionable statistical validity; their image-scoring criterion is arbitrary and affects the results; and they do not account for the two sub-tasks, detection and localization, inherent in these experiments. The purpose of this work is to propose a new FROC model that deals with these issues, to propose a maximum likelihood method of estimating the model parameters, and to invite others to collaborate in implementing the solution. The model was used to generate simulated FROC data that illustrate the above-mentioned limitations. Measures of detection and localization performance are proposed that will allow FROC performance to be interpreted more quantitatively, and observer performance experiments to be conducted with greater statistical power.
Contemporary issues for experimental design in assessment of medical imaging and computer-assist systems
Robert F. Wagner, Sergey V. Beiden, Gregory Campbell, et al.
The dialog among investigators in academia, industry, NIH, and the FDA has grown in recent years on topics of historic interest to attendees of these SPIE sub-conferences on Image Perception, Observer Performance, and Technology Assessment. Among the most visible issues in this regard have been the emergence of digital mammography and of modalities for computer-assisted detection and diagnosis in breast and lung imaging. These issues appear to be only the “tip of the iceberg,” foreshadowing a number of emerging advances in imaging technology. So it is timely to make some general remarks looking back and looking ahead at the landscape (or seascape). The advances have been facilitated and documented in several forums. The major role of the SPIE Medical Imaging Conferences is well known to all of us. Many of us were also present at the meeting of the Medical Image Perception Society, co-sponsored by CDRH and NCI, in September 2001 at Airlie House, VA. The workshops and discussions held at that conference addressed some critical contemporary issues related to how society, and in particular industry and the FDA, approach the general assessment problem. A great deal of inspiration for these discussions was also drawn from several workshops held in recent years on these issues and sponsored by the Biomedical Imaging Program of the National Cancer Institute, in particular on the problem of “The Moving Target” of imaging technology. Another development deserving our attention is that the Fourth National Forum on Biomedical Imaging in Oncology was recently held in Bethesda, MD, on February 6-7, 2003. These forums are presented by the National Cancer Institute (NCI), the Food and Drug Administration (FDA), the Centers for Medicare and Medicaid Services (CMS), and the National Electrical Manufacturers Association (NEMA). They are sponsored by the National Institutes of Health/Foundation for Advanced Education in the Sciences (NIH/FAES).
These forums led to the development of the NCI’s Interagency Council on Biomedical Imaging in Oncology (ICBIO) about two and a half years ago. The purpose of the ICBIO is to help developers of new imaging technologies for cancer screening and diagnosis find a coherent way to interface with the government agencies responsible for these areas. A recent product of these activities was an overview paper written by the present authors and published this year in the journal Academic Radiology (2). The paper includes a summary of some of the major developments in assessment methodology in recent years and several case studies from the public forum of the FDA’s Center for Devices and Radiological Health (CDRH). We will include a brief sketch of some of the key issues of that paper in this review.
Evaluating estimation techniques in medical imaging without a gold standard: experimental validation
John W. Hoppin, Matthew A. Kupinski, Donald W. Wilson, et al.
Imaging is often used for the purpose of estimating the value of some parameter of interest. For example, a cardiologist may measure the ejection fraction (EF) of the heart to quantify how much blood is pumped out of the heart on each stroke. In clinical practice, however, it is difficult to evaluate an estimation method because the gold standard is not known; e.g., a cardiologist does not know the true EF of a patient. An estimation method is typically evaluated by plotting its results against the results of another (more accepted) estimation method. This approach results in the use of one set of estimates as the pseudo-gold standard. We have developed a maximum-likelihood approach for comparing different estimation methods to the gold standard without the use of the gold standard. In previous work we presented the results of numerous simulation studies indicating that the method can precisely and accurately estimate the parameters of a regression line without a gold standard, i.e., without the x-axis. To further validate our method, we have designed an experiment performing volume estimation using a physical phantom and two imaging systems (SPECT and CT).
Propagation of reader variability in mammography to the variable expected benefit over the population
Robert F. Wagner, Craig A. Beam, Sergey V. Beiden
An approach to sampling the U.S. population of mammographers was devised by Beam, Layde, and Sullivan. They have now accumulated the reports of 108 radiologists, each reading the same random sample of 150 cases. We have applied multiple-reader, multiple-case (MRMC) ROC analysis to their data and reconfirmed the earlier conclusion of Beam that the variability in their observations is dominated by the range of reader skill in their sample (private communication, C. Beam 2000). The purpose of the present paper is to demonstrate how the range of reader skill may become amplified when it is propagated into the corresponding expected benefit curves. This paper is a work in progress; the full report has been submitted to the journal Medical Decision Making.
Image Interpretation
How does lesion location affect detection performance in digital mammography?
Walter Huda, Kent M. Ogden, Ernest M. Scalzetti, et al.
We investigated how the thickness of a mass lesion at the observer detection threshold varied with lesion location in the breast. A digital mammography system was used to acquire radiographs of an anthropomorphic breast phantom. Mammograms were acquired with and without mass lesions, permitting a difference image corresponding to the lesion alone to be generated. This isolated lesion was added at a reduced intensity to a non-lesion digital mammogram during a 4-alternative forced-choice (4-AFC) experiment. The lesion intensity corresponding to 92% correct performance in the 4-AFC experiments (I92%) was determined. Values of I92% were obtained at different locations in the anthropomorphic phantom, permitting the importance of breast thickness and structured background for lesion detection to be investigated. Lesion detection (I92%) was found to be best in high-signal-intensity regions (black) and ~25% lower in low-signal regions (white). Lesion detection also appeared to depend on the characteristics of the structured background. The experimental results correlated well with a computation that used a convolution of the lesion and the local background region in the mammogram.
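The 92%-correct criterion can be related to the detectability index d' used elsewhere in this volume. The minimal Python sketch below assumes an unbiased observer with equal-variance Gaussian responses (the standard signal-detection model for M-alternative forced choice, not necessarily the analysis the authors used). It computes proportion correct for a given d' and inverts that relation numerically.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def pc_mafc(d_prime: float, m: int, steps: int = 2000) -> float:
    """Proportion correct of an unbiased observer in an M-AFC task.

    Assumes equal-variance Gaussian responses: the signal alternative yields
    N(d', 1), each of the m - 1 noise alternatives yields N(0, 1), and the
    observer picks the largest response. Then
        Pc = integral of phi(x - d') * Phi(x)**(m - 1) dx,
    evaluated here with the trapezoidal rule.
    """
    lo, hi = -8.0, d_prime + 8.0
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * _STD_NORMAL.pdf(x - d_prime) * _STD_NORMAL.cdf(x) ** (m - 1)
    return total * h


def d_prime_for_pc(pc: float, m: int) -> float:
    """Invert pc_mafc by bisection; Pc rises monotonically with d'."""
    lo, hi = 0.0, 10.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if pc_mafc(mid, m) < pc:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# d' implied by the 92%-correct criterion in a 4-AFC experiment
print(round(d_prime_for_pc(0.92, 4), 2))
```

For m = 2 the integral reduces to the familiar Pc = Phi(d'/sqrt(2)). This gives a quick check of the numerics.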
Perceptual analysis of cancers missed in mammography screening
Mammography screening is recommended for a large population of women, with the aim of detecting the initial signs of breast cancer. However, owing to the complexity of the breast parenchyma and the low prevalence of cancer in the screening population, among other factors, a significant fraction of cancers are not initially reported and are found only in retrospect. Faulty visual search, that is, not examining the area where the cancer is located, is responsible for a third of these misses. All other unreported cancers attract some amount of visual attention, as indicated by the duration of visual gaze at the location of the lesion. Thus, perceptual and decision-making mechanisms must be understood in order to help radiologists detect cancer at earlier stages. We have been modeling these mechanisms using spatial frequency analysis, in a process inspired by the one performed by the eye-brain system. In this paper we compare the spatial frequency representation of areas that contain reported cancers with that of the same area on the previous mammogram, where the cancer was either not reported or was reported as a benign lesion. In addition, we contrast the representation of these areas containing cancerous lesions with the representation of the corresponding area in the cancer-free contralateral breast.
Eye-tracking AFROC study of the influence of experience and training on chest x-ray interpretation
David Manning, Susan C. Ethell, Trevor Crawford
Four observer groups with different levels of expertise were tested in an investigation into the comparative nature of expert performance. The radiological task was the detection and localization of significant pulmonary nodules in postero-anterior views of the chest in adults. Three test banks of 40 images were used. The observer groups were radiologists, 6 experienced radiographers before a six-month training program in chest image interpretation, the same radiographers after their training program, and 6 first-year undergraduate radiography students. Eye tracking was carried out on all observers to demonstrate differences in visual activity, and nodule detection performance was measured with an AFROC technique. Comparison of detection performance across the four groups showed that the radiologists and the radiographers after training were measurably superior at the task. The eye-tracking parameters saccadic length, number of fixations, visual coverage, and scrutiny time per film were measured for all subjects and compared. Missed nodules that were fixated and those that were not fixated were also identified for the radiologist group. Results have shown distinct stylistic differences in visual scanning strategy between the experienced and inexperienced observers that we believe can be generalized into a description of the characteristics of expert versus non-expert performance. The findings will be used in the educational program of image interpretation for non-radiology practitioners.
Pre-ablative high-resolution MRA facilitates electrophysiologic pulmonary vein ablation and reduces fluoroscopy time in patients with paroxysmal atrial fibrillation
Jeremy D. Collins, F. Scott Pereles M.D., David Bello M.D., et al.
Pulmonary MRA generates high-resolution images of the pulmonary veins (PV) and left atrium (LA), permitting characterization of complex PV anatomy. This is useful in electrophysiologic PV catheter ablation, a proven technique for the treatment of paroxysmal atrial fibrillation (PAF). The purpose of this study was to determine whether pre-ablative pulmonary MRA with intra-ablative viewing facilitates ablation by reducing fluoroscopy time. We studied the morphology of the LA and PV at 1.5T (Magnetom Sonata, Siemens Medical Solutions, Erlangen, Germany) with breath-held gadolinium-enhanced 3D MRA in 7 patients with PAF undergoing PV ablation. Data were volume rendered (VR) on a stereoscopic workstation. PV ostial diameter and cross-sectional area measurements were obtained on multi-planar reformatted (MPR) images. VR datasets were converted into digital movies and viewed on a laptop computer adjacent to the real-time fluoroscopic images. Fluoroscopy times for patients undergoing pre-ablative MRA mapping were compared with those of a cohort of 22 consecutive patients diagnosed with PAF who underwent catheter ablation without pre-ablative MRA planning. Mean PV ablation fluoroscopy times with MRA planning and with fluoroscopic imaging alone were 84±20 minutes and 114±20 minutes, respectively. Pre-ablative MRA planning resulted in a significant mean fluoroscopy time saving of 26% (p<0.05). In patients with PAF undergoing PV ablation, analysis of MRA datasets depicting PV anatomy confirms great anatomic variability between veins. Pre-ablative 3D PV mapping by MRA greatly facilitates fluoroscopic identification of individual veins and significantly reduces fluoroscopic radiation time.
Observer Models III
Fast search and localization algorithm based on human visual perception modeling: an application for fast localization of structures in mammograms
A computer algorithm for fast identification and localization of structures of interest in images is presented. The algorithm analyzes a reduced set of image neighborhoods, selected randomly by constrained sampling of an associated image map of much lower spatial resolution. The general approach is demonstrated by estimating the relative location of the breast tissue in a dataset of 860 digitized mammographic images. The computational times and breast tissue localization error rates are reported for different reduced-resolution image maps and for three different features used in the corresponding neighborhood analysis. Our results show a significant improvement in error rates and computational times compared with a pixel-intensity thresholding approach. The algorithm is very simple to implement and requires less computation time than sequentially processing every image element in a raster pattern. It can also be easily included in a hierarchical image analysis model.
Pre-envelope deconvolution for increased lesion detection efficiency in ultrasonic imaging
We use an ideal observer model to evaluate the efficiency of human observers detecting a simulated lesion in the presence of speckle, and the ability of pre-envelope deconvolution to improve performance in this task. We model the lesion as a localized area of increased scatter density, which translates into an area of higher variance in the ultrasound signal. Assuming that the scattering function and electronic noise obey Gaussian distributions, the ideal observer for lesion detection is given by a quadratic function of the in-phase (I) and quadrature (Q) data. For comparison, human-observer performance is assessed through two-alternative forced-choice (2AFC) psychophysical studies after a B-mode image is made by computing the magnitude (envelope) of the I and Q components. We also consider the effect of removing spatial correlations in the I and Q components before computing the magnitude (pre-envelope deconvolution). Our psychophysical studies indicate approximately a 4-fold improvement in detection efficiency with pre-envelope deconvolution.
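The simplest case shows why the ideal observer is quadratic in the I and Q data. The Python sketch below makes several assumptions: independent (spatially uncorrelated) zero-mean Gaussian I and Q samples, a lesion at a known location that only raises the variance, and hypothetical pixel counts and variances. Under these assumptions the ideal statistic reduces to the energy in the lesion region. Real speckle is correlated by the system response, which makes the ideal observer a general quadratic form. This sketch does not model that case.

```python
import random


def energy_statistic(i_vals, q_vals):
    """Sum of I**2 + Q**2 over the (known) lesion region.

    Assumes independent, zero-mean Gaussian I and Q samples whose variance is
    sigma0**2 in background and sigma1**2 > sigma0**2 inside the lesion. The
    log-likelihood ratio is then (1/(2*sigma0**2) - 1/(2*sigma1**2)) * sum + const,
    a monotone function of this sum, so the sum is the ideal-observer statistic.
    """
    return sum(i * i + q * q for i, q in zip(i_vals, q_vals))


def simulate_2afc(n_pixels, sigma_bg, sigma_lesion, trials, seed=0):
    """Monte Carlo 2AFC proportion correct for the energy detector."""
    rng = random.Random(seed)
    correct = 0.0
    for _ in range(trials):
        # One signal-present and one signal-absent alternative per trial.
        sig_i = [rng.gauss(0.0, sigma_lesion) for _ in range(n_pixels)]
        sig_q = [rng.gauss(0.0, sigma_lesion) for _ in range(n_pixels)]
        bg_i = [rng.gauss(0.0, sigma_bg) for _ in range(n_pixels)]
        bg_q = [rng.gauss(0.0, sigma_bg) for _ in range(n_pixels)]
        t_sig = energy_statistic(sig_i, sig_q)
        t_bg = energy_statistic(bg_i, bg_q)
        if t_sig > t_bg:
            correct += 1.0
        elif t_sig == t_bg:
            correct += 0.5  # ties split evenly
    return correct / trials


# Hypothetical lesion of 50 independent samples with a 20% higher standard deviation.
print(simulate_2afc(n_pixels=50, sigma_bg=1.0, sigma_lesion=1.2, trials=2000))
```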
Computation of the ensemble channelized Hotelling observer signal-to-noise ratio for ordered-subset image reconstruction using noisy data
Edward J. Soares, Howard C. Gifford, Stephen J. Glick
We investigated the estimation of the ensemble channelized Hotelling observer (CHO) signal-to-noise ratio (SNR) for ordered-subset (OS) image reconstruction using noisy projection data. Previously, we computed the ensemble CHO SNR using a method for approximating the channelized covariance of OS reconstruction, which requires knowledge of the noise-free projection data. Here, we use a “plug-in” approach, in which noisy data are used in place of the noise-free data in the aforementioned channelized covariance approximation. Additionally, we evaluated smoothing the noisy projections before using them in the covariance approximation. The task was detection of a 10% contrast Gaussian signal within a slice of the MCAT phantom. Simulated projections of the MCAT phantom were scaled and Poisson noise was added to create 100 noisy signal-absent data sets. Simulated projections of the scaled signal were then added to the noisy background projections to create 100 noisy signal-present data sets. These noisy data sets were then used to generate 100 estimates of the ensemble CHO SNR for reconstructions at various iterations. For comparison, the same calculation was repeated with the noise-free data. The results are reported as plots of the average CHO SNR generated in this fashion, along with 95% confidence intervals. They demonstrate that this approach works very well and would allow optimization of imaging systems and reconstruction methods using a more accurate object model (i.e., real patient data).
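For reference, the channelized Hotelling observer SNR has the standard form SNR^2 = dv^T S^{-1} dv. Here v are the channel outputs, dv is the difference of the class-mean channel outputs, and S is the channel covariance. The Python sketch below estimates this quantity from sample channel outputs of signal-present and signal-absent images. It does not implement the authors' OS covariance approximation. The channel templates passed to `channelize` are left to the user, because the abstract does not specify them. All function names are hypothetical.

```python
import math


def channelize(image, channels):
    """Channel outputs v_k = u_k . g for flattened image g and templates u_k."""
    return [sum(u * g for u, g in zip(ch, image)) for ch in channels]


def _mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def _covariance(vectors):
    """Unbiased sample covariance matrix of a list of equal-length vectors."""
    mu = _mean(vectors)
    dim, n = len(mu), len(vectors)
    cov = [[0.0] * dim for _ in range(dim)]
    for v in vectors:
        d = [a - b for a, b in zip(v, mu)]
        for r in range(dim):
            for c in range(dim):
                cov[r][c] += d[r] * d[c] / (n - 1)
    return cov


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def cho_snr(v_present, v_absent):
    """SNR = sqrt(dv^T S^-1 dv), with S the average of the two class covariances."""
    dv = [p - a for p, a in zip(_mean(v_present), _mean(v_absent))]
    s1, s0 = _covariance(v_present), _covariance(v_absent)
    s = [[0.5 * (x + y) for x, y in zip(r1, r0)] for r1, r0 in zip(s1, s0)]
    w = _solve(s, dv)  # Hotelling template expressed in channel space
    return math.sqrt(sum(d * wi for d, wi in zip(dv, w)))
```

In practice `v_present` and `v_absent` would be `channelize(image, channels)` applied to each reconstructed image of the corresponding class.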
Perceptual difference paradigm for analyzing image quality of fast MRI techniques
David L. Wilson, Kyle A. Salem, Donglai Huo, et al.
We are developing a method to objectively quantify image quality and applying it to the optimization of fast magnetic resonance imaging methods. In MRI, capturing the details of a dynamic process requires both high temporal and high spatial resolution. However, there is typically a trade-off between the two, forcing the sequence engineer to choose between optimizing imaging speed and optimizing spatial resolution. In response to this problem, a number of fast MRI techniques have been proposed. To evaluate different fast MRI techniques quantitatively, we use a perceptual difference model (PDM) that incorporates various components of the human visual system. The PDM was validated using subjective image quality ratings by naive observers and task-based measures as defined by radiologists. Using the PDM, we investigated the effects of various imaging parameters on image quality and quantified the degradation due to novel imaging techniques, including keyhole, keyhole Dixon fat suppression, and spiral imaging. The results have provided the MR sequence engineer with significant information about trade-offs between imaging time and quality. The PDM has been shown to be an objective tool for measuring image quality and can be used to determine the optimal methodology for various imaging applications.
Optimizing imaging hardware for estimation tasks
Medical imaging is often performed for the purpose of estimating a clinically relevant parameter. For example, cardiologists are interested in the cardiac ejection fraction, the fraction of blood pumped out of the left ventricle at the end of each heart cycle. Even when the primary task of the imaging system is tumor detection, physicians frequently want to estimate parameters of the tumor, e.g., size and location. For signal-detection tasks, we advocate that the performance of an ideal observer be employed as the figure of merit for optimizing medical imaging hardware. For estimation tasks, we have examined the use of the minimum variance of the ideal, unbiased estimator as a figure of merit for hardware optimization. This minimum variance can be calculated using the Fisher information matrix. To account for both image noise and object variability, we used a statistical method known as Markov-chain Monte Carlo. We employed a lumpy object model and simulated imaging systems to compute our figures of merit. We have demonstrated the use of this method in comparing imaging systems for estimation tasks.
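The minimum variance of an unbiased estimator is the Cramér-Rao bound, the inverse of the Fisher information. The sketch below covers a single scalar parameter and independent Poisson detector bins, using the standard result F(theta) = sum_i (dg_i/dtheta)^2 / g_i(theta). It assumes a fixed, known background. It therefore omits the object variability that the authors handle with Markov-chain Monte Carlo. `blurred_blob_system` is a hypothetical 1-D system used only for illustration.

```python
import math


def fisher_information(mean_fn, theta, eps=1e-5):
    """Fisher information for a scalar parameter with independent Poisson data.

    mean_fn(theta) returns the list of mean counts g_i(theta) in each detector
    bin; for Poisson data F(theta) = sum_i (dg_i/dtheta)**2 / g_i(theta).
    The derivative is taken by central finite differences.
    """
    g = mean_fn(theta)
    g_plus, g_minus = mean_fn(theta + eps), mean_fn(theta - eps)
    info = 0.0
    for gi, gp, gm in zip(g, g_plus, g_minus):
        dg = (gp - gm) / (2.0 * eps)
        info += dg * dg / gi
    return info


def cramer_rao_bound(mean_fn, theta):
    """Minimum variance of any unbiased estimator of theta: 1 / F(theta)."""
    return 1.0 / fisher_information(mean_fn, theta)


def blurred_blob_system(width, background=10.0, n_bins=41):
    """Hypothetical 1-D system: a blob of total strength theta, blurred by a
    Gaussian of the given width (in bins), on a flat background."""
    xs = [i - n_bins // 2 for i in range(n_bins)]
    profile = [math.exp(-x * x / (2 * width ** 2)) / (math.sqrt(2 * math.pi) * width) for x in xs]
    return lambda theta: [background + theta * p for p in profile]


# Compare two hypothetical blur widths for estimating a strength of 100 counts.
for w in (1.0, 3.0):
    print(w, cramer_rao_bound(blurred_blob_system(w), theta=100.0))
```

A lower bound indicates a system that permits more precise estimation of the parameter, so it can be ranked directly as a figure of merit.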
Optimizing a multiple-pinhole SPECT system using the ideal observer
Kevin Gross, Matthew A. Kupinski, Todd E. Peterson, et al.
In a pinhole imaging system, multiple pinholes are potentially beneficial, since more radiation will arrive at the detector plane. However, the images produced by the individual pinholes may multiplex (overlap), possibly decreasing image quality. In this work we develop a framework for comparing pinhole configurations using ideal-observer performance as a figure of merit. We compute the ideal-observer test statistic, the likelihood ratio, using a statistical method known as Markov-chain Monte Carlo. For each imaging system, we estimate the likelihood ratio for many realizations of noisy image data both with and without a signal present. The area under the resulting ROC curve provides a meaningful figure of merit for hardware comparison. We compare different pinhole configurations using a three-dimensional lumpy object model, a signal known exactly (SKE), and simulated pinhole imaging systems. The results of our work will eventually serve as a basis for the design of high-resolution pinhole SPECT systems.
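Once the likelihood ratio has been estimated for signal-present and signal-absent realizations, the area under the ROC curve can be computed nonparametrically. It equals the Wilcoxon-Mann-Whitney statistic, the fraction of (present, absent) pairs that are ranked correctly. The short Python sketch below assumes the test statistics are already available as lists. The MCMC step itself is not shown.

```python
def empirical_auc(stats_present, stats_absent):
    """Wilcoxon-Mann-Whitney estimate of the area under the ROC curve.

    Counts the (present, absent) pairs in which the signal-present statistic
    is larger; ties count as one half.
    """
    wins = 0.0
    for s1 in stats_present:
        for s0 in stats_absent:
            if s1 > s0:
                wins += 1.0
            elif s1 == s0:
                wins += 0.5
    return wins / (len(stats_present) * len(stats_absent))


# Hypothetical likelihood-ratio values for a handful of realizations.
print(empirical_auc([3.2, 1.8, 5.1, 0.9], [0.4, 1.1, 0.7, 2.0]))
```

Because the AUC depends only on the ordering of the statistics, any monotone transform of the likelihood ratio, such as its logarithm, gives the same value.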
Poster Session
MTF correction for optimizing softcopy display of digital mammograms: use of a vision model for predicting observer performance
Elizabeth A. Krupinski, Hans Roehrig, Michael Engstrom, et al.
The goal of this project was to develop an efficient method for optimizing CRT monitor performance for digital mammography. In this study we examined the effect on performance of processing images to compensate for limitations in the MTF of the CRT monitor. The Sarnoff JNDmetrix vision model is based on just-noticeable-difference measurement and frequency-channel vision-modeling principles. Given two images as input, the model returns accurate, robust estimates of their discriminability. Model predictions are then compared with human performance. Mammographic images (n = 250) with microcalcifications were viewed by six radiologists. The images were viewed once in original unprocessed form and once after processing. Results were compared with the output of the model, which was used to predict differences in the perceptibility of calcifications from luminance data measured with a high-resolution CCD camera. Human performance was better with the MTF-compensated images at all contrast levels. The JNDmetrix model predicted the same pattern of results. Correlation between human and model observer performance was very high. Using image processing methods to compensate for limitations in the MTF of CRT monitors can improve the detection performance of radiologists searching for microcalcifications.
Selective pattern enhancement processing for digital mammography, algorithms, and the visual evaluation
Masahiko Yamada, Kazuo Shimura, Takefumi Nagata
PEM (Pattern Enhancement Processing for Mammography) has been developed to enhance microcalcifications selectively without enhancing noise. It uses not only the frequency information but also the structural information of the specified objects. PEM processing uses two structural characteristics, i.e., steep edge structure and low-density isolated-point structure. PEM processing was evaluated visually using CR mammography images at two different resolutions. The image enhanced by PEM processing was compared with the image without enhancement and with the conventional unsharp-mask-processed image. In the PEM-processed image, the increase in noise due to enhancement was suppressed compared with the conventional unsharp-mask-processed image. The evaluation using the CDMAM phantom showed that PEM processing improved the detection performance for a minute circular pattern. By combining PEM processing with low- and medium-frequency enhancement processing, both mammary glands and microcalcifications are clearly enhanced.
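For comparison, the conventional unsharp-mask baseline adds a gain times the image's high-pass residual (the image minus a smoothed copy) back to the image. The 1-D Python sketch below assumes a boxcar smoothing kernel with an illustrative radius and gain. It is not PEM, which additionally uses structural information. It does show why frequency-only enhancement also boosts noise: any pixel that differs from its local mean is amplified, whether it is a calcification or noise.

```python
def box_blur(x, radius):
    """Moving average of width 2*radius + 1, replicating the edge samples."""
    n = len(x)
    out = []
    for i in range(n):
        window = [x[min(max(j, 0), n - 1)] for j in range(i - radius, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def unsharp_mask(x, radius=2, gain=1.0):
    """Conventional unsharp masking: x + gain * (x - blur(x))."""
    blurred = box_blur(x, radius)
    return [xi + gain * (xi - bi) for xi, bi in zip(x, blurred)]


# A step edge gains undershoot/overshoot; a flat region is left unchanged.
print(unsharp_mask([0.0] * 5 + [10.0] * 5))
```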
A digital image acquisition system for skin lesions
Ilias G. Maglogiannis, Dimitrios I. Kosmopoulos
A major issue in the design and implementation of an acquisition system for digital images of skin lesions is the ability to capture reproducible images. Reproducibility is considered essential for image analysis and classification and for the comparison of sequential images during follow-up studies. This paper describes a complete image acquisition system used for the collection of reproducible images of patients with melanoma and compares them with images displaying dysplastic nevi for diagnostic purposes. The system includes a standardized illumination and capturing geometry with polarizing filters. It also applies a series of software corrections: calibration to black, white, and internal and external camera parameters, shading correction, and median filtering. The validity of the calibration procedure and the ability of the system to produce reproducible images were tested by capturing sample images under 3 different ambient lighting conditions: dark, medium, and intense lighting. For each case the average values of the three RGB color planes and their standard deviations were calculated. The measured error differences ranged between 0.4 and 13.2 (on the 0-255 scale). Preliminary experiments with stereo measurements showed a repeatability of about 0.3 mm. These numbers demonstrate that the captured images are reproducible at a satisfactory level.
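Calibration to black and white references is commonly implemented as a two-point (flat-field) correction, which also removes fixed shading. The sketch below is a generic version built on that assumption. The authors' camera-parameter calibration and median filtering are not reproduced, and the function names are hypothetical. It also computes the per-plane mean and SD used to compare lighting conditions.

```python
from statistics import mean, pstdev


def flat_field(raw, dark, white, full_scale=255.0):
    """Two-point (black/white reference) correction of one color plane.

    raw, dark and white are equal-length lists of pixel values: the lesion
    image, an image of a black reference, and an image of a uniform white
    reference taken under the same illumination. Dividing by (white - dark)
    also removes fixed shading (vignetting, uneven lighting).
    """
    out = []
    for r, d, w in zip(raw, dark, white):
        span = w - d
        value = full_scale * (r - d) / span if span > 0 else 0.0
        out.append(min(max(value, 0.0), full_scale))  # clip to [0, full_scale]
    return out


def rgb_statistics(pixels):
    """Mean and population SD of each color plane for a list of (R, G, B) pixels."""
    return {name: (mean(plane), pstdev(plane)) for name, plane in zip("RGB", zip(*pixels))}


# Hypothetical 3-pixel plane with uniform dark and white references.
print(flat_field([10, 60, 110], [10, 10, 10], [110, 110, 110]))
print(rgb_statistics([(120, 80, 60), (124, 82, 58), (118, 79, 61)]))
```

Comparing `rgb_statistics` across images of the same target taken under different ambient lighting gives the kind of error differences reported above.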
Comparison of a segmentation algorithm to six expert imagiologists in detecting pulmonary contours on x-ray CT images
Quantitative evaluation of the performance of segmentation algorithms on medical images is crucial before their clinical use can be considered. Because the ground truth is unknown, we quantitatively compared the contours obtained by a pulmonary segmentation algorithm with contours manually drawn by six expert imagiologists on the same set of images. Two types of variability (inter-observer and intra-observer) should be taken into account when evaluating the performance of segmentation algorithms, and several methods for doing so have been proposed. This paper describes the quantitative evaluation of our segmentation algorithm using several figures of merit, exploratory and multivariate data analysis, and nonparametric tests. The evaluation is based on the inter-observer variability of six expert imagiologists from three different hospitals and the intra-observer variability of two expert imagiologists from the same hospital. Overall, this comparison showed that the consistency and accuracy of our pulmonary segmentation algorithm are adequate for most of the quantitative requirements mentioned by the imagiologists. We also believe that the methodology used to evaluate our algorithm is general enough to be applicable to many other segmentation problems in medical imaging.
Assessment of a novel, high-resolution, color, AMLCD for diagnostic medical image display: luminance performance and DICOM calibration
Alice N. Averbukh, David S. Channin M.D., Michael J. Flynn
This paper documents the results of the first in a series of experiments designed to evaluate the suitability of a novel, high-resolution, color, digital LCD panel for diagnostic-quality gray-scale image display. The goal of this experiment was to measure the performance of this display, especially with respect to luminance. A DICOM Part 14 calibration of the panel was done using both a pure gray look-up table and a color look-up table. The panel evaluated was the IBM T221 22.2" backlit AMLCD display with a native resolution of 3840 × 2400 pixels. Taking advantage of the color capabilities of the workstation, we created a 256-entry gray-scale calibration look-up table derived from a palette of 1786 nearly gray luminance values. We also constructed a 256-entry gray-scale calibration look-up table derived from a palette of 256 true gray values, for which the red, green, and blue values were equal. For the DICOM calibration derived from the 256-value palette, 45 of the 256 shades of gray were redundant. For the DICOM calibration derived from the 1786-value palette, all shades of gray were different, and the luminance change between successive gray levels agreed closely with the DICOM Part 14 standard. These calibrations will now be used in our subsequent evaluation of human contrast-detail perception on this panel.
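The DICOM Part 14 Grayscale Standard Display Function (GSDF) expresses log10 luminance as a rational polynomial in ln j, where j is the just-noticeable-difference (JND) index from 1 to 1023. The Python sketch below uses the published GSDF constants to build target luminances equally spaced in JND between a display's darkest and brightest available values. It then picks the nearest measured palette entry for each target and counts repeated (redundant) gray levels, the effect reported above for the 256-entry pure-gray palette. Nearest-in-luminance matching and the function names are assumptions, not the authors' procedure.

```python
import math

# Constants of the DICOM PS3.14 Grayscale Standard Display Function.
A, B, C, D, E = -1.3011877, -2.5840191e-2, 8.0242636e-2, -1.0320229e-1, 1.3646699e-1
F, G, H, K, M = 2.8745620e-2, -2.5468404e-2, -3.1978977e-3, 1.2992634e-4, 1.3635334e-3


def gsdf_luminance(j):
    """Luminance (cd/m^2) at JND index j, 1 <= j <= 1023."""
    x = math.log(j)
    num = A + C * x + E * x ** 2 + G * x ** 3 + M * x ** 4
    den = 1 + B * x + D * x ** 2 + F * x ** 3 + H * x ** 4 + K * x ** 5
    return 10 ** (num / den)


def jnd_index(lum):
    """Inverse of gsdf_luminance by bisection (the GSDF is monotonic)."""
    lo, hi = 1.0, 1023.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if gsdf_luminance(mid) < lum:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def calibrate(palette, levels=256):
    """Map `levels` GSDF targets onto measured palette luminances.

    Targets are equally spaced in JND index between the darkest and brightest
    palette entries; each is matched to the nearest palette luminance.
    Returns (chosen palette indices, number of redundant gray levels).
    """
    j_min, j_max = jnd_index(min(palette)), jnd_index(max(palette))
    chosen = []
    for n in range(levels):
        target = gsdf_luminance(j_min + (j_max - j_min) * n / (levels - 1))
        chosen.append(min(range(len(palette)), key=lambda i: abs(palette[i] - target)))
    return chosen, levels - len(set(chosen))


# Usage with a hypothetical 256-entry palette spaced linearly in luminance:
palette = [0.5 + (400.0 - 0.5) * i / 255 for i in range(256)]
_, redundant = calibrate(palette)
print("redundant gray levels:", redundant)
```

A palette with many more entries than output levels, such as the 1786 nearly gray values, leaves room to hit every GSDF target with a distinct entry.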
Investigation of JPEG 2000 encoder options on model observer performance in signal known exactly but variable tasks (SKEV)
The new still-image compression standard JPEG 2000 provides a set of features such as multiple-resolution representation, tiling, region-of-interest (ROI) coding, and easy compression-rate control. Previous evaluations of these encoder options have used non-task-based image quality metrics (PSNR) and non-medical images. In this paper, we investigated the effect of different JPEG 2000 encoder options on task-based model and human observer performance. Test images consisted of x-ray coronary angiogram backgrounds with simulated filling defects (signals) of 184 different sizes/shapes inserted in one of four simulated arteries. The task was to select the simulated artery containing the signal (four-alternative forced choice; 4-AFC). The signal on each trial varied in shape and size but was known to the observer (signal known exactly but variable, SKEV). We obtained performance for the non-prewhitening matched filter with an eye filter (NPWE) model for the SKEV task through direct template implementation (Eckstein et al., 2000) on the test images. For comparison, a follow-up human psychophysical study with two observers was conducted. Our results showed that the dependence of task performance on the JPEG 2000 encoder options was similar for the NPWE model and the human observers.
How to establish equivalence between two treatments in ROC analysis
Mitsuru Ikeda, Takeo Ishigaki, Kazunobu Yamauchi
We have applied statistical equivalence-test designs for success rates to equivalence testing in ROC analysis. In ROC analysis, however, it is difficult to determine the acceptable difference. We consider the ROC curve in the binormal model. The maximum allowable true difference in the area under the binormal ROC curve between two treatments is taken to be the one corresponding to the maximum acceptable true difference in sensitivity at the same specificity (or in specificity at the same sensitivity). We show that, when the two curves have the same slope in the binormal ROC plane, the tolerable true difference in area can be calculated from a given acceptable true difference in sensitivity at the same specificity (or specificity at the same sensitivity). One can therefore test the equivalence in diagnostic performance of two medical imaging tests with ROC analysis. The null hypothesis is that the true difference is at least the acceptable difference calculated from the acceptable sensitivity (or specificity) difference. The alternative hypothesis is that the true difference is smaller than it.
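In the binormal model, TPF = Phi(a + b * Phi^{-1}(FPF)) and AUC = Phi(a / sqrt(1 + b^2)). The Python sketch below performs the conversion described above and assumes both curves share slope b. It fixes the specificity, lowers the sensitivity of curve 1 by the acceptable amount, solves for the intercept a2 of curve 2, and takes the AUC difference. The parameter values in the usage line are illustrative only.

```python
import math
from statistics import NormalDist

_N = NormalDist()


def binormal_auc(a, b):
    """Area under the binormal ROC curve TPF = Phi(a + b * Phi^-1(FPF))."""
    return _N.cdf(a / math.sqrt(1.0 + b * b))


def tolerable_auc_difference(a1, b, specificity, delta_sens):
    """AUC difference equivalent to a sensitivity loss of delta_sens at a fixed
    specificity, assuming both binormal curves share the slope b."""
    z_fpf = _N.inv_cdf(1.0 - specificity)
    tpf1 = _N.cdf(a1 + b * z_fpf)
    tpf2 = tpf1 - delta_sens
    if not 0.0 < tpf2 < 1.0:
        raise ValueError("sensitivity difference leaves the (0, 1) range")
    a2 = _N.inv_cdf(tpf2) - b * z_fpf  # intercept of the second curve
    return binormal_auc(a1, b) - binormal_auc(a2, b)


# Illustrative values: a = 1.5, b = 1.0, specificity 0.90, sensitivity margin 0.05.
print(round(tolerable_auc_difference(1.5, 1.0, 0.90, 0.05), 4))
```

The resulting AUC margin depends on the specificity at which the sensitivity difference is fixed, so that operating point has to be chosen before the test.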
Relationship between changes in pupil size over time and diagnostic accuracy
Toru Matsumoto, Akira Furukawa, Megumu Tsuchikawa, et al.
The objective of this study was to measure the image exploration activity of physicians and thereby contribute to the development of a support system for CRT image interpretation in thoracic CT screening. We examined how the pupil diameters of five physicians changed over time during interpretation of a large quantity of CT images on a CRT monitor, and how this might be related to diagnostic accuracy. When a large quantity of CT images was viewed on a CRT monitor in a dimly lit room, the pupil diameter decreased during the second half of the long interpretation session in three of the five physicians. Furthermore, the pupil diameter frequently became approximately zero because the physician became drowsy. However, when the relationship between these phenomena and diagnostic accuracy was analyzed in one of the physicians, no evidence was found that they led to statistically significant false negatives or false positives. Despite these results, the potential risk of misdiagnosis cannot be ignored. It may be necessary to devise equipment and working conditions that do not cause the pupil diameter to become approximately zero during interpretation of images on a CRT monitor.
Differences in coverage patterns in cervical cytology screening
Amy Smith, Alastair G. Gale, David S. Wooding, et al.
The visual screening of cervical smears is a complex process requiring appropriate slide coverage to detect any unusual appearances without making any omission errors. In examining a smear, the observer has both to move the microscope stage appropriately to bring different slide areas into view and to visually search the information presented within the binocular visual field. This study examined the patterns of slide coverage by different individuals when they inspected liquid-based cervical smears. A binocular microscope was first adapted to record the physical movement of the stage by the observer and to capture the microscope’s visual field. An image of the area of the smear under the microscope was displayed on a PC monitor, and observers’ eye movements were recorded as they searched it. Observers moved the stage by manually adjusting the microscope controls, and all stage movements and focussing were also recorded. The behaviour of both novices and an expert screener was examined as they searched a number of test cervical smears. Novices adopted a regular examination pattern, which maximized slide coverage, albeit slowly. In contrast, the experienced screener covered the slides faster and more effectively, ensuring more overlap between microscope fields.
Contrast visibility of simulated microcalcifications in full field mammography systems
We evaluated the visibility of simulated subtle microcalcifications in real digital mammograms acquired with a flat-panel system (GE) and a CR system (Fuji). Ideal templates of microcalcifications were created based on the attenuation characteristics of subtle microcalcifications from biopsied specimens in magnified images. X-ray transmission coefficients were expressed as Al-equivalent thickness. In this way, the X-ray transmission of a particular lesion could be recalculated for other X-ray beams, different mammography systems, and different breast thicknesses. Additional corrections for differences in spatial resolution were based on the presampled MTF. Zero to 10 simulated microcalcifications were randomly distributed in square frames. These software phantoms were then inserted into sets of raw mammograms from the modalities under study. The composite images were compressed, processed, and printed as in clinical routine. Two experienced radiologists indicated the locations of the microcalcifications and rated their detection confidence. It is possible to assess the visibility of 'well-controlled' microcalcifications in clinical digital mammograms. Microcalcifications were more visible in the CR images than in the flat-panel images. This psychophysical method comes close to the radiologists’ practice. It allows processing and visualization to be included in the analysis and was well appreciated by our radiologists.
Subsecond magnetic resonance angiography and the evaluation of arteriovenous communications
Anish B. Zachariah, F. Scott Pereles M.D., Ryan Kaliney, et al.
Magnetic resonance (MR) angiography is becoming widely accepted in the diagnosis of vascular diseases. When used to evaluate arterial stenoses, aneurysms, thrombosis, or occlusion, MR angiography is a robust and accurate technique. Traditional techniques for contrast-enhanced magnetic resonance angiography (MRA) offer the benefit of high spatial resolution in characterizing vascular malformations, but have lacked the temporal resolution to describe dynamic flow events. The purpose of this project is to demonstrate the potential role of a novel technique, sub-second MRA, in the evaluation of abdominal arteriovenous malformations.
Training artificial neural networks (ANNs) with multiple target values to reduce output uncertainty
We have shown previously that there is uncertainty associated with the output of an artificial neural network (ANN), and we have now developed a new method to reduce this uncertainty by training ANNs with multiple target values. In conventional ANN training, binary target values are used to represent, e.g., benign and malignant cases; this approach does not take the various histology subtypes into consideration. In this work, we used both simulated datasets and a mammography dataset to show that the conventional training method leads to larger uncertainty in the ANN output. Eight ANNs were trained from different initial weights, and the variability of the ANN output was measured as the average, over test cases, of the standard deviation (SD) of the 8 ANNs' outputs for each case. In the simulation, in addition to the conventional method with binary target values, we also trained ANNs with multiple target values and with a set of continuous target values derived from the likelihood ratio of the underlying distributions. For the mammogram study, we assigned multiple target values based on histology subtypes. Both the simulation and mammogram studies showed that the ANNs achieved very similar overall performance regardless of the training method. However, training with multiple target values yielded lower uncertainty in the ANN outputs.
Identification of missed pulmonary nodules on low-dose CT lung cancer screening studies using an automatic detection system
Carol L. Novak, Li Fan, Jianzhong Qian, et al.
Multi-slice CT (MSCT) scanners allow nodules as small as 3 mm to be identified during screening. However, the associated large data sets make it challenging for radiologists to identify all small nodules in a reasonable amount of time. Computer-aided detection may play a critical role in identifying missed nodules. Thirteen MSCT screening studies, initially interpreted as "non-actionable" by a radiologist, were selected from participants in a lung cancer screening study. The study protocol defines "actionable" studies as those containing at least one solid non-calcified nodule larger than 3 mm, for which follow-up studies are recommended to exclude interval growth. An automatic detection algorithm was applied to the 13 studies to determine whether it could detect missed nodules, and whether any of these were large enough to be considered "actionable". The algorithm produced a total of 138 candidate nodules, an average of 10.6 per patient. Of these, 83 were characterized as true positives, yielding a positive predictive value of 60.1%. Ten automatically detected candidates were judged to be actionable nodules greater than 3 mm in diameter. Six of the 13 patients (46%) had at least one "actionable" finding detected by the computer that had been overlooked in the initial exam.
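The summary figures follow directly from the counts above. Taking TP as the 83 candidates judged true positives and FP as the remaining 138 - 83 = 55 candidates:

```latex
\begin{align*}
\text{PPV} &= \frac{\text{TP}}{\text{TP} + \text{FP}} = \frac{83}{83 + 55} = \frac{83}{138} \approx 0.601 = 60.1\%\\
\text{candidates per patient} &= \frac{138}{13} \approx 10.6\\
\text{patients with a missed actionable finding} &= \frac{6}{13} \approx 46\%
\end{align*}
```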
Comparison between different monitors to be used in the reading of digital mammographic images
Adele Lauria, Mauro Drogo, Maria Evelina Fantacci, et al.
Digital acquisition of mammographic images is becoming more widespread in hospitals, as is offline digitization of analog images to allow the use of CAD, archiving and statistical tools. Radiologists' performance in reading digital images is closely related to the quality of the images displayed on the monitor. We are investigating how different display devices influence the reading of digital images. To this end, we are using quality-control phantoms for mammography. Three different monitors are considered. The first is the high-resolution CRT display used as the diagnostic monitor of the GE digital mammography system. The others are a high-quality personal computer monitor and the display of a high-quality notebook. The phantoms used are the CDMAM 3.2 (a contrast-detail phantom) and the RMI 156 (which contains test objects that represent malignancies and small breast structures). Their images were acquired with the digital mammography system and then analyzed by two expert radiologists, who viewed them on the different display devices following the same procedure. The results of reading the phantoms and interpreting the images on the different monitors are presented here.
ROC analysis of lesion descriptors in breast ultrasound images
Michael P. Andre, Michael Galperin, Peter Phan M.D., et al.
Breast biopsy serves as the key diagnostic tool in the evaluation of breast masses for malignancy, yet the procedure affects patients physically and emotionally and may obscure the results of future mammograms. Studies show that high-quality ultrasound can accurately distinguish benign from malignant lesions; however, the technique has proven difficult to teach, and clinical results are highly variable. The purpose of this study is to develop a means of optimizing an automated Computer Aided Imaging System (CAIS) that assesses the Level of Suspicion (LOS) of a breast mass. We examine the contribution of 15 object features to lesion classification by calculating the Wilcoxon area under the ROC curve, AW, for all feature combinations in a set of 146 masses with known findings. For each interval of AW, the frequency of appearance of each feature and of its combinations with others was computed as a means of finding an "optimum" feature vector. The original set of 15 features was reduced to 6 (area, perimeter, Feret diameter Y, relief, homogeneity, average energy), improving AW from 0.82±0.04 for the original 15 to 0.93±0.02 for the subset of 6 (p=0.03). For comparison, two subspecialty mammography radiologists also scored the images for LOS, yielding Az values of 0.90 and 0.87. The CAIS performed significantly better (p=0.02).
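The Wilcoxon area AW is the Wilcoxon-Mann-Whitney statistic: the fraction of (malignant, benign) pairs in which the malignant case receives the more suspicious score, with ties counted as one half. The sketch below assumes a higher score means more suspicious; how the paper combines several features into one score for each feature combination is not stated in the abstract, so only the AUC step is shown.

```python
import numpy as np

def wilcoxon_auc(pos_scores, neg_scores):
    """Nonparametric (Wilcoxon-Mann-Whitney) area under the ROC curve.

    pos_scores: scores for actually-positive cases (e.g. malignant masses).
    neg_scores: scores for actually-negative cases (e.g. benign masses).
    Higher score = more suspicious; tied pairs count as one half.
    """
    pos = np.asarray(pos_scores, dtype=float)[:, None]   # column vector
    neg = np.asarray(neg_scores, dtype=float)[None, :]   # row vector
    wins = np.sum(pos > neg) + 0.5 * np.sum(pos == neg)  # compare every pair
    return float(wins / (pos.size * neg.size))

print(wilcoxon_auc([0.9, 0.8, 0.4], [0.3, 0.5, 0.1]))  # 8 of 9 pairs ordered correctly
```

The all-pairs comparison costs memory proportional to the product of the two group sizes, which is negligible for a set of 146 masses.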
Effects of grayscale window/level parameters on breast lesion detectability
The detectability of low-contrast lesions in medical images can be affected significantly by the choice of grayscale window width and level (W/L) for electronic display. Our objective was to measure the effects of various W/L conditions on lesion detectability in simulated and real mammographic images, and then correlate observer performance with predictions of detection thresholds derived from a visual discrimination model (VDM). In the first experiment, detection thresholds were measured in 2AFC trials for five W/L conditions applied to simulated mammographic backgrounds and lesions (i.e., Gaussian "masses" and blurred-disk "microcalcification clusters") using nonmedical observers. In the second experiment, the detectability of real microcalcification clusters in digitized mammograms was evaluated for three W/L conditions in an ROC observer study with mammographers. For the simulated images, there was generally good agreement between model and experimental thresholds and their variations across W/L conditions. Both experimental and model results showed significant reductions in thresholds when W/L processing was applied locally near the lesion. ROC results with digitized mammograms read by radiologists, however, failed to show enhanced detection of microcalcifications using a localized W/L frame, probably due to the nonuniform appearance of parenchymal tissue across the image.
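The W/L mapping being varied here can be sketched as a linear ramp between level - window/2 and level + window/2, clipped to the display range. This is a minimal illustration assuming an 8-bit display; local_window_level and its mean ± k·SD rule are hypothetical stand-ins for the localized W/L condition, whose exact definition is not given in the abstract.

```python
import numpy as np

def apply_window_level(image, window, level):
    """Map pixel values to 8-bit display gray levels with a linear W/L ramp.

    Values below level - window/2 are shown black, values above
    level + window/2 white, and the range in between is stretched linearly.
    """
    lower = level - window / 2.0
    scaled = (np.asarray(image, dtype=float) - lower) / window
    return np.round(np.clip(scaled, 0.0, 1.0) * 255).astype(np.uint8)

def local_window_level(image, rows, cols, k=2.0):
    """Hypothetical localized W/L: window spans mean +/- k SD of a region near the lesion."""
    roi = np.asarray(image, dtype=float)[rows, cols]
    level = roi.mean()
    window = max(2.0 * k * roi.std(), 1e-6)   # avoid a zero-width window
    return apply_window_level(image, window, level)
```

For example, apply_window_level([100, 200, 300], window=200, level=200) maps the three values to 0, 128 and 255. Narrowing the window around the local tissue values increases displayed contrast near the lesion at the cost of saturating the rest of the image.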
Bayesian ANN estimates of three-class ideal observer decision variables for classification of mammographic masses
Darrin C. Edwards, Li Lan, Charles E. Metz, et al.
We are using Bayesian artificial neural networks (BANNs) to classify mammographic masses. We investigated whether a BANN can estimate ideal observer decision variables to distinguish malignant, benign, and false-positive computer detections. Five features were calculated for 143 malignant and 125 benign mass lesions, and for 1049 false-positive computer detections, in 596 mammograms randomly divided into training and testing sets. A BANN was trained on the training set features and applied to the testing set features. We then used a known relation between three-class ideal observer decision variables and those used by a two-class ideal observer when two of the three classes are grouped into one class, giving one decision variable for distinguishing malignant from non-malignant detections, and a second for distinguishing true-positive from false-positive computer detections. For comparison, we pooled the training data into two classes in the same two ways and trained two-class BANNs for these two tasks. The three-class BANN decision variables were essentially identical in performance to the specifically trained two-class BANNs. This is consistent with the theoretical observation that three-class ideal observer decision variables are directly related to those used by a two-class ideal observer.
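One way to express the grouping, assuming the BANN's three outputs are estimates of the posterior class probabilities P(malignant|x), P(benign|x) and P(false positive|x) that sum to one: the posterior of a merged class is the sum of its members' posteriors, and for fixed class prevalences each resulting quantity is a monotonic function of the corresponding two-class likelihood ratio, so it yields the same ROC curve. This is an illustrative sketch, not necessarily the exact relation used by the authors; the function name is hypothetical.

```python
import numpy as np

def grouped_decision_variables(posteriors):
    """Two-class decision variables from three-class posterior estimates.

    posteriors: array of shape (n_cases, 3) whose columns are
    P(malignant | x), P(benign | x), P(false-positive detection | x).
    """
    p = np.asarray(posteriors, dtype=float)
    malignant_vs_nonmalignant = p[:, 0]           # malignant vs {benign, FP}
    true_vs_false_detection = p[:, 0] + p[:, 1]   # {malignant, benign} vs FP, = 1 - P(FP | x)
    return malignant_vs_nonmalignant, true_vs_false_detection

dv_malig, dv_tp = grouped_decision_variables([[0.7, 0.2, 0.1],
                                              [0.1, 0.3, 0.6]])
```

Because only the ranking of cases matters for ROC analysis, either variable can be compared directly with the output of a two-class BANN trained on the correspondingly pooled data.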
Computer-aided detection of lung cancer on chest radiographs: differences in the interpretation time of radiologists showing versus not showing improvement with CAD
Using data from a clinical trial of a commercial CAD system for lung cancer detection, we compared the chest radiograph interpretation times of radiologists who showed improvement in detecting lung cancer with computer assistance with those of radiologists who did not. Although the 15 radiologists as a group improved (Az of 0.8288 in independent reading versus 0.8654 in sequential reading with CAD; P=0.0058 for the improvement), only 9 individual radiologists improved, while 6 did not. The radiologists' behavior differed between cases that contained cancer and cancer-free cases. For the cases that contained a cancer, there was no statistically significant difference in interpretation time between the two groups (P=0.26). For the cancer-free cases, radiologists whose cancer detection improved with computer assistance had significantly longer interpretation times than those without improvement (P=0.02). This work shows that, compared with radiologists who showed no improvement, radiologists who increased their detection of lung cancer using CAD spent significantly more time confirming that cancer-free cases were true negatives, but did not spend more time reaching true-positive decisions on cancer cases.
Statistical analysis to assess automated level of suspicion scoring methods in breast ultrasound
A well-defined rule-based system has been developed for scoring the Level of Suspicion (LOS) from 0 to 5 based on a qualitative lexicon describing the ultrasound appearance of breast lesions. The purposes of the research are to assess and select one of the quantitative automated LOS scoring methods developed during preliminary studies on reducing benign biopsies. The study used a Computer Aided Imaging System (CAIS) to improve the uniformity and accuracy of applying the LOS scheme by automatically detecting, analyzing and comparing breast masses. The overall goal is to reduce biopsies of masses with lower levels of suspicion, rather than to increase the accuracy of diagnosing cancers (which will require biopsy anyway). For complex cysts and fibroadenomas, experienced radiologists were up to 50% less certain than CAIS in true-negative cases. Full correlation analysis was applied to determine which of the proposed LOS quantification methods best serves CAIS accuracy. This paper presents current results of applying statistical analysis to automated LOS scoring quantification for breast masses with known biopsy results. The First Order Ranking method was found to yield the most accurate results. The CAIS system (Image Companion, Data Companion software), developed by Almen Laboratories, was used to achieve these results.
Visual scan-path analysis with feature space transient fixation moments
Laura Dempere-Marco, Xiao-Peng Hu, Guang-Zhong Yang
The study of eye movements provides useful insight into the cognitive processes underlying visual search tasks. The dynamics of eye movements have often been analyzed from a purely spatial perspective. In many cases, however, it may not be possible to define meaningful or consistent dynamics without considering the features underlying the scan paths. In this paper, we define the feature space through the concept of visual similarity and non-linear low-dimensional embedding, which maps the image space onto a low-dimensional feature manifold that preserves the intrinsic similarity of image patterns. This enables perceptually meaningful features to be defined without domain-specific knowledge. On this basis, we introduce a new concept called Feature Space Transient Fixation Moments (TFM) and use it to address the feature space representation of visual search. We demonstrate the practical value of this concept for characterizing the dynamics of eye movements in goal-directed visual search tasks. We also illustrate how this model can be used to elucidate the fundamental steps involved in skilled search tasks through the evolution of transient fixation moments.
Evaluation of detection in compressed digital mammograms using numerical observers
The objective of this study was to evaluate an image compression technique for digital mammography using two numerical observer models: a nonprewhitening matched filter with an eye filter (NPWE) and a channelized Hotelling observer. A total of 1024 images were cropped from clinical digital mammograms and used as backgrounds. The images were acquired with a clinical full-field digital mammography (FFDM) system, and masses with sizes of 30, 40 and 60 pixels (100 μm pixel size) were simulated. In addition, microcalcifications were synthetically extracted from clinical digital mammograms and used in the study. Images were compressed with compression software (JPEG 2000, Aware Inc., Bedford, MA) at compression ratios of 1:1, 15:1 and 30:1. The channelized Hotelling observer model was investigated only for the masses, by transforming the images to channel space and computing the Hotelling trace for each compression condition. The NPWE model was investigated for both masses and microcalcifications at all compression conditions, and detection indices were computed both by assuming Gaussian statistics and by the 'percent correct' detection method. The results indicated reduced detection of microcalcifications at 30:1 compression, while almost no variation in detection index was observed for the simulated masses.
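A toy version of the NPWE calculation with both detection-index estimates. Everything here is an assumption for illustration: white Gaussian noise stands in for the cropped mammographic backgrounds, a Gaussian blob stands in for a simulated mass, the eye filter uses the common form E(f) = f^n exp(-c f^2) with uncalibrated parameters, compression is not modelled, and the channelized Hotelling observer is not shown. The 'percent correct' estimate treats every signal-present/signal-absent pair as a 2AFC trial and converts with d' = √2·Φ⁻¹(Pc); all function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def eye_filter(n_pix, exponent=1.3, c=65.0):
    """Radial eye filter E(f) = f**exponent * exp(-c f**2), f in cycles/pixel.

    Illustrative parameters (peak near 0.1 cycles/pixel), not calibrated
    to any viewing distance.
    """
    freqs = np.fft.fftfreq(n_pix)
    fx, fy = np.meshgrid(freqs, freqs, indexing="ij")
    f = np.hypot(fx, fy)
    return f**exponent * np.exp(-c * f**2)

def npwe_template(signal, eye):
    """NPWE template: the expected signal filtered by the squared eye filter."""
    return np.real(np.fft.ifft2(eye**2 * np.fft.fft2(signal)))

def gaussian_dprime(lam_s, lam_n):
    """d' assuming Gaussian responses: mean difference over pooled SD."""
    pooled = np.sqrt(0.5 * (lam_s.var(ddof=1) + lam_n.var(ddof=1)))
    return (lam_s.mean() - lam_n.mean()) / pooled

def percent_correct_dprime(lam_s, lam_n):
    """Pc over all signal/no-signal pairs (2AFC), then d' = sqrt(2) * z(Pc)."""
    diff = lam_s[:, None] - lam_n[None, :]
    pc = np.mean(diff > 0) + 0.5 * np.mean(diff == 0)
    return pc, np.sqrt(2.0) * NormalDist().inv_cdf(pc)

def simulate(n_pix=32, n_img=1000, amplitude=0.5, width=2.0, seed=0):
    """Toy experiment: Gaussian blob in white noise."""
    rng = np.random.default_rng(seed)
    y, x = np.mgrid[:n_pix, :n_pix] - n_pix / 2
    signal = amplitude * np.exp(-(x**2 + y**2) / (2 * width**2))
    w = npwe_template(signal, eye_filter(n_pix))
    absent = rng.standard_normal((n_img, n_pix, n_pix))
    present = rng.standard_normal((n_img, n_pix, n_pix)) + signal
    lam_n = (absent * w).sum(axis=(1, 2))     # NPWE decision variable, signal absent
    lam_s = (present * w).sum(axis=(1, 2))    # NPWE decision variable, signal present
    pc, d_pc = percent_correct_dprime(lam_s, lam_n)
    return gaussian_dprime(lam_s, lam_n), pc, d_pc
```

Because the NPWE responses in this toy setup are Gaussian with equal variance, the two estimates should agree to within sampling error; a large disagreement would indicate non-Gaussian responses.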