Efficient channels for the ideal observer
Author(s):
Subok Park;
Matthew A. Kupinski;
Eric Clarkson;
Harrison H. Barrett
For a signal-detection task, the Bayesian ideal observer is
optimal among all observers because it incorporates all the
statistical information of the raw data from an imaging system.
The ideal observer test statistic, the likelihood ratio, is
difficult to compute when uncertainties are present in backgrounds
and signals. In this work, we propose a new approximation
technique to estimate the likelihood ratio. This technique is a
dimensionality-reduction scheme we will call the channelized-ideal
observer (CIO). We can reduce the high-dimensional integrals of
the ideal observer to the low-dimensional integrals of the CIO by
applying a set of channels to the data. Lumpy backgrounds and
circularly symmetric Gaussian signals are used for simulation
studies. Laguerre-Gaussian (LG) channels have been shown to be
useful for approximating ideal linear observers with these
backgrounds and signals. For this reason, we choose to use LG
channels for our data. The concept of efficient channels is
introduced to closely approximate ideal-observer performance with
the CIO for signal-known-exactly (SKE) detection tasks.
Preliminary results using one to three LG channels show that the
performance of the CIO is better than that of the channelized-Hotelling
observer for these SKE detection tasks.
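The dimensionality reduction the abstract describes can be sketched as follows. This is a toy illustration, not the authors' code: the channel width, grid size, and channel count are arbitrary choices. A few rotationally symmetric Laguerre-Gauss channel profiles are built and the raw data vector is projected onto them, so that subsequent ideal-observer integrals involve only the low-dimensional channel outputs.

```python
import numpy as np
from scipy.special import eval_laguerre

def lg_channels(n_channels, width, grid):
    """Rotationally symmetric Laguerre-Gauss channel profiles.
    `width` (the Gaussian width) and `grid` are illustrative choices."""
    x = np.arange(grid) - grid / 2
    xx, yy = np.meshgrid(x, x)
    r2 = xx**2 + yy**2
    chans = []
    for p in range(n_channels):
        u = np.exp(-np.pi * r2 / width**2) * eval_laguerre(p, 2 * np.pi * r2 / width**2)
        chans.append(u.ravel() / np.linalg.norm(u))
    return np.array(chans)              # shape: (n_channels, grid*grid)

def channelize(T, g):
    """Reduce an M-pixel data vector g to a channel vector v = T g."""
    return T @ g

rng = np.random.default_rng(0)
T = lg_channels(3, 10.0, 32)            # three LG channels on a 32x32 grid
g = rng.normal(size=32 * 32)            # stand-in for raw image data
v = channelize(T, g)
print(v.shape)                          # integrals over the data are now 3-D
```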
Using Fisher information to compute ideal-observer performance on detection tasks
Author(s):
Fangfang Shen;
Eric Clarkson
In medical imaging, signal detection is one of the most important tasks. A common way to evaluate the performance of an imaging system for a signal-detection task is to calculate the detectability of
the ideal observer. Since the detectability of an ideal observer is not always easy to calculate, it is useful to have approximations for it. These approximations can also be used to check the bias of
numerical computations of the ideal-observer detectability. For signal detection tasks, we usually have two probability densities for the data vector, the signal-absent density and the signal-present density. In this work, we use a single probability density with a variable scalar or vector parameter to represent the corresponding densities under the two hypotheses. The ideal-observer detectability is derived from the area under the receiver operating characteristic curve of the ideal observer for the given detection
task. We have found that we can develop expansions for the square of this detectability as a function of the signal parameter, and that the lowest order expansions involve the Fisher information matrix for
the problem of estimating the signal parameter. There are four basic methods we have considered for deriving such expansions. We compute these approximations to ideal-observer detectability for several cases. We compare these to the exact detectability values for these same cases, derived from results in previous work, to examine the usefulness of these approaches. The idea of using one parameterized
probability density function is introduced in order to relate detection performance to estimation performance. Even without an analytical expression for ideal-observer detectability we are able to
compute analytical forms for its derivatives in terms of the Fisher information matrix and similarly defined statistical moments. The results suggest that there is a connection between the performance
of a system on signal-detection tasks and signal-estimation tasks.
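The lowest-order relationship between detectability and Fisher information can be illustrated with a scalar Gaussian family, for which the expansion happens to be exact. The check below is not from the paper: it estimates the Fisher information numerically as the variance of the score and compares the lowest-order term dθ²·F with the exact d'².

```python
import numpy as np

# For a scalar Gaussian family p(x|theta) = N(theta, sigma^2), the Fisher
# information is F = 1/sigma^2, and the ideal-observer detectability for
# theta = 0 vs theta = dtheta is exactly d'^2 = dtheta^2 / sigma^2 --
# i.e. the lowest-order expansion term dtheta^2 * F.
sigma = 2.0

# Monte-Carlo estimate of F as the variance of the score d/dtheta log p(x|theta).
rng = np.random.default_rng(1)
x = rng.normal(0.0, sigma, size=200_000)
score = (x - 0.0) / sigma**2            # score evaluated at theta = 0
F_mc = score.var()

dtheta = 0.5
d2_exact = dtheta**2 / sigma**2
d2_approx = dtheta**2 * F_mc            # lowest-order Fisher-information term
print(F_mc, d2_exact, d2_approx)
```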
The effect of non-linear human visual system components on linear model observers
Author(s):
Yani Zhang;
Binh T. Pham;
Miguel P. Eckstein
Linear model observers have been used successfully to predict human performance in clinically relevant visual tasks for a variety of backgrounds. On the other hand, there has been another family of models used to predict human visual detection of signals superimposed on one of two identical backgrounds (masks). These masking models usually include a number of non-linear components in the channels that reflect properties of the firing of cells in the primary visual cortex (V1). The relationship between these two traditions of models has not been extensively investigated in the context of detection in noise. In this paper, we evaluated the effect of including some of these non-linear components in a linear channelized Hotelling observer (CHO), and the associated practical implications for medical image quality evaluation. In particular, we evaluated whether the rank-order evaluation of two compression algorithms (JPEG vs. JPEG 2000) is changed by inclusion of the non-linear components. The results show that: a) the simpler linear CHO model observer outperforms the CHO model with the non-linear components investigated; and b) the rank order of model observer performance for the compression algorithms did not vary when the non-linear components were included. For the present task, the results suggest that adding physiologically based channel non-linearities to a channelized Hotelling observer might add complexity to the model observers without great impact on medical image quality evaluation.
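For reference, the linear CHO baseline against which the non-linear variants were compared can be sketched generically. This is a textbook-style implementation on synthetic channelized data, not the study's code; the channel count and mean shift are arbitrary.

```python
import numpy as np

def cho_template(v0, v1):
    """Channelized Hotelling template from channelized training samples.
    v0, v1: (n_samples, n_channels) for signal-absent / signal-present."""
    dv = v1.mean(0) - v0.mean(0)
    S = 0.5 * (np.cov(v0.T) + np.cov(v1.T))    # average intraclass covariance
    return np.linalg.solve(S, dv)              # w = S^{-1} (v1_bar - v0_bar)

rng = np.random.default_rng(2)
n, k = 5000, 4                                  # samples, channels (illustrative)
v0 = rng.normal(0.0, 1.0, (n, k))
v1 = rng.normal(0.5, 1.0, (n, k))               # mean shift = channelized signal
w = cho_template(v0, v1)
t0, t1 = v0 @ w, v1 @ w                         # Hotelling test statistics
snr = (t1.mean() - t0.mean()) / np.sqrt(0.5 * (t0.var() + t1.var()))
print(round(snr, 2))                            # close to 1 for this toy model
```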
Metrics of medical image quality: task-based model observers vs. image discrimination/perceptual difference models
Author(s):
Miguel P. Eckstein;
Yani Zhang;
Binh T. Pham
There have been two distinct approaches to developing human vision models that can be used to perform automated evaluation and optimization of medical image quality: linear task-based model observers vs. perceptual difference/image discrimination models. Although these two approaches are very different, there has been little work directly comparing them in their ability to optimize human performance in clinically relevant tasks. We compared the effectiveness of these two types of image-quality metrics for automated computer optimization of JPEG 2000 image compression encoder settings, using test images that combined real x-ray coronary angiogram backgrounds with simulated filling defects of 184 different sizes/shapes. A genetic algorithm was used to optimize the JPEG 2000 encoder settings with respect to: a) the performance of a particular task-based model observer (non-prewhitening matched filter with an eye filter, NPWE); b) the error metric of a particular perceptual difference/image discrimination model (DCTune 2.0; NASA Ames Research Center). A subsequent human psychophysical study was conducted to evaluate the effect of the two different optimized compression encoder settings on visual detection of the simulated filling defect in one of four locations (four-alternative forced choice; 4 AFC). Results show that optimizing JPEG 2000 encoder settings with respect to both the NPWE performance and the DCTune 2.0 perceptual error led to improved human task performance relative to human performance with the default encoder settings. However, the NPWE optimization led to a much greater improvement in human performance than the perceptual difference model optimization.
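The NPWE observer's template can be sketched minimally, assuming a generic radial eye filter of the form E(f) = f^c · exp(-γf²) with placeholder exponents; the study's fitted eye-filter parameters are not given here, and the signal is a stand-in Gaussian blob.

```python
import numpy as np

def eye_filter(f, c=1.3, gamma=0.35):
    """Illustrative radial eye filter E(f) = f^c * exp(-gamma f^2);
    the exponents are placeholders, not the study's fitted values."""
    return f**c * np.exp(-gamma * f**2)

def npwe_template(signal):
    """NPWE template: the expected signal filtered twice by the eye
    filter in the frequency domain, w = E^T E s."""
    n = signal.shape[0]
    fx = np.fft.fftfreq(n)
    fr = np.sqrt(fx[:, None]**2 + fx[None, :]**2) * n   # cycles per image
    E2 = eye_filter(np.maximum(fr, 1e-6))**2
    return np.real(np.fft.ifft2(np.fft.fft2(signal) * E2))

n = 64
x = np.arange(n) - n / 2
s = np.exp(-(x[:, None]**2 + x[None, :]**2) / (2 * 3.0**2))  # stand-in signal
w = npwe_template(s)
score = float(np.sum(w * s))        # NPWE response to the signal itself
print(w.shape, score > 0)
```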
Predicting detection task performance using a visual discrimination model
Author(s):
Dev Prasad Chakraborty
In the visual discrimination model (VDM) approach to measuring image quality, two input images are analyzed by an algorithm that calculates a just-noticeable-difference (JND) index, a measure of the perceptual difference between the two images in the discrimination task. It has been proposed that if one can simulate relevant lesions and backgrounds, the same method can be used to predict target detectability. One generates pairs of images that are exactly identical except for the presence of a lesion in one of them. The JND index measured on this image pair is thought to correlate with target detectability, such as might be measured in a receiver operating characteristic (ROC) study, and some experimental studies supporting this idea have appeared. It is pointed out in this work that this method can lead to anomalous results; namely, it does not predict the qualitative effect of lesion size on lesion detectability in mammographic backgrounds. Another anomaly is that the method appears to work on single images, whereas the ROC method needs sets of normal and abnormal images. In this work we show that by modifying the method so that comparisons of near-identical images are avoided, it is possible to predict the lesion-size dependence and avoid the clash with the ROC method.
Reader error, object recognition, and visual search
Author(s):
Harold L. Kundel
Small abnormalities such as hairline fractures, lung nodules and breast tumors are missed by competent radiologists with sufficient frequency to make them a matter of concern to the medical community; not only because they lead to litigation but also because they delay patient care. It is very easy to attribute misses to incompetence or inattention. To do so may be placing an unjustified stigma on the radiologists involved and may allow other radiologists to continue a false optimism that it can never happen to them. This review presents some of the fundamentals of visual system function that are relevant to understanding the search for and the recognition of small targets embedded in complicated but meaningful backgrounds like chests and mammograms. It presents a model for visual search that postulates a pre-attentive global analysis of the retinal image followed by foveal checking fixations and eventually discovery scanning. The model will be used to differentiate errors of search, recognition and decision making. The implications for computer aided diagnosis and for functional workstation design are discussed.
2-AFC observer study of shape and contrast discrimination in digital stereomammography
Author(s):
Andrew D. A. Maidment;
Todd Karasick;
Predrag R. Bakic;
Michael Albert
We continue to evaluate fundamental factors that affect the ability of human observers in digital stereomammography. A 2-alternative forced choice (2-AFC) observer study for discrimination of simulated objects in the presence of x-ray quantum noise was performed. In our previous contrast-detail and 2-AFC studies investigating the detection of simulated lesions, we observed that at the same total dose observers perform similarly for stereoscopic and monoscopic imaging. The current experiments were designed to investigate discrimination tasks. Three or four observers attended a series of sessions, each consisting of 300-400 image pairs. We sequentially evaluated discrimination of images based on object shape and contrast. In each trial, two images were presented, each containing a small disk of known size and position but differing in blurring or in contrast relative to the background. The observers indicated the image containing the disk with greater blurring or higher contrast. The experiments were repeated for 3 or 4 different values of signal-to-noise ratio (SNR), and for 3 different diameters. The fraction of correct responses was computed for each test condition. Discrimination performance was compared in terms of the linear fit of d’ as a function of SNR. Preliminary results again confirmed the advantage of stereoscopy. For the discrimination of blurred objects, the ratio of d’(SNR) averaged over all conditions took values in the range of 1.22-1.73 for the three observers (average 1.45), compared to the theoretically expected value of 1.41. No advantage was seen for discrimination of contrast (average 1.02). It appears that suppression of quantum noise in stereoscopically viewed simulated images by the human visual system enables advantages in discrimination of small lesions with different shape. It is possible, therefore, to match the dose of a stereo pair to the dose required for a single mammogram.
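The d' values underlying such 2-AFC comparisons follow from the fraction of correct responses via d' = √2 · Φ⁻¹(Pc). A minimal conversion, using illustrative fractions correct rather than the study's data:

```python
import numpy as np
from scipy.stats import norm

def dprime_2afc(pc):
    """Detectability index from 2-AFC fraction correct:
    d' = sqrt(2) * Phi^{-1}(Pc)."""
    return np.sqrt(2.0) * norm.ppf(pc)

# Pc = 0.5 is chance performance (d' = 0); higher Pc maps to higher d'.
# The theoretically expected stereo/mono d' ratio quoted above is sqrt(2).
print(dprime_2afc(0.5), dprime_2afc(0.76))
```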
Salient features in mammograms using Gabor filters and clustering
Author(s):
Philip Perconti;
Murray H. Loew
We present a method for locating salient features in mammograms. The perceptual importance, or salience, of image pixels can be studied using a statistical measure of pixel-based features. The “outliers,” or greatest values, of this measure can be regarded as salient because, in an imaging sense, the outliers tend to contribute to local feature contrast. Our method finds important image features first by spatially decomposing the image using a process that models the human visual system. Salience maps are then created using the Mahalanobis distance, and a scalar visibility metric is then analyzed. Six mammographers each read three mammograms. Each mammogram had two views. During screening, eye-position data were recorded. A K-means algorithm was then applied to identify fixation clusters. Following decomposition, analysis of variance (ANOVA) was performed to examine the effects of observer experience, spatial frequency, and discrimination using the maximum value of the visibility metric. This pilot study shows statistically significant differences in true positive and true negative features, and in both the features and filters used to discriminate true negative results, between expert and resident observers. This type of analysis can be useful for finding fixation tendencies that result from the available spatial features during mammogram screening.
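The Mahalanobis-distance salience step can be sketched as follows, on synthetic feature vectors rather than the mammogram features used in the study: each pixel's feature vector is scored by its Mahalanobis distance from the image-wide feature statistics, so feature-space outliers receive the highest salience.

```python
import numpy as np

def mahalanobis_salience(features):
    """Salience scores from pixel-based feature vectors: the Mahalanobis
    distance of each pixel's features from the image-wide feature mean.
    `features`: (n_pixels, n_features); outliers score as salient."""
    mu = features.mean(0)
    icov = np.linalg.inv(np.cov(features.T))
    d = features - mu
    return np.sqrt(np.einsum('ij,jk,ik->i', d, icov, d))  # per-pixel d^T C^{-1} d

rng = np.random.default_rng(5)
feats = rng.normal(0, 1, (1000, 3))     # synthetic 3-feature pixels
feats[0] = [6, 6, 6]                    # one feature-space outlier
sal = mahalanobis_salience(feats)
print(sal.argmax())                     # the outlier is the most salient pixel
```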
Effect of independent multiple reading of mammograms on detection performance
Author(s):
Nico Karssemeijer;
Johannes D. Otten;
Antonius A. J. Roelofs;
Sander van Woudenberg;
Jan H. C. L. Hendriks
The goal of this study was to assess the effect of independent combination of multiple readers in mammography on detection performance, using different rules to combine localized observer responses. A group of 12 radiologists each read a series of 192 screening mammograms, including 96 prior mammograms of breast cancer cases in which a visible sign of abnormality could be identified in retrospect. The other 96 cases were normal. In total the 12 readers annotated 1890 findings. LROC analysis was used to measure performance. The mean sensitivity in a false positive interval from 2 to 8% was 31.4% for single reading (range: 14.4% - 46.9%). The best rule for combination of observer scores was taking the average of all radiologists, using a zero score for radiologists who did not annotate the finding. With this strategy the average performance of 2 readers combined, in the interval selected, went up to 42.2%. When the interpretations of more readers were independently combined the mean sensitivity further increased, up to a level of 64.8% for the combination of all 12 readers. Using the mean score of only those readers who reported a finding turned out to be a poor strategy, yielding results that were similar to or worse than single reading.
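The best-performing combination rule described above, averaging over all readers with a zero score for readers who did not annotate the finding, is simple to state in code (the reader IDs and scores below are hypothetical):

```python
def combined_score(scores, n_readers):
    """Average-rule combination of localized reader scores for one finding.
    Readers who did not annotate the finding contribute a zero score.
    `scores` maps reader id -> suspicion score for this finding."""
    return sum(scores.values()) / n_readers

# A finding annotated by 2 of 12 readers, with hypothetical scores 80 and 60:
print(combined_score({"r3": 80, "r7": 60}, n_readers=12))
```

Averaging over all readers rather than only the reporting readers penalizes findings that most readers did not consider suspicious, which is why the mean-of-reporters rule performed no better than single reading.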
Categories of observer error from eye tracking and AFROC data
Author(s):
David Manning;
Susan Ethell;
Tim Donovan
Twenty-four volunteer observers were divided into groups of eight radiologists, eight radiographers and eight novices to carry out a pulmonary nodule detection task on a test bank of 120 digitized PA chest radiographs. The eight radiographers were tested twice: before and after a six-month training program in interpretation of the adult chest radiograph. During each test session the observers' eye movements were tracked. Data on the observers' decisions through AFROC methodology were correlated with their eye-movement and fixation patterns. False negative error-rates were recorded as 41% for the radiologists, 45% for the novices, 47% for the radiographers before training and 42% for the radiographers after training. The errors were sub-classified into search, recognition and decision errors depending on the duration of the fixation-time for each faulty response. Errors due to satisfaction of search were determined from images with multiple nodules. Differences between the groups were shown. Errors due to inefficient search were in the minority for all the observer groups and the dominant cause of unreported nodules was incorrect decision-making. True negative decisions from all observers were associated with shorter fixation times than false negative decisions. No correct negative decisions were made after fixations exceeding three seconds.
Attentional perceptual thresholds for manipulated digitized mammograms
Author(s):
Anthony Maeder;
Clinton Fookes
This paper describes a series of experiments to investigate the influence of subtle pixel-intensity transforms in radiological images on the perceptual responses of skilled observers. Contrast and edge enhancement operations were applied to digitized mammograms in order to determine the thresholds at which variations in attentional behavior, not consciously identified by the observer, were detected during normal visual scanning in a typical screening viewing situation. Continuous tracking of eye movements was undertaken to obtain patterns of fixation sequences and durations for three different observers, and both qualitative and quantitative analyses were applied to these data. Consistent thresholds at which attentional perturbation occurred were established, based on levels of aggregated pixel error determined by SNR values, across the different methods of image manipulation considered.
Modeling visual search during mammogram viewing
Author(s):
Harold L. Kundel;
Calvin F. Nodine
The purpose of this study was to determine whether quantitative parameters derived from visual scanpaths, obtained by eye-position recording and parsed using a search and error model, can be used to separate readers with different levels of expertise. Expertise was established on the basis of level of training and scores on an image reading test. The eye-position recordings of 9 readers, who searched 31 two-view (CC and MLO) mammograms for breast cancer, were reviewed. The likelihood of hitting a lesion with a "useful visual field" of 5 degrees of visual angle circumscribed about the center of the gaze location was measured, as well as the time required to first hit a lesion and the fixation dwell time on the lesion zone. The hits and misses were classified, using the search model, into scanning, recognition and decision categories. The mammographers were significantly different from the trainees when compared using the categories of the search model.
What can spatial frequency analysis tell us about inter-observer variability in mammogram reading?
Author(s):
Claudia Mello-Thoms
The differences in the interpretation of perceived findings are one of the most important elements in breast cancer detection. Several studies have shown that radiologists do not necessarily agree with each other, which is reflected in wide ranges of sensitivity and specificity, when groups of radiologists read the same mammogram cases. This variability, however, is not well understood. The characteristics of the areas where the observers agree or do not agree have not been widely explored. In this paper we compare the agreement rates of observers belonging to two different groups, namely, mammographers and residents, when reading a test set of mammograms. We determine the spatial frequency characteristics of areas that yield high agreement, as well as that of areas that yield high disagreement, among the observers.
Pulmonary nodule detection: what features attract attention?
Author(s):
Elizabeth A. Krupinski;
William Berger;
William Dallas;
Hans Roehrig
The goal of the study was to determine if there are certain physical features of pulmonary nodules that attract visual attention and contribute to increased recognition and detection by observers.
A series of posteroanterior chest images with solitary pulmonary nodules were searched by six radiologists as their eye position was recorded. The signal-to-noise ratio, size, conspicuity, location, and calcification status were measured for each nodule. Dwell parameters were correlated with nodule features and related to detection rates. Only nodule size (F = 5.08, p = 0.0254) and conspicuity (F = 4.625, p = 0.0329) influenced total dwell time on nodules, with larger, more conspicuous nodules receiving less visual attention than smaller, less conspicuous nodules. All nodule features examined influenced overall detection performance (p < 0.05) even though most did not influence visual search and attention.
Individual nodule features do not attract attention as measured by “first hit” fixation data, but certain features do tend to hold attention once the nodule has been fixated. The combination of all features influences whether or not a nodule is detected.
Hypervolume under the ROC hypersurface of a near-guessing ideal observer in a three-class classification task
Author(s):
Darrin C. Edwards;
Charles E. Metz;
Robert M. Nishikawa
We expressed the performance of the three-class "guessing" observer in terms of the six probabilities which make up a three-class receiver operating characteristic (ROC) space, in a formulation in which "sensitivities" are eliminated in constructing the ROC space (equivalent to using false-negative fraction and false-positive fraction in a two-class task). We then show that the "guessing" observer's performance in terms of these conditional probabilities is completely described by a degenerate hypersurface with only two degrees of freedom (as opposed to the five required, in general, to achieve a true hypersurface in such a ROC space). It readily follows that the hypervolume under such a degenerate hypersurface must be zero. We then consider a "near-guessing" task; that is, a task in which the three underlying data probability density functions (PDFs) are nearly identical, controlled by two parameters which may vary continuously to zero (at which point the PDFs become identical). The hypervolume under the ROC hypersurface of an observer in the three-class classification task tends continuously to zero as the underlying data PDFs converge continuously to identity (a "guessing" task). The hypervolume under the ROC hypersurface of a "perfect" ideal observer (a task in which the three data PDFs never overlap) is also found to be zero in the ROC space formulation under consideration. This suggests that hypervolume may not be a useful performance metric in three-class classification tasks, despite the utility of the area under the ROC curve for two-class tasks.
Problems with the differential receiver operating characteristic (DROC) method
Author(s):
Dev Prasad Chakraborty
Most papers in these proceedings present ideas that work. This is the story of an idea that did not work as intended. The differential receiver operating characteristic (DROC) method was proposed about 8 years ago. It was intended to measure the difference in performance between two imaging modalities. It was expected that the DROC method could outperform the ROC method in statistical power. This expectation has not been borne out and the author no longer recommends the DROC method. The purpose of this paper is to present a critical look at this method, why the author initially believed it should work, the assumptions involved and the fallacies. The author believes there is value to this frank account as it has yielded, at least for the author, new insights into ROC analysis. The author concludes with a few personal reflections on his experience with this project and advice on how to deal with negative results.
Jackknife free-response ROC methodology
Author(s):
Dev Prasad Chakraborty;
Kevin S. Berbaum
Although ROC analysis is the accepted methodology for evaluation of diagnostic imaging systems, it has some serious shortcomings. By contrast, FROC methodology allows the observer to report multiple abnormalities per case, and uses the location of reported abnormalities to improve the measurement. Because ROC methodology has no way to allow multiple responses or use the location information, its statistical power will suffer. The FROC method has not enjoyed widespread acceptance because of concern about whether responses made to the same case can be treated as independent. We propose a new jackknife FROC method (JAFROC) that does not make the independence assumption. The new method combines elements of FROC and the Dorfman-Berbaum-Metz (DBM) multi-reader ROC methods. To compare the JAFROC method to an earlier free-response method (alternative free-response or AFROC method), and to the DBM method, which uses conventional ROC scoring, we developed a model for generating simulated FROC detection and location data. The simulation model is quite general and can be used to evaluate any method for analysis of multiple-response detection-and-localization data. It allowed us to examine null hypothesis (NH) behavior and statistical power of analytic methods. We found that AFROC analysis did not pass the NH test, being unduly conservative. Both the JAFROC method and the DBM method passed the NH test, but JAFROC had more statistical power than the DBM method. The results of this comparison suggest that future studies of diagnostic performance may enjoy improved statistical power or reduced sample size requirements through the use of the JAFROC method.
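The jackknife step of such methods can be sketched generically: case-level pseudovalues of a figure of merit (FOM) are formed and then fed to a DBM-style mixed-effects ANOVA. The sketch below uses a toy mean-type figure of merit, not the actual JAFROC FOM, which compares lesion-localization ratings on abnormal cases with the highest false-positive rating on normal cases.

```python
def jackknife_pseudovalues(cases, fom):
    """Case-level jackknife pseudovalues of a figure of merit:
    PV_i = n * FOM(all cases) - (n - 1) * FOM(all cases except case i).
    In JAFROC these pseudovalues feed a DBM-style ANOVA."""
    n = len(cases)
    theta = fom(cases)
    return [n * theta - (n - 1) * fom(cases[:i] + cases[i + 1:])
            for i in range(n)]

# Toy FOM: the mean case score (for a mean-type FOM the pseudovalues
# reduce to the individual case scores, a useful sanity check).
scores = [0.2, 0.4, 0.9, 0.7]
pv = jackknife_pseudovalues(scores, lambda c: sum(c) / len(c))
print(pv, sum(pv) / len(pv))
```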
Evaluation of image analysis techniques without requiring ground truth or gold standard
Author(s):
Tianhu Lei;
Jayaram K. Udupa
Observers often evaluate image analysis techniques by comparing their results with the corresponding ground truth or gold standard. Difficulties in making such assessments often occur when the ground truth or gold standard is either unknown or inaccurate. Motivated by commonly used image restoration approaches, we developed an image analysis technique which, instead of assessing the obtained results, directly assesses the technique itself by testing its validity against fundamental imaging principles, which are well defined. We conducted a statistical investigation into MR imaging, starting from the data domain and proceeding to the image domain, and derived several intrinsic statistical properties of MR images. Based on these, we further established a finite normal mixture (FNM) model (in terms of pixel intensities and their independence) and a Markov random field (MRF) model (in terms of pixel intensities and their correlation) for MR images, and developed Expectation-Maximization (EM) and Iterated Conditional Modes (ICM) algorithms for FNM and MRF model-based image analysis. The results obtained by applying these algorithms to real MR images demonstrated that this image analysis technique can generate results that accurately fit the true objects.
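A minimal EM fit of a two-component finite normal mixture, in the spirit of the FNM model-based analysis described above (synthetic 1-D data rather than MR images; the initialization and component count are arbitrary choices):

```python
import numpy as np

def em_fnm(x, iters=50):
    """Minimal EM for a two-component 1-D finite normal mixture (FNM).
    Returns mixing weights, means, and standard deviations."""
    mu = np.array([x.min(), x.max()])       # crude initialization
    sd = np.array([x.std(), x.std()])
    w = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: posterior class responsibilities for each sample
        pdf = (w / (sd * np.sqrt(2 * np.pi)) *
               np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2))
        r = pdf / pdf.sum(1, keepdims=True)
        # M-step: update weights, means, and variances
        nk = r.sum(0)
        w = nk / len(x)
        mu = (r * x[:, None]).sum(0) / nk
        sd = np.sqrt((r * (x[:, None] - mu) ** 2).sum(0) / nk)
    return w, mu, sd

rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(0, 1, 3000), rng.normal(5, 1, 3000)])
w, mu, sd = em_fnm(x)
print(np.sort(mu))      # component means recovered near the true 0 and 5
```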
Human efficiency in the detection and discrimination tasks
Author(s):
Ingrid Reiser;
Charles E. Metz;
Robert M. Nishikawa
We investigated human efficiency in a discrimination task and compared it to human efficiency in an associated detection task.
The goal of this study was to investigate the relationship between image quality and shape discrimination in radiographic images. We conducted 2-AFC observer experiments to determine human performance and compared it to ideal observer performance in the SKE-BKE detection and discrimination tasks. We found that human efficiency was significantly lower for the discrimination task than for the detection task, and discrimination performance also depended on the actual object shape. The results support our hypothesis that the shape of individual microcalcifications in a mammogram cannot be identified reliably, unless the two microcalcification shapes in question are substantially different, such as punctate and linear.
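Human efficiency in such 2-AFC studies is conventionally defined as the squared ratio of human to ideal detectability, η = (d'_human / d'_ideal)². A minimal computation; the numbers below are illustrative, not the study's measurements:

```python
from math import sqrt
from statistics import NormalDist

def dprime_2afc(pc):
    """d' from 2-AFC fraction correct: d' = sqrt(2) * Phi^{-1}(Pc)."""
    return sqrt(2.0) * NormalDist().inv_cdf(pc)

def efficiency(pc_human, d_ideal):
    """Human efficiency eta = (d'_human / d'_ideal)^2."""
    return (dprime_2afc(pc_human) / d_ideal) ** 2

# Hypothetical numbers: a human at 76% correct vs an ideal observer d' of 2.
print(efficiency(0.76, 2.0))
```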
Gold standards and expert panels: a pulmonary nodule case study with challenges and solutions
Author(s):
Dave P. Miller;
Kathryn F. O’Shaughnessy;
Susan A. Wood;
Ronald A. Castellino M.D.
Comparative evaluations of reader performance using different modalities, e.g. CT with computer-aided detection (CAD) vs. CT without CAD, generally require a “truth” definition based on a gold standard. There are many situations in which a true invariant gold standard is impractical or impossible to obtain. For instance, small pulmonary nodules are generally not assessed by biopsy or resection. In such cases, it is common to use a unanimous consensus or majority agreement from an expert panel as a reference standard for actionability in lieu of the unknown gold standard for disease. Nonetheless, there are three major concerns about expert panel reference standards: (1) actionability is not synonymous with disease; (2) it may be possible to obtain different conclusions about which modality is better using different rules (e.g. majority vs. unanimous consensus); and (3) the variability associated with the panelists is not formally captured in the p-values or confidence intervals that are generally produced for estimating the extent to which one modality is superior to the other. A multi-reader-multi-case (MRMC) receiver operating characteristic (ROC) study was performed using 90 cases, 15 readers, and a reference truth based on 3 experienced panelists. The primary analyses were conducted using a reference truth of unanimous consensus regarding actionability (3 out of 3 panelists). To assess the three concerns noted above: (1) additional data from the original radiology reports were compared to the panel; (2) the complete analysis was repeated using different definitions of truth; and (3) bootstrap analyses were conducted in which new truth panels were constructed by picking 1, 2, or 3 panelists at random. The definition of the reference truth affected the results for each modality (CT with CAD and CT without CAD) considered by itself, but the effects were similar, so the primary analysis comparing the modalities was robust to the choice of the reference truth.
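The third check, bootstrapping the truth panel, can be sketched as follows (the panelist ratings below are hypothetical and the study's resampling details may differ): panelists are drawn with replacement and the unanimous-consensus truth is recomputed for each resampled panel.

```python
import random

def bootstrap_truth(panel_ratings, n_boot=1000, seed=0):
    """Resample truth panels (panelists drawn with replacement) and
    recompute a unanimous-consensus 'actionable' truth for each case.
    `panel_ratings`: per-panelist lists of booleans (actionable per case)."""
    rng = random.Random(seed)
    n_panel = len(panel_ratings)
    n_cases = len(panel_ratings[0])
    truths = []
    for _ in range(n_boot):
        picks = [rng.randrange(n_panel) for _ in range(n_panel)]
        truth = [all(panel_ratings[p][c] for p in picks)
                 for c in range(n_cases)]
        truths.append(truth)
    return truths

# Three hypothetical panelists rating four cases:
panel = [[True, True, False, True],
         [True, False, False, True],
         [True, True, False, False]]
truths = bootstrap_truth(panel, n_boot=5)
print(truths[0])
```

The spread of the modality comparison across the resampled truths then quantifies the panelist variability that a single fixed truth panel hides.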
Computer-aided detection of lung cancer on chest radiographs: effect of machine CAD true positive/false negative detections on radiologists' confidence level
Author(s):
Matthew T. Freedman M.D.;
Teresa Osicka;
Shih-Chung Benedict Lo;
Fleming Lure;
Xin-Wei Xu;
Jesse Lin;
Hui Zhao;
Ron Zhang
This paper evaluates the effect of computer-aided detection (CAD) prompts on the confidence and detection of cancer on chest radiographs. Expected findings included an increase in confidence rating and a decrease in variance in confidence when radiologists interacted with a computer prompt that confirmed their initial decision or induced them to switch from an incorrect to a correct decision. Their confidence rating decreased and the variance of confidence rating increased when the computer failed to confirm a correct or incorrect decision. A population of cases was identified that changed among reading modalities. This unstable group of cases differed between the Independent and Sequential without CAD modalities in cancer detection by radiologists and cancer detection by machine. CAD prompts induced the radiologists to make two types of changes in cases: changes on the sequential modality with CAD that restored an initial diagnosis made in the Independent read, and new changes that were not present in the Independent or Sequential reads without CAD. This has implications for double reading of cases. The effects of intra-observer variability and inter-observer variability are suggested as potential causes for differences in statistical significance of the Independent and Sequential Design approaches to ROC studies.
Observers' ability to judge the similarity of clustered calcifications on mammograms
Author(s):
Robert M. Nishikawa;
Yongyi Yang;
Dezheng Huo;
Miles Wernick;
Charlene A. Sennett;
John Papaioannou;
Liyang Wei
We are comparing two different methods for obtaining radiologists’ subjective impressions of similarity, for application in distinguishing benign from malignant lesions. Thirty pairs of mammographic clustered calcifications were used in this study. These 30 pairs were rated on a 5-point scale as to their similarity, where 1 was nearly identical and 5 was not at all similar. After this, all possible combinations of pairs of pairs were shown to the reader (n=435) and the reader selected which pair was more similar. This experiment was repeated by the observers with at least a week between reading sessions. Using analysis of variance, intra-class correlation coefficients (ICC) were calculated for both the absolute scoring method and the paired comparison method. In addition, for the paired comparison method, the coefficient of consistency within each reader was calculated. The average coefficient of consistency for the 4 readers was 0.88 (range 0.49-0.97). These results were statistically significantly different from guessing (p << 0.0001). The ICC for intra-reader agreement was 0.51 (95% CI: 0.37-0.66) for the absolute method and 0.82 (95% CI: 0.73-0.91) for the paired comparison method. This difference was statistically significant (p=0.001). For inter-reader agreement, the ICC was 0.39 (95% CI: 0.21-0.57) for the absolute method and 0.37 (95% CI: 0.18-0.56) for the paired comparison method. We conclude that humans are able to judge the similarity of clustered calcifications in a meaningful way. Further, radiologists had greater intra-reader agreement when using the paired comparison method than when using an absolute rating scale. Differences in the criteria used by different observers to judge similarity, and differences in interpreting which calcifications comprise the cluster, can lead to low ICC values for inter-reader agreement for both methods.
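The ICC computation can be illustrated with the simplest one-way random-effects form, ICC(1) = (MSB − MSW) / (MSB + (k−1)·MSW); the study's ANOVA model may differ, and the ratings below are synthetic.

```python
import numpy as np

def icc_oneway(ratings):
    """One-way random-effects intraclass correlation, ICC(1).
    `ratings`: (n_items, k_raters) array. This is the simplest ICC
    variant; the study's ANOVA model may differ."""
    n, k = ratings.shape
    grand = ratings.mean()
    item_means = ratings.mean(1)
    msb = k * ((item_means - grand) ** 2).sum() / (n - 1)       # between items
    msw = ((ratings - item_means[:, None]) ** 2).sum() / (n * (k - 1))  # within
    return (msb - msw) / (msb + (k - 1) * msw)

rng = np.random.default_rng(4)
true_sim = rng.normal(3, 1, 30)                            # latent similarity, 30 pairs
raters = true_sim[:, None] + rng.normal(0, 0.5, (30, 4))   # 4 noisy readers
print(icc_oneway(raters))      # high agreement when rater noise is small
```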
Use of BI-RADS lesion descriptors in computer-aided diagnosis of malignant and benign breast lesions
Author(s):
Yulei Jiang;
Robert A. Schmidt;
Robert M. Nishikawa;
Carl J. D'Orsi;
Carl J. Vyborny;
Gillian M. Newstead
Show Abstract
The purpose of this study was to determine whether combining an automated computer technique that classifies calcifications in mammograms as malignant or benign with radiologist-provided BI-RADS lesion description improves classification performance. Three expert mammography radiologists who were MQSA certified and familiar with BI-RADS retrospectively interpreted 125 cases of mammograms containing calcifications and provided BI-RADS lesion descriptions. A computer technique was applied to the mammograms to extract eight image features that describe the size, shape, and uniformity of individual as well as groups of calcifications. We compared the performance of artificial neural networks that estimated the likelihood of malignancy based on input from either the computer-extracted image features alone, the BI-RADS lesion descriptors alone, or the combination of both. The leave-one-out method was used. Combining the BI-RADS lesion description provided by a single radiologist and computer-extracted image features resulted in improved performance. However, using two radiologists' BI-RADS lesion descriptions such that one radiologist's data was used to train and another radiologist's data was used to test the neural network diminished this improvement in performance. These results suggest that variability in radiologists' BI-RADS lesion description is large enough to offset a potential gain in performance from combining it with an automated computer technique.
Comparison of nodule characteristics on CT and radiographic images
Author(s):
Zhimin Huo;
Melisa Gao;
David F. Yankelevitz;
Claudia I. Henschke;
William J. Kostis;
John Wandtke
Show Abstract
The purpose of this study is to identify differences in nodule characteristics manifested on computed tomography (CT) and X-ray images and to evaluate the ability of radiographic features to differentiate between benign and malignant nodules, compared to features extracted from CT. We collected 79 consecutive computed radiographic (CR) chest images with one or more CT-documented lung nodules. Upon viewing the CT slices, corresponding nodules were localized on the CR images by an experienced chest radiologist. Of the 79 CT nodules (19 benign, 60 malignant), 61 (14 benign, 47 malignant) were considered to be definitely visible on CR; the rest were considered invisible or did not qualify for distinct feature assessment. Eleven nodule features each were visually extracted from the CT and CR images. These features characterize the nodule in terms of size, shape, lobulation, spiculation, density, etc. Correlation between the CT and CR features was calculated for the 61 definitely CR-visible nodules. Receiver operating characteristic (ROC) analysis was performed to evaluate the ability of these features to differentiate between benign and malignant nodules. Results showed that CR and CT images agreed well in characterizing nodules in terms of shape, lobulation, spiculation and density features. We found that 40-50% of the cases had the same CR and CT ratings, and in another 41-51% of cases the CT and CR ratings differed by one, for the shape (3-point scale), lobulation (4-point scale) and spiculation (4-point scale) features. Ninety-two percent of the cases had the same CT and CR ratings on the density feature. Size yielded a correlation coefficient of 0.84. In the task of differentiating between benign and malignant lung nodules, ROC analysis of individual features yielded Az values ranging from 0.52 to 0.77 for the 14 CT features and from 0.52 to 0.75 for the CR features.
In addition, we examined the characteristics of the 18 nodules that were excluded from feature analysis. On average, these 18 nodules were smaller (15.2 mm measured from CT) than the 61 CR-visible nodules (23.5 mm). We found that the CR features agreed reasonably well with the CT features, and their ability to differentiate between benign and malignant nodules was similar to that of the CT features.
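The per-feature Az values reported above can be estimated nonparametrically. A minimal sketch using the Wilcoxon (Mann-Whitney) statistic, a standard estimator of the area under the ROC curve; the ratings below are illustrative, not data from the study:

```python
import numpy as np

def az_wilcoxon(benign, malignant):
    # nonparametric area under the ROC curve: the probability that a
    # randomly chosen malignant case is rated higher than a randomly
    # chosen benign case, counting ties as one half
    b = np.asarray(benign, float)[:, None]
    m = np.asarray(malignant, float)[None, :]
    return float((m > b).mean() + 0.5 * (m == b).mean())

# illustrative feature ratings (higher = more suspicious)
print(az_wilcoxon([1, 2, 3], [2, 3, 4]))  # → 7/9 ≈ 0.778
```

A value of 0.5 corresponds to a feature with no discriminating power, and 1.0 to perfect separation, matching the 0.52-0.77 range quoted above.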
Classification of breast abnormalities under different detection cueing environments
Author(s):
Bin Zheng;
Richard G. Swensson;
Amy H. Klym;
Lara A. Hardesty;
Ratan Shah;
Luisa Wallace;
Christiane M. Hakim;
David Gur
Show Abstract
Eight radiologists interpreted 110 subtle cases in an observer performance study. The database includes 51 verified masses and 44 microcalcification clusters; of these, 35 masses and 29 clusters were associated with malignancy. Two computer-aided detection (CAD) cueing conditions with the same case-based sensitivity of 73% and two false-positive rates (0.8 or 2 per image) were applied. In each condition, radiologists interpreted the 110 cases twice. In one reading mode, radiologists first interpreted the images without viewing CAD cues and could then revise their initial interpretation after reviewing the cues. In the other reading mode, radiologists viewed cues as soon as the images were displayed. Abnormalities were first detected by the radiologists and then classified as benign or malignant. The results demonstrated that the two cueing modes had little impact on radiologists' performance after they had already made an initial interpretation. However, displaying a large number of false-positive cues simultaneously with the images significantly reduced radiologists' performance in the classification of masses (p < 0.05). As the false-positive cueing rate decreased, the negative effect on classification performance decreased as well. Hence, inappropriate use of, or reliance on, CAD results with a high false-positive rate could interfere with radiologists' attention to the classification task.
Reliability measure for segmenting algorithms
Author(s):
Robert E. Alvarez
Show Abstract
Segmenting is a key initial step in many computer-aided detection (CAD) systems. Our purpose is to develop a method to estimate the reliability of segmenting algorithm results. We use a statistical shape model computed using principal component analysis. The model retains a small number of eigenvectors, or modes, that represent a large fraction of the variance. The residuals between the segmenting result and its projection into the space of retained modes are computed. The sum of squared residuals is transformed to a zero-mean, unit-standard-deviation Gaussian random variable. We also use the standardized scale parameter. The reliability measure is the probability that the transformed residuals and scale parameter are greater than the absolute value of the observed values. We tested the reliability measure on thirty chest x-ray images using leave-one-out testing. The Gaussian assumption was verified using normal probability plots. For each image, a statistical shape model was computed from the hand-digitized data of the rest of the images in the training set. The residuals and scale parameter of the automated segmentation results for that image were then used to compute the reliability measure. The reliability measure was significantly lower for two images in the training set with unusual lung fields or processing errors. The data and Matlab scripts for reproducing the figures are at http://www.aprendtech.com/papers/relmsr.zip
Errors detected by the new reliability measure can be used to adjust processing or warn the user.
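The residual computation at the heart of this reliability measure can be sketched as follows. This is a minimal sketch of the PCA projection step only; the Gaussian transformation and probability step are omitted, and the shapes here are random stand-ins, not segmentation data:

```python
import numpy as np

def residual_ss(shapes_train, shape_test, k=5):
    # fit a PCA shape model to the training shapes and return the sum of
    # squared residuals of the test shape outside the k retained modes
    mean = shapes_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(shapes_train - mean, full_matrices=False)
    modes = Vt[:k]                        # k principal modes of variation
    d = shape_test - mean
    resid = d - modes.T @ (modes @ d)     # part not explained by the model
    return float(np.sum(resid ** 2))

rng = np.random.default_rng(0)
train = rng.normal(size=(29, 40))         # 29 training shapes, 40 coordinates
print(residual_ss(train, train.mean(axis=0)))        # → 0.0 (mean shape fits)
print(residual_ss(train, rng.normal(size=40)) > 0)   # → True for a new shape
```

A segmentation with a large residual lies far from the learned shape space, which is exactly the situation the abstract flags as unreliable.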
Closed-form quality measures for compressed medical images: compression noise statistics of transform coding
Author(s):
Dunling Li;
Murray H. Loew
Show Abstract
This paper provides a theoretical foundation for closed-form expressions of model observers on compressed images. In medical applications, model observers, especially the channelized Hotelling observer, have been successfully used to predict human observer performance and to evaluate image quality for detection tasks in various backgrounds. Using model observers, however, requires knowledge of the noise statistics. This paper first identifies quantization noise as the sole distortion source in transform coding, one of the most commonly used methods for image compression. It then represents transform coding as a 1-D block-based matrix expression and derives the first and second moments and the probability density function (pdf) of the compression noise at the pixel, block, and image levels. The compression noise statistics depend on the transform matrix and the quantization matrix of the transform coding algorithm. Compression noise is jointly normally distributed when the dimension of the transform (the block size) is typical and the contents of the image sets vary randomly. The paper uses JPEG as a test example to verify the derived statistics. The simulation results show that the closed-form expressions for JPEG quantization and compression noise statistics correctly predict the statistics estimated from actual images.
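The classical starting point for such an analysis is the variance of scalar quantization noise, q²/12 for step size q when the input spans whole quantization bins. A minimal numerical check of that textbook result, not the paper's full block-based matrix derivation:

```python
import numpy as np

rng = np.random.default_rng(2)
q = 8.0                                  # quantization step size
x = rng.uniform(0.0, 256.0, 100_000)     # input spanning whole bins
noise = np.round(x / q) * q - x          # rounding (quantization) noise

# sample variance should be close to the theoretical q**2 / 12
print(noise.var(), q * q / 12)
```

For a uniform input covering an integer number of bins, the noise is uniform on [-q/2, q/2), so its variance is q²/12; transform coding applies this per coefficient with a coefficient-dependent step.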
Relative impact of detector noise and anatomical structure on lung nodule detection
Author(s):
Brian W. Keelan;
Karin Topfer;
John Yorkston;
William J Sehnert;
Jacquelyn S Ellinwood
Show Abstract
A four-alternative forced-choice experiment was conducted to investigate the relative impact of detector noise and anatomical structure on detection of subtle lung nodules. Sets of four independent backgrounds from each of three regions (heart, ribs, and lung field between the ribs) were derived from a very low-noise chest-phantom capture. Simulated nodules of varying contrast and fixed diameter (10 mm) were digitally added to the centers of selected background images. Subsequently, signal-dependent noise was introduced to simulate amorphous selenium radiographic detector performance at typical 80, 200, 400, 800, or higher speed class exposures. Series of four nodule contrasts each were empirically selected to yield comparable ranges of detectability index (d') for each background type and exposure level. Thirty-six observers with imaging expertise performed the nodule detection task, for which the signal and location were known exactly. Equally detectable nodule contrasts for each background type and exposure level were computed and their squares plotted against detector noise variance. The intercepts and slopes of the linear regressions increased in the order of lung, heart, and ribs, correlating with apparent anatomical structural complexity. The regression results imply that the effect of anatomical structure dominated that of capture device noise at clinically relevant exposures and beyond.
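In a four-alternative forced-choice (4AFC) task, the link between the detectability index d' and proportion correct can be sketched by Monte Carlo. The equal-variance Gaussian response model below is the standard textbook assumption, not necessarily the authors' exact analysis:

```python
import numpy as np

def pc_4afc(dprime, n=200_000, seed=0):
    # proportion correct in 4AFC: the signal interval is chosen when its
    # internal response exceeds the responses of all three noise intervals
    rng = np.random.default_rng(seed)
    signal = rng.normal(dprime, 1.0, n)
    noise = rng.normal(0.0, 1.0, (n, 3))
    return float((signal > noise.max(axis=1)).mean())

print(pc_4afc(0.0))  # chance performance, ≈ 0.25
print(pc_4afc(2.0))  # well above chance
```

Inverting this relation numerically is how measured percent-correct data are converted to the equally detectable d' values described in the abstract.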
Characterization of breast masses for simulation purposes
Author(s):
Robert S. Saunders Jr.;
Ehsan Samei
Show Abstract
Simulation of radiographic lesions is an important prerequisite for several research applications in medical imaging, including hardware and software design and optimization. For mammography, breast masses are an important class of lesions to be considered. In this study, we first characterized both benign and malignant breast masses with example mammograms from the Digital Database for Screening Mammography (DDSM). The measured properties of each of these mass types were then used to create a simulation routine that was capable of creating example masses from each category. A preliminary observer experiment was conducted to determine whether a mammographer could distinguish between the simulated and true masses. An ROC analysis indicated Az values of 0.59 and 0.61 for benign and malignant lesions, respectively, suggesting very similar appearance for the simulated and real lesions. A larger observer performance experiment with multiple mammographers is underway to validate these results.
Comparison of two methods for evaluation of image quality of lumbar spine radiographs
Author(s):
Anders Tingberg;
Magnus Bath;
Markus Hakansson;
Joakim Medin;
Michael Sandborg;
Gudrun Alm-Carlsson;
Sören Mattsson;
Lars Gunnar Mansson
Show Abstract
To evaluate the image quality of clinical radiographs with two different methods, and to find correlations between the two methods.
Based on fifteen lumbar spine radiographs, two new sets of images were created. A hybrid image set was created by adding two distributions of artificial lesions to each original image. The image quality parameters spatial resolution and noise were manipulated, and a total of 210 hybrid images was created. A set of 105 disease-free images was created by applying the same combinations of spatial resolution and noise to the original images. The hybrid images were evaluated with the free-response forced error experiment (FFE) and the normal images with visual grading analysis (VGA) by nine experienced radiologists. The VGA study showed that images with low noise are preferred over images with higher noise levels; alteration of the MTF had a limited influence on the VGA score. In the FFE study, the visibility of the lesions was independent of the spatial resolution and the noise level. We found no correlation between the two methods, probably because the detectability of the artificial lesions was not influenced by the manipulations of noise level and resolution. Hence, the detection of lesions in lumbar spine radiography may not be a quantum-noise-limited task. The results show the strength of the VGA technique in detecting small changes in the two image quality parameters. The method is more robust and has higher statistical power than the ROC-related method and could therefore, in some cases, be more suitable for use in optimization studies.
Depth perception of stereo overlays in image-guided surgery
Author(s):
Laura Johnson;
Philip Edwards;
Lewis Griffin;
David Hawkes
Show Abstract
See-through augmented reality (AR) systems for image-guided surgery merge volume rendered MRI/CT data directly with the surgeon’s view of the patient during surgery. Research has so far focused on optimizing the technique of aligning and registering the computer-generated anatomical images with the patient’s anatomy during surgery. We have previously developed a registration and calibration method that allows alignment of the virtual and real anatomy to ~1mm accuracy. Recently we have been investigating the accuracy with which observers can interpret the combined visual information presented with an optical see-through AR system. We found that depth perception of a virtual image presented in stereo below a physical surface was misperceived compared to viewing the target in the absence of a surface. Observers overestimated depth for a target 0-2cm below the surface and underestimated the depth for all other presentation depths. The perceptual error could be reduced, but not eliminated, when a virtual rendering of the physical surface was displayed simultaneously with the virtual image. The findings suggest that misperception is due either to accommodation conflict between the physical surface and the projected AR image, or the lack of correct occlusion between the virtual and real surfaces.
Fast approach to evaluate MAP reconstruction for lesion detection and localization
Author(s):
Jinyi Qi;
Ronald H. Huesman
Show Abstract
Lesion detection is an important task in emission tomography. Localization ROC (LROC) studies are often used to analyze the lesion detection and localization performance. Most researchers rely on Monte Carlo reconstruction samples to obtain LROC curves, which can be very time-consuming for iterative algorithms. In this paper we
develop a fast approach to obtain LROC curves that does not require Monte Carlo reconstructions. We use a channelized Hotelling observer model to search for lesions, and the results can be easily extended to other numerical observers. We theoretically analyzed the mean and covariance of the observer output. Assuming the observer outputs are multivariate Gaussian random variables, an LROC curve can be generated directly by integrating the conditional probability density functions. The high-dimensional integrals are calculated using a Monte Carlo method. The proposed approach is very fast because no iterative reconstruction is involved. Computer simulations show that the results of the proposed method match well with those obtained using traditional LROC analysis.
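The channelized Hotelling observer used for the search can be sketched on synthetic data. The channels, images, and dimensions below are arbitrary stand-ins for illustration, not the reconstructions studied in the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n_pix, n_ch, n_img = 64, 3, 500
U = rng.normal(size=(n_ch, n_pix))               # stand-in channel matrix
signal = 0.5 * rng.normal(size=n_pix)            # known additive signal

g0 = rng.normal(size=(n_img, n_pix))             # signal-absent images
g1 = rng.normal(size=(n_img, n_pix)) + signal    # signal-present images
v0, v1 = g0 @ U.T, g1 @ U.T                      # channel outputs

K = 0.5 * (np.cov(v0.T) + np.cov(v1.T))          # mean channel covariance
w = np.linalg.solve(K, v1.mean(0) - v0.mean(0))  # Hotelling template
t0, t1 = v0 @ w, v1 @ w                          # scalar observer outputs

# SNR-type detectability computed from the two output distributions
d = (t1.mean() - t0.mean()) / np.sqrt(0.5 * (t0.var() + t1.var()))
print(d > 0)  # → True: the observer separates the two hypotheses
```

In the paper's approach, the mean and covariance of such observer outputs are derived theoretically, so the LROC curve follows from Gaussian integrals rather than from repeated reconstructions.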
Quantitative image quality evaluation of spiral MRI techniques under noisy conditions
Author(s):
Donglai Huo;
Kyle A. Salem;
David L. Wilson
Show Abstract
Spiral sampling of k-space is a popular technique in fast MRI. Many methods are available for spiral acquisition and reconstruction. We used a Perceptual Difference Model (PDM) to evaluate these choices and to examine the effects of noise. PDM is a human observer model that calculates the visual difference between a “test image” and a “gold standard image.” PDM has been shown to correlate well with human observers in a variety of MR experiments, including added noise, increased blurring, keyhole imaging, and spiral imaging. We simulated MR images from six different interleave patterns, seven different sampling levels, three different density compensation methods, and four different reconstruction options under zero noise and three noise levels. By comparing results with and without noise, we can separate noise effects from reconstruction errors. Across the many conditions compared, Voronoi (VOR) density compensation plus conventional regridding was good for high-SNR data; in low-SNR conditions, the area density function (ADF) was better. One can also quantitatively compare different acquisition parameters: a smaller number of interleaves and a higher number of samples were very desirable when noise was applied, because sampling of high frequencies was ensured. We conclude that PDM scoring provides an objective, useful tool for the assessment of spiral MR image quality and can greatly aid the design of MR acquisition and signal processing strategies.
Using LROC analysis to evaluate detection accuracy of microcalcification clusters imaged with flat-panel CT mammography
Author(s):
Xing Gong;
Stephen J. Glick;
Aruna A. Vedula
Show Abstract
The purpose of this study is to investigate the detectability of microcalcification clusters (MCCs) using CT mammography with a flat-panel detector. Compared with conventional mammography, CT mammography can improve discrimination between malignant and benign cases because it provides the radiologist with more accurate morphological information on MCCs. Two aspects of MCC detection with flat-panel CT mammography were examined: (1) the minimal size of MCCs detectable at the mean glandular dose (MGD) used in conventional mammography; and (2) the effect of detector pixel size on the detectability of MCCs. A realistic computer simulation modeling x-ray transport through the breast, as well as both signal and noise propagation through the flat-panel imager, was developed to investigate these questions. Microcalcifications were simulated as calcium carbonate spheres with diameters of 125, 150, and 175 μm. Each cluster consisted of 10 spheres spread randomly in a 6×6 mm2 region of interest (ROI), and the detector pixel size was set to 100×100, 200×200, or 300×300 μm2. After reconstructing 100 projection sets for each case (half with signal present) with the cone-beam Feldkamp (FDK) algorithm, a localization receiver operating characteristic (LROC) study was conducted to evaluate the detectability of MCCs. Five observers chose the locations of cluster centers with corresponding confidence ratings. The average area under the LROC curve suggests that 175 μm MCCs can be detected with a high level of confidence. Results also indicate that flat-panel detectors with a pixel size of 200×200 μm2 are appropriate for detecting small targets such as MCCs.
Simulation study comparing the imaging performance of a solid state detector with a rotating slat collimator versus parallel beam collimator setups
Author(s):
Steven Staelens;
Stefaan Vandenberghe;
Jan De Beenhouwer;
Stijn De Clercq;
Yves D'Asseler;
Ignace Lemahieu;
Rik Van de Walle
Show Abstract
The main goal of this work is to assess the overall imaging performance of dedicated new solid-state devices compared to a traditional scintillation camera for use in SPECT imaging. A solid-state detector with a rotating slat collimator is compared with the same detector mounted with a classical collimator, as opposed to a traditional Anger camera. The solid-state materials are characterized by better energy resolution, while the rotating slat collimator promises a better sensitivity-resolution tradeoff. The different imaging modalities are evaluated using GATE, a recently developed Monte Carlo code. Several imaging performance measures were addressed: spatial resolution, energy resolution, and sensitivity; in addition, an ROC analysis was performed to evaluate hot-spot detectability. In this way, differences in performance were established among the imaging techniques, which allows a task-dependent application of these modalities in future clinical practice.
Optimization of detector pixel size for stent visualization in x-ray fluoroscopy
Author(s):
Yuhao Jiang;
David L. Wilson
Show Abstract
Pixel size is of great interest in flat-panel detector design. To visualize small interventional devices such as a stent in angiographic x-ray fluoroscopy, pixels should be small, to limit contrast dilution from partial-area effects, yet large enough to collect sufficient x-rays for an adequate signal-to-noise ratio (SNR). Using quantitative experimental and modeling techniques, we investigated the optimal pixel sizes for visualization of a stent created from 50 μm diameter wires. Image quality was evaluated by the ability of subjects to perform two tasks: detecting the presence of a stent, and discriminating a partially deployed stent from a fully deployed one. With regard to detection, for the idealized direct detector, a 100 μm pixel size gave the maximum measured contrast sensitivity; for an idealized indirect detector with a scintillating layer, the maximum was obtained at a 200 μm pixel size. The channelized human observer model predicted peaks at 150 and 170 μm for the idealized direct and indirect detectors, respectively. Discriminating stent deployment is more sensitive to pixel size than stent detection, resulting in a steeper drop in performance with large pixels; for this task, even smaller pixel sizes are favored for both detector types. With increasing exposures, the model predicts a smaller optimal pixel size because the noise penalty is reduced.
Observer models and human visual detection performance with different targets
Author(s):
Jian Yang;
Cathleen Daniels Cerosaletti
Show Abstract
Prior publications have shown that ideal observer models provide a good estimate of measured d' values for varying noise amplitude and target strength after allowing for observer internal noise and human efficiency. To provide a consistent estimate of visual performance in general applications, the internal noise and human efficiency should either be fixed values or calculable based on experimental conditions. In the current study, we test observer models for several sizes of three types of targets (rectangular, Gaussian, or Gabor) at two uniform background luminances and three levels of added Gaussian noise. The ideal observer predictions for each individual experimental condition are well correlated with measured d' values (r2 > 0.90 in most cases); however, the required internal noise and human efficiency vary substantially with target and luminance. A modified ideal observer, which includes a luminance-dependent eye filter and Gabor channels, is developed to simultaneously account for the measured d' values in all experimental conditions with r2 = 0.88. This observer model can be used to estimate general target detectability in flat two-dimensional image areas.
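The relationship among ideal-observer performance, internal noise, and efficiency described above can be written compactly. The degradation model below is a common textbook form, offered as an illustration; the function names and numbers are assumptions, not the authors' model:

```python
import numpy as np

def dprime_ideal(signal, noise_var):
    # ideal observer for a known signal in white Gaussian noise:
    # d' = sqrt(signal energy / noise variance)
    return float(np.sqrt(np.sum(signal ** 2) / noise_var))

def dprime_human(signal, noise_var, internal_var, efficiency):
    # hypothetical human model: sampling efficiency scales d', and
    # additive internal noise inflates the effective noise variance
    return (np.sqrt(efficiency)
            * dprime_ideal(signal, noise_var)
            / np.sqrt(1.0 + internal_var / noise_var))

s = np.ones(16)                               # toy 16-pixel flat target
print(dprime_ideal(s, 1.0))                   # → 4.0
print(dprime_human(s, 1.0, 1.0, 0.5) < 4.0)   # → True: degraded performance
```

The abstract's point is that `internal_var` and `efficiency` should not need refitting per condition; the modified model with an eye filter and Gabor channels is what achieves that.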
Calcification classifications of small nodules identified during CT lung cancer screening
Author(s):
Philip F. Judy;
Roberto Riva;
Yoshiko Kadota;
Francine L. Jacobson
Show Abstract
The aim of this study was to determine whether radiologists are more likely to report as calcified the small nodules detected during CT lung-cancer screening if sharper reconstruction filters are utilized. Images were reconstructed with the two filters used at our institution for the lung (B50f) and for the mediastinum (B30f). Four lung-cancer screening cases were reconstructed with 1.25-mm section thickness at 0.6-mm section increments. Using a lax criterion, two radiologists identified the locations of nodular features and rated the likelihood that the features were calcified. There were 302 nodule reports, the majority of which (57%) were made on images reconstructed using the smooth filter. Sixty reports rated a feature as definitely or possibly calcified; 73% of these calcification reports were from images reconstructed using B50f. There were 27 calcification reports by one radiologist that were classified as non-calcified by the other radiologist, and most (81%) of these 27 discordant reports were from images reconstructed using B50f. Radiologists are thus more likely to report small nodules detected during lung-cancer screening as calcified when using the sharper reconstruction filter. Whether these nodules are actually calcified remains a question.
AUC-based resolution in optical coherence tomography
Author(s):
Jannick P. Rolland;
Jason O'Daniel;
Eric Clarkson;
Kit-Iu Cheong;
A. C. Akcay;
Tony Delemos;
Pascale Parrein;
Kye Sung Lee
Show Abstract
Optical coherence tomography (OCT) is an interferometric technique using the low coherence property of light to axially image at high resolution in biological tissue samples. Transverse imaging is obtained with two-dimensional scanning and transverse resolution is limited by the size of the scanning beam at the imaging point. The most common metrics used for determining the axial resolution of an OCT system are the full-width-at-half-maximum (FWHM), the absolute square integral (ASI), and the root-mean-square (RMS) width of the axial PSF of the system, where the PSF of an OCT system is defined as the envelope of the interference fringes when the sample has been replaced by a simple mirror. Such metrics do not take into account the types of biological tissue samples being imaged. In this paper we define resolution in terms of the instrument and the biological
sample combined by defining a resolution task and computing the associated detectability index and area under the receiver operating characteristic curve (AUC). The detectability index was computed using the Hotelling observer or best linear observer. Results of simulations demonstrate that resolution is best quantified as a
probability of resolving two layers, and the impact on resolution of variations in the index of refraction between the layers is clearly demonstrated.
2AFC assessment of contrast threshold for a standardized target using a monochrome LCD monitor
Author(s):
Philip Tchou;
Michael J. Flynn;
Edward Peterson
Show Abstract
The DICOM Grayscale Standard Display Function (GSDF) relates display contrast to the contrast threshold derived from the Barten model (CBM) of the human visual system. We measured the contrast threshold (CT) using a monochrome medical LCD monitor and graphics card under the conditions defined by the DICOM standard and compared the results to the Barten model. A two-alternative forced-choice (2AFC) observer performance test was used to measure the contrast threshold. The 2AFC tests were given once to a large group of observers with varied medical imaging experience. A small subset of this group was tested multiple times over several months in order to examine intraobserver variability. The mean relative contrast (CT/CBM) associated with a 75% detection rate was found to be 0.508, with a standard deviation of 0.176. In the intraobserver tests, results improved after the first 3 trials; the mean CT/CBM values (and standard deviations) for the next 9 tests were 0.0980 (0.107), 0.244 (0.0928), and 0.398 (0.0855). The results indicate that, under the statistical criteria used, contrast substantially less than 1 CT/CBM is detected. This can be explained by the criteria for detection used in the classical observer tests that form the basis of the Barten model. Additionally, our data indicate significant differences among observers.
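The 75%-correct threshold convention used above can be sketched by interpolating measured psychometric data. A minimal sketch; the contrast levels and proportions correct below are illustrative, not the study's measurements:

```python
import numpy as np

def threshold_75(contrasts, pcorrect):
    # contrast at which a 2AFC psychometric function crosses 75% correct
    # (halfway between 50% chance and 100%), by linear interpolation;
    # pcorrect must be monotonically increasing
    return float(np.interp(0.75, pcorrect, contrasts))

contrasts = np.array([0.1, 0.2, 0.4, 0.8])   # illustrative contrast levels
pcorrect  = np.array([0.5, 0.6, 0.8, 0.95])  # measured fraction correct
print(threshold_75(contrasts, pcorrect))     # → ≈0.35
```

In practice a smooth psychometric function (e.g., Weibull) is usually fitted rather than interpolated, but the 75% criterion is the same.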
Distances between greyvalue curves
Author(s):
Miha Fuderer
Show Abstract
A measure is proposed to express the conformance between any pair of greyvalue curves. It can be used to compare a greyvalue curve against a standard, and to quantify and compare various causes of non-conformance.
Use of a human visual system model to predict the effects of display veiling glare on observer performance
Author(s):
Elizabeth A. Krupinski;
Jeffrey Johnson;
Hans Roehrig;
John Nafziger;
Jiahua Fan;
Jeffrey Lubin
Show Abstract
The goal of this project was to evaluate a human visual system model (JNDmetrix), based on JND and frequency-channel vision-modeling principles, for predicting the effects of monitor veiling glare on observer performance in interpreting radiographic images. The veiling glare of a high-performance CRT and of an LCD display was measured. A series of mammographic images with masses of different contrast levels was generated. Six radiologists viewed the sets of images on both monitors and reported their decision confidence about the presence of a mass. The images were also run through the JNDmetrix model. Veiling glare affected observer performance (ROC Az): performance was better on the LCD display, with lower veiling glare, than on the CRT, with higher veiling glare. The JNDmetrix model predicted the same pattern of results, and the correlation between human and computer observers was high. Veiling glare can thus significantly affect observer performance in diagnostic radiology. A possible confound exists in that two different monitors were used, and other physical parameters may contribute to the differences observed. A new set of studies is underway to remove that confound.
A prospective study of CAD system for lung cancer based on helical CT image
Author(s):
Michito Hasegawa;
Mitsuru Kubo;
Yoshiki Kawata;
Noboru Niki;
Hironobu Ohmatsu;
Ryutaro Kakinuma;
Masahiro Kaneko;
Masahiko Kusumoto;
Y. Nishiyama;
Kenji Eguchi;
Noriyuki Moriyama
Show Abstract
Chest CT images have drawn great interest for the detection of suspicious regions. However, mass screening based on CT images produces a considerable number of images to be diagnosed. We have developed a CAD system for lung cancer that detects tumor candidates at an early stage from CT images. In July 1997, a clinical trial using the first version of the CAD system started in the Anti-Lung Cancer Association (National Cancer Center Hospital and National Cancer Hospital East). In a subsequent stage, a clinical trial using the second version of the CAD system, which supports a comparative-reading function, started in April 2002. We expect that the CAD system reduces diagnosis time and increases reliability. In this paper, we describe the clinical trial results obtained with the CAD system since July 1997 and the detection results of the CAD system for supporting mass screening in a prospective study. The results show that the CAD system improves diagnostic accuracy and throughput. We also discuss the future of the CAD system.
Analysis of ROC on chest direct digital radiography (DR) after image processing in diagnosis of SARS
Author(s):
Guozheng Lv;
Rihui Lan;
Qingsi Zeng;
Zhong Zheng
Show Abstract
The Severe Acute Respiratory Syndrome (SARS, also called infectious atypical pneumonia), which initially broke out in late 2002, has seriously threatened public health. Confirming which patients have contracted SARS has become an urgent diagnostic issue. This paper evaluates the importance of image processing in the diagnosis of SARS at the early stage. Receiver operating characteristic (ROC) analysis was employed to compare the value of DR images in the diagnosis of SARS patients before and after image processing by Symphony software (E-Com Technology Ltd.); DR images of 72 confirmed or suspected SARS patients were reviewed. All the images taken from the studied patients were processed by Symphony. Both the original and processed images were entered into ROC analysis, which produced the following ROC parameters for each group of images: for processed images, a = 1.9745, b = 1.4275, SA = 0.8714; for original images, a = 0.9066, b = 0.8310, SA = 0.7572
(a: intercept, b: slope, SA: area under the curve). The result shows a significant difference between the original and processed images (P < 0.01). In summary, the images processed by Symphony are superior to the originals in detecting opacity lesions and increase the accuracy of SARS diagnosis.
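The binormal ROC parameters reported above determine the area under the curve in closed form, SA = Φ(a/√(1+b²)), where Φ is the standard normal CDF. A quick check of the reported areas using only the Python standard library:

```python
import math

def binormal_auc(a: float, b: float) -> float:
    """Area under a binormal ROC curve with intercept a and slope b:
    AUC = Phi(a / sqrt(1 + b^2)), where Phi is the standard normal CDF
    (written here via the error function)."""
    z = a / math.sqrt(1.0 + b * b)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# The parameters reported for the processed and original DR images:
print(round(binormal_auc(1.9745, 1.4275), 4))  # 0.8714, matching SA above
print(round(binormal_auc(0.9066, 0.8310), 4))  # 0.7572, matching SA above
```

Both reported areas are reproduced to four decimals, which confirms the (a, b, SA) triples are internally consistent.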
Full width at half maximum as a measure of vessel diameter in computed tomography angiography
Author(s):
Jay K. Varma;
Krishna Subramanyan;
Jacob Durgan
Computed tomography angiography (CTA) is a procedure gaining usage in the diagnosis of aneurysms located in the aorta, carotid arteries, and in other locations and has also shown promise in the planning of stent placement procedures. Recently, automatic vessel segmentation programs have been developed that can extract the entire aortic vessel tree and provide information to the user regarding the size, length, and tortuosity of the blood vessels. This study was designed to determine if using the full width at half maximum (FWHM) value is an accurate method of determining the diameter of contrast-enhanced blood vessels. A phantom used to simulate vessels of various diameters was filled with a nonionic iodine solution and scanned using a 16-detector CT scanner (Mx8000IDT, Philips Medical Systems, Inc.). The phantom was scanned with varying concentrations of contrast solution to emulate the variation of enhancement that may be seen clinically. The data was analyzed using an application on a workstation (MxView, Philips Medical Systems, Inc.), which allowed for the calculation of FWHM of a user-defined region of interest. The results indicate that the full width at half maximum is an accurate method of calculating the diameter of a blood vessel, regardless of contrast concentration. The full width at half maximum is an easily calculated value, which could potentially be used in an automatic segmentation algorithm to determine the diameters of extracted vessels.
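As a sketch of the general idea (not the MxView implementation), the FWHM of a vessel's background-subtracted intensity profile can be computed by locating the two half-maximum crossings with linear interpolation; the profile is assumed to fall below half-maximum on both sides of the peak:

```python
def fwhm(profile, spacing=1.0):
    """Full width at half maximum of a single-peaked 1D profile.
    Half-max crossings are located by linear interpolation between
    samples; the result is in the units of `spacing` (e.g. mm)."""
    peak = max(profile)
    half = peak / 2.0
    i_peak = profile.index(peak)

    # Walk left from the peak until the profile drops below half-max,
    # then interpolate the crossing between samples i-1 and i.
    i = i_peak
    while profile[i - 1] >= half:
        i -= 1
    left = (i - 1) + (half - profile[i - 1]) / (profile[i] - profile[i - 1])

    # Mirror the search on the right-hand side of the peak.
    j = i_peak
    while profile[j + 1] >= half:
        j += 1
    right = (j + 1) - (half - profile[j + 1]) / (profile[j] - profile[j + 1])

    return (right - left) * spacing

# A triangular profile crossing half-max at x = 2 and x = 6:
print(fwhm([0, 1, 2, 3, 4, 3, 2, 1, 0]))  # 4.0
```

With `spacing` set to the reconstructed pixel size, the returned width is directly comparable to the known phantom vessel diameters.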
How does the concentration of iodine in enhancing lesions relate to image quality in head CT?
Author(s):
Aaron M. Fischman;
Robert G. Dixon;
Walter Huda;
Kent M. Ogden;
Kristin A. Lieberman;
Marsha L. Roskopf
Patients undergoing head CT examinations with iodinated contrast received 100 cc of Iohexol 240 injected intravenously by hand. We developed a software package to align non-contrast and contrast head CT images and obtain the “difference image” consisting of the iodine enhancement within a given lesion. This “difference image” of the iodine enhancement was added to the non-contrast study at reduced intensities. The signal-to-noise ratio (SNR) for detecting the added iodine was taken to be directly proportional to the concentration of contrast in the lesion. The visibility of the lesion enhancement in this composite image was compared with the original contrast image using a six-point scale ranging from 5 (no observable difference) to 0 (unacceptable). Two radiologists evaluated head CT images of eleven metastatic lesions. The iodine concentration required to generate an image quality rank of 3, deemed to be satisfactory for diagnosis (S), was determined. We also performed a receiver operating characteristic (ROC) study to identify the iodine concentration corresponding to an area under the ROC curve of 0.95, which corresponds to the detection threshold (D) for each lesion. Reducing the intensity to 50% resulted in an average image quality score S of 3, suggesting that it may be possible to halve the administered iodine in head CT examinations with no significant loss of diagnostic performance. The average iodine concentration at the detection threshold D was 16%. The average S:D ratio was 3.8 ± 1.7, and was similar for both readers. The value of S was independent of enhancement characteristics, whereas the detection threshold D correlated inversely with the size and intensity of the iodine enhancement. The resultant S:D ratio correlated with the lesion area (r² = 0.31), mean lesion intensity (r² = 0.44), and the product of the mean lesion intensity and the lesion area (r² = 0.37).
Our results indicate that the SNR of enhancing lesions in head CT that is needed to satisfy radiologists is about a factor of four greater than the SNR required for iodine detection.
How do radiographic techniques affect mass lesion detection performance in digital mammography?
Author(s):
Walter Huda;
Kent M. Ogden;
Ernest M. Scalzetti;
Eric F. Dudley;
David R. Dance
We investigated how the x-ray tube kV and mAs affected the detection of simulated lesions with diameters between 0.24 and 12 mm. Digital mammograms were acquired with and without mass lesions, permitting a difference image to be generated corresponding to the lesion alone. Isolated digital lesions were added at a reduced intensity to non-lesion images and used in four-alternative forced choice (4-AFC) experiments to determine the lesion intensity that corresponded to an accuracy of 92% (I92%). Values of I92% were determined at x-ray tube output values ranging from 40 to 120 mAs and x-ray tube voltages ranging from 24 to 32 kV. For mass lesions larger than ~0.8 mm, there was no significant change in detection performance with changing mAs. Doubling the x-ray tube output from 60 to 120 mAs resulted in an average change in I92% of only +3.8%, whereas the Rose model of lesion detection predicts a reduction in the experimental value of I92% of -29%. For the 0.24 mm lesion, however, reducing the x-ray beam output from 100 to 40 mAs reduced the average detection performance by ~60%. Contrast-detail curves for lesions with diameter ≥ 0.8 mm had a slope of ~+0.23, whereas the Rose model predicts a slope of -0.5. For lesions smaller than ~0.8 mm, contrast-detail slopes were all negative, with the average gradient increasing with decreasing mAs value. Increasing the x-ray tube voltage from 24 to 32 kV at a constant display contrast resulted in a modest improvement in low-contrast lesion detection performance of ~10%. Increasing the display window width from 2000 to 2500 reduced the average observer performance by ~6%. Our principal finding is that radiographic technique factors have little effect on detection performance for lesions larger than ~0.8 mm, but the visibility of smaller lesions is affected by quantum mottle, in qualitative agreement with the predictions of the Rose model.
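The Rose-model prediction quoted above follows from the threshold contrast scaling as the inverse square root of the photon fluence: if fluence is taken as proportional to mAs, doubling the mAs should lower I92% by a factor of 1/√2, i.e. about 29%. A back-of-envelope check:

```python
import math

def rose_threshold_change(mas_old, mas_new):
    """Fractional change in the threshold lesion intensity predicted by
    the Rose model, where detectable contrast scales as the inverse
    square root of photon fluence, taken proportional to mAs."""
    return math.sqrt(mas_old / mas_new) - 1.0

change = rose_threshold_change(60, 120)
print(f"{change:+.1%}")  # -29.3%: the Rose-model prediction cited above
```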
European breast screening performance: does case volume matter?
Author(s):
Hazel J. Scott;
Alastair G. Gale;
David S. Wooding;
Dieter Walter
U.K. breast screening radiologists typically read over 5,000 screening cases per annum, whereas in Europe this figure may be lower, as in some countries national breast screening programs are still in development. The PERFORMS scheme in the UK permits radiologists to self-assess their film-reading skills annually. As part of a Bavarian breast-screening training scheme, a number of German radiologists have now also read the current PERFORMS case set. We investigated whether real-life case volume affects reading performance by comparing matched groups reading these screening cases. For each case, individuals identified which key mammographic features were present and whether the case was abnormal and should be recalled. For this analysis the participants were matched on volume of cases read and years of experience; case volume was elicited by questionnaire. The radiologists were compared on several key performance measures: cancers detected, correct recall and correct return to screen, signal detection statistics, and real-life screening practice. It was found that whilst the performance of the Bavarian radiologists on the current test sets was extremely good, on average they performed less well than their UK counterparts. Reasons for this are considered.
The effect of novel prompts upon radiologists’ visual search of mammograms
Author(s):
James W. Hatton;
David S. Wooding;
Alastair G. Gale;
Hazel J. Scott
Prompting is utilised in CAD systems to draw attention to regions of potential abnormality within screening mammograms. The benefit of such systems is under debate. Our previous research found that radiologists’ visual search patterns were significantly altered when mammographic prompts were displayed. Visual attention concentrated upon prompted areas, with significantly less attention paid to unprompted regions. Additionally, prompts caused a reduction in the amount of bilateral visual comparison between the two breasts. Current CAD systems use a variety of prompts (e.g. circle and triangle) that appear incongruous to the mammogram and may inadvertently detract attention from unprompted regions. The aim of this experiment was to determine whether attentional focus would continue when using subtle prompts, without resulting in insufficient search of unprompted areas. A series of paired mediolateral-oblique view mammographic cases were presented to participants on a monitor. Images were presented as “unprompted” and “prompted”, using various methods to highlight potentially abnormal areas. These included typical prompt shapes and also more novel prompts (e.g. altered brightness and colour). Participants were instructed to scan the images as they normally would when screening for abnormalities and to indicate their confidence that an abnormality was present, using a five-point scale. Eye movements were recorded during the task. Results demonstrated that visual attention was drawn to prompted regions. However, the potentially negative influence of prompts upon normal visual search patterns within mammograms was less pronounced in conditions containing novel prompts. By comparing differing prompts during screening it was possible to establish their consequent impact upon visual search patterns. This research contributes to the establishment of optimal prompt displays in soft-copy systems.
Breast screening technologists: Does real-life case volume affect performance?
Author(s):
Hazel J. Scott;
Alastair G. Gale;
David S. Wooding
In the UK fewer radiologists are now specialising in breast cancer screening. Consequently, a number of technologists have been specially trained to read mammograms so as to double-read with existing radiologists. Each year the majority of these film-readers examine a set of difficult cases as a means of self-assessing their skills. We investigated whether the technologists performed as well as breast-screening radiologists on this difficult test set. We also examined technologists’ performance over a number of years, comparing those technologists who have read a greater number of breast screening films with those who have had less experience. Finally, we related real-life experience to performance on the scheme by comparing volume of cases read, years of experience, and technologists’ performance over time versus radiologists’ performance. Data for approximately 250 breast screening radiologists and 80 specially trained technologists over three years, covering six sets of 60 difficult recent screening cases, were examined. Overall, those technologists who had not read the same volume of cases as radiologists did not perform as well on this particular task. However, when the group was stratified by volume of cases read in real life and number of years reading cases, the technologists performed at a level similar to the radiologists.
Diagnostic performance of different measurement methods for lung nodule enhancement at quantitative contrast-enhanced computed tomography
Author(s):
Dag Wormanns;
Ernst Klotz;
Uwe Dregger;
Florian Beyer;
Walter Heindel
Lack of angiogenesis virtually excludes malignancy of a pulmonary nodule; assessment with quantitative contrast-enhanced CT (QECT) requires a reliable enhancement measurement technique. The diagnostic performance of different measurement methods in the distinction between malignant and benign nodules was evaluated. QECT (an unenhanced scan and 4 post-contrast scans) was performed in 48 pulmonary nodules (12 malignant, 12 benign, 24 indeterminate). Nodule enhancement was the difference between the highest nodule density at any post-contrast scan and the unenhanced scan. Enhancement was determined with: A) the standard 2D method; B) a 3D method consisting of segmentation, removal of peripheral structures, and density averaging. Enhancement curves were evaluated for their plausibility using a predefined set of criteria. Using a threshold of 20 HU, sensitivity and specificity were 100% and 33% for the 2D method, and 92% and 55% for the 3D method. One malignant nodule did not show significant enhancement with method B due to adjacent atelectasis, which disappeared within the few minutes of the QECT examination. Better discrimination between benign and malignant lesions was achieved with a slightly higher threshold than proposed in the literature. Application of plausibility criteria to the enhancement curves revealed fewer plausibility faults with the 3D method. The new 3D method for analysis of QECT scans yielded fewer artefacts and better specificity in the discrimination between benign and malignant pulmonary nodules when using an appropriate enhancement threshold. Nevertheless, QECT results must be interpreted with care.
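The QECT decision rule described here — enhancement taken as the peak post-contrast density minus the unenhanced density, compared against a threshold in HU — can be sketched as follows (the density values are hypothetical, not study data):

```python
def enhancement(unenhanced_hu, post_contrast_hu):
    """Nodule enhancement: highest nodule density at any post-contrast
    scan minus the density on the unenhanced scan (all values in HU)."""
    return max(post_contrast_hu) - unenhanced_hu

def is_suspicious(unenhanced_hu, post_contrast_hu, threshold_hu=20.0):
    """Nodules enhancing below the threshold are considered benign,
    since absent angiogenesis virtually excludes malignancy."""
    return enhancement(unenhanced_hu, post_contrast_hu) >= threshold_hu

# Hypothetical nodule: 35 HU unenhanced, peaking at 68 HU post-contrast.
print(is_suspicious(35.0, [52.0, 68.0, 61.0, 57.0]))  # True (33 HU >= 20 HU)
```

Raising `threshold_hu` slightly, as the abstract suggests, trades sensitivity for specificity in this rule.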
Methods of evaluating the effectiveness of double-checking in interpreting mass screening images
Author(s):
Tohru Matsumoto;
Akira Furukawa;
Kikuo Machida;
Norinari Honda;
Tomoho Maeda;
Mitsuomi Matsumoto;
Yuichi Fujino;
Shinichi Wada;
Shusuke Sone;
Kiminori Suzuki;
Masahiro Endo
In this paper we present two methods of evaluating the effectiveness of double check (by two radiologists, or by a CAD system and a radiologist): one method uses ROC analysis and the other uses the phi correlation coefficient (φ). We used the first method to evaluate the effectiveness of two radiologists conducting double check through discussion (i.e. the radiologists confer; conference system). We used the second method to evaluate the effectiveness of double check in which Reader 2 makes a final assessment by referring to the assessment of Reader 1 (reference system). It is suggested that double check conducted by two radiologists through discussion may not be very effective; however, double check in which Reader 2 makes a final assessment by referring to the assessment of Reader 1 may be very effective. In addition, we discuss problems that may occur when Reader 2 decides whether to adopt the assessment of Reader 1, and practical models of double check by a CAD system and a radiologist. Continued research is necessary to establish a double check system that improves diagnostic accuracy in practical situations, i.e. where it is unknown whether assessments are correct.
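The phi correlation coefficient used in the second method is the Pearson correlation of two binary readings, computable directly from the 2×2 agreement table. A minimal sketch with hypothetical counts:

```python
import math

def phi_coefficient(n11, n10, n01, n00):
    """Phi correlation between two binary readers from the 2x2 table:
    n11 = both positive, n10 = reader 1 only positive,
    n01 = reader 2 only positive, n00 = both negative."""
    r1_pos = n11 + n10
    r1_neg = n01 + n00
    r2_pos = n11 + n01
    r2_neg = n10 + n00
    denom = math.sqrt(r1_pos * r1_neg * r2_pos * r2_neg)
    return (n11 * n00 - n10 * n01) / denom

# Hypothetical counts: 40 both-positive, 10 and 15 discordant, 135 both-negative.
print(round(phi_coefficient(40, 10, 15, 135), 3))  # 0.679
```

A low φ between two readers (or between a CAD system and a reader) indicates complementary detections, which is what makes a double-check system potentially effective.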
Semi-automated measurement of anatomical structures using statistical and morphological priors
Author(s):
Edward A. Ashton;
Tong Du
Rapid, accurate and reproducible delineation and measurement of arbitrary anatomical structures in medical images is a widely held goal, with important applications in both clinical diagnostics and, perhaps more significantly, pharmaceutical trial evaluation. This process requires the ability first to localize a structure within the body, and then to find a best approximation of the structure’s boundaries within a given scan. Structures that are tortuous and small in cross section, such as the hippocampus in the brain or the abdominal aorta, present a particular challenge. Their apparent shape and position can change significantly from slice to slice, and accurate prior shape models for such structures are often difficult to form. In this work, we have developed a system that makes use of both a user-defined shape model and a statistical maximum likelihood classifier to identify and measure structures of this sort in MRI and CT images. Experiments show that this system can reduce analysis time by 75% or more with respect to manual tracing with no loss of precision or accuracy.
LROC model observers for emission tomographic reconstruction
Author(s):
Parmeshwar Khurd;
Gene Gindi
Detection and localization performance with signal location uncertainty may be summarized by figures of merit (FOMs) obtained from the LROC curve. We consider model observers that may be used to compute the two LROC FOMs, ALROC and PCL, for emission tomographic MAP reconstruction. We address the background-known-exactly (BKE) case with the signal known except for location. Model observers may be used, for instance, to rapidly prototype studies that use human observers. Our FOM calculation is an ensemble method (no samples of reconstructions needed) that makes use of theoretical expressions for the mean and covariance of the reconstruction. An affine local observer computes a response at each location, and the maximum of these is used as the global observer response needed by the LROC curve. In previous work, we had assumed the local observers to be independent and normally distributed, which allowed the use of closed-form expressions to compute the FOMs. Here, we relax the independence assumption and make the approximation that the local observer responses are jointly normal. We demonstrate a fast theoretical method to compute the mean and covariance of this joint distribution (for the signal-absent and signal-present cases) given the theoretical expressions for the reconstruction mean and covariance. We can then generate samples from this joint distribution and rapidly (since no reconstructions need be computed) compute the LROC FOMs. We validate the results of the procedure by comparison to FOMs obtained using a gold-standard Monte Carlo method employing a large set of reconstructed noise trials.
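The sampling step described above can be sketched without any reconstructions: draw jointly normal local observer responses, take their maximum as the global observer, and tally how often the maximum falls at the true signal location (the PCL). The toy below uses a hypothetical equicorrelated covariance, induced by a shared noise term, rather than a covariance derived from an actual reconstruction:

```python
import math
import random

def simulate_pcl(mean_sig, rho, n_loc, n_trials=20000, seed=0):
    """Monte Carlo PCL for equicorrelated, jointly normal local observer
    responses. Location 0 holds the signal (mean shift `mean_sig`); a
    pairwise correlation `rho` between locations is induced by a shared
    noise component. PCL is the fraction of signal-present trials in
    which the maximum local response falls at the true location."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_trials):
        shared = rng.gauss(0.0, 1.0)
        responses = [
            math.sqrt(rho) * shared
            + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            + (mean_sig if loc == 0 else 0.0)
            for loc in range(n_loc)
        ]
        correct += responses.index(max(responses)) == 0
    return correct / n_trials

# With no signal the maximum falls on each location with equal chance,
# so PCL approaches 1/n_loc; a strong signal pushes PCL toward 1.
print(simulate_pcl(0.0, 0.3, 4), simulate_pcl(3.0, 0.3, 4))
```

In the paper's method the mean vector and covariance come from the theoretical reconstruction statistics; only that input differs from this toy.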
A practical automated polyp detection scheme for CT colonography
Author(s):
Hong Li;
Pete Santago
A fully automated computerized polyp detection (CPD) system is presented that takes DICOM images from CT scanners and provides a list of detected polyps. The system comprises three stages: segmentation, polyp candidate generation (PCG), and false positive reduction (FPR). Employing computed tomographic colonography (CTC), both supine and prone scans are used to improve detection sensitivity. We developed a novel and efficient segmentation scheme. Major shape features, e.g. the mean curvature and Gaussian curvature, together with a connectivity test, efficiently produce polyp candidates. We select six shape features and introduce a multi-plane linear discriminant function (MLDF) classifier in our system for FPR. The classifier parameters are empirically assigned with respect to the geometric meaning of each specific feature. We have tested the system on 68 real subjects, 20 positive and 48 negative for 6 mm and larger polyps according to colonoscopy results. Using a patient-based criterion, 95% sensitivity and 31% specificity were achieved when 6 mm was used as the cutoff size, implying that 15 out of 48 healthy subjects could avoid optical colonoscopy (OC). One 11 mm polyp was missed by CPD but was also not reported by the radiologist. With a more complete polyp database, we anticipate that a maximum a posteriori probability (MAP) classifier tuned by supervised training will improve the detection performance. The execution time for both scans is about 10-15 minutes using a 1 GHz PC running Linux. The system may be used standalone, but is envisioned more as part of a computer-aided CTC screening workflow that can address the problems of both a fully automatic approach and a physician-only approach.
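The curvature-based candidate generation can be illustrated with the classical surface-type test: on a sphere-like cap of radius r, the mean curvature H ≈ 1/r and the Gaussian curvature K ≈ 1/r², so a convex patch whose implied diameter 2/H falls in the target size range is a plausible candidate. A toy sketch with hypothetical thresholds, not the paper's actual six-feature MLDF:

```python
def polyp_candidate(mean_curv, gauss_curv, min_diam_mm=6.0, max_diam_mm=30.0):
    """Flag a colon-surface patch as a polyp candidate when its curvatures
    describe a convex cap (H > 0, K > 0, with the normal oriented into the
    lumen) whose implied diameter 2/H lies in the target size range.
    Curvatures are in 1/mm."""
    if mean_curv <= 0.0 or gauss_curv <= 0.0:
        return False
    implied_diam_mm = 2.0 / mean_curv
    return min_diam_mm <= implied_diam_mm <= max_diam_mm

# A 10 mm spherical cap: r = 5 mm, so H = 0.2 /mm and K = 0.04 /mm^2.
print(polyp_candidate(0.2, 0.04))     # True
print(polyp_candidate(0.02, 0.0004))  # False: implied diameter 100 mm, fold-like
```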
Effect of force and acoustic feedback on object-insertion work by teleoperation
Author(s):
Zhenglie Cui;
Katsuya Matsunaga;
Kazunori Shidoji
The operating efficiency of teleoperation under stereoscopic video images has been reported to be inferior to that of naked-eye work in a real working environment. A human operator at an actual work location is aided by force, tactile, and acoustic senses in addition to vision. Conventional teleoperated robots lack all sense information except vision, which may explain operators’ inefficient cognition of the working space. Therefore, using stereoscopic video images, we sought to clarify the effects of force and acoustic feedback on the performance of teleoperation work. For Experiment 1 we built a system that can acquire touch information on the master-robot side, and used it to elucidate the influence of force and acoustic feedback during work. Human operators were required to pick up a cylindrical object and insert it into a hole. The experiment showed that feedback of simple touch information through force and acoustic channels was not effective in shortening the completion time. In Experiment 2, under force-feedback conditions, the operator searched for the hole by sliding the cylindrical object across the surface. The results indicate that working efficiency was improved by force information conveying a sliding sense. Experiment 3 investigated the effects of sound when the cylindrical object was oriented so that it could be inserted into the hole and the hole was approached while in contact with the surface. The results demonstrate that working efficiency was not improved by the presentation of acoustic information.
Detection of single cells: an observer study
Author(s):
Nancy L Ford;
Steven I Pollmann;
Damiaan F Habets;
David W Holdsworth
Recent advances in imaging technology have brought high-resolution imaging into the practical laboratory setting. As a result, there has been increasing interest in imaging molecular and cellular processes in live animals. To image a single cell, a contrast medium, radiolabel, or metallic label is inserted into the cell, which is then introduced into the animal. How well the cell is visualized depends upon the contrast-to-noise ratio between the cell and the surrounding tissues, along with other factors such as the amount of signal present in each voxel of the image and the size of the contrast-enhanced region compared with the image dimensions. Through observer studies, we are investigating the detectability of a single cell in an image. We synthesized uniform volumetric datasets with Gaussian-distributed noise and altered a single voxel to reflect one of 5 different contrast-to-noise ratios (CNRs) to create the cell-labeled image. A maximum intensity projection was computed through each image. For each dataset, a high-contrast signal-only image was flanked on either side by the noise image and the cell-labeled image to create an image triplet. The observer task was to locate the cell-labeled image in a two-alternative forced choice study.
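The stimulus construction described above can be sketched directly: fill a volume with Gaussian noise, raise a single voxel by CNR × σ, and take a maximum intensity projection. A minimal version using only the standard library (the dimensions and noise parameters are hypothetical, not the study's):

```python
import random

def make_cell_volume(size, cnr, mean=100.0, sigma=10.0, seed=1):
    """Synthesize a size^3 volume of Gaussian noise (mean, sigma) and set
    the centre voxel to mean + cnr * sigma, mimicking one labeled cell."""
    rng = random.Random(seed)
    vol = [[[rng.gauss(mean, sigma) for _ in range(size)]
            for _ in range(size)] for _ in range(size)]
    c = size // 2
    vol[c][c][c] = mean + cnr * sigma
    return vol

def mip(vol):
    """Maximum intensity projection along the first (z) axis."""
    size = len(vol)
    return [[max(vol[z][y][x] for z in range(size))
             for x in range(size)] for y in range(size)]

proj = mip(make_cell_volume(16, cnr=8.0))
print(proj[8][8])  # 180.0: the labeled voxel (100 + 8 * 10) survives the MIP
```

At lower CNRs the labeled voxel competes with the noise maxima along each projection ray, which is exactly what limits detectability in the forced-choice task.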
Subjective assessment of high-level image compression of digitized mammograms
Author(s):
J. Ken Leader;
Jules H. Sumkin M.D.;
Marie A. Ganott M.D.;
Christiane M. Hakim M.D.;
Lara A. Hardesty M.D.;
Ratan Shah M.D.;
Luisa Wallace M.D.;
Amy H. Klym;
John M. Drescher;
Glenn S. Maitz;
David Gur
This study was designed to evaluate radiologists’ ability to identify highly compressed, digitized mammographic images displayed on high-resolution monitors. Mammography films were digitized at 50 micron pixel dimensions using a high-resolution laser film digitizer. Image data were compressed using the irreversible (lossy), wavelet-based JPEG 2000 method. Twenty images were randomly presented in pairs (one image per monitor) in three modes: mode 1, non-compressed versus 50:1 compression; mode 2, non-compressed versus 75:1 compression; and mode 3, 50:1 versus 75:1 compression, with 20 random pairs presented twice (80 pairs total). Six radiologists were asked to choose which image had the lower level of data compression in a two-alternative forced choice paradigm. The average percent correct across the six radiologists for modes 1, 2 and 3 was 52.5% (±11.3), 58.3% (±14.7), and 58.3% (±7.5), respectively. Intra-reader agreement ranged from 10 to 50% and kappa from -0.78 to -0.19. Kappa for inter-reader agreement ranged from -0.47 to 0.37. The “monitor effect” (left/right) was of the same order of magnitude as the radiologists’ ability to identify the lower level of image compression. In this controlled evaluation, radiologists did not accurately discriminate non-compressed and highly compressed images. Therefore, 75:1 image compression should be acceptable for review of digitized mammograms in a telemammography system.
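The kappa values quoted above correct the observed agreement rate for the agreement expected by chance; strongly negative values indicate systematic disagreement between repeated readings. A minimal computation from a 2×2 table of paired binary decisions (hypothetical counts, not the study data):

```python
def cohens_kappa(n11, n10, n01, n00):
    """Cohen's kappa for paired binary decisions: (p_o - p_e) / (1 - p_e),
    where p_o is the observed agreement and p_e the agreement expected by
    chance from each reading's marginal frequencies.
    n11 = both positive, n10/n01 = discordant, n00 = both negative."""
    n = n11 + n10 + n01 + n00
    p_o = (n11 + n00) / n
    p_e = ((n11 + n10) * (n11 + n01) + (n01 + n00) * (n10 + n00)) / (n * n)
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical repeat readings that mostly flip between sessions:
print(round(cohens_kappa(2, 9, 8, 1), 2))  # -0.7, i.e. systematic disagreement
```

Kappa near zero, as between readers here, is what chance-level discrimination of the compression levels predicts.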
Only stereo information improves performance in surgical tasks
Author(s):
Joerg W. Huber;
Neil S. Stringer;
Ian R. L. Davies;
David Field
Laboratory-based research has shown that the use of stereoscopic displays and observer-produced motion parallax in telepresence systems can each improve operators’ performance beyond that achieved using conventional 2-D displays. In applied contexts such as minimal access surgery (MAS), tasks are more complex and a range of sources of depth information is available. We therefore decided to examine the benefits of stereoscopic displays and observer-produced motion parallax under more realistic conditions. The ‘pick and place’ task was taken from surgical performance studies. It involved picking up small irregular spheres from one place and dropping them through apertures in another. The task was performed under seven different viewing conditions: (1) baseline (monocular), (2) biocular, (3) stereoscopic, (4) free motion parallax, (5) instructed motion parallax, (6) augmented motion parallax, and (7) stereo and motion parallax. Each subject did the baseline condition (monocular viewing) followed by one of the seven experimental conditions, followed by a final block of the baseline condition (n = 7 conditions x 10 subjects). Only stereoscopic viewing (conditions 3 and 7) led to better performance. The provision of motion parallax added nothing to performance; it may even have reduced the effectiveness of stereoscopic viewing. The evidence converges on the conclusion that binocular viewing confers a considerable performance advantage, while providing motion parallax information, to novice operators at least, is not beneficial.