Proceedings Volume 4324

Medical Imaging 2001: Image Perception and Performance

View the digital version of this volume at SPIE Digital Library.

Volume Details

Date Published: 26 June 2001
Contents: 7 Sessions, 31 Papers, 0 Presentations
Conference: Medical Imaging 2001
Volume Number: 4324

Table of Contents

All links to SPIE Proceedings will open in the SPIE Digital Library.
Sessions
  • Observer Performance: Displays and Technology I
  • Observer Performance: Displays and Technology II
  • Evaluating, Enhancing, and Modeling Perception and Performance I
  • Evaluating, Enhancing, and Modeling Perception and Performance II
  • ROC and Other Performance Assessment Tools I
  • ROC and Other Performance Assessment Tools II
  • Poster Session
Observer Performance: Displays and Technology I
Color and contrast perception in monochrome medical imaging flat-panel displays
The interpretation of diagnostic grayscale images by human beings relies on luminance discrimination at photopic levels. The observer in his search for abnormality relies on luminance modulation. If this hypothesis is valid, then the color of a monochrome presentation should not affect diagnostic performance when the image luminance is equivalent to grayscale levels. Does observer preference for a particular tint influence his performance defining an ideally colored grayscale? In this paper, we studied the variations in supra-threshold contrast perception when using different colored scales to display psychophysics targets on uniform background. We used targets with six different colored scales based upon the hue and saturation levels, while maintaining a constant luminosity. The six colored scales and the 'white' grayscale constituted our set of seven colored scales used in a two-alternative forced choice scheme with random presentation and eighteen observers. All image targets contained the same degree of physical contrast and the same luminance values. We computed the degree of preference for all possible combinations of two colored scales. In spite of large inter-observer variability, we found that green and blue scales result in higher perceived contrast.
Influence of the characteristic curve on the clinical image quality and patient absorbed dose in lumbar spine radiography
Anders Tingberg, Clemens Herrmann, Birgitta Lanhede, et al.
The 'European Guidelines on Quality Criteria for Diagnostic Radiographic Images' do not address the choice of film characteristic (H/D) curve, which is an important parameter for the description of a radiographic screen-film system. Since it is not possible to investigate this influence by taking repeated exposures of the same patients on films with systematically varied H/D curves, patient images of lumbar spine were digitised in the current study. The image contrast was altered by digital image processing techniques, simulating images with H/D curves varying from flat over standard latitude to a film type steeper than a mammography film. The manipulated images were printed on film for evaluation. Seven European radiologists evaluated the clinical image quality of in total 224 images by analysing the fulfilment of the European Image Criteria and by visual grading analysis of the images. The results show that the local quality can be significantly improved by the application of films with a steeper film H/D curve compared to the standard latitude film. For images with an average optical density of about 1.25, the application of the steeper film results in a reduction of patient absorbed dose by about 10-15% without a loss of diagnostically relevant image information. The results also show that the patient absorbed dose reduction obtained by altering the tube voltage from 70 kV to 90 kV coincides with a loss of image information that cannot be compensated for by simply changing the shape of the H/D curve.
Observer performance using monitors with different phosphors: an ROC study
The goal of the study was to compare observer performance using a P104 vs P45 CRT monitor for display of radiographic images. Different types of phosphor in a CRT monitor can affect the SNR, which in turn could affect the visibility of certain structures of lesions in an image. A series of portable CR chest images with subtle pulmonary nodules was presented to radiologists. They were instructed to search the images while their eye position was recorded. They reported on the presence or absence of a nodule, rated their confidence and reported on image quality. Physical measurements were also taken for both monitors (e.g., dynamic range, MTF, veiling glare). Observer performance was slightly better with the P45 than with the P104 monitor, although the difference was not statistically significant. Physical characterization of the two monitors also revealed advantages for the P45 monitor. P104 monitors are typically used in Asia while P45 monitors are typically used in the US. This study shows that choice of monitor phosphor may influence diagnostic and/or visual search performance and thus should be taken into account when selecting a monitor for clinical use.
Optimization of detector pixel size for interventional x-ray fluoroscopy
Ravindra M. Manjeshwar, David L. Wilson
Visualization of guide-wires and stents is of great interest in x-ray fluoroscopy. Detector pixels need to be small to limit partial-area effects with pixel-thin guide-wires. However, as pixels become smaller, the number of x-ray photons per pixel and the pixel signal-to-noise-ratio (SNR) decrease. Hence, there is a trade-off between resolution and SNR that will affect visualization. For imaging interventional devices, we determined the optimum detector pixel size using human observer detection experiments and modeling. Contrast sensitivities for guide-wire detection were determined for an idealized digital detector with pixels of 100, 200, 300 and 500 micrometers and for a guide-wire diameter of 400 micrometers. Results indicate that the optimal pixel size is around 200 micrometers at fluoroscopic exposures. At higher exposures, a smaller detector pixel is desirable. With large pixels, detection was degraded significantly due to contrast-dilution from partial-area effects. A human observer model predicted results. Having established the model for an idealized situation, we then use it to predict the effect of physical detector parameters like MTF, NPS and fill-factor on image quality. The model is a very promising tool for designing detectors for dose-efficient x-ray systems.
Observer Performance: Displays and Technology II
Physical and psychophysical evaluation of a flat CRT monitor
Hans Roehrig, Elizabeth A. Krupinski, Toshihiko Furukawa
The goal of the study was to compare physical and psychophysical performance of a flat surface CRT monitor vs a traditional curved surface CRT monitor. Radiographs based on familiar projection techniques are planar images and are traditionally displayed by placing the film on a flat surface viewbox. Presenting them in digital form on a CRT with a curved surface may cause distortions, which might affect diagnoses, especially if the physical dimensions of the anatomy are important. Other problems with a curved surface occur due to reflections from ambient lights behind the observer. Two DataRay CRT monitors that had different types of front glass-panel surfaces were evaluated. The first was a traditional CRT monitor with a curved surface, the other was a CRT with a flat surface. Physical measurements included dynamic range, display function, veiling glare and spatial uniformity. The performance study used low contrast squarewave patterns to determine JNDs. Room lights were off in one condition and on in the other. For both studies, performance with the curved CRT was affected by ambient light - performance was better with lights off for the curved panel, but not very different for the flat panel with or without the lights on.
Evaluating, Enhancing, and Modeling Perception and Performance I
Role of computer-assisted visual search in mammographic interpretation
Calvin F. Nodine, Harold L. Kundel, Claudia Mello-Thoms, et al.
We used eye-position data to develop Computer-Assisted Visual Search (CAVS) as an aid to mammographic interpretation. CAVS feeds back regions of interest that receive prolonged visual dwell (≥ 1000 ms) by highlighting them on the mammogram. These regions are then reevaluated for possible missed breast cancers. Six radiology residents and fellows interpreted a test set of 40 mammograms twice, once with CAVS feedback (FB), and once without CAVS FB in a crossover, repeated-measures design. Eye position was monitored. LROC performance (area) was compared with and without CAVS FB. Detection and localization of malignant lesions improved 12% with CAVS FB. This was not significant. The test set contained subtle malignant lesions. 65% (176/272) of true lesions were fixated. Of those fixated, 49% (87/176) received prolonged attention resulting in CAVS FB, and 54% (47/87) of FBs resulted in TPs. Test-set difficulty and the lack of reading experience of the readers may have contributed to the relatively low overall performance, and may have also limited the effectiveness of CAVS FB which could only play a role in localizing potential lesions if the reader fixated and dwelled on them.
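The fixation chain reported in this abstract can be checked with a few lines of arithmetic. The sketch below only recomputes the three percentages from the counts quoted above; the variable names are illustrative and not taken from the paper.

```python
# Counts quoted in the abstract above.
true_lesions = 272     # true lesions across all readings
fixated = 176          # true lesions that were fixated
prolonged = 87         # fixated lesions with prolonged dwell, which triggered CAVS feedback
true_positives = 47    # feedbacks that resulted in a true-positive report


def pct(numerator, denominator):
    """Return a fraction as a percentage rounded to the nearest integer."""
    return round(100 * numerator / denominator)


fixation_rate = pct(fixated, true_lesions)     # share of true lesions fixated
feedback_rate = pct(prolonged, fixated)        # share of fixated lesions given feedback
tp_rate = pct(true_positives, prolonged)       # share of feedbacks ending as TPs

print(fixation_rate, feedback_rate, tp_rate)
```

The recomputed values agree with the 65%, 49% and 54% stated in the abstract.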
Can computer-aided diagnosis (CAD) help radiologists find mammographically missed screening cancers?
Robert M. Nishikawa, Maryellen Lissak Giger, Robert A. Schmidt, et al.
We present data from a pilot observer study whose goal is to design a study to test the hypothesis that computer-aided diagnosis (CAD) can improve radiologists' performance in reading screening mammograms. In a prospective evaluation of our computer detection schemes, we have analyzed over 12,000 clinical exams. Retrospective review of the negative screening mammograms for all cancer cases found an indication of the cancer in 23 of these negative cases. The computer found 54% of these in our prospective testing. We added to these cases normal exams to create a dataset of 75 cases. Four radiologists experienced in mammography read the cases and gave their BI-RADS assessment and their confidence that the patient should be called back for diagnostic mammography. They did so once reading the films only and a second time reading with the computer aid. Three radiologists had no change in area under the ROC curve (mean Az of 0.73) and one improved from 0.73 to 0.78, but this difference failed to reach statistical significance (p = 0.23). These data are being used to plan a larger, more powerful study.
Using computer-assisted perception to determine the characteristics of missed and reported breast cancers
Claudia Mello-Thoms, Stanley M. Dunn, Calvin F. Nodine, et al.
Early detection of breast cancer is the desired goal in breast cancer screening. Nonetheless it has been reported in the literature that 10-30% of all breast cancers are missed by the radiologist, albeit most of these are deemed visible in retrospect on the mammogram. In this work we have studied the underlying structure of the areas that attracted the radiologist's visual attention and either yield or do not yield a response. We have shown that the spatial frequency profile of areas where a lesion is detected (TP) is significantly different from the one where a lesion is missed (FN), where a lesion is incorrectly placed (FP) or of lesion-free areas that are correctly identified (TN). Furthermore, we have shown that the spatial frequency profile alone can be used by an artificial neural network to predict decision outcome in that area of the image.
Anomalous nodule visibility effects in mammographic images
Dev Prasad Chakraborty, Harold L. Kundel
This study was undertaken to further investigate recent reports of unusual contrast-detail (CD) behavior in images with mammographic backgrounds, namely threshold contrast for detection increased with nodule size for Gaussian nodules. In this work we investigated the effects on the CD curve of allowing differently shaped nodules, in particular nodules with sharper edges. The following types of nodules/disks were studied: Gaussian shaped nodules and blurred disks, the latter characterized by a radius and an independent edge sharpness parameter. In a second type of disk the edge blur was held proportional to the disk radius. Ideal Observer detection thresholds were calculated for different nodule/disk radii ranging from 1.5 to 15 mm and observer performance studies were conducted. Noise power spectra (NPS) measurements confirmed the frequency dependence previously reported, $\mathrm{NPS} \propto 1/f^{3.1}$. For the Gaussian nodules we confirmed the reported CD behavior, with threshold contrast $\propto \mathrm{radius}^{0.2}$. However, for the disk nodules with fixed blur edges we observed different behavior (larger objects required less contrast), with threshold $\propto \mathrm{radius}^{-0.28}$. For the disk nodules with variable blur the threshold contrast was almost independent of radius. In summary while we duplicated the reported CD diagrams for Gaussian nodules, different behavior was observed for nodules with edges. We conclude that in addition to considering the details of the noise, it is necessary to consider the signal properties in more detail.
Influence of enhanced visual processing (EVP) of chest images on workflow
Elizabeth A. Krupinski, Martin Radvany, Alan Levy, et al.
The goal was to determine the influence of processing radiographic images with an Enhanced Visualization Processing (EVP) method on workflow in a PACS environment. Portable CR chest images were obtained and processed with either a Fuji CR unit (no EVP) or with a Kodak CR unit (EVP). A security camera with VCR for recording was positioned above the workstation. Four radiologists reviewed the images during their normal work schedule. The current diagnostic image was used to determine if the case was EVP or non-EVP. Only cases with 3 images were used in the analysis. The videotapes of the sessions were reviewed to determine diagnostic viewing times and whether zoom and/or window/level was used. Viewing time was significantly longer for the non-EVP than the EVP cases. The difference occurred with all readers, but was highest for reader 1. Window/level was used on 35% of the EVP and 41% of the non-EVP images. The difference was not significant. Zoom was used on 64% of the EVP and 69% of the non-EVP images. EVP processing of chest images displayed on PACS monitors significantly improves workflow and viewing time. It decreases (although not significantly) use of window/level, but not use of zooming.
Evaluating, Enhancing, and Modeling Perception and Performance II
Optimization of noisy nonuniform sampling and image reconstruction for fast MRI using a human vision model
Kyle A. Salem, Hisamoto Moriguchi, Jeffrey L. Duerk, et al.
We are developing clinical magnetic resonance imaging (MRI) strategies using spiral acquisition techniques that sample k-space nonuniformly. These methods require a regridding process. Multiple regridding and reconstruction algorithms have been proposed, and we use a perceptual difference model (PDM) to optimize them. We acquired sixteen in vivo MR brain images and simulated reconstruction from a spiral k-space trajectory. Regridding was done by the conventional method of Jackson et al., the block uniform resampling algorithm (BURS), and a newly developed method named matrix rescaling. Each of 16 reference images was reconstructed with multiple parameter sets resulting in a total of over 800 different images. The spiral MR images were compared to the original, fully sampled image using a PDM. Of the three reconstruction methods, the conventional and high-level matrix rescaling methods produce high quality images, but the latter method executed much faster. BURS worked only in extremely low-noise instances, making it often inappropriate. We also demonstrated the effect of display parameters, such as grayscale windowing on image quality. We believe that the PDM techniques provide a promising tool for the evaluation of MR image quality that can aid the engineering design process.
Model observers for signal-known-statistically tasks (SKS)
Model observers have been successfully applied to predict human visual detection performance for tasks in which the signal is known a priori and does not vary from trial to trial (signal known exactly task). Although well understood, the signal known exactly task does not reflect aspects of real-life tasks where the signal might vary and the observer does not have full knowledge of the signal parameters. In this paper, we investigate performance in two tasks: one in which the signal is known to the observer but varies in size and shape (signal known exactly but variable), and one in which the signal varies and is not known to the observer (signal known statistically). The tasks are investigated in the context of 2-component noise (power law and white noise). We present a number of candidate multitemplate models for signal known statistically tasks that are natural extensions of the existing signal known exactly models. Human observer results show that although human performance in the signal known exactly but variable task is in general better than in the signal known statistically task, the differences in performance are not large and are smaller than those of the ideal observer and other suboptimal models (e.g., non-prewhitening matched filter with an eye filter). For the ranges of size and shape uncertainty studied in this paper, our results suggest that the signal known exactly but variable task could be used as a first approximation to performance in the signal known statistically tasks. Therefore, the computationally simpler signal known exactly but variable task might be used in these circumstances as a figure of merit to evaluate and optimize performance in the more realistic signal known statistically tasks.
Bach, breasts, and power-law processes
Both the natural and manmade worlds abound with processes that have power-law spectra of the form $P(f) = K/f^{\beta}$. Statistical properties of such processes are dramatically different from those of smoothed Gaussian random processes. There is an extreme concentration of spectral power at low frequencies and a unique correlation distance does not exist. In addition, processes that do not have a low frequency cutoff have infinite (undefined) variance for infinite data sets. The fact that mammographic structure has a power-law spectrum does not tell one a great deal about the underlying process that generated the structure. Many different processes can have the same second order statistics, example classes are: deterministic, stochastic, self-similar, self-affine, and chaotic. It will be necessary to develop or adapt a variety of analytical techniques to investigate the nature of mammographic statistics. Some examples of power-law processes will be described and some statistical properties of mammograms will be presented.
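One common way to produce a signal with a power-law spectrum is to shape random-phase Fourier components so that power falls off as 1/f to the power beta. The sketch below is a generic illustration of that idea, not the method used in the paper; the function names, the choice to zero the DC term, and the log-log fit used to recover the exponent are all assumptions made here.

```python
import numpy as np


def power_law_noise(n, beta, rng=None):
    """Generate n samples of 1-D noise whose power spectrum falls off as 1/f**beta."""
    rng = np.random.default_rng(rng)
    freqs = np.fft.rfftfreq(n)                    # non-negative frequencies, freqs[0] == 0
    amplitude = np.zeros_like(freqs)
    amplitude[1:] = freqs[1:] ** (-beta / 2.0)    # amplitude is sqrt(power); DC is left at 0
    phases = rng.uniform(0.0, 2.0 * np.pi, size=freqs.size)
    spectrum = amplitude * np.exp(1j * phases)    # random phases, prescribed magnitudes
    return np.fft.irfft(spectrum, n=n)


def fitted_exponent(signal):
    """Estimate beta from a straight-line fit of log power against log frequency."""
    n = signal.size
    # Skip the DC bin and the last bin (the Nyquist bin for even n), whose
    # magnitude is altered when the inverse transform forces it to be real.
    freqs = np.fft.rfftfreq(n)[1:-1]
    power = np.abs(np.fft.rfft(signal))[1:-1] ** 2
    slope, _intercept = np.polyfit(np.log(freqs), np.log(power), 1)
    return -slope


samples = power_law_noise(4096, beta=3.0, rng=0)
print(fitted_exponent(samples))
```

Because the DC term is set to zero and the record is finite, the generated signal has a finite variance; this is the finite-data counterpart of the low-frequency cutoff mentioned in the abstract.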
Maximum-likelihood and maximum-a-posteriori estimates of human-observer templates
Procedures for direct estimation of observer templates (also known as classification images) have largely focused on unbiased estimates. In this paper we take a different approach, deriving maximum likelihood (ML) and maximum a posteriori (MAP) procedures for estimating an observer template from the outcome of a two-alternative forced-choice experiment. While these estimation procedures will generally result in estimates with some bias, the reduction in variance can potentially outweigh the negative effects of a small bias. One promising feature of the ML and MAP estimates is that the distribution of the sample images used for the experiment is not necessary for evaluating the likelihood term implying that the method is robust to the distribution of images (although the validity of the assumptions used to derive these estimators may not be). It may therefore be possible to use these methods to estimate classification images for detection tasks in realistic images.
Evaluation of detection model performance in power-law noise
Two alternative forced-choice (2AFC) nodule detection performances of a number of model observers were evaluated for detection of simulated nodules in filtered power-law ($1/f^{3}$) noise. The models included the ideal observer, the channelized Fisher-Hotelling (FH) model with two different basis function sets, the non-prewhitening matched filter with an eye filter (NPWE), and the Rose model with no DC response (RoseNDC). Detectability of the designer nodule signal was investigated. It has equation $s(\rho) = A\,\mathrm{Rect}(\rho/2)\,(1-\rho^{2})^{v}$, where $\rho$ is a normalized distance (r/R), R is the nodule radius and A is signal amplitude. The nodule profile can be changed (designed) by changing the value of v. For example, the result is a sharp-edged, flat-topped disc for v equal to zero and the projection of a sphere for v equal to 0.5. Human observer experiments were done with nodules based on v equal to 0, 0.5 and 1.5. For the v equal to 1.5 case, human results could be well fitted using a variety of models. The human CD diagram slopes were -0.12, +0.27 and +0.44 for v equal to 0, 0.5 and 1.5 respectively.
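The designer nodule profile from this abstract is easy to evaluate numerically. The following is a minimal sketch, assuming Rect(rho/2) equals 1 for rho below 1 and 0 otherwise (the value exactly at the edge is a convention chosen here); the function and argument names are illustrative.

```python
import numpy as np


def designer_nodule(r, R, A=1.0, v=0.5):
    """Evaluate s(rho) = A * Rect(rho / 2) * (1 - rho**2)**v with rho = r / R."""
    rho = np.abs(np.asarray(r, dtype=float)) / R
    inside = rho < 1.0                              # Rect(rho / 2): 1 inside the radius
    base = np.clip(1.0 - rho ** 2, 0.0, None)       # clipped so fractional powers stay real
    return np.where(inside, A * base ** v, 0.0)


radii = np.array([0.0, 0.5, 1.0, 2.0])
print(designer_nodule(radii, R=1.0, v=0.0))   # flat-topped disc
print(designer_nodule(radii, R=1.0, v=0.5))   # projection of a sphere
print(designer_nodule(radii, R=1.0, v=1.5))   # smoother, more peaked profile
```

Larger values of v pull the profile toward the center and soften the edge, which is the "design" freedom the abstract describes.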
ROC and Other Performance Assessment Tools I
Formal contaminated binormal model and fits of the model to classic ROC data
Donald D. Dorfman, Kevin S. Berbaum
A contaminated binormal receiver operating characteristic (ROC) model is proposed to account for ROC data with very few false positive reports even though many normal patients are sampled. The model assumes that for a proportion of abnormalities, no signal information is captured and that those abnormalities have the same distribution as noise along the latent decision axis. The new model can fit ROC data in which some or all of the ROC points have false positive fractions of zero and true positive fractions less than one without concluding perfect performance. The resulting ROC curves never exhibit inappropriate chance line crossings. The model holds that, for expert decision makers, there are situations in which the prevalence and utility matrix preclude operating points in some ROC regions. The model has a straightforward extension to the joint detection and localization ROC curve. Fits of the contaminated binormal ROC model to non-degenerate data from exemplary experiments in radiology were evaluated. For several studies, the contaminated binormal model fit the data better than conventional ROC models suggesting that contamination may not be limited to degenerate ROC data. This research has been published for a different audience.
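The mixture idea in this abstract can be sketched as an ROC curve in which only a fraction of abnormal cases carries signal, while the rest are distributed like noise. The parameterization below (unit-variance normal distributions, a separation `mu`, and a `visible_fraction` of abnormalities with captured signal) is an assumption made for illustration; the paper's own notation and fitting procedure may differ.

```python
from statistics import NormalDist

_PHI = NormalDist().cdf   # standard normal cumulative distribution function


def cbm_roc_point(zeta, mu, visible_fraction):
    """Return (FPF, TPF) at decision threshold zeta under the mixture model."""
    fpf = _PHI(-zeta)                                   # noise cases above threshold
    seen = _PHI(mu - zeta)                              # abnormals with captured signal
    unseen = _PHI(-zeta)                                # abnormals distributed like noise
    tpf = visible_fraction * seen + (1.0 - visible_fraction) * unseen
    return fpf, tpf


def cbm_area(mu, visible_fraction):
    """Area under the mixture ROC: a weighted mix of a binormal area and chance."""
    return visible_fraction * _PHI(mu / 2 ** 0.5) + (1.0 - visible_fraction) * 0.5


# With a large separation, a strict threshold gives FPF near 0 and TPF near the
# visible fraction, so the curve does not imply perfect performance.
print(cbm_roc_point(zeta=5.0, mu=10.0, visible_fraction=0.6))
print(cbm_area(mu=10.0, visible_fraction=0.6))
```

For non-negative `mu`, TPF is never below FPF in this sketch, which is consistent with the abstract's statement that the curves never cross the chance line.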
Case sampling in LROC: a Monte Carlo analysis
We conducted a series of Monte Carlo simulations to investigate how hypothesis testing for modality effects in multireader localization ROC (LROC) studies is influenced by case effects. One specific goal was to evaluate for LROC studies the Dorfman-Berbaum-Metz method of analyzing case effects in reader data acquired from a single case-set. Previous evaluations with ROC study simulations found the DBM method to be moderately conservative. Our simulations, using procedures adapted from those earlier works, showed the DBM method to be a conservative test of modality effect in LROC studies as well. The degree of conservatism was greater for a critical value of $\alpha = 0.05$ than for $\alpha = 0.01$, and was not moderated by increased numbers of readers or cases. Other simulations investigated the tradeoff between power and empirical type-I error rate for the DBM method and two standard hypothesis tests. Besides the DBM method, a two-way analysis of variance (ANOVA) was applied to performance indices based on the LROC curve under an assumption of negligible case effects. The third test was a three-way ANOVA applied to performance indices, which required two sets of images per modality. With $\alpha = 0.01$, the DBM method outperformed the other tests for studies with low numbers of readers and cases. In most other situations, its performance lagged behind that of the other tests.
Evaluating imaging systems in the absence of truth: a comparison of ROC and mixture distribution analysis in computer-aided diagnosis in mammography
Harold L. Kundel, Marcia Polansky, Megan Phelan
A rigorous ROC analysis requires independent proof of cases that cannot be obtained in many practical clinical situations. An alternative is measurement of diagnostic reliability as relative percent agreement (RPA). A large set of mammograms read with and without computer aided diagnosis (CAD) was used to compare the ROC area (Az) using proof and RPA using agreement. A subset of 767 (416 normal and 351 abnormal) cases read by 5 generalists and 5 mammographers was selected from the readings on 900 proved mammograms read with and without CAD. The Az was calculated using the multireader-multicase (MRMC) method. The RPA was calculated from a mixture distribution analysis (MDA) using the EM algorithm. Individual reader values were calculated by a jackknife procedure. With and without CAD the Az was .90 and .88 for the mammographers (p = .04) and .87 and .86 for the generalists (p = .5). The RPA was 90 and 83 for mammographers (p = .08) and 82 and 85 for generalists (p = .3). The correlation between Az and RPA was 0.6. The larger variance of the RPA decreases statistical power. The MDA may be a useful method for comparing imaging modalities in clinical studies where definitive proof cannot be obtained.
Some interesting examples of binormal degeneracy and analysis using a contaminated binormal ROC model
Kevin S. Berbaum, Donald D. Dorfman
Receiver operating characteristic (ROC) data with false positive fractions of zero are often difficult to fit with standard ROC methodology, and are sometimes discarded. Some extreme examples of such data were analyzed. A new ROC model is proposed that assumes that for a proportion of abnormalities, no signal information is captured and that those abnormalities have the same distribution as noise along the latent decision axis. Rating reports of fracture for single view ankle radiographs were also analyzed with the binormal ROC model and two proper ROC models. The conventional models gave ROC area close to one, implying a true positive fraction close to one. The data contained no such fractions. When all false positive fractions were zero, conventional ROC areas gave little or no hint of unmistakable differences in true positive fractions. In contrast, the new model can fit ROC data in which some or all of the ROC points have false positive fractions of zero and true positive fractions less than one without concluding perfect performance. These data challenge the validity and robustness of conventional ROC models, but the contaminated binormal model accounts for these data. This research has been published for a different audience.
Analysis of components of variance in multiple-reader studies of computer-aided diagnosis with different tasks
Sergey V. Beiden, Robert F. Wagner, Gregory Campbell, et al.
In recent years, the multiple-reader, multiple-case (MRMC) study paradigm has become widespread for receiver operating characteristic (ROC) assessment of systems for diagnostic imaging and computer-aided diagnosis. We review how MRMC data can be analyzed in terms of the multiple components of the variance (case, reader, interactions) observed in those studies. Such information is useful for the design of pivotal studies from results of a pilot study and also for studying the effects of reader training. Recently, several of the present authors have demonstrated methods to generalize the analysis of multiple variance components to the case where unaided readers of diagnostic images are compared with readers who receive the benefit of a computer assist (CAD). For this case it is necessary to model the possibility that several of the components of variance might be reduced when readers incorporate the computer assist, compared to the unaided reading condition. We review results of this kind of analysis on three previously published MRMC studies, two of which were applications of CAD to diagnostic mammography and one was an application of CAD to screening mammography. The results for the three cases are seen to differ, depending on the reader population sampled and the task of interest. Thus, it is not possible to generalize a particular analysis of variance components beyond the tasks and populations actually investigated.
ROC and Other Performance Assessment Tools II
Optimal method for combining two correlated diagnostic assessments with application to computer-aided diagnosis
Yulei Jiang, Charles E. Metz
We are developing computer-aided diagnosis (CAD) methods that produce a quantitative diagnostic assessment, such as the likelihood of malignancy of a breast lesion. Radiologists who use this computer aid must combine the computer's quantitative assessment with their own. No theoretical or empirical methods are currently available to help radiologists perform this task. Results of recent observer studies show that while CAD helps radiologists improve performance, radiologists' ad hoc performance tends to be inferior to that of the computer alone, indicating that they are unable to use computer aids optimally. We have developed a general method to combine two correlated diagnostic assessments. We calculate a likelihood ratio based on a bivariate binormal model that describes the joint probability density of the latent decision variables from two diagnostic assessments. To the extent that the bivariate binormal model is valid and that the model's parameters can be estimated reliably, results that we obtain in this way will be optimal because that likelihood ratio is used by the ideal observer in combining the diagnostic assessments. Preliminary results indicate that this method can produce better performance than that achieved by radiologists when they use computer aids in an ad hoc way. This method can potentially help radiologists use quantitative computed diagnostic assessments optimally, thereby surpassing the computer in accuracy.
Methods for identifying changes in radiologists' behavioral operating point of sensitivity-specificity trade-offs within an ROC study of the use of computer-aided detection of lung cancer
In this paper, we look at a different potentially useful method of behavior analysis, a method that may allow one to derive from the ROC confidence ratings of individual radiologists, a behavioral operating point that closely reflects the point where the radiologist would have decided to act or take no action on a case. This behavioral operating point appears appropriate for the calculation of cost benefit relationships and for studying how a radiologist shifts within ROC space when provided with Computer Aided Diagnosis (CADx) information.
Two-alternative forced-choice evaluation of 3D CT angiograms
Damiaan F. Habets, Brian E. Chapman, Allan J. Fox, et al.
This study describes the development and evaluation of an appropriate methodology to study observer performance when comparing 2D and 3D angiographic techniques. 3D-CT angiograms were obtained from patients with cerebral aneurysms or occlusive carotid artery disease and perspective rendering of this 3D data was performed to produce maximum intensity projections (MIP) at view angles identical to digital subtraction angiography (DSA) images. Two-alternative-forced-choice methodology (2AFC) was then used to determine the percent correct (Pc), which is equivalent to the area Az under the receiver-operating characteristic (ROC) curve. In a comparison of CRA MIP images and DSA images of the intracranial vasculature, the average value of Pc was 0.90 ± 0.03. Perspective reprojection produces digitally reconstructed radiographs (DRRs) with image quality that is nearly equivalent to conventional DSA, with the additional clinical advantage of providing digitally reconstructed images at an unlimited number of viewing angles.
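The stated equivalence between 2AFC percent correct and ROC area can be illustrated with the empirical (Mann-Whitney) form of the area. The sketch below assumes an observer who assigns a score to each image and, in a 2AFC trial, picks the higher-scored image, with ties counted as a coin flip; the function names and the scoring scheme are illustrative and are not taken from the paper.

```python
import itertools


def empirical_roc_area(abnormal_scores, normal_scores):
    """Fraction of (abnormal, normal) pairs ranked correctly; ties count one half."""
    credit = 0.0
    for a, n in itertools.product(abnormal_scores, normal_scores):
        if a > n:
            credit += 1.0
        elif a == n:
            credit += 0.5
    return credit / (len(abnormal_scores) * len(normal_scores))


def percent_correct_2afc(trials):
    """Expected proportion correct over 2AFC trials of (abnormal, normal) score pairs."""
    trials = list(trials)
    credit = 0.0
    for a, n in trials:
        if a > n:
            credit += 1.0        # observer picks the abnormal image
        elif a == n:
            credit += 0.5        # tie: a guess is right half the time
    return credit / len(trials)


abnormal = [3, 4, 5, 2]
normal = [1, 2, 3, 1]
all_pairs = itertools.product(abnormal, normal)
print(empirical_roc_area(abnormal, normal), percent_correct_2afc(all_pairs))
```

When every abnormal image is paired once with every normal image, the two quantities coincide exactly; with a random subset of pairings, Pc is an estimate of the same area.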
Establishing perceptual limits for medical image compression
The acceptance and uptake of lossy compression for medical digital images depends on the reliable determination of limits on the loss of visual information that would not adversely affect the utility of the images for radiologists. A lack of widely accepted thresholds determining acceptable loss for various image modalities and for different compression techniques motivates a need for further evidence of observer performance in such cases, via comparative studies. This paper contributes results of subjective testing for perceived image quality over three different image modality collections, using test sets of 5 images from each collection compressed by conventional lossy JPEG and by wavelet techniques at 4 different compression quality settings, presented independently to 4 radiologists for viewing. The results indicate a smooth decrease in perceived image quality in all three modalities when the compression quality setting was decreased, which implies there is some intrinsic difficulty in establishing a fixed ideal compression threshold within a modality. Furthermore, substantial differences were observed in the compression quality settings at which perceived absolute image quality was judged to be similar, across the three modalities. The results also indicate that the perceived image quality was usually slightly higher for wavelet compressed images than for JPEG compressed images, at a given compression quality setting.
ROC theory under the exponential assumption to analyze visual detection experimental data
Francisco J. Sanchez-Marin
Psychophysical data from 30 human observers that were not properly explained by ROC theory under the Gaussian assumption were analyzed assuming underlying exponential distributions. The detection experiments were done using the rating procedure. Under the power-law assumption a detectability index, similar to d', was deduced. An interesting result was that p(a), the area under the ROC, which is a distribution-free parameter, is very similar to the area under the power-law ROC for all our observers.
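The link between exponential distributions and a power-law ROC can be sketched as follows. If noise and signal responses are exponential with means m_n and m_s, a criterion c gives FPF = exp(-c/m_n) and TPF = exp(-c/m_s), so TPF = FPF^k with k = m_n/m_s, and the area under the curve is 1/(1 + k). This is the textbook relation, not necessarily the author's parametrization. The function names and the value of k are assumptions for illustration.

```python
def power_law_tpf(fpf, k):
    """True-positive fraction on a power-law ROC, TPF = FPF ** k with 0 < k <= 1."""
    return fpf ** k


def power_law_area(k):
    """Closed-form area under the power-law ROC."""
    return 1.0 / (1.0 + k)


def trapezoid_area(points):
    """Trapezoidal area under a list of (fpf, tpf) operating points."""
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += 0.5 * (y0 + y1) * (x1 - x0)
    return area


if __name__ == "__main__":
    k = 0.25  # hypothetical ratio of noise mean to signal mean
    curve = [(i / 10000, power_law_tpf(i / 10000, k)) for i in range(10001)]
    print(power_law_area(k), trapezoid_area(curve))
```

The trapezoidal function can also be applied to empirical rating-procedure operating points, with (0, 0) and (1, 1) included, to obtain a distribution-free area for comparison.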
Poster Session
High-precision magnetoencephalo field-source estimation using the head model
Kazumasa Ito, Shinsuke Saita, Noboru Niki, et al.
In this paper we present the effect of a realistically shaped head model, made up of triangle elements, on the MEG inverse problem. Among the many methods suggested so far, the most widely used head model is the sphere model. However, the shape of the human head differs from a sphere and the conductivity of the head is not uniform. As a basic study toward accurate estimation, we use a head model made up of triangle elements and perform the estimation on the extracted cortex by means of the Multiple Signal Classification (MUSIC) method.
Relationship between trabecular texture features of CT images and an amount of bone cement volume injection in percutaneous vertebroplasty
Gye Rae Tack, Hyung Guen Choi, Kyu-Chul Shin, et al.
Percutaneous vertebroplasty is a surgical procedure that was introduced for the treatment of compression fracture of the vertebrae. This procedure includes puncturing vertebrae and filling with polymethylmethacrylate (PMMA). Recent studies have shown that the procedure could provide structural reinforcement for the osteoporotic vertebrae while being minimally invasive and safe with immediate pain relief. However, treatment failures due to disproportionate PMMA volume injection have been reported as one of the complications in vertebroplasty. It is believed that control of PMMA volume is one of the most critical factors that can reduce the incidence of complications. In this study, the appropriate PMMA volume was assessed based on the imaging data of a given patient under the following hypotheses: (1) a relationship can be drawn between the volume of PMMA injection and textural features of the trabecular bone in preoperative CT images and (2) the volume of PMMA injection can be estimated based on 3D reconstruction of postoperative CT images. Gray-level run length analysis was used to determine the textural features of the trabecular bone. The width of trabecular (T-texture) and the width of intertrabecular spaces (I-texture) were calculated. The correlation between PMMA volume and textural features of the patient's CT images was also examined to evaluate the appropriate PMMA amount. Results indicated that there was a strong correlation between the actual PMMA injection volume and the area of the intertrabecular space and that of trabecular bone calculated from the CT image (correlation coefficient r = 0.96 and r = -0.95, respectively). T-texture (r = -0.93) correlated better with the actual PMMA volume than I-texture (r = 0.57). Therefore, it was demonstrated that appropriate PMMA injection volume could be predicted based on the textural analysis for better clinical management of the osteoporotic spine.
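A simplified sketch of a run-length width measure, assuming the CT slice has already been binarized (1 for trabecular bone, 0 for intertrabecular space) and that widths are taken along image rows only. The abstract does not describe its thresholding, run directions, or feature definitions, so everything below is a hypothetical stand-in for the T-texture and I-texture calculations.

```python
from itertools import groupby


def mean_run_length(binary_image, value):
    """Mean length of horizontal runs of `value` in a list-of-rows binary image."""
    runs = [
        len(list(group))
        for row in binary_image
        for pixel, group in groupby(row)
        if pixel == value
    ]
    return sum(runs) / len(runs) if runs else 0.0


if __name__ == "__main__":
    # Hypothetical 2 x 6 binarized region of interest.
    roi = [
        [1, 1, 0, 0, 0, 1],
        [0, 0, 1, 1, 1, 1],
    ]
    print(mean_run_length(roi, 1))  # stand-in for trabecular width, in pixels
    print(mean_run_length(roi, 0))  # stand-in for intertrabecular width, in pixels
```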
Evaluation of the 3D visualization of quantitative stereoelectroencephalographic information: new results
The visual analysis of Stereoelectroencephalographic (SEEG) signals in their anatomical context is aimed at the understanding of the spatio-temporal dynamics of epileptic processes. The magnitude of these signals may be encoded by graphical glyphs, having a direct impact on the perception of the values. Our study is devoted to the evaluation of the quantitative visualization of these signals, specifically to the influence of the coding scheme of the glyphs on the understanding and the analysis of the signals. This work describes an experiment conducted with human observers in order to evaluate three different coding schemes used to visualize the magnitude of SEEG signals in their 3D anatomical context. We intended to study whether any of these coding schemes allows better performance by human observers in two aspects: accuracy and speed. A protocol has been developed in order to measure these aspects. The results presented in this work were obtained from 40 human observers. The comparison between the three coding schemes was first performed through an Exploratory Data Analysis (EDA). The statistical significance of this comparison was then established using nonparametric methods. The influence of some other factors on the observers' performance has also been investigated.
Data mining and online recognition of mammographic images based on Haar wavelets, principal component analysis, and rough sets methods
Roman W. Swiniarski, T. Luu, Anna K. Swiniarska, et al.
The paper presents an application of 2D Haar wavelets, neural networks, principal component analysis, and rough sets based data mining methods for classification of mammographic images. The features from the mammographic images have been extracted based on the 2D Haar wavelets. The rough sets method has been applied for feature selection and data reduction.
Effects of prevalence on visual search and decision making in fracture detection
Susan C. Ethell, David Manning
Research concerning disease prevalence has inferred that indices of observer performance become, in part, a function of predetermined prevalence. The cause of this modified performance and decision-making is not fully understood, although the alteration of the criterion level of the observer may be a feature. Novice radiography students were randomly assigned to one of three digitised test banks of 72 wrist images. Test banks A, B and C represented a fracture prevalence of 50%, 83% and 22% respectively. Half of the observers from each group were made aware of the prevalence of their respective test bank. Observers recorded their decisions on an operator rating scale. Results showed significant differences in overall Az between the 50% and 83% prevalence sets (p = 0.04) and the 50% and 22% prevalence sets (p = 0.005). Knowledge of the prevalence influenced both sensitivity and specificity values at the 83% prevalence level (p = 0.03 and 0.02) but not at the lower levels. For test bank A, sensitivity was 87%, specificity 53% and Az 0.80; for test bank B, sensitivity 81%, specificity 48% and Az 0.71; and for test bank C, sensitivity 85%, specificity 43% and Az 0.68. Analysis of the eye movement patterns of observers under conditions of varying prevalence is in progress.
Observer Performance: Displays and Technology II
Analysis of computer-aided diagnosis on radiologists' performance using an independent database
We developed a computerized method for the automated classification of benign and malignant mammographic mass lesions. An independent evaluation of the automatic method on a database consisting of 110 cases showed that the classification method is robust to variations in case-mix and in film digitization technique. We also evaluated the effectiveness of the method as an aid to radiologists in differentiating between benign and malignant masses. A total of 6 mammographers and 6 community general radiologists participated in an observer study. In that study, the radiologists interpreted the 110 cases in the independent database, unknown to both the radiologist observers and the trained computer classification method, first without and then with the computer aid. Results from our observer study indicated that use of the computer aid improved the abilities for both the expert and general radiologists in the task of differentiating between benign and malignant mammographic mass lesions, as indicated by the increase in Az values and sensitivities at statistically significant levels. With the database we used, however, we were unable to demonstrate the effect of computer aid on radiologist performance regarding the number of benign cases sent for biopsy. In this study, we investigate the relationship between the value of the computer output and the effect on the observers in terms of changing their patient management decision upon viewing the computer output.