Medical Imaging 2012: Image Perception, Observer Performance, and Technology Assessment

Volume Details

Date Published: 5 April 2012

Contents: 9 Sessions, 64 Papers, 0 Presentations

Conference: SPIE Medical Imaging 2012

Volume Number: 8318

Front Matter: Volume 8318
Technology Assessment
Image Display
ROC Analysis
Image Perception
Digital Pathology II: Joint Session with Conferences 8314 and 8315
Model Observers
Observer Performance
Poster Session

Front Matter: Volume 8318

This PDF file contains the front matter associated with SPIE Proceedings Volume 8318, including the Title Page, Copyright information, Table of Contents, and the Conference Committee listing.

Technology Assessment

CT detector evaluation with complex random backgrounds

Helen Fan, Harrison H. Barrett

Modern computed tomography (CT) uses detector arrays consisting of large numbers of photodiodes with scintillator crystals. The number of pixels in the array can play an important role in system performance. Considerable research has been performed on signal detection in flat backgrounds under various conditions, but little has been done with complex, random backgrounds in CT; our work investigates in particular the effect of the number of detector elements on signal detection by a channelized Hotelling observer in a complex background. For this project, a simulated three-dimensional phantom is generated with its attenuation equal to that of water. The phantom contains a smaller central section with random variations to simulate random anatomical structures. Cone-beam projections of the phantom are acquired at different angles and used to calculate the covariance matrix of the raw projection data. Laguerre-Gauss channels are used to reduce the dimensionality of each 2D projection and hence the size of the covariance matrix, but the covariance is still a function of two projection angles. A strong cross-channel correlation is observed as a function of the difference between the angles. A signal with known location and size is used, and the performance of the observer is calculated from the channel outputs at multiple projection angles. A contrast-detail diagram is computed for different variables such as signal size, number of incident x-ray photons, pixel size, etc. At a fixed observer signal-to-noise ratio (SNR), the contrast required to detect a signal increases dramatically as the signal size decreases.
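
As an illustration of the figure of merit named in this abstract, the following is a minimal sketch (not the authors' code) of a Hotelling observer SNR computed from channel outputs, using SNR^2 = dv^T K^-1 dv, where dv is the difference of the class mean channel outputs and K is the average of the two class covariance matrices. The function names, the small pure-Python solver, and the toy data are assumptions made for this example. In the study, each sample vector would presumably hold Laguerre-Gauss channel outputs gathered across projection angles.

```python
def mean_vector(samples):
    """Mean of a list of equal-length channel-output vectors."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]

def covariance(samples, mean):
    """Unbiased sample covariance matrix of the channel outputs."""
    n, d = len(samples), len(mean)
    cov = [[0.0] * d for _ in range(d)]
    for sample in samples:
        dev = [sample[k] - mean[k] for k in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += dev[a] * dev[b]
    return [[value / (n - 1) for value in row] for row in cov]

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting.
    Raises ZeroDivisionError if the matrix is singular."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def hotelling_snr(absent_outputs, present_outputs):
    """Observer SNR from signal-absent and signal-present channel outputs."""
    m0 = mean_vector(absent_outputs)
    m1 = mean_vector(present_outputs)
    k0 = covariance(absent_outputs, m0)
    k1 = covariance(present_outputs, m1)
    # Average the two class covariances (a common convention, assumed here).
    k = [[0.5 * (a + b) for a, b in zip(r0, r1)] for r0, r1 in zip(k0, k1)]
    delta = [b - a for a, b in zip(m0, m1)]
    template = solve(k, delta)  # Hotelling template in channel space
    return sum(w * d for w, d in zip(template, delta)) ** 0.5

# Toy two-channel data (hypothetical): the signal shifts channel 0 by 2 units.
absent = [[0, 0], [2, 0], [0, 2], [2, 2]]
present = [[2, 0], [4, 0], [2, 2], [4, 2]]
snr = hotelling_snr(absent, present)
```

Repeating such a calculation while varying signal size and contrast is one way a contrast-detail curve at fixed SNR could be traced out.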

Reader behavior in a detection task using single- and multislice image datasets

Asli Kumcu, Ljiljana Platiša, Milan Platiša, et al.

We assess human reader behavior such as reading times and browsing trends in a signal detection experiment with synthetic single-slice (ss) and multi-slice (ms) image datasets of varying task complexity, defined in this study as the ratio of the background lump size to the signal width. Three dataset types were generated by inserting one 3D Gaussian target of fixed size into the center of 3D volumes of correlated Gaussian noise with three different kernel sizes. Corresponding signal intensities were determined separately for the three background types using the staircase method targeting an AUC of 0.7 for ss datasets. Non-expert human readers were presented with ss (central slice of the volume) and ms datasets (slice-by-slice viewing in a stack-browsing mode). Readers were aware of the target's approximate location within the slice or volume. Readers could scroll freely through the ms datasets at arbitrary speed and direction with no time limit. Experiments were conducted in a controlled viewing environment on a 5MP digital mammography display. AUCs were 0.68-0.73 for ss; 0.82-0.98 for ms datasets. Reading time (ms, ss), the number of repetitions through the stack (ms), and the average number of slices per repetition (ms) were assessed. Browsing speeds were in the range of 1-7 slices per second. Results show that readers spent the shortest time and fewest repetitions reading TP cases, with FP and FN cases requiring the most attention. The reported trends concur with earlier chest x-ray and mammography studies which report that readers fixate longer on regions subsequently rated incorrectly.

An image-dependent model of veiling glare effects on detection performance in large-luminance-range displays

Mina Choi, Luigi Albani, Aldo Badano

One limitation of visual detection tasks in complex scenes with a large range of luminance values is the decrease in sensitivity due to veiling glare in the display device and in the human eye caused by unwanted light scattering. We used our previously measured results regarding the increase in detection thresholds due to veiling glare to formulate an empirical model for this phenomenon. Our results are based on a ring glare source and a Gaussian target on white noise using a dual-layer, high-dynamic-range liquid-crystal display prototype. The thresholds, measured using a double-random staircase technique with added signal-absent images, are modeled as a function of illuminance at the eyes and angular distance between the veiling glare source and the detection target. In this work, we model increases in detection contrast thresholds due to veiling glare for any image by calculating the contribution of each display pixel. We validate our model by determining threshold increases for the set of experimental results previously obtained with human subjects. Our image-dependent model predicts how the contrast threshold is affected by veiling glare for any target location. Finally, we discuss the range of validity of our model and show predictions for sample mammography, chest CT, and chest radiography images displayed on large-luminance-range devices.

Visual grading regression with random effects

Örjan Smedby, Mats Fredrikson, Jakob De Geer, et al.

To analyze visual grading experiments, ordinal logistic regression (here called visual grading regression, VGR) may be used in the statistical analysis. In addition to types of imaging or post-processing, the VGR model may include factors such as patient and observer identity, which should be treated as random effects. Standard software does not allow random factors in ordinal logistic regression, but using Generalized Linear Latent And Mixed Models (GLLAMM) this is possible. In a single-image study, 9 radiologists graded 24 cardiac Computed Tomography Angiography (CTA) images with reduced dose without and after post-processing with a 2D adaptive filter, using five image quality criteria. First, standard ordinal logistic regression was carried out, treating filtering, patient and observer identity as fixed effects. The same analysis was then repeated with GLLAMM, treating filtering as a fixed effect and patient and observer identity as random effects. With both approaches, a significant effect (p<0.01) of the filtering was found for all five criteria. No dramatic differences in parameter estimates or significance levels were found between the two approaches. It is concluded that random effects can be appropriately handled in VGR using GLLAMM, but no major differences in the results were found in a preliminary evaluation.

Computational observer approach for the assessment of stereoscopic visualizations for 3D medical images

Fahad Zafar, John Dorband, Aldo Badano

We present a computational stereoscopic observer approach inspired by the mechanisms of stereopsis in human vision that makes decisions based on a set of image pairs. Our stereo observer is constrained to a left and a right image generated using a visualization operator (ray tracing) to render simulated voxel datasets. We present the formulation of the observer based on model observer theory and discuss issues regarding simulated data generation and processing for this approach. The applicability of this observer extends to stereoscopic displays in the areas of entertainment, industrial, and medical imaging applications.

Image Display

Stereoscopic versus monoscopic detection of masses on breast tomosynthesis projection images

Gautam S. Muralidhar, Tejaswini Ganapathi, Alan C. Bovik, et al.

The goal of this study was to assess whether stereoscopic viewing of breast tomosynthesis projection images affected mass detection performance when compared to monoscopic viewing. The dataset for this study, provided by Hologic, Inc., contained 47 craniocaudal cases (23 biopsy-proven malignant masses and 24 normals). Two projection images that were separated by 8 degrees were chosen to form a stereoscopic pair. The images were preprocessed to enhance their contrast and were presented on a stereoscopic display. Three experienced breast imagers participated in a blinded observer study as readers. Each case was shown twice to each reader - once in the stereoscopic mode, and once in the monoscopic mode in a random order. The readers were asked to make a binary decision on whether they saw a mass for which they would initiate a diagnostic workup or not, and also report the location of the mass and provide a confidence score in the range of 0-100. The binary decisions were analyzed using the sensitivity-specificity measure, while the confidence scores were analyzed using the receiver operating characteristic (ROC) curve. We also report a statistical analysis of the difference in partial AUC values greater than 95% sensitivity between the stereoscopic and monoscopic modes.

The effect of fixed eye adaptation when using displays with a high luminance range

Patrik Sund, Lars Gunnar Månsson, Magnus Båth

Calibration of medical review displays according to the DICOM Part 14 Grayscale Standard Display Function (GSDF) is important in order to obtain consistency in displayed image quality, since display technology and viewing conditions may vary substantially. Unfortunately, GSDF calibration is best suited for low luminance range conditions and is not optimal when using modern displays with a high luminance range. Low contrast objects will then obtain a greater visibility in mid-gray areas compared to similar objects in bright or dark regions. In this study, low contrast sinusoidal patterns were displayed on a high luminance range monitor under realistic viewing conditions. In order to simulate the viewing of an x-ray image with both dark and bright regions displayed simultaneously, the luminance of the patterns ranged from 2 to 600 cd/m² while the observers were always adapted to the logarithmic average of 35 cd/m². The results show a clear relationship between the pattern's deviation from the adaptation luminance level and the contrast required to detect the pattern. The results also indicate the potential for an improvement in low-contrast detectability over a large luminance range by adjusting the GSDF for the limited eye adaptation.

Perceptual enhancement of arteriovenous malformation in MRI angiography displays

Kamyar Abhari, John S. H. Baxter, Roy Eagleson, et al.

Presenting medical images in an intuitive and usable manner during a procedure is essential. However, most medical visualization interfaces, particularly those designed for minimally-invasive surgery, suffer from a number of issues as a consequence of disregarding the limitations of the human perceptual, cognitive, and motor systems. This matter is even more prominent when the human visual system is overlooked during the design cycle. One example is the visualization of the neuro-vascular structures in MR angiography (MRA) images. This study investigates perceptual performance in the usability of a display to visualize blood vessels in MRA volumes using a contour enhancement technique. Our results show that when contours are enhanced, our participants, in general, can perform faster and with a higher level of accuracy when judging the connectivity of different vessels. One clinical outcome of such perceptual enhancement is improvement of the spatial reasoning needed for planning complex neuro-vascular operations such as treating Arteriovenous Malformations (AVMs). The success of an AVM intervention greatly depends on fully understanding the anatomy of vascular structures. However, poor visualization of pre-operative MRA images makes the planning of such a treatment quite challenging.

Radiologists' eye gaze when reading cranial CT images

Antje Venjakob, Tim Marnitz, Jan Mahler, et al.

Gaze tracking is a common method to assess perceptual processes when reading medical images. However, little attention has yet been paid to multi-slice images. The present study examines the gaze data of four experienced radiologists reading 15 cranial computed tomography (CCT) scans, five of which contain lesions. The participants navigated freely through the slices while their eye position was tracked. Participants' visual search performance was examined in terms of: time per case, scrolling pattern including the number of runs through each case and number of oscillations within each case, fixation duration, time to first fixate a lesion and the initial dwell time on a lesion. The results of the study indicate that performance and reading strategy differ between radiologists. The greatest behavioral differences occurred between the two readers who performed best. One of them, participant 4, showed extremely short periods of inspection, few oscillations between the slices, short initial dwells on lesions and short time to first fixation, whereas participant 2 performed equally well, but took longer to read individual cases, went through the slices with many more oscillations, and showed longer time to first fixation and initial dwell times on lesions. The behavior displayed by participant 4 is consistent with expert behavior reading 2-dimensional images. In contrast, participant 2's behavior resembles that of a novice, namely because of the systematic search pattern employed. The results hint that expertise may be characterized by various and diverse strategies.

iPads and LCDs show similar performance in the detection of pulmonary nodules

Mark F. McEntee, Joanna Lowe, Marie Louise Butler, et al.

In February 2011 the University of Chicago Medical School distributed iPads to its trainee doctors for use when reviewing clinical information and images on the ward or clinics. The use of tablet computing devices is becoming widespread in medicine with Apple™ heralding them as "revolutionary" in medicine. The question arises: just because it is technically achievable to use iPads for clinical evaluation of images, should we do so? The current work assesses the diagnostic efficacy of iPads when compared with LCD secondary display monitors for identifying lung nodules on chest x-rays. Eight examining radiologists of the American Board of Radiology were involved in the assessment, reading chest images on both the iPad and an off-the-shelf LCD monitor. Thirty chest images were shown to each observer, of which 15 had one or more lung nodules. Radiologists were asked to locate the nodules and score how confident they were with their decision on a scale of 1-5. An ROC and JAFROC analysis was performed and modalities were compared using DBM MRMC. The results demonstrate no significant differences in performance between the iPad and the LCD for the ROC AUC (p<0.075) or JAFROC FOM (p<0.059) for random readers and random cases. Sample size estimation showed that this result is significant at a power of 0.8 and an effect size of 0.05 for ROC and 0.07 for JAFROC. This work demonstrates that for the task of identifying pulmonary nodules, the use of the iPad does not significantly change performance compared to an off-the-shelf LCD.

ROC Analysis

Quantitative evaluation of the memory bias effect in ROC studies with PET/CT

Maria Kallergi, Nicoletta Pianou, Alexandros Georgakopoulos, et al.

PURPOSE. The purpose of the study was to evaluate the memory bias effect in ROC experiments with tomographic data and, specifically, in the evaluation of two different PET/CT protocols for the detection and diagnosis of recurrent thyroid cancer. MATERIALS AND METHODS. Two readers participated in an ROC experiment that evaluated tomographic images from 43 patients followed up for thyroid cancer recurrence. Readers first evaluated whole body PET/CT scans of the patients and then a combination of whole body and high-resolution head and neck scans of the same patients. The second set was read twice: once within 48 hours of the first set and a second time at least a month later. The detection and diagnostic performances of the readers in the three reading sessions were assessed with the DBMMRMC and LABMRMC software using the area under the ROC curve as a performance index. Performances were also evaluated by comparing the number and the size of the detected abnormal foci among the three readings. RESULTS. There was no performance difference between first and second treatments. There were statistically significant differences between first and third, and second and third treatments, showing that memory can seriously affect the outcome of ROC studies. CONCLUSION. Despite the fact that tomographic data involve numerous image slices per patient, the memory bias effect is present and substantial and should be carefully eliminated from analogous ROC experiments.

A new parametrization for the three-class ideal observer's decision rule

Darrin C. Edwards

Despite theoretical and practical difficulties, we are attempting to extend receiver operating characteristic (ROC) analysis to tasks with more than two classes. Previously we developed explicit analytical expressions for the behavior of the ideal observer acting on univariate trinormal data, and for the region of support of the ideal observer's decision variables when acting on bivariate trinormal data. Although explicit calculation of the ideal observer's behavior for general underlying data is difficult, we have developed a new set of parameters for describing the ideal observer's decision rule which may aid in analytic or numeric computation of the ideal observer's behavior.

A nonparametric approach to comparing the areas under correlated LROC curves

Adam Wunderlich, Frédéric Noo

In contrast to the ROC assessment paradigm, localization ROC (LROC) analysis provides a means to jointly assess the accuracy of visual search and detection in an observer study. In a typical multireader, multicase (MRMC) evaluation, the data sets are paired so that correlations arise in observer performance both between observers and between image reconstruction methods (or modalities). Therefore, MRMC evaluations motivate the need for a statistical methodology to compare correlated LROC curves. In this work, we suggest a nonparametric strategy for this purpose. Specifically, we find that seminal work of Sen on U-statistics can be applied to estimate the covariance matrix for a vector of LROC area estimates. The resulting covariance estimator is the LROC analog of the covariance estimator given by DeLong et al. for ROC analysis. Once the covariance matrix is estimated, it can be used to construct confidence intervals and/or confidence regions for purposes of comparing observer performance across reconstruction methods. The utility of our covariance estimator is illustrated with a human-observer LROC evaluation of three reconstruction strategies for fan-beam CT.
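
For readers unfamiliar with the quantity being compared, here is a minimal sketch of a nonparametric (Mann-Whitney style) estimate of the area under an LROC curve. It covers only the point estimate, not the covariance estimator the abstract describes. The function name, argument layout, and the choice to score rating ties as 0.5 are assumptions for illustration and may differ from the authors' definitions.

```python
def lroc_area(absent_ratings, present_ratings, correctly_localized):
    """Fraction of (signal-absent, signal-present) image pairs in which the
    signal-present image is rated higher AND its signal was localized
    correctly. Ties in rating count as 0.5 (an assumption of this sketch)."""
    total = 0.0
    for rating, located in zip(present_ratings, correctly_localized):
        if not located:
            continue  # an incorrect localization contributes nothing
        for absent_rating in absent_ratings:
            if rating > absent_rating:
                total += 1.0
            elif rating == absent_rating:
                total += 0.5
    return total / (len(absent_ratings) * len(present_ratings))

# Hypothetical ratings: the second signal-present image was mislocalized.
area = lroc_area([1, 2], [3, 4], [True, False])
```

Unlike the ROC area, this estimate can fall below 0.5 even for a useful observer, because every localization error is scored as a miss.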

Image recognition and consistency of response

Tamara Miner Haygood M.D., John Ryan, Qing Mary Ashley Liu, et al.

Purpose: To investigate the connection between conscious recognition of an image previously encountered in an experimental setting and consistency of response to the experimental question.
Materials and Methods: Twenty-four radiologists viewed 40 frontal chest radiographs and gave their opinion as to the position of a central venous catheter. One-to-three days later they again viewed 40 frontal chest radiographs and again gave their opinion as to the position of the central venous catheter. Half of the radiographs in the second set were repeated images from the first set and half were new. The radiologists were asked of each image whether it had been included in the first set. For this study, we are evaluating only the 20 repeated images. We used the Kruskal-Wallis test and Fisher's exact test to determine the relationship between conscious recognition of a previously interpreted image and consistency in interpretation of the image.
Results: There was no significant correlation between recognition of the image and consistency in response regarding the position of the central venous catheter. In fact, there was a trend in the opposite direction, with radiologists being slightly more likely to give a consistent response with respect to images they did not recognize than with respect to those they did recognize.
Conclusion: Radiologists' recognition of previously-encountered images in an observer-performance study does not noticeably color their interpretation on the second encounter.

Inverse dependence of search and classification performances in lesion localization tasks

D. P. Chakraborty, Hong-Jun Yoon, Claudia Mello-Thoms

Search involves detecting the locations of potential lesions. Classification involves determining if a detected region is a true lesion. The most commonly used measure of observer performance, namely the area A under the ROC curve, is affected by both search and classification performances. The aim was to demonstrate a method for separating these contributions and apply it to several clinical datasets. Search performance S was defined as the square root of 2 times the perpendicular distance of the end-point of the search-model predicted ROC from the chance diagonal. Classification performance C was defined as the separation of the unit-variance binormal distributions for signal and noise sites. Eleven (11) datasets were fitted by the search model and search, classification and trapezoidal A were computed for each modality and reader combination. Kendall-tau correlations were calculated between the resulting S, C and A pairs. Kendall correlation (S vs. C) was smaller than zero for all datasets, and the average Kendall correlation was significantly smaller than 0 (average = -0.401, P = 8.3 x 10^-6). Also, Kendall correlation (A vs. S) was larger than zero for 9 out of 11 datasets and the average Kendall correlation was significantly larger than 0 (average = 0.295, P = 2.9 x 10^-3). On the other hand average Kendall correlation (A vs. C) was not significantly different from zero (average = 0.102, P = 0.25). The results suggest that radiologists may learn to compensate for poor search performance with better classification performance. This study also indicates that efforts at improving net performance, which currently focus almost exclusively on improving classification performance, may be more successful if aimed at improving search performance.
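
The two computational ingredients mentioned in this abstract can be sketched briefly. The search measure is stated as the square root of 2 times the perpendicular distance of the ROC end-point from the chance diagonal; for an end-point (FPF, TPF) that distance is (TPF - FPF)/sqrt(2), so S reduces to TPF - FPF. The code below is an illustrative sketch, not the authors' implementation: the function names are hypothetical, and the tau-a variant of Kendall's correlation (no tie correction) is an assumption, since the abstract does not say which variant was used.

```python
from math import sqrt

def search_performance(fpf_end, tpf_end):
    """S = sqrt(2) * perpendicular distance of the ROC end-point
    (fpf_end, tpf_end) from the chance diagonal TPF = FPF."""
    distance = (tpf_end - fpf_end) / sqrt(2.0)
    return sqrt(2.0) * distance

def kendall_tau_a(xs, ys):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    n = len(xs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical S and C values for four reader-modality combinations.
s_values = [0.30, 0.45, 0.50, 0.62]
c_values = [2.1, 1.8, 1.9, 1.2]
tau_s_vs_c = kendall_tau_a(s_values, c_values)
```

A negative tau for S versus C, as in this toy data, is the pattern the abstract reports across its datasets.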

Image Perception

Outlining and categorising mammographic breast density: expert radiologist perception

Yanpeng Li, Patrick C. Brennan, Warwick Lee, et al.

The main aim of this study is to investigate the outlining and categorising of mammographic breast density by expert radiologists in order to help to understand what kind of region radiologists perceive as breast density and how they assess the density of a mammogram. It investigates inter-radiologist variability in breast density outlining and assessment. Forty-five normal cranio-caudal view mammograms with a range of appearances of breast density were presented to twenty radiologists. Each participant was asked to manually outline any mammographic breast density using an interactive pen tablet and to visually classify mammographic breast density in two ways: by using the BI-RADS density categorization system of the American College of Radiology, and by estimating the percentage of area of mammographic breast density. Large differences were found in breast density outlining for all BI-RADS density categories. Scattered and patchy breast density appeared to be associated with large variation in outlining. There was moderate inter-radiologist agreement in BI-RADS density categorising (Kappa = 0.489). Breast density is a complex radiological feature that impacts upon assessment consistency.
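
The abstract does not state which kappa statistic produced the value 0.489. As one plausible illustration for a study with many raters, the following sketch computes Fleiss' kappa from a table of category counts per case. The function name and the toy table are hypothetical.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa. counts[i][j] is the number of raters who assigned
    case i to category j; every case must have the same number of raters."""
    n_cases = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    # Agreement within each case: proportion of agreeing rater pairs.
    per_case = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    observed = sum(per_case) / n_cases
    # Chance agreement from the overall share of each category.
    shares = [
        sum(row[j] for row in counts) / (n_cases * n_raters)
        for j in range(n_categories)
    ]
    expected = sum(p * p for p in shares)
    return (observed - expected) / (1.0 - expected)

# Toy table: 2 cases, 3 raters, 2 categories, with split votes on both cases.
kappa = fleiss_kappa([[2, 1], [1, 2]])
```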

Measurements of the detectability of hepatic hypovascular metastases as a function of retinal eccentricity in CT images

Ivan Diaz, Miguel P. Eckstein, Anaïs Luyet, et al.

The large number of slices in volumetric data sets and limited time prevent human observers from exhaustively pointing their high-resolution fovea at all regions in the images. Thus, many image regions are processed with non-foveal peripheral vision. Yet most studies quantifying human detectability of signals in computer-simulated textures and medical image backgrounds have measured performance without consideration of the location of the signal in the observer's eye relative to the fovea (retinal eccentricity). Here, we measure human observer detectability of signals in CT images as a function of retinal eccentricity. A representative signal was extracted from a liver image and was added to healthy liver backgrounds at random positions. The retinal eccentricities of the signal were manipulated by varying the position of the point at which observers fixated with their eyes. Real-time video-based eye tracking was used to ensure steady fixation. High contrast fiduciary marks indicated the only possible location of the signal, which was present in 50% of the images. Single CT slices were presented for 200 ms or 1 second. The observer was instructed to decide whether the image contained a signal (yes/no task). We probed 6 eccentricities with 420 decision trials per eccentricity. We found a large detectability degradation with retinal eccentricity, with d' degrading by 50% at an eccentricity of 9 degrees for a 200 ms display time.
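
For a yes/no task such as the one described here, d' is commonly computed as the difference of the z-transformed hit and false-alarm rates under an equal-variance Gaussian model. The sketch below shows that calculation. The function name, the log-linear correction, and the trial counts are assumptions for illustration and are not taken from the study.

```python
from statistics import NormalDist

def d_prime_yes_no(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate), equal-variance Gaussian model."""
    # Adding 0.5 to each cell (log-linear correction) keeps the z-scores
    # finite when an observed rate is exactly 0 or 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

# Hypothetical counts for one eccentricity: 420 trials, half signal-present.
d_central = d_prime_yes_no(189, 21, 21, 189)
d_peripheral = d_prime_yes_no(150, 60, 60, 150)
```

Comparing such values across fixation positions gives the kind of detectability-versus-eccentricity curve the abstract summarizes.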

Signal-known exactly detection performance in tomosynthesis: does volume visualization help human observers?

I. Reiser, R. M. Nishikawa

Tomosynthesis produces three-dimensional images of an object, with non-isotropic resolution. Tomosynthesis images are typically read by human observers in a stack viewing mode, displaying planes through the tomosynthesis volume. The purpose of this study was to investigate whether human performance in a signal-known exactly (SKE) detection task improves when the entire tomosynthesis volume is available to the observer, compared to displaying a single plane through the signal center. The goal of this study was to improve understanding of human performance in order to aid development of observer models for tomosynthesis. Human performance was measured using sequential 2-alternative forced choice experiments. In each trial, the observer was first asked to select the signal-present ROI based on a single 2D tomosynthesis plane. Then, scrolling was enabled and the observer was able to select the signal-present ROI based on knowledge of the entire volume. The number of correct decisions for 2D and 3D viewing was recorded, as was the number of trials for which a score increase or decrease occurred between 2D and 3D readings. Test images consisted of tomosynthesis reconstructions of simulated breast tissue, where breast tissue was modeled as binarized power-law noise. Tomosynthesis reconstructions of designer nodules of r = 250 μm, r = 1 mm, and r = 4 mm were added to the structured backgrounds. For each signal size, observers scored 256 trials with signal amplitude set so that the proportion of correct answers in the single slice was 90%. For two observers, a slight increase in performance was found when adjacent tomosynthesis slices were displayed, for the two larger signals. Statistical significance could not be established. The number of decision changes was analyzed for each observer. For these two observers, the number of decision changes that led to a score increase or decrease was outside the 95% confidence interval of the decision change being random, indicating that for these two observers, displaying the tomosynthesis stack did boost performance. For the other two observers, decision changes that increased or decreased the score were within the 95% confidence interval of guessing, indicating that the decision changes were due to a satisfaction of search effect. However, the results also indicate that the performance increase is small and the majority of information appears to be contained in the tomosynthesis slice that corresponds to the center of the lesions.
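
The decision-change analysis can be pictured as a test of whether score increases and decreases are equally likely. The following sketch uses an exact two-sided binomial test with p = 0.5. This is one reasonable way to frame the comparison, not necessarily the authors' procedure, and the function name and counts are hypothetical.

```python
from math import comb

def two_sided_binomial_p(increases, decreases):
    """Exact two-sided p-value for the split between score increases and
    decreases, under the null that each change is a fair coin flip."""
    n = increases + decreases
    k = max(increases, decreases)
    # Upper tail P(X >= k) for X ~ Binomial(n, 0.5), doubled and capped at 1.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

# Hypothetical observer: 10 decision changes, all of which raised the score.
p_value = two_sided_binomial_p(10, 0)
```

A p-value below 0.05 corresponds to a split lying outside the 95% interval expected from random changes.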

Satisfaction of search errors detecting subtle fractures diminish in the presence of more serious injuries

Kevin S. Berbaum, Kevin M. Schartz, Robert T. Caldwell, et al.

Satisfaction of search (SOS) occurs when an abnormality is missed because another abnormality has been detected in radiology examinations. This research includes our study of whether the severity of a detected fracture determines whether subsequent fractures are overlooked. Each of 70 simulated multitrauma patients presented radiographs of three anatomic areas. Readers evaluated each patient under two experimental conditions: when the images of the first anatomic area included a severe fracture (the SOS condition), and when it did not (the control condition). The SOS effect was measured on detection accuracy for subtle test fractures presented on examinations of the second or third anatomic areas. SOS reduction in ROC area for detecting subtle test fractures with the addition of a major fracture to the first radiograph was not observed. The same absence of SOS that had been observed when high-morbidity added fractures were presented on CT was replicated with the high-morbidity added fractures presented on radiographs. This finding rules out the possibility that there was no SOS in the prior study with CT because SOS effects do not extend from one imaging modality to another. Taken together, the evidence rejects the hypothesis that the severity of a detected fracture determines the SOS for subsequently viewed fractures.

Predictive modeling of human perception subjectivity: feasibility study of mammographic lesion similarity

Songhua Xu, Kathleen Hudson M.D., Yong Bradley M.D., et al.

The majority of clinical content-based image retrieval (CBIR) studies disregard human perception subjectivity, aiming to duplicate the consensus expert assessment of the visual similarity on example cases. The purpose of our study is twofold: (i) discern better the extent of human perception subjectivity when assessing the visual similarity of two images with similar semantic content, and (ii) explore the feasibility of personalized predictive modeling of visual similarity. We conducted a human observer study in which five observers of various expertise were shown ninety-nine triplets of mammographic masses with similar BI-RADS descriptors and were asked to select the two masses with the highest visual relevance. Pairwise agreement ranged between poor and fair among the five observers, as assessed by the kappa statistic. The observers' self-consistency rate was remarkably low, based on repeated questions where either the orientation or the presentation order of a mass was changed. Various machine learning algorithms were explored to determine whether they can predict each observer's personalized selection using textural features. Many algorithms performed with accuracy that exceeded each observer's self-consistency rate, as determined using a cross-validation scheme. This accuracy was statistically significantly higher than would be expected by chance alone (two-tailed p-value ranged between 0.001 and 0.01 for all five personalized models). The study confirmed that human perception subjectivity should be taken into account when developing CBIR-based medical applications.

Digital Pathology II: Joint Session with Conferences 8314 and 8315

Analysis of slide exploration strategy of cytologists when reading digital slides

Liron Pantanowitz M.D., Anil Parwani M.D., Eugene Tseytlin, et al.

Cytology is the sub-domain of Pathology that deals mainly with the diagnosis of cellular changes caused by disease. Current clinical practice involves a cytotechnologist who manually screens glass slides containing fixed cytology material using a light microscope. Screened slides are then forwarded to a specialized pathologist, a cytopathologist, for microscopic review and final diagnostic interpretation. If no abnormalities are detected, the specimen is interpreted as "normal"; otherwise the abnormalities are marked with a pen on the glass slide by the cytotechnologist and are then used to render a diagnosis. As Pathology is migrating towards a digital environment, it is important to determine whether these crucial screening and diagnostic tasks can be performed as well using digital slides as with the current practice of glass slides. The purpose of this work is to make this assessment by using a set of digital slides depicting cytological materials of different disease processes in several organs, and then to analyze how different cytologists, including cytotechnologists, cytopathologists and cytotechnology trainees, explore the digital slides. We will (1) collect visual search data from the cytologists as they navigate the digital slides, as well as record any electronic marks (annotations) made by the cytologists; (2) convert the dynamic visual search data into a static representation of the observers' exploration strategy using 'search maps'; and (3) determine slide coverage, per viewing magnification range, for each group. We have developed a virtual microscope to collect this data, and this interface allows for interactive navigation of the virtual slide (including panning and zooming), as well as annotation of reportable findings. Furthermore, all interactions with the interface are time stamped, which allows us to recreate the cytologists' search strategy.

Influence of LCD color reproduction accuracy on observer performance using virtual pathology slides

Elizabeth A. Krupinski, Louis D. Silverstein, Syed F. Hashmi, et al.

The use of color LCDs in medical imaging is growing as more clinical specialties use digital images as a resource in diagnosis and treatment decisions. Telemedicine applications such as telepathology, teledermatology and teleophthalmology rely heavily on color images. However, standard methods for calibrating, characterizing and profiling color displays do not exist, resulting in inconsistent presentation. To address this, we developed a calibration, characterization and profiling protocol for color-critical medical imaging applications. Physical characterization of displays calibrated with and without the protocol revealed high color reproduction accuracy with the protocol. The present study assessed the impact of this protocol on observer performance. A set of 250 breast biopsy virtual slide regions of interest (half malignant, half benign) were shown to 6 pathologists, once using the calibration protocol and once using the same display in its "native" off-the-shelf uncalibrated state. Diagnostic accuracy and time to render a decision were measured. In terms of ROC performance, Az (area under the curve) calibrated = 0.8640; uncalibrated = 0.8558. No statistically significant difference (p = 0.2719) was observed. In terms of interpretation speed, mean calibrated = 4.895 sec, mean uncalibrated = 6.304 sec which is statistically significant (p = 0.0460). Early results suggest a slight advantage diagnostically for a properly calibrated and color-managed display and a significant potential advantage in terms of improved workflow. Future work should be conducted using different types of color images that may be more dependent on accurate color rendering and a wider range of LCDs with varying characteristics.

Compressing virtual pathology slides: human and model observer evaluation

Elizabeth A. Krupinski, Jeffrey P. Johnson, Stacey Jaw, et al.

We aim to improve telepathology images for diagnoses using compression based on information about the human visual system. The underlying goal is to demonstrate the utility of a visual discrimination model (VDM) for predicting observer performance. 100 ROIs from breast biopsy virtual slides at 5 levels of compression (uncompressed, 8:1, 16:1, 32:1, 64:1, 128:1) were shown to 6 pathologists to determine benign vs malignant. There was a decrease in performance as a function of compression (F = 14.58, p < 0.0001). The visibility of compression artifacts in the test images was predicted using a VDM. JND metrics were computed for each image including mean, median, ≥90th percentiles, and maximum. For comparison, PSNR and SSIM were also computed. Image distortion metrics were computed as a function of compression ratio and averaged across test images. All of the JND metrics were found to be highly correlated and differed primarily in magnitude. Both PSNR and SSIM decreased with bit rate, correctly reflecting a loss of image fidelity with increasing compression. The correlation of observer performance in the ROC experiment with image distortion metrics is shown in Figures 3 and 4. Observer performance (Az) was nearly constant up to a compression ratio of 32:1, then decreased significantly for 64:1 and 128:1 compression. The initial decline in Az occurred around a mean JND of 3, Minkowski JND of 4, and 99th percentile JND of 6.5. Virtual pathology may be compressible to relatively high levels before impacting diagnostic accuracy, and the VDM accurately predicts human performance.
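
For readers unfamiliar with the PSNR metric used for comparison above, the following is a minimal sketch of the standard definition, 10·log10(peak² / MSE). It is not the authors' implementation: the flat pixel lists, the function name and the default peak value of 255 (8-bit data) are assumptions made for illustration.

```python
import math


def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Assumption: images are given as flat sequences of pixel values and
    `peak` is the maximum possible pixel value (255 for 8-bit data).
    """
    if len(reference) != len(distorted) or not reference:
        raise ValueError("images must be non-empty and of equal size")
    # Mean squared error between corresponding pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: no distortion
    return 10.0 * math.log10(peak ** 2 / mse)
```

Lower PSNR corresponds to larger pixel-wise error, which is why the metric falls as the compression ratio rises.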

Model Observers

Creation of an ensemble of simulated cardiac cases and a human observer study: tools for the development of numerical observers for SPECT myocardial perfusion imaging

J. Michael O'Connor, P. Hendrik Pretorius, Howard C. Gifford, et al.

Our previous Single Photon Emission Computed Tomography (SPECT) myocardial perfusion imaging (MPI) research explored the utility of numerical observers. We recently created two hundred and eighty simulated SPECT cardiac cases using Dynamic MCAT (DMCAT) and SIMIND Monte Carlo tools. All simulated cases were then processed with two reconstruction methods: iterative ordered subset expectation maximization (OSEM) and filtered back-projection (FBP). Observer study sets were assembled for both OSEM and FBP methods. Five physicians performed an observer study on one hundred and seventy-nine images from the simulated cases. The observer task was to indicate detection of any myocardial perfusion defect using the American Society of Nuclear Cardiology (ASNC) 17-segment cardiac model and the ASNC five-scale rating guidelines. Human observer Receiver Operating Characteristic (ROC) studies established the guidelines for the subsequent evaluation of numerical model observer (NO) performance. Several NOs were formulated and their performance was compared with the human observer performance. One type of NO was based on evaluation of a cardiac polar map that had been pre-processed using a gradient-magnitude watershed segmentation algorithm. The second type of NO was also based on analysis of a cardiac polar map but with use of an a priori calculated average image derived from an ensemble of normal cases.

Volumetric detection tasks with varying complexity: human observer performance

Ljiljana Platiša, Asli Kumcu, Milan Platiša, et al.

This study explores detection performance trends of human observers with respect to two parameters: task complexity, determined by the frequency content of background and signal, and image viewing mode: single-slice (ss) versus multi-slice (ms) stack-browsing image presentation. The images are 3D correlated Gaussian noise with a 3D Gaussian signal centered in the image volume. To vary task complexity, we consider three different noise kernels while keeping the signal spread constant across all images. In ss mode, only the central slice of the volume is presented to the observer, while in ms mode all slices are available. All human readings are conducted in a controlled viewing environment on a 5MP digital mammography medical display. Overall, in line with the literature, we find that human performance increases in ms relative to ss image presentation mode. Furthermore, our experiments indicate that the extent of difference between ms and ss performance is influenced by the properties of image data (level of task complexity): the difference in performance increases (from ΔAUC = 0.14 to ΔAUC = 0.30) as the difference in the frequency content of the signal and the background increases. In other words, the benefit of having additional slices available in ms mode is larger for lower-complexity tasks. Future studies shall focus on comparing the results of the present study to the existing model observers for volumetric images, ultimately aiming to design an anthropomorphic model observer for volumetric detection tasks.

Performance characteristics of a visual-search human-model observer with sparse PET image data

Howard C. Gifford

As predictors of human performance in detection-localization tasks, statistical model observers can have problems with tasks that are primarily limited by target contrast or structural noise. Model observers with a visual-search (VS) framework may provide a more reliable alternative. This framework provides for an initial holistic search that identifies suspicious locations for analysis by a statistical observer. A basic VS observer for emission tomography focuses on hot "blobs" in an image and uses a channelized nonprewhitening (CNPW) observer for analysis. In [1], we investigated this model for a contrast-limited task with SPECT images; herein, a statistical-noise-limited task involving PET images is considered. An LROC study used 2D image slices with liver, lung and soft-tissue tumors. Human and model observers read the images in coronal, sagittal and transverse display formats. The study thus measured the detectability of tumors in a given organ as a function of display format. The model observers were applied under several task variants that tested their response to structural noise both at the organ boundaries alone and over the organs as a whole. As measured by correlation with the human data, the VS observer outperformed the CNPW scanning observer.

Theoretical performance analysis of multislice channelized Hotelling observers

Bart Goossens, Ljiljana Platiša, Wilfried Philips

Quality assessment of 3D medical images is becoming increasingly important, because clinical practice is rapidly moving in the direction of volumetric imaging. In a recent publication, three multi-slice channelized Hotelling observer (msCHO) models are presented for the task of detecting 3D signals in multi-slice images, where each multi-slice image is inspected in a so-called stack-browsing mode. The observer models are based on the assumption that humans observe multi-slice images in a simple two-stage process, and each of the models implements this principle in a different way. In this paper, we investigate the theoretical performance, in terms of detection signal-to-noise ratio (SNR), of msCHO models for the task of detecting a separable signal in a Gaussian background with separable covariance matrix. We find that, despite the differences in architecture of the three models, they all have the same asymptotic performance in this task (i.e., when the number of training images tends to infinity). On the other hand, when backgrounds with nonseparable covariance matrices are considered, the third model, msCHO_c, is expected to perform slightly better than the other msCHO models (msCHO_a and msCHO_b), but only when sufficient training images are provided. These findings suggest that the choice between the msCHO models mainly depends on the experiment setup (e.g., the number of available training samples), while the relation to human observers depends on the particular choice of the "temporal" channels that the msCHO models use.

Utilizing the Hotelling template as a tool for CT image reconstruction algorithm design

Adrian A. Sanchez, Emil Y. Sidky, Xiaochuan Pan

Design of image reconstruction algorithms for CT can be significantly aided by useful metrics of image quality. Useful metrics, however, are difficult to develop due to the high dimensionality of the CT imaging system, lack of spatial invariance in the imaging system, and a high degree of correlation among the image voxels. Although true task-based evaluation on realistic imaging tasks can be time-consuming, and a given task may be insensitive to the image reconstruction algorithm, task-based metrics can still prove useful in many contexts. For example, model observers that mimic performance of the imaging system on specific tasks can provide a low-dimensional measure of image quality while still accounting for many of the salient properties of the system and object being scanned. In this work, ideal observer performance is computed on a single detection task. The modeled signal for detection is taken to be very small, with size on the order of a detector bin, and inspection of the accompanying Hotelling template is suggested. We hypothesize that improved detection on small signals may be sensitive to the reconstruction algorithm. Further, we hypothesize that structurally simple Hotelling templates may correlate with high human observer performance.

Observer Performance

Diagnostic accuracy of digital mammography versus tomosynthesis: effect of radiologists' experience

F. Zanca, M. Wallis, E. Moa, et al.

Purpose: To investigate whether readers' experience affects performance in a study comparing 2D digital mammography (2D) with 2-view (CC and MLO) or 1-view (MLO) tomosynthesis. Materials and Methods: One hundred thirty 2D cases were collected from screening assessment and referral clinics; 64 of the cases had verified abnormalities and the remaining were confirmed normal. Two-view tomosynthesis images were obtained from the same patients. Ten accredited readers (5 with ≥ 10 years of experience in mammography and 5 with < 10 years) classified the cases in terms of malignancy (rating 0-5) and recall (yes/no), for both modalities. A second experiment was performed with the same cases, with 10 other readers (again 5 experienced / 5 less experienced), but using 2D and 1-view tomosynthesis as the two modalities. The multi-reader-multi-case ROC method was applied and the significance of the difference in diagnostic accuracy between 2D and tomosynthesis was calculated, as a function of experience and for each experiment. Recall rate (RR) on malignant and benign cases was also calculated, along with reading time. Results: No significant difference was found between 2D and 2-view tomosynthesis for experienced readers (p-value = 0.25); for less experienced readers the p-value was significant (0.03). No significant difference was found between 2D and 1-view tomosynthesis, independent of readers' experience. RR for benign cases decreased for tomosynthesis (for both 1- and 2-view), independent of experience. Average reading time per case was 79 s (range 65-91 s) and 134 s (range 119-158 s) for experienced readers; 56 s (range 46-67 s) and 115 s (range 97-142 s) for non-experienced, for 2D and 2-view tomosynthesis respectively. Reading time was 74 s (range 43-98 s) and 99 s (range 73-117 s) for experienced readers; 74 s (range 62-85 s) and 94 s (range 82-137 s) for non-experienced, for 2D and 1-view tomosynthesis respectively.
Conclusions: For experienced readers, there is no evidence of improved diagnostic accuracy when using 2-view or 1-view tomosynthesis, while less experienced readers perform better with 2-view tomosynthesis than with 2D images. Tomosynthesis reduces the number of recalls of benign cases, without hindering cancer detection.

Is diagnostic accuracy for detecting pulmonary nodules in chest CT reduced after a long day of reading?

Elizabeth A. Krupinski, Kevin S. Berbaum, Robert Caldwell, et al.

Radiologists are reading more cases with more images, especially in CT and MRI, and are thus working longer hours than ever before. Concerns have been raised regarding fatigue and whether it impacts diagnostic accuracy. This study measured the impact of reader visual fatigue by assessing symptoms, visual strain via dark focus of accommodation, and diagnostic accuracy. Twenty radiologists and 20 radiology residents were given two diagnostic performance tests searching CT chest sequences for a solitary pulmonary nodule before (rested) and after (tired) a day of clinical reading. 10 cases used free search and navigation, and the other 100 cases used preset scrolling speed and duration. Subjects filled out the Swedish Occupational Fatigue Inventory (SOFI) and the oculomotor strain subscale of the Simulator Sickness Questionnaire (SSQ) before each session. Accuracy was measured using ROC techniques. Swensson's technique yields an ROC area = 0.86 rested vs. 0.83 tired, p (one-tailed) = 0.09. Swensson's LROC technique yields an area = 0.73 rested vs. 0.66 tired, p (one-tailed) = 0.09. Swensson's Loc Accuracy technique yields an area = 0.77 rested vs. 0.72 tired, p (one-tailed) = 0.13. Subjective measures of fatigue increased significantly from early to late reading. To date, the results support our findings with static images and detection of bone fractures. Radiologists at the end of a long work day experience greater levels of measurable visual fatigue or strain, contributing to a decrease in diagnostic accuracy. The decrease in accuracy was not as great, however, as with static images.

Indirect detection of pulmonary nodule on low-pass filtered and original x-ray images during limited and unlimited display times

Mariusz W. Pietrzyk, Mark McEntee, Michael G. Evanoff, et al.

Aim: This study evaluates the assumption that global impression is created based on low spatial frequency components of posterior-anterior chest radiographs. Background: Expert radiologists precisely and rapidly allocate visual attention to pulmonary nodules on chest radiographs. Moreover, the most frequent accurate decisions are produced in the shortest viewing time; thus, the first hundred milliseconds of image perception seem to be crucial for correct interpretation. The medical image perception model assumes that during holistic analysis experts extract information based on low spatial frequency (SF) components and create a mental map of suspicious locations for further inspection. The global impression results in flagged regions for detailed inspection with foveal vision. Method: Nine chest experts and nine non-chest radiologists viewed two sets of randomly ordered chest radiographs under 2 timing conditions: (1) 300 ms; (2) free search in unlimited time. The same radiographic cases of 25 normal and 25 abnormal digitized chest films constituted two image sets: low-pass filtered and unfiltered. Subjects were asked to detect nodules and rank confidence level. MRMC ROC DBM analyses were conducted. Results: Experts had improved ROC AUC when high SF components were displayed (p=0.03) or when low SF components were viewed under unlimited time (p=0.02) compared with low SF 300 ms viewings. In contrast, non-chest radiologists showed no significant changes when high SF components were displayed under flash conditions compared with free search, or when low SF components were viewed under unlimited time compared with flash. Conclusion: The current medical image perception model accurately predicted performance for non-chest radiologists; however, chest experts appear to benefit from high SF features during the global impression.

Are improved rater reliability results associated with faster reaction times after rater training for judgments of laryngeal mucus?

Heather Shaw Bonilha, Amy Dawson, Katlyn McGrattan

Mucus aggregation on the vocal folds, a common complaint amongst persons with voice disorders, has been visually rated on four parameters: type, pooling, thickness, and location. Rater training is used to improve the reliability and accuracy of these ratings. The goal of this study was to evaluate the effect of training on rater reliability, accuracy and response time. Two raters scored mucus aggregation from 120 stroboscopic exams after a brief introductory session and again after a thorough training session. Reliability and accuracy were calculated as percent agreement. Two-tailed paired t-tests were used to assess differences in reaction time for ratings before and after training. Inter-rater reliability improved from 79% pre-training to 92% post-training. Intra-rater reliability improved from 77% to 91% for Rater 1 and 80% to 88% for Rater 2 following training. Accuracy improved from 80% to 96% for Rater 1 and 76% to 95% for Rater 2 from pre- to post-training. Reaction time decreased for both raters (p=0.025). These findings further our understanding of observer performance on judgments of laryngeal mucus. These results suggest that rater training increases reliability and accuracy while decreasing reaction time. Future studies should assess the relationship of these judgments and voice changes.
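
The percent-agreement measure used for reliability and accuracy above can be sketched as follows. This is a minimal illustration rather than the authors' procedure; the function name and the example labels are hypothetical.

```python
def percent_agreement(ratings_a, ratings_b):
    """Percentage of items on which two rating lists give the same label.

    Inter-rater reliability compares two raters; intra-rater reliability
    compares one rater's repeated ratings; accuracy compares a rater with
    a reference standard. The labels used in the usage note are hypothetical.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100.0 * matches / len(ratings_a)
```

For example, `percent_agreement(["thin", "thick", "thin", "thin"], ["thin", "thick", "thick", "thin"])` compares four hypothetical thickness ratings that agree on three items.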

Assessment of change in breast density: reader performance using synthetic mammographic images

Sue Astley, Chitra Swayamprakasam, Michael Berks, et al.

A recent study has shown that breast cancer risk can be reduced by taking Tamoxifen, but only if this results in at least a 10% point reduction in mammographic density. When mammographic density is quantified visually, it is impossible to assess reader accuracy using clinical images as the ground truth is unknown. Our aim was to compare three models of assessing density change and to determine reader accuracy in identifying reductions of 10% points or more. We created 100 synthetic, mammogram-like images comprising 50 pairs designed to simulate natural reduction in density within each pair. Model I: individual images were presented to readers and density assessed. Model II: pairs of images were displayed together, with readers assessing density for each image. Model III: pairs of images were displayed together, and readers asked whether there was at least a 10% point reduction in density. Ten expert readers participated. Readers' estimates of percentage density were significantly closer to the truth (6.8%-26.4%) when images were assessed individually rather than in pairs (9.6%-29.8%). Measurement of change was significantly more accurate in Model II than Model I (p<0.005). Detecting density changes of at least 10% points in image pairs, mean accuracy was significantly (p<0.005) lower (58%-88%) when change was calculated from density assessments than in Model III (74%-92%). Our results suggest that where readers need to identify change in density, images should be displayed alongside one another. In our study, less accurate assessors performed better when asked directly about the magnitude of the change.

Performance differences across the Atlantic when UK and USA radiologists read the same set of test screening cases

Yan Chen, Alastair G. Gale, Michael Evanoff

Two groups of experienced radiologists from the UK and the USA read the same set of 40 recent FFDM screening cases to examine the effects of mammography experience, volume of cases read per year, screening practice and monitor resolution on performance. Sixteen American radiologists reported these cases using twin DICOM calibrated monitors which were half the resolution of the clinical mammographic workstations used by 16 UK radiologists. Regarding the volume of cases read per year, when the group of American radiologists was split into high and low volume readers (using 5,000 cases p.a. as a criterion), no difference in any performance measure was found. This may be partly explained by the fact that they were all very experienced, which may have counteracted any case volume effect here. Comparing the two groups of radiologists, the UK group performed better in terms of the number of cancers detected, although the American group recalled more cases, despite having poorer monitors. This reflects differences in clinical screening practice between the countries; however, differences simply due to the reporting monitors used cannot be ruled out. Data from the study were also compared to that from all UK screeners who had read these cases as either soft copy or as mammographic film.

Poster Session

Dose-optimized slice thickness for routine multislice computed tomography liver examinations

K. Dobeli, S. Lewis, S. Meikle, et al.

The need to optimize CT protocols with respect to radiation dose is widely recognized. This study uses phantom-based methodology to investigate the effect of changes in exposure and slice thickness on observer performance for the detection of low contrast opacities with multislice computed tomography to determine dose-optimized slice thickness and image noise for routine liver imaging. Methods: A phantom containing an opacity with diameter 9.5mm and density 10HU below background was scanned at various exposure and slice thickness settings (range 50-125mAs and 1-3mm). An image set consisting of 120 images containing background only and 60 images containing the opacity in random locations was created. Following Institutional Review Board approval, nine experienced observers viewed the images and scored opacity visualization using a four-point confidence scale. Noise, contrast-to-noise ratio (CNR), sensitivity, specificity and area under the curve (AUC) were calculated. Comparisons between exposure and slice thickness settings were performed using ROC, Spearman and Wilcoxon techniques. Results: Significant (p<0.05) reductions in AUC and sensitivity occurred when CNR dropped to 0.71 or below and 0.68 or below, respectively. There was strong correlation between noise and AUC (r = -0.79, p<0.01), noise and sensitivity (r = -0.92, p<0.001), CNR and AUC (r = -0.90, p<0.001) and CNR and sensitivity (r = 0.61, p<0.05). Conclusion: Observer performance for the detection of opacities is strongly related to quantum noise and CNR. Dose-optimized lesion detection was achieved with 5mm slice thickness and CNR of 0.72 and noise of 9.05.
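
The abstract does not state how CNR was computed. As a rough illustration only, the sketch below uses one common definition (absolute difference of region means divided by the background standard deviation); that definition, the function name and the pixel-list inputs are all assumptions.

```python
from statistics import mean, stdev


def contrast_to_noise_ratio(lesion_pixels, background_pixels):
    """CNR = |mean(lesion) - mean(background)| / SD(background).

    Assumption: this is one common CNR definition, not necessarily the
    one used in the study; inputs are pixel values (e.g. in HU) sampled
    from a lesion region and a background region.
    """
    noise = stdev(background_pixels)  # sample standard deviation
    if noise == 0:
        raise ValueError("background has zero variance; CNR is undefined")
    return abs(mean(lesion_pixels) - mean(background_pixels)) / noise
```

Under this definition, a fixed 10 HU contrast yields a lower CNR as image noise rises, which is the trade-off explored when exposure or slice thickness is reduced.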

Collaborative labeling of malignant glioma with WebMILL: a first look

Eesha Singh, Andrew J. Asman, Zhoubing Xu, et al.

Malignant gliomas are the most common form of primary neoplasm in the central nervous system, and one of the most rapidly fatal of all human malignancies. They are treated by maximal surgical resection followed by radiation and chemotherapy. Herein, we seek to improve the methods available to quantify the extent of tumors using newly presented, collaborative labeling techniques on magnetic resonance imaging. Traditionally, labeling medical images has entailed that expert raters operate on one image at a time, which is resource intensive and not practical for very large datasets. Using many, minimally trained raters to label images has the possibility of minimizing laboratory requirements and allowing high degrees of parallelism. A successful effort also has the possibility of reducing overall cost. This potentially transformative technology presents a new set of problems, because one must pose the labeling challenge in a manner accessible to people with little or no background in labeling medical images and raters cannot be expected to read detailed instructions. Hence, a different training method has to be employed. The training must appeal to all types of learners and have the same concepts presented in multiple ways to ensure that all the subjects understand the basics of labeling. Our overall objective is to demonstrate the feasibility of studying malignant glioma morphometry through statistical analysis of the collaborative efforts of many, minimally-trained raters. This study presents preliminary results on optimization of the WebMILL framework for neoplasm labeling and investigates the initial contributions of 78 raters labeling 98 whole-brain datasets.

Subjective evaluation of user experience in interactive 3D visualization in a medical context

Sylvain Tourancheau, Mårten Sjöström, Roger Olsson, et al.

New display technologies enable the use of 3D visualization in a medical context. Even though user performance seems to be enhanced with respect to 2D thanks to the addition of recreated depth cues, human factors, and more particularly visual comfort and visual fatigue, can still be an obstacle to the widespread use of these systems. This study aimed at evaluating and comparing two different 3D visualization systems (a market stereoscopic display and a state-of-the-art multi-view display) in terms of quality of experience (QoE), in the context of interactive medical visualization. An adapted methodology was designed in order to subjectively evaluate the experience of users. 14 medical doctors and 15 medical students took part in the experiment. After solving different tasks using the 3D reconstruction of a phantom object, they were asked to judge the quality of their experience according to specific features. They were also asked to give their opinion about the influence of 3D systems on their work conditions. Results suggest that medical doctors are open to 3D visualization techniques and are confident concerning their beneficial influence on their work. However, visual comfort and visual fatigue are still an issue with 3D displays. Results obtained with the multi-view display suggest that the use of continuous horizontal parallax might be the future response to these current limitations.

Implementation of combined SVM-algorithm and computer-aided perception feedback for pulmonary nodule detection

Mariusz W. Pietrzyk, Didier Rannou, Patrick C. Brennan

This pilot study examines the effect of a novel decision support system in medical image interpretation. This system is based on combining image spatial frequency properties and eye-tracking data in order to recognize over- and under-calling errors. Before it can be implemented as a detection aid, training is required, during which an SVM-based algorithm learns to recognize false positives (FP) among all reported outcomes and false negatives (FN) among all unreported regions with prolonged dwell. Eight radiologists inspected 50 PA chest radiographs with the specific task of identifying lung nodules. Twenty-five cases contained CT-proven subtle malignant lesions (5-20mm), but prevalence was not known by the subjects, who took part in two sequential reading sessions, the second without and with support system feedback. MRMC ROC DBM and JAFROC analyses were conducted and demonstrated significantly higher scores following feedback, with p values of 0.04 and 0.03 respectively, highlighting significant improvements in radiology performance once feedback was used. This positive effect on radiologists' performance might have important implications for future CAD-system development.

Effect of morphing between unenhanced and multiscale enhanced chest radiographs on pulmonary nodule detection

Mariusz W. Pietrzyk, Fabian Zöhrer, Markus T. Harz, et al.

Aim: This study aims to determine the effectiveness of a novel image-processing algorithm for multi-scale enhancement of chest radiographs to improve detection and localization of real pulmonary nodules. Background: Our wavelet-based enhancement method interactively adjusts the contrast of medical images by extracting the spatial frequency components at different scales, followed by a weighting procedure. This study aims to explore the usefulness of this novel procedure for chest image reporting. Method: Sixteen radiologists viewed 50 PA chest radiographs in order to localize pulmonary nodules. The databank contains 25 normal and 25 abnormal images, with multi-nodule cases. Subjects were allowed to mark an unlimited number of locations and then ranked their confidence of nodule presence on a 5-level scale. Subjects viewed all cases in at least two out of three conditions: unprocessed, enhanced, and with morphing between these two. MRMC ROC and JAFROC analyses were conducted. Results: No significant differences were found in ROC AUC values across modalities and specialities. Only localization performance with the morphing tool was significantly higher (F(1,8)=13.303, p=0.007) for chest experts (JAFROC FOM=0.6355) than for non-chest radiologists (JAFROC FOM=0.4675). Conclusion: Radiologists specialized in chest image interpretation performed consistently well in localizing pulmonary nodules, whereas non-chest radiologists suffered from the distracting effect of the morphing tool.

Effect of selective suppression of spatial frequency domain noise on visual detection of a sample object in an inhomogeneous background

Mariusz W. Pietrzyk, J. Scott McDonald, Patrick C. Brennan, et al.

This study aims to investigate the effect of selective suppression of spatial frequency (SF) domain Gaussian white noise on visibility of a sample object in inhomogeneous backgrounds. SF-specific variation in signal-to-noise ratio due to selective signal averaging in the SF domain is a consequence of some MRI acquisition methods. This study models the potential effect on visibility of an object in a complex image. A single disc was randomly positioned in 25 of 50 synthetic clustered lumpy background images. Neutral, low-, mid- and high-frequency suppressed Gaussian white noise was added in the frequency domain to simulate SF-weighted MRI signal averaging. Twelve readers performed visual search and localization tasks on ordered sets. Subjects were asked to detect and locate discs and to rate their confidence level. Sensitivity, specificity and ROC analyses were performed. Readers achieved significantly higher ROC AUC (Az scores) (p<0.001), case-based sensitivity (p<0.001) and target-based sensitivity (p<0.001) with images in which low SF noise was suppressed. Significantly higher case-based sensitivity (p=0.005), target-based sensitivity (p=0.022) and Az values (p=0.01) were also scored under mid SF noise filtration. No significant differences were observed when images with SF-neutral noise suppression were compared with high SF noise suppression. In conclusion, an increase in low and also mid SF signal-to-noise ratio significantly improves human performance in visual detection of simple targets in inhomogeneous backgrounds and suggests that a low SF bias in MRI signal averaging may enhance diagnostic quality.

Comparison of 2D versus 3D mammography with screening cases: an observer study

James Reza Fernandez, Ruchi Deshpande, Linda Hovanessian-Larsen, et al.

Breast cancer is the most common type of non-skin cancer in women. 2D mammography is a screening tool to aid in the early detection of breast cancer, but has diagnostic limitations of overlapping tissues, especially in dense breasts. 3D mammography has the potential to improve detection outcomes by increasing specificity, and a new 3D screening tool with a 3D display for mammography aims to improve performance and efficiency as compared to 2D mammography. An observer study using human studies collected from was performed to compare traditional 2D mammography with this new 3D mammography technique. A prior study using a mammography phantom revealed no difference in calcification detection, but improved mass detection in 2D as compared to 3D. There was a significant decrease in reading time for masses, calcifications, and normals in 3D compared to 2D, however, as well as more favorable confidence levels in reading normal cases. Data for this current study is currently being obtained, and a full report should be available in the next few weeks.

A potential method to identify poor breast screening performance

Leng Dong, Yan Chen, Alastair G. Gale, et al.

In the UK all breast screeners undertake the PERFORMS scheme, in which they annually read sets of challenging cases. From the subsequent data it is possible to identify any individual who is performing significantly lower than their peers. They can then be offered further targeted training to improve performance. However, this under-performance can currently only be calculated once all screeners have taken part, which means the feedback can potentially take several months. To determine whether such performance outliers could usefully be identified approximately much earlier, the data from the last round of the scheme were re-analysed. From the information of 283 participants, 1,000 groups were selected randomly for fixed group sizes varying from four to 50 individuals. After applying bootstrapping on the 1,000 groups, a distribution of low performance threshold values was constructed. The accuracy of estimation was then determined by calculating the median value and standard error of this distribution as compared with the known actual results. Data indicate that increasing sample sizes improved the estimation of the median and decreased the standard error. Using information from as few as 25 individuals allowed an approximation of the known outlier cut-off value, and this improved with larger sample sizes. This approach is now implemented in the PERFORMS scheme to enable individuals who have difficulties, as compared to their peers, to be identified very early after taking part, which can then help them to improve their performance.
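The resampling idea in this abstract can be illustrated with a minimal sketch. The abstract does not define the low performance threshold or the exact resampling scheme, so the cut-off used here (mean minus two standard deviations), the random subsampling without replacement, the function names, and the synthetic scores are all assumptions made for illustration only.

```python
import random
import statistics


def low_threshold(scores):
    """Hypothetical outlier cut-off: mean minus two standard deviations."""
    return statistics.mean(scores) - 2 * statistics.stdev(scores)


def threshold_distribution(all_scores, group_size, n_groups=1000, seed=0):
    """Draw n_groups random groups of a fixed size and compute the cut-off for each."""
    rng = random.Random(seed)
    return [low_threshold(rng.sample(all_scores, group_size))
            for _ in range(n_groups)]


def summarize(all_scores, group_size, n_groups=1000, seed=0):
    """Return (median, spread) of the cut-off distribution for one group size."""
    dist = threshold_distribution(all_scores, group_size, n_groups, seed)
    return statistics.median(dist), statistics.stdev(dist)


# Synthetic stand-in for the scores of 283 participants (not real data).
_rng = random.Random(42)
scores = [_rng.gauss(0.85, 0.05) for _ in range(283)]
true_cutoff = low_threshold(scores)

for size in (4, 25, 50):
    median, spread = summarize(scores, size)
    print(size, round(median, 3), round(spread, 3), round(true_cutoff, 3))
```

Under these assumptions the spread of the estimated cut-off shrinks as the group size grows, which is the qualitative behaviour the abstract reports.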

Does the thinking aloud condition affect the search for pulmonary nodules?

Stephen Littlefair, Patrick Brennan, Warren Reed, et al.

Aim: To measure the effect of thinking aloud on perceptual accuracy and visual search behavior during chest radiograph interpretation for pulmonary nodules. Background: Thinking Aloud (TA) is an empirical research method used by researchers in cognitive psychology and behavioural analysis. In this pilot study we wanted to examine whether TA had an effect on the perceptual accuracy and search patterns of subjects looking for pulmonary nodules on adult posteroanterior chest radiographs (PA CxR). Method: Seven academics within Medical Radiation Sciences at The University of Sydney participated in two reading sessions, with and without TA. Their task was to localize pulmonary nodules on 30 PA CxR using mouse clicks and to rate their confidence in nodule presence. Eye-tracking recordings were collected during both viewing sessions. Time to first fixation, duration of first fixation, number of fixations, cumulative time of fixation and total viewing time were analysed. In addition, ROC analysis was conducted on the collected outcomes using DBM methodology. Results: Time to first nodule fixation was significantly longer (p=0.001) and duration of first fixation was significantly shorter (p=0.043). No significant difference was observed in ROC AUC scores between control and TA conditions. Conclusion: Our results confirm that TA has little effect on perceptual ability or performance, except for prolonging the task. However, there were significant differences in visual search behavior. Future researchers in radio-diagnosis could use the think aloud condition rather than silence so as to more closely replicate the clinical scenario.

Effect of lesion blurring on observer performance in AFC experiments using chest CT images

Kent M. Ogden, Danielle Williams, Dalanda Jalloh, et al.

The goal was to analyze the influence of blurring of artificial lesions on observer performance during AFC experiments in chest CT images. Lesion images were generated by scanning Teflon rods of multiple sizes (3/16", 1/4", 5/16", 3/8", and 1/2") in a General Electric VCT scanner. Images were reconstructed using Bone and Detail reconstruction algorithms and cropped for use in AFC experiments. Three sets of artificial lesions (simple disks) were generated mathematically at the same sizes as the Teflon lesions, with two of the sets blurred with 3x3 and 5x5 averaging kernels. All lesions were scaled to have the same maximum intensity. Approximately 180 normal chest CT images (both Bone and Detail algorithm) were collected under IRB exemption for use in 2-AFC experiments. Two observers conducted AFC experiments using the Teflon lesions with the appropriate CT images, and using the artificial lesions in both sets of CT images. A performance metric was calculated that allowed comparison of experimental results. For Bone algorithm images, the Teflon and un-blurred lesions produced equivalent performance. Performance was significantly worse using the blurred lesions. For the Detail algorithm images, un-blurred lesion performance was significantly better than with the Teflon lesion. The performance using the 3x3-blurred lesions was the closest to the Teflon lesion performance, though it was slightly worse. Using these results, it is possible to design artificial lesions of any size for use in AFC experiments that will result in observer performance equivalent to that when using lesions derived from physical phantoms.
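The construction of the mathematical lesions described above (simple disks, optional 3x3 or 5x5 averaging, scaling to a common maximum intensity) can be sketched as follows. This is an illustrative sketch only: the image size, the disk radius in pixels, the zero padding at the borders and the peak value are assumptions, and none of the function names come from the paper.

```python
def make_disk(size, radius):
    """Binary disk centred in a size x size image (1.0 inside, 0.0 outside)."""
    c = (size - 1) / 2.0
    return [[1.0 if (x - c) ** 2 + (y - c) ** 2 <= radius ** 2 else 0.0
             for x in range(size)]
            for y in range(size)]


def box_blur(image, k):
    """Blur with a k x k averaging kernel; pixels outside the image count as 0."""
    h, w = len(image), len(image[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += image[yy][xx]
            out[y][x] = total / (k * k)
    return out


def scale_to_peak(image, peak):
    """Rescale so that the maximum pixel value equals peak."""
    current = max(max(row) for row in image)
    if current == 0:
        raise ValueError("image is empty; nothing to scale")
    return [[v * peak / current for v in row] for row in image]


disk = make_disk(15, 4)                      # hypothetical size and radius
unblurred = scale_to_peak(disk, 100.0)       # hypothetical peak intensity
blurred3 = scale_to_peak(box_blur(disk, 3), 100.0)
blurred5 = scale_to_peak(box_blur(disk, 5), 100.0)
```

All three lesions share the same maximum intensity, while the blurred versions spread that intensity over a softer edge.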

A feasibility assessment of automated FISH image and signal analysis to assist cervical cancer detection

Xingwei Wang, Yuhua Li, Hong Liu, et al.

Fluorescence in situ hybridization (FISH) technology provides a promising molecular imaging tool to detect cervical cancer. Since manual FISH analysis is difficult, time-consuming, and inconsistent, the automated FISH image scanning systems have been developed. Due to limited focal depth of scanned microscopic image, a FISH-probed specimen needs to be scanned in multiple layers that generate huge image data. To improve diagnostic efficiency of using automated FISH image analysis, we developed a computer-aided detection (CAD) scheme. In this experiment, four pap-smear specimen slides were scanned by a dual-detector fluorescence image scanning system that acquired two spectrum images simultaneously, which represent images of interphase cells and FISH-probed chromosome X. During image scanning, once detecting a cell signal, system captured nine image slides by automatically adjusting optical focus. Based on the sharpness index and maximum intensity measurement, cells and FISH signals distributed in 3-D space were projected into a 2-D con-focal image. CAD scheme was applied to each con-focal image to detect analyzable interphase cells using an adaptive multiple-threshold algorithm and detect FISH-probed signals using a top-hat transform. The ratio of abnormal cells was calculated to detect positive cases. In four scanned specimen slides, CAD generated 1676 con-focal images that depicted analyzable cells. FISH-probed signals were independently detected by our CAD algorithm and an observer. The Kappa coefficients for agreement between CAD and observer ranged from 0.69 to 1.0 in detecting/counting FISH signal spots. The study demonstrated the feasibility of applying automated FISH image and signal analysis to assist cyto-geneticists in detecting cervical cancers.
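Agreement between the CAD scheme and the observer is reported above as a Kappa coefficient. As a minimal sketch, assuming the statistic is Cohen's kappa computed on per-cell categorical labels (for example, the number of FISH spots counted in each cell), it could be calculated as below. The label lists are made up for illustration.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one categorical label per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical spot counts per cell from CAD and from an observer.
cad_counts = [2, 2, 3, 2, 1, 2, 3, 2]
observer_counts = [2, 2, 3, 2, 2, 2, 3, 2]
print(cohen_kappa(cad_counts, observer_counts))
```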

Assembly and evaluation of a training module and dataset with feedback for improved interpretation of digital breast tomosynthesis examinations

David Gur, Margarita L. Zuley, Jules H. Sumkin, et al.

The FDA recently approved Digital Breast Tomosynthesis (DBT) for use in screening for the early detection of breast cancer. However, MQSA qualification for interpreting DBT through training was noted as important. Performance issues related to training are largely unknown. Therefore, we assembled a unique computerized training module to assess radiologists' performances before and after using the training module. Seventy-one actual baseline mammograms (no priors) with FFDM and DBT images were assembled to be read before and after training with the developed module. Fifty examinations of FFDM and DBT images enriched with positive findings were assembled for the training module. Depicted findings were carefully reviewed, summarized, and entered into a specially designed training database where findings were identified by case number and synchronized to the display of the related FFDM plus DBT examinations on a clinical workstation. Readers reported any findings using screening BIRADS (0, 1, or 2) followed by instantaneous feedback of the verified truth. Six radiologists participated in the study and reader average sensitivity and specificity were compared before and after training. Average sensitivity improved and specificity remained relatively the same after training. Performance changes may be affected by disease prevalence in the training set.

Assessment of two mammographic density related features in predicting near-term breast cancer risk

Bin Zheng, Jules H. Sumkin, Margarita L. Zuley, et al.

In order to establish a personalized breast cancer screening program, it is important to develop risk models that have high discriminatory power in predicting the likelihood of a woman developing an imaging detectable breast cancer in near-term (e.g., <3 years after a negative examination in question). In epidemiology-based breast cancer risk models, mammographic density is considered the second highest breast cancer risk factor (second to woman's age). In this study we explored a new feature, namely bilateral mammographic density asymmetry, and investigated the feasibility of predicting near-term screening outcome. The database consisted of 343 negative examinations, of which 187 depicted cancers that were detected during the subsequent screening examination and 155 that remained negative. We computed the average pixel value of the segmented breast areas depicted on each cranio-caudal view of the initial negative examinations. We then computed the mean and difference mammographic density for paired bilateral images. Using woman's age, subjectively rated density (BIRADS), and computed mammographic density related features we compared classification performance in estimating the likelihood of detecting cancer during the subsequent examination using areas under the ROC curves (AUC). The AUCs were 0.63±0.03, 0.54±0.04, 0.57±0.03, 0.68±0.03 when using woman's age, BIRADS rating, computed mean density and difference in computed bilateral mammographic density, respectively. Performance increased to 0.62±0.03 and 0.72±0.03 when we fused mean and difference in density with woman's age. The results suggest that, in this study, bilateral mammographic tissue density is a significantly stronger (p<0.01) risk indicator than both woman's age and mean breast density.
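The two computed features and the AUC comparison described above can be sketched in a few lines. This is a hedged illustration: taking the absolute left-right difference as the asymmetry feature is an assumption, the AUC is estimated with the nonparametric (Mann-Whitney) formula rather than whatever ROC fitting the authors used, and the example values are invented.

```python
def mean_density(left_mean_pixel, right_mean_pixel):
    """Mean of the average pixel values of the left and right CC views."""
    return (left_mean_pixel + right_mean_pixel) / 2.0


def density_asymmetry(left_mean_pixel, right_mean_pixel):
    """Assumed asymmetry feature: absolute bilateral difference."""
    return abs(left_mean_pixel - right_mean_pixel)


def auc(positive_scores, negative_scores):
    """Nonparametric AUC: probability that a positive case scores higher
    than a negative case, counting ties as one half."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Invented (left, right) average pixel values, for illustration only.
later_cancer = [(120.0, 138.0), (101.0, 110.0), (95.0, 97.0)]
stayed_negative = [(118.0, 119.0), (99.0, 104.0), (130.0, 128.0)]

pos = [density_asymmetry(l, r) for l, r in later_cancer]
neg = [density_asymmetry(l, r) for l, r in stayed_negative]
print(auc(pos, neg))
```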

Evaluation of low contrast detectability performance using two-alternative forced choice method on computed tomography dose reduction algorithms

Jiahua Fan, Priti Madhav, Paavana Sainath, et al.

Lowering patient radiation dose while maintaining image quality in computed tomography (CT) has become a very active research field. Various iterative reconstruction algorithms have been designed to improve or maintain image quality for low-dose patient scans. Typically, variation in radiation dose results in variation in the detectability of low-contrast objects. This paper assesses the low-contrast detectability of images acquired at different dose levels and generated with different algorithms, using a two-alternative forced-choice human observer method. Filtered backprojection and iterative reconstruction algorithms were used in the study. Results showed that, for the objects and scan protocol used, the iterative algorithm employed in this study has low-contrast detectability similar to that of the filtered backprojection algorithm at a 4 times lower dose level. The study also demonstrated that a well-controlled human observer study is a feasible way to assess the image quality of a CT system.

Classification of thyroid nodules using a resonance-frequency-based electrical impedance spectroscopy: progress assessment

Bin Zheng, Mitchell E. Tublin, Dror Lederman, et al.

The incidence of thyroid cancer is rising faster than other malignancies and has nearly doubled in the United States (U.S.) in the last 30 years. However, classifying between malignant and benign thyroid nodules is often difficult. Although ultrasound guided Fine Needle Aspiration Biopsy (FNAB) is considered an excellent tool for triaging patients, up to 25% of FNABs are inconclusive. As a result, definitive diagnosis requires an exploratory surgery and a large number of these are performed in the U.S. annually. It would be extremely beneficial to develop a non-invasive tool or procedure that could assist in assessing the likelihood of malignancy of otherwise indeterminate thyroid nodules, thereby reducing the number of exploratory thyroidectomies that are performed under general anesthesia. In this preliminary study we demonstrate a unique hand-held Resonance-frequency based Electrical Impedance Spectroscopy (REIS) device with six pairs of detection probes to detect and classify thyroid nodules using multi-channel EIS output signal sweeps. Under an Institutional Review Board (IRB)-approved case collection protocol, this REIS device is being tested in our clinical facility and we have been collecting an initial patient data set since March of this year. Between March and August of 2011, 65 EIS tests were conducted on 65 patients. Among these cases, six depicted pathology-verified malignant cells. Our initial assessment indicates the feasibility of easily applying this REIS device and measurement approach in a very busy clinical setting. The measured resonance frequency differences between malignant and benign nodules could potentially make it possible to accurately classify indeterminate thyroid nodules.

Registration of T2-weighted and diffusion-weighted MR images of the prostate: comparison between manual and landmark-based methods

Yahui Peng, Yulei Jiang, Fatma Nur Soylu, et al.

Quantitative analysis of multi-parametric magnetic resonance (MR) images of the prostate, including T2-weighted (T2w) and diffusion-weighted (DW) images, requires accurate image registration. We compared two registration methods between T2w and DW images. We collected pre-operative MR images of 124 prostate cancer patients (68 patients scanned with a GE scanner and 56 with Philips scanners). A landmark-based rigid registration was done based on six prostate landmarks in both T2w and DW images identified by a radiologist. Independently, a researcher manually registered the same images. A radiologist visually evaluated the registration results by using a 5-point ordinal scale of 1 (worst) to 5 (best). The Wilcoxon signed-rank test was used to determine whether the radiologist's ratings of the results of the two registration methods were significantly different. Results demonstrated that both methods were accurate: the average ratings were 4.2, 3.3, and 3.8 for GE, Philips, and all images, respectively, for the landmark-based method; and 4.6, 3.7, and 4.2, respectively, for the manual method. The manual registration results were more accurate than the landmark-based registration results (p < 0.0001 for GE, Philips, and all images). Therefore, the manual method produces more accurate registration between T2w and DW images than the landmark-based method.

A systematic review of automated melanoma detection in dermatoscopic images and its ground truth data

Abder-Rahman A. Ali, Thomas M. Deserno

Malignant melanoma is the third most frequent type of skin cancer and one of the most malignant tumors, accounting for 79% of skin cancer deaths. Melanoma is highly curable if diagnosed early and treated properly, as the survival rate varies between 15% and 65% from early to terminal stages, respectively. So far, melanoma diagnosis depends subjectively on the dermatologist's expertise. Computer-aided diagnosis (CAD) systems based on epiluminescence light microscopy can provide an objective second opinion on pigmented skin lesions (PSL). This work systematically analyzes the evidence for the effectiveness of automated melanoma detection in images from a dermatoscopic device. Automated CAD applications were analyzed to estimate their diagnostic outcome. A search of online databases for publication dates between 1985 and 2011 found a total of 182 studies on dermatoscopic CAD. Under the systematic selection criteria, 9 studies published between 2002 and 2011 were included. Those studies formed databases of 14,421 dermatoscopic images including both malignant "melanoma" and benign "nevus", with 8,110 images being available, ranging in resolution from 150 x 150 to 1568 x 1045 pixels. Maximum and minimum of sensitivity and specificity are 100.0% and 80.0% as well as 98.14% and 61.6%, respectively. Area under the receiver operating characteristic curve (AUC) and pooled sensitivity, specificity and diagnostic odds ratio (DOR) are 0.87, 0.90, 0.81, and 15.89, respectively. Thus, although automated melanoma detection showed good accuracy in terms of sensitivity, specificity, and AUC, diagnostic performance in terms of DOR was found to be poor. This might be due to the lack of dermatoscopic image resources (ground truth) that are needed for comprehensive assessment of diagnostic performance. In future work, we aim to test this hypothesis by joining dermatoscopic images into a unified database that serves as a standard reference for dermatology-related research in PSL classification.

User-friendly tools on handheld devices for observer performance study

Takuya Matsumoto, Takeshi Hara, Junji Shiraishi, et al.

ROC studies require complex procedures to select cases from many data samples and to set confidence levels for each selected case in order to generate ROC curves. In some observer performance studies, researchers have to develop software with a specific graphical user interface (GUI) to obtain confidence levels from readers. Because ROC studies can be designed for various clinical situations, it is difficult to prepare software that suits every ROC study. In this work, we have developed software for recording confidence levels during observer studies on small personal handheld devices such as the iPhone, iPod touch, and iPad. To confirm the functions of our software, three radiologists performed observer studies to detect lung nodules using the public database of chest radiograms published by the Japan Society of Radiological Technology. The output in text format conformed to the format for the well-known ROC kit from the University of Chicago. The time required to read each case was recorded very precisely.

Studying the relative impact of ghosting and noise on the perceived quality of MR images

Hantao Liu, Jos Koonen, Miha Fuderer, et al.

In current magnetic resonance (MR) imaging systems, design choices are confronted with a trade-off between structured (i.e. artifacts) and unstructured noise. The impact of both types of noise on perceived image quality, however, is so far unknown, while this knowledge would be highly beneficial for further improvement of MR imaging systems. In this paper, we investigate how ghosting artifacts (i.e. structured noise) and random noise, applied at the same energy level in the distortion, affect the perceived quality of MR images. To this end, a perception experiment is conducted with human observers rating the quality of a set of images, distorted with various levels of ghosting and noise. To also understand the influence of professional expertise on the image quality assessment task, two groups of observers with different levels of medical imaging experience participated in the experiment: one group contained fifteen clinical scientists or application specialists, and the other group contained eighteen naïve observers. Experimental results indicate that experts and naïve observers differently assess the quality of MR images degraded with ghosting/noise. Naïve observers consistently rate images degraded with ghosting higher than images degraded with noise, independent of the energy level of the distortion, and of the image content. For experts, the relative impact of ghosting and noise on perceived quality tends to depend on the energy level of the distortion and on the image content, but overall the energy of the distortion is a promising metric to predict perceived image quality.

Combined collimator/reconstruction optimization for myocardial perfusion SPECT imaging using polar map-based LROC numerical observer

Souleymane Konate, P. Hendrik Pretorius, Howard C. Gifford, et al.

Polar maps have been used to assist clinicians in diagnosing coronary artery disease (CAD) in single photon emission computed tomography (SPECT) myocardial perfusion imaging. Herein, we investigate the optimization of collimator design for perfusion defect detection in SPECT imaging when reconstruction includes modeling of the collimator. The optimization employs an LROC clinical model observer (CMO), which emulates the clinical task of polar map detection of CAD. By utilizing a CMO, which better mimics the clinical perfusion-defect detection task than previous SKE-based observers, our objective is to optimize collimator design for SPECT myocardial perfusion imaging when reconstruction includes compensation for collimator spatial resolution. Comparison of lesion detection accuracy will then be employed to determine whether a collimator design with lower spatial resolution, and hence higher sensitivity, than currently recommended could be utilized to reduce the radiation dose to the patient, the imaging time, or a combination of both. As the first step in this investigation, we report herein on the optimization of the three-dimensional (3D) post-reconstruction Gaussian filtering of, and the number of iterations used to reconstruct, the SPECT slices from projections acquired with a low-energy general-purpose (LEGP) collimator. The optimization was in terms of detection accuracy as determined by our CMO and four human observers. Both the human observers and all four CMO variants agreed that the optimal post-filtering was with a Gaussian sigma in the range of 0.75 to 1.0 pixels. In terms of number of iterations, the human observers showed a preference for 5 iterations; however, only one of the CMO variants agreed with this selection. The others showed a preference for 15 iterations. We shall thus proceed to optimize the reconstruction parameters for even higher sensitivity collimators using this CMO, and then do the final comparison between collimators using their individually optimized parameters with human observers and three times the test images to reduce the statistical variation seen in our present results.

Characterizing atherosclerotic plaque with computed tomography: a contrast-detail study

Nima Kasraie, Geoffrey David Clarke

Plaque characterization may benefit from the increasing distinctiveness of the attenuating properties of different soft plaque components at lower energies. Because the CT number of non-adipose soft plaque increases slightly relative to that of adipose plaque at lower tube voltage settings, a higher contrast between atheromatous adipose and non-adipose plaque may become visible with modern 64-slice systems. A contrast-detail (C-D) phantom, with varying plaque composition as the contrast-generating method, was imaged on a commercial 64-slice MDCT system using 80, 120, and 140 kVp settings. The same phantom was also imaged on a cone beam CT (CBCT) system with a lower tube voltage of 75 kVp. The results of experiments from four different observers on three different plaque types (lipid, fiber, calcific) indicate that CT attenuation within lipid cores and fibrous masses varies not only with the percentage of lipid or fiber present, but also with the size of the cores. Furthermore, the C-D curve analysis for all three plaque types reveals that while noise constraints prevent visible differentiation of soft plaque at current conventional 64-slice MDCT settings, CBCT exhibits superior visible contrast detectability compared with its conventional counterpart, with the latter having appreciably better resolution limits and beneficial lower tube voltages. This low-voltage CT technique has the potential to be useful in composition-based diagnosis of vulnerable carotid atherosclerotic plaque.

Quantifying effects of post-processing with visual grading regression

Örjan Smedby, Mats Fredrikson, Jakob De Geer, et al.

For optimization and evaluation of image quality, one can use visual grading experiments, where observers rate some aspect of image quality on an ordinal scale. To take into account the ordinal character of the data, ordinal logistic regression is used in the statistical analysis, an approach known as visual grading regression (VGR). In the VGR model one may include factors such as imaging parameters and post-processing procedures, in addition to patient and observer identity. In a single-image study, 9 radiologists graded 24 cardiac CTA images acquired with ECG-modulated tube current using standard settings (310 mAs), reduced dose (62 mAs) and reduced dose after post-processing. Image quality was assessed using visual grading with five criteria, each with a five-level ordinal scale from 1 (best) to 5 (worst). The VGR model included one term estimating the dose effect (log of mAs setting) and one term estimating the effect of post-processing. The model predicted that 115 mAs would be required to reach an 80% probability of a score of 1 or 2 for visually sharp reproduction of the heart without the post-processing filter. With the post-processing filter, the corresponding figure would be 86 mAs. Thus, applying the post-processing corresponded to a dose reduction of 25%. For other criteria, the dose reduction was estimated at 16-26%. Using VGR, it is thus possible to quantify the potential for dose reduction of post-processing filters.
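The 25% figure follows directly from the two predicted doses, and the same quantity can be read off the model coefficients. The sketch below assumes a linear predictor of the form beta_log_dose * ln(mAs) + beta_filter * filter_on, which matches the two terms named in the abstract but is not spelled out there; the function names and the coefficient values in the second example are hypothetical.

```python
import math


def dose_reduction(dose_without_filter, dose_with_filter):
    """Fractional dose saving when both doses give the same predicted quality."""
    return 1.0 - dose_with_filter / dose_without_filter


def equivalent_dose_ratio(beta_log_dose, beta_filter):
    """Ratio dose_with / dose_without that leaves the linear predictor unchanged.

    Setting beta_log_dose * ln(d_with) + beta_filter equal to
    beta_log_dose * ln(d_without) gives d_with / d_without
    = exp(-beta_filter / beta_log_dose).
    """
    return math.exp(-beta_filter / beta_log_dose)


# Doses reported in the abstract for visually sharp reproduction of the heart.
print(round(100 * dose_reduction(115, 86)))  # 25

# Hypothetical coefficients, chosen only to show how the ratio is obtained.
ratio = equivalent_dose_ratio(beta_log_dose=2.0, beta_filter=0.58)
print(round(1.0 - ratio, 3))
```

Because the ratio depends only on the two coefficients, the estimated dose saving is the same at every target probability under this assumed model form.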

The effect of compression on confidence during the detection of skull fractures in CT

Ines Nikolovski, Mark F. McEntee, Roger Bourne, et al.

As part of a study to establish whether detection of cranial vault fractures is affected by JPEG 2000 30:1 and 60:1 lossy compression when compared to JPEG 2000 lossless compression, we looked at the effects on confidence ratings. Fifty-five CT images, with three levels of JPEG 2000 compression (lossless, 30:1 and 60:1), were presented to 14 senior radiologists, 12 from the American Board of Radiology and 2 from Australia, 7 of whom were MSK specialists and 7 neuroradiologists. Thirty-two images contained a single skull fracture while 23 were normal. Images were displayed on one calibrated, secondary LCD in ambient lighting of 32.2 lux. Observers were asked to identify the presence or absence of a fracture and, where a fracture was present, to locate it and rate their confidence in its presence. Jackknife alternative free-response receiver operating characteristic (JAFROC) and ROC methodologies were employed, and DBM MRMC and ANOVA were used to explore differences between the lossless and lossy compressed images. A significant trend of increased confidence in true and false positive scores was seen with JPEG 2000 lossy 60:1 compression. An ANOVA on the mean confidence rating obtained for correct (TP) and incorrect (FP) localization of skull fractures demonstrated a significant difference between lossless and 60:1 [FP, p<0.001; TP, p<0.014] and between 30:1 and 60:1 [FP, p<0.014; TP, p<0.037].

3D brain MR angiography displayed by a multi-autostereoscopic screen

Daniel S. F. Magalhães, Fádua H. Ribeiro, Fabrício O. Lima, et al.

Magnetic resonance angiography (MRA) can be used to examine blood vessels in key areas of the body, including the brain. In MRA, a powerful magnetic field, radio waves and a computer produce the detailed images. In brain imaging, physicians use the procedure mainly to detect atherosclerotic disease in the carotid artery of the neck, which may limit blood flow to the brain and cause a stroke, and to identify a small aneurysm or arteriovenous malformation inside the brain. Multi-autostereoscopic displays provide multiple views of the same scene, rather than just two, as in autostereoscopic systems. Each view is visible from a different range of positions in front of the display. This allows the viewer to move left and right in front of the display and see the correct view from any position. The use of 3D imaging in the medical field has proven to be a benefit to doctors when diagnosing patients. For different medical domains a stereoscopic display could be advantageous in terms of a better spatial understanding of anatomical structures, better perception of ambiguous anatomical structures, better performance of tasks that require a high level of dexterity, increased learning performance, and improved communication with patients or between doctors. In this work we describe a multi-autostereoscopic system and how to produce 3D MRA images to be displayed with it. We show results for brain MR angiography images and discuss how 3D visualization can help physicians reach a better diagnosis.

NPS assessment of color medical displays using a monochromatic CCD camera

Hans Roehrig, Xiliang Gu, Jiahua Fan

Show abstract

This paper presents an approach to Noise Power Spectrum (NPS) assessment of color medical displays without using an expensive imaging colorimeter. Uniform R, G, and B color patterns were shown on the display under study, and images were taken using a high-resolution monochromatic camera. A colorimeter was used to calibrate the camera images. Synthetic intensity images were formed as the weighted sum of the R, G, B, and dark-screen images. Finally, the NPS analysis was conducted on the synthetic images. The proposed method replaces an expensive imaging colorimeter for NPS evaluation, which also suggests a potential solution for routine color medical display QA/QC in clinical settings, especially when imaging of display devices is desired.
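The two processing steps named in this abstract, forming a synthetic intensity image and computing its NPS, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the weight tuple, and the normalization (squared DFT magnitude of the mean-subtracted image, scaled by pixel area over pixel count) are assumptions, and the abstract does not state the actual weights or how the colorimeter calibration enters.

```python
import cmath


def synthetic_intensity(r, g, b, dark, weights):
    """Weighted sum of the R, G, B and dark-screen captures.

    All inputs are same-size 2D lists of camera values. `weights` is a
    hypothetical (wr, wg, wb, wd) tuple; the real weights are not given
    in the abstract.
    """
    wr, wg, wb, wd = weights
    rows, cols = len(r), len(r[0])
    return [[wr * r[y][x] + wg * g[y][x] + wb * b[y][x] + wd * dark[y][x]
             for x in range(cols)] for y in range(rows)]


def nps_2d(image, pixel_pitch=1.0):
    """2D NPS estimate: |DFT(image - mean)|^2 * pitch^2 / (rows * cols).

    Uses a direct (slow) DFT so the sketch needs only the standard
    library; it is meant for small regions of interest.
    """
    rows, cols = len(image), len(image[0])
    mean = sum(map(sum, image)) / (rows * cols)
    noise = [[v - mean for v in row] for row in image]
    scale = pixel_pitch * pixel_pitch / (rows * cols)
    nps = []
    for u in range(rows):
        out_row = []
        for v in range(cols):
            acc = 0j
            for y in range(rows):
                for x in range(cols):
                    phase = -2j * cmath.pi * (u * y / rows + v * x / cols)
                    acc += noise[y][x] * cmath.exp(phase)
            out_row.append(abs(acc) ** 2 * scale)
        nps.append(out_row)
    return nps
```

With this normalization and a pixel pitch of 1, the NPS values sum to the total squared deviation of the image from its mean, which gives a quick sanity check on any region of interest.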

Theoretical demonstration of image characteristics and image formation process depending on image displaying conditions on liquid crystal display

Asumi Yamazaki, Katsuhiro Ichikawa, Masao Funahashi, et al.

Show abstract

In soft-copy diagnosis, medical images with large matrix sizes often have to be displayed as reduced images produced by sub-sampling. We analyzed overall image characteristics on a liquid crystal display (LCD) as a function of the display condition. Specifically, we measured the overall Wiener spectra (WS) of displayed X-ray images at sub-sampling rates from pixel-by-pixel mode down to 35%. The image viewer used performed image reduction by sub-sampling with bilinear interpolation. We also simulated the overall WS from images sub-sampled by bilinear, super-sampling, and nearest-neighbor interpolation. The measured and simulated results agreed well and demonstrated that the overall noise characteristics were attributable to luminance-value fluctuation, sub-sampling effects, and the inherent image characteristics of the LCD. In addition to simulating the WS, we measured digital MTFs (modulation transfer functions) at center and shifted alignments from sub-sampled edge images. The WS and digital MTFs showed that displaying reduced images increased noise through aliasing errors and made it impossible to reproduce high-frequency signals. Furthermore, because super-sampling interpolation reduced the images more smoothly than bilinear interpolation, it resulted in lower WS and digital MTFs. Nearest-neighbor interpolation had almost no smoothing effect, so its WS and digital MTFs were the highest.
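The contrast between a non-smoothing and a smoothing reduction can be illustrated with a one-dimensional sketch. The assumptions here are not from the abstract: reduction is by an integer factor, nearest-neighbor is modeled as keeping every `factor`-th sample, and super-sampling is modeled as a plain block average. Bilinear interpolation and the LCD's own characteristics are left out.

```python
def reduce_nearest(signal, factor):
    """Nearest-neighbor reduction: keep every `factor`-th sample, no smoothing."""
    return signal[::factor]


def reduce_supersample(signal, factor):
    """Block-average reduction, used here as a stand-in for super-sampling."""
    usable = len(signal) - len(signal) % factor  # drop an incomplete tail block
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, usable, factor)]


def variance(values):
    """Population variance, used as a single-number noise measure."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)
```

For uncorrelated noise, the nearest-neighbor output keeps roughly the input variance, while the block average lowers it by about the reduction factor. That matches the qualitative ordering in the abstract, where nearest-neighbor gives the highest noise and super-sampling the lowest.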

Preliminary display comparison for dental diagnostic applications

Nicholas Odlum, Guillaume Spalla, Nele van Assche, et al.

Show abstract

The aim of this study is to predict the clinical performance and image quality of a display system for viewing dental images. At present, the use of dedicated medical displays is not uniform among dentists; many still view images on ordinary consumer displays. This work investigated whether the use of a medical display improved a clinician's perception of dental images compared to a consumer display. Display systems were simulated using the MEdical Virtual Imaging Chain (MEVIC). Images derived from two carefully performed studies, on periodontal bone lesion detection and endodontic file length determination, were used. Three displays were selected: one medical-grade display and two consumer displays (Barco MDRC-2120, Dell 1907FP, and Dell 2007FPb). Typical display characteristics, such as the Modulation Transfer Function (MTF), the Noise Power Spectrum (NPS), backlight stability, and calibration, were evaluated by measurement and simulation. As expected, the display with the largest pixel pitch has the worst MTF. Moreover, the medical-grade display has a slightly better MTF, and the displays have similar NPS. The study shows that the emitted intensity of the consumer displays is less stable than that of the medical-grade display. Finally, the study of the display calibration methodology shows that the signal in the dental images will always be more perceivable on a DICOM GSDF display than on a gamma 2.2 display.
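One way to see why a larger pixel pitch goes with a lower MTF is the idealized rectangular-aperture model, in which a pixel of width p has MTF(f) = |sin(πpf) / (πpf)|. The sketch below assumes that model only; it is not the MEVIC simulation or the measurement method used in the study, and the pitch values in the usage note are hypothetical rather than taken from the listed displays.

```python
import math


def aperture_mtf(freq_cyc_per_mm, pitch_mm):
    """MTF of an ideal rectangular pixel aperture of width `pitch_mm`."""
    x = math.pi * pitch_mm * freq_cyc_per_mm
    if x == 0:
        return 1.0  # limit of sin(x)/x as x -> 0
    return abs(math.sin(x) / x)
```

For example, at 1 cycle/mm a hypothetical 0.30 mm pitch gives a lower value than a hypothetical 0.25 mm pitch, and the first zero of the model falls at a spatial frequency of 1/pitch.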

Impact of solid-state lighting on observer performance of color discrimination

Wei-Chung Cheng, Widad Tannous, Aldo Badano

Show abstract

We studied the impact of the microscope light source on reader performance using a microscopic version of the Farnsworth-Munsell 100 hue test reproduced on photographic slide film. Each pair of adjacent color caps in the original test kit was reproduced on the film in random order, and a 5X objective was used to examine the microscopic color patterns. The subject's visual task was to determine whether the color pair was in the correct hue order or not. The test was repeated for both a light-emitting diode lamp and a conventional halogen lamp. In this paper, we discuss the methodology using preliminary results.

Using connectionist models to determine decision making strategy of pathology residents reading dermatopathology digital slides

Claudia Mello-Thoms, Gregory Gardner

Show abstract

Theories of the processes involved in medical decision making have long been formulated in an attempt to understand medical reasoning. Historically, medical training has relied on the 'forward reasoning' strategy, where trainees are instructed to collect all diagnostic evidence before formulating any hypotheses. However, more recently, studies have determined that medical experts do not rely on such time-consuming strategies, but instead quickly generate diagnostic hypotheses and then proceed to collect diagnostic evidence to confirm or to dismiss each hypothesis. In light of this, medical training has been switched to rely on the 'hypothetical deductive' approach, in which trainees are instructed to mimic the experts and generate diagnostic hypotheses first and then gather diagnostic evidence to sort out the hypotheses. Both reasoning models have shortcomings, as identification of many irrelevant findings adds too much noise to the diagnostic process in the 'forward reasoning' case, whereas identification of too many competing hypotheses generates too large a problem space in the 'hypothetical deductive' approach. In this paper we will use connectionist modeling to simulate the decision making strategies of Pathology residents 'before' and 'after' they undergo a rotation well known to be difficult, namely Dermatopathology. We will seek to identify changes in the reasoning patterns of the residents as a result of formal training in the domain. We hypothesize that 'before' undertaking the rotation residents will rely on the 'forward reasoning' approach, whereas 'after' their rotation they are more likely to use the 'hypothetical deductive' reasoning.