- Front Matter: Volume 8318
- Technology Assessment
- Image Display
- ROC Analysis
- Image Perception
- Digital Pathology II: Joint Session with Conferences 8314 and 8315
- Model Observers
- Observer Performance
- Poster Session
Front Matter: Volume 8318
Front Matter: Volume 8318
This PDF file contains the front matter associated with SPIE Proceedings Volume 8318, including the Title Page, Copyright information, Table of Contents, and the Conference Committee listing.
Technology Assessment
CT detector evaluation with complex random backgrounds
Modern computed tomography (CT) uses detector arrays consisting of large numbers of photodiodes with scintillator
crystals. The number of pixels in the array can play an important role in system performance. Considerable
research has been performed on signal detection in flat backgrounds under various conditions, but little has been
done with complex, random backgrounds in CT; our work investigates in particular the effect of the number of
detector elements on signal detection by a channelized Hotelling observer in a complex background. For this
project, a simulated three-dimensional phantom is generated with its attenuation equal to that of water. The
phantom contains a smaller central section with random variations to simulate random anatomical structures.
Cone-beam projections of the phantom are acquired at different angles and used to calculate the covariance
matrix of the raw projection data. Laguerre-Gauss channels are used to reduce the dimensionality of each 2D
projection and hence the size of the covariance matrix, but the covariance is still a function of two projection
angles. A strong cross-channel correlation is observed as a function of the difference between the angles. A signal
with known location and size is used, and the performance of the observer is calculated from the channel outputs
at multiple projection angles. A contrast-detail diagram is computed for different variables such as signal size,
number of incident x-ray photons, pixel size, etc. At a fixed observer signal-to-noise ratio (SNR), the contrast
required to detect a signal increases dramatically as the signal size decreases.
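For readers unfamiliar with the observer named in this abstract, the usual textbook forms of the Laguerre-Gauss channels and of the channelized Hotelling observer figure of merit are sketched below. The symbols (channel width $a$, channel matrix $U$, image vector $\mathbf{g}$, channel-output covariance $K_v$) are notational assumptions, not taken from the paper, which additionally indexes the covariance by pairs of projection angles.

```latex
u_p(r) = \frac{\sqrt{2}}{a}\,\exp\!\left(-\frac{\pi r^2}{a^2}\right)
         L_p\!\left(\frac{2\pi r^2}{a^2}\right),
\qquad
L_p(x) = \sum_{k=0}^{p} (-1)^k \binom{p}{k} \frac{x^k}{k!}

\mathbf{v} = U^{\mathsf T}\mathbf{g},
\qquad
\Delta\bar{\mathbf{v}} = \bar{\mathbf{v}}_{\text{signal present}} - \bar{\mathbf{v}}_{\text{signal absent}},
\qquad
\mathrm{SNR}^2_{\mathrm{CHO}} = \Delta\bar{\mathbf{v}}^{\mathsf T} K_v^{-1}\, \Delta\bar{\mathbf{v}}
```

Because the channel outputs $\mathbf{v}$ have only as many entries as there are channels, $K_v$ is far smaller than the covariance of the raw projection data, which is the dimensionality reduction the abstract refers to.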
Reader behavior in a detection task using single- and multislice image datasets
We assess human reader behavior such as reading times and browsing trends in a signal detection experiment
with synthetic single-slice (ss) and multi-slice (ms) image datasets of varying task complexity, defined in this
study as the ratio of the background lump size to the signal width. Three dataset types were generated by
inserting one 3D Gaussian target of fixed size into the center of 3D volumes of correlated Gaussian noise
with three different kernel sizes. Corresponding signal intensities were determined separately for the three
background types using the staircase method targeting an AUC of 0.7 for ss datasets. Non-expert human
readers were presented with ss (central slice of the volume) and ms datasets (slice-by-slice viewing in a
stack-browsing mode). Readers were aware of the target's approximate location within the slice or volume.
Readers could scroll freely through the ms datasets at arbitrary speed and direction with no time limit.
Experiments were conducted in a controlled viewing environment on a 5MP digital mammography display.
AUCs were 0.68-0.73 for ss; 0.82-0.98 for ms datasets. Reading time (ms, ss), the number of repetitions
through the stack (ms), and the average number of slices per repetition (ms) were assessed. Browsing speeds
were in the range of 1-7 slices per second. Results show that readers spent the shortest time and fewest
repetitions reading TP cases, with FP and FN cases requiring the most attention. The reported trends
concur with earlier chest x-ray and mammography studies which report that readers fixate longer on regions
subsequently rated incorrectly.
An image-dependent model of veiling glare effects on detection performance in large-luminance-range displays
One limitation of visual detection tasks in complex scenes with a large range of luminance values is the decrease in
sensitivity due to veiling glare in the display device and in the human eye caused by unwanted light scattering. We
used our previously measured results regarding the increase in detection thresholds due to veiling glare to formulate an
empirical model for this phenomenon. Our results are based on a ring glare source and a Gaussian target on white noise
using a dual-layer, high-dynamic-range liquid-crystal display prototype. The thresholds, measured using a double-random
staircase technique with added signal-absent images, are modeled as a function of illuminance at the eyes and angular
distance between the veiling glare source and the detection target. In this work, we model increases in detection contrast
thresholds due to veiling glare for any image by calculating the contribution of each display pixel. We validate our model
by determining threshold increases for the set of experimental results previously obtained with human subjects. Our image-dependent
model predicts how the contrast threshold is affected by veiling glare for any target location. Finally, we discuss
the range of validity of our model and show predictions for sample mammography, chest CT, and chest radiography images
displayed on large-luminance-range devices.
Visual grading regression with random effects
To analyze visual grading experiments, ordinal logistic regression (here called visual grading regression, VGR) may be
used in the statistical analysis. In addition to types of imaging or post-processing, the VGR model may include factors
such as patient and observer identity, which should be treated as random effects. Standard software does not allow
random factors in ordinal logistic regression, but using Generalized Linear Latent And Mixed Models (GLLAMM) this
is possible. In a single-image study, 9 radiologists graded 24 cardiac Computed Tomography Angiography (CTA)
images with reduced dose without and after post-processing with a 2D adaptive filter, using five image quality criteria.
First, standard ordinal logistic regression was carried out, treating filtering, patient and observer identity as fixed effects.
The same analysis was then repeated with GLLAMM, treating filtering as a fixed effect and patient and observer identity
as random effects. With both approaches, a significant effect (p<0.01) of the filtering was found for all five criteria. No
dramatic differences in parameter estimates or significance levels were found between the two approaches. It is
concluded that random effects can be appropriately handled in VGR using GLLAMM, but no major differences in the
results were found in a preliminary evaluation.
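As a rough illustration of the model class described in this abstract, a cumulative-logit (proportional-odds) model with crossed random effects for patient and observer can be written as below. The notation is an assumption made for illustration; the abstract does not give the authors' parametrization, and sign conventions differ between software packages.

```latex
\operatorname{logit} P\!\left(Y_{po} \le k\right)
  = \kappa_k - \left(\beta\, x_{po} + u_p + v_o\right),
\qquad
u_p \sim \mathcal{N}\!\left(0, \sigma_P^2\right),
\quad
v_o \sim \mathcal{N}\!\left(0, \sigma_O^2\right)
```

Here $Y_{po}$ is the grade given by observer $o$ to the image of patient $p$ for one quality criterion, $x_{po}$ indicates whether the image was filtered, $\kappa_k$ are the ordered category thresholds, and $\beta$ is the fixed filtering effect. The fixed-effects analysis replaces $u_p$ and $v_o$ with one dummy-coded coefficient per patient and per observer.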
Computational observer approach for the assessment of stereoscopic visualizations for 3D medical images
We present a computational stereoscopic observer approach inspired by the mechanisms of stereopsis in human
vision that makes decisions based on a set of image pairs. Our stereo observer is constrained to a left and a right
image generated using a visualization operator (ray tracing) to render simulated voxel datasets. We present
the formulation of the observer based on model observer theory and discuss issues regarding simulated data
generation and processing for this approach. The applicability of this observer extends to stereoscopic displays
in the areas of entertainment, industrial, and medical imaging applications.
Image Display
Stereoscopic versus monoscopic detection of masses on breast tomosynthesis projection images
The goal of this study was to assess if stereoscopic viewing of breast tomosynthesis projection images impacted mass
detection performance when compared to monoscopic viewing. The dataset for this study, provided by Hologic, Inc.,
contained 47 craniocaudal cases (23 biopsy proven malignant masses and 24 normals). Two projection images that were
separated by 8 degrees were chosen to form a stereoscopic pair. The images were preprocessed to enhance their contrast
and were presented on a stereoscopic display. Three experienced breast imagers participated in a blinded observer study
as readers. Each case was shown twice to each reader - once in the stereoscopic mode, and once in the monoscopic mode
in a random order. The readers were asked to make a binary decision on whether they saw a mass for which they would
initiate a diagnostic workup or not, and also report the location of the mass and provide a confidence score in the range
of 0-100. The binary decisions were analyzed using the sensitivity-specificity measure, while the confidence scores were
analyzed using the Receiver Operating Characteristic curve (ROC). We also report a statistical analysis of the difference
in partial AUC values greater than 95% sensitivity between the stereoscopic and monoscopic modes.
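The two kinds of reader output described here map onto two standard summaries, sketched below under simplifying assumptions. The binary decisions give a sensitivity-specificity pair, and the 0-100 confidence scores give an empirical ROC area through the Mann-Whitney statistic. The data are invented for illustration, localization is ignored, and the partial-AUC analysis mentioned in the abstract is not reproduced.

```python
def sensitivity_specificity(decisions, truth):
    """Return (sensitivity, specificity) from binary decisions and ground truth."""
    true_pos = sum(1 for d, t in zip(decisions, truth) if d and t)
    true_neg = sum(1 for d, t in zip(decisions, truth) if not d and not t)
    n_pos = sum(1 for t in truth if t)
    n_neg = len(truth) - n_pos
    return true_pos / n_pos, true_neg / n_neg


def auc_mann_whitney(scores, truth):
    """Empirical ROC area: fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct ranking.
    """
    pos = [s for s, t in zip(scores, truth) if t]
    neg = [s for s, t in zip(scores, truth) if not t]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = mass present, 0 = normal.
truth = [1, 1, 1, 0, 0, 0]
decisions = [1, 1, 0, 0, 0, 1]       # "would initiate a workup" yes/no
scores = [90, 70, 40, 10, 40, 60]    # confidence on the 0-100 scale

sens, spec = sensitivity_specificity(decisions, truth)
auc = auc_mann_whitney(scores, truth)
```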
The effect of fixed eye adaptation when using displays with a high luminance range
Calibration of medical review displays according to the DICOM Part 14 Grayscale Standard Display Function (GSDF) is
important in order to obtain consistency in displayed image quality since display technology and viewing conditions may
vary substantially. Unfortunately, GSDF calibration is best suited to low-luminance-range conditions
and is not optimal for modern displays with a high luminance range. Low contrast objects will then obtain a
greater visibility in mid-gray areas compared to similar objects in bright or dark regions. In this study, low contrast
sinusoidal patterns were displayed on a high luminance range monitor under realistic viewing conditions. In order to
simulate the viewing of an x-ray image with both dark and bright regions displayed simultaneously, the luminance of the
patterns ranged from 2 to 600 cd/m2 while the observers were always adapted to the logarithmic average of 35 cd/m2.
The results show a clear relationship between the pattern's deviation from the adaptation luminance level and the
contrast required to detect the pattern. The results also indicate the potential for an improvement in low-contrast
detectability over a large luminance range by adjusting the GSDF for the limited eye adaptation.
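The adaptation level quoted in this abstract can be checked with a short calculation. The sketch assumes that "logarithmic average" means the mean of the two endpoint luminances on a log scale, that is, their geometric mean. The abstract does not say how the value was derived.

```python
import math

# Endpoint luminances from the abstract, in cd/m^2.
l_min, l_max = 2.0, 600.0

# Averaging on a log scale is equivalent to taking the geometric mean.
log_average = math.exp((math.log(l_min) + math.log(l_max)) / 2.0)
```

Under that assumption the result is about 34.6 cd/m2, which rounds to the 35 cd/m2 stated above.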
Perceptual enhancement of arteriovenous malformation in MRI angiography displays
Presenting medical images in an intuitive and usable manner during a procedure is essential.
However, most medical visualization interfaces, particularly those designed for minimally-invasive surgery, suffer
from a number of issues as a consequence of disregarding the limitations of the human perceptual, cognitive, and motor
systems. This matter is even more prominent when the human visual system is overlooked during the design cycle.
One example is the visualization of the neuro-vascular structures in MR angiography (MRA) images. This study
investigates perceptual performance in the usability of a display to visualize blood vessels in MRA volumes
using a contour enhancement technique. Our results show that when contours are enhanced, our participants,
in general, perform faster and with a higher level of accuracy when judging the connectivity of different vessels.
One clinical outcome of such perceptual enhancement is improvement of spatial reasoning needed for planning
complex neuro-vascular operations such as treating Arteriovenous Malformations (AVMs). The success of an
AVM intervention greatly depends on fully understanding the anatomy of vascular structures. However, poor
visualization of pre-operative MRA images makes the planning of such a treatment quite challenging.
Radiologists' eye gaze when reading cranial CT images
Antje Venjakob,
Tim Marnitz,
Jan Mahler,
et al.
Gaze tracking is a common method to assess perceptual processes when reading medical images. However, little
attention has yet been paid to multi-slice images. The present study examines the gaze data of four experienced
radiologists reading 15 cranial Computed Tomography (CCT) scans, five of which contain lesions. The participants
navigated freely through the slices while their eye position was tracked. Participants' visual search performance was
examined in terms of time per case, scrolling pattern (including the number of runs through each case and the number
of oscillations within each case), fixation duration, time to first fixation on a lesion, and the initial dwell time
on a lesion. The results of the study indicate that performance and reading strategy differ between radiologists. The
greatest behavioral differences occurred between the two readers who performed best. One of them, participant 4,
showed extremely short periods of inspection, few oscillations between the slices, short initial dwells on lesions and
a short time to first fixation, whereas participant 2 performed equally well but took longer to read individual cases,
went through the slices with many more oscillations, and showed longer times to first fixation and longer initial dwell
times on lesions. The behavior displayed by participant 4 is consistent with expert behavior when reading 2-dimensional
images. In contrast, participant 2's behavior resembles that of a novice, namely because of the systematic search
pattern employed. The results hint that expertise may be characterized by various and diverse strategies.
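One way a scrolling-pattern measure of this kind could be derived from a log of displayed slice indices is sketched below. The definition used here, one oscillation per reversal of scroll direction, is an assumption made for illustration. The abstract does not define how runs or oscillations were counted.

```python
def count_direction_reversals(slice_indices):
    """Count how often the scroll direction flips in a sequence of slice indices.

    Repeated indices (no movement) are ignored.
    """
    reversals = 0
    previous_direction = 0  # 0 means no movement has been seen yet
    for current, following in zip(slice_indices, slice_indices[1:]):
        step = following - current
        if step == 0:
            continue
        direction = 1 if step > 0 else -1
        if previous_direction != 0 and direction != previous_direction:
            reversals += 1
        previous_direction = direction
    return reversals


# Hypothetical scroll log: forward to slice 4, back to 2, forward again to 5.
example_log = [1, 2, 3, 4, 3, 2, 3, 4, 5]
example_reversals = count_direction_reversals(example_log)
```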
iPads and LCDs show similar performance in the detection of pulmonary nodules
In February 2011 the University of Chicago Medical School distributed iPads to its trainee doctors for use when
reviewing clinical information and images on the ward or clinics. The use of tablet computing devices is becoming
widespread in medicine, with Apple™ heralding them as "revolutionary". The question arises: just because
it is technically achievable to use iPads for clinical evaluation of images, should we do so? The current work assesses the
diagnostic efficacy of iPads when compared with LCD secondary display monitors for identifying lung nodules on chest
x-rays.
Eight examining radiologists of the American Board of Radiology were involved in the assessment, reading chest images
on both the iPad and an off-the-shelf LCD monitor. Thirty chest images were shown to each observer, of which 15
had one or more lung nodules. Radiologists were asked to locate the nodules and score how confident they were with
their decision on a scale of 1-5. An ROC and JAFROC analysis was performed and modalities were compared using
DBM MRMC.
The results demonstrate no significant differences in performance between the iPad and the LCD for the ROC AUC
(p<0.075) or JAFROC FOM (p<0.059) for random readers and random cases. Sample size estimation showed that this
result is significant at a power of 0.8 and an effect size of 0.05 for ROC and 0.07 for JAFROC.
This work demonstrates that for the task of identifying pulmonary nodules, the use of the iPad does not significantly
change performance compared to an off-the-shelf LCD.
ROC Analysis
Quantitative evaluation of the memory bias effect in ROC studies with PET/CT
PURPOSE. The purpose of the study was to evaluate the memory bias effect in ROC experiments with tomographic data
and, specifically, in the evaluation of two different PET/CT protocols for the detection and diagnosis of recurrent thyroid
cancer. MATERIALS AND METHODS. Two readers participated in an ROC experiment that evaluated tomographic
images from 43 patients followed up for thyroid cancer recurrence. Readers evaluated first whole body PET/CT scans of
the patients and then a combination of whole body and high-resolution head and neck scans of the same patients. The
second set was read twice: once within 48 hours of the first set and again at least a month later. The detection
and diagnostic performances of the readers in the three reading sessions were assessed with the DBMMRMC and
LABMRMC software using the area under the ROC curve as a performance index. Performances were also evaluated
by comparing the number and the size of the detected abnormal foci among the three readings. RESULTS. There was
no performance difference between first and second treatments. There were statistically significant differences between
first and third, and second and third treatments showing that memory can seriously affect the outcome of ROC studies.
CONCLUSION. Despite the fact that tomographic data involve numerous image slices per patient, the memory bias
effect is present and substantial and should be carefully eliminated from analogous ROC experiments.
A new parametrization for the three-class ideal observer's decision rule
Despite theoretical and practical difficulties, we are attempting to extend receiver operating characteristic (ROC)
analysis to tasks with more than two classes. Previously we developed explicit analytical expressions for the
behavior of the ideal observer acting on univariate trinormal data, and for the region of support of the ideal
observer's decision variables when acting on bivariate trinormal data. Although explicit calculation of the ideal
observer's behavior for general underlying data is difficult, we have developed a new set of parameters for
describing the ideal observer's decision rule which may aid in analytic or numeric computation of the ideal
observer's behavior.
A nonparametric approach to comparing the areas under correlated LROC curves
In contrast to the ROC assessment paradigm, localization ROC (LROC) analysis provides a means to jointly
assess the accuracy of visual search and detection in an observer study. In a typical multireader, multicase
(MRMC) evaluation, the data sets are paired so that correlations arise in observer performance both between
observers and between image reconstruction methods (or modalities). Therefore, MRMC evaluations motivate the
need for a statistical methodology to compare correlated LROC curves. In this work, we suggest a nonparametric
strategy for this purpose. Specifically, we find that seminal work of Sen on U-statistics can be applied to estimate
the covariance matrix for a vector of LROC area estimates. The resulting covariance estimator is the LROC
analog of the covariance estimator given by DeLong et al. for ROC analysis. Once the covariance matrix is
estimated, it can be used to construct confidence intervals and/or confidence regions for purposes of comparing
observer performance across reconstruction methods. The utility of our covariance estimator is illustrated with
a human-observer LROC evaluation of three reconstruction strategies for fan-beam CT.
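For orientation, a nonparametric point estimate of the LROC area can be sketched as a Mann-Whitney-style count in which a signal-present image contributes only when the signal was localized correctly. This covers the area estimate alone, not the covariance estimator proposed in the paper. Counting rating ties as one half is an assumption, and the data are invented.

```python
def lroc_area(present_ratings, correctly_localized, absent_ratings):
    """Empirical LROC area.

    A (signal-present, signal-absent) pair scores 1 when the signal was
    localized correctly and its rating exceeds the signal-absent rating,
    0.5 when localized correctly with tied ratings, and 0 otherwise.
    """
    total = 0.0
    for rating, located in zip(present_ratings, correctly_localized):
        if not located:
            continue  # a mislocalized signal never counts as a success
        for absent in absent_ratings:
            if rating > absent:
                total += 1.0
            elif rating == absent:
                total += 0.5
    return total / (len(present_ratings) * len(absent_ratings))


# Hypothetical confidence ratings for three signal-present and three
# signal-absent images.
area = lroc_area([5, 4, 3], [True, False, True], [1, 3, 2])
```

With every localization marked correct, the same function reduces to the ordinary empirical ROC area.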
Image recognition and consistency of response
Tamara Miner Haygood M.D.,
John Ryan,
Qing Mary Ashley Liu,
et al.
Purpose: To investigate the connection between conscious recognition of an image previously
encountered in an experimental setting and consistency of response to the experimental question.
Materials and Methods: Twenty-four radiologists viewed 40 frontal chest radiographs and gave their opinion as to the position of a central venous catheter. One-to-three days later they again viewed 40 frontal chest radiographs and again gave their opinion as to the position of the central venous catheter. Half of the radiographs in the second set were repeated images from the first set and half were new. The radiologists were asked of each image whether it had been included in the first set. For this study, we are evaluating only the 20 repeated images. We used the Kruskal-Wallis test and Fisher's exact test to determine the relationship between conscious recognition of a previously interpreted image and consistency in interpretation of the image.
Results: There was no significant correlation between recognition of the image and consistency in response regarding the position of the central venous catheter. In fact, there was a trend in the opposite direction, with radiologists being slightly more likely to give a consistent response with respect to images they did not recognize than with respect to those they did recognize.
Conclusion: Radiologists' recognition of previously-encountered images in an observer-performance study does not noticeably color their interpretation on the second encounter.
Inverse dependence of search and classification performances in lesion localization tasks
Search involves detecting the locations of potential lesions. Classification involves determining if a detected region is a
true lesion. The most commonly used measure of observer performance, namely the area A under the ROC curve, is
affected by both search and classification performances. The aim was to demonstrate a method for separating these
contributions and apply it to several clinical datasets. Search performance S was defined as the square root of 2 times the
perpendicular distance of the end-point of the search-model predicted ROC from the chance diagonal. Classification
performance C was defined as the separation of the unit-variance binormal distributions for signal and noise sites.
Eleven (11) datasets were fitted by the search model and search, classification and trapezoidal A were computed for each
modality and reader combination. Kendall-tau correlations were calculated between the resulting S, C and A pairs.
Kendall correlation (S vs. C) was smaller than zero for all datasets, and the average Kendall correlation was significantly
smaller than 0 (average = -0.401, P = 8.3 × 10⁻⁶). Also, Kendall correlation (A vs. S) was larger than zero for 9 out of 11
datasets and the average Kendall correlation was significantly larger than 0 (average = 0.295, P = 2.9 × 10⁻³). On the
other hand average Kendall correlation (A vs. C) was not significantly different from zero (average = 0.102, P = 0.25).
The results suggest that radiologists may learn to compensate for poor search performance with better classification
performance. This study also indicates that efforts at improving net performance, which currently focus almost
exclusively on improving classification performance, may be more successful if aimed at improving search performance.
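The definition of S given in this abstract simplifies neatly. A point (FPF, TPF) lies at perpendicular distance (TPF - FPF)/√2 from the chance diagonal, so √2 times that distance is just TPF - FPF at the ROC end-point. The sketch below shows this, together with a plain Kendall rank correlation. The tau-a variant (no tie correction) and all numerical values are assumptions made for illustration. The abstract does not say which variant was used.

```python
import math


def search_performance(fpf_end, tpf_end):
    """S = sqrt(2) * perpendicular distance of the ROC end-point from the diagonal."""
    distance = (tpf_end - fpf_end) / math.sqrt(2.0)
    return math.sqrt(2.0) * distance  # equals tpf_end - fpf_end


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical per-reader search (S) and classification (C) values.
s_values = [0.9, 0.7, 0.5, 0.3]
c_values = [1.0, 1.5, 1.2, 2.0]
tau_s_vs_c = kendall_tau_a(s_values, c_values)
```

A negative tau for S against C, as in this invented example, is the pattern the abstract reports for the clinical datasets.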
Image Perception
Outlining and categorising mammographic breast density: expert radiologist perception
The main aim of this study is to investigate the outlining and categorising of mammographic breast density by expert
radiologists in order to help to understand what kind of region radiologists perceive as breast density and how they assess
the density of a mammogram. It investigates inter-radiologist variability in breast density outlining and assessment.
Forty-five normal cranio-caudal view mammograms with a range of appearances of breast density were presented to
twenty radiologists. Each participant was asked to manually outline any mammographic breast density using an interactive
pen tablet and to visually classify mammographic breast density in two ways: by using the BI-RADS density categorization
system of the American College of Radiology, and by estimating the percentage of area of mammographic breast density.
Large differences were found in breast density outlining for all BI-RADS density categories. Scattered and patchy breast
density appeared to be associated with large variation in outlining. There was
moderate inter-radiologist agreement in BI-RADS density categorising (Kappa = 0.489). Breast density is a complex
radiological feature that impacts upon assessment consistency.
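To make the agreement figure concrete, the sketch below computes Cohen's kappa for two raters assigning categories to the same cases. This is a simplified, assumed illustration. With twenty radiologists the study would need a multi-rater statistic or an average over reader pairs, and the abstract does not say which was used. The ratings are invented.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters on the same cases."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    # Agreement expected if each rater assigned categories independently
    # with their own marginal frequencies.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / n ** 2
    return (observed - expected) / (1.0 - expected)


# Hypothetical BI-RADS density categories (1-4) from two readers.
reader_1 = [1, 2, 3, 4, 2, 3]
reader_2 = [1, 2, 3, 3, 2, 4]
kappa = cohens_kappa(reader_1, reader_2)
```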
Measurements of the detectability of hepatic hypovascular metastases as a function of retinal eccentricity in CT images
The large number of slices in volumetric data sets and limited time prevent human observers from exhaustively pointing
their high-resolution processing fovea at all regions in the images. Thus, many image regions are processed with non-foveal
peripheral visual processing. Yet most studies quantifying human detectability of signals in computer-simulated
textures and medical image backgrounds have measured performance without consideration of the location of the signal
in the observer's eye relative to the fovea (retinal eccentricity). Here, we measure human observer detectability of
signals in CT images as a function of retinal eccentricity. A representative signal was extracted from a liver image and
was added to healthy liver backgrounds at random positions. The retinal eccentricities of the signal were manipulated by
varying the position of the point at which observers fixated with their eyes. Real-time video-based eye tracking was used
to ensure steady fixation. High contrast fiduciary marks indicated the only possible location of the signal which was
present in 50% of the images. Single CT slices were presented for 200 ms or 1 second. The observer was instructed to
decide whether the image contained a signal (yes/no task). We probed 6 eccentricities with 420 decision trials per
eccentricity. We found a large detectability degradation with retinal eccentricity with d' degrading by 50% at an
eccentricity of 9 degrees for a 200 ms display time.
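For a yes/no task like the one described, the detectability index d' is commonly computed from the hit rate and false-alarm rate under an equal-variance Gaussian model. The sketch below uses that convention as an assumption. The abstract does not state how d' was estimated, and the rates are invented.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Equal-variance Gaussian detectability: z(hit rate) - z(false-alarm rate)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical rates at fixation and at a peripheral location.
d_foveal = d_prime(0.84, 0.16)
d_peripheral = d_prime(0.70, 0.30)
```

Rates of exactly 0 or 1 make the inverse normal undefined, so in practice they are usually adjusted before this step.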
Signal-known exactly detection performance in tomosynthesis: does volume visualization help human observers?
Tomosynthesis produces three-dimensional images of an object, with non-isotropic resolution.
Tomosynthesis images are typically read by human observers in a stack viewing mode, displaying
planes through the tomosynthesis volume. The purpose of this study was to investigate whether
human performance in a signal-known exactly (SKE) detection task improves when the entire
tomosynthesis volume is available to the observer, compared to displaying a single plane through
the signal center. The goal of this study was to improve understanding of human performance
in order to aid development of observer models for tomosynthesis.
Human performance was measured using sequential 2-alternative forced choice experiments.
In each trial, the observer was first asked to select the signal-present ROI based on a single
2D tomosynthesis plane. Then, scrolling was enabled and the observer was able to select the
signal-present ROI, based on knowledge of the entire volume. The number of correct decisions
for 2D and 3D viewing was recorded, and the number of trials was recorded for which a score
increase or decrease occurred between 2D and 3D readings.
Test images consisted of tomosynthesis reconstructions of simulated breast tissue, where
breast tissue was modeled as binarized power-law noise. Tomosynthesis reconstructions of designer
nodules of r = 250 μm, r = 1 mm, and r = 4 mm were added to the structured backgrounds.
For each signal size, observers scored 256 trials with signal amplitude set so that the proportion
of correct answers in the single slice was 90%.
For two observers, a slight increase in performance was found when adjacent tomosynthesis
slices were displayed, for the two larger signals. Statistical significance could not be established.
The number of decision changes was analyzed for each observer. For these two observers,
the number of decision changes that led to a score increase or decrease was outside the 95%
confidence interval of the decision change being random, indicating that for these two observers,
displaying the tomosynthesis stack did boost performance. For the other two observers, decision
changes that increased or decreased the score were within the 95% confidence interval of guessing,
indicating that the decision changes were due to a satisfaction of search effect.
However, the results also indicate that the performance increase is small and the majority of
information appears to be contained in the tomosynthesis slice that corresponds to the center
of the lesions.
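The decision-change analysis can be illustrated with a binomial sketch. If changes between the 2D and 3D responses were pure guesses, each change would be equally likely to raise or lower the score, so the number of score increases among n changes would follow Binomial(n, 0.5). The function names and counts below are hypothetical, and the abstract does not state how its 95% interval was constructed.

```python
from math import comb


def chance_interval(n_changes, level=0.95):
    """Central interval for the number of score increases under Binomial(n, 0.5)."""
    tail = (1.0 - level) / 2.0
    cumulative = 0.0
    lower = 0
    for k in range(n_changes + 1):
        cumulative += comb(n_changes, k) / 2 ** n_changes
        if cumulative > tail:
            lower = k  # smallest count whose cumulative probability exceeds the tail
            break
    # The distribution is symmetric, so the upper bound mirrors the lower one.
    return lower, n_changes - lower


def outside_chance(n_increase, n_decrease, level=0.95):
    """True when the observed number of increases falls outside the chance interval."""
    lower, upper = chance_interval(n_increase + n_decrease, level)
    return n_increase < lower or n_increase > upper


# Hypothetical observer: 16 changes raised the score and 4 lowered it.
interval_for_20 = chance_interval(20)
is_nonrandom = outside_chance(16, 4)
```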
Satisfaction of search errors detecting subtle fractures diminish in the presence of more serious injuries
Satisfaction of search (SOS) occurs when an abnormality is missed because another abnormality has been detected
in radiology examinations. We studied whether the severity of a detected fracture determines
whether subsequent fractures are overlooked. Each of 70 simulated multitrauma patients presented radiographs of three
anatomic areas. Readers evaluated each patient under two experimental conditions: when the images of the first anatomic
area included a severe fracture (the SOS condition), and when it did not (the control condition). The SOS effect was
measured on detection accuracy for subtle test fractures presented on examinations of the second or third anatomic areas.
SOS reduction in ROC area for detecting subtle test fractures with the addition of a major fracture to the first radiograph
was not observed. The same absence of SOS that had been observed when high-morbidity added fractures were
presented on CT was replicated with the high-morbidity added fractures presented on radiographs. This finding rules out
the possibility that there was no SOS in the prior study with CT because SOS effects do not extend from one imaging
modality to another. Taken together, the evidence rejects the hypothesis that the severity of a detected fracture determines the SOS for subsequently viewed fractures.
Predictive modeling of human perception subjectivity: feasibility study of mammographic lesion similarity
Songhua Xu,
Kathleen Hudson M.D.,
Yong Bradley M.D.,
et al.
The majority of clinical content-based image retrieval (CBIR) studies disregard human perception subjectivity, aiming to
duplicate the consensus expert assessment of the visual similarity on example cases. The purpose of our study is twofold:
(i) discern better the extent of human perception subjectivity when assessing the visual similarity of two images
with similar semantic content, and (ii) explore the feasibility of personalized predictive modeling of visual similarity.
We conducted a human observer study in which five observers of various expertise were shown ninety-nine triplets of
mammographic masses with similar BI-RADS descriptors and were asked to select the two masses with the highest
visual relevance. Pairwise agreement ranged between poor and fair among the five observers, as assessed by the kappa
statistic. The observers' self-consistency rate was remarkably low, based on repeated questions where either the
orientation or the presentation order of a mass was changed. Various machine learning algorithms were explored to
determine whether they can predict each observer's personalized selection using textural features. Many algorithms
performed with accuracy that exceeded each observer's self-consistency rate, as determined using a cross-validation
scheme. This accuracy was statistically significantly higher than would be expected by chance alone (two-tailed p-value
ranged between 0.001 and 0.01 for all five personalized models). The study confirmed that human perception
subjectivity should be taken into account when developing CBIR-based medical applications.
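The pairwise agreement measure mentioned in the abstract above can be illustrated with a minimal sketch of Cohen's kappa for two observers. The abstract does not say which kappa variant was used, so the unweighted two-rater form is an assumption here, and the pair labels and observer choices are hypothetical, not study data.

```python
from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters' categorical choices on the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters always used the same single category
    return (observed - expected) / (1.0 - expected)

# Hypothetical data: for each triplet (A, B, C), the pair judged most similar.
observer_1 = ["AB", "AC", "BC", "AB", "AC", "AB"]
observer_2 = ["AB", "BC", "BC", "AC", "AC", "AB"]
kappa = cohen_kappa(observer_1, observer_2)
print(f"kappa = {kappa:.2f}")
```

The same function could be applied to one observer's answers on repeated questions to quantify self-consistency, again as an illustration rather than the study's procedure.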
Digital Pathology II: Joint Session with Conferences 8314 and 8315
Analysis of slide exploration strategy of cytologists when reading digital slides
Liron Pantanowitz M.D.,
Anil Parwani M.D.,
Eugene Tseytlin,
et al.
Cytology is the sub-domain of Pathology that deals mainly with the diagnosis of cellular
changes caused by disease. Current clinical practice involves a cytotechnologist who manually
screens glass slides containing fixed cytology material using a light microscope. Screened slides
are then forwarded to a specialized pathologist, a cytopathologist, for microscopic review and final
diagnostic interpretation. If no abnormalities are detected, the specimen is interpreted as "normal";
otherwise, the abnormalities are marked with a pen on the glass slide by the cytotechnologist and
then are used to render a diagnosis. As Pathology is migrating towards a digital environment it is
important to determine whether these crucial screening and diagnostic tasks can be performed as
well using digital slides as the current practice with glass slides. The purpose of this work is to make
this assessment, by using a set of digital slides depicting cytological materials of different disease
processes in several organs, and then to analyze how different cytologists including
cytotechnologists, cytopathologists and cytotechnology-trainees explored the digital slides. We will
(1) collect visual search data from the cytologists as they navigate the digital slides, as well as
record any electronic marks (annotations) made by the cytologists; (2) convert the dynamic visual
search data into a static representation of the observers' exploration strategy using 'search maps';
and (3) determine slide coverage, per viewing magnification range, for each group. We have
developed a virtual microscope to collect this data, and this interface allows for interactive
navigation of the virtual slide (including panning and zooming), as well as annotation of reportable
findings. Furthermore, all interactions with the interface are time stamped, which allows us to
recreate the cytologists' search strategy.
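Step (3) above, slide coverage per viewing magnification range, can be sketched as follows. This is only an illustration under stated assumptions: the log format `(x, y, width, height, magnification_label)`, the tile size, and the function names are hypothetical and are not taken from the authors' virtual microscope.

```python
def slide_coverage(viewports, slide_w, slide_h, tile=100):
    """Fraction of tile-by-tile grid cells touched by at least one viewport (x, y, w, h)."""
    cols = -(-slide_w // tile)  # ceiling division
    rows = -(-slide_h // tile)
    seen = set()
    for x, y, w, h in viewports:
        # Clip the viewport to the slide bounds.
        left, top = max(0, x), max(0, y)
        right, bottom = min(slide_w, x + w), min(slide_h, y + h)
        if right <= left or bottom <= top:
            continue  # viewport lies entirely outside the slide
        for col in range(left // tile, (right - 1) // tile + 1):
            for row in range(top // tile, (bottom - 1) // tile + 1):
                seen.add((col, row))
    return len(seen) / (cols * rows)

def coverage_by_magnification(log, slide_w, slide_h, tile=100):
    """Group logged viewports by magnification label and compute coverage per group."""
    by_mag = {}
    for x, y, w, h, mag in log:
        by_mag.setdefault(mag, []).append((x, y, w, h))
    return {mag: slide_coverage(v, slide_w, slide_h, tile) for mag, v in by_mag.items()}

# Hypothetical log entries in slide pixel coordinates.
log = [
    (0, 0, 200, 100, "low"),
    (150, 50, 100, 100, "low"),
    (0, 0, 100, 100, "high"),
]
coverage = coverage_by_magnification(log, slide_w=400, slide_h=200)
print(coverage)
```

A grid of cells is a deliberately coarse choice; a real search map would likely also weight cells by dwell time, which the time-stamped log described above would allow.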
Influence of LCD color reproduction accuracy on observer performance using virtual pathology slides
The use of color LCDs in medical imaging is growing as more clinical specialties use digital images as a resource in
diagnosis and treatment decisions. Telemedicine applications such as telepathology, teledermatology and
teleophthalmology rely heavily on color images. However, standard methods for calibrating, characterizing and profiling
color displays do not exist, resulting in inconsistent presentation. To address this, we developed a calibration,
characterization and profiling protocol for color-critical medical imaging applications. Physical characterization of
displays calibrated with and without the protocol revealed high color reproduction accuracy with the protocol. The
present study assessed the impact of this protocol on observer performance. A set of 250 breast biopsy virtual slide
regions of interest (half malignant, half benign) were shown to 6 pathologists, once using the calibration protocol and
once using the same display in its "native" off-the-shelf uncalibrated state. Diagnostic accuracy and time to render a
decision were measured. In terms of ROC performance, Az (area under the curve) calibrated = 0.8640; uncalibrated =
0.8558. No statistically significant difference (p = 0.2719) was observed. In terms of interpretation speed, mean
calibrated = 4.895 sec, mean uncalibrated = 6.304 sec, a statistically significant difference (p = 0.0460). Early results suggest
a slight advantage diagnostically for a properly calibrated and color-managed display and a significant potential
advantage in terms of improved workflow. Future work should be conducted using different types of color images that
may be more dependent on accurate color rendering and a wider range of LCDs with varying characteristics.
Compressing virtual pathology slides: human and model observer evaluation
We aim to improve telepathology images for diagnosis using compression based on information about the human visual
system. The underlying goal is to demonstrate the utility of a visual discrimination model (VDM) for predicting observer
performance. 100 ROIs from breast biopsy virtual slides, uncompressed and at 5 levels of compression (8:1, 16:1, 32:1, 64:1,
128:1), were shown to 6 pathologists to determine benign vs malignant. There was a decrease in performance as a
function of compression (F = 14.58, p< 0.0001). The visibility of compression artifacts in the test images was predicted
using a VDM. JND metrics were computed for each image including mean, median, ≥90th percentiles, and maximum.
For comparison PSNR and SSIM were also computed. Image distortion metrics were computed as a function of
compression ratio and averaged across test images. All of the JND metrics were found to be highly correlated and
differed primarily in magnitude. Both PSNR and SSIM decreased with decreasing bit rate, correctly reflecting a loss of image
fidelity with increasing compression. The correlation of observer performance in the ROC experiment with image
distortion metrics is shown in Figures 3 and 4. Observer performance (Az) was nearly constant up to a compression ratio
of 32:1, then decreased significantly for 64:1 and 128:1 compression. The initial decline in Az occurred around a mean
JND of 3, Minkowski JND of 4, and 99th percentile JND of 6.5. Virtual pathology slides may be compressible to relatively high
levels before diagnostic accuracy is affected, and the VDM accurately predicts human performance.
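Of the comparison metrics named in the abstract above, PSNR is simple enough to sketch directly. The snippet below assumes 8-bit pixel values (peak of 255) and flat pixel sequences; the function name and sample values are hypothetical, and this is not the authors' implementation.

```python
import math

def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the uncompressed and compressed pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Hypothetical pixel rows: heavier compression gives larger errors and lower PSNR.
original = [10, 20, 30, 40]
mild = [11, 19, 30, 40]
heavy = [14, 16, 34, 36]
print(psnr(original, mild), psnr(original, heavy))
```

PSNR falls as the mean squared error grows, which matches the direction of the fidelity loss the abstract reports for increasing compression.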
Model Observers
Creation of an ensemble of simulated cardiac cases and a human observer study: tools for the development of numerical observers for SPECT myocardial perfusion imaging
Our previous Single Photon Emission Computed Tomography (SPECT) myocardial perfusion imaging (MPI) research
explored the utility of numerical observers. We recently created two hundred and eighty simulated SPECT cardiac cases
using Dynamic MCAT (DMCAT) and SIMIND Monte Carlo tools. All simulated cases were then processed with two
reconstruction methods: iterative ordered subset expectation maximization (OSEM) and filtered back-projection (FBP).
Observer study sets were assembled for both OSEM and FBP methods. Five physicians performed an observer study on
one hundred and seventy-nine images from the simulated cases. The observer task was to indicate detection of any
myocardial perfusion defect using the American Society of Nuclear Cardiology (ASNC) 17-segment cardiac model and
the ASNC five-scale rating guidelines. Human observer Receiver Operating Characteristic (ROC) studies established the
guidelines for the subsequent evaluation of numerical model observer (NO) performance. Several NOs were formulated
and their performance was compared with the human observer performance. One type of NO was based on evaluation of
a cardiac polar map that had been pre-processed using a gradient-magnitude watershed segmentation algorithm. The
second type of NO was also based on analysis of a cardiac polar map but with use of an a priori calculated average image
derived from an ensemble of normal cases.
Volumetric detection tasks with varying complexity: human observer performance
This study explores detection performance trends of human observers with respect to two parameters: task
complexity determined by the frequency content of background and signal, and image viewing mode: single-slice
(ss) versus multi-slice (ms) stack-browsing image presentation. The images are 3D correlated Gaussian
noise with a 3D Gaussian signal centered in the image volume. To vary task complexity, we consider three
different noise kernels while keeping the signal spread constant across all images. In ss mode, only the
central slice of the volume is presented to the observer, while in ms mode all slices are available. All human
readings are conducted in a controlled viewing environment on a 5MP digital mammography medical display.
Overall, in line with the literature, we find that human performance increases in ms relative to ss image
presentation mode. Furthermore, our experiments indicate that the extent of difference between ms and
ss performance is influenced by the properties of image data (level of task complexity): the difference in
performance increases (from ΔAUC= 0.14 to ΔAUC= 0.30) as the difference in the frequency content of the
signal and the background increases. In other words, the benefit of having additional slices available in ms
mode is larger for lower-complexity tasks. Future studies shall focus on comparing the results of the present
study to the existing model observers for volumetric images, ultimately aiming to design an anthropomorphic
model observer for volumetric detection tasks.
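The ΔAUC comparison between ss and ms modes reported above can be illustrated with a nonparametric AUC estimate. This is a minimal sketch that assumes the empirical (Mann-Whitney) estimator; the abstract does not state how AUC was computed, and the confidence ratings below are hypothetical.

```python
def auc_from_ratings(signal_absent, signal_present):
    """Empirical AUC: P(present rating > absent rating) + 0.5 * P(tie)."""
    if not signal_absent or not signal_present:
        raise ValueError("both rating lists must be non-empty")
    wins = 0.0
    for p in signal_present:
        for a in signal_absent:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5  # ties count as half a win
    return wins / (len(signal_present) * len(signal_absent))

# Hypothetical confidence ratings from one observer in each viewing mode.
auc_ss = auc_from_ratings(signal_absent=[1, 2, 2, 3], signal_present=[2, 3, 3, 4])
auc_ms = auc_from_ratings(signal_absent=[1, 1, 2, 2], signal_present=[3, 4, 4, 5])
delta_auc = auc_ms - auc_ss
print(auc_ss, auc_ms, delta_auc)
```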
Performance characteristics of a visual-search human-model observer with sparse PET image data
As predictors of human performance in detection-localization tasks, statistical model observers can have problems
with tasks that are primarily limited by target contrast or structural noise. Model observers with a visual-search
(VS) framework may provide a more reliable alternative. This framework provides for an initial holistic search
that identifies suspicious locations for analysis by a statistical observer. A basic VS observer for emission
tomography focuses on hot "blobs" in an image and uses a channelized nonprewhitening (CNPW) observer for
analysis. In [1], we investigated this model for a contrast-limited task with SPECT images; herein, a statistical-noise-limited
task involving PET images is considered. An LROC study used 2D image slices with liver, lung
and soft-tissue tumors. Human and model observers read the images in coronal, sagittal and transverse display
formats. The study thus measured the detectability of tumors in a given organ as a function of display format.
The model observers were applied under several task variants that tested their response to structural noise both
at the organ boundaries alone and over the organs as a whole. As measured by correlation with the human data,
the VS observer outperformed the CNPW scanning observer.
Theoretical performance analysis of multislice channelized Hotelling observers
Quality assessment of 3D medical images is becoming increasingly important, because of clinical practice rapidly moving
in the direction of volumetric imaging. In a recent publication, three multi-slice channelized Hotelling observer (msCHO)
models are presented for the task of detecting 3D signals in multi-slice images, where each multi-slice image is inspected
in a so-called stack-browsing mode. The observer models are based on the assumption that humans observe multi-slice
images in a simple two-stage process, and each of the models implements this principle in a different way.
In this paper, we investigate the theoretical performance, in terms of detection signal-to-noise ratio (SNR), of msCHO
models, for the task of detecting a separable signal in a Gaussian background with separable covariance matrix. We
find that, despite the differences in architecture of the three models, they all have the same asymptotic performance in
this task (i.e., when the number of training images tends to infinity). On the other hand, when backgrounds with nonseparable
covariance matrices are considered, the third model, msCHOc, is expected to perform slightly better than the
other msCHO models (msCHOa and msCHOb), but only when sufficient training images are provided. These findings
suggest that the choice between the msCHO models mainly depends on the experiment setup (e.g., the number of available
training samples), while the relation to human observers depends on the particular choice of the "temporal" channels that
the msCHO models use.
Utilizing the Hotelling template as a tool for CT image reconstruction algorithm design
Design of image reconstruction algorithms for CT can be significantly aided by useful metrics
of image quality. Useful metrics, however, are difficult to develop due to the high-dimensionality
of the CT imaging system, lack of spatial invariance in the imaging system, and a high degree of
correlation among the image voxels. Although true task-based evaluation on realistic imaging tasks
can be time-consuming, and a given task may be insensitive to the image reconstruction algorithm,
task-based metrics can still prove useful in many contexts. For example, model observers that
mimic performance of the imaging system on specific tasks can provide a low-dimensional measure
of image quality while still accounting for many of the salient properties of the system and object
being scanned. In this work, ideal observer performance is computed on a single detection task.
The modeled signal for detection is taken to be very small - size on the order of a detector bin -
and inspection of the accompanying Hotelling template is suggested. We hypothesize that improved
detection on small signals may be sensitive to the reconstruction algorithm. Further, we hypothesize
that structurally simple Hotelling templates may correlate with high human observer performance.
Observer Performance
Diagnostic accuracy of digital mammography versus tomosynthesis: effect of radiologists' experience
Purpose: To investigate whether readers' experience affects performance in a study comparing 2D digital
mammography (2D) with 2-view (CC and MLO) or 1-view (MLO) tomosynthesis.
Materials and Methods: One-hundred-thirty 2D cases were collected from screening assessment and referral clinics; 64
of the cases had verified abnormalities and the remaining were confirmed normal. Two-view tomosynthesis images were
obtained from the same patients. Ten accredited readers (5 with ≥ 10 years experience in mammography and 5 with < 10
years) classified the cases in terms of malignancy (rate 0-5), and recall (yes/no), for both modalities. A second
experiment was performed with the same cases, with 10 other readers (again 5 experienced / 5 less experienced), but
using 2D and 1-view tomosynthesis as the two modalities. The multi-reader-multi-case ROC method was applied and the
significance of diagnostic accuracy difference of 2D vs tomosynthesis was calculated, as a function of experience and for
each experiment. Recall rate (RR) on malignant and benign cases was also calculated, along with reading time.
Results: No significant difference was found between 2D and 2-view tomosynthesis for experienced readers (p-value =
0.25); for less experienced readers the difference was significant (p-value = 0.03). No significant difference was found
between 2D and 1-view tomosynthesis, independent of readers' experience. RR for benign cases decreased for
tomosynthesis (for both 1- and 2-view), independent of experience. Average reading time per case was 79 s (range 65-91 s)
and 134 s (range 119-158 s) for experienced readers; 56 s (range 46-67 s) and 115 s (range 97-142 s) for non-experienced
readers, for 2D and 2-view tomosynthesis respectively. Reading time was 74 s (range 43-98 s) and 99 s (range 73-117 s)
for experienced readers; 74 s (range 62-85 s) and 94 s (range 82-137 s) for non-experienced readers, for 2D and 1-view
tomosynthesis respectively.
Conclusions: For experienced readers, there is no evidence of improved diagnostic accuracy when using 2-view or 1-view
tomosynthesis, while less experienced readers perform better with 2-view tomosynthesis than with 2D images.
Tomosynthesis reduces the number of recalls of benign cases, without hindering cancer detection.
Is diagnostic accuracy for detecting pulmonary nodules in chest CT reduced after a long day of reading?
Radiologists are reading more cases with more images, especially in CT and MRI and thus working longer hours than
ever before. There have been concerns raised regarding fatigue and whether it impacts diagnostic accuracy. This study
measured the impact of reader visual fatigue by assessing symptoms, visual strain via dark focus of accommodation, and
diagnostic accuracy. Twenty radiologists and 20 radiology residents were given two diagnostic performance tests
searching CT chest sequences for a solitary pulmonary nodule before (rested) and after (tired) a day of clinical reading.
10 cases used free search and navigation, and the other 100 cases used preset scrolling speed and duration. Subjects filled
out the Swedish Occupational Fatigue Inventory (SOFI) and the oculomotor strain subscale of the Simulator Sickness
Questionnaire (SSQ) before each session. Accuracy was measured using ROC techniques. Using Swensson's technique
yields an ROC area = 0.86 rested vs. 0.83 tired, p (one-tailed) = 0.09. Using Swensson's LROC technique yields an area
= 0.73 rested vs. 0.66 tired, p (one-tailed) = 0.09. Using Swensson's Loc Accuracy technique yields an area = 0.77 rested
vs. 0.72 tired, p (one-tailed) = 0.13. Subjective measures of fatigue increased significantly from early to late reading. To
date, the results support our findings with static images and detection of bone fractures. Radiologists at the end of a long
work day experience greater levels of measurable visual fatigue or strain, contributing to a decrease in diagnostic
accuracy. The decrease in accuracy was not as great, however, as with static images.
Indirect detection of pulmonary nodule on low-pass filtered and original x-ray images during limited and unlimited display times
Aim: This study evaluates the assumption that the global impression is created based on low spatial frequency components
of posterior-anterior chest radiographs. Background: Expert radiologists precisely and rapidly allocate visual attention
to pulmonary nodules in chest radiographs. Moreover, the most frequent accurate decisions are produced in the shortest
viewing time; thus, the first hundred milliseconds of image perception seem to be crucial for correct interpretation.
The medical image perception model assumes that during holistic analysis experts extract information based on low spatial
frequency (SF) components and create a mental map of suspicious locations for further inspection. The global
impression results in flagged regions for detailed inspection with foveal vision. Method: Nine chest experts and nine
non-chest radiologists viewed two sets of randomly ordered chest radiographs under 2 timing conditions: (1) 300 ms; (2)
free search in unlimited time. The same radiographic cases of 25 normal and 25 abnormal digitized chest films
constituted two image sets: low-pass filtered and unfiltered. Subjects were asked to detect nodules and rank confidence
level. MRMC ROC DBM analyses were conducted. Results: Experts had improved ROC AUC when high SF
components were displayed (p=0.03) or when low SF components were viewed under unlimited time (p=0.02) compared
with low SF 300 ms viewings. In contrast, non-chest radiologists showed no significant changes when high SF components were
displayed under flash conditions compared with free search or when low SF components were viewed under unlimited
time compared with flash. Conclusion: The current medical image perception model accurately predicted performance
for non-chest radiologists; however, chest experts appear to benefit from high SF features during the global impression.
Are improved rater reliability results associated with faster reaction times after rater training for judgments of laryngeal mucus?
Heather Shaw Bonilha,
Amy Dawson,
Katlyn McGrattan
Mucus aggregation on the vocal folds, a common complaint amongst persons with voice disorders, has been visually
rated on four parameters: type, pooling, thickness, and location. Rater training is used to improve the reliability and
accuracy of these ratings. The goal of this study was to evaluate the effect of training on rater reliability, accuracy and
response time.
Two raters scored mucus aggregation from 120 stroboscopic exams after a brief introductory session and again after a
thorough training session. Reliability and accuracy were calculated in percent agreement. Two-tailed paired t-tests were
used to assess differences in reaction time for ratings before and after training.
Inter-rater reliability improved from 79% pre-training to 92% post-training. Intra-rater reliability improved from 77% to
91% for Rater 1 and 80% to 88% for Rater 2 following training. Accuracy improved from 80% to 96% for Rater 1 and
76% to 95% for Rater 2 from pre- to post-training. Reaction time decreased for both raters (p=0.025).
These findings further our understanding of observer performance on judgments of laryngeal mucus. These results
suggest that rater training increases reliability and accuracy while decreasing reaction time. Future studies should assess
the relationship of these judgments and voice changes.
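The two quantities used in the abstract above, percent agreement and a paired t-test on reaction times, can be sketched with the standard library. The ratings and timings below are hypothetical, the function names are illustrative, and only the t statistic is computed (a p-value would need a t distribution, for example from SciPy).

```python
import math
from statistics import mean, stdev

def percent_agreement(ratings_a, ratings_b):
    """Percentage of items on which two sets of categorical ratings match."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100.0 * matches / len(ratings_a)

def paired_t_statistic(before, after):
    """t statistic of the paired differences (before - after), with n - 1 degrees of freedom."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [b - a for b, a in zip(before, after)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical mucus-thickness ratings from two raters on four exams.
rater_1 = ["thick", "thin", "thin", "none"]
rater_2 = ["thick", "thin", "thick", "none"]
agreement = percent_agreement(rater_1, rater_2)

# Hypothetical reaction times in seconds for the same exams, pre- and post-training.
pre_training = [5.0, 6.0, 7.0, 6.0]
post_training = [4.0, 4.0, 6.0, 5.0]
t_stat = paired_t_statistic(pre_training, post_training)
print(agreement, t_stat)
```

A positive t statistic here means reaction times were longer before training than after, which is the direction the abstract reports.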
Assessment of change in breast density: reader performance using synthetic mammographic images
A recent study has shown that breast cancer risk can be reduced by taking Tamoxifen, but only if this results in at least a
10% point reduction in mammographic density. When mammographic density is quantified visually, it is impossible to
assess reader accuracy using clinical images as the ground truth is unknown.
Our aim was to compare three models of assessing density change and to determine reader accuracy in identifying
reductions of 10% points or more. We created 100 synthetic, mammogram-like images comprising 50 pairs designed to
simulate natural reduction in density within each pair. Model I: individual images were presented to readers and density
assessed. Model II: pairs of images were displayed together, with readers assessing density for each image. Model III:
pairs of images were displayed together, and readers asked whether there was at least a 10% point reduction in density.
Ten expert readers participated.
Readers' estimates of percentage density were significantly closer to the truth (6.8%-26.4%) when images were assessed
individually rather than in pairs (9.6%-29.8%). Measurement of change was significantly more accurate in Model II than
Model I (p<0.005). Detecting density changes of at least 10% points in image pairs, mean accuracy was significantly
(p<0.005) lower (58%-88%) when change was calculated from density assessments than in Model III (74%-92%).
Our results suggest that where readers need to identify change in density, images should be displayed alongside one
another. In our study, less accurate assessors performed better when asked directly about the magnitude of the change.
Performance differences across the Atlantic when UK and USA radiologists read the same set of test screening cases
Two groups of experienced radiologists from the UK and the USA read the same set of 40 recent FFDM screening cases
to examine the effects of mammography experience, volume of cases read per year, screening practice and monitor
resolution on performance. Sixteen American radiologists reported these cases using twin DICOM calibrated monitors
which were half the resolution of the clinical mammographic workstations used by 16 UK radiologists. In terms of the
effect of volume of cases read per year, when the group of American radiologists was split into high and low
volume readers (using 5,000 cases p.a. as a criterion), no difference in any performance measure was found. This may be
partly explained by the fact that they were all very experienced, which may have counteracted any case volume
effect here. Comparing the two groups of radiologists, the UK group performed better in terms
of the number of cancers detected although the American group recalled more cases, despite having poorer monitors.
This reflects differences in clinical screening practice between the countries; however, differences simply due to the
reporting monitors used cannot be ruled out. Data from the study were also compared to those from all UK screeners who
had read these cases as either soft copy or as mammographic film.
Poster Session
Dose-optimized slice thickness for routine multislice computed tomography liver examinations
The need to optimize CT protocols with respect to radiation dose is widely recognized. This study uses
phantom-based methodology to investigate the effect of changes in exposure and slice thickness on observer
performance for the detection of low contrast opacities with multislice computed tomography to determine
dose-optimized slice thickness and image noise for routine liver imaging. Methods: A phantom containing an opacity with diameter 9.5 mm and density 10 HU below background was scanned at various exposure and slice thickness settings (range 50-125 mAs and 1-3 mm). An image set consisting of 120 images containing background-only and 60 images containing the opacity in random locations was created. Following Institutional Review Board approval, nine experienced observers viewed the images and scored opacity visualization using a four-point confidence scale. Noise,
contrast-to-noise ratio (CNR), sensitivity, specificity and area under the curve (AUC) were calculated. Comparisons between exposure and slice thickness settings were performed using ROC, Spearman and Wilcoxon techniques. Results: Significant (p<0.05) reductions in AUC and sensitivity occurred when CNR dropped to 0.71 or below and 0.68 or below,
respectively. There was strong correlation between noise and AUC (r = -0.79, p<0.01), noise and sensitivity (r = -0.92,
p<0.001), CNR and AUC (r = -0.90, p<0.001) and CNR and sensitivity (r = 0.61, p<0.05). Conclusion: Observer
performance for the detection of opacities is strongly related to quantum noise and CNR. Dose optimized lesion
detection was achieved with 5mm slice thickness and CNR of 0.72 and noise of 9.05.
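The abstract above does not define how noise and CNR were measured. One common convention, assumed here purely for illustration, takes noise as the standard deviation of background pixel values and CNR as the absolute difference between the mean opacity and mean background values divided by that noise. The HU samples below are hypothetical.

```python
from statistics import mean, pstdev

def contrast_to_noise_ratio(roi_values, background_values):
    """CNR = |mean(ROI) - mean(background)| / SD(background); one common convention."""
    noise = pstdev(background_values)  # population SD of background, used as image noise
    if noise == 0:
        raise ValueError("background has zero variance; CNR is undefined")
    return abs(mean(roi_values) - mean(background_values)) / noise

# Hypothetical HU samples: an opacity 10 HU below a noisy background.
background = [45, 55, 45, 55]
opacity = [40, 40, 40, 40]
cnr = contrast_to_noise_ratio(opacity, background)
print(cnr)
```

Under this definition, thinner slices or lower exposure raise the background standard deviation and so lower the CNR for the same 10 HU contrast.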
Collaborative labeling of malignant glioma with WebMILL: a first look
Malignant gliomas are the most common form of primary neoplasm in the central nervous system, and one of the
most rapidly fatal of all human malignancies. They are treated by maximal surgical resection followed by radiation
and chemotherapy. Herein, we seek to improve the methods available to quantify the extent of tumors using newly
presented, collaborative labeling techniques on magnetic resonance imaging. Traditionally, labeling medical images
has entailed that expert raters operate on one image at a time, which is resource intensive and not practical for very
large datasets. Using many, minimally trained raters to label images has the possibility of minimizing laboratory
requirements and allowing high degrees of parallelism. A successful effort also has the possibility of reducing
overall cost. This potentially transformative technology presents a new set of problems, because one must pose the
labeling challenge in a manner accessible to people with little or no background in labeling medical images and
raters cannot be expected to read detailed instructions. Hence, a different training method has to be employed. The
training must appeal to all types of learners and have the same concepts presented in multiple ways to ensure that all
the subjects understand the basics of labeling. Our overall objective is to demonstrate the feasibility of studying
malignant glioma morphometry through statistical analysis of the collaborative efforts of many, minimally-trained
raters. This study presents preliminary results on optimization of the WebMILL framework for neoplasm labeling
and investigates the initial contributions of 78 raters labeling 98 whole-brain datasets.
Subjective evaluation of user experience in interactive 3D visualization in a medical context
New display technologies enable the use of 3D visualization in a medical context. Even though user performance seems to be enhanced with respect to 2D thanks to the addition of recreated depth cues, human factors, and more particularly visual comfort and visual fatigue, can still be a barrier to the widespread use of these systems. This study aimed at evaluating and comparing two different 3D visualization systems (a market stereoscopic display, and a state-of-the-art multi-view display) in terms of quality of experience (QoE), in the context of interactive medical visualization. An adapted methodology was designed in order to subjectively evaluate the experience of users. 14 medical doctors and 15 medical students took part in the experiment. After solving different tasks using the 3D reconstruction of a phantom object, they were asked to judge the quality of their experience, according to specific features. They were also asked to give their opinion about the influence of 3D systems on their work conditions. Results suggest that medical doctors are open to 3D visualization techniques and are confident concerning their beneficial influence on their work. However, visual comfort and visual fatigue are still an issue with 3D displays. Results obtained with the multi-view display suggest that the use of continuous horizontal parallax might be the future response to these current limitations.
Implementation of combined SVM-algorithm and computer-aided perception feedback for pulmonary nodule detection
This pilot study examines the effect of a novel decision support system in medical image interpretation. This system is
based on combining image spatial frequency properties and eye-tracking data in order to recognize over- and under-calling
errors. Thus, before it can be implemented as a detection-aid schema, training is required during which an SVM-based
algorithm learns to recognize FP from all reported outcomes, and FN from all unreported prolonged dwelled
regions. Eight radiologists inspected 50 PA chest radiographs with the specific task of identifying lung nodules. Twenty-five
cases contained CT proven subtle malignant lesions (5-20 mm), but prevalence was not known by the subjects, who
took part in two sequential reading sessions, the second, without and with support system feedback. MRMC ROC DBM
and JAFROC analyses were conducted and demonstrated significantly higher scores following feedback with p values of
0.04 and 0.03 respectively, highlighting significant improvements in radiology performance once feedback was used.
This positive effect on radiologists' performance might have important implications for future CAD-system
development.
Effect of morphing between unenhanced and multiscale enhanced chest radiographs on pulmonary nodule detection
Aim: This study aims to determine the effectiveness of a novel image-processing algorithm for multi-scale enhancement
of chest radiographs to improve detection and localization of real pulmonary nodules. Background: Our wavelet-based
enhancement method interactively adjusts the contrast of medical images by extracting the spatial frequency components at different scales, followed by a weighting procedure. This study aims to explore the usefulness of this novel procedure for chest image reporting. Method: Sixteen radiologists viewed 50 PA chest radiographs in order to localize pulmonary
nodules. The databank contains 25 normal and 25 abnormal images, with multi-nodule cases. Subjects were allowed to mark an unlimited number of locations and then ranked their confidence of nodule presence on a 5-level scale. Subjects viewed all cases in at least two out of three conditions: unprocessed, enhanced and with morphing between
these two. MRMC ROC and JAFROC analyses were conducted. Results: No significant differences were found in ROC
AUC values across modalities and specialities. Only localization performance with the morphing tool was significantly higher (F(1,8)=13.303, p=0.007) for chest expert (JAFROC FOM=0.6355) than for non-chest (JAFROC FOM=0.4675) radiologists. Conclusion: Radiologists specialized in chest image interpretation performed consistently well in localizing pulmonary nodules, whereas non-chest radiologists suffered from the distracting effect of the morphing tool.
Effect of selective suppression of spatial frequency domain noise on visual detection of a sample object in an inhomogeneous background
Show abstract
This study aims to investigate the effect of selective suppression of spatial frequency (SF) domain Gaussian white noise
on visibility of a sample object in inhomogeneous backgrounds. SF-specific variation in signal-to-noise ratio due to
selective signal averaging in the SF domain is a consequence of some MRI acquisition methods. This study models
the potential effect on visibility of an object in a complex image. A single disc was randomly positioned in 25 of 50
synthetic clustered lumpy background images. Neutral, low-, mid- and high-frequency-suppressed Gaussian white noise
was added in the frequency domain to simulate SF-weighted MRI signal averaging. Twelve readers performed visual
searching and localization tasks on ordered sets. Subjects were asked to detect and locate discs and to rank confidence
level. Sensitivity, specificity and ROC analyses were performed. Readers achieved significantly higher ROC AUC (Az)
scores (p<0.001), case-based sensitivity (p<0.001) and target-based sensitivity (p<0.001) with images in which low
SF noise was suppressed. Also, significantly higher case-based sensitivity (p=0.005), target-based sensitivity (p=0.022)
and Az-values (p=0.01) were scored under mid SF noise filtration. No significant differences were observed when
images with SF-neutral noise suppression were compared with high SF noise suppression. In conclusion, an increase in low
and also mid SF signal-to-noise ratio significantly improves human performance in visual detection of simple
targets in inhomogeneous backgrounds and suggests that a low SF bias in MRI signal averaging may enhance diagnostic
quality.
Comparison of 2D versus 3D mammography with screening cases: an observer study
Show abstract
Breast cancer is the most common type of non-skin cancer in women. 2D mammography is a screening tool to aid in the
early detection of breast cancer, but has diagnostic limitations of overlapping tissues, especially in dense breasts. 3D
mammography has the potential to improve detection outcomes by increasing specificity, and a new 3D screening tool
with a 3D display for mammography aims to improve performance and efficiency as compared to 2D mammography.
An observer study using human studies collected from was performed to compare traditional 2D mammography with
this new 3D mammography technique. A prior study using a mammography phantom revealed no difference in
calcification detection, but improved mass detection in 2D as compared to 3D. There was a significant decrease in
reading time for masses, calcifications, and normals in 3D compared to 2D, however, as well as more favorable
confidence levels in reading normal cases.
Data for this current study is currently being obtained, and a full report should be available in the next few weeks.
A potential method to identify poor breast screening performance
Show abstract
In the UK all breast screeners undertake the PERFORMS scheme where they annually read case sets of challenging
cases. From the subsequent data it is possible to identify any individual who is performing significantly lower than their
peers. This can then facilitate them being offered further targeted training to improve performance. However, currently
this under-performance can only be calculated once all screeners have taken part, which means the feedback can
potentially take several months. To determine whether such performance outliers could usefully be identified,
at least approximately, much earlier, the data from the last round of the scheme were re-analysed. From the information of 283
participants, 1,000 groups of them were selected randomly for fixed group sizes varying from four to 50 individuals.
After applying bootstrapping on 1,000 groups, a distribution of low performance threshold values was constructed. Then
the accuracy of estimation was determined by calculating the median value and standard error of this distribution as
compared with the known actual results. Data indicate that increasing sample sizes improved the estimation of the
median and decreased the standard error. Using information from as few as 25 individuals allowed an approximation of
the known outlier cut off value and this improved with larger sample sizes. This approach is now implemented in the
PERFORMS scheme to enable individuals who have difficulties, as compared to their peers, to be identified very early
after taking part which can then help them to improve their performance.
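The resampling procedure described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the scheme's implementation: the abstract does not define the outlier cut-off, so `low_threshold` (mean minus two standard deviations) is an assumed stand-in, and the function names and parameters are hypothetical.

```python
import random
import statistics


def low_threshold(scores, k=2.0):
    """Assumed outlier cut-off: mean minus k standard deviations.

    The abstract does not state how the threshold is defined; this rule
    is only a placeholder for illustration.
    """
    return statistics.mean(scores) - k * statistics.stdev(scores)


def threshold_distribution(all_scores, group_size, n_groups=1000, seed=0):
    """Estimate the cut-off from many small random groups of participants.

    Each group is drawn without replacement from the full set of scores,
    then resampled with replacement (one bootstrap replicate) before the
    cut-off is computed.
    """
    rng = random.Random(seed)
    thresholds = []
    for _ in range(n_groups):
        group = rng.sample(all_scores, group_size)
        replicate = rng.choices(group, k=group_size)
        thresholds.append(low_threshold(replicate))
    return thresholds


def summarize(thresholds):
    """Median of the estimates and their spread (used as a standard error)."""
    return statistics.median(thresholds), statistics.stdev(thresholds)
```

Comparing `summarize(threshold_distribution(scores, n))` for several group sizes `n` against `low_threshold(scores)` on the full data set shows how quickly the early estimate approaches the value known only once everyone has taken part.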
Does the thinking aloud condition affect the search for pulmonary nodules?
Show abstract
Aim: To measure the effect of thinking aloud on perceptual accuracy and visual search behavior during chest radiograph
interpretation for pulmonary nodules. Background: Thinking Aloud (TA) is an empirical research method used by
researchers in cognitive psychology and behavioural analysis. In this pilot study we wanted to examine whether TA had
an effect on the perceptual accuracy and search patterns of subjects looking for pulmonary nodules on adult posterior-anterior
chest radiographs (PA CxR). Method: Seven academics within Medical Radiation Sciences at The University of
Sydney participated in two reading sessions with and without TA. Their task was to localize pulmonary nodules on 30
PA CxR using mouse clicks and rank their confidence levels of nodule presence. Eye-tracking recordings were collected
during both viewing sessions. Time to first fixation, duration of first fixation, number of fixations, cumulative time of
fixation and total viewing time were analysed. In addition, ROC analysis was conducted on collected outcome using
DBM methodology. Results: Time to first nodule fixation was significantly longer (p=0.001) and duration of first
fixation was significantly shorter (p=0.043). No significant difference was observed in ROC AUC scores between
control and TA conditions. Conclusion: Our results confirm that TA has little effect on perceptual ability or
performance, except for prolonging the task. However, there were significant differences in visual search behavior.
Future researchers in radio-diagnosis could use the think aloud condition rather than silence so as to more closely
replicate the clinical scenario.
Effect of lesion blurring on observer performance in AFC experiments using chest CT images
Show abstract
The goal was to analyze the influence of blurring of artificial lesions on observer performance during AFC experiments
in chest CT images. Lesion images were generated by scanning Teflon rods of multiple sizes (3/16", 1/4", 5/16", 3/8",
and 1/2") in a General Electric VCT scanner. Images were reconstructed using Bone and Detail reconstruction
algorithms and cropped for use in AFC experiments. Three sets of artificial lesions (simple disks) were generated
mathematically at the same sizes as the Teflon lesions, with two of the sets blurred with 3x3 and 5x5 averaging kernels.
All lesions were scaled to have the same maximum intensity. Approximately 180 normal chest CT images (both Bone
and Detail algorithm) were collected under IRB exemption for use in 2-AFC experiments. Two observers conducted
AFC experiments using the Teflon lesions with the appropriate CT images, and using the artificial lesions in both sets of
CT images. A performance metric was calculated that allowed comparison of experimental results. For Bone algorithm
images, the Teflon and un-blurred lesions produced equivalent performance. Performance was significantly worse using
the blurred lesions. For the Detail algorithm images, un-blurred lesion performance was significantly better than with the
Teflon lesion. The performance using the 3x3-blurred lesions was the closest to the Teflon lesion performance, though it
was slightly worse. Using these results, it is possible to design artificial lesions of any size for use in AFC experiments
that will result in observer performance equivalent to that when using lesions derived from physical phantoms.
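The construction of the mathematically generated lesions (simple disks, optionally blurred with a 3x3 or 5x5 averaging kernel and scaled to a common maximum intensity) might look like the sketch below. The image size, the zero padding at the borders, and all function names are assumptions made for illustration; the abstract does not give these details.

```python
def make_disk(size, radius):
    """Binary disk of the given radius centred in a size x size image."""
    c = (size - 1) / 2.0
    return [[1.0 if (x - c) ** 2 + (y - c) ** 2 <= radius ** 2 else 0.0
             for x in range(size)]
            for y in range(size)]


def box_blur(img, k):
    """Blur with a k x k averaging kernel (k odd), zero padding outside the image."""
    h, w = len(img), len(img[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += img[yy][xx]
            out[y][x] = total / (k * k)
    return out


def scale_to_peak(img, peak):
    """Rescale so that the maximum pixel value equals `peak`."""
    current = max(max(row) for row in img)
    return [[v * peak / current for v in row] for row in img]


# Three lesion variants with the same maximum intensity (values are arbitrary).
disk = make_disk(21, 5)
unblurred = scale_to_peak(disk, 100.0)
blurred_3x3 = scale_to_peak(box_blur(disk, 3), 100.0)
blurred_5x5 = scale_to_peak(box_blur(disk, 5), 100.0)
```

Scaling after blurring keeps the peak identical across the three sets, so any difference in observer performance reflects edge sharpness and not peak contrast.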
A feasibility assessment of automated FISH image and signal analysis to assist cervical cancer detection
Show abstract
Fluorescence in situ hybridization (FISH) technology provides a promising molecular imaging tool to detect cervical
cancer. Since manual FISH analysis is difficult, time-consuming, and inconsistent, automated FISH image scanning
systems have been developed. Due to the limited focal depth of scanned microscopic images, a FISH-probed specimen needs
to be scanned in multiple layers, which generates a huge amount of image data. To improve the diagnostic efficiency of automated
FISH image analysis, we developed a computer-aided detection (CAD) scheme. In this experiment, four pap-smear
specimen slides were scanned by a dual-detector fluorescence image scanning system that acquired two spectrum images
simultaneously, which represent images of interphase cells and FISH-probed chromosome X. During image scanning,
once a cell signal was detected, the system captured nine image slides by automatically adjusting the optical focus. Based on the
sharpness index and maximum intensity measurement, cells and FISH signals distributed in 3-D space were projected
into a 2-D con-focal image. The CAD scheme was applied to each con-focal image to detect analyzable interphase cells using
an adaptive multiple-threshold algorithm and detect FISH-probed signals using a top-hat transform. The ratio of
abnormal cells was calculated to detect positive cases. In four scanned specimen slides, CAD generated 1676 con-focal
images that depicted analyzable cells. FISH-probed signals were independently detected by our CAD algorithm and an
observer. The Kappa coefficients for agreement between CAD and observer ranged from 0.69 to 1.0 in
detecting/counting FISH signal spots. The study demonstrated the feasibility of applying automated FISH image and
signal analysis to assist cyto-geneticists in detecting cervical cancers.
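Agreement between the CAD scheme and the observer is reported above as a Kappa coefficient. A minimal sketch of unweighted Cohen's kappa for two raters is given below; treating the per-cell FISH spot count as the category label is an assumption, since the abstract does not say how the categories were formed.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equally long lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in set(count_a) | set(count_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical spot counts per cell from CAD and from the observer.
cad_counts = [0, 1, 2, 2, 1, 0, 2, 1]
observer_counts = [0, 1, 2, 1, 1, 0, 2, 2]
kappa = cohens_kappa(cad_counts, observer_counts)
```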
Assembly and evaluation of a training module and dataset with feedback for improved interpretation of digital breast tomosynthesis examinations
Show abstract
The FDA recently approved Digital Breast Tomosynthesis (DBT) for use in screening for the early detection of breast
cancer. However, MQSA qualification for interpreting DBT through training was noted as important. Performance issues
related to training are largely unknown. Therefore, we assembled a unique computerized training module to assess
radiologists' performances before and after using the training module. Seventy-one actual baseline mammograms (no
priors) with FFDM and DBT images were assembled to be read before and after training with the developed module.
Fifty examinations of FFDM and DBT images enriched with positive findings were assembled for the training module.
Depicted findings were carefully reviewed, summarized, and entered into a specially designed training database where
findings were identified by case number and synchronized to the display of the related FFDM plus DBT examinations on
a clinical workstation. Readers reported any findings using screening BIRADS (0, 1, or 2) followed by instantaneous
feedback of the verified truth. Six radiologists participated in the study and reader average sensitivity and specificity
were compared before and after training. Average sensitivity improved and specificity remained relatively the same after
training. Performance changes may be affected by disease prevalence in the training set.
Assessment of two mammographic density related features in predicting near-term breast cancer risk
Show abstract
In order to establish a personalized breast cancer screening program, it is important to develop risk models that have
high discriminatory power in predicting the likelihood of a woman developing an imaging-detectable breast cancer in
the near term (e.g., <3 years after a negative examination in question). In epidemiology-based breast cancer risk models,
mammographic density is considered the second highest breast cancer risk factor (second to woman's age). In this study
we explored a new feature, namely bilateral mammographic density asymmetry, and investigated the feasibility of
predicting near-term screening outcome. The database consisted of 343 negative examinations, of which 187 depicted
cancers that were detected during the subsequent screening examination and 155 that remained negative. We computed
the average pixel value of the segmented breast areas depicted on each cranio-caudal view of the initial negative
examinations. We then computed the mean and difference mammographic density for paired bilateral images. Using
woman's age, subjectively rated density (BIRADS), and computed mammographic density related features we compared
classification performance in estimating the likelihood of detecting cancer during the subsequent examination using
areas under the ROC curves (AUC). The AUCs were 0.63±0.03, 0.54±0.04, 0.57±0.03, 0.68±0.03 when using woman's
age, BIRADS rating, computed mean density and difference in computed bilateral mammographic density, respectively.
Performance increased to 0.62±0.03 and 0.72±0.03 when we fused mean and difference in density with woman's age.
The results suggest that, in this study, bilateral mammographic tissue density is a significantly stronger (p<0.01) risk
indicator than both woman's age and mean breast density.
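The two computed features and their evaluation by area under the ROC curve can be illustrated with the sketch below. The function names are hypothetical, taking the absolute left-right difference is an assumption (the abstract says only "difference"), and the AUC is computed as the nonparametric Mann-Whitney estimate, which may differ from the estimator the authors used.

```python
def density_features(left_mean_pixel, right_mean_pixel):
    """Mean and absolute difference of the average pixel values of the two breasts."""
    mean_density = (left_mean_pixel + right_mean_pixel) / 2.0
    asymmetry = abs(left_mean_pixel - right_mean_pixel)
    return mean_density, asymmetry


def auc(positive_scores, negative_scores):
    """Nonparametric AUC: probability that a positive case scores above a negative one.

    Ties count as one half, which is the Mann-Whitney U statistic divided
    by the number of positive-negative pairs.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))
```

Applying `auc` to the asymmetry feature of the cases that later developed cancer versus those that stayed negative gives one value comparable to the AUCs quoted above.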
Evaluation of low contrast detectability performance using two-alternative forced choice method on computed tomography dose reduction algorithms
Show abstract
Today lowering patient radiation dose while maintaining image quality in Computed Tomography has become a very
active research field. Various iterative reconstruction algorithms have been designed to improve/maintain image quality
for low dose patient scans. Typically radiation dose variation will result in detectability variation for low contrast
objects. This paper assesses the low contrast detectability performance of the images acquired at different dose levels
and obtained using different image generation algorithms via a two-alternative forced choice human observer method.
Filtered backprojection and iterative reconstruction algorithms were used in the study. Results showed that for the
objects and scan protocol used, the iterative algorithm employed in this study has low contrast detectability
performance similar to that of the filtered backprojection algorithm at a four-times lower dose level. The study also demonstrated that a
well-controlled human observer study is feasible for assessing the image quality of a CT system.
Classification of thyroid nodules using a resonance-frequency-based electrical impedance spectroscopy: progress assessment
Show abstract
The incidence of thyroid cancer is rising faster than other malignancies and has nearly doubled in the United States
(U.S.) in the last 30 years. However, classifying between malignant and benign thyroid nodules is often difficult.
Although ultrasound guided Fine Needle Aspiration Biopsy (FNAB) is considered an excellent tool for triaging patients,
up to 25% of FNABs are inconclusive. As a result, definitive diagnosis requires an exploratory surgery and a large
number of these are performed in the U.S. annually. It would be extremely beneficial to develop a non-invasive tool or
procedure that could assist in assessing the likelihood of malignancy of otherwise indeterminate thyroid nodules, thereby
reducing the number of exploratory thyroidectomies that are performed under general anesthesia. In this preliminary
study we demonstrate a unique hand-held Resonance-frequency based Electrical Impedance Spectroscopy (REIS) device
with six pairs of detection probes to detect and classify thyroid nodules using multi-channel EIS output signal sweeps.
Under an Institutional Review Board (IRB)-approved case collection protocol, this REIS device is being tested in our
clinical facility and we have been collecting an initial patient data set since March of this year. Between March and
August of 2011, 65 EIS tests were conducted on 65 patients. Among these cases, six depicted pathology-verified
malignant cells. Our initial assessment indicates the feasibility of easily applying this REIS device and measurement
approach in a very busy clinical setting. The measured resonance frequency differences between malignant and benign
nodules could potentially make it possible to accurately classify indeterminate thyroid nodules.
Registration of T2-weighted and diffusion-weighted MR images of the prostate: comparison between manual and landmark-based methods
Show abstract
Quantitative analysis of multi-parametric magnetic resonance (MR) images of the prostate, including T2-weighted
(T2w) and diffusion-weighted (DW) images, requires accurate image registration. We compared two registration
methods between T2w and DW images. We collected pre-operative MR images of 124 prostate cancer patients (68
patients scanned with a GE scanner and 56 with Philips scanners). A landmark-based rigid registration was done based
on six prostate landmarks in both T2w and DW images identified by a radiologist. Independently, a researcher manually
registered the same images. A radiologist visually evaluated the registration results by using a 5-point ordinal scale of 1
(worst) to 5 (best). The Wilcoxon signed-rank test was used to determine whether the radiologist's ratings of the results
of the two registration methods were significantly different. Results demonstrated that both methods were accurate: the
average ratings were 4.2, 3.3, and 3.8 for GE, Philips, and all images, respectively, for the landmark-based method; and
4.6, 3.7, and 4.2, respectively, for the manual method. The manual registration results were more accurate than the
landmark-based registration results (p < 0.0001 for GE, Philips, and all images). Therefore, the manual method
produces more accurate registration between T2w and DW images than the landmark-based method.
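The paired comparison of the radiologist's ratings can be sketched with a plain implementation of the Wilcoxon signed-rank test. This version uses the normal approximation with a tie correction and no continuity correction, which is an assumption; the abstract does not say which variant or software was used, and the ratings below are invented for illustration.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test for paired samples.

    Returns (W+, z, p). Zero differences are dropped, tied absolute
    differences receive average ranks, and p comes from the normal
    approximation with a tie-corrected variance.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4.0
    variance = n * (n + 1) * (2 * n + 1) / 24.0 - tie_term / 48.0
    z = (w_plus - mean) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return w_plus, z, p_value


# Invented 5-point ratings for the same ten image pairs under each method.
manual_ratings = [5, 5, 4, 5, 4, 5, 5, 4, 5, 5]
landmark_ratings = [4, 4, 4, 4, 3, 5, 4, 4, 4, 4]
w_plus, z, p_value = wilcoxon_signed_rank(manual_ratings, landmark_ratings)
```

With only five rating levels, ties are frequent, so the tie correction matters; for small samples an exact test would be preferable to the normal approximation.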
A systematic review of automated melanoma detection in dermatoscopic images and its ground truth data
Show abstract
Malignant melanoma is the third most frequent type of skin cancer and one of the most malignant tumors, accounting
for 79% of skin cancer deaths. Melanoma is highly curable if diagnosed early and treated properly as survival rate varies
between 15% and 65% from early to terminal stages, respectively. So far, melanoma diagnosis depends subjectively
on the dermatologist's expertise. Computer-aided diagnosis (CAD) systems based on epiluminescence light microscopy
can provide an objective second opinion on pigmented skin lesions (PSL). This work systematically analyzes the
evidence of the effectiveness of automated melanoma detection in images from a dermatoscopic device. Automated
CAD applications were analyzed to estimate their diagnostic outcome. Searching online databases for publication dates
between 1985 and 2011, a total of 182 studies on dermatoscopic CAD were found. With respect to the systematic
selection criteria, 9 studies were included, published between 2002 and 2011. Those studies formed databases of
14,421 dermatoscopic images including both malignant "melanoma" and benign "nevus", with 8,110 images being
available ranging in resolution from 150 x 150 to 1568 x 1045 pixels. Maximum and minimum of sensitivity and
specificity are 100.0% and 80.0% as well as 98.14% and 61.6%, respectively. Area under the receiver operating
characteristic curve (AUC) and pooled sensitivity, specificity and diagnostic odds ratio (DOR) are 0.87, 0.90, 0.81, and
15.89, respectively. Thus, although automated melanoma detection showed good accuracy in terms of sensitivity, specificity, and
AUC, diagnostic performance in terms of DOR was found to be poor. This might be due to the lack of
dermatoscopic image resources (ground truth) that are needed for comprehensive assessment of diagnostic performance.
In future work, we aim at testing this hypothesis by joining dermatoscopic images into a unified database that serves as
a standard reference for dermatology related research in PSL classification.
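For readers unfamiliar with the diagnostic odds ratio, the sketch below shows how sensitivity, specificity and DOR follow from a single 2x2 table. The counts are hypothetical and are not taken from any of the reviewed studies. Note that a pooled DOR from a meta-analysis is estimated across studies and need not equal the value obtained by plugging pooled sensitivity and specificity into this formula.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity and diagnostic odds ratio from one 2x2 table."""
    if fp == 0 or fn == 0:
        raise ValueError("DOR is undefined when fp or fn is zero")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # DOR = odds of a positive test in diseased / odds of a positive test in non-diseased.
    dor = (tp * tn) / (fp * fn)
    return sensitivity, specificity, dor


# Hypothetical counts chosen only to illustrate the arithmetic.
sensitivity, specificity, dor = diagnostic_metrics(tp=90, fn=10, fp=19, tn=81)
```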
User-friendly tools on handheld devices for observer performance study
Show abstract
ROC studies require complex procedures to select cases from many data samples, and to set confidence levels in
each selected case to generate ROC curves. In some observer performance studies, researchers have to develop software
with a specific graphical user interface (GUI) to obtain confidence levels from readers. Because ROC studies can be
designed for various clinical situations, it is difficult to prepare software suited to every ROC study. In
this work, we have developed software for recording confidence levels during observer studies on tiny personal handheld
devices such as the iPhone, iPod touch, and iPad. To confirm the functions of our software, three radiologists performed
observer studies to detect lung nodules using the public database of chest radiograms published by Japan Society of
Radiological Technology. The output in text format conformed to the format for the well-known ROC kit from the University
of Chicago. The time required for reading each case was recorded very precisely.
Studying the relative impact of ghosting and noise on the perceived quality of MR images
Show abstract
In current magnetic resonance (MR) imaging systems, design choices are confronted with a trade-off between structured
(i.e. artifacts) and unstructured noise. The impact of both types of noise on perceived image quality, however, is so far
unknown, while this knowledge would be highly beneficial for further improvement of MR imaging systems. In this
paper, we investigate how ghosting artifacts (i.e. structured noise) and random noise, applied at the same energy level in
the distortion, affect the perceived quality of MR images. To this end, a perception experiment is conducted with human
observers rating the quality of a set of images, distorted with various levels of ghosting and noise. To also understand the
influence of professional expertise on the image quality assessment task, two groups of observers with different levels of
medical imaging experience participated in the experiment: one group contained fifteen clinical scientists or application
specialists, and the other group contained eighteen naïve observers. Experimental results indicate that experts and naïve
observers assess the quality of MR images degraded with ghosting/noise differently. Naïve observers consistently rate
images degraded with ghosting higher than images degraded with noise, independent of the energy level of the
distortion, and of the image content. For experts, the relative impact of ghosting and noise on perceived quality tends to
depend on the energy level of the distortion and on the image content, but overall the energy of the distortion is a
promising metric to predict perceived image quality.
Combined collimator/reconstruction optimization for myocardial perfusion SPECT imaging using polar map-based LROC numerical observer
Show abstract
Polar maps have been used to assist clinicians in diagnosing coronary artery disease (CAD) in single photon emission
computed tomography (SPECT) myocardial perfusion imaging. Herein, we investigate the optimization of collimator
design for perfusion defect detection in SPECT imaging when reconstruction includes modeling of the collimator. The
optimization employs an LROC clinical model observer (CMO), which emulates the clinical task of polar map detection
of CAD. By utilizing a CMO, which better mimics the clinical perfusion-defect detection task than previous SKE based
observers, our objective is to optimize collimator design for SPECT myocardial perfusion imaging when reconstruction
includes compensation for collimator spatial resolution. Comparison of lesion detection accuracy will then be employed
to determine if a lower spatial resolution hence higher sensitivity collimator design than currently recommended could be
utilized to reduce the radiation dose to the patient, imaging time, or a combination of both. As the first step in this
investigation, we report herein on the optimization of the three-dimensional (3D) post-reconstruction Gaussian filtering
and of the number of iterations used to reconstruct the SPECT slices from projections acquired with a low-energy general-purpose
(LEGP) collimator. The optimization was in terms of detection accuracy as determined by our CMO and four
human observers. Both the human observers and all four CMO variants agreed that the optimal post-filtering was with a sigma of
the Gaussian in the range of 0.75 to 1.0 pixels. In terms of number of iterations, the human observers showed a
preference for 5 iterations; however, only one of the variants of the CMO agreed with this selection. The others showed a
preference for 15 iterations. We shall thus proceed to optimize the reconstruction parameters for even higher sensitivity
collimators using this CMO, and then do the final comparison between collimators using their individually optimized
parameters with human observers and three times the test images to reduce the statistical variation seen in our present
results.
Characterizing atherosclerotic plaque with computed tomography: a contrast-detail study
Show abstract
Plaque characterization may benefit from the increasing distinctiveness of the attenuating properties of different soft
plaque components at lower energies. Due to the relatively slight increase in the CT number of the non-adipose soft plaque
at lower tube voltage settings vs. adipose plaque, a higher contrast between atheromatous adipose and non-adipose plaque
may become visible with modern 64 slice systems. A contrast-detail (C-D) phantom with varying plaque composition as
the contrast generating method, was imaged on a commercial 64 slice MDCT system using 80, 120, and 140 kVp
settings. The same phantom was also imaged on a Cone Beam CT (CBCT) system with a lower tube voltage of 75 kVp.
The results of experiments from four different observers on three different plaque types (lipid, fiber, calcific) indicate
that CT attenuation within lipid cores and fibrous masses vary not only with the percentage of lipid or fiber present, but
also with the size of the cores. Furthermore, the C-D curve analysis for all three plaque types reveals that while the noise
constraints prevent visible differentiation of soft plaque at current conventional 64 slice MDCT settings, CBCT exhibits
better visible contrast detectability than its conventional counterpart, with the latter having appreciably better
resolution limits and beneficial lower tube voltages. This low voltage CT technique has the potential to be useful in
composition based diagnosis of carotid vulnerable atherosclerotic plaque.
Quantifying effects of post-processing with visual grading regression
Show abstract
For optimization and evaluation of image quality, one can use visual grading experiments, where observers rate some
aspect of image quality on an ordinal scale. To take into account the ordinal character of the data, ordinal logistic
regression is used in the statistical analysis, an approach known as visual grading regression (VGR). In the VGR model
one may include factors such as imaging parameters and post-processing procedures, in addition to patient and observer
identity. In a single-image study, 9 radiologists graded 24 cardiac CTA images acquired with ECG-modulated tube
current using standard settings (310 mAs), reduced dose (62 mAs) and reduced dose after post-processing. Image quality
was assessed using visual grading with five criteria, each with a five-level ordinal scale from 1 (best) to 5 (worst). The
VGR model included one term estimating the dose effect (log of mAs setting) and one term estimating the effect of post-processing.
The model predicted that 115 mAs would be required to reach an 80% probability of a score of 1 or 2 for
visually sharp reproduction of the heart without the post-processing filter. With the post-processing filter, the
corresponding figure would be 86 mAs. Thus, applying the post-processing corresponded to a dose reduction of 25%.
For other criteria, the dose reduction was estimated at 16-26%. Using VGR, it is thus possible to quantify the potential
for dose-reduction of post-processing filters.
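The dose figures quoted in this abstract can be reproduced with a short sketch. It assumes a proportional-odds model of the form logit P(score <= 2) = intercept + beta_log_dose * ln(mAs) + beta_filter * filter, which matches the two terms named above but omits the patient and observer terms; the coefficient values used in the usage lines are hypothetical and chosen only so that the model returns the reported 115 mAs and 86 mAs.

```python
import math


def dose_for_probability(p, intercept, beta_log_dose, beta_filter=0.0):
    """mAs at which the modelled probability of a score of 1 or 2 equals p.

    Solves logit(p) = intercept + beta_log_dose * ln(mAs) + beta_filter for mAs.
    Pass beta_filter=0.0 for images without the post-processing filter.
    """
    logit = math.log(p / (1.0 - p))
    return math.exp((logit - intercept - beta_filter) / beta_log_dose)


def dose_reduction(dose_without, dose_with):
    """Fractional dose saving at equal image quality."""
    return 1.0 - dose_with / dose_without


# Reported values: 115 mAs without and 86 mAs with the filter.
reduction = dose_reduction(115.0, 86.0)

# Hypothetical coefficients that reproduce those two doses at p = 0.8.
beta_log_dose = 2.0
intercept = math.log(0.8 / 0.2) - beta_log_dose * math.log(115.0)
beta_filter = -beta_log_dose * math.log(86.0 / 115.0)
dose_with_filter = dose_for_probability(0.8, intercept, beta_log_dose, beta_filter)
```

Because dose enters the model on a log scale, the filter effect corresponds to a constant dose ratio of exp(-beta_filter / beta_log_dose), independent of the target probability.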
The effect of compression on confidence during the detection of skull fractures in CT
Show abstract
As part of a study to establish whether detection of cranial vault fractures is affected by JPEG 2000 30:1 and 60:1 lossy
compression when compared to JPEG 2000 lossless compression, we looked at the effects on confidence ratings. 55 CT
images, with three levels of JPEG 2000 compression (lossless, 30:1 & 60:1), were presented to 14 senior radiologists, 12
from the American Board of Radiology and 2 from Australia, 7 of whom were MSK specialists and 7 were
neuroradiologists. 32 images contained a single skull fracture while 23 were normal. Images were displayed on one
calibrated, secondary LCD, in an ambient lighting of 32.2 lux. Observers were asked to identify the presence or absence
of a fracture and, where a fracture was present, to locate it and rate their confidence in its presence. Jack-knifed alternative
free-response receiver operating characteristic (JAFROC) and ROC methodologies were employed, and DBM MRMC
and ANOVA were used to explore differences between the lossless and lossy compressed images. A significant trend of
increased confidence in true and false positive scores was seen with JPEG 2000 lossy 60:1 compression. An ANOVA on
the mean confidence ratings obtained for correct (TP) and incorrect (FP) localizations of skull fractures demonstrated that there
was a significant difference between lossless and 60:1 [FP, p<0.001; TP, p<0.014] and between 30:1 and 60:1 [FP, p<0.014; TP,
p<0.037].
3D brain MR angiography displayed by a multi-autostereoscopic screen
Daniel S. F. Magalhães,
Fádua H. Ribeiro,
Fabrício O. Lima,
et al.
Show abstract
Magnetic resonance angiography (MRA) can be used to examine blood vessels in key
areas of the body, including the brain. In MRA, a powerful magnetic field, radio waves
and a computer produce the detailed images. Physicians use the procedure in brain
imaging mainly to detect atherosclerotic disease in the carotid artery of the neck, which
may limit blood flow to the brain and cause a stroke, and to identify a small aneurysm or
arteriovenous malformation inside the brain.
Multi-autostereoscopic displays provide multiple views of the same scene, rather than just
two, as in autostereoscopic systems. Each view is visible from a different range of positions
in front of the display. This allows the viewer to move left-right in front of the display and
see the correct view from any position.
The use of 3D imaging in the medical field has proven to be a benefit to doctors when
diagnosing patients. For different medical domains a stereoscopic display could be
advantageous in terms of a better spatial understanding of anatomical structures, better
perception of ambiguous anatomical structures, better performance of tasks that require
high level of dexterity, increased learning performance, and improved communication with
patients or between doctors.
In this work we describe a multi-autostereoscopic system and how to produce 3D MRA
images to be displayed with it. We show results of brain MR angiography images and
discuss how 3D visualization can help physicians reach a better diagnosis.
NPS assessment of color medical displays using a monochromatic CCD camera
Show abstract
This paper presents an approach to Noise Power Spectrum (NPS) assessment of color medical displays without using an
expensive imaging colorimeter. Uniform R, G, and B color patterns were shown on the display under study, and
images were taken using a high-resolution monochromatic camera. A colorimeter was used to calibrate the camera
images. Synthetic intensity images were formed as the weighted sum of the R, G, B, and dark-screen images. Finally,
NPS analysis was conducted on the synthetic images. The proposed method removes the need for an expensive imaging
colorimeter in NPS evaluation and suggests a potential solution for routine color medical display QA/QC in
clinical settings, especially when imaging of display devices is desired.
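As a rough illustration of the two steps described above (weighted sum, then NPS), a minimal Python sketch follows. The weights, pixel pitch, function names, and the periodogram-style NPS estimate are assumptions made for illustration; the abstract does not state the weights or the exact NPS estimator used.

```python
import numpy as np

def synthetic_intensity(r, g, b, dark, weights):
    """Weighted sum of the R, G, B and dark-screen camera images.

    `weights` is a hypothetical 4-tuple (w_r, w_g, w_b, w_dark); in practice
    the values would come from the colorimeter calibration.
    """
    w_r, w_g, w_b, w_dark = weights
    return w_r * r + w_g * g + w_b * b + w_dark * dark

def nps_2d(image, pixel_pitch=1.0):
    """Single-image 2D NPS estimate: |FFT of fluctuations|^2 * dx*dy / (Nx*Ny)."""
    image = np.asarray(image, dtype=float)
    fluctuation = image - image.mean()      # remove the mean (DC) level
    ny, nx = fluctuation.shape
    power = np.abs(np.fft.fft2(fluctuation)) ** 2
    # Shift zero frequency to the centre for easier inspection.
    return np.fft.fftshift(power) * pixel_pitch ** 2 / (nx * ny)

# Example with simulated camera images (placeholder data, not measurements).
rng = np.random.default_rng(0)
r, g, b, dark = (rng.normal(100.0, 5.0, size=(128, 128)) for _ in range(4))
example_weights = (0.2, 0.7, 0.1, -1.0)     # hypothetical values
synthetic = synthetic_intensity(r, g, b, dark, example_weights)
nps = nps_2d(synthetic, pixel_pitch=0.25)
```

With this normalization, the mean of the NPS over all frequency bins equals the image variance times the pixel area, which gives a quick sanity check on the estimate.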
Theoretical demonstration of image characteristics and image formation process depending on image displaying conditions on liquid crystal display
In soft-copy diagnosis, medical images with large matrix sizes often have to be displayed as reduced images produced by
sub-sampling. We analyzed overall image characteristics on a liquid crystal display (LCD) as a function of the
display condition. Specifically, we measured the overall Wiener spectra (WS) of displayed X-ray images at sub-sampling
rates from pixel-by-pixel mode down to 35%. The image viewer used performed image reduction by sub-sampling with
bilinear interpolation. We also simulated the overall WS from images sub-sampled by bilinear, super-sampling, and
nearest-neighbor interpolation. The measured and simulated results agreed well and demonstrated that the overall noise
characteristics were attributable to luminance-value fluctuation, sub-sampling effects, and inherent image characteristics of
the LCD. In addition to simulating the WS, we measured digital MTFs (modulation transfer functions) from sub-sampled
edge images at center and shifted alignments. The WS and digital MTFs showed that displaying reduced
images increased noise through aliasing errors and made it impossible to display high-frequency signals.
Furthermore, because super-sampling interpolation reduced the images more smoothly than
bilinear interpolation, it resulted in lower WS and digital MTFs. Nearest-neighbor interpolation had almost no
smoothing effect, so its WS and digital MTFs were the highest.
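The smoothing argument above can be illustrated with a small Python sketch. It is a simplified stand-in, not the authors' pipeline: it assumes an integer reduction factor of 2, white Gaussian noise as the input, and block averaging as an approximation of super-sampling. All function names are hypothetical.

```python
import numpy as np

def reduce_nearest(image, factor):
    """Nearest-neighbor style reduction: keep every `factor`-th pixel, no smoothing."""
    return image[::factor, ::factor]

def reduce_block_average(image, factor):
    """Area-averaging reduction (used here as a stand-in for super-sampling)."""
    ny, nx = image.shape
    ny_c, nx_c = ny - ny % factor, nx - nx % factor   # crop to a multiple of factor
    blocks = image[:ny_c, :nx_c].reshape(ny_c // factor, factor,
                                         nx_c // factor, factor)
    return blocks.mean(axis=(1, 3))

# White noise with unit variance as a simple test image.
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 1.0, size=(256, 256))

var_nearest = reduce_nearest(noise, 2).var()        # noise passes through unsmoothed
var_average = reduce_block_average(noise, 2).var()  # averaging 4 pixels lowers variance
print(f"nearest-neighbor variance: {var_nearest:.3f}")
print(f"block-average variance:    {var_average:.3f}")
```

For uncorrelated noise, averaging each 2x2 block lowers the variance by roughly a factor of four, while plain decimation leaves it essentially unchanged. This is consistent with the ordering of noise levels reported in the abstract, although real displayed images and non-integer reduction rates will behave less simply.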
Preliminary display comparison for dental diagnostic applications
Nicholas Odlum,
Guillaume Spalla,
Nele van Assche,
et al.
The aim of this study is to predict the clinical performance and image quality of a display system for viewing dental
images. At present, the use of dedicated medical displays is not uniform among dentists - many still view images on
ordinary consumer displays. This work investigated whether the use of a medical display improved the perception of
dental images by a clinician, compared to a consumer display. Display systems were simulated using the MEdical
Virtual Imaging Chain (MEVIC). Images from two carefully performed studies, on periodontal bone lesion
detection and endodontic file length determination, were used. Three displays were selected: a medical grade one and
two consumer displays (Barco MDRC-2120, Dell 1907FP and Dell 2007FPb). Some typical characteristics of the
displays were evaluated by measurements and simulations, such as the Modulation Transfer Function (MTF), the Noise Power
Spectrum (NPS), backlight stability, and calibration. As expected, the display with the largest pixel pitch has the
worst MTF. Moreover, the medical grade display has a slightly better MTF, and the displays have similar NPS. The study
shows that the emitted intensity of the consumer displays is less stable than that of the medical grade one. Finally,
the study of the display calibration methodology shows that the signal in the dental images will always be more
perceptible on a DICOM GSDF display than on a gamma 2.2 display.
Impact of solid-state lighting on observer performance of color discrimination
We studied the impact of the microscope light source on reader performance using a microscopic version of the
Farnsworth-Munsell 100 hue test for photographic slide film. Each pair of adjacent color caps in the original test kit
was reproduced on the film in random order, and a 5X objective was used to examine the microscopic color patterns.
The subject's visual task was to determine whether the color pair was in the correct hue order or not. The test was
repeated for both a light-emitting diode lamp and a conventional halogen lamp. In this paper, we discuss the
methodology using preliminary results.
Using connectionist models to determine decision making strategy of pathology residents reading dermatopathology digital slides
Theories of the processes involved in medical decision making have long been formulated in
an attempt to understand medical reasoning. Historically, medical training has relied on the
'forward reasoning' strategy, where trainees are instructed to collect all diagnostic evidence before
formulating any hypotheses. However, more recently, studies have determined that medical experts
do not rely on such time-consuming strategies, but instead quickly generate diagnostic hypotheses
and then proceed to collect diagnostic evidence to confirm or to dismiss each hypothesis. In light of
this, medical training has been switched to rely on the 'hypothetical deductive' approach, in which
trainees are instructed to mimic the experts and generate diagnostic hypotheses first and then
gather diagnostic evidence to sort out the hypotheses. Both reasoning models have shortcomings,
as identification of many irrelevant findings adds too much noise to the diagnostic process in the
'forward reasoning' case, whereas identification of too many competing hypotheses generates too
large a problem space in the 'hypothetical deductive' approach. In this paper we will use
connectionist modeling to simulate the decision making strategies of Pathology residents 'before'
and 'after' they undergo a well-known difficult rotation, namely Dermatopathology. We will seek to
identify changes in the reasoning patterns of the residents as a result of formal training in the
domain. We hypothesize that 'before' undertaking the rotation residents will rely on the
'forward reasoning' approach, whereas 'after' their rotation they are more likely to use
'hypothetical deductive' reasoning.