Proceedings Volume 9037

Medical Imaging 2014: Image Perception, Observer Performance, and Technology Assessment

View the digital version of this volume at SPIE Digital Library.

Volume Details

Date Published: 11 April 2014
Contents: 9 Sessions, 59 Papers, 0 Presentations
Conference: SPIE Medical Imaging 2014
Volume Number: 9037

Table of Contents

All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 9037
  • Keynote and Visual Search
  • Image Perception
  • Observer Performance
  • Technology Assessment
  • Model Observers: Imaging Applications
  • Observer Performance: Breast
  • Model Observers: General
  • Poster Session
Front Matter: Volume 9037
Front Matter: Volume 9037
This PDF file contains the front matter associated with SPIE Proceedings Volume 9037, including the Title Page, Copyright information, Table of Contents, Introduction, and Conference Committee listing.
Keynote and Visual Search
Visual search from lab to clinic and back
Jeremy M. Wolfe
Many of the tasks of medical image perception can be understood as demanding visual search tasks (especially if you happen to be a visual search researcher). Basic research on visual search can tell us quite a lot about how medical image search tasks proceed because even experts have to use the human “search engine” with all its limitations. Humans can only deploy attention to one or a very few items at any one time. Human search is “guided” search. Humans deploy their attention to likely target objects on the basis of the basic visual features of objects and on the basis of an understanding of the scene containing those objects. This guidance operates in medical images as well as in the mundane scenes of everyday life. The paper reviews some of the dialogue between medical image perception by experts and visual search as studied in the laboratory.
Effect of mammographic breast density on radiologists' visual search pattern
Dana S. Al Mousa, Patrick C. Brennan, Elaine A. Ryan, et al.
This study investigates the impact of breast density on visual searching pattern. A set of 74 one-view malignancy-containing mammographic images were examined by 7 radiologists. Eye position was recorded and visual search parameters such as total time examining a case, time to hit the lesion, dwell time and number of hits per area were collected. Fixations were calculated in 3 areas of interest: background breast parenchyma, dense areas of parenchyma and lesion. Significant increases in dwell time and number of hits in dense areas of parenchyma were noted for high- compared to low- mammographic density images when the lesion overlay the fibroglandular tissue (p<0.01). When the lesion was outside the fibroglandular tissue, significant increases in dwell time and number of hits in dense areas of parenchyma in high- compared to low- mammographic density images were observed (p<0.01). No significant differences were found in total time examining a case, time to first fixate the lesion, dwell time and number of hits in background breast parenchyma and lesion areas. In conclusion, our data suggest that dense areas of breast parenchyma attract radiologists’ visual attention. Lesions overlaying the fibroglandular tissue were detected faster; therefore, lesion location, whether overlaying or outside the fibroglandular tissue, appeared to have an impact on radiologists' visual searching pattern.
Laparoscopic surgical skills training: an investigation of the potential of using surgeons' visual search behaviour as a performance indicator
Laparoscopic surgery is a difficult perceptual-motor task and effective and efficient training in the technique is important. Viewing previously recorded laparoscopic operations is a possible available training technique for surgeons to increase their knowledge of such minimal access surgery (MAS). It is not well known whether this is a useful technique, how effective it is or what effect it has on the surgeon watching the recorded video. As part of an on-going series of studies into laparoscopic surgery, an experiment was conducted to examine whether surgical skill level has an effect on the visual search behaviour of individuals of different surgical experience when they examine such imagery. Medically naive observers, medical students, junior surgeons and experienced surgeons viewed a laparoscopic recording of a recent operation. Initial examination of the recorded eye movement data indicated commonalities between all observers, largely irrespective of surgical experience. This, it is argued, is due to visual search in this situation largely being driven by the dynamic nature of the images. The data were then examined in terms of surgical steps and also in terms of interventions, when differences were found related to surgical experience. Consequently, it is argued that monitoring the eye movements of trainee surgeons whilst they watch pre-recorded operations is a potentially useful adjunct to existing training regimes.
Image Perception
Adaptive controller for volumetric display of neuroimaging studies
Ben Bleiberg, Justin Senseney, Jesus Caban
Volumetric display of medical images is an increasingly relevant method for examining an imaging acquisition as the prevalence of thin-slice imaging increases in clinical studies. Current mouse and keyboard implementations for volumetric control provide neither the sensitivity nor specificity required to manipulate a volumetric display for efficient reading in a clinical setting. Solutions to efficient volumetric manipulation provide more sensitivity by removing the binary nature of actions controlled by keyboard clicks, but specificity is lost because a single action may change display in several directions. When specificity is then further addressed by re-implementing hardware binary functions through the introduction of mode control, the result is a cumbersome interface that fails to achieve the revolutionary benefit required for adoption of a new technology. We address the specificity versus sensitivity problem of volumetric interfaces by providing adaptive positional awareness to the volumetric control device by manipulating communication between hardware driver and existing software methods for volumetric display of medical images. This creates a tethered effect for volumetric display, providing a smooth interface that improves on existing hardware approaches to volumetric scene manipulation.
Preference and performance regarding different image sizes when reading cranial CT
Antje C. Venjakob, Tim Marnitz, Claudia R. Mello-Thoms
Radiology practice is based on the implicit assumption that preference for a particular presentation mode goes hand in hand with superior performance. The present experiment tests this assumption by asking 43 radiologists in two different facilities to interpret 20 cranial computed tomography (cCT) scans in two image sizes, 14 x 14 cm and 28 x 28 cm. The radiologists were asked to identify any intracranial hemorrhages on the images. Subsequently, they were asked to indicate which size they preferred and rated the two image sizes on a continuous scale in terms of how much they liked them. The results show no correlation between diagnostic accuracy, as measured by the JAFROC figure of merit, and preference rated on a continuous scale for both image sizes (large image: r = 0.14, p = 0.38; small image: r = 0.14, p = 0.39). Similarly, there was no significant correlation between reading efficiency, i.e. the time a radiologist took to read a case, and preference rated on the continuous scale (large image: r = -0.07, p = 0.64; small image: r = -0.04, p = 0.80). Further, no significant differences with regard to diagnostic accuracy, reading efficiency and performance could be observed when comparing the two image sizes. The results strengthen the idea that one cannot automatically assume a connection between preference for a display mode and performance with regard to it.
Gaze as a biometric
Hong-Jun Yoon, Tandy R. Carmichael, Georgia Tourassi
Two people may analyze a visual scene in two completely different ways. Our study sought to determine whether human gaze may be used to establish the identity of an individual. To accomplish this objective we investigated the gaze pattern of twelve individuals viewing still images with different spatial relationships. Specifically, we created 5 visual “dot-pattern” tests to be shown on a standard computer monitor. These tests challenged the viewer’s capacity to distinguish proximity, alignment, and perceptual organization. Each test included 50 images of varying difficulty (total of 250 images). Eye-tracking data were collected from each individual while taking the tests. The eye-tracking data were converted into gaze velocities and analyzed with Hidden Markov Models to develop personalized gaze profiles. Using leave-one-out cross-validation, we observed that these personalized profiles could differentiate among the 12 users with classification accuracy ranging between 53% and 76%, depending on the test. This was statistically significantly better than random guessing (i.e., 8.3% or 1 out of 12). Classification accuracy was higher for the tests where the users’ average gaze velocity per case was lower. The study findings support the feasibility of using gaze as a biometric or personalized biomarker. These findings could have implications in Radiology training and the development of personalized e-learning environments.
Going on with false beliefs: What if satisfaction of search was really suppression of recognition?
Satisfaction of search (SOS) is a well known phenomenon in radiology, in which the detection of one abnormality facilitates the neglect of other abnormalities. Over the years SOS has been thoroughly studied, primarily in chest and in trauma, and it has been found to be an elusive effect, appearing in some settings but not in others. Unfortunately, very little is known about SOS in mammography. In this study we will explore SOS in breast cancer detection by considering a case set of digital mammograms as interpreted by breast radiologists. However, the primary goal of the study will be to challenge the core of the paradigm; for decades, many have associated SOS with incomplete search, but as Kundel put it eloquently when addressing the SPIE Medical Imaging in 2004 [1], “observers do not stop viewing when one abnormality has been found on an image with multiple abnormalities”. What else could cause SOS then? According to our previous work, the first “perceived” abnormality reported by a radiologist has an influential role in the report of any other “perceived” abnormalities on the case, which supports the idea that perhaps SOS is caused by a perceptual suppression of the recognition of different abnormalities. In other words, once the radiologist has made a first report (regardless of whether that first report is a TP or FP), detection and hence reporting of other abnormalities present in the case are greatly dependent on whether these associated abnormalities “fit the profile” of what has been already found.
eeDAP: an evaluation environment for digital and analog pathology
Purpose: The purpose of this work is to present a platform for designing and executing studies that compare pathologists interpreting histopathology of whole slide images (WSI) on a computer display to pathologists interpreting glass slides on an optical microscope. Methods: Here we present eeDAP, an evaluation environment for digital and analog pathology. The key element in eeDAP is the registration of the WSI to the glass slide. Registration is accomplished through computer control of the microscope stage and a camera mounted on the microscope that acquires images of the real time microscope view. Registration allows for the evaluation of the same regions of interest (ROIs) in both domains. This can reduce or eliminate disagreements that arise from pathologists interpreting different areas and focuses the comparison on image quality. Results: We reduced the pathologist interpretation area from an entire glass slide (≈10-30 mm)² to small ROIs <(50 µm)². We also made possible the evaluation of individual cells. Conclusions: We summarize eeDAP’s software and hardware and provide calculations and corresponding images of the microscope field of view and the ROIs extracted from the WSIs. These calculations help provide a sense of eeDAP’s functionality and operating principles, while the images provide a sense of the look and feel of studies that can be conducted in the digital and analog domains. The eeDAP software can be downloaded from code.google.com (project: eeDAP) as Matlab source or as a precompiled stand-alone license-free application.
Visual quality assessment of H.264/AVC compressed laparoscopic video
Asli E. Kumcu, Klaas Bombeke, Heng Chen, et al.
The digital revolution has reached hospital operating rooms, giving rise to new opportunities such as tele-surgery and tele-collaboration. Applications such as minimally invasive and robotic surgery generate large video streams that demand gigabytes of storage and transmission capacity. While lossy data compression can offer large size reduction, high compression levels may significantly reduce image quality. In this study we assess the quality of compressed laparoscopic video using a subjective evaluation study and three objective measures. Test sequences were full High-Definition video captures of four laparoscopic surgery procedures acquired on two camera types. Raw sequences were processed with H.264/AVC IPPP-CBR at four compression levels (19.5, 5.5, 2.8, and 1.8 Mbps). 16 non-experts and 9 laparoscopic surgeons evaluated the subjective quality and suitability for surgery (surgeons only) using Single Stimulus Continuous Quality Evaluation methodology. VQM, HDR-VDP-2, and PSNR objective measures were evaluated. The results suggest that laparoscopic video may be lossy compressed approximately 30 to 100 times (19.5 to 5.5 Mbps) without sacrificing perceived image quality, potentially enabling real-time streaming of surgical procedures even over wireless networks. Surgeons were sensitive to content but had large variances in quality scores, whereas non-experts judged all scenes similarly and over-estimated the quality of some sequences. There was high correlation between surgeons’ scores for quality and “suitability for surgery”. The objective measures had moderate to high correlation with subjective scores, especially when analyzed separately by camera type. Future studies should evaluate surgeons’ task performance to determine the clinical implications of conducting surgery with lossy compressed video.
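As a minimal sketch of one of the objective measures named in the abstract above, PSNR can be computed from the mean squared error between a reference frame and its compressed version. The function name, the flat pixel lists, and the 8-bit peak value of 255 are assumptions made for illustration; the abstract does not describe the study's actual processing pipeline.

```python
import math


def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    max_value is the assumed peak pixel value (255 for 8-bit video).
    """
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)
```

PSNR ignores perceptual effects, which may be one reason the study also reports VQM and HDR-VDP-2 alongside subjective scores.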
Observer Performance
Investigating links between emotional intelligence and observer performance by radiologists in mammography
Sarah J. Lewis, Patrick C. Brennan, Steven Cumming, et al.
A novel direction of radiology research is better understanding the links between cognitive and personality factors and radiologists' accuracy and performance. This study examines relationships between Emotional Intelligence (EI) scores and observer performance by radiologists in breast cancer detection. Three separate samples were collected with Australian and US breast imaging radiologists. The radiologists were asked to undertake a mammographic interpretation task to identify malignant breast lesions and localise them, and to use a confidence rating scale to report confidence in the decision. Following this activity, the radiologists were administered the EI Trait (TEIQue-SF) questionnaire. The Trait EI test gives a Global EI score and 4 sub-scores in Well-being, Self-Control, Emotionality and Sociability. Sample 1 (Sydney 2012) radiologists were divided into 2 experience bands: radiologists practicing <13 years as “less” experience and ≥13 years as “more”. There was a significant correlation (r = 0.849, p = 0.012) between Self-Control and Location Sensitivity in the “less” experience group; however, there was little correlation for this EI trait in the “more” experience group, although more experienced radiologists had significantly higher EI scores for sociability than their less experienced counterparts (z = -1.981, p = 0.047). In the second sample (Darwin 2013) radiologists were divided into 2 groups, high and low experience; however, there was no statistically significant correlation between EI and performance in any band. For sample 3 (Louisville 2013) radiologists were divided into 3 groups of experience, with the “medium” experience radiologists having correlations between EI factors “emotionality” and “sociability” and Location Sensitivity and JAFROC. Our preliminary results indicate EI is correlated with observer performance in less experienced radiologists.
It is suggested that tasks perceived as more difficult by less experienced radiologists may evoke more emotion (uncertainty, frustration, pressure). As experience increases, radiologists may develop an ability to control their emotions or emotional intelligence becomes less important in decision making.
How does radiology report format impact reading time, comprehension and visual scanning?
The question of whether radiology report format influences reading time, comprehension of information, and/or scanning behavior was examined. Three radiology reports were reformatted to three versions: conventional free text, structured text organized by organ system, and hierarchical structured text organized by clinical significance. Five radiologists, 5 radiology residents, 5 internal medicine clinicians and 5 internal medicine residents read the reports. They then answered a series of questions about the report content. Reading time was recorded. Participants also reported reading preferences. Eye-position was also recorded. There were no significant differences for reading time as a function of format, but there were for attending versus resident, and radiology versus internal medicine. There was no significant difference for percent correct scores on the questions for report format or for attending versus resident, but there was for radiology versus internal medicine, with the radiologists scoring higher. Eye-position results showed that although patterns tended to be idiosyncratic to readers, there were differences in the overall search patterns as a function of report format, with the free text option yielding more regular scanning and the other two formats yielding more “jumping” from one section to another. Report format does not appear to impact viewing time or percent correct answers, but there are differences in both for specialty and level of experience. There were also differences between the four groups of participants with respect to what they focus on in a radiology report and how they read reports (skim versus read in detail). Eye-position recording also revealed differences in report coverage patterns. The way that radiology reports are read is quite variable as individual preferences differ widely, suggesting that there may not be a single format acceptable to all users.
Bone suppression technique for chest radiographs
Zhimin Huo, Fan Xu, Jane Zhang, et al.
High-contrast bone structures are a major noise contributor in chest radiographic images. A signal of interest in a chest radiograph could be either partially or completely obscured or “overshadowed” by the highly contrasted bone structures in its surroundings. Thus, removing the bone structures, especially the posterior rib and clavicle structures, is highly desirable to increase the visibility of soft tissue density. We developed an innovative technology that offers a solution to suppress bone structures, including posterior ribs and clavicles, on conventional and portable chest X-ray images. The bone-suppression image processing technology includes five major steps: 1) lung segmentation, 2) rib and clavicle structure detection, 3) rib and clavicle edge detection, 4) rib and clavicle profile estimation, and 5) suppression based on the estimated profiles. The bone-suppression software outputs an image with both the rib and clavicle structures suppressed. The rib suppression performance was evaluated on 491 images. On average, 83.06% (±6.59%) of the rib structures on a standard chest image were suppressed based on the comparison of computer-identified rib areas against hand-drawn rib areas, which is equivalent to an average of about one rib that is still visible on a rib-suppressed image based on a visual assessment. Reader studies were performed to evaluate reader performance in detecting lung nodules and pneumothoraces with and without a bone-suppression companion view. Results from reader studies indicated that the bone-suppression technology significantly improved radiologists’ performance in the detection of CT-confirmed possible nodules and pneumothoraces on chest radiographs. The results also showed that radiologists were more confident in making diagnoses regarding the presence or absence of an abnormality after rib-suppressed companion views were presented.
The patterns of false positive lesions for chest radiography observer performance: insights into errors and locations
John W. Robinson, Patrick C. Brennan, Claudia R. Mello-Thoms, et al.
To examine the lobar distribution of false positives on a set of nodule-free and nodule-containing chest radiographs when radiologists are requested to perform an unframed task (oral report) compared to a framed task (nodule/s identification). A set of 40 chest images, 21 nodule-free (NF) and 19 nodule-containing (NC), was used. Ten radiologists participated in the study; first an oral clinical report was performed (unframed task, UFT) naming any abnormality seen, a confidence score and the location of reported abnormalities. The second (framed task, FT) had the same images randomly presented and radiologists were asked to locate any lung nodule/s and record their confidence and location of nodules. There was no statistical difference between the mean number of false positives (FPs) per lobe per case type (UFT or FT) with the exception of the right lower lobe (RLL) P=0.021. When the comparison of FPs for tasks and case types was carried out there were significant changes. For the NF cases there are significant differences for right upper lobe (RUL) P=0.0003, left upper lobe (LUL) P=0.0412; for NC cases there are significant differences for RUL P=0.009, RLL P=0.0112, LUL P=0.0337 and left lower lobe (LLL) P=0.0209. There was no significant correlation between the presence of a nodule in a given lobe and the occurrence of a FP in that lobe. The number and lobar location of FPs identified on a chest image by a radiologist is influenced by the task and case type.
Nonparametric EROC analysis for observer performance evaluation on joint detection and estimation tasks
The majority of the literature on task-based image quality assessment has focused on lesion detection tasks, using the receiver operating characteristic (ROC) curve, or related variants, to measure performance. However, since many clinical image evaluation tasks involve both detection and estimation (e.g., estimation of kidney stone composition, estimation of tumor size), there is a growing interest in performance evaluation for joint detection and estimation tasks. To evaluate observer performance on such tasks, Clarkson introduced the estimation ROC (EROC) curve, and the area under the EROC curve as a summary figure of merit. In the present work, we propose nonparametric estimators for practical EROC analysis from experimental data, including estimators for the area under the EROC curve and its variance. The estimators are illustrated with a practical example comparing MRI images reconstructed from different k-space sampling trajectories.
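The idea of a nonparametric area under the EROC curve can be sketched as a utility-weighted version of the Mann-Whitney statistic: each signal-present image contributes the utility of its parameter estimate whenever its test statistic exceeds that of a signal-absent image. This particular form, the function name, and the tie-handling rule (half credit) are assumptions for illustration; the abstract does not reproduce the paper's exact estimator or its variance formula.

```python
def eroc_auc(absent_scores, present_scores, present_utilities):
    """Utility-weighted Mann-Whitney estimate of the area under the EROC curve.

    absent_scores     : test statistics for signal-absent images
    present_scores    : test statistics for signal-present images
    present_utilities : utility of the parameter estimate for each
                        signal-present image (same order as present_scores)
    """
    if len(present_scores) != len(present_utilities):
        raise ValueError("one utility is required per signal-present image")
    if not absent_scores or not present_scores:
        raise ValueError("both classes need at least one image")
    total = 0.0
    for score, utility in zip(present_scores, present_utilities):
        for absent in absent_scores:
            if score > absent:
                total += utility        # correctly ordered pair earns the full utility
            elif score == absent:
                total += 0.5 * utility  # ties earn half credit (assumed convention)
    return total / (len(absent_scores) * len(present_scores))
```

When every utility equals 1, this reduces to the usual nonparametric estimate of the area under the ROC curve.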
Technology Assessment
Non-Gaussian statistical properties of virtual breast phantoms
Images derived from a “phantom” are useful for characterizing the performance of imaging systems. In particular, the modulation transfer properties of imaging detectors are traditionally assessed by physical phantoms consisting of an edge. More recently researchers have come to realize that quantifying the effects of object variability can also be accomplished with phantoms in modalities such as breast imaging where anatomical structure may be the principal limitation in performance. This has driven development of virtual phantoms that can be used in simulation environments. In breast imaging, several such phantoms have been proposed. In this work, we analyze non-Gaussian statistical properties of virtual phantoms, and compare them to similar statistics from a database of breast images. The virtual phantoms assessed consist of three classes. The first is known as clustered-blob lumpy backgrounds. The second class is “binarized” textures which typically apply some sort of threshold to a stochastic 3D texture intended to represent the distribution of adipose and glandular tissue in the breast. The third approach comes from efforts at the University of Pennsylvania to directly simulate the 3D anatomy of the breast. We use Laplacian fractional entropy (LFE) as a measure of the non-Gaussian statistical properties of each simulation. Our results show that the simulation approaches differ considerably in LFE, from very low scores for the clustered-blob lumpy background to very high values for the UPenn phantom. These results suggest that LFE may have value in developing and tuning virtual phantom simulation procedures.
Mammographic density descriptors of novel phantom images: effect of clustered lumpy backgrounds
Mammographic breast density (MBD) is a risk factor for breast cancer. Both qualitative and quantitative methods have been used to evaluate MBD. However, as it is impossible to measure the actual weight or volume of fibroglandular tissue evident on a mammogram, it is hard to know the true correlation between measured mammographic density and the fibroglandular tissue volume. A phantom system has been developed that represents glandular tissue within an adipose tissue structure. Although a previous study has found strong correlation between the synthesised glandular mass and several image descriptors, it is not known if the correlation is still present when a high level of background noise is introduced. The background noise is required to more realistically simulate clinical image appearance. The aim of this study is to investigate if the correlation between percentage density, integrated density, and standard deviation of mean grey value of the whole phantom and simulated glandular tissue mass is affected by background noise being added to the phantom images. For a set of one hundred phantom mammographic images, clustered lumpy backgrounds were synthesised and superimposed onto phantom images. The correlations between the synthesised glandular mass and the image descriptors were calculated. The results showed the correlation is strong and statistically significant for the above three descriptors, with r = 0.7597, 0.8208, and 0.7167, respectively. This indicates these descriptors may be used to assess breast fibroglandular tissue content using mammographic images.
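The r values quoted in the abstract above are correlation coefficients between an image descriptor and the synthesised glandular mass. Assuming they are Pearson coefficients (the abstract does not say which coefficient was used), the calculation can be sketched as follows; the function name is hypothetical.

```python
import math


def pearson_r(descriptor_values, glandular_masses):
    """Pearson correlation between an image descriptor and glandular mass."""
    n = len(descriptor_values)
    if n != len(glandular_masses) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(descriptor_values) / n
    mean_y = sum(glandular_masses) / n
    # Sums of centred products and squares.
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(descriptor_values, glandular_masses))
    sxx = sum((x - mean_x) ** 2 for x in descriptor_values)
    syy = sum((y - mean_y) ** 2 for y in glandular_masses)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```

In this sketch, one call per descriptor (percentage density, integrated density, standard deviation of mean grey value) over the hundred phantom images would yield one coefficient per descriptor.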
Using image simulation to test the effect of detector type on breast cancer detection
Alistair Mackenzie, Lucy M. Warren, David R. Dance, et al.
Introduction: The effect that the image quality associated with different image receptors has on cancer detection in mammography was measured using a novel method for changing the appearance of images. Method: A set of 270 mammography cases (one view, both breasts) was acquired using five Hologic Selenia and two Hologic Dimensions X-ray sets: 160 normal cases, 80 cases with subtle real non-calcification malignant lesions and 30 cases with biopsy proven benign lesions. Simulated calcification clusters were inserted into half of the normal cases. The 270 cases (Arm 1) were converted to appear as if they had been acquired on three other imaging systems: caesium iodide detector (Arm 2), needle image plate computed radiography (CR) (Arm 3) and powder phosphor CR (Arm 4). Five experienced mammography readers marked the location of suspected cancers in the images and classified the degree of visibility of the lesions. Statistical analysis was performed using JAFROC. Results: The differences in the visibility of calcification clusters between all pairs of arms were statistically significant (p<0.05), except between Arms 1 and 2. The difference in the visibility of non-calcification lesions was smaller than for calcification clusters, but the differences were still significant except between Arms 1 and 2 and between Arms 3 and 4. Conclusion: Detector type had a significant impact on the visibility of all types of subtle cancers, with the largest impact being on the visibility of calcification clusters.
Task-based optimization of image reconstruction in breast CT
Adrian A. Sanchez, Emil Y. Sidky, Xiaochuan Pan
We demonstrate a task-based assessment of image quality in dedicated breast CT in order to optimize the number of projection views acquired. The methodology we employ is based on the Hotelling Observer (HO) and its associated metrics. We consider two tasks: the Rayleigh task of discerning between two resolvable objects and a single larger object, and the signal detection task of classifying an image as belonging to either a signal-present or signal-absent hypothesis. HO SNR values are computed for 50, 100, 200, 500, and 1000 projection view images, with the total imaging radiation dose held constant. We use the conventional fan-beam FBP algorithm and investigate the effect of varying the width of a Hanning window used in the reconstruction, since this affects both the noise properties of the image and the under-sampling artifacts which can arise in the case of sparse-view acquisitions. Our results demonstrate that fewer projection views should be used in order to increase HO performance, which in this case constitutes an upper-bound on human observer performance. However, the impact on HO SNR of using fewer projection views, each with a higher dose, is not as significant as the impact of employing regularization in the FBP reconstruction through a Hanning filter.
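For readers unfamiliar with the metric, the Hotelling observer SNR is commonly defined by SNR² = Δsᵀ K⁻¹ Δs, where Δs is the difference between the mean images under the two hypotheses and K is the image covariance. The sketch below evaluates that textbook expression for a small example; the helper names and the dense Gaussian-elimination solver are assumptions for illustration and say nothing about how the authors computed HO SNR for full-size reconstructions.

```python
import math


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def hotelling_snr(mean_difference, covariance):
    """Hotelling observer SNR: sqrt(ds^T K^-1 ds)."""
    template = solve(covariance, mean_difference)  # w = K^-1 ds
    return math.sqrt(sum(d * w for d, w in zip(mean_difference, template)))
```

Solving K w = Δs rather than forming K⁻¹ explicitly is a common design choice, since the template w is all the observer needs.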
Evaluation of penalty design in penalized maximum-likelihood image reconstruction for lesion detection
Li Yang, Andrea Ferrero, Rosalie J. Hagge, et al.
Detecting cancerous lesions is a major clinical application in emission tomography. In previous work, we have studied penalized maximum-likelihood (PML) image reconstruction for the detection task, where we used a multiview channelized Hotelling observer (mvCHO) to assess the lesion detectability in 3D images. It mimics the condition where a human observer examines three orthogonal views of a 3D image for lesion detection. We proposed a method to design a shift-variant quadratic penalty function to improve the detectability of lesions at unknown locations, and validated it using computer simulations. In this study we evaluated the benefit of the proposed penalty function for lesion detection using real data. A high-count real patient data set with no identifiable tumor inside the field of view was used as the background data. A Na-22 point source was scanned in air at variable locations and the point source data were superimposed onto the patient data as artificial lesions after being attenuated by the patient body. Independent Poisson noise was added to the high-count sinograms to generate 200 pairs of lesion-present and lesion-absent data sets, each mimicking a 5-minute scan. Lesion detectability was assessed using a multiview CHO and a human observer two alternative forced choice (2AFC) experiment. The results showed that the optimized penalty can improve lesion detection over the conventional quadratic penalty function.
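A 2AFC experiment such as the one mentioned in the abstract above is usually summarised by the proportion of trials in which the observer picked the lesion-present image. Under the standard equal-variance Gaussian assumption, that proportion maps to a detectability index via d' = √2 · Φ⁻¹(PC). The sketch below illustrates this conventional summary; the function name and the boolean trial encoding are hypothetical, and the abstract does not state which figure of merit the authors reported.

```python
import math
from statistics import NormalDist


def two_afc_summary(correct_flags):
    """Return (proportion correct, d') for a list of per-trial booleans.

    d' uses the equal-variance Gaussian relation PC = Phi(d' / sqrt(2)).
    """
    if not correct_flags:
        raise ValueError("at least one trial is required")
    proportion = sum(1 for flag in correct_flags if flag) / len(correct_flags)
    if 0.0 < proportion < 1.0:
        d_prime = math.sqrt(2.0) * NormalDist().inv_cdf(proportion)
    else:
        d_prime = None  # undefined at 0% or 100% correct
    return proportion, d_prime
```

The 2AFC proportion correct also equals the area under the ROC curve for the same observer and task, which makes results from human and model observers directly comparable.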
Observer assessment of multi-pinhole SPECT geometries for prostate cancer imaging: a simulation study
SPECT imaging using In-111 ProstaScint is an FDA-approved method for diagnosing prostate cancer metastases within the pelvis. However, conventional medium-energy parallel-hole (MEPAR) collimators produce poor image quality and we are investigating the use of multipinhole (MPH) imaging as an alternative. This paper presents a method for evaluating MPH designs that makes use of sampling-sensitive (SS) mathematical model observers for tumor detection-localization tasks. Key to our approach is the redefinition of a normal (or background) reference image that is used with scanning model observers. We used this approach to compare different MPH configurations for the task of small-tumor detection in the prostate and surrounding lymph nodes. Four configurations used 10, 20, 30, and 60 pinholes evenly spaced over a complete circular orbit. A fixed-count acquisition protocol was assumed. Spherical tumors were placed within a digital anthropomorphic phantom having a realistic ProstaScint biodistribution. Imaging data sets were generated with an analytical projector and reconstructed volumes were obtained with the OSEM algorithm. The MPH configurations were compared in a localization ROC (LROC) study with 2D pelvic images and both human and model observers. Regular and SS versions of the scanning channelized nonprewhitening (CNPW) and visual-search (VS) model observers were applied. The SS models demonstrated the highest correlations with the average human-observer results.
Model Observers: Imaging Applications
Comparing observer models and feature selection methods for a task-based statistical assessment of digital breast tomosynthesis in reconstruction space
A task-based assessment of image quality [1] for digital breast tomosynthesis (DBT) can be done in either the projected or reconstructed data space. As the choice of observer models and feature selection methods can vary depending on the type of task and data statistics, we previously investigated the performance of two channelized Hotelling observer models in conjunction with 2D Laguerre-Gauss (LG) and two implementations of partial least squares (PLS) channels along with that of the Hotelling observer in binary detection tasks involving DBT projections [2, 3]. The difference in these observers lies in how the spatial correlation in DBT angular projections is incorporated in the observer’s strategy to perform the given task. In the current work, we extend our method to the reconstructed data space of DBT. We investigate how various model observers including the aforementioned compare for performing the binary detection of a spherical signal embedded in structured breast phantoms with the use of DBT slices reconstructed via filtered back projection. We explore how well the model observers incorporate the spatial correlation between different numbers of reconstructed DBT slices while varying the number of projections. For this, relatively small and large scan angles (24° and 96°) are used for comparison. Our results indicate that 1) given a particular scan angle, the number of projections needed to achieve the best performance for each observer is similar across all observer/channel combinations, i.e., Np = 25 for scan angle 96° and Np = 13 for scan angle 24°, and 2) given these sufficient numbers of projections, the number of slices for each observer to achieve the best performance differs depending on the channel/observer types, which is more pronounced in the narrow scan angle case.
A Naive-Bayes model observer for detection and localization of perfusion defects in cardiac SPECT-MPI
Felipe M. Parages, J. Michael O’Connor, P. Hendrik Pretorius, et al.
Model observers (MO) are widely used in medical imaging to act as surrogates of human observers in task-based image quality evaluation, frequently towards optimization of reconstruction algorithms. In SPECT myocardial perfusion imaging (MPI), a realistic task-based approach involves detection and localization of perfusion defects, as well as a subsequent assessment of defect severity. In this paper we explore a machine-learning MO based on Naive-Bayes classification (NB-MO). NB-MO uses a set of polar-map image features to predict lesion detection, localization and severity scores given by five human readers for a set of simulated 3D SPECT-MPI patients. The simulated dataset included lesions with different sizes, perfusion-reduction ratios, and locations. Simulated projections were reconstructed using two commonly used methods, namely FBP and OSEM. For validation, a multi-reader multi-case (MRMC) analysis of alternative free-response ROC (AFROC) curves was performed for NB-MO and human observers. For comparison, we also report the performance of a statistical Hotelling Observer applied to polar-map images. Results show excellent agreement between NB-MO and humans, as well as good generalization of the model across the different reconstruction methods.
Design of a practical model-observer-based image quality assessment method for CT imaging systems
The channelized Hotelling observer (CHO) is a powerful method for quantitative image quality evaluations of CT systems and their image reconstruction algorithms. It has recently been used to validate the dose reduction capability of iterative image-reconstruction algorithms implemented on CT imaging systems. The use of the CHO for routine and frequent system evaluations is desirable both for quality assurance evaluations and for further system optimizations. The use of channels substantially reduces the amount of data required to achieve accurate estimates of observer performance. However, the number of scans required is still large even with the use of channels. This work explores different data reduction schemes and designs a new approach that requires only a few CT scans of a phantom. For this work, the leave-one-out likelihood (LOOL) method developed by Hoffbeck and Landgrebe is studied as an efficient method of estimating the covariance matrices needed to compute CHO performance. Three different kinds of approaches are included in the study: a conventional CHO estimation technique with a large sample size, a conventional technique with fewer samples, and the new LOOL-based approach with fewer samples. The mean value and standard deviation of the area under the ROC curve (AUC) are estimated by the shuffle method. Both simulation and real data results indicate that an 80% data reduction can be achieved without loss of accuracy. This data reduction makes the proposed approach a practical tool for routine CT system assessment.
Comparison of computational to human observer detection for evaluation of CT low dose iterative reconstruction
Model observers were created and compared to human observers for the detection of low contrast targets in computed tomography (CT) images reconstructed with an advanced, knowledge-based, iterative image reconstruction method for low x-ray dose imaging. A 5-channel Laguerre-Gauss channelized Hotelling observer (CHO) was used with internal noise added to the decision variable (DV) and/or channel outputs (CO). Models were defined by parameters: (k1) DV-noise with standard deviation (std) proportional to DV std; (k2) DV-noise with constant std; (k3) CO-noise with constant std across channels; and (k4) CO-noise in each channel with std proportional to CO variance. Four-alternative forced choice (4AFC) human observer studies were performed on sub-images extracted from phantom images with and without a “pin” target. Model parameters were estimated using maximum likelihood comparison to human probability correct (PC) data. PC in human and all model observers increased with dose, contrast, and size, and was much higher for advanced iterative reconstruction (IMR) as compared to filtered back projection (FBP). Detection in IMR was better than FBP at 1/3 dose, suggesting significant dose savings. Model(k1,k2,k3,k4) gave the best overall fit to humans across independent variables (dose, size, contrast, and reconstruction) at fixed display window. However, Model(k1) performed better when considering model complexity using the Akaike information criterion. Model(k1) fit the extraordinary detectability difference between IMR and FBP, despite the different noise quality. It is anticipated that the model observer will predict results from iterative reconstruction methods having similar noise characteristics, enabling rapid comparison of methods.
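The trade-off between goodness of fit and model complexity described in this abstract can be illustrated with the Akaike information criterion, AIC = 2k - 2 ln L, where k is the number of fitted parameters and L the maximized likelihood. The log-likelihood values below are invented purely for illustration and are not taken from the paper; only the model names follow the abstract.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion; lower values indicate a preferred model."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical maximized log-likelihoods: the four-parameter model fits slightly
# better, but not by enough to offset the penalty for three extra parameters.
fits = {
    "Model(k1)": (-120.0, 1),
    "Model(k1,k2,k3,k4)": (-118.5, 4),
}
scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
preferred = min(scores, key=scores.get)
```

With these made-up numbers the simpler model has the lower AIC, which mirrors the kind of outcome the abstract reports without reproducing its actual values.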
Assessment of prostate cancer detection with a visual-search human model observer
Early staging of prostate cancer (PC) is a significant challenge, in part because of the small tumor sizes involved. Our long-term goal is to determine realistic diagnostic task performance benchmarks for standard PC imaging with single photon emission computed tomography (SPECT). This paper reports on a localization receiver operator characteristic (LROC) validation study comparing human and model observers. The study made use of a digital anthropomorphic phantom and one-cm tumors within the prostate and pelvic lymph nodes. Uptake values were consistent with data obtained from clinical In-111 ProstaScint scans. The SPECT simulation modeled a parallel-hole imaging geometry with medium-energy collimators. Nonuniform attenuation and distance-dependent detector response were accounted for both in the imaging and the ordered-subset expectation-maximization (OSEM) iterative reconstruction. The observer study made use of 2D slices extracted from reconstructed volumes. All observers were informed about the prostate and nodal locations in an image. Iteration number and the level of postreconstruction smoothing were study parameters. The results show that a visual-search (VS) model observer correlates better with the average detection performance of human observers than does a scanning channelized nonprewhitening (CNPW) model observer.
Observer Performance: Breast
Does sensitivity measured from screening test-sets predict clinical performance?
BaoLin P. Soh, Warwick B. Lee, Claudia R. Mello-Thoms, et al.
Aim: To examine the relationship between sensitivity measured from the BREAST test-set and clinical performance.

Background: Although the UK and Australia national breast screening programs have regarded PERFORMS and BREAST test-set strategies as possible methods of estimating readers' clinical efficacy, the relationship between test-set and real life performance results has never been satisfactorily understood.

Methods: Forty-one radiologists from BreastScreen New South Wales participated in this study. Each reader interpreted a BREAST test-set which comprised sixty de-identified mammographic examinations sourced from the BreastScreen Digital Imaging Library. Spearman's rank correlation coefficient was used to compare the sensitivity measured from the BREAST test-set with screen readers' clinical audit data.

Results: Results showed statistically significant positive moderate correlations between test-set sensitivity and each of the following metrics: rate of invasive cancer per 10 000 reads (r=0.495; p < 0.01); rate of small invasive cancer per 10 000 reads (r=0.546; p < 0.001); detection rate of all invasive cancers and DCIS per 10 000 reads (r=0.444; p < 0.01).

Conclusion: Comparison between sensitivity measured from the BREAST test-set and real life detection rate demonstrated statistically significant positive moderate correlations which validated that such test-set strategies can reflect readers' clinical performance and be used as a quality assurance tool. The strength of correlation demonstrated in this study was higher than previously found by others.
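Spearman's rank correlation coefficient, used in the Methods of this abstract, is the Pearson correlation computed on ranks, with tied values sharing their average rank. The following is a minimal sketch of that standard definition; the reader values are hypothetical and do not come from the study.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: test-set sensitivity vs. cancers detected per 10 000 reads.
test_set_sensitivity = [0.60, 0.72, 0.81, 0.85, 0.90]
clinical_detection_rate = [38.0, 35.0, 47.0, 44.0, 52.0]
rho = spearman(test_set_sensitivity, clinical_detection_rate)
```

A significance test for rho, as reported in the Results, would need to be added separately.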
Modeling resident error-making patterns in detection of mammographic masses using computer-extracted image features: preliminary experiments
Maciej A. Mazurowski, Jing Zhang, Joseph Y. Lo, et al.
Providing high quality mammography education to radiology trainees is essential, as good interpretation skills potentially ensure the highest benefit of screening mammography for patients. We have previously proposed a computer-aided education system that utilizes trainee models, which relate human-assessed image characteristics to interpretation error. We proposed that these models be used to identify the most difficult and therefore the most educationally useful cases for each trainee. In this study, as a next step in our research, we propose to build trainee models that utilize features that are automatically extracted from images using computer vision algorithms. To predict error, we used a logistic regression which accepts imaging features as input and returns error as output. Reader data from 3 experts and 3 trainees were used. Receiver operating characteristic analysis was applied to evaluate the proposed trainee models. Our experiments showed that, for three trainees, our models were able to predict error better than chance. This is an important step in the development of adaptive computer-aided education systems since computer-extracted features will allow for faster and more extensive search of imaging databases in order to identify the most educationally beneficial cases.
Mammographic density measurement: a comparison of automated volumetric density measurement to BIRADS
The aim of this study is to compare mammographic breast density assessment by automated volumetric software with Breast Imaging Reporting and Data System (BIRADS) categorization by radiologists on two imaging systems. A data set of 120 mammograms was classified by twenty American Board of Radiology (ABR) Examiners. The mammograms were of 20 women (mean age, 60 years; range, 42–89 years). These women were imaged twice, one year apart, on either a GE system or a Hologic system. These images also had their volumetric density classified using the Volpara Density Grade (VDG). The radiologists were asked to estimate the mammographic density according to BIRADS categories (1-4). There was a moderate agreement between VDG classification and radiologist BIRADS density, shown with Cohen’s Kappa (κ=0.45; p<0.001). Radiologists estimated percentage density to be lower by an average of 0.37 for the Hologic system, the radiologists’ BIRADS having a mean of 2.05 and the mean VDG being higher at 2.42 (t=-8.88; p<0.001). VDG and radiologists’ BIRADS showed a strong positive correlation (r=0.87; p<0.001). Radiologist BIRADS and VDG AvBD% also showed a strong positive correlation (r=0.86; p<0.001). There was a large spread of radiologists’ BIRADS categories for each of the VDG AvBD% classifications. Using Volpara, the Hologic system showed a lower mean AvBD% (9.5 vs. 9.6). However, using BIRADS, the Hologic system showed a lower mean (2.05 vs. 2.22). Automated systems demonstrated higher internal validity. The results demonstrated a moderate agreement and a strong correlation between VDG classification and radiologist BIRADS density assessment.

Publisher’s Note: This paper, originally published on 3/11/14, was replaced with a corrected/revised version on 8/1/14. If you downloaded the original PDF but are unable to access the revision, please contact SPIE Digital Library Customer Service for assistance.
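The agreement statistic quoted in the density-measurement abstract above, Cohen's kappa, compares the observed agreement between two raters with the agreement expected by chance: κ = (p_o - p_e) / (1 - p_e). The sketch below implements the unweighted form; whether the study used a weighted variant is not stated in the abstract, and the ratings shown are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement: product of the two raters' marginal category frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical density grades (1-4) from software and from a radiologist.
software_grade = [1, 2, 2, 3, 3, 4, 2, 3]
radiologist_grade = [1, 2, 3, 3, 2, 4, 2, 2]
kappa = cohens_kappa(software_grade, radiologist_grade)
```

The function divides by zero when both raters use a single identical category for every case, so real use would need a guard for that degenerate input.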
Pursuing optimal thresholds to recommend breast biopsy by quantifying the value of tomosynthesis
Yirong Wu, Oguzhan Alagoz, David J. Vanness, et al.
A 2% threshold has been traditionally used to recommend breast biopsy in mammography. We aim to characterize how the biopsy threshold varies to achieve the maximum expected utility (MEU) of tomosynthesis for breast cancer diagnosis. A cohort of 312 patients, imaged with standard full field digital mammography (FFDM) and digital breast tomosynthesis (DBT), was selected for a reader study. Fifteen readers interpreted each patient’s images and estimated the probability of malignancy using two modes: FFDM versus FFDM + DBT. We generated receiver operator characteristic (ROC) curves with the probabilities for all readers combined. We found that FFDM+DBT provided improved accuracy and MEU compared with FFDM alone. When DBT was included in the diagnosis along with FFDM, the optimal biopsy threshold increased to 2.7% as compared with the 2% threshold for FFDM alone. While understanding the optimal threshold from a decision analytic standpoint will not help physicians improve their performance without additional guidance (e.g. decision support to reinforce this threshold), the discovery of this level does demonstrate the potential clinical improvements attainable with DBT. Specifically, DBT has the potential to lead to substantial improvements in breast cancer diagnosis since it could reduce the number of patients recommended for biopsy while preserving the maximal expected utility.
Efficacy of digital breast tomosynthesis for breast cancer diagnosis
M. Alakhras, C. Mello-Thoms, M. Rickard, et al.
Purpose: To compare the diagnostic performance of digital breast tomosynthesis (DBT) in combination with digital mammography (DM) with that of digital mammography alone.

Materials and Methods: Twenty-six experienced radiologists who specialized in breast imaging read 50 cases (27 cancers and 23 non-cancer cases) of patients who underwent DM and DBT. Both exams included the craniocaudal (CC) and mediolateral oblique (MLO) views. Histopathologic examination established truth in all lesions. Each case was interpreted in two modes, first with DM alone and then with DM+DBT, and the observers were asked to mark the location of any lesions, if present, and give each a score based on a five-category assessment by the Royal Australian and New Zealand College of Radiologists (RANZCR). The diagnostic performance of DM compared with that of DM+DBT was evaluated in terms of the difference between areas under receiver-operating characteristic curves (AUCs), Jackknife free-response receiver operator characteristics (JAFROC) figure-of-merit, sensitivity, location sensitivity and specificity.

Results: Average AUC and JAFROC for DM versus DM+DBT were significantly different (AUC 0.690 vs. 0.781, p < 0.0001; JAFROC 0.618 vs. 0.732, p < 0.0001). In addition, the use of DM+DBT resulted in an improvement in sensitivity (0.629 vs. 0.701, p=0.0011), location sensitivity (0.548 vs. 0.690, p < 0.0001) and specificity (0.656 vs. 0.758, p=0.0015) when compared to DM alone.

Conclusion: Adding DBT to the standard DM significantly improved radiologists’ performance in terms of AUCs, JAFROC figure of merit, sensitivity, location sensitivity and specificity values.
Retrieving high spatial frequency information in sonography for improved microcalcification detection
Sara Bahramian, Michael F. Insana
The process of echo-signal demodulation within the display stage of ultrasonic image formation discards signal phase. It has long been hypothesized that demodulation could be eliminating important clinical task information but the tools to study this effect were not developed. We have now developed a task-energy analysis to show how signal energy flows through different stages of image formation. In this paper, we show how traditional display-stage processing eliminates high spatial-frequency task information, and how simple methods can recover the loss for improved diagnostic performance. We also study the improvement in detecting breast microcalcifications using the proposed method.
Model Observers: General
Development and evaluation of a 3D model observer with nonlinear spatiotemporal contrast sensitivity
Ali R. N. Avanaki, Kathryn S. Espig, Andrew D. A. Maidment, et al.
We investigate improvements to our 3D model observer with the goal of better matching human observer performance as a function of viewing distance, effective contrast, maximum luminance, and browsing speed. Two nonlinear methods of applying the human contrast sensitivity function (CSF) to a 3D model observer are proposed, namely the Probability Map (PM) and Monte Carlo (MC) methods. In the PM method, the visibility probability for each frequency component of the image stack, p, is calculated taking into account Barten’s spatiotemporal CSF, the component modulation, and the human psychometric function. The probability p is considered to be equal to the perceived amplitude of the frequency component and thus can be used by a traditional model observer (e.g., LG-msCHO) in the space-time domain. In the MC method, each component is randomly kept with probability p or discarded with 1-p. The amplitude of the retained components is normalized to unity. The methods were tested using DBT stacks of an anthropomorphic breast phantom processed in a comprehensive simulation pipeline. Our experiments indicate that both the PM and MC methods yield results that match human observer performance better than the linear filtering method as a function of viewing distance, effective contrast, maximum luminance, and browsing speed.
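The Monte Carlo (MC) step described in this abstract, where each frequency component is kept with probability p or discarded with probability 1-p and retained components are given unit amplitude, can be sketched as follows. This shows only the random selection; computing p from Barten's CSF and the psychometric function is not attempted here, and the function name and probability values are hypothetical.

```python
import random


def monte_carlo_amplitudes(visibility_probabilities, rng):
    """Draw one realization: amplitude 1.0 if a component is kept, else 0.0."""
    return [1.0 if rng.random() < p else 0.0 for p in visibility_probabilities]


# Hypothetical visibility probabilities for five frequency components.
probabilities = [0.95, 0.80, 0.50, 0.20, 0.05]
rng = random.Random(0)  # seeded so that the sketch is repeatable
realization = monte_carlo_amplitudes(probabilities, rng)

# Averaged over many draws, each component's amplitude approaches its probability,
# which is the quantity the Probability Map (PM) method uses directly.
draws = [monte_carlo_amplitudes(probabilities, rng) for _ in range(10000)]
mean_amplitudes = [sum(column) / len(draws) for column in zip(*draws)]
```

In a full pipeline the phase of each retained component would presumably be preserved before transforming back to the space-time domain; that detail is an assumption, since the abstract specifies only the amplitude.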
Human template estimation using a Gaussian processes algorithm
Francesc Massanes, Jovan G. Brankov
In this paper we propose the use of a machine-learning algorithm based on Gaussian Processes to estimate a human observer linear template for the detection of a signal in a noisy background. Estimating a human observer template is not novel; however, the use of a multi-kernel Gaussian Processes approach is. This model provides spatial smoothing by using a sparse kernel representation. For validation purposes, we train this model observer with the ground truth and show that the estimated template is the same as the statistically optimal detector. Next, we present the human observer template estimated for the detection of a signal on a different power-law background.
A stereo matching model observer for stereoscopic viewing of 3D medical images
Gezheng Wen, Mia K. Markey, Gautam S. Muralidhar
Stereoscopic viewing of 3D medical imaging data has the potential to increase the detection of abnormalities. We present a new stereo model observer inspired by the characteristics of stereopsis in human vision. Given a stereo pair of images of an object (i.e., left and right images separated by a small displacement), the model observer first finds the corresponding points between the two views, and then fuses them together to create a 2D cyclopean view. Assuming that the cyclopean view has extracted most of the 3D information presented in the stereo pair, a channelized Hotelling observer (CHO) can be utilized to make decisions. We conduct a simulation study that attempts to mimic the detection of breast lesions on stereoscopic viewing of breast tomosynthesis projection images. We render voxel datasets that contain random 3D power-law noise to model normal breast tissues with various breast densities. A 3D Gaussian signal is added to some of the datasets to model the presence of a breast lesion. By changing the separation angle between the two views, multiple stereo pairs of projection images are generated for each voxel dataset. The performance of the model is evaluated in terms of the accuracy of binary decisions on the presence of the simulated lesions.
A model observer based on human perception to quantify the detectability
Georges Acharian, Nathalie Guyader, Jean-Michel Vignolle, et al.
In medical imaging, model observers such as the "Hotelling observer" and the "Non Prewhitening Matched Filter" have been proposed to detect objects in X-ray images. These models, based on decision theory, are applied over the entire image. In this paper, we developed a model that mimics some processes of human visual perception. The proposed model is applied locally to particular areas that correspond to the salient areas of the object. By doing this, the model mimics the sequence of eye fixations that we make when we explore an image, for example in order to detect an object. The study is divided into three parts: a psychophysical experiment to measure human performance in detecting various objects in noise, a theoretical part to develop the proposed model, and finally, a results part. During the experiment, several participants were asked to detect objects in noisy images using a free search task. The luminance contrast of objects was adaptively adjusted according to their responses to obtain a percentage of correct detection of 50% for each object. The proposed model, based on decision theory, was applied locally to areas of the image whose size corresponds to the high visual acuity of foveal vision. Areas were chosen according to their high saliency values computed through a bio-inspired model of visual attention. For each area, our model returned a detectability index. By supposing statistical independence between areas, the local indexes are combined into a global detectability index. Results show that the proposed model fits the results of the psychophysical experiment and outperforms classical models of the literature.
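The abstract above does not state how the local detectability indexes are combined. One common rule for statistically independent sources of evidence is summation in quadrature, d'_global = sqrt(Σ d'_i²); the sketch below assumes that rule purely for illustration, and the local values are hypothetical.

```python
import math


def global_detectability(local_indexes):
    """Combine independent local d' values by summation in quadrature (assumed rule)."""
    return math.sqrt(sum(d * d for d in local_indexes))


# Hypothetical detectability indexes for three salient areas of one object.
d_global = global_detectability([0.8, 1.1, 0.5])
```

Under this assumption the global index is never smaller than the largest local index, so adding a fixated area with any nonzero detectability can only help.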
Detectability and image quality metrics based on robust statistics: following non-linear, noise-reduction filters
Non-linear image processing and reconstruction algorithms that reduce noise while preserving edge detail are currently being evaluated in the medical imaging research literature. We have implemented a robust statistics analysis of four widely utilized methods. This work demonstrates consistent trends in filter impact by which such non-linear algorithms can be evaluated. We calculate observer model test statistics and propose metrics based on measured non-Gaussian distributions that can serve as image quality measures analogous to SDNR and detectability. The filter algorithms, which vary significantly in their approach to noise reduction, include median (MD), bilateral (BL), anisotropic diffusion (AD) and total-variation regularization (TV). It is shown that the detectability of objects limited by Poisson noise is not significantly improved after filtration. There is no benefit to the fraction of correct responses in repeated n-alternative forced-choice experiments, for n=2-25. Nonetheless, multi-pixel objects with contrast above the detectability threshold appear visually to benefit from non-linear processing algorithms. In such cases, calculations on highly repeated trials show increased separation of the object-level histogram from the background-level distribution. Increased conspicuity is objectively characterized by robust statistical measures of distribution separation.
Discover common properties of human observers' visual search and mathematical observers' scanning PART I: theory and conjecture
Xin He, Frank Samuelson, Rongping Zeng, et al.
There is a lack of consensus in measuring observer performance in search tasks. To pursue a consensus, we set our goal as to obtain metrics that are practical, meaningful and objective. We consider a metric practical if it can be implemented to measure human and computer observers’ performance. To be meaningful, we propose to discover metrics that reflect the intrinsic properties of search observers. Thus, the meaningfulness of the metrics is ensured by the discovered properties being intrinsic. We set our success criterion as that the discovered properties can make verifiable predictions. Thus the objectivity of the metrics is ensured by their predictive ability. The goal of this work is to present a theory and a conjecture toward two intrinsic properties of search observers: rationality in classification as measured by the location-known-exactly ROC curve and location uncertainty as measured by the effective set size. These two properties are used to develop search models in both single-response and free-response search tasks. To confirm whether these properties are “intrinsic”, in a companion paper, we investigate their ability to predict the search performance of both human and scanning channelized Hotelling observers.
Discover common properties of human observers' visual search and mathematical observers' scanning PART II: empirical studies using human and model observers
In a companion paper, we proposed the well-delineated-object conjecture to describe a rational observer’s behavior in a search task. We discovered two intrinsic properties to describe the performance of rational search observers: rationality in classification and location uncertainty. We proposed to use the location-known-exactly (LKE) ROC curve and the effective number of well-delineated objects or effective set size (M*) to quantify these two properties. The purpose of this paper is to develop an experimental framework to test the conjecture that was put forward in the companion paper. In particular, for each observer, we designed experiments to measure LKE ROC curve and M*, which were then used to predict the same observer’s performance in other search tasks. The predictions were then compared to the experimentally measured observer performance. Our results indicate that modeling the search performance using the LKE ROC curve and M* leads to successful predictions in most cases.
Poster Session
A comparison of Australian and USA radiologists' performance in detection of breast cancer
Wasfi I. Suleiman, Dianne Georgian-Smith M.D., Michael G. Evanoff, et al.
The aim of the current work was to compare the performance of radiologists who read a higher number of cases to those who read a lower number, as well as to examine the effect of number of years of experience on performance. This study compares Australian and USA radiologists with differing levels of experience when reading mammograms. Thirty mammographic cases were presented to 41 radiologists, 21 from Australia and 20 from the USA. Readers were asked to locate and visualize cancer and assign a mark-rating pair with confidence levels from 1 to 5. Jackknife free-response receiver operating characteristic (JAFROC), inferred receiver operating characteristic (ROC), sensitivity, specificity and location sensitivity were calculated. A Mann-Whitney test was used to compare the performance of Australian and USA radiologists using SPSS software. The results showed that the USA radiologists sampled had more years of experience (p≤0.01) but read fewer mammograms per year (p≤0.03). Significantly higher sensitivity and location sensitivity (p≤0.001) were found for the Australian radiologists when experience and the number of mammograms read per year were taken into account. There were no differences between the two countries in overall performance measured by JAFROC and inferred ROC. For the most experienced radiologists within the Australian sample, ROC and location sensitivity were higher when compared to the least experienced. The greater number of years of experience of the USA radiologists did not result in an increase in any performance metric. The number of cases read per year is a better predictor of improved diagnostic performance.
Investigations of internal noise levels for different target sizes, contrasts, and noise structures
To describe internal noise levels for different target sizes, contrasts, and noise structures, Gaussian targets with four different sizes (i.e., standard deviations of 2, 4, 6, and 8) and three different noise structures (i.e., white, low-pass, and high-pass) were generated. The generated noise images were scaled to have a standard deviation of 0.15. For each noise type, target contrasts were adjusted to have the same detectability based on the NPW observer, and the detectability of the CHO was calculated accordingly. For the human observer study, 3 trained observers performed 2AFC detection tasks, and the proportion correct, Pc, was calculated for each task. By adding the proper internal noise level to the numerical observers (i.e., NPW and CHO), the detectability of the numerical observers was matched to that of the human observers. Even though target contrasts were adjusted to have the same detectability for the NPW observer, the detectability of the human observers decreased as the target size increased. The internal noise level varies for different target sizes, contrasts, and noise structures, demonstrating that different internal noise levels should be considered in numerical observers to predict the detection performance of human observers.
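Matching human and numerical observers as described in this abstract requires converting the 2AFC proportion correct into a detectability index. Under the usual equal-variance Gaussian assumption the relation is Pc = Φ(d'/√2), so d' = √2 · Φ⁻¹(Pc). The sketch below implements that standard conversion; it is not taken from the paper, and the Pc value is hypothetical.

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()


def dprime_from_2afc(proportion_correct):
    """Detectability index implied by a 2AFC proportion correct (0 < Pc < 1)."""
    return 2 ** 0.5 * _STANDARD_NORMAL.inv_cdf(proportion_correct)


def pc_from_dprime(dprime):
    """2AFC proportion correct predicted for a given detectability index."""
    return _STANDARD_NORMAL.cdf(dprime / 2 ** 0.5)


# Hypothetical human result: 85% correct in a 2AFC detection task.
human_dprime = dprime_from_2afc(0.85)
```

An internal noise level can then be tuned until a numerical observer's d' drops to the human value; how that noise is injected differs between observer models and is not shown here.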
Investigating the visual inspection subjectivity on the contrast-detail evaluation in digital mammography images
A major difficulty in the interpretation of mammographic images is the low contrast and, in the case of early detection of breast cancer, the small size of the features of malignancy in findings such as microcalcifications. Furthermore, image assessment depends heavily on the observational ability of the expert who performs it, which can compromise the accuracy of the final diagnosis. With this in mind, this study evaluated the subjectivity of visual inspection in assessing contrast-detail in mammographic images. To this end, we compared the readings of images generated with the CDMAM phantom performed by four human observers, which allowed us to determine a threshold of contrast visibility for each disk diameter present in the phantom. These thresholds were compared graphically and by statistical measures, allowing us to build a strategy for the use of contrast and detail (dimensions) as parameters of quality in mammography.
The quest for 'diagnostically lossless' medical image compression: a comparative study of objective quality metrics for compressed medical images
Ilona Kowalik-Urbaniak, Dominique Brunet, Jiheng Wang, et al.
Our study, involving a collaboration with radiologists (DK, NSK) as well as a leading international developer of medical imaging software (AGFA), is primarily concerned with improved methods of assessing the diagnostic quality of compressed medical images and the investigation of compression artifacts resulting from JPEG and JPEG2000. In this work, we compare the performances of the Structural Similarity quality measure (SSIM), MSE/PSNR, compression ratio CR and JPEG quality factor Q, based on experimental data collected in two experiments involving radiologists. An ROC and Kolmogorov-Smirnov analysis indicates that compression ratio is not always a good indicator of visual quality. Moreover, SSIM demonstrates the best performance, i.e., it provides the closest match to the radiologists' assessments. We also show that a weighted Youden index and curve fitting method can provide SSIM and MSE thresholds for acceptable compression ratios.
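The threshold selection mentioned above can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not give the weighting scheme, so the weighted form J_w = 2(w · sensitivity + (1 − w) · specificity) − 1 used here is an assumption (it reduces to the usual Youden index, sensitivity + specificity − 1, when w = 0.5). The function name and the sample scores are hypothetical.

```python
def youden_threshold(scores, acceptable, weight=0.5):
    """Return (threshold, J) maximizing a weighted Youden index.

    scores:     quality metric per image (e.g. SSIM), higher means better.
    acceptable: reader label per image, True if judged acceptable.
    An image is predicted acceptable when score >= threshold.
    """
    positives = sum(1 for a in acceptable if a)
    negatives = len(acceptable) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one acceptable and one unacceptable image")

    best = None
    for t in sorted(set(scores)):  # every observed score is a candidate cut
        tp = sum(1 for s, a in zip(scores, acceptable) if a and s >= t)
        tn = sum(1 for s, a in zip(scores, acceptable) if not a and s < t)
        sensitivity = tp / positives
        specificity = tn / negatives
        j = 2.0 * (weight * sensitivity + (1.0 - weight) * specificity) - 1.0
        if best is None or j > best[1]:
            best = (t, j)
    return best


# Made-up example: six compressed images with reader judgments.
ssim_scores = [0.99, 0.97, 0.95, 0.93, 0.90, 0.85]
reader_ok = [True, True, True, False, False, False]
threshold, j_value = youden_threshold(ssim_scores, reader_ok)
```

Raising `weight` above 0.5 favors sensitivity (fewer acceptable images rejected), while lowering it favors specificity (fewer unacceptable images passed).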
A comparison of ROC inferred from FROC and conventional ROC
Mark F. McEntee, Stephen Littlefair, Mariusz W. Pietrzyk
This study aims to determine whether receiver operating characteristic (ROC) scores inferred from free-response receiver operating characteristic (FROC) data are equivalent to conventional ROC scores for the same readers and cases. Forty-five examining radiologists of the American Board of Radiology independently reviewed 47 PA chest radiographs under at least two conditions. Thirty-seven cases had abnormal findings and 10 cases had normal findings. Half the readers were asked first to locate any visualized lung nodules, mark them and assign a level of confidence [the FROC mark-rating pair], and second to give an overall rating to the entire image on the same scale [the ROC score]. The second half of the readers gave the ROC rating first, followed by the FROC mark-rating pairs. A normal image was represented with the number 1 and malignant lesions with the numbers 2-5. Jackknife free-response receiver operating characteristic (JAFROC) and inferred ROC (infROC) were calculated from the mark-rating pairs using JAFROC V4.1 software. ROC based on the overall rating of the image was calculated using DBM MRMC software, which was also used to compare infROC and ROC AUCs, treating the methods as modalities. Pearson's correlation coefficient and linear regression were used to examine their relationship using SPSS, version 21.0 (SPSS, Chicago, IL). The results of this study showed no significant difference between the ROC and inferred ROC AUCs (p≤0.25), while Pearson's correlation coefficient was 0.7 (p≤0.01). Inter-reader correlation calculated from Obuchowski-Rockette covariances ranged from 0.43 to 0.86, while intra-reader agreement was greater than previously reported, ranging from 0.68 to 0.82.
Visual search behaviour during laparoscopic cadaveric procedures
Laparoscopic surgery provides a very complex example of medical image interpretation. The task entails: visually examining a display that portrays the laparoscopic procedure from a varying viewpoint; eye-hand coordination; complex 3D interpretation of the 2D display imagery; efficient and safe usage of appropriate surgical tools, as well as other factors. Training in laparoscopic surgery typically entails practice using surgical simulators. Another approach is to use cadavers. Viewing previously recorded laparoscopic operations is also a viable additional approach. To examine this, a study was undertaken to determine what differences exist between where surgeons look during actual operations and where they look when simply viewing the same pre-recorded operations. It was hypothesised that there would be differences related to the different experimental conditions; however, the relative nature of such differences was unknown. The visual search behaviour of two experienced surgeons was recorded as they performed three types of laparoscopic operations on a cadaver. The operations were also digitally recorded. Subsequently they viewed the recordings of their operations, again whilst their eye movements were monitored. Differences were found in various eye movement parameters between when the two surgeons performed the operations and when they simply watched the recordings of the operations. It is argued that this reflects the different perceptual motor skills pertinent to the different situations. The relevance of this for surgical training is explored.
Direction of an initial saccade depends on radiological expertise
Purpose: To evaluate the role of radiographic details in the global impression of chest x-ray images viewed by experts in thoracic and non-thoracic domains. Materials and Methods: The study was approved by the IRB. Five thoracic and five non-thoracic radiologists participated in two tachistoscopic experiments (one low pass and one with the entire frequency spectrum, each lasting 270 ms), each containing 50 PA chest radiographs with 50% prevalence of pulmonary nodules. Eye movements were monitored in order to evaluate a pre-saccade shift of visual attention, saccade latency, decision time and the time to first fixation on a pulmonary nodule. Results: Thoracic radiologists showed a significantly higher pre-saccadic shift of visual attention towards pulmonary nodules when using the full frequency spectrum (p < 0.05). The initial saccade orientation made by these radiologists on full resolution images correlated significantly with their confidence ranking of pulmonary nodules (ρ = -0.387, p < 0.001). Conclusions: Thoracic radiologists benefited from the high spatial frequency appearance during a rapid presentation of chest radiographs by allocating pre-saccade attention towards pulmonary nodules. This behavior correlated with a higher number of correct decisions, followed by higher confidence in the decisions made, and briefer reaction times.
Preliminary experiments on quantification of skin condition
Kenzo Kitajima, Hitoshi Iyatomi
In this study, we investigated a preliminary assessment method for skin conditions, such as the moisturizing property and fineness of the skin, using image analysis alone. We captured facial images from volunteer subjects in their 30s to 60s with the Pocket Micro (R) device (Scalar Co., Japan). This device has two image capturing modes: the normal mode and the non-reflection mode, which uses the equipped polarization filter. We captured skin images from a total of 68 spots on the subjects' faces using both modes (i.e. a total of 136 skin images). The moisture-retaining property of the skin and a subjective evaluation score of the skin fineness on a 5-point scale for each case were also obtained in advance as a gold standard (their mean and SD were 35.15 +/- 3.22 (μS) and 3.45 +/- 1.17, respectively). We extracted a total of 107 image features from each image and built linear regression models for estimating the abovementioned criteria with stepwise feature selection. The developed model for estimating the skin moisture achieved an MSE of 1.92 (μS) with 6 selected parameters, while the model for skin fineness achieved an MSE of 0.51 scales with 7 parameters under leave-one-out cross validation. We confirmed that the developed models predicted the moisture-retaining property and fineness of the skin appropriately from the captured images alone.
Validation and comparison of intensity based methods for change detection in serial brain images
Žiga Lesjak, Žiga Špiclin, Boštjan Likar, et al.
Detection of longitudinal changes in brain structures is a common clinical task when assessing the progress of cerebrovascular and neurodegenerative diseases, which manifest in appearing and disappearing white matter lesions (WMLs). Changes of WMLs are usually quantified by their manual outlines and compared across longitudinal, serial magnetic resonance (MR) brain images. Since manual outlining in 3D MR images is subjective and inaccurate, several automated methods have been proposed to enhance the sensitivity, reliability and repeatability of change detection of WMLs. However, the absence of publicly available synthetic or clinical MR image databases with corresponding ground truth of changes renders the validation and comparison of any new and existing automated methods highly subjective. In this paper, we focus on the validation and comparison of three state-of-the-art intensity based methods for detection of longitudinal changes of WMLs. To objectively assess the three methods we created several synthetic MR image databases using a generative lesion model, which was trained on manually outlined patches of WMLs in a clinical MR image database of 22 patients. Validation was also performed on a clinical MR image database of MS patients. Performances of the three change detection methods were evaluated by computing the similarity index and sensitivity between the obtained and the ground truth binary change maps. The obtained similarity indices were in the range of 0.40-0.77, which should be improved for clinical use, while the comparison of methods revealed that the intensity subtraction method achieved similar performance to the change vector analysis method, which employed two MR sequences for change detection. The third method was based on local steering kernels and exhibited stable performance on both synthetic and clinical MR image databases.
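The evaluation step described above, comparing an obtained binary change map against a ground truth map, can be sketched as follows. The abstract does not define its similarity index; this sketch assumes the Dice coefficient, 2·TP / (2·TP + FP + FN), which is a common choice for this kind of comparison. The function name and the toy voxel data are hypothetical.

```python
def change_map_scores(predicted, truth):
    """Return (similarity_index, sensitivity) for two binary change maps.

    Both maps are flat sequences of 0/1 (or False/True) voxel labels of
    equal length. The similarity index is assumed to be the Dice coefficient.
    """
    if len(predicted) != len(truth):
        raise ValueError("maps must have the same number of voxels")

    tp = sum(1 for p, t in zip(predicted, truth) if p and t)      # change found
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)  # false alarm
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)  # change missed

    denominator = 2 * tp + fp + fn
    # Two empty maps agree perfectly, so define Dice as 1.0 in that case.
    dice = 2 * tp / denominator if denominator else 1.0
    # Sensitivity is undefined when the ground truth contains no change.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return dice, sensitivity


# Toy example with six voxels.
obtained = [1, 1, 0, 0, 1, 0]
ground_truth = [1, 0, 0, 1, 1, 0]
dice, sensitivity = change_map_scores(obtained, ground_truth)
```

For a 3D volume, the same function applies after flattening the voxel array into a single sequence.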
Validation of parameter estimation methods for determining optical properties of atherosclerotic tissues in intravascular OCT
In this paper we present a new process for assessing optical properties of tissues from 3D pullbacks, the standard clinical acquisition method for iOCT data. Our method analyzes a volume of interest (VOI) consisting of about 100 A-lines spread across the angle of rotation (θ) and along the artery, z. The new 3D method uses catheter correction, baseline removal, speckle noise reduction, alignment of A-line sequences, and robust estimation. We compare results to those from a more standard, “gold standard” stationary acquisition where many image frames are averaged to reduce noise. To do these studies in a controlled fashion, we use a realistic optical artery phantom consisting of multiple “tissue types.” Precision and accuracy for 3D pullback analysis are reported.

Our results indicate that when implementing the process on a stationary acquisition dataset, the uncertainty is reduced at each stage. When comparing the stationary acquisition dataset to the pullback dataset, the values were as follows: calcium: 3.8±1.09 mm⁻¹ in stationary and 3.9±1.2 mm⁻¹ in pullback; lipid: 11.025±0.417 mm⁻¹ in stationary and 11.27±0.25 mm⁻¹ in pullback; fibrous: 6.08±1.337 mm⁻¹ in stationary and 5.58±2.0 mm⁻¹ in pullback. These results indicate that the process presented in this paper introduces minimal bias and only a small change in uncertainty when comparing a stationary and a pullback dataset, thus paving the way to highly accurate clinical plaque type discrimination and enabling automatic classification.
Analysis of temporal dynamics in imagery during acute limb ischemia and reperfusion
John M. Irvine, John Regan, Tammy A. Spain, et al.
Ischemia and reperfusion injuries present major challenges for both military and civilian medicine. Improved methods for assessing the effects and predicting outcome could guide treatment decisions. Specific issues related to ischemia and reperfusion injury can include complications arising from tourniquet use, such as microvascular leakage in the limb, loss of muscle strength and systemic failures leading to hypotension and cardiac failure. Better methods for assessing the viability of limbs/tissues during ischemia and reducing complications arising from reperfusion are critical to improving clinical outcomes for at-risk patients. The purpose of this research is to develop and assess possible prediction models of outcome for acute limb ischemia using a pre-clinical model. Our model relies only on non-invasive imaging data acquired from an animal study. Outcome is measured by pathology and functional scores. We explore color, texture, and temporal features derived from both color and thermal motion imagery acquired during ischemia and reperfusion. The imagery features form the explanatory variables in a model for predicting outcome. Comparing model performance to outcome prediction based on direct observation of blood chemistry, blood gas, urinalysis, and physiological measurements provides a reference standard. Initial results show excellent performance for the imagery-based model compared to predictions based on direct measurements. This paper will present the models and supporting analysis, followed by recommendations for future investigations.
Clinical compliance of viewing conditions in radiology reporting environments against current guidelines and standards
Several studies have demonstrated the importance of environmental conditions in the radiology reporting environment, with many indicating that incorrect parameters could lead to error and misinterpretation. Literature is available with recommendations as to the levels that should be achieved in clinical practice, but evidence of adherence to these guidelines in radiology reporting environments is absent. This study audited the reporting environments of four teleradiology and eight hospital-based radiology reporting areas. This audit aimed to quantify adherence to guidelines and identify differences between the locations with respect to layout and design, monitor distance and angle, as well as the ambient factors of the reporting environments. In line with international recommendations, an audit tool was designed to record the layout and design of reporting environments, the monitor angles and distances used by radiologists when reporting, and ambient factors such as noise, light and temperature. The review of conditions was carried out by the same independent auditor for consistency. The results obtained were compared against international standards and current research. Each radiology environment was given an overall compliance score to establish whether or not it was in line with recommended guidelines. Poor compliance with international recommendations and standards among radiology reporting environments was identified. Teleradiology reporting environments demonstrated greater compliance than hospital environments. The findings of this study identified a need for greater awareness of environmental and perceptual issues in the clinical setting. Further work involving a larger number of clinical centres is recommended.
Study of quality perception in medical images based on comparison of contrast enhancement techniques in mammographic images
B. Matheus, L. B. Verçosa M.D., B. Barufaldi, et al.
With the absolute prevalence of digital images in mammography, several new tools have become available to radiologists, such as CAD schemes, digital zoom and contrast alteration. This work focuses on contrast variation and how the radiologist reacts to these changes when asked to evaluate image quality. Three contrast enhancing techniques were used in this study: conventional equalization, CCB Correction [1] (a digitization correction), and value subtraction. A set of 100 images taken from available online mammographic databases was used in the tests. The tests consisted of presenting all four versions of an image (the original plus the three contrast enhanced images) to the specialist, who was asked to rank them from best to worst quality for diagnosis. Analysis of the results demonstrated that CCB Correction [1] produced better images in almost all cases. Equalization, which mathematically produces a better contrast, was considered the worst for mammography image quality enhancement in the majority of cases (69.7%). The value subtraction procedure produced images considered better than the original in 84% of cases. Tests indicate that, for the radiologist’s perception, it seems more important to guarantee full visualization of nuances than a high contrast image. Another result observed is that the “ideal” scanner curve does not yield the best result for a mammographic image. The important contrast range is the middle of the histogram, where nodules and masses need to be seen and clearly distinguished.
MedXViewer: an extensible web-enabled software package for medical imaging
MedXViewer (Medical eXtensible Viewer) is an application designed to allow workstation-independent, PACS-less viewing and interaction with anonymised medical images (e.g. observer studies). The application was initially implemented for use in digital mammography and tomosynthesis but the flexible software design allows it to be easily extended to other imaging modalities. Regions of interest can be identified by a user and any associated information about a mark, an image or a study can be added. The questions and settings can be easily configured depending on the need of the research allowing both ROC and FROC studies to be performed. The extensible nature of the design allows for other functionality and hanging protocols to be available for each study. Panning, windowing, zooming and moving through slices are all available while modality-specific features can be easily enabled e.g. quadrant zooming in mammographic studies. MedXViewer can integrate with a web-based image database allowing results and images to be stored centrally. The software and images can be downloaded remotely from this centralised data-store. Alternatively, the software can run without a network connection where the images and results can be encrypted and stored locally on a machine or external drive. Due to the advanced workstation-style functionality, the simple deployment on heterogeneous systems over the internet without a requirement for administrative access and the ability to utilise a centralised database, MedXViewer has been used for running remote paper-less observer studies and is capable of providing a training infrastructure and co-ordinating remote collaborative viewing sessions (e.g. cancer reviews, interesting cases).
Atlas-registration based image segmentation of MRI human thigh muscles in 3D space
Ezak Ahmad, Moi Hoon Yap, Hans Degens, et al.
Automatic segmentation of anatomic structures in magnetic resonance thigh scans can be a challenging task due to the potential lack of precisely defined muscle boundaries and issues related to intensity inhomogeneity or bias field across an image. In this paper, we demonstrate a combined framework of atlas construction and image registration methods to propagate the desired region of interest (ROI) between the atlas image and the targeted MRI thigh scans for quadriceps muscle, femur cortical layer and bone marrow segmentation. The proposed system employs a semi-automatic segmentation method on an initial image in one dataset (from a series of images). The segmented initial image is then used as an atlas image to automate the segmentation of other images in the MRI scans (3-D space). The processes include: ROI labeling, atlas construction and registration, and morphological transformation of corresponding pixels (in terms of feature and intensity value) between the atlas (template) image and the targeted image based on the prior atlas information and non-rigid image registration methods.
Automatic segmentation of abdominal vessels for improved pancreas localization
Accurate automatic detection and segmentation of abdominal organs from CT images is important for quantitative and qualitative organ tissue analysis as well as computer-aided diagnosis. The large variability of organ locations, the spatial interaction between organs that appear similar in medical scans, and orientation and size variations are among the major challenges that make the task very difficult. The pancreas poses these challenges in addition to its flexibility, which allows the shape of the tissue to change vastly. Due to the close proximity of the pancreas to numerous surrounding organs within the abdominal cavity, the organ shifts according to the conditions of the organs within the abdomen; as such, the pancreas is constantly changing. When these challenges are combined with typical patient-to-patient variations and scanning conditions, the pancreas becomes harder to localize. In this paper we focus on three abdominal vessels that almost always abut the pancreas tissue and as such are useful landmarks for identifying the relative location of the pancreas. The splenic and portal veins extend from the hila of the spleen and liver, respectively, travel through the abdominal cavity and join at a position close to the head of the pancreas known as the portal confluence. A third vein, the superior mesenteric vein, anastomoses with the other two veins at the portal confluence. An automatic segmentation framework for obtaining the splenic vein, portal confluence and superior mesenteric vein is proposed using 17 contrast enhanced computed-tomography datasets. The proposed method uses outputs from the multi-organ multi-atlas label fusion and the Frangi vesselness filter to obtain automatic seed points for vessel tracking and generation of statistical models of the desired vessels. The approach shows the ability to identify the vessels and improve localization of the pancreas within the abdomen.
A new iterative method for liver segmentation from perfusion CT scans
Ahmed Draoua, Adélaïde Albouy-Kissi, Antoine Vacavant, et al.
Liver cancer is the third most common cancer in the world, and the majority of patients with liver cancer will die within one year as a result of the cancer. Liver segmentation in the abdominal area is critical for tumor diagnosis and for surgical procedures. Moreover, it is a challenging task, as liver tissue has to be separated from adjacent organs, especially the heart. In this paper we present a novel iterative liver segmentation method based on Fuzzy C-means (FCM) coupled with fast marching segmentation and mutual information. A prerequisite for this method is the determination of slice correspondences between the ground truth, that is, a few images segmented by an expert, and images that contain liver and heart at the same time.
Evaluation of correlation between CT image features and ERCC1 protein expression in assessing lung cancer prognosis
Stage I non-small-cell lung cancers (NSCLC) usually have a favorable prognosis. However, a high percentage of NSCLC patients have cancer relapse after surgery. Accurately predicting cancer prognosis is important to optimally treat and manage the patients to minimize the risk of cancer relapse. Studies have shown that the excision repair cross-complementing 1 (ERCC1) gene is a potentially useful genetic biomarker to predict the prognosis of NSCLC patients. Meanwhile, studies have also found that chronic obstructive pulmonary disease (COPD) is highly associated with lung cancer prognosis. In this study, we investigated and evaluated the correlations between COPD image features and ERCC1 gene expression. A database involving 106 NSCLC patients was used. Each patient had a thoracic CT examination and an ERCC1 genetic test. We applied a computer-aided detection scheme to segment and quantify COPD image features. A logistic regression method and a multilayer perceptron network were applied to analyze the correlation between the computed COPD image features and ERCC1 protein expression. A multilayer perceptron network (MPN) was also developed to test the performance of using COPD-related image features to predict ERCC1 protein expression. A nine-feature based logistic regression analysis showed that the average COPD feature values in the low and high ERCC1 protein expression groups are significantly different (p < 0.01). Using a five-fold cross validation method, the MPN yielded an area under the ROC curve (AUC) of 0.669±0.053 in classifying between the low and high ERCC1 expression cases. The study indicates that CT phenotype features are associated with the genetic tests, which may provide supplementary information to help improve accuracy in assessing the prognosis of NSCLC patients.
Comparison of two indirect detection flat panel imagers
Kent Ogden, Kimball Clark, Andrij Wojtowycz, et al.
To obtain clearance for the use of a new flat-panel indirect detection imager, the FDA required the manufacturer to provide evidence of the image quality. To this end, two sets of observer studies were conducted: one in which images from the detector were compared side by side with images from an approved detector, and a second in which each individual image was scored for image contrast, noise, and resolution. Statistical analysis of the results showed that there was no significant difference in the image quality produced by the two detectors. FDA 510(k) clearance was granted in May 2013.
Complementary cumulative precision distribution: a new graphical metric for medical image retrieval system
Several single-valued measures have been proposed by researchers for the quantitative performance evaluation of medical image retrieval systems. Precision and recall are the most common evaluation measures used by researchers. Amongst the graphical measures proposed, the precision vs. recall graph is the most common evaluation measure. The precision vs. recall graph evaluates different systems by varying the operating point (number of top retrievals considered). However, in real life the operating point for a given application is known. Therefore, it is essential to evaluate different retrieval systems at a particular operating point set by the user. None of the existing graphical metrics shows the variation in performance across query images over the entire database at a particular operating point. This paper proposes a graphical metric called Complementary Cumulative Precision Distribution (CCPD) that evaluates different systems at a particular operating point, considering each image in the database as a query. The strength of the metric is its ability to represent all these measures pictorially. The proposed metric (CCPD) pictorially represents the different possible values of precision and the fraction of query images at those precision values, keeping the number of top retrievals constant. Different scalar measures are derived from the proposed graphical metric (CCPD) for effective evaluation of retrieval systems. It is also observed that the proposed metric can be used as a tie breaker when the performances of different methods are very close to each other in terms of average precision.
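One possible reading of the metric described above is sketched below. The abstract gives no formula, so this is an assumption: at a fixed operating point k, precision is computed for every query image, and the curve reports, for each attainable precision level p, the fraction of queries whose precision is at least p. The function names, the choice of "at least" rather than "greater than", and the toy relevance lists are all hypothetical.

```python
def precision_at_k(relevance, k):
    """Precision over the top-k retrievals for one query.

    relevance: ranked list of 0/1 flags, 1 if the retrieved image is relevant.
    Lists shorter than k are treated as if padded with non-relevant items.
    """
    return sum(1 for r in relevance[:k] if r) / k


def ccpd(per_query_relevance, k):
    """Complementary cumulative precision distribution at operating point k.

    Returns a list of (precision_level, fraction_of_queries) pairs, where the
    fraction counts queries with precision@k >= precision_level.
    """
    if k <= 0 or not per_query_relevance:
        raise ValueError("need k > 0 and at least one query")

    precisions = [precision_at_k(r, k) for r in per_query_relevance]
    n = len(precisions)
    levels = [i / k for i in range(k + 1)]  # attainable precision values
    tolerance = 1e-12                       # guard against float round-off
    return [(p, sum(1 for q in precisions if q >= p - tolerance) / n)
            for p in levels]


# Toy example: three query images, top-2 retrievals each.
ranked_relevance = [[1, 1], [1, 0], [0, 0]]
curve = ccpd(ranked_relevance, k=2)
```

Under this reading, two systems with the same average precision can still produce different curves, because the curve shows how precision is spread over the query images rather than only its mean.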