Medical Imaging 2016: Image Perception, Observer Performance, and Technology Assessment

Volume Details

Date Published: 17 June 2016

Contents: 10 Sessions, 57 Papers, 0 Presentations

Conference: SPIE Medical Imaging 2016

Volume Number: 9787

Front Matter: Volume 9787
Technology Assessment in Breast Imaging
Model Observers I
Perception Metrology
Perception
Keynote and ROC Analysis
Model Observers II: Search
Breast Imaging II
Technology Assessment
Poster Session

Front Matter: Volume 9787

This PDF file contains the front matter associated with SPIE Proceedings Volume 9787, including the Title Page, Copyright information, Table of Contents, Introduction (if any), and Conference Committee listing.

Technology Assessment in Breast Imaging

Quantra reproduces BI-RADS assessment on a two-point scale

Ernest U. Ekpo, Claudia Mello-Thoms, Mary Rickard, et al.

PURPOSE: To assess the performance of Quantra™ in reproducing BI-RADS® mammographic breast density (MBD) assessment. METHODS: Two methods of MBD assessment were used (Quantra™ and BI-RADS®). Volumetric breast density measurement from 292 raw projection images was performed using Quantra™. BI-RADS® assessment was performed by three radiologists and a majority report (consensus of at least two radiologists) was generated. Inter-reader agreement (κ), agreement, and the sensitivity and specificity of Quantra™ in reproducing BI-RADS® rating were calculated on a four-grade (1, 2, 3, and 4) and two-grade (1–2 vs. 3–4) scale. RESULTS: The majority BI-RADS® report in the dataset consisted of 9.6% (n = 28), 35.3% (n = 103), 27.1% (n = 79), and 28.1% (n = 82) for BI-RADS® 1, 2, 3, and 4 respectively. Intra-reader agreement (κ) was 0.86 (95%CI: 0.83 – 0.91) to 0.88 (95%CI: 0.85 – 0.93) on a four-grade and 0.88 (95%CI: 0.83 – 0.92) to 0.91 (95%CI: 0.88 – 0.95) on a two-grade scale. Inter-reader agreement (κ) was substantial [0.66 (95%CI: 0.62 – 0.71) to 0.75 (95%CI: 0.70 – 0.81)] on a four-grade scale and substantial to almost perfect [0.77 (95%CI: 0.73 – 0.82) to 0.89 (95%CI: 0.84 – 0.93)] on a two-grade scale. Quantra™ correctly classified 35.7%, 91.2%, 88.6%, and 50.3% of BI-RADS® 1, 2, 3, and 4 respectively. It also demonstrated 91.3% sensitivity and 83.6% specificity in reproducing BI-RADS® on a two-grade scale (1–2 vs. 3–4). CONCLUSION: Quantra™ has limited performance in reproducing BI-RADS® rating on a four-grade scale; however, it closely reproduces BI-RADS® assessment on a two-grade scale.
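The agreement statistics in this abstract can be illustrated with a minimal sketch. It assumes unweighted Cohen's κ (the abstract does not say whether a weighted variant was used) and treats grades 3–4 as the positive class for sensitivity and specificity, which is also an assumption. The function names and ratings are hypothetical illustrations, not study data.

```python
from collections import Counter

def cohen_kappa(r1, r2):
    """Unweighted Cohen's kappa for two equal-length lists of ratings."""
    n = len(r1)
    observed = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement from the product of the two raters' marginal frequencies.
    expected = sum(c1[k] * c2[k] for k in c1) / (n * n)
    return (observed - expected) / (1 - expected)

def to_two_grade(ratings):
    """Collapse grades 1-2 to 0 and grades 3-4 to 1."""
    return [int(g >= 3) for g in ratings]

def sensitivity_specificity(reference, test):
    """Sensitivity and specificity of binary `test` against binary `reference`."""
    tp = sum(r == 1 and t == 1 for r, t in zip(reference, test))
    fn = sum(r == 1 and t == 0 for r, t in zip(reference, test))
    tn = sum(r == 0 and t == 0 for r, t in zip(reference, test))
    fp = sum(r == 0 and t == 1 for r, t in zip(reference, test))
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical four-grade ratings (reference = majority report, software = automated tool).
reference = [1, 2, 3, 4, 2, 3]
software = [1, 2, 3, 3, 2, 4]

kappa_four = cohen_kappa(reference, software)
kappa_two = cohen_kappa(to_two_grade(reference), to_two_grade(software))
sens, spec = sensitivity_specificity(to_two_grade(reference), to_two_grade(software))
print(kappa_four, kappa_two, sens, spec)
```

In this toy data the two disagreements are both between grades 3 and 4, so they disappear once the scale is collapsed to two grades. The same mechanism can make two-grade agreement higher than four-grade agreement.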

Validated novel software to measure the conspicuity index of lesions in DICOM images

K. R. Szczepura, D. J. Manning

A novel software programme and associated Excel spreadsheet have been developed to provide an objective measure of the expected visual detectability of focal abnormalities within DICOM images. ROIs are drawn around the abnormality; the software then fits the lesion using a least-squares method to recognize the edges of the lesion based on the full width at half maximum. 180 line profiles are then plotted around the lesion, giving 360 edge profiles.

Impact of two types of image processing on cancer detection in mammography

Lucy M. Warren, Mark D. Halling-Brown, Padraig T. Looney, et al.

The impact of image processing on cancer detection is still a concern to radiologists and physicists. This work aims to evaluate the effect of two types of image processing on cancer detection in mammography. An observer study was performed in which six radiologists inspected 349 cases (a mixture of normal cases, benign lesions and cancers) processed with two types of image processing. The observers marked areas they suspected were cancers. JAFROC analysis was performed to determine if there was a significant difference in cancer detection between the two types of image processing. Cancer detection was significantly better with the standard setting image processing (flavor A) compared with one that provides enhanced image contrast (flavor B), p = 0.036. The image processing was applied to images of the CDMAM test object, which were then analysed using CDCOM. The threshold gold thickness measured with the CDMAM test object was thinner using flavor A than flavor B image processing. Since flavor A was found to be superior in both the observer study and the measurements using the CDMAM phantom, this may indicate that measurements using the CDMAM correlate with change in cancer detection with different types of image processing.

Potential workflow advantages with single 8MP versus dual 5MP displays

Elizabeth A. Krupinski

This study compared a single 8MP display with dual 5MP displays for diagnostic accuracy, reading time, number of times readers zoomed/panned images, and visual search. Six radiologists viewed 60 mammographic cases, once on each display, 15 with eye-tracking. For viewing time, there was a significant difference (F = 13.901, p = 0.0002), with 8MP taking less time (62.04 vs 68.99 sec). There was no significant difference (F = 0.254, p = 0.6145) in zoom/pan use (1.94 vs 1.89). Total number of fixations was significantly (F = 4.073, p = 0.0466) lower with 8MP (134.47 vs 154.29). Number of times readers scanned between images was significantly lower (F = 10.305, p = 0.0018) with the single display (6.83 vs 8.22). Time to first fixate lesion did not differ (F = 0.126, p = 0.7240). It did not take any longer to detect the lesion as a function of the display configuration. Total time spent on lesion did not differ (F = 0.097, p = 0.7567) (8.59 vs 8.39). Overall the single 8MP display yielded the same diagnostic accuracy as the dual 5MP displays. The lower resolution did not appear to influence the readers’ ability to detect and view the lesion details, as the eye-position study showed no differences in time to first fixate or total time on the lesions. Nor did the lower resolution result in significant differences in the amount of zooming and panning that the readers did while viewing the cases.

Discriminatory power of common genetic variants in personalized breast cancer diagnosis

Yirong Wu, Craig K. Abbey, Jie Liu, et al.

Technological advances in genome-wide association studies (GWAS) have engendered optimism that we have entered a new age of precision medicine, in which the risk of breast cancer can be predicted on the basis of a person’s genetic variants. The goal of this study is to evaluate the discriminatory power of common genetic variants in breast cancer risk estimation. We conducted a retrospective case-control study drawing from an existing personalized medicine data repository. We collected variables that predict breast cancer risk: 153 high-frequency/low-penetrance genetic variants, reflecting the state-of-the-art GWAS on breast cancer, mammography descriptors and BI-RADS assessment categories in the Breast Imaging Reporting and Data System (BI-RADS) lexicon. We trained and tested naïve Bayes models by using these predictive variables. We generated ROC curves and used the area under the ROC curve (AUC) to quantify predictive performance. We found that genetic variants achieved comparable predictive performance to BI-RADS assessment categories in terms of AUC (0.650 vs. 0.659, p-value = 0.742), but significantly lower predictive performance than the combination of BI-RADS assessment categories and mammography descriptors (0.650 vs. 0.751, p-value < 0.001). A better understanding of the relative predictive capability of genetic variants and mammography data may help clinicians and patients make appropriate decisions about breast cancer screening, prevention, and treatment in the era of precision medicine.

Model Observers I

Can model observers be developed to reproduce radiologists' diagnostic performances? Our study says not so fast!

Juhun Lee, Robert M. Nishikawa, Ingrid Reiser, et al.

The purpose of this study was to determine radiologists’ diagnostic performances on different image reconstruction algorithms that could be used to optimize image-based model observers. We included a total of 102 pathology proven breast computed tomography (CT) cases (62 malignant). An iterative image reconstruction (IIR) algorithm was used to obtain 24 reconstructions with different image appearance for each image. Using quantitative image feature analysis, three IIRs and one clinical reconstruction of 50 lesions (25 malignant) were selected for a reader study. The reconstructions spanned a range of smooth-low noise to sharp-high noise image appearance. The trained classifiers’ AUCs on the above reconstructions ranged from 0.61 (for smooth reconstruction) to 0.95 (for sharp reconstruction). Six experienced MQSA radiologists read 200 cases (50 lesions times 4 reconstructions) and provided the likelihood of malignancy of each lesion. Radiologists’ diagnostic performances (AUC) ranged from 0.7 to 0.89. However, there was no agreement among the six radiologists on which image appearance was the best, in terms of radiologists’ having the highest diagnostic performances. Specifically, two radiologists indicated sharper image appearance was diagnostically superior, another two radiologists indicated smoother image appearance was diagnostically superior, and another two radiologists indicated all image appearances were diagnostically similar to each other. Due to the poor agreement among radiologists on the diagnostic ranking of images, it may not be possible to develop a model observer for this particular imaging task.

Applying the J-optimal channelized quadratic observer to SPECT myocardial perfusion defect detection

Meredith K. Kupinski, Eric Clarkson, Michael Ghaly, et al.

To evaluate performance on a perfusion defect detection task from 540 image pairs of myocardial perfusion SPECT image data we apply the J-optimal channelized quadratic observer (J-CQO). We compare AUC values of the linear Hotelling observer and J-CQO when the defect location is fixed and when it occurs in one of two locations. As expected, when the location is fixed a single channel maximizes AUC; location variability requires multiple channels to maximize the AUC. The AUC is estimated from both the projection data and reconstructed images. J-CQO is quadratic since it uses the first- and second-order statistics of the image data from both classes. The linear data reduction by the channels is described by an L x M channel matrix and in prior work we introduced an iterative gradient-based method for calculating the channel matrix. The dimensionality reduction from M measurements to L channels yields better estimates of these sample statistics from smaller sample sizes, and since the channelized covariance matrix is L x L instead of M x M, the matrix inverse is easier to compute. The novelty of our approach is the use of Jeffrey’s divergence (J) as the figure of merit (FOM) for optimizing the channel matrix. We previously showed that the J-optimal channels are also the optimum channels for the AUC and the Bhattacharyya distance when the channel outputs are Gaussian distributed with equal means. This work evaluates the use of J as a surrogate FOM (SFOM) for AUC when these statistical conditions are not satisfied.
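For orientation, the quantities named in this abstract can be written out as a short worked form. This is a sketch under stated assumptions, not the authors' derivation: it takes J as the sum of the two Kullback-Leibler divergences (some texts include an extra factor of 1/2) and assumes Gaussian channel outputs. The symbols T, g, μ_i, K_i, and C_i are notation introduced here for illustration.

```latex
% Channel outputs: T is the L x M channel matrix, g the M x 1 data vector.
v = T g
% Class i (i = 1, 2) has data-space mean \mu_i and covariance K_i, so
\bar{v}_i = T \mu_i, \qquad C_i = T K_i T^{\top} \quad (L \times L)

% Jeffreys divergence taken as the symmetrized Kullback-Leibler divergence:
J = D_{\mathrm{KL}}(p_1 \,\|\, p_2) + D_{\mathrm{KL}}(p_2 \,\|\, p_1)

% For Gaussian channel outputs the log-determinant terms cancel, leaving
J = \tfrac{1}{2}\,\mathrm{tr}\!\left( C_2^{-1} C_1 + C_1^{-1} C_2 - 2 I_L \right)
  + \tfrac{1}{2}\,\Delta\bar{v}^{\top} \left( C_1^{-1} + C_2^{-1} \right) \Delta\bar{v},
\qquad \Delta\bar{v} = \bar{v}_1 - \bar{v}_2
```

With equal class means the second term vanishes, so J depends only on the two L x L channelized covariance matrices. Only these small matrices need to be inverted, which fits the abstract's remark that the channelized inverse is easier to compute.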

Identification of error making patterns in lesion detection on digital breast tomosynthesis using computer-extracted image features

Mengyu Wang, Jing Zhang, Lars J. Grimm, et al.

Digital breast tomosynthesis (DBT) can improve lesion visibility by eliminating the issue of overlapping breast tissue present in mammography. However, this new modality likely requires new approaches to training. The issue of training in DBT is not well explored. We propose a computer-aided educational approach for DBT training. Our hypothesis is that the trainees’ educational outcomes will improve if they are presented with cases individually selected to address their weaknesses. In this study, we focus on the question of how to select such cases. Specifically, we propose an algorithm that based on previously acquired reading data predicts which lesions will be missed by the trainee for future cases (i.e., we focus on false negative error). A logistic regression classifier was used to predict the likelihood of trainee error and computer-extracted features were used as the predictors. Reader data from 3 expert breast imagers was used to establish the ground truth and reader data from 5 radiology trainees was used to evaluate the algorithm performance with repeated holdout cross validation. Receiver operating characteristic (ROC) analysis was applied to measure the performance of the proposed individual trainee models. The preliminary experimental results for 5 trainees showed the individual trainee models were able to distinguish the lesions that would be detected from those that would be missed with the average area under the ROC curve of 0.639 (95% CI, 0.580-0.698). The proposed algorithm can be used to identify difficult cases for individual trainees.

Location- and lesion-dependent estimation of background tissue complexity for anthropomorphic model observer

Ali R. N. Avanaki, Kathryn Espig, Eddie Knippel, et al.

In this paper, we specify a notion of background tissue complexity (BTC) as perceived by a human observer that is suited for use with model observers. This notion of BTC is a function of image location and lesion shape and size. We propose four unsupervised BTC estimators based on: (i) perceived pre- and post-lesion similarity of images, (ii) lesion border analysis (LBA; conspicuous lesion should be brighter than its surround), (iii) tissue anomaly detection, and (iv) mammogram density measurement. The latter two are existing methods we adapt for location- and lesion-dependent BTC estimation. To validate the BTC estimators, we ask human observers to measure BTC as the visibility threshold amplitude of an inserted lesion at specified locations in a mammogram. Both human-measured and computationally estimated BTC varied with lesion shape (from circular to oval), size (from small circular to larger circular), and location (different points across a mammogram). BTCs measured by different human observers are correlated (ρ=0.67). BTC estimators are highly correlated to each other (0.84<ρ<0.95) and less so to human observers (ρ≤0.81). With change in lesion shape or size, estimated BTC by LBA changes in the same direction as human-measured BTC. A generalization of proposed methods for viewing breast tomosynthesis sequences in cine mode is outlined. The proposed estimators, as-is or customized to a specific human observer, may be used to construct a BTC-aware model observer, with applications such as optimization of contrast-enhanced medical imaging systems, and creation of a diversified image dataset with characteristics of a desired population.

Task-based optimization of flip angle for texture analysis in MRI

Jonathan F. Brand, Lars R. Furenlid, Maria I. Altbach, et al.

Chronic liver disease is a worldwide health problem, and hepatic fibrosis (HF) is one of the hallmarks of the disease. The current reference standard for diagnosing HF is biopsy followed by pathologist examination; however, this is limited by sampling error and carries risk of complications. Pathology diagnosis of HF is based on textural change in the liver as a lobular collagen network that develops within portal triads. The scale of collagen lobules is characteristically on the order of 1-5 mm, which approximates the resolution limit of in vivo gadolinium-enhanced magnetic resonance imaging in the delayed phase. We have shown that MRI of formalin-fixed human ex vivo liver samples mimics the textural contrast of in vivo Gd-MRI and can be used as MRI phantoms. We have developed local texture analysis that is applied to phantom images, and the results are used to train model observers. The performance of the observer is assessed with the area under the receiver operating characteristic curve (AUROC) as the figure of merit. To optimize the MRI pulse sequence, phantoms are scanned multiple times at a range of flip angles. The flip angle associated with the highest AUROC is chosen as optimal based on the task of detecting HF.

Task-based detectability comparison of exponential transformation of free-response operating characteristic (EFROC) curve and channelized Hotelling observer (CHO)

P. Khobragade, Jiahua Fan, Franco Rupcich, et al.

This study quantitatively evaluated the performance of the exponential transformation of the free-response operating characteristic curve (EFROC) metric, with the Channelized Hotelling Observer (CHO) as a reference. The CHO has been used for image quality assessment of reconstruction algorithms and imaging systems and often it is applied to study the signal-location-known cases. The CHO also requires a large set of images to estimate the covariance matrix. In terms of clinical applications, this assumption and requirement may be unrealistic. The newly developed location-unknown EFROC detectability metric is estimated from the confidence scores reported by a model observer. Unlike the CHO, EFROC does not require a channelization step and is a non-parametric detectability metric. There are few quantitative studies available on application of the EFROC metric, most of which are based on simulation data. This study investigated the EFROC metric using experimental CT data. A phantom with four low contrast objects: 3 mm (14 HU), 5 mm (7 HU), 7 mm (5 HU) and 10 mm (3 HU) was scanned at dose levels ranging from 25 mAs to 270 mAs and reconstructed using filtered backprojection. The area under the curve values for CHO (AUC) and EFROC (A_FE) were plotted with respect to different dose levels. The number of images required to estimate the non-parametric A_FE metric was calculated for varying tasks and found to be less than the number of images required for parametric CHO estimation. The A_FE metric was found to be more sensitive to changes in dose than the CHO metric. This increased sensitivity and the assumption of unknown signal location may be useful for investigating and optimizing CT imaging methods. Future work is required to validate the A_FE metric against human observers.

Perception Metrology

Semi-parametric estimation of the area under the precision-recall curve

Berkman Sahiner, Weijie Chen, Aria Pezeshk, et al.

Precision and recall are two common metrics used in the evaluation of information retrieval systems. By changing the number of retrieved documents, one can obtain a precision-recall curve. The area under the precision-recall curve (AUCPR) has been suggested as a performance measure for information retrieval systems, in a manner similar to the use of the area under the receiver operating characteristic curve in binary classification. Limited work has been performed in the literature to investigate the bias and variance of AUCPR estimators. The goal of our study was to investigate the bias and variability of a semi-parametric binormal method for estimating the AUCPR, and to compare it to other techniques, such as average precision (AP) and lower trapezoid (LT) approximation. We show how AUCPR can be obtained given the binormal model parameters, and how its variance can be estimated using the delta method. We performed simulation experiments with normal and non-normal data, and investigated the effect of sample size and prevalence. Our results indicated that the semi-parametric binormal approach provided AUCPR estimates with small bias and confidence intervals with acceptable coverage when the sample size was large, and the performance of the binormal model was comparable to or better than alternative methods evaluated in this study when the sample size was small. We conclude that the semi-parametric binormal model can be used to accurately estimate the AUCPR, and that the confidence intervals derived from the model can be at least as accurate as from other alternatives, even for non-normal decision variable distributions.
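As a point of reference for the comparison methods named in this abstract, the following is a minimal sketch of the average precision (AP) estimator only. It is not the semi-parametric binormal method or the lower trapezoid approximation from the paper. It assumes binary labels (1 = relevant/positive) and no tied scores, and the example data are made up.

```python
def average_precision(scores, labels):
    """Average precision: the mean of the precision values observed at the
    rank of each positive item, with items sorted by decreasing score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(labels)
    true_positives = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            # Each positive adds a recall step of 1/n_pos, weighted by the
            # precision at that rank.
            ap += (true_positives / rank) / n_pos
    return ap

# Hypothetical scores and ground-truth labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 0, 1, 1, 0]
ap_example = average_precision(scores, labels)
print(ap_example)
```

For this toy ranking the positives sit at ranks 1, 3, and 4, so the estimate is (1/1 + 2/3 + 3/4) / 3 = 29/36.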

Proper bibeta ROC model: algorithm, software, and performance evaluation

Weijie Chen, Nan Hu

Semi-parametric models are often used to fit data collected in receiver operating characteristic (ROC) experiments to obtain a smooth ROC curve and ROC parameters for statistical inference purposes. The proper bibeta model as recently proposed by Mossman and Peng enjoys several theoretical properties. In addition to having explicit density functions for the latent decision variable and an explicit functional form of the ROC curve, the two parameter bibeta model also has simple closed-form expressions for true-positive fraction (TPF), false-positive fraction (FPF), and the area under the ROC curve (AUC). In this work, we developed a computational algorithm and R package implementing this model for ROC curve fitting. Our algorithm can deal with any ordinal data (categorical or continuous). To improve accuracy, efficiency, and reliability of our software, we adopted several strategies in our computational algorithm including: (1) the LABROC4 categorization to obtain the true maximum likelihood estimation of the ROC parameters; (2) a principled approach to initializing parameters; (3) analytical first-order and second-order derivatives of the likelihood function; (4) an efficient optimization procedure (the L-BFGS algorithm in the R package “nlopt”); and (5) an analytical delta method to estimate the variance of the AUC. We evaluated the performance of our software with intensive simulation studies and compared with the conventional binormal and the proper binormal-likelihood-ratio models developed at the University of Chicago. Our simulation results indicate that our software is highly accurate, efficient, and reliable.

MRMC analysis of agreement studies

Brandon D. Gallas, Amrita Anam, Weijie Chen, et al.

The purpose of this work is to present and evaluate methods based on U-statistics to compare intra- or inter-reader agreement across different imaging modalities. We apply these methods to multi-reader multi-case (MRMC) studies. We measure reader-averaged agreement and estimate its variance accounting for the variability from readers and cases (an MRMC analysis). In our application, pathologists (readers) evaluate patient tissue mounted on glass slides (cases) in two ways. They evaluate the slides on a microscope (reference modality) and they evaluate digital scans of the slides on a computer display (new modality). In the current work, we consider concordance as the agreement measure, but many of the concepts outlined here apply to other agreement measures. Concordance is the probability that two readers rank two cases in the same order. Concordance can be estimated with a U-statistic and thus it has some nice properties: it is unbiased, asymptotically normal, and its variance is given by an explicit formula. Another property of a U-statistic is that it is symmetric in its inputs; it doesn't matter which reader is listed first or which case is listed first, the result is the same. Using this property and a few tricks while building the U-statistic kernel for concordance, we get a mathematically tractable problem and efficient software. Simulations show that our variance and covariance estimates are unbiased.
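The definition of concordance given in this abstract (the probability that two readers rank two cases in the same order) can be sketched as a pairwise average over cases. This is an illustration, not the authors' software. The handling of ties is an assumption: a pair counts as concordant only when both readers order it the same way or both tie. The function name and scores are hypothetical.

```python
from itertools import combinations

def concordance(scores_a, scores_b):
    """Fraction of case pairs that two readers place in the same order."""
    def sign(x):
        return (x > 0) - (x < 0)

    pairs = list(combinations(range(len(scores_a)), 2))
    same_order = sum(
        sign(scores_a[i] - scores_a[j]) == sign(scores_b[i] - scores_b[j])
        for i, j in pairs
    )
    return same_order / len(pairs)

# Hypothetical scores from two readers on the same four cases.
reader_a = [1, 2, 3, 4]
reader_b = [1, 3, 2, 4]
c_ab = concordance(reader_a, reader_b)
print(c_ab)
```

The kernel compares only the signs of score differences, so swapping the two readers or the two cases in a pair leaves the result unchanged. This matches the symmetry property the abstract mentions. A reader-averaged value would average this quantity over reader pairs.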

Quality metrics can help the expert during neurological clinical trials

L. Mahé, F. Autrusseau, H. Desal, et al.

Carotid surgery is a frequent procedure, with 15 to 20 thousand operations per year in France. Cerebral perfusion has to be tracked before and after carotid surgery. In this paper, a diagnosis support using quality metrics is proposed to detect vascular lesions on MR images. Our key aim is to provide a detection tool mimicking the behavior of the human visual system during visual inspection. Relevant Human Visual System (HVS) properties should be integrated in our lesion detection method, which must be robust to common distortions in medical images. Our goal is twofold: to help the neuroradiologist perform their task better and faster, and to provide a way to reduce the risk of bias in image analysis. Objective quality metrics (OQM) are methods whose goal is to predict the perceived quality. In this work, we use objective quality metrics to detect perceivable differences between pairs of images.

Performance comparison of quantitative semantic features and lung-RADS in the National Lung Screening Trial

Qian Li, Yoganand Balagurunathan, Ying Liu, et al.

Background: Lung-RADS is the new oncology classification guideline proposed by the American College of Radiology (ACR), which provides recommendations for further follow-up in lung cancer screening. However, only two features (solidity and size) are included in this system. We hypothesize that additional semantic features can be used to better characterize lung nodules and diagnose cancer. Objective: We propose to develop and characterize a systematic methodology based on semantic image traits to more accurately predict occurrence of cancerous nodules. Methods: 24 radiological image traits were systematically scored on a point scale (up to 5) by a trained radiologist, and lung-RADS was independently scored. A linear discriminant model was used on the semantic features to assess their performance in predicting cancer status. The semantic predictors were then compared to lung-RADS classification in 199 patients (60 cancers, 139 normal controls) obtained from the National Lung Screening Trial. Result: There were different combinations of semantic features that were strong predictors of cancer status. Of these, contour, border definition, size, solidity, focal emphysema, focal fibrosis and location emerged as top candidates. The performance of two semantic features (short axial diameter and contour) had an AUC of 0.945, and was comparable to that of lung-RADS (AUC: 0.871). Conclusion: We propose that a semantics-based discrimination approach may act as a complement to the lung-RADS to predict cancer status.

Perception

The classification of normal screening mammograms

Zoey Z. Y. Ang, Mohammad A. Rawashdeh, Robert Heard, et al.

Rationale and objectives: To understand how breast screen readers classify the difficulty of normal screening mammograms using common lexicon describing normal appearances. Cases were also assessed on their suitability for a single reader strategy.

Materials and Methods: 15 breast readers were asked to interpret a test set of 29 normal screening mammogram cases and classify them by rating the difficulty of the case on a five-point Likert scale, identifying the salient features and assessing their suitability for single reading. Using the False Positive Fractions from a previous study, the 29 cases were classified into 10 "low", 10 "medium" and nine "high" difficulties. Data was analyzed with descriptive statistics. Spearman's correlation was used to test the strength of association between the difficulty of the cases and the readers’ recommendation for single reading strategy.

Results: The ratings from readers in this study corresponded to the known difficulty level of cases for the 'low' and 'high' difficulty cases. Uniform ductal pattern and density, symmetrical mammographic features and the absence of micro-calcifications were the main reasons associated with 'low' difficulty cases. The 'high' difficulty cases were described as having ‘dense breasts’. There was a statistically significant negative correlation between the difficulty of the cases and readers’ recommendation for single reading (r = -0.475, P = 0.009).

Conclusion: The findings demonstrated potential relationships between certain mammographic features and the difficulty for readers to classify mammograms as 'normal'. The standard Australian practice of double reading was deemed more suitable for most cases. There was an inverse moderate association between the difficulty of the cases and the recommendations for single reading.

The potential of pigeons as surrogate observers in medical image perception studies

Elizabeth A. Krupinski, Richard M. Levenson, Victor Navarro, et al.

Assessment of medical image quality and how changes in image appearance impact performance are critical but assessment can be expensive and time-consuming. Could an animal (pigeon) observer with well-known visual skills and documented ability to distinguish complex visual stimuli serve as a surrogate for the human observer? Using sets of whole slide pathology (WSI) and mammographic images we trained pigeons (cohorts of 4) to detect and/or classify lesions in medical images. Standard training methods were used. A chamber equipped with a 15-inch display with a resistive touchscreen was used to display the images and record responses (pecks). Pigeon pellets were dispensed for correct responses. The pigeons readily learned to distinguish benign from malignant breast cancer histopathology in WSI (mean % correct responses rose from 50% to 85% over 15 days) and generalized readily from 4X to 10X and 20X magnifications; to detect microcalcifications (mean % correct responses rose from 50% to over 85% over 25 days); to distinguish benign from malignant breast masses (3 of 4 birds learned this task to around 80% and 60% over 10 days); and ignore compression artifacts in WSI (performance with uncompressed slides averaged 95% correct; 15:1 and 27:1 compression slides averaged 92% and 90% correct). Pigeon models may help us better understand medical image perception and may be useful in quality assessment by serving as surrogate observers for certain types of studies.

The impact of radiology expertise upon the localization of subtle pulmonary lesions

John W. Robinson, Patrick C. Brennan, Claudia Mello-Thoms, et al.

Rationale and objectives: This study investigates the influence of radiology expertise in the correct localization of lesions when radiologists are requested to complete an observer task. Specifically, the ability to detect pulmonary lesions of different subtleties is explored in relation to radiologists’ reported specialty. Materials and Methods: Institutional ethics approval was granted. Ten radiologists (5 thoracic, 5 non-thoracic) interpreted 40 posterior-anterior (PA) chest x-rays (CXRs) consisting of 21 normal and 19 abnormal cases (solitary pulmonary nodule). The abnormal cases contained a solitary nodule with an established subtlety (subtlety 5 = obvious to subtlety 1 = extremely subtle). Radiologists read the test set and identified any pulmonary nodule using a 1-5 confidence scale (1 = no pulmonary nodule to 5 = highest confidence case contains a pulmonary lesion). The radiologists interpreted the image bank twice and the cases were randomized for each reader between reads. Results: The Kruskal-Wallis test identified that subtlety of nodules significantly influenced the sensitivity of non-thoracic radiologists (P<0.0001) and thoracic radiologists (P<0.0001). A Wilcoxon rank test demonstrated a significant difference in sensitivity for radiologist specialisation (P=0.013), with thoracic radiologists better compared to non-thoracic radiologists (mean sensitivity 0.479 and 0.389 respectively). The sensitivity of nodule detection decreased when comparing subtlety 4 to 3, 3 to 2 and 2 to 1 for non-thoracic and thoracic radiologists, with the subtlety 3 to subtlety 2 transition being significant (P=0.014) for non-thoracic radiologists, while thoracic radiologists demonstrated a decrease but no transitions between subtlety levels were significant. The most noticeable, and interesting, effect was with the thoracic radiologists, with the means of subtlety 2 and 1 being almost the same and closely comparable to level 3. Conclusion: Results from this study indicate that expertise in chest radiology does significantly impact upon the sensitivity of radiologists in detecting pulmonary lesions of varying subtlety. Thoracic radiologists had a consistently higher sensitivity with subtle, very subtle and extremely subtle nodules.

Quantitative imaging features to predict cancer status in lung nodules

Ying Liu, Yoganand Balagurunathan, Thomas Atwater, et al.

Show abstract

Background: We propose a systematic methodology to quantify incidentally identified lung nodules based on observed radiological traits on a point scale. A classification model based on these quantitative traits was used to predict cancer status. Materials and Methods: We used low dose computed tomography (LDCT) images from 102 patients for this study; 24 semantic traits were systematically scored from each image. We built a machine learning classifier in a cross-validation setting to find the best predictive imaging features to differentiate malignant from benign lung nodules. Results: The best feature triplet to discriminate malignancy was based on long axis, concavity and lymphadenopathy, with an average AUC of 0.897 (Accuracy of 76.8%, Sensitivity of 64.3%, Specificity of 90%). A similar semantic triplet optimized on Sensitivity/Specificity (Youden’s J index) included long axis, vascular convergence and lymphadenopathy, which had an average AUC of 0.875 (Accuracy of 81.7%, Sensitivity of 76.2%, Specificity of 95%). Conclusions: Quantitative radiological image traits can differentiate malignant from benign lung nodules. These semantic features along with size measurement enhance the prediction accuracy.
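
As an illustrative aside, Youden's J index is defined as sensitivity + specificity - 1. The minimal sketch below applies that standard definition to the two operating points reported in the abstract. The function name `youden_j` and the dictionary layout are hypothetical and are not taken from the authors' pipeline.

```python
def youden_j(sensitivity: float, specificity: float) -> float:
    """Youden's J index: sensitivity + specificity - 1 (inputs as fractions)."""
    return sensitivity + specificity - 1.0


# Operating points reported in the abstract, converted from percent to fractions.
# The dictionary itself is an illustrative container, not the authors' data format.
triplets = {
    "long axis, concavity, lymphadenopathy": (0.643, 0.90),
    "long axis, vascular convergence, lymphadenopathy": (0.762, 0.95),
}

for name, (sens, spec) in triplets.items():
    print(f"{name}: J = {youden_j(sens, spec):.3f}")
```

Under this definition the second triplet has the larger J, which is consistent with the abstract describing it as the triplet optimized on sensitivity and specificity.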

Shapelet analysis of pupil dilation for modeling visuo-cognitive behavior in screening mammography

Folami Alamudun, Hong-Jun Yoon, Tracy Hammond, et al.

Show abstract

Our objective is to improve understanding of visuo-cognitive behavior in screening mammography under clinically equivalent experimental conditions. To this end, we examined pupillometric data, acquired using a head-mounted eye-tracking device, from 10 image readers (three breast-imaging radiologists and seven radiology residents), and their corresponding diagnostic decisions for 100 screening mammograms. The corpus of mammograms comprised cases of varied pathology and breast parenchymal density. We investigated the relationship between pupillometric fluctuations experienced by an image reader during mammographic screening (indicative of changes in mental workload), the pathological characteristics of a mammographic case, and the image readers’ diagnostic decision and overall task performance. To answer these questions, we extracted features from pupillometric data, and additionally applied time series shapelet analysis to extract discriminative patterns in changes in pupil dilation. Our results show that pupillometric measures are adequate predictors of mammographic case pathology, and of image readers’ diagnostic decision and performance, with an average accuracy of 80%.

Image similarity ranking of focal computed tomography liver lesions using a 2AFC technique

Jessica Faruque, Sameer Antani, Rodney Long, et al.

Show abstract

Content-based image retrieval (CBIR) for radiological images has experienced massive growth over the past two decades, and shows great potential as a tool for use in precision medicine. A recurring challenge in CBIR evaluation has been in obtaining reference sets of images from human viewers of the system. Our work seeks to determine the feasibility of creating a reference set from images ranked by similarity by human viewers of the images. We obtained 2 sets each of 10 images of CT focal liver lesions from a database of open-access publications, with and without markings showing the region containing the lesions, respectively. We created 2 sets of all 45 pair-wise combinations of the images, and displayed them to 10 volunteers, of whom 2 had medical training. We used a Two-Alternative Forced Choice (2AFC) paradigm to obtain complete rankings of similarity levels in these image pairs. Analysis showed that inter-reader agreement for rankings ranged from Tau=0.21-0.69 (median=0.37) for the image pairs without any markings, and Tau=0.21-0.57 (median=0.33) for the image pairs with markings. A comparison of the regions of interest drawn by the study participants outlining the lesions in images without markings showed that participants tended to agree on images containing a single focal lesion of a single density, and inter-reader agreement for image rankings in which the regions of interest agree ranged from Tau=0.39-0.85 (median=0.58). These results show that the use of image ranking using 2AFC may be a feasible method for creating reference sets for CBIR system validation.
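
As an illustrative aside, the figure of 45 pair-wise combinations follows from choosing 2 images out of 10. The sketch below checks that count and shows one way agreement between two readers' rankings could be scored, assuming "Tau" refers to Kendall's rank correlation (the abstract does not say which variant was used). The function `kendall_tau` is a plain tau-a implementation without tie handling, and the two reader rankings are hypothetical data, not results from the study.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a between two rankings of the same items (no tie correction)."""
    n = len(rank_a)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Positive product: both readers order items i and j the same way.
        sign = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# All unordered pairs drawn from 10 images.
image_pairs = list(combinations(range(10), 2))
print(len(image_pairs))  # 45

# Hypothetical similarity rankings of the 45 pairs by two readers.
reader_1 = list(range(45))
reader_2 = list(reversed(reader_1))
print(kendall_tau(reader_1, reader_1))  # identical rankings
print(kendall_tau(reader_1, reader_2))  # fully reversed rankings
```

Identical rankings give a tau of 1 and fully reversed rankings give -1, so the reported medians of 0.33 to 0.58 sit between no agreement and perfect agreement.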

Keynote and ROC Analysis

Detection of pulmonary nodule growth with dose reduced chest tomosynthesis: a human observer study using simulated nodules

Christina Söderman, Åse Johnsson, Jenny Vikgren, et al.

Show abstract

Chest tomosynthesis may be a suitable alternative to computed tomography for the clinical task of follow up of pulmonary nodules. The aim of the present study was to investigate the detection of pulmonary nodule growth suggestive of malignancy using chest tomosynthesis. Previous studies have indicated maintained levels of detection of pulmonary nodules at dose levels corresponding to that of a conventional lateral radiograph, approximately 0.04 mSv, which motivated performing the present study at this dose level. Pairs of chest tomosynthesis image sets, where the image sets in each pair were acquired of the same patient on two separate occasions, were included in the study. Simulated nodules with original diameters of approximately 8 mm were inserted in the pairs of image sets, simulating situations where the nodule had remained stable in size or increased isotropically in size between the two different imaging occasions. Four different categories of nodule growth were included, corresponding to a volume increase of approximately 21 %, 68 %, 108 % and 250 %. All nodules were centered in the depth direction in the tomosynthesis images. All images were subjected to a simulated dose reduction, resulting in images corresponding to an effective dose of 0.04 mSv. Four observers were given the task of rating their confidence that the nodule was stable in size or not on a five-level rating scale. This was done both before any size measurements were made of the nodule and after measurements were performed. Using receiver operating characteristic analysis, the rating data for the nodules that were stable in size were compared to the rating data for the nodules simulated to have increased in size. Statistically significant differences between the rating distributions for the stable nodules and all of the four nodule growth categories were found. For the three largest nodule growths, nearly perfect detection of nodule growth was seen.
In conclusion, the present study indicates that during optimal imaging conditions and for nodules with diameters of approximately 8 mm that grow fairly symmetrically, chest tomosynthesis performed at a dose level corresponding to that of a lateral chest radiograph can, with high sensitivity, differentiate nodules stable in size from nodules growing at rates associated with fast growing malignant nodules.
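
As an illustrative aside, the volume increases quoted above can be translated into diameter changes. The sketch below assumes a spherical nodule that grows isotropically, so that volume scales with the cube of the diameter; the spherical shape is an assumption made here for the arithmetic, and the function name `grown_diameter` is hypothetical.

```python
def grown_diameter(d0_mm: float, volume_increase_pct: float) -> float:
    """Diameter after isotropic growth, assuming volume scales with diameter cubed."""
    volume_ratio = 1.0 + volume_increase_pct / 100.0
    return d0_mm * volume_ratio ** (1.0 / 3.0)


# Growth categories from the abstract, applied to an 8 mm starting diameter.
for pct in (21, 68, 108, 250):
    print(f"{pct:>3d} % volume increase -> {grown_diameter(8.0, pct):.2f} mm")
```

Under these assumptions the smallest growth category corresponds to a diameter change of roughly half a millimetre, which gives a sense of how small a size difference the observers were asked to judge.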

Assessing nodule detection on lung cancer screening CT: the effects of tube current modulation and model observer selection on detectability maps

J. M. Hoffman, F. Noo, K. McMillan, et al.

Show abstract

Lung cancer screening using low dose CT has been shown to reduce lung cancer related mortality and has been approved for widespread use in the US. These scans keep radiation doses low while maximizing the detection of suspicious lung lesions. Tube current modulation (TCM) is one technique used to optimize dose; however, limited work has been done to assess TCM’s effect on detection tasks. In this work the effect of TCM on detection is investigated throughout the lung utilizing several different model observers (MO). 131 lung nodules were simulated at 1mm intervals in each lung of the XCAT phantom. A Sensation 64 TCM profile was generated for the XCAT phantom and 2500 noise realizations were created using both TCM and a fixed TC. All nodules and noise realizations were reconstructed for a total of 262 (left and right lungs) nodule reconstructions and 10 000 XCAT lung reconstructions. Single-slice Hotelling (HO) and channelized Hotelling (CHO) observers, as well as a multislice CHO, were used to assess area-under-the-curve (AUC) as a function of nodule location in both the fixed TC and TCM cases. As expected with fixed TC, nodule detectability was lowest through the shoulders and leveled off below mid-lung; with TCM, detectability was unexpectedly highest through the shoulders, dropping sharply near the mid-lung and then increasing into the abdomen. Trends were the same for all model observers. These results suggest that TCM could be further optimized for detection and that detectability maps present exciting new opportunities for TCM optimization on a patient-specific level.

Model Observers II: Search

Ranking inconsistencies in the assessment of digital breast tomosynthesis (DBT) reconstruction algorithms using a location-known task and a search task

Xin He, Rongping Zeng, Frank Samuelson, et al.

Show abstract

In this work, we validated a task-based performance figure-of-merit (FOM) by investigating ranking inconsistencies due to lurking variables/factors. We applied a falsifiable search assessment theory to assessing digital breast tomosynthesis (DBT) image quality using a scanning channelized Hotelling observer (CHO) on a simulated DBT dataset. We compared the performance of five reconstruction algorithms: filtered back projection (FBP), maximum likelihood (ML), simultaneous algebraic reconstruction technique (SART), and total-variation regularized least square estimator (TVLS) with strong and mild regularization settings. The results showed that the location-known-exactly (LKE) detection performance was almost identical for the five reconstruction algorithms. However, the search characteristic, as described by effective set size (M*) and search AUC value, ranked them differently. To falsify/corroborate our evaluations of search characteristic and performance, we conducted an image-size test. This test demonstrated an agreement between theoretical predictions and empirically measured observer performance in absolute performance levels, except for the ML algorithm. We concluded that the evidence corroborated our evaluations, except for the ML algorithm, where our evaluation was wrong. Further investigation of the wrong evaluation in the ML case revealed a lurking variable that affected system performance ranking in search when AUC value was used as the FOM. This further confirmed that our evaluation in its current form for the ML algorithm was indeed wrong. We also noted that the ranking inconsistencies exist even when the AUC value was used as the FOM, and the falsifiable nature of M* allowed such inconsistencies to be identified.

Model observer design for detecting multiple abnormalities in anatomical background images

Gezheng Wen, Mia K. Markey, Subok Park

Show abstract

As psychophysical studies are resource-intensive to conduct, model observers are commonly used to assess and optimize medical imaging quality. Existing model observers were typically designed to detect at most one signal. However, in clinical practice, there may be multiple abnormalities in a single image set (e.g., multifocal and multicentric breast cancers (MMBC)), which can impact treatment planning. Prevalence of signals can be different across anatomical regions, and human observers do not know the number or location of signals a priori. As new imaging techniques have the potential to improve multiple-signal detection (e.g., digital breast tomosynthesis may be more effective for diagnosis of MMBC than planar mammography), image quality assessment approaches addressing such tasks are needed. In this study, we present a model-observer mechanism to detect multiple signals in the same image dataset. To handle the high dimensionality of images, a novel implementation of partial least squares (PLS) was developed to estimate different sets of efficient channels directly from the images. Without any prior knowledge of the background or the signals, the PLS channels capture interactions between signals and the background which provide discriminant image information. Corresponding linear decision templates are employed to generate both image-level and location-specific scores on the presence of signals. Our preliminary results show that the model observer using PLS channels, compared to our first attempts with Laguerre-Gauss channels, can achieve high performance with a reasonably small number of channels, and the optimal design of the model observer may vary as the tasks of clinical interest change.

Visual-search observers for SPECT simulations with clinical backgrounds

Howard C. Gifford

Show abstract

The purpose of this work was to test the ability of visual-search (VS) model observers to predict the lesion-detection performance of human observers with hybrid SPECT images. These images consist of clinical backgrounds with simulated abnormalities. The application of existing scanning model observers to hybrid images is complicated by the need for extensive statistical information, whereas VS models based on separate search and analysis processes may operate with reduced knowledge. A localization ROC (LROC) study involved the detection and localization of solitary pulmonary nodules in Tc-99m lung images. The study was aimed at optimizing the number of iterations and the postfiltering of four rescaled block-iterative reconstruction strategies. These strategies implemented different combinations of attenuation correction, scatter correction, and detector resolution correction. For a VS observer in this study, the search and analysis processes were guided by a single set of base morphological features derived from knowledge of the lesion profile. One base set used difference-of-Gaussian channels while a second base set implemented spatial derivatives in combination with the Burgess eye filter. A feature-adaptive VS observer selected features of interest for a given image set on the basis of training-set performance. A comparison of the feature-adaptive observer results against previously acquired human-observer data is presented.

Three scenarios of ranking inconsistencies involving search tasks

Xin He, Frank W. Samuelson, Rongping Zeng, et al.

Show abstract

Our previous work on assessment of digital breast tomosynthesis (DBT) image quality revealed inconsistencies in ranking the reconstruction algorithms’ performances for a location-known-exactly (LKE) detection task and a location-unknown search task. Such results made us wonder whether ranking inconsistencies may not be rare phenomena at all. In this work, we conducted a small literature review that involved three publications (He, Samuelson, Zeng and Sahiner SPIE 2016; Park, Kupinski, Clarkson and Barrett, IPMI 2003 and JOSA 2005). These publications compared the LKE and search performance for a variety of observers using the AUC value as the performance criterion (human observers, CHOs for detection, scanning CHOs for search, and the Markov Chain Monte Carlo ideal observer for detection and search). We categorized the experimental findings into three types of ranking inconsistencies: 1) ranking inconsistencies in LKE and search tasks; 2) human/ideal observer ranking inconsistencies; and 3) LKE/search ranking inconsistencies in the presence of signal variability. The empirical evidence presented in this work suggests that ranking inconsistencies for imaging systems exist, but that these inconsistencies often do not draw enough attention in the literature.

Investigation on location-dependent detectability of a small mass for digital breast tomosynthesis evaluation

Changwoo Lee, Jongduk Baek, Subok Park

Show abstract

Digital breast tomosynthesis (DBT) is an emerging imaging modality for improved breast cancer detection and diagnosis [1-5]. Numerous efforts have been made to find quantitative metrics associated with mammographic image quality assessment, such as the exponent β of anatomical noise power spectrum, glandularity, contrast-to-noise ratio, etc. [6-8]. In addition, with the use of Fourier-domain detectability for a task-based assessment of DBT, a stationarity assumption on reconstructed image statistics was often made [9-11], resulting in the use of multiple regions-of-interest (ROIs) from different locations in order to increase sample size. While all these metrics provide some information on mammographic image characteristics and signal detection, the relationship between these metrics and detectability in DBT evaluation has not been fully understood. In this work, we investigated spatial-domain detectability trends and levels as a function of the number of slices N_s at three different ROI locations on the same image slice, where background statistics differ in terms of the aforementioned metrics. Detectabilities for the three ROI locations were calculated using multi-slice channelized Hotelling observers with 2D/3D Laguerre-Gauss channels. Our simulation results show that detectability levels and trends as a function of N_s vary across these three ROI locations. They also show that the exponent β, mean glandularity, and mean attenuation coefficient vary across the three ROI locations but do not necessarily predict the ranking of detectability levels and trends across these ROI locations.

Machine-learning model observer for detection and localization tasks in clinical SPECT-MPI

Felipe M. Parages, J. Michael O'Connor, P. Hendrik Pretorius, et al.

Show abstract

In this work we propose a machine-learning MO based on Naive-Bayes classification (NB-MO) for the diagnostic tasks of detection, localization and assessment of perfusion defects in clinical SPECT Myocardial Perfusion Imaging (MPI), with the goal of evaluating several image reconstruction methods used in clinical practice. NB-MO uses image features extracted from polar-maps in order to predict lesion detection, localization and severity scores given by human readers in a series of 3D SPECT-MPI. The population used to tune (i.e. train) the NB-MO consisted of simulated SPECT-MPI cases – divided into normals or with lesions in variable sizes and locations – reconstructed using the filtered backprojection (FBP) method. An ensemble of five human specialists (physicians) read a subset of simulated reconstructed images, and assigned a perfusion score for each region of the left-ventricle (LV). Polar-maps generated from the simulated volumes along with their corresponding human scores were used to train five NB-MOs (one per human reader), which were subsequently applied (i.e. tested) on three sets of clinical SPECT-MPI polar maps, in order to predict human detection and localization scores. The clinical “testing” population comprises healthy individuals and patients suffering from coronary artery disease (CAD) in three possible regions, namely: LAD, LcX and RCA. Each clinical case was reconstructed using three reconstruction strategies, namely: FBP with no SC (i.e. scatter compensation), OSEM with Triple Energy Window (TEW) SC method, and OSEM with Effective Source Scatter Estimation (ESSE) SC. Alternative Free-Response (AFROC) analysis of perfusion scores shows that NB-MO predicts a higher human performance for scatter-compensated reconstructions, in agreement with what has been reported in published literature. These results suggest that NB-MO has good potential to generalize well to reconstruction methods not used during training, even for reasonably dissimilar datasets (i.e. simulated vs. clinical).

Breast Imaging II

Varying performance in mammographic interpretation across two countries: Do results indicate reader or population variances?

BaoLin P. Soh, Warwick B. Lee, Jill Wong, et al.

Show abstract

Aim: To compare the performance of Australian and Singapore breast readers interpreting a single test-set that consisted of mammographic examinations collected from the Australian population. Background: In the teleradiology era, breast readers are interpreting mammographic examinations from different populations. The question arises whether two groups of readers with similar training backgrounds demonstrate the same level of performance when presented with a population familiar only to one of the groups. Methods: Fifty-three Australian and 15 Singaporean breast radiologists participated in this study. All radiologists were trained in mammogram interpretation and had a median of 9 and 15 years of experience in reading mammograms respectively. Each reader interpreted the same BREAST test-set consisting of sixty de-identified mammographic examinations arising from an Australian population. Performance parameters including JAFROC, ROC, case sensitivity as well as specificity were compared between Australian and Singaporean readers using a Mann-Whitney U test. Results: A significant difference (P=0.036) was demonstrated between the JAFROC scores of the Australian and Singaporean breast radiologists. No other significant differences were observed. Conclusion: JAFROC scores for Australian radiologists were higher than those obtained by the Singaporean counterparts. Whilst it is tempting to suggest this is down to reader expertise, this may be a simplistic explanation considering the very similar training and audit backgrounds of the two populations of radiologists. The influence of reading images that are different from those that radiologists normally encounter cannot be ruled out and requires further investigation, particularly in the light of increasing international outsourcing of radiologic reporting.

Luminance level of a monitor: influence on detectability and detection rate of breast cancer in 2D mammography

Frédéric Bemelmans, Alaleh Rashidnasab, Frédérique Chesterman, et al.

Show abstract

Purpose: To evaluate lesion detectability and reading time as a function of luminance level of the monitor. Material and Methods: 3D mass models and microcalcification clusters were simulated into ROIs of “for processing” mammograms. Randomly selected ROIs were subdivided into three groups according to their background glandularity: high (>30%), medium (15-30%) and low (<15%). Six non-spiculated masses (9 – 11mm), six spiculated masses (5 – 7mm) and six microcalcification clusters (2 – 4mm) were scaled in 3D to create a range of sizes. The linear attenuation coefficient (AC) of the masses was adjusted from 100% glandular tissue to 90%, 80% and 70% to create different contrasts. Six physicists read the full database on Barco’s Coronis Uniti monitor for four different luminance levels (300, 800, 1000 and 1200 cd/m²), using a 4-AFC tool. Percentage correct (PC) and time were computed for all different conditions. A paired t-test was performed to evaluate the effect of luminance on PC and time. A multi-factorial analysis was performed using MANOVA. Results: The paired t-test indicated a statistically significant difference for the average time per session between 300 and 1200; 800 and 1200; and 1000 and 1200 cd/m², for all participants combined. There was no effect on PC. MANOVA denoted significantly lower reading times for high glandularity images at 1200 cd/m². Both types of masses were detected significantly faster at 1200 cd/m² in the contrast study. In the size study, microcalcification clusters and spiculated masses had a significantly higher detection rate at 1200 cd/m². Conclusion: These results demonstrate a significant decrease in reading time, while detectability remained constant.

The effectiveness of the cranio-caudal mammogram projection among radiologists

Phuong Dung (Yun) Trieu, Warwick Lee, Kriscia Tapia, et al.

Show abstract

This study aims to investigate the effectiveness of the single cranio-caudal (CC) mammogram in comparison with traditional two-projection mammography for breast cancer detection. Sixteen radiologists were invited to report 60 two-projection (MLO and CC) mammograms of the left and right breasts, of which 20 cases contained cancer. Participants searched for the presence of breast lesion(s) on each view and provided a confidence score. Sensitivity, lesion sensitivity and specificity were compared between the CC projection and the two-projection approach among different groups of readers. Results showed that expert readers needed only a single CC mammogram in their reading, while non-expert readers required two-projection mammography.

Investigating the link between the radiological experience and the allocation of an 'equivocal finding'

Mohammad A. Rawashdeh, Camila Vidotti, Warwick Lee, et al.

Show abstract

Rationale and Objectives: This study will investigate the link between radiologists’ experience in reporting mammograms, their caseloads and the decision to give a classification of Royal Australian and New Zealand College of Radiologists (RANZCR) category ‘3’ (indeterminate or equivocal finding). Methods: A test set of 60 mammograms comprising 20 abnormal and 40 normal cases was shown to 92 radiologists. Each radiologist was asked to identify and localize abnormalities and provide a RANZCR assessment category. Details were obtained from each reader regarding their experience, qualifications and breast reading activities. ‘Equivocal fractions’ were calculated by dividing the number of ‘equivocal findings’ given by each radiologist in the abnormal and normal cases by the total number of cases analyzed: 20 and 40 respectively. The ‘equivocal fractions’ for each of the groups (normal vs abnormal) were calculated and independently correlated with age, number of years since qualification as a radiologist, number of years reading mammograms, number of mammograms read per year, number of hours reading mammograms per week and number of mammograms read over lifetime (the number of years reading mammograms multiplied by the number of mammograms read per year). The non-parametric Spearman test was used. Results: Statistically significant negative correlations were noted between ‘equivocal fractions’ and the following variables. For abnormal cases: hours per week (r = -0.38, P = 0.0001). For normal cases: total number of mammograms read per year (r = -0.29, P = 0.006); number of mammograms read over lifetime (r = -0.21, P = 0.049); hours reading mammograms per week (r = -0.20, P = 0.05). Conclusion: Radiologists with greater reading experience assign fewer RANZCR category 3 or equivocal classifications. The findings have implications for screening program efficacy and recall rates. This work is still in progress and further data will be presented at the conference.
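
As an illustrative aside, the two derived quantities defined in the Methods can be written out directly. The sketch below computes a reader's ‘equivocal fraction’ for the abnormal and normal case groups and the lifetime number of mammograms read, following the definitions in the abstract. The function names and the example reader's numbers are hypothetical and are not data from the study.

```python
N_ABNORMAL_CASES = 20  # abnormal cases in the test set, per the abstract
N_NORMAL_CASES = 40    # normal cases in the test set, per the abstract


def equivocal_fraction(n_equivocal: int, n_cases: int) -> float:
    """Equivocal (category 3) findings divided by the number of cases in the group."""
    if not 0 <= n_equivocal <= n_cases:
        raise ValueError("n_equivocal must lie between 0 and n_cases")
    return n_equivocal / n_cases


def lifetime_mammograms(years_reading: float, reads_per_year: float) -> float:
    """Years reading mammograms multiplied by mammograms read per year."""
    return years_reading * reads_per_year


# Hypothetical reader: 3 equivocal calls on abnormal cases, 4 on normal cases,
# 10 years of experience at 2000 mammograms per year.
print(equivocal_fraction(3, N_ABNORMAL_CASES))
print(equivocal_fraction(4, N_NORMAL_CASES))
print(lifetime_mammograms(10, 2000))
```

Each reader contributes one fraction per case group, and these per-reader values are what would then be rank-correlated against the experience variables.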

The interplay of attention economics and computer-aided detection marks in screening mammography

Tayler M. Schwartz, Radhika Sridharan, Wei Wei, et al.

Show abstract

Introduction: According to attention economists, overabundant information leads to decreased attention for individual pieces of information. Computer-aided detection (CAD) alerts radiologists to findings potentially associated with breast cancer but is notorious for creating an abundance of false-positive marks. We suspected that increased CAD marks do not lengthen mammogram interpretation time, as radiologists will selectively disregard these marks when present in larger numbers. We explore the relevance of attention economics in mammography by examining how the number of CAD marks affects interpretation time. Methods: We performed a retrospective review of bilateral digital screening mammograms obtained between January 1, 2011 and February 28, 2014, using only weekend interpretations to decrease distractions and the likelihood of trainee participation. We stratified data according to reader and used ANOVA to assess the relationship between number of CAD marks and interpretation time. Results: Ten radiologists, with median experience after residency of 12.5 years (range 6 to 24), interpreted 1849 mammograms. When accounting for number of images, Breast Imaging Reporting and Data System category, and breast density, increasing numbers of CAD marks were correlated with longer interpretation time only for the three radiologists with the fewest years of experience (median 7 years). Conclusion: For the 7 most experienced readers, increasing CAD marks did not lengthen interpretation time. We surmise that as CAD marks increase, the attention given to individual marks decreases. Experienced radiologists may rapidly dismiss larger numbers of CAD marks as false-positive, having learned that devoting extra attention to such marks does not improve clinical detection.

Technology Assessment

Importance of the grayscale in early assessment of image quality gains with iterative CT reconstruction

F. Noo, K. Hahn, Z. Guo

Show abstract

Iterative reconstruction methods have become an important research topic in X-ray computed tomography (CT), due to their ability to yield improvements in image quality in comparison with the classical filtered backprojection method. There are many ways to design an effective iterative reconstruction method. Moreover, for each design, there may be a large number of parameters that can be adjusted. Thus, early assessment of image quality, before clinical deployment, plays a large role in identifying and refining solutions. Currently, there are few publications reporting on early, task-based assessment of image quality achieved with iterative reconstruction methods. We report here on such an assessment, and we illustrate at the same time the importance of the grayscale used for image display when conducting this type of assessment. Our results further support observations made by others that the edge preserving penalty term used in iterative reconstruction is a key ingredient in improving image quality in terms of the detection task. Our results also provide a clear demonstration of an implication made in one of our previous publications, namely that the grayscale window plays an important role in image quality comparisons involving iterative CT reconstruction methods.

Validation of no-reference image quality index for the assessment of digital mammographic images

Helder C. R. de Oliveira, Bruno Barufaldi, Lucas R. Borges, et al.

Show abstract

To ensure optimal clinical performance of digital mammography, it is necessary to obtain images with high spatial resolution and low noise, keeping radiation exposure as low as possible. These requirements directly affect the interpretation of radiologists. The quality of a digital image should be assessed using objective measurements. In general, these methods measure the similarity between a degraded image and an ideal image without degradation (ground-truth), used as a reference. These methods are called Full-Reference Image Quality Assessment (FR-IQA). However, for digital mammography, an image without degradation is not available in clinical practice; thus, an objective method to assess the quality of mammograms must be performed without reference. The purpose of this study is to present a Normalized Anisotropic Quality Index (NAQI), based on the Rényi entropy in the pseudo-Wigner domain, to assess mammography images in terms of spatial resolution and noise without any reference. The method was validated using synthetic images acquired through an anthropomorphic breast software phantom, clinical exposures of anthropomorphic breast physical phantoms, and patient mammograms. The results reported by this no-reference index follow the same behavior as other well-established full-reference metrics, e.g., the peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM). Reductions of 50% in the radiation dose in phantom images were translated as a decrease of 4dB in the PSNR, 25% in the SSIM and 33% in the NAQI, evidencing that the proposed metric is sensitive to the noise resulting from dose reduction. The clinical results showed that images reduced to 53% and 30% of the standard radiation dose reported reductions of 15% and 25% in the NAQI, respectively.
Thus, this index may be used in clinical practice as an image quality indicator to improve the quality assurance programs in mammography; hence, the proposed method reduces the subjectivity inter-observers in the reporting of image quality assessment.
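
For readers unfamiliar with the full-reference metric cited above, the following is a minimal sketch of how PSNR is commonly computed from the mean squared error. It is not the authors' code; the pixel values and the 8-bit `max_value` are hypothetical and chosen only for illustration.

```python
import math

def mse(reference, degraded):
    """Mean squared error between two equally sized flat pixel lists."""
    if len(reference) != len(degraded):
        raise ValueError("images must have the same number of pixels")
    return sum((r - d) ** 2 for r, d in zip(reference, degraded)) / len(reference)

def psnr(reference, degraded, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(reference, degraded)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)

# Hypothetical 4-pixel "images", for illustration only.
reference = [100, 120, 140, 160]
degraded = [102, 118, 141, 157]
print(round(psnr(reference, degraded), 2))

# With a fixed peak value, a 4 dB drop in PSNR means the MSE grew by 10**0.4.
print(round(10 ** 0.4, 2))
```

Under this definition, the 4 dB decrease reported for the half-dose phantom images would correspond to roughly a 2.5-fold increase in mean squared error.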

Impact of large x-ray beam collimation on image quality

Damien Racine, Alexandre Ba, Julien G. Ott, et al.

Large X-ray beam collimation in computed tomography (CT) opens the way to new image acquisition techniques and improves patient management for several clinical indications. The systems that offer large X-ray beam collimation enable, in particular, a whole region of interest to be investigated with an excellent temporal resolution. However, one of the potential drawbacks of this option might be a noticeable difference in image quality along the z-axis when compared with the standard helical acquisition mode using more restricted X-ray beam collimations. The aim of this project is to investigate the impact of the use of large X-ray beam collimation and new iterative reconstruction on noise properties, spatial resolution and low contrast detectability (LCD). An anthropomorphic phantom and a custom-made phantom were scanned on a GE Revolution CT. The images were reconstructed respectively with ASIR-V at 0% and 50%. Noise power spectra (NPS), to evaluate the noise properties, and Target Transfer Functions (TTF), to evaluate the spatial resolution, were computed. Then, a Channelized Hotelling Observer (CHO) with Gabor and Dense Difference of Gaussian channels was used to evaluate the LCD using the percentage correct as a figure of merit. Noticeable differences of 3D noise power spectra and MTF have been recorded; however, no significant difference appeared when dealing with the LCD criteria. As expected, the use of iterative reconstruction, for a given CTDIvol level, allowed a significant gain in LCD in comparison to ASIR-V 0%. In addition, the outcomes of the NPS and TTF metrics led to results that would contradict the outcomes of CHO model observers if used for a NPWE model observer (Non-Prewhitening With Eye filter). The unit investigated provides major advantages for cardiac diagnosis without impairing the image quality level of standard chest or abdominal acquisitions.

Predicting radiologists' true and false positive decisions in reading mammograms by using gaze parameters and image-based features

Ziba Gandomkar, Kevin Tay, Will Ryder, et al.

Radiologists’ gaze-related parameters combined with image-based features were utilized to classify suspicious mammographic areas ultimately scored as True Positives (TP) and False Positives (FP). Eight breast radiologists read 120 two-view digital mammograms of which 59 had biopsy-proven cancer. Eye-tracking data were collected and nearby fixations were clustered together. Suspicious areas on mammograms were independently identified based on thresholding an intensity saliency map followed by automatic segmentation and pruning steps. For each radiologist-reported area, the radiologist’s fixation clusters in the area, as well as neighboring suspicious areas within 2.5° of the center of fixation, were found. A 45-dimensional feature vector containing gaze parameters of the corresponding cluster along with image-based characteristics was constructed. Gaze parameters included total number of fixations in the cluster, dwell time, time to hit the cluster for the first time, maximum number of consecutive fixations, and saccade magnitude of the first fixation in the cluster. Image-based features consisted of intensity, shape, and texture descriptors extracted from the region around the suspicious area, its surrounding tissue, and the entire breast. For each radiologist, a user-specific Support Vector Machine (SVM) model was built to classify the reported areas as TPs or FPs. Leave-one-out cross validation was utilized to avoid over-fitting. A feature selection step was embedded in the SVM training procedure by allowing radial basis function kernels to have 45 scaling factors. The proposed method was compared with the radiologists’ performance using the jackknife alternative free-response receiver operating characteristic (JAFROC). The JAFROC figure of merit increased significantly for six radiologists.

Quantitative image quality evaluation for cardiac CT reconstructions

Hsin-Wu Tseng, Jiahua Fan, Matthew A. Kupinski, et al.

Maintaining image quality in the presence of motion is always desirable and challenging in clinical cardiac CT imaging. Different image-reconstruction algorithms are available on current commercial CT systems that attempt to achieve this goal. It is widely accepted that image-quality assessment should be task-based and involve specific tasks, observers, and associated figures of merit. In this work, we developed an observer model that performed the task of estimating the percentage of plaque in a vessel from CT images. We compared task performance of cardiac CT image data reconstructed using a conventional FBP reconstruction algorithm and the SnapShot Freeze (SSF) algorithm, each at default and optimal reconstruction cardiac phases. The purpose of this work is to design an approach for quantitative image-quality evaluation of temporal resolution for cardiac CT systems. To simulate heart motion, a moving coronary type phantom synchronized with an ECG signal was used. Three different percentage plaques embedded in a 3 mm vessel phantom were imaged multiple times under motion-free, 60 bpm, and 80 bpm heart rates. Static (motion-free) images of this phantom were taken as reference images for image template generation. Independent ROIs from the 60 bpm and 80 bpm images were generated by vessel tracking. The observer performed estimation tasks using these ROIs. Ensemble mean square error (EMSE) was used as the figure of merit. Results suggest that the quality of SSF images is superior to the quality of FBP images in higher heart-rate scans.
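
As a rough illustration of the figure of merit named above, EMSE can be sketched as the average squared difference between estimated and true plaque percentages over an ensemble of ROIs. The function name and all numbers below are hypothetical and are not taken from the paper.

```python
def emse(estimates, truths):
    """Ensemble mean square error over paired estimates and true values."""
    if len(estimates) != len(truths) or not truths:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((e - t) ** 2 for e, t in zip(estimates, truths)) / len(truths)

# Hypothetical plaque percentages: true values and two sets of estimates.
true_plaque = [25.0, 50.0, 75.0, 25.0, 50.0, 75.0]
estimates_a = [30.0, 44.0, 80.0, 21.0, 57.0, 70.0]  # e.g. one reconstruction
estimates_b = [26.0, 49.0, 77.0, 24.0, 52.0, 74.0]  # e.g. another reconstruction

# A lower EMSE indicates better estimation performance.
print(emse(estimates_a, true_plaque), emse(estimates_b, true_plaque))
```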

Effect of anatomical backgrounds on detectability in volumetric cone beam CT images

Minah Han, Subok Park, Jongduk Baek

As anatomical noise is often a dominating factor affecting signal detection in medical imaging, we investigate the effects of anatomical backgrounds on signal detection in volumetric cone beam CT images. Signal detection performances are compared between transverse and longitudinal planes with either uniform or anatomical backgrounds. Sphere objects with diameters of 1 mm, 5 mm, 8 mm, and 11 mm are used as the signals. Three-dimensional (3D) anatomical backgrounds are generated using an anatomical noise power spectrum, 1/f^β, with β=3, equivalent to mammographic background [1]. The mean voxel value of the 3D anatomical backgrounds is used as an attenuation coefficient of the uniform background. Noisy projection data are acquired by the forward projection of the uniform and anatomical 3D backgrounds with/without sphere lesions and by the addition of quantum noise. Then, images are reconstructed by an FDK algorithm [2]. For each signal size, signal detection performances in transverse and longitudinal planes are measured by calculating the task SNR of a channelized Hotelling observer with Laguerre-Gauss channels. In the uniform background case, transverse planes yield higher task SNR values for all sphere diameters but 1 mm. In the anatomical background case, longitudinal planes yield higher task SNR values for all signal diameters. The results indicate that it is beneficial to use longitudinal planes to detect spherical signals in anatomical backgrounds.

Poster Session

Breast ultrasound lesions classification: a performance evaluation between manual delineation and computer segmentation

Moi Hoon Yap, Chuin Hong Yap

Breast cancer is a threat to women worldwide. Manual delineation of breast ultrasound lesions is time-consuming and operator dependent. Computer segmentation of ultrasound breast lesions can be a challenging task due to the ill-defined lesion boundaries and issues related to the speckle noise in ultrasound images. The main contribution of this paper is to compare the performance of a computer classifier on manual delineation (MD) and computer segmentation (CS) in the classification of malignant and benign lesions. In this paper we implement computer segmentation using a multifractal approach on a database consisting of 120 images (50 malignant lesions and 70 benign lesions). The computer segmentation result is compared with the manual delineation using the Jaccard Similarity Index (JSI). The results show an average JSI of 0.5010 (±0.2088) for malignant lesions and an average JSI of 0.6787 (±0.1290) for benign lesions. These results indicate lower agreement for malignant lesions, due to their irregular shape, and higher agreement for benign lesions with regular shape. Further, we extract shape descriptors for the lesions. Using logistic regression with 10-fold cross validation, the classification rates of manual delineation and computer segmentation are computed. The computer segmentation produced a sensitivity of 0.780 and a specificity of 0.871, whereas the manual delineation produced a sensitivity of 0.520 and a specificity of 0.800. The results show that there are no clear differences between MD and CS for benign lesions, but computer segmentation of malignant lesions yields better accuracy for the computer classifier.
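
The Jaccard Similarity Index used above to compare the two outlines is the ratio of the overlap to the union of the two lesion masks. A minimal sketch follows; the tiny binary masks are hypothetical and only illustrate the arithmetic.

```python
def jaccard_index(mask_a, mask_b):
    """Jaccard similarity of two binary masks given as 2D lists of 0/1."""
    a = {(r, c) for r, row in enumerate(mask_a) for c, v in enumerate(row) if v}
    b = {(r, c) for r, row in enumerate(mask_b) for c, v in enumerate(row) if v}
    union = a | b
    if not union:
        return 1.0  # two empty masks are treated as identical
    return len(a & b) / len(union)

# Hypothetical 3x4 masks: manual delineation vs. computer segmentation.
manual = [[0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]
computer = [[0, 1, 1, 1],
            [0, 0, 1, 0],
            [0, 0, 0, 0]]

# 3 shared pixels out of 5 in the union.
print(jaccard_index(manual, computer))
```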

Impact of patient photos on visual search during radiograph interpretation

Elizabeth A. Krupinski, Kimberly Applegate, Ariadne DeSimone, et al.

To increase detection of mislabeled medical imaging studies, evidence shows it may be useful to include patient photographs during interpretation. This study examined how inclusion of photos impacts visual search. Ten radiologists viewed 21 chest radiographs with and without a photo of the patient while search was recorded. Their task was to note tube/line placement. Eye-tracking data revealed that presence of the photo reduced the number of fixations and total dwell on the chest image as a result of periodically looking at the photo. Average preference for having photos was 6.10 on a 0-10 scale, and neck and chest were the preferred areas.

Changes in frequency of recall recommendations of examinations depicting cancer with the availability of either priors or digital breast tomosynthesis

Christiane M. Hakim, Andriy I. Bandos, Marie A. Ganott, et al.

In a binary decision environment, performance is affected by additional information only when recommendations change because of that information. In a recent study, we have shown that, contrary to general expectation, introducing prior examinations improved recall rates, but not sensitivity. In this study, we assessed cancer detection differences when prior examinations and/or digital breast tomosynthesis (DBT) were made available to the radiologist. We identified a subset of 21 cancer cases with differences in the number of radiologists who recalled these cases after reviewing either a prior examination or DBT. For the cases with differences in recommendations after viewing either priors or DBT, separately, we evaluated the total number of readers that changed their recommendations, regardless of the specific radiologist in question. Confidence intervals for the number of readers and a test for the hypothesis of no difference were computed using the non-parametric bootstrap approach, addressing both case- and reader-related sources of variability by resampling cases and readers. With the addition of priors, there were 14 cancer cases (out of 15) where the number of “recalling radiologists” decreased. With the addition of DBT, the number of “recalling radiologists” decreased in only five cases (out of 15) while increasing in the remaining 9 cases. Unlike most new approaches to breast imaging, DBT seems to improve both recall rates and cancer detection rates. Changes in recommendations were noted by all radiologists for all cancers by type, size, and breast density.
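
The abstract does not give the details of its bootstrap, so the following is only one possible sketch of a non-parametric bootstrap that resamples both readers and cases. The function name, the coding of changes as +1/0/-1, the percentile interval and the small data matrix are all assumptions made for illustration.

```python
import random

def bootstrap_ci(changes, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for the mean change, resampling readers and cases.

    changes[r][c] is the change in the recall decision of reader r on case c
    (+1 = newly recalled, -1 = no longer recalled, 0 = unchanged).
    """
    rng = random.Random(seed)
    n_readers, n_cases = len(changes), len(changes[0])
    stats = []
    for _ in range(n_boot):
        # Draw readers and cases with replacement, independently.
        readers = [rng.randrange(n_readers) for _ in range(n_readers)]
        cases = [rng.randrange(n_cases) for _ in range(n_cases)]
        total = sum(changes[r][c] for r in readers for c in cases)
        stats.append(total / (n_readers * n_cases))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical data: 3 readers x 4 cancer cases.
changes = [[-1, 0, -1, 0],
           [0, -1, -1, 1],
           [-1, -1, 0, 0]]
print(bootstrap_ci(changes))
```

An interval that excludes zero would suggest a systematic shift in recall recommendations under this simplified scheme.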

The study of surgical image quality evaluation system by subjective quality factor method

Jian J. Zhang, Jason R. Xuan, Xirong Yang, et al.

The GreenLight^TM procedure is an effective and economical treatment for benign prostatic hyperplasia (BPH); almost a million patients have been treated with GreenLight^TM worldwide. During the surgical procedure, the surgeon or physician relies on the monitoring video system to survey and confirm the surgical progress. Several obstructions can greatly affect the image quality of the monitoring video, such as laser glare from tissue and body fluid, air bubbles and debris generated by tissue evaporation, and bleeding, to name a few. In order to improve the physician’s visual experience of a laser surgical procedure, the system performance parameters related to image quality need to be well defined. However, image quality is the integrated set of perceptions of the overall degree of excellence of an image; in other words, it is the perceptually weighted combination of significant attributes (contrast, graininess, …) of an image when considered in its marketplace or application. There is therefore no standard definition of overall image or video quality, especially for the no-reference case (without a standard chart as reference). In this study, Subjective Quality Factor (SQF) and acutance are used for no-reference image quality evaluation. Basic image quality parameters, such as sharpness, color accuracy, size of obstruction and transmission of obstruction, are used as subparameters to define the rating scale for image quality evaluation or comparison. Sample image groups were evaluated by human observers according to the rating scale. Surveys of physician groups were also conducted with lab-generated sample videos. The study shows that human subjective perception is a trustworthy way of evaluating image quality. A more systematic investigation of the relationship between video quality and the image quality of each frame will be conducted as a future study.

Inter-observer variability within BI-RADS and RANZCR mammographic density assessment schemes

Christine N. Damases, Claudia Mello-Thoms, Mark F. McEntee

This study compares variability associated with two visual mammographic density (MD) assessment methods using two separate samples of radiologists. The image test-set comprised images obtained from 20 women (age 42–89 years). The images were assessed for their MD by twenty American Board of Radiology (ABR) examiners and twenty-six radiologists registered with the Royal Australian and New Zealand College of Radiologists (RANZCR). Images were assessed using the same technology and conditions; however, the ABR radiologists used the BI-RADS and the RANZCR radiologists used the RANZCR breast density synoptic. Both scales use a 4-point assessment. The images were then grouped as low- and high-density; low including BI-RADS 1 and 2 or RANZCR 1 and 2 and high including BI-RADS 3 and 4 or RANZCR 3 and 4. Four-point BI-RADS and RANZCR showed no or negligible correlation (ρ=-0.029; p<0.859). The average inter-observer agreement on the BI-RADS scale had a Kappa of 0.565 [95% CI = 0.519 – 0.610], and ranged between 0.328–0.669, while the inter-observer agreement using the RANZCR scale had a Kappa of 0.360 [95% CI = 0.308 – 0.412] and a range of 0.078–0.499. Our findings show a wider range of inter-observer variability among RANZCR-registered radiologists than among the ABR examiners.
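
The abstract does not state which kappa variant was averaged across readers. As one plausible illustration, the sketch below computes Cohen's kappa for a single pair of readers on the four-point scale and again after grouping grades into low (1–2) and high (3–4). The ratings are hypothetical.

```python
from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who graded the same items."""
    n = len(ratings_a)
    if n == 0 or n != len(ratings_b):
        raise ValueError("need two non-empty rating lists of equal length")
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement from the product of the raters' marginal frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1.0 - expected)

def to_two_grade(grade):
    """Group a four-point density grade into 'low' (1-2) or 'high' (3-4)."""
    return "low" if grade <= 2 else "high"

# Hypothetical four-point density grades from two readers on eight images.
reader_1 = [1, 2, 3, 4, 2, 3, 1, 4]
reader_2 = [1, 2, 3, 3, 2, 4, 1, 4]

print(cohen_kappa(reader_1, reader_2))
print(cohen_kappa([to_two_grade(g) for g in reader_1],
                  [to_two_grade(g) for g in reader_2]))
```

In this toy example the two disagreements are both between grades 3 and 4, so they vanish once the grades are grouped, which is why agreement on a two-grade scale can be higher than on the four-point scale.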

Observer study to evaluate the simulation of mammographic calcification clusters

Maria A. Z. Sousa, Karem D. Marcomini, Predrag R. Bakic, et al.

Numerous breast phantoms have been developed to be as realistic as possible to ensure the accuracy of image quality analysis, covering a greater range of applications. In this study, we simulated three different densities of the breast parenchyma using paraffin gel, acrylic plates and PVC films. Hydroxyapatite was used to simulate calcification clusters. From the images acquired with a GE Senographe DR 2000D mammography system, we selected 68 regions of interest (ROIs) with and 68 without a simulated calcification cluster. To validate the phantom simulation, we selected 136 ROIs from the University of South Florida’s Digital Database for Screening Mammography (DDSM). Seven trained observers performed two observer experiments by using a high-resolution monitor Barco mod. E-3620. In the first experiment, the observers had to distinguish between real or phantom ROIs (with and without calcification). In the second one, the observers had to indicate the ROI with calcifications between a pair of ROIs. Results from our study show that the hydroxyapatite calcifications had poor contrast in the simulated breast parenchyma, thus observers had more difficulty in identifying the presence of calcification clusters in phantom images. Preliminary analysis of the power spectrum was conducted to investigate the radiographic density and the contrast thresholds for calcification detection. The values obtained for the power spectrum exponent (β) were comparable with those found in the literature.

A four-alternative forced choice (4AFC) software for observer performance evaluation in radiology

Guozhi Zhang, Lesley Cockmartin, Hilde Bosmans

The four-alternative forced choice (4AFC) test is a psychophysical method that can be adopted for observer performance evaluation in radiological studies. While the concept of this method is well established, difficulties in handling large image data, performing unbiased sampling, and keeping track of the choices made by the observer have restricted its application in practice. In this work, we propose easy-to-use software that can help perform 4AFC tests with DICOM images. The software suits any experimental design that follows the 4AFC approach. It has a powerful image viewing system that favorably simulates the clinical reading environment. The graphical interface allows the observer to adjust various viewing parameters and perform the selection with very simple operations. The sampling process involved in 4AFC as well as the speed and accuracy of the choices made by the observer are precisely monitored in the background and can be easily exported for test analysis. The software also has a defensive mechanism for data management and operation control that minimizes the possibility of user mistakes during the test. This software can greatly facilitate the use of the 4AFC approach in radiological observer studies and is expected to have widespread applicability.

The study on the color reproduction by illumination source for disposable endoscope

Sang Kyeong Park, Hyeon Jin Bang, Young Jae Won

Most cameras, such as CCTV or video cameras, shoot in available light or under artificial lighting. Medical cameras such as endoscopes or laparoscopes, however, shoot in vivo. The inside of the body is generally dark and moist, so it can only be imaged with a single light source supplied from outside the body. Medical cameras should therefore be set to produce clearer and more vivid images in low light than in bright light. The camera settings of disposable endoscopes are especially important because these devices are low in price and must be guaranteed against faulty workmanship. In this study, we suggest effective conditions for the camera settings of disposable endoscopes, and we derive required values for the test measurement items from an analysis of commercial endoscopes.

Cellular automata segmentation of the boundary between the compacta of vertebral bodies and surrounding structures

Jan Egger, Christopher Nimsky

Due to the aging population, spinal diseases are becoming more and more common; e.g., the lifetime risk of osteoporotic fracture is 40% for white women and 13% for white men in the United States. Thus the number of surgical spinal procedures is also increasing with the aging population, and precise diagnosis plays a vital role in reducing complications and recurrence of symptoms. Spinal imaging of the vertebral column is a tedious process subject to interpretation errors. In this contribution, we aim to reduce time and error for vertebral interpretation by applying and studying the GrowCut algorithm for boundary segmentation between vertebral body compacta and surrounding structures. GrowCut is a competitive region growing algorithm using cellular automata. For our study, vertebral T2-weighted Magnetic Resonance Imaging (MRI) scans were first manually outlined by neurosurgeons. Then, the vertebral bodies were segmented in the medical images by a GrowCut-trained physician using the semi-automated GrowCut algorithm. Afterwards, results of both segmentation processes were compared using the Dice Similarity Coefficient (DSC) and the Hausdorff Distance (HD), which yielded a DSC of 82.99±5.03% and a HD of 18.91±7.2 voxels, respectively. In addition, the times have been measured during the manual and the GrowCut segmentations, showing that a GrowCut segmentation – with an average time of less than six minutes (5.77±0.73) – is significantly shorter than a pure manual outlining.
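
The two comparison measures named above can be sketched as follows for segmentations stored as sets of voxel coordinates. This is a brute-force illustration rather than the authors' implementation; the 2D point sets are hypothetical and distances are in voxel units.

```python
import math

def dice_coefficient(a, b):
    """Dice similarity of two voxel sets: 2|A∩B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # two empty segmentations are treated as identical
    return 2.0 * len(a & b) / (len(a) + len(b))

def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(p, q):
        # Largest distance from a point in p to its nearest point in q.
        return max(min(math.dist(x, y) for y in q) for x in p)
    return max(directed(a, b), directed(b, a))

# Hypothetical 2D segmentations: manual outline vs. semi-automated result.
manual = {(0, 0), (0, 1), (1, 0), (1, 1)}
semi_automated = {(0, 1), (1, 1), (0, 2), (1, 2)}

print(dice_coefficient(manual, semi_automated))
print(hausdorff_distance(manual, semi_automated))
```

The Dice coefficient summarizes overall overlap, while the Hausdorff distance reports the worst local boundary mismatch, which is why the two are often quoted together.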

New conversion factors between human and automatic readouts of the CDMAM phantom for CR systems

Johann Hummel, Peter Homolka, Angelika Osanna-Elliot, et al.

Mammography screening demands thorough image quality (IQ) assessment to guarantee its success. The European protocol for the quality control of the physical and technical aspects of mammography screening (EPQCM) suggests a contrast-detail phantom such as the CDMAM phantom to evaluate IQ. For automatic evaluation, software is provided by EUREF. As human and automatic readouts differ systematically, conversion factors were published by the official reference organisation (EUREF). As we experienced a significant difference in these factors for computed radiography (CR) systems, we developed objectifying analysis software which presents the cells including the gold disks randomly in thickness and rotation. This overcomes the problem of an inevitable learning effect where observers know the position of the disks in advance. Applying this software, 45 CR systems were evaluated and the conversion factors between human and automatic readout determined. The resulting conversion factors were compared with the ones resulting from the two methods published by EUREF. We found our conversion factors to be substantially lower than those suggested by EUREF, in particular 1.21 compared to 1.42 (EUREF EU method) and 1.62 (EUREF UK method) for 0.1 mm, and 1.40 compared to 1.73 (EUREF EU) and 1.83 (EUREF UK) for 0.25 mm disc diameter, respectively. This can result in a dose increase of up to 90% when using either of these factors to adjust patient dose in order to fulfill image quality requirements. This suggests the need for an agreement on their proper application and limits the validity of the assessment methods. Therefore, we want to stress the need for clear criteria for CR systems based on appropriate studies.

Variability amongst radiographers in the categorization of clinical acceptability for digital trauma radiography

Robin Decoster, Rachel Toomey, Dirk Smits, et al.

Introduction: Radiographers evaluate anatomical structures to judge clinical acceptability of a radiograph. Whether a radiograph is deemed acceptable for diagnosis or not depends on the individual decision of the radiographer. Individual decisions cause variation in the accepted image quality. To minimise these variations, definitions of acceptability, such as in RadLex, were developed. The criteria on which radiographers attribute RadLex categories to radiographs are unknown. Insight into these criteria helps to further optimise definitions and reduce variability in acceptance between radiographers. Therefore, this work aims to evaluate the correlation between the RadLex classification and the evaluation of anatomical structures, using a Visual Grading Analysis (VGA). Methods: Four radiographers evaluated the visibility of five anatomical structures of 25 lateral cervical spine radiographs on a secondary class display with a VGA. They judged clinical acceptability of each radiograph using RadLex. Relations between the VGA score (VGAS) and RadLex category were analysed with Kendall’s Tau correlation and Nagelkerke pseudo-R². Results: The overall VGAS and the RadLex score correlate strongly (rτ=0.62, p<0.01, R²=0.72). The observers’ evaluation of contrast between bone, air (trachea) and soft tissue has low value in predicting the RadLex score (rτ=0.55, p<0.01, R²=0.03). The reproduction of spinous processes (rτ=0.67, p<0.01, R²=0.31) and the evaluation of the exposure (rτ=0.65, p<0.01, R²=0.56) have a strong correlation with high predictive value for the RadLex score. Conclusion: RadLex scores and VGAS correlate positively, strongly and significantly. The predictive value of bony structures may support the use of these in the judgement of clinical acceptability. Considerable inter-observer variations in the VGAS within a certain RadLex category suggest that observers use observer-specific cut-off values.
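
The abstract does not say which variant of Kendall's Tau was used. As an illustration of rank correlation between ordinal scores with ties, the sketch below implements the tau-b variant by direct pair counting. The score lists are hypothetical.

```python
import math

def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation, computed by counting all pairs."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue            # tied on both variables: not counted
            elif dx == 0:
                ties_x += 1         # tied on x only
            elif dy == 0:
                ties_y += 1         # tied on y only
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x) *
                      (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else 0.0

# Hypothetical visual grading scores and acceptability categories.
vga_scores = [1, 2, 3, 4, 5]
radlex_categories = [1, 3, 2, 4, 5]
print(kendall_tau_b(vga_scores, radlex_categories))
```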

A utility/cost analysis of breast cancer risk prediction algorithms

Craig K. Abbey, Yirong Wu, Elizabeth S. Burnside, et al.

Breast cancer risk prediction algorithms are used to identify subpopulations that are at increased risk for developing breast cancer. They can be based on many different sources of data such as demographics, relatives with cancer, gene expression, and various phenotypic features such as breast density. Women who are identified as high risk may undergo a more extensive (and expensive) screening process that includes MRI or ultrasound imaging in addition to the standard full-field digital mammography (FFDM) exam. Given that there are many ways that risk prediction may be accomplished, it is of interest to evaluate them in terms of expected cost, which includes the costs of diagnostic outcomes. In this work we perform an expected-cost analysis of risk prediction algorithms that is based on a published model that includes the costs associated with diagnostic outcomes (true-positive, false-positive, etc.). We assume the existence of a standard screening method and an enhanced screening method with higher scan cost, higher sensitivity, and lower specificity. We then assess the expected cost of using a risk prediction algorithm to determine who gets the enhanced screening method under the strong assumption that risk and diagnostic performance are independent. We find that if risk prediction leads to a high enough positive predictive value, it will be cost-effective regardless of the size of the subpopulation. Furthermore, in terms of the hit-rate and false-alarm rate of the risk prediction algorithm, iso-cost contours are lines with slope determined by properties of the available diagnostic systems for screening.
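
To see why iso-cost contours come out as straight lines, consider a simplified model that is an assumption of this sketch and not the published model referenced in the abstract. Each woman follows either the standard or the enhanced screening pathway, and each pathway has one expected cost for women with cancer and one for women without. Expected cost is then linear in the hit rate and false-alarm rate of the risk predictor. All cost values and the prevalence are hypothetical.

```python
def expected_cost(hit_rate, false_alarm_rate, prevalence, costs):
    """Expected per-woman cost when a risk predictor assigns enhanced screening.

    costs maps (screening, has_cancer) to the expected cost of that pathway.
    """
    cancer = (hit_rate * costs[("enhanced", True)]
              + (1 - hit_rate) * costs[("standard", True)])
    healthy = (false_alarm_rate * costs[("enhanced", False)]
               + (1 - false_alarm_rate) * costs[("standard", False)])
    return prevalence * cancer + (1 - prevalence) * healthy

def iso_cost_slope(prevalence, costs):
    """Slope d(hit_rate)/d(false_alarm_rate) along a line of constant cost."""
    gain_cancer = costs[("standard", True)] - costs[("enhanced", True)]
    loss_healthy = costs[("enhanced", False)] - costs[("standard", False)]
    return (1 - prevalence) * loss_healthy / (prevalence * gain_cancer)

# Hypothetical pathway costs (arbitrary units) and cancer prevalence.
costs = {("standard", True): 30000.0, ("enhanced", True): 20000.0,
         ("standard", False): 200.0, ("enhanced", False): 700.0}
prevalence = 0.005

slope = iso_cost_slope(prevalence, costs)
base = expected_cost(0.5, 0.10, prevalence, costs)
moved = expected_cost(0.5 + slope * 0.01, 0.11, prevalence, costs)
print(slope, base, moved)  # base and moved lie on the same iso-cost line
```

In this simplified model the slope depends only on the prevalence and on the cost differences between the two screening pathways, which is in line with the abstract's statement that the slope is set by properties of the available diagnostic systems.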

Development and application of a channelized Hotelling observer for DBT optimization on structured background test images with mass simulating targets

Dimitar Petrov, Koen Michielsen, Lesley Cockmartin, et al.

Digital breast tomosynthesis (DBT) is a 3D mammography technique that promises better visualization of low contrast lesions than conventional 2D mammography. A wide range of parameters influence the diagnostic information in DBT images and a systematic means of DBT system optimization is needed. The gold standard for image quality assessment is to perform a human observer experiment with experienced readers. Using human observers for optimization is time consuming and not feasible for the large parameter space of DBT. Our goal was to develop a model observer (MO) that can predict human reading performance for standard detection tasks of target objects within a structured phantom and subsequently apply it in a first comparative study. The phantom consists of an acrylic semi-cylindrical container with acrylic spheres of different sizes and the remaining space filled with water. Three types of lesions were included: 3D printed spiculated and non-spiculated mass lesions along with calcification groups. The images of the two mass lesion types were reconstructed with 3 different reconstruction methods (FBP, FBP with SRSAR, MLTR_pr) and read by human readers. A Channelized Hotelling model observer was created for the non-spiculated lesion detection task using five Laguerre-Gauss channels, tuned for better performance. For the non-spiculated mass lesions a linear relation between the MO and human observer results was found, with correlation coefficients of 0.956 for standard FBP, 0.998 for FBP with SRSAR and 0.940 for MLTR_pr. Both the MO and human observer percentage correct results for the spiculated masses were close to 100%, and showed no difference from each other for every reconstruction algorithm.

Evaluation of image quality of MRI data for brain tumor surgery

Frank Heckel, Felix Arlt, Benjamin Geisler, et al.

3D medical images are important components of modern medicine. Their usefulness for the physician depends on their quality, though. Only high-quality images allow accurate and reproducible diagnosis and appropriate support during treatment. We have analyzed 202 MRI images for brain tumor surgery in a retrospective study. Both an experienced neurosurgeon and an experienced neuroradiologist rated each available image with respect to its role in the clinical workflow, its suitability for this specific role, various image quality characteristics, and imaging artifacts. Our results show that MRI data acquired for brain tumor surgery does not always fulfill the required quality standards and that there is a significant disagreement between the surgeon and the radiologist, with the surgeon being more critical. Noise, resolution, as well as the coverage of anatomical structures were the most important criteria for the surgeon, while the radiologist was mainly disturbed by motion artifacts.

Evaluation of the possibility to use thick slabs of reconstructed outer breast tomosynthesis slice images

Hannie Petersson, Magnus Dustler, Anders Tingberg, et al.

The large image volumes in breast tomosynthesis (BT) have led to large amounts of data and a heavy workload for breast radiologists. The number of slice images can be decreased by combining adjacent image planes (slabbing), but the decrease in depth resolution can considerably affect the detection of lesions. The aim of this work was to assess if thicker slabbing of the outer slice images (where lesions are seldom present) could be a viable alternative in order to reduce the number of slice images in BT image volumes. The suggested slabbing (an image volume with thick outer slabs and thin slices between) was evaluated in two steps. Firstly, a survey of the depth of 65 cancer lesions within the breast was performed to estimate how many lesions would be affected by outer slabs of different thicknesses. Secondly, a selection of 24 lesions was reconstructed with 2, 6 and 10 mm slab thickness to evaluate how the appearance of lesions located in the thicker slabs would be affected. The results show that few malignant breast lesions are located at a depth less than 10 mm from the surface (especially for breast thicknesses of 50 mm and above). Reconstruction of BT volumes with 6 mm slab thickness yields an image quality that is sufficient for lesion detection for a majority of the investigated cases. Together, this indicates that thicker slabbing of the outer slice images is a promising option in order to reduce the number of slice images in BT image volumes.