
Proceedings Paper

Case-based repeatability of machine learning classification performance on breast MRI
Author(s): Michael Vieceli; Amy Van Dusen; Karen Drukker; Hiroyuki Abe; Maryellen L. Giger; Heather M. Whitney

Paper Abstract

Computer-aided diagnosis and radiomics have shown potential for the diagnosis and prognosis of breast cancer. The purpose of this study was to investigate the repeatability of classifier output and its relationship to classification performance for breast lesions imaged with dynamic contrast-enhanced MRI. Images of 1,169 breast lesions (267 benign, 902 cancers) were retrospectively collected under HIPAA/IRB compliance. The lesions were segmented automatically using a fuzzy c-means method, and thirty-eight radiomic features were extracted. Three classification tasks were investigated, with different proportions of cases in each class: (i) benign (23%) vs. malignant (77%), (ii) “pure” ductal carcinoma in situ (DCIS) (25%) vs. DCIS with invasive ductal carcinoma (IDC) (75%), and (iii) invasive cancers of molecular subtype luminal A or luminal B (66%) vs. other molecular subtypes (34%). For each task, support vector machine classifiers were trained and tested within 0.632+ bootstrap analyses (1000 iterations), and the 0.632+ bias-corrected area under the ROC curve (AUC) served as the classification performance metric. Repeatability of classifier output was evaluated at three levels: (a) repeatability by case (performance metric: width of the 95% confidence interval of the classifier-estimated posterior probabilities for each case), (b) repeatability within the dataset (performance metric: median and 95% confidence interval of the by-case 95% confidence interval widths), and (c) the potential relationship between classification performance and repeatability. In the classification performance assessment, the median AUCs [95% confidence interval] for the three tasks were 0.85 [0.83, 0.87], 0.84 [0.80, 0.87], and 0.65 [0.60, 0.69], respectively. In the repeatability assessment within the dataset, the median confidence interval widths [95% confidence interval] for the posterior probabilities were 0.25 [0.08, 0.72], 0.34 [0.14, 0.84], and 0.23 [0.14, 0.68], respectively. In conclusion, the classifiers demonstrated strong classification performance in the first two tasks, while the repeatability of the posterior probabilities was similar across all three tasks.
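
The two repeatability metrics described in the abstract (the by-case 95% confidence interval width and its summary over the dataset) can be illustrated with a short Python sketch. This is not the authors' code: the function names `ci_width_by_case` and `summarize_widths`, the array layout (one row per bootstrap iteration, one column per case, NaN where a case was not in the test fold for that iteration), the percentile-based interval, and the simulated probabilities are all assumptions made for illustration.

```python
import numpy as np


def ci_width_by_case(probs, level=0.95):
    """Width of the percentile interval of posterior probabilities, per case.

    probs: array of shape (n_iterations, n_cases); NaN marks iterations in
    which a case was not scored (assumed layout, not from the paper).
    """
    probs = np.asarray(probs, dtype=float)
    tail = 100.0 * (1.0 - level) / 2.0
    lower, upper = np.nanpercentile(probs, [tail, 100.0 - tail], axis=0)
    return upper - lower


def summarize_widths(widths, level=0.95):
    """Median and percentile interval of the by-case interval widths."""
    tail = 100.0 * (1.0 - level) / 2.0
    lower, upper = np.nanpercentile(widths, [tail, 100.0 - tail])
    return float(np.nanmedian(widths)), (float(lower), float(upper))


# Simulated example: 1000 bootstrap iterations, 5 hypothetical cases.
rng = np.random.default_rng(0)
simulated = rng.beta(a=5.0, b=2.0, size=(1000, 5))
# Hide roughly 63% of entries to mimic cases drawn into the training sample.
simulated[rng.random(simulated.shape) < 0.632] = np.nan

widths = ci_width_by_case(simulated)
median_width, (low, high) = summarize_widths(widths)
print(widths, median_width, low, high)
```

A narrow width for a case means the classifier assigns it nearly the same posterior probability regardless of which bootstrap sample was used for training; the dataset-level summary then describes how typical that stability is across cases.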

Paper Details

Date Published: 16 March 2020
PDF: 7 pages
Proc. SPIE 11314, Medical Imaging 2020: Computer-Aided Diagnosis, 1131421 (16 March 2020); doi: 10.1117/12.2548144
Michael Vieceli, Wheaton College (United States); The Univ. of Chicago (United States)
Amy Van Dusen, Wheaton College (United States); The Univ. of Chicago (United States)
Karen Drukker, The Univ. of Chicago (United States)
Hiroyuki Abe, The Univ. of Chicago (United States)
Maryellen L. Giger, The Univ. of Chicago (United States)
Heather M. Whitney, Wheaton College (United States); The Univ. of Chicago (United States)

Published in SPIE Proceedings Vol. 11314:
Medical Imaging 2020: Computer-Aided Diagnosis
Horst K. Hahn; Maciej A. Mazurowski, Editor(s)
