Proceedings Paper

Mammographic breast density classification using a deep neural network: assessment based on inter-observer variability
Author(s): N. Kaiser; A. Fieselmann; S. Vesal; N. Ravikumar; L. Ritschl; S. Kappler; A. Maier

Paper Abstract

Mammographic breast density is an important risk marker in breast cancer screening. The ACR BI-RADS guidelines (5th ed.) define four breast density categories, which can be dichotomized into the two super-classes "dense" and "not dense". Because the categories are described qualitatively, density assessment by radiologists shows high inter-observer variability. To quantify this variability, we compute the overall percentage agreement (OPA) and Cohen's kappa of 32 radiologists with the panel majority vote (PMV), based on the two super-classes. We further analyze the OPA between individual radiologists and compare their performance with an automated assessment by a convolutional neural network (CNN). The evaluation data comprise 600 breast cancer screening examinations with four views each. The CNN was designed to take all views of an examination as input and was trained on a dataset of 7186 cases to output one of the two super-classes. The highest agreement with the PMV achieved by a single radiologist is 99%; the lowest is 71%, with a mean of 89%. The OPA between two individual radiologists ranges from a minimum of 50.5% to a maximum of 97.5%, with a mean of 83%. Cohen's kappa values of radiologists with respect to the PMV range from 0.47 to 0.97, with a mean of 0.77. The presented algorithm reaches an OPA of 88% and a kappa of 0.75 with respect to all 32 radiologists. Our results show that inter-observer variability in breast density assessment is high even when the problem is reduced to two categories, and that our convolutional neural network can provide labelling comparable to that of an average radiologist. We also discuss how to deal with automated classification methods for subjective tasks.
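
The two agreement measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' code. The reader names, the toy labels (1 = "dense", 0 = "not dense"), and the tie-breaking rule for the majority vote are all assumptions made for illustration, and the numbers it produces have no relation to the results reported in the paper.

```python
from collections import Counter

def majority_vote(labels):
    """Most common label for one case (ties go to the label seen first;
    this tie rule is an assumption, not taken from the paper)."""
    return Counter(labels).most_common(1)[0][0]

def opa(a, b):
    """Overall percentage agreement: fraction of cases with equal labels."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohen_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement.
    Undefined (division by zero) if chance agreement equals 1."""
    n = len(a)
    p_o = opa(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from the marginal label frequencies of both raters.
    p_e = sum(count_a[k] * count_b[k] for k in set(count_a) | set(count_b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical toy data: three readers, six cases, binary super-classes.
readers = {
    "r1": [1, 1, 0, 0, 1, 0],
    "r2": [1, 0, 0, 0, 1, 1],
    "r3": [1, 1, 0, 1, 1, 0],
}

# Panel majority vote, computed case by case across all readers.
pmv = [majority_vote(case) for case in zip(*readers.values())]

for name, labels in readers.items():
    print(name, round(opa(labels, pmv), 3), round(cohen_kappa(labels, pmv), 3))
```

An algorithm's output can be scored in the same way by passing its per-case labels in place of a reader's. In this sketch each reader contributes to the majority vote they are compared against; whether the paper excludes the reader under evaluation from the panel is not stated in the abstract.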

Paper Details

Date Published: 4 March 2019
PDF: 6 pages
Proc. SPIE 10952, Medical Imaging 2019: Image Perception, Observer Performance, and Technology Assessment, 109520O (4 March 2019); doi: 10.1117/12.2513420
Author Affiliations
N. Kaiser, Siemens Healthcare GmbH (Germany); Friedrich-Alexander-Univ. Erlangen-Nürnberg (Germany)
A. Fieselmann, Siemens Healthcare GmbH (Germany)
S. Vesal, Friedrich-Alexander-Univ. Erlangen-Nürnberg (Germany)
N. Ravikumar, Friedrich-Alexander-Univ. Erlangen-Nürnberg (Germany)
L. Ritschl, Siemens Healthcare GmbH (Germany)
S. Kappler, Siemens Healthcare GmbH (Germany)
A. Maier, Friedrich-Alexander-Univ. Erlangen-Nürnberg (Germany)


Published in SPIE Proceedings Vol. 10952:
Medical Imaging 2019: Image Perception, Observer Performance, and Technology Assessment
Robert M. Nishikawa; Frank W. Samuelson, Editor(s)

© SPIE