Proceedings Paper

Virtual clinical trial for task-based evaluation of a deep learning synthetic mammography algorithm

Paper Abstract

Image processing algorithms based on deep learning techniques are being developed for a wide range of medical applications. Processed medical images are typically evaluated with the same image similarity metrics used for natural scenes, disregarding the medical task for which the images are intended. We propose a computational framework that uses virtual clinical trials to estimate the clinical performance of image processing algorithms. The proposed framework may provide an alternative method for the regulatory evaluation of non-linear image processing algorithms. To illustrate this application of virtual clinical trials, we evaluated three algorithms that compute synthetic mammograms from digital breast tomosynthesis (DBT) scans using convolutional neural networks previously used to denoise low-dose computed tomography scans. The inputs to the networks were one or more noisy DBT projections, and the networks were trained to minimize the difference between their output and the corresponding high-dose mammogram. DBT and mammography images simulated with the Monte Carlo code MC-GPU using realistic breast phantoms were used for network training and validation. The denoising algorithms were tested in a virtual clinical trial by generating 3000 synthetic mammograms from the public VICTRE dataset of simulated DBT scans. The detectability of a calcification cluster and a spiculated mass present in the images was calculated using an ensemble of 30 computational channelized Hotelling observers. The signal detectability results, which accounted for anatomic and image reader variability, showed that the visibility of the mass was not affected by the post-processing algorithms, but that the slight blurring they introduced severely reduced the visibility of the calcification cluster.
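The channelized Hotelling observer (CHO) used to compute detectability can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the channel set, so the rotationally symmetric Laguerre-Gauss channels, the width parameter `a`, the function names `laguerre_gauss_channels` and `cho_detectability`, and the white-noise toy images are all hypothetical choices made here. The observer projects each image onto a few channel templates and computes the detectability index d' = sqrt(dv^T S^-1 dv), where dv is the mean difference of the channel outputs between signal-present and signal-absent images and S is their average channel covariance. Neither the ensemble of 30 observers nor the VICTRE images are reproduced.

```python
import numpy as np


def laguerre_gauss_channels(size, n_channels, a):
    """Hypothetical channel set: Laguerre-Gauss templates centered on the image.

    u_j(r) = (sqrt(2)/a) * exp(-pi r^2 / a^2) * L_j(2 pi r^2 / a^2)
    """
    y, x = np.mgrid[:size, :size] - (size - 1) / 2.0
    r2 = x**2 + y**2
    channels = []
    for j in range(n_channels):
        coeffs = np.zeros(j + 1)
        coeffs[j] = 1.0  # selects the j-th Laguerre polynomial L_j
        lj = np.polynomial.laguerre.lagval(2.0 * np.pi * r2 / a**2, coeffs)
        u = (np.sqrt(2.0) / a) * np.exp(-np.pi * r2 / a**2) * lj
        channels.append(u.ravel())
    return np.stack(channels, axis=1)  # shape: (size * size, n_channels)


def cho_detectability(signal_imgs, background_imgs, channels):
    """Detectability index d' of a channelized Hotelling observer."""
    # Project every image onto the channels -> (n_images, n_channels)
    v_s = signal_imgs.reshape(len(signal_imgs), -1) @ channels
    v_b = background_imgs.reshape(len(background_imgs), -1) @ channels
    # Mean difference of the channel outputs between the two classes
    delta = v_s.mean(axis=0) - v_b.mean(axis=0)
    # Average intra-class covariance of the channel outputs
    cov = 0.5 * (np.cov(v_s, rowvar=False) + np.cov(v_b, rowvar=False))
    # Hotelling template in channel space; d'^2 = delta^T cov^-1 delta
    template = np.linalg.solve(cov, delta)
    return float(np.sqrt(delta @ template))


# Toy usage: white-noise backgrounds and a Gaussian blob as a stand-in lesion
rng = np.random.default_rng(0)
size, n = 64, 500
y, x = np.mgrid[:size, :size] - (size - 1) / 2.0
blob = np.exp(-(x**2 + y**2) / (2 * 3.0**2))
channels = laguerre_gauss_channels(size, n_channels=5, a=15.0)
backgrounds = rng.normal(size=(n, size, size))
noise = rng.normal(size=(n, size, size))
d_weak = cho_detectability(noise + 0.2 * blob, backgrounds, channels)
d_strong = cho_detectability(noise + 0.8 * blob, backgrounds, channels)
print(d_weak, d_strong)
```

The stronger toy signal yields the larger d'. With realistic phantom backgrounds, the covariance S would also capture anatomical variability rather than only white noise, which is what lets a CHO reflect anatomy as well as image noise.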
Evaluating the algorithms with the pixel-based metrics peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) in image patches did not predict the reduced detectability of the calcifications. These two metrics are computed over the whole image without reference to any particular task, and may not be adequate for estimating the diagnostic performance of post-processed images.
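The two pixel-based metrics can be sketched as below, as a minimal illustration only. The abstract does not give the exact settings, so the function names `psnr` and `patch_ssim`, the `data_range` parameter, the 8-pixel non-overlapping patches, and the toy image are assumptions; the stabilizing constants (0.01 and 0.03 times the data range) are the usual defaults from the original SSIM definition, which normally uses a sliding Gaussian window rather than disjoint patches. Neither function takes the location or type of a lesion as input, which is the sense in which these metrics are task-agnostic.

```python
import numpy as np


def psnr(reference, test, data_range):
    """Peak signal-to-noise ratio in dB over the whole image."""
    mse = np.mean((reference.astype(float) - test.astype(float)) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range**2 / mse))


def patch_ssim(reference, test, data_range, patch=8):
    """Mean SSIM over non-overlapping square patches (simplified variant)."""
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    h, w = reference.shape
    scores = []
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            x = reference[i:i + patch, j:j + patch].astype(float)
            y = test[i:i + patch, j:j + patch].astype(float)
            mx, my = x.mean(), y.mean()
            vx, vy = x.var(), y.var()
            cxy = ((x - mx) * (y - my)).mean()
            num = (2 * mx * my + c1) * (2 * cxy + c2)
            den = (mx**2 + my**2 + c1) * (vx + vy + c2)
            scores.append(num / den)
    return float(np.mean(scores))


# Toy usage: a tiny bright spot (hypothetical "calcification") on a noisy background
rng = np.random.default_rng(1)
image = rng.normal(0.5, 0.05, size=(64, 64))
image[30:32, 30:32] += 0.4
# 3x3 box blur built from shifted copies (edges wrap around via np.roll)
blurred = sum(
    np.roll(np.roll(image, dy, axis=0), dx, axis=1)
    for dy in (-1, 0, 1)
    for dx in (-1, 0, 1)
) / 9.0
print(psnr(image, blurred, data_range=1.0), patch_ssim(image, blurred, data_range=1.0))
```

Because blurring a small spot changes only a few pixels, both scores can be dominated by the rest of the image; a task-based figure of merit such as the CHO d' is needed to tell whether the spot itself remains detectable.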

Paper Details

Date Published: 7 March 2019
PDF: 10 pages
Proc. SPIE 10948, Medical Imaging 2019: Physics of Medical Imaging, 109480O (7 March 2019); doi: 10.1117/12.2513062
Author Affiliations
Andreu Badal, U.S. Food and Drug Administration (United States)
Kenny H. Cha, U.S. Food and Drug Administration (United States)
Sarah E. Divel, U.S. Food and Drug Administration (United States); Stanford Univ. (United States)
Christian G. Graff, U.S. Food and Drug Administration (United States)
Rongping Zeng, U.S. Food and Drug Administration (United States)
Aldo Badano, U.S. Food and Drug Administration (United States)

Published in SPIE Proceedings Vol. 10948:
Medical Imaging 2019: Physics of Medical Imaging
Taly Gilat Schmidt; Guang-Hong Chen; Hilde Bosmans, Editor(s)
