
Proceedings Paper

Artificially augmenting data or adding more samples? A study on a 3D CNN for lung nodule classification
Author(s): Panagiotis Gonidakis; Bart Jansen; Jef Vandemeulebroucke

Paper Abstract

Convolutional neural networks are known to require large amounts of data to achieve optimal performance. In addition, data is commonly augmented computationally, using a variety of geometric and intensity transformations, to further extend the set of training samples. In medical imaging, annotated data is often scarce or costly to obtain, and there is considerable interest in methods that reduce the amount of data needed. In this work, we investigate the relative benefit of increasing the amount of original data, compared with computationally augmenting the number of training samples, for the case of false positive reduction of lung nodule candidates. To this end, we have implemented a previously published topology for classification, shown to achieve state-of-the-art results on the publicly available LUNA16 dataset. Numerous models were trained using different numbers of unique training samples and different degrees of data augmentation involving rotations and translations, and their performance was compared. Results indicate that, in general, better performance is achieved when increasing the amount of data or augmenting the data more extensively, as expected. Surprisingly, however, we observed that after reaching a certain number of unique training samples, data augmentation leads to significantly better performance than adding the same number of new samples to the training dataset. We hypothesize that the augmentation has aided in learning more general (rotation- and translation-invariant) features, leading to improved performance on unseen data. Future experiments include a more detailed characterization of this behavior, and relating it to the topology and the number of parameters to be trained.
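
The abstract does not give implementation details for the augmentation, so the following is only a minimal sketch of what rotation and translation augmentation of a 3D candidate patch could look like. The function names (`translate`, `augment_patch`), the restriction to 90-degree rotations, the `max_shift` parameter, and the edge padding are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def translate(patch, shifts):
    """Shift a 3D patch by integer voxel offsets, filling gaps by edge padding.

    Hypothetical helper: the padding strategy is an assumption, not the
    authors' method.
    """
    pad = max(abs(int(s)) for s in shifts)
    padded = np.pad(patch, pad, mode="edge")
    # Cropping a window offset by -s moves the content by +s voxels.
    window = tuple(
        slice(pad - int(s), pad - int(s) + n)
        for s, n in zip(shifts, patch.shape)
    )
    return padded[window]


def augment_patch(patch, rng, max_shift=2):
    """Return a randomly rotated and translated copy of a cubic 3D patch.

    Assumes a cubic patch so that 90-degree rotations preserve its shape.
    """
    # Rotate by a random multiple of 90 degrees in a random plane.
    planes = [(0, 1), (0, 2), (1, 2)]
    axes = planes[int(rng.integers(len(planes)))]
    k = int(rng.integers(4))
    rotated = np.rot90(patch, k=k, axes=axes)
    # Translate by a random integer offset along each axis.
    shifts = rng.integers(-max_shift, max_shift + 1, size=3)
    return translate(rotated, shifts)


if __name__ == "__main__":
    rng = np.random.default_rng(seed=0)
    patch = rng.normal(size=(32, 32, 32)).astype(np.float32)
    # Generate several augmented variants of one unique sample.
    variants = [augment_patch(patch, rng) for _ in range(4)]
    print(len(variants), variants[0].shape)
```

In this sketch, the "degree of augmentation" would correspond to how many variants are generated per unique sample, while the number of unique samples is varied separately. Arbitrary-angle rotations would require interpolation (for example with `scipy.ndimage.rotate`) and are left out here for simplicity.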

Paper Details

Date Published: 16 March 2020
PDF: 6 pages
Proc. SPIE 11314, Medical Imaging 2020: Computer-Aided Diagnosis, 113142F (16 March 2020); doi: 10.1117/12.2549810
Author Affiliations
Panagiotis Gonidakis, Vrije Univ. Brussel (Belgium); imec (Belgium)
Bart Jansen, Vrije Univ. Brussel (Belgium); imec (Belgium)
Jef Vandemeulebroucke, Vrije Univ. Brussel (Belgium); imec (Belgium)

Published in SPIE Proceedings Vol. 11314:
Medical Imaging 2020: Computer-Aided Diagnosis
Horst K. Hahn; Maciej A. Mazurowski, Editor(s)
