
Proceedings Paper

Supplementing training with data from a shifted distribution for machine learning classifiers: adding more cases may not always help
Author(s): Kenny H. Cha; Alexej Gossmann; Nicholas Petrick; Berkman Sahiner

Paper Abstract

In this study, we show that when a training data set is supplemented with samples drawn from a distribution that differs from that of the target population, the differences between the original and supplemental training distributions should be considered in order to maximize classifier performance in the target population. Depending on these distributions, adding a large number of cases from the supplemental distribution may yield lower performance than adding a limited number of cases. This is relevant to medical imaging when synthetic data are used to train a machine learning algorithm, which may result in a training set drawn from a mixed distribution. We simulated a two-class classification problem and measured the performance of a linear classifier and a neural network classifier on test cases when trained with cases from the target distribution only, and when cases from a shifted, supplemental distribution were added to a limited number of cases from the target distribution. We show that adding data from a supplemental distribution for classifier training may improve performance on the target test distribution. However, for the same total number of training cases, training on a mixed distribution may not reach the performance of training only on data from the target distribution. In addition, the increase in performance will peak or plateau, depending on the shift in the distribution and the number of cases from the supplemental distribution.
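
The kind of simulation described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the two classes are isotropic unit-variance Gaussians in two dimensions, the supplemental distribution is the target distribution translated by a fixed vector (`SHIFT`), the linear classifier is a simple nearest-centroid rule, and performance is measured as accuracy on target test cases. The neural network classifier is not covered.

```python
import random

# Assumed setup (not from the paper): two unit-variance Gaussian classes.
TARGET_MEANS = ((0.0, 0.0), (1.5, 0.0))  # class 0 and class 1 means, target population
SHIFT = (1.0, 0.0)                       # hypothetical shift of the supplemental data


def sample(n, mean, rng):
    """Draw n points from an isotropic unit-variance Gaussian centered at mean."""
    return [[rng.gauss(m, 1.0) for m in mean] for _ in range(n)]


def centroid(points):
    """Per-dimension mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def fit_linear(neg, pos):
    """Nearest-centroid linear classifier; returns (w, b) with score = w.x + b."""
    m0, m1 = centroid(neg), centroid(pos)
    w = [p - n for p, n in zip(m1, m0)]
    # The boundary passes through the midpoint between the two class centroids.
    b = -sum(wi * (p + n) / 2 for wi, p, n in zip(w, m1, m0))
    return w, b


def accuracy(model, neg, pos):
    """Fraction of test cases on the correct side of the decision boundary."""
    w, b = model

    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x)) + b

    correct = sum(score(x) <= 0 for x in neg) + sum(score(x) > 0 for x in pos)
    return correct / (len(neg) + len(pos))


def run_experiment(n_target=10, n_supp_list=(0, 10, 100, 1000),
                   n_test=1000, n_trials=20, seed=0):
    """Mean target-test accuracy for each number of supplemental cases per class."""
    rng = random.Random(seed)
    shifted = [tuple(m + s for m, s in zip(mean, SHIFT)) for mean in TARGET_MEANS]
    totals = {n: 0.0 for n in n_supp_list}
    for _ in range(n_trials):
        train = [sample(n_target, mean, rng) for mean in TARGET_MEANS]
        test = [sample(n_test, mean, rng) for mean in TARGET_MEANS]
        supp = [sample(max(n_supp_list), mean, rng) for mean in shifted]
        for n in n_supp_list:
            # Train on the limited target cases plus the first n supplemental cases.
            model = fit_linear(train[0] + supp[0][:n], train[1] + supp[1][:n])
            totals[n] += accuracy(model, test[0], test[1])
    return {n: total / n_trials for n, total in totals.items()}


results = run_experiment()
for n_supp, acc in results.items():
    print(f"supplemental cases per class: {n_supp:5d}  mean test accuracy: {acc:.3f}")
```

Varying `SHIFT` and `n_supp_list` in this sketch is one way to explore the trade-off the abstract describes: supplemental cases reduce the variance of the estimated boundary, but a large number of shifted cases can pull the boundary away from where it should sit for the target population.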

Paper Details

Date Published: 16 March 2020
PDF: 6 pages
Proc. SPIE 11316, Medical Imaging 2020: Image Perception, Observer Performance, and Technology Assessment, 113160S (16 March 2020); doi: 10.1117/12.2550538
Kenny H. Cha, U.S. Food and Drug Administration (United States)
Alexej Gossmann, U.S. Food and Drug Administration (United States)
Nicholas Petrick, U.S. Food and Drug Administration (United States)
Berkman Sahiner, U.S. Food and Drug Administration (United States)


Published in SPIE Proceedings Vol. 11316:
Medical Imaging 2020: Image Perception, Observer Performance, and Technology Assessment
Frank W. Samuelson; Sian Taylor-Phillips, Editor(s)

© SPIE