
Proceedings Paper

Density-induced oversampling for highly imbalanced datasets
Author(s): Daniel Fecker; Volker Märgner; Tim Fingscheidt

Paper Abstract

This paper investigates the problem of highly imbalanced datasets in two-class classification, where only sparse data is available for the minority class. A novel synthetic data oversampling technique is proposed that uses estimates of the probability density distribution in the feature space. First, a Gaussian mixture model (GMM) is generated from the data of the well-sampled majority class. With its help, a new GMM is then approximated by Bayesian adaptation using the sparse minority class data. Random synthetic data is generated from the adapted GMM, and an additional assignment rule either assigns each sample to the minority class or discards it. The resulting synthetic data is combined with the available original data to train a support vector machine classifier. The application examined in this paper is optical on-line process monitoring of laser brazing, in which defects occur only rarely and sporadically. Experiments with different numbers of minority class samples and comparisons with other methods show that this approach performs very well for highly imbalanced datasets.
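
The pipeline described in the abstract (majority GMM, Bayesian adaptation with minority data, sampling, assignment rule) can be illustrated with a minimal sketch. The abstract does not give the details, so everything specific here is an assumption made for illustration: diagonal covariances, mean-only MAP adaptation with a relevance factor, and a log-likelihood-ratio assignment rule with a margin. The function names (`fit_gmm`, `map_adapt_means`, `oversample`) and parameters (`relevance`, `margin`) are hypothetical and are not taken from the paper. The final SVM training step is omitted.

```python
import math
import random


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def posteriors(x, gmm):
    """Return (component posteriors, log-likelihood) of x under gmm.

    gmm is a list of (weight, mean, var) tuples.
    """
    logs = [math.log(w) + log_gauss(x, m, v) for w, m, v in gmm]
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps], top + math.log(total)


def fit_gmm(data, k, iters=30, rng=random):
    """Fit a diagonal-covariance GMM to the majority class with plain EM."""
    n, dim = len(data), len(data[0])
    g_mean = [sum(x[d] for x in data) / n for d in range(dim)]
    g_var = [max(sum((x[d] - g_mean[d]) ** 2 for x in data) / n, 1e-3)
             for d in range(dim)]
    # Initialise means at random data points, variances at the global variance.
    gmm = [(1.0 / k, list(m), list(g_var)) for m in rng.sample(data, k)]
    for _ in range(iters):
        resp = [posteriors(x, gmm)[0] for x in data]
        gmm = []
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-9
            mean = [sum(r[j] * x[d] for r, x in zip(resp, data)) / nj
                    for d in range(dim)]
            var = [max(sum(r[j] * (x[d] - mean[d]) ** 2
                           for r, x in zip(resp, data)) / nj, 1e-3)
                   for d in range(dim)]
            gmm.append((nj / n, mean, var))
    return gmm


def map_adapt_means(gmm, minority, relevance=4.0):
    """Assumed adaptation: shift each mean toward the minority data (MAP-style)."""
    resp = [posteriors(x, gmm)[0] for x in minority]
    adapted = []
    for j, (w, mean, var) in enumerate(gmm):
        nj = sum(r[j] for r in resp)
        if nj < 1e-12:  # component saw no minority data: keep it unchanged
            adapted.append((w, list(mean), var))
            continue
        alpha = nj / (nj + relevance)  # more minority data => stronger shift
        ex = [sum(r[j] * x[d] for r, x in zip(resp, minority)) / nj
              for d in range(len(mean))]
        adapted.append((w, [alpha * e + (1 - alpha) * m
                            for e, m in zip(ex, mean)], var))
    return adapted


def oversample(majority, minority, n_new, k=2, relevance=4.0,
               margin=1.0, seed=0):
    """Draw synthetic minority samples from the adapted GMM.

    Assumed assignment rule: keep a sample only if its log-likelihood under
    the adapted GMM exceeds that under the majority GMM by `margin`.
    """
    rng = random.Random(seed)
    gmm_major = fit_gmm(majority, k, rng=rng)
    gmm_minor = map_adapt_means(gmm_major, minority, relevance)
    weights = [c[0] for c in gmm_minor]
    kept, tries = [], 0
    while len(kept) < n_new and tries < 100 * n_new:
        tries += 1
        _, mean, var = rng.choices(gmm_minor, weights=weights)[0]
        x = [rng.gauss(m, math.sqrt(v)) for m, v in zip(mean, var)]
        if posteriors(x, gmm_minor)[1] - posteriors(x, gmm_major)[1] > margin:
            kept.append(x)  # assigned to the minority class
    return kept
```

In this sketch, the synthetic points returned by `oversample` would be appended to the original minority samples before training a classifier. The margin-based rule is only one plausible way to discard samples that still look like majority data; the rule used in the paper may differ.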

Paper Details

Date Published: 6 March 2013
PDF: 11 pages
Proc. SPIE 8661, Image Processing: Machine Vision Applications VI, 86610P (6 March 2013); doi: 10.1117/12.2003973
Author Affiliations
Daniel Fecker, Technische Univ. Braunschweig (Germany)
Volker Märgner, Technische Univ. Braunschweig (Germany)
Tim Fingscheidt, Technische Univ. Braunschweig (Germany)

Published in SPIE Proceedings Vol. 8661:
Image Processing: Machine Vision Applications VI
Editor(s): Philip R. Bingham; Edmund Y. Lam

© SPIE