Proceedings Paper

Exploring data sampling techniques for imbalanced classification problems
Author(s): Yu Sui; Xiaohui Zhang; Jiajia Huan; Haifeng Hong

Paper Abstract

The class imbalance problem is one of the key challenges in machine learning and data mining. Imbalanced data can result in sub-optimal performance of classification models. To address the problem, a variety of data sampling methods have been proposed in previous studies. However, no universal solution exists, and it is worth exploring which data sampling techniques balance the class distribution most effectively for a given type of data and classifier. In this work, we present an experimental study based on a number of real-world data sets obtained from different disciplines. The goal is to investigate how effectively different sampling techniques improve classification performance on imbalanced data sets. In particular, we study ten sampling methods of different types, including random sampling, cluster-based sampling, and ensemble sampling. The C4.5 decision tree algorithm is used to train the base classifiers, and performance is measured using precision, G-Measure, and Cohen's Kappa statistic.
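
As an illustration of the simplest family mentioned in the abstract, the following is a minimal sketch of random oversampling and random undersampling, together with the three reported metrics for a binary problem. It is not the authors' implementation: the function names (`random_oversample`, `random_undersample`, `binary_metrics`) are hypothetical, and G-Measure is assumed here to be the geometric mean of precision and recall, since the abstract does not state which definition is used.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        pool = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(pool, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


def binary_metrics(y_true, y_pred, positive=1):
    """Precision, G-Measure (assumed: sqrt(precision * recall)) and
    Cohen's Kappa for a binary classification result."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    n = tp + fp + fn + tn

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    g_measure = math.sqrt(precision * recall)

    # Cohen's Kappa: observed agreement corrected for chance agreement.
    p_observed = (tp + tn) / n
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (p_observed - p_expected) / (1 - p_expected) if p_expected != 1 else 0.0

    return {"precision": precision, "g_measure": g_measure, "kappa": kappa}
```

In a study of this kind, resampling would normally be applied to the training split only, with the metrics computed on an untouched test split, so that duplicated or discarded rows do not distort the evaluation.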

Paper Details

Date Published: 31 July 2019
PDF: 5 pages
Proc. SPIE 11198, Fourth International Workshop on Pattern Recognition, 1119813 (31 July 2019); doi: 10.1117/12.2540457
Author Affiliations
Yu Sui, Guangdong Power Grid Corp. (China)
Xiaohui Zhang, Guangdong Power Grid Corp. (China)
Jiajia Huan, Guangdong Power Grid Corp. (China)
Haifeng Hong, Guangdong Power Grid Corp. (China)

Published in SPIE Proceedings Vol. 11198:
Fourth International Workshop on Pattern Recognition
Xudong Jiang; Zhenxiang Chen; Guojian Chen, Editor(s)

© SPIE