
Proceedings Paper

Learning from imbalanced data: a comparative study for colon CAD

Paper Abstract

Classification plays an important role in reducing false positives in many computer-aided detection and diagnosis (CAD) methods. Classifying polyps is difficult because polyp shapes and sizes vary widely and because the training data contain far fewer polyp regions than non-polyp regions. CAD schemes for medical applications demand high sensitivity, even at the expense of retaining a certain number of false positives. In this paper, we investigate state-of-the-art solutions to the imbalanced data problem: the Synthetic Minority Over-sampling Technique (SMOTE) and weighted Support Vector Machines (SVMs). We tested these methods on a diverse CT colonography database that included a wide spectrum of cases in which polyps are difficult to detect. We performed several experiments with different combinations of over-sampling techniques on the training data. The results demonstrate that SVMs achieved much better performance than C4.5 across the different over-sampling techniques. The results also show that a weighted SVM without over-sampling can achieve sensitivity and specificity comparable to those of a conventional SVM combined with over-sampling.
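The two families of remedies named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `smote` and `balanced_class_weights`, the parameters `k` and `seed`, and the toy data in the usage note are all assumptions made for illustration. The first function follows the basic SMOTE idea of interpolating between a minority sample and one of its nearest minority-class neighbours; the second computes per-class misclassification costs inversely proportional to class frequency, a common heuristic for weighting an SVM that the abstract does not confirm was used in the paper.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Simplified SMOTE sketch: create n_synthetic points by interpolating
    between minority samples and their k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)           # seeded for reproducibility (assumption)
    k = min(k, len(minority) - 1)       # cannot have more neighbours than samples
    synthetic = []
    for t in range(n_synthetic):
        i = t % len(minority)           # cycle through the minority samples
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def balanced_class_weights(labels):
    """Per-class costs inversely proportional to class frequency
    (a common heuristic, assumed here; not taken from the paper)."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n, n_classes = len(labels), len(counts)
    return {y: n / (n_classes * c) for y, c in counts.items()}
```

For example, `smote([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], n_synthetic=6, k=2)` returns six new points lying on segments between the given samples, and `balanced_class_weights([1, 0, 0, 0])` assigns the rare class a larger cost than the common one. Over-sampling changes the training set itself, whereas class weighting leaves the data untouched and changes only the penalty for errors on each class.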

Paper Details

Date Published: 17 March 2008
PDF: 9 pages
Proc. SPIE 6915, Medical Imaging 2008: Computer-Aided Diagnosis, 69150R (17 March 2008); doi: 10.1117/12.770630
Xiaoyun Yang, Medicsight PLC (United Kingdom)
Yalin Zheng, Medicsight PLC (United Kingdom)
Musib Siddique, Medicsight PLC (United Kingdom)
Gareth Beddoe, Medicsight PLC (United Kingdom)

Published in SPIE Proceedings Vol. 6915:
Medical Imaging 2008: Computer-Aided Diagnosis
Maryellen L. Giger; Nico Karssemeijer, Editor(s)
