Proceedings Paper

Experimental analysis of methods for imputation of missing values in databases
Author(s): Alireza Farhangfar; Lukasz A. Kurgan; Witold Pedrycz

Paper Abstract

A major issue faced by researchers and practitioners who use industrial and research databases is incomplete data, usually in the form of missing or erroneous values. While some data analysis algorithms can work with incomplete data, a large portion of them require complete data. Therefore, different strategies, such as deletion of incomplete examples and imputation (filling) of missing values through a variety of statistical and machine learning (ML) procedures, have been developed to preprocess incomplete data. This study presents an experimental analysis of several algorithms for imputation of missing values, ranging from simple statistical algorithms, such as mean and hot deck imputation, to imputation algorithms based on inductive ML algorithms. Three major families of ML algorithms, namely probabilistic algorithms (e.g. Naive Bayes), decision tree algorithms (e.g. C4.5), and decision rule algorithms (e.g. CLIP4), are used to implement the ML-based imputation algorithms. The analysis is carried out using a comprehensive range of databases, into which missing values were introduced randomly. The goal of this paper is to provide general guidelines for selecting suitable data imputation algorithms based on characteristics of the data. The guidelines are developed through a comprehensive experimental comparison of the performance of the different data imputation algorithms.
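The two simple statistical baselines named in the abstract, along with the random introduction of missing values, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify which variants were used, so the function names (`introduce_missing`, `mean_impute`, `hot_deck_impute`), the use of `None` to mark a missing cell, and the choice of a simple random hot deck (donor values drawn at random from the observed values of the same attribute) are all assumptions made for illustration.

```python
import random
from statistics import mean

def introduce_missing(rows, fraction, rng):
    """Return a copy of rows with a fraction of cells set to None,
    chosen uniformly at random (hypothetical helper)."""
    out = [list(r) for r in rows]
    cells = [(i, j) for i in range(len(out)) for j in range(len(out[0]))]
    for i, j in rng.sample(cells, int(fraction * len(cells))):
        out[i][j] = None
    return out

def mean_impute(rows):
    """Fill each missing cell with the mean of the observed values
    in the same column (numeric attributes only)."""
    col_means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(mean(observed) if observed else None)
    return [[col_means[j] if v is None else v for j, v in enumerate(r)]
            for r in rows]

def hot_deck_impute(rows, rng):
    """Fill each missing cell with a value drawn at random from the
    observed values of the same column (assumed random hot deck)."""
    out = [list(r) for r in rows]
    for j in range(len(rows[0])):
        donors = [r[j] for r in rows if r[j] is not None]
        for r in out:
            if r[j] is None and donors:
                r[j] = rng.choice(donors)
    return out

# Toy data, not taken from the paper.
rng = random.Random(0)
data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
incomplete = introduce_missing(data, 0.25, rng)
print(mean_impute(incomplete))
print(hot_deck_impute(incomplete, rng))
```

Mean imputation shrinks the variance of the filled attribute, while hot deck imputation preserves the observed value distribution at the cost of added randomness; trade-offs of this kind are what a comparison across databases with different characteristics would be expected to expose.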

Paper Details

Date Published: 12 April 2004
PDF: 11 pages
Proc. SPIE 5421, Intelligent Computing: Theory and Applications II, (12 April 2004); doi: 10.1117/12.542509
Alireza Farhangfar, Univ. of Alberta (Canada)
Lukasz A. Kurgan, Univ. of Alberta (Canada)
Witold Pedrycz, Univ. of Alberta (Canada)

Published in SPIE Proceedings Vol. 5421:
Intelligent Computing: Theory and Applications II
Kevin L. Priddy, Editor(s)

© SPIE.