
Proceedings Paper

Novel approach to data discretization
Author(s): Grzegorz Borowik; Karol Kowalski; Cezary Jankowski

Paper Abstract

Discretization is an important preprocessing step in data mining. It determines ranges of values for numeric attributes, and these ranges become the discrete intervals of new attributes. The ranges induced by a proposed set of cuts are analyzed in order to obtain a minimal set of ranges that still allows classification. For this purpose, a special discernibility function can be constructed as a conjunction of clauses, one for each pair of objects with different decisions, where each clause is the alternative (disjunction) of the cuts that discern the two objects. However, data mining methods based on the discernibility matrix are inefficient for large databases. This paper proposes a new data discretization algorithm that is based on statistics of attribute values and avoids building the discernibility matrix explicitly. An evaluation of time complexity shows that, for large data sets, the proposed method is much more efficient than currently available solutions.
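To make the terms in the abstract concrete, the following is a minimal Python sketch of the classical cut-based formulation that the abstract describes as the baseline, not of the paper's proposed algorithm, whose details are not given here. It assumes that candidate cuts are midpoints between consecutive distinct attribute values and that a small set of cuts is chosen greedily; the function names `candidate_cuts`, `discerns`, and `greedy_cuts` are hypothetical and introduced only for illustration.

```python
from itertools import combinations


def candidate_cuts(values):
    """Return midpoints between consecutive distinct values (assumed cut candidates)."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def discerns(cut, x, y):
    """A cut (attribute index, threshold) discerns x and y if they lie on opposite sides."""
    attr, threshold = cut
    return (x[attr] < threshold) != (y[attr] < threshold)


def greedy_cuts(rows, decisions):
    """Greedily pick cuts until every pair of objects with different decisions is discerned.

    The pair set below is an explicit form of the discernibility information:
    it grows quadratically with the number of objects, which is the cost the
    abstract says becomes a problem for large databases.
    """
    n_attrs = len(rows[0])
    cuts = [(a, t) for a in range(n_attrs)
            for t in candidate_cuts([r[a] for r in rows])]
    pairs = {(i, j) for i, j in combinations(range(len(rows)), 2)
             if decisions[i] != decisions[j]}
    chosen = []
    while pairs and cuts:
        # Pick the cut that discerns the most still-undiscerned pairs.
        best = max(cuts, key=lambda c: sum(discerns(c, rows[i], rows[j])
                                           for i, j in pairs))
        covered = {(i, j) for i, j in pairs if discerns(best, rows[i], rows[j])}
        if not covered:
            break  # remaining pairs are identical objects with different decisions
        chosen.append(best)
        pairs -= covered
    return sorted(chosen)


# Toy data (made up for illustration): two numeric attributes, binary decision.
rows = [(1.0, 5.0), (2.0, 5.0), (3.0, 1.0), (4.0, 1.0)]
decisions = [0, 0, 1, 1]
print(greedy_cuts(rows, decisions))  # [(0, 2.5)]
```

On this toy data a single cut at 2.5 on the first attribute separates every pair of objects with different decisions, so one cut (two intervals) suffices. The greedy choice is a common heuristic for this covering problem and does not guarantee a minimal set of cuts in general.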

Paper Details

Date Published: 11 September 2015
PDF: 9 pages
Proc. SPIE 9662, Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2015, 96623U (11 September 2015); doi: 10.1117/12.2205916
Author Affiliations:
Grzegorz Borowik, Warsaw Univ. of Technology (Poland)
Karol Kowalski, Warsaw Univ. of Technology (Poland)
Cezary Jankowski, Warsaw Univ. of Technology (Poland)


Published in SPIE Proceedings Vol. 9662:
Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2015
Ryszard S. Romaniuk, Editor

© SPIE.