Share Email Print
cover

Proceedings Paper

Discovering clusters in databases containing mixed continuous and discrete-valued attributes
Author(s): Keith C. C. Chan; Lewis L. H. Chung
Format Member Price Non-Member Price
PDF $14.40 $18.00
cover GOOD NEWS! Your organization subscribes to the SPIE Digital Library. You may be able to download this paper for free. Check Access

Paper Abstract

Clustering is concerned with the unsupervised discovery of natural grouping of records in a database. Recently, considerable effort has been put into the development of effective clustering algorithms for spatial data mining. These algorithms are mainly developed to deal with continuous valued attributes. They typically use a distance measure, defined in the Euclidean space to determine if two records should be placed in the same or in disjoint clusters. Since such a distance measure cannot be validly defined in non-Euclidean space, these algorithms cannot be used to handle databases, which contain records also characterized by discrete qualitative or categorical values. Owing to the fact that there is usually a mixture of both continuous and discrete valued attributes in many real-world databases, it is important that a clustering algorithm should be developed to handle data mining tasks involving them. In this paper, we present such a clustering algorithm. The algorithm can be divided into two phases: a discretization phase and a genetic algorithm-based clustering phase. The first phase involves discretizing continuous attributes into discrete attributes, based on the use of an information theoretic measure, which minimizes loss of information during such process. These discretized and the discrete attributes are then used in the second phase. In this phase, clustering is carried out, using a genetic algorithm. By representing a specific grouping of records in a chromosome and using a weight of evidence measure as a fitness measure, to determine if such grouping is meaningful, we present here an effective GA for data clustering. To evaluate the effectiveness of the proposed techniques, we tested it using real data. The experimental results showed that it is very promising.

Paper Details

Date Published: 25 February 1999
PDF: 9 pages
Proc. SPIE 3695, Data Mining and Knowledge Discovery: Theory, Tools, and Technology, (25 February 1999); doi: 10.1117/12.339987
Show Author Affiliations
Keith C. C. Chan, Hong Kong Polytechnic Univ. (Hong Kong)
Lewis L. H. Chung, Hong Kong Polytechnic Univ. (Hong Kong)


Published in SPIE Proceedings Vol. 3695:
Data Mining and Knowledge Discovery: Theory, Tools, and Technology
Belur V. Dasarathy, Editor(s)

© SPIE. Terms of Use
Back to Top