Proceedings Paper

Value-balanced agglomerative connectivity clustering
Author(s): Gunjan K. Gupta; Joydeep Ghosh

Paper Abstract

In this paper we propose a new clustering framework for transactional data sets involving large numbers of customers and products. Such transactional data pose particular issues, such as very high dimensionality (greater than 10,000) and sparse categorical entries, that have been dealt with more effectively using a graph-based approach to clustering such as ROCK. But large transactional data raise certain other issues that need to be addressed, such as how to compare diverse products (e.g. milk vs. cars), cluster balancing, and outlier removal. We first propose a new similarity measure that takes the value of the goods purchased into account, and form a value-based graph representation based on this similarity measure. A novel value-based balancing criterion that allows the user to control the balancing of clusters is then defined. This balancing criterion is integrated with a value-based goodness measure for merging two clusters in an agglomerative clustering routine. Since graph-based clustering algorithms are very sensitive to outliers, we also propose a fast, effective, and simple outlier detection and removal method based on under-clustering or over-partitioning. The performance of the proposed clustering framework is compared with leading graph-theoretic approaches such as ROCK and METIS.
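
The abstract does not give the similarity measure itself, so the following is only an illustrative sketch of what a value-aware similarity and the resulting neighbor graph might look like. It assumes a weighted Jaccard ratio over money spent per product, which is a stand-in and not necessarily the measure defined in the paper. The names `Basket`, `value_similarity`, `neighbor_graph`, and the threshold `theta` are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical representation: product name -> money the customer spent on it.
Basket = Dict[str, float]


def value_similarity(a: Basket, b: Basket) -> float:
    """Weighted Jaccard similarity (an assumed stand-in measure).

    Shared spend (sum of per-product minima) divided by total spend
    over the union of products (sum of per-product maxima).
    """
    products = set(a) | set(b)
    shared = sum(min(a.get(p, 0.0), b.get(p, 0.0)) for p in products)
    total = sum(max(a.get(p, 0.0), b.get(p, 0.0)) for p in products)
    return shared / total if total > 0 else 0.0


def neighbor_graph(baskets: List[Basket], theta: float) -> Dict[int, Set[int]]:
    """Link two customers when their value similarity reaches theta."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(baskets))}
    for i in range(len(baskets)):
        for j in range(i + 1, len(baskets)):
            if value_similarity(baskets[i], baskets[j]) >= theta:
                adj[i].add(j)
                adj[j].add(i)
    return adj


baskets = [
    {"milk": 2.0, "bread": 1.0},
    {"milk": 2.0, "car": 20000.0},
    {"car": 20000.0},
]
graph = neighbor_graph(baskets, theta=0.5)
```

In this toy data the second and third customers are linked because almost all of their spend is on the same high-value item, while the shared milk purchase barely affects the first pair. A count-based measure would instead treat milk and the car as equally important.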

Paper Details

Date Published: 27 March 2001
PDF: 10 pages
Proc. SPIE 4384, Data Mining and Knowledge Discovery: Theory, Tools, and Technology III, (27 March 2001); doi: 10.1117/12.421079
Gunjan K. Gupta, Univ. of Texas at Austin (United States)
Joydeep Ghosh, Univ. of Texas at Austin (United States)


Published in SPIE Proceedings Vol. 4384:
Data Mining and Knowledge Discovery: Theory, Tools, and Technology III
Belur V. Dasarathy, Editor
