Proceedings Paper

Cluster structure evaluation of dyadic k-means algorithm for mining large image archives

Paper Abstract

In many data mining and knowledge discovery applications, clustering methods are used for data reduction. When the data volume grows, as in image information mining, where gigabytes of data must be processed, many existing clustering algorithms cannot be applied because of their high computational complexity. To overcome this limitation, we developed an efficient clustering algorithm called dyadic k-means, a modified and enhanced version of traditional k-means. Whereas k-means has a computational complexity of O(nk) for n samples and k clusters, dyadic k-means has a complexity of O(n log k). The algorithm is particularly efficient for grouping very large data sets into a large number of clusters. In this article we present statistically based methods for the objective evaluation of clusters obtained by dyadic k-means. The main focus is on how well the clusters describe the distribution of data points in a multi-dimensional feature space and how much information can be obtained from the clusters. We consider both how the samples fill the feature space and how well the clusters produced by dyadic k-means characterize this configuration. We use the well-established scatter matrices to measure the compactness and separability of the clustered groups in the feature space. The probability of error, another indicator of how well clusters characterize the samples in the feature space, is also calculated for each point. This probability expresses the relationship of each point to its cluster and can therefore be regarded as a measure of cluster reliability. We test the evaluation methods on both a synthetic and a real-world data set.
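As an illustration of the scatter-matrix evaluation mentioned in the abstract, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes the textbook definitions of the within-cluster scatter matrix S_W and the between-cluster scatter matrix S_B, and it uses the ratio trace(S_B) / trace(S_W) as one common summary of separability relative to compactness; the abstract does not state which criterion the paper uses. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def mean(points):
    """Component-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def outer_add(matrix, vec, weight=1.0):
    """Add weight * vec * vec^T to matrix in place."""
    for i, vi in enumerate(vec):
        for j, vj in enumerate(vec):
            matrix[i][j] += weight * vi * vj


def scatter_matrices(points, labels):
    """Return (S_W, S_B) for points grouped by their cluster labels.

    Assumed textbook definitions (not taken from the paper):
      S_W = sum over clusters, sum over members of (x - mu_c)(x - mu_c)^T
      S_B = sum over clusters of n_c * (mu_c - mu)(mu_c - mu)^T
    """
    dim = len(points[0])
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)

    overall = mean(points)
    s_w = [[0.0] * dim for _ in range(dim)]
    s_b = [[0.0] * dim for _ in range(dim)]
    for members in groups.values():
        mu = mean(members)
        for point in members:
            outer_add(s_w, [a - b for a, b in zip(point, mu)])
        outer_add(s_b, [a - b for a, b in zip(mu, overall)], weight=len(members))
    return s_w, s_b


def trace(matrix):
    """Sum of the diagonal entries of a square matrix."""
    return sum(matrix[i][i] for i in range(len(matrix)))


# Toy data (hypothetical): two tight clusters that lie far apart along x.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]

s_w, s_b = scatter_matrices(points, labels)
ratio = trace(s_b) / trace(s_w)  # larger means compact, well-separated clusters
print(trace(s_w), trace(s_b), ratio)
```

A small trace(S_W) indicates compact clusters and a large trace(S_B) indicates well-separated cluster centers, so the ratio grows as the clustering describes the point distribution more cleanly.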

Paper Details

Date Published: 13 March 2003
PDF: 11 pages
Proc. SPIE 4885, Image and Signal Processing for Remote Sensing VIII, (13 March 2003); doi: 10.1117/12.463151
Herbert Daschiel, DLR (Germany)
Mihai P. Datcu, DLR (Germany)

Published in SPIE Proceedings Vol. 4885:
Image and Signal Processing for Remote Sensing VIII
Sebastiano B. Serpico, Editor
