
Proceedings Paper

Clustering of complex shaped data sets via Kohonen maps and mathematical morphology
Author(s): Jose Alfredo Ferreira Costa; Marcio Luiz de Andrade Netto

Paper Abstract

Clustering is the process of discovering groups within the data, based on similarities, with minimal, if any, prior knowledge of their structure. The self-organizing (or Kohonen) map (SOM) is one of the best-known neural network algorithms. It has been widely studied as a software tool for visualizing high-dimensional data. Important features include information compression while preserving the topological and metric relationships of the primary data items. Although Kohonen maps have been applied to clustering, the researcher usually either sets the number of neurons equal to the expected number of clusters or manually segments a two-dimensional map using some a priori knowledge of the data. This paper proposes techniques for automatically partitioning and labeling SOM networks into clusters of neurons that may be used to represent the data clusters. Mathematical morphology operations, such as the watershed, are performed on the U-matrix, which is a neuron-distance image. Applying the watershed directly leads to an oversegmented image. Markers are therefore used to identify significant clusters, and homotopy modification is used to suppress the others. Markers are found automatically by performing a multilevel scan of the connected regions of the U-matrix. Each cluster of neurons is a sub-graph that defines, in the input space, complex and non-parametric geometries that approximately describe the shape of the clusters. The process of map partitioning is extended recursively: each cluster of neurons gives rise to a new map, which is trained with the subset of data that was classified to it. The algorithm dynamically produces a hierarchical tree of maps, which explains the cluster structure at several levels of granularity. The distributed, multiple-prototype cluster representation enables the discovery of clusters even when two or more pattern classes are non-separable.
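The abstract does not give implementation details, so the following is only a minimal sketch of two of the steps it names: building a U-matrix from SOM weights and finding markers by a multilevel scan of connected regions. Everything specific here is an assumption made for illustration: the function names (`u_matrix`, `label_regions`, `find_markers`), the 4-connected rectangular grid, the mean-neighbour-distance definition of the U-matrix, and the rule of keeping the region count that persists over the most threshold levels. The toy weights are hand-built rather than trained, and the watershed and homotopy modification steps are not shown.

```python
from collections import deque

import numpy as np


def u_matrix(weights):
    """Mean distance from each neuron to its 4-connected grid neighbours."""
    rows, cols, _ = weights.shape
    total = np.zeros((rows, cols))
    count = np.zeros((rows, cols))
    # Distances between vertically adjacent neurons.
    d = np.linalg.norm(weights[1:] - weights[:-1], axis=2)
    total[1:] += d
    total[:-1] += d
    count[1:] += 1
    count[:-1] += 1
    # Distances between horizontally adjacent neurons.
    d = np.linalg.norm(weights[:, 1:] - weights[:, :-1], axis=2)
    total[:, 1:] += d
    total[:, :-1] += d
    count[:, 1:] += 1
    count[:, :-1] += 1
    return total / count


def label_regions(mask):
    """Label 4-connected regions of a boolean mask; returns (labels, n)."""
    rows, cols = mask.shape
    labels = np.zeros(mask.shape, dtype=int)
    n = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r, c] or labels[r, c] != 0:
                continue
            n += 1
            labels[r, c] = n
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny, nx] and labels[ny, nx] == 0:
                        labels[ny, nx] = n
                        queue.append((ny, nx))
    return labels, n


def find_markers(umat, n_levels=32):
    """Threshold the U-matrix at many levels and keep the region count
    that occurs at the most levels (assumed stability criterion)."""
    levels = np.linspace(umat.min(), umat.max(), n_levels + 2)[1:-1]
    results = [label_regions(umat <= t) for t in levels]
    counts = [n for _, n in results]
    values, freq = np.unique(counts, return_counts=True)
    best = int(values[np.argmax(freq)])
    # Use the lowest threshold that yields the chosen count as the marker image.
    return results[counts.index(best)][0], best


# Toy 6x6 map with 2-D weights: the left and right halves sit far apart
# in input space, so the U-matrix has a ridge between columns 2 and 3.
rows, cols = 6, 6
weights = np.zeros((rows, cols, 2))
for r in range(rows):
    for c in range(cols):
        offset = 0.0 if c < cols // 2 else 10.0
        weights[r, c] = (offset + 0.1 * c, offset + 0.1 * r)

umat = u_matrix(weights)
markers, n_clusters = find_markers(umat)
print(n_clusters)
print(markers)
```

On this toy map the scan finds two marker regions, one on each side of the ridge, and leaves the ridge neurons unlabeled (label 0). In a fuller pipeline those unlabeled neurons would be assigned by a marker-controlled watershed, which is the step the abstract describes next.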

Paper Details

Date Published: 27 March 2001
PDF: 12 pages
Proc. SPIE 4384, Data Mining and Knowledge Discovery: Theory, Tools, and Technology III, (27 March 2001); doi: 10.1117/12.421088
Jose Alfredo Ferreira Costa, Univ. Estadual de Campinas (Brazil)
Marcio Luiz de Andrade Netto, Univ. Estadual de Campinas (Brazil)


Published in SPIE Proceedings Vol. 4384:
Data Mining and Knowledge Discovery: Theory, Tools, and Technology III
Belur V. Dasarathy, Editor

© SPIE