Proceedings Paper

Evaluation of similarity measures for analysis of databases on laboratory examinations
Author(s): Xiaoguang Sun; Shoji Hirano; Shusaku Tsumoto

Paper Abstract

A key problem in data mining is partitioning a dataset appropriately and automatically. Classification methods seek partitions, defined by combinations of attribute-value pairs, that best fit the partition given by the target concepts. Clustering methods, in contrast, seek partitions that best characterize a given dataset according to a similarity measure. The choice of distance or similarity measure is therefore one of the most important research topics in data mining. However, empirical comparisons of such measures have never been studied in the literature. In this paper, several types of similarity measures are compared in three clinical contexts: datasets composed only of categorical attributes, datasets with a mixture of categorical and numerical attributes, and datasets composed only of numerical attributes. Experimental results show that simple similarity measures perform as well as newly proposed measures.
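
The abstract does not name the measures that were compared. As a rough illustration of what a "simple" similarity measure might look like in each of the three contexts, the sketch below uses the simple matching coefficient for categorical attributes, a distance-based score for numerical attributes, and a Gower-style average for mixed attributes. These choices, and all function and parameter names, are assumptions made for illustration and are not taken from the paper.

```python
from math import sqrt


def simple_matching(a, b):
    """Fraction of categorical attributes on which two records agree."""
    if len(a) != len(b) or not a:
        raise ValueError("records must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def euclidean_similarity(a, b):
    """Map Euclidean distance into (0, 1]; identical records score 1.0."""
    if len(a) != len(b):
        raise ValueError("records must be of equal length")
    dist = sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + dist)


def gower_style_similarity(a, b, ranges):
    """Average per-attribute similarity over mixed attributes.

    ranges[i] is the (positive) value range of numerical attribute i,
    or None when attribute i is categorical. Hypothetical convention.
    """
    if not (len(a) == len(b) == len(ranges)) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    scores = []
    for x, y, r in zip(a, b, ranges):
        if r is None:
            # Categorical attribute: exact match or not.
            scores.append(1.0 if x == y else 0.0)
        else:
            if r <= 0:
                raise ValueError("numerical ranges must be positive")
            # Numerical attribute: range-normalized absolute difference.
            scores.append(1.0 - abs(x - y) / r)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Made-up records, not data from the paper.
    print(simple_matching(["pos", "neg", "high"], ["pos", "pos", "high"]))
    print(euclidean_similarity([0.0, 0.0], [3.0, 4.0]))
    print(gower_style_similarity(["pos", 5.0], ["pos", 7.0], [None, 10.0]))
```

Any of these scores could be passed to a clustering procedure as the pairwise similarity; whether such simple measures match more elaborate ones on a given dataset is the empirical question the paper addresses.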

Paper Details

Date Published: 12 March 2002
PDF: 10 pages
Proc. SPIE 4730, Data Mining and Knowledge Discovery: Theory, Tools, and Technology IV, (12 March 2002); doi: 10.1117/12.460243
Xiaoguang Sun, Shimane Medical Univ. (Japan)
Shoji Hirano, Shimane Medical Univ. (Japan)
Shusaku Tsumoto, Shimane Medical Univ. (Japan)

Published in SPIE Proceedings Vol. 4730:
Data Mining and Knowledge Discovery: Theory, Tools, and Technology IV
Belur V. Dasarathy, Editor
