
Proceedings Paper

Analysis and summarization of correlations in data cubes
Author(s): Chien-Yu Chen; Shien-Ching Hwang; Yen-Jen Oyang

Paper Abstract

This paper presents a novel mechanism for analyzing and summarizing the statistical correlations among the attributes of a data cube. To support both steps, it proposes a new measure of statistical significance. The main motivation for the new measure is an essential closure property, which is exploited in the summarization stage of the data mining process. The measure has two other important properties. First, it is more conservative than the well-known chi-square test of classical statistics and therefore inherits that test's statistical robustness. The chi-square test itself is not used because it lacks the closure property, and without that property the summarization process may suffer a precision problem. Second, although the measure is more conservative than the chi-square test, in most cases it yields a value almost equal to that of a conventional measure of statistical significance based on the normal distribution. Building on the closure property, the paper develops an algorithm that summarizes the results of the statistical analysis performed on the data cube. The closure property lets the proposed measure avoid the precision problem, but its conservative nature may cause a recall problem in the data mining process.
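The abstract does not define the proposed measure, so it cannot be reproduced here. As background only, the following minimal sketch illustrates the two baselines the abstract compares against: the Pearson chi-square test and a normal-distribution-based test, applied to a 2x2 contingency table of counts such as one that might be aggregated from two binary attributes of a data cube. The function names and the example counts are hypothetical and are not taken from the paper.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction)
    for the 2x2 contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def chi_square_p_value_1df(stat):
    """Upper-tail p-value of a chi-square variable with 1 degree of freedom.
    Uses P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for standard normal Z."""
    return math.erfc(math.sqrt(stat / 2.0))


def two_proportion_z(a, b, c, d):
    """Pooled two-proportion z statistic comparing a/(a+b) with c/(c+d)."""
    n1, n2 = a + b, c + d
    p1, p2 = a / n1, c / n2
    pooled = (a + c) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


# Hypothetical counts: rows are the values of one attribute,
# columns are the values of the other.
table = (30, 20, 10, 40)
chi2 = chi_square_2x2(*table)
z = two_proportion_z(*table)
print(f"chi-square = {chi2:.4f}, p = {chi_square_p_value_1df(chi2):.6f}")
print(f"z = {z:.4f}, z squared = {z * z:.4f}")
```

For a 2x2 table the uncorrected chi-square statistic equals the square of the pooled z statistic, so the two baselines agree exactly in this simple case. How the paper's measure relates to them in general is described only qualitatively in the abstract.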

Paper Details

Date Published: 12 March 2002
PDF: 12 pages
Proc. SPIE 4730, Data Mining and Knowledge Discovery: Theory, Tools, and Technology IV, (12 March 2002); doi: 10.1117/12.460206
Chien-Yu Chen, National Taiwan Univ. (Taiwan)
Shien-Ching Hwang, National Taiwan Univ. (Taiwan)
Yen-Jen Oyang, National Taiwan Univ. (Taiwan)


Published in SPIE Proceedings Vol. 4730:
Data Mining and Knowledge Discovery: Theory, Tools, and Technology IV
Belur V. Dasarathy, Editor(s)

© SPIE.