Share Email Print
cover

Proceedings Paper

Interactive mining of schema for semistructured data
Author(s): Yubao Liu; Yucai Feng
Format Member Price Non-Member Price
PDF $14.40 $18.00
cover GOOD NEWS! Your organization subscribes to the SPIE Digital Library. You may be able to download this paper for free. Check Access

Paper Abstract

Semistructured data such as HTML, SGML and XML documents are specified in lack of any fixed and rigid schema, but typically some implicit structures appear in the data. The crucial problem of mining of schema is to discover the similarly hidden structures of the semistructured data. The huge of amounts of on-line applications make it important to mine schemas for semistructured data. Notice that the user may have to dynamically tune the minimum support of schema, in the course of mining, since the minimum support always describes the user's special interests, we present the problem of interactive mining of schema for semistructured data in this paper. In the course of interactive mining, as the old minimum support of schema is tuned by the user, one possible way of discovering the interesting schemas is to re-run the mining algorithm of schema on the new minimum from scratch. However this approach is not efficient for it does not utilize the already mined results. Hence an incremental mining algorithm is presented. In addition, an improved algorithm for finding the maximal schema tree sets is also given. The experimental results show that the incremental algorithm is more efficient than the non-incrementally A-priori-like algorithm.

Paper Details

Date Published: 12 March 2002
PDF: 10 pages
Proc. SPIE 4730, Data Mining and Knowledge Discovery: Theory, Tools, and Technology IV, (12 March 2002); doi: 10.1117/12.460250
Show Author Affiliations
Yubao Liu, Huazhong Univ. of Science and Technology (China)
Yucai Feng, Huazhong Univ. of Science and Technology (China)


Published in SPIE Proceedings Vol. 4730:
Data Mining and Knowledge Discovery: Theory, Tools, and Technology IV
Belur V. Dasarathy, Editor(s)

© SPIE. Terms of Use
Back to Top