
Proceedings Paper

Processing heterogeneous XML data from multi-source
Author(s): Tong Wang; Da-Xin Liu; Wei Sun; Xuanzuo Lin

Paper Abstract

Recently, XML heterogeneity has become a new challenge. This paper proposes a novel clustering strategy for regrouping heterogeneous XML sources, because searching within a smaller space of similar documents reduces cost. The strategy consists of four steps. First, path features are extracted and mapped into a High-Dimension Vector Space (HDVS). Second, during data preprocessing, two algorithms are applied to reduce redundancy in the XML sources. Third, the heterogeneous documents are clustered. Finally, Multivalued Dependency (MVD) is introduced, since MVD can be redefined according to the range of constraints of XML. The paper also proposes a novel algorithm, based on rough sets, for discovering minimal MVDs in the presence of incomplete (non-integrity) data. It addresses the problem that incomplete XML data interferes with finding the MVDs of XML, so that patterns can be extracted from each cluster.
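The first and third steps (path features mapped into a vector space, then clustering) can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the feature encoding, the similarity measure, or the clustering method, so binary path vectors, cosine similarity, a greedy single-pass grouping, and the 0.5 threshold are all assumptions chosen for illustration. The function names `extract_paths`, `to_vectors`, `cosine`, and `cluster` are hypothetical.

```python
import math
import xml.etree.ElementTree as ET


def extract_paths(xml_text):
    """Return the set of root-to-node tag paths, e.g. 'book/title'."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def to_vectors(path_sets):
    """Map each path set to a binary vector over the shared path vocabulary.

    Assumption: one dimension per distinct path; the paper's HDVS encoding
    may differ.
    """
    vocab = sorted(set().union(*path_sets))
    return [[1 if p in s else 0 for p in vocab] for s in path_sets], vocab


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster(vectors, threshold=0.5):
    """Greedy single-pass grouping (an assumed stand-in for the paper's method).

    Each document joins the first cluster whose seed (first member) is at
    least `threshold` similar; otherwise it starts a new cluster.
    """
    clusters = []  # lists of document indices; the first index is the seed
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Three toy documents: two share a 'book' structure, one is an 'order'.
docs = [
    "<book><title/><author/></book>",
    "<book><title/><author/><year/></book>",
    "<order><item><price/></item></order>",
]
vectors, vocab = to_vectors([extract_paths(d) for d in docs])
print(cluster(vectors))
```

With these toy inputs the two `book` documents share three of their paths and fall into one group, while the `order` document shares none and forms its own group. Searching for a `book/title` pattern would then only need to visit the first group, which is the cost reduction the abstract refers to.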

Paper Details

Date Published: 18 April 2006
PDF: 8 pages
Proc. SPIE 6242, Multisensor, Multisource Information Fusion: Architectures, Algorithms, and Applications 2006, 62420S (18 April 2006); doi: 10.1117/12.666467
Author Affiliations
Tong Wang, Harbin Engineering Univ. (China)
Da-Xin Liu, Harbin Engineering Univ. (China)
Wei Sun, Harbin Engineering Univ. (China)
Xuanzuo Lin, Northeast Agricultural Univ. (China)

Published in SPIE Proceedings Vol. 6242:
Multisensor, Multisource Information Fusion: Architectures, Algorithms, and Applications 2006
Belur V. Dasarathy, Editor

© SPIE