
Proceedings Paper
Learning to change taxonomies
Paper Abstract
Taxonomies are valuable tools for structuring and representing our knowledge about the world. They are widely used in many domains where information about species, products, customers, publications, etc. needs to be organized. In the absence of standards, many taxonomies of the same entities can coexist. A problem arises when data categorized in one taxonomy needs to be used by a procedure (methodology or algorithm) that uses a different taxonomy. Usually, a labor-intensive manual approach is used to solve this problem. This paper describes a machine learning approach that aids domain experts in changing taxonomies. It allows learning relationships between two taxonomies and mapping the data from one taxonomy into another. The proposed approach uses decision trees and bootstrapping to learn mappings of instances from the source taxonomy to the target taxonomy. A C4.5 decision tree classifier is trained on a small manually labeled training set and applied to a randomly selected sample of the unlabeled data. The classification results are analyzed, the misclassified items are corrected, and all items in the sample are added to the training set. This procedure is iterated until no unlabeled data remains or an acceptable error rate is reached. In the latter case, the last classifier is used to label all the remaining data. We test our approach on a database of products obtained from a grocery store chain and find that it performs well, reaching 92.6% accuracy while requiring the human expert to explicitly label only 18% of the entire data.
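The iterative procedure in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The learner here is a deliberately simple stand-in (a majority vote per source category) and not C4.5, and the names `train_majority`, `bootstrap_relabel`, `TRUE_MAP`, the batch size, the error threshold, and the toy grocery data are all assumptions made for illustration. The sketch also assumes that "the last classifier" means the one just evaluated on the sample; retraining on the enlarged training set before the final labeling pass is an equally plausible reading.

```python
import random
from collections import Counter, defaultdict

def train_majority(labeled):
    """Stand-in learner (NOT C4.5): majority target label per source category."""
    by_source = defaultdict(Counter)
    overall = Counter()
    for (source_cat, _name), target in labeled:
        by_source[source_cat][target] += 1
        overall[target] += 1
    default = overall.most_common(1)[0][0]  # fallback for unseen source categories

    def predict(item):
        votes = by_source.get(item[0])
        return votes.most_common(1)[0][0] if votes else default
    return predict

def bootstrap_relabel(seed, unlabeled, expert, train=train_majority,
                      batch_size=10, max_error=0.1, rng_seed=0):
    """Train, classify a random sample, let the expert correct it, repeat."""
    rng = random.Random(rng_seed)
    labeled = list(seed)
    pool = list(unlabeled)
    rng.shuffle(pool)                      # so each batch is a random sample
    corrections = 0
    while pool:                            # stop when no unlabeled data remains
        model = train(labeled)
        batch, pool = pool[:batch_size], pool[batch_size:]
        wrong = 0
        for item in batch:
            guess = model(item)
            truth = expert(item, guess)    # expert confirms or corrects the guess
            wrong += truth != guess
            labeled.append((item, truth))  # all sampled items join the training set
        corrections += wrong
        if wrong / len(batch) <= max_error:
            # Acceptable error rate: label everything left with this classifier.
            labeled.extend((item, model(item)) for item in pool)
            pool = []
    return labeled, len(seed) + corrections  # second value: expert-labeled items

# Hypothetical toy data: source category -> target category.
TRUE_MAP = {"dairy": "chilled", "cheese": "chilled", "soda": "beverages",
            "juice": "beverages", "bread": "bakery"}
items = [(cat, f"{cat}-{i}") for cat in TRUE_MAP for i in range(20)]
seed = [(it, TRUE_MAP[it[0]]) for it in (items[0], items[40])]
seed_items = {it for it, _ in seed}
unlabeled = [it for it in items if it not in seed_items]

labeled, expert_labeled = bootstrap_relabel(
    seed, unlabeled, expert=lambda item, guess: TRUE_MAP[item[0]])
accuracy = sum(TRUE_MAP[it[0]] == lab for it, lab in labeled) / len(labeled)
print(f"expert labeled {expert_labeled} of {len(items)} items, accuracy {accuracy:.2f}")
```

In this sketch the expert's effort is counted as the seed set plus every correction, which loosely mirrors the abstract's measure of how much data the expert had to label explicitly. Any classifier with the same `train(labeled) -> predict` shape, including a real decision tree learner, could be passed in through the `train` parameter.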
Paper Details
Date Published: 12 March 2002
PDF: 8 pages
Proc. SPIE 4730, Data Mining and Knowledge Discovery: Theory, Tools, and Technology IV, (12 March 2002); doi: 10.1117/12.460247
Published in SPIE Proceedings Vol. 4730:
Data Mining and Knowledge Discovery: Theory, Tools, and Technology IV
Belur V. Dasarathy, Editor
Author Affiliations
Elena Eneva, Carnegie Mellon Univ. (United States)
Valery A. Petrushin, Accenture (United States)
© SPIE
