
Proceedings Paper

Feature selection combined category concentration degree with minimal set covering
Author(s): Hao-dong Zhu; Hong-chan Li

Paper Abstract

Feature selection is a core research topic in text categorization, because the selected feature subset directly influences categorization results. This paper first analyzes word frequency and document frequency and proposes a category concentration degree based on both. It then introduces set covering into rough set theory and provides an attribute reduction algorithm based on minimal set covering. Finally, it presents a new feature selection method that combines the proposed category concentration degree with the provided attribute reduction algorithm. The method first uses the category concentration degree to select features and filter out some terms, reducing the sparsity of the feature space, and then applies the attribute reduction algorithm to eliminate redundancy, yielding a more representative feature subset. Experimental results show that the presented method outperforms three classical feature selection methods, information gain (IG), the χ² statistic (CHI), and mutual information (MI), in time performance, macro-averaged F1, and micro-averaged F1.
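The two-stage structure described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the formula for the category concentration degree or the details of the reduction algorithm, so `concentration_scores` uses a hypothetical score (the term's word-frequency share times its document-frequency share in its best category), and `greedy_cover_reduction` uses a standard greedy set-cover heuristic, which approximates a minimal cover rather than guaranteeing one. All function names, the threshold, and the toy data are invented for the example.

```python
from collections import defaultdict
from itertools import combinations


def concentration_scores(docs, labels):
    """Hypothetical concentration score per term (NOT the paper's formula):
    max over categories of (word-frequency share) * (document-frequency share)."""
    tf = defaultdict(lambda: defaultdict(int))  # term -> category -> occurrences
    df = defaultdict(lambda: defaultdict(int))  # term -> category -> documents
    for tokens, label in zip(docs, labels):
        for tok in tokens:
            tf[tok][label] += 1
        for tok in set(tokens):
            df[tok][label] += 1
    scores = {}
    for term in tf:
        tf_total = sum(tf[term].values())
        df_total = sum(df[term].values())
        scores[term] = max(
            (tf[term][c] / tf_total) * (df[term][c] / df_total)
            for c in tf[term]
        )
    return scores


def greedy_cover_reduction(docs, labels, candidates):
    """Greedy set-cover heuristic (an assumption, not the paper's algorithm):
    pick features until every pair of differently labeled documents is
    distinguished by at least one selected feature."""
    candidate_set = set(candidates)
    doc_sets = [set(d) & candidate_set for d in docs]
    # For each cross-category pair, the features present in exactly one document.
    uncovered = []
    for i, j in combinations(range(len(docs)), 2):
        if labels[i] != labels[j]:
            diff = doc_sets[i] ^ doc_sets[j]
            if diff:
                uncovered.append(diff)
    selected = []
    while uncovered:
        counts = defaultdict(int)
        for diff in uncovered:
            for feature in diff:
                counts[feature] += 1
        # Feature covering the most remaining pairs; ties broken alphabetically.
        best = max(sorted(counts), key=counts.get)
        selected.append(best)
        uncovered = [d for d in uncovered if best not in d]
    return selected


# Toy corpus, invented for illustration.
docs = [
    ["goal", "match", "team", "the"],
    ["team", "goal", "goal", "the"],
    ["stock", "market", "the"],
    ["market", "stock", "stock", "the", "team"],
]
labels = ["sport", "sport", "finance", "finance"]

scores = concentration_scores(docs, labels)
# Stage 1: keep terms concentrated in one category (0.4 is an arbitrary threshold).
candidates = sorted(t for t, s in scores.items() if s >= 0.4)
# Stage 2: drop redundant terms among the survivors.
selected = greedy_cover_reduction(docs, labels, candidates)
print(candidates, selected)
```

In this toy corpus, "the" is spread evenly across both categories, so the first stage filters it out; the second stage then keeps only as many of the surviving terms as are needed to tell the categories apart.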

Paper Details

Date Published: 15 November 2011
PDF: 7 pages
Proc. SPIE 8335, 2012 International Workshop on Image Processing and Optical Engineering, 83350K (15 November 2011); doi: 10.1117/12.917518
Hao-dong Zhu, Zhengzhou Univ. of Light Industry (China)
Hong-chan Li, Zhengzhou Univ. of Light Industry (China)


Published in SPIE Proceedings Vol. 8335:
2012 International Workshop on Image Processing and Optical Engineering
Hai Guo; Qun Ding, Editor(s)
