Proceedings Paper

Evaluation of decision forests on text categorization
Author(s): Hao Chen; Tin Kam Ho

Paper Abstract

Text categorization is useful for indexing documents for information retrieval, filtering parts for document understanding, and summarizing the contents of documents of special interest. We describe a text categorization task and an experiment using documents from the Reuters and OHSUMED collections. We applied the Decision Forest classifier and compared its accuracy with that of the C4.5 and kNN classifiers under both category-dependent and category-independent term selection schemes. Decision Forest outperforms both C4.5 and kNN in all cases, and category-dependent term selection yields better accuracy. The performance of all three classifiers degrades from the Reuters collection to the OHSUMED collection, but Decision Forest remains superior.

Paper Details

Date Published: 22 December 1999
PDF: 9 pages
Proc. SPIE 3967, Document Recognition and Retrieval VII, (22 December 1999); doi: 10.1117/12.373494
Hao Chen, Univ. of California/Berkeley (United States)
Tin Kam Ho, Lucent Technologies/Bell Labs. (United States)

Published in SPIE Proceedings Vol. 3967:
Document Recognition and Retrieval VII
Daniel P. Lopresti; Jiangying Zhou, Editor(s)
