
Proceedings Paper

Document clustering: applications in a collaborative digital library
Author(s): Fuad Rahman; Aman Kumar; Yuilya Tarnikova; Hassan Alam

Paper Abstract

This paper introduces a document clustering method for FileShare(R), a commercial collaborative digital library that lets groups of people working on common projects share and access documents through a standard web browser (e.g., Microsoft(R) Internet Explorer(R), Netscape(R), or Opera(R)). As the number of documents in a digital library grows, displaying them in this environment becomes a significant challenge. The proposed method uses a modified version of the traditional K-Means algorithm to categorize documents in the FileShare(R) repository by theme using lexical chaining. The algorithm is unsupervised and has shown very high accuracy in a typical experimental setup.
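For orientation, the following is a minimal sketch of the traditional K-Means baseline that the abstract refers to, applied to plain term-frequency vectors. It is not the authors' modified algorithm and it does not implement lexical chaining; the abstract does not describe either in enough detail to reproduce. The function names (`tf_vector`, `kmeans`), the sample documents, and the deterministic choice of the first `k` documents as initial centroids are all assumptions made for illustration.

```python
from collections import Counter


def tf_vector(text, vocab):
    """Map a document to raw term counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iters=20):
    """Standard K-Means; returns one cluster label per point."""
    # Assumption: seed centroids with the first k points so the run is repeatable.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical sample documents, two per theme.
docs = [
    "budget report finance quarterly budget",
    "server network latency server outage",
    "finance budget invoice payment",
    "network outage router latency",
]
vocab = sorted({word for doc in docs for word in doc.lower().split()})
labels = kmeans([tf_vector(doc, vocab) for doc in docs], k=2)
print(labels)  # [0, 1, 0, 1]
```

In this toy run the two finance documents share one label and the two network documents share the other. A theme-based variant such as the one the abstract describes would presumably replace the raw term counts with features derived from lexical chains, but that step is outside this sketch.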

Paper Details

Date Published: 16 January 2006
PDF: 8 pages
Proc. SPIE 6067, Document Recognition and Retrieval XIII, 60670K (16 January 2006); doi: 10.1117/12.650161
Author Affiliations
Fuad Rahman, BCL Technologies Inc. (United States)
Aman Kumar, BCL Technologies Inc. (United States)
Yuilya Tarnikova, BCL Technologies Inc. (United States)
Hassan Alam, BCL Technologies Inc. (United States)

Published in SPIE Proceedings Vol. 6067:
Document Recognition and Retrieval XIII
Kazem Taghva; Xiaofan Lin, Editor(s)
