
Proceedings Paper

Research on Hadoop-based massive short text clustering algorithm
Author(s): Qiang Zhao; Yuliang Shi; Zepeng Qing

Paper Abstract

Many clustering algorithms work well on small data sets of fewer than 200 data objects. However, a large database may contain millions of objects, and clustering such a large data set may lead to biased results. As data volumes and availability continue to grow, so does the need for large-dataset analytics. Among the most commonly used clustering algorithms, k-means has proved to be one of the most popular choices, providing acceptable results in a reasonable amount of time. In this paper, we present an improved k-means algorithm with better initial centroids, and we implement this modified algorithm on the Hadoop platform. Experiments show that the improved k-means algorithm converges faster than classic k-means and has a lower average execution time.
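
The abstract does not say how the improved initial centroids are chosen, so the sketch below should not be read as the authors' method. It swaps in k-means++ seeding, a well-known alternative to random initialization, purely to illustrate why the choice of starting centroids can affect how many iterations k-means needs. The function names (`sq_dist`, `kmeanspp_init`, `kmeans`) are hypothetical, and the code is plain single-process Python rather than Hadoop code; the comments only indicate which step would conceptually correspond to a map phase and which to a reduce phase.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(points, k, rng):
    """k-means++ seeding (an assumption, not the paper's stated method).

    The first centroid is picked uniformly at random; each later one is
    drawn with probability proportional to its squared distance from the
    nearest centroid chosen so far.
    """
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        weights = [min(sq_dist(p, c) for c in centroids) for p in points]
        if sum(weights) == 0:
            # Every point coincides with an existing centroid.
            centroids.append(rng.choice(points))
        else:
            centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids


def kmeans(points, centroids, max_iter=100):
    """Lloyd's iterations; returns (centroids, iterations used)."""
    centroids = [tuple(c) for c in centroids]
    for iteration in range(1, max_iter + 1):
        # Conceptual "map" step: key each point by its nearest centroid.
        groups = {}
        for p in points:
            idx = min(range(len(centroids)),
                      key=lambda i: sq_dist(p, centroids[i]))
            groups.setdefault(idx, []).append(p)
        # Conceptual "reduce" step: average the points under each key.
        new_centroids = []
        for i, c in enumerate(centroids):
            members = groups.get(i)
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(c)  # keep a centroid with no members
        if new_centroids == centroids:
            return centroids, iteration
        centroids = new_centroids
    return centroids, max_iter


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    seeds = kmeanspp_init(data, 2, rng)
    final, iterations = kmeans(data, seeds)
    print(final, iterations)
```

Comparing the iteration count returned by `kmeans` for k-means++ seeds against seeds drawn with `rng.sample(data, k)` gives a rough feel for the kind of convergence comparison the abstract reports, although a toy data set like this says nothing about the paper's actual measurements.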

Paper Details

Date Published: 31 July 2019
PDF: 6 pages
Proc. SPIE 11198, Fourth International Workshop on Pattern Recognition, 111980A (31 July 2019); doi: 10.1117/12.2540380
Qiang Zhao, Beijing Univ. of Technology (China)
Yuliang Shi, Beijing Univ. of Technology (China)
Zepeng Qing, Beijing Univ. of Technology (China)

Published in SPIE Proceedings Vol. 11198:
Fourth International Workshop on Pattern Recognition
Xudong Jiang; Zhenxiang Chen; Guojian Chen, Editor(s)

© SPIE.