
Proceedings Paper

Automatic similarity detection and clustering of data
Author(s): Craig Einstein; Peter Chin

Paper Abstract

An algorithm was created that identifies the number of unique clusters in a dataset and assigns the data to those clusters. A cluster is defined as a group of data that share similar characteristics. The data are input as vectors, and the similarity between two data is measured as the dot product of their vectors. Unlike other clustering algorithms such as K-means, this algorithm does not need to know the number of clusters in advance, which allows for an unbiased analysis of the data. The automatic cluster detection (ACD) algorithm runs in two phases: an averaging phase and a clustering phase. In the averaging phase, the number of unique clusters is detected. In the clustering phase, each datum is matched to the cluster it is most similar to. The ACD algorithm takes a matrix of vectors as input and outputs a 2D array of the clustered data: each index of the output corresponds to a cluster, and the elements of each cluster are the positions of the corresponding data in the dataset. Clusters are vectors in N-dimensional space, where N is the length of the input vectors that make up the matrix. The algorithm is distributed, which increases its computational efficiency.
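The two-phase structure described in the abstract can be illustrated with a minimal, single-process sketch. The abstract does not say how the averaging phase decides that a new cluster exists, so the rule used here is an assumption: vectors are unit-normalized (so the dot product acts as cosine similarity), and each vector is either averaged into the most similar existing cluster center when the similarity reaches a hypothetical `threshold`, or starts a new cluster. The function names `averaging_phase`, `clustering_phase`, and `acd` are illustrative only, and the distributed execution the authors mention is not modeled.

```python
import numpy as np


def normalize(rows):
    # Scale each row to unit length so the dot product equals cosine similarity.
    # (Assumption: the abstract only says "dot product", not normalized.)
    norms = np.linalg.norm(rows, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows unchanged
    return rows / norms


def averaging_phase(data, threshold=0.9):
    """Hypothetical averaging phase: detect clusters without a preset count.

    Each vector is compared by dot product with the current cluster centers.
    If the best similarity reaches `threshold`, the vector is folded into that
    center's running mean; otherwise it starts a new cluster.
    """
    centers, counts = [], []
    for v in normalize(data):
        if centers:
            sims = normalize(np.array(centers)) @ v
            best = int(np.argmax(sims))
            if sims[best] >= threshold:
                counts[best] += 1
                centers[best] += (v - centers[best]) / counts[best]
                continue
        centers.append(v.copy())
        counts.append(1)
    return normalize(np.array(centers))


def clustering_phase(data, centers):
    # Match each datum to the cluster center with the largest dot product.
    sims = normalize(data) @ centers.T  # shape: (num_data, num_clusters)
    labels = np.argmax(sims, axis=1)
    # Output format from the abstract: each outer index is a cluster, and its
    # elements are the positions of the data in the input matrix.
    groups = [np.flatnonzero(labels == k).tolist() for k in range(len(centers))]
    return [g for g in groups if g]  # drop any cluster that ended up empty


def acd(data, threshold=0.9):
    data = np.asarray(data, dtype=float)
    return clustering_phase(data, averaging_phase(data, threshold))


if __name__ == "__main__":
    points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.95]]
    print(acd(points))  # [[0, 1], [2, 3]]
```

In this sketch the threshold plays the role that a preset cluster count plays in K-means: a stricter threshold yields more, tighter clusters, so the number of clusters is discovered from the data rather than supplied up front.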

Paper Details

Date Published: 22 May 2017
PDF: 7 pages
Proc. SPIE 10185, Cyber Sensing 2017, 101850K (22 May 2017); doi: 10.1117/12.2267844
Author Affiliations
Craig Einstein, Boston Univ. (United States)
Peter Chin, Boston Univ. (United States)

Published in SPIE Proceedings Vol. 10185:
Cyber Sensing 2017
Igor V. Ternovskiy; Peter Chin, Editor(s)

© SPIE.