
Proceedings Paper

Matrix sketching for big data reduction (Conference Presentation)

Paper Abstract

In recent years, Big Data has become an increasingly prominent issue as both the volume of data and the velocity at which it is produced increase exponentially. By 2020, the amount of stored data is estimated to reach 44 zettabytes, and more than 31 terabytes of data are currently generated every second. Algorithms and applications must therefore scale effectively to the volume of data being generated. One application designed to work effectively and efficiently with Big Data is IBM's Skylark (Sketching-based Matrix Computations for Machine Learning), part of DARPA's XDATA program, an open-source catalog of tools for handling Big Data. Skylark is a library of functions designed to reduce the complexity of large-scale matrix problems, and it also implements kernel-based machine learning tasks. Sketching uses randomization to reduce the dimensionality of a matrix, compressing it while preserving key properties and thereby speeding up computations. Matrix sketches can be used to obtain accurate solutions in less time, or to summarize data by identifying important rows and columns. In this paper, we compare the effectiveness of sketched matrix computations using IBM's Skylark against non-sketched computations. We judge effectiveness on two factors: computational complexity and validity of outputs. Initial results from testing with smaller matrices are promising, showing that Skylark achieves a considerable reduction ratio while still performing matrix computations accurately.
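The sketch-and-solve idea described above can be illustrated with a minimal NumPy example. This is not Skylark's API: the functions `gaussian_sketch` and `sketched_lstsq` and the problem sizes are hypothetical, chosen only for illustration. It compresses an n-row least-squares problem to s rows with a dense Gaussian random projection S, solves the smaller problem, and compares its residual against the exact solution, loosely mirroring the two criteria named in the abstract (reduction in problem size and validity of outputs).

```python
import numpy as np


def gaussian_sketch(M, s, rng):
    """Return S @ M, where S is an s x n matrix with i.i.d. N(0, 1/s) entries.

    Scaling by 1/sqrt(s) gives E[S.T @ S] = I, so ||S y|| stays close to ||y||
    for every y in a fixed low-dimensional subspace (Johnson-Lindenstrauss).
    """
    n = M.shape[0]
    S = rng.standard_normal((s, n)) / np.sqrt(s)
    return S @ M


def sketched_lstsq(A, b, s, seed=0):
    """Sketch-and-solve least squares: minimize ||S(Ax - b)|| instead of ||Ax - b||.

    The same S must be applied to A and b, so both are sketched together
    as the n x (d + 1) matrix [A | b].
    """
    rng = np.random.default_rng(seed)
    Ab = np.column_stack([A, b])
    SAb = gaussian_sketch(Ab, s, rng)
    # Solve the much smaller s x d problem instead of the original n x d one.
    x, *_ = np.linalg.lstsq(SAb[:, :-1], SAb[:, -1], rcond=None)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    n, d, s = 5000, 20, 400  # hypothetical sizes with s much smaller than n
    A = rng.standard_normal((n, d))
    x_true = rng.standard_normal(d)
    b = A @ x_true + 0.1 * rng.standard_normal(n)  # noisy observations

    x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
    x_sketch = sketched_lstsq(A, b, s, seed=1)

    r_exact = np.linalg.norm(A @ x_exact - b)
    r_sketch = np.linalg.norm(A @ x_sketch - b)
    print(f"row reduction ratio: {n / s:.1f}x")
    print(f"residual ratio (sketched / exact): {r_sketch / r_exact:.4f}")
```

The residual ratio is always at least 1, because the exact solution is optimal, and it approaches 1 as s grows relative to d. Note that applying a dense Gaussian S costs O(snd) operations, which in this toy setting exceeds the cost of solving the original problem directly; the example demonstrates accuracy, not speed. In practice, speedups come from structured or sparse sketches such as the subsampled randomized Hadamard transform or CountSketch, which can be applied in roughly O(nd log n) or O(nnz(A)) time.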

Paper Details

Date Published: 6 June 2017
PDF: 1 page
Proc. SPIE 10199, Geospatial Informatics, Fusion, and Motion Video Analytics VII, 101990F (6 June 2017); doi: 10.1117/12.2262937
Author Affiliations
Soundararajan Ezekiel, Indiana Univ. of Pennsylvania (United States)
Michael Giansiracusa, Indiana Univ. of Pennsylvania (United States)

Published in SPIE Proceedings Vol. 10199:
Geospatial Informatics, Fusion, and Motion Video Analytics VII
Kannappan Palaniappan; Peter J. Doucette; Gunasekaran Seetharaman; Anthony Stefanidis, Editor(s)
