
Proceedings Paper

Pipelining multiple singular value decomposition (SVDs) on a single processor array
Author(s): Kishore Kota; Joseph R. Cavallaro

Paper Abstract

We present a new family of processor-array architectures for the Jacobi SVD that allow systolic loading of the input matrix and systolic unloading of the results. Unlike most previous SVD arrays in the literature, our architectures require no special handling of external I/O and are therefore closer to the traditional concept of a systolic architecture: the boundary processors communicate with the host in the same way that interior processors communicate with their neighbors. The arrays are surprisingly uniform and simple. The architectures in the family represent different throughput-hardware tradeoffs, corresponding to the degree to which the multiple sweeps have been unrolled; this degree also determines the number of independent SVDs that may be pipelined on the array. We achieve systolic loading by exploiting the freedom that the cyclic Jacobi method allows in the order in which pivot pairs are chosen. The array operates on the matrix data even while it is being loaded. Once the pipeline is full, the ordering is very similar to odd-even ordering. Our ordering is equivalent to cyclic-by-rows ordering, so the algorithm is guaranteed to converge. This systolic loading scheme is very important in an I/O-limited system, because it allows more of the communication, including the loading and unloading operations, to proceed in parallel. The highest-throughput array in our family of architectures, which implement one-sided Jacobi (either Hestenes' method or Eberlein and Park's method), is a linear array of processors with unidirectional links between neighbors. The lower-throughput architectures require fewer processors, connected in a ring so that data can recirculate among them. The input matrix is loaded one column at a time from the left, and the results stream out one column at a time from the right.
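The abstract relies on Hestenes' one-sided Jacobi method and on the convergence guarantee of cyclic-by-rows ordering. To make those two ingredients concrete, here is a minimal sequential sketch in pure Python. It repeatedly rotates pairs of columns until all columns are mutually orthogonal. It does not model the systolic array, the pipelining of multiple SVDs, or the authors' load-time ordering. The array presumably rotates disjoint column pairs concurrently, while this sketch handles them one at a time. The function names `cyclic_by_rows` and `hestenes_svd`, the tolerance, and the sweep limit are hypothetical choices made only for illustration.

```python
import math

def cyclic_by_rows(n):
    # One sweep of pivot pairs (i, j), i < j, in cyclic-by-rows order.
    return [(i, j) for i in range(n - 1) for j in range(i + 1, n)]

def hestenes_svd(a, tol=1e-12, max_sweeps=30):
    """One-sided (Hestenes) Jacobi SVD sketch for an m x n matrix (m >= n),
    given as a list of rows. Returns (u_cols, sigma, v_cols), where u_cols and
    v_cols are lists of columns, so A[r][k] = sum_j u_cols[j][r]*sigma[j]*v_cols[j][k]."""
    m, n = len(a), len(a[0])
    cols = [[a[r][j] for r in range(m)] for j in range(n)]  # columns of A V
    v = [[1.0 if r == j else 0.0 for r in range(n)] for j in range(n)]  # columns of V
    for _ in range(max_sweeps):
        rotated = False
        for i, j in cyclic_by_rows(n):
            alpha = sum(x * x for x in cols[i])
            beta = sum(x * x for x in cols[j])
            gamma = sum(x * y for x, y in zip(cols[i], cols[j]))
            if abs(gamma) <= tol * math.sqrt(alpha * beta):
                continue  # columns i and j are already numerically orthogonal
            rotated = True
            # Choose the smaller rotation angle that zeroes the inner product.
            zeta = (beta - alpha) / (2.0 * gamma)
            t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
            c = 1.0 / math.sqrt(1.0 + t * t)
            s = c * t
            # Apply the same plane rotation to columns i, j of A V and of V.
            for vec in (cols, v):
                xi, xj = vec[i], vec[j]
                vec[i] = [c * p - s * q for p, q in zip(xi, xj)]
                vec[j] = [s * p + c * q for p, q in zip(xi, xj)]
        if not rotated:
            break  # a full sweep made no rotation: converged
    # Singular values are the column norms; U columns are the normalized columns.
    sigma = [math.sqrt(sum(x * x for x in col)) for col in cols]
    u_cols = [[x / sg if sg > 0 else 0.0 for x in col] for col, sg in zip(cols, sigma)]
    return u_cols, sigma, v

u, sigma, v = hestenes_svd([[3.0, 1.0], [1.0, 3.0]])
print(sorted(sigma))  # approximately [2.0, 4.0]
```

Only whole columns and their pairwise inner products are needed, and no row-wise access is required. That property is what makes one-sided Jacobi attractive for an array into which the matrix streams one column at a time.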

Paper Details

Date Published: 28 October 1994
PDF: 12 pages
Proc. SPIE 2296, Advanced Signal Processing: Algorithms, Architectures, and Implementations V, (28 October 1994); doi: 10.1117/12.190872
Author Affiliations
Kishore Kota, Rice Univ. (United States)
Joseph R. Cavallaro, Rice Univ. (United States)


Published in SPIE Proceedings Vol. 2296:
Advanced Signal Processing: Algorithms, Architectures, and Implementations V
Franklin T. Luk, Editor(s)

© SPIE.