Proceedings Paper

High-performance FFT implementation on the BOPS ManArray parallel DSP
Author(s): Nikos P. Pitsianis; Gerald Pechanek

Paper Abstract

We present a high-performance implementation of the FFT algorithm on the BOPS ManArray parallel DSP processor. The ManArray we consider for this application consists of an array controller and two to four fully interconnected processing elements. To expose the parallelism inherent in an FFT algorithm, we use a factorization of the DFT matrix into Kronecker products, permutation matrices, and diagonal matrices. Our implementation exploits the multiple levels of parallelism available on the ManArray. We use the special multiply complex instruction, which calculates the product of two complex 32-bit fixed-point numbers in 2 cycles and can be pipelined. Instruction-level parallelism is exploited via the indirect Very Long Instruction Word (iVLIW). With an iVLIW, in a single cycle a complex number is read from memory, another complex number is written to memory, one complex multiplication starts and another finishes, two complex additions or subtractions are performed, and a complex number is exchanged with another processing element. Multiple local FFTs are executed in Single Instruction Multiple Data (SIMD) mode, and, to avoid a costly data transposition, distributed FFTs are executed in Synchronous Multiple Instructions Multiple Data (SMIMD) mode.
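The abstract does not state which factorization the paper uses. As an illustration only, the sketch below checks the textbook Cooley-Tukey form F_n = (F_p ⊗ I_m) D (I_p ⊗ F_m) P for n = p·m, where P is a stride permutation and D is a diagonal matrix of twiddle factors. All function names are hypothetical, and the code works on small dense matrices in floating point rather than on the ManArray's fixed-point data path.

```python
import cmath

def dft_matrix(n):
    """Dense n-by-n DFT matrix with entries w^(r*c), w = exp(-2*pi*i/n)."""
    w = cmath.exp(-2j * cmath.pi / n)
    return [[w ** (r * c) for c in range(n)] for r in range(n)]

def identity(n):
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]

def kron(a, b):
    """Kronecker product of two dense matrices given as lists of rows."""
    return [[x * y for x in a_row for y in b_row]
            for a_row in a for b_row in b]

def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

def stride_permutation(p, m):
    """P gathers the input at stride p: (P x)[j1*m + j2] = x[j1 + p*j2]."""
    n = p * m
    perm = [[0.0] * n for _ in range(n)]
    for j1 in range(p):
        for j2 in range(m):
            perm[j1 * m + j2][j1 + p * j2] = 1.0
    return perm

def twiddle_diagonal(p, m):
    """D is diagonal with entry exp(-2*pi*i*j1*k2/n) at index j1*m + k2."""
    n = p * m
    diag = [[0.0] * n for _ in range(n)]
    for j1 in range(p):
        for k2 in range(m):
            i = j1 * m + k2
            diag[i][i] = cmath.exp(-2j * cmath.pi * j1 * k2 / n)
    return diag

def factored_dft(p, m):
    """Build F_(p*m) as (F_p kron I_m) * D * (I_p kron F_m) * P."""
    local = kron(identity(p), dft_matrix(m))   # p independent size-m DFTs
    cross = kron(dft_matrix(p), identity(m))   # size-p DFTs across the blocks
    result = matmul(local, stride_permutation(p, m))
    result = matmul(twiddle_diagonal(p, m), result)
    return matmul(cross, result)

def max_abs_diff(a, b):
    return max(abs(x - y) for a_row, b_row in zip(a, b)
               for x, y in zip(a_row, b_row))

if __name__ == "__main__":
    # The difference should be at the level of floating-point rounding error.
    print(max_abs_diff(factored_dft(2, 4), dft_matrix(8)))
```

In this form the factor I_p ⊗ F_m is a set of p independent size-m transforms, which is the kind of structure that lends itself to running local FFTs side by side, while F_p ⊗ I_m combines data across the blocks. How the paper maps these factors onto the processing elements is not given in the abstract.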

Paper Details

Date Published: 2 November 1999
PDF: 8 pages
Proc. SPIE 3807, Advanced Signal Processing Algorithms, Architectures, and Implementations IX, (2 November 1999); doi: 10.1117/12.367633
Nikos P. Pitsianis, BOPS, Inc. and Duke Univ. (United States)
Gerald Pechanek, BOPS, Inc. (United States)


Published in SPIE Proceedings Vol. 3807:
Advanced Signal Processing Algorithms, Architectures, and Implementations IX
Franklin T. Luk, Editor
