Proceedings Paper

Experience with ADI-FDTD techniques on the Cray MTA supercomputer
Author(s): Harry F. Jordan; Shahid Bokhari; Shawn Staker; Jon R. Sauer; Mona A. ElHelbawy; Melinda J. Piket-May

Paper Abstract

Finite-difference time-domain (FDTD) simulations are important to the design cycle for optical communications devices. High spatial resolution is essential, and the Courant condition limits the time step, so the problem typically requires a high-performance system of the kind usually available only at a remote center. Model definition and result visualization can be done locally. Recent application of the alternating direction implicit (ADI) method to FDTD removes the Courant condition, promising larger time steps and practical turnaround times for simulations. At each time step, tridiagonal systems of equations are solved along single dimensions of a 3D problem, but all three dimensions are involved in each time step. Thus, for a distributed-memory multiprocessor, no partition of the data keeps the tridiagonal systems from crossing processors unless the data are remapped every time step. Likewise, for cache-based or vector computers, tridiagonal systems are accessed with a stride of N×N at every time step for an N×N×N grid. There is plenty of parallelism, because N×N tridiagonal systems can be solved simultaneously. This makes the problem well suited to a machine like the Cray multithreaded architecture (MTA), which has a large, flat memory and uses parallelism to hide memory latency. A Cray MTA implementation of the ADI-FDTD code executes serial tridiagonal solvers in parallel on multiple threads and successfully hides memory latency, achieving just over one floating-point operation per clock cycle per processor for a 200×200×200 grid on an 8-processor system at the San Diego Supercomputer Center. The 8-processor speed is 2.06 Gflop/s and the efficiency is 98%. Comparing one MTA processor with a 250 MHz clock to a 500 MHz Alpha processor, the MTA is three times as fast for a 50×50×50 grid. A vectorized version of the code run on one Cray T90 processor is three times faster than one MTA processor for a 100×100×100 grid.
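The access pattern described in the abstract can be illustrated with a small sketch. It is not the authors' code: the function names (`thomas_solve`, `line_indices`, `sweep`), the row-major flat storage, and the use of one constant-coefficient system for every line are assumptions made for illustration, and the coefficients are not the ADI-FDTD update coefficients. The sketch solves one serial tridiagonal system per grid line with the Thomas algorithm and shows that lines along the slowest-varying axis are accessed with stride N×N. The N×N lines of a sweep are independent, which is the parallelism a multithreaded machine could exploit; here they are simply looped over serially.

```python
def thomas_solve(a, b, c, d):
    """Solve one tridiagonal system with the Thomas algorithm.

    a: sub-diagonal (a[0] unused), b: main diagonal,
    c: super-diagonal (c[-1] unused), d: right-hand side.
    Assumes the system needs no pivoting (e.g. diagonally dominant).
    """
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    # Forward elimination.
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def line_indices(n, axis, j, k):
    """Flat indices of one grid line in an n*n*n array.

    Assumed layout: index = (i0 * n + i1) * n + i2 (row-major).
    j and k are the two fixed coordinates, in axis order.
    """
    strides = (n * n, n, 1)
    stride = strides[axis]  # axis 0 gives the N*N stride
    other = [s for ax, s in enumerate(strides) if ax != axis]
    base = j * other[0] + k * other[1]
    return [base + i * stride for i in range(n)]


def sweep(grid, n, axis, a, b, c):
    """Solve the same tridiagonal system along every line of `axis`, in place.

    The n*n lines are independent of one another; this loop runs them serially.
    """
    for j in range(n):
        for k in range(n):
            idx = line_indices(n, axis, j, k)
            rhs = [grid[p] for p in idx]
            sol = thomas_solve(a, b, c, rhs)
            for p, v in zip(idx, sol):
                grid[p] = v
```

Calling `sweep` once per axis touches all three dimensions, so whichever axis is stored with the largest stride is visited on every pass, regardless of how the data are laid out.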

Paper Details

Date Published: 27 July 2001
PDF: 9 pages
Proc. SPIE 4528, Commercial Applications for High-Performance Computing, (27 July 2001); doi: 10.1117/12.434878
Author Affiliations
Harry F. Jordan, Univ. of Colorado/Boulder (United States)
Shahid Bokhari, Univ. of Engineering and Technology (Pakistan)
Shawn Staker, MIT Lincoln Lab. (United States)
Jon R. Sauer, Univ. of Colorado/Boulder (United States)
Mona A. ElHelbawy, Univ. of Colorado/Boulder (United States)
Melinda J. Piket-May, Univ. of Colorado/Boulder (United States)

Published in SPIE Proceedings Vol. 4528:
Commercial Applications for High-Performance Computing
Howard Jay Siegel, Editor

© SPIE.