Until a few years ago, clock scaling represented the dominant method by which to boost computer performance. Electronic components were therefore designed to operate at higher and higher frequencies, allowing the frequency of the clock that drove them to be increased. However, clock scaling began to show its limitations about 10 years ago.^{1} A possible way to increase performance and solve the problems associated with clock scaling is to abandon serial computation (in which only one execution flow exists at any time and instructions are executed sequentially) and develop hardware and software components that are capable of carrying out parallel computing.^{2, 3} Multi-core central processing units (CPUs), which aggregate more than one core in the same physical package, represent one route toward achieving this.

Graphics processing units (GPUs) have recently become a viable alternative to multi-core CPUs. No longer confined to rendering scenes for gaming and media applications, today's GPUs are sophisticated enough that general-purpose parallel algorithms can be coded and run on them. GPU technology has found application in many scientific fields, ranging from signal processing to medical imaging, and from life sciences to fluid dynamics. As a result of the mass production and constant hardware improvements that they enjoy, GPUs offer inexpensive, state-of-the-art computational power.

Modern GPU devices have a large number of cores, making them suitable for many applications in biomedical imaging that typically require the processing of large amounts of data. We have developed GPU software that enables image reconstruction of emission tomography data collected with FastSPECT II.^{4, 5} FastSPECT II is a single-photon-emission computed tomography (SPECT) imaging system for small-animal imaging. It consists of 16 stationary modular gamma-ray cameras^{6} that are interfaced to a computer station for data acquisition and processing via dedicated circuitry. The size of the field of view is approximately 42×42×54 mm^{3}. As Figure 1 shows, the system achieves high sensitivity over a volume large enough to accommodate a mouse.^{4, 5}

**Figure 1. **Plots of FastSPECT II sensitivity (defined as the probability that a gamma-ray photon emitted from point r in space will be collected by the imaging system). The volume of the high-sensitivity region is approximately 57.29cm^{3}. SPECT: Single-photon-emission computed tomography.

Our reconstruction code implements the maximum-likelihood expectation-maximization (MLEM) algorithm, which is a popular choice for many image-reconstruction problems.^{7–10} Briefly, the MLEM algorithm begins with an estimate of the reconstructed image (e.g., a uniform image) and uses the measured data to refine this estimate by performing a certain number of iterations. At each iteration, a forward-projection step propagates the current estimate from image space to data space. Mathematically, this step can be formalized as a matrix multiplication between the system matrix,^{11}*H*, and the current image estimate. The measured data are then divided, component by component, by the result of the forward projection step, thus providing correction factors. A matrix multiplication by the transpose of *H* backpropagates these correction factors to image space. The result of this last step is then multiplied, component by component, by the current estimate of the reconstructed image, thereby giving a refined estimate to be used in the next iteration of the algorithm.

Our parallel implementation of the MLEM algorithm takes advantage of the capabilities of modern GPU devices. For example, instead of storing *H* in memory, we are able to calculate its non-zero entries on the fly and as they are required by the reconstruction algorithm. The non-zero entries can be calculated from a Gaussian fit (described by a limited number of fitting coefficients) of data that were collected during system calibration. Because modern GPUs are much faster at performing floating-point operations than fetching large amounts of data from memory, this approach greatly improves performance. Our approach also makes it easier to perform reconstructions on an arbitrary fine scale by simple interpolation of the Gaussian-fitting coefficients.^{12}

To assess the performance of our algorithm, we performed a simple simulation study. Beginning with a simulated bone scan of a mouse, we ran 30 iterations of the MLEM algorithm on an 8-Tesla K40 GPU machine. The total processing time was less than six minutes, representing a significant speedup compared with traditional hardware. Results are shown in Figure 2, and the full reconstruction can be seen in a short video.^{13}

**Figure 2. **A maximum-likelihood expectation-maximization reconstruction of a bone scan (mouse head and upper torso), run on an 8-Tesla K40 GPU (graphics processing unit) machine.

^{13}
We have implemented an MLEM algorithm on GPU hardware, enabling fast 3D image reconstruction for biomedical imaging applications. The amount of GPU computational power and memory that is projected to be available in the next few years will allow new theoretical approaches for analyzing imaging systems and processing the data that they produce. In one such approach, we are treating digital imaging systems with large numbers of acquired bits per event as part of continuous-to-continuous mappings. In such a case, an object (which is a function of continuous variables) is imaged, and the imaging system collects a list of parameters that vary over a continuous space. This gives rise to a representation of an imaging system via a continuous-to-continuous mathematical operator that relates the object that is being imaged to the collected data. This operator can then be analyzed mathematically, thereby providing insights regarding the fundamental properties of the imaging system. We are also working to integrate the concept of information content of a photon^{14} into GPU-based reconstruction code.

*The authors would like to acknowledge funding by the National Institutes of Health (grants R37 EB000803 and P41 EB002035).*

Luca Caucci, Lars R. Furenlid

Department of Medical Imaging

University of Arizona

Tucson, AZ

Luca Caucci is part of the research faculty and earned his PhD in optical sciences from the University of Arizona. His research interests include list-mode data processing, signal detection, parameter estimation, adaptive imaging, parallel computing, and list-mode digital radiology.

Lars R. Furenlid is a professor in the Department of Medical Imaging and the College of Optical Sciences. He earned his PhD in physical chemistry from the Georgia Institute of Technology. His research interests include scintillation and solid-state detectors, methods of optics, pulse-processing electronics, digital data acquisition, and data reconstruction with a variety of computational methods.

References:

1. P. E. Ross, Why CPU frequency stalled, *IEEE Spectrum* 45(4), p. 72, 2008.

2. D. E. Culler, J. P. Singh, A. Gupta, *Parallel Computer Architecture: A Hardware/Software Approach*, Morgan Kaufmann, 1999.

3. K. Hwang, *Advanced Computer Architecture: Parallelism, Scalability, Programmability*, McGraw-Hill, 1993.

4. L. R. Furenlid, D. W. Wilson, Y.-C. Chen, H. Kim, P. J. Pietraski, M. J. Crawford, H. H. Barrett, FastSPECT II: a second-generation high-resolution dynamic SPECT imager, *IEEE Trans. Nucl. Sci.* 51(3), p. 631-635, 2004.

5. Y.-C. Chen, *System Calibration and Image Reconstruction for a New Small-Animal SPECT System*, 2006. University of Arizona

6. T. D. Milster, J. N. Aarsvold, H. H. Barrett, A. L. Landesman, L. S. Mar, D. D. Patton, T. J. Roney, R. K. Rowe, R. H. Seacat III, A full-field modular gamma camera, *J. Nucl. Med.* 31(5), p. 632-639, 1990.

7. L. A. Shepp, Y. Vardi, Maximum likelihood reconstruction for emission tomography, *IEEE Trans. Med. Imag.* 1(2), p. 113-122, 1982.

8. A. P. Dempster, N. M. Laird, D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, *J. Roy. Stat. Soc.* 39(1), p. 1-38, 1977.

9. W. H. Richardson, Bayesian-based iterative method of image restoration, *J. Opt. Soc. Am.* 62(1), p. 55-59, 1972.

10. L. B. Lucy, An iterative technique for the rectification of observed distributions, *Astron. J.* 79, p. 745-754, 1974.

11. H. H. Barrett, K. J. Myers, *Foundations of Image Science*, Wiley InterScience, Hoboken, NJ, 2004.

13. L. Caucci, L. R. Furenlid, GPU programming for biomedical imaging,

*Proc. SPIE* 9594, p. 95940G, 2015.

doi:10.1117/12.2195217