
Proceedings Paper

GPU-completeness: theory and implications
Author(s): I-Jong Lin

Paper Abstract

This paper formalizes a major insight into a class of algorithms that relates parallelism and performance. Its purpose is to define a class of algorithms that trades off parallelism for quality of result (e.g. visual quality, compression rate), and we propose a method of algorithmic classification, modeled on NP-Completeness techniques, applied toward parallel acceleration. We define this class of algorithms as "GPU-Complete" and postulate the properties necessary for admission into the class. We also formally relate this algorithmic space to the space of imaging algorithms. The concept is based on our experience in the print production area, where GPUs (Graphics Processing Units) have shown a substantial cost/performance advantage within the context of HP-delivered enterprise services and commercial printing infrastructure. While CPUs and GPUs are converging in their underlying hardware and functional blocks, their system behaviors remain clearly distinct in memory system design, programming paradigms, and massively parallel SIMD architecture. Certain applications are clearly suited to each architecture: for the CPU, language compilation, word processing, operating systems, and other highly sequential applications; for the GPU, video rendering, particle simulation, pixel color conversion, and other problems clearly amenable to massive parallelization. As GPUs establish themselves as a second computing architecture distinct from CPUs, their end-to-end system cost/performance advantage on certain parts of a computation informs the structure of algorithms and their efficient parallel implementations. Although GPUs are merely one type of architecture for parallelization, we show that their introduction into the design space of printing systems demonstrates the trade-offs against competing multi-core, FPGA, and ASIC architectures. While each architecture has its own optimal application, we believe the selection of architecture can be defined in terms of the properties of GPU-Completeness. For a well-defined subset of algorithms, GPU-Completeness is intended to connect parallelism, algorithms, and efficient architectures into a unified framework, showing that multiple layers of parallel implementation are guided by the same underlying trade-off.
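
To make concrete why a workload such as pixel color conversion is "clearly amenable to massive parallelization," the sketch below shows a per-pixel RGB-to-grayscale CUDA kernel: every pixel is independent, so one thread per pixel suffices and no inter-thread communication is required. This is an illustrative example only, not code from the paper; the kernel name, image size, and BT.601 luma weights are assumptions made for the sketch.

    // Minimal sketch of an embarrassingly parallel imaging kernel (assumed example).
    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void rgbToGray(const unsigned char* rgb, unsigned char* gray, int numPixels)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per pixel
        if (i < numPixels) {
            float r = rgb[3 * i + 0];
            float g = rgb[3 * i + 1];
            float b = rgb[3 * i + 2];
            // BT.601 luma weights, used here purely for illustration
            gray[i] = (unsigned char)(0.299f * r + 0.587f * g + 0.114f * b);
        }
    }

    int main()
    {
        const int numPixels = 1920 * 1080;               // assumed HD frame
        unsigned char *dRgb, *dGray;
        cudaMalloc(&dRgb, 3 * numPixels);
        cudaMalloc(&dGray, numPixels);
        cudaMemset(dRgb, 128, 3 * numPixels);            // dummy mid-gray input

        int threads = 256;
        int blocks = (numPixels + threads - 1) / threads;
        rgbToGray<<<blocks, threads>>>(dRgb, dGray, numPixels);
        cudaDeviceSynchronize();

        printf("converted %d pixels\n", numPixels);
        cudaFree(dRgb);
        cudaFree(dGray);
        return 0;
    }

The absence of any data dependence between pixels is exactly the property that the sequential CPU workloads cited above (compilation, operating systems) lack, which is the architectural contrast the abstract draws.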

Paper Details

Date Published: 25 January 2011
PDF: 12 pages
Proc. SPIE 7872, Parallel Processing for Imaging Applications, 78720J (25 January 2011); doi: 10.1117/12.872650
Author Affiliations:
I-Jong Lin, Hewlett-Packard Labs. (United States)


Published in SPIE Proceedings Vol. 7872:
Parallel Processing for Imaging Applications
John D. Owens; I-Jong Lin; Yu-Jin Zhang; Giordano B. Beretta, Editor(s)

© SPIE