
Proceedings Paper

Targeting multiple heterogeneous hardware platforms with OpenCL

Paper Abstract

The OpenCL API allows parallel, heterogeneous computation to be expressed abstractly, but hardware implementations of it differ substantially. The abstractions the API provides are often not high-level enough to conceal differences in hardware architecture, and implementations often fail to realize the potential performance gains of certain features because of hardware limitations and other factors. These issues make it difficult to produce code that is portable in practice, so much OpenCL code is duplicated for each targeted hardware platform. This duplication of effort offsets the principal advantage of OpenCL: portability. Certain coding practices can mitigate the problem, allowing a common code base to be adapted to perform well across a wide range of hardware platforms. To this end, we explore general practices for producing performant code that are effective across platforms, along with ways of modularizing code to enable optional optimizations that take advantage of hardware-specific characteristics. At a minimum, portability requires avoiding OpenCL features that are optional, not widely implemented, poorly implemented, or missing in major implementations. Exposing multiple levels of parallelism, from the task level down to explicit vector operations, lets each device exploit the types of parallelism it supports. Static optimizations and branch elimination in device code help the platform compiler optimize programs effectively. Modularizing some code is important so that operations can be chosen for performance on the target hardware. Optional subroutines that exploit explicit memory locality allow different memory hierarchies to be used for maximum performance. The C preprocessor and JIT compilation through the OpenCL runtime can be used to enable some of these techniques, as well as to factor in hardware-specific optimizations as necessary.
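
As an illustration of the last point, the sketch below shows one way a host program might specialize a single kernel source at JIT-compile time through preprocessor definitions. It is not taken from the paper: the `build_config` struct, the `make_build_options` helper, the macro names `VEC_WIDTH` and `CLAMP_OUTPUT`, and the `scale` kernel are all hypothetical. The only OpenCL facts assumed are that `clBuildProgram` accepts an options string and that the `-D name=definition` form defines a preprocessor macro for the device compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-device choices. In real host code these would be
   derived from clGetDeviceInfo queries; here they are plain inputs. */
typedef struct {
    int vec_width;     /* 1 for scalar code, 4 to use float4 */
    int clamp_output;  /* nonzero to compile the clamping step in */
} build_config;

/* One kernel source shared by every platform. The preprocessor picks the
   element type and removes the optional clamp statically, so the device
   compiler never sees a runtime branch. Illustrative only. */
const char kernel_src[] =
    "#if VEC_WIDTH == 4\n"
    "typedef float4 vec_t;\n"
    "#else\n"
    "typedef float vec_t;\n"
    "#endif\n"
    "__kernel void scale(__global vec_t *data, float k)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    vec_t v = data[i] * k;\n"
    "#if CLAMP_OUTPUT\n"
    "    v = clamp(v, 0.0f, 1.0f);\n"
    "#endif\n"
    "    data[i] = v;\n"
    "}\n";

/* Writes the compiler options for cfg into out (capacity cap bytes).
   Returns 0 on success, -1 if the buffer is too small. */
int make_build_options(const build_config *cfg, char *out, size_t cap)
{
    assert(cfg != NULL && out != NULL && cap > 0);
    int n = snprintf(out, cap, "-D VEC_WIDTH=%d -D CLAMP_OUTPUT=%d",
                     cfg->vec_width, cfg->clamp_output ? 1 : 0);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}
```

In a full host program the resulting string would be passed as the `options` argument of `clBuildProgram`, once per device or device class, so that the same source yields a differently specialized binary on each platform. Keeping the hardware-specific decisions in the options string rather than in the kernel text is what lets the code base stay common.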

Paper Details

Date Published: 13 June 2014
PDF: 9 pages
Proc. SPIE 9095, Modeling and Simulation for Defense Systems and Applications IX, 90950E (13 June 2014); doi: 10.1117/12.2050643
Paul A. Fox, EM Photonics, Inc. (United States)
Stephen T. Kozacik, EM Photonics, Inc. (United States)
John R. Humphrey, EM Photonics, Inc. (United States)
Aaron Paolini, EM Photonics, Inc. (United States)
Aryeh Kuller, EM Photonics, Inc. (United States)
Eric J. Kelmelis, EM Photonics, Inc. (United States)

Published in SPIE Proceedings Vol. 9095:
Modeling and Simulation for Defense Systems and Applications IX
Eric J. Kelmelis, Editor(s)

© SPIE.