Journal of Electronic Imaging

Design of a pseudo-log image transform hardware accelerator in a high-level synthesis-based memory management framework
Author(s): Shahzad Ahmad Butt; Stéphane Mancini; Frédéric Rousseau; Luciano Lavagno

Paper Abstract

The pseudo-log image transform belongs to a class of image processing kernels that generate memory references that are nonlinear functions of loop indices. Because of this nonlinearity, the usual design methodologies do not allow efficient hardware implementation of such kernels. For optimized hardware implementation, these kernels require the creation of a customized memory hierarchy and an efficient data/memory management strategy. We present the design and real-time hardware implementation of a pseudo-log image transform IP (hardware image processing engine) using a memory management framework. The framework generates a controller that efficiently manages input data movement in the form of tiles between off-chip main memory, on-chip memory, and the core processing unit. The framework can jointly optimize the memory hierarchy and the tile computation schedule to reduce on-chip memory requirements, to maximize throughput, and to increase data reuse, which reduces off-chip memory bandwidth requirements. The algorithmic C++ description of the pseudo-log kernel is profiled in the framework to generate an enhanced description with a customized memory hierarchy. The enhanced description of the kernel is then used for high-level synthesis (HLS) to perform architectural design space exploration and find an optimal implementation under given performance constraints. The optimized register transfer level implementation of the IP generated after HLS is used for performance estimation. The performance estimation is done in a simulation framework to characterize the IP with different external off-chip memory latencies and a variety of data transfer policies. Experimental results show that the designed IP can be used for real-time implementation and that the generated memory hierarchy is capable of feeding the IP with a sufficiently high bandwidth even in the presence of long external memory latencies.
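
To make the phrase "memory references that are nonlinear functions of loop indices" concrete, the following is a minimal C++ sketch, not the authors' kernel. The abstract does not give the pseudo-log mapping, so the sketch assumes a log-polar style sampling as a stand-in. The names `Image`, `source_coord`, `pseudo_log_like_transform`, and `tiles_touched_by_row` are hypothetical, as are the tile size and grid dimensions used in the usage note. The helper `tiles_touched_by_row` counts how many input tiles one output row reads, which hints at why a tile-based memory hierarchy and schedule matter for this class of kernel.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Row-major 8-bit grayscale image.
struct Image {
    int width;
    int height;
    std::vector<std::uint8_t> pix;
    std::uint8_t at(int x, int y) const {
        return pix[static_cast<std::size_t>(y) * width + x];
    }
};

// ASSUMPTION: log-polar sampling stands in for the paper's pseudo-log mapping.
// Maps output index (u, v) of a U x V grid to an input pixel (x, y).
std::pair<int, int> source_coord(int u, int v, int U, int V,
                                 int width, int height) {
    assert(U > 1 && V > 0 && width >= 3 && height >= 3);
    const double pi = 3.14159265358979323846;
    const double cx = (width - 1) / 2.0;
    const double cy = (height - 1) / 2.0;
    const double rmax = cx < cy ? cx : cy;
    // Radius grows exponentially with u: 1 at u = 0, rmax at u = U - 1.
    const double r = std::exp(std::log(rmax) * u / (U - 1));
    const double theta = 2.0 * pi * v / V;
    long x = std::lround(cx + r * std::cos(theta));
    long y = std::lround(cy + r * std::sin(theta));
    // Clamp so that rounding can never index outside the image.
    if (x < 0) x = 0;
    if (x > width - 1) x = width - 1;
    if (y < 0) y = 0;
    if (y > height - 1) y = height - 1;
    return {static_cast<int>(x), static_cast<int>(y)};
}

// Plain algorithmic description: the read address depends on exp/cos/sin
// of the loop indices, so it is not an affine function of (u, v).
std::vector<std::uint8_t> pseudo_log_like_transform(const Image& in,
                                                    int U, int V) {
    std::vector<std::uint8_t> out(static_cast<std::size_t>(U) * V);
    for (int u = 0; u < U; ++u) {
        for (int v = 0; v < V; ++v) {
            const std::pair<int, int> p =
                source_coord(u, v, U, V, in.width, in.height);
            out[static_cast<std::size_t>(u) * V + v] = in.at(p.first, p.second);
        }
    }
    return out;
}

// Counts the distinct tile x tile input blocks read while producing row u.
std::size_t tiles_touched_by_row(int u, int U, int V,
                                 int width, int height, int tile) {
    assert(tile > 0);
    std::set<std::pair<int, int>> tiles;
    for (int v = 0; v < V; ++v) {
        const std::pair<int, int> p = source_coord(u, v, U, V, width, height);
        tiles.insert({p.first / tile, p.second / tile});
    }
    return tiles.size();
}
```

As a usage example under the same assumptions, calling `tiles_touched_by_row(0, 16, 64, 64, 64, 8)` and `tiles_touched_by_row(15, 16, 64, 64, 64, 8)` on a 64 x 64 input shows that the innermost output row reads only a few tiles near the image center, while the outermost row reads tiles spread around the image border. A footprint that varies this much from row to row is the kind of access pattern for which a fixed line buffer is a poor fit, which is consistent with the abstract's argument for a customized memory hierarchy and tile schedule.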

Paper Details

Date Published: 23 September 2014
PDF: 13 pages
J. Electron. Imag. 23(5) 053012 doi: 10.1117/1.JEI.23.5.053012
Published in: Journal of Electronic Imaging Volume 23, Issue 5
Author Affiliations
Shahzad Ahmad Butt, Politecnico di Torino (Italy)
Stéphane Mancini, TIMA Lab. (France)
Frédéric Rousseau, TIMA Lab. (France)
Luciano Lavagno, Politecnico di Torino (Italy)

© SPIE.