Reconfigurable Technology: FPGAs and Reconfigurable Processors for Computing and Communications IV

Net-aware bitstreams that upgrade FPGA hardware remotely over the Internet: creating intelligent bitstreams that know where to go, what to do when they get there, and can report back when they're d

Steve Casselman, John Schewel

Success in the marketplace may well depend on the ability to upgrade and test hardware designs instantly around the world. An upgrade management strategy requires more than just a bitstream file, email, or a JTAG cable. A well-managed methodology capable of transmitting bitstreams directly into targeted FPGAs over a network or the Internet is an essential element of a successful FPGA-based product strategy. Virtual Computer Corporation’s HOTMan Bitstream Management Environment combines a feature-rich cross-platform API with an object-oriented bitstream technique for remotely upgrading hardware over the Internet.

Constraint-directed CAD tool for automatic latency-optimal implementation of 1D and 2D Fourier transforms

J. Gregory Nash

A specialized CAD tool is described that takes a user's high-level code description of a non-uniform affinely indexed algorithm and automatically generates abstract latency-optimal systolic arrays. Emphasis has been placed on ease of use and on the ability either to force conformance to specific design criteria or to perform unconstrained explorations. How such design goals are achieved is illustrated in the context of LU decomposition and the matrix Lyapunov equation. The tool is then used to generate new hardware-efficient 1-D and 2-D systolic arrays for the discrete Fourier transform that take advantage of the radix-4 matrix decomposition.
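
The abstract does not spell out the decomposition it uses. As a rough illustration of what a radix-4 split of the discrete Fourier transform looks like, the sketch below assumes the textbook decimation-in-time form, in which an N-point transform is assembled from four N/4-point transforms and twiddle factors. The function names `dft` and `dft_radix4` are hypothetical, and nothing here models the systolic arrays or the CAD tool itself.

```python
import cmath


def dft(x):
    """Direct O(N^2) discrete Fourier transform, used as the reference."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def dft_radix4(x):
    """DFT via a recursive radix-4 decimation-in-time split (illustrative only)."""
    n = len(x)
    # Fall back to the direct form when the length cannot be split in four.
    if n < 4 or n % 4 != 0:
        return dft(x)
    quarter = n // 4
    # Four sub-transforms over the samples x[r], x[r + 4], x[r + 8], ...
    subs = [dft_radix4(x[r::4]) for r in range(4)]
    out = []
    for k in range(n):
        total = 0
        for r in range(4):
            twiddle = cmath.exp(-2j * cmath.pi * r * k / n)
            # Each sub-transform is periodic in k with period N/4.
            total += twiddle * subs[r][k % quarter]
        out.append(total)
    return out


samples = [float(v) for v in range(16)]
max_error = max(abs(a - b) for a, b in zip(dft(samples), dft_radix4(samples)))
```

For the 16 sample values above, `max_error` should be at the level of floating-point rounding, showing that the split reproduces the direct transform.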

Single instruction set architectures for image processing

Phillip A. Laplante, William Gilreath

For more than fifty years, computer engineers have sought to construct minimal computers that use only a single instruction. While this might appear to be a simple academic exercise, a remarkably rich computation paradigm can be developed using this approach, with important applications and implications in reconfigurable, chemical, optical and biological computing. More recently, the widespread use of the Field Programmable Gate Array (FPGA) has made such an approach not only desirable, but also practical. In this paper the history and motivation behind single instruction or one instruction computing (OISC) are reviewed. It is then shown how the paradigm can be used to implement a variety of imaging operations efficiently. Finally, a practical application and future work in languages and tools are presented.
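
The abstract does not say which single instruction the paper builds on. As a minimal sketch of the idea, the example below assumes `subleq` (subtract and branch if less than or equal to zero), one commonly cited one-instruction design. The interpreter name `run_subleq`, the memory layout, and the convention that a negative jump target halts the machine are all assumptions made for illustration.

```python
def run_subleq(memory, pc=0, max_steps=10_000):
    """Run a subleq program on a copy of `memory` and return the final memory.

    Each instruction occupies three cells (a, b, c):
        memory[b] -= memory[a]; if memory[b] <= 0, jump to c, else fall through.
    A jump target outside the program (for example -1) stops execution.
    """
    mem = list(memory)
    steps = 0
    while 0 <= pc <= len(mem) - 3 and steps < max_steps:
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3
        steps += 1
    return mem


# Program: Y = Y + X, built from three subleq instructions.
# Cells 0-8 hold the code; cells 9, 10, 11 hold X, Y and a scratch cell Z.
program = [
    9, 11, 3,    # Z = Z - X   (Z becomes -X)
    11, 10, 6,   # Y = Y - Z   (Y becomes Y + X)
    11, 11, -1,  # Z = Z - Z   (clear Z, then halt via the -1 target)
    7, 5, 0,     # X = 7, Y = 5, Z = 0
]
final_memory = run_subleq(program)
```

After the run, cell 10 (`Y`) holds 12 and the scratch cell is back at 0, so addition has been composed from subtraction and branching alone.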

Design flow for the reconfigurable HW platform XPP

Claus Ritter, Eberhard Schueler, Eric Sax, et al.

Driven by continuing technological progress in the configurable hardware sector, which is currently dominated by FPGAs, new approaches such as very fast re-configurable devices with ALU-level granularity are on the rise. However, these coprocessor devices cannot be programmed with either conventional HW or conventional SW design approaches alone. To solve this dilemma, a combination of the two is needed, and this paper describes such an approach. An example of how to program a re-configurable device is also given. The example consists of parts of an MPEG-4 decoder running on the re-configurable processor platform XPP. The partitioning of the decoding algorithms into modules and the means of interaction between these modules are highlighted. In addition, the embedding of the algorithm in an XPP system is outlined.

Implementing a dynamically reconfigurable ATM switch on the VIRTEX FPGA of the FPX platform

Edson Lemos Horta, John W. Lockwood, Sergio Takeo Kofuji

This paper shows how a reconfigurable ATM switch (RECATS) has been implemented on a single VIRTEX FPGA on the Field Programmable Port Extender (FPX) board. The switch architecture is outlined and the FPX platform is described in detail. Finally, the modifications applied to the FPX components and the methodology used to validate the reconfigurable switch are explained.

Framework for development and distribution of hardware acceleration

David B. Thomas, Wayne W.C. Luk

This paper describes IGOL, a framework for developing reconfigurable data processing applications. While IGOL was originally designed to target imaging and graphics systems, its structure is sufficiently general to support a broad range of applications. IGOL adopts a four-layer architecture: application layer, operation layer, appliance layer and configuration layer. This architecture is intended to separate and co-ordinate both the development and execution of hardware and software components. Hardware developers can use IGOL as an instance testbed for verification and benchmarking, as well as for distribution. Software application developers can use IGOL to discover hardware accelerated data processors, and to access them in a transparent, non-hardware specific manner. IGOL provides extensive support for the RC1000-PP board via the Handel-C language, and a wide selection of image processing filters have been developed. IGOL also supplies plug-ins to enable such filters to be incorporated in popular applications such as Premiere, Winamp, VirtualDub and DirectShow. Moreover, IGOL allows the automatic use of multiple cards to accelerate an application, demonstrated using DirectShow. To enable transparent acceleration without sacrificing performance, a three-tiered COM (Component Object Model) API has been designed and implemented. This API provides a well-defined and extensible interface which facilitates the development of hardware data processors that can accelerate multiple applications.

Optimizing parallel programs for hardware implementation

Jose Gabrial Figueiredo Coutinho, Wayne W.C. Luk, Markus Weinhardt

This paper describes an approach for optimizing hardware designs produced from software languages extended with constructs for parallel execution and hardware processing, such as the Handel-C language. Our aim is to optimize these programs by applying transformations that include the appropriate amount of parallelism, in order to obtain the best trade-offs in space and in time. These transformations can be applied automatically at compile time, enabling the programmer to adapt parallel programs rapidly to a specific hardware platform. Our transformational approach, which involves design sequentialisation and parallelisation, contains two novel features. First, we develop an algorithm for sequentialising parallel programs. This algorithm relaxes the scheduling of the original design, giving a scheduler the freedom to arrange it to achieve better results in speed, in size, or in both. Second, we combine this sequentialisation algorithm with pipeline vectorization, a technique known to reduce the execution delay of loops by pipelining the loop body. We adapt several transformation techniques used in vectorizing and parallelizing software compilers, such as loop unrolling and loop tiling, to widen the applicability of our method. Results show that our approach often works well: for instance, a manually pipelined convolution design, produced from a Handel-C description for implementation in a Xilinx XC4000 device, is sped up by more than a factor of two by our prototype compiler.
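
To make one of the named transformations concrete, the sketch below applies loop unrolling by a factor of two to a small sliding-window multiply-accumulate kernel (taps are not flipped, so this is the FIR-style form of convolution). It is written in Python purely for illustration: the paper's compiler operates on Handel-C, and the function names `fir` and `fir_unrolled` are hypothetical. The point is only that the unrolled loop exposes two independent accumulations per iteration, which a hardware compiler could schedule in parallel, while the results stay identical.

```python
def fir(signal, taps):
    """Reference kernel: one output sample per outer-loop iteration."""
    n_out = len(signal) - len(taps) + 1
    out = []
    for i in range(n_out):
        acc = 0
        for k in range(len(taps)):
            acc += signal[i + k] * taps[k]
        out.append(acc)
    return out


def fir_unrolled(signal, taps):
    """Same kernel with the outer loop unrolled by two."""
    n_out = len(signal) - len(taps) + 1
    out = [0] * n_out
    i = 0
    while i + 1 < n_out:
        acc0 = acc1 = 0
        for k in range(len(taps)):
            # Two independent multiply-accumulates share each tap value.
            acc0 += signal[i + k] * taps[k]
            acc1 += signal[i + 1 + k] * taps[k]
        out[i], out[i + 1] = acc0, acc1
        i += 2
    if i < n_out:
        # Leftover iteration when the output count is odd.
        out[i] = sum(signal[i + k] * taps[k] for k in range(len(taps)))
    return out


signal = [1, 2, 3, 4, 5, 6]
taps = [1, 0, -1]
reference = fir(signal, taps)
unrolled = fir_unrolled(signal, taps)
```

Both calls return the same list, so the transformation preserves behaviour; whether it pays off in area or speed depends on the target device and is not modelled here.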

Defect-tolerant fine-grained parallel testing of a cell matrix

Lisa J.K. Durbeck, Nicholas J. Macias

A fault testing methodology for a cell-based self configurable hardware platform (the Cell Matrix) is described. Background on the Cell Matrix is given, including its amenability to use despite the presence of manufacturing defects. The ability of cells within the Cell Matrix to isolate faulty regions is also described. A method for testing individual cells, based on an external test driver, is discussed. The benefits of locating this test driver inside the device under test are explained. A method is described for efficient, autonomous, robust creation of a network of self-testing structures (called Supercells) for parallel implementation and execution of this test driver. Sample tests are described, and their results are given, demonstrating the effectiveness and robustness of the testing methodology. A discussion of the research, including conclusions, is presented. Plans for future work are discussed.

Parameterizing reconfigurable designs for image warping

Jun Jiang, Stefan Schmidt, Wayne W.C. Luk, et al.

This paper describes reconfigurable computing techniques for optimising image warping designs. Our image warping algorithm is based on radial basis functions, which enable the warping effect to be specified in terms of feature points. The coefficients of the warping function are obtained from the Symmetric Bipartite Table Method (SBTM), and the lookup tables can be generated dynamically at run time. We have deployed an optimised number representation involving both custom integer and custom floating-point formats in computing the radial function approximation. Furthermore, a fully-pipelined design has been developed in the Handel-C language, which can perform image warping in real time for resolutions up to 256 by 256 pixels on a Xilinx XC2V6000 device. This design is parameterisable at compile time for different image resolutions. Currently our implementation on a Xilinx Virtex XCV1000 device for the RC1000-PP platform runs 50% faster than a software version on an AMD Athlon 1.4 GHz PC. A faster data bus and a larger FPGA for the RC1000-PP platform can result in a further speed improvement of over ten times.

Minimizing energy dissipation of matrix multiplication kernel on Virtex-II

Seonil Choi, Viktor K. Prasanna, Ju-wook Jang

In this paper, we develop energy-efficient designs for matrix multiplication on FPGAs. To analyze the energy dissipation, we develop a high-level model using domain-specific modeling techniques. In this model, we identify architecture parameters that significantly affect the total energy (system-wide energy) dissipation. Then, we explore design trade-offs by varying these parameters to minimize the system-wide energy. For matrix multiplication, we consider a uniprocessor architecture and a linear array architecture to develop energy-efficient designs. For the uniprocessor architecture, the cache size is a parameter that affects the I/O complexity and the system-wide energy. For the linear array architecture, the amount of storage per processing element is a parameter affecting the system-wide energy. By using the maximum amount of storage per processing element and the minimum number of multipliers, we obtain a design that minimizes the system-wide energy. We develop several energy-efficient designs for matrix multiplication. For example, for 6×6 matrix multiplication, energy savings of up to 52% for the uniprocessor architecture and 36% for the linear array architecture are achieved over an optimized library for the Virtex-II FPGA from Xilinx.

Reconfigurable platform for development of embedded systems

Ming jiang Jiang Yang, Yan Xin Yan, Qing Guo Wang

Quality, functionality and time-to-market are key indices for a competitive and successful embedded system product. A good way to reduce the time to market is to make use of reusable software models and a reconfigurable hardware platform. This paper introduces a reconfigurable platform that is now being developed for methodology research on rapid development of embedded systems. Its effective design method and efficient implementation technology are formal reuse and reconfiguration. The reusability considerations are mainly the reuse frequency and the abstraction level of the application system, while the reconfigurability considerations mainly include reconfiguration of function/architecture, hardware/software and interfaces. In view of these considerations, the paper describes three possible reconfigurable architectures: DSP-FPGA, MCU-FPGA and DSP-MCU-FPGA. These architectures can be obtained using reconfigurable data-path units and library-based interfaces. In terms of benefits, the paper not only presents the knowledge gained from developing this platform, but also demonstrates how to use the platform to construct an orthogonal IP space for development of virtual IPs and virtual components.

Reconfigurable logic design case

Shing-Fat Fred Ma, John Knight, Calvin Plett

This design case identifies generalizable features of a coarse-grained reconfigurable FPGA, Chameleon's reconfigurable platform. An FFT is used to identify typical design practices, problems, and solutions in targeting such a platform. This paper focuses on datapath mapping, separating it into functional design and placement of reconfigurable resources. In addition to exploring the design methodology, it analyzes numerical artifacts, demonstrates efficient packing of the data path, and highlights differences from ASIC design.
