VLSI Circuits and Systems IV

Front Matter: Volume 7363

This PDF file contains the front matter associated with SPIE Proceedings Volume 7363, including the Title Page, Copyright information, Table of Contents, Introduction (if any), and the Conference Committee listing.

Future memory technologies

Wolfgang Mueller, Michael Kund

In this paper the concepts, status and technical challenges for high density working memory will be reviewed. The main technology covering this application space today is DRAM, based on a 1 transistor 1 capacitor cell (1T1C). 50-60nm DRAM technologies have already been introduced into mass production. Full process integration results for 40nm DRAM, and key technologies for the 30nm DRAM node have been presented previously. No technical roadblock is seen for further scaling down to the 30nm node; however, some of the key technology concepts, such as capacitor dielectrics with a capacitance equivalent (oxide) thickness (CET) of <0.5nm, have still to be proven. The DRAM cell sizes currently in mass production range between 8F² and 6F². Further cell size reduction to 4F² is under development. The status and scaling potential of the most probable DRAM successor candidate technologies will be discussed: capacitor-less DRAM, phase-change RAM (PCRAM), and spin transfer torque MRAM (STT MRAM). Capacitor-less DRAM or floating body (FB) DRAM cells have been proposed, both for stand-alone memory and embedded memory applications. Different cell device schemes (transistor and capacitor-coupled thyristor) have been investigated. Recently a number of papers covering cell device data and integration schemes for 50nm feature sizes have been published. However, so far no results based on a high density demonstrator chip or product have been shown. PCRAM is the most mature technology of the candidates mentioned. Product demonstrators with 90nm design rules and densities up to 512Mb have been presented. The introduction of first products in 65-45nm technology for 2009 has been announced recently. Scalability of the phase change element to below 10nm has been demonstrated. Spin transfer torque (STT) MRAM has been proposed as a fast, nonvolatile, and scalable cell concept. The memory concept has been experimentally verified at structure sizes down to 50nm. Theoretical estimations indicate scalability down to 20nm. A 2Mb product demonstrator has been published, although it uses a rather large cell size. Based on these data a comparison of the key parameters for the different technologies will be presented, and a mapping of the different technologies to the current DRAM application segments will be proposed.

Survey of reconfigurable architectures for multimedia applications

T. Cervero, S. López, G. M. Callicó, et al.

In a short period of time, the multimedia sector has progressed quickly in trying to meet customer demands in terms of transfer speeds, storage memory, image quality, and functionalities. To cope with these stringent demands, different hardware devices have been developed as possible choices. Although not every device is suited to the high computational demands associated with multimedia applications, reconfigurable architectures appear to be ideal candidates to meet these needs. As a direct consequence, universities and industry worldwide have increased their research activity in this area, generating an important know-how base. In order to organize the information generated on this topic, this paper reviews the most recent reconfigurable architectures for multimedia applications. As a result, this paper establishes the benefits and drawbacks of the different dynamically reconfigurable architectures for multimedia applications according to their system-level design.

Method for run time hardware code profiling for algorithm acceleration

Vladimir Matev, Eduardo de la Torre, Teresa Riesgo

In this paper we propose a method for run-time profiling of applications at instruction level by analysis of loops. Instead of looking for coarse-grain blocks we concentrate on fine-grain blocks that are still costly in terms of execution time. Most code profiling is done in software by introducing code into the application under profile, which adds time overhead, while in this work data on the position of a loop, its body, size and number of executions are stored and analysed using a small non-intrusive hardware block. The paper describes the system mapping to run-time reconfigurable systems. The synthesis results of the fine-grain code detector block and its functional verification are also presented in the paper. To demonstrate the concept, the MediaBench multimedia benchmark running on the chosen development platform is used.

Non-rectangular reconfigurable cores for system-on-chip

Pedro Alves, João Canas Ferreira

Non-rectangular cores of standard-cell-based reconfigurable logic can be used to fill space left on System-on-Chips, thereby providing the system with hardware reconfigurability. The proposed architecture for a non-rectangular reconfigurable core is based on a fixed set of blocks that implement logic functions, interconnections and configurable switching. The basic blocks connect by abutment to form clusters and clusters abut to form a complete reconfigurable core. A software tool was created to generate a gate-level netlist and the floorplan data of the reconfigurable logic core together with a basic testbench. Cores with non-rectangular shapes were created using 90 nm and 45 nm standard-cell technologies and validated by simulation. The results demonstrate the feasibility of a flexible, technology-independent architecture for non-rectangular reconfigurable logic cores that can be physically implemented using a standard digital design flow.

Using partial reconfiguration for SoC design and implementation

Yana E. Krasteva, Jorge Portilla, Félix Tobajas Guerrero, et al.

Most reconfigurable systems rely on FPGA technology. Among these, those that permit dynamic and partial reconfiguration offer added benefits in flexibility, in-field device upgrade, improved design and manufacturing time and even, in some cases, reduced power consumption. However, dynamic reconfiguration is a complex task, and the real benefits of its use in real applications have often been questioned. This paper presents an overview of the application of the partial reconfiguration technique, along with four original applications. The main goal of these applications is to test several architectures with different degrees of flexibility and to search for the partial reconfiguration "killer application", that is, the application that best demonstrates the benefits of today's reconfigurable systems based on commercial FPGAs. Therefore, the presented applications are proofs of concept rather than fully operative and closed systems. First, a brief introduction to the application of partially reconfigurable systems is included. After that, the created reconfigurable systems are described: first, an on-chip communications emulation framework; second, an on-chip debugging system; third, a reconfigurable wireless sensor network node; and finally, a remotely reconfigurable client-server device. Each application is described in a separate section of the paper along with some tests and results. General conclusions are included at the end of the paper.

Polytopol computing for multi-core and distributed systems

Henk Spaanenburg, Lambert Spaanenburg, Johan Ranefors

Multi-core computing provides new challenges to software engineering. The paper addresses such issues in the general setting of polytopol computing, that takes multi-core problems in such widely differing areas as ambient intelligence sensor networks and cloud computing into account. It argues that the essence lies in a suitable allocation of free moving tasks. Where hardware is ubiquitous and pervasive, the network is virtualized into a connection of software snippets judiciously injected to such hardware that a system function looks as one again. The concept of polytopol computing provides a further formalization in terms of the partitioning of labor between collector and sensor nodes. Collectors provide functions such as a knowledge integrator, awareness collector, situation displayer/reporter, communicator of clues and an inquiry-interface provider. Sensors provide functions such as anomaly detection (only communicating singularities, not continuous observation), they are generally powered or self-powered, amorphous (not on a grid) with generation-and-attrition, field re-programmable, and sensor plug-and-play-able. Together the collector and the sensor are part of the skeleton injector mechanism, added to every node, and give the network the ability to organize itself into some of many topologies. Finally we will discuss a number of applications and indicate how a multi-core architecture supports the security aspects of the skeleton injector.

Parallel workload analysis in SMP platform: a new modelling approach to infer the hardware efficiency for remote sensing application

Guo Yi, Eleni Kanellou, L. Andrés Cardona, et al.

Remote sensing techniques have put great pressure on the design of real-time waveform post-processing. Due to the intensive computation and multi-channel waveform integration, the overhead between the processing time and the amount of data stored prior to downlink has led us to a solution based on task-level parallelism. With the development of IC design and innovation in architecture, an embedded system can range from a single microprocessor to a complex multi-processor, even including an embedded operating system (OS) on a chip. Therefore symmetric multiprocessing (SMP) with an embedded OS offers an attractive way to expose the coarse-grained parallelism of an application. In this paper we demonstrate a new modeling approach. In order to simplify the system, a workload model is derived from a remote sensing application, which represents the workload characteristics and time-degrading factors. The intention is to use task-level parallelism to distribute the load evenly across the processors in the SMP, with OS-level testing used to infer the bottleneck at the hardware level. This parallel workload model, mapped to a 6-LEON3 SMP architecture, attains a 2.7x mean speedup over a single-LEON3 baseline; with 3 LEON3 cores it attains a 2.23x mean speedup, and with 2 LEON3 cores a 1.78x mean speedup. Due to the shared resources and the scheduling of multiple CPUs, the system suffers a degradation in processing speed. From this lag we can infer the hardware pipeline efficiency. Further analysis of the processor-set subsystem and memory subsystem reveals their effects on the system throughput.
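The reported speedups can be turned into parallel efficiencies with simple arithmetic. The sketch below is only an illustration of that arithmetic: the speedup figures come from the abstract, while the Karp-Flatt serial-fraction estimate is an assumption added here and is not something the abstract says the authors used.

```python
# Mean speedups reported in the abstract, keyed by number of LEON3 cores.
REPORTED_SPEEDUP = {2: 1.78, 3: 2.23, 6: 2.7}


def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear speedup that was actually achieved."""
    return speedup / cores


def karp_flatt(speedup: float, cores: int) -> float:
    """Experimentally determined serial fraction (Karp-Flatt metric).

    Assumption of this sketch: the abstract does not mention this metric.
    """
    return (1.0 / speedup - 1.0 / cores) / (1.0 - 1.0 / cores)


if __name__ == "__main__":
    for cores, speedup in sorted(REPORTED_SPEEDUP.items()):
        eff = parallel_efficiency(speedup, cores)
        serial = karp_flatt(speedup, cores)
        print(f"{cores} cores: efficiency {eff:.2f}, serial fraction {serial:.2f}")
```

Efficiency falls from 0.89 with two cores to 0.45 with six, which is consistent with the degradation the abstract attributes to shared resources and multi-CPU scheduling.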

Design of a miniaturized electrochemical instrument for in-situ O2 monitoring

Jordi Colomer-Farrarons, Pedro L. Miribel-Català, Josep Samitier, et al.

The authors are working toward the design of a device for the detection of oxygen, following both a discrete and an integrated instrumentation implementation. The discrete electronics are also used for preliminary analysis, to confirm the validity of the system concept, and this set-up will be used to characterize the integrated device while the chip is being fabricated. This paper presents the design of a small, cheap and portable potentiostat integrated with electrodes, which can be applied to on-site measurements for the simultaneous detection of O₂ and temperature in water systems. As a first approach a discrete PCB has been designed based on commercial discrete electronics and specific oxygen sensors. Dissolved oxygen concentration (DO) is an important index of water quality, and the ability to measure the oxygen concentration and temperature at different positions and depths would be an important asset for environmental analysis. In particular, the objective is that the sensor and the electronics can be integrated in a single encapsulated device that can be submerged in environmental water systems and make multiple measurements. For our proposed application a small and portable device is developed, where electronics and sensors are miniaturized and placed in close proximity to each other. This system would be based on the sensors and electronics, forming one module, connected to a portable notebook to save and analyze the measurements on-line. The key electronic block is the potentiostat amplifier, used to fix the voltage between the Working (WE) and Reference (RE) electrodes following an input voltage (Vin). Vin is a triangular signal, programmed through a LabVIEW interface, which is also used to display the CV transfer curves. To obtain a smaller and more compact solution the potentiostat amplifier has also been integrated as a full custom ASIC amplifier, which is in progress, with a view to a point-of-care device. These circuits have been designed in a 0.13 μm technology from ST Microelectronics through the CMP-TIMA service.
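The triangular Vin described above can be sketched as a sampled waveform. This is a minimal illustration only: it assumes "CV" refers to cyclic voltammetry, and the function name, voltage limits, scan rate and sample rate are hypothetical values chosen for the example, not parameters taken from the paper.

```python
def triangular_sweep(v_start, v_vertex, scan_rate, sample_rate, cycles=1):
    """Return (times_s, volts) for a triangular sweep v_start -> v_vertex -> v_start.

    scan_rate is in V/s and sample_rate in samples/s (both hypothetical inputs).
    """
    if scan_rate <= 0 or sample_rate <= 0:
        raise ValueError("scan_rate and sample_rate must be positive")
    half_period = abs(v_vertex - v_start) / scan_rate    # seconds per ramp
    n_half = max(1, round(half_period * sample_rate))    # samples per ramp
    step = (v_vertex - v_start) / n_half
    volts = []
    for _ in range(cycles):
        # Forward ramp, stopping one step short of the vertex.
        volts.extend(v_start + i * step for i in range(n_half))
        # Reverse ramp, starting exactly at the vertex.
        volts.extend(v_vertex - i * step for i in range(n_half))
    volts.append(v_start)                                # close the last cycle
    times = [i / sample_rate for i in range(len(volts))]
    return times, volts


if __name__ == "__main__":
    # Hypothetical sweep: 0 V to -0.8 V and back at 0.1 V/s, 10 samples/s.
    t, v = triangular_sweep(0.0, -0.8, scan_rate=0.1, sample_rate=10.0)
    print(len(v), min(v), v[-1])
```

In a set-up like the one described, each sample of such a waveform would be applied as Vin while the resulting cell current is recorded.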

Hardware implementation of a scheduler for high performance switches with quality of service (QoS) support

R. Arteaga, F. Tobajas, V. De Armas, et al.

In this paper, the hardware implementation of a scheduler with QoS support is presented. The starting point is a Differentiated Service (DiffServ) network model. Each switch of this network classifies the packets into flows, which are assigned to traffic classes depending on their requirements, with an independent queue being available for each traffic class. Finally, the scheduler chooses the right queue in order to provide Quality of Service support. This scheduler considers the bandwidth distribution, introducing the time frame concept, and the packet delay, assigning a priority to each traffic class. The architecture of this algorithm is also presented in this paper, describing its functionality and complexity. The architecture was described in Verilog HDL at RTL level. The complete system has been implemented in a Spartan-3 1000 FPGA device using ISE software from Xilinx, demonstrating that it is a suitable design for high speed switches.
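The combination of per-class priorities with a per-frame bandwidth share can be illustrated in software. The sketch below is not the authors' algorithm or hardware architecture; it is a simplified model under the assumption that each traffic class has a byte quota per time frame and that lower priority numbers are served first. All class and method names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TrafficClass:
    name: str
    priority: int                 # lower value is served first (assumption)
    quota: int                    # bytes allowed per time frame (assumption)
    queue: deque = field(default_factory=deque)
    used: int = 0                 # bytes already sent in the current frame


class FrameScheduler:
    """Picks the highest-priority queue that still has quota in this frame."""

    def __init__(self, classes):
        self.classes = sorted(classes, key=lambda c: c.priority)

    def new_frame(self):
        # A new time frame starts: every class gets its full quota back.
        for c in self.classes:
            c.used = 0

    def enqueue(self, name, size):
        for c in self.classes:
            if c.name == name:
                c.queue.append(size)
                return
        raise KeyError(name)

    def next_packet(self):
        """Return (class_name, size) of the packet to send, or None."""
        for c in self.classes:
            if c.queue and c.used + c.queue[0] <= c.quota:
                size = c.queue.popleft()
                c.used += size
                return c.name, size
        return None
```

With a 200-byte quota for a high-priority class, a second 150-byte packet of that class waits for the next frame while lower-priority traffic is served, which is the kind of bandwidth limiting the time frame concept suggests.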

Resonation-based hybrid continuous-time/discrete-time cascade sigma-delta modulators: application to 4G wireless telecom

José M. de la Rosa, Alonso Morgado, Rocío del Río

This paper presents innovative architectures of hybrid Continuous-Time/Discrete-Time (CT/DT) cascade ΣΔ Modulators (ΣΔMs) made up of a front-end CT stage and a back-end DT stage. In addition to increasing the digitized signal bandwidth as compared to conventional ΣΔMs, the proposed topologies take advantage of the CT nature of the front-end ΣΔM stage, by embedding anti-aliasing filtering as well as their suitability to operate up to the GHz range. Moreover, the presented modulators include multi-bit quantization and Unity Signal Transfer Function (USTF) in both stages to reduce the integrator output swings, and programmable resonation to optimally distribute the zeroes of the overall Noise Transfer Function (NTF), such that the in-band quantization noise is minimized for each operation mode. Both local and inter-stage (global) based resonation architectures are synthesized and compared in terms of their circuit complexity, resolution-bandwidth programmability and robustness with respect to circuit non-ideal effects. The combination of all mentioned characteristics results in novel hybrid ΣΔMs, very suited for the implementation of adaptive/reconfigurable Analog-to-Digital Converters (ADCs) intended for the 4th Generation (4G) of wireless telecom systems.

Improved 10GBase-LX4 limiting amplifier in a low-cost 0.18 µm CMOS technology

J. M. García del Pozo, S. Celma, A. Otín

This work overcomes the limitations of a previous work by using three high frequency compensation techniques: pole-zero cancellation, shunt-peaking and downscaling. By considering these strategies, a fully integrated limiting amplifier in a low-cost 0.18 μm CMOS digital process is introduced. This design improves on the original design without using inductors or local multi-feedback loops, obtaining a compact, stable and robust design well suited to low-voltage applications.

Anisotropic quality measurement applied to H.264 video compression

G. M. Callico, Sebastián Lopez, Salvador Gabarda, et al.

This paper presents the results of measuring the image quality of a video compression system based on the H.264 standard using the Anisotropic Quality Index (AQI). These results have been compared with the quality measured by means of the traditionally used Peak Signal to Noise Ratio (PSNR). The PSNR has been shown to be an unreliable way to compute the perceptual quality of images. Although it is widely used because of its simplicity and ease of computation, the PSNR and other methods based on measuring image differences (such as the Root Mean Squared Error or RMSE) suffer from the problem of not properly reflecting the real perceptual image quality. Images with the same amount of noise can present similar PSNR values even with very different perceptual appearance. On the other hand, the AQI has proven to be a more reliable way to analytically measure the perceptual image quality. This new measure is based on the use of a particular type of high-order Rényi entropy. The method measures the anisotropy of the image through the variance of the expected value of the pixel-wise directional image entropy. Moreover, the AQI has the additional benefit of not needing a reference image. The reference image, compulsory in the PSNR computation, is usually impossible to obtain in real situations, thus relegating the PSNR to test-bench developments. The possibility of computing the AQI opens the way to self-regulated compression systems based on the adjustment of the parameters that exhibit greater influence on the final image quality. This work shows the results of compressing several standard video sequences using the H.264 video compression standard. Compared with the PSNR, the AQI is a better indicator of the perceptual quality of images.
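For reference, the standard PSNR definition, 10·log10(MAX² / MSE), can be sketched in a few lines. The example uses flat lists of 8-bit pixel values for simplicity; the function names are hypothetical, and the AQI itself is not implemented here because the abstract does not give enough detail to reproduce it.

```python
import math


def mse(reference, test):
    """Mean squared error between two equally sized pixel sequences."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)


def psnr(reference, test, max_value=255):
    """Peak Signal to Noise Ratio in dB; infinite for identical inputs."""
    err = mse(reference, test)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


if __name__ == "__main__":
    ref = [100, 100, 100, 100]
    spread = [101, 99, 101, 99]        # small error on every pixel
    spike = [102, 100, 100, 100]       # larger error on a single pixel
    print(psnr(ref, spread), psnr(ref, spike))
```

Both distorted sequences have the same mean squared error and therefore the same PSNR, although the error is distributed very differently, which illustrates the limitation the abstract describes. Note also that the function needs the reference as an input.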

A system for emulating the broadcasting of a DAB ensemble containing data services

David Samper, Pedro J. Lobo, Manuel César Rodríguez, et al.

In this paper, a system that emulates the whole DAB transmission chain, allowing the development and test of external decoders for DAB data services, is described. DAB receivers offer the possibility of connecting an external data decoder that handles additional data services, using a data interface called RDI. The system described in this paper replaces the complete DAB transmission chain from the transmitter to the RDI interface of the receiver. The system generates a DAB ensemble that can carry several data services and transmits the RDI frames corresponding to this ensemble through an RDI output. Any type of data service can be carried by the ensemble. The purpose of the system is to be used as a debug and verification tool for external decoder equipment that can be connected to a DAB receiver via an RDI interface. The system has been tested with two kinds of data services (data carousels and video streaming) with very satisfactory results in both cases. We are currently working on adding DMB support to our system.

ESL flow for a hardware H.264/AVC decoder using TLM-2.0 and high level synthesis: a quantitative study

M. Thadani, P. P. Carballo, P. Hernández, et al.

The present paper describes an Electronic System Level (ESL) design methodology which was established and employed in the creation of an H.264/AVC baseline decoder. The methodology involves the synthesis of the algorithmic description of the functional blocks that comprise the decoder, using a high level synthesis tool. Optimization and design space exploration are carried out at the algorithmic level before performing logic synthesis. Final, post-place and route implementation results show that the decoder can operate at the target frequency of 100 MHz and meet real time requirements for QCIF frames.
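To get a feel for what the real-time requirement means at 100 MHz, the clock cycle budget per macroblock can be estimated. QCIF is 176x144 pixels, that is 99 macroblocks of 16x16 pixels. The frame rate of 30 frames per second used below is an assumption of this sketch; the abstract does not state the target frame rate.

```python
QCIF_WIDTH, QCIF_HEIGHT = 176, 144   # QCIF luma resolution in pixels
MACROBLOCK = 16                      # H.264 macroblock edge in pixels


def macroblocks_per_frame(width=QCIF_WIDTH, height=QCIF_HEIGHT):
    """Number of 16x16 macroblocks in one frame."""
    return (width // MACROBLOCK) * (height // MACROBLOCK)


def cycle_budget_per_macroblock(clock_hz, fps, width=QCIF_WIDTH, height=QCIF_HEIGHT):
    """Average clock cycles available to decode one macroblock in real time."""
    return clock_hz / (fps * macroblocks_per_frame(width, height))


if __name__ == "__main__":
    # 30 fps is a hypothetical target, not a figure from the abstract.
    budget = cycle_budget_per_macroblock(100e6, fps=30)
    print(macroblocks_per_frame(), round(budget))
```

Under these assumptions each macroblock has on the order of tens of thousands of cycles available, which is the kind of figure design space exploration at the algorithmic level would be checked against.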

Implementation of a media synchronization algorithm for multistandard IP set-top box systems

Esther Estévez, David Samper, Fernando Pescador, et al.

Media synchronization in a network context minimizes the effects of network jitter and of the skew between the emitter and receiver clocks. Theoretical algorithms cannot always be implemented on real systems because of the architectural differences between a real and a theoretical system. In this paper an implementation of an intra-medium and an inter-media synchronization algorithm for a real multistandard IP set-top box is presented. For intra-medium synchronization, the proposed technique is based on controlling the receiver buffer. For inter-media synchronization, however, the proposed technique is based on controlling the video playback according to the Presentation Time Stamp (PTS) of the media units (audio and video). The proposed synchronization algorithms have been integrated in an IP-STB and tested in a real environment using DVD movies and TV channels with excellent results. Those results show that the proposed algorithm can achieve media synchronization and meet the requirements of perceived quality of service (P-QoS).
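A PTS-driven playback decision can be sketched as follows. This is an illustration only, not the implementation integrated in the set-top box: it assumes the audio PTS acts as the master clock, uses the 90 kHz PTS tick rate of MPEG systems, and the 40 ms tolerance and the function name are hypothetical. PTS wrap-around is ignored for brevity.

```python
PTS_CLOCK_HZ = 90_000  # PTS values in MPEG systems count ticks of a 90 kHz clock


def video_action(video_pts, audio_pts, tolerance_ms=40.0):
    """Decide what to do with the next video frame relative to the audio clock.

    Returns "show" when video is within tolerance, "wait" when video is ahead
    of audio, and "drop" when video is behind audio.
    """
    skew_ms = (video_pts - audio_pts) * 1000.0 / PTS_CLOCK_HZ
    if skew_ms > tolerance_ms:
        return "wait"    # video is early: hold the frame
    if skew_ms < -tolerance_ms:
        return "drop"    # video is late: skip the frame to catch up
    return "show"


if __name__ == "__main__":
    print(video_action(90_000, 90_000))   # in sync
    print(video_action(99_000, 90_000))   # video 100 ms ahead
    print(video_action(81_000, 90_000))   # video 100 ms behind
```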

Performance analysis of mixed communication architectures: bus and network-on-chip

Stefano Gigli, Massimo Conti

System on Chip performance in terms of speed and power dissipation is becoming dominated by communication between the cores. The communication architectures are usually based on a bus or a Network on Chip. Bus-based on-chip communication architectures are simple and flexible. Network on Chip is a distributed communication architecture that overcomes the bus bottleneck occurring when the number of connected cores is high. In this work we present the integration in a SystemC NoC library of a new library for creating and simulating master and slave devices of the AMBA AHB bus. The simulation environment has been used to evaluate the performance in terms of communication throughput and delay in different communication architectures: AMBA AHB bus, NoC and mixed.

Cache-aware network-on-chip for chip multiprocessors

Konstantinos Tatas, Costas Kyriacou, George Dekoulis, et al.

This paper presents the hardware prototype of a Network-on-Chip (NoC) for a chip multiprocessor that provides support for cache coherence, cache prefetching and cache-aware thread scheduling. A NoC with support for these cache-related mechanisms can help improve system performance by reducing the cache miss ratio. The presented multi-core system employs the Data-Driven Multithreading (DDM) model of execution. In DDM, thread scheduling is done according to data availability, so the system is aware of the threads to be executed in the near future. This characteristic of the DDM model allows for cache-aware thread scheduling and cache prefetching. The NoC prototype is a crossbar switch with output buffering that can support a cache-aware 4-node chip multiprocessor. The prototype is built on the Xilinx ML506 board equipped with a Xilinx Virtex-5 FPGA.

Dynamic power management of network-on-chip

Stefano Gigli, Luca Casagrande Montesi, Andrea Primavera, et al.

System on Chip performance in terms of speed and power dissipation is becoming dominated by communication between the cores. To overcome the limitations of traditional bus architectures, Network-on-Chip architectures are now being adopted. The Dynamic Power Management architecture and algorithm, and the Network-on-Chip topology and routing algorithms, should be selected considering that they both affect the network throughput and power dissipation in a complex and complementary way. This paper presents an analysis of the effect of Dynamic Power Management strategies on Network-on-Chip performance.

NoC generation of an optimal memory distribution for multimedia systems

Raúl Regidor, Félix Tobajas, Valentin de Armas, et al.

In this paper a topological analysis of different IP distributions, focusing on optimal memory placements in regular 2D meshes, has been performed. As a case study, a real MPEG-4 decoder implementation with three memories was chosen. In order to study the influence of memories on the topology of the network, the Arteris NoCexplorer tool was used. The results inferred from the experiments show how the performance of a multimedia system can be improved if memories are properly located within a NoC. Furthermore, the present work serves to validate the use of Arteris NoCexplorer for simulating and modelling complex NoC-based designs. In addition, a methodology for determining the best IP distribution in terms of latency and throughput is presented and its feasibility is demonstrated.
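Why memory placement matters in a regular 2D mesh can be shown with a rough hop-count model. The sketch below is not the methodology of the paper and does not use the Arteris tool: it assumes minimal (Manhattan) routing, assumes every non-memory node talks equally to every memory, and uses hop count as a crude proxy for latency. The function names are hypothetical.

```python
from itertools import product


def hops(a, b):
    """Manhattan distance between two mesh nodes given as (x, y) tuples."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def mean_hops(width, height, memories):
    """Average hop count from every non-memory node to every memory node."""
    mems = set(memories)
    cores = [n for n in product(range(width), range(height)) if n not in mems]
    if not mems or not cores:
        raise ValueError("need at least one memory node and one other node")
    total = sum(hops(c, m) for c in cores for m in mems)
    return total / (len(cores) * len(mems))


if __name__ == "__main__":
    # One memory in a 3x3 mesh: centre placement versus corner placement.
    print(mean_hops(3, 3, [(1, 1)]), mean_hops(3, 3, [(0, 0)]))
```

Even in this simplified model the centre placement gives a lower average hop count than the corner placement. A real study would weight each IP-to-memory pair by its actual traffic.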

Flexible CMOS low-noise amplifiers for beyond-3G wireless hand-held devices

Edwin C. Becerra-Alvarez, Federico Sandoval-Ibarra, José M. de la Rosa

This paper explores the use of reconfigurable Low-Noise Amplifiers (LNAs) for the implementation of CMOS Radio Frequency (RF) front-ends in the next generation of multi-standard wireless transceivers. The main circuit strategies reported so far for multi-standard LNAs are reviewed and a novel flexible LNA intended for Beyond-3G RF hand-held terminals is presented. The proposed LNA circuit consists of a two-stage topology that combines inductive-source degeneration with a PMOS-varactor based tuning network and a programmable load to adapt its performance to different standard specifications without penalizing the circuit noise and with a reduced number of inductors as compared to previously reported reconfigurable LNAs. The circuit has been designed in a 90-nm CMOS technology to cope with the requirements of the GSM, WCDMA, Bluetooth and WLAN (IEEE 802.11b-g) standards. Simulation results, including technology and packaging parasitics, demonstrate correct operation of the circuit for all the standards under study, featuring NF<2.8dB, S₂₁>13.3dB and IIP3>10.9dBm, over a 1.85GHz-2.4GHz band, with an adaptive power consumption between 17mW and 22mW from a 1-V supply voltage. Preliminary experimental measurements are included, showing correct reconfiguration within the operation band.

Comprehensive procedural approach for transferring or comparative analysis of analogue IP building blocks towards different CMOS technologies

Dorine M. Gevaert

The challenges for the next generation of integrated circuit design of analogue and mixed-signal building blocks in standard CMOS technologies for signal conversion demand research progress in the emerging scientific fields of device physics and modelling, converter architectures, design automation, quality assurance and cost factor analysis. Estimation of mismatch for analogue building blocks at the conceptual level and the impact on active area is not a straightforward calculation. The proposed design concepts reduce the over-sizing of transistors, compared with the existing methods, by 15 to 20% for the same quality specification. Besides the reduction of the silicon cost, the design time cost for new topologies is also reduced considerably. A comparison has been done for current mode converters (ADC and DAC), focussing on downscaling technologies. The developed method offers an integrated approach to the estimation of architecture performance, yield and IP-reuse. Matching energy remains constant over process generations and will be the limiting factor for current signal processing. With a comprehensive understanding of all sources of mismatch and the use of physically based mismatch modelling in the prediction of mismatch errors, more adequate and realistic sizing of all transistors will result in an overall area reduction of analogue IP blocks. For each technology the following design curves are automatically developed: noise curves for a specified signal bandwidth, choice of overdrive voltage versus lambda and output resistance, physical mismatch error modelling on target current levels. The procedural approach shares knowledge of several design curves and speeds up the design time.

A low voltage CMOS low drop-out voltage regulator

Salma Ali Bakr, Tanvir Ahmad Abbasi, Mohammas Suhaib Abbasi, et al.

A low voltage implementation of a CMOS Low Drop-Out voltage regulator (LDO) is presented. Low voltage operation is crucial for portable devices that require extensive computations in a low power environment. The LDO is implemented in a generic 90nm CMOS technology. It generates a fixed 0.8V from a 2.5V supply which drops to 1V as it discharges. The buffer stage is an unbuffered OpAmp in unity-gain configuration with a rail-to-rail swing input stage. The simulation results show that the implemented circuit provides a load regulation of 0.004%/mA and a line regulation of -11.09mV/V. The LDO provides a full load transient response with a settling time of 5.2μs. Further, the dropout voltage is 200mV and the quiescent current through the pass transistor (Iload=0) is 20μA. The total power consumption of this LDO (excluding the bandgap reference) is only 80μW.

0.18µm CMOS inductorless AGC amplifier with 50dB input dynamic range for 10GBase-LX4 ethernet

F. Aznar, S. Celma, B. Calvo, et al.

This paper presents a new automatic gain control main amplifier for 10GBase-LX4 Ethernet realized in a 0.18 μm CMOS process. The proposed optical-fiber differential post-amplifier is based on a very compact inductorless design which comprises three main digitally programmable gain stages followed by a buffer. It is characterized by a -3 dB cutoff frequency above 3 GHz over a -3 to 33 dB linear-in-dB gain control range. It includes a DC offset cancellation network and an automatic gain control loop with a settling time below 1 μs. Results show a sensitivity of 2.1 mV for BER = 10^-12 and an input dynamic range above 50 dB. The power consumption is 58 mW at a single supply voltage of 1.8 V.

A 100mA fractional step-down charge pump with digital control

Valter A. L. Sadio, Abílio E. M. Parreira, Marcelino B. Santos

A switched capacitor step-down DC-DC converter (charge pump) is proposed. High efficiency is achieved by a combination of fractional conversion ratios (different step-down modes of operation), output voltage sensing and pulse-skipping based digital control techniques. Two control techniques were implemented with automatic change between modes, and their results are discussed and compared. The power module has 9 switches, implemented with 14 power transistors, and a current limit circuit to mitigate the in-rush current in the startup phase. This circuit has been designed in the AMS C35B4 (0.35 µm) CMOS process. The charge pump was designed to provide a maximum load current of 100 mA. The peak-to-peak output voltage ripple is less than 30 mV with two 3 µF flying capacitors and one 20 µF output capacitor. Peak and average efficiencies, with maximum load current, are over 80% and 68%, respectively.
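The benefit of fractional conversion ratios can be illustrated with the ideal efficiency bound of a switched capacitor converter, Vout / (M · Vin), where M is the conversion ratio. The sketch below picks the smallest ratio that can still supply the requested output. The set of ratios, the voltages and the function names are assumptions made for illustration; the abstract does not list the modes the converter implements, and losses in switches and control are ignored.

```python
from fractions import Fraction

# Hypothetical step-down ratios; the abstract does not list the actual modes.
RATIOS = (Fraction(1, 3), Fraction(1, 2), Fraction(2, 3), Fraction(1, 1))


def ideal_efficiency(v_in, v_out, ratio):
    """Upper bound on efficiency for an ideal switched capacitor stage."""
    return v_out / (float(ratio) * v_in)


def best_ratio(v_in, v_out, ratios=RATIOS):
    """Smallest ratio whose no-load output still reaches v_out, or None."""
    usable = [r for r in ratios if float(r) * v_in >= v_out]
    return min(usable) if usable else None


if __name__ == "__main__":
    v_in, v_out = 3.6, 1.5           # hypothetical operating point
    mode = best_ratio(v_in, v_out)
    print(mode, round(ideal_efficiency(v_in, v_out, mode), 3))
```

With only a 1:1 mode the same operating point would be bounded at about 42% efficiency, which is why switching automatically between fractional modes helps.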

ModelSim-PSIM mixed signal simulation for power electronics digital control design

M. García Valderas, P. Zumel, A. Lázaro, et al.

In the design of Power Electronics converters, several approaches can be chosen for the implementation of the closed loop control. The use of a digital control loop implemented in an FPGA is becoming quite common. For the design of such a system, a simulation environment must be provided to check the digital and analog parts working together. The simulation of both the analog and digital parts is a very difficult task, which involves the simultaneous usage of an analog and a digital simulator, or the use of a mixed signal simulator. In this paper, we present a method to perform mixed signal simulation. The simulation is performed by linking the PSIM analog simulator and the ModelSim digital simulator. This method has proven to be very effective in the design of digital control circuits for power converters implemented in FPGAs.

Dynamic OSR sigma delta controller for monolithic switching converters

M. Conti, S. Orcioni, R. d'Aparo

A digital controller based on Sigma Delta modulation for high-frequency switching power supplies is proposed in this work. A technique is used to restrict the average switching frequency to a suitable range. The complete system has been modelled and simulated at system level using the SystemC-WMS environment. A high-precision controller has been designed with relatively low clock frequency and area occupancy of the Sigma Delta modulator, while at the same time reducing the sensitivity to statistical parameter variations and to temperature drift.
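
The abstract does not describe the modulator's structure, so the following is only a generic sketch of the underlying idea: a first-order Sigma Delta modulator, written in its accumulator (error-feedback) form, turns a target duty cycle into a 1-bit switching signal whose long-run average matches that target. The function name and the normalised 0 to 1 duty input are assumptions for the example.

```python
def sigma_delta_bitstream(duty, n_cycles):
    """Generate a 1-bit stream whose mean tracks duty (expected in 0..1).

    Each clock cycle the accumulator adds the target; when it reaches 1.0
    the output fires and 1.0 is subtracted, so the quantisation error is
    carried forward instead of being discarded.
    """
    accumulator = 0.0
    bits = []
    for _ in range(n_cycles):
        accumulator += duty
        if accumulator >= 1.0:
            bits.append(1)
            accumulator -= 1.0
        else:
            bits.append(0)
    return bits
```

Averaging the returned bits over a long enough window recovers the duty cycle, which is why a modest clock frequency can still give fine effective resolution.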

A new approach to accelerate SEU sensitivity evaluation in circuits with embedded memories

M. Portela-García, M. Garcia Valderas, C. Lopez-Ongil, et al.

Current circuit complexity requires faster fault injection techniques to allow the evaluation of a high number of faults in a reasonable time. In particular, FPGA emulation has proven to be a performance-effective method to analyze the behavior of digital circuits in the presence of soft errors due to SEU effects. In general, the fault emulation-based solutions in the literature that use circuit instrumentation to inject faults do not consider fault emulation in circuits with embedded memories. The few existing proposals that study this kind of circuit are oriented to injecting faults in microprocessors; they are slow compared with injection in flip-flops and have a poor capacity to analyze the circuit behavior, due to the limited accessibility of memories (one memory word per clock cycle). Embedded memories are increasingly common and large in modern designs, and therefore their emulation is a problem of rising importance. The models proposed in this work allow fault emulation in embedded memories, injecting faults and observing their effects quickly.

Analysis of current transients in SRAM memories for single event upset detection

G. Torrens, S. Bota, J. Verd, et al.

Soft errors resulting from the impact of charged particles are emerging as a major issue in the design of reliable circuits at deep sub-micron dimensions, even at ground level. To face this challenge, a designer must have at their disposal a variety of mitigation schemes adapted to the specific design constraints. Built-in current sensors have been proposed as a detection scheme for single event upsets in SRAM. In this paper, power-bus current transients in SRAM memories have been analyzed for single event upset detection in a 65 nm CMOS technology. The different types of current roles which are applied during the simulation are discussed. The results show the important contribution of leakage currents to the response of the memory cell to an external event.

Automated insertion of twin gates to improve reliability concerning gate oxide breakdown

Hagen Saemrow, Claas Cornelius, Frank Sill, et al.

Scaling device dimensions towards atomic scales leads to increased reliability and yield concerns, which considerably affect the work of integrated circuit designers. Furthermore, the complexity of integrated systems is increasing, which leads to a demand for tool-assisted reliability insertion during the design process. Many research efforts have focused on soft errors and system-level approaches. However, only a few low-level solutions have been published to enhance lifetime reliability. Investigations in this field have achieved up to a 200% increase in reliability concerning gate oxide breakdown when so-called Twin Gates are inserted. This contribution comprehensively presents algorithms to implement these redundant cells automatically during logic synthesis. Besides their placement in the overall design process, approaches are provided to insert Twin Gates correctly considering timing and area issues.

Static power dissipation in adder circuits: the UDSM domain

Steve Cayouette, Dhamin Al-Khalili

This paper presents adder circuits of various architectures aimed at reducing static power dissipation. Circuit topologies for basic building blocks were evaluated for fabrication technologies from 65 nm down to 32 nm, and simulation results are presented. This work has led to the development of various low-power adder circuits and provides a comparative analysis leading to the recommendation that a variable-size block carry-select adder is the best performer, taking into consideration both static and dynamic power dissipation.

Approach to an FPGA embedded, autonomous object recognition system: run-time learning and adaptation

Rubén Salvador, Carlos Terleira, Félix Moreno, et al.

Neural networks, widely used in pattern recognition, security applications and robot control, have been chosen for the task of object recognition within this system. One of the main drawbacks of implementing traditional neural networks in reconfigurable hardware is their huge resource demand. This is due not only to their intrinsic parallelism, but also to the large size of the networks traditionally designed. However, modern FPGA architectures are well suited to this kind of massively parallel computation. Therefore, our proposal is the implementation of Tiny Neural Networks (TNN, a self-coined term) in reconfigurable architectures. One of the most important features of TNNs is their learning ability. What we show here is therefore an attempt to raise the autonomy of the system by triggering a new learning phase at run-time when necessary. In this way, autonomous adaptation of the system is achieved. The system performs shape identification by interpreting object singularities. This is achieved by interconnecting several specialized TNNs that work cooperatively. In order to validate the research, the system has been implemented and configured as a perceptron-like TNN with backpropagation learning and applied to the recognition of shapes. Simulation results show that this architecture has significant performance benefits.

Tiled architecture of a CNN-mostly IP system

Lambert Spaanenburg, Suleyman Malki

Multi-core architectures have been popularized with the advent of the IBM CELL. At a finer grain, the problems of scheduling multi-cores already existed in tiled architectures such as the EPIC and Da Vinci. It is not easy to evaluate the performance of a schedule on such an architecture, as historical data are not available. One solution is to compile algorithms for which an optimal schedule is known by analysis. A typical example is an algorithm that is already defined in terms of many collaborating simple nodes, such as a Cellular Neural Network (CNN). A simple node with a local register stack together with a 'rotating wheel' internal communication mechanism has been proposed. Though the basic CNN allows for a tiled implementation of a tiled algorithm on a tiled structure, a practical CNN system will have to disturb this regularity because of the additional need for arithmetical and logical operations. Arithmetic operations are needed, for instance, to accommodate low-level image processing, while logical operations are needed to fork and merge different data streams without use of the external memory. It is found that the 'rotating wheel' internal communication mechanism still handles such operations without the need for global control. Overall, the CNN system provides a practical network size as implemented on an FPGA, can easily be used as embedded IP, and provides a clear benchmark for a multi-core compiler.

Optimization of input-constrained systems

Suleyman Malki, Lambert Spaanenburg

The computational demands of algorithms are rapidly growing. The naive implementation uses extended double-precision floating-point numbers and therefore has extreme difficulty in maintaining real-time performance. For fixed-point numbers, the value representation pushes in two directions (value range and step size) to set the application-dependent word size. In the general case, checking all combinations of all different values on all system inputs will easily become computationally infeasible. Checking only corner cases helps to reduce the combinatorial explosion, but checking for accuracy and precision to limit the word size still remains a considerable effort. A range of evolutionary techniques has been tried where the sheer size of the problem withstands an extensive search. When the value range can be limited, the problem becomes tractable and a constructive approach becomes feasible. We propose an approach that is reminiscent of the Quine-McCluskey logic minimization procedure. Next to the conjunctive search that is popular in Boolean minimization, we investigate the disjunctive approach that starts from a presumed minimal word size. To eliminate the occurrence of anomalies, this still has to be checked for larger word sizes. The procedure has initially been implemented using Java and Matlab. We have applied the above procedure to feed-forward and to cellular neural networks (CNN) as typical examples of input-constrained systems. In the case of hole-filling by means of a CNN, we find that the 1461 different coefficient sets can be reduced to 360, each giving robust behaviour on 7-bit internal words.
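
To make the relation between value range, step size, and word size concrete, the sketch below counts how many distinct codes a uniform fixed-point grid needs and converts that count to bits. It shows only this arithmetic, not the minimization procedure of the paper; the function name and the assumption of a uniformly spaced grid covering both end points are illustrative.

```python
def fixed_point_word_size(v_min, v_max, step):
    """Return the number of bits needed to encode every value from v_min
    to v_max (inclusive) on a uniform grid with spacing step."""
    if step <= 0 or v_max < v_min:
        raise ValueError("need step > 0 and v_max >= v_min")
    levels = round((v_max - v_min) / step) + 1  # distinct representable values
    # (levels - 1).bit_length() equals ceil(log2(levels)) for levels >= 1
    return (levels - 1).bit_length()
```

A range of -4 to 3.75 in steps of 0.25 has 32 levels and fits in 5 bits; widening the range or halving the step each costs one more bit, which is the two-directional pressure the abstract refers to.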

An adaptable interface between resistive sensors and microcontrollers

J. Revuelto, N. Medrano, B. Calvo, et al.

The increasing application of sensor networks in many different fields is causing a growing demand for low-cost passive sensors for monitoring physical variables such as temperature, pressure or ambient humidity. These sensors need a conditioning circuit that allows an easy interface to a microcontroller, taking advantage of the full range of the sensor and reducing the microcontroller requirements. This paper presents conditioning electronics designed to transform the output of low-cost resistive sensors into a variable-frequency signal. The circuit is designed to use the full frequency range available, providing good resolution. These quasi-digital signals are compatible with the logic levels of a standard low-power microcontroller, allowing connection through a digital I/O port.

Considerations on the design of conventional receivers for wireless optical channels using a Monte Carlo based ray-tracing algorithm

S. Rodríguez, B. R. Mendoza, O. González, et al.

This paper presents a study of the design of a conventional receiver structure that offers improved performance with respect to the main IR channel parameters, such as path loss and rms delay spread. To this end, we use a recently proposed model for the effective signal-collection area of a conventional angle-diversity receiver that is closer to real behaviour than the ideal model. The inclusion of this model in the Monte Carlo ray-tracing algorithm allows us to study optical links that are characterized by the use of these receivers and to investigate the structure of the conventional receiver that yields improved performance with respect to the IR channel parameters. Based on the obtained results, we propose the use of a conventional receiver composed of seven branches of photodiodes: one oriented towards the ceiling, and six looking at an elevation of 56° with a separation of 60° in azimuth. For each element, a CPC with a FOV of 50° must be used. Furthermore, the proposed structure is evaluated in a representative link budget using L-PPM modulation schemes.
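
The proposed seven-branch geometry can be written down as unit orientation vectors, which is the form a ray-tracing simulation would typically consume. The sketch below assumes that elevation is measured from the horizontal plane, that the ceiling lies along +z, and that the first ring branch sits at an azimuth of 0°; the abstract states none of these conventions, and the function name is hypothetical.

```python
import math


def branch_orientations(elevation_deg=56.0, n_ring=6, azimuth_offset_deg=0.0):
    """Return unit vectors for one ceiling-facing branch plus n_ring branches
    spaced evenly in azimuth at a common elevation."""
    vectors = [(0.0, 0.0, 1.0)]  # branch oriented towards the ceiling (+z)
    elevation = math.radians(elevation_deg)
    for k in range(n_ring):
        # 360 / 6 gives the 60 degree azimuth separation quoted in the abstract
        azimuth = math.radians(azimuth_offset_deg + k * 360.0 / n_ring)
        vectors.append((
            math.cos(elevation) * math.cos(azimuth),
            math.cos(elevation) * math.sin(azimuth),
            math.sin(elevation),
        ))
    return vectors
```

Each returned vector can be compared against an incoming ray direction to decide whether the ray falls within that branch's 50° field of view.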

Autonomous low-noise system for broadband measurements of the cosmic microwave background radiation

George Dekoulis

This paper describes the digital-side implementation of a new suborbital experiment for the measurement of broadband radiation emissions of the Cosmic Microwave Background (CMB) anisotropy. The system has been used in campaign mode for initial mapping of the galactic radiation power received at a single frequency. The recorded galactic sky map images are subsequently being used to forecast the emitted radiation at neighboring frequencies. A planned second campaign will verify the efficiency of the prediction algorithms in an autonomous manner. The system has reached an advanced stage in terms of combined hardware and software operation and intelligence, where other Space Physics measurements are performed autonomously depending on the burst event under investigation. The system has been built in a modular manner to expedite hardware and software upgrades. Such an upgrade has recently occurred, mainly to expand the frequency range of space observations.
