- Front Matter: Volume 7363
- Plenary
- Reconfigurable Hardware
- Sensors and Signal Conditioning
- Design for Communication Systems
- Multimedia Applications
- Network on a Chip
- Design of Analog Circuits
- Smart Power
- Test and Reliability
- Digital Design
- Poster Session
Front Matter: Volume 7363
Front Matter: Volume 7363
This PDF file contains the front matter associated with SPIE Proceedings Volume 7363, including the Title Page, Copyright information, Table of Contents, Introduction (if any), and the Conference Committee listing.
Plenary
Future memory technologies
In this paper the concepts, status and technical challenges for high density working memory will be reviewed. The main technology covering this application space today is DRAM, based on a 1 transistor 1 capacitor cell (1T1C). 50-60nm DRAM technologies have already been introduced into mass production. Full process integration results for 40nm DRAM, and key technologies for the 30nm DRAM node, have been presented previously. No technical roadblock is seen for further scaling down to the 30nm node; however, some of the key technology concepts, such as capacitor dielectrics with a capacitance equivalent (oxide) thickness (CET) of <0.5nm, have still to be proven. The DRAM cell sizes currently in mass production range between 8F² and 6F². Further cell size reduction to 4F² is under development. The status and scaling potential of the most probable DRAM successor candidate technologies, capacitor-less DRAM, phase-change RAM (PCRAM), and spin transfer torque MRAM (STT MRAM), will be discussed. Capacitor-less DRAM, or floating body (FB) DRAM, cells have been proposed both for stand-alone memory and for embedded memory applications. Different cell device schemes (transistor and capacitor-coupled thyristor) have been investigated. Recently a number of papers covering cell device data and integration schemes for 50nm feature sizes have been published. However, so far no results based on a high density demonstrator chip or product have been shown. PCRAM is the most mature technology among the candidates mentioned. Product demonstrators with 90nm design rules and densities up to 512Mb have been presented. The introduction of first products in 65-45nm technology for 2009 has been announced recently. Scalability of the phase change element to below 10nm has been demonstrated. Spin transfer torque (STT) MRAM has been proposed as a fast, nonvolatile, and scalable cell concept. The memory concept has been experimentally verified at structure sizes down to 50nm. Theoretical estimations indicate scalability down to 20nm. A 2Mb product demonstrator has been published, although it uses a rather large cell size. Based on these data, a comparison of the key parameters for the different technologies will be presented, and a mapping of the different technologies to the current DRAM application segments will be proposed.
Reconfigurable Hardware
Survey of reconfigurable architectures for multimedia applications
In a short period of time, the multimedia sector has progressed quickly in trying to meet customer demands for
transfer speed, storage capacity, image quality, and functionality. To cope with these stringent demands,
different hardware devices have been developed as possible choices. Although not every
device is suited to the high computational demands of multimedia applications, reconfigurable
architectures appear as ideal candidates to meet these needs. As a direct consequence, universities and
industry worldwide have increased their research activity in this area, generating an important base of know-how. In order to
organize the information generated on this topic, this paper reviews the most recent reconfigurable architectures for
multimedia applications. As a result, it establishes the benefits and drawbacks of the different dynamically
reconfigurable architectures for multimedia applications according to their system-level design.
Method for run time hardware code profiling for algorithm acceleration
In this paper we propose a method for run-time profiling of applications at the instruction level by analysis of loops. Instead
of looking for coarse-grain blocks, we concentrate on fine-grain blocks that are still costly in terms of execution time. Most
code profiling is done in software by inserting code into the application under profile, which adds time overhead, whereas
in this work data on the position of a loop, the loop body, its size and its number of executions are stored and analysed using a
small non-intrusive hardware block. The paper describes the system mapping to run-time reconfigurable systems. The synthesis
results of the fine-grain code detector block and its functional verification are also presented in the paper. To
demonstrate the concept, the MediaBench multimedia benchmark running on the chosen development platform is used.
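As a rough software illustration of the data described above (loop position, size and number of executions), the following sketch collects loop statistics from a trace of taken branches. It assumes that a taken backward branch closes a loop, which is a common heuristic; the abstract does not state how the hardware block identifies loops. The names `profile_loops` and `LoopStats`, the 4-byte instruction width, and the example addresses are all hypothetical.

```python
from dataclasses import dataclass

INSTR_BYTES = 4  # assumed fixed instruction width (hypothetical)


@dataclass
class LoopStats:
    start: int       # address of the first instruction of the loop body
    end: int         # address of the backward branch that closes the loop
    executions: int  # number of times the closing branch was taken

    @property
    def body_size(self) -> int:
        """Loop length in instructions, including the closing branch."""
        return (self.end - self.start) // INSTR_BYTES + 1


def profile_loops(taken_branches):
    """Collect loop statistics from (branch_pc, target_pc) pairs.

    A taken branch whose target is at or before its own address is treated
    as the closing branch of a loop (an assumption of this sketch).
    Loops are returned ordered by executions * body_size, largest first.
    """
    loops = {}
    for branch_pc, target_pc in taken_branches:
        if target_pc > branch_pc:
            continue  # forward branch: not a loop back-edge
        key = (target_pc, branch_pc)
        if key not in loops:
            loops[key] = LoopStats(start=target_pc, end=branch_pc, executions=0)
        loops[key].executions += 1
    return sorted(loops.values(),
                  key=lambda s: s.executions * s.body_size,
                  reverse=True)


# Example trace: an inner loop taken three times, one forward branch,
# and an outer loop taken once.
trace = [(0x120, 0x110), (0x120, 0x110), (0x120, 0x110),
         (0x130, 0x150), (0x160, 0x100)]
hot = profile_loops(trace)
for loop in hot:
    print(hex(loop.start), hex(loop.end), loop.body_size, loop.executions)
```

Ranking by executions times body size is only one possible cost estimate; a hardware block would more likely keep raw counters and leave the ranking to later analysis.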
Non-rectangular reconfigurable cores for system-on-chip
Non-rectangular cores of standard-cell-based reconfigurable logic can be used to fill space left on System-on-Chips,
thereby providing the system with hardware reconfigurability. The proposed architecture for a non-rectangular
reconfigurable core is based on a fixed set of blocks that implement logic functions, interconnections and configurable
switching. The basic blocks connect by abutment to form clusters and clusters abut to form a complete reconfigurable
core. A software tool was created to generate a gate-level netlist and the floorplan data of the reconfigurable logic core
together with a basic testbench. Cores with non-rectangular shapes were created using 90 nm and 45 nm standard-cell
technologies and validated by simulation. The results demonstrate the feasibility of a flexible, technology-independent
architecture for non-rectangular reconfigurable logic cores that can be physically implemented using a standard digital
design flow.
Using partial reconfiguration for SoC design and implementation
Most reconfigurable systems rely on FPGA technology. Among them, those that permit dynamic and partial
reconfiguration offer added benefits in flexibility, in-field device upgrade, improved design and manufacturing time, and
even, in some cases, reduced power consumption. However, dynamic reconfiguration is a complex task, and the real
benefits of its use in real applications have often been questioned.
This paper presents an overview of the application of the partial reconfiguration technique, along with four original
applications. The main goal of these applications is to test several architectures with different degrees of flexibility and to search
for the partial reconfiguration "killer application", that is, the application that best demonstrates the benefits of today's
reconfigurable systems based on commercial FPGAs. The presented applications are therefore proofs of concept
rather than fully operative and closed systems. First, a brief introduction to the topic of partially reconfigurable system applications
is included. After that, the reconfigurable systems that were created are described: first, an on-chip
communications emulation framework; second, an on-chip debugging system; third, a reconfigurable wireless sensor network
node; and finally, a remotely reconfigurable client-server device. Each application is described in a separate
section of the paper along with some tests and results. General conclusions are included at the end of the paper.
Sensors and Signal Conditioning
Polytopol computing for multi-core and distributed systems
Henk Spaanenburg,
Lambert Spaanenburg,
Johan Ranefors
Multi-core computing provides new challenges to software engineering. The paper addresses such issues in the general
setting of polytopol computing, that takes multi-core problems in such widely differing areas as ambient intelligence
sensor networks and cloud computing into account. It argues that the essence lies in a suitable allocation of free moving
tasks. Where hardware is ubiquitous and pervasive, the network is virtualized into a connection of software snippets
judiciously injected to such hardware that a system function looks as one again. The concept of polytopol computing
provides a further formalization in terms of the partitioning of labor between collector and sensor nodes. Collectors
provide functions such as a knowledge integrator, awareness collector, situation displayer/reporter, communicator of
clues and an inquiry-interface provider. Sensors provide functions such as anomaly detection (only communicating
singularities, not continuous observation), they are generally powered or self-powered, amorphous (not on a grid) with
generation-and-attrition, field re-programmable, and sensor plug-and-play-able. Together the collector and the sensor are
part of the skeleton injector mechanism, added to every node, and give the network the ability to organize itself into
some of many topologies. Finally we will discuss a number of applications and indicate how a multi-core architecture
supports the security aspects of the skeleton injector.
Parallel workload analysis in SMP platform: a new modelling approach to infer the hardware efficiency for remote sensing application
Remote sensing techniques have put great pressure on the design of real-time waveform post-processing. Due to the
intensive computation and multi-channel waveform integration, the overhead of processing time and of
storing large amounts of data prior to downlink has led us to adopt task-level parallelism. With the
development of IC design and innovation in architecture, embedded systems can range from a single microprocessor to a
complex multi-processor, even including an embedded operating system (OS), on a chip. Symmetric
multiprocessing (SMP) with an embedded OS therefore offers an attractive way to expose coarse-grained parallelism in an application.
In this paper we demonstrate a new modeling approach. In order to simplify the system, a workload model is derived
from a remote sensing application; it represents the workload characteristics and the time-degrading factors. The
intention is to use task-level parallelism to distribute the load evenly across the processors of the SMP, with OS-level testing used to
speculate on bottlenecks at the hardware level. Mapped to a 6-LEON3 SMP architecture, this parallel workload model
attains a 2.7x mean speedup over a single-LEON3 baseline; with 3 LEON3 cores it attains a 2.23x mean speedup, and with
2 LEON3 cores a 1.78x mean speedup over the same baseline. Because of shared resources and the
scheduling of multiple CPUs, the system suffers a degradation in processing speed. From this lag we can infer the
hardware pipeline efficiency. Further analysis of the processor-set subsystem and the memory subsystem reveals their
effects on system throughput.
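To put the reported speedups in perspective, the short sketch below computes the parallel efficiency (speedup divided by core count) and the Karp-Flatt experimentally determined serial fraction for each configuration. Only the three speedup figures come from the abstract; the choice of these two metrics and the function names are assumptions made for illustration, not the authors' workload model.

```python
def efficiency(speedup: float, cores: int) -> float:
    """Parallel efficiency: fraction of ideal linear speedup achieved."""
    return speedup / cores


def karp_flatt(speedup: float, cores: int) -> float:
    """Karp-Flatt metric: experimentally determined serial fraction.

    e = (1/S - 1/p) / (1 - 1/p), defined for p > 1.
    """
    return (1.0 / speedup - 1.0 / cores) / (1.0 - 1.0 / cores)


# Mean speedups over a single-LEON3 baseline, as reported in the abstract.
REPORTED = {2: 1.78, 3: 2.23, 6: 2.7}

rows = [(p, s, efficiency(s, p), karp_flatt(s, p))
        for p, s in sorted(REPORTED.items())]

for cores, speedup, eff, serial in rows:
    print(f"{cores} cores: speedup {speedup:.2f}, "
          f"efficiency {eff:.2f}, serial fraction {serial:.3f}")
```

Under these definitions the efficiency falls from 0.89 with two cores to 0.45 with six, and the estimated serial fraction grows with the core count. That pattern is consistent with the overhead from shared resources and scheduling that the abstract mentions, although the abstract itself does not report these derived figures.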
Design of a miniaturized electrochemical instrument for in-situ O2 monitoring
The authors are working toward the design of a device for the detection of oxygen, following both a discrete and an integrated
instrumentation implementation. The discrete electronics are also used for preliminary analysis, to confirm the validity of
the system concept, and this set-up will be used in the characterization of the integrated device, pending
chip fabrication.
This paper presents the design of a small, portable and low-cost potentiostat integrated with electrodes,
which can be applied to on-site measurements for the simultaneous detection of O2 and temperature in
water systems.
As a first approach, a discrete PCB has been designed based on commercial discrete electronics and specific oxygen
sensors. Dissolved oxygen concentration (DO) is an important index of water quality, and the ability to measure the
oxygen concentration and temperature at different positions and depths would be an important asset for environmental
analysis. In particular, the objective is for the sensor and the electronics to be integrated in a single encapsulated device
that can be submerged in environmental water systems and make multiple measurements.
For our proposed application a small and portable device is developed, where electronics and sensors are miniaturized
and placed in close proximity to each other.
This system would be based on the sensors and electronics, forming one module, and connected to a portable notebook to
save and analyze the measurements on-line.
The key electronics is defined by the potentiostat amplifier, used to fix the voltage between the Working (WE) and
Reference (RE) electrodes following an input voltage (Vin). Vin is a triangular signal, programmed by a LabView©
interface, which is also used to represent the CV transfers.
To obtain a smaller, more compact solution, the potentiostat amplifier is also being integrated as a full-custom ASIC
amplifier, which is in progress, with a view to a point-of-care device. These circuits have been designed in a 0.13 μm
technology from ST Microelectronics through the CMP-TIMA service.
Design for Communication Systems
Hardware implementation of a scheduler for high performance switches with quality of service (QoS) support
In this paper, the hardware implementation of a scheduler with QoS support is presented. The starting point is a
Differentiated Services (DiffServ) network model. Each switch of this network classifies the packets into flows, which are
assigned to traffic classes depending on their requirements, with an independent queue available for each traffic class.
Finally, the scheduler chooses the right queue in order to provide Quality of Service support. This scheduler considers
the bandwidth distribution, introducing the time frame concept, and the packet delay, assigning a priority to each traffic
class. The architecture of this algorithm is also presented in this paper, with a description of its functionality and complexity. The
architecture was described in Verilog HDL at the RTL level. The complete system has been implemented in a Spartan-3
1000 FPGA device using ISE software from Xilinx, demonstrating that it is a suitable design for high-speed switches.
Resonation-based hybrid continuous-time/discrete-time cascade sigma-delta modulators: application to 4G wireless telecom
This paper presents innovative architectures of hybrid Continuous-Time/Discrete-Time (CT/DT) cascade ΣΔ Modulators
(ΣΔMs) made up of a front-end CT stage and a back-end DT stage. In addition to increasing the digitized signal bandwidth
as compared to conventional ΣΔMs, the proposed topologies take advantage of the CT nature of the front-end ΣΔM stage,
by embedding anti-aliasing filtering as well as their suitability to operate up to the GHz range. Moreover, the presented
modulators include multi-bit quantization and Unity Signal Transfer Function (USTF) in both stages to reduce the integrator
output swings, and programmable resonation to optimally distribute the zeroes of the overall Noise Transfer Function
(NTF), such that the in-band quantization noise is minimized for each operation mode. Both local and inter-stage
(global) based resonation architectures are synthesized and compared in terms of their circuit complexity, resolution-bandwidth
programmability and robustness with respect to circuit non-ideal effects. The combination of all mentioned characteristics
results in novel hybrid ΣΔMs, well suited to the implementation of adaptive/reconfigurable Analog-to-Digital
Converters (ADCs) intended for the 4th Generation (4G) of wireless telecom systems.
Improved 10GBase-LX4 limiting amplifier in a low-cost 0.18 µm CMOS technology
This work overcomes the limitations of a previous work by using three high-frequency compensation techniques: pole-zero
cancellation, shunt-peaking and downscaling. Using these strategies, a fully integrated limiting amplifier in
a low-cost 0.18 μm CMOS digital process is introduced. This design improves on the original design without inductors and
without local multi-feedback loops, obtaining a compact, stable and robust design well suited to low-voltage
applications.
Multimedia Applications
Anisotropic quality measurement applied to H.264 video compression
This paper presents the results of measuring the image quality of a video compression system based on the H.264
standard using the Anisotropic Quality Index (AQI). These results have been compared with the quality measured by
means of the traditionally used Peak Signal to Noise Ratio (PSNR). The PSNR has been shown to be an unreliable way
to compute the perceptual quality of images. Although it is widely used because of its simplicity and ease of
computation, the PSNR and other methods based on measuring image differences (such as the Root Mean Squared Error,
or RMSE) suffer from not properly reflecting the real perceptual image quality. Images with the same
amount of noise can present similar PSNR values even when their perceptual appearance is very different. On the other hand, the
AQI has proven to be a more reliable way to analytically measure the perceptual image quality. This new measure is
based on the use of a particular type of high-order Rényi entropy. The method measures the anisotropy
of the image through the variance of the expected value of the pixel-wise directional image entropy. Moreover, the AQI
has the additional benefit of not needing a reference image. The reference image, compulsory in the PSNR computation,
is usually impossible to obtain in real situations, thus relegating the PSNR to test-bench developments only. The
possibility of computing the AQI opens the way to self-regulated compression systems based on the adjustment of
the parameters that exhibit the greatest influence on the final image quality. This work shows the results of compressing several
standard video sequences using the H.264 video compression standard. Compared with the PSNR, the AQI is a
better indicator of the perceptual quality of images.
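The limitation of difference-based metrics described above can be seen in a few lines. The sketch below implements the standard PSNR definition, 10·log10(peak² / MSE), for 8-bit samples and applies it to two distortions with the same mean squared error but a different spatial structure. The tiny four-pixel "images" and the function names are hypothetical, and the AQI itself is not implemented here because the abstract does not give its full definition.

```python
import math


def mse(reference, distorted):
    """Mean squared error between two equally sized sample sequences."""
    if len(reference) != len(distorted):
        raise ValueError("images must have the same number of samples")
    return sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)


def psnr(reference, distorted, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    error = mse(reference, distorted)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / error)


reference = [100, 100, 100, 100]
one_outlier = [110, 100, 100, 100]    # one pixel off by 10 -> MSE = 25
uniform_shift = [105, 105, 105, 105]  # every pixel off by 5 -> MSE = 25

print(psnr(reference, one_outlier))
print(psnr(reference, uniform_shift))
```

Both distorted versions yield the same PSNR because the metric depends only on the MSE, even though a single bright outlier and a uniform brightness shift would look quite different to a viewer. The example also makes the dependence on a reference image explicit: without `reference`, the PSNR cannot be computed at all.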
A system for emulating the broadcasting of a DAB ensemble containing data services
David Samper,
Pedro J. Lobo,
Manuel César Rodríguez,
et al.
In this paper, a system that emulates the whole DAB transmission chain, allowing the development and test of external
decoders for DAB data services, is described. DAB receivers offer the possibility of connecting an external data decoder
that handles additional data services, using a data interface called RDI. The system described in this paper replaces the
complete DAB transmission chain from the transmitter to the RDI interface of the receiver. The system generates a DAB
ensemble that can carry several data services and transmits the RDI frames corresponding to this ensemble through an
RDI output. Any type of data service can be carried by the ensemble. The purpose of the system is to be used as a debug
and verification tool for external decoder equipment that can be connected to a DAB receiver via an RDI interface. The
system has been tested with two kinds of data services (data carousels and video streaming) with very satisfactory
results in both cases. We are currently working on adding DMB support to our system.
ESL flow for a hardware H.264/AVC decoder using TLM-2.0 and high level synthesis: a quantitative study
The present paper describes an Electronic System Level (ESL) design methodology which was established and employed
in the creation of an H.264/AVC baseline decoder. The methodology involves the synthesis of the algorithmic description
of the functional blocks that comprise the decoder, using a high level synthesis tool. Optimization and design space
exploration is carried out at the algorithmic level before performing logic synthesis. Final, post-place and route
implementation results show that the decoder can operate at the target frequency of 100 MHz and meet real time
requirements for QCIF frames.
Implementation of a media synchronization algorithm for multistandard IP set-top box systems
Media synchronization in a network context minimizes the effects of network jitter and of the skew between the emitter and receiver clocks. Theoretical algorithms cannot always be implemented on real systems because of the architectural differences between a real and a theoretical system. In this paper an implementation of an intra-medium and an inter-media synchronization algorithm for a real multistandard IP set-top box is presented. For intra-medium synchronization, the proposed technique is based on controlling the receiver buffer. For inter-media synchronization, the proposed technique is based on controlling the video playback according to the Presentation Time Stamp (PTS) of the media units (audio and video). The proposed synchronization algorithms have been integrated in an IP-STB and tested in a real environment using DVD movies and TV channels, with excellent results. Those results show that the proposed algorithms can achieve media synchronization and meet the requirements of perceived quality of service (P-QoS).
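One common way to control video playback from PTS values is to treat the audio as the master clock and to present, delay, or drop each video frame depending on its skew against the audio PTS. The sketch below shows that generic scheme only; the abstract does not specify the decision rule used in the set-top box, so the `sync_video_frame` function, the `Action` names and the 40 ms tolerance are all assumptions. The 90 kHz tick rate is the PTS resolution used by MPEG systems streams.

```python
from enum import Enum

PTS_CLOCK_HZ = 90_000  # MPEG presentation time stamps tick at 90 kHz


class Action(Enum):
    PRESENT = "present"  # show the frame now
    WAIT = "wait"        # video is ahead of audio: hold the frame
    DROP = "drop"        # video is behind audio: skip the frame


def sync_video_frame(video_pts: int, audio_pts: int,
                     tolerance_ms: int = 40) -> Action:
    """Decide what to do with a video frame, using audio as the master clock.

    tolerance_ms is a hypothetical skew threshold; PTS wrap-around
    (33-bit counters) is ignored in this sketch.
    """
    tolerance_ticks = tolerance_ms * PTS_CLOCK_HZ // 1000
    skew = video_pts - audio_pts
    if skew < -tolerance_ticks:
        return Action.DROP
    if skew > tolerance_ticks:
        return Action.WAIT
    return Action.PRESENT


# Audio is at t = 1.000 s; candidate video frames at 0.900 s, 1.020 s and 1.100 s.
audio_now = 1 * PTS_CLOCK_HZ
decisions = [sync_video_frame(pts, audio_now)
             for pts in (81_000, 91_800, 99_000)]
print([d.value for d in decisions])
```

A real player would also have to handle PTS discontinuities and wrap-around, and would typically derive the audio position from the number of samples actually played rather than from the last decoded audio unit.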
Network on a Chip
Performance analysis of mixed communication architectures: bus and network-on-chip
System-on-Chip performance in terms of speed and power dissipation is becoming dominated by communication
between the cores. Communication architectures are usually based on a bus or a Network on Chip. Bus-based on-chip
communication architectures are simple and flexible. Network on Chip is a distributed communication architecture
that overcomes the bus bottleneck occurring when the number of connected cores is high. In this work we present
the integration into a SystemC NoC library of a new library for creating and simulating master and slave devices of the
AMBA AHB bus. The simulation environment has been used to evaluate performance in terms of communication
throughput and delay in different communication architectures: AMBA AHB bus, NoC and mixed.
Cache-aware network-on-chip for chip multiprocessors
This paper presents the hardware prototype of a Network-on-Chip (NoC) for a chip multiprocessor that provides support
for cache coherence, cache prefetching and cache-aware thread scheduling. A NoC with support for these cache-related
mechanisms can help improve system performance by reducing the cache miss ratio. The presented multi-core
system employs the Data-Driven Multithreading (DDM) model of execution. In DDM, thread scheduling is done
according to data availability; thus the system is aware of the threads to be executed in the near future. This characteristic
of the DDM model allows for cache-aware thread scheduling and cache prefetching. The NoC prototype is a crossbar
switch with output buffering that can support a cache-aware 4-node chip multiprocessor. The prototype is built on the
Xilinx ML506 board equipped with a Xilinx Virtex-5 FPGA.
Dynamic power management of network-on-chip
Stefano Gigli,
Luca Casagrande Montesi,
Andrea Primavera,
et al.
System-on-Chip performance in terms of speed and power dissipation is becoming dominated by communication
between the cores. To overcome the limitations of traditional bus architectures, Network-on-Chip
architectures are now being adopted. The Dynamic Power Management architecture and algorithm and the Network-on-Chip
topology and routing algorithms should be selected considering that they both affect, in a complex and complementary
way, the network throughput and power dissipation. This paper presents an analysis of the effect of
Dynamic Power Management strategies on Network-on-Chip performance.
NoC generation of an optimal memory distribution for multimedia systems
In this paper a topological analysis of different IP distributions, focusing on optimal memory placements in regular 2D meshes,
has been performed. As a case study, a real MPEG-4 decoder implementation with three memories was chosen. In
order to study the influence of memories on the topology of the network, the Arteris NoCexplorer tool was used. The results
inferred from the experiments show how the performance of a multimedia system can be improved if memories are
properly located within a NoC. Furthermore, the present work serves to validate the use of Arteris NoCexplorer for
simulating and modelling complex NoC based designs. In addition, a methodology for determining the best IP
distribution in terms of latency and throughput is presented and its feasibility is demonstrated.
Design of Analog Circuits
Flexible CMOS low-noise amplifiers for beyond-3G wireless hand-held devices
This paper explores the use of reconfigurable Low-Noise Amplifiers (LNAs) for the implementation of CMOS Radio Frequency
(RF) front-ends in the next generation of multi-standard wireless transceivers. Main circuit strategies reported so
far for multi-standard LNAs are reviewed and a novel flexible LNA intended for Beyond-3G RF hand-held terminals is
presented. The proposed LNA circuit consists of a two-stage topology that combines inductive-source degeneration with
PMOS-varactor based tuning network and a programmable load to adapt its performance to different standard specifications
without penalizing the circuit noise and with a reduced number of inductors as compared to previously reported reconfigurable
LNAs. The circuit has been designed in a 90-nm CMOS technology to cope with the requirements of the GSM,
WCDMA, Bluetooth and WLAN (IEEE 802.11b-g) standards. Simulation results, including technology and packaging
parasitics, demonstrate correct operation of the circuit for all the standards under study, featuring NF<2.8dB, S21>13.3dB
and IIP3>10.9dBm, over a 1.85GHz-2.4GHz band, with an adaptive power consumption between 17mW and 22mW from
a 1-V supply voltage. Preliminary experimental measurements are included, showing a correct reconfiguration operation
within the operation band.
Comprehensive procedural approach for transferring or comparative analysis of analogue IP building blocks towards different CMOS technologies
The challenges for the next generation of integrated circuit design of analogue and mixed-signal building blocks in
standard CMOS technologies for signal conversion demand research progress in the emerging scientific fields of device
physics and modelling, converter architectures, design automation, quality assurance and cost factor analysis. Estimation
of mismatch for analogue building blocks at the conceptual level and the impact on active area is not a straightforward
calculation. The proposed design concepts reduce the over-sizing of transistors by 15 to 20%, compared with existing methods,
for the same quality specification. Besides the reduction in silicon cost, the design time for
new topologies is also reduced considerably. Comparisons have been made for current-mode converters (ADC and DAC),
focussing on downscaled technologies. The developed method offers an integrated approach to the estimation of
architecture performance, yield and IP reuse.
Matching energy remains constant over process generations and will be the limiting factor for current signal processing.
A comprehensive understanding of all sources of mismatch, together with the use of physically based mismatch modelling in the
prediction of mismatch errors, allows more adequate and realistic sizing of all transistors and will result in an overall area reduction
of analogue IP blocks.
For each technology the following design curves are automatically developed: noise curves for a specified signal
bandwidth, choice of overdrive voltage versus lambda and output resistance, physical mismatch error modelling on
target current levels. The procedural approach shares knowledge of several design curves and speeds up the design time.
A low voltage CMOS low drop-out voltage regulator
A low-voltage implementation of a CMOS Low Drop-Out voltage regulator (LDO) is presented. Low-voltage
operation is crucial for portable devices that require extensive computation in a low-power environment. The
LDO is implemented in a generic 90nm CMOS technology. It generates a fixed 0.8V from a 2.5V supply, which on
discharging goes down to 1V. The buffer stage is an unbuffered OpAmp in unity-gain configuration with a rail-to-rail input
stage. Simulation results show that the implemented circuit provides a load regulation of 0.004%/mA and a line
regulation of -11.09mV/V. The LDO provides a full-load transient response with a settling time of 5.2μs. Further, the
dropout voltage is 200mV and the quiescent current through the pass transistor (Iload=0) is 20μA. The total power
consumption of this LDO (excluding the bandgap reference) is only 80μW.
0.18µm CMOS inductorless AGC amplifier with 50dB input dynamic range for 10GBase-LX4 ethernet
This paper presents a new automatic gain control main amplifier for 10GBase-LX4 Ethernet realized in a 0.18 μm
CMOS process. The proposed optical-fiber differential post-amplifier is based on a very compact inductorless design
which comprises three main digitally programmable gain stages followed by a buffer. It is characterized by a -3 dB
cutoff frequency above 3 GHz over a -3 to 33 dB linear-in-dB gain control. It includes a DC offset cancellation
network and an automatic gain control loop which achieves a settling time below 1 μs. Results show a sensitivity of
2.1 mV for BER = 10^-12 and an input dynamic range above 50 dB. The power consumption is 58 mW at a single supply
voltage of 1.8 V.
Smart Power
A 100mA fractional step-down charge pump with digital control
A switched-capacitor step-down DC-DC converter (charge pump) is proposed. High efficiency is achieved by
combining fractional conversion ratios (different step-down modes of operation), output voltage sensing and pulse-skipping-based
digital control techniques. Two control techniques were implemented with automatic change between
modes, and their results are discussed and compared. The power module has 9 switches, implemented with 14 power
transistors, and a current limit circuit to mitigate the in-rush current in the startup phase. This circuit has been designed in
the AMS C35B4 (0.35 μm) CMOS process.
The charge pump was designed to provide a maximum load current of 100 mA. The peak-to-peak output voltage ripple is
less than 30 mV with two 3 μF flying capacitors and one 20 μF output capacitor. Peak and average efficiencies, with
maximum load current, are over 80% and 68%, respectively.
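The benefit of fractional conversion ratios follows from the ideal efficiency limit of a switched-capacitor step-down converter, η = Vout / (M · Vin), where M is the conversion ratio of the active mode. The sketch below selects the smallest ratio that can still reach the target output and reports that limit. The set of ratios, the example voltages and the function names are hypothetical, since the abstract does not list the modes actually implemented, and losses in switches and control are ignored.

```python
from fractions import Fraction

# Hypothetical set of step-down modes; the abstract does not list the real ones.
RATIOS = [Fraction(1, 3), Fraction(1, 2), Fraction(2, 3), Fraction(1, 1)]


def pick_ratio(v_in: float, v_out: float, ratios=RATIOS) -> Fraction:
    """Return the smallest ratio M whose ideal no-load output M*v_in reaches v_out."""
    for m in sorted(ratios):
        if float(m) * v_in >= v_out:
            return m
    raise ValueError("target output is above the input voltage")


def ideal_efficiency(v_in: float, v_out: float, ratio: Fraction) -> float:
    """Upper bound on efficiency for an ideal switched-capacitor converter."""
    return v_out / (float(ratio) * v_in)


# Example operating points (assumed values): a 1.5 V output from two input levels.
for v_in in (3.6, 2.7):
    m = pick_ratio(v_in, 1.5)
    print(f"Vin={v_in} V: mode {m}, ideal efficiency "
          f"{ideal_efficiency(v_in, 1.5, m):.2f}")
```

With a single 1:1 mode the same 3.6 V to 1.5 V conversion would be limited to about 42% efficiency, which is why switching between modes as the input voltage changes matters. Regulation within a mode, for example by pulse skipping as mentioned in the abstract, then closes the gap between M · Vin and the target output.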
ModelSim-PSIM mixed signal simulation for power electronics digital control design
In the design of power electronics converters, several approaches can be chosen for the implementation of the closed-loop
control. The use of a digital control loop implemented in an FPGA is becoming quite common. For the design of
such a system, a simulation environment must be provided to check that the digital and analog parts work together. The
simulation of both the analog and digital parts is a very difficult task, which involves the simultaneous use of an analog
and a digital simulator, or the use of a mixed-signal simulator. In this paper, we present a method to perform mixed-signal
simulation. The simulation is performed by linking the PSIM analog simulator and the ModelSim digital simulator. This
method has proven to be very effective in the design of digital control circuits for power converters implemented in
FPGAs.
Dynamic OSR sigma delta controller for monolithic switching converters
A digital controller for high-frequency switching power supplies based on sigma-delta modulation is proposed in this work. A technique to restrict the average switching frequency to a suitable range is used. The complete system has been modelled and simulated at system level using the SystemC-WMS environment. A high-precision controller
has been designed with a relatively low clock frequency and area occupancy of the sigma-delta modulator, while at the same time reducing the sensitivity to statistical parameter variations and to temperature drift.
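As background for the modulation scheme named in this abstract, a first-order sigma-delta modulator can be sketched as below. This is a generic textbook illustration and not the controller proposed in the paper: the function name, the [0, 1] input range, and the 0.5 quantizer threshold are all assumptions, and neither the dynamic oversampling ratio nor the switching-frequency restriction is modelled.

```python
def sigma_delta_first_order(samples):
    """Encode values in [0, 1] as a 1-bit stream whose average tracks the input.

    Hypothetical helper for illustration only.
    """
    integrator = 0.0   # accumulates the error between input and fed-back output
    feedback = 0.0     # previous output bit, fed back as 0.0 or 1.0
    bits = []
    for x in samples:
        integrator += x - feedback
        bit = 1 if integrator >= 0.5 else 0   # 1-bit quantizer
        feedback = float(bit)
        bits.append(bit)
    return bits


if __name__ == "__main__":
    stream = sigma_delta_first_order([0.25] * 1000)
    print(sum(stream) / len(stream))
```

The density of ones in the output stream approximates the input level, which is why a longer averaging window (a higher oversampling ratio) gives a finer effective resolution from a 1-bit quantizer.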
Test and Reliability
A new approach to accelerate SEU sensitivity evaluation in circuits with embedded memories
Current circuit complexity requires faster fault injection techniques to allow the evaluation of a high number of faults in
a reasonable time. In particular, FPGA emulation has proven to be a performance-effective method to analyze the
behavior of digital circuits in the presence of soft errors due to SEU effects. In general, the fault emulation-based solutions
in the literature that use circuit instrumentation to inject faults do not consider fault emulation in circuits with
embedded memories. The few existing proposals that study this kind of circuit are oriented to injecting faults in
microprocessors, are slow compared with injection in flip-flops, and have a poor capacity to analyze the
circuit behavior due to the limited accessibility of memories (one memory word per clock cycle). Embedded memories are
increasingly common and large in modern designs, and therefore the emulation of embedded memories is a problem
of rising importance. The models proposed in this work allow fault emulation in embedded memories,
injecting faults and observing their effects in a fast way.
Analysis of current transients in SRAM memories for single event upset detection
Soft errors resulting from the impact of charged particles are emerging as a major issue in the design of reliable circuits
at deep sub-micron dimensions, even at ground level. To face this challenge, a designer must have at their disposal a variety of
mitigation schemes adapted to their specific design constraints. Built-In Current Sensors have been proposed as a
detection scheme for single event upsets in SRAM. In this paper, power-bus current transients in SRAM memories for
single event upset detection have been analyzed in a 65nm CMOS technology. The different types of current roles
which are applied during the simulation are discussed. The results show the important contribution of leakage currents in
the response of the memory cell to an external event.
Automated insertion of twin gates to improve reliability concerning gate oxide breakdown
Scaling device dimensions towards atomic scales leads to increased reliability and yield concerns, which considerably
affect the work of integrated circuit designers. Furthermore, the complexity of integrated systems is increasing, which leads
to a demand for tool-assisted reliability insertion during the design process. Many research efforts have focused on soft errors
and system-level approaches. However, only a few low-level solutions have been published to enhance lifetime
reliability. Investigations in this field have shown an increase in reliability of up to 200% concerning gate oxide breakdown
when so-called Twin Gates are inserted. This contribution comprehensively presents algorithms to implement these
redundant cells automatically during logic synthesis. Besides the placement in the whole design process, approaches are
provided to insert Twin Gates correctly considering timing and area issues.
Digital Design
Static power dissipation in adder circuits: the UDSM domain
This paper presents adder circuits of various architectures aimed at reducing static power dissipation. Circuit
topologies for basic building blocks were evaluated for fabrication technologies of 65nm down to 32nm, and
simulation results are presented. This work has led to the development of various low-power adder circuits and
provides comparative analysis leading to the recommendation that a variable size block carry select adder is the
best performer, taking into consideration both static and dynamic power dissipation.
Approach to an FPGA embedded, autonomous object recognition system: run-time learning and adaptation
Neural networks, widely used in pattern recognition, security applications and robot control, have been chosen for the task of object recognition within this system. One of the main drawbacks of implementing traditional neural networks in reconfigurable hardware is their huge resource demand. This is due not only to their intrinsic parallelism, but also to the large networks traditionally designed. However, modern FPGA architectures are well suited to this kind of massively parallel computation. Therefore, our proposal is the implementation of Tiny Neural Networks (TNN, a self-coined term) in reconfigurable architectures. One of the most important features of TNNs is their learning ability. What we show here is therefore an attempt to raise the autonomy of the system by triggering a new learning phase at run-time when necessary. In this way, autonomous adaptation of the system is achieved. The system performs shape identification by the interpretation of object singularities. This is achieved by interconnecting several specialized TNNs that work cooperatively. In order to validate the research, the system has been implemented and configured as a perceptron-like TNN with backpropagation learning and applied to the recognition of shapes. Simulation results show that this architecture has significant performance benefits.
Tiled architecture of a CNN-mostly IP system
Lambert Spaanenburg,
Suleyman Malki
Multi-core architectures have been popularized with the advent of the IBM CELL. On a finer grain the problems in
scheduling multi-cores have already existed in the tiled architectures, such as the EPIC and Da Vinci. It is not easy to
evaluate the performance of a schedule on such an architecture, as historical data are not available. One solution is to
compile algorithms for which an optimal schedule is known by analysis. A typical example is an algorithm that is
already defined in terms of many collaborating simple nodes, such as a Cellular Neural Network (CNN). A simple node
with a local register stack together with a 'rotating wheel' internal communication mechanism has been proposed.
Though the basic CNN allows for a tiled implementation of a tiled algorithm on a tiled structure, a practical CNN system
will have to disturb this regularity by the additional need for arithmetical and logical operations. Arithmetic operations
are needed, for instance, to accommodate low-level image processing, while logical operations are needed to fork and
merge different data streams without use of the external memory. It is found that the 'rotating wheel' internal
communication mechanism still handles such mechanisms without the need for global control. Overall the CNN system
provides for a practical network size as implemented on an FPGA, can be easily used as embedded IP and provides a clear
benchmark for a multi-core compiler.
Optimization of input-constrained systems
Suleyman Malki,
Lambert Spaanenburg
The computational demands of algorithms are rapidly growing. The naive implementation uses extended double-precision
floating-point numbers and therefore has extreme difficulty in maintaining real-time performance. For fixed-point
numbers, the value representation pushes in two directions (value range and step size) to set the application-dependent
word size. In the general case, checking all combinations of all different values on all system inputs will easily
become computationally infeasible. Checking corner cases only helps to reduce the combinatorial explosion, as still
checking for accuracy and precision to limit word size remains a considerable effort. A range of evolutionary techniques
have been tried where the sheer size of the problem withstands an extensive search. When the value range can be limited,
the problem becomes tractable and a constructive approach becomes feasible. We propose an approach that is
reminiscent of the Quine-McCluskey logic minimization procedure. Next to the conjunctive search as popular in
Boolean minimization, we investigate the disjunctive approach that starts from a presumed minimal word size. To
eliminate the occurrence of anomalies, this still has to be checked for larger word sizes. The procedure has initially been
implemented using Java and Matlab. We have applied the above procedure to feed-forward and to cellular neural
networks (CNN) as typical examples of input-constrained systems. In the case of hole-filling by means of a CNN, we
find that the 1461 different coefficient sets can be reduced to 360, each giving robust behaviour on 7-bit internal words.
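The two directions named in this abstract, value range and step size, can be illustrated with a small calculation of the fixed-point word size they imply. This is a minimal sketch under stated assumptions and not the authors' procedure: the function name and the two's-complement sign-bit convention are hypothetical choices made for illustration.

```python
import math


def fixed_point_word_size(max_abs_value, step_size, signed=True):
    """Return (integer_bits, fraction_bits, total_bits) for a fixed-point format.

    Hypothetical helper: the value range sets the integer bits and the
    step size sets the fraction bits.
    """
    if max_abs_value <= 0 or step_size <= 0:
        raise ValueError("max_abs_value and step_size must be positive")
    # Fraction bits: smallest n such that 2**-n <= step_size.
    fraction_bits = max(0, math.ceil(-math.log2(step_size)))
    # Integer bits: enough to represent the integer part of max_abs_value.
    integer_bits = max(0, math.floor(math.log2(max_abs_value)) + 1)
    total_bits = integer_bits + fraction_bits + (1 if signed else 0)
    return integer_bits, fraction_bits, total_bits


if __name__ == "__main__":
    print(fixed_point_word_size(3.0, 0.125))
```

Widening the value range adds bits on the left and refining the step size adds bits on the right, so the word size grows from both ends, which is why bounding the input range makes the search tractable.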
Poster Session
An adaptable interface between resistive sensors and microcontrollers
The increasing application of sensor networks in many different fields is causing a growing demand for low-cost passive
sensors for monitoring physical variables such as temperature, pressure or ambient humidity. These sensors need a
conditioning circuit that allows an easy interface to a microcontroller, taking advantage of the full range of the sensor
and reducing the microcontroller requirements. This paper presents conditioning electronics designed to transform the
output of low-cost resistive sensors into a variable-frequency signal. The circuit is designed to use the full frequency range
available, providing good resolution. These quasi-digital signals are compatible with the logic levels of a standard low-power
microcontroller, allowing connection through a digital I/O port.
Considerations on the design of conventional receivers for wireless optical channels using a Monte Carlo based ray-tracing algorithm
This paper presents a study of the design of a conventional receiver structure that offers improved performance with
respect to the main IR channel parameters, such as path loss and rms delay spread. To this end, we use a recently
proposed model for the effective signal-collection area of a conventional angle-diversity receiver that is nearer to real
behaviour than the ideal model. The inclusion of this model in the Monte Carlo ray-tracing algorithm allows us to study
those optical links that are characterized by the use of these receivers and investigate the structure of the conventional
receiver that yields improved performance with respect to the IR channel parameters. Based on the obtained results, we
propose the use of a conventional receiver composed of seven branches of photodiodes: one oriented towards the ceiling
and six looking at an elevation of 56° with a separation of 60° in azimuth. For each element, a CPC with a FOV of 50° must
be used. Furthermore, the proposed structure is evaluated in a representative link budget using L-PPM modulation
schemes.
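The receiver geometry described in this abstract can be made concrete by computing the pointing direction of each branch. The sketch below assumes that elevation is measured from the horizontal plane, that the z axis points to the ceiling, and that the first tilted branch sits at 0° azimuth; none of these conventions, nor the function name, is stated in the abstract.

```python
import math


def branch_orientations(elevation_deg=56.0, azimuth_step_deg=60.0, side_branches=6):
    """Return unit vectors (x, y, z) for one ceiling-facing branch plus tilted branches.

    Hypothetical helper; the axis conventions are assumptions.
    """
    vectors = [(0.0, 0.0, 1.0)]  # branch oriented towards the ceiling
    elevation = math.radians(elevation_deg)
    for k in range(side_branches):
        azimuth = math.radians(k * azimuth_step_deg)
        vectors.append((
            math.cos(elevation) * math.cos(azimuth),
            math.cos(elevation) * math.sin(azimuth),
            math.sin(elevation),
        ))
    return vectors


if __name__ == "__main__":
    for vec in branch_orientations():
        print(tuple(round(c, 3) for c in vec))
```

Vectors of this kind are what a ray-tracing simulation would compare against each incoming ray direction to decide whether the ray falls within the 50° field of view of a branch.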
Autonomous low-noise system for broadband measurements of the cosmic microwave background radiation
This paper describes the digital side implementation of a new suborbital experiment for the measurement of broadband
radiation emissions of the Cosmic Microwave Background (CMB) anisotropy. The system has been used in campaign
mode for initial mapping of the galactic radiation power received at a single frequency. The recorded galactic sky map
images are subsequently being used to forecast the emitted radiation at neighboring frequencies. A planned second
campaign will verify the efficiency of the prediction algorithms in an autonomous manner. The system has reached an
advanced stage in terms of hardware and software combined operation and intelligence, where other Space Physics
measurements are performed autonomously depending on the burst event under investigation. The system has been built
in a modular manner to expedite hardware and software upgrades. Such an upgrade has recently occurred mainly to
expand the frequency range of space observations.