Proceedings Volume 5117

VLSI Circuits and Systems

View the digital version of this volume at SPIE Digital Library.

Volume Details

Date Published: 21 April 2003
Contents: 14 Sessions, 63 Papers, 0 Presentations
Conference: Microtechnologies for the New Millennium 2003
Volume Number: 5117

Table of Contents

  • Image Processing I
  • Mixed Circuits I
  • Data Communications I
  • Technology I
  • VLSI Architectures I
  • CAD I
  • Data Communications II
  • Mixed Circuits II
  • VLSI Architectures II
  • Poster Session
  • VLSI Architectures II
  • Image Processing II
  • Technology II
  • Modeling
  • CAD II
  • Poster Session
Image Processing I
VLSI architecture for MPEG-4 core profile video codec with accelerated bitstream processing
A VLSI architecture with flexible, application-specific coprocessors for object-based video encoding/decoding is presented. The architecture consists of a standard embedded RISC core, as well as coprocessor modules for macroblock algorithms, motion estimation and bitstream processing. Bitstream decoding involves strong data dependencies, which require optimized logical partitioning. An optimized instruction set can speed up bitstream decoding by a factor of two. This architecture combines the high performance of dedicated ASIC architectures with the flexibility of programmable processors. Dataflow and memory access were optimized based on extensive studies of statistical complexity variations. Results on the gate count and clock rate required for real-time processing of MPEG-4 Core Profile video are presented, as well as a comparison with software implementations on a standard RISC architecture.
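The data dependency mentioned above can be seen in a minimal software sketch: the position of each field is known only after the previous variable-length code has been decoded, so fields cannot be parsed in parallel. The `BitReader` class and the four-entry code table below are hypothetical illustrations (real MPEG-4 VLC tables are much larger), not the paper's coprocessor instruction set.

```python
class BitReader:
    """Minimal MSB-first bit reader (hypothetical helper for illustration)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # index of the next unread bit

    def show_bits(self, n: int) -> int:
        """Return the next n bits without consuming them (zeros past the end)."""
        value = 0
        for i in range(n):
            bit_index = self.pos + i
            byte_index = bit_index // 8
            bit = 0
            if byte_index < len(self.data):
                bit = (self.data[byte_index] >> (7 - bit_index % 8)) & 1
            value = (value << 1) | bit
        return value

    def skip_bits(self, n: int) -> None:
        self.pos += n

    def get_bits(self, n: int) -> int:
        value = self.show_bits(n)
        self.skip_bits(n)
        return value


# Hypothetical prefix-free code table (bit string -> symbol).
VLC_TABLE = {"1": 0, "01": 1, "001": 2, "000": 3}


def decode_vlc(reader: BitReader) -> int:
    """Decode one symbol; its length, and hence the next field's start, is only
    known once a complete code word has been matched."""
    code = ""
    while code not in VLC_TABLE:
        code += str(reader.get_bits(1))
    return VLC_TABLE[code]
```

Because `decode_vlc` consumes bits one at a time until a code word matches, the start of each symbol depends on every symbol before it; this serial show/skip-bits loop is the kind of operation that specialized bitstream instructions aim to shorten.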
FPGA implementation of Santos-Victor optical flow algorithm for real-time image processing: an useful attempt
Pedro Cobos Arribas, Felix Monasterio Huelin Macia
This paper describes an FPGA-based hardware implementation of the Santos-Victor optical flow algorithm, which is useful in robot guidance applications. The system contains an ALTERA FPGA (20K100), an interface to a digital camera, three VRAM memories to hold the input data, and output memories (a VRAM and an EDO) to hold the results. The system has previously been used to develop and test other vision algorithms, such as image compression and optical flow calculation with differential and correlation methods. The system can connect the digital camera, or the FPGA output (the results of the algorithms), to a PC through its FireWire or USB port. The problems encountered in this case have motivated the adoption of a different hardware structure for certain vision algorithms with special requirements that demand very computation-intensive processing.
Integer cosine transform chip design for image compression
Gustavo A. Ruiz, Juan A. Michell, Angel M. Buron, et al.
The Discrete Cosine Transform (DCT) is the most widely used transform for image compression. The Integer Cosine Transform denoted ICT (10, 9, 6, 2, 3, 1) has been shown to be a promising alternative to the DCT due to its implementation simplicity, similar performance and compatibility with the DCT. This paper describes the design and implementation of an 8×8 2-D ICT processor for image compression that meets the numerical characteristics required by IEEE Std 1180-1990. To attain high throughput and a continuous data flow, the processor uses a low-latency data flow that minimizes internal memory and a parallel pipelined architecture based on a numerical strength reduction of the Integer Cosine Transform (10, 9, 6, 2, 3, 1) algorithm. A prototype of the 8×8 ICT processor has been implemented using a standard-cell design methodology in a 0.35-μm CMOS CSD 3M/2P 3.3 V process on a 10 mm² die. Pipeline circuit techniques have been used to attain the maximum operating frequency allowed by the technology: the critical path is 1.8 ns, which should be increased by 20% to allow for line delays, placing the estimated operating frequency at 500 MHz. The circuit includes 12,446 cells, 6,757 of which are flip-flops. Two clock signals are distributed, an external one (fs) and an internal one (fs/2). The large number of flip-flops forced the use of a strategy to minimize clock skew, combining large buffers on the periphery with wide metal lines (clock trunks) to distribute the signals.
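As a hedged illustration of why the ICT suits integer hardware, the sketch below builds the 8×8 ICT(10, 9, 6, 2, 3, 1) kernel, assuming the usual layout in which a, b, c, d = 10, 9, 6, 2 replace the odd DCT basis coefficients and e, f = 3, 1 replace those of rows 2 and 6. The rows are mutually orthogonal because ab = ac + bd + cd (90 = 60 + 18 + 12), so T·Tᵀ is diagonal and the transform is exactly invertible after a per-row scaling. This is a direct matrix form, not the strength-reduced pipelined algorithm used in the chip; the function names are illustrative.

```python
from fractions import Fraction

# ICT(10, 9, 6, 2, 3, 1): a, b, c, d weight the odd rows; e, f weight rows 2 and 6.
A, B, C, D, E, F = 10, 9, 6, 2, 3, 1

T = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [A, B, C, D, -D, -C, -B, -A],
    [E, F, -F, -E, -E, -F, F, E],
    [B, -D, -A, -C, C, A, D, -B],
    [1, -1, -1, 1, 1, -1, -1, 1],
    [C, -A, D, B, -B, -D, A, -C],
    [F, -E, E, -F, -F, E, -E, F],
    [D, -C, B, -A, A, -B, C, -D],
]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(x):
    return [list(row) for row in zip(*x)]


# Squared row norms: the diagonal of T * T^T (off-diagonal entries are zero).
NORMS = [sum(v * v for v in row) for row in T]


def ict2d(block):
    """Unnormalized forward 2-D ICT, Y = T X T^T, using integer arithmetic only."""
    return matmul(matmul(T, block), transpose(T))


def iict2d(coeffs):
    """Exact inverse, X = T^T N^-1 Y N^-1 T, with N = diag(NORMS)."""
    scaled = [[Fraction(coeffs[i][j], NORMS[i] * NORMS[j]) for j in range(8)]
              for i in range(8)]
    return matmul(matmul(transpose(T), scaled), T)
```

In a codec, the per-coefficient factors 1/(N_i N_j) would typically be folded into quantization, leaving only integer operations in the transform itself.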
Mapping of real-time and low-cost super-resolution algorithms onto a hybrid video encoder
Gustavo Marrero Callico, Rafael Peset Llopis, Antonio Nunez, et al.
This paper focuses on mapping low-cost, real-time super-resolution (SR) algorithms onto SOC (System-On-Chip) platforms in order to achieve high-quality image improvements. Low cost is achieved by avoiding the need for dedicated SR hardware and re-using a video encoder architecture. Only small modifications are needed to the motion estimator, the motion compensator, the image loop memory, etc. This encoder can be used either in compression mode or in SR mode. This video encoder, together with the new SR features, constitutes an IP block inside Philips Research, upon which several SOC platforms are being developed. Although this SR algorithm has been implemented on an encoder architecture developed by Philips Research, it can easily be mapped onto other hybrid video encoder platforms. The results show important improvements in image quality. Based on these results, some generalizations can be made about the impact of the sampling process on the quality of the super-resolution image.
Mixed Circuits I
State of the art in CMOS threshold logic VLSI gate implementations and applications
In recent years, there has been renewed interest in Threshold Logic (TL), mainly as a result of the development of a number of successful implementations of TL gates in CMOS. This paper presents a summary of the recent developments in TL circuit design. High-performance TL gate circuit implementations are compared, and a number of their applications in computer arithmetic operations are reviewed. It is shown that the application of TL in computer arithmetic circuit design can yield designs with significantly reduced transistor count and area while at the same time reducing circuit delay and power dissipation when compared to conventional CMOS logic.
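A threshold logic gate outputs 1 when the weighted sum of its inputs reaches a threshold. As a small, hedged example of why TL can reduce gate count in arithmetic, the classic two-gate full adder below computes the carry as a majority function and feeds it back with weight -2 to obtain the sum. It models logical behavior only, not any particular CMOS TL circuit reviewed in the paper; the function names are illustrative.

```python
def threshold_gate(inputs, weights, threshold):
    """Threshold logic gate: output 1 iff sum(w_i * x_i) >= threshold."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)


def tl_full_adder(a, b, cin):
    """Full adder built from two threshold gates."""
    # Carry is the 3-input majority function: unit weights, threshold 2.
    carry = threshold_gate([a, b, cin], [1, 1, 1], 2)
    # Sum reuses the carry with weight -2: a + b + cin - 2 * carry >= 1.
    s = threshold_gate([a, b, cin, carry], [1, 1, 1, -2], 1)
    return s, carry
```

A conventional static CMOS full adder needs considerably more logic stages for the sum, which is the kind of saving the reviewed TL designs exploit.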
New efficient offset voltage cancellation techniques using digital trimming
Operational amplifiers play an important role as basic building blocks in analog circuit design. One of the performance limitations of these circuits is the input-referred offset voltage, or simply input offset voltage. This voltage can range between 1 and 30 mV depending on the fabrication process and the sizes of the ideally symmetrical input transistors of the differential amplifier. Two new techniques to digitally trim the offset voltage of an operational amplifier are presented and discussed. The first is called the weighted current technique, and the second the weighted voltage technique. The attractive features of the new techniques are that trimming is performed digitally, a large dynamic range, a small silicon area, and the ability to provide auto-zero cancellation using extra hardware. In the presented analysis, a binary-weighted scheme is used; however, the techniques are not restricted to that scheme and remain applicable with other weighting schemes. A detailed analysis of these techniques is presented and discussed, together with measurements of fabricated circuits and simulation results.
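The abstract does not specify how the trim code is determined. As one hedged possibility, the sketch below assumes a binary-weighted trim array whose correction is (code - 2^(N-1)) × LSB, plus a comparator that senses the sign of the residual offset, and finds the code by successive approximation. The function name, bit count and 0.5 mV LSB are hypothetical.

```python
def sar_trim(vos_mv: float, n_bits: int = 6, lsb_mv: float = 0.5) -> int:
    """Successive-approximation search for a binary-weighted trim code.

    Assumed model (hypothetical): the trim array subtracts a correction of
    (code - 2**(n_bits - 1)) * lsb_mv from the input offset vos_mv, and a
    comparator reports whether the residual offset is still non-negative.
    """
    def correction(code: int) -> float:
        return (code - 2 ** (n_bits - 1)) * lsb_mv

    code = 0
    for bit in reversed(range(n_bits)):  # MSB first
        trial = code | (1 << bit)
        # Keep the bit only if the residual offset stays non-negative.
        if vos_mv - correction(trial) >= 0:
            code = trial
    return code
```

With 6 bits and a 0.5 mV LSB, the assumed array covers roughly -16 to +15.5 mV, and the residual offset after trimming lies between 0 and one LSB; for example, a 7.3 mV offset gives code 46 and a residual of 0.3 mV.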
Novel 1.25-Gb/s CMOS burst mode optical receiver with automatic gain controllable preamplifier and a highly sensitive peak detector without external reset signal
Ja-Won Seo, Seop Han, Quan Le, et al.
This paper introduces a novel feed-forward burst-mode optical receiver with an automatic-gain-controllable preamplifier and a highly sensitive peak detector, implemented in a 0.18-μm CMOS technology. The key feature of the receiver is that it operates with a reset signal that is generated inside the receiver rather than applied externally. The receiver can be used in Ethernet PON (Passive Optical Network) systems and has a sensitivity of -28 dBm and an overload of -8 dBm for a 1.25 Gb/s optical input signal.
Data Communications I
Scheduling components for multigigabit network SoCs
Theofanis Orphanoudakis, George Kornaros, Ioannis Papaefstathiou, et al.
To meet the demand for higher performance, flexibility, and economy in today's state-of-the-art networks, great emphasis is placed on unconventional hardware architectures for network processors. This paper analyzes the problem of internal resource and traffic management in the processor and proposes a programmable scheduler architecture, implemented in a novel protocol processor, that deals with these problems in an integrated way. We briefly outline the architecture of the protocol processor and argue that the innovative scheduling scheme integrated in PRO3 is, in general, crucial for network Systems-on-Chip since it makes it feasible to use […] scheduler's architecture are discussed that lead to efficient integration of the component into different network processor architectures at a similar cost. Its beneficial features are easy hardware implementation, low memory bandwidth requirements, and high flexibility: it supports multiple service disciplines in a programmable way, handles thousands of flows, and can even perform different scheduling tasks.
CMOS mixed-signal MODEM for data transmission and control of electrical household appliances using a low-voltage power line
Sara Escalera, Carlos Manuel Dominguez-Matas, Jose Manuel Garcia-Gonzalez, et al.
This paper presents a CMOS mixed-signal MODEM ASIC for data transmission on the low-voltage power line. The circuit includes all the analog circuitry needed for input interfacing and modulation/demodulation (PLL-based frequency synthesis, slave filter banks tuned by a PLL master VCO, and decision circuitry), plus the logic circuitry needed for control purposes. To allow communication between electrical household appliances and a remote unit that controls them, and to reduce cost, a single mixed-signal ASIC has been designed in two parts, one operating at high frequencies and the other at lower frequencies. The High Frequencies Module must allow the connection with external control systems and, to ensure reasonable robustness, must be able to send and receive signals on at least two different channels (to avoid local and temporary degradations of the communication). The Low Frequencies Module manages indoor communication. This module enables the transmission of signals over distances between 50 and 100 meters at a rate of the order of, but never less than, 100 bit/s. This link should use a frequency range that makes as many channels as possible available, to allow the control of as many different in-house devices as possible. Again, to this end, two different tunable channels have to be available simultaneously: one to monitor the quality of the signal and the other to carry the actual communication.
High-speed clock recovery unit based on a phase aligner
Efrain Tejera, Roberto Esper-Chain, Felix Tobajas, et al.
Clock recovery units are key elements in today's high-speed digital communication systems. For efficient operation, these units should generate a low-jitter clock from the received NRZ data and tolerate long absences of transitions. Architectures based on Hogge phase detectors have been widely used; however, they are very sensitive to jitter in the received data and have limited tolerance to the absence of transitions. This paper presents a novel high-speed clock recovery unit based on a phase aligner. The system allows very fast clock recovery with low jitter and is very resistant to the absence of transitions. The design is based on eight phases obtained from a reference clock running at the nominal frequency of the received signal. This high-speed reference clock is generated using a crystal and a clock multiplier unit. As a starting point, the phase alignment system chooses the two phases closest to the data phase, which limits the error between the clock and data phases to at most 45 degrees. Furthermore, the system includes a feedback loop that interpolates between the chosen phases to reduce the phase error to zero. Owing to the high stability and tight tolerance of the local reference clock, the jitter is greatly reduced and the system can operate under long absences of transitions. These features make the design suitable for systems such as high-speed serial links. The system has been designed in 0.25-μm CMOS at 1.25 GHz and verified through HSpice simulations.
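The phase-selection step can be sketched numerically: with eight phases spaced 45 degrees apart, the two phases that bracket the data phase are each within 45 degrees of it, and an interpolation weight then closes the remaining gap. The sketch assumes ideal linear interpolation and computes the weight directly, whereas the described system reaches it through a feedback loop; the function names are illustrative.

```python
NUM_PHASES = 8
PHASE_STEP_DEG = 360.0 / NUM_PHASES  # 45 degrees between reference phases


def select_phases(data_phase_deg):
    """Pick the two reference phases that bracket the data phase.

    Returns (index_low, index_high, weight), where weight in [0, 1) is the
    interpolation factor that would align the clock with the data phase.
    """
    phase = data_phase_deg % 360.0
    raw_index = int(phase // PHASE_STEP_DEG)
    weight = (phase - raw_index * PHASE_STEP_DEG) / PHASE_STEP_DEG
    index_low = raw_index % NUM_PHASES  # wrap-around guard for phase == 360.0
    index_high = (index_low + 1) % NUM_PHASES
    return index_low, index_high, weight


def interpolated_phase(index_low, weight):
    """Ideal linear interpolation between two adjacent reference phases."""
    return ((index_low + weight) * PHASE_STEP_DEG) % 360.0
```

For a data phase of 100 degrees, `select_phases` returns phases 2 and 3 (90 and 135 degrees) with a weight of about 0.22.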
Practical high-level methodology case study: implementation of an ATM over SDH transceiver from the system specification
Ruben Arteaga, Felix Tobajas, Roberto Esper-Chain, et al.
Today's digital networks require architectures based on standards that are implemented independently of the technology. Moreover, these network specifications can easily change to include new services. For these reasons, the dominant trend is to design and verify systems at a high level, prior to technology mapping. In this paper, a methodology is proposed to obtain a fully verified system from an architecture specification. A system specification document is used as the starting point. The specified system is partitioned into medium-sized modules, and each module is then described in its own design specification document, which is the basis of its implementation in a Hardware Description Language (HDL). Each module is verified against test specification documents generated along with the design specification. Finally, all the modules are interconnected and verified using an automatic test vector generator. This approach, first, provides a method that improves the reliability of the results independently of the size of the system and, second, documents all the steps performed during the different stages. To validate this methodology, the SDH (Synchronous Digital Hierarchy) and ATM (Asynchronous Transfer Mode) standards were chosen. Based on these standards, an ATM over SDH transceiver with add/drop functionality is studied, designed, implemented and verified. The system is described and verified in HDL and then synthesized into FPGA devices. The results show that a complex digital system has been developed that meets its specifications while optimizing human resources and engineering effort. The methodology encourages documentation while the system is being developed, easing knowledge transfer.
Technology I
High-bandwidth low-latency global interconnect
Christer M Svensson, Peter Caputa
Global interconnects have been identified as a serious limitation to chip scaling, due to their limited bandwidth and large delay. A critical analysis of the intrinsic limitations of electrical interconnect indicates that these limitations can be overcome. This basic analysis is presented, together with design constraints. We demonstrate a scheme for this, based on the use of upper-level metals as transmission lines. A global communication architecture based on a globally mesochronous, locally synchronous approach allows a very high data rate per wire and therefore very high bandwidth in buses of limited width. As an example, we demonstrate a 320-μm-wide bus with a capacity of 160 Gb/s in a nearly standard 0.18-μm process.
Leakage control for deep-submicron circuits
Kaushik Roy, Hamid Mahmoodi-Meimand, Saibal Mukhopadhyay
High leakage current in the deep-submicron regime is becoming a significant contributor to the power dissipation of CMOS circuits as threshold voltage, channel length, and gate oxide thickness are scaled down with every technology generation. Consequently, the identification and modeling of the different leakage components is very important for the estimation and reduction of leakage power, especially in low-power applications. This paper considers various intrinsic transistor leakage mechanisms, including weak inversion, drain-induced barrier lowering, gate-induced drain leakage, gate oxide tunneling, and band-to-band tunneling, and explores different techniques to reduce leakage power consumption in scaled technologies.
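Of the mechanisms listed above, weak-inversion (subthreshold) current with drain-induced barrier lowering is commonly captured by the textbook expression below, which shows why lowering the threshold voltage raises leakage exponentially. The slope factor n, the DIBL coefficient and I0 used here are illustrative assumptions, not values from the paper.

```python
import math

THERMAL_VOLTAGE_V = 0.02585  # kT/q at about 300 K


def subthreshold_current(i0, vgs, vds, vth0, n=1.5, dibl=0.1, vt=THERMAL_VOLTAGE_V):
    """Weak-inversion drain current with a linear DIBL term (textbook model).

    I = I0 * exp((Vgs - Vth) / (n * vT)) * (1 - exp(-Vds / vT)),
    with Vth = Vth0 - dibl * Vds. Default n and dibl are illustrative only.
    """
    vth = vth0 - dibl * vds  # DIBL lowers the effective threshold
    return i0 * math.exp((vgs - vth) / (n * vt)) * (1.0 - math.exp(-vds / vt))
```

With n = 1.5, lowering Vth0 from 0.4 V to 0.3 V multiplies the off-state current by about 13, consistent with a subthreshold swing of n·vT·ln 10 ≈ 89 mV/decade.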
Scaling down photonic waveguide devices on the SOI platform
Pavel Cheben, Dan-Xia Xu, Siegfried Janz, et al.
We discuss the challenges encountered when scaling down photonic waveguide devices and demonstrate possible solutions on the silicon-on-insulator (SOI) platform. First, sources of waveguide birefringence, such as waveguide geometry and stress in the waveguiding layer, are discussed. The sensitivity of birefringence to inaccuracies in waveguide dimensions is compared for different waveguide geometries, including trapezoidal and rectangular cross-sections. Results show that trapezoidal waveguides are more robust, which makes fabrication tolerances less stringent. Methods for minimizing the waveguide birefringence using stress induced by an over-cladding dielectric film, and by inducing form birefringence through deposition of thin layers of high and low refractive index materials, are discussed. Compact arrayed waveguide grating (AWG) devices are presented, with an internal loss of -5.9 dB, crosstalk better than -20 dB, and a polarization-dependent wavelength shift of <0.05 nm. We discuss and quantify the sources of loss and crosstalk in our AWG devices, and review the methods we have developed for compensating the polarization-dependent wavelength shift, including an etched compensator and a silicon-oxide-silicon (SOS) compensator. The latter exploits the form birefringence of a thin buried oxide layer sandwiched between the silicon waveguide core and a silicon over-layer, and is simple to fabricate by standard oxide and amorphous silicon or polysilicon deposition techniques. The calculated loss penalty of the SOS compensator is less than 0.2 dB for both TE and TM polarizations.
Theoretical analysis on characteristics and efficiency of CdS/CdTe heterojunction solar cell
Jiang Guo, Chunyang Kong, Wanlu Wang
The current-voltage properties of the CdS/CdTe heterojunction were first discussed under ideal conditions (without any losses). The conversion efficiency of CdS/CdTe heterojunction solar cells was then investigated under AM1.5 illumination. The results showed that the conversion efficiency could reach up to 27%, in agreement with previously predicted results. Meanwhile, the realistic conversion efficiency of CdS/CdTe heterojunction solar cells was about 23% after taking into account the influence of some material parameters. Given the considerable progress made in manufacturing CdS/CdTe solar cells in recent years, together with a rational theoretical analysis, a conversion efficiency of 23% is achievable in the near future.
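The abstract does not give the model it uses. As a hedged illustration of how an ideal, loss-free efficiency limit is computed under AM1.5 illumination (100 mW/cm²), the sketch below sweeps the J-V curve of an ideal single-diode cell with unity ideality factor and takes the maximum power point. The photocurrent and saturation current densities are hypothetical inputs, not the paper's material parameters.

```python
import math

VT = 0.02585  # thermal voltage kT/q at about 300 K, in volts
P_IN = 100.0  # AM1.5 irradiance, mW/cm^2


def ideal_cell_efficiency(jl, j0, points=10001):
    """Ideal single-diode cell: J(V) = JL - J0 * (exp(V / VT) - 1).

    jl and j0 are in mA/cm^2; returns (voc, fill_factor, efficiency).
    """
    voc = VT * math.log(jl / j0 + 1.0)  # open-circuit voltage, where J = 0
    best = 0.0
    for k in range(points):
        v = voc * k / (points - 1)
        j = jl - j0 * (math.exp(v / VT) - 1.0)
        best = max(best, j * v)  # output power density in mW/cm^2
    fill_factor = best / (jl * voc)
    return voc, fill_factor, best / P_IN
```

For example, assuming JL = 30 mA/cm² and J0 = 1e-15 mA/cm², the sweep gives Voc ≈ 0.98 V, a fill factor of about 0.88 and an efficiency of roughly 26%.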
VLSI Architectures I
Signaling in the heterogeneous architecture multiprocessor paradigm
Antonio Nunez, Victor Reyes, Tomas Bautista
This paper discusses and compares solutions for the issue of signalling and synchronization in the heterogeneous architecture multiprocessor paradigm. The on-chip interconnect infrastructure is split conceptually into a data transport network and a signalling network. This paper presents a SystemC-based technique for modelling the communication architecture, with different topologies for the synchronization or signalling network. Each topology is parameterised for several communication requirements that define a point in the communication space. A high-abstraction model leads to an experimental set-up that eases the analysis of the quantitative and qualitative behaviour of the networks for representative points in the communication space of the system design. The SystemC simulation models developed allow us to obtain information about total simulation time, processing time spent by the coprocessors, data transport time (read/write) used by the coprocessors (including arbitration time), and synchronization time spent by the coprocessors and the network. Another important metric is the coprocessor usage percentage. Results show that splitting the data and signalling networks brings additional improvement to the performance of the system. The model applies well when mapping to architectural platforms the application processes expressed by abstract computational models such as Kahn process networks (KPN), synchronous data flow models (SDF), and generalized communicating sequential processes models (CSP).
High-performance VLSI architecture for video processing
Real-time image processing is a key issue in today's multimedia applications. Image filtering and video coding are two basic applications in image processing. Their algorithms are computationally expensive due to both the number of points in each frame to be processed and the calculation complexity per point. The VLSI implementation of these algorithms leads to special architectures that are based on systolic arrays and consume a large silicon area. In this paper, we propose a configurable, bidimensional pipelined VLSI architecture that supports mathematical morphology operations and the block matching algorithm. Notable advantages include low power consumption and a regular, compact design (in terms of core active area) compared with the traditional systolic architecture. The architecture is suitable for both morphological image filtering and video compression, depending on the hardware resources of the processing elements. The main advantage of this bidimensional pipeline architecture is the area saving compared with the systolic array implementation. The total area saving is presented in terms of the number of FIFO memory bits that can be eliminated. The proposed architecture was verified at high level in C++, at RTL level using Verilog, and at C++/RTL level using DEMETER. The required cycle times per dilation/erosion operation were measured for a real-time morphological filter as a function of the incoming resolution. Physical layouts were obtained for the basic slice of the processing element and for the systolic array using the AMS 0.35-μm CMOS technology.
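Both operations supported by the architecture are sliding-window computations over a pixel neighborhood, which is what allows one array of processing elements to serve both. The hedged software reference model below uses a flat square structuring element for grayscale dilation/erosion and a full-search sum-of-absolute-differences (SAD) criterion for block matching; the window size, search range and function names are assumptions, not the paper's parameters.

```python
def window_reduce(image, radius, reducer):
    """Apply reducer (max or min) over a clipped (2r+1)x(2r+1) window at each pixel."""
    h, w = len(image), len(image[0])
    return [[reducer(image[y][x]
                     for y in range(max(0, i - radius), min(h, i + radius + 1))
                     for x in range(max(0, j - radius), min(w, j + radius + 1)))
             for j in range(w)]
            for i in range(h)]


def dilate(image, radius=1):
    """Grayscale dilation with a flat square structuring element."""
    return window_reduce(image, radius, max)


def erode(image, radius=1):
    """Grayscale erosion with the same structuring element."""
    return window_reduce(image, radius, min)


def full_search_sad(cur, ref, bx, by, n, search):
    """Full-search block matching for the n x n block at (by, bx) in cur.

    Returns (sad, dy, dx) for the displacement in [-search, search]^2 with
    the smallest sum of absolute differences; candidates outside ref are skipped.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y0, x0 = by + dy, bx + dx
            if y0 < 0 or x0 < 0 or y0 + n > h or x0 + n > w:
                continue
            sad = sum(abs(cur[by + i][bx + j] - ref[y0 + i][x0 + j])
                      for i in range(n) for j in range(n))
            if best is None or sad < best[0]:
                best = (sad, dy, dx)
    return best
```

Ties in `full_search_sad` are resolved in favor of the first candidate in scan order.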
CAD I
System-level verification methodology for advanced switch fabrics
A system-level verification methodology for advanced switch fabrics is introduced in this paper. Verification is becoming more time-consuming as design complexity increases, and it typically consumes over half of the design effort. Writing testbenches in an ad-hoc verification environment, such as Verilog, VHDL, or C/C++ using PLI directly, is tedious and unproductive, and yields low reusability. Time-to-market is critical at chip-level verification; combined with short product life cycles and changing standards, this means that the design and verification of new products require new design and verification tools. As a result of our methodology, a verification framework is also presented. Decomposing each interface of the switch fabric allows the framework to be reconfigured when there is a new revision of a design. This idea promotes the reuse of the main interface code and verification statements. The development of the verification framework in C++, 'e' and Verilog demonstrates that our methodology can be applied independently of the programming language. New features added to the framework, such as error insertion, give greater control over the verification and enhance verification coverage in corner-case mode. On the other hand, the golden reference layer is the key to the automatic mode: a high-level model can be used as a reference to check the correctness of the DUV automatically, and the synchronization issues between simulators of different natures are resolved. Several commercial and non-commercial advanced switch fabrics have been verified using this method. The most complex circuits verified were the VSC882/VSC872 (GigaStream Chip Set). The usefulness of the proposed methodology is demonstrated by the functional success of the GigaStream Chip Set and a 60% saving in verification time per effort unit. In general, for a given programming language, the improvement from using our methodology is around 40% in verification time per effort unit.
Models and algorithm for the calculation of the impulse response on IR-wireless indoor channels
Recently there has been growing interest in using infrared (IR) light for broadband indoor wireless communications. There are two major limitations to establishing a wideband infrared communications link. The first and most important is the power requirement of such a link. The second is the intersymbol interference caused by multipath dispersion. Angle-diversity receivers make it possible to achieve high optical gain and a wide field of view simultaneously; they can reduce the impact of ambient light noise, path loss and multipath distortion, in part by exploiting the fact that these impairments are often received from directions different from that of the desired signal. The advantages achieved depend on how the signals received in the different elements are detected and processed. For this reason, we have developed a fast simulation tool that allows us to study the influence of the IR channel and to propose new techniques and receiver structures for these systems. Indoor optical channel simulation can significantly benefit the design of high-performance IR systems, but it requires models that correctly fit the channel characteristics. In contrast to previous works, we define new models for the emitter, lenses, receiver, nonimaging concentrators and reflectors, on which a Monte Carlo ray-tracing algorithm allows different links to be studied. Including these models benefits the design of IR links, since they are closer to real behavior than the ideal models. Using this simulation tool, we analyzed the behavior of several links and suggest a receiver configuration using angle diversity.
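As a baseline for the improved models the paper proposes, the sketch below uses the ideal generalized-Lambertian emitter: its mode number m follows from the half-power semi-angle, the line-of-sight gain of a bare detector depends on the emission and incidence angles, and a Monte Carlo ray tracer can draw emission directions by inverse-transform sampling. Concentrator and filter gains are taken as 1, and all function names are illustrative.

```python
import math
import random


def lambertian_order(half_power_angle_deg):
    """Mode number m of a generalized Lambertian emitter from its half-power semi-angle."""
    return -math.log(2) / math.log(math.cos(math.radians(half_power_angle_deg)))


def los_gain(m, area, distance, emit_angle, incid_angle, fov):
    """Line-of-sight DC gain of a bare detector (no filter or concentrator).

    Angles are in radians; area and distance in consistent units (e.g. m^2, m).
    """
    if incid_angle > fov:
        return 0.0  # ray arrives outside the receiver field of view
    return ((m + 1) * area / (2 * math.pi * distance ** 2)
            * math.cos(emit_angle) ** m * math.cos(incid_angle))


def sample_emission_direction(m, rng):
    """Draw a unit ray direction with intensity proportional to cos^m(theta).

    Inverse-transform sampling: cos(theta) = u ** (1 / (m + 1)).
    """
    cos_theta = rng.random() ** (1.0 / (m + 1))
    sin_theta = math.sqrt(1.0 - cos_theta ** 2)
    azimuth = 2 * math.pi * rng.random()
    return (sin_theta * math.cos(azimuth), sin_theta * math.sin(azimuth), cos_theta)
```

For m = 1, a 1 cm² detector facing an emitter 2 m away on axis has a gain of 1e-4 / (4π) ≈ 8.0e-6, and the mean sampled cos θ approaches (m + 1)/(m + 2) = 2/3.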
VESTA: a system-level verification environment based on C++
Mahendra V. Shahdadpuri, Javier Sosa, Héctor Navarro, et al.
System verification must be performed at every design step to ensure complete system correctness. The verification effort is becoming more time-consuming due to increasing design complexity. New environments are necessary to reduce the complexity of this task and, most importantly, the time needed to carry it out. Among the languages used in verification, C++ is powerful enough to encapsulate the necessary concepts in a set of classes and templates. This work introduces a framework that allows highly complex systems to be described and verified in a user-friendly and fast way with C++ classes. These classes encapsulate hardware description and verification concepts and can be reused throughout a project and across different development projects. Furthermore, the resulting libraries provide an easy-to-use interface for describing systems and writing testbenches in C++, with a transparent connection to an HDL simulator. VESTA includes advanced memory management with an extremely versatile linked list, whose access mode can change on the fly to a FIFO, a LIFO or a memory array access mode, among others. Experimental results demonstrate that the basic types provided by our verification environment exceed the features of non-commercial solutions such as OpenVera or TestBuilder and of commercial solutions such as the 'e' language. In addition, the results show significant productivity gains in creating reusable testbenches and in debugging simulation runs.
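The idea of a single list whose access discipline changes on the fly can be sketched in a few lines. The Python class below is a hypothetical illustration, not VESTA's C++ API: items are kept when the mode switches, and only the way they are removed or read changes.

```python
from collections import deque


class ModalList:
    """Hypothetical container whose access discipline can change at run time."""

    MODES = ("fifo", "lifo", "array")

    def __init__(self, mode="fifo"):
        self._items = deque()
        self.set_mode(mode)

    def set_mode(self, mode):
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode  # stored items are kept when switching

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if self.mode == "fifo":
            return self._items.popleft()  # oldest item first
        if self.mode == "lifo":
            return self._items.pop()  # newest item first
        raise TypeError("pop() is not defined in array mode; use index access")

    def __getitem__(self, index):
        if self.mode != "array":
            raise TypeError("index access requires array mode")
        return self._items[index]

    def __len__(self):
        return len(self._items)
```

A testbench could, for instance, queue expected transactions in FIFO mode and then switch to array mode to inspect them when a mismatch is reported.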
MHDL CAD tool with fault circuit handling
Guillermo Espinosa Flores-Verdad, Leopoldo Altamirano Robles, Leticia Osorio Roque
Behavioral modeling and simulation with analog and mixed-signal hardware description high-level languages (MHDLs) has led to the development of diverse simulation tools that can handle the requirements of modern designs. These systems embed millions of transistors and differ radically from one another. This trend in simulation tools is exemplified by the development of languages for modeling and simulation, whose applications include the re-use of complete systems, the construction of virtual prototypes, and the realization of test and synthesis. This paper presents the general architecture of a Mixed Hardware Description Language based on the IEEE standard 1076.1-1999 VHDL Analog and Mixed-Signal Extensions, known as VHDL-AMS. The architecture is novel in that it considers the modeling and simulation of faults. The main modules of the CAD tool are briefly described to establish the information flow and its transformations, starting from the description of a circuit model, going through lexical analysis, mathematical model generation and the simulation core, and ending with the collection of the circuit behavior as simulation data. In addition, the mechanisms incorporated into the simulation core to handle faults in the circuit models are explained. Currently, the CAD tool works with algebraic and differential descriptions of the circuit models; nevertheless, the language design is open to handling different model types: Fuzzy Models, Differential Equations, Transfer Functions and Tables. This also applies to fault models; in this sense, the CAD tool considers the inclusion of mutants and saboteurs. To exemplify the results obtained so far, the simulated behavior of a circuit is shown both when it is fault-free and when it has been modified by the inclusion of a fault as a mutant or a saboteur. The results obtained enable a virtual diagnosis of mixed circuits.
This language works in a UNIX system; it was developed with an object-oriented methodology and programmed in C++.
Data Communications II
Switch-based interconnect architecture for future systems on chip
Partha Pratim Pande, Cristian S Grecu, Andre Ivanov, et al.
System on Chip (SoC) design involves the integration of numerous heterogeneous semiconductor intellectual property (SIP) blocks. The success of this approach depends on the seamless integration of cores such as processors, memories, UARTs, etc. Some of the main problems associated with future SoC design arise from non-scalable global wire delays, the failure to achieve global synchronization with a single clock, errors due to signal integrity issues, and difficulties associated with non-scalable bus-based functional interconnects. To address these problems, we conjecture that a paradigm shift in SoC design methodology, from a conventional, typically bus-based approach to a network-centric approach, will be both necessary and practical. In place of global wiring, we propose a switch-based on-chip interconnection network to interconnect IP blocks. One of the challenges in an interconnection-network-based SoC is sending data from one IP block to multiple destination IP blocks simultaneously, i.e., multicasting. To achieve multicasting, we introduce bit-string encoding in the addressing mechanism used to communicate among IP blocks. Another major challenge in such network-based SoCs is throughput degradation due to idle physical channels. Introducing virtual channels into the on-chip interconnection network can improve the overall throughput of the SoC. Incorporating multicasting and virtual channels increases the silicon area consumed by the switches, but this overhead can be made very small in a billion-transistor SoC.
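The abstract does not give the exact encoding. Assuming one bit per destination IP block, a multicast address is simply a bitmask, and a switch can forward one copy per output port carrying only the destinations reachable through that port. The routing table and function names below are hypothetical.

```python
def encode_destinations(destinations):
    """Bit-string address: bit k set means IP block k is a destination."""
    mask = 0
    for block in destinations:
        mask |= 1 << block
    return mask


def split_for_ports(mask, port_of_block):
    """Partition a multicast mask by output port at one switch.

    port_of_block maps each IP block index to the output port that reaches it
    (a hypothetical routing table). Returns {port: sub_mask}; one packet copy
    is sent per port, addressed only to the destinations behind that port.
    """
    per_port = {}
    block = 0
    while mask:
        if mask & 1:
            port = port_of_block[block]
            per_port[port] = per_port.get(port, 0) | (1 << block)
        mask >>= 1
        block += 1
    return per_port
```

For destinations {0, 3, 5}, the address is 0b101001; if blocks 0, 3 and 5 sit behind ports N, E and S, the switch emits three copies with masks 1, 8 and 32.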
CMOS receiver circuits for high-speed data transmission according to LVDS standard
For high-speed data transmission between integrated circuits on the same circuit board, several aspects have to be considered: termination at the receiver is needed to avoid reflections; a low signal swing is required to reduce power consumption; and differential signals have to be used to make the transmission insensitive to interference. All of these are addressed by the IEEE Standard for Low-Voltage Differential Signals (LVDS), part of which specifies the receiver. Meeting these requirements calls for special amplifier circuits. They must be able to operate with a very small differential input signal (400 mV max.) and a strongly varying operating point (between 0 and 2.4 V). With a supply voltage of 2.5 V, two complementary input stages are necessary, and their output signals have to be combined and amplified to full signal swing. Different circuits that fulfill these conditions are presented and compared on the basis of transistor-level simulation. To improve the timing behaviour, increase the signal slope and widen the eye-diagram opening, the transistor dimensions of the circuits have been optimized using the optimization tool OPSIM. For the two most promising circuits, with data rates of 1.0 and 1.4 Gbit/s and power consumption of approximately 1 and 4 mW, respectively, a full-custom layout was created using a module generator environment and a design assistant. These two circuits have been realized in a 0.25 μm CMOS technology, and measurement results for both are presented.
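Why a 2.5 V receiver needs two complementary input stages can be seen with a first-order headroom estimate: an NMOS differential pair needs the input common-mode voltage to exceed one gate-source voltage plus the tail current source's saturation voltage, while a PMOS pair needs it to stay the same amount below the supply. The threshold and saturation values below are hypothetical round numbers, not parameters of the 0.25 μm process used in the paper.

```python
VDD = 2.5      # supply voltage from the abstract (V)
V_GS = 0.7     # hypothetical |V_GS| of the input transistors (V)
V_DSAT = 0.2   # hypothetical saturation voltage of the tail current source (V)


def nmos_pair_on(v_cm: float) -> bool:
    """NMOS pair needs V_CM >= V_GS + V_DSAT(tail)."""
    return v_cm >= V_GS + V_DSAT


def pmos_pair_on(v_cm: float) -> bool:
    """PMOS pair needs V_CM <= VDD - |V_GS| - V_DSAT(tail)."""
    return v_cm <= VDD - V_GS - V_DSAT


grid = [i * 0.01 for i in range(241)]          # 0.00 V ... 2.40 V (LVDS range)
n_gaps = [v for v in grid if not nmos_pair_on(v)]
p_gaps = [v for v in grid if not pmos_pair_on(v)]
both_gaps = [v for v in grid if not (nmos_pair_on(v) or pmos_pair_on(v))]

print(f"NMOS pair alone fails for V_CM up to {max(n_gaps):.2f} V")
print(f"PMOS pair alone fails for V_CM from {min(p_gaps):.2f} V")
print(f"Complementary stages leave {len(both_gaps)} uncovered points")
```

Under these assumptions neither pair alone spans 0 to 2.4 V, but their union does, which is why the two outputs must then be combined.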
System-level optimization of baseband filters for communication applications
Manuel Delgado-Restituto, Juan F. Fernandez-Bootello, Angel Rodriguez-Vazquez
In this paper, we present a design approach for the high-level synthesis of programmable continuous-time Gm-C and active-RC filters with an optimum trade-off among dynamic range, generation of distortion products, area consumption and power dissipation, thus meeting the needs of more demanding baseband filter realizations. Furthermore, the proposed technique guarantees that, under all programming configurations, transconductors (in Gm-C filters) or resistors (in active-RC filters), as well as capacitors, are related by integer ratios, in order to reduce the sensitivity of the monolithic implementation to mismatch. To solve the aforementioned trade-off, the filter must be properly scaled in each configuration. This means that the filter node impedances must be adjusted so that the noise contribution of each node to the filter output is as low as possible, while keeping the peak amplitudes at those nodes low enough not to drive the active circuits into saturation. Additionally, in order not to degrade the distortion performance of the filter (in particular if it is implemented using Gm-C techniques), node impedances cannot be scaled independently of each other; restrictions must be imposed according to the principle of nonlinear cancellation. Altogether, the high-level synthesis can be seen as a constrained optimization problem in which some of the variables, namely the ratios among similar components, are restricted to discrete values. The proposed approach to optimum filter scaling under all programming configurations relies on two elements: matrix methods for network representation, which allow easy estimation of performance features such as dynamic range and power dissipation, as well as other network properties such as sensitivity to parameter variations and the non-ideal effects of integrator blocks; and a simulated annealing algorithm to explore the design space defined by the transfer and group delay specifications.
Note that this design space also includes the most common approximation methods and network synthesis approaches as optimization variables, so as to make the search for optimum solutions as broad as possible. The proposed methodology has been partially developed in MATLAB, taking advantage of the routines available in the Signal Processing and Control toolboxes, and in C++. The validity of the methodology and the accompanying software will be demonstrated at the conference and reported in the paper, using as a case study the design of a programmable filter bank for a high-performance powerline modem.
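The discrete-variable part of this optimization can be illustrated with a generic simulated-annealing loop over integer component ratios. The cost function below is a made-up stand-in (squared deviation of the realizable ratios from hypothetical target ratios, plus an area term proportional to the number of unit elements); it does not model dynamic range, distortion or noise as the paper's MATLAB/C++ tool does. Only the search mechanics (random integer moves, Metropolis acceptance, geometric cooling) are meant to carry over.

```python
import math
import random

TARGET = [1.0, 2.7, 4.2]   # hypothetical ideal (continuous) component ratios
AREA_WEIGHT = 0.01         # hypothetical area penalty per unit element


def cost(ratios: list[int], unit: int) -> float:
    """Mismatch of realizable ratios r/unit to TARGET, plus an area term."""
    mismatch = sum((r / unit - t) ** 2 for r, t in zip(ratios, TARGET))
    return mismatch + AREA_WEIGHT * (sum(ratios) + unit)


def neighbor(state: tuple[list[int], int], rng: random.Random) -> tuple[list[int], int]:
    """Change one integer variable (a ratio or the unit size) by +/-1."""
    ratios, unit = state[0][:], state[1]
    i = rng.randrange(len(ratios) + 1)
    delta = rng.choice((-1, 1))
    if i == len(ratios):
        unit = max(1, unit + delta)
    else:
        ratios[i] = max(1, ratios[i] + delta)
    return ratios, unit


def anneal(seed: int = 0, t0: float = 1.0, alpha: float = 0.995, iters: int = 5000):
    rng = random.Random(seed)
    state = ([1, 1, 1], 1)
    c = cost(*state)
    best, best_c = state, c
    t = t0
    for _ in range(iters):
        cand = neighbor(state, rng)
        cc = cost(*cand)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if cc < c or rng.random() < math.exp((c - cc) / t):
            state, c = cand, cc
            if c < best_c:
                best, best_c = state, c
        t *= alpha   # geometric cooling
    return best, best_c


print(anneal())
```

Restricting moves to integer steps keeps every visited candidate realizable with unit elements, which mirrors the integer-ratio constraint described above.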
Mixed Circuits II
Analog filter circuits testing using voltage and current measurements
This paper presents a study of the importance of analogue circuit testing in general and of the challenges faced in testing these modules in a mixed-signal environment. It highlights the difficulties involved in testing analogue and mixed-signal circuits and compares them with those of testing digital-only circuits. Sources of failure in integrated circuits and their relation to fault models are outlined. The paper concentrates on testing active analogue filter circuits operating at mid-range frequencies. A variety of filter circuits with different configurations and varying degrees of complexity are studied. Both soft and catastrophic single-fault conditions are introduced into the circuits at the transistor, operational amplifier and feedback network levels. The work compares the detection of the injected faults using both frequency-response and transient-response voltage and current measurements. The objective is to determine the measurement method and parameter best suited to detecting a particular fault or class of faults. Analysis of the simulation data indicates that the measurement methods and parameters are complementary in terms of fault coverage and fault detection confidence.
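A minimal sketch of the detection step in frequency-response voltage testing: compare the magnitude response of the circuit under test with the nominal one at a few test frequencies and flag a fault when any deviation leaves a tolerance band. The first-order RC low-pass, the test frequencies, the ±0.5 dB band and the fault list are all hypothetical; the paper's filters are active, and it also uses transient and current measurements.

```python
import math


def rc_lowpass_db(r: float, c: float, f: float) -> float:
    """|H(j*2*pi*f)| in dB for H = 1 / (1 + j*w*R*C)."""
    w = 2 * math.pi * f
    return -10 * math.log10(1 + (w * r * c) ** 2)


R0, C0 = 10e3, 10e-9                        # hypothetical nominal: fc ~ 1.59 kHz
TEST_FREQS = [100.0, 1e3, 1.6e3, 5e3, 20e3]
TOL_DB = 0.5                                # hypothetical tolerance band


def response(r: float, c: float) -> list[float]:
    return [rc_lowpass_db(r, c, f) for f in TEST_FREQS]


def detected(measured: list[float], nominal: list[float], tol: float = TOL_DB) -> bool:
    """A fault is detected if any test frequency deviates by more than tol."""
    return any(abs(m - n) > tol for m, n in zip(measured, nominal))


nominal = response(R0, C0)
faults = {
    "R +5% (soft)": response(R0 * 1.05, C0),
    "R +50% (soft)": response(R0 * 1.5, C0),
    "C open (catastrophic)": response(R0, 0.0),   # open capacitor: flat 0 dB
}
for name, meas in faults.items():
    print(f"{name:22s} detected={detected(meas, nominal)}")
```

With these assumptions the small soft fault stays inside the band and escapes detection, one reason why complementary measurements (such as supply current) can raise fault coverage.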
Iterative current mode per pixel ADC for 3D SoftChip implementation in CMOS
Stefan W. Lachowicz, Alexander Rassau, Seung-Minh Lee, et al.
Mobile multimedia communication has rapidly become a significant area of research and development, constantly challenging boundaries on a variety of technological fronts. The processing requirements for the capture, conversion, compression, decompression, enhancement, display, etc. of increasingly higher-quality multimedia content place heavy demands even on current ULSI (ultra large scale integration) systems, particularly in mobile applications, where area and power are primary considerations. The ADC presented in this paper is designed for a vertically integrated (3D) system comprising two distinct layers bonded together using indium bump technology. The top layer is a CMOS imaging array containing analogue-to-digital converters and a buffer memory. The bottom layer takes the form of a configurable array processor (CAP), a highly parallel array of soft-programmable processors capable of carrying out complex processing tasks directly on data stored in the top plane. This paper presents an ADC scheme for the image capture plane. The analogue photocurrent or sampled voltage is transferred to the ADC via a column or a column/row bus. In the proposed system, the array of analogue-to-digital converters is distributed so that one one-bit cell is associated with each sensor. The converters are algorithmic current-mode converters; eight such cells are cascaded to form an 8-bit converter. Additionally, each photo-sensor is equipped with a current memory cell, and multiple conversions are performed with scaled values of the photocurrent for colour processing.
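The bit-by-bit operation of an algorithmic current-mode converter can be modeled behaviorally: each one-bit cell doubles its input current, compares it with a reference current, outputs a 1 and subtracts the reference if the doubled current is at least as large, and passes the residue on to the next cell. Cascading eight cells gives the 8-bit converter described above. This is an idealized model (no mismatch, offset or noise), and normalizing currents to I_ref = 1 is an assumption.

```python
def one_bit_cell(i_in: float, i_ref: float) -> tuple[int, float]:
    """One stage: double, compare, conditionally subtract; return (bit, residue)."""
    doubled = 2.0 * i_in
    if doubled >= i_ref:
        return 1, doubled - i_ref
    return 0, doubled


def algorithmic_adc(i_in: float, i_ref: float = 1.0, n_bits: int = 8) -> int:
    """Cascade n_bits cells, MSB first, for 0 <= i_in < i_ref."""
    code, residue = 0, i_in
    for _ in range(n_bits):
        bit, residue = one_bit_cell(residue, i_ref)
        code = (code << 1) | bit
    return code


print(algorithmic_adc(0.7))   # 179 == 0b10110011
```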
Novel low-voltage low-power Gb/s transimpedance amplifier architecture
A novel current-mode transimpedance amplifier (TIA) architecture is proposed for optical receivers. This new architecture, based on a uniquely biased common-base current buffer stage, allows stable, DC-coupled TIAs to be designed in bipolar or CMOS processes operating from extremely low supply voltages and using very low levels of power. Noise performance is comparable to that of higher-power designs operating from higher supply rails. Simulation results have been obtained for a 47 GHz fT SiGe BiCMOS process and for a 0.25 μm CMOS process.
Sigma-delta modulator for a programmable gain low-power high-linearity automotive sensor interface
Jose M. de la Rosa, Fernando Medeiro, Belen Perez-Verdu, et al.
Smart sensors play a critical role in modern automotive electronic systems, covering a wide range of data-capturing functions and operating under adverse environmental conditions (temperatures from -40 °C to 175 °C). In such sensors, the signal provided by the transducer consists of an offset voltage, which depends on the manufacturing process, and a low-frequency signal carrying the information. In practice, the offset voltage varies with temperature, shifting the signal range to be measured. Therefore, the measuring circuit driving the sensor, normally formed by a low-noise preamplifier and an analog-to-digital converter (ADC), must accommodate the complete range of possible offsets and real signals. In this scenario, ADCs based on sigma-delta modulators (SDMs) are convenient for several reasons. On the one hand, the noise shaping performed by SDMs makes it possible to achieve high resolution (16-17 bits) in the band of interest (10-20 kHz) with lower power consumption than Nyquist-rate ADCs. On the other hand, feedback makes SDMs very linear, and high linearity is a must in automotive applications. Last but not least, the robustness of SDMs to circuit imperfections makes them suitable for including programmable gain without significant performance degradation. This feature makes it possible to accommodate the complete range of possible offsets and information signals in a sensor interface with relaxed specifications for the preamplifier circuitry. This paper describes the design and implementation of a third-order cascade (2-1) SDM with programmable gain in a 0.35 μm CMOS technology, the type of technology commonly employed for automotive applications (deep-submicron technologies are mostly employed for telecom). It can handle signals up to 20 kHz bandwidth with 17-bit resolution.
The programmable gain is implemented by a capacitor array whose unit capacitors are connected or disconnected depending on the selected gain. To relax the amplifier dynamics requirements as the modulator gain varies, switchable capacitor arrays are used for all the capacitors in the first integrator. The design of the modulator building blocks is based on a top-down CAD methodology that combines simulation and statistical optimization at different levels of the modulator hierarchy. As a result, a dynamic range of 105 dB, corresponding to 17-bit resolution, is obtained for all modulator gain settings.
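Two textbook relations help put the numbers above in context. The ideal peak SQNR of an L-th-order modulator with a B-bit quantizer and oversampling ratio OSR is commonly approximated as 6.02B + 1.76 - 10·log10(π^(2L)/(2L+1)) + (2L+1)·10·log10(OSR), and a dynamic range DR in dB corresponds to roughly (DR - 1.76)/6.02 bits. The OSR of 64 below is a hypothetical example, since the abstract does not state the modulator's OSR; a real 2-1 cascade typically falls somewhat short of the ideal third-order figure because of interstage scaling and circuit non-idealities.

```python
import math


def ideal_sqnr_db(order: int, osr: float, quantizer_bits: int = 1) -> float:
    """Peak SQNR (dB) of an ideal order-L modulator with a full-scale sine input."""
    L = order
    return (6.02 * quantizer_bits + 1.76
            - 10 * math.log10(math.pi ** (2 * L) / (2 * L + 1))
            + (2 * L + 1) * 10 * math.log10(osr))


def dr_to_bits(dr_db: float) -> float:
    """Equivalent resolution (bits) of a given dynamic range (dB)."""
    return (dr_db - 1.76) / 6.02


print(f"{dr_to_bits(105):.2f} bits")              # about 17.15 bits for 105 dB
print(f"{ideal_sqnr_db(3, 64):.1f} dB (ideal)")   # third order, hypothetical OSR = 64
```

The first line confirms that 105 dB maps to the 17-bit figure quoted above; the second shows the headroom that third-order noise shaping leaves for circuit imperfections at that assumed OSR.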
LP-LV high-performance monolithic DTMF receiver with on-chip test facilities
Diego Vazquez, Gloria Huertas, Maria José Avedillo, et al.
Dual-tone multi-frequency (DTMF) signalling (also known as Touch-Tone, Tel-Touch, etc.) has gained importance in telecommunications (telephony, answering machines, remote control, credit cards, etc.) at the expense of dial-pulse signalling, owing to its greater efficiency and reliability in transmitting signals. This paper presents a high-performance DTMF receiver able to operate from a 3 V to 5 V supply with a low (<1 mA), virtually constant current consumption. In addition, on-chip test facilities for the analog part have been incorporated into the chip, in particular: a) a modified opamp (called sw-opamp) that provides external access to the inputs and outputs of the main analog blocks for off-line testing, and b) a built-in self-test (BIST) strategy based on converting the analog part into an oscillator (so-called oscillation-based test) to perform structural testing of the architecture. A prototype has been designed and fabricated in a 0.6 μm technology. The price paid for these on-chip test facilities is very low: just one extra pin is used, power consumption during normal operation is not penalized, and the area overhead is about 7%. The experimental results demonstrate the good performance of the design and the feasibility of the testing approaches.
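The receiver above detects tones with analog circuitry; purely as a software point of comparison, the Goertzel algorithm is a common way to detect DTMF digitally. The sketch measures the energy at the eight standard DTMF frequencies (rows 697, 770, 852, 941 Hz; columns 1209, 1336, 1477, 1633 Hz) and picks the strongest row and column. The 8 kHz sample rate and 205-sample block are conventional choices assumed here; real receivers also check twist, signal level and timing, which this sketch omits.

```python
import math

ROWS = [697, 770, 852, 941]
COLS = [1209, 1336, 1477, 1633]
KEYS = ["123A", "456B", "789C", "*0#D"]
FS = 8000   # assumed sample rate (Hz)
N = 205     # assumed block length (samples)


def goertzel_power(samples: list[float], freq: float, fs: int = FS) -> float:
    """Squared DTFT magnitude at freq, computed with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / fs)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def detect_key(samples: list[float]) -> str:
    row = max(range(4), key=lambda i: goertzel_power(samples, ROWS[i]))
    col = max(range(4), key=lambda j: goertzel_power(samples, COLS[j]))
    return KEYS[row][col]


def dtmf_tone(key: str, n: int = N, fs: int = FS) -> list[float]:
    """Synthesize the clean two-tone signal for one key (test input)."""
    r = next(i for i, k in enumerate(KEYS) if key in k)
    c = KEYS[r].index(key)
    return [math.sin(2 * math.pi * ROWS[r] * t / fs)
            + math.sin(2 * math.pi * COLS[c] * t / fs) for t in range(n)]


print(detect_key(dtmf_tone("5")))   # 5
```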
VLSI Architectures II
Flexible coprocessor architectures for ambient intelligent applications in the mobile communication and automotive domain
Winfried Gehrke, Joern Jachalsky, Martin Wahle, et al.
Ambient intelligence is expected to become one of the key drivers of the semiconductor industry in this decade. One of the most promising areas in this respect is the advent of embedded smart imaging in a variety of consumer applications, such as mobile communication devices and the automotive domain. The efficient VLSI implementation of these applications requires architectural concepts that enable objects and associated information to be extracted from video sequences in real time. The main architectural challenge is to find an appropriate trade-off between, on the one hand, architectural flexibility and scalability, needed to cope with moderate variations of the smart imaging algorithms applied, and, on the other, the cost efficiency of the implementation. This paper describes the algorithmic and architectural requirements for implementing smart imaging applications in these fields. The target system, based on an embedded RISC processor, embedded memory and cores for accelerating essential functions such as morphological operations, connected component labeling and motion extraction, is presented. The functional partitioning of the system is based on hardware acceleration of core functions that extract low-level information from the images of a video sequence. This information is passed to the embedded RISC processor, which further abstracts and interprets the image content in software. One of the focal points of this paper is the derivation of efficient architectural concepts for smart imaging coprocessors, acting as a system toolbox for accelerating the required smart imaging core functions.
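Connected component labeling, one of the core functions listed above, is often implemented with a two-pass raster-scan algorithm: the first pass assigns provisional labels and records label equivalences, and the second pass resolves them. The sketch below shows a generic textbook formulation for 4-connectivity using a union-find table; it is not a description of the paper's coprocessor.

```python
def label_components(img: list[list[int]]) -> list[list[int]]:
    """Two-pass 4-connected labeling of a binary image (nonzero = foreground)."""
    h, w = len(img), len(img[0]) if img else 0
    labels = [[0] * w for _ in range(h)]
    parent = [0]   # union-find parent per provisional label; 0 = background

    def find(k: int) -> int:
        while parent[k] != k:
            parent[k] = parent[parent[k]]   # path halving
            k = parent[k]
        return k

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # First pass: provisional labels from the north and west neighbours.
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            north = labels[y - 1][x] if y > 0 else 0
            west = labels[y][x - 1] if x > 0 else 0
            if north and west:
                labels[y][x] = min(north, west)
                union(north, west)            # record the equivalence
            elif north or west:
                labels[y][x] = north or west
            else:
                parent.append(len(parent))    # new provisional label
                labels[y][x] = len(parent) - 1

    # Second pass: map each provisional label to a compact final label.
    final: dict[int, int] = {}
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                root = find(labels[y][x])
                labels[y][x] = final.setdefault(root, len(final) + 1)
    return labels


print(label_components([[1, 0, 1], [1, 0, 1], [1, 1, 1], [0, 0, 0], [1, 0, 0]]))
```

The regular raster access pattern and small equivalence table are what make this variant attractive for a hardware accelerator.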
Lifting folded pipelined discrete wavelet packet transform architecture
Guillermo Paya, Marcos M. Peiro, J. Francisco Ballester, et al.
This article describes a new, highly efficient architecture for the 1-D discrete wavelet packet transform (DWPT) based on lifting, folding and pipelining techniques, which makes it possible to compute three complete levels. An architecture for the CDF(2,2) wavelet basis is proposed. We have designed a filter bank using a lifting factorization of its coefficients, and we have used an extension of the recursive pyramid algorithm (RPA) to obtain the three complete levels. The architecture is pipelined to obtain a maximally fast structure with only one logic operator in the critical path. Moreover, it achieves 75% hardware utilization for a DWPT realization. A comparison between our DWPT architecture and other DWPT architectures is presented. Other DWPT architectures are based on memory access, which implies a lower operating frequency and higher power consumption than our architecture.
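The CDF(2,2) (LeGall 5/3) lifting factorization consists of a predict step, d[n] = x[2n+1] - (x[2n] + x[2n+2])/2, and an update step, s[n] = x[2n] + (d[n-1] + d[n])/4. A wavelet packet transform applies the same split to both the low- and high-pass outputs at every level, so three levels give 8 subbands. The floating-point model below, with symmetric boundary extension, is only a behavioral reference against which such an architecture could be checked; it does not model folding or pipelining, and the boundary handling is an assumption.

```python
def lift_forward(x: list[float]) -> tuple[list[float], list[float]]:
    """One CDF(2,2) lifting stage on an even-length signal (symmetric extension)."""
    n = len(x) // 2
    even, odd = x[0::2], x[1::2]
    d = [odd[i] - 0.5 * (even[i] + even[min(i + 1, n - 1)]) for i in range(n)]   # predict
    s = [even[i] + 0.25 * (d[max(i - 1, 0)] + d[i]) for i in range(n)]           # update
    return s, d


def lift_inverse(s: list[float], d: list[float]) -> list[float]:
    """Undo the update step, then the predict step."""
    n = len(s)
    even = [s[i] - 0.25 * (d[max(i - 1, 0)] + d[i]) for i in range(n)]
    odd = [d[i] + 0.5 * (even[i] + even[min(i + 1, n - 1)]) for i in range(n)]
    x = [0.0] * (2 * n)
    x[0::2], x[1::2] = even, odd
    return x


def dwpt(x: list[float], levels: int = 3) -> list[list[float]]:
    """Full packet tree: split every band at every level (len(x) divisible by 2**levels)."""
    bands = [list(x)]
    for _ in range(levels):
        nxt = []
        for b in bands:
            s, d = lift_forward(b)
            nxt += [s, d]
        bands = nxt
    return bands


def idwpt(bands: list[list[float]], levels: int = 3) -> list[float]:
    for _ in range(levels):
        bands = [lift_inverse(bands[i], bands[i + 1]) for i in range(0, len(bands), 2)]
    return bands[0]


x = [4.0, 7, 1, 8, 2, 9, 3, 6, 5, 0, 11, 2, 7, 3, 8, 1]
print([[round(v, 3) for v in b] for b in dwpt(x)])
```

Each lifting step needs only additions and shifts by powers of two, which is why a pipeline with a single adder in the critical path is possible.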
Poster Session
Turbo decoder core design for system development
Xiaoyi Chen, Qingdong Yao, Peng Liu
Owing to their near-Shannon-capacity performance, turbo codes have received considerable attention since their introduction. They are particularly attractive for cellular communication systems and have been included in the specifications of both the WCDMA (UMTS) and cdma2000 third-generation cellular standards. The log-MAP decoding algorithm and several techniques for reducing its complexity have been discussed in the literature. However, when turbo codes are applied to wireless communications, the decoding rate is the bottleneck: a software implementation is not practical at the processing rates of today's DSPs, so decoding has to be realized in hardware. The purpose of this paper is to present a complete ASIC design approach for turbo decoding. Several techniques are added to the basic log-MAP algorithm to simplify the design and improve performance. With the log-MAP algorithm, the Jacobian logarithm is computed exactly as $\max^*(x, y) = \ln(e^x + e^y) = \max(x, y) + f_c(|y - x|)$. The correction function $f_c(|y - x|)$ is important because omitting it causes an SNR loss of about 0.5 dB. A linear approximation can be used, and in our design its parameters were selected carefully to suit a hardware implementation. To save power while still assuring performance, quantization is important in an ASIC design; we adopt a compromise scheme that saves power while maintaining good BER performance. Many noisy frames can be corrected within a few iterations, while some noisier frames need the full number of iterations (NOI). Thus, if the decoder could stop iterating as soon as the frame becomes correct, the average NOI would be reduced. There are many ways to stop the iterations, such as CRC checks and comparison-based criteria; we adopt a stopping criterion that requires significantly less computation and much less storage. For long frames, the memory needed to store the forward probability α or the backward probability β for the entire frame can be very large.
Available products all use a sliding-window version of the turbo decoder to reduce memory requirements, and so does our design. In addition, a new method is adopted to extend the sliding-window length without increasing the storage requirement; this method also noticeably improves performance. The techniques adopted in this paper are well suited to hardware design for wireless applications. For example, this decoding core can be embedded into our 32-bit digital signal processor (MD-32) to realize a 3G base-station receiver.
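The Jacobian logarithm used in the log-MAP decoder can be checked numerically. The exact correction term is f_c(z) = ln(1 + e^(-z)) with z = |y - x|; a linear approximation clipped at zero is attractive in hardware because, with a slope of 1/4, the multiplication reduces to a shift. The intercept ln 2 and slope 0.25 below are hypothetical choices for illustration, not the parameters selected in the paper.

```python
import math


def max_star_exact(x: float, y: float) -> float:
    """Jacobian logarithm: ln(e^x + e^y) = max(x, y) + ln(1 + e^-|y - x|)."""
    return max(x, y) + math.log1p(math.exp(-abs(y - x)))


def fc_linear(z: float, a: float = math.log(2), b: float = 0.25) -> float:
    """Hypothetical clipped linear approximation of ln(1 + e^-z)."""
    return max(0.0, a - b * z)


def max_star_linear(x: float, y: float) -> float:
    return max(x, y) + fc_linear(abs(y - x))


zs = [i / 100 for i in range(801)]   # z from 0 to 8
err_lin = max(abs(fc_linear(z) - math.log1p(math.exp(-z))) for z in zs)
err_maxlog = max(math.log1p(math.exp(-z)) for z in zs)   # error if f_c is dropped
print(f"max |error|: linear {err_lin:.3f}, max-log {err_maxlog:.3f}")
```

Dropping the correction entirely (the max-log approximation) leaves an error of up to ln 2 per operation, which is the source of the SNR loss mentioned above; even a crude linear term reduces it substantially.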
VLSI Architectures II
Some experiences using system-on-chip buses
Pedro P Carballo, Pablo Santos, Margarita Marrero, et al.
Advances in fabrication and design technologies have made it possible to integrate a complete system on a chip. A system-on-chip (SoC) is generally composed of a microprocessor core, on-chip memory and one or more specific coprocessor IPs. One of the major drawbacks of this approach is the difference between the interfaces that each virtual component (VC) of the SoC presents. A common bus infrastructure smooths system integration and has been considered as a design solution for SoC architectures. This paper reviews different alternatives for SoC buses and summarizes some experiences of their use. ARM has proposed AMBA (Advanced Microcontroller Bus Architecture) as an open specification that serves as a framework for SoC design. AMBA is a multilayer bus architecture for high-performance SoC designs. AMBA supports multi-master configurations, in which a bus arbiter must be included; AHB-Lite is a simpler alternative when only one master is used. IBM uses the CoreConnect bus architecture as its SoC bus solution. CoreConnect shares some similarities with AMBA because both use a multilayer bus to accommodate different speeds in the system: AMBA's AHB (Advanced High-performance Bus) is comparable to CoreConnect's PLB (Processor Local Bus), and AMBA's APB (Advanced Peripheral Bus) to CoreConnect's OPB (On-chip Peripheral Bus). Other alternatives exist. Wishbone is an open bus specification from OpenCores (opencores.org) that tries to solve the problem of IP integration by specifying a common interface between cores to accelerate the development of virtual components. VSIA has proposed the Virtual Component Interface (VCI) as a solution to the problem of virtual component integration. VCI specifies three protocols depending on the level of complexity: Peripheral, Basic and Advanced VCI. Developing IPs compatible with any of the SoC buses presented above is a complex problem.
One solution is to use wrappers that adapt the interface of the virtual component to the protocol supported by the SoC bus. The main requirement for these wrappers is that the latency and area they add be as low as possible. The other solution is to design the IP with the final environment in mind.
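To make the wrapper idea concrete, the sketch below places a hypothetical IP core, whose native interface is a pair of read/write methods, behind a simplified Wishbone-style classic handshake: the master drives CYC, STB, WE, ADR and data, and the wrapper answers with ACK. The cycle-level model, the wait-state counter and the register-file core are assumptions chosen to show where a wrapper adds latency; this is not a complete or compliant Wishbone implementation.

```python
from dataclasses import dataclass


class RegisterFileIP:
    """Hypothetical IP core with a native (non-bus) read/write interface."""

    def __init__(self, size: int = 16) -> None:
        self.regs = [0] * size

    def write(self, addr: int, data: int) -> None:
        self.regs[addr] = data

    def read(self, addr: int) -> int:
        return self.regs[addr]


@dataclass
class MasterSignals:
    """Signals driven by the bus master during one clock cycle."""
    cyc: bool = False
    stb: bool = False
    we: bool = False
    adr: int = 0
    dat_w: int = 0


class WishboneStyleWrapper:
    """Adapts RegisterFileIP to a CYC/STB/ACK handshake; wait_states models added latency."""

    def __init__(self, ip: RegisterFileIP, wait_states: int = 0) -> None:
        self.ip = ip
        self.wait_states = wait_states
        self.count = 0
        self.ack = False
        self.dat_r = 0

    def clock(self, m: MasterSignals) -> None:
        """Evaluate one rising clock edge."""
        if m.cyc and m.stb and not self.ack:
            if self.count < self.wait_states:
                self.count += 1               # insert a wait state
                return
            if m.we:
                self.ip.write(m.adr, m.dat_w)
            else:
                self.dat_r = self.ip.read(m.adr)
            self.ack = True
            self.count = 0
        else:
            self.ack = False                  # ACK lasts a single cycle


def bus_transfer(wrapper: WishboneStyleWrapper, we: bool, adr: int,
                 dat: int = 0, max_cycles: int = 16) -> tuple[int, int]:
    """Master side: hold the request until ACK; return (read data, cycles taken)."""
    req = MasterSignals(cyc=True, stb=True, we=we, adr=adr, dat_w=dat)
    for cycle in range(1, max_cycles + 1):
        wrapper.clock(req)
        if wrapper.ack:
            wrapper.clock(MasterSignals())    # master releases the bus
            return wrapper.dat_r, cycle
    raise TimeoutError("slave never acknowledged")


ip = RegisterFileIP()
fast, slow = WishboneStyleWrapper(ip), WishboneStyleWrapper(ip, wait_states=2)
bus_transfer(fast, we=True, adr=3, dat=0xAB)
print(bus_transfer(fast, we=False, adr=3), bus_transfer(slow, we=False, adr=3))
```

The cycle count returned for the slower wrapper shows the latency cost that a wrapper design tries to keep low.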
Image Processing II
Performance optimization of an MPEG-2 to MPEG-4 video transcoder
MPEG-2 compressed digital video content is used in a number of products, including DVDs, camcorders, digital TV and HDTV. The ability to access this widely available MPEG-2 content on low-power end-user devices such as PDAs and mobile phones depends on effective techniques for transcoding it into a more appropriate, low-bitrate video format such as MPEG-4. In this paper, we present the software and algorithmic optimizations performed in developing a real-time MPEG-2 to MPEG-4 video transcoder. A brief overview of transcoding architectures is also provided. The details of the transcoding architectures for MPEG-2 to MPEG-4 video transcoding can be found in. The transcoder was targeted at and optimized for Windows PCs with Intel Pentium-4 processors, and the optimizations exploit the SIMD parallelism these processors offer. The transcoder consists of two distinct components: the MPEG-2 video decoder and the MPEG-4 video transcoder. The MPEG-2 video decoder is based on the MPEG-2 Software Simulation Group's reference implementation, while the MPEG-4 transcoder was developed from scratch, with portions taken from the MOMUSYS implementation of the MPEG-4 video encoder. The optimizations include 1) generic block-processing optimizations that affect both the MPEG-2 decoder and the MPEG-4 transcoder, and 2) optimizations specific to the MPEG-2 video decoder and to the MPEG-4 video transcoder. They resulted in significant improvements in both MPEG-2 decoding and MPEG-4 transcoding: the total time spent by the transcoder was reduced by over 82%, with MPEG-2 decoding time reduced by over 56% and MPEG-4 transcoding time by over 86%.
New lifting folded pipelined discrete wavelet transform architecture
Guillermo Paya, Marcos M. Peiro, J. Francisco Ballester, et al.
This work describes a new architecture for the CDF(2,2) wavelet basis. The proposed architecture is based on the recursive pyramid algorithm (RPA) and the multirate folding technique to obtain better performance. The use of folding and retiming techniques improves area and speed. To obtain a maximally fast structure, we have modified the initial architecture scheduling, introducing internal pipeline delays to reduce the logic depth to one adder. Two different implementations, using the lifting scheme and polyphase decomposition, are discussed; the lifting implementation requires approximately 52% fewer hardware resources than the polyphase structure. Finally, a comparison is presented between our architecture and other folded architectures that perform all the computations in one filter bank. Our folded architecture reduces the number of registers and logic operators, increasing the operating frequency and minimizing the occupied area with the same throughput (one input / one output). Moreover, the architecture can easily be scaled up by replicating the delay block. It achieves 87.5% hardware utilization.
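A utilization of 87.5% is consistent with a simple counting argument for recursive-pyramid scheduling of a three-level DWT on one folded filter bank: level j produces one output every 2^j input samples, so the unit is busy in 1/2 + 1/4 + 1/8 = 7/8 of the time slots. The sketch below builds one common form of such a schedule (slot t serves level 1 + the number of trailing zero bits of t, and is idle if that exceeds the number of levels) and counts busy slots; whether the authors define utilization exactly this way is an assumption.

```python
def rpa_schedule(n_slots: int, levels: int) -> list[int]:
    """Level served in each time slot t = 1..n_slots (0 = idle)."""
    schedule = []
    for t in range(1, n_slots + 1):
        level = 1 + ((t & -t).bit_length() - 1)   # 1 + trailing zeros of t
        schedule.append(level if level <= levels else 0)
    return schedule


def utilization(levels: int, n_slots: int = 1 << 10) -> float:
    """Fraction of slots in which the shared filter bank is busy."""
    sched = rpa_schedule(n_slots, levels)
    return sum(1 for lv in sched if lv) / n_slots


print(rpa_schedule(16, 3))   # [1, 2, 1, 3, 1, 2, 1, 0, 1, 2, 1, 3, 1, 2, 1, 0]
print(utilization(3))        # 0.875
```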
0.25-µm technology arithmetic codec for mobile multimedia communicators
Alberto Alvarez, Sebastian Lopez, Jose Fco. Lopez, et al.
Low power dissipation is a must in mobile devices because of its influence on their weight and hence on their portability. In this paper, the implementation of an arithmetic codec in 0.25 μm technology with a good power/area/performance trade-off is presented. One of the key aspects introduced to obtain good performance is the use of low-precision rather than full-precision arithmetic, which eliminates the multiplications and divisions otherwise needed to process symbols and coefficients. These operations are replaced by shift/add operations, reducing the complexity of the algorithm and improving the encoding and decoding process. The chip has been described in a high-level language, ensuring its portability to other technologies. The implementation results in a 25 mm² chip, pads included, with a total power dissipation of 300 mW at an operating frequency of 10 MHz.
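The shift-for-multiply idea can be illustrated with a binary arithmetic coder in which the probability of the less probable symbol (LPS) is approximated by a power of two, 2^-k, so that subdividing the current interval needs only a right shift and a subtraction. This is not the chip's coder (its context modeling and precision are not described in the abstract); the value of k and the renormalization threshold are assumptions, and Python's arbitrary-precision integers are used so that carry propagation and bit output can be ignored.

```python
import random

PRECISION = 16   # hypothetical: interval width is kept >= 2**16 grid units


def encode(bits: list[int], k: int) -> tuple[int, int]:
    """Encode bits (1 = LPS) with P(LPS) approximated as 2**-k.

    Returns (code, scale): code lies in the final interval, in units of 2**-scale.
    """
    if not 1 <= k <= PRECISION:
        raise ValueError("k must be in [1, PRECISION]")
    low, width, scale = 0, 1 << PRECISION, PRECISION
    for b in bits:
        if width < (1 << PRECISION):      # renormalize by refining the grid
            low <<= PRECISION
            width <<= PRECISION
            scale += PRECISION
        lps = width >> k                  # a shift replaces width * P(LPS)
        if b:
            low += width - lps            # LPS: top part of the interval
            width = lps
        else:
            width -= lps                  # MPS: bottom part
    return low, scale


def decode(code: int, scale: int, n: int, k: int) -> list[int]:
    """Replay the encoder's interval updates to recover n bits."""
    low, width, s = 0, 1 << PRECISION, PRECISION
    out = []
    for _ in range(n):
        if width < (1 << PRECISION):
            low <<= PRECISION
            width <<= PRECISION
            s += PRECISION
        lps = width >> k
        split = low + width - lps         # boundary between MPS and LPS parts
        if code >= split << (scale - s):  # compare on the encoder's final grid
            out.append(1)
            low, width = split, lps
        else:
            out.append(0)
            width -= lps
    return out


rng = random.Random(1)
msg = [1 if rng.random() < 0.1 else 0 for _ in range(500)]
code, scale = encode(msg, k=3)
print(f"{len(msg)} symbols coded on a grid of 2**-{scale}")
```

The price of the shift is that only probabilities of the form 2^-k can be represented, which costs a little compression efficiency in exchange for removing the multiplier.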
New distributed arithmetic discrete wavelet packet transform architecture
Guillermo Paya, Marcos M. Peiro, J. Francisco Ballester, et al.
This paper describes a new architecture for the discrete wavelet packet transform (DWPT) based on a folded distributed arithmetic (DA) implementation, which makes it possible to compute two complete stages (a 4-subband DWPT). The proposed parameterized architecture can use different CDF wavelet coefficients with modified precision. Because the distributed arithmetic technique lends itself to scalable designs, the architecture can easily be parameterized: the input data precision and the coefficient precision can be increased by modifying the register size and the memory space, respectively, and the number of coefficients can also be changed by increasing the memory and replicating the register structure. Our architecture uses only two FIR filters (high-pass and low-pass) that are folded to compute several wavelet stages together in time. A DWPT using CDF(9/7) wavelet coefficients has been implemented on a VIRTEX-E1000-6 FPGA for different precisions. Finally, the combined use of the folding technique and the DA structure achieved an operating frequency of 75 MHz with 393 flip-flop slices (at 8-bit precision) on the FPGA.
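Distributed arithmetic replaces the multipliers of an FIR filter with a look-up table: for K taps, a table of 2^K entries holds every possible sum of coefficients, and the inner product is accumulated bit-serially (one bit of every input sample per cycle) with a shift-and-add, subtracting instead of adding for the sign bit of two's-complement inputs. The sketch shows this for integer inputs; the 4-tap integer coefficients and 8-bit width are hypothetical, and the folding of the high- and low-pass filters across wavelet stages is not modeled.

```python
def build_lut(coeffs: list[int]) -> list[int]:
    """Entry m holds the sum of coeffs[k] for every bit k set in m."""
    K = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (m >> k) & 1) for m in range(1 << K)]


def da_inner_product(samples: list[int], lut: list[int], width: int = 8) -> int:
    """Bit-serial DA evaluation of sum(coeffs[k] * samples[k]) for two's-complement inputs."""
    acc = 0
    for b in range(width):                       # LSB first, one bit plane per cycle
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> b) & 1) << k          # bit b of each sample forms the address
        term = lut[addr] << b
        acc += -term if b == width - 1 else term # sign bit carries negative weight
    return acc


coeffs = [3, -5, 7, 2]
lut = build_lut(coeffs)
samples = [-128, 127, -1, 64]
print(da_inner_product(samples, lut), sum(c * x for c, x in zip(coeffs, samples)))
```

Raising the input precision only adds cycles (a larger `width`), while adding coefficients doubles the table per tap, which matches the register/memory trade-off described above.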
Mixed-signal early vision chip with embedded-image and programming memories and digital I/O
Gustavo Linan-Cembrano, Angel Rodriguez-Vazquez, Rafael Dominguez-Castro, et al.
From a system-level perspective, this paper presents a 128x128 flexible and reconfigurable focal-plane analog programmable array processor, designed as a single chip in a 0.35 μm standard digital 1P-5M CMOS technology. The core processing array has been designed to achieve high-speed operation and sufficient accuracy (~7 bits) with low power consumption. The chip includes on-chip program memory to allow the execution of complex image processing algorithms with sequential and/or bifurcating flow. It also includes the structures and circuits needed to guarantee its embedding into conventional digital host systems: external data interchange and control are completely digital. The chip contains close to four million transistors, 90% of them working in analog mode. It delivers up to 330 GOPS (giga-operations per second) and uses power (180 GOP/J) and silicon area (3.8 GOPS/mm²) efficiently, as it can sustain VGA processing throughputs of 100 frames/s with about 15 basic image processing tasks on each frame.
Technology II
Approaching nanoscale integration
Technological progress is inevitably linked with decreasing feature size. In the past we have learned that shrinking brings many benefits: higher speed, lower power consumption (which scales with CU²) and higher levels of integration. This manifests itself in gigahertz processors and highly complex SoCs, even in battery-operated products such as hand-held phones. However, dark clouds are gathering: processor developers are talking about a power crisis, meaning that they do not know how to cool their chips; experts are stating that analog scaling has come to an end; and development and processing costs are starting to become overwhelming. Why does this happen, and how will it continue?
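The CU² dependence mentioned above can be made explicit with the classical constant-field scaling argument (textbook background, not taken from the paper). Dynamic power per gate is P = αCU²f; if all dimensions and the supply voltage U shrink by a factor κ > 1, then C scales as 1/κ and f as κ, while gate density grows as κ²:

$$
P_{\text{gate}} = \alpha C U^2 f \;\propto\; \frac{1}{\kappa}\cdot\frac{1}{\kappa^{2}}\cdot\kappa = \frac{1}{\kappa^{2}},
\qquad
\frac{P}{\text{area}} \;\propto\; \kappa^{2}\cdot\frac{1}{\kappa^{2}} = 1 .
$$

If U can no longer be reduced (for example because of threshold-voltage and leakage limits), the same argument gives

$$
P_{\text{gate}} \;\propto\; \frac{1}{\kappa}\cdot 1\cdot\kappa = 1,
\qquad
\frac{P}{\text{area}} \;\propto\; \kappa^{2},
$$

so power density grows with every generation, which is one way to read the power crisis described above.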
Holographic study of microsystems during space missions in the 21st century
A classification of microsystems is introduced. A review is given of holographic terrestrial aerospace research and of in-orbit holographic investigations performed under microgravity conditions during the last century. Prospects for holographic in-orbit research on microsystems, and for holographic research during future interplanetary missions, in the 21st century are discussed. Advanced holographic techniques are presented that open up novel possibilities for producing holograms and holographic interferograms of MEMS, microelectronic components and other microsystems. These innovative techniques are ideally suited for testing MEMS and microelectronics, monitoring various physical processes, and studying vibrations and static deformations in microgravity aboard the International Space Station. Minimal hardware is required; it is very compact, portable and user-friendly, and so simple that it can be operated by an astronaut with practically no skill in optics. One of the early variants of the holographic techniques invented by the author was used to obtain the first-ever holograms and holographic interferograms of various physical phenomena outside the Earth aboard spacecraft in flight. A unique feature of the innovative techniques is the possibility of working in real time in situ. Holograms and holographic interferograms can be obtained in any brightly lit environment, including sunlit ones; the latter may be very important in future planetary missions. A novel, very small, portable holographic device with no lenses and no alignment problems is presented, and a holographic minirobot for planet-based investigations is proposed. Experimental data illustrating the novel possibilities and prospects for future in-orbit and interplanetary space research are presented.
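For the static-deformation studies mentioned above, the usual reading of a double-exposure holographic interferogram is that, with illumination and observation at angles θ_i and θ_o from the surface normal, an out-of-plane displacement d changes the phase by (2π/λ)·d·(cos θ_i + cos θ_o), so N fringes correspond to d = Nλ/(cos θ_i + cos θ_o), i.e. λ/2 per fringe at normal incidence. The helper below applies that relation; the He-Ne wavelength of 632.8 nm is only an example, since the abstract does not state the laser used.

```python
import math


def displacement_from_fringes(n_fringes: float, wavelength_m: float = 632.8e-9,
                              illum_angle_rad: float = 0.0,
                              obs_angle_rad: float = 0.0) -> float:
    """Out-of-plane displacement d = N * lambda / (cos(theta_i) + cos(theta_o))."""
    return n_fringes * wavelength_m / (math.cos(illum_angle_rad) + math.cos(obs_angle_rad))


print(f"{displacement_from_fringes(10) * 1e6:.3f} um for 10 fringes")
```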
Evaluation of package and technology effects on substrate-crosstalk isolation in CMOS RFIC
Xavier Aragones, Diego Mateo, Olga Boric-Lubecke
Crosstalk propagating through the silicon substrate is a serious limiting factor on the performance of advanced mixed analog-digital CMOS integrated circuits. This problem also appears in RF chips in the form of power leakage from local oscillators or power amplifiers, as well as noise coupled from the digital baseband circuitry. Several studies have presented measurements on simple test structures to determine the best approach to minimizing this leakage. However, these studies are usually restricted to a single technology, and the consequences of applying their results to other technologies are not evaluated. They are also usually performed on on-wafer samples, so package effects are not taken into account. Yet package parasitics are an important factor in substrate crosstalk, since they determine how much of the leakage finds a return path to external ground. In this paper, we discuss different technological approaches to increasing isolation between coupled circuits. Isolation measurements on test structures fabricated in a CMOS RF technology are presented. The effect of package parasitics is evaluated by comparing on-wafer and packaged samples. The measurement results are complemented with simulations covering a broader range of situations.
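Why package parasitics matter can be seen in a deliberately crude lumped model: noise injected through a substrate resistance R_sub reaches a shared substrate node that returns to external ground through a contact resistance R_gnd in series with a bond-wire inductance L_bw, and a victim senses that node. The node voltage relative to the aggressor is then Z_gnd/(R_sub + Z_gnd) with Z_gnd = R_gnd + jωL_bw, so an inductive return path makes isolation degrade with frequency. All element values are hypothetical; real substrate networks, as measured and simulated in the paper, are distributed and technology dependent.

```python
import math

R_SUB = 500.0   # hypothetical aggressor-to-node substrate resistance (ohm)
R_GND = 2.0     # hypothetical ground-contact resistance (ohm)


def coupling_db(f_hz: float, l_bw: float) -> float:
    """|V_node / V_aggressor| in dB for the lumped divider described above."""
    z_gnd = complex(R_GND, 2 * math.pi * f_hz * l_bw)
    return 20 * math.log10(abs(z_gnd / (R_SUB + z_gnd)))


for label, l_bw in [("ideal ground (L = 0)", 0.0), ("2 nH bond wire", 2e-9)]:
    row = ", ".join(f"{f / 1e9:.1f} GHz: {coupling_db(f, l_bw):6.1f} dB"
                    for f in (0.1e9, 1e9, 5e9))
    print(f"{label:22s} {row}")
```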
Low-cost VLSI-compatible resonant-cavity-enhanced p-i-n in μc-Si operating at the VCSEL wavelengths around 850 nm
An original design of resonant-cavity-enhanced (RCE) photodetectors at 850 nm, realized in microcrystalline silicon by simple, low-cost thin-film deposition processes compatible with standard VLSI technologies, is presented. The configuration allows high quantum efficiency in a thin active region, which increases the bandwidth by reducing the carrier transit time in the device. The device response is also wavelength selective. Characterization of the high-quality distributed Bragg reflectors, needed to define and optimize the microcavity, and of the active p-i-n structure is also reported.
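The reflectance of a quarter-wave distributed Bragg reflector, the element that defines the resonant cavity, can be computed with the standard characteristic-matrix (transfer-matrix) method at normal incidence. The refractive indices, substrate and pair counts below are hypothetical placeholders, not the materials or layer counts of the paper; only the 850 nm design wavelength comes from the abstract.

```python
import math

WAVELENGTH = 850e-9   # design wavelength from the abstract (m)


def layer_matrix(n: float, d: float, lam: float) -> list[list[complex]]:
    """Characteristic matrix of a homogeneous layer at normal incidence."""
    delta = 2 * math.pi * n * d / lam
    c, s = math.cos(delta), math.sin(delta)
    return [[c, 1j * s / n], [1j * n * s, c]]


def matmul(a, b):
    return [[a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
            [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]]]


def reflectance(layers: list[tuple[float, float]], n0: float, ns: float,
                lam: float = WAVELENGTH) -> float:
    """layers: (index, thickness) pairs listed from the incident side to the substrate."""
    m = [[1, 0], [0, 1]]
    for n, d in layers:
        m = matmul(m, layer_matrix(n, d, lam))
    b = m[0][0] + m[0][1] * ns
    c = m[1][0] + m[1][1] * ns
    r = (n0 * b - c) / (n0 * b + c)
    return abs(r) ** 2


def quarter_wave_dbr(n_high: float, n_low: float, pairs: int,
                     lam: float = WAVELENGTH) -> list[tuple[float, float]]:
    """(HL)^pairs stack with quarter-wave optical thickness lam / (4n) per layer."""
    return [(n_high, lam / (4 * n_high)), (n_low, lam / (4 * n_low))] * pairs


N_H, N_L, N_SUB = 3.5, 1.45, 3.6   # hypothetical high, low and substrate indices
for p in (0, 1, 2, 4):
    print(p, round(reflectance(quarter_wave_dbr(N_H, N_L, p), 1.0, N_SUB), 4))
```

A few high-contrast pairs already push the reflectance close to unity, which is what allows a thin absorbing layer to reach high quantum efficiency inside the cavity.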
Modeling
Macromodel for exact computation of propagation delay time in GaAs and CMOS technologies
A new transient macromodel for the cells used in DCFL GaAs and CMOS digital design is introduced in this paper. Its numerical solution yields accurate propagation delay times. The macromodel is based on the differential equation for the output voltage in terms of currents and capacitances. A straightforward treatment of this differential equation has been obtained for an inverter in DCFL GaAs and CMOS, and it can be solved numerically with a fourth-order Runge-Kutta method. The propagation delay times for the high-to-low (tphl) and low-to-high (tplh) transitions of DCFL GaAs and CMOS basic gates (INV, NOR, OR and NAND) were evaluated and compared with HSPICE simulations. The results show no error with respect to HSPICE. The numerical method was programmed in C. In addition, a computation-time analysis is provided, showing that the numerical solution is several orders of magnitude faster than HSPICE. Work is in progress to obtain the macromodel of a standard cell library for digital applications, both for a 0.6 µm E/D GaAs process (H-GaAsIV) from Vitesse Semiconductor and for a 0.18 µm logic/mixed-signal CMOS process (1P6M) from TSMC Corp.
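The core numerical step, integrating C dV/dt = -I(V) with fourth-order Runge-Kutta and reading off the 50% crossing, can be sketched as follows. The pull-down current here is an assumed square-law (Shichman-Hodges) NMOS with a step input, not the paper's DCFL GaAs or CMOS macromodel. For this simple current the integral also has a closed form, which the sketch uses as a cross-check. All parameter values are hypothetical.

```python
import math

def nmos_current(v_out: float, v_dd: float, v_t: float, k: float) -> float:
    """Square-law pull-down current with the gate held at v_dd (illustrative assumption)."""
    v_ov = v_dd - v_t                                  # gate overdrive
    if v_out >= v_ov:
        return 0.5 * k * v_ov ** 2                     # saturation region
    return k * (v_ov * v_out - 0.5 * v_out ** 2)       # linear (triode) region

def tphl_rk4(c_load: float, v_dd: float, v_t: float, k: float, dt: float = 1e-13) -> float:
    """Integrate C dV/dt = -I(V) from V = v_dd with classic RK4; return the v_dd/2 crossing time."""
    def dvdt(v: float) -> float:
        return -nmos_current(v, v_dd, v_t, k) / c_load
    t, v, target = 0.0, v_dd, 0.5 * v_dd
    while True:
        k1 = dvdt(v)
        k2 = dvdt(v + 0.5 * dt * k1)
        k3 = dvdt(v + 0.5 * dt * k2)
        k4 = dvdt(v + dt * k3)
        v_next = v + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        if v_next <= target:
            return t + dt * (v - target) / (v - v_next)  # interpolate inside the last step
        t, v = t + dt, v_next

def tphl_analytic(c_load: float, v_dd: float, v_t: float, k: float) -> float:
    """Closed-form step-input tphl for the same square-law model (cross-check)."""
    a = v_dd - v_t
    return c_load / (k * a) * (2.0 * v_t / a + math.log(4.0 * a / v_dd - 1.0))

if __name__ == "__main__":
    args = (50e-15, 3.3, 0.7, 200e-6)  # hypothetical: 50 fF load, 3.3 V, Vt 0.7 V, k 200 uA/V^2
    print(tphl_rk4(*args), tphl_analytic(*args))
```

Replacing `nmos_current` with a more detailed current expression keeps the same integration loop. This is the practical appeal of a macromodel solved numerically rather than in closed form.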
Simulation of void formation in interconnect lines
Alireza Sheikholeslami, Clemens Heitzinger, Helmut Puchner, et al.
The predictive simulation of void formation in interconnect lines is important for improving capacitance and timing in current memory cells. The cells considered are used in wireless applications such as cell phones, pagers, radios, handheld games, and GPS systems. In backend processes for memory cells, the ILD (interlayer dielectric) materials and processes result in void formation during gap fill. The voids lower the overall k-value of a given metal layer, which is economically advantageous, and they have a strong effect on the overall capacitive load. To simulate the shapes and positions of the voids, and thus the overall capacitance, the topography simulator ELSA (Enhanced Level Set Applications) has been developed. It consists of three modules: a level set module, a radiosity module, and a surface reaction module. The deposition process considered is the deposition of silicon nitride. Test structures of memory-cell interconnect lines were fabricated, and SEM images of them were used to validate the corresponding simulations.
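ELSA's level set module is not described in detail in the abstract. As a generic illustration of the level-set idea, the sketch below represents the surface as the zero contour of a function phi and advances it with a first-order Godunov upwind scheme for phi_t + F |grad phi| = 0. It assumes a uniform, isotropic deposition rate F. In a real topography simulator, the radiosity and surface reaction modules would instead supply a local rate. Grid size, rate and seed shape are hypothetical.

```python
import numpy as np

def grow_level_set(phi: np.ndarray, speed: float, h: float, dt: float, steps: int) -> np.ndarray:
    """Advance phi_t + F|grad phi| = 0 (F > 0 moves the front outward, i.e. deposition)
    using a first-order Godunov upwind scheme on a uniform grid of spacing h."""
    phi = phi.copy()
    for _ in range(steps):
        # One-sided differences; np.roll wraps at the borders, which is harmless here
        # because the front stays far from the domain edges.
        dxm = (phi - np.roll(phi, 1, axis=1)) / h
        dxp = (np.roll(phi, -1, axis=1) - phi) / h
        dym = (phi - np.roll(phi, 1, axis=0)) / h
        dyp = (np.roll(phi, -1, axis=0) - phi) / h
        grad = np.sqrt(np.maximum(dxm, 0.0) ** 2 + np.minimum(dxp, 0.0) ** 2
                       + np.maximum(dym, 0.0) ** 2 + np.minimum(dyp, 0.0) ** 2)
        phi -= dt * speed * grad
    return phi

def front_radius(phi: np.ndarray, x: np.ndarray) -> float:
    """Zero crossing of phi along the +x axis of the centre row (linear interpolation)."""
    c = len(x) // 2
    row = phi[c, c:]
    i = int(np.argmax(row > 0.0))          # first grid point outside the front
    x0, x1 = x[c + i - 1], x[c + i]
    return float(x0 + (x1 - x0) * (-row[i - 1]) / (row[i] - row[i - 1]))

if __name__ == "__main__":
    x = np.linspace(-1.0, 1.0, 101)
    h = x[1] - x[0]
    xx, yy = np.meshgrid(x, x)
    phi0 = np.sqrt(xx ** 2 + yy ** 2) - 0.3   # signed distance to a seed circle of radius 0.3
    phi = grow_level_set(phi0, speed=1.0, h=h, dt=0.5 * h, steps=20)
    print(front_radius(phi, x))               # expected near 0.3 + 20 * 0.5 * h = 0.5
```

The zero contour can merge or pinch off without any special bookkeeping. This property makes level sets attractive for tracking a gap-fill front that closes over a void.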
Timing and power model for CMOS inverters
The delay, the output transition time and the short-circuit power consumption of CMOS gates depend on the load capacitance and the input transition time. Current technology libraries use table models with 25 or more samples, from which each of these three variables is calculated by interpolation. Previous analytical models either neglect the short-circuit current or approximate the currents as piecewise linear. In the first part of this paper, different mathematical models describing the transistor current are compared with respect to the accuracy of the numerically calculated output waveform. The results show that Sakurai's alpha-power model, with a linear equation in the linear region and exponent alpha = 1, fits the underlying 0.35 μm technology well. Based on this transistor model and the assumption of a linearly rising input, the differential equation of the output voltage, including both transistor currents and the capacitive load, is solved. By splitting the solution into regions, an approximate solution can be derived for the case in which the PMOS transistor operates in the linear region and the NMOS transistor in saturation. The rather complex calculation of the point where the PMOS transistor switches from the linear to the saturation region can be simplified by curve fitting. The required curve parameters depend on technology constants, as in MM9, and on the ratio wn/wp. Consequently, one set of parameters allows the analysis of a wide range of inverters as long as wn/wp is kept constant. The delay results are typically within 10% of SPICE simulation, and those for output transition time and power consumption within 5%.
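A minimal sketch of the transistor model the abstract selects is given below. It uses Sakurai and Newton's alpha-power law for the saturation current and saturation voltage. Below saturation, the current is taken as a straight line in V_DS, which is how this sketch interprets "linear equation in the linear region". The parameter names are illustrative, and the numeric values in the usage are hypothetical, not the paper's 0.35 μm fit.

```python
def alpha_power_current(v_gs: float, v_ds: float, v_t: float, v_dd: float,
                        i_d0: float, v_d0: float, alpha: float = 1.0) -> float:
    """Drain current from Sakurai-Newton's alpha-power model.

    i_d0 and v_d0 are the saturation current and saturation voltage at v_gs = v_dd.
    Below v_dsat the current is a straight line through the origin in v_ds
    (assumed reading of the linear-region simplification); alpha = 1 follows the abstract.
    """
    if v_gs <= v_t:
        return 0.0                                   # cut-off; subthreshold current neglected
    ratio = (v_gs - v_t) / (v_dd - v_t)
    i_dsat = i_d0 * ratio ** alpha                   # saturation current
    v_dsat = v_d0 * ratio ** (alpha / 2.0)           # saturation voltage
    if v_ds >= v_dsat:
        return i_dsat
    return i_dsat * v_ds / v_dsat                    # linear region

if __name__ == "__main__":
    # Hypothetical device: Vt = 0.6 V, VDD = 3.3 V, Id0 = 1 mA, Vd0 = 1 V.
    for vgs in (1.0, 2.0, 3.3):
        print(vgs, [round(alpha_power_current(vgs, vds, 0.6, 3.3, 1e-3, 1.0) * 1e3, 3)
                    for vds in (0.25, 0.5, 1.0, 2.0)])
```

With alpha = 1, the saturation current grows linearly with gate overdrive instead of quadratically. This approximates velocity saturation in short-channel devices and keeps the output-voltage differential equation tractable.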
Empirical model of the metal losses in integrated inductors
Integrated inductors are key components in Radio Frequency Integrated Circuits (RFICs) because they are needed in several building blocks, such as voltage-controlled oscillators (VCOs), low-noise amplifiers (LNAs), mixers, or filters. The cost reduction they bring to circuit assembly makes them preferable to Surface Mounted Devices. This holds despite the loss mechanisms that limit their use: losses associated with the semiconductor substrate, and losses in the metals. In this work we report our research on modeling integrated inductors, particularly the losses in the metals. The model is derived from measurements of integrated spiral inductors designed and fabricated in a standard silicon process. The measurements reveal that the widely accepted lumped equivalent model does not properly predict the integrated inductor behavior at frequencies above 3 GHz for our technology. We propose a simple modification of the lumped equivalent circuit model: the introduction of an empirical resistor in the port 1-to-port 2 branch of the equivalent circuit. It is demonstrated that the modified model predicts the integrated inductor behavior adequately over a wider frequency range than the conventional model. We also report a new methodology for characterizing integrated inductors that includes the new resistor. In addition, the new model is used to build an integrated inductor library containing optimized integrated inductors.
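For reference, the sketch below evaluates the conventional lumped pi model that the paper sets out to modify. It has a series branch r_s + jωL_s, and a shunt branch at port 1 made of the oxide capacitance in series with the substrate R_si || C_si. With port 2 grounded, the quality factor is Q = Im(Z_in)/Re(Z_in). The abstract does not give enough detail to place the proposed empirical resistor, so it is deliberately left out. All element values are hypothetical.

```python
import math

def inductor_q(freq_hz: float, l_s: float, r_s: float,
               c_ox: float, r_si: float, c_si: float) -> float:
    """Quality factor of the conventional spiral-inductor pi model with port 2 grounded.

    Series branch: r_s + j*w*l_s. Shunt branch at port 1: c_ox in series with
    (r_si parallel c_si) to ground. Q = Im(Z_in) / Re(Z_in) with Z_in = 1 / Y11.
    """
    w = 2.0 * math.pi * freq_hz
    y_series = 1.0 / complex(r_s, w * l_s)
    z_sub = 1.0 / complex(1.0 / r_si, w * c_si)             # r_si || c_si
    y_shunt = 1.0 / (z_sub + 1.0 / complex(0.0, w * c_ox))  # c_ox in series with substrate
    z_in = 1.0 / (y_series + y_shunt)
    return z_in.imag / z_in.real

if __name__ == "__main__":
    # Hypothetical 2 nH spiral: 5 ohm metal loss, 100 fF oxide, 1 kohm / 50 fF substrate.
    for f in (0.5e9, 1e9, 2e9, 4e9):
        print(f / 1e9, "GHz  Q =", round(inductor_q(f, 2e-9, 5.0, 100e-15, 1000.0, 50e-15), 2))
```

In such a model, a constant r_s cannot capture metal losses that rise with frequency through skin and proximity effects. This is one plausible reason why an extra empirical element can extend the useful frequency range.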
CAD II
On-chip training for cellular neural networks using iterative annealing
Cellular Neural Network-Universal Machines (CNN-UM) are analog devices that are well suited to image processing. A major challenge is determining CNN templates for specific image processing tasks. In many cases, appropriate templates can only be found by parameter optimization. Templates for complex CNN applications are usually determined with a CNN software simulator. Unfortunately, such templates often cannot be used in hardware realizations of CNNs because of effects of the hardware realization. To find robust templates that work not only on CNN simulators but also on hardware implementations, we present a new kind of on-chip multi-template training. As a possible application, we also present a CNN-based solution to the problem of pattern matching. Pattern matching is a processing step in many areas of image processing, such as motion estimation and image and video compression.
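Any template-training loop, on-chip or in a simulator, has to evaluate how a candidate template drives the network. The sketch below shows that evaluation for the standard Chua-Yang CNN, dx/dt = -x + A*y + B*u + z, with the piecewise-linear output y = 0.5(|x+1| - |x-1|). It uses forward-Euler integration, zero-padded borders and 3x3 correlation, all of which are simplifying choices. The annealing-based training itself is not shown. A software model like this cannot capture the hardware effects that motivate on-chip training.

```python
import numpy as np

def cnn_output(x: np.ndarray) -> np.ndarray:
    """Standard piecewise-linear CNN output nonlinearity."""
    return 0.5 * (np.abs(x + 1.0) - np.abs(x - 1.0))

def correlate3x3(img: np.ndarray, t: np.ndarray) -> np.ndarray:
    """3x3 template correlation with zero-padded (fixed, zero) boundary cells."""
    p = np.pad(img, 1)
    out = np.zeros_like(img, dtype=float)
    for di in range(3):
        for dj in range(3):
            out += t[di, dj] * p[di:di + img.shape[0], dj:dj + img.shape[1]]
    return out

def simulate_cnn(u: np.ndarray, x0: np.ndarray, a: np.ndarray, b: np.ndarray, z: float,
                 dt: float = 0.05, steps: int = 400) -> np.ndarray:
    """Forward-Euler integration of dx/dt = -x + A*y + B*u + z; returns the final output."""
    x = x0.astype(float).copy()
    bu = correlate3x3(u, b) + z            # feed-forward term is constant in time
    for _ in range(steps):
        x += dt * (-x + correlate3x3(cnn_output(x), a) + bu)
    return cnn_output(x)

if __name__ == "__main__":
    # Hypothetical example: a self-feedback-only template drives each cell to +/-1.
    a = np.zeros((3, 3)); a[1, 1] = 2.0
    b = np.zeros((3, 3))
    x0 = np.where(np.arange(25).reshape(5, 5) % 2 == 0, 0.3, -0.3)
    print(simulate_cnn(np.zeros((5, 5)), x0, a, b, 0.0))
```

A training procedure would wrap `simulate_cnn` or the chip itself in a cost function comparing the settled output with a desired image, and then perturb the template entries of A, B and z.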
Optimal design of a leak-proof SRAM cell using MCDM method
As deep-submicron CMOS technology advances, on-chip cache has become a bottleneck for microprocessor performance. It also occupies a large percentage of the processor area and consumes considerable power. The speed, power and area requirements of SRAM conflict with one another and are not easy to meet simultaneously. Many leakage suppression techniques have been proposed, but they limit the circuit's performance. We apply a Multi-Criteria Decision Making (MCDM) strategy to perform a minimum delay-power-area optimization of an SRAM circuit under given constraints. Based on an integrated device- and circuit-level approach, we search for a process that yields a targeted composite performance. Because the search for the optimal design involves a huge simulation workload, most of the process is automated. With varying emphasis on delay, power or area, different optimal SRAM designs are derived, and a gate-oxide thickness scaling limit is projected. The results seem to indicate that a better composite performance could be achieved with a thinner oxide. At the derived optimal oxide thickness, static leakage contributes less than 1% of the total power dissipation.
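The abstract does not name the specific MCDM method used. As an assumption, the sketch below uses the simple weighted-sum approach to show how shifting emphasis between delay, power and area changes which candidate design wins. The candidate values in the usage are hypothetical.

```python
def rank_designs(designs: dict, weights: tuple) -> list:
    """Return design names ordered best-first by a weighted sum of normalized costs.

    designs: {name: (delay, power, area)}; all three criteria are minimized.
    weights: (w_delay, w_power, w_area), assumed to sum to 1.
    Each criterion is divided by its best (smallest) value across the designs,
    so a score of 1.0 would mean "best in every criterion".
    """
    best = [min(v[i] for v in designs.values()) for i in range(3)]

    def score(vals) -> float:
        return sum(w * v / b for w, v, b in zip(weights, vals, best))

    return sorted(designs, key=lambda name: score(designs[name]))

if __name__ == "__main__":
    # Hypothetical candidates in arbitrary units: (delay, power, area).
    candidates = {"A": (1.0, 3.0, 2.0), "B": (2.0, 1.0, 2.0), "C": (1.5, 1.5, 1.0)}
    print(rank_designs(candidates, (0.9, 0.05, 0.05)))   # delay-oriented emphasis
    print(rank_designs(candidates, (0.05, 0.9, 0.05)))   # power-oriented emphasis
```

Normalizing by the best value makes criteria with different units comparable. More elaborate MCDM methods differ mainly in how they normalize and aggregate these criteria.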
Evolutionary design and FPGA implementation of digital filters
Antonia Azzini, Matteo Bettoni, Valentino Liberali, et al.
This paper discusses the use of evolutionary algorithms to design digital circuits. It is shown that evolutionary design can be fully compliant with existing design methodologies. Moreover, evolutionary design explores the design space more thoroughly, and can therefore find solutions whose features differ from those of conventional designs. In some cases, evolved circuits have better performance, or they can be optimized with respect to different parameters. An example of the design of a multi-rate digital filter with reduced power consumption is presented and discussed. An FPGA implementation demonstrates that evolutionary design can save both area and power with respect to conventional design.
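The abstract does not specify the algorithm, representation or fitness function. As a generic illustration of evolutionary filter design, the sketch below runs an elitist (1+λ) evolution strategy over integer FIR coefficients, interpreted as value/scale. The integers stand in for a hardware-friendly representation. Fitness is the squared error to a target magnitude response. A hardware-oriented fitness, such as one that includes power, could be added to `error` without changing the loop.

```python
import numpy as np

def magnitude_response(coeffs: np.ndarray, n_freq: int = 64) -> np.ndarray:
    """|H(e^jw)| of an FIR filter sampled on n_freq points in [0, pi]."""
    w = np.linspace(0.0, np.pi, n_freq)
    k = np.arange(len(coeffs))
    return np.abs(np.exp(-1j * np.outer(w, k)) @ coeffs)

def evolve_fir(target: np.ndarray, n_taps: int = 9, scale: int = 64,
               generations: int = 300, offspring: int = 8, seed: int = 0):
    """(1+lambda) evolution strategy over integer FIR coefficients (value / scale)."""
    rng = np.random.default_rng(seed)

    def error(c: np.ndarray) -> float:
        return float(np.sum((magnitude_response(c / scale, len(target)) - target) ** 2))

    def mutate(c: np.ndarray) -> np.ndarray:
        child = c.copy()
        child[rng.integers(n_taps)] += rng.integers(-4, 5)   # small change to one coefficient
        return child

    parent = np.zeros(n_taps, dtype=int)
    parent_err = error(parent)
    for _ in range(generations):
        children = [mutate(parent) for _ in range(offspring)]
        errs = [error(c) for c in children]
        best = int(np.argmin(errs))
        if errs[best] <= parent_err:                          # elitist selection
            parent, parent_err = children[best], errs[best]
    return parent, parent_err

if __name__ == "__main__":
    w = np.linspace(0.0, np.pi, 64)
    target = (w < np.pi / 2).astype(float)                    # hypothetical half-band low-pass
    coeffs, err = evolve_fir(target)
    print(coeffs, round(err, 3))
```

Because selection is elitist, the error never increases from one generation to the next. Restricting coefficients to small integers is the kind of constraint that can make an evolved filter cheaper in FPGA logic than a conventionally designed one.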
Poster Session
Hierarchical test pattern composition to testing a foveal imager ASIC
Martin Gonzalez, Jose R. Salinas, Francisco J. Coslado, et al.
The aim of this work is to test an ASIC intended for multiresolution image generation with high fault coverage and a low number of patterns, improving on the results obtained with other tools. The circuit includes an embedded SRAM block used to implement several internal FIFO structures. This RAM block was generated with the 'Memory Compiler Systems' supplied by AMS and does not include BIST logic, so the strategy was to generate and insert BIST logic to test the RAM operation completely. The original test algorithm proposed by the foundry support center was modified for a more thorough verification. In addition, test logic was inserted around the embedded block to achieve controllability and observability of the shadow logic connected to the RAM outputs and inputs, respectively. Once the test of the RAM has been guaranteed, the remaining logic must be tested. For this task, the full scan path approach was selected, and a hierarchical bottom-up methodology was followed to generate the test patterns. A commercial ATPG tool (Synopsys Test Compiler) was used only to generate the patterns for the lowest-level modules of the hierarchy tree. With an appropriate design partitioning (basically, defining modules with registered outputs), the patterns for the upper-level modules can easily be composed. Several suitable configurations for this partitioning have been identified and defined. Using a simple composition technique, the number of patterns is reduced by more than 37%, with a negligible decrease in fault coverage and negligible hardware overhead.
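The abstract does not say which memory test algorithm the foundry proposed or how it was modified. As an example of the kind of algorithm a RAM BIST block implements, the sketch below runs March C-, a widely used march test, against a simple software memory model. The stuck-at fault injection in the `Memory` class is a hypothetical fault model for illustration.

```python
class Memory:
    """Bit-oriented memory model with an optional stuck-at fault (hypothetical)."""

    def __init__(self, n_words: int, stuck=None):
        self.cells = [0] * n_words
        self.stuck = stuck                    # (address, stuck value) or None

    def read(self, addr: int) -> int:
        return self.cells[addr]

    def write(self, addr: int, value: int) -> None:
        if self.stuck is not None and self.stuck[0] == addr:
            value = self.stuck[1]             # the faulty cell ignores the written value
        self.cells[addr] = value

def march_c_minus(read, write, n_words: int) -> bool:
    """March C-: {any(w0); up(r0,w1); up(r1,w0); down(r0,w1); down(r1,w0); any(r0)}.

    Returns True if no read mismatch is observed (memory passes).
    """
    up = range(n_words)
    down = range(n_words - 1, -1, -1)
    for a in up:
        write(a, 0)
    for order, expect, new in ((up, 0, 1), (up, 1, 0), (down, 0, 1), (down, 1, 0)):
        for a in order:
            if read(a) != expect:
                return False
            write(a, new)
    return all(read(a) == 0 for a in up)

if __name__ == "__main__":
    for fault in (None, (5, 0), (5, 1)):
        mem = Memory(16, stuck=fault)
        print(fault, "pass" if march_c_minus(mem.read, mem.write, 16) else "FAIL")
```

A hardware BIST controller implements the same sequence with an address counter that can count up and down, a data generator and a comparator. The `read` and `write` callbacks stand in for those datapath connections.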
Experimental characterization of a synchronous frequency-hopping spread-spectrum transceiver for wireless optical communications
Santiago T Perez Suarez, Jose Alberto Rabadan, Francisco Alberto Delgado, et al.
In this paper, the design and experimental characterization of a wireless optical transceiver for indoor applications, based on Frequency-Hopping Spread-Spectrum techniques, are presented. These techniques reduce the narrowband interference produced by optical sources and the intersymbol interference induced by multipath propagation. They also make it possible to use the CDMA capabilities associated with spread spectrum to improve performance when several emitters and receivers are present. The main drawback of this kind of system is the high complexity of the receiver's synchronization system, which typically consists of two cascaded structures: acquisition and tracking. We propose using a dual-pilot signal, transmitted by a master emitter, to reduce both the complexity and the cost of the receiver's synchronization stage.
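Synchronization is needed because transmitter and receiver must step through the same pseudo-random hop sequence in lockstep. The sketch below shows one common way to generate such a sequence: a 4-bit maximal-length LFSR with feedback b[n] = b[n-3] XOR b[n-4] (characteristic polynomial x^4 + x + 1) whose state selects a carrier channel. The generator, seed and channel mapping are hypothetical and unrelated to the paper's dual-pilot scheme. Acquisition and tracking amount to finding and holding the correct phase of this sequence at the receiver.

```python
def lfsr4_states(seed: int = 0b0001):
    """Yield states of a 4-bit Fibonacci LFSR with feedback bit3 XOR bit2.

    The recurrence b[n] = b[n-3] XOR b[n-4] has the primitive characteristic
    polynomial x^4 + x + 1, so any nonzero seed visits all 15 nonzero states.
    """
    state = seed & 0xF
    if state == 0:
        raise ValueError("the all-zero state is a lock-up state")
    while True:
        yield state
        feedback = ((state >> 3) ^ (state >> 2)) & 1
        state = ((state << 1) | feedback) & 0xF

def hop_pattern(n_hops: int, seed: int = 0b0001) -> list:
    """Map LFSR states 1..15 to channel indices 0..14 (hypothetical mapping)."""
    gen = lfsr4_states(seed)
    return [next(gen) - 1 for _ in range(n_hops)]

if __name__ == "__main__":
    tx = hop_pattern(15)               # transmitter's hop sequence
    rx = hop_pattern(15)               # a receiver with the same seed and phase
    print(tx, tx == rx)
```

Knowing the seed alone is not enough. The receiver must also know where in the 15-hop cycle the transmitter is, and that phase information is what a dedicated pilot signal can help provide.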
Analysis of current-mode flip-flops in CMOS technologies
Raul Jimenez, Pilar Parra, Pedro M Sanmartin, et al.
Switching noise reduction is of high importance in mixed-mode VLSI applications. Current-mode logic circuits, such as Current Steering Logic (CSL) or Current Balanced Logic (CBL), offer advantages in switching noise reduction, since their operation relies on an almost constant current source. However, their use is limited because they exhibit static power consumption. For this reason, these logic families are only used in applications where the low-noise requirement is critical. Additionally, memory elements are the main source of noise in digital circuits, because they are driven by only a few clock signals. In this paper, an analysis of different implementations of memory elements (edge-triggered flip-flops) in current-mode technologies is presented. The main parameters, namely area, delay, power consumption and noise generation, have been measured by electrical simulation in a 0.35 µm CMOS technology. Operational reliability has also been quantified by measuring timing-violation parameters. The main results are, on the one hand, the selection of a logic family for a specific application and, on the other hand, the selection of a specific flip-flop structure for the parameter to be optimized: power, noise or speed. Variations of the measured parameters under different operating conditions have also been considered. The novelty of this work lies in the fact that this analysis, although usual in other CMOS technologies, has not been carried out before.
Temperature in HFETs when operating in DC
This work analyses the DC response of an InGaAs-channel PHFET as temperature varies. An analytic model for the drain current, incorporating the extrinsic resistances, is derived from previous work. Experimental output characteristics at different temperatures are compared with those given by the resulting model and by numerical simulations. The DC drain current is obtained by introducing the external voltages applied to the HFET terminals into an intrinsic model. The temperature range considered in this paper is 300 to 400 K, and within this range the temperature dependence of the intrinsic electrical parameters is included in the model. For the temperature dependence of the extrinsic resistances, the HFET is numerically simulated with MINIMOS-NT. To our knowledge, the influence of electron transport through the AlGaAs/InGaAs heterojunction on the extrinsic resistances has not yet been established. In our case, a thermionic-field-emission (TFE) model is used to simulate this effect. Without TFE, not only is the drain current underestimated, but the predicted temperature dependence is also opposite to the actual one. As a result, the extrinsic source resistance is nearly constant (7.5 ohms). Higher values are obtained for the extrinsic drain resistance, which has a linear, positive temperature dependence and rises as the transistor operates in the saturation region. When the drain voltage decreases, the influence of the TFE model on the extrinsic resistances vanishes, and RD tends to RS. The drain current predicted by the model, in the linear and saturation regions, shows a relative error between measured and modeled values smaller than 10%.
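Introducing external terminal voltages into an intrinsic model leads to an implicit equation, because the voltage drops across the extrinsic resistances depend on the drain current itself. The sketch below solves I_D = f(V_GS - I_D R_S, V_DS - I_D (R_S + R_D)) by bisection. The intrinsic model `toy_intrinsic` and the linear drain-resistance law `r_drain` are hypothetical placeholders, not the paper's analytic model or fitted values. Only R_S = 7.5 ohms is taken from the abstract.

```python
import math

def solve_extrinsic(i_intrinsic, v_gs: float, v_ds: float, r_s: float, r_d: float,
                    i_max: float = 1.0, tol: float = 1e-12) -> float:
    """Find I_D with I_D = f(V_GS - I_D*R_S, V_DS - I_D*(R_S + R_D)) by bisection.

    i_intrinsic must be non-decreasing in both voltages, so the residual below
    decreases monotonically in I_D and has a single root in [0, i_max].
    """
    def residual(i_d: float) -> float:
        return i_intrinsic(v_gs - i_d * r_s, v_ds - i_d * (r_s + r_d)) - i_d

    lo, hi = 0.0, i_max
    if residual(lo) <= 0.0:
        return 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def toy_intrinsic(v_gs: float, v_ds: float, beta: float = 0.05,
                  v_t: float = -0.8, lam: float = 2.0) -> float:
    """Hypothetical placeholder intrinsic model, non-decreasing in both voltages."""
    return beta * max(v_gs - v_t, 0.0) ** 2 * math.tanh(lam * max(v_ds, 0.0))

def r_drain(temp_k: float, r_d300: float = 10.0, slope: float = 0.02) -> float:
    """Hypothetical linear drain-resistance law R_D(T) with a positive coefficient."""
    return r_d300 + slope * (temp_k - 300.0)

if __name__ == "__main__":
    R_S = 7.5                               # ohms, value reported in the abstract
    for t in (300.0, 350.0, 400.0):
        print(t, "K:", solve_extrinsic(toy_intrinsic, 0.0, 0.2, R_S, r_drain(t)))
```

Bisection is chosen for robustness. Because the residual is monotonic, it converges without a starting guess, even where the intrinsic model has kinks at region boundaries.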
Laser-induced structure defects and their use for modification of the properties of (Cd,Hg)Te epitaxial layers and CdTe crystals
Bohdan K. Kotlyarchuk, Apollinariy O. Zaginey, Yuriy E. Syvenkyy
This work presents an experimental study of structural defect generation in (Cd,Hg)Te epitaxial layers on CdTe substrates and in CdTe single crystals after pulsed laser processing, and of the influence of these defects on the mechanical, optical and galvanomagnetic properties. The experiments used ruby laser radiation with energy densities in the range 1.5-15 J/cm². The laser pulse durations were about 1.5 ms and 20 ns. Changes in the chemical composition of the irradiated surface were analyzed by Auger electron spectroscopy. The zones with increased defect concentration were determined by selective chemical etching. It was shown that pulsed laser processing causes both a substantial redistribution of the component concentrations and the generation of line and point defects in the near-surface crystal layers excited by the laser irradiation. The microhardness of surfaces irradiated without preliminary heating increases by 20-30% on average. The photoluminescence properties of the laser-modified cadmium telluride samples were investigated in the spectral range 650-1000 nm. After laser irradiation of the samples, a redistribution of the intensities of the luminescence bands and the emergence of a new band in the region of 840 nm were observed at a sample temperature of about 4.2 K. A substantial growth in the intensity of the spectral band with a maximum in the range 875-885 nm was also observed at T = 77 K. The reduced lifetime of non-equilibrium charge carriers in the defective zone creates the conditions for the magnetoconcentration effect in crossed electric and magnetic fields. The prospects for using such (Cd,Hg)Te structures as infrared and magnetic field sensors are shown.
Design and simulation of an a-Si:H/GaAs HBT with improved DC and high-frequency characteristics
In this work, the properties of an a-Si:H/GaAs heterojunction are discussed, and the advantages that may result from its use in bipolar devices compatible with GaAs homojunction technology are analyzed. Experimental and theoretical results are presented on the use of a wide-gap amorphous silicon layer to improve the injection efficiency into GaAs regions. The fundamental DC and high-frequency characteristics of an a-Si:H/GaAs heterojunction bipolar transistor (HBT) are investigated through detailed numerical simulations. The electronic properties of the a-Si:H layer, such as the distributed density of states typical of amorphous materials, have been carefully considered. The simulator was tuned, and the reliability of its output tested, against experimental results obtained from fabricated a-Si:H/GaAs p-i-n diodes. The study shows that keeping the number of defects located at the amorphous/crystalline interface below a critical level would dramatically improve the minority carrier injection ratio at the heterojunction. Current thin-film silicon technology would allow the fabrication of a transistor with a DC current gain close to 3000 and a cut-off frequency close to 10 GHz. Owing to its simple fabrication, such a device could be an effective way to add a bipolar stage to a GaAs MESFET IC without resorting to AlGaAs/GaAs heterostructures.
Diffusion barrier layer fabrication by plasma immersion ion implantation
Mukesh Kumar, - Rajkumar, Dinesh Kumar, et al.
The plasma immersion ion implantation technique has been used to modify the diffusion barrier properties of a titanium (Ti) metal layer against copper diffusion. Ti-coated silicon wafers were implanted with doses of 10^15 ions/cm² and 10^17 ions/cm², corresponding to the low- and high-dose regimes. High-dose implantation of nitrogen ions converts the film into Ti(N). Cu/Ti(N)/Si structures were formed by depositing copper over the implanted samples. The diffusion barrier properties of Ti(N) were evaluated after annealing the samples at temperatures up to 700 degrees C for 30 minutes. Sheet resistance, X-ray diffraction (XRD) and scanning electron microscope (SEM) measurements were carried out to investigate the effect of annealing. The low-dose implanted Ti layer shows no change in its diffusion barrier properties and fails at about 400 degrees C. This failure is attributed to the chemical reaction between the titanium and copper films. The high-dose implanted layer stops the diffusion of Cu through it even at high annealing temperatures. The enhancement of its diffusion barrier properties is thought to be due to the nitridation of the titanium film, which increases the activation energy of its chemical reaction with the copper film.
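The explanation offered, that nitridation raises an activation energy, can be made concrete with the generic Arrhenius form for thermally activated processes, D = D0 exp(-Ea / kT). The sketch below computes a characteristic diffusion length sqrt(D t) for the 700 degrees C, 30-minute anneal used in the study. The prefactor D0 and the two activation energies are hypothetical. The paper attributes failure to a chemical reaction rather than simple diffusion, so this shows only the sensitivity to Ea.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K

def diffusion_length_cm(d0_cm2_s: float, ea_ev: float, temp_c: float, time_s: float) -> float:
    """Characteristic length sqrt(D*t) for an Arrhenius process D = D0*exp(-Ea/(k_B*T))."""
    t_k = temp_c + 273.15
    d = d0_cm2_s * math.exp(-ea_ev / (K_B_EV * t_k))
    return math.sqrt(d * time_s)

if __name__ == "__main__":
    anneal_c, anneal_s = 700.0, 30 * 60              # 700 degrees C for 30 minutes
    for ea in (1.0, 1.5):                            # hypothetical activation energies (eV)
        length = diffusion_length_cm(1e-3, ea, anneal_c, anneal_s)   # hypothetical D0
        print(f"Ea = {ea} eV: {length * 1e7:.1f} nm")
```

At 700 degrees C, kT is about 0.084 eV, so raising Ea by 0.5 eV shortens the length by a factor of exp(0.5 / (2 kT)), roughly 20. This illustrates why a modest increase in activation energy can turn a failing barrier into a working one.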
Switching-noise reduction in clock distribution in mixed-mode VLSI circuits
Pilar Parra, Antonio J. Acosta, Manuel Valencia
Digital switching noise is the main source of on-chip noise in mixed-signal ICs. When many digital gates change state together, a large cumulative current spike flows through parasitic resistances and inductances, and noise is also injected into the substrate, causing the sensitive analog portions of the design to malfunction. Many solutions have been proposed to alleviate this problem in both the analog and the digital domains. Some current-mode families are used in low-noise applications, but they are poorly suited to low-power applications because of their static power consumption. In this paper we propose a set of circuit-level techniques, based on classical digital (static CMOS) methodologies, to reduce the switching noise generated by the digital circuitry. One of the most important sources of switching noise in large VLSI circuits is the clock-driven circuitry together with the clock generation and distribution logic. It is well known in the mixed-signal community that harmonics of the clock signal are easily injected into the analog part. This paper analyzes how measures such as buffer insertion, suitable placement and routing of the clock-tree cells, and suitable device sizing can reduce switching noise. In fact, different solutions for the clocking logic produce very different switching noise. This paper addresses the analysis and design of clock generation and distribution logic oriented toward low-noise applications. Illustrative examples show the feasibility of the proposed solutions, and useful design guidelines are proposed for the community of digital designers.
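The benefit of spreading clock edges can be illustrated by superposing the supply-current pulses of gates that switch together and estimating the resulting L di/dt noise. The sketch below assumes identical triangular current pulses and compares all edges arriving at once with edges staggered by a fixed delay, as buffer insertion might produce. Pulse shape, delays and package inductance are hypothetical and are not data from the paper.

```python
def triangle_pulse(t: float, start: float, rise: float, peak: float) -> float:
    """Triangular supply-current pulse: 0 -> peak over `rise`, then back to 0 over `rise`."""
    x = t - start
    if x <= 0.0 or x >= 2.0 * rise:
        return 0.0
    return peak * (x / rise if x < rise else (2.0 * rise - x) / rise)

def supply_waveform(delays, rise: float, peak: float, t_end: float, dt: float) -> list:
    """Sum the current pulses of gates whose clock edges arrive at the given delays."""
    n = int(round(t_end / dt))
    return [sum(triangle_pulse(i * dt, d, rise, peak) for d in delays) for i in range(n + 1)]

def noise_metrics(current: list, dt: float, inductance: float):
    """Peak current and peak L*di/dt (finite difference) of a supply waveform."""
    didt = max(abs(b - a) / dt for a, b in zip(current, current[1:]))
    return max(current), inductance * didt

if __name__ == "__main__":
    rise, peak, dt, l_pkg = 50e-12, 1e-3, 1e-12, 1e-9    # hypothetical values
    together = supply_waveform([0.0] * 8, rise, peak, 400e-12, dt)
    staggered = supply_waveform([k * 25e-12 for k in range(8)], rise, peak, 400e-12, dt)
    print("simultaneous:", noise_metrics(together, dt, l_pkg))
    print("staggered   :", noise_metrics(staggered, dt, l_pkg))
```

In this toy case the total charge drawn is the same, but staggering by half the rise time cuts the peak current from 8 mA to about 2 mA and the L di/dt estimate by a factor of four. Skew also reduces the energy concentrated at clock harmonics.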
Integrated optical scheme for residue-based logic operations
Residue arithmetic is a very promising mathematical approach, accepted in optical parallel processing and computing for its inherent parallelism. In this paper, an integrated optical scheme using residue arithmetic is proposed.
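The parallelism the abstract refers to comes from the residue number system itself. An integer is represented by its residues modulo pairwise-coprime moduli, addition and multiplication act on each residue independently with no carries between channels, and the Chinese Remainder Theorem recovers the result. The sketch below shows this arithmetic in software. The moduli are hypothetical, and nothing here models the optical implementation.

```python
from math import prod

def to_residues(x: int, moduli) -> list:
    """Represent integer x by its residues modulo pairwise-coprime moduli."""
    return [x % m for m in moduli]

def rns_add(a, b, moduli) -> list:
    """Digit-wise addition; each channel is independent (no carries)."""
    return [(x + y) % m for x, y, m in zip(a, b, moduli)]

def rns_mul(a, b, moduli) -> list:
    """Digit-wise multiplication; each channel is independent."""
    return [(x * y) % m for x, y, m in zip(a, b, moduli)]

def from_residues(residues, moduli) -> int:
    """Chinese Remainder Theorem reconstruction into the range [0, prod(moduli))."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        mi = big_m // m
        total += r * mi * pow(mi, -1, m)   # pow(x, -1, m) is the modular inverse (Python 3.8+)
    return total % big_m

if __name__ == "__main__":
    mods = (3, 5, 7)                       # dynamic range 0..104
    a, b = to_residues(17, mods), to_residues(5, mods)
    print(a, b, rns_mul(a, b, mods), from_residues(rns_mul(a, b, mods), mods))
```

Each channel only ever handles values smaller than its modulus, so each can be realized as a small, fixed lookup or routing operation. This is what makes the representation attractive for optical logic.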
Digital optical switch based on amorphous silicon waveguide
The performance of a digital optical switch based on the thermo-optic effect in amorphous silicon is investigated, to our knowledge for the first time. Numerical simulations show that the strong thermo-optic effect of amorphous silicon, combined with the possibility of realizing micrometric integrated structures, allows the realization of efficient waveguide devices with microsecond switching times. The switch, operating at the IR communications wavelength of 1550 nm, can easily be integrated in silicon optoelectronic circuits and is intended for low-cost photonic applications.
1.55-µm reflection-type optical waveguide switch based on thermo-optic effect
Based on the total internal reflection (TIR) phenomenon and the thermo-optic effect in hydrogenated amorphous silicon (a-Si:H) and crystalline silicon (c-Si), a symmetric rib optical waveguide integrated switch is proposed and discussed theoretically. The device exploits the similar refractive indices and the different thermo-optic coefficients of the two materials. The a-Si:H can be alloyed and doped for band-gap engineering by adjusting the gas-phase composition during plasma-enhanced chemical vapour deposition, a process that takes place at temperatures as low as 220 degrees C. This makes the semiconductor ideal for this type of application. In particular, the room-temperature refractive index of the amorphous film can be tailored to match that of c-Si, so that light switching occurs when the device experiences a given temperature change. Because the two materials have different thermo-optic coefficients, TIR can then be obtained at the interface by acting on the temperature. The integrated single-mode rib waveguide is 4 μm wide and 3 μm high. The substrate is an SOI wafer with an oxide thickness of 500 nm. The switch has a short operating length of about 280 μm. The device performance is analyzed at a wavelength of 1.55 μm, showing output crosstalk below -26.9 dB and insertion loss below 3.5 dB.
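The switching condition can be worked out from Snell's law. If the two materials are index-matched at room temperature (n0) and each index changes as n = n0 + (dn/dT) ΔT, then TIR for light incident at angle θ from the normal requires n_out / n_in ≤ sin θ. Solving for equality gives the threshold ΔT = n0 (1 - sin θ) / (sin θ · dn_in/dT - dn_out/dT). In the sketch below, the index, thermo-optic coefficients, incidence angle, and which material the light starts in are all assumptions, not values from the paper.

```python
import math

def tir_temperature_rise(n0: float, dn_dt_in: float, dn_dt_out: float,
                         incidence_deg: float) -> float:
    """Temperature rise at which light in the incident material reaches the critical angle
    at an interface that is index-matched (n0) at room temperature.

    TIR requires n_out / n_in <= sin(theta); with n = n0 + (dn/dT)*dT on each side,
    the threshold is dT = n0 * (1 - sin theta) / (sin theta * dn_dt_in - dn_dt_out).
    """
    s = math.sin(math.radians(incidence_deg))
    denom = s * dn_dt_in - dn_dt_out
    if denom <= 0.0:
        raise ValueError("heating never produces TIR for these coefficients")
    return n0 * (1.0 - s) / denom

if __name__ == "__main__":
    # Hypothetical: n0 = 3.48 near 1.55 um; light travels in the material with the
    # larger thermo-optic coefficient (2.3e-4 /K) toward the other (1.8e-4 /K).
    for angle in (88.0, 89.0, 89.5):
        print(angle, "deg:", round(tir_temperature_rise(3.48, 2.3e-4, 1.8e-4, angle), 1), "K")
```

The required heating falls steeply as the incidence angle approaches grazing. This is consistent with TIR switches using shallow crossing angles and therefore being relatively long compared with the waveguide width.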
Comparison of CMOS and BiCMOS optical receiver SoCs
Two very interesting trends in the design of optical receivers can currently be observed. The first is to realize optical receivers in deep-sub-μm CMOS technology and to integrate them in analog-digital systems-on-a-chip (SoC). The second, even more innovative, trend is to integrate voltage-up-converters (VUCs) in optoelectronic integrated circuits (OEICs) to increase the bandwidth and data rate, while only the chip supply voltage is required. The properties of deep-sub-μm CMOS optical receivers and of sub-μm OEICs with respect to current consumption, noise, and chip area will be compared. For each trend, a new design and measured results will be presented. The first example is a burst-mode receiver in digital 0.18 μm CMOS technology with sensitivities better than -28 dBm and -22 dBm at data rates of 622 Mb/s and 1.25 Gb/s, respectively, each for a bit error rate of 10^-10. These values compare with sensitivities of -24.5 dBm and -24.1 dBm, respectively, for a 0.6 μm BiCMOS OEIC. For implementation of the burst-mode receiver in an analog-digital SoC, a differential circuit is chosen. Another example is an OEIC in 0.6 μm BiCMOS technology with an integrated VUC, which generates a bias voltage of 16 V for the integrated photodiode from the 5 V chip supply voltage. Owing to the VUC, the data rate for the given technology is increased from 50 Mb/s to 1.5 Gb/s. The dependence of the receiver sensitivity and of the maximum photocurrent on the VUC clock frequency will be shown. The VUC-OEIC represents a complete SoC consisting of sensor, analog and digital parts. Aspects of substrate noise coupling from the digital part into the photodiode and amplifier are discussed.
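Receiver sensitivities quoted in dBm at a given bit error rate can be translated into absolute power and into the required signal-to-noise margin. The sketch below uses the standard Gaussian-noise approximation BER = 0.5 erfc(Q / sqrt(2)), a textbook relation that is not taken from the paper, together with the definition 0 dBm = 1 mW.

```python
import math

def dbm_to_watts(p_dbm: float) -> float:
    """Convert optical power from dBm to watts (0 dBm = 1 mW)."""
    return 1e-3 * 10.0 ** (p_dbm / 10.0)

def ber_from_q(q: float) -> float:
    """BER of a binary receiver with Gaussian noise: BER = 0.5 * erfc(Q / sqrt(2))."""
    return 0.5 * math.erfc(q / math.sqrt(2.0))

def q_for_ber(target_ber: float, lo: float = 0.0, hi: float = 20.0) -> float:
    """Invert ber_from_q by bisection (BER decreases monotonically with Q)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if ber_from_q(mid) > target_ber:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    print("Q for BER 1e-10:", round(q_for_ber(1e-10), 2))
    for p in (-28.0, -24.5, -22.0):      # sensitivities quoted in the abstract
        print(f"{p} dBm = {dbm_to_watts(p) * 1e6:.2f} uW")
```

Under this approximation, a BER of 10^-10 corresponds to Q of about 6.36. The 3.5 dB sensitivity difference between -28 dBm and -24.5 dBm corresponds to a factor of about 2.2 in received optical power.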
Method of generating trustworthy performance estimations for soft IPs
Margarita Marrero, Pedro P. Carballo, Antonio Nunez
At 0.25 µm and 0.18 µm processes and beyond, significant process variations occur not only from one fab to another but also among batches. Moreover, as we approach the realm of deep-submicron design, process variations even across a single die are predicted to become a major source of spread. Reduced signal levels, noise margins and timing windows all make previously minor variations in geometry and technological parameters a big issue for circuit design. Worse still, new mechanisms appear that cause significant variations not only in transistors but also in interconnect, and some of these mechanisms show greater variation across a single die than across similar structures on different dice from a wafer. Thus the chip designer must expect significant and not necessarily predictable differences between transistors and between interconnect resistances on a single die. This scenario is widely recognised by process engineers, and mapping a soft IP design to a hard IP block adds further spread. If the designer could know certain performance parameters of the final hard cores without repeated synthesis runs, integration of the blocks into the system would be easier, more predictable and more accurate. In this sense, pre-characterised, trustworthy soft-IP blocks would be the preferred candidates. We have explored ways of quantifying and analysing the synthesis-to-layout spread. Instead of modelling the spread in devices and interconnects, we model and quantify the technology mapping process as a whole at a higher abstraction level. This is done for a set of seed designs that give bounds and guidelines for the behaviour of other designs mapped to the same technology. For that purpose, only the best-case, typical-case, worst-case and other process variation corners need to be known.
The analysis is based on the actual measured spread of reference seed designs as they pass from soft to hard designs.
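One simple way to turn measured seed-design spread into bounds for a new design is sketched below. For each process corner, the ratio between the post-layout (hard) value and the pre-layout (soft) estimate is recorded for every seed design. The minimum and maximum ratios then bound a new design's post-layout value from its own estimate. This ratio-based representation, the corner names and all numbers are hypothetical illustrations, not the authors' method or data.

```python
def spread_ratios(seed_designs: list) -> dict:
    """For each corner, the (min, max) hard/soft ratio observed over the seed designs.

    seed_designs: list of dicts {corner: (soft_estimate, hard_measured)}.
    """
    bounds = {}
    for corner in seed_designs[0]:
        ratios = [d[corner][1] / d[corner][0] for d in seed_designs]
        bounds[corner] = (min(ratios), max(ratios))
    return bounds

def predict_range(soft_estimates: dict, bounds: dict) -> dict:
    """Bound a new design's post-layout value at each corner from its pre-layout estimate."""
    return {c: (soft_estimates[c] * lo, soft_estimates[c] * hi) for c, (lo, hi) in bounds.items()}

if __name__ == "__main__":
    # Hypothetical seed data, e.g. critical-path delay in ns: (soft estimate, hard result).
    seeds = [
        {"typical": (10.0, 11.0), "worst": (12.0, 14.4)},
        {"typical": (5.0, 5.25), "worst": (6.0, 6.6)},
    ]
    b = spread_ratios(seeds)
    print(b)
    print(predict_range({"typical": 8.0, "worst": 9.0}, b))
```

The bounds are only as good as the seed set is representative. This is why the choice of seed designs matters as much as the corners at which they are characterised.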
Image Processing I
IC technology trends for wireless local area networks
Wireless Local Area Networks (WLANs) have rapidly matured from a curiosity to a 'must have' for many personal computer users. This has been made possible by the incredible development that has occurred in WLAN chipsets over the past few years. From a 'You can't do that in CMOS!' to a 'Gee, I didn't think you could do that in CMOS!', cheap mass produced WLAN chipsets have flooded the market. This paper will summarize some recent developments and look at some future trends in this exciting area.