The von Neumann model is the architecture underlying traditional stored-program computers. There are well-established concerns about whether this model can continue the performance-scaling trends associated with Moore's Law (i.e., the number of transistors per unit area on integrated circuits doubling roughly every two years), owing to limitations inherent in the architecture (e.g., the bottleneck in memory access between processor and memory). Recent studies have reported that only the most parallelizable benchmarks (those in which more than 99% of the code can be efficiently parallelized) will continue to benefit from increased core scaling. Voltage scaling, however, may have a more far-reaching positive impact on parallel workloads and has been touted as another lever with which Moore's Law could be extended.1,2
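Amdahl's law offers a first-order way to see why only highly parallel code keeps benefiting from more cores. If a fraction f of a program parallelizes perfectly across n cores, the speedup is 1/((1 - f) + f/n), which can never exceed 1/(1 - f). The sketch below is a simplifying assumption made for illustration: the cited studies use considerably more detailed models that also account for power limits, and the core counts are arbitrary example values.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's-law speedup when `parallel_fraction` of the work runs on `cores` cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial = 1.0 - parallel_fraction  # portion that never speeds up
    return 1.0 / (serial + parallel_fraction / cores)


def speedup_ceiling(parallel_fraction: float) -> float:
    """Limit of the speedup as the core count grows without bound: 1 / (1 - f)."""
    serial = 1.0 - parallel_fraction
    return float("inf") if serial == 0.0 else 1.0 / serial


if __name__ == "__main__":
    for f in (0.90, 0.99, 0.999):
        row = ", ".join(f"{n} cores: {amdahl_speedup(f, n):6.1f}x" for n in (16, 64, 256))
        print(f"f = {f}: {row}; ceiling {speedup_ceiling(f):.0f}x")
```

Under this model, a 90%-parallel workload gains less than 10× even with 256 cores, whereas a 99.9%-parallel workload reaches roughly 200×.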
These limitations have led to growing interest in non-von Neumann and non-Boolean computer architectures. Finding new energy-efficient architectures to replace the von Neumann model is a significant area of ongoing research.5 Among these efforts, the Defense Advanced Research Projects Agency (DARPA) recently initiated a research program to improve the real-time processing of video surveillance imagery. The specific focus of this program, UPSIDE,6 is to leverage the physics of emerging nanoscale devices for non-digital processing.
Using device-level benchmarking data, we have projected how voltage scaling could affect core scaling for metal-oxide-semiconductor field-effect transistor (MOSFET) technology.7 Our results show that although parallel benchmarks gain some additional speedup, the projected speedups still fall short of the historical performance-scaling trends associated with Moore's Law.
Currently, we are targeting hardware realizations of non-von Neumann architectures, particularly cellular neural networks (CNNs). Generally speaking, a CNN is a spatially parallel computing paradigm consisting of identical processing elements, which are often analog in nature and are connected to their nearest neighbors. We aim to capitalize on the unique properties of emerging device technologies to improve the architectural performance, energy efficiency, and functionality of CNNs.
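To make the CNN paradigm concrete, here is a minimal sketch of the cell dynamics proposed by Chua and Yang.9 Each cell's state x obeys dx/dt = -x + Σ A·y + Σ B·u + z over its 3×3 neighborhood, with output y = 0.5(|x + 1| - |x - 1|); the feedback template A, the control template B, and the bias z are identical for every cell. The forward-Euler integration, the fixed boundary value, and the edge-extracting template (A with a center weight of 2, a Laplacian-like B, z = -1, for binary images with black = +1) are illustrative choices made for this sketch, not a template taken from the cited work or from any particular CNN chip.

```python
from typing import List

Grid = List[List[float]]


def cnn_output(x: float) -> float:
    """Standard piecewise-linear CNN output nonlinearity, saturating at +/-1."""
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))


def neighborhood_sum(template: Grid, values: Grid, i: int, j: int, boundary: float) -> float:
    """Weighted 3x3 sum around cell (i, j); cells outside the grid take a fixed boundary value."""
    rows, cols = len(values), len(values[0])
    total = 0.0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            r, c = i + di, j + dj
            v = values[r][c] if 0 <= r < rows and 0 <= c < cols else boundary
            total += template[di + 1][dj + 1] * v
    return total


def run_cnn(u: Grid, A: Grid, B: Grid, z: float,
            dt: float = 0.1, steps: int = 200, boundary: float = -1.0) -> Grid:
    """Integrate dx/dt = -x + A*y + B*u + z with forward Euler; return the final outputs."""
    rows, cols = len(u), len(u[0])
    x = [[0.0] * cols for _ in range(rows)]  # every cell starts at state zero
    # The input u is static, so the feedforward term B*u + z is computed once.
    bu = [[neighborhood_sum(B, u, i, j, boundary) + z for j in range(cols)]
          for i in range(rows)]
    for _ in range(steps):
        y = [[cnn_output(v) for v in row] for row in x]
        # All cells update simultaneously from the previous state (spatially parallel).
        x = [[x[i][j] + dt * (-x[i][j] + neighborhood_sum(A, y, i, j, boundary) + bu[i][j])
              for j in range(cols)] for i in range(rows)]
    return [[cnn_output(v) for v in row] for row in x]


# Illustrative (hypothetical) edge-extraction template: black = +1, white = -1.
A_EDGE = [[0.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0]]
B_EDGE = [[-1.0, -1.0, -1.0], [-1.0, 8.0, -1.0], [-1.0, -1.0, -1.0]]
Z_EDGE = -1.0

if __name__ == "__main__":
    # 6x6 image containing a 4x4 black square; only its outline should remain black.
    image = [[1.0 if 1 <= i <= 4 and 1 <= j <= 4 else -1.0 for j in range(6)] for i in range(6)]
    for row in run_cnn(image, A_EDGE, B_EDGE, Z_EDGE):
        print("".join("#" if v > 0 else "." for v in row))
```

With this template, a black pixel stays black only if at least one neighbor is white, so the interior of the square turns white and its outline remains. Every cell runs the same local rule, which is what makes the paradigm attractive for analog, array-based hardware.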
The use of alternative computational models and analog processing primitives is not new. In fact, the analog array-based CNN processor architecture was first proposed in 1988.9,10 Since then, this architecture has been shown to significantly reduce power consumption and improve performance for various computation-intensive information-processing applications.11 Attractive features of CNNs include mostly local interconnects; cells and synaptic interconnections that are typically space-invariant; and continuous-time, highly concurrent processing of analog signals. Because a CNN processes analog signals directly, the overhead of analog-to-digital conversion can be avoided, resulting in computational speedup. For example, for complex 2D image-processing functions, a commodity CNN-based processor with an area of 1.4 cm² and a power budget of just 4.5 W could match the performance of the IBM cellular supercomputer (a von Neumann architecture), which has an area of ~7 m² and a power budget of 491 kW.12
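Taking the quoted figures at face value, a back-of-the-envelope check shows the size of the gap (this is arithmetic on the numbers above, not an additional result):

```latex
\frac{A_{\text{IBM}}}{A_{\text{CNN}}} \approx \frac{7\ \text{m}^2}{1.4\ \text{cm}^2}
  = \frac{7\times10^{4}\ \text{cm}^2}{1.4\ \text{cm}^2} = 5\times10^{4},
\qquad
\frac{P_{\text{IBM}}}{P_{\text{CNN}}} = \frac{491\ \text{kW}}{4.5\ \text{W}}
  = \frac{4.91\times10^{5}\ \text{W}}{4.5\ \text{W}} \approx 1.1\times10^{5}.
```

For this workload, the CNN processor is therefore nearly five orders of magnitude smaller in both area and power.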
The resolution of state-of-the-art CNN architectures remains limited, however. For example, the high-frame-rate Eye-RIS system13 comprises highly functional, programmable CNN cells capable of performing a wide variety of image-preprocessing functions at rates on the order of 10,000 frames per second, but its image resolution is restricted to just 176×144 pixels, more than 80× fewer pixels than full high-definition (1920×1080) video. The power consumption of current CNN hardware also leaves room for improvement.
Considering these issues, our goal is to exploit emerging transistor technologies to realize more energy- and area-efficient non-von Neumann architectures. Although the current-voltage (I-V) characteristics of many emerging transistors could allow them to serve as drop-in replacements in digital circuits, those characteristics typically differ from those of MOSFETs. This, in turn, opens up opportunities in the analog design space.
We have developed CNN-inspired architectures, specifically diffusion networks, based on tunnel field-effect transistors (TFETs) and symmetric tunneling field-effect transistors (SymFETs).8 SymFETs and TFETs are highly nonlinear devices with pronounced negative differential-resistance regions.3,4,8,14 Representative I-V characteristics of TFETs and SymFETs are shown in Figure 1.
Figure 1. (a) Current-voltage (I-V) characteristics of homojunction and heterojunction tunneling field-effect transistors (TFETs and HTFETs, respectively) compared with high-performance and low-standby-power complementary metal-oxide-semiconductor (CMOS) devices.3 (b) Theoretical I-V characteristics of a symmetric tunneling field-effect transistor (SymFET), based on work by P. Zhao et al.4 D: Drain. S: Source. BG: Back gate. DS: Drain-source. GS: Gate-source. VTG: Voltage on the top gate. VBG: Voltage on the back gate.
Although SymFETs may not be suitable for conventional linear analog circuits (e.g., operational transconductance amplifiers), they provide an exciting opportunity to investigate nonlinear computational paradigms. For example, diffusion networks (see Figure 2(a)) comprise a 1D or 2D grid of resistors and capacitors. The diffusion process can approximate Gaussian smoothing, a filter widely used in image processing (e.g., for noise removal) before tasks such as edge detection. This preprocessing allows particular features or objects in a complex image to be identified more readily.
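Why a linear RC grid approximates Gaussian smoothing can be seen from Kirchhoff's current law: with a capacitor C at every node and a resistor R between neighbors, C dV_i/dt = Σ_j (V_j - V_i)/R, which is a discretized heat equation. The minimal sketch below assumes ideal linear resistors, a 1D chain (the 2D case is analogous), time measured in units of RC, and forward-Euler integration. Each Euler step convolves the node voltages with the kernel [dt, 1 - 2dt, dt], so after n steps an impulse spreads into a bell-shaped profile whose variance is exactly 2·n·dt, the discrete counterpart of a Gaussian with σ² = 2t/(RC).

```python
from typing import List, Tuple


def diffuse_linear(v: List[float], dt: float, steps: int) -> List[float]:
    """Forward-Euler simulation of a 1D RC diffusion chain (time in units of RC).

    Each node obeys dV_i/dt = sum over neighbors of (V_j - V_i). End nodes have a
    single neighbor, so no charge leaves the chain.
    """
    if not 0.0 < dt <= 0.5:
        raise ValueError("forward Euler on this chain needs 0 < dt <= 0.5")
    v = list(v)
    n = len(v)
    for _ in range(steps):
        new_v = v[:]
        for i in range(n):
            current = 0.0  # net current into node i, in units of 1/R
            if i > 0:
                current += v[i - 1] - v[i]
            if i < n - 1:
                current += v[i + 1] - v[i]
            new_v[i] = v[i] + dt * current
        v = new_v
    return v


def mean_and_variance(v: List[float]) -> Tuple[float, float]:
    """Treat node voltages as weights and return their spatial mean and variance."""
    total = sum(v)
    mean = sum(i * x for i, x in enumerate(v)) / total
    var = sum((i - mean) ** 2 * x for i, x in enumerate(v)) / total
    return mean, var


if __name__ == "__main__":
    chain = [0.0] * 201
    chain[100] = 1.0  # unit impulse ("one bright pixel") at the center node
    out = diffuse_linear(chain, dt=0.25, steps=100)
    mean, var = mean_and_variance(out)
    print(f"total={sum(out):.6f} mean={mean:.3f} variance={var:.3f} (expected {2 * 0.25 * 100})")
```

Longer diffusion times correspond to wider Gaussian kernels, so in such a network the amount of smoothing is set by how long the grid is allowed to evolve.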
Figure 2. (a) A diffusion network: a special-purpose CNN consisting of a 2D grid of resistors and capacitors. The resistors are typically implemented with metal-oxide-semiconductor field-effect transistors, and the voltage of each capacitor is coupled to those of its neighbors. R: Resistor. C: Capacitor. V: Voltage. Vi,j: Voltage of the node at coordinates (i, j).8
(b) (i) Original 256×256-pixel image; (ii) output of isotropic diffusion (linear resistors); (iii) output assuming ideal Perona-Malik diffusion (nonlinear resistors); (iv) output with SymFETs as the nonlinear resistors.8 Simulations were performed with SPICE.
Resistors are typically implemented with MOSFETs operating in the linear (triode) region. Interestingly, if the resistive elements in Figure 2(a) instead exhibit a particular nonlinear (i.e., bell-shaped) I-V characteristic, the smoothing operation preserves the edges of the original image.15 A diode-connected SymFET with its top and back gates shorted exhibits an I-V curve similar to the desired one. Figure 2(b) compares the outputs of several diffusion networks. Our preliminary results indicate the potential of SymFETs for applications such as these: initial projections suggest that device count, power, and processing time could each be improved by at least an order of magnitude by employing a SymFET network in place of complementary metal-oxide-semiconductor (CMOS) hardware.8
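The edge-preserving effect can be illustrated on the same kind of 1D chain by replacing each linear resistor with a nonlinear element whose current first rises and then falls as the voltage across it grows. The sketch uses I(ΔV) = ΔV·exp(-(ΔV/K)²), which corresponds to one of the conductance functions proposed by Perona and Malik,15 as a stand-in for the bell-shaped curve. It is not a SymFET device model, and K, the step size, and the test signal are hypothetical values chosen for illustration. Small differences (noise) see a nearly linear resistor and are smoothed, while a large step sees almost no current and survives.

```python
import math
from typing import Callable, List


def linear_element(dv: float) -> float:
    """Ideal unit resistor: current proportional to the voltage difference."""
    return dv


def bell_element(dv: float, k: float = 0.2) -> float:
    """Hypothetical nonlinear element: current rises for small |dv| and collapses for large |dv|."""
    return dv * math.exp(-((dv / k) ** 2))


def diffuse(v: List[float], element: Callable[[float], float], dt: float, steps: int) -> List[float]:
    """Forward-Euler 1D diffusion chain; each link carries current element(V_j - V_i) into node i."""
    v = list(v)
    n = len(v)
    for _ in range(steps):
        new_v = v[:]
        for i in range(n):
            current = 0.0
            if i > 0:
                current += element(v[i - 1] - v[i])
            if i < n - 1:
                current += element(v[i + 1] - v[i])
            new_v[i] = v[i] + dt * current
        v = new_v
    return v


def noisy_step(n: int = 64, ripple: float = 0.05) -> List[float]:
    """Step edge from 0 to 1 at the chain center, plus alternating 'noise' of amplitude `ripple`."""
    return [(1.0 if i >= n // 2 else 0.0) + ripple * (-1) ** i for i in range(n)]


if __name__ == "__main__":
    signal = noisy_step()
    mid = len(signal) // 2
    for name, elem in (("linear", linear_element), ("bell-shaped", bell_element)):
        out = diffuse(signal, elem, dt=0.2, steps=200)
        print(f"{name:12s} edge contrast={out[mid] - out[mid - 1]:.3f} "
              f"residual ripple={abs(out[10] - out[11]):.4f}")
```

Both networks remove the ripple, but only the bell-shaped element keeps the step between the two plateaus, which is the behavior panel (iii) of Figure 2(b) depicts.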
TFET-based diffusion networks have also been developed.8 Because TFETs conduct asymmetrically, replacing resistors with TFETs could lead to directional diffusion networks capable of facilitating optical-flow calculations in a variety of applications (e.g., video processing). Although diode-connected MOSFETs could offer similar functionality and complexity, conduction in TFET-based hardware is less likely to disrupt the spatial smoothing process.
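As a rough illustration of how asymmetric conduction yields directional diffusion, the sketch below idealizes each TFET link as a two-valued conductance: g_forward when current flows from a node to its right-hand neighbor, and a much smaller g_reverse otherwise. This switch-like model and its parameter values are hypothetical simplifications chosen for illustration, not TFET I-V data from the cited work. Charge injected at one node then spreads almost entirely downstream, shifting the charge-weighted center of the chain, whereas a symmetric chain leaves it in place.

```python
from typing import List


def link_current(v_left: float, v_right: float, g_forward: float, g_reverse: float) -> float:
    """Current from the left node to the right node through an idealized asymmetric link."""
    dv = v_left - v_right
    return (g_forward if dv > 0 else g_reverse) * dv


def diffuse_directional(v: List[float], g_forward: float, g_reverse: float,
                        dt: float, steps: int) -> List[float]:
    """Forward-Euler 1D chain (time in units of C / g_forward) with asymmetric links."""
    v = list(v)
    n = len(v)
    for _ in range(steps):
        # currents[i] flows from node i to node i + 1 (negative means right to left)
        currents = [link_current(v[i], v[i + 1], g_forward, g_reverse) for i in range(n - 1)]
        new_v = v[:]
        for i in range(n - 1):
            new_v[i] -= dt * currents[i]
            new_v[i + 1] += dt * currents[i]
        v = new_v
    return v


def center_of_mass(v: List[float]) -> float:
    """Charge-weighted mean node index."""
    return sum(i * x for i, x in enumerate(v)) / sum(v)


if __name__ == "__main__":
    chain = [0.0] * 41
    chain[20] = 1.0  # unit charge injected at the middle node
    for label, g_rev in (("symmetric", 1.0), ("asymmetric", 0.01)):
        out = diffuse_directional(chain, g_forward=1.0, g_reverse=g_rev, dt=0.2, steps=50)
        print(f"{label:10s} center of mass = {center_of_mass(out):.2f}, "
              f"charge upstream of injection = {sum(out[:20]):.3f}")
```

Because charge is conserved in both cases, the only difference is where it ends up: the asymmetric links bias the spread in one direction, which is the property a directional diffusion network relies on.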
In summary, emerging transistor technologies could play a role in constructing more capable non-von Neumann architectures with greater efficiencies. It is important to stress that devices continue to evolve and a great deal more work is required before experimental results can match theory.16 However, in at least some instances, architectural utility can be derived from even imperfect devices.8 In future work, we will continue to investigate other application-level targets17–19 and benchmark our proposed hardware against current state-of-the-art system architectures.
The authors were supported in part by the Center for Low Energy Systems Technology (LEAST), one of six centers of STARnet, a Semiconductor Research Corporation program sponsored by MARCO (Microelectronics Advanced Research Corporation) and DARPA.
Michael Niemier, Xiaobo Sharon Hu
University of Notre Dame
Notre Dame, IN
1. H. Esmaeilzadeh, E. Blem, R. St. Amant, K. Sankaralingam, D. Burger, Dark silicon and the end of multicore scaling, Proc. 38th Int'l Symp. Comput. Arch., p. 365-376, 2011.
2. H. Esmaeilzadeh, E. Blem, R. St. Amant, K. Sankaralingam, D. Burger, Power challenges may end the multicore era, Commun. ACM 56(2), p. 93-102, 2013.
3. B. Sedighi, X. S. Hu, H. Liu, J. J. Nahas, M. Niemier, Analog circuit design using tunnel-FETs, IEEE Trans. Circuits Syst. I: Regular Papers 62(1), p. 39-48, 2015.
4. P. Zhao, R. M. Feenstra, G. Gu, D. Jena, SymFET: a proposed symmetric graphene tunneling field effect transistor, 70th DRC, p. 33-34, 2012.
7. A. Horvath, X. S. Hu, J. Nahas, M. Niemier, I. Palit, R. Perricone, B. Sedighi, Architectural impacts of emerging transistors, IEEE 12th Int'l NEWCAS, p. 69-72, 2014.
8. B. Sedighi, X. S. Hu, J. J. Nahas, M. Niemier, Nontraditional computation using beyond-CMOS tunneling devices, IEEE Trans. Emerg. Sel. Topics Circuits Syst. 4(4), p. 438-449, 2014.
9. L. O. Chua, L. Yang, Cellular neural networks: theory, IEEE Trans. Circuits Syst. 35, p. 1257-1272, 1988.
10. L. O. Chua, L. Yang, Cellular neural networks: applications, IEEE Trans. Circuits Syst. 35, p. 1273-1290, 1988.
11. L. O. Chua, T. Roska, The CNN paradigm, IEEE Trans. Circuits Syst. I: Fundament. Theory Appl. 40, p. 147-156, 1993.
12. T. Roska, Cellular wave computers for brain-like spatial-temporal sensory computing, IEEE Circuits Syst. Mag. 5(2), p. 5-19, 2005.
14. H. Lu, A. Seabaugh, Tunnel field-effect transistors: state-of-the-art, IEEE J. Electron Devices Soc. 2(4), p. 44-49, 2014.
15. P. Perona, J. Malik, Scale-space and edge detection using anisotropic diffusion, IEEE Trans. Pattern Anal. Mach. Intell. 12(7), p. 629-639, 1990.
16. L. Britnell, R. V. Gorbachev, A. K. Geim, L. A. Ponomarenko, A. Mishchenko, M. T. Greenaway, T. M. Fromhold, K. S. Novoselov, L. Eaves, Resonant tunnelling and negative differential conductance in graphene transistors, Nat. Commun. 4, p. 1794, 2013. Published online 30 April 2013.
17. I. Palit, X. S. Hu, J. Nahas, M. Niemier, TFET based cellular neural network architectures, ISLPED, p. 236-241, 2013.
18. I. Palit, B. Sedighi, A. Horvath, X. S. Hu, J. Nahas, M. Niemier, Impact of steep-slope transistors on non-von Neumann architectures: CNN case study, Proc. Design Automat. Test Eur. Conf. Exhibit., 2014.
19. B. Sedighi, I. Palit, X. S. Hu, J. Nahas, M. Niemier, A CNN-inspired mixed signal processor based on tunnel transistors, Proc. Design Automat. Test Eur. Conf. Exhibit., p. 1150-1155, 2015.