Krishna Palem, a computer scientist at Rice University, has an unorthodox prescription for building faster computers. "If you are willing to live with inexact answers, you can compute more efficiently," he says. If you ask him whether one of his processors is working correctly, he's apt to answer, "Probably."
He's not joking, and his reasoning is based on two sound principles. Palem has shown that you can get dramatic energy savings by slightly relaxing the requirement that a computer be 100% correct. Further, he says there are many applications where some errors are completely acceptable and even unnoticeable by users.
Palem's ideas, which date to 2003, sound like a kind of fault-tolerance, a discipline that dates to the dawn of computing. There is, however, a crucial difference: traditional fault-tolerance aims to produce reliable results from unreliable components; Palem's devices are intended to produce unreliable results from unreliable components.
In 1952, the computing pioneer John von Neumann, in a lecture at Caltech on "error in logics," argued that computational errors should not be viewed as mere "accidents," but as "an essential part of the process." That is, error has to be treated as an intrinsic part of computing.
In subsequent years, the relatively unreliable electromechanical relays and vacuum tubes of von Neumann's day gave way to highly reliable semiconductor chips, and worries over errors seemed less pressing, except in mission-critical applications where elaborate fault-tolerant (error-correcting) mechanisms sometimes were required.
Now, Palem argues, the pendulum is swinging back, as the march of Moore's Law drives computer circuits to ever-smaller sizes. At the nano scale, they become increasingly difficult to manufacture uniformly and are more likely to fail. At the smallest dimensions, energy waste and heat barriers start playing an increasingly dominant role.
Indeed, energy use has become the grand challenge of computing. The New York Times recently (Sept. 22, 2012) reported that in 2010, data centers used 2% of all the electricity consumed in the U.S., with Google's data centers alone consuming almost 300 million watts. According to the newspaper, Amazon had been cited for 24 violations of air-quality regulations over a three-year period in Northern Virginia, mostly from its big arrays of diesel-powered backup generators. And energy use is an issue at the level of individual users, with people carrying more and more portable devices featuring power-hungry applications like video that drain their batteries ever faster.
The semiconductor industry's latest (2011) International Technology Roadmap for Semiconductors puts the issue in stark terms: "Power consumption has become the key technical parameter that controls feasible semiconductor scaling: device roadmaps are now power-driven, and operating frequencies have fattened in several key market domains."
In 2003, Palem published his idea for saving energy. "I proposed the idea of designing computing switches that you know are going to be inaccurate, with the amount of inaccuracy being a design parameter," he says. These designs are now referred to as "inexact designs."
In an inexact design, no attempt is made to correct for some inaccuracies; no attempt is made to ensure absolute reliability as von Neumann recommended. The designs are appropriate in applications such as video, where an incorrect pixel here or there is unnoticed or can be tolerated; in audio, where a slight distortion is acceptable, or in machine learning (classification) problems. A computing process that works by iteration might converge faster if it makes a few errors on the way, and it can still provide a satisfactory result. Some of those applications exist in mobile devices, where energy savings for long battery life are especially desirable.
Palem and his colleagues tried out the idea, which they also call "probabilistic computing," first in CMOS. In 2004, they showed in simulations that "probabilistically correct CMOS gates" (or switches), since dubbed PCMOS gates, could be made 300% more energy efficient by reducing the probability of accuracy 1.3%, from .988 to .975.
In May 2012, researchers including Palem and colleagues at Nanyang Technological University (NTU) in Singapore, Switzerland's Center for Electronics and Microtechnology, and the University of California at Berkeley unveiled a PCMOS adder that produced an incorrect value .25% of the time, but was 3.5 times more efficient as a result. By relaxing the probability of correctness to .92, the chips became 15 times more efficient. Palem is director of the Rice-NTU Institute for Sustainable and Applied Infodynamics, which did the work.
Palem points out that in the number 5,263, the "3" is much less important than the "5." In adding two such numbers, what if the processor simply ignored the low-order digit (the low-order bits)? Perhaps it wouldn't matter to the usefulness of the application. He and his colleagues have done just that, by a process called "circuit pruning," in which the chip is literally stripped of the energy-consuming circuitry needed for the low-importance bit processing. "Not all bits are created equal," Palem says. "This allows us to design computer circuits where the value of information guides investment."
"Not all bits are created equal. This allows us to design computer circuits where the value of information guides investment."
Electing to use single-precision rather than double-precision math in an application is in essence a kind of "bitwise pruning," says Avinash Lingamneni, a graduate student at Rice who has worked with Palem. But that is often too crude a method, he points out, because the optimum amount of pruning might lie between the two. Also, he says, "we found that instead of reducing the bit-width, we can actually modify the structure and parameters of the arithmetic operators themselves (which are often designed for the worst-case scenarios), leading to even more resource-savings than what could be obtained by bit-width pruning for the same accuracy loss."
Related to circuit pruning is a principle called "confined voltage scaling." The pruning results in lower energy use, but also speed improvements; for example, pruning might result in a threefold energy savings and a twofold speed increase. But if the designer is willing to go with the same speed as the original chip, then the pruning results in the chip running at a lower voltage, and energy savings rise to a factor of 4.8 in Palem's studies.
So far, the work of Palem and his colleagues has been limited to hardware, via static pruning. However, dynamic "gating" of portions of the chip by software is also possible. An application could choose to forgo using portions of the chip if the application designers felt the benefits in energy savings or speed were worth the loss of accuracy. The logic added for gating comes at a cost, because the instructions required to turn portions of the chip on and off also consume energy.
Of course, a small error in a single calculation might compound unacceptably through a chain of operations, each of which makes a similar small error. But Rice's Lingamneni says most of the applications targeted so far have small chains of serial calculations. Also, it is often possible to design successive blocks of instructions so that a given block is likely to offset the error from the previous block. "We have algorithmic frameworks in place [for] cascaded inexact computations, especially for embedded and signal processing applications," he says. The algorithms favor approaches where errors are additive and hence accumulate relatively slowly, and not multiplicative, where they might grow very rapidly.
Indeed, hardware-based pruning requires some knowledge of what will run on the chip. "We are studying workloads where the behavior of the application tells us how to distribute the error," Palem says. "You tune it to match the needs of the application." For example, researchers at the Rice-NTU Institute are exploring the use of inexact circuits in an energy-efficient hearing aid that ultimately could fit inside the ear canal. A multidisciplinary team that includes neuroscientists and a linguist is studying the hearing process to determine which parts of the audio signal are most important and which parts can be ignored. Researchers are also designing a low-power graphics controller for communities without electricity, and a high-efficiency "machine-learning engine" for pattern recognition, dictionary classification, and search.
"We are studying workloads where the behavior of the application tells us how to distribute the error. You tune it to match the needs of the application."
The principles of probabilistic computing are applicable elsewhere in computer systems, Palem says, adding that he and his colleagues are also looking into their use in memories that are less than perfect. Further out, he predicts their use in nanoscale devices such as molecular computers, which are inherently prone to error.
IBM's Blue Gene series of massive supercomputers uses both the traditional concepts of fault-tolerance and the newer concepts behind probabilistic computing, which it calls "approximate computing." "The Watson "Jeopardy" machine is a deep Q&A machine, where the answer is not required to always be precisely correct and you can still win the game," says Pradip Bose, IBM's manager of reliability- and power-aware microarchitectures.
The system can often forgo exactness at a low level as well. Bose offers this example: a program has a statement, "If A>B, then do..." Conventional computers will precisely compute A, precisely compute B, then precisely compare them, he says. Yet the logical test can possibly be done satisfactorily by comparing just a few bits. "You can save a lot of power by not always doing the full computation," he says.
Neural systems, which attempt to mimic the brain, are the subject of much research (see "Artificial Connections," p. 15). "The brain's computations are very inexact, and these architectures are following the same idea," Bose says. "Humans have no trouble walking without solving differential equations in their brains, as a robot might. What we do is approximate, but we don't fall."
Traditional fault-tolerant systems, which correct for errors, could be adapted to fail probabilistically and reap efficiencies thereby, Bose says. For example, IBM's production POWER7 server uses a technique called Guardband to save energy by dynamically lowering operating voltage, a process Bose calls "living dangerously" because of the increased probability of processor error. Guardband blocks these errors by dynamically monitoring results and decreasing the processor's clock frequency if errors start to occur. However, says Bose, if the application will tolerate some error, you could allow for lower voltages and energy consumption. "If you are willing to live even more dangerously, then you can save even more power," he says.
"Living dangerously" may be the price we pay to help solve the computer power problem, but that will require a good understanding of the applications that will run on these energy-efficient systems. William Pulleyblank, a professor of operations research at the U.S. Military Academy and a former Blue Gene architect at IBM, puts it this way: "If it's a video game, and rocks are tumbling over a cliff, and a rock defies gravity for a second, I may not care. But if it's a controller in the Space Shuttle, I may care a lot."
Chakrapani, L., George, J., Marr, B., Akgul, B., Palem, P.
Probabilistic design: a survey of probabilistic CMOS technology and future directions for terascale IC design, VLSI-SoC: Research Trends in VLSI and Systems on Chip, December 2007. http://www.ece.rice.edu/~al4/visen/2008springer.pdf
European Semiconductor Industry Association, Japan Electronics and Information Technology Industries Association, Korean Semiconductor Industry Association, Taiwan Semiconductor Industry Association, U.S. Semiconductor Industry Association
International roadmap for semiconductors, 2011. http://www.itrs.net/Links/2011ITRS/home2011.htm
Lala, Parag K.
Self-checking and fault-tolerant digital design, Elsevier Science, The Morgan Kaufmann Series in Computer Architecture and Design, June 2000. http://www.elsevier.com/wps/find/bookdescription.cws_home/677912/description#description
Palem, K., Chakrapani, L., Kedem, Z., Lingamneni, A., Muntimadugu, K.
Sustaining Moore's law in embedded computing through probabilistic and approximate design: retrospects and prospects, International Conference on Compilers, Architectures, and Synthesis for Embedded Systems, Grenoble, France, October 1126, 2009 http://www.ece.rice.edu/~al4/visen/2009cases.pdf
Shanbhag, N.R., Mitra, S., de Veciana, G., Orshansky, M., Marculescu, R., Roychowdhury, J., Jones, D., Rabaey, J.M.
The search for alternative computational paradigms, Design & Test of Computers, IEEE, JulyAug. 2008 http://dl.acm.org/citation.cfm?id=1440404.1440427
Figure. An application-specific integrated circuit (ASIC) chip developed by a team from Nanyang Technological University and Rice University. In 2012, that team, with colleagues from other institutions, unveiled an "approximate" computing PCMOS chip that is 15 times more efficient than standard chips.
©2013 ACM 0001-0782/13/04
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from firstname.lastname@example.org or fax (212) 869-0481.
The Digital Library is published by the Association for Computing Machinery. Copyright © 2013 ACM, Inc.