Researchers at the University of Texas and the Australian National University have published groundbreaking research assessing the power requirements of microprocessors under varying software loads. These researchers say a wide range of developers will need to understand the principles behind this work as performance-to-power issues become more central in assessing which hardware and software architectures will yield the most efficient returns — and that hardware vendors should start supplying on-chip power meters to make these vital measurements easier.
"Performance improvements which were exponential and taken for granted are not happening without a whole lot more work and engineering," says Kathryn McKinley, who holds an endowed professorship of computer science at the University of Texas at Austin, and is currently on leave as a principal researcher at Microsoft. That’s a good thing for creating jobs for computer scientists, McKinley says, but it’s a bad thing for the average developer who hasn’t had to focus intently on the trade-offs between performance and energy consumption until recently.
McKinley, Stephen Blackburn, professor of computer science at the Australian National University, and their graduate students co-authored the research investigating these vital benchmarks, "Looking Back on the Language and Hardware Revolutions: Measured Power, Performance, and Scaling," presented at the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2011).
The research tested power and performance profiles of representative Intel processors while they executed sequential and parallel benchmarks written in native and managed languages. The results, in several instances, were counterintuitive. For example, they discovered a "huge variety of processor power, performance, and energy responses," due to features such as clock scaling, microarchitecture, Turbo Boost, and simultaneous multithreading; these responses reveal what they termed "a complex and poorly understood energy efficiency design space."
For instance, they found that halving the clock rate of the i5 processor increases its energy consumption around 4 percent, whereas it decreases the energy consumption of the i7 and Core 2 Duo by around 60 percent. What this means, the researchers observed, is that running the i5 at its peak clock rate is as energy efficient as running it at its lowest, whereas running the i7 and Core 2 Duo at their lowest clock rate is substantially more energy efficient than running them at their peak.
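Why halving the clock rate can either raise or lower total energy follows from energy being power multiplied by time: slowing the clock stretches runtime, so static (leakage and uncore) power is paid for longer, while dynamic power falls steeply if voltage scales down with frequency. The sketch below illustrates this with a simplified power model and invented parameters — the numbers are not the paper's measurements, only an assumption-laden toy:

```python
def energy(freq_ghz, peak_ghz, static_w, dyn_peak_w, cycles=3e9):
    """Energy in joules to execute `cycles` cycles at `freq_ghz`.

    Simplified model: runtime scales as 1/frequency, and dynamic power
    scales roughly cubically with frequency (voltage tracking frequency
    under DVFS). All parameter values here are illustrative, not taken
    from the ASPLOS paper.
    """
    runtime_s = cycles / (freq_ghz * 1e9)
    dyn_w = dyn_peak_w * (freq_ghz / peak_ghz) ** 3
    return (static_w + dyn_w) * runtime_s

# A chip dominated by static power: halving the clock *costs* energy.
print(energy(3.0, 3.0, static_w=20, dyn_peak_w=5))   # peak clock
print(energy(1.5, 3.0, static_w=20, dyn_peak_w=5))   # half clock (higher)

# A chip dominated by dynamic power: halving the clock *saves* energy.
print(energy(3.0, 3.0, static_w=2, dyn_peak_w=38))   # peak clock
print(energy(1.5, 3.0, static_w=2, dyn_peak_w=38))   # half clock (lower)
```

Which regime a given processor falls into depends on its static/dynamic power balance — which is precisely why the responses the researchers measured varied so widely across chips.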
"This variety and the difficulty of obtaining power measurements recommends exposing on-chip power meters and when possible structure specific power meters for cores, caches, and other structures," they wrote. "Just as hardware event counters provide a quantitative grounding for performance innovations, power meters are necessary for optimizing energy."
Multi-Clocking Efficiencies
The research in the ASPLOS paper, which examined energy requirements on native and managed programming languages, has led Blackburn and McKinley to further explore the impact of virtual machine services (VMS) on processors. McKinley says perfecting VMS architecture could save impressive amounts of energy, especially as applications for complex business platforms are overwhelmingly written in managed languages such as Java and C#. These applications require virtual machine services such as compilation, interpretation, and memory management, and consume an average of 20 percent of system energy. And the VMS structure of the next generation of languages, McKinley says, is even more power-hungry.
"There’s been an explosion of people writing in languages like JavaScript, PHP, Ruby, and Python, where you’re really writing in a high level, and that’s a combination of domain specialists and people who just want computers to do stuff for them," she says. "The languages they are using are higher level but their VM technology has not matured yet. If you did the same kind of measurements for these as for Java, those costs for the other languages will be much higher, and those can be lowered."
Currently, Blackburn and McKinley’s research indicates that the benefits of running VMS such as garbage collection are far greater on lower-performing processors such as Intel’s Atom than on higher-performing i3 processors. This work, McKinley says, aligns with burgeoning trends in both research and production-line processors.
For example, Intel researchers have examined scheduling for performance-asymmetric systems, and ARM recently introduced its "big.LITTLE" processing system for mobile devices, which marries ARM’s high-performance Cortex-A15 and energy-efficient Cortex-A7 processors. The big.LITTLE architecture runs lightweight tasks on the A7 and migrates applications to the A15 when higher performance is needed.
Whether these multi-speed efficiencies are explored on a single multi-core chip or across several chips, with each application scheduled onto the chip best suited to it, McKinley says the approach is a recognized and promising trend.
"This is a trend in the whole community," she says. "I’m exploiting it in an innovative way, but lots of people are aware this is where the community’s going."
Gregory Goth is an Oakville, CT-based writer who specializes in science and technology.