Communications of the ACM

Research highlights

Technical Perspective: Technology Scaling Redirects Main Memories


As predicted by Intel's Gordon Moore in 1965, based on his observation of the scaling of several generations of silicon technology at the time, the number of transistors that can be integrated on one die continues to double approximately every two years. Amazing to some, Moore's Law has prevailed for 45 years and is expected to continue for several more generations. Transistor feature size and die integration capacity projections from the International Technology Roadmap for Semiconductors (ITRS) are shown in the accompanying table.

These faster and more abundant transistors have been exploited by computer engineers to build processors that double in performance about every two years. Up until the beginning of this decade, that was done through faster clock speeds and clever architectural enhancements. Many of these architectural enhancements were directed at tackling the "memory wall," which still plagues us today. Early in this decade, we ran into the "power wall" that dramatically slowed the increase in clock speeds. Since then, we are still seeing performance double every two years (see footnote a), but now it's through having more cores (running at only modestly faster clock rates) on one die, since technology scaling provides all of those additional transistors.

Another key component on the motherboard affected by technology scaling is the main memory, traditionally built out of dynamic random access memory (DRAM) parts. DRAMs have been doubling in capacity every two to three years while their access latency has improved about 7% per year. However, processor speeds still leave main memories in the dust, with processors having to wait 100 or more cycles to get information back from main memory; hence the focus by architects on cache memory systems that tackle this "memory wall." And multi-core parts put even more pressure on the DRAM, demanding more capacity, lower latencies, and better bandwidth.
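To make the widening gap concrete, the growth rates quoted above can be compounded over a fixed window. The following is only a back-of-the-envelope sketch: the ten-year window, the function names, and the reading of "improved about 7% per year" as a 1.07x yearly speedup are all assumptions made for illustration, not figures from the paper.

```python
def doubling_growth(years: float, doubling_period_years: float) -> float:
    """Growth factor for a quantity that doubles every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)


def compound_growth(years: float, annual_rate: float) -> float:
    """Growth factor for a quantity that improves by `annual_rate` each year."""
    return (1.0 + annual_rate) ** years


YEARS = 10  # assumed window, chosen only for illustration

# Processor performance: doubling about every two years.
processor_gain = doubling_growth(YEARS, 2)

# DRAM capacity: doubling every two to three years (both ends of the range).
dram_capacity_gain_fast = doubling_growth(YEARS, 2)
dram_capacity_gain_slow = doubling_growth(YEARS, 3)

# DRAM access latency: about 7% better per year (assumed to compound as 1.07x).
dram_latency_gain = compound_growth(YEARS, 0.07)

# How much further the processor pulls ahead of DRAM latency over the window.
gap_growth = processor_gain / dram_latency_gain

if __name__ == "__main__":
    print(f"processor performance gain:  {processor_gain:.1f}x")
    print(f"DRAM capacity gain:          {dram_capacity_gain_slow:.1f}x to {dram_capacity_gain_fast:.1f}x")
    print(f"DRAM latency improvement:    {dram_latency_gain:.2f}x")
    print(f"processor/latency gap grows: {gap_growth:.1f}x")
```

Under these assumptions, a decade of scaling improves processor performance 32-fold while DRAM latency improves only about twofold, which is why so much architectural effort goes into hiding memory latency.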

As pointed out in the following paper by Lee, Ipek, Mutlu, and Burger, DRAM memory scaling is in jeopardy, primarily due to reliability issues. The storage mechanism in DRAMs, charge storage and maintenance in a capacitor, requires inherently unscalable charge placement and control. Flash memories, which have the advantage of being nonvolatile, have their own scaling limitations. Thus, the search for new main memory technologies has begun.

The authors make a case for phase change memories (PCMs) that are nonvolatile and can scale below 40nm. PCMs store state by forcing a phase change in their storage element (for example, chalcogenide) to a high resistance state (so storing a "0") or to a low resistance state (so storing a "1"). Fortunately, programming current scales linearly. However, PCMs do not come without their disadvantages: read and, especially, write latencies several times slower than DRAMs, write energies several times larger than DRAMs, and, like Flash, a limited lifetime directly related to the number of writes to a memory location.

This paper is a wonderful illustration of the way computer architects can work around the limitations of the technology with clever architectural enhancements, turning lemons into lemonade. By using an area-neutral memory buffer reorganization, the authors are able to reduce application execution time from 1.6X to only 1.2X relative to a DRAM-based system, and memory array energy from 2.2X to 1.0X, also relative to a DRAM-based system. They use multiple, narrower memory buffers, which reduces the number of expensive (in terms of both area and power) sense amplifiers, and they focus on application performance rather than the performance of an individual memory cell.

The authors also describe their investigation of the trade-offs between buffer row widths and the number of rows. To tackle the PCM's lifetime limitation, the authors propose using partial writes to reduce the number of writes to the PCM by tracking dirty data from the L1 caches to the memory banks. With this approach, they can improve PCM lifetimes from hundreds of hours to nearly 10 years, assuming present 1E+08 to 1E+12 writes per bit for a 32nm PCM cell.
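The idea behind partial writes can be illustrated with a toy model: keep a dirty flag per word in a buffered row, and on eviction write back only the words that actually changed. This is a simplified sketch and not the authors' design; the class names (`PCMArray`, `RowBuffer`), the word-level granularity, and the array dimensions are hypothetical, and the real scheme tracks dirty data through the cache hierarchy rather than in a single buffer.

```python
class PCMArray:
    """Toy PCM array that counts writes per word (hypothetical model)."""

    def __init__(self, rows: int, words_per_row: int) -> None:
        self.cells = [[0] * words_per_row for _ in range(rows)]
        self.write_counts = [[0] * words_per_row for _ in range(rows)]

    def write_word(self, row: int, col: int, value: int) -> None:
        self.cells[row][col] = value
        self.write_counts[row][col] += 1  # every array write wears the cell


class RowBuffer:
    """Buffers one row and remembers which words were modified."""

    def __init__(self, array: PCMArray, row: int) -> None:
        self.array = array
        self.row = row
        self.data = list(array.cells[row])   # copy of the row being buffered
        self.dirty = [False] * len(self.data)

    def store(self, col: int, value: int) -> None:
        self.data[col] = value
        self.dirty[col] = True

    def evict(self, partial: bool = True) -> int:
        """Write the row back; return the number of words written to the array."""
        written = 0
        for col, value in enumerate(self.data):
            if not partial or self.dirty[col]:
                self.array.write_word(self.row, col, value)
                written += 1
        self.dirty = [False] * len(self.data)
        return written


# One modified word in an 8-word row: partial write-back touches 1 word,
# a full-row write-back touches all 8.
pcm = PCMArray(rows=4, words_per_row=8)

buf = RowBuffer(pcm, row=0)
buf.store(2, 42)
partial_writes = buf.evict(partial=True)

buf = RowBuffer(pcm, row=1)
buf.store(2, 42)
full_writes = buf.evict(partial=False)
```

Because wear is tied to the number of writes each location sees, skipping the unmodified words directly reduces wear on the cells that did not change.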

The paper concludes with some suggestions as to how the use of a nonvolatile main memory would change the computing landscape: instantaneous system boot/hibernate, cheaper checkpointing, and stronger safety guarantees for file systems. Now, if only someone could figure out a way to dramatically improve memory-to-processor bandwidth.

Author

Mary Jane Irwin (mji@cse.psu.edu) is Evan Pugh Professor and A. Robert Noll Chair in Engineering in the Department of Computer Science and Engineering at Penn State University, University Park, PA.

Footnotes

a. But one only really gets double the performance if they can figure out how to keep all of those cores busy.

DOI: http://doi.acm.org/10.1145/1785414.1785440

Tables

Table. Projections for transistor size and die integration capacity.


©2010 ACM  0001-0782/10/0700  $10.00

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2010 ACM, Inc.
