Xiaochen Guo may be an "early career" researcher by the standards of the U.S. National Science Foundation (NSF), but she has spent much of her postgraduate and junior faculty career pondering a very timely issue in computer science: how to move data more efficiently from memory to processor, as the physical limits on scaling ever-smaller processors become more daunting.
"It is really accessing memory and moving data around that is the bottleneck," said Guo, an assistant professor of electrical and computer engineering at Lehigh University in Bethlehem, PA. "Because when technologies scale, we can make the computation local, but we can not easily feed the data to the processor core. Also, the wire doesn't scale well. When you make transistors smaller, they end up faster and occupying smaller area, but if you make wire narrower or thinner, it becomes more resistive and that ends up dissipating more energy."
The NSF has awarded Guo a $500,000 CAREER grant, given annually to emerging academic researchers and leaders who have demonstrated the potential to serve as role models in research and education. She will use the funding to pursue a five-year project investigating ways to improve memory design.
Guo's project is just one funded this year by the NSF's Computer and Information Science and Engineering (CISE) directorate that examines more efficient use of current memory architectures, implementation of new and emerging technologies, and the integration of both legacy and emerging architectures.
"The traditional way to view the memory hierarchy is to have faster memory closer to the processor and slower but denser memory further away, and the granularity of accessing different levels of the memory are also different," she said. "The denser memory is slower so you want to access them in larger granularity to improve throughput. The memory that is closer to the core is already small and fast so there is no need to really increase the granularity."
That conventional design, however, comes with a price: high inefficiency. Guo's project will entail experimenting with ways to reduce metadata in cache memory by looking more closely at correlations among data access requests. If those correlations can be used to discern a pattern and enable the system to predict what data might be requested next, Guo said metadata overhead can be reduced and cache granularity can be increased, leading to much more efficient storage and processing.
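Guo's final design has not been published, but the idea is in the spirit of spatial footprint prediction. The C sketch below is a minimal illustration under assumed parameters (a direct-mapped predictor table, 1KB regions of 64-byte blocks, all hypothetical): the predictor learns which blocks of a region a program actually touches, so that on the region's next miss the cache can fetch those blocks together under a single coarse-grained tag.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical region-footprint predictor (illustrative parameters):
 * one tag per 1KB region instead of one per 64-byte block. The bitmap
 * learned while a region is in use predicts which blocks to fetch
 * together on the region's next miss. */

#define BLOCK_BITS  6      /* 64-byte cache blocks */
#define REGION_BITS 10     /* 1KB regions = 16 blocks */
#define TABLE_SIZE  1024   /* direct-mapped predictor entries */

typedef struct {
    uint64_t region_tag;   /* which region this entry tracks */
    uint16_t footprint;    /* one bit per block touched */
    bool     valid;
} FootprintEntry;

static FootprintEntry table[TABLE_SIZE];

static uint64_t region_of(uint64_t addr) { return addr >> REGION_BITS; }

static unsigned block_in_region(uint64_t addr) {
    return (unsigned)((addr >> BLOCK_BITS) &
                      ((1u << (REGION_BITS - BLOCK_BITS)) - 1));
}

/* Record that a block in this region was accessed. */
static void train(uint64_t addr) {
    FootprintEntry *e = &table[region_of(addr) % TABLE_SIZE];
    if (!e->valid || e->region_tag != region_of(addr)) {
        e->region_tag = region_of(addr);   /* new region: reset bitmap */
        e->footprint = 0;
        e->valid = true;
    }
    e->footprint |= (uint16_t)(1u << block_in_region(addr));
}

/* On a miss, return the predicted footprint; the cache fetches all of
 * these blocks under a single region tag. Zero means no prediction. */
static uint16_t predict(uint64_t addr) {
    FootprintEntry *e = &table[region_of(addr) % TABLE_SIZE];
    return (e->valid && e->region_tag == region_of(addr)) ? e->footprint : 0;
}

int main(void) {
    train(0x0040);   /* block 1 of region 0 */
    train(0x0080);   /* block 2 of region 0 */
    printf("predicted footprint for region 0: 0x%04x\n", predict(0x0000));
    return 0;
}
```

The payoff is that one region tag plus a 16-bit footprint stands in for 16 separate block tags, which is the kind of metadata reduction Guo describes.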
The theory holds promise: in preliminary work on fine-grained memory, the proposed design achieved 16% better performance and a 22% decrease in energy expended compared to conventional memory.
One approach among many
Guo's project is just one approach now being funded by the NSF toward making computing faster or more efficient, or both, through better use of memory. Josep Torrellas, professor of computer science at the University of Illinois at Urbana-Champaign, said the new era will present plenty of research challenges and opportunities.
"It's becoming more and more interesting to engineer main memories, because main memories are going to be heterogeneous," said Torrellas, principal investigator on a project that received a $1.2-million NSF grant to look at improving system efficiency. "They will include fast memories like the stacked high-bandwidth memory (HBM) close to the processor die, then other slower DRAM modules farther from the processor, and then NVM (non-volatile memory) modules."
Torrellas mentioned the burgeoning ecosystem in NVM as an exemplar of the trade-offs memory researchers are currently examining. For instance, he said, while NVM, which retains data even when receiving no power, offers a decided advantage over DRAM due to its superior scalability, every access to NVM currently takes longer and consumes more energy. "So as a result, a mix is the most competitive design at this point—where you combine volatile and non-volatile memory."
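Published hybrid designs differ in their policies, but the flavor of the trade-off can be seen in a toy page-placement heuristic. The C sketch below is purely illustrative; the thresholds and the choose_tier interface are invented for this example. Write-hot pages stay in DRAM, where writes are fast and cheap, while cold data is demoted to the denser NVM tier.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative tiering heuristic for a hybrid volatile/non-volatile
 * main memory: frequently written pages stay in fast DRAM; cold or
 * read-mostly pages move to denser NVM, where writes are slower and
 * cost more energy. Thresholds are made-up tuning knobs, not taken
 * from any cited system. */

typedef enum { TIER_DRAM, TIER_NVM } Tier;

typedef struct {
    uint32_t reads_per_epoch;
    uint32_t writes_per_epoch;
} PageStats;

#define HOT_WRITES 64    /* hypothetical threshold */
#define HOT_READS  256   /* hypothetical threshold */

static Tier choose_tier(const PageStats *p) {
    /* Writes dominate NVM's latency/energy penalty, so write-hot
     * pages stay in DRAM even if their read traffic is modest. */
    if (p->writes_per_epoch >= HOT_WRITES) return TIER_DRAM;
    if (p->reads_per_epoch  >= HOT_READS)  return TIER_DRAM;
    return TIER_NVM;   /* cold data benefits from NVM's density */
}

int main(void) {
    PageStats hot  = { .reads_per_epoch = 10, .writes_per_epoch = 200 };
    PageStats cold = { .reads_per_epoch = 3,  .writes_per_epoch = 0 };
    printf("hot page  -> %s\n", choose_tier(&hot)  == TIER_DRAM ? "DRAM" : "NVM");
    printf("cold page -> %s\n", choose_tier(&cold) == TIER_DRAM ? "DRAM" : "NVM");
    return 0;
}
```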
Researchers at North Carolina State University recently addressed the issue of NVM speed and energy consumption by proposing a technique called lazy persistency. In essence, it assumes most code regions will execute without a crash, so the common-case code is free of the cache flushes and stalls that wait for data to become durable; checksums are used to detect the rare regions whose results did not fully reach NVM before a crash, so only those regions need to be re-executed. "Our results show that Lazy Persistency reduces the execution time and write amplification overheads, from 9% and 21%, to only 1% and 3%, respectively," they wrote in their paper, presented at the 2018 ACM/IEEE International Symposium on Computer Architecture (ISCA 2018).
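The contrast is easiest to see side by side. The C sketch below reduces the idea to a single loop; the region boundaries, the XOR checksum, and the flush/drain stand-ins (which would map to clwb/sfence-style primitives on real hardware) are simplifications for illustration, not the paper's actual code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-ins for cache-flush and fence primitives; real code would
 * call architecture intrinsics here. */
static void flush_line(void *p) { (void)p; }
static void drain(void) { }

/* Eager persistency: every store is flushed and fenced before the
 * program moves on, stalling until the data is durable in NVM. */
static void eager_region(uint64_t *dst, const uint64_t *src, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i] * 2;
        flush_line(&dst[i]);   /* force writeback of this line */
        drain();               /* stall until the store is durable */
    }
}

/* Lazy persistency: no flushes or stalls on the common path. The
 * region records a checksum of what it wrote; dirty lines reach NVM
 * whenever the cache naturally evicts them. */
static void lazy_region(uint64_t *dst, const uint64_t *src, size_t n,
                        uint64_t *checksum_slot) {
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i] * 2;
        sum ^= dst[i];         /* cheap running checksum */
    }
    *checksum_slot = sum;      /* persisted lazily, like the data */
}

/* After a crash, recovery recomputes each region's checksum from what
 * actually reached NVM; a mismatch flags the rare region that must be
 * re-executed. */
static int region_is_consistent(const uint64_t *dst, size_t n,
                                uint64_t stored) {
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) sum ^= dst[i];
    return sum == stored;
}

int main(void) {
    uint64_t src[4] = {1, 2, 3, 4}, dst[4], sum;
    eager_region(dst, src, 4);        /* slow path: always durable */
    lazy_region(dst, src, 4, &sum);   /* fast common case          */
    printf("region consistent: %d\n", region_is_consistent(dst, 4, sum));
    return 0;
}
```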
Taking NVM — and all memory — further
Numerous teams of researchers in academia and industry are studying the storage and processing capabilities of NVM. Among the most intriguing current projects are explorations of the analog properties of memristor arrays. For example:
- IBM researchers created an analog NVM array that satisfactorily performed the types of algorithms used in trained deep neural networks. In a paper published in Nature, they explained how they pass a small current through a resistor into a wire, then connect many such wires together to let the currents build up: "This lets us perform many calculations at the same time, rather than one after the other," they wrote. "And instead of shipping digital data on long journeys between digital memory chips and processing chips, we can perform all the computation inside the analog memory chip."
- University of Michigan researchers led by Wei Lu found a way to digitize the current outputs of memristor arrays. As a result, the operations that multiply and sum across rows and columns can be handled simultaneously with a set of voltage pulses along the rows; the current measured at the end of each column contains the answers (a numerical model appears after Cady's comments below). A typical processor, in contrast, would have to read the value from each cell of the matrix, perform the multiplication, and then sum up each column in series.
- Nathaniel Cady, professor of nanobioscience at the State University of New York (SUNY) Polytechnic Institute in Albany, NY, received a collaborative NSF grant with University of Central Florida computer science professor Sumit Jha to develop working nanoscale NVM circuits. Cady, who also is pursuing analog-oriented construction, said work previously funded by the U.S. Air Force Research Lab helped him develop the technique to build and integrate resistive memory devices with traditional CMOS; similar to the IBM deep neural network research, he said, much of his lab's work is oriented toward neuromorphic computing.
"This new NSF project, as well as some we are spinning with the Air Force now as a follow-on, are all focused on leveraging that base technological advance," Cady said. "Mostly, what we are using our NVM elements for is to encode the synaptic functions—basically the connectivity between the nodes in one of these neural networks. You could store all those levels in a lookup table, or RAM somewhere else, but if you could store it locally at the circuit in a non-volatile element, the information is encoded right at the location in an NVM bit. You avoid the serial process of read-in, read-out to some memory array, and there is less overhead circuitry to refresh RAM or something in that position."
Lehigh University's Guo said her work on reorganizing memory hierarchies could work with either volatile or non-volatile memory technology. She is pleased to see that the emergence of NVM has stimulated wider discussion about improving memory design overall.
"Another perspective on NVM is that because there is already commercial product, they have already had some discussion in the community and industry about potential interfaces and how they will evolve the memory modules. That is definitely beneficial, because the memory industry has been very slow and reluctant about changing."
Gregory Goth is an Oakville, CT-based writer who specializes in science and technology.