Computer architecture is undergoing a radical and exciting transition as the end of Moore's Law nears and the burden of increasing humanity's ability to compute falls to the creativity of computer architects and their ability to fuse the application with the silicon. A case in point is the recent explosion of deep neural networks, which occurred because the cost of compute dropped, thanks to successful parallelization on GPGPUs (general-purpose graphics processing units), and because cloud companies could gather massive amounts of data to feed the algorithms. As improvements in general-purpose architecture slow to a standstill, we must specialize the architecture for the application in order to overcome fundamental energy-efficiency limits that would otherwise stall humanity's progress. This drive to specialize will bring not only another wave of chips with neural-network-specific accelerators, currently in development worldwide, but also a host of other kinds of accelerators, each specialized for a particular planet-scale purpose.
Organizations like Google, Microsoft, and Amazon are increasingly finding reasons to bypass the confines imposed by traditional silicon companies by rolling their own silicon, tailored to their own datacenter needs. In this new world, a multicore processor acts more as a caretaker of the accelerator than as the main act.
However, specialization brings challenges, primarily the high NRE (nonrecurring engineering) costs and long time-to-market of developing customized chips. Ultimately this NRE will limit the diversity of specialized chips that are economically feasible. For this reason, a new style of computer architecture research has emerged that attacks the challenge of driving down the cost and time-to-market of developing these specialized hardware designs. A growing movement within academia is to train Ph.D. students with the skills necessary to succeed in this brave new world, learning not only how to perform research but also how to design and build chips. Feeding both into and out of this approach is the growth of an active open source movement that ultimately will provide many of the components to be mixed and matched to create low-NRE designs.
The OpenPiton research, led by Princeton professor David Wentzlaff, is one of the watershed moments in this fundamental shift toward the construction of an open source ecosystem in computer architecture. OpenPiton is an open source distributed cache-coherent manycore processor implementation for cloud servers. Unlike most multicore implementations, OpenPiton implements a new scalable, directory-based cache-coherence protocol (known as P-Mesh) with three levels of cache, including a distributed, shared last-level L2 cache that scales in size with the number of tiles. Cache coherence is maintained using three physical Networks on Chip (NoCs), which can connect general-purpose cores, accelerators, and other peripherals, and can be extended across multiple chips to build systems with up to 500 million cores.
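To give a flavor of what directory-based coherence means, the following is a minimal behavioral sketch, in Python, of a directory tracking which tiles share or own each cache line. It is an illustration of the general technique only, using simplified MSI-style states and message handling; it is not the actual P-Mesh protocol, and all names in it are invented for exposition.

```python
# Illustrative sketch of directory-based cache coherence (simplified,
# MSI-style). NOT the P-Mesh protocol: real protocols handle transient
# states, writebacks, and NoC message ordering, all omitted here.

class Directory:
    def __init__(self):
        self.state = {}    # line address -> "S" (shared) or "M" (modified)
        self.sharers = {}  # line address -> set of tile IDs holding a copy

    def read(self, tile, line):
        """A tile requests a read-only copy of a line."""
        if self.state.get(line) == "M":
            # A modified owner exists: it must write back and downgrade.
            self.state[line] = "S"
        self.sharers.setdefault(line, set()).add(tile)
        self.state.setdefault(line, "S")

    def write(self, tile, line):
        """A tile requests exclusive ownership; other copies are invalidated."""
        invalidated = self.sharers.get(line, set()) - {tile}
        self.sharers[line] = {tile}
        self.state[line] = "M"
        return sorted(invalidated)  # tiles that would receive invalidations

d = Directory()
d.read(0, 0x80)          # tile 0 caches line 0x80 in shared state
d.read(1, 0x80)          # tile 1 shares it too
inv = d.write(2, 0x80)   # tile 2 writes: tiles 0 and 1 must invalidate
print(inv)               # [0, 1]
```

Because the directory knows exactly which tiles hold each line, invalidations are point-to-point messages over the NoCs rather than broadcasts, which is what lets this style of protocol scale to very large tile counts.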
In contrast to existing Intel Xeon processors, manycores are designed to have low implementation complexity, in order to minimize NRE. Manycore is clearly the future of low-NRE general-purpose server architecture and provides scalable general-purpose performance that can be coupled with emerging classes of accelerators. The OpenPiton work raises the bar not only by releasing its source openly for use by others, but also by serving as a framework for microarchitectural exploration in Infrastructure-as-a-Service (IaaS) clouds. Much of this microarchitectural work focuses on how resources, whether inside a core, in the cache, or in the on-chip or off-chip interconnect, are shared between different jobs running on the system. Other OpenPiton-related work has also explored issues in security and side-channel attacks.
While most computer architects use in-house simulators or from-scratch implementations to do their research, which results in questionable claims of validity and reproducibility, the Princeton team took an extremely clever approach: they leveraged an existing open source processor design, the OpenSPARC T1, and extended it into an entirely new scalable design. The team then integrated their research projects into this chip design and taped many of them out, so that all of these have a real-world physical realization in an advanced 25-core manycore processor in 32-nm technology. Finally, they released this effort as the open source OpenPiton scalable processor, which is the only real-world open source platform for both prototyping and experimenting with Linux-capable manycore processors.
I believe this work will unlock the next 20 years of progress in Linux-capable manycore research in academia, which has largely fizzled because of the lack of realistic, silicon-validated models to work with. At the same time, OpenPiton's scalable cache coherence implementation is licensed under the BSD license, which allows it to be freely mixed and matched. Indeed, work is already underway to integrate open source RISC-V implementations such as BlackParrot and Ariane. I expect OpenPiton's influence will grow across the community and enable larger and larger research projects that can truly deliver specialization across the stack.
To view the accompanying paper, visit doi.acm.org/10.1145/3366343
The Digital Library is published by the Association for Computing Machinery. Copyright © 2019 ACM, Inc.