The following paper, "Simba: Scaling Deep-Learning Inference with Chiplet-Based Architecture," by Shao et al., presents a scalable deep learning accelerator architecture that tackles issues ranging from chip integration technology to workload partitioning and non-uniform latency effects on deep neural network performance. Through a hardware prototype, the authors present a timely study of cross-layer issues that will inform next-generation deep learning hardware, software, and neural network architectures.
Chip vendors face significant challenges: the continued slowing of Moore's Law, which lengthens the time between new technology nodes; skyrocketing manufacturing costs for silicon; and the end of Dennard scaling. In the absence of device scaling, domain specialization provides an opportunity for architects to deliver more performance and greater energy efficiency. However, domain specialization is an expensive proposition for chip manufacturers. The non-recurring engineering costs of producing silicon are exorbitant, including design and verification time for chips containing billions of transistors. Without significant market demand, it is difficult to justify this cost.