Chipping Away at Big Data

Artist's representation of a neural network chip.

Hardware accelerators for artificial intelligence (AI) in end-user devices, namely neural network chips integrated into smartphones, autonomous vehicles, and Internet of Things (IoT) devices, can create smaller representations of big data locally, rather than wasting bandwidth sending massive raw data streams to the cloud.

Neural networks provide a framework for many different machine learning (ML) algorithms. Neural network chips can create a compact representation of large amounts of data using a layered architecture. Layers of artificial neurons (logic gates with accumulative input thresholds) are connected to like layers above and below them through synapses (tuned resistive connections). The input layer accepts inputs from feature sensors in the outside world, for instance, detectors sensitive to different colors. During learning, the resistance of the synapses is tuned so that the output layer activates the neuron(s) indicating the answer: if the inputs were red and blue, the output would indicate the perceived color purple. The layer(s) between the input and output layers are called "hidden" layers. A "deep" neural network has many hidden layers; the deeper the network, the more complex the objects it can classify. Classifying the species of a bird from features depicting size, shape, and coloring, for instance, might require dozens of layers.
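The layered structure described above can be illustrated with a minimal software sketch. It assumes idealized binary threshold neurons and hand-picked weights rather than learned resistances; the names `threshold_neuron`, `layer`, and `classify`, and all weight values, are hypothetical and chosen only to reproduce the red-plus-blue example.

```python
def threshold_neuron(inputs, weights, threshold):
    """Fire (return 1) when the weighted sum of inputs reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0


def layer(inputs, weight_rows, thresholds):
    """Evaluate one layer: each row of weights feeds one neuron."""
    return [threshold_neuron(inputs, row, t)
            for row, t in zip(weight_rows, thresholds)]


# Input layer: [red_detector, blue_detector]
# Hidden layer (illustrative weights): "red seen", "blue seen", "both seen"
HIDDEN_WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
HIDDEN_THRESHOLDS = [1.0, 1.0, 2.0]

# Output layer: the "both seen" neuron inhibits the single-color outputs.
LABELS = ["red", "blue", "purple"]
OUTPUT_WEIGHTS = [[1.0, 0.0, -1.0], [0.0, 1.0, -1.0], [0.0, 0.0, 1.0]]
OUTPUT_THRESHOLDS = [1.0, 1.0, 1.0]


def classify(red, blue):
    """Return the labels of the output neurons that fire for the given inputs."""
    hidden = layer([red, blue], HIDDEN_WEIGHTS, HIDDEN_THRESHOLDS)
    output = layer(hidden, OUTPUT_WEIGHTS, OUTPUT_THRESHOLDS)
    return [label for label, fired in zip(LABELS, output) if fired]


print(classify(1, 1))  # both detectors active
print(classify(1, 0))  # only the red detector active
```

In a real chip the weights would be set by training rather than by hand, and practical networks use many more neurons and layers than this toy example.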

Today, most neural networks run on power-hungry neural accelerator boards in the cloud, built from programmable components including field-programmable gate arrays (FPGAs), graphics processing units (GPUs), and tensor processing units (TPUs). These shared resources typically use from 5 to 500 watts of power (or more) to process massive streams of raw data from edge devices (smartphones, autonomous cars, IoT devices). The advantage of provisioning local neural network chips in each edge device is that they require only milliwatts of power to create an ultra-small representation of a sensor data stream.

During the learning phase (the first step in building a neural network application), the resistive values of the synapses (weights) are tuned to turn on the output neuron indicating the correct classification, for example, with a spoken word as input and text as output. That synaptic weight map of resistive values is hundreds to millions of times smaller than the original raw data. Once sent to the cloud, it can be combined with other synaptic weight maps from other users' edge devices to accommodate, in the speech-to-text example, every person's accent. After optimizing the combined network, an inference-only synaptic weight map can then be sent back to all the edge devices, allowing them to translate voice to text regardless of a user's accent.
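The article does not say how the cloud combines weight maps from different devices. One simple possibility, shown here purely as an assumption, is element-wise averaging of equally shaped weight maps (similar in spirit to federated averaging). The function name `average_weight_maps` and the sample values are hypothetical.

```python
def average_weight_maps(weight_maps):
    """Combine per-device weight maps by element-wise averaging.

    Assumes every device uses the same network layout, so each map is a
    flat list of the same length. This is an illustrative choice, not a
    method confirmed by the article.
    """
    if not weight_maps:
        raise ValueError("at least one weight map is required")
    length = len(weight_maps[0])
    if any(len(m) != length for m in weight_maps):
        raise ValueError("all weight maps must have the same length")
    count = len(weight_maps)
    return [sum(column) / count for column in zip(*weight_maps)]


# Hypothetical weight maps learned on two users' devices.
device_a = [0.0, 1.0, -2.0]
device_b = [1.0, 3.0, 2.0]
combined = average_weight_maps([device_a, device_b])
print(combined)
```

Only the small weight maps travel over the network in this scheme; the raw sensor data never leaves the devices.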

There are two types of neural network hardware. The first is a high-resolution learning neural network that digests literally millions of examples covering all the features that need to be detected for a particular application. After learning, the second type of neural network hardware, a low-resolution neural network optimized into a minimal representation of the learned synaptic weight maps, can then be deployed as a low-power inference-only neural network. For instance, Intel's Nervana neural network processor (NNP) comes in two versions—the NNP-L for learning-only and the NNP-I for inferencing-only.
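One common way to derive a low-resolution inference network from high-resolution learned weights is quantization. The sketch below assumes symmetric linear quantization to 8-bit integers; the article does not state which reduction technique any particular chip uses, and the names `quantize` and `dequantize` are hypothetical.

```python
def quantize(weights, bits=8):
    """Map floating-point weights to signed integers of the given width.

    Returns the integer weights and the scale needed to recover
    approximate floating-point values.
    """
    qmax = 2 ** (bits - 1) - 1          # 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(quantized, scale):
    """Recover approximate floating-point weights from integer weights."""
    return [q * scale for q in quantized]


# Hypothetical high-resolution weights from the learning phase.
learned = [-1.0, 0.0, 0.25, 1.0]
q_weights, scale = quantize(learned)
print(q_weights)
print(dequantize(q_weights, scale))
```

Each quantized weight fits in a single byte, and the rounding error per weight is at most half of `scale`, which is the trade-off between size and precision in this scheme.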

"Neural network chips are creating big opportunities for processor and semiconductor markets in general," said Linley Gwennap, publisher of the Microprocessor Report newsletter. "The AI capabilities they enable, such as face recognition in smartphones and security cameras, are also creating entirely new markets like autonomous ground vehicles and autonomous flying drones. These chips have completely new architectures, enabling chip makers to move beyond Moore's Law to create tomorrow's innovations."

The chip-sized neural networks are already being used in high-end smartphones to recognize the owner's face in lieu of memorized passwords and are, according to Gwennap, "predicted to be deployed in 70% of all smartphones by 2023."

All the leading processors in high-end smartphones use neural networks today, including Apple's A12 (using a proprietary architecture), Samsung's Exynos 9810 (using Xilinx's DeePHi ML), MediaTek's P90 (using Cadence's P6 DSP configured for ML), Huawei's Kirin 970 (using Cambricon's neural processing unit/NPU) and LG's Qualcomm Snapdragon (using its Hexagon DSP configured for ML). Local ML allows the high-end smartphones using these chips to perform face recognition and biometric diagnoses as well as learn acoustic, visual, and electronic signatures, all without sending raw data up to the cloud.

Apple is the first smartphone maker to offer independent software developers an application programming interface (API) to the A12's on-chip neural network. "Soon we will begin to see third-party apps performing human-like feats of recognition, biometric diagnoses, and other AI processing using Apple's neural API," said Gwennap.

In the wake of Apple's release of its API, other smartphone makers using neural networks likely will feel compelled to follow suit with their own APIs, according to Gwennap.

Many other IoT devices are receiving standalone learning and inferencing capabilities in large-volume applications.

"Over 200 million smart surveillance cameras with built-in citizen identification are predicted to be deployed in China alone," said Gwennap. In other countries, smart IoT cameras are also using face recognition to identify "persons of interest," mostly using neural network chips from vendors in the U.S., but also from the U.K. by virtue of Arm's Project Trillium machine learning (ML) platform.

"Arm's Project Trillium has already delivered our first-generation Arm ML processor for dedicated ML workloads," said Arm vice president Steve Roddy. "By sheer numbers of units, edge devices doing inference will be the primary visible beneficiary of Project Trillium."

IBM is also advancing the state of the art with what it calls AI for IoT devices. The company offers entire prewritten and validated software stacks that implement the smart AI learning and inferencing algorithms for IoT programmers. IBM's software can be run on its Watson AI platform, its TrueNorth chip set, or on the tiny analog neural network chips just emerging from its labs. IBM Research in Yorktown Heights, NY, and IBM Zurich in Switzerland are collaborating on the neural network software, to be implemented on chips from its new AI Hardware Research Center in Albany, NY, in collaboration with Samsung (which is supplying neural network chips), Israel-based Mellanox Technologies (for fast optical connections), California-based Synopsys (for advanced design software), California-based Applied Materials (for specialty fabrication equipment), and Japan's Tokyo Electron (for system integration of these diverse technologies).

"IBM Research has created a set of versatile AI-based applications which we can run on any platform—both on our own servers and any other platform in the industry, such as Apple's new A12," said IBM Fellow Dinesh Verma. "Our software is structured as libraries that can be easily ported to the most efficient hardware and software environment for each specific AI application—whether it is IBM's own chips or someone else's."

Intel is providing neural networks for the IoT market with its Movidius Myriad X vision processing unit (VPU). The VPU is suitable for both learning and inferencing, and has proven capable of differentiating between family members and strangers for home security systems, as well as distinguishing between animals and poachers in field operations. Intel also supplies a Neural Compute Stick the size of a thumb drive containing a VPU for use during software development before IoT chip prototypes are even built.

"For Intel, the use of neural learning and inferencing at the edge includes a whole range of applications, including cities using smart security cameras to ensure public safety, smart kiosks for retailers, and robotics applications," said Gary Brown, who joined Intel's IoT Group during its acquisition of Movidius.

Intel claims 70% of the autonomous automobile market is using its Mobileye 360-Degree vision-based advanced driver-assistance system, according to Gwennap. Mobileye says its technology "is one of the first embedded versions of AI, which means the technology does not live in the cloud but instead lives in the vehicle, on the chip."

Autonomous vehicle makers not using the Mobileye solution (including Audi, General Motors, and Toyota) are experimenting with Nvidia neural processors. Tesla, on the other hand, began working with Nvidia GPUs, but announced last year that it is switching to its own proprietary chip. Other semiconductor makers also make neural network hardware, including Mythic's hybrid digital/analog neural network inside a flash array, Wave Computing's Triton accelerator called "AI-at-the-edge," and Xilinx's Alveo FPGA-based neural network.

A new crop of startups is popping up with neural network chips, including Eta Compute's Dial processor and GreenWave Systems with its Axon processor.

Besides Arm, there is also an increasing number of intellectual property (IP)-only endeavors that sell the resources to add neural networks to proprietary application-specific integrated circuits (ASICs). These include AImotive (specializing in automotive neural IP), Cadence (with neural IP for smartphones), Ceva (with neural IP for smartphones), Imagination Technologies (with neural GPUs for smartphones), Synopsys (with neural IP development software), Videantis (with automotive neural IP), and the deep-learning open-source accelerator neural IP from Nvidia.

R. Colin Johnson is a Kyoto Prize Fellow who has worked as a technology journalist for two decades.

