
Communications of the ACM

ACM TechNews

Tiny Chips, Big Headaches


Researchers worry they are finding rare defects because they are trying to solve bigger and bigger computing problems, which stresses their systems in unexpected ways.

Credit: Tom Schierlitz/Trunk Archive

As the transistors in computer chips shrink, concern is growing that ever-larger and more intricate cloud computing networks fundamentally depend on chips that are becoming less reliable and less predictable.

Recent studies by Facebook and Google researchers described outages with difficult-to-diagnose causes, arguing that underlying hardware was to blame.

Stanford University's Subhasish Mitra said people increasingly believe manufacturing defects are tied to silent hardware errors, while researchers worry they are finding rare defects because they are tackling ever-larger computing problems, which stress their systems in unexpected ways.

The smallest error in a microprocessor containing billions of transistors can disrupt systems that routinely execute billions of calculations each second, and mounting evidence suggests the problem is worsening with each new chip generation.

Proposed remedies include software that proactively monitors for hardware errors.
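
One simple way to picture such monitoring is redundant execution: run a deterministic computation several times and flag any disagreement between runs. The sketch below is only an illustration under that assumption, not the method used by the researchers cited above. The names `checked_call` and `checksum` are hypothetical, and a production monitor would also need to repeat the work on different cores or machines to expose a defect tied to one piece of hardware.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence, Tuple


def checked_call(
    fn: Callable[..., Hashable], args: Sequence = (), repeats: int = 3
) -> Tuple[Hashable, bool]:
    """Run fn(*args) `repeats` times and compare the results.

    Returns (value, agreed): the most common result, and whether every
    run produced it. For a deterministic function, disagreement hints
    at a silent hardware error rather than a software bug.
    """
    if repeats < 2:
        raise ValueError("need at least two runs to compare")
    results = [fn(*args) for _ in range(repeats)]
    value, count = Counter(results).most_common(1)[0]
    return value, count == repeats


def checksum(n: int) -> int:
    # Hypothetical deterministic workload: the same input must always
    # give the same output on healthy hardware.
    total = 0
    for i in range(n):
        total = (total * 31 + i) % 1_000_003
    return total


value, agreed = checked_call(checksum, (10_000,))
if not agreed:
    print("runs disagreed: possible silent hardware error")
```

Repeating the work costs extra compute, which is why a check like this would more plausibly run on a sample of tasks or during idle time than on every calculation.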

From "Tiny Chips, Big Headaches"

The New York Times (02/07/22) John Markoff


 
