Cascade Failure

ACM Past President and Google Inc. Vice President and Chief Internet Evangelist Vinton G. Cerf

Most readers of this column are likely familiar with the notion of a cascade failure, in which one failure triggers another and the process continues, gaining strength, until it produces a catastrophic outcome. A particularly good example is the Northeast power failure of 2003,^a about which the following was written:

"The blackout’s primary cause was a software bug in the alarm system at a control room of the FirstEnergy Corporation, located in Ohio. A lack of alarm left operators unaware of the need to redistribute power after overloaded transmission lines hit unpruned foliage, which triggered a race condition in the control software. What would have been a manageable local blackout cascaded into widespread distress on the electric grid."

There are plenty of other examples of this kind, including the reactor meltdown in the Fukushima area of Japan in 2011,^b in which a series of failures produced a self-reinforcing cycle ending in disastrous consequences. That this was triggered by the massive magnitude 9.0 Tohoku earthquake and subsequent tsunami underscores the importance of trying to imagine worst-case scenarios and asking how to design for their mitigation.

Ironically, my topic for this column is not these kinds of massive failures but, rather, more subtle scenarios we might not immediately identify as triggers with serious consequences. This whole line of reasoning started when I had to replace a tiny battery in one of my hearing aids. I have to do this every couple of days. The aids are the in-the-ear type, so they are small, and the battery powering them must fit exactly into the enclosure provided. By good fortune and design, these are standard batteries made by a number of suppliers and can typically be found in most drugstores and hardware shops in the U.S. and probably elsewhere, at least in the so-called developed countries. So, what’s my point?

It occurred to me that if these battery makers someday decided to abandon the product, these expensive hearing aids would be worthless. The more I thought about this, the more I began to think about the problems of specialization and obsolescence.

I have owned a lot of printers over time. For all practical purposes, they all used ink cartridges specially designed to be incompatible with other models. I recall railing about this to HP management once, and was told, "How do you think we make up for selling the printers cheaply? We lock you into our ink cartridges." Of course, there are third-party suppliers of cartridges and even refill kits, but consumers are often warned about potential risks. It seems each new model takes a different cartridge format, so even if you stocked up on older cartridges, the next printer you buy may not be compatible with them. This column is not a rant against HP or any other maker of printers, but an observation about the dependencies we are building into our increasingly complex infrastructure that may come back to bite us from time to time.

Anyone with a collection of DVDs or CD-ROMs will appreciate that some new laptops do not even have readers for these, let alone for 5¼" or 3½" floppies. We cannot rely in the long term on devices for specialized formats remaining available. Some battery formats have had very long availability (D, C, AA, AAA, AAAA batteries), but others may be much harder to come by (often by design). The same can be said of various memory devices, including USB memory sticks. Some interfaces change, and adapters are needed to cope with older formats. There are analogues of incompatibility in the communications protocol world (for example, implementing IPv6 in addition to IPv4), where adaptation is impossible or, at best, awkward and clumsy.

So, what’s the point, you may ask? Essentially, I think it is worth some effort to pay attention to the nature of these dependencies, their scope, and the potential side effects when the supply of particular devices runs out. Preparing for that eventuality seems wise. There may be no tactic that saves the day forever. Stocking up on batteries may not work, especially if they are not rechargeable or will not hold a charge for years. I am sure you can think of many other examples.
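One way to make this kind of attention concrete is to write the dependencies down and ask what is stranded when a single part disappears. The following is only a minimal sketch under stated assumptions: the `DEPENDS_ON` map and the `stranded_by` function are hypothetical names invented for illustration, the entries are loosely drawn from the examples in this column, and every listed part is treated as strictly required with no substitute available.

```python
from collections import deque

# Hypothetical dependency map: each item lists the parts it cannot work without.
# The entries are illustrative assumptions, not an inventory of real products.
DEPENDS_ON = {
    "hearing aid": ["hearing-aid battery"],
    "inkjet printer": ["model-specific ink cartridge"],
    "DVD collection": ["optical drive"],
    "printed report": ["inkjet printer"],
}


def stranded_by(missing_part, depends_on):
    """Return every item that stops working, directly or indirectly,
    once `missing_part` is no longer available."""
    # Invert the map so we can look up which items need a given part.
    needed_by = {}
    for item, parts in depends_on.items():
        for part in parts:
            needed_by.setdefault(part, []).append(item)

    stranded = set()
    queue = deque([missing_part])
    while queue:
        part = queue.popleft()
        for item in needed_by.get(part, []):
            if item not in stranded:
                stranded.add(item)
                # A stranded item may itself be a part of something larger.
                queue.append(item)
    return stranded


if __name__ == "__main__":
    print(sorted(stranded_by("model-specific ink cartridge", DEPENDS_ON)))
```

Under these assumptions, losing the cartridge strands both the printer and anything that depends on the printer. A real assessment would also need to account for substitutes, adapters, and stockpiles, which this sketch deliberately ignores.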

It is impressive to what degree ingenuity can keep things running. Consider the 1950s automobiles in Cuba, or the possibility of some parts being made with 3D printers. Still, the main point is that we may make big investments in things that depend on small but crucial items, only to have those investments rendered useless for lack of a small component. Maybe I should have titled this column "for want of a nail…."


May 2015 Issue


Communications of the ACM (CACM) is now a fully Open Access publication.
