Which Is Riskier – Communications of the ACM

It is computer science "folk wisdom" that our computer systems, particularly the networks, are unnecessarily vulnerable because so many of our systems are either made by Microsoft, highly dependent on Microsoft software, or required to interact with Microsoft software. Many see this as a single point of failure, an Achilles’ heel. Analogies are drawn to situations such as many people concentrated in a dangerous area, large quantities of hazardous materials stored in one place, or systems reliant on a single power source. Many propose that we can decrease our vulnerability by insisting on the use of non-Windows operating systems—thereby increasing diversity. In this column, I question that view.

Diversity, when combined with redundancy, is a well-established approach to increasing the reliability of safety-critical systems. For example, having two independent pumps, either of which is adequate, may decrease the probability of a complete outage; using two pumps of diverse manufacture may make it less likely that both fail at once. Predictions that diversity will increase reliability assume:

Redundancy: The system must be functional even if one pump fails.
Independence: The failure of one pump must not make the failure of the other more likely, and the pumps must not depend on shared resources.
Deep diversity: The pumps must be fundamentally different so that they are unlikely to have common design faults.
Interoperability: The pumps must function well together. If not, the number of failures may increase.

The validity of these assumptions must be carefully examined in each individual situation; they do not seem to apply to today’s computer systems.
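A rough numerical sketch shows how much the independence assumption matters in the pump example. It uses a simplified common-cause ("beta-factor"-style) model that is not part of the column: the function name, the per-pump failure probability `p`, and the shared-cause fraction `beta` are all illustrative assumptions.

```python
def both_fail_probability(p: float, beta: float) -> float:
    """Approximate probability that two redundant pumps are both down.

    p    -- assumed failure probability of a single pump (hypothetical value)
    beta -- assumed fraction of failures that stem from a shared cause
            (0.0 = fully independent, 1.0 = the pumps always fail together)
    """
    if not (0.0 <= p <= 1.0 and 0.0 <= beta <= 1.0):
        raise ValueError("p and beta must lie in [0, 1]")
    common = beta * p               # a shared cause takes out both pumps at once
    independent = (1.0 - beta) * p  # remaining failures strike each pump separately
    # Either the shared cause occurs, or it does not and both pumps
    # happen to fail on their own.
    return common + (1.0 - common) * independent ** 2


if __name__ == "__main__":
    for beta in (0.0, 0.1, 1.0):
        print(f"beta={beta}: {both_fail_probability(0.01, beta):.6f}")
```

Under these assumptions, letting only a tenth of the failures share a cause raises the chance of a complete outage roughly tenfold compared with fully independent pumps, and with `beta = 1.0` the second pump adds nothing.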

That increasing diversity does not always improve reliability is obvious if we think of automobile traffic. We would not make our roads safer by demanding that 30% of us switch to the other side of the road. On the roads there is limited redundancy and independence. All drivers are essential; a failure of one affects many others. Further, cars driven with different rules would not be interoperable.

Examining the case at hand, we see:

In today’s computer systems, there is remarkably little redundancy. When a Canadian tax system failed, no tax returns could be submitted. Basing the Irish system on a different OS would not have helped, because the Irish system cannot stand in for the Canadian one. Having two diverse Canadian systems would help only if they implemented exactly the same rules.
In today’s OS market, diversity is shallow. Linux and the various Unix versions are very much alike, and Apple has joined this club. More subtly, studying Windows and the Unix family, one will see that the developers have all "learned from" each other or from common sources. More generally, programmers often overlook the same situations and write diverse programs that fail on the same cases.

Independence is equally questionable. Two communicating systems constitute a single system. A failure of one can cause problems for the other. Frequently, networks stop while all the elements patiently wait for one to finish. One false message can trigger a cascade of failures.

Interoperability of independently developed software systems is very difficult to achieve. Communication protocols that are "almost alike" often fail to work together. Many experienced IT managers wisely insist on a monoculture in their networks because they fear incompatibility and cannot deal with the finger pointing that occurs when one supplier’s system does not work well with another. If one product is updated to remove a fault (or feature) on which another depends, the combined system fails and it is not clear who should fix it.

Were we to insist on a diverse mix of operating systems, their failure to work together properly could actually reduce reliability and increase vulnerability. In some cases, the whole system would be no more reliable than its weakest link. When I buy a light bulb, tire, or car, I benefit from competition and limited diversity because there are tight standards that allow me to replace one brand with another. We do not have comparable standards for operating systems. Each upgrade causes some trouble in application software.
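The "weakest link" remark can be illustrated with the standard series-reliability calculation. This is a sketch under the assumption that every component must work and that failures are independent; the function name and the reliability figures are hypothetical, not taken from the column.

```python
from math import prod
from typing import Iterable


def series_reliability(component_reliabilities: Iterable[float]) -> float:
    """Reliability of a system in which every component must work.

    Assumes independent failures; each value is the (hypothetical)
    probability that one component works when needed.
    """
    values = list(component_reliabilities)
    if not values or any(not 0.0 <= r <= 1.0 for r in values):
        raise ValueError("need at least one reliability, each in [0, 1]")
    return prod(values)


if __name__ == "__main__":
    mixed = [0.99, 0.95, 0.90]  # three interdependent systems, invented figures
    print(series_reliability(mixed), "vs weakest link", min(mixed))
```

Because each factor is at most 1, the product can never exceed the smallest value in the list, so under these assumptions a chain of interdependent systems is at best as reliable as its weakest member and usually worse.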

This column is neither pro-monopoly nor pro-Microsoft. It is pro-realism. If we want the advantages of diversity and competition in support software, there is much difficult work to do. We need precise specifications for systems that are to operate in our networks; we also need the ability to enforce those standards. Otherwise, increasing diversity might make the situation worse.

Which Is Riskier: OS Diversity or OS Monopoly?


August 2007 Issue

