Security of the Internet and the Known Unknowns

The Internet is not secure. The fundamental mechanics of the Internet have well-known and long-standing security issues that occasionally disrupt it. This is usually an accident and mostly caused by human error, but it is speculated that by exploiting the openness and trust on which the Internet is built, "bad people" could build a "cyber nuke" to take down the network. Serious as these security issues appear to be, they have apparently not created any serious problems thus far. Does this mean we have simply been lucky, or are the issues more theoretical than actual?

When we think about the Internet, we tend to think about the equipment and the protocols—the technology that makes it all work. But the Internet is a many-layered system that depends not only on technology, but also on people, coffee, and money. The equipment and protocols are the network layer. Each network in the Internet has a Network Operation Center (NOC) that monitors its own network and its connections to other networks, responds to incidents when they occur, and continually strives to maintain acceptable levels of service and reliability, at an acceptable cost. Each NOC acts independently and interacts with other NOCs, collectively forming the operational layer—people and coffee. Every network operator makes their own commercial decisions, forming the commercial and economic layer(s), driven by money. Finally, there is the policy or regulatory layer, which sets the context in which all networks operate. Each layer is a complex system in its own right, and each one plays its part in the security and reliability of the Internet. Consider, for example, the Border Gateway Protocol (BGP), a key component of the network layer.

Layers of Complexity

The essence of the Internet is that every network is able to reach every other network. The routing information that each network needs is distributed across the Internet by BGP. Unfortunately, BGP is not secure. BGP will happily distribute invalid routing information, and offers no way to distinguish valid from invalid routes. Every now and then, some network administrator somewhere makes a small mistake that generates bogus routing information that BGP blindly accepts and relays across the Internet. The effect of such a "route leak" is that data is diverted from its intended destination, usually ending up in a black hole from which there is no return. Since that can be achieved by accident, who can say what might be achievable with malice aforethought?
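To make the mechanism concrete, here is a minimal sketch of one common way a bogus announcement diverts traffic: routers forward to the most specific matching prefix, so an unchecked, more specific route wins over the legitimate one. The routing table, the `best_route` helper, the documentation prefixes, and the private-use AS numbers are all hypothetical and chosen for illustration; real BGP route selection involves many more attributes than prefix length.

```python
import ipaddress


def best_route(table, destination):
    """Return the (prefix, origin) entry with the longest matching prefix, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, origin) for net, origin in table if addr in net]
    if not matches:
        return None
    # Longest-prefix match: the most specific covering route is preferred.
    return max(matches, key=lambda entry: entry[0].prefixlen)


# Hypothetical table holding only the legitimate announcement.
table = [
    (ipaddress.ip_network("203.0.113.0/24"), "AS64500 (legitimate origin)"),
]
before = best_route(table, "203.0.113.10")

# A mistaken, more specific announcement is accepted with no validity check,
# which mirrors the point that nothing distinguishes valid from invalid routes.
table.append((ipaddress.ip_network("203.0.113.0/25"), "AS64511 (bogus origin)"))
after = best_route(table, "203.0.113.10")

print("before leak:", before[1])
print("after leak: ", after[1])
```

In this sketch, traffic for the address moves to the bogus origin as soon as the more specific route appears, while addresses outside the leaked range still reach the legitimate origin.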

The Border Gateway Protocol has been with us for a long time—the latest version (version 4) is nearly 18 years old. So why have these issues not been addressed? First, what is remarkable about BGP is not that it is imperfect. Rather, given the exponential growth of the Internet since BGP was designed, what is truly remarkable is that it works at all. Second, it turns out that the validity of routing information is difficult to establish. Third, a more secure BGP may be either more brittle or less flexible, or both. This would make it more vulnerable or less able to adapt in a crisis, and hence not necessarily an improvement.

Making BGP more secure is difficult and the extra equipment and running costs are significant. The most recent effort, BGPSEC, is a work in progress in the IETF.^a Assuming BGPSEC is adopted, the IETF working group envisages a five- to 10-year timescale for deployment, and is rather depending on Moore's Law reducing the cost of the required extra memory capacity and processing power in the meantime. Even so, as currently defined BGPSEC does not address the issue of route leaks.

The insecurity of BGP leads to failure at the network layer. But securing the Internet is not solely a technology issue: the operational layer detects and responds quickly to route leaks and the like. The China Telecom incident that occurred in April 2010, in which approximately 15% of Internet addresses were disrupted for perhaps 18 minutes,^b is not only an example of the insecurity of the technology but also a testament to the effectiveness of the operational layer.

The continuing insecurity of BGP can be seen as a failure of the commercial and economic layer. The cost to each network, in equipment and in effort, of a more secure form of BGP would be high. What is worse, the benefit to each network would be limited until or unless others did the same. A more secure Internet would be a common good, but the incentive to create or maintain that common good is missing. On the other hand, given how effectively the operational layer compensates for the insecurity of BGP, perhaps the commercial/economic system has not failed. Perhaps the invisible hand has guided us to a cost-effective balance between security and the ability to deal with the occasional failure. Intervention to increase security would be a failure of the policy layer if, in the name of safety, unnecessary costs were forced on the system.

The great strength of the Internet is that it harnesses the independent efforts of tens of thousands of separate organizations. They cannot all connect directly to each other, so the Internet is a vast mesh of connections over and above the component networks, moderated by BGP. Each network monitors and manages its own small part of the system, but nobody has oversight over the entire mesh—there is no NOC for the Internet as a whole! A key function of a NOC is to monitor network performance; but for the Internet there is little data on how well it performs normally, let alone how well it actually copes with failure.

Another key function of a NOC is to ensure there is spare capacity in the network, available for use when things go wrong; but for the Internet we do not know how much spare capacity there is, or where it is—so we cannot even guess whether there is enough capacity in the right places, should something unprecedented happen.

Nobody would advocate a central committee for the Internet. However, the network's reliability is of vital interest, so it is remarkable how little we know about how well it works. Hard information is difficult to obtain. Much is anecdotal and incomplete, while some is speculative or simply apocryphal. The report "The Internet Under Crisis Conditions: Learning from September 11"² is an exception and a model of clarity; the authors warn: "…While the committee is confident in its assessment that the events of September 11 had little effect on the Internet as a whole…the precision with which analysts can measure the impact of such events is limited by a lack of relevant data."

Effects of Internet Vulnerabilities

What is the realistic, likely impact of Internet vulnerabilities? What can be done to cost-effectively mitigate the risks? Is the longevity of these issues a symptom of a market failure, and if so, should government act to secure this critical infrastructure? These are all good questions but, sadly, we do not have good answers and we are hampered by a lack of good data.

Monitoring Internet performance could provide the relevant data. Unfortunately, it would be difficult to implement such monitoring: the sheer scale of the Internet is an obvious problem, and then each layer has its own issues. First, there are many technical problems: what data should be collected, where and how to collect it, how to store it in a usable form, how to process it to extract useful information, and so on. Some of these are engineering issues; others are research topics. Second, such a system would be a common good, with the usual issues of incentives to create and maintain—or, more bluntly, who would pay for it? Third, Internet networks compete with each other and some of the data would be deemed commercially sensitive—so, there could be some self-interested resistance to creating the common good. Fourth, there is a fine line between monitoring and control, and a fine line between monitoring performance and monitoring content—lines that different jurisdictions might take quite different views on.

All day, every day, the Internet copes with circuit and equipment failures, and almost nobody notices. Occasionally an event—a route leak, the cutting of critical undersea cables, the discovery of a software problem—has a more significant impact, but the Internet just keeps rolling along. Many in the industry will argue this is evidence for the inherent reliability of the Internet, and, if improvement is required, it is best left to the industry. Any outside interference is generally deemed likely to be clueless and destructive.

On the other hand, every incident is a small experiment in how well the system responds. But it is nobody's job to investigate, assess the impact, and find the lessons to be learned. So, nobody does it, or at least nobody does it thoroughly or authoritatively. In some cases, those closest to an incident do not care to publish the exact causes or the entire history. This can be for commercial reasons, or for security reasons, but often simply because they have better things to do with their time. Perhaps we can learn from the airline industry. Air travel has become ever safer, not by trying to gain perfect knowledge of all possible combinations of weather, equipment failure, human error, and so forth, but by learning the lessons of every incident. This common good is funded communally—nobody expects the industry to do this by itself. Further, air accident investigators have found ways to deal with the commercial and other issues, and to operate in a global industry.

An accident investigation board for the Internet would be a modest and minimally intrusive step, though not without its own challenges. It would need to develop and refine ways to assess the impact of incidents. It would need to establish what day-to-day data gathering is most useful and most easily achieved. Unlike comprehensive monitoring, which looks like a step too far, this would be a step-by-step way of approaching a 20% solution that might provide 80% of what we need.

Conclusion

We do not know whether a more secure Border Gateway Protocol would materially improve the reliability of the Internet, let alone whether the improvement would be cost justified. We do not know whether it would be more effective to improve the operational layer, so that it would cope better with the known vulnerabilities, and would be better prepared to deal with as-yet unknown ones. Assuming we can work out how best to improve things, we do not know how to construct a system of incentives to implement those improvements. And the insecurity of BGP is not the only threat to the Internet's reliability.

If we are to improve the security and reliability of the Internet, we really need better data on which to base policy and engineering decisions…and that is a common good to which government could usefully contribute.

Footnotes

a. The Internet Engineering Task Force—see http://www.ietf.org for more about the IETF in general, and http://datatracker.ietf.org/wg/sidr/charter/ for more about the BGPSEC initiative in particular.

b. http://www.renesys.com/blog/2010/11/chinas-18-minute-mystery.shtml.
