Communications of the ACM

ACM News

Internet Routing Failures Bring Architecture Changes Back to the Table

The trend over time of autonomous systems affected by outages.

In this table of autonomous systems affected by outages compiled by BGPMon, it is easy to see the jump in outages that occurred on August 12 or "512K Day."

Credit: BGPMon

A burst of route updates from one network operator triggered a failure among a number of routers across the Internet in mid-August, focusing attention on the cost of upgrading systems to support the growing demand for access.

Research by network monitoring company BGPMon showed that outages among "autonomous systems" (groups of IP addresses under the control of a single operator) rose by 66% on August 12, which has come to be known by some as 512K Day. Earlier that day, Verizon published 15,000 more Border Gateway Protocol (BGP) routes than usual by deaggregating groups of addresses from a Class B range that contains 64K IP addresses in total. The extra routes pushed some routers past a default limit of 512K routing-table entries, and those routers slowed down as route data overflowed from the content-addressable memory (CAM) designed specifically for rapid address searches. As a result, some routes were dropped entirely or handled using lookups from much slower and cheaper random-access memory.
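To see why deaggregation inflates the routing table, consider a minimal Python sketch using the standard `ipaddress` module. The address block and the choice of /24 as the target prefix length are illustrative assumptions, not the prefixes Verizon actually announced.

```python
import ipaddress

# Hypothetical /16 block (the size of a former "Class B" range).
# The specific addresses are an assumption made for illustration.
block = ipaddress.ip_network("172.16.0.0/16")

# Deaggregate the block into /24 prefixes. Each one would need
# its own entry in every router's table.
more_specifics = list(block.subnets(new_prefix=24))

print(block.num_addresses)   # 65536 addresses covered by one route
print(len(more_specifics))   # 256 routes covering the same addresses
```

The same address space is reachable either way; only the number of table entries changes.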

However, even among affected routers, the 512K limit is not yet a physical one. Network equipment supplier Cisco warned in May that route counts were very close to the default threshold of 512K for IPv4 route data in some of its older routers, and asked customers to run an update, which demands a reboot, that would reallocate some of the space originally reserved for the next-generation IPv6 protocol. Emergency fixes halted the problem. Although Verizon quickly reaggregated the additional routing data it published back into a larger block, the table size permanently crossed the 512,000-entry threshold a matter of days later, according to data from the CIDR Report, a weekly analysis of the BGP route table.

James Cowie, chief scientist at Dyn, a Manchester, NH-based company that provides Internet traffic management and performance assurance, said, "Table growth is particularly strong in emerging economies, many of which are transitioning over time from a single state-owned Internet provider to a more competitive market landscape, and all those new entrants need representation in the routing table."

Significant reductions in the space needed for IPv4 routes could be made if service providers aggregated more of their routes. If maximum aggregation were performed, the table would shrink by close to half. Yet this is increasingly unlikely to happen.
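Aggregation is the reverse operation, and it can be sketched with `ipaddress.collapse_addresses` from the Python standard library. The prefixes below are hypothetical. The sketch also assumes that all of them share the same origin and path, which is what makes merging them safe; real BGP aggregation has to check that.

```python
import ipaddress

# Hypothetical announcements: four contiguous /24s and one unrelated /24.
announced = [
    ipaddress.ip_network("10.0.0.0/24"),
    ipaddress.ip_network("10.0.1.0/24"),
    ipaddress.ip_network("10.0.2.0/24"),
    ipaddress.ip_network("10.0.3.0/24"),
    ipaddress.ip_network("192.0.2.0/24"),
]

# Merge adjacent prefixes into the smallest equivalent set.
aggregated = list(ipaddress.collapse_addresses(announced))

print(len(announced))   # 5 entries before aggregation
print(aggregated)       # the four contiguous /24s collapse into one /22
```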

Geoff Huston, chief scientist at the Asia-Pacific Network Information Center (APNIC), who maintains the CIDR Report, said, "There are many reasons why deaggregation occurs. For example, in the absence of any other tools, the only way I can perform traffic engineering for inbound links is by route engineering."

Cowie added, "For resilience, they want not only to be reachable, but reachable through multiple providers, so they have to have their own independent entry in the global routing table."

Timothy Griffin, reader in computer science at the University of Cambridge, said the temporary overflow problem "highlights the fact that the Internet is not for free; it’s built on quite expensive hardware. They are physical resources that need to be upgraded on occasion and you tend to put that off as long as possible. No matter what protocol you use, you will have this infrastructure cost problem. However, the way that BGP uses addresses and prefixes does exacerbate the problem."

As table growth continues, operators may investigate ways to keep growth of their own BGP routing tables under control. One option for operators using equipment close to its limits is to perform greater levels of filtering of unnecessary routes. Said Cowie, "I think operators are looking more carefully at filtering, but they are also cognizant of the fact that filtering has the potential for creating connectivity problems to small networks that don't make the cut."
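The trade-off Cowie describes can be illustrated with a small sketch. It assumes one simple filtering policy, discarding any prefix longer than /24, which is an assumption chosen for illustration rather than a policy named in the article. The helper names and prefixes are hypothetical.

```python
import ipaddress

def filter_by_length(routes, max_prefixlen=24):
    """Keep only routes whose prefix is no longer than max_prefixlen."""
    return [r for r in routes if r.prefixlen <= max_prefixlen]

def is_reachable(address, routes):
    """True if any remaining route covers the address."""
    addr = ipaddress.ip_address(address)
    return any(addr in r for r in routes)

# Hypothetical table: one large block, a /25 inside it, and a standalone /25.
routes = [
    ipaddress.ip_network("10.0.0.0/16"),
    ipaddress.ip_network("10.0.5.0/25"),
    ipaddress.ip_network("192.0.2.128/25"),
]

kept = filter_by_length(routes)

print(len(kept))                          # 1 route survives the filter
print(is_reachable("10.0.5.1", kept))     # True: still covered by the /16
print(is_reachable("192.0.2.129", kept))  # False: no covering route remains
```

The filtered /25 inside the larger block stays reachable through its covering route, while the standalone small network drops out of the table entirely.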

Jennifer Rexford, Gordon Y. S. Wu Professor in Engineering in the Department of Computer Science at Princeton University, said more advanced filtering could cut the effective table size three-fold. "If a customer connects to two upstream providers, they both have to make that network visible, but there is a point beyond which others don't really need to see that. A bunch of research proposals have shown that it's possible to recognize the situation. It could just (need) a software change for a router so it wouldn't require new physical equipment. At the moment, it’s just a research proposal, but there is a need for solutions like that."

Further improvements would demand changes to core Internet protocols. "It’s really an architectural problem in the Internet’s design," said Griffin, pointing to a proposal called the Locator/ID Separation Protocol (LISP) that would provide a second layer of Internet addresses reserved for routers. "All the infrastructure in the core would need to know about is how to get to a locator," said Griffin. "Therefore, the routing infrastructure would have to carry much less information. A lot of people think it’s a good idea, but the devil is in the details; how would we implement and then transition to it? With IPv6, everybody understands that we need to do that, but the transition is still painfully slow. Can you imagine how difficult it would be to make an architectural change?"
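The idea Griffin outlines, separating where an endpoint is from who it is, can be sketched as a two-level lookup. This is a toy model under stated assumptions: the mapping table, the `locator_for` helper, and all addresses are hypothetical, and nothing here reflects the actual LISP message formats or mapping system.

```python
import ipaddress

# Hypothetical mapping held at the network edge:
# endpoint identifier prefix -> locator address of the router serving it.
id_to_locator = {
    ipaddress.ip_network("10.1.0.0/24"): "203.0.113.1",
    ipaddress.ip_network("10.1.1.0/24"): "203.0.113.1",
    ipaddress.ip_network("10.2.0.0/24"): "198.51.100.7",
}

def locator_for(endpoint):
    """Return the locator for an endpoint address, or None if unmapped."""
    addr = ipaddress.ip_address(endpoint)
    matches = [prefix for prefix in id_to_locator if addr in prefix]
    if not matches:
        return None
    # Prefer the most specific matching identifier prefix.
    best = max(matches, key=lambda prefix: prefix.prefixlen)
    return id_to_locator[best]

# In this model the core only needs routes to the locators themselves.
core_routes = set(id_to_locator.values())

print(locator_for("10.1.1.5"))   # 203.0.113.1
print(len(id_to_locator))        # 3 identifier prefixes at the edge
print(len(core_routes))          # 2 locator routes in the core
```

In this model the core table grows with the number of locators rather than with the number of identifier prefixes, which is the saving the proposal is after.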

Chris Edwards is a Surrey, U.K.-based writer who reports on electronics, IT, and synthetic biology.

