The Internet is increasingly a platform for online services—such as email, Web search, social networks, and virtual worlds—running on rack after rack of servers in data centers. The servers not only communicate with end users, but also with each other to analyze data (for example, to build a search index) or compose Web pages (for example, by combining data from multiple backend servers). With the advent of large data centers, the study of the networks that interconnect these servers has become an important topic to researchers and practitioners alike.
Data-center networking presents unique opportunities and challenges, compared to traditional backbone and enterprise networks:
- In a data center, the same company controls both the servers and the network elements, enabling new network architectures that implement key functionality on the end-host computers.
- Servers are installed in fixed units, such as racks or even trucks filled with racks driven directly into the data center. This leads to very uniform wiring topologies, such as fat trees or Clos networks, reminiscent of the massively parallel computers designed in the 1990s.
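To make the scaling property of these uniform topologies concrete, here is a small sketch (not from the paper) of the standard k-ary fat-tree construction built from identical k-port switches: k pods, each with k/2 edge and k/2 aggregation switches, plus (k/2)^2 core switches, supporting k^3/4 hosts at full bisection bandwidth.

```python
# Sketch of fat-tree capacity from the standard k-ary construction
# (k pods, k/2 edge + k/2 aggregation switches per pod, (k/2)^2 cores).
# Illustrative only; not taken from the VL2 paper itself.

def fat_tree_capacity(k: int) -> dict:
    """Pod, core-switch, and host counts for a k-ary fat tree."""
    assert k % 2 == 0, "switch port count must be even"
    return {
        "pods": k,
        "core_switches": (k // 2) ** 2,
        "hosts": k ** 3 // 4,
    }

# With commodity 48-port switches: 48**3 // 4 = 27,648 hosts.
```

The point of the construction is that very large networks fall out of wiring many identical, cheap switches in a regular pattern, rather than buying ever-bigger core routers.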
- The traffic load in data centers is often quite heavy and non-uniform, due to new backend applications like MapReduce; the traffic can also be quite volatile, varying dramatically and unpredictably over time.
In light of these new characteristics, researchers have been revisiting everything in networking—from addressing and congestion control to routing and the underlying topology—with the unique needs of data centers in mind.
The following paper presents one of the first measurement studies of network traffic in data centers, highlighting specifically the volatility of the traffic even on a relatively small timescale. These observations led the authors to design an "agile" network engineered for all-to-all connectivity with no contention inside the network. This gives data-center operators the freedom to place applications on any servers, without concern for the performance of the underlying network. Having an agile network greatly simplifies the task of designing and running online services.
More generally, the authors propose a simple abstraction—a single "virtual" layer-two switch (hence the name "VL2") for each service, isolated from the many other services running in the same data center. They achieve this goal through several key design decisions, including flat addressing (so service instances can run on any server, independent of its location) and Valiant Load Balancing (to spread traffic uniformly over the network). A Clos topology ensures the network has many paths between each pair of servers. To scale to large data centers, the servers take responsibility for translating addresses to the appropriate "exit point" from the network, obviating the need for the networking equipment to keep track of the many end hosts in the data center.
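The core idea of Valiant Load Balancing can be sketched in a few lines: each flow is "bounced" off an intermediate switch chosen uniformly at random, so the core carries an even share of the load no matter how skewed the traffic matrix is. The switch names below are hypothetical, purely for illustration.

```python
import random
from collections import Counter

# Sketch of Valiant Load Balancing: route each flow via a randomly
# chosen intermediate switch, so any traffic matrix is spread evenly
# across the core. Switch names are assumed for illustration.

INTERMEDIATE_SWITCHES = ["core-1", "core-2", "core-3", "core-4"]

def vlb_path(src: str, dst: str) -> list:
    """Return a two-hop path: src -> random intermediate -> dst."""
    via = random.choice(INTERMEDIATE_SWITCHES)
    return [src, via, dst]

# Over many flows, each intermediate carries ~1/4 of the traffic,
# regardless of which racks are talking to which.
loads = Counter(vlb_path("rack-a", "rack-b")[1] for _ in range(100_000))
```

Randomizing the intermediate hop trades a little path stretch for a strong guarantee: no adversarial or shifting traffic pattern can concentrate load on one part of the core.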
In addition to proposing an effective design, the authors illustrate how to build the solution using mechanisms available in existing network switches (for example, equal-cost multipath routing, IP anycast, and packet encapsulation). This allows data centers to deploy VL2 with no changes to the underlying switches, substantially lowering the barrier for practical deployment. This paper is a great example of rethinking networking from scratch, while coming full circle to work with today’s equipment. Indeed, the work depicted in the VL2 paper has already spawned substantial follow-up work in the networking research community, and likely will for years to come.
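The encapsulation mechanism mentioned above can be sketched as follows: a host-side agent resolves a flat application address to the "locator" address of the destination's top-of-rack switch and wraps the packet in an outer header, so the core only needs routes to a few switch locators rather than to every host. The addresses and directory mapping here are invented for illustration, not taken from the paper.

```python
# Sketch of server-side packet encapsulation, in the spirit of VL2.
# A directory maps flat application addresses to the locator address
# of the destination's top-of-rack (ToR) switch. All values are
# hypothetical examples.

DIRECTORY = {
    "10.0.0.5": "20.1.1.1",  # host behind ToR 20.1.1.1
    "10.0.0.9": "20.1.2.1",  # host behind ToR 20.1.2.1
}

def encapsulate(packet: dict) -> dict:
    """Wrap a packet in an outer header addressed to the dest's ToR."""
    locator = DIRECTORY[packet["dst"]]
    return {"outer_dst": locator, "inner": packet}

def decapsulate(frame: dict) -> dict:
    """At the ToR: strip the outer header and recover the packet."""
    return frame["inner"]
```

Because the core switches forward only on the outer (locator) address, their forwarding tables stay small and stable even as hosts move, which is what lets the design scale without changing the switches.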