Q&A: Big Challenge

After meeting in the 1990s at Digital Equipment Corporation (DEC) and forging a productive friendship at a gelato stand between their two labs, ACM-Infosys Foundation Award in the Computing Sciences recipients Jeff Dean and Sanjay Ghemawat moved to Google where, for nearly 15 years—and often coding at the same computer—they have transformed Internet-scale computing. Spurred by the challenge of handling an ever-growing volume of web pages and search requests, the two built scalable computing platforms that distributed computations across thousands of servers, and have since been leveraged by thousands of projects both inside and beyond Google.

I understand you met at Digital Equipment Corporation.

JEFF: We both started at DEC within about a year of each other. Sanjay was at the Systems Research Center, and I was at the Western Research Lab. The two labs were conveniently separated by a gelato stand. We initially started collaborating on a project to build a low-overhead profiling system.

SANJAY: Then we moved on to an optimizing compiler for Java.

JEFF: I got interested in a side project working on information retrieval. A colleague had built a system that kept the entire graph of all the connectivity structure from an AltaVista crawl in memory, and built a simple API so you could see what pages pointed to which other pages—and more importantly, which pages pointed to a given page. So I started looking into the link structure of the Web, and I decided to leave the research lab. Two months later, I told Sanjay he should come to Google, too.

At Google, you share an office, and you even code together.

JEFF: We’ve shared an office for most of the time that we’ve been here, but we were actually coding together at DEC. We’ve been doing that for a number of years. We work really well in that mode because we each understand where the other one is going with an idea, both on a very small scale—like how we should implement some data structure that we need—and at the large scale of a big system. It’s a very fluid style.

“We work really well in that mode because we each understand where the other is going with an idea.”

SANJAY: It’s all over the place. Jeff has a lot of energy and excitement. We usually sit, and one of us is typing and the other is looking on, and we’re chatting all the time about ideas, going back and forth.

Let’s talk about scalability. One of your best-known projects at Google is MapReduce, which enables programmers to spread computations across multiple machines.

JEFF: For the first few years we were at Google, scalability was a real challenge. Our indexing and serving systems had to rework things very quickly in order to both be able to update the index files and deal with the queries coming in. The indexing system starts with a bunch of pages collected from the Internet. Eventually, you want to end up with an inverted index where words map to the documents that contain them, along with a bunch of other information about those documents: things like page rank of the document, what language the document is in, and then you want to eliminate duplicates … so there was this whole collection of operations you have to perform, starting with raw document contents that we had crawled on disk, and ultimately ending up with an inverted index and other data structures needed for handling search requests.

SANJAY: Each of these operations had to process a lot of data, so we had to divide up the work across many machines so it would finish in a reasonable amount of time. This division of work was very boring and repetitive. We had to implement the same boilerplate for every new data processing task. That’s where MapReduce came in. We abstracted the repetitive parts into a library and allowed the author of the new data processing task to just plug in the specific operations they wanted to apply to their data; the MapReduce library took care of the rest.

JEFF: It was originally done in the context of these eight or 10 phases of the indexing system, but as we looked around, we realized it was much more general-purpose.

One thing that makes it so versatile is that the programmer doesn’t have to worry about how to map his or her computations across the processors.

JEFF: MapReduce makes it easy for people without a lot of systems experience to get the answers they want, without being experts in how you automatically parallelize computations or handle failures.

Speaking of failures, you’ve called MapReduce a software answer to a hardware problem. Can you elaborate?

SANJAY: If you have 1,000 servers, then three of them are dying every day. So if you’re running a MapReduce computation across thousands of machines, given the law of large numbers, chances are that something will go wrong. We could try to make the underlying hardware more reliable, but I think the costs are against that.

“MapReduce makes it easy for people without a lot of systems experience to get the answers they want.”

So because MapReduce spreads computations across so many machines, it doesn’t matter if one or two fail.

JEFF: Doing recovery in software means that if a particular machine dies after it has done 100 pieces of work, you can map each of those 100 pieces of work to one of 100 other machines. The recovery is very fast because it can happen with a high degree of parallelism.

SANJAY: And there are other reasons to make error-handling a central part of these systems, because you can leverage it for other things. So consider a job scheduling system that is handling jobs submitted by many people. When a new job comes in, the job scheduler might kick out parts of other jobs to make room for the new job, so the old job has to be prepared to handle failures introduced by this preemption. If the old job is a MapReduce, the MapReduce library will deal with these preemptions automatically.

After your 2004 paper, MapReduce inspired the open source system that became Hadoop, along with countless other projects. What about internally?

JEFF: It’s used in thousands of ways in hundreds of products and underlying systems. Lots of batch-oriented computations, processing things like logs data, web pages, images, satellite imagery, source code, etc., in order to compute summary or derived information of various kinds. It’s often used in multistage pipelines with different MapReduce operations forming different stages of the pipeline.

Tell me about BigTable, a database that spreads rows of data across multiple machines.

SANJAY: So the initial motivation for BigTable was the cycle of periodic crawling and indexing. In particular, we wanted to make things a lot more real time. When a page goes up on the Web, you want it to be searchable quickly. Every month we would crawl and index from scratch and that would take a while. We wanted to reduce that delay.

JEFF: If the row is the URL, there are a bunch of different columns with information about that URL. Then you basically have all of the information about a page, and you can, with very low latency, crawl a new version of the page, and re-index the page with all the information we know about it in an incremental fashion, rather than waiting for the next large batch update.

In 2007, Jeff served as the inspiration for a popular April Fool’s gag—a collection of statements modeled after Chuck Norris Facts, like “Jeff Dean once failed a Turing Test when he correctly identified the 203rd Fibonacci number in less than a second.” Sanjay, do you have a favorite Jeff Dean fact?

SANJAY: Hmm, let me think. Jeff is a fast typist, but he’s also a very hard typist, so—

JEFF: I wear out keyboards fairly quickly.

SANJAY: So we were working together and all of a sudden, from the next office, we heard: “Is it raining?” “No, that’s just Jeff typing.”