Architecture and Hardware: Last Byte

RISCy Beginnings

In a career launched by groundbreaking research, Garth Gibson continues to shepherd technological advances "from blackboard through standards and to commercial reality."
Garth Gibson

Garth Gibson has spent his career pushing data storage systems to higher levels of performance, reliability, and scalability. While he was a graduate student at the University of California, Berkeley, Gibson was part of groundbreaking research on Redundant Arrays of Inexpensive Disks (RAID). Later, as a professor at Carnegie Mellon University, he worked on projects like Network Attached Secure Disk (NASD) technology and a clustered storage system used by Roadrunner, the world’s first petascale supercomputer. Here, he speaks about handling failures, collaborating with industry and academia, and how deep learning has impacted systems design.

When you were a graduate student at the University of California at Berkeley, you joined a computer architecture team led by David Patterson that was building a complete system based on RISC concepts. How did you get started on storage?

David Patterson walked into my office around 1986 and said, "You know, I think we’re doing really well with the development of the computer architecture and data processing. Let’s think about the broader systems issues." There were two areas of interest. One was in networking distributed systems, and the other was in storage. So I started trying to figure out what was going on in storage, and what the most important issues were when it came to storing and accessing huge amounts of information.

We also knew we wanted to build a parallel storage system, just like we wanted to build multiprocessors into highly parallel computers.

One of the dominant problems was that, if your storage medium—the magnetic disk—failed, you put everything on the computer at risk.

The database community, by that point, already had their answer, namely duplicates. Memory systems researchers had also developed error-correcting codes that stretched across memory chips to cope with the loss of a chip. Then we noticed a group that pointed out that there’s a big difference between a code that must first detect a failure before it can correct it, and a code that doesn’t have to detect anything, but just replaces a known loss. And we realized that the most important failures were going to be identifiable, so we built the taxonomy around erasure codes. The original paper on redundant arrays of inexpensive disks, or RAID, presented the taxonomy, the problem space, the goals in parallelism, and a way to compare the different ways that you could do it.
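The erasure-code idea behind RAID 4/5 can be illustrated with simple XOR parity: because the failed drive's position is known (an erasure, not a silent error), the lost block is recovered by XORing the parity block with the survivors. A minimal sketch, not taken from the original paper:

```python
from functools import reduce

def parity(blocks):
    """Byte-wise XOR parity across equal-sized data blocks, as in RAID 4/5."""
    return bytes(reduce(lambda a, b: a ^ b, chunk) for chunk in zip(*blocks))

def reconstruct(surviving_blocks, parity_block):
    """Rebuild a single known-lost block: XOR of the parity and all survivors."""
    return parity(surviving_blocks + [parity_block])

data = [b"AAAA", b"BBBB", b"CCCC"]
p = parity(data)

# Drive 1 fails; its identity is known, so no detection step is needed.
rebuilt = reconstruct([data[0], data[2]], p)
assert rebuilt == data[1]
```

One parity drive thus protects any number of data drives against a single identifiable failure, which is exactly why erasure (rather than error-detecting) codes made the economics work.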

Your co-authors, David Patterson and Randy Katz, then built a research program to develop those ideas, which were eventually commercialized under a slightly different acronym: "redundant arrays of independent disks."

We made a very important strategic decision to engage the industry, which was Patterson’s approach to doing research projects. Because we chose not to patent or trademark the ideas, they were quickly adopted by industry. We called it "inexpensive" because replacing one unit with 100 units was never going to be a good idea if it meant your product was 100 times the cost. But nobody likes to try to sell something by saying, "This is a high-value, high-cost product, and that’s why it’s called ‘inexpensive.’" Fortunately, the word "independent" is entirely appropriate, because the only reason that RAID’s failure modeling works is that the failure domains of the individual drives are independent.

In any event, when I look at datasheets for a computer system today, I see that the chips that are built into the motherboards have support for RAID 0, RAID 1, RAID 5, and RAID 10.

Like Patterson, you have gone back and forth between industry and academia throughout your career. Can you tell me about the Parallel Data Lab, which you founded in 1993, two years after you left Berkeley and moved to Carnegie Mellon University?

At Berkeley, I loved the benefits of interacting with industry—engaging with smart people and real-world problems and trends. I wanted to build the same infrastructure at Carnegie Mellon. We created what we called the Parallel Data Consortium, a vehicle to give companies access to, and interaction with, the people in the lab. Initially, we held annual retreats in which the research results of the lab were shared and discussed with industry collaborators. That structure has evolved over time, as have the industry participants. But the number of companies involved has grown from an initial six to about 20 now, plus strong engagements with government funding agencies.

One of the more impactful projects to emerge from the lab was Network Attached Secure Disks (NASD) technology, which moved magnetic disks out of the host computers and communicated with them via networking protocols, in the interest of providing more scalable storage.

We did a few things that ended up having influence on the architecture. The first was to pull the interface abstraction up from the magnetic disk layer in the stack, but not all the way to the file system abstraction. That’s still the way it’s done today. Almost all of the large file systems have an interface layer, whether it’s an AWS object or a file system object. It’s usually separate from the file system above it, and it is the scalability component.

The security aspect is also interesting, because we separated policy from implementation. The storage object would implement security and access control using Merkle tree protection, much like a blockchain protects the integrity of a ledger. That gave us the ability to move storage across the network in a way where the storage components didn’t have to understand the file systems’ policies. It was a big influence on the distributed systems community.
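The Merkle tree idea can be sketched in a few lines: hash each block, then hash pairs of hashes up to a single root, so that verifying or detecting tampering in any block requires trusting only the root. This is an illustrative sketch of the general technique, not NASD's actual protocol:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(blocks):
    """Hash the leaves, then hash pairs upward until one root remains."""
    level = [h(b) for b in blocks]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate the last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

blocks = [b"object-0", b"object-1", b"object-2", b"object-3"]
root = merkle_root(blocks)

# Changing any single block changes the root, so a verifier holding only
# the root can detect tampering anywhere in the stored data.
tampered = merkle_root([b"object-0", b"evil", b"object-2", b"object-3"])
assert tampered != root
```

Because the policy decision (who may access what) lives with the root holder, the storage components only need to implement the mechanical hash checks.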

In 2018, you joined another consortium of industry and academia, Toronto’s Vector Institute, as president and CEO.

If I compare it to the Parallel Data Lab, what I say to people at CMU is, it’s kind of like the Parallel Data Lab, except it’s moved outside the university, and it’s dealing with 10 times as many people, and 10 times the amount of money. And a lot more partners.

How have machine learning, deep learning, and the other areas the Vector Institute is looking at impacted systems design?

Around 2010, Carlos Guestrin at Carnegie Mellon and Joseph Hellerstein at Berkeley began to think about systems for solving machine learning problems. And that meant distributed processing. What happens when we try to do an iterative convergent solution of an equation using distributed systems? And how are we going to deal with all of that communication? That became the big way that machine learning impacted systems design: we are going to do so much communication that the algorithms are going to need to explicitly take into consideration the cost of communication.

You’ve also done some work in that area with Eric Xing and Greg Ganger.

In 2012, we started working on stale synchronous parallel, or SSP, as opposed to bulk synchronous parallel (BSP), the model for parallel computing from Leslie Valiant’s Turing Award-winning work. The idea in stale synchronous is that when you are searching for an approximate solution—which is what a convergent algorithm is, because you’re going to assume that it’s close enough—you can allow error as long as you can bound it. In particular, you can allow some signals to arrive later than others. In the traditional computing world, we would have called this relaxed consistency.
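The bounded-staleness idea can be sketched with a toy clock: each worker tracks its iteration count, and a worker may keep computing as long as it is no more than a fixed staleness bound ahead of the slowest worker. A minimal illustration of the general SSP idea, not the actual parameter-server implementation:

```python
class SSPClock:
    """Toy stale synchronous parallel clock: a worker may proceed as long as
    it is within `staleness` iterations of the slowest worker."""

    def __init__(self, n_workers: int, staleness: int):
        self.clocks = [0] * n_workers
        self.staleness = staleness

    def advance(self, worker: int):
        """Worker finished one iteration."""
        self.clocks[worker] += 1

    def can_proceed(self, worker: int) -> bool:
        # Block only when this worker is more than `staleness` iterations
        # ahead of the slowest one; staleness=0 recovers BSP's strict barrier.
        return self.clocks[worker] - min(self.clocks) <= self.staleness

clk = SSPClock(n_workers=3, staleness=2)
clk.advance(0)
clk.advance(0)              # worker 0 is now 2 iterations ahead
assert clk.can_proceed(0)   # still within the bound: keep computing
clk.advance(0)              # 3 ahead: exceeds the bound
assert not clk.can_proceed(0)
```

The bound is what makes the error controllable: fast workers read slightly stale state instead of waiting at a barrier, but never state staler than the bound allows.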

My students since then have said that, while it’s true that staleness can be tolerated, it isn’t necessarily fast. So, the work has been more along the lines of: how can we allow staleness when it increases the speed of the system, while maximizing freshness so that we converge quickly?
