In the early 1970s, a young assistant professor at the University of California, Berkeley, decided to build a relational database. The concept of such a database had been proposed in 1970 by Edgar Codd of IBM, who outlined the advantages it would have over the dominant database model at the time, IBM’s hierarchical Information Management System.
Many computer science researchers had followed Codd’s work with papers of their own, but none had gone beyond proof-of-concept prototypes. The Berkeley professor, Michael Stonebraker, wanted to build something that could actually work.
"This was new and different terrain, nothing like anything that had been done before," says Stonebraker, now an adjunct professor at the Massachusetts Institute of Technology (MIT) and recipient of the 2014 ACM A.M. Turing Award. Along with Berkeley computer science professor Eugene Wong, doctoral student Gerald Held, and some other students, Stonebraker spent the next six years building INGRES, one of the world’s first two relational databases.
"Building a real database system is a huge amount of work," he says. "We didn’t know how much work it was going to be. We just did it."
Hierarchical databases store information in a stacked set of categories. For instance, there might be a listing of departments within a business, and within those departments would be a list of employees. If, however, an employee did not belong to a specific department, or if the user did not know which department the employee was in, then a query would not be able to find information about that employee. Stonebraker felt users should not have to write an algorithm to find what they wanted, and their programs should not depend on exactly how the data was structured.
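The difference can be sketched with a small relational example. This is a minimal illustration, not the INGRES design: it uses SQL through Python's built-in `sqlite3` module (INGRES itself used a query language called QUEL), and the table names, column names, and sample employees are all hypothetical. The point it shows is that the query names the employee being sought without stating which department to search, and it still finds an employee who has no department at all.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE department (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE employee (
            id            INTEGER PRIMARY KEY,
            name          TEXT NOT NULL,
            department_id INTEGER REFERENCES department(id)  -- may be NULL
        );
        INSERT INTO department VALUES (1, 'Sales'), (2, 'Engineering');
        INSERT INTO employee VALUES (1, 'Ada', 2), (2, 'Grace', NULL);
        """
    )
    return conn


def find_employee(conn, name):
    """Declarative lookup: state what is wanted, not how to walk the data."""
    cursor = conn.execute(
        "SELECT e.name, d.name "
        "FROM employee AS e "
        "LEFT JOIN department AS d ON d.id = e.department_id "
        "WHERE e.name = ?",
        (name,),
    )
    return cursor.fetchall()


conn = build_demo_db()
print(find_employee(conn, "Ada"))    # employee with a department
print(find_employee(conn, "Grace"))  # employee with no department
```

In a strictly hierarchical layout, the second lookup would have no department branch to descend into; here the employee table can be searched directly, and the department is joined in only if one exists.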
The result of their work was INGRES, for which Stonebraker, Held, and Wong were awarded the ACM Software System Award in 1988. The team made some choices that, as it turned out, helped boost the popularity of their system, says Held. They wrote it on an operating system that had recently been released by AT&T’s Bell Labs, Unix, in a then-little-known language, C. As Unix spread, users found there was a ready-made database system they could use with it. Stonebraker and colleagues also made INGRES an open source project, which allowed users to refine and build on it.
As INGRES grew more popular, its limitations became clearer. It began to be used in areas beyond the business data processing it had been designed for, and it needed changes. That led Stonebraker to develop POSTGRES, which allowed users to add their own data types, from specific formats for dates and times to geographic points. This object-relational database was now able to work on a much broader class of problems.
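As a loose illustration of what a user-added data type looks like, the sketch below stores a geographic point in a database column. It is only an analogy under stated assumptions: it uses the adapter and converter hooks in Python's built-in `sqlite3` module, which translate values on the client side, whereas POSTGRES let users define types inside the database system itself. The `GeoPoint` class, the `geopoint` column type, and the `place` table are hypothetical names chosen for the example.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoPoint:
    """Hypothetical user-defined type: a latitude/longitude pair."""
    lat: float
    lon: float


def adapt_point(point):
    """Turn a GeoPoint into text that the database can store."""
    return f"{point.lat};{point.lon}"


def convert_point(raw):
    """Rebuild a GeoPoint from the stored bytes."""
    lat, lon = (float(part) for part in raw.split(b";"))
    return GeoPoint(lat, lon)


# Teach sqlite3 how to write and read the new type.
sqlite3.register_adapter(GeoPoint, adapt_point)
sqlite3.register_converter("geopoint", convert_point)


def roundtrip(point):
    """Store a point in a column declared as 'geopoint' and read it back."""
    conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
    conn.execute("CREATE TABLE place (name TEXT, location geopoint)")
    conn.execute("INSERT INTO place VALUES (?, ?)", ("campus", point))
    (stored,) = conn.execute("SELECT location FROM place").fetchone()
    conn.close()
    return stored


print(roundtrip(GeoPoint(37.8719, -122.2585)))
```

Application code passes and receives `GeoPoint` values directly, rather than packing coordinates into strings or numbers by hand at every call site.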
By the early 2000s, the database world had changed, and businesses had mountains of data they wanted to query in various ways. That data was arranged in rows, and queries took a relatively long time to process, so Stonebraker introduced column storage, which he says was two orders of magnitude faster. These days, he argues that the time for a "one size fits all" approach to databases has passed. He has founded various companies, including Vertica, Streambase, VoltDB, and Tamr, to commercialize different software products he has developed.
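The contrast between the two layouts can be sketched in a few lines. This is a toy model with hypothetical order data, not the design of any of the systems named above, and it makes no attempt to reproduce the speedup Stonebraker describes. It only shows the structural difference: a row layout keeps each record together, so totaling one field touches every record, while a column layout keeps each field together, so the same total reads a single list.

```python
# Row layout: each record is stored together (hypothetical sample data).
ROWS = [
    {"order_id": 1, "region": "west", "amount": 120.0},
    {"order_id": 2, "region": "east", "amount": 80.0},
    {"order_id": 3, "region": "west", "amount": 45.5},
]


def to_columns(rows):
    """Pivot a list of records into one list per field."""
    columns = {}
    for row in rows:
        for field, value in row.items():
            columns.setdefault(field, []).append(value)
    return columns


def total_row_store(rows, field):
    """Row layout: every whole record is visited to read one field."""
    return sum(row[field] for row in rows)


def total_column_store(columns, field):
    """Column layout: only the requested field's values are read."""
    return sum(columns[field])


COLUMNS = to_columns(ROWS)
print(total_row_store(ROWS, "amount"))
print(total_column_store(COLUMNS, "amount"))
```

Both functions return the same total. The layouts differ in how much unrelated data sits between the values a query actually needs, which matters most for analytical queries that scan a few fields across very many records.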
Held, who was chairman of Vertica and is now chairman at Tamr (Stonebraker was chief technology officer at both companies), says it is unusual for researchers to straddle industry and academia for as long as Stonebraker has. "There aren’t that many people that can have their feet both in the academic and commercial world," he says, adding that this back-and-forth may be what has made Stonebraker so productive. "Some people have one good idea and it’s great and it’s worth a lot. Mike has been in there designing year after year and decade after decade," Held says.
Stonebraker finds it important to go beyond simply describing ideas in a paper, to actually using those ideas to build practical applications. "If you don’t have something people can use, you have no chance of making a difference," he argues. At the same time he was working on INGRES, IBM started developing its own relational database, System R. Stonebraker says the existence of each spurred the other along, and the two wound up providing valuable feedback to each other. "The way to do technological innovation is to build a better mousetrap and then threaten the existing order with your better mousetrap, and then something will happen."
Samuel Madden, a colleague at MIT who studies database systems, believes Stonebraker founded his companies to demonstrate that his academic ideas have real-world applications. "He’s very singularly driven with a vision of the way the world ought to be," Madden says, "and remarkably most of the time this vision turns out to be right."
For undergraduates just getting into computer science, Stonebraker’s advice is, "Learn how to code, and code well, because whatever you do is going to involve implementation." For Ph.D. students trying to figure out where to focus their attention, he suggests talking to real-world computer users. "They’re happy to tell you why they don’t like or do like any given technology, so they’re a wonderful source of problems to work on."