BLOG@CACM
Computing Applications

The End of a DBMS Era (Might Be ­Upon ­Us)

Posted
MIT Adjunct Professor Michael Stonebraker

Relational database management systems (DBMSs) have been remarkably successful in capturing the DBMS marketplace. To a first approximation they are “the only game in town,” and the major vendors (IBM, Oracle, and Microsoft) enjoy an overwhelming market share. They are selling “one size fits all”; i.e., a single relational engine appropriate for all DBMS needs. Moreover, the code line from all of the major vendors is quite elderly, in all cases dating from the 1980s. Hence, the major vendors sell software that is a quarter century old, and has been extended and morphed to meet today’s needs. In my opinion, these legacy systems are at the end of their useful life. They deserve to be sent to the “home for tired software.”

Here’s why.

If we examine the nontrivial-sized DBMS markets, it turns out that current relational DBMSs can be beaten by approximately a factor of 50 in most any market I can think of. What follows are a few examples.

In the data warehouse market, a column store beats a row store by approximately a factor of 50 on typical business intelligence queries. The reason is because column stores read only the columns of interest to the query and not all of them. In addition, compression is more effective in a column store. Since the legacy systems are all row stores, they are vulnerable to competition from the newer column stores. The interested reader can start with “C-Store: A Column-oriented DBMS” to explore this topic further.

In the online transaction processing (OLTP) market, a lightweight main memory DBMS beats a row store by a factor of 50. Leveraging main memory and the fact that no DBMS application will send a message to a human user in the middle of a transaction, allows an OLTP DBMS to run transactions to completion with no resource contention or locking overhead. The interested reader can start with “The End of an Architectural Era (It’s Time for a Complete Rewrite)” to explore this topic further.

In the science DBMS market, users have never liked relational DBMSs and want a non-relational model and query facility. This was the topic of my last ACM blog, "DBMSs for Science Applications: A Possible Solution."

If you are storing Resource Description Framework (RDF) data, which is popular in the bio community and elsewhere, then “Scalable Semantic Web Data Management Using Vertical Partitioning” points out that column stores are very good at certain RDF workloads. In addition, other ideas, such as “RDF-3X: A Risc-style engine for RDF,” will beat conventional DBMSs in other situations. Lastly, native RDF engines (e.g., Virtuoso, Sesame, and Jena) may well gain traction. The point is that something else will beat conventional row stores in this market.

Text applications have never used relational DBMSs. This was pointed out to me most clearly by Eric Brewer nearly 15 years ago in the early days of Inktomi. He wanted to use a relational DBMS to store the results of Web crawling, but found RDBMS to be two orders of magnitude slower than a home-brew system. All the major Web-search engines use home-brew text software to serve us search results. None use relational DBMSs.

Even in XML, where the current major vendors have spent a great deal of energy extending their engines, it is claimed that specialized engines, such as Mark Logic or Tamino, run circles around the major vendors, according to a private communication by Dave Kellogg.

In summary, one can leverage at least the following ideas to get superior performance:

A non-relational data model. If the user’s data is naturally something other than tables and if simulating his natural data model on top of tables is awkward, then chances are that a native implementation of the natural data model will significantly outperform a conventional RDBMS. This is certainly true in scientific data.

A different implementation of tables. If something other than a row store accelerates the user’s queries, then a direct implementation of the relational model using non-row store technology will run circles around a conventional RDBMS. This is true in the data warehouse marketplace.

A different implementation of transactions. Current row stores give you a “one size fits all” implementation of transactions. This can be radically beaten if a user has lesser requirements or if the system can take advantage of workload specific features. This is true in the OLTP marketplace.

One of these characteristics is true in every market I can think of. Hence, in my opinion, the days of a “one size fits all” monolithic DBMS are at an end. The replacement will be a collection of vertical market specific engines, with much higher performance.

You might ask, “What if I don’t care about performance?” The answer: Run one of the open source relational DBMSs. They are mature, reliable, and, best of all, they are free.

You might also ask, “I am dug in deep with my current vendor(s). What do I do?” The answer: Take some portion of your DBMS budget and allocate it to new solutions. Over time, you will move onto better technology.

References

Michael Stonebraker et al., “C-Store: A Column-oriented DBMS,” Proc 2005 VLDB Conference, Trondheim, Norway, Sept. 2005.

Michael Stonebraker et al., “The End of an Architectural Era (It’s Time for a Complete Rewrite)” Proc 2007 VLDB Conference, Vienna, Austria, Sept. 2007.

Dan Abadi et al., “Scalable Semantic Web Data Management Using Vertical Partitioning,” Proc. 2007 VLDB Conference, Vienna, Austria, Sept. 2007.

Thomas Neumann et al., “RDF-3X: A Risc-style engine for RDF,” Proc VLDB Endowment, 1(1): 647-659 (2008)

Disclosure: Michael Stonebraker is associated with four startups that are either producers or consumers of data base technology. Hence, his opinions should be considered in this light.

Join the Discussion (0)

Become a Member or Sign In to Post a Comment

The Latest from CACM

Shape the Future of Computing

ACM encourages its members to take a direct hand in shaping the future of the association. There are more ways than ever to get involved.

Get Involved

Communications of the ACM (CACM) is now a fully Open Access publication.

By opening CACM to the world, we hope to increase engagement among the broader computer science community and encourage non-members to discover the rich resources ACM has to offer.

Learn More