
BLOG@CACM

The End of a DBMS Era (Might be Upon Us)


Relational database management systems (DBMSs) have been remarkably successful in capturing the DBMS marketplace. To a first approximation they are “the only game in town,” and the major vendors enjoy an overwhelming market share. They are selling “one size fits all”; i.e., a single relational engine appropriate for all DBMS needs. Moreover, the code line from all of the major vendors is quite elderly. Hence, the major vendors sell software that is a quarter centur...

User Comments (6)

This article nicely sums up a lot of the arguments about why the RDBMS is no longer a viable solution for people whose data needs truly must scale. I have not personally looked into RDF much, but I agree that your data storage needs to reflect the nature of the data ... not just a bunch of tables and rows.

I've written an article on it as well that I encourage you to check out, entitled "Social Media Kills the Database", which is about the Swiss-Army RDBMS and its impending end. You can check it out at http://www.roadtofailure.com

Absolutely, we somehow allowed ourselves to go down the path of letting a single technology monopolize data management, with the market largely monopolized by a handful of vendors. At the same time, alternatives to the RDBMS were largely discredited throughout the years and never really gained acceptance, even when there clearly was a disconnect between the RDBMS and requirements (and despite some poorly bolted-on extensions by the RDBMS vendors in an attempt to retrofit their platforms).

This is clearly changing now. Some difficult but pressing challenges haven’t been easily solved by the RDBMS in traditional form (massive scalability for example), and this has opened the door to new approaches and ideas. The traditional RDBMS will of course live on but in an ecosystem of alternative data management strategies.

Yes. It is very true that RDBMSs have been overhyped for years for not-so-valid reasons. Current trends also show that there are viable alternatives to the RDBMS that can beat it at its own game. The emergence of distributed key-value stores such as Cassandra and Voldemort also proves the efficiency and cost-effectiveness of their approaches.

Also, the recently concluded "NoSQL" conference discussed at length how distributed, non-relational databases work, along with an overview of the emerging alternatives in this space.

One of the chief benefits cited for DBMSs was improved programmer productivity, achieved by insulating application programmers from the significant burden of learning specialized files and systems and their internals. However valuable this benefit actually turned out to be, it would be entirely lost when replacing DBMSs with specialized systems.

Michael Stonebraker is fairly entitled to express his opinions, but as a fellow member of ACM, I would like to express some counterpoints based on my 25 years of successful business deployments on 8 or 9 (maybe more) commercial relational database products.

It is perfectly legitimate for Stonebraker to differ with me on the practicalities of relational storage for text, and how we interpret it when it is "claimed" that specialized XML engines outperform RDBMSes.

Similarly, he might claim a column store is different from a tuple store, and I might claim the only difference is data modeling choices.

It is also perfectly fair for him to emphasize performance benefits while downplaying or redefining the notion of transactional integrity that is now in widespread use.

But when Stonebraker misrepresents the relational model by claiming a user's data is "naturally something other than tables" he fundamentally misrepresents relational data analysis. The relational data model is not called "tabular" and the abstraction of a relation is not a simple data-entry form. One needn't read any further than Codd and Date to understand this. Their whole supplier-part-warehouse example shows how you have to deconstruct crude tabular representations in order to get good relational data models. Codd won the Turing Award because the relational abstraction is capable of representing any structured data.

And of all the valid criticisms of a model or a technology, "elderly" and "tired" are worse than useless. Do we believe that technology builds on prior discoveries, or that new technology throws older discoveries away? By such a standard, we would stop teaching Boolean logic, Turing machines, and all the other things that predate us.

Computer science has given us two fantastic tools for analyzing and managing data complexity: the relational model, and language theory. Compiler technology has been bulletproof for decades because of terrific underlying abstractions like the context-free grammar. The longevity of the relational model is due to its similar foundation in a powerful abstraction. I would claim that we will see relational DBMSes for at least as long as we see compilers.

In the interest of full disclosure, I am a proud employee of Oracle Corporation, but I do not speak for Oracle in any official or unofficial capacity.

It is disappointing, conversely, that neither Stonebraker nor ACM has informed our fellow ACM members that he is the CTO of Vertica Systems, a "column-store" database vendor that positions itself as a technology superior to relational technology.

With all respect,

Andrew D. Wolfe, Jr.

Mr. Wolfe appears to be making three main points in his posting:

1) the relational model is the best approach to data modeling
2) column stores are no different than row stores
3) elderly software is not bad.

I would like to briefly respond to each point.

Mr. Wolfe uses examples from business data processing in his posting. It is widely recognized that the relational model is probably the best fit for most business data processing data. In fact, all of the early examples used by Ted Codd, Chris Date, and others (including me) come from this domain (e.g., suppliers, parts, employees, departments, etc.). In the 1970s and 80s this was the only database market of consequence. However, one of the points that I was trying to make is that there are now other sizeable markets with different requirements.

In the science domain, tables are rarely the natural data model, and arrays would be a better choice. Popular science packages (e.g., MATLAB and S+) use arrays, not tables, as their user model. Once one leaves business data processing, the naturalness of the relational model must be questioned.
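As a rough illustration of the point about naturalness (a minimal sketch, not taken from the original post; the grid values and all function names are hypothetical), the same small grid of readings can be held as an array or flattened into (row, column, value) tuples. Finding a cell's right-hand neighbor is a direct index in the array form but a search or join over tuples in the table form:

```python
from typing import List, Tuple

# Hypothetical 2 x 3 grid of sensor readings, stored as an array.
GRID: List[List[float]] = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]


def grid_to_table(grid: List[List[float]]) -> List[Tuple[int, int, float]]:
    """Flatten the array into relational-style (i, j, value) tuples."""
    return [(i, j, v) for i, row in enumerate(grid) for j, v in enumerate(row)]


def right_neighbor_array(grid: List[List[float]], i: int, j: int) -> float:
    """Array model: the neighbor is found by index arithmetic."""
    return grid[i][j + 1]


def right_neighbor_table(table: List[Tuple[int, int, float]], i: int, j: int) -> float:
    """Table model: the neighbor is found by matching on the key columns."""
    for ti, tj, value in table:
        if ti == i and tj == j + 1:
            return value
    raise KeyError((i, j + 1))


TABLE = grid_to_table(GRID)
```

Both forms hold the same information. The difference is in which operations are direct and which must be expressed as lookups over keys.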

Column stores are a different implementation of the relational model than the row stores used by the major commercial vendors. Because they make different architectural choices than row stores in the areas of query processing, compression, and storage formats, they have a different performance envelope than row stores. In typical data warehouse workloads, column stores (which were designed specifically for this market) are vastly superior to row stores. See [1, 2, 3] for some detailed remarks in this area. Or just have your favorite Web browser search for “column stores versus row stores” to access the abundance of literature on this topic.
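The storage-format difference can be sketched in a few lines (an illustrative toy, not how any particular product is implemented; the sales table, column names, and helper functions are all hypothetical). A row layout keeps each record together, so an aggregate over one attribute still walks every full record. A column layout keeps each attribute together, so the same aggregate reads one list, and a sorted column compresses well with run-length encoding:

```python
from typing import Dict, List, Tuple

# Hypothetical fact table in row layout: (order_id, region, quantity, price).
ROWS: List[Tuple[int, str, int, float]] = [
    (1, "east", 2, 10.0),
    (2, "west", 1, 25.0),
    (3, "east", 5, 4.0),
    (4, "east", 1, 8.0),
]
COLUMN_NAMES = ("order_id", "region", "quantity", "price")


def to_columns(rows: List[tuple]) -> Dict[str, list]:
    """Pivot the row layout into one list per attribute (column layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMN_NAMES)}


def run_length_encode(values: list) -> List[Tuple[object, int]]:
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs: List[Tuple[object, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def total_quantity_rows(rows: List[tuple]) -> int:
    """Row layout: every whole record is visited to read one field."""
    return sum(row[2] for row in rows)


def total_quantity_columns(columns: Dict[str, list]) -> int:
    """Column layout: only the 'quantity' list is read."""
    return sum(columns["quantity"])


COLUMNS = to_columns(ROWS)
REGION_RUNS = run_length_encode(sorted(COLUMNS["region"]))
```

Both functions return the same total. In a real system the column layout avoids reading the unused attributes from disk, which is where the warehouse-workload advantage comes from.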

Third, I am always reminded of the Airline Control Program (ACP), renamed TPF by IBM. Written in IBM assembler a long time ago, it used very small disk blocks, an architectural decision made more than 30 years ago to optimize processing on a then-current (but now obviously long gone) IBM disk drive. Only fairly recently was this architectural decision changed. Hence, the problem with legacy code is that some things are just hard to change and linger in elderly code lines.

Two additional examples come to mind. A major database vendor wanted to change its replication system from active-passive to active-active. However, it didn’t do so because it was just too much work. Another DBMS vendor has a shared-disk architecture because implementing a shared-nothing architecture was simply too hard.

Besides technical problems, there are also political and business issues to cope with. Any technologist would be well advised to read Clayton Christensen’s book on this topic [4].

--Michael Stonebraker, Sept. 4, 2009

[1] Mike Stonebraker et al., "C-Store: A Column-oriented DBMS," Proc. 2005 VLDB Conference, Trondheim, Norway, Sept. 2005.

[2] en.wikipedia.org/wiki/Column-oriented_DBMS.

[3] Dan Abadi et al., "Column-Stores vs. Row-Stores: How Different Are They Really," Proc. 2008 SIGMOD Conference, Vancouver, Canada, June 2008.

[4] Clayton M. Christensen, "The Innovator’s Dilemma," Collins Business Essentials, 1997.


Copyright © 2009 by the ACM. All rights reserved.