Communications of the ACM

Contributed articles

MapReduce and Parallel DBMSs: Friends or Foes?



Illustration by Marius Watz

Parallel DBMSs excel at efficient querying of large data sets; MapReduce-style systems excel at complex analytics and ETL tasks. Neither is good at what the other does well. Hence, the two technologies are complementary.



Comments


Michael Berry

The finding that Vertica is faster than Hadoop or DBMS-X would be more credible if the article's author were not CTO and co-founder of Vertica, a fact nowhere mentioned in the article or author listing.

See http://www.vertica.com/leadership


The account that made this comment no longer exists.

It is interesting to see both articles appear together. In any case, Google's MapReduce is not the same as Hadoop, which makes the comparison puzzling. I also think MapReduce was invented not for research but to solve Google's own problems, so it is not elegant in an academic sense.


Nathan Fiedler

This is an improvement over Stonebraker's other writings related to MapReduce and NoSQL, but it is still a very slanted view. The authors pit Hadoop, a specific (and imperfect) implementation of MapReduce, against an idealized conception of parallel DBMSs. Even their tests are slanted to show Vertica in a good light (an important fact to consider is Stonebraker's vested interest in Vertica coming out ahead). The article by Dean and Ghemawat nicely illustrates the fallacies in the comparison paper and shows just where Stonebraker went wrong (again).


CACM Administrator

The following letter was published in the Letters to the Editor in the April 2010 CACM (http://cacm.acm.org/magazines/2010/4/81506).
--CACM Administrator

I applaud the debate on MapReduce between "MapReduce and Parallel DBMSs: Friends or Foes?" by Michael Stonebraker et al. and "MapReduce: A Flexible Data Processing Tool" by Jeffrey Dean and Sanjay Ghemawat (Jan. 2010). But I strongly object to the former's criticism of the MapReduce designers, saying "Engineers should stand on the shoulders of those who went before, rather than on their toes." Creating an alternate method is not stepping on anyone's toes. Such accusations, besides being unjust, impede science.

Jonathan Grier
Lakewood, NJ

----------------------------------------

AUTHORS' RESPONSE

As we noted in the article, the Map phase of a MapReduce computation is essentially a filter and a group-by operation in SQL, while the Reduce phase is largely a target-list computation in SQL. When user-defined functions are included in SQL (as they are in many commercial implementations), the functionality provided by parallel SQL DBMSs and MapReduce implementations appears to be the same.
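To make this analogy concrete, here is a minimal single-process sketch in Python. It is not taken from the article: the sample `sales` records, the `THRESHOLD` predicate, and all function names are hypothetical illustrations. The same filter-and-aggregate computation is run once as map, shuffle, and reduce steps and once as a SQL query in an in-memory SQLite database, so the two results can be compared.

```python
import sqlite3
from collections import defaultdict

# Hypothetical input: (region, amount) sales records.
RECORDS = [
    ("east", 120), ("west", 40), ("east", 15),
    ("north", 300), ("west", 75), ("east", 5),
]
THRESHOLD = 10  # hypothetical filter predicate: keep rows with amount > THRESHOLD


def map_phase(record):
    """Map: drop records failing the predicate, emit (key, value) for the rest."""
    region, amount = record
    if amount > THRESHOLD:      # plays the role of the SQL WHERE clause
        yield region, amount    # the emitted key determines the grouping


def shuffle(pairs):
    """Group intermediate pairs by key (done by the framework; SQL GROUP BY)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: compute one output row per group (the SQL target list, here SUM)."""
    return key, sum(values)


def run_mapreduce(records):
    intermediate = (pair for rec in records for pair in map_phase(rec))
    grouped = shuffle(intermediate)
    # Sort so the output order matches the ORDER BY in the SQL version.
    return sorted(reduce_phase(k, vs) for k, vs in grouped.items())


def run_sql(records):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales "
        "WHERE amount > ? GROUP BY region ORDER BY region",
        (THRESHOLD,),
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_mapreduce(RECORDS))
    print(run_sql(RECORDS))
```

For the sample records both functions return `[('east', 135), ('north', 300), ('west', 115)]`. A real MapReduce system runs many map and reduce tasks in parallel and performs the shuffle across the network; this sketch only illustrates how the phases line up with the WHERE, GROUP BY, and target-list parts of a SQL query.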

The parallel DBMS literature, dating from the 1980s, includes hundreds of articles on implementation tactics. Our comment about "standing on the shoulders..." was meant to suggest that any new implementation effort should carefully review the prior literature to learn what past results are available, then add to the store of total knowledge.

The MapReduce team seemed not to have done this exercise. Hence the comment.

Michael Stonebraker, Daniel Abadi, David J. DeWitt, Sam Madden, Erik Paulson, Andrew Pavlo, Alexander Rasin
Cambridge, MA

