Communications of the ACM

News

Beyond Hadoop


Hadoop ecosystem components as visualized by Datameer.

Credit: Datameer, www.datameer.com/blog

The leading open source system for processing big data continues to evolve, but new approaches with added features are on the rise.


Comments


Vijay Agneeswaran

Hi,

I think that Carlos Guestrin is from CMU (as the GraphLab page at http://graphlab.org/contact/ also says), whereas you have said he is from the University of Washington.

Could you clarify this please?

Best Regards,
Vijay


Anonymous

Although he is still affiliated with CMU, Guestrin recently moved to the University of Washington:

http://www.cs.washington.edu/people/faculty/guestrin/

--Gregory Mone
Boston, MA


Flavio Villanustre

Gregory, very good article!

I would like to add that there is another free and open source distributed data-intensive computing platform, which is not based on the MapReduce paradigm: the LexisNexis HPCC Systems platform (http://hpccsystems.com).

The original design for the HPCC Systems platform predates the Google researchers' paper on MapReduce by at least five years. The platform's processing model is dataflow-oriented, and it provides a very high-level, declarative, open programming language called ECL, which offers modern programming language features, including code/data encapsulation, lazy evaluation, compilation to native code, and purity. This platform underpins all the data services and analytic products from LexisNexis Risk Solutions, and several other information products from Reed Elsevier, its parent company, in areas that cover machine learning, massive data warehousing, social graph analytics, recommendation systems, etc. It has also been in use by several large and medium-sized organizations for years (even before it was released under an open source license, back in 2011).

On the same topic, a few weeks ago I wrote a short blog post comparing the paradigms behind the two main open source data-intensive platforms, Hadoop and HPCC, which you can read here: http://hpccsystems.com/blog/hpcc-systems-hadoop-%E2%80%93-contrast-paradigms. Some of the concepts that I discuss there are relevant to this article.

Best regards,

Flavio Villanustre

