MapReduce: Simplified Data Processing on Large Clusters

Posted Jan 1 2008

MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function, and the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks. Programmers find the system easy to use: more than ten thousand distinct MapReduce programs have been implemented internally at Google over the past four years, and an average of one hundred thousand MapReduce jobs are executed on Google's clusters every day, processing a total of more than twenty petabytes of data per day.
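To make the division of labor concrete, here is a minimal single-process sketch of the model, assuming word counting as the illustrative task. The names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical and are not the API of Google's implementation. The toy runtime only groups intermediate values by key; it does none of the parallelization, failure handling, or scheduling that the abstract attributes to the real system.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_fn(doc_name: str, text: str) -> Iterator[Tuple[str, int]]:
    """User-supplied map: emit an intermediate (word, 1) pair per word."""
    for word in text.split():
        yield word.lower(), 1


def reduce_fn(word: str, counts: Iterable[int]) -> int:
    """User-supplied reduce: merge all values emitted for one key."""
    return sum(counts)


def run_mapreduce(
    inputs: Dict[str, str],
    mapper: Callable[[str, str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], int],
) -> Dict[str, int]:
    """Toy sequential runtime (an assumption for illustration only).

    It runs the mapper over every input record, groups the intermediate
    values by key, then calls the reducer once per key.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in inputs.items():
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical input: document name -> document contents.
docs = {
    "a.txt": "the quick brown fox",
    "b.txt": "the lazy dog the end",
}
counts = run_mapreduce(docs, map_fn, reduce_fn)
print(counts["the"])  # 3
```

The user writes only `map_fn` and `reduce_fn`. Everything inside `run_mapreduce` stands in for work that, in the system described above, the runtime carries out across a cluster.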

DOI: 10.1145/1327452.1327492

January 2008 Issue

Published: January 1, 2008

Vol. 51 No. 1

Pages: 107-113
