
Communications of the ACM

ACM TechNews

How HPC Is Hacking Hadoop

The Hadoop logo.

The Hadoop open source framework is being used increasingly in high-performance computing environments.

Credit: Apache Software Foundation

The Hadoop open source framework is increasingly being used in high-performance computing (HPC) environments, particularly for data-intensive scientific computing applications. Almost all major HPC system vendors and many software vendors offer significant Hadoop enhancements, customized distributions, and sometimes new product lines. HPC systems can be modified to work with Hadoop to provide more streamlined data management and processing on certain problems.

Glenn K. Lockwood of the San Diego Supercomputer Center (SDSC) is renowned for his work on Hadoop for large-scale systems, particularly Gordon, SDSC's flash-based data-intensive computing system. Lockwood is experimenting with Hadoop clusters on Gordon and writing Hadoop applications in Python with Hadoop Streaming. "Although traditional supercomputers and Hadoop clusters are designed to solve very different problems and are consequentially architected differently, domain scientists are becoming increasingly interested in learning how Hadoop works and how it may be useful in addressing the data-intensive problems they face," Lockwood says.
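
For readers unfamiliar with Hadoop Streaming, the sketch below shows roughly what a Python application of this kind can look like. It is an illustrative word count, not Lockwood's code. Hadoop Streaming runs any executable as a mapper or reducer, feeding it records on standard input and collecting tab-separated key/value lines from standard output. The function names `map_lines` and `reduce_lines`, and the `map`/`reduce` command-line argument, are assumptions made for this example.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts for each word.

    Hadoop Streaming sorts mapper output by key before the reduce step,
    so all records for the same word arrive consecutively.
    """
    records = (
        line.rstrip("\n").split("\t", 1)
        for line in sorted_lines
        if line.strip()  # ignore blank lines
    )
    for word, group in groupby(records, key=lambda rec: rec[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Hypothetical convention for this sketch: the script runs as a mapper or
# a reducer depending on its first argument, and does nothing otherwise.
if __name__ == "__main__" and sys.argv[1:2] in (["map"], ["reduce"]):
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

Saved under a hypothetical name such as `wordcount.py`, the same script can be checked without a cluster by imitating the sort that Hadoop performs between the two steps: `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`.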

Users can launch a Hadoop cluster on Gordon by submitting a single pre-made job script to the batch system they already know, which eliminates the need to learn a new cloud API or act as a systems administrator, Lockwood says. This has made it significantly easier for domain scientists to experiment with Hadoop's potential role in their analyses.

From HPC Wire


Abstracts Copyright © 2014 Information Inc., Bethesda, Maryland, USA

