The next generation of Apache Hadoop will likely be released this year, says Yahoo!'s Todd Papaioannou. Apache Hadoop enables batch processing of petabytes of data, but it does not effectively manage resources across thousands of servers in a cluster. As a result, developers are working to improve its utilization, scheduling, and management of resources.
Yahoo!, which contributed about 70 percent of the code for the current iteration of Hadoop and the Hadoop Distributed File System (HDFS), is working more closely with the Apache Hadoop community so that open source developers can help with development efforts.
Hadoop includes an implementation of MapReduce, a programming technique that originated at Google, for building parallel programs that perform batch processing. "The next generation of HDFS will be more resilient, available, and reliable," Papaioannou says.
Yahoo! has also launched HCatalog, a table metadata management system for Hadoop that was recently added to the Apache version.
Abstracts Copyright © 2011 Information Inc., Bethesda, Maryland, USA