IBM researchers have developed BigSheets, a Hadoop-based data analysis tool designed to help users analyze large Web data sets. BigSheets uses Hadoop to comb through Web pages, looking for key terms and other data. It then organizes the information in a very large spreadsheet, which users can analyze with standard spreadsheet software.
BigSheets also works with an IBM visualization tool called Many Eyes, as well as other visualization software.
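The term-scanning step described above can be pictured as a small map-and-reduce pass followed by a spreadsheet export. The following is only a minimal sketch under stated assumptions, not IBM's implementation: it runs in plain Python rather than on Hadoop, and the key terms, function names, and column layout are all hypothetical, since the source gives none of them.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical key terms to look for; the source does not list any.
KEY_TERMS = {"hadoop", "archive", "library"}


def map_page(url, text):
    """Map step: emit ((url, term), 1) for each key-term occurrence on a page."""
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        if word in KEY_TERMS:
            yield (url, word), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per (url, term) key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


def to_spreadsheet(totals):
    """Write the totals as CSV text, one row per (url, term), sorted for stable output."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["url", "term", "count"])
    for (url, term), count in sorted(totals.items()):
        writer.writerow([url, term, count])
    return buffer.getvalue()


def analyze(pages):
    """Run map, then reduce, then CSV export over a {url: text} mapping."""
    pairs = (pair for url, text in pages.items() for pair in map_page(url, text))
    return to_spreadsheet(reduce_counts(pairs))
```

Calling `analyze` on a dictionary that maps page URLs to page text returns CSV text that ordinary spreadsheet software can open. On a real cluster the map and reduce steps would be distributed across machines rather than run in a single process.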
IBM first tested BigSheets at the British Library, which has been working to create an archive of about eight million U.K. Web sites. Running on a Hadoop cluster of four machines, BigSheets processed 4.5 terabytes of archived files in less than eight hours.
University of Michigan professor Eytan Adar says BigSheets is useful because it compares data across many different pages as well as over time. He says effective visualizations are "crucial for letting users quickly understand large collections of data."
Abstracts Copyright © 2010 Information Inc., Bethesda, Maryland, USA