Communications of the ACM

ACM TechNews

Tool Detects Patterns Hidden in Vast Data Sets


Data skyscraper

If researchers printed on paper each potential relationship in a recent data set containing abundance levels of bacteria in the human gut, the stack of paper would reach to a height of 1.4 miles, six times the height of the Empire State Building.

Credit: Sigrid Knemeyer, Broad Communications

Researchers at the Broad Institute and Harvard University have developed a tool that can detect patterns hidden in large data sets.

The tool is part of a suite of statistical tools known as Maximal Information-based Nonparametric Exploration (MINE), which can find multiple patterns hidden in massive data sets. "This toolkit gives us a way of mining the data to look for relationships," says the Broad Institute's Pardis Sabeti.

In one test on a data set of microorganisms, the researchers used MINE to make more than 22 million comparisons and then focused on a few hundred patterns of interest that had not been observed before.

"We view this as an exploration tool--it can find patterns and rank them in an equitable way," says Harvard professor Michael Mitzenmacher.

One of the tool's strengths is that it can detect a wide range of patterns and organize them according to several different variables. "What’s exciting about our method is that it looks for any type of clear structure within the data, attempting to find all of them," says Harvard graduate student David Reshef. "This ability to search for patterns in an equitable way offers tremendous exploratory potential in terms of searching for patterns without having to know ahead of time what to search for."
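To make the idea of scoring and ranking relationships without assuming their shape more concrete, here is a minimal sketch. It is not MINE's actual algorithm, which the article does not describe. It swaps in a simple mutual-information estimate over a fixed equal-width grid, and every function name and the `bins=8` default are assumptions chosen for illustration. The sketch compares that score with the Pearson correlation, which only captures linear trends, and then ranks every pair of variables by the grid-based score.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient; sensitive to linear trends only."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def bin_index(value, lo, hi, bins):
    """Map a value to one of `bins` equal-width bins spanning [lo, hi]."""
    if hi == lo:
        return 0  # a constant column falls into a single bin
    return min(int((value - lo) / (hi - lo) * bins), bins - 1)


def binned_mutual_information(xs, ys, bins=8):
    """Mutual information in bits, estimated on a fixed bins-by-bins grid.

    Illustrative stand-in only: a fixed grid is an assumption of this sketch.
    """
    n = len(xs)
    xlo, xhi, ylo, yhi = min(xs), max(xs), min(ys), max(ys)
    cells = [
        (bin_index(x, xlo, xhi, bins), bin_index(y, ylo, yhi, bins))
        for x, y in zip(xs, ys)
    ]
    joint = Counter(cells)
    px = Counter(i for i, _ in cells)
    py = Counter(j for _, j in cells)
    mi = 0.0
    for (i, j), count in joint.items():
        pxy = count / n
        mi += pxy * math.log2(pxy / ((px[i] / n) * (py[j] / n)))
    return mi


def rank_pairs(columns, bins=8):
    """Score every pair of named columns and return them, strongest first."""
    names = sorted(columns)
    scored = []
    for pos, a in enumerate(names):
        for b in names[pos + 1:]:
            score = binned_mutual_information(columns[a], columns[b], bins)
            scored.append((score, a, b))
    return sorted(scored, reverse=True)


# A parabola: a clear structure with essentially zero linear correlation.
xs = [i / 100 for i in range(-100, 101)]
ys = [x * x for x in xs]
print("pearson:", pearson(xs, ys))
print("binned MI (bits):", binned_mutual_information(xs, ys))
```

On the parabola, the Pearson coefficient is close to zero while the grid-based score is clearly positive, which is the kind of non-linear structure a shape-agnostic measure is meant to surface. Scoring all pairs, as `rank_pairs` does, grows quadratically with the number of variables, which is one way a data set can lead to millions of comparisons.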

From Broad Institute

Abstracts Copyright © 2011 Information Inc., Bethesda, Maryland, USA

 

 
