
Communications of the ACM

ACM TechNews

Finding Patterns in Corrupted Data



A team of researchers has created a new set of algorithms that can efficiently fit probability distributions to high-dimensional data.

Credit: MIT News

A new set of algorithms developed by a multi-university team of researchers can efficiently fit probability distributions to high-dimensional data.

"From the vantage point of theoretical computer science, it's much more apparent how rare it is for a problem to be efficiently solvable," says Massachusetts Institute of Technology professor Ankur Moitra. "If you start off with some hypothetical thing--'Man, I wish I could do this. If I could, it would be robust'--you're going to have a bad time, because it will be inefficient. You should start off with the things that you know that you can efficiently do, and figure out how to piece them together to get robustness."

Moitra and his collaborators developed an algorithm whose running time rises with the number of data dimensions at a more reasonable rate than that of algorithms that take two-dimensional cross sections of the data graph to see whether they resemble Gaussian distributions.

Their algorithm hinges on the choice of metric for quantifying how far a dataset is from a range of distributions with approximately the same shape. Another key to its performance is identifying the regions of the data in which to start taking cross sections.
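
To give a feel for the problem itself, the following is a minimal sketch of how a small fraction of corrupted points can throw off a naive estimate of a high-dimensional Gaussian's center. It is not the researchers' algorithm: the coordinate-wise median is swapped in as a simple, well-known robust baseline, and the sample size, dimension, corruption fraction, and helper names are all assumptions chosen for illustration.

```python
import random
import statistics

def make_corrupted_sample(n=2000, dim=20, eps=0.1, shift=10.0, seed=0):
    """Draw points from a standard Gaussian centered at the origin, then
    append an eps fraction of corrupted points placed far away.
    All parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    n_bad = int(eps * n)
    clean = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n - n_bad)]
    bad = [[shift] * dim for _ in range(n_bad)]
    return clean + bad

def coordinatewise(estimator, points):
    """Apply a one-dimensional estimator to each coordinate separately."""
    return [estimator(column) for column in zip(*points)]

def l2_norm(vector):
    return sum(x * x for x in vector) ** 0.5

points = make_corrupted_sample()

# The true center is the origin, so the norm of each estimate is its error.
mean_err = l2_norm(coordinatewise(statistics.fmean, points))
median_err = l2_norm(coordinatewise(statistics.median, points))

print(f"sample mean error:            {mean_err:.2f}")
print(f"coordinate-wise median error: {median_err:.2f}")
```

In this setup the sample mean is dragged toward the corrupted points, while the coordinate-wise median stays much closer to the true center. Even so, the median's small per-coordinate bias accumulates across coordinates, so its total error still grows as dimensions are added.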

From MIT News

 

Abstracts Copyright © 2016 Information Inc., Bethesda, Maryland, USA


 
