Communications of the ACM

ACM TechNews

Finding Patterns in Corrupted Data

A team of researchers has created a new set of algorithms that can efficiently fit probability distributions to high-dimensional data.

Credit: MIT News

A new set of algorithms developed by a multi-university team of researchers can efficiently fit probability distributions to high-dimensional data.

"From the vantage point of theoretical computer science, it's much more apparent how rare it is for a problem to be efficiently solvable," says Massachusetts Institute of Technology professor Ankur Moitra. "If you start off with some hypothetical thing--'Man, I wish I could do this. If I could, it would be robust'--you're going to have a bad time, because it will be inefficient. You should start off with the things that you know that you can efficiently do, and figure out how to piece them together to get robustness."

Moitra and his collaborators developed an algorithm whose running time grows with the number of data dimensions at a more reasonable rate than that of algorithms that take two-dimensional cross sections of the graph of the data to see whether they resemble Gaussian distributions.

Their algorithm hinges on the choice of metric for quantifying how far a dataset is from a range of distributions with approximately the same shape. Another key to its performance is identifying the regions of the data in which to start taking cross sections.
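
To make the underlying problem concrete, the following is a minimal sketch of how a small fraction of corrupted points can distort a simple fit to high-dimensional Gaussian data. It is not the researchers' algorithm, which this summary does not describe in detail. The sample size, dimension, corruption rate, outlier placement, and all function names are assumptions chosen for illustration, and the coordinate-wise median stands in only as a familiar robust baseline.

```python
import random
import statistics


def make_corrupted_sample(n=2000, dim=20, eps=0.1, seed=0):
    """Draw n points from a standard Gaussian in `dim` dimensions, then
    overwrite a fraction `eps` of them with outliers (illustrative values)."""
    rng = random.Random(seed)
    points = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    n_bad = int(eps * n)
    for i in range(n_bad):
        # Assumed corruption model: every outlier sits at the same far-away point.
        points[i] = [10.0] * dim
    return points


def coordinate_mean(points):
    """Ordinary sample mean, computed one coordinate at a time."""
    return [statistics.fmean(column) for column in zip(*points)]


def coordinate_median(points):
    """Coordinate-wise median, a simple robust baseline."""
    return [statistics.median(column) for column in zip(*points)]


def distance_from_origin(vector):
    """Euclidean norm; the true mean of the clean data is the origin."""
    return sum(x * x for x in vector) ** 0.5


sample = make_corrupted_sample()
mean_error = distance_from_origin(coordinate_mean(sample))
median_error = distance_from_origin(coordinate_median(sample))
print(f"error of sample mean:            {mean_error:.2f}")
print(f"error of coordinate-wise median: {median_error:.2f}")
```

In this sketch the median estimate lands much closer to the true center than the mean does, yet both errors still grow as `dim` increases, because every coordinate contributes a similar offset.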

From MIT News


Abstracts Copyright © 2016 Information Inc., Bethesda, Maryland, USA

