Communications of the ACM

ACM TechNews

Making Big Data Manageable


The technique shrinks data sets while preserving their fundamental mathematical relationships.

A new technique devised by researchers at the Massachusetts Institute of Technology can take data sets with huge numbers of variables and find approximations of them with far fewer variables.

Credit: MIT News

Researchers from the Massachusetts Institute of Technology's (MIT) Computer Science and Artificial Intelligence Laboratory and the University of Haifa in Israel presented a new coreset-generation technique for handling big data at the Neural Information Processing Systems (NIPS 2016) conference in Barcelona, Spain.

The technique, which works with sparse data and uses a merge-and-reduce procedure, examines every data point in a huge dataset, but it remains computationally efficient because it deals with only small collections of points at a time.
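As a rough illustration of the merge-and-reduce idea described above, the following Python sketch streams a dataset in chunks and keeps only a handful of small summaries in memory at any time. It is not the researchers' coreset construction: the reduce step here is a plain truncated SVD, swapped in as a stand-in, and the names `reduce_rows` and `merge_and_reduce` and the size parameter `k` are hypothetical choices made for this example.

```python
import numpy as np

def reduce_rows(block, k):
    """Compress a block of rows to at most k rows.

    Assumption: a truncated SVD is used as a stand-in for a real
    coreset construction; it keeps the top-k singular directions.
    """
    if block.shape[0] <= k:
        return block
    _, s, vt = np.linalg.svd(block, full_matrices=False)
    return s[:k, None] * vt[:k]

def merge_and_reduce(chunks, k):
    """Summarize a stream of row chunks, touching only small blocks at a time.

    Summaries are held like digits of a binary counter: levels[i] is either
    None or a summary covering 2**i chunks. Two summaries at the same level
    are merged (stacked) and reduced into one summary at the next level.
    Assumes the stream contains at least one chunk.
    """
    levels = []
    for chunk in chunks:
        summary = reduce_rows(np.asarray(chunk, dtype=float), k)
        i = 0
        while i < len(levels) and levels[i] is not None:
            summary = reduce_rows(np.vstack([levels[i], summary]), k)
            levels[i] = None
            i += 1
        if i == len(levels):
            levels.append(summary)
        else:
            levels[i] = summary
    remaining = [s for s in levels if s is not None]
    return reduce_rows(np.vstack(remaining), k)
```

Every data point is read once, but each call to `reduce_rows` sees at most one chunk or two stacked summaries, so the working set stays small regardless of how long the stream is. With a true coreset construction in place of the SVD stand-in, the same skeleton would apply.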

The researchers say the technique is useful for tools such as singular-value decomposition, principal-component analysis, and nonnegative matrix factorization. They note that, for applications involving an array of common dimension-reduction tools, the method provides a very good approximation of the full dataset.

According to the researchers, the technique could be used to winnow a dataset with millions of variables down to just thousands. The approach is tailored to data analysis tools with applications in natural-language processing, computer vision, signal processing, recommendation systems, weather prediction, finance, and neuroscience.

From MIT News

Abstracts Copyright © 2016 Information Inc., Bethesda, Maryland, USA


 
