Users at Oak Ridge National Laboratory's Oak Ridge Leadership Computing Facility (OLCF) can use R, the most widely used data analytics software in academia, to manage and analyze the enormous datasets generated by supercomputers.
Although R is typically used to analyze small datasets on ordinary workstations, it has been scaled to let researchers expedite analysis by at least an order of magnitude.
The Programming with Big Data in R (pbdR) project was funded by the U.S. National Science Foundation for use on OLCF's systems. The team wrote code to conduct deep data analysis from the R language, developed high-level infrastructure that makes it easier to implement statistical computations on supercomputers, and optimized the library and data-input choices across the thousands of cores in OLCF's Rhea, Eos, and Titan systems.
"The main idea is to use some of the same scalable libraries that are already used by simulation science and supercomputers," says OLCF's George Ostrouchov. "We not only make them easily accessible from R, but we also built infrastructure inside and outside R that makes it easier to implement statistical matrix methods in a highly scalable way."
Using scaled R, the group took a complex analytical problem that typically takes several hours on Apache Spark and analyzed it in less than a minute.
From Oak Ridge National Laboratory
Abstracts Copyright © 2016 Information Inc., Bethesda, Maryland, USA