Communications of the ACM

ACM TechNews

Managing the Data Deluge


The Texas Advanced Computing Center (TACC) at the University of Texas at Austin recently unveiled Corral, a central repository for data collections designed to meet the processing requirements of data-driven science. Corral features 16 server nodes and 1.2 petabytes of storage, which is four times larger than any other data-collection resource on the TeraGrid.

Larger and more sophisticated computing and visualization systems, such as Ranger, Lonestar, and Stallion, generate immense amounts of data that must be properly stored and managed. Meanwhile, traditional keepers of data collections, such as museums and physical archives, are reinventing themselves for the 21st century.

Corral addresses the need for digital preservation and document and specimen management, and provides archives that allow data to be shared and explored more thoroughly than previously possible. "We're ahead of the curve in terms of providing this kind of dedicated data collection and application resource," says Chris Jordan, who is responsible for data infrastructure at TACC. "A lot of other sites are doing data collections, but very few sites are providing this kind of universally accessible, unified resource."

TACC expects the repository to be completely full within two years and has designed the system so that it can be expanded to be up to 10 times larger. "The advantage of having Corral is that we have the ability to offer services based on new methodologies, and to support them in a very flexible way," Jordan says. "This gives us the opportunity to learn what some of the best practices are and share that information with a wide variety of projects."

From University of Texas at Austin

Abstracts Copyright © 2009 Information Inc., Bethesda, Maryland, USA