Communications of the ACM

ACM TechNews

U.S. Library of Congress Saving 500 Million Tweets Per Day in Archives


Twitter bird in jar

Credit: Soul Culture

The U.S. Library of Congress expects to finish the initial stage of building a Twitter archive by the end of January. In April 2010, Twitter agreed to provide an archive of every public tweet since the company went live in 2006. The initial four-year archive contained about 21 billion tweets that take up 20 terabytes when uncompressed, including data fields.

The Library of Congress is storing 500 million tweets a day, and has added a total of about 170 billion tweets to its collection. The focus will now shift to making the collection accessible to lawmakers and researchers. "It is clear that technology to allow for scholarship access to large data sets is lagging behind technology for creating and distributing such data," the library says.

The full archive now requires 133.2 terabytes for two compressed copies, which are stored on tape in separate locations for safekeeping. The library already has received 400 inquiries from researchers studying citizen journalism, vaccination rates, stock market trends, and other topics.

From IDG News Service 

 

Abstracts Copyright © 2013 Information Inc., Bethesda, Maryland, USA


 
