Communications of the ACM

ACM TechNews

British Library Sets Out to Archive the Web


A screen in the British Library showcases the latest task in its archiving of every British website and e-book.

Credit: Lefteris Pitarakis/Associated Press

The British Library is archiving every British website and e-book in a monumental undertaking aimed at maintaining the country's digital memory for future researchers, regardless of technological change.

As computers and mobile phones have grown in popularity, material valuable to future historians is being lost, such as firsthand accounts of the 2005 London transit bombings.

"The average life of a Web page is only 75 days, because websites change, the contents get taken down," says the British Library's Lucie Burgess. "If we don't capture this material, a critical piece of the jigsaw puzzle of our understanding of the 21st century will be lost."

Although the British Library has been archiving pieces of the Web for years and has collected about 10,000 sites, in the past it had to obtain permission from website owners to do so. A 2003 law changed the permission requirement, but legislative and technological issues have slowed the archiving project.

The effort relies on an automated Web harvester that will scan and record 4.8 million sites with 1 billion Web pages. To protect the archive for future generations, there will be multiple self-replicating copies on servers across the United Kingdom, and files will be converted into updated formats as technology evolves.


From Associated Press

 

Abstracts Copyright © 2013 Information Inc., Bethesda, Maryland, USA


 
