
Communications of the ACM

ACM TechNews

Internet Archaeologists Reconstruct Lost Web Pages


Indiana Jones, a well-known (fictional) archaeologist.

Internet researchers are using the clues that vanished links leave behind on the web to reconstruct resources once shared on social media platforms.

Credit: SimplyAccessible.com

Earlier work by Old Dominion University researchers Hany SalahEldeen and Michael Nelson suggests that links shared over social media platforms such as Twitter disappear at a rate of 11 percent within a year and 27 percent within two years. SalahEldeen and Nelson have now embarked on an effort to reconstruct deleted posts and resources, in part from the clues they leave behind on the web.

SalahEldeen and Nelson used the Twitter search engine Topsy, which let them enter the address of a missing resource and retrieve the tweets that refer to it, that is, the resource's tweet signature. They then extracted the five most frequent terms in this signature and used them as a search query in Google. The result was a list of potential replacements for the lost resource. To test how closely the replacement candidates matched the original, SalahEldeen and Nelson carried out the same process for resources that had not disappeared and compared the candidates with the originals. They report that the replacements had a 70 percent textual similarity to the original resource about 40 percent of the time.
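
The two steps described above, picking the most frequent terms from a set of tweets and scoring a candidate page against an original, can be illustrated with a minimal Python sketch. It is not the researchers' code: the function names, the small stop-word list, and the tokenization rules are hypothetical, and cosine similarity over term counts is an assumed stand-in, since the summary does not say which textual similarity measure was used. Fetching tweets from Topsy and querying Google are left out.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = frozenset({
    "a", "an", "and", "the", "of", "to", "in", "on", "for",
    "is", "at", "by", "with", "rt",
})


def tokenize(text):
    """Lowercase the text, strip URLs, and return alphabetic tokens
    that are not stop words."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return [t for t in re.findall(r"[a-z]+", text) if t not in STOP_WORDS]


def top_terms(tweets, k=5):
    """Return the k most frequent terms across all tweets.
    Ties are broken alphabetically so the result is deterministic."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tokenize(tweet))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the term-count vectors of two texts.
    This measure is an assumption, not the one confirmed by the source."""
    vec_a = Counter(tokenize(text_a))
    vec_b = Counter(tokenize(text_b))
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Made-up tweets standing in for a resource's tweet signature.
    signature = [
        "Mars rover landing photos",
        "rover landing on Mars tonight",
        "Mars rover photos http://example.com/x",
    ]
    query = " ".join(top_terms(signature))
    print(query)  # the string that would be sent to a search engine
```

In this sketch, the terms returned by `top_terms` would be joined into a search query, and `cosine_similarity` would then be applied to the text of each search result and the text of a known original to judge how good a replacement it is.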

From "Internet Archaeologists Reconstruct Lost Web Pages"

MIT Technology Review (09/18/2013)

