A quarter of the deep links in The New York Times’ articles are now rotten and lead to completely inaccessible pages, according to a team of Harvard Law School researchers who worked with the Times’ digital team.
They found that the problem affects more than half of the articles containing links in the NYT’s catalog, which dates back to 1996. The finding illustrates the problem of link rot and how difficult it is for context to survive on the web.
The study examined more than 550,000 articles, which contained more than 2.2 million links to external websites. It found that 72% of these links were “deep,” meaning they pointed to a specific page rather than a general website.
Predictably, the older a link was, the more likely it was to be dead: 6% of links in articles from 2018 were inaccessible, compared with 72% of links in articles from 1998. For a recent, widespread example of link rot, look at what happened when Twitter banned Donald Trump: every article that had embedded his tweets was left littered with gray boxes.
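For readers who want to run this kind of check on their own archive, here is a minimal sketch in Python. It is not the researchers' method, which the article does not describe in detail. It assumes a link counts as rotten when the request fails outright or returns an HTTP status of 400 or higher. The names `fetch_status`, `is_rotten`, and `rot_rate` are hypothetical. A page that still loads but whose content has changed would not be caught by this check.

```python
from http.client import HTTPException
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for url, or None if no response arrives."""
    request = Request(url, method="HEAD", headers={"User-Agent": "link-rot-sketch"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        # The server answered, but with an error status such as 404 or 410.
        return err.code
    except (OSError, ValueError, HTTPException):
        # DNS failure, refused connection, timeout, or a malformed URL.
        return None


def is_rotten(status):
    """Assumption: no response, or any 4xx/5xx status, means the link is dead."""
    return status is None or status >= 400


def rot_rate(urls, fetch=fetch_status):
    """Return the fraction of urls that are rotten (0.0 for an empty list)."""
    urls = list(urls)
    if not urls:
        return 0.0
    dead = sum(1 for url in urls if is_rotten(fetch(url)))
    return dead / len(urls)
```

The `fetch` parameter lets the network call be swapped out, so the counting logic can be tried on recorded status codes without touching the live web. Some servers reject `HEAD` requests, so a real checker would probably fall back to `GET` before declaring a link dead.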
The team chose The New York Times in part because the newspaper is well known for its archiving practices, but that does not mean the Times is unusual in its link rot problem.
Instead, the team uses the paper of record as an example of a phenomenon that occurs across the entire internet.
Over time, websites that once provided valuable insight, important background, or evidence for controversial claims through links will be bought and sold, or will simply cease to exist, leaving those links pointing to blank pages, or worse.
BuzzFeed News reported in 2019 on an underground industry in which customers pay marketers to find dead links in large media outlets such as The Times or the BBC and then buy the domain names for themselves.
They can then do whatever they want with the link, such as using it to promote a product or to host a message mocking the subject of the article.
Link rot does not only affect journalism. Imagine if Rick Astley’s “Never Gonna Give You Up” video were deleted and uploaded again: countless Reddit threads and tweet replies would no longer make sense to future readers. Or imagine you tried to show off your NFT and found that the source link now pointed nowhere. What a nightmare.
Some work is being done to keep links alive. For example, Wikipedia asks contributors who write citations to provide links to archived copies of pages on sites such as the Wayback Machine if they think the article may change.
There’s also the Perma.cc project, which tries to solve the problem of link rot in legal citations and academic journals by providing an archived version of a page alongside a link to the original source.
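As a rough illustration of how an archived copy might be looked up, the sketch below builds a query for the Internet Archive's Wayback Machine availability endpoint and reads the closest snapshot out of the JSON reply. The function names are hypothetical. The response shape (`archived_snapshots` → `closest` → `available`/`url`) is an assumption based on the public API as commonly documented, so check it against the current documentation before relying on it. Perma.cc has its own interface and is not covered here.

```python
import json
from urllib.parse import urlencode

# Public availability endpoint of the Wayback Machine.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url, timestamp=None):
    """Build the lookup URL; timestamp is an optional YYYYMMDD[hhmmss] string."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return the archived URL from a JSON reply, or None if nothing is available.

    Assumes the reply looks like:
    {"archived_snapshots": {"closest": {"available": true, "url": "..."}}}
    """
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None
```

In practice the query URL would be fetched with something like `urllib.request.urlopen(availability_query(dead_link)).read()` and the body passed to `closest_snapshot`. The network call is kept out of the sketch so the parsing can be tried on a saved reply.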
The spread of similar projects is unlikely to solve the problem for the entire internet (including social networks), or even for journalism. Until a solution is found, articles will continue to lose more and more context as time passes.
To give a perfect example: our article on link rot, published in 2012, had a source link from the Chesapeake Digital Preservation Group, which now points to a 404 page.