Internet Archive Says It Has Restored 9 Million Broken Wikipedia Links By Directing Them To Archived Versions in Wayback Machine (archive.org)
Mark Graham, the Director of Wayback Machine at Internet Archive, announces: As part of the Internet Archive's aim to build a better Web, we have been working to make the Web more reliable -- and are pleased to announce that 9 million formerly broken links on Wikipedia now work because they go to archived versions in the Wayback Machine.
For more than 5 years, the Internet Archive has been archiving nearly every URL referenced in close to 300 Wikipedia sites as soon as those links are added or changed, at the rate of about 20 million URLs/week. And for the past 3 years, we have been running a software robot called IABot on 22 Wikipedia language editions looking for broken links (URLs that return a '404', or 'Page Not Found'). When broken links are discovered, IABot searches for archives in the Wayback Machine and other web archives to replace them with. Restoring links ensures Wikipedia remains accurate and verifiable and thus meets one of Wikipedia's three core content policies: 'Verifiability.'
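For readers curious what the check-then-replace step described in the announcement might look like, here is a minimal Python sketch. It is not IABot's actual code: the function names (is_broken, availability_query, closest_snapshot, find_replacement) are hypothetical, and it only consults the Wayback Machine's public availability endpoint, whereas the announcement says IABot also searches other web archives.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Public Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def is_broken(url, timeout=10):
    """Return True if the URL answers with HTTP 404, the case the post describes."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return False
    except urllib.error.HTTPError as err:
        return err.code == 404
    except urllib.error.URLError:
        # Assumption: DNS or connection failures are not counted as 404 here.
        return False


def availability_query(url, timestamp=None):
    """Build the availability query; timestamp is an optional YYYYMMDD string."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode(params)


def closest_snapshot(payload):
    """Extract the archived URL from a decoded availability response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_replacement(url, timestamp=None, timeout=10):
    """Return an archived URL for a broken link, or None if none is needed or found."""
    if not is_broken(url, timeout):
        return None
    query = availability_query(url, timestamp)
    with urllib.request.urlopen(query, timeout=timeout) as response:
        return closest_snapshot(json.load(response))
```

Calling find_replacement("http://example.com/gone", "20180101") would return the snapshot closest to that date if the page now returns 404 and a capture exists. The network calls are kept apart from the URL-building and response-parsing helpers so the latter can be exercised offline.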
Exactly what I was thinking. A site posts something that creates a situation, they take the page down and engage in PR spin, Wikipedia links to the archived copy of the page to demonstrate what content had been there, and then the site modifies their robots.txt, retroactively clearing the content from the IA.
I understand IA's policy of abiding by robots.txt, but when someone needs to be held accountable for what they said, having a single source that can serve as a living embodiment of "the Internet never forgets" would be quite nice.
Except the Internet Archive is a recognized library, which means they actually have powers to ignore DMCA takedowns. In fact, as a library they get a lot of exceptions to the DMCA. It's why they host a lot of copyrighted material for free.
It's one of the few positives of the DMCA.