Slashdot Mirror

Fixing Broken Links With the Internet Archive

Posted by Soulskill on Friday January 24, 2014 @08:55AM from the maintain-URIs-or-T.B-L.-will-beat-you-up dept.

eggboard writes "The Internet Archive has copies of Web pages corresponding to 378 billion URLs. It's working on several efforts, some of them quite recent, to help deter or assist with link rot, when links go bad. Through an API for developers, WordPress integration, a Chrome plug-in, and a JavaScript lookup, the Archive hopes to help people find at least the most recent copy of a missing or deleted page. More ambitiously, they instantly cache any link added to Wikipedia, and want to become integrated into browsers as a fallback rather than showing a 404 page."

Re:No. 404 is important! by Sarten-X · 2014-01-24 09:10 · Score: 4, Insightful

Supply HTTP code 404, and provide the content of the old page, preferably with a large banner saying "we couldn't find it, but here's what we had before".
I believe that meets all applicable standards. Automated systems should recognize the 404 code, and human systems (which won't likely see the underlying code) will see the banner.
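A minimal Python sketch of that idea, using only the standard library. The `ARCHIVED_PAGES` dictionary, the `build_response` helper, and the banner markup are hypothetical stand-ins for a real archive lookup, and for brevity the sketch treats every requested path as missing:

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler

# Hypothetical banner text shown to human readers above the old content.
BANNER = "<div class=\"banner\">We couldn't find it, but here's what we had before.</div>"

# Hypothetical stand-in for an archive lookup (path -> last known HTML).
ARCHIVED_PAGES = {"/old-page": "<p>Old content</p>"}


def build_response(path):
    """Return (status, body) for a missing page: always 404, with old content if we have it."""
    archived = ARCHIVED_PAGES.get(path)
    if archived is None:
        body = "<h1>Not Found</h1>"
    else:
        body = BANNER + archived
    # The status stays 404 either way, so automated clients still see the error.
    return HTTPStatus.NOT_FOUND, body.encode("utf-8")


class ArchiveFallbackHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = build_response(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, something like `HTTPServer(("127.0.0.1", 8000), ArchiveFallbackHandler).serve_forever()` (with `HTTPServer` imported from `http.server`) should serve the banner plus old content for `/old-page` while still answering 404.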

--
You do not have a moral or legal right to do absolutely anything you want.

Re:No. 404 is important! by Minwee · 2014-01-24 09:43 · Score: 4, Insightful

Sorry but that violates the standard as well. It must return a 404 or you break testing.
RFC 2616 mandates a 4xx error code followed by an optional human-readable reason phrase. While the reason phrase is usually "Not Found" for a 404 error, there's nothing keeping it from being augmented by "...but a copy of a previous version is over there."
If your testing relies on anything beyond the numeric error code, then it's probably already broken.
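As a rough illustration of checking only the numeric code, here is a small Python sketch. The `parse_status_line` and `is_not_found` helpers are hypothetical names, and the parsing assumes the RFC 2616 status line layout of version, status code, and reason phrase separated by single spaces:

```python
def parse_status_line(line):
    """Split an HTTP/1.x status line into (version, numeric code, reason phrase)."""
    # At most three parts: version, code, and whatever reason text follows.
    version, code, *reason = line.rstrip("\r\n").split(" ", 2)
    return version, int(code), (reason[0] if reason else "")


def is_not_found(line):
    """Decide based on the numeric status code alone, ignoring the reason phrase."""
    return parse_status_line(line)[1] == 404


plain = "HTTP/1.1 404 Not Found\r\n"
augmented = "HTTP/1.1 404 Not Found, but a copy of a previous version is over there\r\n"

# Both lines count as a 404, because the reason phrase never enters the check.
print(is_not_found(plain), is_not_found(augmented))
```

A test written this way keeps passing whatever the server puts in the reason phrase.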