Linkguard To Cure Broken Links?
sean dreilinger writes: "Here's a BBC writeup of the company Linkguard, which hopes to monitor hyperlink stability via their 40-terabyte database and notify web authors when links are broken." This is a different effort than this one. Still, 40 terabytes?
Yeah, these methods are kind of a pain in the ass, but they're only worse if this new plan can do no better. But look at it -- they want every single link out there to be rewritten to their spec. Who does that help? Millions of web authors out there trying to rewrite documents to filter into this (really small) database can't possibly be easier than having the much smaller set of web admins adding server redirects whenever they notice more than a handful of 404 errors on the same document.
DO NOT LEAVE IT IS NOT REAL
Or better, Linkguard will work with Netscape and Microsoft to have the browsers automatically redirect you to companies who have paid money to have 404's intercepted and -- instead of redirecting you to the original site as the designer intended, will steal you away to some big corporate website.
"Jeeze, every time I run into a 404, I wind up at eBay.com!"
---
icq:2057699
seumas.com
That will only check outbound links. That's not the problem what's being solved. The problem is checking for inbound links. That is, links on other peoples websites to your website. That isn't easily solved with a short script.
-- Abigail
It's like name resolving using only a single DNS server.
It shouldn't be too difficult to write a short script that would check all the links on one's web site periodically. Using a 40 terabyte database to effect this is insanely ineffective.
With a simple cronjob and Perl's wonderfulLWP module package, not to mention the other implemtations of tracking web-pages, any relativly smart administrator should already be doing this. It comes down to this, programmers are lazy and that is good, but is this just too lazy? phooey. Maybe this should be done as an apache module .. hrmm... maybe i should write that one.. mod_url_validator />
<Location
Add-handler Check-Links
</Location>
or something like that... no i dont like it. too much overhead. well at least my first offer works, because i use it.
Seriously, don't they already have the database?
There's two problems that this thing could never catch with dynamically generated pages - one, is the famous "missing include" problem which usually looks something this in the middle of the page: [Unable to process directive] - not a 404, as the page renders, but definately a Bad Thing. The second problem is that of 404's which appear and disappear at random - like doing a sitewide search & replace across a hundred include files - that has a tendancy to lock files, producing share violations, which in turn result in 404's. So the database can't have 100% integrity.
The other problem is the rapid amount of turnover on the web - millions of pages are appearing and disappearing every day. Those quantum people thought virtual particles were odd - try tracking down the same piece of information you found in a search engine 2 weeks ago!
The second problem is operator/server/network errors. I've seen misconfigured proxies that mangle the URL and produce 404's when the page is there.. I've seen people make typos in the URL field of their browser (and then report it to me!), hell.. I've seen the 'net itself eat a few pages. All of this increases entropy in the database.
Finally.. I like seeing the ocasional cool 404 error. Take this one, from my server:
If you want to make sure that you don't break any links when you move your website, all you have to do is consult your HTTP logs, pull out all the lines starting "Referer:", and remove the duplicates.
Tarsnap: Online backups for the truly paranoid
I like that bit about cataloging pages with a five-word "lexical" signature based on words that appear mainly only on that page. How are they going to deal with the 5,000,000,000 web pages that contain only the word "porn"?
- There's a free service that already does this. They do it even if you don't ask them to, then send you spam telling you about broken links on your site.
- And there's Alexa, which really does archive the Web so that you can find old pages.
- Personally, I like the link checker in Dreamweaver. It's very well integrated with the site maintenance tools.
Probably the biggest source of bad links is unmaintained "favorite links" lists. That's something that needs a simple tool. If the major free-web-page sites provided something, that would probably cut the number of dead links substantially.The Network Working Group is working on a replacement for URLs -- Uniform Resource Names. URNs are intended to serve as persistant, location-independent, resource identifiers and are designed to make it easy to map other namespaces (which share the properties of URNs) into URN-space.