Robust Hyperlinks: The End of 404s?
Tom Phelps writes, "URLs can be made robust so that if a Web page moves to another location anywhere on the Web, you can find it even if that page has been edited. Today's address-based URLs are augmented with a five or so word content-based lexical signature to make a Robust Hyperlink. When the URL's address-based portion breaks, the signature is fed into any Web search engine to find the new site of the page. Using our free, Open Source software (including source code), you can rewrite your Web pages and bookmarks files to make them robust, automatically. Although Web browser support is desirable for complete convenience, Robust Hyperlinks work now, as drop-in replacements of URLs in today's HTML, Web browsers, Web servers and search engines."
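For anyone wondering what a "five or so word content-based lexical signature" might look like in practice, here is a rough sketch. It assumes the signature is simply the terms with the highest TF-IDF score and that it rides along as a `lexical-signature` query parameter. The function names, the `doc_freq` table and the parameter name are assumptions for illustration, not taken from the authors' software.

```python
import math
import re
from collections import Counter
from urllib.parse import urlencode


def lexical_signature(text, doc_freq, n_docs, k=5):
    """Return the k terms of `text` with the highest TF-IDF score.

    doc_freq maps a term to the number of documents containing it
    (a hypothetical table; a real one would come from a search index).
    """
    words = re.findall(r"[a-z]+", text.lower())
    tf = Counter(words)

    def score(term):
        # Terms missing from doc_freq are treated as appearing in one document.
        return tf[term] * math.log(n_docs / doc_freq.get(term, 1))

    # Highest score first; ties broken alphabetically so the result is stable.
    ranked = sorted(tf, key=lambda t: (-score(t), t))
    return ranked[:k]


def robust_url(url, signature):
    """Append the signature to the URL as an ordinary query parameter."""
    sep = "&" if "?" in url else "?"
    return url + sep + urlencode({"lexical-signature": " ".join(signature)})
```

Because the signature is just an extra query parameter, a server that ignores it still serves the page, which would fit the "drop-in replacement" claim.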
I'll explain the 2 that come to mind right away:
1) Growing sites that may change servers or domain names (add-on to dedicated URL, change of domain name for legal/incorporation/buyout reasons) will see the massive traffic bleed they suffer, until everyone realizes their site has moved, virtually disappear. Yes, putting a redirect page on your "old home" may help, but for things like RSS file addresses and other external connectors, which may have an effect on your site, this is a problem.
Ultimately, of course, for this to TRULY work there needs to be technology like this built into not only browsers, but virtually any software that uses HTTP communication (XML parsers, bots, spiders, etc).
2) I want to start offering streaming video on my site, and the single biggest obstacle for doing that is COST. Bandwidth, unless you OWN the pipe, is NOT cheap. I can (albeit in a somewhat underhanded fashion) set up a script to register, say, 24 different "free site" pages with the content to be the "correct" version of my page once an hour, and, unless the content is in VERY heavy demand, essentially have a free method of streaming video on my site.
Egads, I'm already feeling dirty about what I just said. Okay, maybe that's a little TOO unethical. But I guarantee someone will do it.
That said, the concept seems iffy. Based on the above, the fact that it works in all existing browsers suggests to me that the URL takes the following form:
<a href="http://robusturl.server.com?http://my.outdatedsite.com&keyword1="whatever">
Namely, anchors that use this URL will be sent to this server (apparently fixed in place), then redirected either to the working page or to the appropriate search engine results. This means that the robust server will be running scripts. While I don't believe that the intent as described here would be to catalog all matches, all you need is one unscrupulous company that uses this and, with a bit of modification, can now trace where you are and where you are going quite easily. I really don't like this potential, and personally I'll take a 404 any day over potential privacy problems.
On the other hand, it might be that the method uses JavaScript, but at that point this nulls and voids any statement about "working on all existing browsers".
"Pinky, you've left the lens cap of your mind on again." - P&TB
"I can see my house from here!" - ST:
I'm pretty sure URLs were just a makeshift URI and some day the IETF was going to figure out how to do URIs right. Am I wrong?
sigs are a waste of space
--
Some 404's are just a way to pass time. Sometimes I go from site to site looking for pages that don't exist just to see what happens.
...poorly.
/. even uses these once a story has been archived.
Anyone who's looked at the HTTP spec for more than a millisecond will see that it already handles this case quite gracefully with the 3xx series of responses, including:
301 Moved Permanently
302 Moved Temporarily
I think
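To make the 3xx point concrete, here is a minimal sketch of what a client can do with those two codes: follow the `Location` header, and only treat the new address as worth saving when every hop was a 301. The `resolve` function and its injected `fetch` callable are made up for illustration so the example runs without a network.

```python
from urllib.parse import urljoin


def resolve(url, fetch, max_hops=5):
    """Follow 301/302 responses.

    fetch(url) must return (status, headers); it is injected so this
    sketch can run without a network. Returns (final_url, status,
    update_bookmark), where update_bookmark is True only when at least
    one hop happened and every hop was a 301.
    """
    all_permanent = True
    hops = 0
    while hops < max_hops:
        status, headers = fetch(url)
        if status not in (301, 302) or "Location" not in headers:
            return url, status, hops > 0 and all_permanent
        if status == 302:
            # A temporary move: keep using the old address in bookmarks.
            all_permanent = False
        # Location may be relative; resolve it against the current URL.
        url = urljoin(url, headers["Location"])
        hops += 1
    raise RuntimeError("too many redirects")
```

Of course this only helps while somebody keeps the old server answering with the redirect, which is exactly the case the story is worried about.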
Perhaps one of the keywords should be the previous URL? In fact, perhaps a better solution would be a new Meta tag of "Prev-URL" (or something similar) that search engines could look at and use to update their databases?
On an anecdotal note (or is that redundant?), I remember searching once for the web site of a Land Rover owners club (I think it was Ottawa Valley Land Rovers in Canada) and was directed to an auto parts store in Australia -- turned out that the web pages had the names of lots of auto clubs in meta tags. The idea was to get people searching for the clubs to go to the store's site.
Stupid people will be persecuted to the fullest extent allowed by law.
send flames > /dev/null
Only 'flamers' flame!
This sounds like a good idea but you'll still see plenty of 404s if this gets into action.
Why? Because 90% of 404s are a result of the page being taken down completely (especially if it's on geocities or xoom or some free provider).
A program that you could install for your browser like NetAccelerate (loads links off the current page into cache when the bandwidth isn't being used), but one that simply loads the links far enough to detect whether a link is broken, would be very handy. Although it wouldn't solve any problems, it would at least stop you from getting your hopes up when you've finally found a link to a page that claims to be what you've been searching for for an hour.
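The "load each link just far enough to see if it's dead" idea could look roughly like this. It is only a sketch: `find_broken_links` and the injected `head` callable (which would issue an HTTP HEAD request and return the status code) are hypothetical names, and nothing here is how NetAccelerate actually works.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def find_broken_links(html, head):
    """Return the links whose probe failed or came back with status >= 400.

    head(url) is injected: it should return the status code of a HEAD
    request, or raise OSError if the host cannot be reached.
    """
    parser = LinkExtractor()
    parser.feed(html)
    broken = []
    for link in parser.links:
        try:
            status = head(link)
        except OSError:
            status = None  # unreachable host counts as broken
        if status is None or status >= 400:
            broken.append(link)
    return broken
```

A browser add-on could grey out anything in the returned list before you click on it.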
It's turtles all the way down.
<ASSUMPTION>The 'word description' is going to be capable of describing a page adequately, and uniquely, per page, like an MD5 digest, rather than a simple text descriptor. The latter would just be silly.</ASSUMPTION>
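A quick sketch of the trade-off behind that assumption: an MD5 digest pins down the exact bytes, so any edit changes it, while a handful of rare words can survive an edit. The `word_signature` helper and its `rare_terms` argument are invented for this illustration and are not the scheme's actual algorithm.

```python
import hashlib
import re


def md5_digest(text):
    """Hex MD5 of the page text; changes if a single byte changes."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def word_signature(text, rare_terms):
    """The rare terms that occur in the page, in sorted order."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(words & set(rare_terms))


original = "Quagga sightings near the okapi reserve."
edited = "Updated: quagga sightings near the okapi reserve!"
rare = {"quagga", "okapi"}

print(md5_digest(original) == md5_digest(edited))                    # False
print(word_signature(original, rare) == word_signature(edited, rare))  # True
```

So a digest is unique but brittle, and a search engine can't look a page up by it, whereas plain words are searchable but only as unique as the words themselves.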
I can see some value to this if the page is static and likely to be relocated rather than rewritten or deleted, but how is this going to work if the page is dynamically generated from a database and the whole site is prone to reorganisation (as Microsoft's seems to be)?
It might help more if there was a way to uniquely identify snippets of content within a page, and provide a universal look-up scheme based on unique fingerprints of these 'snippets'. Although I'm sure that puts it straight into XPointer territory, doesn't it...?
And an 'opt-out' system is necessary. There are lots of reasons one might want particular content to be transient.
free experimental electronic music netlabel at www.viablehybrid.com
Yes, but that's only one side of it, the pull side. Eventually systems will evolve to the point where a push model exists alongside the pull model for robustness. Unfortunately data structures change, companies reorganize, and no type of pointer will really ever suffice. It will have to change at some point. The robustness of a push model will facilitate these scenarios. It's not a question of if; it will happen, eventually.
This will also allow site owners to see who's linking to them, but obviously it should be utterly transparent (so that you can still link in private, but then you wouldn't get updates).
At some point we'll get there, it's just a matter of time. Questionable schemes such as the topic of this story are just a kludge, and probably not worth the effort.
PoC
Well, it sounds like an interesting concept but unfortunately I can't get to the site already. Surely it's too soon for the /. effect?
This sounds great - practical solutions to a real problem.
OTOH, there are already far too many sites where there just isn't an accessible URL anyway. Some are frame-based, some are dynamically generated. They all have the problem of not being bookmarkable (from within the browser's normal "Bookmark Here" function). Some do try to solve this though, by separately publishing a bookmark that will take you back to the same content.
If this idea is to really work, then it needs to be supported by dynamic sites publishing their Robust Hyperlinks, even for pages that don't have a "traditional" URL to begin with.
Definitely a heads-up for anyone looking for a quick technical fix to the problem.
Simply having a search string included seems a bit of a kludge to me.
- -
What about if the link tag in the HTML also contained the date/time it was created? This way the browser would know how old it was. If the browser sent this to the server as a header, then if the server couldn't find the page it could check some database or whatever to see what the directory structure was like at that time and work out what redirect to use. If bookmarks also contained this date/time then surely the server could tell the browser to update the bookmark (after warning the user, of course).
This would be pretty cool on an interactive site where the server could rearrange query strings or whatever if the serverside scripting had been given a big overhaul/re-organization.
Basically, surely the server itself, and not some search engine, would best know how to fix a broken link, and it would only require a couple of new headers and should be easy to implement, at least on the client side.
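Here's roughly how the server side of that could work, assuming the server keeps a dated log of its own reorganisations. The `MOVES` log, the paths and the `current_path` function are all hypothetical. The link's creation date would arrive in whatever new header got agreed on.

```python
from datetime import date

# Hypothetical reorganisation log: (date of the move, {old path: new path}).
MOVES = [
    (date(1999, 6, 1), {"/news.html": "/news/index.html"}),
    (date(2000, 1, 15), {"/news/index.html": "/archive/news.html"}),
]


def current_path(path, link_created, moves=MOVES):
    """Replay every move made after the link was created, oldest first."""
    for moved_on, mapping in sorted(moves, key=lambda m: m[0]):
        if moved_on > link_created:
            path = mapping.get(path, path)
    return path
```

For example, `current_path("/news.html", date(1999, 1, 1))` walks through both moves, and the server can then answer with a 301 to the result.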
-----------------------------------------------
"If I can shoot rabbits then I can shoot fascists" -
- My page has been moved for some reason or another.
- The old page no longer exists at all, i.e. I don't have a redirect on it. (Side note: surprisingly enough, many providers will be happy to keep your redirects around for an almost infinite length of time. It's not like they take up a lot of space or bandwidth.)
- I built the first page with a specific set of keywords and I kept those keywords on the new page
- The search engines FINALLY got around to spidering/accepting my site. (Note that it can currently take up to 6 months to be spidered and Yahoo may not reaccept your site.)
And this allows us what?
-----
No Zen is good zen
Alexa also collects detailed information about what you look at with your browser, although they of course claim to use it only in the aggregate.
This makes one big whopper of an assumption: that the web page has moved and still exists somewhere. Well, the major cause of 404s that I know of is web sites simply going away.
So you get a 404 and you want to use a search site to find where it went? That's fine if it's been long enough since the move to give the web crawlers time to find it... there's a lot of web space out there to search!
But here's the good one: what if someone decides to hijack your web site by simple keyword spamming? All they have to do is set up their own page with the right keywords, get it indexed, and anyone who uses an "old" link will get redirected to them instead! And if web pages can be defaced, they can be removed, too, thus forcing the 404 and the search!
Better yet, use wholesale keyword spamming to get all those "dead" web pages pointing to your e-commerce site!
... as in, "It's a good idea, but!" As has been pointed out, there are potential privacy issues. For the "average" user, though, I don't think this is a terribly big deal. What becomes a problem, then, is access to the Robust URL redirector (as I understand it from posts, the site seems to either be simply down, or a victim of the /. effect). Since all Robust URLs have to pass through the redirector, what happens if the redirector is down? What happens if the redirector is unreachable?
Furthermore, simply feeding keywords to a search engine doesn't guarantee finding your page quickly, or even finding it at all. Designers would have to include unique keywords - words that might not even apply to their page - so that a Robust URL search would turn up only their page. Not only does this bloat HTML code, but it also confuses people using search engines in the usual way.
Certainly a good idea, as many people hate 404s (bah, they're just a fact of life), but it seems like it's got more than a few bugs left in it.
--
You're not wrong. There is in fact a proposal about the form and resolution of URNs (which are location independent) from the IETF. I don't know its status.
As far as I can tell this scheme relies on checksums of the static content of web pages to find the correct web page. So what does this do to dynamically generated content?
Also, somebody else mentioned that they had a project on SourceForge which was basically like the Web, but in a completely distributed manner. This makes a lot more sense to me. The notion that my bits must cross a continent to retrieve data on a certain TOPIC seems a bit archaic. I shouldn't know or care where the data of the topic is stored...I just want it. Also, having a distributed web like this, as the person suggests, will make it a lot harder to invade privacy or censor material.
It's 10 PM. Do you know if you're un-American?
Will this still work even if someone tries to add lots of context words to the search engines so it comes to their page instead?
Don't mean to be the Devil's Advocate, it is just my game programming / design skills kicking in. Whenever someone adds a useful feature, you must look at the ways people will try to exploit it.
"Live free or Die" - Ironically, seen on a license plate.
Frankly, I'd rather just get the 404 than waste time digging through erroneous links.
By the way, there are hypertext systems that address this issue in ways that actually solve the problem - the now defunct HyperG system was very intelligent about redirecting requests.
Eric