Robust Hyperlinks: The End of 404s?
Tom Phelps writes, "URLs can be made robust so that if a Web page moves to another location anywhere on the Web, you can find it even if that page has been edited. Today's address-based URLs are augmented with a five or so word content-based lexical signature to make a Robust Hyperlink. When the URL's address-based portion breaks, the signature is fed into any Web search engine to find the new site of the page. Using our free, Open Source software (including source code), you can rewrite your Web pages and bookmarks files to make them robust, automatically. Although Web browser support is desirable for complete convenience, Robust Hyperlinks work now, as drop-in replacements of URLs in today's HTML, Web browsers, Web servers and search engines."
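For anyone who can't reach the site, here is a rough sketch of what "a five or so word content-based lexical signature" could look like. It is a guess at the idea, not the authors' code: it simply ranks a page's words by a TF-IDF style score and keeps the top five. The `doc_freq` table, the `lexical-signature` parameter name and the example URL are assumptions for illustration.

```python
import math
import re
from collections import Counter

def lexical_signature(page_text, doc_freq, total_docs, size=5):
    """Return the `size` words of page_text with the highest TF-IDF score.

    doc_freq maps a word to the number of documents (out of total_docs)
    that contain it; it stands in for a search engine's index statistics.
    """
    words = re.findall(r"[a-z0-9]+", page_text.lower())
    term_freq = Counter(words)

    def score(word):
        # Words missing from the table are treated as appearing in one document.
        df = doc_freq.get(word, 1)
        return term_freq[word] * math.log(total_docs / df)

    # Highest score first; ties are broken alphabetically so output is stable.
    ranked = sorted(term_freq, key=lambda w: (-score(w), w))
    return ranked[:size]

def robust_url(url, signature):
    """Append the signature to a URL that has no query string yet."""
    return url + "?lexical-signature=" + "+".join(signature)

# Toy statistics: how many of 1000 documents contain each word (made up).
DOC_FREQ = {"the": 1000, "of": 1000, "page": 800, "uses": 500, "engine": 300,
            "signature": 50, "indexing": 40, "harvest": 10, "md5": 5}
PAGE = "The Harvest indexing engine uses the MD5 signature of the page"

signature = lexical_signature(PAGE, DOC_FREQ, total_docs=1000)
link = robust_url("http://example.com/page", signature)
```

Common words score zero and drop out, which is also why a rare typo on the page would rank very high under this kind of scoring.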
(please, sir, may I have another?) http://www.brunching.com/nothing.html
I haven't examined robust.jar, but I believe Netscape has reused Sun's jar authentication scheme (make a file with hashes of some of the files in the zip archive, and sign that file) to support signed JavaScript. I don't see the point, though, as Navigator allows unsigned JavaScript to take any number of unfortunate actions.
Nobody publishes their ActiveX source, so don't forget the PE loader and x86 machine code emulator!
This sounds pretty similar to an Eternity Service, and I suggest that would be a less confusing name.
This is what responses like 301 Moved Permanently and 410 Gone are for. By definition, any server using 404 Not Found has no idea what happened to the resource in question and couldn't have notified anyone else.
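A minimal sketch of that distinction, assuming the server keeps a record of what happened to its old paths. The three tables and every path in them are hypothetical; only the status codes and reason phrases are standard HTTP.

```python
# Hypothetical bookkeeping a server would need in order to answer honestly.
MOVED = {"/old/paper.html": "http://new.example.org/paper.html"}
GONE = {"/retired/page.html"}
LIVE = {"/index.html"}

def respond(path):
    """Return (status, reason, headers) for a requested path."""
    if path in LIVE:
        return 200, "OK", {}
    if path in MOVED:
        # The server knows the new home, so it can say so.
        return 301, "Moved Permanently", {"Location": MOVED[path]}
    if path in GONE:
        # The server knows the page was removed on purpose.
        return 410, "Gone", {}
    # No record at all: 404 is the only truthful answer left.
    return 404, "Not Found", {}
```

A 404 only comes out when none of the tables has an entry, which is exactly the case where nobody could have been told where the page went.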
Possibly this is a sign that your thoughts weren't that interesting in the first place?
If you find yourself consistently wanting to delete your own comments, maybe you should just wait about 5 minutes before posting and see if anyone else has posted the same thing.
If so, post something more original, or just troll.
NATALIE PORTMAN NAKED PERTRIFIED GRITS DOWN MY/HER PANTS
Moderate this Informative. Thanks man.
BTW, I wish people would stop referring to a web page as a "Paper". "Treatise" would be more appropriate.
I thought of this years ago. If somebody patents this and makes money off it I'll be so pissed...
You must be from massachusetts (massho
I'll say it again "Live Free or Die". Don't like it? Mind your own business.
Most people in the world will return a 404 when asked about New Hampshire. And we like it that way.
Could someone please mirror the page if they get to read it or provide a short blurb.
the signature is fed into any Web search engine to find the new site of the page
How long would this take? I'd imagine this would clutter web search engines even more... what if AltaVista had three links to a page with three different URLs (two of them being old links, one being the recent location)? It sounds like a great idea, and even somewhat practical, but I'm still not sure it would be generally accepted.
I actually read through my error logs to let webmasters or even myself know where bad links are located on their site.
Here's a fun web page, 404 Not Found Homepage
Read the page. It's based on ActiveX and requires IE 4 or better.
This is just *another* case of Linux falling behind due to its lack of support for common Internet standards. Where is our ActiveX? COM? Granted, I can occasionally watch as the Java ads on Slashdot cause Netscape for Linux to crash, but that seems to be the extent of Linux's so-called internet connectivity.
And you wonder why people are forced to use windows+IE? If they want to make use of the latest technologies, for example 'Robust URLs' (though maybe they should have invested in a Robust Server), then Linux, sadly, can't keep up. We as a community are being left behind in the Internet arms race. Fortunately, I have a few ideas:
Get a task force composed of Richard Stallman, Bruce Perens, and ESR to develop and debug ActiveX support for Linux. Estimated time: 2 months.
Form an Open Source Browser Committee to create a new, Open Source web browser that supports all the latest standards (CSS, DOM, DNA). Estimated time: 3 months.
Push for Perl to be embedded in all new web browsers so that CGI programs can be run on the user's machine, which will reduce server loads. Estimated time: 1 month.
Design a new, Internet-ready desktop for Linux. Give it a web browser, probably the new one I described above, and embed it in everything: file manager, word processor, start button, etc. Estimated time: 4 months.
I think that with these items accomplished, Linux will truly begin to shine as a web platform, even for the newest users.
Did any of you morons who moderated this up actually notice that the link is to www.scoobydoo.com? Geez! At least two of you didn't!
The parent message is a fucking troll!!!
Nope, and it wouldn't work very reliably even without people actively trying to subvert it. Their paper on the subject notes several major problems (what if the page changes? can't find it. what if the keywords are unique now, but not in the near future? can't find it. what if the new location hasn't been indexed yet by the search engine you use? can't find it).
to eliminate pop-up windows, just turn off java and javascript.
It's not down, it just looks like it. It's a little joke. Read the page carefully, it's not a real 404 error, but merely a statement. They're making a point that you won't have to see it in the future. Don't be so quick to click back next time. Muerte
URNs are still being hammered out. In the meantime, oclc.org has already implemented most of URNs' probable features in what they call Persistent URLs (PURLs). They allow access to the source code for PURL servers (which basically do an HTTP-Redirect from your unchanging PURL to your mobile URL). Don't know about licensing off-hand.
maybe now i can finally find all of those porn sites that keep on goin missin
Now one has to wonder if the UC at Berkeley is the new player in the Slashdot effect game, or if the use of Napster in university campi really can use all that bandwidth... =)
--
Marcelo Vanzin
Oh, and holding down left-shift on my keyboard didn't seem to help any.
WWJD? JWRTFM!!!
... the porn servers start embedding shitloads of common 5-word phrases in their pages so every 404 takes you straight to "101 pussies for today" or wherever.
From what I'm reading here, the form of those URLS this guy is generating is actually illegal syntax. That is, with the '?' character, that is intended as a query and any proper web server would attempt to run a CGI type script with it.
If you want to know more about URNs, and my implementation of them in Java (replaces most of java.net) go to http://www.vlc.com.au/~justin/java/urn/
Life is complete only for brief intervals in between toys or projects -- John Dalton
Sadly NH still hasn't gotten around to changing it to "Live Free or Die, Punk"
-- This and all my posts are in the public domain. I am a lawyer. I am not your lawyer, and this is not legal advice.
<A HREF="http://my.outdatedsite.com/page?robusturlkeywords=farts+sandler+zippo+methane+boom">
So the "robust keywords" are just an HTTP query string attached to the usual URL. When the server goes to produce a 404, it presumably calls a CGI (the distribution's jar file probably contains a 404 servlet or some such beastie) which redirects (301 or whichever) to google.com with an appropriate query string based on the keywords in "robusturlkeywords".
As an HTTP junkie, I have to say I'm not too fond of it; you're ruining the whole point of 404 semantics. (Kinda like sites that redirect you to their homepage when you give them a bogus URL - it irks me to no end.) It would be much more straightforward (and less prone to attacks and the general unreliability of search engines) for server administrators to start maintaining proper 301-Moved Permanently databases and perform lookups in those whenever the server hits a 404 condition.
Just MHO.
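To make the guess above concrete, here is roughly what such a 404 hook might do, written as a plain function rather than a real servlet or CGI. It is a sketch of my reading, not the distributed code: the `robusturlkeywords` parameter name comes from the example link, while the search endpoint and the `q` parameter are assumptions.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SEARCH_ENGINE = "http://www.google.com/search"  # assumed endpoint

def fallback_redirect(request_url, param="robusturlkeywords"):
    """Return a search URL built from the link's keywords, or None.

    Intended to be called only after the server has already decided
    the requested page is missing.
    """
    query = parse_qs(urlsplit(request_url).query)
    values = query.get(param)
    if not values:
        return None  # ordinary link: let the normal 404 go out
    # parse_qs has already turned '+' into spaces, so split on whitespace.
    keywords = values[0].split()
    return SEARCH_ENGINE + "?" + urlencode({"q": " ".join(keywords)})

target = fallback_redirect(
    "http://my.outdatedsite.com/page"
    "?robusturlkeywords=farts+sandler+zippo+methane+boom"
)
```

The server would send `target` in the Location header of a redirect response instead of the 404, which is the part that bothers me about the semantics.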
Left shift 1 for e-mail...
Robust links may be great but I can't connect to the site to learn more about it.
After posting this, I notice the comment submitted before me said the same thing I just wrote. (The comment after me also says the same thing) It would be nice if we were allowed to delete our own comments to save some moderators some redundant moderating.
"ActiveX and COM aren't common Internet standards they are just the work of a proprietary company!"
;-). And personally, I don't see a whole lot of difference between ActiveX and Perl (or TeX or Python or TCL or...), neither are standardized in any even vaguely meaningful sense of the word.
Of course, that doesn't stop many of those same people from complaining about lack of Java on Linux
Apparently de facto "standards" only count when they come from the Good Guys.
I got a 404 Not Found error when I tried the link!
[OK, it was a server down or unreachable error, but it was funnier the other way]
Has it been Slashdotted already?
Donte Alistair Anderson Roberts - hi son!
Karma: Chameleon
If something is too good to be true, it probably is. Besides, chances are someone'll patent it sooner or later. ;-(
love the idea of no more 404s, but, seeing as the server is appropriately and thoroughly slashdotted (or was when i looked), what about a url that's robust enough to survive the 'effect'?
- Entertaining Bits from the Ancient Kernel Tree
I can just see it... pr0n sites will no longer need all those senseless keywords in their meta-tags to show up on innocent-looking keywords you feed a search engine...
No - now all they have to do is to stuff the "Robust Redirector" with some makeshift-keywords they extracted by spidering over a load of webpages, and presto! --- You've Got PR0N!1!
That's kind of like they do now, with sitenames that are popular "speling" errors of other sites...
Also, who's going to prevent people using the same keywords for their page, and how is the process of choosing between n possible redirections going to be handled, as it should be "transparent" to the user?
I guess there's a lot of thought-work left before this reasonably can go live... and still, how many of you have Smart Browsing enabled in Netscape, and how does this differ, privacy-wise?
(Mmmmh... Portscan... ARGH)
np: Boards Of Canada - Unknown Track 2.mp3 (Live)
As always under permanent deconstruction.
"I'm not anti-anything, I'm anti-everything, it fits better." - Sole
So the link to Robust Hyperlinks doesn't work. Sigh.
This sounds interesting if you're using unique keywords for something like a family web site and you have a unique surname. However, what will happen when I have a site that needs to use "Hot", "Sex", "Babes", "XXX", "Nude" for my keywords? How many other sites are going to have the exact same keywords? Or more seriously, how about "Smith", "Family", "Web", "Page"?
How will it help me if my URL changes?
quack
On the other hand, it might be that the method uses Javascript, but at which point this nulls and voids any statement on "working on all existing browsers".
From Freshmeat you can see that the appropriate file for it is called Robust.jar, so I think you're probably correct there :)
JavaScript has nothing to do with Java. The fact that the file ends in
Look more carefully at that 404 message. It's a joke.
Hm. The site seems to be slashdotted.
With Harvest, indexing software that is several years old, an indexing engine that identifies documents by their MD5 signature is easy to build; I've done this. So what these people are proposing isn't exactly rocket science.
All of the required technology is present in Harvest, it just never became popular. My guess is that cool ideas have to be reinvented in Berkeley before the world gets to see them applied at large, see Yahoo! for another example.
Seems to me what is needed to make this (Robust 404s) work is a database whereby all the URLs that refer to your pages go first, to be redirected to your page. When you change a page, you notify the database and off-site redirections follow the moved page. If you're careful on your own site, you never kill off a URL but instead have it refer forward .... doesn't work, of course, if your main URL moves but. Of course, this may be what the original article talks about, but it's either 404 or /.ed so I can't read it.
"A gun is a tool, Marian. No better, no worse than any other tool. An axe, a shovel, or anything." Shane (1953)
Well I would say something like ActiveX is bad is not common at all. But I prefer to address the embedded perl statement, since I program in Perl. Why would I want the perl to run on the client machine? The problem with Javascript, JScript or any other client-side technology is the client. Major vendors refuse to follow any sort of standard, forcing me to write 4 different versions of the code to do the same thing and detect what platform and browser it's running on. Cross-platform does not mean to me writing it for each platform then choosing the correct one to run. Server-side technology such as Perl, PHP, C++, etc. allows me to access databases and generate dynamic code, but still spit out plain ole HTML.
If you have an idea don't pass the buck and say all these "famous" OpenSourcers need to do this. Go do it yourself...then maybe you won't be so quick to say how easy and quick it would be.
Remove the spam reference to email
I can't get through to the site to see if they address the most common 404 problem I have. The problem is that I do a search, find the page, never been there before, but now it has moved. How am I supposed to get this extended data about the page if the page moved before I ever saw it, without web search engines storing this information too... Sure, Google can do it because of caching, but the others would be out of luck. In any case, 404 can never go away: things come up, things go down, things move. It may be possible to fix moving problems, but once a page goes down, it goes down :) Maybe forcing everyone to chmod directories so we get 403s instead, then 404s wouldn't be around so much :)
XML is like violence. If it doesn't solve the problem, use more.
URI's are the generic term; you mean `URN'.
There are several different proposed URN systems being worked on right now (the document even mentions some, such as PURLs and handles). The big problem with these new specs is that there are a large number of conflicting requirements depending on what you really want to do, so they're unlikely to be able to settle on just one proposal (they've been trying for several years).
Still, after looking through the `robust Hyperlink' documents, basically all of the old URN specs that I've seen are better than this, so I hope it doesn't distract people too much.
IANA 404 research scientist, but... Why can't my browser just open a connection to the web page, and if the heading starts with "404", not load the page and simply flash a warning that the page is not available?
The software seems to pick out the most unusual words in a page. Typos can get quite unusual. One of their papers gives an example that uses "peroperties" as an index word. On the target page, it's clearly a typo for "properties". If the authors of that page ever bothered to spell-check it, that word would go away, and the paper would be that much harder to find.
(I've already sent them an email about this.)
Chris
Ask me about Nanotechnology, Dyslexia Correction. Tell me about A.I., robotics, infrastructure.
actually, the welcome signs are even better
live free or die
pay toll ahead
The web is great for the sorts of things that lots of people (particularly fellow geeks) are interested in: software, OS issues, MP3s, goat pornography, and Mahir Cagri.
But what if I'm looking for something specific? The web has been nearly useless to me when I wanted to find information on ancient illuminated Arabic text, or pictures of Microsoft Bob in action (for a parody).
So do "robust hyperlinks" help me or hurt me? Say I get a dog who has certain unsavory habits with regards to my cats, and I want to look up links about "interspecific coprophagia". Also assume for a moment that the next Korn clone band names themselves "coprophagia". Good search engines allow me to exclude entries that have certain words, but what happens when "robust hyperlinks"-based software assures me that http://www.coprophagiaonline.com/new_releases/ive_got_the_word_yo.asp is a document on canine interspecific coprophagia based on the presence of several uncommon words...
...are we just using new technology to make search engines even more frustratingly inaccurate?
lexical-signature="sex+mp3+porn+alissa%20milano+beanie%20baby+jesus%20christ+coprophagia+free%20pics+online%20investing
... Not so long as I own the domain name! *muahahahahahahahahhaaaaaaa*
Then again, the domain name just won't be funny anymore if 404 Errors go away. *sigh*
--Ruhk
404 Error:
I can see it now.
Porn sites start copying the five words of large portal and news sites and in the event of a 404 for one of those sites you automatically get redirected to the site you really "wanted" to visit anyway.
Anybody know if this is going to be an actual standard or just something useful until a new truly robust addressing system gets adopted? It might be on the site but that's sort of unreachable right now.
When you desert one host or modify your site, why don't you leave forwarding messages (or 302 responses) to tell people where to find your new content?
How's that for a great idea?
Oh GREAT. Just what we need--so much for the whole "you can't 'accidentally' find porn on the Internet" argument. This just throws that out the window, because all a porn site needs to do is hijack the right search keywords and wait for cnn.com to have a broken link.. *poof* millions of users get sent to porn.
Not only that, but it makes site debugging a pain in the ass.
Thanks Berkeley!
...just what it says... We can only pray. I hate 404's and i am assuming you do too. I hope with all my heart and soul this works...
"As many of you know, I was very instrumental in the founding of the Internet" --Al Gore to Katie Couric 3/99
You like 404s ? Try this one: http://www.g-wizz.net/wibblewibblewibble.swf.
Yes, that file extension is a hint...
I think it was back in 1995 when I saw a warez page (on Geocities) which used a feature like that.
If a visitor couldn't reach the site because Geocities had taken it down, he just needed to feed "paer9udtzk6gn8modfi" (paraphrased, of course) into Altavista to be pointed to the new location.
WOW.
This is a fantastically great idea.
How long before we get URLS like freenet://contraband_information.html ?
-k
I took a look at this, and it looks quite neat. If Freenet manages to get this right, I hope it really takes off. I especially like the idea of not having to dole out tons of cash or make do with a free web service in order to get something published.
-RickHunter
--"We are gray. We stand between the candle and the star."
--Gray council, Babylon 5.
by far the largest number of problems i have with chasing information down (information that was not removed, but simply moved to another location) is because it has been moved OFF the world wide web and into the INVISIBLE WEB, meaning that it is accessible through a query to some database. the thing is, that the final location of these content pieces is generally known in advance to the site that is hosting them - and then the easiest way for users to relocate content would be to attach to it tags that define its location as a function of time.
"Live free or Die" - Ironically, seen on a license plate.
It's worse than that. That state tried to penalize someone for covering the slogan. When someone tried to exercise his freedom of (non)speech by putting electrical tape over the slogan, the state took him to court. I seem to recall the case going on for a long while through several appeal processes where the state tried to force people to spout slogans about freedom. The irony was apparently completely lost on the bureaucrats enforcing the slogan.
I'll re-iterate what the AC said, only without the flamebait.
FREENET is already a widespread term, referring to MANY local public-access community supported ISPs. A quick lookup gives 16 countries with 233 separate groups.
It is unfortunate that nobody told you of the name overlap before this, but using "freenet" for your web will only generate anger among people already familiar with the community free ISP usage.
Hmmmm - Is it possible that the socialists (free public access to whatever) and the libertarians (Where were you when they took our freedoms?) have really never heard of each other's Freenet until now? I'm only familiar with the ISP usage, where it is
Does anyone else think that this will just be another way for people to trap you into looking at their shitty pr0n sites?
-- Dr. Eldarion --
If you are relying on a search engine to "reconnect" the link you are going to have problems.
Even the best search engines only index a small percentage of the entire web and then they are hideously out of date.
Not to mention the problems of someone hijacking your unique id by stuffing the search engine with bogus words.
(Disclaimer - I haven't read the actual article due to it being /.ed so I probably am missing the point entirely!)
This would be cool. No more 404s! That's the problem with the web. I also like the idea that this is being put in open source so that we can all benefit. At least it isn't Microsoft...
---------- I laugh at a dumb SysAdmin.
Well, it would be much easier to include a token somewhere (e.g., in a comment) that would be unique to this page. A randomly generated string of 20 ASCII characters would do the job.
But this is prone to the same hijacking attack as the original scheme.
A much better solution would be to fetch by MD5: teach search engines to compute MD5 sums of every document they index, then include the MD5 sum somewhere in the URL.
That would also allow for better caching!
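Roughly what I mean, as a sketch. The index is just a dict standing in for whatever a search engine would store, and the URLs and page bytes are made up; only `hashlib.md5` is real.

```python
import hashlib

def content_id(document: bytes) -> str:
    """Hex MD5 digest of the exact bytes of a document."""
    return hashlib.md5(document).hexdigest()

def build_index(pages):
    """Map digest -> list of URLs, as a crawler might record it."""
    index = {}
    for url, body in pages.items():
        index.setdefault(content_id(body), []).append(url)
    return index

def locate(index, digest):
    """Return every known URL whose content matches the digest."""
    return index.get(digest, [])

original = b"<html>Robust Hyperlinks</html>"
digest = content_id(original)  # this is what the link would carry

# A later crawl: one byte-identical copy at a new address, one edited copy.
crawl = {
    "http://new.example.org/moved.html": original,
    "http://new.example.org/edited.html": original + b"<!-- edited -->",
}
found = locate(build_index(crawl), digest)
```

Only the byte-identical copy is found. The catch is that any edit at all changes the digest, so this finds moved pages but not moved-and-edited ones.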
-- Stanislav Shalunov
how about an apache mod that automatically checks the urls as they are sent and changes them. then there'd be no need for any browser modifications.
in fact, it wouldn't have to be an apache mod - any kind of executable that could be cron'd to check links every so often could have the same effect.
i'm not sure how this would fit in with the whole signature thing. i suppose we could just pgp sign our web pages and put the signature in comments.
but as with most of my ideas, someone's probably already coded this.
I went through the entire site, including the white papers. I looked at the actual Java code. Not a lick of ActiveX anywhere. Whoever posted this anonymously is either smoking crack, working for M$, or both. Robust Hyperlinks is pure Java.
I'll explain the 2 that come to mind right away:
1) Growing sites that may change servers, or domain names (add/on to dedicated URL, change domain name for legal/incorporation/buyout reasons), will see the massive traffic bleed they suffer until everyone realizes their site has changed virtually disappear. Yes, putting a redirect page on your "old home" may help, but for things like RSS file addresses, and other external connectors, which may have an effect on your site, this is a problem.
Ultimately, of course, for this to TRULY work there needs to be technology like this built into not only browsers, but virtually any software that uses HTTP communication (XML parsers, bots, spiders, etc).
2) I want to start offering streaming video on my site, and the single biggest obstacle for doing that is COST. Bandwidth, unless you OWN the pipe, is NOT cheap. I can (albeit in a somewhat underhanded fashion) set up a script to register, say, 24 different "free site" pages with the content to be the "correct" version of my page once an hour, and, unless the content is in VERY heavy demand, essentially have a free method of streaming video on my site.
Egads, I'm already feeling dirty about what I just said. Okay, maybe that's a little TOO unethical. But I guarantee someone will do it.
That said, the concept seems iffy. Based on the above, the fact that it works in all existing browsers, suggests to me that the form of the URL is the following:
<a href="http://robusturl.server.com?http://my.outdatedsite.com&keyword1="whatever">
Namely, that anchors that use this URL will be sent to this server (apparently fixed in place), then redirected either to the working page, or to the appropriate search engine results. This means that the robust server will be running scripts. While I don't believe that the intent as described here would be to catalog all matches, all you need is one unscrupulous company that uses this and can now trace where you are and where you are going to quite easily with a bit of modification. I really don't like this potential, and personally I'll take a 404 any day over potential privacy problems.
On the other hand, it might be that the method uses Javascript, but at which point this nulls and voids any statement on "working on all existing browsers".
"Pinky, you've left the lens cap of your mind on again." - P&TB
"I can see my house from here!" - ST:
I'm pretty sure URL's were just a makeshift URI and some day the IETF was going to figure out how to do URI's right. Am I wrong?
sigs are a waste of space
--
Some 404's are just a way to pass time. Sometimes I go from site to site looking for pages that don't exist just to see what happens.
...poorly.
/. even uses these once a story has been archived.
anyone who's looked at the http spec for more than a millisecond will see that it already handles this case quite gracefully with the 3xx series of responses, including:
301 Moved permanently
302 Moved Temporarily
I think
Perhaps one of the keywords should be the previous URL? In fact, perhaps a better solution would be a new Meta tag of "Prev-URL" (or something similar) that search engines could look at and use to update their databases?
On an anecdotal note (or is that redundant?), I remember searching once for the web site of a Land Rover owners club (I think it was Ottawa Valley Land Rovers in Canada) and was directed to an auto parts store in Australia -- turned out that the web pages had the names of lots of auto clubs in meta tags. The idea was to get people searching for the clubs to go to the store's site.
Stupid people will be persecuted to the fullest extent allowed by law.
send flames > /dev/null
Only 'flamers' flame!
This sounds like a good idea but you'll still see plenty of 404s if this gets into action.
Why? Because 90% of 404's are a result of the page being taken down completely (especially if it's on geocities or xoom or some free provider).
A program that you could install for your browser like NetAccelerate (loads links off current page into cache when the bandwidth isn't being used) but simply loads the links far enough to detect a broken link or not would be very handy. Although it wouldn't solve any problems, it would at least stop you from getting your hopes up when you've finally found a link to a page that claims to be what you've been searching for for an hour.
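Something along these lines would do as a starting point: a sketch that sends a HEAD request per link and reports the ones that fail. It is not how NetAccelerate works (I have no idea), and the function names are my own; the `fetch_status` argument exists only so the logic can be tried without touching the network.

```python
import urllib.error
import urllib.request

def http_status(url, timeout=10):
    """Send a HEAD request; return the status code, or None if unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses are raised as HTTPError
    except (urllib.error.URLError, OSError, ValueError):
        return None  # DNS failure, refused connection, malformed URL, ...

def broken_links(urls, fetch_status=http_status):
    """Return the URLs that are unreachable or answer with a 4xx/5xx status."""
    broken = []
    for url in urls:
        status = fetch_status(url)
        if status is None or status >= 400:
            broken.append(url)
    return broken

# Offline demonstration with canned answers instead of real requests.
canned = {"http://a.example/ok": 200, "http://a.example/missing": 404,
          "http://a.example/gone": 410, "http://down.example/": None}
dead = broken_links(canned, fetch_status=canned.get)
```

Some servers answer HEAD differently from GET, so a real tool would probably need to retry with GET before declaring a link dead.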
It's turtles all the way down.
<ASSUMPTION>The 'word description' is going to be capable of describing a page adequately, and uniquely, per page, like an MD5 digest, rather than a simple text descriptor. The latter would just be silly.</ASSUMPTION>
I can see some value to this if the page is static and likely to be relocated, rather than rewritten or deleted, but how is this going to work if the page is dynamically generated from a database, and the whole site is prone to reorganisation (like Microsoft's seems to be)?
It might help more if there was a way to uniquely identify snippets of content within a page, and provide a universal look-up scheme based on unique fingerprints of these 'snippets'. Although I'm sure that puts it straight into XPointers territory, doesn't it...?
And an 'opt-out' system is necessary. There are lots of reasons one might want particular content to be transient.
free experimental electronic music netlabel at www.viablehybrid.com
Yes, but that's only one side of it, the pull side. Eventually systems will evolve to the point where a push model exists alongside the pull model for robustness. Unfortunately data structures change, companies reorganize, and no type of pointer will really ever suffice. It will have to change at some point. The robustness of a push model will facilitate these scenarios. It's not a question of if, it will happen, eventually.
This will also allow site owners to see who's linking to them, but obviously it should be utterly transparent (so that you can still link in private, but then you wouldn't get updates).
At some point we'll get there, it's just a matter of time. Questionable schemes such as the topic of this story are just a kludge, and probably not worth the effort.
PoC
Well it sounds like an interesting concept but unfortunately I can't get to the site already. Surely it's too soon for the /. effect?
This sounds great - practical solutions to a real problem.
OTOH, there are already far too many sites where there just isn't an accessible URL anyway. Some are frame-based, some are dynamically generated. They all have the problem of not being bookmarkable (from within the browser's normal "Bookmark Here" function). Some do try to solve this though, by separately publishing a bookmark that will take you back to the same content.
If this idea is to really work, then it needs to be supported by dynamic sites publishing their Robust Hyperlinks, even for pages that don't have a "traditional" URL to begin with.
Definitely a heads-up for anyone looking for a quick technical fix to the problem.
Simply having a search string included seems a bit of a kludge to me.
- -
What about if the link tag in the HTML also contained the date/time it was created? This way the browser would know how old it was. If the browser sent this to the server as a header, then if the server couldn't find the page it could check some database or whatever to see what the directory structure was like at that time and work out what redirect to use. If bookmarks also contained this date/time then surely the server could tell the browser to update the bookmark (after warning the user, of course).
This would be pretty cool on an interactive site where the server could rearrange query strings or whatever if the serverside scripting had been given a big overhaul/re-organization.
Basically, surely the server itself, and not some search engine, would best know how to fix a broken link, and it would only require a couple of new headers and should be easy to implement, at least on the client side.
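The server side of this could be as small as a move log. Below is a sketch under my own assumptions: the log format, the paths and the dates are invented, and the link's creation time is assumed to arrive in some new request header that does not exist today.

```python
from datetime import datetime

# Hypothetical move log, oldest first: (when it happened, old path, new path).
MOVES = [
    (datetime(1999, 6, 1), "/docs/faq.html", "/help/faq.html"),
    (datetime(2000, 1, 15), "/help/faq.html", "/support/faq.html"),
]

def resolve(path, link_created, moves=MOVES):
    """Follow every move of `path` that happened after the link was created."""
    for moved_at, old, new in moves:
        # Moves made before the link existed say nothing about what it meant.
        if moved_at > link_created and old == path:
            path = new
    return path

# A bookmark made in early 1999 still finds the page after two reorganisations.
current = resolve("/docs/faq.html", datetime(1999, 1, 1))
```

If `resolve` returns something different from the requested path, the server answers with a redirect to it; otherwise the usual 404 handling applies.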
-----------------------------------------------
"If I can shoot rabbits then I can shoot fascists" -
- My page has been moved for some reason or another.
- The old page no longer exists at all, i.e. I don't have a redirect on it. (Side note: surprisingly enough, many providers will be happy to keep your redirects around for an almost infinite length of time. It's not like they take up a lot of space or bandwidth.)
- I built the first page with a specific set of keywords and I kept those keywords on the new page
- The search engines FINALLY got around to spidering/accepting my site. (Note that it can currently take up to 6 months to be spidered and Yahoo may not reaccept your site.)
And this allows us what?
-----
No Zen is good zen
Alexa also collects detailed information about what you look at with your browser, although they of course claim to use it only in the aggregate.
This makes one big whopper of an assumption: that the web page has moved and still exists somewhere. Well, the major cause of 404s that I know of is web sites simply going away.
So you get a 404 and you want to use a search site to find where it went? That's fine if it's been long enough since the move to give the web crawlers time to find it... there's a lot of web space out there to search!
But here's the good one: what if someone decides to hijack your web site by simple keyword spamming? All they have to do is set up their own page with the right keywords, get it indexed, and anyone who uses an "old" link will get redirected to them instead! And if web pages can be defaced, they can be removed, too, thus forcing the 404 and the search!
Better yet, use wholesale keyword spamming to get all those "dead" web pages pointing to your e-commerce site!
... as in, "It's a good idea, but!" As has been pointed out, there are potential privacy issues. For the "average" user, though, I don't think this is a terribly big deal. What becomes a problem, then, is access to the Robust URL redirector (as I understand it from posts, the site seems to either be simply down, or a victim of the /. effect). Since all Robust URLs have to pass through the redirector, what happens if the redirector is down? What happens if the redirector is unreachable?
Furthermore, simply feeding keywords to a search engine doesn't guarantee finding your page quickly, or even finding it at all. Designers would have to include unique keywords - words that might not even apply to their page - so that a Robust URL search would turn up only their page. Not only does this bloat HTML code, but it also confuses people using search engines in the usual way.
Certainly a good idea, as many people hate 404s (bah, they're just a fact of life), but it seems like it's got more than a few bugs left in it.
--
You're not wrong. There is in fact a proposal about the form and resolution of URNs (which are location independent) from the IETF. I don't know its status.
As far as I can tell this scheme relies on checksums of the static content of web pages to find the correct web page. So what does this do to dynamically generated content?
Also, somebody else mentioned that they had a project on SourceForge which was basically like the Web, but in a completely distributed manner. This makes a lot more sense to me. The notion that my bits must cross a continent to retrieve data on a certain TOPIC seems a bit archaic. I shouldn't know or care where the data of the topic is stored...I just want it. Also, having a distributed web like this, as the person suggests, will make it a lot harder to invade privacy or censor material.
It's 10 PM. Do you know if you're un-American?
Will this still work even if someone tries to add lots of context words to the search engines so it comes to their page instead?
Don't mean to be the Devil's Advocate, it is just my game programming / design skills kicking in. Whenever someone adds a useful feature, you must look at the ways people will try to exploit this.
"Live free or Die" - Ironically, seen on a license plate.
Frankly, I'd rather just get the 404 than waste time digging through erroneous links.
By the way, there are hypertext systems that address this issue in ways that actually solve the problem - the now defunct HyperG system was very intelligent about redirecting requests.
Eric