Slashdot Mirror


Robust Hyperlinks: The End of 404s?

Tom Phelps writes, "URLs can be made robust so that if a Web page moves to another location anywhere on the Web, you can find it even if that page has been edited. Today's address-based URLs are augmented with a five or so word content-based lexical signature to make a Robust Hyperlink. When the URL's address-based portion breaks, the signature is fed into any Web search engine to find the new site of the page. Using our free, Open Source software (including source code), you can rewrite your Web pages and bookmarks files to make them robust, automatically. Although Web browser support is desirable for complete convenience, Robust Hyperlinks work now, as drop-in replacements of URLs in today's HTML, Web browsers, Web servers and search engines."
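
As a rough illustration of the idea in the summary (not the authors' actual algorithm), a signature could be picked by favoring words that are frequent in the page but rare elsewhere, then appended to the URL. The `lexical-signature` parameter name, the toy document-frequency table, and the scoring formula below are all assumptions made for this sketch.

```python
import math
import re
from collections import Counter
from urllib.parse import urlencode


def lexical_signature(text, doc_freq, total_docs, size=5):
    """Pick `size` words that are common in `text` but rare across a corpus."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)

    def score(word):
        # Term frequency times inverse document frequency (an assumed heuristic).
        # Words missing from the table are treated as appearing in one document.
        return counts[word] * math.log(total_docs / doc_freq.get(word, 1))

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(counts, key=lambda w: (-score(w), w))
    return ranked[:size]


def robust_url(url, signature):
    """Append the signature as a query parameter (hypothetical parameter name)."""
    separator = "&" if "?" in url else "?"
    return url + separator + urlencode({"lexical-signature": " ".join(signature)})


# Toy data, invented for the example.
page = "the wombat burrow survey lists wombat burrow sites in the outback"
doc_freq = {"the": 1000, "in": 900, "lists": 300, "sites": 400, "survey": 50}
signature = lexical_signature(page, doc_freq, total_docs=1000, size=3)
print(robust_url("http://example.com/wombats.html", signature))
```

When the address part of such a link breaks, the signature words would be what gets handed to a search engine.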

30 of 105 comments

  1. This is very important for a few reasons. by dougman · · Score: 2

    I'll explain the 2 that come to mind right away:

    1) Growing sites that change servers or domain names (add-on to dedicated URL, change of domain name for legal/incorporation/buyout reasons) will see the massive traffic bleed they suffer, until everyone realizes the site has moved, virtually disappear. Yes, putting a redirect page on your "old home" may help, but for things like RSS file addresses and other external connectors that affect your site, a moved address is still a problem.

    Ultimately, of course, for this to TRULY work there needs to be technology like this built into not only browsers, but virtually any software that uses HTTP communication (XML parsers, bots, spiders, etc).

    2) I want to start offering streaming video on my site, and the single biggest obstacle for doing that is COST. Bandwidth, unless you OWN the pipe, is NOT cheap. I can (albeit in a somewhat underhanded fashion) set up a script to register, say, 24 different "free site" pages with the content to be the "correct" version of my page once an hour, and, unless the content is in VERY heavy demand, essentially have a free method of streaming video on my site.

    Egads, I'm already feeling dirty about what I just said. Okay, maybe that's a little TOO unethical. But I guarantee someone will do it.

  2. Sounds very iffy to me by Masem · · Score: 2
    First, I did try to access the link in the article, but the berkeley server appears to be down or slow.

    That said, the concept seems iffy. Based on the above, the fact that it works in all existing browsers, suggests to me that the form of the URL is the following:

    <a href="http://robusturl.server.com?http://my.outdatedsite.com&keyword1="whatever">

    Namely, that anchors that use this URL will be sent to this server (apparently fixed in place), then redirected either to the working page, or to the appropriate search engine results. This means that the robust server will be running scripts. While I don't believe that the intent as described here would be to catalog all matches, all you need is one unscrupulous company that uses this and can now trace where you are and where you are going to quite easily with a bit of modification. I really don't like this potential, and personally I'll take a 404 any day over potential privacy problems.

    On the other hand, it might be that the method uses Javascript, but at which point this nulls and voids any statement on "working on all existing browsers".

    --
    "Pinky, you've left the lens cap of your mind on again." - P&TB
    "I can see my house from here!" - ST:
    1. Re:Sounds very iffy to me by ehiggins · · Score: 2

      Ummm, .jar files do *NOT* indicate JavaScript.

      Java != JavaScript, people!

      --Earl

    2. Re:Sounds very iffy to me by spiralx · · Score: 2

      On the other hand, it might be that the method uses Javascript, but at which point this nulls and voids any statement on "working on all existing browsers".

      From Freshmeat you can see that the appropriate file for it is called Robust.jar, so I think you're probably correct there :)

  3. Wasn't this what URI's were supposed to address? by X · · Score: 2

    I'm pretty sure URLs were just a makeshift URI and some day the IETF was going to figure out how to do URIs right. Am I wrong?

    --
    sigs are a waste of space
  4. Re:Not unlike Freenet by Sanity · · Score: 2
    This has been discussed to death on our mailing lists. Basically our view is that if Freenet is as popular as we hope it will be, then "Freenet" is the perfect term for it, it is possibly more deserving of the term than the other projects which currently use it. If, on the other hand, Freenet is not a success, then this won't affect anyone and it won't matter.

    --

  5. 404 Gallery by dattaway · · Score: 2

    Some 404's are just a way to pass time. Sometimes I go from site to site looking for pages that don't exist just to see what happens.

    1. Re:404 Gallery by kwsNI · · Score: 2
      Yeah, like Userfriendly. I love their 404.

      You're in the midst of nowhere
      a droplet in a mist,
      you musta typed in something weird
      this URL, it don't exist.


      kwsNI

  6. reinventing the wheel... by cabbey · · Score: 2

    ...poorly.

    anyone who's looked at the HTTP spec for more than a millisecond will see that it already handles this case quite gracefully with the 3xx series of responses, including:

    301 Moved Permanently
    302 Moved Temporarily

    I think /. even uses these once a story has been archived.
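
    A minimal sketch of the redirect approach, using Python's standard `http.server`. The `MOVED` table, its paths, and the `serve` helper are made up for illustration; a real site would more likely configure redirects in its web server.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical table of old paths and where the content now lives.
MOVED = {"/old/story.html": "/archive/story.html"}


def resolve(path):
    """Return (status, location) for a request path."""
    if path in MOVED:
        return 301, MOVED[path]  # Moved Permanently: clients should update links
    return 404, None             # nothing is known about this path


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, location = resolve(self.path)
        self.send_response(status)
        if location is not None:
            # The Location header tells the client where to go next.
            self.send_header("Location", location)
        self.end_headers()


def serve(port=8000):
    """Run the demo server until interrupted (not called automatically)."""
    HTTPServer(("localhost", port), RedirectHandler).serve_forever()
```

    Note that this only helps while the old server is still around to answer, which is the case the story's scheme is trying to cover.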

  7. Re:Hijacking redirectors ?? by UncleRoger · · Score: 2
    A very valid point:
    Will this still work even if someone tries to add lots of context words to the search engines so it comes to their page instead?

    Perhaps one of the keywords should be the previous URL? In fact, perhaps a better solution would be a new Meta tag of "Prev-URL" (or something similar) that search engines could look at and use to update their databases?

    On an anecdotal note (or is that redundant?), I remember searching once for the web site of a Land Rover owners club (I think it was Ottawa Valley Land Rovers in Canada) and was directed to an auto parts store in Australia -- turned out that the web pages had the names of lots of auto clubs in meta tags. The idea was to get people searching for the clubs to go to the store's site.

    --
    Stupid people will be persecuted to the fullest extent allowed by law.
  8. smart and dumb by josepha48 · · Score: 2
    I guess for those of us who don't want to make that move just yet, we can have our 404 document say, "Sorry, I am just a dumb server and don't know where the page has gone. Come back later when I get smarter."

    send flames > /dev/null

    --

    Only 'flamers' flame!

  9. Good Idea but 90% of 404's are deleted pages by bug_hunter · · Score: 2

    This sounds like a good idea but you'll still see plenty of 404s if this gets into action.
    Why? Because 90% of 404s are a result of the page being taken down completely (especially if it's on geocities or xoom or some free provider).

    A program that you could install for your browser like NetAccelerate (loads links off the current page into cache when the bandwidth isn't being used), but which simply loads the links far enough to detect whether each one is broken, would be very handy. Although it wouldn't solve any problems, it would at least stop you from getting your hopes up when you've finally found a link to a page that claims to be what you've been searching for for an hour.
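
    Something along these lines could do the checking. This is only a sketch with Python's standard library; `LinkCollector`, `is_alive`, and `broken_links` are names invented here, and treating any error or 4xx/5xx answer to a HEAD request as "broken" is an assumption, since some servers reject HEAD.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def is_alive(url, timeout=5):
    """Issue a HEAD request; any error (including 4xx/5xx) counts as broken."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # urllib raises HTTPError/URLError (both OSError subclasses) on failure,
        # and ValueError for malformed URLs.
        return False


def broken_links(html, check=is_alive):
    """Return the links in `html` for which `check` reports failure."""
    collector = LinkCollector()
    collector.feed(html)
    return [link for link in collector.links if not check(link)]
```

    Passing a different `check` function makes it possible to try the link extraction without touching the network.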

    --
    It's turtles all the way down.
  10. Nice idea, shame about the... by WhyteRabbyt · · Score: 2

    <ASSUMPTION>The 'word description' is going to be capable of describing a page adequately, and uniquely, per page, like an MD5 digest, rather than a simple text descriptor. The latter would just be silly.</ASSUMPTION>

    I can see some value to this if the page is static and likely to be relocated, rather than rewritten or deleted, but how is this going to work if the page is dynamically generated from a database, and the whole site is prone to reorganisation (like Microsoft's seems to be)?

    It might help more if there was a way to uniquely identify snippets of content within a page, and provide a universal look-up scheme based on unique fingerprints of these 'snippets'. Although I'm sure that puts it straight into XPointer territory, doesn't it...?

    And an 'opt-out' system is necessary. There are lots of reasons one might want particular content to be transient.

    --
    free experimental electronic music netlabel at www.viablehybrid.com
  11. Re:The real solution ... by HalJohnson · · Score: 2

    Yes, but that's only one side of it, the pull side. Eventually systems will evolve to the point where a push model exists alongside the pull model for robustness. Unfortunately data structures change, companies reorganize, and no type of pointer will really ever suffice. It will have to change at some point. The robustness of a push model will facilitate these scenarios. It's not a question of if, it will happen, eventually.

  12. The real solution ... by HalJohnson · · Score: 2
    And the logical next step is inter-server communication. At some point we'll end up with a defined way for servers to communicate with each other, so that when an object is moved or removed, the server that "owns" that object can notify other servers that own objects with links to it. The worst case would still be better than what we have now: if the object has been removed, the other server could mark it as unavailable and notify the site owner that the link needs to be updated. Some site management utils already have a process for checking broken links (pull model); we need a push model.

    This will also allow site owners to see who's linking to them, but obviously it should be utterly transparent (so that you can still link in private, but then you wouldn't get updates).

    At some point we'll get there, it's just a matter of time. Questionable schemes such as the topic of this story are just a kludge, and probably not worth the effort.
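
    To make the push model concrete, here is a toy in-process sketch. `LinkRegistry` and its methods are hypothetical names; no such inter-server protocol is defined anywhere in this thread, and a real one would send network messages rather than call Python functions.

```python
from collections import defaultdict


class LinkRegistry:
    """Kept by the server that owns the targets; remembers who links to them."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # target URL -> list of callbacks

    def subscribe(self, target, callback):
        """A linking server registers interest in `target`."""
        self.subscribers[target].append(callback)

    def moved(self, old, new):
        """Tell every linker that `old` now lives at `new`."""
        for callback in self.subscribers.pop(old, []):
            callback(old, new)
            # Keep the linker subscribed under the new address.
            self.subscribers[new].append(callback)

    def removed(self, target):
        """Tell every linker that `target` is gone (new location is None)."""
        for callback in self.subscribers.pop(target, []):
            callback(target, None)


# Example: one linking server records the notifications it receives.
events = []
registry = LinkRegistry()
registry.subscribe("/a", lambda old, new: events.append((old, new)))
registry.moved("/a", "/b")
registry.removed("/b")
```

    Linking "in private" in this sketch just means never calling `subscribe`, at the cost of never hearing about moves.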

  13. Damn! by PacketOfCrisps · · Score: 2
    I am getting a 404 not found on the site's homepage.

    PoC

  14. It's down already by spiralx · · Score: 2

    Well it sounds like an interesting concept but unfortunately I can't get to the site already. Surely it's too soon for the /. effect?

  15. We need URLs first by dingbat_hp · · Score: 2

    This sounds great - practical solutions to a real problem.

    OTOH, there are already far too many sites where there just isn't an accessible URL anyway. Some are frame-based, some are dynamically generated. They all have the problem of not being bookmarkable (from within the browser's normal "Bookmark Here" function). Some do try to solve this though, by separately publishing a bookmark that will take you back to the same content.

    If this idea is to really work, then it needs to be supported by dynamic sites publishing their Robust Hyperlinks, even for pages that don't have a "traditional" URL to begin with.

  16. Re:Wasn't this what URI's were supposed to address by Shimbo · · Score: 2
    There is a good paper by the man himself on the problem of URL persistence.

    Definitely a heads-up for anyone looking for a quick technical fix to the problem.

  17. Here's another way to do it by hoss10 · · Score: 2

    Simply having a search string included seems a bit of a kludge to me.
    What if the link tag in the HTML also contained the date/time it was created? This way the browser would know how old it was. If the browser sent this to the server as a header, then if the server couldn't find the page it could check some database or whatever to see what the directory structure was like at that time and work out what redirect to use. If bookmarks also contained this date/time then surely the server could tell the browser to update the bookmark (after warning the user, of course).

    This would be pretty cool on an interactive site where the server could rearrange query strings or whatever if the serverside scripting had been given a big overhaul/re-organization.

    Basically, surely the server itself, and not some search engine, would best know how to fix a broken link, and it would only require a couple of new headers and should be easy to implement, at least on the client side.
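
    On the server side, the lookup might be sketched like this. The `HISTORY` table, its dates, and its paths are invented for the example, and the sketch assumes the link's creation date has already been read from whatever new header would carry it.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical reorganization log, oldest first:
# (date the change took effect, old path prefix, new path prefix).
HISTORY = [
    (date(1999, 6, 1), "/docs/", "/library/"),
    (date(2000, 1, 15), "/library/", "/archive/library/"),
]


def resolve(path, link_created):
    """Replay every reorganization made after the link was created."""
    dates = [changed for changed, _, _ in HISTORY]
    start = bisect_right(dates, link_created)  # first change later than the link
    for _, old_prefix, new_prefix in HISTORY[start:]:
        if path.startswith(old_prefix):
            path = new_prefix + path[len(old_prefix):]
    return path
```

    The result is the path to put in a redirect; a link created after the last reorganization comes back unchanged.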


    ------------------------------------------------ -
    "If I can shoot rabbits then I can shoot fascists" -

  18. thoughts by bons · · Score: 2
    The situation:
    • My page has been moved for some reason or another.
    • The old page no longer exists at all, i.e. I don't have a redirect on it. (Side note: surprisingly enough, many providers will be happy to keep your redirects around for an almost infinite length of time. It's not like they take up a lot of space or bandwidth.)
    • I built the first page with a specific set of keywords and I kept those keywords on the new page.
    • The search engines FINALLY got around to spidering/accepting my site. (Note that it can currently take up to 6 months to be spidered and Yahoo may not reaccept your site.)
    And this allows us what?
    • Well, it means we have to make sure we register with all the possible search engines, including the ones we usually don't care about.
    • It means someone will come up with a "find that 404" search engine that you'll have to submit to as well.
    • Meanwhile, people will notice that you've moved and will create redirect porn pages with your keywords and register them with the 404 search engine.
    • Microsoft will add something to FrontPage to create default keywords that send your 404 to microsoft.com.
    • The new standards are not part of the official Web Standards, so Mozilla will not support them and w3.org will barf errors out about your HTML code.
    • Someone will figure out how to use this technology so that they can set up emergency /. effect mirror sites.
    • Someone will get smart and figure that trick out really quick and take advantage of it. "I'm sorry, the page you want has been slashdotted, welcome to geocities."


    -----
  19. Alexa's solution to "404 errors" by Animats · · Score: 2
    Alexa has had a solution to 404 errors for years. They have a large archive of the web, and will give you a copy of a deleted page. Unfortunately, the Alexa client has ballooned into a combination advertising delivery system and portal. They're just now adding Amazon's shopping system. It's turning into a piece of bloatware.

    Alexa also collects detailed information about what you look at with your browser, although they of course claim to use it only in the aggregate.

  20. I see a problem with this... by Megane · · Score: 2

    This makes one big whopper of an assumption: that the web page has moved and still exists somewhere. Well, the major cause of 404s that I know of is web sites simply going away.

    So you get a 404 and you want to use a search site to find where it went? That's fine if it's been long enough since the move to give the web crawlers time to find it... there's a lot of web space out there to search!

    But here's the good one: what if someone decides to hijack your web site by simple keyword spamming? All they have to do is set up their own page with the right keywords, get it indexed, and anyone who uses an "old" link will get redirected to them instead! And if web pages can be defaced, they can be removed, too, thus forcing the 404 and the search!

    Better yet, use wholesale keyword spamming to get all those "dead" web pages pointing to your e-commerce site!

    --
    #naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
  21. There's always a "but." by UncleOzzy · · Score: 2

    ... as in, "It's a good idea, but!" As has been pointed out, there are potential privacy issues. For the "average" user, though, I don't think this is a terribly big deal. What becomes a problem, then, is access to the Robust URL redirector (as I understand it from posts, the site seems to either be simply down, or a victim of the /. effect). Since all Robust URLs have to pass through the redirector, what happens if the redirector is down? What happens if the redirector is unreachable?

    Furthermore, simply feeding keywords to a search engine doesn't guarantee finding your page quickly, or even finding it at all. Designers would have to include unique keywords - words that might not even apply to their page - so that a Robust URL search would turn up only their page. Not only does this bloat HTML code, but it also confuses people using search engines in the usual way.

    Certainly a good idea, as many people hate 404s (bah, they're just a fact of life), but it seems like it's got more than a few bugs left in it.

  22. Not unlike Freenet by Sanity · · Score: 3
    I am working on a project that will do something like this - and a whole lot more. The primary intention is to create an information publication system similar to the world wide web, but where censorship is much more difficult or impossible. However there is more to the system than that: it incorporates intelligent decentralised caching, making it much more efficient than the world wide web, and also intelligent mirroring, meaning that information on the system will never be slashdotted as this site appears to be! The homepage may be found at http://freenet.sourceforge.net/. We are looking for testers and developers right now in preparation for our first release, which will happen in the next few weeks.

    --

  23. Re:Wasn't this what URI's were supposed to address by SimonK · · Score: 3

    You're not wrong. There is in fact a proposal about the form and resolution of URNs (which are location independent) from the IETF. I don't know its status.

  24. Dynamic content by Hard_Code · · Score: 3

    As far as I can tell this scheme relies on checksums of the static content of web pages to find the correct web page. So what does this do to dynamically generated content?

    Also, somebody else mentioned that they had a project on SourceForge which was basically like the Web, but in a completely distributed manner. This makes a lot more sense to me. The notion that my bits must cross a continent to retrieve data on a certain TOPIC seems a bit archaic. I shouldn't know or care where the data of the topic is stored...I just want it. Also, having a distributed web like this, as the person suggests, will make it a lot harder to invade privacy or censor material.

    --

    It's 10 PM. Do you know if you're un-American?
  25. Hijacking redirectors ?? by UnknownSoldier · · Score: 3

    Will this still work even if someone tries to add lots of context words to the search engines so it comes to their page instead?

    Don't mean to be the Devil's Advocate, it is just my game programming / design skills kicking in. Whenever someone adds a useful feature, you must look at the ways people will try to exploit it.

    "Live free or Die" - Ironically, seen on a license plate.

  26. Replacing a broken link with a Google search? No. by rambone · · Score: 3
    Like any search, the search that tries to reunite your 404 error with the correct address is going to be wrong quite often.

    Frankly, I'd rather just get the 404 than waste time digging through erroneous links.

    By the way, there are hypertext systems that address this issue in ways that actually solve the problem - the now defunct HyperG system was very intelligent about redirecting requests.

  27. Try ftp'ing instead by EricWright · · Score: 5
    From the freshmeat announcement, you can ftp it from here. I was able to connect just fine...

    Eric