Slashdot Mirror


Broken Links No More?

johndoejersey writes "Students in England have developed a tool which could bring an end to broken links. Peridot, developed by UK intern students at IBM, scans company weblinks and replaces outdated information with other relevant documents and links. IBM have already filed 2 patents for the project. The students said Peridot could protect companies by spotting links to sites that have been removed, or which point to wholly unsuitable content. 'Peridot could lead to a world where there are no more broken links,' James Bell, computer science student at the University of Warwick, told BBC News Online. Here is another story on it." See also the BBC story.

212 comments

  1. Can someone say "Bad Idea Jeans"? by Anonymous Coward · · Score: 5, Interesting

    There are two parts to this tool, one of which is quite bad and one of which is quite good.

    First, replacing links. This is a rather bad idea. Here's why, with an example.

    In general, we can all agree that the technology behind Google is pretty impressive. It has its own "More Pages Like This" feature, which we can assume is at least somewhat similar to this one. Complex content analysis among billions of pages, to determine which are similar and which are different.

    So, suppose we had a link to Major League Baseball, www.mlb.com on our page. And suppose, for whatever reason, that their site went away (perhaps a few more players' strikes?).

    Well, what does Google suggest as a replacement? Check it out here.

    First the National Football League (NFL), then the National Basketball Association (NBA), and then the National Hockey League (NHL). Followed by the ESPN sports network, and NASCAR racing.

    Obviously, if we wanted to link to a site about baseball, all of those (other than ESPN) are really entirely irrelevant.

    But if we wanted to link to a site about professional sports organizations, all of those (other than ESPN) are QUITE relevant.

    Can this software know our intent?

    Hardly.

    You really have to question the ability of machines to select relevant links.

    The situation is this: If someone goes to the trouble to manually create links in the first place, those should not be automatically changed to other sites that some computer program thinks may be related. Links shouldn't be inserted automatically; if someone needs more information on something you haven't linked to, they can use a search engine. And then your company isn't liable to look idiotic by linking to irrelevant sites.

    Now, the other aspect of this product.

    Removing dead or changed links is quite another matter. Automated removal of links is a great idea and quite useful. For example, consider when someone's domain name expires and it is taken over by a porn site. It'd be great to have a program that automatically removes links to it from your site. Like this tool, this could be based on a percentage of changed content--if the content changes significantly, remove the link quickly and automatically. If the content changes some intermediate amount, flag the link as needing review by the webmaster.

    But in both of those cases, the software should present the webmaster with a list of such questionable links, those it has removed from the site temporarily, and then allow the webmaster to select replacement links.

    Manually. With relevance.
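
    A minimal sketch of that remove-or-flag rule, assuming the page text from the last check has been saved somewhere. The two thresholds and the name classify_link are made up for illustration; the post only distinguishes a "significant" change from a smaller one, and nothing here is taken from Peridot itself.

```python
from difflib import SequenceMatcher

# Hypothetical cut-offs, not taken from the article or from Peridot.
REMOVE_BELOW = 0.3   # similarity under this: pull the link automatically
REVIEW_BELOW = 0.8   # similarity under this: flag it for the webmaster


def classify_link(old_text: str, new_text: str) -> str:
    """Return 'keep', 'review', or 'remove' based on how much the page changed."""
    # Compare word sequences rather than characters so small edits stay small.
    similarity = SequenceMatcher(None, old_text.split(), new_text.split()).ratio()
    if similarity < REMOVE_BELOW:
        return "remove"
    if similarity < REVIEW_BELOW:
        return "review"
    return "keep"
```

    Either way, both the "remove" and "review" results would still go on the list shown to the webmaster; the code only decides what happens to the link in the meantime.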

    1. Re:Can someone say "Bad Idea Jeans"? by GigsVT · · Score: 5, Insightful

      The "related" search isn't what you should be looking at.

      Try this.

      --
      I've had enough abrasive sigs. Kittens are cute and fuzzy.
    2. Re:Can someone say "Bad Idea Jeans"? by Anonymous Coward · · Score: 5, Funny

      You probably bookmark your A/C posts so you can slip back and check them. You're not fooling anyone, you know.

    3. Re:Can someone say "Bad Idea Jeans"? by TheRealMindChild · · Score: 3, Interesting

      Actually, I think something pretty simple that should happen (in the context of search engine links, not links in general) is that when a visitor searches Google and clicks one of the returned links, Google should do something like queue that particular page for respidering. Make it so a page can't enter that queue more than, say, once a week, and you will find that we come up with fewer and fewer broken links.

      While this might seem like a LOT for Google to be doing on the backend, I would have to think that a majority of the public ends up visiting the same 5-10% of the internet each day (number pulled from my ass, but an educated guess at least).
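
      Roughly what that queue could look like, as a sketch only. RespiderQueue and record_click are invented names, and this says nothing about how Google actually schedules crawls; it just shows the "at most once a week per page" rule.

```python
import time
from collections import deque

WEEK_SECONDS = 7 * 24 * 60 * 60


class RespiderQueue:
    """Queue clicked result URLs for re-crawling, at most once per interval each."""

    def __init__(self, min_interval=WEEK_SECONDS):
        self.min_interval = min_interval
        self.last_queued = {}    # url -> timestamp of the last time it was queued
        self.pending = deque()   # URLs waiting to be re-crawled

    def record_click(self, url, now=None):
        """Queue url unless it was queued within min_interval. Returns True if queued."""
        now = time.time() if now is None else now
        last = self.last_queued.get(url)
        if last is not None and now - last < self.min_interval:
            return False
        self.last_queued[url] = now
        self.pending.append(url)
        return True
```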

      --

      "When life gives you lemons, don't make lemonade. Make life take the lemons back!" -- Cave Johnson
    4. Re:Can someone say "Bad Idea Jeans"? by Anonymous Coward · · Score: 1, Insightful

      No.

      That just happens to work if your intent was to link to any random site about major league baseball (the first of the two cases outlined in the original post).

      But what if your intent was to provide a list of the current US professional sports organizations and their homepages?

      Linking MLB to some other random site ABOUT major league baseball makes no sense. (And if MLB.com no longer exists, MLB probably no longer exists, so the link should probably even be removed!)

      Again, a computer program can't know your intent.

    5. Re:Can someone say "Bad Idea Jeans"? by nayigeta · · Score: 1

      Good insight.

      In addition, I'd just like to point out that some bad links on web pages come from dynamically formed hyperlinks, built in order to generate dynamic content.

      In such cases, a bad link might actually be more useful than a tool that replaces bad links and potentially leads to confusion.

      Besides, if this tool acts on 404, it would not be hard to imagine that there would be prior art.

      --
      Sunset over the lake, cool mist over the bridge; A leave upon the ripples, the snow reflects its glow.
    6. Re:Can someone say "Bad Idea Jeans"? by Anonymous Coward · · Score: 3, Funny

      heh.... and what if "whitehouse.gov" for whatever reason doesn't respond ... the third thing on the list is ...

    7. Re:Can someone say "Bad Idea Jeans"? by Otter · · Score: 1
      Removing dead or changed links is quite another matter.

      Here's another proposal: a tool that finds linked redirects and updates the link to the new URL. Even then, requiring individual approval for each change seems like a sensible precaution.

      Although, there would be something elegant about seeing all the old goatse.cx links in the /. archives change to wherever the new site is now.
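
      A sketch of the redirect-following part, under a few assumptions: only permanent redirects (301 and 308) count as "the page moved", and the caller supplies a hypothetical head function that returns the status code and Location header for a URL. The result is a proposal to show the webmaster, not a change applied on its own.

```python
from urllib.parse import urljoin

PERMANENT = {301, 308}  # status codes treated here as "moved for good"


def propose_update(url, head, max_hops=5):
    """Follow permanent redirects from url.

    `head` is a caller-supplied (hypothetical) function taking a URL and
    returning (status_code, location_header_or_None).
    Returns the final URL to propose, or None if no update should be proposed.
    """
    current = url
    seen = {url}
    for _ in range(max_hops):
        status, location = head(current)
        if status not in PERMANENT or not location:
            break
        nxt = urljoin(current, location)  # Location may be relative
        if nxt in seen:
            return None  # redirect loop: propose nothing
        seen.add(nxt)
        current = nxt
    return current if current != url else None
```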

    8. Re:Can someone say "Bad Idea Jeans"? by gowen · · Score: 1

      So I replace www.mlb.com by a search for mlb. That works well in that one case. But suppose I'm looking for a replacement to www.slashdot.org? Not exactly helpful, is it?

      --
      Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
    9. Re:Can someone say "Bad Idea Jeans"? by Anonymous Coward · · Score: 0

      Agreed, it'll probably be less successful than the anti-spam tools are right now.

    10. Re:Can someone say "Bad Idea Jeans"? by Anonymous Coward · · Score: 0

      I mostly agree. Just because a link isn't dead doesn't mean it contains information that is relevant to the content it used to have. How is this anything more than checking for dead links and using an Apache directive to redirect all 404 errors to a "main content" page instead? This is hardly worth patenting for fuck's sake. A lot of people have been doing this for the better part of a decade.

      And why is IBM going to patent something that STUDENTS at a UNIVERSITY created?! Um... That doesn't sound fucking right....?

    11. Re:Can someone say "Bad Idea Jeans"? by Lust · · Score: 1

      Corporate web sites rarely link to the outside for fear of implied endorsement. I imagine the best replacement links would also come from within the company's site. Then, even if the material is unrelated, at least you can attest that it doesn't break any policies or (worse) promote the competition.

      Regardless, it would be a simple adjustment to have the tool replace only links to the same site.

    12. Re:Can someone say "Bad Idea Jeans"? by dwerg · · Score: 4, Interesting

      Sadly I can't get into details, but they're not using technology like the 'related' functionality in Google. They try to find the document that was previously on the other end of the dead link, so the link will never be rewritten to something vaguely related to the original document.

      The reason why they want to replace the links automatically is that some webmasters have to manage thousands of pages and don't want to press the 'ok' button every time the system detects a change.

    13. Re:Can someone say "Bad Idea Jeans"? by kzinti · · Score: 2, Insightful

      Obviously, if we wanted to link to a site about baseball, all of those (other than ESPN) are really entirely irrelevant.

      Yep. Because Major League Baseball has strong conceptual similarity to several other concepts: the game of baseball, professional sports, American culture, and others. Granted some are more specific than others, but that's a pretty tough judgment call that depends on the context in which the original link occurred. If I link to mlb.com from a site about baseball, then it means something different than if I link to mlb.com from a site about American culture, and if mlb.com goes dead, then the link would have to be replaced in different ways for the different sites. So this link-replacement software has to be smart about both the destination and the source (context) of a link in order to replace a dead one properly.

      I wouldn't go so far as to say this has to be done manually. In theory, software could handle it, but I've never seen software smart enough to automate the task for such a broad information source as the Internet.

      (Disclaimer: no, I have not RTFA; just reacting to an interesting post.)

    14. Re:Can someone say "Bad Idea Jeans"? by Mr+Guy · · Score: 1

      echo "127.0.0.1 slashdot.org" >> /etc/hosts

    15. Re:Can someone say "Bad Idea Jeans"? by Anonymous Coward · · Score: 0

      Any serious /. addict will just type http://66.35.250.150 straight from memory.

    16. Re:Can someone say "Bad Idea Jeans"? by Toresica · · Score: 1

      Not exactly helpful, is it?
      I dunno, that guy who posted above could have used some of the pages on that search...

    17. Re:Can someone say "Bad Idea Jeans"? by AndroidCat · · Score: 2, Funny

      That's Mr. Patented Bad Idea Jeans to you! (I'm afraid to RTFA to see how trivial their patents might be.)

      --
      One line blog. I hear that they're called Twitters now.
    18. Re:Can someone say "Bad Idea Jeans"? by shawn(at)fsu · · Score: 3, Insightful

      After RTFA it doesn't seem like a fair comparison to say it's like Google's "related" or Verisign's "product"; this looks like a technology a webmaster would use on their own site. It also gives them the option to accept the suggestion or not. This could be really good for corporations with large intranet sites, as webmasters leave, documents constantly get moved, etc.

      I think had the original poster read the article they wouldn't have gone off half-cocked. IBM must also be somewhat confident that this is new technology or else they wouldn't have filed two patents for it.

      --
      500 dollar reward for tip(s) leading to the arrest of the person(s) who stole my sig.
    19. Re:Can someone say "Bad Idea Jeans"? by hackstraw · · Score: 1

      This could be interesting. This is some kind of "autolinking", and I guess like language it would change and evolve over time. So instead of linking to hard URLs, one would link to abstract ideas. We do this today in our speech: when we talk in America today about "our president" we mean George W. Bush. But in another country, or even again in the US, the context could be different and mean something like the president of our company or club. So in the future we will have these intellilinks like: Weapons of mass destruction or Litigious Bastards, or Miserable Failure, which all go to links that are relevant to today in the here and now, but who knows? Sometime in the future a search for "Weapons of mass destruction", or "Litigious Bastards", or "Miserable Failure" will show completely different results, yet these results will be "correct" at the current time and context.

    20. Re:Can someone say "Bad Idea Jeans"? by Anonymous Coward · · Score: 0

      Well, Garcia, you could also give someone else the password to your low ID account.

      That would make a nice clean break.

    21. Re:Can someone say "Bad Idea Jeans"? by Anonymous Coward · · Score: 0
      Not exactly helpful, is it?

      Sure, but honestly, how many websites feel the need to have dozens of themes and implement it by using a different host name for each theme.

    22. Re:Can someone say "Bad Idea Jeans"? by Tenebrious1 · · Score: 1

      Well, what does Google suggest as a replacement? Check it out here. First the National Football League (NFL), then the National Basketball Association (NBA), and then the National Hockey League (NHL). Followed by the ESPN sports network, and NASCAR racing.

      Google worked perfectly, isn't it obvious? MLB is not a sport; it is the corporation that is related to a sport, that controls the major professional players, but it is *not* the sport itself. You wanted to find similar items, and so Google brought up similar websites; NBA, NHL, NFL, NASCAR, ESPN... none of these are sports, but the webpages of businesses EXACTLY LIKE MLB. Google worked as advertised.

      If you want a search for baseball, then do a search for baseball, not the MLB.

      --
      -- If god wanted me to have a sig, he'd have given me a sense of humor.
    23. Re:Can someone say "Bad Idea Jeans"? by Anonymous Coward · · Score: 0
      It's good to be able to let loose and say what you really feel sometimes!

      Agreed.
      Goatse forever!

    24. Re:Can someone say "Bad Idea Jeans"? by TCM · · Score: 0, Offtopic

      You probably bookmark your A/C posts so you can slip back and check them.

      Eh, there are people that don't do this? Not doing it would be like admitting you don't actually want to participate in a discussion but rather just troll and fire-and-forget remarks.

      AC is not automatically troll or don't-bother-reading. Not for me at least.

      --
      Of course it runs NetBSD. BTC: 1NT7QvbetmANwaMzhpVL6
    25. Re:Can someone say "Bad Idea Jeans"? by Anonymous Coward · · Score: 0

      Google worked perfectly, isn't it obvious? MLB is not a sport; it is the corporation that is related to a sport, that controls the major professional players, but it is *not* the sport itself. You wanted to find similar items, and so Google brought up similar websites; NBA, NHL, NFL, NASCAR, ESPN... none of these are sports, but the webpages of businesses EXACTLY LIKE MLB. Google worked as advertised.

      If you want a search for baseball, then do a search for baseball, not the MLB.


      That is the grandparent's point. Any site has many aspects or angles from which you can view it. The software can't know what you think of mlb.com. Only a person can really understand why the original link was made.

    26. Re:Can someone say "Bad Idea Jeans"? by Anonymous Coward · · Score: 0

      Sorry, my copy of Peridot flags this link as "wholly unsuitable content".

    27. Re:Can someone say "Bad Idea Jeans"? by Anonymous Coward · · Score: 0

      I keep reading that as "Bad Idea Jesus". What would Bad Idea Jesus do?

    28. Re:Can someone say "Bad Idea Jeans"? by SphericalCrusher · · Score: 1

      Heh, and wouldn't people have to adopt the use of that before links on their server aren't "broken"? I'm sure not everyone is going to want to do that...

      --
      "Instant gratification takes too long." - Carrie Fisher
    29. Re:Can someone say "Bad Idea Jeans"? by GigsVT · · Score: 1

      I'm not saying the idea is good, only that his example wasn't the best way to illustrate things.

      --
      I've had enough abrasive sigs. Kittens are cute and fuzzy.
    30. Re:Can someone say "Bad Idea Jeans"? by Anonymous Coward · · Score: 0

      That's a low-down dirty lie!

      Wait...

      ...crap.

    31. Re:Can someone say "Bad Idea Jeans"? by Jotham · · Score: 2, Interesting

      This goes off the assumption that it just uses the link name or address to find the page.

      Basically when a page is indexed by a search engine such as google, the first step is to create a document vector from the document based on the repetition of words (terms) and how common these words are (ie. list of TFIDF values -- term frequency * inverse document frequency).

      Anyway, this document vector is what is compared against by the search engine to find matches (which is how google can return results in 0.14 seconds). It also acts as an extremely good (and small) statistical representation of the document.

      I'd put money on the fact that the program calculates and keeps a record of the document vectors of these linked pages. When re-run, it then can tell how much a linked page has changed from last time it was checked (as mentioned in the article) and if a page has moved it can refind it with a lot of precision. (finding a 100% match if the page has just been moved but the content hasn't changed at all).

      Quite clever, and if this was used, you could basically totally reshuffle your web's directory structure, re-run the tool, and have all your links back in place...
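
      To make the document-vector idea concrete, here is a small pure-Python sketch. It is a guess at the general technique, not Peridot's actual algorithm: tfidf_vectors and cosine are illustrative names, tokenizing is just lowercase whitespace splitting, and the IDF is a smoothed variant (an assumption, chosen so terms that appear in every page still carry some weight).

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF (an assumption) keeps weights positive for common terms.
        vectors.append(
            {t: count * (math.log((1 + n) / (1 + df[t])) + 1) for t, count in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 1.0 means same direction."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0
```

      With stored vectors for each linked page, re-finding a moved page is a matter of computing vectors for candidate pages and picking the one with the highest cosine score; a page that moved without changing should score about 1.0.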

    32. Re:Can someone say "Bad Idea Jeans"? by Frizzle+Fry · · Score: 1

      You post as AC because you don't want to track your posts and "see how they do", but there's a reply to one of your posts and you've replied to it within four minutes. Hmmm.

      --
      I'd rather be lucky than good.
  2. Oh ya... by Sloh_One · · Score: 0

    And I'm sure I'll be making use of this technology alongside my copy of Duke Nukem Forever

  3. Great by gowen · · Score: 5, Insightful
    Peridot could lead to a world where there are no more broken links,
    ... just links that don't go where the author intended. Gee, that's ... just great.

    Hang on. On similar lines, I've a great idea. Suppose I type a nonexistent hostname into my browser. Wouldn't it be good if the DNS server just gave me its best guess instead of an error message? Or some kind of Site Finding search engine. That'd be even better than ... :)
    --
    Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
    1. Re:Great by rokzy · · Score: 4, Insightful

      I agree it's a bad idea and is imo looking in the wrong direction.

      I want things to be LESS tolerant of mistakes, not more. this is why the web is so fucked up. when people can get away with absolute shit, why produce anything better than shit?

    2. Re:Great by waterford0069 · · Score: 3, Funny
      Meanwhile at Verisign...

      "Hey guys, we have grass roots support, check out slashdot!"

    3. Re:Great by tannnk · · Score: 3, Informative

      Hey... We've had this kind of feature on the internet before

      --
      T!
    4. Re:Great by TheKeeper · · Score: 1

      check your browser settings, i seem to remember IE having this ability on its own.
      and i know i've seen a plugin for moz somewhere.

    5. Re:Great by Jim+Hall · · Score: 1

      check your browser settings, i seem to remember IE having this ability on its own. and i know i've seen a plugin for moz somewhere.

      Perhaps you are thinking of Internet Keywords:

      How does it work?

      The Location bar (Bugzilla component Browser:Location Bar) takes user input and converts it into a URL. If the user provides an absolute URL, the browser will get that page. In practice, this is very rare, so the URL bar uses a series of smart parsers to convert the user-typed string to a complete URL and retrieve the guessed page. The details of this URL resolution process are beyond the scope of this document. Suffice to say, this works most of the time.

      Internet Keywords supplements the URL bar parsers in some cases. For example, words with spaces will go directly to Internet Keywords. Also, Domain Guessing is replaced by Internet Keywords. For example, "mozilla" will not be expanded to "http://www.mozilla.com"

      Internet Keywords is turned on via pref("keyword.enabled", true), which can be set in Preferences | Navigator | Smart Browsing | Internet Keywords.

      If so, the URL bar sends the text to the keyword: protocol handler, which sends the text to the Internet Keyword server, by URL encoding the string at the end of the keyword URL set in: (pref("keyword.URL", )

      The server response is displayed as the page, which could be anything, but is usually one of two things:

      1. A re-direct to the "correct" page that the server thinks you wanted
      2. A page.

      I haven't used it, but it looks interesting.

  4. I liked this one: by underpar · · Score: 5, Funny

    A team in the Netherlands built an application that listens to contact centre conversations, picks out relevant keywords and automatically prompts the call centre agent with possible answers.

    Does this app take the form of a paper clip? Because that would be a great idea!

  5. Semantic Web? by jarich · · Score: 5, Insightful

    Wouldn't this idea work a lot better with semantic web markup attached to links and also to intranet pages?

    1. Re:Semantic Web? by dwerg · · Score: 1

      Semantic web functionality could be easily integrated into the algorithm that decides if the pages are the same.

    2. Re:Semantic Web? by Milo+Fungus · · Score: 1

      Parent poster is exactly right. The semantic web is designed exactly for just this kind of thing, and would drastically reduce the amount of computing power needed to do it.

      For a good discussion of the semantic web, and why we need to get going and build it, read the relevant chapter in The Unfinished Revolution by Michael Dertouzos. I didn't quite understand what Tim Berners-Lee was getting at when he described the semantic web in Weaving the Web. Dertouzos explains it better, I think.

      I had an idea earlier this week about broken links. I use Amaya as my primary word processor, and I use hyperlinks to connect related documents. But directory structures change over time as different areas change in importance. For instance, it may have made sense to keep your financial aid status documents in the 'School' folder during the summer, but after the semester begins that folder fills up with lecture notes. Then those lecture notes are partitioned into child folders for separate subjects. Maybe now you want to move your financial aid documents to a child folder called 'Financial Aid'. It would be really nice if the application (or the operating system) kept track of changes to the directory structure and updated the link URLs in the documents. Perhaps it could leave a pointer in the old place for a certain amount of time, just in case the change is only temporary.

      It shouldn't matter where the document is - on my local machine in a certain folder, on a removable disk, on a network share, or on the Web. Things get trickier when the document is on the web, which is where this technology could help.

    3. Re:Semantic Web? by Doctor+O · · Score: 1

      Yes, it would, but simply stripping all tags works with both kinds of markup, semantic as well as presentational. Work with what's left, and it almost doesn't matter.

      Mind you, I'm all for the semantic web and build sites accordingly, but people are hyping it for the wrong reasons. It *is* great, but it's for accessibility and brings separation of content and design with it which makes for helluva lot of time savings when updating those pages.

      --
      Who is General Failure and why is he reading my hard disk?
  6. well by Anonymous Coward · · Score: 5, Funny

    I think the link is broken... :)

    1. Re:well by TopShelf · · Score: 1

      Sounds like a job for this guy...

      --
      Stop by my site where I write about ERP systems & more
    2. Re:well by DrEldarion · · Score: 1

      I wonder, if we slashdot a server, does Peridot kick in while the server is down?

  7. No more broken bookmarks... by greppling · · Score: 2, Insightful

    ...would be good enough for me. I find it really annoying how many of the bookmarks I don't use often are broken after about a year or so.

    1. Re:No more broken bookmarks... by bizpile · · Score: 1

      ...would be good enough for me. I find it really annoying how many of the bookmarks I don't use often are broken after about a year or so.

      How many bookmarks do you actually use after a year? Most stuff I bookmark is irrelevant long before it gets a chance to be broken.

    2. Re:No more broken bookmarks... by TrentL · · Score: 2, Insightful

      Why would that be a good thing? If the page I originally bookmarked is gone, I want an error message, not a redirection to something similar.

    3. Re:No more broken bookmarks... by julesh · · Score: 1

      How about an error message and a pointer to something similar? That's one of the options that this software provides, if you RTFA.

    4. Re:No more broken bookmarks... by smagruder · · Score: 1

      I run this software weekly. It's the best (and free) tool I've found for telling me the state of my links.

      --
      Steve Magruder, Metro Foodist
    5. Re:No more broken bookmarks... by mce · · Score: 1
      I tend to find sites that interest me but that I have no time to read at that specific moment (yes, I'm abnormal: an IT geek with non-IT interests that are not subject to the fallout of Moore's law). So I bookmark them in a special "to be processed" category. As I'm chronically overworked, on the average I add more pages to this list than I actually check out to classify elsewhere or remove. But I surely do every so often spend an evening researching some specific topic purely out of interest and diving into this "to be processed" list. When that happens, I select the links that are related to the topic at hand, no matter how long they have been in the list.

      So a delay of a year or more is not unusual for me. In fact, the oldest item in my current list that I have explicitly not yet removed for being no longer relevant anyway has been on there for about three years now. Someday I will look at it (if the page still exists, at least (it does now, I just checked)).

      A long time ago, browsers like Mosaic had a nifty feature that allowed you to check your entire list of bookmarks for validity with only a single mouseclick. I really wish Mozilla would support this.

  8. And sure, Verisign could operate this like DNS by Anonymous Coward · · Score: 1, Insightful

    For the good of the internet..

  9. What if the page is deleted, not changed by alta · · Score: 4, Insightful

    My biggest problem is when I follow a link to a website that's no longer there. Yeah, moved pages happen, but I don't think they happen as often as deleted pages, expired domains, deleted websites, etc.

    --
    Do not meddle in the affairs of sysadmins, for they are subtle, and quick to anger.
    1. Re:What if the page is deleted, not changed by l810c · · Score: 1
    2. Re:What if the page is deleted, not changed by Anonymous Coward · · Score: 0

      That's why I use a proxy (Privoxy) which has its own 404 error page that redirects me to archive.org. It took ten seconds to code and is very useful.

      If you use privoxy, put the following in your no-such-domain template file:
      <meta http-equiv="refresh" content="2; url=http://web.archive.org/web/*/@hostport@@path@" >

    3. Re:What if the page is deleted, not changed by troon · · Score: 4, Informative

      That's what the 410 Gone HTTP response header is for. If only admins would use it more...

      --
      Ydco co ,df C erb-y go. a Ekrpat t.fxrapev
  10. Take this with a grain of salt by blankman · · Score: 5, Insightful

    This sounds a little like SiteFinder from Verisign. Click a broken link and instead of a helpful error message you get whatever content IBM thinks is appropriate. Certainly this could be useful, but it could also end up as just another vehicle for advertising.

    1. Re:Take this with a grain of salt by liquidsin · · Score: 2, Informative

      The big difference here is that in the case of SiteFinder, Verisign had control over where you ended up for basically the entire internet. This seems like it would be the type of thing that would run as an Apache mod that would get invoked when a 404 gets returned, and so would only affect that particular site. There's a big difference between going to www.linuxdistro.org/whats_new.html and getting redirected to www.linuxdistro.org/whatsnew.php like this would probably do, and going to www.linuxdistor.org (typo intentional) and having Verisign redirect you to www.microsoft.com because they're getting paid to advertise.

      --
      do not read this line twice.
    2. Re:Take this with a grain of salt by julesh · · Score: 2, Informative

      If you read the article (the BBC one, which is the only link in there with any relevant information) you'll find that's not how it works. It alerts the webmaster and suggests a replacement, rather than randomly "fixing" other people's pages.

  11. Not Entirely New by terrencefw · · Score: 3, Informative
    I've seen lots of sites that return search results based on bits of the broken link instead of 404s.

    Suppose you have broken link http://somesite.com/foo/bar.html, some sites return a list of search results from within 'somesite.com' matching 'foo' or 'bar'. Quite clever, and much more useful than a plain old 'page not found' error.

    This just takes that one step further by doing the searching at the referring end instead.
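
    The "bits of the broken link" trick is simple enough to sketch. This assumes the site just splits the URL path into words and feeds them to its own search; search_terms is a made-up name and no particular site's implementation is implied.

```python
import re
from urllib.parse import urlparse


def search_terms(url):
    """Pull candidate search words out of the path of a broken URL."""
    path = urlparse(url).path
    terms = []
    for segment in path.split("/"):
        # Drop a trailing file extension such as ".html" before splitting.
        stem = segment.rsplit(".", 1)[0] if "." in segment else segment
        # Split on anything that is not a letter or digit (underscores, dashes...).
        terms.extend(t.lower() for t in re.split(r"[^A-Za-z0-9]+", stem) if t)
    return terms
```

    For the example above, search_terms("http://somesite.com/foo/bar.html") gives ['foo', 'bar'], which is what the site would then search for.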

    --
    Like tinyurl, but one letter less! http://qurl.co.uk/
    1. Re:Not Entirely New by Anonymous Coward · · Score: 0
      and much more useful than a plain old 'page not found' error


      Perhaps it is more useful to a human, but not to a script that would find a "404" more useful.

      These types of solutions (ex: SiteFinder) lead to ambiguity. Ambiguity is not a "Good Thing" (tm)

  12. "New technology" by SpooForBrains · · Score: 0

    Correct me if I'm wrong, but couldn't the majority of this New Exciting Technology (TM) be achieved with a small amount of Bash scripting and a copy of wget?

    --
    "The dew has clearly fallen with a particularly sickening thud this morning"
    1. Re:"New technology" by Anonymous Coward · · Score: 0
      Sure, just like Linux could be achieved with a small amount of C and some ASM.

      The trick is to, you know, do it.

  13. The Slashdot use? by makomk · · Score: 3, Interesting
    I actually considered whether it would be possible to write some code to detect if linked-to content has been replaced. The reason I was interested was to make it impossible for someone to put up a copy of a slashdotted page, link to it in a posting, and then substitute it for a copy of goatse once they'd been moderated up.

    I decided it'd be too hard for software to decide whether a change was significant. I wonder how this software does it - presumably, you can change the threshold?

    1. Re:The Slashdot use? by Anonymous Coward · · Score: 0

      My favourite is the script that randomly sends visitors to goatse.cx. The reactions of the people replying to the message!

      Mod parent down, it's goatse.cx!

      • Mod parent down, mod grandparent up, it's not goatse.cx!

        • Mod parent down, mod grandparent up, mod great grand parent down, it's goatse.cx!

        • Mod parent down, mod grandparent up, mod great grand parent down, mod great great grand parent up, it's not goatse.cx!

  14. More info... by Guano_Jim · · Score: 2, Funny

    You can read more information about this process here.

  15. worrying by TwistedSpring · · Score: 4, Insightful

    "Peridot could lead to a world where there are no more broken links". Yes, it could. Peridot could also lead to a world where broken links are not manually and intelligently spotted and repaired, but automatically repaired. Automatic resolution of what a link "ought" to point to is never going to be accurate (look at search engines), and could make a company website a minefield of confusion and frustration for the user.

    Only time will tell, I suppose.

    1. Re:worrying by FearUncertaintyDoubt · · Score: 2, Insightful

      Imagine a case where a broken link is pointed to another link, which later itself becomes a broken link, and so on...it might even be possible that somehow the chain loops back on itself at some point. One thing I've realized in my career is that if you handle an error too gracefully, no one bothers to fix it. I prefer to have errors cause enough of a problem that there is feedback to fix it.
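
      The looping-chain case can at least be detected. A rough Python sketch, assuming the automatic replacements are recorded as a simple old-URL-to-new-URL mapping (that mapping and the function name are hypothetical, nothing here comes from the tool itself):

```python
def resolve_chain(start, replacements, max_hops=10):
    """Follow recorded link replacements starting at `start`.

    Returns (last_url, status) where status is "ok", "loop" or "too_long".
    `replacements` maps an old URL to the URL it was replaced with.
    """
    seen = {start}
    current = start
    hops = 0
    while current in replacements:
        if hops >= max_hops:
            return current, "too_long"
        nxt = replacements[current]
        if nxt in seen:
            # The chain points back at something already visited.
            return current, "loop"
        seen.add(nxt)
        current = nxt
        hops += 1
    return current, "ok"
```

      Anything that comes back as "loop" or "too_long" is the kind of thing that should land in front of a person rather than be patched over again.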

  16. Can I just have a web site that, you know, works? by grasshoppa · · Score: 2, Insightful

    Websites need to be useful before I start caring about broken links. I can think of any number of sites that started off with the best of intentions, but never quite live up to being useful.

    From bad layout, to missing options, to obscure names for common links, it seems that people are actively trying to hide crap from the end user, making their website utterly worthless.

    Can we devise a tool that fixes this problem first?

    --
    Mod me down with all of your hatred and your journey towards the dark side will be complete!
  17. Well that sounds perfectly dreadful by Illserve · · Score: 5, Insightful

    Some algorithm cruising through my website, rearranging files as it sees fit?

    Sounds like a recipe for utter disaster in the worst case, and a source of mildly embarrassing incidents at best.

    How about this algorithm just report dead links to a human instead of trying too hard to be clever?

    This sounds like someone had to come up with a final project, and settled on this one.

    1. Re:Well that sounds perfectly dreadful by Anonymous Coward · · Score: 1, Insightful

      There are plenty of tools that do that already. See, for instance, Site Valet.

      Even so, there are an awful lot of websites that people just can't be bothered to maintain. Maybe the development was outsourced, maybe the maintenance guy is on another project, or on holiday, and the owners don't fancy having a link to a porn site active for 2 weeks or more. So there's a case for correcting it automatically.

      You don't want it on your site? Don't use it. No-one's forcing you.

    2. Re:Well that sounds perfectly dreadful by Anonymous Coward · · Score: 0

      Some algorithm cruising through my website, rearranging files as it sees fit?

      Sounds like a recipe for utter disaster in the worst case, and a source of mildly embarrassing incidents at best.

      How about this algorithm just report dead links to a human instead of trying too hard to be clever?


      This isn't such a bad idea. AI has come a long way. The robot servants made these days are incredibly smart and seem almost human. I'm sure that the algorithm can figure a silly little website out.

      Wait, what's that? No intelligent servant robots? Well, what do we have? Clippy and Aibo?

      Hmmm, maybe you're right.

    3. Re:Well that sounds perfectly dreadful by Anonymous Coward · · Score: 0

      Well, I'm the author of an open source link management tool that is (was?) heading in this direction. Not talking about changing the links, but about an integrated feature of a CMS (Plone) that manages, in a sane way, to tell authors, make suggestions, and otherwise stay out of the way.

  18. yawn by FinestLittleSpace · · Score: 2, Insightful

    Look, I'm not being a troll or flamebait or whatever, but seriously, I've had enough of this fucking pipedream chasing crap that gets posted to BBC News and then swiftly chucked up on Slashdot.

    The whole BBC News Technology section reminds me of the 'Tomorrow's World' program when it was in full swing, saying how everything could be 'the next big thing' and that we'd likely see it on shop shelves and in every home 'in a year'. Why do these people never learn that so much of this is just press release bullshit?

    I'm gonna rant more, so sorry. It's just ridiculous... why do we always have to have these bloody 'next big things'? Why can't people actually look at things rationally (say, as a geek not a mother of 2 who's never touched a computer) and think 'shit, that's not gonna get very far'.

    All these posts are so endlessly flawed, yet we still get news items with titles like 'THE END OF SPAM FOREVER?', 'LINUX COULD KILL WINDOWS IN A MONTH', 'COULD WE SAY GOODBYE TO THE INTERNET AS WE KNOW IT?'... it's all sensationalist bullshit that *might* happen, but not in the press-release inflated year like they always claim.

    1. Re:yawn by louisfreeman · · Score: 1
      Could it be that the press is a sensationalist bunch out to get as many hyped headlines out as possible ?

      'shit, that's not gonna get very far'.

      :-) I'll force myself to use that line every day for the next week.

    2. Re:yawn by FinestLittleSpace · · Score: 1

      But.. it's understandable... you can barely rely on the BBC News site for 'accurate' tech news.. the point is more that slashdot people submit it in the same bloody way... and the moderators don't go 'hmmm.. that's not gonna go down well with cynical geek scum'...........

      grr

    3. Re:yawn by WIAKywbfatw · · Score: 1

      Yeah, because the science and technology sections of so many other news gathering organisations are so superior to the BBC's, aren't they?

      Listen, let me explain this in simple terms: BBC News caters to a wide audience made up of mainly lay people and, as such, it pitches its articles accordingly. It's not New Scientist, Nature, The Lancet or whatever academic publication that's on your reading list and it doesn't pretend to be. It doesn't try to blind its readers with science because its readers aren't all PhDs with specialist knowledge of every field. It just delivers the basic facts in a manner that the average man on the street can comprehend. And that, my friend, is a very good thing.

      Take any news story and you could pitch it on so many different levels and in so many different directions. A plane crash can be a human interest story, an engineering story, a health and safety story, an insurance story or an investment story. The same facts can be slanted so many ways, and that's before you start to compare and contrast the weighty analysis of a broadsheet to the relative throwaway analysis of a tabloid.

      Criticising BBC News for having articles that are easily digested and understood by its readership is like criticising MTV for showing music videos. Both are simply trying to give their target demographic what they want and need.

      If you want more in-depth analysis of every story then go get it. Go look up the details yourself or go find them on another site. As a starting point, the "Related Internet Links" provided in the right-hand column of every BBC News Online story should be a good starting point: or is that something else the BBC isn't doing to your satisfaction?

      Frankly, it seems to me that you're having a good bitch at the BBC simply because you want to have a good bitch at the BBC. After all, it's not like the BBC has any control over what stories make the frontpage of Slashdot or any other site that you happen to read.

      Frankly, I find it ironic when the majority of people don't even bother to RTFA that you're bitching about articles that don't provide enough detail but if you really want to see story submissions that are aimed at people that have degree- and doctorate-level understanding of the subject material, and who have the time to read it all, then why don't you start submitting your own stories and see how many make the cut.

      In simple terms (or as BBC News Online might put it), if you don't like it then why not try doing something about it yourself?

      (Oh, and by the way, you are being a troll or flamebait or whatever. You might like to think that you're not, but you are.)

      --

      "Accept that some days you are the pigeon, and some days you are the statue." - David Brent, Wernham Hogg
    4. Re:yawn by FinestLittleSpace · · Score: 1

      I agree wholeheartedly... It wasn't a rant at the BBC... it was a rant at the slashdot 'ooo lets post it and not change a thing' attitude... I don't see why someone can't put a more realistic headline on what is supposed to be a more technically apt and trained website...

  19. And... ? by Doesn't_Comment_Code · · Score: 4, Insightful

    Maybe I'm being overly naive, but checking for broken links doesn't seem all that spectacular to me. It wouldn't take long to write a script to find all the broken links on a page.

    The only parts that seemed worthwhile are replacing the links automatically, and testing if links are relevant.

    I'm not so sure I'd trust a computer to do those things though. I'd much rather have the links flagged and checked by a human.
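
    To back up the "wouldn't take long" claim, here is roughly what such a script could look like in Python, standard library only. It is a sketch under assumptions: it only looks at <a href> links, it treats any status of 400 or above (or an unreachable host) as broken, and it just reports the links so a human can check them. The names LinkCollector, http_status and broken_links are made up for the example.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def http_status(url, timeout=10):
    """Return the HTTP status for `url`, or None if it cannot be reached."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        return err.code  # the server answered, e.g. with 404 or 410
    except OSError:
        return None  # DNS failure, refused connection, timeout, ...


def broken_links(html, base_url, fetch_status=http_status):
    """Return (url, status) pairs for links a human should look at."""
    parser = LinkCollector()
    parser.feed(html)
    report = []
    for href in parser.links:
        url = urljoin(base_url, href)
        if not url.startswith(("http://", "https://")):
            continue  # skip mailto:, javascript: and friends
        status = fetch_status(url)
        if status is None or status >= 400:
            report.append((url, status))
    return report
```

    The fetch_status parameter is only there so the checker can be tried out without touching the network. Note that this flags links and nothing more, which is the part I'd trust a computer with.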

    --

    Slashdot Syndrome: the sudden, extreme urge to correct someone in order to validate one's self.
    1. Re:And... ? by pipingguy · · Score: 3, Informative


      It wouldn't take long to write a script to find all the broken links on a page.

      Just use Xenu's Link Checker.

    2. Re:And... ? by julesh · · Score: 2, Insightful

      It doesn't only find broken links -- it also alerts you if the content changes substantially. This sounds very useful to me.

  20. CMS by Anonymous Coward · · Score: 5, Insightful

    Any good Content Management System should already take care of any internal broken links automatically, or notify the webmaster so he'll be able to take care of it manually (in the case of page deletion, etc).

    The only kind of people who'd go out of their way to use this software probably already use some sort of CMS.

    1. Re:CMS by flibberdi · · Score: 1

      So, what are they using at dmoz?? There are tons of 403's in there. Not that I use dmoz, but it's supposed to be "serious". What other serious directories exist?

  21. It will work, but that isn't good, here is why by tod_miller · · Score: 5, Insightful

    A link points to document X.

    If document X moves, and the link is invalid, a search for the link might actually find document X, and therefore, you have your benefit, and you would have saved a 404.

    However - if a document becomes deprecated and deleted, then how can you assume the link is valid?

    Or indeed, if the document has no relevant substitute.

    A genealogy providing a link to another William Wallace wouldn't be good news if the original page went missing.

    A better system is automated 404 alerting to the web server's administrator.

    A bad link gets hit, bam, what document, from where. You can work things out intelligently, not automatically.

    I think this is silly, perhaps grasping at straws, I see no reason why we would replace all our links to google 'I feel lucky' searches, so why do something like this?

    This is the essence of what they have, and all they have done is clouded the search IP field (which is important) with 2 more patents, again increasing costs and endangering open source innovation, the true innovative playing field.

    Of course, I could be wrong.

    --
    #hostfile 0.0.0.0 primidi.com 0.0.0.0 www.primidi.com 0.0.0.0 radio.weblogs.com
    1. Re:It will work, but that isn't good, here is why by rastos1 · · Score: 1
      However - if a document becomes deprecated and deleted, then how can you assume the link is valid?

      Because it is in different color/it is in different font/changes mouse cursor on hover/ ...

    2. Re:It will work, but that isn't good, here is why by Phisbut · · Score: 1
      This is the essence of what they have, and all they have done is clouded the search IP field (which is important) with 2 more patents

      Does anyone know the numbers of those patents? I'd actually be interested in reading them... to see if what my company is developing right now will be infringing...

      --
      After 3 days without programming, life becomes meaningless
      - The Tao of Programming
    3. Re:It will work, but that isn't good, here is why by Halo1 · · Score: 1

      You'd better not read them, or you could be hit with treble damages for "wilful infringement".

      --
      Donate free food here
    4. Re:It will work, but that isn't good, here is why by GeorgeH · · Score: 2, Informative

      Ideally, cool URIs don't change, but in the real world they do.

      If document X moves and the link is invalid, you should be serving an HTTP 301 Permanent Redirect and well behaved user agents will update their bookmarks, and well behaved content management systems will update their code. If document X is gone, you should be serving an HTTP 410 Gone.

      Ideally, 404 is supposed to mean that the web server has never heard of the file in question before, but in the real world...
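
      As a sketch of what a "well behaved" link manager could do with those codes (Python, with made-up action names; an illustration of the idea above, not any particular CMS's behaviour):

```python
def action_for_status(status, location=None):
    """Map an HTTP status for a linked URL to a (action, new_url) pair.

    Actions are hypothetical labels: "keep", "update", "remove", "review".
    `location` is the Location header value that came with a redirect.
    """
    if 200 <= status < 300:
        return ("keep", None)
    if status == 301 and location:
        # Permanent redirect: safe to rewrite the link to the new home.
        return ("update", location)
    if status == 410:
        # Gone: the server says the document was removed on purpose.
        return ("remove", None)
    # 404 and everything else is ambiguous, so leave it to a human.
    return ("review", None)
```

      Only the 301 and 410 cases can be acted on mechanically. Since most servers answer 404 for everything, most links would end up in the "review" pile anyway.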

      --
      Why can't I moderate something "Wrong" or at least "Grossly Misinformed"?
    5. Re:It will work, but that isn't good, here is why by rho · · Score: 2, Interesting
      This isn't a one-time, forever-and-ever-amen technology. You start with an automatic link-checker and link-fixer. Then you add features like "list all the changes so an editor can filter the results", then you move to "direct potential changes to a team of experts", and so forth and so on. The idea is pretty good. When you're a Big Company with a huge website with thousands of links, having this automagic tool is a lot better than having (unprofessional) dead links.

      I, personally, hate dead links with a passion. And, usually, I can devise a Google search that will give me the new home of the old link--often nothing's changed other than the server. A tool that does this for me is useful. Sure, there's plenty of issues that need to be looked into, but that's what we used to call "the Next Version".

      It's easy to nitpick this. Seeing the technology, and then seeing how it can be improved in its next iteration is what separates a visionary from a Slashdot howler monkey.

      --
      Potato chips are a by-yourself food.
    6. Re:It will work, but that isn't good, here is why by tod_miller · · Score: 1

      Some sense man! :-)

      If the world actually used 403/404/410 and 301 it would indeed be a better place!

      Perhaps a neat fix for Apache? If a file is deleted from the system, but it *knows* it was previously served, automagically slap up a 410.

      --
      #hostfile 0.0.0.0 primidi.com 0.0.0.0 www.primidi.com 0.0.0.0 radio.weblogs.com
  22. Obligatory RTFA by acvh · · Score: 4, Insightful

    this isn't about replacing links on the internet as a whole... it's about replacing links on your company website, or at least reviewing those links.

    not everything that happens in the world is an attempt by big brother to steer internet traffic to verisign or microsoft.

    1. Re:Obligatory RTFA by Anonymous Coward · · Score: 0

      not everything that happens in the world is an attempt by big brother to steer internet traffic to verisign or microsoft.

      That's right -- the third option is that Amazon is trying to patent something!

    2. Re:Obligatory RTFA by dwerg · · Score: 1

      Actually in this case it's IBM patenting 2 things.

      Still, the sad part about this article is that because of the patents and stuff like that, the really interesting part, '_how_ does it work?', can't be answered.

  23. Not New at All by Nazmun · · Score: 3, Funny

    Spyware/Adware and IE already give you search results and links. The only difference is that this automatically places you at a different link without a choice.

    --
    Hmmm... Pie...
    1. Re:Not New at All by Nazmun · · Score: 1

      This wasn't really for humor... such spyware already exists and this is almost mimicking it.

      --
      Hmmm... Pie...
  24. Sorry by Anonymous Coward · · Score: 0

    You are the weakest link. Goodbye!

  25. Slashdotted by genneth · · Score: 4, Funny

    Damn you slashdotters!!! I work at IBM and the intranet server is down! I can't believe you've managed to cause the automatic load-balancer to kill the intranet in favour of a slashdot article.

    Damn you!!

    And purple hatstands

    1. Re:Slashdotted by trilks · · Score: 1

      Fixing broken links, that's a good idea. However, a better one might be to figure out a way to avoid getting slashdotted, as their site will soon be.

      --
      You won't hate yourself in the morning if you don't get up before noon.
    2. Re:Slashdotted by dwerg · · Score: 1

      How am I going to find the sametime address of that cute intern now?

    3. Re:Slashdotted by Anonymous Coward · · Score: 0

      You're half right... I've worked with some cute IBM interns, but they were never programmers...

    4. Re:Slashdotted by Anonymous Coward · · Score: 0

      You haven't been working there long, have you? That site is nearly always down... just about as reliable as SINE...

  26. Ahh... the good days at Big Blue by firefarter · · Score: 1

    The story reminds me of my diploma, which I wrote at IBM Germany. Ah, the good days....

    I have never worked with people like that - highly skilled and very friendly and approachable. As a group very concentrated, but very relaxed as individuals.
    Wouldn't be an exaggeration to say that that experience has defined what I view as professionalism.

  27. In other developments.... by tod_miller · · Score: 2, Interesting

    some over funded jumped up interns have developed a high tech, method and software and system to stop the slashdot effect.

    Each webserver will return a redirect to a google cache lookup for itself if the server load gets too high.

    1: Stupid idea
    2: Patent
    3: Wait 'til someone nudges at your generously worded patent
    4: happily license this unrelated technology to keep their VC peeps in the green.

    --
    #hostfile 0.0.0.0 primidi.com 0.0.0.0 www.primidi.com 0.0.0.0 radio.weblogs.com
    1. Re:In other developments.... by julesh · · Score: 1

      Unfortunately, this is happening in Europe, and (at least at present) such ideas aren't patentable in Europe.

  28. Simple solution by mirko · · Score: 3, Insightful

    ErrorDocument 404 script.pl
    Where script.pl parses the wanted URL and asks an indexing engine to find the most relevant page associated with the query...
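
    The "parse the wanted URL" half might look something like this. It's written in Python rather than Perl purely for illustration, the ignore list is an arbitrary guess, and the indexing engine itself is left out since that depends entirely on what you have installed:

```python
import re
from urllib.parse import unquote, urlparse

# Path fragments that say nothing about the content (an arbitrary guess).
IGNORED = {"html", "htm", "php", "asp", "index", "www"}


def search_terms(missing_url):
    """Turn the path of a missing URL into a list of search keywords."""
    path = unquote(urlparse(missing_url).path)
    words = re.split(r"[^A-Za-z0-9]+", path.lower())
    return [word for word in words if word and word not in IGNORED]
```

    So a request for http://somesite.com/foo/bar.html would hand the keywords foo and bar to whatever search you run over your own pages.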

    --
    Trolling using another account since 2005.
    1. Re:Simple solution by julesh · · Score: 1

      ErrorDocument 404 script.pl

      That would be very useful if I could persuade everyone I link to to do it. However, since I can't, a solution that runs on the server where the links reside, not the linked content, is much more useful.

  29. Vulnerability? by darkmeridian · · Score: 2, Informative

    Remember Google-hacks at http://johnny.ihackstuff.com/? Basically, since Google effectively snoops millions of servers, you can use this information to break into servers and get information. Having an internal feature that connects broken links to real pages may be orders of magnitude worse. What if I imaginatively "linked" to a made-up URL to see what's on your servers? This could be bad news if it's effectively done.

    --
    A NYC lawyer blogs. http://www.chuangblog.com/
  30. Useful for science papers? by AndroidCat · · Score: 1

    How about if I type up a large scientific paper loaded with non-existent links, and then their software will "fix" it by finding the proper material out on the net and pointing the links to it. This could revolutionize the science of hand waving!

    --
    One line blog. I hear that they're called Twitters now.
  31. Prior Art by Bill+Dimm · · Score: 2, Insightful

    They think there is something to patent here? Seems like there should be prior art all over the place. At MagPortal.com we have been using software to repair our links to articles for years.

  32. Rather than finding new relevant links... by Lifix · · Score: 2, Insightful

    How about this: let's find a better way to eliminate bad links. Have a bot scan your company's web site, and every time it finds a link to an outside source, save that page to your servers. If the link gets screwed up, you can replace it with a link to the saved web page on your server until you can do something about it.

    This would not work with large web sites, but if it is just a link to a how-to guide or something small like that this would work.

    --
    In nature, there are neither rewards or punishments, there are only consequences.
    1. Re:Rather than finding new relevant links... by stratjakt · · Score: 1

      Nope, too many potential IP problems.

      Assholes still haven't decided whether having a company's logo.gif stored in my web cache is "copyright infringement" or not.

      Caching pages without permissions certainly is, though.

      My solution: HIRE A FUCKING WEBMASTER WHO KNOWS HOW TO DO HIS JOB. Duh. If the site is too big for one guy to manage, hire two.

      Don't just have a bunch of jackoffs and no one running the site, a la daddypants@slashdot.org, but actually hire someone to do work.

      Yeah, IT jobs are supposed to involve work, not sitting on your ass playing GBA and listening to the BBC on Windows Media Player.

      --
      I don't need no instructions to know how to rock!!!!
  33. How can you tell changed link? by 192939495969798999 · · Score: 1

    How can you tell if the link's changed content is an update that's OK, or an update that's "not ok"? If the tool could do that, it could create a site of links related to whatever, kinda like google, but it sounds like it would be a whole level smarter than google somehow.

    --
    stuff |
  34. No thanks by stratjakt · · Score: 3, Insightful

    I'd prefer a more helpful 404 page, maybe with some links to the homepage or main sections of the site on it.

    Sort of a "cannot find hello.jpg, click here to go back to the main page".

    My point being, if the document I'm looking for is not there, I want to know it's not there. I don't want to read something else, thinking it's what I meant to read.

    Usually when I'm googling around and clicking stuff I'm looking for the answer to some coding or computer related problem. I don't want to click on a link for "configuring Samba 3.0 with AD support", and wind up on a "Configuring Samba 2.2 with LDAP" and waste my time following bad advice.

    --
    I don't need no instructions to know how to rock!!!!
    1. Re:No thanks by Tenebrious1 · · Score: 1

      My point being, if the document I'm looking for is not there, I want to know it's not there. I don't want to read something else, thinking it's what I meant to read.

      Exactly. If I put up a web page with links to other sites, I want people to email me saying "hey this link is broken".

      Does a computer understand satire? If I've linked to a satirical, subtle, pro-abortion piece, is the algorithm smart enough to know that, or will it relink to an anti-abortion site? If I link to a serious anti-war speech made by some actress, is the algorithm smart enough to relink to another similar protest speech, or will it pick up only that the person is an actress and link to some of her movies or reviews of movies? If I've put up links endorsing the 2nd amendment, will the program get faked out by some anti-gun advocate websites which can appear to be pro-gun?



      --
      -- If god wanted me to have a sig, he'd have given me a sense of humor.
  35. Wow by Xoknit · · Score: 0, Offtopic

    This is AMAZING. The intarweb will never be the same.

  36. German readers... by dukoids · · Score: 5, Informative
    may want to take a look at the master's thesis of Nils Malzahn (from 2003, in German) to see (in detail) how this actually can work:

    http://www-ai.cs.uni-dortmund.de/DOKUMENTE/malzahn_2003a.pdf

    Basically, the thesis evaluates different methods to build a kind of "finger-print" of a page. The finger print is used to find the page with google if it is gone, or has changed significantly.

    The internet wayback machine was used to learn distinguishing disappeared pages from pages changing slightly over the time.
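
    To give non-German readers a feel for the general idea, here is a toy fingerprint in Python. To be clear, this is my own simplification and not one of the methods from the thesis: it just takes the most frequent longer words of a page as the fingerprint and measures how much two fingerprints overlap. The stopword list, the word-length cutoff and the function names are all arbitrary choices for the example.

```python
import re
from collections import Counter

# Tiny stopword list, only for illustration.
STOPWORDS = {"that", "this", "with", "from", "have", "were", "been",
             "they", "their", "which", "about", "would", "there", "other"}


def fingerprint(text, size=5):
    """Return the `size` most frequent words of 4+ letters as a fingerprint."""
    words = re.findall(r"[a-z]{4,}", text.lower())
    counts = Counter(word for word in words if word not in STOPWORDS)
    # Sort by descending count, then alphabetically so ties are stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:size]]


def overlap(first, second):
    """Jaccard overlap of two fingerprints (1.0 means identical word sets)."""
    a, b = set(first), set(second)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

    The fingerprint words double as a search query for finding a page that has moved, and a low overlap between the old and new fingerprint of the same URL hints that the page has changed significantly.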

  37. implications by theMerovingian · · Score: 2, Funny


    Wonder where it would send me if www.hotmail.com were down?

    *shudder*

    (disclaimer: no, I didn't actually look to see what's on that site)

    --
    "If you think you have things under control, you're not going fast enough." --Mario Andretti
  38. A better solution from the BOFH by tomhudson · · Score: 2, Insightful
    The BOFH has a better solution http://www.theregister.com/2004/09/17/bofh_2004_episode_31/
    "Ladies und Gentlemans, I present to you... The Newsmaker!" the PFY chirps happily, waving his hand at his squid plug-in.

    "Which does?"

    "Give me a news headline, anything, no matter how ridiculous!"

    "Scientists discover intelligent life in Redmond!"

    >clickety< >clickety< >click< >clickety< >tap< >tap< >clickety< ... >click<

    "Right, now Google for it!"

    I dutifully fire up Google, bash in Redmond and Intelligence, and roger me senseless with a full height drive if the first 10 hits don't show up the headline I've just created, pointing at Time Warner, Yahoo News, all the greats...

    "Interesting - injecting false links into Google to point at news sites. I like it!"

    "Ahem," the PFY interrupts. "Click on one of the links."

    I do so, and grab that hard drive for a second go if the site concerned doesn't come up with the headline in question!

    "You hacked the news site?"

    "Not at all! I used the base idea behind banner blocking to remove the lead headline of a news site and insert my headline instead. You can even add a picture if you want, but obviously only for things that are possible to prove."

    "So will this work for all the news sites listed?"

    "Oh yes. And more importantly, the various search sites as well. So no matter what common search engine you use, the proxy discards the first 100 matches and inserts 100 of its own 'matches' instead."
    Honestly, which of the two is more deserving of patent protection as an "innovative, non-obvious invention"?
    1. Re:A better solution from the BOFH by orst_sw_engr · · Score: 1

      While I think most software patents are crap, just because a patent is obvious doesn't make it a bad patent. Plenty of good patents seem obvious (especially now), like maybe Interchangeable Parts for a Musket Gun.

      I have seen many claims that Ford patented the assembly line; I have never seen a patent number. He did patent things, but is generally credited with the transmission, which is a big patent. I think he would have patented assembly lines if he could have. I think that is what most software is doing... Getting away with patenting the assembly line.

      RSA was not such an obvious method of encryption 20 years ago, but now public keys are everywhere. And I can remember a world without them. Look at it now, it seems so obvious. But it was a deserved patent.

  39. Patent? by Anonymous Coward · · Score: 0

    just how do you patent "wget -r $SITE |grep 404" ?

    if you can, i'm going to patent "/usr/games/fortune -o zippy |cowsay |wall"

  40. pooper hole by Anonymous Coward · · Score: 0

    there's penis in my pooper and it hurts real bad

  41. SED? by Kenja · · Score: 2, Informative

    So, they've invented SED? 'Cause that's what I've been using for years to replace old/broken links. A simple script using the netsaint/nagios service tests can check if a link is still good and then build a list of bad ones to be replaced by script number two using SED.
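
    For anyone who'd rather not do the second step in sed, the same substitution is a few lines of Python. This is a sketch, not my actual scripts: it assumes the list of bad links and their replacements already exists as a dictionary, and it only handles double-quoted href attributes.

```python
import re

# Matches href="..." with double quotes only (a deliberate simplification).
HREF = re.compile(r'(href\s*=\s*")([^"]*)(")', re.IGNORECASE)


def replace_links(html, replacements):
    """Rewrite href values found in `replacements`; leave the rest alone.

    Returns the new HTML plus a list of (old, new) pairs that were applied.
    """
    changed = []

    def swap(match):
        old = match.group(2)
        new = replacements.get(old)
        if new is None:
            return match.group(0)  # not on the bad list, keep as is
        changed.append((old, new))
        return match.group(1) + new + match.group(3)

    return HREF.sub(swap, html), changed
```

    Returning the list of applied changes makes it easy to log what was rewritten, so someone can still review it afterwards.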

    --

    "Have you ever thought about just turning off the TV, sitting down with your kids, and hitting them?"
  42. Patents are ridiculous... by DroopyStonx · · Score: 1

    Perfectly good and useful technology that everyone can use, and some asshat company has to f'n jump on the bandwagon and patent it in hopes to make a fucking dime... give me a break.

    I weep for the future of technology if this is what it's gonna come down to.

    --
    We have secretly replaced these Slashdot mods' sense of humor with a rusty nail. Let's see if they notice!!
  43. Corporations are legal entities. by Anonymous Coward · · Score: 0

    So what does this mean?

    Quite simply, the rules of the English language dictate that saying "IBM have" is WRONG. "IBM HAS" is correct.

    Thanks in advance for stopping this bastardization of the world's most-spoken language.

  44. Re:Can I just have a web site that, you know, work by grasshoppa · · Score: 2

    Alright, I gotta know, how is this a troll?

    I don't mind being modded down, but seriously, I am at a loss here.

    --
    Mod me down with all of your hatred and your journey towards the dark side will be complete!
  45. vaguely similar service exists by AssFace · · Score: 1

    Being bad with names, I can't recall what the service is called or the URL - but I have received e-mail from a service (free since I haven't paid for it, that is for sure) that I thought was spam, but upon reading it saw that it is actually terrifically useful.

    It spiders the web and looks for links that are dead, and then e-mails a contact on that site to say that the links are broken and on what page they can be found.

    Come to think of it, I don't even know what address it sends it to.

    Basically it is all magic to me, but I like it.

    --

    There are some odd things afoot now, in the Villa Straylight.
  46. No more broken links by nebulus4 · · Score: 0

    Wonder what RIAA thinks about it?

    --
    "It would be wrong to refuse to face the fact that everything is fundamentally sick and sad."
  47. Instead by Dr.+Stavros · · Score: 3, Informative

    Just use the W3C's link-checker.

  48. More patents for IBM by ClosedSource · · Score: 1

    "IBM have already filed 2 patents for the project."

    More evidence that IBM isn't really committed to an open/free philosophy.

    1. Re:More patents for IBM by terrox · · Score: 1

      despite this idea already being put into practice by other sites a long time ago too. I've seen various dead links with built in searches and related info.

    2. Re:More patents for IBM by Audacious · · Score: 1

      I actually understand why IBM does this. It is to protect them from other companies coming along afterwards and saying that IBM owes them money because they are infringing on their patents. At the same time I deplore the fact that anyone has to patent such blatantly obvious programming.

      It is like patenting 2+2. I say it equals 5 and so I'm patenting that answer.

      This is also yet another example of a patenting office failing to recognize something which is easily done in Perl or PHP (or probably any other language as well so long as you can check a link and get back the 404 error or any of the other "bad link" type of errors). Links which lead to "We are forwarding you to..." can be easily parsed and the new link substituted for the old one.

      Some day, breathing, drinking, and eating will be patented. But I'm betting no one will patent going to the bathroom! ;-)

      --
      Someone put a black hole in my pocket and now I'm broke. :-)
    3. Re:More patents for IBM by ClosedSource · · Score: 1

      "I actually understand why IBM does this. It is to protect them from other companies coming along afterwards and saying that IBM owes them money because they are infringing on their patents."

      I disagree. The primary purpose of patents is to fight competition. You can stop a competitor from producing a product that violates your patent without actually using the technology yourself.

      Carmakers, for example, can make it difficult to market an alternative fuel vehicle by holding relevant patents. Thus they can stop the potential competition without actually spending the money to develop a product.

      If IBM was only concerned about defending itself, it could provide royalty-free licenses for all its patented technologies. The fact that it doesn't indicates that these patents are not strictly defensive. IBM retains the right to use them as offensive weapons.

    4. Re:More patents for IBM by Audacious · · Score: 1

      This is true. However, there have been many times in the past when one company (such as Macromedia) was to be sued by another company (such as Adobe) and the patents were used to ward off the attack.

      As such, patents can be used both ways: Offensively and Defensively. In this case, it will prevent another company (if the patent is issued) from trying to say that IBM has infringed on their rights (aka SCO vs IBM). By holding the patents, IBM can say "No. We invented this on our own and you have no right to sue us."

      It is true though, that it does not prevent IBM from going around and trying to squash everyone else - but I haven't seen IBM doing that lately. Have you?

      --
      Someone put a black hole in my pocket and now I'm broke. :-)
    5. Re:More patents for IBM by ClosedSource · · Score: 1

      "It is true though, that it does not prevent IBM from going around and trying to squash everyone else - but I haven't seen IBM doing that lately. Have you?"

      No, I haven't, but I don't see much evidence of Sun, MS, HP, or Oracle doing it either. It's a power that is used sparingly; where the upside of being offensive outweighs the negative PR.

      On the other hand, IBM has given lip service (and some no-profit software) to the "free" software movement. If they're sincere, they should not be seeking patents or they should start adopting the royalty-free license I mentioned earlier.

      From a software patent perspective, the only difference between IBM and MS is that IBM has a lot more of them.

    6. Re:More patents for IBM by Audacious · · Score: 1

      On the other hand, IBM has given lip service (and some no-profit software) to the "free" software movement. If they're sincere, they should not be seeking patents or they should start adopting the royalty-free license I mentioned earlier.

      Mmmmmmmmmmmmm.....no. I'd have to disagree. IBM has (as the SCO vs IBM case is proving) contributed a lot more than just lip service to the OSS movement and it has allowed its employees to help other companies (such as RedHat) to reach their goals of becoming stable companies who also help to protect the OSS outlook.

      Now, before you begin going "What about X or Y or Z" or anything else. I am responding to your quip about lip service. The term "Lip Service" as I define it means that they only talk about doing something and do not really do it. Your definition may be different. But according to my definition - IBM is backing its words with actual service. They not only host some OSS items but they also contributed hardware, money, people, and time into getting the OSS up and going. Without IBM there wouldn't be any respectability to the OSS movement. We'd just be considered hacks (or hackers).

      Just like when Apple Computer, Inc. started up - the business world considered the Apple ][+ a hobby computer. Something not worth even thinking of or about. Then IBM entered the fray and gave the microcomputer the kiss of reality. After they entered the fray no one ever said microcomputers were hobbyist items again. (Now, they may have said with disdain that the Apple computer was a kiddie toy - but that is just personal preference.)

      That was one of the reasons it was really important for IBM to back the OSS movement. When IBM said "This is for real and that we will stand behind it," it grounded everything everyone had worked so long and hard on into the business world. IBM, HP, Novell, et al could have said "We will sue those who have software which is similar to our work." But they didn't. They looked at it and realized that if they help the OSS movement that it is in turn helping them. It helps them keep the cost down on the creation of new tools. It helps to unify a broken hodge-podge of computers, computer OSs, and languages. It has helped to make the internet what it is today.

      And yeah! They make quite a few bucks while helping out. So what? Nothing is keeping you from doing the same thing and starting up a small computer business that puts together the systems, installs the OSs, and installs the whole thing into someone's place of business.

      Now don't take me wrong. I'm not a Big Blue hugging kind of person. But I am also not going to knock the people who are helping things out either. They've done a hell-of-a-lot for OSS and I'm glad they did. The truth is, IBM went out on a limb for OSS by even saying they were going to start supporting it and selling machines with the software installed.

      And the only time, so far, that I have seen IBM swing the big bat was because SCO said "This is all our software and you don't own any of it." SCO was the culprit - not IBM. If anyone were at fault here it would be you and I. We probably are doing only a hundredth of what IBM is out there trying to do for the OSS movement. They are spending millions on ads to increase awareness, holding seminars (been invited to a few thank you! Nothing like a free lunch!), and sending out flyers to various corporations.

      So let's talk about phobias. Especially the fear of some company making money off of the OSS software. Well - get over it. The fact of life is that everyone wants to get paid, and have money. In order to do that you have to have some kind of company (or work for a company). IBM is a company. It is in the business to make money - not to file bankruptcy. IBM isn't coming around and trying to steal what the OSS movement has done - it is following the rules and regulations set down by the OSS people. It is giving away source code, not asking anyone to pay them who isn't a business (like m

      --
      Someone put a black hole in my pocket and now I'm broke. :-)
    7. Re:More patents for IBM by ClosedSource · · Score: 1

      If you read my post carefully you'll note that I did acknowledge that IBM has contributed software to the free/open source movement, just not the software they make a lot of money on. Generally the software they open either supports the sale of closed hardware or closed software. There's nothing wrong with that, but it doesn't qualify them as a poster-boy for OSS either.

      By the way, in case it's not obvious from the name I chose; I'm not doing anything for the OSS movement. I don't, however, consider it a fault.

    8. Re:More patents for IBM by Audacious · · Score: 1

      No. You stated that they did "lip service" which is a derogatory statement. I said you were mistaken and provided several areas where IBM has helped the OSS movement.

      You are right in that it may not qualify them as a poster boy for the OSS (which is yet another derogatory statement) but it doesn't disqualify them either.

      On the name: I took that as a given that you were not helping. But it is a fault. The OSS needs people and if you just sit on the sidelines and do nothing then you get what you deserve if you do not go out and help. Good or bad, by being someone who doesn't wish to be involved you hurt not only yourself but anyone you may have helped.

      I believe this has gone as far as it needs to go. Our points have been made and further discussion would be meaningless. Peace, love, and long life.

      --
      Someone put a black hole in my pocket and now I'm broke. :-)
    9. Re:More patents for IBM by ClosedSource · · Score: 1

      You are right about bringing this discussion to a close.

      My only comment at this point is that there are many activities that an individual can participate in and only a limited time for that participation. If I were more generous with my time I'd prefer to contribute it to what I believe are much more important causes than OSS, such as helping the homeless, teaching people to read, etc. I haven't been doing these things, so I don't take credit for them, but frankly I'd feel better doing those things than contributing to OSS. This may seem to you to be a fault, but I suspect the day will come when you aren't so judgemental.

  49. Oh boy... by ceeam · · Score: 1

    What if (and the chances are high) I store the URLs in some DB, let's add a proprietary format + compression for fun here, and then let's fetch these URLs by a script depending on user-entered parameters. Imagine full-text searching or stuff. How is it at all possible to write a universal checking tool here??? Man, I wish people would stop wasting energy on stuff like "automatic C++ to Java converter" and similar bullshit when every semi-knowledgeable person can instantly say that the thing is no go.

  50. been there, done that. by Quickening · · Score: 4, Informative

    For those running a real browser, just make this a link, preferably in your personal tool bar.

    javascript:Qr=document.URL;if(Qr=='about:blank'){void(Qr=prompt('Url...',''))};if(Qr)location.href='http://web.archive.org/web/*/'+escape(Qr)

    Now when I click on a link that isn't there, I select my Archive search button and it shows me the Wayback Machine's history of that link. Of course it works only if the url hasn't been modified by the server. If it has it's another couple steps (copy link, ^T, archive search, paste url in pop-up dialog)

    --
    tcboo
  51. ibm interns by Anonymous Coward · · Score: 0

    When I worked at IBM the interns never did anything useful. They would come and go as they please and do their homework when they were around. When work got busy they would go home early.

  52. Slashdot effect by EvilGoodGuy · · Score: 1

    Just think of how many pages we could crash with just one article with this tool... *drool*

  53. Google "I'm Feeling Lucky" by Flamefly · · Score: 4, Interesting

    You could create your links using Google's "I'm Feeling Lucky" feature, assuming it was just a generic link site looking for interesting sites rather than specific articles.

    e.g.:
    http://www.google.com/search?hl=en&ie=UTF-8&q=News+For+Nerds&btnI=Google+Search

    And voila, your site will take you to the most popular site related to News for Nerds, automagically. If Slashdot died one day, another site would take its place in the Google rankings. FF.

  54. Changed is worse in some situations by phorm · · Score: 2, Interesting

    This is quite often true in respect to sites/companies with large webpages and hence lots of links. One company I used to work in the internet/intranet division for kept links to several partners' webpages. When one of those partners let their domain expire, it was bought out by a pr0n company.

    You can imagine how much the staff enjoyed the content on the new page... and the IT Security folks especially, as the proxy was suddenly giving them lots of nice warnings about workers viewing inappropriate content (probably due to the nasty popups, etc).

  55. I prefer my system by Jameth · · Score: 1

    It is, unsurprisingly, extremely easy to just write a script which checks if links are working and ignores them if they are working or, if they are not working, reports them to the admin and makes them into Not-Links in the page that actually gets posted. Although that might leave a few gaps in navigation, at least the gaps don't let people follow them to dead-ends. And, with the admin warned, they can be fixed promptly.
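
    Roughly what such a script could look like, as a sketch only. It assumes the set of dead URLs has already been worked out by some link checker, and it uses a regex shortcut that only copes with simple double-quoted href attributes; the function name and the example page are invented for illustration.

```python
import re

# Matches a simple <a ... href="...">text</a> anchor. A regex is a shortcut
# here; a real HTML parser would be more robust.
ANCHOR = re.compile(
    r'<a\s+[^>]*href="([^"]*)"[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL
)


def unlink_dead(html, dead_urls):
    """Replace anchors pointing at dead URLs with their plain text.

    Returns the rewritten HTML and the list of dead links found,
    which could then be reported to the admin.
    """
    found = []

    def fix(match):
        href, text = match.group(1), match.group(2)
        if href in dead_urls:
            found.append(href)
            return text  # keep the words, drop the link
        return match.group(0)  # working link: leave untouched

    return ANCHOR.sub(fix, html), found
```

    The source page is never modified; only the copy that gets posted loses its dead anchors, so fixing the link later brings it straight back.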

  56. The Real Worry by Bozdune · · Score: 1

    You know what, none of that link stuff worries me one bit. Links are bound to be irrelevant/stupid/broken unless someone really cares about them and monitors them manually.

    No, the big worry for me is PATENTS. What the hell are they patenting? What is the Big Idea here that deserves a patent? This is scary stuff. What, do we have to find prior art for every stupid idea someone decides to patent? The answer is "yes." We are all out of business if we let this continue. Support the EFF! Kill this stupidity now before we are all out of work.

    1. Re:The Real Worry by Dr.+Evil · · Score: 1

      I personally think there's no singular mind at work on it, it's just one IBMer trying to get a patent listed on their resume and their manager trying to look important.

    2. Re:The Real Worry by Danse · · Score: 1

      I personally think there's no singular mind at work on it, it's just one IBMer trying to get a patent listed on their resume and their manager trying to look important.

      I actually think it's funny that people will brag about how many patents their division or company received in the last year. After seeing the kinds of crap that get patents over the last 10 years or so, I'm not likely to be impressed, regardless of their numbers. In fact, a high number is more likely to be indicative of a large number of truly undeserving patents rather than exceptional innovation. As long as businesses can keep spamming the PTO with every crap idea that pops into someone's head and the PTO keeps granting them a monopoly on the idea, we are going to see more and more small companies and entrepreneurs locked out of competition by the anti-competitive force that this mountain of ridiculous patents represents.


      --
      It's not enough to bash in heads, you've got to bash in minds. - Captain Hammer
  57. scary enough... by Rev.LoveJoy · · Score: 2, Informative
    FrontPage has been able to "Scan your web site for broken links" since it first came out in ... what 1997?

    Clippy indeed, must be a slow news day,
    - RLJ

  58. Old Technology by KillaKen187 · · Score: 1

    Wait... I already get this with the Spyware that is on my computer.

  59. Firefox extension by malx · · Score: 3, Insightful

    On a slightly related note, a Firefox extension that searched links ahead and removed the link rendering for those that return a 404 might be handy (albeit fairly evil).

    On a less related note, I've long been disappointed that some 300 series status codes in HTTP are so under-exploited, both by clients (e.g. automated bookmark management) and people running web sites.

  60. Need to think outside of the site by kippster · · Score: 1

    The problem is larger than an individual site. Since the web is built on a distributed platform, solving broken links in the small is a good start, but not complete. External sites that have the link also need to be corrected (whether an update, a delete, a move, etc.).

    One way to extend this idea is to make use of the Referer: field in HTTP. I worked on an early prototype ('95-96 http://www5conf.inria.fr/fich_html/papers/P10/Overview.html) of this that would not only notify the local system administrator of a broken link, but also provided a facility to notify the administrator of the system from whence the request originated.

    Automating this is difficult due to security issues, but it's at least worth somebody continuing to do research and finding ways to make things better, even if incrementally.
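
    To give a flavour of the referring-page idea, here is a small sketch that pulls 404s out of a web server log and groups them by the host that sent the visitor. It is not the prototype from the paper: the Apache "combined" log format is an assumption, and the function name and sample hosts are hypothetical.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Apache "combined" log format is assumed: request, status, size, referer.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "(?P<referer>[^"]*)"'
)


def broken_links_by_referrer(log_lines):
    """Group 404'd paths by the host of the page that linked to them."""
    report = defaultdict(set)
    for line in log_lines:
        m = LOG_LINE.search(line)
        if not m or m.group("status") != "404":
            continue
        referer = m.group("referer")
        if referer in ("", "-"):
            continue  # no Referer sent, nobody to notify
        host = urlsplit(referer).netloc
        report[host].add((referer, m.group("path")))
    return dict(report)
```

    Each key is a site whose administrator could be told which of their pages points at which missing path. Actually sending that notice is the hard, security-sensitive part and is left out.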

    Not sure I like the idea of patenting this, but I do like the idea of people working on it.

    Kipp

  61. If anybody other than myself changes the links on my site by Yaa+101 · · Score: 1

    Then they will get a very large problem with me... Large enough to get hardball tactics from my closet...

  62. Actually a complex process by smagruder · · Score: 2

    I periodically run dead link checking software to perform this function with regards to my bookmarks, some of which I publish on my web sites.

    There are many things that happen to links, such as redirects, but to conclude that a link is down because you get a 4xx or 5xx HTTP response is extreme. Sometimes sites go down for a period of time for various reasons. Such a link replacement process would need to have some kind of forgiveness mechanism. Further, sometimes links move elsewhere without the benefit of redirects--this replacement process therefore shouldn't replace links with "related" content, but the same content that's moved to another spot.

    The bottom line is that the replacement process requires a step in the process where a human being reviews link change recommendations.
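
    One possible shape for the forgiveness mechanism, sketched under assumptions: the 14-day grace period is an arbitrary number, and the class and method names are invented. A link that fails is only handed to a human once it has been failing continuously for the whole grace period, and a single success resets the clock.

```python
from datetime import datetime, timedelta

GRACE_PERIOD = timedelta(days=14)  # hypothetical threshold, tune to taste


class LinkHistory:
    """Tracks when each link first started failing."""

    def __init__(self):
        self.failing_since = {}

    def record(self, url, ok, when):
        """Record one check result; return 'ok', 'wait' or 'review'."""
        if ok:
            self.failing_since.pop(url, None)  # recovered, forgive it
            return "ok"
        first_failure = self.failing_since.setdefault(url, when)
        if when - first_failure >= GRACE_PERIOD:
            return "review"  # hand to a human, never auto-replace
        return "wait"


history = LinkHistory()
print(history.record("http://example.com/", False, datetime(2004, 9, 1)))
```

    Note that "review" only queues the link for a person to look at, in line with the point above that recommendations need human sign-off.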

    --
    Steve Magruder, Metro Foodist
    1. Re:Actually a complex process by Anonymous Coward · · Score: 0

      to conclude that a link is down because you get a 4xx or 5xx HTTP response is extreme.

      To conclude that a link is down? No, that's perfectly reasonable. I think you mean "to conclude that a link is gone for good". I agree that 5xx response codes aren't suitable to be treated in this way, and neither are most of 4xx response codes, but 410 Gone, signifying that a resource has permanently been removed, is suitable to be treated in this way.

      The bottom line is that the replacement process requires a step in the process where a human being reviews link change recommendations.

      Absolutely.

    2. Re:Actually a complex process by smagruder · · Score: 1

      I think you mean "to conclude that a link is gone for good".

      Yes, that's what I meant. Thank you.

      --
      Steve Magruder, Metro Foodist
  63. Help my pr0n search? by duxwig · · Score: 0

    So if I post a link to some cool pr0n site, or make my own pr0n....and shut the link down...will it find me more quality porn?

    Now THAT is the question.

  64. IBM patents anything that moves! by Anonymous Coward · · Score: 0

    I swear, when I worked as an intern there, they encouraged patents on anything and everything. They were even proud to lead other technology companies in having patents.

  65. But... by Anonymous Coward · · Score: 0

    Can it deal with the "Slashdot Effect"?

  66. Xanadu by 12357bd · · Score: 1

    If broken links are a problem, maybe the html/http pair would be better shaped more according to the original Xanadu project. http://xanadu.com/

    --
    What's in a sig?
    1. Re:Xanadu by kippster · · Score: 1

      Excellent ideas, how do we get from here to there? I don't think I've seen anybody lay out a plan that would actually do that. Mostly, I've seen Ted whine about the web and people ignoring the good ideas.

      Kipp

    2. Re:Xanadu by 12357bd · · Score: 1

      There's a browser compatibility page on the site, but I don't know how good an idea it could be, or how easily it could be accepted.

      About Ted's whinings, I really don't know, but the xanadu project existed years before the web took place. In fact the actual hyperlink idea is a simplification of the original xanadu concept, so I prefer to give some extra credit to the fathers of the idea.

      Xanadu was ahead of its time (hey it's a 60's thingy), and probably failed due to the lack of processing power among other 'political' reasons, but the base idea was good.
      It reminds me of the TCP/IP protocol history, a simple and logical protocol, that for years has been underused due to the lack of enough processing power.

      --
      What's in a sig?
    3. Re:Xanadu by kippster · · Score: 1

      I absolutely agree that both credit and praise should be given to the folks that spent many years of effort on thinking through and researching hyperlink systems. A lot of great ideas that should be incorporated, not ignored.

      I think the failure was timing, infrastructure, simplicity, political, processing power, and graphical capabilities. All of these had to come together to make it truly happen in the 90's.

      Unfortunately, the simplicity which made it able to grow so quickly is also what people have been trying to go back and fix over the last 10 years.

      Kipp

  67. Coining a nickname for this technology... by SuperKendall · · Score: 2, Funny

    The "I'm feeling lucky" link.

    --
    "There is more worth loving than we have strength to love." - Brian Jay Stanley
  68. network down by Pragmatix · · Score: 4, Insightful
    Of course, if your links happen to go to a network that is experiencing a temporary outage, this tool would wreck havoc.

    Soon the target network would be back up, but all your links would be lost and randomly changed to something less useful. Good Invention!

    1. Re:network down by Anonymous Coward · · Score: 0

      this tool would wreck havoc.

      You mean wreak havoc. Think about it. "Wreck havoc" makes no sense.

  69. Hey micheal RTFA by ZosX · · Score: 2
    From the fucking summary.

    Broken Links No More? "Students in England have developed a tool which could bring the end to broken links. Peridot............Peridot could lead to a world where there are no more broken links,'"

    I'll troll to hell for this, but I could care less, and I have no problems standing up for what I say. This is terribly irresponsible journalism. No fucking where in the summary does it mention intranet or corporate websites. A world would be pretty global, would it not? Again, the headlines and summaries are getting totally out of sync with the actual articles. Nothing gets edited and people are bitching all over the place about what essentially has amounted to stories (hah, an actual story on slashdot) that are ads. I mean come on. Product announcements versus real news?

    I'm sure for as many people that read this site (and pay the bills, as well as *ahem* salary) that the editors must surely get a great deal of story submissions every day. Hell, I'm sure that many people wouldn't actually mind some fucking content once in a while other than links to other stories. The site hasn't been changed in what, like years now? Are the "editors" that busy that they can't even hire someone to actually double check their stories for dupes and errors?

    Mod me to hell for this, but I have merely summarized what many have bitched about for months and months and months now, and frankly, while I love the humor and the insightfulness that many share here, the "editors" are really getting lazy these days. Feel free to correct me, I mean I'm sure all those "nothing to see here" and downtimes are starting to put a cramp in the afternoon quake matches, so I guess they could be working on getting the site running better, right?

    I feel sorry for the subscribers.

    zosX

    1. Re:Hey micheal RTFA by man_ls · · Score: 1

      What's the line between a product announcement for a technology like this, and a product announcement for, say, AMD Opteron processors?

      They're both product announcements.

      Except we like AMD here, and like Linux here, and if it's not one of those two, then we don't like it. Apparently.

  70. A better idea by pdamoc · · Score: 2, Insightful

    Maybe it would be better if that smart program replaced the links with links from:
    archive.org
    or maybe google cache.
    Then of course it has to be smart enough to know it did that and replace the links back with the originals if they come online.
    Sometimes "broken links" can recover.
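
    A sketch of the bookkeeping this would need, assuming archive links of the form http://web.archive.org/web/*/ followed by the original URL. The class name and the is_alive callback are hypothetical; the callback stands in for whatever liveness check is actually used.

```python
WAYBACK_PREFIX = "http://web.archive.org/web/*/"  # assumed archive URL form


class ArchiveSwapper:
    """Swaps dead links for archive links and remembers the originals."""

    def __init__(self):
        self.originals = {}  # archive URL -> original URL

    def current_link(self, link, is_alive):
        """Return the link to publish, given a liveness check function."""
        # If this is an archive link we created earlier, look up the original.
        original = self.originals.get(link, link)
        if is_alive(original):
            self.originals.pop(link, None)
            return original  # back online: restore the original
        archived = WAYBACK_PREFIX + original
        self.originals[archived] = original
        return archived
```

    Because the mapping from archive link back to original is kept, a site that recovers gets its real link restored on the next pass instead of being stuck on the archived copy.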

  72. Another patent on something stupid by RedLaggedTeut · · Score: 1

    This seems to be the classic case of a patent on something stupid. The guy to patent such a thing is often the first, since all others discarded the idea right away.

    While such a patent has some merit because sometimes it turns out that the stupid stuff is not so stupid after all, this one basically patents two easy steps that are done in succession: a) finding broken links (easy) and b) replacing broken links (more difficult).

    In my eyes if they had patented the details of a sophisticated solution to problem b) that would be OK, but I bet they made a broad patent, like patenting all ways to do step a) and all ways to do step b).

    Consider that some web masters did the same process before, replacing broken links by hand. What exactly is new about the process itself in such a patent?

    My web server automatically replaces broken links with a different 404 page ;-) go figure ..

    --
    I'm still trying to figure out what people mean by 'social skills' here.
    1. Re:Another patent on something stupid by orangesquid · · Score: 1

      404 page on my server. You can pretty easily resolve lost pages on my site, although it's not automatic. I guess the next step could be to, whenever a 404 is generated, log it, and run a daemon later that searches the possible replacement matches and stores the relevant information later; then, the next time that particular URL 404's, some keywords or such could be provided, and the user could be asked, "Is this a close match to what you wanted?," and this poll could be used to sort the keyword links on the 404 page for most probable match.

      IBM needs to stop patenting so much stuff, IMHO. It seems pretty obvious to me. I know that my solution isn't the same, but it's similar in intent, and it could easily be extended to compete with IBM's tool...

      --
      --TheOrangeSquid Is it any wonder things seem so awry? We swim in a sea of confusion and don't have to think to survive
  73. Do for HTTP what Verisign did for DNS by Anonymous Coward · · Score: 0

    But that's a small price to pay for no more goatse.cx links.

  74. Better luck next time by ropley · · Score: 1

    Disclaimer: I used to work for the company discussed.

    Here's a link to an earlier effort:

    http://slashdot.org/article.pl?sid=00/06/18/1434204&tid=95

    Think it's a good idea? We raised millions in venture capital!

  75. countdown to the first virus... by gosand · · Score: 1

    The first virus to modify this and replace all links to goatse in 5... 4.... 3...

    --

    My beliefs do not require that you agree with them.

  76. I believe in dot.com IPOs too by peter303 · · Score: 1

    Sounds like another fantasy too good to be true :-)

  77. more brains please by Ragica · · Score: 1
    Years and years and years and years ago I replaced the standard Apache 404 page with a cgi on my (old) site which attempted to detect typos, and pick up other hints from the URL being accessed (moved pages in different subdirectories, changed extensions, etc). If the script felt that it had a good idea where the visitor had intended, it would automatically redirect them; otherwise it would present a list of links suggesting what the visitor might have been wanting.
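
    Not the original cgi, obviously, but a sketch of the same idea using fuzzy string matching from Python's standard difflib module. The list of known paths and both thresholds are made up; a real handler would build the list from the document root and tune the numbers.

```python
import difflib

# Hypothetical list of pages that exist on the site.
KNOWN_PATHS = ["/index.html", "/about.html", "/projects/links.html", "/contact.html"]

REDIRECT_THRESHOLD = 0.85  # similarity needed before redirecting automatically
SUGGEST_CUTOFF = 0.6       # similarity needed to appear in the suggestion list


def handle_404(requested, known=KNOWN_PATHS):
    """Return ('redirect', path) for a confident guess, else ('suggest', [paths])."""
    matches = difflib.get_close_matches(requested, known, n=3, cutoff=SUGGEST_CUTOFF)
    if matches:
        # get_close_matches sorts best first, so score only the top hit.
        best_score = difflib.SequenceMatcher(None, matches[0], requested).ratio()
        if best_score >= REDIRECT_THRESHOLD:
            return ("redirect", matches[0])
    return ("suggest", matches)


print(handle_404("/abuot.html"))
```

    A near-miss like a transposed letter is redirected outright, while a vaguer request such as a page that moved into a subdirectory only produces a list of suggestions.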

    Damn, i should have gotten a patent!

    Sigh.

  78. It'll simply redefine what a broken link is by Triscuit · · Score: 1

    No, it won't eliminate broken links....It'll simply redefine what a broken link is.

    As far as I'm concerned if it's not the originally intended link it may as well be "broken".

  79. Yes, and it's called... by eomnimedia · · Score: 1

    ...offshore coding monkeys.

  80. Peridot is not mythical... by rthille · · Score: 1, Flamebait


    Tried reading the article, but with writing like 'named after Peridot, a mythical gemstone...' I gave up.

    See, Peridot isn't mythical. It may have mythical properties, but the gemstone itself is _REAL_. Details matter people...

    --
    Awesome furniture, accessories and cabinetry in Santa Rosa, CA: http://humanity-home.com/
  81. Like everything2.com..sorta by GreenCow · · Score: 1

    It's sorta like everything2.com, where you link phrases instead of url's, every phrase has an entry, although more obscure phrases won't have any info in the entry, but anyone can enter information in. So this broken links deal seems like that in a way, except maybe with a closed database or using google.

  82. predecessor: robust hyperlinks by pangloss · · Score: 2, Informative

    There were two fellows at UC Berkeley (Phelps and Wilensky) who implemented the idea of "fingerprinting" web pages at least as far back as 2000. It was a non-trivial fingerprinting (i.e. not just MD5 hash of a web page).

    As far as I know, they haven't done any more recent work on this and the software is only available via archive.org.

    A paper

    I gather that the IBM effort is different in significant respects, but it certainly employs ideas from Phelps & Wilensky.

    1. Re:predecessor: robust hyperlinks by Tom+Phelps · · Score: 1
      Robust Hyperlinks was even a Slashdot story, a BBC News story, and other press.

      Here's the blurb from Slashdot: "URLs can be made robust so that if a Web page moves to another location anywhere on the Web, you can find it even if that page has been edited. Today's address-based URLs are augmented with a five or so word content-based lexical signature to make a Robust Hyperlink. When the URL's address-based portion breaks, the signature is fed into any Web search engine to find the new site of the page. Using our free, Open Source software (including source code), you can rewrite your Web pages and bookmarks files to make them robust, automatically. Although Web browser support is desirable for complete convenience, Robust Hyperlinks work now, as drop-in replacements of URLs in today's HTML, Web browsers, Web servers and search engines."
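
      As a rough illustration of the blurb above (not the project's actual code), a lexical signature can be approximated by picking the few words that are frequent in the page but rare elsewhere. The scoring formula, the document-frequency table, and the lexical-signature query parameter name below are all assumptions made for the sake of the example.

```python
import re
from collections import Counter
from urllib.parse import quote_plus


def lexical_signature(text, doc_freq, size=5):
    """Pick the `size` words that best single out this page.

    doc_freq maps a word to the number of documents containing it
    (a hypothetical corpus statistic); rarer words score higher.
    """
    counts = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(word):
        return counts[word] / (1 + doc_freq.get(word, 0))

    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(counts, key=lambda w: (-score(w), w))
    return ranked[:size]


def robust_url(url, signature):
    """Append the signature as a query parameter (parameter name assumed)."""
    sep = "&" if "?" in url else "?"
    return url + sep + "lexical-signature=" + quote_plus(" ".join(signature))
```

      When the address part of such a URL breaks, the words after the parameter are what would be fed to a search engine to track the page down.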

      The technical Robust Hyperlinks paper of 4 years ago is now at archive.org.

      It would be interesting to read a more technical description than just a popular news story to see if Peridot does more than just store a different signature/fingerprint. It would be erroneous if Peridot's patents claim to have invented the idea of fingerprinting web pages to help fix links to pages that have moved or changed.

  83. Why the overhead of replacing content? by INetEngineer · · Score: 1

    I'm a little surprised! This isn't new. Associate keywords or phrases with your links and then let the user search those keywords/phrases if the link doesn't work. You could implement this several ways and I strongly suggest getting an account with Google's AdSense to get paid for your broken links when people search (or any other paid search service). Get paid for broken links? What an idea! So, generate all of your site's links using a database and run checks on them periodically, or run all the links through a gateway that checks their availability. Or, crawl your site for broken links and replace them with a search replacement that picks up the link's visible text. Then, when a link is broken, the visitor gets a nice, neat message and, hopefully, still gets the info they are looking for.
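
    The "crawl and replace with a search on the link's visible text" variant could look roughly like this. It is only a sketch: the regex handles just plain `<a href="...">` anchors, and the function names and the default search URL prefix are assumptions for illustration.

```python
import re
import urllib.request
from urllib.parse import quote_plus

# Matches only simple anchors of the form <a href="...">text</a>.
ANCHOR = re.compile(r'<a\s+href="([^"]+)"\s*>(.*?)</a>', re.IGNORECASE | re.DOTALL)


def is_alive(url, timeout=5.0):
    """Return True if a HEAD request to the URL succeeds."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except (OSError, ValueError):
        # Covers HTTP errors, DNS failures, timeouts, and malformed URLs.
        return False


def patch_broken_links(html, alive=is_alive,
                       search="https://www.google.com/search?q="):
    """Point each dead link at a search for its visible text instead."""
    def fix(match):
        url, text = match.group(1), match.group(2)
        if alive(url):
            return match.group(0)  # leave working links untouched
        # Strip any nested tags to get the text the visitor actually sees.
        visible = re.sub(r"<[^>]+>", "", text).strip()
        return '<a href="%s%s">%s</a>' % (search, quote_plus(visible), text)

    return ANCHOR.sub(fix, html)
```

    Passing the checker in as `alive` means the same rewrite step works whether availability comes from a live request, a periodic database check, or a gateway.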

    --
    --I smoked my sig.
    1. Re:Why the overhead of replacing content? by Anonymous Coward · · Score: 0

      I would love to get paid for searching broken porn links! Paid to surf Porn! I love it

  84. Whoosh. by Doctor+O · · Score: 1
    --
    Who is General Failure and why is he reading my hard disk?
  85. Checkbot by Anonymous Coward · · Score: 0

    I would like to point out the link-checking tool Checkbot. Certainly a story like this on /. needs to have a thread about comparing Peridot vs. Checkbot vs. ...

    Randall

  86. My New Patent Is More Unique by militiaMan · · Score: 0

    I have a new patent under the name breath.
    Description: Inhale into the lungs a portion of the atmosphere. Remove some oxygen. Then exhale the unused air. Anyone that inhales needs to pay me a fee!

    So don't breathe until my lawyers can contact you for a fee arrangement.

  87. Old news?? by zubinjdalal · · Score: 1

    Anyone heard of Robert Wilensky and Tom Phelps and their work on solving the broken link problem? How about my work?? :)

  88. I have considered something like this... by BillX · · Score: 1

    ...in a somewhat different form. Not so much unbreaking your own internal web site links (here's a free clue, don't break them in the first place!), but dealing with all these links that are not a "404" or similar error, but simply no longer what you originally linked to.

    A BIG example: Domain buy-outs by porn sites, "portal potties" and shady marketing companies, resulting in links pointing to an undesired resource that is still a "valid" (non-error) document. Or links to a news site that eventually recycles the same URL for a new story. Your average "404 checker" is powerless to tell you this has happened.

    My proposed solution makes a copy or digest of the linked document AT THE TIME IT IS LINKED (or very soon after), then compares aspects of the original with the 'current' version during subsequent link checks. The easiest way would be to simply alert the webmaster when the page contents have changed by x% (where x is user definable), or when 'required' keywords or phrases are no longer present. More advanced future enhancements are possible of course; similar to Google's ability to pick out related words, the link checker could eventually be able to understand that the linked article is about the same topic (if this is all the webmaster is going for), even if the exact words have changed.
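
    A minimal sketch of the "changed by x%" and required-phrase checks, assuming the snapshot and the current page are already available as plain text. The word-level diff via `difflib` and the names `percent_changed` and `check_link` are my own choices for illustration, not part of any existing tool.

```python
import difflib
import re


def words(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def percent_changed(original, current):
    """Estimate how much the page changed, as a percentage of its words."""
    matcher = difflib.SequenceMatcher(None, words(original), words(current))
    return round((1.0 - matcher.ratio()) * 100.0, 1)


def check_link(original, current, max_change=30.0, required=()):
    """Return a list of reasons to alert the webmaster (empty if none)."""
    problems = []
    changed = percent_changed(original, current)
    if changed > max_change:
        problems.append("content changed by %.1f%%" % changed)
    lowered = current.lower()
    for phrase in required:
        # Case-insensitive substring test for each required phrase.
        if phrase.lower() not in lowered:
            problems.append("missing required phrase: %r" % phrase)
    return problems
```

    Here `max_change` plays the role of the user-definable x, and a page that still returns a "valid" document but has been taken over by unrelated content would trip both checks.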

    --
    Caveat Emptor is not a business model.
  89. a smart 404 page? my god by Anonymous Coward · · Score: 0

    who'd ever have thought of that if IBM hadn't?...

  90. Site's link to the mating habits of goats is down by initialE · · Score: 1

    Wonder where it would take me?

    --
    Starbucks, Harbuckle of Breath.
  91. "wholly unsuitable content" by Anonymous Coward · · Score: 0

    "said Peridot could protect companies by spotting links to sites that have been removed, or which point to wholly unsuitable content"

    No more Goatse I guess