Broken Links No More?

Posted by ryuzaki0 on Friday September 24, 2004 @01:35AM from the dream-big dept.

johndoejersey writes "Students in England have developed a tool which could bring an end to broken links. Peridot, developed by UK intern students at IBM, scans company weblinks and replaces outdated information with other relevant documents and links. IBM have already filed 2 patents for the project. The students said Peridot could protect companies by spotting links to sites that have been removed, or which point to wholly unsuitable content. 'Peridot could lead to a world where there are no more broken links,' James Bell, computer science student at the University of Warwick, told BBC News Online. Here is another story on it." See also the BBC story.

36 of 212 comments

Can someone say "Bad Idea Jeans"? by Anonymous Coward · 2004-09-24 01:36 · Score: 5, Interesting

There are two parts to this tool, one of which is quite bad and one of which is quite good.

First, replacing links. This is a rather bad idea. Here's why, with an example.

In general, we can all agree that the technology behind Google is pretty impressive. It has its own "More Pages Like This" feature, which we can assume is at least somewhat similar to this one. Complex content analysis among billions of pages, to determine which are similar and which are different.

So, suppose we had a link to Major League Baseball, www.mlb.com on our page. And suppose, for whatever reason, that their site went away (perhaps a few more players' strikes?).

Well, what does Google suggest as a replacement? Check it out here.

First the National Football League (NFL), then the National Basketball Association (NBA), and then the National Hockey League (NHL). Followed by the ESPN sports network, and NASCAR racing.

Obviously, if we wanted to link to a site about baseball, all of those (other than ESPN) are really entirely irrelevant.

But if we wanted to link to a site about professional sports organizations, all of those (other than ESPN) are QUITE relevant.

Can this software know our intent?

Hardly.

You really have to question the ability of machines to select relevant links.

The situation is this: If someone goes to the trouble to manually create links in the first place, those should not be automatically changed to other sites that some computer program thinks may be related. Links shouldn't be inserted automatically; if someone needs more information on something you haven't linked to, they can use a search engine. And then your company isn't liable to look idiotic by linking to irrelevant sites.

Now, the other aspect of this product.

Removing dead or changed links is quite another matter. Automated removal of links is a great idea and quite useful. For example, consider when someone's domain name expires and it is taken over by a porn site. It'd be great to have a program that automatically removes links to it from your site. Like this tool, this could be based on a percentage of changed content--if the content changes significantly, remove the link quickly and automatically. If the content changes by some intermediate amount, flag the link as needing review by the webmaster.

But in both cases, the software should present the webmaster with a list of such questionable links, including those it has temporarily removed from the site, and then allow the webmaster to select replacement links.

Manually. With relevance.
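
A minimal sketch of the percentage-of-changed-content rule described above, assuming plain-text snapshots of the linked page and using Python's difflib to estimate how much changed. The thresholds and the names changed_fraction and classify_link are hypothetical, and nothing here is known to match how Peridot actually works.

```python
from difflib import SequenceMatcher

# Hypothetical thresholds: fraction of the content that differs between snapshots.
REVIEW_THRESHOLD = 0.2   # at or above this, ask the webmaster to look
REMOVE_THRESHOLD = 0.6   # at or above this, pull the link until a human picks a replacement


def changed_fraction(old_text: str, new_text: str) -> float:
    """Return a rough 0..1 estimate of how much the page text changed."""
    old_words = old_text.split()
    new_words = new_text.split()
    similarity = SequenceMatcher(None, old_words, new_words).ratio()
    return 1.0 - similarity


def classify_link(old_text: str, new_text: str) -> str:
    """Decide what to do with a link: 'keep', 'review', or 'remove'."""
    change = changed_fraction(old_text, new_text)
    if change >= REMOVE_THRESHOLD:
        return "remove"
    if change >= REVIEW_THRESHOLD:
        return "review"
    return "keep"
```

Anything classified as "review" or "remove" would still go onto the list shown to the webmaster; the code only decides whether the link stays live in the meantime.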
1. Re:Can someone say "Bad Idea Jeans"? by GigsVT · 2004-09-24 01:42 · Score: 5, Insightful
  
  The "related" search isn't what you should be looking at.
  
  Try this.
  
  --
  I've had enough abrasive sigs. Kittens are cute and fuzzy.
2. Re:Can someone say "Bad Idea Jeans"? by Anonymous Coward · 2004-09-24 01:48 · Score: 5, Funny
  
  You probably bookmark your A/C posts so you can slip back and check them. You're not fooling anyone, you know.
3. Re:Can someone say "Bad Idea Jeans"? by TheRealMindChild · 2004-09-24 01:48 · Score: 3, Interesting
  
  Actually, I think something pretty simple that should happen (in the context of search engine links, not links in general) is that when a visitor searches Google and clicks one of the returned links, Google should queue that particular page for respidering. Make it so a page can't enter that queue more than, say, once a week, and you will find that we come up with fewer and fewer broken links.
  
  While this might seem like a LOT for Google to be doing on the back end, I would have to think that a majority of the public ends up visiting the same 5-10% of the internet each day (number pulled from my ass, but an educated guess at least).
  
  --
  
  "When life gives you lemons, don't make lemonade. Make life take the lemons back!" -- Cave Johnson
4. Re:Can someone say "Bad Idea Jeans"? by Anonymous Coward · 2004-09-24 01:50 · Score: 3, Funny
  
  heh.... and what if "whitehouse.gov" for whatever reason doesn't respond ... the third thing on the list is ...
5. Re:Can someone say "Bad Idea Jeans"? by dwerg · 2004-09-24 02:01 · Score: 4, Interesting
  
  Sadly I can't get into details, but they're not using technology like the 'related' functionality in Google. They try to find the document that was previously on the other end of the dead link, so the link will never be rewritten to something vaguely related to the original document.
  
  The reason why they want to replace the links automatically is because some webmasters have to manage thousands of pages and don't want to press the 'ok' button every time the system detects a change.
6. Re:Can someone say "Bad Idea Jeans"? by shawn(at)fsu · 2004-09-24 02:21 · Score: 3, Insightful
  
  After RTFA it doesn't seem like a fair comparison to say it's like Google's "related" or Verisign's "product"; this looks like a technology a webmaster would use on their own site. It also gives them the option to accept the suggestion or not. This could be really good for corporations with large intranet sites: as webmasters leave, documents constantly get moved, etc.
  
  I think had the original poster read the article they wouldn't have gone off half-cocked. IBM must also be somewhat confident that this is new technology or else they wouldn't have filed two patents for it.
  
  --
  500 dollar reward for tip(s) leading to the arrest of the person(s) who stole my sig.
Great by gowen · 2004-09-24 01:38 · Score: 5, Insightful

Peridot could lead to a world where there are no more broken links,
... just links that don't go where the author intended. Gee, that's ... just great.

Hang on. On similar lines, I've a great idea. Suppose I type a nonexistent hostname into my browser. Wouldn't it be good if the DNS server just gave me its best guess instead of an error message. Or some kind of Site Finding search engine. That'd be even better than ... :)

--
Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
1. Re:Great by rokzy · 2004-09-24 01:51 · Score: 4, Insightful
  
  I agree it's a bad idea and is imo looking in the wrong direction.
  
  I want things to be LESS tolerant of mistakes, not more. this is why the web is so fucked up. when people can get away with absolute shit, why produce anything better than shit?
2. Re:Great by waterford0069 · 2004-09-24 01:54 · Score: 3, Funny
  
  Meanwhile at Verisign...
  "Hey guys, we have grass roots support, check out slashdot!"
3. Re:Great by tannnk · 2004-09-24 01:58 · Score: 3, Informative
  
  Hey... We had this kind of feature on the internet before
  
  --
  T!
I liked this one: by underpar · 2004-09-24 01:39 · Score: 5, Funny

A team in the Netherlands built an application that listens to contact centre conversations, picks out relevant keywords and automatically prompts the call centre agent with possible answers.

Does this app take the form of a paper clip? Because that would be a great idea!
Semantic Web? by jarich · 2004-09-24 01:39 · Score: 5, Insightful

Wouldn't this idea work a lot better with semantic web markup attached to links and also to intranet pages?

--
Agile Artisans
well by Anonymous Coward · 2004-09-24 01:39 · Score: 5, Funny

I think the link is broken... :)
What if the page is deleted, not changed by alta · 2004-09-24 01:40 · Score: 4, Insightful

My biggest problem is when I follow a link to a website that's no longer there. Yeah, moved pages happen, but I don't think they happen as often as deleted pages, expired domains, deleted websites, etc.

--
Do not meddle in the affairs of sysadmins, for they are subtle, and quick to anger.
1. Re:What if the page is deleted, not changed by troon · 2004-09-24 02:41 · Score: 4, Informative
  
  That's what the HTTP 410 Gone status code is for. If only admins would use it more...
  
  --
  Ydco co ,df C erb-y go. a Ekrpat t.fxrapev
Take this with a grain of salt by blankman · 2004-09-24 01:41 · Score: 5, Insightful

This sounds a little like SiteFinder from Verisign. Click a broken link and instead of a helpful error message you get whatever content IBM thinks is appropriate. Certainly this could be useful, but it could also end up as just another vehicle for advertising.
Not Entirely New by terrencefw · 2004-09-24 01:41 · Score: 3, Informative

I've seen lots of sites that return search results based on bits of the broken link instead of 404s.
Suppose you have the broken link http://somesite.com/foo/bar.html; some sites return a list of search results from within 'somesite.com' matching 'foo' or 'bar'. Quite clever, and much more useful than a plain old 'page not found' error.
This just takes that one step further by doing the searching at the referring end instead.
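
A rough sketch of the search-from-the-broken-link idea, assuming the site's pages are already available as a simple URL-to-text mapping. The names path_keywords and suggest_pages are made up for illustration, and a real site would query its own search index rather than scan page text.

```python
import re
from urllib.parse import urlparse


def path_keywords(broken_url: str) -> list[str]:
    """Split the path of a broken URL into lowercase search terms."""
    path = urlparse(broken_url).path
    # Drop a trailing file extension such as ".html", then split on non-alphanumerics.
    path = re.sub(r"\.[A-Za-z0-9]+$", "", path)
    return [term.lower() for term in re.split(r"[^A-Za-z0-9]+", path) if term]


def suggest_pages(broken_url: str, site_pages: dict[str, str]) -> list[str]:
    """Rank known pages (URL -> page text) by how many path terms they mention."""
    terms = path_keywords(broken_url)
    scored = []
    for url, text in site_pages.items():
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        hits = sum(1 for term in terms if term in words)
        if hits:
            scored.append((hits, url))
    # Highest score first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored]
```

For http://somesite.com/foo/bar.html the search terms come out as 'foo' and 'bar', and pages mentioning both rank ahead of pages mentioning only one.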

--
Like tinyurl, but one letter less! http://qurl.co.uk/
The Slashdot use? by makomk · 2004-09-24 01:42 · Score: 3, Interesting

I actually considered whether it would be possible to write some code to detect if linked-to content has been replaced. The reason I was interested was to make it impossible for someone to put up a copy of a slashdotted page, link to it in a posting, and then substitute it for a copy of goatse once they'd been moderated up.
I decided it'd be too hard for software to decide whether a change was significant. I wonder how this software does it - presumably, you can change the threshold?
worrying by TwistedSpring · 2004-09-24 01:42 · Score: 4, Insightful

"Peridot could lead to a world where there are no more broken links". Yes, it could. Peridot could also lead to a world where broken links are not manually and intelligently spotted and repaired, but automatically repaired. Automatic resolution of what a link "ought" to point to is never going to be accurate (look at search engines), and could make a company website a minefield of confusion and frustration for the user.

Only time will tell, I suppose.
Well that sounds perfectly dreadful by Illserve · 2004-09-24 01:42 · Score: 5, Insightful

Some algorithm cruising through my website, rearranging files as it sees fit?

Sounds like a recipe for utter disaster in the worst case, and a source of mildly embarrassing incidents at best.

How about this algorithm just report dead links to a human instead of trying too hard to be clever?

This sounds like someone had to come up with a final project, and settled on this one.
And... ? by Doesn't_Comment_Code · 2004-09-24 01:43 · Score: 4, Insightful

Maybe I'm being overly naive, but checking for broken links doesn't seem all that spectacular to me. It wouldn't take long to write a script to find all the broken links on a page.

The only parts that seemed worthwhile are replacing the links automatically, and testing if links are relevant.

I'm not so sure I'd trust a computer to do those things though. I'd much rather have the links flagged and checked by a human.
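
A minimal sketch of such a script, using only the Python standard library. It assumes the page HTML has already been fetched, treats any status outside 2xx/3xx as broken, and only reports links for a human to check; the names LinkCollector, http_status and broken_links are illustrative.

```python
from html.parser import HTMLParser
from typing import Callable
from urllib.error import HTTPError
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status for url, or 0 if the host cannot be reached."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        return err.code          # the server answered, but with an error status
    except OSError:
        return 0                 # DNS failure, refused connection, timeout, ...


def broken_links(page_url: str, html: str,
                 fetch_status: Callable[[str], int] = http_status) -> list[str]:
    """Return absolute URLs linked from html whose status is not 2xx/3xx."""
    collector = LinkCollector()
    collector.feed(html)
    broken = []
    for href in collector.links:
        target = urljoin(page_url, href)
        if not target.startswith(("http://", "https://")):
            continue  # skip mailto:, javascript:, and similar schemes
        status = fetch_status(target)
        if not 200 <= status < 400:
            broken.append(target)
    return broken
```

Passing the status lookup in as fetch_status keeps the link extraction testable without touching the network. Some servers reject HEAD requests, so a real checker might fall back to GET.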

--

Slashdot Syndrome: the sudden, extreme urge to correct someone in order to validate one's self.
1. Re:And... ? by pipingguy · 2004-09-24 02:14 · Score: 3, Informative
  
  It wouldn't take long to write a script to find all the broken links on a page.
  
  Just use Xenu's Link Checker.
CMS by Anonymous Coward · 2004-09-24 01:45 · Score: 5, Insightful

Any good Content Management System should already take care of any internal broken links automatically, or notify the webmaster so he'll be able to take care of it manually (in the case of page deletion, etc).

The only kind of people who'd go out of their way to use this software probably already use some sort of CMS.
It will work, but that isn't good, here is why by tod_miller · 2004-09-24 01:45 · Score: 5, Insightful

A link points to document X.

If document X moves, and the link is invalid, a search for the link might actually find document X, and therefore, you have your benefit, and you would have saved a 404.

However - if a document becomes deprecated and deleted, then how can you assume the link is valid?

Or indeed, if the document has no relevant substitute.

A genealogy providing a link to another William Wallace wouldn't be good news if the original page went missing.

A better system is automated 404 alerting to the webservers administrator.

A bad link gets hit, bam, what document, from where. You can work things out intelligently, not automatically.
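
A small sketch of that kind of 404 report, assuming the web server writes Apache's "combined" access log format, which records the referring page. The regular expression and the name not_found_report are illustrative, and the alerting step (mail, ticket, whatever) is left out.

```python
import re
from collections import Counter

# Apache "combined" log format: request line, status, size, referer, user agent.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)"'
)


def not_found_report(log_lines):
    """Count 404 hits per (requested path, referring page)."""
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "404":
            hits[(match.group("path"), match.group("referer"))] += 1
    # Most frequently hit broken links first.
    return hits.most_common()
```

Each entry answers "what document, from where", so the administrator can fix the referring page or restore the target by hand.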

I think this is silly, perhaps grasping at straws, I see no reason why we would replace all our links to google 'I feel lucky' searches, so why do something like this?

This is the essence of what they have, and all they have done is clouded the search IP field (which is important) with 2 more patents, again increasing costs and endangering open source innovation, the true innovative playing field.

Of course, I could be wrong.

--
#hostfile 0.0.0.0 primidi.com 0.0.0.0 www.primidi.com 0.0.0.0 radio.weblogs.com
Obligatory RTFA by acvh · 2004-09-24 01:45 · Score: 4, Insightful

this isn't about replacing links on the internet as a whole... it's about replacing links on your company website, or at least reviewing those links.

not everything that happens in the world is an attempt by big brother to steer internet traffic to verisign or microsoft.
Not New at All by Nazmun · 2004-09-24 01:46 · Score: 3, Funny

Spyware/Adware and IE already give you search results and links. The only difference is that this automatically places you at a different link without a choice.

--
Hmmm... Pie...
Slashdotted by genneth · 2004-09-24 01:48 · Score: 4, Funny

Damn you slashdotters!!! I work at IBM and the intranet server is down! I can't believe you've managed to cause the automatic load-balancer to kill the intranet in favour of a slashdot article.

Damn you!!

And purple hatstands
Simple solution by mirko · 2004-09-24 01:50 · Score: 3, Insightful

ErrorDocument 404 script.pl
Where script.pl parses the wanted URL and asks an indexing engine to find the most relevant page associated with the query...
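
A hedged sketch of what such a handler might look like, written here in Python rather than Perl. It assumes the handler runs as a CGI script and that Apache passes the originally requested path in the REDIRECT_URL environment variable, as it does for local ErrorDocument targets. The search callable is a hypothetical stand-in for whatever indexing engine is available.

```python
import html
import os
import re
from typing import Callable, Iterable


def render_not_found(environ, search: Callable[[str], Iterable[str]]) -> str:
    """Build a CGI response for a missing page, with search suggestions."""
    # Apache sets REDIRECT_URL to the originally requested path when it
    # hands a failed request to a local ErrorDocument handler.
    wanted = environ.get("REDIRECT_URL", "")
    stem = os.path.splitext(wanted)[0]          # "/foo/bar.html" -> "/foo/bar"
    terms = [t for t in re.split(r"[^A-Za-z0-9]+", stem) if t]
    query = " ".join(terms)
    items = "".join(
        f'<li><a href="{html.escape(url)}">{html.escape(url)}</a></li>'
        for url in search(query)
    )
    body = (
        f"<h1>Not Found</h1><p>{html.escape(wanted)} is not here.</p>"
        f"<ul>{items}</ul>"
    )
    # Keep the 404 status so crawlers and link checkers still see the error.
    return "Status: 404 Not Found\r\nContent-Type: text/html\r\n\r\n" + body
```

A CGI entry point would call render_not_found(os.environ, some_search_function) and print the result. Sending the suggestions with a 404 status, rather than redirecting, means the visitor still learns the page is missing.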

--
Trolling using another account since 2005.
No thanks by stratjakt · 2004-09-24 02:09 · Score: 3, Insightful

I'd prefer a more helpful 404 page, maybe with some links to the homepage or main sections of the site on it.

Sort of a "cannot find hello.jpg, click here to go back to the main page".

My point being, if the document I'm looking for is not there, I want to know it's not there. I don't want to read something else, thinking it's what I meant to read.

Usually when I'm googling around and clicking stuff I'm looking for the answer to some coding or computer related problem. I don't want to click on a link for "configuring Samba 3.0 with AD support", and wind up on a "Configuring Samba 2.2 with LDAP" and waste my time following bad advice.

--
I don't need no instructions to know how to rock!!!!
German readers... by dukoids · 2004-09-24 02:12 · Score: 5, Informative

may want to take a look at the master's thesis of Nils Malzahn (from 2003, in German) to see (in detail) how this actually can work:
http://www-ai.cs.uni-dortmund.de/DOKUMENTE/malzahn_2003a.pdf
Basically, the thesis evaluates different methods to build a kind of "fingerprint" of a page. The fingerprint is used to find the page with Google if it is gone or has changed significantly.
The Internet Wayback Machine was used to learn to distinguish pages that disappeared from pages that changed only slightly over time.
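
A rough sketch of one simple way to fingerprint a page: take its few most frequent non-stopword terms and use them as a search query. This is an illustration only, not one of the methods evaluated in the thesis; the stopword list, the default size of 5, and the function names are all assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "for", "on", "with"}


def fingerprint(page_text: str, size: int = 5) -> list[str]:
    """Return the `size` most frequent non-stopword terms in the page."""
    words = re.findall(r"[a-z0-9]+", page_text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    # Sort by descending count, then alphabetically so the result is stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:size]]


def search_query(page_text: str, size: int = 5) -> str:
    """Join the fingerprint terms into a query string for a search engine."""
    return " ".join(fingerprint(page_text, size))
```

Raw frequency is a weak signal on its own; weighting terms by how rare they are across other pages would probably give a more distinctive fingerprint.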
Instead by Dr.+Stavros · 2004-09-24 02:35 · Score: 3, Informative

Just use the W3C's link-checker.
been there, done that. by Quickening · 2004-09-24 02:43 · Score: 4, Informative

For those running a real browser, just make this a link, preferably in your personal tool bar.

javascript:Qr=document.URL;if(Qr=='about:blank'){void(Qr=prompt('Url...',''))};if(Qr)location.href='http://web.archive.org/web/*/'+escape(Qr)

Now when I click on a link that isn't there, I select my Archive search button and it shows me the Wayback Machine's history of that link. Of course it works only if the url hasn't been modified by the server. If it has it's another couple steps (copy link, ^T, archive search, paste url in pop-up dialog)

--
tcboo
Google "I'm Feeling Lucky" by Flamefly · 2004-09-24 02:50 · Score: 4, Interesting

You could create your links using Google's I'm Feeling Lucky feature, assuming it was just a generic link site looking for interesting sites rather than specific articles.

e.g:
http://www.google.com/search?hl=en&ie=UTF-8&q=News+For+Nerds&btnI=Google+Search

And voila, your site will take you to the most popular related site to news for nerds, automagically. If Slashdot died one day, another site would take its place in the Google rankings. FF.
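
A short sketch of generating such links programmatically, with the parameter names copied from the example URL above. The function name feeling_lucky_url is made up, and whether Google keeps honouring btnI this way is not something the code checks.

```python
from urllib.parse import urlencode


def feeling_lucky_url(query: str) -> str:
    """Build a Google "I'm Feeling Lucky" URL for the given search terms."""
    params = {"hl": "en", "ie": "UTF-8", "q": query, "btnI": "Google Search"}
    # urlencode turns spaces into "+" and escapes reserved characters.
    return "http://www.google.com/search?" + urlencode(params)
```

Calling feeling_lucky_url("News For Nerds") reproduces the example link.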
Firefox extension by malx · 2004-09-24 03:13 · Score: 3, Insightful

On a slightly related note, a Firefox extension that searched links ahead and removed the link rendering for those that return a 404 might be handy (albeit fairly evil).

On a less related note, I've long been disappointed that some 300 series status codes in HTTP are so under-exploited, both by clients (e.g. automated bookmark management) and people running web sites.
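
A minimal sketch of how an automated bookmark manager might use those status codes, assuming it makes a request without following redirects and reads the Location header itself. The Bookmark class and apply_status function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bookmark:
    url: str
    needs_review: bool = False


def apply_status(bookmark: Bookmark, status: int,
                 location: Optional[str] = None) -> Bookmark:
    """Update a bookmark from the status of a non-redirect-following request."""
    if status in (301, 308) and location:
        # Permanent redirect: the server says the new address replaces the old one.
        return Bookmark(url=location)
    if status in (302, 303, 307):
        # Temporary redirect: keep the original address.
        return bookmark
    if status in (404, 410):
        # Missing or deliberately gone: leave the URL alone but flag it for a human.
        return Bookmark(url=bookmark.url, needs_review=True)
    return bookmark
```

Only the permanent redirects rewrite the stored URL; everything else either leaves the bookmark alone or asks a person to decide.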
network down by Pragmatix · 2004-09-24 03:41 · Score: 4, Insightful

Of course, if your links happen to go to a network that is experiencing a temporary outage, this tool would wreak havoc.
Soon the target network would be back up, but all your links would be lost and randomly changed to something less useful. Good Invention!
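
One way a link checker could guard against that, sketched under assumed numbers: only treat a link as dead after several consecutive failed checks spread over a minimum period. The thresholds and the name is_probably_dead are hypothetical, and nothing in the articles says whether Peridot does anything like this.

```python
from datetime import datetime, timedelta

# Hypothetical policy: only call a link dead after repeated failures over several days.
MIN_FAILURES = 3
MIN_OUTAGE = timedelta(days=7)


def is_probably_dead(check_history):
    """check_history: list of (timestamp, succeeded) tuples, oldest first."""
    failures = []
    for when, succeeded in check_history:
        if succeeded:
            failures = []          # any success resets the streak
        else:
            failures.append(when)
    if len(failures) < MIN_FAILURES:
        return False
    # The current failure streak must also span the minimum outage period.
    return failures[-1] - failures[0] >= MIN_OUTAGE
```

A network that drops out for an afternoon never reaches the seven-day span, so its links are left untouched.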