Broken Links No More?


Posted by ryuzaki0 on Friday September 24, 2004 @01:35AM from the dream-big dept.

johndoejersey writes "Students in England have developed a tool which could bring an end to broken links. Peridot, developed by UK intern students at IBM, scans company weblinks and replaces outdated information with other relevant documents and links. IBM have already filed 2 patents for the project. The students said Peridot could protect companies by spotting links to sites that have been removed, or which point to wholly unsuitable content. 'Peridot could lead to a world where there are no more broken links,' James Bell, computer science student at the University of Warwick, told BBC News Online. Here is another story on it." See also the BBC story.

31 of 212 comments

Great by gowen · 2004-09-24 01:38 · Score: 5, Insightful

Peridot could lead to a world where there are no more broken links,
... just links that don't go where the author intended. Gee, that's ... just great.

Hang on. On similar lines, I've a great idea. Suppose I type a nonexistent hostname into my browser. Wouldn't it be good if the DNS server just gave me its best guess instead of an error message? Or some kind of Site Finding search engine. That'd be even better than ... :)

--
Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
1. Re:Great by rokzy · 2004-09-24 01:51 · Score: 4, Insightful
  
  I agree it's a bad idea and is imo looking in the wrong direction.
  
  I want things to be LESS tolerant of mistakes, not more. This is why the web is so fucked up. When people can get away with absolute shit, why produce anything better than shit?
Semantic Web? by jarich · 2004-09-24 01:39 · Score: 5, Insightful

Wouldn't this idea work a lot better with semantic web markup attached to links and also to intranet pages?

--
Agile Artisans
No more broken bookmarks... by greppling · 2004-09-24 01:40 · Score: 2, Insightful

...would be good enough for me. I find it really annoying how many of the bookmarks I don't use often are broken after about a year or so.
1. Re:No more broken bookmarks... by TrentL · 2004-09-24 02:09 · Score: 2, Insightful
  
  Why would that be a good thing? If the page I originally bookmarked is gone, I want an error message, not a redirection to something similar.
And sure, Verisign could operate this like DNS by Anonymous Coward · 2004-09-24 01:40 · Score: 1, Insightful

For the good of the internet..
What if the page is deleted, not changed by alta · 2004-09-24 01:40 · Score: 4, Insightful

My biggest problem is when I follow a link to a website that's no longer there. Yeah, moved pages happen, but I don't think they happen as often as deleted pages, expired domains, deleted websites, etc.

--
Do not meddle in the affairs of sysadmins, for they are subtle, and quick to anger.
Take this with a grain of salt by blankman · 2004-09-24 01:41 · Score: 5, Insightful

This sounds a little like SiteFinder from Verisign. Click a broken link and instead of a helpful error message you get whatever content IBM thinks is appropriate. Certainly this could be useful, but it could also end up as just another vehicle for advertising.
Re:Can someone say "Bad Idea Jeans"? by GigsVT · 2004-09-24 01:42 · Score: 5, Insightful

The "related" search isn't what you should be looking at.

Try this.

--
I've had enough abrasive sigs. Kittens are cute and fuzzy.
worrying by TwistedSpring · 2004-09-24 01:42 · Score: 4, Insightful

"Peridot could lead to a world where there are no more broken links". Yes, it could. Peridot could also lead to a world where broken links are not manually and intelligently spotted and repaired, but automatically repaired. Automatic resolution of what a link "ought" to point to is never going to be accurate (look at search engines), and could make a company website a minefield of confusion and frustration for the user.

Only time will tell, I suppose.
1. Re:worrying by FearUncertaintyDoubt · 2004-09-24 02:58 · Score: 2, Insightful
  
  Imagine a case where a broken link is pointed to another link, which later itself becomes a broken link, and so on...it might even be possible that somehow the chain loops back on itself at some point. One thing I've realized in my career is that if you handle an error too gracefully, no one bothers to fix it. I prefer to have errors cause enough of a problem that there is feedback to fix it.
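The looping chain described above is at least detectable. A minimal sketch in Python, assuming a link-repair tool records its substitutions in a hypothetical `replacements` dictionary (dead URL → substitute URL); nothing in the article says Peridot keeps such a record:

```python
def resolve_replacement(url, replacements, max_hops=10):
    """Follow automatic link substitutions from `url` to the final target.

    `replacements` maps a dead URL to the URL that was substituted for it
    (a hypothetical record kept by the link-repair tool). Raises ValueError
    if the chain loops back on itself or needs more than `max_hops` steps.
    """
    path = [url]
    current = url
    for _ in range(max_hops):
        if current not in replacements:
            return current
        current = replacements[current]
        if current in path:
            raise ValueError("replacement loop: " + " -> ".join(path + [current]))
        path.append(current)
    if current not in replacements:
        return current
    raise ValueError(f"more than {max_hops} substitutions in a row")
```

Raising an error instead of quietly picking a target keeps a human in the loop, which is the kind of feedback the poster wants errors to produce.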
Can I just have a web site that, you know, works? by grasshoppa · 2004-09-24 01:42 · Score: 2, Insightful

Websites need to be useful before I start caring about broken links. I can think of any number of sites that started off with the best of intentions, but never quite live up to being useful.

From bad layout, to missing options, to obscure names for common links, it seems that people are actively trying to hide crap from the end user, making their website utterly worthless.

Can we devise a tool that fixes this problem first?

--
Mod me down with all of your hatred and your journey towards the dark side will be complete!
Well that sounds perfectly dreadful by Illserve · 2004-09-24 01:42 · Score: 5, Insightful

Some algorithm cruising through my website, rearranging files as it sees fit?

Sounds like a recipe for utter disaster in the worst case, and a source of mildly embarrassing incidents at best.

How about this algorithm just report dead links to a human instead of trying too hard to be clever?

This sounds like someone had to come up with a final project, and settled on this one.
1. Re:Well that sounds perfectly dreadful by Anonymous Coward · 2004-09-24 04:18 · Score: 1, Insightful
  
  There are plenty of tools that do that already. See, for instance, Site Valet.
  
  Even so, there are an awful lot of websites that people just can't be bothered to maintain. Maybe the development was outsourced, maybe the maintenance guy is on another project, or on holiday, and the owners don't fancy having a link to a porn site active for 2 weeks or more. So there's a case for correcting it automatically.
  
  You don't want it on your site? Don't use it. No-one's forcing you.
yawn by FinestLittleSpace · 2004-09-24 01:43 · Score: 2, Insightful

Look, I'm not being a troll or flamebait or whatever, but seriously, I've had enough of this fucking pipe-dream-chasing crap that gets posted to BBC News and then swiftly chucked up on Slashdot.

The whole BBC News Technology section reminds me of the 'Tomorrow's World' program when it was in full swing, saying how everything could be 'the next big thing' and that we'd likely see it on shop shelves and in every home 'in a year'. Why do these people never learn that so much of this is just press release bullshit?

I'm gonna rant more, so sorry. It's just ridiculous... why do we always have to have these bloody 'next big things'? Why can't people actually look at things rationally (say, as a geek, not a mother of 2 who's never touched a computer) and think 'shit, that's not gonna get very far'?

All these posts are so endlessly flawed, yet we still get news items with titles like 'THE END OF SPAM FOREVER?', 'LINUX COULD KILL WINDOWS IN A MONTH', 'COULD WE SAY GOODBYE TO THE INTERNET AS WE KNOW IT?'... it's all sensationalist bullshit that *might* happen, but not in the press-release-inflated year like they always claim.
And... ? by Doesn't_Comment_Code · 2004-09-24 01:43 · Score: 4, Insightful

Maybe I'm being overly naive, but checking for broken links doesn't seem all that spectacular to me. It wouldn't take long to write a script to find all the broken links on a page.

The only parts that seemed worthwhile are replacing the links automatically, and testing if links are relevant.

I'm not so sure I'd trust a computer to do those things though. I'd much rather have the links flagged and checked by a human.
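The point that the checking half is a short script holds up. A minimal sketch using only Python's standard library; it assumes the linked servers answer HEAD requests (some don't, and a real checker would fall back to GET), and the `status_of` parameter exists only so the check can be swapped out or run offline:

```python
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href="..."> in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") if tag == "a" else None
        if href:
            self.links.append(urljoin(self.base_url, href))


def http_status(url, timeout=10):
    """Status code of a HEAD request, or None if the host can't be reached."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:  # 4xx/5xx responses arrive as exceptions
        return err.code
    except OSError:  # DNS failure, refused connection, timeout (URLError is an OSError)
        return None


def broken_links(html, base_url, status_of=http_status):
    """Return (url, status) for every http(s) link that errors or is unreachable."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    report = []
    for link in collector.links:
        if urlparse(link).scheme not in ("http", "https"):
            continue  # skip mailto:, ftp:, javascript: and similar
        status = status_of(link)
        if status is None or status >= 400:
            report.append((link, status))
    return report
```

This only flags links for a person to review; it never rewrites them.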

--

Slashdot Syndrome: the sudden, extreme urge to correct someone in order to validate one's self.
1. Re:And... ? by julesh · 2004-09-24 03:07 · Score: 2, Insightful
  
  It doesn't only find broken links -- it also alerts you if the content changes substantially. This sounds very useful to me.
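How the tool decides that content "changes substantially" isn't described. One plausible approach, sketched here in Python as an assumption, is to keep the text of each linked page from when the link was made and compare it with the current text; the 0.6 threshold is arbitrary:

```python
from difflib import SequenceMatcher


def changed_substantially(old_text, new_text, threshold=0.6):
    """True if the word-level similarity ratio has fallen below `threshold`.

    difflib's ratio() is 1.0 for identical word sequences and 0.0 when
    no words match.
    """
    similarity = SequenceMatcher(None, old_text.split(), new_text.split()).ratio()
    return similarity < threshold
```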
CMS by Anonymous Coward · 2004-09-24 01:45 · Score: 5, Insightful

Any good Content Management System should already take care of any internal broken links automatically, or notify the webmaster so he'll be able to take care of it manually (in the case of page deletion, etc).

The only kind of people who'd go out of their way to use this software probably already use some sort of CMS.
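For internal links a CMS doesn't even need to crawl, because it already knows which pages exist. A sketch in Python, assuming a hypothetical `pages` mapping from each page's path to the internal paths it links to:

```python
def dangling_internal_links(pages):
    """Map each page to the internal links on it that point at missing pages.

    `pages` maps a page path (e.g. "/about") to the internal paths it links to.
    """
    existing = set(pages)
    report = {}
    for path, links in pages.items():
        missing = sorted(set(links) - existing)
        if missing:
            report[path] = missing
    return report
```

Running a check like this whenever a page is deleted or renamed gives the webmaster the notification described above before any visitor hits the 404.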
It will work, but that isn't good, here is why by tod_miller · 2004-09-24 01:45 · Score: 5, Insightful

A link points to document X.

If document X moves, and the link is invalid, a search for the link might actually find document X, and therefore, you have your benefit, and you would have saved a 404.

However - if a document becomes deprecated and deleted, then how can you assume the link is valid?

Or indeed, if the document has no relevant substitute.

A genealogy site providing a link to another William Wallace wouldn't be good news if the original page went missing.

A better system is automated 404 alerting to the web server's administrator.

A bad link gets hit, bam, what document, from where. You can work things out intelligently, not automatically.
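Much of that alerting can come straight from the web server's access log: Apache's "combined" log format records both the requested path and the Referer, which answers "what document, from where". A sketch in Python, assuming that log format; delivering the alert (mail, pager) is left out:

```python
import re
from collections import Counter

# One line of Apache's "combined" format:
# host ident user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'\S+ \S+ \S+ \[[^\]]*\] "\S+ (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)"'
)


def not_found_report(log_lines):
    """Count 404 responses per (requested path, referring page)."""
    hits = Counter()
    for line in log_lines:
        match = COMBINED.match(line)
        if match and match.group("status") == "404":
            hits[(match.group("path"), match.group("referer"))] += 1
    return hits
```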

I think this is silly, perhaps grasping at straws. I see no reason why we would replace all our links with Google 'I'm Feeling Lucky' searches, so why do something like this?

This is the essence of what they have, and all they have done is cloud the search IP field (which is important) with 2 more patents, again increasing costs and endangering open source innovation, the true innovative playing field.

Of course, I could be wrong.

--
#hostfile 0.0.0.0 primidi.com 0.0.0.0 www.primidi.com 0.0.0.0 radio.weblogs.com
Obligatory RTFA by acvh · 2004-09-24 01:45 · Score: 4, Insightful

This isn't about replacing links on the Internet as a whole... it's about replacing links on your company website, or at least reviewing those links.

Not everything that happens in the world is an attempt by Big Brother to steer internet traffic to Verisign or Microsoft.
Re:Can someone say "Bad Idea Jeans"? by Anonymous Coward · 2004-09-24 01:49 · Score: 1, Insightful

No.

That just happens to work if your intent was to link to any random site about major league baseball (the first of the two cases outlined in the original post).

But what if your intent was to provide a list of the current US professional sports organizations and their homepages?

Linking MLB to some other random site ABOUT major league baseball makes no sense. (And if MLB.com no longer exists, MLB probably no longer exists, so the link should probably even be removed!)

Again, a computer program can't know your intent.
Simple solution by mirko · 2004-09-24 01:50 · Score: 3, Insightful

ErrorDocument 404 /script.pl
where script.pl parses the wanted URL and asks an indexing engine to find the most relevant page associated with the query...
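What such a script might do can be sketched in Python. It assumes the site keeps a hypothetical `index` mapping each existing URL to a short description, and ranks pages by the words they share with the requested path. Under Apache, a CGI error document can read the originally requested path from the REDIRECT_URL environment variable:

```python
import re


def tokens(text):
    """Lower-case words and numbers: '/Docs/Samba-3.0_setup.html' -> docs, samba, 3, 0, setup, html."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def suggest_pages(requested_path, index, limit=3):
    """Rank indexed pages by how many words they share with the requested path.

    `index` is a hypothetical map from each existing URL to descriptive text
    (title, keywords). Returns at most `limit` URLs, best match first.
    """
    wanted = tokens(requested_path) - {"html", "htm", "php", "www"}
    scored = []
    for url, text in index.items():
        overlap = len(wanted & tokens(url + " " + text))
        if overlap:
            scored.append((overlap, url))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # most overlap first, then alphabetical
    return [url for _, url in scored[:limit]]
```

For example, a request for /old/samba/3.0/ad-howto.html against an index holding a "Configuring Samba 3.0 with AD support" page and a "Configuring Samba 2.2 with LDAP" page ranks the 3.0 page first but still lists the 2.2 one, which is why showing suggestions on the 404 page is safer than redirecting to the top match.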

--
Trolling using another account since 2005.
Prior Art by Bill+Dimm · 2004-09-24 01:58 · Score: 2, Insightful

They think there is something to patent here? Seems like there should be prior art all over the place. At MagPortal.com we have been using software to repair our links to articles for years.
Rather than finding new relevant links... by Lifix · 2004-09-24 01:59 · Score: 2, Insightful

How about this: let's find a better way to eliminate bad links. Have a bot scan your company's web site, and every time it finds a link to an outside source, save that page to your servers. If the link gets screwed up, you can replace it with a link to the saved web page on your server until you can do something about it.

This would not work with large web sites, but if it is just a link to a how-to guide or something small like that this would work.
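That fallback can be sketched in Python under a few assumptions: `fetch` is any function returning a page's text or None on failure, saved copies are kept in memory, and `/archive/` is a made-up path where the site would serve them:

```python
from urllib.parse import quote


class LinkArchive:
    """Keep the last good copy of each outside page a site links to."""

    def __init__(self, fetch, local_prefix="/archive/"):
        self.fetch = fetch                # callable: url -> page text, or None on failure
        self.local_prefix = local_prefix  # hypothetical path where saved copies are served
        self.copies = {}                  # url -> last successfully fetched text

    def refresh(self, url):
        """Re-fetch a linked page, saving it if it loaded. Returns True if it loaded."""
        page = self.fetch(url)
        if page is not None:
            self.copies[url] = page
        return page is not None

    def href_for(self, url):
        """Link to the live page while it works, else to the saved copy if there is one."""
        if self.refresh(url):
            return url
        if url in self.copies:
            return self.local_prefix + quote(url, safe="")
        return url  # never saved: leave the link for a human to deal with
```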

--
In nature, there are neither rewards nor punishments, there are only consequences.
Re:Can someone say "Bad Idea Jeans"? by kzinti · 2004-09-24 02:06 · Score: 2, Insightful

Obviously if you wanted to link to a site about baseball, all of those (other than ESPN) are really entirely irrelevant.

Yep. Because Major League Baseball has strong conceptual similarity to several other concepts: the game of baseball, professional sports, American culture, and others. Granted some are more specific than others, but that's a pretty tough judgment call that depends on the context in which the original link occurred. If I link to mlb.com from a site about baseball, then it means something different than if I link to mlb.com from a site about American culture, and if mlb.com goes dead, then the link would have to be replaced in different ways for the different sites. So this link-replacement software has to be smart about both the destination and the source (context) of a link in order to replace a dead one properly.

I wouldn't go so far as to say this has to be done manually. In theory, software could handle it, but I've never seen software smart enough to automate the task for such a broad information source as the Internet.

(Disclaimer: no, I have not RTFA; just reacting to an interesting post.)
No thanks by stratjakt · 2004-09-24 02:09 · Score: 3, Insightful

I'd prefer a more helpful 404 page, maybe with some links to the homepage or main sections of the site on it.

Sort of a "cannot find hello.jpg, click here to go back to the main page".

My point being, if the document I'm looking for is not there, I want to know it's not there. I don't want to read something else, thinking it's what I meant to read.

Usually when I'm googling around and clicking stuff I'm looking for the answer to some coding or computer related problem. I don't want to click on a link for "configuring Samba 3.0 with AD support", and wind up on a "Configuring Samba 2.2 with LDAP" and waste my time following bad advice.

--
I don't need no instructions to know how to rock!!!!
A better solution from the BOFH by tomhudson · 2004-09-24 02:17 · Score: 2, Insightful

The BOFH has a better solution: http://www.theregister.com/2004/09/17/bofh_2004_episode_31/
"Ladies und Gentlemans, I present to you... The Newsmaker!" the PFY chirps happily, waving his hand at his squid plug-in. "Which does?" "Give me a news headline, anything, no matter how ridiculous!" "Scientists discover intelligent life in Redmond!" >clickety< >clickety< >click< >clickety< >tap< >tap< >clickety< ... >click< "Right, now Google for it!" I dutifully fire up Google, bash in Redmond and Intelligence, and roger me senseless with a full height drive if the first 10 hits don't point show up the headline I've just created, pointing at Time Warner, Yahoo News, all the greats... "Interesting - injecting false links into Google to point at news sites. I like it!" "Ahem," the PFY interrupts. "Click on one of the links." I do so, and grab that hard drive for a second go if the site concerned doesn't come up with the headline in question! "You hacked the news site?" "Not at all! I used the base idea behind banner blocking to remove the lead headline of a news site and insert my headline instead. You can even add a picture if you want, but obviously only for things that are possible to prove." "So will this work for all the news sites listed?" "Oh yes. And more importantly, the various search sites as well. So no matter what common search engine you use, the proxy discards the first 100 matches and inserts 100 of its own 'matches' instead."
Honestly, which of the two is more deserving of patent protection as an "innovative, non-obvious invention"?
Re:Can someone say "Bad Idea Jeans"? by shawn(at)fsu · 2004-09-24 02:21 · Score: 3, Insightful

After RTFA, it doesn't seem fair to compare it to Google's "related" search or Verisign's "product"; this looks like a technology a webmaster would use on their own site. It also gives them the option to accept the suggestion or not. This could be really good for corporations with large intranet sites, where documents constantly get moved as webmasters leave.

I think had the original poster read the article, they wouldn't have gone off half-cocked. IBM must also be somewhat confident that this is new technology, or else they wouldn't have filed two patents for it.

--
500 dollar reward for tip(s) leading to the arrest of the person(s) who stole my sig.
Firefox extension by malx · 2004-09-24 03:13 · Score: 3, Insightful

On a slightly related note, a Firefox extension that checked links ahead of time and stopped rendering the ones that return a 404 as links might be handy (albeit fairly evil).

On a less related note, I've long been disappointed that some 300 series status codes in HTTP are so under-exploited, both by clients (e.g. automated bookmark management) and people running web sites.
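HTTP already carries what a bookmark manager needs here: 301 (and, in later HTTP specifications, 308) says a resource has moved for good, while 302 and 307 say the move is temporary. A sketch in Python, with a hypothetical `head` callable standing in for the HTTP request so the logic can run offline:

```python
from urllib.parse import urljoin

PERMANENT_REDIRECTS = {301, 308}  # Moved Permanently, Permanent Redirect


def update_bookmark(url, head, max_hops=5):
    """Return the URL a bookmark should store from now on.

    `head(url)` is assumed to return (status, location) for a request to
    `url`, with location None when there is no Location header. Only
    permanent redirects are followed; a temporary redirect means the old
    URL is still the right one to keep.
    """
    for _ in range(max_hops):
        status, location = head(url)
        if status not in PERMANENT_REDIRECTS or not location:
            return url
        url = urljoin(url, location)  # Location may be relative
    return url
```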
network down by Pragmatix · 2004-09-24 03:41 · Score: 4, Insightful

Of course, if your links happen to go to a network that is experiencing a temporary outage, this tool would wreak havoc.
Soon the target network would be back up, but all your links would be lost and randomly changed to something less useful. Good Invention!
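The outage case argues for never rewriting a link on a single failed check. A sketch in Python of one such policy, assuming a link only counts as dead after at least three consecutive failed checks spanning a week or more; both numbers are arbitrary:

```python
from datetime import datetime, timedelta


class FailureTracker:
    """Treat a link as dead only after repeated failures spread over time."""

    def __init__(self, min_failures=3, min_span=timedelta(days=7)):
        self.min_failures = min_failures
        self.min_span = min_span
        self.failures = {}  # url -> datetimes of consecutive failed checks

    def record(self, url: str, ok: bool, when: datetime) -> None:
        if ok:
            self.failures.pop(url, None)  # any success resets the streak
        else:
            self.failures.setdefault(url, []).append(when)

    def is_dead(self, url: str) -> bool:
        times = self.failures.get(url, [])
        return len(times) >= self.min_failures and times[-1] - times[0] >= self.min_span
```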
A better idea by pdamoc · 2004-09-24 03:47 · Score: 2, Insightful

Maybe it would be better if that smart program replaced the links with links to archive.org, or maybe to the Google cache.
Then of course it has to be smart enough to know it did that and replace the links with the originals again if they come back online.
Sometimes "broken links" can recover.
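Switching back is easiest if the tool never overwrites the original URL and only decides at render time which address to emit. A sketch in Python, assuming `is_up` is some availability check and that the Internet Archive's Wayback Machine answers an address of the form `https://web.archive.org/web/<original URL>` with its latest capture:

```python
WAYBACK_PREFIX = "https://web.archive.org/web/"  # assumed: prefix + original URL -> latest capture


def choose_href(original, is_up):
    """Link to the original while it answers, else to an archived copy.

    Only `original` is ever stored, so a link sent to the archive during an
    outage points back at the live site as soon as `is_up` succeeds again.
    """
    return original if is_up(original) else WAYBACK_PREFIX + original
```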