Broken Links No More?
johndoejersey writes "Students in England have developed a tool which could bring the end to broken links. Peridot, developed by UK intern students at IBM, scans company weblinks and replaces outdated information with other relevant documents and links. IBM have already filed 2 patents for the project. The students said Peridot could protect companies by spotting links to sites that have been removed, or which point to wholly unsuitable content. 'Peridot could lead to a world where there are no more broken links,' James Bell, computer science student at the University of Warwick, told BBC News Online. Here is another story on it." See also the BBC story.
There are two parts to this tool, one of which is quite bad and one of which is quite good.
First, replacing links. This is quite a bad idea. Here's why, with an example.
In general, we can all agree that the technology behind Google is pretty impressive. It has its own "More Pages Like This" feature, which we can assume is at least somewhat similar to this one: complex content analysis among billions of pages, to determine which are similar and which are different.
So, suppose we had a link to Major League Baseball, www.mlb.com on our page. And suppose, for whatever reason, that their site went away (perhaps a few more players' strikes?).
Well, what does Google suggest as a replacement? Check it out here.
First the National Football League (NFL), then the National Basketball Association (NBA), and then the National Hockey League (NHL). Followed by the ESPN sports network, and NASCAR racing.
Obviously, if we wanted to link to a site about baseball, all of those (other than ESPN) are really entirely irrelevant.
But if we wanted to link to a site about professional sports organizations, all of those (other than ESPN) are QUITE relevant.
Can this software know our intent?
Hardly.
You really have to question the ability of machines to select relevant links.
The situation is this: If someone goes to the trouble to manually create links in the first place, those should not be automatically changed to other sites that some computer program thinks may be related. Links shouldn't be inserted automatically; if someone needs more information on something you haven't linked to, they can use a search engine. And then your company isn't liable to look idiotic by linking to irrelevant sites.
Now, the other aspect of this product.
Removing dead or changed links is quite another matter. Automated removal of links is a great idea and quite useful. For example, consider when someone's domain name expires and it is taken over by a porn site. It'd be great to have a program that automatically removes links to it from your site. Like this tool, this could be based on a percentage of changed content--if the content changes significantly, remove the link quickly and automatically. If the content changes by some intermediate amount, flag the link as needing review by the webmaster.
But in both cases, the software should present the webmaster with a list of such questionable links, including those it has removed from the site temporarily, and then allow the webmaster to select replacement links.
Manually. With relevance.
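For what it's worth, the percentage idea above can be sketched in a few lines of Python. This is only an illustration: the function names and the 30%/70% cut-offs are made up here, and nothing in the story says Peridot works this way.

```python
from difflib import SequenceMatcher


def change_ratio(old_text: str, new_text: str) -> float:
    """Fraction of the page that changed: 0.0 = identical, 1.0 = nothing in common."""
    # Compare word by word so small whitespace differences do not count.
    matcher = SequenceMatcher(None, old_text.split(), new_text.split())
    return 1.0 - matcher.ratio()


def classify_link(old_text: str, new_text: str,
                  review_at: float = 0.3, remove_at: float = 0.7) -> str:
    """Decide what to do with a link whose target changed (thresholds are assumptions)."""
    changed = change_ratio(old_text, new_text)
    if changed >= remove_at:
        return "remove"   # pull the link now, list it for the webmaster
    if changed >= review_at:
        return "review"   # leave it up, but flag it for the webmaster
    return "keep"
```

Either way, the output is a list for a human to act on; the script never picks a replacement link itself.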
And I'm sure I'll be making use of this technology alongside my copy of Duke Nukem Forever.
Hang on. On similar lines, I've a great idea. Suppose I type a nonexistent hostname into my browser. Wouldn't it be good if the DNS server just gave me its best guess instead of an error message? Or some kind of Site Finding search engine. That'd be even better than
Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
A team in the Netherlands built an application that listens to contact centre conversations, picks out relevant keywords and automatically prompts the call centre agent with possible answers.
Does this app take the form of a paper clip? Because that would be a great idea!
Wouldn't this idea work a lot better with semantic web markup attached to links and also to intranet pages?
Agile Artisans
I think the link is broken... :)
...would be good enough for me. I find it really annoying how many of the bookmarks I don't use often are broken after about a year or so.
For the good of the internet..
My biggest problem is when I follow a link to a website that's no longer there. Yeah, moved pages happen, but I don't think they happen as often as deleted pages, expired domains, deleted websites, etc.
Do not meddle in the affairs of sysadmins, for they are subtle, and quick to anger.
This sounds a little like SiteFinder from Verisign. Click a broken link and instead of a helpful error message you get whatever content IBM thinks is appropriate. Certainly this could be useful, but it could also end up as just another vehicle for advertising.
Suppose you have broken link http://somesite.com/foo/bar.html, some sites return a list of search results from within 'somesite.com' matching 'foo' or 'bar'. Quite clever, and much more useful than a plain old 'page not found' error.
This just takes that one step further by doing the searching at the referring end instead.
Like tinyurl, but one letter less! http://qurl.co.uk/
Correct me if I'm wrong, but couldn't the majority of this New Exciting Technology (TM) be achieved with a small amount of Bash scripting and a copy of wget?
"The dew has clearly fallen with a particularly sickening thud this morning"
I decided it'd be too hard for software to decide whether a change was significant. I wonder how this software does it - presumably, you can change the threshold?
You can read more information about this process here.
3D Printing Tips and Tricks at Zheng3.com
"Peridot could lead to a world where there are no more broken links". Yes, it could. Peridot could also lead to a world where broken links are not manually and intelligently spotted and repaired, but automatically repaired. Automatic resolution of what a link "ought" to point to is never going to be accurate (look at search engines), and could make a company website a minefield of confusion and frustration for the user.
Only time will tell, I suppose.
Websites need to be useful before I start caring about broken links. I can think of any number of sites that started off with the best of intentions, but never quite lived up to being useful.
From bad layout, to missing options, to obscure names for common links, it seems that people are actively trying to hide crap from the end user, making their website utterly worthless.
Can we devise a tool that fixes this problem first?
Mod me down with all of your hatred and your journey towards the dark side will be complete!
Some algorithm cruising through my website, rearranging files as it sees fit?
Sounds like a recipe for utter disaster in the worst case, and a source of mildly embarrassing incidents at best.
How about this algorithm just report dead links to a human instead of trying too hard to be clever?
This sounds like someone had to come up with a final project, and settled on this one.
Look, I'm not being a troll or flamebait or whatever, but seriously, I've had enough of this fucking pipedream chasing crap that gets posted to BBC News and then swiftly chucked up on Slashdot.
The whole BBC News Technology section reminds me of the 'Tomorrow's World' program when it was in full swing, saying how everything could be 'the next big thing' and that we'd likely see it on shop shelves and in every home 'in a year'. Why do these people never learn that so much of this is just press release bullshit?
I'm gonna rant more, so sorry. It's just ridiculous... why do we always have to have these bloody 'next big things'? Why can't people actually look at things rationally (say, as a geek, not a mother of 2 who's never touched a computer) and think 'shit, that's not gonna get very far'.
All these posts are so endlessly flawed, yet we still get news items with titles like 'THE END OF SPAM FOREVER?', 'LINUX COULD KILL WINDOWS IN A MONTH', 'COULD WE SAY GOODBYE TO THE INTERNET AS WE KNOW IT?'... it's all sensationalist bullshit that *might* happen, but not in the press-release inflated year like they always claim.
Maybe I'm being overly naive, but checking for broken links doesn't seem all that spectacular to me. It wouldn't take long to write a script to find all the broken links on a page.
The only parts that seemed worthwhile are replacing the links automatically, and testing if links are relevant.
I'm not so sure I'd trust a computer to do those things though. I'd much rather have the links flagged and checked by a human.
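Here's roughly what such a script could look like in Python, standard library only. It is a sketch, not a crawler: the names are invented for illustration, it only looks at `<a href>` links on a single page, and it treats any 4xx/5xx answer or unreachable host as "broken", which is cruder than you'd want in practice.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collects absolute http(s) URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            url = urljoin(self.base_url, href)
            # Skip mailto:, javascript: and friends.
            if url.startswith(("http://", "https://")):
                self.links.append(url)


def http_status(url, timeout=10):
    """Return the HTTP status code, or None if the host could not be reached."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        return err.code
    except (URLError, OSError):
        return None


def broken_links(html, base_url, status_of=http_status):
    """Return (url, status) pairs for links a human should look at."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    report = []
    for url in collector.links:
        status = status_of(url)
        if status is None or status >= 400:
            report.append((url, status))
    return report
```

The `status_of` parameter is only there so the checker can be tried out without hitting the network. The result is a report to hand to a person, nothing gets rewritten.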
Slashdot Syndrome: the sudden, extreme urge to correct someone in order to validate one's self.
Any good Content Management System should already take care of any internal broken links automatically, or notify the webmaster so he'll be able to take care of it manually (in the case of page deletion, etc).
The only kind of people who'd go out of their way to use this software probably already use some sort of CMS.
A link points to document X.
If document X moves, and the link is invalid, a search for the link might actually find document X, and therefore, you have your benefit, and you would have saved a 404.
However - if a document becomes deprecated and deleted, then how can you assume the link is valid?
Or indeed, if the document has no relevant substitute.
A genealogy providing a link to another William Wallace wouldn't be good news if the original page went missing.
A better system is automated 404 alerting to the web server's administrator.
A bad link gets hit, bam, what document, from where. You can work things out intelligently, not automatically.
I think this is silly, perhaps grasping at straws, I see no reason why we would replace all our links to google 'I feel lucky' searches, so why do something like this?
This is the essence of what they have, and all they have done is clouded the search IP field (which is important) with 2 more patents, again increasing costs and endangering open source innovation, the true innovative playing field.
Of course, I could be wrong.
#hostfile 0.0.0.0 primidi.com 0.0.0.0 www.primidi.com 0.0.0.0 radio.weblogs.com
this isn't about replacing links on the internet as a whole... it's about replacing links on your company website, or at least reviewing those links.
not everything that happens in the world is an attempt by big brother to steer internet traffic to verisign or microsoft.
Spyware/Adware and IE already give you search results and links. The only difference is that this automatically places you at a different link without a choice.
Hmmm... Pie...
You are the weakest link. Goodbye!
Damn you slashdotters!!! I work at IBM and the intranet server is down! I can't believe you've managed to cause the automatic load-balancer to kill the intranet in favour of a slashdot article.
Damn you!!
And purple hatstands
The story reminds me of my diploma, which I wrote at IBM Germany. Ah, the good days....
I have never worked with people like that - highly skilled and very friendly and approachable. As a group very concentrated, but very relaxed as individuals.
Wouldn't be an exaggeration to say that that experience has defined what I view as professionalism.
some over funded jumped up interns have developed a high tech, method and software and system to stop the slashdot effect.
Each webserver will return a redirect to a google cache lookup for itself if the server load gets too high.
1: Stupid idea
2: Patent
3: Wait 'til someone nudges at your generously worded patent
4: happily license this unrelated technology to keep their VC peeps in the green.
#hostfile 0.0.0.0 primidi.com 0.0.0.0 www.primidi.com 0.0.0.0 radio.weblogs.com
ErrorDocument 404 script.pl
Where script.pl parses the wanted URL and asks an indexing engine to find the most relevant page associated with the query...
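The "parse the wanted URL" half of that is simple enough to sketch. This is Python rather than Perl, the function name and the list of ignored words are made up for illustration, and the call to the indexing engine is left out entirely since that depends on what you run.

```python
import re
from urllib.parse import unquote, urlsplit

# Path fragments that say nothing about the content (an assumed, incomplete list).
IGNORED = {"index", "html", "htm", "php", "asp", "www"}


def search_terms(missing_url):
    """Turn the path of a missing URL into a list of search keywords."""
    path = unquote(urlsplit(missing_url).path)
    words = re.split(r"[^A-Za-z0-9]+", path.lower())
    return [w for w in words if w and w not in IGNORED]
```

For http://somesite.com/foo/bar.html this gives the terms foo and bar, which the 404 handler would then hand to whatever site search it has.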
Trolling using another account since 2005.
Remember Google-hacks at http://johnny.ihackstuff.com/? Basically, since Google effectively snoops millions of servers, you can use this information to break into servers and get information. Having an internal feature that connects broken links to real pages may be orders of magnitude worse. What if I imaginatively "linked" to a made-up URL to see what's on your servers? This could be bad news if it's effectively done.
A NYC lawyer blogs. http://www.chuangblog.com/
How about if I type up a large scientific paper loaded with nonexistent links, and then their software will "fix" it by finding the proper material out on the net and pointing the links to it? This could revolutionize the science of hand waving!
One line blog. I hear that they're called Twitters now.
They think there is something to patent here? Seems like there should be prior art all over the place. At MagPortal.com we have been using software to repair our links to articles for years.
How about this: let's find a better way to eliminate bad links. Have a bot scan your company's web site, and every time it finds a link to an outside source, save that page to your servers. If the link gets screwed up, you can replace it with a link to the saved web page on your server until you can do something about it.
This would not work with large web sites, but if it is just a link to a how-to guide or something small like that this would work.
In nature, there are neither rewards nor punishments, there are only consequences.
How can you tell if the link's changed content is an update that's OK, or an update that's "not ok"? If the tool could do that, it could create a site of links related to whatever, kinda like google, but it sounds like it would be a whole level smarter than google somehow.
I'd prefer a more helpful 404 page, maybe with some links to the homepage or main sections of the site on it.
Sort of a "cannot find hello.jpg, click here to go back to the main page".
My point being, if the document I'm looking for is not there, I want to know it's not there. I don't want to read something else, thinking it's what I meant to read.
Usually when I'm googling around and clicking stuff I'm looking for the answer to some coding or computer related problem. I don't want to click on a link for "configuring Samba 3.0 with AD support", and wind up on a "Configuring Samba 2.2 with LDAP" and waste my time following bad advice.
I don't need no instructions to know how to rock!!!!
This is AMAZING. The intarweb will never be the same.
http://www-ai.cs.uni-dortmund.de/DOKUMENTE/malzahn_2003a.pdf
Basically, the thesis evaluates different methods to build a kind of "finger-print" of a page. The finger print is used to find the page with google if it is gone, or has changed significantly.
The Internet Archive's Wayback Machine was used to learn how to distinguish pages that have disappeared from pages that merely change slightly over time.
Wonder where it would send me if www.hotmail.com were down?
*shudder*
(disclaimer: no, I didn't actually look to see what's on that site)
"If you think you have things under control, you're not going fast enough." --Mario Andretti
just how do you patent "wget -r $SITE |grep 404" ?
if you can, i'm going to patent "/usr/games/fortune -o zippy |cowsay |wall"
there's penis in my pooper and it hurts real bad
So, they've invented SED? 'Cause that's what I've been using for years to replace old/broken links. A simple script using the netsaint/nagios service tests can check if a link is still good and then build a list of bad ones to be replaced by script number two using SED.
"Have you ever thought about just turning off the TV, sitting down with your kids, and hitting them?"
Perfectly good and useful technology that everyone can use, and some asshat company has to f'n jump on the bandwagon and patent it in hopes to make a fucking dime... give me a break.
I weep for the future of technology if this is what it's gonna come down to.
We have secretly replaced these Slashdot mods' sense of humor with a rusty nail. Let's see if they notice!!
So what does this mean?
Quite simply, the rules of the English language dictate that saying "IBM have" is WRONG. "IBM HAS" is correct.
Thanks in advance for stopping this bastardization of the world's most-spoken language.
Alright, I gotta know, how is this a troll?
I don't mind being modded down, but seriously, I am at a loss here.
Mod me down with all of your hatred and your journey towards the dark side will be complete!
Being bad with names, I can't recall what the service is called or the URL - but I have received e-mail from a service (free since I haven't paid for it, that is for sure) that I thought was spam, but upon reading it saw that it is actually terrifically useful.
It spiders the web and looks for links that are dead, and then e-mails a contact on that site to say that the links are broken and on what page they can be found.
Come to think of it, I don't even know what address it sends it to.
Basically it is all magic to me, but I like it.
There are some odd things afoot now, in the Villa Straylight.
Wonder what RIAA thinks about it?
"It would be wrong to refuse to face the fact that everything is fundamentally sick and sad."
Just use the W3C's link-checker.
"IBM have already filed 2 patents for the project."
More evidence that IBM isn't really committed to an open/free philosophy.
What if (and the chances are high) I store the URLs in some DB, let's add a proprietary format + compression for fun here, and then let's fetch these URLs by a script depending on user entered parameters. Imagine full-text searching or stuff. How is it at all possible to write a universal checking tool here??? Man, I wish people would stop wasting energy on stuff like "automatic C++ to Java converter" and similar bullshit when every semi-knowledgeable person can instantly say that the thing is no go.
For those running a real browser, just make this a link, preferably in your personal tool bar.
javascript:Qr=document.URL;if(Qr=='about:blank'){void(Qr=prompt('Url...',''))};if(Qr)location.href='http://web.archive.org/web/*/'+escape(Qr)
Now when I click on a link that isn't there, I select my Archive search button and it shows me the Wayback Machine's history of that link. Of course it works only if the url hasn't been modified by the server. If it has it's another couple steps (copy link, ^T, archive search, paste url in pop-up dialog)
tcboo
When I worked at IBM the interns never did anything useful. They would come and go as they please and do their homework when they were around. When work got busy they would go home early.
Just think of how many pages we could crash with just one article with this tool... *drool*
You could create your links using Google's "I'm Feeling Lucky" feature, assuming it was just a generic link site looking for interesting sites rather than specific articles.
e.g.:
http://www.google.com/search?hl=en&ie=UTF-8&q=News+For+Nerds&btnI=Google+Search
And voila, your site will take you to the most popular site related to news for nerds, automagically. If slashdot died one day, another site would take its place in the google rankings. FF.
This is quite often true in respect to sites/companies with large webpages and hence lots of links. One company I used to work in the internet/intranet division for kept links to several partners' webpages. When one of those partners let their domain expire, it was bought out by a pr0n company.
You can imagine how much the staff enjoyed the content on the new page... and the IT Security folks especially, as the proxy was suddenly giving them lots of nice warnings about workers viewing inappropriate content (probably due to the nasty popups, etc).
It is, unsurprisingly, extremely easy to just write a script which checks if links are working and ignores them if they are working or, if they are not working, reports them to the admin and makes them into Not-Links in the page that actually gets posted. Although that might leave a few gaps in navigation, at least the gaps don't let people follow them to dead-ends. And, with the admin warned, they can be fixed promptly.
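The "Not-Links" step might look something like this in Python. It is a quick sketch with a made-up function name, and it uses a regular expression that only copes with simple double-quoted `<a href="...">` tags; a real site would want a proper HTML parser. The set of dead URLs is assumed to come from whatever link checker you already run.

```python
import re

# Matches <a ... href="URL" ...>text</a>; deliberately simple, see the caveats above.
ANCHOR = re.compile(
    r'<a\s[^>]*?href="(?P<href>[^"]*)"[^>]*>(?P<text>.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def unlink_dead(html, dead_urls):
    """Replace anchors pointing at known-dead URLs with their plain text."""
    def replace(match):
        if match.group("href") in dead_urls:
            return match.group("text")   # keep the words, drop the link
        return match.group(0)            # leave working links untouched
    return ANCHOR.sub(replace, html)
```

The source page stays as it was; only the copy that gets published loses the dead links, so the admin can restore them once they are fixed.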
You know what, none of that link stuff worries me one bit. Links are bound to be irrelevant/stupid/broken unless someone really cares about them and monitors them manually.
No, the big worry for me is PATENTS. What the hell are they patenting? What is the Big Idea here that deserves a patent? This is scary stuff. What, do we have to find prior art for every stupid idea someone decides to patent? The answer is "yes." We are all out of business if we let this continue. Support the EFF! Kill this stupidity now before we are all out of work.
Clippy indeed, must be a slow news day,
- RLJ
Wait... I already get this with the Spyware that is on my computer.
On a slightly related note, a Firefox extension that searched links ahead and removed the link rendering for those that return a 404 might be handy (albeit fairly evil).
On a less related note, I've long been disappointed that some 300 series status codes in HTTP are so under-exploited, both by clients (e.g. automated bookmark management) and people running web sites.
The problem is larger than an individual site. Since the web is built on a distributed platform, solving broken links in the small is a good start, but not complete. External sites that have the link also need to be corrected (whether an update, a delete, a move, etc.).
One way to extend this idea is to make use of the Referer: field in HTTP. I worked on an early prototype ('95-96, http://www5conf.inria.fr/fich_html/papers/P10/Overview.html) of this that would not only notify the local system administrator of a broken link, but also provide a facility to notify the administrator of the system from whence the request originated.
Automating this is difficult due to security issues, but it's at least worth somebody continuing to do research and finding ways to make things better, even if incrementally.
Not sure I like the idea of patenting this, but I do like the idea of people working on it.
Kipp
Then they will get a very large problem with me... Large enough to get hardball tactics from my closet...
I periodically run dead link checking software to perform this function with regards to my bookmarks, some of which I publish on my web sites.
There are many things that happen to links, such as redirects, but to conclude that a link is down because you get a 4xx or 5xx HTTP response is extreme. Sometimes sites go down for a period of time for various reasons. Such a link replacement process would need to have some kind of forgiveness mechanism. Further, sometimes links move elsewhere without the benefit of redirects--this replacement process therefore shouldn't replace links with "related" content, but the same content that's moved to another spot.
The bottom line is that the replacement process requires a step in the process where a human being reviews link change recommendations.
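A forgiveness mechanism can be as simple as counting consecutive failed checks. Here is a minimal Python sketch; the class name and the default of three strikes are assumptions picked for illustration, and note that it only ever puts a link up for human review.

```python
from dataclasses import dataclass


@dataclass
class LinkHealth:
    """Tracks how many checks in a row have failed for one link."""
    url: str
    consecutive_failures: int = 0

    def record(self, status):
        """status is an HTTP status code, or None if the host was unreachable."""
        if status is not None and status < 400:
            self.consecutive_failures = 0      # one good answer forgives everything
        else:
            self.consecutive_failures += 1

    def needs_review(self, max_failures=3):
        """True once the link has failed max_failures checks in a row."""
        return self.consecutive_failures >= max_failures
```

Run the checks a day apart and a site that is down for an afternoon never shows up in the report.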
Steve Magruder, Metro Foodist
So if I post a link to some cool pr0n site, or make my own pr0n....and shut the link down...will it find me more quality porn?
Now THAT is the question.
I swear, when I worked as an intern there, they encouraged patents on anything and everything. They were even proud to lead other technology companies in having patents.
Can it deal with the "Slashdot Effect"?
If broken links are a problem, maybe the html/http pair would be better shaped more according to the original Xanadu project. http://xanadu.com/
What's in a sig?
The "I'm feeling lucky" link.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
Soon the target network would be back up, but all your links would be lost and randomly changed to something less useful. Good Invention!
Broken Links No More? "Students in England have developed a tool which could bring the end to broken links. Peridot............Peridot could lead to a world where there are no more broken links,'"
I'll troll to hell for this, but I could care less, and I have no problems standing up for what I say. This is terribly irresponsible journalism. No fucking where in the summary does it mention intranet or corporate websites. A world would be pretty global, would it not? Again, the headlines and summaries are getting totally out of sync with the actual articles. Nothing gets edited and people are bitching all over the place about what essentially has amounted to stories (hah, an actual story on slashdot) that are ads. I mean come on. Product announcements versus real news?
I'm sure for as many people that read this site (and pay the bills, as well as *ahem* salary) that the editors must surely get a great deal of story submissions every day. Hell, I'm sure that many people wouldn't actually mind some fucking content once in a while other than links to other stories. The site hasn't been changed in what, like years now? Are the "editors" that busy that they can't even hire someone to actually double check their stories for dupes and errors?
Mod me to hell for this, but I have merely summarized what many have bitched about for months and months and months now, and frankly, while I love the humor and the insightfulness that many share here, the "editors" are really getting lazy these days. Feel free to correct me, I mean I'm sure all those "nothing to see here" and downtimes are starting to put a cramp in the afternoon quake matches, so I guess they could be working on getting the site running better, right?
I feel sorry for the subscribers.
zosX
zosxavius photography
Maybe it would be better if that smart program replaced the links with links from:
archive.org
or maybe google cache.
Then of course it has to be smart enough to know it did that and replace the links back with the originals if they come online.
Sometimes "broken links" can recover.
This seems to be the classic case of a patent on something stupid. The guy to patent such a thing is often the first, since all others discarded the idea right away.
;-) go figure ..
While such a patent has some merit because sometimes it turns out that the stupid stuff is not so stupid after all, this one basically patents two easy steps that are done in succession: a) finding broken links (easy) and b) replacing broken links (more difficult).
In my eyes if they had patented the details of a sophisticated solution to problem b) that would be OK, but I bet they made a broad patent, like patenting all ways to do step a) and all ways to do step b).
Consider that some webmasters did the same process before, replacing broken links by hand. What exactly is new about the process itself in such a patent?
My web server automatically replaces broken links with a different 404 page
I'm still trying to figure out what people mean by 'social skills' here.
But that's a small price to pay for no more goatse.cx links.
Disclaimer: I used to work for the company discussed.
Here's a link to an earlier effort:
http://slashdot.org/article.pl?sid=00/06/18/1434204&tid=95
Think it's a good idea? We raised millions in venture capital!
The first virus to modify this and replace all links to goatse in 5... 4.... 3...
My beliefs do not require that you agree with them.
Sounds like another fantasy too good to be true :-)
Damn, I should have gotten a patent!
Sigh.
No, it won't eliminate broken links....It'll simply redefine what a broken link is.
As far as I'm concerned if it's not the originally intended link it may as well be "broken".
...offshore coding monkeys.
Tried reading the article, but with writing like 'named after Peridot, a mythical gemstone...' I gave up.
See, Peridot isn't mythical. It may have mythical properties, but the gemstone itself is _REAL_. Details matter people...
Awesome furniture, accessories and cabinetry in Santa Rosa, CA: http://humanity-home.com/
It's sorta like everything2.com, where you link phrases instead of URLs. Every phrase has an entry, although more obscure phrases won't have any info in the entry, but anyone can enter information. So this broken links deal seems like that in a way, except maybe with a closed database or using google.
There were two fellows at UC Berkeley (Phelps and Wilensky) who implemented the idea of "fingerprinting" web pages at least as far back as 2000. It was a non-trivial fingerprinting (i.e. not just MD5 hash of a web page).
As far as I know, they haven't done any more recent work on this and the software is only available via archive.org.
A paper
I gather that the IBM effort is different in significant respects, but it certainly employs ideas from Phelps & Wilensky.
I'm a little surprised! This isn't new. Associate keywords or phrases with your links and then let the user search those keywords/phrases if the link doesn't work. You could implement this several ways and I strongly suggest getting an account with Google's AdSense to get paid for your broken links when people search (or any other paid search service). Get paid for broken links? What an idea! So, generate all of your site's links using a database and run checks on them periodically, or run all the links through a gateway that checks their availability. Or, crawl your site for broken links and replace them with a search replacement that picks up the link's visible text. Then, when a link is broken, the visitor gets a nice, neat message and, hopefully, still gets the info they are looking for.
--I smoked my sig.
Whoosh.
Who is General Failure and why is he reading my hard disk?
I would like to point out link-checking tool Checkbot. Certainly a story like this on /. needs to have a thread about comparing Peridot vs. Checkbot vs. ...
Randall
I have a new patent under the name breath.
Description: Inhale into the lungs a portion of the atmosphere. Remove some oxygen. Then exhale the unused air. Anyone that inhales needs to pay me a fee!
So don't breathe until my lawyers can contact you for a fee arrangement.
Anyone heard of Robert Wilensky and Tom Phelps and their work on solving the broken link problem? How about my work?? :)
...in a somewhat different form. Not so much unbreaking your own internal web site links (here's a free clue, don't break them in the first place!), but dealing with all these links that are not a "404" or similar error, but simply no longer what you originally linked to.
A BIG example: Domain buy-outs by porn sites, "portal potties" and shady marketing companies, resulting in links pointing to an undesired resource that is still a "valid" (non-error) document. Or links to a news site that eventually recycles the same URL for a new story. Your average "404 checker" is powerless to tell you this has happened.
My proposed solution makes a copy or digest of the linked document AT THE TIME IT IS LINKED (or very soon after), then compares aspects of the original with the 'current' version during subsequent link checks. The easiest way would be to simply alert the webmaster when the page contents have changed by x% (where x is user definable), or when a 'required' key word(s) or phrase(s) are no longer present. More advanced, future enhancements are possible of course; similarly to Google's ability to pick out related words, the link checker could eventually be able to understand the linked article is about the same topic (if this is all the webmaster is going for), even if the exact words have changed.
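To make the proposal above a bit more concrete, here is a rough Python sketch of the snapshot-and-compare check. Everything in it is an assumption for illustration: the function names, the word-set digest, the 50% default for x, and the restriction to single required words rather than phrases.

```python
import hashlib
import re


def words(text):
    """Lower-cased set of the words on a page."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def take_snapshot(text, required=()):
    """Digest of the linked page, taken at the time the link is created."""
    return {
        "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "words": words(text),
        "required": {k.lower() for k in required},
    }


def check(snapshot, current_text, max_changed_pct=50.0):
    """Return reasons to alert the webmaster; an empty list means the link looks fine."""
    if hashlib.sha256(current_text.encode("utf-8")).hexdigest() == snapshot["sha256"]:
        return []                                   # byte-for-byte unchanged
    now = words(current_text)
    reasons = []
    missing = sorted(snapshot["required"] - now)
    if missing:
        reasons.append("missing keywords: " + ", ".join(missing))
    union = snapshot["words"] | now
    shared = snapshot["words"] & now
    changed = 100.0 * (1 - len(shared) / len(union)) if union else 0.0
    if changed > max_changed_pct:
        reasons.append(f"content changed by {changed:.0f}%")
    return reasons
```

A domain bought out by a porn site would trip both tests at once, while a page that merely gains a paragraph would pass quietly.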
Caveat Emptor is not a business model.
who'd ever have thought of that if IBM hadn't?...
Wonder where it would take me?
Starbucks, Harbuckle of Breath.
"said Peridot could protect companies by spotting links to sites that have been removed, or which point to wholly unsuitable content"
No more Goatse I guess