Caching Content and the Shrinking Web?
Here is the content I shamelessly mirrored without permission from the original author. Now all those meta-karma-whore flamers can jump up my ass and sue me for plagiarism.
Posted by Cliff on 02:55 AM -- Friday March 14 2003
from the keeping-the-context-intact dept.
kill-hup asks: "I know the issue of caching linked pages has been discussed many times here on Slashdot, but the majority of those discussions centered around the 'Slashdot Effect' knocking remote content servers off-line. How does the ethical/legal issue change, if at all, when we're talking about information that once was available but now has moved or disappeared from the provider's site?"
"I run a small discussion-oriented site patterned after Slashdot; small story blurbs and discussion center around links to external content. From time to time we post our own content, but the vast majority involves links to articles on other sites. This structure obviously relies heavily on the external pages being available for our visitors so they can understand the issue or viewpoint being highlighted.
Just before the new year, I took a look back at story entries that had been posted throughout 2002 and found it interesting to note that a large portion of the linked content was no longer available/had moved/etc. In the short term, this is not an issue; most outside material tends to remain available for the length of an active discussion. The problem I see is visitors coming to the site by way of search engines to stories whose linked content no longer exists. Without the background provided by the referenced story link, the discussion or quick blurb may not make sense or may not fulfill the request that brought the visitor to us.
I know I am not alone in this quandary and that others must have run into this before. While I respect the copyright of the external content providers and do not wish to get into the whole issue of lost advertising revenue for them if I were to cache a local copy, I'm curious what other users are doing to mitigate this problem."
If your discussion were around the coffee table about a magazine article, and you were writing down your notes on paper and then paper-clipping them to the article (cut out from the magazine, of course) and storing them away in a binder, would you have any qualms about this at all? At ALL?
To make the case even more clear-cut, imagine if the magazine you are cutting from was completely free to the readers and got all their revenue from ads sold.
Would you even care if you cut the ads out alongside the article? No, you would probably even go out of your way to cut them OUT in the real-world example.
Why is it different when it is on the internet?
"Your superior intellect is no match for our puny weapons!"
I've looked up my past personal sites, and realize how much they suck. Including the brief period where I was enamoured with IE 4.0 (MS had me on their free CD circuit).
As far as the commercial sites go, as long as bits and pieces are used as "fair use," and people aren't selling things that belong to someone else, I don't see a problem.
One of the more interesting things I've seen is what Art Bell and his webmaster did when Bell "retired" from broadcasting (let's see how long this one lasts...hmmph). They put out a CD that had some neat extra features, and authorization methods which allow you to access the website through the webmaster's site. Pretty cool, IMHO.
I think there is a deeper problem being alluded to here, that of loss of intellectual property. Copyright, as is often pointed out, has two sides: the copyright owner gets to exercise control over their asset, but in the end that asset becomes public property.
It has long been law and/or practice in most countries that in order to publish a book (or any copyrightable material) a copy must be lodged with the state archive (in the US, the Library of Congress). Making a commercial gain off a work usually requires publication, which means that most works are available in such libraries.
But the web changes that. Publication becomes a lot more informal, and there is no requirement or even encouragement to archive. How, in such a scenario, can we protect against publicly accessible information disappearing forever? This material has been published and, at some point, the copyright will expire; it should fall into the public domain. But it most likely won't: over time it will be taken away, and never seen again.
Consider the loss we would face if a valuable repository like Slashdot vanished. Deride it all you like - this is nevertheless a meeting place of (amongst others) some very experienced people with insightful comments, leading to a wealth of information gathered on topics that are discussed. It is not at all uncommon to find a Slashdot discussion when searching for technical information.
archive.org is a start in the process of archiving to prevent this sort of loss -- but how can we move to tackle the problem in a proactive manner?
i-name =twylite [http://public.xdi.org/=twylite], see idcommons.net