Slashdot Mirror


Caching Content and the Shrinking Web?

kill-hup asks: "I know the issue of caching linked pages has been discussed many times here on Slashdot, but the majority of those discussions centered around the 'Slashdot Effect' knocking remote content servers off-line. How does the ethic/legality issue change, if any, when we're talking about information that once was available but now has moved or disappeared from the provider's site?"

"I run a small discussion-oriented site patterned after Slashdot; small story blurbs and discussion center around links to external content. From time to time we post our own content, but the vast majority involves links to articles on other sites. This structure obviously relies heavily on the external pages being available for our visitors so they can understand the issue or viewpoint being highlighted.

Just before the new year, I took a look back at story entries that had been posted throughout 2002 and found it interesting to note that a large portion of the linked content was no longer available/had moved/etc. In the short term, this is not an issue; most outside material tends to remain available for the length of an active discussion. The problem I see is visitors coming to the site by way of search engines to stories whose linked content no longer exists. Without the background provided by the referenced story link, the discussion or quick blurb may not make sense or may not fulfill the request that brought the visitor to us.

I know I am not alone in this quandary and that others must have run into this before. While I respect the copyright of the external content providers and do not wish to get into the whole issue of lost advertising revenue for them if I were to cache a local copy, I'm curious what other users are doing to mitigate this problem."

25 comments

  1. the caaaaaache by gnudutch · · Score: 1

    it chooses who stays and who will go... google

    1. Re:the caaaaaache by scubacuda · · Score: 1
      Agreed...I often link to a cached article if I feel that it'll go away in a few months.

      Unfortunately, a lot of sites even go so far as to BAN the IPs of the Wayback Machine.

    2. Re:the caaaaaache by walt-sjc · · Score: 1

      Unfortunately? Not at all. If someone wants to remove their stuff from the net, that's their right under copyright law. Period. Don't like it? Move to another planet, 'cause it's an international thing.

  2. Mirror in case it's slashdotted and removed by jsse · · Score: 2, Interesting

    Here is the content I shamelessly mirrored without permission from the original author. Now all those meta-karma-whore flamers can jump up to my ass and sue me for plagiarism.

    Caching Content and the Shrinking Web?
    Posted by Cliff on 02:55 AM -- Friday March 14 2003
    from the keeping-the-context-intact dept.

    kill-hup asks: "I know the issue of caching linked pages has been discussed many times here on Slashdot, but the majority of those discussions centered around the 'Slashdot Effect' knocking remote content servers off-line. How does the ethic/legality issue change, if any, when we're talking about information that once was available but now has moved or disappeared from the provider's site?"

    "I run a small discussion-oriented site patterned after Slashdot; small story blurbs and discussion center around links to external content. From time to time we post our own content, but the vast majority involves links to articles on other sites. This structure obviously relies heavily on the external pages being available for our visitors so they can understand the issue or viewpoint being highlighted.

    Just before the new year, I took a look back at story entries that had been posted throughout 2002 and found it interesting to note that a large portion of the linked content was no longer available/had moved/etc. In the short term, this is not an issue; most outside material tends to remain available for the length of an active discussion. The problem I see is visitors coming to the site by way of search engines to stories whose linked content no longer exists. Without the background provided by the referenced story link, the discussion or quick blurb may not make sense or may not fulfill the request that brought the visitor to us.

    I know I am not alone in this quandary and that others must have run into this before. While I respect the copyright of the external content providers and do not wish to get into the whole issue of lost advertising revenue for them if I were to cache a local copy, I'm curious what other users are doing to mitigate this problem."

  3. Have you realized that by jsse · · Score: 1

    Most content is removed as a result of being slashdotted: the company providing the web hosting decides it's better to remove it and cancel the associated account to avoid an excessive bandwidth bill next month.

    If you can see this, you realize that we are among the bloody murderers who killed that content. :)

    1. Re:Have you realized that by kill-hup · · Score: 1
      That's not highly likely in our case. I would liken our version of a "slashdotting" to a fly hitting a brick wall ;) Again, the purpose of my question was not to debate mirroring and the "Slashdot effect", but to ask about articles that just cease to be available.

      What I believe is that the content providers either went out of business (as is common these days), were swallowed up by another provider who may not archive old content, or just lost some pages as a result of a re-design.

      --
      Sinepaw.org: Grape Winos
  4. Knowledge is power by quintessent · · Score: 4, Insightful

    Ethically, we need to keep the channels of knowledge open. If it was public knowledge at one time, it must remain so. Otherwise, we begin to foster an Orwellian world where any number of Ministries of Truth can hide history and rewrite it as needed. A web page is a record of the world at a given time. Just as libraries keep old journals for reference, we need to be able to reference the web of the past.

    Legally, I fear that litigation like Scientology vs. the Wayback Machine will begin to erode this protection. Having a monopoly on knowledge gives an entity the power to bring the masses into submission. We must let truth prevail.

  5. Look at it this way... by gnovos · · Score: 4, Interesting

    If your discussion were around the coffee table about a magazine article, and you were writing down your notes on paper and then paper-clipping them to the article (cut out from the magazine, of course) and storing them away in a binder, would you have any qualms about this at all? At ALL?

    To make the case even more clear-cut, imagine if the magazine you are cutting from was completely free to the readers and got all their revenue from ads sold.

    Would you even care if you cut the ads out alongside the article? No, you would probably even go out of your way to cut them OUT of the real world example.

    Why is it different when it is on the internet?

    --
    "Your superior intellect is no match for our puny weapons!"
    1. Re:Look at it this way... by kill-hup · · Score: 1
      I agree with your example; it shouldn't matter. The only potential problem I see is that, on the 'net, we have a much larger table. Granted, my site does not have the readership of Slashdot but would I not be re-distributing the original content? Like photocopying the original magazine article and handing out copies to everyone I know, then them handing it out to everyone they know, etc.

      If I could rely on the original content provider to keep the article available, this would be a non-issue. It's somewhat like a bibliography in the printed world; if you want to see my references, go to pretty much any library.

      One of my thoughts was to mirror the original article, (locally cached) ads and all so I hadn't modified it, just in case the content disappeared. In that respect, I wouldn't be taking away ad revenue for the provider while the article was up; I'd just have something to point to when the content became unavailable.

      --
      Sinepaw.org: Grape Winos
    2. Re:Look at it this way... by Alphanos · · Score: 1

      What about this: keep a cached copy of the original article stored on your server, but only put it up instead of the original once the original is no longer available? It would require a lot more checking/work, but I wouldn't think there should be legal problems with this.

      --
      Alphanos
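
The fallback described in this sub-thread (keep a local copy, but only point readers at it once the original stops responding) could be sketched roughly as follows. This is a minimal illustration, not anyone's actual site code: the function names `original_is_available` and `choose_link`, the HEAD-request check, and the example URLs are all assumptions made for the sketch.

```python
import urllib.request
from typing import Callable


def original_is_available(url: str, timeout: float = 10.0) -> bool:
    """Hypothetical availability check: True if the URL answers with a 2xx status."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError and timeouts; ValueError covers malformed URLs.
        return False


def choose_link(
    original_url: str,
    cached_url: str,
    is_available: Callable[[str], bool] = original_is_available,
) -> str:
    """Prefer the original article; fall back to the local cache only when it is gone."""
    return original_url if is_available(original_url) else cached_url
```

Passing the check in as a parameter keeps the decision separate from the network call, so a site could run the check from a periodic job rather than on every page view, which addresses the "a lot more checking/work" concern at least in part.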
  6. A source of embarrassment by idiotnot · · Score: 3, Interesting

    I've looked up my past personal sites, and realize how much they suck. Including the brief period where I was enamoured with IE 4.0 (MS had me on their free CD circuit).

    As far as the commercial sites go, inasmuch as bits and pieces are used as "fair use," and people aren't selling things that belong to someone else, I don't see a problem.

    One of the more interesting things I've seen is what Art Bell and his webmaster did when Bell "retired" from broadcasting (let's see how long this one lasts...hmmph). They put out a CD that had some neat extra features, and authorization methods which allow you to access the website through the webmaster's site. Pretty cool, IMHO.

  7. Mirror of your mirror, just in case... by gnovos · · Score: 1

    Mirror in case it's slashdotted and removed
    Mirror in case it's slashdotted and removed (Score:1)
    by jsse (254124) on Friday March 14, @12:09AM (#5509874)
    (http://slashdot.org/)

    Here is the content I shamelessly mirrored without permission from the original author. Now all those meta-karma-whore flamers can jump up to my ass and sue me for plagiarism.

    Caching Content and the Shrinking Web?
    Posted by Cliff on 02:55 AM -- Friday March 14 2003
    from the keeping-the-context-intact dept.


    kill-hup asks: "I know the issue of caching linked pages has been discussed many times here on Slashdot, but the majority of those discussions centered around the 'Slashdot Effect' knocking remote content servers off-line. How does the ethic/legality issue change, if any, when we're talking about information that once was available but now has moved or disappeared from the provider's site?"

    "I run a small discussion-oriented site patterned after Slashdot; small story blurbs and discussion center around links to external content. From time to time we post our own content, but the vast majority involves links to articles on other sites. This structure obviously relies heavily on the external pages being available for our visitors so they can understand the issue or viewpoint being highlighted.

    Just before the new year, I took a look back at story entries that had been posted throughout 2002 and found it interesting to note that a large portion of the linked content was no longer available/had moved/etc. In the short term, this is not an issue; most outside material tends to remain available for the length of an active discussion. The problem I see is visitors coming to the site by way of search engines to stories whose linked content no longer exists. Without the background provided by the referenced story link, the discussion or quick blurb may not make sense or may not fulfill the request that brought the visitor to us.

    I know I am not alone in this quandary and that others must have run into this before. While I respect the copyright of the external content providers and do not wish to get into the whole issue of lost advertising revenue for them if I were to cache a local copy, I'm curious what other users are doing to mitigate this problem."
    [ Reply to This ]

    --
    "Your superior intellect is no match for our puny weapons!"
  8. Just cache it. by sudog · · Score: 1

    Once it's posted, it's public information. Sites that try to prevent others from caching their pages are living in an unrealistic dreamworld that doesn't include ISP proxies, browser caches, and multiple hops through routers.

    In other words, they're morons. Just cache the data privately and ignore what you think the rest of the world thinks about it.

  9. Loss of intellectual property by Twylite · · Score: 3, Interesting

    I think there is a deeper problem being alluded to here, that of loss of intellectual property. Copyright, as is often pointed out, has two sides: the copyright owner gets to exercise control over their asset, but in the end that asset becomes public property.

    It has long been law and/or practice in most countries that in order to publish a book (or any copyrightable material) a copy must be lodged with the state archive (in the US, the Library of Congress). Making a commercial gain from a work usually requires publication, which means that most works are available in such libraries.

    But the web changes that. Publication becomes a lot more informal, and there is no requirement or even encouragement to archive. How, in such a scenario, can we protect against publicly accessible information disappearing forever? This material has been published and, at some point, the copyright will expire; it should fall into the public domain. But it most likely won't: over time it will be taken away, and never seen again.

    Consider the loss we would face if a valuable repository like Slashdot vanished. Deride it all you like - this is nevertheless a meeting place of (amongst others) some very experienced people with insightful comments, leading to a wealth of information gathered on topics that are discussed. It is not at all uncommon to find a Slashdot discussion when searching for technical information.

    archive.org is a start in the process of archiving to prevent this sort of loss -- but how can we move to tackle the problem in a proactive manner?

    --
    i-name =twylite [http://public.xdi.org/=twylite], see idcommons.net
  10. Information wants to be free... by Zapper · · Score: 1
    Or at least that is the oft-quoted er, umm... quote.
    Something, in this case a webpage, once made public, is likely to be copied to some sort of personal space that is not under the control of the publisher, no matter how much they protest their copyright.
    Once in this personal space however, there is no obligation to share. And so information pools in the corners of the 'net unable to benefit any but a select/fortunate few for fear of persecution.
    Bottom line, if you don't want it preserved for posterity and must maintain rigid control, don't publish publicly. If you do publish, expect that it is now copyrighted public domain material.

    So why not bring the cache/mirror out into the open? It happens anyway.

    --
    So much to do, so little bandwidth.
    --
    Try Mozilla
  11. IMHO by oni · · Score: 1


    I'm no lawyer, but I think it's ok to copy content from a news or opinion site as long as you cite your source. In other words, I *think* you're on solid ground if you copy the entire text of a news article and append the date and the place you copied it from.

    There are a couple of reasons why an author might have a problem with you doing this: Firstly, if you draw customers (and therefore ad revenue) away from their site, they won't like it. So, what I suggest is that, at the time you open a discussion thread, you cache the article but don't link to your cache at that time. Link to the original. Later on, if the article becomes unavailable, you can add a link to the cache (I'd keep the original link too though).

    Secondly, if you surround the cached copy with content from your own site - if you put your own banner ads at the top or your site's menus along the sides etc. - if you do that, you make it seem like the author wrote the article for you, or gave you permission to publish it. You make it seem like you have some relationship with the author and that just isn't true. So, I'd suggest that the cached copy open in its own window and contain nothing but the article text.

    I think these two suggestions are just common courtesy and journalistic integrity.

    If the author still demands that the cached content be removed, I think you should take it down. In its place, you could put a report or review of the article. You can't copy directly from the article as that's plagiarism. But you can quote the important lines and cite it. Think of it as your own journalistic report. If they still have a problem with that, you should tell them "the content of my site accurately reflects my opinions regarding an article you published and constitutes fair use of that article." If they persist beyond that - get a lawyer and countersue for harassment.

    1. Re:IMHO by kill-hup · · Score: 1
      I think those are all good suggestions. In a way, they mirror (somewhat) some of the ideas I had.

      I think the point about not linking to the mirror until after the original article/content becomes unavailable is key to refute any arguments over lost ad revenue. Essentially, I'm saying that I will direct people to your site as long as there's something there for them to read. When you (as the site owner) cease to make the content available, you really aren't losing any revenue by me linking to a cached copy. As long as it's properly attributed, I think you'd have a good chance of defending yourself against any copyright infringement challenges.

      I actually read a bit on copyright law after posting this AskSlashdot and found it to be less helpful than if I'd never read it. Everything is subjective, so it looks like the best you can do is show good intent and hope for the best ;) Worst case scenario is that you remove the cache; big deal.

      --
      Sinepaw.org: Grape Winos
  12. On a related note by JimDabell · · Score: 1

    ...for all web developers: Cool URIs don't change.

  13. What do you want from slashdot? by BortQ · · Score: 1

    Either you accept the missing articles (bad choice) or you cache them.

    The answer seems pretty clear-cut to me. Google does caching well, so I'd just copy them. Or you could even just link to the Google cache, but that could still change.

    --

    A Multiplayer Strategy Game for Mac OS X, Windows, and Linux
  14. A Hypertextual Caching Doctrine for Slashdot by Ry+R. · · Score: 1

    Hypertextual information, posted publicly once, can and should always be preserved, especially if it is relevant to another story, as links are used as jump-points here at Slashdot.

    However, because this is hypertext, another procedure needs to be followed: Content needs to be maintained. Because of the fluid nature of the web, which makes the link possible in the first place, some special actions (i.e., actions not taken with archival of books, magazines, newspapers, etc.) need to be taken.

    Here, assuming I had ultimate control over the whole thing, is what I would do:

    Auto-caching any 'all rights reserved' site to prevent the Slashdot effect isn't OK, unless you have the permission of the owner.

    The reason for this is that the owner has put up the information with the expectation that the content will be viewed on his site but with the realization that anyone may link to it.

    To undercut the owner's expectation of the content being their exclusive contribution to the web isn't OK. To link to a cached copy of anything instead of the original document is, thus, not OK.

    However, Slashdot (or whoever) may maintain a copy of the document on their computer for future use, in case the document is removed.

    Which means:

    If the document is inaccessible (because of the Slashdot effect or because the document was taken down), then the cached document should be provided on Slashdot's server.

    But, the author should be contacted both to inform them of the action and to find out why the document was taken down (inaccuracy?) and whether the owner can provide or has provided another copy of the document (perhaps revised). If there is an inaccuracy and the document has been permanently removed, Slashdot should continue the caching but note the situation and attempt to correct any errors. If the document is at another URI, Slashdot should remove the cached document and link to the new URI.

    A special situation might arise where a revised document is at a new URI; in this case Slashdot may provide a cache of the original and also link to the revised document. This would provide a way both to see the new, accurate, revised document and to see what exactly hundreds of Slashdotters were posting about (since quotations might have been extracted verbatim and used as a jump point).

    However, users should take note of the copyright statement: If something is public domain or under a looser-than-typical license that would suggest the author wouldn't mind a wholesale caching, then cache the document from the get-go, but it would be appropriate to fully cite the author and the URI from which the cache was taken.
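
One possible reading of the rules in this comment, written out as a small decision function. This is only a sketch of that reading: the `LinkedDocument` type, its field names, and `links_to_show` are hypothetical, and the steps that involve contacting the author or annotating errors are deliberately left out because they are not mechanical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LinkedDocument:
    """Hypothetical record of what a site knows about one linked article."""
    original_url: str
    cache_url: str
    permissive_license: bool = False   # public domain or looser-than-typical license
    original_reachable: bool = True
    new_url: Optional[str] = None      # set if the owner moved or reposted the document
    revised: bool = False              # True if the copy at new_url is a revision


def links_to_show(doc: LinkedDocument) -> List[str]:
    """Return the links a story page should display, in order."""
    if doc.new_url is not None:
        if doc.revised:
            # Keep the cache of what readers originally discussed, plus the revision.
            return [doc.cache_url, doc.new_url]
        # Same document at a new address: drop the cache and link there instead.
        return [doc.new_url]
    if doc.permissive_license:
        # Cached from the start; the original URI is shown as the citation.
        return [doc.cache_url, doc.original_url]
    if doc.original_reachable:
        # 'All rights reserved' and still up: link only to the owner's copy.
        return [doc.original_url]
    # Inaccessible or taken down: serve the cached copy.
    return [doc.cache_url]
```

For example, a story whose source was revised and moved would show both the cached original and the new address, so readers can still see what the earlier comments were quoting.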

  15. Spam and the /. effect by chrisseaton · · Score: 1

    I've always wondered how the /. effect is different to spam.

    Both claim to be beneficial to the, shall we say, victim. "Information about special offers is useful", "They get more people looking at their banners".

    Both use up bandwidth and cause charges to the victim.

    Spam is often redundant - "I've seen this bloody spam a hundred times" - and we all know how redundant slashdot can be...

    Both can be defended by saying that if you publish your address (site or email) then people can use it.

    Nobody opts into the /. effect but they still receive it, so why are people up in arms about nobody opting into spam but still receiving it?