Slashdot Mirror


Is Spidering Content from the Web Illegal?

Lysol asks: "I'm setting up a new site that I want to have spider some industry sites and grab their news. The problem is that no one has been able to tell me whether this is illegal. I'd be fine with it, but I could see a touchy site monitoring its logs, finding a particular server of mine grabbing pages every few hours or so, and then proceeding to sue me or something." As I understand it, it all depends on the fine print. Some sites have notices that allow you to freely copy the content of a web page as long as credit is given. Many commercial sites are much more restrictive.

4 of 11 comments

  1. robots.txt by Anonymous Coward · · Score: 2

    There is an informal but generally accepted standard you should take a look at, called "A Standard for Robot Exclusion".

    Take a look at http://info.webcrawler.com/mak/projects/robots/robots.html and http://info.webcrawler.com/mak/projects/robots/norobots.html

    This does not address copyright issues, which have become even murkier with the recent revisions to the copyright law restricting fair use.
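
    As a rough illustration of honoring the robot exclusion standard before grabbing a page, here is a minimal Python sketch. The robots.txt content, the "ExampleNewsBot" user-agent string, and the example.com URLs are all made up for the example; a real spider would fetch each site's own /robots.txt (for instance with set_url() and read() on the parser) rather than parse a hard-coded string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    parser = build_parser(SAMPLE_ROBOTS_TXT)
    for url in (
        "http://example.com/news/today.html",
        "http://example.com/private/draft.html",
    ):
        verdict = "allowed" if may_fetch(parser, "ExampleNewsBot", url) else "disallowed"
        print(url, verdict)
```

    Passing this check only means the site has not asked robots to stay away from that path. It says nothing about whether you may republish what you fetched.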

    You should also take a look at the XML syndication format (aka RSS [RDF Site Summary]). It's based on RDF and is becoming supported by a lot of larger news sites, even /. Here are some links: http://www.edventure.com/release1/abstracts/syndication.html for background info, http://www.w3.org/RDF/ for the low-level info, and http://my.netscape.com/publish/help/quickstart.html for the RSS implementation.
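
    To give a feel for why a syndication feed is easier to work with than scraped pages, here is a small Python sketch that pulls only headlines and links out of a feed. The feed text below is invented for the example (it is not any real site's feed), and the helper names are hypothetical. Tags are matched by local name so the same loop copes with both plain and RDF-namespaced feeds.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed, made up for this sketch.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="0.91">
  <channel>
    <title>Example Industry News</title>
    <link>http://example.com/</link>
    <description>Invented feed used only for illustration</description>
    <item>
      <title>First headline</title>
      <link>http://example.com/news/1.html</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.com/news/2.html</link>
    </item>
  </channel>
</rss>
"""


def local_name(tag: str) -> str:
    """Strip an XML namespace prefix such as '{http://...}item' down to 'item'."""
    return tag.rsplit("}", 1)[-1]


def headlines(feed_xml: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs for every item in the feed."""
    root = ET.fromstring(feed_xml)
    results = []
    for element in root.iter():
        if local_name(element.tag) != "item":
            continue
        # Collect the item's direct children, keyed by local tag name.
        fields = {local_name(child.tag): (child.text or "").strip() for child in element}
        if fields.get("title") and fields.get("link"):
            results.append((fields["title"], fields["link"]))
    return results


if __name__ == "__main__":
    for title, link in headlines(SAMPLE_FEED):
        print(f"{title} -> {link}")
```

    The sketch keeps just the headline and a link back to the publisher, which is what a feed is published for, rather than copying article bodies.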

  2. Suggestion... by GoRK · · Score: 2

    If this is a commercial venture and you're looking at offering news kind of like a 'portal' does (god, I hate buzzwords), then there are a couple of companies that eliminate the need to do it yourself and will also eliminate the legal hassles of delivering news content... if it fits your bill, that is.

    Here are the two that I know of. I have used both services and they have been fairly decent content providers.

    1) Screaming Media http://www.screamingmedia.net/ - These guys let you pick and choose what goes on your site. Articles appear as if they came from your site. Good hands-on approach. Good content, too.

    2) iSyndicate http://www.isyndicate.com/ - These guys let you publish your content back to their network as well as letting you use their syndicated content. I really like the way it works. I don't know if you have as much say over what comes down the pipes from them, though, e.g. not wanting to put your competitor's press release on your site.

    Anyway, that's just a glimpse of the content providers out there. I have seen dozens of them; these are the two that I had settled on before. I hope that you might find some use for them. iSyndicate used to have a free program too. I don't know what (if anything) happened to that; possibly it's still there.

    ~GoRK

  3. Be very very careful by copito · · Score: 3

    #include "disclaimer.h"

    www.freerepublic.com, a conservative news discussion site, is being sued by the LA Times and the Washington Post for copying news stories for discussion. Sort of like Slashdot without the links. The judge ruled that this was not fair use. A final ruling in the case has not been reached.

    Linking to content is likely to be much safer than copying it or framing it, although copying headlines might be safe.

    Remember, whether someone can sue you is not the important question. Anyone can sue you. It is a question of whether you will piss them off enough that they might sue you, and whether it is easy to get the lawsuit dismissed.

    --
    "L'IT c'est moi!"

  4. Automatic copyrights by m+o+r+p+h+e+u+s · · Score: 2

    Any work is automatically copyrighted to its author and subject to copyright law from the moment of creation, EVEN IF there is no explicit copyright statement. In the US this comes from the Copyright Act of 1976 and the Berne Convention rather than the Digital Millennium Copyright Act, and it applies to web pages. Therefore, you can't copy stuff from a page just because it doesn't say you can't. The only time you can is if it says you can.