Slashdot Mirror


Websites Complaining About Screen-Scraping

wilko11 writes "There have been two cases recently where websites have requested the removal of modules from CPAN. These modules could be used to access the websites (EuroTV and Streetmap) from a Perl program. The question being asked on the mailing lists (threads about EuroTV and about Streetmap) is: 'Can companies dictate what software you can use to access web content from their server?'"

11 of 432 comments

  1. Can information be protected by copyright? by Lumpish+Scholar · · Score: 3, Informative

    Everyone's assuming the appropriate rules here are from copyright law, which allow you to protect the expression of an idea but not the idea itself. That's probably right. It's not the way some big organizations want to play.

    In the United States, most major sports leagues (NFL, NBA, NHL, MLB, etc.) believe that they own the rights to real time scores, and can permit or restrict any desired use. I ran into this at a previous job: we could "broadcast" football, basketball, and hockey scores at the end of every "period," and baseball scores at the end of every half inning, but we couldn't send updated broadcasts for every new score. That information needed (so said the leagues) to be licensed, and most of it had been exclusively licensed for the medium (Internet) we were interested in.

    Do they have a legal leg to stand on? No. (IANAL.) Are they leaning on a great, big, huge stick with nails driven through it? Apparently.

    --
    Stupid job ads, weird spam, occasional insight at
  2. ebay has already done this by troydsmith · · Score: 3, Informative
    About two years ago, eBay did exactly this. Their case went to court, and they won.

    Here is some more info

    1. Re:ebay has already done this by kryptkpr · · Score: 2, Informative

      No, no, this is NOT the same thing.

      This was a website meta-searching another website without its permission.

      I used to run a large MP3 meta-search, and I made damned sure I had permission from every search engine I meta'd, and that their ads were put into my rotation to compensate for the extra traffic.

      I also added measures such as search caching (so that when people searched for "britney spears" 500 times a day, I wouldn't actually send 500 queries; I'd send only 8, at 3-hour intervals).
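      The caching described above can be sketched as a per-query time-to-live (TTL) cache. This is a minimal Python illustration, not the commenter's actual code: the class name `QueryCache`, the `fetch` callback, the lower-case query normalization, and the injectable clock are all assumptions. With a 3-hour TTL, a query repeated all day reaches the upstream engine at most 24 / 3 = 8 times, however many users ask for it.

```python
import time
from typing import Callable, Dict, Tuple


class QueryCache:
    """Hypothetical TTL cache: repeated queries within `ttl_seconds` are
    answered from memory instead of hitting the remote search engine."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch            # performs the real upstream query
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so a day can be simulated
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.upstream_calls = 0        # how many queries actually went upstream

    def search(self, query: str) -> str:
        key = query.strip().lower()    # assumed normalization of the query text
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]           # still fresh: no upstream traffic
        result = self._fetch(key)      # missing or expired: refresh from upstream
        self.upstream_calls += 1
        self._entries[key] = (now, result)
        return result


if __name__ == "__main__":
    fake_now = [0.0]
    cache = QueryCache(fetch=lambda q: "results for " + q,
                       ttl_seconds=3 * 3600, clock=lambda: fake_now[0])
    for i in range(500):
        fake_now[0] = i * 86400 / 500  # 500 searches spread evenly over one day
        cache.search("britney spears")
    print(cache.upstream_calls)        # prints 8
```

      Injecting the clock keeps the sketch testable; a real service would also need to bound the cache size and evict stale entries.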

      The Perl module in question makes it easy to extract information from a website and, of course, provides the capability to meta-search another site. But that doesn't mean you have the right to do it without their permission! This is exactly what the judge ruled:

      "Even if (Bidders Edge's) searches use only a small amount of eBay's computer system capacity, Bidders Edge has nonetheless deprived eBay of the ability to use that portion of its personal property for its own purposes"

      They used eBay's system resources without making a deal and without compensation. This is just-plain-wrong (tm).

      Technology is not the problem here; it's that some people are just jackasses who want to profit from others' work. This shouldn't be allowed. And I don't mean not allowed by law. Technology does wonders for blocking other technology. If the two websites in question have half a brain, they'll either

      a) change their business model
      b) find a way to block these bots (embedding tiny images in their pages, for example; I'm sure I could come up with many more if someone wants to pay me :-))

      and not try to fight progress with Congress.

      --
      DJ kRYPT's Free MP3s!
  3. Derivative work by yerricde · · Score: 5, Informative

    There's no law stating that we have to look at ads.

    What about 17 USC 106, which states that barring fair use, etc., the copyright owner has the right to prevent others from creating derivative works of a web page?

    --
    Will I retire or break 10K?
    1. Re:Derivative work by Natalie's+Hot+Grits · · Score: 5, Informative

      Yes, barring fair use, which explicitly allows you to do this unless you redistribute the work, which you aren't doing.

      The short answer is that under fair use you can modify any work for your OWN PERSONAL USE, not for someone else's. If your web browser cuts out ads, that is legal, and no section of the US Code currently in existence disallows these modifications.

      Aside from this point, there is also the legal consideration that no US law makes it illegal to build, distribute, or use tools that can modify copyrighted works (unless the work is encrypted and covered under the DMCA).

      If an ISP started doing this at its firewall and then redistributed the web site to your computer after you requested it, that might be illegal. The site owner might be able to argue that one party is getting the work, modifying it, and redistributing it, which is certainly not covered under the Fair Use Doctrine.

      OTOH, if the ISP has a fair use reason to do this (such as reformatting the text to work on a text only terminal), then this may also be legal.

      What it all boils down to is that the spirit of copyright law is to restrict COPYING and REDISTRIBUTING, not how a person uses those works. This was true until 1998, when the DMCA was enacted, and even now it is still true for all copyrighted works that are not covered under the DMCA's encryption clauses. To this day, I have yet to find a website that is encrypted for purposes of DMCA protection. Until this changes, they won't have any legal legs to stand on.

      --
      Two infinite things: your stupidity and mine. But I'm not sure about the latter. If my sig offends you, I'm sorry.
  4. Re:Sure they can! by Anonymous Coward · · Score: 1, Informative

    The problem with the examples you gave is that the first 2 fail the human comprehension test as well.

    The first one could be (orca, oyca, oycd, orcd).

    The second asked: "What are these pictures of?" and gave 6 pictures. I assume that they want an answer that applies to all 6 pictures, but damned if I could come up with a common theme for all 6.

    Now, if I went to a site that used this technology for something, I would get frustrated and leave. Kinda defeats the purpose of using it in the first place don't you think?

  5. Thread at Perlmonks by Neil+Watson · · Score: 2, Informative

    Go here for the discussion from last summer over at PerlMonks.

  6. Comment removed by account_deleted · · Score: 2, Informative

    Comment removed based on user account deletion

  7. If it's on a web site, it's in the public domain. by crovira · · Score: 0, Informative

    If it's purely internal, then they should use a VPN and/or an intranet and keep their stuff OFF the web.

    The web is about as private as yelling at the top of your lungs at a karaoke competition. Anybody who thinks they can tell you to listen with one ear or the other is dumb.

    --
    MSBPodcast.com The opinions expressed here are my own. If you don't like 'em... Think up your own stuff.
    1. Re:If it's on a web site, it's in the public domain. by Afterimage · · Score: 2, Informative

      Ah, no. It is far from the public domain, legally speaking. The author has copyright on the material unless they've explicitly assigned their rights to the public domain. Simply posting material on a website does not accomplish that.

      As content authors, they are free to try to tell their consumers how to consume and use their service (per license, I'm sure). Whether that's *reasonable* or not is another issue entirely.

      --
      --Humpty Dumpty was pushed!
  8. The internet is a public, unregulated network. by Zone-MR · · Score: 2, Informative

    If they don't want people to use the information the way they do, why the hell are they publishing it on servers connected to a network they don't control?

    I mean seriously, are they now telling us what packets and requests we are allowed to send over the internet?

    By hosting an internet server, they are accepting that people can connect to it and send whatever data they like. If they don't like that, they should try to outsmart people with clever protection software, or host it on their own private LANs.