Slashdot Mirror


Websites Complaining About Screen-Scraping

wilko11 writes "There have been two cases recently where websites have requested the removal of modules from CPAN. These modules could be used to access the websites (EuroTV and Streetmap) from a Perl program. The question being asked on the mailing lists (threads about EuroTV and about Streetmap) is 'can companies dictate what software you can use to access web content from their server?'"

14 of 432 comments

  1. In short, no. by numbski · · Score: 5, Insightful

    If you don't want your content being redisplayed on another site, place appropriate copyright and seek protections therein.

    Don't stifle the technology. Treat the cause, not the symptom.

    --

    Karma: Chameleon (mostly due to the fact that you come and go).

  2. Sure they can! by stile · · Score: 5, Interesting

    If we piss them off enough by chopping off their advertisements and snipping out their content, they'll just write their sites in Flash, or as one big image file, or some other proprietary format. That'll pretty well dictate what software you use to view their site.

  3. Re-read the article... by numbski · · Score: 5, Insightful

    So far as apps are concerned, again no.

    There's no law stating that we have to look at ads. Although I see the problem of paying the bills, a flaw in a business model is not the problem of the application coder (namely: me, you, and most people reading this site).

    --

    Karma: Chameleon (mostly due to the fact that you come and go).

  4. Don't they already??? by tacocat · · Score: 5, Interesting

    I am constantly greeted with messages to the tone of:

    You must have Windows Internet Explorer 4 or higher installed on your system to view this website

    How is this any different from what they are attempting to do here?

    I hate to disappoint, but I don't think that this is a new precedent. What is a new precedent is the notion that they can request the removal of, or make unavailable, software that is otherwise available.

    The precedent here is not the software usage to access a website, but the notion that this can be extended to:

    Dear Mozilla.org,

    It has come to our attention that people are using your software to access our website. We don't like this and are sending our legal team over to discuss the removal of your software application from the internet.

    Similarly, we are contacting Netscape, AOL, Opera, Konqueror, et al and removing them as well.

    Have a nice day!

  5. If you don't want window shoppers... by Eese · · Score: 5, Insightful

    ... don't put merchandise in the windows.

    Just like you can listen to unencrypted radio broadcasts through the airwaves as much as you want, or stand next to a group of people talking and listen in, you can view web pages that are served openly over the Internet.

    If you are going to be presenting something for people to observe, they can observe it however they like. Legislate all you want, but this is a fundamental component of logical (as opposed to legal) privacy.

  6. Why not? by JazzyJ · · Score: 5, Insightful

    There are a multitude of methods for providing different content based on what the client browser returns on certain environment variables. While I think it's silly to demand that modules be removed from CPAN, it's entirely up to the people running the server to determine who they want to serve content to....and who they don't.

    If they can't figure out how to do it serverside (or with clientside scripting) then that's their problem.

    That's the bitch about open standards....EVERYONE can use them.... :)
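The server-side filtering described in this comment can be sketched roughly as follows. This is a minimal illustration, not any site's actual configuration: the function name `should_serve`, the blocked-prefix list, and the assumption that headers arrive as a plain dict keyed by `User-Agent` are all hypothetical.

```python
# Hypothetical server-side policy: decide whether to serve a request
# based on the User-Agent header the client chose to send.
# The prefixes below are assumed examples of scripted clients.
BLOCKED_AGENT_PREFIXES = ("libwww-perl", "python-urllib")


def should_serve(headers):
    """Return True if the request should be served, False to refuse it.

    `headers` is assumed to be a dict of request headers.
    """
    agent = headers.get("User-Agent", "").lower()
    if not agent:
        # Assumed policy: refuse clients that do not identify themselves.
        return False
    # str.startswith accepts a tuple of prefixes.
    return not agent.startswith(BLOCKED_AGENT_PREFIXES)
```

A check like this only works as long as clients report themselves accurately, since the header is set by the client and can contain anything.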

  7. HTTP GET is an authorization by bwt · · Score: 5, Insightful

    This is just another example of gross technical incompetence by executives and lawyers.

    A company that attaches an HTTP server to the Internet receives HTTP GET requests, each complete with some information in its headers. They have a reasonable case to request that that information be accurate. They have the unilateral technical ability to firewall IPs or whole subnets. Otherwise, once they receive a GET request, when the machine that they have configured responds by sending a file, they have granted explicit permission to process that file consistent with the info in the GET request.

    The owner of the server is completely in control at a technical level. If they don't like what you are doing, they can firewall you. Absent a contractual agreement not to, you have the permission to send ***REQUESTS*** for anything you would like to request. They can say no. If you lie in your request, then they have a case to say your use is unauthorized, but short of that, there should be no need to have the judicial system rewrite the technology.
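From the client's side, the exchange this comment describes might look like the sketch below: the program sends a GET request that identifies itself truthfully and treats a refusal as a plain "no". The agent string, the contact address, and the function names `build_request` and `fetch` are assumptions made for illustration, and treating 401 and 403 as the server's refusal is a choice of this sketch.

```python
import urllib.error
import urllib.request


def build_request(url, agent="example-scraper/0.1 (contact: admin@example.org)"):
    """Build a GET request that identifies the client accurately.

    The default agent string is a hypothetical example.
    """
    request = urllib.request.Request(url)  # no body, so the method is GET
    request.add_header("User-Agent", agent)
    return request


def fetch(request):
    """Send the request; return the body, or None if the server says no."""
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as error:
        # 401 and 403 are taken here as the server declining to serve us.
        if error.code in (401, 403):
            return None
        raise
```

Keeping the request construction separate from the network call makes it easy to inspect exactly what the client claims about itself before anything is sent.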

  8. paging Jack Valenti by sydlexic · · Score: 5, Funny

    didn't you read the terms of service agreement you were handed at birth (us citizens only) that states any bypassing of ads during receipt of content is theft?

    I'm just waiting for ashcroft's goons to knock on my door, find the tivo and haul my ass off to jail.

    1. Re:paging Jack Valenti by trbogie · · Score: 5, Funny

      I thought they were trying to modify that to say that "Having left the womb, you have, by default, accepted the agreements to all life's conditions."

  9. Back in the day... by TheTick · · Score: 5, Insightful

    Remember when the web -- no, remember when the net was about sharing information? I miss that time. If somebody wrote a cool front end to your service, it was COOL and more power to them. If it made your service (site, whatever) more accessible, that meant more people were looking at your stuff, and that was COOL.

    Now we have entities that threaten legal action for accessing the stuff they've made publicly available. There may actually be a case when the software scrapes and repackages the content (or, more importantly, redistributes it), but I hope the stuff about decoding the URL for easy use is bogus. I have my doubts that a court will see it my way, but still I hope for reason. Nevertheless, the whole idea makes me sad and nostalgic.

    Another thought: is my mozilla vulnerable to this sort of action because it blocks ads -- essentially repackaging the server output for display to me? Now I'm really depressed.

    --
    bachiatari na torisetsu o yome!

  10. What's the problem here? by hmccabe · · Score: 5, Insightful

    I think this is something we're going to start seeing a lot of in coming years. Right now, the Internet in general is going through growing pains, and the pressure is starting to show in these "free services" type sites (e.g. Mapquest).

    I don't know about these sites in particular, but many of the big sites around today were built with the failed dot-com business model of delivering free content and selling advertising that ran on the page (or popped up behind it). This, of course, is dependent on people viewing the site in a browser. If people get the information without using a browser, therefore never seeing the ads, the advertisers won't want to spend any money on the site.

    Another problem is, most companies don't want to take the risks associated with innovation, so instead they seek legal action to maintain the good thing they have going. While this is a quick fix, and in the company's best interests, we need companies to present a new business model to the public and see how it gets adopted. I would pay an annual subscription fee for things like Mapquest.com, tvguide.com and maybe even /. I believe others would as well.

    Porn sites, Ebay auctions, games such as Everquest and services such as Apple's dot-mac are online services that subscribers happily pay for because, more than anything, they are quality products (well, some of the porn is). If the company's revenue is coming from its users, they would be a lot less concerned about how the information is being distributed.

    This isn't such a radical change, as they could add a premium subscription service, and slowly transition the focus of their business towards it. Wouldn't it be cool if I could write my own mapping application (or download a pre-made one from the site) and have it connect to xml.mapquest.com, give my username and password, and retrieve the data I requested?
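The subscriber-facing data service imagined in this comment could be approached along these lines. Everything here is hypothetical: the comment only wishes for such a service, so the `https://xml.mapquest.com/` endpoint, the `q` query parameter, the use of HTTP Basic authentication, and the function name `build_map_request` are all assumptions for the sake of the sketch.

```python
import base64
import urllib.parse
import urllib.request


def build_map_request(username, password, query):
    """Build an authenticated GET request for a hypothetical paid map API.

    The endpoint and the "q" parameter are assumptions; no such
    documented service is implied.
    """
    params = urllib.parse.urlencode({"q": query})
    request = urllib.request.Request("https://xml.mapquest.com/?" + params)
    # HTTP Basic authentication: base64 of "username:password".
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request.add_header("Authorization", "Basic " + token)
    return request
```

With credentials attached to each request, the provider knows which paying user is asking and has less reason to care which program is doing the asking.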

  11. Re:Content is important by anaradad · · Score: 5, Insightful

    The eBay EULA only applies if you actually register for their service. If you have never signed up for eBay, you have never signed off on their EULA.

  12. Derivative work by yerricde · · Score: 5, Informative

    There's no law stating that we have to look at ads.

    What about 17 USC 106, which states that barring fair use, etc., the copyright owner has the right to prevent others from creating derivative works of a web page?

    --
    Will I retire or break 10K?

    1. Re:Derivative work by Natalie's+Hot+Grits · · Score: 5, Informative

      Yes, barring fair use, which explicitly allows you to do this unless you re-distribute the work. Which you aren't.

      Short answer is that you can modify any work under fair use for your OWN PERSONAL USE and not for someone else. If your web browser cuts out ads, then that is legal, and no US Code that is currently in existence disallows these modifications.

      Aside from this point, there is still the legal ramification that there is no US law which states it is illegal to build, distribute, or use tools that can modify copyrighted works (unless the work is encrypted and covered under the DMCA).

      If an ISP started doing this at his firewall, and then re-distributing the web site to your computer after you request it, then this might be illegal. They might be able to argue that one party is getting the work, modifying it, and redistributing it, which is certainly not covered under the Fair Use Doctrine.

      OTOH, if the ISP has a fair use reason to do this (such as reformatting the text to work on a text only terminal), then this may also be legal.

      What it all boils down to is that the spirit of copyright law is restricting COPYING and REDISTRIBUTING, not how a person uses those works. This was true until 1998, when the DMCA was enacted, and even now is still true for all copyrighted works that are not covered under the DMCA's encryption clauses. To this day, I have yet to find a website that is encrypted for purposes of the DMCA protection. Until this changes, they won't have any legal legs to stand on.

      --
      Two infinite things: your stupidity and mine. But I'm not sure about the latter. If my sig offends you, I'm sorry.