Websites Complaining About Screen-Scraping
wilko11 writes "There have been two cases recently where websites have requested the removal of modules from CPAN. These modules could be used to access the websites (EuroTV and Streetmap) from a Perl program. The question being asked on the mailing lists (threads about EuroTV and about Streetmap) is 'can companies dictate what software you can use to access web content from their server?'"
If we piss them off enough by chopping off their advertisements and snipping out their content, they'll just write their sites in Flash, or as one big image file, or some other proprietary format. That'll pretty well dictate what software you use to view their site.
I can understand how site owners could have a problem with a commercial software product like ExpertGPS wasting their bandwidth while skipping ads. ExpertGPS costs $59.95, but downloads maps from Microsoft's TerraServer without going through its web interface and viewing its advertising. Microsoft hasn't blocked access from these programs yet, but what if they do? All the paying users of ExpertGPS would be out of this functionality.
The solution that has worked best for me...is to avoid public discussion. -- CmdrTaco
I am constantly greeted with messages to the effect of:
How is this any different from what they are attempting to do here?
I hate to disappoint, but I don't think that this is a new precedent. What is a new precedent is the notion that they can request the removal of, or make unavailable, software that is otherwise available.
The precedent here is not the software usage to access a website, but the notion that this can be extended to:
This was never realized, I believe mostly because of overpaid "web designers".
But the Semantic Web would require many funny user agents for all kinds of things.
Clearly, if this kind of thinking is allowed to persist in corporate headquarters, it will kill the Semantic Web before it gets started.
I wonder what Tim Berners-Lee thinks about this...
Employee of Inrupt, Project Release Manager and Community Manager for Solid
One of the biggest sites that I've not seen anyone mention is eBay. The following is in their EULA:
Our Web site contains robot exclusion headers and you agree that you will not use any robot, spider, other automatic device, or manual process to monitor or copy our Web pages or the content contained herein without our prior expressed written permission.
You agree that you will not use any device, software or routine to bypass our robot exclusion headers, or to interfere or attempt to interfere with the proper working of the eBay site or any activities conducted on our site.
You agree that you will not take any action that imposes an unreasonable or disproportionately large load on our infrastructure.
Much of the information on our site is updated on a real time basis and is proprietary or is licensed to eBay by our users or third parties. You agree that you will not copy, reproduce, alter, modify, create derivative works, or publicly display any content (except for Your Information) from our Web site without the prior expressed written permission of eBay or the appropriate third party.
Why they do this is obvious: they have an absolute goldmine of information and they want to be able to take advantage of it when they're good and ready. I assume other sites could adopt this type of EULA, which wouldn't make the software itself illegal, but would make using it so (at least until someone challenges it).
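For anyone wondering what honouring "robot exclusion headers" looks like from the scraper's side, here is a rough sketch. It assumes the rules are published in the usual robots.txt form, which the EULA text above does not actually confirm. The rules, the user agent name and the URLs are all made up for illustration and are not taken from eBay or any real site. It uses Python's standard urllib.robotparser and parses an inline string, so nothing touches the network.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

# Hypothetical user agent string for our imaginary scraper.
USER_AGENT = "MyScraper/0.1"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, url: str) -> bool:
    """Return True if the rules allow USER_AGENT to request this URL."""
    return parser.can_fetch(USER_AGENT, url)


parser = build_parser(EXAMPLE_ROBOTS_TXT)

# Check each URL against the rules before requesting it.
for url in ("http://example.com/maps/1", "http://example.com/private/data"):
    verdict = "allowed" if may_fetch(parser, url) else "disallowed"
    print(url, verdict)

# Seconds the site asks robots to wait between requests (None if unset).
print("crawl delay:", parser.crawl_delay(USER_AGENT))
```

A real client would read the rules from the site itself (set_url() followed by read()) and sleep for the crawl delay between requests, which also speaks to the "unreasonable load" clause. None of this settles the legal question; it only shows that playing by the published rules takes very little code.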
Actually, this is a field that is quickly being considered a new Turing test for the computer vision field. It is actually very easy to make pictures that humans can read and that machines currently can't. Look up more info on it here.
If it's for-profit but free, you're not the customer -- you're the product (e.g., the Slashdot Beta's "audience").
If I buy a copy of The Hobbit, rip out every 5th page and then read it, have I created a derivative work and broken a law?
If I don't distribute it, can't I do whatever I want with the content?
If I were to then repost this on the web, yes, I could see where that would be a problem, but not what I do for myself.