Slashdot Mirror


Websites Complaining About Screen-Scraping

wilko11 writes "There have been two cases recently where websites have requested the removal of modules from CPAN. These modules could be used to access the websites (EuroTV and Streetmap) from a Perl program. The question being asked on the mailing lists (threads about EuroTV and about Streetmap) is 'can companies dictate what software you can use to access web content from their server?'"

42 of 432 comments

  1. In short, no. by numbski · · Score: 5, Insightful

    If you don't want your content being redisplayed on another site, place appropriate copyright and seek protections therein.

    Don't stifle the technology. Treat the cause, not the symptom.

    --

    Karma: Chameleon (mostly due to the fact that you come and go).

    1. Re:In short, no. by numbsafari · · Score: 1, Insightful

      I completely agree with this. As long as the modules in question and the redisplay/use of the information did not violate the stated copyrights, then nothing wrong was done.

      As for treating the cause and not the symptom, how many Slashdotters will decry this act, but still support gun control?

      Just a curious question...

    2. Re:In short, no. by numbsafari · · Score: 2, Insightful

      It's easy to take Europeans' freedom because they tend to roll over and take it from anybody that'll give it to 'em. The same argument goes for gun control as it does for drug control. People shouldn't be arrested, convicted, hassled, jailed, tortured, abused, or whatever because they possess, use, sell or otherwise traffic in drugs. People SHOULD be arrested, convicted, hassled, jailed, whatever because they kill someone while driving under the influence, operate heavy machinery while under the influence, make fiduciary or medical decisions while under the influence, etc. Same thing with guns. You punish the act. Laws should not seek to control behavior by limiting freedom. They should seek to control behavior by punishing specific things. It makes the law much less complex, easier to enforce, easier to understand and easier to promote. Americans get riled up over gun control because Americans did, at one point, know what it is like to be ruled by a foreign power, to be denied basic freedoms. Europeans were for centuries the ones who ruled as colonial powers. Don't get all high and mighty, because Europe is just as guilty, if not more so. I think watching France and Germany self-destruct is a great new spectator sport. It'll be interesting to see how and when they come to their senses.

  2. Re-read the article... by numbski · · Score: 5, Insightful

    So far as apps are concerned, again no.

    There's no law stating that we have to look at ads. Although I see the problem of paying the bills, a flaw in a business model is not the problem of the application coder (namely: me, you, and most people reading this site).

    --

    Karma: Chameleon (mostly due to the fact that you come and go).

  3. Re:Sure they can! by interiot · · Score: 4, Insightful

    No they won't. The main goal of HTML wasn't so everything would be open and "stealable", the goal was to have content that could be viewed on a variety of platforms. You can't get that with Flash or huge images, and in fact, for some of the more interesting devices (e.g. cell phones, PDAs), it's explicitly required that the machine be able to understand the content to some extent so that it can transform it to something that better suits the particular device.

  4. It's their server... by Just+Some+Guy · · Score: 2, Insightful
    ...but the limit of their sphere of influence should be strictly limited to their users, and not the author of software that those users may use to retrieve content from the site.

    Put another way: particularly on a subscription site, the site owners may specify whatever stupid terms and conditions that their subscribers are willing to submit to. That does not mean, though, that the client software is obligated to know whether or not the software itself meets the TOS (nor can I be made to believe that this is possible).

    --
    Dewey, what part of this looks like authorities should be involved?
  5. Re:Sure they can! by superdan2k · · Score: 4, Insightful

    Yeah, and then they'll lose traffic and die because no one will bother wasting the time on their site.

    What a lot of companies fail to realize is that the Social Contract (philosophy, not law) applies as much to the relationship between client and customer as it does between Joe and Jim Average. Play by the rules and be part of society, or doom yourself...that's basically it. No man is an island. No company is an island...well, maybe Microsoft, but that's it.

  6. Re:Sure they can! by TheJesusCandle · · Score: 4, Insightful

    That's what I tell my clients who try to "encrypt" things in this silly manner. I've written packages that defeat those silly "enter the word contained in the image" tests, I've written packages that defeat silly anti-automation scripts.

    It's really not hard.


    Sure, there's always the 2% that can get around any barrier you put up. Stopping the 98% is usually good enough to justify the extra effort of developing these measures.

    You shouldn't complain too much about what your customers want. They're paying you for your time, right? Give 'em what they want.

  7. If you don't want window shoppers... by Eese · · Score: 5, Insightful

    ... don't put merchandise in the windows.

    Just like you can listen to unencrypted radio broadcasts through the airwaves as much as you want, or stand next to a group of people talking and listen in, you can view web pages that are served openly over the Internet.

    If you are going to be presenting something for people to observe, they can observe it however they like. Legislate all you want, but this is a fundamental component of logical (as opposed to legal) privacy.

  8. Why not? by JazzyJ · · Score: 5, Insightful

    There are a multitude of methods for providing different content based on what the client browser returns in certain environment variables. While I think it's silly to demand that modules be removed from CPAN, it's entirely up to the people running the server to determine who they want to serve content to....and who they don't.

    If they can't figure out how to do it serverside (or with clientside scripting) then that's their problem.
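
    A minimal sketch of the server-side approach described above, assuming the operator decides purely on the User-Agent request header. The blocked substrings and the function name are hypothetical, chosen only for illustration, and a client can of course send any User-Agent it likes, so this only stops clients that identify themselves honestly.

```python
from typing import Mapping

# Hypothetical policy: User-Agent substrings this operator chooses to refuse.
BLOCKED_AGENT_SUBSTRINGS = ("libwww-perl", "wget")


def choose_status(headers: Mapping[str, str]) -> int:
    """Return the HTTP status a server might send, based only on request headers."""
    # Header names are case-insensitive, so normalise them before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    agent = normalised.get("user-agent", "").lower()
    if not agent:
        return 403  # refuse clients that send no User-Agent at all
    if any(token in agent for token in BLOCKED_AGENT_SUBSTRINGS):
        return 403  # refuse clients that announce themselves as a blocked tool
    return 200
```

    The point of the sketch is that the decision sits entirely with the server operator; nothing has to be removed from CPAN for it to work.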

    That's the bitch about open standards....EVERYONE can use them.... :)

  9. Learn from Google by shiflett · · Score: 4, Insightful

    They should do as many of us do and learn a lesson from Google.

    It is a violation of Google's terms of use for you to "screen scrape" search results. You can implement their API using a free key and achieve similar results, however.

    Not only are these companies approaching the "problem" from the wrong angle in terms of common sense, they are also taking the most difficult approach. It is practically impossible to seek to outlaw software that fetches Web content, because Web browsers and wget (for example) are the same thing, HTTP clients. The HTTP protocol is an open standard that anyone can implement. If you don't want a valid HTTP client accessing your server, don't make your server an HTTP server.

    Stated another way, don't try to take an open standard and restrict everyone else's use of it to suit your own needs. You don't see me (an avid soccer player) trying to get the NBA to change the rules of their game to require use of the feet for ball control. If I want to play basketball, I have to play by the rules, else I am not really playing basketball.

  10. HTTP GET is an authorization by bwt · · Score: 5, Insightful

    This is just another example of gross technical incompetence by executives and lawyers.

    A company that attaches an HTTP server to the Internet receives HTTP GET requests, each complete with some information in its headers. They have a reasonable case to request that that information be accurate. They have unilateral technical ability to firewall IPs or whole subnets. Otherwise, once they receive a GET request, when the machine that they have configured responds by sending a file, they have granted explicit permission to process that file consistent with the info in the GET request.

    The owner of the server is completely in control at a technical level. If they don't like what you are doing, they can firewall you. Absent a contractual agreement not to, you have the permission to send ***REQUESTS*** for anything you would like to request. They can say no. If you lie in your request, then they have a case to say your use is unauthorized, but short of that, there should be no need to have the judicial system rewrite the technology.

    1. Re:HTTP GET is an authorization by errxn · · Score: 2, Insightful

      Here's an analogy of sorts:

      You leave your house unlocked. Someone walks in the front door and steals your TV.

      According to our laws, just because you left your house unlocked (giving the outside world access) does not give the person who stole your TV a legal right to do so. They still committed a crime.

      Now, where this analogy might fall flat on its face is the idea that when you make a GET request, and the party on the other end responds by sending you a stream of data, have they just performed the equivalent of giving you the TV after you walked into their house? They can't very well say that you stole it if they willingly gave it to you.

      --
      In Soviet Russia, Chuck Norris will still kick your ass.
  11. Re:Sure they can! by umeboshi · · Score: 3, Insightful

    -- Sure, there's always the 2% that can get around any barrier you put up. Stopping the 98% is usually good enough to justify the extra effort of developing these measures.

    They're trying to stop the 2% from sharing their knowledge with the other 98%.

  12. Re:Sure they can! by Gojira+Shipi-Taro · · Score: 2, Insightful

    If they want to take an extreme measure such as that, fine. They are entitled to limit their viewership as much as they like. To take steps to get a project to eliminate code that offends them is going beyond the realm of reasonable request.

    If they wish to restrict which applications can access their content, it is up to THEM to take the measures necessary to restrict the access. It is not the responsibility of the developer to comply with their request.

    --
    "Oh my God. This is terrible. This is the end of my Presidency. I'm fucked."; ~ Donald J. Trump
  13. Dangerous Precedent by EnglishTim · · Score: 4, Insightful

    I find it sad that so many people seem to think it is just fine to mine their site for data. Sure, there's not all that much that they can do about it, except remove the data or make it harder for regular users of the site to use it.

    For example, The EuroTV site seems to work on the concept that they provide the information for free for users of their site, but you can pay them to get it on your site. They're using their site as an advert for their services, while at the same time offering a useful service to the community. By making freely available a system to allow anybody to use their data in their own websites without paying them for it, you're completely ridding them of their reason for having the site up at all.

    Yes, you can argue that they shouldn't put the information out there if they don't want people to use it, but then you're giving them a good reason not to put the information out there at all, which makes all of us poorer.

    As for whether they can dictate that CPAN remove the modules, certainly it's fair enough of them to request that the module be removed, but it is a shame they leapt to threats of lawsuits quite so quickly.

  14. Copyrights vs. Fair Use by prgrmr · · Score: 2, Insightful

    If content is obtained in a manner that is not in violation of copyright, the next question is that of fair use. It didn't sound from the article that either module author intended or enabled anything explicitly unfair for using the data. If the website owners in question were objecting strictly to the method with which their web data was being accessed, their argument holds no water.

    This is somewhat similar to the "what constitutes a license" argument regarding database licenses, the contention being a warm body vs. a connection. In the case of these Perl modules, just because there's not a warm body explicitly directing the access of the data should not automatically qualify that access as a breach of copyright.

    It would be worth the effort to question both of the website owners as to what exactly did they consider the breach of copyright to be? My guess is that neither of them will be willing or able to express their concerns with enough technical detail or legal specificity to present a valid explanation.

  15. Re:Sure they can! by mr_z_beeblebrox · · Score: 3, Insightful

    That'll pretty well dictate what software you use to view their site.

    As the admin for a large distributor, I am often called to the desk of various sales people to install Flash. I inform them that Flash is not supported in our environment. The result? Well, companies use websites because it costs a LOT less to process web orders than to process called orders (but the cost of order placement is only slightly different). Some of these companies depend on us as their largest customer. I have to date seen three websites rewritten to accommodate that policy. If we all leverage (buzzword ;-) ourselves as customers we can defeat the evil monolith. That is my contribution to the internet.

  16. Back in the day... by TheTick · · Score: 5, Insightful

    Remember when the web -- no, remember when the net was about sharing information? I miss that time. If somebody wrote a cool front end to your service, it was COOL and more power to them. If it made your service (site, whatever) more accessible, that meant more people were looking at your stuff, and that was COOL.

    Now we have entities that threaten legal action for accessing the stuff they've made publicly available. There may actually be a case when the software scrapes and repackages the content (or, more importantly, redistributes it), but I hope the stuff about decoding the URL for easy use is bogus. I have my doubts that a court will see it my way, but still I hope for reason. Nevertheless, the whole idea makes me sad and nostalgic.

    Another thought: is my mozilla vulnerable to this sort of action because it blocks ads -- essentially repackaging the server output for display to me? Now I'm really depressed.

    --
    bachiatari na torisetsu o yome!

  17. What's the problem here? by hmccabe · · Score: 5, Insightful

    I think this is something we're going to start seeing a lot of in coming years. Right now, the Internet in general is going through growing pains, and the pressure is starting to show in these "free services" type sites ( i.e. Mapquest )

    I don't know about these sites in particular, but many of the big sites around today were built with the failed dot-com business model of delivering free content and selling advertising that ran on the page (or popped up behind it). This, of course, is dependent on people viewing the site in a browser. If people get the information without using a browser, therefore never seeing the ads, the advertisers won't want to spend any money on the site.

    Another problem is, most companies don't want to take the risks associated with innovation, so instead they seek legal action to maintain the good thing they have going. While this is a quick fix, and in the company's best interests, we need companies to present a new business model to the public and see how it gets adopted. I would pay an annual subscription fee for things like Mapquest.com, tvguide.com and maybe even /. I believe others would as well.

    Porn sites, eBay auctions, games such as Everquest and services such as Apple's dot-mac are online services that subscribers happily pay for because more than anything, they are quality products (well, some of the porn is). If the company's revenue is coming from its users, they would be a lot less concerned about how the information is being distributed.

    This isn't such a radical change, as they could add a premium subscription service, and slowly transition the focus of their business towards it. Wouldn't it be cool if I could write my own mapping application (or download a pre-made one from the site) and have it connect to xml.mapquest.com, give my username and password, and retrieve the data I requested?

  18. Re:Content is important by anaradad · · Score: 5, Insightful

    The eBay EULA only applies if you actually register for their service. If you have never signed up for eBay, you have never signed off on their EULA.

  19. Turing test? by siskbc · · Score: 4, Insightful
    So far, I was under the impression no one had won the Turing contest yet. You are beating their trivial problems, but they're finally waking up and shifting the "online human test" to things that people haven't figured out how to code. I'd link to the article if I could remember where I saw it...

    Hell, the simplest would be an easy reading comprehension or logic test with a short-answer blank - the computer would never get it, and all humans would.

    My guess is that soon, people who REALLY want you out will keep you out.

    --

    -Looking for a job as a materials chemist or multivariat

    1. Re:Turing test? by nuggz · · Score: 2, Insightful

      First off, you assume people will be able to comprehend. I doubt that; people are dumb. Don't believe me? Listen to a daytime talk show.

      Second, a computer will mark your answer, so it must be able to comprehend the answer you put in. You have to give a precise and exact answer (likely), which means it's a simple question, and a computer might be able to answer it.

  20. It's not about technology by cygnusx197 · · Score: 2, Insightful

    You know, I think some of you are missing the point in all the technology. I work for a community newspaper publishing company, and we have copyright info at the bottom of every page. I found a guy on Google that demonstrates screen scraping techniques using our main news page. That's fine. 99% of the time, it's not a big deal...it's going to happen. What we don't like is when somebody comes along, takes our content, and presents it in questionable environments, like a page that happens to have porn banners on it. Ever hear of "guilt by association"? Frankly, I think it's more likely to happen if screen scraping becomes more commonplace. Honestly... I haven't noticed a drop in traffic when someone does this.

  21. Re:Content is important by binaryDigit · · Score: 2, Insightful

    From ebay again:

    Welcome to the User Agreement for eBay Inc. The following describes the terms on which eBay offers you access to our services.

    This agreement describes the terms and conditions applicable to your use of our services available under the domain and sub-domains of www.ebay.com (including half.ebay.com, ebaystores.com) and the general principles for the Web sites of our subsidiaries and international affiliates. If you do not agree to be bound by the terms and conditions of this agreement, please do not use or access our services.


    Notice that it doesn't say anything about registering, it says "use of our services", which could be interpreted as also browsing, since that is a "feature" offered by their website. Registering simply brings into effect other parts of the EULA that are applicable to those actions. If nothing else, the contents of the site are still copyrighted, so even if you didn't agree to their EULA, you still couldn't do anything with the content.

  22. Re:Maybe they can't but... by pla · · Score: 4, Insightful

    but they can dictate whether you get the content or not

    Yes, they can. They have the option of not putting it on a public webserver in the first place. Beyond that, they have no control over who sees it and how. They can use various technological measures to try to control access, but short of forcing some form of user authentication via a secure proprietary client, the ad-blockers and scrapers *WILL* win.


    If they are getting no ad impressions, then they are getting no money.

    This statement seems a common way of viewing these issues (Ad blocking, scraping, whatever). However, realize that they don't have a "right" to make money just because they offer otherwise-free content online. They offer that content in the *HOPE* of making money, but that comes with no guarantees. And yes, I go to the kitchen during commercials, or change the station, or fast-forward.


    I see the problem as involving how offensive these sites make the ads. I find Flash and Shockwave ads so offensive (and, I find that they often crash my browser - the huge offensive Flash ad currently on the Onion, for example, crashes my browser every time) that I simply browse with them disabled. Pop(up/under) ads bother me enough that I have the "dom.disable_open_during_load" preference set to completely block them. In comparison, the small, unintrusive text ad in the upper left of K5's front page doesn't bother me at all, and I've even *clicked* on it a few times.

    Companies (not just advertisers, but those who serve such ads) need to realize that more annoying ads do make an impression - a strongly negative one. If I want their products, *I'll* seek *them* out. If they detract from my web browsing experience, I will specifically make a point of seeking out their *competitors* if I need something they offer.


    In case any marketing folks read this, I'll mention the last ad I *DID* watch - The one with the hamster and rabbit from Blockbuster. Why? Because I found the ad sufficiently amusing to watch, on its own merits. Important point there. It didn't annoy me, and it had value all by itself. *THAT* makes a positive impression on a potential customer. I don't even know what the hamster and rabbit talked about, but it doesn't matter, I remember that "Blockbuster amused me for 30 seconds". Making me waste a few minutes to figure out how to filter out your crap does *not* make a good impression. I will remember "X10 pissed me off for 30 seconds, let's visit Logitech's cam offerings instead".

  23. Phil Donahue Is My Cousin by Acidic_Diarrhea · · Score: 2, Insightful

    I don't believe the discussion is about whether or not screen scraping is feasible for people and whether or not it can be stopped through a bit of intelligence. It is instead a discussion of whether or not one company has a right to grab content from a website and redistribute it on their own. Yes, it's possible to stop people from doing the aforementioned grab (of course, as this war escalates you're going to have to start shutting real people out of your content), but should people have the legal right to do the grab? Now, what do you think of that question?

    --
    I hate liberals. If you are a liberal, do not reply.
  24. wipe the foam from your lips by sydlexic · · Score: 2, Insightful

    ashcroft is a thug regardless of his party affiliation. take off your partisan blinders and understand that patriotism != submission.

  25. Not completely by mccrew · · Score: 2, Insightful
    Follow that logic, and by having a telephone a diner has granted explicit permission to the telemarketer to interrupt his meal.

    Or more related to the point, here are some real-world scenarios:

    1. Spammer tries to relay through a machine by looking for well-known CGI. For example, I frequently see requests for /cgi-bin/formail.pl, with the Referer: header set to the name of my domain.

    2. Spammer tries to relay through either an HTTP server or HTTP proxy which supports the "CONNECT" method.

    Has the owner of the machine explicitly granted spammer permission to (mis-)use his machine, just because a well-known script is present, or because CONNECT is enabled on the wrong side of the internet connection?

    I would respectfully disagree.

    --
    Hey, Windows users, there is no such thing as "forward" slash, there is only slash and backslash.
  26. Re:What falls out the back end of a bull? by poot_rootbeer · · Score: 2, Insightful


    Maybe bullshit, maybe not. A good OCR library will get you 90% of the way there already.

    They can't distort the characters TOO much in the image, or else humans wouldn't be able to recognize them either. And the background patterns to cause interference with OCR systems could be pretty easy to strip out too; a grid of straight black lines on a white background is fairly trivial to recognize algorithmically, and then removing the lines becomes a simple matter of figuring out where a black pixel is just part of a line, and where it's part of a character.
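
    The grid-stripping idea above can be sketched roughly as follows. This is only an illustration under strong assumptions: the image is already a black-and-white grid of 0/1 values, grid lines are exactly one pixel wide and run the full width or height, and a line pixel is treated as "part of a character" only when ink continues on both sides of the line. The names `find_grid_lines`, `strip_grid` and `min_fill` are made up for the example.

```python
from typing import List, Set, Tuple

Image = List[List[int]]  # 1 = black pixel, 0 = white pixel


def find_grid_lines(image: Image, min_fill: float = 0.9) -> Tuple[Set[int], Set[int]]:
    """Return (rows, cols) that are almost entirely black, i.e. likely grid lines."""
    height, width = len(image), len(image[0])
    rows = {y for y in range(height) if sum(image[y]) >= min_fill * width}
    cols = {x for x in range(width)
            if sum(image[y][x] for y in range(height)) >= min_fill * height}
    return rows, cols


def strip_grid(image: Image, min_fill: float = 0.9) -> Image:
    """Return a copy of the image with grid-line pixels cleared unless a stroke crosses them."""
    height, width = len(image), len(image[0])
    rows, cols = find_grid_lines(image, min_fill)

    def ink(y: int, x: int) -> bool:
        # True only for black pixels inside the image that lie off every grid line.
        return (0 <= y < height and 0 <= x < width
                and y not in rows and x not in cols and image[y][x] == 1)

    cleaned = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            on_row, on_col = y in rows, x in cols
            if not (on_row or on_col):
                continue  # ordinary pixel, leave it alone
            # Keep a line pixel only if character ink continues on both sides of the line.
            crossed_row = on_row and not on_col and ink(y - 1, x) and ink(y + 1, x)
            crossed_col = on_col and not on_row and ink(y, x - 1) and ink(y, x + 1)
            cleaned[y][x] = 1 if (crossed_row or crossed_col) else 0
    return cleaned
```

    A stroke that merely ends on a grid line loses its last pixel under this heuristic, and real test images use noisier backgrounds than this, so treat it as a starting point rather than a working solver.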

    Whether it's worth all that effort just to be able to automate the submission of a form is debatable.

  27. Don't like it? Don't put stuff on the web! by Maul · · Score: 4, Insightful

    If you put something on the web, you have to assume that people are going to access that information in any way that they possibly can.

    I suppose the big complaint is that people might not be viewing the "ads" on pages if they use certain HTTP clients.

    I have a suggestion for the sites that are complaining. If you don't like it, don't put stuff on the web. Write your own custom client-server solution if you don't want people accessing it with certain browsers or other software.

    If you are depending on ad banners for your revenues, you and advertisers are taking a "risk" that people might not see the ads, or that they might not buy advertised products. Tough luck if you lose out on your bet. Hopefully you have a solid way of making money related to whatever service you are providing to make up for it.

    Whining about lost ad revenue and such is the same as whining about losing money in Las Vegas. You should have assessed the risks before playing the game.

    --

    "You spoony bard!" -Tellah

  28. Fairly unenforceable. by nobodyman · · Score: 2, Insightful
    Even if you removed the screenscraping modules you wouldn't even come close to solving the "problem" these website operators are having. Both Microsoft and (I think) Sun have XML APIs that allow you to issue HTTP requests and easily access what the server sends back. Even if you didn't have a high-level "screenscraper", you could always go through the sockets API. Hell, if I want to find out the type of server a website is using I just open a telnet connection to port 80 and type
    GET <document_name> HTTP/1.0
    ...hit the return key twice and boom. Since it's that easy, I'm sure there are tons of developers who screen-scrape without even using a module.
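
    The telnet session above translates almost directly into the sockets API. A minimal sketch, assuming plain HTTP on port 80; the function names are invented for the example, and the Host header is an addition that the telnet example does not send (many servers hosting several sites need it).

```python
import socket
from typing import Optional


def build_request(host: str, path: str = "/") -> bytes:
    """Build the same request typed into telnet; the blank line ends the headers."""
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")


def raw_get(host: str, path: str = "/", port: int = 80, timeout: float = 10.0) -> bytes:
    """Send the request over a plain TCP socket and return everything the server sends."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_request(host, path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # server closed the connection: response complete
                break
            chunks.append(data)
    return b"".join(chunks)


def server_header(raw_response: bytes) -> Optional[str]:
    """Pull the Server header out of a raw response, or None if it is absent."""
    head = raw_response.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    for line in head.split("\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(":")
        if name.strip().lower() == "server":
            return value.strip()
    return None
```

    Something like `server_header(raw_get("www.example.com"))` would then report the server type, with no scraping module involved at all.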

    If a website operator is having their copyrighted content lifted by another site and presented as its own, then that operator can sue using traditional copyright law. If they are having their website slammed because some clueless developer is scraping too often, they can block the IP. But trying to restrict access to the API is heavy-handed and futile.
  29. Missing the point by Anonymous Coward · · Score: 1, Insightful

    When are people going to get it into their heads that public accessibility != public domain? This is, essentially, the argument that both authors and some supporters make, that if it is publicly available then it is within the public domain. It isn't. Books in a library are not in the public domain simply because any schmuck off the street can stroll in and look at them. TV shows and sound recordings broadcast over radio waves are not in the public domain because anyone can pull the signal out of the air. Movies are not public domain because anyone willing to pony up the cash to see one can see it. Correspondingly, webpages are not in the public domain just because any nitwit with a computer and a connection to the 'net can load a webpage.

    Damn straight this is about our rights online. It's an educational example that with rights come responsibilities. Those that abuse those responsibilities lose those rights.


  30. NF Chance by frovingslosh · · Score: 2, Insightful
    If human eyes can read it, someone can write software to parse it.

    That's what I tell my clients who try to "encrypt" things in this silly manner. I've written packages that defeat those silly "enter the word contained in the image" tests, I've written packages that defeat silly anti-automation scripts.

    It's really not hard.

    Can something that recognizes text in an image be written? Sure. It's just a form of OCR. Can you write one that's able to look at any generic webpage, a mix of text and images, and do what is being asked of a human? I don't believe you can, and it seems a pretty high expectation of any software for the current state of AI. A targeted program for one website I might believe, but such tests for a human are certainly valid protection against web crawling 'bots.

    Which is not to say I in any way agree that screen scraping software in any way is a violation of a website owner's rights. It's not.

    --
    I'm an American. I love this country and the freedoms that we used to have.
  31. Legal? Probably. Rude? Maybe... by Rob+Parkhill · · Score: 4, Insightful

    EuroTV has a robots.txt file that asks to leave the various /scripts directories alone. If this Perl module is just ignoring that robots.txt file, then that is just rude, although I don't see how it is illegal.

    Streetmap doesn't even have a robots.txt file, so I don't see why they are whining about it.
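
    For what it's worth, honouring robots.txt from a script takes only a few lines. A minimal sketch using Python's standard `urllib.robotparser`; the robots.txt content, the host names and the agent string below are hypothetical stand-ins, not the real EuroTV file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only; not EuroTV's actual file.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /scripts/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, url: str, agent: str = "ExampleScraper/0.1") -> bool:
    """Ask whether the given agent is allowed to request the URL."""
    return parser.can_fetch(agent, url)
```

    With the rules above, anything under /scripts/ is refused and everything else is allowed. A site with no rules at all, modelled here as an empty string, allows everything.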

    Although I can see why these websites could get upset. The TV-listing screen scrapers are especially bad at hammering a site relentlessly for a sustained period of time to obtain all of the programming information for a certain broadcast area. The scraper has to hit the site repeatedly to obtain all of the information, since it isn't all displayed on a single page. If any one of these scrapers gets to be really popular, it could kill the site.

    Of course, the solution to that is to make all of the listing available as one big chunk to avoid repeated requests. But then the site goes out of business in a few weeks due to lack of advertising revenue.

    I, for one, wish I could buy a subscription to zap2it.com that would give me fast, easy access to the channel listings in, say, XMLTV format. Is $25/year a reasonable fee, considering that I would only hit the site once a day at the most, and grab a single file?

    --
    "Tomorrow's forecast: a few sprinkles of genius with a chance of doom!" - Stewie Griffin
  32. stupid business models by g4dget · · Score: 2, Insightful
    First, people come up with stupid business models ("we'll put up copyrighted map data for free and make money from advertising"). Then, when it predictably turns out that people access that data programmatically, they whine.

    Let's not screw up our legal system with provisions to protect bogus business models. If streetmap.co.uk cannot figure out how to make money putting up information openly on the Internet, then either they should make room for someone who can, or maybe there just isn't a market there.

  33. How about Google, et al? by tjcoyle · · Score: 2, Insightful

    Gosh, I don't know, but don't I see Google redisplaying site content of billions of pages day in and day out?

    Sounds to me like the area's too grey to ascertain right and wrong (I may be, and probably am, ignorant).

    However, these sites definitely have every right to do whatever they wish in order to prevent such use, such as IP blocking, taking some creative evasive measures, OR... securing content they don't feel Joe Public should consume.

    What would happen if say, General Motors suddenly decided that each and every time a GM vehicle shows up in media that it was an abuse of their intellectual property??

    Ptttth!

  34. screen scraping software is completely legal by frovingslosh · · Score: 4, Insightful
    Some /. readers seem to be missing this, but this is not a debate on if it's right to take someone's content and post it elsewhere. (To me it's clearly not without their permission, but that's not the issue here at all so let's not even pretend that it is by debating it.) The issue is "is it legal / proper / legitimate to write software that is capable of looking at the output of a website, by any means - including examining the HTML returned or by capturing the computer screen itself and analyzing that?" Of course it is. Such software in no way pirates a website owner's content, it just gives me additional tools for keeping current with the content of those pages. There are plenty of legitimate uses (the Streetmap reference was perfectly on target for this, just to give one). That someone might abuse such a tool and pirate content is hardly the issue; if it were, every C compiler would also be at fault. People need to stand up against cranks like btek's Kate Sutton who think they can bully everyone else in the world. Simon Batistoni should have never even tried to be reasonable with her, and he should make his tool available again and sue her and her company for the slander she has done to him in the main perl5 bug queue.

    Even if he had provided a tool to make a copy of a map, which he did not, there is nothing at all wrong with making and supplying others with that tool. It's how the tool is used that is the issue, and a tool that has legitimate useful uses can never be allowed to be the target of such a complaint or suit.

    --
    I'm an American. I love this country and the freedoms that we used to have.
  35. Banning vs. Blocking by billstewart · · Score: 3, Insightful
    All sorts of people who don't understand the web or the Internet keep trying to get rules made or bring lawsuits or abuse the DMCA in novel ways because they don't like how their data is being used. In most cases, this is way out of line (as opposed to mildly out of line) because they can simply set their web server not to respond to requests they don't like.

    A classic instance is the "deep linking" cases, where somebody doesn't want to let you see their deep pages except by coming through their front page. Rather than taking this to court, as several content providers have done, and beating up on users one at a time, it's much simpler to check the HTTP-REFERER to find out what page the request came from, and send an appropriate response page to any request that doesn't come from one of their other pages. (Whether that's a 404 or a redirect to the front page or a login screen or whatever depends on the circumstances.)
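
    A rough sketch of that referer check in Python, just to show how little is involved. The host name `www.example.com` and both function names are hypothetical, and the redirect-to-front-page policy is only one of the options mentioned above:

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

SITE_HOST = "www.example.com"  # hypothetical: the site's own host name


def came_from_own_site(referer: Optional[str], site_host: str = SITE_HOST) -> bool:
    """True if the Referer header names a page on our own host."""
    if not referer:
        return False  # header missing or empty
    return urlsplit(referer).hostname == site_host


def respond_to_deep_request(referer: Optional[str]) -> Tuple[int, str]:
    """Return (status, body or redirect target) for a deep-page request."""
    if came_from_own_site(referer):
        return 200, "serve page"
    return 302, "/"  # send everyone else to the front page
```

    Keep in mind the header is supplied by the client, so it can be forged or left out. This stops casual deep links, not a determined scraper.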

    Screen scrapers are an interesting case for a couple of reasons. One of them is that blind people often use them to feed text-to-speech browsers, so banning them is Extremely Politically Incorrect, as well as rude and stupid. Another is that anybody with a Print-Screen program on their PC can screen-scrape - you're only affecting whether they get ugly bitmaps or friendlier HTML objects. So you not only have to ban custom-tailored CPAN objects, you have to get Microsoft and Linus to break the screen-grabbers in their operating systems.

    The related question "ok, so how *do* I detect and block http requests I don't like?" is left as an exercise to the blocker (and to the people who build workarounds to the blocks, and the people who also block those workarounds, etc...) The classic answers are things like cookies (widely supported "need the cookie to see the page" features seem to be available), ugly URLs that are either time-decaying or dependent on the requester's IP address, etc., or just checking the browser to see which lies it's telling about what kind of browser it is. There's also the robots.txt convention for politely requesting robots to stay away, and Spider traps to hand entertaining things to impolite robots or overly curious humans.
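
    As one possible take on the "ugly URLs that are time-decaying or dependent on the requester's IP address" idea, here is a sketch in Python using an HMAC signature. The secret, the `e`/`s` query parameter names, and the function names are all assumptions made for the example, not anything a particular site is known to use:

```python
import hashlib
import hmac
import time

SECRET = b"change-me"  # hypothetical server-side secret, never sent to clients


def sign(path: str, client_ip: str, expires: int, secret: bytes = SECRET) -> str:
    """HMAC over the path, the requester's IP and the expiry time."""
    msg = f"{path}|{client_ip}|{expires}".encode()
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


def make_url(path: str, client_ip: str, ttl: int = 300, now: int = None) -> str:
    """Build a URL that only works for client_ip and only for ttl seconds."""
    now = int(time.time()) if now is None else now
    expires = now + ttl
    return f"{path}?e={expires}&s={sign(path, client_ip, expires)}"


def is_valid(path: str, client_ip: str, expires: int, signature: str,
             now: int = None) -> bool:
    """Check expiry first, then compare signatures in constant time."""
    now = int(time.time()) if now is None else now
    if now > expires:
        return False  # link has decayed
    return hmac.compare_digest(sign(path, client_ip, expires), signature)
```

    A scraper can still fetch the page that hands out the URL and follow it right away, so this raises the cost rather than closing the door, which is the arms race described above.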

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
  36. Re:Sure they can! by CaseyB · · Score: 2, Insightful
    The first one of those could be semi-easily defeated with a well written vision program.

    Unsubstantiated bullshit. And for every advance in smart OCR you come up with, I can come up with 10 obscuring transformations that leave it readable to humans but garbage to a computer.

    The second could be very easily defeated by a simple concept to image hash database.

    Yeah, you only have to model the recognition and indexing abilities of a human brain.

    The final test could simply be brute forced. Pick three buttons. Keep selecting those until they're right.

    You're ignorantly assuming that an implementation detail like radio buttons is core to the system.

    These proofs of concept show just the first step in writing a solid system.

    An obvious extension that I can think of would be to implement a whole slew of different types of these problems, and then an engine that outputs a given problem -- and the method for determining the solution -- all into a bitmap. Then you have to deal with not only whatever first-order recognition is specific to the problem, but also the higher-order job of interpreting the nature of the problem itself: e.g. A picture of a guy brushing his teeth, with accompanying text "what is this man doing" OR "what color is the man's shirt?". Good luck to your software.

  37. Re:Derivative work by bwt · · Score: 4, Insightful

    The author does not create the "web page", that is the job of the user agent. The author offers up raw HTML source code and YOU render it. Your argument proves too much -- it proves that all rendering of HTML in a browser is copyright infringement because it creates a derivative work of his source code. Indeed, it DOES create a derivative work, just one that is **authorized**.

    The author creates various files such as HTML text files, pictures, PDFs, etc. By using HTML, he has authorized the user agent to render consistent with the HTML standard and his HTML code. Thus, he has explicitly authorized certain limited types of derivative works to be made from his source code by using HTML. The HTML standard does not require images to be rendered, and since it was the author's choice to use HTML, no violation of copyright law occurs when HTML is rendered in a manner consistent with the HTML spec.

    Had he wanted to mandate the exact representation, he could have used an image format or a PDF. It's his choice, but he must live with it and all that follows from it.

    Of course, there is nothing wrong with not rendering the HTML at all and just looking at it as source code. Nor is there any cause of action under copyright law if you extract unprotectable facts and ideas from either the source code or the rendered version.
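
    To make "looking at it as source code" concrete, here is a small Python sketch that pulls the text out of table cells without rendering anything. The class and function names are invented for the example, and real listing pages will need more care than this:

```python
from html.parser import HTMLParser


class CellText(HTMLParser):
    """Collect the text of every <td> cell; nothing is rendered."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self.cells.append("")  # start a new cell

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data  # text inside nested tags is kept too


def extract_cells(html: str) -> list:
    """Return the stripped text of each table cell in the HTML source."""
    parser = CellText()
    parser.feed(html)
    parser.close()
    return [cell.strip() for cell in parser.cells]
```

    Whether a given piece of extracted text counts as an unprotectable fact is a legal question the code cannot answer.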

  38. money, business models and digital futures of IP by drDugan · · Score: 2, Insightful

    It all comes down to money and the models people have used to force advertisements onto people while they are entertained or educated.

    The cold, hard truth is that the digital future obviates the traditional content control mechanisms used to force consumers to watch ads for content. The exact same lines are playing out on the web, on TV, in music, movies, magazines -- everywhere information can be digitized and presented in ways not tied to physical mediums.

    The (now old) business models that the digital methods circumvent will eventually be redefined. Short term laws will support them, because the industries have enough money and clout to cause the laws to happen. Long term, though, people will no longer stand for the absurd, one-sided contract with society that is our current IP system.

    This is a vague comment, quickly written -- but I see here the exact same theme played out over and over in recent years. Free communication (amortized) + 'digitizable' items of value => lack of control by provider for profits. This is yet another example.