Slashdot Mirror


Websites Complaining About Screen-Scraping

wilko11 writes "There have been two cases recently where websites have requested the removal of modules from CPAN. These modules could be used to access the websites (EuroTV and Streetmap) from a Perl program. The question being asked on the mailing lists (threads about EuroTV and about Streetmap) is 'can companies dictate what software you can use to access web content from their server?'"

432 comments

  1. In short, no. by numbski · · Score: 5, Insightful

    If you don't want your content being redisplayed on another site, place appropriate copyright and seek protections therein.

    Don't stifle the technology. Treat the cause, not the symptom.

    --

    Karma: Chameleon (mostly due to the fact that you come and go).

    1. Re:In short, no. by numbsafari · · Score: 1, Insightful

      I completely agree with this. As long as the modules in question and the redisplay/use of the information did not violate the stated copyrights, then nothing wrong was done.

      As for treating the cause and not the symptom, how many Slashdotters will decry this act, but still support gun control?

      Just a curious question...

    2. Re:In short, no. by Anonymous Coward · · Score: 0

      I'm pretty neutral on gun control, but comparing taking someone's life to taking some public content is hardly a realistic or fair comparison.

    3. Re:In short, no. by Anonymous Coward · · Score: 0

      Ever think that guns may be a part of the cause and not just a symptom?

      "The NRA says that guns don't kill people, people do... but I think the gun helps." - Eddie Izzard

    4. Re:In short, no. by Anonymous Coward · · Score: 0

      Because nobody got killed before guns were invented.

    5. Re:In short, no. by molarmass192 · · Score: 3, Funny

      Off topic but ...
      <grin>
      Everybody knows that guns DO NOT kill people ... bullets DO!!!
      </grin>

      --

      Good people do not need laws to tell them to act responsibly, while bad people will find a way around the laws-Plato
    6. Re:In short, no. by Anonymous Coward · · Score: 0

      Not with the ease and simplicity that the gun has to offer.

    7. Re:In short, no. by Anonymous Coward · · Score: 0

      What the fuck do ease and simplicity have to do with it?

    8. Re:In short, no. by Anonymous Coward · · Score: 0

      What the fuck do ease and simplicity have to do with it?

      It's easy to get Americans riled up over gun control, because of their simplicity.

      HTH

    9. Re:In short, no. by Ivan+Raikov · · Score: 1, Funny

      Everybody knows that guns DO NOT kill people ... bullets DO!!!

      I know you were only joking, but I feel obligated to point out that the acceleration of the bullet, times its mass, gives it the force necessary to penetrate your body and disrupt the function of vital organs.

    10. Re:In short, no. by Anonymous Coward · · Score: 0

      Goddamn physics, I knew it would be the death of me...

    11. Re:In short, no. by Anonymous Coward · · Score: 0

      Pick the one that's easier to do.

      Walking up to a person and shooting them in the head, or stabbing a person to death.

      Which is more personal? Which one requires more thought? Which one is more likely to get you caught? Which one is more likely to have the person fight back?

      Ease and simplicity have everything to do with it.

    12. Re:In short, no. by KDan · · Score: 1

      More to the point, I would say that the electrostatic interaction between the electron clouds of the bullet atoms and the electron clouds of the atoms of your body is what kills you.

      Daniel

      --
      Carpe Diem
    13. Re:In short, no. by shyster · · Score: 1

      Ah, so it's neither the gun nor the bullet that kills people...it's their own body!

    14. Re:In short, no. by KDan · · Score: 1

      Yup. They should be put away for murder. The bodies, that is...

      Daniel

      --
      Carpe Diem
    15. Re:In short, no. by Opie812 · · Score: 0

      Physics doesn't kill people guns do!

      --
      I'm not a nerd. Nerds are smart.
    16. Re:In short, no. by drDugan · · Score: 1

      wait wait

      physics nearly killed ME! (in college)

    17. Re:In short, no. by Anonymous Coward · · Score: 0

      funny eh?

      people sometimes kill other people by hitting them really hard over the head ... with guns (the bigger the better)

    18. Re:In short, no. by egreB · · Score: 1

      We're on to nitpicking? Great!

      Actually, the only thing it's possible to die from is lack of oxygen supply to the brain. Every cause of death eventually leads to this, and only then can you be declared medically (?) dead. Although, sometimes it's possible to wake a dead person (for example with CPR).

      At least that's how they define death here in Norway. Dunno how it is with american guns and bullets.

    19. Re:In short, no. by jadavis · · Score: 1

      Might you be referring to the kinetic energy ((1/2)*mass*(velocity^2)) required to break the chemical bonds that are necessary for survival?

      Acceleration times mass is force. But force doesn't kill people, force exerted over a distance kills people.

      Of course, the energy must be directed properly, and enough must be transferred. If the bullet has the same kinetic energy but the force is exerted over a long distance and a large area, it won't do much damage.

      --
      Social scientists are inspired by theories; scientists are humbled by facts.
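
To put rough numbers on the energy-versus-force point above, here is a small sketch in Python. The bullet mass (8 g), the velocity (350 m/s), and both stopping distances are illustrative assumptions, not figures from the thread, and the function names are hypothetical.

```python
def kinetic_energy(mass_kg, velocity_ms):
    """Kinetic energy in joules: (1/2) * m * v^2."""
    return 0.5 * mass_kg * velocity_ms ** 2


def average_force(energy_j, distance_m):
    """Average force in newtons needed to dissipate energy_j over distance_m.

    Follows from work = force * distance.
    """
    return energy_j / distance_m


# Assumed, illustrative values: an 8 g bullet travelling at 350 m/s.
energy = kinetic_energy(0.008, 350.0)

# Same energy, two assumed stopping distances.
force_short = average_force(energy, 0.2)   # stopped within 20 cm
force_long = average_force(energy, 20.0)   # stopped over 20 m

print(f"kinetic energy: {energy:.0f} J")
print(f"average force over 0.2 m: {force_short:.0f} N")
print(f"average force over 20 m: {force_long:.1f} N")
```

Under these assumed numbers the energy is the same in both cases, but the average force differs by a factor of one hundred, which is the "long distance" half of the argument. The "large area" half (pressure rather than force) is not modelled here.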
    20. Re:In short, no. by numbsafari · · Score: 2, Insightful

      It's easy to take Europeans' freedom because they tend to roll over and take it from anybody that'll give it to 'em. The same argument goes for gun control as it does for drug control. People shouldn't be arrested, convicted, hassled, jailed, tortured, abused, or whatever because they possess, use, sell or otherwise traffic in drugs. People SHOULD be arrested, convicted, hassled, jailed, whatever because they kill someone while driving under the influence, operate heavy machinery while under the influence, make fiduciary or medical decisions while under the influence, etc.

      Same thing with guns. You punish the act. Laws should not seek to control behavior by limiting freedom. They should seek to control behavior by punishing specific things. It makes the law much less complex, easier to enforce, easier to understand and easier to promote.

      Americans get riled up over gun control because Americans did, at one point, know what it is like to be ruled by a foreign power, to be denied basic freedoms. Europeans were for centuries the ones who ruled as colonial powers. Don't get all high and mighty, because Europe is just as guilty, if not more so. I think watching France and Germany self-destruct is a great new spectator sport. It'll be interesting to see how and when they come to their senses.

    21. Re:In short, no. by Crashmarik · · Score: 1

      Actually, it's not guns, it's those pesky bullets. Crash

  2. Sure they can! by stile · · Score: 5, Interesting

    If we piss them off enough by chopping off their advertisements and snipping out their content, they'll just write their sites in Flash, or as one big image file, or some other proprietary format. That'll pretty well dictate what software you use to view their site.

    1. Re:Sure they can! by interiot · · Score: 4, Insightful

      No they won't. The main goal of HTML wasn't so everything would be open and "stealable", the goal was to have content that could be viewed on a variety of platforms. You can't get that with Flash or huge images, and in fact, for some of the more interesting devices (e.g. cell phones, PDAs), it's explicitly required that the machine be able to understand the content to some extent so that it can transform it to something that better suits the particular device.

    2. Re:Sure they can! by superdan2k · · Score: 4, Insightful

      Yeah, and then they'll lose traffic and die because no one will bother wasting the time on their site.

      What a lot of companies fail to realize is that the Social Contract (philosophy, not law) applies as much to the relationship between client and customer as it does between Joe and Jim Average. Play by the rules and be part of society, or doom yourself...that's basically it. No man is an island. No company is an island...well, maybe Microsoft, but that's it.

    3. Re:Sure they can! by TheJesusCandle · · Score: 4, Insightful

      That's what I tell my clients who try to "encrypt" things in this silly manner. I've written packages that defeat those silly "enter the word contained in the image" tests, I've written packages that defeat silly anti-automation scripts.

      It's really not hard.


      Sure, there's always the 2% that can get around any barrier you put up. Stopping the 98% is usually good enough to justify the extra effort of developing these measures.

      You shouldn't complain too much about what your customers want, they're paying you for your time, right? Give 'em what they want.

    4. Re:Sure they can! by SoCalChris · · Score: 4, Interesting

      You have good points, but try explaining that to a very non-technical executive who is afraid that everyone is out to steal their content. I've seen many companies that will do their entire website in Flash just so the content can't be "stolen".

      Personally, I refuse to install the Flash plugin, so if I come to one of these pages looking to do business, oh well. I'll just go somewhere else. The higher up people in companies that make all Flash sites don't seem to realize that Flash is annoying to a lot of people.

    5. Re:Sure they can! by TheTomcat · · Score: 1

      You mean like slashdot formkeys?

      Yeah, me too (see sig).

      S

    6. Re:Sure they can! by umeboshi · · Score: 3, Insightful

      -- Sure, there's always the 2% that can get around any barrier you put up. Stopping the 98% is usually good enough to justify the extra effort of developing these measures.

      They're trying to stop the 2% from sharing their knowledge with the other 98%.

    7. Re:Sure they can! by Gojira+Shipi-Taro · · Score: 2, Insightful

      If they want to take an extreme measure such as that, fine. They are entitled to limit their viewership as much as they like. To take steps to get a project to eliminate code that offends them is going beyond the realm of reasonable request.

      If they wish to restrict which applications can access their content, it is up to THEM to take the measures necessary to restrict the access. It is not the responsibility of the developer to comply with their request.

      --
      "Oh my God. This is terrible. This is the end of my Presidency. I'm fucked."; ~ Donald J. Trump
    8. Re:Sure they can! by mr_z_beeblebrox · · Score: 3, Insightful

      That'll pretty well dictate what software you use to view their site.

      As the admin for a large distributor, I am often called to the desk of various sales people to install Flash. I inform them that Flash is not supported in our environment. The result? Companies use websites because it costs a LOT less to process web orders than to process called orders (but the cost of order placement is only slightly different). Some of these companies depend on us as their largest customer. I have to date seen three websites rewritten to accommodate that policy. If we all leverage (buzzword ;-) ourselves as customers we can defeat the evil monolith. That is my contribution to the internet.

    9. Re:Sure they can! by Anonymous Coward · · Score: 0

      CowboyNeal is an island. Hell, he's big enough to be the sovereign nation of Homoslavia.

    10. Re:Sure they can! by Yuan-Lung · · Score: 1

      No man is an island. No company is an island...well, maybe Microsoft, but that's it.

      um... actually, with $40 billion on hand, MS is more like a small nation.

    11. Re:Sure they can! by catch23 · · Score: 1

      And what's worse is that those sites with flash content can STILL be stolen. Yeah it's more work, but just like all those useless encryption mechanisms with DVDs and CDs, if you can see it with your own eyes, it can be stolen.

    12. Re:Sure they can! by MisterMook · · Score: 1

      If companies understood implicit social contracts with their customers then there would be no Microsoft or XXAA's. Instead of attempting to win consumer support, most businesses today seem to believe that the way to sell product is to institute the most draconian controls imaginable on their products and then ruthlessly savage the competition until they have as much of a monopoly as allowable by law (or buyable by law).

    13. Re:Sure they can! by CaseyB · · Score: 4, Interesting
      If human eyes can read it, someone can write software to parse it.

      Uh huh.

      Good luck, buddy.

    14. Re:Sure they can! by MyPantsAreOnFire! · · Score: 1

      Nailed it on the head. I write software that does scraping of a very popular site, and said site has an expensive plan to allow access to their database. They've thrown some rudimentary tricks out that I've been able to counter, but for the most part, if your webserver will respond to a request, there is someone out there who can parse it.

      The smart alternative for sites such as these is to offer a pay-for-API option for users or other sites that don't want to spend months writing scrapers and parsers. If the site that I scrape gave me database access for $5/month, I'd take it in a second -- It saves me time, costs me less, and they make money off of me using their services.

      --
      --My other sig is a ferrari.
    15. Re:Sure they can! by Anonymous Coward · · Score: 1, Informative

      The problem with the examples you gave is that the first 2 fail the human comprehension test as well.

      The first one could be (orca, oyca, oycd, orcd).

      The second asked: "What are these pictures of?" and gave 6 pictures. I assume that they want an answer that applies to all 6 pictures, but damned if I could come up with a common theme for all 6.

      Now, if I went to a site that used this technology for something, I would get frustrated and leave. Kinda defeats the purpose of using it in the first place don't you think?

    16. Re:Sure they can! by CaseyB · · Score: 3, Funny
      Hmm. I didn't have trouble with any of them. (Reload for different variants.)

      It may be that the tests go beyond a simple Turing test and also validate for a certain level of intelligence. I suppose that would be useful sometimes as well.

      "You must be _this_ smart to ride our web site."

    17. Re:Sure they can! by Lt+Razak · · Score: 1
      I wonder what your site is that you're talking about. Currently I'm scraping a site too, (music related) but besides messy HTML, there are no tricks. I'm hoping no one has scraped them badly enough to cause them to change their HTML up much.

      I know the site I am trying to build a database from charges a lot of money for their database. I don't think a $5/month charge would work in this case. I believe that the only revenue they create happens to be the few customers that they sell their database to.

    18. Re:Sure they can! by Anonymous Coward · · Score: 0

      Sure, there's always the 2% that can get around any barrier you put up. Stopping the 98% is usually good enough to justify the extra effort of developing these measures.

      It's those 2% (actually I'm sure it's far, far less) that create the applications that lets the other 98% get around the barriers also. Witness popup-blockers, Kazaa, DeCSS, PGP, etc.

    19. Re:Sure they can! by epyT-R · · Score: 1
      You have good points, but try explaining that to a very non-technical executive who is afraid that everyone is out to steal their content. I've seen many companies that will do their entire website in Flash just so the content can't be "stolen".


      Then that non-technical executive and his idiotic company have no business on the internet. Plain and simple. The big problem here is that these companies insist that their 'web content' is somehow 'valuable' and therefore needs to be 'protected.' It's ridiculous. When are these companies going to realize the internet can't be made into a top-down media streaming service where their site is the ONLY place to get certain data? They only want it this way so they can charge money for it (and we all know how worthless most of the content out there is).

      One of the great things about the net is the concept of a mirror (Nvidia, get a clue). With multiple mirrors, the information is fairly well protected from destruction and no one node (and its networks) of the net is overloaded with traffic. Of course, Orwellian rewrites may be exactly what these companies want.

      To these companies: Give it up, you can't pay for (or make much money from) sites with advertising alone. Either charge for access or pay up for the bandwidth and write it off as advertising your company's presence. You could also do us all a favor and get out and leave room/bandwidth for people who will put up some decent content.

      Unrealistically idealistic? Definitely. But hey, I'm an old school net user and I miss the pre-commercial, pre-AOL days.
    20. Re:Sure they can! by Anonymous Coward · · Score: 0

      Say Hi to the networked present: 2% is a thousand times more than enough. One DeCSS was enough. One screenreader is enough.

    21. Re:Sure they can! by Anonymous Coward · · Score: 0

      I think the people that might use these perl modules would constitute only 2%, yet they seem to want to put the effort into stopping them. Seems like they have nothing better to do, if you ask me.

    22. Re:Sure they can! by Anonymous Coward · · Score: 0

      Cheap and easy quips about intelligence aside (must have appealed to the low brow moderators on crack), the first picture looked like a four-letter word that was warped in Photoshop. It was difficult to determine with certainty what each letter was.

      The second link had 6 pictures. Two of the pictures could have been farmers, but one was a bottle with crosses on it, and the other was a crayon drawing of god knows what.

    23. Re:Sure they can! by ivan256 · · Score: 1

      The first one of those could be semi-easily defeated with a well written vision program. The second could be very easily defeated by a simple concept-to-image hash database. Such a database could be built through collaboration such that each user would provide data for the database occasionally, but most of the time the interaction could be automated. The final test could simply be brute forced. Pick three buttons. Keep selecting those until they're right. Given enough processing power it would be significantly easier than that.

      When designing these novelty programs, the designers assume that nobody will be able to quickly gain an understanding of how a small portion of the human brain works enough to implement its functionality in software. Worse, by exploiting what we don't know about writing software to emulate the brain, they've defined the problem set in such a way that any guarantees they provide will be short lived. In the present, these programs for ensuring you're interacting with an actual person might be useful for single transactions that involve cash, but they're useless for ensuring you're not getting your web content scraped by a script because you don't need better than 10-20% accuracy when you have infinite retries. In the long term (which is 12-48 months probably) 80-99% accuracy will be easily achieved through study and improvements in hardware.

      In other words, If human eyes can read it someone can write software to parse it.
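
The "10-20% accuracy with infinite retries" argument above can be made concrete with a little arithmetic. The sketch below is in Python and assumes that every attempt is independent and that the site imposes no rate limit or lockout; both are assumptions for illustration, and the function names are hypothetical.

```python
import math


def success_within(accuracy, attempts):
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - accuracy) ** attempts


def attempts_needed(accuracy, target):
    """Smallest number of independent tries whose combined success
    probability reaches `target`."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - accuracy))


for accuracy in (0.10, 0.20):
    expected = 1.0 / accuracy  # mean tries until the first success
    needed = attempts_needed(accuracy, 0.99)
    print(f"accuracy {accuracy:.0%}: about {expected:.0f} tries on average, "
          f"{needed} tries for a 99% chance of getting through")
```

If those assumptions hold, a solver that is right only one time in ten still gets through in a few dozen requests per page. A site that counts failures per client would change the picture, which is why the independence and no-lockout assumptions matter.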

    24. Re:Sure they can! by ivan256 · · Score: 1

      Whoops. Submit instead of preview. Sorry about the broken sentences. I was supposed to fix those before posting. You get the idea though.

    25. Re:Sure they can! by Anonymous Coward · · Score: 0

      That doesn't mean that this tactic would work to keep a site with a different target demographic readable. If there is a strong correlation between "has Flash installed" and "is a valued customer" on one hand and "complains about our webdesign" and "only costs us money" on the other hand, they'll be glad to let you know how to remove their address from your bookmarks file. Do you think that avoiding critical thoughts goes well with being a good customer or are the informed and demanding customers more welcome?

    26. Re:Sure they can! by CaseyB · · Score: 2, Insightful
      The first one of those could be semi-easily defeated with a well written vision program.

      Unsubstantiated bullshit. And for every advance in smart OCR you come up with, I can come up with 10 obscuring transformations that leave it readable to humans but garbage to a computer.

      The second could be very easily defeated by a simple concept to image hash database.

      Yeah, you only have to model the recognition and indexing abilities of a human brain.

      The final test could simply be brute forced. Pick three buttons. Keep selecting those until they're right.

      You're ignorantly assuming that an implementation detail like radio buttons is core to the system.

      These proof of concepts show just the first step in writing a solid system.

      An obvious extension that I can think of would be to implement a whole slew of different types of these problems, and then an engine that outputs a given problem -- and the method for determining the solution -- all into a bitmap. Then you have to deal with not only whatever first-order recognition is specific to the problem, but also the higher-order job of interpreting the nature of the problem itself: e.g. a picture of a guy brushing his teeth, with accompanying text "what is this man doing" OR "what color is the man's shirt?". Good luck to your software.

    27. Re:Sure they can! by Anonymous Coward · · Score: 0

      well, what do you do about homestarrunner?

      can't live without that!!

    28. Re:Sure they can! by Anonymous Coward · · Score: 0

      No company is an island...well, maybe Microsoft, but that's it.

      Yeah, the Island of Dr. Moreau!

    29. Re:Sure they can! by Jeremy+Erwin · · Score: 1

      What do you know? The pix captcha finally accepts plurals. It still doesn't realize that infant == baby.

    30. Re:Sure they can! by ivan256 · · Score: 1
      Unsubstantiated bullshit. And for every advance in smart OCR you come up with, I can come up with 10 obscuring transformations that leave it readable to humans but garbage to a computer.

      Clearly you haven't read their paper.

      "There is no way to prove that a program cannot pass a test which a human can pass, since there is a known program - the human brain - which passes the test."

      The creators of CAPTCHA acknowledge that you can programmatically defeat their creations, but that it is hard to do with high accuracy. My assertions are as follows:

      • It is significantly easier to achieve a low degree of accuracy than a high degree of accuracy, and a low degree of accuracy is all that is necessary to successfully scrape web content.
      • As time goes by it will become easier to defeat any given CAPTCHA.
      • Eventually you will run out of ways to obfuscate information such that a human brain will be the only way to decipher it, because eventually we will be able to outdo our own brains with software. In fact Ray Kurzweil thinks we will do that in the next 40 years. I don't know if it will be that soon, but eventually you'll run out of ways to fool the software.

      I challenge you to come up with a credible argument to refute any of those points.
    31. Re:Sure they can! by antirename · · Score: 1

      That software must be in EARLY beta. Type in the answer in all caps and it tells you that you failed the test. Great job. Really, I'm impressed. And I never would have thought to type in "automovile" as an answer... car, cars, CAR, CARS, maybe. This kind of stuff is more frustrating than useful. Hell, even MS Office is smart enough to ignore case by default when you do a find :)
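
The complaints in this subthread about rejected answers (all caps, plurals, "infant" versus "baby") come down to how the typed answer is compared with the expected one. A minimal sketch in Python of a more forgiving comparison follows; the synonym table and the function names are hypothetical, and nothing here reflects how the pix test actually checks answers.

```python
# Hypothetical synonym groups; a real test would need a much larger table.
SYNONYMS = [
    {"car", "cars", "automobile", "automobiles"},
    {"baby", "babies", "infant", "infants"},
    {"bike", "bikes", "bicycle", "bicycles"},
]


def normalize(answer):
    """Trim surrounding whitespace and fold case."""
    return answer.strip().casefold()


def is_accepted(answer, expected):
    """True if `answer` equals `expected` or shares a synonym group with it."""
    given, wanted = normalize(answer), normalize(expected)
    if given == wanted:
        return True
    return any(given in group and wanted in group for group in SYNONYMS)


print(is_accepted("CARS", "car"))
print(is_accepted("infant", "baby"))
print(is_accepted("train", "bicycle"))
```

The trade-off is that every extra accepted spelling or synonym also widens the target for a script that is simply guessing, so a lenient matcher makes the test friendlier to humans and to bots at the same time.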

    32. Re:Sure they can! by Resseguie · · Score: 1
      I had the same problem. I consider myself a fairly intelligent individual and it took me a while (longer than I'm usually willing to wait on a website) to figure out the right answer. Several of the warped four-letter words were extremely difficult to read, distinguishing between a's and d's for example. With some of the letters with serifs it was especially difficult to determine if it was a warped serif or actually part of the letter. Etc. And I have no idea what a picture of some guy with crazy hats on his head has to do with "television".

      Someone else mentioned this earlier, but I also wonder about things like 508 compliance. There's no way I could get by with using something like this on a federal website. The idea of making it impossible to read by computer is mutually exclusive with making it readable by computer for special disability software.

    33. Re:Sure they can! by PetWolverine · · Score: 1

      Uh-oh...

      Computers may not be able to pass those tests yet, but they can obviously post to /.!

      Predicted new moderation category: -1, Computer-generated. I might have to start making typos to convince the mods I'm human!

      --
      I found the meaning of life the other day, but I had write-only access.
    34. Re:Sure they can! by Watts+Martin · · Score: 1

      The higher up people in companies that make all Flash sites don't seem to realize that Flash is annoying to a lot of people.

      People who disable Flash and refuse to patronize sites that require it don't seem to realize how small a minority they actually are. The Flash client is installed on 98% of the browsers in use (according to Macromedia's estimate last year), and most people don't disable it because they're not so bothered by it. I skip over Flash intros and most of the time don't even notice Flash interfaces, except to the degree that they're fluidly animated in ways that would be far more bandwidth-consuming. Flash also provides greater degree of cross-platform compliance than HTML and CSS do, as sad as that is to say--a Flash designer doesn't have to worry about how IE 6 on Windows and IE 5 on the Mac don't quite sync up, how Mozilla does things slightly differently, and how Netscape 4 is horribly, horribly broken with respect to standards. Flash just works.

    35. Re:Sure they can! by DoXaVG · · Score: 1

      Neat, I didn't know automobile was spelled with a 'v'!

      --Dox

      The possible words were:

      car cars automovile

    36. Re:Sure they can! by DoXaVG · · Score: 1

      And in fact a large number of corporations block Flash for security reasons. I for one work in an environment where anything inside the OBJECT tag is stripped and removed. This was done initially to remove ActiveX for security reasons, however in light of more recent security issues surrounding flash and other activex plugins it seems it was a good tactic.

      There are _way_ too many sites that use Flash and don't have other ways in.

      --Dox

    37. Re:Sure they can! by machine+of+god · · Score: 1

      I've written packages that defeat silly anti-automation scripts.

      So that's how CowboyNeal gets all those votes.

    38. Re:Sure they can! by Anonymous Coward · · Score: 0

      >It still doesn't realize that infant == baby.

      Or that bikes == bicycles. The only infant I saw was the pix project itself, talk about "early stages."

    39. Re:Sure they can! by Anonymous Coward · · Score: 0
      Hmm. I didn't have trouble with any of them. (Reload for different variants.)
      It may be that the tests go beyond a simple Turing test and also validate for a certain level of intelligence. I suppose that would be useful sometimes as well.

      "You must be _this_ smart to ride our web site."
      The problem is that if you're trying to test someone's aptitude, it's crucial that you have a good tester/proctor. The 'pix' project is a pretty cool idea, but it isn't anywhere near that level yet. I went through a bunch of the photo series and ran into several problems.

      The first series I encountered had 2 or 3 pictures of futuristic looking maglev trains, one picture of a woman eating what appeared to be a bowl of noodles, one picture of a horse-drawn carriage, and a picture of something else. OK, so 3 of the pictures were clearly about trains. But the woman eating chow mein? The carriage? Those have nothing to do with trains. I failed the test.

      The second series I saw had a similarly confusing array of images. Two of them depicted toothbrushes very clearly. A third showed a girl sitting down holding a bowl; I have to assume she was brushing her teeth and spitting into the bowl, the image was too small to tell. Another image showed what looked like it might have been a toothbrush factory, but again, I wasn't really able to make out what was in the photo. I answered toothbrushes and got it correct, but I wasn't confident in my answer.

      One series was clearly about bikes. Oops, make that bicycles. Bikes was not one of the potential answers, so even though I did make the connection between the images perfectly well, I failed the test. I was "_this_ smart" - but the test wasn't.

      This tech is cute to play with. It's got a damn long way to go before it's going to be useful, though.
    40. Re:Sure they can! by belrick · · Score: 1

      Don't you find it funny that when they give you the "monkey" picture it is really a picture of a chimpanzee. So an intelligent human will fail that test if they answer the question honestly.

    41. Re:Sure they can! by dpt · · Score: 1

      So ... you don't mind the gaping security holes in Flash clients then?

      I do, therefore I will not allow it. End of story. If I was to go back to being a sadmin, and take the large pay cut that implies, it would be summarily removed from all machines under my control.

    42. Re:Sure they can! by FireBreathingDog · · Score: 1
      Eventually you will run out of ways to obfuscate information such that a human brain will be the only way to decipher it, because eventually we will be able to outdo our own brains with software. In fact Ray Kurzweil thinks we will do that in the next 40 years. I don't know if it will be that soon, but eventually you'll run out of ways to fool the software.

      Kurzweil's book was a fascinating read--and he made many intriguing points--but in this case, he assumes that the barrier to emulating a brain today is the amount of computing power available to do it successfully. But, we don't know enough about the functioning of the brain now to model it successfully even if we had the processing capacity. Ultimately, I think it'll take quite a while longer than 40 years, because our understanding won't expand quickly enough to enable it sooner.

    43. Re:Sure they can! by ivan256 · · Score: 1

      He has newer writings if you go to his website. He talks about how the ability to scan the brain to better understand how it's working without negatively affecting the person being scanned is increasing exponentially, and that at approximately the same time as we acquire enough computing power to emulate the brain, we'll also have the ability to map every cell and interconnection, and record every firing of every neuron.

      This is, of course, assuming that the progress of technology continues at the current rate. As we know, past numbers are no guarantee of future performance. Either way, we'll figure it out eventually, whether it be in 40 years or 400...

    44. Re:Sure they can! by interiot · · Score: 1
      I'm just saying that HTML has many merits (eg. all the reasons it was created for) besides being screen-scrapable (which wasn't one of the reasons it was created) and those merits are ample or else it wouldn't be so popular right now. So while there may be individual content providers (who don't mind their content being only available on one kind of device with only a certain type of software) who won't use HTML, there's enough of a draw to HTML that it likely will be around for quite some time.



      Furthermore, it's possible that to make content device-independent like HTML is, the content MUST be screen-scrapable. That is, the software must understand exactly which bits are content and which bits are presentation, so it can ignore/mess with the presentation and leave the content alone, and just the part about "knowing exactly what's content" allows a screen scraper to work.

    45. Re:Sure they can! by mr_z_beeblebrox · · Score: 1

      Do you think that avoiding critical thoughts goes well with being a good customer or are the informed and demanding customers more welcome?

      That question goes very well with your first statement regarding demographics. We distribute gourmet food. All of our vendors want critical thought from us as we do from our customers. We want customers who plan what they do with our merchandise, who know and understand each part of purchasing. Food is a high margin item and very few customers "cost" money, and distributors rarely (if ever) move into the expense to maintain category (and when they do they are dropped like hot potatoes and frequently just die)

  3. Comment removed by account_deleted · · Score: 4, Interesting

    Comment removed based on user account deletion

  4. Re-read the article... by numbski · · Score: 5, Insightful

    So far as apps are concerned, again no.

    There's no law stating that we have to look at ads. Although I see the problem paying the bills, a flaw in a business model is not the problem of the application coder (namely: me, you, and most people reading this site).

    --

    Karma: Chameleon (mostly due to the fact that you come and go).

  5. enter the word in the image software by yppiz · · Score: 1

    Do you deal with word in the image tests without requiring the user to read the word? How?

    --Pat

  6. Maybe they can't but... by Pyromage · · Score: 1

    Maybe they can't dictate what you use to access their content, but they can dictate whether you get the content or not. Seriously: If they are getting no ad impressions, then they are getting no money. Poof, you're not getting the service any more.

    I don't know what the answer is, but seriously, abusing the intent of their services (which *IS* to generate ad revenue, after all) shall do little but get them to change or remove those services.

    Oh, and kudos for them for not just up and suing CPAN (which they have little grounds for, but we all know that proof is worth less than a cat fart in the U.S. legal system).

    1. Re:Maybe they can't but... by Anonymous Coward · · Score: 0

      So if I browse to their site, but cover the ads on the screen with Post-It (tm) notes, I'm stealing from them? What about accessing the site via lynx?

    2. Re:Maybe they can't but... by Pyromage · · Score: 1

      I see your point, and agree with you; in fact, I use links myself as I post this.

      But I never said you were stealing from them: What I did say was that you were abusing the intent of their services.

      The "spirit" of what they offer is information in exchange for ad impressions. They are trying to offer a service in exchange for you viewing ads.

      You may not be stealing from them, but you can't deny that 1) they intend for viewers of their information to also view the ads and 2) that you are abusing it when you violate their intentions.

      And it may not cost them money, but it's money they aren't making, and if they are not making any money, how do you propose they stay in business?

      I think it'd be great to be able to use these perl modules w/o putting them out of business, and I'd love to do so, but seriously, how do you propose to have this occur?

    3. Re:Maybe they can't but... by pla · · Score: 4, Insightful

      but they can dictate whether you get the content or not

      Yes, they can. They have the option of not putting it on a public webserver in the first place. Beyond that, they have no control over who sees it and how. They can use various technological measures to try to control access, but short of forcing some form of user authentication via a secure proprietary client, the ad-blockers and scrapers *WILL* win.


      If they are getting no ad impressions, then they are getting no money.

      This statement seems a common way of viewing these issues (Ad blocking, scraping, whatever). However, realize that they don't have a "right" to make money just because they offer otherwise-free content online. They offer that content in the *HOPE* of making money, but that comes with no guarantees. And yes, I go to the kitchen during commercials, or change the station, or fast-forward.


      I see the problem as involving how offensive these sites make the ads. I find Flash and Shockwave ads so offensive (and, I find that they often crash my browser - the huge offensive Flash ad currently on the Onion, for example, crashes my browser every time) that I simply browse with them disabled. Pop(up/under) ads bother me enough that I have the "dom.disable_open_during_load" preference set to completely block them. In comparison, the small, unintrusive text ad in the upper left of K5's front page doesn't bother me at all, and I've even *clicked* on it a few times.

      Companies (not just advertisers, but those who serve such ads) need to realize that more annoying ads do make an impression - a strongly negative one. If I want their products, *I'll* seek *them* out. If they detract from my web browsing experience, I will specifically make a point of seeking out their *competitors* if I need something they offer.


      In case any marketing folks read this, I'll mention the last ad I *DID* watch - The one with the hamster and rabbit from Blockbuster. Why? Because I found the ad sufficiently amusing to watch, on its own merits. Important point there. It didn't annoy me, and it had value all by itself. *THAT* makes a positive impression on a potential customer. I don't even know what the hamster and rabbit talked about, but it doesn't matter, I remember that "Blockbuster amused me for 30 seconds". Making me waste a few minutes to figure out how to filter out your crap does *not* make a good impression. I will remember "X10 pissed me off for 30 seconds, let's visit Logitech's cam offerings instead".

    4. Re:Maybe they can't but... by leonardluen · · Score: 1

      the parent makes some good points!

      any ad that i find that is either flash, pops up/under or just blinks too much has the host end up being added to my hosts file pointing to 0.0.0.0 for both my windows and linux machines.

      that means i never see any ads from that company again.
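The hosts-file trick described above can be sketched in a few lines. This is only an illustration, assuming the approach is one line per ad host mapped to 0.0.0.0; the function name `hosts_entries` and the example host names are hypothetical and not taken from the thread.

```python
def hosts_entries(ad_hosts, sink="0.0.0.0"):
    """Return hosts-file lines that point each ad host at `sink`."""
    seen = set()
    lines = []
    for host in ad_hosts:
        name = host.strip().lower()
        if name and name not in seen:  # skip blanks and duplicates
            seen.add(name)
            lines.append(f"{sink}\t{name}")
    return lines


if __name__ == "__main__":
    # Hypothetical ad servers, used only for illustration.
    for line in hosts_entries(["ads.example.com", "Banners.example.net", "ads.example.com"]):
        print(line)
```

The printed lines would be appended by hand to the hosts file (typically /etc/hosts on Linux); the sketch deliberately does not write to any system file itself.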

    5. Re:Maybe they can't but... by mobiGeek · · Score: 1
      I see the problem as involving how offensive these sites make the ads. I find Flash and Shockwave ads so offensive [...] that I simply browse with them disabled.
      Problem is that you are an advanced user. The average user [i.e. The Masses (tm)] doesn't know how to disable this stuff and can't be bothered to find out.

      And the mainstream manufacturers of the tools they use to access such offensive materials are not AT ALL persuaded to add blocking/fast-forwarding features...they are, after all, mostly owned or partnered with the Mass Media outfits themselves.

      This is the reason for the big push for "home console units". Xbox and PS2 aren't simply "rivals" to Nintendo. MS and the likes see the extreme control that these consoles give them in serving content to the user.

      Imagine a world where your email, browsing, games, television, movies, music, voice-mail, messaging, (and more!) are controlled by a single device.

      Now consider this device to be developed by MicroSonyTimeFoxVivendi Corp.

      EuroTV and Streetmap are trying to use legal tactics to tackle technical problems. Soon the Big Fish will use the ultimate technical tactic...they own the devices and they lend them to us...

      --

      ...Beware the IDEs of Microsoft...

    6. Re:Maybe they can't but... by Mr.+McD · · Score: 1

      Dood, that blockbuster Ad sucked. That kind of humor usually appeals to the "Grandma" crowd.

      It's nowhere near as amusing as "Terry Tate Office Linebacker" or the Diet Dr. Pepper "3" Ad.

    7. Re:Maybe they can't but... by pla · · Score: 1

      EuroTV and Streetmap are trying to use legal tactics to tackle technical problems. Soon the Big Fish will use the ultimate technical tactic...they own the devices and they lend them to us...

      I don't mean this as elitist or anything, but...

      Good.

      *Let* the media conglomerates serve all their precanned crap via a single totally controlled TV-like device, and get the hell off the internet... Perhaps we can go back to content for its own sake, rather than content to make a buck from an uncooperative audience and then pissing and moaning when people try to block out what they never wanted in the first place; back to relative obscurity wherein we didn't have laws passed specifically to stop the actions of a tiny minority of people because the mindless hordes discovered they could take advantage of those activities to commit minor crimes on a massive scale. Give me a distributed mesh network of open WAPs all run by private individuals, with no corporate presence whatsoever, and I'll say goodbye to mass media forever.

      Currently I watch exactly one half hour of TV per week - South Park. And even that has started getting stale. The only commercial web-sites I regularly visit (not counting sites like Slashdot, who have commercial backing but all their content comes from totally uncompensated contributors) include those that serve certain very specific types of information, such as Google, Weather.com, or Yahoo news. And if I had to pay a small fee to use those, I would do so (providing I could get *just* those sites and ad-free, rather than the way cable companies package channels so I have to pay $50/mo more just for sports channels that I have literally never watched and *STILL* have 15 minutes of ads per hour). Personally, I wouldn't even *have* cable, except it would cost me just as much to get broadband internet without the TV feed (five dollars difference, actually).

      Unfortunately, I don't see my idea of the optimal outcome of this as likely. The current trends seem to have us moving toward no such thing as a general-purpose PC, totally precanned media content with no control whatsoever over it, and "perfect" tracking of our media-using habits to make sure we get a legally mandated fair share of ads.

      So overall, I guess I agree with you. But it depresses the hell out of me.

    8. Re:Maybe they can't but... by Anonymous Coward · · Score: 0

      No no no.

      You've described how you felt about ads, and got pissed, and didn't get pissed. But you didn't *buy* anything.

      It drives me nuts but my friends sometimes buy stuff from annoying ads. And when it's junk and doesn't work (a cell phone antenna that sticks inside the battery compartment, and connects to -- drumroll -- *nothing*, for example), they say "Oh well, it was only $10." They don't even get pissed off about being ripped off, let alone by the annoying ad!

      It only takes 1 buyer in 1000 (or 5000) to make the business case for the pain-in-the-ass ad. A hundred geeks turning themselves into single-person focus groups to explore their feelings about a marketing approach doesn't overcome one person willing to give up their cash.

      BTW, as for annoying, how about those people who tape and thumbtack shit to your mailbox because it's cheaper than buying stamps? Arrrgh!

    9. Re:Maybe they can't but... by Anonymous Coward · · Score: 0

      BTW, as for annoying, how about those people who tape and thumbtack shit to your mailbox because it's cheaper than buying stamps? Arrrgh!

      That actually commits a felony, at least in the US, as does just leaving flyers in your mailbox - ANY use of a mailbox other than for sending or receiving mail *directly* through the US postal service commits a crime.

      Take the flyers or whatever to your local post office, and file a formal complaint.

    10. Re:Maybe they can't but... by KalvinB · · Score: 1

      Ads don't work. Any site that tries to make money off of ads is going to be out of business. However that doesn't justify circumventing ads just because you don't like them.

      In exchange for their service, you watch ads. Simple as that. TV ads pay the network whether you actively watch or not. On-line it doesn't work that way. If you like their site, help support it passively or you'll be forced to help cover costs actively.

      Personally I've found memberships work a whole lot better at bringing in money for my site. In 9 months I never made a dime even after 65,000+ impressions. With memberships I started making money the first day.

      What will probably happen with these sites is that ads will be fed to you in an even more annoying way since you couldn't deal with the current method.

      It costs money to run a quality site. Get used to helping offset the cost in some way. You have as much a right to an ad free consumer experience as companies have to shove ads down your throat.

      Don't like it, don't visit. They'll be better off without you. I had no qualms about cutting off the leeches at my site by requiring a membership for certain areas and my site has been all the better because of it. Better to not have a visitor and no money than no money and a visitor sucking up resources.

      Ben

    11. Re:Maybe they can't but... by Anonymous Coward · · Score: 0
      Companies (not just advertisers, but those who serve such ads) need to realize that more annoying ads do make an impression - a strongly negative one.


      The same could be said for telemarketing. The ugly truth is that some people do buy, annoyance be damned. Sadly, the situation seems similar with web ads. Making them more annoying does seem to increase their effectiveness. It's just a tiny fraction of users, and everyone has to suffer because those few losers click on the annoying ad and sometimes make a purchase. Maybe they're the same bastards that buy from telemarketers. Whoever they are, they're indirectly causing everyone else to endure annoying advertising.

    12. Re:Maybe they can't but... by mobiGeek · · Score: 1
      I don't mean this as elitist or anything, but...

      Problem as I see it is...you won't be able to be elitist. They want to control how, when and what people do on the 'net. They don't care about who, because they expect that to be everyone.

      If the conglomerates just happen to control the entry points to the 'net as well (ISPs, cable, wireless, teleco, etc.) then just how do we expect to be (or remain) elitist?

      I want to remain elitist...but give the (mainly) U.S. corporations the leeway, and they'll quash elitism as we know it. General computing devices are, after all, potential tools for terrorism/hacking/spying/media-hype-word-of-the-month.

      --

      ...Beware the IDEs of Microsoft...

  7. It's their server... by Just+Some+Guy · · Score: 2, Insightful
    ...but the limit of their sphere of influence should be strictly limited to their users, and not the author of software that those users may use to retrieve content from the site.

    Put another way: particularly on a subscription site, the site owners may specify whatever stupid terms and conditions that their subscribers are willing to submit to. That does not mean, though, that the client software is obligated to know whether or not the software itself meets the TOS (nor can I be made to believe that this is possible).

    --
    Dewey, what part of this looks like authorities should be involved?
  8. TerraServer by Corrupt+System · · Score: 3, Interesting

    I can understand how site owners could have a problem with a commercial software product like ExpertGPS wasting their bandwidth while skipping ads. ExpertGPS costs $59.95, but downloads maps from Microsoft's TerraServer without going through its web interface and viewing its advertising. Microsoft hasn't blocked access from these programs yet, but what if they do? All the paying users of ExpertGPS would be out of this functionality.

    --
    The solution that has worked best for me...is to avoid public discussion. -- CmdrTaco
    1. Re:TerraServer by topografix · · Score: 2, Interesting

      TerraServer explicitly allows access to their USGS map database from programs like ExpertGPS. They even have a webpage with step-by-step instructions on how to do it.

      ExpertGPS could just as easily grab its maps from sites like TopoZone and deprive them of ad revenue. Other programs have actually done that, and caused the nice guys at TopoZone a lot of hassle and lost revenue. The guys at Geocaching.com spend lots of time dealing with database scrapers who mine the site continually, chewing up bandwidth.

      The moral of the story - play nicely. If a website like TerraServer is generous enough to offer you a way to scrape their data, say thank you. If a website asks that you refrain from using automated scripts, either work out a licensing agreement with them, or start your own website and learn how it feels to be on the other end of the scraper.

  9. Don't they already??? by tacocat · · Score: 5, Interesting

    I am constantly greeted with messages to the tone of:

    You must have Windows Internet Explorer 4 or higher installed on your system to view this website

    How is this any different from what they are attempting to do here?

    I hate to disappoint, but I don't think that this is a new precedent. What is a new precedent is the notion that they can request the removal, or to make unavailable, software that is otherwise available.

    The precedent here is not the software usage to access a website, but the notion that this can be extended to:

    Dear Mozilla.org,

    It has come to our attention that people are using your software to access our website. We don't like this and are sending our legal team over to discuss the removal of your software application from the internet.

    Similarly, we are contacting Netscape, AOL, Opera, Konqueror, et al and removing them as well.

    Have a nice day!

    1. Re:Don't they already??? by catch23 · · Score: 1

      Yeah definitely. In the mailing list, it seems like Johan Van den Rande (author of the EuroTV stuff) is the one getting sued when really the ones causing the abuse are from the MisterHouse home automation list. I'm pretty sure if they brought the suit to court, the case would be dismissed. It's like suing all the computer manufacturers who designed the computer to possibly be "insecure".

    2. Re:Don't they already??? by Spunk · · Score: 2, Funny

      You are aware that AOL, Netscape, and Mozilla are the same people, right?

      (more or less)

    3. Re:Don't they already??? by SubliminalLove · · Score: 1

      They may very well say in their webpage that a particular version of IE is required, but they can't *enforce* it. If my copy of Netscape or the web browser I programmed myself is capable of reading the same web protocols as IE, I'm certainly free to do so. That message merely indicates that the code used on the page includes commands that older web browsers won't understand. It's not a legal threat.

  10. Where have I heard it before? by LinuxMacWin · · Score: 0, Flamebait

    Is it just like MSN wants you to use MS IE only?

    http://slashdot.org/article.pl?sid=03/02/06/1645229&mode=nested&tid=109

    1. Re:Where have I heard it before? by LinuxMacWin · · Score: 1

      Or is it just like sites don't want you to deep link...

      http://yro.slashdot.org/article.pl?sid=03/01/08/2223214&mode=nested

    2. Re:Where have I heard it before? by LinuxMacWin · · Score: 1

      Or is it like the RIAA which believes you can listen to a music CD, but not copy, not play it loud so someone can hear, not transform the media for a backup copy .....

      http://slashdot.org/article.pl?sid=03/01/26/2317253&mode=nested&tid=99&tid=123

      Sorry the link is not perfect, but close enough...

    3. Re:Where have I heard it before? by cmallinson · · Score: 1
      Is it just like MSN wants you to use MS IE only?

      Is that even an issue? I thought that the only people who ever went to the msn website did so because it's the default start page for IE. I bet that people who don't know how, or haven't bothered to change their homepage account for 95% of the traffic to MSN.

    4. Re:Where have I heard it before? by budgenator · · Score: 1

      I just checked, MSN.com seems to work just fine in opera. I've had a HotMail account that works just fine in netscape under linux. I guess wants and requires are two separate things.

      --
      Apocalypse Cancelled, Sorry, No Ticket Refunds
  11. If you don't want window shoppers... by Eese · · Score: 5, Insightful

    ... don't put merchandise in the windows.

    Just like you can listen to unencrypted radio broadcasts through the airwaves as much as you want, or stand next to a group of people talking and listen in, you can view web pages that are served openly over the Internet.

    If you are going to be presenting something for people to observe, they can observe it however they like. Legislate all you want, but this is a fundamental component of logical (as opposed to legal) privacy.

  12. Why not? by JazzyJ · · Score: 5, Insightful

    There are a multitude of methods for providing different content based on what the client browser returns on certain environment variables. While I think it's silly to demand that modules be removed from CPAN, it's entirely up to the people running the server to determine who they want to serve content to....and who they don't.

    If they can't figure out how to do it serverside (or with clientside scripting) then that's their problem.

    That's the bitch about open standards....EVERYONE can use them.... :)

    1. Re:Why not? by collapser · · Score: 1

      haven't these people heard of subscription services? that at least would enable them to know some of the clients they are distributing to, and maybe even make things a little more tailored to the clients' requirements.

      And then, I thought that copyright laws dealt with this already (Thou Shalt Not Publically Publish This Elsewhere Without Ye Express Permission etc).
      it's dumb.

      --
      <B>note to self:</B> <I>post as html</I>
    2. Re:Why not? by Gadzinka · · Score: 1
      That's the bitch about open standards... EVERYONE can use them...

      ...your neighbour too ;)

      Robert

      --
      Bastard Operator From 193.219.28.162
    3. Re:Why not? by Tokerat · · Score: 2, Interesting


      Same goes for the deep-link fanatics. Create a 0px wide frame (basically invisible) that encompasses the entire browser window content area and then load pages in there, on server side checking the HTTP_REFERER and on the client side, using JavaScript to ensure the documents are loaded inside the proper frame (which could have a static name or one that is dynamically allocated to each session, even). Make it run over SSL so no one can "steal" those URLs "in transit".

      Is it really just easier to sue everyone than to pay a grungy guy in a t-shirt like me to set up your server to do this?

      Ahh, I get it, it's the return you make on the "investment" in your lawyer.

      --
      CAn'T CompreHend SARcaSm?
    4. Re:Why not? by Lord+Ender · · Score: 1

      You are wrong. Client-side scripting and server side browser detection do not work because they make the error of trusting the client software. With some browsers, like Opera, you can have it identify itself as any web browser.

      --
      A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
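The point about not trusting the client can be shown with a minimal sketch: the User-Agent header is just a string the client chooses. The helper name `build_request`, the URL and the agent string below are illustrative assumptions; the request is only built, never sent.

```python
import urllib.request


def build_request(url, claimed_agent):
    """Build (but do not send) a GET request that claims to be `claimed_agent`."""
    return urllib.request.Request(url, headers={"User-Agent": claimed_agent})


if __name__ == "__main__":
    # Placeholder URL and an Internet Explorer style agent string, for illustration only.
    req = build_request(
        "http://www.example.com/",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    )
    print(req.get_method(), req.full_url)
    # urllib stores header names capitalized as "User-agent".
    print(req.get_header("User-agent"))
```

A server that branches on this header sees whatever the script decided to claim, which is why browser detection alone cannot keep a scraper out.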
  13. Comment removed by account_deleted · · Score: 1

    Comment removed based on user account deletion

  14. Screen Scraping by barnsleyBigUn · · Score: 1

    How do EuroTV get their television schedules?

    Subscribe to some kind of schedule syndication service? Pay for some student to type them in? Or retrieve information from the broadcasters websites?

    Wouldn't it be ironic if they scraped the (e.g.) BBC's Web Site to retrieve schedule information for Beeb 1, 2, ...

  15. News Clipper by Coppit · · Score: 1
    Funny, you'd think that I would have heard something about my News Clipper open source program. It lets people fetch anything using "handlers". The usenet handler, for example, inserts links into a webpage for usenet articles. There are noninfringing uses of such software, under most fair use laws. My stock answer to the copyright question was "It's your responsibility: you have to get syndication rights, or stay under the terms of fair use".

    By the way, the easiest way to defeat WWW::EuroTV is to simply change your formatting every few days. The author will go crazy trying to keep up. :)
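Why a formatting change hurts a scraper can be sketched as follows. The markup, class names and pattern here are invented for illustration and have nothing to do with the real EuroTV pages or the WWW::EuroTV module; the sketch only assumes a scraper that matches one exact layout.

```python
import re

# A scraper tied to one exact, hypothetical markup layout.
ROW = re.compile(r'<td class="time">([^<]+)</td><td class="title">([^<]+)</td>')


def scrape_listings(html):
    """Return (time, title) pairs found by the layout-specific pattern."""
    return ROW.findall(html)


OLD_PAGE = '<tr><td class="time">20:00</td><td class="title">News</td></tr>'
# The same content after the site renames its classes and switches to <div>s.
NEW_PAGE = '<div class="t">20:00</div><div class="prog">News</div>'

if __name__ == "__main__":
    print(scrape_listings(OLD_PAGE))  # the listing is found
    print(scrape_listings(NEW_PAGE))  # nothing matches any more
```

A human reader sees the same schedule on both pages, while the scraper silently returns nothing until its author rewrites the pattern.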

    1. Re:News Clipper by Anonymous Coward · · Score: 0

      "By the way, the easiest way to defeat WWW::EuroTV is to simply change your formatting every few days. The author will go crazy trying to keep up. :)" and so will their webmaster!

  16. WGET? by LT4Ryan · · Score: 1

    I know a lot of sites don't like the use of WGET to 'access web resources'...so what makes this screen scraping technology any different?

  17. From the inside. by Anonymous Coward · · Score: 0

    From the perspective of a webmaster of a large commercial site that gets regularly scraped by users using libwwwperl and other various perl packages for its content...

    We don't care how you look at our site, but we do, to the best of our ability, monitor you closely. We only care if you republish our stuff. You may look at it any way you wish, but just don't make it available to others. It's in our notice on every page.

    Screen scrape away...

  18. Doesn't our favorite company already do that? by FroMan · · Score: 1, Offtopic

    Microsoft Sends Broken Stylesheets to Opera

    Not exactly enforcing you through law, but definitely through little "accidents" like above.

    --
    Norris/Palin 2012
    Fact: We deserve leaders who can kick your ass and field dress your carcass.
  19. Learn from Google by shiflett · · Score: 4, Insightful

    They should do as many of us do and learn a lesson from Google.

    It is a violation of Google's terms of use for you to "screen scrape" search results. You can implement their API using a free key and achieve similar results, however.

    Not only are these companies approaching the "problem" from the wrong angle in terms of common sense, they are also taking the most difficult approach. It is practically impossible to seek to outlaw software that fetches Web content, because Web browsers and wget (for example) are the same thing, HTTP clients. The HTTP protocol is an open standard that anyone can implement. If you don't want a valid HTTP client accessing your server, don't make your server an HTTP server.

    Stated another way, don't try to take an open standard and restrict everyone else's use of it to suit your own needs. You don't see me (an avid soccer player) trying to get the NBA to change the rules of their game to require use of the feet for ball control. If I want to play basketball, I have to play by the rules, else I am not really playing basketball.

  20. HTTP GET is an authorization by bwt · · Score: 5, Insightful

    This is just another example of gross technical incompetence by executives and lawyers.

    A company that attaches an HTTP server receives an HTTP GET request complete with some information in its headers. They have a reasonable case to request that that information be accurate. They have unilateral technical ability to firewall IPs or whole subnets. Otherwise, once they receive a GET request, when the machine that they have configured responds by sending a file, they have granted explicit permission to process that file consistent with the info in the GET request.

    The owner of the server is completely in control at a technical level. If they don't like what you are doing, they can firewall you. Absent a contractual agreement not to, you have the permission to send ***REQUESTS*** for anything you would like to request. They can say no. If you lie in your request, then they have a case to say your use is unauthorized, but short of that, there should be no need to have the judicial system rewrite the technology.
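The "they can say no" argument can be sketched as a server-side decision. This is a simplified illustration, not how any particular firewall or web server is configured; the block list uses reserved documentation addresses and the name `status_for` is hypothetical.

```python
import ipaddress

# Hypothetical block list; both ranges are reserved documentation addresses.
BLOCKED = [
    ipaddress.ip_network("192.0.2.0/24"),     # a whole subnet
    ipaddress.ip_network("198.51.100.7/32"),  # a single address
]


def status_for(client_ip, blocked=BLOCKED):
    """Return the HTTP status the server would answer this client with."""
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in blocked):
        return 403  # the server says no
    return 200      # the server chooses to send the file


if __name__ == "__main__":
    for ip in ("192.0.2.55", "198.51.100.7", "203.0.113.5"):
        print(ip, status_for(ip))
```

Either way the decision is made on the server, before any content leaves it, which is the control the comment says the owner already has.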

    1. Re:HTTP GET is an authorization by bwalling · · Score: 1

      If you lie in your request, then they have a case to say your use is unauthorized, but short of that, there should be no need to have the judicial system rewrite the technology.

      I don't have to lie. I do my programming in VB on Windows (save the comments). I have written several programs for scraping, and all of them simply automate Internet Explorer. I identify as an acceptable user agent, and can accept cookies and JavaScript redirections. It works out well. The only thing you might notice is a spike in the number of requests from one location.

    2. Re:HTTP GET is an authorization by errxn · · Score: 2, Insightful

      Here's an analogy of sorts:

      You leave your house unlocked. Someone walks in the front door and steals your TV.

      According to our laws, just because you left your house unlocked (giving the outside world access) does not give the person who stole your TV a legal right to do so. They still committed a crime.

      Now, where this analogy might fall flat on its face is the idea that when you make a GET request, and the party on the other end responds by sending you a stream of data, have they just performed the equivalent of giving you the TV after you walked into their house? They can't very well say that you stole it if they willingly gave it to you.

      --
      In Soviet Russia, Chuck Norris will still kick your ass.
    3. Re:HTTP GET is an authorization by innerlimit · · Score: 1

      i was just testing the new xml capabilities in vb to start scraping eurotv & nedstat... funny really... i was thinking "would they get pissed if i used it on my site?" seems i got my answer :-)

    4. Re:HTTP GET is an authorization by dgoodman · · Score: 1

      No, his analogy is more like knocking at the door. You knock at my door (GET request), stating that you'd like entry to collect my TV. Now, normally, I don't have a problem with my neighbors coming in and taking my TV, so normally I just haul it to the door and hand it over. Sometimes, though, some stranger wears a mask that makes him look like my neighbors, and so gets my TV that way, but most of the time I really don't care anyway 'cause most people will watch the ads on the TV anyway (which is my source of income).

    5. Re:HTTP GET is an authorization by errxn · · Score: 1

      1. Yes, I like your "knock at the door" analogy better than my "come into my unlocked house" analogy. Someone does have to "answer the door", so to speak.

      2. The stranger won't necessarily have to sit through the ads, since he is most likely parsing and filtering what he's "watching". At the very least, I doubt that he'd pass on the ads to his "clients". Maybe, but I doubt it.

      3. I truly wish that this analogy were closer to reality, because if it were, I could go through the request logs and track down the whereabouts of the sorry little bastards who broke into my car last week and stole my stereo. I could then go to their address and initiate a "Denial of the Ability to Walk Correctly Again" attack. Ah, well....

      --
      In Soviet Russia, Chuck Norris will still kick your ass.
    6. Re:HTTP GET is an authorization by Anonymous Coward · · Score: 0

      I think the analogy is more like :

      1) you knock on their door
      2) they answer the door
      3) you ask for the TV
      4) they give you the TV

      Note complete absence of guns...

      Sounds perfectly legal to me.

    7. Re:HTTP GET is an authorization by bwt · · Score: 1

      If you left your door unlocked, then he can only be charged with stealing your TV and not with breaking-and-entering.

      However, if somebody knocks on your door and asks you if they can have your TV and you give it to them, then you no longer own a TV. If you make a contract with them, then they would have to abide by it. HTTP supplies no way to constrain the request with a contract, although robots.txt might be sufficiently standard to override this.

      I would have no problem with a lawsuit that said "X sued Y for violating the robots.txt policy". In fact, I think that might be a good thing.
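
      A minimal Python sketch of what honoring a robots.txt policy could look like on the client side. The robots.txt contents, the user-agent string, and the example.com URLs are hypothetical and chosen only for illustration; the only thing assumed to exist is the standard library's urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; a real client would fetch the site's own file.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /listings/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    agent = "ExampleScraper/0.1"  # hypothetical user-agent string
    for url in ("http://example.com/listings/today", "http://example.com/about"):
        print(url, may_fetch(EXAMPLE_ROBOTS_TXT, agent, url))
```

      A scraping module that ran a check like this before every request would at least be following the site's published policy, which is the behavior such a lawsuit would be testing.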

    8. Re:HTTP GET is an authorization by Anonymous Coward · · Score: 0

      it's much simpler than that. if you check the referrer part of the http request, and it's not from your site, simply redirect to the home page.

      this comes up over and over...

      it's all explained in the mod_rewrite section of the apache config manual
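
      The comment points to mod_rewrite for this; what follows is not Apache configuration, only a rough Python sketch of the same decision. The function name, the host name, and the choice to treat a missing Referer as off-site are all assumptions made for illustration. The Referer header is supplied by the client, so a check like this deters casual reuse rather than a determined scraper.

```python
from typing import Optional
from urllib.parse import urlparse


def redirect_target(referer: Optional[str], own_host: str, home_page: str = "/") -> Optional[str]:
    """Return the URL to redirect to, or None if the request can be served as-is.

    Assumption: a missing or empty Referer header is treated as off-site.
    """
    if not referer:
        return home_page
    # urlparse().hostname is already lower-cased; it is None for malformed values.
    referer_host = urlparse(referer).hostname
    if referer_host != own_host.lower():
        return home_page
    return None


if __name__ == "__main__":
    site = "www.example.com"  # hypothetical site host
    print(redirect_target("http://www.example.com/listings", site))
    print(redirect_target("http://other.example.net/page", site))
    print(redirect_target(None, site))
```

      Treating an empty Referer as off-site also bounces visitors who type a deep URL or use a bookmark, so a real site might prefer to let those through.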

    9. Re:HTTP GET is an authorization by bwt · · Score: 1


      Either way works. The point is that the technological methods and standards exist and are completely adequate to protect the web site's interests.

    10. Re:HTTP GET is an authorization by LagDemon · · Score: 1

      I think a better analogy would be if you were at home, and someone came to your house, knocked on the door, and said "Hi, I'm Joe. Can i have your T.V.?" and you give it to them.

      --


      Beware of he who would deny you access to information, for in his heart he dreams himself your master.
    11. Re:HTTP GET is an authorization by kingosric · · Score: 1

      Here's an analogy of sorts:

      And here's another. Someone calls you up and asks for your credit card number, expiry date and mother's maiden name. Do you

      • Leave the information on your outgoing answerphone message, so you aren't bothered by all the people calling and asking for information
      • Ask each caller (at least) who they are and (maybe) why they want the info.
      • Unplug the phone
  21. Removed? by Karamchand · · Score: 1

    So, were these modules really removed from CPAN or did the CPAN admin withstand the pressure?

    1. Re:Removed? by spacefight · · Score: 1

      Go, have a look at CPAN for the EuroTV stuff.

  22. Yah, so? by mosch · · Score: 1
    There have been two cases recently where websites have requested the removal of modules from CPAN.
    Is there anybody who didn't see this coming?

    The companies spend time and money making websites that are designed to help further their corporate goals, cross-promote products and services, and possibly act as a vehicle for third-party advertising. If somebody is making a product which is designed specifically to circumvent the reasons why you're providing the website, then of course you should ask them to stop.

    Here's a news flash: TV Guide will eventually stop giving you free screen-scraped guide data. The map sites will stop giving you screen-scraped maps. And so on, and so forth.

    If you want to do this on your own, nobody will stop you, but if you make it simple for thousands of people to use a company's resources while providing no benefit to that organization, you should expect that they'll ask you to stop.

    1. Re:Yah, so? by Arthur+Dent · · Score: 1
      Here's a news flash: TV Guide will eventually stop giving you free screen-scraped guide data. The map sites will stop giving you screen-scraped maps. And so on, and so forth.

      If you want to do this on your own, nobody will stop you, but if you make it simple for thousands of people to use a company's resources while providing no benefit to that organization, you should expect that they'll ask you to stop.

      I don't get it. Do you mean to say that it's ok for Microsoft and Mozilla to make it easy for thousands of people to go to eurotv.com and streetmap.co.uk and use the company's resources while providing no benefit to those organizations, but you cannot write and release a perl module that does the same thing?

    2. Re:Yah, so? by Anonymous Coward · · Score: 0

      But the question is, are you under any legal obligation to comply?

  23. Sure they can by slippy51 · · Score: 1

    And if they don't let me use what I want; I will just take my business elsewhere.

    Simple as that.

    1. Re:Sure they can by Anonymous Coward · · Score: 0

      Yeah! I'll go spend $0 at another company instead. That'll show em!!!

  24. Dangerous Precedent by EnglishTim · · Score: 4, Insightful

    I find it sad that so many people seem to think it is just fine to mine their site for data. Sure, there's not all that much that they can do about it, except remove the data or make it harder for regular users of the site to use it.

    For example, The EuroTV site seems to work on the concept that they provide the information for free for users of their site, but you can pay them to get it on your site. They're using their site as an advert for their services, while at the same time offering a useful service to the community. By making freely available a system to allow anybody to use their data in their own websites without paying them for it, you're completely ridding them of their reason for having the site up at all.

    Yes, you can argue that they shouldn't put the information out there if they don't want people to use it, but then you're giving them a good reason not to put the information out there at all, which makes all of us poorer.

    As for whether they can dictate that CPAN remove the modules, certainly it's fair enough of them to request that the module be removed, but it is a shame they leapt to threats of lawsuits quite so quickly.

    1. Re:Dangerous Precedent by Anonymous Coward · · Score: 0

      Isn't it a simple question of economics? If they put up that information on the web, and they lose money doing that, then maybe they should switch to a subscription model. And then we all pay for the information if we want to have it.

      However, I don't think this is bad. In this particular case maybe, but in general I think paying for the things you get is a good principle. It means I don't have to pay ridiculous amounts of money for my printer cartridges so that the printer itself can be cheaper. It means I don't have to pay loads of money for a console game just so that the hardware can be sold for a price that's lower than it's worth.

      And anyway, whether we like it or not, the free market economy seems to work pretty well to counter this unbalanced situation. That's why we have XBox Linux (after all, what would be the point if an XBox was more expensive than an equivalent PC?), warezed games and cheap noname ink cartridges. It's also why all these companies want those draconian laws they keep trying to push through Congress: the only way to keep the business model alive is to kill the forces of the free market, and the only way to do that is through restrictive laws.

      Lourens

  25. Copyrights vs. Fair Use by prgrmr · · Score: 2, Insightful

    If content is obtained in a manner that is not in violation of copyright, the next question is that of fair use. It didn't sound from the article that either module author intended or enabled anything explicitly unfair for using the data. If the website owners in question were objecting strictly to the method with which their web data was being accessed, their argument holds no water.

    This is somewhat similar to the "what constitutes a license" argument regarding database licenses, the contention being a warm body vs. a connection. In the case of these perl modules, just because there's not a warm body explicitly directing the access of the data should not automatically qualify that access as a breach of copyright.

    It would be worth the effort to question both of the website owners as to what exactly did they consider the breach of copyright to be? My guess is that neither of them will be willing or able to express their concerns with enough technical detail or legal specificity to present a valid explanation.

    1. Re:Copyrights vs. Fair Use by Anonymous Coward · · Score: 0
      "If content is obtained in a manner that is not in violation of copyright, the next question is that of fair use."

      If the content is obtained in a manner that is not in violation of copyright, then *fair use* doesn't even enter into the equation.

    2. Re:Copyrights vs. Fair Use by civilizedINTENSITY · · Score: 1

      Mod the AC up: "If the content is obtained in a manner that is not in violation of copyright, then *fair use* doesn't even enter into the equation." Since data can't be copyrighted, we don't need to invoke fair use.

    3. Re:Copyrights vs. Fair Use by prgrmr · · Score: 1

      I wrote "content" as opposed to "data" because the concept ought to apply to anything retrieved from a web site. True, the data is not copyrightable, but presentation, analysis, and comment on that data is. Given that the copyright holder has sole rights to derivative works, fair use is most definitely an issue, and something too many slashdotters are all too quick to ignore, IMO.

  26. Where do they draw the line? by raju1kabir · · Score: 1

    Where is the boundary between acceptable viewing and unacceptable viewing of content they are making publicly available?

    What if I have my display resolution set differently to the web designer's?

    What if I use Netscape instead of IE?

    What if I use a black & white screen?

    What if I surf the site with image loading turned off?

    What if I wear dark glasses with holes cut through so I can only see the content and not the ads?

    What if I use a text-only browser?

    What if I use a screen reader?

    What if I use a hypothetical browser that summarizes paragraphs to the first few lines?

    What if I use a browser that "collapses" paragraphs on the first few lines and lets me click on an arrow to reveal the entire paragraph?

    There's a continuum between displaying the content exactly as they envisioned it, and reducing or distilling it to some other form. Other than sitting at the same computer used by the web designer, anyone who visits a web site is somewhere along that continuum.

    Unless they can define a specific point beyond which viewing is objectionable to the publisher - and I don't think they can - then I don't see how this case could get anywhere. The person making the software can keep backing up slightly until the plaintiff's position is absurd.

    Furthermore, the tool in question is not a commercial product, the developer is not trafficking in the publisher's data, and, as a practical matter, it's easy enough to rewrite it so that the web server can't tell the difference anyway.

    This is just another case of someone with a lousy business model trying to fix their problems with a lawyer instead of a good solid application of common sense (CueCat anyone?).

    --
    "Patriotism is your conviction that this country is superior to all other countries because you were born in it." -- GBS
  27. Use the "Wink Wink, Nudge Nudge" module disclaimer by karlandtanya · · Score: 1
    You know, like the ones for turning on certain "licensed" features of freetype, using decss (keep it on topic, folks), gif libraries, and patching your kernel for crypto.

    Basically: "Here is the code. Here is what it does. If it's illegal for you to do that where you live, then don't break the law."

    Why should a developer be denied the right to publish code that "could be used to do something that may be illegal under certain circumstances"? Hey, I know--I'll build a security system to protect against 19th century threats, and then sic my lawyers on anybody who invents a technology that might circumvent my security.

    I have a pair of bolt-cutters in my garage. Ace hardware was happy to sell it to me. I don't think the hardware store owner should bow to pressure from the U-Stor-It down the street who might say: "Hey, people use those things to break into our storage facilities."

    OTOH, if this actually gets to court and holds up, then I will create a website and copyright some work on that site. Perhaps a scan of an artistic display of one of my fingers. The license will say "You may not view any material on this website"

    Then I will tell anyone who produces tools that allow this sort of copyright violation (web browsers) to take place that they must stop!

    Hmmmm...Who should I start with?? MWAAAHAHAHAHA

    --
    "Reality is that which, when you stop believing in it, doesn't go away." - Philip K. Dick
  28. The future of the web by KjetilK · · Score: 4, Interesting
    The web was never intended to be a browser-only environment. From the start, it was intended to be a medium that would be useful for a wide variety of user agents, crawling for info and presenting compiled and digested information to the user.

    This was never realized, I believe mostly because of overpaid "web designers".

    But the Semantic Web would require many funny user agents for all kinds of things.

    Clearly, if this kind of thinking is allowed to persist in corporate headquarters, it will kill the Semantic Web before it gets started.

    I wonder what Tim Berners-Lee thinks about this...

    --
    Employee of Inrupt, Project Release Manager and Community Manager for Solid
  29. Content is important by binaryDigit · · Score: 4, Interesting

    One of the biggest sites that I've not seen anyone mention is eBay. Following is in their eula:

    Our Web site contains robot exclusion headers and you agree that you will not use any robot, spider, other automatic device, or manual process to monitor or copy our Web pages or the content contained herein without our prior expressed written permission.

    You agree that you will not use any device, software or routine to bypass our robot exclusion headers, or to interfere or attempt to interfere with the proper working of the eBay site or any activities conducted on our site.

    You agree that you will not take any action that imposes an unreasonable or disproportionately large load on our infrastructure.

    Much of the information on our site is updated on a real time basis and is proprietary or is licensed to eBay by our users or third parties. You agree that you will not copy, reproduce, alter, modify, create derivative works, or publicly display any content (except for Your Information) from our Web site without the prior expressed written permission of eBay or the appropriate third party.


    Now why they do this is obvious, they have an absolute goldmine of information and they want to be able to take advantage of it when they're good and ready. I assume other sites could adopt this type of eula, which wouldn't make the software itself illegal, but would make using it so (or at least until someone challenges it).

    1. Re:Content is important by anaradad · · Score: 5, Insightful

      The eBay EULA only applies if you actually register for their service. If you have never signed up for eBay, you have never signed off on their EULA.

    2. Re:Content is important by binaryDigit · · Score: 2, Insightful

      From ebay again:

      Welcome to the User Agreement for eBay Inc. The following describes the terms on which eBay offers you access to our services.

      This agreement describes the terms and conditions applicable to your use of our services available under the domain and sub-domains of www.ebay.com (including half.ebay.com, ebaystores.com) and the general principles for the Web sites of our subsidiaries and international affiliates. If you do not agree to be bound by the terms and conditions of this agreement, please do not use or access our services.


      Notice that it doesn't say anything about registering, it says "using their service", which could be interpreted as also browsing, since that is a "feature" offered by their website. Registering simply brings into effect other parts of the eula that are applicable to those actions. If nothing else, the contents of the site are still copyrighted, so even if you didn't agree to their eula, you still couldn't do anything with the content.

    3. Re:Content is important by steve_l · · Score: 1

      IMDB's robots.txt file has a no robots most places policy to keep server load down, but the file also talks about how to get the raw data if you really want to, which is a good compromise.

      The file also appends the User-Agent field of the browser at the bottom, which shows that even that .txt file is probably served by a few lines of Perl...

    4. Re:Content is important by Lt+Razak · · Score: 1
      ....but the file also talks about how to get the raw data if you really want to, which is a good compromise

      It does? It had an email listed to contact them if you wanted access to internal docs. That doesn't necessarily mean their database is all yours.

    5. Re:Content is important by pheared · · Score: 1

      At the request of some people, especially those in Germany (where apparently eBay Germany is more strict with respect to scraping), I changed bidwatcher's User-Agent to something more resembling Mozilla.

      I too was surprised that no one had mentioned eBay, and I have also read that EULA.

      eBay actually sells an API but I'd say the pricing is a little less than lucrative for an open project like bidwatcher. I suppose one could craft a server that bid for a bunch of users and charge them for use and recoup the cost. Can't let the users have the API controls in their clients though, because they can quickly run up your bill with eBay, as they charge *per access*.

    6. Re:Content is important by drafalski · · Score: 1
      ...you agree that you will not use any ... manual process to monitor or copy our Web pages or the content contained herein without our prior expressed written permission.


      Wouldn't typing their URL in my browser and looking at auctions be a manual process to monitor some of their pages? Or reloading an auction I bid on to see how it's going?
    7. Re:Content is important by John+Hasler · · Score: 1

      This agreement is no more enforceable than it would be if it were attached to a billboard.

      --
      Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
    8. Re:Content is important by Anonymous Coward · · Score: 0

      ..and what law school graduated you that you can make that unsubstantiated remark and not be considered foolish?

    9. Re:Content is important by QuakeBurger · · Score: 1

      eBay has had an API for a long time to allow access to this goldmine, which is smart because it cuts out the overhead of screen-scraping for both the server and the client. Ain't free though.

      http://developer.ebay.com/

      --
      -- It is my strong belief that it is a mistake to hold strong beliefs.
    10. Re:Content is important by Hektor_Troy · · Score: 1
      Funny that. I have the following in my HTTP-requests:

      You, the owner of the server, the owner of the contents of the server and all persons even remotely related to said persons, agree to give me all of your belongings, by replying to my GET request.
      --
      We do not live in the 21st century. We live in the 20 second century.
    11. Re:Content is important by Anonymous Coward · · Score: 0

      A better one than yours, presumably.

      [not the parent poster]

    12. Re:Content is important by steve_l · · Score: 1

      They really do give you the raw film info content for download, though the licensing rules for the data say that you can't use them on your own web site; people provide implementations of standalone clients from the mainstream (unix & windows) to the obscure (OS/2 and Amiga), so if you want to integrate your linux PVR with a standalone IMDB dataset, go right ahead...

      Enjoy the data; it's a good example of how a bunch of perl and mysql hackers remain true to their roots, and the origin of the data as some Usenet affiliated files.

  30. paging Jack Valenti by sydlexic · · Score: 5, Funny

    didn't you read the terms of service agreement you were handed at birth (us citizens only) that states any bypassing of ads during receipt of content is theft?

    I'm just waiting for ashcroft's goons to knock on my door, find the tivo and haul my ass off to jail.

    1. Re:paging Jack Valenti by merlyn · · Score: 4, Funny
      "Click Here To Accept Your Life's Conditions: [Agree] [Disagree]"

      {grin}

    2. Re:paging Jack Valenti by bofkentucky · · Score: 0, Flamebait

      Try the democrat goons, look up the entertainment industry on open secrets, they gave 84% of their donations to democrats in the last election cycle, chew on that before the next election.

      --
      09f911029d74e35bd84156c5635688c0
    3. Re:paging Jack Valenti by Anonymous Coward · · Score: 0

      There is a discussion on this subject at diveintomark right now also. Mark is, for once, dead wrong on an issue. We're trying to set the poor guy straight though...

    4. Re:paging Jack Valenti by Anonymous Coward · · Score: 0

      Which explains, I suppose, why your heroic Republicans will repeal the DMCA before the next election, given that they control the Presidency and both houses of Congress. Oh, they won't? Gee, chew on that.

    5. Re:paging Jack Valenti by trbogie · · Score: 5, Funny

      I thought they were trying to modify that to say that "Having left the womb, you have, by default, accepted the agreements to all life's conditions."

    6. Re:paging Jack Valenti by spencerogden · · Score: 1
      IE6 users, try typing about:Mozilla in the address bar Someone doesn't like the lizard

      All I get is a blue page, what's so bad about that?

    7. Re:paging Jack Valenti by Jack+Edward+Valenti · · Score: 1

      Well I'm glad one of you guys agree with me.

      -Jack

      --

      You are all pirates, plain and simple.
    8. Re:paging Jack Valenti by Natalie's+Hot+Grits · · Score: 1

      type about:[any word here] and it is supposed to display the word, or if the text says "blank" then it gives you a blank page. However, there is special code to handle the word "mozilla" that turns your page blue.

      I don't know why, but I'm quite sure it is intentionally different than the expected result

      --
      Two infinite things: your stupidity and mine. But I'm not sure about the latter. If my sig offends you, I'm sorry.
    9. Re:paging Jack Valenti by damiam · · Score: 1
      One way to look at it is that MS's insinuating that Netscape crashes (bluescreens) a lot. I think I've heard it explained another way too.

      If you never have, try typing about:mozilla in Mozilla and in Netscape 4.x. You'll get two different quotes.

      --
      It's hard to be religious when certain people are never incinerated by bolts of lightning.
    10. Re:paging Jack Valenti by flacco · · Score: 2, Funny
      "Click Here To Accept Your Life's Conditions: [Agree] [Disagree]"

      A slight correction:

      "Click Here To Accept Your Life's Conditions: [Agree] [Agree]"

      --
      pr0n - keeping monitor glass spotless since 1981.
    11. Re:paging Jack Valenti by jon+doh! · · Score: 4, Funny

      a correction of the correction

      "Click Here To Accept Your Life's Conditions: [Agree] [Disagree]"

      (it's greyed out, like the microsoft patch i applied that said "you need to reboot your computer for the changes to take effect" and had two buttons, one to reboot now, one to reboot later. the reboot later was greyed out...)

    12. Re:paging Jack Valenti by dbrutus · · Score: 1

      I'd actually have more hope in the (R) side of the Congress to fix the DMCA than the (D) side but the Senate's so finely balanced that anything that isn't very high priority just simply isn't going to get through.

      Let's face it, nasty as the DMCA is, judge confirmations, the budget, and the war all outrank fixing it.

    13. Re:paging Jack Valenti by AntiNorm · · Score: 1

      didn't you read the terms of service agreement you were handed at birth (us citizens only) that states any bypassing of ads during receipt of content is theft?

      Two things:

      1. You're under 18 at the time you "accepted" this "contract," so legally you cannot be held to it.

      2. How the heck is a newborn supposed to be able to read, let alone understand, let alone be able to agree to something like that?

      --

      I pledge allegiance to the flag...
      of the Corporate States of America...
    14. Re:paging Jack Valenti by ArsonSmith · · Score: 1

      2. How the heck is a newborn supposed to be able to read, let alone understand, let alone be able to agree to something like that?

      We could say the same about many users who have opened shrink-wrap software.

      --
      Paying taxes to buy civilization is like paying a hooker to buy love.
    15. Re:paging Jack Valenti by renegade600 · · Score: 2, Funny

      However, please be aware that your conditions for living may change without notice...Violating the terms for Your Life's Conditions may result in the termination...

      All I've got to say is that's "Life"

    16. Re:paging Jack Valenti by Golias · · Score: 0, Offtopic
      A lot of republicans would love to fix or repeal the DMCA. If there were time this session, I'm sure Orrin Hatch would enthusiastically author such a bill, as would John McCain.

      However, with shit like Kennedy filibustering to prevent the appointment of a former Clinton judge because he's supposedly too conservative, there's no way anything is likely to get done beyond the budget bill and Iraq's Saddamectomy.

      --

      Information wants to be anthropomorphized.

    17. Re:paging Jack Valenti by Anonymous Coward · · Score: 0

      The about: echo shouldn't work anymore, it was a security flaw waiting to happen (and it did, so they removed it). In IE, res://mshtml.dll/about.moz (which about:mozilla redirects to) is a good old fashioned easter egg, although the engineer responsible wasn't allowed to keep the text so it's just a blank blue screen where Netscape and Mozilla have quotes from The Book of Mozilla.

      The source didn't know what the parody text was, but his impression was that the engineer kept their job on the condition they remove it. It may have been something similar to the famous "seineewerasreenigneepacsten" egg.

    18. Re:paging Jack Valenti by machine+of+god · · Score: 1

      So what happens if you choose disagree?

    19. Re:paging Jack Valenti by Anonymous Coward · · Score: 0

      That should be: "Click Here To Accept Your Life's Conditions: [Agree] [Agree]"

    20. Re:paging Jack Valenti by Anonymous Coward · · Score: 0

      No, should be agree now or we will agree for you in 15 seconds.

    21. Re:paging Jack Valenti by Master+of+Transhuman · · Score: 1


      What makes you think it's US citizens only?

      The US corporate state would like it to be everybody on the planet.

      --
      Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
    22. Re:paging Jack Valenti by bofkentucky · · Score: 0, Offtopic

      Try the democrat goons, look up the entertainment industry on open secrets, they gave 84% of their donations to democrats in the last election cycle, chew on that before the next election.

      I suppose that someone didn't want their precious liberal protectors insulted, flamebait my ass. If you have information to refute my argument, please respond

      --
      09f911029d74e35bd84156c5635688c0
    23. Re:paging Jack Valenti by whereiswaldo · · Score: 1

      >>>"Click Here To Accept Your Life's Conditions: [Agree] [Disagree]"

      Hey, if you click Disagree, does that send you to Yahoo.com?

    24. Re:paging Jack Valenti by zorander · · Score: 1

      So _that's_ why there's all this controversy over partial-birth abortion...

      what else can you do when the baby disagrees?

    25. Re:paging Jack Valenti by n9hmg · · Score: 1

      He got too verbose, but it did require a line saying that it meant greyedtake of your partisan blinders out, rather than "agree" being the default.
      However, you're right... it wasn't funny.

    26. Re:paging Jack Valenti by DunbarTheInept · · Score: 1

      Netscape cannot cause a blue screen of death. Only the OS can. If Netscape's apps are capable of making the OS crash, that's the OS's fault, all the way. Netscape on Linux never crashes linux, but it does segfault a lot.

      --

      Don't label something "offtopic" unless you know the topic well enough to tell what's on topic.

    27. Re:paging Jack Valenti by a_n_d_e_r_s · · Score: 1

      Actually your parents as your legal guardians agreed for you.

      And yes, it means that when you become of legal age you must agree to the conditions or else can be subject to termination. (Depending on which country you live in)

      You may think of it as a joke but it's not. In all countries all citizens must know the laws - not knowing the laws is not a legal defense. So yes, there is an EULA for life and it is called - the law.

      The only thing you can do about it is to emigrate or run for office to try and rectify the situation.

      --
      Just saying it like it are.
  31. Can information be protected by copyright? by Lumpish+Scholar · · Score: 3, Informative

    Everyone's assuming the appropriate rules here are from copyright law, which allow you to protect the expression of an idea but not the idea itself. That's probably right. It's not the way some big organizations want to play.

    In the United States, most major sports leagues (NFL, NBA, NHL, MLB, etc.) believe that they own the rights to real time scores, and can permit or restrict any desired use. I ran into this at a previous job: we could "broadcast" football, basketball, and hockey scores at the end of every "period," and baseball scores at the end of every half inning, but we couldn't send updated broadcasts for every new score. That information needed (so said the leagues) to be licensed, and most of it had been exclusively licensed for the medium (Internet) we were interested in.

    Do they have a legal leg to stand on? No. (IANAL.) Are they leaning on a great, big, huge stick with nails driven through it? Apparently.

    --
    Stupid job ads, weird spam, occasional insight at
    1. Re:Can information be protected by copyright? by niola · · Score: 1

      You could challenge that. The scores are fact. Factual information cannot be copyrighted. What CAN be copyrighted is the method of compilation or presentation of said fact. Check this out for more information. --Jon

    2. Re:Can information be protected by copyright? by bear_phillips · · Score: 1

      Data can't be copyrighted. That is why anyone can give sports scores. The NBA etc. make a loophole for real time scores. The only way to get real time scores is to be in the building. Basically when you watch a game, they prohibit you from "broadcasting" via your cell phone etc while the game is going. If you just got up in the middle of the game left, then you could say "hey the score was X when I left." But then they wouldn't let you back into watch the rest of the game.

      --
      http://www.windmeadow.com/
  32. Back in the day... by TheTick · · Score: 5, Insightful

    Remember when the web -- no, remember when the net was about sharing information? I miss that time. If somebody wrote a cool front end to your service, it was COOL and more power to them. If it made your service (site, whatever) more accessible, that meant more people were looking at your stuff, and that was COOL.

    Now we have entities that threaten legal action for accessing the stuff they've made publicly available. There may actually be a case when the software scrapes and repackages the content (or, more importantly, redistributes it), but I hope the stuff about decoding the URL for easy use is bogus. I have my doubts that a court will see it my way, but still I hope for reason. Nevertheless, the whole idea makes me sad and nostalgic.

    Another thought: is my mozilla vulnerable to this sort of action because it blocks ads -- essentially repackaging the server output for display to me? Now I'm really depressed.

    --
    bachiatari na torisetsu o yome!

    1. Re:Back in the day... by collapser · · Score: 1

      i'm all for the free content myself, but take things from their (admittedly rather hazy) POV: they paid for the stories (and i'd be amused if they didn't); and they gain revenue (and the cash to buy/write/serve more stories) through advertising.

      That's their current business model; you can't fault them for trying to prove its (questionable) effectiveness.

      Everything is about sharing, but you must be willing to share something back. This applies for economic sense as much as it does for p2p as much as it does for, well anything.

      no such thing as a free lunch blah blah blah etc;
      but I can agree that ads (antiquated, one-way, old-info-distro cultural thinking) are not the best way to gain revenue.

      But at the end of the line, there is someone out there doing the work, checking the facts and writing the articles for a living. I can respect that, and their desire to get paid for it.

      --
      <B>note to self:</B> <I>post as html</I>
    2. Re:Back in the day... by old_skul · · Score: 1

      Back in the day, bandwidth and servers were all paid for by governmental or educational organizations. Not commercial entities like what we have today with CNN and Ebay and Amazon.

      In today's reality, someone has to pay for that information. With payment comes a sense of entitlement; with entitlement comes a lawyer.

      TANSTAAFL.

    3. Re:Back in the day... by Lt Razak · · Score: 1

      I remember those days. And I remember back then when people said business would kill the internet "some day". I always agreed, but laughed inside, thinking the day would be long in coming.

    4. Re:Back in the day... by .com b4 .storm · · Score: 1

      Another thought: is my mozilla vulnerable to this sort of action because it blocks ads -- essentially repackaging the server output for display to me? Now I'm really depressed.

      I imagine it'd be pretty hard for you to make that kind of case against Mozilla. Mozilla doesn't really "repackage" anything, it simply denies a request. The page requests to open a window, Mozilla refuses. And indeed, Mozilla is refusing a request to make a request (to fetch an ad).

      If you don't like my browser refusing to pop up an ad, then put it on the page itself and stop trying to hijack my desktop with your flashy Windows-lookalike-animated-GIF crap. And if you don't like the fact that I can ask my browser to block certain images/servers as well, then get off the web. I shed no tears for people who resort to annoying me (pop-ups) or trying to strong arm me (legislating how I access content through a public protocol on a public network on public servers).

      --
      "Wow, you're like some kind of superhero able to ward off happiness and success at every turn."
      -- Ryan Stiles
    5. Re:Back in the day... by TheTick · · Score: 1

      At least one of the packages mentioned in this story (one that has been removed from CPAN) doesn't repackage anything either. It just generates URLs.

      --
      bachiatari na torisetsu o yome!

  33. What falls out the back end of a bull? by Wonko42 · · Score: 4, Funny
    "I've written packages that defeat those silly "enter the word contained in the image" tests..."

    Ahem. Bullshit.

    1. Re:What falls out the back end of a bull? by Anonymous Coward · · Score: 0

      I agree. me too I'm an anon. I know

    2. Re:What falls out the back end of a bull? by Anonymous Coward · · Score: 0

      I can see doing it for something really simple, like a clear, horizontal word, but it's trivial to make something that OCR can't do easily. When I created a Yahoo group the other day, they had one with half of the word blurred and curved off of horizontal.

      They could also do something like those colorblind tests with a bunch of dots of different colors, some of which make a number. Of course colorblind people wouldn't be able to figure it out either.

    3. Re:What falls out the back end of a bull? by Anonymous Coward · · Score: 0

      If an image of an apple is shown and the question is "what fruit is shown in the image" OCR will not help.

    4. Re:What falls out the back end of a bull? by poot_rootbeer · · Score: 2, Insightful


      Maybe bullshit, maybe not. A good OCR library will get you 90% of the way there already.

      They can't distort the characters TOO much in the image, or else humans wouldn't be able to recognize them either. And the background patterns to cause interference with OCR systems could be pretty easy to strip out too; a grid of straight black lines on a white background is fairly trivial to recognize algorithmically, and then removing the lines becomes a simple matter of figuring out where a black pixel is just part of a line, and where it's part of a character.
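
      A rough sketch of the line-stripping idea, assuming the image has already been thresholded to a grid of 1 (black) and 0 (white) pixels. The function name, the 90% threshold, and the rule for keeping pixels where a stroke crosses a line are all assumptions for illustration, not anything from a real OCR package:

```python
def remove_grid_lines(pixels, threshold=0.9):
    """Clear rows and columns that are almost entirely black.

    pixels is a list of rows; 1 means black, 0 means white.
    A black pixel on a detected line is kept only if a black pixel that is
    not on any detected line touches it from the side, which suggests a
    character stroke crosses the line at that point.
    """
    height, width = len(pixels), len(pixels[0])
    line_rows = {y for y in range(height)
                 if sum(pixels[y]) >= threshold * width}
    line_cols = {x for x in range(width)
                 if sum(pixels[y][x] for y in range(height)) >= threshold * height}

    def black_off_line(y, x):
        # True for an in-bounds black pixel that is not part of a grid line.
        inside = 0 <= y < height and 0 <= x < width
        return (inside and pixels[y][x] == 1
                and y not in line_rows and x not in line_cols)

    cleaned = [row[:] for row in pixels]
    for y in range(height):
        for x in range(width):
            on_row, on_col = y in line_rows, x in line_cols
            if pixels[y][x] != 1 or not (on_row or on_col):
                continue
            crossing = False
            if on_row and not on_col:
                crossing = black_off_line(y - 1, x) or black_off_line(y + 1, x)
            elif on_col and not on_row:
                crossing = black_off_line(y, x - 1) or black_off_line(y, x + 1)
            # Intersections of two grid lines are always treated as line pixels.
            if not crossing:
                cleaned[y][x] = 0
    return cleaned
```

      Real images are messier than this (anti-aliasing, slanted lines, strokes that run along a grid line), so treat it as the easy case the comment describes and nothing more.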

      Whether it's worth all that effort just to be able to automate the submission of a form is debatable.

    5. Re:What falls out the back end of a bull? by Anonymous Coward · · Score: 0

      Plum?

    6. Re:What falls out the back end of a bull? by leonardluen · · Score: 1

      shhh! his clients don't know that...and the less they know and the more they listen to him about those things the better!

    7. Re:What falls out the back end of a bull? by stratjakt · · Score: 2, Funny

      >> If an image of an apple is shown and the question is "what fruit is shown in the image"?

      Steve Jobs?

      --
      I don't need no instructions to know how to rock!!!!
    8. Re:What falls out the back end of a bull? by Wonko42 · · Score: 1

      I say "bullshit" not because I think it's technically impossible, but because I don't believe that you personally have written any of these "packages", as you claimed.

    9. Re:What falls out the back end of a bull? by Wonko42 · · Score: 1
      I see. So when you said "I've written packages that defeat..." what you really meant was "I've used packages that could theoretically do something similar to defeating..."

      Do you see how much confusion could have been cleared up right from the beginning if you had stated what you actually did rather than implying that you did something wholly more remarkable?

    10. Re:What falls out the back end of a bull? by Anonymous Coward · · Score: 0

      http://www.captcha.net/cgi-bin/gimpy-r

      I was thinking...maybe some handwriting recognition software would be able to help too?

    11. Re:What falls out the back end of a bull? by Alan Cox · · Score: 2, Interesting

      Actually there is a much simpler way to defeat "please enter the word on the image" web sites, and one that actually raises a real issue. Those image tricks discriminate horribly against the blind, the old and those with eye problems in general, as well as, in some cases, dyslexics.

    12. Re:What falls out the back end of a bull? by Anonymous Coward · · Score: 0

      It is also quite elementary to create an image that a human could resolve as a word but that would stump even the best OCR. If, in the far, far future, OCR tech becomes good enough to make this game impractical, it would be a simple matter to ask, for example, what the little girl in the picture is holding, in an image of a crowd of people.

    13. Re:What falls out the back end of a bull? by Zerth · · Score: 1

      Have you ever seen a doctor's handwriting? That is defeating such a scheme:)

    14. Re:What falls out the back end of a bull? by Anonymous Coward · · Score: 0

      Why call it bullshit? It is quite possible. I have seen such a script. It read numbers, but I don't see why it wouldn't be possible to read words.

      It was successfully used to rip off a "rewards" site where you click on ads to earn a few cents.

      Perhaps I've said too much... |-)

  34. Re:Use the "Wink Wink, Nudge Nudge" module disclai by Frobnicator · · Score: 1
    Reminds me of how most of the policies that say "Reading the content means you agree to these terms" are at the BOTTOM of the web page.

    Frob.

    --
    //TODO: Think of witty sig statement
  35. What's the problem here? by hmccabe · · Score: 5, Insightful

    I think this is something we're going to start seeing a lot of in coming years. Right now, the Internet in general is going through growing pains, and the pressure is starting to show in these "free services" type sites (e.g. Mapquest).

    I don't know about these sites in particular, but many of the big sites around today were built with the failed dot-com business model of delivering free content and selling advertising that ran on the page (or popped up behind it). This, of course, is dependent on people viewing the site in a browser. If people get the information without using a browser, therefore never seeing the ads, the advertisers won't want to spend any money on the site.

    Another problem is, most companies don't want to take the risks associated with innovation, so instead they seek legal action to maintain the good thing they have going. While this is a quick fix, and in the company's best interests, we need companies to present a new business model to the public and see how it gets adopted. I would pay an annual subscription fee for things like Mapquest.com, tvguide.com and maybe even /. I believe others would as well.

    Porn sites, Ebay auctions, games such as Everquest and services such as Apple's dot-mac are online services that subscribers happily pay for because more than anything, they are quality products (well, some of the porn is). If the company's revenue is coming from its users, they would be a lot less concerned about how the information is being distributed.

    This isn't such a radical change, as they could add a premium subscription service, and slowly transition the focus of their business towards it. Wouldn't it be cool if I could write my own mapping application (or download a pre-made one from the site) and have it connect to xml.mapquest.com, give my username and password, and retrieve the data I requested?

    1. Re:What's the problem here? by Frobnicator · · Score: 1
      If people get the information without using a browser, therefore never seeing the ads, the advertisers won't want to spend any money on the site.
      It was a problem with text browsers, so many companies explicitly dropped the ALT tags. Now they fight ad-blockers.
      Wouldn't it be cool if I could write my own mapping application (or download a pre-made one from the site) and have it connect to xml.mapquest.com, give my username and password, and retrieve the data I requested?
      That's what the plan was.
      --
      //TODO: Think of witty sig statement
    2. Re:What's the problem here? by Evro · · Score: 1

      I would pay an annual subscription fee for ... maybe even /.

      Hrm. Why don't you put your money where your mouth is?

      --
      rooooar
    3. Re:What's the problem here? by Gadzinka · · Score: 1

      Porn sites, Ebay auctions, games such as Everquest and services such as Apple's dot-mac are online services that subscribers happily pay for because more than anything, they are quality products (well, some of the porn is). If the company's revenue is coming from its users, they would be a lot less concerned about how the information is being distributed.

      Not likely. Think DeCSS. Or any content producer (music, movies, books).

      The issue isn't the access method. The issue is control. Total control means huge profit margins (like Microsoft). Luckily, besides the biggest monopolies, total control is impossible to achieve.

      --
      Bastard Operator From 193.219.28.162
  36. retype anti-script graphics by junkwis_anet · · Score: 1

    Maybe all these sites will front-end their sites with "retype anti-script graphics". Kinda like what slashdot.org does to email your password.

    $0.02

  37. Turing test? by siskbc · · Score: 4, Insightful
    So far, I was under the impression no one had won the Turing contest yet. You are beating their trivial problems, but they're finally waking up and shifting the "online human test" to things that people haven't figured out how to code. I'd link to the article if I could remember where I saw it...

    Hell, the simplest would be an easy reading comprehension or logic test with a short-answer blank - the computer would never get it, and all humans would.

    My guess is that soon, people who REALLY want you out will keep you out.

    --

    -Looking for a job as a materials chemist or multivariat

    1. Re:Turing test? by jforr · · Score: 1
      Hell, the simplest would be an easy reading comprehension or logic test with a short-answer blank - the computer would never get it, and all humans would.
      This being posted on a site where everyone jumps into the discussion about the article without ever reading the article, but claim they know everything about it since they read the comments by the editors who don't read their own site and can't see the exact same story posted four slots down.
    2. Re:Turing test? by ignorant_newbie · · Score: 1

      >Hell, the simplest would be an easy reading
      > comprehension or logic test with a short-answer
      >blank - the computer would never get it, and
      > all humans would.

      you mean something like "how many bubbles are in a bar of soap"?

    3. Re:Turing test? by Anonymous Coward · · Score: 0
      Hell, the simplest would be an easy reading comprehension or logic test with a short-answer blank - the computer would never get it, and all humans would.

      This being posted on a site where everyone jumps into the discussion about the article without ever reading the article, but claim they know everything about it since they read the comments by the editors who don't read their own site and can't see the exact same story posted four slots down.

      Yikes...bitter?

    4. Re:Turing test? by nuggz · · Score: 2, Insightful

      First off, you assume people will be able to comprehend. I doubt that; people are dumb. Don't believe me? Listen to a daytime talk show.

      Second, a computer will mark your answer, so it must be able to comprehend the answer you put in. You have to give a precise and exact answer (likely), which means it's a simple question, and a computer might be able to answer it.

    5. Re:Turing test? by way2trivial · · Score: 0

      Hell, the simplest would be an easy reading comprehension or logic test with a short-answer blank - the computer would never get it, and all humans would.

      A short logic test, and all humans would get the answer? Like the woman who cooked her laptop?

      --
      every day http://en.wikipedia.org/wiki/Special:Random
    6. Re:Turing test? by Anonymous Coward · · Score: 0

      What about the people who use the internet but can't read? They wouldn't be able to order! We all know what happened when they tried to keep the BLIND from using the internet...sigh...

    7. Re:Turing test? by Servants · · Score: 1

      Hell, the simplest would be an easy reading comprehension or logic test with a short-answer blank - the computer would never get it, and all humans would.

      The only problem with that is that it's nontrivial to make up such tests. And that means that no site will ever have more than a relatively small number of them, in the hundreds or thousands at most. So all an attacker needs is a list of question types and answers.

      Part of the reason "read these letters" tests are so widely used is that the number of variations is combinatorially large -- you can automatically vary background, rotation, slant, color or pattern, font, size, baseline, and so forth, individually for each letter.

      Amazingly enough, none of these seriously impede people's ability to read the text. That's some impressive OCR.
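
      To put a rough number on "combinatorially large": if each property can be varied independently per letter, the count is the product of the option counts raised to the word length. The option counts below are invented for illustration, not taken from any real test:

```python
from math import prod

# Hypothetical number of choices for each per-letter property.
# None of these figures come from a real implementation.
OPTIONS_PER_LETTER = {
    "background": 10,
    "rotation": 20,
    "slant": 5,
    "color_or_pattern": 8,
    "font": 12,
    "size": 4,
    "baseline": 6,
}

def variations(word_length, options=OPTIONS_PER_LETTER):
    """Number of distinct renderings when every letter varies independently."""
    per_letter = prod(options.values())
    return per_letter ** word_length

if __name__ == "__main__":
    for length in (1, 4, 6):
        print(length, "letters:", variations(length), "renderings")
```

      With these made-up numbers a single letter already has 2,304,000 renderings, against the hundreds or thousands of entries in a hand-written question bank.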

  38. i think Jan Dubois is making an excellent point: by Simon (S2) · · Score: 1

    * From: Jan Dubois
    * Subject: Re: [Fwd: IMPORTANT: Request removal of WWW::EuroTV]
    * Date: Thu, 06 Feb 2003 13:05:09 -0800

    On Thu, 6 Feb 2003 21:44:20 +0100, "Bas A. Schulte"
    wrote:

    >They're just too ignorant that they think they can publish the data for
    >everyone to see can only be seen through their own website.

    [...]
    >Anyway, I'd love to hear anyone on this with some legal knowledge. I
    >don't believe at all that this will hold up in a court of law.

    I think this discussion is missing the point. It should not be: "What can
    we legally get away with?", but "Do we have the courtesy to respect the
    wishes of publishers of information?", even if their wishes might not be
    legally enforceable.

    Since this is about Perl advocacy, I would like to quote a bit of Perl
    culture: "It [Perl] would prefer that you stayed out of its living room
    because you weren't invited, not because it has a shotgun."

    I think the same rules should apply for screenscrapers too: If website
    owners don't want their pages to be scraped, then people shouldn't do it
    and get their information elsewhere. It is like honoring a robots.txt
    file. It is probably not enforceable, but it is the right thing to do.

    Cheers,
    -Jan

    PS: I'm not saying that "they" weren't the first ones to break the rules
    of politeness by threatening a law-suit, instead of just asking for the
    module's removal. But that doesn't mean that one has to respond in kind.
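
    For what it's worth, checking robots.txt before scraping takes only a few lines. A minimal sketch using Python's standard urllib.robotparser; the robots.txt content, the user-agent string, and the example.invalid URLs are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real scraper would fetch this from the
# site it is about to visit instead of hard-coding it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /listings/
"""

def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    for url in ("http://example.invalid/about.html",
                "http://example.invalid/listings/today"):
        verdict = "ok to fetch" if allowed("MyScraper/0.1", url) else "asked to stay out"
        print(url, "->", verdict)
```

    Nothing forces a scraper to run this check, which is exactly the point above: it is a courtesy, not a lock.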

    --
    I just don't trust anything that bleeds for five days and doesn't die.
  39. Serving Information by Hasufin_Heltain · · Score: 1

    Hmm - my belief is this:

    If someone displays information for public consumption: It releases the right to control what the public does with that information.

    Information being website content - text.

    Sure it is copyrighted. But if I want to screen scrape, make a collage out of it, whatever, I have that right. If I'm going to publish derivative works then you have to seek permission to publish said works.

    But saying that you have to use a particular application/technology to access my public webserver is ludicrous. If you want to do that - make your data proprietary and require proprietary software to access said information.

    It is well within your rights, but don't complain about how the public digests your information: do something about it.

  40. You fucking hippies by Anonymous Coward · · Score: 0

    Jesus - Forget all the politics and "My rights are getting trampled on". Write your own version of the module and be done with it. You can fuck the lawyers, and ignore whatever crap is going on.

  41. Yes but by A55M0NKEY · · Score: 1
    First of all let me say that I agree wholeheartedly.

    But if the rules of a protocol are the only rules I need to follow in cyberspace then consider the following valid telnet sessions:

    login: abe

    password: lincoln

    Wrong password, try again:

    login: abe

    password: Lincoln

    Wrong password good bye.

    DISCONNECTED

    login: abe

    password:linc0ln

    connected!

    $ sudo rm -r /

    password: linc0ln

    DISCONNECTED

    I would argue that insecure systems deserve to be broken in to. The person here shouldn't have had such an easy to guess password. He also shouldn't have used an unencrypted protocol like telnet that anyone can listen to. If you want to have a telnet service running accessible by the public then it should be your responsibility to have a hard to guess password and keep people outside your firewall from being able to connect.

    --

    Eat at Joe's.

    1. Re:Yes but by DEBEDb · · Score: 1
      I would argue that insecure systems deserve to be broken in to.


      So if I have a puny lock on my house, opened
      by a credit card, I deserve to be broken into?
      But what if you have a super-duper security
      system, but I am some kind of super-duper
      pro able to circumvent your protection with
      about the same effort as you spend on opening
      my lock with a credit card? Do you deserve it?

      In fact, this typical elitist geek "deserves"
      bullshit really pisses off a lot of "non-geeks."
      Who are you to decide who deserves what? Just
      because someone can do something to you, does
      it mean you deserve it? What if I am able to
      beat you senseless? Do you deserve that too,
      because you didn't study enough kung-fu or
      don't employ an army of bodyguards?

      --

      Considered harmful.
    2. Re:Yes but by A55M0NKEY · · Score: 1
      Online, often the only way to know something is forbidden is to fail to access it because of a security system. At the very least a username and a password should be required as a no-trespassing sign. But why stop there? Banks have their cash delivered by BRINKS truck because the insurance rating is better than for Kid with a Radio Flyer Bagz-o-Cash Courier Service. Paying BRINKS is cheaper than eating the losses from the robberies that would take place or paying an insurance company to eat the losses of the robberies that would take place. So though no law of man would require me to pay armored guards to deliver a Bag-o-Priceless Joolz, natural law (i.e. common sense) would.

      You can walk wherever you want unless you come to a fence or a no-trespassing sign. You can even walk on private property. To keep people off your land you need to build a fence or put up signs. But if I walk on your land anyway and get caught, it's a $100 fine.

      It's more if I open your unlocked door and even more if I jimmy the lock with my credit card. It is also more if I cut a hole in the chainlink fence at a military base or even worse if I break into a bank. In general, the more incentive there might be to break into something, the better it is secured, and the more trouble I might get in for breaking in. The more trouble I could get in for breaking in the more determined I'd have to be to try it. I might miss a no trespassing sign and be where I shouldn't but it'd just be a small fine. I might knock at my friend's house before opening the door and yelling 'Hello'. I might jimmy the door with a credit card if I thought he might be on the floor dying of a heart attack or choking on a chicken bone. I might piss him off and maybe get reported to the cops, especially if his girlfriend was the only one home and didn't answer the door because she was playing in the shower with the vibro-massage. I might have some explaining to do but jail time would surprise me.

      If I climbed over the military's chain link fence to get my football, I'd be in for some trouble if I got caught. But probably more since that would just be an excuse for my real mission which would be spying. If I was caught burglarizing a bank I'd be in jail 15 yrs or more. The odds of being innocent and getting in trouble go down with the security of the situation - a person with innocent motives is very unlikely to have overcome decent security.

      But fences and safes are expensive. However big tough locks are freely available for computers and data. This changes the equation, taking some responsibility away from the government to protect the public and putting more responsibility on the public to protect themselves. There should not be only one penalty for breaking into joe shmoe's computer and putting a goatse.cx wallpaper on it which is meant for people who break into a company's website and post erroneous figures to drive the price of the stock down so they can buy it low and profit later.

      --

      Eat at Joe's.

  42. FIRST POST by Anonymous Coward · · Score: 0

    that counts.

    Because none of your drivel counts for shit.

    Suck it bitches!

    1. Re:FIRST POST by nickclarke · · Score: 1

      why do we keep getting these sort of posts from ACs?
      They're not big, funny or clever, have nothing whatsoever to do with the discussion, and purely serve to annoy the rest of us. If you don't have anything useful to post, don't bother. Moderators please - mod down rude insulting posts such as the parent.

  43. It's not about technology by cygnusx197 · · Score: 2, Insightful

    You know, I think some of you are missing the point in all the technology. I work for a community newspaper publishing company, and we have copyright info at the bottom of every page. I found a guy on google that demonstrates screen scraping techniques using our main news page. That's fine. 99% of the time, it's not a big deal...it's going to happen. What we don't like is when somebody comes along, takes our content, and presents it in questionable environments, like a page that happens to have porn banners on it. Ever hear of "guilt by association"? Frankly, I think it's more likely to happen if screen scraping becomes more commonplace. Honestly... i haven't noticed a drop in traffic when someone does this.

    1. Re:It's not about technology by exhilaration · · Score: 1
      To quote from the comment directly above yours (at least on my screen):

      In order to "publish derivative works then you have to seek permission to publish said works"

      Permission might mean a license from you, and if they continue to publish it without your OK, you can take them to court and seek damages - this is how copyright law works.

      But this is all irrelevant if I'm pulling your content for my personal use - if I feel like reformatting your content, changing its color scheme, etc., you have no right to make me stop. You're welcome to defeat my technology, but that will most likely put you in violation of the Americans with Disabilities Act, as your content might no longer be accessible by screen reader programs.

      I believe a blind woman successfully sued a major airline because its website was inaccessible to the blind.

    2. Re:It's not about technology by cygnusx197 · · Score: 1
      You're right. Take it for your personal use. Make a dirty limerick out of it. We don't care.


      Just don't republish it without our permission.


      Blind people can sue us all they want. We're not an american company.
      Yeah. Low blow, but if we wanted our content syndicated, then we'd develop rss feeds, and xml templates... which by the way, are coming soon.

    3. Re:It's not about technology by jmagar.com · · Score: 1

      Woohoo! RSS would have saved me the time of scraping your site. The scraper grabs the headlines, and links directly back to your site's content. If anything you should see more traffic as the headlines reach more eyeballs now.

      Here's the document in question:

      Template Based Scraping

      Cheers,
      Mike

    4. Re:It's not about technology by jmagar.com · · Score: 1
      Slashdot has eaten my original reply: http://slashdot.org/comments.pl?sid=53128&cid=5255436

      Anyway, I think the document referred to is this one:

      Template Based Scripting

      Looking forward to your RSS feeds, so that I can turn off my scraper.

      Cheers,
      Mike

  44. ebay has already done this by troydsmith · · Score: 3, Informative
    About 2 years ago ebay did exactly this. Their case went to court and they won.

    Here is some more info

    1. Re:ebay has already done this by kryptkpr · · Score: 2, Informative

      No, no, this is NOT the same thing.

      This was a website, meta-searching another website without their permission.

      I used to run a large MP3 meta-search, and I made damned sure I had permission from every search engine I meta'd, and that their ads were put into my rotation to compensate for the extra traffic.

      I also added measures such as search caching (so when people searched for "britney spears" 500 times a day, I wouldn't actually send 500 queries, I'd only send 8, at 3 hour intervals).
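
      The caching idea can be sketched roughly like this, in Python rather than the original code (which isn't shown here). The class name, the fetch callback, and the injectable clock are assumptions for illustration; the 3-hour window is the one mentioned above:

```python
import time

class CachedSearch:
    """Cache results per query so the upstream engine sees at most one
    request per query per time-to-live window."""

    def __init__(self, fetch, ttl_seconds=3 * 60 * 60, clock=time.time):
        self.fetch = fetch        # callable that really queries the upstream engine
        self.ttl = ttl_seconds    # how long a cached answer stays fresh
        self.clock = clock        # injectable so the cache can be tested without waiting
        self.cache = {}           # normalized query -> (fetched_at, results)

    def search(self, query):
        key = query.strip().lower()   # "Britney Spears" and "britney spears" share an entry
        now = self.clock()
        hit = self.cache.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]             # still fresh: no upstream traffic
        results = self.fetch(key)
        self.cache[key] = (now, results)
        return results
```

      With a 3-hour window, a popular query searched all day long reaches the upstream engine about 8 times in 24 hours, however many users ask for it.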

      The perl module in question here allows an easy way to extract information from a website, and of course provides the capability to meta-search another site.. but that doesn't mean you have the right to do it without their permission! This is exactly what the Judge ruled:

      "Even if (Bidders Edge's) searches use only a small amount of eBay's computer system capacity, Bidders Edge has nonetheless deprived eBay of the ability to use that portion of its personal property for its own purposes"

      They used eBay's system resources, without making a deal, and without compensation.. This is just-plain-wrong (tm).

      Technology is not the problem here, it's that some people are just jackasses and want to profit from others' work.. this shouldn't be allowed. And I don't mean not allowed by law. Technology does wonders for blocking other technology.. if the two websites in question have half a brain they'll either

      a) change their business model
      b) find a way to block these bots (embedding tiny images in their pages for example? I'm sure I could come up with many more, if someone wants to pay me :)

      and not try to fight progress with congress.

      --
      DJ kRYPT's Free MP3s!
  45. Simple solution by Anonymous Coward · · Score: 0

    If this is how those companies treat their customers, fuck 'em.

    While there's absolutely nothing illegal that you're doing - they don't deserve your patronage or the following that your utilities will create for their sites. Take your eyes elsewhere. There are fine alternatives.

    Try Mapquest UK and your choice of alternative TV listings.
    1. Re:Simple solution by Anonymous Coward · · Score: 0

      They'd probably be very grateful that you and your sponging friends went elsewhere. Take your money...oh wait, you ain't paying, you're just stealing the content and giving nothing in return. You're right, just fuck off.

    2. Re:Simple solution by nickclarke · · Score: 1

      Also try the Ordnance Survey. That's where streetmap get their maps from anyway, and the only advertising is from the OS trying to sell you the printed version of the maps (which is their business anyway).

  46. It's stupid, anyway. by errxn · · Score: 1

    This begs the question: why screen scrape in the first place? It's not very reliable in the sense that, barring special circumstances, there is no guarantee that the data that is returned in a response will be in the format the scraper expects.

    You're basically trying to parse data out of a string that you can at best only *assume* is going to be in a predetermined format. All the target has to do, in a lot of cases, is change a tag, comment, or what-have-you here or there (assuming that the response is a string of HTML) and it can throw the whole thing out of whack.

    Now, if the response is just straight data, a return from a web service, or some other special case, then the data from it could probably be more trustworthy. But then again, if you're making requests to a web service, it's not really a "screen scrape", is it? And, I would also assume that if the target went to the trouble to expose a web service, they certainly expect outside parties to use it. Authorization issues, etc. would then become their burden.

    --
    In Soviet Russia, Chuck Norris will still kick your ass.
    1. Re:It's stupid, anyway. by glwtta · · Score: 1
      You're basically trying to parse data out of a string that you can at best only *assume* is going to be in a predetermined format. All the target has to do, in a lot of cases, is change a tag, comment, or what-have-you here or there (assuming that the response is a string of HTML) and it can throw the whole thing out of whack.

      If you are doing it the very wrong way, then yes, you are correct. If you need to do this for real, then use modules like HTML::TokeParser and HTML::TreeBuilder, so you are not parsing a string but traversing a tree. And this, while not perfect, is reliable enough to be very useful.
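      To make the difference concrete, here is a minimal sketch of structure-aware extraction. It is written in Python with the standard library's html.parser rather than the Perl modules named above, and it is event-based, so it is closer in spirit to HTML::TokeParser than to HTML::TreeBuilder. The class name, the helper function, and the two sample listings are all made up for illustration; they are not taken from either of the sites under discussion.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        # Attributes are ignored on purpose: only the element names matter.
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return the table cells found in an HTML string as a list of rows."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical listing markup before and after a cosmetic redesign.
OLD = "<table><tr><td>20:00</td><td>News</td></tr></table>"
NEW = ('<table class="x"><!-- redesign --><tr bgcolor="#eee">'
       '<td align="left"><b>20:00</b></td><td>News</td></tr></table>')
```

      Both sample strings yield the same rows, because added attributes, comments, and inline tags do not change the row and cell structure. A redesign that moves the data out of the table would still break it, which is the "not perfect" part.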

      Incidentally, here's why I have to rely on "screen scraping" heavily: in the scientific communities a lot of data and services are provided via the web (and are in the public domain; here at least, when they say that they want you to have access to the data, they do mean it). They don't, however, have the resources (time, technical expertise, etc.) to provide these as web services, or (in the case of software) as local installations (various reasons for this).

      In any case, regardless of what these companies think the web is and how they can limit their clients' use of their sites, I find it absolutely infuriating that they would demand the removal of modules from CPAN. This whole "it can potentially be used in a displeasing way for a company, so it must be banned and destroyed" mode of thinking is really losing all semblance of propriety.

      --
      sic transit gloria mundi
    2. Re:It's stupid, anyway. by errxn · · Score: 1

      I would say that your scrapes fall under one of the "special case" scenarios.

      And, string parsing vs. token parsing/tree traversal issues aside (I agree with you on that, BTW), my point is that unless you can enforce the integrity of the return data, or at least be reasonably certain that it will be enforced by the party on the other end, you are relying on inherently unreliable data. In some cases, the risks may be acceptable, but I've always shied away from it whenever possible.

      --
      In Soviet Russia, Chuck Norris will still kick your ass.
  47. Freedom from innovation by RokaMoka · · Score: 0, Flamebait

    This is just another example of people turning to the law instead of using their brains.

    Any admin worth his salt has to deal with undesirable traffic without crying for help. Whether it's spam, badly-written bots, DOS attacks, or just offtopic trolls in community/chat sites.

    Don't like traffic from Nigeria, block it. Don't like bad bots, trap them. Don't like "First Posts", invent a clever meta-moderation system to deal with it.

    Blaming CPAN for annoying bots is like blaming the NRA for gun violence. Oh, wait-a-minute, I DO blame the NRA for gun violence.

    Blaming CPAN for annoying bots is like blaming Microsoft for every w32.Worm. Oh, wait-a-minute, that one is their fault too.

    Blaming CPAN for annoying bots is like blaming CD players for Britney Spears. Yeah, that's it.

  48. They do it all the time! by Anonymous Coward · · Score: 0

    It's been in common practice for years, but people are just now bothering to complain about it??

    "This site best viewed with Microsoft Internet Explorer"

  49. Derivative work by yerricde · · Score: 5, Informative

    There's no law stating that we have to look at ads.

    What about 17 USC 106, which states that barring fair use, etc., the copyright owner has the right to prevent others from creating derivative works of a web page?

    --
    Will I retire or break 10K?
    1. Re:Derivative work by jcast · · Score: 1

      I have difficulty buying that re-formatting a UI is ``creating a derivative work''.

      I mean: if I write a GUI on top of mount (say), is that a derivative work? Can Sun prosecute me if I port it to their OS (which I assume has a proprietary mount)?

      --
      There are reasons why democracy does not work nearly as well as capitalism.
      -- David D. Friedman
    2. Re:Derivative work by Anonymous Coward · · Score: 0

      A better analogy is asking, "Can I place that rented white sculpture behind some multi-coloured glass panes?" ... I'd say so. Even better is the fact that if I want to, I'll wear shades watching TV, if the producers have a problem with that -- blow.

    3. Re:Derivative work by Anonymous Coward · · Score: 0

      Your analogy does not work.

      The key word in your statement is "rent". If you have specific license from the copyright holder in exchange for money, then it's fine. If you don't it's not.

    4. Re:Derivative work by rking · · Score: 1

      The key word in your statement is "rent". If you have specific license from the copyright holder in exchange for money, then it's fine. If you don't it's not.

      Renting a sculpture doesn't imply any sort of license from the copyright holder. You could rent, or buy, a sculpture without having any dealings with the copyright holder and without ever knowing who that is.

    5. Re:Derivative work by Herkum01 · · Score: 1

      Time schedules cannot be protected any more than phone books. Sorry, there is no copyright there.

    6. Re:Derivative work by Natalie's+Hot+Grits · · Score: 5, Informative

      Yes, barring fair use, which explicitly allows you to do this unless you re-distribute the work. Which you aren't.

      Short answer is that you can modify any work under fair use for your OWN PERSONAL USE and not for someone else. If your web browser cuts out ads, then that is legal, and no US Code currently in existence disallows these modifications.

      Aside from this point, there is still the legal reality that no US law makes it illegal to build, distribute, or use tools that can modify copyrighted works (unless the work is encrypted and covered under the DMCA).

      If an ISP started doing this at his firewall, and then re-distributing the web site to your computer after you request it, then this might be illegal. They might be able to argue that one party is getting the work, modifying it, and redistributing it, which is certainly not covered under the Fair Use Doctrine.

      OTOH, if the ISP has a fair use reason to do this (such as reformatting the text to work on a text only terminal), then this may also be legal.

      What it all boils down to is that the spirit of copyright law is to restrict COPYING and REDISTRIBUTING, not how a person uses those works. This was true until 1998, when the DMCA was enacted, and even now is still true for all copyrighted works that are not covered under the DMCA's encryption clauses. To this day, I have yet to find a website that is encrypted for purposes of DMCA protection. Until this changes, they won't have any legal legs to stand on.

      --
      Two infinite things: your stupidity and mine. But I'm not sure about the latter. If my sig offends you, I'm sorry.
    7. Re:Derivative work by bear_phillips · · Score: 1

      Original works are copyrightable, facts are not. The facts, TV showtimes etc., are fair game. The other parts of the webpage, ads etc., are copyrightable. I guess you could argue that removing the ads is violating the copyright they hold on the ads???

      --
      http://www.windmeadow.com/
    8. Re:Derivative work by Sabalon · · Score: 3, Interesting

      If I buy a copy of The Hobbit, rip out every 5th page and then read it, have I created a derivative work and broken a law?

      If I don't distribute it, can't I do whatever I want with the content?

      If I was to then repost this on the web, yes...I could see where that would be a problem, but not what I do for myself.

    9. Re:Derivative work by Gadzinka · · Score: 1
      There's no law stating that we have to look at ads.

      What about 17 USC 106, which states that barring fair use, etc., the copyright owner has the right to prevent others from creating derivative works of a web page?

      Does he have the right to prevent the end user from creating such derivative works for his own personal use?

      Robert
      --
      Bastard Operator From 193.219.28.162
    10. Re:Derivative work by Anonymous Coward · · Score: 0

      There is an interesting twist to this regarding cell phone companies and web content. Without exception they all run proxies that further compress jpg images. I do not want my copyrighted images subject to this. Can the cell phone company refuse to provide access to my site because of this?

      As far as personal use scraping, save the ads to watch later :-)

    11. Re:Derivative work by pareve · · Score: 1

      Copyright is not the only relevant law; companies are often worried about the collection of things like prices (which aren't copyrightable) by competitors. One company, EF Cultural Travel, has sued 2 competitors, and won twice, under the Computer Fraud and Abuse Act (CFAA, 18 U.S.C. 1030). Most recently EF claimed that a competitor's use of a screen scraper "exceeded authorized access" to its website because use of the scraper was beyond the "reasonable expectations" of an ordinary user. On this basis EF actually got a federal court to order the scraper writer not to act in concert with a company that used confidential information about EF's site to make the scraping more effective.

    12. Re:Derivative work by jedidiah · · Score: 1

      Actually, I believe that an art patron is perfectly entitled to vandalize anything that they buy. It may not be this way in Europe, but I do believe this to be the case in the US.

      Also, paints and other assorted devices aren't outlawed merely because some art patron may vandalize some original artwork or print. ...this is like banning people from writing code that would paint a moustache on the Mona Lisa.

      --
      A Pirate and a Puritan look the same on a balance sheet.
    13. Re:Derivative work by bwt · · Score: 4, Insightful

      The author does not create the "web page", that is the job of the user agent. The author offers up raw HTML source code and YOU render it. Your argument proves too much -- it proves that all rendering of HTML in a browser is copyright infringement because it creates a derivative work of his source code. Indeed, it DOES create a derivative work, just one that is **authorized**.

      The author creates various files such as HTML text files, pictures, PDFs etc. By using HTML, he has authorized the user agent to render consistent with the HTML standard and his HTML code. Thus, he has explicitly authorized certain limited types of derivative works to be made from his source code by using HTML. The HTML standard does not require images to be rendered, and since it was the author's choice to use HTML, no violation of copyright law occurs when HTML is rendered in a manner consistent with the HTML spec.

      Had he wanted to mandate the exact representation, he could have used an image format or a PDF. It's his choice, but he must live with it and all that follows from it.

      Of course, there is nothing wrong with not rendering the HTML at all and just looking at it as source code. Nor is there any cause of action under copyright law if you extract unprotectable facts and ideas from either the source code or the rendered version.

    14. Re:Derivative work by Dausha · · Score: 1

      Yeah but . . . Hasn't the copyright on the Mona Lisa expired? I mean, at least until the next Disney upgrade to the US Copyright laws extends the 'limited period' to cover an epoch?

      --
      What those who want activist courts fear is rule by the people.
    15. Re:Derivative work by diggitzz · · Score: 1

      I guess you could argue that removing the ads is violating the copyright they hold on the ads???

      I don't see how ... I doubt "refusing to look at copyrighted work" somehow violates copyright.

      --
      -=[You cannot consistently judge this statement to be true.]=-
    16. Re:Derivative work by jedidiah · · Score: 1

      Ok then... replace Mona Lisa with Mickey Mouse.

      That would be quite some worm... deface every copy of a Mickey Mouse image on the planet.

      --
      A Pirate and a Puritan look the same on a balance sheet.
    17. Re:Derivative work by phantomlord · · Score: 1
      They might be able to argue that one party is getting the work, modifying it, and redistributing it, which is certainly not covered under the Fair Use Doctrine.

      If I buy a newspaper, give it to my friend without reading it and he gives it back to me with all of the ads scribbled out with a sharpie, there's nothing the publisher of the paper can do. I can sue my friend for destroying my property but there's nothing anyone else can do about it.

      When I request a webpage and hand it to my ISP's proxy before I see it, it's the same situation.

      --
      Don't leave your mind so open that your brain falls out. Don't close it so much that you cut off the blood.
    18. Re:Derivative work by Minna+Kirai · · Score: 1

      The Hobbit is more than 56 years old, so it's out of copyright anyway.

      Oops, never mind.

    19. Re:Derivative work by plague3106 · · Score: 1

      Does that apply though if the derivative work is for your use only?

      Technically, couldn't highlighting be a derivative work if that's not the case?

    20. Re:Derivative work by Anonymous Coward · · Score: 0

      I think the question is closer to whether or not you can design, manufacture, and distribute a machine that rips out every 5th page of the Hobbit for your customers, providing them with a derivative work as you envisioned it. Did they derive the work from their legal copy of the Hobbit, or did you derive the work from their legal copy of the Hobbit?

      I don't know the answer, but I bet a lot of lawyers would be willing to take sides for money.

    21. Re:Derivative work by Anonymous Coward · · Score: 0

      If you bought the book "The Hobbit" which was in the original printing, hasn't the copyright expired? So you could repost all of it on the Web. But you have to prove that what you posted came from a copyright-free volume.

    22. Re:Derivative work by Berzelius · · Score: 1

      Suppose I use HTTPS and not HTTP? All web content is then suddenly encrypted ...

  50. As Slashdot said yesterday ... by supergiovane · · Score: 1

    ... they can.

    --
    Signatures are for stupids.
  51. I've done it for years by rkinch · · Score: 1

    I've done this with my own scripts for eBay (to improve their search engine) and for Yahoo Groups (which hobbles a perfectly fine NNTP model with advertising, lack of threading, and slow HTTP retrieval). You code in your interests, and just key a simple shell command to have your pertinent info retrieved for you, instead of all the tedious pointing and clicking. Yahoo Groups is the absolute worst, but it is free and people are seduced to use it. Yahoo Groups is to Usenet as AOL is to a real ISP.

  52. Phil Donahue Is My Cousin by Acidic_Diarrhea · · Score: 2, Insightful

    I don't believe the discussion is about whether screen scraping is feasible or whether it can be stopped through a bit of intelligence; it is instead a discussion of whether or not one company has a right to grab content from a website and redistribute it on their own. Yes, it's possible to stop people from doing the aforementioned grab (of course, as this war escalates you're going to have to start shutting real people out of your content), but should people have the legal right to do the grab? Now, what do you think of that question?

    --
    I hate liberals. If you are a liberal, do not reply.
  53. wipe the foam from your lips by sydlexic · · Score: 2, Insightful

    ashcroft is a thug regardless of his party affiliation. take of your partisan blinders and understand that patriotism != submission.

    1. Re:wipe the foam from your lips by Anonymous Coward · · Score: 0

      So are most Dems... take off your partisan blinders and understand that democrats != freedom.

    2. Re:wipe the foam from your lips by Anonymous Coward · · Score: 0

      jesus, you moron. you still think I'm a democrat. I'm not. nor a republican. I do my thinking for myself.

    3. Re:wipe the foam from your lips by Anonymous Coward · · Score: 0

      Ashcroft is so in tune with the American people that as an incumbent he lost an election to a dead guy.

    4. Re:wipe the foam from your lips by bofkentucky · · Score: 1

      And I suppose that Janet Reno wasn't, why not ask Elian about DOJ goons kicking down the door to send you back to a communist shithole.

      --
      09f911029d74e35bd84156c5635688c0
    5. Re:wipe the foam from your lips by n9hmg · · Score: 1

      take of your partisan blinders
      I could understand that as an archaic grammar... "take of your partisan blinders a portion, and give it unto the abyss", or something like that, but in this context, it just doesn't make any sense.

  54. Section 508 compliance by yerricde · · Score: 1

    Any web site that uses a visual method of authentication as the exclusive method of authentication will be inaccessible to people with vision problems and thus not be compliant with Section 508 of the U.S. Rehabilitation Act, and the entity that operates the web site will lose the U.S. government as a potential customer.

    --
    Will I retire or break 10K?
  55. Utter bullshit. by Civil_Disobedient · · Score: 1

    Complete and utter bullshit. OCR with results of 97%? Sure, if the text is consistent, all in the same direction, with basic fonts, and non-contrasting backgrounds.

    Basically, everything that the "Enter the word/Image" protection does not use. There are a hundred different ways to alter the text to prevent anything but human reasoning to read (decode). The beauty of these systems is that the transformations are computed upon request, which means you have no way of knowing what to expect. You might get backwards letters, or letters that are rotated, or words that are upside-down, with each letter as a different, crazy font (i.e., NOT Times Roman or Courier).

    Sites like PayPal, Yahoo Mail, Ticketmaster and the like are using this system because so far there is no way around it. A computerized system that requires human authentication like this is an absolutely beautiful challenge to the hacking community. I honestly doubt you have a working solution.

    If you did, you would be very, VERY rich, and would be too busy cavorting with naked Playmates on your desert island to write this kind of crap on Slashdot.

    1. Re:Utter bullshit. by eyegone · · Score: 1
      Why aren't these systems illegal under the Americans with Disabilities Act?

      --
      "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety."
    2. Re:Utter bullshit. by ceejayoz · · Score: 1

      Sites like PayPal, Yahoo Mail, Ticketmaster and the like are using this system because so far there is no way around it.

      And banks use signatures because so far there is no way around it, right?

      PayPal etc. use it because it's the best they currently have - that doesn't make it perfect by any means.

      If you did, you would be very, VERY rich

      Uh, no... you'd still need stolen credit card numbers and the ability to not be tracked.

    3. Re:Utter bullshit. by Anonymous Coward · · Score: 0

      Great, another fucking lawyer. You're probably foreign, too, right?

    4. Re:Utter bullshit. by etcpasswd · · Score: 1

      There was an article on slashdot a few days ago about this (sorry, can't find the URL now). Check out CAPTCHA. It doesn't seem to me that the project intends to extend the idea to the complete content of the site though.

    5. Re:Utter bullshit. by Anonymous Coward · · Score: 0

      You're a moron. CAPTCHAs are created by PhDs ... you know, people with a clue.

  56. It will always be possible to scrape their content by Anonym0us+Cow+Herd · · Score: 1

    They may not like it, but there really is little they can do about it.

    Trying to stop content scraping is a losing battle.

    They can try to restrict it to real browsers. But what is a real browser? After all, Mozilla is open source. It executes JavaScript, or anything else they might care to attempt to detect. In the worst case, Mozilla, being open source, could be hacked to go to their site (yes, an inefficient Perl module of course), scrape the content, executing JavaScript, etc., and then from Mozilla's menu, pick "Document Structure" and recover the information from there. All automatically and in the background.

    They could start using Flash. But if it is text in flash, then the flash file can still be parsed. Its format is documented.

    They could start generating a JPEG of the information. That can still be OCR'ed. Efforts to defeat the OCR would just make it harder for the human eyes to recognize. Do you want to look through TV listings in strange fonts, with lines through them, inconsistent or unattractive colors?

    --
    The price of freedom is eternal litigation.
  57. Not completely by mccrew · · Score: 2, Insightful
    Follow that logic, then by having a telephone a diner has granted explicit permission to the telemarketer to interrupt his meal.

    Or more related to the point, here are some real-world scenarios:

    1. Spammer tries to relay through a machine by looking for well-known CGI. For example, I frequently see requests for /cgi-bin/formail.pl, with the Referer: header set to the name of my domain.

    2. Spammer tries to relay through either an HTTP server or HTTP proxy which supports the "CONNECT" method.

    Has the owner of the machine explicitly granted spammer permission to (mis-)use his machine, just because a well-known script is present, or because CONNECT is enabled on the wrong side of the internet connection?

    I would respectfully disagree.

    --
    Hey, Windows users, there is no such thing as "forward" slash, there is only slash and backslash.
    1. Re:Not completely by bwt · · Score: 1

      Follow that logic, then by having a telephone a diner has granted explicit permission to the telemarketer to interrupt his meal.

      That is the law, yes. Unless of course you are in a state with a do-not-call list and your name is on it.

      Has the owner of the machine explicitly granted spammer permission to (mis-)use his machine, just because a well-known script is present, or because CONNECT is enabled on the wrong side of the internet connection?

      Yes.

      He is responsible for the content of his site to the extent that his actions create it. He turned on his box with /cgi-bin/formail.pl present, so the consequences of his actions are that he is allowing anonymous relaying, which in my opinion is an uncool thing to do, but it is his right. Some people intentionally do this, by the way. If he is an idiot and didn't intend to do this, then his own negligence is his problem.

      Of course, I am free to put him on my blacklist so that I don't get any traffic from his site.

    2. Re:Not completely by LMariachi · · Score: 1
      Follow [sic] that logic, then by having a telephone a diner has granted explicit permission to the telemarketer to interrupt his meal.

      The word you're looking for is implicit, not explicit. And yes, the diner has indeed agreed to receive telephone calls from all and sundry by leaving the phone on, absent an explicit request to the contrary (e.g., membership in a Do-Not-Call registry.) The diner is under no obligation to pick up the phone or even leave the ringer on, but it's up to him to decide how to handle the request.

    3. Re:Not completely by mccrew · · Score: 1
      The word you're looking for is implicit, not explicit

      Actually, no.

      The poster to whom I replied was trying to make the point that putting up an HTTP server that responds to arbitrary GET requests constitutes explicit permission.

      To follow this analogy further, are you agreeing with the original post, and saying that putting up an HTTP server that responds to arbitrary GET requests constitutes explicit permission? Would the answer change if the resource in question was listed in a robots.txt exclusion file (i.e. analogous to the do-not-call registry)?

      --
      Hey, Windows users, there is no such thing as "forward" slash, there is only slash and backslash.
    4. Re:Not completely by LMariachi · · Score: 1
      The only thing that constitutes explicit permission is some sort of notice informing you that "You have permission to do foo." Robots.txt files explicitly forbid certain resources, implicitly allowing all others.

      This is just semantics; permission is permission, and while implicit permission may be somewhat easier to argue about in court, at the end of the day it's not really any less compelling than the explicit version.

  58. licences? by zoloto · · Score: 1

    Since when do you need a licence to view content? Does a library tell you to sign an NDA or other contract before you can look at those works or publications? If you don't want people to view your information, don't post it on the web. The internet is for information and entertainment, and not many people pay for it in comparison to the rest of us.

  59. Don't like it? Don't put stuff on the web! by Maul · · Score: 4, Insightful

    If you put something on the web, you have to assume that people are going to access that information in any way that they possibly can.

    I suppose the big complaint is that people might not be viewing the "ads" on pages if they use certain HTTP clients.

    I have a suggestion for the sites that are complaining. If you don't like it, don't put stuff on the web. Write your own custom client-server solution if you don't want people accessing it with certain browsers or other software.

    If you are depending on ad banners for your revenues, you and advertisers are taking a "risk" that people might not see the ads, or that they might not buy advertised products. Tough luck if you lose out on your bet. Hopefully you have a solid way of making money related to whatever service you are providing to make up for it.

    Whining about lost ad revenue and such is the same as whining about losing money in Las Vegas. You should have assessed the risks before playing the game.

    --

    "You spoony bard!" -Tellah

  60. "Buying" by yerricde · · Score: 1

    I have difficulty buying that re-formatting a UI is ``creating a derivative work''.

    If you're not independently wealthy, you'll also "have difficulty buying" the services of an attorney to defend you in a court of law.

    The definition of "derivative work" in US copyright law can be found in 17 USC 101 plus case law with which I am not very familiar because I'm not a copyright lawyer.

    --
    Will I retire or break 10K?
    1. Re:"Buying" by Anonymous Coward · · Score: 0

      If you're not independently wealthy, you'll also "have difficulty buying" the services of an attorney to defend you in a court of law.

      Why would your wealth need to be independent in this instance?

    2. Re:"Buying" by jcast · · Score: 1

      The definition of "derivative work" in US copyright law can be found in 17 USC 101 [cornell.edu] plus case law with which I am not very familiar because I'm not a copyright lawyer.

      So tell me: do you or do you not claim to be able to tell me that the EuroTV module is creating a derivative work?

      Also: are you or are you not always this pedantic with informal English?
      --
      There are reasons why democracy does not work nearly as well as capitalism.
      -- David D. Friedman
  61. Thread at Perlmonks by Neil+Watson · · Score: 2, Informative

    Go Here for discussion last summer over at Perlmonks.

  62. Fairly unenforceable. by nobodyman · · Score: 2, Insightful
    Even if you removed the screenscraping modules you wouldn't even come close to solving the "problem" these website operators are having. Both Microsoft and (I think) Sun have XML APIs that allow you to issue HTTP requests and easily access what the server sends back. Even if you didn't have a high-level "screenscraper", you could always go through the sockets API. Hell, if I want to find out the type of server a website is using I just open a telnet connection to port 80 and type
    GET <document_name> HTTP/1.0
    ...hit the return key twice and boom. Being that easy, I'm sure there are tons of developers that screenscrape without even using a mod.
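    For anyone who has not tried the telnet trick, the same exchange can be sketched with nothing but a plain socket. This is a rough illustration in Python, not anything taken from the CPAN modules; the function names are made up, and the Host header is an addition of this sketch (HTTP/1.0 does not require it, but name-based virtual hosts generally do).

```python
import socket


def build_request(path, host):
    """Return the bytes of a minimal HTTP/1.0 GET request.

    The two trailing empty strings produce the blank line that ends the
    headers, i.e. the "hit the return key twice" step.
    """
    lines = [f"GET {path} HTTP/1.0", f"Host: {host}", "", ""]
    return "\r\n".join(lines).encode("ascii")


def server_header(raw_response):
    """Return the value of the Server: header in a raw response, or None."""
    head = raw_response.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    for line in head.split("\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(":")
        if name.strip().lower() == "server":
            return value.strip()
    return None


def fetch(host, path="/", port=80, timeout=10):
    """Send the request over a plain TCP socket and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(path, host))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # HTTP/1.0: the server closes when it is done
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

    Calling server_header(fetch("example.org")) would report whatever the server chooses to say about itself; example.org is only a placeholder host. The point stands either way: nothing here depends on a scraping module.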

    If a website operator is having their copyrighted content lifted by another site and presented as its own, then that operator can sue using traditional copyright law. If they are having their website slammed because some clueless developer is scraping too often, they can block the IP. But trying to restrict access to the api is heavy-handed and futile.
  63. Yes, I think they can. by bmetzler · · Score: 1

    They are giving you a service for basically free that has enormous costs for them. They expect that page views are just that, people actually viewing the complete page in the way it was presented.

    Now, I also believe that companies should provide SOAP interfaces to their sites so that people can properly integrate the information available. However, they should also charge for this service.

    Maybe if they don't have any other way to get the information screen-scraping is acceptable. But it's much better to have a SOAP interface you can use. Oh, and if they do have a SOAP interface, and you screen-scrape to get the same information without paying for it, you are stealing from them.

    -Brent

  64. Comment removed by account_deleted · · Score: 2, Informative

    Comment removed based on user account deletion

  65. Crossbows don't kill people by yerricde · · Score: 1

    If you think that there wasn't anything as simple as a firearm before firearms were invented, take a look at a crossbow.

    "Crossbows don't kill people; people kill people." "Crossbows don't kill people; arrows kill people." "Crossbows don't kill people; blood loss kills people." The clichés are intended to concentrate attention on different parts of the cause.

    --
    Will I retire or break 10K?
    1. Re:Crossbows don't kill people by Anonymous Coward · · Score: 0

      Ever try to load and fire a crossbow? No, you have not. It's not easy. It's not accurate. It's not 'point and click'.

    2. Re:Crossbows don't kill people by Anonymous Coward · · Score: 0

      Like crossbows of the time, early firearms were neither easy nor accurate.

    3. Re:Crossbows don't kill people by jedidiah · · Score: 1

      Ever try to fire a 9mm, or AR-15? It's not as easy as you make it out to be. It's also not at all accurate if you are a novice.

      Much like a bow, you will hit something and it will likely bleed but it may not be the intended target.

      Bows and crossbows were being used like modern infantry rifles long before firearms became that accurate or had that much range.

      The Battle of Agincourt is a good example of this.

      --
      A Pirate and a Puritan look the same on a balance sheet.
    4. Re:Crossbows don't kill people by n9hmg · · Score: 1

      Ever try to load and fire a crossbow?
      Yes.
      No, you have not.
      Yes, I have.
      It's not easy.
      Yes, it is.
      It's not accurate.
      That depends on your measured definition of accuracy. I can place a bolt in the same 4 inch circle at 30 yards that I can put an arrow into, and my arm doesn't hurt, even if I wait a week to release.
      It's not 'point and click'.

      Ok, you're right on this one. It's point and "chun"... Pardon my weak onomatopoeic skill. If you've heard the sound, you know what I really mean.
      Note: I'm not talking about a poacher's leaf-spring limbed allthread-shooting behemoth here, but light pistol crossbows, midweight hunting bows (with goatsfoot), and belly bows... all a bit faster, and every bit as deadly (to maybe 50 yards) as a percussion-ignition muzzleloader (I've never used a flintlock).

    5. Re:Crossbows don't kill people by TW+Burger · · Score: 1

      Guns don't kill people, I kill people.

  66. In the Spirit of Things by Herkum01 · · Score: 1

    Dear Bill,

    I ask that Microsoft please recall all versions of Windows. They might be used to illegally spread content without my approval.

    Thanks,

    Your copyright overseer,

    Valenti

  67. Captchas by Valdrax · · Score: 4, Interesting

    Actually, this is a field that is quickly being considered a new Turing test for the computer vision field. It is actually very easy to make pictures that humans can read and that machines currently can't. Look up more info on it here.

    --
    If it's for-profit but free, you're not the customer -- you're the product (e.g., the Slashdot Beta's "audience").
    1. Re:Captchas by Herkum01 · · Score: 1

      What I would be interested in is a computer that can actually see and show those 3-D pictures. You know the ones that you have to look at just right, while crossing your eyes and standing on your head to convince yourself that you may be looking at an elephant.

      Now that be useful!

  68. Comment removed by account_deleted · · Score: 1

    Comment removed based on user account deletion

  69. Um, HTTP supports this... by jeremiahstanley · · Score: 1

    I can think of a few creative uses of mod_rewrite to stop people from hot linking your images. Here is a tutorial on how to set it up using something as trivial as .htaccess.

  70. Before they call out their lawyers... by Dr.+Mu · · Score: 1
    ...why don't they exhaust other means? For example, Streetmap doesn't even have a robots.txt file, which as much as says, "Welcome all!" EuroTV does have one, which reads:
    User-agent: *
    Disallow: /scripts/
    Disallow: /scriptswin/
    Disallow: /scriptsrv/
    Disallow: /images/
    That should be enough to block well-behaved scripts, so one point for EuroTV. For the more ill-mannered client, their server can always check the client's UserAgent string. Perl's LWP suite, by default, does not try to mimic a browser and could easily be blocked from access. Only then, if those two measures fail, should the lawyers be called -- but not as a substitute for technical cluefulness.
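
    For what it's worth, here is a minimal sketch of what "well-behaved" could look like on the client side, using Python's standard urllib.robotparser rather than Perl. The rules are the ones quoted above; the function name is_allowed and the user-agent string ExampleScraper/0.1 are made up for illustration, and a real client would fetch robots.txt from the site instead of embedding it.

```python
from urllib.robotparser import RobotFileParser

# The EuroTV rules as quoted in the comment, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /scripts/
Disallow: /scriptswin/
Disallow: /scriptsrv/
Disallow: /images/
"""


def is_allowed(path, user_agent="ExampleScraper/0.1"):
    """Return True if robots.txt permits this (hypothetical) client to fetch path."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/index.html", "/scripts/listing", "/images/logo.gif"):
        print(path, is_allowed(path))
```

    A script that calls something like this before each request stays out of the disallowed directories; nothing forces it to, which is the "ill-mannered client" case.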
  71. Do it the right way... by gabe · · Score: 1

    If the company is unwilling to negotiate over you stripping data from their web pages, perhaps there's another way to get what you want? Why not try asking them to implement web services via XML-RPC or SOAP that would provide the data you desire?

    --
    Gabriel Ricard
  72. generic screen scraper by loudici · · Score: 1

    this news item makes me feel like there is a need for a generic screen scraper plugged into mozilla that would know how to get to a piece of data without having to navigate in mouseclick hell.

    it is most probably a difficult task to make a tool that would be easy to use and powerful (how do you describe the way to parse the NFL webpage and get the score of your favorite team?) but sticking it to the people who create artificial limitation to the way their data can be accessed feels like a reward worth the effort.

    --
    Dev elpizw tipota, dev phoboumai tipota eimai lephteros http://euclidian.org
  73. Missing the point by Anonymous Coward · · Score: 1, Insightful

    When are people going to get it into their heads that public accessibility != public domain? This is, essentially, the argument that both authors and some supporters make, that if it is publicly available then it is within the public domain. It isn't. Books in a library are not in the public domain simply because any schmuck off the street can stroll in and look at them. TV shows and sound recordings broadcast over radio waves are not in the public domain because anyone can pull the signal out of the air. Movies are not public domain because anyone willing to pony up the cash to see one can see it. Correspondingly, webpages are not in the public domain just because any nitwit with a computer and a connection to the 'net can load a webpage.

    Damn straight this is about our rights online. It's an educational example that with rights come responsibilities. Those that abuse those responsibilities lose those rights.


  74. Comment removed by account_deleted · · Score: 3, Interesting

    Comment removed based on user account deletion

  75. I've been wondering about this by Mike+Hicks · · Score: 1

    I've been using MythTV for the past few months, which uses XMLTV to scrape certain sites for TV program guides. I've felt kind of concerned about using that software. I wouldn't mind paying someone for my TV program guide -- I just don't want the provider to know what I'm watching (one tradeoff you have with Tivo, among others).

    In addition, if you have a good site that has a vested interest in providing well-formatted data for you to download, you don't have to worry every day that the website might change its layout or whatever.. I much prefer to use something that has a defined protocol, rather than something that is always in a state of flux..

  76. Re:As Slashdot said yesterday ... by Xformer · · Score: 1

    Yes, but if you spoof the user agent, you get around that easily, as is stated in the comments on this page.

    If a receiver's only clues as to a client's nature reside in an open protocol, the sender's nature can be faked. Mozilla can look like IE (though I wouldn't know why it'd WANT to), IRC bots can look like mIRC, etc.
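
    A rough sketch of why that is, in Python rather than Perl: the User-Agent is just a header the client fills in. The blocklist and the function names server_allows and build_request are hypothetical, and the URL is a placeholder; nothing is actually sent over the network.

```python
from urllib.request import Request

# Hypothetical server-side blocklist keyed on default library user agents.
BLOCKED_PREFIXES = ("libwww-perl", "Python-urllib")


def server_allows(user_agent):
    """Naive check a server might apply to the User-Agent header."""
    return not user_agent.startswith(BLOCKED_PREFIXES)


def build_request(url, user_agent):
    """Build (but do not send) a request with a client-chosen User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    honest = build_request("http://www.example.com/", "libwww-perl/5.69")
    spoofed = build_request("http://www.example.com/", "Mozilla/5.0")
    # urllib stores header names capitalized as "User-agent".
    print(server_allows(honest.get_header("User-agent")))
    print(server_allows(spoofed.get_header("User-agent")))
```

    The server only ever sees the string the client chose to send, so the check stops default configurations and nothing more.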

    --
    All I want is a kind word, a warm bed and unlimited power.
  77. If its on a web site, its in the public domain. by crovira · · Score: 0, Informative

    If its purely internal then they should use a VPN and/or intranet and keep their stuff OFF the web.

    The web is about as private as yelling at the top of your lungs at a karaoke competition. Anybody who thinks they can tell you to listen with one ear or the other is dumb.

    --
    MSBPodcast.com The opinions expressed here are my own. If you don't like 'em... Think up your own stuff.
    1. Re:If its on a web site, its in the public domain. by Afterimage · · Score: 2, Informative

      Ah, no. It is far from the public domain, legally speaking. The author has copyright on the material unless they've explicitly assigned their rights to the public domain. Simply posting material on a website does not accomplish that.

      As a content author, they are free to try to tell their consumers how to consume and use their service (per license, I'm sure). Whether it's *reasonable* or not is another issue entirely.

      --
      --Humpty Dumpty was pushed!
  78. NF Chance by frovingslosh · · Score: 2, Insightful
    If human eyes can read it, someone can write software to parse it.

    Thats what I tell my clients who try to "encrypt" things in this silly manner. I've written packages that defeat those silly "enter the word contained in the image" tests, I've written packages that defeat silly anti-automation scripts.

    It's really not hard.

    Can something that recognizes text in an image be written? Sure. It's just a form of OCR. Can you write one that's able to look at any generic webpage, a mix of text and images, and do what is being asked of a human? I don't believe you can, and it seems a pretty high expectation of any software for the current state of AI. A targeted program for one website I might believe, but such tests for a human are certainly valid protection against web crawling 'bots.

    Which is not to say I in any way agree that screen scraping software in any way is a violation of a website owner's rights. It's not.

    --
    I'm an American. I love this country and the freedoms that we used to have.
  79. Legal? Probably. Rude? Maybe... by Rob+Parkhill · · Score: 4, Insightful

    EuroTV has a robots.txt file that asks to leave the various /scripts directories alone. If this Perl module is just ignoring that robots.txt file, then that is just rude, although I don't see how it is illegal.

    Streetmap doesn't even have a robots.txt file, so I don't see why they are whining about it.

    Although I can see why these websites could get upset. The TV-listing screen scrapers are especially bad at hammering a site relentlessly for a sustained period of time to obtain all of the programming information for a certain broadcast area. The scraper has to hit the site repeatedly to obtain all of the information, since it isn't all displayed on a single page. If any one of these scrapers gets to be really popular, it could kill the site.
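
    One way a scraper could avoid hammering a site is to enforce a minimum gap between requests. This is only a sketch in Python under assumed names: the Throttle class and the 5-second interval are illustrative, not taken from any of the modules under discussion.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the class can be tested
        self._sleep = sleep
        self._last = None        # time at which the last request was released

    def wait(self):
        """Block until the next request is allowed; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay


if __name__ == "__main__":
    throttle = Throttle(min_interval=5.0)
    for page in ("/listings?day=1", "/listings?day=2"):
        throttle.wait()
        print("would fetch", page)  # the actual fetch is left out on purpose
```

    Calling wait() before every page fetch spreads the load out, at the cost of a slower run.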

    Of course, the solution to that is to make all of the listing available as one big chunk to avoid repeated requests. But then the site goes out of business in a few weeks due to lack of advertising revenue.

    I, for one, wish I could buy a subscription to zap2it.com that would give me fast, easy access to the channel listings in, say, XMLTV format. Is $25/year a reasonable fee, considering that I would only hit the site once a day at the most, and grab a single file?

    --
    "Tomorrow's forecast: a few sprinkles of genius with a chance of doom!" - Stewie Griffin
    1. Re:Legal? Probably. Rude? Maybe... by rusty+spoon · · Score: 1

      Try DigiGuide as it does that and much more.

    2. Re:Legal? Probably. Rude? Maybe... by Rob+Parkhill · · Score: 1

      Argh, so close... but no Canadian TV listings there. Bummer.

      --
      "Tomorrow's forecast: a few sprinkles of genius with a chance of doom!" - Stewie Griffin
  80. Re:Use the "Wink Wink, Nudge Nudge" module disclai by anubi · · Score: 1
    Your post reminds me of a case I read about involving another computer professional:

    RIAA: "You have all these CD Burners, all this Kazaa usage, and you claim you are not distributing our copyrighted content???"

    GEEK: "Sir, I have not distributed your stuff. It's for my own use."

    JUDGE: "The evidence shows you have all the equipment and usage logs to clearly show you are in violation."

    GEEK: "Then you may as well accuse me of rape, too, your honor."

    JUDGE: "Are you confessing to rape as well???"

    GEEK: "No, Sir, but I sure as hell have the equipment for it!"

    --
    "Prove all things; hold fast that which is good." [KJV: I Thessalonians 5:21]

  81. stupid business models by g4dget · · Score: 2, Insightful
    First, people come up with stupid business models ("we'll put up copyrighted map data for free and make money from advertising"). Then, when it predictably turns out that people access that data programmatically, they whine.

    Let's not screw up our legal system with provisions to protect bogus business models. If streetmap.co.uk cannot figure out how to make money putting up information openly on the Internet, then either they should make room for someone who can, or maybe there just isn't a market there.

    1. Re:stupid business models by El+Cubano · · Score: 1

      If these stupid business models gain legal protection, how long until something really stupid happens?

      I can see it now:

      Company XYZ purchases advertising space in a subway terminal. They figure out that most people drive cars. Next step is to sue for a mandate that everyone must use the subway because automobile drivers deliberately evade the company's advertising by utilizing an alternate means of transportation.

      Sheesh!

    2. Re:stupid business models by Eric+Savage · · Score: 1

      Stupid business model, in your opinion. I have no problem scanning the ads of a company that provides me a very valuable service (thinking MapQuest here). Some chode decides he wants to violate the AUP and writes a nifty little program that gives people the map without ads. This program ends up on a popular site and now a significant portion of MapQuest's users are providing no revenue. They eventually convert it to a pay-only service, so the chode referenced now just cost me the use of a valuable service or X dollars a month because HE doesn't like ads.

      Thanks.

      --

      This is not the greatest sig in the world, this is just a tribute.
  82. COPYING in copyright law includes volatile RAM by yerricde · · Score: 2, Interesting

    the spirit of copyright laws are restricting COPYING

    The problem here is that a U.S. court decision interpreted a copy in RAM as a "copy" for purposes of copyright law. Thus, when the kernel receives a packet, it COPIES the packet from the network card to the browser's memory, and then the browser COPIES and ADAPTS the HTML into a document tree, COPIES and ADAPTS the document tree into an offscreen bitmap, and COPIES the offscreen bitmap into your video card's RAM.

    And if you're arguing fair use, as I said, you better have the money to pay an attorney to back it up.

    --
    Will I retire or break 10K?
    1. Re:COPYING in copyright law includes volatile RAM by Natalie's+Hot+Grits · · Score: 1

      Too bad fair use allows copies for personal use(which is exactly the case you describe).

      If this was a mainframe computer that was giving all this data to multiple users, you might have a prayer, but no court will convict if someone is copying it for personal use(copying for personal use for convenience factor is acceptable). This by definition, is covered under the "Fair Use Doctrine" which was defined by the US Supreme Court.

      You must also realize that if you are not distributing anything, Fair Use is extremely loose on copy restrictions. However, if you distribute it, you practically can't do ANYTHING to it first without permission.

      --
      Two infinite things: your stupidity and mine. But I'm not sure about the latter. If my sig offends you, I'm sorry.
    2. Re:COPYING in copyright law includes volatile RAM by Anonymous Coward · · Score: 0

      No, US copyright law actually has a SPECIFIC SECTION that says loading software into RAM, etc., isn't a copyright violation. That's why EULAs fall under contract law.

      I mean, it's total bullshit that this would be infringing, anyway. THEIR computer delivered the copy at your request. If you put a "copy dispensing machine" in the middle of the street, you can't sue people for copyright infringement because they use it.

    3. Re:COPYING in copyright law includes volatile RAM by Minna+Kirai · · Score: 1

      a U.S. court decision interpreted a copy in RAM as a "copy" for purposes of copyright law

      Really? Do you have a reference? I've always been told that it was the UK that had that interpretation, and that US law permitted "incidental copying as required for normal use of the product".

    4. Re:COPYING in copyright law includes volatile RAM by Anonymous Coward · · Score: 0

      What about Java? Intellectual property gets copied wholesale whenever the JVM garbage collects the heap.

    5. Re:COPYING in copyright law includes volatile RAM by Anonymous Coward · · Score: 0
      Too bad fair use allows copies for personal use

      No. Fair use *might* allow copies for personal use.

    6. Re:COPYING in copyright law includes volatile RAM by Natalie's+Hot+Grits · · Score: 0, Troll

      No. Fair use *does* allow copies for personal use.

      Fair use has been clearly defined by the US Supreme Court for years. There is no *might* about it.

      --
      Two infinite things: your stupidity and mine. But I'm not sure about the latter. If my sig offends you, I'm sorry.
  83. How about Google, et al? by tjcoyle · · Score: 2, Insightful

    Gosh, I don't know, but don't I see Google redisplaying site content of billions of pages day in and day out?

    Sounds to me like the area's too grey to ascertain right and wrong (I may be, and probably am, ignorant).

    However, these sites definitely have every right to do whatever they wish in order to prevent such use, such as IP blocking, taking some creative evasive measures, OR... securing content they don't feel Joe Public should consume.

    What would happen if say, General Motors suddenly decided that each and every time a GM vehicle shows up in media that it was an abuse of their intellectual property??

    Ptttth!

  84. They're right. by Dthoma · · Score: 0, Troll

    Anything which allows a Perl program to access their website more than once should be banned. Guess we'd better get rid of telnet. And FTP. And web browsers. Heck, let's get rid of ping just to make sure. Better get rid of modems and the programmers while we're at it. Screw it, just ban computers all together. Then there's no way those evil hackers can fuck with their website!

    --

    Note to M1-ers: a curt but otherwise insightful message is not "Flamebait" or "Troll".

  85. I don't have to worry about screen scraping by Powercntrl · · Score: 1

    My monitor has an automatic defrost feature.

    --

    ---
    DRM is like antifreeze, to the MPAA/RIAA it's sweet, to the consumers it's poison.
  86. Comment removed by account_deleted · · Score: 1

    Comment removed based on user account deletion

  87. Fun Banner on EuroTV by Zibu · · Score: 1

    They seem to try (and of course fail) to detect robots: browsing their site with OmniWeb and Safari on a Mac, you'll see a banner at the top of the pages which says:
    "This banner is used to kill robots if you see it on a web page please advise technical@beweb.com".

    Changing the User Agent in Safari displays a "normal" banner.

    I wonder what the genius who has set up this had in mind?

    Anyway, I reported to technical@beweb.com, and I'll see what they mean.

    Do they know LWP::UserAgent? Guess not!

    --
    Me no sig.
  88. Speaking as someone that owns and a TV guide... by rusty+spoon · · Score: 2, Interesting

    I do feel pissed off every time we catch someone stealing our content and using it in their own tools. Copyright notices and T&C's are all well and good but they do NOTHING to stop someone from trawling your site.

    As an owner and publisher I *can* say how my content is to be used because that's the licence I grant, it's MY choice. If I wanted it to be freely copied and used in any way then I would release it into the public domain...and it will be a cold day in hell when that happens.

    The information (in our case TV listings) is costly to collect. I guess the spongers don't realise that or they just don't give a fuck.

    I've found the solution is to a) implement technology to try to prevent it, and b) complain directly to their ISPs.

    Both of the above solutions work but are themselves costly in terms of the technology and the time taken. These are two things we'd rather not spend our time and money on, and they distract us from creating great software.

    At the end of the day if everyone trawled web sites for content then there would be no web sites supplying the content. The people trawling often request thousands or tens of thousands of pages in a very short space of time. The costs in terms of bandwidth and slow service to legitimate customers soon add up.

    Our downloadable software TV guide (DigiGuide) did in the past have unencrypted data files. We didn't honestly expect someone to take our content and build a (possibly competing ) product around our data but they did. The data is now encrypted and should someone crack the encryption then we just change it and their hard work is wasted.

    I feel sorry for web sites like TVGuide.com because they probably think they have some very loyal users that spend a lot of time on their site and read a lot of pages...instead they just have people sucking their content and paying them nothing for it. Ignorance is probably bliss for them.

    1. Re:Speaking as someone that owns and a TV guide... by kindbud · · Score: 1

      As an owner and publisher I *can* say how my content is to be used because that's the licence I grant, it's MY choice.

      That is not an absolute privilege. For instance, you cannot expect to enforce a license that restricts viewing your work through rose-colored glasses. You cannot expect to enforce a license that prohibits parodies of your work. It is NOT all YOUR choice how your work is used.

      The information (in our case TV listings) is costly to collect. I guess the spongers don't realise that or they just don't give a fuck.

      The latter, no doubt, is the case. But then of course, that is not their problem, it is yours.

      These are two things we'd rather not spend our time and money on, and they distract us from creating great software.

      As a storefront merchant, I can sympathize. I'd rather spend my time and money selling goods and restocking my store to keep up with sales. Installing locks on the doors and reporting theft to the police just distract me from selling to my customers.

      Our downloadable software TV guide (DigiGuide) did in the past have unencrypted data files. We didn't honestly expect someone to take our content and build a (possibly competing ) product around our data but they did.

      What was that you said about data that is costly to collect? Or were you just incompetent "in the past"? The School of Hard Knocks has a demanding curriculum, doesn't it?

      I feel sorry for web sites like TVGuide.com because they probably think they have some very loyal users that spend a lot of time on their site and read a lot of pages...instead they just have people sucking their content and paying them nothing for it. Ignorance is probably bliss for them.

      Ah yes, so you think you are smarter than the TVGuide people. They must be just as idiotic as you were "in the past" when you released "costly" data without encryption.

      --
      Edith Keeler Must Die
    2. Re:Speaking as someone that owns and a TV guide... by Anonymous Coward · · Score: 0

      Limit concurrent connections dumbass.

      Jesus CHRIST!

    3. Re:Speaking as someone that owns and a TV guide... by rusty+spoon · · Score: 1

      Sorry AC but that would also stop people (rightly and honestly) using legitimate tools like iSiloX, AvantGo, Mazingo, MobiPocket etc.

      Anyway, there are better ways.

    4. Re:Speaking as someone that owns and a TV guide... by civilizedINTENSITY · · Score: 1

      Complaining directly to their ISPs should get you squat, though. If you coerce ISPs into taking unlawful action, that should be actionable. Also, it's not as though the web had no content before commercial interests became interested. The thing to remember is that you own your content and that gives you some say in how it's distributed, but not mere data. You have no say in terms of what someone does with data even though they learned the facts from you.

    5. Re:Speaking as someone that owns and a TV guide... by rusty+spoon · · Score: 1
      Or were you just incompetent "in the past"?

      No, but we were naive in thinking that people would do the right thing. Experience was as usual a fine teacher.

      Ah yes, so you think you are smarter than the TVGuide people.

      They attempt to charge businesses tens of thousands of dollars per month to use their TV guide data and then they allow (probably due to ignorance) people to use perl scripts (and similar) to scrape their entire site and use it in their own competing products (there are several). Not only that but they give their content away for free on their web site yet they charge good money for the same content in their magazines.

      I couldn't say whether we are smarter than the tvguide.com people but I can say that what they do is plain dumb and it's something we don't do. They would do themselves a favour if they dropped the crummy advertising and switched to subscription. At least that way the site could justify its own existence and a 'user' would be transformed into a 'customer'.

      (Incidentally, removing advertising and viewing 'users' as 'customers' has a big, and positive, impact on the way a web site business operates. From both customer and our POV it's been great.)

  89. Re:paging Jack Valenti (morbid humor) by Dr.+Photo · · Score: 2, Funny
    "Having left the womb, you have, by default, accepted the agreements to all life's conditions."

    Well, there are ways to terminate the agreement, but they ain't pretty...

  90. screen scraping software is completely legal by frovingslosh · · Score: 4, Insightful
    Some /. readers seem to be missing this, but this is not a debate on if it's right to take someone's content and post it elsewhere. (To me it's clearly not without their permission, but that's not the issue here at all so let's not even pretend that it is by debating it.) The issue is "is it legal / proper / legitimate to write software that is capable of looking at the output of a website, by any means - including examining the HTML returned or by capturing the computer screen itself and analyzing that?" Of course it is. Such software in no way pirates a website owner's content, it just gives me additional tools for keeping current with the content of those pages. There are plenty of legitimate uses (the Streetmap reference was perfectly on target for this, just to give one). That someone might abuse such a tool and pirate content is hardly the issue; if it were, every C compiler would also be at fault. People need to stand up against cranks like btek's Kate Sutton who think they can bully everyone else in the world. Simon Batistoni should have never even tried to be reasonable with her, and he should make his tool available again and sue her and her company for the slander she has done to him in the main perl5 bug queue.

    Even if he had provided a tool to make a copy of a map, which he did not, there is nothing at all wrong with making and supplying others with that tool. It's how the tool is used that is the issue, and a tool that has legitimate useful uses can never be allowed to be the target of such a complaint or suit.

    --
    I'm an American. I love this country and the freedoms that we used to have.
  91. Have you done your part?! by philipx · · Score: 1
    Geeks!

    Have you done your part in emailing the two companies and trying to say your thoughts hoping you would steer them to reason ?
    If not, do so.

    If you did, then switch to your anonymous email account and send them some nice hate mail as well...

    --
    __________
    Don't belong. Never join. Think for yourself. Peace!
  92. Umm... by hateddamntruth · · Score: 1

    Yeah, and you're only allowed to look at my house with one eye.

  93. An appropriate response... by X86Daddy · · Score: 1

    ... in the words of Mr. Garrison:

    No!! No, No, No! I'm Mr. Hat and you're, you're a little turd! You hear me?!? You go to hell! You go to hell and you die!

    Who do these people think they are... threatening programmers for writing code to read info that is published openly and allowed to be read from their servers...

    Dipshits don't know what the web is for, but they use it anyway... next they'll threaten to sue makers of scissors or highlighters for providing tools to extract TV listings from the paper.

    1. Re:An appropriate response... by Anonymous Coward · · Score: 0

      It's their stuff, they do what they want with it including deciding how it is accessed.

  94. But, it's a technology! by Quixadhal · · Score: 1

    And we all know that if there isn't a specific law which pertains to an exact version of a piece of software, then that software will bring about the downfall of humanity, right?

    It doesn't matter if the *issue* is already dealt with in lots of other laws, we need to create a new digital law to deal with this CyberCrime, because...errr... think of the children!

    Can we go home now?

  95. reselling? by AssFace · · Score: 1

    if someone is scraping the data and then redistributing it for their own profit - then that would be sketchy to me - but otherwise, I don't see how you look at it matters.

    I have scrapers for a few things, and I doubt the company even notices (since they aren't mass distributed scrapers like a CPAN module is) - but I would think scraping wrong if I then took that data and used it for my own profit while getting it free from the other source. (thinking mainly of stock data - strip it for free from one location, but then charge users to see that data on your site)

    --

    There are some odd things afoot now, in the Villa Straylight.
  96. and what about DISTRIBUTION? by ShinmaWa · · Score: 1

    Nice troll.

    I find it very funny that you cut off the rest of the phrase you quoted -- "COPYING and DISTRIBUTING". Last time I checked, a computer's memory is not -- by nature -- a distribution medium to a mass audience.

    --
    The /. Effect: Thousands of users simultaneously accessing a site to not read its content.
  97. Yes they can... by stubear · · Score: 1

    "Information contained on this server is copyrighted and may not be distributed, modified, reused, re-posted, or otherwise used outside the scope of a WWW client without the express written permission of B. On The Net, owner of the EuroTV site."

    If someone wants to write a program to push this data through another application or website they need to take the time to establish a way to build the database on their own. No one can copyright the data itself but they can keep you from using their data as the source.

    1. Re:Yes they can... by rusty+spoon · · Score: 1

      Actually much of the data (episode and movie descriptions) is unique to us, and if not to us then to our core suppliers.

      So, you are wrong. The times and the programmes names are probably fine but then there is the question of access to the server...it's a terms and conditions of use thing. If you don't like the terms of a particular supplier for your products then shop elsewhere.

  98. Banning vs. Blocking by billstewart · · Score: 3, Insightful
    All sorts of people who don't understand the web or the Internet keep trying to get rules made or bring lawsuits or abuse the DMCA in novel ways because they don't like how their data is being used. In most cases, this is way out of line (as opposed to mildly out of line) because they can simply set their web server not to respond to requests they don't like.

    A classic instance is the "deep linking" cases, where somebody doesn't want to let you see their deep pages except by coming through their front page. Rather than taking this to court, as several content providers have done, and beat up on users one at a time, it's much simpler to check the HTTP-REFERER to find out what page the request came from, and send an appropriate response page to any request that doesn't come from one of their other pages. (Whether that's a 404 or a redirect to the front page or a login screen or whatever depends on the circumstances.)
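
    As a rough illustration of that check, here is a Python sketch. The host name www.example.com and the function respond_to_request are placeholders, and the choice of a redirect to the front page is just one of the options mentioned above. Note that the header is spelled Referer on the wire and is supplied by the client, so it can be omitted or faked.

```python
from urllib.parse import urlparse

SITE_HOST = "www.example.com"  # placeholder for the site's own host name


def respond_to_request(path, referer):
    """Return (status, redirect_location) for a request.

    Deep pages are served only when the Referer header points at one of
    the site's own pages; everything else is bounced to the front page.
    """
    if path == "/":
        return (200, None)  # the front page is always reachable
    referring_host = urlparse(referer).hostname if referer else None
    if referring_host == SITE_HOST:
        return (200, None)  # the visitor came from one of our own pages
    return (302, "/")       # otherwise redirect to the front page


if __name__ == "__main__":
    print(respond_to_request("/deep/page.html", "http://www.example.com/"))
    print(respond_to_request("/deep/page.html", "http://other.example.org/"))
    print(respond_to_request("/deep/page.html", None))
```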

    Screen scrapers are an interesting case for a couple of reasons. One of them is that blind people often use them to feed text-to-speech browsers, so banning them is Extremely Politically Incorrect, as well as rude and stupid. Another is that anybody with a Print-Screen program on their PC can screen-scrape - you're only affecting whether they get ugly bitmaps or friendlier HTML objects. So you not only have to ban custom-tailored CPAN objects, you have to get Microsoft and Linus to break the screen-grabbers in their operating systems.

    The related question "ok, so how *do* I detect and block http requests I don't like?" is left as an exercise to the blocker (and to the people who build workarounds to the blocks, and the people who also block those workarounds, etc...) The classic answers are things like cookies (widely supported "need the cookie to see the page" features seem to be available), ugly URLs that are either time-decaying or dependent on the requester's IP address, etc., or just checking the browser to see which lies it's telling about what kind of browser it is. There's also the robots.txt convention for politely requesting robots to stay away, and Spider traps to hand entertaining things to impolite robots or overly curious humans.
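
    To make the "time-decaying, IP-dependent URL" idea a bit more concrete, here is one possible sketch in Python using an HMAC over the path, the requester's address and a time window. The secret, the 5-minute window and the function names are all assumptions for illustration; this is not how any particular site does it.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical server-side secret
WINDOW_SECONDS = 300                         # assumed 5-minute validity window


def _sign(path, client_ip, window):
    """HMAC over the path, the requester's IP and a time-window number."""
    message = f"{path}|{client_ip}|{window}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_token(path, client_ip, now):
    """Token to embed in a URL handed to client_ip at time now (seconds)."""
    return _sign(path, client_ip, int(now) // WINDOW_SECONDS)


def is_valid(path, client_ip, token, now):
    """Accept tokens issued in the current or the previous time window."""
    window = int(now) // WINDOW_SECONDS
    return any(
        hmac.compare_digest(_sign(path, client_ip, w), token)
        for w in (window, window - 1)
    )


if __name__ == "__main__":
    token = make_token("/map", "192.0.2.1", now=1000)
    print(is_valid("/map", "192.0.2.1", token, now=1000))   # same client, in time
    print(is_valid("/map", "192.0.2.99", token, now=1000))  # different address
    print(is_valid("/map", "192.0.2.1", token, now=1600))   # two windows later
```

    A link copied to another machine, or replayed after the window has passed, stops working; a scraper that requests a fresh page first and follows the link immediately is not stopped, which is the arms race described above.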

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
    1. Re:Banning vs. Blocking by Jeremy+Erwin · · Score: 1
      Spider Trap?


      So that's the technical term for feeding /usr/share/dict/words into google.


      Of course, some websites have fed lists of "dirty" words into google for a wholly different purpose. Can't those website designers understand that some of us have very specific tastes?

  99. What distinguishes a Perl script from Mozilla? by Fiery · · Score: 1

    What is the line in the sand separating a Perl script and Mozilla, in this situation?

    Both collect data from the web, process it, and display it in a form understandable by the user. It just happens that one is more popular than the other.

    If I was to rewrite that module to use AppleScript under OS X to go to their website, fill in the form, and save the image to my hard drive in a desired location, could they say I was violating their terms of service?

    I'm using a web browser to access their service; it so happens that my preferred interface to that web browser is through AppleScript, instead of through the mouse and keyboard. Does that make it unacceptable to use their site?

    1. Re:What distinguishes a Perl script from Mozilla? by Anonymous Coward · · Score: 0

      This is precisely why I think that these companies are going about this all wrong. They are not suing the people who are redistributing the content (which I think would be perfectly legitimate as long as they have copyright notices). They are trying to stop one particular tool.

      Let's say that they get these modules removed from CPAN, it is still not difficult to write the code to do the same thing. They don't gain anything by this action except to annoy a lot of developers.

  100. When is it OK? by KevMar · · Score: 1

    At what point is the information provided to you fair game? You can manually take any information that is made public and view it under your own terms. You can save the page, copy/paste into Notepad, import into Excel and even sort it.

    What about public URLs? Can you visit a public URL as often as you want as long as you don't DoS the server?

    I got a spam that used a tracking URL for an image source. Can I write a script that passes that URL bad tracking information?

    http://216.219.227.69/cgi-bin/open/open.cgi?x=joesmithhm@hotmail.com
    http://216.219.227.69/cgi-bin/open/open.cgi?x=jacksmithhm@hotmail.com
    http://216.219.227.69/cgi-bin/open/open.cgi?x=jimsmithhm@hotmail.com

    --
    Im a gamer, not a grammer major. This post is full of spelling and grammer mistakes.
    1. Re:When is it OK? by rusty+spoon · · Score: 1
      You can manually take any information that is made public and view it under your own terms.

      Um, no, you can't. The publisher decides the terms by which you can use the information. You either agree or disagree.

  101. TV Guides, New Business Models, Screen Scraping by CuriousKangaroo · · Score: 1

    The aggregator of television guide content is in an interesting business... and has an interesting problem. Yes, it is costly to get guide data, and they need to recoup their costs. Screen scraping free guide websites bypasses the ads, which is, of course, how those websites pay for their work.

    But there are so many potential uses for free guide content in an easily transformable format. Besides VCRs or PVRs that can find programs for you, what about websites dedicated to finding your favorite shows, or a simple PDA app that alerts you when it finds out that your favorite movie is due to air? So where is the data? Who would provide it?

    Well, with a standard format (or this) for such data, I believe the producers of that content (or, rather, the distributors... the networks themselves) could provide that data. It's obviously in their best interests for that data to be accurate, and as freely and widely available as possible. They want people to find their shows. Combine it with a simple automated lookup-table translation of network names to your local cable station numbers, and you're set. Of course, something like this could put guide aggregator businesses out of business, and I really doubt it will ever happen... But it probably should!

  102. LED sign by PW2 · · Score: 1

    I use a Betabrite LED sign as one of my web-browsers:
    http://www.remote-control.net/software/ledsign/

    I always thought of html or xml as data that is provided publicly (internet) or privately (intranet) and the reader application uses what it can. (Internet Explorer vs Lynx vs BetabriteHeadlines)

  103. nitpick by Anonymous Coward · · Score: 0

    Linus wrote the kernel, not the OS... and definitely not the GUI.

  104. No man is an island... except in the bath by jo_ham · · Score: 1

    I'll be here all week ladies and gents. Please, try the fish.

  105. I say turn it around... by ZoneGray · · Score: 2, Interesting

    Well, if screen-scraping is illegal (and in some forms, it certainly is), then somebody should sue the people who sell programs that harvest e-mail addresses from web sites.

  106. for reference... by H0NGK0NGPH00EY · · Score: 1

    This is the page that he is referring to. Admittedly, the last two were rather small, but they did have cows, and it was pretty plain to me after looking at just the first two pictures that the theme was cows.

    --
    Do not read this sig.
    1. Re:for reference... by Anonymous Coward · · Score: 0

      Or "farm", "farming", "rural", "milk", "country", "Gateway".

      Everyone's interpretation of pictures is different. That's why art sells so well.

  107. Um, search engines? by Anonymous Coward · · Score: 0

    Unless I'm sadly mistaken, it's CODE that screen scrapes Web pages that's used by Google et al to populate their databases. Without these search engines, a lot of these sites wouldn't be getting any eyeballs at all.

  108. The blind? by Anonymous Coward · · Score: 0

    What about that blind guy who is suing the airlines for not making their site blind-person friendly? Will he be suing these companies too, since they require him to read an image? lol.

  109. TV Listings by jefu · · Score: 1
    I have Dish TV and have proposed to them in email several times that they make the tv listings for all their channels available in some nice downloadable format and with a local viewer program. (I even offered to design and build a first version for them for a reasonable (grin) fee.)

    Dish TV however takes their customer interaction guidelines primarily from Ernestine the Telephone Operator ("Cackle, we don't have to care!").

    Hence their answer has been uniform. They don't bother to answer.

    So I've been using XMLTV to download the listings, and a homegrown XSL transformation to change the listings to a nice grid for viewing in a browser. Works just fine, but I'm sure that the people running the sites with the listings are getting cranky at people doing this or similar things with XMLTV.

    Worse yet, I believe that if they provided the listings in a slightly different format and compressed, the downloads would be much less onerous.

    I suspect they're trying to figure out how to make downloadable listings available and then charge heavily for them - the paper version is now a couple bucks a month ($3.95?) so they're probably counting on being able to charge at least $30/month for the electronic version.

    And they'll undoubtedly justify those high costs by pointing to the load XMLTV places on their systems.

    And when they do, they'll stop anyone from using XMLTV or the like. I suspect that EuroTV is just doing the same thing.

    1. Re:TV Listings by sn0wcrash · · Score: 1

      Some of the freeware TiVo-like programs use XMLTV to gather the program data. So how will that affect these projects? My main consideration for looking into these projects is the monthly fee associated with my TiVo. $12 a month I can justify, but I could really use a second one, and $25 a month or so just doesn't seem worth it to me. And if I have to pay to get information for a free Linux app, where is the advantage? More work for the same money?

  110. they are ass monkeys by laugau · · Score: 1

    Technologists should realize that:

    1) More robots use the internet than people anyways

    2) They cannot dictate how their site is used, just that it is used in a certain way. For instance, if the protocol is HTTP, then as long as a person or a robot uses HTTP they really don't have that much to say.

    They don't have a broke-ass-splintered pirates leg to stand on by saying how to get the information as long as what I do with it after I get it is fair use.

    If they want to be profitable, sell fucking memberships and quit bitching. The Internet ain't free (or am I the only person who fucking learned anything since 2000)

    For instance, let's say I hit a website... do I care if the other end of the connection is a person responding to a terminal request for content and they drag and drop all of my stuff into a network hole? Or does it matter if it is an Apache or an IIS server? It don't matter.... as long as we both follow an agreed upon protocol, they can eat my ass if they think I am going to grab something with a browser when I can have my agent do it.

    I say to those Nazi ass sniffers that they can slurpo my dongo before they tell me how to use a computer.

    Oh, and just so this comment can get modded higher, Bill Gates and Microsoft can swallow along with Dell and Toyota.

    1. Re:they are ass monkeys by jkcity · · Score: 1

      Would you like it if I made an app that reloaded your webpage every 0.2 seconds to see if you had updated it? What would happen if everyone really wanted to know when your site was updated? Would you like it if I distributed it to, say, 1000 people who all ran it 24 hours a day?
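
      To put numbers on that scenario, here is the arithmetic as a few lines of Python. The figures are the ones from this comment (1000 copies, one reload every 0.2 seconds, running around the clock); the variable names are just labels.

```python
# Figures taken from the scenario above; integer milliseconds avoid
# any floating-point rounding in the division.
clients = 1000
interval_ms = 200  # one reload every 0.2 seconds

requests_per_second = clients * 1000 // interval_ms
requests_per_day = requests_per_second * 24 * 60 * 60

print(requests_per_second)  # 5000
print(requests_per_day)     # 432000000
```

      That is 5000 hits a second, or 432 million a day, from people who mostly aren't even looking at the result.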

  111. Encrypted connection [was:Dangerous Precedent] by Anonymous Coward · · Score: 0

    If they really cared about keeping this stuff from the public - surely they would have used SSL (HTTPS), even with an old expired key (as is so common nowadays)? I don't see a single reason to not blame them for this. You might feel another way, but technically you'd be wrong however you turn it.

  112. OT: Re:What falls out the back end of a bull? by lommer · · Score: 2

    You are really thick, aren't you?

    He wrote a package that happened to incorporate some OCR libraries that someone else wrote. He didn't claim to write the OCR libraries. Also, without his package, the OCR libraries wouldn't be applied to defeating this security.

    Why do I waste my time with trolls? Why?!

    1. Re:OT: Re:What falls out the back end of a bull? by Wonko42 · · Score: 1
      He claimed he wrote packages that defeat the aforementioned security methods, when in actuality what he wrote was a medical records package that used OCR libraries to do something not at all related to the security methods he previously claimed he had written software to defeat. Thus: bullshit.

      And I'm the thick one?

    2. Re:OT: Re:What falls out the back end of a bull? by Anonymous Coward · · Score: 0

      Yes, thick and pedantic. Go away.

  113. Birth agreement by lastberserker · · Score: 3, Funny

    Then I assume such agreements do not apply to c-section kids, do they? Oh the ineffable joy of medical techno... oops... does this make c-section DMCA circumvention device? =8-Z

    --
    My other Beowulf cluster is... er...
    1. Re:Birth agreement by n9hmg · · Score: 1

      I never thought I'd find myself wishing for more mod points. This really struck me funny. Thank you.

  114. The internet is a public, unregulated network. by Zone-MR · · Score: 2, Informative

    If they don't want people to use the information the way they do, why the hell are they publishing it on servers connected to a network not controlled by them...

    I mean seriously, are they now telling us what packets and requests we are allowed to send over the internet?

    By hosting an internet server they are accepting that people can connect to it and send the data they like. If they don't like it, they should try to outsmart people with clever protection software, or host it on their own private LANs.

  115. Section 508 compliance MOD PARENT UP by YellowSnow · · Score: 1

    MOD PARENT UP

  116. This seems legal, just like copying a phone book. by raygundan · · Score: 1

    Which has been ruled legal in court, if I recall correctly. While it is legal to copyright the "arrangement" of the information, grabbing somebody's yellow pages and transcribing the whole thing with some re-typesetting makes that list of phone numbers yours, even to redistribute.

    See here for info: http://www.writing-world.com/rights/fair.html

    Search for the phrase "phone book" on the page. Copying "creative Works," like fiction or poetry, is frowned upon, while copying facts is not. The copyright is "in the expression" to quote the site, not in any underlying facts. If facts were copyrightable, we'd all be completely screwed. As it stands right now, we're only 97% screwed.

  117. "Moral rights" in the United States by yerricde · · Score: 1

    Actually, I believe that an art patron is perfectly entitled to vandalize anything that they buy.

    There does exist limited protection of "moral rights" in United States copyright law, in 17 USC 106A, which would prevent such defacements.

    --
    Will I retire or break 10K?
  118. My bad - definition of computer program by yerricde · · Score: 1

    I read it wrong. I thought that by "restricts COPYING and DISTRIBUTING" you meant "restricts COPYING or DISTRIBUTING", or "restricts COPYING and restricts DISTRIBUTING".

    I also thought that copyright law restricted copying works other than computer programs into RAM except subject to limited fair use exemptions. But now, after trying to determine whether HTML counts as a computer program, and then reading and re-reading 17 USC 101, I realize that under a broad interpretation of 101, any work fixed digitally could be termed a "computer program" and subject to the additional limitations of 17 USC 117. Can anybody cite case law pertaining to this?

    --
    Will I retire or break 10K?
  119. money, business models and digital futures of IP by drDugan · · Score: 2, Insightful

    It all comes down to money and the models people have used to force advertisements onto people while they are entertained or educated.

    The cold, hard truth is that the digital future obviates the traditional content control mechanisms used to force consumers to watch ads for content. The exact same lines are playing out on the web, on TV, in music, movies, magazines -- everywhere information can be digitized and presented in ways not tied to physical mediums.

    The (now old) business models that the digital methods circumvent will eventually be redefined. Short term, laws will support them, because the industries have enough money and clout to cause the laws to happen. Long term, though, people will no longer stand for the absurd, one-sided contract with society that is our current IP system.

    This is a vague comment, quickly written -- but I see here the exact same theme played out over and over in recent years. Free communication (amortized) + 'digitizable' items of value => lack of control by provider for profits. This is yet another example.

  120. Pedantry pays by yerricde · · Score: 1

    do you or do you not claim

    I claim nothing. I have never even set foot inside a law school. I just wondered if anybody more familiar with the case law could elaborate, particularly about how much originality it takes to make a derivative work (as opposed to de minimis alterations).

    are you or are you not always this pedantic with informal english?

    No. But when it comes to the fine points that win or lose a lawsuit, pedantry pays.

    --
    Will I retire or break 10K?
    1. Re:Pedantry pays by Anonymous Coward · · Score: 0

      Don't bother arguing with jcast, because he is a known troll, and quite adept, as well. Check his posting history or make the same mistake that I did.

  121. you know, this interweb dealy worked much better by Anonymous Coward · · Score: 0

    before the lawyers and marketers got involved. We've had problems with: linking and deep linking, DRM, censorship, content legal in one locale and illegal but still accessible in others, domain name speculation, parody sites, libel (maybe even slander), etc. Of course many of these problems are caused by us litigious Americans, what with our ancient (technophobe) judges and all, but Germany, France, China, Saudi Arabia, etc. are also guilty. BTW, my employer wrote a screen scraper, and periodically the source IPs would be blocked, even though we had a contract with the target website(s), so we would renumber the scrapers every time the block happened. My guess is the target's tech people never spoke to the business development people.

  122. That's the idea by siskbc · · Score: 1
    The idea is a question that is stupid simple for a person who has existed in this world, but impossible for a computer that lacks the proper context.

    Like this: Show a picture of a tree. The user fills in the blank. T-R-E-E. Any dipshit would get that right. Hell, even give them the T. I don't think a computer would get it in three tries - after that, do a 1 hour IP lockout. That should also prevent "guessing."
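
    The "three tries, then a one-hour IP lockout" part is easy to sketch in Python. This is only an illustration under my own assumptions: the class name, the in-memory dictionaries, and the injectable clock are made up, and a real site would need shared storage and some thought about many users behind one address.

```python
import time

MAX_TRIES = 3
LOCKOUT_SECONDS = 60 * 60  # the one-hour lockout suggested above


class GuessLimiter:
    """Track wrong answers per IP and lock an IP out after three misses."""

    def __init__(self, clock=time.time):
        self._clock = clock          # injectable so the logic can be tested
        self._failures = {}          # ip -> number of wrong answers so far
        self._locked_until = {}      # ip -> time at which the lockout ends

    def is_locked(self, ip):
        return self._clock() < self._locked_until.get(ip, 0)

    def check(self, ip, answer, expected):
        """Return True only for a correct answer from an IP that is not locked."""
        if self.is_locked(ip):
            return False
        if answer.strip().lower() == expected.lower():
            self._failures.pop(ip, None)
            return True
        self._failures[ip] = self._failures.get(ip, 0) + 1
        if self._failures[ip] >= MAX_TRIES:
            self._locked_until[ip] = self._clock() + LOCKOUT_SECONDS
            self._failures.pop(ip, None)
        return False
```

    With the tree picture, `check(ip, "tree", "tree")` passes on the first try for a person, while three wrong guesses from the same address shut it out for the next hour.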

    If you had a bunch of such problems, it would make it pretty tough. Would some of them be solvable some of the time? Maybe. But staying ahead of computers in the Turing test has ALWAYS been very easy.

    But I know what you mean about daytime talk shows. ;)

    --

    -Looking for a job as a materials chemist or multivariat

  123. A Related Issue - Opera Browser Ads by Master+of+Transhuman · · Score: 1

    Somebody came up with a proggie that puts up a program launcher window that covers the Opera ads precisely.

    Some people in the freeware newsgroup complained that this violated the user's agreement with Opera allowing use of the ad-supported version of Opera.

    I argued that since Opera's EULA makes absolutely NO mention of any requirement to view the ads (or even not to alter them), there was NO deal between the end user and Opera concerning the ads in any way. And even if there was, it would be ridiculous to demand that people actually view the ads.

    There have been some ad-supported programs, IIRC, that actually demanded that people click on the ads before the program would function. I suspect most of these programs died a quick death in the marketplace.

    The proggie was just a more sophisticated way of putting tape over that section of the monitor...

    --
    Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
  124. Whilst I don't believe it's illegal by goldcd · · Score: 1

    to use an online resource, I do think there is a moral duty to pay the provider of the information in some way. Maybe if the provider actually created an easy API to retrieve information, they could then ask coders to reciprocate the gesture, for example by hyperlinking to them or embedding supplied advertisements. Unless a compromise can be reached, we get a confrontational attitude where both sides waste resources: the data source trying to protect itself, and the other side continually trying to break the protection. CDDB is a nice example: it's embedded into loads of programs, but they all ask (or are supposed to ask) you to supply CDDB your email address before it'll become functional.

    1. Re:Whilst I don't believe it's illegal by base3 · · Score: 1
      CDDB is a pretty lousy example of a case in which users would have a "moral duty" to pay in some way, since CDDB misappropriated the labor of thousands of people by taking a database they got them to create when the program and database were GPLd, then changing the terms later.

      Of course, FreeDB works great, and Gracenote can go pack fudge.

      --
      One CPU cycle wasted on digital restrictions management is ONE TOO MANY.
  125. Palladium would fix it by SiliconEntity · · Score: 1

    Palladium technology (or possibly TCPA) could fix this, without any new laws or arguing over copyright terms.

    With Palladium, the server site could verify that the client (that's you) is running an approved (by the site) web browser and not a screen scraper. In order to access the site you would have to run a Palladium OS and run one of the web browsers the site owner accepted.

    Your freedom would be complete. You could choose not to view the site, or you could choose to view it under conditions which are mutually acceptable between you and the site owner. That's the same basic bargain being offered in every voluntary transaction in the world.

    1. Re:Palladium would fix it by Anonymous Coward · · Score: 0
      One more reason Pd/TCPA is going to go over like a fart in church.

      ~~~

  126. ah what is WAP? by linuxislandsucks · · Score: 1

    Someone should remind these two companies that systems that display their sites as WAP content to mobile phones are actually scraping systems!

    Jeeze how stupid can people get!

    --
    Don't Tread on OpenSource
  127. Uhh... good luck defining what a "browser" is... by SmurfButcher+Bob · · Score: 1

    One of my scrapers wraps IE. Another is a script that runs inside an html page, and scrapes using IE inter-zone bugs with IFrames. I'd love to see the language that precludes either one of those techniques from being used...

    --

    help me i've cloned myself and can't remember which one I am

  128. segfault by Anonymous Coward · · Score: 0

    Good one. The example did say IE 4 "or higher".

    Congratulations, your custom software is "higher".

    Or, "installed on your system" doesn't mean you need to use it. Use another browser.

    Especially if you can't install IE on your OS. In a way that works, anyway.

  129. Mirror (while it lasts) by kalos · · Score: 1

    http://cpan.azc.uam.mx/modules/by-category/15_World_Wide_Web_HTML_HTTP_CGI/WWW/JOHANVDB/

    May whatever you hold holy bless Google. =)

  130. PROFIT!!! by Anonymous Coward · · Score: 0

    1. Restrict http access use to certain programs
    2. See number 1
    3. ...
    4. PROFIT!!!

  131. copy right???? by Anonymous Coward · · Score: 0

    by offering your work on the web, you are posting it to the public domain and have fuck-all right to say what happens to it afterward.

    At least that's the way it will be when I rule the web!

    someday...

  132. Stupid, but true by Angst+Badger · · Score: 2, Interesting

    Under the current state of US law, unauthorized access to a computer system is a federal crime. (I can't speak to EU laws, but I suspect parallels exist.) If Company X says, "You must use Internet Explorer 5.5 to access this site," then you must use IE 5.5. Of course, it would be just plain stupid to do so, but it's their computer system, and they get to decide who is authorized.

    To judge from most of the comments here, the fact that it is incredibly stupid to impose such restrictions has obscured what is actually a legally unambiguous situation. Just because it's dumb doesn't mean it's not legal.

    That an http server is nominally "public" doesn't mean diddly here. Any number of http servers provide for member- or employee-only access. The brick and mortar parallel would be those signs that say things like, "No shirt, no shoes, no service."

    It is surprising that so few people have touched on the reason why companies might object to the distribution of Perl modules designed to harvest data from their sites: bandwidth costs and site performance. It doesn't take too many cron jobs banging on a site every minute -- and being ignored by their users most of the time -- to degrade site performance for "live" users and run up steep bandwidth bills.

    Now, there is certainly no legal basis for Company X to demand that CPAN remove the modules, though it is hardly out of line to ask nicely. But there are firm legal grounds to prohibit anyone from actually using those modules.

    Legal action is probably the wrong way to handle this, though. Having written fairly complicated web scrapers before, I know how easy it would be to make a site virtually impossible to harvest. Rather than make a big stink about the Perl programmers who contribute to CPAN, Company X would be well-advised to hire a good Perl programmer to thwart automated harvesters.

    --
    Proud member of the Weirdo-American community.
    1. Re:Stupid, but true by base3 · · Score: 1

      Bullshit. The open port and running HTTP server constitute permission to connect. Of course, they have money, and the victim of such a ridiculous prosecution doesn't, so it would probably go the way you describe--but not because that's the law.

      --
      One CPU cycle wasted on digital restrictions management is ONE TOO MANY.
  133. Shrinkwrap License by FireBreathingDog · · Score: 1

    When the water breaks, it is akin to breaking a shrinkwrap, so consent is implied.

  134. Some sites use text ads by ledestin · · Score: 1

    It's not the ads that I dislike, it's the way they're made. Flashy, blinking ads are annoying; they drove me to turn Flash off. OTOH, I don't mind text ads and won't mind clicking on one if I'm interested.

    I've seen some sites start using text ads, lwn.net for example. And google has been doing it for a long time now, though it gets more annoying with more ads per page. Well, I guess they're successful *grin*

  135. "Re:Missing the point" is missing the point by civilizedINTENSITY · · Score: 1

    Yeah, but it matters if there is creative expression involved. No one is saying a webpage is in the public domain, but rather that no one can copyright a fact. TV shows are not in the public domain, but no one can stop me from reporting that Joe Blow said "such and such" last night at 8:00 PST on channel 5. Thus the responsibilities to which you refer do not exist and are contrary to the public good with respect to the topic at hand. So remember, when people give up their rights because they are made to feel it's the responsible thing to do, they are being irresponsible.

  136. More than law by amcguinn · · Score: 1
    1. Making and using these is totally legal
    2. Some ad-supported free services, like Streetmap, are very valuable. It would be a loss, socially, if they failed because people stripped off the ads.
    3. It's still a good thing that it's legal. The social loss of losing a few free ad-supported services is small compared to the huge social cost of intrusive and expensive law-enforcement that would be needed to protect them. Nothing I do in my own room with the door locked should be illegal. (An extreme view, but I hold it.)
    4. The way to get the best of both worlds is for people to voluntarily refrain from abusing worthwhile free services.
    5. What constitutes "abuse" or "worthwhile" are pretty subjective; that's another reason why it is important these judgements are left to individuals and not made by governments.
    6. My conclusion is to say: don't blindly stand on your "rights", even as you argue (rightly) to defend them. Bear in mind that a good legal system leaves you a lot of responsibility to decide what is right or wrong. Consider your own responsibility as well as the law.
  137. Oh please. by jotaeleemeese · · Score: 1

    In the UK most people killed by firearms are criminals killing each other, and there are just a few such incidents every year. It is similarly so in most other places with gun restrictions.

    In the US any retard can kill somebody else, they could not if they did not have a gun. It is similarly so in other places with lax gun restrictions or no restrictions at all.

    Obviously the cause of so much violence is the widespread availability of guns.

    You confuse cause and symptom; that tragic mistake committed by US society is costing people in your country their lives every day.

    --
    IANAL but write like a drunk one.
    1. Re:Oh please. by numbsafari · · Score: 1, Flamebait

      I hate to tell you this, but you are completely wrong.

      The cause of so much violence is poor leadership, lax morals, the advent of a philosophy that tells people they are not responsible for their actions.

      You didn't kill him, the gun did.

      You didn't kill him, the guy who sold you the gun (who also sells guns to millions of other people who didn't kill anyone)--he killed him.

      You didn't gamble all your money away, drink too much, hit your wife, and sodomize your daughter. It was the dirty corporate scum who exploited you by paying you, out of the kindness of their filthy hearts, 10 times what they could pay someone in Costa Rica--who needs it much worse than you do--for the same job.

      You didn't choose to eat too much at McDonald's, McDonald's used mind control techniques to make you eat too much at McDonald's.

      You didn't fail out of school, it was the elitist punk who studied too much and went to Harvard that made you fail out of school.

      Because the government spends all its time trying to tell you how to act all day long, it doesn't have time to punish the real criminals, or even defend its own borders. Because you let the government make all your decisions for you, you aren't responsible for any of the outcomes of your actions.

      When you take control of someone's life, you are committing a crime worse than murder.

      Europe deserves what it gets. France, Germany and Belgium will be Third World countries by the end of this century. If the US isn't careful to not follow their example, we won't be far behind.

      Of course, you probably think I'm some simple-minded, hate-filled warmonger. The fact of the matter is that I'm college educated (graduated with honors), give money to charities (probably more than you do), and generally don't like war or violence--I don't even own any guns. It makes me sad to see a continent with such a proud history--a history that is really my own as well--fall into such disrepair. Even in America, it makes me sad to see such a powerful country fall apart.

      I fight every chance I get to promote the ideals that will save us; while people like you roll over and let it all slip away.

    2. Re:Oh please. by Fat+Casper · · Score: 1
      Obviously the cause of so much violence is the widespread availability of guns.

      You confuse cause and symptom; that tragic mistake committed by US society is costing people in your country their lives every day.

      Obviously the cause of such a silly comparison is the fact that you have an axe to grind.

      In the US any retard can kill somebody else, they could not if they did not have a gun.

      Obviously, you haven't met our millions of drunk drivers with SUVs.

      In the UK most people killed by firearms are criminals killing each other...

      Funny, most killings in the US are by criminals, too, killing being, by and large, a criminal act. Guns are an easy tool, and if I decide that you're dying today then it's over and the lack of a gun isn't going to matter. The problem is idiots wanting to kill people; the symptom is the tool they choose to do it with. It was an armed citizenry that threw the UK's silly monarchy out, and Samuel Colt who made all men equal. All things considered, we're better off keeping them.

      Why do you consider software to be the problem when it is simply the tool used by some to violate a law (I know, I ask from the US, where they actually do this)? Do you take personal responsibility for your own actions or blame something else? I'll bet that you take credit for your own accomplishments, don't you? Dishonesty is a problem, too: minimizing it starts with being honest with yourself.

      --
      I spent a year in Iraq looking for WMD and all I found was this lousy sig.
    3. Re:Oh please. by Mr.+Slippery · · Score: 1
      In the US any retard can kill somebody else, they could not if they did not have a gun...Obviously the cause of so much violence is the widespread availability of guns.

      Actually:

      • Canada has more guns per capita than the U.S.;
      • Killing people without firearms is not difficult - about 30% of U.S. homicides do not use a firearm (per capita, we have many more non-firearm homicides than the U.K. has total homicides), and the majority of violent crimes do not involve a firearm;
      • Defensive uses of firearms far outnumber homicides by means of firearms; and,
      • Gun laws keep firearms away from violent criminals about as well as drug laws keep drugs away from junkies.

      Our problem lies not within our guns, but ourselves: our culture of violence and fear. Let me recommend Michael Moore's recent movie Bowling For Columbine.

      --
      Tom Swiss | the infamous tms | my blog
      You cannot wash away blood with blood
    4. Re:Oh please. by Anonymous Coward · · Score: 0

      You would benefit greatly from living overseas for a few years.

  138. And that law banning HTTP GET... by jotaeleemeese · · Score: 1

    ... is written where?

    --
    IANAL but write like a drunk one.
  139. I was mistaken by Anonymous Coward · · Score: 0

    I was mistaken as to the interpretation of USA law.

  140. Upskirt web browsing? by sbakker · · Score: 1

    Well, in Seattle they have decided it is OK to photograph up the skirt of a woman on the street. But I guess looking up a web site's skirt is just too disgusting and lecherous...

  141. Ethics by Anonymous Coward · · Score: 0

    Personally, if I want to re-use content from an ad-supported site, I'd present the ad to the eyeballs. The company is being paid to show the ad along with the content, and I won't steal the content.

    However, "Click the Monkey to Win!" isn't presenting any advertising information, and if the screen has no Web browser...the advertiser isn't going to get much business. But then, they don't get any business from my not clicking on the stupid image anyway. They'll have to present me with info which is interesting rather than stupid.

  142. One Quote Says It All by serutan · · Score: 1

    This one quote by the PERL developer says more than all the comments on Slashdot about how the real world works:

    Today, they threatened me with a law-suit for writing this module. I would like to have the WWW::EuroTV module removed as soon as possible from CPAN and any of its mirrors.

  143. Does Kelly vs. Arriba hold the answer? by Anonymous Coward · · Score: 0
    Isn't scraping and redisplaying essentially equivalent to "framing" somebody else's content, as described clearly in the Kelly vs. Arriba judgement (PDF) in the 9th circuit court of appeals?

    In the finding, it says things like "By giving users access to Kelly's full size images on its own web site, Arriba harms all of Kelly's markets. Users will no longer have to go to Kelly's web site to see the full-sized images, thereby deterring people from visiting his web site."

    Is scraping and reformatting the data you get equivalent to framing somebody else's content on your web site? Certainly the point of your modification is at a different point, and you are writing code to scrape and reformat, rather than HTML to make the browser load content from two different locations to simulate a single web page.

    I'd love to be proved wrong on this....

  144. Comment removed by account_deleted · · Score: 1

    Comment removed based on user account deletion

  145. EuroTV offers free Javascript API! by Anonymous Coward · · Score: 0

    Isn't this a strange fact? They threaten people for writing an API for EuroTV, but they themselves offer a free API using JavaScript to display TV listings on your site! You can even use your own design, and I don't see any image you have to display on your site to give them credit. Okay, one catch though: the listing includes a link you can click that will bring you to the detail page of the show.

    This is what they say about the javascript API:
    To have the TV tonight programs in their native language ! on your site, just copy the following line hereunder. Don't forget to change the parameters to adapt the country. Feel free to adapt the header and footer of your page before and after the include to adapt the look and feel to the one of your site.

    You can read more here:
    http://www.eurotv.com/scripts/jsinclude/Exemple_TV_Tonight.htm

  146. Re1-hour ip lockout won't work by tomhudson · · Score: 1
    All they have to do, since most IPs are dynamic, is have the script disconnect and reconnect, & check that their IP has changed, and try again.

    Then, of course someone else (whoever gets the IP from the shared IP pool) is locked from that site for an hour.

    But with the tree example - what if they try the following 3 answers, also valid responses, but not right?

    1. bush
    2. elm
    3. forest
    Each response is a matter of the viewer's perspective.
    1. Re:Re1-hour ip lockout won't work by siskbc · · Score: 1
      All they have to do, since most IPs are dynamic, is have the script disconnect and reconnect, & check that their IP has changed, and try again.

      Well, you'd have to be smarter than that. Basically, what ends up happening is that an entire subnet gets locked out if someone starts playing games. Basically, the first time someone gets locked, you only lock that IP. The second time from the same subnet within a short time, lock the entire subnet. People may get screwed, but them's the breaks.
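
      The escalation described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name, the 10-minute escalation window, and the choice of a /24 as "the subnet" are all hypothetical, and nothing here is taken from any real site's implementation. Any second offense from the same /24 inside the window escalates, including one from the same IP.

```python
import ipaddress
import time

LOCK_SECONDS = 3600      # one-hour lockout, as discussed in the thread
ESCALATE_WINDOW = 600    # assumption: "a short time" taken as 10 minutes
SUBNET_PREFIX = 24       # assumption: a /24 stands in for "the subnet"


class Lockout:
    """Lock a single IP on the first offense, the whole subnet on a repeat."""

    def __init__(self):
        self.locked_ips = {}     # ip string -> time the lock expires
        self.locked_nets = {}    # network -> time the lock expires
        self.last_offense = {}   # network -> time of the most recent offense

    @staticmethod
    def _net(ip):
        # strict=False lets us pass a host address and get its network back.
        return ipaddress.ip_network(f"{ip}/{SUBNET_PREFIX}", strict=False)

    def report_offense(self, ip, now=None):
        now = time.time() if now is None else now
        net = self._net(ip)
        previous = self.last_offense.get(net)
        self.last_offense[net] = now
        if previous is not None and now - previous <= ESCALATE_WINDOW:
            # Second offense from this subnet within the window: lock it all.
            self.locked_nets[net] = now + LOCK_SECONDS
        else:
            # First offense (or an old one): lock only the offending IP.
            self.locked_ips[ip] = now + LOCK_SECONDS

    def is_locked(self, ip, now=None):
        now = time.time() if now is None else now
        if self.locked_ips.get(ip, 0) > now:
            return True
        return self.locked_nets.get(self._net(ip), 0) > now
```

      The `now` parameter exists only so the behaviour can be checked without waiting an hour; a real deployment would also need to expire old entries.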

      But with the tree example - what if they try the following 3 answers, also valid responses, but not right?

      1. bush
      2. elm
      3. forest

      Give 'em a short answer blank with four spaces. And give 'em the T. Computer probably STILL won't get it. Have a thousand simple problems like this with about 10 different pictures each, and rotate them occasionally, and a computer will do no better than occasionally get "lucky." That won't worry paypal or yahoo - they don't need airtight security for this, just general discouragement of bots - and such an approach, I think, would do it.

      --

      -Looking for a job as a materials chemist or multivariat

    2. Re:Re1-hour ip lockout won't work by tomhudson · · Score: 1
      Great tactic to help anyone do a DoS - just get yourself an account at a big ISP, then write a script to, every hour, make 2 guesses, and make sure they're wrong (use random chars, for ex.).

      How many common nouns are there that begin with a 't' and have 4 letters? Not that many, and we can't be too fancy. (Download a Scrabble word list.)

    3. Re:Re1-hour ip lockout won't work by siskbc · · Score: 1
      Great tactic to help anyone do a DoS - just get yourself an account at a big ISP, then write a script to, every hour, make 2 guesses, and make sure they're wrong (use random chars, for ex.)

      First, that would have to be the LAMEST DoS attack ever. Second, such practices are already used - I didn't make it up. And there seems to be no flurry of such attacks - I mean come on, how hard are you going to work to keep 255 people from getting a new account at yahoo? If any of them WANT to right then anyway...that's not even worth the effort.

      How many common nouns are there that begin with a 't' and have 4 letters? Not that many, and we can't be too fancy. (Download a Scrabble word list.)

      Enough to make it hard to guess in three tries. And I just made up an example off the top of my head, I'm not exactly engineering this. So pick six letter examples instead. Computers will be able to guess less than 1% of the time.

      As I said before, you're missing the picture - this doesn't HAVE to be secure. All you have to do is make it so that a person, sitting at a computer, could make new accounts quicker than the bot would. That alone is enough to discourage the use of bots.

      --

      -Looking for a job as a materials chemist or multivariat

    4. Re:Re1-hour ip lockout won't work by tomhudson · · Score: 1
      Even 1% of the time is sufficient, since CPU cycles are basically free. Random passwords of 8 characters can be broken over the weekend nowadays :-)

      However, your example - I downloaded a Scrabble wordlist. Of the 306 4-letter words (grep is your friend :-) that begin with 'T', most are adjectives or verbs, which you can't evoke in a viewer's mind with a simple icon. Take the word 'TEXT' or 'TYPE' for example. The only one that wasn't too ambiguous was 'tank' (and only if you show a picture of an Abrams A1A or whatever). There aren't that many nouns that would work, and if you give 1 of the letters, you've pretty much narrowed down the solution space.
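
      For anyone who wants to repeat the word-list count, here is a minimal sketch of the filtering step. The function name and the tiny sample list are made up for illustration; the count of 306 came from the actual Scrabble list, which is not reproduced here.

```python
def four_letter_t_words(words):
    """Return the distinct 4-letter words starting with 't', lowercased."""
    cleaned = (w.strip().lower() for w in words)
    return sorted({w for w in cleaned if len(w) == 4 and w.startswith("t")})


# Illustrative sample only; a real run would iterate over the lines of a
# downloaded word list file instead.
sample = ["TANK", "tree", "Text", "type", "elm", "forest", "bush", "taxi", "toast"]
print(four_letter_t_words(sample))
```

      Picking out which of the matches are picturable nouns still has to be done by eye, which is the point being made about the solution space.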

      The bots work for free, so it doesn't matter if they only succeed 0.0001% of the time - the cost is still nothing. This is the same theory spam uses :-(

    5. Re:Re1-hour ip lockout won't work by siskbc · · Score: 1
      OK, two things here basically, right? We have the weakness of 4-letter words, and we have the "1% is enough" argument.

      As for the four letter, ok - let's make them 6-letter. That should give us enough ground.

      As for the 1%, remember this has to be considered with the lockout scheme. Let's assume that the lockout is as lenient as possible - that is, unique IPs only (no subnet blocking). Under that, he could basically try 255 times per hour. Assuming a 1% guess rate (which should be generous), we have 2 accounts per hour. That ain't much, even considering automation. Add in subnet blocking, and it drops a ton - 2 tries per hour (it takes 2 tries to establish a subnet-based attack), or one new account every few DAYS.

      There will be a way to make this work - from what I've heard, this is the way that a lot of places (like yahoo) will be going - easier but more "fuzzy" Turing tests.

      Oh, and if you don't like the "fill in the blank" tests, how about something like this. Show a picture and require a human to guess contextual information, at which computers are horrible. Say, a picture of a guy in a suit, and ask if he's going to "a meeting," "a party," "the beach"... Have five choices, and make the user get 4 questions right out of 5. Less than 1% chance of guessing it for a computer. (0.64% actually).
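
      A quick check of that figure, assuming a bot guesses uniformly at random among five choices on each of five independent questions (the function names are just for illustration). Exactly 4 of 5 right works out to 0.64%; if 5 of 5 also counts as a pass, the chance is slightly higher, about 0.67%, still under 1%.

```python
from math import comb


def p_exactly(k, n, p):
    """Binomial probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def p_at_least(k, n, p):
    """Probability of k or more successes in n independent trials."""
    return sum(p_exactly(i, n, p) for i in range(k, n + 1))


guess = 1 / 5  # one right answer out of five choices
print(f"exactly 4 of 5:  {p_exactly(4, 5, guess):.4%}")
print(f"at least 4 of 5: {p_at_least(4, 5, guess):.4%}")
```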

      --

      -Looking for a job as a materials chemist or multivariat

    6. Re:Re1-hour ip lockout won't work by tomhudson · · Score: 1
      Try again:
      • Again, we have the problem that the subset of objects (nouns) that can be depicted easily and interpreted predictably by the end user is relatively small, and therefore easy to predict based on word length and the clues provided to the end user (1 letter anywhere in the word).
      • To do a DoS, all he has to do is make 5 connections at the same time (the system sees more than the limit of requests from 1 ip and blocks it), and re-connect to get a fresh IP from the pool. Total time - 3 secs max at the machine I'm sitting at. This gives us 20 per min, 1200 per hour. After a few weeks of this showing up in their logs, they'll ban the whole sub-net. In the meantime, this works out to (at 1% success) 12 per hour, or +/-300 per day
      • Add in subnet blocking, and the DoS becomes even more effective with less work.
      • With fill-in-the-blank tests, a success rate of 0.64% by random trial-and-error is not very secure, either. You wouldn't trust your email account to such security, would you?
      Remember, the whole context of this thread was screen-scraping. There's no way to prevent it, esp. if you have a human user to monitor what's going on, or fill-in-the-blanks to allow access. After all, you're not going to have the user do this on every screen. People would go elsewhere, defeating the purpose of a web presence.
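
      The throughput estimates traded back and forth in this thread are simple multiplication, sketched here so the numbers can be re-run with other assumptions. The function name is made up; the 3-second reconnect time, the 255-address subnet, and the 1% success rate are the posters' own figures, not measurements.

```python
def accounts_per_hour(attempts_per_hour, success_rate):
    """Expected successful sign-ups per hour for a blindly guessing bot."""
    return attempts_per_hour * success_rate


# One attempt every 3 seconds with a fresh IP: 20 per minute, 1200 per hour.
fast_attempts = (60 // 3) * 60
fast = accounts_per_hour(fast_attempts, 0.01)
print(fast_attempts, fast, fast * 24)   # attempts/hour, accounts/hour, accounts/day

# Per-IP lockout only, limited to one try per address in a 255-host subnet.
slow = accounts_per_hour(255, 0.01)
print(slow)                             # about 2.5 accounts per hour
```

      At 1% the first case gives 12 per hour, or 288 per day, which is where the "+/-300 per day" figure comes from.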

      A good example: I scraped 40 gigs of data over a period of 1 week to create a database of 15 million Canadians for one business. There's no way, given proper incentives, you're going to keep me from that data :-)

    7. Re:Re1-hour ip lockout won't work by siskbc · · Score: 1
      Again, 6 letters is OK, there's a whole lot of 'em - and as I mentioned, you could punt out of that if it proved unworkable and find another solution. I don't claim to be the Turing expert, but there will be problems that can't be brute-forced. Again, you can't get a dictionary to solve all "fuzzy" problems (recall my last point about the computer answering a series of multiple choice questions regarding contextual information from pictures or even short passages).

      As for the DoS stuff, yes, he can prevent ONE SUBNET from signing up with yahoo. Again, this is not even worth a script kiddie's time. You're throwing out a bunch of numbers, but think about them. Can you do 1200 per hour when there are only 255 addresses on a subnet, before they all get locked? No...so you can only lock a single subnet, which is 255 people. And if any of those people start bitching about not having access, you can look forward to losing your ISP, and finding another one.

      Also, is a DoS attack that does not spread very much fun? NO! (That's why they do DDoS's instead). And, as I mentioned, websites already do IP lockouts (like slashdot, for instance, for frequent posting). Has anyone DoS'd slashdot? NO! Because no one CARES enough to do it.

      As for security, again, and I repeat myself, this is NOT about security. They do not NEED the same kind of security that you want with your email. If 1% of bots manage to make an account before getting locked out, or each bot succeeds in making a fake account every once in a while, WHO CARES - they don't.

      Yes, the original was screen scraping, but responses tended to take it out of the original context. But if a web site cares enough about scraping to make sure a human is on the other end, then they'll develop more sophisticated tests. And when will they care? At account creation, mainly, to ensure that a human at some point interacted with them. Outside of that, yes, the prevention will be worse than the cure.

      But I maintain, if they care enough, they WILL keep you out. It's all a tradeoff with how many people they piss off, but no one has yet been able to fool a well-designed test to determine person from machine. You will, of course, always be able to get those that don't care enough, which will be the vast majority, I expect. Be satisfied with that, and you'll be fine.

      --

      -Looking for a job as a materials chemist or multivariat