Slashdot Mirror


Meet Cyveillancebot

gulker writes "A rant about making a new 'acquaintance'... Googlebot is like the UPS driver who comes to the door in uniform and will happily show you his ID and business card; Cyveillancebot is like a coarse, unshaven, itchy guy with his hat pulled down, lurking near your half-open bedroom window. This comes after Cyveillance defeats a 'protection mechanism' (robots.txt) and grabs 155 copyrighted files from my Web server, files it will presumably share with others for a profit..."

7 of 47 comments

  1. Amusement! by fm6 · · Score: 2, Insightful
    What's really dumb about this article is the belief that any document on a public web site can be considered "private". Indeed, the guy seems to totally misunderstand the purpose of robots.txt. It's not there to specify what's private; it's there to control how your site is presented by public search engines, and also to help spiders avoid overloading your site.

    And in any case, Cyveillancebot is hardly a real security threat compared to script kiddies and the like. If you're trying to keep your private information private, you should be thinking in terms of passwords and encryption, not robots.txt files!

    Oh well, those who can, do. Those who can't, write columns.

    1. Re:Amusement! by You're+All+Wrong · · Score: 2, Insightful

      There's a difference between private and copyrighted.
      Everything on my website is copyrighted by me, but none of it is private. I have no problem sharing the results of my research with humans; however, I don't want my copyrights violated. I'm happy with Google caching my pages. I consider that a favour, as it performs a public service, like a library. This is different, though: it's not a public resource.

      If my website had a query-response entry page that screened out non-humans (or unintelligent ones, or ones that can't read English well, or do maths well, or whatever query I set them), then I'd piss off many hundreds of humans.
      It's ungentlemanly to force me to piss off hundreds of people just to keep out those I don't want reading my site.

      Where has honour gone?

      YAW.

      --
      Your head of state is a corrupt weasel, I hope you're happy.
    2. Re:Amusement! by You're+All+Wrong · · Score: 2, Insightful

      I think highly of Spyveillance's bot in the same way that I'd like every airport security guard to stick his finger up my arse in order to see if I was smuggling heroin.

      Maybe some people approve of such things, but I ain't one of them.

      YAW

      --
      Your head of state is a corrupt weasel, I hope you're happy.
  2. Re:And this is why many ISPs don't give log access by km790816 · · Score: 3, Insightful

    I totally agree...but...

    This is a classic American business practice.

    We are a good, upstanding corporation.
    We want to protect our turf.
    We employ a company to help us.
    We don't ask about that company's methods or, more likely, we turn a blind eye.

    Dell would never agree that applications on the Internet should, in general, act the way that Cyveillancebot does.

    I believe that the author understands your point. He's not whining.

    He is, however, pointing out the hypocrisy, which I think is valuable. I'll think twice about buying another Dell.

  3. Re:Cyveillance in a nutshell by PurpleFloyd · · Score: 4, Insightful
    To me, these actions (hammering databases, getting caught in recursive loops that could easily be avoided) are much worse than ignoring robots.txt. While ignoring robots.txt could be justified from their position (otherwise people could hide copyrighted material behind robots.txt), bringing down servers through what amounts to a DoS attack is simply inexcusable.

    There are any number of spiders out there that are smart enough to index whole sites, including dynamically generated pages, without taking a site down or even hitting it harder than a couple of simultaneous users would. This behavior is not only negligent but malicious. Any site brought down by Cyveillance would probably have good grounds for legal action (I am not a lawyer, this is not legal advice, talk to a lawyer if you want legal advice, etc.).

    --

    That's it. I'm no longer part of Team Sanity.
  4. Re:Cyveillance in a nutshell by Cy+Guy · · Score: 2, Insightful

    Cyveillance runs a web robot. That web robot has one purpose, and one purpose only: to scour the web looking for "copyrighted material" owned by its clients. What happens when such material is found, I don't know; it's probably reported back to the Mother Ship for C&D processing.

    What I don't understand is why having your site scoured for copyrighted material is considered a violation. If you are depending on copyright law, then you must abide by the limitations on those rights. Once the copyright owner has made a document publicly accessible without encryption, fair use would dictate that anyone who comes across it can at least read and index the text. They may not be able to keep a complete copy, but they would be able to keep their index, and even profit from selling or renting access to that index. If they are caching the page à la Google, then pursue them under copyright law. If they are merely scouring and indexing and you don't want that done, then don't allow public access to the document. As noted elsewhere, robots.txt is not the method for denying public access; some combination of user ID/password and/or encryption where you control the encryption key is.

  5. Re:This guy is a bit stupid, right? by 91degrees · · Score: 2, Insightful

    He's a bit of an idiot.

    I agree with the basic principles that this robot is being a little impolite though. The guy opens up his website, hoping that people will act in a civil manner. Cyveillancebot marches in there with the digital equivalent of hobnail boots, ignores the signs, and takes copies of everything, assuming that anything there is probably stolen.

    Equating it to mugging or breaking and entering is a bit much, but the shifty unshaven lurker seemed quite apt.
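Several comments above make the same technical point: robots.txt is advisory, and a crawler is "polite" only because its author chose to check the file, space out requests, and avoid revisiting pages. The following is a minimal sketch of that kind of crawler, built on Python's standard `urllib.robotparser`. It is not how Googlebot or Cyveillancebot actually work. The agent name `ExampleBot`, the helper names, and the `fetch_links` callback (a stand-in for downloading a page and extracting its links) are hypothetical.

```python
import time
import urllib.robotparser
from collections import deque
from urllib.parse import urldefrag, urljoin


def make_policy(robots_txt: str) -> urllib.robotparser.RobotFileParser:
    """Parse robots.txt text the crawler has already downloaded."""
    policy = urllib.robotparser.RobotFileParser()
    policy.parse(robots_txt.splitlines())
    return policy


def polite_crawl(start_url, fetch_links, policy, agent="ExampleBot",
                 default_delay=1.0, max_pages=100, sleep=time.sleep):
    """Breadth-first crawl that honors robots.txt, pauses between requests,
    and never queues the same URL twice (which avoids recursive loops).

    `fetch_links(url)` is a hypothetical callback returning the links found
    on `url`; `agent` is a hypothetical user-agent name.
    """
    # Use the site's Crawl-delay if it sets one, otherwise our own default.
    delay = policy.crawl_delay(agent) or default_delay
    seen = {start_url}
    queue = deque([start_url])
    fetched = []
    while queue and len(fetched) < max_pages:
        url = queue.popleft()
        if not policy.can_fetch(agent, url):
            continue  # robots.txt asks us to stay out; a polite bot complies
        if fetched:
            sleep(delay)  # one request at a time, spaced out
        fetched.append(url)
        for link in fetch_links(url):
            # Resolve relative links and drop #fragments so that
            # "/a" and "/a#top" count as the same page.
            absolute, _fragment = urldefrag(urljoin(url, link))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return fetched
```

Nothing in the protocol enforces the `can_fetch` check. A bot that simply omits that line will fetch `/private/` pages anyway, which is why, as fm6 and Cy Guy point out, real access control has to come from passwords or encryption. The `seen` set and the delay address PurpleFloyd's complaint about recursive loops and server hammering.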