Meet Cyveillancebot
gulker writes "A rant about making a new 'acquaintance'... Googlebot is like the UPS driver who comes to the door in a uniform, and will happily show you his ID and business card: Cyveillancebot is like a coarse, unshaven, itchy guy with his hat pulled down lurking near your half-open bedroom window. This after Cyveillance defeats a 'protection mechanism' - robots.txt - and grabs 155 copyrighted files from my Web server, which files it will presumably share with others, for a profit..."
And in any case, Cyveillancebot is hardly a real threat to security, compared to script kiddies and the like. If you're trying to keep your private information private, you should be thinking in terms of passwords and encryption, not robots.txt files!
Oh well, those who can, do. Those who can't, write columns.
I totally agree...but...
This is classic American business practice.
We are a good, upstanding corporation.
We want to protect our turf.
We employ a company to help us.
We don't ask about that company's means or, more likely, we turn a blind eye to them.
Dell would never agree that applications on the Internet should, in general, act the way that Cyveillancebot does.
I believe that the author understands your point. He's not whining.
He is, however, pointing out the hypocrisy, which I think is valuable. I'll think twice about buying another Dell.
A speech...
There are any number of spiders out there that are smart enough to index whole sites, including dynamically generated pages, without taking a site down or even hitting it harder than a couple of simultaneous users. This behavior is not only negligent, but malicious. Any site brought down by Cyveillance would probably have good grounds for legal action (I am not a lawyer, this is not legal advice, talk to a lawyer if you want legal advice, etc.).
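For what it's worth, here is a rough sketch of the kind of throttling a well-behaved spider might do to avoid hammering a site. This is not how any particular robot works; the `Throttle` class, its parameters, and the 2-second interval in the usage note are all made up for illustration.

```python
import time


class Throttle:
    """Enforce a minimum gap between successive requests to one site.

    Hypothetical helper for illustration; clock and sleep are injectable
    so the behaviour can be checked without real waiting.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time the previous request was released

    def wait(self):
        """Block until the next request is allowed; return the delay used."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            # Sleep only for whatever part of the interval has not yet elapsed.
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay
```

A crawler loop would call `Throttle(2.0).wait()` before each fetch, so the site sees at most one request every two seconds from that robot, which is less load than a couple of people clicking around.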
That's it. I'm no longer part of Team Sanity.
Cyveillance runs a web robot. That web robot has one purpose, and one purpose only: to scour the web looking for "copyrighted material" owned by its clients. What happens when such material is found, I don't know; it's probably reported back to the Mother Ship for C&D processing.
What I don't understand is why scouring the web for copyrighted material is considered a violation. If you are depending on the copyright laws, then you must abide by the limitations on those rights. Once the copyright owner has made the document publicly accessible without encryption, fair use would dictate that anyone who comes across it can at least read and index the text. They may not be able to keep a complete copy, but they would be able to keep their index, and even profit from the sale/rental of access to that index. If they are caching the page à la Google, then pursue them under the copyright laws. If they are merely scouring and indexing and you don't want that done, then don't allow public access to the document. As noted elsewhere, robots.txt is not the method for denying public access; some combination of userid/password and/or encryption where you control the encryption key is.
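To make the robots.txt point concrete: the file is only a request that a robot may choose to honor. A minimal sketch using Python's standard `urllib.robotparser` shows that the check happens entirely on the robot's side. The robots.txt contents, the `PoliteBot` name, and the example.com URLs below are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking all robots to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A polite robot asks first and skips anything disallowed.
print(allowed("PoliteBot", "http://example.com/index.html"))
print(allowed("PoliteBot", "http://example.com/private/report.html"))
```

Nothing on the server enforces the answer. A robot that never calls something like `allowed()` can still request `/private/report.html` and will get it, unless the server itself demands a password.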
He's a bit of an idiot.
I agree with the basic principles that this robot is being a little impolite though. The guy opens up his website, hoping that people will act in a civil manner. Cyveillancebot marches in there with the digital equivalent of hobnail boots, ignores the signs, and takes copies of everything, assuming that anything there is probably stolen.
Equating it to mugging or breaking and entering is a bit much, but the shifty unshaven lurker seemed quite apt.