Slashdot Mirror


Meet Cyveillancebot

gulker writes "A rant about making a new 'acquaintance'... Googlebot is like the UPS driver who comes to the door in a uniform, and will happily show you his ID and business card: Cyveillancebot is like a coarse, unshaven, itchy guy with his hat pulled down lurking near your half-open bedroom window. This after Cyveillance defeats a 'protection mechanism' - robots.txt - and grabs 155 copyrighted files from my Web server, which files it will presumably share with others, for a profit..."

4 of 47 comments

  1. This guy is a bit stupid, right? by swmccracken · · Score: 5, Informative

    This guy is a moron, right?

    Anyone who has *anything* on a public web server that isn't protected with a username and password (and that isn't very difficult, now, is it?) and wants it kept private is some kind of moron.

    I mean, I could easily spider his site using wget, ignoring his robots.txt. (For the record, his robots.txt is disallow everyone.)

    "It is, in fact a mechanism for safeguarding content that owners wish to keep private from crawlers." WRONG! It is a mechanism for discouraging crawlers from downloading vast hunks of your site. (Good example: Crawling all of slashdot would be much larger than slashdot itself because of all the different views of comments you can have. That's why the robots.txt of /. discourages spiders in the dynamically generated views.) Yes, in theory he's right, but reality beckons.

    Robots.txt is not like locking your door with a weak latch. It's like leaving the door unlocked with a "please behave while inside" sign on it.

    Oddly enough, I don't think the police would have much sympathy for anyone whose house got burgled like this...
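
To make the "sign on the door" point concrete, here is a minimal sketch of how a well-behaved crawler consults robots.txt, using Python's standard `urllib.robotparser`. The robots.txt body, the `polite_can_fetch` helper, the `ExampleBot` name, and the example.com URLs are all assumptions for illustration, not anything taken from the site in question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical "disallow everyone" robots.txt, like the one described above.
DISALLOW_ALL = """\
User-agent: *
Disallow: /
"""


def polite_can_fetch(user_agent: str, url: str, robots_txt: str = DISALLOW_ALL) -> bool:
    """Return True if robots.txt permits user_agent to fetch url.

    This check is voluntary: it only has an effect if the crawler
    bothers to call it before making a request.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A polite bot asks first and gets told "no".
    print(polite_can_fetch("ExampleBot", "http://example.com/page.html"))
```

Nothing on the server side enforces the answer. A client that skips the check (or is told to ignore it) can request the same URLs and the web server will serve them, which is why robots.txt discourages crawling but does not protect anything.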

  2. Cyveillance in a nutshell by Anonymous Coward · · Score: 5, Informative

    Cyveillance runs a web robot. That web robot has one purpose, and one purpose only: to scour the web looking for "copyrighted material" owned by its clients. What happens when such material is found, I don't know; it's probably reported back to the Mother Ship for C&D processing.

    The reason they're widely hated is that their bot misbehaves. Badly. Not only does it send bogus User-Agent headers and disregard the robots.txt file, it'll literally hammer a site. It's one of the most aggressive bots I've ever come across, and it seems its operators don't care. I've seen a server go down because a spider in Cyveillance's IP space was hitting a MySQL-based message board thousands of times per minute.

    Most spiders either ignore URLs with query strings in them, recognize them as potentially resource-intensive and avoid fetching more than once or twice per minute, or are at least smart enough to avoid getting caught in a recursive loop. Not Cyveillance; the damned thing would fetch the forum index, then fetch a thread, then follow the link from that thread right back to the forum index, ad nauseam.
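
The two courtesies described above (never refetching a page already seen, and slowing down on query-string URLs) can be sketched roughly as follows. This is an illustrative outline, not how any particular spider works: the function names, the in-memory link graph, and the 1-second and 30-second delays are all assumed values chosen to match "once or twice per minute".

```python
from collections import deque
from urllib.parse import urlsplit


def crawl_order(start, links, max_pages=100):
    """Breadth-first walk over a link graph that never visits a URL twice.

    `links` maps each URL to the URLs it links to. The `seen` set is what
    stops the index -> thread -> index loop described above.
    """
    seen = {start}
    queue = deque([start])
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


def delay_before(url, static_delay=1.0, dynamic_delay=30.0):
    """Seconds to wait before fetching url (assumed, illustrative values).

    URLs with a query string are treated as potentially expensive
    dynamic pages and get the longer delay.
    """
    return dynamic_delay if urlsplit(url).query else static_delay


if __name__ == "__main__":
    # Hypothetical forum where the thread links straight back to the index.
    forum = {
        "/forum": ["/forum?thread=1"],
        "/forum?thread=1": ["/forum"],
    }
    for page in crawl_order("/forum", forum):
        print(page, delay_before(page))
```

With the `seen` set in place, the forum index and the thread are each fetched once and the walk ends, instead of bouncing between the two pages indefinitely.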

    Cyveillance doesn't just crawl the IP space of webhosting and colo companies, either. They hit my cablemodem all the time - I'm not sure whether they scan all cable modems, or whether they've just grown fond of me because I'm running a web server (which serves nothing externally, save for a tiny index page that shows my uptime).

    Drop 63.148.99.0/24 into the bit bucket and save your server some strain.
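
As a rough illustration of what dropping that /24 means, the sketch below checks whether a client address falls inside the block, using Python's standard `ipaddress` module. The `is_blocked` helper and the `BLOCKED_NETWORKS` list are hypothetical names; in practice this kind of filtering would more likely live in a firewall or router rule than in application code.

```python
import ipaddress

# The block named in the comment above; extend the list if more are found.
BLOCKED_NETWORKS = [ipaddress.ip_network("63.148.99.0/24")]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip belongs to any blocked network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


if __name__ == "__main__":
    print(is_blocked("63.148.99.17"))   # inside the /24
    print(is_blocked("63.148.100.1"))   # neighbouring block, not covered
```

Matching on the source address rather than the User-Agent header is the point here, since the comment notes that the bot sends bogus User-Agent strings.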

    (By the way, why the fuck do I have to logout to post as AC now? Are registered users only allowed one AC post per month or something?)

  3. IP-BLOCK TO BLOCK by Oriumpor · · Score: 4, Informative

    I used SAMSPADE to look up the IP block they own (from the wonderful article). This is most definitely not their ONLY IP block, but if anyone does have more, it would be great to compile a whole list of "mean" IPs.

    I do not care for this kind of intrusion (I equate this to exactly what spammers do to harvest your email...), so you can block these IPs (route 'em to never-never land).

  4. Another guy's experiences by plsuh · · Score: 4, Informative

    Take a look at one guy's experiences with blocking rude bots and spiders. Mark is a buddy of mine and this got him pretty steamed.

    --Paul