Slashdot Mirror


Microsoft Bots Effectively DDoSing Perl CPAN Testers

at_slashdot writes "The Perl CPAN Testers have been suffering issues accessing their sites, databases and mirrors. According to a posting on the CPAN Testers' blog, the CPAN Testers' server has been aggressively scanned by '20-30 bots every few seconds' in what they call 'a dedicated denial of service attack'; these bots 'completely ignore the rules specified in robots.txt.'" From the Heise story linked above: "The bots were identified by their IP addresses, including 65.55.207.x, 65.55.107.x and 65.55.106.x, as coming from Microsoft."

8 of 332 comments

  1. This is a normal occurrence for Bing by Anonymous Coward · · Score: 5, Informative

    I had a registration page - static content basically. The only thing that was dynamic was that it was referred to by many pages on the site with a variable in the querystring. Bing decided that it needed to check on this one page *thousands* of times per day.

    They ignored robots.txt.
    I sent a note to an address on the Bing site that requested feedback from people having issues with the Bing bots - nothing.

    The only thing they finally 'listened' to was placing "" in the header.

    This kind of sucked because it took the registration page out of the search engines' index, however it was much better than being DDOS'd. Plus, the page is easy to find on the site so not *that* big a deal.

    Bing has been open for months now and if you search around there are tons of stories just like this. Maybe now that a site with some visibility has been 'attacked', the engineers will take a look at wtf is wrong.
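
The tag quoted in the comment above did not survive in this copy, so what follows is only a guess at the kind of thing meant, not a reconstruction: a robots meta tag with a noindex directive, which would fit the poster's remark that the page then dropped out of the search engines' index. A minimal Python sketch; `add_noindex`, `NOINDEX_TAG` and the sample page are hypothetical names made up for illustration.

```python
import re

# Assumption: the stripped tag was a robots meta tag. The original is unknown.
NOINDEX_TAG = '<meta name="robots" content="noindex">'

def add_noindex(html: str) -> str:
    """Insert the noindex meta tag right after the opening <head> tag."""
    match = re.search(r"<head>", html, re.IGNORECASE)
    if match is None:
        # No plain <head> tag found, so leave the page unchanged.
        return html
    insert_at = match.end()
    return html[:insert_at] + NOINDEX_TAG + html[insert_at:]

# Hypothetical stand-in for the registration page described above.
page = "<html><head><title>Register</title></head><body>...</body></html>"
print(add_noindex(page))
```

The trade-off is the one the poster describes: a crawler that honours the directive drops the page from its index, so this only suits pages that are easy to reach some other way.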

  2. Flooding... by Bert64 · · Score: 4, Informative

    I have noticed the Microsoft crawlers (msnbot) being fairly inefficient on many of my sites...
    In contrast to googlebot and spiders from other search engines, msnbot is far more aggressive, ignores robots.txt and will frequently re-request the same files, even if those files haven't changed... Looking at my monthly stats (awstats), which group traffic from bots, msnbot will frequently have consumed 10 times more bandwidth than googlebot, but is responsible for far less incoming traffic based on referrer headers (typically 1-2% of the traffic generated by Google on my sites).

    Other small search engines don't bring much traffic either, but their bots don't hammer my site as hard as msnbot does.
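
The per-bot bandwidth comparison described above can be approximated without awstats by summing response sizes per crawler straight from an access log. A rough Python sketch, assuming the common "combined" log format; the function name `bytes_by_bot`, the sample lines and their addresses are invented for illustration and are not the poster's data.

```python
import re
from collections import Counter

# Assumption: Apache/nginx "combined" log format.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) '
    r'(?P<size>\d+|-) "[^"]*" "(?P<agent>[^"]*)"$'
)

def bytes_by_bot(lines, bots=("msnbot", "googlebot")):
    """Sum response bytes per crawler, matched by user-agent substring."""
    totals = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that are not in the expected format
        agent = match.group("agent").lower()
        size = match.group("size")
        sent = 0 if size == "-" else int(size)  # "-" means no body was sent
        for bot in bots:
            if bot in agent:
                totals[bot] += sent
                break
    return dict(totals)

# Invented sample data using documentation-only IP ranges.
SAMPLE_LOG = [
    '192.0.2.10 - - [20/Jan/2010:10:00:00 +0000] "GET /a.html HTTP/1.1" 200 5000 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.10 - - [20/Jan/2010:10:00:01 +0000] "GET /a.html HTTP/1.1" 200 5000 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    '198.51.100.5 - - [20/Jan/2010:10:00:02 +0000] "GET /a.html HTTP/1.1" 200 5000 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [20/Jan/2010:10:00:03 +0000] "GET /a.html HTTP/1.1" 304 - "http://www.google.com/search?q=x" "Mozilla/5.0"',
]

print(bytes_by_bot(SAMPLE_LOG))
```

Matching on the user-agent string trusts whatever the client claims to be, so this is only a first approximation; checking the source address as well would be stricter.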

    --
    http://spamdecoy.net - free throwaway anonymous email - avoid spam!
  3. Re:Probably just a bug. by Rogerborg · · Score: 5, Informative

    You're probably new here, but if you'd RTFA, you'd see that:

    It seems their bots completely ignore the rules specified in the robots.txt, despite me setting it up as per their own guidelines on their site

    Come to think of it though, isn't this what happens to most people who try to interoperate with Microsoft?

    Amusingly, if I Google for "bing robots.txt" I get a link to a Bing page titled "Bing - Robots.txt Disallow vs No Follow - Neither Working!" which has already been elided from history by Microsoft. Classy.
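
For reference, honouring robots.txt is not much work on the crawler's side. A sketch using Python's standard `urllib.robotparser`; the rules and the example.org URLs below are made up for illustration and are not the CPAN Testers' actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, not the one from the article.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 10
Disallow: /cgi-bin/

User-agent: *
Disallow:
"""

def build_parser(text):
    """Parse robots.txt content that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

# A well-behaved crawler asks before every request.
print(parser.can_fetch("msnbot", "http://example.org/cgi-bin/reports.cgi"))
print(parser.can_fetch("msnbot", "http://example.org/index.html"))
# It would also wait this many seconds between requests.
print(parser.crawl_delay("msnbot"))
```

Nothing on the server side enforces any of this, which is the point of the complaint: robots.txt only works if the bot chooses to make these checks.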

    --
    If you were blocking sigs, you wouldn't have to read this.
  4. Re:So how do we DDoS Microsoft? by jlp2097 · · Score: 5, Informative

    Not necessary. A Bing Program Manager has already commented on the CPAN Testers blog entry upon which the article is based:

    Hi,
    I am a Program Manager on the Bing team at Microsoft, thanks for bringing this issue to our attention. I have sent an email to barbie@cpan.org as we need additional information to be able to track down the problem. If you have not received the email please contact us through the Bing webmaster center at bwmc@microsoft.com.

    As said below, never ascribe to malice that which can be adequately explained by stupidity. (Insert lame joke about MSFT being full of stupidity here).

  5. Re:Are you sure? by TheRaven64 · · Score: 5, Informative

    Are we sure this traffic comes from Microsoft? Could it not consist of forged network packets?

    It's a TCP connection, so they need to have completed the three-way handshake for it to work. That means that they must either have received the SYN-ACK packet or be SYN flooding. If they are SYN flooding, then that would show up in the firewall logs. If they've received the SYN-ACK packet then they are either from that IP, or they are on a router between you and that IP and can intercept and block the packets from that IP.

    You don't need a reply if you are running a DDOS.

    You do if it's via TCP. If they're just ping flooding, then that's one thing, but they're issuing HTTP requests. This involves establishing a TCP connection (send SYN, receive SYN-ACK with random number, reply ACK with that number) and involves sending TCP window replies for each group of TCP packets that you receive.
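
The handshake argument can be illustrated with a toy model. This is a simplified Python sketch, not real TCP: it tracks only the server's initial sequence number, ignores the client's own sequence number, and the class `ToyServer` and the addresses are invented for illustration.

```python
import secrets

MODULUS = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around

class ToyServer:
    """Tracks half-open connections by the source address claimed in the SYN."""

    def __init__(self, pick_isn=None):
        # The ISN picker is injectable so behaviour can be made deterministic.
        self.pick_isn = pick_isn or (lambda: secrets.randbits(32))
        self.half_open = {}       # claimed source -> server ISN awaiting ACK
        self.established = set()

    def on_syn(self, claimed_source):
        """Return the ISN carried by the SYN-ACK, which is routed to claimed_source."""
        isn = self.pick_isn()
        self.half_open[claimed_source] = isn
        return isn

    def on_ack(self, claimed_source, ack_number):
        """Complete the handshake only if the ACK echoes ISN + 1."""
        isn = self.half_open.get(claimed_source)
        if isn is None or ack_number != (isn + 1) % MODULUS:
            return False
        del self.half_open[claimed_source]
        self.established.add(claimed_source)
        return True

server = ToyServer()

# Honest client: it receives the SYN-ACK, so it knows the number to echo.
seen_isn = server.on_syn("198.51.100.7")
print(server.on_ack("198.51.100.7", (seen_isn + 1) % MODULUS))

# Spoofer: the SYN-ACK goes to the real owner of the address, so the
# attacker never sees the ISN and can only guess among 2**32 values.
server.on_syn("203.0.113.9")
print(server.on_ack("203.0.113.9", secrets.randbits(32)))
```

A spoofed source can therefore leave half-open entries behind (a SYN flood) but, barring a lucky guess or a position on the path, cannot get as far as sending an HTTP request.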

    On the other hand, why would anyone, including Microsoft, want to bring down CPAN?

    Who says that they want to? It's more likely that their web crawler has been written to the same standard as the rest of their code.

    --
    I am TheRaven on Soylent News
  6. Re:Oh! *Literally* Microsoft bots! by Ardaen · · Score: 4, Informative

    Probably not. If you look at other incidents (http://cmeerw.org/blog/594.html), it appears they just like to push the limits.

  7. No problem by rgviza · · Score: 4, Informative

    ipchains -A input -j REJECT -p all -s 65.55.207.0/24 -i eth0 -l
    ipchains -A input -j REJECT -p all -s 65.55.107.0/24 -i eth0 -l
    ipchains -A input -j REJECT -p all -s 65.55.106.0/24 -i eth0 -l

    problem solved

    --
    Don't kid yourself. It's the size of the regexp AND how you use it that counts.
    1. Re:No problem by j_sp_r · · Score: 4, Informative

      Linux IP Firewalling Chains, normally called ipchains, is free software to control the packet filter/firewall capabilities in the 2.2 series of Linux kernels. It superseded ipfwadm, but was replaced by iptables in the 2.4 series.

      You're a few kernels behind.
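
For anyone who would rather filter at the application layer, or who needs the iptables spelling the reply alludes to, here is a small Python sketch built around the three /24 ranges from the parent comment. The function names are hypothetical, and the generated iptables lines are an assumed equivalent that does not carry over the logging that `-l` provided in ipchains.

```python
import ipaddress

# The three ranges named in the story and blocked in the parent comment.
BLOCKED = [
    ipaddress.ip_network(cidr)
    for cidr in ("65.55.207.0/24", "65.55.107.0/24", "65.55.106.0/24")
]

def is_blocked(address):
    """Return True if the address falls inside any blocked range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in BLOCKED)

def iptables_rules(interface="eth0"):
    """Build iptables commands as strings; nothing is executed here."""
    return [
        f"iptables -A INPUT -i {interface} -s {network} -j REJECT"
        for network in BLOCKED
    ]

print(is_blocked("65.55.207.42"))
print(is_blocked("65.55.108.1"))
for rule in iptables_rules():
    print(rule)
```

Blocking whole ranges is blunt: it also shuts out any legitimate crawling from those addresses, so it trades search visibility for a usable server.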