Microsoft Bots Effectively DDoSing Perl CPAN Testers
at_slashdot writes "The Perl CPAN Testers have been having trouble accessing their sites, databases and mirrors. According to a posting on the CPAN Testers' blog, the CPAN Testers' server has been aggressively scanned by '20-30 bots every few seconds' in what they call 'a dedicated denial of service attack'; these bots 'completely ignore the rules specified in robots.txt.'"
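For comparison, honoring robots.txt costs a crawler very little. Here is a minimal sketch using Python's standard `urllib.robotparser` module. The robots.txt contents, the example.org URLs and the `msnbot` string passed in are made up for illustration and are not the CPAN Testers' actual rules; `Crawl-delay` is a non-standard directive that the module happens to read.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt for illustration only (not CPAN Testers' real file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Crawl-delay: 10
"""

def make_parser(robots_txt: str = ROBOTS_TXT) -> RobotFileParser:
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser

def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """True if robots_txt lets user_agent fetch url."""
    return make_parser(robots_txt).can_fetch(user_agent, url)

if __name__ == "__main__":
    rules = make_parser()
    for url in ("http://example.org/index.html", "http://example.org/cgi-bin/search"):
        print(url, "allowed" if rules.can_fetch("msnbot", url) else "disallowed")
    # A polite crawler also waits this many seconds between requests.
    print("crawl delay:", rules.crawl_delay("msnbot"))
```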
From the Heise story linked above: "The bots were identified by their IP addresses, including 65.55.207.x, 65.55.107.x and 65.55.106.x, as coming from Microsoft."
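If you want to check your own logs for the same ranges, here is a minimal sketch with Python's `ipaddress` module. It assumes each `x` stands for a full /24 (0-255); `in_reported_ranges` and `count_hits` are hypothetical helper names, and an IP match is only a hint about where traffic came from, not proof.

```python
import ipaddress

# The ranges from the Heise report, assuming each "x" covers a full /24 (0-255).
REPORTED_RANGES = [
    ipaddress.ip_network("65.55.207.0/24"),
    ipaddress.ip_network("65.55.107.0/24"),
    ipaddress.ip_network("65.55.106.0/24"),
]

def in_reported_ranges(ip: str) -> bool:
    """True if ip falls inside one of the reported /24 networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in REPORTED_RANGES)

def count_hits(client_ips):
    """Count requests (one client IP per request) from the reported ranges."""
    return sum(1 for ip in client_ips if in_reported_ranges(ip))

if __name__ == "__main__":
    # Made-up client IPs, e.g. the first field of each access-log line.
    sample = ["65.55.207.14", "65.55.106.200", "192.0.2.33", "65.55.108.9"]
    print(count_hits(sample), "of", len(sample), "requests from the reported ranges")
```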
Anyone know which pages on Microsoft's public-facing sites are the most computationally intensive, and yet always dynamically generated? :D
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
I manage some networks in my home city in Italy, and over the past year I've often seen strange traffic coming from some of their IP addresses. I'd guess those machines were compromised by someone a long time ago, and nobody even noticed.
Looks like Microsoft's Bing managers are on it. They'll make it worse in no time flat. :)
BTW, the difference between a DDOS and a Slashdotting? You know why your site went down -- you got linked!
--
# Canmephians for a better Linux Kernel
$Stalag99{"URL"}="http://stalag99.net";
I know everyone likes to assume that Microsoft is being evil here, but wouldn't the more realistic assumption be that they were just being incompetent?
This is my sig.
I had a registration page, basically static content. The only dynamic part was that many pages on the site linked to it with a variable in the query string. Bing decided that it needed to check this one page *thousands* of times per day.
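A crawler could avoid this by treating URLs that differ only in the query string as one page. Here is a rough sketch, assuming (as in this case) that the page content does not depend on the query string. The example.com URLs and the `canonical` helper are hypothetical, and a real crawler cannot safely drop query strings on every site.

```python
from urllib.parse import urlsplit, urlunsplit

def canonical(url: str) -> str:
    """Drop the query string and fragment, keeping scheme, host and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

# Hypothetical links to one static page, each carrying a different variable.
LINKS = [
    "http://example.com/register.html?from=home",
    "http://example.com/register.html?from=faq",
    "http://example.com/register.html?from=pricing",
]

if __name__ == "__main__":
    print(len(set(LINKS)), "distinct URLs")
    print(len({canonical(u) for u in LINKS}), "page(s) to fetch after canonicalization")
```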
They ignored robots.txt.
I sent a note to the address on the Bing site that asks for feedback from people having problems with the Bing bots. Nothing.
The only thing they finally 'listened' to was placing "" in the header.
This kind of sucked because it took the registration page out of the search engines' index, but it was much better than being DDoS'd. Plus, the page is easy to find on the site, so it's not *that* big a deal.
Bing has been open for months now and if you search around there are tons of stories just like this. Maybe now that a site with some visibility has been 'attacked', the engineers will take a look at wtf is wrong.
I have noticed that Microsoft's crawler (msnbot) is fairly inefficient on many of my sites.
In contrast to Googlebot and the spiders of other search engines, msnbot is far more aggressive, ignores robots.txt, and frequently re-requests the same files, even when those files haven't changed. My monthly stats (AWStats), which group traffic by bot, often show msnbot consuming 10 times more bandwidth than Googlebot, while, judging by referrer headers, it is responsible for far less incoming traffic (typically 1-2% of the traffic Google sends to my sites).
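To get comparable numbers from a raw access log instead of AWStats, something like the following works as a rough sketch. It assumes Apache's combined log format, identifies crawlers by user-agent substring only, and counts referrers containing `bing.com`/`live.com` or `google.` as search traffic. All of these choices, and the sample lines, are assumptions; this is not how AWStats itself computes its figures.

```python
import re
from collections import Counter

# Apache combined log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} (\d+|-) "([^"]*)" "([^"]*)"'
)

# engine -> (crawler user-agent token, referrer domains); assumed values
ENGINES = {
    "microsoft": ("msnbot", ("bing.com", "live.com")),
    "google": ("googlebot", ("google.",)),
}

def summarize(lines):
    """Return (bytes served to each engine's crawler, visits referred by each engine)."""
    crawler_bytes = Counter()
    referrals = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # skip lines not in combined format
        _host, size, referer, agent = m.groups()
        nbytes = 0 if size == "-" else int(size)
        agent, referer = agent.lower(), referer.lower()
        for engine, (token, domains) in ENGINES.items():
            if token in agent:
                crawler_bytes[engine] += nbytes  # bandwidth used by the bot
            elif any(d in referer for d in domains):
                referrals[engine] += 1  # a visitor sent by the engine

    return crawler_bytes, referrals

# Made-up sample lines for illustration.
SAMPLE = [
    '65.55.207.10 - - [10/Jun/2009:10:00:00 +0000] "GET /a.html HTTP/1.1" 200 5000 "-" "msnbot/2.0b"',
    '198.51.100.1 - - [10/Jun/2009:10:00:01 +0000] "GET /a.html HTTP/1.1" 200 500 "-" "Googlebot/2.1"',
    '192.0.2.7 - - [10/Jun/2009:10:00:02 +0000] "GET /a.html HTTP/1.1" 200 5000 "http://www.google.com/search?q=perl" "Mozilla/5.0"',
]

if __name__ == "__main__":
    crawler_bytes, referrals = summarize(SAMPLE)
    for engine in ENGINES:
        print(engine, crawler_bytes[engine], "bytes crawled,", referrals[engine], "visits referred")
```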
Other small search engines don't bring much traffic either, but their bots don't hammer my site as hard as msnbot does.
http://spamdecoy.net - free throwaway anonymous email - avoid spam!
Are we sure this traffic comes from Microsoft? Could it not consist of forged network packets? You don't need a reply if you are running a DDOS. On the other hand, why would anyone, including Microsoft, want to bring down CPAN?
Nae king! Nae laird! Nae yurrupiean pressedent! We willna be fooled again!
For ignoring robots.txt, they deserve no more and no less.
You do not have a moral or legal right to do absolutely anything you want.
Probably not. Judging by other incidents (http://cmeerw.org/blog/594.html), it appears they just like to push the limits.
I redirect lost bots home; it seems the polite thing to do. 301 www.microsoft.com
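Here is a minimal sketch of that redirect as Python WSGI middleware. The `msnbot` user-agent match, the exact target URL and the `redirect_bots` name are assumptions, since the poster doesn't say what server they use.

```python
# Assumed user-agent token and redirect target; adjust to taste.
BOT_TOKENS = ("msnbot",)
REDIRECT_TARGET = "http://www.microsoft.com/"

def redirect_bots(app):
    """Wrap a WSGI app so matching bots get a 301 instead of the real page."""
    def wrapper(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in agent for token in BOT_TOKENS):
            start_response("301 Moved Permanently",
                           [("Location", REDIRECT_TARGET), ("Content-Length", "0")])
            return [b""]
        return app(environ, start_response)
    return wrapper

def site(environ, start_response):
    """Stand-in for the real site."""
    body = b"Hello, human.\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]

application = redirect_bots(site)
```

To try it locally, `wsgiref.simple_server.make_server("127.0.0.1", 8000, application).serve_forever()` will serve it. Matching on the User-Agent header only catches bots that identify themselves honestly.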
ipchains -A input -j REJECT -p all -s 65.55.207.0/24 -i eth0 -l
ipchains -A input -j REJECT -p all -s 65.55.107.0/24 -i eth0 -l
ipchains -A input -j REJECT -p all -s 65.55.106.0/24 -i eth0 -l
problem solved
Don't kid yourself. It's the size of the regexp AND how you use it that counts.