Microsoft Bots Effectively DDoSing Perl CPAN Testers
at_slashdot writes "The Perl CPAN Testers have been suffering issues accessing their sites, databases and mirrors. According to a posting on the CPAN Testers' blog, the CPAN Testers' server has been aggressively scanned by '20-30 bots every few seconds' in what they call 'a dedicated denial of service attack'; these bots 'completely ignore the rules specified in robots.txt.'"
From the Heise story linked above: "The bots were identified by their IP addresses, including 65.55.207.x, 65.55.107.x and 65.55.106.x, as coming from Microsoft."
Anyone know which pages on Microsoft's front-facing sites are most computationally intensive, and yet always dynamically generated? :D
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
I manage some networks in my home city in Italy, and in the past year I've often seen strange traffic coming from some of their IP addresses. Guess they were exploited by someone a long time ago, and didn't even notice it.
Looks like Microsoft's Bing managers are on it. They'll make it worse in no-time flat. :)
BTW, the difference between a DDOS and a Slashdotting? You know why your site went down -- you got linked!
--
# Canmephians for a better Linux Kernel
$Stalag99{"URL"}="http://stalag99.net";
From TFA:
Hi,
I am a Program Manager on the Bing team at Microsoft, thanks for bringing this issue to our attention. I have sent an email to nospam@example.com as we need additional information to be able to track down the problem. If you have not received the email please contact us through the Bing webmaster center at nospam@example.com.
I mean, what additional information is needed wrt "respecting robots.txt" and "not letting loose more than one bot on a site at a time"?
Bing. Meh.
I know everyone likes to assume that Microsoft is being evil here, but wouldn't the more realistic assumption be that they were just being incompetent?
This is my sig.
I had a registration page - static content basically. The only thing that was dynamic was that it was referred to by many pages on the site with a variable in the querystring. Bing decided that it needed to check on this one page *thousands* of times per day.
They ignored robots.txt.
I sent a note to an address on the Bing site that requested feedback from people having issues with the Bing bots - nothing.
The only thing they finally 'listened' to was placing "" in the header.
This kind of sucked because it took the registration page out of the search engines' index, however it was much better than being DDOS'd. Plus, the page is easy to find on the site so not *that* big a deal.
Bing has been open for months now and if you search around there are tons of stories just like this. Maybe now that a site with some visibility has been 'attacked', the engineers will take a look at wtf is wrong.
I have noticed the microsoft crawlers (msnbot) being fairly inefficient on many of my sites...
In contrast to googlebot and spiders from other search engines, msnbot is far more aggressive: it ignores robots.txt and will frequently re-request the same files, even if those files haven't changed... Looking at my monthly stats (awstats), which group traffic from bots, msnbot will frequently have consumed 10 times more bandwidth than googlebot, but is responsible for far less incoming traffic based on referrer headers (typically 1-2% of the traffic generated by google on my sites).
Other small search engines don't bring much traffic either, but their bots don't hammer my site as hard as msnbot does.
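For anyone who wants to check the same thing without awstats, here is a rough sketch of tallying bytes served per crawler straight from an access log. It assumes Apache-style Combined Log Format. The `bytes_per_bot` helper, the bot list and the sample lines (documentation-range IPs, invented sizes) are all made up for illustration and are not real stats.

```python
import re
from collections import Counter

# Combined Log Format tail: "request" status bytes "referer" "user-agent"
LOG_TAIL = re.compile(
    r'"[^"]*" (?P<status>\d{3}) (?P<size>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)

# Substrings to look for in the User-Agent header (illustrative list).
BOTS = ("msnbot", "googlebot")

def bytes_per_bot(lines):
    """Sum the response sizes per bot name found in the User-Agent."""
    totals = Counter()
    for line in lines:
        m = LOG_TAIL.search(line)
        if m is None:
            continue  # not a Combined Log Format line, skip it
        size = 0 if m.group("size") == "-" else int(m.group("size"))
        agent = m.group("agent").lower()
        for bot in BOTS:
            if bot in agent:
                totals[bot] += size
    return totals

# Invented sample lines, only here to exercise the function.
SAMPLE = [
    '192.0.2.10 - - [01/Jan/2010:00:00:01 +0000] "GET /a HTTP/1.1" 200 5000 "-" "msnbot/2.0b"',
    '192.0.2.10 - - [01/Jan/2010:00:00:02 +0000] "GET /a HTTP/1.1" 200 5000 "-" "msnbot/2.0b"',
    '192.0.2.20 - - [01/Jan/2010:00:00:03 +0000] "GET /a HTTP/1.1" 304 - "-" "Googlebot/2.1"',
    '192.0.2.30 - - [01/Jan/2010:00:00:04 +0000] "GET /a HTTP/1.1" 200 5000 "-" "Mozilla/5.0"',
]

totals = bytes_per_bot(SAMPLE)
```

Matching on a User-Agent substring is crude (anyone can fake the header), so treat the totals as a hint rather than proof of who is crawling.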
http://spamdecoy.net - free throwaway anonymous email - avoid spam!
Are we sure this traffic comes from Microsoft? Could it not consist of forged network packets? You don't need a reply if you are running a DDOS. On the other hand, why would anyone, including Microsoft, want to bring down CPAN?
Nae king! Nae laird! Nae yurrupiean pressedent! We willna be fooled again!
> ...why not just block them?
They have.
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
For ignoring robots.txt, they don't deserve any more nor less.
You do not have a moral or legal right to do absolutely anything you want.
Probably not, if you look at other incidents: http://cmeerw.org/blog/594.html it appears they just like to push the limits.
It's the first. Whatever you disallow in robots.txt means not to spider those pages, so no scanning of them at all.
You use it for when you only want part of your site to appear in search results, such as just the front page (for example). The rest of the site should not be touched by the bot at all.
I redirect lost bots home, seems a polite thing to do. 301 www.microsoft.com
The US Gov't has successfully operated as a going concern for 220+ years, with a proven and reliable management structure. Few, if any corporations, have been able to do that.
Private corporations can go under with just a couple of bad years. Or even months, particularly if they're new businesses. Governments just have to raise taxes.
It's basically a rough pattern filter that the bot is supposed to follow on parts of the site not to crawl. One reason it's used is that you can have dynamically generated pages that create an infinite loop that's impossible for the bot to detect.
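To make the "rough pattern filter" concrete, here is a minimal sketch of how a well-behaved bot could check robots.txt before fetching, using Python's standard `urllib.robotparser`. The robots.txt contents, the `example.org` host and the `allowed` helper are invented for illustration; this is not the CPAN Testers' actual file.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt: one block for msnbot, one default block.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 10
Disallow: /cgi-bin/

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(agent, path):
    """Return True if `agent` may fetch `path` under the rules above."""
    return parser.can_fetch(agent, "http://example.org" + path)

# Rules are simple path-prefix matches, checked per user agent.
msnbot_cgi = allowed("msnbot", "/cgi-bin/report")
msnbot_home = allowed("msnbot", "/index.html")
other_private = allowed("googlebot", "/private/notes.html")
msnbot_delay = parser.crawl_delay("msnbot")
```

Nothing here is enforced by the server. The bot has to run a check like this itself, which is exactly why a crawler that ignores the file can still end up in a dynamically generated loop.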
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
I'm pretty sure the first "D" in DDoS stands for "Distributed."
If it was really a DDoS, you wouldn't be able to filter the IP out with a simple regex (like the /^65\.55\.(106|107|207)/ from TFA).
To boot, TFA didn't even say DDoS. Maybe that's too much to expect the editors to oh... I don't know...say... RTFA or Fact-Check it?
I should drop my bar a bit, I suppose.
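For what it's worth, the regex filter mentioned above is about this simple. A minimal sketch in Python, using the pattern as quoted from TFA; the `is_reported_bot` name is made up, and where you would hook this in (log filter, application code, etc.) is left open.

```python
import re

# Pattern as quoted in the thread: the three reported 65.55.x prefixes.
REPORTED = re.compile(r"^65\.55\.(106|107|207)")

def is_reported_bot(ip):
    """Return True if the dotted-quad string starts with a reported prefix."""
    return REPORTED.match(ip) is not None

hit = is_reported_bot("65.55.207.12")
miss = is_reported_bot("65.55.108.1")
```

The `^` anchor matters: without it, an address like 165.55.106.1 would match too.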
ipchains -A input -j REJECT -p all -s 65.55.207.0/24 -i eth0 -l
ipchains -A input -j REJECT -p all -s 65.55.107.0/24 -i eth0 -l
ipchains -A input -j REJECT -p all -s 65.55.106.0/24 -i eth0 -l
problem solved
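The same three /24 ranges can also be checked in application code. A small sketch using Python's standard `ipaddress` module; the network list is copied from the rules above, while `is_blocked` is a hypothetical helper name.

```python
import ipaddress

# The three /24 networks from the firewall rules above.
BLOCKED = [
    ipaddress.ip_network("65.55.207.0/24"),
    ipaddress.ip_network("65.55.107.0/24"),
    ipaddress.ip_network("65.55.106.0/24"),
]

def is_blocked(addr):
    """Return True if the address falls inside any blocked network."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in BLOCKED)

# Adjacent ranges can be merged: 106 and 107 collapse into one /23.
collapsed = list(ipaddress.collapse_addresses(BLOCKED))
```

After collapsing, two rules cover the same addresses as the original three.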
Don't kid yourself. It's the size of the regexp AND how you use it that counts.
The CPAN folks could complain to their ISP and have them drop the traffic that's coming in to their boxes.
Most ISPs will work with you to correct DDOS problems.
How dare you sir (or madam)!! How dare you! It is clear from the title of your post that you were not so subtly casting aspersions on an organization which I hold dear -- namely the Hirsute Dungeons n' Dragons society. You can frame your remarks in some obscure racial epithets, but to those of us who twirl our mustaches or stroke our beards while rolling dice, your insidious implication is brazenly clear. As the leader of a group of men (and women) with decorative facial hair who play Dungeons n' Dragons every Wednesday night, I cannot help but express the strongest offense to your euphemistically delivered hidden acronym. In the future, should you have such thoughts I would urge you to Do Not Say them.
"You never pushed a noun against a verb except to blow up something" (Spencer Tracy, 'Inherit the Wind')
robots.txt is a request not to scan part or all of a site.
Every site does not have dozens of powerful servers and terabytes of bandwidth, nor is every site an ad-supported one that wants to maximize traffic. Common courtesy requires that a bot operator minimize his impact on any given site and honor requests not to index. Of course "courtesy" and "honor" are concepts that baffle Microsoft managers.
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
While he could be more polite, it is indeed embarrassing for Microsoft if they cannot check their own network
a) for the existence of computers with given IPs
b) what these computers are doing
I think that deserves an "insightful" that cancels out the "flamebait".
C - the footgun of programming languages
Nothing you listed under the "War on Drugs" has anything to do with the war on drugs.
The war on drugs has made America a police state where the government can seize any of your property and auction it for profit before your trial. Even if you are found innocent, or the charges are thrown out for insufficient grounds, you will not be compensated for your lost money or profit. It has made an America where more people are imprisoned than in any other nation on earth. It has made a nation where the cheapest and most effective drug for curing glaucoma and mitigating the pain and nausea associated with cancer treatments is a crime. It's made a nation where at least half its citizens are criminals.
Robots.txt is merely advisory. Ignoring it is discourteous and oafish but not illegal.
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
if it's a scan (TCP established stream, taxing the SERVERS, not the NETWORK) that's the problem, as opposed to a SYN flood etc, and the IP addresses are in a very small range, why aren't they just using a hardware firewall at the router and blocking the IPs? There's not a whole lot to "distributed" when it's coming from a pair of C's.
Not saying they should be DOING it, but this is not a Denial of Service, it's a Denial of Stupid.
I work for the Department of Redundancy Department.
Got it! Bing is written in Perl. They do regular expression matching while crawling and forgot to have a \Q ... \E escape sequence for the regex matching. They got so much Perl code on CPAN, full of special characters, that somehow the crawler engine went into an infinite loop.
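Joke aside, the escaping point is real: text full of special characters has to be quoted before it is used as a pattern. A tiny sketch of the idea in Python, where `re.escape` plays roughly the role of Perl's \Q ... \E quoting; `find_literal` and the sample strings are invented for illustration.

```python
import re

def find_literal(needle, haystack):
    """Search for `needle` as literal text, not as a regex."""
    return re.search(re.escape(needle), haystack) is not None

code = "my $x[0] = 1;"

# Escaped: the metacharacters $, [ and ] are matched literally.
escaped_hit = find_literal("$x[0]", code)

# Unescaped: "$" means end-of-string and "[0]" is a character class,
# so the same text used as a raw pattern does not find itself.
raw_hit = re.search("$x[0]", code) is not None
```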
Bingo Dictionary - Pragmatist, n. A myopic idealist.