Microsoft Bots Effectively DDoSing Perl CPAN Testers
at_slashdot writes "The Perl CPAN Testers have been suffering issues accessing their sites, databases and mirrors. According to a posting on the CPAN Testers' blog, the CPAN Testers' server has been being aggressively scanned by '20-30 bots every few seconds' in what they call 'a dedicated denial of service attack'; these bots 'completely ignore the rules specified in robots.txt.'"
From the Heise story linked above: "The bots were identified by their IP addresses, including 65.55.207.x, 65.55.107.x and 65.55.106.x, as coming from Microsoft."
Anyone know what sites on Microsoft's front-facing sites are most computationally intensive, and yet always dynamically generated? :D
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Until I read the summary I thought it was another article about windows botnets and was wondering why the "microsoft" was tacked on since windows is the default OS assumption. Of course it would be interesting if these were new CPAN mirrors that MS was setting up.
Sooooo, let's all go to the testers blog and DDOS that too. Dumbass...
Excuse me, but please get off my Pennisetum Clandestinum, eh!
I manage some networks in my home city in Italy, and in the past year I've often seen strange traffic coming from some of their IP addresses. Guess they were exploited by someone a long time ago, and didn't even notice it.
Sounds like Microsoft.CN to me.
Looks like Microsoft's Bing managers are on it. They'll make it worse in no-time flat. :)
BTW, the difference between a DDOS and a Slashdotting? You know why your site went down -- you got linked!
--
# Canmephians for a better Linux Kernel
$Stalag99{"URL"}="http://stalag99.net";
It's not like ASP.NET is the most efficient way to sling web pages to begin with.
This is my sig.
From TFA:
Hi,
I am a Program Manager on the Bing team at Microsoft, thanks for bringing this issue to our attention. I have sent an email to nospam@example.com as we need additional information to be able to track down the problem. If you have not received the email please contact us through the Bing webmaster center at nospam@example.com.
I mean, what additional information is needed wrt "respecting robots.txt" and "not letting loose more than one bot on a site at a time"?
Bing. Meh.
I know everyone likes to assume that Microsoft is being evil here, but wouldn't the more realistic assumption be that they were just being incompetent?
This is my sig.
It's not a bug, it's a feature to index a site with a new, rapid, powerful, direct, personalised crawler :)
http://arstechnica.com/microsoft/news/2010/01/microsoft-outlines-plan-to-improve-bings-slow-indexing.ars
Domestic spying is now "Benign Information Gathering"
I had a registration page - static content basically. The only thing that was dynamic was that it was referred to by many pages on the site with a variable in the querystring. Bing decided that it needed to check on this one page *thousands* of times per day.
They ignored robots.txt.
I sent a note to an address on the Bing site that requested feedback from people having issues with the Bing bots - nothing.
The only thing they finally 'listened' to was placing "" in the header.
This kind of sucked because it took the registration page out of the search engines' index, however it was much better than being DDOS'd. Plus, the page is easy to find on the site so not *that* big a deal.
Bing has been open for months now and if you search around there are tons of stories just like this. Maybe now that a site with some visibility has been 'attacked', the engineers will take a look at wtf is wrong.
I have noticed the microsoft crawlers (msnbot) being fairly inefficient on many of my sites...
In contrast to googlebot and spiders from other search engines msnbot is far more aggressive, ignores robots.txt and will frequently re-request the same files repeatedly, even if those files haven't changed... Looking at my monthly stats (awstats) which groups traffic from bots, msnbot will frequently have consumed 10 times more bandwidth than googlebot, but is responsible for far less incoming traffic based on referrer headers (typically 1-2% of the traffic generated by google on my sites).
Other small search engines don't bring much traffic either, but their bots don't hammer my site as hard as msnbot does.
http://spamdecoy.net - free throwaway anonymous email - avoid spam!
Are we sure this traffic comes from Microsoft? Could it not consist of forged network packets? You don't need a reply if you are running a DDOS. On the other hand, why would anyone, including Microsoft, want to bring down CPAN?
Nae king! Nae laird! Nae yurrupiean pressedent! We willna be fooled again!
If they've identified the IP ranges, why not just block them? You can do it at the router or TCP level (drop packets), or just throw up a 403 Forbidden.
rooooar
That's not a troll. That's common knowledge.
A more appropriate mod would be +5 Redundant.
Yes, Evil more so
I suppose Microsoft can offer a simple explanation: "Our servers and other internal infrastructure are so vulnerable that they have been hacked and are being used as remote-controlled botnets."
The largest prime factor of my UID is 263267.
They have to have SOME activity.
Sounds like there's more traffic from their bots than customers.
"I've got more toys than Teruhisa Kitahara."
Can anyone here clarify what robots.txt stands for, as in:
Is it an 'agreement' to not scan the site at all (by a search engine bot), or is it meant to just not -display- those results in the search engine?
I'd assume, since everything on a site is more or less public, that it would be the second. And if so, I can't see anything wrong with what Microsoft's bots did.
I can see how scanning a site's content (even if you're not going to list the results in your search engine) can have some value to a company.
When you shoot a mime, do you use a silencer?
AFAIK, the one doesn't exclude the other.
However, assuming evil is more fun :-)
Insert
> ...issues accessing their sites...
"Issues"? What's wrong with "problem"? "Issues" is marketing-speak. Microsoft marketing-speak.
And yes, get off my lawn.
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
I redirect lost bots home, seems a polite thing to do. 301 www.microsoft.com
you might want to read that over again
The US Gov't has successfully operated as a going concern for 220+ years, with a proven and reliable management structure. Few, if any corporations, have been able to do that.
Private corporations can go under with just a couple of bad years. Or even months, particularly if they're new businesses. Governments just have to raise taxes.
I'm pretty sure the first "D" in DDoS stands for "Distributed."
If it was really a DDoS, you wouldn't be able to filter the IP out with a simple regex (like the /^65\.55\.(106|107|207)/ from TFA).
To boot, TFA didn't even say DDoS. Maybe that's too much to expect the editors to oh... I don't know...say... RTFA or Fact-Check it?
I should drop my bar a bit, I suppose.
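For what it's worth, filtering on that pattern really is only a few lines. A minimal sketch in Python, assuming the client addresses are already available as plain strings; the function name and the sample addresses are illustrative only, not anything from TFA:

```python
import re

# The pattern quoted in the comment above, covering the three ranges named in TFA.
BLOCKED = re.compile(r"^65\.55\.(106|107|207)")

def is_blocked(ip: str) -> bool:
    """Return True if the client address starts with one of the listed prefixes."""
    return BLOCKED.match(ip) is not None

# Hypothetical sample of client addresses, e.g. pulled from an access log.
sample_clients = ["65.55.207.14", "65.55.34.139", "83.211.46.34", "65.55.106.138"]

# Keep only the clients that do not match the blocked prefixes.
allowed = [ip for ip in sample_clients if not is_blocked(ip)]
print(allowed)
```

Which is the point: a handful of adjacent /24s is a single filter rule, not much of a "distributed" anything.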
you might want to read that over again
Didn't say I was!
This is my sig.
I don't think you actually know what you want and I think people of a similar mind will do much more harm to the US
I think that is a fair statement. I'm putting together a piece for the relaunch of my web site that takes the federal budget, breaks it down to # of days you have to work to support each line item, says, what happens if you don't do that, then, lets you cut to your heart's content, and then tallies the results for everyone to see what the averages are.
I don't think anyone even really gets the government at all, left or right.
This is my sig.
ipchains -A input -j REJECT -p all -s 65.55.207.0/24 -i eth0 -l
ipchains -A input -j REJECT -p all -s 65.55.107.0/24 -i eth0 -l
ipchains -A input -j REJECT -p all -s 65.55.106.0/24 -i eth0 -l
problem solved
Don't kid yourself. It's the size of the regexp AND how you use it that counts.
The CPAN folks could complain to their ISP and have them drop the traffic that's coming in to their boxes.
Most ISPs will work with you to correct DDOS problems.
If you don't know, you should Google it, that will make it clear. /. is not a -help mailing list and this was stupid, feckless and criminal, as in misuse of a computer system beyond authorisation.
MS will just blame outsourcing, Danger engineering, pink, a new team just took over, oh wait they used that.
Outsourcing works best.
If you think about the other options it gets more interesting..
Are google, yahoo, ms etc. all passing robots.txt over?
As a shareholder, why waste the cpu time, storage and power costs if it's not of any direct short-term or long-term use?
Put that cpu time, storage and power usage to good for profit calculations, indexing faster or quality ads.
If not who is paying for the ignore sites and why..
Domestic spying is now "Benign Information Gathering"
I'm a right winger and I like to see smaller, less intrusive government, but, I think it is wrong to say that the US government is competent.
The US Gov't has successfully operated as a going concern for 220+ years, with a proven and reliable management structure. Few, if any corporations, have been able to do that.
Let's see...
War on Poverty - yeah, that worked out *real* well, didn't it.
War on Drugs - See any results there?
War on Terror - With this one, I can't really tell if it's bungling, or actual malice
Those are just the "big names".
The US Government is a well-designed structure. And it worked pretty darn well for a while. But as federal power has increased, the effectiveness of that structure has decreased. In short, the Republic of the Founding Fathers is showing its years.
And, for the record, there are organizations such as Lloyd's of London that can trace their existence, in one form or another, back to the 17th century. And if you want a truly old corporation, look at Stora Kopparberg Bergslags Aktiebolag in Sweden, which has been around since the 1300's!
Most corporations wither and die after a while, since their markets can wither and die. The "market" of a government does not - the only way for a government to "die" is by armed force, either from within or without.
Don't tell me to get a life. I had one once. It sucked.
How dare you sir (or madam)!! How dare you! It is clear from the title of your post that you were not so subtly casting aspersions on an organization who I hold dear -- namely the Hirsute Dungeons n' Dragons society. You can frame your remarks in some obscure racial epithets, but to those of us who twirl our mustaches or stroke our beards while rolling dice, your insidious implication is brazenly clear. As the leader of a group of men (and women) With decorative facial hair who play Dungeons n' Dragons every Wednesday night, I cannot help but express the strongest offense to your euphemistically delivered hidden acronym. In the future, should you have such thoughts I would urge you to Do Not Say them.
"You never pushed a noun against a verb except to blow up something" (Spencer Tracey, 'Inherit the Wind')
You mean the mods have to read TFCs? D:
I read TFA and all I got was this lousy cookie
You know women with decorative facial hair? mkaaaay....
Disallow a directory in robots.txt. If anyone opens it, have a link there along the lines of: "If you open this your IP will be blocked." Everyone who requests that link gets nullrouted for a week; if they do it again they get nullrouted forever.
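The bookkeeping for a trap like that is small. A rough sketch in Python, assuming a hypothetical `/trap/` directory that is listed under `Disallow` in robots.txt; the class and its names are made up for illustration, and the actual nullrouting (firewall or routing table changes) is left out entirely:

```python
import time

WEEK = 7 * 24 * 3600      # one week, in seconds
TRAP_PREFIX = "/trap/"    # hypothetical directory, disallowed in robots.txt

class TrapList:
    """Tracks which clients requested the trap link and how long they stay blocked."""

    def __init__(self):
        self.strikes = {}        # ip -> number of times it requested the trap
        self.blocked_until = {}  # ip -> time the block expires (inf = forever)

    def record(self, ip, path, now=None):
        """Register a request; only requests under the trap prefix count as strikes."""
        now = time.time() if now is None else now
        if not path.startswith(TRAP_PREFIX):
            return
        self.strikes[ip] = self.strikes.get(ip, 0) + 1
        # First strike: blocked for a week. Any repeat: blocked forever.
        if self.strikes[ip] == 1:
            self.blocked_until[ip] = now + WEEK
        else:
            self.blocked_until[ip] = float("inf")

    def is_blocked(self, ip, now=None):
        """Return True while the client's block has not yet expired."""
        now = time.time() if now is None else now
        return now < self.blocked_until.get(ip, 0.0)
```

Well-behaved crawlers never see the link because they honor the `Disallow`, so in principle only the rude ones rack up strikes.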
What happens when the MS bots (which apparently ignore the robots.txt file) start indexing some site which provides pay-per-view information? Can we expect a fix to the problem then? All it takes is to get some lawyers involved, you know how that snowball goes.
How's it possible that, on Slashdot of all sites, *I*, of all people, need to tell you that IP packets do not necessarily come from the address inscribed in their headers?
While he could be more polite, it is indeed embarrassing for Microsoft if they cannot check their own network
a) for the existence of computers with given IPs
b) what these computers are doing
I think that deserves an "insightful" that cancels out the "flamebait".
C - the footgun of programming languages
Bing?
Ned? Ned Ryerson?
There's no place like
Nothing you listed under the "War on Drugs" has anything to do with the war on drugs.
The war on drugs has made America a police state where the government can seize any of your property and auction it for profit before your trial. Even if you are found innocent, or the charges are thrown out for insufficient grounds, you will not be compensated for your lost money or profit. It has made an America where more people are imprisoned than any other nation on earth. It has made a nation where the cheapest and most effective drug for curing glaucoma and mitigating the pain and nausea associated with cancer treatments is a crime. It's made a nation where at least half its citizens are criminals.
Robots.txt is merely advisory. Ignoring it is discourteous and oafish but not illegal.
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
if it's a scan (TCP established stream, taxing the SERVERS, not the NETWORK) that's the problem, as opposed to a SYN flood etc, and the IP addresses are in a very small range, why aren't they just using a hardware firewall at the router and blocking the IPs? There's not a whole lot to "distributed" when it's coming from a pair of C's.
Not saying they should be DOING it, but this is not a Denial of Service, it's a Denial of Stupid.
I work for the Department of Redundancy Department.
Wow, this article is prescient.
I was just noticing in my web logs that small, out of the way sites that I host that used to get 1,000 hits a month were suddenly getting 1,000 hits PER DAY. Sure enough, anybody care to guess what netblock the 26,000 hits came from?
Microsoft.com just earned a ban.
Watch out for that first step, it's a DOOZY!
Don't blame me, I voted for Baltar.
Sweet, got any room in your group? I have my own dice, mustaches and beard, though sadly lacking women for some reason.
Question reality.
You left out the "War on Drugs"
A total failure to treat a social problem. Wasn't Prohibition ! enough for you.
Legalize it and tax the crap out of it.
Who is threatened by that? The crooks. As long as there's a "War on Drugs", crooks are guaranteed monopoly profits and monopoly access, all supported by your tax dollars keeping people in jail.
The first rule of consulting is "No matter what they say, it's ALWAYS a people problem." Well, it's true here too.
Right now, the US has more people in jail than any other country in the world. And the #1 reason is the "War on Drugs." Stop it and you'll reduce crime, reduce drug use, save money and lives, and fix the deficit.
Stop it and you'll reduce crime,
No, we'll just legalize it. So now we'll have corporations buying out advertising to convince people to ruin their lives by purchasing smack.
This is my sig.
I'm a right winger and I like to see smaller, less intrusive government, but, I think it is wrong to say that the US government isn't competent.
Fixed that.
This is my sig.
Evil does not require malice
Do the same as tobacco - high taxes, illegal to advertise, gross packaging, fines of up to 2/3 of a million dollars for illegal distribution, etc.
Got it! Bing is written in perl. They do regular expression matching while crawling and forgot to have a \Q ... \E escape sequence for the regex matching. They got so much perl code on CPAN, full of special characters, that somehow the crawler engine went into an infinite loop.
Bingo Dictionary - Pragmatist, n. A myopic idealist.
He said he was a right winger.
You are being MICROattacked, from various angles, in a SOFT manner.
Add to your .htaccess file:
deny from 65.55.207.
deny from 65.55.106.
deny from 65.55.107.
The file in question is robots.txt for cpantesters.org, which does exist.
how to invest, a novice's guide
Do the same as tobacco - high taxes, illegal to advertise
I'm down with that.
This is my sig.
Bing should have used Wget first to download the articles to a local hard drive, and also to add a 2 to 3 second wait. Let it run over the weekend. Then test the search indexing algorithms on the local HTML files. They were probably performing indexing tests. I know they have smart people working for them, so it probably involved a contractor who didn't think about performance issues.
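The "check robots.txt, then wait between requests" idea can be sketched with Python's standard `urllib.robotparser`. This is only an illustration under assumptions: the `polite_crawl` function, the `examplebot` agent name and the sample rules are hypothetical, and `fetch` stands in for whatever actually downloads a page:

```python
import time
import urllib.robotparser

def polite_crawl(paths, robots_lines, fetch, agent="examplebot",
                 default_delay=2.0, sleep=time.sleep):
    """Fetch only the paths robots.txt allows, pausing between requests."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_lines)
    # Use the site's Crawl-delay if it declares one, else fall back to our own wait.
    delay = parser.crawl_delay(agent) or default_delay
    fetched = []
    for path in paths:
        if not parser.can_fetch(agent, path):
            continue  # disallowed by robots.txt: skip it entirely
        fetch(path)
        fetched.append(path)
        sleep(delay)
    return fetched

# Hypothetical rules, as they might appear in a site's robots.txt.
rules = [
    "User-agent: *",
    "Crawl-delay: 3",
    "Disallow: /private/",
]
```

Passing `sleep` and `fetch` in as parameters is just a convenience so the loop can be exercised without touching the network or actually waiting.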
If you remember the history of robots.txt because you were there at the time, rather than because you read it in some history book somewhere, the purpose was to protect small web servers from being trashed by big search robots, initially altavista, and secondarily to protect them from other well-behaved web crawlers of whatever sorts. There were no script-generated pages back then, or at least hardly any; just handing out static html could be difficult enough if you had a small pipe and a slow server, though serving images to a robot was obviously a waste of time back then.
Tarpits of various sorts existed soon after robots.txt, as a way of trapping spammer-run crawlers that ignored robots.txt, but that was as much for fun as for necessity :-)
And yes, people did have /private/ directories back then and still do now, thinking that because Google's polite about not looking in directories robots.txt says not to, there aren't humans or impolite robots that will look there.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
It'll just keep them from bothering you, and you're (almost by definition) too small for them to care that they're not indexing your site.
Advertising their IP address block with BGP, if your ISP is careless enough to let you do that, now *that* would get their attention :-)
As an intermediate level of annoyance, you could set up your DNS server to respond to queries from Microsoftland to return entertaining IP addresses, such as 127.0.0.2 or bing's IP addresses or whatever.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
The primary reason for robots.txt was to protect small slow web servers from being swamped by Altavista's big fast web crawlers. Dynamic pages weren't a problem back then. On the other hand, after robots.txt became common, setting up dynamic pages to trap crawlers that ignored it into infinite loops became common also, because most of them were run by spammers of various sorts.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
Search engines try to tell humans what web sites would have interesting contents based on their queries. They use robots and content models to approximate that so they can produce results quickly and economically. SEOs try to get the robots to tell the humans "my page is really interesting", when it usually isn't, which is scummy lying, and you shouldn't encourage such people.
They've really got three things to offer:
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
I don't see any requests for Robots.txt in my logs. It's always lower case:
65.55.106.138 - - [19/Jan/2010:01:00:46 +0100] "GET /robots.txt HTTP/1.1" 200 30 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
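For anyone who wants to check their own logs the same way, here is a minimal Python sketch that picks apart one entry, assuming the usual Apache "combined" log format; the regex and field names are my own, not taken from any particular log tool:

```python
import re

# Apache "combined" format:
# host ident user [time] "method path protocol" status bytes "referer" "user agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# The msnbot request for robots.txt quoted in this comment.
line = ('65.55.106.138 - - [19/Jan/2010:01:00:46 +0100] '
        '"GET /robots.txt HTTP/1.1" 200 30 "-" '
        '"msnbot/2.0b (+http://search.msn.com/msnbot.htm)"')

# Split the entry into named fields.
entry = LOG_LINE.match(line).groupdict()
print(entry["host"], entry["path"], entry["agent"])
```

Comparing `entry["path"]` against `/robots.txt` exactly, rather than case-insensitively, is how you would confirm whether any bot ever asks for `Robots.txt`.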
They likely have a web-server written in perl
You are likely a fucking idiot. I saw your sig about spite. You're still a fucking idiot.
Actually why the fuck did you spend so much time writing all that totally speculative bullshit just to try and prove that MS aren't at fault? Do you want to explain that? Because case sensitive web servers written in Perl are a bit of a fucking stretch of the imagination.
And no, I don't want to show sound reasoning to disprove some wacky shit your diseased brain made up.
Here is another one from a few minutes ago:
IPv4: 65.55.34.139 -> 83.211.46.34
hlen=5 TOS=192 dlen=162 ID=46000 flags=0 offset=0 TTL=0 chksum=7990
Payload: Priority Count: 5
Connection Count: 6
IP Count: 7
Scanner IP Range: 78.130.238.2:212.90.12.134
Port/Proto Count: 7
Port/Proto Range: 80:40210
65.55.34.139 resolving to col0-omc3-s1.col0.hotmail.com
My, seems like I struck a nerve. The web server doesn't have to be written in perl for it not to ignore case. The point was it doesn't.
As for my comment about them writing a web server in perl being wacky -- you obviously know nothing about the perl community. There is nothing that can not be done better in perl -- including a web server. I don't see that being an unwarranted comment. May not be true in this circumstance, but I'm sure it's been done. Did you even bother to check cpan for a cpan webserver? I'll take my bemusings, that you erroneously call speculations, over your sad, dim existence speculation any day. Tell me "CPAN::Mini::Webserver" isn't meant to serve up a copy of cpan -- maybe not used for the main site, but...maybe with a squid accelerator front end -- yeah...I could see it!
Too bad you are such a hateful, spiteful diseased thing. You really should get some help or consider doing mankind a favor and stop wasting the planet's resources with your continued existence. It would be the responsible thing to do.
-l
Governments can fail quickly, too. Sure, they usually fall to different problems than private entities do -- governments that fail generally do so, if it is early in their life, because of violent reactions by existing governments, and otherwise (early or not) because they so fail the populace that they see a violent reaction from them.
New attempts to start governments probably fail about as frequently as attempts to start businesses.
And, like any other government policy, raising taxes only works to the extent that the governed populace is willing to accept it.
Did you even bother to check the headers of the CPAN Testers site? It's Apache httpd. You spent longer typing your speculation and its defense than it would have taken you to verify for yourself.
how to invest, a novice's guide
You miss the main point. Funny how people focus on the unimportant details when they don't like the main statement of the post. Robots.txt says to ignore case. It only makes sense for their webserver to also ignore case for the file name. It sure would make retrieving cpan modules much easier. I'm always forgetting where some specific author has decided to put caps - because it isn't done consistently. It would be far smarter to allow case insensitive searching and usage given how capricious case usage is. They got hoisted by their own petard.
Most web servers ignore case. Theirs doesn't because they like to give authors the ability to randomly force users to remember random combinations of case. Yipee. The bit about the perl server was a piece of dry wit for reasons I've previously stated. It wasn't meant as an insult. That you took it that way shows you aren't a true perl aficionado, so stop complaining.
If I remember apache defaults to options to set case insensitivity. So they'd have to explicitly disable case insensitivity to enable this vulnerability.
Bing
Bing
If you can't get any trivially verifiable details correct (including which site this is), why should anyone take your random speculations seriously?
how to invest, a novice's guide
Ya, give them an excuse to get away with it. "it wasn't us attacking our competition, really"
---- Booth was a patriot ----
http://www.networksolutions.com/whois/results.jsp?ip=65.55.207.0