Slashdot Mirror


Microsoft Bots Effectively DDoSing Perl CPAN Testers

at_slashdot writes "The Perl CPAN Testers have been suffering issues accessing their sites, databases and mirrors. According to a posting on the CPAN Testers' blog, the CPAN Testers' server has been aggressively scanned by '20-30 bots every few seconds' in what they call 'a dedicated denial of service attack'; these bots 'completely ignore the rules specified in robots.txt.'" From the Heise story linked above: "The bots were identified by their IP addresses, including 65.55.207.x, 65.55.107.x and 65.55.106.x, as coming from Microsoft."

332 comments

  1. So how do we DDoS Microsoft? by drinkypoo · · Score: 4, Funny

    Anyone know which of Microsoft's front-facing sites are most computationally intensive, and yet always dynamically generated? :D

    --
    "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    1. Re:So how do we DDoS Microsoft? by Anonymous Coward · · Score: 2, Interesting

      Bing? ...But that would only help them to DDoS Bing.

    2. Re:So how do we DDoS Microsoft? by Lennie · · Score: 2, Insightful

      http://blogs.msdn.com/

      I've seen it fail many times

      --
      New things are always on the horizon
    3. Re:So how do we DDoS Microsoft? by SharpFang · · Score: 2, Insightful

      No, we just make mistakes writing our Perl programs for automatically downloading stuff from MSDN. Like, download() unless success, and forget to set success=true;

      --
      45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
    4. Re:So how do we DDoS Microsoft? by jisatsusha · · Score: 2, Funny

      All that'd serve to do is make them look more popular than ever. Traffic up 300%! Sounds like a good mar

    5. Re:So how do we DDoS Microsoft? by Anonymous Coward · · Score: 3, Funny

      That's exactly what I said. Don't you dare leech the score from me, jackass!

    6. Re:So how do we DDoS Microsoft? by jlp2097 · · Score: 5, Informative

      Not necessary. A Bing Program Manager has already commented on the CPAN Testers blog entry upon which the article is based:

      Hi,
      I am a Program Manager on the Bing team at Microsoft, thanks for bringing this issue to our attention. I have sent an email to barbie@cpan.org as we need additional information to be able to track down the problem. If you have not received the email please contact us through the Bing webmaster center at bwmc@microsoft.com.

      As said below, never ascribe to malice that which can be adequately explained by stupidity. (Insert lame joke about MSFT being full of stupidity here).

    7. Re:So how do we DDoS Microsoft? by John Hasler · · Score: 1

      Seems like the CPAN admin has already solved the "issue".

      --
      Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
    8. Re:So how do we DDoS Microsoft? by Anonymous Coward · · Score: 5, Funny

      As much spam as I get from ir@infousa.com , I wish that someone would DDOS that damned company. If I knew of a way to get extra spam to ir@infousa.com I would probably do it so that company could get a taste of its own medicine. ir@infousa.com sent me unsolicited spam and it drives me nuts. Thanks for nothing, ir@infousa.com . It makes me want to call the company at (402)593-4500 and complain, but I don't have time. I guess I'll email them at ir@infousa.com instead. maybe.

    9. Re:So how do we DDoS Microsoft? by kulnor · · Score: 5, Funny

      Well, with Barbie(TM) on the case, this should be quickly resolved (unless she's too busy with G.I.Joe(TM))

    10. Re:So how do we DDoS Microsoft? by PetoskeyGuy · · Score: 4, Insightful

      Why make things worse? Block the ip address or range and notify the admins. This isn't a chan mob.
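
      A minimal sketch of the "block the range" step, in Python. It assumes the "x" in the addresses quoted in the story stands for the whole last octet (a /24); the names BLOCKED_NETWORKS and is_blocked are made up for illustration.

```python
import ipaddress

# Ranges quoted in the story, with "x" read as a full /24 (an assumption).
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("65.55.207.0/24", "65.55.107.0/24", "65.55.106.0/24")
]


def is_blocked(address: str) -> bool:
    """Return True if the client address falls inside any blocked range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in BLOCKED_NETWORKS)


if __name__ == "__main__":
    for candidate in ("65.55.207.14", "65.55.108.1"):
        print(candidate, "blocked" if is_blocked(candidate) else "allowed")
```

      In practice the same check would more likely live in a firewall or web server rule; the function only shows the range test itself.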

    11. Re:So how do we DDoS Microsoft? by Mephistro · · Score: 0

      Clue: Subtle joke, deserves 'funny' moderation ;)

    12. Re:So how do we DDoS Microsoft? by Zarf · · Score: 3, Insightful

      Clue: Subtle joke, deserves 'funny' moderation ;)

      Subtle + Slashdot = FAIL

    13. Re:So how do we DDoS Microsoft? by Anonymous Coward · · Score: 0, Troll

      Thank you MS for admitting to the world that you're completely incapable of fixing the problem on your own. How horrible are your employees at their jobs when they require the assistance of their victims to fix the problem?

    14. Re:So how do we DDoS Microsoft? by Anonymous Coward · · Score: 5, Insightful

      "as we need additional information to be able to track down the problem."

      IP addresses aren't enough? You're MS--if you can't fix the problem and IP addresses are given, damn, that's just sad. You're a freaking massive multi-billion dollar tech company, and this is the best you can do?

      No wonder Chinese hackers own our asses.

      Then again, it took Comcast 9 months to fix a security hole in customer accounts (which would have required adding an s to http to make pages SSL'd), and the only reason it was "fixed" was because they did their annual website makeover and changed their entire system to something Flash based. Then again, I had contacted a VP, VP's security, referred to web security, and talked to web security 3x, talked to a manager. The last 3 groups verified the problem. It was referred to their web applications team by that point, who sat on it.

      Lovely world we live in.

    15. Re:So how do we DDoS Microsoft? by Penguinisto · · Score: 2, Insightful

      As said below, never ascribe to malice that which can be adequately explained by stupidity. (Insert lame joke about MSFT being full of stupidity here).

      Given the back-story on the whole Danger data loss affair, stupidity is the FIRST thing I'd ascribe to Microsoft these days...

      --
      Quo usque tandem abutere, Nimbus, patientia nostra?
    16. Re:So how do we DDoS Microsoft? by Penguinisto · · Score: 1

      So, really, the Perl guys deserve to be on the receiving end of some shitty code for once.

      So, err, it's .NET versus Perl (okay, PHP) then in a battle over whose customer base can mis-use whose code the worst on the public Internet?

      Fuck this - I'm going back to BBS.

      --
      Quo usque tandem abutere, Nimbus, patientia nostra?
    17. Re:So how do we DDoS Microsoft? by WinterSolstice · · Score: 4, Insightful

      Actually, your statement works better with 'INSERT LANG HERE'...

      I'm always surprised by how people seem to think that any language has a monopoly of some sort on sloppy and/or lazy coders. Been doing IT a long time, and the one thing that never changes is the sloppy/lazy code issue. It even predates programming, you know - look at infrastructure around the world for examples of "just toss something out there, hope it works".

      --
      An operating system should be like a light switch... simple, effective, easy to use, and designed for everyone.
    18. Re:So how do we DDoS Microsoft? by Short Circuit · · Score: 4, Insightful

      A quick guess? Identifying unique sites by domain name, rather than by IP address, and either the bot or server not respecting HTTP 301 redirects.

      With Rosetta Code, I once had www.rosettacode.org serving up the same content as rosettacode.org. My server got pounded by two bots from Yahoo. I could set Crawl-Delay, but it was only partially effective; one bot had been assigned to www.rosettacode.org, while another to rosettacode.org, and they were each keeping track of their request delay independently. I've since corrected things such that www.rosettacode.org returns an HTTP 301 redirect to rosettacode.org, and was eventually able to remove the Crawl-Delay entirely.

      I've since worked towards only serving up content for any particular part of the site on a single domain name, and have subdomains such as "wiki.rosettacode.org" redirect to "rosettacode.org/wiki", and "blog.rosettacode.org" to "rosettacode.org/blog". Works rather nicely, though it does leave me a bit more open to cookie theft attacks.

      YMMV; As I said, that was a quick guess.
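
      The redirect scheme described above can be sketched roughly as follows, in Python. This is not the site's actual configuration (which the post does not show); the function name redirect_target and the SUBDOMAIN_PATHS table are hypothetical, and only the host names come from the post.

```python
from typing import Optional

CANONICAL_HOST = "rosettacode.org"  # the single name that serves content

# Subdomain -> path prefix on the canonical host, as listed in the post.
SUBDOMAIN_PATHS = {"wiki": "/wiki", "blog": "/blog"}


def redirect_target(host: str, path: str) -> Optional[str]:
    """Return the URL to answer with HTTP 301, or None to serve directly."""
    host = host.lower().split(":")[0]  # drop any :port suffix
    if host == CANONICAL_HOST:
        return None  # already canonical, serve the content
    suffix = "." + CANONICAL_HOST
    if not host.endswith(suffix):
        return None  # not one of our names; left to other handling
    label = host[: -len(suffix)]
    if label == "www":
        return f"http://{CANONICAL_HOST}{path}"
    prefix = SUBDOMAIN_PATHS.get(label)
    if prefix is None:
        return None
    return f"http://{CANONICAL_HOST}{prefix}{path}"


if __name__ == "__main__":
    print(redirect_target("www.rosettacode.org", "/w/index.php?title=Foo"))
    print(redirect_target("blog.rosettacode.org", "/some-post"))
    print(redirect_target("rosettacode.org", "/"))
```

      Because every alias answers with a 301 to one name, a crawler that tracks its request delay per host name ends up with a single counter for the whole site.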

    19. Re:So how do we DDoS Microsoft? by Spatial · · Score: 5, Funny

      How horrible are your employees at their jobs when they require the assistance of their victims to fix the problem?

      [Every IT worker on Slashdot looks around nervously]

    20. Re:So how do we DDoS Microsoft? by Hurricane78 · · Score: 1, Insightful

      As said below, never ascribe to malice that which can be adequately explained by stupidity.

      Must be really easy to just beat you in the face, and say “Ooops, I’m sorry, I’m so st00pid! *drool*”
      I call bullshit on that rule.

      My rule: Don’t make judgements at all (either way), about things that you just don’t know.

      --
      Any sufficiently advanced intelligence is indistinguishable from stupidity.
    21. Re:So how do we DDoS Microsoft? by jc42 · · Score: 4, Interesting

      As said below, never ascribe to malice that which can be adequately explained by stupidity. (Insert lame joke about MSFT being full of stupidity here).

      Yeah, though this particular sort of stupidity has been going on for a long time, and not just at Microsoft (though they seem to be the worst culprit).

      I run a couple of sites that, among other things, have links to return the "content" in a list of different formats (GIF, PNG, PS, PDF, ...). Periodically, the servers get bogged down by search sites hitting them many times per second, trying to get every file in every format. The worst cases seem to come from microsoft.com and msn.com, though it happens with other search sites, too. Actually, the first attempts I saw at "deep search" like this came from googlebots around 10 years ago, though they quickly backed off and haven't been a serious problem since then. MS-origin "attacks" of this sort have been happening every few months, for nearly a decade.

      I've generally handled them with a couple of techniques. One is to check the logs for successive requests from the same address, and insert sleep() calls with progressively longer sleeps as more messages arrive. The code prefixes the "content" with a comment explaining what's happening, in case a human investigates.
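
      A rough Python sketch of the progressive-delay idea described above. The window size, the number of free requests and the step length are all invented for illustration; the post does not say what values or data structure the real code uses.

```python
from collections import defaultdict
from typing import Dict, List

WINDOW_SECONDS = 10.0      # how far back "successive requests" are counted
FREE_REQUESTS = 5          # requests per window that get no delay
STEP_SECONDS = 0.5         # extra delay per request beyond the free ones
MAX_DELAY_SECONDS = 30.0   # upper bound so a worker is never tied up forever


class ProgressiveThrottle:
    """Tracks recent hits per address and suggests a growing sleep time."""

    def __init__(self) -> None:
        self._hits: Dict[str, List[float]] = defaultdict(list)

    def delay_for(self, address: str, now: float) -> float:
        """Record a request at time `now` and return the seconds to sleep."""
        recent = [t for t in self._hits[address] if now - t < WINDOW_SECONDS]
        recent.append(now)
        self._hits[address] = recent
        excess = len(recent) - FREE_REQUESTS
        if excess <= 0:
            return 0.0
        return min(excess * STEP_SECONDS, MAX_DELAY_SECONDS)


if __name__ == "__main__":
    throttle = ProgressiveThrottle()
    for i in range(8):
        print(i, throttle.delay_for("65.55.207.1", now=i * 0.1))
```

      The caller would pass the result to time.sleep() before responding, and could prefix the response with a comment explaining the delay, as the post describes.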

      Another technique is to look for series of "give me this in all your output formats" requests, verify that it's a search bot, and add the address to a "banned" list of sites that simply get a message explaining why they aren't getting what they asked for, plus an email address if they want to get in contact. So far nobody at any search site has ever used that address. I did once get a response from a guy who was studying sites with such multi-format data, for a school project, to see how the various output formats compared in size and information content. I took his address off the banned list, and suggested that he add a couple-second delay between requests, and he finished his project a few days later.
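
      One way the "all your output formats" check might look, sketched in Python. The format list comes from the post; the SweepDetector class, the SWEEP_LIMIT threshold and the automatic ban are assumptions. The post's manual step of verifying that the client really is a search bot is left out.

```python
from collections import defaultdict
from typing import DefaultDict, Set

FORMATS = frozenset({"gif", "png", "ps", "pdf"})  # formats named in the post
SWEEP_LIMIT = 3  # items fetched in every format before banning (assumed)


class SweepDetector:
    """Flags addresses that request many items in every available format."""

    def __init__(self) -> None:
        # address -> item -> set of formats that address has requested
        self._seen: DefaultDict[str, DefaultDict[str, Set[str]]] = defaultdict(
            lambda: defaultdict(set)
        )
        self.banned: Set[str] = set()

    def record(self, address: str, item: str, fmt: str) -> bool:
        """Record one request; return True if the address is now banned."""
        if address in self.banned:
            return True
        self._seen[address][item].add(fmt.lower())
        swept = sum(
            1 for formats in self._seen[address].values() if formats >= FORMATS
        )
        if swept >= SWEEP_LIMIT:
            self.banned.add(address)
        return address in self.banned
```

      A banned address would then get the explanatory message and contact address instead of the content.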

      I suspect that the googlebot folks may have read my explanation of the delays and added code to spread their requests out over time, since that's what their bots seem to do now. But I never heard from them. They must have gotten complaints (and bans) from lots of web sites when they started doing this, so they probably realized quickly that they should add code to prevent such flooding of sites.

      --
      Those who do study history are doomed to stand helplessly by while everyone else repeats it.
    22. Re:So how do we DDoS Microsoft? by Alpha830RulZ · · Score: 2, Insightful

      You know, it's easy to poke fun at the Microsofty, but is it possible that he was just trying to find out what was being hit so that he could figure out who in his organization he should contact? Maybe there is some uber technical way he could have figured this out, or maybe he should have RTFB, but his response sounded well intentioned and responsive. What would you prefer? The microsoft of old?

      --
      I was taught to respect my elders. The trouble is, it's getting harder and harder to find some.
    23. Re:So how do we DDoS Microsoft? by Anonymous Coward · · Score: 0

      never ascribe to malice that which can be adequately explained by stupidity.

      Too true. But keep in mind that stupid actions increase everyone's risk of getting damaged, while malevolent actions are generally dangerous only to the target.

      Car analogy: If you are not his target, you are safer facing a vengeful man with a gun than if you are a bystander in the path of a speeding Hummer driven by stupidity.

    24. Re:So how do we DDoS Microsoft? by Jarjarthejedi · · Score: 1

      That's not uncommon at all, ever hear of a bug report? Different systems/setups exist everywhere, it's impossible to test your system on all of them, just the most common and wait to hear from people with oddball systems and problems.

      Different systems don't really apply here, but what if the site's robots.txt is slightly different (different newlines or something), which is causing an unforeseen error?

      --
      There are two kinds of fool One says 'This is old therefore good' Another says 'This is new therefore better'- Dean Ing
    25. Re:So how do we DDoS Microsoft? by John Hasler · · Score: 1

      > ...his response sounded well intentioned and responsive...

      Microsoft managers are very good at *sounding* well intentioned and responsive when the shit hits the fan. But why the hell can't they do things right the first time?

      --
      Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
    26. Re:So how do we DDoS Microsoft? by Lonewolf666 · · Score: 1

      If he was trying to find out what was going on, he was doing it wrong. The CPAN blog probably gives all the information the guys at CPAN have. As outsiders, they probably don't know which department at Microsoft is running those bots. Except that we all can guess at Bing because that is the Microsoft search engine.

      As a Microsoft manager in that situation, I'd try to reach someone in Bing network administration first. That someone might not have the tools or network privileges to track down the offending bots himself, but should at least be able to direct the call to the right people.

      --
      C - the footgun of programming languages
    27. Re:So how do we DDoS Microsoft? by MstrFool · · Score: 2, Insightful

      Same reason other folks can't: they are human. Look, I despise MS for a variety of reasons and am one of the rabid anti-MS folks. But honestly, they do enough that is legit to gripe about; no need to blow a mistake like this out of proportion. Considering all they do, it was inevitable that this would happen at some point. Shit happens; anyone that codes has had a mega-woops at one point or another, and if they haven't, they are cookie-cutter coding and not risking creativity. Hate them for needlessly locking the geeks out of the systems, for locking the owners out of the systems while permitting hackers more remote access rights than they could get at the system itself. But this? 'eh, they goofed, get over it and worry about the real evil they are doing.

      --
      Question reality.
    28. Re:So how do we DDoS Microsoft? by Short Circuit · · Score: 4, Insightful

      The REAL solution to your problem is for everyone to abandon the dumb-as-shite "www" prefix.

      Why bother with www.example.com and example.com? Get rid of it. Anyone who still puts "www." on their business cards is a dufus.

      REAL solutions to immediate problems don't depend on the rest of the world changing to suit my needs. Also, the fact remains that there are links out there that point to "http://www.rosettacode.org/w/index.php?something_or_other", not all of those links will (or can) change, and I would be an absolute fool to knowingly break them, if I want people to visit RCo via referral traffic.

    29. Re:So how do we DDoS Microsoft? by raju1kabir · · Score: 4, Insightful

      Different systems don't really apply here, but what if the site's robots.txt is slightly different (different newlines or something), which is causing an unforeseen error?

      There is a spec for robots.txt. If someone's not following it, then it's their fault. Given Microsoft's past history, I know where I'd point the finger absent any more concrete information.
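
      For reference, this is roughly what honouring robots.txt looks like from the crawler's side, using Python's standard urllib.robotparser. The robots.txt content, the paths and the example.org URLs are invented for illustration and are not the CPAN Testers' actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; not the real CPAN Testers file.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 10
Disallow: /private/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(agent: str, url: str) -> bool:
    """A polite crawler asks this before every request."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(may_fetch("msnbot", "http://example.org/private/report.html"))
    print(may_fetch("msnbot", "http://example.org/index.html"))
    print(parser.crawl_delay("msnbot"))
```

      A crawler that skips this check, or that runs several bots each keeping its own delay counter, produces the kind of load described in the story.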

      --
      "Patriotism is your conviction that this country is superior to all other countries because you were born in it." -- GBS
    30. Re:So how do we DDoS Microsoft? by mounthood · · Score: 4, Insightful

      As said below, never ascribe to malice that which can be adequately explained by stupidity.

      Must be really easy to just beat you in the face, and say “Ooops, I’m sorry, I’m so st00pid! *drool*” I call bullshit on that rule.

      My rule: Don’t make judgements at all (either way), about things that you just don’t know.

      How about: Don't mistake organizational stupidity for individual stupidity. This isn't the case of a single bad coder making a mistake; this is an organization that's chosen how much effort to apply. How much testing and review? What failsafes, logging and active monitoring? Will options for feedback be accessible and responsive? Stupidity and Malice aren't mutually exclusive for an individual, and certainly not for an organization.

      --
      tomorrow who's gonna fuss
    31. Re:So how do we DDoS Microsoft? by Chris Burke · · Score: 5, Insightful

      I've never liked that saying because of the implication that malice and stupidity are exclusive.

      Dumb and mean are often found together.

      --

      The enemies of Democracy are
    32. Re:So how do we DDoS Microsoft? by dissy · · Score: 4, Interesting

      Every once in a while, I still see sites that don't serve up unless you include "www." in the address - but it's like I said - a dufus.

      Looks like someone hasn't read RFC 1178 and enjoys breaking interoperability.

      Your method also breaks email by redelegating MX records one sub domain above where the control should be and MX's point to, thus breaks delegation of sub domains.

    33. Re:So how do we DDoS Microsoft? by Anonymous Coward · · Score: 0

      That minx!

      And she told me math was hard to get me to dump her!

    34. Re:So how do we DDoS Microsoft? by HiThere · · Score: 1

      OK, but since I still wouldn't ascribe any trustworthiness to them, I doubt the manager's story. It's not that it's implausible, it's that it's coming from Microsoft.

      Put it this way:
      If Microsoft said the sky was blue, I'd carry a raincoat.

      --

      I think we've pushed this "anyone can grow up to be president" thing too far.
    35. Re:So how do we DDoS Microsoft? by davester666 · · Score: 1

      And now we know why the web-crawlers are so slow to index small web sites for Bing. They're spending all their time crawling open-source web sites in an effort to slow them down.

      --
      Sleep your way to a whiter smile...date a dentist!
    36. Re:So how do we DDoS Microsoft? by dimeglio · · Score: 1, Funny

      That would be Ken(tm), if I recall correctly, G.I. Joe(tm) was not interested in Barbie(tm).

      --
      Views expressed do not necessarily reflect those of the author.
    37. Re:So how do we DDoS Microsoft? by mmontour · · Score: 2, Funny

      Mission accomplished. I got this on the second link that I clicked.

      We are currently unable to serve your request
      We apologize, but an error occurred and your request could not be completed.
      This error has been logged. If you have additional information that you believe may have caused this error please report the problem here.

    38. Re:So how do we DDoS Microsoft? by HiThere · · Score: 1

      It's plausible. But the explanation came from Microsoft, so trusting it isn't reasonable. (Neither is claiming it's a lie. It *is* plausible.)

      This is one case where I have to say "Well, they might be telling the truth", and leave it at that.

      Still, blocking the addresses seems like the correct move. Even if the truth has leaked out of Microsoft, there's no telling how long it would take them to fix the problem.

      --

      I think we've pushed this "anyone can grow up to be president" thing too far.
    39. Re:So how do we DDoS Microsoft? by HiThere · · Score: 1

      Agreeing with what you said, it's still true that some languages are worse than others (along some particular vector of evaluation). Perl is used to produce more crufty code BECAUSE it's used to quickly hack together solutions.

      N.B.: It's not because Perl looks (to me) like line noise. APL is as bad for that (though not worse!). And several other languages are also noted for "write only code". But Perl was created specially for quick hacks. That is both its power and its weakness.

      --

      I think we've pushed this "anyone can grow up to be president" thing too far.
    40. Re:So how do we DDoS Microsoft? by Yakasha · · Score: 2, Funny

      Clue: Subtle joke, deserves 'funny' moderation ;)

      Subtle + Slashdot = FAIL

      And what exactly are you hinting at?

    41. Re:So how do we DDoS Microsoft? by MikeFM · · Score: 1

      Yeah, I complained about a similar issue: being aggressively scanned by bots that ignored robots.txt and didn't identify themselves with a user-agent. Their answer was to first ignore my question and then to almost stop scanning my site altogether. Bing sucks.

      --
      At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
    42. Re:So how do we DDoS Microsoft? by Anonymous Coward · · Score: 0

      Looks like someone hasn't read RFC 1178 and enjoys breaking interoperability.

      Aside from being a 20-year-old document that doesn't take into account the current use of names--that is, to name websites instead of individual machines--it also ends with this guideline: "There is always room for an exception."

    43. Re:So how do we DDoS Microsoft? by __aaclcg7560 · · Score: 2, Funny

      I thought Ken(tm) was interested in G.I. Joe(tm) these days. :P

    44. Re:So how do we DDoS Microsoft? by dkh2 · · Score: 1

      Of course, you presume that the gateway is smart enough to route all traffic to the correct device or sub-domain, and that the under-budgeted admin actually knows how to do that.

      I've seen a number of small companies with a very active digital presence for which the owner/president also manages the gateway and has the entire company running on a bank of repurposed workstation towers - each providing a specific service. The gateway box at domain.com doesn't provide anything but traffic cop services. The system named 'WWW' provides ONLY the HTTPd service. Likewise separate boxen provide POP, SMTP, etc...

      In these cases the domain.com gateway is a bare bones implementation and actually requires the additional information to route requests correctly.

      --
      My office has been taken over by iPod people.
    45. Re:So how do we DDoS Microsoft? by gbjbaanb · · Score: 2, Funny

      IP addresses aren't enough? You're MS--if you can't fix the problem and IP addresses are given, damn, that's just sad. You're freaking massive multi-billion dollar tech companies, and this is the best you can do?

      I've seen and used Vista. The answer to your question is "yes".

    46. Re:So how do we DDoS Microsoft? by Anonymous Coward · · Score: 0

      That's pretty stupid. He says he needs information from the CPAN site? He really shouldn't need any more information. They should be able to download the robots.txt file from CPAN and compare it to their own rules, then catalog what bots are active. From the sounds of things, the bots don't talk to each other, that's definitely a problem there.
      Bottom line, Mr. Program Manager, you have all the information you need.

    47. Re:So how do we DDoS Microsoft? by tomhudson · · Score: 0, Flamebait

      Of course, you presume that the gateway is smart enough to route all traffic to the correct device or sub-domain, and that the under-budgeted admin actually knows how to do that.

      I've seen a number of small companies with a very active digital presence for which the owner/president also manages the gateway and has the entire company running on a bank of repurposed workstation towers - each providing a specific service. The gateway box at domain.com doesn't provide anything but traffic cop services. The system named 'WWW' provides ONLY the HTTPd service. Likewise separate boxen provide POP, SMTP, etc...

      You don't need the "www" prefix to figure out that requests for port 80 are http, port 21 are ftp, 443 are https, 25 are smtp, and 110 is pop3.

      For those wanting to try this at home and work around their providers' traffic blocking: You also don't need the power consumption of a repurposed box for that when you can use port forwarding on a router. It'll even let you use one of your boxes on your home lan as a public-facing web/ftp/mail/whatever server (and you can set them up to listen to alternate ports, like 8080 for http, and 2525 for running your own private mail server). Throw in a redirect to your external ip from a known web page, and you're in business. You can even run a proxy that way.
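
      The port-forwarding idea amounts to a lookup table keyed by port rather than by host name. A toy Python model, in which the LAN addresses and the FORWARDS table are made up and only the port numbers come from the post:

```python
from typing import Dict, Optional, Tuple

# External port on the router -> (internal host, internal port).
# The 192.168.1.x addresses are hypothetical.
FORWARDS: Dict[int, Tuple[str, int]] = {
    8080: ("192.168.1.10", 80),   # alternate public port for http
    2525: ("192.168.1.11", 25),   # alternate public port for smtp
    21: ("192.168.1.12", 21),     # ftp passed straight through
}


def backend_for(external_port: int) -> Optional[Tuple[str, int]]:
    """Return where the router would send a connection, or None to drop it."""
    return FORWARDS.get(external_port)


if __name__ == "__main__":
    for port in (8080, 2525, 9999):
        print(port, "->", backend_for(port))
```

      No "www" name is involved at any point; the destination is chosen from the port alone.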

    48. Re:So how do we DDoS Microsoft? by Anonymous Coward · · Score: 0

      You don't seem to understand how DNS works. Technically, 'www' is a hostname and per convention was turned into the de facto standard hostname for websites. As another poster pointed out, showing websites without the www or other hostname portion goes against RFC rules and although it works, not all DNS servers / resolvers will support it.

      As for messing up the whole sub-domain naming scheme, please clarify?

    49. Re:So how do we DDoS Microsoft? by Anonymous Coward · · Score: 0

      Clue: Subtle joke, deserves 'funny' moderation ;)

      Subtle + Slashdot = FAIL

      Actually Subtle + Slashdot + You Not Understanding It = FAIL

    50. Re:So how do we DDoS Microsoft? by Anonymous Coward · · Score: 0

      Every once in a while, I still see sites that don't serve up unless you include "www." in the address - but it's like I said - a dufus.

      For example, most (all?) government sites in the UK

    51. Re:So how do we DDoS Microsoft? by Anonymous Coward · · Score: 1, Insightful

      Never ascribe to malice and stupidity what can be explained by stupidity alone.

      That better?

    52. Re:So how do we DDoS Microsoft? by Anonymous Coward · · Score: 1, Insightful

      Hey, great, sexism.

    53. Re:So how do we DDoS Microsoft? by budgenator · · Score: 2, Funny

      The unobtainable fruit is always thought to be the sweetest.

      --
      Apocalypse Cancelled, Sorry, No Ticket Refunds
    54. Re:So how do we DDoS Microsoft? by budgenator · · Score: 1

      The http: part does make the www. part redundant.

      --
      Apocalypse Cancelled, Sorry, No Ticket Refunds
    55. Re:So how do we DDoS Microsoft? by Achromatic1978 · · Score: 0, Flamebait

      Ye gods. I think, of all the threats to its business model that Microsoft has... "Needing to DDoS CPAN to stifle competition" ranks somewhere about ... oooh, 5,542nd?

      Shit happens. People misconfigure things. Even professionals. Someone noticed, complained, and someone else said they'd investigate and get resolved. Wow. Yawn.

      Instead we have Slashtroglodytes screaming about conspiracies by MSFT.

    56. Re:So how do we DDoS Microsoft? by Fzz · · Score: 1

      • Step 1. Turn on HTTP Compression on your server (msnbot supports it).
      • Step 2. Write a little cgi script that checks if the agent is msnbot, and if so, for every image on your web site, returns a really really large file of zeros. It won't cost you any bandwidth because gzip will compress all the zeros to very little for transmission.
      • Step 3. Invest in shares of Seagate and Western Digital. Short Microsoft.
      • Step 4. Profit!
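
      The compression claim in Step 2 is easy to check. A small Python measurement using the standard gzip module; the 10 MiB size is an arbitrary choice, and the snippet only measures the ratio, it does not serve anything.

```python
import gzip

RAW_SIZE = 10 * 1024 * 1024  # 10 MiB of zero bytes (arbitrary size)

raw = bytes(RAW_SIZE)            # bytes(n) yields n zero bytes
compressed = gzip.compress(raw)  # what would actually cross the wire

ratio = len(raw) / len(compressed)

if __name__ == "__main__":
    print(f"raw: {len(raw)} bytes")
    print(f"gzip: {len(compressed)} bytes")
    print(f"ratio: about {ratio:.0f} to 1")
```

      The receiving side still has to inflate the full payload, which is the point of the joke.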
    57. Re:So how do we DDoS Microsoft? by budgenator · · Score: 1

      As for messing up the whole sub-domain naming scheme, please clarify?

      Well, let's see: Example Corp is a mega-sized multinational with a physical presence in numerous countries providing goods and services; its global web portal is example.com, its web portal for American operations is usa.example.com, its Mexican operations site is mex.example.com, and Swiss operations are at che.example.com. Putting a "www." on the front just complicates things a smidgen, but it's not a deal breaker.

      --
      Apocalypse Cancelled, Sorry, No Ticket Refunds
    58. Re:So how do we DDoS Microsoft? by Anonymous Coward · · Score: 1, Insightful

      Even professionals.

      You're implying "professionals" work there? Ha, ha ha. Ignoring robots.txt, particularly with the extraordinary resources they have to get it right, is incompetence, not professionalism.

    59. Re:So how do we DDoS Microsoft? by Passman · · Score: 2, Funny

      Nah, G.I. Joe was interested in G.I. Joe these days. But don't bother asking, he won't tell.

      --
      Minne-snow-da: Winter is coming...
    60. Re:So how do we DDoS Microsoft? by dissy · · Score: 1

      You've made the classical mistake of mistaking the name of a thing with the thing itself.

      The name of my web server is not 'www'. The machine has a real name, and www is a cname (Alias) to that.

      In reality, for those of us that have been using the Internet since before 1992, before the web existed, some of us still run many other services than web servers, and some of them for a lot longer than web servers existed.
      Why single out a relatively new service to hand the root of your domain over to?

      Maybe 'example.com' points to my mail server, because I am an email company. That means I must use a subdomain, and it must be one my visitors KNOW IN ADVANCE.

      This means 'screwball.example.com' would be unreachable to anyone I did not tell the computer's name to ahead of time, since it would not be possible for just 'example.com' to redirect to the web server, being on a different machine than the mail server it already points to.

      This means going to 'example.com' would end up TCP wise at my mail server, and 'www.example.com' would not exist. How would anyone ever find my web server?

      Welcome to networking 101

      In addition, if you want to base your case off of that logic, then OK.
      If you think 'www' is the machine's name out of 'www.example.com', then your solution of using just 'example.com' means your machine is named 'example' and part of the network 'com'.

      That makes even less sense. (At least for all of us that are not ICANN/NetSol)

      It's 2 decades out of date, and more importantly, specifically states It does not specify any standard.

      Let's see what I said (Since you clearly didn't)

      Looks like someone hasn't read RFC 1178 and enjoys breaking interoperability.

      No, I don't see the word standard anywhere in there.

      I see 'enjoys breaking interoperability', and following that I see a very precise example, which you ignored.

    61. Re:So how do we DDoS Microsoft? by merreborn · · Score: 1

      Also, the fact remains that there are links out there that point to "http://www.rosettacode.org/w/index.php?something_or_other", not all of those links will (or can) change, and I would be an absolute fool to knowingly break them, if I want people to visit RCo via referral traffic.

      That can be resolved with a single, simple Apache rewrite rule.

      Continuing to support www. -- if only by rewrite rule -- is unfortunately a necessary evil presently. If it isn't "www.*.com", the technically unsavvy majority doesn't understand it.

    62. Re:So how do we DDoS Microsoft? by 31eq · · Score: 1

      I don't see any need for exceptions because I can't find anything in RFC 1178 that we'd need to make an exception to.

      I don't see any problem with MX records either. MX records exist precisely to solve this problem. The DNS tells you which server to send email to. That is, it delegates to a sub domain, if that's what you want. Nothing you do on your web server will break email.

      Now, you may have services other than web and email, yes. And you may have some problem with routing by protocol. But, given that it isn't 1992 any more, and this web thing doesn't look like a passing fad, it makes sense to use the shortest domain name for the most used protocol. That does mean that the vast majority of "www" prefixes are completely redundant. But, hey, aren't people funny things?

    63. Re:So how do we DDoS Microsoft? by tomhudson · · Score: 0, Flamebait

      As another poster pointed out, showing websites without the www or other hostname portion goes against RFC rules

      No, it doesn't. The rfc the poster quoted was about naming machines in general, NOT specifically about naming web servers. The title was "Choosing a name for your computer".

      The pertinent part says:

      Avoid domain names.

      For technical reasons, domain names should be avoided. In particular, name resolution of non-absolute hostnames is problematic. Resolvers will check names against domains before checking them against hostnames. But we have seen instances of mailers that refuse to treat single token names as domains. For example, assume that you mail to "libes@rutgers" from yale.edu. Depending upon the implementation, the mail may go to rutgers.edu or rutgers.yale.edu (assuming both exist).

      In other words, don't name your machine "slashdot" and expect it to work all the time.

      And:

      Avoid domain-like names.

      Domain names are either organizational (e.g., cia.gov) or geographical (e.g., dallas.tx.us). Using anything like these tends to imply some connection. For example, the name "tahiti" sounds like it means you are located there. This is confusing if it is really somewhere else (e.g., "tahiti.cia.gov is located in Langley, Virginia? I thought it was the CIA's Tahiti office!"). If it really is located there, the name implies that it is the only computer there. If this isn't wrong now, it inevitably will be.

      And, as I point out, it's only a suggestion, now rendered obsolete by 20 years of practice:

      This FYI RFC is a republication of a Communications of the ACM article on guidelines on what to do and what not to do when naming your computer [1]. This memo provides information for the Internet community. It does not specify any standard.

    64. Re:So how do we DDoS Microsoft? by Eil · · Score: 1

      Mod parent up. Any datacenter worth their salt has a way of blacklisting IPs at the router if a DoS can't be stopped at the server (although it honestly sounds like they didn't try that approach either).

    65. Re:So how do we DDoS Microsoft? by lena_10326 · · Score: 1

      I run a couple of sites that, among other things, has links to return the "content" in a list of different formats (GIF, PNG, PS, PDF, ...). Periodically, the servers get bogged down by search sites hitting them many times per second, trying to get every file in every format.

      I don't understand why you exposed them as links in your html, which crawlers will easily pick up. If bandwidth/CPU is a concern (it almost always is) one shouldn't serve plain links to large data files; files with many duplicate formats; or files meant only for human consumption. Is there a reason you don't serve them from CGI with mandatory POST arguments? You turn off indexing on the data file directory to lock out crawlers and use the CGI to validate POST arguments, which serves a 302 redirect link to the actual data file. It can be done with GET and mandatory args but going with POST gives you an extra layer because convention is crawlers don't do form POSTs. You can also throw in a CAPTCHA on the form page.

      Inserting a form POST does throw up an extra page, but it's become a sort of status quo for user initiated downloads. Some nanny HTTP purists might bitch about misusing POST, but it's a simple way to do it and can execute very fast with caching compiled CGIs because there's little CPU overhead with parsing CGI args and sending back a few headers.

      As for automated downloads (invoked from installer programs), they should be using file manifests to locate files inside the non-indexable data directory.
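The POST-gated download described above can be sketched roughly as follows. This is only an illustration, not the poster's actual setup: the `DOWNLOADS` manifest, the `file` form field, and the paths are all hypothetical, and the CAPTCHA step is left out.

```python
from typing import Dict, Optional, Tuple

# Hypothetical manifest: public file id -> real path inside a directory
# that has indexing turned off, so crawlers cannot enumerate it.
DOWNLOADS: Dict[str, str] = {
    "report-pdf": "/data/a1b2c3/report.pdf",
}


def handle_download(method: str, form: Dict[str, str]) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, Location header value) for a download request."""
    # Crawlers conventionally do not submit form POSTs, so refuse anything else.
    if method != "POST":
        return 405, None
    # Validate the mandatory form argument against the manifest.
    path = DOWNLOADS.get(form.get("file", ""))
    if path is None:
        return 404, None
    # Valid request: redirect the client to the actual data file.
    return 302, path
```

A well-formed POST such as `handle_download("POST", {"file": "report-pdf"})` yields a 302 with the real path, while a plain GET of the same URL is refused with 405.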

      --
      Camping on quad since 1996.
    66. Re:So how do we DDoS Microsoft? by Al+Al+Cool+J · · Score: 1

      Why single out a relatively new service to hand the root of your domain over to?

      Because that is the service that all of your internet-using customers will use to seek information about your company.

      Maybe 'example.com' points to my mail server, because I am an email company.

      Then that would be a stupid email company and deserves to go out of business.

      I'm sorry, but if http://example.com/ does not bring up your company's website, then you are a dismal IT failure, and no amount of rationalisation or waving RFCs about will change that.

      I understand and appreciate that there is often perceived to be a "right way" to do things in IT, but you still have to balance that against common sense, practical considerations, and user expectation. The "right way" may be right when seen within a specific and confined logical framework (networking 101), yet be completely moronic when placed within a broader context (business and marketing on the internet).

    67. Re:So how do we DDoS Microsoft? by fm6 · · Score: 1

      never ascribe to malice that which can be adequately explained by stupidity.

      You'll pry my conspiracy theory from my cold dead hands!!!!!

    68. Re:So how do we DDoS Microsoft? by tomhudson · · Score: 1

      The http: part does make the www. part redundant.

      Thank you!

      Reserve the extra stuff for subdomains, like blog.example.com, clients.example.com, specials.example.com, ads.example.com (so we can block that one more easily <:-0)

      We already know by the $PROTO http:// portion that it's web traffic, not ftp or ssh or telnet or ...

    69. Re:So how do we DDoS Microsoft? by tomhudson · · Score: 0, Flamebait

      Maybe 'example.com' points to my mail server, because I am an email company. That means I must use a subdomain, and it must be one my visitors KNOW IN ADVANCE

      Route your traffic to the right server based on the port requested. "cat /etc/services" for the list. No need for subdomains.

      All they need to know is example.com.

      Q: What's your domain name?
      A: example.com
      Q: So what's the name of your ftp server?
      A: example.com.
      Q: What's the smtp mail server?
      A: example.com.
      Q: What's your pop3 server?
      A: example.com.
      Q: So they're all on one machine?
      A: No. We use magic pixie dust.

    70. Re:So how do we DDoS Microsoft? by Chris+Burke · · Score: 1

      No, because you're still suggesting to discount malice in the presence of stupidity.

      "If it's possible it could be a case of stupidity alone, assume there's no mal intent involved," is a bad assumption in many cases, and thus dumb.

      --

      The enemies of Democracy are
    71. Re:So how do we DDoS Microsoft? by NNKK · · Score: 1

      I encourage you to explain to people serving large quantities of content that in order to be in compliance with your personal view of network correctness, their routers must now perform NAT on all of their traffic.

      I assure you, you will not get the hoped-for response from people saturating 10gbps uplinks.

    72. Re:So how do we DDoS Microsoft? by Emilio+III · · Score: 1

      This whole thing appears to be a fraud. The IP address range 66.55.96.0 - 66.55.111.255 belongs to Funds Xpress Financial Network, Inc. in Austin TX. 66.55.192.0 - 66.55.223.255 belongs to Great Works Internet of Biddeford ME. Who decided those IP addresses belong to Microsoft? Please check whois.arin.net

    73. Re:So how do we DDoS Microsoft? by spongman · · Score: 2, Funny

      let's hope they don't store it compressed...

    74. Re:So how do we DDoS Microsoft? by drinkypoo · · Score: 2, Interesting

      Instead we have Slashtroglodytes screaming about conspiracies by MSFT.

      Just for the record, since you're commenting under a thread I started, I do not believe that there was a conspiracy to attack CPAN. I think there is a conspiracy to continue accidentally attacking CPAN. The information provided ought to be more than sufficient to figure out what is going on. Remember, any time two people work to screw a third out of something, it's a conspiracy by definition.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    75. Re:So how do we DDoS Microsoft? by darthflo · · Score: 1

      Maybe 'example.com' points to my mail server, because I am an email company.

      Out of all the examples in the world you could pick, you went for the wrong one like a cartoon character falling into the desert and hitting the only cactus in a three-mile radius. There's a DNS record type called MX to identify Mail eXchanges for that domain. example.org. A may point to 10.2.3.4, which could be your web, telnet, irc and quake server yet example.org. MX would point to pizza.example.org. (which in turn has an A record to 10.4.5.6) and spaghetti.example.org. (10.1.1.1). You can even add several MX records with different priorities, so in the event of pizza failing, clients will try spaghetti. It's quite awesome.
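The MX fallback described above can be illustrated with a small sketch. The host names come from the comment; the preference values 10 and 20 are assumptions chosen for illustration (lower values are tried first), and no real DNS lookup is performed.

```python
from typing import Callable, List, Optional, Tuple

# (preference, mail exchanger) pairs as they might appear in the zone.
# The preference numbers are assumed; the comment does not give them.
MX_RECORDS: List[Tuple[int, str]] = [
    (20, "spaghetti.example.org."),
    (10, "pizza.example.org."),
]


def delivery_order(records: List[Tuple[int, str]]) -> List[str]:
    """Return mail exchangers in the order a sender would try them."""
    return [host for _, host in sorted(records)]


def pick_exchanger(records: List[Tuple[int, str]],
                   is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the most preferred exchanger that is currently reachable."""
    for host in delivery_order(records):
        if is_up(host):
            return host
    return None
```

With both hosts up, mail goes to pizza; if `is_up` reports pizza as down, `pick_exchanger` falls through to spaghetti. The A record for example.org itself plays no part in this, which is the point of the comment.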

    76. Re:So how do we DDoS Microsoft? by bcrowell · · Score: 1

      I run a couple of sites that, among other things, has links to return the "content" in a list of different formats (GIF, PNG, PS, PDF, ...). Periodically, the servers get bogged down by search sites hitting them many times per second, trying to get every file in every format.

      I've had sort of a similar issue, not with bots but with things known as "download managers" (example) Apparently people install a plugin in IE that is supposed to make their downloads go faster. If I'm understanding correctly, it opens up multiple http connections in order to retrieve the same file. I suspect it's basically snake oil. I suppose it might help in cases where the bottleneck isn't your own ISP but the overloaded server on the other end, although then you'd essentially be screwing the other users on the site in order to get more than your fair share. My site has a lot of books that are in the form of large PDF files. I'll get these users hitting my site, and it utterly brings my server to its knees. My apache logs show these people using up 50 Mb worth of data flow in order to download a 5 Mb pdf file. The only solution I've been able to find is to write a perl script that goes through my logs every 15 min looking for this pattern of usage. When it detects it, it writes to the .htaccess file to block that IP.
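A log scan of the kind described above might look roughly like this. The original was a Perl script whose details are not given, so everything here is an assumption: the log is taken to be in Common Log Format, the caller is assumed to pass only the lines from the last 15 minutes, the `max_hits` threshold is arbitrary, and the output uses Apache 2.2-style `Deny from` lines.

```python
import re
from collections import Counter
from typing import Iterable, List

# Matches the start of a Common Log Format line: client IP, two ignored
# fields, the bracketed timestamp, then the request method and path.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (\S+)')


def abusive_ips(lines: Iterable[str], max_hits: int = 5) -> List[str]:
    """Return IPs that requested the same path more than max_hits times."""
    hits: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            hits[(match.group(1), match.group(2))] += 1
    return sorted({ip for (ip, _path), count in hits.items() if count > max_hits})


def deny_rules(ips: Iterable[str]) -> List[str]:
    """Format one .htaccess deny line per offending IP."""
    return [f"Deny from {ip}" for ip in ips]
```

A client that opens eight parallel connections for the same PDF shows up as eight log lines with the same IP and path, which is the pattern this counts. A cron job would append the `deny_rules` output to `.htaccess`.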

    77. Re:So how do we DDoS Microsoft? by budgenator · · Score: 1

      Dude you are so dissing gopher

      --
      Apocalypse Cancelled, Sorry, No Ticket Refunds
    78. Re:So how do we DDoS Microsoft? by bluefoxlucid · · Score: 1

      Actually, you'd just have to run a little HTTP server on that box that replies with an HTTP 301 Moved Permanently to www.example.com
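A "little HTTP server" of this kind could be sketched with Python's standard library as below. The status code for a permanent redirect is 301. `TARGET_HOST` and the use of plain `http://` are assumptions for illustration.

```python
from http.server import BaseHTTPRequestHandler

# Assumed host that actually runs the web server.
TARGET_HOST = "www.example.com"


def redirect_location(path: str) -> str:
    """Build the Location header, preserving the requested path and query."""
    return f"http://{TARGET_HOST}{path}"


class RedirectHandler(BaseHTTPRequestHandler):
    """Answer every GET or HEAD with a permanent redirect to TARGET_HOST."""

    def do_GET(self) -> None:
        self.send_response(301)  # 301 Moved Permanently
        self.send_header("Location", redirect_location(self.path))
        self.send_header("Content-Length", "0")
        self.end_headers()

    # HEAD requests get the same headers and no body.
    do_HEAD = do_GET
```

To try it, pass `RedirectHandler` to `http.server.HTTPServer` bound to the address the bare domain points at and call `serve_forever()`. Because the path is preserved, existing deep links keep working after the redirect.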

    79. Re:So how do we DDoS Microsoft? by tomhudson · · Score: 1

      Most people aren't saturating 10gbps uplinks. Also, most people ARE doing NAT anyway. So what's your point?

    80. Re:So how do we DDoS Microsoft? by marcosdumay · · Score: 1

      Maybe... He went to all the trouble of separating the punctuation from the address, he could at least put a few href="mailto:..." tags in it. No, doesn't deserve the moderation :p

    81. Re:So how do we DDoS Microsoft? by Anonymous Coward · · Score: 0

      It's 65.55.207.x, 65.55.107.x and 65.55.106.x. Read carefully next time.
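The mismatch is easy to confirm mechanically. The sketch below checks the addresses reported in the story against the two ranges that were looked up in the earlier comment, using Python's `ipaddress` module. The final `.1` octet is an arbitrary stand-in for the `.x` in the story.

```python
import ipaddress
from typing import List, Tuple

# Ranges looked up in the earlier comment (note the leading 66, not 65).
LOOKED_UP: List[Tuple[str, str]] = [
    ("66.55.96.0", "66.55.111.255"),
    ("66.55.192.0", "66.55.223.255"),
]

# Addresses named in the story; ".1" stands in for the unspecified ".x".
REPORTED: List[str] = ["65.55.207.1", "65.55.107.1", "65.55.106.1"]


def in_range(addr: str, first: str, last: str) -> bool:
    """True if addr lies between first and last, inclusive."""
    ip = ipaddress.ip_address(addr)
    return ipaddress.ip_address(first) <= ip <= ipaddress.ip_address(last)


# Reported addresses that fall inside any of the looked-up ranges.
matches = [a for a in REPORTED
           if any(in_range(a, first, last) for first, last in LOOKED_UP)]
```

Here `matches` comes out empty: none of the 65.55.x.x addresses fall inside the 66.55.x.x ranges, so the whois results for those ranges say nothing about who owns the bots.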

  2. There's... by Anonymous Coward · · Score: 0, Redundant

    probably a PERL script to handle that!

    1. Re:There's... by Anonymous Coward · · Score: 0

      One? There's more than one way to do it!

  3. The end is near by Jorl17 · · Score: 0, Funny

    Run, Microsoft is coming to get you!

    --
    Have you heard about SoylentNews?
  4. Why? by joel.neely · · Score: 0, Redundant

    Bing?

    1. Re:Why? by ozmanjusri · · Score: 1
      Why? Bing?

      They have to have SOME activity.

      Sounds like there's more traffic from their bots than customers.

      --
      "I've got more toys than Teruhisa Kitahara."
    2. Re:Why? by Mitchell314 · · Score: 1

      You mean the mods have to read TFCs? D:

      --
      I read TFA and all I got was this lousy cookie
    3. Re:Why? by darkpixel2k · · Score: 1

      Bing?

      Ned? Ned Ryerson?

      --
      There's no place like ::1 (I've completed my transition to IPv6)
    4. Re:Why? by iluvcapra · · Score: 1

      Watch out for that first step, it's a DOOZY!

      --
      Don't blame me, I voted for Baltar.
  5. Oh! *Literally* Microsoft bots! by Culture20 · · Score: 1

    Until I read the summary I thought it was another article about windows botnets and was wondering why the "microsoft" was tacked on since windows is the default OS assumption. Of course it would be interesting if these were new CPAN mirrors that MS was setting up.

    1. Re:Oh! *Literally* Microsoft bots! by Ardaen · · Score: 4, Informative

      Probably not, if you look at other incidents: http://cmeerw.org/blog/594.html it appears they just like to push the limits.

    2. Re:Oh! *Literally* Microsoft bots! by Trailer+Trash · · Score: 1

      Until I read the summary I thought it was another article about windows botnets and was wondering why the "microsoft" was tacked on since windows is the default OS assumption.

      I'm not sure these are mutually exclusive.

  6. Testers blog link... by flyingfsck · · Score: 1

    Sooooo, let's all go to the testers blog and DDOS that too. Dumbass...

    --
    Excuse me, but please get off my Pennisetum Clandestinum, eh!
    1. Re:Testers blog link... by nicolas.kassis · · Score: 1

      If he can handle the msnbots, he probably can handle the slashdot crowd.

  7. I've seen it before by LordAzuzu · · Score: 5, Interesting

    I manage some networks in my home city in Italy, and in the past year I've often seen strange traffic coming from some of their IP addresses. Guess they have been exploited by someone long time ago, and didn't even notice it.

    1. Re:I've seen it before by beadfulthings · · Score: 3, Interesting

      It's interesting to read this, as I've had some random and somewhat incomprehensible port scans coming from an IP address identified as one of theirs. If you're just an insignificant slob, you can't write to their abuse address, either; you'll get bounced. I simply blocked that particular IP address. Let them worry about who's gotten to them.

      --
      "Here's what's happening. You're starting to drive like your Dad..." - Red Green
    2. Re:I've seen it before by Anonymous Coward · · Score: 0

      One of the IPs that was running attacks against my server belonged to a Italian Linux website.

  8. Typical M$ by omb · · Score: 0, Flamebait

    Lazy, feckless, inconsiderate crooks.

    1. Re:Typical M$ by auric_dude · · Score: 1

      Sounds like Microsoft.CN to me.

    2. Re:Typical M$ by Anonymous Coward · · Score: 1, Informative

      That's not a troll. That's common knowledge.

      A more appropriate mod would be +5 Redundant.

  9. Check the blog... by strredwolf · · Score: 4, Funny

    Looks like Microsoft's Bing managers are on it. They'll make it worse in no-time flat. :)

    BTW, the difference between a DDOS and a Slashdotting? You know why your site went down -- you got linked!

    --
    # Canmephians for a better Linux Kernel
    $Stalag99{"URL"}="http://stalag99.net";
    1. Re:Check the blog... by Anonymous Coward · · Score: 5, Funny

      BTW, the difference between a DDOS and a Slashdotting?

      The DDOS bots actually read TFA.

    2. Re:Check the blog... by Anonymous Coward · · Score: 0

      I think you may just have explained Bing's search accuracy...

    3. Re:Check the blog... by gothzilla · · Score: 1

      They're not "on it." They admitted they were powerless to solve their own problems without help from their victims.

    4. Re:Check the blog... by Anonymous Coward · · Score: 1, Insightful

      Seems like they read everything but robots.txt.

    5. Re:Check the blog... by jc42 · · Score: 4, Insightful

      They admitted they were powerless to solve their own problems without help from their victims.

      Heh. It's another "damned if you do; damned if you don't" scenario. Usually, people criticise Microsoft for developing software without bothering to consult or test with actual customers. Now we have a manager of a MS dev group that actually does communicate (though not exactly with "customers"), and acts on what they say, so he's criticised for needing help from his "victims".

      Ya can't win that game.

      But the fact is that if you're developing server-side web software, you need to test it against real-world sites, not just the toy sites you've set up in your lab. And we all know the "Sorcerer's Apprentice" sort of bug that produces a runaway test that tries to do something as many times as it can per second until it's killed. Good testers will be on the lookout for such events, but it's understandable that they might fail occasionally.

      Among web developers, MS does have a bit of a reputation for hitting your new site with a flood of requests, trying to extract everything that you have (even the content of your "tmp" directory which your robots.txt file says to ignore). There are lots of small sites that block MS address ranges for just this reason.

      It should be considered good news that there's at least one MS manager who understands all this, and is willing to talk to the "victims" and fix the problems. Now if they could fix the next-level problem, that this sort of thing happens repeatedly and their corporate culture seems to have no way to prevent it from happening again.

      --
      Those who do study history are doomed to stand helplessly by while everyone else repeats it.
    6. Re:Check the blog... by Anonymous Coward · · Score: 0

      May be fix the victim's server to suit their bot.

    7. Re:Check the blog... by schon · · Score: 2, Informative

      They admitted they were powerless to solve their own problems without help from their victims.

      Heh. It's another "damned if you do; damned if you don't" scenario.

      Um, no. Not unless you're a rabid MS apologist.

      Usually, people criticise Microsoft for developing software without bothering to consult or test with actual customers.

      True.

      Now we have a manager of a MS dev group that actually does communicate (though not exactly with "customers"), and acts on what they say, so he's criticised for needing help from his "victims".

      Umm, exactly how did he act on what they said? According to the quote, they explicitly didn't act, which is the problem people are complaining about.

  10. What's not? by tjstork · · Score: 1, Troll

    It's not like ASP.NET is the most efficient way to sling web pages to begin with.

    --
    This is my sig.
  11. MS ineptitude? by Anonymous Coward · · Score: 2, Insightful

    From TFA:

    Hi,
    I am a Program Manager on the Bing team at Microsoft, thanks for bringing this issue to our attention. I have sent an email to nospam@example.com as we need additional information to be able to track down the problem. If you have not received the email please contact us through the Bing webmaster center at nospam@example.com.

    I mean, what additional information is needed wrt "respecting robots.txt" and "not letting loose more than one bot on a site at a time"?

    Bing. Meh.

    1. Re:MS ineptitude? by Anonymous Coward · · Score: 2, Interesting

      It kind of depends on the individual robots.txt. Google, for instance, added a bunch of extended rules that they respect but which aren't officially part of the robots.txt spec (which is pretty limited). If they've added some of those rules in it could be that it's failing to validate when the MS bot hits it and therefore being ignored.
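How a crawler reads such a file can be sketched with Python's standard `urllib.robotparser`. The robots.txt content below is invented for illustration and is not the CPAN Testers file. `Crawl-delay` is one of the extensions outside the original robots.txt convention, and whether a given crawler honours it is up to that crawler.

```python
from urllib.robotparser import RobotFileParser

# Invented example file: one section for msnbot, one for everyone else.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 10
Disallow: /tmp/

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler asks before each fetch.
tmp_allowed = parser.can_fetch("msnbot", "http://example.com/tmp/scratch.html")
index_allowed = parser.can_fetch("msnbot", "http://example.com/index.html")

# The extension directive is exposed separately from the allow/deny rules.
delay = parser.crawl_delay("msnbot")
```

In this sketch the `/tmp/` page is refused, the index page is allowed, and the parser reports a delay of 10 seconds between requests. A crawler with a stricter parser that rejected the whole file on seeing an unknown directive would behave as if no rules existed, which is the failure mode suggested above.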

    2. Re:MS ineptitude? by ShecoDu · · Score: 3, Interesting

      I remember reading that the MSNBOT reads the "Robots.txt" file, but cpantesters has a lowercase filename:

      http://static.cpantesters.org/robots.txt

      http://static.cpantesters.org/Robots.txt doesn't exist, so basically MSNBOT only respects the robots.txt on case insensitive operating systems.

    3. Re:MS ineptitude? by John+Hasler · · Score: 3, Interesting

      The standard clearly specifies lower case. However, if you are correct there's a simple way to send bingbots one way and all other bots another: create Robots.txt and robots.txt with different contents.

      --
      Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
    4. Re:MS ineptitude? by gbjbaanb · · Score: 1

      I wonder if it has something to do with fixing this

      We asked Microsoft how it was planning to improve Bing's indexing problem. "We're always working to improve the crawler," a Microsoft spokesperson told Ars. "With our latest crawler release still in beta, we doubled our crawling capacity worldwide. We increased our sitemap URL size to 50K and we made it easier for webmasters to control the crawler's aggressiveness."

    5. Re:MS ineptitude? by godless+dave · · Score: 1

      I mean, what additional information is needed wrt "respecting robots.txt" and "not letting loose more than one bot on a site at a time"?

      To begin with, you would probably have to explain what robots.txt is.

      --
      "If it's real, then it gets more interesting the closer you examine it. If it's not real, just the opposite is true." -
    6. Re:MS ineptitude? by PAjamian · · Score: 1

      They'll probably be asking specific questions, such as, "can we get a copy of your log entries so we can match the IPs and times to our own logs and see why this is happening?"

      --
      Windows is a bonfire, Linux is the sun. Linux only looks smaller if you lack perspective.
    7. Re:MS ineptitude? by Fnord666 · · Score: 1

      I remember reading that the MSNBOT reads the "Robots.txt" file, but cpantesters has a lowercase filename:

      This is not likely to be the cause. In the article the author states that "It seems their bots completely ignore the rules specified in the robots.txt," and that "I know this because I can see the IP addresses in the logs. ", one would have to assume that cpantesters have reviewed their logs and that they would have noticed if robots.txt was not being returned.

      --
      'The tyrant will always find pretext for his tyranny.' - Aesop's Fables
  12. Probably just a bug. by tjstork · · Score: 5, Insightful

    I know everyone likes to assume that Microsoft is being evil here, but wouldn't the more realistic assumption be that they were just being incompetent?

    --
    This is my sig.
    1. Re:Probably just a bug. by Lloyd_Bryant · · Score: 5, Insightful

      I know everyone likes to assume that Microsoft is being evil here, but wouldn't the more realistic assumption be that they were just being incompetent?

      Sufficiently advanced incompetence is indistinguishable from malice. For additional examples, see Government, US.

      The simple fact is that ignoring robots.txt is effectively evil, regardless of the intent. It's not like robots.txt is some new innovation...

      --
      Don't tell me to get a life. I had one once. It sucked.
    2. Re:Probably just a bug. by fish+waffle · · Score: 5, Insightful

      I know everyone likes to assume that Microsoft is being evil here, but wouldn't the more realistic assumption be that they were just being incompetent?

      Probably. But since incompetence is the plausible deniability of evil it's sometimes hard to tell.

    3. Re:Probably just a bug. by mspohr · · Score: 1
      Occam's razor (or Ockham's razor), entia non sunt multiplicanda praeter necessitatem, is the principle that "entities must not be multiplied beyond necessity" and the conclusion thereof, that the simplest explanation or strategy tends to be the best one.

      Rough translation: "Never ascribe to malice that which can be adequately explained by stupidity."

      --
      I don't read your sig. Why are you reading mine?
    4. Re:Probably just a bug. by alexhs · · Score: 2, Insightful

      these bots 'completely ignore the rules specified in robots.txt.'

      Microsoft ignoring standards is not incompetence, it's policy (NIH syndrome).

      --
      I have discovered a truly marvelous proof of killer sig, which this margin is too narrow to contain.
    5. Re:Probably just a bug. by djupedal · · Score: 4, Insightful

      > "I know everyone likes to assume that Microsoft is being evil here, but wouldn't the more realistic assumption be that they were just being incompetent?"

      We assume MS is evil...

      We know they are incompetent.

      We feel this is typical.

      We pray they'd just go away.

      We think this will never end...

    6. Re:Probably just a bug. by gmuslera · · Score: 3, Insightful

      They are not ignoring robots.txt, probably just that they understand that file in their slightly different, but in the end incompatible, format. As every other file.

    7. Re:Probably just a bug. by Yvanhoe · · Score: 4, Interesting

      There is such a thing as criminal incompetence. If a script kiddie can be arrested for having a virus "out of control" I don't see why Microsoft engineers DDOSing a website couldn't be charged.

      By the way a philosopher once told that "evil" did not exist. That it was most of the time just a kind of hidden stupidity.

      --
      The Wise adapts himself to the world. The Fool adapts the world to himself. Therefore, all progress depends on the Fool.
    8. Re:Probably just a bug. by MrMr · · Score: 5, Insightful

      The problem is, there is no evidence that:
      Never ascribe to stupidity that which can be adequately explained by malice.
      Is invoking more entities.
      In fact, claiming that the commercially most successful software company got there through stupidity rather than malice sounds extremely implausible to me.

    9. Re:Probably just a bug. by ztransform · · Score: 1

      The simple fact is that ignoring robots.txt is effectively evil, regardless of the intent. It's not like robots.txt is some new innovation...

      Since when did Microsoft feel existing standards were something to honour? How many times have its browsers changed behaviour? Re-defined entrenched URL standards (you cannot specify username/password in an Internet Explorer URL but this is a legal standard form of URL)?

      It stands to reason Microsoft would take no notice of anything your website has to say.

      Unless.. of course.. Microsoft define a certificate type that can sign your Microsoft-specific format exception list after payment on an annual licensing basis..

      Oh hey, another Microsoft example: Vista! After all, why assume someone upgrading their operating system might expect the same if not better!

      PS see http://support.microsoft.com/kb/834489

    10. Re:Probably just a bug. by Lundse · · Score: 1

      That's a pretty rough translation!

      You might be able to argue, that the latter saying is a corollary of the former, but in no way do they mean the same.

      Occam says the simplest explanation is best - the better explanation is the one with least assumptions.

      In this case, Occam affords us no help - we already know MS is both "evil" and incompetent. So the two explanations are equal in this regard. The "corollary" suggests, then, something else; namely that stupidity is a better explanation than "evil" in all/most cases (presumably because stupidity is more widespread).

      --
      IAIFARSIJDPOOTV - I Am In Fact A Reality Star; I Just Don't Play One On TV
    11. Re:Probably just a bug. by Rogerborg · · Score: 5, Informative

      You're probably new here, but if you'd RTFA, you'd see that:

      It seems their bots completely ignore the rules specified in the robots.txt, despite me setting it up as per their own guidelines on their site

      Come to think of it though, isn't this what happens to most people who try to interoperate with Microsoft?

      Amusingly, if I Google for "bing robots.txt" I get a link to a bing page titled "Bing - Robots.txt Disallow vs No Follow - Neither Working!" which has already been elided from history by Microsoft. Classy.

      --
      If you were blocking sigs, you wouldn't have to read this.
    12. Re:Probably just a bug. by Anonymous Coward · · Score: 1, Informative

      Excuse my ignorance, but isn't robots.txt compliance easily enforceable on the server? I remember something about hiding links to trap pages in order to identify robots and then holding identified robots responsible for robots.txt infractions by blocking their IP address.
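The trap-page idea can be sketched as below. The `TRAP_PATH` value and the `TrapBlocker` class are hypothetical. The assumption is that the trap path is listed under `Disallow` in robots.txt and only linked invisibly, so human visitors and compliant robots never request it.

```python
from typing import Set

# Hypothetical trap location: disallowed in robots.txt and reachable only
# through a link that human visitors cannot see.
TRAP_PATH = "/trap/"


class TrapBlocker:
    """Block any client that requests the trap path."""

    def __init__(self) -> None:
        self.blocked: Set[str] = set()

    def handle(self, ip: str, path: str) -> int:
        """Return the HTTP status to send for this request."""
        if ip in self.blocked:
            return 403
        if path.startswith(TRAP_PATH):
            # Only a robot ignoring robots.txt ends up here.
            self.blocked.add(ip)
            return 403
        return 200
```

Once an address touches the trap, every later request from it gets a 403. Against a crawler that rotates through many addresses, as described in the story, each address would have to hit the trap separately, so this limits the damage rather than stopping it.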

    13. Re:Probably just a bug. by drspliff · · Score: 1

      Well, the last I heard Bing spider was looking for `Robots.txt` rather than `robots.txt` which would explain the file being "ignored" in this case.

    14. Re:Probably just a bug. by paiute · · Score: 1

      I know everyone likes to assume that Microsoft is being evil here, but wouldn't the more realistic assumption be that they were just being incompetent?

      Probably. But since incompetence is the plausible deniability of evil it's sometimes hard to tell.

      "incompetence is the plausible deniability of evil"

        fish waffle, that is great sig material.

      --
      If Slashdot were chemistry it would look like this:Cadaverine
    15. Re:Probably just a bug. by Suki+I · · Score: 5, Funny

      Try saving a copy as robots.docx and see if that works ;)

    16. Re:Probably just a bug. by maxwell+demon · · Score: 1

      In fact, claiming that the commercially most successful software company got there through stupidity rather than malice sounds extremely implausible to me.

      So if certain Microsoft products are or were insecure and/or unstable, it wasn't incompetence, but malice? You think Microsoft was happy every time a user got the dreaded Blue Screen Of Death?

      --
      The Tao of math: The numbers you can count are not the real numbers.
    17. Re:Probably just a bug. by init-five · · Score: 1

      I know everyone likes to assume that Microsoft is being evil here, but wouldn't the more realistic assumption be that they were just being incompetent?

      how about both?

      --
      Hallowed are the Ori
    18. Re:Probably just a bug. by Opportunist · · Score: 4, Funny

      Like my grandpa said, it doesn't matter how dumb you are. As long as you find someone even dumber to sell to.

      --
      We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
    19. Re:Probably just a bug. by Anonymous Coward · · Score: 0

      By the way a philosopher once told that "evil" did not exist. That it was most of the time just a kind of hidden stupidity.

      Uh huh; so child raping priests are not evil, just stupid. Sounds like a perverse definition of stupid to me.

    20. Re:Probably just a bug. by Xest · · Score: 1

      Yes, and I like the solution too- rather than contact Microsoft to find out what the fuck is going on, post it to Slashdot and get Slashdotted as well.

      Pure genius.

    21. Re:Probably just a bug. by afidel · · Score: 4, Funny

      I wonder if it's a CR/CRLF bug =)

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    22. Re:Probably just a bug. by horatio · · Score: 1

      You think Microsoft was happy every time a user got the dreaded Blue Screen Of Death?

      Yes, in a way. I never really thought about it until you asked, but it fits with their business model of forcing users into an expensive upgrade of their OS every few years. Look what has happened with XP. It doesn't blue screen [as] much, and they've met heavy resistance from folks not wanting to upgrade to Vista. (Never mind that Vista is crap.) So now they've re-packaged Vista as "Windows 7" and hope folks don't realize it looks the same and smells the same, because it basically is.

      --
      There is very little future in being right when your boss is wrong.
    23. Re:Probably just a bug. by hairyfeet · · Score: 3, Interesting

      But MSFT is a corporation, which thanks to our corporate butt kissing Congress and courts can just go "ooopsie", maybe cut a small check at most, and walk away scot-free.

      And as for your philosopher? I saw an interview with Joss Whedon on writing evil characters that I thought really hit the nail on the head. He said, and I paraphrase: "The villain never sees himself or herself as evil. To them there is a perfectly justifiable reason for their actions. I have known some truly evil people, those that have intentionally hurt their fellow man out of pure malice, and to them their actions were justified and noble. They simply didn't see what they did as wrong."

      Which is how you get MSFT and Intel paying backroom deals to crush competition, or Jack Tramiel and his "business is war" philosophy. To the ones making the decisions, it's "the other guy would do it to us if they could, so why shouldn't we do it to them?". I'm sure that if you talked to Gates or the head of Intel you could never get them to believe that crushing your competition any way you can is wrong. To them that was/is business 101 and not evil. That is why I think Whedon was right, the villain always thinks they are noble.

      --
      ACs don't waste your time replying, your posts are never seen by me.
    24. Re:Probably just a bug. by schon · · Score: 5, Insightful

      It has nothing to do with the RTFA.

      their own guidelines on their site

      As anyone who has ever read MS documentation can tell you, you need to read it, then implement a test, so you can see what it really expects, then adjust your test, then try it until it works.

      Their problem is that they expected MS documentation to actually describe the expected behaviour.

    25. Re:Probably just a bug. by MrMr · · Score: 1

      Sort of:
      I'm saying that the assumption that these flaws persist through incompetence is not a less complex explanation.
      The fact that issues were not solved in one of their later releases may very well be a deliberate commercial decision, which would make it indeed malicious rather than incompetent from the end-user perspective.

    26. Re:Probably just a bug. by CrazyDuke · · Score: 1

      Something that bugs me about that statement: Out of curiosity, since when does a lack of evidence amount to an adequate explanation?

      And, also, how does malicious incompetence fall under that false dichotomy? Or, for that matter, what of reckless incompetence and plausible dependability?

      Oh, and for the record: Experience tells me such an outcome is often the result of a PHB or two and a few "I don't give a fuck anymore." engineers. It's fun to dismiss PHBs as merely incompetent. But, what they are competent in is convincing people their actions warrant promotion, regardless of the actual results of their actions.

      --
      Any sufficiently advanced influence is indistinguishable from control.
    27. Re:Probably just a bug. by kjart · · Score: 0, Troll

      The simple fact is that ignoring robots.txt is effectively evil, regardless of the intent.

      So evil, in fact, that you just know that nobody else would ever do something like this. Oh wait...

    28. Re:Probably just a bug. by PinkyDead · · Score: 5, Funny

      Microsoft don't have any tools that can effectively read that format.

      --
      Genesis 1:32 And God typed :wq!
    29. Re:Probably just a bug. by AHuxley · · Score: 0, Troll

      Why would any search engine ignore a site?
      A site could have quality links to non ignore sites.
      Think of "robots.txt" as a flag to "do not display results to consumers".
      Selected paying customers who sign an NDA etc. would get to see all the webs.
      Ignoring robots.txt is effectively how search engines would work, we just got to see it for an instant.

      --
      Domestic spying is now "Benign Information Gathering"
    30. Re:Probably just a bug. by CFBMoo1 · · Score: 1

      Your links look odd.

      Google: http://www.google.co.uk/search?q=bing+robots.txt
      Bing: http://www.bing.com/community/forums/t/647019.aspx

      My Bing: http://www.bing.com/search?q=bing+robots.txt

      I get "Bing Not Honoring Robots.txt Directives?" as the second hit on Bing. The first is their own site which kind of makes sense since Bing is the first search term I used.

      --
      ~~ Behold the flying cow with a rail gun! ~~
    31. Re:Probably just a bug. by Goaway · · Score: 3, Informative

      I'm sure you heard that, but it's not actually true in any way.

    32. Re:Probably just a bug. by blueZ3 · · Score: 3, Insightful

      What's amusing about the issue in the kb is that the problem that they're "solving" by breaking the username/password in a URL standard is NOT a problem with username/password URLs, but a problem with how IE displays the URLs. In other words, rather than fixing the behavior of IE's address and status bars to display such URLs correctly, they just stopped supporting them.

      Incompetence at that level isn't just indistinguishable from malice, it IS malicious.

      --
      Interested in a Flash-based MAME front end? Visit mame.danzbb.com
    33. Re:Probably just a bug. by blueZ3 · · Score: 1

      For the sake of argument...

      Wouldn't you say that in most cases malice implies more complexity? For example, it only takes one stupid mistake by a coder to introduce a bug, whereas intentional introduction of flaws for some sort of business purpose supposes a concerted effort by a group?

      Just asking :-)

      --
      Interested in a Flash-based MAME front end? Visit mame.danzbb.com
    34. Re:Probably just a bug. by Chyeld · · Score: 0, Redundant

      *woosh*

      That's the sound of Microsoft embracing and extending robots.txt, or you missing the joke the OP made... one of the two.

    35. Re:Probably just a bug. by Pharmboy · · Score: 1

      robots.rtf?

      --
      Tequila: It's not just for breakfast anymore!
    36. Re:Probably just a bug. by Anonymous Coward · · Score: 0

      I know everyone likes to assume that Microsoft is being evil here, but wouldn't the more realistic assumption be that they were just being incompetent?

      Probably the work of the "best and brightest" from India.

    37. Re:Probably just a bug. by MrMr · · Score: 1

      Good thing there's a Google cache. Now we only need a Bing cache where Google's whitewashing is archived...

    38. Re:Probably just a bug. by mR.bRiGhTsId3 · · Score: 3, Interesting

      That would be tremendously amusing. I can see the headline now. Bing robots DDoS attack every Unix hosted site by assuming Windows linefeeds.

    39. Re:Probably just a bug. by Pharmboy · · Score: 2, Funny

      Wow, you must be new....to computers. I particularly liked your comment "A site could have quality links to non ignore sites." as justification for a bot to ignore robots.txt. Can I have your AOL email address so I can write you personally?

      --
      Tequila: It's not just for breakfast anymore!
    40. Re:Probably just a bug. by b1t+r0t · · Score: 2, Informative

      What exactly do you mean by "elided from history"? I brought them both up, turned off the CSS (Google's version is broken), and tab-flipped between them. Not only is the page still there, it has all the same posts as the Google cache version, with small differences such as tags switching around, number of posts by users, and another stupid Blackpool adlink. Maybe you found some messages missing and then Google later re-cached it, but the thread itself is certainly not missing.

      --
      "Open source is good." - Steve Jobs
      "Open source is evil." - Microsoft
    41. Re:Probably just a bug. by Hurricane78 · · Score: 1

      The most realistic thing to do, would be to not make any stupid assumptions at all, about things that you know nothing about.

      But who cares for actual facts, nowadays, right? As long as you strongly subscribe to a side... “doesn’t matter which, as long as it’s mine!” ...you’re good. Right. :(

      This world depresses me.

      --
      Any sufficiently advanced intelligence is indistinguishable from stupidity.
    42. Re:Probably just a bug. by ckaminski · · Score: 1

      I think his point is that if you look at the Google Cache and the actual page on bing.com, the entire discussion is gone. Just the original question. The rest, poof gone!

    43. Re:Probably just a bug. by StuartHankins · · Score: 1

      As anyone who has ever read MS documentation can tell you, you need to read it, then implement a test, so you can see what it really expects, then adjust your test, then try it until it works.

      Mod parent up. I thought it was just me...

    44. Re:Probably just a bug. by b1t+r0t · · Score: 1

      I can't tell whether you're being ineptly sarcastic or really that stupid. The main purpose of robots.txt is to keep web spiders (aka "robots") from getting stuck in a tarpit of script-generated pages which are not only redundant but waste resources of the website, possibly bringing it to its knees. For instance, something like a button that says "full view" that shows the same page with more fancy formatting.

      What it's not for is hiding stuff from view, because anybody can look at your robots.txt file and see that you have a /secret/ path in your web site. Yes, this actually happens, and people actually do find the secret information and have fun scattering it across the internets.
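For anyone wondering what "obeying robots.txt" looks like from the crawler's side, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `examplebot` user agent and the example.org URLs are made up for illustration; nothing here is taken from the CPAN Testers' actual file or from any real crawler's code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep all bots out of script-generated pages
# and ask them to wait 10 seconds between requests.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Crawl-delay: 10
"""


def make_parser(text):
    """Build a parser from robots.txt text (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = make_parser(ROBOTS_TXT)

# A polite crawler asks before every fetch.
allowed = parser.can_fetch("examplebot", "http://example.org/index.html")
blocked = parser.can_fetch("examplebot", "http://example.org/cgi-bin/full-view?id=1")
delay = parser.crawl_delay("examplebot")

print(allowed, blocked, delay)
```

Note that this only tells a crawler what the site asked for. Whether the crawler honors it is entirely up to the crawler, which is the whole complaint in this thread.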

      --
      "Open source is good." - Steve Jobs
      "Open source is evil." - Microsoft
    45. Re:Probably just a bug. by Anonymous Coward · · Score: 0

      You're probably new here, but if you'd RTFA, you'd see that:

      You're probably new here. We don't RTFA, we just make broad generalisations based on as little evidence as possible.
      Points will be deducted for making sense ;-)

    46. Re:Probably just a bug. by Yvanhoe · · Score: 1

      A pedophile who doesn't understand his impulses and takes no interest in existing treatments, who believes sexual drives come from the devil and are a challenge to his own faith, does, yes, act more out of stupidity than out of "evilness".

      I once heard the story of the first serial killer arrested thanks to psychological profiling (someone who murdered old women and mutilated them). Do you know what his reaction was when he heard how he was found? He wanted psychological help. Until then he did not understand he was struggling with abnormal urges.

      --
      The Wise adapts himself to the world. The Fool adapts the world to himself. Therefore, all progress depends on the Fool.
    47. Re:Probably just a bug. by Anonymous Coward · · Score: 0

      If so, I guess they should have used chomp() instead of chop() :-)
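The CR/CRLF theory above is a joke and nothing in the article confirms it, but the underlying pitfall is real: a parser that assumes one line-ending convention can silently misread a file written with the other. A rough sketch of a line-ending-agnostic reader, assuming a simplified robots.txt that only contains `Disallow:` rules (the function name and sample data are hypothetical):

```python
def disallow_rules(robots_text):
    """Return Disallow paths, regardless of \\n, \\r\\n or \\r line endings."""
    rules = []
    # str.splitlines() handles Unix, Windows and old Mac line endings alike.
    for line in robots_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line.lower().startswith("disallow:"):
            rules.append(line.split(":", 1)[1].strip())
    return rules


unix_file = "User-agent: *\nDisallow: /private/\n"
windows_file = "User-agent: *\r\nDisallow: /private/\r\n"

print(disallow_rules(unix_file))
print(disallow_rules(windows_file))
```

Both calls yield the same rule list, so the file's origin no longer matters.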

    48. Re:Probably just a bug. by darkpixel2k · · Score: 1

      robots.rtf?

      No, it's robots.wmf. Just take a screenshot of the URL you don't want them accessing...

      --
      There's no place like ::1 (I've completed my transition to IPv6)
    49. Re:Probably just a bug. by catman · · Score: 1

      So if certain Microsoft products are or were insecure and/or unstable, it wasn't incompetence, but malice? You think Microsoft was happy every time a user got the dreaded Blue Screen Of Death?

      Of course. "Oh, that's fixed in the next version, please upgrade. That'll be $nn, thank you. " Ka-ching!

    50. Re:Probably just a bug. by AnonymouseUser · · Score: 1

      > We pray they'd just go away.

      Therein lies the problem, and is why they never will go away. Instead of hoping/praying they go away, I do everything I can to make them go away. IOW, I avoid their products every chance I get, and recommend others use non-MS products whenever I can.

    51. Re:Probably just a bug. by tomhudson · · Score: 1
      That's because they're not at version 3 of whatever they're working on that did the DDoS.

      Either it's prior to version 3, in which case it should be labeled "beta"

      Or it's after version 3, in which case it should be labeled "bloatware".

      Of course, emailing a site you're accidentally DDoSing, you'd better hope their email server is on another machine ...

    52. Re:Probably just a bug. by dwiget001 · · Score: 1

      Which, for me, has been my biggest pet peeve in relation to their "Help" files and similar documentation.

      They are, for the most part "not helpful", thank you very much.

    53. Re:Probably just a bug. by Silverlock · · Score: 1

      It's not terrorism if you have a flag and it's not computer theft if you have a brand name.

    54. Re:Probably just a bug. by Anonymous Coward · · Score: 0

      I accidentally did an experiment once on my webserver. I had no links pointing to a music directory but "just to be sure" I put an exclusion of that directory in my robots.txt file. Sure enough, microsoft's bot comes along, reads robots.txt, and immediately started reading my music directory!

    55. Re:Probably just a bug. by Jah-Wren+Ryel · · Score: 1

      I can't believe you are the ONLY person to point out that the guy took an obvious MS joke waaaay too seriously and then some dumbass mod came along and gave you a redundant - what the hell?

      --
      When information is power, privacy is freedom.
    56. Re:Probably just a bug. by Anonymous Coward · · Score: 0

      The real problem with MS is, that they have more lawyers than coders, so it's faster for them to file a patent on the bug and declare it a standard than actually fixing it.
      Just as with their old UI bug where the first mouseclick wouldn't register and you had to rapidly click again.

      This is just the beginning of the Dodeca-Indexing(TM) (C)

    57. Re:Probably just a bug. by rrohbeck · · Score: 1

      You got it backwards :)

    58. Re:Probably just a bug. by metamatic · · Score: 1

      Re-defined entrenched URL standards (you cannot specify username/password in an Internet Explorer URL but this is a legal standard form of URL)?

      HTTP URLs never supported username and password in the URL, according to the actual standards. RFC 1738 was the original URL specification. Section 3.1 said that some schemes supported username (and/or password) in the URL, giving the example of ftp URLs. However, http was not one of the schemes supporting usernames or passwords, as you can see from the syntax description in section 3.3. None of the followup RFCs added user or password support to http URLs. In fact, RFC 2396 noted in section 3.2.2 that the feature was not recommended even when it was supported. RFC 3986 then deprecated the feature, even for ftp URLs. So user and password in http URLs was a non-standard feature Microsoft should never have implemented in the first place, and they were right to remove it. As far as I know, the only URL scheme which still officially supports username and password without deprecation is telnet, presumably on the grounds that anyone still using telnet doesn't care about the username and password being hacked anyway.

      --
      GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
    59. Re:Probably just a bug. by epine · · Score: 1

      The main purpose of robots.txt is to keep web spiders (aka "robots") from getting stuck in a tarpit of script-generated pages

      You mean provocating cause at point of consensus. Once something becomes ensconced as a facility of the commons, its purpose takes on a life of its own, as practiced in the large.

      Off-label use

      Robots.txt also serves as a sentinel for which parties on the net are playing ball, by obeying robots.txt, and who is operating outside the bounds of conformity, DoSing whatever they please.

    60. Re:Probably just a bug. by metamatic · · Score: 1

      What's amusing about the issue in the kb is that the problem that they're "solving" by breaking the username/password in a URL standard is NOT a problem with username/password URLs, but a problem with how IE displays the URLs.

      No, it's much more than a display problem. URLs get cached. They end up in history files, downloaded files, cache files on proxies, log files, everywhere. You can't guarantee that all the software out there is going to dispose of URLs securely or sanitize out usernames and passwords, so it would not have been safe to put usernames and passwords in URLs even if Internet Explorer had taken pains to avoid information leakage.

      --
      GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
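To illustrate the leakage point: any code that writes URLs to a log or history file would have to scrub the credentials itself. A minimal sketch with Python's standard `urllib.parse`, where `strip_userinfo` is a hypothetical helper name and the URLs are invented:

```python
from urllib.parse import urlsplit, urlunsplit


def strip_userinfo(url):
    """Return the URL with any user:password@ part removed."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Rebuild the URL from everything except the userinfo component.
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


print(strip_userinfo("http://alice:secret@example.org:8080/reports?id=7"))
print(strip_userinfo("http://example.org/plain"))
```

This sketch does not handle bracketed IPv6 hosts, and it only helps in software you control. Proxies and other people's log files will still see the original URL.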
    61. Re:Probably just a bug. by LihTox · · Score: 1

      The villain never sees himself or herself as evil. To them there is a perfectly justifiable reason for their actions.

      Add to that the phenomenon of emergence, where the whole is different from the sum of its parts. An organization can be evil even if its members are not.

    62. Re:Probably just a bug. by binaryspiral · · Score: 1

      try robots.wtf

    63. Re:Probably just a bug. by Lundse · · Score: 1

      Hm... Interesting.

      Quite possibly, though it would be different for different domains. When it comes to bugs in software, obviously! To avoid Godwin's law by a few borders, let's take the genocide in Rwanda as something which it would be pretty hard to explain by stupidity (without waxing philosophical).

      So I would still not say that "stupidity is a better explanation than malice" is implied by Occam's Razor, and certainly not that they mean the same.
      But in certain domains, it could be a corollary...

      --
      IAIFARSIJDPOOTV - I Am In Fact A Reality Star; I Just Don't Play One On TV
    64. Re:Probably just a bug. by Anonymous Coward · · Score: 0

      The argument:

      "the other guy would do it to us if they could, so why shouldn't we do it to them?"

      can be invalidated by citing examples of humans who did not "do it" to anyone, when they had the power to.

      Bishop Tutu: "But the process of forgiveness also requires acknowledgement on the part of the perpetrator that they have committed an offence." http://www.writespirit.net/authors/desmond_tutu/desmond-tutu-on-forgiveness

    65. Re:Probably just a bug. by Suki+I · · Score: 1

      +1

    66. Re:Probably just a bug. by TheSpoom · · Score: 1

      That's not a bug, that's by design. How else are they supposed to only DDoS Unix and Mac servers and leave Windows servers alone?

      --
      It's better to vote for what you want and not get it than to vote for what you don't want and get it.
      - E. Debs
    67. Re:Probably just a bug. by watergeus · · Score: 1

      "By the way a philosopher once told that "evil" did not exist. That it was most of the time just a kind of hidden stupidity."

      Who was that?

    68. Re:Probably just a bug. by mgblst · · Score: 1

      It seems they are looking for robots.docx instead. It is really the admin's fault, for not converting the file to the accepted way.

    69. Re:Probably just a bug. by Lotana · · Score: 1

      If there was a way to mod a comment to +10 Insightful: This post is it.

      Thank you.

    70. Re:Probably just a bug. by aralin · · Score: 1

      I think this thread is slowly approximating to the correct robots.wtf?

      --
      If programs would be read like poetry, most programmers would be Vogons.
    71. Re:Probably just a bug. by aralin · · Score: 1

      There should be a way to mark posts as favorite or submit to hall of fame of comments or something. This comment spot on describes my relationship with Microsoft during the last 14 years.

      --
      If programs would be read like poetry, most programmers would be Vogons.
  13. Fixing Bing's poor indexing by AHuxley · · Score: 1, Interesting

    It's not a bug, it's a feature to index a site with a new, rapid, powerful, direct, personalised crawler :)
    http://arstechnica.com/microsoft/news/2010/01/microsoft-outlines-plan-to-improve-bings-slow-indexing.ars

    --
    Domestic spying is now "Benign Information Gathering"
    1. Re:Fixing Bing's poor indexing by Anonymous Coward · · Score: 0

      http://arstechnica.com/microsoft/news/2010/01/microsoft-outlines-plan-to-improve-bings-slow-indexing.ars

      Is the extension on the file referenced by that URL some indication as to the author's view of Microsoft's plans?

  14. This is a normal occurence for Bing by Anonymous Coward · · Score: 5, Informative

    I had a registration page - static content basically. The only thing that was dynamic was that it was referred to by many pages on the site with a variable in the querystring. Bing decided that it needed to check on this one page *thousands* of times per day.

    They ignored robots.txt.
    I sent a note to an address on the Bing site that requested feedback from people having issues with the Bing bots - nothing.

    The only thing they finally 'listened' to was placing "" in the header.

    This kind of sucked because it took the registration page out of the search engines' index, however it was much better than being DDOS'd. Plus, the page is easy to find on the site so not *that* big a deal.

    Bing has been open for months now and if you search around there are tons of stories just like this. Maybe now that a site with some visibility has been 'attacked', the engineers will take a look at wtf is wrong.

    1. Re:This is a normal occurence for Bing by The+Cisco+Kid · · Score: 1

      Seems like a better solution would have been to set up a test for either the User-Agent or the IP blocks that Bing was attacking your site from, and drop those requests in /dev/null - your site would still exist on 'real' search engines, and Bing doesn't pound on your bandwidth anymore.

    2. Re:This is a normal occurence for Bing by The+Cisco+Kid · · Score: 1

      Replying to myself: if testing the UA or the IP in the httpd itself was too much load, you could have also just nullrouted the IP blocks the Bing spider was coming from, either in the kernel table, or in your router.
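The User-Agent/IP test suggested above could look roughly like this. It is only a sketch: the /24 prefixes are my assumption about what the article's "65.55.207.x" style ranges mean, the `msnbot` token is just the bot name mentioned elsewhere in this thread, and `should_block` is a hypothetical helper, not something from CPAN Testers.

```python
import ipaddress

# Assumption: each "65.55.N.x" range from the article is treated as a /24.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("65.55.207.0/24", "65.55.107.0/24", "65.55.106.0/24")
]

# Assumption: match the crawler by a substring of its User-Agent header.
BLOCKED_AGENT_TOKENS = ("msnbot",)


def should_block(client_ip, user_agent=""):
    """Return True if the request comes from a blocked range or agent."""
    address = ipaddress.ip_address(client_ip)
    if any(address in network for network in BLOCKED_NETWORKS):
        return True
    agent = user_agent.lower()
    return any(token in agent for token in BLOCKED_AGENT_TOKENS)


print(should_block("65.55.207.12"))
print(should_block("192.0.2.1", "Mozilla/5.0"))
```

Doing this check inside the web server still costs a request and a log line each time, which is why null-routing at the kernel or router is the cheaper option when the load is the problem.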

    3. Re:This is a normal occurence for Bing by Anonymous Coward · · Score: 0

      Or you could remove the dynamic variable from a static page so the bot knows it's always the same page?
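One way to read the suggestion above: make every link to the page resolve to a single URL, so a crawler has no reason to treat each querystring variant as a different page. A small sketch with Python's standard `urllib.parse`; the parameter name `ref` and the URLs are hypothetical, since the original poster never said what the variable was called.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url, ignored_params=("ref",)):
    """Drop querystring parameters that do not change the page content."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored_params
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(canonical_url("http://example.org/register?ref=home"))
print(canonical_url("http://example.org/register?ref=faq&lang=en"))
```

The site could redirect to the canonical form or simply generate its internal links that way.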

    4. Re:This is a normal occurence for Bing by dkf · · Score: 1

      Replying to myself: if testing the UA or the IP in the httpd itself was too much load, you could have also just nullrouted the IP blocks the Bing spider was coming from, either in the kernel table, or in your router.

      I know of one site where this has been done for years (both with Bing and its predecessors). Sure it ruins the site's searchability for anyone using Bing, but like we care; that's better than having the site itself unreachable due to load and Google doesn't cause the same level of problems.

      --
      "Little does he know, but there is no 'I' in 'Idiot'!"
    5. Re:This is a normal occurence for Bing by Anonymous Coward · · Score: 0

      oops. should've previewed my post and escaped the html:

      The only thing they finally 'listened' to was placing "<meta name="robots" content="noindex, nofollow">" in the header.
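For reference, this is roughly how a crawler might pick up that meta tag from a fetched page. It is a sketch using Python's standard `html.parser`; the class and function names are made up, and it says nothing about how Bing's crawler actually does it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def robots_directives(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(sorted(robots_directives(page)))
```

The catch, as the poster found, is that the bot has to fetch the page to see the tag at all, so this limits indexing but not necessarily the request load.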

  15. Flooding... by Bert64 · · Score: 4, Informative

    I have noticed the microsoft crawlers (msnbot) being fairly inefficient on many of my sites...
    In contrast to googlebot and spiders from other search engines msnbot is far more aggressive, ignores robots.txt and will frequently re-request the same files repeatedly, even if those files haven't changed... Looking at my monthly stats (awstats) which groups traffic from bots, msnbot will frequently have consumed 10 times more bandwidth than googlebot, but is responsible for far less incoming traffic based on referrer headers (typically 1-2% of the traffic generated by google on my sites).

    Other small search engines don't bring much traffic either, but their bots don't hammer my site as hard as msnbot does.

    --
    http://spamdecoy.net - free throwaway anonymous email - avoid spam!
    1. Re:Flooding... by Anonymous Coward · · Score: 0

      Block their crawlers. They will behave after that.

    2. Re:Flooding... by Manfre · · Score: 1

      Did you provide google with a sitemap file? If so, that explains why google does not need to check your site for changes as often.

    3. Re:Flooding... by Bert64 · · Score: 1

      I have 2 sites with sitemaps, but they were not the ones I was looking at as there are many more sites on the server.
      That also wouldn't explain why search engines other than msn don't hammer the site.

      --
      http://spamdecoy.net - free throwaway anonymous email - avoid spam!
  16. Are you sure? by Errol+backfiring · · Score: 4, Insightful

    Are we sure this traffic comes from Microsoft? Could it not consist of forged network packets? You don't need a reply if you are running a DDOS. On the other hand, why would anyone, including Microsoft, want to bring down CPAN?

    --
    Nae king! Nae laird! Nae yurrupiean pressedent! We willna be fooled again!
    1. Re:Are you sure? by Anonymous Coward · · Score: 3, Funny

      Because they are coming out with P# and don't want the competition?

    2. Re:Are you sure? by Anonymous Coward · · Score: 2, Informative

      You only see an IP in an Apache log after a successful TCP handshake. This is hard (not impossible, but really, really hard) to do with a forged IP.

    3. Re:Are you sure? by TheRaven64 · · Score: 5, Informative

      Are we sure this traffic comes from Microsoft? Could it not consist of forged network packets?

      It's a TCP connection, so they need to have completed the three-way handshake for it to work. That means that they must have received the SYN-ACK packet, unless they are just SYN flooding. If they are SYN flooding, then that would show up in the firewall logs. If they've received the SYN-ACK packet then they are either from that IP, or they are on a router between you and that IP and can intercept and block the packets from that IP.

      You don't need a reply if you are running a DDOS.

      You do if it's via TCP. If they're just ping flooding, then that's one thing, but they're issuing HTTP requests. This involves establishing a TCP connection (send SYN, receive SYN-ACK with random number, reply ACK with that number) and involves sending TCP window replies for each group of TCP packets that you receive.

      On the other hand, why would anyone, including Microsoft, want to bring down CPAN?

      Who says that they want to? It's more likely that their web crawler has been written to the same standard as the rest of their code.

      --
      I am TheRaven on Soylent News
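A quick way to see the point about handshakes: the address a server gets from `accept()` belongs to a peer that actually completed the TCP connection, which is why the IPs in an HTTP access log are hard to forge. A loopback-only sketch with Python's standard `socket` module (the function name is made up, and this obviously does not simulate spoofing or SYN floods):

```python
import socket


def observed_peer():
    """Connect to a local listener and compare what each side sees."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(1)
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            # connect() returns only after SYN, SYN-ACK, ACK have completed.
            client.connect(server.getsockname())
            connection, peer_address = server.accept()
            with connection:
                return peer_address, client.getsockname()


seen_by_server, client_address = observed_peer()
print(seen_by_server == client_address)
```

A client that never receives the SYN-ACK cannot finish `connect()`, so it never reaches the point where the web server would log a request from it.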
    4. Re:Are you sure? by Anonymous Coward · · Score: 0

      Yes, I was getting this too, on a couple of Perl-driven sites. The Bing msnbots appeared to be ignoring the crawl delay. Turns out they weren't, but they had several crawlers working on the site at once, which effectively defeats the crawl delay. They still are, so I gave them a 300 second crawl delay and it's dropped to a reasonable level.

      They were also ignoring the Disallow: headers until I notified "Live Search WMC community" and got somebody working on the problem to look at it. Apparently Bing needs a little handholding or he gets ADHD.
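The arithmetic behind "several crawlers effectively ignoring the crawl delay" is simple enough to sketch. The 300 second delay is from the comment above; the crawler counts are hypothetical, since the poster did not say how many were hitting the site.

```python
def requests_per_hour(crawlers, crawl_delay_seconds):
    """Aggregate request rate when each crawler honors the delay on its own."""
    return crawlers * 3600 / crawl_delay_seconds


# One crawler with a 300 second delay: 12 requests per hour.
print(requests_per_hour(1, 300))

# Ten independent crawlers (hypothetical count), each obeying the same delay.
print(requests_per_hour(10, 300))
```

So a per-crawler delay only bounds the total load if the crawlers coordinate, which is presumably why cranking the delay up was the practical workaround.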

    5. Re:Are you sure? by Anonymous Coward · · Score: 0

      > Are we sure this traffic comes from Microsoft?

      'The bots were identified by their IP addresses, including 65.55.207.x, 65.55.107.x and 65.55.106.x, as coming from Microsoft. The administrators of CPAN Testers have now blocked access to their site from these addresses'

    6. Re:Are you sure? by Anonymous Coward · · Score: 0

      [snip] ... send SYN, receive SYN-ACK with random number, reply ACK with that number

      Unless the client platform has a predictable RNG

    7. Re:Are you sure? by mikelieman · · Score: 1

      On the other hand, why would anyone, including Microsoft, want to bring down CPAN?

      Jealousy?

      --
      Technology -- No Place For Wimps! Grateful Dead and Jerry Garcia Chatroom -- http://www.wemissjerry.org
    8. Re:Are you sure? by Anonymous Coward · · Score: 0

      Are we sure this traffic comes from Microsoft? Could it not consist of forged network packets? You don't need a reply if you are running a DDOS. On the other hand, why would anyone, including Microsoft, want to bring down CPAN?

      I am getting rather frustrated that the article describes this as a "dedicated" denial of service attack, the headline reads DDOS which means distributed denial of service, and yet not only is nobody mentioning that fact, but are commenting as if it really was a distributed attack.

  17. So block those IP ranges? by Evro · · Score: 1

    If they've identified the IP ranges, why not just block them? You can do it at the router or TCP level (drop packets), or just throw up a 403 Forbidden.

    --
    rooooar
    1. Re:So block those IP ranges? by Anonymous Coward · · Score: 0

      RTFA, they did.

      The bots were identified by their IP addresses, including 65.55.207.x, 65.55.107.x and 65.55.106.x, as coming from Microsoft. The administrators of CPAN Testers have now blocked access to their site from these addresses.

    2. Re:So block those IP ranges? by John+Hasler · · Score: 3, Informative

      > ...why not just block them?

      They have.

      --
      Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
    3. Re:So block those IP ranges? by Anonymous Coward · · Score: 0

      Hush now!

      If it was not an IP address block from Micro$haft they would have done exactly that. This is meant to cause pure unadulterated (heck even adulterated) embarrassment to our sworn mortal enemy. Nothing more. Nothing less.

    4. Re:So block those IP ranges? by Sarten-X · · Score: 5, Insightful

      For ignoring robots.txt, they don't deserve any more nor less.

      --
      You do not have a moral or legal right to do absolutely anything you want.
    5. Re:So block those IP ranges? by delinear · · Score: 1

      According to TFBlog, that's what they're doing (returning 403s), but it's still a nuisance as it's filling up the log file with thousands of requests per hour (I don't know if there's a way to prevent this being logged, I'm just relaying what they're saying).

    6. Re:So block those IP ranges? by thePowerOfGrayskull · · Score: 1

      If they've identified the IP ranges, why not just block them? You can do it at the router or TCP level (drop packets), or just throw up a 403 Forbidden.

      That's a good temporary solution -- but unfortunately as Bing continues to gain market share, blocking them will cost you. For this particular site it's probably not a big deal, but that's not really practical for other sites hammered by Bing (my own included: I've had problems with it ignoring robots, and frequently re-indexing the same unchanged pages - regularly consuming over twice the bandwidth that googlebots have in the same time.)

  18. Ask the Chinese to do it by Anonymous Coward · · Score: 0

    They know how.

  19. So... by Anonymous Coward · · Score: 0

    Block the IP addresses and send Microsoft email?
    What am I missing here?

  20. Incompetent? by omb · · Score: 1

    Yes, Evil more so

  21. Too easy for Microsoft by BhaKi · · Score: 1

    I suppose Microsoft can offer a simple explanation: "Our servers and other internal infrastructure are so vulnerable that they have been hacked and being used as remote-controlled botnets."

    --
    The largest prime factor of my UID is 263267.
  22. Evil? What "evil"? by Anonymous Coward · · Score: 0

    So.. by your definition of evil. If you fail math exam, you're being evil?

    If you trip down the stairs, and crash into somebody, you're evil?

    Do not attribute to malice, what can very well be attributed to incompetence, or just bad luck.

    Else, your mistaking this quote, is also evil then, according to your own definition of evil.
    However, that is logically impossible, since it falsifies the very premise, thus I must conclude you are false, and also probably with good intentions,
    if not just to get some modpoints, but I wouldn't call that evil ;)

    1. Re:Evil? What "evil"? by jbengt · · Score: 1

      Evil does not require malice

  23. Robots.txt by anomnomnomymous · · Score: 1

    Can anyone here clarify what robots.txt stands for, as in:

    Is it an 'agreement' to not scan the site at all (by a search engine bot), or is it meant to just not -display- those results in the search engine?
    I'd assume, since everything on a site is more or less public, that it would be the second. And if so, I can't see anything wrong with what Microsoft's bots did.

    I can see how scanning a site's content (even if you're not going to list the results in your search engine) can have some value to a company.

    --
    When you shoot a mime, do you use a silencer?
    1. Re:Robots.txt by Ogi_UnixNut · · Score: 2, Informative

      It's the first. Whatever you specify in the robots.txt as Disallow means not to spider the pages, so no scanning of them at all.

      You use it for when you only want part of your site to appear in search results, such as just the front page (for example). The rest of the site should not be touched by the bot at all.

    2. Re:Robots.txt by afidel · · Score: 2, Informative

      It's basically a rough pattern filter that the bot is supposed to follow on parts of the site not to crawl. One reason it's used is that you can have dynamically generated pages that create an infinite loop that's impossible for the bot to detect.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    3. Re:Robots.txt by TerranFury · · Score: 1

      AFAIK you're not supposed to visit URLs that robots.txt tells you not to. The issue is more to do with load on the servers, side-effects from cgi programs, and the like (for instance, you don't want web robots clicking your "one-click ordering" button*) than it is to do with public visibility of the content: If you want to hide something, you don't put it up on a public webserver to begin with.

      As usual, Wikipedia has more to say.

      * ok, bad example; no purchase system actually works like this... but you get the idea.

    4. Re:Robots.txt by TerranFury · · Score: 1

      I just learned something...

      There are ways to achieve each of the various things you mention. See this, this, and this.

    5. Re:Robots.txt by anomnomnomymous · · Score: 1

      Ahright. Never thought of that: That makes sense. Thanks for the answer :-)

      --
      When you shoot a mime, do you use a silencer?
    6. Re:Robots.txt by Anonymous Coward · · Score: 0

      There are multiple standards for controlling bots. robots.txt is used to deny bots access to certain parts of the site. The rules forbid the bot from accessing the pages listed as disallow. Alternatively, one can mark individual pages to allow them to be spidered but not indexed (which really means: never show this page in search results, but feel free to download a copy and utilize the links), or indexed but without the links on that page being followed or counted in any algorithm, or to disallow both indexing and link following. Those can be found in the form of meta tags.

      There are some non-standard additions to robots.txt: a sitemap directive that points to a sitemap file, an allow directive to allow specific files in a generally disallowed directory, and a crawl-delay value which allows a webmaster to slow access to the site by bots which support it.

      One also can use sitemaps to work with spiders, to give them information about the pages that exist, including hints about how important pages are relative to one another, how frequently they change, and potentially even the last changed date of the pages (that last works best if the sitemap is dynamically generated, or if the site is static HTML with relatively infrequent updates, so the commit script can regenerate the sitemap.)

      There is also the nofollow rel attribute on links, which appears to allow the links to be followed, but not count in any algorithm.
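
A minimal sketch of how a well-behaved bot might apply the rules described above, using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` user agent and the example.org URLs are made up for illustration; they are not taken from cpantesters.org.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Crawl-delay: 10
"""

def make_parser(text):
    """Build a parser from robots.txt text instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

parser = make_parser(ROBOTS_TXT)

# A polite crawler asks before every request.
allowed_home = parser.can_fetch("ExampleBot", "http://example.org/index.html")
allowed_cgi = parser.can_fetch("ExampleBot", "http://example.org/cgi-bin/report.cgi")

# Crawl-delay is a non-standard extension; bots that support it wait this long.
delay = parser.crawl_delay("ExampleBot")

print(allowed_home, allowed_cgi, delay)
```

Nothing in the file is enforced by the server. The bot has to call something like `can_fetch` and honor the answer, which is exactly the step a misbehaving crawler skips.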

    7. Re:Robots.txt by Anonymous Coward · · Score: 0

      It's the first option. Basically, a robot takes up bandwidth, same as a user. If there are some parts of the site that are hugely bandwidth intensive, then the spider shouldn't be scanning them. robots.txt is supposed to tell spiders: 'don't go in this directory, ignore all links that point to .php or .pl files, don't read this one pdf that takes forever to download, &c &c'.

      What Microsoft is doing is not reading that file, so its robots are aggressively indexing pages that take forever for the server to generate. CPAN isn't really interested in not showing up on Bing (though because those pages aren't read, parts of CPAN won't show up to Google and other 'good' spiders); they are interested in not having to do a bunch of server-side processing because Microsoft is blatantly ignoring a standard.

    8. Re:Robots.txt by John+Hasler · · Score: 2, Informative

      Is it an 'agreement' to not scan the site at all...

      It is a request not to scan part or all of a site. robots.txt

      And if so, I can't see anything wrong with what Microsoft's bots did.

      Every site does not have dozens of powerful servers and terabytes of bandwidth, nor is every site an ad-supported one that wants to maximize traffic. Common courtesy requires that a bot operator minimize his impact on any given site and honor requests not to index. Of course "courtesy" and "honor" are concepts that baffle Microsoft managers.

      --
      Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
    9. Re:Robots.txt by Terrasque · · Score: 1

      It is in the interest of the bot admins to respect it, since step 2 is usually full-out block of the ip's / user agent.

      --
      It's The Golden Rule: "He who has the gold makes the rules."
    10. Re:Robots.txt by John+Hasler · · Score: 1

      > ...step 2 is usually full-out block of the ip's / user agent.

      Inconceivable to a Microsoft manager. No one could tolerate the resulting loss in traffic and therefore ad revenue. What's that you say? You operate a small, noncommercial site? Then it couldn't possibly have any content of interest to Bing users.

      --
      Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
    11. Re:Robots.txt by Fnord666 · · Score: 1

      Here is some info that you might find helpful.

      --
      'The tyrant will always find pretext for his tyranny.' - Aesop's Fables
  24. The US government is competent. by tjstork · · Score: 0, Troll

    . For additional examples, see Government, US.

    I'm a right winger and I like to see smaller, less intrusive government, but, I think it is wrong to say that the US government is competent.

    The US Gov't has successfully operated as a going concern for 220+ years, with a proven and reliable management structure. Few, if any corporations, have been able to do that.

    --
    This is my sig.
    1. Re:The US government is competent. by elvesrus · · Score: 1

      you might want to read that over again

    2. Re:The US government is competent. by jimicus · · Score: 2

      The US Gov't has successfully operated as a going concern for 220+ years, with a proven and reliable management structure. Few, if any corporations, have been able to do that.

      Private corporations can go under with just a couple of bad years. Or even months, particularly if they're new businesses. Governments just have to raise taxes.

    3. Re:The US government is competent. by Anonymous Coward · · Score: 0

      proven and reliable management structure

      Huh?
      Can I live under that rock with you? Seems like a blissful place.

    4. Re:The US government is competent. by Anonymous Coward · · Score: 0

      .

      The US Gov't has successfully operated as a going concern for 220+ years

      You must have a whack-job definition of "successful"

    5. Re:The US government is competent. by Anonymous Coward · · Score: 0

      yet several of its operating entities (states) are a going concern.

    6. Re:The US government is competent. by Anonymous Coward · · Score: 0

      The US Government is a baby compared to the majority of other nations' governments, which are much older. Corporations are something entirely different than government (though they do have similarities) and anyone who thinks a government should be operated like a corporation I automatically have issues with, on multiple levels. The fact you think of the US Government as a corporation (which are noted for their usual lack of empathy for their consumers aka citizens and have the sole goal of expanding as much as possible while turning a profit) and at the same time think you want a smaller government shows a large disconnect. In other words: I don't think you actually know what you want and I think people of a similar mind will do much more harm to the US than already has been done. "The path to Hell is paved with good intentions..." and all that.

    7. Re:The US government is competent. by tjstork · · Score: 1

      you might want to read that over again

      Didn't say I was!

      --
      This is my sig.
    8. Re:The US government is competent. by tjstork · · Score: 1

      I don't think you actually know what you want and I think people of a similar mind will do much more harm to the US

      I think that is a fair statement. I'm putting together a piece for the relaunch of my web site that takes the federal budget, breaks it down to # of days you have to work to support each line item, says, what happens if you don't do that, then, lets you cut to your heart's content, and then tallies the results for everyone to see what the averages are.

      I don't think anyone even really gets the government at all, left or right.

      --
      This is my sig.
    9. Re:The US government is competent. by Lloyd_Bryant · · Score: 1, Offtopic

      I'm a right winger and I like to see smaller, less intrusive government, but, I think it is wrong to say that the US government is competent.

      The US Gov't has successfully operated as a going concern for 220+ years, with a proven and reliable management structure. Few, if any corporations, have been able to do that.

      Let's see...

      War on Poverty - yeah, that worked out *real* well, didn't it.
      War on Drugs - See any results there?
      War on Terror - With this one, I can't really tell if it's bungling, or actual malice

      Those are just the "big names".

      The US Government is a well-designed structure. And it worked pretty darn well for a while. But as federal power has increased, the effectiveness of that structure has decreased. In short, the Republic of the Founding Fathers is showing its years.

      And, for the record, there are organizations such as Lloyd's of London that can trace their existence, in one form or another, back to the 17th century. And if you want a truly old corporation, look at Stora Kopparberg Bergslags Aktiebolag in Sweden, which has been around since the 1300's!

      Most corporations wither and die after a while, since their markets can wither and die. The "market" of a government does not - the only way for a government to "die" is by armed force, either from within or without.

      --
      Don't tell me to get a life. I had one once. It sucked.
    10. Re:The US government is competent. by tjstork · · Score: 1

      I'm a right winger and I like to see smaller, less intrusive government, but, I think it is wrong to say that the US government isn't competent.

      Fixed that.

      --
      This is my sig.
    11. Re:The US government is competent. by SpaceLifeForm · · Score: 1

      He said he was a right winger.

      --
      You are being MICROattacked, from various angles, in a SOFT manner.
    12. Re:The US government is competent. by Anonymous Coward · · Score: 0

      The govt can print money. As Dick Cheney said, "Reagan proved deficits don't matter." http://www.washingtonpost.com/ac2/wp-dyn/A26402-2004Jun8?language=printer

    13. Re:The US government is competent. by DragonWriter · · Score: 1

      Private corporations can go under with just a couple of bad years. Or even months, particularly if they're new businesses. Governments just have to raise taxes.

      Governments can fail quickly, too. Sure, they usually fall to different problems than private entities do -- governments that fail early in their life generally do so because of violent reactions by existing governments, and otherwise (early or not) because they so fail the populace that they see a violent reaction from them.

      New attempts to start governments probably fail about as frequently as attempts to start businesses.

      And, like any other government policy, raising taxes only works to the extent that the governed populace is willing to accept it.

  25. Or both by cheros · · Score: 1

    AFAIK, the one doesn't exclude the other.

    However, assuming evil is more fun :-)

    --
    Insert .sig here. Send no money now. Owner may sue, contents will settle. Batteries not included.
  26. What the hell has become of the word "problem"? by John+Hasler · · Score: 1

    > ...issues accessing their sites...

    "Issues"? What's wrong with "problem"? "Issues" is marketing-speak. Microsoft marketing-speak.

    And yes, get off my lawn.

    --
    Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
    1. Re:What the hell has become of the word "problem"? by Spad · · Score: 1

      Blame ITIL; you can't call it a problem until you've had multiple incidents, or something.

    2. Re:What the hell has become of the word "problem"? by FerociousFerret · · Score: 1

      It depends on where you stand in the scenario.

      In this case, for Microsoft, who is not directly affected, it's an issue. For CPAN, it's a problem.

    3. Re:What the hell has become of the word "problem"? by bipbop · · Score: 1

      You can draw that distinction, if you like; but the word "issue" in this sense dates from the 14th century legal term "issue", and was used in this way long before you were born. See here for a discussion of different uses of the word: The Issue with Issues

    4. Re:What the hell has become of the word "problem"? by John+Hasler · · Score: 1

      I'm aware of the various meanings of the word "issue". It is now being used as a synonym for "problem", thus diluting the meaning of both words.

      --
      Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
  27. Typical of Bots by jmaslak · · Score: 0

    Sure, it should not ignore robots.txt. And if that's true, there's a problem - but I'd like MS's side of the story before assuming that it ignores robots.txt - who knows, maybe the robots.txt is malformed.

    I'd also like to know what user agent string is the crawler using.

    But all that said, this is not exactly newsworthy. I've run large, dynamic internet sites for years. I've had problems with many, many different kinds of crawlers, from many companies (including companies like Google). There's a ton of bots out there that do ignore robots.txt (there were a few hundred bots that scanned the site I used to run, back in 2001, that ignored robots.txt). So it's something a programmer really needs to be ready to deal with.

    Yes, these bots are rude, abusive, and inconsiderate of the site owners (go figure - most of the companies running them, the small bots, are pretty much unethical anyhow - anything for a buck). But it's on the internet, just like spam and a bunch of other things we all get annoyed with. You have to deal with it.

    I suggest applications like mod_bwshare to even out this type of behavior, traffic shaping at the network layer for known abusers you don't just want to block, etc. Those are the tactics I use.

    1. Re:Typical of Bots by AHuxley · · Score: 1

      MS will just blame outsourcing, Danger engineering, pink, a new team that just took over... oh wait, they used that.
      Outsourcing works best.
      If you think about the other options it gets more interesting.
      Are Google, Yahoo, MS etc. all passing over robots.txt?
      As a shareholder, why waste the CPU time, storage and power costs if it's not of any direct short-term or long-term use?
      Put that CPU time, storage and power usage to good use for profit calculations, faster indexing or quality ads.
      If not, who is paying for the ignored sites, and why?

      --
      Domestic spying is now "Benign Information Gathering"
    2. Re:Typical of Bots by jack2000 · · Score: 1

      Disallow a directory in robots.txt. If anyone opens it, have a link there along the lines of: "If you open this your IP will be blocked." Everyone who requests that link gets nullrouted for a week; if they do it again they get nullrouted forever.
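
The trap idea above could be sketched roughly as follows. This only models the bookkeeping (first hit: one week, second hit: permanent); the `/trap/` path and the `TrapBans` class are hypothetical, and actually nullrouting an address would be done by a firewall or router, not by this code.

```python
TRAP_PATH = "/trap/"        # hypothetical directory, listed as Disallow in robots.txt
WEEK = 7 * 24 * 3600        # one week, in seconds

class TrapBans:
    """Tracks which client IPs requested the trap path and until when they are banned."""

    def __init__(self):
        self.strikes = {}   # ip -> number of times the trap was requested
        self.until = {}     # ip -> ban expiry timestamp (inf means permanent)

    def is_banned(self, ip, now):
        return now < self.until.get(ip, 0.0)

    def record(self, ip, path, now):
        """Call for every request; only requests under TRAP_PATH count as strikes."""
        if not path.startswith(TRAP_PATH):
            return
        self.strikes[ip] = self.strikes.get(ip, 0) + 1
        if self.strikes[ip] == 1:
            self.until[ip] = now + WEEK          # first offence: one week
        else:
            self.until[ip] = float("inf")        # repeat offence: forever

bans = TrapBans()
bans.record("192.0.2.1", "/trap/do-not-open.html", now=0)
print(bans.is_banned("192.0.2.1", now=60))         # inside the first week
print(bans.is_banned("192.0.2.1", now=WEEK + 1))   # after the week has passed
```

Bots that honor robots.txt never see the trap, so only crawlers that ignore the file (or humans who follow the link despite the warning) end up on the list.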

  28. Send the lost bots home. by N1ckR · · Score: 5, Funny

    I redirect lost bots home, seems a polite thing to do. 301 www.microsoft.com

    1. Re:Send the lost bots home. by TheSpoom · · Score: 1

      ...which they'd look up in their cache and find that it's already been sufficiently indexed; the only thing it would do to them is add your site's (Microsoft equivalent of) PageRank to www.microsoft.com.

      --
      It's better to vote for what you want and not get it than to vote for what you don't want and get it.
      - E. Debs
  29. Re:pl0s 2, Troll) by ArsenneLupin · · Score: 0, Troll
    Yes, that's the address that they should have redirected the Micro$hit spiders to.

    O, it's just a pumpkin :-(

    Here's the real address goatse.fr. Doesn't Mr Sarkozy have a lovely face?

  30. However, look at the private CEOs. by Anonymous Coward · · Score: 0

    However, look at the private CEOs. When the company goes under, they get the golden parachute and off to another business.

  31. DDoS? Really? by Siberwulf · · Score: 2, Informative

    I'm pretty sure the first "D" in DDoS stands for "Distributed."

    If it was really a DDoS, you wouldn't be able to filter the IP out with a simple regex (like the /^65\.55\.(106|107|207)/. from TFA).

    To boot, TFA didn't even say DDoS. Maybe that's too much to expect the editors to oh... I don't know...say... RTFA or Fact-Check it?

    I should drop my bar a bit, I suppose.
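
For reference, a rough sketch of the kind of prefix filter being described, in Python. The pattern is the one quoted from TFA with a trailing `\.` added so that a whole octet has to match; the client addresses are invented examples, not real log data.

```python
import re

# Pattern quoted from TFA, plus a trailing \. (an addition made here for this sketch).
BLOCKED = re.compile(r"^65\.55\.(106|107|207)\.")

def is_blocked(ip):
    """True if the dotted-quad string starts with one of the three blocked prefixes."""
    return BLOCKED.match(ip) is not None

# Hypothetical client addresses, as they might appear in an access log.
clients = ["65.55.207.14", "65.55.106.200", "65.55.20.7", "192.0.2.10"]
blocked = [ip for ip in clients if is_blocked(ip)]
print(blocked)
```

A filter this small only works because every offending address sits in three adjacent-looking ranges, which is the point being made about the word "distributed".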

    1. Re:DDoS? Really? by Anonymous Coward · · Score: 0

      Welcome to Slashdot! How'd you manage to get such a low user id and not know how things work around here? :)

    2. Re:DDoS? Really? by Anonymous Coward · · Score: 0

      What do you mean you can't stop a DDoS with a simple regex? Ever tried .* ?

  32. No problem by rgviza · · Score: 4, Informative

    ipchains -A input -j REJECT -p all -s 65.55.207.0/24 -i eth0 -l
    ipchains -A input -j REJECT -p all -s 65.55.107.0/24 -i eth0 -l
    ipchains -A input -j REJECT -p all -s 65.55.106.0/24 -i eth0 -l

    problem solved

    --
    Don't kid yourself. It's the size of the regexp AND how you use it that counts.
    1. Re:No problem by j_sp_r · · Score: 4, Informative

      Linux IP Firewalling Chains, normally called ipchains, is free software to control the packet filter/firewall capabilities in the 2.2 series of Linux kernels. It superseded ipfwadm, but was replaced by iptables in the 2.4 series.

      You're a few kernels behind.

    2. Re:No problem by Anonymous Coward · · Score: 1, Insightful

      He's just running Debian stable. SCNR

    3. Re:No problem by Anonymous Coward · · Score: 0

      OK, tell me the story about Little Bobby IPTables again?

    4. Re:No problem by Anonymous Coward · · Score: 0

      Um... why REJECT? Better to use DROP. REJECT sends them a rejection notice. DROP just ignores the packet and dumps it in the bit bucket (/dev/null).

      iptables -A INPUT -p all -s 65.55.207.0/24 -j DROP
      iptables -A INPUT -p all -s 65.55.107.0/24 -j DROP
      iptables -A INPUT -p all -s 65.55.106.0/24 -j DROP

      Actually, for the last two, maybe iptables -A INPUT -p all -s 65.55.106.0/16 -j DROP, but I'd need someone who is better at iptables than I am to verify that this covers both the 106 and the 107 ranges.
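
The prefix-length question above can be checked with Python's standard `ipaddress` module. This is only a sketch of the arithmetic, not a statement about which ranges Microsoft actually uses: a /16 covers every 65.55.x.x address, while a /23 starting at 65.55.106.0 covers exactly the 106 and 107 ranges.

```python
import ipaddress

# The three /24 ranges named in the story.
net_106 = ipaddress.ip_network("65.55.106.0/24")
net_107 = ipaddress.ip_network("65.55.107.0/24")
net_207 = ipaddress.ip_network("65.55.207.0/24")

# 65.55.106.0/16 has host bits set, so strict=False is needed; it normalizes to 65.55.0.0/16.
wide = ipaddress.ip_network("65.55.106.0/16", strict=False)

# A /23 is the smallest single block that holds both 106 and 107.
narrow = ipaddress.ip_network("65.55.106.0/23")

print(wide, wide.num_addresses)       # the whole 65.55.x.x space
print(narrow, narrow.num_addresses)   # just 106 and 107
print(net_106.subnet_of(narrow), net_107.subnet_of(narrow), net_207.subnet_of(narrow))
```

So the /16 rule would cover both ranges, but it would also drop every other 65.55.x.x address, including 65.55.207.x. With a /23 for the 106 and 107 pair, the separate 65.55.207.0/24 rule is still needed.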

  33. Complain to Upstream Providers by jchawk · · Score: 3, Interesting

    The CPAN folks could complain to their ISP and have them drop the traffic that's coming in to their boxes.

    Most ISP's will work with you to correct DDOS problems.

  34. Astroturfing Idiot by omb · · Score: 1

    If you don't know, you should Google it; that will make it clear. /. is not a -help mailing list, and this was stupid, feckless and criminal, as in misuse of a computer system beyond authorisation.

    1. Re:Astroturfing Idiot by John+Hasler · · Score: 1

      Stupid and obnoxious, but not criminal unless there was deliberate intent to interfere with use of the site. Robots.txt is not access control. If you want to strictly limit your site to authorized users install an authorization system. The Web is public by default.

      --
      Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
  35. Re:Happy Dead Nigger Day! by woody.jesus · · Score: 2, Funny

    How dare you sir (or madam)!! How dare you! It is clear from the title of your post that you were not so subtly casting aspersions on an organization who I hold dear -- namely the Hirsute Dungeons n' Dragons society. You can frame your remarks in some obscure racial epithets, but to those of us who twirl our mustaches or stroke our beards while rolling dice, your insidious implication is brazenly clear. As the leader of a group of men (and women) With decorative facial hair who play Dungeons n' Dragons every Wednesday night, I cannot help but express the strongest offense to your euphamisticaly delivered hidden acronym. In the future, should you have such thoughts I would urge you to Do Not Say them.

    --
    "You never pushed a noun against a verb except to blow up something" (Spencer Tracey, 'Inherit the Wind')
  36. Aggressive MS Bot by badevlad · · Score: 0

    Yeah, in my site's statistics Microsoft bots are the most active visitors. Really, they crawl the site hundreds of times more often than Googlebot.

  37. Re:Happy Dead Nigger Day! by ckaminski · · Score: 1

    You know women with decorative facial hair? mkaaaay....

  38. US Government is good. by tjstork · · Score: 0, Offtopic

    For every failure you list, I can give you three that succeeded.

    War on Poverty - yeah, that worked out *real* well, didn't it.

    Homestead act, Rural electrification act, Highways

    War on Drugs - See any results there?

    CDC, Peace Corps - cured smallpox worldwide. I don't know -any- government that can make that claim, but our US government.

    Social Security, Medicare - unless you really want your grandma to move in and then die.

    Food and Drug administration, Small Business Administration, Student Loans. Safe food, help for small businesses, put kids in college.

    Fannie Mae - yeah, it blew up, but look at how many people actually have -homes-. The whole banking crisis could have been Bush's finest hour. When the Democrats were railing on about the mortgage meltdown, Bush could have said, "yeah, but we put people into homes. We tried to put people into homes and give them a chance, and for the 95% of people who did NOT default on their mortgages, it totally worked."

    War on Terror - With this one, I can't really tell if it's bungling, or actual malice

    That's on all of us. Americans overreacted. We voted for the war on terror and the invasion of Iraq. We lost our cool after 9/11, and now we pay the price for our own stupidity.

    But, I'll see your war on drugs and raise you one US Military. Brings democracy to Japan and Germany, deters Commies from taking over Europe. The military is a government operation, and for the most part, it's actually worked pretty well.

    PS. Who's saving lives in Haiti right now? Why, it's fresh water from American aircraft carriers, US Marines acting as peacekeepers. Our government did that, and we should be proud.

    --
    This is my sig.
    1. Re:US Government is good. by Nadaka · · Score: 2, Informative

      Nothing you listed under the "War on Drugs" has anything to do with the war on drugs.

      The war on drugs has made America a police state where the government can seize any of your property and auction it for profit before your trial. Even if you are found innocent, or the charges are thrown out for insufficient grounds, you will not be compensated for your lost money or profit. It has made an America where more people are imprisoned than any other nation on earth. It has made a nation where the cheapest and most effective drug for curing glaucoma and mitigating the pain and nausea associated with cancer treatments is a crime. It's made a nation where at least half its citizens are criminals.

    2. Re:US Government is good. by tomhudson · · Score: 1

      You left out the "War on Drugs"

      A total failure to treat a social problem. Wasn't Prohibition enough for you?

      Legalize it and tax the crap out of it.

      Who is threatened by that? The crooks. As long as there's a "War on Drugs", crooks are guaranteed monopoly profits and monopoly access, all supported by your tax dollars keeping people in jail.

      The first rule of consulting is "No matter what they say, it's ALWAYS a people problem." Well, it's true here too.

      Right now, the US has more people in jail than any other country in the world. And the #1 reason is the "War on Drugs." Stop it and you'll reduce crime, reduce drug use, save money and lives, and fix the deficit.

    3. Re:US Government is good. by tjstork · · Score: 1

      Stop it and you'll reduce crime,

      No, we'll just legalize it. So now we'll have corporations buying out advertising to convince people to ruin their lives by purchasing smack.

      --
      This is my sig.
    4. Re:US Government is good. by tomhudson · · Score: 1

      No, we'll just legalize it. So now we'll have corporations buying out advertising to convince people to ruin their lives by purchasing smack.

      Do the same as tobacco - high taxes, illegal to advertise, gross packaging, fines of up to 2/3 of a million dollars for illegal distribution, etc.

    5. Re:US Government is good. by tjstork · · Score: 1

      Do the same as tobacco - high taxes, illegal to advertise

      I'm down with that.

      --
      This is my sig.
  39. I can't wait till the MS bots index private data by StuartHankins · · Score: 1

    What happens when the MS bots (which apparently ignore the robots.txt file) start indexing some site which provides pay-per-view information? Can we expect a fix to the problem then? All it takes is to get some lawyers involved, you know how that snowball goes.

  40. IP Spoofing by jkantola · · Score: 1

    How's it possible that, on Slashdot of all sites, *I*, of all people, need to tell you that IP packets do not necessarily come from the address inscribed in their headers?

    1. Re:IP Spoofing by John+Hasler · · Score: 1

      > IP packets do not necessarily come from the address inscribed in their
      > headers?

      TCP/IP connections do necessarily come from the address inscribed in their headers.

      --
      Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
    2. Re:IP Spoofing by Anonymous Coward · · Score: 1, Funny

      Because you don't know what you're talking about?

      Understand, then post.

    3. Re:IP Spoofing by Fnord666 · · Score: 1

      How's it possible that, on Slashdot of all sites, *I*, of all people, need to tell you that IP packets do not necessarily come from the address inscribed in their headers?

      Maybe it's because most people here know how TCP works?

      --
      'The tyrant will always find pretext for his tyranny.' - Aesop's Fables
  41. Mod parent up by Lonewolf666 · · Score: 3, Insightful

    While he could be more polite, it is indeed embarrassing for Microsoft if they cannot check their own network
    a) for the existence of computers with given IPs
    b) what these computers are doing

    I think that deserves an "insightful" that cancels out the "flamebait".

    --
    C - the footgun of programming languages
  42. Re:I can't wait till the MS bots index private dat by John+Hasler · · Score: 2, Insightful

    Robots.txt is merely advisory. Ignoring it is discourteous and oafish but not illegal.

    --
    Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
  43. hello? firewall? by v1 · · Score: 2, Insightful

    if it's a scan (TCP established stream, taxing the SERVERS, not the NETWORK) that's the problem, as opposed to a SYN flood etc, and the IP addresses are in a very small range, why aren't they just using a hardware firewall at the router and blocking the IPs? There's not a whole lot to "distributed" when it's coming from a pair of C's.

    Not saying they should be DOING it, but this is not a Denial of Service, it's a Denial of Stupid.

    --
    I work for the Department of Redundancy Department.
  44. I was just noticing this... by faedle · · Score: 1

    Wow, this article is prescient.

    I was just noticing in my web logs that small, out of the way sites that I host that used to get 1,000 hits a month were suddenly getting 1,000 hits PER DAY. Sure enough, anybody care to guess what netblock the 26,000 hits came from?

    Microsoft.com just earned a ban.

  45. Re:Happy Dead Nigger Day! by MstrFool · · Score: 1

    Sweet, got any room in your group? I have my own dice, mustaches and beard, though sadly lacking women for some reason.

    --
    Question reality.
  46. Microsoft being EVIL? by Anonymous Coward · · Score: 0

    "never ascribe to malice that which can be adequately explained by stupidity. (Insert lame joke about MSFT being full of stupidity here)."

    Insert true story about Microsoft being EVIL here, sometimes even unintentionally evil.

  47. Microsoft just tries to compete with Google by Anonymous Coward · · Score: 0

    I believe soon we will see a new Bing feature - real time results. This will definitely beat Google

  48. bing is written in perl by bingoUV · · Score: 2, Funny

    Got it! Bing is written in Perl. They do regular expression matching while crawling and forgot to have a \Q ... \E escape sequence for the regex matching. They got so much Perl code on CPAN, full of special characters, that somehow the crawler engine went into an infinite loop.

    --
    Bingo Dictionary - Pragmatist, n. A myopic idealist.
  49. Hanlon's Razor by PPH · · Score: 0, Redundant

    Never attribute to malice that which can be adequately explained by stupidity.

    --
    Have gnu, will travel.
  50. Looks like a simple bug to me by MerlynEmrys67 · · Score: 0, Flamebait
    Sadly not Microsoft's though. If I am doing this correctly, Robots.txt seems to return a 404 error. Looks like cpan removed their robots.txt file at least from where I am sitting.

    Looking at another Robots.txt file seems to return what I expect.

    Let no rock remain unthrown when it shows Microsoft is in the wrong - even if they aren't

    --
    I have mod points and I am not afraid to use them
    1. Re:Looks like a simple bug to me by chromatic · · Score: 1

      Looks like cpan removed their robots.txt file at least from where I am sitting.

      The file in question is robots.txt for cpantesters.org, which does exist.

  51. Simple solution: by Anonymous Coward · · Score: 1, Informative

    Add to your .htaccess file:

    deny from 65.55.207.
    deny from 65.55.106.
    deny from 65.55.107.
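
    If you would rather filter in application code, or just check which addresses those three partial-IP rules cover, here is a minimal Python sketch. It assumes each "deny from a.b.c." prefix is equivalent to the a.b.c.0/24 network; BLOCKED_NETWORKS and is_blocked are illustrative names, not part of Apache.

```python
import ipaddress

# The three address prefixes from the deny rules, written as /24 networks.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("65.55.207.0/24"),
    ipaddress.ip_network("65.55.106.0/24"),
    ipaddress.ip_network("65.55.107.0/24"),
]


def is_blocked(address):
    """Return True if address falls inside one of the blocked networks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in BLOCKED_NETWORKS)
```

    Anything outside those three blocks, including other Microsoft ranges, passes straight through.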

  52. Re:Happy Dead Nigger Day! by Anonymous Coward · · Score: 0
  53. You forgot abusive and socially backward. by Anonymous Coward · · Score: 0

    "Lazy, feckless, inconsiderate crooks." You forgot abusive and ignorant and socially backward.

    Don't you hate it when people are excessively positive about Microsoft?

    Steve Ballmer has little technical knowledge, and any good people who were at Microsoft left long ago, I'm guessing.

  54. download locally first and then test indexing by Anonymous Coward · · Score: 1, Interesting

    Bing should have used Wget first to download the articles to a local hard drive, and also to add a 2 to 3 second wait. Let it run over the weekend. Then test the search indexing algorithms on the local HTML files. They were probably performing indexing tests. I know they have smart people working for them, so it probably involved a contractor who didn't think about performance issues.
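
    The "download first, with a wait" idea can be sketched in a few lines of Python. This is an illustration only: mirror_politely is a hypothetical helper, the fetch function is supplied by the caller (so no particular HTTP library is assumed), and the 2.5 second default is just the middle of the 2 to 3 second range suggested above.

```python
import time


def mirror_politely(urls, fetch, delay_seconds=2.5, sleep=time.sleep):
    """Fetch each URL once, pausing between requests, and return the pages.

    fetch is any callable that takes a URL and returns its content.
    sleep is injectable so the pause can be replaced in tests.
    """
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request except the first one.
            sleep(delay_seconds)
        pages[url] = fetch(url)
    return pages
```

    The indexing experiments would then run against the returned pages, or files saved from them, instead of hitting the live server again.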

  55. robots.txt is to protect servers, not spiders by billstewart · · Score: 1

    If you remember the history of robots.txt because you were there at the time, rather than because you read it in some history book somewhere, the purpose was to protect small web servers from being trashed by big search robots, initially altavista, and secondarily to protect them from other well-behaved web crawlers of whatever sorts. There were no script-generated pages back then, or at least hardly any; just handing out static html could be difficult enough if you had a small pipe and a slow server, though serving images to a robot was obviously a waste of time back then.

    Tarpits of various sorts existed soon after robots.txt, as a way of trapping spammer-run crawlers that ignored robots.txt, but that was as much for fun as for necessity :-)

    And yes, people did have /private/ directories back then and still do now, thinking that because Google's polite about not looking in directories robots.txt says not to, there aren't humans or impolite robots that will look there.

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
  56. No, Blocking won't make them behave by billstewart · · Score: 1

    It'll just keep them from bothering you, and you're (almost by definition) too small for them to care that they're not indexing your site.

    Advertising their IP address block with BGP, if your ISP is careless enough to let you do that, now *that* would get their attention :-)

    As an intermediate level of annoyance, you could set up your DNS server to respond to queries from Microsoftland to return entertaining IP addresses, such as 127.0.0.2 or bing's IP addresses or whatever.

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
  57. to too two by Anonymous Coward · · Score: 0

    Well, it's true here to.

    You mean "too", as in "also". "to" is the opposite of "from".

  58. Re:Robots.txt is there to protect servers by billstewart · · Score: 1

    The primary reason for robots.txt was to protect small slow web servers from being swamped by Altavista's big fast web crawlers. Dynamic pages weren't a problem back then. On the other hand, after robots.txt became common, setting up dynamic pages to trap crawlers that ignored it into infinite loops became common also, because most of them were run by spammers of various sorts.

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
  59. Linking to SEOs encourages scum by billstewart · · Score: 1

    Search engines try to tell humans what web sites would have interesting contents based on their queries. They use robots and content models to approximate that so they can produce results quickly and economically. SEOs try to get the robots to tell the humans "my page is really interesting", when it usually isn't, which is scummy lying, and you shouldn't encourage such people.

    They've really got three things to offer:

    • Telling the web site owner how to structure their content so that the robots can find it. That's legitimate and useful, but that's also something that can be covered in 1-2 pages of documentation. On the other hand, sometimes less-technical web site owners find it worthwhile to hire consultants to do that for them, which is fine, but usually the consultants they really need call themselves "web designers".
    • Telling web site owners how to make their content look more interesting (keeping it up to date, adding new material, etc.) To the extent that it's making the content actually more interesting to human readers, cool, but if the consultant is just adding features to make it look attractive to robots so humans will look at it and generate advertising revenue, and not to make it actually interesting to actual humans, it's still sleazy. If you want to hire somebody to make your site actually interesting to humans, the consultants you should hire usually call themselves "editors" or "authors" or "web designers", not "SEOs".
    • Lying to robots so the robots will lie to the humans, using whatever tricks still work, whether that's link farms or astroturfing popular discussion sites or blog comment spam or whatever. This is sleazy scum behaviour that makes search engine results less useful to humans, and at best it's done because of ignorance and greed, though it's increasingly often done to distribute malware. And yes, if you want to hire somebody to lie to robots to boost your search engine position, the consultant you're looking for will probably call themselves an SEO.
    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
    1. Re:Linking to SEOs encourages scum by TerranFury · · Score: 1

      Good points. I hadn't noticed my sources. Anyway, my purpose had only been to figure out how the various robots.txt and HTML META directives are interpreted to respond to great-great grandparent.

    2. Re:Linking to SEOs encourages scum by billstewart · · Score: 1

      I'm guessing you probably looked on Google for references to robots.txt and HTML META - if SEO scum are any good, they should be among the first references you'll find, because that is one thing they know about, and they'll use that as part of their self-promotion.

      --

      Bill Stewart
      New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
  60. robots.txt by petit_robert · · Score: 1

    I don't see any requests for Robots.txt in my logs. It's always lower case :
    65.55.106.138 - - [19/Jan/2010:01:00:46 +0100] "GET /robots.txt HTTP/1.1" 200 30 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
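
    To check your own logs the same way, here is a small Python sketch that pulls robots.txt requests out of Apache combined-format lines like the one above and reports the letter case actually requested. The regex and the function names are my own assumptions about the log layout, not anything official.

```python
import re

# Rough pattern for the Apache "combined" log format shown above.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it doesn't match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def robots_requests(lines):
    """Yield (ip, path) for requests whose path is /robots.txt in any letter case."""
    for line in lines:
        entry = parse_log_line(line)
        if entry and entry["path"].lower() == "/robots.txt":
            yield entry["ip"], entry["path"]
```

    Comparing the yielded path against the lowercase form shows at a glance whether any crawler asked for a capitalised Robots.txt.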

  61. CPAN webserver broken by lpq · · Score: 0, Troll

    The spec for robots.txt says that strings matched internally in the text file should be done in a case insensitive manner.

    It would only make sense for a "reasonable person" to assume that any web fetches for a file name for 'robots.txt' should also match in a case insensitive manner.

    This sounds like Microsoft being used to Uppercasing the first letter of words -- which looks aesthetically pleasing, and not having it make any real difference on 70% of the computers on the planet (running Microsoft) and (in my experience, on most webservers running apache). Never noticed any case sensitivity.

    This looks like a case of the perl guys being at fault. They likely have a web-server written in perl and didn't do a case ignore when processing requests for 'robots.txt'. This violates the intent if not the letter of the spec.

    Check out http://www.robotstxt.org/orig.html. It specifies that all of its strings should be matched in a case insensitive manner. It doesn't explicitly say that the filename 'robots.txt' should also be matched by the webserver in a case insensitive manner, but if it specifies that all of the web addresses in the file should be handled in a case-insensitive manner, doesn't it make sense that the file name itself should also be case insensitive?

    People should use a little common sense before going off and blaming Microsoft for doing something that is perfectly natural and perfectly understandable, while the supposed victims should be a bit more robust in the design of the web server.

    At least, that's how it appears to me -- anyone care to show me a sound reasoning why it should be otherwise or why one would expect otherwise?

    1. Re:CPAN webserver broken by Slashcrap · · Score: 1

      They likely have a web-server written in perl

      You are likely a fucking idiot. I saw your sig about spite. You're still a fucking idiot.

      Actually why the fuck did you spend so much time writing all that totally speculative bullshit just to try and prove that MS aren't at fault? Do you want to explain that? Because case sensitive web servers written in Perl are a bit of a fucking stretch of the imagination.

      And no, I don't want to show sound reasoning to disprove some wacky shit your diseased brain made up.

    2. Re:CPAN webserver broken by lpq · · Score: 1

      My, seems like I struck a nerve. The web server doesn't have to be written in perl for it not to ignore case. The point was it doesn't.

      As for my comment about them writing a web server in perl being wacky -- you obviously know nothing about the perl community. There is nothing that cannot be done better in perl -- including a web server. I don't see that being an unwarranted comment. May not be true in this circumstance, but I'm sure it's been done. Did you even bother to check cpan for a cpan webserver? I'll take my bemusings, that you erroneously call speculations, over your sad, dim existence speculation any day. Tell me "CPAN::Mini::Webserver" isn't meant to serve up a copy of cpan -- maybe not used for the main site, but...maybe with a squid accelerator front end -- yeah...I could see it!

      Too bad you are such a hateful, spiteful diseased thing. You really should get some help or consider doing mankind a favor and stop wasting the planet's resources with your continued existence. It would be the responsible thing to do.

      -l

    3. Re:CPAN webserver broken by chromatic · · Score: 1

      May not be true in this circumstance....

      Did you even bother to check the headers of the CPAN Testers site? It's Apache httpd. You spent longer typing your speculation and its defense than it would have taken you to verify for yourself.

    4. Re:CPAN webserver broken by lpq · · Score: 1

      You miss the main point. Funny how people focus on the unimportant details when they don't like the main statement of the post. Robots.txt says to ignore case. It only makes sense for their webserver to also ignore case for the file name. It sure would make retrieving cpan modules much easier. I'm always forgetting where some specific author has decided to put caps - because it isn't done consistently. It would be far smarter to allow case insensitive searching and usage given how capricious case usage is. They got hoisted by their own petard.

      Most web servers ignore case. Theirs doesn't because they like to give authors the ability to randomly force users to remember random combinations of case. Yippee. The bit about the perl server was a piece of dry wit for reasons I've previously stated. It wasn't meant as an insult. That you took it that way shows you aren't a true perl aficionado, so stop complaining.

      If I remember apache defaults to options to set case insensitivity. So they'd have to explicitly disable case insensitivity to enable this vulnerability.

    5. Re:CPAN webserver broken by chromatic · · Score: 1

      Funny how people focus on the unimportant details when they don't like the main statement of the post.

      If you can't get any trivially verifiable details correct (including which site this is), why should anyone take your random speculations seriously?

  62. Here it is! by LordAzuzu · · Score: 1

    Here is another one from a few minutes ago:

    IPv4: 65.55.34.139 -> 83.211.46.34
          hlen=5 TOS=192 dlen=162 ID=46000 flags=0 offset=0 TTL=0 chksum=7990
    Payload: Priority Count: 5
    Connection Count: 6
    IP Count: 7
    Scanner IP Range: 78.130.238.2:212.90.12.134
    Port/Proto Count: 7
    Port/Proto Range: 80:40210

    65.55.34.139 resolving to col0-omc3-s1.col0.hotmail.com

  63. Re:You've been Bing'ed by Macrat · · Score: 1

    Bing


    Bing

  64. Exploited servers by nurb432 · · Score: 1

    Ya, give them an excuse to get away with it. "it wasn't us attacking our competition, really"

    --
    ---- Booth was a patriot ----
    1. Re:Exploited servers by LordAzuzu · · Score: 1

      I just can't believe they are so dumb, that's why.
      No way I'm trying to defend them.

  65. Network Solutions Domain Information by DJRumpy · · Score: 1
    1. Re:Network Solutions Domain Information by Emilio+III · · Score: 1

      Yes, I'm an idiot. How I managed to get a typo in a copy-and-paste job I haven't figured out yet.

    2. Re:Network Solutions Domain Information by DJRumpy · · Score: 1

      I was actually wondering if your browser or dns had been hijacked ;)

      No worries...