Slashdot Mirror


Microsoft Bots Effectively DDoSing Perl CPAN Testers

at_slashdot writes "The Perl CPAN Testers have been suffering issues accessing their sites, databases and mirrors. According to a posting on the CPAN Testers' blog, the CPAN Testers' server has been being aggressively scanned by '20-30 bots every few seconds' in what they call 'a dedicated denial of service attack'; these bots 'completely ignore the rules specified in robots.txt.'" From the Heise story linked above: "The bots were identified by their IP addresses, including 65.55.207.x, 65.55.107.x and 65.55.106.x, as coming from Microsoft."

7 of 332 comments

  1. What's not? by tjstork · · Score: 1, Troll

    It's not like ASP.NET is the most efficient way to sling web pages to begin with.

    --
    This is my sig.
  2. The US government is competent. by tjstork · · Score: 0, Troll

    . For additional examples, see Government, US.

    I'm a right winger and I like to see smaller, less intrusive government, but, I think it is wrong to say that the US government is competent.

    The US Gov't has successfully operated as a going concern for 220+ years, with a proven and reliable management structure. Few, if any, corporations have been able to do that.

    --
    This is my sig.
  3. Re:pl0s 2, Troll) by ArsenneLupin · · Score: 0, Troll
    Yes, that's the address that they should have redirected the Micro$hit spiders to.

    O, it's just a pumpkin :-(

    Here's the real address goatse.fr. Doesn't Mr Sarkozy have a lovely face?

  4. Re:Probably just a bug. by kjart · · Score: 0, Troll

    The simple fact is that ignoring robots.txt is effectively evil, regardless of the intent.

    So evil, in fact, that you just know that nobody else would ever do something like this. Oh wait...

  5. Re:Probably just a bug. by AHuxley · · Score: 0, Troll

    Why would any search engine ignore a site?
    A site could have quality links to non-ignored sites.
    Think of "robots.txt" as a flag to "do not display results to consumers".
    Selected paying customers who sign an NDA, etc. would get to see all the webs.
    Ignoring robots.txt is effectively how search engines would work, we just got to see it for an instant.

    --
    Domestic spying is now "Benign Information Gathering"
  6. Re:So how do we DDoS Microsoft? by Anonymous Coward · · Score: 0, Troll

    Thank you MS for admitting to the world that you're completely incapable of fixing the problem on your own. How horrible are your employees at their jobs when they require the assistance of their victims to fix the problem?

  7. CPAN webserver broken by lpq · · Score: 0, Troll

    The spec for robots.txt says that strings matched internally in the text file should be done in a case insensitive manner.

    It would only make sense for a "reasonable person" to assume that any web fetch for a file named 'robots.txt' should also match in a case insensitive manner.

    This sounds like Microsoft being used to uppercasing the first letter of words, which looks aesthetically pleasing and makes no real difference on 70% of the computers on the planet (running Microsoft) or, in my experience, on most webservers running Apache. I never noticed any case sensitivity.

    This looks like a case of the Perl guys being at fault. They likely have a web-server written in Perl and didn't do a case ignore when processing requests for 'robots.txt'. This violates the intent if not the letter of the spec.

    Check out http://www.robotstxt.org/orig.html. It specifies that all of its strings should be matched in a case insensitive manner. It doesn't explicitly say that the filename 'robots.txt' should also be matched by the webserver in a case insensitive manner, but if it specifies that all of the web-addresses in the file should be handled in a case-insensitive manner, doesn't it make sense that the file name itself should also be case insensitive?

    People should use a little common sense before going off and blaming Microsoft for doing something that is perfectly natural and perfectly understandable, while the supposed victims should be a bit more robust in the design of the web server.

    At least, that's how it appears to me -- anyone care to show me sound reasoning for why it should be otherwise or why one would expect otherwise?
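
For anyone who wants to poke at the matching rules argued over in this thread, here is a minimal sketch using Python's standard-library urllib.robotparser. It is only one parser's behavior, not a statement about what the bots in the story actually ran. The bot name "ExampleBot", the host example.org and the helper allowed() are made up for illustration. In this parser the User-agent name is matched without regard to case, while Disallow paths are compared case-sensitively.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if this parser would let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Same bot name in a different case: the rule still applies.
    print(allowed("examplebot", "http://example.org/private/page"))
    # Same path in a different case: the rule no longer applies.
    print(allowed("ExampleBot", "http://example.org/Private/page"))
    # A bot with no matching record and no "*" record is not restricted.
    print(allowed("OtherBot", "http://example.org/private/page"))
```

A polite crawler would run a check like allowed() before every request and skip any URL for which it returns False.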