Slashdot Mirror


The Slashdot DDoS: What Happened?

What follows this introduction is a rough summary of the crazy hell that we endured with the intermittent DDoS attacks we experienced last Thursday through Saturday. I'm sorry it took this long to put this together and tell you what happened, but as these things go, we were too busy trying to solve the problem to waste time talking about it. Big thanks to Andover.Net's Netops PatL, Martin and Liz, as well as Slashcode-wranglers PatG, Chris, Marc, Kurt and CowboyNeal, plus scoop (from freshmeat) and others who chimed in along the way. Tomorrow is Part 2: a good description of how the new Slashdot @ Exodus works.

What follows is more or less Pat "BSD-Pat" Lynch's account of the DDoS... Pat is our super 31337 BSD junkie sysadmin. He wants everyone to know that the timeline below is a little screwy, but things are more or less in sequential order. Things might not be exactly perfect, but hey, what do you expect after 30 hours without sleep?

Having moved the day before, none of us were truly familiar with exactly how the new hardware would handle the full burden of being 'slashdot.org'. The cluster (known affectionately as The Matrix) had handled its premiere day with flying colors, but we didn't really have an accurate feel for how things would react. Combine this with a couple of extremely high-traffic stories posted on both Thursday and Friday, and it took us a while to determine that the problems were external, and not a flaw in some new component in the cluster.

The attacks began Thursday morning. Most of the traffic came in the form of SYN floods, from obvious /16's, no less, and some /24's. We didn't have any zombie-killing software or a firewall installed because of certain network topology issues. Later on, a second wave came, this one closer to 8 or 9pm, and the load balancer (an Arrowpoint CS-100) died under the load.

The DDoS, as far as I could see, was a lot of SYN and port-zero packets coming from various /16's and /24's, as well as a bunch of RFC1918 reserved addresses (10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16). At one point we reached 109Mbit/s of traffic into our network.
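The traffic mix described above can be triaged from a packet dump by labeling each packet and tallying sources per prefix. This is a minimal sketch, assuming the dump has already been parsed into the hypothetical PacketSummary records below (the field names and category labels are illustrative, not from any real tool). The three RFC 1918 blocks are listed explicitly because Python's ipaddress is_private flag also covers other reserved ranges such as loopback and link-local.

```python
import ipaddress
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# The three RFC 1918 private ranges listed in the account above.
RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


@dataclass
class PacketSummary:
    """Hypothetical record: one packet-dump line reduced to the fields used here."""
    src: str
    dst_port: int
    syn: bool
    ack: bool


def is_rfc1918(addr: str) -> bool:
    """True if the address falls inside one of the RFC 1918 ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918)


def classify(pkt: PacketSummary) -> str:
    """Label a packet with the first matching category, checked in a fixed order."""
    if is_rfc1918(pkt.src):
        return "rfc1918-source"   # private source; should never arrive from the public Internet
    if pkt.dst_port == 0:
        return "port-zero"        # no real service listens on port 0
    if pkt.syn and not pkt.ack:
        return "bare-syn"         # connection attempt (SYN without ACK)
    return "other"


def top_prefixes(packets: Iterable[PacketSummary], prefixlen: int, n: int = 3) -> List[Tuple[str, int]]:
    """Count packets per source prefix (e.g. /16 or /24), most frequent first."""
    counts = Counter(
        str(ipaddress.ip_network(f"{p.src}/{prefixlen}", strict=False)) for p in packets
    )
    return counts.most_common(n)
```

Running top_prefixes(packets, 16) and then top_prefixes(packets, 24) over a parsed dump would surface the heavy /16 and /24 sources of the kind mentioned above, while the classify tallies show how much of the load is bare SYNs versus bogus port-zero or private-source traffic.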

Liz and I went back to Exodus and rebooted the Arrowpoint, and the site seemed "ok" for a bit. By 3 in the morning, Liz decided that the PIX (Cisco's firewall) simply could not do what it was supposed to do, so we went back and started building a FreeBSD box as a bridging firewall.
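A bridging firewall forwards traffic between two network segments at layer 2, so it can be dropped inline without renumbering anything, and it filters each packet as it passes. The decision logic can be modeled as an ordered rule list where the first matching rule wins and anything unmatched is denied. This is a minimal Python sketch under those assumptions; it is not ipfw or ipfilter syntax, and the rule set (drop RFC 1918 sources and port 0, allow TCP to port 80) is illustrative, not the configuration Slashdot actually ran.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Rule:
    """One filter rule; a field left as None matches anything (hypothetical model)."""
    action: str                      # "allow" or "deny"
    proto: Optional[str] = None      # "tcp", "udp", ... or None for any
    src: Optional[str] = None        # source CIDR, or None for any
    dst_port: Optional[int] = None   # destination port, or None for any

    def matches(self, proto: str, src: str, dst_port: int) -> bool:
        if self.proto is not None and self.proto != proto:
            return False
        if self.src is not None and ipaddress.ip_address(src) not in ipaddress.ip_network(self.src):
            return False
        # Compare with "is not None" so a rule for port 0 is not mistaken for "any port".
        if self.dst_port is not None and self.dst_port != dst_port:
            return False
        return True


# Illustrative rule set, evaluated top to bottom.
RULES: Sequence[Rule] = (
    Rule("deny", src="10.0.0.0/8"),
    Rule("deny", src="172.16.0.0/12"),
    Rule("deny", src="192.168.0.0/16"),
    Rule("deny", dst_port=0),
    Rule("allow", proto="tcp", dst_port=80),
)


def decide(proto: str, src: str, dst_port: int,
           rules: Sequence[Rule] = RULES, default: str = "deny") -> str:
    """Return the action of the first matching rule, or the default if none match."""
    for rule in rules:
        if rule.matches(proto, src, dst_port):
            return rule.action
    return default
```

Putting the deny rules before the allow rule matters here: with first-match semantics, a packet claiming a 10.x source and aimed at port 80 is dropped before the allow rule is ever consulted, and anything not explicitly allowed (UDP, other TCP ports) falls through to the default deny.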

Just before we went to plug it in, I tried to ssh into the vpn-gate and noticed that nothing was working right: while the site worked, outgoing traffic and source groups on the Arrowpoint were screwed. As if that wasn't enough, two ports on it had already died!

At some unknown point (time blurs after 30 hours straight!) Martin and PatG show up (thank the gods!) and force us to go to sleep. They bring the site up outside the Arrowpoint while Liz and I watch from a hotel room.

As of Friday morning, the site is semi-working, but the ad system can't be updated, and we have no access to the backend servers. I scream bloody murder to Arrowpoint, who eventually shows up to blame the router: a Cisco 6509 switch with two RSM/MSFCs.

Liz and I do packet dumps and determine that it's not the router: the little CS-100 had died the night before, and that's where it all started. The Arrowpoint guy insists we did something to make the Arrowpoint not work (CT: Explicit description of precisely where Liz and Pat wanted to store the newly deceased Arrowpoint removed to keep things rated PG). By 7 the CS-800 CSS is up and we're almost done for the day, but we stay to make sure. By 10pm we're exhausted but stable, although we're running four servers on round-robin DNS while the new load balancer waits.
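With the load balancer out of the path, round-robin DNS spreads clients across servers by returning the same set of A records in a rotating order. This is a minimal sketch, assuming a name server that rotates the list one position per query and clients that connect to the first address returned; real resolvers and clients may cache or reorder answers. The four addresses are hypothetical documentation addresses, not Slashdot's.

```python
from collections import Counter
from itertools import islice
from typing import Iterator, List

# Hypothetical documentation addresses standing in for four web servers.
SERVERS = ["192.0.2.11", "192.0.2.12", "192.0.2.13", "192.0.2.14"]


def rotations(records: List[str]) -> Iterator[List[str]]:
    """Yield the A-record list rotated one position further on each query."""
    if not records:
        raise ValueError("need at least one record")
    i = 0
    while True:
        yield records[i:] + records[:i]
        i = (i + 1) % len(records)


def first_choice_counts(records: List[str], queries: int) -> Counter:
    """Count which address is listed first, assuming each client uses the first answer."""
    return Counter(answer[0] for answer in islice(rotations(records), queries))
```

Under these assumptions each server ends up first in a quarter of the answers, but nothing checks whether a server is alive: a dead box keeps receiving its share until its record is pulled, and cached answers stretch that out. That is the kind of gap a health-checking load balancer fills.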

Netops (Liz, Martin and I) regroup on Saturday afternoon, reintegrate the new Arrowpoint CS-800, and install a new FreeBSD firewall box in place of the PIX. Slashdot returns to normal. Sysadmins get well-deserved sleep.

So that was the story. It was a pretty hellish weekend for everyone involved, but thanks again to those who helped get our ducks back in a row. Again, Part 2 of this (which originally was gonna run last Thursday, but got pushed aside with all this DDoS stuff) is a fairly detailed description of the new Slashdot setup at Exodus, complete with all the changes mentioned above. Fun for the whole family, if your family is really into clusters of web servers.

15 of 367 comments

  1. Defense in Depth by Old Man Kensey · Score: 5
    Modern military command uses the concept of defense in depth. The essence of this is trading space for time.

    The simplest case is building two small walls instead of one humongous wall. If you build a humongous wall, it takes a long time to get through... unless the enemy finds a single weak point; then you're screwed. Two walls each take less time to get through, but if they're well built using different techniques, the enemy may not get through at all, and if they breach the first, they lose time covering ground and then adapting. They're also very obvious as they traverse the open ground between barriers.

    Network security can benefit from the same concept. Others have already mentioned heterogeneous "airgap" systems -- one of the most common and least excusable faux pas by so-called "security admins" is a single firewall protecting a herd of boxen. Second to that is identical airgap firewalls.

    Of course real defense doesn't end with the walls. Even services running behind an airgap should be structured with an eye toward reasonable security, as others have pointed out. Many companies think their firewalls make them safe; come the day those firewalls are breached and the attackers make off with everything stored on the NT intranet server before wiping the drive, they'll find out differently.

    Any server, no matter how well shielded, should start life in a lockdown configuration and then be made less secure only as needed ("do we really need to enable daytime on this box?"). Admittedly I haven't kept up with developments in secure distros, but does anyone make a "locked-down by default" distro based on Red Hat/Debian/*BSD? It'd be a real service to admins, and if not, it's something I might consider starting a project for. I know of Bastille Linux, but that's (as far as I know) not so much a distro as a set of scripts to tighten up Red Hat.

    The only thing we have yet to figure out is how to effectively make systems under attack "shoot back". The most they can do at the moment is call in an airstrike (i.e. alert the admins). Any return-fire capability would only be as good as the intermediate links let it be. It might not even be a good idea, as it would increase network traffic and make the attack that much more severe.

    --
    -- Old Man Kensey
  2. Re:What's the Cisco angle? by buffy · Score: 5

    "They are probably much better off with the BSD box. Although it's not a good idea to advertise their security infrastructure layout to the world. (Hint, Hint, CmdrTaco!)"

    I disagree 100%. Knowledge of an installation's infrastructure should never compromise the security of the setup. If it does, then you're relying (to a certain extent) on security through obscurity. Security should be provided by a well-thought-out layered approach: network layering (multiple firewalls, screening routers, IDS, etc.), host-based security (TCP wrappers, service minimization & replacement, Tripwire, etc.), and application security (i.e., authentication, verification, etc.).

    In designing network/server infrastructures, it's best to think of them as an open source project: you should be willing to get opinions and discussion from any number of sources, which could include crackers who may at some point want to use that knowledge to attack your site. This is one of the things I liked about TIS Gauntlet once upon a time... "crystal box" was the term they used to describe it.

    You should prepare for an attack ASSUMING that the infiltrators know as much about your setup as you do. In the long run, if you know that your infrastructure can hold up to someone with that amount of knowledge, then you'll be doing pretty well.

    My only question...did I actually see in a comment that they're using NFS to publish data to the distributed webservers??? Ew. Run.

    -buffy

    (Hmm...I seem to really like parentheticals, don't I? (well maybe not. (really!)))

  3. Re:Blame Canada by Kurt Gray · Score: 5

    Exodus is getting $1 million/year from us, so they let us do whatever we want. The only thing they won't let us do is take a picture of our cage: no cameras allowed anywhere in the facility! I guess they're afraid we're going to steal their soul. We were able to smuggle out this picture of PatG, PatL, Martin, and the Arrowpoint rep. Behind them you can see the current Slashdot setup.

  4. Nice account, but who? by alteridem · Score: 5

    That was a good account of what happened, but in part two, we want to hear what you are doing to track the bastards down. Knowing how you go about fixing the problem and then tracking down the culprits may help other people who run into the same problem in the future. We would understand if you need to keep the info secret until you have finished tracking them down, or for legal reasons, but at least tell us so.

  5. Re:Why a firewall? by Christopher Thomas · Score: 5

    Why are you installing a Unix-based firewall in front of some Unix-based public servers? Why not secure the servers in the first place?

    Having a firewall in place to filter invalid packets and other crud thrown at the servers means that more of the servers' time is spent generating Slashdot pages. Also, the simpler the Unix box, the easier it is to secure; hence, securing a stripped-down firewall instead of a big, complex Slashdot server.

  6. Timing by 348 · Score: 5

    I'm curious about the timing of the port to the Exodus environment: was there any indication the attack was timed to take advantage of the different environment? Not saying that the security measures were better or worse than the old site, just that the timing seems rather convenient.

    --

    More race stuff in one place,
    than any one place on the net.

  7. What's the Cisco angle? by drteknikal · Score: 5

    I'm curious about one detail. What was it that the Cisco PIX was supposed to do and didn't?

    --
    http://drteknikal.blogspot.com/
    1. Re:What's the Cisco angle? by John Fulmer · Score: 5

      I'm not a slashdot admin (but I could play one on TV!), but I am painfully familiar with PIXs.

      The idea behind the PIX, or any firewall-like object, is to allow 'good' traffic (http, smtp, etc) into the production network, and reject 'bad' traffic (oddball ports, like port 0, unauthorized UDP traffic, etc).

      The problem with the PIX is that it is essentially a fairly stupid router that can do network address translation and other bells and whistles, but it does it poorly. VERY poorly. It was designed as a network address translation system back in the mid-'90s (anyone remember all the "We'll run out of IPs by 1997!"?) by a company that Cisco later bought. Cisco took the product, did a logic problem ("Firewalls can do address translation. PIX does address translation. PIX is a firewall!"), and had themselves a firewall.

      Its configuration makes a lot of sense to someone familiar with Cisco router ACL rules, but to no one else.

      They are probably much better off with the BSD box. Although it's not a good idea to advertise their security infrastructure layout to the world. (Hint, Hint, CmdrTaco!)

      jf

  8. Blame Exodus by snopes · Score: 5
    We didn't have any zombie-killing software or a firewall installed because of certain network topology issues.

    Topology my ass. Exodus fights hard to make you use their 'value add' security services. Be honest, guys: the reason you weren't protected was b/c those bastards were working you over for more money and didn't want you running your own security, right? In fairness, there are some nice things about running out of an Exodus facility, but dealing with their physical and network security chimps is not one of the high points.

  9. What about the children? by Anonymous Coward · Score: 5

    While I agree that the Slashdot DDoS attack caused many people quite a bit of annoyance and frustration, I think leaving the impact at that is very short sighted.

    Firstly, I don't think the blame for this DDoS can be centered on just one person or group. Obviously, those who attacked Slashdot are to blame, as are Slashdot's sysadmins, and the people at Arrowpoint. And secondly, the costs of this are much greater than you might think.

    I have an eight year old daughter. We had a family pet - a rabbit, black, named Midnight, and my daughter was very fond of it. Midnight, sadly, passed away about two months ago. A week or two after Midnight died, my daughter came to me in tears and asked me, "Daddy, why won't God bring Midnight back? I've been praying like Deacon Simmons told me to."

    Naturally, I had to think about how to respond to this. I finally answered, "Well, honey, God is a little like Slashdot. He can seem arbitrary, cruel, and unresponsive, but he's really a nice guy who's just a little out of touch and is a little slow at responding to requests."

    This was fine, and I thought that would be the end of it. However, when Slashdot went down last week, my daughter burst into my den, positively sobbing and wailing, and managed to choke out "Daddy! Daddy! I can't get to Slashdot!" "Honey," I said, "it's just a website." But, between sobs, she said, "but you said God is just like Slashdot, remember? Does this mean God is dead?"

    I tried to console her as best I could, but nothing seemed to work. When Slashdot came back up, she seemed to return to normal, but she hasn't been quite the same since. She doesn't ask me about God so much any more, and she seems less interested in Church.

    As a good Christian, I will turn the other cheek, and not call for the punishment of those responsible. But to the heinous criminals and negligents responsible for this, I must ask, how do you feel about destroying a small girl's sense of innocence and wonder about the world? About crushing her childish dreams and idealism? About shattering her faith in God and his benevolence? About possibly having crushed her soul and emotion forever, leaving her to live the rest of her days in spiritual agony as a broken, scarred husk of a person?

    I hope all of you think long and hard about what you've done. What is the soul of a child worth, next to a few double-checks of the router?

    Thank you.

  10. Re:Owned? - Nope by Cpyder · Score: 5

    Maybe you should type more carefully, since you requested http://slahsdot.org (slaHSdot), not slashdot.org...

    I registered that domain (for free @ namezero) to help the people who couldn't type. Sorry if I scared you :-)

    Cpyder@slahsdot.org
    _
    / /pyder.....
    \_\ sig under construction

  11. RSM/MSFC definitions by Ed Bugg · Score: 5

    RSM - Route Switch Module
    - Basically a router on a card in the switch for routing between VLANs

    MSFC - Multilayer Switch Feature Card
    - Once a route for a packet flow is figured out (from the first packet going through the router) all other packets from the flow get switched instead of routed.

    --
    -- Ed Bugg --You have freedom of choice, but not of consequences.--
  12. A little more detail on the hardware setup by Kurt Gray · Score: 5
    Rob is going to post exact hardware specs later, but in the meantime, just to give you a brief idea of where the "Arrowpoint" sits in relation to all this... Slashdot is now running on several machines, all VA FullOns, most running Debian and a few running Red Hat, with Apache+mod_perl and MySQL. The database is on its own VA 3500 server. There are currently six VA FullOns serving web pages from an NFS server, and three other web servers serving images.

    All of these machines were behind an Arrowpoint (CS-100) firewall/load balancer, which took it on the chin when we got DDoSed; basically, the Arrowpoint was taking the full force of the attack. So, as described above, we replaced it with a CS-800 and a BSD firewall.

    I guess we learned that if you're going to post a letter from a Microsoft attorney on your web site the same day you implement a few new troll filters you better be prepared for the fury of hell to rain down on you. Then again this is Slashdot, so we always should be prepared for the fury of hell to rain down on us.

  13. DDOS != 10.0.0.0 by Christopher Thomas · Score: 5

    DDoS is the direct result of sloppy upstream administrators. If I were in your shoes, I would be suing every person upstream for at least a few hops for gross negligence in passing those 10.0.0.0 packets along.

    Um, no.

    DDOS simply requires that a lot of compromised boxes be able to send you packets. Spoofing nonexistent return addresses is an orthogonal issue. You reply that it's used to mask the source boxes? Any _valid_ address could also be used for that, so filtering would gain you nothing against that.

    I agree that filtering of reserved addresses should be done, but that would not hinder a DDOS attack.

  14. Re:PIX insufficient? by gavinhall · Score: 5

    Posted by BSD-Pat:

    The problem here is that we only had one subnet to work with. The PIX we had wouldn't do the type of filtering/bridging that I wanted.

    Cisco wants a DMZ on these things.

    I needed a bridge. As for why I didn't use Linux...

    It was quicker and easier for me... ipchains has always been a pain in my arse... ipfw and ipfilter I know best.

    The other thing is that we fried an Arrowpoint CS-100 (a little itty-bitty dinky thing that was being replaced with a bigger one).

    The little Arrowpoint couldn't take 109Mbit/s of traffic; it wasn't meant for that. We were waiting on Arrowpoint to ship us the unit we were *supposed* to have.

    *BSD fills the gap because I know it inside and out, and it was the quickest to get up at that point.

    As far as the router goes, we can't do any type of stateful filtering on the 6509 due to some setup that Exodus has with the HSRP stuff. I'm sure given enough thought I could figure out how to do it; however, we were running in crisis mode.

    The BSD firewall filled that gap for us... I can now do access lists on that instead of on the Cisco.

    And we still have a "DMZ", but it's on the same subnet.

    The Arrowpoint CS-800 was emergency-shipped to us that afternoon... it's about as big as a Cisco 6509... and, ummm, won't die under that type of traffic/content checking (it's layer 5, remember).

    -Pat
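Ed Bugg's MSFC definition above (the first packet of a flow is routed, and the rest of the flow is switched) can be illustrated with a toy model in Python. Everything here is an assumption for illustration: the class and interface names are hypothetical, the flow key is taken to be the full 5-tuple, and real Catalyst hardware keeps its flow table in dedicated hardware, may key flows on fewer fields, and ages entries out, none of which this sketch models.

```python
import ipaddress
from typing import Dict, Tuple

FlowKey = Tuple[str, str, str, int, int]  # (src, dst, proto, src_port, dst_port)


class FlowSwitch:
    """Toy model: route the first packet of a flow, switch the rest from a cache."""

    def __init__(self, routes: Dict[str, str]):
        # routes maps a CIDR prefix to an egress interface (hypothetical names).
        self.routes = [(ipaddress.ip_network(p), iface) for p, iface in routes.items()]
        self.cache: Dict[FlowKey, str] = {}
        self.slow_path_lookups = 0

    def _route(self, dst: str) -> str:
        """Longest-prefix match over the routing table (the 'routed' path)."""
        self.slow_path_lookups += 1
        ip = ipaddress.ip_address(dst)
        candidates = [(net.prefixlen, iface) for net, iface in self.routes if ip in net]
        if not candidates:
            raise LookupError(f"no route to {dst}")
        return max(candidates, key=lambda c: c[0])[1]

    def forward(self, key: FlowKey) -> str:
        """Return the egress interface, consulting the flow cache first (the 'switched' path)."""
        iface = self.cache.get(key)
        if iface is None:
            iface = self._route(key[1])  # first packet of the flow: full lookup
            self.cache[key] = iface      # later packets of this flow hit the cache
        return iface
```

The slow_path_lookups counter makes the point of the design visible: however many packets a flow carries, only its first one pays for the routing lookup.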