Slashdot Mirror


Facebook Engineers Crash Data Centers In Real-World Stress Test (ieee.org)

An anonymous reader writes: In a report via IEEE Spectrum, Facebook's VP of Engineering Jay Parikh described the company's "Project Storm" -- regular takedowns of Facebook's data center intended to stress test the company's disaster recovery efforts. The first few didn't go so well, he reports. (Perhaps doing a test during a World Cup final was not such a good idea.) Months and months of planning went into the initial effort, though up until the actual moment, other Facebook leaders didn't think he'd actually take out an active data center. "In 2014, Parikh decided Project Storm was ready for a real-world test: The team would take down an actual data center during a normal working day and see if they could orchestrate the traffic shift smoothly," reports IEEE Spectrum. Parikh recalls: "I was having coffee with a colleague just before the first drill. He said, 'You're not going to go through with it; you've done all the prep work, so you're done, right?' I told him, 'There's only one way to find out'" if it works. (Parikh made the remarks at this week's @Scale conference in San Jose.) Parikh says there never seemed to be a good time to perform the live takedowns. "Something always ended up happening in the world or the company. One was during the World Cup final, another during a major product launch." The report adds, "The live takedowns continue today, with the Project Storm team members coming up with crazier and crazier ambitions for just what to take offline," Parikh says.

4 of 52 comments

  1. Re: Worth it by johnsmithperson123 · Score: 5, Insightful

    Considering that Facebook is arguably the world's biggest news service, it actually is sort of important.

  2. Somebody Finally Gets It! by chill · Score: 5, Insightful

    Good for him! Most DR exercises I've seen are planned weeks, if not months in advance. They are more of a scheduled fail-over to a redundant site and not an actual disaster recovery test.

    In the event of an actual disaster, there would be no recovery.

    I'm heartened to see SOMEONE does it right.

    --
    Learning HOW to think is more important than learning WHAT to think.
  3. And nothing of value was lost by RogueWarrior65 · Score: 3, Insightful

    Pity.

  4. Re: Worth it by bill_mcgonigle · Score: 3, Insightful

    News DISTRIBUTION service. It's not like they provide any original content like AP, Reuters, etc.

    In that AP and Reuters are just distribution services, Facebook is arguably a larger source of original news distribution than those two.

    And kudos to their engineering team for not just paying lip service to reliability.

    --
    My God, it's Full of Source!
    OUTSIDE_IP=$(dig +short my.ip @outsideip.net)