Slashdot Mirror


How Google Routes Around Outages

1sockchuck writes "Making changes to Google's search infrastructure is akin to 'changing the tires on a car while you're going at 60 down the freeway,' according to Urs Holzle, who oversees the company's massive data center operations. In a Q-and-A with Data Center Knowledge, Holzle discusses Google's infrastructure, how it has engineered its system to route around hardware failures, and how it responds when something goes awry. These updates usually go unnoticed, but during system maintenance last month a software bug triggered an outage for Gmail."

24 of 105 comments

  1. Just me? by Anonymous Coward · · Score: 5, Funny

    Was it just me or did anyone else spend a few minutes contemplating how you actually could make a car that did allow you to change a flat while moving?

    1. Re:Just me? by esocid · · Score: 4, Funny

      Just you. I kept thinking about how I could use a car metaphor to describe how google...oh wait.

      --
      Absolute power corrupts absolutely. indymedia
    2. Re:Just me? by Anonymous Coward · · Score: 5, Insightful

      I thought about it for approximately 30 seconds. Then I realized that it is a bad analogy. A Google car would have hundreds of redundant wheels; changing one is easy.

    3. Re:Just me? by Yetihehe · · Score: 4, Interesting

      A car is a bad analogy; building an airplane in mid-air is better.

      --
      Extreme Programming - Redundant Array of Inexpensive Developers
    4. Re:Just me? by Slumdog · · Score: 5, Funny

      Just you. I kept thinking about how I could use a car metaphor to describe how google...oh wait.

      I kept thinking about derailing a car, before I realized I was on the wrong track.

    5. Re:Just me? by Saerko · · Score: 5, Insightful

      That's what I was thinking too; it would probably just function like an 18-wheeler, where a tire can blow out and there's so much support that the load is still distributed adequately.

      Basically, all this means is Google designs like Mack while everyone else designs like Chrysler...

    6. Re:Just me? by zonky · · Score: 3, Informative

      That problem has been solved for some time, at least in rallying. http://www.inforally.sibiul.ro/wrc-rally-news-10661-runflat_mousse_tyres_detail.html At least, that was until they banned it.

    7. Re:Just me? by tux0r · · Score: 5, Funny

      I kept thinking about derailing a car, before I realized I was on the wrong track.

      I was going to reply about mixing metaphors, but then I lost my train of thought.

      --
      ( Redundancy is ) ^ n
    8. Re:Just me? by moxley · · Score: 2, Funny

      What are you talking about? I'm in America, and we need to find someone to blame for the flat first....

      Then, maybe we can fix it... Got any nails and a hammer?

    9. Re:Just me? by zonky · · Score: 2, Interesting

      Trying to lower costs of competing. Also, it could be argued that mousse meant that off-line mistakes were not 'punished'.

  2. I know! I know! Pick Me!!! by fuzzyfuzzyfungus · · Score: 5, Funny

    It just treats the damage as censorship and routes around it, right?

    1. Re:I know! I know! Pick Me!!! by fulldecent · · Score: 4, Funny

      It treats the censors as routes and damages them?

      --
      I was raised on the command line, bitch

  3. Google File System Paper by Anonymous Coward · · Score: 5, Informative

    For those looking for a more in-depth description, check out the technical paper on the Google File System:

    http://labs.google.com/papers/gfs.html

    Had to read it for a search engines course in college, it's pretty darn spiffy.

    1. Re:Google File System Paper by SeePage87 · · Score: 5, Funny

      Why would we need a more in-depth description? We already got our car metaphor!

  4. Video of the car analogy by Anonymous Coward · · Score: 5, Funny

    Excellent use of the car analogy, especially since it is possible to change a tire while driving a car. YouTube video at 1:48.

    Slightly... ahem... OT, so posting anon.

  5. Re:nothing by Yetihehe · · Score: 2, Funny

    Analyzing the reason first is a very good step. It could have saved me two hours today :/

    --
    Extreme Programming - Redundant Array of Inexpensive Developers
  6. Article doesn't really say anything. by girlintraining · · Score: 5, Interesting

    You know, the article read like a press release. Hasn't slashdot whored itself out enough lately on these kinds of things? Google is so ultra-reliable, blah blah, 24x7, blah blah, commitment, blah blah, premier service partner, blah blah... I get that kind of talk enough in staff meetings. Where's the meat already!?

    Why not write an article with some nice graphics saying what happens to my request from the time I hit "Search" to the time I click a result. List off all the servers it goes through, their roles, how they're monitored, etc. Give examples of failure and show the mode decisions the software makes (and where this software is running) -- show the latencies and other performance impacts as my request bounces over failure after failure. That's what I expect when I pull up an article entitled "How Google Routes Around Outages". Something useful, professionally enriching, intellectually stimulating, etc. In short, tell me why I (should) never see a "500 Internal Server Error" from Google, but I do from just about every other major website I've used.

    --
    #fuckbeta #iamslashdot #dicemustdie
    1. Re:Article doesn't really say anything. by Anonymous Coward · · Score: 2, Insightful

      I would wager you would learn all this if you were hired on as part of Google's site reliability team. Most of the info you're curious about is probably something they're not willing to talk about in great detail for competitive reasons.

    2. Re:Article doesn't really say anything. by Red+Flayer · · Score: 2, Interesting

      You know, the article read like a press release. Hasn't slashdot whored itself out enough lately on these kinds of things?

      YMBNH.

      This has been happening for as long as I've been lurking on Slashdot (2000?), and didn't go away once I set up an account (2002? maybe 2003). And from the YMBNH posts I saw when I began lurking, this has apparently been an issue since the beginning (or shortly thereafter).

      At any rate, complaining about it won't do much good. There's a saying that it might help you to repeat:

      Give me the strength to change the things I can, the humility to accept the things I can't, and the wisdom to know the difference.

      --
      "Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
  7. Simple, really... by neokushan · · Score: 4, Informative

    The key point:

    When they get an outage, they check how it was caught, and if it wasn't caught automatically, they figure out how to catch it automatically next time. Simple rule: they learn from their mistakes and don't put all their eggs in one basket.

    --
    +1 IDisagreeSoHeMustBeATrollOrAnAstroturferOrAShill
    1. Re:Simple, really... by SeePage87 · · Score: 4, Funny

      Bollocks! I tried learning from my mistakes once, and boy did that ever turn out bad. Now I know better than to try that again.

  8. Changing tires by Spazmania · · Score: 2, Interesting

    akin to 'changing the tires on a car while you're going at 60 down the freeway,'

    This is not so hard. Just design the car with 4 axles instead of 2 and lift one off the road at a time. Helps if it can swivel for easy access to the lugnuts.

    --
    Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
  9. Inspiration taken from the same thing it runs on! by Thantik · · Score: 5, Insightful

    Isn't this how the *internet* is (at least in theory) supposed to work anyhow? Instead we have 90% of the cables that route Middle East/Europe traffic running through the same canal. And I know of VERY few ISPs who actually make their systems redundant anymore. /sadface

  10. Replacing a wheel on a car going 60mph by krnpimpsta · · Score: 2, Informative

    OK, granted they are not travelling at 60mph, but this is still pretty impressive. I consider this on-topic, because maybe it is possible to do what the summary suggests (replace a wheel on a moving car). :)

    Watch from 1:55 to 2:35:
    YouTube video of guys replacing a wheel on a car while it is moving.

    --
    New webcomic updated on Sundays: HERE