Slashdot Mirror


How Google Routes Around Outages

1sockchuck writes "Making changes to Google's search infrastructure is akin to 'changing the tires on a car while you're going at 60 down the freeway,' according to Urs Holzle, who oversees the company's massive data center operations. In a Q-and-A with Data Center Knowledge, Holzle discusses Google's infrastructure, how it has engineered its system to route around hardware failures, and how it responds when something goes awry. These updates usually go unnoticed, but during system maintenance last month a software bug triggered an outage for Gmail."

2 of 105 comments

  1. Google File System Paper by Anonymous Coward · · Score: 5, Informative

    For those looking for a more in-depth description, check out the technical paper on the Google File System:

    http://labs.google.com/papers/gfs.html

    Had to read it for a search engines course in college; it's pretty darn spiffy.

  2. Simple, really... by neokushan · · Score: 4, Informative

    The key point:

    When they get an outage, they check how it was caught, and if it wasn't caught automatically, they figure out how to catch it automatically next time. Simple rule: they learn from their mistakes and don't put all their eggs in one basket.

    --
    +1 IDisagreeSoHeMustBeATrollOrAnAstroturferOrAShill