How Google Routes Around Outages
1sockchuck writes "Making changes to Google's search infrastructure is akin to 'changing the tires on a car while you're going at 60 down the freeway,' according to Urs Holzle, who oversees the company's massive data center operations. In a Q-and-A with Data Center Knowledge, Holzle discusses Google's infrastructure, how it has engineered its system to route around hardware failures, and how it responds when something goes awry. These updates usually go unnoticed, but during system maintenance last month a software bug triggered an outage for Gmail."
Was it just me or did anyone else spend a few minutes contemplating how you actually could make a car that did allow you to change a flat while moving?
It just treats the damage as censorship and routes around it, right?
To those looking for a more in-depth description, check out the technical paper on the Google File System:
http://labs.google.com/papers/gfs.html
Had to read it for a search engines course in college, it's pretty darn spiffy.
Has the same issue with rolling out updates and even though Google is (I suppose 10^100) times larger than any other company it does not mean that the same principles can't be applied. I don't see why Google should have any more problem than any other large company especially as they clearly have lots of resources and expertise to bring to bear.
Nullius in verba
Don't waste time reading this. Holzle's core answer is: well, if there is a problem, we fix it, and after that we analyse the reason. Brilliant.
* a merry live and a short one
Google treats outages like damage and routes around them.
Knowledge is power. Knowledge shared is power lost.
Excellent use of the car analogy, especially since it is possible to change a tire while driving a car. Youtube video at 1:48.
Slightly..ahem... OT so posting anon.
You know, the article read like a press release. Hasn't slashdot whored itself out enough lately on these kinds of things? Google is so ultra-reliable, blah blah, 24x7, blah blah, commitment, blah blah, premier service partner, blah blah... I get that kind of talk enough in staff meetings. Where's the meat already!?
Why not write an article with some nice graphics saying what happens to my request from the time I hit "Search" to the time I click a result. List off all the servers it goes through, their roles, how they're monitored, etc. Give examples of failure and show the mode decisions the software makes (and where this software is running) -- show the latencies and other performance impacts as my request bounces over failure after failure. That's what I expect when I pull up an article entitled "How Google Routes Around Outages". Something useful, professionally enriching, intellectually stimulating, etc. In short, tell me why I (should) never see a "500 Internal Server Error" from Google, but I do from just about every other major website I've used.
#fuckbeta #iamslashdot #dicemustdie
The key point:
When they get an outage, they check how it was caught, and if it wasn't caught automatically, they figure out how to catch it automatically next time. Simple rule: they learn from their mistakes and don't put all their eggs in one basket.
+1 IDisagreeSoHeMustBeATrollOrAnAstroturferOrAShill
akin to 'changing the tires on a car while you're going at 60 down the freeway,'
This is not so hard. Just design the car with 4 axles instead of 2 and lift one off the road at a time. Helps if it can swivel for easy access to the lugnuts.
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
Isn't this how the *internet* is (at least in theory) supposed to work anyhow? Instead we have 90% of the cables that route Middle East/Europe traffic running through the same canal. And I know of VERY few ISPs who actually make their systems redundant anymore. /sadface
Ok, granted they are not travelling 60mph, this is still pretty impressive.. I consider this on-topic, because maybe it is possible to do what the summary suggests (replace wheel in moving car). :)
Watch from 1:55 to 2:35:
Youtube video of guys replacing a wheel on a car while it is moving..
New webcomic updated on Sundays: HERE
*ahem* It's Hoelzle. Just saying.
I'm sure they just do exactly what I do when I'm at work and have a problem: they google for an answer to the problem at hand.
Oh, wait.
The above sort of leads into explaining my fear of asking google "is google alive" and the ensuing apocalypse.
I kept thinking about derailing a car, before I realized I was on the wrong track.
It's easy. (With a little help from Google Images...)
Car
Derailer
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
Tell me I am not the only one who read that and wondered if Google also employ I P Nightly etc
http://slashdot.org/~GuyFawkes/journal
http://www.imdb.com/title/tt0074205/
Like maybe this one?
Not A Sig
I figured you'd need a car with redundant wheels that can be lowered and retracted while the car was moving. Probably also analogous to how google manages hardware downtime: redundancy.
While technically possible to make such a car, I don't see any practical use for such a system when it's just safer and more efficient to stop the car and replace the wheel.
At first I thought it might be a useful system for an armored car like the presidential limo, but the added weight of another motorized system would probably make it more of a liability than a help in most situations, considering that anything powerful enough to disable an armored wheel on an armored car will probably also destroy any systems adjacent to it.
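For what it's worth, the "spare wheels you lower while still moving" idea maps onto software fairly directly. Here is a minimal Python sketch, assuming each replica is just a callable that either answers or raises. The names route_request, AllReplicasFailed, broken and healthy are made up for illustration and have nothing to do with how Google actually does it.

```python
from typing import Callable, Sequence


class AllReplicasFailed(Exception):
    """Raised when no replica could serve the request (hypothetical name)."""


def route_request(replicas: Sequence[Callable[[str], str]], query: str) -> str:
    """Try each replica in order and return the first successful answer."""
    errors = []
    for replica in replicas:
        try:
            return replica(query)
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(exc)    # remember the failure and move to the next replica
    raise AllReplicasFailed(f"{len(errors)} replicas failed for {query!r}")


def broken(query: str) -> str:
    """Stand-in for a replica that is down (the flat tire)."""
    raise ConnectionError("replica down for maintenance")


def healthy(query: str) -> str:
    """Stand-in for a working replica (the spare wheel)."""
    return f"results for {query}"


if __name__ == "__main__":
    # The first replica fails, so the request is served by the second one.
    print(route_request([broken, healthy], "tires"))
```

The caller never sees the failed replica as long as at least one spare is up, which is the whole point of the redundancy. The cost is the extra latency of every failed attempt before a good one.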
In Soviet Russia, outages route around Google
Sorry couldn't resist
has never really seemed appropriate to me. in an infrastructure you have quorum managers, active fencing, and weighted peers...what does road construction have to do with any of it?
Good people go to bed earlier.
Woohooo, going at an amazing 60 mph! Imagine that! The wind might well blow away your hat!
For Germans, talking about that speed like it was fast is really kinda cute. :-)