Confirmed Gmail / Google App Outage
mbone writes "Earlier today there was a confirmed Google outage which got a lot of attention from network operators. From a post to NANOG after everything calmed down: 'Google ack'd a maintenance on their core network did not go as planned-Forced traffic to one peer link that was unable to handle all the traffic. Maintenance has been rolled back. Issue has been restored.' This is exactly what makes me nervous about cloud computing and data storage. It's bad enough when I screw up a config and it takes down my mail, but what about when it happens to the entire globe at once?" Several readers also point to CNET's coverage of the outage.
Update: 05/14 19:25 GMT by T : CWmike adds this: "Steven J. Vaughan-Nichols writes that what may be happening is a massive DDoS attack. Based on the size of the attack that would be needed to interfere with Google, I believe that it's quite likely to be the result of an attack from the controllers of the Windows worm, Conficker. Another theory that has been put about — that the problem was due to AT&T NOC routing problems — does not appear to hold water, writes Steven."
Update: 05/14 21:01 GMT by T : Google's put up a low-detail explanation on their blog that says "An error in one of our systems caused us to direct some of our web traffic through Asia, which created a traffic jam. As a result, about 14% of our users experienced slow services or even interruptions."
In comments from Google Admins, they said "oops." :)
Serious? Seriousness is well above my pay grade.
My Google voice account went all sorts of haywire.
1) Text messages sent from the web got duplicated. One person got nearly 10 duplicates in quick succession. I also got duplicate messages back.
2) My number doesn't work. If you call it you get a "Currently unavailable" message.
3) A few calls that came in before the outage aren't showing up in the Received/Missed calling list.
...and take a stroll to the great big place known as "outside".
call me....
And yet somehow miraculously we are all still alive. The sky is not falling!
When it's just your mail server down, everyone else gets annoyed at you because you're not {gett,receiv}ing mail they're {sending, expecting from} you. When the cloud is down, everyone can just chill and be thankful that they're not going to log on to find a whole stream of new emails.
This sucks for docs, though using a completely cloud-based doc solution is a bit mental. Even if you're mobile, it's best to have a local copy to save on battery life.
Nick
If everybody goes down, nothing happens and you just go outside (beyond the doors, out into the bright white light) and enjoy your day until 'they' fix it.
What's not to like?
Faster! Faster! Faster would be better!
If it bothers you, then use a mail client to download your mail from Google. As someone who has been using my Gmail account all week, I didn't even notice a problem. The whole thing seems overblown.
Having run my own mail server, and used mail servers run by companies I work for, I'll -gladly- take GMail's track record for reliability. Even with no 'guarantee', it's been a hell of a lot better than anything else I've experienced.
And what's -really- the difference between a server going down locally that affects you and a server going down globally that affects you? Nothing.
"If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM
Take a good look, kids. Google was down and Twitter was up. This only happens once in every 3,271 days. You probably won't see it again, at least in Twitter's lifetime...
Anyone who has ever used or administered a mail server has experienced a mail server going down. This is not news.
What is news is that Google Mail has been up for so long until now. And current accounts seem to indicate the outage lasted about one hour.
One hour of down time after five years of steady service is good enough for me. It is better than any other mail server I have ever used.
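For a rough sense of what that works out to, here is a back-of-the-envelope calculation in Python. It assumes exactly one hour of downtime over five 365-day years, which are the round figures from the comment above and not measured values.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: ignore leap days


def availability(downtime_hours: float, years: float) -> float:
    """Fraction of the period during which the service was up."""
    total_hours = years * HOURS_PER_YEAR
    return 1 - downtime_hours / total_hours


if __name__ == "__main__":
    # One hour down in five years (figures taken from the comment above).
    print(f"{availability(1, 5):.4%}")  # prints 99.9977%
```

On those assumptions that is a little better than "four nines", although a single anecdotal hour is not a measurement of Gmail's real uptime.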
If a life is not lost, there are no worries with cloud computing (hence, cloud computing should be used for non-life-critical services; Gmail is a perfect example).
Of course, VCs may have lost revenue, Capitalists may sweat from lost stock trades, teenagers may lose that one twitter about how cool Miley is to them, some adult may not get that date tonight from craigslist, you may miss that one Hulu commercial, some K-12 kid may not be able to send out his homework, some college kid can't access his pirate bay music lists, or the USPoTC may miss that extra minute to promote his stimulus bill.
In the end, I hope cloud services show us that we are not slaves to time. The human race has advanced enough to know that already. And really, if "the cloud" is down for an hour, maybe you should go outside and enjoy the wonders of nature and peace for once, or talk to someone physically. It raises the question: "can it wait?"
Considering the amount of usage Google sees, a minor interruption like today's issue is nothing that worries me much at all.
But usage is precisely the point.
I lost access to Search, News, E-Mail...
Everything Google.
To a casual user at home this doesn't matter - but try explaining a global blackout of Google to your boss.
Google is the poster child for the web-based app.
Computing in the cloud.
If we're talking about the same outage that caused Google advertisements to hang forever this morning, it caused access to many unrelated websites to hang, including Slashdot itself. This seems like a really bad single-point-of-failure issue. If a site can't display ads, shouldn't it come up anyway?
It's bad enough that I have to wait tens of seconds for Captcha content to pop up long after a login page has loaded.
This is starting to get annoying. If this is "cloud computing", I'd rather stay on earth.
Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.
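The "shouldn't it come up anyway?" question above is usually answered by putting a time limit on anything optional. Here is a minimal sketch of that idea in Python. The names `load_optional`, `render_page` and `hung_ad_server` are made up for illustration, and a real web page would more likely do this in the browser with asynchronously loaded scripts; this only shows the principle of not letting third-party content block the main content.

```python
import concurrent.futures
import time


def load_optional(loader, timeout_s, fallback=""):
    """Call loader(), but return fallback if it is slow or raises."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(loader)
    try:
        return future.result(timeout=timeout_s)
    except Exception:
        # Covers both a timeout and an error inside the loader.
        return fallback
    finally:
        # Do not block on the stuck call; the page moves on without it.
        pool.shutdown(wait=False)


def render_page(article, ad_loader, timeout_s=0.1):
    """Return the article plus whatever ad markup arrived in time."""
    ad = load_optional(ad_loader, timeout_s, fallback="<!-- ad skipped -->")
    return article + "\n" + ad


if __name__ == "__main__":
    def hung_ad_server():  # hypothetical stand-in for a stalled ad request
        time.sleep(0.5)
        return "<ad>"

    print(render_page("Story text", hung_ad_server))
```

The article text is returned after roughly the timeout, with a placeholder where the ad would have been.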
When done correctly, the "cloud" is the internet itself. Google has network design issues: some of their key services have only a couple of ingresses into Tier-1 providers:
http://en.wikipedia.org/wiki/Tier_1_carrier
I don't work for them, I don't hold their stock, and I am not (currently) a customer, so I have no skin in their game, but Internap, as a BUSINESS MODEL, becomes more important.
If you are a major company that comes to rely HEAVILY on Cloud Services, you want to ensure that you have on-ramps into several Tier-1 providers ALL AT ONCE, without having to contract individually with 4 or 5 of them yourself. I predict more companies will mimic this model of aggregation, essentially handling the business of BGP optimization for customers, and handing customers 2 redundant pipes and saying "hey, don't worry if San Fran has an earthquake and these peering points blow up, we'll get you out via this Tier-1 backbone over to your cloud computing provider's service via this backbone within seconds. Let us handle that."
This matters especially with ISPs that get into pissing matches, like when Cogent and Telia got into it and cut each other off. If you had Cogent as your only ISP, you were screwed if you wanted to get to a bunch of Swedish sites, because Cogent's CEO was trying to play chicken over some tariff rates. The cloud computing model will no longer tolerate that; it's not just some website, it's a BUSINESS function.
That's my take, at least.
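The "several on-ramps, let us handle failover" idea above boils down to choosing among multiple upstreams based on which ones are currently reachable. Below is a toy sketch of that selection step in Python. It is not how BGP or any real provider works; the `Upstream` type, the `cost` field and the provider names are all hypothetical and exist only to illustrate the decision.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Upstream:
    name: str  # hypothetical label for a Tier-1 on-ramp
    cost: int  # hypothetical preference; lower is better


def pick_upstream(
    upstreams: Sequence[Upstream],
    is_healthy: Callable[[Upstream], bool],
) -> Optional[Upstream]:
    """Return the cheapest upstream that passes the health check, if any."""
    healthy = [u for u in upstreams if is_healthy(u)]
    return min(healthy, key=lambda u: u.cost, default=None)


if __name__ == "__main__":
    links = [Upstream("tier1-a", 10), Upstream("tier1-b", 20)]
    down = {"tier1-a"}  # pretend the preferred link just failed
    chosen = pick_upstream(links, lambda u: u.name not in down)
    print(chosen.name if chosen else "no path")  # prints tier1-b
```

With a single upstream in the list, the same failure leaves no path at all, which is the Cogent-only situation described above.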
Finally someone commenting with some sense. It kills me to read all the "Great Job! Google!" and "Bravo!" comments. This exposes a serious flaw in planning, design and change management of a very heavily relied upon resource.
There is nothing to give kudos for here. Gotta love blind loyalty.
http://teasphere.wordpress.com - A little spot of tea
"Ack'd" is short for "acknowledged", by way of TCP (which sends ACKs; NAKs come from other protocols). In the networking world, saying ACK as shorthand is pretty common.
Somebody must have typed "google" into Google. It's the only possible explanation.