Confirmed Gmail / Google App Outage
mbone writes "Earlier today there was a confirmed Google outage which got a lot of attention from network operators. From a post to NANOG after everything calmed down: 'Google ack'd a maintenance on their core network did not go as planned-Forced traffic to one peer link that was unable to handle all the traffic. Maintenance has been rolled back. Issue has been restored.' This is exactly what makes me nervous about cloud computing and data storage. It's bad enough when I screw up a config and it takes down my mail, but what about when it happens to the entire globe at once?" Several readers also point to CNET's coverage of the outage.
Update: 05/14 19:25 GMT by T : CWmike adds this: "Steven J. Vaughan-Nichols writes that what may be happening is a massive DDoS attack. Based on the size of the attack that would be needed to interfere with Google, I believe that it's quite likely to be the result of an attack from the controllers of the Windows worm, Conficker. Another theory that has been put about — that the problem was due to AT&T NOC routing problems — does not appear to hold water, writes Steven."
Update: 05/14 21:01 GMT by T : Google's put up a low-detail explanation on their blog that says "An error in one of our systems caused us to direct some of our web traffic through Asia, which created a traffic jam. As a result, about 14% of our users experienced slow services or even interruptions."
In comments from Google Admins, they said "oops." :)
Serious? Seriousness is well above my pay grade.
My Google voice account went all sorts of haywire.
1) Text messages sent from the web got duplicated. One person got near 10 duplicates in quick succession. I also got duplicate messages back.
2) My number doesn't work. If you call it you get a "Currently unavailable" message.
3) A few calls that came in before the outage aren't showing up in the Received/Missed calling list.
...and take a stroll to the great big place known as "outside".
call me....
And yet somehow miraculously we are all still alive. The sky is not falling!
When it's just your mail server down, everyone else gets annoyed at you because you're not {gett,receiv}ing mail they're {sending, expecting from} you. When the cloud is down, everyone can just chill and be thankful that they're not going to log on to find a whole stream of new emails.
This sucks for docs though, but using a completely cloud-based doc solution is a bit mental. Even if you're mobile it's best to have a local copy to save on battery life.
Nick
If everybody goes down, nothing happens and you just go outside (beyond the doors, out into the bright white light) and enjoy your day until 'they' fix it.
What's not to like?
Faster! Faster! Faster would be better!
If it bothers you then use a mail client to download your mail from Google. As someone that has been using my gmail account all week I didn't even notice a problem, the whole thing seems overblown.
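For anyone who wants to script that instead of using a full mail client, here is a minimal sketch in Python using the standard `imaplib` module. It assumes IMAP access is enabled on the account; the `backup_mailbox` helper, the `INBOX` folder choice, and the `.eml` file naming are my own assumptions for illustration, not anything Google prescribes.

```python
import imaplib
import os


def backup_mailbox(conn, dest_dir, folder="INBOX"):
    """Save every message in `folder` as a raw .eml file under dest_dir.

    `conn` is any object offering imaplib.IMAP4-style select/search/fetch.
    Returns the number of messages written.
    """
    os.makedirs(dest_dir, exist_ok=True)
    conn.select(folder, readonly=True)  # read-only: do not mark mail as seen
    status, data = conn.search(None, "ALL")
    if status != "OK":
        raise RuntimeError("IMAP search failed")
    saved = 0
    for num in data[0].split():
        status, parts = conn.fetch(num, "(RFC822)")
        if status != "OK":
            continue  # skip messages the server refuses to return
        raw = parts[0][1]  # first part is an (envelope, raw message bytes) tuple
        with open(os.path.join(dest_dir, num.decode() + ".eml"), "wb") as fh:
            fh.write(raw)
        saved += 1
    return saved


# Usage (needs real credentials, so it is left commented out;
# the address and password below are placeholders):
# conn = imaplib.IMAP4_SSL("imap.gmail.com")
# conn.login("user@example.com", "app-password")
# print(backup_mailbox(conn, "mail-backup"))
# conn.logout()
```

Run from cron, something like this leaves a local copy of the mail to read during an outage, although it will not let you send or receive anything new.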
Considering the amount of usage google sees, a minor interruption like today's issue is nothing that worries me much at all.
It's not like oh, say, Comcast, who left me without an internet connection for a month because their technician was drunk and rammed his truck into the large metal junction box where my apartment's internet connection tied into everyone else's. It only took them a month to replace the box and re-wire everything.
Sig Follows: "Suppose you were an idiot. And suppose you were a member of Congress. But I repeat myself." -- Mark Twain
Having run my own mail server, and used mail servers run by companies I work for, I'll -gladly- take GMail's track record for reliability. Even with no 'guarantee', it's been a hell of a lot better than anything else I've experienced.
And what's -really- the difference between a server going down locally that affects you and a server going down globally that affects you? Nothing.
"If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM
Take a good look, kids. Google was down and Twitter was up. This only happens once in every 3,271 days. You probably won't see it again, at least in Twitter's lifetime...
Anyone who has ever used or administered a mail server has experienced a mail server going down. This is not news.
What is news is that Google Mail has been up for so long until now. And current accounts seem to indicate the outage lasted about one hour.
One hour of down time after five years of steady service is good enough for me. It is better than any other mail server I have ever used.
If a life is not lost, there are no worries with cloud computing (hence, cloud computing should be used for non-life critical services, gmail is a perfect example).
Of course, VCs may have lost revenue, Capitalists may sweat from lost stock trades, teenagers may lose that one twitter about how cool Miley is to them, some adult may not get that date tonight from craigslist, you may miss that one Hulu commercial, some K-12 kid may not be able to send out his homework, some college kid can't access his pirate bay music lists, or the USPoTC may miss that extra minute to promote his stimulus bill.
In the end, I hope cloud services show us that we are not slaves to time. The human race has advanced enough to know that already. And really, if "the cloud" is down for an hour, maybe you should go outside and enjoy the wonders of nature and peace for once, or talk to someone physically. It raises the question: "can it wait?"
"It's bad enough when I screw up a config and it takes down my mail, but what about when it happens to the entire globe at once?"
That's much better for you. Instead of having to explain to everybody that the dog ate your homework or whatever, you can sit back and let them explain it to you...
It will suck when everything's on the Cloud because I won't be able to claim my server's been down all day while I'm out playing golf.
When things came back up this afternoon it was an old backup version and several of my settings had been rolled back.
I guess this is one instance where Google's perpetual beta status really applied - those using Voice for mission critical communications were up a creek.
Ah, we all get our power from the "electrical cloud". We all need private generators. Ah! Ah!
If we're talking about the same outage that caused google advertisements to hang forever this morning, it caused access to many unrelated websites to hang, including slashdot itself. This seems like a really bad single-point-of-failure issue. If a site can't display ads, shouldn't it come up anyway?
It's bad enough that I have to wait tens of seconds for Captcha content to pop up long after a login page has loaded.
This is starting to get annoying. If this is "cloud computing", I'd rather stay on earth.
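To make the "shouldn't it come up anyway?" point concrete, here is a rough server-side sketch in Python of treating the ad as optional: give the third-party call a deadline and render the page without it if the deadline passes. Real sites usually handle this in the browser by loading ad scripts asynchronously; the `render_page` function, the placeholder comment, and the timings below are made up for illustration.

```python
import concurrent.futures
import time


def render_page(fetch_ad, timeout_s=0.5, placeholder="<!-- ad unavailable -->"):
    """Return page HTML, waiting at most timeout_s for the third-party ad."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fetch_ad)
    try:
        ad_html = future.result(timeout=timeout_s)
    except Exception:
        # Timed out or the ad server errored: ship the page without the ad.
        ad_html = placeholder
    finally:
        pool.shutdown(wait=False)  # do not block the page on the stuck request
    return "<article>story text</article>" + ad_html


def slow_ad():
    time.sleep(0.3)  # stands in for an ad server that is hanging
    return "<div>ad</div>"


print(render_page(lambda: "<div>ad</div>"))   # ad arrives in time
print(render_page(slow_ad, timeout_s=0.05))   # ad is dropped, page still renders
```

The design choice is simply that the story text never waits on a dependency the site does not control.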
Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.
Why do Yahoo and Hotmail never go down? Because nobody is using them. Heck, I haven't opened mine in months now.
-- It is the mark of an educated mind to be able to entertain a thought without accepting it. -- Aristotle
I love all the fucktards who keep saying: "oop, the cloud's down I'll go for a stroll" or "welp, google's down, I'll go home." Where in the hell do you work? Your phone is going to be lit up like Times Square with all the user calls/complaints for hours. And just up and leaving offers zero customer service to users who rightfully don't know what is actually wrong.
Shit's down, and whether it is yours or not, you are seen as responsible and at least have to offer some communication and support. The problem is you look bad because you can't tell anyone an actual ETA or valid explanation besides a shrug of the shoulders and a "hopefully CompanyX gets it fixed soon."
Cloud computing can be a great thing but this shows that there are still fundamental flaws. I have run systems that could have zero downtime and achieved it. Yes, it requires redundancy. Yes, it is expensive. Yes, it requires geographically and ISP-independent sites. Yes, it requires planning. But it can be done, so stop all the bullshit praise that because it is Google and they are big, this is OK. It isn't. If anything they should NEVER have this kind of issue.
The Google-colored glasses need to be taken off.
http://teasphere.wordpress.com - A little spot of tea
When done correctly, the "cloud" is the internet itself. Google has network design issues, some of their key services only have a couple of ingresses into Tier-1 providers:
http://en.wikipedia.org/wiki/Tier_1_carrier
I don't work for them, I don't hold their stock, and I am not (currently) a customer, so I have no skin in their game, but Internap as a BUSINESS MODEL becomes more important.
If you are a major company that comes to rely HEAVILY on Cloud Services, you want to ensure that you have on-ramps into several Tier-1 providers ALL AT ONCE, without having to contract individually with 4 or 5 of them yourself. I predict more companies will mimic this model of aggregation, essentially handling the business of BGP optimization for customers, and handing customers 2 redundant pipes and saying "hey, don't worry if San Fran has an earthquake and these peering points blow up, we'll get you out via this Tier-1 backbone over to your cloud computing provider's service via this backbone within seconds. Let us handle that."
Especially with ISPs that get into pissing matches, like when Cogent and Telia got into it, and cut each other off. If you had Cogent as your only ISP, you were screwed if you wanted to get to a bunch of Swedish sites, because Cogent's CEO was trying to play chicken over some tariff rates. The cloud computing model will no longer tolerate that, it's not just some website, it's a BUSINESS function.
That's my take at least.
while I was trying to get work done today. This was pretty scary. I mean, besides not being able to search Google and check my email, there are other sites that wouldn't work. Some Apache projects and also Nabble apparently use Google Analytics, so I couldn't even load those pages. Also, I couldn't load Slashdot's main page because it apparently uses Google ads or something like that. What suggestions do people have for this? What other sites were not accessible during the outage?
Many sites rely on Google in ways that aren't immediately evident - for instance, during the outage, Google Analytics connections were lagged, which meant that all of our sites that incorporate Analytics were ALSO lagged.
What's amazing is the extent to which an outage on a single entity can bring down ALL of the other entities that surround it -- not just those that rely more visibly on its services, e.g., Google Docs.
Yikes!
--Dave
Everything is fine. It was simply a glitch in the holo-matrix. The Doctor has been tinkering with his program again and caused a feedback loop between the holo-emitters and EPS conduits on deck five. Seven has corrected the malfunction.
Notes or contacts causing important meetings to be missed or leaving attendees un/less prepared. It's easy to say back everything up, but in the real world under stress (or laziness, or stupidity) you tend to stick with simpler work-flows. I like SaaS for non-critical applications; maybe it's an age thing, or maybe critical service/hosted solutions are simply still new enough that the kinks in reliability haven't been fully worked out.
Quack, quack.
It's bad enough when I screw up a config and it takes down my mail, but what about when it happens to the entire globe at once?
I was reading this comment and it occurred to me that the latter is actually preferred. With the first option, your systems are messed up, but everyone else wants you to continue to conduct business. With the latter situation, your systems are down and so are the people who would normally be trying to reach you.
The Cylons are coming, the Cylons are coming!!!
I think I did something wrong: I was going to open google, but wrote the address in the firefox search box instead of the address bar. Suddenly, the internet went mayhem. Any clues what went wrong?
The Fog: What happens when The Cloud is down.
This speculation from the ComputerWorld blog doesn't belong in the post. Even the blog author says it's conjecture. It's especially ridiculous since the NANOG post in the second link already explained that the problem was a routing error at Google.
...and it's still in beta!
E-mail is supposed to be reliable because of its distributed nature. It is not supposed to live on a single "cloud"; distributed machines should be caring for it. It is just like XMPP vs. old-fashioned MSN/AIM etc. junk.
:)
Let me show what I see with the "cloud" (which is one of the worst abused terms) right now:
(wget)
s3.amazonaws.com[72.21.207.242]
Saving to: `423.dmg'
10% [===> ] 4109203 5,54K/s eta 79m 33s
So, a highly successful Mac shareware app which I love couldn't deal with bandwidth issues and offloaded its downloads to Amazon S3. Amazon S3, on the other hand, while showing that it works perfectly (on the status page), has a 450 ms ping response, and I am back to 56K speed on a 4 mbit ADSL line. It looks like something is wrong with the Level3 hops.
Cloud is not offloading all mail to one central server nor putting all files on Amazon S3; it doesn't even exist yet. When people do 10x realtime h264 encoding with their Xgrid-enabled portable Macs running Snow Leopard and store the file anonymously on thousands of other machines, that would be some kind of "cloud". Right now, Cloud is just an icon for that overpriced me.com (dotmac) service.
Will this have any impact on internet porn?!
Strange. My Firefox 3.0.10 got somehow affected by this outage. It just refused to open! It loaded about 30 MB of data into RAM but went nowhere from there. The browser window never appeared, and I tried to re-launch it several times, but to no avail! Very odd.... Did anyone else have problems with it? Opera -- although not able to open Google.com -- opened fine!
Is Firefox tied to Google like E.T. was tied to Elliot?
Right when it started I had trouble loading /. even. It kept stalling while loading Google ads. However that seemed to only last for 10 minutes while my Gmail, iGoogle, etc. was slow for 1/2 an hour or so. Maybe they fixed the ads quickly...
The Google status page at http://www.google.com/appsstatus# has some live info.
ZDNet is reporting that any traffic that was routed through AT&T was not able to get to Google:
http://blogs.zdnet.com/BTL/?p=18064
Google says that a traffic overload in Asia was the problem:
http://googleblog.blogspot.com/2009/05/this-is-your-pilot-speaking-now-about.html
So it looks like a switch/router issue caused a long packet path, which caused timeouts, which caused unhappy users.
I blinked a few times over the extra update, until I read that it was just Steven J. Vaughan-Nichols' speculation. Apparently, the claim that Google got DDoS'ed is not confirmed.
This makes me wonder whether Google is the "too big to fail" of the technology world, as Citibank/Bank of America/AIG are to the financial world. Would the US government have to prop up Google and its services some day with massive bailouts because the failure of Google could be catastrophic for the general public? The cost of this failure to individual users may not be high (a few minutes of lost access to mail, a hit to efficiency because you cannot search, etc.) but the cumulative cost across the globe could be very high.
I post, therefore I am
Am I missing something? What is "ack'd a maintenance"?
Sounds like someone's watched the new Star Trek a few times too many...
To reign is to serve.
at least 4chan is working again
MITM!!! Quick, change your passwords. The Asians sniffed them all! LOL
Companies expecting to do mission critical work over the Net need dedicated lines, dedicated machines, and somebody from THEIR company overseeing the system.
Relying on other people is a sure route to disaster. It's hard enough relying on your OWN people.
The Net is NOT fault-tolerant - unless YOU make it so.
Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
I wonder if whoever did this used teardrop.c or boink.c?
We purchased a 25K euros firewall last month with which we had some issues: not that many, just too many for my taste for such an expensive firewall. Then I made the error of confusing today's Google outage with the firewall issues. At the exact time Google went down I was using firefox to type addresses in the search bar, nothing was responding. So I looked quite dumb in front of the support team when I realized everything but Google was working. The thing is, when you trust Google for everything and use it for testing, your tests can't be relevant once it goes down. A very old teacher of mine would have called that a n00b mistake. As the support guy said (after I noticed only Google wasn't OK), I should have tried to ping the first hop of my ISP. Still, it's a bit complex when the HTTP proxy on the firewall is crashing, the address black/whitelist is not always OK, and AD user sync isn't always OK. But I should not have used opening a Google page as my only test during that time, and opening web pages via the Firefox Google search plugin was also a mistake.
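The lesson about not using Google as the only test can be sketched in a few lines of Python: probe the ISP's first hop, some unrelated site, and Google separately, then classify the result. This is only an illustration. It uses a TCP connect instead of a real ICMP ping (which needs raw sockets), and the host addresses in the usage comment are placeholders I made up.

```python
import socket


def tcp_probe(host, port=80, timeout_s=2.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def diagnose(gateway_ok, other_site_ok, google_ok):
    """Classify an outage from three independent reachability checks."""
    if not gateway_ok:
        return "local network or firewall"
    if not other_site_ok and not google_ok:
        return "ISP or upstream"
    if not google_ok:
        return "Google only"
    if not other_site_ok:
        return "other site only"
    return "all reachable"


# Usage (hypothetical hosts; substitute your own first hop and test site):
# print(diagnose(tcp_probe("192.0.2.1"),
#                tcp_probe("example.org"),
#                tcp_probe("www.google.com")))
```

With three independent checks, "everything but Google works" shows up as its own answer instead of looking like a firewall fault.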
And what's -really- the difference between a server going down locally that affects you and a server going down globally that affects you? Nothing.
The difference is that you don't have a global melt-down of every web-based service that is dependent on Google.
At work I experienced the issue, but I could remote in to my home computer and load gmail etc no problem...I live 3km from my office.
/. loaded slowly for me while the issue was occurring, and gmail was totally inaccessible.
Even
I was going to email you all about this earlier but I couldn't send email for some reason.
Are these growing pains or telltale signs that the Googlactic Empire is decaying?
Where's Hari Seldon when you need him?
Cloud Cloud cloud cloud? Cloud cloud "cloud cloud cloud?"
Cloud cloud cloud's cloud cloud cloud Cloud Computing cloud cloud cloud cloud cloud...
For those with a significant investment in the cloud, how do you plan for disaster recovery?
I've been running mail servers for companies AND myself since 1996. In that time, there's been 2 days total of downtime. That includes delays in access because my ISP forced an IP change and didn't tell me it was going to happen. It took a day to figure that out and get the DNS record updated when I was out of the country.
If you're running email servers and can't beat the poor outage record for google, just in the last year, that's really, really sad.
My biggest concern over google is the invasion of privacy. The more you use their services, the more they learn about you and your habits. Somewhere it was estimated that google earns $400/yr on our data. Do you gain $800/2 yrs in benefit from google tools?
Somebody must have typed "google" into Google. It's the only possible explanation.
...just the cloud.
Instead of 'all your data in the cloud', how about all your data on a portable device that you plug into a ROM-type device that provides basic screen, mouse, keyboard and Internet functionality? The only thing out there 'in the cloud' would be a set of servers providing identity and virtual location information. As in Skype where the server keeps a telephone directory but the communication is end-to-end. That means if one service fails I can fall back to the others.
"We purchased a 25K euros firewall last month with which we had some issues"
..
What for? All you needed was a redundant PC and SmoothWall, not that a firewall is much good in this day and age of RPC over HTTP and various apps allowed to open most any high port. Firewalls were only really useful when the original nix systems only allowed 'root' to open low ports for sending, so any packets received (nix-to-nix) from one of these ports were deemed semi-validated. Whatever, read what an expert has to say on firewalls and security.
"using firefox to type adresses in the search bar, nothing was responding"
Why not have a heartbeat applet running on the firewall that SMSes your phone in the event of an outage? That way you don't have to set up camp in the server room, clicking on things.
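A bare-bones version of such a heartbeat might look like the Python sketch below. It is only an outline under my own assumptions: `check` is whatever reachability test you trust, `notify` would wrap your SMS gateway of choice (none is implied here), and the three-failure threshold is arbitrary.

```python
import time


def heartbeat(check, notify, max_failures=3, interval_s=60.0,
              rounds=None, sleep=time.sleep):
    """Call check() every interval_s; notify once after max_failures in a row.

    rounds=None runs forever; a number limits the loop (handy for testing).
    """
    failures = 0
    alerted = False
    done = 0
    while rounds is None or done < rounds:
        if check():
            failures = 0
            alerted = False  # link recovered, re-arm the alert
        else:
            failures += 1
            if failures >= max_failures and not alerted:
                notify("link down after %d failed checks" % failures)
                alerted = True
        done += 1
        sleep(interval_s)


# Usage with stand-ins: a scripted sequence of check results and a list
# that collects the messages an SMS gateway would otherwise send.
results = iter([True, False, False, False, False, True, False])
sent = []
heartbeat(lambda: next(results), sent.append, rounds=7, sleep=lambda s: None)
print(sent)
```

Requiring several consecutive failures before alerting keeps one dropped packet from waking you up, and the alert re-arms only after the link has recovered.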
There is a cool graph on the outage from Wired and Arbor Networks.