Dark Day In the AWS Cloud: Big Name Sites Go Down
An outage of one company's servers might only affect that company's customers — but when a major data center for Amazon hits kinks, sites that rely on the AWS cloud services all suffer from the downtime. That's what happened today, when several major sites or online services (like Instagram and AirBnB) were knocked temporarily offline, evidently because of problems at an Amazon data center in Northern Virginia. From TechCrunch's coverage of the outage: "The deluge of tweets that accompanied the services’ initial hiccups first started at around 4 p.m. Eastern time, and only increased in intensity as users found they couldn’t share pictures of their food or their meticulously crafted video snippets. Some further poking around on Twitter and beyond revealed that some other services known to rely on AWS — Netflix, IFTTT, Heroku and Airbnb to name a few — have been experiencing similar issues today."
but I'd rather have a few strategically placed servers in datacenters spread around the country (world?) than something hosted on AWS.
How is it that AWS is less reliable than the 4 Windows machines I get stuck managing? One of which has had a failed CPU for a few years now ... yet it's still going.
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
I thought this might already exist, but I'm not finding it with a quick Google search. Seems like it's a thing that could get ad views from some decent IT audiences.
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
When morons can't watch TV (or equivalent) they fuck. 9 months later you'll see a birth rate spike.
Why do people even cloud ? Real dedicated overpowered servers with multiple Gbps pipes are available for a few hundred bucks these days...
That went down and I think it ate some files with it. Just before the crash my client reported 103 files being removed. They weren't by me.
Anonymous Cowards generally receive no replies because you're a coward and I'm a bitch
You can do it with AWS, no problem. Only one region was affected this time, other regions are OK.
Yeah, sure. Maybe in the Bay Area.
I thought for sure the first comment would be "I'm on to you, NSA... downtime for service 'upgrades'". I'm disappointed in you, my tin-foil-hat-wearing brethren.
I've run servers on both Amazon and Rackspace for several years now and I can't recall a single instance of Rackspace having an outage. On the other hand, Amazon seems to have major issues at least 2 or 3 times a year. Is this stuff tracked anywhere?
"Don't teach a man to fish, feed yourself. He's a grown man. Fishing's not that hard." - Ron Swanson
Maybe the NSA screwed things up a bit when they were installing their new codenamed program there (after Snowden published all the old ones).
Were there any problems with Amazon.com? You'd assume they use their own service.
Don't lose faith that easily.
What is going on? I don't buy it. While I get that you can't tell when the NSA has tapped the line, I would imagine that things might go down in such instances. Something has to go down before they cut the line, unless there are multiple entry points maybe. However, I wonder if this has something to do with the way things are being done. That is, they're not tapping the line any more. Rather, they are force-implementing taps that provide access to specific data types. For instance, now they can do more than just search for strings in users' email. Now they can see a user's Facebook page as the user sees it, for example, instead of just a series of texts.
I guess stuff goes down. But not at the rate at which major sites are going down. If there is a logical explanation of some kind that impacts everybody (sun bursts radiation type thing) then please... provide it. But they all seem to be giving vague answers about the reasons the sites have gone down. With eBay it was 'regular maintenance' gone amok (I do believe that was scheduled, but others haven't been, I don't think).
Yes. Rackspace even had an outage on their main website that lasted *days* just a few months ago, if you wanted to access it via IPv6. Sadly, there was no easy place to report the outage. The technical contact in whois is something at netnames.com? So I just ignored it.
Anyway,
https://status.rackspace.com/
lots of reports of small issues. You should know this stuff if you are running an instance on their hardware!!
RS has had issues
Would make a good site, a historic long term heat map of server outages. A lot of tech press to search back into, thankfully you can buy into digital press databases :)
Domestic spying is now "Benign Information Gathering"
Isn't this why AWS offers multiple regions?
Such large sites should understand that having multiple availability zones means nothing if the zones are all in the same region. Oh, and your application would need to be designed for failover.
In addition, when looking for high-availability, you don't segregate your audience to individual regions. You let the working regions take over for you.
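To make the "let the working regions take over" point concrete, here is a minimal sketch of region failover. It is only an illustration: the health table is made up, `pick_region` is a hypothetical helper, and a real setup would rely on DNS or load-balancer health checks rather than a dictionary lookup.

```python
from typing import Callable, Iterable, Optional


def pick_region(regions: Iterable[str], is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first region whose health check passes, or None if all fail."""
    for region in regions:
        if is_healthy(region):
            return region
    return None


# Hypothetical health state: the preferred region is down, the others are fine.
status = {"us-east-1": False, "us-west-2": True, "eu-west-1": True}
preferred = ["us-east-1", "us-west-2", "eu-west-1"]

# Traffic falls through to the next healthy region in preference order.
print(pick_region(preferred, lambda r: status.get(r, False)))  # prints us-west-2
```

The application still has to be able to serve from whichever region gets picked, which is the "designed for failover" part.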
Or spend the extra money and set up your own co-lo arrangement.
Kriston
partly cloudy with a chance for server outages.
Just had to power down while the NSA live feed was plugged in.
"I hereby renounce my allegiance to the United States Of America and for which it now stands."
You are the property of the Corporation called the UNITED STATES OF AMERICA and your allegiance is not yours to renounce. You are mortgaged property belonging to the Federal Government of the Corporation called the United States of America. You must do as you are told.
"cloud" is sold as a *convenient* way to compute, where it's quick to add resources when needed so you can start small and scale up (and down) with demand.
It is *not* generally considered a cheap or particularly reliable solution. So far at least none of the cloud providers are offering five nines--if you want that, you should (for now at least) be looking at enterprise/telecom gear.
As a cloud customer, reliability (currently at least) is up to you. If you want the extra reliability of running instances in multiple availability zones then it's up to you to pay for it.
The point of the cloud as it stands currently is not that it's cheap or reliable, but that it's easy to scale up/down with demand.
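As a back-of-the-envelope illustration of what paying for extra zones buys: if zones failed independently, the chance that all of them are down at once shrinks geometrically. The 99% per-zone figure below is made up, and the independence assumption is exactly what a region-wide outage breaks.

```python
def combined_availability(per_zone: float, zones: int) -> float:
    """Availability of a service that stays up while at least one zone is up.

    Assumes zone failures are independent, which a region-wide outage violates.
    """
    return 1.0 - (1.0 - per_zone) ** zones


# Hypothetical per-zone availability of 99%.
for n in (1, 2, 3):
    print(n, f"{combined_availability(0.99, n):.6f}")
```

Every extra zone is another set of instances on the bill, so the reliability is bought, not included.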
That things like this will happen with a cloud infrastructure is obvious. That the reliability claims made by the cloud providers are fantasy is also obvious. As soon as they start to do "uptime or else" (meaning you get tons of money as downtime compensation), things may be different. But they will not do that. At this time, the only thing you can do is change to a different cloud provider, which will have the same issues. Uptime guarantees without penalties for failing to meet them are worthless.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
It depends which data center you're in. PortableApps.com has been hosted at Rackspace for years and we had multiple major outages due to ongoing power issues in the Dallas data center in 2009. The switch from grid to UPS was failing and would take the whole wing of the data center out with every server crashing hard. It would take quite a while to come back up. Then we'd have to wait hours for the Rackspace folks to rebuild our corrupted database (fully managed account on a dedicated server). It happened two weekends in a row in June and one other time if I recall correctly, basically costing us a full day of downtime each time.
Portable versions of Firefox, GIMP, LibreOffice, etc
public cloud services as "the future". I will never risk my corporate data uptime and reliability to some "location in the cloud". I'll stick to private clouds (VMWare/VCenter) where I have control of both hardware and software and reliable failsafe systems. At least then if I have downtime I also have accountability and predictability. The same cannot be said for cloud providers, and no matter what anyone says, once the data leaves your hardware, you have lost that control.
Netcraft?
If you want news from today, you have to come back tomorrow.
Amazon also offers some of the cheapest prices, so you pay for what you get.
The only web site that I've noticed being down in the past few weeks has been Wikifonia, the wonderful place where crowd-sourced MusicXML lead sheets for all sorts of music are available.
They're back online now, and at least from what I can see, there is great jubilation among musicians worldwide. Where else can you go and search for some old jazz standard and get an immaculate lead sheet, instantly transposable into any key, downloadable as a PDF?
I think Wikifonia has been single-handedly keeping the vast Great American Songbook alive, for which they deserve great thanks.
I thought it was just an issue where some big music publishing group that represents outfits that charge $5 for a lead sheet to a song whose composer has been dead for half a century has been hassling them, but since it's back online and faster than ever, I think it might just have been a technical glitch.
Wikifonia, salute!~
You are welcome on my lawn.
Shouldn't this, technically speaking, be a "bright day" or a "sunny day"? After all, that's what I call it when the cloud-coverage breaks around here.
I had an outage in IAD just two weeks ago. Connectivity failure on several aggregates affecting many customers. Rackspace shill much?
For believing and investing in some handwavy concept called 'cloud' where you abrogate responsibility and take the iOS view (it Just Works) of technology.
I want to delete my account but Slashdot doesn't allow it.
Depends on which "future" you are talking about. The future where the bulk of personal data is stored on the cloud to be shared across devices and with friends, family and authorized services is one I think is bound to come to fruition.
The future where Corporations put their core infrastructure into the Cloud is not one I ever recall anyone talking about.
how i hate it. rebuild my local lan vm server and on the other side of the world aws craps out ... so either quantum entanglement or a tried frame from nsa?
Chances are that there are no providers that offer a true 99.999% uptime. If you demand that, you need to be building your code to run in a HA cluster with nationwide dispersion. (For reference, five nines allows you about 5.26 minutes of downtime across a whole year.)
99.999% uptime is also completely unnecessary, but sounds really good to management until you talk cost.
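For anyone who wants to check the five-nines arithmetic, here is a quick sketch that converts an availability target into a yearly downtime budget. It assumes a 365-day year; the function name is just for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Minutes of allowed downtime per year for a given availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


for label, a in [("99.9%", 0.999), ("99.99%", 0.9999), ("99.999%", 0.99999)]:
    print(f"{label}: {downtime_minutes_per_year(a):.2f} minutes/year")
```

Each extra nine cuts the budget by a factor of ten, which is why the cost conversation with management gets uncomfortable quickly.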
If people can connect to one another even the smallest of voices will grow loud.
--Serial Experiments Lain
And unless you're a very large company, this will be wasted money. And it'll be less reliable than properly-designed applications using Amazon's infrastructure for cheaper.
Now, if you're a bank, and you're putting your critical customer data up on Amazon, that's probably pretty dumb. But there's a lot of data that's not "critically sensitive" like that.
The right wing talking heads on TV would be squealing like stuck pigs. They would be screaming about "gubment" waste and incompetence, and start floating bills to privatize the FAA (or whomever). You'd get the same response on Slashdot as well.
Meanwhile in real life AWS, Google, and NASDAQ have all had dramatic failures in recent weeks. Although NASDAQ got a fair amount of coverage, and Google got some mention, AWS has been pretty much below the radar for the mainstream media. No one is making dramatic statements on TV about how Google is run by a bunch of idiots, or how NASDAQ, a quasi-governmental entity, should be nationalized because when it fails the entire economy is at risk. As far as critical comments go, it's the sound of crickets.
Clearly, there is a double standard. When there are problems with technology in the public sector, it's all hostility and table thumping. Similar failures in the private sector are treated like natural disasters completely beyond human control. According to common rhetoric, the private sector is always better than the public sector. Yet when the private sector fails, no one ever compares it to the well-functioning public sector.
There is clearly a lot of hypocrisy in bashing the government. A lot of political power is at stake, and along with that goes a lot of money. This situation makes some people very happy, because they are getting what they want, both in public policy and private profit.
Why is Snark Required?
and the operators realizing this tried to deactivate it?
I've had tons of outages on rackspace cloud. Including systemwide networking outages.
AWS US-East is overloaded. It will continue to be overloaded as long as US-East is the cheapest region, because people are idiots. Here's an idea: RAISE THE PRICE OF US-EAST.
Poorly designed apps that depend on instances rather than redundant zones suck regardless of where they are hosted.
I bitched about this today. There is a right and a wrong way to write a cloud application. Blaming the Cloud for an outage is like blaming the road for a car wreck.
Thank god the cloud-fanboys and cloud-haters have already kicked in to prevent any sane discussion.
Netflix did NOT go down.
I was a former Slicehost user in the St. Louis data center and then was moved to Chicago after the Rackspace acquisition. Even so, there's never been so much as a blip from there in the last 5 years. Probably is data center dependent, I just never remember hearing about anything.
Friend of mine here in town owns a web business using about 9 Rackspace servers to host 700 websites and he said they hadn't had an outage in the last 8 years.
I've seen some weird error messages mentioning Google Groups today when sending emails in gmail to addresses, afaik not even remotely related to such a group...
No he went to Russia so he wouldn't have to share a cell with Bradley Manning.
I only look human.
My mother is a halfling and my dad is an ogre, so that makes me an Ogreling
entertainment and social media down? who gives a shit? grow up.
I have to say with all of the big names having problems recently this has been one of the best weeks ever for the lowly corporate sys admin. Now if the company's email, file or web server--or even the coffee machine goes down, they can point to the big names that also have problems. It's great to be able to say that even at companies like Amazon, Google or Microsoft with all of their talents their servers also have problems. It's the greatest excuse ever for tripping over the power cord. And if that doesn't work, you can always blame the NSA for the typo in your email or the late TPS reports.
Thanks everyone and happy SysAdmin day! (which isn't today, but due to the unexpected outage is running late)
"no need for capital expenditure" and "minimal start-up costs" are not the same as "cheap". All it means is that you don't need to pay up-front.
It's like renting a car for a day vs buying one. If you only need a car a few times a year, renting is cheap. If you need a car every day for a decade, you should probably buy one.
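The rent-versus-buy point comes down to a break-even calculation. A rough sketch, with every number invented purely for illustration and `break_even_days` a hypothetical helper:

```python
import math


def break_even_days(purchase_price: float, own_cost_per_day: float, rent_per_day: float) -> int:
    """Smallest number of usage days at which owning costs no more than renting."""
    if rent_per_day <= own_cost_per_day:
        raise ValueError("renting is never more expensive per day; owning never pays off")
    return math.ceil(purchase_price / (rent_per_day - own_cost_per_day))


# All figures are made up: a 20,000 purchase, 10/day to run it, 60/day to rent.
print(break_even_days(purchase_price=20_000, own_cost_per_day=10, rent_per_day=60))  # prints 400
```

Below the break-even usage, renting wins; above it, the up-front purchase does. The same shape applies to on-demand instances versus owned hardware.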
There are some things where five-nines makes sense.
Disclaimer...I have worked in the telecom industry in the past.
Is that recursion or just self documenting?
The issue is that when people sing the praises of EC2, they always seem to imply people mostly know better and have moved past the need for reliability at the lower level. However, events like these *repeatedly* show that some of the biggest names flame out multiple times a year. This suggests that while the *theory* may be there, there aren't many good examples in practice.
Netflix is the example that frustrates me the most. They brag about how bullet proof their services are because they are so smart, to the point of intentionally killing random instances in production to verify to themselves they are still bullet proof. However, they have significant outages and streams randomly do fail out. The shining example that 99% of people hold up as to why the 'cloud model' of disposable VMs is totally worth it and solid fails far more often than typical schemes. All this while EC2 has been able to take advantage of brand recognition and actually charges *more* for less reliable infrastructure than some other hosting providers.
I thought one of the pros of using "the cloud" was that these events would not happen, as you are no longer relying on a single datacenter?
If AWS is putting several "clouds" in a single datacenter, what's the use of AWS?
/ The Arrow
"How lovely you are. So lovely in my straightjacket..." - Nny
"The future where Corporations put their core infrastructure into the Cloud is not one I ever recall anyone talking about."
Microsoft tried to push that very concept during the Windows 2000 launch tour, back in 1999. At the presentation I went to in Los Angeles, the audience, ~1000 professional IT types, all developed identical angry scowls.
~REZ~ #43301. Who'd fake being me anyway?