Amazon EC2 May Be Experiencing Growing Pains
1sockchuck writes "Some developers using Amazon EC2 are wondering aloud whether the popularity of the cloud computing service is beginning to affect its performance. Amazon this week denied speculation that it was experiencing capacity problems after a veteran developer reported performance issues and suggested that EC2 might be oversubscribed. Meanwhile, a cloud monitoring service published charts showing increased latency on EC2 in recent weeks. The reports follow an incident over the holidays in which a DDoS on a DNS provider slowed Amazon's retail and cloud operations."
Why not say "Yes, we're way too popular. We're adding capacity as quickly as we can, but people are just lapping up our service!"
This seems like a missed marketing opportunity.
The ______ Agenda
When the news came around for EC2's DDoS around Christmas, I remembered reading how Amazon began offering their services to third parties in the first place. Turns out Amazon has a sudden peak of traffic around shopping holidays and particularly Christmas.
To prepare for that, they added enough hardware to handle the peak, but that hardware went unused the rest of the year. So they started leasing it to third parties in the form of their web services.
This immediately makes you think, ok, what happens to their ability to handle the third party apps around Christmas, when they need a lot more hardware to handle Amazon.com's traffic itself? And then this DDoS happened, which importantly overloaded not the actual app servers, but the DNS servers pointing to the app servers. So as a result the app servers experienced lower traffic for third party sites than they would have otherwise.
It's making me think, and this is of course just speculation, that this may not have been a genuine attack so much as a stunt to lessen the overload of their cloud services they knew they'd experience around Christmas, while having a plausible explanation for the downtime that blames it on a malicious third party.
Reading that they have indeed had (and still have) performance issues supports that speculation.
Amazon needs to move their cloud into space. Yes, space! It's the next big frontier beyond clouds, and you heard it here first.
You mean, virtualize their EC2 server using EC2?
That's virtualization all the way down.
Wouldn't want any of my important data stored on a system which has performance issues...
Or having to wait significantly longer than I would storing my data locally!
i priced out a high memory config and it's like $6000 per year or more for 32GB of RAM and 8 CPU cores. In a few months Intel will ship server CPUs with 12 logical cores per socket. RAM prices are dirt cheap and at current prices a 36GB RAM HP Proliant DL 380 G6 will run around $13,000 and 72GB of RAM another $2000. and that includes 5 year 4 hour response time support, some of the other extras like advanced ilo, and i forgot what else i added since it's so cheap.
add in the increased bandwidth costs and the supposed cost savings vanish. it's like the ghetto people that lease a lexus or a Benz because they can't afford to buy or they like the lower monthly payments. it's like 2000 all over again. hardware is expensive so ASPs set up shop. hardware prices drop for the power you get and ASPs go out of business.
and i think this is a scam by the hardware companies. i buy an HP server i buy one machine and a few hard drives. to support me Amazon needs to buy a few servers and 5 times the raw space for DR purposes.
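For anyone who wants to check the arithmetic in the comment above, here is a rough sketch. It uses only the two figures quoted there ($6000 per year for the EC2 high-memory config, about $13,000 for the 36GB DL 380 G6 with 5 years of support). The function names are made up for illustration, and power, colocation, bandwidth, and admin time are deliberately left out, so treat the result as a back-of-the-envelope number rather than a real TCO comparison.

```python
# Figures quoted in the comment; everything else here is an assumption.
EC2_COST_PER_YEAR = 6000       # "$6000 per year or more" for the high memory config
SERVER_PURCHASE_COST = 13000   # 36GB DL 380 G6, price includes 5 year support
SUPPORT_YEARS = 5              # length of the bundled support contract


def cumulative_ec2_cost(years):
    """Total EC2 spend after the given number of years at the quoted rate."""
    return EC2_COST_PER_YEAR * years


def break_even_years():
    """Years of EC2 rental that cost as much as buying the server outright."""
    return SERVER_PURCHASE_COST / EC2_COST_PER_YEAR


if __name__ == "__main__":
    print(f"Break-even after about {break_even_years():.2f} years")
    print(f"EC2 spend over the support period: ${cumulative_ec2_cost(SUPPORT_YEARS)}")
```

On those numbers alone the purchased server pays for itself a little after two years. Hosting costs for the owned box would push that point later, and the bandwidth costs the poster mentions would pull it earlier.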
Uh... How else do you think they make money?
We use EC2 as the back-end for Netalyzr (our free, applet-based network testing and debugging service), and right now are in the middle of a minor flashcrowd with our big updated release. No recent glitches we've noticed, with long running small instances.
Test your net with Netalyzr
Or like the insurance industry, where the insurance companies take out insurance with re-insurance companies, against getting too many claims.
Or like the mortgage industry...
Seriously, I think Amazon and Google intend to be the end of the chain. They don't want to buy computing services from a third party. It seems like they need to invest in affordable "idle" capacity to deal with peaks. That spare capacity needs to be economical when it's not needed. Either it can be cheap to keep mothballed (whole powered-down data centres on cheap land?) or it can be working on profitable batch computing tasks, that customers don't mind having paused when the capacity is needed for real-time work.
Seriously, I think Amazon and Google intend to be the end of the chain. They don't want to buy computing services from a third party.
They may want to, and it might be reasonable. One reason for them wanting to do this is that they are so far the top dogs in the fight and their buying from smaller players would not make economic sense. They cannot buy from each other because they have very different models - Google's "cloud" services are much more restricted than Amazon's.
But if/when more players come into this field, it might make sense for them to buy computing resources from each other. Both buyer and seller would gain. The seller gets to earn from his idle resources - these earnings would be non-zero but less than if he were selling to an end customer. The buyer, of course, avoids disappointing his customers and saves face.
Though there might always be some cloud service providers who will not buy/sell. This does not mean there is no value in cloud guys trading with each other.
Bingo Dictionary - Pragmatist, n. A myopic idealist.
I seem to recall a post on Slashdot, "Amazon Introduces Bidding For EC2 Compute Time". That announcement took place on 12/14/2009, which coincides with the increase in average ping latency illustrated in Cloudkick's chart. Was Amazon unprepared for the increase in demand created by auctioning off the unused EC2 capacity?
I am sure that people came up with some pretty creative things to do with low priced EC2 capacity.
Trinity Rescue Kit is a network boot/CD boot linux that reads and writes NTFS etc.
We use it here to image and deimage windows systems, it takes ~10 minutes boot-to-boot to bring up a raw windows system in a known state.
I keep a small reserved instance running 24x7 and the cost is very low. I also have an EBS bootable large instance that I run for a few hours at a time as needed. It has been a while since I used it, but Elastic MapReduce also works well and is fairly inexpensive for what you get.
About half of my customers also use EC2s.
(Note: Amazon gave me a large grant to use EC2 for free for work on my last book, but my comments are my honest opinions.)
G6 Proliant servers start at $2000 for the low end model and scale up to 144GB RAM. the hardware is so scalable today it's insane. we just buy a lot of RAM because the difference is only a few hundred $$$ and it saves me from a late night hardware upgrade. in fact what i do is max out using the least dense RAM we can afford. RAM is dropping in price by 50% a year so if we need more RAM we buy the more dense RAM next year at the same price and use the existing RAM in another server where we were really financially strapped to buy it as cheap as possible.
we even have a "crap" box of RAM lying around with 30GB or more in there at any one time that we go to for a quick upgrade of a server from a few years ago when RAM cost a lot more. i just upgraded a server from 8GB to 16GB RAM that was bought 2 years ago on a tight budget. RAM is cheap and there is always a ton of extra around.
in the end it comes down to having a lot of expensive hardware sitting around not doing anything 90% of the time and crazy support costs being paid. the only question with the cloud nonsense is who has this hardware? the big companies like EMC, HP, Dell, Seagate and Cisco want Amazon to have it since they know Amazon will probably pay up the support costs and will always buy extra for growth. where a smaller shop will buy what they need and upgrade later. and not pay the insane precious metals support costs all the sales people like to push with contract clauses that say we don't really guarantee you this level of service if the parts aren't in stock.
having a lot of expensive hardware sitting around not doing anything 90% of the time
The solution to this is finding saleable ways to use this spare capacity. That's what Amazon's Spot Instances plan addresses. Essentially, you set up an image, and ask Amazon to run it at a given price. During off-peak hours, when the capacity is available, your image comes up. If the capacity is unavailable, and someone's outbid you, your image comes down again.
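A toy model of the bidding behaviour described above, in case it helps. It assumes the simplest possible rule: the image is up in any hour where your bid is at least the going spot price, and down otherwise. The hourly price series and the function names are hypothetical, and nothing here reflects Amazon's actual pricing or billing rules.

```python
def running_hours(bid, spot_prices):
    """For each hour, True if the bid meets or beats that hour's spot price."""
    return [bid >= price for price in spot_prices]


def transitions(states):
    """List (hour, event) pairs for each time the image comes up or goes down."""
    events = []
    previous = False  # the image starts out not running
    for hour, up in enumerate(states):
        if up and not previous:
            events.append((hour, "start"))
        elif previous and not up:
            events.append((hour, "stop"))
        previous = up
    return events


if __name__ == "__main__":
    # Hypothetical spot prices in dollars per hour: cheap off-peak, pricier at peak.
    prices = [0.03, 0.05, 0.08, 0.04]
    states = running_hours(0.05, prices)
    for hour, event in transitions(states):
        print(f"hour {hour}: {event}")
```

The point of the sketch is that your job has to tolerate being stopped whenever the price rises past your bid, which is why this suits batch work better than web hosting.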
(Remember, web hosting is not the only thing computers can do)
The cloud providers will have growing pains for years to come. However, cloud is a much better choice than the overhead of building and running your own data center.
But if individual (small) business owners own all the hardware, overall more hardware gets sold. So Intel, AMD, Samsung, Kingston etc. companies that earn more from hardware than from support don't want Amazon to own all the hardware.
Also, Amazon knows which support services are worthless. They may not be using expensive EMC storage solutions, and may instead be going with consumer grade hard disks like Google does. So EMC may not be happy with Amazon owning lots of hardware.
Amazon also does not pay Microsoft for virtualization solutions (it uses modified Xen virtualization, last I heard). Small businesses are much more likely to buy everything from Microsoft, including virtualization solutions, which Microsoft is not the best at. So Microsoft also may be unhappy with Amazon owning the hardware.
Google's App Engine is so over-sold it can take 20 seconds for a page to load.
If your app takes 20 seconds to deliver a page then it's your fault. My app consistently delivers dynamic, multi-hundred kilobyte pages in 1-2 seconds anywhere in the US. See for yourself! www.TwitGrids.com
If you code for App Engine like it's Rails or Django your app will be a dog.
"Liechtenstein is the world's largest producer of sausage casings, potassium storage units, and false teeth."
Amazon's instance types (http://aws.amazon.com/ec2/instance-types/) don't seem to indicate the number of cycles/sec you are guaranteed per type.
They sell instance types based on the physical hardware specs, which is worthless in a cloud architecture.
What they should really be doing is stating the number of cycles/sec each instance type is GUARANTEED and then enforcing it.
If the customer doesn't use that number of cycles/sec, then fine, put the idle cycles up for bidding.
Just my $0.02
Ben