Why Auto-Scaling In the Cloud Is a Bad Idea

I don't think so by Yvan256 · 2008-12-06 10:49 · Score: 5, Funny

I think auto-scaling the clouds based on actual demand is a really great idea. I think farmers would really like that feature, in fact.

Wait, what clouds?!

Re:I don't think so by ZarathustraDK · 2008-12-06 11:28 · Score: 3, Funny

Wait, what clouds?!
Cumulo-mumbo-jumbo-nimbus clouds maybe?

--
If you quote this signature there'll be 72 copies of Windows ME waiting for you in Heaven.
Re:I don't think so by larry+bagina · 2008-12-06 12:31 · Score: 3, Funny

Could be a script that logs in and then posts anonymously. That's what I'd do.
Disclaimer -- I didn't do that.

--
Do you even lift?
These aren't the 'roids you're looking for.

Like cellphones by Tablizer · 2008-12-06 10:50 · Score: 5, Insightful

Without a hard-limit, some people run up big cell-phone bills. If you are forced to stop and plan and budget when you exceed resources, then you have better control over them. Cloud companies will likely not make metering very easy or cheap because they *want* you to get carried away.

--
Table-ized A.I.

Re:Like cellphones by Cylix · 2008-12-06 13:09 · Score: 2, Insightful

Actually, metering is cheap and easy, simply because they *need* to meter your traffic. Companies with infrastructure requirements and not a great deal of dumb users will generally have to be honest to keep your business.
Loyalty is based on performance and meeting customer expectations.
Phone companies get away with this crap because they are either a monopoly or engage in lengthy customer lock in. It also doesn't help that it is pretty much the norm to nickel and dime the customer.
EC2 and other retail outlets you can simply walk away from if you are unhappy. I'm assuming other cloud operators work in a similar fashion.

--
"You should always go to other people's funerals; otherwise, they won't come to yours." -- Yogi Berra
Re:Like cellphones by lysergic.acid · 2008-12-06 14:33 · Score: 4, Interesting

i think the author's point is that dynamic scaling should always be planned; partly because it results in better understanding of traffic patterns, and thus better long-term capacity planning, and partly because you need to be able to distinguish between valid traffic and DDoS attacks. still, i think the author is overstating it a bit. one of the main draws of cloud computing to smaller businesses is the ability to pool resources more efficiently through multitenancy, part of which is precisely due to auto-scaling. without the cloud being able to dynamically allocate resources to different applications as needed in real-time (i.e. without human intervention), there isn't much of an advantage to sharing a cloud infrastructure over leasing dedicated servers.
for instance, let's say there are 10 different startups with similar hosting needs, and they can each afford to lease 10 application servers on their own. so using traditional hosting models they would each lease 10 servers and balance the load between them. but after a few months they realize that 75% of the time they only really need 5 servers, and 20% of the time they need all 10, but an occasional 5% of the time they need more than 10 servers to adequately handle their user traffic. this means that in their current arrangement, they're wasting money on more computing resources than they actually need most of the time, and yet they still have service availability issues during peak loads 5% of the time (that's over 2.5 weeks a year).
all 10 of these startups share a common problem--they each have variable/fluctuating traffic loads severely reducing server utilization & efficiency. luckily, cloud computing allows them to pool their resources together. since the majority of the time each startup needs only 5 servers, the minimum number of virtual servers their cloud infrastructure needs is 50. and since each startup needs double that 20% of the time, 10 extra virtual servers are needed (shared through auto-scaling). but since each startup needs more than 10 servers for about 2.5 weeks each year, we'll add another 15 extra virtual servers. so all in total, the 10 startups are now sharing the equivalent of 75 servers in their cloud.
by hosting their applications together on a cloud network, each startup not only has their hosting needs better met, but they also stand to save a lot of money because of better server utilization. and each startup now has access to up to 30 virtual servers when their application requires it. this kind of efficiency would not be possible without a cloud infrastructure and auto-scaling.
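
The arithmetic in this example can be checked with a short sketch. The server counts are the commenter's own figures; the function names are invented for illustration, and the two shared-pool sizes (10 and 15) are the commenter's estimates rather than derived quantities. The sketch also assumes the startups' peaks do not coincide, which a later reply in this thread disputes.

```python
# Worked version of the 10-startup example. Every number below is taken
# from the comment; only the names are made up for this sketch.
STARTUPS = 10
DEDICATED_PER_STARTUP = 10   # servers each startup would lease on its own
BASELINE_PER_STARTUP = 5     # what each startup needs 75% of the time
SHARED_FOR_20_PERCENT = 10   # pooled extras for the "need 10" periods (estimate)
SHARED_FOR_5_PERCENT = 15    # pooled extras for the "need more than 10" periods (estimate)


def dedicated_total():
    """Servers leased in total when every startup hosts separately."""
    return STARTUPS * DEDICATED_PER_STARTUP


def pooled_total():
    """Servers in the shared cloud: all baselines plus the shared extras."""
    return (STARTUPS * BASELINE_PER_STARTUP
            + SHARED_FOR_20_PERCENT
            + SHARED_FOR_5_PERCENT)


def max_burst_per_startup():
    """One startup's baseline plus the whole shared pool, if nobody else is bursting."""
    return BASELINE_PER_STARTUP + SHARED_FOR_20_PERCENT + SHARED_FOR_5_PERCENT


def weeks_per_year(fraction_of_time):
    """Convert a fraction of the year into weeks."""
    return fraction_of_time * 52
```

Under these assumptions the pool is 75 servers instead of 100, each startup can burst to 30, and 5% of a year works out to about 2.6 weeks.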
Re:Like cellphones by enovikoff · 2008-12-07 06:26 · Score: 2, Interesting

As a cloud computing provider, I actually have no interest in having my customers suddenly run up huge bills. The reason is that as the article said, something is most likely wrong somewhere, which means that as their services provider, I'll also be responsible for figuring it out :) I can't speak for Amazon, which has a more hands-off model, but my success is invested in the success of my customers, so I won't sit idly by while they waste their money. However, looking at my company's balance sheet, we make our money off of base load, not peaks. Unlike what one of the other posters said, we can't average the peak load across customers since most customers have peaks at the same time, so accommodating peak load is more of a money-losing proposition since the bulk of those servers lie idle much of the day. In a truly round-the-clock (geographically distributed) cloud operation, this might be true, but even Amazon, which makes you choose the continent you want to run your cloud servers in, still has to hold a lot of reserve capacity (which is built into their rates) to accommodate the usual twice-daily peak loads. For web sites, a peak load that is many times the base load usually indicates something is wrong with the business model as well as the software, since SaaS providers also can't make any money off short bursts of usage. In many cases, peaks that last less than the provisioning time of a new instance (which is typically no less than a few minutes because of the time to load the instance's memory from storage) have to be handled differently anyway, either with more base allocation or for example with a queue of work to be done and notifications to customers when that work is completed.

Author makes some valid points, but... by Anonymous Coward · 2008-12-06 10:54 · Score: 4, Insightful

The author states that one reason he doesn't like autoscaling is because it can take a while to take effect. That's bad technology, waiting for someone to come along and improve it.

He also says he doesn't like autoscaling even with limiters. Autoscaling with limiters makes sense to me, especially if the limits are things along the line of 'don't spend more than XXX over time Y'.
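
A limiter of the "don't spend more than XXX over time Y" kind could look roughly like the sketch below. It is not tied to any provider's API: `SpendLimiter`, its parameters, and the idea of charging a fixed cost per scale-up are all assumptions made for illustration.

```python
from collections import deque


class SpendLimiter:
    """Permit scale-ups only while spend inside a sliding time window stays under a cap."""

    def __init__(self, max_spend, window_seconds):
        self.max_spend = max_spend            # the "XXX" in the comment
        self.window_seconds = window_seconds  # the "time Y" in the comment
        self._charges = deque()               # (timestamp, cost), oldest first

    def _expire(self, now):
        # Drop charges that have aged out of the window.
        while self._charges and now - self._charges[0][0] >= self.window_seconds:
            self._charges.popleft()

    def spent(self, now):
        """Total spend still inside the window at time `now`."""
        self._expire(now)
        return sum(cost for _, cost in self._charges)

    def try_scale_up(self, now, cost):
        """Record the charge and return True if it fits the budget, else return False."""
        if self.spent(now) + cost > self.max_spend:
            return False
        self._charges.append((now, cost))
        return True
```

A refused scale-up is the natural place to page a human, so the limit turns a surprise bill into a decision someone makes on purpose.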

Finally, not using autoscaling because you might get DDoS'd is just stupid. You lose business/visitors. That's worse than paying more to avoid being taken down, because your reputation gets hurt AS WELL AS losing you business.

Re:Author makes some valid points, but... by narcberry · 2008-12-06 12:51 · Score: 2, Insightful

He complains that 10 minutes for a computer to scale is too slow, then states

Auto-scaling cannot differentiate between valid traffic and non-sense. You can. If your environment is experiencing a sudden, unexpected spike in activity, the appropriate approach is to have minimal auto-scaling with governors in place, receive a notification from your cloud infrastructure management tools, then determine what the best way to respond is going forward.
It's 4pm on a Saturday, and your site is getting hit hard. Rally the troops, call a meeting, decide the proper action, call Fedex to ship you more infrastructure, deploy new hardware, profit from your new customers, all the while laughing at the fools who waited 10 minutes for their cloud to auto-scale.

--
Modding me -1 troll doesn't make me wrong.
Re:Author makes some valid points, but... by narcberry · 2008-12-06 13:14 · Score: 2, Funny

Oh right, Al Gore internet rule number 1. Internet closes on weekends. Only hackers can visit sites, and only with malicious intent.

--
Modding me -1 troll doesn't make me wrong.
Re:Author makes some valid points, but... by Cylix · 2008-12-06 13:16 · Score: 2, Interesting

His complaint with auto-scaling was that if the org is doing their proverbial homework and planning for additional capacity, then they should not need it.
There are times when traffic boosts come as a bit of a surprise. However, depending on size and free capacity some bumps should be able to smooth out.
Another trick is to have the means to scale some functionality down to allow for additional traffic. Slashdot for instance used to flip to a static front page when traffic was insane.
Personally, a very limited automatic scale to meet a few percentage points might not be a bad idea at least to create additional buffer for increased reaction time. Still, alarms should sound and I would think of this as a fall back option.
All in all, I rather agree with his sentiment. Don't be lazy and don't waste cash.

--
"You should always go to other people's funerals; otherwise, they won't come to yours." -- Yogi Berra
Re:Author makes some valid points, but... by cecil_turtle · 2008-12-07 03:54 · Score: 2, Interesting

Of course, you can only do this if you know you're under attack, and if your infrastructure is set to autoscale, you probably won't know. Until you receive the bill.
Yes because if you happen to use some sort of auto-scaling system, be it at the cloud level or your own management system, it's very likely that you never thought to put in the same monitoring and alerting systems that you already had on your non-cloud, non-autoscaling systems thus ensuring that you will be blindsided by the scenario you just laid out.

Or, you have more than two brain cells to rub together and you already had all of that in place and just pointed it to the auto-scaling cloud system enabling you to react the same way, except without the downtime in the middle.

Want to be hip /.? by Daimanta · 2008-12-06 11:00 · Score: 5, Funny

The blogosphere has disagreed with the use of web2.0 in the cloud. Sure, we all know that data is king and that's why we use software as a service nowadays with the web as a platform using AJAX and RSS extensively. This has helped to solve the challenge of findability since lightweight companies helps to connect user needs. The fact is that the long tail is part of the paradigm of user as co-developers in server wiki-like sites. Unfortunately this brings up the problem of ownership of user generated content. But I think that perpetual betas help the architecture of participation to stimulate web2.0. Interaction does make the experience good.

--
Knowledge is power. Knowledge shared is power lost.

Re:Want to be hip /.? by Atriqus · 2008-12-06 12:22 · Score: 3, Funny

Bingo!

--
Hey, look! It's Bono's brother.
Re:Want to be hip /.? by pdbaby · 2008-12-06 13:09 · Score: 2

It'd be funny if it wasn't so true :-(

--
Global symbol "$deity" requires explicit package name at line 2. - If only $scripture started "use strict;"
Re:Want to be hip /.? by FredFredrickson · 2008-12-06 14:53 · Score: 2, Funny

Perfect, you've written my next proposal for my boss. Woot! He'll love it.

--
Belief? Hope? Preference? The Existential Vortex

Get Off My Lawn! by chill · 2008-12-06 11:10 · Score: 5, Funny

Someone get this guy a cane to shake at the whipper-snappers. "In my day, you learned proper capacity planning or you didn't enter the data center!"

It can take up to 10 minutes for your EC2 instances to launch. That's 10 minutes between when your cloud infrastructure management tool detects the need for extra capacity and the time when that capacity is actually available. That's 10 minutes of impaired performance for your customers (or perhaps even 10 minutes of downtime).

Like, you could do it so much faster than 10 minutes without auto-scaling. Bah! If you've read The Art of Capacity Planning you would've mailed in the coupon for the free crystal ball and seen this coming!

Properly used, automation is a good thing. Blindly relying on it will get you burned, but to totally dismiss it out of hand is foolish.

--
Learning HOW to think is more important than learning WHAT to think.

Re:Get Off My Lawn! by VoidEngineer · 2008-12-06 12:05 · Score: 4, Insightful

Properly used, automation is a good thing. Blindly relying on it will get you burned, but to totally dismiss it out of hand is foolish.

First Rule of Automation: Automation applied to an efficient task increases its efficiency; likewise, automation applied to an inefficient task will simply increase the problem until it's an all out clusterfuck.

Second Rule of Automation: Automation applied to an effective task will be effective; likewise, automation applied to an ineffective task will still be a pointless waste of time.

Or something like that. My eloquence appears to be -1 today.
Re:Get Off My Lawn! by Aladrin · 2008-12-06 14:26 · Score: 2, Insightful

And in addition, if that capacity is needed on my current servers (which aren't all cloud-y), how long does it take to scale up? I have to order a new server, install an OS, configure it, install all the software I need, test it, carefully roll it out.
Can I do that in 10 minutes? Not a chance! If I did that in 10 hours it would be a miracle. 10 days is a lot closer to reality, for a true rush job.

--
"If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM
Re:Get Off My Lawn! by TubeSteak · 2008-12-06 14:29 · Score: 2, Interesting

First Rule of Automation: Automation applied to an efficient task increases its efficiency; likewise, automation applied to an inefficient task will simply increase the problem until it's an all out clusterfuck.

Last time I checked, most sites that get slashdotted are either some shiatty shared hosting or a dynamic page.
Static pages & CoralCDN would keep a lot of websites from getting hammered off the internet.

--
[Fuck Beta]
o0t!
Re:Get Off My Lawn! by nine-times · 2008-12-06 16:27 · Score: 2, Interesting
Yeah, it seems like his argument really comes down to a couple points:
- Auto-scaling isn't fast enough: Apparently EC2 doesn't react quickly enough. To me, this seems to be a technical question as to whether auto-scaling can be designed to be reactive enough to be practical, and not necessarily an insurmountable problem with the concept of auto-scaling.
- Auto-scaling might incur unexpected costs: The basic idea here is that, if you're paying a certain amount per measurement of capacity and it scales automatically, then your costs scale automatically too. This seems more like a contractual issue with your "cloud" service provider than an insurmountable problem with the concept of auto-scaling.
So if someone offered a service where auto-scaling was fast, and there was some kind of limits on what you could be charged under what sorts of situations, would he still have a problem with auto-scaling? I was expecting something a little more absolute, like "there's a definite trade-off between security and accessibility", but it seems like he's saying something more like, "Right now there's no service that is offering auto-scaling services that are good enough."
Re:Get Off My Lawn! by initialE · 2008-12-06 21:25 · Score: 3, Funny

The second rule of automation is you do not talk about automation.

--
Starbucks, Harbuckle of Breath.
Re:Get Off My Lawn! by SanityInAnarchy · 2008-12-07 10:33 · Score: 2, Interesting

there would be various triggers of "if capacity exceeds A in time frame B, someone gets emailed/paged and is given the opportunity to override."
Point is, the overriding should probably happen after the system has attempted to auto-scale.
For instance, if I got Slashdotted, I'd probably want to scale to handle the load. If I have to be called in to make a decision before any scaling happens, I've probably missed an opportunity. On the other hand, if I've set reasonable limits, I then have the choice to relax some of those limits, or to decide I can't afford surviving Slashdot this time (or maybe realize it's a DDOS and not Slashdot), and pulling the plug -- but an hour's worth of extra capacity shouldn't kill me.
Of course, that all depends on what kind of site you're running. Some sites might rather be taken completely down by a Slashdotting than spend too much on hosting.
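
The "scale first, then let a human override" flow described here can be sketched as a single decision step. The function name, its parameters, and the fixed per-server capacity are hypothetical; a real system would measure load and capacity rather than take them as arguments.

```python
import math


def autoscale_step(current_servers, load, per_server_capacity, hard_limit):
    """Decide one scaling step.

    Returns (new_server_count, notify_human, hit_limit):
    - scale immediately to meet the load, but never past hard_limit;
    - notify a human whenever capacity was added or the limit was hit,
      so they can relax the limit or pull the plug after the fact.
    """
    needed = max(1, math.ceil(load / per_server_capacity))
    hit_limit = needed > hard_limit
    new_count = min(needed, hard_limit)
    notify_human = new_count > current_servers or hit_limit
    return new_count, notify_human, hit_limit
```

With a limit of 10 servers at 100 requests each, a load of 900 scales to 9 and sends a notification, while a load of 2500 stops at 10 and flags that the limit was reached.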

--
Don't thank God, thank a doctor!

Auto-rooting? by Gothmolly · 2008-12-06 11:15 · Score: 3, Funny

So I hand over my business logic and data to a third party, who may or may not meet a promised SLA, and whose security I cannot verify? Does this mean I can be rooted and lose my customer data faster, and at a rate proportional to the hack attempts? Cool!

--
I want to delete my account but Slashdot doesn't allow it.

Re:Auto-rooting? by Eskarel · 2008-12-06 15:36 · Score: 3, Interesting

Well yes, you could also look at it from the point of view of: "I have a really clever idea, which will probably take off, and which if it does take off will require a lot of resources. I don't have a lot of money, but I can scrape together the cash for a small cloud investment and if my idea takes off I can afford as many servers as I want. I could buy a couple of regular servers and be unable to meet demand for several weeks while I order new equipment and possibly lose my start because people got sick of my site not being up, I could sell my idea to some venture capital people who, if they invest at all will take half my profits, or I can use the cloud, expand in ten minutes, and maybe make a lot of money without having to give it all to someone else".

That's the strength of the cloud my friend, being able to start an idea without having to promise 90% of it to someone else to get funding.

Capacity planning isn't that hard...for us by HangingChad · 2008-12-06 11:20 · Score: 3, Interesting

While a content site might run the risk of getting slashdotted or Dugg, that isn't necessarily a big risk for applications. And your platform choice makes a big difference. We do our business applications on a LAMP stack. If we need capacity, we can stand it up for the cost of hardware. Nice thing about LAMP is at least the AMP part is OS portable, so we can rent capacity wherever it's cheap. So far we haven't needed to do that but it's nice to have the ability.

To date we haven't run into any problems. If we're expecting a surge of new customers, we have a pretty good idea of expected traffic per customer. We can stand up the capacity well in advance. Hardware is cheap and can be repurposed if we end up not needing all the extra capacity.

Our platform choice gives us a tremendous amount of flexibility. You don't get that with Windows. Any increase in capacity has a significant price tag in license fees associated with it. Once you build the capacity there are fairly significant ongoing expenses to maintain it. You can take it offline if you need to scale down but you don't get your money back on the licenses. There's a whole new set of problems outsourcing your hosting.

I like our setup. The flexibility, the scalability, the peace of mind of not struggling with capacity issues, negotiating license agreements with MS or one of their solution providers and not being limited to their development environment. We can build out a lot of excess capacity and just leave it sit in the rack. If we need more just push a button and light it up. I'm not sure an Amazon or anyone else could do it cheap enough to justify moving it. And I really like having the extra cash. Cash is good. Peace of mind and extra money...what's not to like? Keep your cloud.

--
That's our life, the big wheel of shit. - The Fat Man, Blue Tango Salvage

Re:Capacity planning isn't that hard...for us by Wonko · 2008-12-06 12:55 · Score: 3, Insightful

Careful with that - some nuances will turn up that will bite you on the ass. I found out last year that Apache's MD5 module creates different hashes(!) on Windows than it does on UNIX.
If that is true then at least one of them isn't actually generating an MD5 hash.
I'm just guessing, but I bet you were also encoding the line ending characters. That would be encoded differently on Windows and UNIX, so you'd actually be hashing two strings that differed by at least one byte.
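
The line-ending guess is easy to demonstrate. This sketch uses Python's standard `hashlib` rather than Apache's MD5 module, so it only illustrates the general effect, not what the parent poster actually ran; `normalize_newlines` is a helper made up for the example.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of the exact bytes given."""
    return hashlib.md5(data).hexdigest()


def normalize_newlines(data: bytes) -> bytes:
    """Convert Windows CRLF line endings to UNIX LF before hashing."""
    return data.replace(b"\r\n", b"\n")


unix_text = b"hello\n"       # line ending as written on UNIX
windows_text = b"hello\r\n"  # same text with a Windows line ending

# Different bytes in, different digest out: both are valid MD5 hashes.
digests_differ = md5_hex(unix_text) != md5_hex(windows_text)

# After normalizing the line endings, the digests agree.
digests_match = md5_hex(normalize_newlines(windows_text)) == md5_hex(unix_text)
```

Both platforms compute MD5 correctly here; they are simply hashing inputs that differ by one byte.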

He assumes too much by tpwch · 2008-12-06 11:46 · Score: 3, Insightful

He seems to be assuming that you only want to run a website on this service. I don't think hosting websites on this kind of service is a good idea at all. There are many other types of applications you can run on cloud computing infrastructure, which make much more sense, and negate almost all of his claims.

Consider for example a rendering farm. One day you may have two items to render. Another day 10 items. The next day 5 items. Should you really scale up and down manually each day, when you could just as easily just start the amount of servers you need based on how many jobs have been submitted for that day, and how large the jobs are?
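
Sizing a render farm from the day's submitted jobs might be sketched as follows. The function and its parameters are hypothetical, and it assumes render work can be split evenly across servers, which is a simplification. The `max_servers` cap is there so that one malformed batch cannot start an unbounded number of machines.

```python
import math


def servers_needed(job_hours, deadline_hours, max_servers):
    """Servers to start so the submitted jobs finish within the deadline.

    job_hours: estimated single-server render time of each submitted job.
    deadline_hours: how long the whole batch may take.
    max_servers: hard cap on how many servers may ever be started.
    """
    total_hours = sum(job_hours)
    if total_hours == 0:
        return 0  # nothing submitted, start nothing
    return min(max_servers, math.ceil(total_hours / deadline_hours))
```

Two small jobs fit on one server, ten six-hour jobs against an eight-hour deadline need eight, and an oversized batch stops at the cap instead of scaling without bound.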

There are many other examples. Websites are not the only thing you run on these services.

--
Posted by a Debian GNU/Linux user

Re:He assumes too much by Cylix · 2008-12-06 13:23 · Score: 2, Interesting

What if someone posts a bad batch or accidentally malforms some package in such a way as to chew through 10x the resources?
I think there are many great uses for cloud environments, but people have to be careful when it is pay for play.
It's a bit different than tying up all the resources on the web server. Sure, there is cost in time, but rarely does anyone get billed for those man hours.

--
"You should always go to other people's funerals; otherwise, they won't come to yours." -- Yogi Berra
Re:He assumes too much by Animats · 2008-12-06 17:48 · Score: 2, Informative

Consider for example a rendering farm.
Such as ResPower. They've been around for a while, from before the "grid" era (remember the "grid" era?). This is a good example of a service which successfully scales up the number of machines applied to your job based on available resources and load. Unlike a web service, though, ResPower normally runs fully loaded, and charges a daily rate with variable turnaround, rather than charging for each render. (They do offer a metered service, but it's not that popular.)
It's worth looking at ResPower because, unlike most of the "grid" or "cloud" services, they have an established customer base and make money.

You don't even actually save money by using cloud by Skal+Tura · 2008-12-06 12:11 · Score: 2, Insightful

Yeap, that's right. With over 7yrs of solid hosting industry experience, it's very easy to see.

At least Amazon's service is WAY overpriced for long term use. Sure, if you only ever need it for a few hours it's all good, but for 24/7 hosting it ain't, none of them.

It's cheaper to get regular servers, even from a very high quality provider, than to use Amazon's services.

Best of all: You can still use their service to autoscale up if you prepare right, and yet have low baseline cost.

If it's only a filehosting service you need, the BW prices Amazon offers are outrageous. Take a bunch of cheap-end shared accounts and you'll get way better ROI, and still, for the most part, not sacrifice any reliability at all. Cost: greater setup time, depending on several contingency factors.

Case examples: you can get from bluehost, dreamhost etc. plenty of HDD & Bandwidth for few $ a month. Don't even try to run any regular website on it, they'll cut you off (CPU & Ram usage), but for filehosting, it's great bang for buck :)

Scared of reliability? Automatically edit DNS zone according to locations availability and have low(ish) TTL. Every added location increases reliability.

--
Pulsed Media Seedboxes

One of several anti-cloud arguments by mattbee · 2008-12-06 13:34 · Score: 2, Informative

I did some rough cost comparisons for a high-traffic web site in my similarly cynical article a few weeks ago (disclaimer: I run a hosting company flogging unfashionable servers, and am not a cloud fan yet :) ).

--
Matthew @ Bytemark Hosting

An odd argument by chrb · 2008-12-06 13:55 · Score: 3, Insightful

His argument basically boils down to "Auto-scaling is a bad idea because you might implement it badly and then it will do the wrong thing". Isn't that true of everything? The flip side is that if you implement it well, then auto-scaling would be a great idea!

It's like saying that dynamically sized logical partitions are a bad idea, because you should just anticipate your needs in advance and use statically sized partitions. Or dynamically changing CPU clock frequencies are a bad idea, because you should just anticipate your CPU needs and set your clock frequency in advance. Or dynamically changing process counts that adapt to different multi-core/CPU availability factors are a bad idea... you get the picture.

The idea that some computational factor can be automatically dynamically adjusted isn't necessarily a bad idea, it's just the implementation that might be.

Stupid by Free+the+Cowards · 2008-12-06 14:47 · Score: 2, Insightful

I can summarize this article in one sentence:

"X is only useful for those who are too lazy to do Y."

It's been said about assembly language, high-level languages, garbage collection, plug-n-play, and practically any other technology you can name. It is not actually a valid criticism.

--
If you mod me Overrated, you are admitting that you have no penis.

Autoscaling is a ticking time bomb by upuv · 2008-12-06 15:35 · Score: 2, Interesting

On the surface auto-scaling is obviously a great thing. But it doesn't take much thought to start punching holes in it.

Let's first look at the data center that provides such a glorious capability.
1. It is in their own best interest for you to scale up. Scale up processing, disk, bandwidth or whatever. For the simple reason it's more money. Since you signed the contract you will probably be scaled well and truly before you know it. Usually you only find out when the bill comes in.
2. The data center has very little incentive to make sure you are notified in a timely manner of autoscaling. As a matter of fact this feature is usually crippled or even broken. I don't care what the contract says. The data center rarely honors this part of the contract to anyone's satisfaction.

Now let's look at the client and the horrible things that can go wrong. By no means even remotely a complete list.

The new version of the app list.
1. Bob the developer forgets to index that new DB table. Database goes nuts trying to do a simple select. BAM autoscaling of DB CPU resources goes through the roof.
2. New AJAX call is not properly tested. For some reason it now triggers div refreshes as the mouse moves. App server is now flooded. BAM Band Width and CPU autoscale through the roof.
3. App no longer properly caches that all important query. BAM again DB and APP CPU skyrockets.
4. The genius in dev decides to make the jsession stateful. Works fine on the desktop. Works fine when load test hammers only 10 users. Oh Oh real world kicks in, in Prod. We have 10k users. Everything goes through the roof.
5. The list of new version issues goes on.

The bad guys come a knocking.
OK so now you're a hot property on the net and you sign up for autoscaling so that you don't have to worry about capacity planning. You are focused on that cash machine that is your cool app.
1. You didn't know about that monster hole in the app. The bad guys inject a phishing site onto your Uber site. The phishing site is wildly successful. Oh crap we just paid for the biggest fraud site on the net.
2. The dev team leaves that back door on the site so they can maintain it remotely. Oh Oh all of a sudden you notice port 25 traffic is off the charts from the site. OMG we just uploaded 25 Tbytes in the last 24 hours. You have just joined the ranks of the largest SPAM generators on the planet. You have a monster bandwidth bill and a very expensive legal bill.
3. What are these very large globs in the database all of a sudden. OH crap we left a hole and are vulnerable to SQL injection. OH crap it's all encrypted kiddie porn. Bills for bandwidth, disk and legal come a knocking.

I do have experience with this sort of thing. And it always goes sour at some point. The techies are always overruled by the marketing and business types on this. As the deal is always so great on paper. At some point something will go wrong. Software is never perfect. Between defects and bad guys you are a sitting duck for the big man carrying the bill to your door. It's only ever a matter of time.

Oh and lastly. Guess what: sometimes the autoscaling fails. Make that a lot of the time. And you are then off the air.

The best situation for you as a customer of scaling is to have a close relationship with the supplier. Once you start to reach certain predefined levels of usage they should contact you and give you the option of an upgrade. Make the scaling feature by human choice. Never let the supplier decide that for you.

It really depends on your business model by PornMaster · 2008-12-06 16:09 · Score: 2, Insightful

When your revenues scale with the services rendered, it *does* make business sense to auto-scale. Auto-scaling is a technical solution, not a business one. Being Slashdotted isn't typically associated with more commercial activity, it's associated with "hit-and-run" visitors. The same with social networks. Does Twitter even have a business model? But wherever there's a business model where margins are relatively stable but activity rises and falls, auto-scaling makes you money rather than costing you severely. Like many things, it's a tool which should be used wisely, where not paying attention can leave you missing fingers.

--
500GB of disk, 5TB of transfer, $5.95/mo

Ever heard of "SLA"? by mcrbids · 2008-12-06 19:03 · Score: 2, Interesting

I have. My company lives (or dies) by the !@# SLA.

Our agreements require no less than 99.9% uptime, about 8 hours of downtime per year. We've never gotten close to that - our worst year was about 2.5 hours of downtime because of a power failure at our "fully redundant" hosting facility.
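
For reference, the conversion between an uptime percentage and a yearly downtime budget is simple arithmetic. The sketch assumes a 365-day year; the function names are made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(uptime_percent):
    """Yearly downtime budget implied by an uptime percentage."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


def achieved_uptime_percent(downtime_hours):
    """Uptime percentage implied by a number of downtime hours in one year."""
    return 100 * (1 - downtime_hours / HOURS_PER_YEAR)
```

On these assumptions 99.9% allows about 8.76 hours a year, the 2.5-hour worst year corresponds to roughly 99.97% uptime, and a 1-hour target is roughly 99.99%.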

In this world, where I have up to 8 hours per year, 10 minute response would be a god-send. We've just spent *a lot* of money revamping our primary cluster so that we now operate with 100% full redundancy on everything. Redundant network feeds. Redundant logic servers. Redundant load balancers. Redundant database servers. All with auto failure, dynamic routing with DNS. (which is, itself, very failure tolerant)

But an application has to be constructed in a very particular way in order to scale, particularly if data integrity is important. (E.g., ACID-compliant SQL.) This is often counter-intuitive and non-obvious, and porting an existing application to such an environment is not a quick investment. It's very typical to give up raw performance for performance scalability. We've devoted approximately 6 man-months over the past year to take full advantage of clustered, redundant computing in order to try for 1 hour over the next year along with near-linear scalability.

It's not just about capacity - it's about keeping all those !@# servers organized and coordinated!

Bottom line? Take a look at your SLA.

In our case, if we suffered a few hours of downtime every year or so, it would be an inconvenience to our users and clients. In any event, our uptime is best-of-breed in our niche-ish industry, but I'd put our uptime as mid range for hosted products overall, when you include companies that are much bigger than our still-somewhat-small rapidly-maturing startup.

Spend money where it counts. This requires an understanding of your economic base. If somebody slashdots your site, is that your golden opportunity, or is that an annoyance? In our case, a few hours of downtime if we got slashdotted wouldn't cause any particular long-term problem if it brought us down. If you have a few hundred customers paying $10/month for some cheap-o websites, a few hours of downtime every year or two won't cause much problem.

--
I have no problem with your religion until you decide it's reason to deprive others of the truth.

Also auto-budgeting (;-)) by davecb · 2008-12-07 03:51 · Score: 2, Interesting

This reminds me of a large company which outsourced enthusiastically, until at one point they discovered they'd outsourced decisions about maintenance... causing the outsourcer to have control over the maintenance budget.

As you might expect, after it ballooned, they started in-sourcing!

Giving others control over financial decisions is almost always unwise, even if doing so is the newest, coolest idea of the week.

--dave

--
davecb@spamcop.net
