Why Is Less Than 99.9% Uptime Acceptable?
Ian Lamont writes "Telcos, ISPs, mobile phone companies and other communication service providers are known for their complex pricing plans and creative attempts to give less for more. But Larry Borsato asks why we as customers are willing to put up with anything less than 99.999% uptime. That's the gold standard, and one that we are used to thanks to regulated telephone service. When it comes to mobile phone service, cable TV, and Internet access, service interruptions are the norm, and everyone seems willing to grin and bear it: 'We're so used to cable and satellite television reception problems that we don't even notice them anymore. We know that many of our emails never reach their destination. Mobile phone companies compare who has the fewest dropped calls (after decades of mobile phones, why do we even still have dropped calls?). And the ubiquitous BlackBerry, which is a mission-critical device for millions, has experienced mass outages several times this month. All of these services are unregulated, which means there are no demands on reliability, other than what the marketplace demands.' So here's the question for you: Why does the marketplace demand so little when it comes to these services?"
Probably because of the cost. I do network design for a fairly large telco, and let me tell you the cost goes up exponentially with the number of "9"s that the business asks for.
Are these kinds of outages really so common? Mobile phones I absolutely agree with. On the other hand, I literally cannot remember the last time I lost cable or my internet. I've literally lost power more frequently than either of them (maybe 4 times in the past year) and lost water once. Emails not making it to their destination--again, does this really happen? In the decade plus I've been using internet email, I can't off the top of my head ever think of any "lost" email unless it was sent to a wrong address or something.
While you deserve the mod points, it should also be noted that consumer expectation is strangled into submission within 20 minutes on the first support call they make to ask about better service quality. I know a guy who is locally famous because he will spend 4, 5, 6 or more hours on the phone with customer service, supervisors, managers and anyone on the board of directors that he can find a phone number for. What is he fighting for? Discounted service or reparations for lost service(s). That's right, it takes hours on the phone to get one of those companies to either own up to or pay for losses accrued by their customers through loss of service.
In truth, most consumers won't complain when they should, so there is no marketplace pressure on those businesses to aim for five nines uptime.
I think the term you're looking for is "managing expectations." Here's a little article about it from the IT side. It's something that Microsoft and telcos have become so good at. If you keep expectations low and give them a little better, they'll be more than happy. If you give the same, but you promised the world, you get a bunch of unsatisfied customers.
As a guy who does communications in the U.S. Navy, I can attest to this. If the United States military can't guarantee 99.999% uptime on communications in all conditions, what makes anyone think it's possible in the private sector?
I'm still waiting for people to scream about the rising gas prices and the record oil company profits. Seems like this would have a greater impact on the general populace than reliable cell phone service.
It has nothing to do with conditioning. They could easily bury power lines to prevent storm outages, but people don't want to pay the costs. That is what 9s in uptime is all about: paying increasingly more for increasingly smaller additional uptime. I would rather pay my current rates than pay twice as much for less downtime. I can live for a day or two without power after a major storm. If you can't, then pay the extra yourself and buy a generator. Don't try to force others to subsidize your service requirements.
I was born in 1964. I have no recollection of POTS telephone service ever being unavailable.
Electricity was expected to drop out a few times every summer, and until someone figures out how to tell lightning where to go, I expect it will continue to happen. In my part of Canada, however, power is continuously available from October to April no matter what. Even if you don't pay your bill. The only winter power outage of note I can think of offhand was the great Ice Storm of 1998, one of the most spectacular cases of force majeure I've witnessed in my life.
In my part of the world, at least, power and telephone were life-and-death services and legislation mandated their reliability.
My electricity isn't 99.999% uptime (that's about five minutes of downtime in a year), which would require me to get a UPS
My consumer grade equipment isn't 99.999% uptime (with luck, maybe I guess but there's no ECC, redundant power etc).
My software isn't 99.999% uptime (ok, so the kernel is stable. When X crashes, so does everything of importance on a desktop)
If there's something urgent, you CALL me anyway.
I'd rather take a line with 99.5% uptime (that's two days without internet per year) that's 10x faster and costs 10x less. Which doesn't include that I have Internet at work, or via my cellphone, or via a webcafe or any number of other easily available sources. The only real killer I can think of is if you only telecommute and can't go to work, but even then I figure the nearest Starbucks will let you occupy a corner with some purchases.
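The parenthetical figures in this comment are easy to check. What follows is only a rough sketch, assuming a 365-day year and treating availability as a simple fraction of the year; the function name is made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000; leap years ignored (assumption)


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Seconds of downtime per year implied by an availability percentage."""
    return SECONDS_PER_YEAR * (100.0 - availability_percent) / 100.0


# Print the downtime budget for a few common availability levels.
for pct in (99.5, 99.9, 99.99, 99.999):
    secs = downtime_seconds_per_year(pct)
    print(f"{pct}% uptime -> {secs / 60:.1f} minutes/year ({secs / 86400:.2f} days)")
```

On these assumptions, five nines leaves a bit over five minutes of downtime per year, and 99.5% leaves a little under two days.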
Partly correct. What they did was to mass introduce the GUI. 1.0 was a joke as far as usability went. At the same time the 386 was out and the talk of multiprocessing was promising new and exciting computing in the near future.
I don't think they measured squat. Just did their best. Only thing was that there was nobody who could properly design an O/S and complexity, instead of simplicity, ruled the day.
What we are seeing is the very best they as a group are able to produce.
They have never been great at marketing either. But they were really the first to push the GUI with success. Don't forget Apple became a very closed platform. They did not attract masses the way the open IBM PC did.
Right there history shows how important open standards are to success. Apple was considered this fantastic success story but in reality they cut it short and did not attract the masses the way the Johnny-come-lately IBM PC did. But we are slow when it comes to learning from history.
What they have been good at is market lock-in, vendor lock-in and many other types of lock-in. (The problem really is that they had never heard about duty and were only interested in money.) We all thought they would get it right sooner or later and deliver a good platform that would allow happy computing. The fact that they specialized in adopting good standards and then corrupting them so that you got locked in was a very calculated development.
At one point Gates himself said that Unix was the way to go. Then he decided to do it better but clearly never understood what made Unix so good (simplicity). Torvalds on the other hand was ONLY looking for simplicity. Which is why it fit so well into the general Unix design.
Look at Windows, it is filled with arbitrary complexities and is horribly inefficient. Never mind when upper management throws fits and yells at staff, I've never found that conducive to good programming, or business.
Gates cheated his way into O/S design, used people from VAX whose memory management problems were dragged over to Windows. Built a kernel in BASIC! Haha! And got away with it for years!
Someone who knew more about systems picked the Unix design and rewrote history based on technology, and was not motivated by money. Interesting to see how much we like to be able to just do what we need. Imagine if IBM had released Linux. With all the corporate support for let's say $100. Then opened it up with a GPL license.
Microsoft would not be sitting pretty at all. The OS/2 collaboration would not have happened and Gates would not have learned his lessons from that. For all their success I've never considered them much of a success where it really matters: integrity in product and care for customers. I have people send me brandy, fine wines and other tokens of their appreciation after sales. Because I believe in treating other people the way I like to be treated, and I really care about my clients.
Frequent reboots haven't been required since win2k.
(snicker)
I've been running Windows for years, and this statement is just very funny to me. You must be running some entirely different magical version of Windows than I've ever seen, but reboots are EXTREMELY common on 2000, XP, and Vista. The "just reboot" instinct I've seen from multiple different Windows guys is common, and DOES work. I was looking forward to Vista, which claimed it didn't require rebooting as often. That didn't really turn out to be the case. If you really think win2k and beyond doesn't require reboots, I think you either don't run it, or just have a very poor memory.
Maybe that's the question the cable company would like to ask, but the one concerned consumers should be asking is, "how do you get someone to expect _more_ for the same price (or less) when they think that what they currently get is good enough?" Reading your piece of the discussion, I think this question could also follow, and it happens to be the original question...
Would I be willing to pay more for cell service that had fewer dead zones, dropped calls and "busy networks" than my current one has? No way. It's not as good as landline, but it's good enough for me. If, ten years from now, it worked the same as it does now, I would expect their competition to have passed them by and I'd switch. In the US we're in a free market system.
If I was tired of my cable internet dying on me occasionally, which competitor would I turn to? DSL, satellite and local wireless all have problems too. I settle for less than 5 nines because I have no choice, if I want service that is anywhere near the cost it is right now.
I'm not going to mod you as troll. But maybe others will mod me. Computer science and all that goes with it is an exact science. Until MS came around. MS destroyed it. Welcome to the dark ages of MS.
The origins of an OS really show through a lot of the time. Windows started out as a single user OS, so rebooting was OK because the only person you messed up was the guy sitting in front of the screen. It eventually evolved into a multi-user OS, but the "just reboot!" mentality persists to this day.
Windows NT (ie: contemporary Windows) has been a multiuser OS since its first release.
The reason the "just reboot" mentality persists is simply because 99% of the time it *is* used as a single-user OS, and no-one else is impacted. This has _zero_ to do with the architecture and everything to do with the user. Linux would be (and is) treated in the same way in similar situations.
Linux/Unix on the other hand started out life as a multi-user OS. Rebooting was a big no-no, because you'd affect countless people logged in, and you'd get yelled at for ruining someone's work.
UNIX actually started out as a single-user OS and the multiuser aspect was bolted on later. Linux didn't, of course, because by the time Linus banged together his UNIX rip-off, UNIX had been multiuser for quite a while.
However, again, the attitudes of users towards their servers and workstations have about 10% to do with the architectures and 90% to do with the users' knowledge. DOS and OS/2 were single user, yet frequently had BBSes and similar running off them. You can be assured the people running those BBSes were far less likely to have the "just reboot" mentality.
Further, the other reason most people have that attitude is because to them a computer is just another appliance. When other appliances act up, pretty much the first thing _everybody_ does is turn it off and back on again. Why on Earth would you expect them to treat a computer any differently ?
Windows administrators categorically will try rebooting the damn thing first to fix any problem (and it usually works). Linux administrators will only try this as a last resort (and it almost never works).
No. Inexperienced admins will try rebooting first, regardless of platform. Experienced admins will not. Incidentally, there are numerous classes of problems on Linux (and UNIX in general) which are more quickly and easily "fixed" with a reboot.
Anyway, at Microsoft the idea that you can somehow tweak windows just right so rebooting isn't necessary is crazy.
I can't even remember the last time I had to reboot any of my Windows machines without a good reason (eg: patching).
Finally, there's nothing wrong with rebooting _anyway_. If your service uptime requirements are affected by a single machine rebooting, your architecture is broken. All the reboot does is demonstrate that it's broken without a real problem actually occurring.
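The point about architecture can be put in numbers. This is a minimal sketch, assuming replicas fail independently of one another (real outages are often correlated, so treat the result as optimistic); the function name and the 99% figure are illustrative, not from the thread.

```python
def combined_availability(single: float, replicas: int) -> float:
    """Availability of a service that stays up while at least one replica is up.

    Assumes each replica is up a fraction `single` of the time and that
    failures are independent (a simplifying assumption).
    """
    return 1.0 - (1.0 - single) ** replicas


# One machine at 99% versus two or three redundant machines at 99% each.
for n in (1, 2, 3):
    print(f"{n} replica(s): {combined_availability(0.99, n):.6f}")
```

Under that assumption, a second machine moves a 99% service to roughly 99.99%, which is why a single reboot should not be visible to users of a redundant setup.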
Sysadmins comparing machine uptimes is like ricers comparing spoilers.
[citation needed] I call bullshit on that one.
And I call BS on your BS. Clearly you're not familiar with the state-of-the-art as far as email goes. You've certainly not had to set up and run a private email server.
Here's one good reference. It mostly mirrors my experience, except that it's been going on longer than the writer has observed.
The basic problem is that Yahoo, Hotmail, ATT and other large email providers, or ISPs, simply refuse to honor the standards which have been published (DKIM, et al.). Google is great. But it's gotten so bad with the others that I simply don't bother communicating with anyone who has a Hotmail, Yahoo, or ATT account. If they are someone important, I'll tell them once (via a different band) of the situation. And let them know that unless they change their email provider, I won't be responding to any future email from them.
Usually I just refer them to gmail, because google seems to be the only large email provider with a technical clue.
The other interesting thing is that all of these large companies will treat unsigned email from an Exchange server as more verified than a DKIM email, but I digress.
Supposedly the excuse is that it's due to spam. I'm certain that is part of the problem. But the other part is that there's definite incentive for the big boys to eliminate the small independent websites and drive all of the business into their arms.
So, yes, the OP's statement about many email messages not reaching their destination is quite true. Most? No. But anything that doesn't use the technology offered by the big commercial joints (including Microsoft server technology) is shut off from communicating with a large part of the internet.
Blackberry is not a mission critical service. The people who use it as such are naive.
Heh. Well, many PHBs would disagree, but your point is valid.
For your amusement, the Blackberry email servers are provided by a company called Mirapoint (mirapoint.com), and they are Linux based. From what I've heard, they cut over about 2 years ago from BSD to Linux, for various reasons. I'm also told that the CEO is a complete airhead who has difficulty managing a secretary, let alone a company. But that the mid-level managers and engineers in the U.S. are first rate. I imagine that they could indeed improve the uptime of the email servers, but those servers are quite good already.
I'd say it's an even deeper problem -- it's not really a marketplace. The competition is few and far between, and they're oligopolistic, and probably price-fixing. I mean, what's your alternative to a blackberry? So what if the service sucks -- is your employer going to ... buy you an iPhone? [1] If Verizon pisses me off, I can switch to... AT&T, or some of the others if I don't mind roaming? People would vote with their wallets if there were candidates worth switching to.
[1] If so, let me forward you my resume for your consideration
It's funny the attitude that comes from the users of each OS. Windows administrators categorically will try rebooting the damn thing first to fix any problem (and it usually works). Linux administrators will only try this as a last resort (and it almost never works).
It's even less than a last resort. I have, once or twice, had true problems that required a reboot of a Linux machine to fix. The one in most recent memory, it took three weeks before realizing that a reboot was (or at least, could be) the solution. That's three weeks of hard core debugging, tweaking, and hair pulling. The idea of a reboot to fix a user-level software issue is not something that even remotely crossed my mind, nor anyone else's. In fact, it was a Windows user from another location who ultimately made the suggestion "Have you tried rebooting it?"
Rebooting a computer to fix a problem should be viewed with the same suspicion as burning down your house to eradicate an infestation of insects.
I'm at home (and awake) 20% of the time.
My landline is up 99.999% meaning my phone is available to me when I need it 19.9998% of the time.
I'm out and about (and coherent) 40% of the time.
My cell phone works 90% of the time meaning it is available to me when I need it 36% of the time.
Clear winner, cell phone.
Sometimes we lose sight of reality while studying statistics.
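The arithmetic in this comment is a product of two fractions: the share of time you are near the phone and the share of time the service works. A quick sketch using the comment's own numbers (the function name is invented for illustration, and the two factors are assumed independent):

```python
def usable_fraction(time_near_phone: float, service_uptime: float) -> float:
    """Fraction of all time the phone is both within reach and working."""
    return time_near_phone * service_uptime


landline = usable_fraction(0.20, 0.99999)  # home and awake 20%, five-nines line
cell = usable_fraction(0.40, 0.90)         # out and about 40%, 90% coverage

print(f"landline: {landline:.4%}")
print(f"cell:     {cell:.4%}")
```

The cell phone still comes out well ahead, which is the commenter's point: availability where you actually are matters more than the number of nines on a line you are rarely near.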
Somewhat off-topic, but an anecdote related to massive consumer calls to tech support.
Back in the day, I was an IRC Operator for a large Undernet server, and there came a time where the new thing for troublemakers was to use open proxies on cable connections to flood channels/servers. One cable provider had a particularly large number of clients whose setup was used to attack the network and generally cause trouble.
At first, being in the area of that provider, I called tech support and escalated the issue as much as I could. My point was that they were ultimately responsible for the abuse coming from their network. Long story short, for months I got nothing but "we'll look into it".
After a particularly nasty week, and after consulting with the server admins, we decided to ban the whole IP range of that provider from using our server (they could still use the rest of Undernet, but our server was popular for them). The ban kicked > 1000 clients from the server with a message like: Your provider does not respond to abuse complaints. Contact your provider's technical support to have this issue resolved.
10 minutes later, there was a 30 minute wait at the provider's tech line. On a Sunday afternoon. One hour later, I got an email saying they were blocking inbound port 1080 at their router to protect their clients' machines from being abused.
I guess the point is, when something generates enough backlash, preferably with a nice surprise effect, things can change. The hard thing is to organize people enough to harass the company about it.
No, it should be viewed as fumigating your house. You all move out, wait a few days, then move back in. When you reboot you don't lose the computer, you don't lose the archived data, and all the users can return in a short amount of time.
Burning down your house loses all the contents and ensures you'll never return...
Well, we do have physics and trees and hills in Finland but I can't even remember the last time I had a call drop. Just last Thursday I took a 140 km train trip to a nearby city and spent the whole time chatting on IRC. Used the same SSH connection for the whole trip. Nice 3G handovers @ 120 km/h (Nokia N73). Greetings from the 21st century..
people aren't willing to accept the economic and aesthetic costs of providing those services at the level of reliability you and the author are demanding.
I have to agree.
I've stated before 'Every 9 of reliability increases the cost 10 fold'. Now, this is only the vaguest estimate, with vast numbers of variables, unseen incidents, competency, etc...
Take a car that's 90% reliable. It'd be used, of course, and probably cost you only $100-500. You can get a car that's 99% reliable for $1-5k. 99.9% reliability would be getting into needing a new car (or newer used), costing $10-40k. This, of course, discounts getting a lemon.
Now, when it comes to phone service, its reliability comes from the fact that this stuff has been done for so long that the extra reliability doesn't actually cost 10X, plus the base '90%' is so cheap that upping it to 99.9% isn't very expensive.
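The "every 9 costs 10 fold" rule of thumb can be written down directly. This is only a sketch of that rough estimate, not a real cost model; the function names are made up, and the $100 base is the low end of the commenter's 90%-reliable car.

```python
import math


def count_nines(availability: float) -> float:
    """Number of nines in an availability fraction: 0.9 -> 1, 0.99 -> 2, 0.999 -> 3."""
    return -math.log10(1.0 - availability)


def estimated_cost(availability: float, base_cost: float) -> float:
    """Rule-of-thumb cost: base_cost buys one nine, each further nine costs 10x more."""
    return base_cost * 10 ** (count_nines(availability) - 1)


# Commenter's car example, using $100 as the assumed price of "one nine".
for a in (0.9, 0.99, 0.999):
    print(f"{a:.1%} reliable -> roughly ${estimated_cost(a, 100):,.0f}")
```

With a $100 base this gives about $100, $1,000 and $10,000, which matches the low ends of the price ranges quoted above.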
> I can tell you that in the pre-Windows days...
and I can tell you that in the post-windows days... well, people have this concept of rebooting when things don't work. "it will auto-magically fix itself" (tm). cell-phones, managed switches, home routers... you name it, the first thing tech-support will do is ask you to "turn it off and on again". so much so that that is a standard gag in "the IT crowd".
i had this incident in our data center where this nincompoop kept futzing around with a managed switch. he hosed the config, caused some ripple effect on the servers and then panicked and wanted to reboot everything - including the servers. didn't know what the problem was but as he is indoctrinated to the ethos of rebooting to automagically fix problems, just wanted to reboot everything.
i had to step in, restart a few services and things were back to normal. no reboot required. a reboot would have taken us out for a good 15-20 minutes. restarting services, 10 seconds.
it's almost like people don't take pride in uptimes. who cares if it's down for 30 minutes... thanks largely to the microsoft OS culture. unix was bad enough - compared to mainframes and VMS - or so i'm told.
so, yeah, it's gone downhill. MS didn't help. telephones might have had outages but i sure don't recall having to reboot the big black rotary dial phones..