Slashdot Mirror


Uptime Realities in the Internet World

schnurble writes: "My former boss has written an interesting article on the realities of uptime in the Internet World. It poses the idea that four and five nines of reliability are too expensive to be realistic, especially in the post dot-bomb economy. It's an interesting read, especially if you answer to an 800lb gorilla for outages and uptime issues."

15 of 353 comments

  1. Re:Nothing is THAT Important by Anonymous Coward · · Score: 2, Insightful

    Five-nine reliability in the airline industry would mean that we'd see a major commercial jetliner crash about every other day.

    No, thanks.

  2. Customers want it, but don't understand it by derekb · · Score: 5, Insightful


    How many engineers out there have heard marketing / sales say 'it has to be always available' and priced out an infrastructure accordingly?

    Even recently I've been working with a customer who wants a compromise between price and availability - but it still needs five nines.

    Availability is infrastructure plus process. You need to have the supporting process to go along with the hardware - maintenance schedules, change management (well FCAPS in general), etc. It's not just a big box.

    1. Re:Customers want it, but don't understand it by rob_from_ca · · Score: 4, Insightful

      This is the most intelligent thing I've ever heard on slashdot before. If you don't understand this comment, read it again and again until you do. :-)

      If you're a business, your money is far better spent improving the user experience rather than buying redundant-everything, building the support infrastructure, and incurring the extra overhead of the tedious and careful processes needed to obtain 5 nines (and 4, and even to a degree 3 nines).

      If your site sucks and no one visits, it doesn't really matter if it's down...work on building something reasonably reliable that is very compelling to your users; that's money much better spent...

  3. In other words... by reaper20 · · Score: 2, Insightful

    We should just give up on decent service and professionalism. I don't think so.

    My ISP (Ameritech) seems to think so, considering my DSL connection and their promptness to "Get ahold of me within 24 hours..."

    Bleh ... It's not unrealistic ... don't expect people to live with downtime just because a good portion of those systems need to be rebooted on a regular basis (Win machines), and general retardness of sysadmins around the world allows things like Nimda and Code Red to get out of hand. This is an excuse to let companies too cheap to have decent customer support off the hook. Maybe if they were educating their tech staff instead of finding more ways to rip us off, they'd have decent service.

    Everyone with competent sysadmins on rock solid *nix systems raise your hands...

  4. Re:Nothing is THAT Important by nuggz · · Score: 3, Insightful

    I guess you don't have a pacemaker.

    Some things ARE that important, most things aren't.

  5. 99.999% perfection by Gorm+the+DBA · · Score: 4, Insightful

    Let's see...five nines would be just over five minutes of downtime in a year (315 seconds). For business and other non-life-threatening situations, that would be way better than necessary.

    Lots of folks are probably going to harp on the "If 1 out of 100,000 airplanes crashed, there'd be X crashes" line of argument. There's a problem with that...one mistake doesn't crash an airplane. Every system on an airliner is redundant, and virtually any "pilot error" has time to be fixed before there's a problem. Listen in on the Air Traffic Control to Cockpit transmissions sometime...just about every flight encounters some minor error at some point, whether it is a pilot needing to re-ask for a clearance or someone needing to climb or descend a bit to clear a potential collision.

    Errors are unavoidable. The key is to ensure recovery from those errors is possible. So sure, your computer may be down for 5 minutes a year. Make sure you have a backup system that is able to take up the slack instantly, and your downtime is down to about 3/1000 of a second a year, assuming the two systems fail independently. Redundancy is the key.
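
The downtime arithmetic in this comment can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, a year is taken as 365 days, and the redundancy formula assumes the systems fail independently and that failover is instant, which real deployments rarely guarantee.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000; leap years ignored


def downtime_seconds_per_year(availability: float) -> float:
    """Expected downtime per year for an availability given as a fraction."""
    return (1.0 - availability) * SECONDS_PER_YEAR


def redundant_availability(availability: float, copies: int) -> float:
    """Availability of `copies` systems when any single one is enough.

    Assumes failures are independent and failover takes no time.
    """
    return 1.0 - (1.0 - availability) ** copies


if __name__ == "__main__":
    # Downtime budget for two through five nines.
    for nines in range(2, 6):
        availability = 1.0 - 10.0 ** -nines
        seconds = downtime_seconds_per_year(availability)
        print(f"{nines} nines: {seconds:,.0f} s/year")

    # Two five-nines systems backing each other up.
    pair = redundant_availability(0.99999, 2)
    print(f"redundant pair: {downtime_seconds_per_year(pair):.4f} s/year")
```

Under those assumptions a single five-nines system is allowed roughly 315 seconds a year, while a redundant pair drops to a few thousandths of a second. Correlated failures (shared power, shared network, shared bugs) would make the real figure worse.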

  6. Re:Nothing is THAT Important by WetCat · · Score: 3, Insightful

    Heh... Switzerland....
    Some factors that preceded the recent collision between a Tu-154 and a DHL Boeing 757 were
    - Traffic warning system in its scheduled 10-minute maintenance - dispatcher got no warnings
    - Busy phone lines to dispatch - German dispatch was not able to get through to Swiss dispatch to tell them about the dangerous situation...

    This is an example that cost a lot of lives...
    (another tragic circumstance was that the pilots of the Tu-154 gave priority to dispatch commands instead of the commands of the collision avoidance system...)

  7. Re:Nothing is THAT Important by medcalf · · Score: 4, Insightful

    Not true. Five 9s in the airlines means that you'd see an airliner late or in some other way unavailable - possibly due to a crash, but not likely - every other day. Reliability is the availability to do what you need, when you need it. If a server is up 100% of the time, but is not able to be accessed because the network is down, the system is not reliable for you.

    --
    -- Two men say they're Jesus. One of them must be wrong. - Dire Straits
  8. not economically possible? by lingqi · · Score: 2, Insightful

    with M$, it is theoretically impossible as well to achieve their advertised up-time; ( i think back when they ran some ad (still running?) about how windows can achieve three or four 9s of uptime).

    Total bullshit... let's see -- windows machine *requires* reboot every time you apply a patch; a reboot on a large machine is... i dunno, 10 minutes if you got a lot of crap. security update turns up about twice a week or so... that puts up to be ~99.8% MAXIMUM;

    even if you don't buy my numbers, three 9s uptime means every week you only get ~10 minutes downtime (and five 9s only ~6 seconds).

    yeah... sure... not if you want to patch up that internet explorer / IIS so your system does not die from DoS, hackers, or worms!
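
The reboot estimate above is easy to reproduce. The sketch below takes the comment's own guesses (two patches a week, ten minutes per reboot) as inputs; the function names are illustrative, and the numbers are the commenter's assumptions, not measurements.

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10,080


def availability_with_reboots(reboots_per_week: float,
                              minutes_per_reboot: float) -> float:
    """Best-case availability if the only downtime is planned reboots."""
    downtime_minutes = reboots_per_week * minutes_per_reboot
    return 1.0 - downtime_minutes / MINUTES_PER_WEEK


def weekly_budget_seconds(nines: int) -> float:
    """Downtime allowed per week at the given number of nines."""
    return 10.0 ** -nines * MINUTES_PER_WEEK * 60


if __name__ == "__main__":
    best_case = availability_with_reboots(reboots_per_week=2,
                                          minutes_per_reboot=10)
    print(f"best case with patch reboots: {best_case:.3%}")
    for nines in (3, 4, 5):
        print(f"{nines} nines: {weekly_budget_seconds(nines):.1f} s/week")
```

With those inputs the ceiling comes out at roughly 99.8%, and a single ten-minute reboot uses up about a whole week's allowance at three nines.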

    --

    My life in the land of the rising sun.

  9. Never go to work again! by novakane007 · · Score: 2, Insightful

    Hate standing in the meat locker (server room)? Hate rushing to work past midnight to cycle a server?
    The problem I used to have is I'm not a morning person so being available as an admin before 7am is tough, but now I can admin my network while trapped in rush hour traffic. =] Reboot servers, telnet into devices, stop/start services, add users, manage DNS... the list goes on and on.
    Uptime can be maintained without even having to leave the comfort of your easy chair. If you're an admin you should check this product out.
    SonicAdmin by sonicmobility
    (http://www.sonicmobility)

    --

    WURD!!
  10. Re:Oi! You act like a manager! by dasmegabyte · · Score: 4, Insightful

    Actually, even this is silly. True five nines availability on a widely distributed network would mean that an application was available at all times on all segments of the network. Which would mean that your uptime depends not only on your redundancy on one side of a pipe, but on your overall redundancy as well, so that when a pipe goes down you're still accessible. Since when a pipe goes down in your host you probably lose other resources as well (such as power or alternate pipelines), this means multiple datahouses owned by multiple vendors. Each of these has to have a perfect backup of all data and be running the same versions of all software. Really, the only true redundancy would be so heavily distributed that each local network would basically have to have its own server. This isn't so crazy -- technically, DNS and email do this. However, we all know that for an end user even DNS and email can have perceived outages.

    And this is why 5 9s is foolish. Sure, you're redundant behind the pipe, but if you lose the pipe you can't blame your datacenter when you charged a customer for uninterrupted service. Technically, if their modem disconnects them for a few hours you've broken contract.
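
The point about the pipe can be put in numbers: when every link in a chain has to work for the user to reach you, the availabilities multiply. The figures below are invented purely for illustration, and the model assumes the links fail independently.

```python
from math import prod
from typing import Iterable


def serial_availability(components: Iterable[float]) -> float:
    """End-to-end availability when every component must be up.

    Assumes independent failures; each value is a fraction such as 0.999.
    """
    return prod(components)


if __name__ == "__main__":
    # Hypothetical figures, not measurements.
    chain = {
        "datacenter": 0.99999,
        "upstream pipe": 0.999,
        "customer modem": 0.99,
    }
    end_to_end = serial_availability(chain.values())
    print(f"end-to-end availability: {end_to_end:.4%}")
    print(f"weakest link:            {min(chain.values()):.4%}")
```

In this model the result can never be better than the weakest link, so five nines inside the datacenter does little for a user sitting behind a two-nines connection.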

    Besides, who needs it? If Yahoo is unreachable from my desk, I wait and reconnect. It doesn't matter if the downtime was my fault or theirs...the effect on my user experience was the same. Any services I might have used, or products purchased, I will use or purchase at a later time. After all, I don't refrain from buying shoes just because the mall is closed!

    --
    Hey freaks: now you're ju
  11. Re:Nothing is THAT Important by Sique · · Score: 4, Insightful

    No, it means that a jetliner has to be operating for a year with just about 5 mins in the hangar. But about half its lifetime a jetliner is in maintenance, giving it an uptime of about 50%.

    --
    .sig: Sique *sigh*
  12. Re: Netcraft have the final word on this by Black+Parrot · · Score: 3, Insightful

    > Too Bad that a lot of the servers on the top 50 uptime list still have the default page that apache provides. I'm sure it isn't too difficult to keep them running - just make sure the power is on and the network cable is plugged in.

    Historically, some very popular and widely sold operating systems couldn't even do that much.

    --
    Sheesh, evil *and* a jerk. -- Jade
  13. Re:Nothing is THAT Important by rodgerd · · Score: 3, Insightful

    Yeah, and when some systems fail, it doesn't actually matter.

    One thing I saw again and again during the .com boom was pissant little companies demanding 100% uptime, spending a fortune on Oracle and redundant data centres and shit, when they didn't need that reliability. Their business plan didn't call for it, their demographic didn't call for it, nothing called for it. They were engineering their shithouse little business' systems like they were for the A&E department of a hospital.

    And that's the point the guy seems to be making: people are spending millions of dollars where they only need to spend a tenth that, to build systems you could run a trading floor with.

  14. The company I work for needs it...... by Anonymous Coward · · Score: 1, Insightful

    Police system needs to be able to access the criminal databank 24/7/365. Unless you want a shotgun in the face when you pull over a driver.

    Crime doesn't take holidays.