Slashdot Mirror


Why Do Computers Still Crash?

geoff lane asks: "I've used computers for about 30 years and over that time their hardware reliability has improved (but not that much), but their software reliability has remained largely unchanged. Sometimes a company gets it right -- my Psion 3a has never crashed despite being switched on and in use for over five years, but my shiny new Zaurus crashed within a month of purchase (a hard reset losing all data was required to get it running again). Of course, there's no need to mention Microsoft's inability to create a stable system. So, why are modern operating systems still unable to deal with and recover from problems? Is the need for speed preventing the use of reliable software design techniques? Or is modern software just so complex that there is always another unexpected interaction that's not understood and not planned for? Are we using the wrong tools (such as C) which do not provide the facilities necessary to write safe software?" If we were to make computer crashes a thing of the past, what would we have to do, both in our software and in our operating systems, to make this come to pass?

8 of 1,224 comments

  1. Touchy subject by aarondyck · · Score: 5, Interesting

    I remember years ago having a conversation with an IT manager at IBM. We were talking about the inability of computer programmers to make their code foolproof. His point was that we don't see problems like this with proprietary hardware. When was the last time someone crashed their Super Nintendo? Of course, with a PC platform (or even Mac, or whatever else) there are problems of unreliability. His idea was that this is because of sloppy programming. The reason we were having this conversation is that I had a piece of software (brand new, I might add) that would not install on my computer. You would think that a reputable software company (and this was a reputable company) would test their product on at least a few systems to make sure that it would at least install! The end result was that I never played the game (not even to this day), nor have I purchased another title from that company since. Perhaps that is the solution to the root problem?

  2. Re:Simple ... by The Analog Kid · · Score: 5, Interesting

    Yes, on my parents' computer, which has 2000 on it (tried Linux, it didn't work for them). I set most of the services that aren't needed to manual. Disabled Auto-update. Put it behind a router, of course. The only problem that remained was Internet Exploder, so I just installed Mozilla with an IE theme (they haven't noticed a difference). I think killing most of the services keeps it up. Haven't had a problem with it. This was done before KDE 3.1.x, so who knows, Linux might work after all.

  3. Mandate memory checking tools by hawkstone · · Score: 5, Interesting

    I'm sure it's harder to accomplish this for kernel-level code (it's primarily OSes being pointed at right here), but you can think everything is working hunky-dory and not realize something is going wrong under the covers.

    Most errors of this kind can be found by testing under tools like Valgrind or Rational's Purify. I'm sure there are others (I've heard of ParaSoft Insure++, ATOM Third Degree, CodeGuard, and ZeroFault), but the quality of these tools really matters.

    The issue is that tiny errors can cause crashes intermittently, and not immediately. For example:
    uninitialized memory reads -- usually not a problem, but if this value is ever actually used, it will be.
    array bounds reads -- never acceptable, but depending on the structure of memory, may not always cause an immediate crash.
    array bounds writes -- like ABRs, may not be immediately fatal, but these are going to crash your code sooner or later.

    Since they don't always cause an immediate crash, these errors are likely to creep into released code unless you use one of these tools. And if you want to know why we shouldn't always run programs in an environment that checks these kinds of things, try it once; you'll notice a speed hit of usually an order of magnitude. C/C++ is a perfectly acceptable language -- not all debugging has to be done by the compiler/interpreter or only after you notice a problem.

    Anyway, hope that wasn't too pedantic....

  4. Re:Microsoft by VTS · · Score: 5, Interesting

    Some time ago I would have agreed with you, but not anymore. If Media Player crashes while playing a video, the whole system becomes unstable, and then even something like sending a file to the Recycle Bin freezes the UI...

    --
    --- No 16-bit support in Vista? Half of our modules still use it! ---

  5. Re:The ultimate solution by Jeremi · · Score: 5, Interesting

    > The ultimate solution to the problem is to let computers write the software themselves. Give them a goal, set up evolutionary and genetic algorithms, and let them go at it on a supercomputer cluster for a few months.

    That only works if you can write a fitness algorithm that can tell whether the program did the correct thing or not -- otherwise, you have no way to decide what to "breed" and what to throw away. And for many types of program, that fitness algorithm would be more difficult to write than the program you are trying to auto-generate...

    > Of course, you'd need to make sure the algorithms that humans wrote aren't flawed themselves, but once you got that pinned down, you would be more or less home-free.

    All you've done is replace a hard problem ("write a program that does X") with a harder problem ("write a program that teaches a computer to write a program that does X"). No dice.

    > Even if you didn't take this drastic a step, another solution would be computer-aided software burn-in. Let the computer test the software for bugs. A super-QA analysis, if you will. Log complete program traces for every trial run, and let the machine put the software through every input/output possibility.

    For most modern programs, there isn't nearly enough time left before the heat-death of the universe to do this. Hell, for programs other than simple batch-processors, the number of possible inputs and outputs is infinite (since the program can do an arbitrary number of actions before the user quits it).
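    To put rough numbers on that, here is a minimal sketch that counts the distinct inputs of a fixed byte length and estimates how long it would take to try them all. The function names and the rate of one billion tests per second are assumptions made up for illustration; nothing here comes from the original thread.

```python
# Back-of-the-envelope estimate of exhaustive testing time.
# Assumption (hypothetical): the tester can run 1e9 inputs per second.

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def input_count(num_bytes: int) -> int:
    """Number of distinct inputs that are exactly num_bytes bytes long."""
    return 256 ** num_bytes


def years_to_test(num_bytes: int, tests_per_second: float = 1e9) -> float:
    """Years needed to try every input of num_bytes bytes at the given rate."""
    return input_count(num_bytes) / tests_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        print(f"{n:2d} bytes: {input_count(n):.3e} inputs, "
              f"{years_to_test(n):.3e} years")
```

    Even a fixed 16-byte input is far out of reach at that rate, and an interactive program has no fixed input length at all, which is the point being made above.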

    --
    I don't care if it's 90,000 hectares. That lake was not my doing.

  6. Time is Money. by Rimbo · · Score: 5, Interesting

    I think this is basically the right answer.

    A couple of months ago, the company I worked for spent a lot of time and effort developing a robust testing methodology. We had a software product that, through blood, sweat, and tears, would not crash unless you basically blasted the hardware in some way.

    But that led to two problems. First, we only had so many people working, and resources spent testing and bugfixing were not being used to add new features. Second, the time it took to get it that robust delayed the product's release beyond the point where we could recover the investment. [Time developing] * [Cost of operating] was greater than [expected number of units sold] * [price per unit].
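    The inequality above can be written out as a small check. This is only a sketch: the function names and every number in the example are hypothetical, chosen to show the shape of the calculation, not taken from the company described.

```python
# Break-even check: [time developing] * [cost of operating]
# versus [expected units sold] * [price per unit].
# All names and figures below are made up for illustration.


def development_cost(months: float, monthly_operating_cost: float) -> float:
    """Total spent keeping the company running while the product is built."""
    return months * monthly_operating_cost


def expected_revenue(units_sold: int, price_per_unit: float) -> float:
    """Revenue expected from selling the finished product."""
    return units_sold * price_per_unit


def recovers_investment(months: float, monthly_operating_cost: float,
                        units_sold: int, price_per_unit: float) -> bool:
    """True if expected revenue at least covers the development cost."""
    cost = development_cost(months, monthly_operating_cost)
    return expected_revenue(units_sold, price_per_unit) >= cost


if __name__ == "__main__":
    # Hypothetical: 4,000 units at 200 each, 50,000 per month to operate.
    print(recovers_investment(12, 50_000, 4_000, 200))  # 12 months
    print(recovers_investment(18, 50_000, 4_000, 200))  # 6 extra months of testing
```

    With these invented figures, six extra months of testing is enough to flip the result, since the cost side grows with every month while the revenue side stays fixed.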

    What ended up happening was that we lacked the features to justify the price and number of units we needed to sell to cover the cost of developing it. We had no bugs -- and we could be certain of it -- that would crash the machine.

    As of last month, the company could no longer afford to pay me. I'm not there any more.

    The moral of the story is that trying to make a bug-free product will bankrupt your company, especially a startup. Software tools have improved, but the benefit largely goes towards adding new whiz-bang features that sell the product for more money, not to being able to fix more bugs.

    What we should do as engineers and managers of software products is to not be afraid of getting the product out the door with a few bugs in it if we want our company to do well; this business reality is ultimately why bugs will be a big part of software for the foreseeable future.

  7. Re:Computers don't crash by Anonymous Coward · · Score: 5, Interesting

    > The current issue of Scientific American states that 51% of crashes are due to user error. 15%=software error. 34%=hardware error. Refer to article for further info.

    You made a little "user error" there yourself-- the article says that 34%=software error and 15%=hardware error.

    Oh, and those figures are just for Web applications, not software applications in general.

    It's an interesting article. Unfortunately, they're not very clear about what constitutes a "user error." I've filled out Web forms that gave me an "error" when I included hyphens in my phone number or credit card number. That's far from an error, it's just poor user interface design.

    In my opinion, something the user does should never cause a program or operating system crash. If this can occur, it is the developer who is at fault, not the user.
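    As a sketch of that kind of tolerance, the function below accepts a phone number with the separators people naturally type and returns None for anything it cannot interpret, instead of raising. The name normalize_phone and the default of ten digits are assumptions for illustration only; real phone number formats vary by country.

```python
from typing import Optional

# Separators users commonly type in phone numbers.
SEPARATORS = set(" -.()\t")


def normalize_phone(raw: str, expected_digits: int = 10) -> Optional[str]:
    """Return the digits of a phone number, or None if it can't be understood.

    The function never raises on bad input; the caller decides how to
    ask the user to try again.
    """
    if not isinstance(raw, str):
        return None
    # Drop the separators, keep everything else for validation.
    cleaned = "".join(ch for ch in raw if ch not in SEPARATORS)
    # Accept only plain ASCII digits of the expected length.
    if len(cleaned) == expected_digits and all(ch in "0123456789" for ch in cleaned):
        return cleaned
    return None


if __name__ == "__main__":
    for text in ("555-867-5309", "(555) 867 5309", "not a number"):
        print(repr(text), "->", normalize_phone(text))
```

    The same approach would apply to the credit card case: strip the hyphens and spaces first, then validate what is left.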

    Apple's Human Interface Guidelines are a nice introduction to user-fault tolerance, even if you're developing for other platforms.

  8. Re:OT: Electric overconsumption by doorbot.com · · Score: 5, Interesting

    > I wish there was consumer demand for low power desktop computing.

    My mail/web server would run fine off of something ridiculously small, like a Sharp Zaurus. Here are my requirements, and I will pay for one if it is available.

    1. Non-x86 hardware designed for lower power -- extra speed is nice, but not required; Pentium 200 speeds or better
    2. Low power, with 9V or AA-based battery backup (changeable while system is running)
    3. 3" - 4" LCD (with manual switch to turn off) at 640 x 480, or some sort of LED array/VFD, because all I really need is a low power terminal supporting 80 x 24 characters.
    4. USB port for keyboard
    5. Serial port
    6. Two or three 10/100 NICs
    7. Full (Debian) Linux support of all hardware
    8. Some sort of expansion (PCMCIA maybe, or via USB)
    9. Support for CompactFlash for backups
    10. Hardware encryption would be a nice goodie but not required


    Yes, I could probably build this with PC104 components, but I want a pre-built product, and I'm willing to pay for it (maybe $300 - $400).