Why Do Computers Still Crash?
geoff lane asks: "I've used computers for about 30 years and over that time their hardware reliability has improved (but not that much), but their software reliability has remained largely unchanged. Sometimes a company gets it right -- my Psion 3a has never crashed despite being switched on and in use for over five years, but my shiny new Zaurus crashed within a month of purchase (a hard reset losing all data was required to get it running again). Of course, there's no need to mention Microsoft's inability to create a stable system. So, why are modern operating systems still unable to deal with and recover from problems? Is the need for speed preventing the use of reliable software design techniques? Or is modern software just so complex that there is always another unexpected interaction that's not understood and not planned for? Are we using the wrong tools (such as C) which do not provide the facilities necessary to write safe software?" If we were to make computer crashes a thing of the past, what would we have to do, both in our software and in our operating systems, to make this come to pass?
Well, of the computers that I manage, we've got an OpenBSD server that never crashes (max uptime is around 6 months--when a new release comes out) and a FreeBSD server that has never crashed--max uptime has been around 140-150 days, and that was for system upgrades/hardware additions.
On the workstation side they are definitely not THAT stable, but since we've switched to XP/2K on the PC side, those PCs regularly get 60+ days of uptime. Just as a note--I had an XP computer the other day that would crash about two or three times a day. The guy that was using it kept yelling about Microsoft, etc etc etc. Turned out to be bad RAM. After switching in new RAM it's currently at 40 days uptime (not a single crash).
For some reason the Macs we have get turned off every night, so their uptime isn't an issue, but from what I hear OS X is quite stable.
I remember years ago having a conversation with an IT manager at IBM. We were talking about the inability of computer programmers to make their code foolproof. His point was that we don't see problems like this with proprietary hardware. When was the last time someone crashed their Super Nintendo? Of course, with a PC platform (or even Mac, or whatever else) there are problems of unreliability. His idea is that this is because of sloppy programming. The reason we were having this conversation is that I had a piece of software (brand new, I might add) that would not install on my computer. You would think that a reputable software company (and this was a reputable company) would test their product on at least a few systems to make sure that it would at least install! The end result was that I ended up never playing the game (not even to this day), nor have I purchased another title from that company since that time. Perhaps that is the solution to the root problem?
In order to be immortal you must be organize
Scientific American actually had an article on a similar topic. Basically, they seem to be accepting crashes as inevitable, and were focusing on systems to help computers recover from crashes faster and more reliably...
They also propose that all computer systems should have an "undo" feature built in to allow harmful changes (either due to mistakes or malice) to be easily undone...
A Minesweeper clone that doesn't suck
People upgrade for new features. That computer/OS/gizmo you have today does a lot more than the one from 10 years ago. That's a lot more code that needs to be written, and thus a lot more opportunity for errors. It's that simple.
(I'm actually ok with that. I'd rather have a moderately crashy Windows XP box capable of playing GTA:Vice City than the hypothetical alternative: a super-stable Windows95, capable only of playing "Doom 2".)
Yes, on my parents' computer, which has 2000 on it (tried Linux, it didn't work for them). I set most of the services that aren't needed to manual. Disabled Auto-update. Put it behind a router, of course. The only problem that remained was Internet Exploder; well, I just installed Mozilla with an IE theme (haven't noticed a difference). I think killing most of the services keeps it up. Haven't had a problem with it. This was done before KDE 3.1.x, so who knows, Linux might work after all.
I've crashed OS X. It wasn't even that hard, really. I just did a bit of extremely intensive stuff with Adobe InDesign and it died. I found that it was also far more resource-intensive than most other Operating Systems I use. Perhaps it's just the way I use it, but I think that the only OS I haven't been able to crash is DOS.
In order to be immortal you must be organize
The ultimate solution to the problem is to let computers write the software themselves. Give them a goal, set up evolutionary and genetic algorithms, and let them go at it on a supercomputer cluster for a few months.
Of course, you'd need to make sure the algorithms that humans wrote aren't flawed themselves, but once you got that pinned down, you would be more or less home-free.
Even if you didn't take this drastic a step, another solution would be computer-aided software burn-in. Let the computer test the software for bugs. A super-QA analysis, if you will. Log complete program traces for every trial run, and let the machine put the software through every input/output possibility.
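As a small illustration of letting the machine put the software through every input when the input space is tiny enough to enumerate: the sketch below checks a hypothetical `saturating_add` routine against a slow but obvious reference for every pair of 8-bit inputs. Both function names are made up for the example; real programs rarely have input spaces this small, which is why this kind of burn-in usually has to sample rather than enumerate.

```python
from itertools import product

def saturating_add(a: int, b: int) -> int:
    """Hypothetical routine under test: add two bytes, clamping at 255."""
    total = a + b
    return 255 if total > 255 else total

def reference_add(a: int, b: int) -> int:
    """Slow, obviously-correct reference used as the oracle."""
    return min(a + b, 255)

def exhaustive_check(func, oracle, domain):
    """Run func on every pair of inputs and log each mismatch."""
    failures = []
    for a, b in product(domain, repeat=2):
        if func(a, b) != oracle(a, b):
            failures.append((a, b))
    return failures

failures = exhaustive_check(saturating_add, reference_add, range(256))
print(f"{256 * 256} cases run, {len(failures)} failures")
```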
occultae nullus est respectus musicae - originally a Greek proverb
"How often does your 4 function pocket calculator crash?"
Well, maybe never, but...
My calc professor once put a question on an exam that was designed to crash a TI-89.
I'm sure it's harder to accomplish this for kernel level code (it's primarily OSes being pointed at right here) but you can think everything is working hunky-dory and not realize something is going wrong under the covers.
Most errors of this kind can be found with testing under tools like Valgrind or Rational's Purify. I'm sure there are others (I've heard of ParaSoft Insure++, ATOM Third Degree, CodeGuard, and ZeroFault), but the quality of these tools really matters.
The issue is that tiny errors can cause crashes intermittently, and not immediately. For example:
uninitialized memory reads -- usually not a problem, but if this value is ever actually used, it will be.
array bounds reads -- never acceptable, but depending on the structure of memory, may not always cause an immediate crash.
array bounds writes -- like ABRs, may not be immediately fatal, but these are going to crash your code sooner or later.
Since they don't always cause an immediate crash, these errors are likely to creep into released code without use of one of these tools. And if you want to know why we shouldn't always run programs in an environment that checks these kinds of things, try it once; you'll notice a speed hit of usually an order of magnitude. C/C++ is a perfectly acceptable language -- not all debugging has to be done by the compiler/interpreter or only after you notice a problem.
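To make the "not immediately fatal" point concrete, here is a toy model in Python rather than real C: a flat `bytearray` stands in for process memory, with a hypothetical 4-byte buffer placed directly in front of a 1-byte flag. The layout and all names are assumptions for illustration. The unchecked write runs past the buffer and silently flips the neighbouring flag, which is roughly the kind of thing a checker such as Valgrind or Purify is there to catch; the checked version fails at the moment of the bad write instead.

```python
# Toy flat memory: bytes 0-3 are a buffer, byte 4 is an unrelated flag.
BUF_START, BUF_SIZE = 0, 4
FLAG_ADDR = 4

def make_memory() -> bytearray:
    return bytearray(8)  # zero-filled, so the flag starts out as 0

def unchecked_write(mem: bytearray, index: int, value: int) -> None:
    """Like buf[index] = value in C: no bounds check on the buffer."""
    mem[BUF_START + index] = value

def checked_write(mem: bytearray, index: int, value: int) -> None:
    """What a bounds-checking tool or language does on every access."""
    if not 0 <= index < BUF_SIZE:
        raise IndexError(f"write to buf[{index}] is outside 0..{BUF_SIZE - 1}")
    mem[BUF_START + index] = value

mem = make_memory()
unchecked_write(mem, 4, 1)       # array bounds write: one past the end
corrupted_flag = mem[FLAG_ADDR]  # the program keeps running with a changed flag

try:
    checked_write(make_memory(), 4, 1)
    caught = False
except IndexError:
    caught = True                # the bug is reported where it happens
```

The extra comparison on every access is a small-scale picture of why running everything under such a checker costs speed.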
Anyway, hope that wasn't too pedantic....
In the world of games, especially console games, a crash immediately spoils the user's gameplay experience, and it's doubly so if you don't have a mechanism to patch games as in the PC world.
In the GameCube, crashes are alleviated by having only a thin OS layer between the hardware and the game, and restricting only a single task to be run in a single privilege level of the CPU, avoiding context switches and going back and forth between user and kernel mode which introduces complexity and can wreak havoc if malicious data is present.
Furthermore, we have a set hardware configuration, running a well defined consistent set of drivers, which are again, minimal, and this eliminates another factor that often leads to crashes in the PC world.
The most important thing though is robust software design. In our games, we all code exception handlers for the software, so that a single errant NULL pointer doesn't bring the whole thing down with a "Segmentation fault" message as PC users seem to experience with their software, but rather, we gracefully recover, perhaps immediately rolling back to the previous iteration in the game loop and "moving" the player a bit, for instance, in an FPS where the player might have entered into an area in an orientation that happens to create a divide by zero error due to numerical imprecision.
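A minimal sketch of that rollback idea, in Python rather than console code: each frame builds a new state from the previous one, and if the update raises, the loop keeps the last good state and nudges the player. The `State` fields, the `step` function, and the nudge amount are all hypothetical; the division by a distance that can hit zero stands in for the numerical-imprecision case described above.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class State:
    x: float       # player position along one axis
    frame: int

WALL_X = 5.0       # hypothetical obstacle position

def step(state: State, dx: float) -> State:
    """Advance one frame; fails when the player lands exactly on the wall."""
    new_x = state.x + dx
    _force = 1.0 / (WALL_X - new_x)   # ZeroDivisionError when new_x == WALL_X
    return replace(state, x=new_x, frame=state.frame + 1)

def safe_step(state: State, dx: float, nudge: float = 0.01) -> State:
    """Run step(); on an arithmetic fault, roll back and move the player a bit."""
    try:
        return step(state, dx)
    except ArithmeticError:
        # The previous state is untouched because State is immutable.
        return replace(state, x=state.x - nudge, frame=state.frame + 1)

state = State(x=4.0, frame=0)
state = safe_step(state, 1.0)   # would land on the wall: rolled back and nudged
state = safe_step(state, 0.5)   # normal frame
```

Keeping the state immutable is a design choice for the sketch: a failed update cannot leave the world half-modified, so rolling back is just a matter of not replacing the old value.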
In the future with CPU and memory speeds increasing, we are investigating new designs, such as microkernel based architectures where individual game entities are separate protected "processes" that communicate via some fast IPC mechanism such as shared memory or a "tuplespace", so that a bug in one entity doesn't bring the whole universe crashing to a halt, and I hope that such techniques are adopted by the general computing world.
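The isolation idea can be tried out on a desktop OS with ordinary processes. The sketch below is an assumption-laden stand-in, not a description of any console architecture: each "entity" is a separate Python interpreter started with `subprocess`, one of them deliberately faults, and the supervising code notes the failure and carries on. The entity names and scripts are invented for the example, and pipes are used instead of the shared memory or tuplespace mentioned above.

```python
import subprocess
import sys

# Hypothetical entity scripts: one behaves, one hits a divide-by-zero.
ENTITIES = {
    "player": "print('player ok')",
    "enemy": "print(1 / 0)",
}

def run_entity(name: str, source: str) -> tuple:
    """Run one entity in its own process and report whether it survived."""
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        timeout=30,
    )
    return name, result.returncode == 0, result.stdout.strip()

# The supervisor keeps going even though one entity crashes.
report = [run_entity(name, src) for name, src in ENTITIES.items()]
alive = [name for name, ok, _ in report if ok]
crashed = [name for name, ok, _ in report if not ok]
```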
-- Samir Gupta, Ph. D. Head, New Technology Research Group, Nintendo Co. Ltd., Kyoto, Japan.
People who lack facility with English generally write shitty interfaces that people loathe using, even if the code is "clean".
Some time ago I would have agreed with you, but not anymore. If media player crashes playing some video then the whole system becomes unstable and then even doing something like sending a file to the Recycle Bin freezes the UI...
--- No 16-bit support in Vista? Half of our modules still use it! ---
There are a lot of moving parts in a working Linux system (I'm talking CLI here), however, it seems to be less prone to crashing. As someone previously mentioned, software that is larger and more complex is more likely to have a bug. The point I'm getting at is that the design principles of *nix dictate many small programs to create a large working system. When a program is small it can be designed and developed with care. This leads me to my final thought: modern operating systems with GUIs are less stable because they are generally designed as large monolithic systems.
I'm going to claim that the prime reason systems with GUIs (and I'm including everyone) are unstable is because no one has come up with a rock solid base for such a system. X is not solid; Windows Explorer, Mac OS X's application manager, no one has it right.
The one thing I am leaving out is that drivers also tend to be a major cause of instability. I cannot run the NVIDIA driver on my Gentoo box; certain USB events can bring a system to a screeching halt. What needs to happen is better design around the unstable interfaces, such that in the worst case scenario, things can still be recovered.
Fear trumps hope and ignorance trumps both
My Windows XP box, which is my fileserver, has been up for 5 months so far.
My OS X box, which I use for web browsing and word processing, crashes about once every three days.
Now, I certainly have some bones to pick with Microsoft, but Apple is no better.
Best Buy can have you arrested
Instability is inevitable for fast evolution. A stable system means it's not evolving fast enough, or evolution is slow.
Much software has been evolving so fast that there's been no time to perfect the existing features before adding new ones. At some point in the lifetime of an individual piece of software, it reaches a point where it's somewhat "stable" in the sense that no more major features are needed. For example, TeX reached its relative maturity during the 80s and IIRC, there's no known bug at this point.
If all software is given enough time, it will all reach that kind of maturity. The problem is not all of it can survive that long - usually it becomes obsolete before it becomes stable...
Use OpenZaurus and while crashes still appear (I assume 3.2 will eventually, though I haven't had a full crash since it first came out), crashes will not lose all your data, since it's written to flash.
Also, my Linux box hasn't crashed this year, and I can't recall any crashes last ye-- no, wait, there was one slew, but it was an icky driver which I got rid of. I'd say a pretty good track record for a system built almost entirely from CVS.
Can't remember any crashes this year or last on any other Linux boxes I manage that I can think of (8 boxes off the top of my head).
Accept it. It's a fact of nature.
When can we finally give up the FUD of "MS crashes all the time"? Anyone who has used a later MS OS (Win2k or XP) can easily see they crash very rarely. I have had my Redhat install have more problems than my Windows install in the past 6 months, and on the MS system most of the problems have been 3rd party software while on the Linux most of the problems have been the OS itself. The reason systems crash is that there are many pieces, written by many different people, interacting with each other. This is the same whether the OS is Linux or Windows. The harping on the instability of Windows does nothing but hurt the Linux cause, since anyone who actually uses a newer version of Windows knows that the claim has no basis in reality.
"Information wants to be expensive" - Stewart Brand, the same guy who said "Information wants to be free"
I think this is basically the right answer.
A couple of months ago, the company I worked for spent a lot of time and effort developing a robust testing methodology. We had a software product that, through blood, sweat, and tears, would not crash unless you basically blasted the hardware in some way.
But that led to two problems. First, we only had so many people working, and resources spent testing and bugfixing were not being used to add new features. Second, the time it took to get it that robust delayed the product's release beyond the point where we could recover the investment. [Time developing] * [Cost of operating] was greater than [expected number of units sold] * [price per unit].
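The inequality in the brackets is easy to put numbers on. The figures below are invented purely for illustration (the post gives none): a hypothetical monthly operating cost, unit price, and sales forecast, with the development time stretched by extra testing.

```python
def development_cost(months: float, monthly_operating_cost: float) -> float:
    """[Time developing] * [Cost of operating]."""
    return months * monthly_operating_cost

def expected_revenue(units_sold: int, price_per_unit: float) -> float:
    """[Expected number of units sold] * [price per unit]."""
    return units_sold * price_per_unit

def recovers_investment(months, monthly_cost, units, price) -> bool:
    """True when forecast revenue covers what development cost."""
    return expected_revenue(units, price) >= development_cost(months, monthly_cost)

# Hypothetical numbers: 50,000 per month to operate, 2,000 units at 300 each.
on_schedule = recovers_investment(months=10, monthly_cost=50_000, units=2_000, price=300)
after_delay = recovers_investment(months=14, monthly_cost=50_000, units=2_000, price=300)
```

With these made-up figures the product breaks even if it ships on schedule, while four extra months of hardening push the cost past the forecast revenue.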
What ended up happening was that we lacked the features to justify the price and number of units we needed to sell to cover the cost of developing it. We had no bugs -- and we could be certain of it -- that would crash the machine.
As of last month, the company could no longer afford to pay me. I'm not there any more.
The moral of the story is that trying to make a bug-free product will bankrupt your company, especially a startup. Software tools have improved, but the benefit largely goes towards adding new whiz-bang features that sell the product for more money, not to being able to fix more bugs.
What we should do as engineers and managers of software products is to not be afraid of getting the product out the door with a few bugs in it if we want our company to do well; this business reality is ultimately why bugs will be a big part of software for the foreseeable future.
"My OS X box, which I use for web browsing and word processing, crashes about once every three days."
The Ti PowerBook G4 I am writing this post on is running Mac OS X 10.2.x. It goes in and out of sleep on an irregular basis, and not always when it is idle. I swap PCMCIA cards in and out. It hops from network to network. I do a lot more than browsing and word processing.
According to my Konfabulator uptime widget, I have 83 days, 23 hours, 20 minutes. My load average at the moment is 1.7. It has not been rebooted since I installed OS X (I did it myself after buying it just for messing around purposes).
You, sir, are either lying, have bad hardware, or have severely corrupted your installation. This operating system (which is BSD) is solid as a rock.
Join Tor today!
"The current issue of Scientific American states that 51% of crashes are due to user error. 15%=software error. 34%=hardware error. Refer to article for further info."
You made a little "user error" there yourself-- the article says that 34%=software error and 15%=hardware error.
Oh, and those figures are just for Web applications, not software applications in general.
It's an interesting article. Unfortunately, they're not very clear about what constitutes a "user error." I've filled out Web forms that gave me an "error" when I included hyphens in my phone number or credit card number. That's far from an error, it's just poor user interface design.
In my opinion, something the user does should never cause a program or operating system crash. If this can occur, it is the developer who is at fault, not the user.
Apple's Human Interface Guidelines are a nice introduction to user-fault tolerance, even if you're developing for other platforms.
Computers crash (and have any number of other problems) largely
because almost all software is still developed using third-generation
("high-level") languages. These languages place on the programmer
the burden of such fiddly details as allocating and freeing memory
and checking the size of allocated memory to see that it's adequate
for the data being copied in.
*Most* of the time when an application crashes seemingly at random,
it's a memory allocation problem of one kind or another: a buffer
that was allocated too small and gets overrun, or a pointer error,
or something of that nature. When an application (or your whole
system) grows more sluggish the longer you leave it running, that's
usually a memory leak: something was allocated and not released
properly -- repeatedly. All of these problems result from a lack
of excruciating vigilance on the part of the programmers when using
a language that requires it. In a large project, maintaining that
ceaseless caution is a nightmarish prospect.
Languages (both interpreted and compiled languages) have been around
for over a decade that handle these things, freeing the programmer
to concentrate on developing the more high-level features of the
software, but because this checking imposes some overhead (in terms
mostly of CPU time and sometimes some memory footprint), they don't
get used for most applications. Yet.
The time is coming, though. The value of VHLLs is beginning to be
recognised, *finally*. When software is written in a language with
built-in memory management, problems like segmentation faults (core
dumps in Unix; in the Windows world these are known as Illegal
Operations, formerly known as General Protection Faults) and buffer
overruns go away entirely.
Add proper garbage collection (not reference counting like Perl5
does, but real gc, which I hope we will get in Perl6), and you
also dispense with memory leaks once and for all.
It's coming. Applications are *beginning* to be developed in this
next generation of languages, but it takes time, because all the
existing apps are mostly C and C++, and you have to throw them out
and start over, which nobody wants to do for obvious reasons.
There will of course always be room for a certain amount of
inherently low-level code written in C or one of its kin: code
that absolutely can't spare a nanosecond per run, code that has
to run on the bare metal (kernels, bootloaders, ...), and code
needed to bootstrap the VHLL tools (compilers and whatnot). But
when C is no more common than assembly language is today, then
you'll be done with random crashes.
Applications will of course still have bugs -- circumstances
wherein they don't perform as they ought. And you'll still have
hangs, because nobody's figured out how to design a compiler or
interpreter that can detect an infinite loop, and nobody except
Mel[1] has coded up an implementation for completing an infinite
loop and passing on to what follows. Perhaps quantum computing
will one day change this, but that's outside of the foreseeable
future. But crashes of the sort where the app suddenly terminates
should be mostly a thing of the past within twenty years, ten if
we're quite lucky.
[1] Google for "The Story of Mel, A Real Programmer".
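On the reference-counting versus "real gc" point in this comment, CPython happens to make the difference easy to see, since it pairs reference counting with a separate cycle collector. The sketch below assumes CPython (other Python implementations manage memory differently) and uses a made-up `Node` class: two objects that point at each other are not reclaimed by reference counting alone, and only go away once the cycle collector runs.

```python
import gc
import weakref

class Node:
    """Made-up class for the example; holds a reference to a partner."""
    def __init__(self):
        self.partner = None

gc.disable()                  # leave only reference counting active

a, b = Node(), Node()
a.partner, b.partner = b, a   # reference cycle: each keeps the other alive
probe = weakref.ref(a)        # lets us see whether the object still exists

del a, b                      # no names left, but the counts never reach zero
leaked = probe() is not None  # still allocated under pure reference counting

gc.collect()                  # run the cycle-detecting collector once
reclaimed = probe() is None

gc.enable()
```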
Cut that out, or I will ship you to Norilsk in a box.
I'm willing to concede that the codebase was considerably smaller. It had to be, in order to produce an executable that would fit in 800K (the size of a 3.5" double-density floppy) and would run reasonably well on a 1-MHz 8-bit processor with as little as 128K of RAM...but I don't find myself doing sufficiently more advanced stuff in Word or Excel than I used to do in AppleWorks (actually, AppleWorks was probably doing more sophisticated stuff with UltraMacros added to it). I would be willing to wager that 95% of Office users use no more than 5-10% of its features. All that extra code that keeps getting added in with every new release means there's that much less time spent making sure the core functionality (and all of the chrome added in previous releases) is bug-free.
(I'll admit that I haven't had much trouble with Office...but then you've noticed that I don't push it particularly hard either.)
20 January 2017: the End of an Error.
...because we aren't willing to wait for, or pay for, software that has been adequately tested to any reasonable level of reliability.
With something like Windows XP, no amount of testing could eliminate every conceivable bug, but there is no doubt in my mind that Microsoft, along with almost every other software company in the world, rushes poorly designed, inadequately tested products to market to meet customer demand.
Remember, a product's success is due largely to a check list of features created by the marketing people. A product with 90% reliability and 100 features will sell better than a product with 98% reliability and 10 features. Otherwise, how can you explain the success of Microsoft Office? OK, bad example, MS Office is successful because it's been bundled with so much hardware, but you see my point.
The bottom line is computers are now a commodity. They have become so ubiquitous and cheap that I can go down to the Salvation Army and purchase what would have been considered a supercomputer 10 years ago, for $50. Software is quickly reaching the same state. How much software can you buy for $10 or less? A lot. And not all of it is bad, though most is. On the other hand, you can drop hundreds or thousands of dollars on software that is just as quirky, hard to use and even just as buggy.
Here's the thing that always interested me. Why don't console games crash? I'm sure they do sometimes, but I've got a Dreamcast and about 50 games. I've seen a small bug here and there, but I've never seen the machine blue-screen or whatever DC's do when the OS lunches itself. I realize that the standardized hardware platform has a lot to do with it, but games are every bit as complex as other software, perhaps more so. So why don't these games crash? Well, if they did, they would never sell. I'm sure there would be /. articles and Ars Technica articles for weeks if a console game came out that crashed, but when PC games are released that have those kinds of problems, it's hardly news.
Kinda makes me wonder...
You are in a maze of twisty little passages, all alike.
This reminds me of a story I read in the internal magazine of a telecommunications equipment supplier that I used to work for. It was about an international toll switch somewhere in the U.K. that had been up for 17 years (or something extreme like that.) Furthermore, this included having all of its hardware upgraded and replaced. Twice.
Just stop and think about that for a while in PC terms... "I replaced my motherboard with the power on without rebooting my system, while it was serving 10,000 web pages a second."
Granted, this is a higher level of hardware with full redundancy, but it still boggles my mind.
Software is incomprehensibly fragile -- any single thing can cause a crash, taking the whole system or application down. And even those critical parts of things like airplanes have multiple redundancies, something that's hard to build into software. You can do things like catching exceptions, but you typically can't recover as gracefully as if there was never a problem at all.
The shuttle is actually not a bad analogy -- it's also very fragile due to the stresses it endures. And we've effectively had two crashes in 100 runs. Most software is more stable than that.
These are true statements:
-In our server room, which, admittedly, is a little crowded, a Windows 95 box was disconnected from the network but accidentally left running. It stayed up for more than a year. No load, of course, but it stayed up. It made the hair on my neck stand on end.
-In the same server room, a clone PC running Suse Linux 7.0 ran for just short of two years without a reboot. It would have gone longer had the old, 2 gig hard disk not died a clunking death. Fortunately, the web data was on a different disk. We loaded another system drive and had our departmental web/Samba server up in minutes.
-We have a Compaq Prosignia 200 running NT4 and Raptor 6.0 Firewall. It has seen uptimes exceeding 9 months on more than one occasion. Would have gone longer, I think, were it not for some memory leaks in the Raptor management console snap-in.
I point these things out so as to ask the question: how stable is stable? Hey, *nix has been my passion for years, but I've seen for myself that NT4 and, now, Windows 2000, can perform well if they are set up by someone who knows what s/he is doing. I believe impressive uptimes can be attributed to many things, but I do not always blame the OS code for the bad things that happen. We all know what bad firmware and drivers can do. I'll take NT4 on an Alphaserver over Linux on a Packard-Bell any day.
Of course, Linux on the Alphaserver is better yet . . . . : )
It's only funny until someone gets hurt. Then, it's hilarious.
Yes, NT5+ is very stable. MS is working on the driver problem. SLAM is a tool for verifying drivers. Given a requirement, e.g., after acquiring a kernel lock the driver must release it exactly once on all control paths, and some driver source code, SLAM can find all the ways the driver can fail the requirement. They have specifications for various driver types and are using them to test some drivers. It's a research project by the Software Development Tools group in MSR, but they're working on getting it stable and powerful enough to verify more drivers. If they can get it to work well enough, they'll supply it to hardware vendors.
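To give a feel for the kind of requirement described (release the lock exactly once on every control path after acquiring it), here is a toy checker. It is not SLAM's algorithm or interface, just a brute-force walk over a small hand-written control-flow graph; the graph, the node names, and the `ACQUIRE`/`RELEASE` labels are all invented for the example.

```python
# Each node: (action, successors). A hypothetical driver routine with an
# early-return error path that forgets to release the lock.
CFG = {
    "entry":  ("ACQUIRE", ["check"]),
    "check":  (None,      ["work", "error"]),
    "work":   (None,      ["unlock"]),
    "unlock": ("RELEASE", ["exit"]),
    "error":  (None,      ["exit"]),      # bug: no RELEASE on this path
    "exit":   (None,      []),
}

def all_paths(cfg, node="entry", prefix=()):
    """Yield every path from node to a node with no successors (acyclic CFG)."""
    path = prefix + (node,)
    successors = cfg[node][1]
    if not successors:
        yield path
    for nxt in successors:
        yield from all_paths(cfg, nxt, path)

def violates(cfg, path):
    """True if the lock is not released exactly once after being acquired."""
    held = 0
    for node in path:
        action = cfg[node][0]
        if action == "ACQUIRE":
            held += 1
        elif action == "RELEASE":
            held -= 1
        if held < 0 or held > 1:
            return True       # released without holding, or acquired twice
    return held != 0          # still holding the lock at exit

bad_paths = [p for p in all_paths(CFG) if violates(CFG, p)]
```

Enumerating paths only works here because the toy graph is tiny and has no loops; real driver code needs something much smarter, which is presumably where the research comes in.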
I don't remember having any games screw up my system since I stopped playing half-life. I built a new system a couple months ago and it hasn't crashed once. ;)
I had a Win98 system last a bit past 30 days with regular use once and it was terribly hosed by the time I rebooted. Win2k or XP can last until your power goes out, you kick the surge protector, or you need to reboot to install drivers/software/hotfixes.
Although I certainly have no problems with Apple or Macs in general, I do have a problem with their user interfaces. Personally, I don't think that denying the user the option of changing any setting that could cause a malfunction is the answer. The reason? Well, it's pretty simple: when set properly, those same settings give flexibility, added functionality, and performance (at least one, sometimes two, often all three of the above).
Actually, "syntax errors" like this DO cause a problem for wetware systems -- they cause the brain (well, mine at least) to kind of glaze over and take the remainder of the sentence/thought much less seriously. Kind of like aborting/returning out of a subroutine.
Here in the Slashdot world of "definately" and "righting", I've learned that any posted comment that makes high-school-level grammatical or spelling errors is not worth my time and I immediately skip the post. I've been doing this quite rigorously lately -- blah blah blah "seperate" PAGE DOWN.
OK now, everybody nod and think I'm talking about someone else's posts ...
One simple rule for its versus it's
My experience is quite different. I have 7 computers that run 24/7 playing a popular MMORPG.*
The systems themselves are a collection of spare parts and old workstations purchased off eBay. At the low end is a typically configured Pentium II 400 and at the high end is a typically configured Duron 900. All of them are running Windows 98 SE.
The game and the scripts keep all the systems at or very near 100% cpu utilization at nearly all times. The only time they are not working is when the game servers are down or my internet connection is down. Both of those are not very frequent.
Even under that somewhat heavy load, I go months without rebooting them. In fact, the only time they are rebooted is when I lose power or I'm leaving on an extended vacation. One of them is an exception to that rule and has blue screened on occasion, perhaps 3 or 4 times in the past year.
Of course, on the system I actually use (not one of the seven described above), I left Windows 98 a long time ago, and I remember being plagued with BSODs, lock-ups, and constant reboots to keep things working.
What explains these two opposing performance comparisons? I have no idea really, but I have a guess...
On systems I use, I am constantly adding/installing software and hardware. On the systems that just macro 24/7, I don't do any of that. There is nothing but the bare essentials installed. Perhaps that has something to do with it.
Anyhow, back to the main point, I disagree that Windows based systems crash even if they are not doing anything. I have a whole bunch that work hard all day and they don't have that problem.
*No, that wasn't a typo, scripts on the computers "play" the game. It is known as macroing in the MMORPG world.
About your sig: Actually, I currently write games on a machine with about 1.5K of memory and an 895kHz CPU. And I am grateful.
--JoeProgram Intellivision!
"All programs (for the most part) must be written by people. ... Computers crash because people cant catch that one little fatal error in 10,000 lines of code."
All bridges (for the most part) must be built by people. Bridges collapse because people can't catch that one little fatal error in one or two million components.
The shit coders put out there, I swear... The reason software crashes is that by-and-large it's hacked together, not engineered. You hack a bridge together, and yes, it'll fail. You engineer software, and yes, it will run reliably. It's not fun to do - no easter eggs, no cool tricks, no cramming features in weeks before ship.
I'm stunned at the amount of code that goes out that was written by interns, by inexperienced coders, by people that just don't have a clue. The software industry really has no concept of best practices, no leadership, no authority body. The fact that buffer overflows still happen is stunning.
It's not small projects that work well because out of dumb luck they happen to not fail, or larger projects that work okay because we have 34,000 people looking at the code. If that's 'best practices', then we're doomed.
"Mozilla (www.mozilla.org) has a feedback option to help them debug, many software companies are including this."
Uh huh. Let's translate that to my car: "Hi. Yeah, I'd like to report a bug. I have a Saturn Ion, version 1.1v4. Yeah, when I turn on the left turn signal and then turn on the lights, the car catches on fire. You might want to fix that in the next version. Just thought you might want to know. Bye."
"Windows 9x actually has a bug in it that would lock the computer after 46 days of uptime, but it took years to catch it because no one ever got close to that mark."
Bullshit, bullshit, bullshit. This urban legend deserved to die years ago.
I ran several Windows95 OSR2 systems with uptimes approaching 90+ days, and had no problems with them locking up. Sure, 9x wasn't HAPPY with this, and if you ran a lot of applications odds are you won't hit this, but I did it many times in my former employment.
When the '45 days' (as I heard it first) rumor started going around, I set up a bunch of idle 95 machines for fun, and on days 45-50 watched for anything going on. Not one crashed.
Hell, for all I know, Microsoft themselves are reporting this, just to cover their asses based on some average uptime limit they worked out, but I will swear on a stack of bibles that I've had Win95 machines go at least twice this supposed limit without locking up.
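For what it's worth, the figure usually attached to this story is 49.7 days rather than 45 or 46, and that number is what you get if you assume an unsigned 32-bit counter of milliseconds since boot. The arithmetic below only shows where such a number would come from under that assumption; it says nothing about whether any given machine actually locked up.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000   # 86,400,000 milliseconds in a day
COUNTER_BITS = 32                  # assumed width of the tick counter

def days_until_wrap(bits: int = COUNTER_BITS) -> float:
    """Days before an unsigned millisecond counter of this width overflows."""
    return (2 ** bits) / MS_PER_DAY

def tick_after(ms_since_boot: int, bits: int = COUNTER_BITS) -> int:
    """Value the counter would show, wrapping around at 2**bits."""
    return ms_since_boot % (2 ** bits)

wrap_days = days_until_wrap()          # a little under 50 days
just_after = tick_after(2 ** 32 + 5)   # counter has restarted near zero
```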
Endless arguments over trivial contradictions in books written by ignorant savages to explain thunder in the dark.
Architects and engineers use extremely detailed drawings. Have you ever taken any drafting courses in high school or college? Every piece, even the size of every screw, is detailed as accurately as possible. It takes forever to get anything done because the precision is more important. It drives some people like myself crazy.
The blueprint is the actual prototype of the product being designed.
The problem is that if you document every step and algorithm in exact detail, you will spend weeks, months, and yes, years without a single line of code!
This is unacceptable in today's business world, where all the projects are due yesterday and your bosses demand to know what percentage of the code has been developed. If you spend a month planning and not a single line of code is developed, you're canned.
My father took over a project from a clueless IT manager who got the job because she slept with the CIO. Anyway, she went to a seminar which claimed that flowcharting everything would be the wave of the future. She then had all the programmers draft every single algorithm, down to the very if statements themselves, on paper. After 4 months and not a single line of code, my old man took over. From there he finished the project within 3 weeks!
My point is that drafting programs is too time consuming. In a way your drawing is the program, and changes can be made as you go. It's essential to have good flowcharts and notes, but they need to be generalized. If there is an error in it you can delete the line and fix it. In engineering you would have to disassemble the actual product and redesign it. Because that would cost time and money, it is not accepted. In software that limitation is not there, or is not as severe.
UML tries to be the blueprint of all software programs but instead is only used to explain certain subsystems and algorithms. Mostly flowcharts are used so all the developers have a sense on how the program will work and how to invoke different pieces of the program.
I do not think this is going to change unless there is a quick and easy way to debug UML charts. Logic errors are killers, and if the chart is perfect I suppose you can compile the UML directly into the language of choice.
Hmmm, in fact this might be the way to do it in the future.
http://saveie6.com/
My mail/web server would run fine off of something ridiculously small, like a Sharp Zaurus. Here are my requirements, and I will pay for one if it is available.
Yes, I could probably build this with PC104 components, but I want a pre-built product, and I'm willing to pay for it (maybe $300 - $400).
Interesting.
I play RTCW quite a bit on my WinXP box with no issues. RTCW occasionally crashes, and I have to hit CTRL-ALT-DEL to bring up task manager and kill it, but the system remains stable.
When I first built this box I had some issues: after a while it would lock up. Turned out it was because the video card was overheating. The system itself wasn't locking up, just the video card. Put the system in a new Antec SX-835II case with better cooling and haven't had a problem since.
While the constraints may be cost etc perhaps something I took from a PL/1 book - ;-0 years ago may be relevant.
'The Meaning of Correctness
1. The program contains no syntax errors that can be detected by the compiler.
2. As for 1 and it can be run.
3. There exists a set of test data for which the program will yield the correct answer
4. For a typical (i.e. reasonable) set of data the program returns the right answer
5. For a deliberately difficult set of data the program returns the right answer.
6. For all sets of data, valid with respect to the specification, the program returns the right answer
7. For all possible sets of valid test data, and for all likely conditions of erroneous input the program returns a correct (or at least reasonable) answer.
8. For all possible input the program gives the correct, or reasonable answers.
Most programmers work at level 3 or 4
Users at 8.'
(I am sorry but I have lost the reference to the original book)
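To make the gap between level 3 and level 7 a bit more concrete, here is a small sketch. The function `parse_percentage` and the choice of test inputs are invented for illustration; none of it comes from the book.

```python
def parse_percentage(text):
    """Parse a whole-number percentage such as '42' or ' 42% ' into 0..100.

    Anything else raises ValueError rather than crashing or guessing.
    """
    if not isinstance(text, str):
        raise ValueError("expected a string")
    cleaned = text.strip()
    if cleaned.endswith("%"):
        cleaned = cleaned[:-1].strip()
    # isascii() rejects non-ASCII digit characters that isdigit() accepts.
    if not cleaned.isascii() or not cleaned.isdigit():
        raise ValueError(f"not a whole number: {text!r}")
    value = int(cleaned)
    if value > 100:
        raise ValueError(f"out of range: {value}")
    return value


def check_levels():
    # Level 3: there exists one input that gives the right answer.
    assert parse_percentage("42") == 42
    # Level 4: typical, reasonable inputs.
    assert parse_percentage("7%") == 7
    # Level 5: deliberately awkward but still valid inputs.
    assert parse_percentage("0") == 0
    assert parse_percentage(" 100 % ") == 100
    assert parse_percentage("007") == 7
    # Level 7: likely erroneous input gets a reasonable answer (a clear error).
    for bad in ["", "abc", "-5", "101", "4 2", "1e3", "42%%", "\u0664\u0662", None]:
        try:
            parse_percentage(bad)
        except ValueError:
            continue
        raise AssertionError(f"accepted bad input: {bad!r}")


if __name__ == "__main__":
    check_levels()
    print("all levels checked")
```

Stopping after the first assert is level 3. Most of the effort, and most of the lines, go into the cases below it.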
In the vast majority of cases, it's simply not economic to release bug-free code.
1. Any programmer knows that 90% of the code is written in the first 90% of the time, and the other 10% of the code is written in the other 90% of the time. (no typo). That is to say, it takes a lot more time, effort, and hence money, to move a project from "working well" to "working perfectly".
2. Many software companies these days make very little profit on the 1.0 release of their software, and make huge amounts of money through ongoing support charges. Microsoft is a classic example of this type of company.
3. If you release a piece of software that works really well, does everything the users want, and never crashes or causes trouble, then you may as well pack up shop and go out of business quietly. The unfortunate truth is that nobody is going to buy version 2 if they can do everything they want with version 1, and they're not getting constantly frustrated by crashes. The only carrot you have in this situation is to think up some really great ideas for version 2 in order to encourage people to upgrade - In fact, some of those ideas may have been deliberately left out of version 1 just so that they could be added later. Version 3 is more difficult still, and version 5 is right out. By comparison - how many versions of office are we up to now ?
A notable exception to this business model is the games writers. Companies like Valve and id Software consistently produce very near to bug-free code that works well and generally impresses the masses.
In all the years since half-life was released, there have been relatively few patches and fixes, and many of those were to prevent ingenious new methods of cheating, or to add support for hardware that didn't exist when the game was first released. The unreal engine had a similar history.
People buy new games because they crave the excitement or challenge of exploring and interacting with it. That's not something that could really be said about excel or word, so those sorts of products have to rely on the "draw out the profit over many releases" strategy described above.
Another (big) factor is people's expectations - most people expect that word will crash from time to time, and given microsoft's past history, they have little reason to expect that to change. On the other hand, gamers have an expectation that the latest game from id software will be as solid as a rock, and that the few problems that do crop up after the release will be fixed quickly.
If a games company didn't spend that "other" 90% on the last 10% of development, and released something that crashed as often as explorer, their reputation would be mud within days, and people would stop buying their games.
And lastly, choice.
People have a choice as to which games they want to buy. It's a competitive market out there, with many people having little disposable income to spend on games. On the other hand, despite what linux advocates (I can't believe I'm saying this on slashdot) say, most people use MS apps and operating systems because they don't have a choice - say due to corporate rules.
You might think that it is the end user that gets the sharp end of the stick here, but the people that really get screwed are the dedicated and talented programmers, who are working for companies that don't care too much if they release code before it has been fully tested.
UNIX had the opposite philosophy. The hardware was expected to work perfectly. This led to situations where a DEC operating system would run reliably on a particular machine for months at a time and UNIX would crash within minutes on the same hardware.
Mea navis aericumbens anguillis abundat
I got fed up with just that sort of thing and changed computing platform. I'm not saying that the Mac never crashes, but it's certainly been a massive, massive step in the right direction.
A quick trip to the terminal reports my uptime as "11:35AM up 57 days, 12:42..." This is by no means a long time by Unix standards, but for a laptop (iBook 600MHz) that I use every day, sleeping, waking, starting and stopping multiple programs, working on all sorts of stuff, burning CDs, browsing the net etc, I'd say it was very good.
The longest I could go on my Windows 2000 box before I'd have to reset was about a week - it wouldn't crash, it would just get confused and start swapping icon images over, so Word would have the Excel icon, and so on.
The only time I reboot my iBook is for system updates. Very few programs "Unexpectedly Quit" on me (Camino used to do it occasionally, every 2 weeks or so, but I'm using Safari right now). I've never had a kernel panic in 10.2.x (I had two in 10.1.5, but I traced it to the well known Classic environment and a USB device panic bug that was fixed).
If you want your software to crash less, buy a Mac.
I agree completely... This is the same kind of thinking that people use to try to outlaw guns... "If someone can use it to commit a crime, we should just eliminate them!".
I would say that poor development, insufficient design, (obviously) insufficient testing and a focus on features rather than security are MUCH more to blame for software quality issues than which language was chosen for the implementation.
I still think we should be able to moderate the whole article as a Troll...
T
---- It puts the lotion on its skin or else it gets the hose again. It does this whenever it's told.
It's interesting how little has really changed in the past 5 years...
Impossible.
How can a "user error" cause a crash? Software should do proper bounds checking and should act appropriately (which may mean giving an error message) no matter what input it is given.
About the only crash that I can imagine really being due to user error would be the user killing the process with killall or pkill or its moral equivalent.
Other than that, it's just bad bounds checking, and blaming it on user error is really bad form.
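A minimal sketch of what that looks like in practice, in Python rather than C. The `pick_item` function and its error messages are hypothetical, made up for this example:

```python
from typing import Optional, Sequence


def pick_item(items: Sequence[str], raw_index: str) -> Optional[str]:
    """Return items[index] for a user-typed index, or None after an error message."""
    try:
        index = int(raw_index)
    except ValueError:
        print(f"error: {raw_index!r} is not a number")
        return None
    # Explicit bounds check. Without it, a negative index would silently
    # wrap around in Python, and in C it would read memory it shouldn't.
    if not 0 <= index < len(items):
        print(f"error: index {index} is out of range (have {len(items)} items)")
        return None
    return items[index]


if __name__ == "__main__":
    menu = ["open", "save", "quit"]
    print(pick_item(menu, "1"))    # prints "save"
    print(pick_item(menu, "3"))    # error message, then None
    print(pick_item(menu, "-1"))   # error message, then None
    print(pick_item(menu, "abc"))  # error message, then None
```

Whatever the user types, the worst outcome is an error message. Nothing here is clever. It is just the checking that gets skipped when the deadline is close.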
Part of the problem IMNSHO is the commodity desktop. There are so many machines and they are all cheap, and it's more important to get the work done than it is to make sure the crash doesn't ever happen again.
On real systems, if the system crashes, crash dumps are sent off to the OS vendor and they track down the problem and fix it. I know, we have had to collect and send off crash dumps in the past.
Each round of that makes the system more stable.
That's one of the advantages of Linux, and why there are some systems that don't crash (my linux boxes pretty much only crash when the power goes out, and the UPS battery drains). That is, OSs like Linux and BSD are used in real environments and there are people committed to fixing the problems... so even the lowly common desktop user reaps the benefits.
See, there is the difference. Windows, even the "server" versions, grew out of a desktop OS with a desktop way of doing things: "Oh the server crashed, well let's reboot and hope it doesn't happen again". Linux and BSD come from the land of the server down to the desktop: "Oh the server crashed? Get DEC on the phone" or "Get out those crash dumps".
-Steve
"I opened my eyes, and everything went dark again"
I think this brings up a good point. Hardware may have improved, software development tools may have improved, but the people writing software have gotten much worse. A few years ago most people who were in the computer industry were there because they knew something. Now they are there because they wanted money, some HR droid picked their CV out of a pile because of the acronyms, and some manager does not know enough to fire them. Layoffs haven't helped either; generally the knowledgeable people with higher salaries get booted first. Security vulnerabilities are up (including old stuff that has not been patched) and successful projects are down.
You got me into this! You were the ideologue! I'm only a poor assassin! - Twenty evocations, Bruce Sterling
As someone said before -- no product liability -- you have to pay money just to report a bug ...
Training of Software Engineers. With point and click interfaces you have people with an average reading ability of a 5th grader writing code. Even hinting that someone wasn't a good writer of code was considered "unprofessional" at some workplaces (i.e. -- you are not a 'team player').
Capitalism -- it's not cost effective to fix bugs until a customer finds them.
Even in code for Secure OS's under Common Criteria CAPP/LSPP, vendors aren't required to fix bugs that are not discovered by the independent evaluator or the customer. So even if the product manager knows of bugs in the OS that is intended for 'high security' government projects, there is no law saying he has to list them or fix them (unless they are found by a 3rd party or the customer). Spending time fixing bugs that are NOT found by the customer is not only not cost-effective, it is considered not working on "assigned priorities" and can be grounds for lower reviews.
This isn't pessimism -- it's reality. Quality doesn't pay when you can sell customers faulty products then charge the customers to fix the faulty product you sold them in the first place -- one might argue that it pays to have more bugs in the code -- you can charge more for service contracts and rack up more incidents that you then charge the customer, per incident, to handle.
When I was in college in Computer Science (how many programmers today have a formal degree in Computers, vs. say, a liberal arts degree?), sophomore year, University of Midwest - CS201 - required for Computer Science majors -- beginning assembly language in Compass (CDC assembler).
The price of perfection is taught early -- an early lesson was when for a final project we were to work with 2-3 other people to make a final program. The deadline was approaching and our program still wasn't running. Turning it in late was a letter grade drop/day. Two of us felt we were close and didn't want to turn in a non-running program. The third wanted to turn it in. They also felt that they'd done their part and there were no problems in it.
The third turned in the project with his name on it. My partner and I spent another day cleaning up his code to get it to work and turned it in. We got a "C" on the project, with a downgrade for bad coding practice in his section of the code and being a day late. He got a "B" even though it didn't work. In the final grade both he and my partner got "D"s while I got a "C", which sorta sucked for my major -- but it turns out that 60% of the class got "D"s and "E"s. A big stink was made about the course material being too difficult, and the teacher made a public 'booboo' comment: "It was the same material he'd taught before, it was just an exceptionally dumb class." Major ire of parents.
Anyone who got a "D" or "E" had it stricken from their academic record. It was the only "C" I got in my comp-sci curriculum (str8 A's in 300 level and above classes). But on that project, I learned that deadlines were more important than code quality.
Spin forward 15 years -- at a small startup before Xmas. The deadline for a demo was approaching, and another team member and I had parties to go to that evening. He was programming a DSP chip (he was a PhD wizard), and I was handling the drivers on the 286 DOS box. I checked my code backwards and forwards and he swore it couldn't be his stuff. Finally, I displayed the output he was sending and it was 'wrong'. Unfortunately, my party was out of town and I'd already missed the deadline for getting there, because it was emphasized to me how important the project was to complete before leaving. When the problem was discovered in his code -- guess what -- he couldn't stay to fix it (I didn't know anything about the DSP chip he was using) because, the VP told me, he was married and his wife was gonna leave him if he missed the party (I don't think he was serious, but maybe). I had no such excuse -- only a partner who went to the party alone.
Again -- what do I learn? Personal relationships take precedence over product and code quality. So far we have code quality below deadlines and below personal relationships (though that has more or less disappeared in the modern world).
more later...
-l
The core of the problem was delineated in the book "Weird Ideas That Work: 11 1/2 Practices for Promoting, Managing, and Sustaining Innovation". In it the author makes the main point -- that those people who are most creative are the people who don't do things the "normal way". They are the 'loners' -- the 'slow adopters of company culture'. They aren't the team players and they are slow to be programmed with the company way of doing things. As a result, they see problems differently than those that have been trained in the "correct way" to do things.
Those who spend time going to lunch, drinking beer together, palling around together -- they begin to think alike -- they develop synergy -- but they also develop a closed system. The ones who don't pal around come up with the completely off-the-wall ways of doing things because they haven't been indoctrinated into the 'normal way' of doing things. Quite often these ideas are shot down because of their eccentricity. But Steve Jobs's personal computer idea, which he presented to HP and which was shot down by corporate culture, was a brilliant success. The author gives countless examples of the most brilliant people generally not being very good with "people skills".
A corollary of this is that those who push for perfection far past the 'norm' are going to be unpopular outsiders -- they are the nit-pickers, the ones who aren't team players. Again, they might be the ones that would nit-pick the code to perfection, given the chance, but the larger group says "enough" -- it's "good enough, it boots, let's ship it".
In both instances the people most likely to increase quality in software are those that have the least political clout and are often least liked by their peers. Their peers often feel that the 'nitpicker' has a superiority complex and is overly prideful, and they sometimes go out of their way to sabotage work that might otherwise have turned the company around and saved millions.
I specifically was involved in a group who had to choose between 2 vendors of Microsoft compatible software. I became the lone supporter of company "B". I was adamantly opposed to "A" for reasons I couldn't articulate at the time -- my gut told me "A" was untrustworthy but I couldn't tell why. I was overruled, and 4-5 months into the project "A" sued MS for non-cooperation, effectively killing our project. It was too late to go with company "B", whose price had doubled now that they were the only game in town. It turns out "A" had been having trouble with MS all through the negotiations with us, but no one picked up on it. Reminding anyone of the decision made me decidedly unpopular. But it was precisely because I hadn't gone out and been wined and dined by "A" and hadn't formed a "good ol' boy" relationship with them that I could see something was amiss. It was precisely the fact that I wasn't a hobnobber/political animal that I caught the 'off' vibes. Those who were "good team employees" went along with the majority decision and the friendly team from "A" who came onsite to woo us. It's the same principle at work.
Those who make the world work -- are also those most likely to compromise and most likely to compromise quality. It's because of their willingness to compromise that they are liked by many, but it's the same compromise that results in compromised code -- both in terms of bugs and security.
I sure as heck don't know the answer. Successful combinations are highlighted in the book mentioned above, where one person knows the almost anti-personal nature of the 'idea' person and handles the media and external interactions, but it's rare to find groups that work well like that.
It has often been said that the best software doesn't come out of committee but out of 1 or a few people -- while companies like to think that 9 women can have a baby in 1 month, it ends up more often that the 9 women argue over who