Why Do Computers Still Crash?

Simple ... by Vilim · 2003-05-20 12:57 · Score: 4, Insightful

Well, basically, as software systems get more complex there are more things to go wrong. That is why I like the roll-your-own kernel of Linux. Don't compile the stuff you don't need and fewer things can break.

--
History will be kind to me, for I intend to write it - Sir Winston Churchill

Re:Simple ... by Transient0 · 2003-05-20 13:01 · Score: 4, Insightful

More specifically... As hardware gets more complex, software gets more complex to fill the available space. More complex software not only means more things to go wrong but also means that the hardware never really gets a chance to outpace the needs of the software.

Also, as I'm sure someone else will point out, it is very hard to right code that will not crash under any circumstances. Even if you are running a super-stripped down linux kernel in console mode on an Itanium, you can still get out of memory errors if someone behaves rudely with malloc().

--
lysergically yours
Re:Simple ... by cscx · 2003-05-20 13:01 · Score: 4, Funny

Actually the Zaurus he mentions crashing in the article runs a roll-your-own Linux kernel... ;)
Re:Simple ... by The Analog Kid · 2003-05-20 13:07 · Score: 5, Interesting

Yes, on my parents' computer, which has 2000 on it (tried Linux, it didn't work for them), I set most of the services that aren't needed to manual. Disabled Auto-update. Put it behind a router, of course. The only problem that remained was Internet Exploder; well, I just installed Mozilla with an IE theme (they haven't noticed a difference). I think killing most of the services keeps it up. Haven't had a problem with it. This was done before KDE 3.1.x, so who knows, Linux might work after all.
Re:Simple ... by MikeXpop · 2003-05-20 13:07 · Score: 2, Insightful

Exactly. I expect that when I enter in [9], [+], [3], [=] on my calculator it will respond with "12", not "ERROR". I expect if I do the same thing on the calculator.app it will do the same thing, again sans crash. However, if I'm trying to download a huge file while opening and closing lots of windows, programming some web pages, uploading them to the web, listening to some tunes, talk to 80 different people on AIM, and enjoying a flash animation at the same time, the computer might crash. After all, those are two very different things.

--
Etiquette is etiquette. He kills his mother but he can't wear grey trousers.
Re:Simple ... by Zach Garner · 2003-05-20 13:09 · Score: 3, Funny

I find that it is really easy to wrong code. I do it all the time...
Re:Simple ... by orbbro · 2003-05-20 13:13 · Score: 5, Funny

And, when the cocaine that lets YOU do all these things wears off, you'll crash!

--
"It's an erotic, spectacular scene that captures the thrusting, violent, vibrant world Bohemian spirit..."
Re:Simple ... by fishbowl · 2003-05-20 13:13 · Score: 5, Insightful

"However, if I'm trying to download a huge file while opening and closing lots of windows, programming some web pages, uploading them to the web, listening to some tunes, talk to 80 different people on AIM, and enjoying a flash animation at the same time, the computer might crash."

Was it, or was it not, designed to be used in this way? If it was not, why does the system let you try it?

--
-fb Everything not expressly forbidden is now mandatory.
Re:Simple ... by Fulcrum of Evil · 2003-05-20 13:25 · Score: 2, Insightful

Even if you are running a super-stripped down linux kernel in console mode on an Itanium, you can still get out of memory errors if someone behaves rudely with malloc().

It's not crashing if you handle the error gracefully. Sure, the app crashes, but the system remains stable. Now, if you run an embedded system of some sort, you'll be writing that app, and being rude with malloc() is a no-no.
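As a rough sketch of what "handling the error gracefully" can look like (in Python rather than C, and with `load_buffer` and `rude_allocator` as made-up names used only for illustration), the caller treats a failed allocation as an ordinary outcome instead of letting it take the whole program down. It plays the same role as checking malloc()'s return value for NULL:

```python
def load_buffer(size, allocate=bytearray):
    """Try to allocate `size` bytes; return None instead of crashing on failure."""
    try:
        return allocate(size)
    except MemoryError:
        # Out of memory: report failure to the caller, who can retry,
        # free something, or shut down cleanly.
        return None


def rude_allocator(size):
    """Hypothetical stand-in for an allocator whose heap is already exhausted."""
    raise MemoryError("out of memory")


buf = load_buffer(16)                      # normal case: a 16-byte buffer
failed = load_buffer(16, rude_allocator)   # exhausted case: None, no crash
```

The failure path is simulated on purpose, so the behaviour does not depend on how much memory the machine actually has.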

--
"We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
Re:Simple ... by quantum bit · 2003-05-20 13:58 · Score: 2, Funny

I expect that when I enter in [9], [+], [3], [=] on my calculator it will respond with "12", not "ERROR"
I expect that my calculator will respond with "ERROR" right after I hit [+]. And it doesn't have an [=] button.

/RPN geek
Re:Simple ... by guile*fr · 2003-05-20 14:14 · Score: 2, Funny

I expect that my calculator will respond with "ERROR" right after I hit [+].
too much RPL for ya.... now drop the hp and come with hands in sight
Re:Simple ... by DarkZero · 2003-05-20 15:03 · Score: 5, Insightful

Was it, or was it not, designed to be used in this way? If it was not, why does the system let you try it?

Your microwave isn't designed to let you put an AOL CD or a piece of tinfoil in it and turn it into a box-shaped firecracker, but it still lets you try it. So the simple answer would be that it lets you do it because it can't control absolutely everything that it interacts with. A download manager isn't designed to be run at the same time as an MP3 player, AIM, ten browser windows, an IRC client, and downloads in other programs at the same time, but it still lets you try it because it has no control over those programs, no different than the microwave's lack of control over your hand and your AOL CD.
Re:Simple ... by Marc2k · 2003-05-20 15:41 · Score: 2, Insightful

Computers are just like liquor...the less his parents drink vodka, the less likely they'll be to notice a difference.

--
--- What

Easy by PerlGuru · 2003-05-20 12:57 · Score: 4, Funny

Same reason cars crash.... people ;-)

Re:Easy by robfoo · 2003-05-20 13:10 · Score: 2, Funny

No, I think you mean other people. :p

C and C++ are the problem by zedge · 2003-05-20 12:59 · Score: 3, Insightful

Don't allow people to use languages that allow you to access memory not assigned to you or to access array positions that don't exist. This would fix 95% of software problems.

Re:C and C++ are the problem by Anonymous Coward · 2003-05-20 13:05 · Score: 3, Insightful

I'm writing this as anon because I refuse to enter passwords on a computer I don't trust (internet cafe). But if you must know, my nick is TheMMaster.

I think you misunderstand the problem: using pointers in C/C++ to unallocated memory only occurs with sloppy programming. It is not a "feature" of the language itself. You could easily do the same with Visual Basic even, if you wanted to. I DO admit that doing stuff wrong is easier with C/C++ (think of a copier in the wrong place).

People that write bad code will always write bad code, the point is that C/C++ gives you more power to create better code than other programming languages do, because they are much more flexible.

thanks for your time
Re:C and C++ are the problem by Anonymous Coward · 2003-05-20 14:00 · Score: 5, Insightful

A commonly held notion, but not really well thought through.

Sloppy programmer accesses through bad pointer in C. OS traps task.

Sloppy programmer accesses beyond array bounds in MySafeLanguage. Runtime system traps task.

In either case, your program "crashes", and the user isn't going to be any happier if you tell them that it's the "MSL virtual run time environment" that painted the blue screen of death than if it's the "operating system". The crappy program still ate my data.
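To make that concrete, here is a small sketch with Python standing in for "MySafeLanguage" (`read_slot` is a hypothetical helper, not anything from a real program). The runtime does trap the out-of-bounds access, but unless someone wrote code to recover from it, the program still dies and the user still loses their work:

```python
def read_slot(slots, index):
    """Return slots[index], or None when the index is past the end."""
    try:
        return slots[index]
    except IndexError:
        # The runtime trapped the bad access; recovering is still our job.
        return None


slots = ["a", "b", "c"]

ok = read_slot(slots, 1)        # in bounds
missing = read_slot(slots, 3)   # out of bounds, but handled

# Without the try/except, the same access raises IndexError, and an
# uncaught IndexError ends the program just as surely as a segfault.
try:
    slots[3]
    trapped = False
except IndexError:
    trapped = True
```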

The two actual causes, IMO:

1) People always code on the bounds of manageable complexity. Think about the programs people wrote 25 years ago. Nice as they were at the time, and they were on the bounds of manageable complexity, they have what would now be considered a laughable number of features and capabilities. As tools and processes and programmers get better, you don't get a better version of the same old thing you always had. You get something new and different that's just now become possible.

2) Users (customers) get what they deserve. I have yet to meet a real customer that will actually wait longer and pay more for a higher quality system. Instead, they'll pay less to the guy that gets there cheaper or sooner. Everyone rants about quality, but they turn around and reward time-to-market and corner-cutting on development. If any significant proportion of users really insisted on quality, they'd get it, and probably at a much higher price. (Some, but not all, embedded development falls into this category.) Instead, they want it now and cheap, and the company that takes longer and costs more simply goes out of business.
Re:C and C++ are the problem by Anonymous Coward · 2003-05-20 14:18 · Score: 2, Insightful

This would fix 95% of software problems.

It also means that you throw away 95% of all existing software. Away goes Slashdot (MySQL) along with the rest of the Web (Apache, IIS) and the Internet in general. Not that it matters, because you don't have any more operating systems (Linux, Windows, OS X, etc.), some of which are doubly bad because they are partially written in assembly language, which lets you do (gasp!) anything! What, precisely, are we to actually get done in your computing utopia? Read Pascal code by candlelight?

And what do you mean, "don't allow" people to use languages you don't like? Keep your laws off my computer, mein Fuhrer.
Re:C and C++ are the problem by El Cubano · 2003-05-20 14:53 · Score: 3, Insightful

Don't allow people to use languages that allow you to access memory not assigned to you or to access array positions that don't exist.

It always bugs me at how quick people are to blame the problem for crappy coding on the language. This would be tantamount to a carpenter saying, "if my hammers weren't so damned versatile I could build a higher quality product and not break my thumb open." People would look at him like he was crazy. Or better yet, an inexperienced apprentice saying, "That hammer is just too powerful for me to use."

That being said, C and C++ are the hammer that was designed by carpenters (OS experts) for use by carpenters (OS experts). Don't blame the problems on a bunch of kids who are never properly educated on the use of the tool.
Re:C and C++ are the problem by mendred · 2003-05-20 22:04 · Score: 2, Insightful

C/C++ are languages that were designed to be as low level as possible. Therefore, the language itself is very simplified, meaning that it expects you to take care of every detail.

Which makes this language very suitable when used by a small team of 10-20 people who know exactly what they are doing. They can design specific components relevant to their project/product and the rest 100-200 ppl can use a higher level language to link these components and build the final product.

Using C/C++ only in a team of 100-200 ppl is a recipe for disaster. It requires a great deal of discipline and expertise and also a lot of time in such cases. And humans are prone to error after all. And there is that saying too many cooks spoil a broth.

Also, using only a high-level language may make ur code stable for limited usage, but under heavy load it will fail, and when it does you will run helter-skelter wondering where the problem is. But there will be no indication in your code. I don't know if any of you Java programmers have ever encountered an out-of-memory exception thanks to heavy object overhead and torn your hair in despair, but I have and it isn't pleasant.

And a client isn't interested in excuses. He just says get it to work on the hardware I have. At least a crash or a memory leak can be traced and fixed, but this??? We found a workaround eventually, but it was a very painful and harrowing process that involved consulting a lot of documentation, and it certainly belied Java's reputation as an easy language.

Also remember some faults may be under the hood and will be there till they get fixed- beyond your control, because essentially after all these languages add a layer over the lower level, meaning more complexity. And this complexity will be very generic in nature and may not pertain to your project or your need. In contrast, C/C++ is as low as u get and so you can write components suited for ur needs.

For example, in our project the programmers outside the core team use Java, but they use native calls to some libs the core team prepares. We find that this way Java gives excellent performance. It is an excellent language for program structure and modules but not for coding core components, as the overhead involved is significant. Significant allocations and deallocations are not done by them at all (ya, they use new in Java, but under the hood all allocation and deallocation is taken care of by the component; the Java part is more like a wrapper and has a very low memory footprint, so there is less work for the GC), and any module the core team develops goes through rigorous testing before it is handed over. And the others can just drop it in place. It's not as easy as it sounds, but the boss feels it's a nice balance between efficiency and ease. And besides, it's helpful if ur boss is also a programmer and a member of the core team. :)

Again, if you are a Java expert you could probably minimize those overheads without needing to touch C/C++. I am not sure. Also, maybe in the future JIT compilers and other stuff may make Java come very close to C/C++ in terms of performance, and de facto hardware may become powerful enough to drop C/C++ altogether (for example, now nobody uses a 386/486 for serious work, but here we even had problems on a 1 GHz P4 with 256 MB RAM; OK, it may not be bleeding edge, but you can't call it obsolete). But till then this is the model we will use. Also, if we require cross-platform independence, only the core libraries need to be ported. Right now the Linux port is underway.

What I am trying to say is everything should be viewed in shades of gray. There is a place for everything and there is a reason for everything to exist. For example, my brother for his phd is using java to run some scientific calculations heavy number crunching stuff because it is easy to code and u don't have to worry about anything other than the logic. Plus his university has given him a dual processor P4 2.5ghz with 1GB ram just for that :))(oh it also has a radeon 9700 drool:). Bu
Re:C and C++ are the problem by smallpaul · 2003-05-21 00:48 · Score: 2, Insightful

It always bugs me at how quick people are to blame the problem for crappy coding on the language. This would be tantamount to a carpenter saying, "if my hammers weren't so damned versatile I could build a higher quality product and not break my thumb open."

No, you've completely mischaracterized the argument. Actually, the argument is: "People keep using a wrench as a hammer. Yes, you can do it but it isn't efficient and it isn't safe." C++ is not a good application programming language but that is what it is most often used for. It is excellent as a component, operating system or runtime programming language.

Whose computers still crash? by fishbowl · 2003-05-20 12:59 · Score: 4, Funny

Crash? What crash?

radagast% uptime
8:56pm up 582 day(s), 12:45, 22 users, load average: 0.00, 0.00, 0.01

--
-fb Everything not expressly forbidden is now mandatory.

Re:Whose computers still crash? by Anonymous Coward · 2003-05-20 13:04 · Score: 5, Funny

load average: 0.00, 0.00, 0.01

easy to keep a computer up if you never use it ;)
Re:Whose computers still crash? by stefanlasiewski · 2003-05-20 13:19 · Score: 3, Insightful

Crash? What crash?

up 582 days

Reboot? What reboot?

Now, when was the last time you tested those init scripts? :)

-= Stefan

--
"Can of worms? The can is open... the worms are everywhere."
Re:Whose computers still crash? by EvilTwinSkippy · 2003-05-20 13:29 · Score: 3, Insightful

So what kernel is that you are running? Hmmm. If it's a linux box that would barely be 2.4. More likely 2.2.
(Digging through my pile of vulnerabilities...)
Say, could we get an address on that box? Muhuahahahaha
My uptime is largely limited by kernel upgrades and the fact I cycle the power once per month to prevent the drive head from sticking.

--
"Learning is not compulsory... neither is survival."
--Dr.W.Edwards Deming
Re:Whose computers still crash? by toddestan · 2003-05-20 13:49 · Score: 5, Informative

Even my uptime experiments, which consisted of taking old but reliable hardware, installing Windows 95/OSR2/98/98SE/ME, and then letting the computer idle and do nothing, never resulted in more than about 25 days before I came over and Windows was fubar'ed or the computer was simply locked hard.

Windows 3.1 actually did quite well if I remember right, as it seemed perfectly content sitting idle doing nothing seemingly forever. Windows 9x always seemed to randomly thrash the HDD, even after a clean install, which led me to believe that Windows 9x is never truly idle, it's always up to something (virtual memory?), and that something eventually will bring it down.

Windows 9x actually has a bug in it that would lock the computer after 49.7 days of uptime, but it took years to catch it because no one ever got close to that mark.
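For what it's worth, the figure usually quoted for this bug is about 49.7 days, which is how long a 32-bit count of milliseconds lasts before it wraps back to zero. A quick check of that arithmetic, assuming (as the usual explanation goes) that such a millisecond counter is what rolled over:

```python
MS_PER_DAY = 24 * 60 * 60 * 1000   # milliseconds in one day
COUNTER_STATES = 2 ** 32           # distinct values of a 32-bit counter

# Days of uptime before a 32-bit millisecond counter wraps around.
days_until_wrap = COUNTER_STATES / MS_PER_DAY

print(f"{days_until_wrap:.1f} days")
```

That is well past the 25 days or so the idle test machines above ever managed, which fits the point that hardly anyone got far enough to see it.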
Re:Whose computers still crash? by UserGoogol · 2003-05-20 13:50 · Score: 5, Funny

Well... in my day I had to write games with just seven transistors and a piece of cheese! And I thought I was lucky. Kids today. Geez.

Granted, I'm 16, but that's not the point.

--
"Never attribute to malice that which can be adequately explained by stupidity." -- Hanlon's Razor
Re:Whose computers still crash? by Dr. Photo · 2003-05-20 13:57 · Score: 5, Funny

Reboot? What reboot?

Now, when was the last time you tested those init scripts? :)

Init scripts? You heathen!!

Rebooting is a special occasion, signalling the coming of the harvest season, or the installation of a new kernel. Accordingly, the High Priest shall bring the system up by hand, typing in the ancient incantations from the sacred scrolls.

Init scripts are for the weak of faith. Let ye not be tempted by the daemons of rc-dot-d!
Re:Whose computers still crash? by Guppy06 · 2003-05-20 14:58 · Score: 5, Funny

"Accordingly, the High Priest shall bring the system up by hand, typing in the ancient incantations from the sacred scrolls."

Would those sacred scrolls, perchance, be small, yellow, and stuck all around the monitor screen?
Re:Whose computers still crash? by stwrtpj · 2003-05-20 15:48 · Score: 2, Funny

Well... in my day I had to write games with just seven transistors and a piece of cheese! And I thought I was lucky. Kids today. Geez.
You had transistors?? What a spoiled brat. I had only three vacuum tubes, an abacus, and a photo of ENIAC. And I was grateful!!

--
Karma: Frotzed (mostly due to the Frobozz Magic Karma Company)
Re:Whose computers still crash? by TheNetAvenger · 2003-05-20 18:12 · Score: 2, Insightful

Why is it that people are always using Windows98/ME that was basically written in 1997 and 1999 and then compare it to their *nix installations that are the current versions and running the latest *nix patches?

If people want to compare MS and Windows to their *nix, at least use WindowsXP as the base line.

It would be just as silly to compare WindowsXP to a 1997 version of any *nix out there.

Or if you are going to use an 'old' MS OS, at least base it on WindowsNT4.0 which is at least in the same class line as *nix. Our clients have had high usage NT4.0 installations run for years without failures.

Windows9x is a grand extension of the DOS architecture, NT on the other hand is just a completely different ball game by design.
Re:Whose computers still crash? by fucksl4shd0t · 2003-05-20 19:17 · Score: 4, Funny

Kind of like how my 2 year old daughter carrying dishes to the sink. She's trying to be helpful, but occasionally she drops one.
HEh. My daughter's 4 and she's never accidentally dropped a dish. That doesn't mean she's never broken one, though....
My son's two, and it's impossible to tell if he drops dishes on purpose or on accident, because he does it so much.
Should've named my daughter Linux and my son Windows. Now we're having another one, what should I name him? BSD? What's he gonna do? Sit there and whine about how nobody loves him 'cause he's the only true eunich left? Or is he gonna spend his time crying because right after he's born they're gonna cut him into three pieces and each person will claim their piece is better than the whole?
Wow, first time I've ever trolled BSD. I feel strangely liberated...

--
Like what I said? You might like my music

AS LONG AS YOU CAN TEST EVERY STATE... by drink85cent · 2003-05-20 12:59 · Score: 5, Insightful

As I've always heard with computers, you can't prove something works; you can only prove it doesn't work. As long as there are an almost astronomical number of states a computer can be in, you can never test for every possible case.

Re:AS LONG AS YOU CAN TEST EVERY STATE... by innosent · 2003-05-20 13:54 · Score: 5, Insightful

Not exactly. Assuming that the hardware is ok, you can prove that a system is reliable for any given finite input (including, most importantly, all possible finite substrings of inputs, however it is not possible to test all possible inputs, since a portion of those are infinite), it's just that doing so in large systems takes enormous amounts of time, and of course, time = money. Take Microsoft, for example. It takes a team years to develop a product like Windows XP, run a few test cases, and fix the major bugs. But just think how long it would take to go through every possible input substring of a given length (and by substring/string I am including non-character inputs [mouse, network, etc]).

Consider a simple program that inputs 10 short strings of text and does some computations on those strings. Say, for example, that the system has only a keyboard as input, that all input functions are guaranteed to accept only A-Z (caps only), the space bar, and 0-9 (regex ((A-Z)*(0-9)*)*( )*) and not to overflow, and that there are 10 inputs with exactly 10 characters each (spaces fill the end of the string). This means that there are 37 possibilities for each character, totaling 37^100 unique possible inputs, about 6.61E156 possibilities, each 100 characters long. Typing a million characters per second, entering them all would take 2.094E145 years! Keep in mind that this is an extremely simple system.
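The arithmetic is easy to double-check. A short sketch, where the constant names are mine and a year is assumed to be 365.25 days:

```python
ALPHABET = 26 + 10 + 1      # A-Z, 0-9, and the space bar
CHARS_PER_CASE = 10 * 10    # 10 inputs of exactly 10 characters each

# Number of distinct 100-character inputs.
cases = ALPHABET ** CHARS_PER_CASE

# Every case means typing 100 characters, at a million characters per second.
seconds = cases * CHARS_PER_CASE / 1_000_000
years = seconds / (365.25 * 24 * 60 * 60)

print(f"{cases:.2e} cases, {years:.3e} years")
```

Python integers are arbitrary precision, so 37^100 is computed exactly before being converted for display.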

Therefore, it is not possible to test ALL input cases of any nontrivial program, only a few selected cases, which most will agree is far from proving a program correct. Instead, developers should have detailed mathematical descriptions of how a program is to behave at each incremental step, and verify that the program follows those descriptions accurately. Programs can only be proven correct in the same manner that any discrete mathematical concept can be proven correct, with one of the most common methods of a functionality proof being mathematical induction. Based on a few basic assumptions (like that the functions you call work as documented), the rest of the system can be proven by proving the trivial parts and cases first, and then constructing a complete proof based on the trivial parts.
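A toy illustration of the "description at each incremental step" idea, with `sum_to` as a made-up example function. The assertion states the loop invariant that an induction proof would rest on: it holds trivially for i = 0, and each pass through the loop preserves it. Running the check is not itself a proof, of course; it only shows where the proof obligations sit in the code:

```python
def sum_to(n):
    """Return 0 + 1 + ... + n for n >= 0, checking a loop invariant as it goes."""
    total = 0
    for i in range(n + 1):
        total += i
        # Invariant: after adding i, total equals the closed form i*(i+1)/2.
        # Base case: i = 0 gives 0. Inductive step: adding i to
        # (i-1)*i/2 yields i*(i+1)/2.
        assert total == i * (i + 1) // 2
    return total


result = sum_to(100)
```

A small change to the loop body would trip the assertion right away, which is the upside of the approach; the downside is that the invariant has to be re-derived whenever the code changes.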

The problem with this is that a small change can have a big impact on the proof, and nobody actually takes the time to verify that everything still works. Companies don't often spend money on making their software 100% correct, they just need to add the nifty new features that their customers want before their competitors do. I'd be willing to bet that 90% of the bugs found in XP can be traced to a "nifty new feature" that broke code that may have been proven correct at some point.

In other words, the short answer is yes, if you can test every state, you can prove a program correct, but since that's usually impossible, it becomes the developers' responsibility to incrementally prove the system, which is far easier if all functionality is planned ahead of time, but still too time/money consuming for most software companies to bother with. Microsoft doesn't care if your computer crashes, you'll probably still pay them, and as much as I'd like to think otherwise, OSS isn't much different (although it's usually more time than money there).

--
--That's the point of being root, you can do anything you want, even if it's stupid.
Re:AS LONG AS YOU CAN TEST EVERY STATE... by TheOnlyCoolTim · 2003-05-20 14:11 · Score: 3, Insightful

"We know that i+1 > i"

Are you so sure? Depending on various circumstances, you might find that a little while after you get to 127 or 32767 (or thereabouts) i+1 has become far less than i...
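A quick way to see it, sketched in Python with the standard ctypes module, whose fixed-width integer types do no overflow checking (`next_int8` and `next_int16` are just illustrative names). Note that in C itself signed overflow is formally undefined; the wraparound shown here is simply what typical two's-complement hardware does:

```python
import ctypes


def next_int8(i):
    """Return i + 1 as a signed 8-bit integer would store it."""
    return ctypes.c_int8(i + 1).value


def next_int16(i):
    """Return i + 1 as a signed 16-bit integer would store it."""
    return ctypes.c_int16(i + 1).value


small = next_int8(126)     # still behaves: 127
wrapped = next_int8(127)   # wraps to the most negative 8-bit value
```

So "i+1 > i" is only safe to assume when i is known to be below the maximum of its type.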

Tim

--
Omnia vestra castrorum habetur nobis.
Re:AS LONG AS YOU CAN TEST EVERY STATE... by moncyb · 2003-05-21 15:40 · Score: 2, Informative

Why? To waste processing time? Programmers should most certainly check beforehand if something catastrophic will happen...such as checking for NULL pointers, or making sure a drive head won't go too far and crash. But I don't see why they should have to predict if an error will happen beforehand for every case.
Sometimes a more elegant, efficient or easy solution is to check afterward. Who even says the result will be wanted if an overflow happens anyway? The program may just end the function on an error, or pop up a warning box.
If needed, an increment can easily be undone, and overflow checking is far easier than comparing against the max value--which you'll have to figure out. Not easy if you don't know the size of the operand (such as with C and ints). Not easy if the size of the operand may be changed at a later date (you'll also have to change all your comparison code, or the max constant if you used one). This may not seem like a big deal, but it's one more thing to get wrong.
Some 80x86 assembly to illustrate how overflow works to the programmer's advantage:
    inc al             ; increment the al register
    jo errorhandler    ; if overflow, jump to error handler
                       ; no error, continue on...
Without overflow:
    cmp al,127         ; compare al against the max value of a byte
    jge errorhandler   ; if al is greater or equal to max, jump to error handler
    inc al
Yeah, the extra cmp opcode may not look like much, but it does add to the code size, and will use additional processing time. Doing this a lot in a large program will add up--especially if it needs to be fast.
This was just a simple case for an increment. What about adds? The precheck will be much more complex than the previous example and probably use up an extra register.
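The add case can be sketched outside assembly too. The Python below is only an illustration under assumptions: it simulates a signed 8-bit register, and `wrap8`, `add_check_after`, and `add_check_before` are made-up names. Checking afterward needs only the sign test that the CPU's overflow flag performs for free, while checking beforehand needs a separate comparison for each direction:

```python
INT8_MIN, INT8_MAX = -128, 127


def wrap8(x):
    """Reduce x to the signed 8-bit range, as a two's-complement register would."""
    return (x + 128) % 256 - 128


def add_check_after(a, b):
    """Add first, then detect overflow the way the overflow flag does."""
    result = wrap8(a + b)
    # Overflow iff both operands share a sign and the result's sign differs.
    overflow = (a < 0) == (b < 0) and (result < 0) != (a < 0)
    return result, overflow


def add_check_before(a, b):
    """Decide whether a + b would overflow without performing the add."""
    too_big = b > 0 and a > INT8_MAX - b
    too_small = b < 0 and a < INT8_MIN - b
    overflow = too_big or too_small
    return (None if overflow else a + b), overflow
```

Both versions agree on every pair of 8-bit operands; the difference is how much work is done up front, and the pre-check also has to know INT8_MIN and INT8_MAX, which is the "max constant" problem mentioned above.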

Human Error by Obscenity · 2003-05-20 13:00 · Score: 5, Insightful

All programs (for the most part) must be written by people. People crash, they're buggy and they don't have a development team working on them. Computers crash because people can't catch that one little fatal error in 10,000 lines of code. Smaller programs are less susceptible to errors and big scary warning messages that make even the most world-hardened geek worried about his files. Yes, it's getting better with more and more people working on something at once. Mozilla (www.mozilla.org) has a feedback option to help them debug, many software companies are including this. But even with that in place, there is always that small human error, that will screw something up.

--
OMG OMG OMG WTF OMG WTF BBQ STFU RTFM, OMFG OMG OMG OMG ROFL LMAO OMG WTF STFU ROFLMAO

Re:Human Error by Malcontent · 2003-05-20 13:45 · Score: 5, Insightful

"People crash, they're buggy and they don't have a development team working on them. Computers crash because people can't catch that one little fatal error in 10,000 lines of code. "

While this statement is true, it's also a cop-out. In the last twenty years there have been a tremendous number of advances in computer science and languages, and yet everybody still programs in C.

That is the reason why programs crash. Why don't people use languages that make programs more failsafe and make programmers more productive?

It would be interesting to do a study of the "bugginess" of programs written in python, java, scheme, smalltalk, lisp etc. My guess is that programs written in C crash the most.

Where are all the programs written in scheme or smalltalk or ML?

Use better languages and crash less.

--
War is necrophilia.
Re:Human Error by Uller-RM · 2003-05-20 14:15 · Score: 4, Insightful

Java programs can still crash -- and believe me, grade homework for undergrad CS students for a few years and you'll see plenty of it. The only difference is that Java tosses an exception that isn't handled, and C either asserts and calls exit(-1) or segfaults.

I don't think it's fair to say that any one language is "safer" than another -- once you reach a certain level of expertise, one can write a stable and robust program in C or C++ or Java or Haskell (my preference) with equal effort. The effort is mental: being persistent enough to define solid logical definitions for each part of the program, failure conditions, etc. and then execute them to the letter in the language of choice. If the program behaves logically, you can prove that it works using logical principles -- induction and so on. (And if you ever do govt contracting or any other project that calls for requirement traceability, you'll need to.)

The difference between languages is merely the way the code is expressed. Java and C++ have exceptions; C does not. For some situations, return codes are better than exceptions, and for some situations the opposite is true. Java has robust runtime safety -- C and C++ do not. C++ has templated containers -- Java's just now getting such genericity. All languages and all approaches to problems have tradeoffs: the mark of a good programmer is knowing those tradeoffs and picking which is best for the situation.
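To make the return-code versus exception tradeoff concrete, here is the same check written both ways in Python (`parse_port_code` and `parse_port_exc` are hypothetical names for illustration). The return-code style forces every caller to look at the flag, but never unwinds the stack; the exception style keeps the normal path clean, but an unhandled failure ends the program:

```python
def parse_port_code(text):
    """Return-code style: (ok, value); never raises on bad input."""
    if text.isascii() and text.isdigit() and 0 < int(text) <= 65535:
        return True, int(text)
    return False, 0


def parse_port_exc(text):
    """Exception style: return the port number or raise ValueError."""
    ok, value = parse_port_code(text)
    if not ok:
        raise ValueError(f"not a valid port: {text!r}")
    return value


ok, port = parse_port_code("8080")   # caller must remember to test `ok`
same_port = parse_port_exc("8080")   # caller gets the value or an exception
```

Which one fits better depends on whether a bad input is an expected, routine case (return code) or something the caller usually cannot deal with on the spot (exception).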
Re:Human Error by JohnsonWax · 2003-05-20 16:00 · Score: 4, Interesting

"All programs (for the most part) must be written by people. ... Computers crash because people can't catch that one little fatal error in 10,000 lines of code."

All bridges (for the most part) must be built by people. Bridges collapse because people can't catch that one little fatal error in one or two million components.

The shit coders put out there, I swear... The reason software crashes is that by-and-large it's hacked together, not engineered. You hack a bridge together, and yes, it'll fail. You engineer software, and yes, it will run reliably. It's not fun to do - no easter eggs, no cool tricks, no cramming features in weeks before ship.

I'm stunned at the amount of code that goes out that was written by interns, by inexperienced coders, by people that just don't have a clue. The software industry really has no concept of best practices, no leadership, no authority body. The fact that buffer overflows still happen is stunning.

It's not small projects that work well because out of dumb luck they happen to not fail, or larger projects that work okay because we have 34,000 people looking at the code. If that's 'best practices', then we're doomed.

"Mozilla (www.mozilla.org) has a feedback option to help them debug, many software companies are including this."

Uh huh. Let's translate that to my car: "Hi. Yeah, I'd like to report a bug. I have a Saturn Ion, version 1.1v4. Yeah, when I turn on the left turn signal and then turn on the lights, the car catches on fire. You might want to fix that in the next version. Just thought you might want to know. Bye."
Re:Human Error by GlassHeart · 2003-05-20 16:16 · Score: 4, Insightful

It would be interesting to do a study of the "bugginess" of programs written in python, java, scheme, smalltalk, lisp etc. My guess is that programs written in C crash the most.
Even that is a worthless statistic. Assuming that bad programs are written by bad programmers, the language that more bad programmers choose will appear the highest in your study as the buggiest language. Bad programmers choose the language du jour, thinking it will land them a cushy job.
You'll have to disprove the assumption to correctly blame the language.
Use better languages and crash less.
Try dividing by zero in your better language of choice.
Re:Human Error by Anonymous Coward · 2003-05-20 19:35 · Score: 2, Informative

Scheme and Smalltalk are bad examples, because dynamically typed languages produce entirely different types of faults (typing errors).

ML is a much better example.

Others will claim Java is a good example, but it's a bad one, because despite being statically typed it causes typing errors (from casts) and null-pointer exceptions (ick).

Safer languages still don't mean that programs don't fail, but they eliminate some of the ways they can fail.
Re:Human Error by ojQj · 2003-05-20 19:35 · Score: 4, Insightful

Disclaimer: I haven't programmed in Java since my undergrad, but I learned it before C++. I've been programming in C++ professionally for 3 years straight now, not counting internships and class assignments before that.
I'd rather have an exception than a crash. It gives me more information about what I did wrong. A crash that's not reliably repeatable and only happens in your release version under Windows OT systems with IE 4 installed, is next to impossible to find and fix -- in C++ it's only worse.
Not only that, but memory management is more than just a nuisance. Just yesterday, I wanted to move some code from one class to another to improve the object-oriented structure of some code which I've taken over from another developer. In that code were a couple of news, and I couldn't find the deletes which matched them. So I asked the original developer. Turns out the deletes were in a base class of the class that I was moving the code to. If I had been programming in Java, this would have been a cut and paste job finished in 30 seconds, plus 15 minutes for testing the change before checking in. In C++, it was 15 minutes trying to find the deletes myself, 15 minutes waiting for the other developer to get to a break point in his work and another 15 minutes assuring myself that the deletes really were called for all cases, and another 15 minutes for testing the change before checking in. That's a factor of 3-4 (depending on if I have something else I can do while waiting) for the C++ program.
Memory management and other unnecessary tasks which C++ saddles the developer with do make an impact on either development time, program stability, or both. And that is also true for experienced C++ programmers.
They also make an impact on language learning time, which is not to be underestimated with the number of newbies today, and people moving up from still worse languages like Cobol. In addition, even for an experienced C++ programmer, they make a difference in the time it takes to understand code which was programmed by another programmer.
I agree with you that there are situations where every language, including C++, is the most appropriate for the problem in question. I just think that C++ is over-used, thus reducing the average stability of modern programs and the average productivity of modern programmers.

Re:Computers don't crash by BoomerSooner · 2003-05-20 13:00 · Score: 3, Funny

That's called job security man!

In my CompSci class.. by ziggy_zero · 2003-05-20 13:01 · Score: 4, Insightful

...I remember my teacher saying "Computers do exactly what they're told, not necessarily what you want them to do."

I think the root of the problem is time. Microsoft doesn't have the time to spend going through every possible software scenario and interaction, or every possible hardware configuration. If they did do that, it would probably take a decade to pump out an operating system, and by that time hardware's changed, and it's a neverending cycle.....

We just have to accept the fact that the freedom of using the hardware components we want and the software we want, all made by different people, will result in unexpected errors. I, for one, have come to grips with it.

--
I belong to the ______ generation.

Re:In my CompSci class.. by pnatural · 2003-05-20 13:22 · Score: 2, Insightful

But this shouldn't be an issue. If your HAL is done properly, there is no possibility of crashes with different software/hardware combinations, because the hardware doesn't matter. If libraries etc are managed properly, and memory space is isolated properly, then there should be no software-software issues.

And this, ladies and germs, is precisely why computers crash. One system depends on another, and each layer is presumed to be solid. It's the presumption that things at the lower level cannot go wrong that gets most coders into deep do-do.

The reactionary solution is to code defensively. Defensive programming has its place, but it's rarely done correctly (IMO) and it leads to cruft and maintenance nightmares. The solution (again IMO) is to account for failures at the design level.
Re:In my CompSci class.. by nick_davison · 2003-05-20 13:43 · Score: 3, Insightful

...I remember my teacher saying "Computers do exactly what they're told, not necessarily what you want them to do."

D&D summed it up for me, years ago, with the wish spell: At its purest, it's too powerful to give to players - they'll unbalance and destroy the game. However, it can be balanced by giving them exactly what they ask for.

"A demon lord approaches you out of the shadows."
"I cast 'wish' - I wish for a +100 sword of almighty vorpal type slayingness."
"The sword appears in the demon's hand. He thanks you for it, then hits you."

Writing good code is like making a good wish. All you can do is try to cover as many eventualities as possible. The problem is, code gets really slow to run and even slower to write when you have to add out of bounds checks on every argument, error handling and reporting, garbage collection and all the rest. Even then, there'll always be some twisted scenario that you didn't know could exist so didn't plan for. So most people just give up, wish for the damn sword and hope the PC/Dungeon Master doesn't have too evil an imagination this time.
Re:In my CompSci class.. by |Cozmo| · 2003-05-20 14:26 · Score: 2, Insightful

If only it was as simple as you say.
Creating drivers that run devices made by many different manufacturers means you have to take all of their differences into account in order to get the same behavior from all devices. For example: if one chipset powers down a certain device during a reboot/standby/hibernate/etc and another chipset doesn't, you can run into strange behavior. Throw laptops and ACPI into the mix and we're lucky things work as well as they do :) I think we're probably just lucky most of these things are isolated from end-users.

I have several machines that won't even POST with certain configurations of USB devices plugged in. I think it is a BIOS issue: the BIOS is probably fiddling with the devices to do HID support or to boot from storage devices, and that is hanging either the BIOS or the hardware.
Re:In my CompSci class.. by KrispyKringle · 2003-05-20 16:02 · Score: 4, Funny

And just when being into computers was starting to get "cool" (think The Matrix, Hackers, or Swordfish) someone like you comes along and starts talking about Dungeons and Dragons. There go my chances of getting laid. There go all our chances of getting laid.
Re:In my CompSci class.. by nick_davison · 2003-05-20 18:49 · Score: 2, Funny

I've been in your game and you're a dick. ;)

Of course I was. I was thirteen at the time, into D&D and computer programming, and couldn't get any girls.

Had they had video labs at my high school and file sharing networks, back at the end of the 80s, I'd have been the fat kid making lightsaber noises while waving a broomstick around.

because someone was very curious and decided to... by null-sRc · 2003-05-20 13:01 · Score: 4, Funny

*0;

never follow the null pointer they said... what are they hiding there????

--
-judging another only defines yourself

Reliability and complexity by woodhouse · 2003-05-20 13:02 · Score: 4, Insightful

Because reliability is inversely proportional to complexity. Systems these days are generally a lot more complex than those of 10 years ago, and in complex systems, bugs are much harder to find. The fact that you say stability hasn't changed is in fact a pretty impressive achievement if you consider how much more complex hardware and software are nowadays.
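
One simple way to put numbers on this, as a rough sketch rather than a real reliability model: assume a system is made of independent parts, every part has to work, and each part works with the same probability. Those assumptions, the function name, and the 0.999 figure are all made up for illustration. The overall reliability is then the per-part reliability raised to the number of parts.

```python
def system_reliability(component_reliability, component_count):
    """Probability that every one of the independent components works."""
    return component_reliability ** component_count


# Same per-component quality, growing system size.
for count in (10, 100, 1000):
    print(count, round(system_reliability(0.999, count), 3))
```

Under these assumptions, a thousand parts that are each 99.9% reliable give a system that works only a little over a third of the time, so holding stability constant while complexity grows takes real effort.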

Re:Computers don't crash by UndercoverBrotha · 2003-05-20 13:02 · Score: 3, Funny

That sir, is a TRUE statement.

Everyone leaves some code that only they can fix...

My Standards for variables:

_needthis
_needthis1
_x
_uz

etc

--
Solid!

It's not the need for speed by Jeremi · 2003-05-20 13:02 · Score: 5, Insightful

It's the need for new features. Every feature that gets added to a piece of software is a chance for a bug to creep in.

Worse, as the number of features (and hence the amount of code and number of possible execution paths) increases, the ability of the programmer(s) to completely understand how the code works decreases -- so the chance of bugs being introduced doesn't just rise with each feature, it accelerates.

The moral is: You can have a powerful system, a bug-free system, or an on-time system -- pick any two (at best).

--

I don't care if it's 90,000 hectares. That lake was not my doing.

Re:It's not the need for speed by WasterDave · 2003-05-20 13:28 · Score: 5, Insightful

Thank you, at least somebody got it fucking right.

Software doesn't have to crash, but for a given quantity of development resources there's a fairly simple tradeoff between feature-richness and stability.

You want reliable? Strip back features left right and centre, design an elegant architecture, then unit test properly.

Dave (in a ranty mood)

--
I write a blog now, you should be afraid.
Re:It's not the need for speed by MourningBlade · 2003-05-20 16:36 · Score: 2

It's the need for new features. Every feature that gets added to a piece of software is a chance for a bug to creep in.

This is one of the reasons I consider the 'nix philosophy of many small, well-written programs working together to be a good one. Programs use programs.
One of the reasons that programs crash is that they require knowledge of the component programs. 'nix programmers (should) follow the principle of "define an input protocol and an output protocol. Be as flexible as possible with the input protocol, but do require it. Be as strict as possible with the output protocol and make it human-readable."
This eliminates numerous programmer-factors problems. It allows for black boxing, unit testing, and allows you to isolate blame quickly. It also allows you to swap out components and split up effort in programming - as I think has been readily demonstrated by the 'nixes.
This development philosophy can really help kill off featuritis, as you can implement the feature as high-level as possible reducing the amount of code that requires it, thus reducing errors across the system.
I think that the core problem with features and software (and I do think you're on the right track) is the amount of virgin code introduced, and the quantity of white boxes required to understand the problem. My favorite example of this is TeX and LaTeX: I've never encountered a LaTeX error (they're getting pretty rare now), and I don't know anyone who's encountered a TeX error (you can find them on Usenet, but they're very old posts).
I sometimes wonder if we wouldn't all be better off using lightweight libraries that handle protocols to talk to programs and daemons than using heavyweight APIs and toolkits.
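
A minimal sketch of the "flexible input, strict output" principle described above, in Python. The key/value format and the `normalize` function are invented for illustration; they are not any real tool's protocol. Input may use `=` or `:`, any spacing, any key case, blank lines and `#` comments; output is always exactly `key=value` with a lowercase key. Anything that is not recognizably a key/value line is rejected rather than guessed at.

```python
import re

# Liberal input: optional whitespace, ':' or '=' as the separator.
_LINE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*[:=]\s*(.*?)\s*$")


def normalize(lines):
    """Turn loosely formatted key/value lines into strict 'key=value' lines."""
    output = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # tolerate blank lines and comments
        match = _LINE.match(line)
        if match is None:
            # Flexible, but the protocol is still required.
            raise ValueError(f"not a key/value line: {line!r}")
        key, value = match.groups()
        output.append(f"{key.lower()}={value}")  # strict, human-readable
    return output


print(normalize(["  Host : example ", "# a comment", "", "PORT=80"]))
```

Because the output has one fixed shape, the next program in the pipeline can be tested as a black box against that shape alone.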

Don't forget the hardware... by MightyTribble · 2003-05-20 13:03 · Score: 5, Insightful

Some crashes aren't the fault of the OS. Bad RAM, flaky disk controllers, CPU with floating-point errors (Intel, I'm looking at *you*. Again. *cough* Itanium *cough*)... all can take down an OS despite flawless code.

That said, some Enterprise-class *NIX (I'm specifically thinking of Solaris, but maybe AIX does this, too) can work around pretty much any hardware failure, given enough hardware to work with and attentive maintenance.

crashes? by Moridineas · 2003-05-20 13:03 · Score: 3, Interesting

Well, on the computers that I manage, we've got an OpenBSD server that never crashes (uptime max is around 6 months -- when a new release comes out) and a FreeBSD server that has never crashed -- max uptime has been around 140-150 days, and that was for system upgrades/hardware additions.

On the workstation side they are definitely not THAT stable, but since we've switched to XP/2K on the PC side, those PCs regularly get 60+ days of uptime. Just as a note--I had an XP computer the other day that would crash about two or three times a day. The guy that was using it kept yelling about Microsoft, etc etc etc. Turned out to be bad RAM. After switching in new RAM it's currently at 40 days uptime (not a single crash).

For some reason the macs we have get turned off every night so their uptime isn't an issue, but from what I hear OSX is quite stable.

Touchy subject by aarondyck · 2003-05-20 13:04 · Score: 5, Interesting

I remember years ago having a conversation with an IT manager at IBM. We were talking about the inability of computer programmers to make their code foolproof. His point was that we don't see problems like this with proprietary hardware. When was the last time someone crashed their Super Nintendo? Of course, with a PC platform (or even Mac, or whatever else) there are problems of unreliability. His idea is that this is because of sloppy programming. The reason we were having this conversation is that I had a piece of software (brand new, I might add) that would not install on my computer. You would think that a reputable software company (and this was a reputable company) would test their product on at least a few systems to make sure that it would at least install! The end result was that I ended up never playing the game (not even to this day), nor have I purchased another title from that company since that time. Perhaps that is the solution to the root problem?

--
In order to be immortal you must be organize

Re:Touchy subject by DNS-and-BIND · 2003-05-20 13:29 · Score: 2, Informative

I saw a quote from a game maker on this...he said something to the effect of, "We have to be really careful with testing the console versions of our games, because if something goes wrong we can't just issue a patch to fix it like we can the PC version."

--
Shutting down free speech with violence isn't fighting fascism. It IS fascism!
Re:Touchy subject by cryosis · 2003-05-20 13:32 · Score: 2, Insightful

Wait a second. You're saying that because you can't get a modern game that was designed to run on a damn near infinite number of hardware configurations plus a wide variety of software configurations and have that game always run perfectly every time that the programmers are sloppy?

You can't expect that programmers predict every condition of every system that their software might run on. It would take decades for a new package to be released and even then it would be huge.

How can you compare a Super Nintendo where all the games written for it are within very strict guidelines to a PC game where the programmer knows next to nothing about the systems that the game is going to be run on? The PC programmer can only try their best to quash the bugs that they can find. And there is no way that they can stop them all. I don't think that this is due to laziness on their part, it's more due to the fact that they're being expected to ship the product on time. If consumers would tolerate longer development schedules and higher program costs, then I think that software would get more stable. But everyone wants newer, faster, better *now*. Oh, and cheap too.

The only way you could have complete software stability is to ensure that every system is exactly the same, down to the RAM manufacturer and the library versions. And you're never going to get that. Not everyone wants the same computer as Bob next door.
Re:Touchy subject by llamaluvr · 2003-05-20 14:20 · Score: 2, Informative

I've crashed my Super Nintendo. Quite a bit, actually.

In Final Fantasy III, in the Phoenix Cave, occasionally when I encountered a random battle, the sprites would all become garbled, and then the game would hang. At first I thought it was a secret, because one of my characters turned into General Leo, but then the game stopped working and I had to reset.

You can also sometimes crash Final Fantasy III by using Relm's "Sketch" command on Gau. What you do is you let Gau use "Leap" to learn a new Rage ability when you're roaming the Veldt, and then, when you find him in another battle, sketch him when he appears. I'm not sure if it always crashes the game- I recall that sometimes it gave you tons of extra random items (like 99 daggers, among other things)- but that might be another bug.

--
Insightful: 76, Off-Topic: 379, Flamebait: 24, Funny: 152, Interesting: 201, Underrated: 55, Troll: 9, Total: 896
Re:Touchy subject by Exantrius · 2003-05-20 16:14 · Score: 2, Insightful

Well, I've actually caused my PS2 to crash quite a bit, such as playing Gauntlet with 4 players (it tries to reference negative RAM addresses or something like that)...
But in general, you're right. It's very difficult to recall a console game...
However, it only has to run on one (few at least) set of hardware. It's the reason macs seem to never crash-- If they had to program for every piece of hardware out there, there'd be a lot of "crap" that happens, and things get messy...

If PCs were uber-standardized-- this proc. this amount of ram, this and that, then there would be no problem. I'm working tech support (for a *gag* foxpro program) and one in 100 customers gets extreme slowdown (like running a report can take 72 hours when it's supposed to only take 10ish minutes) all the time. We have been hunting for it for the past months, and it isn't the data... It seems possibly hardware related, but there's so much hardware out there, and so many different layouts for it (win9x v. me v. 2k vs. nt v. xp)

It's a nice belief that they try it on a bunch of systems, but chances are, if it's anything like the jobs I've run, you've got one guy that collects all the files, then at the end, he runs it around the office, and maybe to a "test room" with generic pcs of varying speeds and makeups, which he tests it on.

Did you ever look at Sierra's help stuff? I never had a problem installing their stuff. MicroProse, on the other hand, cost me six months' allowance because it hard killed win31, and I had to bring it to the store to get it reinstalled (you know, when your parents didn't trust you to touch the damned thing, even though you can't do any more damage than you already did?). Ahhh, memories...

Uh, that's all I have to add. Good points, just a bit more insight... /ex
Re:Touchy subject by Zoarre · 2003-05-20 17:21 · Score: 3, Insightful

I remember years ago having a conversation with an IT manager at IBM. We were talking about the inability of computer programmers to make their code foolproof. His point was that we don't see problems like this with proprietary hardware. When was the last time someone crashed their Super Nintendo?
The Super Nintendo used a roughly 3 MHz processor built around the 65816 (a Western Design Center design), the same CPU family used in the Apple IIgs. I can't find its transistor count on the web, but it could not have had fewer than 5000 (the 6502) nor could it have had more than 68,000 (the 68k). Compare this to a modern AMD Athlon 3000+, which has about 54.3 million transistors. The Super Nintendo might be less likely to crash than a PC because there are at least 54 million fewer things to break.
Also, his claim that you don't find similar problems in modern hardware is incorrect. Just search Google for "intel errata" to see what I mean.
I bought my Gamecube last week and a copy of Metroid Prime. Ironically, it runs on an IBM PowerPC chip (the IBM branding is right on the box) and it's crashed twice since I've owned it. (I <3 my Gamecube regardless).
Industry professionals that produce glib, ignorant assertions such as this one might be part of the problem. :D

--
"People with opinions just go around bothering one another." -The Buddha
Re:Touchy subject by Bluelive · 2003-05-20 22:28 · Score: 2, Insightful

This could of course have been a single bit that failed on the medium you got your copy on.

Scientific American... by Hanji · 2003-05-20 13:04 · Score: 4, Interesting

Scientific American actually had an article on a similar topic. Basically, they seem to be accepting crashes as inevitable, and were focusing on systems to help computers recover from crashes faster and more reliably...

They also propose that all computer systems should have an "undo" feature built in to allow harmful changes (either due to mistakes or malice) to be easily undone...

--
A Minesweeper clone that doesn't suck

It's expected. by echucker · 2003-05-20 13:05 · Score: 3, Insightful

We've lived with bugs for so long, they're a fact of life. They're accepted as part of the daily dealings with computers.

New features are more important than stability by IvyMike · 2003-05-20 13:05 · Score: 2, Interesting

People upgrade for new features. That computer/OS/gizmo you have today does a lot more than the one from 10 years ago. That's a lot more code that needs to be written, and thus a lot more opportunity for errors. It's that simple.

(I'm actually ok with that. I'd rather have a moderately crashy Windows XP box capable of playing GTA:Vice City than the hypothetical alternative: a super-stable Windows95, capable only of playing "Doom 2".)

Complexity, my dear Watson by T5 · 2003-05-20 13:05 · Score: 5, Insightful

It's all about the bits. There are just so many more of them now, and a great deal more pressure in the marketplace to bring ever newer software and hardware to market. Back in the day of the IBM 360 and the VAX, even though we were mesmerized by the capabilities of these machines, they were years and years in the making, debugged much more thoroughly than we can hope for today, and much, much simpler.

And let's not forget that this was the exclusive realm of the highly trained engineer, not some wannabe type that pervades the current service market. These guys knew these machines inside and out.

Essence of Software Engineering by Zach+Garner · 2003-05-20 13:05 · Score: 5, Insightful

Read "No Silver Bullet: Essence and Accident of Software Engineering" by Brooks. A copy can be found here.

Software is extremely complex. Developing software to handle all possible states is an enormous task. That, combined with market forces for commercial software and constraints on developer time and interest for free software, causes buggy, unreliable software.

Not always the softwares fault: by bravehamster · 2003-05-20 13:05 · Score: 2, Insightful

I've found in my years of repairing pc's that the majority of software problems have their root cause in hardware. A bad stick of memory, corrupt hard drive sectors, overheating components, cosmic radiation causing bit flips -- all of these things cause random, bizarre errors. It's pretty easy to tell the difference too. Software errors are repeatable. The exact same situation should produce the exact same error. So all I'm trying to say is that I doubt we will ever reach the point that computers won't crash, because at some point there has to be interaction with the physical world. And no matter how perfect your program is, it's not going to survive a two year old stuffing pennies into the back of the power supply.

--
---- El diablo esta en mis pantalones! Mire, mire!

Re:Not always the softwares fault: by Jeremi · 2003-05-20 13:14 · Score: 4, Insightful

I've found in my years of repairing pc's that the majority of software problems have their root cause in hardware.

Wow, your experiences are much different from mine, then. I'd say 95%+ of my computer problems are caused by software bugs.

Software errors are repeatable. The exact same situation should produce the exact same error.

For a significant percentage of software errors, that statement is false (at least misleading), because it's nearly impossible to reproduce "the exact same situation". For example, take any multithreaded program with a race condition bug -- the chances of the two threads getting the exact same time-slices on two different executions of the program are approximately zero. The result: a crash that happens only sometimes, at random, even given the exact same starting conditions.
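
To make the race condition concrete without relying on real thread timing, here is a small Python sketch that plays out every possible interleaving of two "threads", A and B. Each one reads a shared counter and then writes back the value it read plus one. The setup is hypothetical and the threads are simulated, which is exactly what makes the outcome reproducible here; with real threads the scheduler picks the interleaving for you.

```python
from itertools import permutations


def run(schedule):
    """Run one interleaving; each thread does a read step, then a write step."""
    counter = 0
    local_copy = {}
    steps_done = {"A": 0, "B": 0}
    for thread in schedule:
        if steps_done[thread] == 0:
            local_copy[thread] = counter        # step 1: read shared value
        else:
            counter = local_copy[thread] + 1    # step 2: write back read + 1
        steps_done[thread] += 1
    return counter


# Every distinct ordering of two steps from A and two steps from B.
schedules = sorted(set(permutations("AABB")))
results = {"".join(schedule): run(schedule) for schedule in schedules}

for name, final_value in results.items():
    print(name, final_value)
```

Same program, same starting state, and the final counter is 2 for some schedules and 1 for others. Only the orderings where one thread finishes before the other starts give the "right" answer.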

--

I don't care if it's 90,000 hectares. That lake was not my doing.
Re:Not always the softwares fault: by pi_rules · 2003-05-20 13:44 · Score: 2

And no matter how perfect your program is, it's not going to survive a two year old stuffing pennies into the back of the power supply.

Somebody else already stole my rebuttal about multi-threaded apps not being 100% repeatable, so I'm going to follow up with something a bit more humorous.

I've already put this out here on Slashdot once before I think but it's a fun story and worth repeating I hope.

I had a wonderful prof. in college, a wise old sage, who had many a tale about the olden days of programming. While going over finite state machines (FSMs) he enchanted us with a story from his younger years about designing a FSM for an automatic teller machine (ATM, or "magic money machine" for those over 60).

The group dwelled over the problem set and studied the possible events that could occur from user input at any given time. They had a wonderful diagram that represented 100% of all possible situations at all times. They commenced coding to the hardware designs and the code was finished. QA? Flawless. The men were masters of the ATM universe. Nobody could kill their software -- nobody! The hardware was solid, the client purchased and deployed. Life was good. It served for a long time and one day they get the report:

"Your ATM crashed."

"What?!" say the software engineers. "This is impossible! What could we have possibly NOT thought of? Our FSM is perfect!".

Well, some drunk-assed bastard stuck a McDonald's FishWich into the deposit slot one night. Apparently they didn't account for that one.

So, to this day, in honor of Professor Jorgensen, I have my own little personal Jorgensen's rule, which was the summary of his story:

"You can make it fool proof -- but you can't make it damn fool proof!"
Re:Not always the softwares fault: by intermodal · 2003-05-20 14:01 · Score: 2, Insightful

In my years of repairing PCs and professionally testing software, I find that the majority of software problems are a direct result of one of a few things: the QA department having no say in the company; QA being looked down upon more often than not (and therefore staffed entirely by contractors who, once they know the product and company networks inside and out, are at the end of their contract); and home users not maintaining their computers (defragging, removing dust from inside the case) or not paying attention when fans burn out. Or they bought shoddy hardware, like the Dell single-processor Xeon Precision Workstations or an eMachines box.

--
In SOVIET RUSSIA... erm...NSA AMERICA, the Internet logs onto YOU!

Re:Computers don't crash by SILIZIUMM · 2003-05-20 13:06 · Score: 2, Funny

Anyway remember, it's not a bug, it's a feature...

Re:And by UndercoverBrotha · 2003-05-20 13:07 · Score: 2, Funny

I never thought of that...although I do confide in my DBA who does the same thing with his procs, derived tables and jobs...he would handle it, I would come back like Swayze and help him out...

--
Solid!

Microsoft by eht · 2003-05-20 13:08 · Score: 5, Insightful

Microsoft has made an extremely stable OS; it's called Windows 2000. As long as you use MS certified drivers the OS should never crash. Individual programs may crash under Windows, but you can hardly blame Microsoft for that. I have had Windows machines with months of uptime and no problems. They went down 8 days ago due to a power failure too long for my UPSes to handle, which also took down my FreeBSD machines; uptime is matched for all of them, and will one day again be measured in months.

Yes I should probably patch some of my Windows machines, but I have my network configured in such a way that for the most part I don't need to worry and you don't have to worry about my network spewing forth slammer or other nasty junk.

Re:Microsoft by VTS · 2003-05-20 13:20 · Score: 5, Interesting

Some time ago I would have agreed with you, but not anymore. If media player crashes playing some video then the whole system becomes unstable and then even doing something like sending a file to the Recycle Bin freezes the UI...

--
--- No 16-bit support in Vista? Half of our modules still use it! ---
Re:Microsoft by Foolhardy · 2003-05-20 14:02 · Score: 3, Insightful

I have found that the drivers you use in Windows are the biggest factor in stability. Usually the drivers that come on the CD are the most stable, but they are not the best option for some devices. Microsoft supplied video drivers usually have almost no features and sometimes are quite incompatible, especially with games. Some companies produce great drivers while others seem to be really cheapo.
Sometimes, different companies will make completely different drivers for the same device. For example, the VIA AC'97 audio controller: VIA has their own drivers, and so does Realtek. I think that the Realtek are vastly superior to the VIA drivers, in terms of functionality and stability.
I know its easy to blame Microsoft for every crash on a Windows system, but in my opinion bad drivers seem to be the culprit most of the time.
Re:Microsoft by CognitivelyDistorted · 2003-05-20 14:47 · Score: 4, Interesting

Yes, NT5+ is very stable. MS is working on the driver problem. SLAM is a tool for verifying drivers. Given a requirement, e.g., after acquiring a kernel lock the driver must release it exactly once on all control paths, and some driver source code, SLAM can find all the ways the driver can fail the requirement. They have specifications for various driver types and are using them to test some drivers. It's a research project by the Software Development Tools group in MSR, but they're working on getting it stable and powerful enough to verify more drivers. If they can get it to work well enough, they'll supply it to hardware vendors.
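
To give a feel for the kind of requirement described above, here is a toy Python checker for "every acquire is matched by exactly one release on all control paths". It is emphatically not how SLAM works internally; it is just a brute-force walk over every path of a small, loop-free control-flow graph. The graph layout, node names and `find_lock_violations` are all invented for illustration.

```python
def find_lock_violations(cfg, entry="entry"):
    """Walk every path of an acyclic control-flow graph and report lock misuse.

    cfg maps node name -> (action, successors), where action is
    "acquire", "release" or None.
    """
    violations = []

    def walk(node, held, path):
        action, successors = cfg[node]
        path = path + [node]
        if action == "acquire":
            if held:
                violations.append((path, "double acquire"))
                return
            held = True
        elif action == "release":
            if not held:
                violations.append((path, "release without acquire"))
                return
            held = False
        if not successors:  # reached an exit node
            if held:
                violations.append((path, "lock still held at exit"))
            return
        for successor in successors:
            walk(successor, held, path)

    walk(entry, False, [])
    return violations


# A made-up driver routine: the error branch returns without releasing.
driver = {
    "entry": (None, ["lock"]),
    "lock": ("acquire", ["check"]),
    "check": (None, ["ok", "error"]),
    "ok": (None, ["unlock"]),
    "unlock": ("release", ["exit"]),
    "error": (None, ["exit"]),
    "exit": (None, []),
}

for path, problem in find_lock_violations(driver):
    print(" -> ".join(path), ":", problem)
```

The happy path passes; the early-return error path is flagged. Real drivers have loops and data-dependent branches, which is why a real tool needs something much smarter than path enumeration.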

Economics? by iso · 2003-05-20 13:09 · Score: 5, Insightful

While it's not the whole story, something definitely has to be said about the fact that while people are willing to pay for features, they're rarely willing to pay more for stability. Quite frankly there's little economic incentive to make software that doesn't crash.

If your market will put up with the occasional crash, and never expects software to be bulletproof, why bother putting the effort into stability? Until people start putting their money into the more stable platforms, that's not going to change.

The ultimate solution by dsanfte · 2003-05-20 13:09 · Score: 4, Interesting

The ultimate solution to the problem is to let computers write the software themselves. Give them a goal, set up evolutionary and genetic algorithms, and let them go at it on a supercomputer cluster for a few months.

Of course, you'd need to make sure the algorithms that humans wrote aren't flawed themselves, but once you got that pinned down, you would be more or less home-free.

Even if you didn't take this drastic a step, another solution would be computer-aided software burn-in. Let the computer test the software for bugs. A super-QA Analysis if you will. Log complete program traces for every trial run, and let the machine put the software through every input/output possibility.

--
occultae nullus est respectus musicae - originally a Greek proverb

Re:The ultimate solution by Christopher+Thomas · 2003-05-20 13:20 · Score: 2, Interesting

The ultimate solution to the problem is to let computers write the software themselves. Give them a goal, set up evolutionary and genetic algorithms, and let them go at it on a supercomputer cluster for a few months.

Unfortunately, this only works if you can distinguish between buggy and non-buggy code produced by the algorithm. You can do tests, but no test suite will be exhaustive (otherwise we'd just use it on human-developed code to find the bugs).

Perfect software can only be produced if a formal proof of correctness is possible. Even then, you're limited by the assumptions the proof makes.
Re:The ultimate solution by Jeremi · 2003-05-20 13:24 · Score: 5, Interesting

The ultimate solution to the problem is to let computers write the software themselves. Give them a goal, set up evolutionary and genetic algorithms, and let them go at it on a supercomputer cluster for a few months.

That only works if you can write a fitness algorithm that can tell whether the program did the correct thing or not -- otherwise, you have no way to decide what to "breed" and what to throw away. And for many types of program, that fitness algorithm would be more difficult to write than the program you are trying to auto-generate...
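
A minimal sketch of the point, assuming a toy problem (evolving a string) rather than a real program: the genetic loop below only works because `fitness` already contains the answer it is searching for. The target string, the population size and every function name here are illustrative assumptions.

```python
import random
import string

ALPHABET = string.ascii_uppercase + " "

def fitness(candidate, target):
    """Count matching positions. Note that this already 'knows' the answer."""
    return sum(a == b for a, b in zip(candidate, target))

def mutate(candidate, rng):
    """Replace one randomly chosen character."""
    i = rng.randrange(len(candidate))
    return candidate[:i] + rng.choice(ALPHABET) + candidate[i + 1:]

def evolve(target, rng, population_size=50, max_generations=10_000):
    population = ["".join(rng.choice(ALPHABET) for _ in target)
                  for _ in range(population_size)]
    for generation in range(max_generations):
        population.sort(key=lambda c: fitness(c, target), reverse=True)
        if population[0] == target:
            return population[0], generation
        survivors = population[: population_size // 2]   # keep the fitter half
        children = [mutate(rng.choice(survivors), rng) for _ in survivors]
        population = survivors + children
    return population[0], max_generations

best, generations = evolve("HELLO WORLD", random.Random(0))
print(best, "after", generations, "generations")
```

For a string the fitness test is trivial to write. For "a word processor that never crashes" it is not, which is exactly the objection.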

Of course, you'd need to make sure the algorithms that humans wrote aren't flawed themselves, but once you got that pinned down, you would be more or less home-free.

All you've done is replace a hard problem ("write a program that does X") with a harder problem ("write a program that teaches a computer to write a program that does X"). No dice.

Even if you didn't take this drastic a step, another solution would be computer-aided software burn-in. Let the computer test the software for bugs. A super-QA Analysis if you will. Log complete program traces for every trial run, and let the machine put the software through every input/output possibility.

For most modern programs, there isn't nearly enough time left before the heat-death of the universe to do this. Hell, for programs other than simple batch-processors, the number of possible inputs and outputs is infinite (since the program can do an arbitrary number of actions before the user quits it).
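
A back-of-the-envelope sketch of the arithmetic, assuming a machine that can run one billion test cases per second (a generous, made-up rate) and a program whose only input is a fixed number of bits:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

def years_to_exhaust(input_bits, tests_per_second):
    """Years needed to try every possible value of an input of the given width."""
    return (2 ** input_bits) / tests_per_second / SECONDS_PER_YEAR

# Assumed rate: one billion test cases per second.
for bits in (32, 64, 128):
    print(bits, "input bits:", f"{years_to_exhaust(bits, 1e9):.3g}", "years")
```

Under that assumption a single 32-bit input is covered in seconds, two of them (64 bits) take about 585 years, and 128 bits of input already takes vastly longer than the current age of the universe. Interactive programs accept far more input than that.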

--

I don't care if it's 90,000 hectares. That lake was not my doing.

Bugs are fun! by giraphe · 2003-05-20 13:10 · Score: 2, Funny

But eliminating bugs would take all the fun out of programming!

99% of all statistics come out of my ass by SweetAndSourJesus · 2003-05-20 13:11 · Score: 2, Funny

14% of all people know that.

--
the strongest word is still the word "free"

The bar isn't set very high. by joshtimmons · 2003-05-20 13:13 · Score: 2, Insightful

Sure, hardware is complex and today's software is huge, multi-featured, multithreaded, and event-driven, and all of these factors make writing good software hard, but I think that the reason we don't see higher quality OS's is simply that the bar isn't set very high by the market leader. We tolerate applications that freeze, computers that crash or need to be rebooted, etc. That low bar sets consumer expectations, and the result is that companies (and programmers) only work to a certain level of reliability; after that, they put their effort into more features instead of more stability.

For those who are willing to pay... by PseudononymousCoward · 2003-05-20 13:14 · Score: 5, Insightful

The number of bugs is smaller. Think of the systems used by the telcos, or NASA. Are they perfect? No, but they are much, much more stable than Win32, or Mac, or Linux. The reason is simple, the owners demand them to be.

There are costs associated with fixing bugs and reducing crashes. The more stable an operating system is to be, the more time and money that must be devoted to its design and implementation. PC users are not willing to pay this amount for stability, either in explicit cost, or in hardware restrictions or in trade-offs for other features.

As Linux evolves over time, its stability will always improve, but it may still never reach the stability of, say, VMS. Why? Because even with the open source model of development, there are still tradeoffs to be made, tradeoffs between new features and stability, mostly. And successive bugs are harder and harder to fix, requiring greater and greater amounts of time. At some point, the community/individual decides that they would rather spend their time going after some lower-hanging fruit.

Just my $0.02

Actually, IAAE.

Re:For those who are willing to pay... by dghcasp · 2003-05-20 14:22 · Score: 4, Interesting

Think of the systems used by the telcos, or NASA. Are they perfect? No, but they are much, much more stable than Win32, or Mac, or Linux. The reason is simple, the owners demand them to be.

This reminds me of a story I read in the internal magazine of a telecommunications equipment supplier that I used to work for. It was about an international toll switch somewhere in the U.K. that had been up for 17 years (or something extreme like that.) Furthermore, this included having all of its hardware upgraded and replaced. Twice.
Just stop and think about that for a while in PC terms... "I replaced my motherboard with the power on without rebooting my system, while it was serving 10,000 web pages a second."
Granted, this is a higher level of hardware with full redundancy, but it still boggles my mind.

Uhhh.... by swagr · 2003-05-20 13:14 · Score: 2, Insightful

I'm lazy so I haven't bothered to read what others have said. At the risk of repeating what others may have said:

Isn't this just a matter of economics?
I bet if you get everyone on the planet, and every company, to purchase software solely by merit of stability, you'll start to see a lot more stable software. But as long as people are shopping for *featureful* apps, *fun* games, and eye candy, it's not going to happen.

--

-... --- .-. . -.. ..--..

Mandate memory checking tools by hawkstone · 2003-05-20 13:15 · Score: 5, Interesting

I'm sure it's harder to accomplish this for kernel level code (it's primarily OSes being pointed at right here) but you can think everything is working hunky-dory and not realize something is going wrong under the covers.

Most errors of this kind can be found with testing under tools like Valgrind or Rational's Purify. I'm sure there are others (I've heard of ParaSoft Insure++, ATOM Third Degree, CodeGuard, and ZeroFault), but the quality of these tools really matters.

The issue is that tiny errors can cause crashes intermittently, and not immediately. For example:
uninitialized memory reads -- usually not a problem, but if this value is ever actually used, it will be.
array bounds reads -- never acceptable, but depending on the structure of memory, may not always cause an immediate crash.
array bounds writes -- like ABRs, may not be immediately fatal, but these are going to crash your code sooner or later.

Since they don't always cause an immediate crash, these errors are likely to creep in to released code without use of one of these tools. And if you want to know why we shouldn't always run programs in an environment that checks these kinds of things, try it once; you'll notice a speed hit of usually an order of magnitude. C/C++ is a perfectly acceptable language -- not all debugging has to be done by the compiler/interpreter or only after you notice a problem.

Anyway, hope that wasn't too pedantic....

Re:Computers don't crash by Anonymous Coward · 2003-05-20 13:17 · Score: 3, Funny

The current issue of Scientific American states that 51% of crashes are due to user error. 15%=software error. 34%=hardware error. Refer to article for further info.

Take your pick... by Rocker2000 · 2003-05-20 13:18 · Score: 3, Informative

Any of the following reasons conspire to result in buggy software these days:
(a) Clueless marketing departments, project managers, etc. set unrealistic deadlines for completing code to an acceptable standard. Shortcuts are taken to meet the unrealistic deadlines, and buggy products are the end result.
(b) Satisfying client demands for increased functionality (no matter how unnecessary) results in more complex code. Complex code is harder to maintain and troubleshoot. I sometimes think IT people have forgotten the notion that a simple solution that achieves functionality is the best solution.
(c) Programmers are humans, and humans make mistakes in code.
(d) To reduce the time and resources needed to complete a product, companies put in place anemic testing and load-testing methodologies.
(e) People often compare a computer to a kettle, a car, etc. and ask why it can't just work like that. Well, kettles do one thing and that's it. Computers do many complex things, from rendering a CAD diagram through to running a large-scale mail server. Cars do one thing by comparison too, but even most cars get more maintenance than some IT environments I've seen, and you don't see people rushing out to buy a no-name, no-brand car (the way they do with PC clones).
And many more, I'm sure. How many more failed IT projects and buggy software releases have to occur before people realize these things?

We've got a lot of techniques in the gaming world by Samir+Gupta · 2003-05-20 13:18 · Score: 3, Interesting

In the world of games, especially console games, a crash immediately spoils the user's gameplay experience, and it's doubly so if you don't have a mechanism to patch games as in the PC world.

In the GameCube, crashes are alleviated by having only a thin OS layer between the hardware and the game, and by allowing only a single task to run, at a single privilege level of the CPU. This avoids context switches and going back and forth between user and kernel mode, which introduce complexity and can wreak havoc if malicious data is present.

Furthermore, we have a set hardware configuration, running a well defined consistent set of drivers, which are again, minimal, and this eliminates another factor that often leads to crashes in the PC world.

The most important thing though is robust software design. In our games, we all code exception handlers for the software, so that a single errant NULL pointer doesn't bring the whole thing down with a "Segmentation fault" message as PC users seem to experience with their software, but rather, we gracefully recover, perhaps immediately rolling back to the previous iteration in the game loop and "moving" the player a bit, for instance, in a FPS where the player might have entered into an area in an orientation that happens to create a divide by zero error due to numerical imprecision.
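
As a rough sketch of that rollback idea (in Python rather than console code, and with a made-up update rule): each frame takes a snapshot of the last good state, and if the update throws, the loop restores the snapshot and nudges the player instead of crashing. The state fields, the `step` function and the size of the nudge are all assumptions for illustration.

```python
import copy

def step(state):
    """Hypothetical per-frame update; divides by zero for an unlucky heading."""
    state["frame"] += 1                      # partial update happens first
    state["x"] += 1.0 / state["heading"]     # blows up when heading == 0
    state["heading"] -= 1

def run(state, frames):
    recoveries = 0
    for _ in range(frames):
        previous = copy.deepcopy(state)      # snapshot of the last good frame
        try:
            step(state)
        except ZeroDivisionError:
            state = previous                 # roll back the half-applied update
            state["heading"] += 0.5          # "move" the player a bit
            recoveries += 1
    return state, recoveries

final_state, recoveries = run({"frame": 0, "x": 0.0, "heading": 2}, frames=6)
print(final_state, "recovered", recoveries, "time(s)")
```

The snapshot matters because the failing frame has already changed part of the state before the exception fires; without the rollback, the game would carry on from a half-updated world.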

In the future with CPU and memory speeds increasing, we are investigating new designs, such as microkernel based architectures where individual game entities are separate protected "processes" that communicate via some fast IPC mechanism such as shared memory or a "tuplespace", so that a bug in one entity doesn't bring the whole universe crashing to a halt, and I hope that such techniques are adopted by the general computing world.

--
-- Samir Gupta, Ph. D. Head, New Technology Research Group, Nintendo Co. Ltd., Kyoto, Japan.

Obligatory anti-MS by cptgrudge · 2003-05-20 13:18 · Score: 5, Insightful

Of course, there's no need to mention Microsoft's inability to create a stable system.

What exactly is the purpose behind this? Why was it put in here? People are going to need to grow up if people in "our" circle want to be taken seriously. I've used Windows 2000 and Windows XP both. They crash as much as my Red Hat and Debian boxes do. Never. They are all rock solid.

I work for a public school system. We have a class at the High School that teaches and certifies for A+ (I know, I know). They have all sorts of problems getting stuff to work and to get a system stable. In Windows and Linux.

It isn't because they are high schoolers.

It isn't because they are "just learning".

It's because they buy really shitty hardware. They look for the best cost, and they get their hardware from some loser manufacturer that has fucked up drivers and horrible quality control.

Properly maintained boxes with quality hardware in them just don't crash anymore. Programs maybe, but not systems.

Christ, people, this has been beat to death! Microsoft has a great product for an OS now! Get back to making something better than them instead of trying to convince yourself that Microsoft is delusional.

Mod me Flamebait, I don't care.

--
Qualitas edurus commercium, nullus penitus net rimor, nullus deus beneficium

Re:Obligatory anti-MS by UserChrisCanter4 · 2003-05-21 01:18 · Score: 2, Informative

Not that anyone will even read this, but good call.

I see so many people spend beaucoup money on their internals and then say, "oh, yeah, and this 420Watt PS I got for $35. What a steal!"

A good power supply is not cheap. On the flip side, though, a good power supply is not cheap. And a bad power supply is the most annoying thing to troubleshoot.

Antec makes some good stuff that I've been very happy with, ditto for PC Power and Cooling. Expensive, but so worth it it isn't funny.

Three Words. by coday · 2003-05-20 13:19 · Score: 2, Insightful

"Time To Market". Commercial software developers are always trying to "balance" quality and getting into the market ASAP. Unfortunately MS (and others) have made it acceptable to release service packs after the "final" product has already shipped. "Get it out there now, fix it later" is commonplace.

Thoughts on why *nix is stable by jone1941 · 2003-05-20 13:21 · Score: 2, Interesting

There are a lot of moving parts in a working linux system (I'm talking CLI here), however, it seems to be less prone to crashing. As someone previously mentioned, software that is larger and more complex is more likely to have a bug. The point I'm getting at is that the design principles of *nix dictate many small programs to create a large working system. When a program is small it can be designed and developed with care. This leads me to my final thought: modern Operating Systems with GUIs are less stable because they are generally designed as large monolithic systems.

I'm going to claim that the prime reason systems with GUIs (and I'm including everyone) are unstable is because no one has come up with a rock solid base for such a system. X is not solid, windows explorer, mac os x's application manager, no one has it right.

The one thing I am leaving out is that drivers also tend to be a major cause of instability. I cannot run the nvidia driver on my gentoo box, and certain usb events can bring a system to a screeching halt. What needs to happen is better design around the unstable interfaces, such that in the worst case scenario, things can still be recovered.

--
Fear trumps hope and ignorance trumps both

Why by pjdepasq · 2003-05-20 13:25 · Score: 2, Insightful

Massive complexity (even for simple apps) + endless possibilities of user interactions + rush to market + no silver bullet = likelihood of crashing

A lesson from history by Dr.+Bent · 2003-05-20 13:26 · Score: 5, Insightful

Back in the Middle Ages, when the Catholic Church wanted a Cathedral built, they would pay a bunch of Freemasons to do it. The Freemasons viewed themselves as creative artisans, and they closely guarded the secrets they used to construct these impressive houses of worship.

The method they used, however, was less than impressive. Typically, they would start with a general design, and piece together stone and mortar until something collapsed, which happened quite often. Then they would patch the section that collapsed and keep on going until something else fell down, or they finished. Given the level of understanding with regards to Physics and Material Science, those Freemasons had no other choice than to build them this way.

Now fast forward to the 21st century. The engineering disasters on par with those medieval collapses can be counted on one hand (Tacoma Narrows Bridge and the Hyatt Regency walkway collapse are the only two I can think of). This is directly due to the fact that a civil engineer can determine if a design is structurally sound before they build it.

Contrast this with modern day software development. We can't even tell if a system is flawed after we build it, let alone before. So software gets written, deployed, and put into the marketplace that has no assurances whatsoever of actually doing what it's supposed to do (hence the 10,000 page EULA).

You can't have Civil Engineers until you have Physics. And you won't have 100% bulletproof software until you have Software Engineering. And you won't have that until someone can figure out a way to prove that a given piece of software will perform as it's supposed to. JUnit is a step in the right direction, but there's still a long way to go. It's going to take a breakthrough on the order of Newton to make Software Engineering as reliable a discipline as Civil Engineering.

Re:A lesson from history by ThreeToe · 2003-05-20 14:32 · Score: 2, Insightful

You make a very insightful analogy and I think it is quite revealing.
You state that we won't have Software Engineering until "someone can figure out a way to prove that a given piece of software will perform as it's supposed to."
Alas, this is known to be an impossible task in the general case: this is Turing's halting problem. There's no Newton-caliber breakthrough waiting in the wings here.
Unit testing works because testers know their software systems intimately and can specialize testing code to work in a narrower number of cases. State modeling languages such as ASML can help improve the situation, but seasoned testers know that no tool will help them achieve 100% block and arc coverage of their code.
I'll throw this out for discussion, then: the underlying principles of a software system's design dictate its fundamental physics. It is difficult (sometimes impossible?) to make distinctions between a software's functionality and its substrate.
In an ideal world, developers would find a technique by which they could _always_ separate the two and hence categorize a common physics. The choice of language is part of the physics, but it isn't the sum total: C++ apps can have radically different underlying structures.
Thoughts?
Re:A lesson from history by Minna+Kirai · 2003-05-20 15:23 · Score: 3, Insightful

It's going to take a breakthrough on the order of Newton to make Software Engineering as reliable a discipline as Civil Engineering.

The reliability of today's Civil Engineering comes not from deep theoretical understanding à la Newton - it's really just the same "build, crash, repeat" method those Freemasons have been using for 1000 years.

Now that we've had centuries of experience at building similar kinds of structures, most of the kinks have been worked out. Those rare CivEng projects that break new ground still have a high risk of unexpected failures. (A 4000% cost overrun is a failure)

Civil Engineering still uses empirical testing to decide if a new technique is reliable, as does "Software Engineering". You just notice it more in SE because that field has more opportunities for innovation and much, much fewer penalties when an experiment fails.

JUnit is a step in the right direction, but there's still a long way to go.

JUnit is a step down a curving road to a dead-end. It won't take us to an ultimate solution (but it will provide benefit in the near-term future). That's because it's not a system to help formally prove code is correct (which some unpopular languages support to small degrees)- instead, Unit Testing is just a way to automate "build, crash, repeat" empirical testing.
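
A small sketch of that limitation, using Python's `unittest` in place of JUnit. The function is deliberately buggy and the test years are my own choice; the suite passes anyway, because passing tests only say something about the inputs that were actually tried.

```python
import unittest

def is_leap_year(year):
    """Deliberately buggy: forgets the century rule (1900 was not a leap year)."""
    return year % 4 == 0

class LeapYearTest(unittest.TestCase):
    def test_typical_years(self):
        self.assertTrue(is_leap_year(2004))
        self.assertFalse(is_leap_year(2003))
        self.assertTrue(is_leap_year(2000))

# Run the suite programmatically so the result can be inspected afterwards.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(LeapYearTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)

print("suite passed:", result.wasSuccessful())
print("is_leap_year(1900) returns", is_leap_year(1900), "which is wrong")
```

The tests are still worth having, since they catch regressions cheaply. They just aren't a proof.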

A lasting lockup that hasn't been fixed... by Epistax · 2003-05-20 13:30 · Score: 2, Insightful

... is deadlock. Let's say you have two IO devices, for ease we'll call them disk drives, which give exclusive access. Process A grabs one disk drive, then loses its processor turn (happens many times per second). Process B grabs another disk drive, then requests the drive Process A has, and 'blocks'. Process A then requests the drive Process B has, and 'blocks'. This is a very simple example of deadlock. Now if one of these processes is an OS process, well too bad.

There are mitigation strategies, but in short they all suck. You can constantly monitor every piece of hardware to see who has rights to what, and flat out deny access to people when a deadlock may occur. This is slow and isn't very nice to processes who now have to trap twice as many errors for many IO operations.
Another method (in avoidance) is to require all processes to request hardware in a certain order. This prevents all deadlock, but is unrealistic to how a program may function, and may require a programmer to hold onto a hardware device for much longer than actually needed.
The last method is perhaps worst of all: restrict every process to one hardware device at a time.
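
Here is a minimal sketch of the second strategy (a fixed acquisition order), using Python threads and locks to stand in for processes and disk drives. The names `drive_a`, `drive_b` and `acquire_in_order` are made up for the example; the only real point is that both workers end up taking the locks in the same global order no matter which order they ask for them in.

```python
import threading

# Two hypothetical exclusive devices, each guarded by a lock.
drive_a = threading.Lock()
drive_b = threading.Lock()

# Global ordering rule: locks are always acquired in ascending rank.
RANK = {id(drive_a): 0, id(drive_b): 1}

def acquire_in_order(*locks):
    ordered = sorted(locks, key=lambda lock: RANK[id(lock)])
    for lock in ordered:
        lock.acquire()
    return ordered

def release_all(ordered):
    for lock in reversed(ordered):
        lock.release()

finished = []

def process(name, first, second):
    for _ in range(1000):
        held = acquire_in_order(first, second)   # requested order is ignored
        release_all(held)
    finished.append(name)

# The two workers ask for the drives in opposite orders, which is the
# pattern that can deadlock when each lock is simply taken as requested.
workers = [
    threading.Thread(target=process, args=("A", drive_a, drive_b)),
    threading.Thread(target=process, args=("B", drive_b, drive_a)),
]
for w in workers:
    w.start()
for w in workers:
    w.join(timeout=10)
print("finished:", sorted(finished))
```

The cost mentioned above is visible even here: a worker that really only needs the second drive first may still have to take the first one early, and hold it longer than it would like.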

Can you think of a better strategy? Patent it and make a few billion. The strategy taken by *nix, Mac and Windows is... well to completely ignore it because it very rarely happens, but as processors in the future become faster and faster, they are more apt to run more and more processes at once, increasing the problem significantly.

Note this problem only occurs for hold-and-wait devices. Usually any number of programs can read a file for instance, and there is no conflict at all. I find that Operating System Concepts (Silberschatz, Galvin, Gagne) covers this topic well, and plenty of other hotspots.

I can tell you why MY computer crashes! by MATTtheROGUE · 2003-05-20 13:30 · Score: 2, Funny

It seems one of the most popular things to do is to blame the software creators, or the human operators. That's not the reason my computer crashes. Why does my computer crash? Well, when you spend hours of your day looking at things you shouldn't be looking at (wink wink), and other memory consuming things (chatting, all those programs that, even though I disable them from starting up [yes, using regedit], still manage to start), they just eat away at my computer. And finally, the most important and Sane reason of all. The Underpants Gnomes. They finally figured out what stage two is. Stage 2 part a; Crash My computer, Stage 2 part c; Use the underpants to make profit. Thanks for listening to my rambling post.

Software, complexity, and human nature. by Christopher+Thomas · 2003-05-20 13:31 · Score: 3, Insightful

There are several reasons why software keeps crashing, and they aren't going away any time soon. These reasons are:

- You can't prove that most software works.

  Except for a restricted set of cases, you can't prove that a given piece of code works or doesn't work. A truly exhaustive set of tests would be impractical to perform, and formal proofs of correctness place strong limits on the type of code you can write and the environment in which you can write it.

  The result is that code is assumed correct when no bugs are found. This only means that there probably aren't _many_ bugs left. Thus, it may still crash (or have a security hole, or what-have-you).

- Software is very complex.

  Software has been complex for a long time. It just tends to be bigger now. A larger system has more opportunities for unexpected high-level interactions between components, but even a smaller system will have enough twists and turns that formulating a really good test suite, or checking the code by inspection, is very difficult. Bugs will be missed. As was discussed above, many of these missed bugs will slip through testing and reach the world.

- Nobody wants to pay for perfect software.

  As more effort is applied, you can get asymptotically closer to a bug-free system. However, this is far past the point of diminishing returns on the cost/benefit curve. For sufficiently constrained systems, you can even try proving it correct, but this tends to lead to cutting out a lot of functionality, speed, or both.

  In situations where reliability must be had at any cost - aerospace control systems, vehicle control systems, medical equipment - the money will exist to produce near-perfect code, but even then there are bugs that occasionally bite. With commercial software, the buyer would rather have an application that crashes now and then than an application that costs ten times as much and comes out several years later.

Free and/or open software avoids some of this by staying in development longer, which allows more of the bugs to be caught, but even free and/or open software evolves. Every change brings new bugs to be squashed. As long as there are new types of software that we want, it isn't going to end.

Re:Software, complexity, and human nature. by tabby · 2003-05-20 18:52 · Score: 2, Funny

>># You can't prove that most software works.

No-one can be told if this software runs.
You must compile it for yourself.

--
I've experiments to run, there is research to be done on the people who are still alive.

Simple, yes, for other reasons by jabber01 · 2003-05-20 13:32 · Score: 4, Insightful

Software crashes because it's complex, yes, but that's just part of it.

Jets are complex too. So is the Space Shuttle. Cruise ships. CARS are pretty complex.

While all these things do suffer catastrophic failure from time to time, it is far from the norm. Defective cars get recalled. Space shuttles ALL get grounded at the mere possibility of defect.

If Q/A as stringent as this was applied to software, Microsoft - and in fact most of the software industry - would be out of business. Can you imagine a Windows recall?

There is software out there that does not fail. Mind-bendingly complex software of the sort that "drives mere mortals mad" to boot. It is tested and retested, through all possible situations - not just the "likely 80%" of them. It is proved correct, and then verified again.

COTS software is crap because neither the market nor the regulatory forces (such as they are, but that's a separate discussion) require it to be anything better. Nor could they.

A 747 Jumbo costs a whole lot, and while much of that cost is in the manufacture of the "big and complex thing" that it is, a significant chunk of that cost is also due to the design process, the testing, the modeling and simulation of it.

Software is easy to scale, everyone can have a copy of the product once one is built. Cake. But spread out the cost of an error free design - tested to exhaustion, passed through V&V and so on, and you have a completely different market landscape with which to contend.

Consumers, in the COTS context, don't mind "planned obsolescence" in their software. The current state of things proves this. People would rather have pretty features on a flaky system, than a solid system.

--

The REAL jabber has the user id: 13196
What you do today will cost you a day of your life

Re:Simple, yes, for other reasons by Surazal · 2003-05-20 13:44 · Score: 3, Insightful

Consumers, in the COTS context, don't mind "planned obsolescence" in their software. The current state of things proves this. People would rather have pretty features on a flaky system, than a solid system.

This is not necessarily true... it's a bad generalization besides. Most people I work with in the IT industry would give their arm, leg, spleen, right lung, part of their left lung, lower intestine, and maybe even their occipital lobes for a reliable system that WORKS. Features are secondary.

The "features over stability" myth is just that: a myth. Show me an admin that prefers only the latest and greatest in "features" and I'll show you an admin that will lose all her/his hair within six months (a little after all their hair turns white).

Well, ok, I work primarily with IT people admittedly. Perhaps the folks in management are a little different. But I've noticed that IT people have ways of making management's lives miserable (in ways that are downright creative) when a bad decision is made with software purchases. I've done it, myself. ;^)

--
--- Journals are boring; Go to my web page instead
Re:Simple, yes, for other reasons by Chris+Carollo · 2003-05-20 14:25 · Score: 3, Interesting

Jets are complex too. So is the Space Shuttle. Cruise ships. CARS are pretty complex.
Then again, if one of the overhead bin latches gets stuck, or my overhead light burns out, or my seatbelt gets stuck, the entire plane or car doesn't instantly explode. The issue isn't complexity, it's fragility.

Software is incomprehensibly fragile -- any single thing can cause a crash, taking the whole system or application down. And even those critical parts of things like airplanes have multiple redundancies, something that's hard to build into software. You can do things like catching exceptions, but you typically can't recover as gracefully as if there was never a problem at all.

The shuttle is actually not a bad analogy -- it's also very fragile due to the stresses it endures. And we've effectively had two crashes in 100 runs. Most software is more stable than that.
Re:Simple, yes, for other reasons by drinkypoo · 2003-05-20 14:39 · Score: 5, Funny

Can you imagine a Windows recall?

I must be able to, I'm feeling flushed and my nipples are hard.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Re:Simple, yes, for other reasons by Minna+Kirai · 2003-05-20 14:47 · Score: 2, Insightful
Most people I work with in the IT industry would give their arm, leg, spleen, right lung, part of their left lung, lower intestine, and maybe even their occipital lobes for a reliable system that WORKS.

No, that's the myth!

Show me one of these voluntarily maimed admins, who carved out all his organs hoping for improved software. They don't exist.

(More realistically, show me one who sacrificed 30% of his annual salary for better software. He also doesn't exist)

True, from day to day, everyone wishes that their jobs were easier.
- "I wish customers would read the web page, instead of calling me for phone support"
- "I wish we had a train to Chicago instead of me driving this truck for 7 hour stretches"
- "I wish the servers I maintained didn't crash"
However, if those people were fully rational, they'd understand that as soon as their wish comes true, they're out of a job. (An enlightened person will welcome the change as better for the world at large; a luddite would whine, scream, and throw boots in the gears)

And anyway, IT admins are not the consumers of software. They're not the ones who drive the buyer-seller economy. The actual consumers are other people in the company- and from their perspective, the IT staff are an expense attached to buying the software.
Re:Simple, yes, for other reasons by Surazal · 2003-05-20 16:29 · Score: 3, Informative

You ought to work tech support some time. There are real costs associated with software bugs. These costs are measured. Many times these costs are measured more meticulously than software vendors would like to admit. There are more organizations than you might realize that purposely delay software deployments to make sure that they do not ruin their technology infrastructure. Often times, when I work with a senior admin within the organization, I find they are the "NO" people. "No", we will not apply that patch unless you prove to us it will fix the problem. "No", we will not apply that patch unless you prove it won't introduce new problems. And, in the case that there are unforeseen complications in a software upgrade, guess who gets the heat? Directly, it's the senior admins. Both directly and indirectly, it's the software vendor. Bad publicity == lost sales. Ask any sales person (technical or non-technical).

Of course, I'm at the end of the equation where these costs are realized after the fact. Also, I think that since I come from the Unix world, I've seen more preference towards quality over quantity. Unix-oriented orgs are much more cautious than Windows-oriented orgs. I attribute this to lack of experience in that market, but the way things are going, experience is not in short supply. Bugs and security breaches are costing companies in real dollars nowadays, and commercial and gov't organizations are not ignorant of this fact, even at the high echelon levels.

For proof, look at Microsoft. I certainly remember reading that they decided to go for a company-wide code freeze to resolve bugs and security issues. This code freeze lasted for SIX MONTHS. That's a HUGE risk for a software company. Also, there's that whole trillion dollar fine against the company thing, too, that's been circulating a bit lately. It also undermines any arguments based on "customers are lemmings that will buy anything we dangle in front of them". Maybe the fact that features outweighed stability was true during the dot-com boom. I think it's definitely less true now, by a significant degree.

--
--- Journals are boring; Go to my web page instead

Don't single out Microsoft by callipygian-showsyst · 2003-05-20 13:32 · Score: 3, Interesting

Of course, there's no need to mention Microsoft's inability to create a stable system
My Windows XP box, which is my fileserver, has been up for 5 months so far.

My OS X box, which I use for web browsing and word processing, crashes about once every three days.

Now, I certainly have some bones to pick with Microsoft, but Apple is no better.

--
Best Buy can have you arrested

Need to keep costs low by anoopiyer · 2003-05-20 13:33 · Score: 2, Insightful

Is the need for speed preventing the use of reliable software design techniques?

No, it's the need to keep costs low, together with time-to-market pressure, that is preventing the use of reliable software design techniques.

If all vendors had a large number of programmers and could select their own timeframe for releases, code would perhaps get more reliable.

But on the other hand, Microsoft does have a large number of programmers, and they pretty much decide their own release schedules. So the above obviously doesn't hold for Microsoft. I guess that's because all their releases add new features, which introduce bugs...

That's true for other vendors and other platforms too, isn't it? If all feature enhancements to say RedHat or SuSE Linux were stopped overnight and all future releases were only bug fixes, then said distro would be 100 percent bug-free at some hypothetical point in the future. But they have to add features to compete and evolve, and alas, said distro will never be bug-free.

The low barriers to software updates also make software a less rigorous practice than hardware design. In hardware design, it takes millions of dollars to tape out a new rev of a chip to fix a bug; not to mention all the bad publicity the vendor gets (Intel fdiv bug, anyone?). Hence rigor in design and validation is much higher for hardware when compared to software.

Software is a young industry by AmVidia+HQ · 2003-05-20 13:35 · Score: 3, Insightful

I'll paraphrase a comment I read somewhere before; I don't remember where:

"We've been building bridges for thousands of years, but we've only been writing software for a few decades."

To combat increasing bugs in increasingly complex software, we need better tools. From the low level (more reliable memory handling) to the high level (more abstraction to reduce human programming errors) in software languages and compilers.

You can't expect to build the Golden Gate with shovels and not have it fall apart, can you? (no, i'm not a terrorist)

--
VIVA1023.com | Political Fashion.

STFP by rice_burners_suck · 2003-05-20 13:36 · Score: 5, Insightful

Software crashes because: Software is an immature field. Good software takes time. Software is unobvious to business managers who want the job done yesterday.

Businessmen generally do not understand the internal workings of software. They are in a "big-picture" sort of world where software is but one pesky detail that will be taken care of. A computer crash that causes so many thousands of dollars in damages is no different than a truck crash. There is simply a risk to every element of business. If the risk is relatively low, the big shots don't care about it. Grocery stores in earthquake prone areas continue to place glass jars on the edges of shelves. Sure, there will be an earthquake one day, but it's a calculated part of business risk, and the risk is relatively low (the Earth doesn't shake every five minutes).

Software bugs are a similar risk. It needs to look like it works. It needs to crash (and lose data) infrequently enough that the software will still sell. The business is not concerned with stamping out software bugs. It is concerned with releasing the software and making money. If the need arises, the business will improve the software and make more money. More often than not, this means adding features and shiny graphics. Fixing bugs is not very important to companies because customers do not pay for bug fixes. By the consumer, bugs are viewed as defects and their fixes should be free. By the company, bugs are viewed as a minor risk and fixing them would cost too much to justify. So you'll reboot once in a while or lose an hour's work once in a while. If it fries your hard disk, well, you should have backed up your data.

Software is also one of the newest fields of human endeavor. Buildings have been built, ships have sailed and farms were farmed, all for thousands of years. No matter how much progress happens in these fields now, they have come so close to "perfection" that continued improvement serves to lower cost, improve safety and increase convenience. It's not a matter of, "Gee, how can we make buildings that actually stand without falling down three times a week?" It's just a matter of, "How wide, how deep, how tall and what color glass do you want on the outside?" You pay X dollars, wait Y months and voila, there is a building. But programming has been around for how long, 50 years? It's an increasingly important but very immature field.

Buildings, bridges, ships... they're obvious. Everyone knows that if enough lifeboats aren't put on an unsinkable ship, it'll sink on purpose, just to piss you off. Everyone knows that if a 100 story building is going to stand, it has to take 10 years to build it. Everyone knows that a dam has to be pretty damn strong or it'll break and flood half the countryside. The building, shipyard and dam businesses aren't progressing at light speed. It is easy to justify 10 years for an outrageous building design because people KNOW what is involved. But software... Now that's totally unobvious. Software is an idea. It's abstract. It's a bunch of curse words that look like gobbledygook to the uninitiated. A bunch of "noise" characters on a broken terminal. Something done by a bunch of skinny, pimply faced geeks who got beat up in high school, took the ugly girl to prom and didn't have any friends. Why should a manager bother to care that fst_jejcl_reduce() causes a possible NULL pointer in the outer loop if case 32 is activated, which happens if the previous re-sort encountered two items with similar Amount fields, all of which will take a whole day to find and fix and will only happen, say, 2% of the times this particular feature is invoked by the user, which isn't that often? Why should anybody justify spending 2 years to develop some bulletproof program that can be banged out in 3 months, with bugs? What's the problem? Construction workers are risking their lives, moving heavy things, sweating all day in the hot sun... While geeks are sitting in offices just punching crap on a keyboard. How difficult could it possibly be? To

According to complexity theory... by zrm8y5m02 · 2003-05-20 13:39 · Score: 2, Interesting

instability is inevitable for fast evolution. A stable system means it's not evolving fast enough, or that evolution is slow.

A lot of software has been evolving so fast that there's been no time to perfect the existing features before adding new ones. At some point in its lifetime, an individual piece of software reaches a state where it's somewhat "stable" in the sense that no more major features are needed. For example, TeX reached its relative maturity during the 80s and, IIRC, there are no known bugs at this point.

If all software were given enough time, it would all reach that kind of maturity. The problem is that not all of it can survive that long - usually it becomes obsolete before it becomes stable...

Obligatory OpenZaurus plug by noda132 · 2003-05-20 13:40 · Score: 2, Interesting

Use OpenZaurus and while crashes still appear (I assume 3.2 will eventually, though I haven't had a full crash since it first came out), crashes will not lose all your data, since it's written to flash.

Also, my Linux box hasn't crashed this year, and I can't recall any crashes last ye-- no, wait, there was one slew, but it was an icky driver which I got rid of. I'd say a pretty good track record for a system built almost entirely from CVS.

Can't remember any crashes this year or last on any other Linux boxes I manage that I can think of (8 boxes off the top of my head).

Turing showed this by martin-boundary · 2003-05-20 13:41 · Score: 4, Interesting

A crashed computer is a computer that's stopped. Alan Turing proved in 1936 that the halting problem is undecidable: no general procedure can decide, for every program and every input, whether that program will eventually stop. So, in general, it's impossible to know whether a computer is going to crash under all possible circumstances (inputs).

Accept it. It's a fact of nature.

Re:Computers don't crash by Trillan · 2003-05-20 13:43 · Score: 2

Ah, but is user error defined as "well, you know it doesn't work, so why'd you try"?

all systems crash, not just MS by dirk · 2003-05-20 13:44 · Score: 4, Interesting

When can we finally give up the FUD of "MS crashes all the time"? Anyone who has used a later MS OS (Win2k or XP) can easily see they crash very rarely. I have had my Redhat install have more problems than my Windows install in the past 6 months, and on the MS system most of the problems have been 3rd party software while on Linux most of the problems have been the OS itself. The reason systems crash is that there are many pieces, written by many different people, interacting with each other. This is the same whether the OS is Linux or Windows. The harping on the instability of Windows does nothing but hurt the Linux cause, since anyone who actually uses a newer version of Windows knows that the claim has no basis in reality.

--

"Information wants to be expensive" - Stewart Brand, the same guy who said "Information wants to be free"

Re:all systems crash, not just MS by shish · 2003-05-21 00:26 · Score: 2, Funny

It's windows, not even the bugs work properly...

--
I mod down anyone who says "I will be modded down for this", regardless of the rest of their comment

Why do computers crash? Because we let them. by dschuetz · 2003-05-20 13:51 · Score: 3, Insightful

Face it -- if our cars broke down as frequently as Windows (or Linux or whatever), we'd be suing the auto industry out of business.

If our VCRs ate every tenth tape and only played tapes from the same manufacturer as the VCR with any quality, they'd all be returned to Circuit City.

But for software, we grit our teeth and say, well, I just don't understand computers, and reach for the power switch.

Until we, as consumers, start fighting for software that works without crashing, we'll continue to get the lowest possible quality -- just as we have for years. Once the customer starts demanding a quality product, the quality (and whatever software development practices, languages, testing procedures, etc., are needed) will follow.

Bottom line -- there's no real incentive. Microsoft makes billions with buggy software; the increase in profit from selling non-buggy software is pretty small.

Time is Money. by Rimbo · 2003-05-20 13:53 · Score: 5, Interesting

I think this is basically the right answer.

A couple of months ago, the company I worked for spent a lot of time and effort developing a robust testing methodology. We had a software product that, through blood, sweat and tears, would not crash unless you basically blasted the hardware in some way.

But that led to two problems. First, we only had so many people working, and resources spent testing and bugfixing were not being used to add new features. Second, the time it took to get it that robust delayed the product's release beyond the point where we could recover the investment. [Time developing] * [Cost of operating] was greater than [expected number of units sold] * [price per unit].
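To put numbers on that inequality, here is a minimal Python sketch. Every figure in it is invented for illustration; none of them comes from the company described here, and the function name is hypothetical.

```python
def recovers_investment(months_developing, monthly_operating_cost,
                        expected_units_sold, price_per_unit):
    """Return True if expected revenue covers the cost of development."""
    development_cost = months_developing * monthly_operating_cost
    expected_revenue = expected_units_sold * price_per_unit
    return expected_revenue >= development_cost


# Hypothetical figures only: 5,000 units at $300 is $1.5M of revenue.
# Twelve months at $100k/month costs $1.2M, so the product pays for itself.
print(recovers_investment(12, 100_000, 5_000, 300))

# Doubling the schedule to harden the product costs $2.4M, which the same
# sales can no longer cover.
print(recovers_investment(24, 100_000, 5_000, 300))
```

The point of the sketch is only that schedule sits on the cost side of the inequality: extra months of testing raise the left-hand side while leaving the right-hand side unchanged unless they also raise sales.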

What ended up happening was that we lacked the features to justify the price and number of units we needed to sell to cover the cost of developing it. We had no bugs -- and we could be certain of it -- that would crash the machine.

As of last month, the company could no longer afford to pay me. I'm not there any more.

The moral of the story is that trying to make a bug-free product will bankrupt your company, especially a startup. Software tools have improved, but the benefit largely goes towards adding new whiz-bang features that sell the product for more money, not to being able to fix more bugs.

What we should do as engineers and managers of software products is to not be afraid of getting the product out the door with a few bugs in it if we want our company to do well; this business reality is ultimately why bugs will be a big part of software for the foreseeable future.

TexOS by waldoj · 2003-05-20 13:53 · Score: 2, Informative

I've got the solution: the TeXOS(tm)!

Slogan: Crash-free -- Donald Knuth guarantees it!

-Waldo Jaquith

A Contributing Factor & Principles by Jerk+City+Troll · 2003-05-20 13:53 · Score: 3, Informative

I develop software at a small shop for a living. We're scraping by; money is extremely tight. As a result, anything we code is coded as quickly as possible. The boss always says "we need this done fast and we need it done right." This sentence is almost always followed up with statements like "don't build it for the ages" or any number of quotes that indicate he doesn't care how, just get it finished as soon as possible.

Welcome to the sorry state of affairs in the software industry today. Developers are too rushed (or don't care themselves) to come up with good designs and write solid implementations. Weaker coders are rewarded for their speed while stronger coders are degraded for software built to last.

Good engineering principles must be applied if software is not to crash all the time, contain more than its fair share of bugs, contain security vulnerabilities, or corrupt data. These engineering principles are complicated in practice, but not so numerous. I cannot be exhaustive here, but I am trying to convey a general idea.

- Build tiny, atomic pieces and make sure they work. It amazes me how my peers always come up with blanket solutions to problems. These solutions are remarkably complex and may work for most of the data, but not all of the data. Remember tiny pieces! The immediate question is how to make sure these pieces work. It's more than just testing here. You cannot just evaluate a small number of pre and post conditions and assume something works. Prove mathematically that for all possible inputs/pre-states you receive correct outputs/post-states. Remember your discrete math class? Remember doing proofs? Apply it! Computers are fundamentally number crunchers and your input/output are fundamentally numbers and can be represented symbolically and in finite terms. Certainly cases exist where this principle cannot be employed, but those are rare. People working in the encryption field should understand this principle very well.

- Have clearly defined specifications for the software to be written. Strive to work out any questions or ambiguities in the specification before even embarking on the design process. If the specification is unclear or ambiguous, it is simply a matter of time before programmers do the wrong thing or begin to make incorrect or unreasonable assumptions. Another important note on this principle is the partitioning of specifications where appropriate. Do not let specifications for user interface mingle with those for the back-end. While they may be closely related, try to follow the Model-View-Controller (MVC) pattern. This must be adhered to at the earliest stages of the specification, all the way up to the actual pounding of keyboards.

- Conduct frequent peer review! This is one of the strongest points of open source software development. I argue that it does not occur frequently in the commercial world because everyone is afraid of their peers negatively reviewing their code, placing their jobs at risk. Sadly, this only results in a suboptimal product. The more other people look at your code, the more likely your mistakes (and they do exist) are to be found. It's a shame work place environments are not geared to eliminate fear of failure; I think most software would be a lot better today if people were eager to do reviews.
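As a rough illustration of the first principle (tiny pieces, checked for every input), here is a Python sketch. It is not a mathematical proof, and the function is a made-up example, but when the input domain is small and finite, checking every case gives the same coverage a proof would.

```python
def saturating_add_u8(a, b):
    """Add two bytes (0..255), clamping at 255 instead of wrapping around."""
    total = a + b
    return 255 if total > 255 else total


def check_all_inputs():
    """Check the postconditions for all 256 * 256 possible input pairs."""
    checked = 0
    for a in range(256):
        for b in range(256):
            result = saturating_add_u8(a, b)
            assert 0 <= result <= 255          # result is still a byte
            assert result >= max(a, b)         # adding never shrinks a value
            if a + b <= 255:
                assert result == a + b         # exact when there is no overflow
            else:
                assert result == 255           # clamped otherwise
            checked += 1
    return checked


print(check_all_inputs())  # 65536 cases, every one of them checked
```

For pieces whose input space is too large to enumerate, this is where the proof (or at least a carefully argued invariant) has to take over.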

Once again, this isn't entirely complete, but I think the point is clear. This was written on the fly and mostly off the top of my head, but I think I've got it right. In general, a lot of common sense needs to be applied. For example, if your input is for all intents and purposes random (it's coming from the user) then do extensive checking on it! If you don't want to encounter unexpected values in your data structures, make sure you hide as much as possible from the rest of the code. It amazes me how little the most basic computer science principles are followed in most software development projects. This is one of the biggest reasons software is so unstable.
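A small Python sketch of those last two points, checking user input and hiding data structures. The names `parse_quantity` and `Inventory`, and the 1..1000 range, are hypothetical and chosen only for the example.

```python
def parse_quantity(raw, lowest=1, highest=1000):
    """Turn untrusted user text into an int in [lowest, highest].

    Raises ValueError for anything else, so bad input is rejected at the
    boundary instead of leaking into the rest of the program.
    """
    text = raw.strip()
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"not a whole number: {raw!r}")
    value = int(text)
    if not lowest <= value <= highest:
        raise ValueError(f"{value} is outside {lowest}..{highest}")
    return value


class Inventory:
    """Keeps its dict private so every change goes through a checked method."""

    def __init__(self):
        self._stock = {}

    def add(self, item, raw_quantity):
        # Validation happens before the structure is touched.
        quantity = parse_quantity(raw_quantity)
        self._stock[item] = self._stock.get(item, 0) + quantity

    def count(self, item):
        return self._stock.get(item, 0)


inventory = Inventory()
inventory.add("widget", " 3 ")
try:
    inventory.add("widget", "-5")
except ValueError as error:
    print("rejected:", error)
print(inventory.count("widget"))  # still 3; the bad value never got in
```

Because nothing outside the class writes to `_stock` (by convention, in Python), the only values that can ever appear in it are ones that passed the check.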

--
Join Tor today!

Complexity, standards, peer review, sanity. by twitter · 2003-05-20 13:54 · Score: 3, Insightful

this was the exclusive realm of the highly trained engineer, not some wannabe type that pervades the current service market.

Let's hear it for the "wannabes". I'm not a highly trained engineer by a long shot, but I've got computers that don't go down except for power outages. Then they come right back up. As ESR is so fond of pointing out, complexity kills traditional software. Closed source can't keep up.

Free software has the answer. Debian has 8,710 packages available to do anything commercial software does, mostly better. Not just one or two pieces of it, every piece. My systems never crash under their stable release and I run all sorts of services. How is this? It's easy. Free code gets used, fixed, improved and reviewed all the time. The pace of improvement is astounding. I could go on and on about things free software does that common commercial code does not. Code that never sees the light of day is dead.

--

Friends don't help friends install M$ junk.

Re:Complexity, standards, peer review, sanity. by 4minus0 · 2003-05-20 18:28 · Score: 3, Insightful

Free software never ceases to amaze me.

I have set up countless email servers, firewalls, spam catching relays, web servers and dns servers. Some clients want Red Hat, others are more up on the game and have heard of Debian or Slackware, others could care less. That's beside the point... It's open freaking source, hack it to your needs/liking.
You wanna know how much I had to pay for the operating system or individual packages of said software? Nada, that's right, zero, zilch, zip.

It baffles the mind how something that works so well can be free.
That means a lot to a small time contractor like myself.
I may not have the money or the coding know-how to give back to the community but you can bet your custom kernel that when somebody has a question on Usenet or a web forum about Linux or a particular package that I happen to know about that I help that person like I was being paid to.
That's the beauty of 99% of the people in this community... I can even say "I have a client who needs X how do I implement this?", and more often than not someone will help me out with the answer or at least point me to the docs that will answer my question. Even knowing good and damn well I'm getting paid to find the answer to that question.
This is a good thing we have here folks, I would imagine that I've taken far more than I've given back but every chance I get I do give back and I like to think that most users of this crazy thing called Free Software do too. So far that theory has proven itself true. Just a little soapboxing on my part here, sorry for the rambling.

--
You've got an easy breezy wind at your back...most of the time.

I'm surprised nobody has pointed out yet... by Frobnicator · 2003-05-20 13:56 · Score: 3, Informative

That beyond all the hyperbole and other reasons, there is something that could be done but usually isn't.

In C++, which a great deal of software is written in, an exception block [or the language or system equivalent] placed around the entire application will catch just about any recoverable error. This is how most of the Windows blue-screens or 'your application has performed an illegal operation and will be terminated' messages are brought up. This is how Linux and other Unixes generate a core dump.

The actual handling may be in a signal handler, try/catch block, or abend, but the functionality is present in every actively developed language I have ever worked with from cobol and fortran to c, c++, java, and object pascal.

The main reason for applications actually crashing is programmer laziness.
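For what that looks like in practice, here is a minimal sketch in Python rather than C++ (the idea is the same in any language with exceptions). The `run` function is a hypothetical stand-in for a real application and fails on purpose when asked to.

```python
import sys
import traceback


def run(argv):
    """Stand-in for the real application (hypothetical; fails on request)."""
    if "--fail" in argv:
        raise RuntimeError("something the programmer never anticipated")
    return 0


def main(argv):
    """Wrap the whole application so no ordinary exception escapes unreported."""
    try:
        return run(argv)
    except Exception:
        # Last line of defence: record the details for the developers,
        # tell the user something readable, and exit with an error code.
        traceback.print_exc(file=sys.stderr)
        print("Sorry, an internal error occurred.", file=sys.stderr)
        return 1


print("exit code:", main([]))
print("exit code:", main(["--fail"]))
```

A real program would pass the return value of `main(sys.argv[1:])` to `sys.exit`. Note that a handler like this only turns a crash into a controlled shutdown; it does nothing about the bug that raised the exception in the first place.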

The main reason for applications getting into a state that they can crash is improper complexity management.

When it comes to drivers, I'm much more forgiving, since it is quite difficult to manage both the hardware and software, and the communication between different programs.

Finally, as for the operating system itself, which is the layer between the drivers and the applications, I haven't seen one in the last 5 years that has been unstable. Even Windows ME, for all its faults, was very stable in the actual 'operating system'.

But that's just my 2 pesos.

frob

--
//TODO: Think of witty sig statement

What are you smoking? by Jerk+City+Troll · 2003-05-20 14:02 · Score: 4, Interesting

My OS X box, which I use for web browsing and word processing, crashes about once every three days.

The Ti PowerBook G4 I am writing this post on is running Mac OS X 10.2.x. It goes in and out of sleep on an irregular basis, and not always when it is idle. I swap PCMCIA cards in and out. It hops from network to network. I do a lot more than browsing and word processing.

According to my Konfabulator uptime widget, I have 83 days, 23 hours, 20 minutes. My load average at the moment is 1.7. It has not been rebooted since I installed OS X (I did it myself after buying it just for messing around purposes).

You sir are either lying, have bad hardware, or you've severely corrupted your installation. This operating system (which is BSD) is solid as a rock.

--
Join Tor today!

Re:Computers don't crash by Anonymous Coward · 2003-05-20 14:06 · Score: 5, Interesting

The current issue of Scientific American states that 51% of crashes are due to user error. 15%=software error. 34%=hardware error. Refer to article for further info.

You made a little "user error" there yourself-- the article says that 34%=software error and 15%=hardware error.

Oh, and those figures are just for Web applications, not software applications in general.

It's an interesting article. Unfortunately, they're not very clear about what constitutes a "user error." I've filled out Web forms that gave me an "error" when I included hyphens in my phone number or credit card number. That's far from an error, it's just poor user interface design.
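The hyphen case is easy to handle on the program's side. A minimal Python sketch follows; the function name and the ten-digit default are assumptions made for the example (real phone formats vary by country).

```python
def normalize_phone(raw, expected_digits=10):
    """Accept common separators instead of rejecting them; return digits only."""
    separators = " -.()"
    digits = "".join(ch for ch in raw if ch not in separators)
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"unexpected characters in {raw!r}")
    if len(digits) != expected_digits:
        raise ValueError(f"expected {expected_digits} digits, got {len(digits)}")
    return digits


# Both spellings a person might type end up as the same stored value.
print(normalize_phone("555-867-5309"))
print(normalize_phone("(555) 867 5309"))
```

The form still rejects input that really is wrong (letters, too few digits), but formatting the user chose for readability is no longer treated as a mistake.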

In my opinion, something the user does should never cause a program or operating system crash. If this can occur, it is the developer who is at fault, not the user.

Apple's Human Interface Guidelines are a nice introduction to user-fault tolerance, even if you're developing for other platforms.

Re:And by jdray · 2003-05-20 14:06 · Score: 4, Funny

Isn't that "restore him from backup?"

--
The Spoon
Updated 6/28/2011

yeah, it's people. by twitter · 2003-05-20 14:09 · Score: 2, Insightful

Computers crash because of people, especially some people in Redmond. You know, the folks who push that OS with a binary, one-bit-changes-and-it-dies registry, an email client that gives mail root access to hardware automatically, a web browser with similar problems and a kernel that may still not keep track of background processes.

It's never the user's fault. No matter what the user does, the program should recover gracefully. Code that crashes is pathetic. Take my wife. She's managed to uncover all sorts of bugs and flaws in software, but she and my baby girl have had a hard time busting Debian.

--

Friends don't help friends install M$ junk.

Re:And by guile*fr · 2003-05-20 14:09 · Score: 3, Funny

... doing dirty things with clay?

Why is it acceptable that they crash? by MeddlesomeKids · 2003-05-20 14:13 · Score: 2, Insightful

A lot of the posts here have posited answers to why computers crash (people, complexity, unsafe languages, etc.), but most everyone seems resigned to it.

It should not be acceptable that they crash.

Personally, I'm shocked every time I use a computer as to how primitive they are and how little has changed. Is it, or is it not the year 2003?

All of these posited problems are solvable.

Unsafe Languages? Stop using them. Someone please design hardware and an OS that disallows their use and disallows unsafe behavior.
There are safe languages that compile and provide performance today (Lisp comes to mind, perhaps C#, Java's getting faster every day, and there are safe subsets of C++). Start using those. And then someone go write something better.

I earnestly believe that if the hardware/OS had good protection at the lowest level then performance would not necessarily have to suffer. If the OS is written in a language where the API is solidly contracted, then _true_ safety can be enforced at compile time, and not slow down the system at runtime.

People? Users should _never_ be able to crash their machine. The person riding the elevator should have _no_ way, no matter how contrived, of making the elevator crash. And if the problem is programmers, then kick them out of the loop by forcing them to use safe languages, libraries and tools.

Complexity? Well, this is the kicker isn't it? "you can't foresee all the possible conclusions". But we don't need to see all possible conclusions to stop crashes. And if we lay a foundation of solid transactions on solid APIs with solid languages, then complexity will be reduced, there will be fewer dark "unknown" spaces. Maybe it'll even be easier to write software with fewer bugs.

Debian. by twitter · 2003-05-20 14:14 · Score: 4, Funny

Debian tested in every state, works good everywhere. I have yet to prove that it does not work anywhere in any way. I can not say the same thing for any other software I've ever run on a PC.

--

Friends don't help friends install M$ junk.

Re:Computers don't crash by abirdman · 2003-05-20 14:16 · Score: 5, Insightful

I'm afraid if a user error causes the program to crash, I've got to call it a software error. It's not that hard to write the error handling routines, it's just never in the budget. And the users are invariably able to discover new frontiers of errors the programmer(s) never dreamed of. No matter. If clicking the wrong box, entering the wrong data, plugging in the wrong mouse, or installing the wrong screensaver causes a program to crash it's not the user's fault (bless them, for they know not), it's the programmers' and design engineers' fault.

Hardware errors are another problem altogether. Luckily, they're usually quick to diagnose, and it's usually cheaper to replace hardware than software. It's great how I've been using Microsoft error reporting for about 6 months now, and it's never been their fault. They must be getting better. <snicker>

--
Everything I've ever learned the hard way was based on a statistically invalid sample.

Third-generation languages. by jonadab · 2003-05-20 14:20 · Score: 2, Interesting

Computers crash (and have any number of other problems) largely
because almost all software is still developed using third-generation
("high-level") languages. These languages place on the programmer
the burden of such fiddly details as allocating and freeing memory
and checking the size of allocated memory to see that it's adequate
for the data being copied in.

*Most* of the time when an application crashes seemingly at random,
it's a memory allocation problem of one kind or another: a buffer
that was allocated too small and gets overrun, or a pointer error,
or something of that nature. When an application (or your whole
system) grows more sluggish the longer you leave it running, that's
usually a memory leak: something was allocated and not released
properly -- repeatedly. All of these problems result from a lack
of excruciating vigilance on the part of the programmers when using
a language that requires it. In a large project, maintaining that
ceaseless caution is a nightmarish prospect.

Languages (both interpreted and compiled languages) have been around
for over a decade that handle these things, freeing the programmer
to concentrate on developing the more high-level features of the
software, but because this checking imposes some overhead (in terms
mostly of CPU time and sometimes some memory footprint), they don't
get used for most applications. Yet.

The time is coming, though. The value of VHLLs is beginning to be
recognised, *finally*. When software is written in a language with
built-in memory management, problems like segmentation faults (core
dumps in Unix; in the Windows world these are known as Illegal
Operations, formerly known as General Protection Faults) and buffer
overruns go away entirely.
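A small Python sketch of what "go away" means in practice. The helper name `copy_into` is made up for the example. The overrun still happens as a programming mistake, but the language checks every index, so it surfaces as an exception at the faulty write instead of silently corrupting neighbouring memory.

```python
def copy_into(buffer, data):
    """Copy data into a fixed-size buffer, one byte at a time."""
    for i, byte in enumerate(data):
        buffer[i] = byte  # bytearray checks the index on every write
    return buffer


buffer = bytearray(4)  # room for four bytes only
try:
    copy_into(buffer, b"abcdefgh")  # eight bytes: a classic overrun
except IndexError as error:
    print("overrun caught:", error)

# Only the four bytes that fit were written; nothing past the end was touched.
print(bytes(buffer))  # b'abcd'
```

The bug is still a bug and the program still has to decide what to do about it, but it fails at a known place with a known error rather than at random some time later.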

Add proper garbage collection (not reference counting like Perl5
does, but real gc, which I hope we will get in Perl6), and you
also dispense with memory leaks once and for all.
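The difference between the two is easiest to see with a reference cycle. The sketch below uses Python instead of Perl and assumes the CPython implementation, which combines reference counting with a separate cycle collector; the class and function names are made up for the example.

```python
import gc
import weakref


class Node:
    """A trivial object that can point at another Node."""

    def __init__(self):
        self.other = None


def cycle_survives_refcounting():
    """Return (alive_before_gc, alive_after_gc) for an unreachable cycle."""
    was_enabled = gc.isenabled()
    gc.disable()  # leave only reference counting at work
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a      # a and b now keep each other alive
        probe = weakref.ref(a)       # lets us observe a without owning it
        del a, b                     # the program can no longer reach either
        alive_before = probe() is not None
        gc.collect()                 # the cycle collector finds the garbage
        alive_after = probe() is not None
        return alive_before, alive_after
    finally:
        if was_enabled:
            gc.enable()


print(cycle_survives_refcounting())  # under CPython: (True, False)
```

With counting alone, each object in the cycle always has one reference left (from its partner), so neither is ever freed; that is the leak. A collector that traces reachability from the program's live data does not have this blind spot.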

It's coming. Applications are *beginning* to be developed in this
next generation of languages, but it takes time, because all the
existing apps are mostly C and C++, and you have to throw them out
and start over, which nobody wants to do for obvious reasons.

There will of course always be room for a certain amount of
inherently low-level code written in C or one of its kin: code
that absolutely can't spare a nanosecond per run, code that has
to run on the bare metal (kernels, bootloaders, ...), and code
needed to bootstrap the VHLL tools (compilers and whatnot). But
when C is no more common than assembly language is today, then
you'll be done with random crashes.

Applications will of course still have bugs -- circumstances
wherein they don't perform as they ought. And you'll still have
hangs, because nobody's figured out how to design a compiler or
interpreter that can detect an infinite loop, and nobody except
Mel[1] has coded up an implementation for completing an infinite
loop and passing on to what follows. Perhaps quantum computing
will one day change this, but that's outside of the foreseeable
future. But crashes of the sort where the app suddenly terminates
should be mostly a thing of the past within twenty years, ten if
we're quite lucky.

[1] Google for "The Story of Mel, A Real Programmer".

--
Cut that out, or I will ship you to Norilsk in a box.

AppleWorks never crashed by ncc74656 · 2003-05-20 14:20 · Score: 2, Interesting

From 1986 or '87 until about '94 or '95, all my word-processing/database/spreadsheet stuff got done on an Apple II (first a IIe, then a IIGS) running several versions of AppleWorks, up to v3.0. Even with some 3rd-party addons (mainly SuperFonts and UltraMacros), AppleWorks never crashed.

I'm willing to concede that the codebase was considerably smaller. It had to be, in order to produce an executable that would fit in 800K (the size of a 3.5" double-density floppy) and would run reasonably well on a 1-MHz 8-bit processor with as little as 128K of RAM...but I don't find myself doing sufficiently more advanced stuff in Word or Excel than I used to do in AppleWorks (actually, AppleWorks was probably doing more sophisticated stuff with UltraMacros added to it). I would be willing to wager that 95% of Office users use no more than 5-10% of its features. All that extra code that keeps getting added in with every new release means there's that much less time spent making sure the core functionality (and all of the chrome added in previous releases) is bug-free.

(I'll admit that I haven't had much trouble with Office...but then you've noticed that I don't push it particularly hard either.)

--
20 January 2017: the End of an Error.

Software still crashes... by ConceptJunkie · 2003-05-20 14:20 · Score: 2, Interesting

...because we aren't willing to wait for, or pay for, software that has been adequately tested to any reasonable level of reliability.

With something like Windows XP, no amount of testing could eliminate every conceivable bug, but there is no doubt in my mind that Microsoft, along with almost every other software company in the world, rushes poorly designed, inadequately tested products to market to meet customer demand.

Remember, a product's success is due largely to a check list of features created by the marketing people. A product with 90% reliability and 100 features will sell better than a product with 98% reliability and 10 features. Otherwise, how can you explain the success of Microsoft Office? OK, bad example, MS Office is successful because it's been bundled with so much hardware, but you see my point.

The bottom line is computers are now a commodity. They have become so ubiquitous and cheap that I can go down to the Salvation Army and purchase what would have been considered a supercomputer 10 years ago, for $50. Software is quickly reaching the same state. How much software can you buy for $10 or less? A lot. And not all of it is bad, though most is. On the other hand, you can drop hundreds or thousands of dollars on software that is just as quirky, hard to use and even just as buggy.

Here's the thing that always interested me. Why don't console games crash? I'm sure they do sometimes, but I've got a Dreamcast and about 50 games. I've seen a small bug here and there, but I've never seen the machine blue-screen or whatever DC's do when the OS lunches itself. I realize that the standardized hardware platform has a lot to do with it, but games are every bit as complex as other software, perhaps more so. So why don't these games crash? Well, if they did, they would never sell. I'm sure there would be /. articles and Ars Technica articles for weeks if a console game came out that crashed, but when PC games are released that have those kinds of problems, it's hardly news.

Kinda makes me wonder...

--
You are in a maze of twisty little passages, all alike.

Re:Software still crashes... by stwrtpj · 2003-05-20 16:21 · Score: 2, Insightful

Here's the thing that always interested me. Why don't console games crash? I'm sure they do sometimes, but I've got a Dreamcast and about 50 games. I've seen a small bug here and there, but I've never seen the machine blue-screen or whatever DC's do when the OS lunches itself. I realize that the standardized hardware platform has a lot to do with it, but games are every bit as complex as other software, perhaps more so. So why don't these games crash? Well, if they did, they would never sell.
This is one point (and a good one), but the truth is that games get a lot more testing than other software. The main reason for this is that most games could be considered a realtime system, whereas your spreadsheet program is not.

What I mean by this is the fact that a program that needs to respond instantly to user input while at the same time spewing out millions of triangles a second of 3D graphics data has a much lower tolerance for error than your spreadsheet program that spends 90% of its time just sitting there as you type stuff into cells.

Spreadsheet fubar'ed because of some odd value you input? Oops. Oh well, reload from the autosaved copy and try again.

Your game fubar'ed because of some object collision detection glitch? Arrggghh, my character got killed!! I had the game's ultimate superpowered megaboss down to 1 friggin' hit point!! NOOOOOO!!!

Perhaps this example also makes a statement about the priorities we place on how excited we get over games vs productivity software :)

--
Karma: Frotzed (mostly due to the Frobozz Magic Karma Company)

Re:and by Temporal · 2003-05-20 14:31 · Score: 3, Informative

My Win2k box plays games reliably and maintains more than a few months of uptime.

Please refer to this post for more information.

Thank you.

Software Engineering? I don't think so by PetoskeyGuy · 2003-05-20 14:33 · Score: 3, Insightful

Software crashes because it's acceptable and information about how to make programs that don't crash is sometimes hard to come by.

There are programmers out there who have spent years coding and learned how to avoid buffer overflows, check return codes, and fail safe if something unknown happens. But these things are not taught in school and even if they are, someone is going to make a mistake.
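
For anyone who hasn't seen those habits written down, here's a minimal sketch in C. The function name (copy_bounded) is made up for illustration; the point is just the shape: check the arguments, check the size before writing, report failure through the return code, and leave the buffer in a known state when something is wrong.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only: copy src into a fixed-size
 * buffer without ever writing past the end. Returns 0 on success and -1
 * if the arguments are bad or the input does not fit. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || src == NULL || dst_size == 0)
        return -1;                 /* fail safe on bad arguments */

    size_t len = strlen(src);
    if (len >= dst_size) {         /* no room for the terminating NUL */
        dst[0] = '\0';             /* leave dst in a known, empty state */
        return -1;
    }
    memcpy(dst, src, len + 1);     /* copies the NUL as well */
    return 0;
}
```

The caller then has to actually look at the return value, which is the part that tends to get skipped.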

Software Engineering never advances. We don't follow the blueprints; we send out the construction workers and make sure something is standing ASAP so it looks like we're working. Boss is coming, put some drywall up - we'll wire it later. Some guy worked out a really safe way to build the stairways, but his last company patented it, so we'll have to do something else this time.

As an industry we don't learn from our mistakes. We reinvent the wheel time and time again but this time it's transparent, chrome and glows in the dark and square. Things are moving too fast and the old can't teach the young to avoid their mistakes because they are considered dinosaurs after a few short years. So we make the same mistakes on the "new" systems over and over.

Plus the system feeds itself this way. This software sucks, I better upgrade.

We would need something like standard Building Codes and Inspectors. When real buildings fail people could get hurt or die, but when a computer fails you reboot. It's just not worth it economically to make a program that never crashes. It would be obsolete by the time it's done.

Several Factors by null+etc. · 2003-05-20 14:34 · Score: 3, Insightful

There are several causes of software crashes. Let's address the obvious ones:

race conditions. From the FreeBSD Developers' Handbook: "A race condition is anomalous behavior caused by the unexpected dependence on the relative timing of events. In other words, a programmer incorrectly assumed that a particular event would always happen before another."

Race conditions are particularly difficult for developers to address, since they propagate at many levels within the system (hardware level, OS-assigned resource level, application instruction level, etc.) Also, only realtime operating systems or simple embedded systems guarantee the relative ordering of certain events. Complexity has a direct correlation to the inability to guarantee timing.

deadlocks. Deadlock occurs when multiple processes compete for limited resources. From Sun's Java Classes: "The simplest approach to preventing deadlock is to impose ordering on the condition variables." Sometimes, it is difficult or impossible to guarantee cooperation among competing resources.

unsafe application environments. An operating system can establish limitations upon applications, such that those applications never exceed certain safety boundaries (e.g. access to areas of the filesystem, system resources, etc.)

Most operating systems that thoroughly employ these limitations are considered "user-unfriendly." More user-friendly operating systems, such as Microsoft Windows, inherently eschew these safeguards by default, allowing applications to perform actions that potentially result in a crash. Application environments such as Sun's Java do a good job of "sandboxing" an application's access to resources, such that system crashes are unlikely.

unsafe hardware architecture. A computer's hardware consists of a primitive architecture that is unable to guarantee proper operation. The current PCI bus and "IRQ" interrupt scheme is particularly susceptible to computer crashes, if hardware drivers are programmed incorrectly.

third-party software and hardware. The support for third-party software and hardware results in an operating system environment which is open and generalized enough to be susceptible to crashes. For example, if you allowed anyone to come into your house and plug any manner of devices into your power outlets, you could conceivably experience a power outage as the circuit breaker kicks in to prevent electrical damage. That's the danger of exposing your outlet to strangers.

In order to create a system that enables applications to perform tasks as complex as controlling the entire computer (e.g. screen savers, hotkey programs, power toys, etc.), applications must be given the theoretical power to perform tasks that can crash the computer. The result is that the computer crashes when the application works improperly.

application complexity. Regardless of how smart a developer is, the developer's ability to guarantee the functional correctness of a system decreases in proportion to the complexity of that system. Simple systems therefore are much less likely to crash than complicated systems. Whether they do, or not, depends on the safeguards that were put in place to augment the developer's ability to guarantee the functional correctness of a system. NASA's procedures for programming mission-critical systems rely on any number of safeguards to ensure functional correctness of those systems.
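
The ordering advice quoted under "deadlocks" above can be sketched in C. This is only an illustration under assumptions: the account struct and the transfer function are made up for the example, and it orders plain POSIX mutexes by address rather than the condition variables the Sun text talks about.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical account type, for illustration only. */
struct account {
    pthread_mutex_t lock;
    long balance;
};

/* Move amount from one account to another. Both locks are always taken
 * in address order, so two threads transferring in opposite directions
 * cannot each hold one lock while waiting for the other. */
int transfer(struct account *from, struct account *to, long amount)
{
    if (from == to)
        return -1;                       /* would lock the same mutex twice */

    struct account *first  = ((uintptr_t)from < (uintptr_t)to) ? from : to;
    struct account *second = (first == from) ? to : from;

    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);

    int rc = -1;
    if (amount > 0 && from->balance >= amount) {
        from->balance -= amount;
        to->balance   += amount;
        rc = 0;
    }

    pthread_mutex_unlock(&second->lock);
    pthread_mutex_unlock(&first->lock);
    return rc;
}
```

Without the ordering step, transfer(a, b) on one thread and transfer(b, a) on another could each grab their first lock and then wait forever for the second. The catch, as noted above, is that every piece of code touching those locks has to cooperate with the same ordering.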

That's a good starting point, for now.

"Stability" is in the eye of the admin by LazloToth · 2003-05-20 14:35 · Score: 2, Interesting

These are true statements:

-In our server room, which, admittedly, is a little crowded, a Windows 95 box was disconnected from the network but accidentally left running. It stayed up for more than a year. No load, of course, but it stayed up. It made the hair on my neck stand on end.

-In the same server room, a clone PC running Suse Linux 7.0 ran for just short of two years without a reboot. It would have gone longer had the old, 2 gig hard disk not died a clunking death. Fortunately, the web data was on a different disk. We loaded another system drive and had our departmental web/Samba server up in minutes.

-We have a Compaq Prosignia 200 running NT4 and Raptor 6.0 Firewall. It has seen uptimes exceeding 9 months on more than one occasion. Would have gone longer, I think, were it not for some memory leaks in the Raptor management console snap-in.

I point these things out so as to ask the question: how stable is stable? Hey, *nix has been my passion for years, but I've seen for myself that NT4 and, now, Windows 2000, can perform well if they are set up by someone who knows what s/he is doing. I believe impressive uptimes can be attributed to many things, but I do not always blame the OS code for the bad things that happen. We all know what bad firmware and drivers can do. I'll take NT4 on an Alphaserver over Linux on a Packard-Bell any day.

Of course, Linux on the Alphaserver is better yet . . . . : )

--

It's only funny until someone gets hurt. Then, it's hilarious.

oh please. by twitter · 2003-05-20 14:49 · Score: 2, Informative

Too bad there are so many things you can't do when you've crippled your hardware with an OS with so few apps.

Debian now has 8,710 applications. There are few things I'd like to do that I can't. I spend much less time "rebuilding" computers and more time doing those things now.

I mean, you can't do much except play back multimedia,

Hmmm, ever heard of film gimp? Sure, there are some hardware problems but those will go away as M$ dies. Hardware makers are already taking free software into account.

there's seldom any games you can play on it.

I'm not a game boy, Quake II is good enough for me. More will come; in the meantime, dual boot. Woody takes care of that auto-magically now.

I liken it to a rock in the middle of a field. Damned stable, that rock. It just sits there.

Yes that's the picture you drew. Reality is different. Think of it as a tremendous magic building, where everyone is invited to come and do as they please. Building materials are free, and so long as you follow a few basic guidelines, your changes and additions will be as sturdy as any piece and everyone can enjoy it at once.

--

Friends don't help friends install M$ junk.

Actually by Sycraft-fu · 2003-05-20 14:50 · Score: 3, Insightful

One of the biggest barriers to stability for something like Linux (or Windows) is the fact that it must accommodate new software and hardware configurations all the time. If you take a Lucent 7R/E phone switch, it will run on a given set of hardware (the 7R/E hardware). It will run Lucent's OS, and it will do only what it was designed to do (switch phone circuits). There is no putting new hardware in it unless it is Lucent approved, there is no loading of new apps to make it do things unless they are Lucent approved, and so on.

If you want an open OS that will run with hardware by whoever happens to want to make it and software by whoever happens to want to write it, you cannot have a verified design that is 100% reliable. Unforeseen interactions WILL happen and crashes or other malfunctions will result.

Re:and by |Cozmo| · 2003-05-20 14:53 · Score: 2, Interesting

I don't remember having any games screw up my system since I stopped playing half-life. I built a new system a couple months ago and it hasn't crashed once.
I had a win98 system last a bit past 30 days with regular use once and it was terribly hosed by the time I rebooted. Win2k or XP can last until your power goes out, you kick the surge protector, or you need to reboot to install drivers/software/hotfixes ;)

Risks Digest by dsplat · 2003-05-20 15:01 · Score: 2, Informative

For many megs of answers about why software isn't 100% reliable, read Risks Digest.

There is indeed hardware out there with this level of reliability (like an AT&T 5ESS/Lucent 7R/E phone switch) however it is highly expensive and very unflexable.

I don't mean to bash AT&T. In fact, the very infrequency of this sort of problem is a strong argument for their reliability. I had to go back to the pre-Lucent days for this one, folks. However, they do have some occasional bugs in their software. And it makes the news when they do:

Risks Digest, Volume 9: Issue 69, Tuesday 20 February 1990

--
The net will not be what we demand, but what we make it. Build it well.

Scientific American cover story this month by ChrisCampbell47 · 2003-05-20 15:02 · Score: 2, Informative

Scientific American magazine has this topic as their cover story this month (June issue).

Computer, Heal Thyself

"Systems inevitably fail. The key to reliable computing is building systems that crash gracefully and recover quickly."

--
One simple rule for its versus it's

OT: Electric overconsumption by maynard · 2003-05-20 15:08 · Score: 5, Insightful

I used to leave all sorts of machines running 24/7 in my apartment. Several Suns, a couple PCs running Linux and BSD, an SGI, blah blah blah. I did take care to turn monitors off though. I kept this up until I turned off all my systems (except the mail server) for a two week vacation: I was shocked to discover the next electric bill arrived a good $80 cheaper. I've since cut back to a single machine which I turn off at night. No more crazy uptimes, but honestly - I'll take the money. I wish there was consumer demand for low power desktop computing. I guess I'll just have to migrate to a good laptop for the low power option. But you're absolutely right: a few computers can suck up a lot of power, with damaging results to one's electric bill. --M

Re:OT: Electric overconsumption by doorbot.com · 2003-05-20 16:54 · Score: 5, Interesting
I wish there was consumer demand for low power desktop computing.

My mail/web server would run fine off of something ridiculously small, like a Sharp Zaurus. Here are my requirements, and I will pay for one if it is available.
1. Non-x86 hardware designed for lower power -- extra speed is nice, but not required; Pentium 200 speeds or better
2. Low power, with 9V or AA-based battery backup (changeable while system is running)
3. 3" - 4" LCD (with manual switch to turn off) at 640 x 480, or some sort of LED array/VFD, because all I really need is a low power terminal supporting 80 x 24 characters.
4. USB port for keyboard
5. Serial port
6. Two or three 10/100 NICs
7. Full (Debian) Linux support of all hardware
8. Some sort of expansion (PCMCIA maybe, or via USB)
9. Support for CompactFlash for backups
10. Hardware encryption would be a nice goodie but not required
Yes, I could probably build this with PC104 components, but I want a pre-built product, and I'm willing to pay for it (maybe $300 - $400).

it DOES cause an error by ChrisCampbell47 · 2003-05-20 15:17 · Score: 4, Interesting

Interesting that the first two posts in the thread had English syntax errors in their first sentences. We can still understand it, but compilers/CPUs would have problems. Seems that the real problem is the difference in the natures of wetware and hardware.

Actually, "syntax errors" like this DO cause a problem for wetware systems -- they cause the brain (well, mine at least) to kind of glaze over and take the remainder of the sentence/thought much less seriously. Kind of like aborting/returning out of a subroutine.

Here in the Slashdot world of "definately" and "righting", I've learned that any posted comment that makes high-school-level grammatical or spelling errors is not worth my time and I immediately skip the post. I've been doing this quite rigorously lately -- blah blah blah "seperate" PAGE DOWN.

OK now, everybody nod and think I'm talking about someone else's posts ...

--
One simple rule for its versus it's

Re:it DOES cause an error by fucksl4shd0t · 2003-05-20 18:50 · Score: 4, Funny

Here in the Slashdot world of "definately" and "righting", I've learned that any posted comment that makes high-school-level grammatical or spelling errors is not worth my time and I immediately skip the post. I've been doing this quite rigorously lately -- blah blah blah "seperate" PAGE DOWN.
You are just asking for it. :) Yes, you are. So here it is:
"high-school-level" should not be hyphenated. That is a High School level grammatical error.
That sound you hear is the toilet flushing your shit away.

--
Like what I said? You might like my music
Re:it DOES cause an error by bm_luethke · 2003-05-20 19:28 · Score: 2, Insightful

I think you loose alot (and I am sure you will quickly ignore my posts).

The poster may not be a native english speaker.

The poster may have my problem (dyslexia, learning disability, etc) and still be quite competant in what they are trying to express.

So, I can't spell. I have a physical problem (dyslexic - actuall medical diagnosis), I also don't have time to spend putting every post through a spell checker. It still doesn't change the fact of the content of my posts being correct or incorrect. Then again - it is only damaging to you to ignore any wisdom given by someone who doesn't speak english well or has a disability.

--
------- Sorry about the spelling, I suffer from two problems. Dyslexia makes it difficult to spell well, lazy makes it
Re:it DOES cause an error by pesto · 2003-05-21 02:14 · Score: 2, Informative

Actually, you're right. Words used together as a compound adjective modifying a noun should be hyphenated. There's one little catch here: Because "high school" is itself an adjective modifying "level," we should put a hyphen between "high" and "school" (think of it as a first-order compound) but a longer en-dash between "high-school" and "level" (a second-order compound).

So: high-school-level, where the second dash should be HTML entity &ndash; (but Slashcode won't allow it).

Containing the Damage by Salamander · 2003-05-20 15:24 · Score: 4, Informative

A lot of people are answering the question of why there are bugs at all, and it's an important question, but I'd like to take a different angle and consider why there are so many visible bugs. Why does a bug in a driver, or even an application, bring down a whole system? In addition to reducing the incidence of actual bugs, IMO, we should also do a better job of containing the bugs that will inevitably exist even if we all use the latest whiz-bang code analysis tools (which rarely work for kernel code anyway). Some of the semi-informed members of the audience are probably thinking that's the job of the operating system; I'd argue that our entire current notion of operating systems is flawed. There are way too many components in a typical computer system that "trust each other with their lives" in the sense that if one dies all die. Memory protection between user processes is great, but there should be memory protection between kernel entities, and other kinds of protection, as well. One of the basic services that operating systems need to provide going forward is greater fault isolation and graceful instead of catastrophic degradation.

The Recovery Oriented Computing project at Berkeley has gotten some press recently for trying to address this issue. Many here on Slashdot don't seem to "get it" because they've never worked on systems in which a component failure was survivable; they don't realize that rebooting a single component - perhaps even preemptively - is better than having the whole system crash. "Software rot" is a real problem, no matter how hard we try to wish it away. ROC isn't about saying bugs are OK; it's about saying that bugs happen even though they're not OK, and let's do the best we can about that. Another project in the same space, with more of a hardware/security orientation, is Self Securing Devices at CMU. There, the idea is to find ways that parts of a system can work together without having to share each others' fate. While the focus of the work is on security, it shouldn't be hard to see how much of the same technology could be applied to protect a system from outright failure as well as compromise. There are plenty of other projects out there trying to address this problem, but those are two with which I happen to have personal experience.

The key idea in all cases is that current OS design forces us to put all of our eggs in one basket, and that's really not necessary. Designing fault-resilient systems is tough - few know that better than I do - but that's only a reason why we should do it once instead of devising ad-hoc clustering solutions for each specific application. Lots of people use various forms of clustering as a way to achieve fault containment and survive failures, but the solutions tend to be very ad-hoc and application-specific. Do you think Google's solution works for anything but Google, or that a database transaction monitor is useful for anything that's not a database? Fault containment needs to be a fundamental part of the OS, not something we layer on top of it.
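
To make "rebooting a single component" concrete, here's a minimal user-space sketch in C using POSIX fork() and waitpid(). It is not how ROC or Self Securing Devices work internally; the function names (run_isolated, supervise) and the two toy components are made up for illustration. The only idea it shows is that a component running in its own process can die without taking its supervisor down with it.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run one component in its own process. Returns the component's exit
 * status (0-255) if it exited normally, or -1 if it crashed (was killed
 * by a signal) or could not be started. The caller survives either way. */
int run_isolated(int (*component)(void))
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                      /* could not create the process */

    if (pid == 0)
        _exit(component() & 0xff);      /* child: run, then report status */

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1;                          /* terminated by a signal */
}

/* Restart a failing component up to max_tries times; a stand-in for
 * restarting one component instead of the whole system. */
int supervise(int (*component)(void), int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        if (run_isolated(component) == 0)
            return i + 1;               /* number of attempts it took */
    }
    return -1;                          /* gave up */
}

/* Two toy components: one healthy, one that always crashes. */
int healthy(void) { return 0; }
int crashy(void)  { abort(); }
```

Calling supervise(crashy, 3) returns -1 after three crashed children, and the calling process is still alive to decide what to do next. Getting the same property between kernel entities is the hard part the post is talking about.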

--
Slashdot - News for Herds. Stuff that Splatters.

Re:and by workindev · 2003-05-20 15:26 · Score: 3, Funny

You are acting like you can actually play a decent game on Linux. HINT: Some freaking penguin on a sled doesn't count as a decent game.

This is a mathematically proven un-solvable thing by jhoffoss · 2003-05-20 15:30 · Score: 2, Insightful

You could never write software that was perfect, because you can never account for every situation.

The solution most non-CSci people ask next is "Can't you write a program that checks for errors?" Intriguing to think about if you've never actually pondered it, but the answer unfortunately is no. You can't write a finite-state machine that can detect or correct an infinite number of states.

To do so would be similar to calculating the "best" route from NY, NY to LA, CA. You could choose any number of roads and paths from coast to coast, with or without loops (finding them would be quite a bitch) possibly traversing every road in the US. If you don't understand why you can't calculate this, ask your neighborhood CSci major.

The best we can instead do is safeguard the software we write as well as possible, which requires time (and therefore money) and computing power to do things like bounds checking on arrays; handling interrupts properly; and managing memory thoroughly, to name a few major problems in any software. Languages like Java come a long way in some respects, but are very slow. But this isn't a good enough solution, and frankly, most programmers aren't good enough to produce fully error free code.
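
As a small illustration of the bounds checking mentioned above, here's a sketch in C of what a language like Java does for you on every array access. The struct and function names (int_array, array_get, array_push) are made up for the example, and the fixed capacity of 16 is arbitrary.

```c
#include <stddef.h>

/* Hypothetical fixed-capacity array, for illustration only. */
struct int_array {
    int    items[16];
    size_t count;           /* number of valid entries in items */
};

/* Read element i into *out. Returns 0 on success, -1 if i is out of range. */
int array_get(const struct int_array *a, size_t i, int *out)
{
    if (a == NULL || out == NULL || i >= a->count)
        return -1;
    *out = a->items[i];
    return 0;
}

/* Append a value. Returns 0 on success, -1 if the array is full. */
int array_push(struct int_array *a, int value)
{
    if (a == NULL || a->count >= sizeof a->items / sizeof a->items[0])
        return -1;
    a->items[a->count++] = value;
    return 0;
}
```

Each check is one comparison per access, which is the time and computing power being traded for safety.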

As revolting as it may sound to the hacker-coders out there, great programmers, software engineering, business processes, documentation, and management of the whole product are necessary to produce truly good software.

--
Linux: The world's best text-adventure game.

Nope! Case in point. by fireboy1919 · 2003-05-20 15:35 · Score: 3, Informative

I have a Microsoft reference driver for my soundcard (i.e. Microsoft made the driver and approved it themselves). I use it on my computer.

Unfortunately, two things cause it to fail.
1) It doesn't play nice with other drivers on the same IRQ.
2) Microsoft's advanced power management driver assigns it to the same IRQ as my USB port and my network card, and that can't be changed without a reinstall of Windows.

So basically, what happens is that the sound card will eventually crap out completely and never work again (until reboot) if it attempts to work at the same time either of the other two devices on that IRQ are working.

Keep in mind:
1) Microsoft knows about this bug
2) It causes system instability for lots of drivers - even certified ones

I should also mention that there is nowhere that this bug is reported by the OS; I had to find it through trial, error, and lots of research. Win2K is not as stable as you think

--
Mod me down and I will become more powerful than you can possibly imagine!

Scope and Features by jkichline · 2003-05-20 15:42 · Score: 2, Insightful

I think the issue with crashing software is a combination of problems. Obviously cost is the biggest issue. Economics is another. And time is never on the developer's side. Fact is, it is not economically advantageous to write rock solid code. Why?

First, it costs a lot of money to test and it is very difficult to keep your new code under wraps (from competition) and still offer a truly well tested system. Open source solves this problem by somewhat reducing competition since the code is free and can be tested by many people in various stages of testing. (Probably why Open Source is more stable)

Don't forget boredom. Once a developer gets something "working" he or she doesn't want to continue to stare at the code for hours contemplating its every possible flaw. We'd rather be reading slashdot.

Second, if your software was 100% bug free, people would never have a reason to upgrade. Guaranteed, if Windows 98 didn't crash so dang much I would never have installed Win2k. My dad had an old Compaq Presario with Windows 3.1 on it and it never crashed. He reluctantly had to upgrade to experience things like MP3's and AOL. (and crashes) I did downgrade from WinXP (Piece of doggie doo) back to Win2k.

Third, time is of the essence. Many times I am pressured to get the code done. It is better to have a software application that works pretty well and start using it than to have it absolutely perfect and never use it. This is an exponential scale. It takes more and more time to make the software fractionally more stable. And sometimes you find a rewrite is in order. There is a balance to be obtained.

Some other things to consider: Scope and Methodology. The comparison was made between cars and code. I think this is an unfair evaluation because the scope of a car is well defined. You know certain parameters such as the size of the road, the speed it can travel. You have certain benchmarks it must meet, safety regulations. Software on the other hand has few of these. Operating Systems run on an incredible range of hardware and can be configured in an infinite number of ways. I've found that PCAnywhere when installed with some other, unrelated software can just blow up a machine. The problem is that scope is not, and most notably cannot be, contained WITHOUT limitations. This is the reason why a Linux server running in Terminal mode with two daemons on it can run FOREVER. The scope is well defined, crap is not compiled into the kernel.

Lastly, methodology is the best answer. The comparison of computer code to legal code is a very good one. The reason why good lawyers write good legal docs is because they have a good methodology. They know how to cover their bases. Programming language developers should consider a development methodology and set up limitations. Java and other type-safe languages set up these limitations and the result is safer code. Consider narrowing this even more. But realize that limiting what the developer can do has economic effects. What good is the world's tightest coding methodology if VBScript still exists and can do the same thing? (and break)

In all, we are held in the balance. Yin and Yang. We cannot have one without the other. You add features, you add bugs. You create limitations, your code doesn't get used. You increase your time to market, you watch your competition buy you out. This is the way of things. A chasing after the wind.

[OT] your sig by Mr+Z · 2003-05-20 15:48 · Score: 2, Interesting

About your sig: Actually, I currently write games on a machine with about 1.5K of memory and an 895kHz CPU. And I am grateful.

--Joe

--
Program Intellivision!

Whoops, bullshit alert. by freeweed · 2003-05-20 16:06 · Score: 2, Interesting

Windows 9x actually has a bug in it that would lock the computer after 46 days of uptime, but it took years to catch it because no one ever got close to that mark.

Bullshit, bullshit, bullshit. This urban legend deserved to die years ago.

I ran several Windows95 OSR2 systems with uptimes approaching 90+ days, and had no problems with them locking up. Sure, 9x wasn't HAPPY with this, and if you ran a lot of applications odds are you won't hit this, but I did it many times in my former employment.

When the '45 days' (as I heard it first) rumor started going around, I set up a bunch of idle 95 machines for fun, and on days 45-50 watched for anything going on. Not one crashed.

Hell, for all I know, Microsoft themselves are reporting this, just to cover their asses based on some average uptime limit they worked out, but I will swear on a stack of bibles that I've had Win95 machines go at least twice this supposed limit without locking up.

--
Endless arguments over trivial contradictions in books written by ignorant savages to explain thunder in the dark.

Re:Whoops, bullshit alert. by rabidcow · 2003-05-20 16:40 · Score: 4, Informative

Microsoft says so.

Actually it's in some driver, not the core OS, so it's not surprising that it doesn't happen to everyone. (There's a few other things with similar problems.)
Re:Whoops, bullshit alert. by tagevm · 2003-05-20 22:08 · Score: 3, Interesting

I bet the piece of code causing this looks something like this:

    ...
    /*
     * Check every second....
     * Maybe GetTickCount wraps, but I don't care,
     * something else will probably break before 49 days anyway
     */
    if ((m_dwLastTick - GetTickCount()) > 1000)
    {
        DoSomeThingImportant();
        m_dwLastTick = GetTickCount();
    }

GetTickCount returns the number of millisecs since reboot, after 49 days it will wrap and start over, so lazy programmers using code such as above will have a problem.
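
For reference, a 32-bit millisecond counter wraps after 2^32 ms, which is about 49.7 days. Below is a minimal sketch of a wrap-safe interval check. It uses plain uint32_t values in place of the Win32 GetTickCount() call so it compiles anywhere, and the function names are made up for illustration; whether the real driver bug looked anything like the naive version is a guess.

```c
#include <stdint.h>

/* Milliseconds elapsed between two readings of a 32-bit millisecond
 * counter. Unsigned subtraction is done modulo 2^32, so the result is
 * right even if the counter wrapped once between the two readings. */
uint32_t elapsed_ms(uint32_t earlier, uint32_t now)
{
    return now - earlier;
}

/* Wrap-safe "has the interval passed?" check. */
int interval_passed(uint32_t last, uint32_t now, uint32_t interval_ms)
{
    return elapsed_ms(last, now) > interval_ms;
}

/* The tempting but fragile version: compares absolute tick values, so it
 * gives wrong answers once last + interval_ms or now wraps past 2^32. */
int interval_passed_naive(uint32_t last, uint32_t now, uint32_t interval_ms)
{
    return now > last + interval_ms;
}
```

With last taken 2000 ms before the wrap and now taken 500 ms after it, 2500 ms have really gone by. The first check reports that the interval has passed. The naive one says it hasn't, and will keep saying so until the counter climbs all the way back up, which from the user's side looks a lot like a lock-up.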

Re:And by killmenow · 2003-05-20 16:17 · Score: 3, Funny

He dies.

Re:Try the UML by Billly+Gates · 2003-05-20 16:18 · Score: 3, Interesting

Architects and engineers use extremely detailed drawings. Have you ever taken any drafting courses in high school or college? Every piece, even the size of every screw, is detailed as accurately as possible. It takes forever to get anything done because the precision is more important. It drives some people like myself crazy.

The blueprint is the actual prototype of the product being designed.

The problem is if you document every step and algorithm in exact detail you will spend weeks, months, and yes years without a single line of code!

This is unacceptable in today's business world, where all the projects are due yesterday and your bosses demand to know what percentage of the code has been developed. If you spend a month planning and not a single line of code is developed, you're canned.

My father took over a project that a clueless IT manager had been given because she slept with the CIO. Anyway, she went to a seminar which claimed that flowcharting everything would be the wave of the future. She then had all the programmers draft every single algorithm, down to the individual if statements, on paper. After 4 months without a single line of code, my old man took over. From there he finished the project within 3 weeks!

My point is that drafting programs is too time consuming. In a way your drawing is the program, and changes can be made as you go. It's essential to have good flowcharts and notes, but they need to be generalized. If there is an error in the program you can delete the line and fix it. In engineering you would have to disassemble the actual product and redesign it. Because that would cost time and money, it is not accepted. In software that limitation is not there, or is not as severe.

UML tries to be the blueprint of all software programs but instead is only used to explain certain subsystems and algorithms. Mostly flowcharts are used so all the developers have a sense of how the program will work and how to invoke different pieces of the program.

I do not think this is going to change unless there is a quick and easy way to debug UML charts. Logic errors are killers, and if the chart is perfect I suppose you could compile the UML directly into the language of choice.

Hmmm, in fact this might be the way to do it in the future.

--
http://saveie6.com/

Microsoft lower the standard by frovingslosh · 2003-05-20 17:00 · Score: 2, Insightful

"I've used computers for about 30 years and over that time their hardware reliability has improved (but not that much), but their software reliability has remained largely unchanged.

I've been using computers a few years longer. Heck, I've owned computers a few years longer (yes, that makes my first one prior to the 8080 microchip). But even 25 years ago I saw Data General systems with a lot less raw power than a Pentium that ran a multi-user OS and supported an office full of users, and routinely ran without crashing or even being shut down from year to year, and were only rebooted when the tech came around to give them scheduled preventive maintenance. Sure, some systems did fail (and some in quite interesting ways), but it was the exception, not the rule. The thing that I see as having changed is that Bill Gates became the richest man in the world, while at the same time giving us an OS that crashed so regularly that it just can't stay up. And somehow people accepted it. How he got away with it I don't understand.

--
I'm an American. I love this country and the freedoms that we used to have.

Re:Computers don't crash by NetCurl · 2003-05-20 17:08 · Score: 5, Insightful

Personally I don't think not giving the user the option of defining any settings which could cause malfunction to be the answer. The reason? Well it's pretty simple, when set properly those same settings give flexibility, added functionality, and performance (at least one, sometimes two, often all three of the above).

See, that's the thing. I like Apple's OS because at surface level, you can't get access to those features that could really break things if you screwed with them too much. If you really want to muck around with those settings, they are there and ready to be played with through various means (Terminal -- it's a freaking BSD system, third-party tools, and power-user know-how). I would like to respectfully disagree with your statement and say that by default they don't offer the option of defining settings that may cause malfunction, but in OS X they have left almost complete wiggle-room to in fact screw EVERYTHING up, if you know what you're doing. I think it's more genius than anything...

--

It's only when we've lost everything, that we are free to do anything...

Re:and by sheldon · 2003-05-20 17:08 · Score: 3, Interesting

Interesting.

I play RTCW quite a bit on my WinXP box with no issues. RTCW occasionally crashes, and I have to hit CTRL-ALT-DEL to bring up task manager and kill it, but the system remains stable.

When I first built this box I had some issues, after a while it would lock up. Turned out it was because the video card was overheating. The system itself wasn't locking up, just the video card. Put the system in a new Antec SX-835II case with better cooling and haven't had a problem since.

the MS case by porkface · 2003-05-20 17:10 · Score: 2, Informative

Having spoken with Microsoft OS developers about this issue in some detail, they make risk / benefit choices all the time where they know one way will not crash ever, and the other way will crash but will be amazingly faster.

Guess which way consumers pay them to build it. When they choose the crash-but-fast method, they just put an astounding amount of QA into it to whittle the probability of a crash down to an acceptable level. And I agree with them about what an acceptable level is, because I know that when I crash my Win2k system, it's my fault.

They put more testing and research into their OSes than anyone these days. Maybe Sun used to have them beat there, but Sun isn't nearly as focused on those things anymore.

Re:Computers don't crash by geekoid · 2003-05-20 17:49 · Score: 3, Informative

I would call anything that unnecessarily foists a business rule onto the user an error.

"If this can occur, it is the developer who is at fault, not the user."

Thank you. I have to combat the 'stupid user' mentality at work every day, from "the user will never do that, so don't worry about it" to "I can't be blamed if the user won't read the manual".
I try not to say it, but it is hell working with coders who got into code 3 years ago for the money, whining about working 45 hours a week and not understanding things like pointers and user-defined types. Normally that's fine, I don't mind mentoring, but when you explain it to a developer and their eyes glaze over until you tell them exactly what to write, for the 3rd time that day, it wears on you. I thought it was me, but I even gave them photocopies of very clear explanations, with very simple examples and diagrams.

hmmm, sorry about that, I think work is getting to me.

--
The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect

Crashing... by JWSmythe · 2003-05-20 18:23 · Score: 2, Insightful

Crashes are a rather ambiguous topic..

A lot of computer crashes depend on what you're doing with it.

The machine I'm working on right now running Win98 or Win2k crashed on a regular basis by itself. I was tempted to blame bad hardware. Under Linux with a similar workload (OS, GUI, browser, mail client) it never crashes.. That I can blame on the software being run.

Identical machines with completely MS software behave the same, so it's hard to blame non MS software for the crashing.

My Compaq iPaq with WinCE would lock up or shut itself off about twice a day under virtually no load and no 3rd party software. (I hadn't really figured out what to do with it yet). I was ready to return it to the store. I opted to call it a part-time paperweight, and "try" Familiar Linux on it.. Hasn't crashed since..

Well, that's not completely true. I've done some rather silly OS upgrades (hey, lets change all the libraries while it's running, and see what happens), so the crash was user failure.

But not to make Linux sound perfect, I've crashed machines with poorly written software. I've sent them into huge loops, and had software running that managed to suck up all the memory and hang the machine (a packet sniffer monitoring a 100Mb/s connection). Even my favorite web server, thttpd, had a poorly written beta version once that would upset the server after a couple days of running.

Is it always the OS? Nope. I've had a set of 10 machines with "generic" memory in them.. After a few years of running, they all began crashing mysteriously about twice a day.. Swapped the memory out for name-brand memory, and they started working perfectly.

We have a big industrial looking Dell on the network. Memory flaked out in that. Machine was dying about once a month. Swapped that out for a larger quantity of Crucial memory, and no more problems.

In a computer store I worked in years ago, we bought the cheapest hardware possible. The motherboards didn't come with boxes, and the manuals never made a reference to a manufacturer. Most of the hardware I couldn't even track down a manufacturer name through the vendors. About 1 in 10 parts wouldn't behave properly when we turned it on. About 1 in 30 machines came back for repairs for bad hardware within a few months.

So, it is really up to everyone involved if the machine will work right. I use Asus motherboards, Crucial memory, and Western Digital hard drives, and rarely have a hardware problem. The last problem I had was a bad IDE cable. There's always something that can fail.

The software has to run well, and we're very, very happy with Slackware's distributions, with Apache and thttpd.

The biggest problem we have is user software or simple misconfigurations.. What happens when you have a heavy traffic web site, and the web server logs never rotate or get truncated? The drive fills up fast, and you end up with 2Gb logs.

What happens when you write a program that ends up sucking up all the memory and CPU time? Makes it not run right (I've done it myself a few times. Oops.)

People constantly bring their home machines in to work for repairs, for various reasons. About half are software misconfigurations (how many 3rd party applications do you really need running at boot time?). The other half, dying hardware.. The CPU fan made noise for 6 months and then stopped making noise, but you let it go? Ya your CPU is burnt. Cheap fans do that faster than most.

Can they build a crash-proof computer? No. Just like they can't build a crash-proof car.. Cars typically crash due to user failure (users including other drivers), or component failure (Ford tire blowouts). Not really the car's fault. I had a car in a parking lot crash. A driver missed the highway and broadsided it.

So, you can strive for perfection, but there are always going to be circumstances that can cause failures, usually attributed to users. (Those damned users.)

--
Serious? Seriousness is well above my pay grade.

Re:Computers don't crash by TheOldFart · 2003-05-20 19:13 · Score: 4, Funny

Eliminate the user. That takes care of half of the problems...

Re:Computers don't crash by digital+photo · 2003-05-20 19:26 · Score: 5, Insightful

I would agree. Properly and well written code will gracefully handle runtime errors.

Translation: Short of the user fubar'ing the program or data files themselves, the program should handle all user input in a graceful way.
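To make "handle all user input in a graceful way" concrete, here is a minimal sketch of one such check: reading a number from user-supplied text without trusting it. The helper name and the range parameters are hypothetical; only strtol and errno come from the standard C library.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal integer in [min, max] from user
   text. Returns true and stores the value on success. On any bad input
   it returns false and leaves *out untouched, so the caller can show a
   message and ask again instead of crashing or using garbage. */
bool parse_int_in_range(const char *text, long min, long max, long *out)
{
    char *end = NULL;
    long value;

    assert(min <= max);              /* caller bug, not a user error */

    if (text == NULL || out == NULL || *text == '\0')
        return false;

    errno = 0;
    value = strtol(text, &end, 10);

    if (end == text)                 /* no digits at all, e.g. "abc"  */
        return false;
    if (errno == ERANGE)             /* number does not fit in a long */
        return false;
    if (*end != '\0')                /* trailing junk, e.g. "12abc"   */
        return false;
    if (value < min || value > max)  /* syntactically fine, but out of range */
        return false;

    *out = value;
    return true;
}
```

Every rejected case here is something a real user will eventually type. The extra work mentioned below is mostly writing (and testing) branches like these for every input the program accepts.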

The problem though is that to do this would require quite a bit of extra work.

Programmers are caught between getting something ready for market at a time dictated to them by a department which doesn't understand the underlying issues, and saying "Screw it" and making the code solid.

That only describes one way in which the problem is caused.

The bigger problem is the attitude people have about computers which allows for this kind of shoddy programming. People are, for the most part, okay with their computers crashing at some point, and even expect it.

This in turn makes it okay to release bad code which will be "fixed later".

I say that whenever we get a crash or a problem, we report it to the company and we post it to our websites and to review sites.

I say that the users should make it a big fat noticeable problem to the companies whenever their software breaks.

Why? Because it means that whenever someone who's never used the software before searches on Google for that software or software company's name, they will find page after page of complaints, dissuading them from using the software.

The flip side is, if the software works, post to your sites and review sites. Give the people and companies who produce good software credit when it is due.

As users and consumers, we should find ways to encourage the producers and companies to produce solid code.

Solid stable code shouldn't be the exception to the rule.

--

Winged Power Photography

Why Do Computer STill Crash ? by pwl256 · 2003-05-20 20:21 · Score: 2, Interesting

While the constraints may be cost etc perhaps something I took from a PL/1 book - ;-0 years ago may be relevant.
'The Meaning of Correctness

1. The program contains no syntax errors that can be detected by the compiler.
2. As for 1 and it can be run.
3. There exists a set of test data for which the program will yield the correct answer
4. For a typical (i.e. reasonable) set of data the program returns the right answer
5. For a deliberately difficult set of data the program returns the right answer.
6. For all sets of data, valid with respect to the specification, the program returns the right answer
7. For all possible sets of valid test data, and for all likely conditions of erroneous input the program returns a correct ( or at least reasonable) answer.
8. For all possible input the program gives the correct, or reasonable answers.

Most programmers work at level 3 or 4
Users at 8.'

(I am sorry but I have lost the reference to the original book)

Easy.. economics and ongoing profit by smeenz · 2003-05-20 20:27 · Score: 3, Interesting

In the vast majority of cases, it's simply not economic to release bug-free code.

1. Any programmer knows that 90% of the code is written in the first 90% of the time, and the other 10% of the code is written in the other 90% of the time. (no typo). That is to say, it takes a lot more time, effort, and hence money, to move a project from "working well" to "working perfectly".

2. Many software companies these days make very little profit on the 1.0 release of their software, and make huge amounts of money through ongoing support charges. Microsoft is a classic example of this type of company.

3. If you release a piece of software that works really well, does everything the users want, and never crashes or causes trouble, then you may as well pack up shop and go out of business quietly. The unfortunate truth is that nobody is going to buy version 2 if they can do everything they want with version 1, and they're not getting constantly frustrated by crashes. The only carrot you have in this situation is to think up some really great ideas for version 2 in order to encourage people to upgrade - In fact, some of those ideas may have been deliberately left out of version 1 just so that they could be added later. Version 3 is more difficult still, and version 5 is right out. By comparison - how many versions of office are we up to now ?

A notable exception to this business model is the games writers. Companies like Valve and id Software consistently produce very near to bug-free code that works well and generally impresses the masses.

In all the years since half-life was released, there have been relatively few patches and fixes, and many of those were to prevent ingenious new methods of cheating, or to add support for hardware that didn't exist when the game was first released. The unreal engine had a similar history.

People buy new games because they crave the excitement or challenge of exploring and interacting with them. That's not something that could really be said about Excel or Word, so those sorts of products have to rely on the "draw out the profit over many releases" strategy described above.

Another (big) factor is people's expectations - most people expect that word will crash from time to time, and given microsoft's past history, they have little reason to expect that to change. On the other hand, gamers have an expectation that the latest game from id software will be as solid as a rock, and that the few problems that do crop up after the release will be fixed quickly.

If a games company didn't spend that "other" 90% on the last 10% of development, and released something that crashed as often as explorer, their reputation would be mud within days, and people would stop buying their games.

And lastly, choice.

People have a choice as to which games they want to buy. It's a competitive market out there, with many people having little disposable income to spend on games. On the other hand, despite what Linux advocates (I can't believe I'm saying this on Slashdot) say, most people use MS apps and operating systems because they don't have a choice - say due to corporate rules.

You might think that it is the end user that gets the sharp end of the stick here, but the people that really get screwed are the dedicated and talented programmers, who are working for companies that don't care too much if they release code before it has been fully tested.

Re:Easy.. economics and ongoing profit by anubi · 2003-05-20 21:03 · Score: 2, Interesting

"If you release a piece of software that works really well, does everything the users want, and never crashes or causes trouble, then you may as well pack up shop and go out of business quietly."
Geez! You stated exactly what happened to me. The company I used to work for bought some really neat DOS software for circuit analysis, schematic capture, and PCB layout. It worked flawlessly. Very easy to use. No frustrating DRM/Licensing issues to deal with. User-definable libraries. Nice file structures. In short, what I would have done if I did it myself.
When they transitioned from an Engineering Company to a Management Company, they surplussed all this neat software. Me, along with my software, was excessed. I was first in line to buy it from the company, being I knew exactly what it was and how I could run it on anything I could get my hands on. The company no longer exists, but I still run the software daily, albeit in another company.
Here it is, nearly 20 years later. I *still* prefer to use these programs. They are blindingly fast on a Pentium, allow me to update their libraries with all the latest parts I use, and still work perfectly.
By this time, I understand exactly what these programs do and am quite fast with them.. they are so familiar by now that I no longer have to concern myself with how to get the system to do what I want... now that I have finally perfected a simple DOS-based system that's ready for work about 13 seconds after I turn on power. I still fail to see what everyone is carrying on about over these finicky new design packages. I *try* to use them but soon become so frustrated with them that I keep reverting to the simple one.
It kinda bugs me when I have way too many choices - like do I really care what font or centering options the resistor values show up with in the schematic I am preparing to feed to the SPICE simulator or the PCB Layout proggie? Just put the value where I place it and I'm happy. I just want it done NOW. I don't wanna dicker with it. If it's gonna get published, I'll dump it into a .DXF file and let the AutoCAD and Photoshop guys gussy it up all they want.
See? There's some anecdotal evidence supporting your claim. They did the software right, and never sold another copy to me. All the companies that made the software are now out of business (one got bought out, the other two are just gone).
The favorite concern of the company I now work with is that I am using completely unsupported software. But then, I used a completely unsupported hammer when I built my doghouse. Big deal. If it works, what do you need support for?

--
"Prove all things; hold fast that which is good." [KJV: I Thessalonians 5:21]

the way as we do.. by tshuma · 2003-05-20 20:46 · Score: 3, Insightful

Have you ever heard of a company that builds houses without any plans?
Software companies are growing too fast, and they want to make more and more and more...
There is no time to write good requirements and no time to make a plan.

People, and mostly managers, are not "safety thinking". They want everything as fast as possible. This is the reason why software companies need to use software to control their process.

On the other hand, the hardware looks the same. I don't remember any C64 that had bad memory or a bad motherboard. It was just good all around! But if I buy a new memory module for my computer, it could be faulty, or it could be incompatible with the others!

So, what I believe is that we need to use programs to control the whole software design process, a program which doesn't let me go around a problem. But I am sad, because we should have been using one since the '80s!!!

--
There is only one good solution: The simpliest!

Paranoia by Detritus · 2003-05-20 21:08 · Score: 2, Interesting

Many years ago, I had the experience of reading the source code for the device drivers in a multi-user DEC operating system. It was very enlightening. The engineers who wrote the drivers assumed that all of the hardware was buggy, unreliable junk. They wrote code that expected the hardware to fail or lock up, and took the appropriate corrective action. If an operation timed out, the driver would reset the controller and reissue the command.

UNIX had the opposite philosophy. The hardware was expected to work perfectly. This led to situations where a DEC operating system would run reliably on a particular machine for months at a time and UNIX would crash within minutes on the same hardware.
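The DEC approach described above (time out, reset the controller, reissue the command) boils down to a bounded retry loop. A minimal sketch follows. This is not the actual DEC driver code; the device_ops interface and the flaky_device test double are invented purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device interface: both callbacks are stand-ins for
   whatever a real driver would do to talk to its controller. */
typedef struct {
    bool (*issue_command)(void *ctx);    /* false means timeout or failure */
    void (*reset_controller)(void *ctx);
    void *ctx;
} device_ops;

/* Try the command up to max_attempts times, resetting the controller
   after every failed attempt. Returns true as soon as one succeeds. */
bool run_with_recovery(const device_ops *dev, int max_attempts)
{
    assert(dev != NULL && dev->issue_command != NULL);

    for (int attempt = 0; attempt < max_attempts; attempt++) {
        if (dev->issue_command(dev->ctx))
            return true;
        if (dev->reset_controller != NULL)
            dev->reset_controller(dev->ctx);
    }
    return false;   /* report the error upward instead of hanging */
}

/* Fake device for demonstration: fails its first few commands. */
typedef struct {
    int failures_left;
    int resets;
} flaky_device;

bool flaky_issue(void *ctx)
{
    flaky_device *d = ctx;
    if (d->failures_left > 0) {
        d->failures_left--;
        return false;
    }
    return true;
}

void flaky_reset(void *ctx)
{
    flaky_device *d = ctx;
    d->resets++;
}
```

With a flaky_device that fails twice, run_with_recovery succeeds on the third attempt after two resets. A driver that assumes perfect hardware has no equivalent of the loop at all, which matches the crash-within-minutes behaviour described above.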

--
Mea navis aericumbens anguillis abundat

Re:Computers don't crash by Alan+Partridge · 2003-05-20 21:16 · Score: 2, Insightful

who's gonna buy the product? who's gonna install it?

--
That was classic intercourse!

Apple's Human Interface Guidelines by BigBadBri · 2003-05-20 21:20 · Score: 2, Insightful

Apple's Human Interface Guidelines are a nice introduction to user-fault tolerance, even if you're developing for other platforms.

Are we to understand that Apple is good, or that Apple users are particularly stupid?

Personally, I've never used a Mac for work (I've only dealt with them when setting networks up for others), but the UI has always seemed a few steps ahead of the competition in terms of ease of use, so I'd applaud Apple for taking the time to think of the user and making the interface easy to use.

--
oh brave new world, that has such people in it!

Re:Apple's Human Interface Guidelines by Afrosheen · 2003-05-20 22:21 · Score: 3, Insightful

Considering that Apple's original (and perhaps enduring) core market were 'creative types', I'd say they were shooting for brilliant people that didn't know shit about computers. They originally established those guidelines so companies coding software would adhere to a standard and everything would feel right.

Consider Adobe, for example. You open an old or new version of Photoshop on a Macintosh.. it looks and feels the same. Everything is always in the same place on a Mac. File, Edit, bla bla bla, it's always in the same order regardless of the version, regardless of the app. It's called 'genius' from a user's standpoint.

When you can take a drooling noob and turn him into a productive photo retoucher in one week, I attribute that more to apple and adobe than anything. Trust me, I had to train a few dozen people from various backgrounds and everyone became a ninja eventually.

It's all about conditioning. by Bert64 · 2003-05-20 22:23 · Score: 2, Insightful

People are so used to unstable computers nowadays that a crash is considered normal.. people EXPECT computers to crash, and couldn't imagine one that doesn't.
This means that unstable software sells just as well as stable software, but is much cheaper to produce since you don't need to test it so thoroughly. Now any commercial vendor will realise they can save a lot of money while only very slightly damaging their sales. The money they save on testing more than makes up for the lost sales, so they just continue writing buggy software.
If the average computer user would boycott products for being unstable, and stand up and say "this really isn't good enough", and it seriously hurt software sales, then something would swiftly be done about it.

--
http://spamdecoy.net - free throwaway anonymous email - avoid spam!

Re:Computers don't crash by jo_ham · 2003-05-20 22:39 · Score: 2, Interesting

I got fed up with just that sort of thing and changed computing platform. I'm not saying that the Mac never crashes, but it's certainly been a massive, massive step in the right direction.

A quick trip to the terminal reports my uptime as "11:35AM up 57 days, 12:42..." This is by no means a long time by Unix standards, but for a laptop (iBook 600 MHz) that I use every day, sleeping, waking, starting and stopping multiple programs, working on all sorts of stuff, burning CDs, browsing the net etc, I'd say it was very good.

The longest I could go on my Windows 2000 box before I'd have to reset was about a week - it wouldn't crash, it would just get confused and start swapping icon images over, so Word would have the Excel icon, and so on.

The only time I reboot my iBook is for system updates. Very few programs "Unexpectedly Quit" on me (Camino used to do it occasionally, every 2 weeks or so, but I'm using Safari right now). I've never had a kernel panic in 10.2.x (I had two in 10.1.5, but I traced it to the well known Classic environment and a USB device panic bug that was fixed).

If you want your software to crash less, buy a Mac.

Stable, Fast, or Cheap by interactive_civilian · 2003-05-20 23:20 · Score: 2, Funny

choose any 2.

or something like that...

--
"Empathise with stupidity, and you're halfway to thinking like an idiot." - Iain M. Banks

the organisation of software development by tychoS · 2003-05-21 00:12 · Score: 2, Insightful

For close to a decade I have worked as a software developer for various companies, and in the course of that period I have read quite a few books and papers on software project management, process and the like, as well as participated in conferences and study groups on the topic. Both theoretical and anecdotal evidence points towards the way we organise software development being the main limiter of quality and creativity.

In most software companies you get promoted for political aptitude, with little or no regard for your knowledge of how to create software and, just as important, how to organise software development teams well and how to build a mutually beneficial relationship with the clients during and after the project.

Such people tend to believe urban legends, such as: in bygone days, in a country far from here, there was a software project that used the waterfall process and finished on time, within budget and with a happy customer.

They do this despite the reasons why waterfall processes lead nowhere pleasant having been thoroughly documented in everything from scholarly texts on organisational theory to excessive numbers of first-person horror stories. And who can blame them? They got promoted to middle or upper management not because they knew a thing about organising software projects, but because they were better politicians than the next guy, so it would not further their careers to sit down and read their first book on software project management theory.

Lots of crashes == bad hardware by ChrisPaget · 2003-05-21 00:20 · Score: 2, Insightful

Windows 2000 Server, SP3. Up for 55 days, 15 hours, 53 minutes. And that's only because I moved into my flat 55 days, 17 hours ago :) In that time it's been used extensively for C / C++ development, plenty of Quake 3, CD burning, watching DVDs, Kazaa, you name it. And it also serves my website (half a million hits over the 55 days), email, internal DNS, DHCP and file server. It's transferred over 150Gb of data to either the internet or LAN, and has never crashed. Who says Windows 2000 isn't stable? I don't even need to reboot when I install patches - restarting services to trigger the updates is relatively easy on Win2K if you know your services well.

Windows in general cops a LOT of shit for instability that it really doesn't deserve. Before you criticise Windows for being unstable, I suggest you try debugging a crashdump - 99.9% of the time it's caused by a third-party driver. Cheap sound card? Old graphics driver? Hell, maybe even you've not installed the 4in1 driver for that Via IDE controller on your motherboard? Drivers are the single biggest source of crashes and reboots in Win2K. If you want a stable system, spend some money on your hardware, and get it from a company that provides decent drivers.

Admittedly, that's the reason why *nix is generally perceived as more stable than Windows - if a driver is bad in Windows, you're screwed. If a Linux driver is bad, you can fix it, recompile the source, and bye bye instability.

Don't blame Microsoft for instability. Blame the third-party hardware vendors who can't be bothered to spend the time and money properly debugging their drivers.

Re:Good = expensive as hell by tommck · 2003-05-21 00:26 · Score: 2, Interesting

I'm tired of hearing this. There is nothing unsafe about C.

I agree completely... This is the same kind of thinking that people use to try to outlaw guns... "If someone can use it to commit a crime, we should just eliminate them!".

I would say that poor development, insufficient design, (obviously) insufficient testing and a focus on features rather than security are MUCH more to blame for software quality issues than which language was chosen for the implementation.

I still think we should be able to moderate the whole article as a Troll...

T

--
---- It puts the lotion on its skin or else it gets the hose again. It does this whenever it's told.

It's not /computers/ in general, it's PC's by wiredog · 2003-05-21 00:33 · Score: 2, Interesting

From the April 1998 (!) issue of Byte (back when it was an excellent printed magazine):

"The fundamental concept of the personal computer was to make trade-offs that guaranteed PCs would crash more often...The first PCs cut corners in ways that horrified computer scientists at the time, but the idea was to make a computer that was more affordable and more compact."

"Having 15 million lines of code isn't as bad as having 15 million lines of new code"

Millions of PC users would be overjoyed with an MTBCF of just one day. Yet mainframes are big, complex systems that often have clusters of CPUs, gigabytes of main memory, and thousands of users. What makes them so reliable?

Mainframe experts say that it's a matter of priorities. ... . When a mainframe crashes, however, it's a major catastrophe. It's General Motors calling up IBM to demand answers.

It's interesting how little has really changed in the past 5 years...

--

Best Slashdot Co

I wish Slashdot had a comment summary feature by Crag · 2003-05-21 00:39 · Score: 2, Informative

So here's my attempt (in no particular order):

1. Software is a lot more stable than we think
   a. My (*nix/Windows/MacOS) computer has been up for X days/years
2. Some software is faulty because it can be
   a. Out of Fast/Cheap/Correct, correct is first to go
   b. It's Good Enough (for management/the market)
   c. No accountability for faults
3. Developers/managers insist on using broken tools (C language)
4. The problem is just Too Hard
   a. Software gets more complex faster than it gets better
   b. We don't have the tools to build it right yet
   c. It's impossible to test every code path and state
   d. Software is a new "science" we don't yet understand
5. It's the hardware/interfaces to other software
   a. VCRs and other closed simple systems don't break
   b. Even with ECC memory and disks, there's a non-zero chance of a random single bit error from cosmic rays going uncorrected and gumming up the works
   c. Each piece works as designed, but every combination of pieces can not be designed or tested, and will always have unanticipated states (think The Matrix)

Solutions

0. It's not a problem (it's a result of other problems)
1. Fix bugs by removing code instead of adding it
2. Pay more, wait, or fix it yourself (tip the Fast/Cheap/Correct balance)
3. Over-build systems to be more defensive against all errors (and use better tools and components)

Nintendo Cube has crashed by Carbon+Unit+549 · 2003-05-21 00:50 · Score: 2, Insightful

When was the last time someone crashed their Super Nintendo?

Actually, my GameCube has crashed on several occasions with SSX Tricky and other games.

--

nohup rm -rf ~/. >& zen &

Re:Computers don't crash by c4seyj0nes · 2003-05-21 00:54 · Score: 2, Insightful

In my opinion, something the user does should never cause a program or operating system crash. If this can occur, it is the developer who is at fault, not the user.

C:\WINDOWS> del *.*

--
"In wine there is wisdom. In beer there is strength. In water there is bacteria." --Old German Proverb

Re:Computers don't crash by glatiak · 2003-05-21 01:15 · Score: 2, Insightful

Twenty-five years ago I worked for a database and application vendor doing internals (Amcor in case anyone cares). Filtering for correct input and preventing long scale logical errors was a major fetish. Much of this was not difficult, just a group agreement to use library routines for all user interaction that had input validation and condition handling. Programs were built from shells that had standard condition handling embedded -- you added custom branches as needed. What made the whole approach successful was an agreement on standards of program behavior and a willingness to share common code. Errors like the ever popular buffer overflow just didn't happen because moves into buffers were checked first, etc. The move to RISC processor architecture attenuated synchronous error handling, to be sure. But in the large, it is the obsession that in IT, experience is a handicap (just ask any recruiter about experience that is not 110% matched to what they want NOW) -- so junior programmer mistakes become institutionalized. The budget is a convenient excuse, but I think the real root is the inexperienced lack of appreciation for what matters.
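To make the "moves into buffers are checked first" idea concrete, here is a minimal sketch in Python. The names `checked_move` and `BufferOverflowError` are made up for illustration and are not taken from any vendor's library; the point is only that every write goes through one shared routine that refuses anything that does not fit.

```python
class BufferOverflowError(ValueError):
    """Raised when a move would write past the end of the destination buffer."""


def checked_move(dest: bytearray, offset: int, src: bytes) -> int:
    """Copy src into dest at offset, refusing any write that would not fit.

    Returns the offset just past the copied data. (Hypothetical helper.)
    """
    if offset < 0 or offset > len(dest):
        raise BufferOverflowError(
            f"offset {offset} is outside a {len(dest)}-byte buffer"
        )
    end = offset + len(src)
    if end > len(dest):
        raise BufferOverflowError(
            f"{len(src)} bytes at offset {offset} do not fit in a {len(dest)}-byte buffer"
        )
    dest[offset:end] = src  # same-length slice assignment, so dest never grows
    return end


record = bytearray(8)
next_free = checked_move(record, 0, b"ABCD")      # fits in bytes 0..3
try:
    checked_move(record, next_free, b"TOOLONG")   # 7 bytes into 4 free bytes
except BufferOverflowError as exc:
    print("rejected:", exc)
```

The check itself is trivial; what matters is the agreement that nobody writes to a buffer any other way.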

Re:Computers don't crash by garrulous · 2003-05-21 01:27 · Score: 2, Funny

>

Have you ever met a user? They will rip out IDE cables while an OS is being loaded. They're savage, man! They're beasts and there's no proper defense!

Re:Computers don't crash by TheCarp · 2003-05-21 02:17 · Score: 2, Interesting

Impossible.

How can a "user error" cause a crash? Software should do proper bounds checking and should act appropriately (which may mean giving an error message) no matter what input it is given.

About the only crash due to user error that I can imagine really being due to user error would be the user killing the process with killall or pkill or its moral equivalent.

Other than that, it's just bad bounds checking, and blaming it on user error is really bad form.
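As a rough illustration of "act appropriately no matter what input it is given", the sketch below turns every bad input into an error message instead of an unhandled exception. The function name `parse_quantity` and the 1..999 range are assumptions picked for the example, not anything from a real program.

```python
def parse_quantity(text: str, lowest: int = 1, highest: int = 999):
    """Return (value, None) for good input or (None, message) for bad input."""
    cleaned = text.strip()
    try:
        value = int(cleaned)
    except ValueError:
        # Not a number at all: report it, do not crash.
        return None, f"'{cleaned}' is not a whole number"
    if not lowest <= value <= highest:
        # A number, but out of bounds for this field.
        return None, f"{value} is outside the allowed range {lowest}..{highest}"
    return value, None


for raw in ["42", "  7 ", "forty-two", "-5", ""]:
    value, error = parse_quantity(raw)
    print(repr(raw), "->", value if error is None else f"error: {error}")
```

Whatever the user types, the caller gets either a usable value or a message to display.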

Part of the problem IMNSHO is the commodity desktop. There are so many machines and they are all cheap, and it's more important to get the work done than it is to make sure the crash doesn't ever happen again.

On real systems, if the system crashes, crash dumps are sent off to the OS vendor and they track down the problem and fix it. I know, we have had to collect and send off crash dumps in the past.

Each round of that makes the system more stable.

That's one of the advantages of Linux, and why there are some systems that don't crash (my Linux boxes pretty much only crash when the power goes out and the UPS battery drains). That is, OSs like Linux and BSD are used in real environments and there are people committed to fixing the problems... so even the lowly common desktop user reaps the benefits.

See, there is the difference... Windows, even the "server" versions, grew out of a desktop OS with a desktop way of doing things: "Oh, the server crashed, well, let's reboot and hope it doesn't happen again", whereas Linux and BSD come from the land of the server down to the desktop: "Oh, the server crashed? Get DEC on the phone" or "Get out those crash dumps".

-Steve

--
"I opened my eyes, and everything went dark again"

Oh god, more Big Bro bashing... by dasmegabyte · 2003-05-21 02:29 · Score: 2, Insightful

Of course, there's no need to mention Microsoft's inability to create a stable system.

You know, my win2k machine -- the one that has been up since our last power outage, and had been up since the power outage before that -- has never crashed. It might be because I don't overclock it, used a retail processor, Intel networking, four fans, whatever. But it has not crashed or needed a reboot since I installed Jetico BestCrypt last year, March or something. I use it every day, have played pretty hardware intensive games on it, and even used it as a server.

I think the problem here isn't with Microsoft and their inability to write a stable OS. If it is stable anywhere, that means the kernel isn't leaking ram or occasionally polling hardware that doesn't exist. The problem therefore lies with Microsoft's inherent trust that driver manufacturers and software engineers will handle their own damn errors. Linux doesn't do that. The kernel is so "low" that it recovers from just about everything. The software on top of it, that's another story. Many of the applications I've used in Linux crash after a single parsing error, bringing down anything reliant on them. Tell me you've never had an X server crash on you, taking down your entire GUI. To the average user, who isn't running a bunch of services or daemons, losing the GUI is the same thing as crashing. So what if bringing it back up is faster than rebooting the machine -- it's also more complex to support.

Besides, hardly anybody buys a Windows installation because they wanted a more stable system. They bought it because they wanted cooler toys and a snappy GUI. People "buy" Linux, BSD, et al. for stability.

--
Hey freaks: now you're ju

Re:Computers don't crash by Cookeisparanoid · 2003-05-21 02:38 · Score: 2, Insightful

It's a well-known HCI concept that people learn by trial and error, so it's really a design flaw in the program if user error causes a crash.

programmers trending downward by junkgoof · 2003-05-21 02:41 · Score: 3, Interesting

I think this brings up a good point. Hardware may have improved and software development tools may have improved, but the people writing software have gotten much worse. A few years ago most people who were in the computer industry were there because they knew something. Now they are there because they wanted money, some HR droid picked their CV out of a pile because of the acronyms, and some manager does not know enough to fire them. Layoffs haven't helped either; generally the knowledgeable people with higher salaries get booted first. Security vulnerabilities are up (including old stuff that has not been patched) and successful projects are down.

--
You got me into this! You were the ideologue! I'm only a poor assassin! - Twenty evocations, Bruce Sterling

Re:Computers don't crash by garrulous · 2003-05-21 02:48 · Score: 2, Funny

"A correct program does not allow a user to enter erroneous data." Name me one program that will keep a user from using the CD tray as a coffee holder.

And here's your display by hendridm · 2003-05-21 02:53 · Score: 2, Informative

It's x86 hardware, but it's powered through the video card. Looks pretty good at 800x600 too (it's a TFT display).

Unisys 10.5" LCD Monitor w/ 2MB PCI Video Card

It says 2MB video card, but the one I got was a 4MB video card. It happily supports dual displays with Windows 9x and higher, but it doesn't support video playback, so scrap the idea of getting it to watch TV or play movies on. But for what you're describing, a small monitor for a low-power system, I think this would be ideal.

Sadly, they don't have a Linux driver for the required 65550 video card, but there's always Google and the price is right.

Software crashes because it is open, not closed. by gurps_npc · 2003-05-21 03:18 · Score: 3, Insightful

Anyone can write code for a computer.

In order to be flexible enough to do everything a computer can do, computer languages have to be allowed to crash the computer. Otherwise you are severely limiting what they can do and slowing things down.

Most computer crashes are caused by an INTERACTION of two pieces of code that did not know about each other and were never tested.

If you want a system that never crashes, then all you have to do is:

1) accept a restricted operating system that will never be able to compete with a commercial system like Windows.

2) Never install a program that was not A) created by the same company/group that wrote your operating system, B) specifically designed for your particular computer, and C) designed to be used with and thoroughly tested against all the other software that is currently installed on your PC.

That is what companies do when they make non-PC computer equipment (cars have tiny computers), and it is the reason why such things do not crash.

--
excitingthingstodo.blogspot.com

Vendor chaos, low quality control by labradort · 2003-05-21 03:56 · Score: 2

In mentioning that gaming consoles are the exception to the rule, you're on to the critical factors in the differences.

Here are the key aspects that lead to system instability of PCs (Linux or Windows or whatever):

1. Chaos of hardware vendors.

There are thousands of pieces of compatible hardware for your PC. No one can test all of the combinations and revisions and their various driver and BIOS versions with all of the other hardware and software. If the hardware came from one vendor, or was standardized (which isn't going to happen), then there could be better quality control on the hardware.

When software and hardware are from the same vendor and go through their joint QA, you get better quality and fewer surprises. It isn't perfect, but it is less chaotic. Sun Solaris and Sun hardware is an example of this, and there are many more in the server/mainframe world.

2. Low level of quality control and durability design in hardware engineering and manufacturing

There are more quality checks in how Heinz ketchup is made (tastes the same all over the world with tomatoes grown in very different regions) than in how PC components are made.

As a result there is much more DOA hardware, and hardware that is flaky.

This is related to keeping prices for the hardware down. Up to now the performance has increased at such a pace that no one is complaining if they have to ditch a 2 GB drive and buy a 100 GB drive, or toss out their 486 for a P-III. Contrast that with my Canon AT-1 camera, which has been working fine since 1976 (two minor physical repairs), and my Yamaha stereo amp, which has been working great since 1989. If computers (and other cheap electronics such as digital cameras) ever reach the point where we expect them to last for 10 years or more, the quality control and durability of the components will have to increase.

3. Chaos of software vendors

Again, there are thousands of software titles and dozens of operating systems and revision/patch levels. It is impossible to test all permutations together. It is usually impossible to test completely all permutations of software options. I was asked to QA a product which had over 8000 different ways the options dialog box could be configured. Of course it wasn't done. We tested only a single option by itself, and not very many combinations were tried before the server product was shipped. I've seen very specific interactions between software products and their DLLs in Windows. This has been mainly fixed by the way Microsoft Windows XP manages DLLs now, but there are still ways that memory and resource leaks can cause one application to poison another. I've also seen Microsoft intentionally leave a poison bug in Internet Explorer to keep a competitor from taking on a role they had planned for IE in the future. They do these things in a very innocent way.
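For a feel of how fast an options dialog blows up, here is a small back-of-the-envelope sketch. The dialog is hypothetical (13 independent on/off checkboxes, not the actual product described above), but it already lands in the same "over 8000" territory, while the number of two-option value pairs stays small.

```python
from itertools import combinations
from math import prod

# Hypothetical dialog: 13 independent on/off checkboxes.
options = {f"option_{i}": (False, True) for i in range(13)}


def count_full_configurations(opts) -> int:
    """Every option set independently: multiply the number of values per option."""
    return prod(len(values) for values in opts.values())


def count_pairwise_interactions(opts) -> int:
    """Distinct (option A = x, option B = y) value pairs across all option pairs."""
    return sum(len(opts[a]) * len(opts[b]) for a, b in combinations(opts, 2))


print(count_full_configurations(options))    # 8192
print(count_pairwise_interactions(options))  # 312
```

That gap is why QA teams usually settle for testing options singly or in pairs instead of exhaustively, and why interactions among three or more settings slip through.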

crash-proof computing by Roadmaster · 2003-05-21 04:21 · Score: 2, Informative

A few months before it got cancelled, Byte magazine published a great article entitled "Crash-Proof Computing", exploring the reasons why PCs are so tremendously unreliable. This goes beyond merely stating the known fact that Windows is horribly unstable and recognizing Linux particularly as a more stable solution; the article compares the entire PC architecture, design and current manufacturing and implementation techniques to big-iron systems like mainframes, with "5 nines" availability, MTBF of 20 years (yes, that means the computer is spec'd by the manufacturer to crash only once every 20 years), and other such techniques meant to justify those 6 and 7-figure pricetags.

Overall a very good read, highly recommended.
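For anyone wondering what "5 nines" and a 20-year MTBF work out to, here is the arithmetic as a small sketch. The one-hour repair time is an assumption chosen purely for illustration; it is not a figure from the article.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored


def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime per year for a given availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def availability_from(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability = MTBF / (MTBF + mean time to repair)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


for label, a in [("three nines", 0.999), ("five nines", 0.99999)]:
    print(f"{label}: {downtime_minutes_per_year(a):.2f} minutes of downtime per year")

# 20-year MTBF with an assumed 1-hour repair per failure
twenty_years_hours = 20 * 365 * 24
print(availability_from(twenty_years_hours, 1.0) > 0.99999)
```

Five nines leaves a little over five minutes of downtime a year, and a machine that fails once in 20 years and is back within an hour clears that bar.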

What about liability? by VinceTronics · 2003-05-21 07:58 · Score: 2, Insightful

MIT Tech Review's July 2002 cover story was titled "Why Is Software So Bad?" (registration required to read whole article). The article makes the point that because there is no liability to the makers of faulty systems, there isn't any real incentive to build systems that never crash.

What if we could bill the HW, OS, and apps vendors for our lost time due to crashes? I'm sure systems would improve in a hurry!

What's needed is legislation making vendors liable for losses due to faulty computer systems. Remember, carmakers cared more about styling than safety until Ralph Nader's book Unsafe at Any Speed alerted the industry and consumers to the need for things like safety belts. Now we have federal safety standards for automobiles.

I'm sure the libertarian-leaning tech community will freak out as soon as they read this. But "self-regulation" will only take the computer industry so far towards total reliability. As computer systems govern more aspects of our modern lives, government regulation seems inevitable in my view.

liability, training, capitalism by lpq · 2003-05-24 10:34 · Score: 2, Interesting

As someone said before -- no product liability -- you have to pay money just to report a bug ...

Training of Software Engineers. With point and click interfaces you have people with an average reading ability of a 5th grader writing code. Even hinting that someone wasn't a good writer of code was considered "unprofessional" at some workplaces (i.e. -- you are not a 'team player').

Capitalism -- it's not cost effective to fix bugs until a customer finds them.
Even in code for Secure OS's under Common Criteria CAPP/LSPP, vendors aren't required to fix bugs that are not discovered by the independent evaluator or the customer. So even if the product manager knows of bugs in the OS that is intended for 'high security' government projects, there is no law saying he has to list them or fix them (unless they are found by a 3rd party or the customer). Spending time fixing bugs that are NOT found by the customer is not only not cost-effective, it is considered not working on "assigned priorities" and can be grounds for lower reviews.

This isn't pessimism -- it's reality. Quality doesn't pay when you can sell customers faulty products then charge the customers to fix the faulty product you sold them in the first place -- one might argue that it pays to have more bugs in the code -- you can charge more for service contracts and rack up more incidents that you then charge the customer, per incident, to handle.

More thoughts on hot button; ex: college class by lpq · 2003-05-24 12:50 · Score: 2, Interesting

When I was in college in Computer Science (how many programmers today have a formal degree in Computers, vs. say, a liberal arts degree?), Sophomore year, University of Midwest - CS201 - required for Computer Science majors -- beginning assembly language in Compass (CDC assembler).

The price of perfection is taught early -- an early lesson was when for a final project we were to work with 2-3 other people to make a final program. The deadline was approaching and our program still wasn't running. Turning it in late was a letter grade drop/day. Two of us felt we were close and didn't want to turn in a non-running program. The third wanted to turn it in. They also felt that they'd done their part and there were no problems in it.

The third turned in the project with his name on it. My partner and I spent another day cleaning up his code to get it to work and turned it in. We got a "C" on the project, with a downgrade for bad coding practice in his section of the code and being a day late. He got a "B" even though it didn't work. In the final grade both he and my partner got "D"s while I got a "C", which sorta sucked for my major -- but it turns out that 60% of the class got "D"s and "E"s. Made a big stink about the course material being too difficult and the teacher made a public 'booboo' comment "It was the same material he'd taught before, it was just an exceptionally dumb class." Major ire of parents.

Anyone who got a "D" or "E" had it stricken from their academic record. It was the only "C" I got in my comp-sci curriculum (str8 A's in 300 level and above classes). But on that project, I learned that deadlines were more important than code quality.

Spin forward 15 years -- at small startup before Xmas. Deadline for demo approaching and I and other team member had parties to go to that evening. He was programming a DSP chip (he was a PhD wizard), and I was handling the drivers on the 286 DOS box. I checked my code backwards and forwards and he swore it couldn't be his stuff. Finally, I displayed output he was sending and it was 'wrong'. Unfortunately, my party had been out of town and I'd already missed the deadline for getting there because it was emphasized to me how important the project was to complete before leaving. When the problem was discovered in his code -- guess what -- he couldn't stay to fix it (I didn't know anything about the DSP chip he was using) because, the VP told me, he was married and his wife was gonna leave him if he missed the party (I don't think he was serious, but maybe). I had no such excuse -- only a partner who went to the party alone.

Again -- what do I learn? Personal relationships take precedence over product and code quality. So far we have code quality below deadlines and below personal relationships (though that has more or less disappeared in the modern world).

more later...
-l

core problem: people people != computer people by lpq · 2003-05-24 13:23 · Score: 2, Interesting

The core of the problem was delineated in the book "Weird Ideas That Work: 11 1/2 Practices for Promoting, Managing, and Sustaining Innovation". In it he makes the main point -- that those people who are most creative are the people who don't do things the "normal way". They are the 'loners' -- the 'slow adopters of company culture'. They aren't the team players and they are slow to be programmed with the company way of doing things. As a result, they see problems differently than those that have been trained in the "correct way" to do things.

Those who spend time going to lunch, drinking beer together, palling around together -- they begin to think alike -- they develop synergy -- but they also develop a closed system. The ones who don't pal around come up with the completely off-the-wall ways of doing things because they haven't been indoctrinated into the 'normal way' of doing things. Quite often these ideas are shot down because of their eccentricity. But Steve Jobs's personal computer idea he presented to HP -- shot down by corporate culture -- was a brilliant success. He gives countless examples of the most brilliant people generally not being very good with "people skills".

A corollary of this is that those who push for perfection far past the 'norm' are going to be unpopular outsiders -- they are the nit-pickers, the ones who aren't team players. Again, they might be the ones that would nit pick the code to perfection, given the chance, but the larger group says "enough" -- it's "good enough, it boots, let's ship it".

In both instances the people most likely to increase quality in software are those that have the least political clout and are often least liked by their peers. Their peers often feel that the 'nitpicker' has a superiority complex and is overly prideful, and they sometimes go out of their way to sabotage work that might otherwise have turned the company around and saved millions.

I specifically was involved in a group who had to choose between 2 vendors of Microsoft compatible software. I became the lone supporter of company "B". I was adamantly opposed to "A" for reasons I couldn't articulate at the time -- my gut told me "A" was untrustworthy but I couldn't tell why. I was overruled, and 4-5 months into the project "A" sued MS for non-cooperation, effectively killing our project. It was too late to go with company "B", whose price had doubled now that they were the only game in town. It turns out "A" had been having trouble with MS all through the negotiations with us, but no one picked up on it. Reminding anyone of the decision made me decidedly unpopular. But it was precisely because I hadn't gone out and been wined and dined by "A" and hadn't formed a "good ol' boy" relationship with them that I could see something was amiss. It was precisely the fact that I wasn't a hobnobber/political animal that I caught the 'off' vibes. Those who were "good team employees" went along with the majority decision and the 'friendly' team "A" who came onsite to woo us. It's the same principle at work.

Those who make the world work -- are also those most likely to compromise and most likely to compromise quality. It's because of their willingness to compromise that they are liked by many but it's the same compromise that results in compromised code -- both in terms of bugs and security.

I sure as heck don't know the answer. Successful combinations are highlighted in the book mentioned above where one person knows the almost anti-personal nature of the 'idea' person, and handles the media and external interactions, but it's rare to find groups that work well like that.

It has often been said that the best software doesn't come out of committee but out of 1 or a few people -- while companies like to think that 9 women can have a baby in 1 month, it ends up more often that the 9 women argue over who
