Slashdot Mirror


Open Source Programmers Stink At Error Handling

Mark Cappel writes: "LinuxWorld columnist Nick Petreley has a few choice words for for the open source community in 'Open source programmers stink at error handling'. Do you think commercial software handles errors better?"

28 of 610 comments (clear)

  1. In the case of some of us... by brunes69 · · Score: 4, Funny

    Who spend days at a time at work (read: Stallman) without showers, removing the last 3 words provides a better description :o)

  2. No, its not limited to OSS by ArcadeNut · · Score: 4, Insightful
    Commercial Developers are just as bad. If you ever see an "Access Violation" or "Unhandled Exception in blah...", chances are, the programmer didn't do proper error handling or checking for error conditions.

    Things like checking pointers to see if they are NULL before using them. Simple basic things that could prevent errors.

    Error handling doesn't just mean catching the error after its already happened. It also means being proactive about it before it happens.

    A lot of programmers do not do that.

    --
    Visit the Arcade Restoration Workshop @ http://www.arcaderestoration.com
    1. Re:No, its not limited to OSS by deranged+unix+nut · · Score: 3, Insightful

      No, but frequently error cases are only added because good testers force the programmer to take care of all of the edge cases.

      I have forced the developers that I work with to add hundreds of error case checks in the last year. :)

  3. That all depends on... your selection of course! by noahbagels · · Score: 3, Interesting

    Why does it seem like there are as many people in the "community" criticizing open source as there are supporting it?

    Two Words: Apache and Tomcat

    I'm a professional who works with the closed source equivalents all the time: Netscape iPlanet server, IIS and WebLogic.

    Now: before you flame - I like working with WebLogic, but it is no better than Tomcat in my opinion (as far as error reporting goes). And IIS is a piece of crap! Not to mention Netscape's overly complecated UI that blasts every change you've ever made and is completely out of sync with the flat file configs.

    Need I mention that Tomcat error logging is set-up in an XML file that is easy to read, modify, and translate into a simple report for management (IT that is).


    When was the last time Windows gave you a nice error.log when it blue-screened, or how about IIS on a buffer overflow?

    I'm sick of bashing on the free stuff out there. Sure, just because I can release one of my college projects into the open source may mean that statistically there are more projects without good error reporting, the real projects are pretty darn good.

  4. Blame this on Open Source Programmers only? by ackthpt · · Score: 3, Interesting

    I've been coding for over 20 years and I've seen some beauties, and I'm sure others have as well. Like the guy who put about 500 lines of Java in one Try - Catch. I'd suggest they screen their contributors better. Use a carrot and very gentle stick approach and be certain to encourage coders to think "what could happen here and how should I handle it?" whenever writing.

    --

    A feeling of having made the same mistake before: Deja Foobar
  5. Testing by Taral · · Score: 3, Insightful

    The real problem, IMHO, is that nobody likes to do the intensive testing that is necessary to get a program to be truly robust. We do it here at IBM, and I promise you -- it's not something I would do if I weren't being paid to do it.

    --
    Taral

    WARN_(accel)("msg null; should hang here to be win compatible\n");
    -- WINE source code

  6. I think I've got it down by SanLouBlues · · Score: 5, Funny

    As a professional programmer I adhere to a strict stylesheet which I think the Open Source community may appreciate a copy of:

    main( arguments ){
    try{
    --code goes here--
    }catch( exception ){
    printout "I'm sorry to do that you need our $50k/year support plan. \n Thank you!"
    }}

    No need to thank me.

  7. Error handling... by mcpkaaos · · Score: 5, Interesting

    ...is only as solid as the engineer behind it (and the design behind him/her). A poor design often results in a flaky system, difficult to implement and nearly impossible to predict. That, in turn, can result in very thin error handling. Whether or not a product is commercial has nothing to do with it. The only argument for that could possibly be that in many cases, more careful attention (in the form of testing and code reviews) is taken when a product is a revenue generator (or anything that will affect the perception of the quality of a company's engineering ability).

    Ultimately, if the engineer (or team of engineers) is inexperienced, error-handling will be weak, error-recovery nearly non-existant. However, a more senior engineer will generally start from error handling on up, making sure the code is robust before diving too deeply into business logic. The time taken for unit testing plays an especially large role here. The more time spent trying to break the code (negative test cases) the more likely you will have a system that has been revised throughout development to have rock-solid error handling/reporting/recovery.

    [McP]KAAOS

    --
    It goes from God, to Jerry, to me.
  8. Re:That all depends on your point of view by LordNimon · · Score: 3, Insightful
    Any Linux user who tries a stunt like that deserves a seg fault (or worse)

    And attitudes like that, ladies and gentlemen, are the reason why we're all going to be old and grey before Linux is accepted on the desktop.

    --
    And the men who hold high places must be the ones who start
    To mold a new reality... closer to the heart
  9. Too specific: Programmers Stink at Error Handling by victim · · Score: 3, Insightful

    I was just re-re-reauthenticating my Cubase installation. The key CD is now scratched which hangs the authenticator forced a quite ungraceful reboot and corrupted my hard drive. (Perhaps a $150 upgrade will help. I'll never know.)

    The last time I used Word a drive filled during a save operation and left me with just a mutilated copy of the original file. (I will not use it again.)

    My HP PSC 750xi software informs me every morning that its controlling software was exploded and I should reboot the host computer. (I'll wait for the OS-X drivers. If they are still bad the PSC goes out the door.)

    The most amazing part is that this state of affairs doesn't surprise me. If my refrigerator intermittently defrosted and melted icecream all over the kitchen I'd be ticked. If my car mysteriously dies at stop signs I get it fixed.

    Programmers have managed to beat down everyone's expectations to the point where half-assed is pretty good.

    The only way I see to fix it is for consumers to refuse to buy flawed products, or legislators to pass laws allowing redress for flawed products.

    I don't think either is likely.

    I now use OSS for my mission critical work and fix what needs it.

  10. slashdot effect - slashdot mirror by TV-SET · · Score: 5, Informative

    I guess if /. killed the site, it should mirror it :)
    Here is a select-n-middlemousebuttonclick(with my formatting):

    Title: Open source programmers stink at error handling.

    Outline: Commercial programmers stink at it too, but that's not the point. We should be better.

    Summary: Why are we subjected to so many errors? Shouldn't open source be better at this than commercial software? Where are the obsessive-compulsive programmers? Plus, more reader PHP tips. (1,400 words)

    Author: By Nicholas Petreley

    Body: (LinuxWorld) -- Thanks to my very talented readers I've been able to start almost every recent column with a reader's PHP tip.I'm tempted to make it a regular feature, but with my luck the tips would stop rolling in the moment I made it official.So I want you to be aware that this week's tip is not part of any regular practice. It is purely coincidental that PHP tips appear in column after column. Now that I've jinx-proofed the column, I'll share the tip.

    Reader Michael Anderson wrote in with an alternative to using arrays to pass database information to PHP functions. As you may recall from the column Even more stupid PHP tricks, you can retrieve the results of a query into an array and pass that array to a function this way:

    <?PHP
    $result = mysql_query("select name, address from customer where cid=1");
    $CUST = mysql_fetch_array($result);
    do_something($CUST);
    function do_something($CUST) {
    echo $CUST["name"];
    echo $CUST["address"];
    }
    ?>

    Michael pointed out that you can also retrieve the data as an object and reference the fields as the object's properties. Here's the above example rewritten to use objects:

    <?PHP
    $result = mysql_query("select name, address from customer where cid=1");
    $CUST = mysql_fetch_object($result);
    do_something($CUST);
    function do_something($CUST) {
    echo $CUST->name;
    echo $CUST->address;
    }
    ?>
    I can't help but agree with Michael that this is a preferable way to handle the data, but only because it feels more natural to me to point to an object property than to reference an element of an array using the string name or address. It's purely a personal preference, probably stemming from habits I learned using C++.

    Subtitle: OCD programmers unite

    Nothing could be a better segue into the topic I had planned for this week. I'm thinking about starting a group called OLUG, the Obsessive Linux User Group. Although I know enough about psychology to know I don't meet the qualifications of a person with full-fledged OCD (Obsessive-Compulsive Disorder), I confess that I went back and rewrote my PHP code to use objects instead of arrays even there was no technical justification for doing so.

    Certain things bring out the OCD in me. Warning messages, for example. It doesn't matter if my programs seem to work perfectly. If a compiler issues warnings when I compile my code, I feel compelled to fix the code to get rid of the warnings even if I know the code works fine. Likewise, if my program generates warnings or error messages at run time, I feel driven to look for the reasons and get rid of them.

    Now I don't want you to get the wrong impression. My PHP and C++ code stand as testimony to the fact that my programming practices don't even come within light years of perfection. But just because I do not live up to the standards I am about to demand isn't going to stop me from demanding them. It's my right as a columnist. Those who can, do. Those who can't, write columns.

    I'll be blunt. Open source programmers need to stop being so darned lazy about error handling. That obviously doesn't include all open source programmers. You know who you are.

    If you want a demonstration of what I mean, start your favorite GUI-based open source applications from the command line of an X terminal instead of a menu or icon. In most cases this will cause the errors and warnings that the application generates to appear in the terminal window where you started it. (There are exceptions, depending on the application or the script that launches the application.)

    Many of the applications I use on a daily basis generate anywhere from a few warnings or error messages to a few hundred. And I'm not just talking about the debug messages that programmers use to track what a program is doing. I mean warning messages about missing files, missing objects, null pointers, and worse.

    These messages raise several questions. Doesn't anyone who works on these programs check for such things?Why do they go unfixed for so long? Are these problems something that should be of concern to users?Worse, what if these messages appear because of a problem with my installation or configuration, and not because the program hasn't been fully debugged?But even if it is my installation that is broken, shouldn't the application report the errors? Why do I have to start the application from a terminal window to see the messages?

    Subtitle: Getting a handle on errors

    At first I wondered if this was a problem that you would be more likely to find when developers use one graphical toolkit rather than another. But I see both good and bad error handling no matter which tools people use. For example, the GNOME/Gtk word processor AbiWord has been flawless lately. Not a single warning or error message appears in the console. It's possible that AbiWord simply isn't directing output to the console, but I'm guessing that it's simply a well-tested and well-behaved application.

    On the other hand, GNOME itself has been a nightmare for me lately. At one point I got so frustrated that I deleted all the configuration files for all of GNOME and GTK applications in my home directory in disgust, determined never to use them again. When I regained my composure and restarted GNOME with the intent of finding the cause of the problems, the problems had already disappeared. Obviously one or more of my configuration files had been at fault. Which one, I may never know, because GNOME or some portion of it lacked the proper error handling that should have told me.

    In this case I was lucky that the problems were so bad I lost my temper and deleted the configuration files. In most cases, the applications appear to function normally. Aside from being ignorant of any messages unless you start the application from a terminal, there's no way of knowing why the warnings exist, or if they are cause for concern. The warnings could be harmless, or they could mean the application will eventually crash, corrupt data, or worse.

    Subtitle: Examples

    Just so you know I'm not making this up, here are some samples of the console messages that appeared after just a couple of minutes of toying with various programs. By the way, did you know you can actually configure the Linux kernel from the KDE control panel? Bravo to whoever added this feature. Nevertheless, when I activate that portion of the control panel, I get the message:

    QToolBar::QToolBar main window cannot be 0.

    Is there supposed to be a toolbar that isn't displayed as a result? I may never know.

    The e-mail client sylpheed generates this informative message after about a minute of use:

    Sylpheed-CRITICAL **: file main.c: line 346 (get_queued_message_num): assertion `queue != NULL' failed.

    The Ximian Evolution program generates tons of warnings, but most are repetitions. They begin with the following:

    evolution-shell-WARNING **: Cannot activate Evolution component -- OAFIID:GNOME_Evolution_Calendar_ShellComponent
    evolution-shell-WARNING **: e_folder_type_registry_get_icon_for_type() -- Unknown type `calendar'
    evolution-shell-WARNING **: e_folder_type_registry_get_icon_for_type() -- Unknown type `tasks'

    The KDE Aethera client generates even more warning messages than Evolution, but many of them are simply debug messages about what the program is doing. By the way, I finally figured out why I couldn't login to my IMAP server with Aethera. The Aethera client couldn't deal with the asterisks in my password. I could log in after I changed my password, but I still can't see my mail. The program simply leaves the folder empty and says there's nothing to sync. Here are just a few of the countless warnings I get from Aethera, including the sync message.

    Warning: ClientVFS::_fact_ref could not create object vfolderattribute:/Magellan/Mail/default.fattr
    Reason(s): -- object does not exist on server
    Warning: VFolder *_new() was called on an already registered path
    clientvfs: warning: could not create folder [spath:imap_00141, type:imap]
    RemoteMailFolder::sync() : Nothing to sync!

    The spreadsheet Kspread reports these errors all the time, even though what I'm doing has nothing to do with dates or times:

    QTime::setHMS Invalid time -1:-1:-1.000
    QDate::setYMD: Invalid date -001/-1/-1
    The e-mail client Balsa popped up these messages just moments after using it:
    changing server settings for '' ((nil))
    ** WARNING **: Cannot find expected file "gnome-multipart-mixed.png" (spliced with "pixmaps") with no extra prefixes

    The Gnumeric spreadsheet only reported that it couldn't find the help file, as shown below:

    Bonobo-WARNING **: Could not open help topics file NULL for app gnumeric

    Many of these problems could easily have been handled more intelligently. For example, Gnumeric could have asked for the correct path to the help file, perhaps adding an option so a user can decide not to install the help files and disable the message. Unless GTK and Bonobo are a lot more complicated than they should be, it should be easy to create a generic component for handling things like this and then use the component to handle all optional help files as a rule.

    The only conclusion I can draw is that, like most commercial software developers, many open source programmers are just plain lazy about proper error handling. But we're supposed to be better than that, and it's time we started to live up to the reputation. I realize that most of these programs are works in progress. But good error handling is not something that should be left for last. It should be part of the development process. Although I may not practice it myself, I'm not the least bit ashamed to preach it.

    --
    Leonid Mamtchenkov ...i don't need your civil war...
  11. Nick Petreley is a moron... by ryanwright · · Score: 3, Insightful

    Nick Petreley is a moron. Intelligent people don't make blanket statements like "Open source programmers stink at error handling." Next thing you know, he'll be telling you "Closed source programmers use more descriptive variables." How the hell does he know?

    Programming traits - just like preferences for pizza toppings, frequency in bathing and type of pr0n - vary from programmer to programmer. Some implement proper error handling, others could care less. It doesn't matter whether they're working on an open or closed source project. If the open-source programmers all traded places with the closed-source programmers, you'd have the same ratios of proper vs. improper error handling (although the traffic from open-source-programmers.com to goatse.cx would probably spike).

    --
    -Ryan, with the unoriginal sig
  12. Excpetions are a key by ftobin · · Score: 3, Informative

    Exceptions are mandatory for good programming, period. If the language you are using doesn't support exceptions (C, Perl, etc), you are going to have problems. Exceptions make sure that if an error occurs, and you aren't aware of it, your program dies, and doesn't go on its merry way, causing a security hole/unstable software.

    Perl's hack at exceptions using 'die' doesn't cut it; one important thing about implementing exceptions is that your base operations (e.g., opening files, and other system functions) need to raise exceptions when problems occur. If this doesn't happen, you're only going to struggle in vain to implement good, correct code.

    Exceptions are a primary reason I've moved from Perl to Python. Python's exceptions model is standard and clean. Base operations throw exceptions when they occur problems. And my hashes no longer auto-vivify on access, thank goodness. Auto-vivification on hash access are probably one of the principle causes of bad Perl code.

  13. Well... error checking sucks in most languages by TrixX · · Score: 5, Informative

    Most languages make error checking very hard. In particular, C and Perl, two of the most used langs in OSS development, lack good mechanisms for sane error checking. I might explain more, but is better explained at this document.

    btw, the document is part of a library that allows nicer error checking in C, called BetterC. (Yes, this is a plug, I've participated in the development).

    It is modelled in Eiffel's "Design by contract", a set of techniques complemented with language support to make error checking a lot easier and semiautomatic. "Design by contract" has been described as "one of the most useful non-used engineering tool".

  14. There a no such things as errors! by toupsie · · Score: 3, Funny

    The open source community should take the same stance as closed source corporations when it comes to bugs. They are not really bugs but undocumented features!

    --
    Strange women lying in ponds distributing swords is no basis for a system of government.
  15. Re:That all depends on... (slightly OT) by FastT · · Score: 3, Insightful
    Why does it seem like there are as many people in the "community" criticizing open source as there are supporting it?
    Because the reality is that open source software is neither as good as, nor as bad as, the zealots on both sides claim. More than closed source development, open source development is subject to significant variability in the skills of its practitioners. There is some open source software out there that is complete crap, and some that is very good, and far more than either that is merely mediocre.

    And I would be careful about holding up Tomcat as an (open source) triumph. It's had some major bugs all through the 3.x timeframe, and its team includes at least a few daytime profressional "closed source" programmers (there's no correlation between the two, by the way).

    --

    The only certainty is entropy.
  16. Re:What errors? by Cato+the+Elder · · Score: 3, Insightful

    Bullshit.

    Sure, an application error in a Unix derived system is much less likely to bring down the whole system. But that's no excuse for not dealing with error conditions correctly.

    Also "errors" don't just occur from bad code. Running Linux gives you no protection against drive failures, network flakiness, or plain old user error.

    Sure, if the error messages is misleading you can look in the source to find out what actually caused the error. Heck, even if it SEGVs you can compile it with debugging symbols and let GDB tell you what line it's failing at.

    However, the open source community will NEVER attract mainstream users with that kind of attitude. Furthermore, even hardcore geeks should have better things to do fix up crud in supposedly release quality code. Hey, it's one thing if I'm working on something clearly under development, but it's nice to be able to get stable stuff to.

    That said, I don't find open source to be any worse than the commercial stuff I've worked with. With, say, Microsoft stuff, it is just much harder to distinguish bad error handling code from bad code even when no error conditions are encountered.

  17. Blanket statements by Wonko42 · · Score: 3, Funny
    "Intelligent people don't make blanket statements..."

    Wait for it...wait for it...ahhhhh!

  18. Error messages need to be have error numbers by Khalid · · Score: 3, Interesting

    Error messages need to have numbers associated with them. For instance when I have ORA-1241 in oracle, a quick search in groups.google.com will give me a lot of informations about this error, and why it occured and what I can do about. Alas, there is no such thing in most of Open Source software, you just have plain text, so the search is less effective, which search keywords are you going to choose. The situation is even worse for people who used localised versions of the software, as you don't have the English transltation so you can search the English archive in groups.google.com and which count for 80% of the posts.

    What might be cool is a codified error numbering a la Oracle for instance. I would love to have KDE-2345 error, or GNOME-1234 error, or Koffice-567 etc. That would made searchs far more effectives

  19. Maybe people expect too much from software. by Carnage4Life · · Score: 5, Insightful
    One of my friends counters arguments about software being too sloppy with the point that there is practically no other field where a product is designed to be used on such a varying degree of ways and expected to still be robust. For instance, let's use a car as an analogy to your complaints

    The key CD is now scratched which hangs the authenticator forced a quite ungraceful reboot and corrupted my hard drive. (Perhaps a $150 upgrade will help. I'll never know.)

    how is a programmer expected to deal with the CD being scratched? Does your car still work if the transmission is damaged or half the engine has been riddled with bullet holes?

    The last time I used Word a drive filled during a save operation and left me with just a mutilated copy of the original file. (I will not use it again.)


    Again, a very unexpected and unnatural scenario. How well do cars function when they run out of fuel?

    The most amazing part is that this state of affairs doesn't surprise me. If my refrigerator intermittently defrosted and melted icecream all over the kitchen I'd be ticked. If my car mysteriously dies at stop signs I get it fixed.


    But how well would your refrigerator react if you treated it shoddily such as by leaving it outdoors intermuitently or diconnecting and reconnecting the power several times a day?

    Now, I'm not trying to excuse sloppy software development but the fact of the matter is that software is constantly expected to work perfectly under situations completely outside its specifications yet we don't expect this from other items or appliances that we use.
  20. Graceful degradation by PollMastah · · Score: 3, Insightful

    I agree with you that software in general is a lot more complex, and used in a lot more unexpected ways, than something like a car.

    OTOH, there is such a thing called graceful degradation -- that is, if you push the limits of the software, it shouldn't just suddenly barf and die on you, but degrade gracefully. Too much code I've seen (both open and non-open source) assumes too much -- and dies badly when the assumptions fail.

    It is possible and not overly difficult to design software such that it degrades gracefully. Sadly to say, sloppy programming (programmers), deadline pressure, or disinterest in handling error conditions, dominate the world of software. Not many would put in the extra work to make a program degrade gracefully, because it doesn't have very visible effect -- until things start to fail. And too many programmers have this "test only the cases that work" syndrome.

    --

    Poll Mastah

  21. Re:you didn't read the article by egomaniac · · Score: 5, Insightful

    News flash: Technology pundit seemingly insults open source, Slashdot up in arms. None of them actually read the article. Story at 11.

    The article does not say "open source doesn't handle errors as well as closed source". What the article does say is "like most commercial software developers, many open source programmers are just plain lazy about proper error handling. But we're supposed to be better than that...".

    I don't see a problem with this statement. The fact is, most open-source software sucks donkey balls. Petreley is merely saying it's time to put your money where your mouth is -- if you want open source to be considered better than closed source software, it better stop being so danged flaky.

    --
    ZFS: because love is never having to say fsck
  22. Blame it on von Neumann by dha · · Score: 3, Interesting
    Inexperienced and lazy programmers are usually poor at error-handling, and it's easy to lay the blame there -- but at a deep level that misses the point.

    This is not about open-source vs closed-source programs, nor for-fun vs for-money programmers. It's about computational models such as von Neumann machines that, at their deepest roots, assume there will be no errors. That chain-of-falling-dominos style of thinking so permeates conventional programming on conventional machines that it's almost surprising that any code has any error handling at all.

    Of course it's possible to hand-pack error-handling code all around the main functional code in an application.. and of course quality designers and programmers in and out of open-source will do just that.. but viewed honestly we must admit it's an huge drag having to do so, and typically fragile to boot, because the typical underlying computational and programming models provide no help with it. Error-handling code tends to be added on later to applications just as try/catch was added on later to C++.

    Lest we think this sad state must be inevitable, let's recall that other computational models, like many neural network architectures for example, are inherently robust to low level noise and error. Then, that underlying assumption colors and shapes all the `programming' that gets built on top of it. We're to the point where trained neural networks, for all the limitations they currently have, can frequently do the right thing in the face of entirely novel and unanticipated combinations of inputs. Now that's error handling.

    The saddest part is that von Neumann knew his namesake architecture was bogus in just this way, and expressed hope that future architectures would move toward more robust approaches. Fifty years later and pretty much the future's still waiting..

  23. Re:THAT is your answer? by Nindalf · · Score: 3, Insightful

    I'm not familiar with the rocket you describe, but yes, it is a superior error-handling philosophy. Imagine if there was an unchecked error, and the rocket, instead of detonating, landed in civilian housing?

    Why would you assume the rocket was intentionally detonated by the computer? Its computers went down and it went completely out of control. It was only blown up after it broke apart because it happened to go into a spin. There is no upside to this computer failure.

    You call blowing up a commercial satellite launch vehicle non-destructive? If this error was ignored the rocket would not have been affected by it, it was an utterly irrelevant mathematical value overflow error in a program that only did anything before launch.

    This program became destructive because of the "error management." In particular, the error management philosophy that halting a suspicious system is always safer than allowing it to run.

    The point you seem to have missed is that halting the program is often more destructive than ignoring the error. Data loss, control loss, vital services suspended, etc.

    That's like saying there's no reason to assume knowing about a bug is better than just allowing a program to go on its merry way. Uncaught bugs are the cause of 99% of the security holes out there. It's always better to know when there is a problem.

    I'm sure the European Space Agency found it worth every penny of the estimated half-billion dollars lost to find this otherwise irrelevant bug. After all, it's always better to know, whatever the cost of halting the system, right?

  24. Not TOO much by victim · · Score: 3

    ardax makes the point above, but is only scored 1, so...

    First, about your analogies...
    If I wear out the door key to my car, the car should not burst into flames when I try to open the door.

    If my car runs out of fuel, I expect that after rectifying that little problem (and bleeding the injectors) it will be just like new. I do not expect that it will ruin my tires.

    And yes, I have kept my refrigerator outdoors. I kept it on the front porch for two months while the house was being renovated 10 years ago. It worked just fine. It is 30 years old now (Thats 15 PC generations to you young whippersnappers. Moores law says the new fridges should be 1,000,000 times colder now. :-) and I expect it to continue running indefinately. About every 5 years I put a drop of oil on the fan shaft behind the freezer to keep it from squealing. Software has a ways to go.

    About the cubase incident...
    Yes, the CD is scratched. I expect that I won't be able to re-authorize my copy of the software, but don't ruin ALL the data on my hard drive! (Its actually worse. It was my wife's laptop. You do NOT want to have to tell my wife that you just wiped out her laptop.)

    About Word...
    Destroying the on disk copy of a document before successfully writing out the new copy is just plain stupid. Particularly on a Mac where there is a special file system function to swap two files. You write the new copy under a fake name, swap it atomically (even over file severs) with the original file, then delete the fake named file (which now contains the old data). No one gets hurt in error conditions, no one can ever have bad luck timing and read a partially written file off the file sever. Life is good.

    The third case (The PSC) you don't mention, but it isn't really a case of graceful degradation. Its just an irritating bug. Honestly I'd dump the device because of the irritation, but it actually feeds card stock out of its paper tray! A rare quality in a printer.

    I suppose the more explicit point I should have made is that bad things are going to happen to software and it requires effort from the programmer to deal with it. Sometimes just a tiny bit of effort. Cubase performed so badly with a bad CD that I suspect they never tested it. They write about it in their documentation, but they probably didn't test it. The Word example is just careless programming which could have been trivially avoided if the programmers understood the platform's file system calls.

    How about the cost? I estimate that it probably doubles the engineering effort to handle the exception cases to a degree that would cover the incidents I note above. In the calculus of software development the benefits do not out way that cost.

  25. The user should not see errors unless they want to by Codifex+Maximus · · Score: 3, Insightful

    The user should not see errors unless they want to. I agree with sending errors to STDOUT. If the user wants to see the errors then they can either start the app in an XTERM so they can see the errors or switch virt terms over to VT1 and see the STDOUT output.

    Also, I use error reporting to a logfile rather than alarming the user. Most applications should be able to survive the average error. Those applications should prompt the user for proper input - even to the point of placing the cursor in the proper field. Each field should be intelligent and be able to validate it's own input data.

    Those error logs I spoke of should be used by the programmer to debug his/her application - don't alarm the user ok?

    --
    Codifex Maximus ~ In search of... a shorter sig.
  26. Explanation via Analogy by jayed_99 · · Score: 3, Insightful

    The main difference between a great systems administrator and a technically competent sysadmin is paranoia.

    A great sysadmin would cut out their own heart before operating without known good backups. A great sysadmin would chew their own arm off before putting something into production without testing it first in a development environment. A great sysadmin *always* has a backout plan.

    And how does a lowly admin reach this amazing level of greatness, you ask?

    Admins get paranoid after making hideous, terrible mistakes that immediately result in Bad Things Happening.

    I have personally: killed the email server for 2 days...shut down distribution for the world's largest distributor of widgets (every Thursday for 3 weeks)...destroyed all connectivity (voice and data) to the world for 12 hours...hosed the upgrade on a 700GB Oracle database (and our backups were no good). And any semi-experienced administrator will have, at minimum, two stories that are at least this bad (like my friend who shut down trading at Fidelity for a day).

    And for every one one of these instances, I immediately felt the wrath of: my manager, my manager's manager, other people's managers, other people who were affected, stray people wandering by my cube who weren't affected...I also became a part of the "mythical sysadmin storybook"--"I once worked with this guy, and (you won't believe this) he..."

    I submit the hypothesis that: generally, most developers are not subject to this type of immediate and extremely negative form of feedback for their mistakes. Therefore it takes a developer a long time to develop an aversion reflex that conditions them to do "the right thing -- error handling, code documentation" instead of doing "the easy, interesting, enjoyable and sexy thing -- making spiffy algorithms, writing tight code".

    Drifting into another analogy, error handling is like code docmentation. Why do most developers get good (and a little obsessive) about documenting code? Becuase they finally spent some years trying to maintain someone else's tight, sexy code that is virtually incomprehensible.

    So, my point is, developers take a long time to viscerally learn the need for good error handling by repeatedly getting whacked on the head for lack of error handling. It's like evolution in action.

  27. Something to be said for baby-sitting mainframes by dinotrac · · Score: 3, Interesting

    I'm astonished at the poor error-handling in most software these days.

    The biggest problem is not whether your language has exceptions (good error-handling has been done for years without them) or whether programmers are lazy. It's a matter of making it a priority. In fact, laziness caused a lot of us old-timers to take a major interest in error-handling.

    Picture the days before internet access, running mainframe systems, probably with overnight batch cycles.

    Good error handling might mean that you don't get a phone call at 3:00 am.
    If that phone call comes, good error messages might mean that you can diagnose the problem over the phone and walk the operator through recovery.
    In either case, you don't have to drive down to the data center.

    Sleep. Now there's a motivator.