Slashdot Mirror


Database Error Detection and Recovery

CowboyRobot writes "ACM Queue has an interview by Steve Bourne with Bruce Lindsay, responsible for a lot of the SQL and RDBMS we use today, in which they discuss error detection and recovery. My favorite part other than the photos is the definition of Heisenbugs - those problems that disappear only when you explicitly look for them."

19 of 163 comments

  1. Rite of Passage by beldraen · · Score: 4, Interesting

    Heisenberg bugs are a rite of passage in the computer world. They result from the production environment being different from the development environment. For instance, a debugger may initialize all memory in the process space to zero. An errant loop control variable now happens to be set properly, so no error occurs; in the production environment, however, whatever is left over in memory is used, which means the loop wanders off into no man's land and crashes. Always initialize your variables, period! Do it even in languages that initialize them for you, so that you know what value they start with.

    --
    Bel, the mostly sane.. "Of course I can't see anything! I'm standing on the shoulders of idiots." -- Me
    1. Re:Rite of Passage by Jeremi · · Score: 4, Informative
      Don't forget the unintentional stack smashing, which is all too easy to do when you're writing tricky pointer code in C, and damned hard to find, especially when you barely understand the code you just wrote in the first place.


      For stuff like this, a wonderful debugging tool is valgrind -- it takes about 5 minutes to download and install (GPL, Linux/x86), and will find all kinds of memory-usage bugs in your program that you never even knew existed.

      --


      I don't care if it's 90,000 hectares. That lake was not my doing.
    2. Re:Rite of Passage by leoval · · Score: 3, Interesting

      Nothing like putting in extra printf's to get rid of an error. Thankfully the universe is a better place since the invention of Purify (since most Heisenbugs are memory problems --evil pointers--). However, the most challenging heisenbugs are the timing-related ones, especially in networked applications. Those @#%'s are really hard to debug. I remember a project in particular where the heisenbug would only occur in Windows (no flame intended), but it would go away whenever we put an fprintf just before sending any packets through the socket. I think the developer could not figure out the problem in time for the production deadline, so the fprintf to a bogus file is still there (about 6 years after the fact).

    3. Re:Rite of Passage by mikael · · Score: 3, Interesting

      Oh man, I remember those. The worst case was when I was trying to fix a SLIP (the modem protocol) bug in an Ethernet probe. You could step through the code and everything worked correctly. The Token Ring network version worked correctly without failing. But run the system normally, and it failed to connect. The quickest solution was to compare the two code segments of each system; the only difference was a 15 millisecond wait. Once replaced, the system worked correctly (this was a requirement specified in the back page appendix of the modem chip specification).

      --
      Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
  2. this is to test a bug in slashcode by EnVisiCrypt · · Score: 3, Funny

    ignore this response.

    --


    *everything* is Orwellian to cats.
  3. Heisenbugs by base_chakra · · Score: 4, Funny

    "Heisenbug as originally defined--and I was there when it happened--are bugs in which clearly the behavior of the system is incorrect, and when you try to look to see why it's incorrect, the problem goes away."

    This is a really cool article, but it was especially fun to see the heisenbug mention. Years ago, some fellow CS people and myself conjectured a similar phenomenon that seemed to manifest once in a while, in which a computer malfunction goes away after one "proves" that there's no cause for the error to exist.

    Here's a list of heisenbug anecdotes, but note that some of these submissions aren't strictly heisenbugs.

    1. Re:Heisenbugs by JaredOfEuropa · · Score: 4, Funny

      I remember complaints about a Windows NT server slowing down to a crawl even when there weren't that many people using it. So the SysOp would fire up a few performance monitors and keep an eye on the thing. Sure enough: no slowdowns, no performance issues, normal operation. But every time the guy left, the system would slow down again after 5 minutes or so. For a few days this had us stumped.

      Then someone figured out that the system had the 'pipes' screensaver on that came with NT3.51. Of course, as soon as we started to diagnose the machine, the screensaver would disappear. And yes... the screensaver turned out to be the culprit, sucking all the system resources away. We removed it and all was well.

      Does anyone know who coined the term 'heisenbug' by the way?

      --
      If construction was anything like programming, an incorrectly fitted lock would bring down the entire building...
  4. The Heart Monitor Case... by Anonymous Coward · · Score: 4, Funny

    BL: In the heart monitor case, you better keep the heart going, whereas in the Microsoft Word case, you can just give them a blue screen and everybody is used to that.

    SB: But also in the heart monitor case, it's hard to ask users if they want to keep the heart going because the answer is pretty obvious, whereas in the Word case, you can ask the user in some cases what to do about it.


    New Microsoft Pace - Heart Monitor and Pacemaker

    STOP: 0x0000000A (0x0000015a, 0x0000001c, 0x00000000, 0x80116bf4)
    IRQL_NOT_LESS_OR_EQUAL - Beat.exe
    Please hold your breath while a dump file is created...

  5. Picture by bonzoesc · · Score: 4, Funny

    That picture is really something. I didn't know Gandalf wrote bsh.

  6. This is why MySQL ignites flamewars by Anonymous Coward · · Score: 5, Insightful

    A good design principle is: either do what you're told to do or tell us you didn't do it and why, but don't do something completely different.

    Exactly. Compare and contrast with MySQL's behaviour.

    • NULL inserted into a NOT NULL column silently alters the data to fit.
    • VARCHAR values have trailing whitespace silently removed without asking.
    • Dividing by zero is not an error.
    • Inserting a value into a column that violates its constraints doesn't result in an error; MySQL guesses at the "correct" value instead. For example, limiting an integer column to 4 digits, and attempting to insert 99999 will result in 9999 being inserted without any error.
    • If MySQL finds that it can't create certain table types, it simply ignores referential integrity.

    That's why there are loads of people who point out that you can't trust MySQL for important data, or that it isn't a "real" database. A real database tells you when it fails, which is something that is necessary for trusting it with data integrity.
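
    The difference between the two behaviours can be sketched in a few lines of Python. This is only an illustration of the principle, not MySQL's actual code: the function names are made up, and the 4-digit limit is borrowed from the example above.

```python
# Hypothetical sketch: two ways a store could treat a value that
# violates a "4 digits maximum" constraint. Neither function is real
# database code; the names are invented for illustration.

MAX_VALUE = 9999  # largest integer that fits in 4 digits


def store_lenient(value: int) -> int:
    """Silently clamp the value so it fits (the behaviour criticised above)."""
    return min(value, MAX_VALUE)


def store_strict(value: int) -> int:
    """Store exactly what was asked for, or refuse and say why."""
    if value > MAX_VALUE:
        raise ValueError(f"value {value} exceeds column limit {MAX_VALUE}")
    return value
```

    With the lenient version the caller never learns that 99999 became 9999; with the strict version the caller gets an error it can act on.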

    The key point here is that if you go to sea with only one clock, you can't tell whether it's telling you the right time.

    Ahh... but a man with one clock always knows the time - but a man with two is never quite sure :).

    1. Re:This is why MySQL ignites flamewars by Anonymous Coward · · Score: 4, Insightful

      Everything noted in the parent is well documented functionality in MySQL

      Well documented, perhaps, but nevertheless utterly wrong and often in violation of the SQL specifications.

      Slashdot doesn't need your redundant and off-topic flames.

      Try reading the article. I was pointing out that MySQL's behaviour goes 100% in opposition to what the article calls a "good design principle". How on earth is that off-topic?

  7. Good god, that picture! by Anonymous Coward · · Score: 4, Funny

    The guy looks like he's covered in coke dust.

  8. Make error message meaningful! by martyb · · Score: 4, Insightful

    One of the things that is addressed to some extent in the article is the need to make error messages meaningful! There is nothing more frustrating to me than to encounter an error message like "syntax error."

    At a minimum, an error message should have a unique ID identifying where in the code the message comes from, what was expected, what was actually found, and the context where it was found.

    EXAMPLE:

    for (i=1;i<=10;i++) {printf "%d\n', i}
    Which would you prefer:
    1. Syntax error in line 1.
    2. ERROR [ID=WXY1234] found "'" where expected """ in statement: "{printf "%d\n', i}" on line: 1.

    In my experience, meaningful error messages save more debugging time than it takes to put them in.
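
    A message of that shape is cheap to produce once the four pieces are passed around together. Here is a minimal sketch in Python; the function name and parameters are assumptions made for illustration, not taken from any real compiler.

```python
# Hypothetical helper that builds an error message carrying the four
# pieces listed above: unique ID, what was expected, what was found,
# and the context (statement and line number).

def format_error(error_id: str, expected: str, found: str,
                 statement: str, line: int) -> str:
    return (f'ERROR [ID={error_id}] found "{found}" where expected '
            f'"{expected}" in statement: "{statement}" on line: {line}.')


if __name__ == "__main__":
    print(format_error("WXY1234", '"', "'", "{printf \"%d\\n', i}", 1))
```

    The unique ID is what lets a maintainer grep the source and land on the exact check that fired.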

    1. Re:Make error message meaningful! by Anonymous Coward · · Score: 4, Insightful
      I agree with your sentiment, but I think that this particular example can be used to describe why error messages are exactly the way they are. The compiler has no way of knowing that you DIDN'T want a single quote, comma, space, letter i, and closed curly brace to be part of your string literal -- they're all valid characters for inside a string literal, after all. It has no idea that you typed a single quote in error instead of a double quote -- Maybe you want the output to look like:
      1
      ',i}2
      ',i}3
      ...
      and your only mistake was forgetting a final ",i); It is only when it gets to something that is clearly wrong (like the NEXT double quote in the file, which is followed by something that doesn't look like the proper token -- or maybe the end of the file itself) that it knows something was wrong. It would take a lot of guessing to scan through the mess that looks like a string literal and arbitrarily decide that it must have been the single quote (also used as an apostrophe) where you went wrong. I think the best thing it could have told you was that it found an error immediately after parsing the string literal that starts on line 1, show you the line, and point to the character that starts the literal. Anything beyond that is mere speculation.

      I hand crafted a (simple) C compiler when an undergrad, and figuring out where the stream of good tokens turns to mush is very hard. Often by the time you realize there's a problem, you already missed the real problem.

      I agree you should be as explicit and precise as you can in telling the user, but there are so many ways to screw things up, and they look so much like unusual-but-legal syntax that it's probably better to tell the user / developer what you actually do know, rather than guess about what might have been wrong.

      Now, on the other hand, if your statement was

      int y = ;
      the compiler should probably be able to tell that the equals operator needs an operand of some kind on the right, and there was none. It ought to tell you immediately that the problem was a missing right hand operand for the equals operator, and it should be able to tell you the exact position of the equals that is missing the operand. Just spitting out "syntax error" in a case like that is a little weak.
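
      As a toy illustration of reporting only what you actually know, here is a sketch in Python that flags an '=' with nothing on its right and gives its exact column. It is not a real parser: it scans characters, ignores string literals and comments, and the function name is made up.

```python
# Hypothetical sketch: locate an '=' that has no right-hand operand.
# Works on a single line of text; a real compiler would do this on
# tokens inside its parser.
from typing import Optional


def find_missing_operand(line: str) -> Optional[int]:
    """Return the 0-based column of an '=' followed only by ';' or
    end of line, or None if every '=' has something after it."""
    for col, ch in enumerate(line):
        if ch == "=":
            rest = line[col + 1:].lstrip()
            if rest == "" or rest.startswith(";"):
                return col
    return None


def report(line: str) -> str:
    col = find_missing_operand(line)
    if col is None:
        return "ok"
    return f"missing right-hand operand for '=' at column {col}"
```

      For the statement above, the report names the operator and its position instead of a bare "syntax error".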
  9. Exception Handling by Joe+U · · Score: 3, Interesting

    Not quite on topic, but, I once tried writing code in SQL (in this case for ColdFusion) by using stored procedures and exception handling.

    What a nightmare.

    Many people code unique inserts like this.

    Check for duplicate record.
    if not found, then insert.
    else, prompt user.

    Using exception handling, you code like this.

    insert.
    if error thrown, prompt user.

    One less query, lots less code.
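
    The two styles look like this in a minimal Python sketch. SQLite (from the standard library) stands in for the SQL server here, and the table and function names are made up; the original code was ColdFusion with stored procedures.

```python
# Sketch of "check then insert" versus "insert and handle the error".
# Assumes a table whose primary key enforces uniqueness.
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT PRIMARY KEY)")
    return conn


def add_user_check_first(conn: sqlite3.Connection, name: str) -> bool:
    """Two statements: look for a duplicate, then insert."""
    row = conn.execute(
        "SELECT 1 FROM users WHERE name = ?", (name,)
    ).fetchone()
    if row is not None:
        return False  # duplicate: caller prompts the user
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    return True


def add_user_insert_first(conn: sqlite3.Connection, name: str) -> bool:
    """One statement: insert and let the constraint report duplicates."""
    try:
        conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate: caller prompts the user
```

    The second version only works if the calling language lets you catch the constraint violation, which is exactly what went wrong in the case described next.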

    One problem, the web application language treated all db errors as fatal. When asked, I was told this was by design.

    Thinking about it, I feel that Macromedia didn't want me to code efficiently. You don't sell extra ColdFusion servers if you can offload all your data logic to the SQL server. (Where it belongs)

  10. Java does exactly what Bruce wants by chaves · · Score: 4, Insightful
    "That's one of the real problems in today's programming language architecture for exception handling. Each component should list the exceptions that were raised: typically if I call you and you say that you can raise A, B, and C, but you can call Joe who can raise D, E, and F, and you ignore D, E, and F, then I'm suddenly faced with D, E, and F at my level and there's nothing in your interface that said D, E, and F errors were things you caused. That seems to be ubiquitous in the programming and the language facilities. You are never required to say these are all the errors that might escape from a call to me. And that's because you're allowed to ignore errors."

    I bet he didn't look into Java. Java (at least) allows and enforces that. A method will only throw an exception if it declares that it does. A caller is forced either to provide appropriate handlers or to declare that it throws the exceptions not handled at its level. If a method can throw A, B or C but gets D during its execution, it has to map D in some way to A, B or C (or not throw an exception at all).

    Of course, I am talking here about checked exceptions. Unchecked exceptions are supposed to represent *bugs*, and nobody should be trying to capture those.
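
    The mapping step can be sketched in Python, with a big caveat: Python has no checked exceptions, so nothing here is enforced by a compiler the way it is in Java. The sketch only shows a component translating a lower-level error D into the error A it documents. All class and function names are hypothetical.

```python
# Sketch of exception translation: "you" documents ErrorA only, but
# calls "joe", who can raise ErrorD. Instead of letting ErrorD escape,
# "you" maps it onto ErrorA and keeps the original as the cause.

class ErrorA(Exception):
    """An error the public interface documents."""


class ErrorD(Exception):
    """A lower-level error raised by a helper."""


def joe() -> None:
    raise ErrorD("disk unavailable")


def you() -> None:
    """Documented to raise only ErrorA."""
    try:
        joe()
    except ErrorD as exc:
        # Translate to the documented error; 'from exc' preserves the
        # original exception for diagnosis.
        raise ErrorA("could not complete request") from exc
```

    A caller of you() now only ever sees ErrorA, and the original ErrorD is still reachable through the exception's __cause__ attribute.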

    The sad thing is that even seasoned Java programmers do not understand how to write code w.r.t. exception handling. And beginners are usually turned off by the verbosity required by exception handling, so it is usual to see code where people capture (because they are forced by the language) and ignore exceptions (because they are too lazy and/or stupid to understand the consequences).

    1. Re:Java does exactly what Bruce wants by koreth · · Score: 5, Insightful
      it is usual to see code where people capture (because they are forced by the language) and ignore exceptions (because they are too lazy and/or stupid to understand the consequences).
      Ignoring exceptions completely is almost always a bad idea (though what do you do to handle, say, the InterruptedException that can be thrown by Thread.sleep(), or a CloneNotSupportedException from one of your own classes that you know is cloneable?) But there is some legitimate difference of opinion about whether Java's checked exceptions were a good idea or not.

      In my Java code I'm pretty paranoid about catching exceptions and handling them in as intelligent a way as I can, and even so I've run into plenty of situations where there's really no good way to recover from an underlying error and I end up just repackaging the exception into a higher-semantic-level one and tossing it upstream, where the upstream code does the same thing, all the way back out to the UI code, which displays an error message. At which point all I've achieved is cluttering up the intermediate layers of code with useless exception handlers when I could have gotten exactly the same effect by just catching a superclass exception in the UI code and displaying the same error message. (In addition to catching any specific exceptions that would cause a different result, of course.)
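
      The alternative described above, with no handlers in the middle layers and one superclass catch at the UI, can be sketched as follows. This is Python rather than Java, so nothing is checked at compile time, and every name in it is made up for illustration.

```python
# Sketch: intermediate layers let exceptions pass through untouched;
# the UI layer handles the special cases first, then falls back to
# the common superclass for everything else.

class AppError(Exception):
    """Base class for errors the application raises on purpose."""


class StorageError(AppError):
    pass


class NotLoggedIn(AppError):
    pass


DOCUMENTS = {"readme.txt": "hello"}


def load_document(name: str) -> str:
    if name.startswith("private/"):
        raise NotLoggedIn(name)
    if name not in DOCUMENTS:
        raise StorageError(f"cannot read {name}")
    return DOCUMENTS[name]


def open_document(name: str) -> str:
    # Intermediate layer: no handler, errors simply propagate.
    return load_document(name)


def ui_open(name: str) -> str:
    try:
        return open_document(name)
    except NotLoggedIn:
        return "Please log in first."  # specific case, different result
    except AppError as exc:
        return f"Error: {exc}"  # generic fallback for everything else
```

      The order of the except clauses matters: the specific exception has to come before the superclass, or it would never be reached.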

      Most likely anyone who's written a Java app of any appreciable size has run into exactly the same thing. In theory, and in small sample snippets of code, checked exceptions seem great. In practice, even some experienced Java gurus find them more hassle than they're worth. I'm quite certain that over the years I've spent far more time writing code to handle checked exceptions than they've saved me in debugging or diagnosis time. That to me is not the sign of a helpful language feature.

  11. Re:Heisen-whats? by fforw · · Score: 3, Funny
    Heisenbugs. Are there Heisen-features as well?
    yes. they exist but disappear once you try to use them. A direct result of featuritis.
    --
    while (!asleep()) sheep++
  12. Re:Has language in CS matured? by zuzulo · · Score: 3, Interesting

    Strangely, this is a phenomenon I have noticed with experts in many different fields who *really* understand what they are doing. They really do have an internal model of what is happening that essentially boils down to this sort of simplicity.

    In fact, the correlation is so strong that I am suspicious of folks who *cannot* boil an arbitrarily complex interaction into an easily understood metaphor.

    Jargon in no way denotes true understanding.

    --
    "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety."