Slashdot Mirror


Database Error Detection and Recovery

CowboyRobot writes "ACM Queue has an interview by Steve Bourne with Bruce Lindsay, responsible for a lot of the SQL and RDBMS we use today, in which they discuss error detection and recovery. My favorite part other than the photos is the definition of Heisenbugs - those problems that disappear only when you explicitly look for them."

29 of 163 comments

  1. Rite of Passage by beldraen · · Score: 4, Interesting

    Heisenbugs are a rite of passage in the computer world. They result from the production environment being different from the development environment. For instance, a debugger may initialize all memory in the process space to zero. An errant loop control variable now happens to be set properly, so no error occurs; however, in the production environment, whatever is left over in memory is used, which means the loop wanders off into no man's land and crashes. Always initialize your variables, period! Do it even in languages that initialize them automatically, so that you are aware of what they are initialized to.

    --
    Bel, the mostly sane.. "Of course I can't see anything! I'm standing on the shoulders of idiots." -- Me
    1. Re:Rite of Passage by quamaretto · · Score: 2, Interesting

      I have been dealing recently with a Heisenbug in Internet Explorer while trying to design a web page with floats. (Web designers, weep with me.) The trouble is that a certain page renders wrong (or what I think is wrong) the first time you look at it after opening Internet Explorer, and then displays correctly every time you look at it afterward, even with 'refresh'.

      And yes, I really do have to design it for Internet Explorer.

      Also, early on in the development of the page, I was encountering a similar situation with VS.Net 2004 where, when it "corrected the formatting" of my HTML (which happens when you switch to design view), it would introduce the same display glitch into the page.

      At some point in the cycle of figuring all of that out, I even came across variations of the page where IE, on every individual view, would randomly choose one of two ways to display the same page.

      (For reference, I was trying to float a div tag to the left of another div as a menu. The trouble is, sometimes the div would pop entirely outside the containing div to the left. I'm still not sure what causes it, after fixing it some dozen times.)

      --
      *is run over by rotten tomatoes*
    2. Re:Rite of Passage by thepoch · · Score: 2, Interesting

      This probably resulted in the quote: "If you can't repeat it, it's not a bug"

      End-User : Look at this bug!
      Developer: What bug?
      End-User : This bug!
      Developer: Oh? Try it out on my development computer.
      End-User :
      Developer: See, it's not a bug. Must be your Windows or something. Have your IT department support it.
      Manager : All right, no bugs, pay up.

      If only development was like this little play I wrote.

    3. Re:Rite of Passage by Jeremi · · Score: 4, Informative
      Don't forget the unintentional stack smashing, which is all too easy to do when you're writing tricky pointer code in C, and damned hard to find, especially when you barely understand the code you just wrote in the first place.


      For stuff like this, a wonderful debugging tool is valgrind -- it takes about 5 minutes to download and install (GPL, Linux/x86), and will find all kinds of memory-usage bugs in your program that you never even knew existed.

      --
      I don't care if it's 90,000 hectares. That lake was not my doing.
    4. Re:Rite of Passage by leoval · · Score: 3, Interesting

      Nothing like putting in extra printf's to get rid of an error. Thankfully the universe is a better place since the invention of Purify (since most Heisenbugs are memory problems, i.e. evil pointers). However, the most challenging heisenbugs are the timing-related ones, especially in networked applications. Those @#%'s are really hard to debug. I remember a project in particular where the heisenbug would only occur in Windows (no flame intended), but it would go away whenever we put an fprintf just before sending any packets through the socket. I think that the developer could not figure out the problem in time for the production deadline, so the fprintf to a bogus file is still there (about 6 years after the fact).

    5. Re:Rite of Passage by mikael · · Score: 3, Interesting

      Oh man, I remember those. The worst case was when I was trying to fix a SLIP (the modem protocol) bug in an Ethernet probe. You could step through the code and everything worked correctly. The Token Ring network version worked correctly without failing. But run the system normally, and it failed to connect. The quickest solution was to compare the two code segments of each system; the only difference was a 15 millisecond wait. Once replaced, the system worked correctly. (This was a requirement specified in the back-page appendix of the modem chip specification.)

      --
      Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
    6. Re:Rite of Passage by bsgenerator · · Score: 2, Interesting

      I once worked at a company on a program in qbasic. The small procedure I was working on was OK when used in a test program, but when it was used in the 'real' program it barfed. After I looked for hours for non-existent bugs in only 10 lines of code, questioning my own sanity in the process, I asked for help. It turned out that the compiler we ran afterwards was what messed up, and it was 'corrected' by a dummy if-statement in the right place.

      if (1=0)
      print("The end of the world...");

      Of course, another senior programmer saw the statement and removed it just before it was shipped to the customer...

      Nowadays, I use JUnit to test portions of code, but it won't save me from this kind of problem.

    7. Re:Rite of Passage by Anonymous Coward · · Score: 2, Insightful
      Of course, another senior programmer saw the statement and removed it just before it was shipped to the customer...

      As well he should have. If you need something like that to stay, you need a comment explaining its purpose. "The following statement is never executed but necessary to work around a compiler bug" would be helpful. You could even describe the bug so they could check if it's still necessary once the next version of the compiler is released.

  2. this is to test a bug in slashcode by EnVisiCrypt · · Score: 3, Funny

    ignore this response.

    --
    *everything* is Orwellian to cats.
  3. Heisenbugs by base_chakra · · Score: 4, Funny

    "Heisenbug as originally defined--and I was there when it happened--are bugs in which clearly the behavior of the system is incorrect, and when you try to look to see why it's incorrect, the problem goes away."

    This is a really cool article, but it was especially fun to see the heisenbug mention. Years ago, some fellow CS people and I conjectured a similar phenomenon that seemed to manifest once in a while, in which a computer malfunction goes away after one "proves" that there's no cause for the error to exist.

    Here's a list of heisenbug anecdotes, but note that some of these submissions aren't strictly heisenbugs.

    1. Re:Heisenbugs by JaredOfEuropa · · Score: 4, Funny

      I remember complaints about a Windows NT server slowing down to a crawl even when there weren't that many people using it. So the SysOp would fire up a few performance monitors and keep an eye on the thing. Sure enough: no slowdowns, no performance issues, normal operation. But every time the guy left, the system would slow down again after 5 minutes or so. For a few days this had us stumped.

      Then someone figured out that the system had the 'pipes' screensaver on that came with NT3.51. Of course, as soon as we started to diagnose the machine, the screensaver would disappear. And yes... the screensaver turned out to be the culprit, sucking all the system resources away. We removed it and all was well.

      Does anyone know who coined the term 'heisenbug' by the way?

      --
      If construction was anything like programming, an incorrectly fitted lock would bring down the entire building...
  4. The Heart Monitor Case... by Anonymous Coward · · Score: 4, Funny

    BL: In the heart monitor case, you better keep the heart going, whereas in the Microsoft Word case, you can just give them a blue screen and everybody is used to that.

    SB: But also in the heart monitor case, it's hard to ask users if they want to keep the heart going because the answer is pretty obvious, whereas in the Word case, you can ask the user in some cases what to do about it.


    New Microsoft Pace - Heart Monitor and Pacemaker

    STOP: 0x0000000A (0x0000015a, 0x0000001c, 0x00000000, 0x80116bf4)
    IRQL_NOT_LESS_OR_EQUAL - Beat.exe
    Please hold your breath while a dump file is created...

  5. Picture by bonzoesc · · Score: 4, Funny

    That picture is really something. I didn't know Gandalf wrote bsh.

    1. Re:Picture by SnowZero · · Score: 2, Informative

      The picture is Lindsay, not Bourne. See here for an earlier picture. I admit it's a bit disturbing, but some CS people don't want to "waste" time shaving and getting their hair cut. I also need a haircut and a shave, but not nearly as badly as he does. This picture has helped motivate me :)

  6. This is why MySQL ignites flamewars by Anonymous Coward · · Score: 5, Insightful

    A good design principle is: either do what you're told to do or tell us you didn't do it and why, but don't do something completely different.

    Exactly. Compare and contrast with MySQL's behaviour.

    • NULL inserted into a NOT NULL column silently alters the data to fit.
    • VARCHAR values have trailing whitespace silently removed without asking.
    • Dividing by zero is not an error.
    • Inserting a value into a column that violates its constraints doesn't result in an error; MySQL guesses at the "correct" value instead. For example, limiting an integer column to 4 digits, and attempting to insert 99999 will result in 9999 being inserted without any error.
    • If MySQL finds that it can't create certain table types, it simply ignores referential integrity.

    That's why there are loads of people who point out that you can't trust MySQL for important data, or that it isn't a "real" database. A real database tells you when it fails, which is something that is necessary for trusting it with data integrity.
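
    The "do it or tell us you didn't" principle can be sketched in a few lines of Python. This is only an illustration, not MySQL's actual code: store_clamped and store_checked are hypothetical names, and the 4-digit limit mirrors the integer example in the list above.

```python
def store_clamped(value, max_digits=4):
    """Silently alter out-of-range input to fit (the behaviour criticised above)."""
    limit = 10 ** max_digits - 1
    return max(-limit, min(limit, value))


def store_checked(value, max_digits=4):
    """Either store exactly what was asked for, or say why it was refused."""
    limit = 10 ** max_digits - 1
    if abs(value) > limit:
        raise ValueError(f"{value} does not fit in {max_digits} digits")
    return value


# The clamping version quietly turns 99999 into something else.
clamped = store_clamped(99999)

# The checked version refuses and reports the failure to the caller.
try:
    store_checked(99999)
    rejected = False
except ValueError:
    rejected = True
```

    With the second version the caller at least finds out that the value was never stored as given.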

    The key point here is that if you go to sea with only one clock, you can't tell whether it's telling you the right time.

    Ahh... but a man with one clock always knows the time - but a man with two is never quite sure :).

    1. Re:This is why MySQL ignites flamewars by Anonymous Coward · · Score: 4, Insightful

      Everything noted in the parent is well documented functionality in MySQL

      Well documented, perhaps, but nevertheless utterly wrong and often in violation of the SQL specifications.

      Slashdot doesn't need your redundant and off-topic flames.

      Try reading the article. I was pointing out that MySQL's behaviour goes 100% in opposition to what the article calls a "good design principle". How on earth is that off-topic?

  7. Good god, that picture! by Anonymous Coward · · Score: 4, Funny

    The guy looks like he's covered in coke dust.

  8. Heisenpages by Anonymous Coward · · Score: 2, Funny

    Web pages that disappear when you try to look at them....

  9. Make error message meaningful! by martyb · · Score: 4, Insightful

    One of the things that is addressed to some extent in the article is the need to make error messages meaningful! There is nothing more frustrating to me than to encounter an error message like "syntax error."

    At a minimum, an error message should have a Unique ID of where in the code this message is coming from, what was expected, what was actually found, and the context where it was found.

    EXAMPLE:

    for (i=1;i<=10;i++) {printf "%d\n', i}
    Which would you prefer:
    1. Syntax error in line 1.
    2. ERROR [ID=WXY1234] found "'" where expected """ in statement: "{printf "%d\n', i}" on line: 1.

    In my experience, meaningful error messages save more debugging time than it takes to put them in.
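
    A rough sketch of what carrying those four pieces of information might look like, in Python. The Diagnostic class and its field names are made up for illustration; the ID and the statement are taken from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagnostic:
    """One error report: where it came from, what was expected, what was found."""
    error_id: str   # unique ID of the place in the code that raised the message
    expected: str   # what the parser was expecting
    found: str      # what it actually saw
    statement: str  # the offending source text, for context
    line: int       # 1-based line number

    def render(self) -> str:
        return (
            f'ERROR [ID={self.error_id}] found "{self.found}" '
            f'where expected "{self.expected}" '
            f'in statement: "{self.statement}" on line: {self.line}.'
        )


diag = Diagnostic(
    error_id="WXY1234",
    expected='"',
    found="'",
    statement="{printf \"%d\\n', i}",
    line=1,
)
message = diag.render()
print(message)
```

    Keeping the fields separate until render() is called also makes it possible to search logs by ID rather than by message text.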

    1. Re:Make error message meaningful! by Anonymous Coward · · Score: 4, Insightful
      I agree with your sentiment, but I think that this particular example can be used to describe why error messages are exactly the way they are. The compiler has no way of knowing that you DIDN'T want a single quote, comma, space, letter i, and closed curly brace to be part of your string literal -- they're all valid characters for inside a string literal, after all. It has no idea that you typed a single quote in error instead of a double quote -- Maybe you want the output to look like:
      1
      ',i}2
      ',i}3
      ...
      and your only mistake was forgetting a final ",i); It is only when it gets to something that is clearly wrong (like the NEXT double quote in the file, which is followed by something that doesn't look like the proper token -- or maybe the end of the file itself) that it knows something was wrong. It would take a lot of guessing to scan through the mess that looks like a string literal and arbitrarily decide that it must have been the single quote (also used as an apostrophe) where you went wrong. I think the best thing it could have told you was that it found an error immediately after parsing the string literal that starts on line 1, show you the line, and point to the character that starts the literal. Anything beyond that is mere speculation.

      I hand crafted a (simple) C compiler when an undergrad, and figuring out where the stream of good tokens turns to mush is very hard. Often by the time you realize there's a problem, you already missed the real problem.

      I agree you should be as explicit and precise as you can in telling the user, but there are so many ways to screw things up, and they look so much like unusual-but-legal syntax that it's probably better to tell the user / developer what you actually do know, rather than guess about what might have been wrong.

      Now, on the other hand, if your statement was

      int y = ;
      the compiler should probably be able to tell that the equals operator needs an operand of some kind on the right, and there was none. It ought to tell you immediately that the problem was a missing right hand operand for the equals operator, and it should be able to tell you the exact position of the equals that is missing the operand. Just spitting out "syntax error" in a case like that is a little weak.
  10. Re:Too much slashdot. by Anonymous Coward · · Score: 2, Insightful

    And /. needs better thread handling too. When there are too many posts, the thread is split into 72 pages. Pages 1, 2, and 3 are the same post reappearing over and over, then page 4 skips a couple of messages. You have to manually change the startat= parameter in the URL to see those missing posts. This is nuts. I mean, how difficult is it to code message threading? It's not rocket science. This makes /. look like ass when they are complaining about Microsoft bugs.

  11. Exception Handling by Joe+U · · Score: 3, Interesting

    Not quite on topic, but, I once tried writing code in SQL (in this case for ColdFusion) by using stored procedures and exception handling.

    What a nightmare.

    Many people code unique inserts like this.

    Check for duplicate record.
    if not found, then insert.
    else, prompt user.

    Using exception handling, you code like this.

    insert.
    if error thrown, prompt user.

    One less query, lots less code.

    One problem, the web application language treated all db errors as fatal. When asked, I was told this was by design.

    Thinking about it, I feel that Macromedia didn't want me to code efficiently. You don't sell extra ColdFusion servers if you can offload all your data logic to the SQL server. (Where it belongs)
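
    For comparison, the "insert, and prompt only if an error is thrown" pattern looks roughly like this in Python with the standard sqlite3 module (not ColdFusion or stored procedures). The users table and the add_user helper are assumptions made up for the sketch.

```python
import sqlite3


def add_user(conn, name):
    """Try the insert first; report a duplicate only if the database rejects it."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        return True
    except sqlite3.IntegrityError:
        # The PRIMARY KEY constraint caught the duplicate; no separate lookup query.
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT PRIMARY KEY)")

first = add_user(conn, "joe")    # new row
second = add_user(conn, "joe")   # duplicate, rejected by the constraint
count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

    Letting the constraint do the check also avoids the window between "check for duplicate" and "insert" in which another session could add the same row.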

  12. Java does exactly what Bruce wants by chaves · · Score: 4, Insightful
    "That's one of the real problems in today's programming language architecture for exception handling. Each component should list the exceptions that were raised: typically if I call you and you say that you can raise A, B, and C, but you can call Joe who can raise D, E, and F, and you ignore D, E, and F, then I'm suddenly faced with D, E, and F at my level and there's nothing in your interface that said D, E, and F errors were things you caused. That seems to be ubiquitous in the programming and the language facilities. You are never required to say these are all the errors that might escape from a call to me. And that's because you're allowed to ignore errors."

    I bet he didn't look into Java. Java (at least) allows and enforces that. A method will only throw an exception if it declares that it does. A caller is forced to provide appropriate handlers or to declare that it throws the exceptions not handled at its level. If a method can throw A, B or C but gets D during its execution, it has to in some way map D to either A, B or C (or not throw an exception at all).

    Of course, I am talking here about checked exceptions. Unchecked exceptions are supposed to represent *bugs*, and nobody should be trying to capture those.

    The sad thing is that even seasoned Java programmers do not understand how to write code w.r.t. exception handling. And beginners are usually turned off by the verbosity required by exception handling, so it is usual to see code where people capture (because they are forced by the language) and ignore exceptions (because they are too lazy and/or stupid to understand the consequences).

    1. Re:Java does exactly what Bruce wants by koreth · · Score: 5, Insightful
      it is usual to see code where people capture (because they are forced by the language) and ignore exceptions (because they are too lazy and/or stupid to understand the consequences).
      Ignoring exceptions completely is almost always a bad idea (though what do you do to handle, say, the InterruptedException that can be thrown by Thread.sleep(), or a CloneNotSupportedException from one of your own classes that you know is cloneable?) But there is some legitimate difference of opinion about whether Java's checked exceptions were a good idea or not.

      In my Java code I'm pretty paranoid about catching exceptions and handling them in as intelligent a way as I can, and even so I've run into plenty of situations where there's really no good way to recover from an underlying error and I end up just repackaging the exception into a higher-semantic-level one and tossing it upstream, where the upstream code does the same thing, all the way back out to the UI code, which displays an error message. At which point all I've achieved is cluttering up the intermediate layers of code with useless exception handlers when I could have gotten exactly the same effect by just catching a superclass exception in the UI code and displaying the same error message. (In addition to catching any specific exceptions that would cause a different result, of course.)

      Most likely anyone who's written a Java app of any appreciable size has run into exactly the same thing. In theory, and in small sample snippets of code, checked exceptions seem great. In practice, even some experienced Java gurus find them more hassle than they're worth. I'm quite certain that over the years I've spent far more time writing code to handle checked exceptions than they've saved me in debugging or diagnosis time. That to me is not the sign of a helpful language feature.
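
      The repackaging pattern described above, sketched in Python rather than Java (Python has no checked exceptions, so nothing forces the intermediate handler). AppError, StorageError, read_config and ui_main are hypothetical names for the sketch.

```python
class AppError(Exception):
    """Hypothetical base class that the UI layer catches."""


class StorageError(AppError):
    """Hypothetical higher-level error wrapping a low-level I/O failure."""


def read_config(path):
    """Intermediate layer: repackage the low-level error and toss it upstream."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except OSError as exc:
        # 'from exc' keeps the original exception reachable as __cause__.
        raise StorageError(f"cannot load settings from {path}") from exc


def ui_main(path):
    """UI layer: catch the superclass once and turn it into a message."""
    try:
        read_config(path)
        return "ok"
    except AppError as exc:
        cause = type(exc.__cause__).__name__
        return f"Error: {exc} (caused by {cause})"


result = ui_main("/nonexistent/dir/settings.ini")
print(result)
```

      Whether the wrapper in read_config earns its keep is exactly the question raised above: the UI handler would produce much the same message if it caught OSError directly.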

  13. Cool! by Anonymous Coward · · Score: 2, Funny

    I just found my new avatar picture. :)

    ps: not a troll, this guy's a freakin genius. I hope I look like that in 20+ years.

  14. Re:Heisen-whats? by fforw · · Score: 3, Funny
    Heisenbugs. Are there Heisen-features as well?
    yes. they exist but disappear once you try to use them. A direct result of featuritis.
    --
    while (!asleep()) sheep++
  15. Has language in CS matured? by sapgau · · Score: 2, Funny

    I couldn't help noticing Mr. Lindsay's explanations of what a process would or could do. He kept describing it in the first person:

    - "You asked me to do X, I didn't do it."
    - "Aha, this seems like I should go further."
    - "Oh, I see this as one of those really bad ones."
    - "I'm going to initiate the massive dumping now."

    Obviously he is an expert in his field but I'm not sure if he talks this way because of his personality or because there isn't a vocabulary big enough to describe it.

    Would you imagine a medical doctor talking this way?

    - "So the white blood cells fight with the cancer cells: die evil cell, die!!"

    Or an engineer:

    - "The little peg asks its big brother: can you help me convert this energy into circular motion?"

    1. Re:Has language in CS matured? by zuzulo · · Score: 3, Interesting

      Strangely, this is a phenomenon I have noticed with experts in many different fields who *really* understand what they are doing. They really do have an internal model of what is happening that essentially boils down to this sort of simplicity.

      In fact, the correlation is so strong that I am suspicious of folks who *cannot* boil an arbitrarily complex interaction into an easily understood metaphor.

      Jargon does in no way denote true understanding.

      --
      "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety."
    2. Re:Has language in CS matured? by aka.Daniel'Z · · Score: 2, Interesting

      I've talked about some unrelated stuff with some friends once and we came to the conclusion that when you observe something, you compare it to yourself.

      Applying that idea to coding, it would mean that talking like that in relation to code (.. so I will ask the other component ..) would be like understanding and agreeing with what the code is doing. Talking about the code in other ways (.. so it will ask the user ..) may mean that you don't understand it (so you should observe it more in order to understand it better) or that you don't agree with the way it works (in this case, if the responsibility is yours, you should think about changing the code).

      I always thought of it as an interesting idea - my own experiences working as a programmer and working with other people on a project seem to lead to this.