Slashdot Mirror


Database Error Detection and Recovery

CowboyRobot writes "ACM Queue has an interview by Steve Bourne with Bruce Lindsay, responsible for a lot of the SQL and RDBMS we use today, in which they discuss error detection and recovery. My favorite part other than the photos is the definition of Heisenbugs - those problems that disappear only when you explicitly look for them."

15 of 163 comments

  1. Rite of Passage by beldraen · · Score: 4, Interesting

    Heisenberg bugs are a rite of passage in the computer world. They result from the production environment being different from the development environment. For instance, a debugger may initialize all memory in the process space to zero. An errant loop control variable now happens to be set properly, so no error occurs; in the production environment, however, whatever is left over in memory is used, which means the loop wanders off into no man's land and crashes. Always initialize your variables, period! Do it even in languages that initialize them for you automatically, so that you know what they are initialized to.

    --
    Bel, the mostly sane.. "Of course I can't see anything! I'm standing on the shoulders of idiots." -- Me
    1. Re:Rite of Passage by quamaretto · · Score: 2, Interesting

      I have been dealing recently with a Heisenbug in Internet Explorer while trying to design a web page with floats. (Web designers, weep with me.) The trouble is that a certain page renders wrong (or what I think is wrong) the first time you look at it after opening Internet Explorer, and then displays correctly every time you look at it afterward, even with 'refresh'.

      And yes, I really do have to design it for Internet Explorer.

      Also, early on in the development of the page, I was encountering a similar situation with VS.Net 2004 where, when it "corrected the formatting" of my HTML (which happens when you switch to design view), it would introduce the same display glitch into the page.

      At some point in the cycle of figuring all of that out, I even came across variations of the page where IE, on every individual view, would randomly choose one of two ways to display the same page.

      (For reference, I was trying to float a div tag to the left of another div as a menu. The trouble is, sometimes the div would pop entirely outside the containing div to the left. I'm still not sure what causes it, after fixing it some dozen times.)

      --
      *is run over by rotten tomatoes*
    2. Re:Rite of Passage by thepoch · · Score: 2, Interesting

      This probably resulted in the quote: "If you can't repeat it, it's not a bug"

      End-User : Look at this bug!
      Developer: What bug?
      End-User : This bug!
      Developer: Oh? Try it out on my development computer.
      End-User :
      Developer: See, it's not a bug. Must be your Windows or something. Have your IT department support it.
      Manager : All right, no bugs, pay up.

      If only development was like this little play I wrote.

    3. Re:Rite of Passage by leoval · · Score: 3, Interesting

      Nothing like adding extra printf's to get rid of an error. Thankfully the universe is a better place since the invention of Purify, since most Heisenbugs are memory problems (evil pointers). However, the most challenging Heisenbugs are the timing-related ones, especially in networked applications. Those @#%'s are really hard to debug. I remember one project in particular where the Heisenbug would only occur on Windows (no flame intended), but it would go away whenever we put an fprintf just before sending any packets through the socket. I think the developer could not figure out the problem in time for the production deadline, so the fprintf to a bogus file is still there (about 6 years after the fact).

    4. Re:Rite of Passage by mikael · · Score: 3, Interesting

      Oh man, I remember those. The worst case was when I was trying to fix a SLIP (the modem protocol) bug in an Ethernet probe. You could step through the code and everything worked correctly. The Token Ring network version worked correctly without failing. But run the system normally, and it failed to connect. The quickest solution was to compare the corresponding code segments of the two systems; the only difference was a 15 millisecond wait. Once that was put back, the system worked correctly. (This was a requirement specified in the back-page appendix of the modem chip specification.)

      --
      Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
    5. Re:Rite of Passage by bsgenerator · · Score: 2, Interesting

      I once worked at a company on a program in QBasic. The small procedure I was working on was OK when used in a test program, but when it was used in the 'real' program it barfed. After I had looked for hours for non-existent bugs in only 10 lines of code, questioning my own sanity in the process, I asked for help. It turned out that the compiler we ran afterwards was what messed up, and it was 'corrected' by a dummy if-statement in the right place.

      IF 1 = 0 THEN
        PRINT "The end of the world..."
      END IF

      Of course, another senior programmer saw the statement and removed it just before it was shipped to the customer...

      Nowadays, I use JUnit to test portions of code, but it won't save me from this kind of problem.

  2. Heisenbugs - Oh my gawd by Anonymous Coward · · Score: 1, Interesting

    I have worked on electronic hardware for forty years. Over that period, I have experienced many such bugs. You carefully trace the problem and get to the point where you say: "It must be this!" So you go there and the signal is correct but now the equipment works properly. It stays working properly. I'm used to problems that are sometimes there and sometimes aren't; this is different. The working condition stays.

    It's like the equipment is playing hide and seek with you. You found the problem and the game is over. Maybe this proves that the 'great electron' has a sense of humor.

  3. Re:Make error message meaningful! by Anonymous Coward · · Score: 1, Interesting

    Damn, if only you worked for Microsoft. This is a major reason I absolutely hate diagnosing any problem on a Microsoft system. "Unable to connect to server." What the hell does that mean? A DNS failure? Connection timed out? Connection refused? No route?
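
The four causes listed above can usually be told apart from the exception the operating system hands back. A minimal sketch in Python, assuming a plain TCP client; the function names `describe_connect_error` and `try_connect` and the message wording are made up for illustration and are not taken from any Microsoft product:

```python
import errno
import socket


def describe_connect_error(exc: OSError) -> str:
    """Map a low-level connection failure to a specific, readable cause."""
    if isinstance(exc, socket.gaierror):
        return "DNS failure: host name could not be resolved"
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "connection timed out"
    if isinstance(exc, ConnectionRefusedError):
        return "connection refused by the remote host"
    if exc.errno in (errno.ENETUNREACH, errno.EHOSTUNREACH):
        return "no route to host or network"
    # Fall back to the generic message, but keep the underlying detail.
    return f"unable to connect: {exc}"


def try_connect(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP connection and report either success or the specific cause."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except OSError as exc:
        return describe_connect_error(exc)
```

Calling `try_connect("example.invalid", 80)` would normally go down the DNS branch, though the exact exception depends on the local resolver.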

  4. Exception Handling by Joe+U · · Score: 3, Interesting

    Not quite on topic, but, I once tried writing code in SQL (in this case for ColdFusion) by using stored procedures and exception handling.

    What a nightmare.

    Many people code unique inserts like this.

    Check for duplicate record.
    if not found, then insert.
    else, prompt user.

    Using exception handling, you code like this.

    insert.
    if error thrown, prompt user.

    One less query, lots less code.
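
The insert-then-catch pattern above can be sketched as follows. This is an illustration in Python with the standard `sqlite3` module, not the ColdFusion and stored-procedure setup from the comment; the `users` table, the `email` column and the `add_user` helper are assumptions made up for the example:

```python
import sqlite3


def add_user(conn: sqlite3.Connection, email: str) -> bool:
    """Insert first, and treat a uniqueness violation as 'already exists'."""
    try:
        with conn:  # commits on success, rolls back if the insert raises
            conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint rejected the row: the caller can prompt the user.
        # Note: IntegrityError also covers other constraint violations (e.g. NOT NULL).
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT NOT NULL UNIQUE)")

first = add_user(conn, "a@example.com")   # row is new
second = add_user(conn, "a@example.com")  # duplicate, rejected by the constraint
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

Because the database enforces the constraint, there is also no window between a separate check and the insert in which another client could slip in the same row.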

    One problem: the web application language treated all DB errors as fatal. When I asked, I was told this was by design.

    Thinking about it, I feel that Macromedia didn't want me to code efficiently. You don't sell extra ColdFusion servers if you can offload all your data logic to the SQL server. (Where it belongs)

  5. Re:This is why MySQL ignites flamewars by Anonymous Coward · · Score: 1, Interesting

    And when the two meet, they go looking for a third guy who has a clock so they can settle the argument. And then they all go looking for another guy since none of them agree.

    All data is imperfect without proper reference, and all reference data is external. See the art of calibration.

    Even a room full of atomic clocks will not reflect the correct time in 100 years, because the earth-sun system isn't slave to the rhythms of any process other than its own (at least in any quantifiable sense). They could be several seconds off from any continuous measure of solar time.

    The more mechanistically you design your system to not need the external reference the more error you introduce until (at the limit) you can't be certain of anything at all.

    And considering our universe was created by a divide by zero, I think you are jumping to conclusions calling the performance of the function an error.

  6. Re:This is why MySQL ignites flamewars by Anonymous Coward · · Score: 1, Interesting

    > split of data validation as either a pre-processing step or a more macroscopic endevor.

    You're right that pre-processing is a good way to do constraints. We gave up on using Oracle constraints, because they aren't very featureful. Well, that's not Oracle's fault. It's the fault of the SQL standard. You can do *much* better error checking with a Turing-complete programming language than you can with SQL. That's what we do with our applications even though we use Oracle. The biggest problem is that you must have good programmers.

    It's funny how that idiot isn't smart enough to argue the philosophical differences, but the kiddies here with mod points still give his post points. He posted a poorly thought-out rant, and that's the type of thing the 15-year-olds enjoy. Since most of the moderators here have no clue, they tend to blindly give points to any post like that. Instead, the moderators should only moderate posts they understand. This is the type of crap that has kept /. from being truly useful.

  7. Re:Make error message meaningful! by Anonymous Coward · · Score: 1, Interesting

    Which would you prefer
    1. Syntax error in line 1.
    2. ERROR [ID=WXY1234] found "'" where expected """ in statement: "{printf "%d\n', i}" on line: 1.


    I would prefer an error message indicating the real source of the problem, an unterminated string literal. What you list as the 2nd option doesn't describe the problem that the compiler runs into when trying to compile that code.

    I submit to you option #3:

    3. ERROR [ID=123123] Unterminated string literal in line 1
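
A toy scanner shows how a message like option #3 could be produced. It is only a sketch: it assumes double-quoted strings with C-style backslash escapes, leaves out the error ID, and the names `find_unterminated_string` and `check_line` are hypothetical, not any real compiler's API:

```python
from typing import Optional


def find_unterminated_string(line: str) -> Optional[int]:
    """Return the 1-based column where an unclosed double-quoted string starts, or None."""
    start = None  # index of the opening quote of the string we are inside, if any
    i = 0
    while i < len(line):
        ch = line[i]
        if start is not None and ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == '"':
            start = i if start is None else None  # open or close the string
        i += 1
    return None if start is None else start + 1


def check_line(line: str, line_no: int) -> str:
    """Report the real problem (an unclosed string) instead of a generic syntax error."""
    col = find_unterminated_string(line)
    if col is None:
        return "ok"
    return f"ERROR Unterminated string literal in line {line_no}, starting at column {col}"


# The statement from the example above: the string opens with " and never closes.
message = check_line('{printf "%d\\n\', i}', 1)
```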

  8. Re:Has language in CS matured? by zuzulo · · Score: 3, Interesting

    Strangely, this is a phenomenon I have noticed with experts in many different fields who *really* understand what they are doing. They really do have an internal model of what is happening that essentially boils down to this sort of simplicity.

    In fact, the correlation is so strong that I am suspicious of folks who *cannot* boil an arbitrarily complex interaction into an easily understood metaphor.

    Jargon in no way denotes true understanding.

    --
    "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety."
  9. Re:Has language in CS matured? by aka.Daniel'Z · · Score: 2, Interesting

    I've talked about some unrelated stuff with some friends once and we came to the conclusion that when you observe something, you compare it to yourself.

    Applying that idea to coding, it would mean that talking like that in relation to code (.. so I will ask the other component ..) would be like understanding and agreeing with what the code is doing. Talking about the code in other ways (.. so it will ask the user ..) may mean that you don't understand it (so you should observe it more in order to understand it better) or that you don't agree with the way it works (in this case, if the responsibility is yours, you should think about changing the code).

    I always thought of it as an interesting idea - my own experiences working as a programmer and working with other people on a project seem to lead to this.

  10. Re:Java does exactly what Bruce wants by Anonymous Coward · · Score: 1, Interesting

    Koreth, you are no doubt aware that Java already solves this problem. If you repackage your exception into one derived from RuntimeException, then it becomes an unchecked exception. So you can treat checked exceptions as an 'optional' feature of Java, and avoid them in the (IMO rare) situations where they make no sense.

    Repackaging at every higher semantic level is often worth doing. The power of abstraction comes through treating each class and method as a little black box. Higher level code cannot make assumptions about what goes on lower down, therefore you can change what goes on lower down, without penalty.

    If you are throwing a specific exception, say SQLException, up to higher level code, then if you later want to change your low level code to use XML or something else instead, the higher level code will break.

    (Of course, handily, the higher level code will break at compile time, not at runtime, if you are using checked exceptions.)
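
The repackaging idea can be sketched outside Java as well. The example below is in Python, which has no checked exceptions, so it only illustrates wrapping a backend-specific error at the abstraction boundary, not the compile-time check. `StorageError` and `load_user_name` are hypothetical names, and `sqlite3` stands in for whatever the low-level code happens to use:

```python
import sqlite3


class StorageError(Exception):
    """Hypothetical higher-level error: callers see this, never the backend's exception."""


def load_user_name(conn: sqlite3.Connection, user_id: int) -> str:
    try:
        row = conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    except sqlite3.Error as exc:
        # Repackage the backend-specific failure; 'from exc' keeps the original as __cause__.
        raise StorageError(f"could not load user {user_id}") from exc
    if row is None:
        raise StorageError(f"no user with id {user_id}")
    return row[0]


conn = sqlite3.connect(":memory:")  # deliberately has no 'users' table, so the query fails
try:
    load_user_name(conn, 1)
except StorageError as err:
    caught = err  # higher-level code handles StorageError only
```

If the storage layer later moved from SQL to XML files, only the `except` clause inside `load_user_name` would change; code that catches `StorageError` would not.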