Slashdot Mirror


Null References, the Billion Dollar Mistake

jonr writes "'I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years. In recent years, a number of program analysers like PREfix and PREfast in Microsoft have been used to check references, and give warnings if there is a risk they may be non-null. More recent programming languages like Spec# have introduced declarations for non-null references. This is the solution, which I rejected in 1965.' This is an abstract from Tony Hoare's presentation at QCon. I was raised on C-style programming languages and have always used null pointers/references, but I am having trouble grokking null-reference-free languages. Is there good reading out there that explains this?"

13 of 612 comments

  1. Re:20 second explanation by Anonymous Coward · · Score: 3, Insightful

    Doesn't NULL in SQL represent "unknown", which is something entirely different than a NULL reference, which in the context of programming languages is a discrete value?

  2. Re:null or not null, that is the question by CTalkobt · · Score: 4, Insightful

    When debugging at the hardware level it's fairly common to fill uninitialized memory (or newly allocated memory in a debug version of the malloc libraries) with a value that will either cause the computer to execute a system-level break (e.g. TRAP / BRK, etc.) or something fairly obvious such as ($BA).

    If you don't like the 0's, then replace your memory allocation library.

    --
    There's a gorilla from Manilla who's a fella that stinks of vanilla and has salmonella.
  3. Wouldn't help by corporate+zombie · · Score: 5, Insightful

    Fine. No null references. So I create the same thing by having a reference to some unique structure (probably named Null) and I still *fail to check for it*.

    Null references don't kill programs. Programmers do.
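
    A minimal sketch of the situation described above, in Python rather than C. The names `User`, `NULL_USER`, `USERS` and the `find_*`/`greet_*` helpers are hypothetical, invented for this illustration. It contrasts a hand-rolled "Null" structure, where a forgotten check passes silently, with a return type that spells out absence, which a static checker such as mypy can be asked to enforce.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    name: str


# Hypothetical sentinel object standing in for "no user".
NULL_USER = User(name="")

# Hypothetical lookup table used only for this illustration.
USERS = {1: User(name="ada")}


def find_with_sentinel(uid: int) -> User:
    # Always returns a User, so nothing forces the caller to check.
    return USERS.get(uid, NULL_USER)


def find_optional(uid: int) -> Optional[User]:
    # Absence is part of the return type; None must be handled before use.
    return USERS.get(uid)


def greet_unchecked(uid: int) -> str:
    # The forgotten check: a missing user silently yields bad output.
    return "hello " + find_with_sentinel(uid).name


def greet_checked(uid: int) -> str:
    user = find_optional(uid)
    if user is None:
        return "no such user"
    return "hello " + user.name
```

    With the sentinel, `greet_unchecked(2)` carries on with an empty name instead of failing; whether the optional form actually prevents the mistake depends on the checker being run, since Python itself does not enforce the annotation.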

        -CZ

  4. Re:There was a bigger mistake: by Rik+Sweeney · · Score: 5, Insightful

    Null-terminated strings. The bane of modern computing.

    Yeah! Let's abolish them, life would be much simplerasdjkaRGfl$!jaekrbFt6634i2u23Q0CCA;DMF ASDJFERR

  5. Re:null or not null, that is the question by jeremyp · · Score: 3, Insightful

    That's all very well, but in a production environment when dereferencing a NULL pointer you'd probably rather have the program crash than carry on merrily with bad data. With a zero null value, you can easily arrange for this to happen by protecting the bottom page of memory from reads and writes. That way, even an assembly language program can't dereference a null pointer.

    --
    All I want is a secure system where it's easy to do anything I want. Is that too much to ask ~~ Randall Munroe
  6. The mistake was actually not having a standard by Nicolas+MONNET · · Score: 4, Insightful

    for Pascal-type strings in C. The fact that null-terminated strings existed wasn't the problem; they make sense in some respects, such as when you want to pass text of arbitrary length. But the real problem, the real bug, was not having a standard way of doing real strings in C. Everybody had to do it himself, poorly. Had there been a standard, no matter how poor, it would have been a starting point to do something better if needed, and would have been better anyway for many uses than C strings. It would have avoided MANY vulnerabilities in common software.
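
    To make the two conventions concrete, here is a rough sketch in Python operating on raw bytes. The 2-byte little-endian length prefix and the function names are assumptions chosen for the example; they are not any historical Pascal or C layout.

```python
import struct


def pack_pascal(data: bytes) -> bytes:
    # Hypothetical layout: 2-byte little-endian length, then the payload.
    if len(data) > 0xFFFF:
        raise ValueError("payload too long for a 16-bit length prefix")
    return struct.pack("<H", len(data)) + data


def unpack_pascal(buf: bytes) -> bytes:
    # The length is known up front, so truncated input can be detected.
    (length,) = struct.unpack_from("<H", buf, 0)
    payload = buf[2:2 + length]
    if len(payload) != length:
        raise ValueError("buffer shorter than its declared length")
    return payload


def unpack_c_string(buf: bytes) -> bytes:
    # NUL-terminated convention: scan until the first zero byte.
    end = buf.find(b"\x00")
    if end == -1:
        raise ValueError("no terminator found")
    return buf[:end]
```

    The length-prefixed form round-trips a payload containing a zero byte, while the terminated form stops at the first one; the price is a fixed upper bound on length, which is the trade-off the comment alludes to.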

  7. Re:20 second explanation by AKAImBatman · · Score: 4, Insightful

    Consider the situation of apples. If you have an apple, then something is in your possession. If you don't have an apple, what do you have? Do you have some sort of object that depicts your lack of an apple? Obviously not. Yet in the world of computers, we have this special piece of data that shows our lack of data. It's a bit like getting a certificate that you have no apples. The certificate accomplishes nothing except to fill a space that does not need to be filled.

  8. Re:20 second explanation by jadavis · · Score: 4, Insightful

    > It's a bit weird, but it makes sense when you actually follow the logic.

    Not really.

    The expression "0 <> 1" is true, but the poster you referenced also says "0 <> NULL", which is NOT true, it is NULL.

    Additionally, NULL is not always treated as false-like. For instance, if you added the constraint "CHECK (0 NOT IN (NULL, 1))", that would always succeed, as though it was "CHECK(true)".

    And if you think "it makes sense", consider this: ... WHERE x > 0 OR x <= 0
    If x is NULL, that statement will evaluate to NULL, and then be treated as false-like, and the row will not be returned. However, there is no possible value of x such that the statement will be false.
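
    These claims can be checked against a real engine. The sketch below uses SQLite through Python's standard `sqlite3` module as a stand-in; the tables `t` and `c` and the helper `scalar` are made up for the example, and other databases may differ in details such as how booleans are represented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t (x) VALUES (?)", [(-1,), (0,), (1,), (None,)])


def scalar(sql: str):
    # Evaluate a single SQL expression and return its one value.
    return conn.execute(sql).fetchone()[0]


ne_one = scalar("SELECT 0 <> 1")               # ordinary comparison
ne_null = scalar("SELECT 0 <> NULL")           # comparison with NULL
not_in = scalar("SELECT 0 NOT IN (NULL, 1)")   # NOT IN with a NULL member

# Rows matching the predicate, and rows matching its negation.
matched = scalar("SELECT count(*) FROM t WHERE x > 0 OR x <= 0")
negated = scalar("SELECT count(*) FROM t WHERE NOT (x > 0 OR x <= 0)")

# A CHECK whose expression evaluates to NULL does not reject the row.
conn.execute("CREATE TABLE c (y INTEGER CHECK (0 NOT IN (NULL, 1)))")
conn.execute("INSERT INTO c (y) VALUES (5)")
check_rows = scalar("SELECT count(*) FROM c")
```

    In SQLite, `ne_null` and `not_in` come back as `None` (SQL NULL), three of the four rows match the predicate, and no row matches its negation, so the NULL row is excluded by both the condition and its opposite.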

    I'm not a big fan of NULL, but I think the most obvious sign that it's a problem is that so many people think they understand it, when they do not.

    --
    Social scientists are inspired by theories; scientists are humbled by facts.
  9. Re:null or not null, that is the question by Thiez · · Score: 5, Insightful

    > Another behaviour by default that C got wrong is initialisation: by default your variables are not initialised so if you forget to initialise your variables your program may act randomly which is a pain to debug, the correct default would be to have all variables initialised by default but with the option to let variables non-initialised which can be useful as a performance optimisation.

    C did NOT get it 'wrong'. C just gives you a lot of rope to hang yourself with. You are free to write your own version of C that protects you from yourself (tweaking an open source C compiler to initialise all variables by default (to what value?) should take you a few hours at most, and most of that time will go to finding the right source file to edit...), but I like it when C obliterates my foot every now and then. Alternatively you could write a program that goes through your code to look for situations where variables that may be uninitialised are used (I believe Java does this) and whines about it.

  10. Lying to the language - the real problem. by Animats · · Score: 3, Insightful

    A useful way to think about troubles in language design is to ask the question "When do you have to lie to the language?" Most of the major languages have some situations in which you have to lie to the language, and that's usually a cause of bugs.

    The classic example is C's "array = pointer" ambiguity. Consider

    int read(int fd, char* buf, size_t len);

    Think hard about "char* buf". That's not a pointer to a character. It's a pass of an array by reference. The programmer had to lie to the language because the language doesn't have a way to talk about what the programmer needed to say. That should have been

    int read(int fd, byte& buf[len], size_t len);

    Now the interface is correctly defined. The caller is passing an array of known size by reference. Notice also the distinction between "byte" and "char". C and C++ lack a "byte" type, one that indicates binary data with no interpretation attached to it. Python used to be that way too, but the problem was eventually fixed; Python 3K has "str" (Unicode text) and "bytes" (uninterpreted binary data). C and C++ are still stuck with a 1970s approach to the problem.
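
    As a small illustration of the text/binary split in Python 3, where text is `str` and binary data is `bytes`, the sketch below shows that the two types do not mix implicitly. The helper name `mixes_text_and_bytes` and the sample string are invented for the example.

```python
def mixes_text_and_bytes() -> bool:
    # Concatenating text with raw bytes is rejected rather than guessed at.
    try:
        "abc" + b"abc"  # type: ignore[operator]
    except TypeError:
        return False
    return True


text = "naïve"
raw = text.encode("utf-8")         # explicit text -> bytes conversion
round_trip = raw.decode("utf-8")   # explicit bytes -> text conversion
```

    The text has five characters but six bytes in UTF-8, which is exactly the kind of distinction a single `char*` cannot express.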

    The problem with NULL is related. Some functions accept NULL pointers, some don't, and many languages don't have syntax for the distinction. C doesn't; C++ has references, but due to backwards compatibility problems with C, they're not well handled. ("this", for example, should have been a reference; Stroustrup admits he botched that one.) C++ supposedly disallows null references (as opposed to null pointers), but doesn't check. C++ ought to raise an exception when a null pointer is converted to a reference.

    SQL does this right. A field may or may not allow NULL, and you have to specify.
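
    A quick sketch of per-column nullability, again using SQLite via Python's `sqlite3` module as a convenient stand-in. The `person` table and the `insert_rejected` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: "name" must be present, "nickname" may be absent.
conn.execute("CREATE TABLE person (name TEXT NOT NULL, nickname TEXT)")

conn.execute("INSERT INTO person (name, nickname) VALUES (?, ?)", ("ada", None))


def insert_rejected(name, nickname) -> bool:
    # True when the database refuses the row because of a constraint.
    try:
        conn.execute(
            "INSERT INTO person (name, nickname) VALUES (?, ?)",
            (name, nickname),
        )
    except sqlite3.IntegrityError:
        return True
    return False
```

    Note that in this schema the nullable column is the default and `NOT NULL` is the opt-in, so the declaration is only as strict as the person writing it.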

    Look for holes like this in language design. Where are you unable to say what you really meant? Those are language design faults and sources of bugs.

  11. Re:20 second explanation by thePowerOfGrayskull · · Score: 3, Insightful

    > And if you think "it makes sense", consider this: ... WHERE x > 0 OR x <= 0
    > If x is NULL, that statement will evaluate to NULL, and then be treated as false-like, and the row will not be returned. However, there is no possible value of x such that the statement will be false.

    If x is NULL, the statement evaluates to false. This isn't "false-like"; NULL is the state of not having a value. Comparing a non-value to /any/ value or range of values is logically false: X is neither LTE 0 nor is it GT 0; a non-value has no relation to the value 0.

    While you can use it to derive a true/false value, NULL is not (in the RDBMS context) a value at all. Would you say that in mathematics the "empty set" makes no logical sense?

  12. Re:null or not null, that is the question by jvkjvk · · Score: 3, Insightful

    Mods are on crack.

    Of course there is more than a syntactic difference between a reference and a pointer in C++.

    For one, references CANNOT be null, while pointers are allowed to be null. I'd say that is an indicator of a pretty big semantic difference, wouldn't you?

    To say that * or & "fixes" the difference is handwaving around the fact that pointers and references are two different, yet related concepts (that is, they have more than a "purely syntactic" difference).

    To be pedantic, you can't even write a null reference in C++; the compiler will complain (more pedantic: although you can delete the underlying object sometimes, this does not make the reference null, merely dangling), so it is also nonsensical to talk about "null references" vis-a-vis "null pointers" per se, except in a most general way.

    Regards.

  13. Re:no such requirement at the assembly level by thethibs · · Score: 3, Insightful

    You are confusing C with...well, I'm not sure what...Haskell, maybe? In many cases with C, the sequence of events is as important as the end result. C code can have side-effects.

    C is not an expression evaluator, it's a control language; A && B is an instruction to copy A and if it is non-zero, replace the copy with B, in that order. A++ says copy A and then increment it.

    Most of the people on slashdot can tell you why that's important and a few of them have; there are more than a few scenarios where not getting the sequence right would have undesirable effects even if the returned value was correct. Look up memory-mapped I/O.
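
    The ordering point can be made observable with side effects. The sketch below uses Python's `and` as a stand-in for C's `&&`, since both evaluate the left operand first and skip the right one when the left is false; `probe`, `run` and `log` are invented names. One difference to keep in mind: C's `&&` yields 0 or 1, whereas Python's `and` returns one of its operands.

```python
log = []


def probe(name: str, value: int) -> int:
    # Record the call so the evaluation order is observable.
    log.append(name)
    return value


def run(a: int, b: int):
    # Evaluate "A and B" and report both the result and which sides ran.
    log.clear()
    result = probe("A", a) and probe("B", b)
    return result, list(log)
```

    When the left side is zero the right side never runs, which is why swapping the operands could change behaviour, for instance when the right side touches a device register, even if the final value would be the same.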

    --
    I'm a Programmer. That's one level above Software Engineer and one level below Engineer.