Slashdot Mirror


Null References, the Billion Dollar Mistake

jonr writes "'I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years. In recent years, a number of program analysers like PREfix and PREfast in Microsoft have been used to check references, and give warnings if there is a risk they may be null. More recent programming languages like Spec# have introduced declarations for non-null references. This is the solution, which I rejected in 1965.' This is the abstract of Tony Hoare's presentation at QCon. I was raised on C-style programming languages and have always used null pointers/references, but I am having trouble grokking a null-reference-free language. Is there good reading out there that explains this?"

13 of 612 comments

  1. Re:20 second explanation by MattRog · · Score: 5, Informative

    "Obviously the best way of accomplishing such a database is to denormalize any value that might be null"

    That's normalizing. The table in this example is denormalized.

    --

    Thanks,
    --
    Matt
  2. Re:There was a bigger mistake: by RetroGeek · · Score: 3, Informative

    "Null-terminated String" is a misnomer. It is actually an array of chars that uses a special character to mark its upper boundary, so that a second variable is not needed to hold that boundary. K&R chose zero as the special character.

    In some languages, a String is an object, and the object holds the upper boundary, so a terminator flag is not required.
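
    A minimal C++ sketch of the two conventions described above. The function names terminated_length and with_embedded_zero are made up for illustration; the first finds the boundary by scanning for the zero character, the second relies on std::string storing its own length.

```cpp
#include <cstddef>
#include <string>

// Length of a C-style string: scan forward until the terminating '\0'.
// No separate length variable exists; the terminator is the boundary.
std::size_t terminated_length(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') {
        ++n;
    }
    return n;
}

// A string object carries its own length, so a zero character is
// ordinary data rather than an end marker.
std::string with_embedded_zero() {
    return std::string("ab\0cd", 5);  // explicit length: five chars, one is '\0'
}
```

    With these definitions, with_embedded_zero().size() reports all five characters, while terminated_length() applied to the same bytes stops at the embedded zero.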

    --

    - - - - - - - - - - -
    I am a programmer. I am paid to produce syntax not grammar. Deal with it.
  3. Algebraic data types by Sneftel · · Score: 4, Informative

    The concept of "no null references" would be very limiting in a language without algebraic data types. You can think of null references as a sort of teeny limited braindead algebraic data type, actually. I get the feeling that much of the incredulity here stems from the posters not being familiar with languages that support them. If this describes you, check out Haskell and OCaml! They're the sort of languages that make you a better programmer no matter what language you're using.

    --
    The opinions stated herein do not necessarily represent those of anybody at all. Deal with it.
  4. Re:There was a bigger mistake: by Panaflex · · Score: 4, Informative

    That comes from Pascal, which has always stored the length at the beginning of the string. That is why Pascal strings always had length limits.
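
    To see where such a limit comes from, here is a rough C++ sketch that assumes the classic short-string layout of one length byte followed by the characters. ShortString, kMaxLength and make_short_string are illustrative names, not any real Pascal runtime's API.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// One byte of length means the longest representable string is 255 chars.
constexpr std::size_t kMaxLength = 255;

// Hypothetical layout modelled on a classic Pascal short string.
struct ShortString {
    std::uint8_t length = 0;               // stored up front, no terminator needed
    std::array<char, kMaxLength> chars{};  // character storage
};

// Returns std::nullopt when the text cannot fit, instead of truncating silently.
std::optional<ShortString> make_short_string(std::string_view text) {
    if (text.size() > kMaxLength) {
        return std::nullopt;
    }
    ShortString s;
    s.length = static_cast<std::uint8_t>(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        s.chars[i] = text[i];
    }
    return s;
}
```

    The limit is a property of the width chosen for the length field, not of length prefixes as such; a wider field raises the ceiling at the cost of extra bytes per string.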

    --
    I said no... but I missed and it came out yes.
  5. Re:20 second explanation by AKAImBatman · · Score: 4, Informative

    doesn't NULL in SQL represent "unknown", which is something entirely different than a NULL reference?

    No. NULL in SQL represents an absence of data, which is occasionally used to cover for unknown values. However, NULL is a piece of data that says there is an absence of data, which is incorrect. Absence of data means that it doesn't exist. Therefore, nothing should exist in its place.

    Normalizing the database can create a situation where the NULL is unnecessary. Therefore, the concept is not needed by computer science. The problem is that real-world considerations often override the ivory tower of comp-sci. One of those considerations is the fact that RDBMSes have traditionally been organized according to a fixed column model. The inflexibility of the model is driven by the on-disk data structures, which are optimized for fast access. OODBMSes (which are really fancy RDBMSes with many "pure" relational features that work around the traditional weaknesses of RDBMSes) attempt to solve this issue by introducing concepts like table-less storage, columns that may or may not exist on a per-row basis, and a dynamic typing system that potentially allows any data type to show up in a particular column. (Note that columns are often handled more as key-value pairs than what we normally think of as columns. This does not undo the theoretical foundation of the Relational model; it only results in a different view of it.)

  6. Pass by reference by hobbit · · Score: 4, Informative

    I was raised on C-style programming languages and have always used null pointers/references, but I am having trouble grokking a null-reference-free language.

    Take a look at C++, in which you can declare function parameters to be "pass by reference" rather than "pass by pointer". Although the former is typically implemented by passing a pointer too, the semantics of the construct make it impossible to pass NULL.
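
    A small C++ sketch of the difference, using two made-up functions (length_or_zero and length_of). The pointer version has to decide what "no string" means; the reference version states in its signature that a string is always there.

```cpp
#include <cstddef>
#include <string>

// Pointer parameter: the caller may pass nullptr, so the callee must check.
std::size_t length_or_zero(const std::string* s) {
    if (s == nullptr) {
        return 0;
    }
    return s->size();
}

// Reference parameter: there is no null state to test for. Binding a
// reference through a null pointer is undefined behaviour, so a well-formed
// caller cannot hand this function "no object".
std::size_t length_of(const std::string& s) {
    return s.size();
}
```

    The check in length_or_zero is one that every pointer-taking function has to remember; length_of pushes that responsibility to the caller, where the object is known to exist.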

    --
    "Wise men talk because they have something to say; fools, because they have to say something" - Plato
    1. Re:Pass by reference by Chris_Jefferson · · Score: 3, Informative

      According to the C++ standard, as soon as you dereference NULL, you are off into the nasty old land of undefined behaviour, and all kinds of horrible things can happen to you.

      --
      Combination - fun iPhone puzzling
  7. Re:null or not null, that is the question by marcansoft · · Score: 4, Informative

    Wrong. The representation of a NULL pointer is implementation-defined in C, and !p would work just as well if the bit pattern of a NULL pointer were 0xdeadbeef. The compiler is responsible for that.

    0 is used because it's convenient for compilers and architectures, not for programmers. Programmers don't care, they never see the bit pattern of a NULL pointer unless they're doing things wrong (casting to integers) or working on lower level architecture-specific code. Most think they do, though. See the C-faq section on NULL pointers.

  8. Re:20 second explanation by AKAImBatman · · Score: 3, Informative

    You need to have NULL to represent missing data, anything else is actual data.

    I don't think you understand the argument. Having the following is incorrect:

    ID|Name|State|Address|Address2
    5|Bob|MN|12 East St.|NULL

    THIS is correct:

    ID|Name|State
    5|Bob|MN
     
    ID|Parent|Address|Order
    22|5|12 East St.|1

    Note how there is no NULL value. In fact, NULL is antithetical to relational theory, since every attribute of every tuple should hold a value. Missing data should be normalized away.

    There are only people who can't intuitively understand 3VL.

    Three-valued logic has nothing to do with it. 3VL actually creates problems in this case. In fact, your very own snarky comment above is a perfect example of how things go wrong with 3VL:

    ID|Contestant|Prize
    2|AKAImBatman|NULL
     
    resultset = sql("select Prize from Contestants where Contestant = 'AKAImBatman'");
     
    //Prints the stupid "NULL" answer
    while(resultset.next()) print("He leaves with " + resultset["prize"] + " the home game.");

    FAIL.

    Now look at this situation:

    ID|Contestant
    2|AKAImBatman
    3|geekoid
     
    Parent|Prize
    3|Remedial 6th Normal Form
     
    resultset = sql("select Prize from Prizes join Contestants on Contestants.ID = Prizes.Parent where Contestant = 'AKAImBatman'");
     
    //Correctly prints nothing
    while(resultset.next()) print("He leaves with " + resultset["prize"] + " the home game.");
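
    The same idea can be sketched outside SQL. The C++ below is only an in-memory analogy of the two tables and the join above (the struct and function names are invented for illustration): a contestant without a prize has no row in the second collection, so the join yields nothing and no NULL ever has to be printed.

```cpp
#include <string>
#include <vector>

struct Contestant {
    int id;
    std::string name;
};

struct Prize {
    int parent;         // refers to Contestant::id
    std::string prize;
};

// Inner join of the two "tables", filtered by contestant name.
// A contestant with no prize row contributes no result rows at all.
std::vector<std::string> prizes_for(const std::vector<Contestant>& contestants,
                                    const std::vector<Prize>& prizes,
                                    const std::string& name) {
    std::vector<std::string> result;
    for (const Contestant& c : contestants) {
        if (c.name != name) {
            continue;
        }
        for (const Prize& p : prizes) {
            if (p.parent == c.id) {
                result.push_back(p.prize);
            }
        }
    }
    return result;
}
```

    Looping over the returned vector plays the role of the while(resultset.next()) loop: for a contestant with no prize the loop body never runs.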

  9. Re:null or not null, that is the question by Amazing Quantum Man · · Score: 3, Informative

    Undefined behaviour. A reference must refer to a valid object. Also, dereferencing the null pointer (ip) is undefined behaviour.

    --
    Fascism starts when the efficiency of the government becomes more important than the rights of the people.
  10. maybe type by j1m 5n0w · · Score: 4, Informative

    Maybe types are wonderful. I first thought they were inconvenient, since you have to pattern match against them any time you want to extract the value, but then I realized that that was something I ought to be doing anyway, and the advantages of never accidentally dereferencing a null pointer vastly outweigh a little extra typing. And then, more recently, I figured out how to use the maybe monad to string together a bunch of things that might fail without having to manually pattern match every time.
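
    For readers coming from C-style languages, here is a rough C++17 analogue of that chaining, with std::optional standing in for Maybe. The free function and_then is a hypothetical helper written for this sketch (C++23 adds a member function of the same name to std::optional), and parse_digit and reciprocal_percent are invented steps that can each fail.

```cpp
#include <optional>
#include <string>

// Hypothetical helper playing the role of the Maybe monad's bind:
// if the input is empty, skip f and stay empty; otherwise apply f to the value.
template <typename T, typename F>
auto and_then(const std::optional<T>& value, F f) -> decltype(f(*value)) {
    if (!value) {
        return std::nullopt;
    }
    return f(*value);
}

// Step 1: succeeds only for a single decimal digit.
std::optional<int> parse_digit(const std::string& text) {
    if (text.size() == 1 && text[0] >= '0' && text[0] <= '9') {
        return text[0] - '0';
    }
    return std::nullopt;
}

// Step 2: fails for zero, because 100 / 0 has no answer.
std::optional<int> reciprocal_percent(int n) {
    if (n == 0) {
        return std::nullopt;
    }
    return 100 / n;
}

// The two steps chained without an explicit emptiness check in between.
std::optional<int> percent_from_text(const std::string& text) {
    return and_then(parse_digit(text), reciprocal_percent);
}
```

    A failure in either step propagates to the end as an empty optional, so the caller inspects the result once instead of after every call.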

  11. Re:null or not null, that is the question by geekgirlandrea · · Score: 4, Informative

    Actually, in C a null pointer is a distinct value from integer zero. The standard requires the following (see section 6.3.2.3 of ISO C99):

    • That an integer constant expression with the value 0, when converted to any pointer type, yield a null pointer
    • That a null pointer, when cast to any other pointer type, yield another null pointer
    • That any two null pointers will compare as equal, regardless of type

    As for constructions like if (!ptr), the standard requires that the body of an if statement execute if its controlling expression compares unequal to 0, and it defines !ptr as equivalent to (0 == ptr). It would therefore be entirely legal for the null pointer to have a non-zero in-memory representation and still compare equal to the constant 0. See, for example, the comp.lang.c FAQ.
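
    The guarantees listed above can be written down as code. The sketch is in C++ rather than C (C++ gives matching guarantees for null pointer conversions and comparisons), and the function names are made up for illustration. None of the checks looks at the bit pattern of the pointer.

```cpp
// The integer constant 0 converts to a null pointer.
bool zero_converts_to_null() {
    int* p = 0;
    return p == nullptr;
}

// Converting a null pointer to another pointer type yields a null pointer.
bool null_survives_conversion() {
    int* p = nullptr;
    void* q = p;
    return q == nullptr;
}

// Any two null pointers compare equal, regardless of their types.
bool nulls_compare_equal() {
    int* p = nullptr;
    void* q = nullptr;
    return p == q;
}

// !p asks "is p a null pointer?", not "are the bits of p all zero?".
bool not_operator_detects_null(const int* p) {
    return !p;
}
```

    On an implementation whose null pointer had a non-zero representation, the compiler would be responsible for making each of these functions behave the same way.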

  12. Re:20 second explanation by Estanislao Martínez · · Score: 3, Informative

    Variant types (or, put more generally, algebraic data types) are indeed a general solution for this problem, that can be reused for countless others.

    The simplest example here is the way you define linked list types in a functional language like Haskell. In pseudo-code (yes, I know this might not be valid Haskell code):

    data List a = EmptyList | Node a (List a)

    This is a data type declaration that says that the type "List of a" is either the singleton EmptyList value, or a 'Node a' value, which contains (as struct fields, basically) an element of type a and a List of a. (In case it isn't clear, 'a' is a type parameter here; so a list of strings would be 'List String', a list of integers would be 'List Integer', and so on.)

    This works just as well to allow you to define generic nullable type constructors (which the standard Haskell library provides):

    data Maybe a = Nothing | Just a

    The type 'Maybe String' represents a value that might be either 'Nothing', or 'Just x' for some x of type String.
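
    For comparison, a C++17 approximation of the Maybe declaration, assuming std::variant as the closest standard stand-in for an algebraic data type. Nothing, Just, Maybe, Describe and describe are names chosen for this sketch, not standard library types.

```cpp
#include <string>
#include <variant>

// Rough rendering of "data Maybe a = Nothing | Just a".
struct Nothing {};

template <typename T>
struct Just {
    T value;
};

template <typename T>
using Maybe = std::variant<Nothing, Just<T>>;

// One overload per alternative, in the spirit of a pattern match.
// std::visit fails to compile if an alternative is left unhandled.
struct Describe {
    std::string operator()(const Nothing&) const {
        return "Nothing";
    }
    std::string operator()(const Just<std::string>& j) const {
        return "Just " + j.value;
    }
};

std::string describe(const Maybe<std::string>& m) {
    return std::visit(Describe{}, m);
}
```

    A function that takes a plain std::string cannot receive Nothing at all; only code that explicitly asks for Maybe<std::string> has to deal with the empty case, which is the property the thread is discussing.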