Slashdot Mirror


Null References, the Billion Dollar Mistake

jonr writes "'I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years. In recent years, a number of program analysers like PREfix and PREfast in Microsoft have been used to check references, and give warnings if there is a risk they may be non-null. More recent programming languages like Spec# have introduced declarations for non-null references. This is the solution, which I rejected in 1965.' This is an abstract from Tony Hoare Presentation on QCon. I'm raised on C-style programming languages, and have always used null pointers/references, but I am having trouble of grokking null-reference free language. Is there a good reading out there that explains this?"

24 of 612 comments (clear)

  1. null or not null, that is the question by alain94040 · · Score: 5, Interesting

    It's hard to imagine life without the null pointer! That being said, the author is not really responsible for billions of dollars of mistakes, the programmers are.

    If there is one thing I'll complain about, it's the choice of the value 0. It's almost impossible to trace it. When we do hardware debug of chips, we prefer to use a much more visible value such as 0xdeadbeef for instance. Otherwise a bad pointer will bland too much with all the uninitialized values out there.

    In assembly, null has no particular meaning. If you dereference an address, you can do it in any range you like. It's just that 0 on most machines was not a good place to store anything, since it would typically be used to boot the OS or some other critical IO function that you don't want to mess up with. Thus null was born.

    1. Re:null or not null, that is the question by CTalkobt · · Score: 4, Insightful

      When debugging at the hardware level it's fairly common to fill uninitialized memory (or newly allocated in a debug version of the malloc libraries) with a value that will either cause the computer to execute a system level break ( eg: TRAP / BRK etc) or something fairly obvious such as ($BA).

      If you don't like the 0's, then replace your memory allocation library.

      --
      There's a gorilla from Manilla whose a fella that stinks of vanilla and has salmonella.
    2. Re:null or not null, that is the question by marcansoft · · Score: 4, Informative

      Wrong. A NULL pointer is implementation-defined in C and !p would work just as well if the bit value of p were 0xdeadbeef for a NULL pointer. The compiler is responsible for that.

      0 is used because it's convenient for compilers and architectures, not for programmers. Programmers don't care, they never see the bit pattern of a NULL pointer unless they're doing things wrong (casting to integers) or working on lower level architecture-specific code. Most think they do, though. See the C-faq section on NULL pointers.

    3. Re:null or not null, that is the question by johny42 · · Score: 5, Funny

      That being said, the author is not really responsible for billions of dollars of mistakes, the programmers are.

      Who am I to argue with someone that is taking resposibility for my mistakes?

    4. Re:null or not null, that is the question by Thiez · · Score: 5, Insightful

      > Another behaviour by default that C got wrong is initialisation: by default your variables are not initialised so if you forget to initialise your variables your program may act randomly which is a pain to debug, the correct default would be to have all variables initialised by default but with the option to let variables non-initialised which can be useful as a performance optimisation.

      C did NOT get it 'wrong'. C just gives you a lot of rope to hang yourself with. You are free to write you own version of C that protects you from yourself (tweaking an open source C-compiler to initialise all variables by default (to what value?) should take you a few hours at most, and most of that time will go to finding the right source file to edit...), but I like it when C obliterates my foot every now and then. Alternatively you could write a program that goes through your code to look for situations where variables that may be uninitialised are used (I believe Java does this) and whines about it.

    5. Re:null or not null, that is the question by Toonol · · Score: 4, Funny

      MOV DX, BABA; GET UP N DANCE

    6. Re:null or not null, that is the question by geekgirlandrea · · Score: 4, Informative

      Actually, in C the null pointer constant is a distinct value from integer zero. The standard requires the following (see section 6.3.2.3 of ISO C99):

      • That the integer value 0, when cast to any pointer type, yield a null pointer
      • That a null pointer, when cast to any other pointer type, yield another null pointer
      • That any two null pointers will compare as equal, regardless of type

      As for constructions like if (!ptr), the standard requires that the if statement execute if its value is non-zero, and it would be entirely legal for the null pointer to have a non-zero in-memory representation, but convert to the integer zero. See, for example, the comp.lang.c FAQ.

  2. 20 second explanation by AKAImBatman · · Score: 4, Interesting

    I am having trouble of grokking null-reference free language.

    If you're familiar with SQL, then a simple "MyColumn NOT NULL" definition should explain it. Basically, the value can never be set to a null value. Attempting to do so is an error condition itself.

    In fact, DB design is a pretty good analogy for the concept as databases often are forced to wrestle with this issue.

    Consider for a moment how you would design a database that has absolutely NO null references. Not a one. Zip, zero, nada. Obviously the best way of accomplishing such a database is to denormalize any value that might be null. So if Address2 is optional, you would want to split Address into its own table with a parent key pointing back to the user entry. If the user has an Address2 value, there will be a row. If the user does NOT have an Address2, the row will be missing. In that way, empty result sets take the place of null values.

    In terms of programming languages, there are a varity of ways to map such a concept. Collections are a 1:1 mapping to result sets that can work. If you don't have any values in your collection, then you know that you don't have a value. Very easy. Similarly, you can be sure that none of the values passed to a function or method will ever contain a null value. Cases where you might want to pass some of the values but not all can be handled either by method overloading (e.g. Java) or by allowing a variable number of parameters. (e.g. C)

    Some pieces of programming would become slightly more difficult. For example, 'if(hashmap.get("myvalue") != null)' would not be a valid construct. You'd need to perform a check like this: 'if(hashmap.exists("myvalue")'

    Of course, the latter is the "correct" check anyway, so the theory goes that the software will be more robust and reliable.

    1. Re:20 second explanation by MattRog · · Score: 5, Informative

      "Obviously the best way of accomplishing such a database is to denormalize any value that might be null"

      That's normalizing -- the table in this example is de-normalized

      --

      Thanks,
      --
      Matt
    2. Re:20 second explanation by AKAImBatman · · Score: 4, Informative

      doesn't NULL in SQL represent "unknown", which is something entirely different that a NULL reference

      No. NULL in SQL represents an absence of data. Which is occasionally used to cover for unknown values. However, NULL is a piece of data that says there is an absence of data. Which is incorrect. Absence of data means that it doesn't exist. Therefore, nothing should exist in its place.

      Normalizing the database can create a situation where the NULL is unnecessary. Therefore, the concept is not needed by computer science. The problem is that real-world considerations often override the ivory tower of comp-sci. And one of those considerations was the fact that RDBMSes have traditionally been organized according to a fixed column model. The inflexibility of the model is driven by the on-disk data structures which are optimized for fast access. OODBMSes (which are really fancy RDBMSes with many "pure" relational features that work around the traditional weaknesses of RDBMSes) attempt to solve this issue by introducing concepts like table-less storage, columns that may or may not exist on a per-row basis, and a dynamic typing system that potentially allow for any data type to show up in particular column. (Note that columns are often handled more as key-value pairs than what we normally think of as columns. This does not undo the theoretical foundation of the Relational model, only results in a different view on it.)

    3. Re:20 second explanation by AKAImBatman · · Score: 4, Insightful

      Consider the situation of apples. If you have an apple, then something is in your possession. If you don't have an apple, what do you have? Do you have some sort of object that depicts your lack of an apple? Obviously not. Yet in the world of computers, we have this special piece of data that shows our lack of data. It's a bit like getting a certificate that you have no apples. The certificate accomplishes nothing except to fill a space that does not need to be filled.

    4. Re:20 second explanation by jadavis · · Score: 4, Insightful

      It's a bit weird, but it makes sense when you actually follow the logic.

      Not really.

      The expression "0 <> 1" is true, but the poster you referenced also says "0 <> NULL", which is NOT true, it is NULL.

      Additionally, NULL is not always treated as false-like. For instance, if you added the constraint "CHECK (0 NOT IN (NULL, 1))", that would always succeed, as though it was "CHECK(true)".

      And if you think "it makes sense", consider this: ... WHERE x > 0 OR x <= 0
      If x is NULL, that statement will evaluate to NULL, and then be treated as false-like, and the row will not be returned. However, there is no possible value of x such that the statement will be false.

      I'm not a big fan of NULL, but I think the most obvious sign that it's a problem is that so many people think they understand it, when they do not.

      --
      Social scientists are inspired by theories; scientists are humbled by facts.
  3. Wouldn't help by corporate+zombie · · Score: 5, Insightful

    Fine. No null references. So I create the same thing by having a reference to some unique structure (probably named Null) and I still *fail to check for it*.

    Null references don't kill programs. Programmers do.

        -CZ

    1. Re:Wouldn't help by nuttycom · · Score: 4, Interesting

      If you use a sane class for references that could possibly be null (like Option (aka Maybe in haskell) then your compiler will *force* you to handle the null case.

      This is where null went wrong, at least in statically typed languages: it's a hole in the type system that errors fall through into your program. When coding in Java, I make an explicit point to never return null from a method; if I have a situation where no reasonable return value might exist, I use the Option class from functionaljava.org and thus force the client to handle the possibility of the method not returning sensible data. Since Option obeys the monad laws, it's easy to chain together multiple things that might fail (with the bind or flatMap operations.)

  4. Re:There was a bigger mistake: by Rik+Sweeney · · Score: 5, Insightful

    Null-terminated strings. The bane of modern computing.

    Yeah! Let's abolish them, life would be much simplerasdjkaRGfl$!jaekrbFt6634i2u23Q0CCA;DMF ASDJFERR

  5. Algebraic data types by Sneftel · · Score: 4, Informative

    The concept of "no null references" would be very limiting in a language without algebraic datatypes. You can think of null references as a sort of teeny limited braindead algebraic data type, actually. I get the feeling that much of the incredulity here stems from the posters not being familiar with languages that support them. If this describes you, check out Haskell and OCaML! They're the sort of languages that make you a better programmer no matter what language you're using.

    --
    The opinions stated herein do not necessarily represent those of anybody at all. Deal with it.
  6. Re:There was a bigger mistake: by Anonymous Coward · · Score: 4, Funny

    I agree.ï½ï½ï½ï½ï½ï½ï½cï½ï½A
    5ï½)ï½"ï½ï½ï½lï½3åï½ï½ï½SLï½4ï½54Vï½iï½ï½ï½D.O%N|ï½ï½ï½Tï½2nï½ì'iï½ï½ï½;ï½
                                                      ï½,ï½ï½(85ï½Iï½{ï½ï½ï½ï½)ï½Oï½Æ¼ï½%Cï½iwï½ï½ï½ï½ï½ï½I!,.ï½Õ'ï½ï½ï½ï½!ï½òfsQï½ï½zï½ï½Gï½ï½ï½aï½zï½-@ï½ yï½Ë+ï½ï½ï½Xï½ï½ï½ï½"ï½cï½âï½ï½ï½ï½ï½ï½ï½ï½ï½ï½dï½nbÕoeï½ï½ï½ï½lï½ï½ï½ï½ï½;hmï½ï½

  7. Re:There was a bigger mistake: by Panaflex · · Score: 4, Informative

    Which comes from Pascal - which has always had the length at the beginning. Hence why pascal strings always had limits.

    --
    I said no... but I missed and it came out yes.
  8. The mistake was actually not having a standard by Nicolas+MONNET · · Score: 4, Insightful

    for Pascal type strings in C. The fact that null-terminated strings existed wasn't the problem, they make some sense in some respects, such as when you want to pass text of arbitrary length. But the real problem, the real bug was not having a standard way of doing real strings in C. Everybody had to do it himself, poorly. Had there been a standard, no matter how poor, it would have been a starting point to do something better if needed, and would have been better anyway for many uses than C strings. It would have avoided MANY vulnerabilities from common software.

  9. Pass by reference by hobbit · · Score: 4, Informative

    I'm raised on C-style programming languages, and have always used null pointers/references, but I am having trouble of grokking null-reference free language.

    Take a look at C++, in which you can declare methods to be "pass by reference" rather than "pass by pointer". Although the former is actually really just passing a pointer too, the semantics of the construct make it impossible to pass NULL.

    --
    "Wise men talk because they have something to say; fools, because they have to say something" - Plato
  10. An was an even Bigger mistake: by Wargames · · Score: 5, Funny

    Zero. The bane of all. It was the gateway math to all modern problems. It would be so much simpler with just countables. Surely the current crisis, measured in trillions would look so much better without all those zeros.
    Whoever it was who invented zero should take responsibility for all the worlds problems, ex nehilo.

    --
    -- Each tock of the Planck clock is a new world and here we are still life. --
  11. Null as a concept by JustNiz · · Score: 5, Interesting

    Stroustrup's "C++ Programming Language" book introduces a concept called "resource acquisition is initialisation" that was eye-opening enough to me that it forever changed the way I think about code, and also seems relevant to your point.

    The basic idea is that an object is always meant to represent something tangible. As an example, consider the design of file object that abstracts file I/O operations. As a developer, I've come across this one several times, it is normal that such objects have open and close methods, however that makes the design of the object in contradiction with Stroustrup's concept because open/close provided as methods rather than only called in the constructor/destructor means the object may be in existence yet be in a state where it is not associated with an open file. You basically have to grok that having a file object around that doesn't directly map to an open file just adds overhead to the system and is basically bad OO design in that in some sense that object is meaningless.

    Apply the same concept to a reference and you have your answer. If a reference is pointing at nothing, then what is its purpose? The only thing a NULL reference is good for is when the software design ascribes a special meaning to the value NULL. Instead of just meaning address location 0, it gets subverted to mean "variable unassigned" or the "tail node of list" or somesuch. Ascribing multiple meanings to a variable value (especially pointers/references that are only ever meant to hold memory addresses) is one example of bad programming practice known as programming by side-effect which most people agree should be avoided.

    Another point is that in most OO lanugages, references have an extra benefit of being more strongly typed than pointers, menaing that reference is guaranteed to only ever be pointing at an instantiated object of its specific type. That guarantee also gets broken when a reference can be NULL.

  12. Re:There was a bigger mistake: by Anonymous Coward · · Score: 5, Funny

    Just allocate the same amount of memory for everythi

  13. maybe type by j1m+5n0w · · Score: 4, Informative

    Maybe types are wonderful. I first thought they were inconvenient, since you have to pattern match against them any time you want to extract the value, but then I realized that that was something I ought to be doing anyways, and the advantages of never accidentally dereferencing a null pointer vastly outweigh a little extra typing. And then, more recently, I figured out how to use the maybe monad to string together a bunch of things that might fail without having to manually pattern match every time.