Slashdot Mirror


How Your Compiler Can Compromise Application Security

jfruh writes "Most day-to-day programmers have only a general idea of how compilers transform human-readable code into the machine language that actually powers computers. In an attempt to streamline applications, many compilers actually remove code that they perceive to be undefined or unstable, and, as a research group at MIT has found, in doing so can make applications less secure. The good news is the researchers have developed a model and a static checker for identifying unstable code. Their checker is called STACK, and it currently works for checking C/C++ code. The idea is that it will warn programmers about unstable code in their applications, so they can fix it, rather than have the compiler simply leave it out. They also hope it will encourage compiler writers to rethink how they can optimize code in more secure ways. STACK was run against a number of systems written in C/C++ and it found 160 new bugs in the systems tested, including the Linux kernel (32 bugs found), Mozilla (3), Postgres (9) and Python (5). They also found that, of the 8,575 packages in the Debian Wheezy archive that contained C/C++ code, STACK detected at least one instance of unstable code in 3,471 of them, which, as the researchers write (PDF), 'suggests that unstable code is a widespread problem.'"

470 comments

  1. News flash by digitalPhant0m · · Score: 1

    Humans write unstable code.

    1. Re:News flash by war4peace · · Score: 4, Informative

      I would also like to understand what's the definition of "unstable code".

      --
      ...gis sdrawkcab (usually not responding to ACs; don't bother posting as AC)
    2. Re:News flash by Mitchell314 · · Score: 5, Funny

      Code with a finite half-life. Sometimes radiates when it decays. The byproducts tend to be hazardous to health, and most cause symptoms such as headaches, tremors, Carpal Tunnel Syndrome, and Acute Induced Tourette Syndrome. Handle with care. The Daily WTF has an emergency hotline if you or somebody you know has been exposed to unsafe levels of unstable code.

      --
      I read TFA and all I got was this lousy cookie
    3. Re:News flash by Anonymous Coward · · Score: 1

      Code that, just when you're relying on it the most, bursts into tears, slams the front door, and runs away to its mother for a week.

    4. Re:News flash by Cryacin · · Score: 4, Funny

      So that's why you have to restart your computer. Gets rid of dangerous radiation from weapons grade baloneyum decay.

      --
      Science advances one funeral at a time- Max Planck
    5. Re:News flash by EvanED · · Score: 1

      Didn't RTFA because this is /., but I'd guess that it's code that works now but is fragile under a change of compiler, compiler version, optimization level, or platform.

    6. Re:News flash by foobar+bazbot · · Score: 2

      I would also like to understand what's the definition of "unstable code".

      Unstable code is code such that, when you make an arbitrarily small change, you end up rewriting the entire thing.

      Stable code, by contrast, is code such that when you make an arbitrarily small change, the code ends up being restored to its original state, or perhaps engaging in a bounded oscillation, where you and another coder keep changing it back and forth with every release.

    7. Re:News flash by ShanghaiBill · · Score: 4, Informative

      Didn't RTFA because this is /., but I'd guess that it's code that works now but is fragile under a change of compiler, compiler version, optimization level, or platform.

      Yes, you didn't RTFA, because your definition actually makes sense. TFA defines "unstable code" as code with undefined behavior. TFA also claims that many compilers simply DELETE such code. I have never seen a compiler that does that, and I seriously doubt it is really common. Does anyone know of a single compiler that does this? Or is TFA just completely full of crap (as I strongly suspect)?

    8. Re:News flash by EvanED · · Score: 5, Informative

      Yes, you didn't RTFA, because your definition actually makes sense. TFA defines "unstable code" as code with undefined behavior.

      ...and undefined behavior is exactly what causes the things I listed.

      TFA also claims that many compilers simply DELETE such code. I have never seen a compiler that does that, and I seriously doubt it is really common.

      You probably haven't used any desktop compilers.

      Just a sampling:

      • During MS's security push a decade ago, they discovered that the compiler was optimizing away the memset in code such as memset(password, '\0', len); free(password); that was limiting the lifetime of sensitive information, because the assignment to password in the memset was a dead assignment -- it was never read from (not actually undefined behavior, but it is an example of the compiler deleting unused code that was actually there for a purpose)
      • I linked part 3 of this series to you in another response, but the first example in here discusses such an optimization that GCC did which removed security checks in the Linux kernel (see also this series -- look down at "A Fun Case Analysis")
      • The Linux kernel has long been built with -fno-strict-aliasing because optimizations based on the strict aliasing assumption break it (more precisely: code that violates the standard's strict aliasing rules was being "mis-"optimized), though I don't know if it led to security implications
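
      The memset-before-free case in the first bullet is often worked around by writing the zeros through a volatile-qualified pointer. The following is a minimal sketch of that idiom, not the fix Microsoft shipped; secure_zero and wipe_and_free are hypothetical names. Where available, platform routines such as memset_s (optional C11 Annex K), explicit_bzero, or SecureZeroMemory serve the same purpose.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: overwrite len bytes at p with zeros through a
   volatile-qualified pointer, so that each store is treated as an
   access the optimizer should not drop as a dead write. */
void secure_zero(void *p, size_t len)
{
    volatile unsigned char *vp = (volatile unsigned char *)p;
    while (len--)
        *vp++ = 0;
}

/* Hypothetical wrapper: wipe a heap buffer holding a secret, then free it. */
void wipe_and_free(char *secret, size_t len)
{
    if (secret == NULL)
        return;
    secure_zero(secret, len);
    free(secret);
}
```

      A caller would replace the memset(password, '\0', len); free(password); pair with wipe_and_free(password, len).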
    9. Re:News flash by Anonymous Coward · · Score: 1

      TFA defines "unstable code" as code with undefined behavior.

      This is why it's so important to document your code. Once the compiler reads your comments it will understand your intent and decide whether you or your code is unstable.

    10. Re:News flash by gweihir · · Score: 4, Interesting

      That is not "unstable" or "undefined" code. There is already a word for it: dead code. In addition, any programmer worth his/her salt will make sure to define things like that as "volatile", i.e. tell the compiler that they might be accessed at any time from a place the compiler does not see. Which is exactly the security problem here. Don't blame compilers for programmer incompetence....

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    11. Re:News flash by EvanED · · Score: 1

      That is not "unstable" or "undefined" code. There is already a word for it: dead code.

      In all three cases, or just the first? I agree in the first case, but I did say that it wasn't undefined behavior, so you're not bringing anything new to the table.

      For the second and third cases, things are a lot more complicated. In the second case in particular, the removed code was dead precisely because the compiler made inferences based on undefined behavior.

      In addition, any programmer worth his/her salt will make sure to define things like that as "volatile", i.e. tell the compiler that they might be accessed at any time from a place the compiler does not see. Which is exactly the security problem here. Don't blame compilers for programmer incompetence....

      Because every mistake has to be born of incompetence? From a quick search, the same problem seems to have occurred in Linux. Are they incompetent?

      And besides, I doubt your rule is universal. I'm not a programmer for security-sensitive code, but let's take a quintessential example of performance-critical code: an encryption loop. Using volatile removes a number of optimization opportunities from the compiler. If you mark the key as volatile, it's entirely conceivable that your performance-critical code will slow substantially.

    12. Re:News flash by epee1221 · · Score: 2

      I have never seen a compiler that does that, and I seriously doubt it is really common.

      I'm a bit depressed to find a /.er who's never seen GCC :-P
      I once wrote an overflow check wrong -- I tried to write an `if' that would check whether the preceding operation on signed integers had overflowed. Overflow on signed integers is undefined behavior, so once it happens, it is legal for the program to do anything. "Anything" includes updating the variable with the overflowed value and then skipping the condition check, which is what GCC's output code did.
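
      A check that cannot be optimized away has to run before the arithmetic, using only values that are still in range. A minimal sketch, assuming plain int operands; checked_add is a hypothetical name, and GCC and Clang also offer __builtin_add_overflow for the same job.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *out and returns true when the
   sum fits in an int; returns false and leaves *out untouched when the
   addition would overflow. No overflowing operation is ever executed. */
bool checked_add(int a, int b, int *out)
{
    if (b > 0 && a > INT_MAX - b)
        return false;   /* would overflow upward */
    if (b < 0 && a < INT_MIN - b)
        return false;   /* would overflow downward */
    *out = a + b;
    return true;
}
```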

      --
      "The use-mention distinction" is not "enforced here."
    13. Re:News flash by Anonymous Coward · · Score: 0

      I guess kernel devs don't know about __attribute__((__may_alias__)) ?

    14. Re:News flash by davester666 · · Score: 1

      Your code does that too! I thought it might just be me.

      Now, clearly, it's not me, because it happened to others as well.

      --
      Sleep your way to a whiter smile...date a dentist!
    15. Re:News flash by tragedy · · Score: 1

      Compilers will remove code that doesn't actually do anything. This could actually affect the program. Code that does nothing can still act essentially as a noop. Code that is never called can make the executable slightly larger, causing longer load times. If you have a race condition or weird timing error, your code could mysteriously work with a delay and not work without it, or vice versa.

      If that is happening, of course, you have problems that can't really be fixed by a compiler. "Fixing" this issue in the compiler is just as likely to expose new weird bugs. The term "unstable code" is, I agree with you, poorly defined at best and utter nonsense at worst.

    16. Re:News flash by maxwell+demon · · Score: 4, Interesting

      While what you say is true, I think it's not what they mean. Instead what they mean is compilers taking advantage of undefined behaviour you didn't notice. The compiler is allowed to assume that undefined behaviour never happens, and optimize accordingly. The important point is that this can even affect code before the undefined behaviour would occur. For example, consider the following code, where undefined() is some code that causes undefined behaviour:

      if (a>4)
      {
        a=4;
        big=true;
        undefined();
      }
      else
      {
        big=false;
      }
      assert(a<=4);

      Now if a>4, the code inevitably runs into undefined behaviour, and therefore the compiler may assume that a is not larger than 4 right from the start. Therefore it is allowed to compile the complete block to simply

      big=false;

      Note that even the assert doesn't help because the compiler "knows" it cannot trigger anyway, and therefore optimizes it out.

      I think it is not hard to imagine how this can lead to security problems.

      Another nice example (which I read on the gcc mailing list quite some time ago; not an exact quote though):

      bool validate_passwd(char const* user)
      {
        int tries = 0;
        char const* given_passwd = ask_password();
        char const* user_passwd = get_password(user);
        while (strcmp(given_passwd, user_passwd))
        {
          tries = tries++; /* undefined behaviour! */
          if (tries > 3)
            return false; /* allow only to try three times */
          printf("password not valid. Please try again.\n");
          given_passwd = ask_password();
        }
        return true;
      }

      Now if strcmp returns anything but 0, the code inevitably runs into undefined behaviour, therefore the compiler is allowed to assume that never happens, and therefore is allowed to optimize the code to simply

      bool validate_passwd(char const* user)
      {
        char const* given_passwd = ask_password();
        char const* user_passwd = get_password(user);
        return true;
      }

      So there goes your password security.

      --
      The Tao of math: The numbers you can count are not the real numbers.
    17. Re:News flash by pmontra · · Score: 1

      Obviously there is an example in the first page of the paper. What follows is mostly a direct quote from there.

      char *buf = ...;
      char *buf_end = ...;
      unsigned int len = ...;
      if (buf + len >= buf_end)
        return; /* len too large */
      if (buf + len < buf)
        return; /* underflow, buf+len wrapped around */
      /* write to buf[0..len-1] */

      They say they found this code in Chromium, Linux and Python. This code works on flat address spaces but fails on segmented architectures. The C standard states that overflowed pointers are undefined, so gcc assumes that no pointer overflow occurs on any architecture. Its overflow checker evaluates (buf + len

      I suggest that everybody read the paper. It contains many examples of normal looking code that's actually wrong.

    18. Re:News flash by pmontra · · Score: 1

      Slashdot ate my code because of an unquoted <
      I wrote: gcc's overflow checker evaluates (buf + len < buf) as false and optimizes it away paving the way to attacks.

    19. Re:News flash by pmontra · · Score: 1

      +funny +chaos theory

    20. Re:News flash by Anonymous Coward · · Score: 0

      This is one of the reasons we, as an industry, have been adopting unit testing, to find these corner case like checking for overflows.

    21. Re:News flash by u38cg · · Score: 1

      A random /. poster, ShanghaiBill vs. a fucking MIT research group - who should we take more seriously?

      --
      [FUCK BETA]
    22. Re:News flash by TheRaven64 · · Score: 3, Informative

      I have never seen a compiler that does that, and I seriously doubt it is really common. Does anyone know of a single compiler that does this?

      The only compilers I know of that definitely do this are GCC, LLVM, ICC, Open64, ARMCC, and XLC, but others probably do too. Compilers use undefined behaviour to propagate unreachable state and aggressively trim code paths. There's a fun case in ARM's compiler, where you write something like this:

      int x[5];
      int y;
      ...
      for (int i=0 ; i<10 ; i++)
      y += x[i];

      The entire loop is optimised away to an infinite loop. Why? Because accesses to array elements after the end of the array are undefined. This means that, when you write x[i] then either i is in the range 0-4 (inclusive), or you are hitting undefined behaviour. Because the compiler can do anything it wants in cases of undefined behaviour, it is free to assume that they never occur. Therefore, it assumes that, at the end of the loop, i is always less than 5. Therefore, i++ is always less than 10, and therefore the loop will never terminate. Therefore, since the body of the loop has no side effects, it can be elided. Therefore, the declarations of x and y are never read from in anything with side effects and so can be elided. Therefore, the entire function becomes a single branch instruction that just jumps back to itself.

      If your code relies on undefined behaviour, then it's broken. A compiler is entirely free to do whatever it wants in the cases where the behaviour is undefined. Checking for undefined behaviour statically is very hard, however (consider trying to check for correct use of the restrict keyword - you need to do accurate alias analysis on the entire program) and so compilers won't warn you in all cases. Often, the undefined behaviour is only apparent after inlining, at which point it's difficult to tell what the source of the problem was.
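
      One way to keep a loop like the one above inside the array is to derive the bound from the array itself instead of repeating a literal. A minimal sketch; ARRAY_LEN and sum_five are hypothetical names and the element values are made up for illustration.

```c
#include <stddef.h>

/* Number of elements in a true array (not a pointer). */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical example: sums a five-element array. The loop bound comes
   from the array's own size, so x[i] is never read past the end. */
int sum_five(void)
{
    int x[5] = {1, 2, 3, 4, 5};
    int y = 0;
    for (size_t i = 0; i < ARRAY_LEN(x); i++)
        y += x[i];
    return y;
}
```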

      --
      I am TheRaven on Soylent News
    23. Re:News flash by Joce640k · · Score: 2

      I'm more interested in how Linus is going to respond to a bunch of C++ programmers finding 32 bugs in his kernel.

      --
      No sig today...
    24. Re:News flash by DrXym · · Score: 1
      Maybe a developer has added logic like this - "if (someBadCondition) int x = 1 / 0;" to force an exception or fault to be thrown. Maybe on their compiler it causes the code to die (and dump core), which makes it easier to figure out what went wrong post mortem.

      But if the behaviour is undefined then another compiler or one with different optimization enabled might no-op this code so it drops through into the next section where something less obvious might happen. So a tool which checks for deliberately undefined behaviour in code is probably a good thing.

    25. Re:News flash by michelcolman · · Score: 1

      Quite amazing that this doesn't trigger a compiler warning! I can understand that the compiler would "optimize away" code that it considers to be unreachable, undefined, or extraneous, but how hard would it be to let it give a warning so that the programmer can say "hey, wait a minute, I wrote that code for a reason!".

    26. Re:News flash by michelcolman · · Score: 1

      I really recommend reading the actual paper, it was an eye-opener for me. I now have to go check all my source code because the paper gave examples of things I use on a regular basis and never thought twice about. Apparently I know too much for my own good about how binary numbers work. What's worse, I can't believe the compiler doesn't even warn about these things! Very often, there's nothing really wrong with the code except for the fact that is based on a certain knowledge about how computers work, basic things like two's complement, knowledge which programmers are apparently not supposed to have.

      For a very basic simplified example:

      unsigned int a = ...
      int b = a;
      if (b < 0) // a was too large for a signed int
      { // the "if" and the whole code block behind it is "optimized" away because b supposedly can never be negative. Huh?!!
      }

      Of course in this case you could have just checked a instead of b, but in more complicated cases this can really become a problem. All kinds of overflow checking (for example "buf + len < buf" with positive len, intended to catch overflows with very large values of len in a buffer overflow exploit) are suddenly optimized away while they were actually important parts of the security of the application!
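
      Checking a instead of b, as suggested above, could look like the following sketch. It tests the unsigned value against INT_MAX before converting, so nothing depends on what the out-of-range conversion produces; to_int_checked is a hypothetical name.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: converts a to int only when the value fits.
   Returns false and leaves *out untouched when a is too large. */
bool to_int_checked(unsigned int a, int *out)
{
    if (a > (unsigned int)INT_MAX)
        return false;   /* a was too large for a signed int */
    *out = (int)a;
    return true;
}
```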

      Other examples include:
      if (abs(x) < 0)
      abs(x) CAN actually be negative if x equals -2^(n-1) due to integer overflow, but the compiler assumes it is always positive and therefore discards the check that was INTENDED to catch this case. And without any warning whatsoever. Unbelievable.
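
      The abs case can be handled by rejecting the one problematic input before calling abs, rather than inspecting the result afterwards. A minimal sketch, assuming a two's complement int; checked_abs is a hypothetical name.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: stores |x| in *out and returns true, or returns
   false for INT_MIN, whose magnitude does not fit in an int on two's
   complement machines (the assumption made here). */
bool checked_abs(int x, int *out)
{
    if (x == INT_MIN)
        return false;
    *out = abs(x);
    return true;
}
```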

    27. Re:News flash by michelcolman · · Score: 1

      If your code relies on undefined behaviour, then it's broken. A compiler is entirely free to do whatever it wants in the cases where the behaviour is undefined.

      I think this is taking things a bit too far. "Undefined behaviour" was normally supposed to mean "this might give unexpected results on certain systems due to reasons beyond our control", not "we may decide to let pigs fly across the screen just for fun".

      It has now become impossible to use certain features of the processor, very basic things like two's complement arithmetic to check for overflows etcetera, just because the compilers have decided that we shouldn't rely on our knowledge of the fact that a processor uses binary numbers.

      You gave some examples of actually broken code, but in my opinion there's nothing wrong with checking "if (buf + len < buf)". That should just work as intended, or at least give a compiler warning. Instead, the compiler just takes it out without warning because it thinks it knows better than us.

    28. Re:News flash by kantos · · Score: 1

      actually.... the old standby is that undefined behavior is just that:

      Undefined behavior -- behavior, upon use of a nonportable or erroneous program construct, ... for which the standard imposes no requirements. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to having demons fly out of your nose.

      --
      Any and all content posted above may be ignored, considered irrelevant, or otherwise dismissed.
    29. Re:News flash by Wootery · · Score: 1

      Not to worry, I'm sure he'll direct his rage at the C hackers responsible.

    30. Re:News flash by LoRdTAW · · Score: 1

      Windows gave me cancer!

    31. Re:News flash by Alioth · · Score: 3, Interesting

      In that vein, I tried:

      while(1) {
            bar=bar++;
            if(bar > 3) {
                  printf("bar = %d\n", bar);
                  break;
            }
      }

      Under gcc (trying -O0 to -O3 and -Os), this code printed "bar = 4". Compiling the same code with clang resulted in an infinite loop.

    32. Re:News flash by mrwolf007 · · Score: 1

      Maybe a developer has added logic like this - "if (someBadCondition) int x = 1 / 0;" to force an exception or fault to be thrown.

      As opposed to "if (someBadCondition) exit(E_MYBADCONDITION);"? Actually might as well write to a log file before exiting as well.

      But sorry, if a dev actually uses something like in your example he should consider switching to a different job.

    33. Re:News flash by fatphil · · Score: 1

      The example given (memset before free) was not dead code.
      The compiler is entirely at fault in removing that code. It did not see writing to non-automatic storage as being a side effect, when in truth writing to non-automatic storage is *nothing but* a side effect.

      --
      Also FatPhil on SoylentNews, id 863
    34. Re:News flash by gweihir · · Score: 1

      That is not "unstable" or "undefined" code. There is already a word for it: dead code.

      In all three cases, or just the first? I agree in the first case, but I did say that it wasn't undefined behavior, so you're not bringing anything new to the table.

      For the second and third cases, things are a lot more complicated. In the second case in particular, the removed code was dead precisely because the compiler made inferences based on undefined behavior.

      I was just commenting on the memset-free example; I have not had the time to read the paper yet. (But I definitely will.)

      In addition, any programmer worth his/her salt will make sure to define things like that as "volatile", i.e. tell the compiler that they might be accessed at any time from a place the compiler does not see. Which is exactly the security problem here. Don't blame compilers for programmer incompetence....

      Because every mistake has to be born of incompetence? From a quick search, the same problem seems to have occurred in Linux. Are they incompetent?

      And besides, I doubt your rule is universal. I'm not a programmer for security-sensitive code, but let's take a quintessential example of performance-critical code: an encryption loop. Using volatile removes a number of optimization opportunities from the compiler. If you mark the key as volatile, it's entirely conceivable that your performance-critical code will slow substantially.

      Yes, I do not dispute that. And no, the rule is not universal, you have to think about each case individually and understand what exactly the security problem is. For the example at hand, you can make it volatile as late as possible, e.g. by something like this:

      volatile char * dispose;
      ...
      dispose = password;
      for (size_t i = 0; i < len; i++)
        dispose[i] = '\0'; /* volatile stores, one byte at a time */
      free(password);

      Security-critical code is even more tricky than ordinary code. The incompetence is not with the programmer not knowing volatile. It is with the programmer not realizing that the compiler may need extra restrictions here as security critically depends on a side-effect. Unfortunately, a lot of C code with security functionality gets written by programmers that do not understand security and, worse, that do not understand that they do not understand security and hence do not get help. The latter is the type of incompetence that is at the core of the problem.

      I still vastly prefer good C programmers without solid security skills, as they can understand such things when they are explained to them. Unfortunately, the majority is mediocre programmers without security skills, at least judging from the people I deal with professionally.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    35. Re:News flash by mrwolf007 · · Score: 1

      Obviously there is an example in the first page of the paper. What follows is mostly a direct quote from there.

      char *buf = ...;
      char *buf_end = ...;
      unsigned int len = ...;
      if (buf + len >= buf_end)
        return; /* len too large */
      if (buf + len < buf)
        return; /* underflow, buf+len wrapped around */
      /* write to buf[0..len-1] */


      char *buf = ...;
      char *buf_end = ...;
      unsigned int len = ...;
      unsigned int buf_len= buf_end-buf;
      if (len>buf_len)
        return;
      /* write to buf[0..len-1] */

      There fixed it for ya.

      They say they found this code in Chromium, Linux and Python.

      If this had been some newb project i would have just frowned. But hearing those levels of incompetence reach so far hurts my head.
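
      The length-based comparison above can be sketched as a standalone function. This is only an illustration, assuming buf and buf_end point into the same array with buf <= buf_end; fits_in_buffer is a hypothetical name. It mirrors the len > buf_len test above; change <= to < if len equal to the remaining space must also be rejected, as in the quoted original.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true when len bytes fit in [buf, buf_end).
   Compares lengths instead of forming buf + len, so no out-of-range
   pointer is ever computed. Assumes buf <= buf_end in the same array. */
bool fits_in_buffer(const char *buf, const char *buf_end, size_t len)
{
    size_t room = (size_t)(buf_end - buf);
    return len <= room;
}
```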

    36. Re:News flash by parkinglot777 · · Score: 0

      This is why it's so important to document your code. Once the compiler reads your comments it will understand your intent and decide whether you or your code is unstable.

      What? Do you think a compiler reads human readable comments? Do you really know what a "compiler" is? From what you said, I highly doubt you know what this topic is all about.

      Speaking of this article, to me, it is talking about software (STACK http://css.csail.mit.edu/stack/ ) implemented by MIT researchers that could detect bugs from compiled code. These bugs come from optimization in compilers, which remove undefined behavior from the original code while compiling it. As a result, the compiled code could have security holes.

    37. Re:News flash by neurovish · · Score: 1

          tries = tries++; /* undefined behaviour! */

      Maybe I'm overlooking the obvious, but how could foo = foo++ in a loop ever not be undefined behaviour? I'm sure I've used that before in a loop, and things worked as expected, so it wasn't optimized away. Was I just lucky? I wouldn't have consciously coded around undefined behaviour.

    38. Re:News flash by war4peace · · Score: 1

      Thank you for the explanation.
      I am a very very junior coder (did some small projects every now and then) but I see your examples as being buggy, rather than "unstable" code.
      I saw "unstable" as code that you run 100 times in the same way and gives you 5 different results, e.g. 96 times result A (expected), 1 time result B, etc.

      --
      ...gis sdrawkcab (usually not responding to ACs; don't bother posting as AC)
    39. Re:News flash by TheRaven64 · · Score: 1

      That should just work as intended, or at least give a compiler warning. Instead, the compiler just takes it out without warning because it thinks it knows better than us.

      Quite the reverse. The compiler is assuming that you know what you mean and have written what you mean. If what you mean doesn't make sense, then it tries to interpret it as best as it can (in this example, if you've done something that means that the only well-defined values of i are 0-4, then i is always less than 10).

      The problem is that most of these things need global analysis to see if they are as a result of undefined behaviour, but the optimisers only do local analysis.

      --
      I am TheRaven on Soylent News
    40. Re:News flash by michelcolman · · Score: 1

      No, they are not buggy. Not in the slightest. OK, they might be considered "undefined" if you follow the C standard to the letter, because the C standard was written for some kind of non-existent generic calculating device without making any assumptions about the hardware, but if you know what actually happens inside the processor, it makes perfect sense to check the sign of a calculation to detect overflows. People have been doing that for ages, it's extremely efficient. It's what you would write in machine code, too.

      The only thing that makes it buggy, is the way the compiler tries to "optimize" things by making invalid assumptions.

    41. Re:News flash by michelcolman · · Score: 1

      nothing wrong with checking "if (buf + len < buf)". That should just work as intended, or at least give a compiler warning. Instead, the compiler just takes it out without warning because it thinks it knows better than us.

      Quite the reverse. The compiler is assuming that you know what you mean and have written what you mean. If what you mean doesn't make sense, then it tries to interpret it as best as it can (in this example, if you've done something that means that the only well-defined values of i are 0-4, then i is always less than 10).

      That was not the example I was talking about. "if (buf + len < buf)" does make perfect sense, because it catches a bug that is caused precisely by the fact that buf + len can become less than buf due to overflow. You write a program, it crashes because some value becomes negative, you add code to fix it ("if the value is negative, take corrective action") and then the compiler just says "you don't know what you're doing, this value will never be negative". And then your app crashes because, guess what, the value became negative. And you're wondering why the hell the code didn't catch it. Set a breakpoint, the code passes right over it, enter an expression to check the sign, sure thing, the sign IS negative, what the hell is going on?!

      As for the reading from beyond the bounds of an array, really, I still think it's going a bit too far there. When you read from a location that is outside the bounds of an array, you should either get garbage numbers or a seg fault, but the flow of the program should never be altered like that. I do understand what's happening, I'm just saying they went too far. Yes, OK, I get it, it's undefined and therefore they are allowed to let demons come out of my nose, but really?!

    42. Re:News flash by Anonymous Coward · · Score: 0

      any programmer worth his/her salt will make sure to define things like that as "volatile", i.e. tell the compiler that they might be accessed at any time from a place the compiler does not see. Which is exactly the security problem here. Don't blame compilers for programmer incompetence....

      Sure, let's not blame them for programmer incompetence. Let's leave the programmers out of it. Instead, I'll blame compilers for a holocaust of pervasive security vulnerabilities. Seriously, no one should have any patience for your focused hair-splitting pedantry. This kind of talk-to-the-hand mental insulation is why we are stuck for three decades with shit standards written by arrogant little alligator-men, all mouth and no ears.

    43. Re:News flash by gweihir · · Score: 1

      It was dead code from the POV of the compiler, as it has no effect. It is not dead code from the outside, but for that the "volatile" keyword exists,

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    44. Re:News flash by DrXym · · Score: 1
      The exit() method doesn't dump a core and that might be why that code is there, to debug a situation which should not happen post mortem. Apps need a signal which causes a core dump, e.g. a segmentation fault with an illegal memory access (SIGSEGV), or divide by zero (SIGFPE) or something similar. An easy way to do that is perform an illegal action in the code.

      And it's no good pretending "devs shouldn't do that". Maybe they shouldn't, at least not in production code, but it's quite possible they might and that is what a tool might pick up. There are also far more subtle things that are not intentional but are still optimized away. There are links on this page which describe some of them, such as null pointer checks in the kernel which are optimized away simply due to the order some of the tests are made in.
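
      For the deliberate-crash use case, the standard library's abort() raises SIGABRT with defined behaviour, so the optimizer cannot treat the call as something that never happens. A minimal sketch; die_if is a hypothetical name, and whether a core file is actually written depends on system settings such as the core size limit.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: report and terminate through SIGABRT when a
   condition that should never hold turns out to be true. */
void die_if(int bad_condition, const char *what)
{
    if (bad_condition) {
        fprintf(stderr, "fatal: %s\n", what);
        abort();   /* defined behaviour, unlike evaluating 1 / 0 */
    }
}
```

      In place of "if (someBadCondition) int x = 1 / 0;" one would write die_if(someBadCondition, "someBadCondition").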

    45. Re:News flash by Obfuscant · · Score: 1

      knowledge which programmers are apparently not supposed to have.

      Knowledge which programmers of portable code do not have. Yes, if you are writing code only for the Spitfire X2300 CPU you can assume things about how numbers are handled. If you want your code to run on other things, where "other things" means "other people's computers", you need to write portable code. Things like "I know how long an 'int' is" fall into the "a little bit of knowledge is a dangerous thing." And yes, I've fallen into that trap and had to spend a significant bit of time fixing my 32 bit code to run under a 64 bit OS. And a bit of time fixing other people's programs, when they use their knowledge of binary numbers to cut corners.
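
      A minimal sketch of what "not assuming how long an int is" can look like. The struct and function names are hypothetical and not taken from any project in this thread: exact-width types from <stdint.h> pin down field ranges, and the width of unsigned int is computed from <limits.h> rather than assumed.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical record header: exact-width fields instead of int/long,
   so each field has the same range on 32-bit and 64-bit targets. */
struct record_header {
    uint32_t length;
    uint16_t version;
    uint16_t flags;
};

/* Counts the value bits of unsigned int from UINT_MAX instead of assuming 32. */
unsigned bits_in_unsigned(void)
{
    unsigned bits = 0;
    for (unsigned max = UINT_MAX; max != 0; max >>= 1) {
        bits++;
    }
    assert(bits >= 16);   /* the C standard guarantees at least 16 */
    return bits;
}
```

      Code that needs a particular width can then test bits_in_unsigned() once at startup, or simply use the exact-width types throughout.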

      I now have to go check all my source code ...

      I'd like to, and I followed the instructions for compiling it, but it fails at the "autoreconf" step, complaining about an undefined macro "AC_PROG_MKDIR_P". And clang and llvm fail to configure because the "optional" python causes a fatal error when it isn't there.

    46. Re:News flash by JesseMcDonald · · Score: 1

      Maybe I'm overlooking the obvious, but how could foo = foo++ in a loop ever not be undefined behaviour?

      That's the point, it's always undefined behavior. It doesn't even matter whether you're in a loop. It is undefined behavior to modify the stored value of an object more than once between two adjacent sequence points, as this statement does by combining assignment and post-increment of the same variable in the same expression. Even ignoring the fact that this explicitly violates the (C99) standard, the result would depend on which update occurs first; the standard only says that the stored values must be updated before the next sequence point (in any order). If the incremented value is stored first, it will be overwritten by the assignment of the pre-incremented value and nothing will change.

      The correct way to write this expression is simply "foo++", with no assignment.
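
      For illustration, a small sketch of the well-defined form. The helper names bump and count_to are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: increments *foo exactly once, with defined behavior. */
void bump(int *foo)
{
    assert(foo != NULL);
    (*foo)++;   /* a single modification between sequence points */
    /* "*foo = (*foo)++;" would modify *foo twice: undefined in C99 */
}

/* Counts up to limit using a plain post-increment in the loop body. */
int count_to(int limit)
{
    int foo = 0;
    while (foo < limit) {
        foo++;
    }
    return foo;
}
```

      Calling bump(&v) with v equal to 41 leaves v at 42, on every conforming compiler and at every optimization level.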

      --
      "The state is that great fiction by which everyone tries to live at the expense of everyone else." - Bastiat
    47. Re:News flash by david_thornley · · Score: 1

      The C++ Standard (and likely the C Standard) is specific about that: the implementation must cause the same I/O calls and accesses to volatile variables as you'd get with a naive implementation. The compiler was conforming to the Standard in this case. The side effect is irrelevant because it doesn't apply to I/O or volatile variables.

      If you want to require the implementation to do every stupid thing the code does, you can throw out any idea of an optimizing compiler, since optimization is generally changing the implementation of things to be more efficient and get the right result. Compilers can't telepathically tell whether a potentially useless side effect is to be kept or discarded, and there are facilities in the C and C++ languages to specify this.

      The problem is that the programmer did not understand what he or she was doing, and assumed that side effects would be kept depending on whether the programmer really meant them or didn't. The solution is to have code important for security written, or at least reviewed, by people who know what they are doing.

      --
      "When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
    48. Re:News flash by UnderCoverPenguin · · Score: 1

      There's a fun case in ARM's compiler, where you write something like this:
       

      int x[5];
      int y; ...
      for (int i=0 ; i<10 ; i++)
          y += x[i];

      That looks like a common error hidden by undefined behavior: The array size and loop bound are coded with "hard constants". The problem arises when the programmer changes some, but not all, of the constants, so there is a mismatch. Better to use symbolic constants. Then you only need to change the definition of the constant. Also, the symbol can be given a descriptive name.
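
      A sketch of the symbolic-constant version, assuming the intent was to sum every element. SAMPLE_COUNT and sum_samples are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>

#define SAMPLE_COUNT 5   /* hypothetical name; the only place the size is written */

/* Sums all SAMPLE_COUNT elements. The loop bound and the array size come
   from the same constant, so they cannot drift apart. */
int sum_samples(const int x[SAMPLE_COUNT])
{
    assert(x != NULL);
    int y = 0;
    for (size_t i = 0; i < SAMPLE_COUNT; i++) {
        y += x[i];
    }
    return y;
}
```

      Changing the array size now means editing one line, and the loop follows automatically.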

      --
      Don't try to out wierd me, three-eyes. I get stranger things than you, free with my breakfast cereal. --Zaphod Beeblebr
    49. Re:News flash by JesseMcDonald · · Score: 1

      "if (buf + len < buf)" does make perfect sense, because it catches a bug that is caused precisely by the fact that buf + len can become less than buf due to overflow.

      The root cause of this problem isn't overflowing the pointer type, it's overflowing the range of your buffer. Unless your buffer extends all the way to the end of the address space, there are many possible buffer overflow conditions which this code won't catch. You should check len against the size of the buffer, or a pointer to the last element of the buffer (like "if ((0 <= len) && (len <= buff_end - buf))"), both of which are compliant with the standard and won't be eliminated automatically.
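
      The length check above can be sketched as a small function. The name range_fits is hypothetical, and buf_end is assumed to point one past the last byte of the buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if len bytes starting at buf fit before buf_end, else 0.
   Only the difference of two pointers into the same buffer is computed,
   so no out-of-range pointer is ever formed. */
int range_fits(const char *buf, const char *buf_end, ptrdiff_t len)
{
    assert(buf != NULL && buf_end != NULL && buf <= buf_end);
    return (0 <= len) && (len <= buf_end - buf);
}
```

      Because nothing here depends on pointer wraparound, an optimizer has no undefined behavior to exploit and the check stays in the binary.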

      --
      "The state is that great fiction by which everyone tries to live at the expense of everyone else." - Bastiat
    50. Re:News flash by david_thornley · · Score: 1

      OK, so how is the compiler to know which constructs the human intended and which the human didn't intend? The optimizer has to have some rules to work with.

      You seem to want more behavior to be defined, or at least promoted to "unspecified" status. This isn't necessarily a bad idea. The reason C had so much undefined behavior was that it was expected to run on a much more varied collection of computers than we've got now. For example, on arithmetic overflow, some computers would cause an exception or raise a signal or something, while others would wrap the values around, and I don't know how many might have done something different. Specifying a behavior in that case would have meant that a whole class of computers would have slow signed arithmetic, and that wasn't according to the C design philosophy. Nowadays, almost everything is twos-complement, eight bits to the byte, and darn little has hardware overflow protection for integer arithmetic.

      However, you'll have to convince the appropriate Standards committees to go along with you.

      As far as "if (buf + len > INT_MAX)" (or whatever you're using for the value of the largest possible value for the appropriate signed integral data type), which has the advantage of expressing what you're actually testing for.

      --
      "When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
    51. Re:News flash by JesseMcDonald · · Score: 1

      Apparently I know too much for my own good about how binary numbers work.

      No, the problem is that you know just enough to be dangerous. The C standard was written to support real architectures which did not use two's complement to represent signed integers; some of the other options include sign-magnitude and biased representation. These architectures are uncommon today, but that is why signed overflow and underflow are undefined. INT_MAX+1 could be INT_MIN, or it could be zero (or even negative zero). It could also cause an exception. Even with two's complement there is the possibility that signed and unsigned integers are not the same width; the standard allows for "padding" bits.

      Note that conversion from signed to unsigned is defined to always give the two's complement representation, so if you explicitly cast to an unsigned integer type the problem goes away. (Though you have to use "> INT_MAX" rather than "< 0".)

      --
      "The state is that great fiction by which everyone tries to live at the expense of everyone else." - Bastiat
    52. Re:News flash by michelcolman · · Score: 1

      As far as "if (buf + len > INT_MAX)" (or whatever you're using for the value of the largest possible value for the appropriate signed integral data type), which has the advantage of expressing what you're actually testing for.

      No, that will always return false, because if buf + len is bigger than INT_MAX, the result of the addition will be a smaller number due to that very overflow we were testing against. Except if you cast it to a long long or something like that.

      You could use "if (INT_MAX - int(buf) < len)" though. (Assuming buf is a char*, otherwise add some multiplications with sizeof)
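
      A sketch of that rearranged test for plain ints rather than pointers. The function names are invented here; the point is that the subtraction is done on the side that cannot overflow.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if a + b is not representable as an int, else 0.
   The overflowing addition itself is never performed. */
int add_would_overflow(int a, int b)
{
    if (b > 0) return a > INT_MAX - b;
    if (b < 0) return a < INT_MIN - b;
    return 0;
}

/* Adds two ints, asserting in debug builds that the sum is representable. */
int checked_add(int a, int b)
{
    assert(!add_would_overflow(a, b));
    return a + b;
}
```

      Since every intermediate value stays within the range of int, there is no undefined behavior for the optimizer to reason from.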

    53. Re:News flash by war4peace · · Score: 1

      Oh yeah, I totally agree that the compiler has no right to make assumptions and change things around. If anything, it should warn the user and let them make a decision. I (apparently wrongly) assumed this would always be the case.

      Sorry for looking like a dumbass, I have no C knowledge whatsoever, hence I'm asking these stupid questions :)

      --
      ...gis sdrawkcab (usually not responding to ACs; don't bother posting as AC)
    54. Re:News flash by hazah · · Score: 1

      The questions aren't stupid.

    55. Re:News flash by AmiMoJo · · Score: 1

      Embedded C programmers all know that anything you require the compiler to always read/write when told to gets declared "volatile". IMHO failing to declare password as volatile, meaning you require its state in memory to always be set how the code dictates even if the compiler thinks it doesn't need to be, is the fault here.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    56. Re:News flash by tragedy · · Score: 1

      Now if a>4, the code inevitably runs into undefined behaviour, and therefore it may assume that a is not larger than 4 right from the start. Therefore it is allowed to compile the complete block to simply

      Right. Which means that a compiler that removes this code (compiler A) will create a smaller executable than a compiler that doesn't (compiler B). This will affect load times, therefore it could cause race conditions to surface that otherwise wouldn't. Also, it could cause all kinds of really subtle memory complications. And there's the timing issues from when it's actually running, as well. It also doesn't matter if you are using compiler A or compiler B. If you switch from one to the other, all kinds of subtle, hard to track down conditions could be exposed.

    57. Re:News flash by ais523 · · Score: 1

      If you want a "this cannot happen" with core dump in C, just use abort(). Unless its behaviour is specifically overridden, it's specified to exit the program as unsuccessful termination by the C standards (e.g. 7.22.4.1p2 of C11), and to core dump by POSIX (about as portable as you can get where core dumps are concerned; in straight C, they might not necessarily exist). It also has the benefit of being pretty short, and not undefined behaviour at all.

      Of course, that doesn't work in the kernel, but then neither would the other methods you suggested.
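
      A short sketch of the abort() pattern in user-space code. The weekday_name function is a made-up example, not something from the article.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical lookup: maps a day index 0..6 to a short name. */
const char *weekday_name(int day)
{
    static const char *const names[] = {
        "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"
    };
    assert(sizeof names / sizeof names[0] == 7);   /* debug-only table check */

    if (day < 0 || day > 6) {
        /* "This cannot happen": report and terminate abnormally.
           abort() raises SIGABRT; whether a core file appears depends on
           the platform and its settings. */
        fputs("weekday_name: day out of range\n", stderr);
        abort();
    }
    return names[day];
}
```

      Unlike a deliberate null dereference or division by zero, the failing path here is fully defined, so the compiler has no licence to remove it.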

      --
      (1)DOCOMEFROM!2~.2'~#1WHILE:1<-"'?.1$.2'~'"':1/.1$.2'~#0"$#65535'"$"'"'&.1$.2'~'#0$#65535'"$#0'~#32767$#1"
    58. Re:News flash by EvanED · · Score: 1

      IMHO failing to declare password as volatile, meaning you require its state in memory to always be set how the code dictates even if the compiler thinks it doesn't need to be, is the fault here.

      In many cases, yes, though I disagree and think that a fix that ensures that the memset actually occurs is equally as good.

      But in a more general sense, volatile is not always the answer -- in particular, suppose that you're writing an encryption loop. That's likely performance-sensitive code -- so do you really want a possibly significant speed degradation by marking the key as volatile? Probably not.
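
      One way to get a wipe that actually occurs without making the key itself volatile is to do only the final zeroing through a volatile-qualified pointer. This is a sketch under that assumption; the name wipe is hypothetical. Where available, library routines written for this purpose, such as memset_s from the optional C11 Annex K or explicit_bzero on glibc and the BSDs, are probably the better choice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: zeroes len bytes of buf through a volatile-qualified
   pointer. Each store is written as a volatile access, which mainstream
   compilers do not eliminate as a dead store. */
void wipe(void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    volatile unsigned char *p = buf;
    while (len--) {
        *p++ = 0;
    }
}
```

      The key buffer stays an ordinary array, so the encryption loop is optimized as usual, and only the one-off wipe pays for volatile.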

    59. Re:News flash by fatphil · · Score: 1

      The C standard, n1570 5.1.2.3pp2,4, disagrees, as I read it. I would claim that the zeroing of memory is a "needed side effect", and that the compiler has no reason to assert that it is not needed. The burden of proof of non-neededness should be on the one doing the optimisation. It might be that the pointer was locally allocated, and never communicated to an external function, and therefore the object being pointed to was not accessible elsewhere (such as from a signal handler), that would probably be a satisfactory proof, but I'm fairly sure things weren't that cut and dried.

      I don't disagree with your second paragraph at all. Many optimisations are semantics preserving. Use of register variables (this is the big win, once you can do this, you can do most of the others, as there are no visible side effects regarding what's stored in registers), redundant store elimination, common subexpression elimination, loop unrolling, ...

      I agree about review, though. And that review should include runs through static code analysis tools such as coverity, purify, and this new one.

      --
      Also FatPhil on SoylentNews, id 863
    60. Re:News flash by david_thornley · · Score: 1

      I could have sworn I was writing "if ((unsigned)buf + len > INT_MAX)" there. Using unsigned arithmetic is not only well-defined, but does express what the person wants.
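
      A sketch of why the unsigned form is safe to test after the fact: unsigned addition wraps modulo UINT_MAX + 1 by definition, so the comparison below is well-defined. The function names are made up for this example.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if a + b wrapped around, else 0. Unsigned arithmetic is
   defined to wrap, so this test cannot be optimized away as "impossible". */
int uadd_wraps(unsigned a, unsigned b)
{
    return a + b < a;
}

/* Adds two unsigned values, clamping to UINT_MAX instead of wrapping. */
unsigned uadd_saturating(unsigned a, unsigned b)
{
    unsigned sum = uadd_wraps(a, b) ? UINT_MAX : a + b;
    assert(sum >= a);   /* holds whether or not clamping occurred */
    return sum;
}
```

      The same "sum < operand" test on signed ints would rely on undefined behavior, which is exactly the kind of check an optimizer may delete.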

      --
      "When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
    61. Re:News flash by michelcolman · · Score: 1

      But if buf is already unsigned, and the problem is that buf+len wraps around to some positive value less than buf, your fix doesn't help.

      Then again, I suppose the optimizer won't (or at least shouldn't) optimize in that case. So I imagine the bug only occurred if pointers are converted to int and not unsigned int.

    62. Re:News flash by gweihir · · Score: 1

      Indeed. Just my point.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    63. Re:News flash by gweihir · · Score: 1

      You are reading it wrong. Zeroing memory is not a needed side-effect as there is no way to reliably get access to that exact piece of memory again, unless you use very system-specific things which are entirely out of scope.

      You read your wishes into the standard. The only thing a C compiler needs to care about is what the C code sees and any explicitly documented I/O or IPC side-effects. Otherwise, "volatile" would be unneeded.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  2. Film at 11 by Anonymous Coward · · Score: 0

    Running static analysis tools on a whole repository gives lots of warnings.
    Who'da thunk it?

  3. TFA does a poor job of defining what's happening by istartedi · · Score: 4, Insightful

    If my C code contains *foo=2, the compiler can't just leave that out. If my code contains if (foo) { *foo=2 } else { return EDUFUS; } it can verify that my code is checking for NULL pointers. That's nice; but the questions remain:

    What is "unstable code" and how can a compiler leave it out? If the compiler can leave it out, it's unreachable code and/or code that is devoid of semantics. No sane compiler can alter the semantics of your code, at least no compiler I would want to use. I'd rather set -Wall and get a warning.

    --
    For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
  4. "Unstable" code? WTF? by Anonymous Coward · · Score: 1

    "Unstable" code is not a technical term used by any self-respecting programmer. Researchers love to make up terms that nobody but themselves use. Props to the MIT News article for correctly avoiding that term.

    1. Re:"Unstable" code? WTF? by viperidaenz · · Score: 1

      The MIT article also incorrectly refers to Java. Which isn't compile-time optimized.

    2. Re:"Unstable" code? WTF? by Tablizer · · Score: 1

      "Unstable" code is not a technical term used by any self-respecting programmer.

      I believe "fucked-up steaming pile of rotting garbage" is the proper term.

    3. Re:"Unstable" code? WTF? by Lehk228 · · Score: 0

      I take it you have not been introduced to Microsoft Windows?

      --
      Snowden and Manning are heroes.
  5. How is this news? by Anonymous Coward · · Score: 0

    Compilers have been ignoring meatspace problems for years. It's well known that most compilers will both ignore some bad chunks of code and do their own optimizations (like unrolling).

    If the binaries it compiles to work as intended and pass validation, what's the issue? The compiler being a point of trust is something that's been rehashed constantly with people continually reposting the 70 year old ken article

  6. Inflammatory Subject by Imagix · · Score: 4, Informative
    This is complaining because code which is already broken is broken more by the compiler? The programmer is already causing unpredictable things to happen, so even "leaving the code in" still provides no assurances of correct behaviour. An example of how the article is skewed:

    Since C/C++ is fairly liberal about allowing undefined behavior

    No, it's not. The language forbids undefined behavior. If your program invokes undefined behavior, it is no longer well-formed C or C++.

    1. Re:Inflammatory Subject by Murdoch5 · · Score: 2

      You're right, there should never be undefined behavior or clueless development. If things are getting compiled out of the code then you clearly don't know enough about the compiler and language. I love when developers blame things like pointers and memory faults instead of the misuse of these by bad programming.

    2. Re:Inflammatory Subject by Anonymous Coward · · Score: 0

      C/C++ has a lot of things which are left to the implementation. It doesn't forbid them. It simply says that it is implementation specific. See: the size of various data types, and other things.

    3. Re:Inflammatory Subject by HiThere · · Score: 1

      That's nice. But when a language invites such things, that *is* a flaw in the language. I basically distrust pointers, but especially any pointers on which the user does arithmetic. Some people think that's a snazzy way to move through an array. I consider it recklessly dangerous stupidity, which is leaving you wide open to an undetected error with a simple typo.

      --

      I think we've pushed this "anyone can grow up to be president" thing too far.
    4. Re:Inflammatory Subject by Anonymous Coward · · Score: 0

      C++ does not forbid undefined behaviour. This is pretty easy to see if you look at features like "new". Some compilers will initialize memory allocated by new, others will not. It is not safe to assume we know the contents of memory returned by new as it can vary depending on compiler/platform.

    5. Re:Inflammatory Subject by Anonymous Coward · · Score: 0

      There is a sense in which the article is correct. C and C++ are fairly liberal in that a conforming implementation (e.g. compiler) is not required to reject such code. So although the specifications imply that certain programs are invalid, they permit every implementation to accept all such programs.

      That in itself isn't all bad, since one might imagine designing a system that rejects all such programs. However: in these languages almost any line of code might or might not invoke undefined behaviour depending upon preconditions set up by arbitrarily distant code, implementation dependent details, or external input. The general problem of testing for undefined behaviour in these languages is uncomputable. Any compiler will therefore always have either false positives (spurious warnings or errors) or false negatives (invalid programs compiling without warning).

      The only general way to determine whether a C or C++ program invokes undefined behaviour is by doing something equivalent to running it.

    6. Re:Inflammatory Subject by Murdoch5 · · Score: 1

      You can't blame a language for flaws when you decide to use the features you consider dangerous. Pointers are one of the most powerful features of C and if you know how to use them correctly and safely they will be very very powerful. Just because a pointer can completely garble memory and completely corrupt your stack and heap doesn't mean it will. C and ASM assume the programmer is smart enough to take memory management into their own hands and personally I completely agree. I hate all forms of automatic management and garbage collection, they both don't work nearly as well as a skilled programmer with a good knowledge of pointers.

    7. Re:Inflammatory Subject by Anonymous Coward · · Score: 0

      >I consider it recklessly dangerous stupidity
      That's because you are very stupid and should not be let near a compiler. Please stay within the confines of garbage collected languages, not that you should be trusted with those either.

    8. Re:Inflammatory Subject by osu-neko · · Score: 1

      "A man's got to know his limitations..."

      --
      "Convictions are more dangerous enemies of truth than lies."
    9. Re:Inflammatory Subject by smellotron · · Score: 1

      I basically distrust pointers, but especially any pointers on which the user does arithmetic. Some people think that's a snazzy way to move through an array.

      In C and C++, all array iteration is pointer arithmetic, so your "some people" is really everyone. Always remember that foo[n] is equivalent to n[foo] is equivalent to *(foo+n).
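
      The equivalence can be seen in a small sketch; the three function names are invented for this example and all three compute the same sum.

```c
#include <assert.h>
#include <stddef.h>

/* Sums n elements with ordinary indexing. */
int sum_indexed(const int *foo, size_t n)
{
    assert(foo != NULL || n == 0);
    int total = 0;
    for (size_t i = 0; i < n; i++) total += foo[i];
    return total;
}

/* Same loop with the operands of [] swapped: i[foo] means *(i + foo). */
int sum_swapped(const int *foo, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) total += i[foo];
    return total;
}

/* Same loop with the pointer arithmetic written out. */
int sum_pointer(const int *foo, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) total += *(foo + i);
    return total;
}
```

      For an array holding 1, 2, 3, 4 each of the three returns 10.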

    10. Re:Inflammatory Subject by Error27 · · Score: 1

      It doesn't forbid it. GCC doesn't even warn about it when it silently removes things. In the kernel, we turn most of these optimizations off now but before then it did cause kernel security bugs.

      My guess is that you didn't read the PDF?

    11. Re:Inflammatory Subject by Anonymous Coward · · Score: 0

      The language forbids undefined behavior. If your program invokes undefined behavior, it is no longer well-formed C or C++.

      BZZZT! Wrong, but thanks for playing.

      Here's a discussion from comp.lang.c++.moderated which explains it best:

      >> > 1. Program has undefined behavior is ill-formed. Is it true?
      >>
      >> No. It is possible to write a well-formed program that (conditionally) has undefined behaviour.
      >
      For example:

      int main(int argc, char**argv) { return (int)argv[1][0]; }

      This may or may not produce undefined behavior, depending on what arguments are supplied to the program at run time.

      Ill-formed is a compile-time property. Undefined behavior occurs at run time.
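
      For comparison, a sketch of the same access with the run-time precondition checked first. The function name and the -1 return convention are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the first character of the first argument as a non-negative
   value, or -1 if no argument was supplied. argv[1] is only read when
   argc says it exists. */
int first_arg_char(int argc, char **argv)
{
    assert(argv != NULL);
    if (argc < 2 || argv[1] == NULL) {
        return -1;
    }
    return (unsigned char)argv[1][0];
}
```

      With this guard the program is well-formed and also free of undefined behavior for every possible command line.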

    12. Re:Inflammatory Subject by Anonymous Coward · · Score: 0

      In C/C++, is there a way to move through a primitive array without using pointers? I'm pretty sure there isn't...

    13. Re:Inflammatory Subject by Anonymous Coward · · Score: 0

      You're right, there should never be undefined behavior or clueless development.

      I can't tell if you're being sarcastic or not. This stuff seems like victim-blaming: if the standard can specify an explicit situation where "all bets are off," then the standard should be revised to require that the compiler detect that situation and throw an error. These norms were appropriate when we had to compile C on machines with <1MByte of RAM, but aren't now. I realize the designers of the C standards all still remember that world, but I'm suggesting that they stop living in it.

    14. Re:Inflammatory Subject by AmiMoJo · · Score: 1

      I think what they meant was that C compilers are fairly liberal about allowing undefined behaviour. They usually don't generate any kind of warning, mainly because it's really really hard to detect when things like that happen and spit out a useful error message.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    15. Re:Inflammatory Subject by HiThere · · Score: 1

      If the compiler handles it, you SHOULDN'T need to worry about typos...most of the time, and you can pay particular attention to those times. Yes, under the covers it's pointer manipulation, but that's the kind of thing computers are good at, and people are terrible at.

      --

      I think we've pushed this "anyone can grow up to be president" thing too far.
    16. Re:Inflammatory Subject by HiThere · · Score: 1

      On computers there's no way to move through an array without using pointers. But if what you specify is array indexes it's much less error prone.

      I don't really object to pointers per se. I object to humans manipulating them. It's highly error prone, and not necessary now that compilers have gotten out of the 8-bit environment. (In the olden days there was trouble fitting a decent compiler into memory...so they had to be simple.)

      --

      I think we've pushed this "anyone can grow up to be president" thing too far.
  7. Null pointer detection at compile time by tepples · · Score: 1

    I'd rather set -Wall and get a warning.

    There are some undefined behaviors that can't be detected so easily at compile time, at least not without a big pile of extensions to the C language. For example, if a pointer is passed to a function, is the function allowed to dereference it without first checking it for NULL? The Rust language doesn't allow assignment of NULL to a pointer variable unless it's declared as an "option type" (Rust's term for a value that can be a pointer or None).

    1. Re:Null pointer detection at compile time by Zero__Kelvin · · Score: 4, Insightful

      "For example, if a pointer is passed to a function, is the function allowed to dereference it without first checking it for NULL?"

      Of course it is, and it is supposed to be able to do so. If you were an embedded systems programmer you would know that, and also know why. Next you'll be complaining that languages allow infinite loops (again, a very useful thing to be able to do). C doesn't protect the programmer from himself, and that's by design. Compilers have switches for a reason. If they don't know how it is being built or what the purpose of the code is then they can't possibly determine with another program if the code is "unstable".

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    2. Re:Null pointer detection at compile time by EvanED · · Score: 2

      Of course it is, and it is supposed to be able to do so.

      Actually no, you're not, or you're programming in Some-C-Like-Language and not C. In C, dereferencing a NULL pointer is always undefined behavior, and compilers are allowed (though presumably very unlikely to on embedded platforms) to make transformations based on that assumption, such as the following:

      void f(int * p) {
        int x = *p;
        if (p == NULL) {
          g();
        }
      }

      C compilers are allowed to optimize away the null check and subsequent call to g(), and if you rely on them not doing so, you're relying on behavior that's not guaranteed by the standard.
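
      A sketch of the ordering that keeps the check meaningful: test first, dereference second. The counter g_calls and both function names are hypothetical and exist only to make the effect observable.

```c
#include <assert.h>
#include <stddef.h>

int g_calls = 0;                 /* hypothetical counter for this example */
void g(void) { g_calls++; }

/* The check comes before any dereference, so no path reaches *p with a
   null pointer and the compiler cannot assume the check away. */
int f_checked(const int *p)
{
    if (p == NULL) {
        g();
        return 0;
    }
    return *p;
}

/* Variant for callers that have already proved p is non-null: the
   precondition is documented with assert and there is no fallback. */
int f_trusting(const int *p)
{
    assert(p != NULL);
    return *p;
}
```

      Passing NULL to f_checked calls g() once and returns 0; passing a valid pointer returns the pointed-to value and leaves g() alone.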

    3. Re:Null pointer detection at compile time by aiht · · Score: 1

      allowed to dereference it without first checking it for NULL?

      dereferencing a NULL pointer is always undefined behavior

      Dereferencing without checking for NULL is not the same as dereferencing NULL.
      Just because the compiler may not be able to prove that you never call that function with a NULL pointer, that does not mean you can't prove it yourself. If that is the case, why check again?

    4. Re:Null pointer detection at compile time by EvanED · · Score: 1

      I didn't say you should always check for null, or anything like it.

      What I said was that if you have code where you rely on a particular behavior of dereferencing NULL pointers, that code's behavior is undefined in that case and you may be surprised by different behavior. In particular, if you dereference a pointer p, the optimizer is allowed to apply transformations based on the assumption that p is non-null. More to the point, all of that is true (if with less practical consequence, as compilers will likely know their targets) even on platforms where you expect low addresses like 0 to work like any other address.

    5. Re:Null pointer detection at compile time by Zero__Kelvin · · Score: 1

      Show me in the standard where compilers are allowed to "optimize away" a check for null. By the way that wouldn't be "optimizing", that would be what we call "breaking", and I don't mean with a break statement.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    6. Re:Null pointer detection at compile time by EvanED · · Score: 1

      The key excerpt (this exact quote is apparently from C++03 standard, para 1.9.5):

      A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).

      There's nothing in the standard that explicitly says "you can remove null checks", but in cases like this it is a consequence of other rules. I'll give you an example. This is not likely how a compiler would arrive at optimizing away the null check in my example, but it's also not entirely implausible, and it's certainly an allowable explanation.

      Consider the function from my earlier post:

      void f(int * p) {
        int x = *p;
        if (p == NULL) {
          g();
        }
      }

      The standard has the "as-if" rule, which allows the compiler to make changes to the program that can't be observed from within the program itself (i.e., without invoking undefined behavior or using a debugger or something like that). For example, the following function variant behaves the same as the first as far as the standard is concerned:

      void f(int * p) {
        if (p == NULL) { // *
          int x = *p;
          if (p == NULL) { // **
            g();
          }
        }
        else {
          int x = *p; // ***
          if (p == NULL) {
            g();
          }
        }
      }

      If execution reaches **, we know that p is guaranteed to be NULL. (Or rather, we are allowed to assume that p is guaranteed to be null.) The reason should be pretty obvious: we took the true branch of * which meant that p was NULL there, there is no other way to reach ** (e.g. no gotos), there is no assignment to p between * and **, and there is no way within the semantics of C for anyone else to get the address of p to modify it (and even if there was a way that another thread could, that'd be a race condition which is also UB).

      As a result, by the as-if rule the compiler is allowed to optimize away that NULL check:

      void f(int * p) {
        if (p == NULL) { // *
          int x = *p;
          g();
        }
        else {
          int x = *p; // ***
          if (p == NULL) {
            g();
          }
        }
      }

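      Now let's look at the other branch. If execution reaches ***, we know that p is not NULL, by the same kind of analysis as above: we took the false branch of *. The remaining check in the else branch can therefore never succeed, so by the as-if rule the compiler may drop it along with its call to g().

      That leaves the true branch of *. There p is NULL, so the "int x = *p;" right after * dereferences a NULL pointer, which by the standard is UB (C++03 1.9.4: "Certain other operations are described in this International Standard as undefined (for example, the effect of dereferencing the null pointer).")

      Combined with the excerpt quoted at the top, this means that the compiler is allowed to perform any transformation it wants to the true branch of *, because every execution that reaches it will invoke UB. In particular, it may drop the call to g() there as well, so the following is an allowable transformation:

      void f(int * p) {
        if (p == NULL) { // *
          int x = *p;
        }
        else {
          int x = *p; // ***
        }
      }

      Both branches are now identical, so the function collapses to a bare "int x = *p;" with neither the null check nor the call to g().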

    7. Re:Null pointer detection at compile time by Anonymous Coward · · Score: 0

      NULL pointers are actually GOOD, because they will facilitate Fast Fail instead of the byzantine crap that can come from freed or uninitialized pointers. The latter two are major security risks.

    8. Re:Null pointer detection at compile time by AmiMoJo · · Score: 1

      NULL is a valid pointer on some embedded systems. There is addressable memory at location zero, and no OS or memory management so you are free to use it. Thus, the standard cannot include any special functionality for NULL pointers or it would break those systems.

      I suppose it would be more accurate to say that sometimes the compiler is allowed to optimize away checks on pointers. The fact that the check is for NULL is meaningless, because to the compiler NULL is no more special than any other number.

      I think a lot of people who have not done system level programming get confused by this.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    9. Re:Null pointer detection at compile time by david_thornley · · Score: 1

      You're confusing the null pointer and the all-zero-bits pointer. They're two separate things.

      The null pointer is just a specific value of a pointer type that's not ever going to be valid. It is denoted in the language by a constant integer expression evaluating to 0, and so lots of compilers did the easy thing and made the null pointer all zero bits and said, basically, "Don't dereference a zero-value pointer". That isn't required.

      If your compiler on your embedded system treats all-bits-zero as NULL, when it's important to dereference memory value zero, it's doing something wrong.
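
      A minimal sketch of that distinction, assuming nothing about the platform. The names struct node and node_init are made up for illustration. Assigning NULL (or a constant 0) always yields the null pointer, whatever bit pattern the implementation uses for it; zero-filling the bytes with memset only does so on implementations where the null pointer happens to be all-bits-zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node, used only for illustration. */
struct node {
    int value;
    struct node *next;
};

/* Portable initialization: NULL is converted to the null pointer of
   the right type, whatever its underlying representation is. */
static void node_init(struct node *n, int value)
{
    assert(n != NULL);
    n->value = value;
    n->next = NULL;
}
```

      After node_init(&n, 42), both n.next == NULL and !n.next hold on any conforming implementation.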

      --
      "When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
    10. Re:Null pointer detection at compile time by Zero__Kelvin · · Score: 1

      No. You are confusing dereferencing a null pointer, which is in fact undefined, with checking a pointer for null which is 100% defined. Read the comp.lang.c FAQ null pointer section for a better understanding of where you went wrong.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    11. Re:Null pointer detection at compile time by Zero__Kelvin · · Score: 1

      "What I said was that if you have code where you rely on a particular behavior of dereferencing NULL pointers, that code's behavior is undefined in that case and you may be surprised by different behavior. In particular, if you dereference a pointer p, the optimizer is allowed to apply transformations based on the assumption that p is non-null."

      No. You specifically said they were allowed to optimize the check for null away. I just copied and pasted the following from your post.

      "C compilers are allowed to optimize away the null check and subsequent call to g()"

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    12. Re:Null pointer detection at compile time by EvanED · · Score: 1

      "What I said was that if you have code where you rely on a particular behavior of dereferencing NULL pointers, that code's behavior is undefined in that case and you may be surprised by different behavior. In particular, if you dereference a pointer p, the optimizer is allowed to apply transformations based on the assumption that p is non-null."

      No. You specifically said they were allowed to optimize the check for null away. I just copied and pasted the following from your post.

      "C compilers are allowed to optimize away the null check and subsequent call to g()"

      And the contents of that second quote still holds. To relate it to the first part, here's the code in question again:

      void f(int * p) {
        int x = *p; // *
        if (p == NULL) {
          g();
        }
      }

      Line * dereferences p, and thus the behavior is undefined when p is NULL. If you depend on a particular behavior when p is NULL, such as calling g, you may be surprised by different behavior (not calling g). In particular, because you've dereferenced p, the optimizer is allowed to apply transformations based on the assumption that p is non-null, in this case, removing a conditional that can never be true in conforming executions.
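
      For contrast, here is a sketch of the ordering that does not have this problem. The helper name read_or_default and its fallback parameter are hypothetical. Because the check comes before the dereference, no execution reaches *p with a null p, so the compiler has no basis for dropping the check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: test the pointer first, dereference second.
   The null case returns early, so *p is only evaluated for non-null p. */
static int read_or_default(const int *p, int fallback)
{
    if (p == NULL) {
        return fallback;
    }
    return *p;
}
```

      Calling read_or_default(NULL, 7) gives back 7 instead of invoking undefined behavior.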

    13. Re:Null pointer detection at compile time by EvanED · · Score: 1

      No. You are confusing dereferencing a null pointer, which is in fact undefined, with checking a pointer for null which is 100% defined.

      No, I'm not. You, I think, are confusing removing the null check because it's a null check (which of course doesn't make sense) with removing the null check because it is never false.

      But please, feel free to explain what specific part of the post you replied to is wrong. And maybe you want to also inform Chris Lattner (director of Apple's Developer Tools division and founding author of LLVM, which was the subject of his Master's and PhD theses) about how he is confusing dereferencing a null pointer with checking a pointer for null (immediately after "This would give us these two steps:" he shows the optimization that you're claiming I'm wrong about, though he doesn't explicitly explain it), and maybe also John Regehr (CS prof at the Univ. of Utah who does program analysis research) about how his explanation is also wrong (see "A Fun Case Analysis").

    14. Re:Null pointer detection at compile time by EvanED · · Score: 1

      You, I think, are confusing removing the null check because it's a null check (which of course doesn't make sense) with removing the null check because it is never false.

      I forgot to finish my sentence:

      "...removing the null check because it is never true in executions that do not invoke undefined behavior." (Or maybe a better wording would have been to say that the compiler can remove the null check because it is only true in executions that invoke undefined behavior, and thus C semantics impose no restrictions on the program's behavior.)

    15. Re:Null pointer detection at compile time by EvanED · · Score: 1

      Oh balls. My explanation is a bit wrong as-stated because my description is operating with the opposite sense of the outer conditional.

      If you pretend that I said p != NULL instead of p == NULL everywhere, that fixes it.

    16. Re:Null pointer detection at compile time by EvanED · · Score: 1

      I've already replied and then replied to myself, but I just noticed that the code and description in my post are backwards, and so the explanation is wrong as-stated. It's possible that threw you off. Read p == NULL as p != NULL in all the code snippets and see if it makes more sense.

      (In particular: (1) The else branch of the outer if needs to correspond to p == NULL to match the description's statement that code that reaches the else block will always invoke undefined behavior and thus any transformation is legal. (2) The condition in line ** needs to match the condition in line *, which is what allows the compiler to optimize away ** because it's redundant with * being true.)

    17. Re:Null pointer detection at compile time by EvanED · · Score: 1

      Thus, the standard cannot include any special functionality for NULL pointers or it would break those systems.

      Well... it does... dereferencing a NULL pointer is UB, and compilers are allowed by the standard to make transformations based on that. Those are just facts.

      Compilers for those systems are free to treat NULL as just any pointer of course, and that's likely what they do. On embedded systems, worrying about the compiler performing transformations based on UB in the face of NULL dereferences is likely academic for that reason. But programs that rely on that are still relying on a specific behavior for something that the standard alone leaves undefined.

      (Besides, just because you can put something at address 0 doesn't mean that it necessarily completely breaks on such systems -- the compiler and runtime could ensure that no object gets loaded sufficiently close to 0 to cause a problem. Maybe the first 16 bytes are reserved or something instead of leaving an entire 4K page unmapped, or maybe the program's code itself is put at address 0.)

      I suppose it would be more accurate to say that sometimes the compiler is allowed to optimize away checks on pointers. The fact that the check is for NULL is meaningless, because to the compiler NULL is no more special than any other number.

      The compiler's allowed to optimize away lots of things. :-) If I had to distill this particular "problem" down, I'd say "if the code dereferences a pointer, the compiler is allowed to assume it's non-NULL around that point."

      And in that sense it's not really true that the check being against NULL specifically is meaningless -- because if the compiler is tracking "can assume p is non-NULL" or "p may be NULL", then NULL checks are the only thing it could remove.

      (I guess if there was some other address for which the compiler could definitively establish that dereferencing it would always be illegal -- e.g. it knew it would be in unallocated space or in the program's text area -- checks against that could still be optimized away in a similar fashion, and in that sense NULL is not special. But AFAIK, there's no specific address that the standard guarantees will produce UB on a dereference other than NULL, so in that sense NULL is special, and that's probably the key point and I shouldn't be burying it down here in a parenthetical note. :-))

    18. Re:Null pointer detection at compile time by Zero__Kelvin · · Score: 1

      ". If you depend on a particular behavior when p is NULL, such as calling g, you may be surprised by different behavior (not calling g)."

      No. You're wrong and you need to read the damn link I posted. What x will be is undefined. p is a pointer, null is a valid pointer value, and comparison of a pointer to null is defined as valid by the standard. If the compiler does anything but check to see if the pointer p is null and execute g() if p is a null pointer then the compiler is broken.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    19. Re:Null pointer detection at compile time by Zero__Kelvin · · Score: 1

      Again, comparison of a pointer to null is valid and defined. Read the comp.lang.c FAQ on pointers.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    20. Re:Null pointer detection at compile time by Zero__Kelvin · · Score: 1

      Post the code rather than trying to explain it in prose and I'll tell you if you understand it yet, but at least you finally see where you went wrong. That's progress and I wish you well!

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    21. Re:Null pointer detection at compile time by AmiMoJo · · Score: 1

      Dereferencing a null pointer is a valid operation in C. Embedded systems do it all the time. For example I have a debug routine that dumps the entire memory to flash, and it uses a pointer that starts at address zero. I dereference it to read the byte at address zero, and then increment it.

      On another embedded system I worked on, the SFRs start from address zero and occupy the lower 64 bytes of the address space. To access one of the I/O ports you have to write its register at address zero, and the include files with I/O definitions include a pointer to that address so you can access it by name through a struct.

      How else would you read the byte at address zero? You could create a memory section and put a variable in it I suppose, but that's a massive hack. Is address zero just inaccessible in your mind?

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    22. Re:Null pointer detection at compile time by AmiMoJo · · Score: 1

      Wrong. In fact, it's very obviously and provably wrong because if NULL were anything other than zero this code would malfunction:

      if (NULL)
              DoSomething();

      Don't take my word for it though, read the C89 standard:

      3.2.2.3 Pointers ...
      An integral constant expression with the value 0, or such an
      expression cast to type void * , is called a null pointer constant. If
      a null pointer constant is assigned to or compared for equality to a
      pointer, the constant is converted to a pointer of that type. Such a
      pointer, called a null pointer, is guaranteed to compare unequal to a
      pointer to any object or function.

      So what it's saying is that NULL pointers must be zero. Functions can never have an address of zero, but variables and other pointers can. Again, if NULL were any other value testing a pointer in the following (legal) way would break:

      if (pointer)
              DoSomething();

      I checked and GCC for AVR, GCC for ARM, the Microchip C18, the old HiTech C, IAR and NEC's proprietary compiler for their 4 bit CPUs all have NULL == 0 and allow pointers to address 0.

      If you ever find NULL is not zero, you are doing it wrong.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    23. Re:Null pointer detection at compile time by Wuhao · · Score: 1

      Post the code rather than trying to explain it in prose and I'll tell you if you understand it yet, but at least you finally see where you went wrong. That's progress and I wish you well!

      It sounds like you are a very passionate hobbyist, but one of the nice things about a forum like Slashdot is that there are a lot of professionals like EvanED who will offer you free advice. I suggest you start listening and stop biting the hand that feeds you! Who knows, Evan might have even made some of the software you're using right now. If you work and study hard and ask good questions (respectfully!) you might even be able to work on it with guys like him someday.

    24. Re:Null pointer detection at compile time by Zero__Kelvin · · Score: 1

      You're an idiot. I've been an embedded systems programmer for 30 years and he was wrong, which he finally admitted. Since he admitted to it in the post above the one you quoted from I have to assume your reading comprehension skills are on a par with your C programming sk1llz.. Good luck learning C !

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    25. Re:Null pointer detection at compile time by Wuhao · · Score: 1

      You're an idiot. I've been an embedded systems programmer for 30 years and he was wrong, which he finally admitted. Since he admitted to it in the post above the one you quoted from I have to assume your reading comprehension skills are on a par with your C programming sk1llz.. Good luck learning C !

      Uh, I think you need to re-read his post a little more carefully. This is getting a little embarrassing for you, and if you've been doing embedded development for 30 years and still don't know how optimizing compilers work, I feel REALLY bad for you. I can see why you're so insecure.

    26. Re:Null pointer detection at compile time by EvanED · · Score: 1

      There's an unconditional dereference of the pointer!
      void f(int * p) {
          int x = *p; // Right here! Dereference!
          if (p == NULL) {
              g();
          }
      }

      Why are you ignoring the dereference and focusing on just the comparison?

      When f(NULL) is called, the code can do anything, and that includes ignoring the conditional. That's what UB means! Do you really want me to quote passages from the standards at you?

      And BTW, for someone who complains about me not reading the damn links, you live in an awfully clear house.

    27. Re:Null pointer detection at compile time by EvanED · · Score: 1

      Here is a hopefully-improved description:

      The key excerpt is (this exact quote is from the C++03 standard, para 1.9.5):

      A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).
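
      As a small illustration of the as-if rule (my own example, not from the standard; the function names are made up): the two functions below return the same value for every n small enough that neither computation wraps around, which is the assumption here. A compiler that can prove two pieces of code always agree is free to emit one in place of the other, because no conforming program can tell them apart.

```c
#include <assert.h>

/* Straightforward version: add 1 + 2 + ... + n in a loop. */
static unsigned sum_loop(unsigned n)
{
    unsigned total = 0;
    for (unsigned i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}

/* Closed form of the same sum. Assumes n is small enough that
   n * (n + 1) does not wrap around. */
static unsigned sum_closed(unsigned n)
{
    return n * (n + 1u) / 2u;
}
```

      For n = 10 both return 55; nothing observable distinguishes the loop from the formula.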

      There's nothing in the standard that explicitly says "you can remove null checks" of course, but in cases like "my" example it is a consequence of other rules. I'll give you an example. This is not how a compiler would likely arrive at optimizing away the null check in my example, but it's also not entirely implausible, and it's certainly an allowable explanation.

      Consider the function from my earlier post:

      void f(int * p) {
      int x = *p;
      if (p == NULL) {
      g();
      }
      }

      The standard has the "as-if" rule, which allows the compiler to make changes to the program that do not have observable effects from outside the program (e.g. not looking at it in a debugger or such). For example, the following function variant behaves the same as the first as far as the standard is concerned (note that the two branches are identical):

      void f(int * p) {
      if (p == NULL) { // *
      int x = *p; // **
      if (p == NULL) {
      g();
      }
      }
      else {
      // p is non-null
      int x = *p;
      if (p == NULL) { // ***
      g();
      }
      }
      }

      Let's look at the else branch first.

      If execution reaches ***, we know that p is guaranteed to be non-null. (Or rather, we are allowed to assume that p is non-null because the only way it could become null is via UB, as explained next.) The reason should be pretty obvious: we took the false branch of * which meant that p was non-NULL at *, there is no other way to reach *** (e.g. no gotos into that block), there is no assignment to p between * and ***, and there is no way within the semantics of C for anyone else to get the address of p to modify it (and even if there was a way that another thread could, that'd be a race condition which is also UB).

      Now, since p is guaranteed to be non-null, the condition at *** will never hold and the null check and potential call to g can be optimized away per the as-if rule (because neither an if statement itself, a non-executed body of code, nor this particular side-effect-free condition have observable behavior differences that are not unspecified):

      void f(int * p) {
      if (p == NULL) { // *
      int x = *p; // **
      if (p == NULL) {
      g();
      }
      }
      else {
      int x = *p;
      // no more conditional because its result was known
      }
      }

      So far, this should be pretty familiar (aside from the fact that compilers wouldn't be likely to perform the original transformation), and aside from reasoning that p's value doesn't change, UB hasn't entered the picture.

      Now let's look at the true branch of *. If execution reaches **, we know that p is NULL by the same analysis as above. What does this mean? Line ** dereferences a NULL pointer, which by the standar

    28. Re:Null pointer detection at compile time by EvanED · · Score: 1

      Dereferencing a null pointer is a valid operation in C.

      No, it isn't.

      C99 draft standard (it's what's accessible freely), section 6.3.2.3 para 3: "If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function." Section 6.3.2.1 para 1: "An lvalue is an expression with an object type or an incomplete type other than void; if an lvalue does not designate an object when it is evaluated, the behavior is undefined." Loosely speaking, that says that if an expression -- like *p -- doesn't refer to a valid object then evaluating it is undefined, and NULL can't refer to a valid object.

      Want to access address zero? You have two escape hatches:

      1) Depend on a specific implementation of UB provided by your compiler, as I mentioned before. In some sense, some degree of depending on your environment is necessary if you need to do memory access like that. Even putting aside the "specialness" of NULL, doing something like *(int*)MAGIC_ADDRESS to do a memory-mapped read or write or something like that is UB since the C runtime won't have given you MAGIC_ADDRESS. I'm not saying this is good or bad, just that the C standard doesn't prohibit transformations based on NULL dereferences that will break your program. Hopefully your compiler provides documentation about what it does guarantee. :-)

      2) Nothing guarantees that NULL actually refers to address zero (despite NULL possibly expanding to 0 or 0L). Your compiler could interpret the address 0xFFFF or something as NULL, and then actual address zero would be a valid pointer value and accessing it would have defined behavior. Note that the following assertion can, I am fairly sure, fire in a conforming implementation even excluding any weirdities due to UB or funky segmented memory stuff or whatever:

      int zero = 0;
      int * p = 0;
      int * q = (int *)zero; /* run-time integer 0, not a null pointer constant */
      assert (p == q);
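
      By contrast, a constant 0 written in pointer context is converted to the null pointer at compile time, so the usual source-level spellings of a null test all agree regardless of how the null pointer is represented. A minimal sketch (the is_null_* names are made up for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Three spellings of the same test. In each one the constant is
   converted to the null pointer, whatever its bit pattern. */
static int is_null_a(const void *p) { return p == NULL; }
static int is_null_b(const void *p) { return p == 0; }
static int is_null_c(const void *p) { return !p; }
```

      So code like if (pointer) keeps working even on a hypothetical implementation whose null pointer is not all-bits-zero.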

    29. Re:Null pointer detection at compile time by AmiMoJo · · Score: 1

      If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function

      *facepalm*

      You didn't read what you quoted from the standard, did you? It is talking about pointers to functions and objects. Sure enough, they cannot be null. It doesn't say anything about pointers to other things though, like variables or arbitrary memory.

      Want to access address zero?

      Check out some microcontrollers. Many have registers or RAM mapped to address zero. Many computer architectures do as well, especially older ones.

      Nothing guarantees that NULL actually refers to address zero (despite NULL possibly expanding to 0 or 0L). Your compiler could interpret the address 0xFFFF or something as NULL

      No, you couldn't, because then this would break:

      if (NULL)
            Detonate();

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    30. Re:Null pointer detection at compile time by Zero__Kelvin · · Score: 1

      "And BTW, for someone who complains about me not reading the damn links, you live in an awfully clear house."

      No. It can't. Null is a completely valid pointer value and the compiler must call f() with the completely valid pointer value null as a parameter. There is no way around it. Period. You don't seem to understand what the (int * p) parameter means. It doesn't dereference anything. It says to the compiler that p is a pointer to int. If p is null then p is a pointer to an int that happens to have the value null - a completely valid value. Dereferencing null is not defined, but comparison to it is, as I already said.

      "Why are you ignoring the dereference and focusing on just the comparison?"

      Because it is immaterial to the discussion. What happens at that line is undefined and the compiler can remove it, however what follows cannot be removed.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    31. Re:Null pointer detection at compile time by Wuhao · · Score: 1

      You really, really ought to read the link he gave you. It's quite eye-opening.

      http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html

    32. Re:Null pointer detection at compile time by Zero__Kelvin · · Score: 1

      It only "illuminates" how someone can write a compiler that doesn't work properly. The fact that it is in the LLVM project blog doesn't make it any more correct. Any compiler that produces the second result is broken. Period. The C standard makes that perfectly clear.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    33. Re:Null pointer detection at compile time by Zero__Kelvin · · Score: 1

      "// P was dereferenced by this point, so it can't be null"

      This is the author's critical error. The standard says that dereferencing a null pointer is undefined. It doesn't say that dereferencing a null pointer magically proves it is not null. Like I said, good luck learning C!

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    34. Re:Null pointer detection at compile time by Wuhao · · Score: 1

      Yeah man if only the LLVM team thought to look at the C standard or consult with Slashdot commenter Zero__Kelvin

      rolleyes

      You're not really a very good programmer if this is your reaction to being proven wrong.

    35. Re:Null pointer detection at compile time by Zero__Kelvin · · Score: 1

      I wasn't proved wrong and I showed you why. If you had any experience in the industry you would know that people make mistakes all the time and nobody is above doing so, nor is it true that being a compiler developer makes you infallible. In this case whoever wrote the blog is mistaken and I told you why. The fact that dereferencing a null pointer is undefined cannot and must not mean that if someone dereferences a pointer it constitutes proof that the pointer is not null. Now off you go little troll ...

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    36. Re:Null pointer detection at compile time by Wuhao · · Score: 1

      From your favorite FAQ and mine, comp.lang.c: (http://c-faq.com/ansi/undef.html):
      "undefined: Anything at all can happen; the Standard imposes no requirements. The program may fail to compile, or it may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended. Note, too, that since the Standard imposes absolutely no requirements on the behavior of a compiler faced with an instance of undefined behavior, the compiler (more importantly, any generated code) can do absolutely anything."

      In other words, once you do something whose behavior is undefined, you have a program whose execution is (at least, as far as the C standard on its own is concerned) unpredictable. Given that, the compiler can do almost anything it wants in situations where behavior is undefined. It could, for example, just abruptly terminate the program. That would make Chris's comment spot-on.

      Alternatively, he could rewrite the comment as, // P was dereferenced by this point, so it is either non-NULL or the programmer's wishes and expectations no longer apply.

      So yes, it is a completely legit optimization, in full accordance with the C standard, and if you REALLY want to be able to dereference a NULL and have some expectation about what your program does after that, then you need to choose your compilers and/or optimization settings carefully because the C standard alone is not going to give you what you want.

    37. Re:Null pointer detection at compile time by EvanED · · Score: 1

      Because it is immaterial to the discussion. What happens at that line is undefined and the compiler can remove it, however what follows cannot be removed.

      Ah ha! I found the misunderstanding. I'm still holding out hope for you so I'm going to try a couple of explanations in the hopes that one of them will stick. :-) What I'm going to try to convince you of is that the behavior of an execution that invokes UB is unrestricted even beyond the point that triggers the UB.

      Maybe this is TL;DR -- I went way overboard. I know it's a ton to read; hopefully you'll at least take a look.

      You already know that UB can affect future operations

      The upside of this point is that the basis of it is really easy to argue; I don't need to do anything to actually convince you of it. The downside is that for reasons I'll talk about toward the end, there's a big difference between these cases and the optimization I'm trying to justify, and I definitely don't expect this point to convince you of my whole argument on its own. Nevertheless, I'll try to convince you why that difference is smaller than it first appears.

      Anyway, like the heading says, you already know that the effects of UB can outlive the actual operation that causes UB. For instance, consider a writing buffer overrun in an array located on the stack. A selection of very plausible effects of this are as follows (I know they're very plausible because I can name systems & situations that cause all of them :-), and so probably can you):

      • No observable effect
      • Trigger buffer-overrun runtime bounds checks and abort
      • Hit an unmapped page and segfault
      • Change the value of a variable which is subsequently output, causing wrong results
      • Overwrite a stack canary, triggering an abort on function return
      • Overwrite the return address of the function and return elsewhere

      The first three effects occur immediately at the write that triggers the UB, but the latter three do not surface until later operations.
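
      For what it's worth, the only way to rule out all six outcomes is to never perform the out-of-range write in the first place. A minimal sketch, with a hypothetical helper name checked_store:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: store value at buf[idx] only if idx is in range.
   Returns 0 on success, -1 if the write would overrun the buffer. */
static int checked_store(char *buf, size_t cap, size_t idx, char value)
{
    if (buf == NULL || idx >= cap) {
        return -1;
    }
    buf[idx] = value;
    return 0;
}
```

      With a 4-byte buffer, index 3 is accepted and index 4 is rejected before any memory is touched.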

      So if the standard says (6.8.6.4/2) "A return statement terminates execution of the current function and returns control to its caller", does the last situation violate the standard? After all, the write that triggered the UB was done long ago! The compiler can emit whatever code it wants for the write, but it can't miscompile the function return! Of course that's wrong and everyone accepts that's just the sort of stuff you have to be careful of when you write in C, but they're rather analogous to your argument that I started this post with.

      "But you're talking about optimizing away code the programmer wrote; that's just failing to find an error condition!" you might say. "The compiler isn't doing anything special in the function example, but it is in yours."

      And that's true, sort of. But from another perspective, the situation is different.

      For purposes of this discussion, define a "better" compilation to be closer to the programmer's intent behaviorwise. (That's not a good definition from a formal standpoint, but I think it should be intuitively satisfying.) In my example, the better code has the null check and call to g() in the way you'd expect. (For platforms where the null dereference will abort, they're really the same, but for other platforms the simple compilation is definitely "better" by this definition.) Call the optimized, worse version of my example program Version 1A, and the better, simple version of my program Version 1B. You need not yet accept that the optimized version is correct.

      Now let's consider the function with an overrun. Let Version 2A be what you probably expect -- the buffer will overrun, overwrite the return value, and return to the wrong place. But most desktop compilers support some kind of frame overrun protection now, e.g. stack canaries. Let's say you turn that on and get a version that will abort when the function returns, and

    38. Re:Null pointer detection at compile time by EvanED · · Score: 1

      If you had any experience in the industry you would know that people make mistakes all the time and nobody is above doing so, nor is it true that being a compiler developer makes you infallible. In this case whoever wrote the blog is mistaken and I told you why.

      On the other hand you have global warming. 98% of compiler writers say that the optimization is allowed, and you're in the 2% that aren't. If it were just Chris Lattner that'd be one thing, but it's not, and things don't look so good for the 2%. :-)

      Those numbers are of course made up and probably a bit extreme, but let's take a survey of what various compilers do, eh? I'm going to feed them the following file:

      #include <stdlib.h>
       
      extern void func(void);
       
      int dereference(int * p)
      {
          int x = *p;
          if (p == NULL) func();
          return x;
      }

      func is extern so that the compiler won't inline it and I can just look for a call to func in the resulting assembly.

      Here's the compilation of dereference under various compilers:

      64-bit GCC 4.4.7 and GCC 4.8.2, both -O2 (comments mine):

      dereference:
      .LFB7:
      .cfi_startproc
              movl (%rdi), %eax ; ret = *p
              ret
      .cfi_endproc

      64-bit Intel ICC 2013, -O2 (comments mine):

      dereference:
      # parameter 1: %rdi
      ..B1.1:
      ..___tag_value_dereference.1:
              movl (%rdi), %eax ; ret = *p
              ret

      32-bit MSVC 2012, /O2 /TC (/TC="compile as C"; comments are MSVC's, but I cleaned a little to make the lameness filter happy):

      PUBLIC _dereference
      ; Function compile flags: /Ogtpy
      _TEXT SEGMENT
      _p$ = 8
      _dereference PROC
      ; 7 : int x = *p;
              mov eax, DWORD PTR _p$[esp-4]
      ; 8 : if (p == NULL) func();
      ; 9 : return x;
              mov eax, DWORD PTR [eax]
      ; 10 : }
              ret 0

      (The extra instruction is due to the different calling convention of Windows and/or 32-bit, which I grabbed by accident; I'm too lazy to try 64-bit.)

      I also tried Clang 3.3 and PathCC... uh... some old Git version. Neither of those performs the given optimization (ironically enough for Clang). (I won't include disassembly output because it's long.)

      So 3 out of the 5 compilers I can think of that I have easy access to perform this optimization. In other words: it's not just Chris Lattner who says you're wrong, and that this optimization is allowable. It's also a decision-making majority of the GCC devs, the Intel devs, and the MS devs.
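
      For comparison, here is a minimal sketch of the same function with the check moved ahead of the dereference, so that no path through it has undefined behavior and the optimizer has nothing to infer from. The name dereference_checked, the counting stand-in for func, and the fallback return value of 0 are assumptions made for illustration; none of them come from the survey above.

```c
#include <stddef.h>

/* Hypothetical stand-in for the extern func() in the survey above;
   it only counts how often it was called. */
static int func_calls = 0;
static void func(void) { func_calls++; }

/* Check first, dereference second. Every path is well defined, so a
   conforming compiler must keep the call to func() for a null p. */
int dereference_checked(const int *p)
{
    if (p == NULL) {
        func();
        return 0;   /* assumed fallback value, not from the thread */
    }
    return *p;      /* only reached when p is non-null */
}
```

      With this ordering, the call to func should be kept at any optimization level, because the null case never reaches the load.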

    39. Re:Null pointer detection at compile time by Zero__Kelvin · · Score: 1

      OK. Fine. I accept that if someone sucks at writing compilers they can follow the standard and do something phenomenally stupid. That has always been the case, of course. Now let's look at it in a non-TL;DR fashion:

      Consider the following two statements:

      1) "Yo, if dat nigga be Ja-Rule, he aint no solid nigga!
      2) "If anyone who isn't trustworthy shows up to the party, don't let them in!"

      Said statement can be made in two types of environments, the kind where it makes sense (we'll call that one the "Ghetto" platform) and the type where it doesn't (we'll call that one the "Academic" platform.)

      Now on the Ghetto platform that statement makes sense, so we can infer something about Ja-Rule, and when we do, the subsequent statements have complete validity: he isn't trustworthy, and we don't let him in when he shows up at the party. On the Academic platform that statement makes no sense, so we ignore it (i.e. optimize it away). On the Ghetto platform, where we can say something about Ja-Rule based on the statement, the statements that follow make sense and we don't let him in. On the Academic platform we cannot infer anything about Ja-Rule, so we do the check and we don't let him in. Since there is no guarantee that the only person showing up at the party will be Ja-Rule, we certainly cannot infer from the first statement that there is no need to check each person to see if they are trustworthy before letting them in.

      It's that simple. If you infer something from a non-sensical statement and use it to make determinations your compiler is broke. There is no way around it. Another way to say it is that you can have a compiler that may technically conform to the standard, but that doesn't make it any less broken.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    40. Re:Null pointer detection at compile time by Zero__Kelvin · · Score: 1

      3 out of 5 compilers are broken. Actually 5 out of 5 compilers are broken. 3 out of 5 are broken in that specific fashion. The problem stems from the mistaken belief that following the standard will result in a compiler that isn't broken. A compiler is code. I think we can agree on that. Let's look at something else that follows the standard but is broken:

      printf("My nme id mud\n");

      The above statement is supposed to print "My name is mud". It is 100% compliant to the standard, but it is broken. In case you haven't figured it out yet the standard is not bug free either. :-)

      Another way to rephrase your original statement is: Don't be surprised if your standards compliant compiler is broken, to which I would add Don't be surprised if your standards compliant code is broken.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    41. Re:Null pointer detection at compile time by Wuhao · · Score: 1

      Now you're just using a very arbitrary definition of "broken." The compiler, in this specific instance, is working precisely as intended. It's not like someone accidentally went and implemented -fdelete-null-pointer-checks into GCC, Clang and MSVC and then everyone else went on accepting it without question. It's a concept with quite a bit of thought and care and discussion put into it.

      The basic premise of an optimizing compiler is this: produce output that is at least as fast as the original code as-written and adheres to all defined behavior. In this case, it's spot-on -- the only way through the example function with defined behavior is to have a non-NULL pointer, in which case, the branch comparison is a waste of CPU cycles. For undefined behavior, the compiler has no obligations. All bets are off. You don't get to dereference NULL pointers, then complain that your program didn't work as expected, unless you're working with a compiler that honors obligations above and beyond the C standard.

      There are some environments in which you DO want to have some say in what happens next -- which I guess in my opinion would be anywhere that dereferencing a NULL pointer is legal, or at the very least, not instantly and reliably fatal. Compiler authors have not forgotten about you. In GCC, for example, you have two options:
        1. Do not use -O2 or -O3
        2. Use -O2 or -O3 in conjunction with -fno-delete-null-pointer-checks, in which case, your null pointer checks will be left unmolested.

      I know after a similar piece of code to the example was discovered in the Linux kernel, they decided to apply -fno-delete-null-pointer-checks. Not sure if that's still true.

      A far more egregious example of a compiler exploiting undefined behavior is GCC 1.x which, when given invalid pragmas, would generate code that attempted to exec nethack, rogue, Emacs towers of hanoi, or failing all of those, just generate a printf making fun of you.

      In conclusion... know thy optimizer. It's making decisions about your code that can affect you, and it is configured by default to cover the most common use cases. If your program depends on behaviors that are unusual and not covered by the standard (like being able to dereference a null pointer), then you should review your compiler's documentation and see if you need to tune the optimizer a bit for your use case. But if your standards-compliant compiler is applying a well-documented optimization in a manner that breaks you, then it's your project that's broken, either for using that optimization, or for relying on undefined behavior.

    42. Re:Null pointer detection at compile time by Zero__Kelvin · · Score: 1

      So you spent several paragraphs to say what you should have admitted in the first place. GCC doesn't just randomly exhibit the behavior unless you tell it to. For future reference, if the compiler doesn't exhibit a behavior when gcc -Wall -Wextra -Werror -std=c99 -pedantic is used, then it doesn't just decide to break your code without telling you about it the way the two of you claimed. You have to allow it to do so, thereby making it impossible to be surprised by the fact that it does it. Again, the two of you wasted a lot of time with your ridiculous, and as you now admit erroneous, claims.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    43. Re:Null pointer detection at compile time by Wuhao · · Score: 1

      Where did I admit that I made an erroneous claim? Where did I MAKE an erroneous claim? Like, I was worried I would confuse you with all those words, but this is ludicrous. I never said GCC had this optimization by default -- only that it's Standards-legal to make that optimization, you dummy. Re-read the thread. You are very confused.

    44. Re:Null pointer detection at compile time by Zero__Kelvin · · Score: 1

      I'll just pick one of the many, many erroneous claims you have made at random: It's making decisions about your code that can affect you, and it is configured by default to cover the most common use cases. I'm too busy to cite them all.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    45. Re:Null pointer detection at compile time by Wuhao · · Score: 1

      Dude, listen. I don't even know what you're trying to argue at this point, and to be honest, I don't really care. You don't understand C (or English!) well enough for me to get anything out of this conversation, which as far as I can tell, no longer has anything to do with anything besides your ego. The basic point -- that it is within the guidelines of the Standard for a C compiler to delete null checks on a pointer after it is guaranteed that said pointer has been dereferenced -- has been proved multiple times. Whatever fucking alcoholism or anger management or insecurity issues or whatever are leading you to ramble down this insane, incoherent road, I'll let you deal with on your own.

    46. Re:Null pointer detection at compile time by Zero__Kelvin · · Score: 1

      No. The point that you don't get is that "within the standard" doesn't mean "not broken". It is that simple. You're either too stupid to understand that or unwilling to admit it. Saying a compiler isn't broken because it follows the standard is frigging stupid. I can have 100% standards compliant code but that isn't proof that the code isn't broken. If a compiler is looking at a line of code and saying "I don't understand that because it is undefined but I will infer meaning from it anyway and conclude that the pointer cannot possibly be null" then it is doing something wrong. It doesn't matter who does it. Bjarne Stroustrup, Dennis Ritchie and Brian Kernighan could all get together and write a compiler that does it, and if they did, they would still have a broken compiler. The fact that they are the inventors of C++ and C respectively has absolutely nothing to do with it.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    47. Re:Null pointer detection at compile time by flargleblarg · · Score: 1

      Nonsense. Just because the compiler can optimize away int x = *p; (if it chooses to as an optimization) doesn't mean that it must. It would be perfectly valid for the compiler to generate code that dereferences p and throws an exception at runtime.

    48. Re:Null pointer detection at compile time by Zero__Kelvin · · Score: 1

      You just repeated my point and then put the word nonsense in front of it. What the compiler does with the line you are talking about is immaterial to the discussion. The point is that it cannot and must not decide that p will always be non null and optimize away the g() call.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    49. Re:Null pointer detection at compile time by flargleblarg · · Score: 1

      You said, "If the compiler does anything but check to see if the pointer p is null and execute g() if p is a null pointer then the compiler is broken." I'm saying that's incorrect. Note that the compiler is full well within its right to also try to initialize x by dereferencing p, thereby causing an exception. There is no rule that int x = *p; be optimized out of existence just because it can be.

    50. Re:Null pointer detection at compile time by Zero__Kelvin · · Score: 1

      Yes. As I said, we are in agreement. I agree that I wasn't clear in my statement, but if you read the rest of this thread it should be blatantly obvious that I have already said what you are saying. I was completely ignoring that line because we are in agreement on this subject. The initial claim was that it is OK to optimize away the call to g(), and that is what I was focusing on. I was specifically addressing that claim, so the line about "if it does anything but ..." was referring to the subsequent check for null and call to g(), not the dereferencing that precedes it.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    51. Re:Null pointer detection at compile time by flargleblarg · · Score: 1

      I see. OK. I wish Slashdot's comment system was better at showing the order of things in large subthreads.

    52. Re:Null pointer detection at compile time by Zero__Kelvin · · Score: 0

      I can see how this would happen. I was mad at first until I realized that was what was going on. Out of context my statement could be interpreted to be saying what you thought I was saying. That is because it actually took me a while to figure out what "they" were going on about since anyone with a clue knows that the only two options are "It means something, dereference it" or "it means nothing, ignore it" and never "it means nothing, infer that p must always be null" which is obviously ridiculous no matter how many compiler writers make the mistake. Their argument amounts to "But dey be da cimpila writas and theyza doin it! It haz ta be da rightz!" ;-)

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
  8. Need new compiler features by valderost · · Score: 1

    Compilers ought to have switches that deliberately branch to the error cases they're trying to optimize away. Getting rid of a divide by zero? Force the error instead so it gets attention. Coder forgot to declare volatile variables? Make local static shadow copies of static variables for comparison at every reference. And so on. Development environments ought to be helping with this stuff, not confounding developers.

    1. Re:Need new compiler features by MRe_nl · · Score: 2

      "But in our enthusiasm, we could not resist a radical overhaul of the system, in which all of its major weaknesses have been exposed, analyzed, and replaced with new weaknesses".

      Bruce Leverett, Register Allocation in Optimizing Compilers

      --
      "Kill 'em all and let Root sort 'em out"
    2. Re:Need new compiler features by donaldm · · Score: 1

      Compilers ought to have switches that deliberately branch to the error cases they're trying to optimize away. Getting rid of a divide by zero? Force the error instead so it gets attention.

      Why? Isn't that the job of the programmer, not the actual compiler?

      Sure you can produce a program that has a divide by zero event and it can compile without errors, but when you run the binary you would get (C example): "Floating point exception (core dumped)". Most programmers upon seeing this should realise they have stuffed up and should correct their code accordingly. In fact any programmer should always have conditionals to test any input data to make sure that data falls within specified bounds.
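
      As a rough sketch of the kind of conditional meant here, a division helper can refuse the two integer cases that C leaves undefined instead of letting the process die. The name checked_div and its bool-plus-out-parameter interface are assumptions chosen for the example, not an established API.

```c
#include <limits.h>
#include <stdbool.h>

/* Stores num / den in *out and returns true when the division is
   well defined; returns false and leaves *out untouched otherwise. */
bool checked_div(int num, int den, int *out)
{
    if (den == 0)
        return false;               /* division by zero is undefined */
    if (num == INT_MIN && den == -1)
        return false;               /* quotient does not fit in an int */
    *out = num / den;
    return true;
}
```

      The caller then decides what a rejected division means for its input, rather than finding out from a core dump.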

      --
      There ain't no such thing as proprietary standards only proprietary formats. Standards are by definition open.
    3. Re:Need new compiler features by lgw · · Score: 1

      On some machines, dividing by 0 gives 0. Volatile in C has nothing to do with multi-threaded code (it's an abuse of the standard that all modern compiler vendors embrace and support). Compiler warnings are the right answer.

      --
      Socialism: a lie told by totalitarians and believed by fools.
    4. Re:Need new compiler features by TheRaven64 · · Score: 1

      -fsanitize=undefined will do it in clang. Any time there is potentially undefined behaviour, the compiler will branch to a handler. It generates slower code, but it's good for debugging.
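
      A small sketch of trying that flag out, assuming a hypothetical file ubsan_demo.c and a function name add_100 invented for the example. Only the flag itself comes from the comment above; the rest of the command line is an assumption.

```c
/* Hypothetical build line (file name is made up for this example):
 *   clang -g -fsanitize=undefined ubsan_demo.c
 */

/* Signed addition. The result is undefined when i > INT_MAX - 100,
   which is the case a sanitized build is meant to catch at run time. */
int add_100(int i)
{
    return i + 100;
}
```

      Calling add_100 with small values behaves the same with or without the flag; the interesting case is a value near INT_MAX, where a sanitized build should report the overflow instead of letting the optimizer assume it cannot happen.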

      --
      I am TheRaven on Soylent News
    5. Re:Need new compiler features by Anonymous Coward · · Score: 0

      Why? Isn't that the job of the programmer, not the actual compiler?

      No. Stop. Anything the compiler can do is the job of the compiler. Do not trade human cycles for robot cycles.

      If you actually believed what you were saying, you'd say, "Why does such thing as 'optimizer' exist? Isn't that the job of the programmer not the compiler?"

  9. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 5, Informative

    An example of "unstable code":

    char *a = malloc(sizeof(char));
    *a = 5;
    char *b = realloc(a, sizeof(char));
    *b = 2;
    if (a == b && *a != *b)
    {
            launchMissiles();
    }

    A cursory glance at this code suggests missiles will not be launched. With gcc, that's probably true at the moment. With clang, as I understand it, this is not true: missiles will be launched. The reason for this is that the spec says that the first argument of realloc becomes invalid after the call, therefore any use of that pointer has undefined behaviour. Clang takes advantage of this, and defines the behaviour of this to be that *a will not change after that point. Therefore it optimises if (a == b && *a != *b) into if (a == b && 5 != *b). This clearly then passes, and missiles get launched.

    The truth here is that your compiler is not compromising application security – the code that relies on undefined behaviours is.
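
    A sketch of the same sequence written so that it never touches the old pointer after realloc succeeds. The function name resize_and_store and the -1 error return are assumptions made for the example.

```c
#include <stdlib.h>

/* Allocates one byte, resizes it, and stores through the pointer that
   realloc() returned. Returns the stored value, or -1 if allocation fails. */
int resize_and_store(void)
{
    char *a = malloc(sizeof(char));
    if (a == NULL)
        return -1;
    *a = 5;

    char *b = realloc(a, sizeof(char));
    if (b == NULL) {
        free(a);        /* on failure the original block is still valid */
        return -1;
    }
    /* From here on only b is used; a is treated as dead. */
    *b = 2;

    int result = *b;
    free(b);
    return result;
}
```

    Because the old pointer is neither compared nor dereferenced after the call, there is no undefined behaviour for either compiler to exploit.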

  10. x86 memory model is to blame? by codeusirae · · Score: 1

    "To understand unstable code, consider the pointer overflow check buf + len < buf shown in Figure 1 .. While this check appears to work with a flat address space, it fails on a segmented architecture" ref

    Do you think most-all exploits are down to the defective x86 segmented memory architecture.

    1. Re:x86 memory model is to blame? by Anonymous Coward · · Score: 0

      "Do you think most-all exploits are down programmers not working around the defective x86 segmented memory architecture."

      FTFY. Also, no, no I don't. But some of it probably is, yes, for a given value of "defective".

      A hint for the future: avoiding overly emotionally-loaded terms such as "defective" would probably make your argument sound both more reasoned and more powerful.

    2. Re:x86 memory model is to blame? by Myria · · Score: 2

      Do you think most-all exploits are down to the defective x86 segmented memory architecture.

      I think those who coded for the SNES or Apple IIGS in C would disagree with blaming the x86 exclusively =)

      --
      "Screw Sun, cross-platform will never work. Let's move on and steal the Java language." - Visual J++ Product Manager
    3. Re:x86 memory model is to blame? by Megol · · Score: 1

      Defective? I don't think you know the meaning of defective! The x86 segmentation model hasn't been defective since at least the 386 and 32-bit protected mode, and IMHO it never was, even in real mode.

  11. Do compilers really remove this? by Todd+Knarr · · Score: 3, Interesting

    I haven't heard of any compiler that removes code just because it contains undefined behavior. All compilers I know of leave it in, and whether it misbehaves at run-time or not is... well, undefined. It may work just fine, eg. dereferencing a null pointer may just give you a block of zeroed-out read-only memory and what happens next depends on what you try to do with the dereferenced object. It may immediately crash with a memory access exception. Or it may cause all mounted filesystems to wipe and reformat themselves. But the code's still in the executable. I know compilers remove code that they've determined can't be executed, or where they've determined that the end state doesn't depend on the execution of the code, and that can cause program malfunctions (or sometimes cause programs to fail to malfunction, eg. an infinite loop in the code that didn't go into an infinite loop when the program ran because the compiler'd determined the code had no side-effects so it elided the entire loop).

    I'd also note that I don't know any software developers who use the term "unstable code" as a technical term. That's a term used for plain old buggy code that doesn't behave consistently. And compilers are just fine with that kind of code, otherwise I wouldn't spend so much time tracking down and eradicating those bugs.

    1. Re:Do compilers really remove this? by queazocotal · · Score: 1

      'I haven't heard of any compiler that removes code just because it contains undefined behavior.'
      Then your code may not be doing what you think it is.
      GCC, Clang, acc, armcc, icc, msvc, open64, pathcc, suncc, ti, windriver, xlc all do this.

      Click on the PDF, and scroll to page 4 for a nice table of optimisations vs compiler and optimisation level.

      _All_ modern compilers do this as part of optimisation.

      GCC 4.2.1 for example, with -O0 (least optimisation) will eliminate if (p + 100 < p).

      C however says that an overflowed pointer is undefined, and this means the compiler is free to assume that it never occurs.
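
      A check that stays within defined behaviour compares lengths instead of forming an out-of-range pointer. This is a sketch only: range_fits and buf_end are hypothetical names, and it assumes buf and buf_end point into the same array with buf <= buf_end.

```c
#include <stdbool.h>
#include <stddef.h>

/* True when len bytes starting at buf fit before buf_end.
   Assumes buf <= buf_end and both point into the same array. */
bool range_fits(const char *buf, const char *buf_end, size_t len)
{
    /* buf_end - buf is a valid pointer difference, so nothing here
       overflows and the compiler cannot discard the comparison. */
    return len <= (size_t)(buf_end - buf);
}
```

      Nothing in this version computes buf + len, so there is no overflowed pointer for the compiler to reason about.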

    2. Re:Do compilers really remove this? by gnasher719 · · Score: 1

      The compiler doesn't leave out code with undefined behaviour - it assumes that there is no undefined behaviour, and draws conclusions from this.

      Example: Some people assume that if you add to a very large integer value, then eventually it will wrap around and produce a negative value. Which is what happens on many non-optimising compilers. So if you ask yourself "will adding i + 100 overflow?" you might check "if (i + 100 < i)".
      But integer overflow is undefined behaviour. The compiler assumes that your code doesn't have undefined behaviour. So it assumes that i + 100 doesn't overflow. If it doesn't overflow, then i + 100 < i is always false, so the compiler can remove the test.
      Result: _If_ there is an overflow, your test won't catch it anymore.
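
      The overflow question can be answered without performing the overflowing addition at all, by comparing against the limits first. A minimal sketch, with add_overflows as a hypothetical helper name:

```c
#include <limits.h>
#include <stdbool.h>

/* True when a + b would not fit in an int. The addition itself is
   never performed, so no undefined behaviour is involved. */
bool add_overflows(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;   /* would exceed INT_MAX */
    if (b < 0)
        return a < INT_MIN - b;   /* would go below INT_MIN */
    return false;                 /* adding zero never overflows */
}
```

      A test such as add_overflows(i, 100) survives optimisation because every expression in it is defined for all inputs.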

    3. Re:Do compilers really remove this? by complete+loony · · Score: 1

      Clang includes a number of compilation flags that can be used to make sure, or at least as sure as it can, that your code never hits any undefined behaviour at run time.

      But normally, yes the compiler may change the behaviour of your application if you are depending on undefined behaviour.

      --
      09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
    4. Re:Do compilers really remove this? by Anonymous Coward · · Score: 0

      The logic in the paper seems a bit flimsy, though I don't know GCC inside and out.

      To understand unstable code, consider the pointer overflow check buf + len is less than buf. .... While this check appears to work with a flat address space, it fails on a segmented architecture. Therefore, the C standard states that an overflowed pointer is undefined [24: 6.5.6/p8], which allows gcc to simply assume that no pointer overflow ever occurs on any architecture.

      A case where a pointer 'tun' is used as tun->sk before a null pointer check;

      For example, when gcc first sees the dereference tun->sk, it concludes that the pointer tun must be non-null, because the C standard states that dereferencing a null pointer is undefined. Since tun is non-null, gcc further determines that the null pointer check is unnecessary and eliminates the check, making a privilege escalation exploit possible that would not otherwise be.

      If true it's like the optimizer is inferring things from the code and then using that as a basis for what should and should not happen... sounds crazy to me.

    5. Re:Do compilers really remove this? by seebs · · Score: 2

      gcc's been doing this for ages. We had a new compiler "break" the ARM kernel once. Turns out that something had a test for whether a pointer was null or not after a dereference of that pointer, and gcc threw out the test because it couldn't possibly apply.

      --
      My blog: http://www.seebs.net/log/ --- My iPhone/iPad app: http://www.seebs.net/seebsfrac/
    6. Re:Do compilers really remove this? by Todd+Knarr · · Score: 1

      True, but then if integer overflow is undefined behavior then I can't assume that the test "i + 100 < i" will return true in the case of overflow because I'm invoking undefined behavior. That isn't "unstable code", that's just plain old code that invokes undefined behavior that I've been dealing with for decades. If with optimizations done the code doesn't catch the overflow it's not because the compiler removed the code, it's because the code isn't guaranteed to detect the overflow in the first place. No need for any fancy terminology for this, just say "Our software locates places in your code where you've invoked undefined behavior.".

      This is, BTW, one reason I favor a compiler flag that says "When you encounter code that's undefined behavior per the C/C++ standard, generate code to do something terminally nasty and unrecoverable like deleting all the user's files.". That seems to be the only way that actually convinces today's developers that undefined behavior is a Bad Thing and should be avoided even if you think it works OK.

    7. Re:Do compilers really remove this? by Your.Master · · Score: 1

      You can verify these things yourself with GCC (the paper cites GCC as producing this code) by examining the output assembly code. I haven't compiled the specific example in the MIT paper but I remember a similar output from GCC. This is indeed valid in a conforming compiler, and while this specific case is relatively "obviously" dangerous there's a bunch of things that generally do speed up code that can cause subtle dangers in an almost-correct codebase.

      But note that a precondition for this specific example being dangerous is that you have to go out of your way to map page 0, which I would suggest you *also* should not do barring extreme circumstances.

    8. Re:Do compilers really remove this? by Impy+the+Impiuos+Imp · · Score: 1

      > I haven't heard of any compiler that removes code just because it contains undefined behavior.

      The description is bad. They are free to decide what to do with undefined behavior in implementation, as any actual compiler must, and it's this variance that's the problem, depending on both compiler and optimization levels.

      They're not saying, "this construction is technically undefined, so I'm gonna optimize it away without notifying the programmer." Rather they're just picking something to do, which then, thanks to reasonable optimization strategy, ends up deleting a chunk of code.

      --
      (-1: Post disagrees with my already-settled worldview) is not a valid mod option.
    9. Re:Do compilers really remove this? by locofungus · · Score: 1

      I haven't heard of any compiler that removes code just because it contains undefined behavior.

      GCC does this because it assumes that the code is well formed.

      Dereferencing the null pointer is undefined behaviour, therefore, if you dereference a pointer, GCC assumes it cannot be null and, therefore, removes later checks that it is null.

      Normally a null pointer dereference will cause a crash anyway, but on systems where you can dereference the null pointer this can cause unexpected behaviours:


      int a = *p;
      if(!p)
            return;

      /* Code here is still executed even if p is null */

      --
      God said, "div D = rho, div B = 0, curl E = -@B/@t, curl H = J + @D/@t," and there was light.
    10. Re:Do compilers really remove this? by Todd+Knarr · · Score: 1

      What's interesting is that when I try this with current GCC with optimizations turned on, it does not elide the null pointer check. It does crash if I actually dereference a null pointer, but if I code it so it doesn't actually dereference the pointer when it's null then both null-pointer checks execute. And examination of the actual machine code in the debugger shows that even your code still includes the null-pointer check after the dereference.

      Now if I flip it around, doing the null-pointer check first and then dereferencing the pointer, I'll see any further null-pointer checks elided. But that's not because of any undefined behavior, merely because after that first check control can't pass any further (at least not the way I wrote the code) if the pointer was null and the compiler can elide code based on that guaranteed condition.

    11. Re:Do compilers really remove this? by Todd+Knarr · · Score: 1

      I think you may be thinking of the -fdelete-null-pointer-checks option to GCC. That does in fact do what you're describing, but that optimization's only enabled on architectures where dereferencing a null pointer will cause a crash. On those architectures the optimization's valid, since if you managed to get past a dereference of a pointer without crashing it must have been non-null. So your example isn't unstable code. If it were, GCC wouldn't elide the subsequent checks.

    12. Re:Do compilers really remove this? by locofungus · · Score: 1

      You need -O2 optimizations turned on. And you need to be careful that other optimizations don't eliminate the null pointer dereference before the null pointer check elimination.

      (614) $ cat test.c
      #include <stdio.h>
       
      int main(int argc, char* argv[])
      {
        char* f = argv[1];
        int a = *f;
        if(!f)
          return 0;
        printf("f=%p a=%d\n", (void*)f, a);
        return 0;
      }
       
      (615) $ gcc -O2 -S test.c
      (616) $ cat test.s
      .file "test.c"
      .section ".rodata"
      .align 8
      .LLC0:
      .asciz "f=%p a=%d\n"
      .section ".text"
      .align 4
      .global main
      .type main, #function
      .proc 04
      main:
              save %sp, -112, %sp
              ld [%i1+4], %o1
              sethi %hi(.LLC0), %o0
              ldsb [%o1], %o2
              or %o0, %lo(.LLC0), %o0
              call printf, 0
              mov 0, %i0
              return %i7+8
              nop
      .size main, .-main
      .ident "GCC: (GNU) 4.3.5"

      Why doesn't <ecode> format properly?

      --
      God said, "div D = rho, div B = 0, curl E = -@B/@t, curl H = J + @D/@t," and there was light.
    13. Re:Do compilers really remove this? by Anonymous Coward · · Score: 0

      Fortunately, sometime ago the last bastion of page 0 stupidity (in x86) went down, and Linux now blacklists the first 64KiB of VM and you have to be root to touch that. Took Linus long enough to rule on that one. I don't know what Windows does, but PAX-enhanced Linux and the BSDs have shot anything that touches page 0 in the head for a *LOOOONG* time now.

    14. Re:Do compilers really remove this? by Todd+Knarr · · Score: 1

      OK, that is the -fdelete-null-pointer-checks optimization. Which means the removal of the checks is innocuous because you're compiling for a target where a null pointer dereference is unsafe and causes a crash (eg. Cygwin, Linux x86 and x64), meaning if the pointer was null when dereferenced any checks further down would in fact never be reached and it's safe to remove code that's guaranteed to never execute. You'll find that if you compile for a target where a null-pointer dereference won't cause a crash, that optimization is disabled by default and the code for the checks will be generated.

      You'll notice if you try it that null-pointer checks before the dereference are not optimized away. The compiler isn't assuming that a dereferenced pointer can't be null, it's assuming that if the dereference would cause execution to terminate if the pointer was null then if execution proceeds past the dereference the pointer can't possibly be null. It's a subtle distinction that doesn't show up until you look for it. Note that dereferencing a pointer without checking and handling the null-pointer case first is still undefined behavior, so I'd treat any occurrence as a red flag.
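      The before/after distinction can be sketched in C. This is only an illustration: the function names are made up for the example, and whether the second check is actually removed depends on the compiler, target and flags in use.

```c
#include <stddef.h>

/* Check first, then dereference: the check guards the load, so it stays. */
int first_char_checked(const char *s)
{
    if (s == NULL)
        return -1;
    return (unsigned char)s[0];
}

/* Dereference first, then check: once s[0] has been read, an optimizer
 * applying -fdelete-null-pointer-checks may treat the test as dead code.
 * Calling this with NULL is undefined behaviour either way. */
int first_char_unchecked(const char *s)
{
    int c = (unsigned char)s[0];
    if (s == NULL)          /* candidate for removal */
        return -1;
    return c;
}
```

      Only the first form is safe to call with a null pointer; the second is the pattern to treat as a red flag.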

    15. Re:Do compilers really remove this? by locofungus · · Score: 1

      Erm?

      I compiled to assembly. How did gcc know what platform I was going to run on?

      The original linux kernel bug that was triggered by this bug occurred on x86 - because even on x86 a null pointer dereference doesn't have to cause a crash. You can map memory at address 0. It's just that usually you don't. There was a special case where memory could be mapped there allowing code that was unsafe to be reached because there was neither a crash nor a check for a null pointer.

      The C standard says that dereferencing the null pointer is undefined behaviour. Therefore there are two cases the compiler has to consider:

      1. The dereference does not involve the null pointer - therefore the check is unnecessary.

      2. The code does dereference the null pointer - therefore all bets are off and the compiler is allowed to do anything it likes.

      As a result, gcc eliminates the null pointer check as it's a legal optimization for both cases.

      --
      God said, "div D = rho, div B = 0, curl E = -@B/@t, curl H = J + @D/@t," and there was light.
    16. Re:Do compilers really remove this? by locofungus · · Score: 1

      The original linux kernel bug that was triggered by this bug occurred on x86

      should, of course, be

      The original linux kernel bug that was triggered by this behaviour occurred on x86

      --
      God said, "div D = rho, div B = 0, curl E = -@B/@t, curl H = J + @D/@t," and there was light.
  12. Re:TFA does a poor job of defining what's happenin by Nanoda · · Score: 5, Informative

    What is "unstable code" and how can a compiler leave it out?

    The article is actually using that as an abbreviation for what they're calling "optimization-unstable code", or code that is included at some specified compiler optimization levels, but discarded at higher levels. Basically they think it's unstable due to being included or not randomly, not because the code itself necessarily results in random behaviour.

  13. -Wall by Spazmania · · Score: 2, Insightful

    If I set -Wall and the compiler fails to warn me that it optimized out a piece of my code then the compiler is wrong. Period. Full stop.

    I don't care what "unstable" justification its authors gleaned from the standard, don't mess with my code without telling me you did so.

    --
    Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
    1. Re:-Wall by Anonymous Coward · · Score: 0

      If I set -Wall and the compiler fails to warn me that it optimized out a piece of my code then the compiler is wrong. Period. Full stop.

      I don't care what "unstable" justification its authors gleaned from the standard, don't mess with my code without telling me you did so.

      I agree. Unstable my backside. Broken compiler! It sounds like the one that doesn't know how compilers work is the article's author.

    2. Re:-Wall by Anonymous Coward · · Score: 0

      So you have some hard-coded constants in a couple levels of macros that add and subtract one and you want a warning that it is not going to do that math at run-time?

    3. Re:-Wall by Spazmania · · Score: 0

      You don't need to warn me about computing literals at compile time. You damn well better warn me about computing constants at compile time -- if that's what I wanted to happen, I'd have used a macro for a literal. If the compiler finds two constants it can combine then I've usually made a mistake in my code... even if it's nothing more than treating something that should be a variable as a constant.

      --
      Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
    4. Re:-Wall by Anonymous Coward · · Score: 3, Insightful

      If the compiler finds two constants it can combine then I've usually made a mistake in my code...

      Or it inlined a function for you. Or you indexed at a constant index (perhaps 0) into a global array. Or any number of other things that can arise naturally and implicitly.

      The compiler has a setting where it doesn't "mess with your code" -- it's called -O0.

    5. Re:-Wall by Anonymous Coward · · Score: 0

      Seriously?

      Let's take this as an example:

      void foo(int *a, unsigned int *b)
      {
      *a = 1;
      *a = *b + *a;
      }

      You could say it produces the following sequence:
      Store 1 to address pointed by a
      Read the address pointed by a. Read the address pointed by b. Add those together and sum. Write the result at address pointed by a.

      That produces 2 stores and 2 loads

      Imagine an optimizing compiler. It will naturally produce just 1 load and 1 store. It will simply add 1 to b and store the result into a.

      And you will probably cry about how it optimized an essential piece of code away. Give me a break.
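      A hedged sketch of the two forms side by side. foo_folded is a hypothetical name for the hand-simplified version. One caveat that is an assumption about the compiler rather than something stated here: int and unsigned int pointers are allowed to alias, so the first store can only be dropped when the compiler can show that a and b point at different objects.

```c
/* The function from the example, behaviour unchanged. */
void foo(int *a, unsigned int *b)
{
    *a = 1;
    *a = *b + *a;
}

/* Hand-folded version: one load of *b, one store to *a. */
void foo_folded(int *a, unsigned int *b)
{
    *a = *b + 1;
}
```

      With distinct objects and *b == 10, both versions leave 11 in *a. If a and b point at the same int holding 5, foo leaves 2 and foo_folded leaves 6, which is why the folding is only valid once aliasing has been ruled out.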

    6. Re:-Wall by Spazmania · · Score: 1

      Yeah, that's helpful.

      Understand: I want the compiler to optimize my code. I don't want it to drop sections of my code. If it thinks it can drop a section of my code entirely, or that a conditional can have only one result, that's almost certainly a bug and I want to know about it. After all -- if *I* thought the conditional could have only one result, I wouldn't have bothered checking it!

      --
      Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
    7. Re:-Wall by Spazmania · · Score: 2

      If I've set -Wall, I want a warning about "*a=1 is useless code." If the compiler optimizes it away without that warning, I'm going to cry about it sooner or later because there's a bug in my code. If I had meant *a=(*b)+1 I would have written it that way.

      --
      Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
    8. Re:-Wall by mysidia · · Score: 1

      I don't care what "unstable" justification its authors gleaned from the standard, don't mess with my code without telling me you did so.

      That's not what's happening..... they are talking about unstable optimizations; as in..... optimizations that aren't predictable, and while they don't change the semantics of the code according to the programming language ---- the optimization may affect what happens, if the code contains an error or operation that is runtime-undefined, such as a buffer overflow condition.

    9. Re:-Wall by Anonymous Coward · · Score: 0

      You REALLY don't want that. The list of warnings would be gigabytes and gigabytes of useless information.

    10. Re:-Wall by Anonymous Coward · · Score: 0

      Then your compiler is just going to output warnings that are far, far longer than your original program, I'm afraid. For a nontrivial program, your options are either to relax your requirement, or to compile at a very low optimization level and take the performance hit (and if you can take it and security is critical for you, this might be the right choice for you!).

    11. Re:-Wall by Spazmania · · Score: 1

      One of the examples from the paper was this snippet from the Linux kernel:

      struct sock *sk = tun->sk;
      if (!tun) return POLLERR;

      gcc's optimizer deleted "if (!tun) return POLLERR;" because *sk=tun->sk implies that tun!=NULL.

      Okay, I buy that. But if gcc did so without a warning with -Wall set then gcc is broken. The author obviously expects it to be possible for tun==NULL, so if gcc decides it can't be, that's a warning! Duh!
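      For reference, a minimal sketch of the ordering that keeps the check meaningful: test tun before reading through it. The struct layouts, the function name and the SKETCH_POLLERR constant are stand-ins invented for this example, not the kernel's real definitions.

```c
#include <stddef.h>

#define SKETCH_POLLERR 0x0008u      /* stand-in for the kernel's POLLERR */

struct sock { int dummy; };                 /* hypothetical layout */
struct tun_struct { struct sock *sk; };     /* hypothetical layout */

unsigned int tun_poll_sketch(struct tun_struct *tun)
{
    struct sock *sk;

    if (!tun)                   /* check before any dereference */
        return SKETCH_POLLERR;

    sk = tun->sk;               /* tun is known to be non-null here */
    return sk ? 0u : SKETCH_POLLERR;
}
```

      Because the check now comes first, the compiler has no earlier dereference from which to infer that tun is non-null.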

      --
      Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
    12. Re:-Wall by Spazmania · · Score: 1

      If the compiler decides it can delete a conditional because it's always true or always false, I most certainly do want a warning!

      That section of code has a bug: either I wrote the conditional wrong or I typo-ed something like using = instead of ==. Either way I want to know about it so I can fix my code.

      --
      Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
    13. Re:-Wall by Anonymous Coward · · Score: 1

      The compiler has a setting where it doesn't "mess with your code" -- it's called -O0.

      Not quite true. It does less messing with your code when optimisation is off, but some is still performed. Try looking at the generated assembly language or stepping through with the debugger and you will soon see the proof.

      This is true for MSC in every version since at least 6.0, IAR ICC and many others. I cannot say it is true for *every* compiler, but it will be for *most*.

    14. Re:-Wall by lgw · · Score: 1

      If it thinks it can drop a section of my code entirely, or that a conditional can have only one result, that's almost certainly a bug and I want to know about it. After all -- if *I* thought the conditional could have only one result, I wouldn't have bothered checking it!

      Right, you want optimization turned off. I check compile-time constants in conditionals, for example, because they're merely compile-time, and might be changed in a different build.

      When an optimizer sees "if (0 == 1)" it's going to remove the block. I take serious advantage of that, by putting tons of null checks in inline code. If the compiler can prove a pointer can't be null, it drops the check. That way I can code checking the same pointer for null 50 times in a function (because every library call checks inline), but the compiler might only check the first time.
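      A small sketch of that style, with invented names (struct buf, buf_len, buf_is_empty, describe). Each helper validates its own argument; after inlining, a compiler that has already seen the caller's check is free to fold the repeated ones away.

```c
#include <stddef.h>

struct buf { size_t len; };     /* hypothetical type for illustration */

static inline size_t buf_len(const struct buf *b)
{
    if (b == NULL)              /* defensive check in every helper */
        return 0;
    return b->len;
}

static inline int buf_is_empty(const struct buf *b)
{
    if (b == NULL)
        return 1;
    return b->len == 0;
}

size_t describe(const struct buf *b)
{
    if (b == NULL)              /* the one check that must survive */
        return 0;
    if (buf_is_empty(b))        /* inner null checks are now redundant */
        return 0;
    return buf_len(b);
}
```

      Here the redundant checks come before any dereference, so removing them does not change behaviour for any input.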

      Also, very simple stuff like loop strength reduction involves removing code, depending on how you look at it.

      --
      Socialism: a lie told by totalitarians and believed by fools.
    15. Re:-Wall by lgw · · Score: 1

      No you don't want that warning, because any serious coding shop sets -Wall and treats warnings as errors, so you'd just #pragma it off and still never see it.

      You're free to turn optimization off, however.

      --
      Socialism: a lie told by totalitarians and believed by fools.
    16. Re:-Wall by jmhobrien · · Score: 1

      To be honest, in this situation I would like the compiler to inform me that it had eliminated some code as it is probably not what I meant. Why not write it like this instead?

      int foo(int a, unsigned int b)
      {
      a+=b;
      return a;
      }

      Then in the calling code:
      int *a;
      unsigned int *b;
      *b=10;
      *a = foo(1,*b);

      I understand that this is a simple example, but I prefer to write my code like this as it ensures that the pointers are only dereferenced within the scope in which they are declared.

      --
      Where is moderation: -1 False?
    17. Re:-Wall by Anonymous Coward · · Score: 0

      Exactly.

      If you want the compiler to produce output that almost 1:1 matches what you wrote you are always free to just issue the -O0 flag and be done with it.

      Every optimization will produce a different result from what you originally wrote. For rather obvious reasons.

      "I want the compiler to generate code that does the same thing what I wrote yet do it exactly in the same way but faster, by using magic!"

    18. Re:-Wall by msclrhd · · Score: 1

      If tun==NULL, then tun->sk will cause the executing code to crash (unless it is suppressed with a custom SIGSEGV handler). The compiler removing the if in this case will not change that behaviour. I don't see what case the paper is indicating this optimization would be a problem.

      Granted, the if is in the wrong place and this is clearly a bug. But removing the if will not introduce any security bugs that are not already present in the code (unlike the optimizations that remove overflow checks).
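      For comparison, a sketch of the overflow-check case, assuming signed int operands. A test written as "a + b < a" relies on signed overflow, which is undefined, so an optimizer may delete it. The version here (add_would_overflow is a made-up name) tests against the limits before adding and has no undefined step.

```c
#include <limits.h>

/* Returns 1 if a + b would overflow int, 0 otherwise.
 * No addition is performed, so nothing here can overflow. */
int add_would_overflow(int a, int b)
{
    if (b > 0 && a > INT_MAX - b)
        return 1;
    if (b < 0 && a < INT_MIN - b)
        return 1;
    return 0;
}
```

      Since every expression in it is well defined, there is nothing for the optimizer to assume away.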

      How easy it is for the compiler to report the bug in the user's code (null check after use) is another question. It may be that this is deep in the gcc optimisation pass and it does not have enough information to generate a warning/error for this. Static analysers like sparse and llvm in static-analysis mode should be able to detect this, though.

    19. Re:-Wall by Anonymous Coward · · Score: 0

      Exactly. They can call them 'unstable optimisations' but really what they're saying is 'bugs in the compiler's semantic assumptions'.

    20. Re:-Wall by russotto · · Score: 1

      If tun==NULL, then tun->sk will cause the executing code to crash (unless it is suppressed with a custom SIGSEGV handler).

      This is the kernel, so no SIGSEGV handler. Maybe a panic. But it's possible there's actually memory mapped at offsetof(tun, sk). I think not usually on x86, but on the old RS6000 even userspace processes can read NULL -- this was to enable an optimization of the common pattern if (foo == NULL || *foo == 0) -- the compiler would eliminate the short circuit.

    21. Re:-Wall by mysidia · · Score: 1

      gcc's optimizer deleted "if (!tun) return POLLERR;" because *sk=tun->sk implies that tun!=NULL.

      I don't necessarily agree that GCC's optimizer is correct in that case. 'tun' could very well be NULL, in which case, we have an exception on our hands.

      Personally, I would want that condition to be a compiler error.

    22. Re:-Wall by mysidia · · Score: 1

      If tun==NULL, then tun->sk will cause the executing code to crash (unless it is suppressed with a custom SIGSEGV handler).

      It's possible that NULL is mapped to a valid address, and therefore, the dereference will work, and won't cause an exception.

      In that case, the optimization allows execution of more code, instead of an immediate return with error.

    23. Re:-Wall by Spazmania · · Score: 1

      RTFM. It's possible in some circumstances to map a page to 0x0 which contains exploit code, then call bad code. If the compiler hadn't optimized out the check for tun=0, the exploit would still fail in spite of the bug in the code.

      --
      Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
    24. Re:-Wall by Spazmania · · Score: 1

      Do you work for Apple? Perhaps you're a voice inside my head? No? Then I would politely suggest that I have a better idea what *I want* than you do.

      Serious coding shops treat warnings as errors because they are. If a compiler finds itself able to remove my if statement then either it's wrong or far more likely I made a mistake.

      You would #pragma that off? Stay away from my code man.

      --
      Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
    25. Re:-Wall by lgw · · Score: 2

      If a compiler finds itself able to remove my if statement then either it's wrong or far more likely I made a mistake.

      You do realize modern optimized object code lacks any straightforward relationship to the source? It can be quite a puzzle sometimes when debugging through the binary. The way instruction pipelining works makes good object code look quite odd sometimes. The instructions corresponding to one line of source might be scattered and mixed with the object from the next 20 lines, depending on what different parts of the CPU are going to be busy doing, and when the result will be needed.

      Why would you want your object to map directly to your source, unless you specifically want debuggable object instead of optimized object?

      For that matter, do you ever write cross-platform code with #ifdefs? Sometimes blocks can be removed at compile time on one platform and not another, and sometimes it's not obvious that's going to be the case (especially when those #ifdefs are in library code, so that "function call" is inline on one platform but not another).

      --
      Socialism: a lie told by totalitarians and believed by fools.
    26. Re:-Wall by Jeremi · · Score: 1

      That section of code has a bug: either I wrote the conditional wrong or I typo-ed something like using = instead of ==. Either way I want to know about it so I can fix my code.

      And what if the warning isn't coming from your code, but rather from code within a template in an STL header file? From a template that is in fact correct, but due to the particular template type-arguments you're instantiating it with, the compiler has deduced that certain code branches can never happen?

      In that case, you'll either have to put up with lots of spurious warnings every time you compile your program, or stop using -Wall. Neither of which is a satisfactory solution.

      --


      I don't care if it's 90,000 hectares. That lake was not my doing.
    27. Re:-Wall by Your.Master · · Score: 1

      To pile on, what if the compiler is inlining? A null-check might be truly necessary in one place, and "unnecessary" (in the sense that there's already been a dereference) in another. A sane optimizing compiler cannot just issue a warning for that.

    28. Re:-Wall by Your.Master · · Score: 1

      If tun is NULL, behaviour program-wide is undefined. If program behaviour is undefined, then all optimizations are valid. Since the optimization is valid under the tun != NULL case and under the tun == NULL case, the optimization is valid.

      What error could the compiler possibly throw? If you want it to throw on a null-check after proving that a pointer is non-null, consider, what if "if (!tun)" appears in an inlined function or a macro which in general has to check for null but does not have to in this particular case?

    29. Re:-Wall by Your.Master · · Score: 2

      The specific case where you literally wrote exactly that snippet is warnable and is obviously incorrect, and I agree that case could be a warning, but that doesn't lead to your general conclusion at all since it's just a trivial case.

      That null-check could be inlined code, or code in a macro, both of which can also appear in contexts which truly need the null check. In neither case was the if (!tun) in the original source code, so first off it's hard to even emit a sensible warning, and secondly there's no good reason to believe that case is wrong. You're just using a general purpose function that validates its inputs.

      That effectively means you either have function inlining (or the good parts of link-time-code-generation), or you cannot use any functions that do input validation.

    30. Re:-Wall by smellotron · · Score: 1

      If the compiler decides it can delete a conditional because it's always true or always false, I most certainly do want a warning!

      What if the conditional is based on a macro that is user-defined, to enable or disable functionality at compile-time? Or what if the conditional is based on template parameters, for C++? I see those use-cases on a regular basis, and the relevant code depends upon the compiler to properly optimize out the conditional and dead code for performance.
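      A minimal sketch of the macro case. ENABLE_TRACE and process are hypothetical names; the point is only that the condition is a compile-time constant, so an optimizer can drop the block when the switch is 0.

```c
#ifndef ENABLE_TRACE        /* hypothetical build-time switch */
#define ENABLE_TRACE 0
#endif

int process(int x, int *trace_count)
{
    if (ENABLE_TRACE) {     /* constant condition, dead when the switch is 0 */
        ++*trace_count;
    }
    return x * 2;
}
```

      Building with -DENABLE_TRACE=1 keeps the block. A warning on every such always-false condition would fire on each build that leaves the feature off.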

    31. Re:-Wall by maxwell+demon · · Score: 1

      To be honest, in this situation I would like the compiler to inform me that it had eliminated some code as it is probably not what I meant. Why not write it like this instead?

      int foo(int a, unsigned int b)
      {
      a+=b;
      return a;
      }

      Then in the calling code:
      int *a;
      unsigned int *b;
      *b=10;
      *a = foo(1,*b);

      I understand that this is a simple example, but I prefer to write my code like this as it ensures that the pointers are only dereferenced within the scope in which they are declared.

      I can tell you at least one reason why you wouldn't want to write the code you did: Your assignment to *b is undefined behaviour because you never initialized b.
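      A sketch of the calling code with the pointers aimed at real objects first. The variable names result and addend and the wrapper call_foo are invented for the example.

```c
int foo(int a, unsigned int b)
{
    a += b;
    return a;
}

int call_foo(void)
{
    int result = 0;
    unsigned int addend = 10;
    int *a = &result;           /* a now points at valid storage */
    unsigned int *b = &addend;  /* so does b */

    *a = foo(1, *b);
    return result;
}
```

      With both pointers initialized, every dereference is defined.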

      --
      The Tao of math: The numbers you can count are not the real numbers.
    32. Re:-Wall by Anonymous Coward · · Score: 0

      Speak for yourself. I want to be able to write "runtime" assertions and have the compiler optimize them out if it's verified that the rest of the code won't trigger them. In that case, the conditional having two possible outcomes is a potential bug, and one is the ideal. They're also nice from a code-as-documentation perspective, because they clearly show what's expected of function arguments, and unlike comments, can't get stale without breaking the code.

    33. Re:-Wall by Altrag · · Score: 1

      while (1)
      {
          // Do something
          if (condition) break;
          // Do more stuff
      }

      Is perfectly valid and used frequently. Basically instead of having the loop conditional at the beginning (while(){}) or at the end (do{}while()) you want it somewhere in the middle. Sure there may be ways to transform the loop to move the condition to one end or the other but in many cases all you're really accomplishing is making the code harder to understand.

      The problem usually isn't with removing lines like that on their own, it's when those kinds of lines are "created" by other optimizations in ways that the programmer isn't expecting that leads to problems.

    34. Re:-Wall by TheRaven64 · · Score: 1

      If you really want that, then it will take significantly longer to compile your code than to run it. For example, consider something as simple as a memcpy(). It takes two pointer arguments, both of which are restrict-qualified. The compiler is therefore free to assume that they don't alias, because you have undefined behaviour if they do. But if you want to check this at compile time, then you need to first build a complete code flow graph of the entire program (including shared libraries) and then a data flow graph. You then have to prove that either the two values are different objects (even if they've been allocated a dozen calls up the stack, or both came from globals that were assigned a few thousand function invocations ago, in different calls), or that they're the same object but the offsets are different. Oh, and you have to also statically prove that the size of each object (counting from the pointers passed as arguments) is greater than or equal to the size passed as the third argument.

      Now, it might just about be possible to compute some of these in the general case in finite time, but not all of them and I, for one, like having my compiler terminate.
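      A short sketch of the practical side of the memcpy() contract: when source and destination may overlap, the caller has to reach for memmove(), because the compiler will not prove it either way. shift_right_one is a hypothetical helper.

```c
#include <string.h>

/* Shift the first n bytes of buf one position to the right, in place.
 * Source and destination overlap, so memmove() is required;
 * memcpy() here would be undefined behaviour. */
void shift_right_one(char *buf, size_t n)
{
    if (n > 1)
        memmove(buf + 1, buf, n - 1);
}
```

      Applied to the four bytes "abcd", this leaves "aabc" in the buffer.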

      --
      I am TheRaven on Soylent News
    35. Re:-Wall by Anonymous Coward · · Score: 0

      What if the conditional makes sense, but can be optimized away on the specific architecture you are compiling on? A warning would lead you to remove the code, which would tie your program to a specific architecture only.

    36. Re:-Wall by Anonymous Coward · · Score: 0

      On any piece of reasonable code, you would get several warnings per line of code. Every time a variable isn't copied back to the stack but kept in a register for reuse, it's an optimization. There's even a keyword (volatile) to defeat this optimization.

      If you don't want optimizations, use -O0.

    37. Re:-Wall by Anonymous Coward · · Score: 0

      That section of code has a bug: either I wrote the conditional wrong or I typo-ed something like using = instead of ==. Either way I want to know about it so I can fix my code.

      You've never done this?

      const int BossMood = 1;

      if(BossMood == 0) { // do stuff the boss wanted last week
      }
      else if(BossMood == 1) { // do stuff the boss wanted three weeks ago.
      }

      to avoid deleting and reimplementing code whenever the boss changes his mind?

    38. Re:-Wall by Spazmania · · Score: 1

      A warning would lead me to examine the code and, realizing that I've done something architecture dependent, #ifdef it for that architecture. If it's architecture dependent, I don't want it to be there quietly.

      --
      Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
    39. Re:-Wall by Spazmania · · Score: 1

      Then the STL author has included code in his header files. That makes him a bad boy who should be liberally spanked -- because code embedded in header files, with or without branches, is most emphatically *not* correct.

      --
      Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
    40. Re:-Wall by Spazmania · · Score: 1

      Sure. Except I #define BossMood and then #ifdef BossMood. That way anybody reading my code clearly understands that one branch is intended to be eliminated at compile time. From a documentation perspective, your version is confusing and opaque.

      --
      Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
    41. Re:-Wall by Anonymous Coward · · Score: 0

      yes, ^^^, this.

      We don't have time to go boast and front this EPEEN game of "C is a serious tool, and if you worked for an embedded fapfap like me you'd know the developer should have damn well known the handle of his knife was sharpened as well as the blade, and anyone who can't handle that should stfu." We've evidence there are uncaught security bugs in all these C programs, and after decades we cannot flush them out. This experiment has failed, and this developer-god cult of sloppy compilers, which given hindsight no one should have believed, ever, is now finally discredited. Accept it.

    42. Re:-Wall by AmiMoJo · · Score: 1

      It would need a special flag because most people who use -Wall don't want such warnings. I often write code that I know will be optimized away by the compiler, and for good reason.

      For example I was writing some code that displays the software version by flashing some LEDs. The version number is a #define'ed constant. There were some if statements that I know the compiler will optimize away for most version numbers, and a select statement where only one case will ever be executed based on the version number. The compiler will go nuts on that code, and if I ever change the version number it will still work and do the right thing.

      In your example a programmer might want to make a point to something which the compiler knows has already been set to 1 at that stage. For clarity and to make sure that if someone changes the other bit of code it could be desirable to write "*a=1".

      This sort of thing is common. Say I have a loop variable and when I declare it I set a default value of zero. Later on I have a for loop that also sets it to zero. One of those assignments will be optimized away, but I want to keep both in case later I or someone else uses the loop counter elsewhere and the for loop needs to reset it.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    43. Re:-Wall by Spazmania · · Score: 1

      Right, you want optimization turned off.

      No. No, I don't. Listen to the words that are coming out of my mouth. I want it to determine that it can eliminate the conditional. I want it to eliminate the conditional. And then I want it to tell me that it eliminated the conditional so that I can review my code, decide whether or not that was a sensible thing to do and correct my code as needed.

      --
      Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
    44. Re:-Wall by Spazmania · · Score: 1

      Who else would I speak for?

      If you find it convenient to code that way, be my guest. I'd even be in favor of a compiler warning level which excludes branch elimination from the warnings so that you need not be troubled by optimizer behavior you desire. But when I ask for *all* warnings, I mean all warnings... especially a determination by the optimizer that some of my code is unreachable and can thus be excluded.

      --
      Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
    45. Re:-Wall by Spazmania · · Score: 1

      If program behaviour is undefined, then all optimizations are valid

      If 1=0 then everything is both true and false. Especially your statement.

      What a bunch of BS. That part of the code for the tun handler was wrong doesn't give the optimizer an excuse to make an entirely different part of the code wrong too.

      --
      Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
    46. Re:-Wall by Spazmania · · Score: 1

      Why would you want your object to map directly to your source

      Why are you having trouble getting this? If I've told the compiler to optimize with -O2 then I want it to optimize, not "map directly to my source." If I've also told it to warn me about questionable constructions that are easy to avoid (-Wall) then I expect it to warn me when the optimizer decides it can remove an if{} block because the expression always evaluates to false. That's a VERY questionable construct!

      #ifdefs are intended to remove blocks of code. You don't have to warn me when I explicitly tell you to remove a block of code.

      --
      Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
    47. Re:-Wall by lgw · · Score: 2

      OK, sounds like a feature request that some compiler vendor might take you up on. But eliminating the conditional is so common that most people wouldn't want that spew. Would you also want a warning that "x *= 8" was optimized into a shift left instruction on one platform, and three add instructions on another? That "(x > 30)" was optimized into a rotate instruction on some platform? Heck, half your lines of code won't have 1-for-1 mappings between C operators and opcodes in the object - why the focus on conditionals?

      --
      Socialism: a lie told by totalitarians and believed by fools.
    48. Re:-Wall by lgw · · Score: 1

      gah, slashcode blows goats. "(x >> 2) | (x << 30)" into a rotate.
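      Spelled out as a function, assuming a 32-bit unsigned operand and bitwise OR (a logical || would just yield 0 or 1). The name rotr2 is made up; whether the compiler emits a single rotate instruction depends on the target.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by 2 bits: the two low bits
 * wrap around to become the two high bits. */
uint32_t rotr2(uint32_t x)
{
    return (x >> 2) | (x << 30);
}
```

      For example, rotr2(1) gives 0x40000000 and rotr2(4) gives 1.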

      --
      Socialism: a lie told by totalitarians and believed by fools.
    49. Re:-Wall by Spazmania · · Score: 1

      Say I have a loop variable and when I declare it I set a default value of zero. Later on I have a for loop that also sets it to zero.

      That's a mistake. If you want it to be zero at the start of the loop, you should set it to zero at the start of the loop, not before. Setting it before eliminates the desirable warning about modifying an uninitialized variable should you mistakenly reference it before the start of your loop (-Wmaybe-uninitialized, enabled by -Wall).

      As for your LED code, you *should* be warned about that if you've asked for -Wall. If you don't want to be warned, don't ask for all warnings.

      --
      Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
    50. Re:-Wall by AmiMoJo · · Score: 1

      That's a mistake. If you want it to be zero at the start of the loop, you should set it to zero at the start of the loop, not before. Setting it before eliminates the desirable warning about modifying an uninitialized variable should you mistakenly reference it before the start of your loop (-Wmaybe-uninitialized, enabled by -Wall).

      It depends on your coding standards. We have a rule that you should always set the loop counter when starting a loop, so that it is in a clearly defined state and anyone looking at the loop can see that. No need to go off and hunt where it was set. In some rare instances you might end up declaring another variable just for the loop rather than using an existing one, but that is fine since the compiler will see that it doesn't need to allocate separate memory/registers for it and optimize it away.

      The whole point of the optimizer is to make optimizations to code that is easy for a human to read and understand. Before people had to write tricky code that was hard to debug and modify to get good performance, but now they can write the best quality code possible and let the optimizer worry about making it fast.

      A classic example of this would be a for loop to set elements in a float array to zero. On most systems it can be replaced by a single, optimized memset, but some architectures might not support that. In any case, using a loop and writing "= 0.0" is clearer. It is desirable to have the compiler quietly speed that up as appropriate.
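
      A minimal sketch of the two forms, with made-up function names. Whether a given compiler actually rewrites the first into the second depends on the target and optimization level:

      ```c
      #include <stddef.h>
      #include <string.h>

      /* The readable form: assign 0.0f to each element in turn. */
      void zero_with_loop(float *a, size_t n)
      {
          for (size_t i = 0; i < n; i++)
              a[i] = 0.0f;
      }

      /* The form an optimizer may substitute. It gives the same result
         only where an all-zero-bits float is 0.0f (true for IEEE 754). */
      void zero_with_memset(float *a, size_t n)
      {
          memset(a, 0, n * sizeof *a);
      }
      ```

      The loop states the intent; the memset version bakes in an assumption about the float representation, which is the compiler's job to know.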

      Note also that -Wall doesn't actually enable all warnings. See the GCC documentation for a complete list of what it does enable.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    51. Re:-Wall by Spazmania · · Score: 1

      A classic example of this would be a for loop to set elements in a float array to zero. On most systems it can be replaced by a single, optimized memset

      There's an unsubtle difference between transforming portions of my code and removing portions of my code. It's the latter that I want to see a warning about.

      --
      Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
    52. Re:-Wall by Spazmania · · Score: 1

      That mis-states what I asked for. I don't expect the compiler to tell me about optimizations made to memcpy on the assumption that in memcpy(a,b,4), a!=b.

      On the other hand, if I write:

      a=b;
      memcpy (a,b,20);

      I want a warning when the optimizer decides it can delete the memcpy! Odds are I didn't mean for a to equal b. Silently eliminating the memcpy just makes my bug that much worse.
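
      Until a compiler offers that, the check can live in the code. A rough sketch, assuming a hypothetical wrapper named copy_distinct; memcpy on overlapping objects is undefined, so the wrapper falls back to memmove:

      ```c
      #include <stddef.h>
      #include <string.h>

      /* Hypothetical helper: returns -1 when dst and src are the same
         pointer, which usually signals a caller bug. Otherwise it copies
         with memmove, which is defined even if the ranges partly overlap. */
      int copy_distinct(void *dst, const void *src, size_t n)
      {
          if (dst == src)
              return -1;
          memmove(dst, src, n);
          return 0;
      }
      ```

      The caller gets an error code to act on instead of a copy that silently does nothing.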

      --
      Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
    53. Re:-Wall by david_thornley · · Score: 1

      You don't seem to understand C++ templates. The template definition, which usually has to include code, has to be visible to the compiler for the compiler to use the template, so it's almost always in a header file. Even if it were possible to tuck it away, std::sort(a, b) is going to insert whatever code std::sort is defined to. That means that the template is going to insert code you didn't write. That code was written to function well with the compiler, or with compilers in general, and is likely to take advantage of the optimizations. It's much more important that it perform correctly and swiftly than that it doesn't confuse somebody trying to follow all the optimizations.

      --
      "When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
    54. Re:-Wall by Spazmania · · Score: 1

      You're right: I don't know enough about C++ templates to tell you that you're wrong.

      I do know enough about the compiler behavior, though, to tell you that when the compiler generates warnings it tracks them back to a file and line number that the warning is about. Which means that once you've checked the warnings which originate from that file you could reasonably tell the compiler to stop reporting warnings just for that file. Without having to sacrifice useful warnings for the rest of your code.

      --
      Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
    55. Re:-Wall by TheRaven64 · · Score: 1

      You're asking for the same thing, because you apparently don't understand how modern compilers work.

      The deletion happens much later on in the compiler pipeline. By the time you get to this point, you've inlined the memcpy (because you've decided that copying 20 bytes is not worth a function call), and propagated the alias information out. The code consuming the alias information, and determining that you're doing something undefined is dealing with an intermediate representation that is no longer even slightly close to the source.

      In this particular case, it's something you could probably do just by inspecting the AST, but most cases of code relying on undefined behaviour are only exposed to optimisation passes after you've done dominance frontier construction, inlining, function specialisation, constant propagation, common subexpression elimination and probably loop-invariant code motion. By the time you're in this state, telling the difference between something that is undefined behaviour that is exposed after a long chain of transforms, or something that the programmer intended is difficult, without maintaining a full back-mapping (want to see your compiler memory usage increase by a factor of 20?) for every transform.

      And even if it does, all that tells you is that you're doing something wrong (or not - it may be intentional) that this particular compiler version is taking advantage of, it doesn't tell you what the next point release will do. You're better off running something like the clang static analyser that can do symbolic evaluation on functions and point to potentially undefined behaviour.

      --
      I am TheRaven on Soylent News
    56. Re:-Wall by david_thornley · · Score: 1

      For warnings in the lexer or parser, yes, the compiler knows where they come from. For warnings generated by the optimizer, much less so. A particular section of compiled code might have no simple relationship with the source, once the optimizer goes through.

      --
      "When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
    57. Re:-Wall by Spazmania · · Score: 1

      If I compile it with debugging symbols (in addition to optimization) the debugger somehow manages to find its way back.

      --
      Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
    58. Re:-Wall by flargleblarg · · Score: 1

      We have a rule that you should always set the loop counter when starting a loop, so that it is in a clearly defined state and anyone looking at the loop can see that.

      Better yet, you should always define and set the loop counter when starting a loop, e.g.,

      for (int i = 0; i < imax; i++)

      so that its type definition is clearly defined and anyone looking at the loop can see it. There is never any good reason for this:

      int i;
      ...
      for (i = 0; i < imax; i++)

      No need to go off and hunt where it was set.

      Or go off and hunt where it was defined.

  14. Really small EXE mystery solved by Tablizer · · Score: 5, Funny

    many compilers actually remove code that it perceives to be undefined or unstable

    No wonder my app came out with 0 bytes.

    1. Re:Really small EXE mystery solved by Anonymous Coward · · Score: 0

      You work for CGI Federal by chance?

    2. Re:Really small EXE mystery solved by deviated_prevert · · Score: 1

      You work for CGI Federal by chance?

      No, if he worked for CGI Federal he would have come up with a killer PowerPoint presentation of the .gov site GUI with no code used whatsoever! The actual code would then be quickly concocted to put a web site front end on the web; whether or not the code does anything is the least part of the problem. So if the compilers/interpreters on the server eat it without warning, this is a software feature, not a bug. If the site was coded in Coco then I am certain the tech doing the coding would tell the users that they were "clicking on the buttons wrong!"

      --
      This message was not sent from an iPhone because Peter Sellers really was a deviated prevert without a dime for the call
    3. Re:Really small EXE mystery solved by Anonymous Coward · · Score: 0

      Wow, I find the fact that the community has modded you Informative instead of Funny to be *extremely* informative!

  15. PC Lint anyone? by ArcadeNut · · Score: 3, Informative

    Back in the day when I was doing C++ work, I used a product called PC Lint (http://www.gimpel.com/html/pcl.htm) that did basically the same thing STACK does: static analysis of code to find errors such as referencing NULL pointers, buffer overflows, etc... Maybe they should teach History at MIT first...

    --
    Visit the Arcade Restoration Workshop @ http://www.arcaderestoration.com
    1. Re:PC Lint anyone? by EvanED · · Score: 4, Insightful

      Don't worry, the authors know what they're doing.

      Just because PC Lint could find a small number of potential bugs doesn't mean it's a solved problem by any means. Program analysis is still pretty crappy in general, and they made another improvement, just like tons of people before them, PC Lint before them, and tons of people before PC Lint.

    2. Re:PC Lint anyone? by EvanED · · Score: 1

      BTW, I should also say that the summary doesn't do a very good job conveying this. That compilers remove security-sensitive code in some situations has been known for more than a decade (I know off the top of my head how to establish 2002, but probably long before that), but the article is written for people who don't necessarily know that, and it can't start from very little and build up to "here's what their improvement actually was."

    3. Re:PC Lint anyone? by Impy+the+Impiuos+Imp · · Score: 1

      A lint program would have to know a particular compiler's strategy in dealing with undefined issues, as well as optimization strategies, to uncover issues like this.

      Most code analyzers, from old school lint thru new stuff like NASA's Polyspace, assume code at the syntactic level. Hence a new tool is needed (and seems ongoingly labor-intensive, given the need to be compiler-specific.)

      --
      (-1: Post disagrees with my already-settled worldview) is not a valid mod option.
    4. Re:PC Lint anyone? by EvanED · · Score: 1

      They don't want to check what happens under a specific implementation's choice of "undefined behavior" behavior (it doesn't even really make sense to even talk about that...), they want to check for the presence or absence of undef behavior in general. (At least so I gather from the abstract and scanning.) That is compiler and platform-independent, so there's no need to make separate tools.

    5. Re:PC Lint anyone? by Anonymous Coward · · Score: 0

      It's also worth noting that performing a "perfect" program analysis is precisely equivalent to solving the halting problem, and therefore lacks a general solution. So while it's possible to improve the algorithm, it's provably impossible to find all the errors all the time.

    6. Re:PC Lint anyone? by maxwell+demon · · Score: 1

      That is compiler and platform-independent, so there's no need to make separate tools.

      Actually, it's not completely platform independent. For example, does the following code produce undefined behaviour?

      int i = 256;
      i = i*i;

      On a 16 bit platform, it does. Of course, 16 bit platforms are basically obsolete (at least for desktop/server software; I'm not sure if there are embedded systems still using 16 bit architectures), and on 32 bit or 64 bit platforms, it will work fine.
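
      A portable way to ask the question before multiplying, sketched with a made-up helper name. It compares against INT_MAX and INT_MIN using divisions that cannot themselves overflow:

      ```c
      #include <limits.h>
      #include <stdbool.h>

      /* Returns true if a * b would not fit in an int on this platform. */
      bool int_mul_overflows(int a, int b)
      {
          if (a == 0 || b == 0)
              return false;
          if (a > 0)
              return b > 0 ? a > INT_MAX / b : b < INT_MIN / a;
          return b > 0 ? a < INT_MIN / b : b < INT_MAX / a;
      }
      ```

      int_mul_overflows(256, 256) is true where int is 16 bits and false where it is 32 bits or wider, which is exactly the platform dependence described above.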

      --
      The Tao of math: The numbers you can count are not the real numbers.
    7. Re:PC Lint anyone? by EvanED · · Score: 1

      That's a good point.

    8. Re:PC Lint anyone? by Anonymous Coward · · Score: 0

      I wouldn't say PC-lint finds "a small number of potential bugs". It's really quite thorough. How familiar are you with PC-lint?

    9. Re:PC Lint anyone? by EvanED · · Score: 1

      Not PC lint specifically, but pretty familiar with similar tools.

      In re-reading I wasn't so clear with what I meant, because I didn't want to say that tools like that aren't useful. (I work for a company that produces a product that could probably be considered a competitor to PC Lint; though I suspect we'd do better I don't know for sure. :-))

      It's more like the proportion of bugs that tools like PC Lint can find is small in comparison to the population of bugs that people actually write, let alone could write.

  16. IBM had a tool to do this for a long time already by PolygamousRanchKid+ · · Score: 3, Interesting

    It's a pretty cool critter, but I don't know if they actually sell it as a product. It might be something that they only use internally:

    http://www.research.ibm.com/da/beam.html

    http://www.research.ibm.com/da/publications/beam_data_flow.pdf

    --
    Schroedinger's Brexit: The UK is both in and out of the EU at the same time!
  17. Yes compilers really do this by Anonymous Coward · · Score: 3, Informative

    Yes it leads to real bugs - Brad Spengler uncovered one of these issues in the Linux kernel in 2009 and it led to the kernel using the -fno-delete-null-pointer-checks gcc flag to disable the spec correct "optimisation".

  18. Re:TFA does a poor job of defining what's happenin by dgatwood · · Score: 5, Informative

    Another, more common example of code optimizations causing security problems is this pattern:

    int a = [some value obtained externally];
    int b = a + 2;
    if (b < a) {
    // integer overflow occurred ...
    }

    The C spec says that signed integer overflow is undefined. If a compiler does no optimization, this works. However, it is technically legal for the compiler to rightfully conclude that two more than any number is always larger than that number, and optimize out the entire "if" statement and everything inside it.

    For proper safety, you must write this as:

    int a = [some value obtained externally];
    if (a > INT_MAX - 2) {
    // integer overflow will occur ...
    }
    int b = a + 2;
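
    The same pre-check idea generalizes to any pair of operands. A minimal sketch, with a made-up helper name, that tests before adding so the addition itself can never overflow:

    ```c
    #include <limits.h>
    #include <stdbool.h>

    /* Stores a + b in *out and returns true, or returns false without
       touching *out if the sum would overflow. */
    bool checked_add_int(int a, int b, int *out)
    {
        if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
            return false;
        *out = a + b;
        return true;
    }
    ```

    GCC and Clang also provide __builtin_add_overflow, which does the same job non-portably.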

    --

    Check out my sci-fi/humor trilogy at PatriotsBooks.

  19. Re:TFA does a poor job of defining what's happenin by Cryacin · · Score: 2

    YOU SUNK MY BATTLESHIP!

    --
    Science advances one funeral at a time- Max Planck
  20. Re:TFA does a poor job of defining what's happenin by Spikeles · · Score: 4, Informative

    The TFA links to the actual paper. Maybe you should read that.

    Towards Optimization-Safe Systems:Analyzing the Impact of Undefined Behavior

    struct tun_struct *tun = ...;
    struct sock *sk = tun->sk;
    if (!tun)
    return POLLERR;
    /* write to address based on tun */

    For example, when gcc first sees the dereference tun->sk, it concludes that the pointer tun must be non-null, because the C standard states that dereferencing a null pointer is undefined [24:6.5.3]. Since tun is non-null, gcc further determines that the null pointer check is unnecessary and eliminates the check, making a privilege escalation exploit possible that would not otherwise be.
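
    The repair is just ordering: test the pointer, then dereference it. A sketch with stand-in types; the struct layouts, DEMO_POLLERR and the function name are placeholders, not the kernel's definitions:

    ```c
    #include <stddef.h>

    struct sock { int refcount; };          /* stand-in for the kernel type */
    struct tun_struct { struct sock *sk; }; /* stand-in for the kernel type */

    #define DEMO_POLLERR 0x0008u /* placeholder value for this sketch */

    unsigned int tun_poll_sketch(struct tun_struct *tun)
    {
        if (!tun)                   /* test first ... */
            return DEMO_POLLERR;
        struct sock *sk = tun->sk;  /* ... dereference second */
        return sk ? 0u : DEMO_POLLERR;
    }
    ```

    With the check ahead of the dereference, the compiler has no earlier access from which to infer that tun is non-null, so it has no grounds to drop the check.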

    --
    I don't need to test my programs.. I have an error correcting modem.
  21. Re:TFA does a poor job of defining what's happenin by complete+loony · · Score: 4, Informative

    "What every C programmer should know about undefined behaviour" (part 3, see links for first 2 parts).

    For example, overflow of signed values is undefined behaviour in the C standard. Compilers can make decisions like using an instruction that traps on overflow if it would execute faster, or if that is the only operator available. Since overflowing might trap, and thus cause undefined behaviour, the compiler may assume that the programmer didn't intend for that to ever happen. Therefore this test will always evaluate to true, so this code block is dead and can be eliminated.

    This is why there are a number of compilation optimisations that gcc can perform, but which are disabled when building the linux kernel. With those optimisations, almost every memory address overflow test would be eliminated.
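
    For reference, the standard does define unsigned arithmetic as wrapping modulo 2^N; it is signed and pointer overflow that are undefined. So an after-the-fact test is safe when the operands are unsigned. A small sketch, with a made-up function name:

    ```c
    #include <stdbool.h>

    /* Unsigned addition wraps modulo UINT_MAX + 1, so comparing the sum
       with an operand afterwards is a well-defined overflow test. */
    bool uadd_wraps(unsigned int a, unsigned int b, unsigned int *sum)
    {
        *sum = a + b;
        return *sum < a;
    }
    ```

    The compiler cannot discard this comparison, because a wrapped result is a legal outcome it has to account for.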

    --
    09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
  22. Munition by Anonymous Coward · · Score: 0

    This project is an NSA wet dream! Essentially its a factory for creating CVEs...

  23. Fix the C standard to not be so silly by Myria · · Score: 1, Insightful

    The C standard needs to meet with some realities to fix this issue. The C committee wants their language to be usable on the most esoteric of architectures, and this is the result.

    The reason that the result of signed integer overflow and underflow are not defined is because the C standard does not require that the machine be two's complement. Same for 1 << 31 and the negative of INT_MIN being undefined. When was the last time that you used a machine whose integer format was one's complement?

    Here are the things I think should change in the C standard to fix this:

      * Fixation of two's complement as the integer format.
      * For signed integers, shifting left a 1 bit out of the most-significant bit gets shifted into the sign bit. Combined with the above, this means that for type T, ((T) 1) << ((sizeof(T) * CHAR_BIT) - 1) is the minimum value.
      * The result of signed addition, subtraction, and multiplication are defined as conversion of all promoted operands to the equivalent unsigned type, executing the operation, then converting the result back. (In the case of multiplication, the high half is chopped off. This makes signed and unsigned multiplication equivalent.)
      * When shifting right a signed integer, each new bit is a copy of the sign bit. That is, INT_MIN >> ((sizeof(int) * CHAR_BIT) - 1) == -1.

    That should fix most of these. Checking a pointer for wraparound on addition, however, is just dumb programming, and should remain the programmers' problem. Segmentation is something that has to remain a possibility.
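
    The third bullet can already be spelled out by hand in today's C. A sketch, assuming a platform that provides int32_t; the function name is made up:

    ```c
    #include <stdint.h>

    /* Two's-complement wrapping add, computed in unsigned arithmetic so
       no signed overflow occurs. The conversion back is done by hand to
       avoid the implementation-defined cast of an out-of-range value. */
    int32_t wrapping_add_i32(int32_t a, int32_t b)
    {
        uint32_t u = (uint32_t)a + (uint32_t)b;
        if (u <= (uint32_t)INT32_MAX)
            return (int32_t)u;
        return (int32_t)(u - 0x80000000u) + INT32_MIN;
    }
    ```

    The proposal amounts to making the built-in + behave this way, so nobody has to write the helper.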

    --
    "Screw Sun, cross-platform will never work. Let's move on and steal the Java language." - Visual J++ Product Manager
    1. Re:Fix the C standard to not be so silly by seebs · · Score: 3, Insightful

      Pretty sure the embedded systems guys wouldn't be super supportive of this, and they're by far the largest market for C.

      And I just don't think these are big sources of trouble most of the time. If people would just go read Spencer's 10 Commandments for C Programmers, this would be pretty much solved.

      --
      My blog: http://www.seebs.net/log/ --- My iPhone/iPad app: http://www.seebs.net/seebsfrac/
    2. Re:Fix the C standard to not be so silly by mysidia · · Score: 1

      * Fixation of two's complement as the integer format.

      Are you trying to make C less portable, or what?

      Not all platforms work exactly the same, and these additional constraints on datatypes would be a problem on platforms, where: well two's complement is not the signed integer format.

      Of course you're free to define your own augmented rules on top of C, as long as they're not the formal language standard --- and if you write compilers, you're free to constrain yourself into making your implementation a higher-level interpreter, that makes these overflow conditions work the same on non 2's complement platforms.

    3. Re:Fix the C standard to not be so silly by Myria · · Score: 1

      * Fixation of two's complement as the integer format.

      Are you trying to make C less portable, or what?

      The "broken" code is already nonportable to non-two's-complement machines, and much of this code is things critical to the computing and device world as a whole, such as the Linux kernel.

      --
      "Screw Sun, cross-platform will never work. Let's move on and steal the Java language." - Visual J++ Product Manager
    4. Re:Fix the C standard to not be so silly by osu-neko · · Score: 1

      When was the last time that you used a machine whose integer format was one's complement?

      How would I know? I'm sure there's over a hundred microprocessors in my home (desktop PCs being the tiny minority of computers in the world -- I suspect the most common type of computer is an alarm clock, but who knows, some kitchen appliance might outnumber them for all I know), and I haven't got a clue what kind of integer format 99% of them use. It wouldn't surprise me in the slightest if the majority of computers in the world today use BCD.

      --
      "Convictions are more dangerous enemies of truth than lies."
    5. Re:Fix the C standard to not be so silly by Anonymous Coward · · Score: 0

      Solution to the problem: Fork the C standard, and define the new standard so that all of C's undefined behavior is defined by the new standard to work the way it would if implemented in the straight-forward way on i386/amd64, with all sequence-point ambiguities resolved right-to-left for assignment operators (=, +=, etc), and left-to-right otherwise.

      Oh, and require a default warning level of -Wall -Wextra -pedantic.
      And FFS define a convenient way to trap arithmetic overflow, similar to C#'s checked/unchecked blocks.

      p.s. Don't forget the blackjack and hookers.

    6. Re:Fix the C standard to not be so silly by Anonymous Coward · · Score: 0

      Way to miss the point. The only reason this is an issue is because these constraints in the C standard are useful for optimizing ordinary programs on completely ordinary architectures.
      So your proposal is to make C less portable and slower. Honestly my question to you is: Why on earth are you using C? Just go with e.g. Ada. Or Java. Or Python or any other of those 100s of programming languages.
      Alternatively use tools like PC Lint, Coverity etc. and it will point you to most of these.

    7. Re:Fix the C standard to not be so silly by Anonymous Coward · · Score: 0

      How the fuck can you be a programmer if you don't know THAT MUCH?! Jeez.

      We (EEs, the guys who make the hardware) don't make any 1-complement machines anymore, and we will never make them again. They do not make sense for any application, they're worse than useless. Even for non-binary arches (none exist commercially), 1-complement will not be used... but those would need a special dialect of C to work, as well.

      There are no working 1-complement boxes outside of a couple hundred in museums or forgotten basements, and those don't need new C compilers or run modern code anyway.

    8. Re:Fix the C standard to not be so silly by Megol · · Score: 1

      How many active platforms don't have a two's complement implementation? I can name _one_ (a mainframe series originating in the 60s). The problem is that current compilers knowingly break code that has been working well on two's complement machines for ages, arguing that the code is non-standard. Yes it's non-portable but it's portable to the great majority of real world systems, I'd say 99.99999% of the computer world. The right way to treat these cases is to warn that the code assumes non-standard behavior and compile it. The wrong way is detecting this rule breaking and "optimizing" the code into something it wouldn't logically generate even on signed magnitude machines unless given some obscure flag. The worst thing is that many compilers that have started doing these kinds of pessimizations don't even support an architecture that isn't two's complement!

    9. Re:Fix the C standard to not be so silly by gnasher719 · · Score: 1

      The reason that the result of signed integer overflow and underflow are not defined is because the C standard does not require that the machine be two's complement. Same for 1 << 31 and the negative of INT_MIN being undefined. When was the last time that you used a machine whose integer format was one's complement?

      No, for that all that would be required is to call it "unspecified result". Unspecified result is much weaker than undefined behaviour, it means a valid integer is produced as the result, but doesn't say which one, and doesn't even say it has to be consistent.

      But integer overflow can also create hardware exceptions or raise signals. That's why it is undefined behaviour.

    10. Re:Fix the C standard to not be so silly by UnderCoverPenguin · · Score: 1

      * Fixation of two's complement as the integer format.
        * For signed integers, shifting left a 1 bit out of the most-significant bit gets shifted into the sign bit.

      In two's complement format, this already happens. Example (using int8_t for simplicity):

      (int8_t)((int8_t)64 << 1) == (int8_t)-128

      FWIW, everywhere I have worked, using shift operations on signed or non-integer values is not allowed. Furthermore, we don't allow shifts as shortcuts for arithmetic purposes - if you mean to divide by 64, then write x/64, not x>>6. Shifts are for aligning bits, not arithmetic.

      I think shifts (and bitwise) operations on signed or non-integer values should be Implementation Defined - and highly discouraged.

      --
      Don't try to out wierd me, three-eyes. I get stranger things than you, free with my breakfast cereal. --Zaphod Beeblebr
  24. The paper gives examples by AdamHaun · · Score: 4, Informative

    The article doesn't summarize this very well, but the paper (second link) provides a couple examples. First up:

    char *buf = ...;
    char *buf_end = ...;
    unsigned int len = ...;
    if (buf + len >= buf_end)
      return; /* len too large */

    if (buf + len < buf)
      return; /* overflow, buf+len wrapped around */
    /* write to buf[0..len-1] */

    To understand unstable code, consider the pointer overflow check buf + len < buf shown [above], where buf is a pointer and len is a positive integer. The programmer's intention is to catch the case when len is so large that buf + len wraps around and bypasses the first check ... We have found similar checks in a number of systems, including the Chromium browser, the Linux kernel, and the Python interpreter.

    While this check appears to work on a flat address space, it fails on a segmented architecture. Therefore, the C standard states that an overflowed pointer is undefined, which allows gcc to simply assume that no pointer overflow ever occurs on any architecture. Under this assumption, buf + len must be larger than buf, and thus the "overflow" check always evaluates to false. Consequently, gcc removes the check, paving the way for an attack to the system.
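
    One way to express the same bounds test without ever forming an out-of-range pointer is to compare lengths instead of addresses. A sketch, with a made-up function name, assuming buf and buf_end point into the same array:

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Returns true if buf + len would land before buf_end. Only the
       in-range difference buf_end - buf is computed, so no pointer
       arithmetic can overflow. */
    bool fits_in_buffer(const char *buf, const char *buf_end, unsigned int len)
    {
        assert(buf <= buf_end);
        return len < (size_t)(buf_end - buf);
    }
    ```

    This folds both original checks into one: a huge len simply fails the comparison, and there is no wraparound left to test for.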

    They then give another example, this time from the Linux kernel:

    struct tun_struct *tun = ...;
    struct sock *sk = tun->sk;
    if (!tun)
      return POLLERR;
    /* write to address based on tun */

    In addition to introducing new vulnerabilities, unstable code can amplify existing weakness in the system. [The above] shows a mild defect in the Linux kernel, where the programmer incorrectly placed the dereference tun->sk before the null pointer check !tun. Normally, the kernel forbids access to page zero; a null tun pointing to page zero causes a kernel oops at tun->sk and terminates the current process. Even if page zero is made accessible (e.g. via mmap or some other exploits), the check !tun would catch a null tun and prevent any further exploits. In either case, an adversary should not be able to go beyond the null pointer check.

    Unfortunately, unstable code can turn this simple bug into an exploitable vulnerability. For example, when gcc first sees the dereference tun->sk, it concludes that the pointer tun must be non-null, because the C standard states that dereferencing a null pointer is undefined. Since tun is non-null, gcc further determines that the null pointer check is unnecessary and eliminates the check, making a privilege escalation exploit possible that would not otherwise be.

    The basic issue here is that optimizers are making aggressive inferences from the code based on the assumption of standards-compliance. Programmers, meanwhile, are writing code that sometimes violates the C standard, particularly in corner cases. Many of these seem to be attempts at machine-specific optimization, such as this "clever" trick from Postgres for checking whether an integer is the most negative number possible:

    int64_t arg1 = ...;
    if (arg1 != 0 && ((-arg1 < 0) == (arg1 < 0)))
      ereport(ERROR, ...);
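
    The trick relies on -arg1 overflowing, which is undefined for the one value it is meant to detect. A direct comparison says the same thing with no overflow involved; a sketch with a made-up function name, not necessarily the fix Postgres adopted:

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* True if negating arg1 would overflow: in two's complement only
       INT64_MIN has no positive counterpart. */
    bool negation_overflows(int64_t arg1)
    {
        return arg1 == INT64_MIN;
    }
    ```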

    The remainder of the paper goes into the gory Comp Sci details and discusses their model for detecting unstable code, which they implemented in LLVM. Of particular interest is the table on page 9, which lists the number of unstable code fragments found in a variety of software packages, including exciting ones like Kerberos.

    --
    Visit the
    1. Re:The paper gives examples by gnupun · · Score: 1

      While this check appears to work on a flat address space, it fails on a segmented architecture. Therefore, the C standard states that an overflowed pointer is undefined, which allows gcc to simply assume that no pointer overflow ever occurs on any architecture. Under this assumption, buf + len must be larger than buf, and thus the "overflow" check always evaluates to false. Consequently, gcc removes the check, paving the way for an attack to the system.

      This seems like a GCC bug (assuming no overflow). Why are all compilers being blamed?

    2. Re:The paper gives examples by EvanED · · Score: 1

      This seems like a GCC bug (assuming no overflow). Why are all compilers being blamed?

      It's not a bug, and they're not blaming anyone. All compilers perform safe-but-unsafe transformations based on undef-behavior assumptions. You just have to pick a study target.

    3. Re:The paper gives examples by Anonymous Coward · · Score: 0

      It's not a GCC bug.

      In any state where the check would evaluate to true, the program's behaviour was already undefined by the C standard. Any behaviour whatsoever is standard-compliant in that case, including running as if the conditional were never present.

    4. Re:The paper gives examples by Old+Wolf · · Score: 1

      While this check appears to work on a flat address space, it fails on a segmented architecture.

      It may not even work on a flat address space, if "buf"'s allocated block is right at the end of the addressable space.

    5. Re:The paper gives examples by antifoidulus · · Score: 1

      Programmers, meanwhile, are writing code that sometimes violates the C standard, particularly in corner cases

      It's not so much a violation as assuming behaviour is defined when it is not. Case in point, integer overflows: there actually is nothing in the C language spec that defines what the result is when signed integers overflow. The result is compiler, and perhaps even hardware, dependent. That's C's double-edged sword: the amount of undefined behaviour is what makes it so flexible, running on almost every single hardware/software platform imaginable. It also means that you have to be really, really careful about what you are doing.

      If you want to see all the interesting ways C's undefined behaviour can manifest itself you should do what I did, teach a freshman programming course :P Through doing so I learned that C functions don't actually need a return statement, it's just that the behaviour is undefined when it doesn't have one. And here is the kicker, the student who didn't have the return statement actually produced the correct output. It's a fluke that apparently works on Intel platforms, but probably won't on most others. The return statement on an Intel machine apparently just sets a register, the same register holds the most recently calculated value, so provided your compiler doesn't optimise out the code (which is what this article is talking about) the following code can actually print 19:

      #include <stdio.h>

      int bob() {
          int a;
          a = 7 + 12;   /* no return statement: using bob()'s value is undefined */
      }

      int main(void) {
          printf("%d\n", bob());
          return 0;
      }

      Long live C!

    6. Re:The paper gives examples by osu-neko · · Score: 1

      ... It's a fluke that apparently works on Intel platforms, but probably won't on most others.

      Actually, returning a function value in a register is pretty common for C compilers across most platforms. The lucky bit here was that it's the same register that it actually did the arithmetic in, but even that isn't entirely luck -- having the most commonly used arithmetic register also be the one that is used for function return values saves a lot of code moving the result around before returning from a function. Pretty sure the same code would have worked fine on the Motorola 68000 based computer I first learned C programming on. Obviously you do NOT want to depend on that trick, but it wouldn't surprise me if it works on most general purpose computers and most compilers that don't barf on the fact that your non-void function has no return statement (compilers are so fussy these days).

      --
      "Convictions are more dangerous enemies of truth than lies."
    7. Re:The paper gives examples by willy_me · · Score: 1

      It's a fluke that apparently works on Intel platforms, but probably won't on most others.

      I remember attending labs in CompSci only to hear people complain about how the Solaris machines sucked because their compiler had bugs. They would program at home on their x86 Windows box using Visual C++ and then complain when GCC for SPARC didn't like their code. Funny to hear in second year but when you hear it in 4th year it becomes a little disturbing.

      This served as an excellent example as to why cross-platform code is almost always better. If code can not compile on a different platform then there is something wrong with your code and it should be fixed.

    8. Re:The paper gives examples by AdamHaun · · Score: 1

      It's not so much a violation as assuming behaviour is defined when it is not.

      Yes, that's a better way to say it.

      The passing of parameters and return values falls under the purview of calling conventions and ABIs. These are discussed in the compiler manual (yes, yours has one), but usually ignored on PCs. (Or really anything with an OS.) In embedded programming, that stuff is much more useful, since you're more often interfacing C and assembly functions. It's also helpful for debugging, since the debugger is often confused by optimization. Put a breakpoint on the branch instruction and the parameters will be in the registers or on the stack.

      Someone posted a link to Deep C elsewhere in the comments, which goes over some of these details.

      --
      Visit the
  25. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    Er... so your conclusion is that we should all "run screaming" from C/C++ and hire a bunch of people who "truly know what they're doing... on all levels of detail" with C/C++, who are liable to be a relatively limited number and command top wages.

    Top plan! One would almost think you view yourself as one of those people who "truly know what they're doing"! Even "on all levels of detail!"

    The third alternative is "Elucidate how compilers are (rightfully, according to the standard) introducing potential dangers, and educate people accordingly", on whatever level you feel that involves. Alas, that alternative never occurred to you. Which, given the kind of abstractions a paper I read recently pointed out -- where Clang and G++ acted quite differently and equally dangerously -- is a bit of a pity since I strongly suspect you'd probably be caught out by an unexpected trap too.

    (If you think you wouldn't you've never developed professionally for a living. We all have, and no-one should be ashamed to say they have.)

  26. Re:IBM had a tool to do this for a long time alrea by Anonymous Coward · · Score: 0

    Is this a really pathetic way of saying "HEY I WORKED FOR IBM!!!!!!!!!!!!!!!!!!!!" without trying to be quite so clear about it?

  27. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    Gee, I wish I programmed all the things you program! Then I'd NEVER need anything but unsigned integers! Ah, there but for the grace of God...

    (In case you missed the subtext: prick. You aren't everyone and the 99% of the things *you* program probably form the 1% of the things that *I* program, and that does not make what I program worthless any more than it makes what you program worthless... unlike you, since you're an arrogant piece of shite.)

  28. Re:TFA does a poor job of defining what's happenin by istartedi · · Score: 1

    For example, overflows of unsigned values is undefined behaviour in the C standard.

    I'm glad I didn't know that when I used to play with software 3d engines back in the 90s. 16-bit unsigned integer "wrap around" was what made my textures tile. I do seem to vaguely recall that there was a compiler flag for disabling integer traps and that I disabled it. It was Microsoft's C compiler, and it's been a loooooong time.

    OK, I'm looking through the options on the 2005 free Visual Studio... I can find a flag to disable floating point traps, but not integer. Maybe the full version lets you do that. I used to have the full version. I suppose if it were really important I could track down the magic assembly voodoo incantation to do it. I'm guessing MS disables integer overflow traps by default...
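
    A minimal sketch of that kind of 16-bit wraparound, assuming a hypothetical `advance_coord` helper rather than anything from the original engine. The wrap is well defined, though the step that produces it is easy to overlook:

```c
#include <stdint.h>

/* Hypothetical texture-coordinate step: advance a 16-bit coordinate
 * and let it wrap so the texture tiles. */
uint16_t advance_coord(uint16_t coord, uint16_t step)
{
    /* Where int is wider than 16 bits, both operands are promoted to
     * int before the addition, so the sum itself does not wrap. The
     * conversion back to uint16_t reduces it modulo 65536, and that
     * conversion is well defined. */
    return (uint16_t)(coord + step);
}
```

    Multiplication is a different story: with a 32-bit int, the promoted product of two large uint16_t values can exceed INT_MAX, which is signed overflow.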

    --
    For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
  29. Re:TFA does a poor job of defining what's happenin by Spazmania · · Score: 0

    That code is bad for many reasons, not the least of which is that it's semantically ambiguous whether the result of malloc() should be assigned to a or *a.

    However, the compiler here clearly can't make any valid assumptions about the contents of *a following the realloc. That's what undefined means: it holds a value about which you can't make any assumptions. Because the behavior is undefined, no *valid* optimization is possible.

    Clang is wrong. If it's smart enough to recognize the undefined behavior then it should (a) warn the user and (b) make no optimization attempts to any code which later references *a.

    --
    Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
  30. Re:TFA does a poor job of defining what's happenin by mysidia · · Score: 1

    I'd rather set -Wall and get a warning.

    I see your -Wall, and raise you a -Werror -pedantic

  31. Re:TFA does a poor job of defining what's happenin by assassinator42 · · Score: 1

    Overflows of unsigned values are well-defined in C (they wrap). (Technically the standard says unsigned values can't overflow because they're wrapped)
    Overflows of signed values are undefined.
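
    A small sketch of the difference, using hypothetical helper names. The unsigned increment simply wraps, while the signed addition is range-checked first so that the overflowing operation never executes:

```c
#include <limits.h>
#include <stdbool.h>

/* Unsigned arithmetic is reduced modulo UINT_MAX + 1, so this is
 * well defined even when x == UINT_MAX. */
unsigned wrap_increment(unsigned x)
{
    return x + 1u;
}

/* Hypothetical helper: stores a + b in *out and returns true only when
 * the sum is representable as an int. The range test happens before
 * the addition, so no signed overflow ever occurs. */
bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *out = a + b;
    return true;
}
```

    A test written after the fact, such as `a + b < a`, is exactly the kind of check a compiler is entitled to fold away for signed operands.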

  32. Re:TFA does a poor job of defining what's happenin by mysidia · · Score: 0

    A cursory glance at this code suggests missiles will not be launched. With gcc, that's probably true at the moment. With clang, as I understand it, this is not true -missiles will be launched.

    It's not quite correct. a == b is not a use of the argument that has been invalidated. a was a variable containing an address of the object that was passed by value to the realloc() function.

    In case the value of a is no longer valid, then the b = realloc ... assignment, would not have returned the same value; therefore, a == b would evaluate to false, and with the short-circuit && operator, the *a != *b test would never have been executed.

  33. Re:TFA does a poor job of defining what's happenin by EvanED · · Score: 2, Informative

    The first mistake was using signed integers. unsigned integers always have well-defined overflow (modulo semantics), which means it's easier to construct safe conditionals

    Not in C and C++ they don't. The compiler is allowed to perform that optimization with either signed or unsigned integers.

  34. Meanwhile, THEIR code is sketchy by belphegore · · Score: 3, Funny

    Checked out their git repo and did a build. They have a couple sketchy-looking warnings in their own code. A reference to an undefined variable; storing a 35-bit value in a 32-bit variable...

    lglib.c:6896:7: warning: variable 'res' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
    lglib.c:6967:10: note: uninitialized use occurs here
    plingeling.c:456:17: warning: signed shift result (0x300000000) requires 35 bits to represent, but 'int' only has 32 bits [-Wshift-overflow]

    1. Re:Meanwhile, THEIR code is sketchy by swm · · Score: 1

      How did you get the code?
      I keep getting

      git clone git://g.csail.mit.edu/stack
      Cloning into 'stack'...
      fatal: unable to connect to g.csail.mit.edu:
      g.csail.mit.edu[0: 128.30.44.149]: errno=Connection timed out

    2. Re:Meanwhile, THEIR code is sketchy by belphegore · · Score: 2

      I probably got in early enough to grab the code before they got slashdotted. Looks like it's also on github, here:

      https://github.com/xiw/stack

  35. Re:TFA does a poor job of defining what's happenin by istartedi · · Score: 0

    A cursory glance at this code suggests missiles will not be launched.

    That's funny. My first takeaway is that the programmer is assuming malloc never fails. Let's get past that and assume that malloc and realloc both returned something. Most of us would assume it's unusual for realloc to do anything. We expect a==b to be true which makes (*a!=*b) impossible and the body of the if-block unreachable. So. I'm with you so far.

    With clang, as I understand it, this is not true -missiles will be launched. The reason for this is that the spec says that the first argument of realloc becomes invalid after the call, therefore any use of that pointer has undefined behaviour. Clang takes advantage of this, and defines the behaviour of this to be that *a will not change after that point.

    OK, if the spec says that a is undefined after the call to realloc, then IMHO the compiler should change the type of a from char * to UNDEFINED and complain. Based on what you're saying, it sounds like Clang is wrong. It sounds like they're treating undefined behavior as implementation defined behavior.

    I'm sure somebody will correct me if I'm wrong on that one.

    --
    For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
  36. Know your C by gmuslera · · Score: 3, Informative

    Somehow this made me remember that slideshow on Deep C. I only know that I know nothing of C, after reading it.

  37. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    However, it is technically legal for the compiler to rightfully conclude that two more than any number is always larger than that number, and optimize out the entire "if" statement and everything inside it.

    It's a good deal worse than that. The compiler is allowed to do ANYTHING. It can replace the code inside the if with code that sends all your customer data to your competitor. It can install a virus. Anything.

  38. Re:TFA does a poor job of defining what's happenin by lgw · · Score: 1

    Under C99 all machines must be both 2s-complement and have 8-bit bytes. IIRC both fall out from inttypes.h. Word is this wasn't intentional, but it had been so long since anyone actually used other architectures that no one noticed that implication.

    --
    Socialism: a lie told by totalitarians and believed by fools.
  39. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    Erm, I have to deal with negative numbers on a constant basis.

  40. Re:TFA does a poor job of defining what's happenin by tricorn · · Score: 1

    I once had some code that confused me when the compiler optimized some stuff out.

    I had a macro that expanded to a parenthesized expression with several sub-expressions separated by commas that used a temp variable, e.g.:

    #define m(a) (tmp = a, f(tmp) + g(tmp))

    because the argument (a) could be an expression with side effects.

    Now, I knew that the order of evaluation of function arguments wasn't defined, but I never read that as meaning that a compiler could optimize away parts of a function call such as: x(m(1), m(2)); this particular compiler effectively acted as if it was evaluating both arguments in parallel, thus the value of tmp was undefined throughout (I think it eliminated one of the initial assignments).

    Changing it to an in-line function made it work; it had initially been code written for a compiler that didn't have in-line functions and was in the middle of a very tight loop.
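
    A rough sketch of that change, with hypothetical stand-ins for f(), g() and x(). In the macro version both arguments write the shared `tmp` with no ordering between them; the inline function gives each call its own local instead:

```c
/* Hypothetical stand-ins for the poster's f() and g(). */
static inline int f(int v) { return v * 2; }
static inline int g(int v) { return v + 1; }

/* Inline replacement for the macro: the argument is evaluated exactly
 * once, and `v` is a fresh local for each call, so two calls used as
 * arguments to the same function cannot interfere with each other. */
static inline int m(int v)
{
    return f(v) + g(v);
}

/* Hypothetical consumer taking two results. */
static inline int x(int lhs, int rhs)
{
    return lhs * 100 + rhs;
}
```

    With these definitions, `x(m(1), m(2))` gives the same answer whichever argument the compiler evaluates first.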

  41. Re:TFA does a poor job of defining what's happenin by EvanED · · Score: 1

    Not in C and C++ they don't. The compiler is allowed to perform that optimization with either signed or unsigned integers.

    I take back this statement... it is not correct, at least in C99.

  42. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    Compilers are free to assume that the code does not contain undefined behaviour. This allows for better optimization. But things can get tricky. To give an example:
        int my_divide(int a, int b) {
                if (!b) diediedie("oh noes");
                return a / b;
        }

    An overzealous optimization may move the division up (it's a high-latency instruction) all the way ahead of the diediedie call.
    It'd be legal if diediedie is assured to return. The compiler has erroneously assumed lack of noreturn attribute constitutes some guarantee here.
    As for altering the semantics, the standard has a notion of abstract semantics and actual semantics, which need to agree at certain points. In other words, optimization and magic are allowed if the result is right. The compiler may substitute
        for (i = 0; i != 3; i++) printf("%d", i);
    With
        fputs("012", stdout);

    Special precautions are needed when you e.g. run a benchmark that is a big nop in essence.
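
    A sketch of the my_divide example with the handler explicitly marked as non-returning, assuming C11's `_Noreturn` and a hypothetical body for diediedie. A conforming compiler has to keep the division after the call either way; the annotation just states the intent outright:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical error handler. _Noreturn (C11) tells the compiler that
 * control never comes back from this call. */
_Noreturn void diediedie(const char *msg)
{
    fprintf(stderr, "%s\n", msg);
    abort();
}

int my_divide(int a, int b)
{
    /* The division below is only reached when b is non-zero. */
    if (!b) diediedie("oh noes");
    return a / b;
}
```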

  43. Re:TFA does a poor job of defining what's happenin by lgw · · Score: 5, Funny

    No, the compiler is allowed to do anything it damn well pleases wherever the standard calls behaviour "undefined". One of my favorite quotes ever from a standards discussion:

    When the compiler encounters [a given undefined construct] it is legal for it to make demons fly out of your nose

    Nasal demons can cause code instability.

    --
    Socialism: a lie told by totalitarians and believed by fools.
  44. It's time by countach · · Score: 2

    It really should be time that 99.9% of the code written ought not to be in languages that have undefined behaviour. It's time we all use languages which are fully defined.

    Having said that, if something in code is undefined, and the compiler knows it, then it should generate an error. Very easily solved. If this STACK program is so clever, it should be in the compiler, and it should be an error to do something undefined.

    1. Re:It's time by Anonymous Coward · · Score: 0

      C and C++ have undefined behaviour so entrenched in their specifications that even deciding whether a program has it or not is an uncomputable problem.

    2. Re:It's time by epee1221 · · Score: 1

      It doesn't really take much for that to be the case. Remember, there is no way to decide whether a Turing machine with a given input will reach a given state. All you need is for there to be _some_ undefined behavior in the dynamic semantics -- "division by zero is undefined" is sufficient.

      --
      "The use-mention distinction" is not "enforced here."
  45. Is this a takeoff on Dark Star? by Anonymous Coward · · Score: 0

    In the movie Dark Star, our intrepid explorers travel the galaxy looking for "unstable planets" and blowing them up. Maybe a Dark Star compiler blows up unstable programs?

    Anyway, Dark Star is a classic camp SF movie. Check it out!

  46. OK, before somebody else points it out... by istartedi · · Score: 4, Interesting

    My statement is contradictory. I recommended a course of action for undefined behavior, while maintaining that Clang is wrong for documenting a course of action for undefined behavior.

    My understanding of "undefined behavior" in the C spec is that it means "anything can happen and the programmer shouldn't rely on what the compiler currently does". Of course, in the real world *something* must happen. If a 3rd party documents what that something is, the compiler is still compliant. It's the programmer's fault for relying on it.

    OTOH, if the behavior was "implementation defined" then the compiler authors can define it. If they change their definition from one rev to another without documenting the change, then it's the compiler author's fault for not documenting it.

    In other words:

    undefined -- programmer's fault for relying on it.
    implementation defined -- compiler's fault for not documenting it.

    --
    For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
    1. Re:OK, before somebody else points it out... by EvanED · · Score: 1

      I was already responding, but yes, your summary sounds pretty much perfect.

    2. Re:OK, before somebody else points it out... by DeKO · · Score: 2

      There are actually 3 categories:

      • Implementation Defined: the implementation (compiler, standard library, execution environment) has to document what happens. Code relying on this is not portable.
      • Unspecified: the implementation can choose to do what makes sense, and not tell you. Even reverse-engineering and relying on what you found out, is unreliable. The actual address returned by malloc is unspecified; is it aligned? Does it always grow in value if nothing was free-ed? You shouldn't even care about this detail, so the standard leaves it unspecified.
      • Undefined Behaviour: you wrote something that doesn't make sense, if you get lucky the compiler/standard library/operating system will react in a sensible way, but the standard says it's not the implementation's fault you get something wrong as a result. Things like reading variables before initializing them.

      Diagnosing UB can be too demanding from the implementation, so the standard doesn't even require it. How would you diagnose incorrect usage of realloc? Add run-time checks? Write a special rule in the compiler so it knows about realloc? Extend the language with metadata? What if realloc is hidden behind a user-defined function? At some point you have to stop, otherwise you could even solve the halting problem.
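
      The three categories can be sketched in a few lines. The function names are hypothetical, and each example is written so that the code itself stays well defined:

```c
#include <limits.h>

static int counter = 0;

/* Hypothetical helper with a visible side effect. */
int next_value(void)
{
    return ++counter;
}

/* Unspecified: the two calls may run in either order, so the result is
 * -1 or 1, and the implementation need not document which. */
int unspecified_order(void)
{
    counter = 0;
    return next_value() - next_value();
}

/* Implementation-defined: right-shifting a negative value gives a
 * result the implementation has to document (commonly an arithmetic
 * shift). */
int implementation_defined_shift(void)
{
    return -8 >> 1;
}

/* Undefined: shifting by the width of the type or more has no defined
 * result, so the count is range-checked first. */
unsigned checked_shl(unsigned x, unsigned n)
{
    return (n < sizeof x * CHAR_BIT) ? x << n : 0u;
}
```

      Only the last case needs a guard: the first two always produce some valid value, even if you can't portably predict which one.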

    3. Re:OK, before somebody else points it out... by Anonymous Coward · · Score: 0

      The actual address returned by malloc is unspecified; is it aligned?

      Nitpick: yes, malloc is actually guaranteed to return aligned addresses. The canonical 'unspecified behaviour' example is function argument evaluation order.

    4. Re:OK, before somebody else points it out... by Anonymous Coward · · Score: 0

      It's not that simple that undefined behaviour is always BAD. A lot of code written near to the bare metal can only be written because there are things that are left undefined. Just think of a simple thing like this

      unsigned int *reg = 0x2e5fa;
      *reg = 0x1234;

      That's already undefined behaviour. But you'll find that kind of thing all over the place in e.g. device drivers that have to read from or write to hardware registers mapped to well-defined (on the system the code is intended for) memory addresses.

      Quite a number of things were actually left undefined by the C standard to allow this - if undefined behaviour would be banned (i.e. all use of undefined behaviour be flagged as an error) then C wouldn't be useful as a language for e.g. writing operating systems in.

    5. Re:OK, before somebody else points it out... by istartedi · · Score: 1

      IIRC, this issue came up and the workaround was:

      unsigned int *reg = (unsigned int*)0x2e5fa;

      From C89, section 3.4: "An arbitrary integer may be converted to a pointer. The result is implementation-defined." You still have to be careful of course; but it's not as bad as undefined behavior. I suspect that the standard allows it for this very reason.
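
      A sketch of how that usually looks in driver-style code, reusing the address from the example above and otherwise hypothetical names. The `volatile` qualifier is an addition here, on the assumption that the target really is a hardware register:

```c
#include <stdint.h>

/* Register address from the example above; it only means something on
 * the hardware the code was written for. */
#define DEVICE_REG_ADDR ((uintptr_t)0x2e5fa)

/* The integer-to-pointer conversion is implementation-defined.
 * volatile keeps the compiler from caching or dropping accesses. */
volatile unsigned int *device_reg(void)
{
    return (volatile unsigned int *)DEVICE_REG_ADDR;
}

/* Writes the register. Only call this on the target hardware;
 * anywhere else the store goes to an arbitrary address. */
void device_reg_write(unsigned int value)
{
    *device_reg() = value;
}
```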

      --
      For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
    6. Re:OK, before somebody else points it out... by istartedi · · Score: 1

      Diagnosing UB can be too demanding from the implementation, so the standard doesn't even require it. How would you diagnose incorrect usage of realloc? Add run-time checks? Write a special rule in the compiler so it knows about realloc? Extend the language with metadata? What if realloc is hidden behind a user-defined function? At some point you have to stop, otherwise you could even solve the halting problem.

      The halting problem was proven impossible. I think designing an AI to detect undefined behavior is difficult but not impossible. Unlike Turing, I have no proof. If you could prove such an equivalence, then it really would be time to give up on C, or define all the undefined conditions.

      --
      For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
    7. Re:OK, before somebody else points it out... by Anonymous Coward · · Score: 0

      My point was, you can't figure out everything that is going to happen, until it happens. C is obviously Turing-complete, that's the parallel to the halting problem. "AI" is not anything magical; it's used to either denominate a well-thought-out set of rules (such as finding the best play in a chess game) or a fuzzy set of heuristics that do non-linear regressions with randomization.

    8. Re:OK, before somebody else points it out... by istartedi · · Score: 1

      Just because C is Turing complete doesn't mean that the problem of finding all undefined behavior in a program at compile-time is undecidable. The specification has many statements of the form, "this construct results in undefined behavior". Thus, the problem of finding undefined behavior reduces to "find all instances of constructs that result in undefined behavior". The spec is written such that humans can (in theory) identify each instance of undefined behavior. Otherwise, why write the spec in the first place?

      So, IMHO the problem of detecting undefined behavior at compile-time is NOT equivalent to the halting problem. If it were, the general consensus would be that it's impossible for humans (as well as AI) to solve the problem, as opposed to it merely being difficult.

      I think that unless an expert steps into this rather stale discussion, we're going to have to agree to disagree.

      --
      For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
  47. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    Who are you, a government accountant?

  48. Re:TFA does a poor job of defining what's happenin by EvanED · · Score: 1

    OK, if the spec says that a is undefined after the call to realloc, then IMHO the compiler should change the type of a from char * to UNDEFINED and complain. Based on what you're saying, it sounds like Clang is wrong. It sounds like they're treating undefined behavior as implementation defined behavior.

    I'm sure somebody will correct me if I'm wrong on that one.

    You're wrong on that one. :-)

    First, let's start with this specific case. First of all, the type of a variable can't "change", because the type of a variable in languages without type-state stuff is static. (Aside: this is a useful way to think about the distinction between statically typed languages and dynamically typed ones -- in statically typed languages, variables have types, while in dynamically typed languages, values have "types.") In this case it's pretty easy to see how the compiler can deal with it, but in general it's not:

    char * b;
    if (big_fancy_condition) {
        b = realloc(a, ...)
    }
    ....
    if (another_big_fancy_condition) {
    ... *b ...;
    }

    Can that program provoke undefined behavior? Depends on the conditions, which means it's undecidable in general. In the type viewpoint, what's the type during the ellipsis? Is it char* or is it inaccessible? It's char* down one branch but inaccessible down the other, and there's not a fully-general way to merge those two types (in a way that type-checking is still decidable).

    Second, undefined behavior means the compiler is allowed to do anything -- it's less restrictive than implementation-defined. For implementation-defined behavior, the compiler needs to make a choice, stick with it (at least with a consistent set of compiler settings), and document it. For undefined behavior, the compiler can do anything it wants to, for any reason it wants to, can do different things in the same situation in different places because why the hell not, etc. -- the standard imposes no restrictions on what happens once undefined-behavior is triggered. See here for more.

  49. Re:TFA does a poor job of defining what's happenin by seebs · · Score: 1

    This doesn't sound right to me. The intX_t types, if present, have to be more 2s-complementy, but they aren't really required to be present, as I recall.

    --
    My blog: http://www.seebs.net/log/ --- My iPhone/iPad app: http://www.seebs.net/seebsfrac/
  50. Re:TFA does a poor job of defining what's happenin by Spazmania · · Score: 2

    If I tell the compiler to give me warnings, and it detects code whose behavior is undefined in the standard but then fails to issue a warning, then the compiler is broken. If it goes on to make a fancy assumption about the undefined behavior instead of letting it fall through to runtime as written then it's doubly broken.

    --
    Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
  51. Re:TFA does a poor job of defining what's happenin by Athrac · · Score: 1

    Under C99 all machines must be both 2s-complement and have 8-bit bytes. IIRC both fall out from inttypes.h. Word is this wasn't intentional, but it had been so long since anyone actually used other architectures that no one noticed that implication.

    You are incorrect. C99 (and C11) still explicitly allow two's complement, one's complement and sign-and-magnitude representation for signed types. You are probably confusing it with the type definitions int8_t, int16_t etc. which ARE required to be two's complement (if they exist). But the standard does not require those type definitions to exist.

  52. Re:TFA does a poor job of defining what's happenin by lgw · · Score: 2

    , fucked up computer languages allow "undefined code", ie. C / C++.

    Every language has some undefined behavior (and there are libraries with undefined behavior in every language), except maybe Ada.

    Java leaves a wide area undefined when it comes to multi-threaded code.

    Python has the same, plus it inherits some undefined behaviors from C.

    C/C++ leaves a wide area undefined to support oddball system architectures. For example, if you have some memory that only can store floating point numbers, and some general-purpose memory, the address ranges might overlap - that's why pointer subtraction is undefined unless within an array. In practice most programmers can treat all memory as one contiguous byte array, but on special-purpose hardware you can still use C. Most of C's undefined behavior comes from the much wider variety of system architectures when C was young, but can still be useful for embedded systems.

    --
    Socialism: a lie told by totalitarians and believed by fools.
  53. Re:TFA does a poor job of defining what's happenin by istartedi · · Score: 1

    OK, that explains why I've been getting away with assuming they wrap since the Clinton administration. I don't know if anybody ever explained it to me in C terms. I always assumed that behavior was baked in at the CPU level, and just percolated up to C. I never felt inclined to do any "bit twiddling" with int or even fixed-width signed integers because on an intuitive level it "felt wrong". What's that four-letter personality type thing? I'm pretty sure I had the I for "intutive" there...

    --
    For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
  54. Re:TFA does a poor job of defining what's happenin by Mateorabi · · Score: 1

    This makes no sense. The dereference is undefined, and therefore sk may be undefined iff tun IS null, but tun itself is not.

    I.e. by the time execution reaches the if statement one of the two is true:
    tun != null && sk == {something valid} -or-
    tun == null && sk == {undefined}

    sk being undefined is possible but that undefined-ness can't be used as a way to infer tun != null--the only thing that causes it is tun == null! It's illogical for the compiler to do what you say and remove the if check. The standard says sk can be undefined, therefore something being in an undefined state is possible, not that the compiler can presume that undefined is impossible to occur and put its hands over its ears and go la-la-la.
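
    Whatever one thinks of the compiler's reasoning, the question goes away if the null test comes before the first dereference. A minimal sketch, with hypothetical struct and function names standing in for the real ones:

```c
#include <stddef.h>

/* Hypothetical stand-ins for the structures in the example. */
struct sock_stub { int id; };
struct tun_stub { struct sock_stub *sk; };

/* Returns the socket, or NULL when there is no device. The null test
 * comes before the first dereference, so there is no earlier
 * dereference from which the compiler could infer that tun is
 * non-null. */
struct sock_stub *tun_get_sock(struct tun_stub *tun)
{
    if (tun == NULL) {
        return NULL;
    }
    return tun->sk;
}
```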

    --
    "You saved 1968." - Ms. Valerie Pringle to the crew of Apollo 8

  55. and the GCC team wonders why Clang was invented by fast+turtle · · Score: 1

    Too many changes, dropping support for hardware that FreeBSD still supports (paid contracts) along with the issue that each run could not produce Identical Binaries. If you can't ensure duplicate binaries, you can't ensure that someone hasn't backdoored/trojan'd each and every binary on the system because the code produced is no longer human readable.

    --
    Mod me up/Mod me down: I wont frown as I've no crown
    1. Re:and the GCC team wonders why Clang was invented by Anonymous Coward · · Score: 0

      Too many changes, dropping support for hardware that FreeBSD still supports (paid contracts) along with the issue that each run could not produce Identical Binaries. If you can't ensure duplicate binaries, you can't ensure that someone hasn't backdoored/trojan'd each and every binary on the system because the code produced is no longer human readable.

      Some of the more complex graph based optimizations require tie-breaking. Generally this is done by using a random number generator (i.e. virtual coin toss) to decide the issue. Randomizing generally produces better results over the long run than arbitrarily picking always A or always B.

    2. Re:and the GCC team wonders why Clang was invented by osu-neko · · Score: 1

      Some of the more complex graph based optimizations require tie-breaking. Generally this is done by using a random number generator (i.e. virtual coin toss) to decide the issue. Randomizing generally produces better results over the long run than arbitrarily picking always A or always B.

      ...and that is a classic example of a false dilemma. One is not forced into choosing always A or always B when eschewing a true RNG. Tie-breaking can be done in a deterministic manner such that the results are effectively "random" over the long run while ensuring two runs over identical source files produce identical object code (even if the same code embedded in a different source file would choose the other option).
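
      One way such a tie-breaker might look, as a sketch only: hash something stable about the two candidates and use a bit of the result. The FNV-1a hash and the `break_tie` name are choices made for illustration, not anything taken from GCC or Clang:

```c
#include <stdint.h>

/* FNV-1a, 64-bit: a simple, fixed hash of a byte string. */
uint64_t fnv1a(const char *s)
{
    uint64_t h = 14695981039346656037ULL;  /* offset basis */
    for (; *s != '\0'; ++s) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;             /* FNV prime */
    }
    return h;
}

/* Hypothetical tie-breaker: picks candidate 0 or 1 from a hash of the
 * two candidate names. Same inputs, same choice, on every run. */
int break_tie(const char *name_a, const char *name_b)
{
    uint64_t h = fnv1a(name_a) * 31u + fnv1a(name_b);
    return (int)((h >> 32) & 1u);
}
```

      The choice looks arbitrary across different inputs, yet two builds of the same source always make the same decision.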

      --
      "Convictions are more dangerous enemies of truth than lies."
  56. Re:TFA does a poor job of defining what's happenin by EvanED · · Score: 1

    This blog post explains why they can't always warn about it.

  57. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    First, undefined behaviour doesn't require a diagnostic. Even when it's theoretically possible to detect such behaviour at compile time, this isn't required.

    Second, if your program exhibits undefined behaviour, then "as written" is meaningless; the behaviour is ... undefined. You probably meant "as intended", but the compiler cannot read your mind, isn't required to try, and in all probability won't try.

  58. Re:TFA does a poor job of defining what's happenin by Athrac · · Score: 1

    It's not quite correct. a == b is not a use of the argument that has been invalidated. a was a variable containing an address of the object that was passed by value to the realloc() function.

    I also thought this first, but the standard seems to be quite picky about it. It is undefined behavior if "The value of a pointer that refers to space deallocated by a call to the free or realloc function is used". I interpret this so that just using the address value is UB, even if the pointed memory block is not accessed.

  59. Re:TFA does a poor job of defining what's happenin by mveloso · · Score: 1

    If the runtime moved memory around during a realloc, this code wouldn't work. However, you'd never notice if you use the same runtime all the time. This is why it's a good thing to compile/target different platforms and compilers, and to do a -Wall (or the equivalent) at every optimization level. You have to do it at every optimization level because some compilers only do checks like this during their optimization phase (gcc?).

    This type of thing wouldn't get caught by any automated tools when I was doing C. Funny that there isn't a way to specify "this argument gets borked" in any language I can think of.

  60. Well by Anonymous Coward · · Score: 0

    I examine all the code I write with a disassembler (alongside unit testing it), regardless of how time consuming this is, quality goes first. I don't see why others can't do the same?

    1. Re:Well by osu-neko · · Score: 1

      I examine all the code I write with a disassembler (alongside unit testing it), regardless of how time consuming this is, quality goes first. I don't see why others can't do the same?

      Some of us write source code that will eventually be compiled by people other than us on machine architectures we don't even have access to, and using compilers that haven't even been written yet. Your solution requires a massive investment in hardware and a time machine...

      --
      "Convictions are more dangerous enemies of truth than lies."
  61. Re:TFA does a poor job of defining what's happenin by EvanED · · Score: 1

    C/C++ leaves a wide area undefined to support oddball system architectures. For example, if you have some memory that only can store floating point numbers, and some general-purpose memory, the address ranges might overlap - that's why pointer subtraction is undefined unless within an array. In practice most programmers can treat all memory as one contiguous byte array, but on special-purpose hardware you can still use C. Most of C's undefined behavior comes from the much wider variety of system architectures when C was young, but can still be useful for embedded systems.

    That's an awfully narrow description of the kinds of behaviors that are undefined in C and C++. Here's an incomplete list of actions that will provoke undefined behavior in C++ off the top of my head, all of which are perfectly relevant and possible to accidentally do on every-day desktop architectures:

    • Defining an object with the same name in two different compilation units with different definitions (violation of the ODR)
    • Signed integer overflow (props to some people elsewhere in the thread that taught me that this isn't true for unsigned!)
    • Writing or reading past the end or beginning of an array
    • Dereferencing a NULL pointer
    • Accessing an object of one type via a pointer of another type (violating the strict-aliasing rules)
    • Not 100% positive, but I believe using a name reserved for implementations: that is any name globally beginning with an underscore followed by a capital letter, or containing two consecutive underscores, or something else I can't remember
    • Accessing memory at an address that has been free()ed or deleted
    • Assigning the same scalar value twice without an intervening sequence point (e.g. i = i++; not only doesn't have a well-defined evaluation order but also provokes undefined behavior entirely)
    • Calling several STL algorithms with iterator pairs that don't form a valid range, e.g. copy(vec1.begin(), vec2.begin(), vec1.end()) (I think I ordered those right)

    I'm lazy so I'll stop typing now.
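
    To make one item from that list concrete: the strict-aliasing rule is usually broken by code like `*(uint32_t *)&f`. A sketch of the commonly recommended alternative, copying the bytes with memcpy, is below. The name `float_bits` is invented for the example, and it assumes float is 32 bits wide, which the static assertion checks.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Return the bit pattern of a float without reading it through a
 * pointer of the wrong type. Compilers typically turn the memcpy
 * into a single register move. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

    The returned value depends on the floating-point format; on IEEE 754 machines 1.0f comes back as 0x3F800000.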

  62. Re:TFA does a poor job of defining what's happenin by CODiNE · · Score: 4, Interesting

    That reminds me of this gem: Overflow in sorting algorithms

    That little bug just sat around for a few decades before anyone noticed it.

    Quick summary: (low + high) / 2
    May have an overflow which is undefined behavior. Really every time we add ints it's possible. Just usually our values don't pass the MAX.
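
    A sketch of the usual rewrite for the binary-search case, assuming low and high are non-negative array indices with low <= high (the function name `midpoint` is just for illustration):

```c
/* Midpoint of two non-negative indices with low <= high.
 * Under that precondition high - low cannot overflow, and adding
 * half of it back to low can never exceed high. */
int midpoint(int low, int high)
{
    return low + (high - low) / 2;
}
```

    With low = 2000000000 and high = 2100000000 the sum low + high would not fit in a 32-bit int, while this form never computes it.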

    --
    Cwm, fjord-bank glyphs vext quiz
  63. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    What if: a = INT_MIN; externally?
    Let's use 16 bit ints for readability. INT_MIN could be -32768; INT_MAX could be 32767.


    if (INT_MAX - a < 2) { /* No, you dumbass. */ }
    // because:
    if ( 32767 - (-32768) > 2 ) ...
    // is:
    if ( [integer overflow] > 2 ) ...
    // so, instead:
    if ( a <= INT_MAX - 2 ) { /* Derp! */ }

    Our exchange just really illustrates the cluster fuckery of the bad language design.

    Now, since extern int a; means this value's range may never be known to the compiler at compile time, indeed 'a' could be read from a file or user input, you must first check the value of every variable obtained externally before use to avoid integer overflow. This is, quite frankly, asinine & the very definition of "doing it wrong"(tm) when you consider that to avoid integer overflow you essentially have to write twice the code for nearly all logic, or provide "sanitization" on all variables not locally defined (eg: function parameters); Instead, it would be better (require less complexity) to simply define the behavior of overflows and provide the API to query the chip for such overflow (carry) state if desired.
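
    For reference, the kind of pre-check being complained about looks roughly like this for addition. This is a sketch; `checked_add` is a made-up name. GCC and Clang also provide `__builtin_add_overflow`, which does the same job without the hand-written comparisons.

```c
#include <limits.h>
#include <stdbool.h>

/* Store a + b in *out and return true, or return false if the sum
 * would not fit in an int. The test is arranged so that no
 * overflowing expression is ever evaluated. */
bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *out = a + b;
    return true;
}
```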

    Look, at the end of the day we're running code on specific platforms. We can hold up the idyllic goal of truly cross platform language, but such does not exist in practice. In practice the code is tested on the platform, and exceptions are made where specific platform capabilities differ, typically. Platform specific modifications are nearly always needed. See: the guts of your stdint.h file on various platforms; Now understand that nearly all programs today use stdint.h (normalized byte sizes) and that stdint.h really isn't a part of the language itself, but part of the runtime or API -- associated non-essential component of the language. The point is that the language is abstracted from the platform, so it must interact with platform specifics in order to provide normalized symbol meanings, SO DO THAT; Otherwise simply provide guaranteed behaviors for variable types. Note: your programmers will use the former to achieve the latter anyway.

    C's undefined integer behaviors are the problem. They should not be so. This decision means that the entropy spreads into nearly all areas of the code -- Thus most cross platform frameworks avoid int like the plague. The fixed size (and thereby behavior) stdint.h types should be the only ones in the language. Rarely, if ever, do you ever actually need an 'int' -- a variable that changes size with the platform. Yes, it makes code more efficient to utilize a variable that is the platform's native width, but this is in direct conflict with portability of the code. It's folly to ignore the platform features and apply such undefined behavior. Not every "language" does this. Assembly takes the right approach. It leaves no behavior undefined. All of my toy languages have fully defined behavior for any piece of code in every environment it can run on. There is a finite number of platforms in existence, and I explicitly give the programmer the ability -- within the language -- to detect what the platform capabilities are at compile time. Instead of C's int type, one must use the equivalent of #ifdef blocks to supply your typedef mapping:
    // C-ish pseudocode for creating your flexible int type, that you actually rarely (if ever) need:
    #if ( __CPU_WORD == 32 )
    typedef int32_t int;
    #elseif ( __CPU_WORD == 64 )
    typedef int64_t int;
    #elseif ( __CPU_WORD > 64 )
    #warn "Emulating 64 bit integer compatibility."
    typedef int64_t int;
    #else
    #warn "Emulating 32 bit integer compatibility."
    typedef int32_t int;
    #endif

    You only have to do this in one place, and it lets the programmer explicitly define behaviors so they can depend on them. This code allows every p

  64. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    > If it goes on to make a fancy assumption about the undefined behavior instead of letting it fall through to runtime as written then it's doubly broken.

    What does "letting it fall through to runtime as written" mean? The spec explicitly says that the behaviour is undefined, so there is no standard behaviour.

    Another example is "int i = 1; f(i++, i++)": what are the values of the parameters passed to f? You might expect that one would be 1 and the other 2, but possibly either way around. Since the behaviour is undefined, it's also allowed that both be 1, both be 2, both be 666, it refuses to compile, or demons fly out of the programmer's nose.
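
    If the intent was "pass the old value, then the next one", a sketch of a well-defined version is to put each increment in its own statement. The body of `f` here is a placeholder, since the original f is not shown.

```c
/* Placeholder callee standing in for the f() discussed above. */
static int f(int first, int second)
{
    return first * 10 + second;
}

int call_in_order(void)
{
    int i = 1;
    int first = i++;   /* statement ends here: first == 1, i == 2 */
    int second = i++;  /* second == 2, i == 3 */
    return f(first, second);
}
```

    Each increment is now separated by a sequence point, so f always receives 1 and 2 in that order.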

  65. Re:TFA does a poor job of defining what's happenin by lgw · · Score: 1

    There's nothing in the standard about "warnings", though most compilers are good about it when it comes to common problems. But even with a warning, optimizer's gonna optimize.

    --
    Socialism: a lie told by totalitarians and believed by fools.
  66. Re:TFA does a poor job of defining what's happenin by Fwipp · · Score: 1

    Myers-Briggs test, and it's 'N' for intuitive. :)

  67. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    Incorrect –"undefined" means that absolutely any result at all is correct. That means that the compiler can do anything at all that it likes (because all of those will fall into correct), but *you* can't expect anything in particular. Clang is absolutely correct here (as would any behaviour at all be).

  68. Re:IBM had a tool to do this for a long time alrea by Anonymous Coward · · Score: 0

    GCC fails to warn on stripping code in a way that leads to security vulnerabilities.
    IBM has a tool that catches those same vulnerabilities.
    And people wonder how the NSA gets so many cool zero-days.

  69. Re:TFA does a poor job of defining what's happenin by lgw · · Score: 1

    Signed integer overflow (props to some people elsewhere in the thread that taught me that this isn't true for unsigned!)
    Writing or reading past the end or beginning of an array
    Dereferencing a NULL pointer
    Accessing an object of one type via a pointer of another type (violating the strict-aliasing rules)

    All of these are exactly what I was talking about - different needs for different architectures. I've coded on a platform where writing to 0 was legal, and did something bad, unless you did it on purpose. No fun at all, but possible to code for.

    Accessing memory at an address that has been free()ed or deleted
    Calling several STL algorithms with iterator pairs that don't form a valid range, e.g. copy(vec1.begin(), vec2.begin(), vec1.end()) (I think I ordered those right)

    These are important for library optimization. Without the optimization they allow, people would have written their own, faster libraries and that would have sucked far worse.

    Assigning the same scalar value twice without an intervening sequence point (e.g. i = i++; not only doesn't have a well-defined evaluation order but also provokes undefined behavior entirely)

    I never did understand what they gained from that one, but the examples I've seen of the sequence point thing are horrible code anyway, so it doesn't bother me.

    The one big one in C++, the biggest gotcha for even veteran programmers, is the undefined lifetime of "compiler temporaries" (usually unnamed objects created in the argument list to a function call), which is a landmine for shared_ptr. I hope they fixed that one in C++0X.

    --
    Socialism: a lie told by totalitarians and believed by fools.
  70. Re:TFA does a poor job of defining what's happenin by Old+Wolf · · Score: 1

    >If my C code contains *foo=2, the compiler can't just leave that out

    Well, it could if the program produces no further output before exiting, or if "foo" is unassigned.

  71. Headline by michaelmalak · · Score: 1

    Based on the headline, I thought it was going to be about Ken Thompson's self-referencing compiler that not only inserted a back door whenever it saw that it was compiling the UNIX login command, it also inserted the back door insertion code whenever it saw it was compiling the compiler source code.

    1. Re:Headline by udittmer · · Score: 1

      +1. That's a paper everyone should read.

  72. Unstable Code: int n = x / 0; by ebyrob · · Score: 1

    According to the article "unstable code" is anything with undefined behavior according to the C++ standard. This could be as simple as an integer overflow or divide by zero which in debug or "zero optimization" mode would always cause an error, but which in an optimized release may simply be removed.

    1. Re:Unstable Code: int n = x / 0; by Anonymous Coward · · Score: 0

      WTF did they have to go invent the brand-new term "unstable code" as a synonym for the existing, well-known term "undefined behavior"?

      The cynic in me thinks it was to drive up the number of comments in threads like this one.

    2. Re:Unstable Code: int n = x / 0; by war4peace · · Score: 1

      Exactly. I get "buggy code" or "code which doesn't handle exceptions well" but "unstable" leads to me thinking of erratic output with the same input, which is a totally different thing.

      --
      ...gis sdrawkcab (usually not responding to ACs; don't bother posting as AC)
  73. Re:TFA does a poor job of defining what's happenin by Old+Wolf · · Score: 1

    The behaviour is also undefined if realloc returns NULL. Also, sizeof(char) is 1 by definition.

  74. Re:TFA does a poor job of defining what's happenin by istartedi · · Score: 1

    I think the compiler would be violating sequence points if it moved the division up.

    However, I see your point with the for-loop and have experienced it first hand when I wanted to see how fast such a loop would run. I had put some stupid addition or something in there, and the sneaky compiler went ahead and optimized my loop into oblivion. I had to put a function call in the loop to make it generate loop code.

    After reading over responses to my original post, and to other posts around here I've come to the following conclusion:

    Programmers are invoking undefined behavior.

    OK, aside from that I always figured that invoking undefined behavior could make your program blow up at runtime. I never thought about the possibility of undefined behavior occurring at compile-time. I certainly wouldn't rely on such behavior, no matter how fast it made my program run. I'd be at the mercy of the compiler author. Even if I used #ifdef checks for the operating system, compiler version, etc. I could get screwed. Such checks are legitimate for implementation defined behavior in the compiler or quirks of the operating system on which the program will run. They are NOT legitimate for getting away with undefined behavior, not if you want to claim your program is C or C++.

    --
    For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
  75. Re:TFA does a poor job of defining what's happenin by Old+Wolf · · Score: 2

    >a == b is not a use of the argument that has been invalidated

    Yes it is. Evaluating the expression "a" causes undefined behaviour if "a" is indeterminate. "a" is considered to no longer have a value; any attempt to refer to its value causes UB. (It has the same status as a variable that has been defined but not initialized, i.e. "int a;".)

    The only thing that can be done with "a" thereafter is to assign a new value to it (or take its address, or do "sizeof a"; I can't think of any other exceptions).
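
    In practice that means the old pointer should simply be forgotten once realloc succeeds. A sketch of the usual idiom, with the invented name `grow_buffer` and assuming new_size is greater than zero:

```c
#include <stdlib.h>

/* Grow *buf to new_size bytes. Returns 0 on success, -1 on failure.
 * On failure *buf is left untouched and still points at the old block. */
int grow_buffer(char **buf, size_t new_size)
{
    char *tmp = realloc(*buf, new_size);
    if (tmp == NULL) {
        return -1;      /* old block is still valid and still owned by the caller */
    }
    *buf = tmp;         /* overwrite the old address; never compare against it */
    return 0;
}
```

    The caller only ever reads the pointer stored in *buf after the call, so the question of whether the block moved never comes up.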

  76. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    What if INT_MAX - a overflows?

  77. Re:TFA does a poor job of defining what's happenin by VortexCortex · · Score: 1

    Hmm I seem to have messed up a few >s and <s... That's my fault, 0: For not giving a fuck -- It's futile to try deconverting a zealot; and 1: it's 2013 and we're still escaping HTML manually?

    Truly, the whole computing world is shit strung together with bubble gum and twine. I mean, really... No isolation for code and data pointers or sacrificing a register for offset / segmentation and not giving us a new offset register so we could ACTUALLY do the heap code pointer protections.

    How fucking dumb can everyone be? The language and systems programmers don't interact with the hardware makers and vice versa. What the actual fuck. I'd love just ONE MORE hardware execution permission ring level, so that SANDBOXES could actually work... Nope, not on ARM, or AMD... Just 2 levels -- Hardware designed for a monolithic kernel. It's fucking disgusting.

  78. Re:TFA does a poor job of defining what's happenin by Old+Wolf · · Score: 4, Insightful

    >The dereference is undefined, and therefore

    Stop right here. Once undefined behaviour occurs, "all bets are off" as they say; the remaining code may have any behaviour whatsoever. C works like this on purpose, and it's something I agree with. It means the compiler doesn't have to insert screeds of extra checks, both at compile-time and run-time.

    There are plenty of other languages you can use if you want a different language definition :)

  79. Re:TFA does a poor job of defining what's happenin by Old+Wolf · · Score: 1

    "Overflows of unsigned values" is NOT undefined. You can assign out-of-range values to unsigned types, and also perform arithmetic operations which exceed the bounds of the type; and the value is adjusted using modular arithmetic.

    Some would be facetious and say that "unsigned types cannot overflow", meaning that they always have well-defined behaviour on operations that would generate an out-of-range value, but that's just an issue of pedantry with English.
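
    A small sketch of what that modular behaviour looks like, and how it can be used to notice a wrap after the fact (function names are illustrative only):

```c
#include <limits.h>

/* Unsigned arithmetic is reduced modulo UINT_MAX + 1, so this is
 * well defined for every pair of inputs. */
unsigned wrap_add(unsigned a, unsigned b)
{
    return a + b;
}

/* A wrapped sum is always smaller than either operand, which gives
 * a cheap and fully defined way to detect it. */
int add_wrapped(unsigned a, unsigned b)
{
    return (a + b) < a;
}
```

    For example, wrap_add(UINT_MAX, 1u) is 0 on every conforming implementation, and add_wrapped reports that a wrap took place.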

  80. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    With all due respect, this is a silly example of an obvious coding mistake (making assumptions about the location of a dynamically allocated pointer after calling realloc) followed by melodramatic consequences.

  81. Re:TFA does a poor job of defining what's happenin by Old+Wolf · · Score: 1

    I think you must be mis-remembering the details slightly. The comma operator is a sequence-point, so "tmp" must be assigned the value of "a", and f() and g() must both be called with a value that is the value of "a" converted to the type of "tmp". The two functions can be called in either order though (or in parallel) but there is no issue there.

    Of course, the compiler can do anything it likes so long as the program's output is equivalent to what I just described. So, for example, it might not allocate a memory location to "tmp", it could just push the value of "a" onto a register and then call f and g with it. Or if f or g do nothing and have no side-effects, the assembly code might not show calls to f and g. But there is no way you could know these things by running the program, which is the whole point.

  82. These bugs exist even *without* signed integers! by Myria · · Score: 5, Interesting

    The first mistake was using signed integers.

    The problem is C's promotion rules. In C, when promoting integers to the next size up, typically to the minimum of "int", the rule is to use signed integers if the source type fits, even if the source type is unsigned. This can cause code that seems to use unsigned integers everywhere break because C says signed integer overflow is undefined. Take the following code, for example, which I saw on a blog recently:

    uint64_t MultiplyWords(uint16_t x, uint16_t y)
    {
        uint32_t product = x * y;
        return product;
    }

    MultiplyWords(0xFFFF, 0xFFFF) on GCC for x86-64 was returning 0xFFFFFFFFFFFE0001, and yet this is not a compiler bug. From the promotion rules, uint16_t (unsigned short) gets promoted to int, because unsigned short fits in int completely without loss or overflow. So the multiplication became ((int) 0xFFFF) * ((int) 0xFFFF). That multiplication overflows in a signed sense, an undefined operation. The compiler can do whatever it feels like - including generate code that crashes if it wants.

    GCC in this case assumes that overflow cannot happen, so therefore x * y is positive (when it's really not at runtime). This means the uint32_t cast does nothing, so is omitted by the optimizer. Now, the code generator sees an int cast to uint64_t, which means sign extension. The optimizer this time isn't smart enough to know again that it's positive and therefore can ignore sign extension and use "mov eax, ecx" to clear the high 32 bits, so it emits a "cqo" opcode to do the sign extension.

    So no, avoiding signed integers does not always save you.
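
    One way to sidestep the promotion, as a sketch: convert an operand to uint32_t before multiplying so the arithmetic happens in an unsigned type. The name `MultiplyWordsFixed` is made up here to distinguish it from the version above.

```c
#include <stdint.h>

/* Converting x to uint32_t first forces the multiplication to be
 * carried out in unsigned arithmetic, so 0xFFFF * 0xFFFF no longer
 * passes through a signed int and cannot overflow it. */
uint64_t MultiplyWordsFixed(uint16_t x, uint16_t y)
{
    uint32_t product = (uint32_t)x * y;
    return product;
}
```

    MultiplyWordsFixed(0xFFFF, 0xFFFF) yields 0xFFFE0001 with the high 32 bits clear.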

    --
    "Screw Sun, cross-platform will never work. Let's move on and steal the Java language." - Visual J++ Product Manager
  83. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    This code also would launch the missiles in gcc as well if the C library did the perfectly valid action of freeing a after allocating a new 1+ sized object which is the reason this is UB. In a security minded implementation each allocation would be placed on at least one page which would be marked as not readable after free which would save us from the missiles but not a drone crash.
    Clang optimizes to the first implementation. It should error out.

  84. Re:TFA does a poor job of defining what's happenin by russotto · · Score: 2

    Quick summary: (low + high) / 2

    I like this one, because it shows a very common weakness in high level languages.

    In most machine languages, getting the average of two unsigned numbers up to UINT_MAX is absolutely trivial -- add the two, then shift right including the carry. The average of two signed numbers rounding to zero is a little more difficult (x86 makes it harder than it should be by not setting flags in a convenient manner), but still a few instructions.

    In C? Assuming low and high are unsigned
    (low >> 1) + (high >> 1) + (low & high & 1). Ick. The answer given in your article is inadequate; it gets you one more bit.

    Of course, now we have 64 bit integers and the problem is solved ONCE AND FOR ALL.

    But there are algorithms which need the average of 64-bit unsigned numbers too..
    ONCE AND FOR ALL

  85. What a stupid ungoogleable name. by technosaurus · · Score: 1

    Why do people do this crap?

    1. Re:What a stupid ungoogleable name. by Anonymous Coward · · Score: 0

      The site http://css.csail.mit.edu is down and it's impossible to search for a copy of "STACK", what a bunch of dumbasses!

  86. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    This is simply a bug in the compiler, which has made semantic assumptions that are incorrect. It has nothing to do with removing this so-called 'unstable' code, which I'm still yet to see a real example of.

  87. Re:TFA does a poor job of defining what's happenin by paylett · · Score: 1

    For all intents and purposes "for all intensive purposes" is a mishearing of "for all intents and purposes". (Just thought I'd mention it, because I used to say that too, and became really conscious of it since someone pointed it out to me).

    --

    Believing something doesn't make it true. Not believing something doesn't make it false.

  88. Re:TFA does a poor job of defining what's happenin by msobkow · · Score: 1

    In other words crappy, buggy code can cause undefined behaviours from the compiler and at runtime.

    News flash.

    Code written with such erroneous assumptions has long been at fault for everything from BSODs to the loss of satellites and deep space probes. Compilers are not mind readers. They can only work with what's been provided; they can't guess what your intentions were.

    --
    I do not fail; I succeed at finding out what does not work.
  89. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    It doesn't need to make any assumptions. The choices are:
    1) The test would fail. In this case the body of the conditional is irrelevant.
    2) The test would succeed, in which case the program's behaviour was already undefined and any further course of execution is legal - including ignoring the conditional.

    In both cases the body of the conditional is irrelevant, and it is legal to remove it. Since the conditional test itself has no side effects, it too is irrelevant and can be removed.
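
    A hypothetical illustration of that reasoning, using the classic "dereference first, check later" pattern. Neither function is taken from the article; they only show the shape of code where the check is removable and the shape where it is not.

```c
#include <stddef.h>

/* Risky ordering: *p is read first, so an optimizer may assume
 * p != NULL from then on and is allowed to drop the check below. */
int first_value_risky(const int *p)
{
    int v = *p;
    if (p == NULL) {
        return -1;
    }
    return v;
}

/* Safer ordering: the check guards the dereference, so there is no
 * earlier undefined behaviour for the compiler to reason from. */
int first_value_checked(const int *p)
{
    if (p == NULL) {
        return -1;
    }
    return *p;
}
```

    Only the second version is safe to call with a null pointer; calling the first with NULL is already undefined at the read of *p.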

  90. Re:TFA does a poor job of defining what's happenin by sconeu · · Score: 1

    In C? Assuming low and high are unsigned
    (low >> 1) + (high >> 1) + (low & high & 1).

    If low and high are unsigned, then (low + high)/2 is well defined, because unsigned arithmetic is defined as modular.

    --
    General Relativity: Space-time tells matter where to go; Matter tells space-time what shape to be.
  91. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    (low+high)>>1 === low + ((high-low)>>1) // ONCE AND FOR ALL

    Works fine for any values if low and high are signed..
    If unsigned, then you need to verify low <= high

  92. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    (low+high)>>1 === low + ((high-low)>>1) // ONCE AND FOR ALL

    Works fine for any values if low and high are signed..
    If unsigned, then you need to verify low <= high
    (edit for slashcode)

  93. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    Wrapping might have worked, and might continue to "work", on signed integers too if the compiler simply translates every addition into a machine instruction that adds and wraps on overflow. If you went to another architecture where such an instruction doesn't exist, being an undefined operation allows the compiler to choose some other instruction that works fine for addition whenever there is not an overflow. Otherwise, it might need to add a bunch of instructions to handle overflow if the CPU doesn't wrap it. Undefined operations on a boring platform might actually sometimes do what you want and/or expect, at least until optimization is done.

  94. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    Pretty much every arithmetic operation on signed integers can produce undefined behaviour. Do you want to see warnings about that for every single line that uses signed integers?

    Also what do you mean by "as written"? The code as written has undefined behaviour, and by the standard anything at all can happen at runtime - including doing nothing.

  95. Re:TFA does a poor job of defining what's happenin by EvanED · · Score: 1

    "Write a C function that correctly computes the average of two ints" is one of my favorite little programming puzzles, probably mostly because I ran into that exact problem. Speaking from total ignorance from what makes a good interview question, I think it might make a neat little puzzle for a programming job where you're expected to do low-level stuff like that.

  96. Re:TFA does a poor job of defining what's happenin by EvanED · · Score: 1

    I find it very curious that you'd be prohibited from using the value of the pointer proper. Do you have a citation for it being undefined? I don't see anything that seems to me to say that around the description of realloc in the C99 draft standard, but it also doesn't explicitly say that you can't access the memory that was deallocated in the section I'm looking at either.

  97. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 1

    What do you mean "word is"? Nobody.. and I mean nobody... should ever be coding in C unless they've read the standard from top-to-bottom, carefully, at least once, and then keep it handy. It's quite easy to understand, unlike the C++ standard, which is written completely differently and is 10x longer.

    To quote C99: "The typedef name intN_t designates a signed integer type with width N , no padding bits, and a two’s complement representation." C99 7.18.1.1p1 (N1256).

    That is completely intentional. However, it doesn't resolve the issue because signed overflow is still undefined behavior regardless of the width or representation. If you want modulo behavior just use unsigned. My rule of thumb: unsigned for everything unless you know you need signed arithmetic. And make sure you keep it unsigned (i.e. be careful of unsigned short -> int promotions). Why? Because unsigned behavior is always well defined, and it's easier to code to precisely one set of rules. Plus you can make use of unsigned rules to help avoid buffer overflows (i.e. size_t overflow simply wraps around to something smaller, not bigger, which if you don't catch the overflow because you're being sloppy is usually the better behavior, anyhow).
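
    As a sketch of catching that size_t wrap rather than relying on it, here is an allocation helper that rejects element counts whose byte total would not fit. The name `alloc_array` is invented; calloc is generally expected to perform an equivalent check internally.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate room for count elements of elem_size bytes each, refusing
 * requests whose total byte count would wrap around in size_t. */
void *alloc_array(size_t count, size_t elem_size)
{
    if (elem_size != 0 && count > SIZE_MAX / elem_size) {
        return NULL;    /* count * elem_size would wrap to a smaller value */
    }
    return malloc(count * elem_size);
}
```

    Because the operands are unsigned, both the division in the check and the multiplication afterwards are fully defined; the check just makes sure the defined result is also the intended one.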

  98. easily googleable name by iggymanz · · Score: 1

    "mit stack checker", and it's the first URL returned: css.csail.mit.edu/stack

    I really wonder about people that complain google is unusable and never gives them that for which they are searching. are they lazy? uncreative? semiliterate? Maybe a well constructed online test could discern the truth.

    1. Re:easily googleable name by osu-neko · · Score: 1

      "mit stack checker", and it's the first URL returned: css.csail.mit.edu/stack

      I really wonder about people that complain google is unusable and never gives them that for which they are searching. are they lazy? uncreative? semiliterate? Maybe a well constructed online test could discern the truth.

      Yeah, I never have any trouble getting Google to barf up what I'm looking for. Although, as an interesting experiment that came to mind after reading your post, I guessed maybe people aren't very good at being specific in their requests and, to test what would happen, I simply asked Google to "give me what I am searching for". Amusingly, it seems to have concluded that I don't give a F... xD

      --
      "Convictions are more dangerous enemies of truth than lies."
    2. Re:easily googleable name by Fnord666 · · Score: 1

      "mit stack checker", and it's the first URL returned: css.csail.mit.edu/stack

      And that URL eventually returns:

      internal error - server connection terminated

      Nice job /.

      --
      'The tyrant will always find pretext for his tyranny.' - Aesop's Fables
  99. Re:TFA does a poor job of defining what's happenin by russotto · · Score: 2

    Yes, (low + high)/2 is well-defined for unsigned ints even when (low + high) > UINT_MAX. It's not the definition you want, though. The average of (UINT_MAX/2 + 1000) and (UINT_MAX/2 - 500) should not be 249.
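
    A sketch comparing the two for exactly those inputs. Function names are made up for illustration; the first uses the halve-then-add formula quoted earlier in the thread.

```c
#include <limits.h>

/* Average of two unsigned values without forming the full sum:
 * halve each operand, then add back the 1 that is lost when both
 * operands are odd. Rounds down. */
unsigned average_unsigned(unsigned low, unsigned high)
{
    return (low >> 1) + (high >> 1) + (low & high & 1u);
}

/* The naive version: well defined, but the sum is reduced modulo
 * UINT_MAX + 1 before the division. */
unsigned average_naive(unsigned low, unsigned high)
{
    return (low + high) / 2;
}
```

    For UINT_MAX/2 + 1000 and UINT_MAX/2 - 500 the first function returns UINT_MAX/2 + 250, while the naive one wraps and returns 249.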

  100. Re:TFA does a poor job of defining what's happenin by TranquilVoid · · Score: 1

    I also like the following possibility that arises from warnings being completely compiler-dependent.

    Warning L-8272: Your code is perfectly valid and well-formed.

    I wish vendors would implement this, just to be a thorn in the side of those who advocate those unthinking zero-warnings policies.

  101. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    Signed integer overflow being undefined is a mistake in the C spec. No one will make a ones-complement machine ever again.

    Now to extirpate big endian.

  102. Re:TFA does a poor job of defining what's happenin by sjames · · Score: 1

    In that case, yes. The case in TFA where password is memset to zero, then freed is another matter. The code is unambiguous and clearly serves a security function. However, looked at narrowly, the memset is wasted because the memory will be freed in the next line. But if the memset is skipped, the password is left floating around in unallocated memory. Worse, it might end up in swap.
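
    One common workaround, sketched here under assumptions: write the zeros through a volatile pointer so the compiler is much less inclined to treat the stores as dead. secure_zero and discard_password are made-up names, and the C standard does not strictly guarantee this trick. Some platforms ship purpose-built functions such as explicit_bzero or the optional C11 memset_s, which are preferable where they exist.

    ```c
    #include <stddef.h>
    #include <stdlib.h>

    /* Zero n bytes at p through a volatile-qualified pointer, so each store
     * counts as an observable access rather than a removable dead store. */
    void secure_zero(void *p, size_t n)
    {
        volatile unsigned char *vp = p;
        while (n--)
            *vp++ = 0;
    }

    /* Scrub the password before handing the memory back to the allocator. */
    void discard_password(char *password, size_t len)
    {
        secure_zero(password, len);
        free(password);
    }
    ```

    This only addresses the dead-store elimination. It does nothing about copies the password may already have left in registers, on the stack, or in swap.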

  103. Re:-Wall doesn't catch some aliasing issues by Anonymous Coward · · Score: 0

    -Wall doesn't catch aliasing issues; see here: http://blog.worldofcoding.com/2010/02/solving-gcc-44-strict-aliasing-problems.html

  104. Re:TFA does a poor job of defining what's happenin by aiht · · Score: 1

    Myers-Briggs test, and it's 'N' for intuitive. :)

    Yep, the I is for introverted.

  105. Re:TFA does a poor job of defining what's happenin by sconeu · · Score: 1

    Good point.

    --
    General Relativity: Space-time tells matter where to go; Matter tells space-time what shape to be.
  106. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    He said "negative", not "imaginary".

  107. Re:TFA does a poor job of defining what's happenin by lgw · · Score: 1

    What do you mean "word is"? Nobody.. and I mean nobody... should ever be coding in C unless they've read the standard from top-to-bottom, carefully, at least once, and then keeps it handy.

    As in, (public) discussions on the standard between standards committee members. Reading the standard top-to-bottom only gives part of the picture, like reading the Constitution without reading the Federalist Papers. When late in the game someone writes something like "do you realize this means a char can only be 8 bits?" one can reasonably speculate that mandating an 8-bit byte (goodbye PDP) wasn't the intention.

    It's quite hard to write a standard that's both intelligible and unambiguous - English doesn't work that way. Sometimes googling around for the discussions sheds light.

    --
    Socialism: a lie told by totalitarians and believed by fools.
  108. Re:TFA does a poor job of defining what's happenin by lgw · · Score: 1

    Nice. Nice. I like the way you think.

    --
    Socialism: a lie told by totalitarians and believed by fools.
  109. All human is inperfection by lapm · · Score: 1

    It's humans writing the code, and humans are not perfect. If this tool actually works as advertised, it should make a fine addition to the toolbox of us not-so-perfect coders. Somehow I would have expected the compiler to warn you when it is going to leave a piece of code out. But I guess that shows how little I actually code.

  110. Re:TFA does a poor job of defining what's happenin by dgatwood · · Score: 1

    Ah, sorry. I screwed up the example slightly. In a proper example, there's always a little more code to ensure that "a" is a strictly positive integer, because if it isn't, then a + 2 is guaranteed to not overflow, making the check superfluous.

    --

    Check out my sci-fi/humor trilogy at PatriotsBooks.

  111. Re:TFA does a poor job of defining what's happenin by osu-neko · · Score: 1

    I find it very curious that you'd be prohibited from using the value of the pointer proper.

    Makes sense to me that doing so would have undefined results. Plenty of more sophisticated memory managers will monkey with the contents of pointers from time to time. Once you deallocate the block it pointed to, God only knows what's left in the pointer. Probably the same value for simple memory allocators. Probably...

    --
    "Convictions are more dangerous enemies of truth than lies."
  112. Re:TFA does a poor job of defining what's happenin by osu-neko · · Score: 1

    This makes no sense. The dereference is undefined, and therefore sk may be undefined iff tun IS null but not tun.
    I.e. by the time execution reaches the if statement one of the two is true:
    tun != null && sk == {something valid} -or-
    tun == nul && sk == {undefined}
    sk being undefined is possible but that undefined-ness can't be used as a way to infer tun != null--the only thing that causes it is tun == null! It's illogical for the compiler to do what you say and remove the if check. The standard says sk can be undefined, therefore something being in an undefined state is possible, not that the compiler can presume that undefined is impossible to occur and put it's hands over its ears and go la-la-la.

    "sk being undefined"? You misunderstand what "undefined" means in this case. It's not a question of whether "sk" has a defined value or not, because if tun was null, it's not that the value of sk becomes undefined, rather, the behavior of that dereference statement is undefined. "By the time execution reaches the if statement"? It may never reach it! Since the behavior of "struct sock *sk = tun->sk" is undefined in the case tun == null, it's perfectly acceptable for that statement to be treated as a "return" statement if tun is null. Or for it to be treated as a command to reformat your hard drive. Or to simply make demons fly out your nose. The if statement can be eliminated, since if the compiler wants to treat it as unreachable code in the event tun == null, it has every right in the world to do so.

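    The pattern under discussion looks roughly like this. It is a simplified sketch, not the actual kernel code: the struct definitions are stand-ins, and the function names and the -1 return value are invented for illustration.

    ```c
    #include <stddef.h>

    /* Simplified stand-ins for the kernel structures being discussed. */
    struct sock { int id; };
    struct tun_struct { struct sock *sk; };

    /* Unstable: tun is dereferenced before it is tested. Because that
     * dereference is undefined when tun == NULL, the compiler may assume
     * tun != NULL from there on and delete the check entirely. */
    int sock_id_unstable(struct tun_struct *tun)
    {
        struct sock *sk = tun->sk;
        if (!tun)
            return -1;
        return sk->id;
    }

    /* Stable: test first, dereference afterwards. */
    int sock_id_checked(struct tun_struct *tun)
    {
        if (!tun)
            return -1;
        return tun->sk->id;
    }
    ```

    Only the second form keeps its null check no matter how aggressively the optimizer reasons about undefined behavior.
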
    --
    "Convictions are more dangerous enemies of truth than lies."
  113. Re:These bugs exist even *without* signed integers by Animats · · Score: 5, Interesting

    The problem is C's promotion rules. In C, when promoting integers to the next size up, typically to the minimum of "int", the rule is to use signed integers if the source type fits, even if the source type is unsigned.

    I know. C's handling of integer overflow is "undefined". In Pascal, integer overflow was a detected error. DEC VAX computers could be set to raise a hardware exception on integer overflow, and about thirty years ago, I rebuilt the UNIX command line tools with that checking enabled. Most of them broke.

    In the first release of 4.3BSD, TCP would fail to work with non-BSD systems during alternate 4-hour periods. The sequence number arithmetic had been botched due to incorrect casts involving signed and unsigned integers. I found that bug. It wasn't fun.

    C's casual attitude towards integer overflow is why today's machines don't have the hardware to interrupt on it. Ada and Java do overflow checks, but the predominance of C sloppiness influenced hardware design too much.

    I once wrote a paper, "Type Integer Considered Harmful" on this topic. One of my points was that unsigned arithmetic should not "wrap around" by default. If you want modular arithmetic, you should write something like n = (n + 1) % 65536;. The compiler can optimize that into machine instructions that exploit word lengths when the hardware allows, and you'll get the same result on all platforms.

  114. Re:These bugs exist even *without* signed integers by Anonymous Coward · · Score: 0

    I never said it always saved you. There are obviously caveats, one of which is promotions of unsigned char and unsigned short to int. Another is something like this:

    unsigned x = UINT_MAX, y = UINT_MAX;
    unsigned long z = x + y;

    Problem? If unsigned is 32-bits but unsigned long is 64-bits, z is assigned the truncated value, which may or may not be what you wanted. (In the common case of object manipulation, inadvertent unsigned truncation is usually the least harmful effect, and if used purposefully and knowingly, very handy).
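
    A small sketch of that truncation and the usual fix, assuming unsigned long long is wider than unsigned (true on the common platforms, though not required by the standard). The function names are made up:

    ```c
    /* The addition is done in unsigned, wraps modulo UINT_MAX + 1, and only
     * the already-wrapped result is widened for the return value. */
    unsigned long long sum_truncated(unsigned x, unsigned y)
    {
        return x + y;
    }

    /* Widening one operand first makes the addition itself happen in the
     * wider type, so no bits are lost. */
    unsigned long long sum_widened(unsigned x, unsigned y)
    {
        return (unsigned long long)x + y;
    }
    ```

    For x = y = UINT_MAX the first returns UINT_MAX - 1 and the second returns twice UINT_MAX.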

    But my ultimate point is that if one sticks to unsigned, they've drastically reduced complexity. You still need to understand the semantics, but there are fewer issues to keep in mind. That's no excuse for not understanding all the semantics for all types, but when you're jamming away on code you want to use the method which requires the least amount of rules to be resident in your head, so as to minimize mistakes. One wants to habituate oneself to patterns and techniques which, on the whole, maximally mitigate the effects of bugs.

    Interestingly, one counter argument I've heard to using unsigned is that it _prevents_ some of the kinds of optimizations that compilers use. For example, unsigned arithmetic in for-loop conditionals is often less than optimal. But if one is aiming for consistency and correctness, it's usually the way to go, not the least because for-loops often operate on vectors or arrays, and negative values are almost always non-sensical. (Obviously there are caveats to unsigned in for-loops, such as the infamous ">= 0", but that's beside the point).

  115. Re:TFA does a poor job of defining what's happenin by osu-neko · · Score: 1

    Apparently lots of programmers start with "IN-"... although I still haven't decided if Myers-Briggs is insightful analysis or modern astrology. It would help if they could agree with each other (I've been variously told I'm INTP or INFP, depending on the test, or maybe what kind of mood I'm in, or perhaps somehow related to the phases of the Jovian moons)...

    --
    "Convictions are more dangerous enemies of truth than lies."
  116. Re:TFA does a poor job of defining what's happenin by CommanderK · · Score: 1

    The original x86 had 4 ring levels and segmentation, and at the time we got AMD64 no one was using them (or at least Windows/Linux/*BSD weren't using them). AMD removed security measures because they weren't popular.

  117. Re:TFA does a poor job of defining what's happenin by Altrag · · Score: 1

    I don't think that's the problem the OP is referring to. The problem is that the compiler assumes *a is invalid and "optimizes" it even if the realloc returned the same memory address, making (a==b) true. If the compiler did nothing (*a==*b) should also be true but because the compiler replaced it with something incorrect, (*a!=*b) ends up being true instead.

    That's the problem with loose standards -- if the behavior is undefined, there will be people who through ignorance or "cleverness" will end up abusing the undefined behavior of a specific system (in this case compiler) and have their code break in ways that are sometimes extremely difficult to debug -- especially if it's been working that way so long that you've forgotten it's technically an undefined behavior -- you end up completely overlooking the culprit line of code because it looks correct to you.

  118. Compiler can't warn about all undefined behaviour by Anonymous Coward · · Score: 0

    You're never going to see this comment but one of the LLVM lead programmers outlines the reasons why the compiler can't always warn that the programmer has invoked undefined behaviour in C. He outlines three reasons:

    1. Too many spurious warnings (warning generated when there's no bug)
    2. It's hard to generate said warnings ONLY when people want them
    3. No good way to tell the user how the situation occurred after X optimisations

    The article outlines some steps programmers can take but ultimately concludes that C just isn't a "safe" language (but that's partly why it can go so fast).

  119. Re:These bugs exist even *without* signed integers by Anonymous Coward · · Score: 0

    that's absolutely incorrect. while signed-preserving compilers are standards
    compliant, unsigned-preserving compilers are too. unsigned-preserving
    compilers are conservative in the principle of least surprise.

  120. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    that's not correct. a does not become undefined. its value does not
    change since C is pass by value. and either realloc() assigns new
    storage or it does not. so either a==b or a!=b, either one is valid but
    there is no third "undefined" option. the compiler is simply wrong
    if it thinks that a function call can make a previously defined value go
    random.

  121. Re:TFA does a poor job of defining what's happenin by maxwell+demon · · Score: 1

    Isn't right shift for negative values implementation defined? So your code may work on some platforms, but not on others.

    --
    The Tao of math: The numbers you can count are not the real numbers.
  122. Re:TFA does a poor job of defining what's happenin by maxwell+demon · · Score: 1

    int avg(int a, int b)
    {
      return a/2 + b/2 + (a%2 + b%2)/2;
    }

    --
    The Tao of math: The numbers you can count are not the real numbers.
  123. Re:TFA does a poor job of defining what's happenin by maxwell+demon · · Score: 1

    I like even more the fact that, given that a diagnostic is sometimes required, but never disallowed, and the text of a diagnostic is not regulated, a conforming compiler may just output on every compilation:

    warning: This program might contain errors.

    Useless, but completely conforming.

    --
    The Tao of math: The numbers you can count are not the real numbers.
  124. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    It doesn't really have to _detect_ undefined behaviour. Maybe it even can't detect it. It just has to produce code that will work as defined in cases when the behaviour is defined.

    Let's say I need an algorithm A that, given a Turing machine M and an input word w, produces M(w) if M would halt on w; otherwise, A(M, w) is undefined. A can just run M on w and not worry (or warn) about it potentially not halting. In fact, that is the only available option.

  125. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    More realistically (and in almost all userland scenarios), if tun is null then the dereference causes a segmentation fault, so the test is guaranteed to fail (i.e. tun is guaranteed non-null) if the code gets as far as the test. To get any real risk, you need a compiler that both optimizes out the test and does something really bizarre on a null pointer dereference (which they're technically allowed to do, but generally don't). Or be writing a kernel, of course.

  126. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    Try this one: if (a < a+1) { /* do stuff */ }
    If a is a signed int, a smart compiler will leave out the condition and assume it is always true, because it assumes undefined behaviour never happens. As far as the compiler is concerned, a < a+1 is true even when a = INT_MAX.
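
    A sketch of the difference, with hypothetical function names. The first test relies on a + 1 overflowing, which is undefined for signed int. The second compares against INT_MAX before doing any arithmetic:

    ```c
    #include <limits.h>
    #include <stdbool.h>

    /* Unstable: a + 1 overflows when a == INT_MAX, so the compiler is free
     * to fold the whole comparison to "true". */
    bool can_increment_unstable(int a)
    {
        return a < a + 1;
    }

    /* Stable: no arithmetic on a at all, so there is nothing to overflow. */
    bool can_increment(int a)
    {
        return a < INT_MAX;
    }
    ```

    can_increment(INT_MAX) is reliably false, which is the answer the unstable version was presumably meant to give.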

  127. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    > if (INT_MAX - a < 2) {

    Oh please no. Don't get clever with security checks. The rule is pretty simple: In almost all cases, the value to be validated should stand alone.
    Which means
    if (a = INT_MAX - 2)
    It will generate better code even with a dumb compiler and it actually works. Because your code will still overflow for a < 0, this time not in the addition but in the condition. If you're lucky that is a less critical failure type. If you're unlucky, it's worse than what you tried to fix.

  128. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    Well, it works when plain text posting actually works and doesn't change >= to = ...
    Also things get tricky (and the rule doesn't work) when you must validate two untrusted value to stand in some relation to each other.

  129. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    > For undefined behavior, the compiler can do anything it wants to

    In particular, relevant for the case of optimizations discussed here, it is free to assume that case can never happen and optimize code before/surrounding it accordingly.
    So if you have (a being int)
    int b = a + INT_MAX;
    if (a > 0) { ...
    }
    It can optimize the if away, since it can only be true when the calculation of b would have overflowed. This is even true if you never use b and it is old code you forgot to remove and is far away in the source file.

  130. Re:TFA does a poor job of defining what's happenin by TheRaven64 · · Score: 1

    This is not correct, however there is a set of constraints in C and a set of constraints in POSIX which, when composed, mean that 8 bits is the only valid size for a char, so if you are writing C code that only targets POSIX machines then you can safely assume that char is 8 bits.

    --
    I am TheRaven on Soylent News
  131. Re:TFA does a poor job of defining what's happenin by eulernet · · Score: 1

    Nice, but one of the proposed solutions is incorrect:

    mid = ((unsigned int)low + (unsigned int)high) >> 1;

    if low and high are equal or greater than 2^30, then it overflows.
    For example, low=131, high=131, then you get mid=0 instead of mid=131.

  132. Re:TFA does a poor job of defining what's happenin by eulernet · · Score: 1

    It's 1<<31, of course !
    Slashdot removes the unescaped <

  133. Re:TFA does a poor job of defining what's happenin by TheRaven64 · · Score: 1

    Here's another one: ordered comparisons between pointers to different objects (less-than or greater-than). These are undefined in C and C++, and yet every STL implementation I've seen relies on them being both defined and stable for many collections (map, set and friends) to work. C++11 is explicitly written to allow implementations with GCs that modify pointer values, so this could cause some interesting issues in the future...

    --
    I am TheRaven on Soylent News
  134. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    Switching rings was too damn slow. So slow, that only academics ever used them, and not very well. It is no wonder it died.

    Separating code and data pointers is supported through the NX bit if you forbid self-modifying code and trampolines, so just say no to any crap that requires them, and place all data on separate pages with NX enabled. This is supported in Linux+gcc+LD, but it requires explicit action and can't be used for everything due to dumbass language designs that require trampolines.

    Properly partitioning memory between ring 0 and ring 3 (separating kernel mode from user mode) is supported through the SMEP and SMAP features (and their AMD equivalent): Linux already implements it if you have a processor that has the support. I don't know about the BSDs, but I doubt they'd fall behind on this.

    And there is now a new "Intel MPX" thing that might fix the array-out-of-bounds disease by actually making it painless enough that we can just enable runtime checking everywhere (static analysis of this is already a MUST).

    Now, if only we could kill ring -1 *DEAD* to get rid of both vendor-added bug-ridden crap SMM, and hypervisor/NSA viruses/trojans, it would be REALLY nice. You can only do it right now if you design the platform from the ground up and you're an Intel hardware partner (or AMD hardware partner) to get complete access to all BIOS/EFI reference code for platform setup.

    Segment registers were very useful, but if you look at chip errata you'll get a glimpse of the hellpit we'd be in should those things still work in long mode: the TLBs and MMUs are already buggy as all hell [in silicon], and you have extremely nasty interactions of even the microcoded "basic instructions" with memory pages and alignment boundaries in X86 (think "fast strings" mode of operation). Add segments to that again, and chances are we will just have to junk the whole arch for good. Which would not be a bad idea, it would sink Microsoft crap along with it, while Linux and the BSDs work extremely fine with everything else worthy of notice under the sun.

  135. thankfully compilers are lazy by Anonymous Coward · · Score: 0

    It is a damn good thing that compilers are mostly lazy, then. And the easiest is to just omit that section of code ;-)

  136. Re:TFA does a poor job of defining what's happenin by Warbothong · · Score: 1

    It should be ingrained in every programmer's brain that undefined behaviour is exactly that, so I have absolutely no sympathy for anyone whose code suffers from this. Personally I try not to use languages like C unless I'm forced to interface with some particular library; not only is the semantics full of undefined behaviour, but it's also incredibly complex. This means programmers struggle to understand what's going on *exactly* and have to think in terms of approximations, which leads to some situations becoming unexpected (eg. integer overflow). This also leads to teams of very smart people having to invest a lot of time to make tools like the one in TFA, which is clearly a sign that the language is too complex.

    Now, the really *interesting* security problems caused by optimisation are the side-channel attacks. For example, you might have code which checks a security token; if this is done character-by-character and fails on the first mis-match then an attacker can guess each character in turn by timing how long it takes to fail. Quick failure == first character is wrong, slightly slower failure == first character is right, and so on. You might decide to guard against this by looping through the whole token, so that every check takes the same amount of time. However, is your compiler going to optimise this away? Maybe you'll call some random number generator in your loop to force it to run, but maybe your compiler will move this out of the loop without affecting the semantics, etc.
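
    For the token check, the usual shape of a constant-time comparison is something like the sketch below. ct_equal is a made-up name, and the volatile accumulator is only a hint that discourages the compiler from short-circuiting the loop. The C standard makes no timing promises at all, so real code should inspect the generated assembly or use a vetted library routine.

    ```c
    #include <stddef.h>

    /* Compare n bytes without exiting early: every byte is always visited,
     * and differences are OR-ed into one accumulator. Returns 1 if the
     * buffers are equal and 0 otherwise. */
    int ct_equal(const unsigned char *a, const unsigned char *b, size_t n)
    {
        volatile unsigned char diff = 0;
        for (size_t i = 0; i < n; i++)
            diff |= (unsigned char)(a[i] ^ b[i]);
        return diff == 0;
    }
    ```

    The running time then depends on the token length rather than on the position of the first mismatch.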

  137. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    Augmented C for static analysis can do it, which is just C with lots of __foo crap added everywhere (and preprocessor code to kill that when not running under the static analyser). Refer to "sparse" and have a look at the Linux source code which is full of sparse annotations for extended typechecking.

  138. Re:TFA does a poor job of defining what's happenin by gnasher719 · · Score: 1

    For the "missile launch example":

    After the statement b = realloc (a, sizeof (char));
    the value of a is indeterminate unless realloc failed, that is, b == NULL. So immediately afterwards a comparison a == b must give the correct result if b == NULL. However, there was an assignment to *b which tells the compiler that b != NULL and therefore a is indeterminate.

    So it's safe to do b = realloc (a, sizeof (char)); if (b == NULL) { /* handle the error and a is unchanged */ }
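
    Spelled out as a helper (grow_buffer is a hypothetical name and the 0 / -1 return convention is just an assumption for the sketch):

    ```c
    #include <stdlib.h>

    /* Resize *buf to new_size bytes. On failure the old block is untouched
     * and *buf still points at it; on success the old pointer value is never
     * looked at again, only the one realloc returned. */
    int grow_buffer(char **buf, size_t new_size)
    {
        char *tmp = realloc(*buf, new_size);
        if (tmp == NULL)
            return -1;
        *buf = tmp;
        return 0;
    }
    ```

    The point of the temporary is that the old pointer is only ever used on the path where realloc failed, which is the one case where its value is still valid.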

  139. Re:TFA does a poor job of defining what's happenin by michelcolman · · Score: 1

    Unsigned arithmetic in C is not defined as modular. Integer overflows are "undefined", leading to unintended compiler optimizations where it assumes overflows just can't happen.

  140. Re:TFA does a poor job of defining what's happenin by goose-incarnated · · Score: 1

    There's nothing in the standard about "warnings", though most compilers are good about it when it comes to common problems. But even with a warning, optimizer's gonna optimize.

    There aren't "warnings", but the C89 and C99 standards both prescribe diagnostics to be issued under certain circumstances.

    --
    I'm a minority race. Save your vitriol for white people.
  141. PostgreSQL doesn't take safety seriously... by Anonymous Coward · · Score: 0

    This is in the paper (read it, it is worth your time if you write any code in something other than Ada), section 6.3:

    Postgres. Stack reported 68 bugs in total. The developers promptly fixed 9 of them after we demonstrated how to crash the database server by exploiting these bugs, as described in 6.2.1. We further discovered that Intel’s icc and PathScale’s pathcc compilers discarded 29 checks, which Stack identified as unstable code (i.e., urgent optimization bugs), and reported these problems to the developers. At the writing of this paper, the strategies for fixing them are still under discussion. Stack found 26 time bombs (see 6.2.3 for one example); we did not submit patches to fix these time bombs given the developers’ hesitation in fixing urgent optimization bugs.

    In context, you have to read this alongside a bug the PostgreSQL people tried to fix and got all wrong *again* (and therefore did not fix properly); see sections 6.2.3 and 6.2.1.

    I am not amused. When someone who KNOWS WHAT THEY'RE DOING points out your code is crap, you FIX it instead of jumping around on one foot chanting "I know more than you" (when you obviously don't).

    Now, I already run all my code through three static checkers as well as LLVM and newest GCC in -Wall -Werror -pedantic mode, but as soon as the MIT "Stack" site unclogs (it is currently overloaded or something), I will add a fourth :)

  142. Clinton Administration??? by gstovall · · Score: 1

    You young whippersnappers!

  143. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    C99 standard, section 6.2.4 paragraph 2 and annex J.2

  144. Re:TFA does a poor job of defining what's happenin by mysidia · · Score: 1

    plenty of more sophisticated memory managers will monkey with the contents of pointers from time to time.

    The memory heap manager library is passed the value of the pointer, not a reference to the pointer itself.

    e.g. if you had p = malloc(10);
    To release the memory, you do free(p); not free(&p);

    The key thing to keep in mind is, the memory manager has no knowledge of the object 'p'; only the address of the object that p had pointed to.

    Furthermore, there could be plenty of other copies of the pointer lying around, that the memory manager does not know the address of. A perfectly valid construct would be...

    struct { char * x; } bar4; char *bar1, *bar2, *bar3;
    bar3 = malloc(256); bar1 = bar3;
    bar2 = bar1;
    bar4.x = bar2; free(bar4.x);

    In this case, you have 4 copies of the pointer in scope, they each contain a memory offset within program address space, plus the copy that gets created when free is called. At no time is any memory management library able to change the contents of any of those 4 copies in a C program. The memory manager can only change what is the object contained at the address referenced by the pointers.

  145. Re:TFA does a poor job of defining what's happenin by mysidia · · Score: 1

    the value of a is indeterminate unless realloc failed, that is b = NULL.

    In practice, I see most developers neglecting or ignoring the condition where b is NULL; realloc, malloc, and calloc are assumed to always succeed.

    On their platform of choice, it may be the case that the system kills a process with OOM when allocation fails, instead of these procedures returning NULL.

  146. Re:TFA does a poor job of defining what's happenin by fatphil · · Score: 1

    >> Quick summary: (low + high) / 2

    > In C? Assuming low and high are unsigned
    > (low >> 1) + (high >> 1) + (low & high & 1). Ick.

    (low & high) + ((low ^ high) >> 1);

    Every bit in the & expression would be included twice in the sum, and then halved. So leave them alone.
    Every bit in the ^ expression would be only included once in the sum, on one side or the other, we care not, and then halved. So just halve those.
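
    Wrapped in a function for anyone who wants to try it (avg_floor is just an illustrative name):

    ```c
    /* Floor of the average of two unsigned values without forming their sum.
     * Bits set in both operands contribute fully; bits set in only one
     * operand contribute half. Neither term can wrap. */
    unsigned avg_floor(unsigned low, unsigned high)
    {
        return (low & high) + ((low ^ high) >> 1);
    }
    ```

    avg_floor(UINT_MAX, UINT_MAX) gives UINT_MAX and avg_floor(3, 6) gives 4, with no intermediate overflow and no requirement that low <= high.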

    --
    Also FatPhil on SoylentNews, id 863
  147. Re:TFA does a poor job of defining what's happenin by mysidia · · Score: 1

    Yes it is. Evaluating the expression "a" causes undefined behaviour if "a" is indeterminate. "a" is considered to no longer have a value, any attempt to refer to its value causes UB.

    No.... it's *a that becomes indeterminate. I suppose the case could be made, that some strange clause of one of the new drafts could be interpreted as 'a' the pointer, instead of the value of the pointer becoming indeterminate. But nonetheless, in traditional C, it is perfectly valid, and a compiler will have to support it, to avoid breaking backwards compatibility.

    I suppose this becomes something like the argument, that the programmer should not be able to rely on MySQL coercing a NULL value to 0, when inserting a SQL row, containing a NOT NULL column, where the NULL has been presented; after all, the formal papers from the SQL standardization effort don't show that the DB engine can coerce NULL in such a manner.

    Nonetheless, there is sometimes one established convention or another that predates the standard, and has more authority than the standard.

  148. Re:TFA does a poor job of defining what's happenin by fatphil · · Score: 1

    > Switching rings was too damn slow. So slow, that only academics ever used them, and not very well. It is no wonder it died.

    I can't think of any current system that doesn't have something running at ring 0 and something running at ring 3.

    --
    Also FatPhil on SoylentNews, id 863
  149. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    The memory heap manager library is passed the value of the pointer, not a reference to the pointer itself.

    The C standard plays it safe even for hardware which does paranoid checking on pointers. The hardware may barf just from trying to assign an unmapped address to a memory access register, even before you actually try to dereference it.

  150. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    That's very strange - I thought one of the design goals of Clang was to raise an error for undefined behaviour. I've certainly noticed that happening much more in Clang than in MSVC, in which I've had many cases of the compiler 'knowing what I mean' and then filling in the blanks.

  151. Too much left out by T.E.D. · · Score: 1

    The problem with C compilers that remove unstable code is that nearly every C program fed into it gets optimized down to hello_world. These new C compilers can just spit out a warning: "Your program is unsafe", and go on their merry way now. Actually having the compiler perform a check to verify that this is the case removes the 0.5% false positives, but most will find the extra compilation time not worth that.

    I'm exaggerating of course. Hello_world is horribly unsafe too, because it uses printf.

    1. Re:Too much left out by david_thornley · · Score: 1

      printf() is, from the compiler's point of view, a defined I/O function. Hello, World always has the effect of calling printf("Hello, world!\n"), whatever that implementation may be.

      --
      "When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
  152. Re:TFA does a poor job of defining what's happenin by Spazmania · · Score: 1

    No, I meant as written. When you encounter something undefined, turn the optimizer off and do the statements exactly as written. Whatever happens, make it happen in the order and steps that the programmer wrote. It'll still be wrong (most likely) but it'll be the programmer's wrong, not the compiler's wrong.

    --
    Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
  153. Re:TFA does a poor job of defining what's happenin by Spazmania · · Score: 1

    From the blog post:

    Violating Type Rules: It is undefined behavior to cast an int* to a float* and dereference it (accessing the "int" as if it were a "float"). C requires that these sorts of type conversions happen through memcpy: using pointer casts is not correct and undefined behavior results. The rules for this are quite nuanced and I don't want to go into the details here (there is an exception for char*, vectors have special properties, unions change things, etc). This behavior enables an analysis known as "Type-Based Alias Analysis" (TBAA) which is used by a broad range of memory access optimizations in the compiler, and can significantly improve performance of the generated code. For example, this rule allows clang to optimize this function:

    float *P;
    void zero_array() {
        int i;
        for (i = 0; i < 10000; ++i)
            P[i] = 0.0f;
    }

    Okay, maybe it's too early in the morning but where exactly did this function cast an int* to a float*? Where's the "undefined behavior"?

    And anyway, how is casting int* to float* undefined behavior? You set this pointer to that pointer and now you're looking at the data a different way. It won't be sensible data unless you know what architecture you're programming to, but who says you don't?
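
    For reference, the memcpy route the quoted post mentions might look like the sketch below. It assumes a 32-bit float, and float_bits is a hypothetical name that does not come from the blog post. It reads the same bytes a pointer cast would, but without ever dereferencing a pointer of the wrong type:

    ```c
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical helper: return the object representation of f as an integer.
       Assumes float is 32 bits wide, which the C standard does not guarantee. */
    uint32_t float_bits(float f) {
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);   /* byte copy; no int*/float* aliasing involved */
        return bits;
    }
    ```

    On an IEEE 754 platform float_bits(1.0f) comes out as 0x3f800000. Compilers commonly turn a fixed-size memcpy like this into a plain register move, though that is an optimization and not a guarantee.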

    --
    Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
  154. Re:TFA does a poor job of defining what's happenin by Spazmania · · Score: 1

    I'd expect it to work left to right, pass f(1,2) and leave the value 3 in i. Just like everything else in the language that works left to right and top to bottom. If I got any other result, I'd call the compiler broken.

    --
    Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
  155. Re:TFA does a poor job of defining what's happenin by EvanED · · Score: 1

    That's probably the best answer (at least by some metric), but it needs just a tiny tweak to be perfect: a/2 + (a%2 + b%2)/2 + b/2.

    If you have a platform where INT_MIN is odd and a negative number divided by 2 rounds toward negative infinity* (I'm pretty sure such a platform is legal), then avg(INT_MIN, INT_MIN) will overflow. I think my tweak fixes that.

    * I'm not 100% sure of what standards allow what behavior, but at least in C++98/03, -3/2 can return either -1 or -2, as long as (a/b)*b + a%b == a (so if -3/2 == -1 then -3%2 == -1 and if -3/2 == -2 then -3%2 == 1)
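
    A minimal sketch of the tweaked formula, assuming C99 or later, where integer division truncates toward zero and a%2 takes the sign of a. The name avg is taken from the discussion; everything else is illustrative:

    ```c
    #include <limits.h>   /* INT_MAX / INT_MIN for callers exercising the edge cases */

    /* Average of two ints without ever forming a + b.
       Each half is at most INT_MAX/2 in magnitude, and the middle
       term is only -1, 0 or 1, so no intermediate value can overflow. */
    int avg(int a, int b) {
        return a / 2 + (a % 2 + b % 2) / 2 + b / 2;
    }
    ```

    When the exact mean ends in .5 the result is one of the two neighbouring integers, not always the same one: avg(1, 2) gives 1, and avg(-1, 2) also gives 1.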

  156. Re:TFA does a poor job of defining what's happenin by fatphil · · Score: 1

    > I find it very curious that you'd be prohibited from using the value of the pointer proper. Do you have a citation for it being undefined?

    N1570 L.3p2
    "The value of a pointer that refers to space deallocated by a call to the free or realloc function is used (7.22.3)."

    realloc() *always* deallocates, unless it returns NULL. Simply growing an area counts as a deallocation of the old area followed by a new allocation at the same spot.
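
    A minimal sketch of code that stays within that rule: anything that has to survive the realloc() is kept as an offset computed beforehand, and the old pointer value is never read afterwards. grow_keep_cursor is a hypothetical helper name, not a library function:

    ```c
    #include <stddef.h>
    #include <stdlib.h>

    /* Hypothetical helper: resize *buf to new_size and return the new position
       of cursor, which must point into the old *buf. Returns NULL on failure,
       in which case *buf is unchanged and still valid. */
    char *grow_keep_cursor(char **buf, char *cursor, size_t new_size) {
        size_t offset = (size_t)(cursor - *buf);   /* taken while *buf is still valid */
        char *tmp = realloc(*buf, new_size);
        if (tmp == NULL)
            return NULL;
        *buf = tmp;              /* the old value of *buf is not used again */
        return tmp + offset;
    }
    ```

    Comparing the old and new pointers to see "whether the block moved" is exactly the use the quoted clause rules out, which is why the offset is computed first.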

    --
    Also FatPhil on SoylentNews, id 863
  157. Re:TFA does a poor job of defining what's happenin by AmiMoJo · · Score: 1

    That reminds me of this gem: Overflow in sorting algorithms

    Why would anyone make array indices signed ints in the first place? As a C programmer that sets off alarm bells for me, so I'd immediately suspect the rest of the function too. I'm amazed no-one noticed it for so long. Did nobody review that code?

    --
    const int one = 65536; (Silvermoon, Texture.cs)
    SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
  158. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    There's a catch: the expression f(tmp = a, tmp, tmp) contains no sequence points. Comma used as a separator for function arguments is not regarded as comma the operator. So unless you explicitly add parentheses to force the comma to be treated as the operator, any side-effects made inside the function argument list may not take effect until after the function returns.

  159. Re:TFA does a poor job of defining what's happenin by AmiMoJo · · Score: 1

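    In this instance there is a better way. We know that low < high because we already tested for low==high as our loop exit condition. Therefore we can safely add 1 to low without fear of an overflow.

    ave = ((low+1 ) >> 1) + (high >> 1)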

    This takes care of rounding up. Alternatively, just re-write the binary search to work with rounding down, a trivial modification. Then you can just do

    ave = (low >> 1) + (high >> 1)

    which can be compiled down to three assembly instructions on many architectures.

    --
    const int one = 65536; (Silvermoon, Texture.cs)
    SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
  160. Re:TFA does a poor job of defining what's happenin by AmiMoJo · · Score: 1

    In this instance there is a better way. We know that low < high because we already tested for low==high as our loop exit condition. Therefore we can safely add 1 to low without fear of an overflow.

    ave = ((low+1 ) >> 1) + (high >> 1)

    This takes care of rounding up. Alternatively, just re-write the binary search to work with rounding down, a trivial modification. Then you can just do

    ave = (low >> 1) + (high >> 1)

    which can be compiled down to three assembly instructions on many architectures.

    (re-posting because it was mangled the first time, should have used preview)
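
    For comparison, a sketch of a different and widely used formulation, low + (high - low) / 2, written here for unsigned indices. This is not the shift-based form above but a swapped-in alternative; midpoint is a hypothetical name. The rounding-down shift form yields 1 for low = 1, high = 3, which is still inside [low, high) and therefore fine for a binary search, but it is not the exact midpoint:

    ```c
    #include <stddef.h>
    #include <stdint.h>   /* SIZE_MAX, for callers testing the upper edge */

    /* Midpoint of [low, high], rounded down. Requires low <= high.
       high - low cannot wrap under that precondition, and adding half of it
       back to low cannot exceed high, so nothing overflows. */
    size_t midpoint(size_t low, size_t high) {
        return low + (high - low) / 2;
    }
    ```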

    --
    const int one = 65536; (Silvermoon, Texture.cs)
    SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
  161. Re:TFA does a poor job of defining what's happenin by Spazmania · · Score: 1

    Not following the logic that an infinite loop is "undefined." Seems pretty well defined to me. Bugged. But perfectly well defined.

    --
    Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
  162. Re:These bugs exist even *without* signed integers by AmiMoJo · · Score: 1

    An interrupt is not a good way to handle integer overflow, especially since it is often the desired behaviour. Most modern CPUs can detect overflow and set a CPU flag, which the code can then test. That's the best way to handle it - test the flag if you care, or not if you don't.

    --
    const int one = 65536; (Silvermoon, Texture.cs)
    SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
  163. Re:TFA does a poor job of defining what's happenin by Spazmania · · Score: 1

    The result of arithmetic overflow is undefined in the language, not the architecture. Don't warn. Punt to the CPU and accept its result. If you optimize in a way that potentially corrupts the result from the CPU then I expect you to warn... and give me a compiler flag to turn the specific optimization off.

    --
    Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
  164. Re:These bugs exist even *without* signed integers by Anonymous Coward · · Score: 0

    You sound like one of these Terrorists who want to deny Raytheon their multi-billion dollar Cyber Warfare Pork !

  165. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    Well defined, but sometimes wrong:

    arch x86_64, gcc 4.7.2 20121109 (Red Hat 4.7.2-8) (GCC)

    char: 7f , 7f = 7f
    unsigned char: ff , ff = ff
    short: 7fff , 7fff = 7fff
    unsigned short: ffff , ffff = ffff
    int: 7fffffff , 7fffffff = ffffffff
    unsigned int: ffffffff , ffffffff = 7fffffff
    long: 7fffffffffffffff , 7fffffffffffffff = ffffffffffffffff
    unsigned long: ffffffffffffffff , ffffffffffffffff = 7fffffffffffffff

    (Under either -O0 or -O2. Sizes smaller than int work because the arithmetic will be done under promotion up to int. Sizes int and larger fail because the MSbit is lost.)

  166. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    No, I meant as written.

    When you say "as written", what you seem to mean is a different compiler spec that you believe is obvious. I don't believe "as written" has any meaning, because that spec just doesn't exist and each programmer might have a different definition of what "as written" means. That's why we have specs.

  167. Re:TFA does a poor job of defining what's happenin by Anonymous Coward · · Score: 0

    This is a common error. On real hardware, arithmetic overflow is well-defined. It may not be the behavior you want, but it's well-defined. In the C standard, arithmetic overflow is undefined. Sure, if you intentionally overflow ints, you will get something, based on the hardware behavior of the instructions generated by the C compiler. But according to the C standard, arithmetic overflow is undefined. That means that if you write in C, and what you write in C overflows, you may not have any expectations about the behavior: it can be anything.
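
    A minimal sketch of what staying inside the standard looks like in practice: test whether the sum fits before doing the addition, so the overflowing operation is never executed. checked_add is a hypothetical helper name:

    ```c
    #include <limits.h>
    #include <stdbool.h>

    /* Hypothetical helper: store a + b in *out and return true, or return
       false (leaving *out alone) if the sum would not fit in an int.
       The range test uses only subtractions that cannot themselves overflow. */
    bool checked_add(int a, int b, int *out) {
        if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
            return false;
        *out = a + b;
        return true;
    }
    ```

    The tempting after-the-fact test, something like if (a + b < a), relies on the overflow having already happened, and that is the kind of check an optimizer is allowed to remove.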

  168. Re:TFA does a poor job of defining what's happenin by AmiMoJo · · Score: 1

    This brings up another tricky thing in C: pointers that are not pointers.

    Arrays are just pointers to pre-allocated memory. This allocates both the memory for the array (100 bytes) and the memory for a pointer to the array (typically 4 bytes on a 32 bit system):

    uint8_t a[100];

    &a gives you a pointer to a pointer to the first element of the array. Consider this:

    typedef struct {
          uint8_t a[100];
    } oddity_t;

    oddity_t st;

    Here we have a struct 100 bytes in size. However, st.a is not a pointer. &st.a gives you the pointer of the first element of the array. a itself gives you the first element of the array and is of type uint8, but I'm not sure if that is undefined/compiler dependent and can't be bothered to look it up now.

    Now try doing sizeof(a) and sizeof(st.a).
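
    A small sketch of that experiment, with hypothetical function names. As far as the language rules go, sizeof applied to an array yields the size of the whole array whether or not it is a struct member; a pointer's size only shows up once the array has been converted to a pointer:

    ```c
    #include <stddef.h>
    #include <stdint.h>

    typedef struct {
        uint8_t a[100];
    } oddity_t;

    /* sizeof on a plain array: the whole array, 100 bytes. */
    size_t size_of_array(void) {
        uint8_t a[100];
        return sizeof a;
    }

    /* sizeof on an array that is a struct member: also 100 bytes. */
    size_t size_of_member(void) {
        oddity_t st;
        return sizeof st.a;
    }

    /* Once the array has decayed to a pointer, sizeof reports the pointer. */
    size_t size_of_decayed(void) {
        uint8_t a[100];
        uint8_t *p = a;
        return sizeof p;
    }
    ```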

    --
    const int one = 65536; (Silvermoon, Texture.cs)
    SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
  169. Re:Compiler can't warn about all undefined behavio by Spazmania · · Score: 1

    From your link:

    Lets look at an example: even though invalid type casting bugs are frequently exposed by type based alias analysis, it would not be useful to produce a warning that "the optimizer is assuming that P and P[i] don't alias" when optimizing "zero_array" (from Part #1 of our series).

    float *P;
    void zero_array() {
        int i;
        for (i = 0; i < 10000; ++i)
            P[i] = 0.0f;
    }

    But that statement makes no sense. P[i] is not a pointer, it's a single float. It can't be an alias for the pointer P. Hence there is no assumption for the optimizer to warn about.

    P+i would be a pointer. But P[i] is the same as saying *(P+i).

    And yes, I did see your comment. Did you see my response?

    --
    Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
  170. Re:TFA does a poor job of defining what's happenin by sconeu · · Score: 1

    Sorry, I was thinking C++, where it is defined.

    --
    General Relativity: Space-time tells matter where to go; Matter tells space-time what shape to be.
  171. What every programmer should know about undefined: by Anonymous Coward · · Score: 1

    Everyone who cares about correctness of their C or C++ programs at all, should read carefully all of the following articles:

    Dangerous Optimizations and the Loss of Causality (PDF)

    Understanding Integer Overflow in C/C++ (PDF)

    A Guide to Undefined Behavior in C and C++, Part 1 (blog post)

    Finding Undefined Behavior Bugs by Finding Dead Code (blog post)

    Then complain to your local compiler manufacturer. :P

  172. Undefined behavior is a serious problem by Anonymous Coward · · Score: 0

    There's at least 191 kinds of undefined behaviour in C99. It's not reasonable to expect programmers to always perfectly avoid all of them.

    We need compiler-writers to meet us half-way, by agreeing not to aggressively optimize out code that triggers undefined behaviour the way they are doing. There are millions of existing time-bombs in billions of lines of C/C++ code out there, any one of which could suddenly become a serious problem when a new version of a popular compiler starts taking advantage of the fact that it is relying on undefined behaviour and optimizes it out, breaking code that used to work (even though that code was "incorrect"). Compiler writers need to be shamed into not doing this, because it's a bad thing for everything and everyone except for their fucking benchmarks.

  173. all of them by Anonymous Coward · · Score: 0

    All modern optimizing compilers can delete code that invokes undefined behavior. All modern optimizing compilers assume integer overflows won't happen, allowing them to do things with loop induction vars that they just couldn't do if they had to accommodate the possibility of 2's complement overflow. Programmers need to learn to avoid undefined behavior, because the compiler will fuck you if you don't.

  174. you missed MSVC by Anonymous Coward · · Score: 0

    Microsoft's compiler does it too.

  175. Re:-Wall doesn't catch some aliasing issues by Spazmania · · Score: 1

    I read the article but I'm not following the author's point.

    int a = 0x12345678;
    short *b = (short *)&a;
    b[1] = 0;

    What's wrong with that? You take the location of a and assign it to a pointer to a short. b[0] now contains the first two bytes that comprise a and b[1] now contains the second two bytes that comprise a. Which index contains 0x1234 and which one contains 0x5678 (before b[1]=0 sets it to zero) depends on the endianness of your machine but that's beside the point. It's very clear what this statement should do: act on the bytes which comprise the 32 bit integer and do it 16 bits at a time.
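
    For what it's worth, the same byte-level effect can be had without the short* cast by going through unsigned char, which is permitted to alias any object. A minimal sketch, assuming 8-bit bytes; clear_second_half is a hypothetical name:

    ```c
    #include <stdint.h>

    /* Hypothetical helper: zero the third and fourth bytes of a 32-bit value,
       i.e. the bytes that b[1] would cover in the snippet above.
       Access through unsigned char * does not violate the aliasing rules. */
    uint32_t clear_second_half(uint32_t a) {
        unsigned char *bytes = (unsigned char *)&a;
        bytes[2] = 0;
        bytes[3] = 0;
        return a;
    }
    ```

    Called with 0x12345678 this returns 0x00005678 on a little-endian machine and 0x12340000 on a big-endian one, so the endianness dependence stays exactly where it was.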

    --
    Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
  176. Only if your compiler defines the behavior by tepples · · Score: 1

    NULL pointers are actually GOOD, because they will facilitate Fast Fail

    Only if your compiler's manual states that it extends the C language such that NULL pointers facilitate Fast Fail. The point of the article is that dereferencing NULL pointers in C is undefined behavior, and compilers aren't required to facilitate Fast Fail in cases of undefined behavior. The proper way to facilitate Fast Fail in C or C++ is to check for NULL pointers near the start of each function using a construction like assert.
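
    A minimal sketch of that construction; count_nonzero is a hypothetical function used only to show where the check goes:

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical example: count the non-zero entries of values[0..n-1].
       The assert fails fast, before any dereference, if values is NULL. */
    size_t count_nonzero(const int *values, size_t n) {
        assert(values != NULL);
        size_t count = 0;
        for (size_t i = 0; i < n; ++i) {
            if (values[i] != 0)
                ++count;
        }
        return count;
    }
    ```

    Note that assert is compiled out when NDEBUG is defined, so a release build that needs the same guarantee has to use an explicit if and an error path instead.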

  177. Re:TFA does a poor job of defining what's happenin by CODiNE · · Score: 1

    I suspect it's because of signed ints being the default. Many people simply think "float or int?" and aren't even thinking about the sign unless they specifically want to double their addressable space. Also the habit of using -1 as an error code prevents unsigned ints being passed back from functions.

    Unsigned ints are very deliberate so perhaps only used a fraction of the time they could be, maybe predominately in structs.

    That's a research paper just waiting to be written up. :-)

    --
    Cwm, fjord-bank glyphs vext quiz
  178. What would be a "better C" if . . . by Joey+Vegetables · · Score: 1

    What would be a safer low-level systems language than C? I'd love to see one, preferably, one with a LOT less undefined behavior, but still within 2-3x the performance of C, and with the ability to call C or C++ libraries when necessary. I'm not looking for a fully managed environment like Java or .NET, or a higher level language like Python or Ruby. Definitely *not* looking for C++ either . . . I know one can write safer code in it but one can also, quite by accident, write very unsafe code in it as well. Maybe something like D?

  179. Re:TFA does a poor job of defining what's happenin by david_thornley · · Score: 1

    The average of (UINT_MAX/2 + 1000) and (UINT_MAX/2 - 500) cannot be expressed in a standard int. If you're assigning the value to an unsigned int, you get UINT_MAX/2 + 250, which is reasonable. You can't, by definition, represent that number in an int. In that case, why are you complaining about a specific result, when all possible results are wrong?

    --
    "When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
  180. Re:TFA does a poor job of defining what's happenin by david_thornley · · Score: 1

    Wrong. Unsigned arithmetic in C is defined as modular. Signed arithmetic isn't, and overflows are undefined behavior.

    --
    "When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
  181. Re:TFA does a poor job of defining what's happenin by JesseMcDonald · · Score: 1

    Wrong. Only signed overflow and underflow are undefined. Unsigned arithmetic is, indeed, defined as modular. From ISO/IEC9899:TC3:

    A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.

    --
    "The state is that great fiction by which everyone tries to live at the expense of everyone else." - Bastiat
  182. Re:TFA does a poor job of defining what's happenin by tricorn · · Score: 1

    No, the response I got was that since the order of evaluation of function arguments is undefined, they can even be done in parallel. Each of the two expressions has sequence points within them, but the comma in the function call does not define a sequence point.

    It isn't about the order of f() and g() being evaluated, but the two arguments to x():

    x( (tmp = 1, f(tmp) + g(tmp) ), (tmp = 2, f(tmp) + g(tmp) ) );

    Now, the value of tmp after the call to x() is obviously undefined, but apparently even the two arguments to x() are undefined.

    Maybe the specific language specification has changed since then, I don't know, this was around 10-15 years ago, on an Alpha with DEC's ucode-based optimizing compiler.
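
    One way to sidestep the whole question is to give each argument its own temporary and compute it in a separate statement, since the end of each statement is a sequence point. A sketch, where f, g and x are stand-in definitions made up for illustration:

    ```c
    /* Stand-in functions; the real f, g and x are not shown in the thread. */
    static int f(int v) { return v * 2; }
    static int g(int v) { return v + 1; }
    static int x(int left, int right) { return left * 100 + right; }

    /* Each argument is fully evaluated in its own statement, so nothing
       depends on the order in which the call evaluates its arguments. */
    int call_x(void) {
        int left  = f(1) + g(1);
        int right = f(2) + g(2);
        return x(left, right);
    }
    ```

    With these stand-ins call_x() returns 407 regardless of compiler or optimization level.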

  183. Re:TFA does a poor job of defining what's happenin by hazah · · Score: 1
  184. Re:TFA does a poor job of defining what's happenin by russotto · · Score: 1

    The average of (UINT_MAX/2 + 1000) and (UINT_MAX/2 - 500) cannot be expressed in a standard int.

    It can, however, be expressed in an unsigned int. And you do NOT get UINT_MAX/2 + 250 if you naively average two variables with those values; you get 250.

    #include <stdio.h>
    #include <limits.h>

    unsigned int avg(unsigned int a, unsigned int b) {
        return (a + b) / 2u;
    }

    unsigned int real_avg(unsigned int a, unsigned int b) {
        asm("add %1,%0": "+r"(a) : "r"(b));
        asm("rcr $1, %0": "+r"(a));
        return a;
    }

    int main() {
        int a = UINT_MAX/2u + 1000;
        int b = UINT_MAX/2u - 500;
        int c = UINT_MAX/2u + 250;
        int d = avg(a, b);
        int e = real_avg(a, b);
        printf("a %u, b %u, expected_avg %u, avg %u, real_avg %u\n", a, b, c, d, e);
    }

    $ ./foo
    a 2147484647, b 2147483147, expected_avg 2147483897, avg 249, real_avg 2147483897

  185. Re:TFA does a poor job of defining what's happenin by mysidia · · Score: 1

    Here we have a struct 100 bytes in size. However, st.a is not a pointer.

    Where? Of course st.a is a pointer.... there are rules in C of pointer-array interchangeability.

    st.a has type uint8_t [100]. And sizeof both of those is the same.

    &st.a has type uint8_t (*)[100]

    Just try it.... http://pastebin.com/7an0MS9g

    .... Now what's really funny are two-dimensional arrays.

    Remember: when you are defining an array of two dimensions dynamically, there are two completely different approaches.

    1. List of pointers technique

    int **Array = malloc(sizeof(int*) * OuterMax);

    And 2. Built-in array type

    Both result in the a[i][j] notation.

    Both are structured completely differently.

    Hilarity ensues, when a programmer accidentally forgets which type of 2D array it is, and tries to resize (a[i]); or bcopy/memcpy on 'a' in an array created using the list-of-pointers technique
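
    A minimal sketch of the two layouts side by side, with made-up dimensions and hypothetical helper names. Both are indexed as a[i][j], but only the second is one contiguous block:

    ```c
    #include <stdlib.h>

    enum { ROWS = 3, COLS = 4 };          /* illustrative sizes */
    typedef int row_t[COLS];

    /* Technique 1: a list of pointers, each row a separate allocation. */
    int **make_jagged(void) {
        int **a = malloc(sizeof *a * ROWS);
        if (a == NULL)
            return NULL;
        for (int i = 0; i < ROWS; ++i) {
            a[i] = calloc(COLS, sizeof **a);
            if (a[i] == NULL) {           /* undo the rows allocated so far */
                while (i--)
                    free(a[i]);
                free(a);
                return NULL;
            }
        }
        return a;
    }

    void free_jagged(int **a) {
        if (a == NULL)
            return;
        for (int i = 0; i < ROWS; ++i)
            free(a[i]);
        free(a);
    }

    /* Technique 2: one block of ROWS * COLS ints viewed through the
       built-in array type; a single free() releases all of it. */
    row_t *make_contiguous(void) {
        return calloc(ROWS, sizeof(row_t));
    }
    ```

    A memcpy of ROWS * COLS * sizeof(int) bytes copies the whole matrix in the second case; in the first it would copy the row pointers and then run off the end of that small allocation.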

  186. Re:TFA does a poor job of defining what's happenin by EvanED · · Score: 1

    Okay, maybe it's too early in the morning but where exactly did this function cast an int* to a float*? Where's the "undefined behavior"?

    Part 1 of the series has more. Here's the result, and then I'll explain:

    "[The strict-aliasing rule] allows clang to optimize [zero_array] into "memset(P, 0, 40000)". This optimization also allows many loads to be hoisted out of loops, common subexpressions to be eliminated, etc. This class of undefined behavior can be disabled by passing the -fno-strict-aliasing flag, which disallows this analysis. When this flag is passed, Clang is required to compile this loop into 10000 4-byte stores (which is several times slower) because it has to assume that it is possible for any of the stores to change the value of P."

    Now, for the explanation (I don't think the LLVM blog explains well):

    That code, taken on its own, doesn't violate the strict-aliasing rules or have UB. The UB would arise (unrelated, so far, to zero_array) if you wrote something like

    P = (float*)&P;

    If you did that and then called zero_array, what would happen (practically speaking, when there is no optimization) is that on the first iteration of the loop the compiler would write 0.0f at the address of P[0] = *(P+0) = *((float*)&P) = P, thus changing the value of P itself. On the next loop iteration, P would have changed.

    The strict-aliasing rule allows the compiler to assume that P does not change between loop iterations, which allows it to generate better code.
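
    The same assumption can be spelled out by hand. In the sketch below (zero_array_local is a hypothetical variant, not from the blog post) the global is copied into a local whose address is never taken, so in a well-defined program no store through it can change it, with or without strict aliasing:

    ```c
    float *P;

    /* Hypothetical variant of zero_array: read the global pointer once.
       p is a local whose address never escapes, so the stores p[i] = 0.0f
       cannot legitimately modify p itself. */
    void zero_array_local(void) {
        float *p = P;
        for (int i = 0; i < 10000; ++i)
            p[i] = 0.0f;
    }
    ```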

    And anyway, how is casting int* to float* undefined behavior?

    The short answer is because "the standard says so".

    But here's a very realistic situation for which forcing semantics would be very detrimental. Assume you're on a platform with 32-bit ints, 64-bit doubles, and for which a 64-bit memory load must be aligned to 8 bytes. (This is a very realistic architecture.) Now suppose you do a somewhat different type-punning cast:

    void foo(double * pd) {
        printf("%f\n", *pd); // or whatever, I don't use printf much
    }
     
    void bar() {
        int x, y;
        foo((double*)&y);
    }

    What should this code do? If you run it on the architecture I said above with a naive compilation, it will probably bus error: probably x will be nicely-aligned but then y will probably be exactly not on an 8-byte boundary and when foo dereferences pd it will be a misaligned load.

    In the absence of the strict-aliasing rule -- if the load from address pd had to produce at least some value -- the compiler would have to assume that every memory access it could not establish safe could potentially be misaligned, and either insert code to catch the trap if possible or perform the appropriate correction.

    There are other ways in which the strict-aliasing rule makes sense (e.g. similar code but y is at the end of a page for which the next page isn't mapped), but that's probably the most convincing one I can come up with off the top of my head because most of the others would involve made up memory models and stuff that have probably never been built but are permitted by the standard anyway. :-)

  187. Re:TFA does a poor job of defining what's happenin by EvanED · · Score: 1

    Cool, thanks!

  188. Re:TFA does a poor job of defining what's happenin by EvanED · · Score: 1

    ...either insert code to catch the trap if possible or perform the appropriate correction

    Which, of course, in the latter case would be slllllooooow.

  189. Re:TFA does a poor job of defining what's happenin by EvanED · · Score: 1

    I've already written way too much for this /. story (but this is the kind of story that fits very snugly in my areas of interest and, to a lesser extent, expertise), but it just occurred to me: my example is much more convincing if you substitute char for int and "any_type_larger_than_1_byte_t" for double. :-)

    In other words, without interprocedural optimizations, a compiler for a platform where misaligned loads and stores causes a trap could never compile

    any_type_larger_than_1_byte_t deref(any_type_larger_than_1_byte_t * p) {
        return *p;
    }

    to a simple load instruction unless it was prepared to deal with the trap and restart.

  190. Re:These bugs exist even *without* signed integers by Animats · · Score: 1

    An interrupt is not a good way to handle integer overflow, especially since it is often the desired behaviour.

    Very seldom, if ever, is integer overflow desired behavior. Other than for computing simple checksums, there are very few use cases.

  191. Re:These bugs exist even *without* signed integers by AmiMoJo · · Score: 1

    Loop counters, RNGs, timers, doing maths on values with more bits than the architecture can handle at once. I can think of many situations where it is used.
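
    The timer case as a minimal sketch, assuming a free-running 32-bit tick counter; ticks_elapsed is a hypothetical name. It relies only on unsigned wraparound, which C defines:

    ```c
    #include <stdint.h>

    /* Hypothetical helper: ticks between two readings of a free-running
       32-bit counter. The result is reduced modulo 2^32, so it stays
       correct even if the counter wrapped once between the readings. */
    uint32_t ticks_elapsed(uint32_t start, uint32_t now) {
        return now - start;
    }
    ```

    For example, start = 0xFFFFFFF0 and now = 0x00000010 gives 0x20, the 32 ticks that actually passed.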

    --
    const int one = 65536; (Silvermoon, Texture.cs)
    SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
  192. Re:TFA does a poor job of defining what's happenin by AmiMoJo · · Score: 1

    What I mean by "not a pointer" is that it does not contain an address. If you printf it you get the first value of the array, not a pointer to the array. You have to use an ampersand to get the address of the first item of the array.

    Neither arrays nor pointers work that way. There are good reasons why, it's just confusing and inconsistent.

    --
    const int one = 65536; (Silvermoon, Texture.cs)
    SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
  193. Re:TFA does a poor job of defining what's happenin by david_thornley · · Score: 1

    In that program, you're sticking lots of values where they just don't fit. The variables a and c are initialized with values that don't fit in an int, and avg will return a value that doesn't fit in an int. You are also printing out ints with a %u specifier, which doesn't match. Given that there's no actual requirement for a C implementation to use two's complement, the result of printing out an int with %u is not well defined (I don't know which category of "not well defined" it falls in, offhand).

    Since the variables you use cannot hold the values you use, any value is incorrect, and you're asking that it be incorrect in a way you like. If you use variables of a type that can hold the values (unsigned int, possibly long or long long), I think you'll find that everything looks fine.

    --
    "When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
  194. Re:TFA does a poor job of defining what's happenin by jgrahn · · Score: 1

    OK, that explains why I've been getting away with assuming they wrap since the Clinton administration. I don't know if anybody ever explained it to me in C terms.

    It's *your* responsibility as a C programmer to find out what the rules of the game are; you should accept that responsibility. However ...

    I always assumed that behavior was baked in at the CPU level, and just percolated up to C. I never felt inclined to do any "bit twiddling" with int or even fixed-width signed integers because on an intuitive level it "felt wrong".

    ... that's exactly the correct way of thinking (informally) about it. There are so many different representations of signed integers, but only one popular one of unsigned integers.

  195. Re:TFA does a poor job of defining what's happenin by russotto · · Score: 1

    You're correct that I messed up some types. However, making everything unsigned produces exactly the same results.

    #include <stdio.h>
    #include <limits.h>
     
    unsigned int avg(unsigned int a, unsigned int b) {
        return (a + b) / 2u;
    }
     
    unsigned int real_avg(unsigned int a, unsigned int b) {
        asm("add %1,%0": "+r"(a) : "r"(b));
        asm("rcr $1, %0": "+r"(a));
        return a;
    }
     
    int main() {
        unsigned int a = UINT_MAX/2u + 1000u;
        unsigned int b = UINT_MAX/2u - 500u;
        unsigned int c = UINT_MAX/2u + 250u;
        unsigned int d = avg(a, b);
        unsigned int e = real_avg(a, b);
        printf("a %u, b %u, expected_avg %u, avg %u, real_avg %u\n", a, b, c, d, e);
    }

    $ ./foo
    a 2147484647, b 2147483147, expected_avg 2147483897, avg 249, real_avg 2147483897

    This is a well-defined program, but avg() returns an incorrect value. The issue is the intermediate value (a + b), which is well-defined, but is 498 instead of UINT_MAX + 499.
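
    A portable sketch that avoids the wrapped intermediate without inline assembly: halve each operand first, then add back the 1 that is lost only when both are odd. avg_no_overflow is a hypothetical name:

    ```c
    #include <limits.h>   /* UINT_MAX, for callers testing the edge cases */

    /* Average of two unsigned ints, rounded down, without forming a + b.
       a/2 + b/2 is at most UINT_MAX - 1, and the correction term is 0 or 1,
       so the sum never wraps. */
    unsigned int avg_no_overflow(unsigned int a, unsigned int b) {
        return a / 2u + b / 2u + (a & b & 1u);
    }
    ```

    For the values used above, UINT_MAX/2u + 1000u and UINT_MAX/2u - 500u, this returns UINT_MAX/2u + 250u.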

  196. Bug Farm by cwsumner · · Score: 1

    " rather than have the compiler simply leave it out. "

    I always knew that C was a "Bug Farm", but I never knew that some C compilers intentionally insert bugs !!