Overeager Compilers Can Open Security Holes In Your Code
jfruh writes: "Creators of compilers are in an arms race to improve performance. But according to a presentation at this week's annual USENIX conference, those performance boosts can undermine your code's security. For instance, a compiler might find a subroutine that checks a huge bound of memory beyond what's allocated to the program, decide it's an error, and eliminate it from the compiled machine code — even though it's a necessary defense against buffer overflow attacks."
It has been well known for decades that optimizing compilers can produce bugs, security holes, code that doesn't work at all, etc.
This is just as poorly written up as last time. These are truly bugs in the programs using undefined parts of the language. It's silly to blame the compiler.
But according to a presentation at this week's annual UNISEX conference
Any code removal by the compiler can be prevented by correctly coding the code with volatile (in C) or its equivalent.
This is not really about the existence of bad compiler optimization. It is about a tool called Stack that detects code the optimizer may silently discard, known as "unstable" code, and that has already been used to find lots of vulnerabilities.
I know that at least GCC will get rid of overflow checks if they rely on checking the value after overflow (without any warning), because C defines that overflow on signed integers is undefined. This is even documented. If anything is declared by the language specification as being undefined, expect trouble.
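To make the pattern concrete, here is a minimal sketch. The helper name `add_would_overflow` is made up for illustration. The commented-out test is the kind that can legally be deleted, because it only detects the overflow after the undefined addition has already happened. The replacement does the range check first, using only in-range arithmetic.

```c
#include <limits.h>
#include <stdbool.h>

/* Unstable version (do not use): the addition itself is undefined on
 * overflow, so an optimizer may assume it never happens and delete the test.
 *
 *     if (a + b < a) { handle overflow }    // b assumed positive
 */

/* Hypothetical helper: decides BEFORE adding whether a + b fits in an int.
 * Every expression here stays within range, so nothing is undefined. */
bool add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;   /* a + b would exceed INT_MAX */
    return a < INT_MIN - b;       /* a + b would drop below INT_MIN */
}
```

A caller would test `add_would_overflow(a, b)` and only compute `a + b` when it returns false.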
The kinds of checks that compilers eliminate are ones which are incorrectly implemented (depend on undefined behavior) or happen too late (after the undefined behavior already was triggered). The actual article is reasonable: it's about a tool to help detect errors in programs that suffer here. The compilers are not problematic.
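A rough illustration of the "too late" case, with a made-up `struct packet` and function name. In the commented-out version the pointer is dereferenced before it is tested, so the compiler may reason that it cannot be null and drop the test. Moving the check ahead of the first access removes the problem.

```c
#include <stddef.h>

struct packet {          /* hypothetical type, for illustration only */
    int length;
};

/* Too late (do not use): p is dereferenced first, so the compiler may
 * conclude p cannot be null and drop the test that follows.
 *
 *     int len = p->length;
 *     if (p == NULL) return -1;
 */

/* Check first, then dereference: the test guards every access. */
int packet_length(const struct packet *p)
{
    if (p == NULL)
        return -1;
    return p->length;
}
```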
Compilers can also "optimize" away the Kahan summation algorithm. See page 6 of "How Futile are Mindless Assessments of Roundoff in Floating-Point Computation".
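For anyone who hasn't seen it, this is roughly what Kahan summation looks like (a sketch, not taken from the linked paper; the function name is my own). The compensation term is algebraically zero, which is exactly why a compiler allowed to reassociate floating-point math (for example under GCC's -ffast-math) may simplify it away and leave a plain, less accurate sum.

```c
#include <stddef.h>

/* Kahan (compensated) summation: c carries the low-order bits that are
 * lost each time a small term is added to a large running sum. */
double kahan_sum(const double *x, size_t n)
{
    double sum = 0.0;
    double c = 0.0;               /* running compensation */
    for (size_t i = 0; i < n; i++) {
        double y = x[i] - c;      /* apply the correction from the last step */
        double t = sum + y;       /* low bits of y may be lost here */
        c = (t - sum) - y;        /* algebraically zero; numerically, the rounding error just made */
        sum = t;
    }
    return sum;
}
```

Compiled with strict IEEE semantics, adding ten values of 1e-16 to 1.0 this way gives a result above 1.0, whereas a naive loop stays stuck at 1.0.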
Short of bugs in the compiler's optimizer — and we all know there have been many — the idea that "if the entire code absolutely must stay fully intact, it shouldn't be optimized" is already dangerous.
A compiler conforming to its documentation or standard isn't going to change semantics that have been guaranteed by that document. Those guarantees though are all you have: even without explicit optimization options, a compiler has a lot of freedom in how it implements those semantics. Relying on a naïve translation from a line of code to a particular, non-guaranteed assembly representation is a very brittle practice.
The classic example of a compiler interfering with intention, opening security holes, is failure to wipe memory.
On a typical embedded system - if there is such a thing (no virtual memory, no paging, no L3 cache, no "secure memory" or vault or whatnot) - you might declare some local (stack-based) storage for plaintext, keys, etc. Then you do your business in the routine, and you return.
The problem is that even though the stack frame has been "destroyed" upon return, the contents of the stack frame are still in memory, they're just not easily accessible. But any college freshman studying computer architecture knows how to get to this memory.
So the routine is modified to wipe the local variables (e.g. array of uint8_t holding a key or whatever...) The problem is that the compiler is smart, and sees that no one reads back from the array after the wiping, so it decides that the observable behavior won't be affected if the wiping operation is elided.
By making these local variables volatile, the compiler will not optimize away the wiping operations.
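A minimal sketch of the wiping idea, with hypothetical names `wipe` and `use_key`. Instead of declaring the array itself volatile, it writes the zeros through a volatile-qualified pointer, which compilers in practice treat as stores they must perform. Where available, C11 Annex K `memset_s` or a platform function such as `explicit_bzero` is meant for the same job.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: zero a buffer through a volatile-qualified pointer
 * so that each store counts as an access the compiler must perform. */
void wipe(void *buf, size_t len)
{
    volatile uint8_t *p = (volatile uint8_t *)buf;
    while (len--)
        *p++ = 0;
}

/* Example use: a routine that handles a key on the stack. */
int use_key(void)
{
    uint8_t key[16] = { 0xAA, 0xBB, 0xCC };
    int result = key[0] ^ key[1];   /* stand-in for the real work */
    wipe(key, sizeof key);          /* intended to survive optimization, unlike a plain memset */
    return result;
}
```

This only clears the array itself. Copies the compiler may have left in registers or spill slots are outside the program's control.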
The point is simply that there are plenty of ways code can be completely "correct" from a functional perspective, but nonetheless terribly insecure. And often the same source code, compiled with different optimization options, has different vulnerabilities.
Unfortunately for those languages, the entire world does not run x86 or other workstation-class or better CPUs. Which one of those will run on, for example, the hundreds of millions of 16-bit microcontrollers in wide use? Or MIPS chips in memory-constrained devices like consumer routers? For those requirements, the only usable portable language is C.
We're so impressed with your insights that we want to hire you to get a managed language like Java or .NET to run on our CPU (4KB flash and 2KB RAM). It will be so nice to not worry about 1970s problems any more.
C became popular because it was vastly more portable and performant than its predecessors. It still is today. None of those "better" languages that came before it or after it can beat that. And yes, extreme portability does matter when you have 100s of millions if not billions of devices that can't run anything but assembly or C. It's why the people saying that OpenSSL should be written in Java or C# are morons. Care to tell me how that's going to run on a, for example, Linksys WRT54G with only 8 or 16 MB of RAM, 2 to 4 MB of Flash storage and a 125 to 240 MHz MIPS CPU? Yeah, it's not.
Yes! Failure to wipe memory is like failure to wipe asshole. You save some time by not doing it but you smell like shit and people know where you have been.
But the entire world runs x86 with gigs of RAM and terabytes of storage!! How dare you bring reality into this!
The problem is that most programmers have never had to get their hands dirty doing embedded work. They live in a bubble that ignores all the memory/storage/processing-power constrained devices all around them. OpenSSL, for example, as used in something like DD-WRT would be unusable if it was written in anything but C or possibly C++.
vaxocentrism.
General Relativity: Space-time tells matter where to go; Matter tells space-time what shape to be.
Well I'd be pretty pissed as well if my pet language was relegated to the graveyard of obscurity by a language that was usable for real work. Dennis Ritchie was a pragmatist who got shit done not some guy wanking over the greatness and purity of the language he created. People to this day are still jealous of that.
Indeed. But that's a different class of problem. Or are compilers optimising constant time comparison routines to not run in constant time these days?
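For reference, a constant-time comparison usually looks something like the sketch below (the name `ct_equal` is made up). Note the C standard says nothing about timing, so nothing stops an optimizer from transforming the loop. Whether a given compiler actually does so has to be checked in the generated code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the two n-byte buffers are equal,
 * 0 otherwise. The loop never exits early, so the source-level running
 * time does not depend on where the first mismatch is. */
int ct_equal(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint8_t diff = 0;
    for (size_t i = 0; i < n; i++)
        diff |= (uint8_t)(a[i] ^ b[i]);   /* accumulate any differing bits */
    return diff == 0;
}
```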
-- MartinG To mail me: echo kewyjlcxyzvjfxbqwh | tr bcefhjklqvwxyz
C became popular because it was vastly more portable and performant than its predecessors.
With the exception of Forth. :-)
And yes, extreme portability does matter when you have 100s of millions if not billions of devices that can't run anything but assembly or C.
Or Oberon. Oops, there. I said it.
Care to tell me how that's going to run on a, for example, Linksys WRT54G with only 8 or 16 MB of RAM, 2 to 4 MB of Flash storage and a 125 to 240 MHz MIPS CPU? Yeah, it's not.
Correct me if I'm wrong, but aren't Tektronix oscilloscopes still running embedded Smalltalk these days?
Ezekiel 23:20
It ran on the Alto. But you really should be using Forth on that CPU, though. It was born for that. (Unless it's one of the dreaded 8051s, of course.)
Suppose your program accesses millions of array elements in performance sensitive areas. Bounds checking would slow down your code by a factor of 2 or more and therefore should be optional (but not non-existent, like in C).
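One way to get "optional but not non-existent" in plain C is a checked accessor whose test can be compiled out. This is only a sketch: `get_checked` and the `NO_BOUNDS_CHECK` macro are names invented here, not part of any standard or library.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical accessor: stores a[i] in *out and returns true.
 * Unless NO_BOUNDS_CHECK is defined at build time, an out-of-range
 * index is rejected and false is returned instead. */
static inline bool get_checked(const int *a, size_t len, size_t i, int *out)
{
#ifndef NO_BOUNDS_CHECK
    if (i >= len)
        return false;
#else
    (void)len;                    /* unused when checks are compiled out */
#endif
    *out = a[i];
    return true;
}
```

Checked builds are the default. A performance-sensitive build would define NO_BOUNDS_CHECK on the command line only for code that has been shown to need it.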
The Alto had at minimum 128 KB so it's not even remotely analogous. Even the most constrained Java ME profile requires 8KB just for itself.
I was speaking of RAM of course.
And regarding your CPU, I was speaking of Forth, of course.
So what's the standard-conforming way to determine whether a particular integer operation will not overflow? And are compilers smart enough to optimize the standard-conforming way into something that uses the hardware's built-in overflow detection, such as carry flags?
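A partial answer, as a sketch. For signed types the conforming approach is to compare the operands against INT_MAX and INT_MIN before doing the operation. For unsigned types, wraparound is defined, so checking after the fact is fine, as below (the function name is made up). Compilers can often turn this idiom into an add followed by a carry-flag test, but that is not guaranteed. GCC and Clang also provide `__builtin_add_overflow` as a non-standard extension.

```c
#include <stdbool.h>

/* Unsigned arithmetic wraps modulo 2^N by definition, so checking AFTER
 * the addition is well defined here (unlike the signed case). */
bool uadd_overflows(unsigned a, unsigned b, unsigned *sum)
{
    *sum = a + b;          /* wraps on overflow, never undefined */
    return *sum < a;       /* true exactly when a carry occurred */
}
```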
With the exception of Forth. :-)
Not a systems language so far less usable than C.
Or Oberon. Oops, there. I said it.
Same problem as above.
Correct me if I'm wrong, but aren't Tektronix oscilloscopes still running embedded Smalltalk these days?
They might, but it still has the same limitations as the above and is even more niche.
I should amend my previous statement to say they do not have the same portability and capabilities as C. I would dispute that they're as portable as C (I would love to be proven wrong on this), but even if I did accept that, they are far less capable than C.
Sure. Use C++, use std::string and std::vector instead of C-type strings or arrays, and wherever you have brackets you substitute ".at()". Bounds checking guaranteed, and you can globally search for "[" to see if anybody's violating the rules.
"When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
Even if you don't like Forth (which is arguably vastly superior in the tiniest applications), why should Oberon be "far less useable" than C? A technical argument, please.
Even if you don't like Forth (which is arguably vastly superior in the tiniest applications)
I don't dislike it. It's still less portable and powerful.
why should Oberon be "far less useable" than C? A technical argument, please.
That was my bad. I confused the language. Its usability would be limited by its platform support, which is smaller than C's.
There are many fallacies your post builds on, all stemming from the original premise that UNIX was built using C, which already laid down the groundwork for its popularization, leading to C++.
You ascribe something to me that I never stated. Of course UNIX was not built using C. C was created in order to make Unix portable. The only thing fallacious is your strawman.
Java and C# are in the same venue as C and C++.
+5 funny. If they are in the same venue please show me you running Java or C# on an Atmel ATtiny. I won't hold my breath.
Obviously you are not going to invest time in researching better ways when you have a hammer and some nails to do it right away. Humans do with what gets the job done there and then, and the more who use the same tools, the more you can copy and learn from others, even if it's not the optimal way.
+5 funny. Half my job is programming in C# so you would be wrong again.
C and C++ are still very close to how assembly language is translated to machine code. It's 99% a 1:1 relationship in how the code is organized in source to how it is organized in code.
LOL. That hasn't been true for decades. C and C++ translate horribly to modern vector assembly language instructions. Even the best of vectorizing compilers are laughably bad. If what you said was true Intel and others wouldn't be constantly reinventing extensions to C to allow better vectorizing of the code.
C could have been far better at what it does, if it had acknowledged it was just another form of assembly language. As for C++, you have to become a compiler to fully understand the language, or risk writing code you can't predict the behavior of.
C would be far better if lots of things were changed about it. C is a very flawed language, but it's still the best portable language around.
Hubris is always funny. These are the same people who will write Javascript code that has XSS flaws or will write database interfacing code that is subject to SQL injection attacks while at the same time talking about how secure, memory-safe, etc. the language they use is.
Yes but knowing about that would require the GP and his ilk to get better talking points. Most of them have never used C or C++ and are merely parroting random crap they hear from other people who have also likely never used them either. C and C++ are anything but perfect, but for a number of domains/platforms they are basically the best you're going to get unless you want to dive into a usually shitty, proprietary vendor language or assembly.
Now that parallel computing has started to take center stage, you're forced to deal with the abstract modeling problem.
Nah, already been solved by things like OpenMP. It's cross-platform, cross-vendor, etc.
"...decide it's an error.."
No, it is an "optimizing" compiler not a "correcting" compiler. The optimizer can detect that no language defined semantic will be changed by removing the code, so it does. As others have noted, "volatile" is the fix for this particular coding / compiler blunder. However ill-defined, it is *not an error*.
As for the folks commenting that only C can run in small embedded processors, that's hogwash. Huge mainframes of the early ages had smaller memory sizes and ran FORTRAN (now Fortran, but then it was all caps), COBOL, PL/I (and .8 for IBM internals), Algol and other languages. Most made entire classes of C blunders impossible, and there is no fundamental reason why we couldn't go back to safer languages for embedded programming (and good reasons why we ought to; not that I expect we shall).
But can you write for that device in a language that has proper bounds checking built in from the get go?
More than that, string and vector are just as fast as the vulnerable C alternatives as long as you pre-allocate the buffers to the expected size. It's not rocket science, but the number of times I've heard "but C++ is too slow, we have to use C" makes me cry. With bounds checking in both (hand-written in C, of course), the speed is the same. Without buffer checking in either, the speed is the same.
Socialism: a lie told by totalitarians and believed by fools.
Your compiler constantly removes tons of unneeded code under optimization. It can't possibly be a warning to do this, or you'd drown in them.
If there is external evidence that the compiler needs in order to compile/optimize correctly then it needs to be given to the compiler. Thus the addition of "volatile". Other compilers have a way to add options on the command line, such as indicating that the machine uses strict alignment.
The problem here, however, is not that there is some strange stuff happening with the program, and that code is being optimized away because the compiler can prove that it serves no purpose given the lack of additional information. The problem is that the compiler sees code performing an operation that has undefined behavior and then "deduces" that it can perform additional optimization based on that (the article did not talk about security holes; this seems to be added in this summary only). However, I think such compilers should instead issue a warning about the undefined behavior, or at the very least not try to exploit such behavior and remove the code altogether. It's extremely pedantic to say that because the code has undefined behavior, generating code that does nothing is a valid option, especially without warning the user that you're doing this.
The examples given in the article did not appear to be fully functionally correct; all of them would have caused a static code analysis tool to complain.
Unless all the code running on the machine is absolutely type-safe and only allows "safe" reflection then trying to hide sensitive data from other bits of code in your address space is a lost cause. Code modification, emulation, tracing, breakpoint instructions, hardware debugger support, etc. are all viable ways for untrusted code with access to your address space to steal your data.
Wiping memory is only effective for avoiding hot or cold boot attacks against RAM, despite its frequent use for hacking terrible operating systems to hope/pretend that userspace software isn't leaking data into other processes either directly via attacks or accidentally through kernel mishandling of memory.
I always insist on a clean compile with the warning level turned up as high as it will go. If the compiler is cool with my code, I have a better chance it will do the right thing with it.
Once I have an application that works I see if it meets performance goals (if any). If it does, I'm done. If it doesn't, profile, find the hot spots, optimize as needed. Compiling an entire application with -O3 is idiotic, and misses the point.
...laura
Volatile tells the compiler it may not eliminate reads and writes to a variable, or introduce reads and writes that did not exist in the source code. Those accesses also cannot be reordered with respect to accesses to other volatile variables.
No, I think C is a great language. It has its flaws but no language is perfect and the ones that claim to be are extremely niche or long since dead.
Flatly false.
Good code has plenty of null-checking, for example, most of which a clever compiler can prove is unreachable. It's still completely good code.
Compilers remove code that is provably unreachable on this build. Good defensive coding includes protection against future code changes.
Ever heard of Ada and Fortran?
I know tobacco is bad for you, so I smoke weed with crack.