Overeager Compilers Can Open Security Holes In Your Code
jfruh writes: "Creators of compilers are in an arms race to improve performance. But according to a presentation at this week's annual USENIX conference, those performance boosts can undermine your code's security. For instance, a compiler might find a subroutine that checks a huge bound of memory beyond what's allocated to the program, decide it's an error, and eliminate it from the compiled machine code — even though it's a necessary defense against buffer overflow attacks."
It has been well known for decades that optimizing compilers can produce bugs, security holes, code that doesn't work at all, etc.
This is just as poorly written up as last time. These are truly bugs in the programs using undefined parts of the language. It's silly to blame the compiler.
But according to a presentation at this week's annual UNISEX conference
Any code removal by the compiler can be prevented by correctly declaring the relevant data volatile (in C) or its equivalent.
This is not really about the existence of bad compiler optimizations. It is about a tool called Stack that can detect code the optimizer may silently discard (known as "unstable" code), and which has already been used to find lots of vulnerabilities.
I know that at least GCC will get rid of overflow checks if they rely on checking the value after overflow (without any warning), because C defines that overflow on signed integers is undefined. This is even documented. If anything is declared by the language specification as being undefined, expect trouble.
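For anyone who hasn't seen the pattern, here is a minimal sketch of the kind of check being described. The function names are made up for illustration. The first version tests for overflow after the addition has already happened, which is exactly the sort of thing an optimizer is allowed to drop; the second compares against the limit first, so no undefined operation is ever evaluated.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical example of the fragile pattern: if x + 100 overflows, the
 * behavior is already undefined, so the compiler may assume the condition
 * is always false and remove the branch that depends on it. */
bool will_wrap_broken(int x)
{
    return x + 100 < x;   /* relies on undefined behavior */
}

/* Hypothetical safer form: compare against the limit before doing the
 * arithmetic, so the addition is never performed on an unsafe value. */
bool will_wrap_checked(int x)
{
    return x > INT_MAX - 100;
}
```

Whether a given compiler actually removes the first check depends on the version and optimization level, so treat this as an illustration rather than a guaranteed reproduction.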
The kinds of checks that compilers eliminate are ones which are incorrectly implemented (depend on undefined behavior) or happen too late (after the undefined behavior already was triggered). The actual article is reasonable— it's about a tool to help detect errors in programs that suffer here. The compilers are not problematic.
Use a language that does bounds checking automatically. It's not the 1970s any more.
-- MartinG To mail me: echo kewyjlcxyzvjfxbqwh | tr bcefhjklqvwxyz
Apparently, no one told USENIX about this 22 year old revelation.
(also, I was compelled to fix the typo in my Re: of your title, unless you really meant "Code unrelated to sables", which I assume would cover most code)
Compilers can also "optimize" away the Kahan summation algorithm. See page 6 of "How Futile are Mindless Assessments of Roundoff in Floating-Point Computation".
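For reference, a rough sketch of what Kahan summation looks like in C (function names are mine, not from the paper). The compensation term is algebraically zero, so a compiler that is allowed to treat floating-point math as real-number algebra (for example under fast-math style options) can legally simplify it away and leave you with the naive loop.

```c
#include <stddef.h>

/* Plain left-to-right summation: low-order bits of small terms are lost. */
double naive_sum(const double *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum;
}

/* Kahan (compensated) summation: c carries the rounding error of each
 * addition forward into the next one. */
double kahan_sum(const double *v, size_t n)
{
    double sum = 0.0;
    double c = 0.0;
    for (size_t i = 0; i < n; i++) {
        double y = v[i] - c;     /* apply the correction from last round */
        double t = sum + y;      /* low-order bits of y may be lost here */
        c = (t - sum) - y;       /* algebraically zero; numerically the error */
        sum = t;
    }
    return sum;
}
```

Summing 1.0 followed by a thousand values of 1e-16 is a quick way to see the difference, assuming strict IEEE double arithmetic: each tiny term is below half an ulp of 1.0, so the naive loop never moves, while the compensated loop ends up close to 1.0 + 1e-13.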
#fail
It's not just bounds checking -- if you want to securely compare 100 bytes, you need to compare all 100 bytes. An optimized algorithm (intentionally or via compiler) that exits after the first mismatch leaks information. This isn't some pie in the sky hypothetical attack, it's been used to break passwords since the 70s (if not earlier).
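A minimal sketch of the usual "touch every byte" comparison, with a made-up function name. It accumulates the differences instead of returning at the first mismatch. The volatile qualifiers are a common attempt to discourage the compiler from reintroducing an early exit; as far as I know nothing in the C standard promises constant timing, so check the generated code on your own toolchain.

```c
#include <stddef.h>

/* Hypothetical helper: returns 0 if the buffers match, nonzero otherwise.
 * Every byte is examined no matter where the first mismatch occurs. */
int ct_compare(const void *a, const void *b, size_t len)
{
    const volatile unsigned char *pa = a;
    const volatile unsigned char *pb = b;
    unsigned char diff = 0;

    for (size_t i = 0; i < len; i++)
        diff |= (unsigned char)(pa[i] ^ pb[i]);   /* no data-dependent branch */

    return diff;
}
```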
Short of bugs in the compiler's optimizer — and we all know there have been many — the idea that "if the entire code absolutely must stay fully intact, it shouldn't be optimized" is already dangerous.
A compiler conforming to its documentation or standard isn't going to change semantics that have been guaranteed by that document. Those guarantees though are all you have: even without explicit optimization options, a compiler has a lot of freedom in how it implements those semantics. Relying on a naïve translation from a line of code to a particular, non-guaranteed assembly representation is a very brittle practice.
The classic example of a compiler interfering with intention, opening security holes, is failure to wipe memory.
On a typical embedded system - if there is such a thing (no virtual memory, no paging, no L3 cache, no "secure memory" or vault or whatnot) - you might declare some local (stack-based) storage for plaintext, keys, etc. Then you do your business in the routine, and you return.
The problem is that even though the stack frame has been "destroyed" upon return, the contents of the stack frame are still in memory, they're just not easily accessible. But any college freshman studying computer architecture knows how to get to this memory.
So the routine is modified to wipe the local variables (e.g. array of uint8_t holding a key or whatever...) The problem is that the compiler is smart, and sees that no one reads back from the array after the wiping, so it decides that the observable behavior won't be affected if the wiping operation is elided.
By making these local variables volatile, the compiler will not optimize away the wiping operations.
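A variation on the same idea, as a sketch only: rather than declaring the key buffer itself volatile, do the wipe through a volatile-qualified pointer. The function name is hypothetical. Mainstream compilers are generally reported to keep these stores, but how much the standard actually guarantees here is debated, so verify the output if it matters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: zero len bytes at buf. Each store goes through a
 * volatile lvalue, which is meant to stop the compiler from treating the
 * writes as dead even when nothing reads the buffer afterwards. */
void wipe(void *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    volatile unsigned char *p = buf;
    while (len--)
        *p++ = 0;
}
```

Some platforms ship a purpose-built routine for this (memset_s from the optional C11 Annex K, or explicit_bzero on some BSD and glibc systems); where one is available it is probably the better choice.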
The point is simply that there are plenty of ways code can be completely "correct" from a functional perspective, but nonetheless terribly insecure. And often the same source code, compiled with different optimization options, has different vulnerabilities.
lol, why not just use haskell or lisp or some other weenie language where you can mathematically prove it correct? Oh, because sitting around pulling your pud doesn't get the job done.
Gentoo funroll-loops dweebs.
Yes! Failure to wipe memory is like failure to wipe asshole. You save some time by not doing it but you have smell like shit and people know where you have been.
vaxocentrism.
General Relativity: Space-time tells matter where to go; Matter tells space-time what shape to be.
Undefined behaviors don't behave in a defined manner.
These moronic programs are scanning off the end of allocated memory to do what? Execute that memory if it contains random data, but not if it is detected as malicious code? Please. If the compiler is compliant to the language standard and it opens up a security hole in your crappy code when optimizing then that is your bug.
So what's the standard-conforming way to determine whether a particular integer operation will not overflow? And are compilers smart enough to optimize the standard-conforming way into something that uses the hardware's built-in overflow detection, such as carry flags?
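One commonly cited portable answer for addition, as a sketch (the function name is mine): test against INT_MAX and INT_MIN before adding, so the overflowing operation is never evaluated.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *out and returns true, or returns
 * false (leaving *out untouched) if the sum would not fit in an int. */
bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) ||   /* would exceed INT_MAX */
        (b < 0 && a < INT_MIN - b))     /* would go below INT_MIN */
        return false;

    *out = a + b;                       /* safe: range verified above */
    return true;
}
```

As for the second question, I wouldn't count on the portable form being turned into a flag check. GCC and Clang do offer __builtin_add_overflow as a non-standard extension, which is the usual route when you want the hardware's overflow detection.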
"...decide it's an error.."
No, it is an "optimizing" compiler not a "correcting" compiler. The optimizer can detect that no language defined semantic will be changed by removing the code, so it does. As others have noted, "volatile" is the fix for this particular coding / compiler blunder. However ill-defined, it is *not an error*.
As for the folks commenting that only C can run in small embedded processors, that's hogwash. Huge mainframes of the early ages had smaller memory sizes and ran FORTRAN (now Fortran, but then it was all caps), COBOL, PL/I (and .8 for IBM internals), Algol and other languages. Most made entire classes of C blunders impossible, and there is no fundamental reason why we couldn't go back to safer languages for embedded programming (and good reasons why we ought to; not that I expect we shall).
Which is why good IDEs and error messages are so important. When integrated with the compiler, the IDE can highlight the code, log a warning, and say "hey, this doesn't seem to affect anything and will be removed" so the programmer will know there's a potential bug.
Many IDEs have their own checks to do a little of this, but there is lots of room for improvement.
The point is simply that there are plenty of ways code can be completely "correct" from a functional perspective, but nonetheless terribly insecure. And often the same source code, compiled with different optimization options, has different vulnerabilities.
And this is exactly the reason the BSD and Apple folks developed the Clang compiler.
When I started using Gentoo in 2003, I began reading up on optimizations and what could happen, only to discover that the compiler could introduce unreproducible errors in the software, along with changing the md5 hash each and every time it compiled things. This didn't make me happy then and it hasn't gotten any better, other than with the Clang compiler, which Gentoo now offers as an option for building instead of the GCC toolchain.
Your compiler constantly removes tons of unneeded code under optimization. It can't possibly be a warning to do this, or you'd drown in them.
Socialism: a lie told by totalitarians and believed by fools.
By making these local variables volatile, the compiler will not optimize away the wiping operations.
Not necessarily. Once those variables go out of scope, whether they are volatile or not is irrelevant. And while they are in scope, does "volatile" mean "this variable can change while I'm using it" or "some unknown code reads this variable when I change it"? That's not clear, and if it's the first then the zeroing can still be optimized out.
If there is external evidence that the compiler needs in order to compile/optimize correctly then it needs to be given to the compiler. Thus the addition of "volatile". Other compilers have a way to add options on the command line, such as indicating that the machine uses strict alignment.
The problem here, however, is not that code is being optimized away because the compiler can prove it serves no purpose given the lack of additional information. The problem is that the compiler sees code performing an operation that has undefined behavior and then "deduces" that it can perform additional optimization based on that (the article did not talk about security holes, this seems to be added in this summary only). However, I think such compilers should instead issue a warning about the undefined behavior, or at the very least not try to exploit such behavior and remove the code altogether. It's extremely pedantic to say that because the code has undefined behavior, generating code that does nothing is a valid option, especially without warning the user that you're doing this.
The examples given in the article did not appear to be fully functionally correct, all of them would have caused a static code analysis tool to complain.
If you have tons of unneeded code, you've probably done something very, very wrong. All dead code is begging for a security hole.
Unless all the code running on the machine is absolutely type-safe and only allows "safe" reflection then trying to hide sensitive data from other bits of code in your address space is a lost cause. Code modification, emulation, tracing, breakpoint instructions, hardware debugger support, etc. are all viable ways for untrusted code with access to your address space to steal your data.
Wiping memory is only effective for avoiding hot or cold boot attacks against RAM, despite its frequent use for hacking terrible operating systems to hope/pretend that userspace software isn't leaking data into other processes either directly via attacks or accidentally through kernel mishandling of memory.
I always insist on a clean compile with the warning level turned up as high as it will go. If the compiler is cool with my code, I have a better chance it will do the right thing with it.
Once I have an application that works I see if it meets performance goals (if any). If it does, I'm done. If it doesn't, profile, find the hot spots, optimize as needed. Compiling an entire application with -O3 is idiotic, and misses the point.
...laura
Most of these problems originate from the programmer relying on undefined behavior. If the language definition doesn't state the behavior explicitly (or if it says the behavior is undefined, or implementation specific) - don't use that "feature".
Excellent writeup by Valve's Bruce Dawson here.
If there is external evidence that the compiler needs in order to compile/optimize correctly then it needs to be given to the compiler. Thus the addition of "volatile".
Not quite true. According to the standard, the compiler is not allowed to optimize sections touching volatile memory *at all*.
Volatile tells the compiler it may not eliminate reads and writes to a variable, nor introduce ones that did not exist in the source code. Volatile accesses also cannot be reordered with respect to other volatile accesses.
After RFA and RFP, in some cases this does look like the compiler should give better support.
But a lot of these would be avoided if programmers expressed exactly what they want instead of trying to be "cute" and pre-optimizing their code to be "short". It would also really improve readability.
For God's sake, the /. comment system sucks; writing code in it winds up cutting out large parts of my comment. So I'll just have to manage without the actual code samples from the paper. Instead of using buf + len < buf to check for wrap-around for huge len, actually compare len to buf_len and be done with it (this is clearer to any idiot too). Similarly, whoever thinks doing (-arg1 < 0) == (arg1 < 0) to test for the special value of INT_MIN (or LONG_MIN or whatever it is in your system) instead of just testing against the damn constant is a moron. It's just inviting trouble.
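Roughly what the "say what you mean" versions could look like, as a sketch. The names fits and negation_overflows are invented here, and buf_end stands in for wherever the buffer actually ends; this is not the code from the paper.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if len bytes starting at buf fit before
 * buf_end. Assumes buf <= buf_end and that both point into the same
 * object, so the pointer subtraction is well defined. No pointer is ever
 * pushed past the end of the buffer, so there is nothing to wrap. */
bool fits(const char *buf, const char *buf_end, size_t len)
{
    return len <= (size_t)(buf_end - buf);
}

/* Hypothetical helper: true if -arg1 cannot be represented as an int.
 * Tests against the constant directly instead of negating first. */
bool negation_overflows(int arg1)
{
    return arg1 == INT_MIN;
}
```

Neither function performs the questionable operation in order to find out whether it was safe, which is the whole point.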
I think that if you're doing something that is "clever," just stop. Write it clearly. If it's clever for people, maybe you're being too clever by half and the compiler will guess wrong too.
Flatly false.
Good code has plenty of null-checking, for example, most of which a clever compiler can prove is unreachable. It's still completely good code.
Compilers remove code that is provably unreachable on this build. Good defensive coding includes protection against future code changes.
http://developers.slashdot.org/story/13/10/29/2150211/how-your-compiler-can-compromise-application-security