How Your Compiler Can Compromise Application Security

Posted by Soulskill on Tuesday October 29, 2013 @11:18AM from the my-compiler-levels-me-out dept.

jfruh writes "Most day-to-day programmers have only a general idea of how compilers transform human-readable code into the machine language that actually powers computers. In an attempt to streamline applications, many compilers actually remove code that they perceive to be undefined or unstable, and, as a research group at MIT has found, in doing so they can make applications less secure. The good news is the researchers have developed a model and a static checker for identifying unstable code. Their checker is called STACK, and it currently works for checking C/C++ code. The idea is that it will warn programmers about unstable code in their applications, so they can fix it, rather than have the compiler simply leave it out. They also hope it will encourage compiler writers to rethink how they can optimize code in more secure ways. STACK was run against a number of systems written in C/C++ and it found 160 new bugs in the systems tested, including the Linux kernel (32 bugs found), Mozilla (3), Postgres (9) and Python (5). They also found that, of the 8,575 packages in the Debian Wheezy archive that contained C/C++ code, STACK detected at least one instance of unstable code in 3,471 of them, which, as the researchers write (PDF), 'suggests that unstable code is a widespread problem.'"

24 of 470 comments

TFA does a poor job of defining what's happening by istartedi · 2013-10-29 11:25 · Score: 4, Insightful

If my C code contains *foo=2, the compiler can't just leave that out. If my code contains if (foo) { *foo=2; } else { return EDUFUS; } it can verify that my code is checking for NULL pointers. That's nice, but the questions remain:
What is "unstable code" and how can a compiler leave it out? If the compiler can leave it out, it's unreachable code and/or code that is devoid of semantics. No sane compiler can alter the semantics of your code, at least no compiler I would want to use. I'd rather set -Wall and get a warning.

--
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
Inflammatory Subject by Imagix · 2013-10-29 11:29 · Score: 4, Informative

This is complaining because code which is already broken is broken more by the compiler? The programmer is already causing unpredictable things to happen, so even "leaving the code in" still provides no assurances of correct behaviour. An example of how the article is skewed:

Since C/C++ is fairly liberal about allowing undefined behavior
No, it's not. The language forbids undefined behavior. If your program invokes undefined behavior, it is no longer well-formed C or C++.
Re:TFA does a poor job of defining what's happenin by Anonymous Coward · 2013-10-29 11:38 · Score: 5, Informative

An example of "unstable code":
char *a = malloc(sizeof(char));
*a = 5;
char *b = realloc(a, sizeof(char));
*b = 2;
if (a == b && *a != *b)
{
launchMissiles();
}
A cursory glance at this code suggests missiles will not be launched. With gcc, that's probably true at the moment. With clang, as I understand it, this is not true: missiles will be launched. The reason is that the spec says the pointer passed to realloc becomes invalid after the call, so any use of that pointer has undefined behaviour. Clang takes advantage of this and treats *a as unchanged after that point. Therefore it optimises if (a == b && *a != *b) into if (a == b && 5 != *b). This condition then passes, and missiles get launched.
The truth here is that your compiler is not compromising application security – the code that relies on undefined behaviours is.
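For contrast, a minimal sketch of the well-defined pattern, assuming the goal is simply to keep using the block after resizing it: after a successful realloc, read and write only through the returned pointer, and touch the old pointer only on the failure path. The function name realloc_then_store is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical example: resize a block, then use only the pointer that
 * realloc returned. Returns the stored value, or -1 on allocation failure. */
int realloc_then_store(void)
{
    char *a = malloc(sizeof(char));
    if (a == NULL)
        return -1;
    *a = 5;

    char *b = realloc(a, sizeof(char));
    if (b == NULL) {
        /* On failure the original block is still valid and must be freed. */
        free(a);
        return -1;
    }
    a = NULL; /* the old pointer is dead from here on, even if a == b held */

    *b = 2;
    int result = *b;
    free(b);
    return result;
}
```

Nulling out the old pointer is not required by the standard, but it makes any accidental later use fail loudly instead of silently depending on what the optimizer assumed.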
Re:News flash by war4peace · 2013-10-29 11:40 · Score: 4, Informative

I would also like to understand what's the definition of "unstable code".

--
...gis sdrawkcab (usually not responding to ACs; don't bother posting as AC)
Re:TFA does a poor job of defining what's happenin by Nanoda · 2013-10-29 11:43 · Score: 5, Informative

What is "unstable code" and how can a compiler leave it out?
The article is actually using that as an abbreviation for what they're calling "optimization-unstable code", or code that is included at some specified compiler optimization levels, but discarded at higher levels. Basically, they call it unstable because whether it survives compilation varies with the optimization level, not because the code itself necessarily results in random behaviour.
Re:Null pointer detection at compile time by Zero__Kelvin · 2013-10-29 11:46 · Score: 4, Insightful

"For example, if a pointer is passed to a function, is the function allowed to dereference it without first checking it for NULL?"
Of course it is, and it is supposed to be able to do so. If you were an embedded systems programmer you would know that, and also know why. Next you'll be complaining that languages allow infinite loops (again, a very useful thing to be able to do). C doesn't protect the programmer from himself, and that's by design. Compilers have switches for a reason. If they don't know how it is being built or what the purpose of the code is then they can't possibly determine with another program if the code is "unstable".

--
Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
Re:News flash by Mitchell314 · 2013-10-29 11:49 · Score: 5, Funny

Code with a finite half-life. Sometimes radiates when it decays. The byproducts tend to be hazardous to health, and most cause symptoms such as headaches, tremors, Carpal Tunnel Syndrome, and Acute Induced Tourette Syndrome. Handle with care. The Daily WTF has an emergency hotline if you or somebody you know has been exposed to unsafe levels of unstable code.

--
I read TFA and all I got was this lousy cookie
Really small EXE mystery solved by Tablizer · 2013-10-29 11:52 · Score: 5, Funny

many compilers actually remove code that it perceives to be undefined or unstable
No wonder my app came out with 0 bytes.

--
Table-ized A.I.
Re:News flash by Cryacin · 2013-10-29 11:54 · Score: 4, Funny

So that's why you have to restart your computer. Gets rid of dangerous radiation from weapons grade baloneyum decay.

--
Science advances one funeral at a time- Max Planck
Re:TFA does a poor job of defining what's happenin by dgatwood · 2013-10-29 11:56 · Score: 5, Informative

Another, more common example of code optimizations causing security problems is this pattern:
int a = [some value obtained externally];
int b = a + 2;
if (b < a) {
// integer overflow occurred ...
}
The C spec says that signed integer overflow is undefined. Without optimization, this usually works. However, the compiler is allowed to assume that overflow never happens, conclude that two more than any number is always larger than that number, and optimize out the entire "if" statement and everything inside it.
For proper safety, you must write this as:
int a = [some value obtained externally];
if (a > INT_MAX - 2) {
// integer overflow will occur ...
}
int b = a + 2;
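A runnable sketch of the pre-check idea, assuming int arithmetic and a hypothetical helper name add_two_checked. Writing the test as a > INT_MAX - 2 matters: INT_MAX - 2 is a constant, whereas INT_MAX - a would itself overflow for negative a.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: store a + 2 in *out only if the sum fits in an int.
 * The comparison uses only in-range values, so it is well-defined for
 * every int a, including negative ones. */
bool add_two_checked(int a, int *out)
{
    if (a > INT_MAX - 2)
        return false; /* a + 2 would overflow */
    *out = a + 2;
    return true;
}
```

GCC (5 and later) and Clang also provide __builtin_add_overflow, which performs the addition and reports overflow in one step, at the cost of portability.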

--
Check out my sci-fi/humor trilogy at PatriotsBooks.
Re:TFA does a poor job of defining what's happenin by Spikeles · 2013-10-29 11:59 · Score: 4, Informative

The TFA links to the actual paper. Maybe you should read that.
Towards Optimization-Safe Systems: Analyzing the Impact of Undefined Behavior

struct tun_struct *tun = ...;
struct sock *sk = tun->sk;
if (!tun)
    return POLLERR;
/* write to address based on tun */

For example, when gcc first sees the dereference tun->sk, it concludes that the pointer tun must be non-null, because the C standard states that dereferencing a null pointer is undefined [24:6.5.3]. Since tun is non-null, gcc further determines that the null pointer check is unnecessary and eliminates the check, making a privilege escalation exploit possible that would not otherwise be.
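Moving the check ahead of the first dereference removes the inference the optimizer relied on. A minimal sketch with hypothetical stand-ins for the kernel's struct tun_struct and struct sock and a hypothetical function name; the real kernel code is more involved.

```c
#include <poll.h>   /* POLLERR */
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for the kernel structures. */
struct sock { int id; };
struct tun_struct { struct sock *sk; };

/* Test the pointer before dereferencing it, so the compiler cannot
 * conclude "tun is non-null" and delete the check. */
int tun_poll_sketch(struct tun_struct *tun)
{
    if (!tun)
        return POLLERR;
    struct sock *sk = tun->sk;
    /* ... work with tun and sk ... */
    return sk != NULL ? 0 : POLLERR;
}
```

GCC also has a -fno-delete-null-pointer-checks option that disables this particular inference, which is a safety net rather than a substitute for ordering the code correctly.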

--
I don't need to test my programs.. I have an error correcting modem.
Re:TFA does a poor job of defining what's happenin by complete+loony · 2013-10-29 12:00 · Score: 4, Informative

"What every C programmer should know about undefined behaviour" (part 3, see links for first 2 parts).
For example, overflow of signed values is undefined behaviour in the C standard (unsigned arithmetic, by contrast, is defined to wrap around). Compilers can make decisions like using an instruction that traps on overflow if it would execute faster, or if that is the only operator available. Since overflow might trap, and is undefined behaviour anyway, the compiler may assume that the programmer never intended it to happen. Therefore an overflow test of this kind can be assumed to always come out the same way, and the code block that handles the overflow is dead and can be eliminated.
This is why there are a number of compilation optimisations that gcc can perform, but which are disabled when building the linux kernel. With those optimisations, almost every memory address overflow test would be eliminated.
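To make the signed/unsigned distinction concrete, a small sketch (function names hypothetical): an after-the-fact wraparound test is reliable for unsigned operands, while signed operands have to be checked before the addition.

```c
#include <limits.h>
#include <stdbool.h>

/* Unsigned arithmetic wraps modulo UINT_MAX + 1 by definition, so this
 * post-hoc test is reliable and may not be optimized away. */
bool unsigned_add_wraps(unsigned a, unsigned b)
{
    return a + b < a;
}

/* Signed overflow is undefined, so test before adding. Each bound
 * (INT_MAX - b for b > 0, INT_MIN - b for b < 0) stays in range. */
bool signed_add_overflows(int a, int b)
{
    return (b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b);
}
```

GCC's -fwrapv flag tells the compiler to treat signed overflow as wrapping, which is one way projects opt out of these optimizations wholesale.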

--
09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
The paper gives examples by AdamHaun · 2013-10-29 12:04 · Score: 4, Informative

The article doesn't summarize this very well, but the paper (second link) provides a couple examples. First up:

char *buf = ...;
char *buf_end = ...;
unsigned int len = ...;
if (buf + len >= buf_end)
    return; /* len too large */
if (buf + len < buf)
    return; /* overflow, buf+len wrapped around */
/* write to buf[0..len-1] */
To understand unstable code, consider the pointer overflow check buf + len < buf shown [above], where buf is a pointer and len is a positive integer. The programmer's intention is to catch the case when len is so large that buf + len wraps around and bypasses the first check ... We have found similar checks in a number of systems, including the Chromium browser, the Linux kernel, and the Python interpreter.
While this check appears to work on a flat address space, it fails on a segmented architecture. Therefore, the C standard states that an overflowed pointer is undefined, which allows gcc to simply assume that no pointer overflow ever occurs on any architecture. Under this assumption, buf + len must be larger than buf, and thus the "overflow" check always evaluates to false. Consequently, gcc removes the check, paving the way for an attack to the system.
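One way to avoid the problem is to compare len against the space remaining, so the out-of-range pointer buf + len is never formed. A minimal sketch, assuming buf_end >= buf and both point into the same buffer; the name len_fits is hypothetical. It accepts exactly the lengths the quoted first check accepts (buf + len < buf_end).

```c
#include <stdbool.h>
#include <stddef.h>

/* buf_end - buf is a valid pointer difference because both pointers refer
 * to the same buffer, so no out-of-bounds pointer is ever computed. */
bool len_fits(const char *buf, const char *buf_end, unsigned int len)
{
    size_t avail = (size_t)(buf_end - buf);
    return len < avail; /* same acceptance rule as buf + len < buf_end */
}
```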

They then give another example, this time from the Linux kernel:

struct tun_struct *tun = ...;
struct sock *sk = tun->sk;
if (!tun)
    return POLLERR;
/* write to address based on tun */
In addition to introducing new vulnerabilities, unstable code can amplify existing weakness in the system. [The above] shows a mild defect in the Linux kernel, where the programmer incorrectly placed the dereference tun->sk before the null pointer check !tun. Normally, the kernel forbids access to page zero; a null tun pointing to page zero causes a kernel oops at tun->sk and terminates the current process. Even if page zero is made accessible (e.g. via mmap or some other exploits), the check !tun would catch a null tun and prevent any further exploits. In either case, an adversary should not be able to go beyond the null pointer check.
Unfortunately, unstable code can turn this simple bug into an exploitable vulnerability. For example, when gcc first sees the dereference tun->sk, it concludes that the pointer tun must be non-null, because the C standard states that dereferencing a null pointer is undefined. Since tun is non-null, gcc further determines that the null pointer check is unnecessary and eliminates the check, making a privilege escalation exploit possible that would not otherwise be.
The basic issue here is that optimizers are making aggressive inferences from the code based on the assumption of standards-compliance. Programmers, meanwhile, are writing code that sometimes violates the C standard, particularly in corner cases. Many of these seem to be attempts at machine-specific optimization, such as this "clever" trick from Postgres for checking whether an integer is the most negative number possible:

int64_t arg1 = ...;
if (arg1 != 0 && ((-arg1 < 0) == (arg1 < 0)))
    ereport(ERROR, ...);
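The Postgres test only works if -arg1 wraps, but negating the most negative int64_t is exactly the overflow that is undefined. A sketch of the straightforward alternative, comparing against the named constant from stdint.h (the helper name is hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* -arg1 overflows exactly when arg1 == INT64_MIN, so compare directly
 * instead of relying on the overflow the check is trying to detect. */
bool is_most_negative(int64_t arg1)
{
    return arg1 == INT64_MIN;
}
```

The caller would then write if (is_most_negative(arg1)) ereport(ERROR, ...); with no negation involved.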

The remainder of the paper goes into the gory Comp Sci details and discusses their model for detecting unstable code, which they implemented in LLVM. Of particular interest is the table on page 9, which lists the number of unstable code fragments found in a variety of software packages, including exciting ones like Kerberos.

--
Visit the
Re:PC Lint anyone? by EvanED · 2013-10-29 12:34 · Score: 4, Insightful

Don't worry, the authors know what they're doing.
Just because PC Lint could find a small number of potential bugs doesn't mean it's a solved problem by any means. Program analysis is still pretty crappy in general, and they made another improvement, just like tons of people before them, PC Lint before them, and tons of people before PC Lint.
Re:TFA does a poor job of defining what's happenin by lgw · 2013-10-29 12:57 · Score: 5, Funny

No, the compiler is allowed to do anything it damn well pleases wherever the standard calls behaviour "undefined". One of my favorite quotes ever from a standards discussion:

When the compiler encounters [a given undefined construct] it is legal for it to make demons fly out of your nose
Nasal demons can cause code instability.

--
Socialism: a lie told by totalitarians and believed by fools.
OK, before somebody else points it out... by istartedi · 2013-10-29 13:03 · Score: 4, Interesting

My statement is contradictory. I recommended a course of action for undefined behavior, while maintaining that Clang is wrong for documenting a course of action for undefined behavior.
My understanding of "undefined behavior" in the C spec is that it means "anything can happen and the programmer shouldn't rely on what the compiler currently does". Of course, in the real world *something* must happen. If a 3rd party documents what that something is, the compiler is still compliant. It's the programmer's fault for relying on it.
OTOH, if the behavior was "implementation defined" then the compiler authors can define it. If they change their definition from one rev to another without documenting the change, then it's the compiler author's fault for not documenting it.
In other words:
undefined -- programmer's fault for relying on it.
implementation defined -- compiler's fault for not documenting it.

--
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
Re:News flash by ShanghaiBill · 2013-10-29 13:32 · Score: 4, Informative

Didn't RTFA because this is /., but I'd guess that it's code that works now but is fragile under a change of compiler, compiler version, optimization level, or platform.
Yes, you didn't RTFA, because your definition actually makes sense. TFA defines "unstable code" as code with undefined behavior. TFA also claims that many compilers simply DELETE such code. I have never seen a compiler that does that, and I seriously doubt it is really common. Does anyone know of a single compiler that does this? Or is TFA just completely full of crap (as I strongly suspect)?
Re:TFA does a poor job of defining what's happenin by CODiNE · 2013-10-29 13:35 · Score: 4, Interesting

That reminds me of this gem: Overflow in sorting algorithms
That little bug just sat around for a few decades before anyone noticed it.
Quick summary: (low + high) / 2 may overflow, which is undefined behavior for signed ints. Really, every time we add ints it's possible; our values just usually don't exceed the maximum.
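The usual fix is to add half the distance to the lower bound instead of halving the sum. A minimal sketch, assuming non-negative indices with low <= high (the function name is hypothetical):

```c
/* For 0 <= low <= high, high - low cannot overflow, and the result
 * lies between low and high, so no intermediate value goes out of range. */
int midpoint(int low, int high)
{
    return low + (high - low) / 2;
}
```

With 32-bit ints, midpoint(2000000000, 2100000000) stays in range, whereas (2000000000 + 2100000000) / 2 would overflow.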

--
Cwm, fjord-bank glyphs vext quiz
Re:News flash by EvanED · 2013-10-29 13:44 · Score: 5, Informative
Yes, you didn't RTFA, because your definition actually makes sense. TFA defines "unstable code" as code with undefined behavior.
...and undefined behavior is exactly what causes the things I listed.

TFA also claims that many compilers simply DELETE such code. I have never seen a compiler that does that, and I seriously doubt it is really common.
You probably haven't used any desktop compilers.
Just a sampling:
- During MS's security push a decade ago, they discovered that the compiler was optimizing away the memset in code such as memset(password, '\0', len); free(password);, which was meant to limit the lifetime of sensitive information. Because the buffer was never read after the memset, those stores were dead and could be removed. (This is not actually undefined behavior, but it is an example of the compiler deleting code that was there for a purpose.)
- I linked part 3 of this series to you in another response, but the first example in here discusses such an optimization that GCC did which removed security checks in the Linux kernel (see also this series -- look down at "A Fun Case Analysis")
- The Linux kernel has long been built with -fno-strict-aliasing because GCC optimizations based on the strict aliasing assumption break it (more precisely: code that violates the standard's strict aliasing rules was being "mis-"optimized), though I don't know if it led to security implications
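For the memset case, one common workaround is to perform the wipe through a volatile-qualified pointer, since volatile accesses cannot be discarded as dead stores. A minimal sketch with a hypothetical name; platform functions such as explicit_bzero (glibc, BSD) or SecureZeroMemory (Windows) serve the same purpose where available.

```c
#include <stddef.h>

/* Zero len bytes at p. Each store goes through a volatile lvalue, which
 * the compiler must perform even if the buffer is freed immediately after. */
void secure_wipe(void *p, size_t len)
{
    volatile unsigned char *vp = p;
    while (len--)
        *vp++ = 0;
}
```

Typical use would be secure_wipe(password, len); free(password); in place of the plain memset.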
Re:TFA does a poor job of defining what's happenin by Old+Wolf · 2013-10-29 14:06 · Score: 4, Insightful

>The dereference is undefined, and therefore
Stop right here. Once undefined behaviour occurs, "all bets are off" as they say; the remaining code may have any behaviour whatsoever. C works like this on purpose, and it's something I agree with. It means the compiler doesn't have to insert screeds of extra checks, both at compile-time and run-time.
There are plenty of other languages you can use if you want a different language definition :)
These bugs exist even *without* signed integers! by Myria · 2013-10-29 14:25 · Score: 5, Interesting

The first mistake was using signed integers.
The problem is C's promotion rules. In C, when promoting integers to the next size up, typically to the minimum of "int", the rule is to use signed integers if the source type fits, even if the source type is unsigned. This can cause code that seems to use unsigned integers everywhere to break, because C says signed integer overflow is undefined. Take the following code, for example, which I saw on a blog recently:
uint64_t MultiplyWords(uint16_t x, uint16_t y)
{
uint32_t product = x * y;
return product;
}
MultiplyWords(0xFFFF, 0xFFFF) on GCC for x86-64 was returning 0xFFFFFFFFFFFE0001, and yet this is not a compiler bug. From the promotion rules, uint16_t (unsigned short) gets promoted to int, because unsigned short fits in int completely without loss or overflow. So the multiplication became ((int) 0xFFFF) * ((int) 0xFFFF). That multiplication overflows in a signed sense, an undefined operation. The compiler can do whatever it feels like, including generating code that crashes if it wants.
GCC in this case assumes that overflow cannot happen, so x * y must be positive (even though at runtime it is not). That makes the uint32_t conversion a no-op as far as the optimizer can tell, so it is omitted. The code generator then sees an int converted to uint64_t, which means sign extension. This time the optimizer isn't smart enough to reuse the knowledge that the value is positive, which would let it skip the sign extension and use "mov eax, ecx" to clear the high 32 bits, so it emits a sign-extending instruction (cdqe or movsxd) instead.
So no, avoiding signed integers does not always save you.
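A sketch of the usual fix: convert one operand to uint32_t before multiplying, so the usual arithmetic conversions carry out the multiplication in unsigned arithmetic, which wraps instead of overflowing. This assumes a 32-bit int (true of the common x86-64 and ARM ABIs); with a wider int, uint32_t would itself be promoted to signed int.

```c
#include <stdint.h>

/* (uint32_t)x * y: y is promoted to int, then converted to unsigned int
 * to match the left operand, so no signed multiplication takes place. */
uint64_t MultiplyWords(uint16_t x, uint16_t y)
{
    uint32_t product = (uint32_t)x * y;
    return product;
}
```

Widening to 64 bits instead, (uint64_t)x * y, also avoids the signed multiply and does not depend on the width of int.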

--
"Screw Sun, cross-platform will never work. Let's move on and steal the Java language." - Visual J++ Product Manager
Re:News flash by gweihir · 2013-10-29 14:44 · Score: 4, Interesting

That is not "unstable" or "undefined" code. There is already a word for it: dead code. In addition, any programmer worth his/her salt will make sure to declare things like that "volatile", i.e. tell the compiler that they might be accessed at any time from places the compiler does not see. Which is exactly the security problem here. Don't blame compilers for programmer incompetence....

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Re:These bugs exist even *without* signed integers by Animats · 2013-10-29 18:01 · Score: 5, Interesting

The problem is C's promotion rules. In C, when promoting integers to the next size up, typically to the minimum of "int", the rule is to use signed integers if the source type fits, even if the source type is unsigned.
I know. C's handling of integer overflow is "undefined". In Pascal, integer overflow was a detected error. DEC VAX computers could be set to raise a hardware exception on integer overflow, and about thirty years ago, I rebuilt the UNIX command line tools with that checking enabled. Most of them broke.
In the first release of 4.3BSD, TCP would fail to work with non-BSD systems during alternate 4-hour periods. The sequence number arithmetic had been botched due to incorrect casts involving signed and unsigned integers. I found that bug. It wasn't fun.
C's casual attitude towards integer overflow is why today's machines don't have the hardware to interrupt on it. Ada does overflow checks, but the predominance of C sloppiness influenced hardware design too much.
I once wrote a paper, "Type Integer Considered Harmful" on this topic. One of my points was that unsigned arithmetic should not "wrap around" by default. If you want modular arithmetic, you should write something like n = (n + 1) % 65536;. The compiler can optimize that into machine instructions that exploit word lengths when the hardware allows, and you'll get the same result on all platforms.
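In today's C that style can be written directly. A minimal sketch (the name next_seq16 is hypothetical) of a 16-bit sequence counter whose modulus is stated in the code rather than implied by a type's width; for a power-of-two modulus a compiler can reduce the % to a mask.

```c
#include <stdint.h>

/* Advance a counter modulo 65536. Even if n + 1 wraps at 2^32, the result
 * is still correct mod 65536, because 2^32 is a multiple of 65536. */
uint32_t next_seq16(uint32_t n)
{
    return (n + 1) % 65536;
}
```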
Re:News flash by maxwell+demon · 2013-10-29 18:48 · Score: 4, Interesting

While what you say is true, I think it's not what they mean. Instead what they mean is compilers taking advantage of undefined behaviour you didn't notice. The compiler is allowed to assume that undefined behaviour never happens, and optimize accordingly. The important point is that this can even affect code before the undefined behaviour would occur. For example, consider the following code, where undefined() is some code that causes undefined behaviour:

if (a>4) {
    a=4;
    big=true;
    undefined();
} else {
    big=false;
}
assert(a<=4);

Now if a>4, the code inevitably runs into undefined behaviour, and therefore it may assume that a is not larger than 4 right from the start. Therefore it is allowed to compile the complete block to simply

big=false;

Note that even the assert doesn't help because the compiler "knows" it cannot trigger anyway, and therefore optimizes it out.
I think it is not hard to imagine how this can lead to security problems.
Another nice example (which I read on the gcc mailing list quite some time ago; not an exact quote though):

bool validate_passwd(char const* user)
{
    int tries = 0;
    char const* given_passwd = ask_password();
    char const* user_passwd = get_password(user);
    while (strcmp(given_passwd, user_passwd)) {
        tries = tries++; /* undefined behaviour! */
        if (tries > 3)
            return false; /* allow only to try three times */
        printf("password not valid. Please try again.\n");
        given_passwd = ask_password();
    }
    return true;
}

Now if strcmp returns anything but 0, the code inevitably runs into undefined behaviour, therefore the compiler is allowed to assume that never happens, and therefore is allowed to optimize the code to simply

bool validate_passwd(char const* user)
{
    char const* given_passwd = ask_password();
    char const* user_passwd = get_password(user);
    return true;
}

So there goes your password security.
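For comparison, a minimal sketch of the same loop with the counter incremented once per failed attempt, so the limit is well-defined and cannot be optimized away. Assumptions, all hypothetical: ask_password() is a stub that replays a list of attempts so the code runs without a terminal, the expected password is passed in directly instead of coming from get_password(), and the printf is dropped. Note that the original test (tries > 3) would allow four attempts even with a correct increment; the sketch uses ++tries >= 3 for three.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical input source: a NULL-terminated list of attempts. */
static const char *const *attempts;

static const char *ask_password(void)
{
    const char *p = *attempts;
    if (p != NULL)
        attempts++;
    return p != NULL ? p : "";
}

/* Allows three attempts in total; tries is modified once per iteration. */
bool validate_passwd(const char *user_passwd)
{
    int tries = 0;
    const char *given_passwd = ask_password();
    while (strcmp(given_passwd, user_passwd) != 0) {
        if (++tries >= 3)
            return false;
        given_passwd = ask_password();
    }
    return true;
}
```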

--
The Tao of math: The numbers you can count are not the real numbers.