Are Buffer Overflow Sploits Intel's Fault?
Bruce Perens submitted a story he wrote for his website on overflows and whose fault they are. I'm pretty skeptical of almost every point raised in this story, but it's an interesting read. [Updated 21:13 by t] As Sea Monkey points out, Bruce has now taken down the article, with a brief note: "I've withdrawn this article after enough people convinced me that I didn't know what I was talking about. It happens sometimes. Thanks." What if everyone displayed such grace?
I disagree with this. Making it the developer's responsibility to write bounds-checking code every time they deal with input is why we are in this mess today. Not a week goes by without another buffer overflow sob story on Bugtraq. Asking software developers never to make any mistakes, ever, is not a realistic solution, and assigning blame isn't going to make the problem go away.
There are a few possible solutions, none of them really easy.
Buffer overflows and other common security problems have been with us for over thirty years and still aren't in the "solved problems" bin. This is inexcusable. If people are going to rely on computers in their daily lives, the computers have to be reliable, and the possibility of a security compromise using 30-year-old techniques does not a reliable computer make.
-- Remember: Wherever you go, there you are!
And to help half-wits like us still do something useful with software, it might be nice to give us tools we don't hurt ourselves with too much. You know, scissors instead of a Samurai sword. A Toyota Camry instead of a Formula 1 race car. 110V household current instead of 50KV "professional" power.
The fact is, C/C++ is too powerful for me. Oh, I understand it just fine, and with enough time and effort, I can make something work semi-reliably in it. But what's the benefit I get for that self-flagellation? Should I spend all that extra time because finding the bugs "hurts so good"?
The fact is: writing code in C/C++ is a lot of unnecessary work. The only reason why people put up with it is because everybody else does it, so it's the path of least resistance if you want to use the "standard" compiler on your platform, use a language other people are likely to understand, and use other people's C/C++ libraries. But make no mistake about it: C/C++ is successful these days in spite of its cumbersome design, not because of it.
'' Oh? I'm fairly sure machine code is /also/ "unsafe", and that's what your pretty source code ends up as. How do you prove that your oh-so-wonderful language is still safe when rendered into raw machine code? ''
I'm sure you're thinking that you've got me beat, but in fact this is a great question!
The old, less-satisfying answer is that compilers are less likely to have bugs than the programs they're given. This is probably true (I don't recall any exploits due to compiler bugs, though to be fair I do recall some Java VM exploits).
The new exciting answer is: We use a type-safe subset of the target machine's assembly code.
In our TILT compiler for instance, we take ML code and put it through a series of transformations. At each transformation or optimization we also translate the types (a static proof that the program can't crash) until we get to machine language. This catches a lot of compiler bugs, and helps propagate safety properties to the raw code.
The result is that we have machine language which can pretty easily be checked for type-safety. This allows us to do some other cool things, like ship the proof along with the raw machine code, to be executed on someone else's machine. They don't have to trust us (just the proof), and it doesn't suffer sandboxing costs like Java. Wow! Read about Proof Carrying Code.
The second answer isn't really usable today, except in theory. The first is absolutely practical, though. (even if it fails due to compiler bugs, we'd still cut down on a high percentage of errors -- and we'd only need to fix bugs in one place).
Buffer overflows are the fault of the LANGUAGE. Important system utilities need to be written in bounds-checked languages. Some compilers, no matter the architecture, will write executable code on the stack: "trampolines". Unfortunately, this is common enough that the OS can't blindly turn off the executable bit on the stack pages. And non-executable stack pages don't stop all buffer overflow attacks; they just require a two-part attack: a heap buffer to hold the code, and a stack buffer to overwrite the return address. The heap buffer doesn't necessarily even need to be overflowed; the attacker just needs to be able to deduce its address. And one can't make heap addresses non-executable, simply because there are MANY language environments which do create code at runtime, such as interpreters, JITs, etc.
Nicholas C Weaver
nweaver@cs.berkeley.edu
Test your net with Netalyzr
What if everyone displayed such grace?
Or what if everyone bothered to do some research before writing and self-promoting some inane rant.
If you used the segment registers, the result was basically a highly non-linear address space. In a lot of ways, it was an 8 bit processor with 16 bit registers and hardware bank switching (for those of you that remember bank switching).
As a result, there were a few 'standard' memory models that programmers used:
- Small address space: All segment registers the same; don't touch them. This gave you a flat 16-bit (64K) address space, turning the machine into a glorified 8085/Z80 -- almost completely source code (assembler!) compatible. It also gave a slight speed advantage, since all pointers and integers were 16 bits wide.
- Intermediate address space: Segment registers point to disjoint spaces. Not too much difference, but you get some breathing space since the code and data don't share the same (tiny!) 64K address space. Pointers are still 16 bits, but you now have to remember which segment you're talking to.
- 'Large' address space: All pointers are 32 bits wide (a segment register value plus an offset within that segment). This gives you access to the full 1M address space. (The 640K limit was because 384K was reserved for I/O space.)
The 80286 allowed people to break the 1M barrier without doing bank switching (EMS?), but it turned the segment register/pointer problem into a serious horror story. Unless you were seriously masochistic (or just plain desperate) you just made it look like an 8086 that ran a bit faster. The alternative was a SERIOUS performance hit: if you allow arrays >64K, then just about every array access requires you to calculate and load the segment register. Address math sucks because if you have two 32-bit addresses A and B, A != B does not necessarily mean that they don't point to the same memory, and *X++ can require some serious work to do the expected thing.
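To make the A != B complaint concrete, here is a minimal sketch in C. It assumes plain 8086 real-mode addressing (linear address = segment * 16 + offset, wrapping at 20 bits); 286 protected mode resolves segments through descriptor tables instead, so this covers only the real-mode case. The `far_ptr` type and the function names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical 32-bit "far pointer": a 16-bit segment plus a 16-bit offset. */
typedef struct {
    uint16_t segment;
    uint16_t offset;
} far_ptr;

/* Real-mode translation: segment * 16 + offset, wrapped to 20 bits
   as on an 8086 with its 20 address lines. */
uint32_t linear_address(far_ptr p)
{
    return ((((uint32_t)p.segment) << 4) + p.offset) & 0xFFFFFu;
}

/* Comparing the raw 32 bits is not enough: many segment:offset pairs
   name the same byte, so both sides must be translated first. */
int same_memory(far_ptr a, far_ptr b)
{
    return linear_address(a) == linear_address(b);
}

/* Normalize so the offset is always below 16. After this, two pointers
   to the same byte have identical segment and offset fields. */
far_ptr normalize(far_ptr p)
{
    uint32_t linear = linear_address(p);
    far_ptr n;
    n.segment = (uint16_t)(linear >> 4);
    n.offset = (uint16_t)(linear & 0xFu);
    return n;
}
```

For example, 1234:0010 and 1235:0000 differ as 32-bit values but both translate to linear address 0x12350, which is why pointer comparison and increment needed extra work in the large model.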
When they came out with the '386 you now had segments of 4GB each. This was at a time when a 2GB RAM module could have been camouflaged as a desk and would have required a 15KW power supply.
Most programmers and OS designers just set all the segment registers the same (the '386 equivalent of the 'small memory model') and forgot about them (I called this traumatic amnesia).
So, yes: Intel has a Segment model that could be used to provide security, but few people are brave/stupid enough to risk the horror stories/ flashbacks that enabling it might entail.
Intel: Just short of intelligent.
Free Software: Like love, it grows best when given away.
The underlying problem is that C/C++/Objective-C do not have mechanisms to protect against these kinds of problems. In fact, it's impossible to write substantial programs in those languages that use only "safe" constructs. This is a peculiar and fundamental bug in the C-family language design.
There are excellent alternatives around. Modula-3, Oberon, Ada, Sather, and Eiffel all have efficient, free, open source implementations around, they all provide access to unsafe features when needed, and one of them should satisfy anybody's programming needs. Java is an excellent applications and server programming language, although it has a bit more overhead and no access to low-level features.
So, folks, get with the program and stop writing servers and other applications in C/C++.
This is a very old debate, and it's been raised on the kernel list several times. The problem is that it seems pretty clear that given a buffer overrun attack which can be exploitable without the stack-exec patch, it's possible to transform that attack into an exploit which will work with the stack-exec patch present.
It may require more work to create the exploit, but it's the sort of thing which only one person needs to do and then share with 100,000 of his best friends on some cracker web site. Hence, such a patch only provides the illusion of security, and it adds crap to the kernel. (There's all sorts of kludges you have to put in there to make sure that trampoline code doesn't break, etc., etc.)
The problem isn't Intel's fault, because the arch has an execute bit in the segments. The original idea was you put your code in a separate code segment from your stack and data segments. The real problem is OS designers who for various reasons decide that the x86 arch's segmentation should be ignored and set the code segments equal in size to the data segments and stack segments. It then becomes a simple matter to just jump into the data or stack segment and begin executing code.
Of course, since most of the OS's don't properly use the protection mechanisms Intel has provided, I guess it becomes Intel's fault if they don't extend the arch to support a feature and potentially break downward compatibility with other OS's using the current paging system.
Blame the language! C and C++ continue to be inappropriate for security-critical work.
Aside from speed-critical stuff like kernels and Quake 3, I don't see the need to write programs in C and C++ any more.
Let's start using modern languages with type safety. They're easier to write programs in (because debugging is easier) and not that slow.
I know that I'd gladly take the 2x speed hit on my security-critical apps (mail daemon, web server, ssh, etc.) to know that they cannot have this kind of bug in them, because they were written in a language like ML, Eiffel, Haskell, or even Java.
They need to be deprecated more forcefully. All the unsafe functions should be pulled from the standard C library and moved to something like "deprecated_unsafe_library.h". All set-UID programs need to be purged of those functions. Now. Any manufacturer shipping a system with those functions in a security-critical program should be sued for gross negligence.
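The post doesn't name the functions it means. Assuming it is talking about unchecked copies like `strcpy`, a bounded replacement might look roughly like the sketch below. `copy_bounded` is a hypothetical name, not a standard library function; it leans on `snprintf`, which never writes more than the size it is given.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for an unchecked strcpy(dst, src).
   Writes at most dst_size bytes, always NUL-terminated.
   Returns 0 if src fit completely, -1 if it had to be truncated
   (or if dst_size is 0 and nothing could be written). */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0) {
        return -1;
    }
    /* snprintf returns the length src would have needed, so a value
       >= dst_size means the copy was cut short. */
    int needed = snprintf(dst, dst_size, "%s", src);
    if (needed < 0 || (size_t)needed >= dst_size) {
        return -1;
    }
    return 0;
}
```

The caller still has to check the return value and decide what truncation means for the program; the point is only that the write can no longer run past the end of `dst`.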
Crispin
-----
Immunix: Free Hardened Linux
Chief Scientist, WireX
Ick. That's just the sort of mundane task I want a compiler for. As a programmer, I already have too much to worry about -- bounds checking is one simple task that I'd just as soon have the compiler do.
In most cases, the bounds check can be hoisted out of loops, so there's almost no overhead. In a perfect world, I'd like to see a compiler that, when given a high enough warning level, warns that it can't hoist bounds checks.
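As a rough illustration of what hoisting means, here is a hand-written C sketch of the two shapes a compiler could produce for the same loop. Both function names are made up, and a real bounds-checking compiler would generate this for you rather than have you write it.

```c
#include <stddef.h>

/* Unhoisted: the index is tested on every iteration. */
int sum_checked_each(const int *a, size_t len,
                     size_t start, size_t count, long *out)
{
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        size_t idx = start + i;
        if (idx >= len) {
            return -1;              /* out of bounds: refuse */
        }
        total += a[idx];
    }
    *out = total;
    return 0;
}

/* Hoisted: one range test before the loop covers every access,
   so the loop body runs with no per-element check. */
int sum_checked_hoisted(const int *a, size_t len,
                        size_t start, size_t count, long *out)
{
    /* Written as count > len - start to avoid overflow in start + count. */
    if (count > 0 && (start >= len || count > len - start)) {
        return -1;
    }
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        total += a[start + i];
    }
    *out = total;
    return 0;
}
```

Both versions accept and reject the same inputs; the second just pays for the check once instead of `count` times.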
Blame the developer!
Sure, some operating systems or languages or chips hold the coder's hand and make some dangerous things impossible or difficult to do.
It's still the programmer's fault for not knowing what the (void*) they're doing.
This is the same argument as "C++ is slow!" It's only slow if you don't bother to learn what code a C++ compiler generates, using lots of mechanisms without realizing it. C++ implements its mechanisms as tightly as it can, but every mechanism you use takes some time to operate.
Back to buffer overrun security: If you are gonna accept data from an untrusted source, why are you (1) putting it on the must-be-kept-inviolate stack, (2) not doing everything in your power to accept no more than n bytes that have been allocated?
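Both points can be sketched in a few lines of C: keep the untrusted data off the stack, and never accept more than was allocated. This is only an illustration under those assumptions; `read_bounded_line` is a hypothetical helper, built on the standard `fgets`, which stores at most size - 1 characters plus a terminating NUL.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads at most size - 1 characters of one line
   from `in` into a heap buffer of exactly `size` bytes.
   Returns NULL if size is unusable, allocation fails, or nothing
   could be read. The caller frees the result. */
char *read_bounded_line(FILE *in, size_t size)
{
    if (size == 0 || size > INT_MAX) {
        return NULL;                /* fgets takes an int length */
    }
    char *buf = malloc(size);       /* heap, not the stack */
    if (buf == NULL) {
        return NULL;
    }
    if (fgets(buf, (int)size, in) == NULL) {
        free(buf);
        return NULL;
    }
    return buf;                     /* NUL-terminated, never longer than size - 1 */
}
```

Anything past the limit stays unread in the stream, so the caller has to decide whether to discard the rest of the line or reject the input outright.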
If the compiler docs specifically say "data in auto variables will never be put into an executable address space," and it does, then it's time to fix the compiler or docs. Likewise if the docs belie the behavior of a chip, time to fix the chip or docs.
Don't blame a microprocessor for your mess. Don't blame a language for your mess.
You have only yourself to blame.
Try this hypothetical: what if, instead of doing public speeches, politicians took to publishing their opinions in articles on the web? That way, if anything they say produces a bad reaction, they can just edit it away, and no one will be able to figure out what the complaints were about. Very convenient, eh?
My take: If you publish an article, and then later recant, the thing to do is to add a link at the top pointing to your later thoughts on the subject.
Lest you be confused by the +1 funny on my post, let me say that I am not joking.
2x slower is the most conservative estimate for the speed of modern safe languages against C code. (In practice I've seen much better. Does anyone trust benchmarks?) My point is, even if it is 2X slower, I'll gladly take it and sleep a little more soundly at night knowing that my linux box isn't being hacked due to 20 year-old issues. 99% of my box's CPU time is spent at Nice -19 trying to find big primes for the GIMPS project.
Modern languages (take Java if OO is your thing, but there are more interesting languages around) have SOLVED this problem with bounds checking (or static proofs that checking isn't needed). Without having to worry about this type of common security hole, programmers can spend more time on things we REALLY need: documentation, maintainable code, asymptotic speed increases, and the other possible security holes (e.g., not escaping shell metacharacters in user input).
See my thread on Functional Languages for what I think is a convincing argument about modern typed languages in general. I know my position is extreme, but that doesn't make it a joke.
http://slashdot.org/comments.pl?sid=00/07/01/23
No, I think that's more akin to what a packet sniffer does. But close!
Friends don't let friends use multiple inheritance.
Like a system, a language can be as secure or insecure as you make it. One can write an extremely tight program in C++ while writing one in Perl or Java that leaves gaping security holes open.
This statement troubles me. C/C++ addicts who have little exposure to other languages have little knowledge of what they're missing.
_Many_ (if not most?) security attacks involve buffer overflows. You have to _work_ and _think_ to free yourself of buffer overflows in C/C++. In other languages, this protection comes for free.
Yes, it's possible to make a secure program in C/C++. But it's just a hell of a lot easier in bounds-checking languages.
So there.