GCC 4.3.0 Exposes a Kernel Bug
ohxten sends news from earlier this month that GCC 4.3.0's new behavior of not clearing the direction flag before a string operation on x86 systems poses problems with kernels — such as Linux and BSD — that do not clear the direction flag before a signal handler is called, despite the ABI specification.
That's what happens when you don't clear that STD...
OK so the kernel developers add a single line of code, the bugzilla ticket is closed, and we get on to real news?
Better than a general fault.
"Rule #1: Don't break existing stuff"
The ABI wasn't being followed correctly, hence GCC, Linux and the BSD kernels were already broken.
"GCC breaks this cardinal rule. It should be reverted."
It is not a wise idea to revert corrections to long standing issues.
So, are we going to get on GCC's case for enforcing standards compliance and thus breaking backwards compatibility while insisting that Microsoft should take the opposite approach with IE8?
"Rule #1: Don't break existing stuff"
GCC is in the business of creating new and better optimizations. It is pretty much impossible to make optimizations without assuming things in the ABI. As more and more of the ABI is assumed by the optimizations, people get away with fewer violations of the ABI, but without assuming more, faster code wouldn't happen.
Because the newest versions of GCC are necessary to improve the state of the art in C compiler optimizations in the open source world, the appropriate reaction to this is to have the compiler people follow the spec, and assume the spec, and if assuming the spec breaks something, the people affected by the breakage don't upgrade their compilers.
This is why there are still people using GCC versions from the stone age.
I suppose this might be a longstanding issue if Linux was Unix.
GCC 4.3.0's new behavior of not clearing the direction flag before a string operation on x86 systems poses problems with kernels -- such as Linux and BSD -- that do not clear the direction flag before a signal handler is called, despite the ABI specification.
Oh my GOD! If this is true, that means- that means-- it... the-
Uh, what does it mean exactly?
What this really exposes is not a bug in any kernel. Indeed, the story states that the "bug" exists in both the BSD and Linux kernels. It really exposes something fascinating about the development process: Code is written based on certain assumptions and a working theory of how the code will function once put into use, but the only way to really know how well it works is to hand it over to the ultimate judge of code correctness--the computer--by running the code. If it works, case closed. Now it's entirely possible that the kernel developers never heard of this obscure nuance of the Intel processor. Then one day, the compiler changed, and with it, the assumptions changed. Mature code that has been declared good years ago seemingly breaks. Now it's easy to blame the code, but really this is a deletion of a feature from the compiler. Nevertheless, it exposes the fact that ultimately, no matter what tools we use and no matter how well we think our code through, you can only consider the code good once it runs and appears to do what it's supposed to.
McCain/Palin '08. Now THAT's hope and change!
Check the BSD mailing lists for yourself, they are affected. I'll give you one example below:
http://leaf.dragonflybsd.org/mailarchive/commits/2008-03/msg00072.html
Before flaming people next time, at least try and learn about what you're talking about.
This article is not yet public for non-subscribers. The link given is supposed to be for a subscriber to forward to a friend; putting it up on Slashdot goes against the intended spirit and does not help support Linux Weekly News, which deserves the community's support.
Using that logic Microsoft shouldn't try to improve security in Windows since it breaks many third party applications that depend on exploits and other silly behavior to function.
I seem to recall that MS-DOS 2.x suffered from this same problem with either the Int 21 or Int 13 interfaces. (Hey, it was 20 years ago; I don't remember the details.) If you made certain BDOS calls with the direction flag set, the message "A evird rorre etirw daeR" ("Read write error drive A" backwards) would be displayed on the console. It wasn't fixed for years. I remember we rigorously enforced the "Clear the direction flag before calling into MS-DOS" rule.
Most experienced assembler programmers know better than to assume the direction flag will be set or cleared unless this is specifically documented.
The Lumber Cartel, local 42 (Canadian branch)
British Columbia, Canada
Silly question time...
If this managed to affect both Linux and BSD despite no relevant common code, is Windows affected? I'm guessing OSX is, thanks to its BSD heritage. Has anyone tested either of them, though? How about other OSes?
It is not quite as bad as that. It causes problems between two threads, but both threads have to be from the same program. If someone has such a specially crafted program running on their system, they have been breached already.
No privilege escalation, only DOS.
Mielipiteet omiani - Opinions personal, facts suspect.
I fixed this bug in 1989 in an Intel C compiler. That was some years before the GCC project was started. Some people never learn...
Excuse me, but please get off my Pennisetum Clandestinum, eh!
On the other hand: the instructions affected by this aren't used very much, so if you want optimizations, a good candidate would be to not clear the flag unless it is needed. If the ABI were simply changed to allow this, no existing code would break (obviously), and future code could both conform to the new ABI *and* avoid the overhead of unnecessary instructions to clear the flag when it is not being used.
I suppose the only barrier to this optimization would be the political effort needed to get everyone to agree to change the ABI.
Does this mean that you could hand-craft some assembler code that exploits virtually all Linux and BSD-kernels out there?
1) Nobody is getting on gcc's case. As I understand it, they are doing the right thing, and reverting to the older, safer, although slightly slower, behavior.
2) Perhaps you haven't gotten the news, but IE8 is doing the right thing too, by using their "less broken" mode by default. This is a switch from what they announced earlier, where you would have to opt-in to better standards compliance.
3) The difference between IE and gcc is that IE is broken and gcc is not. Clearing the DF does not break standards in any way. In fact, according to the ABI, it needed to be done anyway (although the kernel is supposed to do it). Guess what happens when you clear the DF twice?
Enforcing standards compliance will be a pain in the short run, but pay off in the long run. You can get away with accommodating old bugs (or bad designs, but that gets offtopic) for a while, but eventually the difficulty of maintaining all the quirks grows to a point where it is no longer doable.
I think Windows Vista is a good example of what happens when you try to maintain backwards compatibility with the assorted bugs and mis-designs of decades. See the various Vista articles on /. on how that worked out ;-)
If Microsoft takes the opposite approach with IE8, I consider that a good move and a sign that they are capable of learning.
C - the footgun of programming languages
Windows does not have signal handlers natively (or actually, only a few, now that I google it: SIGABRT, SIGFPE, SIGILL, SIGINT, SIGSEGV, SIGTERM). There is the whole SEH mechanism for C-language exceptions, which takes over some of the uses, but no other signals natively. So you won't write a signal handler that gets called on a timer.
Full signals for GCC-compiled programs would be implemented by Cygwin which should give you timer signals and so on. Since the standard way to upgrade GCC under cygwin is to use the cygwin upgrade/package manager, they can just make the new GCC package depend on an updated cygwin DLL which could set the correct flag for you in a thunk before passing on the signal.
Don't bother trying to compile GCC yourself under cygwin; it's quite painful, or at least time-consuming. The slower process spawning made configure take an hour or more the last time I tried it, a few years ago. And then you have to wait for make bootstrap to finish.
Then again, MS isn't notorious for following standards. If this does show up under windows (say when starting an SEH handler) they'll just say that that's the windows ABI and ignore it.
Hell, it might even be different under win98/XP/Vista, as they are different kernels.
You lose one CPU cycle?
Religion is what happens when nature strikes and groupthink goes wrong.
You don't get it either. In a signal-enabled environment there can't be any policy that would ensure the deterministic state of the flag, even if you set it explicitly before each flag-dependent operation. The only way to fix the problem is to make sure that the signal-routing environment meticulously stores and restores the value when handling interruptions. This was not done. This is the problem in question. It was there all along and it is not related to any compilers. The current version of GCC was simply more likely to reveal it (and it did reveal it), but the problem itself was there since the beginning of time and can lead to problems with any version of GCC, or any other compiler.
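To make the store-and-restore idea concrete, here is a rough Python toy model, not kernel code. The names `deliver_signal` and `handler` are made up for illustration. It also assumes that the signal-routing code clears DF on handler entry, which is how I read the ABI remark in the summary; the only x86 fact it relies on is that DF is bit 10 of EFLAGS.

```python
# Toy model of signal delivery; the names here are illustrative, not kernel APIs.
DF = 0x400  # bit 10 of EFLAGS is the direction flag on x86


def deliver_signal(flags, handler, clear_df=True):
    """Run `handler` roughly as a kernel might.

    The interrupted code's flags are saved, the handler optionally starts
    with DF cleared (an assumption of this sketch), and the saved flags are
    what the interrupted code gets back afterwards.
    Returns (restored_flags, handler_result).
    """
    saved = flags
    handler_flags = flags & ~DF if clear_df else flags
    result = handler(handler_flags)
    return saved, result


def handler(flags):
    """Hypothetical handler that reports whether it was entered with DF set."""
    return bool(flags & DF)


interrupted = 0x202 | DF  # interrupted code was in the middle of a backward copy
print(deliver_signal(interrupted, handler))                  # handler sees DF clear
print(deliver_signal(interrupted, handler, clear_df=False))  # handler inherits DF
```

In both calls the interrupted code gets its original flags back; the only difference is whether the handler starts from a known DF state or inherits whatever the interrupted code happened to be doing.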
I just heard that this has seriously set back the release date of Duke Nukem Forever!
Actually, no. Two threads will work just fine, because the state of the CPU in its entirety (all flags) is saved and restored when switching between them. Indeed, if it wasn't, simply clearing the flag before using it wouldn't help any, because a task switch can occur between any two instructions, including the one clearing the flag and the one immediately following, which makes use of the now-cleared flag.
No, the problem is in signal handlers, which are the software-level equivalent of interrupts. When a thread receives a signal, and a handler has been registered, it immediately interrupts what it was doing and executes the handler function. Or, more precisely, the kernel switches the point of execution to the start of that function. Now, the problem is that the spec says that a certain flag should be cleared whenever a function starts, and the kernel doesn't make sure it is. It didn't matter previously, because GCC generated code to clear it anyway; however, this is redundant according to the spec, so it was dropped.
So, to sum it up: this has nothing to do with threading and can affect single-threaded programs just fine.
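For anyone who wants to see what "entered with the flag set" does to a string operation, here is a small Python toy model of `rep movsb`. It is only a sketch: `rep_movsb` and `handler_copy` are hypothetical names, the "memory" is a ten-byte buffer, and no real signal is involved. It just shows the copy walking the wrong way when DF is 1.

```python
# Toy model of x86 `rep movsb`; not real CPU or kernel code.

def rep_movsb(mem, dst, src, count, df):
    """Copy `count` bytes inside `mem`, stepping +1 if df == 0, else -1."""
    step = -1 if df else 1
    for _ in range(count):
        mem[dst] = mem[src]
        dst += step
        src += step
    return mem


def handler_copy(df_on_entry):
    """Hypothetical handler body that copies 3 bytes and trusts the ABI
    (DF clear on entry) instead of clearing the flag itself."""
    mem = bytearray(b"..ABC.....")  # source "ABC" at 2..4, destination at 6..8
    return bytes(rep_movsb(mem, dst=6, src=2, count=3, df=df_on_entry))


print(handler_copy(0))  # b'..ABC.ABC.'  the copy lands where intended
print(handler_copy(1))  # b'..AB..A...'  the copy runs backward and clobbers the source
```

With DF set, only the first byte goes where it should; the other two writes land below the destination, which is the kind of stray overwrite a handler interrupting a backward copy could trigger.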
This bug could conceivably cause parts of a program's memory to be overwritten by the contents of a string. It isn't unthinkable that this might enable a foreign code execution attack on the program.
Although it does seem pretty unlikely that anyone would do string copying in a signal handler...
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
Please choose the statement that best describes you:
A) I want to develop programs that are, theoretically, infinitesimally faster, even though they crash whenever I run them in practice.
B) I want to force those annoying kernel developer fucktards to follow the damn specification.
C) I want my software to work reliably, even though it means sacrificing performance and putting up with fucktards.
If you chose A, academia might be right for you.
If you chose B, consider the public sector.
If you chose C, you might be suitable for a career in software development.
http://xkcd.com/756//