GCC 4.3.0 Exposes a Kernel Bug
ohxten sends news from earlier this month that GCC 4.3.0's new behavior of not clearing the direction flag before a string operation on x86 systems poses problems with kernels — such as Linux and BSD — that do not clear the direction flag before a signal handler is called, despite the ABI specification.
That's what happens when you don't clear that STD...
from 09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
to 45 2F 6E 40 3C DF 10 71 4E 41 DF AA 25 7D 31 3F
OK so the kernel developers add a single line of code, the bugzilla ticket is closed, and we get on to real news?
Rule #1: Don't break existing stuff
GCC breaks this cardinal rule. It should be reverted.
Better than a general fault.
An error was found and patched; now remind me why this is news?
"A bug? In the Kernel? BWAHAHA, a trivial matter" [Snaps random developer in half, ingests]
GCC 4.3.0's new behavior of not clearing the direction flag before a string operation on x86 systems poses problems with kernels -- such as Linux and BSD -- that do not clear the direction flag before a signal handler is called, despite the ABI specification.
Oh my GOD! If this is true, that means- that means-- it... the-
Uh, what does it mean exactly?
What this really exposes is not a bug in any kernel. Indeed, the story states that the "bug" exists in both the BSD and Linux kernels. It really exposes something fascinating about the development process: Code is written based on certain assumptions and a working theory of how the code will function once put into use, but the only way to really know how well it works is to hand it over to the ultimate judge of code correctness--the computer--by running the code. If it works, case closed. Now it's entirely possible that the kernel developers never heard of this obscure nuance of the Intel processor. Then one day, the compiler changed, and with it, the assumptions changed. Mature code that has been declared good years ago seemingly breaks. Now it's easy to blame the code, but really this is a deletion of a feature from the compiler. Nevertheless, it exposes the fact that ultimately, no matter what tools we use and no matter how well we think our code through, you can only consider the code good once it runs and appears to do what it's supposed to.
McCain/Palin '08. Now THAT's hope and change!
1991 was a long time ago. Linux is old.
Help stamp out iliturcy.
This article is not yet public for non-subscribers. The link given is supposed to be for a subscriber to forward to a friend; putting it up on Slashdot goes against the intended spirit and does not help support Linux Weekly News, which deserves the community's support.
With all due respect, an application that uses strcpy will not necessarily bring a system down (nor introduce buffer overflows, or whatever). strcpy used properly is quite safe. Sure, if you strcpy to some unknown memory address of some unknown size then this will cause problems. That is not a strcpy fault, but a programmer fault. strncpy is not inherently "better". To say that "Any application that performs a simple strcpy brings linux down" is FUD.
Microkernels have to follow processor ABIs too.
how to invest, a novice's guide
It's like you got a bunch of cars at a stoplight and you want to walk by each to panhandle for money, but instead of starting at the first car in line and then walking down to the back, you start at the first, then head out into cross traffic and get run over and something crashes.
I also wrongly assumed GP's view until I actually RTFM-ed....
Don't quote me on this.
Can someone please explain that in terms that non-LKML subscribers can understand?
To make laws that man cannot, and will not obey, serves to bring all law into contempt.
--E.C. Stanton
See, I told you we shoulda' used the Hurd!
I seem to recall the MS-DOS 2.x suffered this same problem with either the Int 21 or Int 13 interfaces. (Hey it was 20 years ago, I don't remember the details.) If you made certain BDOS calls with the direction flag set, the message "A evird rorre etirw daeR" ("Read write error drive A" backwards) would be displayed on the console. It wasn't fixed for years. I remember we rigorously enforced the "Clear the direction flag before calling into MS-DOS" rule.
http://www.urbandictionary.com/define.php?term=metric+fuckton
... 4chan ...
Now I know you're trolling --- who could be familiar with 4chan and not "fuckton"?
This is assuming the flag is unmodified from the kernel call, that is, that the string function is called or entered straight from the kernel. But if the string functions get called mid-code and the flag has been changed by some other function, say a memmove that has an overlapping source and destination, the direction flag is set (STD) and the memory is copied backwards, end-to-start, to prevent the beginning being copied over and over by the overlap.
Does GCC's memmove clear the flag (CLD)?
What if someone writes some custom inline assembly with a STD and no CLD (yes, this does violate asm practice - flipping a flag and not resetting it when done) then a string function sometime after that during the same procedure? GCC will fail.
GCC should not rely on the kernel to have the flags in a particular state upon entry, as the functions will not always be called immediately.
Maybe it's time to break that. -- Larry Wall in 199710311718.JAA19082@wall.org
;)
An appropriate quote for the bottom of the page
That means that all other compilers behave like the old GCCs in this case. Otherwise they would have exposed this bug already. So GCC's new behaviour could be seen as either non-standard or "innovative".
Debian, RedHat et al aren't going to release new packages compiled with GCC 4.3.0 for every damn binary. Instead, they'll hold back on providing an update to GCC and they won't compile any updated packages with the updated GCC until the next major release.
Of course, that's not very helpful if you depend on closed-source software and the vendor won't tell you what compiler they use. Neither is it particularly helpful if you run Gentoo (which sooner or later will expect you to upgrade compiler) or if you're in the habit of compiling packages from scratch using a compiler other than the one that shipped with your distro. But for most of us in the real world, that's not really a huge deal.
Most experienced assembler programmers know better than to assume the direction flag will be set or cleared unless this is specifically documented.
The Lumber Cartel, local 42 (Canadian branch)
British Columbia, Canada
I still use GCC 2.95.3 to compile my kernel, but the developers don't allow it to build 2.6 versions. They're too dumb to fix it and I have to use 2.4.
Anyway, why would you use a .0 version? It's like running Windows Vista RC0.
OK, I challenge you to find a user-space program that brings linux down using strcpy (as opposed to just crashing that particular program). If you are talking about kernel modules then the same is true of any OS.
They don't need to. All they need to do is release an updated kernel.
Mielipiteet omiani - Opinions personal, facts suspect.
I fixed this bug in 1989 in an Intel C compiler. That was some years before the GCC project was started. Some people never learn...
Excuse me, but please get off my Pennisetum Clandestinum, eh!
Yup, and another problem is that there are instructions that leave the direction flag undefined, a random value of either 0 or 1. Therefore one has to always explicitly set the direction flag before using it.
Does this mean that you could hand-craft some assembler code that exploits virtually all Linux and BSD-kernels out there?
True. Major distros will hold back on upgrading to gcc 4.3.0. Unless they already upgraded. For the most part, this bug will only cause headaches (and possibly suicides) to people trying to diagnose issues in their code, either because they didn't get the memo and are using gcc 4.3.0, or because they are helping someone with run-time issues who is using gcc 4.3.0. If I remember correctly, we had similar problems with gcc 4.0.x. I don't recall any reported deaths.
Write your own Choose Your Own Adventure. http://www.freegameengines.org/gamebook-engine/
Why would GCC make an assumption about a register? Shouldn't it (GCC) set the register to a known value if it needs it?
OMG OMG OMG! My kernel is vulnerable!!
- regs->flags &= ~(X86_EFLAGS_TF);
+ regs->flags &= ~(X86_EFLAGS_TF | X86_EFLAGS_DF);
make
done.
- these are not the droids you are looking for -
The document referred to is (old) SCO's ABI for System V. If Linux and BSD have not been following this ABI in some respects, perhaps the solution is to have a Linux or BSD ABI that reflects real practice, rather than having a gcc that causes problems because it adheres to a System V ABI that is not being followed.
Well debian already packaged the latest glibc in sid using gcc-4.3. That is how this issue was discovered in the first place.
note: i'm known as plugwash most places but i screwed up registering that here somehow in the past and now can't register
It's a feature! Now we know who's been lazy all this while.
You really have absolutely no idea how Gentoo works, do you?
Yes, actually. I've run a whole bunch of servers running it.
I found the amount of handholding required in order to turn it into a serious system for server use relative to Debian (such as setting up your own private portage repository, custom holding back of packages, updates which are known in forums to break functionality but don't have the good grace to warn you in the ebuild first, updates which completely restructure a package into different component parts, updates which haven't been tested for backward compatibility and so break things subtly, meaning that what should be a simple emerge -U <package name> winds up becoming a complex mix of emerge --sync, editing /etc/portage/package.use and package.keywords, editing USE flags, finally emerging the updated package, the fact that you can't easily avoid this unless you're prepared to forsake security updates) really didn't gel with my idea of running a solid system.
Granted, a major update to GCC will almost certainly wind up in a slot of its own. But sooner or later the version that you're using now will be obsoleted and removed from portage altogether, at which point you either have to put the ebuild in your own private portage repository lest future emerge --update's break things or recompile anything which is at risk.
Now, most of these issues can be minimised by following practices that any good sysadmin should be anyway - for instance, setting up a test environment and making changes there first before putting them live. All this does, however, is move the risk from the live system to the test environment. It doesn't eliminate any of the work.
Maybe, but sid's the unstable repository and is intended for exactly this kind of thing. Even the Debian maintainers strongly recommend against using Sid on a production system because things are far more likely to break and they may stay broken for some time.
Apparently you still don't get it. It is not a problem with GCC. It is a bug in the kernel. GCC just helped to detect it. Now, as it's been detected, it is no longer connected to GCC in any way.
Intel is to blame. The original 8086/8088 instruction set was just dumb in this respect. Having a global value (the direction bit) that can determine the behaviour of a class of powerful instructions is a great way to generate all sorts of subtle intermittent bugs. I have been personally burned by this (badly) as have many others.
There is a policy you can enforce to try to improve things. You can try to make everyone leave the direction bit in the most common state after they are done with their less common use of the string instructions. This can work if the policy is enforced by something like a compiler. It won't work if the program is for instance called by another entity outside the control of the compiler. ...such as, for instance, a kernel calling a signal handler... You end up with a state of affairs where you have to depend on having some other programmer remember to set the bit to the right state to have your string instructions work right. You can't test for this as the bit might be right almost all the time. This is simply a poor approach.
The only fix that can work reliably for your code is to have the compiler ensure that the state of the direction bit is known before any string instructions are executed. If I am, for instance, using a C compiler, I should not have to hear about the 8086/8088 string direction bit. ...ever...
The kernel people should fix their failure to respect the ABI policy. The GCC people should revert to the old more deterministic handling of string instructions. The almost negligible optimization here is simply not worth generating a lot of intermittent, hard to find problems (there are likely more out there). If other compilers do not make their string functions entirely deterministic in the face of all external influences then those other compilers are doing it wrong. We can't fix the hardware architecture so this is a case where defensive programming is the best that can be done.
Bruce
I just heard that this has seriously set back the release date of Duke Nukem Forever!
I used to do assembler, and I can't think of any one time I actually used STD. I often issued CLD when writing interrupt handlers, because that was the safe practice, but is it really that useful to reverse string scans at the opcode level ? I can't think of that many places where it would be useful, easily replaced with a manually decremented loop that's not much slower. It always seemed like a risky thing to do in the first place, and I was never fond of issuing CLD all the time "just in case".
My rant doesn't solve the kernel issue, we'll have to deal with legacy code forever :/
-Billco, Fnarg.com
Sorry, I know I need to brush up; how many libraries of congress to a fuckton on average?
If it was "exposed" by changing the compiler, the bug is in the compiler.
The kernels compiled with earlier GCC versions don't have the bug, right?
If someone changes the rules, and expects everyone to know it beforehand, it would be a fault in that guy, not the people abiding by the old rules.
yeah, on the other hand it is only because some people do use sid that bugs get spotted before they get a chance to make it into testing.
Did some poking around and it looks like FreeBSD have fixed it for the -CURRENT builds : http://groups.google.com/group/mailing.freebsd.current/browse_thread/thread/3df4366ff396a60b/bfc90b9b0a478628/
Please choose the statement that best describes you:
A) I want to develop programs that are, theoretically, infinitesimally faster, even though they crash whenever I run them in practice.
B) I want to force those annoying kernel developer fucktards to follow the damn specification.
C) I want my software to work reliably, even though it means sacrificing performance and putting up with fucktards.
If you chose A, academia might be right for you.
If you chose B, consider the public sector.
If you chose C, you might be suitable for a career in software development.
http://xkcd.com/756//
I have been writing assembler code for x86 since the beginning. It has always been the coder's responsibility to ensure the direction flag is set appropriately before using a repeating instruction. My favorite was "repne scasb". In the old days, we would do pushf ! cli ........ popf to set the direction flag and then put it back where it was before. This used to work in the 8086 days. When reviewing assembler code, I often ask where the direction flag is set when I see a repeating instruction. Not setting it explicitly is risky coding.
Having separate privileges assigned to user accounts rather than a global root is one of the ways that windows is better than traditional unix/linux. W