AMD Confirms CPU Bug Found By DragonFly BSD's Matt Dillon
An anonymous reader writes "Matt Dillon of DragonFly BSD just announced that AMD confirmed a CPU bug he found. Matt quotes part of the mail exchange and it looks like 'consecutive back-to-back pops and (near) return instructions can create a condition where the processor incorrectly updates the stack pointer.' The specific manifestations in DragonFly were random segmentation faults under heavy load."
Though, it's still very serious. At least it generally causes your program to crash rather than spitting out a wrong answer. And it sounds like the sequence of instructions that causes it is not commonly found.
I can well understand the guy who found it being all excited. The CPU is the last place you'd look for a bug, and finding one is pretty impressive, especially a really elusive one like this.
Need a Python, C++, Unix, Linux develop
I'm wondering if they will. This seems like a very odd timing issue that may be a problem in the electronics. Of course, I suppose they could just put in some microcode to wait after certain operations to make sure things settle and so avoid the hardware bug.
What has Taiwan got to do with this ?
I mean, was the CPU bug somehow introduced by TSMC ?
Muchas Gracias, Señor Edward Snowden !
I suppose to be sure he is not confused with the other Matt Dillon.
It matters because it's impressive. It also seems fair to associate some of the positive impression with DragonflyBSD, and I cannot see any downside to throwing good PR at any BSD flavor.
I can only imagine the time and effort spent on tracking down this problem - a rare CPU condition is exponentially more difficult to narrow down than most programming mistakes. A lot of progress in IT depends on engineers like this, who obsessively solve problems even when it's much easier to just ignore them, try to hack around them or pass the buck around. Kudos.
A pertinent addition to the submission would be which CPUs have been found to be affected.
The second link says Opteron 6168 and Phenom II X4 820. For a second I thought that Bulldozer hadn't managed to do anything right, but these two examples are pre-Bulldozer.
No doubt this is not an exhaustive list.
Matt Dillon is a rather famous programmer (as programmers go). I assume that's why they mention him by name. I think a very large percentage of old Amiga hackers know who he is. He's also done work on the Linux kernel. Despite all that, he's best known for his work on FreeBSD and on his DragonflyBSD project. While a lot of old timers will know that, not everyone else will.
Because now we know not only that something cool was done, but also who did it. Both are relevant.
"First they came for the slanderers and i said nothing."
FWIW:
The failure has been observed on three different machines, all running AMD cpus. A quad opteron 6168 (48 core) box, and two Phenom II x4 820 boxes.
When information is power, privacy is freedom.
I have to worry about stack smashing bugs here... can there be a way for (say) a data pattern in a media file, or carefully crafted javascript or java code that's been JIT-compiled, to break out of its sandbox? What about a hostile OS kernel running inside a VPS container taking over the hypervisor or bare iron? Hmm.
A floating point precision error. Floating points cannot represent quite a diverse collection of numbers, this is especially problematic when you're doing intersections with small objects. Say a ray projected from an object will, because of the minute errors in floating point, collide with the same object (which produces some cool patterns).
Floating points are kind of crappy. Not that I have a better option with viable performance on a desktop machine. That's not a division bug, that's just the nature of representing numbers in binary with a fixed number of bits.
Matt Dillon, desperate after unsuccessfully chasing Mary in There's Something About Mary, radically changed jobs and started to study computer science...
Slashdot, fix the reply notifications... You won't get away with it...
Division is division, regardless of the base used. The issue is that in base 10 (aka decimal numbers), division by 2 and 5 always comes out to a finite decimal; in binary numbers only division by 2 comes out to a finite decimal. Dividing by any primes other than 2 and 5 (and numbers involving those primes) will require rounding in both bases (and they may not necessarily round the same way). That is, unless you're only dividing by combinations of 2 and 5, there really is no preferred base.
The main problem with a QBASIC "single" (a 32-bit float) is the extremely limited precision of that type, and not so much how rounding is done. Most calculators these days can handle 8-12 digits, you have to use 64-bit floats (a QBASIC "double") to get anything like that from your program.
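To make the two points above concrete, here is a minimal Python sketch. It assumes that rounding through `struct` with the `"f"` format is a fair stand-in for a QBASIC "single"; the helper name `to_float32` is made up for this illustration.

```python
import struct
from fractions import Fraction


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# 1/2 has a power-of-two denominator, so binary stores it exactly.
half_is_exact = Fraction(0.5) == Fraction(1, 2)

# 1/10 involves the prime 5, so binary has to round it.
tenth_is_exact = Fraction(0.1) == Fraction(1, 10)

# A 32-bit float has a 24-bit significand, so 2**24 + 1 cannot be stored.
big = float(2**24 + 1)
big_as_single = to_float32(big)

print(half_is_exact, tenth_is_exact)
print(big, big_as_single)
```

The last line shows an eight-digit integer already losing its final digit in single precision, which is roughly the "8-12 digits" gap described above.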
Pop two off the stack and ret to the calling routine seems fairly common to me. Lots of functions use two arguments and are called with near calls in various programming languages.
That might have been true on 386s.
But it's 2012 now, and the most widely used instruction set for Linux on AMD processors is x86_64. Because these 64-bit processors have a large number of registers, the two arguments will be passed in registers, not on the stack. So the sequence of instructions indeed isn't common.
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
You try and find something that "the other guy" had a problem with and bring it up as worse so as to try and "protect" the thing you are a fan about? Because I see nothing about the FDIV bug anywhere but your post.
Oh and you know what that bug applied to, right? The Intel Pentium, the ORIGINAL Pentium. Not the Pentium MMX, not the Pentium Pro, not the Pentium II, not the Pentium III, not the Pentium 4, not the Core, not the Core 2, not the Core i, not the second generation Core i. And yes, that's how many major processor versions from Intel there have been since then (with another to launch in the next couple weeks). The original Pentium chips that had this problem came out almost 2 decades ago, 1993.
So seriously, leave off it. I get tired of any time there is a problem with $Product_X fans of it will point out how $Product_Y had a similar or worse error way back in the day and that somehow changes things.
No it doesn't. The story is about the AMD chips, nobody gives a shit about the FDIV bug and I'll wager there are people reading Slashdot who weren't alive when it happened.
The good news for AMD is that processors can often patch around this shit in microcode these days so a recall may not be needed. Have to see, but the potential is there for a software (so to speak) fix.
Presumably AMD will announce affected CPUs fairly soon, after they get done testing. This isn't the kind of thing they would be able to sit on, even if they wanted to. If your CPU has been working for you in general it isn't like it is going to suddenly go and beat up your cat or something, it'll be fine for a bit longer while AMD figures out which ones are all affected and figures out how to fix it.
As I noted in another post, depending on the bug it may be possible to fix it via microcode. CPUs aren't "pure" hardware these days. They have a bit of software that tells them how to do things and on some of them (Intel CPUs I know for sure) it is field upgradable. So they may find a way to patch out the bug.
Just keep an eye on their page, maybe send them an e-mail saying you'd like a notice when they know. Should be soon I'd imagine.
This would be insightful and all -- except that it isn't -- because DragonFly BSD uses the same x86-64 calling conventions as Linux.
512bit calculations aren't that expensive
Yes they are.
"His name was James Damore."
Except I very much doubt that would solve whatever "problems" this guy was having. As a newbie programmer, it's entirely understandable that he wouldn't know about the fun you can (or can't) have with floating point operations. However, I very much doubt that sheer accuracy was the issue, rather he was probably making assumptions such as 1.0 - 1.0 == 0.0, when in reality the result isn't necessarily exactly 0.0. Considering it's an MMO, he probably had something like "Why is this guy not dying, he has 4 HP left and this attack does exactly 4 damage? Must be a bug!".
Really, it doesn't matter a huge amount, if such "accuracy" is important to your game then instead of doing "if (Health is less than 0.0) /* die */", you do something like "if (Health is less than 0.0 + epsilon) /* die */", with "epsilon" being a very small number (such as 0.00000001).
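The epsilon comparison suggested in this comment can be sketched in Python as follows. The function names, the ten hits of 0.1, and the tolerance constant are assumptions picked for illustration; only the "0.0 + epsilon" idea comes from the comment.

```python
EPSILON = 1e-8  # hypothetical tolerance, the "very small number" from the comment


def is_dead_naive(health: float) -> bool:
    """Direct comparison against zero, as in the first form quoted above."""
    return health < 0.0


def is_dead(health: float, epsilon: float = EPSILON) -> bool:
    """Comparison with a small tolerance, as in the second form quoted above."""
    return health < 0.0 + epsilon


health = 1.0
for _ in range(10):
    health -= 0.1  # ten hits of 0.1 "should" leave exactly 0.0

print(health)
print(is_dead_naive(health), is_dead(health))
```

After the loop, `health` is a tiny positive number rather than 0.0, so only the tolerant check reports the character as dead.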
The real fun with floats, however, is that each platform does something different. It's possible that the OP ran the game on Intel hardware and got one result (which may have seemed more "correct"), then ran it on an AMD machine and got a different (seemingly less-correct) result - you can see why he naturally jumped to the conclusion that the AMD system had a bug.
In reality, chances are both systems were "wrong" anyway, they just happen to use different implementations for floating-point logic. To solve this, once again higher rates of calculations aren't the answer, but rather there's a compiler switch (/fp:strict in VS) that will use the ISO standard floating point model. It's not as fast as the other methods, but you will at least get the same results across different platforms (assuming that CPU has implemented the standard correctly, which these days is almost certain).
There's LOTS of fantastic info on this here: http://gafferongames.com/networking-for-game-programmers/floating-point-determinism/
+1 IDisagreeSoHeMustBeATrollOrAnAstroturferOrAShill
Multiplying and dividing are the least of your worries in floating point. Adding and subtracting are where the real problems happen.
eg.
float a = 0.1;
float b = 0.2;
if (a == b) {
print("Before the add, a is equal to b");
}
float c = 10000000;
a += c;
b += c;
if (a == b) {
print("After the add, a is equal to b");
}
What's the output?
What happens if you multiply by c instead of adding it?
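For anyone who wants to try the two questions above, here is a rough Python sketch. Python floats are 64-bit, so it assumes that rounding every intermediate result through `struct` with the `"f"` format is an adequate emulation of a 32-bit `float`; the helper `f32` is a made-up name.

```python
import struct


def f32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


a = f32(0.1)
b = f32(0.2)
c = f32(10000000.0)

# Compare a and b before touching them, after adding c, and after
# multiplying by c, rounding each result back to single precision.
equal_before = a == b
equal_after_add = f32(a + c) == f32(b + c)
equal_after_mul = f32(a * c) == f32(b * c)

print(equal_before, equal_after_add, equal_after_mul)
```

Near 10,000,000 adjacent single-precision values are 1.0 apart, so the addition swallows both 0.1 and 0.2 and the two sums compare equal. Multiplication scales the values instead of absorbing them, so the products stay distinct.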
No sig today...
Indeed, even WIN64 uses the same essential calling convention because the hardware itself is designed with it in mind.. specifically the 64-bit structured exception handling requires 16-byte aligned read and write operations by design.
"His name was James Damore."
I'm pretty sure it was with the introduction of the Pentium (which had the famous FDIV bug) that John Carmack officially made the switch to single precision FP for most things because it was finally fast enough. FP wasn't cheap, per se, but the simplification it brings over keeping track of binary points and precision/range tradeoffs in integerized algorithms should not be underestimated either.
For example, if I want to do a floating point multiply and add, I just say: f3 = f0 * f1 + f2. Before I even start writing a fixed-point multiply and add, I need to ask what the Q points (binary points) are for each of the terms, what Q point you'd like for the result, and what sort of rounding (if any) the result requires for stability. You can end up with a monstrosity like this, assuming all four numbers are at the same Q point:
x3 = (int)(((long long)x0 * x1 + (1LL << (Q - 1))) >> Q) + x2;
Ok, maybe you hide that behind a macro, but what about cases where some of the terms are at different Q points? A fully general macro (which is no fun to write, BTW) would also have a ton of arguments, and only reduce you to something like x3 = FXMULADD(x0, Q0, x1, Q1, x2, Q2, Q3); which won't win you any awards in the clarity department.
And look at the operations themselves, too. You have type promotion, extra adds and shifts... the instruction sequence itself isn't super efficient. It pays off when floating point takes 10s and 100s of cycles, but is a dubious win when most of the core FP starts coming down into the single digits. With the Pentium's dual pipes and the fact you could keep integer instructions flowing in parallel to the float, that's effectively what happened. And notice we haven't even talked about dynamic range and overflow errors and how they screw you up. If you have to add tests for that... yuck. With floating point, you degrade gracefully if your dynamic range spikes a little higher than you expect.
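The rounded fixed-point multiply-add described above can be sketched in Python like this. The choice of Q = 16 and the helper names are assumptions for illustration. Python integers never overflow, so the sketch leaves out the 64-bit promotion and the range checks that a C version would need.

```python
Q = 16  # hypothetical choice: 16 fractional bits


def to_fixed(value: float, q: int = Q) -> int:
    """Convert a real number to a fixed-point integer at Q point q."""
    return int(round(value * (1 << q)))


def to_float(fixed: int, q: int = Q) -> float:
    """Convert a fixed-point integer at Q point q back to a real number."""
    return fixed / (1 << q)


def fx_mul_add(x0: int, x1: int, x2: int, q: int = Q) -> int:
    """Compute x0 * x1 + x2 with every term and the result at Q point q."""
    product = x0 * x1                   # the raw product sits at Q point 2*q
    rounded = product + (1 << (q - 1))  # add half an LSB to round to nearest
    return (rounded >> q) + x2          # shift back to Q point q, then add


x3 = fx_mul_add(to_fixed(1.5), to_fixed(2.25), to_fixed(0.5))
print(x3, to_float(x3))
```

Even in this easy case, where all terms share one Q point, the bookkeeping takes three helpers, while the floating point version is a single expression.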
Anyway, getting back on topic: This isn't the first time an x86 has had a stack-pointer related bug. I remember the 80386s that had the so-called "POPAD bug". That one was a bit easier to hit.
Hopefully, AMD will be able to publish a microcode update or something to work around theirs. That's one thing modern x86s have over their predecessors: A good number of CPU bugs can be patched around with microcode updates. I believe Intel added that with the Pentium Pro, and AMD followed suit. I believe my Phenom is one of the affected parts. I guess I'll have to keep an eye out for such a patch.
Program Intellivision!
AMD has indicated to me that the Bulldozer is not affected, which is a relief.
I guess I should have realized this would get slashdotted. In any case, it took quite a bit of effort to track the bug down. It was very difficult to reproduce reliably. It isn't a show stopper in that it really takes a lot of work to get it to happen and most people will never see it, but it's certainly a significant bug owing to the fact that it can be reproduced with normal instruction sequences.
I began to suspect it might be a cpu bug last year and after exhaustive testing I posted my suspicions in December:
http://leaf.dragonflybsd.org/mailarchive/kernel/2011-12/msg00025.html
Older versions of GCC were more prone to generate the sequence of POPs + RET, coupled with a deep recursion and other stack state, that could result in the bug. It just so happened that DragonFly's buildworld hit the right combination inside gcc, and even then the bug only occurred sometimes and only on a small subset of .c files being compiled (like maybe 2-3 files). The bug never manifested anywhere else, doing anything else, running any other application. Ever.
In particular the bug disappeared with later versions of GCC and disappeared when I messed with the optimizations. We use -O by default, not -O2. The bug disappeared when I produced code with gcc -O2 (using 4.4.7).
It is really unlikely that Linux is affected... the sensitivity to particular code sequences laid out in the compiler is so fine that adding a single instruction virtually anywhere could make the bug disappear. Even just shifting the stack pointer a little bit would make it disappear.
In any case, for a programmer like me being able to find an honest-to-god cpu bug in a modern cpu is very cool :-)
-Matt
What's really amusing is that I've been on the scene for so long if you google my name 'Matthew Dillon', the first entry is actually... me! And not the actor(s). I'm sure that grinds a bit but I do bask in the occasional fan mail reaching my inbox, just before I hit the 'delete' key.
In recent years it's started to flip back and forth, and I expect Hollywood will again take over the top spot after things die down again :-)
-Matt
Intel has had quite a few serious chip bugs too, all in errata. A number of new cpu bugs always appear in new generations of both AMD and Intel chips, but both companies have very large test suites and the number of new bugs goes down in every generation.
Don't forget that Intel had to recall a sandybridge chipset early in the sandybridge cycle, which cost them something like a billion dollars because the related motherboards had to be thrown away and replaced. That was due to internal on-chip circuitry related to a SATA port burning out.
Right at this moment AMD has two issues facing it in order to compete on workstations: (1) Power and (2) Performance. Their initial bulldozer release clearly depends too much on compiler optimizations to make full use of the architecture. They will clearly have to bulk-up some of the simplifications they made that made their cpu cores a little too sensitive to instruction sequences generated by compilers and I hope their next few releases will do better.
On power consumption it comes down to the Fab as much as anything else. Their dependence on the Fab is clearly a problem and they've made a break for it to try to solve it, even though it is costing them dearly. At the same time Intel has made some major advances in their three fabs, to the point where Intel can do their entire production on just two of those three fabs now but they decided to keep the third fab because they think they can 'grow into' it.
So AMD definitely has some work ahead of it, and I am hoping they reserve some of their focus for the high-end and don't concentrate entirely on laptops. I always like to say that I love AMD, but in the stock market I invest in Intel. That's just business. But I got on the AMD bandwagon big-time when they got to 64-bit first and I stuck with them all the way through the Phenom II.
Now, at this moment, Intel's SandyBridge has the best value and AMDs bulldozer is quite far behind, so new purchases for me right now are Intel. That may change in the next year or two and when it does my new purchases will happily be in the AMD camp again. Frankly, AMD only has to get within shouting distance (~8%) of Intel and I will happily use AMD. AMD doesn't have to beat Intel.
I think there are a number of things AMD can do right now to compete better with Intel. One of the biggest is in the mini-server department (albeit clearly with lower volumes than their current focus on laptops & integrated graphics). AMD consumer cpus (aka Phenom II) always had ECC support but very few motherboards actually supported it, which made it difficult to use AMD for mini-servers and avoid the Intel Xeon tax to get ECC. If AMD worked on the mobo vendors to ALWAYS support an ECC option that would allow them to compete against Intel Xeons on price, even if they are unable to compete on performance.
On the opterons AMD clearly has the right idea going with high-core-count cpus, but the memory subsystem is lagging too much to really be able to make use of all those cores. That seems to be low-hanging fruit to me, something which should be readily addressable by AMD. The opterons still have a lot of value and potentially can have a radical improvement in value with Bulldozer, but only if AMD can push the core count and improve the memory subsystem.
On large multi-core boxes AMD also needs to improve CMPXCHG and other atomic instructions in situations where contention is high. Right now multi-chip opteron systems seriously lag Intel on contended latency due to cache coherency inefficiencies. Will Bulldozer fix those latency issues? I don't know.
AMD only needs to get within shouting distance of Intel for me to buy their chips, and work their mobo producers a bit more to get better overall support for their chip's capabilities. They don't have to beat Intel.
-Matt
Once in the late 1990's we had a weird bug where FTPing or RCPing a particular file between two offices would often result in a corrupt file on the other end. We kept scratching our heads trying to figure out what could possibly be corrupting the file. FTPing it anywhere else succeeded... no corruption. Everything else between the offices seemed to work ok.
It wound up being a hardware issue with the T3 between the two offices. The hardware would corrupt the bitstream in a manner that tended to PASS the TCP/IP checksum, resulting in corrupted data. It required a particular pattern of 1's and 0's for the bitstream to be corrupted in a manner that passed the checksum, which this particular file happened to have.
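As a rough illustration of how corrupted data can still pass, here is a Python sketch of the 16-bit ones' complement sum that TCP uses, in the style of RFC 1071. The byte values are invented, and nothing here is known about what the T3 hardware actually did to the bitstream; the sketch only shows that some changes cancel out in this kind of checksum.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


original = bytes([0x12, 0x34, 0xAB, 0xCD])
swapped = bytes([0xAB, 0xCD, 0x12, 0x34])  # two 16-bit words exchanged
offset = bytes([0x12, 0x35, 0xAB, 0xCC])   # +1 in one word, -1 in the other

for payload in (original, swapped, offset):
    print(payload.hex(), hex(internet_checksum(payload)))
```

All three payloads produce the same checksum, so a receiver relying on it alone could not tell them apart.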
These days, of course, I use scp to transfer files whenever possible. SSH will detect that sort of corruption and fail with a protocol error. Encryption has certain uses beyond just encrypting the data, it seems!
-Matt
No, you are mistaken because you didn't read my post (or any of the posts above it). We're talking about floating point rounding irregularities that are present in ALL modern processors, not the floating point bug you're referring to.
In any case, there is a different floating-point bug that affected some AMD CPU's as well - http://www.reghardware.com/2006/04/28/amd_opteron_fpu_bug/
+1 IDisagreeSoHeMustBeATrollOrAnAstroturferOrAShill
I stand by my original post.
I did not take issue with the floating point irregularities. In fact, I also believe that the issues he experienced were not due to the FDIV problem he believed to be the cause. I probably would have used the fact that the last release of QuickBasic was in about 1989, before the widespread inclusion of FPUs in PCs, and that QuickBasic would almost certainly use software emulation for floating point arithmetic. It therefore would not have triggered a bug with the FDIV instruction.
What I did take issue with was your notion that he would have run on the AMD chip and seen a less accurate result. As I said, the bug he was talking about was the FDIV bug.
The idea that QuickBasic would trigger an overheating-related bug on a 2006 Opteron is even more laughable than the OP's original troll post. :p
If it's in you sig, it's in your post.