Erratum Plagues Quad-Core Opterons, Phenoms
theraindog writes "Errata are not uncommon with new processors, but a problem with the TLB logic in AMD's quad-core Opteron and Phenom processors appears to be quite serious. The erratum is so severe that AMD has issued a 'stop ship' order on all quad-core Opterons. AMD has also blamed this bug for the delay of the 2.4GHz Phenom, despite the fact that the erratum is unrelated to clock speed. A BIOS-based workaround for the issue has been made available to motherboard makers, but it apparently carries a 10-20% performance penalty. What's more disturbing is that AMD knew of the erratum and the potential performance hit associated with fixing it before it launched the Phenom processor. Hardware provided to the press for reviews did not include the fix, conveniently overstating Phenom performance."
I'm a geek and all, but I've never heard of an erratum.
But dictionary.com is your friend.
Design errors and mistakes in a CPU's hardwired microcode may also be referred to as errata. One well-publicised example is Intel's "flag" erratum in early Pentium Pro processors. This made the conversion of floating point numbers to integers unreliable due to an exception not being signaled under certain conditions.
Thus concludes another episode of Short Answers To Stupid Questions.
Good thing it's just a patch, as opposed to a derived work of someone else's GPLed code. I wonder what the FSF guys would say about that. I also wonder: Red Hat, why?
"Believe me!" -- Donald Trump
AMD can turn this into a PR boon to one-up Intel at the "Green" initiatives. All they have to do is repurpose the uncut wafers of these chips as solar panels and then retile the outside of all their buildings with the panels. This will save money on their energy bills and they can even start a new Ad Campaign:
"AMD Outside".
Wow, bad times for AMD. They're losing the war against Intel, and now have another setback. A 20% performance penalty is simply unacceptable for any processor. The fact that it is for brand new ones makes it an even bigger slap in the face for consumers.
In 3.... 2... 0.9999921341...
AMD has also blamed this bug for the delay of the 2.4GHz Phenom, despite the fact that the erratum is unrelated to clock speed. [Emphasis added.]
Why does the summary claim this? I read through both articles, and AMD says this is a hardware issue across both chip models. Since this is a hardware issue, wouldn't it stand to reason that AMD would hold up a related chip because it's a hardware bug across both chip models and not because it's a clock speed issue? I'm not sure where the "despite" comes into play. I didn't see where the article said that AMD is not delaying a different speed Phenom.
It's not like there aren't problems with Intel's CPUs - just take a look at the problems with the MMU in the Core 2 - but no-one is suggesting Intel is doomed. It would just be better if AMD had admitted this when they first knew about the issue rather than sending out review units that are known to have serious issues.
My good old Opteron 170 had the same stupid issue with unsynched core clocks. What is new here?
The patch is under the NDA, the kernel is under GPL, so the resulting work (patched kernel) can't be distributed, because the licenses are incompatible.
The GPL only applies to redistribution. Private-use changes don't have to be GPL'd.
IANAL,TIJHIUI (I Am Not A Lawyer, This Is Just How I Understand It).
Intel's Core 2 also had a problem with the TLB when first released, although that problem manifested itself as data corruption instead of a lockup. Here are the two articles from The Inquirer about it - the second one especially. And note that this document was released after Intel had shipped the buggy Core 2s.
However, Intel was able to fix it without incurring a large performance loss. It's a shame for AMD that they weren't able to do the same.
AMD admitted there were errors in the early Phenom CPUs back before launch. They even put it in their presentations in the press conferences and such. They also said before launch that they were going to include the proper fix in the revised core used in the higher end Phenom, hence the delay.
At least in the graphics world, "faster and usually correct" is acceptable.
There's no failure quite as dissatisfying as a complete and total solution to the wrong problem.
The idea was to gain some cash to sustain operations until a faultless (i.e. no major faults) CPU can be released. Those that bought faulty CPUs will get their CPUs replaced as soon as faultless CPUs are completed. In some sense you can look at AMD's action as taking out a long term loan.
A counterargument to my theory is that AMD would not risk its reputation to take out a "cash loan" in such a manner. However, the risk of losing reputation is justified if we consider another major factor at play: the holidays. It is less likely that AMD would have seen the same (or even close to the same) cash flows if it had released the CPUs after the holidays.
AMD now has some cash and is able to breathe a little bit. When it releases fixed CPUs it will be able to continue where it left off.
Ironically, these may turn into the CPUs du jour for Linux users...
The performance hit is probably 10% when patching the microcode, which should mean steep price mark-downs on this generation of CPUs. But it's only a 1% performance hit when patching the (Linux) kernel.
So why doesn't every OEM that sells Linux servers and desktops just buy up all of AMD's supplies of defective chips at a big discount, and pass the savings along? I'd buy a couple.
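To put rough numbers on that, here is a minimal sketch of the price/performance arithmetic. The 10% and 1% penalties are the guesses from above; the 15% discount and both function names are made up purely for illustration, not anything AMD or an OEM has announced.

```python
def relative_value(perf_penalty: float, discount: float) -> float:
    """Performance per dollar relative to an unaffected chip at full price.

    1.0 means the same value as the unaffected chip; above 1.0 is a better deal.
    Both arguments are fractions, e.g. 0.10 for 10%.
    """
    if not 0.0 <= perf_penalty < 1.0:
        raise ValueError("perf_penalty must be in [0, 1)")
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return (1.0 - perf_penalty) / (1.0 - discount)


def break_even_discount(perf_penalty: float) -> float:
    """Discount at which performance per dollar matches the unaffected chip.

    Solving (1 - penalty) / (1 - discount) = 1 gives discount = penalty.
    """
    if not 0.0 <= perf_penalty < 1.0:
        raise ValueError("perf_penalty must be in [0, 1)")
    return perf_penalty


if __name__ == "__main__":
    hypothetical_discount = 0.15  # assumption, not a real price cut
    for label, penalty in (("microcode/BIOS fix", 0.10), ("kernel patch", 0.01)):
        value = relative_value(penalty, hypothetical_discount)
        print(f"{label}: break-even discount {break_even_discount(penalty):.0%}, "
              f"relative value at {hypothetical_discount:.0%} off: {value:.3f}")
```

The point is just that the break-even discount equals the penalty, so any mark-down sized for the 10% hit is nearly pure gain for someone who only eats the 1% hit.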