Erratum Plagues Quad-Core Opterons, Phenoms

Posted by kdawson on Tuesday December 4, 2007 @11:43AM from the correct-or-fast-choose-at-most-one dept.

theraindog writes "Errata are not uncommon with new processors, but a problem with the TLB logic in AMD's quad-core Opteron and Phenom processors appears to be quite serious. The erratum is so severe that AMD has issued a 'stop ship' order on all quad-core Opterons. AMD has also blamed this bug for the delay of the 2.4GHz Phenom, despite the fact that the erratum is unrelated to clock speed. A BIOS-based workaround for the issue has been made available to motherboard makers, but it apparently carries a 10-20% performance penalty. What's more disturbing is that AMD knew of the erratum and the potential performance hit associated with fixing it before it launched the Phenom processor. Hardware provided to the press for reviews did not include the fix, conveniently overstating Phenom performance."

Bummer by El+Pollo+Loco · 2007-12-04 11:55 · Score: 3, Insightful

Wow, bad times for AMD. They're losing the war against Intel, and now have another setback. A 20% performance penalty is simply unacceptable for any processor. The fact that it is for brand new ones makes it an even bigger slap in the face for consumers.
Re:What??? by fitten · 2007-12-04 12:02 · Score: 4, Insightful

Every CPU maker publishes the errata for their CPUs because system designers/vendors/whatever need to know these things. Every CPU made for the past (insert very long time in the computer world here) has had a big list of errata publicly published. Just go to the Intel or AMD site, for example, and look up the errata on the PPro, P3, P4, Core, Core2, Athlon, Athlon XP, Athlon64, Athlon64 X2, or whatever your favorite CPU happens to be.

The thing is, the CPU is actually broken a bit and AMD has pulled the Barcelona line but are continuing to sell the Phenom(inal Failure) line to customers and, evidently, don't plan to 'fix' the problem later (Intel offered replacements for the Pentium floating point bug after they got dinged on it, for example... I know... I had one and replaced it).

So... if you actually get your hands on (or got your hands on) a Phenom, realize you have a broken CPU and the more you load it, the more likely you'll have stability issues.... and AMD isn't (currently) going to fix it.
Re:NDA for patch? by wizardforce · 2007-12-04 12:02 · Score: 2, Insightful

I also wonder: Red Hat, why?
I imagine that their reasoning was that it was better to offer a patch, closed or not, that benefited those of their users who chose to make use of this processor. The solution isn't elegant, more like repairing an aircraft's hull with duct tape, but apparently it is better than the alternatives they tried.

--
Sigs are too short to say anything truly profound so read the above post instead.
"because", not "despite" by statemachine · 2007-12-04 12:02 · Score: 5, Insightful

AMD has also blamed this bug for the delay of the 2.4GHz Phenom, despite the fact that the erratum is unrelated to clock speed. [Emphasis added.]

Why does the summary claim this? I read through both articles, and AMD says this is a hardware issue across both chip models. Since this is a hardware issue, wouldn't it stand to reason that AMD would hold up a related chip because it's a hardware bug across both chip models and not because it's a clock speed issue? I'm not sure where the "despite" comes into play. I didn't see where the article said that AMD is not delaying a different speed Phenom.
No, but it looks bad by _merlin · 2007-12-04 12:06 · Score: 5, Insightful

It's not like there aren't problems with Intel's CPUs - just take a look at the problems with the MMU in the Core 2 - but no-one is suggesting Intel is doomed. It would just be better if AMD had admitted this when they first knew about the issue rather than sending out review units that are known to have serious issues.
1. Re:No, but it looks bad by ceoyoyo · 2007-12-04 13:18 · Score: 4, Insightful
  
  No, but AMD seems to be in a pretty delicate state. Their stock is pretty low and they've taken a beating from a newly-competitive Intel. They don't have a big advantage in processor speed anymore, nor power, nor even price. Halting shipment on an entire line? Not good. If they eventually have to recall it... bad.
  
  It might not be AMD's doom, but they're really not that many big screwups away.
Expect Theo de Raadt by andreyvul · 2007-12-04 12:09 · Score: 1, Insightful

to make a big issue out of this, as he did with the Core 2 errata.

(For mods that are troll-tag-happy, Theo de Raadt is the maintainer of OpenBSD and is security paranoid.)

--
proud caffeine whore
Re:NDA for patch? by Crispy+Critters · 2007-12-04 12:16 · Score: 4, Insightful

It is silly to think that RH is ignoring the GPL.
There are other possibilities that are more likely. For example, perhaps the patched kernel is doing something like loading microcode into the processor. The kernel code would be GPLed but the microcode would not be.
Depends ... by Pinky's+Brain · 2007-12-04 12:18 · Score: 2, Insightful

As long as the diff doesn't contain any of the original code and the patch is distributed in isolation, there is no conflict with the GPL ... if RH distributes a binary kernel, though, then they are in violation of the GPL. This would make RH liable, but I don't know whether your rights under the GPL or the prohibitions under the NDA take precedence for the recipient.
Re:in real life... by Anonymous Coward · 2007-12-04 12:34 · Score: 1, Insightful

I may work here, but ur the one eating this crap.
Re:Why all the secrecy? by Anonymous Coward · 2007-12-04 13:40 · Score: 1, Insightful

Well, if they knew about it prior to the release, there are only a couple of things to do: 1) depending on when they knew, they delay; or 2) it's too late to delay, so they have a problem in that they've burned a lot of money and made a bunch of broken chips. So what do you do? The chips aren't always broken like you might hope, it's a subtle thing, they mostly work. So you sell them, most are going to go to Taiwanese companies anyways. Then before the customers start to get them en masse, you sort of announce the screw up on the hush hush. You've got the board vendors in a situation where you can get them to eat the costs until you fix it and they still need to buy chips from you in the future. Likewise, if you can keep it quiet enough The Street doesn't pay too much attention, they might even boost your stock because there is a lot of demand for these parts that you simply cannot get.

As for the ones that have actually reached customers, it is a more messy problem at that point. Better to not publicize the whole thing and hope they don't notice than to do anything about it. You have any idea how many BIOSes aren't patched out there and because of that how many parts are working with known errors? Do you have any idea how much money Intel burned fixing floating point units that the majority of their customers never actually used?

Going public with this in a big way only burns AMD. They already have a big bowl of suck to eat. Their market is so insanely competitive, they need a 3 and 4 core part just to stay in the game and stop the bleeding. Hopefully the bulldozer is good and can win some hearts back.
Re:Old issue, really by merreborn · 2007-12-04 13:58 · Score: 2, Insightful

AMD would be fine if they had an expensive chip they could sell at a premium, or a very cheap to produce chip they could sell for the budget crowd, but right now they have Acura production costs coupled with Kia per-unit revenues: bad times.

AMD actually still rules the absolute low end of the market (and has for years). Semprons ($30+) and old X2s ($60+, new retail box) are dirt cheap, and it's simply not possible to get better performance per dollar.

There isn't much a $60 X2 can't do in your average desktop.
Re:This doesn't have to be so bad by canuck57 · 2007-12-04 13:58 · Score: 2, Insightful

AMD can turn this into a PR boon to one-up Intel at the "Green" initiatives. All they have to do is repurpose the uncut wafers of these chips as solar panels and then retile the outside of all their buildings with the panels. This will save money on their energy bills and they can even start a new Ad Campaign:
It will not stop me from buying AMD. The only processor I have ever (of 20+) had that cooked was a P4 2.4GHz HT on an Intel PERL mobo no less! But I have abused two older AMD chips I still have running, with over-clocking, dust plugs in the fans etc., and in el-cheapo mobos. One even ran with a defective fan for months. It did crash, but I caught the fan doing it one day where it would just stall. Replaced the fan, been running ever since. Those AMDs just keep on ticking, a 1200 and 2000+. Totally abused and owe me nothing.
And the AMD X2 I bought last year, runs flawlessly. BTW, I do have 2 P4 heaters still running. Yep, I have a few, even a Sparc.
But looking for another X2 for Christmas. And anyone who buys a chip with a serial number increment of less than 100,000 for production or stability is nuts. Just like a Chrysler, GM or Ford, you don't want the first 100,000, nor the last 100,000. The sweet spot is in the middle.
And although down, I do look forward to the day AMD kicks Intel ass once again. Too bad AMD execs sidelined AMD engineering with this ATI noose. ATI is going to set AMD back 4 years by the time all is counted. I do have respect for Intel PIII 650Mhz duals in a Supermicro though, they too keep ticking.
Why AMD Released Faulty CPUs: Possible Theory by Ma3oxuct · 2007-12-04 14:27 · Score: 3, Insightful

If you look at AMD's financial statements (http://sec.gov/Archives/edgar/data/2488/000119312507238299/d10q.htm#tx48043_5) for the last quarter, it has been losing a lot of cash. This leads me to believe that they released faulty CPUs, right before the holidays, in order to get some cash in the short term.
The idea was to gain some cash to sustain operations until a faultless (i.e. no major faults) CPU can be released. Those that bought faulty CPUs will get their CPUs replaced as soon as faultless CPUs are completed. In some sense you can look at AMD's action as taking out a long term loan.
A counterargument to my theory can be that AMD would not risk its reputation to take out a "cash loan" in such a manner. However, the risk of losing reputation is justified if we consider another major factor at play: the holidays. It is less likely that AMD would gain the same (or even close to the same) cash flows if they had released the CPUs after the holidays.
AMD now has some cash and is able to breathe a little bit. When it releases fixed CPUs it will be able to continue where it left off.
Oh please knock it off by Sycraft-fu · 2007-12-04 19:06 · Score: 2, Insightful

The whole "Intel is t3h hot!!!" thing has gotten old. Yes, P4s were very inefficient chips. Not so with their modern lineup. Core processors are quite efficient power wise for their given level of performance. They also scale way down, there are Core Solos with only a 3 watt TDP spec. Shouting about the Core lineup using a lot of power when it is AMD's processors that you use as the alternative makes little sense.

It is just silly to dredge up old crap and keep using it. It actually weakens any point you try to make because it makes you look as though you don't know what you are talking about. Name calling is bad enough but when it is outdated name calling it is really silly.

By the way, I wouldn't crow too much about price either. I can't find many Phenoms available but the 2.2GHz one Newegg sells is $245. A 2.4GHz Core 2 Quad is $260. Even assuming the Phenom is faster (which would be real questionable especially in light of the patch) that makes it 94% of the price, not 60%. Not a significant cost savings.
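
For anyone who wants to check the 94% figure, here is a minimal sketch in Python. The two prices are the ones quoted above; the variable and function names are made up for illustration only.

```python
# Prices as quoted in the comment above (Newegg listings at the time).
phenom_price = 245.0      # 2.2GHz Phenom, in dollars
core2_quad_price = 260.0  # 2.4GHz Core 2 Quad, in dollars


def price_ratio(a: float, b: float) -> float:
    """Return price a expressed as a percentage of price b."""
    return 100.0 * a / b


ratio = price_ratio(phenom_price, core2_quad_price)
print(f"Phenom costs about {ratio:.0f}% of the Core 2 Quad price")
```

The ratio rounds to 94%, i.e. a saving of roughly 6% at these two prices.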
Re:AMD Is Doomed Unless... by Anonymous Coward · 2007-12-04 21:23 · Score: 1, Insightful

I read about your COSA thing. It looks like you reinvented LabView's language G...and not particularly well. Needless to say, there are many problems with it. If it was really that good, wouldn't it have caught on sometime in the past 20 years? Here are a few of the many reasons it sucks:

1. It does NOT eliminate programmers. Sure, it doesn't require typing, but that just means it eliminates typists (and it doesn't really do that because you still have to type in comments). Anybody can drag and drop a component, but it takes a programmer to figure out which ones to use and how to connect the components together.

2. It's a bitch to debug because you have potentially thousands of things all running in parallel. You can't easily single-step. You can't easily comment out a block of code you think is causing problems. You can't just start sticking Print statements everywhere.

3. Cutting and pasting code is a mess! When you have to insert some code into the middle of your algorithm, you can't just insert a new line, you have to insert rows and columns of pixels. If your components aren't all on a grid, that may not be easy.

4. Printing out your program requires cutting and pasting because it's 2-dimensional. It's hard to visually understand things like switch constructs and sequential operations, particularly when they're nested, because that makes it 3-dimensional.

5. Non-text files are difficult to deal with. You can't tell your friend to look at line XXX to help you figure out why you have a bug. You can't diff the code, meaning it's not really possible to version control properly. Have you ever posted a small code snippet to have somebody copy it, get it working, and post a reply with the fixed code? It's not possible with these programs.

6. It gives new meaning to the term "spaghetti code" because data flow is indicated by lines, and complex data flows look like a plate of spaghetti. In a traditional programming language, you can compute some value (say, average of Foos) and assign it to a variable (say, avgFoo). You can then use that value 100 different times in your function by typing 'avgFoo' and every time you see it you know that it's the average of Foos. With this graphical method, you just have some icon with a line coming out of it with some comment hopefully indicating somewhere that the line is the average of Foos. Then you have 100 lines distributing this value all over the place in your diagram, all of them looking just like any other line in the diagram (in the case of G, every line of the same data type looks the same). Can you make sense of this program snippet?

Unfortunately, G is not the only visual programming language I've ever had to use. In fact, the company I currently work for has such a method of programming its system. It was designed because the users apparently had trouble using the text-based language. I think the engineer who designed the graphical system is the only one in the company who still uses it. Keep in mind that the text-based language is still implicitly parallel, it just doesn't have all of the problems I mentioned above (although it's not much easier to debug).

For those who don't know what this sort of programming looks like, see this for a good example of how it takes a 2MP bitmap to describe a page of code.
