Erratum Plagues Quad-Core Opterons, Phenoms
theraindog writes "Errata are not uncommon with new processors, but a problem with the TLB logic in AMD's quad-core Opteron and Phenom processors appears to be quite serious. The erratum is so severe that AMD has issued a 'stop ship' order on all quad-core Opterons. AMD has also blamed this bug for the delay of the 2.4GHz Phenom, despite the fact that the erratum is unrelated to clock speed. A BIOS-based workaround for the issue has been made available to motherboard makers, but it apparently carries a 10-20% performance penalty. What's more disturbing is that AMD knew of the erratum and the potential performance hit associated with fixing it before it launched the Phenom processor. Hardware provided to the press for reviews did not include the fix, conveniently overstating Phenom performance."
... for AMD?
my k6-2 533AFR is still rock solid. I'll never upgrade
socket 7 ftw.
I'm a geek and all. But I've never heard of erratum.
But dictionary.com is your friend.
Design errors and mistakes in a CPU's hardwired microcode may also be referred to as an erratum. One well publicised example is Intel's "flag" erratum in early Pentium Pro processors. This made the conversion of floating point numbers to integers unreliable due to an exception not being signaled under certain conditions.
Thus concludes another episode of Short Answers To Stupid Questions.
Errata are very common, but how a company handles them is a big factor in deciding things. I certainly hope all review sites will rerun benchmarks.
Anandtech I'm looking at you.
Good thing it's just a patch, as opposed to a derived work of someone else's GPLed code. I wonder what the FSF guys would say about that. I also wonder: Red Hat, why?
"Believe me!" -- Donald Trump
Yet no press people noticed the reportedly huge bug?
So basically, -1 troll/offtopic is really Slashdot's way of saying "I hate that you thought of something before me."
He characterized the issue as a race condition in the TLB logic "where the other guy wins who isn't supposed to win,"
This pretty much describes the college admissions process, as well.
The theory of relativity doesn't work right in Arkansas.
AMD can turn this into a PR boon to one-up Intel at the "Green" initiatives. All they have to do is repurpose the uncut wafers of these chips as solar panels and then retile the outside of all their buildings with the panels. This will save money on their energy bills and they can even start a new Ad Campaign:
"AMD Outside".
Wow, bad times for AMD. They're losing the war against Intel, and now have another setback. A 20% performance penalty is simply unacceptable for any processor. The fact that it is for brand new ones makes it an even bigger slap in the face for consumers.
In 3.... 2... 0.9999921341...
AMD has also blamed this bug for the delay of the 2.4GHz Phenom, despite the fact that the erratum is unrelated to clock speed. [Emphasis added.]
Why does the summary claim this? I read through both articles, and AMD says this is a hardware issue across both chip models. Since this is a hardware issue, wouldn't it stand to reason that AMD would hold up a related chip because it's a hardware bug across both chip models and not because it's a clock speed issue? I'm not sure where the "despite" comes into play. I didn't see where the article said that AMD is not delaying a different speed Phenom.
If it's a patch for the Linux kernel, which is distributed under the GPL, I don't think they can enforce an NDA. The patch may be used to create a derived work of a GPL'd product, so the derived work must also be GPL'd: so you can distribute it, as long as you include its source. This will be available for all Linux variants soon.
It's not like there aren't problems with Intel's CPUs - just take a look at the problems with the MMU in the Core 2 - but no-one is suggesting Intel is doomed. It would just be better if AMD had admitted this when they first knew about the issue rather than sending out review units that are known to have serious issues.
to make a big issue out of this, as he did with the Core 2 errata.
(For mods that are troll-tag-happy, Theo de Raadt is the maintainer of OpenBSD and is security paranoid.)
proud caffeine whore
Intel will have copied these features and have similar problems.
My good old Opteron 170 had the same stupid issue with unsynched core clocks. What is new here?
As long as the diff doesn't contain any of the original code and the patch is distributed in isolation, there is no conflict with the GPL. If RH distributes a binary kernel, though, then they are in violation of the GPL. This would make RH liable, but I don't know whether your rights under the GPL or the prohibitions under the NDA take precedence for the recipient.
It just means they're starting to make Intel's mistakes! They're on-par now! :D
And no 'kdawson' crack either. Disappointing.
that Intel's Core 2 also had a problem with the TLB when first released, although that problem manifested itself as data corruption instead of a lockup. Here are the two articles from The Inquirer about it - the second one especially. And note that this document was released after Intel had shipped the buggy Core 2's.
However, Intel was able to fix it without incurring a large performance loss. It's a shame for AMD that they weren't able to do the same.
AMD admitted there were errors in the early Phenom CPUs back before launch. They even put it in their presentations in the press conferences and such. They also said before launch that they were going to include the proper fix in the revised core used in the higher end Phenom, hence the delay.
QMUNAD, WHEA (Quit Making Up New Acronyms Damnit, We Have Enough Already)!
And an unusual amount of goatse crap. Can we ban their IPs or something?
Everyone makes crap these days and releases it anyway. It has to kill or injure people and be recalled before it's truly "disturbing"....
Relax, everything's fine!
Borat says "I buy Pentium Core Duo, neighbor can not afford he buys AMD Phenom everyone knows it's for girls, Great success!".
Someone, here, now, on /., who not only RTFA, but also RTFUA! ... aaahhh!
What is so bad about a company like AMD coming right out and saying "processor model x, clock speed y, stepping z has bug abc and this is the workaround for it". Assuming BIOS vendors and others are going to be deploying the fix anyway, how does it hurt AMD if everyone knows of the fix?
At least in the graphics world, "faster and usually correct" is acceptable.
There's no failure quite as dissatisfying as a complete and total solution to the wrong problem.
Hey! This is slashdot, not your personal advertising space for YACB (yet another crappy blog).
Oh, wait a minute...
Hey man, fuck you and your limpdick blog. That is all.
The idea was to gain some cash to sustain operations until a faultless (i.e. no major faults) CPU can be released. Those that bought faulty CPUs will get their CPUs replaced as soon as faultless CPUs are completed. In some sense you can look at AMD's action as taking out a long term loan.
A counterargument to my theory is that AMD would not risk its reputation to take out a "cash loan" in such a manner. However, the risk of losing reputation is justified if we consider another major factor at play: the holidays. It is less likely that AMD would gain the same (or even close to the same) cash flows if they had released the CPUs after the holidays.
AMD now has some cash and is able to breathe a little bit. When it releases fixed CPUs it will be able to continue where it left off.
Whatever. Start off whining about crappy concurrency, then start mumbling about "easy software composition". How are they related? Are you ADD?
[engineer] gets [error] during [random load tests]
[engineer]: Hmm
[engineer] runs [random load test] 10 more times gets 10 more [errors]
[engineer] calls [manager] "Um, I have something to show you..."
[manager]: [expletive]
[manager] calls [vp]: "We may have discovered an issue..."
[vp]: [expletive expletive expletive]
[vp] calls Hector Ruiz: "Hector, remember when you said the next person that called with bad news would be wearing your guitar around his neck?"
Hector: [?]
[vp]: "Well," [explains]
Hector: [expletive expletive expletive expletive expletive expletive expletive]
[vp]: "We have maybe a BIOS fix."
Hector: [expletive expletive expletive expletive expletive]
[vp]: "Ok we'll just go ahead and do that then..."
Hector: [expletive]
[vp]: "Ok then I'll see you later."
Hector: [expletive expletive expletive expletive] *click* [expletive]
Equine Mammals Are Considerably Smaller
Just wondering - if a new customer buys a quad core Phenom, just to run some super elite gamerz rig running Vista .... does it really matter if the CPU is going to generate bad results, or crash at some point? It's not like the operating system and other code running on the buggy processor isn't equally likely to break something as well.
... if a blind, retarded midget (Vista operating system), gets into a car with a broken crankshaft and square wheels (busted phenom CPU), then is anyone going to lose any sleep ?
To use a car analogy
Ironically, these may turn into the CPUs du jour for Linux users...
The performance hit is probably 10% when patching the microcode which should mean steep price mark-downs on this generation of CPUs. But it's only a 1% performance hit when patching the (Linux) kernel.
So why doesn't every OEM that sells Linux servers and desktops just buy up all of AMD's supplies of defective chips at a big discount, and pass the savings along? I'd buy a couple.
Just bought my MSI AM2+ mobo and 4Gig DDR2. Next week I buy my own broke-ass Phenom - happily!
Why?
Because I still have the choice of buying a broke-ass AMD processor instead of an Intel. If AMD folds, Intel might just give every employee a new Porsche just for kicks. Because with what they will be able to charge (in a monopoly business), the Porsches would be a rounding error.
Remember that Free Enterprise is the ultimate democracy. Vote with your dollars!
"It might not be AMD's doom, but they're really not that many big screwups away."
They're trying to pull a Commodore. Anyway I think my next processor will be Intel. Combine that with an Intel board and it should be one steady machine.
I read some of your links, and, while I will admit that there are certainly problems with how software is built, I don't think you have a silver bullet.
If pictures are so good, how come a lot of hardware is written in Verilog/VHDL instead of schematics, when both are available?
If fine-grained parallelism is so error-free, how come chip companies spend a fortune verifying and validating both their schematics and their HDL before they tape out?
BTW, I make my living by writing software, writing Verilog, and drawing schematics.
FWIW, I prefer Python to Verilog, and Verilog to schematics. In fact, I check my schematic netlists with Python scripts to make sure there are no errors.
When it's important to get it right, I find text much preferable to schematics -- it can be easily diffed and version controlled.
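For anyone wondering what checking a netlist with a script might look like, here is a minimal sketch. The netlist format (a dict mapping each net name to its attached component/pin pairs), the function name `find_netlist_errors`, and the two rules it checks are all assumptions made up for illustration, not the poster's actual scripts or any real tool's format.

```python
from collections import defaultdict


def find_netlist_errors(netlist):
    """Return a list of human-readable problems found in a netlist.

    Hypothetical format (an assumption for illustration): each net name
    maps to a list of (component, pin) pairs attached to that net.
    """
    errors = []
    pin_to_nets = defaultdict(list)

    for net, pins in netlist.items():
        # A net with fewer than two connections goes nowhere.
        if len(pins) < 2:
            errors.append(f"net {net} has fewer than two connections")
        for pin in pins:
            pin_to_nets[pin].append(net)

    # A single pin should belong to exactly one net.
    for (component, pin), nets in pin_to_nets.items():
        if len(nets) > 1:
            joined = ", ".join(sorted(nets))
            errors.append(f"{component}.{pin} appears on multiple nets: {joined}")

    return errors


example_netlist = {
    "VCC": [("U1", "8"), ("C1", "1")],
    "GND": [("U1", "4"), ("C1", "2")],
    "CLK": [("U1", "3")],               # dangling net
    "OUT": [("U1", "8"), ("R1", "1")],  # U1.8 is already on VCC
}

errors = find_netlist_errors(example_netlist)
for line in errors:
    print(line)
```

Because both the netlist and the report are plain text, the results can be diffed and version controlled like any other source file, which is the point being made above.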
That's psycho. Assuming a 50k price for Porsches and Intel's current employee count of 80k, that's a 4 billion dollar investment right there, or half a year of profit. You really think that would be a rounding error?? I can see that now... umm... board of directors, our profits dropped by 50% this year, but employee morale is through the roof! Yeah, right.
Do your math next time.
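The math, spelled out as a quick sketch. The price and headcount are the round figures assumed in the comment above; the implied annual profit is only what follows from its "half a year of profit" claim, not a reported number.

```python
# Figures assumed in the comment above, not verified data.
porsche_price_usd = 50_000
intel_employees = 80_000

total_cost_usd = porsche_price_usd * intel_employees
print(f"Total cost: ${total_cost_usd:,}")

# "Half a year of profit" implies an annual profit of twice the total cost.
implied_annual_profit_usd = total_cost_usd * 2
print(f"Implied annual profit: ${implied_annual_profit_usd:,}")
```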
The whole "Intel is t3h hot!!!" thing has gotten old. Yes, P4s were very inefficient chips. Not so with their modern lineup. Core processors are quite efficient power wise for their given level of performance. They also scale way down, there are Core Solos with only a 3 watt TDP spec. Shouting about the Core lineup using a lot of power when it is AMD's processors that you use as the alternative makes little sense.
It is just silly to dredge up old crap and keep using it. It actually weakens any point you try to make because it makes you look as though you don't know what you are talking about. Name calling is bad enough but when it is outdated name calling it is really silly.
By the way, I wouldn't crow too much about price either. I can't find many Phenoms available but the 2.2GHz one Newegg sells is $245. A 2.4GHz Core 2 Quad is $260. Even assuming the Phenom is faster (which would be real questionable especially in light of the patch) that makes it 94% of the price, not 60%. Not a significant cost savings.
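The 94% figure can be checked with the two prices quoted above. This is just a sketch; the variable names are mine.

```python
# Prices as quoted in the comment above.
phenom_2_2ghz_usd = 245
core2_quad_2_4ghz_usd = 260

price_ratio = phenom_2_2ghz_usd / core2_quad_2_4ghz_usd
print(f"Phenom costs {price_ratio:.0%} of the Core 2 Quad price")
```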
That the vast majority of errata are minor things. They either don't matter much or don't affect almost anything or both. It is the kind of stuff that if discovered in a piece of software would probably be left in if it was more than a few lines of code to fix. However, because of the nature of a CPU, they just make sure to document all of it so people know. Most of it never gets fixed, precisely because it isn't a big deal.
Bigger ones generally are fixed in microcode, and sometimes even lesser ones. However, for something this big, it may be as you say and take an actual hardware fix.
I also think you may be right about them being not so willing to replace it. For one, there just aren't that many sold. The Pentium was marketed at consumer machines; this is more high end. So it was a bigger flap for Intel. However, the other thing is AMD is hurting badly right now. They have lost a lot of money recently and are not going to be in the mood for losing more. While it may well be a case of penny wise, pound foolish to refuse replacement, executives make those kinds of decisions all the time, especially when there is heavy pressure to cut costs.
I have one character array and a float for you: 'gcc-' '2.96'
Enough said.
Doesn't this constitute an attempt to exploit a loophole within the license? Doesn't it go against the spirit of the license, which aims to ensure that modifications to GPL'ed code remain GPL'ed? Any lawyers out there that can comment on this?
I'm reading your articles, and you're a complete crackpot.
The true model of parallelism is double buffering on a console game? Uhhh.... Right... I think what you're getting at here is a non-blocking algorithm. There isn't any "lock" since they're reading and writing separate buffers. There's lots of non-blocking algorithm research out there, and other alternatives to locks. I encourage you to read up on it, a lot of it is quite interesting. Transactional memory is another hot topic; have you read up on that?
In other words: don't trash academia when academia has already come up with very good solutions for the problems you describe. You're showing blatant ignorance of what academia has to offer.
Your rejection of terms like "thread" and "algorithm" is even more ludicrous.
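For readers who haven't seen double buffering, a minimal sketch of the idea mentioned above: the writer only touches the back buffer, the reader only touches the front buffer, and the two are exchanged at a known point. The class and method names are hypothetical, and this is a single-threaded illustration; in a real multithreaded program the swap itself would still have to be coordinated (in a game, typically at the frame boundary).

```python
class DoubleBuffer:
    """Two same-sized buffers: readers see `front`, writers fill `back`."""

    def __init__(self, size):
        self.front = [0] * size
        self.back = [0] * size

    def write(self, index, value):
        # Writes never touch the buffer that readers are looking at.
        self.back[index] = value

    def read(self, index):
        # Reads never touch the buffer that is being written.
        return self.front[index]

    def swap(self):
        # Exchange the roles of the two buffers, e.g. once per frame.
        self.front, self.back = self.back, self.front


buf = DoubleBuffer(2)
buf.write(0, 42)
before_swap = buf.read(0)  # the reader still sees the old frame
buf.swap()
after_swap = buf.read(0)   # the new frame becomes visible after the swap
print(before_swap, after_swap)
```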
I read about your COSA thing. It looks like you reinvented LabVIEW's language G... and not particularly well. Needless to say, there are many problems with it. If it was really that good, wouldn't it have caught on sometime in the past 20 years? Here are a few of the many reasons it sucks:
1. It does NOT eliminate programmers. Sure, it doesn't require typing, but that just means it eliminates typists (and it doesn't really do that because you still have to type in comments). Anybody can drag and drop a component, but it takes a programmer to figure out which ones to use and how to connect the components together.
2. It's a bitch to debug because you have potentially thousands of things all running in parallel. You can't easily single-step. You can't easily comment out a block of code you think is causing problems. You can't just start sticking Print statements everywhere.
3. Cutting and pasting code is a mess! When you have to insert some code into the middle of your algorithm, you can't just insert a new line, you have to insert rows and columns of pixels. If your components aren't all on a grid, that may not be easy.
4. Printing out your program requires cutting and pasting because it's 2-dimensional. It's hard to visually understand things like switch constructs and sequential operations, particularly when they're nested, because that makes it 3-dimensional.
5. Non-text files are difficult to deal with. You can't tell your friend to look at line XXX to help you figure out why you have a bug. You can't diff the code, meaning it's not really possible to version control properly. Have you ever posted a small code snippet to have somebody copy it, get it working, and post a reply with the fixed code? It's not possible with these programs.
6. It gives new meaning to the term "spaghetti code" because data flow is indicated by lines, and complex data flows look like a plate of spaghetti. In a traditional programming language, you can compute some value (say, average of Foos) and assign it to a variable (say, avgFoo). You can then use that value 100 different times in your function by typing 'avgFoo' and every time you see it you know that it's the average of Foos. With this graphical method, you just have some icon with a line coming out of it with some comment hopefully indicating somewhere that the line is the average of Foos. Then you have 100 lines distributing this value all over the place in your diagram, all of them looking just like any other line in the diagram (in the case of G, every line of the same data type looks the same). Can you make sense of this program snippet?
Unfortunately, G is not the only visual programming language I've ever had to use. In fact, the company I currently work for has such a method of programming its system. It was designed because the users apparently had trouble using the text-based language. I think the engineer who designed the graphical system is the only one in the company who still uses it. Keep in mind that the text-based language is still implicitly parallel, it just doesn't have all of the problems I mentioned above (although it's not much easier to debug).
For those who don't know what this sort of programming looks like, see this for a good example of how it takes a 2MP bitmap to describe a page of code.
dom
The Phenom Rev B3 release (March '08) is supposed to fix the TLB issue in hardware. Until then, there's a microcode update (via a BIOS patch) that fixes stability issues (at the cost of 10-20% L3 performance).
I just got my Barcelona system running last month, paid very good money for what was supposed to be a revolutionary platform, and for what? Sub-par performance? Really not cool AMD, really not cool.
I might hope that the RH patch hits mainstream Linux, but I'm not sure how that affects Xen...
The new quad cores from AMD do beat Intel's once you get past two sockets. Even with "just" eight cores the Intel systems start to get memory constrained.
The problem is most people don't need more than four cores these days. But if you need a really big server AMD does still seem to be a better solution than Intel.
I really want to see a dual-core Phenom! Most desktops still don't need a quad core. The Phenom core has some real improvements over the old cores and it should be cheap and very low power.
Today on the desktop the quad-core just isn't that useful. So where are the next generation dual-cores?
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
I cannot believe you bothered with that reply!
But I still bought it (the 2.3GHz version, though).
I know it's not the fastest on the market and I know it has a bad rep,
but I'm an AMD guy and I intend to stick with them as long as I can.
I have a compatible (CrossFire) motherboard and the CPU will cost me about $250.
If I want to go to Intel, I would have to spend at least -twice- that much,
just for the MoBo and CPU (Never mind new cooler, etc).
Even if it might freak out under heavy load, it won't be that much of a problem, because it's only a game machine.
(I realize that means it will often be under a heavy load, but I can stand to lose the data on there, I have my Ghost Backup)
I've had about a half dozen 'freezes' in the past month and I don't think this CPU will make that much of a difference.
Just my 2c
"I was in love with a beautiful blonde once, dear. She drove me to drink. It's the one thing I am indebted to her for."
I'm on the VHDL side and I agree about schematics being bad. It takes longer to draw them and they are harder to read. To trace signals, you have to highlight the wire and then go on a huge search over the schematic to find it. The whole time you are zooming and unzooming. With a text-based language you can just do a simple search and find the information instantly. Another thing is how hard it is to modify a schematic. In VHDL I can easily insert an inverter. In a schematic, you have to spend several minutes tearing up wiring and changing stuff around.
I think the first thing you need to do is remove the incredibly arrogant tone from your blog, especially seeing as a very large portion of it is really quite questionable. It makes you seem like an absolute raving lunatic.
No, AMD is not trying to get a "cash advance" from its customers.
If AMD, a Fortune 500 company, needs cash... they go to the debt markets like everyone else. It's not like this is the local loan shark here. It's Goldman Sachs. It's Citibank. It's Lehman Brothers. And trust me, they will be more than happy to help AMD find short-term cash (for a fee, of course). What do you think investment banking is?
There is no conspiracy here. Please move along.
...erratum plaguing their boxen.
I think the first thing you need to do is remove the incredibly arrogant tone from your blog, especially seeing as a very large portion of it is really quite questionable. It makes you seem like an absolute raving lunatic.
One of the things I noticed about a lot of computer geeks is their total lack of a sense of humor. It smacks of autism and anal retentiveness. ahahaha...
Are you saying the whole thing is a joke?
(1.21 gigawatts) / (88 miles per hour) = 30 757 874 newtons
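The number does check out: power divided by speed has units of force, so converting both quantities to SI gives newtons directly. A quick sketch, with variable names of my own choosing:

```python
# Unit conversions (exact by definition of the international mile).
METERS_PER_MILE = 1609.344
SECONDS_PER_HOUR = 3600

power_watts = 1.21e9  # 1.21 gigawatts
speed_m_per_s = 88 * METERS_PER_MILE / SECONDS_PER_HOUR  # 88 miles per hour

# Power = force * velocity, so force = power / velocity.
force_newtons = power_watts / speed_m_per_s
print(f"{force_newtons:,.0f} N")
```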