Slashdot Mirror


Erratum Plagues Quad-Core Opterons, Phenoms

theraindog writes "Errata are not uncommon with new processors, but a problem with the TLB logic in AMD's quad-core Opteron and Phenom processors appears to be quite serious. The erratum is so severe that AMD has issued a 'stop ship' order on all quad-core Opterons. AMD has also blamed this bug for the delay of the 2.4GHz Phenom, despite the fact that the erratum is unrelated to clock speed. A BIOS-based workaround for the issue has been made available to motherboard makers, but it apparently carries a 10-20% performance penalty. What's more disturbing is that AMD knew of the erratum and the potential performance hit associated with fixing it before it launched the Phenom processor. Hardware provided to the press for reviews did not include the fix, conveniently overstating Phenom performance."

21 of 226 comments

  1. What??? by GregPK · · Score: 5, Informative

    I'm a geek and all. But I've never heard of erratum.

    But dictionary.com is your friend.

    Design errors and mistakes in a CPU's hardwired microcode may also be referred to as an erratum. One well publicised example is Intel's "flag" erratum in early Pentium Pro processors. This made the conversion of floating point numbers to integers unreliable due to an exception not being signaled under certain conditions.

    1. Re:What??? by fitten · · Score: 4, Insightful

      Every CPU maker publishes the errata for their CPUs because system designers/vendors/whatever need to know these things. Every CPU made for the past (insert very long time in the computer world here) has had a big list of errata publicly published. Just go to the Intel or AMD site, for example, and look up the errata on the PPro, P3, P4, Core, Core2, Athlon, Athlon XP, Athlon64, Athlon64 X2, or whatever your favorite CPU happens to be.

      The thing is, the CPU is actually broken a bit and AMD has pulled the Barcelona line but are continuing to sell the Phenom(inal Failure) line to customers and, evidently, don't plan to 'fix' the problem later (Intel offered replacements for the Pentium floating point bug after they got dinged on it, for example... I know... I had one and replaced it).

      So... if you actually get your hands on (or got your hands on) a Phenom, realize you have a broken CPU and the more you load it, the more likely you'll have stability issues.... and AMD isn't (currently) going to fix it.

    2. Re:What??? by nuzak · · Score: 4, Informative

      Erratum is singular. Errata is plural.

      The conventional terms used for erratum, however, are usually "error" or "bug".

      --
      Done with slashdot, done with nerds, getting a life.
    3. Re:What??? by alshithead · · Score: 5, Funny

      "I'm a geek and all. But I've never heard of erratum."

      Mod me down, call me troll, but please don't claim to be a geek if you can claim to never have heard of erratum or errata. That's as bad as not knowing what a bug is or calling a PC case and its contents a hard drive.

      Here's a heartfelt suggestion...read more.

      --
      I reserve the right to think for myself. Others' opinions are optional. Puppy on lap = typos...not illiteracy.
    4. Re:What??? by Carnildo · · Score: 4, Informative

      Well... I can't remember any for my beloved 6502.


      They may not have been published, but there are at least three:
      1) A memory-indirect jump where the address is stored across a 256-byte boundary will read the second byte of the address from the wrong location.
      2) The arithmetic status flags are not valid when performing arithmetic in BCD mode.
      3) If a hardware interrupt occurs while the processor is fetching a BRK instruction, the BRK instruction is ignored.
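      The first bug in the list above can be sketched in a few lines. This is only an illustrative model, assuming the commonly described NMOS 6502 behavior in which the high-byte fetch wraps within the same 256-byte page; the function name and the memory layout are hypothetical, not taken from any real emulator.

```python
def read_indirect_target(memory, pointer, nmos_bug=True):
    """Return the 16-bit target that JMP (pointer) would load.

    Hypothetical helper for illustration only.
    """
    lo = memory[pointer]
    if nmos_bug:
        # Only the low 8 bits of the pointer are incremented, so the
        # high byte is fetched from the start of the same page.
        hi_addr = (pointer & 0xFF00) | ((pointer + 1) & 0x00FF)
    else:
        # Behavior a programmer would expect: a plain 16-bit increment.
        hi_addr = (pointer + 1) & 0xFFFF
    hi = memory[hi_addr]
    return (hi << 8) | lo


memory = bytearray(0x10000)
memory[0x10FF] = 0x34  # low byte of the intended target $1234
memory[0x1100] = 0x12  # high byte, stored across the page boundary
memory[0x1000] = 0x56  # unrelated byte that the buggy fetch picks up

buggy = read_indirect_target(memory, 0x10FF, nmos_bug=True)
fixed = read_indirect_target(memory, 0x10FF, nmos_bug=False)
print(hex(buggy), hex(fixed))  # prints "0x5634 0x1234"
```

      A pointer that does not sit on the last byte of a page gives the same result in both modes, which is why the bug only bites when the address straddles a 256-byte boundary.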
      --
      "They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.
  2. No. by Anonymous Coward · · Score: 5, Funny

    Thus concludes another episode of Short Answers To Stupid Questions.

  3. NDA for patch? by Cajun Hell · · Score: 5, Interesting

    Check this out:

    Linux users may have another option in the form of a patch for that operating system's kernel. Sources estimate this patch's performance hit at less than one percent, but it comes with several caveats. At present, the patch purportedly only applies to the 64-bit version of Red Hat Enterprise Linux, Upgrade 4. Customers must sign a non-disclosure agreement in order to obtain the patch...

    Good thing it's just a patch, as opposed to a derived work of someone else's GPLed code. I wonder what the FSF guys would say about that. I also wonder: Red Hat, why?

    --
    "Believe me!" -- Donald Trump
    1. Re:NDA for patch? by Crispy Critters · · Score: 4, Insightful

      It is silly to think that RH is ignoring the GPL.

      There are other possibilities that are more likely. For example, perhaps the patched kernel is doing something like loading microcode into the processor. The kernel code would be GPLed but the microcode would not be.

  4. Cue the intel jokes by Anonymous Coward · · Score: 5, Funny

    In 3.... 2... 0.9999921341...

    1. Re:Cue the intel jokes by nmb3000 · · Score: 5, Funny

      Some of the (obligatory) Pentium jokes were pretty funny. From a text file I've had lying around for quite a while:

      --------------

      Intel's new motto: "United We Stand, Divided We Fall"

      Q: How many Pentium designers does it take to screw in a light bulb?
      A: 1.99904274017, but that's close enough for non-technical people.

      Q: What do you get when you cross a Pentium PC with a research grant?
      A: A mad scientist.

      Q: What's another name for the "Intel Inside" sticker they put on Pentiums?
      A: The warning label.

      Q: What do you call a series of FDIV instructions on a Pentium?
      A1: Successive approximations.
      A2: A random number generator.

      Q: Complete the following word analogy: Add is to Subtract as Multiply is to:
              1) Divide
              2) Round
              3) Random
              4) All of the above

      Q: What algorithm did Intel use in the Pentium's floating point divider?
      A: "Life is like a box of chocolates." (Source: F. Gump of Intel)

      Q: Why didn't Intel call the Pentium the 586?
      A: Because they added 486 and 100 on the first Pentium and got
          585.999983605.

      Q: According to Intel, the Pentium conforms to the IEEE standards 754
          and 854 for floating point arithmetic. If you fly in aircraft
          designed using a Pentium, what is the correct pronunciation of "IEEE"?
      A: Aaaaaaaiiiiiiiiieeeeeeeeeeeee!

      Q: Did you hear about the new "morning after" pill being developed as a
          replacement for RU-486???
      A: It's called RU-Pentium. It causes the embryo to not divide correctly.

      TOP TEN NEW INTEL SLOGANS FOR THE PENTIUM

          9.9999973251 - It's a FLAW, Dammit, not a Bug
          8.9999163362 - It's Close Enough, We Say So
          7.9999414610 - Nearly 300 Correct Opcodes
          6.9999831538 - You Don't Need to Know What's Inside
          5.9999835137 - Redefining the PC -- and Mathematics As Well
          4.9999999021 - We Fixed It, Really
          3.9998245917 - Division Considered Harmful
          2.9991523619 - Why Do You Think They Call It *Floating* Point?
          1.9999103517 - We're Looking for a Few Good Flaws
          0.9999999998 - The Errata Inside



      Worth a laugh anyway :)

      --
      "What do you despise? By this are you truly known." --Princess Irulan, Manual of Muad'Dib
  5. Re:Bummer by the_humeister · · Score: 4, Funny

    Hmmm... I suppose that I should disconnect this Phenom-powered computer running Windows from this nuclear power station I'm working at...

  6. "because", not "despite" by statemachine · · Score: 5, Insightful

    AMD has also blamed this bug for the delay of the 2.4GHz Phenom, despite the fact that the erratum is unrelated to clock speed. [Emphasis added.]

    Why does the summary claim this? I read through both articles, and AMD says this is a hardware issue across both chip models. Since this is a hardware issue, wouldn't it stand to reason that AMD would hold up a related chip because it's a hardware bug across both chip models and not because it's a clock speed issue? I'm not sure where the "despite" comes into play. I didn't see where the article said that AMD is not delaying a different speed Phenom.

  7. No, but it looks bad by _merlin · · Score: 5, Insightful

    It's not like there aren't problems with Intel's CPUs - just take a look at the problems with the MMU in the Core 2 - but no-one is suggesting Intel is doomed. It would just be better if AMD had admitted this when they first knew about the issue rather than sending out review units that are known to have serious issues.

    1. Re:No, but it looks bad by ceoyoyo · · Score: 4, Insightful

      No, but AMD seems to be in a pretty delicate state. Their stock is pretty low and they've taken a beating from a newly-competitive Intel. They don't have a big advantage in processor speed anymore, nor power, nor even price. Halting shipment on an entire line? Not good. If they eventually have to recall it... bad.

      It might not be AMD's doom, but they're really not that many big screwups away.

  8. Old issue, really by Uzito · · Score: 4, Interesting

    My good old Opteron 170 had the same stupid issue with unsynched core clocks. What is new here?

    1. Re:Old issue, really by CajunArson · · Score: 4, Informative

      The old Opty 170 didn't have an L3 cache, which is where the bug lies. This bug is rare, but it is reproducible when the CPU is under heavy load, and it was one of the reasons AMD was trying to get hardware reviewers to come to an AMD event in Tahoe to run benchmarks on AMD-approved systems instead of just dropping chips into FedEx packages. Causing a full-blown system freeze is also on the serious side when it comes to bugs. There have been even more problems: Tech Report has a story that, unlike the hand-selected systems that ran at Tahoe, many of the actual consumer Phenoms you can buy today use slower HT speeds (1.8GHz vs. 2.0GHz in the demos). This means that the memory subsystem (AMD's one theoretical strength over Intel right now) is slowed down, so the somewhat unimpressive initial results are actually overstatements of what the consumer chips can do. (article here).

      AMD is in a world of hurt right now. The "true" quad-core line appears to be nothing more than marketing hyperbole, since year-old Q6600s are faster clock-for-clock than Phenom is. AMD will hopefully get these bugs ironed out... by next February. Even then, though, AMD will have chips that are MASSIVELY expensive to make, but that they can't sell for the higher prices Intel is able to command. AMD would be fine if they had an expensive chip they could sell at a premium, or a very cheap-to-produce chip they could sell to the budget crowd, but right now they have Acura production costs coupled with Kia per-unit revenues: bad times.

      --
      AntiFA: An abbreviation for Anti First Amendment.
  9. Re:NDA not enforcible by TheThiefMaster · · Score: 5, Informative

    The patch is under the NDA, the kernel is under GPL, so the resulting work (patched kernel) can't be distributed, because the licenses are incompatible.

    The GPL only applies to redistribution. Private-use changes don't have to be GPL'd.

    IANAL,TIJHIUI (I Am Not A Lawyer, This Is Just How I Understand It).

  10. Let's not forget.. by AcidPenguin9873 · · Score: 4, Interesting

    that Intel's Core 2 also had a problem with the TLB when first released, although that problem manifested itself as data corruption instead of a lockup. Here are the two articles from The Inquirer about it - the second one especially. And note that this document was released after Intel had shipped the buggy Core 2's.

    However, Intel was able to fix it without incurring a large performance loss. It's a shame for AMD that they weren't able to do the same.

  11. They did by DreadSpoon · · Score: 5, Informative

    AMD admitted there were errors in the early Phenom CPUs back before launch. They even put it in their presentations in the press conferences and such. They also said before launch that they were going to include the proper fix in the revised core used in the higher end Phenom, hence the delay.

  12. Re:Bummer by scottv67 · · Score: 4, Funny

    What if you were doing scientific computing? 20% drop could mean a lot of time for a calculation. I use to run calculations that would take months...

    Just thinking out loud here: Did you try pushing in the Turbo button?

  13. Good thing they bought ATI by Chris Snook · · Score: 5, Funny

    At least in the graphics world, "faster and usually correct" is acceptable.

    --
    There's no failure quite as dissatisfying as a complete and total solution to the wrong problem.