Slashdot Mirror


Flawed AMD Chip Can Lead To Data Corruption

Brandonski writes "Apparently AMD allowed some flawed chips to slip through their detection grid. The problem affects only a small number of chips and only single core 2.6 and 2.8 GHz CPUs." From the article: "It is believed that the glitch is triggered when the affected chip's FPU is made to loop through a series of memory-fetch, multiplication and addition operations without any condition checks on the result of the calculations. The loop has to run over and over again for long enough to cause localized heating which together with high ambient temperatures could combine to cause the result of the operation to be recorded incorrectly, leading to data corruption."

3 of 203 comments (clear)

  1. nice! by B3ryllium · · Score: 3, Interesting

    Wow, that was fast. FreeBSD already has a patch for this.

    Judging from the posting date, I *really* need to be updating my sources more often. :)

    20060419: p7 FreeBSD-SA-06:14.fpu
                    Correct a local information leakage bug affecting AMD FPUs.

    (could be an unrelated correction, I guess, it doesn't provide much more information in /usr/src/UPDATING)

  2. CALL ESP by Myria · · Score: 3, Interesting

    Probably the easiest errata to come by is the instruction "CALL ESP" (or "CALL RSP"). On AMD CPUs, "CALL ESP" will jump to the address in ESP, *then* push the return address. However, on Intel CPUs, it will push the return address first, then jump to the value it just pushed. This is, of course, disasterous if you try to use it.

    According to Intel errata documents, this is a bug in the Pentium Pro that has been kept for several generations. The Pentium and below, except the 8086 and 8088, worked correctly with this instruction.

    If you want to differentiate Intel and AMD in your program and don't want to use CPUID, you can set up a test with CALL ESP.

    Melissa

    --
    "Screw Sun, cross-platform will never work. Let's move on and steal the Java language." - Visual J++ Product Manager
  3. Quality Control at AMD must be good. by MROD · · Score: 5, Interesting

    Having read a lot about this flaw it's actually amazing that AMD's quality control found the problem in the first place.

    The actions needed to cause the problem to arise are so extreme that they'd never happen in the field. i.e. Loop through tight floating-point only instructions without any comparisons for maybe hours before the error occurs.

    This would *NEVER* happen in the field. Firstly, in any modern OS the process would have been pre-empted long before any problem could occur (causing other instructions to run and hence stopping the overheating). Secondly, no real-world program would ever do this sort of thing as there would always be a comparison in the loop within the timeframe.

    This is a theoretical problem only in the real world, especially as it only affects about 3000 processors in total (it has been quoted). This is why AMD gave it such a low priority. We should just forget about it and move on.

    --

    Agrajag: "Oh no, not again!"