Slashdot Mirror


AMD Confirms CPU Bug Found By DragonFly BSD's Matt Dillon

An anonymous reader writes "Matt Dillon of DragonFly BSD just announced that AMD confirmed a CPU bug he found. Matt quotes part of the mail exchange and it looks like 'consecutive back-to-back pops and (near) return instructions can create a condition where the processor incorrectly updates the stack pointer.' The specific manifestations in DragonFly were random segmentation faults under heavy load."

2 of 292 comments (clear)

  1. Affected CPUs by Anonymous Coward · · Score: 5, Informative

    A pertinent addition to the submission would be which CPUs have been found to be affected.
    The second link says Opteron 6168 and Phenom II X4 820. For a second I thought that bulldozer hasn't managed to do anything right, but these two examples are pre bulldozer.
    No doubt this is not an exhaustive list.

  2. Bulldozer not effected. by m.dillon · · Score: 5, Informative

    AMD has indicated to me that the Bulldozer is not effected, which is a relief.

    I guess I should have realized this would get slashdotted. In anycase, it took quite a bit of effort to track the bug down. It was very difficult to reproduce reliably. It isn't a show stopper in that it really takes a lot of work to get it to happen and most people will never see it, but it's certainly a significant bug owing to the fact that it can be reproduced with normal instruction sequences.

    I began to suspect it might be a cpu bug last year and after exhaustive testing I posted my suspicions in December:

    http://leaf.dragonflybsd.org/mailarchive/kernel/2011-12/msg00025.html

    Older versions of GCC were more prone to generate the sequence of POP's + RET, coupled with a deep recursion and other stack state, that could result in the bug. It just so happened that DragonFly's buildworld hit the right combination inside gcc, and even then the bug only occurred sometimes and only one a small subset of .c files being compiled (like maybe 2-3 files). The bug never manifested anywhere else, doing anything else, running any other application. Ever.

    In particular the bug disappeared with later versions of GCC and disppeared when I messed with the optimizations. We use -O by default, not -O2. The bug disappeared when I produced code with gcc -O2 (using 4.4.7).

    It is really unlikely that Linux is effected... the sensitivity to particular code sequences laid out in the compiler is so fine that adding a single instruction virtually anywhere could make the bug disappear. Even just shifting the stack pointer a little bit would make it disappear.

    In anycase, for a programmer like me being able to find an honest-to-god cpu bug in a modern cpu is very cool :-)

    -Matt