Slashdot Mirror


Tracking Down The AMD "Processor Bug"

tercero writes: "Over at the Gentoo Linux website there is an update on the AMD processor bug mentioned here. The summary is that AMD claims it's not a bug with the Athlon processor, but with the motherboard. More detailed information can be found on this LKML post." An Anonymous Coward points to a similar explanation at Linux Weekly News. Update: 01/25 01:25 GMT by T : Daniel Robbins from Gentoo clarifies: "AMD is not calling this a 'motherboard' issue, it is an interaction between a feature of the Athlon called 'speculative writes' and the design of the GART, which is not cache-coherent. It's an 'Athlon/cache coherency/GART' problem, not a 'motherboard' problem."

12 of 237 comments

  1. Kernel parameter vs LILO config file by DragonHawk · · Score: 5, Informative

    The kernel will look for the parameter

    mem=nopentium

    and turn off 4MB pages (which may or may not prevent the problem from manifesting -- the situation is unclear at this time). You can do this at the boot prompt like this

    LILO boot: linux mem=nopentium

    or by placing the configuration directive

    append="mem=nopentium"

    in your /etc/lilo.conf configuration file.

    See the manual page for lilo.conf for the details.
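
    To confirm after a reboot that the option actually reached the kernel, something like the following Python sketch could be used. It assumes a Linux system that exposes the boot command line at /proc/cmdline; the function names are made up for illustration and are not part of LILO or the kernel.

```python
def has_kernel_param(cmdline: str, param: str = "mem=nopentium") -> bool:
    """Return True if `param` is one of the whitespace-separated boot options."""
    return param in cmdline.split()


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with (Linux only)."""
    with open(path, "r") as f:
        return f.read().strip()


# Typical use on a Linux box (hypothetical helper names):
#   print(has_kernel_param(read_cmdline()))
```

    Matching whole tokens rather than substrings avoids a false positive on some longer option that merely contains the same text.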

    --

    dragonhawk@iname.microsoft.com
    I do not like Microsoft. Remove them from my email address.
  2. More information by DragonHawk · · Score: 5, Informative

    Yesterday, information became widely available that described possible stability issues (system crashes, hangs, etc.) when using an AGP video card under Linux in conjunction with an AMD Athlon processor. It was generally called a "bug" in the Athlon CPU.

    More information is now available at http://www.gentoo.org, including an analysis of AMD's response. AMD's official response was posted to LKML, and is available at http://www.geocrawler.com/lists/3/Linux/35/175/7626960/.

    There is apparently some kind of bad interaction between the AGP GART ("Graphics Address Remapping Table", I think?), speculative memory operations performed by the Athlon processor, the memory mappings used by the kernel, and cache coherency. The details are beyond me, but the practical upshot appears to be that the wrong data ends up being written back to main memory at some point.

    I recommend reading the above LKML thread if you suspect you are affected by this issue. Information is still being uncovered, and it is not immediately clear how this occurs, what causes it, who is affected by it, and how to work around it.

    In particular, there is some uncertainty as to whether the "mem=nopentium" option actually prevents the problem, or merely makes it less likely to occur.

    --

    dragonhawk@iname.microsoft.com
    I do not like Microsoft. Remove them from my email address.
  3. All of the above. by Christopher+Thomas · · Score: 5, Informative

    AMD claims it's not a bug with the Athlon processor, but with the motherboard

    According to young bald children everywhere, "There is no bug".

    In related news, the motherboard manufacturers are quoted as saying, "It's not a bug with the motherboard, but with the Athlon processor."


    Funny, I didn't think I was bald...

    It's an Athlon bug if you think doing speculative writes is a bug.

    It's a motherboard chipset bug if you think that the AGP controller should play nicely with cache-coherence protocols (right now it doesn't, presumably to gain a speed boost).

    It's an OS bug if you think that the OS should be bright enough not to make AGP-touched memory cacheable (it wasn't intended to be).

    I'm voting for option 3), myself.

  4. Well, I'll be by Anonymous Coward · · Score: 1, Informative

    Well, I'll be darned. Vendors pointing the finger at each other. Who'd have thought?

  5. Re:Easy - Buy Intel. The cost of using 2nd party.. by pivo · · Score: 2, Informative
    Your argument is incoherent. The idea that lower cost equals lower performance can't be backed up by the presence of a bug, and it ignores real (as in "this is reality") market factors. There have been bugs in all sorts of hardware and software, even the highest performing hardware in the world. There is no correlation.

    If you paid attention to benchmarks you'd see that in almost every case AMD has a higher cost effectiveness than Intel. If you have some specific examples of why AMD is not a good choice (as opposed to vague, illogical ramblings) then why don't you share them? Prove that your mumblings are, "not made up of bugus stuff"

  6. Re:This is embarassing by Dahan · · Score: 3, Informative
    Actually, it isn't embarrassing at all. It wasn't the "Linux Community"'s fault. This is the fault of AMD who announced/classified the bug as a Windows 2000 issue instead of a hardware issue.

    If you read the technical writeup on LKML, you'll see that it's not a hardware issue, but a software bug. Which is why AMD announced the bug as a Windows 2000 issue--it is one. Linux also happens to have the same bug (it's a subtle issue and an easy mistake to make, IMO), but how was AMD supposed to know that Linux was doing the same bad thing--mapping the AGP GART area cacheable, when the GART is non-cacheable?

  7. Re:This is embarassing by LordNimon · · Score: 2, Informative
    but how was AMD supposed to know that Linux was doing the same bad thing

    Oh, that's easy. The engineer who discovered the problem should have realized that it's not necessarily a Windows-specific issue, but a problem that any OS could have. He should have then tried to contact all the OS vendors, not just Microsoft.

    Considering how Linux is used by a higher percentage of AMD customers than Intel customers, AMD should have paid more attention to an important segment of its customer base.

    --
    And the men who hold high places must be the ones who start
    To mold a new reality... closer to the heart
  8. OS Bug by kenneth_martens · · Score: 3, Informative

    According to the article, it is not a problem with the motherboard at all. The problem is "the operating system is creating coherency problems within the system by creating cacheable translation to AGP GART-mapped physical memory." That means it's a problem with the OS, not with the motherboard or processor.

    In truth, we should probably say it is a combination of a problem with the OS and a problem with the processor. After all, Intel processors don't have the same problem, simply because they work differently. So while it may not technically be the CPU's fault, the CPU does play a part.

  9. It's Linux, NOT the motherboard! by Anonymous Coward · · Score: 2, Informative

    Get the story straight:

    "Our conclusion is that the operating system is creating coherency problems within the system by creating cacheable translation to AGP GART-mapped physical memory."

  10. Re:this is not a motherboard bug either... by addaon · · Score: 3, Informative

    While the use of the GART you mention (video chipsets with no onboard memory) really does suck, performance-wise, the GART itself is not useless. Most games today limit themselves to 16MB or so of textures, so that they run properly, without swapping to main memory, with a 32MB video card. However, if you want a game with 256MB of textures, say, you have three options.

    1) Get a video card with 270+MB of memory. (Yeah, right.)

    2) Snatch from main memory the portions of the texture you need. (This gets slow AND ugly if you use more than ~16MB in a single frame.)

    3) Use the GART, take (less of) a performance hit, and just keep the textures in system memory.

    This was the original purpose of the GART, and is still important.
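
    As a rough illustration of what the remapping in option 3 amounts to, here is a toy Python model. It is only a sketch: the class name, the aperture base address, and the frame addresses are invented for the example, and a real GART lives in the chipset, not in software.

```python
PAGE_SIZE = 4096  # 4KB pages


class ToyGart:
    """Toy model: a table that makes scattered physical pages look contiguous."""

    def __init__(self, aperture_base, page_frames):
        self.aperture_base = aperture_base
        # Entry i holds the physical base address of the page that backs
        # the i-th 4KB page of the aperture.
        self.table = list(page_frames)

    def translate(self, addr):
        """Map an address inside the aperture to the backing physical address."""
        offset = addr - self.aperture_base
        if offset < 0 or offset >= len(self.table) * PAGE_SIZE:
            raise ValueError("address is outside the aperture")
        index, within_page = divmod(offset, PAGE_SIZE)
        return self.table[index] + within_page


# Three scattered system-memory pages presented as one contiguous 12KB window.
gart = ToyGart(0xE0000000, [0x00300000, 0x01A00000, 0x00042000])
```

    The graphics card sees one contiguous range, while the textures actually sit in ordinary system memory pages that the CPU can also reach through its own mappings.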

    --

    I've had this sig for three days.
  11. Re:You are assuming... by Dahan · · Score: 5, Informative
    Apparently the GART is cacheable on pentium systems?

    There are Pentium systems with an AGP port? If you mean the Pentium II and up, I don't see why the GART would be cacheable there either; I don't know if the P4 chipsets have changed things, but with the PII and PIII, here's what Intel had to say about the subject:

    For current hardware implementations, the OS will make AGP memory (like other video memory) non-cacheable, so that there is no coherency problem between the CPU caches and the data that the graphics controller uses. Otherwise, graphics controller accesses to AGP memory would require "snooping" the CPU caches, which would cause delays in execution in some cases.

    -- AGP and Graphics Optimization Techniques

    (Emphasis added). As for why the bug doesn't happen on Intel CPUs, it sounds like the Athlon has more aggressive speculative writes and can change memory that wasn't explicitly written to, dirtying the cache. But in any case, even on Intel CPUs, the AGP area is supposed to be mapped non-cacheable.
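
    For readers wondering what "mapped non-cacheable" looks like at the page-table level, here is a minimal Python sketch of the flag bits in a 32-bit x86 page-table entry. The bit positions (Present, Writable, PWT, PCD) are the standard x86 ones; the helper names are made up, and the sketch ignores MTRRs and PAT, which also influence the effective memory type on real hardware.

```python
# Flag bits of a 32-bit (non-PAE) x86 page-table entry.
PTE_PRESENT = 1 << 0   # mapping is valid
PTE_WRITABLE = 1 << 1  # writes allowed
PTE_PWT = 1 << 3       # page-level write-through
PTE_PCD = 1 << 4       # page-level cache disable
PAGE_SHIFT = 12        # 4KB pages


def make_pte(phys_addr: int, cacheable: bool) -> int:
    """Build a PTE for the 4KB frame containing phys_addr (hypothetical helper)."""
    frame = phys_addr & ~((1 << PAGE_SHIFT) - 1)
    flags = PTE_PRESENT | PTE_WRITABLE
    if not cacheable:
        # Ask the CPU not to cache accesses made through this mapping.
        flags |= PTE_PCD | PTE_PWT
    return frame | flags


def is_cacheable(pte: int) -> bool:
    """True if the entry does not have the cache-disable bit set."""
    return not (pte & PTE_PCD)
```

    The complaint in the AMD writeup, as quoted in this thread, is that the OS ends up with a cacheable translation to memory the GART also maps, i.e. an entry built the way make_pte(addr, cacheable=True) builds it.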

    Why does disabling large pages fix the problem?

    Don't know about that one; I haven't read the various tech docs for the Athlon. Perhaps the cache works slightly differently with 4MB pages vs 4KB pages?
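
    Without claiming to answer that question, the size difference itself is easy to put in numbers. The Python sketch below only does the arithmetic; the helper names and the example addresses are invented, and nothing here establishes why turning off large pages helps.

```python
SMALL_PAGE = 4 * 1024          # 4KB
LARGE_PAGE = 4 * 1024 * 1024   # 4MB


def small_pages_per_large_page() -> int:
    """How many 4KB frames a single 4MB mapping covers."""
    return LARGE_PAGE // SMALL_PAGE


def large_page_base(phys_addr: int) -> int:
    """Base address of the 4MB-aligned region containing phys_addr."""
    return phys_addr & ~(LARGE_PAGE - 1)


def shares_large_page(addr_a: int, addr_b: int) -> bool:
    """True if both addresses fall inside the same 4MB-aligned region."""
    return large_page_base(addr_a) == large_page_base(addr_b)
```

    One 4MB mapping covers 1024 of the 4KB frames, so a single cacheable large-page entry spans every frame in its region, including any that happen to be handed to the GART. With 4KB pages each frame has its own entry.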

  12. Re:this is something.. by geekoid · · Score: 3, Informative

    Boy, where have I heard that before... oh yeah, every 2 years since there have been Macs. Sheesh.
    FYI, I don't own a Mac, but I will purchase one next time I want a computer.

    --
    The Kruger Dunning explains most posts on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect