Tracking Down The AMD "Processor Bug"
tercero writes: "Over at the Gentoo Linux website there is an update on the AMD processor bug mentioned here. In short, AMD claims it's not a bug with the Athlon processor, but with the motherboard. More detailed information can be found in this LKML post."
An Anonymous Coward points to a similar explanation at Linux Weekly News.
Update: 01/25 01:25 GMT by T : Daniel Robbins from Gentoo clarifies: "AMD is not
calling this a 'motherboard' issue, it is an interaction between a
feature of the Athlon called 'speculative writes' and the design of the
GART, which is not cache-coherent. It's an 'Athlon/cache coherency/GART'
problem, not a 'motherboard' problem."
2+2=3.9999999999999999999999999999983774
Oh wait, that was Intel.
Hammer of Truth
According to young bald children everywhere, "There is no bug".
In related news, the motherboard manufacturers are quoted as saying, "It's not a bug with the motherboard, but with the Athlon processor."
--SC
You read fiction? I write it! Lemme know what you th
And it's never our program that has your bug.
Meanwhile, we're feverishly fixing your bug in our software.
"Yes sir, we've patched around the OS problem and this should get rid of that nasty bug you were seeing."
LILO passes kernel parameters via an 'append' line, so the syntax would be
append=" mem=nopentium"
Make sure you aren't appending anything else. If you are, just add the mem=nopentium at the end of your existing append line.
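For anyone who would rather script that edit than do it by hand, here is a minimal sketch in Python. The helper name add_kernel_option is made up for illustration and is not part of LILO. It assumes the append line is written with double quotes on a single line, and it only touches the first append line it finds.

```python
import re

# Hypothetical helper, not part of LILO: adds a kernel option to the first
# append="..." line of lilo.conf-style text.
APPEND_RE = re.compile(r'^(\s*append\s*=\s*")([^"]*)(".*)$')

def add_kernel_option(conf_text, option="mem=nopentium"):
    lines = conf_text.splitlines()
    for i, line in enumerate(lines):
        m = APPEND_RE.match(line)
        if m is None:
            continue
        existing = m.group(2).split()
        if option not in existing:
            # Keep whatever was already being appended and add ours at the end.
            existing.append(option)
        lines[i] = m.group(1) + " ".join(existing) + m.group(3)
        return "\n".join(lines) + "\n"
    # No append line at all: put one at the very top, which places it
    # among the global options (assumed to be what is wanted here).
    lines.insert(0, 'append="%s"' % option)
    return "\n".join(lines) + "\n"
```

Feed it the contents of your lilo.conf and write the result back out yourself; the sketch deliberately does no file I/O. Remember that LILO only picks up changes after you rerun /sbin/lilo.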
I pledge allegiance to the flag...
of the Corporate States of America...
Don't blame AMD entirely. They acknowledged the bug back in September of 2000 and immediately released patches for Windows 2000. Consequently, it doesn't affect users of Windows XP either. It's been around for over a year and now it's "news"? This should've been fixed in the Linux kernel months ago. Sorry for sounding so harsh.
If you celebrate Xmas, befriend me (538
It's an optimization for Windows XP!!
ender-iii
The kernel will look for the parameter
mem=nopentium
and turn off 4MB pages (which may or may not prevent the problem from manifesting -- the situation is unclear at this time). You can do this at the boot prompt like this
LILO boot: linux mem=nopentium
or by placing the configuration directive
append="mem=nopentium"
in your /etc/lilo.conf configuration file.
See the manual page for lilo.conf for the details.
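If you want to confirm the option actually reached the running kernel after a reboot, the boot parameters are exposed in /proc/cmdline. A small Python sketch of that check follows; the function names are my own. Note that it only tells you the parameter was passed, not whether the problem is actually avoided.

```python
def has_kernel_option(cmdline, option="mem=nopentium"):
    """Return True if `option` appears as a whole token in a kernel command line."""
    return option in cmdline.split()

def read_cmdline(path="/proc/cmdline"):
    # /proc/cmdline holds the parameters the running kernel was booted with.
    with open(path) as f:
        return f.read()
```

On a Linux box you would call has_kernel_option(read_cmdline()).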
dragonhawk@iname.microsoft.com
I do not like Microsoft. Remove them from my email address.
Mac users don't have to worry about using the term 'Gigahertz' either.
"Derp de derp."
This is embarrassing to the Linux community as a whole, and it also explains why I've had problems with crashes on two different systems running Linux and Athlons.
What I don't understand is how this could have made it so far. This is exactly the sort of problem I have been telling people we don't have in the Linux world, and now it looks like I was wrong. Is this pointing out an underlying problem we have with QA in the Linux kernel? With Open Source in general? What can we do to make sure that bugs of this magnitude are detected more quickly?
Sigs are awesome huh?
Yesterday, information became widely available that described possible stability issues (system crashes, hangs, etc.) when using an AGP video card under Linux in conjunction with an AMD Athlon processor. It was generally called a "bug" in the Athlon CPU.
More information is now available at http://www.gentoo.org, including an analysis of AMD's response. AMD's official response was posted to LKML, and is available at http://www.geocrawler.com/lists/3/Linux/35/175/7626960/.
There is apparently some kind of bad interaction between the AGP GART ("Graphics Address Remapping Table", I think?), speculative memory operations performed by the Athlon processor, the memory mappings used by the kernel, and cache coherency. The details are beyond me, but the practical upshot appears to be that the wrong data ends up being written back to main memory at some point.
I recommend reading the above LKML thread if you suspect you are affected by this issue. Information is still being uncovered, and it is not immediately clear how this occurs, what causes it, who is affected by it, and how to work around it.
In particular, there is some uncertainty as to whether the "mem=nopentium" option actually prevents the problem, or merely makes it less likely to occur.
dragonhawk@iname.microsoft.com
I do not like Microsoft. Remove them from my email address.
Recently one of my friends, a computer wizard, paid me a visit. As we were talking I mentioned that I had recently installed Windows XP on my PC. I told him how happy I was with this operating system and showed him the Windows XP CD. To my surprise he threw it into my microwave oven and turned it on. Instantly I got very upset, because the CD had become precious to me, but he said, "Do not worry, it is unharmed."
E lO E5IOCC98D444AA08EI324
After a few minutes he took the CD out, gave it to me and said, "Take a close look at it."
To my surprise the CD was quite cold to hold and it seemed to be heavier than before. At first I could not see anything, but on the inner edge of the central hole I saw an inscription, an inscription finer than anything I had ever seen before. The inscription shone piercingly bright, and yet remote, as if out of a great depth:
12413AEB2ED4FA5E6F7D78E78BEDE820945092OF923A40E
'I cannot understand the fiery letters,' I said in a timid voice.
"No but I can," he said. '"The letters are Hex, of an ancient mode, but the language is that of Microsoft, which I shall not utter here. But in common English this is what it says:
'One OS to rule them all, One OS to find them,
One OS to bring them all and in the darkness bind them.'
It is only two lines from a verse long known in System-lore:
'Three OS's from corporate-kings in their towers of glass,
Seven from valley-lords where orchards used to grow,
Nine from dotcoms doomed to die,
One from the Dark Lord Gates on his dark throne
In the Land of Redmond where the Shadows lie.
One OS to rule them all, One OS to find them,
One OS to bring them all and in the darkness bind them,
In the Land of Redmond where the Shadows lie.'"
AMD claims it's not a bug with the Athlon processor, but with the motherboard
According to young bald children everywhere, "There is no bug".
In related news, the motherboard manufacturers are quoted as saying, "It's not a bug with the motherboard, but with the Athlon processor."
Funny, I didn't think I was bald...
It's an Athlon bug if you think doing speculative writes is a bug.
It's a motherboard chipset bug if you think that the AGP controller should play nicely with cache-coherence protocols (right now it doesn't, presumably to gain a speed boost).
It's an OS bug if you think that the OS should be bright enough not to make AGP-touched memory cacheable (it wasn't intended to be).
I'm voting for option 3), myself.
Sheesh! Read the above article where it states "...AMD claims it's not a bug with the Athlon processor, but with the motherboard". AMD is claiming no such thing! They are claiming it's a Linux kernel bug.
it's not a bug with the Athlon processor, but with the motherboard
I somehow wonder if this is related! I had a P3 system with a GeForce 2head card, and everything was working fine. I replaced the motherboard with an ASUS P4B and an Intel P4 chip. Ever since, I intermittently get a BSOD (bad pool caller).
Point is, isn't this very similar to the problems that were reported on AMD Win2k systems without the patch?
I also noticed that as I run programs, not all the memory used by the program is freed when the program terminates. I ran the System Monitor and it revealed to me this information. I'm not sure if this is Athlon or Windoze related. Anyways, I'm suspecting that the problem may not be limited to Linux boxes.
From the LKML post linked in the story, it seems it's because some 4MiB pages (I couldn't understand why 4KiB pages aren't affected, if they effectively are not) are allocated for the AGP (GART more specifically) with some bits set telling it is cacheable.
Why would somebody want to cache the AGP memory? I'm pretty sure it's used 99.99% of the time as write-only memory, because it's the main output method of most computers. What's the point of caching that? It can only prevent the use of the CPU cache by some more important things, no?
Feel free to correct me if I'm wrong, I'm not very familiar with the usage of AGP memory (or GARTs).
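For reference, the "bits" in question live in the x86 page-directory entry. Below is a rough Python sketch of how one could decode them; the bit positions are the standard non-PAE x86 layout, while the function name describe_pde is made up for illustration. It ignores MTRRs and PAT, which also influence the final memory type, and it says nothing about which entries the kernel actually sets up for the GART.

```python
# x86 (non-PAE) page-directory entry flag bits.
PDE_PRESENT = 1 << 0   # P: entry is valid
PDE_PWT     = 1 << 3   # PWT: write-through caching
PDE_PCD     = 1 << 4   # PCD: caching disabled for this mapping
PDE_PS      = 1 << 7   # PS: entry maps a 4 MiB page directly (needs CR4.PSE)

def describe_pde(entry):
    """Decode the flags of a 32-bit page-directory entry (illustrative only)."""
    present = bool(entry & PDE_PRESENT)
    return {
        "present": present,
        "large_page": present and bool(entry & PDE_PS),
        # Simplification: treat "PCD clear" as cacheable, ignoring MTRR/PAT.
        "cacheable": present and not (entry & PDE_PCD),
        "write_through": present and bool(entry & PDE_PWT),
    }
```

A present entry with PS set and PCD clear would be a cacheable 4 MiB mapping, which is the kind the LKML post is talking about.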
If the bug doesn't appear on intel chips, then how are we supposed to believe that it's not an AMD bug? Sure, we could blame the motherboard... but wouldn't that mean via/intel solutions would carry the same issue?
Does anyone know how Intel treats these 4MB pages differently?
I mean, if the bug is caused by AMD's precaching of AGP GART-mapped memory, and Intel just doesn't precache that memory, then how is it NOT an AMD processor bug?
When two processors aren't equal, there has to be a reason for the difference in running software.
(Note that I prefer AMD, so I'm just looking for answers, not trolling).
Basically, it seems that someone figured that the GART shouldn't worry about the CPU potentially caching 4MB pages and simplified their circuits accordingly. Unfortunately, they forgot to tell OS developers (NB: I wonder if this affects other OSs like the now doomed Solaris/x86 or *BSD?) causing these problems.
.... "don't blame me, its a motherboard problem"
MARIJUANA, SHROOMS, X: ONLINE?! - E
Hmmmm.
Is the Bug...
- (A) In the Athlon cache?
- (B) In the chipset?
- (C) In the AGP-using devices misusing memory?
- (D) In the Linux kernel?
Well, AFAICT, the real bug is in the communication of relevant knowledge. These kinds of bugs would have significantly shorter duration if the specifications for all four possible culprits in (A)-(D) were openly published, completely, for all to see.
"Provided by the management for your protection."
I recently put together an HTPC (Home Theater PC) based on an ASUS A7V133 (Via) motherboard with AMD Duron processor. It runs Windows 98. I had been experiencing an unbelievable number of random lockups (no blue-screen, no error... just locks). For the most part, I couldn't keep the system running for more than an hour or so.
In doing extensive research on the problem, I found very large numbers of people with the same problem and very little explanation. I tried MANY different solutions and eventually found one that worked. It involved wiping everything out and installing hardware and software in a VERY specific order. It seems that if you don't install the VIA 4-in-1 drivers (which include GART) at just the right time in the system building, the drivers don't work properly and thus the random lockups.
I wonder if this is in any way related to the problem here.
-S
--- What parts of "shall make no law", "shall not be infringed", and "shall not be violated" don't you understand?
If you paid attention to benchmarks you'd see that in almost every case AMD has a higher cost effectiveness than Intel. If you have some specific examples of why AMD is not a good choice (as opposed to vague, illogical ramblings) then why don't you share them? Prove that your mumblings are, "not made up of bugus stuff"
When Linus switched to the AA VM, I got the impression that one of the key differences between the AA VM and the RvR VM is that Rik's VM is much more flexible, but with that flexibility comes complexity, which is why Linus switched to AA's VM. AA's was much simpler to understand and helped to stabilize the VM problems. Does the above quote mean that the AA VM isn't going to be able to handle the requirements to fix this bug? Is this a plug to put back RvR's VM?
I'm not trying to start a flame war here, just want to understand if I understood what the final paragraph was saying. Please mod me down if I'm way off base, but help me understand too!
Key to financial independence: Spend less than you earn. Save and invest the difference. Do it for a long time.
>Lower costs typically means lower perfomance
What planet are you from? Lower costs (in the case of demonstrated similarity in performance) typically means lower demand and lower consumer valuation of the brand name, which means smaller user base, which means that it generally takes longer to run into compatibility flaws.
For instance, Nike is more expensive than Puma. Does that mean Nike shoes are better? Of course not, it means people are more willing to buy Nike, because they perceive that the brand gives them additional value. In the world of shoes, that value is the value of conformity and fashion .. in CPUs, it's the value of a larger consumer base, which essentially translates into a higher possibility of latent design flaws (ie, they exist in the costlier platform as well, but are found earlier because of the larger user base), and the value of being in the same boat as everyone else should a product fail in some fashion.
The funniest thing is you're talking about performance. Performance is how well something works when it works. When it /doesn't/ work, that's not performance; it's either compatibility with the outside world or a design flaw. Anyhow, I feel sorry for your view, because I guess you're paying a lot of money for brand security .. but every in-the-know computer geek I know (I'm a C++ developer, so I'm not talking tech fanboys here) knows that you'd have to enjoy wasting money to justify buying Intel CPUs at this point in time.
Lest you cite this situation as a reason why I might be wrong .. it has already been fixed in Windows, and there is a known Linux workaround. So really, there's not much of an issue, and my AMD chip still cost me half the price of an Intel CPU, and benchmarks faster than the Intel, to boot! Keep buying your Nikes! I just want the shoe. :)
"Old man yells at systemd"
>essentially translates into a higher possibility
er, I meant lower possibility of latent design flaws with a large user base. A smaller user base increases the likelihood of problems existing unnoticed for an unspecified amount of time.
"Old man yells at systemd"
According to the article, it is not a problem with the motherboard at all. The problem is "the operating system is creating coherency problems within the system by creating cacheable translation to AGP GART-mapped physical memory." That means it's a problem with the OS, not with the motherboard or processor.
In truth, we should probably say it is a combination of a problem with the OS and a problem with the processor. After all, Intel processors don't have the same problem, simply because they work differently. So while it may not technically be the CPU's fault, the CPU does play a part.
Interestingly enough, this feature of AGP is not really critical to increasing performance in games - in fact, it could be counterproductive to it.
The AGP GART (Graphics Address Remapping Table, I believe) maps "video card memory addresses" to "main memory addresses", i.e., it's to allow the graphics card to grab textures, etc. directly from main memory without going through the CPU.
Many motherboard manufacturers use this feature to provide on-board video without any dedicated memory so they don't have to include any additional memory for the graphics card.
Of course, since this blows so massively performance-wise, it's mostly abandoned now.
Is the GART actually useful for anything except extending the video card's onboard memory? I'm not really sure...
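To make the "remapping table" idea concrete, here is a toy model in Python. It is only a sketch of the concept: the real GART lives in the chipset and is programmed by a kernel driver, and the class name, the 4 KB page size, and the aperture address below are all assumptions chosen for illustration.

```python
PAGE_SIZE = 4096  # assumed page size for this toy model

class ToyGart:
    """Toy remapping table: contiguous aperture pages -> scattered physical pages."""

    def __init__(self, aperture_base):
        self.aperture_base = aperture_base
        self.table = {}  # aperture page index -> physical page frame number

    def map_page(self, index, phys_frame):
        self.table[index] = phys_frame

    def translate(self, addr):
        offset = addr - self.aperture_base
        if offset < 0:
            raise ValueError("address is below the aperture")
        index, within = divmod(offset, PAGE_SIZE)
        if index not in self.table:
            raise KeyError("aperture page %d is not mapped" % index)
        # Same offset inside the page, different page frame in main memory.
        return self.table[index] * PAGE_SIZE + within
```

The graphics card sees one contiguous range starting at aperture_base, while the pages behind it can sit anywhere in main memory. The trouble discussed in this thread starts when the CPU also has its own cacheable mapping of those same physical pages.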
You are assuming that AMDs current explanation is 100% true, correct, and complete. There are good reasons to doubt this.
The "explanation" so far has just raised more questions. Why does the same code that causes the athlon to crash work fine on pentiums? Apparently the GART is cacheable on pentium systems? And the Athlon is billed as pentium-compatible...
Why does disabling large pages fix the problem? If their explanation is correct, that fix should not work, because it doesn't address the issue they claim to be the problem.
I'm sure this will get worked around in software (and the linux fix will actually workaround the underlying problem, rather than just making it less likely as the windows world seems to be satisfied with) once the real details of this are known. But to claim it's not a hardware bug is ludicrous. It's a bug with the Athlon CPU, or with certain GARTS found in Athlon chipsets, or both. If AMD were less worried about spin-controlling it and claiming it's the software at fault maybe they would be more forthcoming about what is really going on here.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Friends don't let friends enable ecmascript.
Get the story straight:
"Our conclusion is that the operating system is creating coherency problems within the system by creating cacheable translation to AGP GART-mapped physical memory."
Hmm... my 'Mac users don't have to worry about using the term gigahertz' post got modded down as flamebait.
The original post of 'Mac users don't have to worry about this [the Athlon bug]' is flamebait; my response was a humorous way of saying 'this is why buying a Mac won't solve my problem.'
An Off-topic moderation wouldn't have bothered me since I didn't spell out my reasoning, but I do feel the flamebait call was bad.
"Derp de derp."
While the use of the GART you mention (video chipsets with no onboard memory) really does suck, performance-wise, the GART itself is not useless. Most games today limit themselves to 16MB or so of textures, so that they run properly, without swapping to main memory, with a 32MB video card. However, if you want a game with 256MB of textures, say, you have three options.
1) Get a video card with 270+MB of memory. (Yeah, right.)
2) Snatch from main memory the portions of the texture you need. (This gets slow AND ugly if you use more than ~16MB in a single frame.)
3) Use the GART, take (less of) a performance hit, and just keep the textures in system memory.
This was the original purpose of the GART, and is still important.
I've had this sig for three days.
...Slashdotters that always point out their favorite OS isn't vulnerable to a particular bug.
My Macintosh isn't affected by this bug due to its PowerPC processor.
I have a website. It's about Macs.
You could also argue that the published spec for the GART states that it shouldn't worry and the OS developers didn't read the spec and assumed that everything worked just like a Pentium *.
Thus by conforming to a specific implementation, rather than the published spec, it is an OS bug. My architecture knowledge is rusty enough to be unsure which answer is correct.
You can only drink 30 or 40 glasses of beer a day, no matter how rich you are.
-- Colonel Adolphus Busch
I believe that's the currently proposed fix - it may change as people understand the details more, but I think that's the basic idea.
himi
My very own DeCSS mirror.
Boy, where have I heard that before... oh yeah, every 2 years since there have been Macs... sheesh.
FYI I don't own a mac, but I will purchase one next time I want a computer.
The Kruger Dunning explains most post on
Why doesn't your ISP use AMD for servers?
They probably do. Well, depending on size. If they're AOL-sized, I doubt they use PCs at all.
If they're smaller, they probably use AMD chips. Certainly they used celerons and other cheap technology.
There are two types of setups.
1) Very expensive server with the best (and best "name") hardware money can buy.
2) Cheap crap in a fail-over cluster.
For many things like email servers, news servers, etc, the cheap cluster is most cost efficient, easier to maintain (want to fix one? Unplug it and the others take over automatically), and easier to build.
While your ISP may not use AMD (the savings for a cheap Duron + mobo vs a cheap Celeron + mobo aren't great when you get into motherboards with integrated video and LAN), they would if it saved them any money.
There are some tasks that are hard to "fail over" and those require a sturdy server, but even then, as long as it's not rack mounted, AMD has a good reputation (with an AMD chipset).
I vote for a +5 Informative mod. :-)
299,792,458 m/s...not just a good idea, its the law!
Galileo: "The Earth revolves around the Sun!"
Score: -1 100% Flamebait
You wouldn't wear them, would you?
I had the same opinion about wasting money on Intel, so I bought AMD. Though I'm glad you're having a good experience with your AMD, I simply can't agree:
I really don't agree with your feeling that performance is how well something works when it works. If that were true, I could just stay home from work most of the time and kick butt when I show up. (OK, I sort of do that now, but that's another issue...)
Performance, in my book, is a judgement of how well something is doing its job. My AMD 'sometimes-kickass' workstation is not performing well in my opinion, even though when it does run, it runs great.
If I look at the system as the tool that it's supposed to be, it simply isn't giving good performance.
Let me explain:
I have a few boxes on a small network at home - My main workstation is an Asus/AMD 1200Mhz setup running RedHat 7.2 - Before that, it was an Asus/AMD 600Mhz setup. Both systems have had the same problems, even though *Every Component* has been replaced in an effort to track down the problem. This morning, it froze a few minutes after the screen saver kicked in.
Each time this happens, I have to do a hard reboot. The other day, I added that mem=nopentium option and it still has the problem.
I used to have some big drives in the PC, but they were getting thrashed by the powerdowns, so I replaced them with a single 10GB and moved the 2 60GB drives to the server in my laundry room.
The server, by the way, is an old 300Mhz IBM with an Intel chip and it happily chugs along, serving files by Samba, database stuff, CGI/Apache stuff, SSH logins, VNC logins, whatever I happen to throw at it.
This is a machine that I literally snagged from the trash, but you couldn't *pry* it from me at this point. I just ran uptime on it, just for kicks:
11:39am up 155 days, 13:45, 3 users, load average: 0.00, 0.00, 0.00
So, do I regret spending so much time and money on AMD? Yes.
Would I buy them again?
No.
To me it *is* a performance issue. The AMD system has not done its job.
YMMV,
Jim in Tokyo
-- My Weblog.
I've got karma to burn. With 3 down-mods, I'm at 47 now. So it's no big deal. It's just sad that people don't recognize the truth. Of course, the idiot above me got one downmod and that's it, even though he was spouting off complete bullshit. Oh well...
Every once in a while I like to masturbate a new word into my vocabulary, even if I don't know what it means.
"I think it won't work out, because there's too much legacy stuff that there will always be confusion at this point about what "mega" and "kilo" mean with computers."
Not to mention the fact that computers are incapable of "thinking" in anything but a power of two. You will not find a discrete quantity of 10 (or a power thereof) bytes anywhere in a computer system. This makes the SI units useless for computers. While re-defining them for use in computers was and still is an abuse, the lack of applicability of the conventional SI units makes it largely a non-issue. The only people who care are HDD manufacturers who rate drives in "millions of bytes" so they can swindle stupid customers.
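For the curious, the size of that gap is simple arithmetic. A quick Python sketch, with a made-up function name and a 60 GB drive picked purely as an example figure:

```python
def decimal_gb_to_gib(gb):
    """Convert a capacity quoted in decimal gigabytes (10**9 bytes) to GiB (2**30 bytes)."""
    return gb * 10**9 / 2**30

# A drive sold as "60 GB" holds 60 * 10**9 bytes.
advertised = 60
binary = decimal_gb_to_gib(advertised)
shortfall_percent = 100 * (1 - binary / advertised)
```

For 60 GB this comes out to a bit under 56 GiB, a shortfall of roughly 7 percent. The ratio is the same for any drive size at the giga level, since it is just 10**9 / 2**30.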
dragonhawk@iname.microsoft.com
I do not like Microsoft. Remove them from my email address.
My secret wish would be for Apple to support AMD x86-64 processors with MacOS X...not that it'll happen. That would be a great combination.
On the other hand, if they can really hit 1.5+ GHz. with the G5, that'll be OK too. Just a lot more expensive.
299,792,458 m/s...not just a good idea, its the law!
Galileo: "The Earth revolves around the Sun!"
Score: -1 100% Flamebait
Well. To address 2 points...
Yes, Slashdot moderation is ridiculous. Period. It's the biggest problem with Slashdot, and that's why I hardly ever read it anymore.
As far as Apple and processors... the G4 is a pretty good processor, and I have a Dual 800MHz G4 and am loving every second I get to spend on it... but I also have a Dual Athlon MP 1600+ (1.4GHz) and the Dual Athlon spanks the shit out of it at all the things that really matter (Quake 3 :P). Now, I am using Linux on the Athlon and OS X on the G4, so you can draw any comparisons you wish.
That having been said, the G4 is a fast machine. But... It was $4,000 with monitor ($500 Sony CPD-G400), whereas with a $1,000 monitor, my Dual Athlon was only $3,000. The only differences are that the Athlon has a GF3 Ti500 (the PowerMac has a GF3 regular), the G4 has 128 more megs of RAM (whoopty doo), and the G4 has an extra 60 gig drive. Now I'm not complaining - I think the machine was worth what I paid for it. But for $3,500 (computer itself), you'd think it should spank the shit out of a $2,000 computer (okay, $2,100 with shipping and all).
The point is, the G4 isn't that fast. Apple really would do better to put in some AMD processors, knock the price down a LEETLE and be able to claim that it really *did* burn Pentiums (instead of just with Adobe products).
One can only hope...
Every once in a while I like to masturbate a new word into my vocabulary, even if I don't know what it means.