Major Linux/Athlon CPU bug discovered
GeorgeFrancisco writes "I recently installed the nVidia drivers so I could play TuxRacer on my Athlon. Problem is it kept inexplicably hanging Linux. Now I know why. The CPU bug affects Athlon/Duron/Athlon MP AGP users. Fortunately there's a way around it, and: "Alan [Cox] is going to try to add some kind of Athlon/AGP CPU bug detection code to the kernel so that it will be able to auto-downgrade to 4K pages when necessary." Read more on the Gentoo Linux site."
I noticed this too. It seems to only affect 3D games, mainly SDL-based ones such as Armagetron, but strangely it hasn't affected Quake 3 at all. Unreal Tournament was affected, but I SWEAR it didn't use to do that.
There was a Win2k bug a while back that did the exact same thing, and you had to install a "LargePageMinimum" patch for it to not crash. Is this the Linux equivalent of that? And if so, how come it has taken so long to surface and fix?
I guess it takes a while to pile through the submissions. This was posted on pclinuxonline.com recently.
VIA does make some complete crap, but they also make some nice chipsets. The KT266A is very nice, it's the fastest DDR implementation out there by far. But still, VIA chipsets are a good bit cheaper than the Intel equivalent, and while the Intel chipset may be more stable, the VIA one is almost always faster. And even Intel has issues with chipset stability, it's just that they ignore them and only quietly replace the faulty boards when they're returned under warranty. You know how it goes in the computer industry... Faster, cheaper, or more stable- pick any two.
So does anyone know how performance is affected from this 4MB->4KB page thing?
so the question is, if I configure my kernel for the K7 family, do I need to pass the kernel "mem=nopentium" or is this the default?
The NVIDIA drivers force AGP to 1x because of corruption caused by signal integrity problems in the AMD Irongate chipset (mentioned in the README for the NVIDIA 1.0-2313 drivers).
This newly discovered memory corruption with Athlon + AGP: is it contributing to the Irongate's signal integrity problems? Or is it a separate bug?
Anyway this makes AMD look very bad in my view. There is a bug in the CPU and their chipset screws up my AGP to 1x. Sigh.
AMD doesn't keep tabs on VIA, and VIA doesn't keep tabs on motherboard manufacturers. The only decent AMD motherboards are from manufacturers trying to compete in the enthusiast market, where crap boards just don't sell. Add to that VIA actually being in competition with AMD in the budget processor market (the Cyrix), delaying a decent integrated chipset for the Duron, and bullying motherboard manufacturers into not producing the SIS 735 chipset, and VIA is not AMD's best friend.
AMD chipsets:
Nforce 220,420
AMD-760MPX,760MP,760
ALi MAGiK 1,MAGiK 2
SIS 735,745,746,755
VIA KT266A,KT133A,KM133,KLE133,KT333,K8HTB
STABLE (100+Days,Linux) Chipsets:
760,KT133A,735,760mp
Good Motherboard Manufacturers:
Asus,Abit,Iwill,ECS,Epox,Soyo
Personal Best Uptime 135 days, Iwill KK266 (KT133A), Power supply failure
Can anyone tell me if this problem may occur when running SETI? I used to run it on my dual MP Athlon under MDK 8.1, but it would invariably kill my machine, so I stopped running it. Ideas, anyone?
I have 2 Athlon systems, a dual Thunderbird 1.4GHz (Tyan Thunder K7) and a single Thunderbird 1.4GHz (Asus A7V133). The former runs a GeForce 3 and kernel 2.4.17, the latter a TNT2 and RH 7.2 (kernel 2.4.9 I believe). Both systems run semi-custom NVidia drivers (release 2313). By semi-custom, I mean I tweaked them to use SBA, the NVIDIA AGP driver (NOT agpgart) and to run in 4x mode. The latter has never had a problem; the former (the dual) had some problems until kernel 2.4.14.
The problems I had were frequent lockups with everything X, especially Q3A and Tribes 2. Some experimenting proved what worked and what didn't, and here's what I found:
agpgart never worked worth a damn even with kernel 2.4.17, despite several attempts by me to make it work (I don't maintain it, so I gave up on messing with it). Earlier NVIDIA drivers were less stable, but the latest is great (although it does not support FW, which blows). Tweaking the NVIDIA driver to use SBA and its own AGP driver instead of agpgart, along with kernel 2.4.14 - 2.4.17, makes for a very stable and fast system. Older kernels just did not work worth a damn whenever I enabled DMA on my IDE drive - they locked every time. These newer kernels don't exhibit this problem, and the NVIDIA driver works nicely with all 3D games as well as 3D development tools like Blender.
My kernels have always been compiled as Athlon kernels as well. The bottom line is: don't blame this bug and/or the NVIDIA driver if your system is unstable and/or slow. There are other things at work, and in my case I seem to have found them all.
- Rohan
Yep, in addition to the 4004 bug mentioned in one of your links above, I seem to recall a bug in the Z80 (in fact, a Russian company developed some kind of device (calculator??) which used a custom CPU compatible with the Z80 chip -- only, it also had the microcode bug so it's likely the reverse engineering was not as "clean room" as it probably should have been). There were bugs in the Amiga's SCSI chipset, and even a few microcode bugs in the old VAXen.
The article says it happens when the kernel is compiled for Pentium processors; but does this happen if the kernel is compiled for a K7?
... I have an Athlon and it kept hard freezing. The bug doesn't happen with a Voodoo card.
By the way, I had to shelve my nVidia card a couple months ago because of this
Not like they are recalling processors like Intel
-----
Oh great, so they make defective processors, but don't worry because they won't recall them! How in the hell does that make them better than Intel?
Think about it -- If you own an affected part a recall is GOOD!
Bullshit. The same *precise* bug hit me running 2.4.* on a PIII 450.
I reported it half a dozen times.
Someone, somewhere doesn't give a shit.
I had XMMS crashing and completely locking the box several times. I tried accessing my computer remotely too, but it didn't even respond to pings! This may of course be XMMS' fault, but even so it's pretty darn dangerous.
I just got a new box, Athlon 1.2GHz... Asus a7a266 mainboard... nice little box for general usage. Soon as I finish moving, I'll get cable modem back and stop using mom's AOL, and I'll go back to Linux. But now I see this, and I'm eyeing my AGP card, and wondering. AMD has earned a lot of respect from me in the last couple years, as I've found the Athlons to be simply the finest x86 CPUs I've ever got my hands on, at great prices with very reasonable motherboards/chipsets as well. Now this. I'm not sure. Yeah, it's an engineering mistake, but I'm not clear on how AMD is handling it, and I hope they don't disappoint me. Sure, you can do a workaround - but as others have asked, what's the story on the performance hit? What about AMD working with the kernel folks to find another, better solution? Or maybe AMD could consider offering serious discounts on new, un-flawed CPUs, for those who are already eyeing upgrades?
All Athlons except model 1 (the very first K7, since it didn't have PSE) are affected,
apart from the latest revision A5 (cpuid 662) of the Athlon XP, i.e. A0/660 and
A2/661 are affected as well (similarly all 64x Thunderbirds etc.).
(there was a model 1, 2, 4 and 6 Athlon, with 6 being XP)
Some or all Durons might be affected too, but I didn't look at that closely.
The above hinges on whether this is the correct bug description; feel free
to flame the anonymous coward if this has got nothing to do with it :)
"16 INVLPG Instruction Does Not Flush Entire Four-Megabyte Page Properly with Certain Linear Addresses
Normal Specified Operation. After executing an INVLPG instruction the TLB should not contain any
translations for any part of the page frame associated with the designated logical address.
Non-conformance. When the logical address designated by the INVLPG instruction is mapped by a 4-MB
page mapping and LA[21] is equal to one it is possible that the TLB will still retain translations after
the instruction has finished executing.
Potential Effect on System. The residual data in the TLB can result in unexpected data access to stale or
invalid pages of memory.
Suggested Workaround. When using the INVLPG instruction in association with a page that is mapped via
a 4-MB page translation, always clear bit 21."
(page 7 from Athlon Model 6 revision sheet)
The current workaround gets around this problem by disabling 4M (2M?) pages (PSE). Hence we go back to 4K pages, and mapping large slabs of VM is a little slower and wastes memory (we need another Page table for each slab of 4M) and obviously more TLB misses/space wasted, because to touch the whole 4M region, the CPU needs to do up to 1024 page table lookups instead of 1.
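To put rough numbers on that, here is a small sketch of the arithmetic. It assumes classic 32-bit (non-PAE) x86 paging with 4-byte page table entries; the helper name `entries_needed` is made up for illustration.

```python
PAGE_4K = 4 * 1024
PAGE_4M = 4 * 1024 * 1024
PTE_BYTES = 4  # assumption: 32-bit non-PAE page table entries


def entries_needed(region_bytes, page_bytes):
    """Number of page mappings needed to cover a region (rounded up)."""
    return -(-region_bytes // page_bytes)


# One 4M slab mapped with 4K pages instead of a single large page.
ptes_per_4m = entries_needed(PAGE_4M, PAGE_4K)

# Memory taken by the extra page table that backs that slab.
page_table_bytes = ptes_per_4m * PTE_BYTES

print(ptes_per_4m, page_table_bytes)
```

So each 4M slab costs one extra 4K page table and up to 1024 separate translations, which is where the TLB pressure comes from.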
:-)
As discussed this may have performance implications.
According to the AMD docs, the problem is only when flushing TLB entries with INVLPG and the page is a 4M page, _and_ the virtual address's bit 21 is set (which does not affect the 4M block of memory the address is in - eg: 0x400000 (2^22) vs 0x600000 (2^22|2^21) are both in the second 4M block).
Hence, when invlpg'ing a VA we just need to do INVLPG(address & ~(1 << 21)). This only requires a single ANDL instruction. But we need to distinguish a 4M page first though, so I don't know?
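Just to check the address arithmetic, here is a sketch in Python (not kernel code). The function names are hypothetical, and it assumes 32-bit virtual addresses. It only shows that clearing bit 21 leaves the address inside the same 4M block.

```python
BIT21 = 1 << 21
PAGE_4M_SHIFT = 22  # 2**22 bytes = 4M


def invlpg_safe_address(va):
    """Clear bit 21 of a 32-bit virtual address, as the erratum suggests."""
    return va & ~BIT21 & 0xFFFFFFFF


def block_4m(va):
    """Index of the 4M block that contains the address."""
    return va >> PAGE_4M_SHIFT


for va in (0x400000, 0x600000, 0x7FFFFF):
    masked = invlpg_safe_address(va)
    print(hex(va), hex(masked), block_4m(va) == block_4m(masked))
```

Note that the mask is only safe when the address really is in a 4M mapping. With 4K pages, clearing bit 21 would point INVLPG at a different page, which is the "distinguish a 4M page first" problem.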
Heck maybe we should just do it the FreeBSD way and recursively map the Pagedir
Any ideas? Will this work?
--JQuirke
I've lost count of the number of times I wanted 64-bit integers, in pretty general purpose apps.
Not because I do big databases or suchlike, but they let you do loads of optimisations that wouldn't otherwise be possible. For example, you can pass around 8-byte structures in a single register, which is damn useful given the lack of available registers in the x86 architecture.
Example: I've recently been coding a large hexagonal grid component. Each point in the grid is indexed by 2 32-bit (x,y) integers. With a 64-bit register, you could put a full co-ordinate into a single register.
Why is this useful? Well, one of my requirements was to be able to manage large sets of co-ordinates (think reachable spaces for an AI). You want to be able to combine sets of co-ordinates, which basically requires merging two lists. In order to merge lists efficiently, you need to sort them. And with the 64-bit representation, you can do this with just one subtraction and one branch rather than a combination of two subtracts and two branches. This is a definite speedup if you are hand-coding, and possibly an even bigger one if your compiler doesn't inline all the 32-bit code properly.
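A rough sketch of the idea, in Python for brevity. It assumes unsigned 32-bit co-ordinates (signed ones would need a bias first) and that each input list has no internal duplicates; `pack`, `unpack` and `merge_sorted` are names invented for this example.

```python
def pack(x, y):
    """Pack two unsigned 32-bit values into one 64-bit key, x in the high half."""
    return ((x & 0xFFFFFFFF) << 32) | (y & 0xFFFFFFFF)


def unpack(key):
    """Recover (x, y) from a packed 64-bit key."""
    return key >> 32, key & 0xFFFFFFFF


def merge_sorted(a, b):
    """Merge two sorted lists of packed keys, keeping shared keys once."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        # One comparison on the packed key orders by x, then by y.
        if a[i] < b[j]:
            out.append(a[i])
            i += 1
        elif b[j] < a[i]:
            out.append(b[j])
            j += 1
        else:
            out.append(a[i])
            i += 1
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


set_a = sorted(pack(x, y) for x, y in [(3, 4), (1, 2)])
set_b = sorted(pack(x, y) for x, y in [(2, 9), (1, 2)])
print([unpack(k) for k in merge_sorted(set_a, set_b)])
```

Because x sits in the high half, comparing the packed keys gives the same order as comparing (x, y) pairs, which is what lets a single compare replace the two-step one.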
Other example: 32 bits are large enough for most integer applications (you couldn't enumerate all the people on the planet though....) but they tend to fall down when you multiply, e.g. 100,000 * 100,000 has already blown the 32-bit limit, and neither of those is a particularly big number. Whenever you start doing a reasonable amount of multiplication, 64-bit becomes useful.
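The multiplication case is easy to check. Python integers don't overflow, so this sketch emulates an unsigned 32-bit register by masking; `wrap_u32` is a made-up helper name.

```python
U32_MAX = 2**32 - 1
U64_MAX = 2**64 - 1


def wrap_u32(n):
    """Keep only the low 32 bits, as an unsigned 32-bit register would."""
    return n & 0xFFFFFFFF


product = 100_000 * 100_000

print(product > U32_MAX)   # too big for 32 bits
print(product <= U64_MAX)  # fits comfortably in 64 bits
print(wrap_u32(product))   # what a 32-bit multiply would leave behind
```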
Also, 64 bits is big enough to encode the positions of pieces on a chess board. You can use bitwise logic to analyse and store positions. GNU chess certainly does it this way. I expect a *considerable* speedup in the top chess-playing algorithms when 64-bit becomes widespread.
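For anyone who hasn't seen the trick, here is a minimal bitboard sketch. The square numbering (a1 = bit 0, h8 = bit 63) and the helper names are assumptions for illustration, not necessarily what GNU chess uses.

```python
FULL = (1 << 64) - 1  # all 64 squares


def square_bit(file, rank):
    """Bit for a square; file and rank run 0..7, with a1 as bit 0."""
    return 1 << (rank * 8 + file)


# White pawns on their starting rank (rank index 1).
white_pawns = 0
for f in range(8):
    white_pawns |= square_bit(f, 1)

# Suppose some piece is sitting on e3 (file 4, rank 2).
blocker = square_bit(4, 2)
occupied = white_pawns | blocker

# All single pawn pushes at once: shift up one rank, keep empty squares.
empty = ~occupied & FULL
pushes = (white_pawns << 8) & empty

print(hex(white_pawns), hex(pushes), bin(pushes).count("1"))
```

One shift and one AND handle all eight pawns together, and on a 64-bit CPU each of those is a single register operation instead of a pair.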
I'm really keen to see 256-bit arrive to be honest, 2^(2^3) has more elegance than 2^(2*3) and it would allow you to store a set of bytes in one register. Would allow some very cool text-processing tricks.
Course, it might never happen - I predict a move towards massively parallel 64-bit computers rather than stonking 256-bit ones as the next major evolution in processor power.
In fact, this legally prevents Linux developers from using this patch, because it explicitly states that you may not "reverse engineer" it.
Imho AMD has a serious problem in its relationship with Linux. Intel has helped gcc development, they provide their own optimised compiler free of charge and they work together with kernel developers. Seems like AMD on the other hand couldn't care less. Don't get me wrong, I like AMD CPUs. But what use is a good CPU if it is not supported properly?
From just the story, it looks to me like the Linux kernel could send the CPU pages that were bigger/smaller than 4K, and it would have a problem with that. His fix would automatically detect the bug and resize the info that is sent to the CPU to 4K.
The original pentium bug had to do with the floating point processor on the chip, not with the size of page that was sent to the chip..
Of course I could be wrong about all this =)
I certainly wouldn't mind bashing AMD, and not just for broken processors.. Their whole socket A platform is just plain unreliable.
They just don't have critical people taking a stance against them, like Intel, so AMD never has to recall anything.
First of all, this bug is not that significant performance wise. Very little software is going to use 4 MB pages. I don't think you even have an option of allocating memory with 4 MB pages in user space. This appears to be an issue with being able to optimise drivers, however, if AMD's processors can't do this, and Intel's can, why don't we see Intel's processors greatly outperforming AMD's in Win2k? This is a minor bug, and it's easily worked around without patching the kernel in both Win2k and Linux.
The processors are basically all their Athlon and Duron processors. For AMD or any chip maker to replace chips with bugs in them is VERY expensive. They already have a low profit margin. Replacing all "defective" Athlon and Duron processors would simply bankrupt AMD. Realistically, all complex software or hardware has bugs. Bugs in hardware are much more difficult and expensive to fix. The truly significant hardware bugs are usually found early in testing. Other bugs are fixed in software, usually in the system BIOS, but sometimes in the OS code. This isn't something new. It's pretty much always been this way. Why has it been this way? Because no one wants to pay the outlandish prices that would result from trying to make hardware perfect. It costs a tremendous amount of money to reroll a processor. It's not as simple as making a quick code change and recompiling software. THERE WILL ALWAYS BE BUGS IN PROCESSORS! A truly significant bug like the Pentium floating point bug needs to be fixed in the hardware, and that one was even significant enough to deserve a recall of the processor. This bug is simple to work around, and isn't truly a significant problem.
The question you asked in the subject is "Should AMD do the right thing?" The answer is yes, they should correct their Technology Bulletin to actually say what the processor bug is, rather than just say here's a workaround to a bug that affects Win2k.
I'm really surprised that someone at NVidia didn't pass this on to Linux kernel developers much sooner, since people at that company seem to have been aware of this for some time.
Why do you think Intel hasn't made more of an effort to really squash AMD?
Plus, some big customers have second-source agreements from Intel.
Also, until recently, both Intel and AMD have been running at full fab capacity, and Intel hasn't had an incentive to move into the lower profit markets that AMD inhabits. Now that they are going to .13µm (more capacity in theory), and the market has constricted, they may change course and start competing on price with AMD.
That third article about the supposed "HCF" instruction on the 4004 is complete and utter BS. None of the instructions on the 4004 will cause it to burn up, even on the earliest production parts.
When the IBM System 360 series came out it had a large number of new opcodes (as compared with the 70x/70xx series). These were the days of CISC (Complex Instruction Set Computers), and the 360 really lived up to the name. It gave over a large amount of its word space to opcodes and opcode extensions, so it had a VERY large potential opcode space. Much of it was unpopulated, but some was populated with undocumented instructions. Further, the machine was microcoded, and the microcode was loaded when the machine powered up. (That's what floppy disks were invented for.) So the company could write new opcodes and add them later.
Of course the new machine with the ENORMOUS list of opcodes and (true) rumors of hidden undocumented opcodes quickly led to the circulation of a humorous list of perhaps 20ish additional "new undocumented opcodes". Things like XOE (Execute Operator Immediately), EK (Electrify Keyboard), SSJ (Select Stacker and Jam), BLNK (Blink Lights), WHR (Whirr), etc. The crown jewel of this list was HCF (Halt and Catch Fire).
While this list was still funny, Motorola released the 6800 single-chip microprocessor, predecessor to the 650x knockoff that formed the core of the first Apple computers. To ease chip testing, the all-ones opcode threw the chip into a test mode, where it continuously incremented the program counter and performed memory reads. This wiggled all the address lines and most of the control lines, letting you know if the chip was alive and bonded.
Of course they didn't tell you about it. And of course the only way out was hard reset. And of course a jump to an unpopulated region of the address space (i.e. most of it) would leave the bus floating and generate 0xFF. And of course jumping into random data or uninitialized memory would also quickly get you an 0xFF or jump you off into unpopulated address space. So the typical behavior for a program bug was to lock up the processor beyond the ability of a debugger to function.
(Hell: I had one of the first round of solder-it-yourself evaluation kits, bent a pin on the debugger ROM putting it into the socket, and ended up with a board that booted into the test state. Was starving student and it took a couple days to get access to test equipment to find out what was wrong.)
So of course programmers, once they found out about the instruction that hung the chip in a mode where it "twiddled its thumbs at maximum speed" and got a bit warmer than usual, and couldn't get out of the mode except by hard reset, quickly christened the opcode "Halt and Catch Fire". And this became the generic term for get-stuck-in-a-test-mode instructions on microprocessors, until the chip manufacturers finally came to their senses and stopped putting such instructions into instruction sets.