Major Linux/Athlon CPU bug discovered
GeorgeFrancisco writes "I recently installed the nVidia drivers so I could play TuxRacer on my Athlon. Problem is it kept inexplicably hanging Linux. Now I know why. The CPU bug affects Athlon/Duron/Athlon MP AGP users. Fortunately there's a way around it, and: "Alan [Cox] is going to try to add some kind of Athlon/AGP CPU bug detection code to the kernel so that it will be able to auto-downgrade to 4K pages when necessary." Read more on the Gentoo Linux site."
Could this be AMD's version of the bug in the original Pentium?
It was bound to happen, everybody makes mistakes.
God save our Queen, and Heaven bless The Maple Leaf Forever!
I noticed this too. It seems to affect only 3D games, mainly SDL-based ones such as Armagetron, but strangely it hasn't affected Quake 3 at all. Unreal Tournament was affected, but I SWEAR it didn't use to do that.
Aww, now I have to figure out how to install UMODs in Linux.
There was a Win2k bug a while back that did the exact same thing, and you had to install a "LargePageMinimum" patch for it to not crash. Is this the Linux equivalent of that? And if so, how come it has taken so long to surface and fix?
http://support.microsoft.com/support/kb/articles/Q270/7/15.ASP
Since September of 2000..
Now, since gentoo's well and truly dead (thanks to slashdot), can someone explain the bug and the workaround for us Athlon users?
- A.P
"Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"
It really shows up if you use the pre-empt kernel patch. Ever since I added the workaround, things have been pretty solid. (not that it's been that long)
I guess it takes awhile to pile through the submissions. This was posted on pclinuxonline.com recently.
I don't think so. AMD reverse engineered the x86 and made their own implementation without Intel's crap in it.
AMD's version of the x86 that is in the Athlon and the Duron runs faster than Intel's chips because of this reverse engineering.
This bug could be a problem of reverse engineering the x86. It doesn't say Intel's chips have the problem.
--Metrollica
Is there more information about this bug anywhere? I'd like to know if it affects the Athlon XP I upgraded to in the last couple of months.
I had a Duron before which was very unstable with the nVidia driver AGP support enabled, but I've had a couple of crashes with the AGP support enabled on the Athlon XP - if I disable the AGP support it runs rock solid (current uptime is 8 days with GL screensavers and various GL apps having run for hours and hours of that 8 days).
I hope it's bugs in the drivers/agpgart and not the CPU - if AMD knew about this in 2000 the Duron I bought shouldn't even have had it, let alone my new Athlon XP.
More info (specifically, which CPUs it affects) would be really good. Any takers?
Chris "Ng" Jones
cmsj@tenshu.net
www.tenshu.net
Here is the cached article.
Thank Google again for this one!
--Metrollica
From what I've seen with AMD motherboards (granted, this isn't AMD's fault), half of those damn VIA chipset discount boards should be bonfired. The worst AGP implementation ever seems to rear its ugly head only in Linux.
well, except that they released a patch back in 2000.
It would help if you actually read the story before posting uninformed opinions. Dumbass.
Chris "Ng" Jones
cmsj@tenshu.net
www.tenshu.net
Soooo... is Linux going to now start having kernel patches to detect people who overclock to 3+ gigahertz? lol -- I've heard that has some 'stability issues' as well *grin*
Groove Salad -- a nicely chilled plate of ambient grooves and beats.
It's not a Linux bug, troll, it's a bug in the Athlon chip itself. Read the damned article.
- Have a picture
Karma whoring, here I come. Hopefully this server can withstand a mild slashdotting. Link
The site seems to be down. However, last week, I contacted nVidia about this problem on my two dual Athlon MP workstations (random hangs when OpenGL is invoked). So the quick answer is you can
Boot your system with the following option on your kernel command line: "mem=nopentium"
or
Disable AGP in XFree86 config (i.e. Option "NvAGP" "0" in the "Devices" section).
nVidia clued me into the first approach about a week and a half ago. It made my system completely stable. However, there was still some texture flakiness in some OpenGL applications. Since my workstations are number crunchers (and thus Quake FPS don't matter to me), the latter option eliminated both the stability problems and the texture flakiness (at the expense of some graphics speed).
By the way, nVidia mentioned the same issue exists on Win2K / Athlon boxes.
Enjoy,
Kevin
The Gentoo site describes a simple workaround: add "mem=nopentium" to your kernel options at bootup and it will avoid the bug condition. Alan Cox is currently working on adding auto-detection of this bug to the kernel, so soon we won't have to worry about it.
And yes, this is the same Athlon Windows 2000 AGP bug that was discovered and patched last year with that registry key. They just didn't realize that it also affected Linux until now. I now realize that was the cause of my TuxRacer crashes with my nVidia card on my Athlon computer.
Boycott AMD!!!
--Metrollica
VIA does make some complete crap, but they also make some nice chipsets. The KT266A is very nice, it's the fastest DDR implementation out there by far. But still, VIA chipsets are a good bit cheaper than the Intel equivalent, and while the Intel chipset may be more stable, the VIA one is almost always faster. And even Intel has issues with chipset stability, it's just that they ignore them and only quietly replace the faulty boards when they're returned under warranty. You know how it goes in the computer industry... Faster, cheaper, or more stable- pick any two.
So does anyone know how performance is affected from this 4MB->4KB page thing?
AMD didn't turn interesting until the Athlon came out. The previous versions of its processors were decidedly inferior. This is *worse* than recalling for a bad, rarely used function call. I can't take a processor back 6 months after I bought it because it sucks, but I can get it replaced if it has a bona-fide bug.
If this is a bug in the processor, AMD really should fix it and offer replacement processors to those who need it. If they don't, and they expect you to patch your OS instead, then that definitely shakes my faith in that company. When you're an artist dependent on OpenGL, you can't have problems like this.
And finally...
Why are you worried about running 32-bit code on a 64-bit processor? 64-bit processors are supposed to run 64-bit code. Intel's not marketing 64-bit processors to replace desktop computers (today), they're for servers and high-end graphics with custom code. They don't NEED to run 32-bit code. I hardly think that's a point against Intel, especially considering they don't make it a big secret that 32-bit code runs slower on it.
"Derp de derp."
so the question is, if I configure my kernel for the K7 family, do I need to pass the kernel "mem=nopentium" or is this the default?
But the bug was there since Sep. 2000 !
You'd think someone at AMD would have corrected the bug, but nooooooo.
How many versions of Athlon / Thunderbird / XP / MP have there been since Sep. 2000 ?
I thought that for every new iteration of a chip, they have "de-bugging sessions" - just like software - before the "tape-out" stage.
Have to wonder why AMD doesn't do debugging before tape-out ?
Is it money?
Or is the bug a "feature" instead ?
Muchas Gracias, Señor Edward Snowden !
Does anyone actually *know* if this is worked around in the *bsds?
Or do they use the 4k method by default anyway?
-Yarn - Rio Karma: Excellent
The Nvidia drivers force AGP to 1x due to corruption caused by AMD Irongate chipset signal integrity problems [mentioned in the README for the Nvidia 1.0-2313 drivers]
This newly discovered memory corruption with Athlon + AGP: is it related to the signal integrity problem of the Irongate ? Or is it a separate bug ?
Anyway this makes AMD look very bad in my view. There is a bug in the CPU and their chipset screws up my AGP to 1x. Sigh.
I should start by saying I haven't read the article yet, can't get to it. *hopes the /. traffic dies down soon...*
If it is a defect in the processor, I wonder if AMD will replace my existing processor. It may not seem like all that big of deal to most people here at Slashdot, but as a 3D artist I am *dependent* on OpenGL.
Don't get me wrong, I'm not having this problem now. (I'm not a Linux user.) But when I built my Athlon I had to install a patch for a similar type of problem in order to get the machine to work. At what point do we say "it's no longer ok to work around a CPU bug"?
If Intel has one set of bugs in their processors, and AMD has another, that divides the market. Software companies shouldn't have to put the effort into scrutinizing their code based on which CPU they are on, it's bad enough they are trying to optimize for one or the other. What happens when they get used to the workaround, but then it gets fixed? Worse yet, what happens when a company says "I'm sick of this, I'm only supporting one processor."
So it's not so much that I think AMD should replace the processors with this specific bug, but I think we should be vigilant in not allowing them to let errors like that run rampant.
"Derp de derp."
AMD doesn't keep tabs on VIA and VIA doesn't keep tabs on motherboard manufacturers. The only decent AMD motherboards are from manufacturers trying to compete in the enthusiast market where crap boards just don't sell. Combined with VIA actually being in competition with AMD in the budget processor market (the Cyrix), delaying a decent integrated chipset for the Duron, and VIA bullying motherboard manufacturers into not producing the SIS 735 chipset, VIA is not AMD's best friend.
AMD chipsets:
Nforce 220,420
AMD-760MPX,760MP,760
ALi MAGiK 1,MAGiK 2
SIS 735,745,746,755
VIA KT266A,KT133A,KM133,KLE133,KT333,K8HTB
STABLE (100+Days,Linux) Chipsets:
760,KT133A,735,760mp
Good Motherboard Manufacturers:
Asus,Abit,Iwill,ECS,Epox,Soyo
Personal Best Uptime 135 days, Iwill KK266 (KT133A), Power supply failure
If voting were effective, it would be illegal by now.
It is easily fixed through software and there doesn't seem to be any noticeable performance hit when turning off PSE.
The site with the article is slashdotted to oblivion, so I'm not sure what the problem really consists of, but according to what I have read in comments below, it's not really comparable.
As far as I remember, the bug in the original Pentium was a floating point flaw that led to wrong calculations under certain circumstances.
I've had this problem before, so intermittent I didn't think it was worth worrying about.
:-) !!!!
When I noticed it the most (figured this out the other day) was when I had several compiles going at the same time, ie kde3 libs and say a new patched kernel.
I could never get it to reliably repeat, so never looked into it.
Great to see someone actually had the patience to figure it out
404 Not Found The requested signature was not found on this server.
I have an Athlon 850 with a Geforce 1. I thought I had finally gotten rid of the last of the system workarounds when I upgraded my BIOS and I stopped seeing "Stomping on Athlon bug" (the classic VIA chipset problem). Looks like this isn't going to be the case after all.
I have always compiled my kernel for Athlon optimisations and I use the NVidia linux drivers with agpgart. How come I haven't hit this bug before?
How much performance is knocked off the system as whole because of this? Is it a few percent? I presume this will hit all applications not just AGP ones...
can anyone tell me if this problem may occur when running SETI? only I used to run it on my dual MP Athlon under MDK 8.1 , but it would invariably kill my machine. so I stopped running it.. ideas anyone ?
nick(nospam)@(nospam)polyprecords.com
Electronic Music Made Using Linux http://soundcloud.com/polyp
I have 2 Athlon systems, a dual Thunderbird 1.4GHz (Tyan Thunder K7) and a single Thunderbird 1.4GHz (Asus A7V133). The former runs a GeForce 3 and kernel 2.4.17, the latter a TNT2 and RH 7.2 (kernel 2.4.9 I believe). Both systems run semi-custom NVidia drivers (release 2313). By semi-custom, I mean I tweaked them to use SBA, the NVIDIA AGP driver (NOT agpgart) and to run in 4x mode. The latter has never had a problem, the former (the dual) had some problems until kernel 2.4.14.
The problems I had were frequent lockups with everything X, especially Q3A and Tribes 2. Some experimenting proved what worked and what didn't, and here's what I found:
agpgart never worked worth a damn even with kernel 2.4.17, despite several attempts by me to make it work (I don't maintain it, so I gave up on messing with it). Earlier NVIDIA drivers were less stable, but the latest is great (although it does not support FW, which blows). Tweaking the NVIDIA driver to use SBA and its own AGP driver instead of agpgart, along with kernel 2.4.14 - 2.4.17 makes for a very stable and fast system. Older kernels just did not work worth a damn whenever I enabled DMA on my IDE drive - they locked every time. These newer kernels don't exhibit this problem, and the NVIDIA driver works nicely with all 3D games as well as 3D development tools like Blender.
My kernels have always been compiled as Athlon kernels as well. The bottom line is: don't blame this bug and/or the NVIDIA driver if your system is unstable and/or slow. There are other things at work, and in my case I seem to have found them all.
- Rohan
If you're using lilo, and just want to apply the workaround quickly, edit /etc/lilo.conf.
Before the first image= line, insert the line:
append="mem=nopentium"
The article says it happens when the kernel is compiled for Pentium processors; but does this happen if the kernel is compiled for a K7?
... I have an Athlon and it kept hard freezing. The bug doesn't happen with a Voodoo card.
By the way, I had to shelve my nVidia card a couple months ago because of this
Just as an aside, if you ever deal with ultrasparcs, you'll quickly find that the majority of the code used is 32 bit.
The reason for it is simple; most applications will run slower at 64 bit than at 32 bit. The ultrasparc chips were designed to take this into account. Hell, due to a firmware bug, solaris on my ultra 1 installs as a 32 bit kernel by default - and runs no slower because of it (although it can't run 64 bit apps that way). After a firmware patch, it is easy to change to running the 64 bit kernel though.
In all reality, why would most apps need 64 bit integers and whatnot? Most don't, and doing so is a waste of memory. If the processor is designed right, it can handle 32 bit code with no problems whatsoever.
Those who can't do, teach. Those who can't teach either, do tech support.
Oh my God, AMD makes you read a 7000-character licensing agreement in order to download a 334 byte patch. And people think the GPL is bad ...
Funny, I knew something was wrong...
dominionrd.blogspot.com - Restaurants on
MShaft: "Not-a-bug-it's-a-feature"
Intel: "Not a bug it's erratum."
VIA: "We slowed it down to keep it cool."
Nvidia: "That was a leak! We are not doing public driver beta testing!"
ATI "Who the hell plays Quack3?"
AMD "the patch is here"
If voting were effective, it would be illegal by now.
Not like they are recalling processors like Intel
-----
Oh great, so they make defective processors, but don't worry because they won't recall them! How in the hell does that make them better than Intel?
Think about it -- If you own an affected part a recall is GOOD!
...seems they work for Intel. Their description was:
"It's a major bug. We don't know how it happened. We will ask marketing. We don't remember ever selling that chip.".
:-))
------I can please only one person per day. Today is not your day. Tomorrow isn't looking good either.------
I thought he meant the fdiv bug in the first generation Pentiums.. :)
;)
My mistake, I guess. It's still a little early in the morning here.
Maybe because there are other video cards than nvidia using AGP in Linux which have the same problems?
Thank you for your attention.
-- Could you use my software consulting serv
Quake 3 demo was run with \timedemo 1 and \demo DEMO001 . Each test was run three times. The system load average was < 0.5 before Quake 3 was run.
Without mem=nopentium
FPS = 79.4 (79.4, 79.4, 79.4)
With mem=nopentium
FPS = 79.2 (79.1, 79.3, 79.2)
System tested:
Athlon 850, 384MB RAM, Geforce 1 DDR, VIA KT133 Chipset
Athlon/Duron/K7 optimised 2.4.17 kernel (optimising the kernel above pentium makes very little difference though)
NVidia 1.0-2313 video drivers using agpgart
Mandrake 8.0
Quake 3 settings
Texture depth = 16 bits
Colour depth = 16 bits
Geometric detail = High
Texture detail = High
Dynamic lights = On
Video mode = 1024x768
Looks like there is a difference but it's very slight (0.003%) but my benchmarks aren't very scientific. Either way, if there is an improvement in stability this tradeoff is easily worth it. Here's hoping that you don't run Linux just for its Quake 3 scores though...
Well I had the same problem with the same setup, and after days of frustration it turned out to be overheating due to bad thermal contact and air flow. My problem never occurred if I ran only one instance of seti, because cpu affinity of linux kernel sucks, and as a good side effect it can't overheat a single cpu. If single seti does not kill the machine, you should consider thermal problems.
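For anyone who wants to redo the arithmetic on those averages, here is a rough sketch in Python. The function name `percent_drop` is just something made up for illustration; the only inputs are the two FPS averages quoted in the benchmark above.

```python
# FPS averages quoted in the benchmark post above.
fps_without = 79.4  # without mem=nopentium
fps_with = 79.2     # with mem=nopentium


def percent_drop(before: float, after: float) -> float:
    """Relative drop from `before` to `after`, expressed in percent."""
    return (before - after) / before * 100.0


drop = percent_drop(fps_without, fps_with)
print(f"{drop:.2f}% slower with mem=nopentium")
```

On these numbers the gap is about a quarter of a percent, which is within the run-to-run spread of the three timedemo passes, so it should not be read as a precise measurement.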
Gentlemen, you can't fight in here, this is the War Room!
you are only right on this:
they add their own tech too, which is why they get different results.
quote
Now, the Athlon processor is made by a rival company, AMD. They have
basically reverse engineered the Intel processors and tried to make a
processor that operates just like Intel's processors, and then sell them
cheaper than Intel does.
This makes it a little more difficult to compare them to the Pentium
processors. Some things the AMD Athlon actually does faster than a Pentium
III, some things it does a little slower, and some things it can't do at
all, while other things the Intel can't do, the Athlon does do.
quote
Had AMD had a design ready when Intel released their Pentium, their market share
wouldn't have dropped to 10%. In the days of the 286, 386, and 486, AMD, Cyrix, and other "clones"
reverse-engineered the Intel chips. In a sense, it was Intel's design (with maybe a few improvements),
but it was reverse-engineered so it did not violate patents.
quote
But nothing lasts forever. The companies that had built Intel chips under license eventually reverse-engineered the chips and built them license-free. Intel copycats including Advanced Micro Devices (AMD) and Cyrix (a division of National Semiconductor) used the courts to validate their right to copy Intel's chip architectures. And PC manufacturers like Compaq and IBM used these clone chips as a weapon to force Intel prices down. Now the best way for Intel to stay ahead is to simply run faster. Running faster means shrinking product cycles from three years to 18 months by running parallel product development teams and spending more money faster than the other guys. Since Intel has more money to spend, this keeps them in command, but shorter product cycles mean less time to recoup R&D expenses. Hence, those lower margins.
someone better mod me up for all my work
--Metrollica
no problems on FreeBSD, apparently
Sheesh - that was quite a mistake (didn't multiply by 100). Man I hope nobody hires me for my mathematical ability...
I just got a new box, Athlon 1.2GHz... Asus a7a266 mainboard... nice little box for general usage. Soon as I finish moving, I'll get cable modem back and stop using mom's AOL, and I'll go back to Linux. But now I see this, and I'm eyeing my AGP card, and wondering. AMD has earned a lot of respect from me in the last couple years, as I've found the Athlons to be simply the finest x86 CPU's I've ever got my hands on, at great prices with very reasonable motherboards/chipsets as well. Now this. I'm not sure. Yeah, it's an engineering mistake, but I'm not clear on how AMD is handling it, and I hope they don't disappoint me. Sure, you can do a workaround - but as others have asked, what's the story on the performance hit? What about AMD working with the kernel folks to find another, better solution? Or maybe AMD could consider offering serious discounts on new, un-flawed CPU's, for those who are already eyeing upgrades?
think for yourself, you won't like the results if others do it for you.
I've posted some Quake 3 benchmarks and it looks like the difference may not be significant (less than a half percent). However this is by no means a good test of a heavily loaded system (i.e. a high load average), nor does it test the effect when memory is tight (which is when I guess more paging would take place and the change would be more noticeable).
except model 1 (the very first K7 since it didn't have PSE) are affected,
except the latest revision A5 (cpuid 662) of the Athlon XP, i.e. A0/660 and
A2/661 are affected as well (similarly all 64x Thunderbirds etc.).
(there was a model 1, 2, 4 and 6 Athlon, with 6 being XP)
Some or all Durons might be affected too, but I didn't look at that closely.
The above hinges on whether this is the correct bug description, feel free :)
to flame the anonymous coward if this has got nothing to do with it
"16 INVLPG Instruction Does Not Flush Entire Four-Megabyte Page Properly with Certain Linear Addresses
Normal Specified Operation. After executing an INVLPG instruction the TLB should not contain any
translations for any part of the page frame associated with the designated logical address.
Non-conformance. When the logical address designated by the INVLPG instruction is mapped by a 4-MB
page mapping and LA[21] is equal to one it is possible that the TLB will still retain translations after
the instruction has finished executing.
Potential Effect on System. The residual data in the TLB can result in unexpected data access to stale or
invalid pages of memory.
Suggested Workaround. When using the INVLPG instruction in association with a page that is mapped via
a 4-MB page translation, always clear bit 21."
(page 7 from Athlon Model 6 revision sheet)
Hmm... your definition of "good motherboard manufacturers" seems to be quite different from mine. There's only two manufacturers I would recommend every time, and that's Supermicro and Tyan.
If that is the case, and I see no reason why it is not, then surely it only applies to backward compatible 32-bit code and not the 64-bit chip specific 32-bit code.
Am I the only person here who sees x86 compatibility in 64-bit chips as a MAJOR BAD POINT? Hell, the chip should never have gone 32-bit, it's a complete pile of poo.
Carrot007.
+----------------- | What is the question!
The current workaround gets around this problem by disabling 4M (2M?) pages (PSE). Hence we go back to 4K pages, and mapping large slabs of VM is a little slower and wastes memory (we need another Page table for each slab of 4M) and obviously more TLB misses/space wasted, because to touch the whole 4M region, the CPU needs to do up to 1024 page table lookups instead of 1.
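The 1024 figure above falls straight out of the page sizes. A small sketch of the arithmetic, assuming classic 32-bit x86 paging without PAE (4-byte page table entries); the constant names are made up for illustration:

```python
# Page sizes for 32-bit x86 paging without PAE (assumption for this sketch).
PAGE_4K = 4 * 1024
PAGE_4M = 4 * 1024 * 1024
PTE_SIZE = 4  # bytes per page table entry in non-PAE mode

# Number of 4K mappings needed to cover what one 4M mapping covers.
small_pages_per_large = PAGE_4M // PAGE_4K

# Extra memory for the page table that replaces one 4M directory entry.
page_table_bytes = small_pages_per_large * PTE_SIZE

print(small_pages_per_large, "entries,", page_table_bytes, "bytes of page table")
```

So every 4M region that used to need a single directory entry now needs one additional 4K page table, and up to 1024 separate TLB entries to touch all of it.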
:-)
As discussed this may have performance implications.
According to the AMD docs, the problem is only when flushing TLB entries with INVLPG and the page is a 4M page, _and_ the virtual address's bit 21 is set (which does not affect the 4M block of memory the address is in - eg: 0x400000 (2^22) vs 0x600000 (2^22|2^21) are both in the second 4M block).
Hence, when invlpg'ing a VA we just need to INVLPG(address & ~(1 << 21)). This only requires a single ANDL instruction. But we need to distinguish a 4M page first though, so I don't know?
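To make the bit-21 point concrete, here is a sketch of the address arithmetic in Python. This is not kernel code and does not execute INVLPG; the helper names are hypothetical, and it only shows that clearing bit 21 keeps the address inside the same 4M page.

```python
LARGE_PAGE_SHIFT = 22        # 4M pages: 2**22 bytes
BIT_21 = 1 << 21
MASK32 = 0xFFFFFFFF          # 32-bit virtual addresses


def large_page_index(va: int) -> int:
    """Which 4M block a virtual address falls in."""
    return (va & MASK32) >> LARGE_PAGE_SHIFT


def invlpg_safe_address(va: int) -> int:
    """Address with bit 21 cleared, per the erratum's suggested workaround."""
    return va & ~BIT_21 & MASK32


for va in (0x400000, 0x600000):
    safe = invlpg_safe_address(va)
    print(hex(va), "-> block", large_page_index(va), "-> flush", hex(safe))
```

Both example addresses land in block 1 (the second 4M block), and the masked address is 0x400000 in either case, so the flush still targets the right large page. Whether the kernel can cheaply tell a 4M mapping from a 4K one at that point is the open question raised above; the sketch does not address it.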
Heck maybe we should just do it the FreeBSD way and recursively map the Pagedir
Any ideas? Will this work?
--JQuirke
None of the Athlons or Durons I've built have had any problems with Tux Racer (Mostly on Man8.1 default install).
My nephew spends hours sliding that little penguin around with that bloody elevator music going, & not once has there been a freeze or lockup, much to my disappointment.
Maybe it sheds some light on this issue.
I've lost count of the number of times I wanted 64-bit integers, in pretty general purpose apps.
Not because I do big databases or suchlike, but they let you do loads of optimisations that wouldn't otherwise be possible. For example, you can pass around 8-byte structures in a single register, which is damn useful given the lack of available registers in the x86 architecture.
Example: I've recently been coding a large hexagonal grid component. Each point in the grid is indexed by 2 32-bit (x,y) integers. With a 64-bit register, you could put a full co-ordinate into a single register.
Why is this useful? Well, one of my requirements was to be able to manage large sets of co-ordinates (think reachable spaces for an AI). You want to be able to combine sets of co-ordinates, which basically requires merging two lists. In order to merge lists efficiently, you need to sort them. And with the 64-bit representation, you can do this with just one subtraction and one branch rather than a combination of two subtracts and two branches. This is a definite speedup if you are hand-coding, and possibly an even bigger one if your compiler doesn't inline all the 32-bit code properly.
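A rough illustration of the packed-key idea, in Python rather than hand-coded assembly. It assumes the coordinates are unsigned 32-bit values (signed ones would need a bias first), and `pack`, `unpack` and `merge_sorted` are names invented for this sketch, not anything from the poster's code.

```python
MASK32 = 0xFFFFFFFF


def pack(x: int, y: int) -> int:
    """Pack two unsigned 32-bit coordinates into one 64-bit key (x in the high half)."""
    return ((x & MASK32) << 32) | (y & MASK32)


def unpack(key: int) -> tuple:
    """Recover (x, y) from a packed key."""
    return (key >> 32) & MASK32, key & MASK32


def merge_sorted(a: list, b: list) -> list:
    """Merge two sorted, duplicate-free lists of keys with one comparison per step."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            out.append(a[i])
            i += 1
        elif a[i] > b[j]:
            out.append(b[j])
            j += 1
        else:  # same coordinate in both sets: keep one copy
            out.append(a[i])
            i += 1
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


set_a = sorted(pack(x, y) for x, y in [(0, 5), (1, 0)])
set_b = sorted(pack(x, y) for x, y in [(0, 5), (0, 7)])
print([unpack(k) for k in merge_sorted(set_a, set_b)])
```

Comparing the packed keys gives the same ordering as comparing (x, y) pairs field by field, which is what lets the merge get away with a single comparison on a machine with 64-bit registers.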
Other example: 32-bits are large enough for most integer applications (you couldn't enumerate all the people on the planet though....) but they tend to fall down when you multiply, e.g. 100,000 * 100,000 has already blown the 32-bit limit, and neither of those are particularly big numbers. Whenever you start doing a reasonable amount of multiplication, 64-bit becomes useful.
Also, 64-bits is big enough to encode the positions of pieces on a chess board. You can use bitwise logic to analyse and store positions. GNU chess certainly does it this way. I expect a *considerable* speedup in the top chess-playing algorithms when 64-bit becomes widespread.
I'm really keen to see 256-bit arrive to be honest, 2^(2^3) has more elegance than 2^(2*3) and it would allow you to store a set of bytes in one register. Would allow some very cool text-processing tricks.
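The multiplication example is easy to check. Python integers don't overflow, so this sketch emulates a 32-bit register by masking; `wrap_u32` is a made-up helper name.

```python
UINT32_MAX = 2**32 - 1
UINT64_MAX = 2**64 - 1


def wrap_u32(value: int) -> int:
    """What an unsigned 32-bit register would hold after truncation."""
    return value & 0xFFFFFFFF


product = 100_000 * 100_000

print("true product:   ", product)
print("fits in 32 bits:", product <= UINT32_MAX)
print("fits in 64 bits:", product <= UINT64_MAX)
print("wrapped to u32: ", wrap_u32(product))
```

The true product is 10,000,000,000, more than twice the unsigned 32-bit maximum of 4,294,967,295, so a 32-bit multiply silently leaves 1,410,065,408 behind.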
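For those who haven't seen the trick: one 64-bit word gets one bit per square, and you keep one such word per piece type. Below is a minimal sketch of that idea in Python. The square numbering (a1 = bit 0, h8 = bit 63) and the helper names are assumptions for illustration and make no claim to match GNU chess internals.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def square_bit(file: int, rank: int) -> int:
    """Bit for a square; file and rank are 0-7, a1 is bit 0 (assumed layout)."""
    return 1 << (rank * 8 + file)


# All eight white pawns on their starting rank (rank index 1).
white_pawns = 0
for f in range(8):
    white_pawns |= square_bit(f, 1)


def single_pushes(pawns: int, occupied: int) -> int:
    """Squares white pawns can advance to: shift up one rank, drop blocked squares."""
    return (pawns << 8) & ~occupied & MASK64


occupied = white_pawns | square_bit(4, 2)  # pretend something sits on e3
targets = single_pushes(white_pawns, occupied)
print(hex(white_pawns), hex(targets), bin(targets).count("1"), "pushes available")
```

One shift and two ANDs handle all eight pawns at once, which is where native 64-bit registers would help: on a 32-bit CPU each of those operations has to be split in two.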
Course, it might never happen - I predict a move towards massively parallel 64-bit computers rather than stonking 256-bit ones as the next major evolution in processor power.
For good measure, re-install your grub config by running:
Where /dev/hda is your boot disk.
For most PC users with IDE drives, it will be /dev/hda.
Last, just reboot.
chongo (was here)
From AMD's website:
Note: This patch is not needed for Windows XP
I had the same problem but with my Pentium 4. When I installed the nvidia drivers the system became very unstable and crashed often. But hey, I could start Tux Racer and that's the reason I installed it in the first place. But that problem was only in the latest Mandrake release (8.1), it worked fine with RedHat (7.2).
Of course if you were in that situation, you must not have noticed.
Bug or no bug, my machines have been running just fine. I bought them based on reviews that showed them running circles around Intel and they did. At the speeds the newer machines run, I'd hardly notice if they were hanging.
DMCA, Hollings, Palladium. What might have sounded like paranoia is now common sense.
I'll second that : Iwill boards are consistently better than average in terms of both performance and stability. Abit sucks ass though, they try to push things too far and forget that a super-overclocked machine that hangs every hour isn't worth shit.
ECS are off to a very impressive start with the K7S5A board. Using the SIS 735 chipset, it is unsurpassed in reliability and offers very decent performance as well. Overclocking isn't its strong point, but at a mere 65$ price tag you can invest the money saved on a faster CPU.
(no I'm not sponsored by ECS, I just hate my Abit KT7-Raid and am jealous of all my friends who have the ECS board)
-Billco, Fnarg.com
A lot of your market share is there only because we who use Linux® have stuck by you. We have been ridiculed because we are using an "off-brand" processor, we've rationalized away thermal problems and fragile cores to get the benefit of more bang for the buck. We have suffered through inadequate compiler support, until your market share has grown to the point where an honest push onto the main-stream desktop is possible.
And what do we get for it? Not "no real support, write your own fix"; that we can handle, and often do. What we got was forgotten: you didn't even tell us. We are used to and demand full disclosure, and in real time. Linux people hang their dirty laundry out in public to give everyone a fair and equal chance at a fix.
We're often treated as a minority because we are, but treat us as a second class minority at your own peril. In short don't ever let the marketing weenies convince you to hide something from us; if we wanted to be treated that way we would use Win/Intel products.
Apocalypse Cancelled, Sorry, No Ticket Refunds
I ran a T-bird 900 with a GeForce3 for a while on 2.4.17 using NVagp and never had one bit of a problem. Now I have an XP 1700+ and still have no problem, though I've switched my board from a KT133A to a KT266A and have to use agpgart because the NVidia driver doesn't support my chipset. Still perfectly stable (I've noticed one or two graphical glitches in glx apps lately but never a lockup or a crash)
Hey moderators, don't waste your points modding this AC up to +5,
It's not a waste if the content of the comment deserves it.
that's a total of 6 points
No, ACs start at 0. It takes 5 points to get them up to Score:5.
AC's don't get karma remember?
And neither do experienced users thanks to the karma kap. Karma determines only who gets to moderate and who gets to post at 2. It does not equal penis size. In fact, since Slashdot upgraded to Slashcode 2.2 with a messaging system to tell me when I've been modded or replied to, I haven't even looked at my karma.
Will I retire or break 10K?
So, if it was discovered over a year ago, was this hardware bug ever fixed? We bought a dual-athlon 1.53-GHz (1900+?) machine recently; do these processors still have the bug?
I certainly wouldn't mind bashing AMD, and not just for broken processors.. Their whole socket A platform is just plain unreliable.
They just don't have critical people taking a stance against them, like Intel, so AMD never has to recall anything.
I'd always assumed that it was just a crappy AGP implementation on my no-name motherboard, as I'd been following the Mesa/GLX groups for a while and hadn't seen the problem mentioned all that much. It's nice that there's a relatively easy fix for the problem. Maybe now I can get back into Tribes2 again :-)
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
For example, you can pass around 8-byte structures in a single register, which is damn useful given the lack of available registers in the x86 architecture.
And when you want to use or change one byte in the structure, what do you do? Shift it out and put it in another register. You can beat the "lack of registers" argument by switching to any current architecture but x86; you'll get at least 16, most likely 32, or even 64 registers.
And with the 64-bit representation, you can do this with just one subtraction and one branch rather than a combination of two subtracts and two branches.
One problem with your algorithm: one subtraction will "carry" over into the next because the processor assumes you're subtracting whole integers. What you want isn't really 64-bit integers but rather vector SIMD as found in MMX, SSE, and 3DNow!. In fact, AltiVec on the G4 processor is 128-bit.
Will I retire or break 10K?
Could somebody with more knowledge explain why you need 4MB pages in the first place? Pages are supposed to be small for a reason. With 4MB pages, internal fragmentation would go through the roof. It's almost like not having paging at all. I don't understand why this option is even available and used.
___
If you think big enough, you'll never have to do it.
I finally throw away my old P-100 with the division bug and buy an Athlon, this happens.
I find it rather interesting that for Win2k, you needed to install a patch. For Linux, you can just edit your bootloader with an option, and it does the same thing. Which seems more robust?
Granted, the Win2k patch was probably just a registry tweak, but which could the average user do more easily? Which operating system gives more information to its users?
My beliefs do not require that you agree with them.
Well, after a bunch of problems with Supermicro P5Es, my list ended up more like: ASUS, Abit, Tyan.
I personally haven't tried Epox, avoid IWill and ECS, and will stay away from Soyo boards until the day I leave this Earth... not that I'm bitter about any experience with Soyo, but one can only take so much jerking around...
"It's tough to be bilingual when you get hit in the head."
> In all reality, why would most apps need 64 bit integers and whatnot?
;-(
If you're arguing "apps don't need more than 64 bit registers", then I guess you're not a 3D (graphics or geometry) programmer, or you'd see the uses.
3D apps can make use of 128-bit (16-byte) integer registers. You can pack 4 floating point numbers (4 bytes each) into one register (one of the things the PS2 does right).
e.g. Reading (or storing) quad words in one shot, doing a dot product, or cross product, parallel add, etc, are nice and neat with large 128 bit integers / registers.
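To make the packing concrete, here is a small sketch that uses Python's `struct` module to stand in for a 128-bit register. The names `pack_vec4`, `dot4` and `add4` are invented for the example; real hardware would do the lane operations in a single instruction.

```python
import struct

def pack_vec4(x, y, z, w):
    """Pack four 32-bit floats into 16 bytes (128 bits), little-endian."""
    return struct.pack("<4f", x, y, z, w)

def dot4(a_bytes, b_bytes):
    """Dot product of two packed 4-float vectors."""
    a = struct.unpack("<4f", a_bytes)
    b = struct.unpack("<4f", b_bytes)
    return sum(p * q for p, q in zip(a, b))

def add4(a_bytes, b_bytes):
    """Lane-wise (parallel) add of two packed 4-float vectors."""
    a = struct.unpack("<4f", a_bytes)
    b = struct.unpack("<4f", b_bytes)
    return struct.pack("<4f", *(p + q for p, q in zip(a, b)))

u = pack_vec4(1.0, 2.0, 3.0, 4.0)
v = pack_vec4(5.0, 6.0, 7.0, 8.0)
print(len(u) * 8)                       # 128 bits
print(dot4(u, v))                       # 70.0
print(struct.unpack("<4f", add4(u, v))) # (6.0, 8.0, 10.0, 12.0)
```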
Cheers
I'm going to have a G400, but that's because I'm moving a card over from my main machine that's a P3-600 until I can afford another card. Most people getting an Athlon are looking for maximal speed (Who isn't?) so they're going with the NVidia cards because they're "fully" supported with all functionalities including T&L supported (The Radeon doesn't have T&L right at the moment and the top of the line one is a different card w/no support right at the moment...). Most of the Athlon crowd is going to have NVidia cards unless they're insistent about having everything Open Sourced. There's nothing wrong with that position, but since the profile indicates that there's not going to be as many people with other cards, how would they see other AGP cards having this problem?
I am not merely a "consumer" or a "taxpayer". I am a Citizen of the State of Texas
I've always been an Intel-supporter. At least their chips don't burn up if your heat sink falls off. And their power consumption has generally been half of AMD's chips.
I avoided the early AMD chips because of compatibility issues. However I was planning on trying an AMD Duron chip because I thought that AMD had a decent chip out now. Guess I'm wrong...
Buying AMD will eventually bite you in the ass somehow someway.
Go ahead...flame me.
but I can get it replaced if it has a bona-fide bug
...but you can't take your motherboard back to shop x and say "Um... my processor didn't fit" when you bought it specially for the certain processor that had to be returned. Having just bought two Athlon MP chips and an Athlon MP specific motherboard, I would hate to have to return any of it.
I think it is more important that we help with the fix rather than spend time arguing about whose fault it is and why nothing was done sooner.
Would we be complaining if it caused problems with QNX? (hehe well some people might). It's not the hardware manufacturer's place to test their product with all software that's on the market. It is their place to release accurate specs for their product so that software producers can work from these.
This particular problem just goes to demonstrate the problems caused by trying to make everything backwards compatible. It's my guess that if AMD had made a new 64-bit chip from scratch instead of making a faster 32-bit one, then this wouldn't have happened (but sure enough some other bugs would creep up).
Follow me
All in favour, say "Aye!"
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Recently purchased 2 XP 1600+s (1 in Dec and 1 in Jan) - both indicate they are Rev A5 (CPUID 662) and do not have the INVLPG bug according to AMD's errata sheet.
Yes, UltraSPARC's run significantly slower in 64 bit mode. IIRC, this is because it takes more instructions to load 64 bit constants and access 64 bit pointers. This is not true of all 64 bit processors -- and it is not true of x86-64.
The x86-64 architecture allows 64 bit programs to take advantage of the extra precision (and doubles the number of general-purpose registers, which x86 desperately needs), without forcing them to take the performance hit of using the full 64 bit addressing. It also adds a new, IP-relative addressing mode, which makes position-independent code (i.e., shared libraries) much more efficient. There will be an increase in code size (and possibly a performance drop, but this depends on how AMD implements the 'movabs' instruction) when you start using more than 4GB of data. And, when you start using >4GB of code, things get yucky (requiring indirect jumps).
But, the point is, x86-64 will run all your 32 bit x86 code at full speed, and if you're able to re-compile your programs for 64 bit mode, you should get a performance boost, if only from getting 9 more registers (8 + no longer need to keep a pointer to the GOT).
The other day I left my dual-boot system (with an Nvidia GeForce 2 MX 400 and NVIDIA drivers) booted into Mandrake Linux for most of the day and it was fine. Of course I was at the system for most of the time. I decided to go to the store and when I came back the system was locked tighter than a drum. No big deal since I run ext3 for the file system. Rebooted and it was fine. How would one add this option to a GRUB bootloader? I bet if I add it, the screensaver won't lock (OpenGL screensaver.....). I don't play a whole lot of games so the texture flakiness would not bother me.
Gorkman
First off, yes, this is a rather major bug.
But is it enough to warrant not buying the processor or flaming AMD???? Hardly!
EVERY piece of hardware out there has some bug in it! Have any of you ever sat down and read the list of errata on Intel parts and the list of how many flaws are fixed in each stepping? The list of bugs fixed over the life of the P3/Celeron core is a rather lengthy document to say the least.
And I can't really fault AMD at all on this one other than that they HAD a bug...for Win2K etc, they released a fix/patch in very short order and notified everyone rather quickly.
And don't forget this was back in 2000! What version was the norm for deployed kernels back then (over a year ago!)???
From what I gather, the 4MB AGP paging didn't show up until kernel 2.4 builds -- which I do not think were final at that time. Regardless, I feel the Linux kernel community should have been a bit more proactive in noting a DOCUMENTED bug and correcting for it.
Regardless, this bug in no way affects whether or not I would buy an Athlon/Duron. It is basically trivial to work around and results in almost no performance loss. In essence, my Athlon XP 1900+ with the fix will still beat the crap out of most P4 2GHz machines in 90% of all applications (for half the price).
This is basically a failing of the entire Linux concept more than a failing of AMD.
Is there any central authority who regularly checks AMD, Intel, Via, Transmeta, etc. errata sheets for bugs that might potentially affect the kernel? Based on this, I strongly doubt it.
Don't get me wrong, Linux is a great OS, but the lack of centralized control and build management is starting to cause problems. There are so many changes to different modules that version dependencies crop up all the time and no one is managing them.
I am not a big fan of Microshaft myself, but I would put money that they have at least one or two people whose job is to do nothing but monitor the processor manufacturers' errata to make sure no major problems submarine the sales of Windows XP! Bill Gates may be many things, but stupid is not one of them. H*ll, if I was Microshaft, I could have a marketing field day on this one -- it could be a very persuasive argument to lots of upper management types as to why Windows is better than Linux.
Is this bug a problem? Yes.
Was the original problem AMD's? Yes.
Did they address it and notify people? Yes.
Did anyone in the Linux community actually notice? NO
Regardless, any bug that can be worked around this easily is not THAT big a deal people...but it does point out some serious flaws in how Linux kernel development is managed. If Linux is to survive, some order had better start arising out of the spiraling chaos!
So to sum up the appropriate response to this bug: LEARN FROM YOUR MISTAKES AND GET OVER IT!!!!!!!!
"The sky is falling! The sky is falling!"
sheesh...
All I was saying is that some systems, Solaris being my example, only use 64 bit applications when it's best to - otherwise the majority of the system is 32 bit. I can understand pushing 64 bit performance, but 32 bit performance is important too. Then again, I don't design chips :)
Those who can't do, teach. Those who can't teach either, do tech support.
Does anyone know if registered and/or ECC ram help? It seems that if this is a problem of memory corruption, something like ECC ram could at least reduce the chances of corruption.
- what processor rev it's fixed in. I'm wanting to buy a new machine, it's still gonna be AMD, but I don't want a processor with that bug, as I am a big gamer.
- how to tell if my processor is affected. (I'd rather not have to wait for my system to crash to find out)
Do not spread "09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0" over the internet, thank you.
You wrote: "AMD really should fix it and offer replacement processors to those who need it"
Surely, you must be joking. If a simple software workaround is all you need to get going, then there is absolutely no reason to spend millions of dollars replacing chips they have already shipped.
I'm sure that AMD will fix this bug in their next Rev (if they haven't already).
There are tons of bugs in everything. No one is perfect. But think about it this way. This is a MINOR bug. You will never get the "wrong answer" when a simple workaround is in place. You will never have to make a critical decision based upon bad data. You just need more TLB misses to get there. Minor, minor, minor.
Maybe AMD should have communicated this problem a little better. I'm sure they have tried to learn from the huge Intel debacle from the first Pentium. But there, you could get the "wrong answer".
-spitzcor
Same problems here on an AMDK6-3 on ActiveServer. PIII fixed that. My Athlon has never crashed without my help:
wtmp begins Mon Feb 14 17:48:19 2000
01:05.0 VGA compatible controller: 3Dfx Interactive, Inc. Voodoo 3 (rev 01) (prog-if 00 [VGA])
Subsystem: 3Dfx Interactive, Inc. Voodoo3 AGP
www.dedserius.com
VB != VisualBasic
Of course, you need to run lilo before you reboot.
//FIXME: Bad
What irks me is this: I got hit with this bug. I posted bug reports to Debian, to NVidia, and on different forums, reporting lock-ups in certain OpenGL situations. I generally got hand-waving "read the fucking manual" responses.
As the article notes, this isn't just a problem with AMD. It suggests that there's an ongoing problem with troubleshooting and resolving the sorts of issues that desktop users are going to have in Linux. (And "paying for support" would not have resolved much, would it have? The problem is the lack of coordination, not the lack of money.)
"when you multiply, e.g. 100,000 * 100,000"
When you multiply 2 32-bit numbers and really need the full precision of the 64-bit result, yes, then you need some 64-bit registers. However, that does not mean you need to have a multiply instruction that accepts 64-bit inputs. Also, often you don't need more than 32 bits of the result. In that case a barrel shifter in the chip right after the multiplier would already give you what you want without needing the large and slow 64x64 multiplier in the chip.
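Worked through for the quoted 100,000 * 100,000 case, as a sketch only (`mul32_wide` and `mul32_shifted` are hypothetical helpers, and Python integers stand in for the hardware multiplier and barrel shifter):

```python
MASK32 = 0xFFFFFFFF

def mul32_wide(a, b):
    """Multiply two unsigned 32-bit values; return (high32, low32) of the 64-bit product."""
    product = (a & MASK32) * (b & MASK32)
    return (product >> 32) & MASK32, product & MASK32

def mul32_shifted(a, b, shift):
    """Pick a 32-bit window out of the 64-bit product, as a shifter after the multiplier could."""
    product = (a & MASK32) * (b & MASK32)
    return (product >> shift) & MASK32

high, low = mul32_wide(100_000, 100_000)
print(high, low)                            # 2 1410065408
print((high << 32) | low)                   # 10000000000
print(mul32_shifted(100_000, 100_000, 16))  # 152587
```

The inputs stay 32 bits wide; only the result needs 64 bits, or a chosen 32-bit slice of it.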
On DSPs, you can often choose between 'integer mode' and 'fixed point mode'. In the former case they mean integer input values just like the CPU has, and in the latter case they mean values in the range [-1,1>, which places the decimal point 31 bits more towards the LSB. In 'fixed point mode', it's intuitively easier to stick with 32 bit precision if more precision is not needed.
Additionally, DSPs have 'MAC' instructions: "accum_out = accum_in + (in1*in2)". Often, the number of bits in the 'accum' registers is larger than the number of bits in the 'in1' and 'in2' multiply inputs. A 16-bit DSP often (always?) has at least 32 bit wide 'accum' registers, often more than that, with up to 4 or 8 overflow bits in some cases. You need the overflow bits when you use the MAC instruction repeatedly (which is done often in typical DSP algorithms). With 4 overflow bits, you can use the MAC instruction 2^4 = 16 times and be guaranteed you'll never overflow 'accum'.
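The 16-times figure can be checked with a small model. This is a sketch under an assumption not stated in the post: each product is treated as an arbitrary signed 32-bit value, and the accumulator is 32 bits plus 4 overflow bits (36 bits signed). The function names are made up for the example.

```python
def fits_signed(value, bits):
    """True if `value` is representable as a two's-complement number of `bits` bits."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1

def accumulate(products, acc_bits):
    """Sum the products like repeated MACs; report whether the accumulator ever overflowed."""
    acc = 0
    overflowed = False
    for p in products:
        acc += p
        if not fits_signed(acc, acc_bits):
            overflowed = True
    return acc, overflowed

MAX32 = (1 << 31) - 1   # largest signed 32-bit product (assumed worst case)
MIN32 = -(1 << 31)      # most negative signed 32-bit product

print(accumulate([MAX32] * 16, 36)[1])  # False: 16 worst-case MACs still fit
print(accumulate([MIN32] * 16, 36)[1])  # False
print(accumulate([MAX32] * 17, 36)[1])  # True: the 17th can overflow
print(accumulate([MAX32] * 2, 32)[1])   # True: no overflow bits, two MACs can overflow
```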
Personally, I'd prefer that CPUs get more DSP features rather than a simple increase of 'bits'.
--- Hindsight is 20/20, but walking backwards is not the answer.
I've posted this elsewhere but to clarify - it looks like this will still happen regardless of which processor you have selected (even i386!). This is because the test for whether your processor does pse seems to be run on startup (I think it's done by arch/i386/mm/init.c __init pagetable_init).
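From userspace you can at least see whether the CPU advertises PSE, since it shows up as `pse` in the flags line of /proc/cpuinfo. A minimal sketch; the sample text is made up for the example, and on a real box you would read the file instead:

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()

# Made-up sample; on Linux use: open("/proc/cpuinfo").read()
SAMPLE = "vendor_id\t: AuthenticAMD\nflags\t\t: fpu vme de pse tsc msr pae mce\n"

print("pse" in cpu_flags(SAMPLE))  # True
```

This only reports what the CPU advertises, not whether the kernel is actually using 4MB pages.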
As an aside, as far as I can tell the only (extra) things that optimising a kernel for a K7 seems to set are gcc options (someone please correct me if I'm wrong).
I think the k62 had this problem as well. Anyone know about that?
Spring is here. Don't believe me, look outside!
It's rather hard to read non-existent documentation. This bug isn't listed in the AMD K7 errata, which is why it wasn't found - the only 'documentation' for this is the Win2k patch that AMD provided.
Linux and *BSD just do things differently: it's not a matter of one set of hackers being better than the other.
himi
My very own DeCSS mirror.
I've seen a number of mysterious X freezes in XFree86 4.1.0 and earlier on my Athlon/GeForce2MX system with NVidia kernel/X drivers. Most often the X server just seems to lock up when I'm doing nothing in particular. Occasionally I've had the whole system freeze during 3d gaming.
This is all with Linux 2.2.18. Has anyone commented for sure on this bug in the 2.2 series?
- jon
Ganymede, a GPL'ed metadirectory for UNIX
Maybe this would explain the extreme flakiness of my box. I've searched and troubleshot high and low for a solution to the constant crashing and hanging of my system:
ASUS A7A266
512MB Crucial DDR
Athlon 1.4GHz (266FSB)
Nvidia GeForce2 GTS 32MB
Red Hat 7.2
However, it was difficult to pin down the culprit, and now the problem *has* been fixed, proving that the decentralisation of Linux does work, albeit slowly when there isn't outside help. I suppose you can point to FreeBSD as an example that management does help, but even over there not everyone knows about it and mistakes are made.
The problem was nobody knew exactly where this was happening. Things like the proprietary NVidia drivers are frowned upon for not being open (fair enough - it makes it more difficult to tell where bugs are) because it's difficult to prove they are not buggy. Ironically, it was these drivers that were helping to show up this bug. In the end it took people from within NVidia, who could check their own code was correct, to file this bug, simply because they were in the best position to verify it.
But you say "this is an AMD bug"...
AMD released a patch for this bug for Windows 2000 way-back-when, but the way that they went about publicizing it made it appear that the bug affected only Windows 2000 and nobody on Linux kernel development realized that it also affected Linux.
However, AMD themselves say that it's a bug in their CPU, so I think it's fairly safe to say that it's a bug in their CPU.
If you're a zombie and you know it, bite your friend!
I think you are missing the point here. Does a Win2k user have to be connected to the internet in order to fix their system? Yes. Does someone on the Linux system? No. Imagine you manage 100 machines. Which would be easier to fix? Push out lilo.conf files to the Linux machines, or install 100 patches?
My beliefs do not require that you agree with them.
First, as has already been pointed out, there is no performance hit.
But I still did not get the answer to my question. What is the purpose of having 4MB pages in the first place? It is inconceivable that an OS would use 4MB pages exclusively. The internal fragmentation would be enormous.
To give you an analogy, think of what would happen if your file system used 4MB blocks. When you create a file, space would be allocated 4MB at a time so a 1 byte file would waste (4MB - 1byte) of disk space; (4MB + 1byte) file would take up two blocks, also wasting (4MB - 1byte) of disk space. On average, each file wastes 1/2 of the last block. Similarly, each process wastes on average 1/2 of the last page. That's not a problem if the pages are 4KB in size, but with 4MB pages there's lots of space wasted. That's like throwing away paging altogether.
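The arithmetic in that analogy, as a quick sketch (the helper names are made up for the example):

```python
KB = 1024
MB = 1024 * KB

def pages_needed(size, page_size):
    """Number of whole pages (or blocks) needed to hold `size` bytes."""
    return -(-size // page_size)  # ceiling division

def wasted(size, page_size):
    """Bytes allocated but unused in the last page: the internal fragmentation."""
    return pages_needed(size, page_size) * page_size - size

for size in (1, 4 * MB + 1):
    print(size, pages_needed(size, 4 * MB), wasted(size, 4 * MB))
# 1 byte        -> 1 page,  4194303 bytes wasted
# 4MB + 1 byte  -> 2 pages, 4194303 bytes wasted

print(wasted(1, 4 * KB))  # 4095: the same 1-byte case with 4KB pages
```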
So, I ask again, what is the point of having 4MB pages?
___
If you think big enough, you'll never have to do it.
It's a registry patch. That's all. If you have multiple machines on a network and the users log into a domain, you can trivially write a batch file that will apply the patch. The difficulty of placing this patch is going to be entirely dependent upon each specific situation.
The only way linux/freebsd/etc. will see this as being an easier situation is if in the future they make detection and application of the problem an automatic feature of the OS. MS could do the same, but the question is will they?
Here's the stark truth for you: 1)Money, 2)Userbase.
Remember when Corporate Enemy #1 was singing something to the tune of, "Whenever there's a problem, you don't know who to blame?", and how The Community laughed it all off as FUD? Now you can see that whenever there's a problem, you don't necessarily know who to notify, either. Don't call it a feature one day and then curse it the next. That sounds all too much like somebody else...
Wow, what a bunch of FUD. I've never had a thermal problem, never had a "fragile core" and never suffered from "inadequate compiler support", whatever that is. My new XP system seems to get stuck every now and then, but it's a new system and I've probably done something stupid like make my swap file much too large. My k6/2s and my Athlon perform better than their more expensive Intel counterparts. According to reviews the XP system should work just fine when I finish ironing it out. Other people have made it work and so it will work for me too. AMD does math cheap and better than Intel. Gaming? That's not my bag, but others are reporting good results.
When you look at the support AMD gives the Windoze world, you have to remember that those who suffer under M$ OS NEED the help. Just check out the utilities offered at their site. AMD CPUID? cpuinfo gets that for me. They rushed out with the goofy win2k utility because Windoze users can't pass information like "no_pentium" to their kernel or recompile it. I'm no kernel hacker, but the AMD documentation site looks informative.
Yeah, they could have been nicer about it, but I'm not about to give up on AMD over a video bug. The short answer is that this looks like an error of omission that could occur in any large organization. The folks in the Windoze software branch, aside from having good feelings toward M$ by the nature of their jobs, might not know who to contact to get the word out to the Linux world. Does AMD even have a Linux division?
We're often treated as a minority because we are.
Another beautiful flame disguised as advocacy. It's neither true nor right. There are now more Linux developers than there are M$. I refuse to accept immoral or offensive behavior for myself, a minority of one. Fortunately, there seems to be none of that here, at least not intentionally.
Friends don't help friends install M$ junk.
cat /proc/cpuinfo
stepping 2 is rev A5 which according to the AMD Athlon Processor Model 6 Revision Guide does not have the bug.
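If you'd rather not eyeball the output, the family/model/stepping fields can be pulled out with a few lines. A sketch only: the sample text below is made up, and mapping a stepping number to a silicon revision still has to come from AMD's revision guide (the "stepping 2 is rev A5" mapping is the poster's reading of it, not something this code checks).

```python
def parse_cpuinfo(text):
    """Return key/value pairs for the first processor in /proc/cpuinfo-style text."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            if fields:
                break          # blank line ends the first processor's block
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

# Made-up sample; on Linux use: open("/proc/cpuinfo").read()
SAMPLE = """processor\t: 0
vendor_id\t: AuthenticAMD
cpu family\t: 6
model\t\t: 6
model name\t: AMD Athlon(tm) XP 1600+
stepping\t: 2

processor\t: 1
stepping\t: 9
"""

info = parse_cpuinfo(SAMPLE)
print(info["cpu family"], info["model"], info["stepping"])  # 6 6 2
```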
Apparently, if you'd believe the Linux community, you'd be hard-pressed upon where to place the blame. You see, the Linuxist Manifesto's number one rule is to lie to protect the best interests of Linux. No self-respectable Linux zealot would insult or place blame upon AMD, because AMD's philosophy centers around tackling American Corporations with their Asian sweatshops, selling their chips at bargain-basement prices like the Red Menace Commies do with their Wal-Mart shit.
So, right about now, you're probably thinking that the zealots are clearly in a dilemma. Who are they going to blame? If you have a prediction before I tell you, the poll is on the right. Or maybe the left. Either way, take your pick.
You'd think that the parasitic community would place blame upon Microsoft, right? Alas, Microsoft has had the bug patched since September 2000. Not only that, Windows XP , the latest in the suite of high-powered, stable operating systems from Microsoft Corp., has this patch built in. That's right, built in. Keep in mind that Windows XP was released in October 2001, over three months ago. Meanwhile, no one knows what the hell Alan Cocks has been doing since then, since he hides under the cloak of secrecy. nVidia has been informing users via tech support, even to the Linux community, how to fix the problem for months now. Clearly the blame is upon Alan Cocks's shoulder, but to place the blame where it is rightfully justified is inexcusable in the Linux community. The drones are in disarray.
The actual bug occurs when Linux users contract the Tux Racer virus via KEmail. When first run, Tux Racer enables a feature in your third-world sweatshop AMD processor called "extended paging." Now, I know you're probably thinking that this sounds like some sort of Nokia feature. Well, you're wrong. It's yet another feature that AMD illegally hacked from Intel. It allows your browser to seamlessly view pages up to 4Mb in size. Before its introduction in the early days of the Intel Pentium processor, web pages were broken up into 4K segments, because any pages larger would freeze the computer. That's why Microsoft didn't invent Javascript until after the Pentium, every time they went to use it, their pages exceeded 4K, and henceforth froze the computer. Intel came to the rescue with the Pentium line of chips, and, as usual, AMD got out their super high tech Asian hacking tools and "reverse-engineered" (code-name for 'illegally hacked') Intel's technology. Thus, users of the inferior AMD Cyrix Kx86-2 Now! processor could also view large web pages without crashing. So why did no one notice that pages larger than 4K would crash AMD processors? Well, Microsoft has had a fix for 16 months, like we mentioned earlier. But why did no one from the Linux community notice? Well, apparently, there does not exist a page devoted to Linux that is more than 4K in size. Since most of the Linux installations out there denounce color as 'feature bloat,' all Linux pages follow an unwritten oath to suck. Believe me, they all do.
So, for the good of Linux, you may now disperse. Head off to various tech sites and continue blaming Microsoft for not telling you sooner. Your community will thank you.
As someone pointed out elsewhere, this would make the processors too expensive, if the vendor had to ship replacement processors each time a bug was found. Lots of bugs exist in processors, and typically they are fixed with each new stepping. Look at /proc/cpuinfo and see how many bugs it checks for (fdiv_bug, hlt_bug, f00f_bug, coma_bug on my system). This bug will probably be just another line. There is a simple workaround for it too, so it is not that bad.
The real problem (as many people state) is that AMD did not inform the kernel developers about this problem long ago, so a fix could already be implemented.
s/LSB/MSB/ obviously
--- Hindsight is 20/20, but walking backwards is not the answer.
I agree with most of this. A lot of having a stable system comes from paying $30 more for a decent motherboard. Also, the AMD market tends to be oversaturated with commodity memory. While the Intel side of things tends to use rambus, which is all pretty decent quality, most non-DDR RAM people buy for AMD machines is just crap. The thing memory affects the most is-- you guessed it-- system stability.
I have an AGP Nvidia GeForce 2 MX, and an AMD K6-3 333 MHz. I have experienced this memory corruption, graphical anomalies, and lockups in Linux and Windows 95.
I noticed that the AMD K6-3 was not mentioned, but the bug has to exist on it. The K6-3 was made with the same instruction set as a pre-Athlon. Thus the bug definitely exists.
Not sure about K6/K6-2, but it is possible.
The Windows "patch" just changes a registry key, so the correct answer to your initial question is No, a Win2K user does NOT need to be connected to the internet (other than to figure out just which key to change in the first place, in which case a Linux user would also need an internet connection to figure out which line to add to the bootloader).
The only really valid point I see against Windows in this situation is that the registry has essentially no documentation! There are TONS of settings and customizations available within the registry for Windows, but essentially nobody knows even 1/10th of them because of lack of documentation!
Alan Cox and other kernel hackers do read these documents. The question is if AMD documented this bug in their errata, or just fixed for Windows 2000 and figured that was good enough.
Bug #16 in AMD's Errata list for the AMD Athlon Model 6 processor (ie the AthlonXP/MP) lists the following:
(Begin quoting)
16 INVLPG Instruction Does Not Flush Entire Four-Megabyte Page Properly with Certain Linear Addresses
Products Affected. A0, A2
Normal Specified Operation. After executing an INVLPG instruction the TLB should not contain any translations for any part of the page frame associated with the designated logical address.
Non-conformance. When the logical address designated by the INVLPG instruction is mapped by a 4-MB page mapping and LA[21] is equal to one it is possible that the TLB will still retain translations after the instruction has finished executing.
Potential Effect on System. The residual data in the TLB can result in unexpected data access to stale or invalid pages of memory.
Suggested Workaround. When using the INVLPG instruction in association with a page that is mapped via a 4-MB page translation, always clear bit 21.
(end quoting)
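The suggested workaround is just address arithmetic before the INVLPG is issued. A rough sketch of that arithmetic only (the function name is invented, and the real fix lives in kernel code that executes the instruction, which Python obviously can't do):

```python
BIT21 = 1 << 21          # 0x00200000: selects the upper 2MB half of a 4MB page
MASK32 = 0xFFFFFFFF      # linear addresses are 32 bits wide here

def invlpg_safe_address(linear_address):
    """Return the address with bit 21 cleared, per the erratum's suggested workaround."""
    return linear_address & MASK32 & ~BIT21

print(hex(invlpg_safe_address(0x00600000)))  # 0x400000 (bit 21 was set)
print(hex(invlpg_safe_address(0x00400000)))  # 0x400000 (already clear)
print(hex(invlpg_safe_address(0x007FFFFF)))  # 0x5fffff
```

A 4MB page is aligned on a 22-bit boundary, so clearing bit 21 keeps the address inside the same page while avoiding the LA[21] = 1 case the erratum describes.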
It's there. It's been listed there for quite some time. No one read the errata and/or no one bothered to check to see if this one affected Linux systems. I hate to break it to the Linux boys, but they kinda missed the boat on this one. Normally Linux kernel hackers seem pretty good at staying on top of processor bugs, but it looks like this one slipped through the cracks.
FWIW, anyone looking for this errata list can find it here. As a bit of an aside, none of AMD's PDF documents will load for me in Mozilla/Netscape 6 with Acrobat 5, but they all work fine with IE/Acrobat 5. Weird.
Nevermind, just saw the "workaround" listed in the article.
STOP MISUSING APOSTROPHES, YOU MORONS!!!
I can't understand what "CPU" bug you are talking about. Could somebody tell me where that bug is? As far as I'm concerned, the trouble is our dear kernel trying to get those 4MB pages found in Pentiums and not in Athlons. Now, why does everybody call this a BUG????? I think this is just a "NOT supported feature". Maybe I should say I've found a bug in the new Pentium IV. They can't work with "3D NOW!" optimizations!
And I understand that most computer users WANT this. If there is some problem with their magical computer thingy, they want something to just fix it. That is part of the real problem here, which goes way beyond this particular issue, that people are patching their systems with blind faith that the patch will "fix" whatever is wrong. Was the name of the registry setting provided? Can you go in and change it manually if you wanted to? I am guessing that it wasn't.
I am no zealot. I understand that things need to work a certain way in the computer world, just because not everyone is comfortable with computers. But it is my machine, I want to know what is going on with it.
My beliefs do not require that you agree with them.
If what you are saying is true
"As it turns out, AMD did eventually get
around to fixing this issue with Stepping
A5 of the AthlonXP/MP core."
then we have a real problem here.
You see, Alan Cox and friends are planning to make Linux recognize Athlons and apply a special case, the "mem=nopentium" thingy, to them.
But if Stepping A5 of the AthlonXP/MP core has the bug licked, then the "mem=nopentium" thingy by Alan Cox et al. may be a step backward.
Unless of course the kernel hackers want to have a double-check thingy in their "Athlon finder". Such as
If Athlon=step5, then do nothing.
Muchas Gracias, Señor Edward Snowden !
Slightly off-topic, but...
How often are 4M pages useful? I guess the whole kernel could be in one page, but where else is this useful? I bet most applications would never see any benefit from 4M pages anyway.
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
Then read the KB article that is associated with the patch. There is your documentation as what the patch is doing. Here is the link for you....
http://support.microsoft.com/default.aspx?scid=%2fsearch%2fviewDoc.aspx%3fdocID%3dKC.Q270715%26dialogID%3d1928924%26iterationID%3d1%26sessionID%3danonymous%7c1700360
Microsoft makes it a policy to write one of these for every patch it makes. So RTFM ok? (sorry couldn't help myself..at least I told you where in the manual to read)
Silly Rabbit...Sig's are for kids.
And using regedit, you can change it manually, too. You can add keys and values and screw things up like a 5 year-old with root, if you like.
The criticism of the registry model that is valid is two-fold: 1, it can be corrupted like any file, but since it is one file and not a directory like /etc, that can muck up your whole system (the registry can still be backed up and reinstalled) and 2, it is somewhat easier for malicious code to muck with the registry, since most Windows users work in some privileged mode.
I built a K7S5A system for a friend. At the first go it couldn't install Windows; it turned out we had faulty RAM, so we replaced it, but problems persisted. We went to the shop to have the thing looked at; the motherboard was faulty. Got a new motherboard, but even that didn't go smoothly. After spending 3 hours getting Windows installed, the system rebooted by itself and couldn't get back into Windows because of a corrupted registry file. I looked on the internet and found this forum, and it turns out the problem is widespread. Read especially this FAQ. K7S5A, blah.
What time is it/will be over there? Check with my iPhone app!
yeah, that was a direct attack on linux. what the hell are you idiots smoking....
Shift happens. Fire it up.
The average user is probably more capable of going to the Windows Updates website, clicking on the tick box and hitting 'Download' which then runs the install.
The typical computer geek is probably equally capable of editing the bootloader or a registry, but prefers the first.
Your question was kinda like asking "Which could the typical person do easier: build a rocket to go to the moon, or build one to go to Pluto?"
For the typical person, neither is possible.
Rod Taylor
ALright, but the mem=nopentium still doesnt work, at least in my case. Not only do I have mem=nopentium, but I also have BIOS AGP set to 2x, Option "NoRenderAccel" "true" in XF86Config, and Option "NvAGP" "0" in XF86Config (basically totally disabling AGP totally anyway). Despite all these safeguards, sure enough, graphically intensive programs (including an idle X desktop) hard-crash the kernel. Anyone else still experiencing this? I didnt have to wait long either, the system crashed within seconds of opening Quake 3 while I was still navigating the game menus.
I have experienced that myself. I put Mandrake 8.1 on a Duron 900 with DDR RAM and have experienced inexplicable lockups. Just my 2 cents.
"Congress shall make no law... abridging the freedom of speech, or of the press"
Moderation Totals: Flamebait=1, Insightful=2, Interesting=1, Informative=1, Funny=3, Overrated=2, Underrated=1, Total=11.
I think I'm only missing 'troll' and 'offtopic' and it would have been a full house.
If I remember correctly, to enable 64 bit operation on an UltraSPARC 1 cpu (such as that in an Ultra 1 workstation), you need to upgrade to OpenBoot 3.11.1 at a minimum, and then uncomment the line in /platform/sun4u/boot.conf that says this: ALLOW_64BIT_KERNEL_ON_UltraSPARC_1_CPU=true.
I don't know exactly what the bug is with the old UltraSPARC 1's, except that given specific hand-written assembly code, it is possible to lock up the machine.
I have been running mine on the 64 bit kernel for some time now, and haven't noticed any problems, so it's probably safe in most circumstances.
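The boot.conf edit described above amounts to removing the leading "#" from one line. A rough sketch of that edit on the file's text, assuming the line is commented out with "#" as described; the function name is hypothetical, and it only transforms a string rather than touching the real file:

```python
KEY = "ALLOW_64BIT_KERNEL_ON_UltraSPARC_1_CPU"


def enable_64bit_kernel(conf_text):
    """Uncomment the ALLOW_64BIT_KERNEL... line; leave every other line alone."""
    out = []
    for line in conf_text.splitlines():
        stripped = line.lstrip()
        # Only touch commented-out lines whose content starts with the key.
        if stripped.startswith("#") and stripped.lstrip("#").strip().startswith(KEY):
            line = stripped.lstrip("#").strip()
        out.append(line)
    # Preserve a trailing newline if the input had one.
    return "\n".join(out) + ("\n" if conf_text.endswith("\n") else "")
```

Back up the original file before writing the result anywhere.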
mpb
man tunefs | grep fish
That third article about the supposed "HCF" instruction on the 4004 is complete and utter BS. None of the instructions on the 4004 will cause it to burn up, even on the earliest production parts.
When the IBM System 360 series came out it had a large number of new opcodes (as compared with the 70x/70xx series). These were the days of CISC (Complex Instruction Set Computers), and the 360 really lived up to the name. It gave over a large amount of its word space to opcodes and opcode extensions, so it had a VERY large potential opcode space. Much of it was unpopulated, but some was populated with undocumented instructions. Further, the machine was microcoded, and the microcode was loaded when the machine powered up. (That's what floppy disks were invented for.) So the company could write new opcodes and add them later.
Of course the new machine with the ENORMOUS list of opcodes and (true) rumors of hidden undocumented opcodes quickly led to the circulation of a humorous list of perhaps 20ish additional "new undocumented opcodes". Things like XOE (Execute Operator Immediately), EK (Electrify Keyboard), SSJ (Select Stacker and Jam), BLNK (Blink Lights), WHR (Whirr), etc. The crown jewel of this list was HCF (Halt and Catch Fire).
While this list was still funny, Motorola released the 6800 single-chip microprocessor, predecessor to the 650x knockoff that formed the core of the first Apple computers. To ease chip testing, the all-ones opcode threw the chip into a test mode, where it continuously incremented the program counter and performed memory reads. This wiggled all the address lines and most of the control lines, letting you know if the chip was alive and bonded.
Of course they didn't tell you about it. And of course the only way out was hard reset. And of course a jump to an unpopulated region of the address space (i.e. most of it) would leave the bus floating and generate 0xFF. And of course jumping into random data or uninitialized memory would also quickly get you an 0xFF or jump you off into unpopulated address space. So the typical behavior for a program bug was to lock up the processor beyond the ability of a debugger to function.
(Hell: I had one of the first round of solder-it-yourself evaluation kits, bent a pin on the debugger ROM putting it into the socket, and ended up with a board that booted into the test state. Was starving student and it took a couple days to get access to test equipment to find out what was wrong.)
So of course programmers, once they found out about the instruction that hung the chip in a mode where it "twiddled its thumbs at maximum speed" and got a bit warmer than usual, and couldn't get out of the mode except by hard reset, quickly christened the opcode "Halt and Catch Fire". And this became the generic term for get-stuck-in-a-test-mode instructions on microprocessors, until the chip manufacturers finally came to their senses and stopped putting such instructions into instruction sets.
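For anyone who wants to see why a stray jump was so fatal, here is a toy model of the behavior described above. It is a sketch, not a 6800 emulator: the class and names are made up, the opcode value is taken straight from the description above rather than from a datasheet, and every other byte is treated as a one-byte no-op.

```python
HCF_OPCODE = 0xFF    # "all-ones" opcode, per the description above (assumption)
FLOATING_BUS = 0xFF  # value read back from unpopulated address space


class ToyCPU:
    def __init__(self, memory, addr_bits=16):
        self.memory = memory              # dict: address -> byte, sparse
        self.mask = (1 << addr_bits) - 1
        self.pc = 0
        self.test_mode = False
        self.reads = []                   # addresses seen on the bus

    def read(self, addr):
        addr &= self.mask
        self.reads.append(addr)
        # Unpopulated addresses leave the bus floating and read as 0xFF.
        return self.memory.get(addr, FLOATING_BUS)

    def step(self):
        value = self.read(self.pc)
        self.pc = (self.pc + 1) & self.mask
        if self.test_mode:
            return                        # test mode: just keep reading
        if value == HCF_OPCODE:
            self.test_mode = True         # nothing in step() ever clears this

    def reset(self):
        # Hard reset is the only way out of test mode.
        self.pc = 0
        self.test_mode = False
```

Run it with two bytes of "program" and nothing else in memory: the third fetch hits the floating bus, reads 0xFF, and from then on step() does nothing but walk the address lines until reset() is called.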
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
Yeah, the firmware patch came with the solaris CD set. I patched it during my first install after I got the machine.
Kernel's runnin' in 64 bit mode. Unfortunately, I don't know enough C to really take advantage of it like I want to, but I'm learnin'...
Those who can't do, teach. Those who can't teach either, do tech support.
NVidia is partly and secretly funded by Microsoft. I have the information because I used to talk with people working at Nvidia and 3Dfx after they released the NV1. They sold their souls after the NV1 fiasco. By that time 3Dfx had won a contract to power Sega's next console.
After some mysterious "...something", everything changed:
1) Sega dropped the 3Dfx contract for NO reason.
2) Nvidia got lots of funds to automagically recover from disaster. What did they do? Target their new chip (RIVA) at Direct3D (before Direct3D was really useful).
3) 3Dfx lost a lot of the initial momentum and the masterminds quit the company (when they were still #1!)
And now we have the X-Box, which houses an Nvidia chipset, and 3Dfx is owned by NVidia.
Do you expect to get Linux support from NVidia?
unfinished: (adj.)
It would be so nice if ServerWorks made an AMD chipset. Imagine what they could do with the HyperTransport bus and if they implemented their quad-channel SDRAM in a DDR solution. Finally there would be a truly stable enterprise-class chipset available for AMD. They could probably even properly implement USB (MPX satire).
If voting were effective, it would be illegal by now.
Abit tunes their boards in order to allow such overclocks. They make many design decisions that sacrifice a bit of system stability in exchange for 2% more memory bandwidth, or to allow a wider range of bus speeds just so you can brag that your Athlon 1333 at 147 x9 can saturate the GeForce2's bus a few hairs quicker than mine at 133 x10. For tweak-freaks, that's fine, but for everyone else we just want something that can run Win98/2k for a couple days without a reboot, just like my old P2 system used to.
-Billco, Fnarg.com
...just append the option to the end of your 'kernel' line. For me, it looks like this:
kernel /vmlinuz root=/dev/hdb1 vga=5 mem=nopentium
Hope this helps!
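If you'd rather script the edit than do it by hand, here is a rough sketch. It assumes a GRUB-style config where the relevant lines start with the word "kernel"; the function name is made up, and it works on the file's text rather than on the file itself:

```python
def append_kernel_option(menu_text, option="mem=nopentium"):
    """Append `option` to every 'kernel' line that does not already have it."""
    out = []
    for line in menu_text.splitlines():
        words = line.split()
        if words and words[0] == "kernel" and option not in words[1:]:
            line = line.rstrip() + " " + option
        out.append(line)
    return "\n".join(out)
```

Running it twice is harmless, since lines that already carry the option are left alone. Keep a copy of the original config in case the machine doesn't come back up.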
And, for convenience, a rundown by the players involved (both for the Linux kernel and AMD) is here.
In short, for the reading-impaired, it's not an Athlon bug.
- I don't need to go outside, my CRT tan'll do me just fine.
It was obviously well known in late 2000 when I bought my system, because there were registry patch fixes for windoze available from AMD's site, and they are now available on mobo manufacturers' sites too.
F4+80y +1++135
FatBoy Titties - (aren't I l33+