Are Bad RAM Chips Common?
A semi-Anonymous Coward asks: "I recently built myself a new system using a mainboard which required using registered DDR SDRAM -- the motherboard will not work with unbuffered / unregistered memory, and I wanted the extra integrity provided by registered memory anyway. To my amazement, both the memory I purchased with the board and one of two other sticks I purchased were either defective or simply incapable of working with the board (which is the Chaintech 7KDD, BTW). About how often do people run into defective memory, and do they see them from the 'reputable' manufacturers as often as they do the 'no-name' ones? Now that I've spent a ridiculous amount of money on this, I'm a lot more wary."
I have occasionally run into bad memory. A very handy utility at http://www.memtest86.com will verify whether your memory is bad and report the specific address ranges that are no good. You can then pass those address ranges to the Linux kernel so that it never hands that memory out; applications will not be able to malloc it, and the system runs stably despite having bad RAM.
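As a rough sketch of the "tell the kernel" step: one documented way to keep Linux off a region is the `memmap=nn$ss` boot parameter, which marks `nn` bytes starting at address `ss` as reserved. The helper below is hypothetical (the name `memmap_args` and the 4 KiB page size are my assumptions, and your kernel or bootloader may want something different); it just rounds each bad range out to page boundaries and prints the arguments.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, typical on x86


def memmap_args(bad_ranges, page_size=PAGE_SIZE):
    """Turn (start, end) byte-address ranges, end exclusive, into
    memmap=SIZE$START boot arguments that reserve whole pages."""
    args = []
    for start, end in sorted(bad_ranges):
        if end <= start:
            raise ValueError(f"empty or inverted range: {start:#x}-{end:#x}")
        # Round the start down and the end up to page boundaries.
        aligned_start = (start // page_size) * page_size
        aligned_end = -(-end // page_size) * page_size
        size = aligned_end - aligned_start
        args.append(f"memmap={size:#x}${aligned_start:#x}")
    return args


if __name__ == "__main__":
    # Hypothetical bad addresses, as a memory tester might report them.
    for arg in memmap_args([(0x12345678, 0x12345679)]):
        print(arg)
```

Note that the `$` may need escaping in your bootloader's config file, so check what actually shows up in /proc/cmdline after booting.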
I do a lot of side work dealing with computer upgrades. I flat out give people 2 options:
1.) We get cheap stuff and save you money. I make it very clear that it may not work.
2.) We get normally priced RAM and can be sure it's good.
Of the few people who did not want to spend the money on a good brand, even after I warned them it was a bad idea, about 1 in 3 RAM chips did NOT work. I've NEVER had a good brand (Crucial, Kingston, etc.) fail even once. I don't gamble on my own system: I use Corsair XMS and that's what I recommend. Anyway, that's what I've found.
My Rough Stats:
Cheap memory: 30%+ failed
Good memory: 0% failed
This is only based on about 100 experiences in the last few years; I don't do much side work.
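For a sense of how much a "0% failed" figure can tell you at this sample size, here is a small sketch. The split between cheap and good sticks is not given above, so the 50 used below is purely a hypothetical number for illustration. The function computes the highest true failure rate that would still produce zero failures in n tries with the given probability (the exact form of the "rule of three").

```python
def upper_bound_zero_failures(n, confidence=0.95):
    """Upper confidence bound on the true failure rate after observing
    0 failures in n independent trials."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Solve (1 - p)**n = 1 - confidence for p.
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


if __name__ == "__main__":
    n_good = 50  # hypothetical: the real count of good-brand sticks is not stated
    bound = upper_bound_zero_failures(n_good)
    print(f"0 failures in {n_good} sticks: true rate could be up to {bound:.1%}")
```

So zero failures out of a few dozen sticks is still compatible with a true failure rate of a few percent, though that is a long way from 30%.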
Memtest86 will go a long way toward testing the RAM. If you are going through tons of flaky RAM, however, the issue may be your CPU or power supply. Test the RAM in a couple of boxes.
As for no-name: usually grade 'A' RAM will run at a lower CAS rating, whereas some of the generics might only work at a higher (and slower) setting. Stuff that rates at PC-100 CAS 2 might only work at PC-133 CAS 3. (Dang, showing my age.) The good stuff tended to run stable at the faster FSB and CAS settings. My time is worth more than the ~$30 difference between solid and guesswork.
If you're not pushing a system hard, cheap RAM might just work. A few years back a local vendor had some dirt cheap no-name 128M sticks that ran as fast as my Mushkin stuff. Go figure. You roll the dice, but it matters less if you are not pushing your settings hard.
Why does the computer industry tolerate this sort of thing? When it was hobbyists tinkering with Northstars and Cromemcos and Sols it might have been understandable, but we should have grown up a long time ago.
When you put oil into your car, you know that the oil companies and the car companies have gotten together with the American Petroleum Institute to set standards so that as long as your owner's manual says "API SG" and the oil you buy says "API SG" or better, that oil will work in your car. And you can use Mobil Oil to top up an engine filled with Quaker State without losing any sleep over whether their chemistry is compatible.
You don't rely on friends' stories of whether Quaker State is better than Shell Oil. You know that regardless of the price of the oil, if it says API SG it meets API SG specs and if your car says API SG specs are good enough, they're good enough.
It doesn't benefit anyone if your engine seizes up, and it doesn't benefit anyone if your computer crashes.
It's simple, it's easy, millions of consumers who aren't chemical engineers buy engine oil every day without wrecking their cars.
Why is it expecting too much for computer vendors to do the same?
And, while we're at it, why don't all computers use parity-checked memory? This was standard on 100% of all computers before the micro age, and for some reason people started putting in non-parity memory to save money and asserting that "it works."
And our computers crash a lot, and nobody knows why and nobody does anything about it and everyone just accepts that that's the way computers are...
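For anyone who hasn't met parity-checked memory: the idea is to store one extra bit alongside each byte so that a single flipped bit can be detected when the byte is read back. A minimal sketch of even parity follows; the function names are mine, and real hardware does this in the memory controller, not in software.

```python
def parity_bit(byte):
    """Even-parity bit for an 8-bit value: 1 if the count of 1 bits is odd."""
    return bin(byte & 0xFF).count("1") % 2


def store(byte):
    """Simulate a write: keep the byte together with its parity bit."""
    return byte & 0xFF, parity_bit(byte)


def check(byte, stored_parity):
    """Simulate a read: True if the byte still matches its stored parity."""
    return parity_bit(byte) == stored_parity


if __name__ == "__main__":
    word, p = store(0b10110010)
    print("clean read ok:", check(word, p))
    print("one bit flipped, still ok:", check(word ^ 0b00001000, p))
    print("two bits flipped, still ok:", check(word ^ 0b00011000, p))
```

A single flipped bit is caught, but two flips in the same byte cancel out and slip through. Parity can only report the error, not repair it; correcting errors takes ECC memory.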
You've been lucky on RAM for sure. Now about this static discharge thing. I also never used to use wrist straps or any other static precaution when working on home stuff. I always did it right at work because it was required, but at home I routinely did just about everything that could cause static damage, because I figured it was unlikely to cause a problem. My experience was always that the components worked fine anyway, and that ESD damage must be so rare that you're not likely to ever see it, so it's not worth the trouble.
However, later on down the line I learned the error of my ways. I was failing to understand the nature of ESD damage. Someone finally clued me in. In short, ESD damage *does* happen surprisingly often when you handle components unsafely, but you don't notice because the damage takes time to show. Essentially the high voltage of the ESD (ESD, like when you shock yourself on a doorknob, is very high voltage, just at very low current) is destructive to the transistor junctions, but it usually doesn't cause immediate complete failure. A few days, months, or even years down the road, the junction will break down prematurely, its lifespan shortened by the ESD damage.
So those components that failed on you after a few good years of service, which you chalked up to old age, probably failed to a large degree from ESD back when you first installed them, and had you used the right precautions, they might've lasted a lot longer. Now that I understand this, I'm a lot more careful about ESD even at home. From what I read, the long-term effects of ESD are easier to see over a large sample, which is what electronics companies have. They can actually see the warranty return rate on their chips drop consistently when they put better ESD precautions into place, although it may take a few years to show.
Out of sight, out of mind.
Being a former test lead for a memory diagnostic tool, I'd bet you had plenty of memory errors. When they occurred, they didn't 'look' like memory errors, so you treated a different problem. Your fix 'worked', so you claimed success and moved on. Other errors might not have had symptoms -- even if corruption did occur -- so you didn't notice anything was wrong.
The stats given by others -- ~30% failure on cheap memory and 0% on good within the first month -- are close to my experience; IMNSHO, those initial numbers are about right (~30% & ~0%). Over the lifetime of a system, roughly another 10% of both cheap and good memory tends to fail (or get wrecked by bad power).
To catch that extra ~10% on non-ECC memory, and to catch memory subsystem errors in general, about once a year I run extensive tests on any system that can be taken down -- this is on top of any tests run to diagnose flaky behavior.
Memtest86: With all tests enabled, it is excellent and as good as any other memory diagnostic software I've ever used. As a matter of course, I add memtest86 to the boot menu on all x86 systems.
BIOS memory tests: The boot up memory tests are useful only to identify that the memory exists, so if possible I turn them off.