Slashdot Mirror


Are Bad RAM Chips Common?

A semi-Anonymous Coward asks: "I recently built myself a new system using a mainboard which required using registered DDR SDRAM -- the motherboard will not work with unbuffered / unregistered memory, and I wanted the extra integrity provided by registered memory anyway. To my amazement, both the memory I purchased with the board and one of two other sticks I purchased were either defective or simply incapable of working with the board (which is the Chaintech 7KDD, BTW). About how often do people run into defective memory, and do they see them from the 'reputable' manufacturers as often as they do the 'no-name' ones? Now that I've spent a ridiculous amount of money on this, I'm a lot more wary."

11 of 78 comments

  1. When lives are at stake ... by Glonoinha · · Score: 3, Informative

    I have run into bad RAM a few times. I quit buying the cheap stuff and only deal with Crucial now, and I have had excellent luck with them.

    --
    Glonoinha the MebiByte Slayer
    1. Re:When lives are at stake ... by Glonoinha · · Score: 3, Informative

      Daniel - generally it isn't the fact that a chip (or whatever, actually) is bad, it is the hassle associated with a bad chip. I got a cheap (bad) chip for my g/f's laptop and it developed very subtle problems: it would lock up from time to time and it was not blatantly obvious what the problem was. I ended up reinstalling Win98 twice (I was pretty eager to blame MS, to no avail), and after upgrading her to Win2000Pro and still having problems I remembered adding the RAM, so I pulled it out. Problems went away.

      The local hardware shops will eagerly replace my cheapo RAM with different cheapo RAM but they can't replace 10 hours worth of diagnostics, lost files, scrambled data, the half hour each way drive it takes to get to their store, etc...

      What happens if the RAM is marginal only at certain temperatures or under certain loads, circumstances they can't replicate on their test gear? You go back to the house and pop it back in and go back to having problems, but this time you are SURE it isn't the RAM so you start replacing other parts (mobo, video card, NIC, caching SCSI RAID controller card) all out of your pocket trying to make it stop blue screening (or whatever) and be a stable work environment ... when it is still the RAM.

      Once you start using the hardware for work the cost (value) of the hardware is negligible compared to the cost (value) of the actual data ... I have had laptops worth $1,000 carrying a half million dollars worth of development code on them. If someone tried to steal that laptop I wouldn't be killing them over the value of the laptop, I would be killing them over the value of the IP contained within.

      --
      Glonoinha the MebiByte Slayer
  2. occasionally by Anonymous Coward · · Score: 4, Informative

    I have occasionally run into bad memory. A very handy utility can be found at http://www.memtest86.com to check whether your memory is bad and to find the specific address ranges that are no good. You can then specify those address ranges to the Linux kernel, and applications will not be able to malloc the bad memory, so the system runs stably despite having bad RAM.
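
    A rough sketch of that last step, assuming the mainline kernel's memmap=nn$ss boot parameter (which marks a region as reserved) rather than whatever mechanism the poster had in mind; the BadRAM patch is another route. The function name and the example addresses are made up for illustration.

```python
PAGE = 4096  # assumed page size; x86 uses 4 KiB pages

def memmap_args(bad_ranges):
    """Turn (start, end) byte ranges of bad RAM into memmap= boot arguments.

    Each range is widened to whole pages, since memory is reserved
    page by page. 'end' is exclusive.
    """
    args = []
    for start, end in bad_ranges:
        lo = start - (start % PAGE)      # round start down to a page boundary
        hi = -(-end // PAGE) * PAGE      # round end up to a page boundary
        args.append(f"memmap={hi - lo:#x}${lo:#x}")
    return args

# Hypothetical failing range reported by a memory tester
print(" ".join(memmap_args([(0x12345678, 0x12345700)])))
```

    The printed string would go on the kernel command line; in a GRUB config the $ typically needs escaping.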

  3. Cheap ram = bad ram by pr0c · · Score: 5, Informative

    I do a lot of side work dealing with computer upgrades. I outright give 2 options:

    1.) We get cheap stuff and save you money. I make it very clear that it may not work.
    2.) We get normally priced RAM and can be sure it's good.

    Of the few people who did not want to spend the money on a good brand, even with me warning that it's a bad idea, about 1 in 3 RAM chips did NOT work. I've NEVER had a good brand (Crucial, Kingston, etc.) fail even 1 time. I don't gamble on my own system: I use Corsair XMS, and that's what I recommend, but anyway that's what I've found.

    My Rough Stats:
    Cheap memory: 30%+ failed
    Good memory: 0% failed
    This is only based on about 100 experiences in the last few years; I don't do much side work.

  4. Run Memtest86... by (H)elix1 · · Score: 4, Informative

    Memtest86 will go a long way to test the RAM. If you are going through tons of wonky RAM, however, the issue may be your CPU or power supply. Test the RAM on a couple of boxes.

    As for no-name. Usually grade 'a' RAM will run at a lower CAS rating, where some of the generics might work only at a higher (and slower) setting. Stuff that rates at PC-100 CAS 2 might only work at PC-133 CAS 3. (dang, showing my age) The good stuff tended to be able to run stable at the faster FSB and CAS settings. My time is worth more than the ~$30 between solid and guesswork.
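
    For a feel of what those ratings mean in absolute time, here is a minimal sketch that converts a CAS setting to nanoseconds (cycles divided by clock frequency). The helper name is made up, and the clocks are the nominal 100 and 133 MHz figures, not measured values.

```python
def cas_latency_ns(cas_cycles, clock_mhz):
    """CAS delay in nanoseconds: cycles divided by clock frequency."""
    return cas_cycles * 1000.0 / clock_mhz

for label, cas, mhz in [("PC-100 CAS 2", 2, 100.0),
                        ("PC-133 CAS 2", 2, 133.0),
                        ("PC-133 CAS 3", 3, 133.0)]:
    print(f"{label}: {cas_latency_ns(cas, mhz):.1f} ns")
```

    On these nominal numbers, a chip that just manages CAS 2 at 100 MHz would need to respond faster to hold CAS 2 at 133 MHz, but gets more time than before at CAS 3.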

    If you're not pushing a system hard - cheap RAM might just work. A few years back a local vendor had some dirt cheap no-name 128M sticks that ran as fast as my Mushkin stuff. Go figure. You roll the dice, but it matters less if you are not pushing your settings hard.

  5. Summary by chriso11 · · Score: 2, Informative

    Ok - to summarize

    1) Whenever you buy a new stick of RAM, run memtest86 on it for an hour or so. It can save you weeks of problems.

    2) Use a grounding strap. ESD damage is a serious problem, and especially in the winter months, can easily lead to zapped parts. In fact, use a strap whenever you open your box! I even have a roll-up ESD mat for serious surgery.

    I have actually had memory go bad in my PC right when I was using the PC: it was good one minute, then bad the next. I have a nice APC UPS working as a surge protector. The memory was some premium stuff too - Corsair XMS memory. I hadn't touched the inside of the box for a few weeks (hard to believe, huh?), and I was practicing with the 203 on America's Army, and I suddenly got a win2k BSOD (which has a lot more words, but is basically just as useless as the win98 BSOD). So:

    3) test your memory periodically - like every 6 months or so.

    4) Maybe your motherboard has some debris in the memory slot or a sliver of metal shorting some pins out.

    --
    No, I don't trust in god. He'll have to pay up front, like everybody else.
  6. Re:My experiences by Spoing · · Score: 4, Informative
    [ Slash long list of systems ]

    1. None of these systems have ever had memory problems. They rarely, if ever, crash (or at least they didn't crash when I had them - some have passed on into the hands of friends). Maybe I'm just one really lucky bastard when it comes to RAM, but I've never had any problems buying the cheapest shit memory so that I could save a few bucks.

    Out of sight, out of mind.

    Being a former test lead for a memory diagnostic tool, I'd bet you had plenty of memory errors. When they occurred, they didn't 'look' like memory errors, so you treated a different problem. Your fix 'worked', so you claimed success and moved on. Other errors might not have symptoms -- even if corruption did occur -- so you didn't notice anything was wrong.

    1. Basic example: One-bit errors, let alone other more complex defects, can pass hardware parity checks (change a bit here and it flips a bit in a physically similar area).
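
    One way to read that parenthetical, as a small sketch: a single parity bit notices an odd number of flipped bits, so a fault that drags a neighbouring bit along with it leaves parity unchanged. The stored value and the flip masks below are arbitrary illustrations.

```python
def parity(word):
    """Even-parity bit for an integer word: 1 if the count of set bits is odd."""
    return bin(word).count("1") % 2

stored = 0b10110010
check = parity(stored)            # parity recorded when the word was written

one_flip = stored ^ 0b00000100    # a single bit flipped
two_flips = stored ^ 0b00000110   # that bit plus its neighbour flipped

print(parity(one_flip) != check)   # True: mismatch, the error is detected
print(parity(two_flips) != check)  # False: parity still matches, error slips through
```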

    The stats given by others -- ~30% failure on cheap memory and 0% on good within the first month -- are close to my experiences. IMNSHO, the initial numbers are the same (~30% & ~0%). Over the lifetime of a system +10% of both cheap and good memory tends to fail (or get wrecked by bad power).

    To catch the +10% failure rate on non-ECC memory, and to catch memory subsystem errors in general, I run extensive tests on systems that can be taken down about once a year -- this is beyond any tests to diagnose flaky behavior.

    Memtest86: It is excellent and as good as any other memory diagnostic software I've ever used when running all tests. As a matter of course, I add memtest86 to the boot menu on all x86 systems.
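
    To illustrate why running all the tests matters, here is a toy pattern test against simulated memory with one stuck bit. This is not Memtest86's actual algorithm, only a simplified write-and-read-back sketch; the FlakyMemory class and the fault location are hypothetical.

```python
class FlakyMemory:
    """Simulated byte-wide RAM with one bit stuck at 0 (hypothetical fault)."""
    def __init__(self, size, bad_addr, bad_bit):
        self.cells = [0] * size
        self.bad_addr = bad_addr
        self.bad_mask = 1 << bad_bit

    def write(self, addr, value):
        if addr == self.bad_addr:
            value &= ~self.bad_mask    # the stuck bit never takes a 1
        self.cells[addr] = value & 0xFF

    def read(self, addr):
        return self.cells[addr]

def pattern_test(mem, size, patterns=(0x00, 0xFF, 0x55, 0xAA)):
    """Write each pattern to every cell, read it back, return failing addresses."""
    bad = set()
    for p in patterns:
        for addr in range(size):
            mem.write(addr, p)
        for addr in range(size):
            if mem.read(addr) != p:
                bad.add(addr)
    return sorted(bad)

mem = FlakyMemory(1024, bad_addr=300, bad_bit=3)
print(pattern_test(mem, 1024, patterns=(0x00,)))  # all-zeros alone misses the fault
print(pattern_test(mem, 1024))                    # the full pattern set finds it
```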

    BIOS memory tests: The boot up memory tests are useful only to identify that the memory exists, so if possible I turn them off.

    --
    A firewall can not protect you from yourself. Turn off what you do not need. Do not use the firewall to do your work.
  7. Crucial is boring...and I like it by Bourbon+Man · · Score: 2, Informative

    I'm responsible for 250+ PCs and a dozen servers. Over the last couple of years I have bought literally hundreds of sticks from Crucial. Never a single bad chip, never a compatibility issue, never any problem whatsoever. Period.

  8. Comment removed by account_deleted · · Score: 2, Informative

    Comment removed based on user account deletion

  9. Re:My experiences by irix · · Score: 3, Informative

    ESD damage *does* happen with a surprisingly high frequency when you handle components unsafely, but you don't notice because the damage takes time to show

    I used to work at a semiconductor manufacturing facility once upon a time. Let me just say that this is 100% correct.

    My employer spent a lot of money on ESD prevention because ESD errors were the worst kind of errors. Sometimes the chip would fail catastrophically, but usually it would pass probe and test and get shipped, only to fail prematurely in the field (latent failure). This is obviously much more expensive than finding the problem before the device ships.

    Another common misconception is that you need to feel the ESD charge - like walking across a carpet in sock feet and touching a doorknob - in order for damage to occur. This is false - most electronic components can be damaged at a much smaller voltage than you can feel in your body.

    My best advice is that simple ESD precautions like a wrist strap are cheap, so use them.

    --

    Do you even know anything about perl? -- AC Replying to Tom Christiansen post.
  10. CMOS Electronics Primer - ESD Damage by BigBlockMopar · · Score: 2, Informative

    Essentially the high voltage of the ESD (ESD like when you shock yourself on a doorknob is very high voltage, it's just very low current) is destructive to the transistor junctions, but it usually doesn't cause immediate complete failure. A few days, months, or even years down the road, the junction will prematurely break down, having had a shortened lifespan because of the ESD damage.

    Indeed.

    Memory chips - and most other components within any computer less than fifteen years old - use CMOS logic. CMOS stands for "Complementary Metal Oxide Semiconductor", which essentially means that they're full of MOSFETs ("Metal Oxide Semiconductor Field Effect Transistor"). This includes almost all processors, support logic, etc. In fact, the only exception which comes to mind is the really old computers which had the big banks of 74xx-series TTL logic all over the place, like in an XT. But keep in mind that the processor itself - and many other components - will be CMOS.

    The neat thing about Field Effect Transistors is that the electric field created by applying a gate voltage turns on the source-drain circuit. There is essentially no current required to drive the gate. The fact that there is theoretically no gate current means that you can do things like power 20 million transistors off a single 200W AT power supply, or build a wristwatch which runs for 5 years off the same tiny little battery.

    The "field effect" weakens as the gate gets further from the channel. Across a thin insulating layer the field is roughly the gate voltage divided by the layer's thickness, so if you double the distance you need about twice the voltage to achieve the same field inside the source-drain channel. Naturally, in order to be able to work at the low voltages inside a computer, the distance therefore must be tiny.

    This tiny distance is filled with a layer of what is, essentially, glass. And it's so thin that it can have a hole blasted through it by 30 volts.

    Now, air doesn't ionize until about 3kV per millimeter. That means, to jump a 1mm gap, you need about 3,000 volts, which you perceive as a tiny static electric spark.

    You will never see, nor feel, a 30V static electric charge. You can build it up just by sitting in your chair. And that's enough to blow a MOSFET transistor.

    If a RAM chip has a million MOSFETs (modern ones have a lot more!) and you blow one of them, your chip is still well over 99.999% fine... until you try to read back data from the address with the blown MOSFET. And then you get one bit of garbage.

    The data in RAM is corrupt. What if it's executable? Does the machine crash? Probably. What if it's a JPG? Maybe one pixel on that 1024x768 pr0n image you downloaded is one shade of skin-tone different than it should be.
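
    The image case can be sketched with a decoded pixel buffer (not the compressed JPG file itself, where a flipped bit could do more damage). The buffer contents and the position of the bad bit are made up.

```python
def flip_bit(data, byte_index, bit):
    """Return a copy of data with one bit inverted, as a single bad cell might."""
    out = bytearray(data)
    out[byte_index] ^= 1 << bit
    return bytes(out)

# Hypothetical 2-pixel RGB buffer, 3 bytes per pixel
pixels = bytes([200, 150, 120, 201, 151, 121])
damaged = flip_bit(pixels, 4, 0)   # lowest bit of the second pixel's green byte

changed = [i for i, (a, b) in enumerate(zip(pixels, damaged)) if a != b]
print(changed, pixels[4], "->", damaged[4])
```

    Only one byte differs, and by one step of one colour channel. The same flip in a byte of executable code would change an instruction or an address instead.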

    A lot of ESD failures show up as intermittent crashes and other software problems. Before you reinstall your operating system because it's getting crufty, consider your hardware... well, unless you're running Windows.

    ALWAYS wear a wrist strap. It's a bummer, but them's the dice.

    --
    Fire and Meat. Yummy.