Slashdot Mirror


Salvaging Defective DRAM

An anonymous reader writes "Ever wonder what happens to DRAM that fails quality assurance testing during manufacturing? Turns out a lot of it is sold as 'downgrade' memory and ends up in OEM memory modules. Last resort: use it in an answering machine, where the sampled audio can be very tolerant of bit errors."

42 of 211 comments

  1. Comment removed by account_deleted · · Score: 3, Insightful

    Comment removed based on user account deletion

  2. Now there is a new excuse.... by MosesJones · · Score: 3, Funny


    "Oh you left a message on the answering machine, naah I didn't get it must be the defective DRAM chips they use. Now you've managed to track me down using a detective agency I'll be sure to send you the cheque next week"

    --
    An Eye for an Eye will make the whole world blind - Gandhi
  3. *Long String of Curses* by SatanicPuppy · · Score: 5, Funny

    You just explained a lot about my fricking answering machine! I thought that no one ever called! And now I find out it is low grade ram? My god! I may really HAVE a social life!

    --
    ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
    1. Re:*Long String of Curses* by Palos · · Score: 3, Funny

      Yeah, you've been missing out on the trip of a lifetime to Disney World, and all kinds of other important calls...

    2. Re:*Long String of Curses* by Znonymous+Coward · · Score: 4, Funny

      The messages come through, they just sound like this...

      Hi, this is ___ael. Give me a call at 55_12__. Talk to you later.

      --

      Karma: The shiznight, mostly because I am the Drizzle.

  4. Did I ever wonder? by Anonymous Coward · · Score: 5, Funny
    Ever wonder what happens to DRAM that fails quality assurance testing during manufacturing?

    No. I figured they forgot about it.

  5. Alternatively, you could use the... by Ari+Rahikkala · · Score: 5, Informative
    1. Re:Alternatively, you could use the... by Spoing · · Score: 3, Funny
      You don't want the bad ram patch.

      I don't want the bad ram patch.

      You realize it is a mistake.

      That would be a mistake.

      You have better things to do with your time...that are less risky. You will go home and reconsider your life.

      Excuse me, I need to go home.

      --
      A firewall can not protect you from yourself. Turn off what you do not need. Do not use the firewall to do your work.
  6. bad ram patch for linux by Anonymous Coward · · Score: 4, Interesting

    Summary: This page proposes an approach to support RAMs with defective addresses. This may open interesting business perspectives, where those RAMs can be sold under a white label for less money rather than discarded without any profit.

    the url is:

    http://rick.vanrein.org/linux/badram/

  7. Yes I was just sitting here by Anonymous Coward · · Score: 3, Funny

    and I suddenly thought, hmmm, what happens to that defective DRAM? I open up Mozilla and what do I find? An answer to my question.

  8. Buying ram on the internet..... by Ogrez · · Score: 5, Informative

    This is the prime example of why I tell people I know not to buy RAM off of the internet unless it's from a major company that has good support. Too many people buy 15-90 day warranty RAM because it's cheap, and when it fails they are upset that they have to replace it. If you pay a bit more money you get lifetime warranty RAM... and why do you think they are willing to warranty it that long? Because they know it works. People don't understand the testing process and think they are getting the same product buying cheap RAM, as opposed to inexpensive RAM...

    --


    Fire in the hands of the village idiot is no tool, but a weapon of mass destruction
    1. Re:Buying ram on the internet..... by Boo+Robin · · Score: 3, Insightful

      Very true. I'll never buy generic RAM simply because it is more likely to malfunction. I've seen this happen to many of my friends. I'll stick with my Kingston RAM. The extra price is worth the warranty and the ability to sleep knowing my RAM won't mess up my computer.

      --
      'Give me one more medicated peaceful moment'
    2. Re:Buying ram on the internet..... by Ogrez · · Score: 3, Interesting

      I'm a sysadmin by trade... and although I might chance 90 day OEM RAM in my home machine if the price is right, in my production servers I use Crucial or Kingston. As for the average consumer... How do they determine if a car is good? What criteria do they use to decide what new stove to buy? They do some research. If more people did a little research when they bought computers or computer parts, they wouldn't have half the problems. Instead they listen to the salespeople, or they buy the cheapest thing they can find and then wonder why their system locks up.

      --


      Fire in the hands of the village idiot is no tool, but a weapon of mass destruction
    3. Re:Buying ram on the internet..... by sjames · · Score: 4, Interesting

      If only that was the worst of it.

      Generic RAM is also in the habit of misreporting its capabilities in SPD. The problem was so bad with 512M sticks, back when that was the biggest available, that many BIOSes would automatically disregard SPD and choose the slowest settings when a 512M stick was detected.

      Better brand names don't appear to have that problem.

  9. Use memtest86 by Black+Parrot · · Score: 5, Interesting

    ...and read its documentation to find out how to make Linux skip any defects it finds.

    --
    Sheesh, evil *and* a jerk. -- Jade
  10. recycling the chips by v1 · · Score: 5, Interesting

    I recall seeing an article a while ago where companies were buying defective memory and running it through external testing units, which would identify which chip(s) on the stick were bad. I'm assuming they'd then unsolder the bad chip and replace it with a good one recovered from another module. At that time some of those sticks had 8 chips on each side, so you could recover 15 good sticks from 16 bad ones. Considering the price of memory a few years ago, it was probably a worthwhile venture. Nowadays, though, it's probably not worth anyone's time.
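
    The 15-from-16 figure only works out if each bad stick has exactly one bad chip; that is an assumption, not something the article states. A quick sketch of the arithmetic (the function name is made up for illustration):

```python
def recoverable_sticks(bad_sticks, chips_per_stick, bad_chips_per_stick=1):
    """How many fully good sticks can be rebuilt by pooling the good chips.

    Assumes every bad stick has the same number of bad chips and that
    every good chip survives being unsoldered and resoldered.
    """
    good_chips = bad_sticks * (chips_per_stick - bad_chips_per_stick)
    return good_chips // chips_per_stick


# 16 sticks, 16 chips each (8 per side), one bad chip per stick
print(recoverable_sticks(16, 16))
```

    With two bad chips per stick the same pool only yields 14 sticks, so the economics depend heavily on how clustered the failures are.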

    --
    I work for the Department of Redundancy Department.
  11. well by odyrithm · · Score: 4, Funny

    I sh*t you not.. they make great keyring fobs! just don't let your gf see it ;)

    --
    moo
    1. Re:well by garbs · · Score: 3, Funny


      just don't let your gf see it ;)


      No problem with that happening with most of the slashdot visitors.

    2. Re:well by wik · · Score: 5, Interesting

      Not to mention, give you hell at the airport. The security guys in Pittsburgh told me to put my keys in the little bucket, then when they looked closer, told me to put them through the X-ray machine.

      They were looking at the old 256k SIMM PCB (all chips removed) and asking "is that a computer chip"? Funny how they pointed at that and missed my Intel keyring fob with a real processor die on it.

      --
      / \
      \ / ASCII ribbon campaign for peace
      x
      / \
  12. Some updates to the article... by IvyMike · · Score: 5, Informative

    There are some things in the article that are pretty out of date:

    To reduce the test time, parallel chip testing usually is accomplished with eight to 16 chips in a row.

    That's pretty low parallelism; there are memory testers out there that test over 200 devices at a time right now. And even the older, more common systems are probably testing 64 in parallel.

    A special ink jet color marks the good dies.

    This hasn't been true for years. Each device's pass/fail status is stored in a database, along with all other test results, and the whole process is automated enough that good die are binned out automatically. No need to physically mark the chip.

    Due to the imperfection of the process, a percentage of the DRAM die contains some faulty cells.

    That percentage is 100%. At modern memory sizes, you never get a perfect device without going through repair.

  13. Re:Ever wonder? by yintercept · · Score: 3, Funny

    Well, have you ever wondered what happens to all the defective people that get produced?

    They end up on earth.

  14. I have it by acidrain69 · · Score: 5, Funny

    I've been waiting for the computer graveyard market to ramp up. Where do the rest of the defective computer systems go?

    It's in my closet. All of it. The whole market. I'm waiting for the entire tech market to crash, so I can flood the market.

    --
    -- Having a Creationist Museum is like having an Atheist place of worship
  15. I am seeing a lot of this by acidrain69 · · Score: 5, Informative

    There are a lot of peeps complaining about substandard RAM. If you had RTFA, you'd realize that the downgrade RAM is reconfigured to skip the bad parts in the chips, so that it comes out as a normal module. Just because there is a faulty bit or 10 in a module doesn't mean the rest of that module is bound to fail. It could just have been an imperfection in the silicon or the circuit process.

    The downgrade RAM has to pass further tests to ensure the detours around the bad parts worked.

    Granted, I probably wouldn't use this stuff in a mission critical server, but if you are buying for a mission critical server, you should be getting ECC registered with lifetime warranties anyway. Now for a small web or file server, or even a desktop, I'd use this.

    Other people have mentioned memtest86. This program is your friend. Don't even bother with BIOS POST tests of RAM, just use this every once in a while if you REALLY want to find the problems. Too bad it won't run on my alpha server :(

    --
    -- Having a Creationist Museum is like having an Atheist place of worship
    1. Re:I am seeing a lot of this by alienw · · Score: 4, Insightful

      If the chip is half-bad, there are good chances that it has defects in the other half. Usually, it's a problem with the process and not just random quirks. It's just that one half works better than the other. In fact, many windows crashes are not caused by Windows, but by bad RAM. And good luck finding anything with memtest86. Once, I ran that program for about 3 hours on a machine with bad RAM. It didn't find anything. When I replaced one of the sticks, all the problems went away.

    2. Re:I am seeing a lot of this by chrysrobyn · · Score: 5, Insightful

      There are a lot of peeps complaining about substandard RAM. If you had RTFA, you'd realize that the downgrade RAM is reconfigured to skip the bad parts in the chips, so that it comes out as a normal module. Just because there is a faulty bit or 10 in a module doesn't mean the rest of that module is bound to fail. It could just have been an imperfection in the silicon or the circuit process.

      You have made a statement that makes it very clear you are a very educated layman, not someone in the field. What you've said is true to the first order, but not inherently true.

      Wafers have what can be measured as "defect density", and exhibit a phenomenon called "defect clustering". Defects are not always hit or miss, open or short; some of them are latent or resistive. As a part ages (diffuses), electromigrates or experiences hot electron effects, it will decrease in quality. Downgrade RAM, so to speak, would be most likely to have additional cells fail due to the above effects -- because it had failures that made it marginal in the first place. Testing methodologies at higher quality manufacturers build in guardbands to make sure that nobody ever experiences the defects when the part is used in-spec. (This is why many overclockers lose their chips after only a year or two: they cause latent defects to surface and suddenly the chip won't even operate at nominal frequencies. The guardband effect also explains to a great degree why many chips can be overclocked in the first place.)

      I'm not dis'n you, just trying to fill in a few more holes.

    3. Re:I am seeing a lot of this by Anonymous Coward · · Score: 3, Informative

      "If the chip is half-bad, there are good chances that it has defects in the other half."

      Actually, no, that is not correct. Errors are caused by a localized defect which affects what is really (in human terms) a small point on the die: a particle of contaminant, for instance, only a micron or two in size.

      Ever wonder why NAND FLASH (used in Smart Media, Compact FLASH, etc.) is cheaper than NOR FLASH (called linear FLASH, used for BIOS and other code storage, etc.)? Because not only is it designed to be fault correcting, but the spec allows for up to a certain number of sectors to be completely bad (uncorrectable by the on-board ECC bits). This means higher yield, since many more parts get to pass in spite of defects.

      J

    4. Re:I am seeing a lot of this by shepd · · Score: 3, Interesting

      >When I replaced one of the sticks, all the problems went away.

      The big question is:

      Did you replace it with an identical type and speed of RAM? Or did it perhaps have fewer chips?

      memtest86 may not detect problems caused by overclocked RAM, and on some boards, if the RAM is double sided, the extra "stress" on the bus of a poorly designed board may be enough to cause errors when reading or writing the RAM.

      I've seen other strange effects that only happen to Windows, such as a board that detects a full complement of 384 MB of RAM in the BIOS (1 each of 256 MB and 128 MB) but only 128 MB in Windows. Moving the RAM about on the board would cause Windows to _sometimes_ detect the rest of the memory. Swapping the 256 MB stick with another machine's 256 MB caused both machines to reliably detect and use the memory.

      While I never bothered with memtest86, I'm betting it would see the same amount of memory as the BIOS.

      Can you tell I hate modern memory modules yet? :-)

      --
      If you could be told what you can see or read, then it follows that you could be told what to say or think - BoC
    5. Re:I am seeing a lot of this by tintruder · · Score: 3, Informative
      If the chip is half-bad, there are good chances that it has defects in the other half. Usually, it's a problem with the process and not just random quirks.

      Not true. All processes are subject to variation.

      When a wafer is produced with hundreds or thousands of discrete die on it, some are always better than others. For instance, in the 5" process where the first Pentiums were fabricated, you could have a yield of 60%-80% good die with those 60%-80% spanning a whole range of marked chip speeds. Same process, same wafer, different mhz. Different price when sold.

      If you've ever seen a fab in production, you would also see steps where manual (vacuum wand) handling is needed. Even in the filtered air of a clean room, the open movement of a wafer handled like this often leads to particles becoming affixed to the surface. The smaller the process (e.g. 0.09u vs. 0.9u), the more damage a single particle can do.

      Process washings with chemicals or pure water do a good job of assuring no (well, few) particles stay affixed, but even so, some steps of metrology show that all cannot be avoided.

      Will a single particle hurt a single die? Maybe. Maybe not. It depends on where it lands and at what step in the process.

      Once the die are tested for yield and function and sorted by this performance, they are sold in batches.

      Not every die is tested completely though, but rather a restrictive set of "tell-tale" measurements are taken on most (at good fabs) and exhaustive testing done only on a small sample. Lots of statistical analysis helps know what to test and how hard to test it.

      Move to the final assembler, and all sorts of production glitches can cause bad modules. Primarily though, either minimally qualifying RAM or random sample tested RAM makes it into generic modules. Still, all the other components, the circuit board, connectors and solder itself can contribute to problems.

      In any case, the bad part in any chip is likely local because even minimal QA testing will eliminate obvious or widespread failures.

      Of course, piss-poor process does yield chips more prone to failure by breakdown of the traces or local thermal failure due to bubbles, impurities, or poor assembly.

  16. How to identify DIMMs using bad RAMs by udif · · Score: 5, Informative
    It's quite simple. Really.

    DRAM chips usually have either 4, 8 or 16 bits per word. In order to construct a DIMM, 64 bits are needed. This means that with 4-bit DRAMs you need 16 chips, with 8-bit DRAMs you need 8 chips, and with 16-bit DRAMs you need 4 chips. Usually you will see only the 4 or 8 bit DRAMs, because these occupy less board area for the same capacity. 16-bit DRAMs are only used for low capacity DIMMs.

    When your DIMM supports ECC, it's 72 bits wide, which makes it more complicated. Usually it's made of eighteen 4-bit chips or nine 8-bit chips.

    (Back in the 30 and 72 pin SIMM days, when memories were 8 or 32 bits wide, you could see ECC SIMMs that used 3 chips for 2x4+1=9 bits, or 2x16+4=36 bits.)

    If you see a DIMM with 12 chips, it is usually a cheap OEM DIMM using partially good DRAMs.

    The best way to identify such a DIMM is to write down the markings on ALL the chips on it and look them up on the Internet. You then sum up all the DRAM bit widths, and see what you come up with:

    If it's 64 bits, it's a normal DIMM.

    If it's 72 bits, it's probably an ECC DIMM.

    If it's more, it's probably a DIMM using partially good DRAMs.
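
    The sum-the-widths rule is easy to write down as a few lines of Python. This is only a sketch of the rule of thumb above; the function name and the handling of totals the rule does not mention are my own assumptions, and you still have to look up each chip's width from its datasheet yourself.

```python
def classify_dimm(chip_widths):
    """Apply the rule of thumb: sum the data widths (in bits) of every
    DRAM chip on the module and compare against 64 and 72."""
    total = sum(chip_widths)
    if total == 64:
        verdict = "normal DIMM"
    elif total == 72:
        verdict = "probably an ECC DIMM"
    elif total > 72:
        verdict = "probably built from partially good DRAMs"
    else:
        # The rule above says nothing about other totals (assumption:
        # treat them as a sign the markings were misread).
        verdict = "not covered by the rule; recheck the chip markings"
    return total, verdict


print(classify_dimm([8] * 8))    # eight 8-bit chips
print(classify_dimm([4] * 18))   # eighteen 4-bit chips
print(classify_dimm([8] * 12))   # twelve 8-bit chips
```

    The twelve-chip case is the suspicious one: 96 bits of nominal width on a 64-bit module means some of those bits are not being used.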

  17. Re:I figured by AvitarX · · Score: 4, Informative

    Real advice for Linux users with bad RAM.

    Run memtest86 and use the output for the badram patch for the kernel.

    That will actually work, and it cuts only a very minimal amount of RAM out.
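
    For the curious, here is roughly what turning a list of failing addresses into a badram line looks like. This is a hedged sketch: it assumes the patch's `badram=address,mask` convention, where a 1 bit in the mask means that address bit must match, and it assumes 32-bit physical addresses and 4 KiB pages. The helper names and the example addresses are made up. Check the BadRAM and memtest86 documentation before putting anything on a real kernel command line.

```python
PAGE_MASK = 0xFFFFF000  # assumption: 4 KiB pages, 32-bit physical addresses


def badram_pairs(faulty_addresses):
    """Collapse faulty byte addresses into one (address, mask) pair per page."""
    pages = sorted({addr & PAGE_MASK for addr in faulty_addresses})
    return [(page, PAGE_MASK) for page in pages]


def format_badram(pairs):
    """Render the pairs as a single kernel command line argument."""
    return "badram=" + ",".join(f"0x{a:08x},0x{m:08x}" for a, m in pairs)


# Hypothetical failing addresses, two of them in the same page
faults = [0x01234567, 0x01234ABC, 0x0200F010]
print(format_badram(badram_pairs(faults)))
```

    One pair per page is the simplest thing that works; a smarter tool would merge neighbouring pages into fewer pairs by clearing low mask bits.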

    --
    Wow, sent an e-mail as suggested when clicking on "use classic" banner, and got a fast response that addressed my msg
  18. Re:Does it get worse? by Ogrez · · Score: 3, Informative

    Yes, RAM will develop faults from use. It's just not very common, and it's mostly caused by overclocking, voltage spikes, and power surges.

    --


    Fire in the hands of the village idiot is no tool, but a weapon of mass destruction
  19. Alternate subject by hexdcml · · Score: 4, Funny

    Did anyone else read the title as "Salvaging DRM"? Hmmm, for a minute there I thought answering machines were DRM protected.

    --
    Fight Crime - Shoot Back!
  20. Re:What about the rest of the computer? by Cyno01 · · Score: 4, Insightful

    Defective hardware is distributed to the nearest geeky friend. I myself have a shelf and a spare desk with drawers full of old/non-working hardware. I'm sure your average /. reader has at least this, if not a spare room full of old 286 boxes.

    --
    "Sic Semper Tyrannosaurus Rex."
  21. I just figured it was at Fry's by Artifex · · Score: 5, Informative

    Seriously, I've had some of their OEM memory as part of a package deal, and it was very nasty stuff.

    What's worse, before they would take it back, they wanted to "test" it, testing being limited to a couple runs of PC-Doctor, which is totally lightweight.

    To make a long story short, they refused to take it back the first time, later it blew up my motherboard. They replaced the motherboard (it was part of the package) and sent me home, where I discovered my Athlon XP was also damaged. I took it up there, and they wanted to run PC-Doctor on it, but the "technician" (hah!) cracked the CPU while putting it in a "test board," so "oops, I guess we're replacing that."

    P.S. One of the guys at the return desk who I got to know quite well told me, when I asked him why the "test boards" they were using always changed, that he thought they were boards that belonged to customers. Whether that meant boards in for repairs, or returned boards, I don't know or care - either is bad news.

    P.P.S. This was at the Fry's in Wilsonville, Oregon. There is also an idiotic troll in the service department there who, after ignoring me waiting at an empty counter for 10 minutes while he chatted on the phone, wanted to charge me for a "missing" monitor stand on a monitor I was returning, refusing for 15 minutes to look in the bottom of the box under the styrofoam because monitor stands always come attached to the monitors, didn't you know? He finally looked when I demanded to talk to the manager, and of course it was there. I had a long discussion with the manager anyway over his, and their, incompetence (I reminded him of the memory fiasco) but the troll was still lurking there the last time I dropped by for consumables, which is all I will ever buy from Fry's, now. You can't miss him - he looks like he'd feel more at home in a raincoat, instead of his cheesy lab coat, roaming a playground on a sunny day.

    --
    Get off my launchpad!
  22. To be quite honest... by HaloZero · · Score: 3, Funny

    ...no, I never really did wonder what happened to DRAM that failed the ever-present quality-assurance testing. Never really occurred to me. So nyar.

    --
    Informatus Technologicus
  23. Re:Cheap memory. by Anonymous Coward · · Score: 4, Informative

    No. Most major manufacturers use quality RAM.

    Compaq and IBM both use Kingston Memory. They also like to jack up prices for their "rebranded" Compaq/IBM ram which is just really a Kingston module with an even higher price.

    Toshiba uses Samsung. I'm not sure about manufacturers like Dell or Gateway.

  24. Re:I figured by ksuMacGyver · · Score: 4, Funny

    Arriving in the email!!! Wow what will they think of next? No wonder you have problems with it. Your ink jet probably is set to too low of a resolution to print the circuit, for more help on setting up your printer check out linuxprinting.org

    --

    Ad Majorem Dei Gloriam

    Interested in AI? MACR
  25. tha~s a re_lly go@d article by hedley · · Score: 3, Funny

    I wo~nder when we wil+ see tho&e defec%ive ch)ps in ou! deskt{p mach?nes?

  26. Re:ECC worth it? by Fulcrum+of+Evil · · Score: 3, Informative

    But it's also theoretically possible for any number of other things to break, and spontaneous RAM failure seems very, very low on the list of things to worry about.

    Well, the thing about RAM failure is that, unless you do something like ECC, you won't detect the errors until it causes a crash. Probably, you'll lose some data to corruption first. The other thing is that RAM errors can be induced by bad power or other transient problems. Finally, it does happen, so better safe than sorry - you're spending $2k on a server, so why cheap out on a $50 part?

    --
    "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
  27. Re:Cheap memory. by Stonent1 · · Score: 3, Informative

    Dell uses Micron and Infineon (Siemens) for SDRAM and DDR. For RDRAM I think they mainly use Toshiba. I always recommend Crucial to people because it is just the retail branch of Micron. Lifetime warranty and I've never had a failed stick.

  28. Re:ECC worth it? by PurpleFloyd · · Score: 3, Informative

    ECC isn't there for the tiny chance that one, and only one, chip on the module would catch fire and die. It's there so that any random "bit rot" (single-bit errors) is caught and corrected before it causes damage. All RAM is susceptible to this; it can be caused by cosmic rays (!) or by radioactive decay (can't remember if it's alpha or beta) of minute quantities of radioisotopes in the chips' substrate. While it will only happen once in every ten years or so on average, it does happen and can cause a system crash. ECC is about reducing the possible risk (it would have to flip 3 bits simultaneously to fool ECC RAM).
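
    To see why it takes three flipped bits, here is a toy version of the idea: a Hamming code with an extra overall parity bit (SECDED), protecting 4 data bits with 4 check bits. Real ECC modules protect 64 data bits with 8 check bits and do it in hardware, so treat this as an illustration of the principle, not of any actual memory controller.

```python
def encode(data_bits):
    """Encode 4 data bits into an 8-bit SECDED word.

    Index i of the result is Hamming position i (1..7); index 0 holds
    an overall parity bit covering the other seven.
    """
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4          # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4          # covers positions 4, 5, 6, 7
    word = [p1, p2, d1, p4, d2, d3, d4]
    p0 = 0
    for bit in word:
        p0 ^= bit
    return [p0] + word


def decode(code):
    """Return (data_bits, status); data_bits is None if uncorrectable."""
    code = list(code)
    syndrome = 0   # XOR of the positions of all set bits
    overall = 0    # parity of the whole 8-bit word
    for pos, bit in enumerate(code):
        overall ^= bit
        if bit and pos:
            syndrome ^= pos
    if overall:
        # Odd number of flips: assume exactly one and undo it.
        # A syndrome of 0 means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    elif syndrome:
        # Even number of flips but a nonzero syndrome: two bits flipped.
        return None, "double error detected"
    else:
        status = "ok"
    return [code[3], code[5], code[6], code[7]], status


word = encode([1, 0, 1, 1])
one_flip = word[:]
one_flip[5] ^= 1
print(decode(one_flip))
```

    One flip is repaired, two flips are reported as uncorrectable, and three flips look just like one flip in the wrong place, so the decoder "corrects" the word into bad data without noticing.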

    --

    That's it. I'm no longer part of Team Sanity.
  29. Re:Use it for Linux ;-) by mcrbids · · Score: 3, Interesting

    No. The idea of the patch wasn't to stop it crashing, you probably can't do that

    Bzzzzzzt! Wrong, buddy!

    I remember working with a DEC VAX 11/750. It had roughly the processing power of a 286, though it's hard to compare the two.

    It was the size of a large, commercial dishwasher, and had a stack of other boxes that together were about the same size that were the three 350 MB Hard Drives.

    The fault tolerance on this computer simply boggles the mind of anybody used to the Linux or Windows world.

    It would dynamically detect and remap areas of the hard disks going bad. It would dynamically detect, correct, mark, and log areas of RAM that were going bad - it would even tell you which CHIP on the memory card (about the size of a dinner plate) the error was on, with zero downtime, while it was running!

    It used a method not unlike ECC to determine "bad" and would map around bad RAM or disk sectors as a basic function of operation.

    It was so good, that one time, when it crashed (due to the air conditioner failing) that when we brought it back up, most people's sessions were preserved on their terminals, and just started working again, right where they left off! Despite the computer having been OFF for several hours!

    Sorry, but you haven't seen fault tolerance in a computer until you've seen it on an older DEC VAX. I can only wish that anything like that was available today.

    It probably is, but I sure can't afford it.

    --
    I have no problem with your religion until you decide it's reason to deprive others of the truth.