Salvaging Defective DRAM
An anonymous reader writes "Ever wonder what happens to DRAM that fails quality assurance testing during manufacturing? Turns out a lot of it is sold as 'downgrade' memory and ends up in OEM memory modules. Last resort: use it in an answering machine, where the sampled audio can be very tolerant of bit errors."
BadRAM patch.
This is the prime example of why I tell people I know not to buy RAM off the internet unless it's from a major company that has good support. Too many people buy RAM with a 15-90 day warranty because it's cheap, and when it fails they are upset that they have to replace it. If you pay a bit more money you get lifetime-warranty RAM... and why do you think they are willing to warranty it that long? Because they know it works. People don't understand the testing process and think they are getting the same product buying cheap RAM, as opposed to inexpensive RAM...
Fire in the hands of the village idiot is no tool, but a weapon of mass destruction
There are some things in the article that are pretty out of date:
To reduce the test time, parallel chip testing usually is accomplished with eight to 16 chips in a row.
That's pretty low parallelism; there are memory testers out there that test over 200 devices at a time right now. And even the older, more common systems are probably testing 64 in parallel.
A special ink jet color marks the good dies.
This hasn't been true for years. Each device's pass/fail status is stored in a database, along with all other test results, and the whole process is automated enough that good die are binned out automatically. No need to physically mark the chip.
Due to the imperfection of the process, a percentage of the DRAM die contains some faulty cells.
That percentage is 100%. At modern memory sizes, you never get a perfect device without going through repair.
There are a lot of peeps complaining about substandard RAM. If you had RTFA, you'd realize that the downgrade RAM is reconfigured to skip the bad parts in the chips, so that it comes out as a normal module. Just because there is a faulty bit or 10 in a module doesn't mean the rest of that module is bound to fail. It could just have been an imperfection in the silicon or the circuit process.
:(
The downgrade RAM has to pass further tests to ensure the detours around the bad parts worked.
Granted, I probably wouldn't use this stuff in a mission critical server, but if you are buying for a mission critical server, you should be getting ECC registered with lifetime warranties anyway. Now for a small web or file server, or even a desktop, I'd use this.
Other people have mentioned memtest86. This program is your friend. Don't even bother with BIOS POST tests of RAM; just run this every once in a while if you REALLY want to find the problems. Too bad it won't run on my Alpha server.
-- Having a Creationist Museum is like having an Atheist place of worship
DRAM chips usually have 4, 8 or 16 bits per word. To construct a DIMM, 64 bits are needed. This means that with 4-bit DRAMs you need 16 chips, with 8-bit DRAMs you need 8 chips, and with 16-bit DRAMs you need 4 chips. Usually you will see only the 4- or 8-bit DRAMs, because these occupy less board area for the same capacity. 16-bit DRAMs are only used for low-capacity DIMMs.
When your DIMM supports ECC, it's 72 bits wide, which makes it more complicated. Usually it's made of 18 4-bit chips or 9 8-bit chips.
(Back in the 30- and 72-pin SIMM days, when memories were 8 or 32 bits wide, you could see ECC SIMMs that used 3 chips, for 2x4+1=9 bits or 2x16+4=36 bits.)
If you see DIMMs with 12 chips, this is usually a cheap OEM DIMM using partially good DRAMs.
The best way to identify such a DIMM is to write down the markings on ALL the chips on it and look them up on the internet. You then sum up all the DRAM bit widths and see what you come up with:
If it's 64 bits, it's a normal DIMM.
If it's 72 bits, it's probably an ECC DIMM.
If it's more, it's probably a DIMM using partially good DRAMs.
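For anyone who wants to do that sum without a calculator, here is a rough Python sketch of the rule of thumb above. The function names (chips_needed, classify_dimm) are made up for illustration, the chip widths are whatever you looked up from the markings, and it follows the 64/72/more rule exactly as stated, so it does not try to handle modules with more than one rank.

```python
# Sketch of the chip-width rule of thumb from the comment above.
# Function names are hypothetical; the 64 / 72 / "more" thresholds are
# taken from the comment and are a heuristic, not a specification.

def chips_needed(bus_width_bits, chip_width_bits):
    """Number of fully working chips needed to fill the module's data bus."""
    if bus_width_bits % chip_width_bits != 0:
        raise ValueError("chip width does not divide the bus width")
    return bus_width_bits // chip_width_bits


def classify_dimm(chip_widths):
    """Sum the data widths of all chips on the module and apply the rule."""
    total = sum(chip_widths)
    if total == 64:
        return "normal DIMM"
    if total == 72:
        return "probably an ECC DIMM"
    if total > 64:
        return "probably built from partially good DRAMs"
    return "fewer than 64 bits: re-check the chip markings"


if __name__ == "__main__":
    print(chips_needed(64, 4), chips_needed(64, 8), chips_needed(64, 16))
    print(chips_needed(72, 4), chips_needed(72, 8))
    # Twelve x8 chips, as in the "12 chips" case mentioned above.
    print(classify_dimm([8] * 12))
```

Twelve x8 chips add up to 96 bits, which lands in the "more" bucket, matching the 12-chip warning sign described above.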
Real advice for Linux users with bad RAM.
Run memtest86 and use the output for the BadRAM patch for the kernel.
That will actually work, and it cuts only a very minimal amount of RAM out.
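A rough sketch of what such a pattern looks like, assuming (as I understand the patch) that it takes address/mask pairs on the kernel command line, where a 1 bit in the mask means "this address bit must match". The function names and the sample addresses are made up for illustration; check the exact syntax against the patch's own documentation, and prefer the patterns memtest86 prints over anything hand-rolled.

```python
# Sketch only: derive one address/mask pair that covers a set of faulty
# addresses. Assumes BadRAM-style matching where an address is bad when
# (address & mask) == (base & mask). Names and addresses are hypothetical.

def badram_pattern(fault_addresses, address_bits=32):
    """Return (base, mask) covering every address in fault_addresses."""
    if not fault_addresses:
        raise ValueError("need at least one faulty address")
    full = (1 << address_bits) - 1
    base = fault_addresses[0]
    mask = full
    for addr in fault_addresses[1:]:
        # Clear every mask bit in which this address differs from the first.
        mask &= ~(addr ^ base) & full
    return base & mask, mask


def covers(addr, base, mask):
    """True if addr would be excluded by the (base, mask) pair."""
    return (addr & mask) == (base & mask)


def format_badram(pairs):
    """Format pairs as a badram= kernel argument (assumed syntax)."""
    return "badram=" + ",".join(f"0x{b:08x},0x{m:08x}" for b, m in pairs)


if __name__ == "__main__":
    faults = [0x01000000, 0x01000004, 0x01000008]
    base, mask = badram_pattern(faults)
    print(format_badram([(base, mask)]))
    print(all(covers(a, base, mask) for a in faults))
```

A single pair like this can exclude a few more addresses than actually failed, which is the trade-off for keeping the pattern list short.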
Wow, sent an e-mail as suggested when clicking on "use classic" banner, and got a fast response that addressed my msg
Yes, RAM will develop faults from use. It's just not very common, and it's mostly caused by overclocking, voltage spikes, and power surges.
Fire in the hands of the village idiot is no tool, but a weapon of mass destruction
No. The idea of the patch wasn't to stop it crashing, since you probably can't do that; the idea was to analyse it when the system booted and work around it then. It's perfectly possible to send the admin an email summarising it, though.
There's something very cool about the concept of buying a tonne of memory for a tenth of the price and suddenly having a system with nearly four gigabytes of memory ;-)
Then again, isn't that what ECC memory is for?
No. AFAIK ECC memory can correct only bit errors within a word; but addressing errors slip right past it. The patch can handle addressing errors, blocks that just don't work, blocks that mirror back to the same location etc. etc.
-WolfWithoutAClause
"Gravity is only a theory, not a fact!"
Seriously, I've had some of their OEM memory as part of a package deal, and it was very nasty stuff.
What's worse, before they would take it back, they wanted to "test" it, testing being limited to a couple runs of PC-Doctor, which is totally lightweight.
To make a long story short, they refused to take it back the first time; later it blew up my motherboard. They replaced the motherboard (it was part of the package) and sent me home, where I discovered my Athlon XP was also damaged. I took it up there, and they wanted to run PC-Doctor on it, but the "technician" (hah!) cracked the CPU while putting it in a "test board," so "oops, I guess we're replacing that."
P.S. One of the guys at the return desk who I got to know quite well told me, when I asked him why the "test boards" they were using always changed, that he thought they were boards that belonged to customers. Whether that meant boards in for repairs, or returned boards, I don't know or care - either is bad news.
P.P.S. This was at the Fry's in Wilsonville, Oregon. There is also an idiotic troll in the service department there who, after ignoring me waiting at an empty counter for 10 minutes while he chatted on the phone, wanted to charge me for a "missing" monitor stand on a monitor I was returning, refusing for 15 minutes to look in the bottom of the box under the styrofoam because monitor stands always come attached to the monitors, didn't you know? He finally looked when I demanded to talk to the manager, and of course it was there. I had a long discussion with the manager anyway over his, and their, incompetence (I reminded him of the memory fiasco) but the troll was still lurking there the last time I dropped by for consumables, which is all I will ever buy from Fry's, now. You can't miss him - he looks like he'd feel more at home in a raincoat, instead of his cheesy lab coat, roaming a playground on a sunny day.
Get off my launchpad!
No. Most major manufacturers use quality RAM.
Compaq and IBM both use Kingston Memory. They also like to jack up prices for their "rebranded" Compaq/IBM ram which is just really a Kingston module with an even higher price.
Toshiba uses Samsung. I'm not sure about manufacturers like Dell or Gateway.
But it's also theoretically possible for any number of other things to break, and spontaneous RAM failure seems very, very low on the list of things to worry about.
Well, the thing about RAM failure is that, unless you do something like ECC, you won't detect the errors until they cause a crash. Probably, you'll lose some data to corruption first. The other thing is that RAM errors can be induced by bad power or other transient problems. Finally, it does happen, so better safe than sorry - you're spending $2k on a server, so why cheap out on a $50 part?
"We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
What is this "Kingston" memory of which you speak? AFAIK, Kingston does not make RAM; they put other people's dies on their modules (and sometimes they just buy the modules whole). It's pretty much a crapshoot whether you're getting Samsung, Hynix, Micron (who just signed a deal to start selling to them again), etc. So saying Kingston memory is crap would be akin to saying Dell makes crappy hard drives...
:-)
Not a flame, just a clarification
Dell uses Micron and Infineon (Siemens) for SDRAM and DDR. For RDRAM I think they mainly use Toshiba. I always recommend Crucial to people because it is just the retail branch of Micron. Lifetime warranty and I've never had a failed stick.
ECC isn't there for the tiny chance that one, and only one, chip on the module would catch fire and die. It's there so that any random "bit rot" (single-bit errors) is caught and corrected before it causes damage. All RAM is susceptible to this; it can be caused by cosmic rays (!) or by radioactive decay (alpha particles, if I remember right) of minute quantities of radioisotopes in the chips' substrate. While it will only happen once in every ten years or so on average, it does happen and can cause a system crash. ECC is about reducing the possible risk (it would have to flip 3 bits simultaneously to fool ECC RAM).
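To make the "3 bits to fool it" point concrete, here is a toy Python sketch of a single-error-correcting, double-error-detecting (SECDED) code. It is a scaled-down stand-in, assuming an extended Hamming code over 4 data bits; real ECC modules protect 64 data bits with 8 check bits and the exact code is up to the memory controller, so treat the function names and layout here as illustrative only.

```python
# Toy SECDED sketch: extended Hamming (8,4). Index 0 holds the overall
# parity bit; indices 1..7 are the usual Hamming positions, with check
# bits at 1, 2, 4 and data bits at 3, 5, 6, 7. Names are hypothetical.

def encode(nibble):
    """Encode a 4-bit value into a list of 8 bits."""
    word = [0] * 8
    word[3] = nibble & 1
    word[5] = (nibble >> 1) & 1
    word[6] = (nibble >> 2) & 1
    word[7] = (nibble >> 3) & 1
    word[1] = word[3] ^ word[5] ^ word[7]   # covers positions 1, 3, 5, 7
    word[2] = word[3] ^ word[6] ^ word[7]   # covers positions 2, 3, 6, 7
    word[4] = word[5] ^ word[6] ^ word[7]   # covers positions 4, 5, 6, 7
    word[0] = sum(word[1:]) % 2             # overall parity over the rest
    return word


def decode(word):
    """Return (nibble, status); nibble is None if the error is uncorrectable."""
    word = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if word[pos]:
            syndrome ^= pos
    odd_parity = sum(word) % 2 == 1
    if odd_parity:
        # Odd number of flips: assume exactly one and flip it back.
        # A syndrome of 0 means the overall parity bit itself flipped.
        word[syndrome] ^= 1
        status = "corrected"
    elif syndrome != 0:
        return None, "double-bit error detected"
    else:
        status = "ok"
    nibble = word[3] | (word[5] << 1) | (word[6] << 2) | (word[7] << 3)
    return nibble, status


def flip(word, *positions):
    """Return a copy of word with the given bit positions inverted."""
    out = list(word)
    for pos in positions:
        out[pos] ^= 1
    return out


if __name__ == "__main__":
    good = encode(0b1011)
    print(decode(good))
    print(decode(flip(good, 5)))        # one flipped bit
    print(decode(flip(good, 5, 6)))     # two flipped bits
    print(decode(flip(good, 3, 5, 6)))  # three flipped bits
```

One flip gets repaired and two flips get flagged, but three flips look like a single error to the decoder, so it "corrects" its way to the wrong data without complaint.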
That's it. I'm no longer part of Team Sanity.