To ECC Or Not To ECC?
MetaHiro asks: "I'm going to be upgrading my system in a couple of weeks. I've been looking around the net for reviews and/or benchmarks for ECC vs. non-ECC in both speed and whether or not it's worth it to shell out the extra bucks for ECC. I'm also wondering whether or not I should buy PC2100 ECC instead of PC2700 non-ECC RAM or wait until PC2700 ECC becomes available."
For what are you using your system? If it's just another gaming PC then ECC isn't worth it.
* Origin: XBase BBS (2:490/4100) Well the good old days may not return and rocks might melt and sea may burn.
How is this helpful? The philosophy behind that seems to be rather than allow my programs to continue with a corrupt bit of data, it's better to halt all operation and LOSE ALL MY DATA and perhaps corrupt my hard drive. That's "help" I don't need.
Is this universal, or just my OS (W2K), BIOS, or hardware? Is there a way for ECC to simply and calmly report a problem without locking up my machine in the process?
Scrubbing detects and corrects memory errors that are in memory addresses that are idle. This prevents correctable errors from turning into uncorrectable errors in sections of memory that are infrequently accessed by the CPU.
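To make the scrubbing argument concrete, here is a toy sketch in Python. It is not how any real chipset works: the model (a set of flipped bit positions per memory word, and the names `classify` and `scrub`) is invented purely for illustration, assuming an ECC scheme that can fix one flipped bit per word but not two.

```python
# Toy model (hypothetical): each memory word is represented only by the
# set of bit positions that have been flipped since it was last written.

def classify(flipped):
    """What an ECC read of this word would report."""
    if len(flipped) == 0:
        return "ok"
    if len(flipped) == 1:
        return "correctable"
    return "uncorrectable"

def scrub(memory):
    """Walk every word, idle or not, and rewrite the ones with a single
    flipped bit. Returns the number of words repaired."""
    fixed = 0
    for flipped in memory:
        if len(flipped) == 1:
            flipped.clear()  # corrected data is written back
            fixed += 1
    return fixed

# An idle word takes two hits, far apart in time.
with_scrub = [set() for _ in range(4)]
with_scrub[3].add(5)          # first upset
scrub(with_scrub)             # background scrub pass repairs it
with_scrub[3].add(9)          # second upset, much later
print(classify(with_scrub[3]))   # correctable

no_scrub = [set() for _ in range(4)]
no_scrub[3].add(5)
no_scrub[3].add(9)            # errors pile up because nobody read the word
print(classify(no_scrub[3]))     # uncorrectable
```

The only difference between the two runs is the scrub pass between the hits, which is the whole point of scrubbing rarely accessed memory.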
Mea navis aericumbens anguillis abundat
ECC protection of main memory is distinct from ECC protection of CPU cache memory. They are independent. You can have ECC main memory with or without ECC cache. On PCs, the ECC encoder/decoder for cache is on the CPU chip, while the ECC encoder/decoder for main memory is part of the chipset.
Set SERR to None, so that an error no longer raises an NMI and BSODs the machine, and set ECC to Checking, Correction w/ Scrubbing.
D.J. Bernstein makes a case here on the merits of ECC. And his description of a "standard workstation" shows that ECC memory isn't that much more expensive.
Whether you use parity, non-parity, or even ECC, you should ALWAYS test your RAM sticks with MemTest86.
Test them when newly purchased (I've received duds from brand-name online memory warehouses). Test them every few months (they can and do go bad). Especially test when your computer exhibits otherwise unexplainable behavior, like: Windows BSoD, kernel panics, characters changing themselves on disk willy-nilly, programs crashing for no good reason or going bad on disk and needing reinstallation, and disk files that go corrupt. Any of the above, even (or especially) when it seems inconsistent, can be caused by a few bad blocks in a RAM stick.
MemTest86 is a program that boots and runs off a floppy (it has its own boot loader, no OS), and t-h-o-r-o-u-g-h-l-y tests your RAM. It even detects adjacent cell errors, where a 1 in cell n can threshold bias the 0 in cell n+1 or n-1 until it is considered a 1.
It even knows how to differentiate between cache memory errors and RAM errors. Just do it (after nightmare hardware problems, MemTest86 showed me what was broken; can't say enough good things about it). Its user interface could be more informative, but when it spots an error, you'll know.
Big Daddy, Johnny, Burp, Aunt Zelda, Scott, Slurp, Big Momma
Huh? If ECC isn't worth it, then RAID 5 (the minimum-acceptable "poor man's" form of RAID) certainly isn't, for the same reasons.
If you're correct, then I can say this, using the same logic: "In my experiences, RAID-5 is not worthwhile. There are too many ways the data can get corrupted before it ever hits the disk."
Heck, all the built-in hard drive ECC, SMART technology, sector relocation, CRC-checking, etc. are useless, if we follow your argument to its logical conclusion.
Since ECC and RAID-5 are similar technologies and perform similar roles in similar ways, and since RAM is always far more important than disk, at least once the OS is booted, ECC is more important than RAID. Yet many data centers skimp on ECC and spend on RAID. What's silly is that if MEMORY IS CORRUPT, THEN DISK CERTAINLY WILL BE -- PERMANENTLY.
A 1-bit error is the most common kind of memory error and can crop up for a multitude of reasons, including static, voltage spikes, bad motherboard timings, cosmic rays, etc. And, you'll still catch the 2-bit errors, the second most common kind. I'd be willing to bet that 1 and 2 bit errors account for 99+% of all memory errors, unless you got a bad chip. ECC was NEVER designed to fix all errors, just the 99+% we actually encounter.
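For anyone who wants to see "fix 1-bit, catch 2-bit" in action, here is a minimal sketch using an extended Hamming (8,4) code in Python. Real memory controllers work on much wider words, and their exact code is not something this post states, so treat the word size and the function names `encode` and `decode` as assumptions for illustration only.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming codeword (list of bits).
    Index 0 is the overall parity bit; indices 1, 2, 4 are Hamming parity bits."""
    word = [0] * 8
    word[3], word[5], word[6], word[7] = [(nibble >> i) & 1 for i in range(4)]
    for p in (1, 2, 4):
        # Parity over every position whose index has bit p set.
        word[p] = sum(word[i] for i in range(1, 8) if i & p) % 2
    word[0] = sum(word[1:]) % 2
    return word

def decode(word):
    """Return (status, nibble); nibble is None when the error is uncorrectable."""
    word = list(word)
    syndrome = 0
    for i in range(1, 8):
        if word[i]:
            syndrome ^= i          # XOR of the positions of all set bits
    overall = sum(word) % 2        # 0 for a valid codeword
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume one, and the syndrome points at it
        # (syndrome 0 means the overall parity bit itself flipped).
        word[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips but a nonzero syndrome: two bits are wrong.
        return "uncorrectable", None
    return status, word[3] | word[5] << 1 | word[6] << 2 | word[7] << 3

good = encode(0b1011)

one_flip = list(good)
one_flip[5] ^= 1
print(decode(one_flip))    # ('corrected', 11)

two_flips = list(good)
two_flips[2] ^= 1
two_flips[6] ^= 1
print(decode(two_flips))   # ('uncorrectable', None)
```

Three or more flipped bits can fool this code, which matches the point above: it was never meant to fix everything.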
The thing about some
If you're anti-ECC for ANY reason, then, to follow your logic, you should also be anti-RAID and anti-tape backup.
There seems to be a misunderstanding regarding ECC and parity memory, at least in relation to PCs.
PC memory either has some extra bits (one for every eight bits) for redundancy, or it hasn't. There is no dedicated ECC circuitry on PC memory (except maybe IBM Chipkill memory). The difference between parity memory and ECC memory lies in how the memory controller takes advantage of the extra bits. To get an idea of how ECC really works, see Hamming code.
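A quick sketch in Python of why the same extra bits can serve either purpose. It assumes a 64-bit data word with 8 extra bits, and the helper names `byte_parity_bits` and `secded_check_bits` are made up for this example; it only counts bits and does not model any particular controller.

```python
def byte_parity_bits(word64):
    """Parity use of the extra bits: one even-parity bit per byte,
    so 8 extra bits for 64 data bits."""
    return [bin((word64 >> (8 * i)) & 0xFF).count("1") % 2 for i in range(8)]

def secded_check_bits(data_bits):
    """ECC use of the extra bits: number of check bits a Hamming code needs
    to correct 1-bit and detect 2-bit errors over data_bits of data."""
    r = 0
    while 2 ** r < data_bits + r + 1:   # Hamming bound for single-error correction
        r += 1
    return r + 1                        # one more overall parity bit for 2-bit detection

word = 0x0123456789ABCDEF
stored = byte_parity_bits(word)

# Flip bit 10 (inside byte 1). Parity says which byte is bad, not which bit,
# so the controller can only report the error.
damaged = word ^ (1 << 10)
bad_bytes = [i for i, p in enumerate(byte_parity_bits(damaged)) if p != stored[i]]
print(bad_bytes)               # [1]

print(secded_check_bits(64))   # 8: the same 8 extra bits, spent on ECC instead
print(secded_check_bits(8))    # 5: per-byte ECC would not fit in one extra bit
```

So the arithmetic works out only when the controller treats the 64 bits as one word, which is consistent with the point that the difference is in the controller, not the memory.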
Regards
Roberto de Iriarte
roberto at spock dot cl