Patch To Allow Linux To Use Defective DIMMs
BtG writes: "BadRAM is a patch to Linux 2.2 which allows it to make use of faulty memory by marking the bad pages as unallocatable at boot time. If there were a source of cheap faulty DIMMs this would make building Linux boxes with buckets of memory significantly cheaper; it also demonstrates another advantage of having the source code to one's operating system." The BadRAM page has a great explanation of the project's motivation and status. Now where can I pick up some faulty-but-fixable 512MB RAM sticks?
Now if we can just get some kernel drivers that can bypass other bad hardware... umm... uh... ok... so I don't have any examples, but dammit! Don't you love that Snicker's commercial with the guy wanting to go to lunch with his poster of the panda bear?! Pretty pretty panda! Pretty pretty panda!
I INVENTED PANTS!
Actually the BIOS Memeory test does a "rough" memory test. The test itself is a bit different from one BIOS manufacturer to another, but for instance with Phoenix bios the test is as follows:
Write a pattern to ram.
Read it back. And compare to what it should be.
Write a aliasing pattern to ram.
Read it back and make sure u got what u expected.
This will catch quite a few serious memory problems. The 2 cases I saw recently were:
1. PC100 memory that wasn't quite up to par. (Droping bits randomly)
2. A friend of mine put a PC100 dimm in a mobo set for PC133 dimms. The PC100 ram worked.. almost.
In both cases the results were random lockups and application crashes. Turning on the BIOS ram test quickly identified the problem. Which was resolved by putting quality memory in the box.
These tests are only really usefull the first time you boot your box or if you are suspecting bad RAM. (It's a quick way to test for serious memory problems without having to pull out a RamChecker)
Say a manufacturer has a DIMM with a bad address line, visually it's a normal DIMM. They decided to sell it for a massively low price. Some unscrupulous buyer utilizing "sell and run" tactics buys a bunch and sells them as normal DIMMs. Buyers call the manufacturer complaining that the DIMMs they bought are bad and the retailer no longer exists.
Note: This has happened to me, I bought two hard to find keyboards only to find both had water damage upon arrival (packaging still good) and the retailer disappeared.
If you think education is expensive, you should try ignorance -- Derek Bok, president of Harvard
I can see a lot of obstacles to becoming a dealer in bad RAM, including the hassles of having to test it and characterize that nature and extent of its problems. In order for someone to know whether the price and product were right, s/he'd have to have some detailed info about the defects, and it would vary from stick to stick. I'd think that the dealer would have to list each stick independently, along with the defect info. The customer would have to buy each stick as a unique entity, presumably. The logistics costs of doing all this and keeping the inventory records up to date would be quite costly. The economics of this are questionable, IMO, although given my parsimonious nature, I'd love to be proven wrong.
CmdrTaco that really does not become you.
--
There isn't this huge supply of bad memory out there (Radio Shack jokes aside) because memory manufacturers are pretty clever. Bad memory is put into things like:
Audio storage devices, like answering machines and mp3 players, where a bit or two of failure will just end up as a teeny bit more noise.
Cheap digital cameras (once again, a bad pixel here or there....)
Toys. They actually call bad memory "toy memory" sometimes.
SIMMS. You take (for example) 4 bad chips and 1 good chip and get the equivalent of 4 good chips (by replacing bad io's on the bad chips with io's on the good chip). There are jillions of ways to do this, and companies have pretty much done them all.
Sell them at CompUSA to people who don't know any better. (Sorry, couldn't resist)
If I were you, I'd download memtest86 right now.
This RAM idea is great. It shows the true spirit of open source. We can fix anything from broken RAM to Microsoft.
;)
Now what about something to make me burn less coasters?
There's new error-prevention technology available, but I believe it relies on hardware and software, to keep you from burning coasters.
#1) Sanyo's BURN-Proof technology (available on the newest Creative, QPS, Plextor, LaCie etc. writers)
#2) Ricoh's JustLink technology (available on its CD-R/RW/DVD-ROM combination drive among others)
Both technologies automatically prevents buffer under-run errors which are the leading cause of coasters.
If I were in the market for a new burner, I'd go with the $349 Ricoh combination drive. It does 12x CD-R, 10x CD-RW, 32x CD-ROM, and 8x DVD-ROM all in one device. That's smart.
--
--
He lives in a world where those who do not run the client software of the omnipresent meme are unacceptable.
Actually, these sorts of marking bad memory things have been around for a long time. The trouble with them is that RAM isn't as determanistic as you'd like to believe. Once you get one or two bad pages in RAM, more tend to break rather quickly. This has been tried in the past and I think that these systems will still be flakey even when not touching the bad ram.
I've also seen machines that had bad ram lock up randomly, even when the bad pages are never touched.
Let me just say that I have my doubts.
"It's worth doing because it keeps a working system up, and Linux should have that"
Huh? Isn't ECC functionality handled in the BIOS, not the OS? So... Linux does have that functionality, eh?
OtakuBooty.com: Smart, funny, sexy nerds.
Doesn't this make Linux look like a throwback to those old days of hobbies, like Amature Radio making QRP rigs in sardine tins?
Sounds to me like you're describing what the true definition of "hacking" is. Let's see, if you can get a certain amount of RAM by doing a little hacking for less than you'd pay in a store, what's wrong with that? People do this in their everyday lives. As I type this, I have a penny in my car, wedged between the stereo head unit and the side where it mounts to hold the thing in place. No, it doesn't look pretty. Yes, it did the job (and the price was right).
Perhaps in corporate, "everything must look nice and neat" environments, this isn't a valid solution for adding RAM. But for the CS student who has an old DIMM sitting around, it's pretty damn cool.
-- "Complacency is a far more dangerous attitude than outrage." -Naomi Littlebear
I must be reading way to much slashdot.. I read the headline, and thought
"Of course Signal 11 is no more.. He left after a big blowout with Rob..."
--
This message brought to you by Colin Davis
Colin Davis
IANAESE, but Linux will never be used in a life critical medical device, never mind implantable medical devices. Firstly, the FDA requirements are simply too strict to allow linux's usage. Secondly, it's both overkill and underkill at once. Linux may be relatively efficient compared to systems like Windows, but it's not anywhere near small enough for traditional embedded systems. Third, Linux simply does more than it would need ever need to, why use it? Fourth, it's not setup for DSP type operations. Fifth, do you really want to unnecessarily trust your life to linux just so you can make a statement?
Yes, but are you really going to impress management to adopt Linux when demonstrating that it can run on on a machine with defective parts?
It's like saying, "This new Mercedes E320 is as good as the one without a dent in the door." Both run, both are equally safe (assuming its just a superficial dent), but it just doesn't sell itself.
--
A feeling of having made the same mistake before: Deja Foobar
By the way, here's some ancient related trivia. The INTV Productions video game cartridge "Triple Challenge" integrated the previously-released Chess, Checkers and Backgammon on a single game cartridge. In its original form, the Chess cartridge came equipped with a 1K SRAM onboard, as the game required extra memory.
At the time INTV went to produce the Triple Challenge carts, they discovered that since RAM had grown in capacity over the years, 1K SRAMs weren't available in quantity for reasonable prices, and larger SRAMs were too expensive as well. They almost had to cancel the Triple Challenge cart.
That is, until they found someone with a stack of 2K SRAMs, in which half the RAM was good, the other half was bad. Since the game only needed 1K, it ignored the bad half, and off they went.
Cool, eh?
--Joe--
Program Intellivision!
Program Intellivision!
On x86 systems, the memory controller handles the ECC error correction, and you get an interrupt which allows you to log the event. Often this interrupt is handled by the BIOS. But the BIOS typically doesn't do anything but log the event. The OS can do more; it can map the bad block out, probably without a shutdown.
You'll probably get better results simply by cleaning off the contacts with a pencil eraser (remembering to brush away all the eraser dust first) and firmly re-inserting them into the socket.
--
"Open source is good." - Steve Jobs
"Open source is evil." - Microsoft
Well, I read kernel traffic and kernel mailing list but somehow this escaped me. Thanks slashdot. /jarek
great stuff btw
Now where can I pick up some faulty-but-fixable 512MB RAM sticks?
:(
Oops, now you can't
I'm amazed by how little this crowd know about details of semiconductor manufacturing. Defects are unavoidable! There, I said it. With the transistor sizes that we are pushing today, a speck of dust ruins an entire blcok. All you can do is *limit* the extent to which this happens by being as strict as possible with your clean room. But *some* contaminents will always get through. Perfection is unachievable. You have to accept this.
Alright, so we've accepted that some dies are necessarily going to be damaged. Why not make the hardware such that it can resist imperfections? Well, actually we do. RAM being as simple and homogenous as it is, lends itself well to this approach. Here's the idea: you add extra "blocks" of memory to a decode line. Then, if one of the "regular" blocks is destroyed by a process imperfection, the post-fab die can be modified with laser to reroute data to the extra backup block. So you invest some die room in backup structures, so that a die with only a few errors can be "corrected" and will still function as intended. This is basically like keeping a spare tire. If you get one blowout, you're still in business, but two and you are in trouble. Of course, you can package as many extras as necessary, but it may not make economic sense. Here you calculate the appropriate trade off between die size and yield to make the decision.
Anyway, long story short: your DRAM is already "bad". Quite a few RAM chips contain process errors that are rerouted around in hardware so that you, the consumer, need never know. To you, the process is transparent. All you should care about is that you get your *functional* RAM cheaper, because the manufacturer would have had to scrap that die otherwise.
This post discusses software "rerouting" around blocks that had more errors than could be corrected in hardware, but somehow still made it out the door. What's wrong with that?
Will semiconductor manufacturers suddenly think "Gee...let's not worry about yield anymore?" You'd better bet they won't. And even if they did, if the software rerouting is so clean as to not be noticeable (which is the only way it would fly), what do you care? You'd get your RAM cheaper.
--Lenny
----
No, it seems to be more like "Your new Ford Explorer will automatically check your tires and reinforce any weak spots it finds". Honestly, (in my experience) integrated circuts are generally the second part of any electronic component to go, after moving parts (of which computers have very few). So, when memory goes bad, what would you rather do? Have the computer fix it for you, or go out and buy a new module? I personally have a pair of SIMMs which aren't in top shape anymore... since I don't have the money, the time, or the ambition to get replacements, I just put them in a low-load router box, which occasionally gets rebooted to clear out memory problems. I'd much rather have the system check my memory for me. So, unless someone decides to put out a major FUD about it, I don't see how it could be advertised in any but a good light.
This is news to ME, and I'm glad it was here to hear it. Sites like /. are meant to bring attention to a wide range of topics, while others aim to provide prompt coverage of narrower topics. Sure, it's annoying to see a story about something you already heard elsewhere a while ago, but it's important for those that missed it the first time.
Now we have a way to deliberately make Linux instable.... if you subscribe to the theory that if a DIMM has bad areas then that increases the probability that more of its areas will fail in the future.
Only if the DIMMs don't degrade further slowly, right?
Anyway it's a nice thing.
Just another coder...
Doesn't this make Linux look like a throwback to those old days of hobbies, like Amature Radio making QRP rigs in sardine tins?
"Hello, Kingston, I'm looking for any old cruddy defective RAM, got any? Uh.. No.. I won't be reselling it to Linux users, I swear that I am with a major US ISP and we want to put it into our servers! Call Rambus, you say? Hello? Hello?"
--
A feeling of having made the same mistake before: Deja Foobar
Check out the 'mem=exactmap' boot-time option in the 2.4 kernel series - it got added a couple of weeks ago. That way you can specify and exclude faulty RAM via boot parameters.
which allows it to make use of faulty memory... *sigh* ....of course my wife had to be reading over my shoulder and asked "Great, now is there anything I can install in you to make use of YOUR faulty memory...." She thinks she's funny. =)
Every time the topic of bad RAM comes up I can't help but tell this story:
We had just installed an Exchange server we were rolling out the Exchange client to all the desktop PCs. Unfortunately, no one had thought to ask if they could take it--which many of them couldn't. So we were feverishly digging up all the RAM we could find and sticking it into machines as fas as we could. I happened to find a 32MB stick (glory be!) in an unused PC. I said to my boss: "Hey, I found a big one!" He turns around and asked "Is it any good?" while simultaneously reaching for it, and ZAP audibly discharges static electricity right into the thing. We look at each other for a moment and then I say "Not anymore."
I was wrong, though--it was fine.
--
An abstained vote is a vote for Bush and Gore.
Non-meta-modded "Overrated" mods are killing Slashdot
(Hey Ryan! Here's your proof!)
The theory on why this occurs is that the memory on the freelist isn't being accessed (well, ok, we have some bugs occasionally, but... :) and it degrades because of this. Since you don't care about the data on the page, it kinda sucks to panic during the bzero. So, Irix, starting with 6.5.7, knows how to "nofault" this bzero operation and if it fails, it grabs a new page off the freelist and discards the bad page. This is a feature we are thinking of adding to Linux. Other types of pages for which this same recovery can work are mapped files (ie, program and library text/read only data) and clean user data (ie, just swapped in and not yet modified). This is about the best way to solve the intermittant failure problem.
One other interesting thing we've noticed with RAM is that failure rates over time stay pretty much constant since, while manufacturing techniques are getting much better, the memories are getting larger. This causes failure rates to stay nearly flat.
Go Badgers! -- #include "std/disclaimer.h"
I like to preserve as much of the environment as I can. The production of chips is a very resource intensive process, and the complexity of a chip means that a lot of the produced chips are incorrect. I dislike wasting good materials, and even if they are merely `good enough' they should be taken seriously. By allowing the use of such `good enough' memory chips, I hope to help preserving the environment.
Computers are pretty darn unbiodegradable, yet the pace of progress makes them obsolete at an ever increasing pace. How many 386's are somewhere other than landfill? A 386 is not actually that old when you compare it to a washing machine or a fridge.
A lot of people are slamming this because it has some practical limitations, so what!
This guy has done a pretty cool hack, but has also done something positive about side of our industry that most of don't think about very often.
When it absolutely positively has to be there.
I don't know anything about memory industry testing methods in particular, but given that RAM chips are produced in volume, I doubt every chip is inspected by a human. More likely, there is an automated inspection system that checks for surface defects, perhaps runs quick functionality tests, and batch sampling inspection by human inspectors.
:)
Not that that would lead to lower quality overall. It might even be better since people might get sloppy after looking at a few hundred identical chips every day, whereas machines don't get bored. (well, except for my computer. It insists I play Unreal Tourn. now and again
-----
D. Fischer
ShoutingMan.com
I'm reminded of a Digital VAX 7/1150 (or was it 711/50? I don't remember) I worked on. It was the size of two refridgerators, and required TWO room air conditioners to keep temperatures in the room reasonable - to deliver roughly the processing power of a '286...
But it was an AWESOME machine. And, mapping out memory that was bad was something it did on the fly! It would find a bad memory spot, and do one of several things with it:
1) Stop using it;
2) If the problem was intermittent, it would only store PROGRAM CODE there - which, if the memory was bad, it could re-load from the hard disk!
3) If the memory tested good for a while doing program code, (a few days, I think) it would return that RAM to general use.
An amazing machine - with some features that pale even a big, powerful *nix box today.. For example, versioning of just about EVERYTHING... *:1, *:2, *:3, etc, and while there was a "root" user (called admin on this system) there could be more than one! (My login, "dirdisb" was a "root" login too, and you could always tell when you looked at a file whether admin or dirdisb actually did it - much better than *nix style, IMHO)
I seem to recall that there was a patch or something you could apply that would make it use ALL hard disk space to create as many versions as possible of documents - or just 10. (we used the latter)
This machine, as slow as it was, would comfortably handle 20 simultaneous users! (granted, no X-windows or GUI at all)
With patches such as this badram patch (which IMHO should be added to the kernel by default) we are getting some of these really cool features back...
I have no problem with your religion until you decide it's reason to deprive others of the truth.
...be bad in the right way. If, for example, the most significant bit of the addressing bus were damaged, you would only have access to half of the chip's memory at a maximum.
To fix this problem, you'd have to use 2 "half-working" chips to get the same amount of memory that 1 of the non-damaged ones would have provided.
It seems that buying several damaged chips to make up for the one non-damaged chip would not be very cost effective in the long run.
The Atari 7800 had lots more ROM space than it could possibly use for just the 960-bit digital signature lockout code, so they included a full 6502 CPU test in there. Sheesh.
#naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
Or for that matter, anything that is a half-way decent color copy of the original. If I didn't hate the bastards so much, I'd love to be an FBI badge at one of those shows. Particularly one on commission :).
Bogus software wouldn't bother me at all, at least if it worked when I got it home. I'm talking about defective modems that were taken from a trash heap somewhere and new boxes made for them and put out for sale.
I could give a damn if someone is selling bootlegs of Freddy the Fish or something like that.
LK
"Hi. This is my friend, Jack Shit, and you don't know him." - Lord Kano
mod this guy up, he knows what he's talking about.
--
{ Joke Mode On }
From the "Why did this article get posted...and not mine" department:
Ohh..This sounds a lot like the story I submitted a week ago called:
"How to make your good memory make use of bad software."
{Joke Mode Off }
(+1 Funny) only if I laugh out loud.
A lot of vendors offer service contracts and warranties. But peecee vendors, accustomed to dealing with...shall we say, less than reliable operating systems, will try to make you go through 543 steps and tests before allowing you to send your hardware for replacement, because most problems in that world are either OS bugs or user error. In the real computer market, they don't fuck around. You paid a lot for your system, and you can expect it to work. When you call them up and say you have a bad foowhatzit, they send you a new one (unless they're Sun, in which case they make you sign an NDA first - bad Sun, bad!). They expect, and rightly so in most cases, that you know what you're doing and it isn't a software problem. No runaround, no bullshit, no cost to you. This is one of several reasons I'll never own another peecee. The service just ain't the same.
I understand the concept of trying to get the most you can out of any hardware you might have. But I also think that people stuck with such hardware ought to learn their lesson next time instead of relying on hacks, however clever, to work around their poor buying decisions. Anyone actually seeking out bad memory to use with this is insane. Firstly, there's good reason to believe that if memory is failing, other areas in the same part may fail as well, perhaps with less frequency or at a later time. Second, even if the cost is half that of a good part, is it really worth saving 50 bucks and having to configure this thing, test it, and make sure periodically that no other memory areas fail? I would suggest that technical work of this type is worth at least 50 bucks an hour...so if you value your time fairly, it's unlikely that you'll win out of this. I'll gladly pay some extra money to know that I won't get a sig11 the next time I go to compile something...and if I do, I can get replacement parts the next morning at no cost, without any hassle. I don't work for any vendors. I'm just a sysadmin who'd rather read slashdot than argue with tech support.
Does it run every possible combination of CPU instructions on boot up?
It can't. Running every possible combination would take an indefinately long period of time (infinity).
Does it check every single block on the hard drive? No!
This is because the hard drive is not essential to the functioning of the computer. With modern operating systems, usually a hard drive is required, but again, it's not essential.
Does it check all the blocks of floppies, CDs, DVDs, etc to make sure they work?
This would be absurd. "Please insert a DVD, CD and floppy to boot".
If the memory test is essential to the functioning of the system, why do they let you skip it?
You then go on to contridict yourself by saying Obviously, the smart thing to do is to _wait_ for the memory to fail rather than test the whole lot for a minute or two..
If you're operating a linux server and someone wants to replace it with w2k, well, let him try. But don't tell him that the RAM is defective. (It works with linux, so what?) :-)
As much as I despise the racial/cultural epithets that you just used. I have a policy as it relates to computer show shoppining. Never buy from someone who doesn't speak english as their primary language. Never buy from someone from more than 1 state away. Never buy something that comes in a box that is a low quality black and white copy of the original.
LK
"Hi. This is my friend, Jack Shit, and you don't know him." - Lord Kano
Ahh, this would have been useful with an old P90/32MB motherboard/memory combo I recently gave away...
It was quite fun, running a system (FreeBSD) with a single-bit memory error. Sure, gcc would die on occasion, but then there was the oddness of having a script break because a file http_log was missing (mysteriously renamed to httx_log). The best part was actually figuring out which bit was bad...
iSKUNK!
Sure, you wouldn't want to intentionally put bad memory into a production machine, but what if good memory goes bad? This patch, if further developed to perform periodic testing and updating of the bad memory map *during operation*, could actually harden the linux kernel against spontaneous hardware failure!
If we ever want to see linux used in mission critical systems like air traffic control, embedded medical devices, or military applications, then projects like this are the key. Fault tolerance now exists for memory (this project), storage (RAID), and communication (redundant NICs). The next target should be the CPU.
How about projects to detect the types of errors a failing (typically, overheated) cpu produces, and adjust the scheduler accordingly to insert idle time and cool down the cpu? Or to use one cpu to monitor another in multiprocessor systems, and avoid using a processor that starts producing faulty results?
I knew from the badRAM website that it was discussed on kt (and so read that earlier today), but I hadn't noticed it there when it first appeared -- sometimes I'm too interested in other topics, sometimes I don't read it all the way through, whatever. There's a lot of information in the world. I'm glad that someone sent in the link and explained it a bit (so I was intrigued and looked through it), which is what this site is about.
... don't read it or waste your own time commenting :) There are a lot of projects out there that have been laboring quietly which may have spectacular results at any time -- do you not want them discussed because they're "old news"? The in-progress Tux2 filesystem was no secret, for instance, (that, too, was discussed on kt), but how many people had heard of it before ALS? Not nearly as many as would have been interested, I warrant, and the comments on the slashdot story about it indicate that.
But how many people saw it on kt? For purely selfish reasons, I'd like to see a lot more people know about this project, because I find it very interesting and useful-looking. Plus, I think it's just a neat hack in general, and I'd like to point it out.
If it's too old for you, then
YMMV, whaddya do?
OK.
timothy
jrnl: http://tinyurl.com/c2l8yr / foes: http://tinyurl.com/ckjno5
I actually had a friend who ran DOS debug on his BIOS and noted that it *did* actually test every one of the X86's registers during boot - obviously this didn't get printed in the boot sequence (because if it had, the message would flash up for about 8 clock cycles) but it actually did a sequence like:
put arbitrary number in first register
copy first register to second register
...
copy second-last register to last register
compare last register to first register
if (different) HALT
(I have a feeling it did this twice, with 01010... and then with 10101...)
He thought this was quite clever until he realized that a bad bit in the first register would still pass the test (and also negate the test of that bit on all the other registers)...
To be honest, doing this does nothing but ENCOURAGE the manufacturers to continue to make bad ram. Why not make these manufacturing companies make GOOD ram so we don't have to do this in the first place?
I think this comment is a result of not knowing how difficult it is to make "good" RAM (or *any* good electronics for that matter).
The complex process behind making RAM means that there will ALWAYS be defective ones in the batches that can't meet standards set by the manufacturer to cover their ass on warrantees, etc. If they are going to end up throwing this hardware out (or recycling the pieces if that's cheap) then they might be able to make more money selling the defective RAM.
While having "partially defective" RAM on the market may seem bad, if the price point is right it could be useful for some people. Like if I could get 512MB with 50MB defective for 100 bucks (w/ maybe a 1 year warrantee on the 462), I'd jump on it in a second. But that's just me.
----- rL
But what happens when there are more faulty rows than spares? Answer: They sell it to the crackling-audio people, for cheap. Such chips might not have a higer tendency toward progressive RAM-cancer than those with fewer faults (though I will be happy to stand corrected if someone has contrary data.)
By marking the bad rows bad, Linux never allocates them. With virtual memory in fixed-size pages and memory-mapped I/O there's no penalty for scattering your data all over the place and hopping over the occasional chuckhole.
Downside would be if there's a flakey cell and the memory test misses it. So a persistent bad-page map might be useful, as would beefing up the startup test if the feature is enabled, and adding a background memory test on the currently unallocated pages, to pick up any really-low-density faults.
If an intermittent cell gives you a hit on a read-only or unmodified page, a hack in the parity-error recovery code could move and refresh it. A read hit on a modified page not yet written back to disk is bad news. (Another background hack could be writing modified pages back part of the time the disk is otherwise idle, to reduce that window.)
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
Don't RAM chips already have some internal ability to map out bad areas, like the way that IDE/SCSI hard drives come with spare sectors that are automatically mapped in when a regular one fails?
s 294-4/project1/dram-test.html
[google]
DRAMs typically improve yield by using spare rows and columns to replace those occupied by defective cells. The repairs are performed using current-blown fuses, laser-blown fuses, or laser-annealed resistor connections. [Coc94] references one case in which memory repair increased the yield from 1% to 51%.
from http://www.cs.berkeley.edu/~rfromm/Courses/SP96/c
It's a lot more like saying, "This old Mercedes will only run on hi-test gasoline, but the new one burns any old crap you put in the tank and runs just as well."
And if that doesn't impress management, take your faulty DIMM and throw it in a Win2k box. Sit back and watch the fireworks.
This isn't as much "normalization" as it is "don't take so many drugs when you're designing tables."
PS/2?! Heck, didn't PDP-11s do this? I can still remember (barely, though) PDP memory boards with socketed discrete memory chips that included a two/three spare chips so you could replace the bad chips yourself after locating them using an XXDP+ dianostic. I don't recall bad chips bringing down the system but in those days, when a big PDP had 4MB of memory, every little bit counted and the OS worked around it.
I heard somewhere several years ago that Windows got around flaky memory not by marking pages as `bad' but by forcing additional wait states if questionable memory was detected at boot time. Any truth to this?
--
CUR ALLOC 20195.....5804M
I once had a motherboard that had a problem refreshing RAM above the 1MB boundary. You could write and read just fine, but as time passed you would watch individual bits revert back to 1's. It was kind of amusing watching all the graphics in doom change ;^). I 'fixed' the problem by writing a TSR that tricked all software into thinking I only had 640k of memory. That memory would have been fine for cache if the data was protected by a checksum/CRC.
Acually memory fails for many diferent reasons. I personaly work in the test department at a large semiconductor company that makes SDRAM. All memory gets tested before it gets soldered to the PCB but it still can encounter a fail after it leads. Single bit fails and the like are acually fairly common. Most people don't even notice them. Also there are speed related problems, heat related problems, and mechanical problems that come up. For example, the early AMD chipsets had problems with certain memory. Memory also has clock issues and other little details that can effect things dramaticly. However this project seems to be a little far fetched since most memory gets a little worse over time. This is okay for a temp fix but your memory will slowly get worse with time. Usually within 6 months the memory is almost totally bad. Another problem with using bad memory is that in several cases memory will draw a larger idle current than other modules. And if you have more bad modules there is a higher current load. This can lead to damaged parts on your motherboard. Another thing to realize is that load style can effect your stability. In several situations it has been found that windows can run over top of a memory error because it tends to not stress the memory quite as much as your basic high load unix setup. Thats my $.02 on the issue I guess.. It seems like this is basicly using a hard drive that is whining and spuddering. Not a smart move stability wise.
Answering machines use ARAM chips which are actually faulty DRAMs. They map the defects and avoid using those areas.
----
Stop worrying about the risks of nuclear power and start worrying about the risks of not using nuclear power.
With that said, if I can get a 512MB DIMM for $100 because 50MB of it is inusable, I'll buy it and install this hack, even though having any where near that much ram on my Linux box will not help me(it's little more than a mp3 player and web browser, most of my work is on my Mac).
Burn Hollywood Burn
Actually this is quiet handy. Windows always worked better with dodge memory than Linux did because Linux always tries to use as much memory as possible for caching where as Windows didn't.
It made it notorious for working with dodge memory, failing to boot half of the time. I've seen people blame Linux for bad hardward because it would work with Windows.
It's nice that Linux now could just go
*ARGH YOU HAVE CRAP MEMORY*
shrug it's shoulders and chug along anyway.
Plus, some of the motivation is a little aschew. If you want to push these chips into an old machine, you still have the problems of RAM limitations due to motherboard design. A fat lot 512MB of semi-faulty memory is going to do in a board that can only support up to 32MB (or better yet, an older chipset that supports up to 8 or 2).
- I don't care if they globalize against free speech. All my best free thoughts are done in my head.
handfull of busted 256m DIMMS: $10.71 with tax
6 reboots, a little math, and a partial kernel compile: 21min
The look on my roommate's face when I typed "top": priceless!
Linux forced its way into our IT Department when it could restore a trashed system into something useful. Here at The Salvation Army, we endevor to be good stewards of what we are given. We have an IBM PC Server 350 (now named "Methusela") that crashed one day for no apparent reason. It refused to run Windows anymore... not even Win98 or Win95!
But it ran Linux flawlessly. Well, actually it did point out one flaw on its own: The internal Ethernet controller was getting an unusually high number of bad packets. It would receive DHCP assignments, even do some web work in Linux... but it was enough to shut Windows down completely. Even after installing a working NIC, Windows could not run due to the faulty internal NIC, but Linux ran fine!
Likewise, we found an instant way to crash every WinNT system in the building. Someone was re-arranging the hubs and switches, and accidentally created a packet loop by plugging a switch back to itself... in three seconds every WinNT system on the network went straight to the Blue Screen of Death.
It one thing to handle the rules well, but quite another to deal with the exceptions!
You must have some sort of problem with linux. This is a valuable, and technically interesting addition to the Linux kernel, and all you can do is act like everybody in the world who needs 256MB DIMMs also has $135 ready.
I know you're just trolling, and I shouldn't respond, but for students, and anybody who has access to memory modules that are experiencing known, predictable faults, this would be great. Not everybody has some fancy $30,000/year job, y'know.
--
"Don't trolls get tired?"
Burris
If he can only get the Kernel to do this on the fly...
"Depression is merely anger without enthusiasm." - Anonymous
It's a bad idea. Defective parts have an annoying habit of being resold as good parts by unscrupulous vendors. That is why many manufacturers make a point of destroying defective parts, so they can't sneak back into the supply chain.
Mea navis aericumbens anguillis abundat
The use of a patch like this is only on systems with ECC ram for moving data out of pages which had non-catastrophic memory errors so that the system can keep running without using the flakey bits until that DIMM can be replaced.
bad ram is just that, bad, and is likely to have more failures over time.
Hello Slashdot??? This is old news. The kernel bad ram patch was discussed on the weekly kernel traffic digest several weeks ago. Do the slashdot story posters read any news sites other then slashdot? Other news sites are out there...
Does the name Pavlov ring a bell?
I built a box from a hole-in-the-wall parts reseller that did volume, volume, volume in the silicon valley, and started having some stability issues. So I started doing mass kernel recompiles (100 at a time) as a test, and sure enough, gcc exited with errors, at random points. However, I've heard that this is not necessarily 'bad bits' on the memory sticks, but rather an inability of the memory to actually keep up with the 100Mhz bus, even though it was billed as pc-100 RAM. Anyhow, after that, I always sprung for the premium Toshiba lifetime-guarantee ram at fryes, and I just got the other parts elsewhere.
Both my almost-never-used linux boxes are entirely composed of 2nd hand hardware. I'd say it would come in handy. Dumpster divers would appreciate it.
Rich
I think the point this guy is trying to make, on the economic side of things, is that there's a limit where the ratio of good to bad RAM is as high as it's going to be. IOW, there's always going to be a certain percentage of bad RAM in a given production run. So, why not make this semi-broken RAM viable, by selling it cheap for commodity PCs. This could help reduce prices on cheap PCs and make computing power and the Internet more accessible to those with tight funding (poorer folks, libraries, schools, non-profit orgs).
- Does it run every possible combination of CPU instructions on boot up? No!
- Does it check every single block on the hard drive? No!
- Does it check all the blocks of floppies, CDs, DVDs, etc to make sure they work? No!
- If the memory test is essential to the functioning of the system, why do they let you skip it?
Obviously, the smart thing to do is to _wait_ for the memory to fail rather than test the whole lot for a minute or two. After doing a full test once, the first time you boot, you can leave a very low priority memory tester running, or leave the full test to some quiet period with a cron job - a decent memory test of course, not that half-witted test that BIOSes do.Does my bum look big in this?
No flame here, just clarification...
./ that have a clue about memories, memory protection architecture, fabrication, production, and testing techniques.
> we are talking about the die that passed the initial wafer probe and then were packaged and then fail somehow at or after the packaging or even shipping stages.
Yes, we definitely are. I only brought up the fact that the industry already routes around process errors in DRAM's to demonstrate a point. He seemed frightened by the possibility that future DRAM's we buy might not be 100% "clean". I wanted to demonstrate that 100% clean isn't necessary, and infact isn't produced now, by and large. What *is* necessary is 100% functional parts. DRAM manufacturers know this, and use it to improve yields, thus driving down cost. No foul play there.
> So in conclusion, I think there are plenty of us at
I expect there are, but none of them were posting. Instead, most of the posts demonstrated a clear lack of understanding of the process. The consequences of techniques like Linux's remapping seemed to worry the original poster, and I wanted to explain why it wasn't something to worry about since a similar process is already performed quite successfully. Further, I wanted to emphasize that this is a perfectly valid technique for increasing yield, and is transparent to the user. It isn't like the manufacturers are trying to rip people off.
> Why don't you just allow us to think that is't really cool to use software to lock out blocks of RAM?
I'm not saying it isn't cool. Some seemed weary of the idea, though, and I wanted to point out that their present DIMM's use something very much like this. As long as the software can do this transparently (as the hardware does), what does it matter to the user?
> but I seriously doubt that they are applicable to such dense cells as memories
Believe me: they are. Think of it: a massive die area that will be completely destroyed by a single speck. Wouldn't you prefer an ever so slightly (say 5-10%) larger die that can withstand one or two specks? The redundency is very easy to use in a homogenous structure like DRAM (millions of identical cells). All that has to be done to "swap in" the replacement RAM block is to modify some address lines. That can be done by electrically blowing fuses on die, or through laser modification. One of my former employers was *most* found of the laser approach. It added quite a bit of flexibility to their designs.
--Lenny
Modern DRAM doesn't have much trouble with bad cells, and the yields are quite good. So there isn't a big supply of DRAM with bad cells that fail solidly. Most DRAM problems today are at the edges: at the buffers, the connectors, or clock synchronization - the things that can be messed up during installation.
Personally, I get ECC RAM even on desktops, just so I know it's working. It eliminates arguments with tech support when the hardware really is broken.
So *any* price they can get, above and beyond shipping and handling, is pure profit to them. And there's a cost to throwing them out which they wouldn't have any more.
--
Infuriate left and right
So tell me, how, without a memory tester, do you know what's predictable vs. unpredictable? As far as I'm concerned with DIMMs, 5% broken is 100% broken.
--
A feeling of having made the same mistake before: Deja Foobar
- A.P.
--
* CmdrTaco is an idiot.
"Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"
You can check out Best Buy or CompUSA for some faulty RAM. They seem to have a never ending supply of it. Not only that but you can pay the price that you can get it of of the net for good ram!
huh?
Okay, so you car dealer marks down the car 80% because ONE of the pistons doesn't work right. However, the rest work fine and he installed a thing-a-ma-bob to make the engine ignore that piston.. ummm..
What kind of warranties will the end user get for the memory? what kind of performance is eaten by this program? does the memory run up to spec? will it still work in 2 months?
There could be a niche market for "used" memory sticks, but "damaged" or "defective" may not sell all too well...
I agree, however, that this does seem like a cool way to resurect older systems into useful appliances (print servers, routers/gateways, etc).
Verbatim
Price, Quality, Time. Pick none. What, you thought you had a choice?