Patch To Allow Linux To Use Defective DIMMs
BtG writes: "BadRAM is a patch to Linux 2.2 which allows it to make use of faulty memory by marking the bad pages as unallocatable at boot time. If there were a source of cheap faulty DIMMs this would make building Linux boxes with buckets of memory significantly cheaper; it also demonstrates another advantage of having the source code to one's operating system." The BadRAM page has a great explanation of the project's motivation and status. Now where can I pick up some faulty-but-fixable 512MB RAM sticks?
--
Now if we can just get some kernel drivers that can bypass other bad hardware... umm... uh... ok... so I don't have any examples, but dammit! Don't you love that Snicker's commercial with the guy wanting to go to lunch with his poster of the panda bear?! Pretty pretty panda! Pretty pretty panda!
I INVENTED PANTS!
This might add too much to the cost, but you could put a non-volatile chip on each SIMM that spells out where the bad memory is. Then when a compliant OS boots up, it reads this information out of the SIMMs and uses it to block out the pages marked as bad by the manufacturer. But who knows if they would go to all this trouble for memory that can only be used by one OS.
Also have to come up with a good euphemistic buzzword for this memory so that it can be sold. "Near compliant" was a good one I heard a while back.
JET Program: see Japan, meet intere
--
fat lenny's gonna lick your brain today.
Why not incorporate a memory tester into the kernel? It could pick a page, swap out whatever user-space data that was there, and run a few read/write tests on it. If it passed, just free up the page and go on. If it failed, retry to make sure, then mark it as bad and contact a user-space daemon to append this address to the list of bad memory pages. When the kernel boots up the next time, it reads in this file and the newly detected bad pages are never used again.
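For what it's worth, here is a toy user-space simulation of that loop in Python. It is only a sketch: FakeRAM, check_page, and scan are made-up names, the "stuck bit" fault model is an assumption, and the swap-out step and the daemon that would persist the bad-page list are left out entirely.

```python
# Toy model: test one page at a time, retry on failure, collect bad pages.
# All names and the stuck-at-zero fault model are hypothetical.

PAGE_SIZE = 4096
PATTERNS = (0x00, 0xFF, 0xAA, 0x55)


class FakeRAM:
    """Simulated memory where some bits are stuck at zero."""

    def __init__(self, pages, stuck_bits=None):
        self.data = bytearray(pages * PAGE_SIZE)
        # Maps byte address -> mask of bits that always read back as 0.
        self.stuck = dict(stuck_bits or {})

    def write(self, addr, value):
        self.data[addr] = value & ~self.stuck.get(addr, 0) & 0xFF

    def read(self, addr):
        return self.data[addr]


def check_page(ram, page):
    """Write each pattern across the page and verify it reads back."""
    base = page * PAGE_SIZE
    for pattern in PATTERNS:
        for addr in range(base, base + PAGE_SIZE):
            ram.write(addr, pattern)
        for addr in range(base, base + PAGE_SIZE):
            if ram.read(addr) != pattern:
                return False
    return True


def scan(ram, pages, retries=1):
    """Return the pages that fail the check on every attempt."""
    bad = []
    for page in range(pages):
        # A page is marked bad only if the first try and all retries fail.
        if all(not check_page(ram, page) for _ in range(retries + 1)):
            bad.append(page)
    return bad
```

A real kernel version would also have to deal with pages it cannot swap out, which this model ignores.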
JET Program: see Japan, meet intere
-- We should kill all the intolerant people in the world.
(sorry, it had to be said:D)
-- We should kill all the intolerant people in the world.
I disagree.
One of the major advantages Linux has over M$ products and even some flavours of UNIX is its ability to work on spartan hardware.
It makes sense to use cheaper stuff. However, if you plan to use defective DIMMs on mission-critical machines, you probably have
some defective ones in your head.
It's up to you (just like everything else with Linux). If you want it, use it.
If you don't, good for you.
This is very good news for people in places like India (Where I come from) where the cost of 32MB Ram (EDO RAM) is about 1/3rd
of the average person's salary.
Hackito Ergo Sum.
Liberte, Egalite, Fraternite, Caffinate.
Electrical Engineering is BORING.
In a situation like that, it costs far more for a company to pay an IT worker to swap RAM chips than it does to buy quality, top-of-the-line components the first time around. Not to mention, mission-critical systems should NEVER use crappy hardware.
For 'real' fault tolerance, the program needs to repeatedly scan memory and mark areas as bad 'on the fly', rather than just doing it once at bootup.
Maybe as often as every file load.
Actually, the BIOS memory test does a "rough" memory test. The test itself differs a bit from one BIOS manufacturer to another, but with a Phoenix BIOS, for instance, it goes as follows:
Write a pattern to RAM.
Read it back and compare it to what it should be.
Write an aliasing pattern to RAM.
Read it back and make sure you got what you expected.
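To show why the second (aliasing) pass matters, here is a small Python simulation. It is not the Phoenix code, just a sketch under assumptions: AliasedRAM is a made-up model of a module with one address line stuck low, and the two test functions are my guess at the general shape of such tests.

```python
class AliasedRAM:
    """Simulated RAM whose address line `dead_bit` is stuck low (hypothetical fault)."""

    def __init__(self, size, dead_bit=None):
        self.cells = [0] * size
        # With a dead address line, two addresses map to the same cell.
        self.mask = ~(1 << dead_bit) if dead_bit is not None else ~0

    def write(self, addr, value):
        self.cells[addr & self.mask] = value

    def read(self, addr):
        return self.cells[addr & self.mask]


def pattern_test(ram, size, pattern=0xA5):
    """Write one fixed pattern everywhere, then read it back and compare."""
    for addr in range(size):
        ram.write(addr, pattern)
    return all(ram.read(addr) == pattern for addr in range(size))


def alias_test(ram, size):
    """Write a value derived from each address, then verify every cell.

    If two addresses land on the same physical cell, the later write
    clobbers the earlier one and the read-back no longer matches.
    """
    for addr in range(size):
        ram.write(addr, addr & 0xFF)
    return all(ram.read(addr) == (addr & 0xFF) for addr in range(size))
```

With a dead address line, the fixed-pattern pass still succeeds because every cell holds the same value; only the address-derived pass notices that two addresses share a cell.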
This will catch quite a few serious memory problems. The two cases I saw recently were:
1. PC100 memory that wasn't quite up to par (dropping bits randomly).
2. A friend of mine put a PC100 DIMM in a motherboard set for PC133 DIMMs. The PC100 RAM worked... almost.
In both cases the results were random lockups and application crashes. Turning on the BIOS RAM test quickly identified the problem, which was resolved by putting quality memory in the box.
These tests are only really useful the first time you boot your box or if you suspect bad RAM. (It's a quick way to test for serious memory problems without having to pull out a RamChecker.)
--
Say a manufacturer has a DIMM with a bad address line, visually it's a normal DIMM. They decided to sell it for a massively low price. Some unscrupulous buyer utilizing "sell and run" tactics buys a bunch and sells them as normal DIMMs. Buyers call the manufacturer complaining that the DIMMs they bought are bad and the retailer no longer exists.
Note: This has happened to me, I bought two hard to find keyboards only to find both had water damage upon arrival (packaging still good) and the retailer disappeared.
If you think education is expensive, you should try ignorance -- Derek Bok, president of Harvard
I can see a lot of obstacles to becoming a dealer in bad RAM, including the hassles of having to test it and characterize the nature and extent of its problems. In order for someone to know whether the price and product were right, s/he'd have to have some detailed info about the defects, and it would vary from stick to stick. I'd think that the dealer would have to list each stick independently, along with the defect info. The customer would have to buy each stick as a unique entity, presumably. The logistics of doing all this and keeping the inventory records up to date would be quite costly. The economics of this are questionable, IMO, although given my parsimonious nature, I'd love to be proven wrong.
If RAM chips were damaged, there is potential that they get damaged further, and eventually your amount of usable RAM will run down to 0 bytes.
In theory it's possible but VERY unlikely given the quality of RAM made today. But it also depends on how the RAM was damaged. If it's a bad connection, the modules (I don't know the name of the chips on the DIMM) themselves could be good but they aren't connected properly. If the chips are from a poorly made batch, THEN you may have this degradation problem you speak of.
But apart from individual testing, how do you differentiate these chips from "good" chips used on "good" RAM? You can't, and that's why there are warranties.
rLowe
----- rL
CmdrTaco that really does not become you.
--
There isn't this huge supply of bad memory out there (Radio Shack jokes aside) because memory manufacturers are pretty clever. Bad memory is put into things like:
Audio storage devices, like answering machines and mp3 players, where a bit or two of failure will just end up as a teeny bit more noise.
Cheap digital cameras (once again, a bad pixel here or there....)
Toys. They actually call bad memory "toy memory" sometimes.
SIMMs. You take (for example) 4 bad chips and 1 good chip and get the equivalent of 4 good chips (by replacing bad I/Os on the bad chips with I/Os on the good chip). There are jillions of ways to do this, and companies have pretty much done them all.
Sell them at CompUSA to people who don't know any better. (Sorry, couldn't resist)
If I were you, I'd download memtest86 right now.
It just seems like a recipe for disaster. And if Murphy has anything to say about it, the memory will completely fail just as you finish that 100% Windows-compatible OS (and forgot to save as you went) and it will be all lost to that great processing cycle in the sky.
Hey, if you don't mind putting low-quality, defective or damaged parts in your computer, be my guest. I'd rather be stable than wondering how long until my computer craps out (granted that this could happen for many OTHER reasons, but I don't need another reason).
I can't see it being recommended to businesses or even consumers. Hackers... maybe that's the niche you'd be looking for. By Hackers I mean people who push their computer systems beyond specification for the simple pleasure of being able to do it.
Businesses want reliability - no business will buy "damaged" or "defective" parts. You might swindle them by calling them "refurbished", but that's stretching it.
Consumers want something to "surf the internet" or to send pictures to grandma. They don't want the hassle of replacing their RAM every 4 months because the lower-quality memory gives out too much.
Hackers... well.. we're a weird bunch... Most will find the idea interesting.. maybe set a system up to see how kewl it looks, but... nah.
Whatever..
Price, Quality, Time. Pick none. What, you thought you had a choice?
This RAM idea is great. It shows the true spirit of open source. We can fix anything from broken RAM to Microsoft.
;)
Now what about something to make me burn less coasters?
There's new error-prevention technology available, but I believe it relies on hardware and software, to keep you from burning coasters.
#1) Sanyo's BURN-Proof technology (available on the newest Creative, QPS, Plextor, LaCie etc. writers)
#2) Ricoh's JustLink technology (available on its CD-R/RW/DVD-ROM combination drive among others)
Both technologies automatically prevent buffer under-run errors, which are the leading cause of coasters.
If I were in the market for a new burner, I'd go with the $349 Ricoh combination drive. It does 12x CD-R, 10x CD-RW, 32x CD-ROM, and 8x DVD-ROM all in one device. That's smart.
--
He lives in a world where those who do not run the client software of the omnipresent meme are unacceptable.
Actually, these sorts of bad-memory marking schemes have been around for a long time. The trouble with them is that RAM isn't as deterministic as you'd like to believe. Once you get one or two bad pages in RAM, more tend to break rather quickly. This has been tried in the past, and I think that these systems will still be flaky even when not touching the bad RAM.
I've also seen machines that had bad ram lock up randomly, even when the bad pages are never touched.
Let me just say that I have my doubts.
Uh, if those BIOS memory tests disturb you so much, why don't you just turn them off like the rest of us who don't want to wait?
Put the quick post/boot/whatever option on, and no more memory testing wasting your precious seconds.
"It's worth doing because it keeps a working system up, and Linux should have that"
Huh? Isn't ECC functionality handled in the BIOS, not the OS? So... Linux does have that functionality, eh?
OtakuBooty.com: Smart, funny, sexy nerds.
Doesn't this make Linux look like a throwback to those old days of hobbies, like Amateur Radio making QRP rigs in sardine tins?
Sounds to me like you're describing what the true definition of "hacking" is. Let's see, if you can get a certain amount of RAM by doing a little hacking for less than you'd pay in a store, what's wrong with that? People do this in their everyday lives. As I type this, I have a penny in my car, wedged between the stereo head unit and the side where it mounts to hold the thing in place. No, it doesn't look pretty. Yes, it did the job (and the price was right).
Perhaps in corporate, "everything must look nice and neat" environments, this isn't a valid solution for adding RAM. But for the CS student who has an old DIMM sitting around, it's pretty damn cool.
-- "Complacency is a far more dangerous attitude than outrage." -Naomi Littlebear
I must be reading way too much Slashdot... I read the headline, and thought
"Of course Signal 11 is no more.. He left after a big blowout with Rob..."
--
This message brought to you by Colin Davis
Colin Davis
This still doesn't cover the issue of intermittent failures, though. Think on this: if memory were obviously good or bad, then bad memory would probably not even pass the initial memory check in a system. Yet bad memory often does pass even the most strenuous checks, even with ECC enabled (I had this problem myself). And if the "good/bad" status falls about such a hazy line, how effective can any piece of software be at singling out a defective chip?
This BadRAM patch will probably improve the situation, of course, but it can't pick out every bad chip thrown at it unless it checks every chip for eternity. Obviously, that's not going to happen...so...system reliability is still compromised, if only less so.
Kelledin Tane, the Dreaming Minstrel
http://kelledin.tripod.com/scovsms.jpg
IANAESE, but Linux will never be used in a life-critical medical device, never mind implantable medical devices. Firstly, the FDA requirements are simply too strict to allow Linux's usage. Secondly, it's both overkill and underkill at once. Linux may be relatively efficient compared to systems like Windows, but it's not anywhere near small enough for traditional embedded systems. Third, Linux simply does more than it would ever need to, so why use it? Fourth, it's not set up for DSP-type operations. Fifth, do you really want to unnecessarily trust your life to Linux just so you can make a statement?
Yes, but are you really going to impress management to adopt Linux when demonstrating that it can run on a machine with defective parts?
It's like saying, "This new Mercedes E320 is as good as the one without a dent in the door." Both run, both are equally safe (assuming it's just a superficial dent), but it just doesn't sell itself.
--
A feeling of having made the same mistake before: Deja Foobar
512MB sticks are still expensive, faulty or not.
I've never seen faulty RAM advertised. Where are you seeing these prices? Online?
Well, a couple of them had blue backs. It's some sort of rainbow pack. They were pretty inexpensive too. I suppose it's possible they sell a couple of different types or qualities of media. (Come to think of it, the floppies I mentioned were probably rainbow colored too.)
Oh well, I guess it's back to making colored labels with my Neato and hoping they don't peel off.
-Andy
By the way, here's some ancient related trivia. The INTV Productions video game cartridge "Triple Challenge" integrated the previously-released Chess, Checkers and Backgammon on a single game cartridge. In its original form, the Chess cartridge came equipped with a 1K SRAM onboard, as the game required extra memory.
At the time INTV went to produce the Triple Challenge carts, they discovered that since RAM had grown in capacity over the years, 1K SRAMs weren't available in quantity for reasonable prices, and larger SRAMs were too expensive as well. They almost had to cancel the Triple Challenge cart.
That is, until they found someone with a stack of 2K SRAMs, in which half the RAM was good, the other half was bad. Since the game only needed 1K, it ignored the bad half, and off they went.
Cool, eh?
--Joe--
Program Intellivision!
On x86 systems, the memory controller handles the ECC error correction, and you get an interrupt which allows you to log the event. Often this interrupt is handled by the BIOS. But the BIOS typically doesn't do anything but log the event. The OS can do more; it can map the bad block out, probably without a shutdown.
You'll probably get better results simply by cleaning off the contacts with a pencil eraser (remembering to brush away all the eraser dust first) and firmly re-inserting them into the socket.
--
"Open source is good." - Steve Jobs
"Open source is evil." - Microsoft
from reading the owners manual on the ceo's seville sts at work, it's designed to run in a case of total coolant loss. Runs on 4 cylinders, and pumps air through the other 4. Flip back and forth as needed, to keep the system cool enough to limp home.
Does anyone else see this as a step backwards? Don't you remember getting old hard drives with a map of all the bad sectors printed on them? Does anyone want to repeat those days, using memory this time around? As with hard drives, one failing section is not alone... This kind of project encourages use of questionable memory sticks which are bound to bring down systems without warning. I will not use faulty memory for the same reason I won't overclock my processor (at least not my main system's CPU): there is more to it than price! Reliability, longevity, stability, etc. are all in question when you push a PC where it was not meant to go.
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
Before using the patch I had quite some problems with this system when the system was under load (the defects are located in the upper address range of the RAM). Works very well now since I applied the patch
Just fyi, You can have more than one root user in linux unless I am way out of the ballpark. Just specify -u 0 when you add the user. This effectively sets the userid of whatever user you just added to the same as root, so that user can do whatever root can, but they're not technically root, as they don't have the 'root' username. They have their own.
//FIXME: Bad
Well, I read kernel traffic and kernel mailing list but somehow this escaped me. Thanks slashdot. /jarek
great stuff btw
The only coasters I ever made were because of defective cheapass BASF cd-r's... They were total crap. Most of the ones that burned fine and worked after burning now no longer work (some chemical process? who knows?) Never had a coaster with the other brands I use, Sony, Maxell, Memorex and Fuji
This is very useful for production systems with very large amounts of memory. For instance, Cray systems have a capability where bad bits in memory can be "flawed out" on the fly. Extending Linux to support the same kind of thing (especially in combination with ECC memory!) would be very useful for shops that have big memory requirements and need as many 9s of uptime as they can get.
--Troy
"My life's work has been to prompt others... and be forgotten." --Cyrano de Bergerac
Memory is only defective if it doesn't return what you earlier put into it... With this patch the memory _will_ return what was put into it. Hence, the memory is _not_ defective under Linux with this patch.
This type of thing is done with every hard drive in existence today (even production hard drives) and is such a non-issue that most people aren't even aware of the problem. Running ScanDisk under Windows will show you the blocks that are currently mapped out. Running mke2fs with the -c flag will map out bad blocks during formatting under Linux.
This is a great patch! Good job.
The only improvement that I can see is if it could add additional addresses to map out "on the fly" as they are found to be bad during operation.
This would kill the program that hit the bad memory, but would otherwise let the computer keep running.
-- Never make a general statement.
Now where can I pick up some faulty-but-fixable 512MB RAM sticks?
:(
Oops, now you can't
Only if the DIMMs don't degrade further slowly, right?
Actually, building in support for on-the-fly memory testing might not be a bad idea. I've seen it done elsewhere and can't think of a good reason not to add it as an option to Linux. (Or other Unixes. You listening, Sun?) Even good RAM will eventually go south, and being able to prevent the system from using it could prevent critical systems from crashing. It might not be able to prevent every crash, but might prevent a few.
If I had the time to experiment, I'd add software to the Kernel that periodically pulls a page of RAM out of service and uses idle time to test it out, returning it to service or marking it permanently bad.
Windows kills the memory, Linux nurses it back to working health....
I'm amazed by how little this crowd knows about the details of semiconductor manufacturing. Defects are unavoidable! There, I said it. With the transistor sizes that we are pushing today, a speck of dust ruins an entire block. All you can do is *limit* the extent to which this happens by being as strict as possible with your clean room. But *some* contaminants will always get through. Perfection is unachievable. You have to accept this.
Alright, so we've accepted that some dies are necessarily going to be damaged. Why not make the hardware such that it can resist imperfections? Well, actually we do. RAM, being as simple and homogeneous as it is, lends itself well to this approach. Here's the idea: you add extra "blocks" of memory to a decode line. Then, if one of the "regular" blocks is destroyed by a process imperfection, the post-fab die can be modified with a laser to reroute data to the extra backup block. So you invest some die room in backup structures, so that a die with only a few errors can be "corrected" and will still function as intended. This is basically like keeping a spare tire. If you get one blowout, you're still in business, but two and you are in trouble. Of course, you can package as many extras as necessary, but it may not make economic sense. Here you calculate the appropriate trade-off between die size and yield to make the decision.
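If it helps to picture the spare-tire scheme, here is a toy Python model. The class and method names are invented for illustration, the "fuse map" is just a dictionary, and it assumes the spare blocks themselves are always good, which a real fab cannot assume.

```python
class RedundantArray:
    """Toy model of a memory array with spare blocks (hypothetical names)."""

    def __init__(self, regular_blocks, spare_blocks):
        self.regular = regular_blocks
        # Spare blocks are numbered after the regular ones.
        self.spares = list(range(regular_blocks, regular_blocks + spare_blocks))
        # Bad regular block -> spare block; stands in for the blown fuses.
        self.remap = {}

    def repair(self, bad_blocks):
        """Assign a spare to every bad block; return False if spares run out."""
        for block in sorted(set(bad_blocks)):
            if not self.spares:
                return False  # more defects than spares: the die is not repairable
            self.remap[block] = self.spares.pop(0)
        return True

    def physical(self, block):
        """Translate a logical block number to the block actually used."""
        return self.remap.get(block, block)
```

With one spare, a single bad block is repaired and a second one is not, which is the "one blowout versus two" case described above.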
Anyway, long story short: your DRAM is already "bad". Quite a few RAM chips contain process errors that are rerouted around in hardware so that you, the consumer, need never know. To you, the process is transparent. All you should care about is that you get your *functional* RAM cheaper, because the manufacturer would have had to scrap that die otherwise.
This post discusses software "rerouting" around blocks that had more errors than could be corrected in hardware, but somehow still made it out the door. What's wrong with that?
Will semiconductor manufacturers suddenly think "Gee...let's not worry about yield anymore?" You'd better bet they won't. And even if they did, if the software rerouting is so clean as to not be noticeable (which is the only way it would fly), what do you care? You'd get your RAM cheaper.
--Lenny
----
No, it seems to be more like "Your new Ford Explorer will automatically check your tires and reinforce any weak spots it finds". Honestly, (in my experience) integrated circuits are generally the second part of any electronic component to go, after moving parts (of which computers have very few). So, when memory goes bad, what would you rather do? Have the computer fix it for you, or go out and buy a new module? I personally have a pair of SIMMs which aren't in top shape anymore... since I don't have the money, the time, or the ambition to get replacements, I just put them in a low-load router box, which occasionally gets rebooted to clear out memory problems. I'd much rather have the system check my memory for me. So, unless someone decides to put out a major FUD about it, I don't see how it could be advertised in any but a good light.
Even those of us with fancy nice-paying jobs don't necessarily have $135 for a 256MB DIMM.
This sort of thing has been done with hard drives for ages, and it's about time someone did it with RAM. Reminds me of that supercomputer some bunch of Hewlett Packard engineers built out of defective processors.
-Mars
To be honest, doing this does nothing but ENCOURAGE the manufacturers to continue to make bad ram. Why not make these manufacturing companies make GOOD ram so we don't have to do this in the first place?
Don't know much about x86 architecture, do you?
It has to "mark" these bad memory sectors somewhere - I'll need to look at the thing to be more informed on HOW it marks these sectors - probably in memory.
The exact register set escapes me at the moment, but x86 processors (and indeed any processor with an MMU) keep track of which pages are there and not there in hardware. The descriptor tables are kept in memory regardless of whether they're used for marking dirty pages or bad pages.
Also, wouldn't an entire page of memory be wiped out - not just one bit? I haven't looked at what these guys have done, but I wouldn't be surprised if entire 64KB pages are affected if only one bit in that page is gone.
Yes, I would assume they are marking PAGES of memory. So 64k (I thought they were 4k? I know there's a granularity bit to set this) chunks are taken out of the memory... How is this any different than the 4k clusters being taken in your ext2 FS? 64k in a (minimum) 32M DIMM is a drop in the well.
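To put numbers on that, here is the arithmetic as a few lines of Python. The page sizes and DIMM size are the ones mentioned in this exchange; lost_fraction is just a name I picked.

```python
KIB = 1024
MIB = 1024 * KIB


def lost_fraction(bad_pages, page_size, dimm_size):
    """Fraction of a DIMM lost when `bad_pages` pages are marked unusable."""
    return bad_pages * page_size / dimm_size


# One bad page in a 32 MB DIMM, for the two page sizes discussed above.
print(f"4 KB page:  {lost_fraction(1, 4 * KIB, 32 * MIB):.4%}")   # 0.0122%
print(f"64 KB page: {lost_fraction(1, 64 * KIB, 32 * MIB):.4%}")  # 0.1953%
```

Either way, a single bad page costs well under one percent of the module.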
As I said before, I think it's really cool what they have been able to do. There may be some niche areas for this program to be useful. It is not, however, a good thing (IMO) to be buying bad memory just to save a buck.
Yes, it is a cool thing and will help those who either can't afford or can't wait to get their new memory in. Personally I won't use the module myself but that is no reason to go blasting it like you have. There aren't any speed hits, there aren't any vast wodges of memory taken up by it, you say it's buggy but that remains to be seen (the patch seems simple enough)... I'm just trying to figure out why you're so upset about this.
This is news to ME, and I'm glad it was here to hear it. Sites like /. are meant to bring attention to a wide range of topics, while others aim to provide prompt coverage of narrower topics. Sure, it's annoying to see a story about something you already heard elsewhere a while ago, but it's important for those that missed it the first time.
Now we have a way to deliberately make Linux unstable... if you subscribe to the theory that if a DIMM has bad areas then that increases the probability that more of its areas will fail in the future.
Another use:
Making older machines usable: machines donated to the 3rd world often come with technologies so totally obsolete that replacing them is prohibitive.
In this case the idea is a very good one. A machine that would have been otherwise useless is made fully functional by Linux and what seems an ingenious way to fix the problem.
Do not spread "09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0" over the internet, thank you.
Only if the DIMMs don't degrade further slowly, right?
Anyway it's a nice thing.
Just another coder...
This patch has been around for quite some time; how does this classify as news? And since when did Slashdot start profiling kernel patches? Maybe they should set up a kernel patches section and archive the few hundred in circulation that do something useful.
Doesn't this make Linux look like a throwback to those old days of hobbies, like Amateur Radio making QRP rigs in sardine tins?
"Hello, Kingston, I'm looking for any old cruddy defective RAM, got any? Uh.. No.. I won't be reselling it to Linux users, I swear that I am with a major US ISP and we want to put it into our servers! Call Rambus, you say? Hello? Hello?"
--
A feeling of having made the same mistake before: Deja Foobar
Check out the 'mem=exactmap' boot-time option in the 2.4 kernel series - it got added a couple of weeks ago. That way you can specify and exclude faulty RAM via boot parameters.
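A rough Python sketch of how you might build such a line from a list of bad byte ranges follows. Big caveat: the `mem=<size>K@<start>K` form it prints is my recollection of how exact-map regions are spelled and should be checked against the documentation for your kernel; the function names are invented, and a real map would also have to leave out the usual reserved regions (for example the 640K to 1M hole), which this ignores.

```python
MIB = 1024 * 1024


def usable_ranges(total, bad):
    """Split [0, total) into what is left after removing bad (start, end) spans.

    Assumes every bad span lies inside [0, total) and end is exclusive.
    """
    ranges = []
    cursor = 0
    for start, end in sorted(bad):
        if start > cursor:
            ranges.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < total:
        ranges.append((cursor, total))
    return ranges


def boot_args(total, bad):
    """Format the usable ranges as boot parameters (syntax is an assumption)."""
    parts = ["mem=exactmap"]
    for start, end in usable_ranges(total, bad):
        parts.append(f"mem={(end - start) // 1024}K@{start // 1024}K")
    return " ".join(parts)


# Example: a 64 MB machine with one bad 4 KB page at the 16 MB mark.
print(boot_args(64 * MIB, [(16 * MIB, 16 * MIB + 4096)]))
```

The range splitting is the part worth keeping; the exact parameter spelling is the part to verify.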
which allows it to make use of faulty memory... *sigh* ....of course my wife had to be reading over my shoulder and asked "Great, now is there anything I can install in you to make use of YOUR faulty memory...." She thinks she's funny. =)
This is a cool idea to make use of bad hardware, and while it shouldn't be used to make new systems, it will.
It's really nice for poor geeks like me who would be happy to have more than 64MB. My system is a workstation from Dell, and I imagine it can handle 128 to 192 MB RAM total.
I know I won't get 64MB of ram out of a bad 64MB DIMM, but 50-60 extra still helps. I'd like to get this for the PeoplePC system I'm getting for Christmas (poor geek syndrome) which has 64MB RAM and 8MB of that is for the video card.
My worry is that some local computer makers will use it to screw other linux geeks. Luckily, most Linux users will notice. I'm mostly worried about those buying a preinstalled box as their first Linux PC. Here's hoping that they're safe.
How is the stability? I mean, if it is determined that a given stick of RAM has some bad areas, then can that stick degrade further after time?
For example, if a 128MB stick has 2MB removed as 'bad', is it possible that the chip may eventually have 3, then 4, then 5MB 'bad' as time goes on?
Any memory gurus out there care to give me some insight on this?
Now here's an idea.. What if a manufacturer, after going through a batch of bad memory, found a certain percentage of the group was bad. Then, conceivably, he could sell it at a lower price in some sort of clearance bin, and still make some money on the whole deal, as opposed to throwing it away and taking a loss on the deal.
Of course, I read the webpage *after* I post. He's already thought of this idea. There goes my patentable idea...
Every time the topic of bad RAM comes up I can't help but tell this story:
We had just installed an Exchange server and were rolling out the Exchange client to all the desktop PCs. Unfortunately, no one had thought to ask if they could take it, which many of them couldn't. So we were feverishly digging up all the RAM we could find and sticking it into machines as fast as we could. I happened to find a 32MB stick (glory be!) in an unused PC. I said to my boss: "Hey, I found a big one!" He turned around and asked "Is it any good?" while simultaneously reaching for it, and ZAP, audibly discharged static electricity right into the thing. We looked at each other for a moment and then I said "Not anymore."
I was wrong, though--it was fine.
--
An abstained vote is a vote for Bush and Gore.
Non-meta-modded "Overrated" mods are killing Slashdot
(Hey Ryan! Here's your proof!)
Would it be possible/plausible to detect bad RAM on the fly, without needing a reboot? What about testing RAM allocated right before a program fails? How long does it take to check RAM, and could it be done at runtime?
It is neither funny, insightful, interesting, or informative
Yeah, about 90% of the posts on this site are none of the above. Personally, I thought it was funny, not to warrant +1, but worth a small chuckle. I guess not everybody is as interested in karma-whoring as you are..
> The only improvement that I can see is if it could add additional addresses to map out "on the fly" as they are found to be bad during operation.
HP (HPPA) machines do this, but it's done with a combination of the kernel and the hardware/firmware. If a page is found to be faulty, it is noted in the PDT (Page Deallocation Table) and removed from use if possible (if it was previously used by the kernel, the machine will dump core; if not, it will just spit horrible errors). The PDT is stored in flash, has 50 entries, and is cleared when a memory module is changed.
And this is done on big, expensive production servers. Linux should adopt this, as it really works (the PDT mechanism kept one of my HP servers stable with a bad 512MB memory module until I could shut it down).
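For anyone who wants the bookkeeping spelled out, here is a toy Python model of a table like that. It is not HP's implementation: the class and method names are made up, only the 50-entry limit and the clear-on-replacement behaviour come from the description above, and the kernel-page crash case is not modelled.

```python
class PageDeallocationTable:
    """Fixed-size list of retired physical pages (toy model, hypothetical API)."""

    CAPACITY = 50  # entry count quoted in the comment above

    def __init__(self):
        self.pages = []

    def retire(self, page):
        """Record a faulty page; return False if the table is already full."""
        if page in self.pages:
            return True  # already retired, nothing to do
        if len(self.pages) >= self.CAPACITY:
            return False
        self.pages.append(page)
        return True

    def usable(self, page):
        """A page may be handed out only if it has not been retired."""
        return page not in self.pages

    def clear(self):
        """Forget all entries, as when the memory module is replaced."""
        self.pages.clear()
```

The fixed capacity is the interesting design point: once the table fills up, the module presumably needs replacing rather than more patching around.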
--
i just started with a semiconductor testing company and i found out that memory usually has "bad blocks" which they can route around by blowing a link with a laser. they add redundant banks just for this purpose, it's cheaper to have that redundancy than scrap the whole chip.
so now (or soon) we will be able to do the same thing with software that they already do during the manufacturing process, presumably for chips that go bad after they leave the fab?
stay frosty and alert
I like to preserve as much of the environment as I can. The production of chips is a very resource intensive process, and the complexity of a chip means that a lot of the produced chips are incorrect. I dislike wasting good materials, and even if they are merely `good enough' they should be taken seriously. By allowing the use of such `good enough' memory chips, I hope to help preserving the environment.
Computers are pretty darn unbiodegradable, yet the pace of progress makes them obsolete at an ever increasing pace. How many 386's are somewhere other than landfill? A 386 is not actually that old when you compare it to a washing machine or a fridge.
A lot of people are slamming this because it has some practical limitations, so what!
This guy has done a pretty cool hack, but has also done something positive about a side of our industry that most of us don't think about very often.
When it absolutely positively has to be there.
Assuming this patch works as advertised, why wouldn't you use bad memory on any machine you wanted to save a few bucks on?
We're talking about ram that was defective when it was manufactured. It's not going to get worse. There are just some areas of the chips that can't be used.
-Andy
I've been dealing with mcglen micro over some bad (from all my testing) ram. They charged me full price tho.
I don't know anything about memory industry testing methods in particular, but given that RAM chips are produced in volume, I doubt every chip is inspected by a human. More likely, there is an automated inspection system that checks for surface defects, perhaps runs quick functionality tests, and batch sampling inspection by human inspectors.
Not that that would lead to lower quality overall. It might even be better since people might get sloppy after looking at a few hundred identical chips every day, whereas machines don't get bored. (Well, except for my computer. It insists I play Unreal Tourn. now and again :)
-----
D. Fischer
ShoutingMan.com
Did anyone else have memtest86 sit there for hours (I think I gave up after about 6) passing every test it had run so far? Why does it take so long?
I mean, I was still on the first test and only on the 11th pass??
Anyway, I've given up on that idea for the moment, until I can easily pinpoint the corrupted addresses on my 32meg DIMM.
when everything is working perfectly.. BREAK SOMETHING before something else FUCKS up!
I'm reminded of a Digital VAX 7/1150 (or was it 711/50? I don't remember) I worked on. It was the size of two refrigerators, and required TWO room air conditioners to keep temperatures in the room reasonable - to deliver roughly the processing power of a '286...
But it was an AWESOME machine. And, mapping out memory that was bad was something it did on the fly! It would find a bad memory spot, and do one of several things with it:
1) Stop using it;
2) If the problem was intermittent, it would only store PROGRAM CODE there - which, if the memory was bad, it could re-load from the hard disk!
3) If the memory tested good for a while doing program code, (a few days, I think) it would return that RAM to general use.
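The three rules above amount to a small state machine per page. Here is a rough sketch of it, not how VMS actually did it: the state names, the `next_state` function and the 3-day probation constant are all invented for illustration (the post only says "a few days").

```python
from enum import Enum


class PageState(Enum):
    GENERAL = "general"      # usable for anything
    CODE_ONLY = "code-only"  # only program code that can be reloaded from disk
    RETIRED = "retired"      # never used again


PROBATION_DAYS = 3  # hypothetical value; the post only says "a few days"


def next_state(state, fault=False, intermittent=False, clean_days=0):
    """Apply the three rules from the list above to one page."""
    if state is PageState.RETIRED:
        return state                       # rule 1 is permanent
    if fault:
        # Rule 1 for hard faults, rule 2 for intermittent ones.
        return PageState.CODE_ONLY if intermittent else PageState.RETIRED
    if state is PageState.CODE_ONLY and clean_days >= PROBATION_DAYS:
        return PageState.GENERAL           # rule 3: back to general use
    return state
```

The trick in rule 2 is that code pages are read-only copies of something on disk, so a corrupted one can be thrown away and reloaded without losing data.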
An amazing machine - with some features that pale even a big, powerful *nix box today.. For example, versioning of just about EVERYTHING... *:1, *:2, *:3, etc, and while there was a "root" user (called admin on this system) there could be more than one! (My login, "dirdisb" was a "root" login too, and you could always tell when you looked at a file whether admin or dirdisb actually did it - much better than *nix style, IMHO)
I seem to recall that there was a patch or something you could apply that would make it use ALL hard disk space to create as many versions as possible of documents - or just 10. (we used the latter)
This machine, as slow as it was, would comfortably handle 20 simultaneous users! (granted, no X-windows or GUI at all)
With patches such as this badram patch (which IMHO should be added to the kernel by default) we are getting some of these really cool features back...
I have no problem with your religion until you decide it's reason to deprive others of the truth.
Hmm.. Let's see. As a full time college student working 15 hours a week at 12 bucks an hour that's about 600 bucks a month after taxes. Figure in monthly home beer costs (18pack @ $10 X 4 = $40), bar beer costs (twice that) burritos ($24) petrol ($60) isp ($40) and rent ($350) and that gives a whole 6 dollars a month to spend on hardware. Now one could save up for almost two years to get a new 256MB stick of ram, or use this program and get a FAST ASS system with the money from taking cans back. The question of which option to take seems like no question at all.
I have a source for thousands of bad dimm modules. To inquire; jschedler@pnp-group.com
Well, having fiddled with computer hardware for years, I figger electronic components, not ONLY computers, degrade. That makes perfect sense to you, doesn't it?
What's more, our RAM chips are running at really high speeds of 66MHz, 100MHz, and some at 133MHz for those using the P3-EB processors. They are done with .18 or .15 micron die processes (correct me if I am wrong), and since they are getting so puny and fast, the possibility of failure is even greater.
Think -- why do old 486 PCs last a decade while our new Pentium PCs die after a few years, bit by bit?
Anyway, I can't think of a better way to figure out bad RAM pages until I get fux0red by some screwed data coming in. By the time that happens, it might be too late -- your screwed data might have been on its way down your data bus to your hard drive, or prolly writing some data onto your BIOS Flash chip.
When that happens, that $80 you saved on your RAM isn't worth the time. I won't wanna try this, even if the system is a devel system. It's just not worth the time. I figger the time's more worth wanking.
;-P
...be bad in the right way. If, for example, the most significant bit of the addressing bus were damaged, you would only have access to half of the chip's memory at a maximum.
To fix this problem, you'd have to use 2 "half-working" chips to get the same amount of memory that 1 of the non-damaged ones would have provided.
It seems that buying several damaged chips to make up for the one non-damaged chip would not be very cost effective in the long run.
The Atari 7800 had lots more ROM space than it could possibly use for just the 960-bit digital signature lockout code, so they included a full 6502 CPU test in there. Sheesh.
{ SARCASM } Or better yet -- if you ALREADY are getting your news in other places -- why take the time to read or worse yet post to Slashdot? {
I would be willing to bet that a good percentage of Slashdot readers did not (or do not) read the "Weekly kernel traffic digest".
(+1 Funny) only if I laugh out loud.
Places like super-computing centers - especially those that use off the shelf parts and clustering software like beowulf will be able to both increase the total amount of memory available to their systems while still coming out cheaper than buying good memory.
I still don't have the money I'd need to build my dream beowulf system, but its getting more affordable everyday!
Of course this is a good thing (best Martha Stewart impression). It's good because, as the rest here have said, it shows us that Linux can be run on virtually any system, in any condition. MS can't even claim that, nor can Mac.
fuck off
SCO has had this facility for well over 10 years which is when I last played with it.
It was very useful for solving issues with the 16MB to 16MB+8K area which many pre-pentium PCs used for special BIOS memory mapping stuff. It also helped with stopping SCO Unix from using the area of RAM that the BIOS copied itself and Video RAM to, before BIOSes became intelligent about this.
Somebody emailed me and actually thought that I was serious.
huh?
you racist people crack me up i swear.. ive been ripped off by more 'crackers' than 'camel jockeys'
Maya 3 is coming out for Linux in 2001 so he can dump NT but it is also coming for OSX and he will likely go that way because he has always been a Mac fan. Both are better than NT for this. Maya on NT is very cool though if you can't quite spend the bucks for Irix on a SGI machine.
I own a laptop with some bad memory on the motherboard. It's usually not a problem, but a kernel compile will segfault at random points. Unlike a desktop, I can't just replace the RAM.
As soon as I get the machine back (hi Rick :-), I'll try this out.
"The cost of freedom is eternal vigilance." -Thomas Jefferson
Obviously, this patch is for the desktop Linux user or Linux experimenter. No one in their right mind would use bad memory in a machine they want to remain stable, least of all at a company.
This sounds like a new TV special.
Tonight on Fox:
WHEN GOOD MEMORY GOES BAD!
if you were to compare this to bad tires on ford cars... it would be more like "new ford explorer add-on can fix bad tires" or something like that.. but everyone knows i hate bad analogies...*cackle*
I believe sex is highly over rated... unless it involves me
Some of the commercially-developed UNIX systems have successfully used this feature to handle enterprise workloads. For example, HP-UX has long derived a unique advantage from its Dynamic Memory Resilience feature, which allows a server to sustain single-bit errors (see HP-UX 11i specs). If the Linux implementation of this function can also be made to work dynamically, i.e. fence off memory that goes bad during runtime, it will be a huge step forward for establishing Linux as a true enterprise alternative.
The BIOS memory test is of ABSOLUTELY NO VALUE unless you sprung for parity memory. The BIOS test only tries to send a random bit to memory, read it and check the parity.
If you bought non-parity memory, the test will always succeed no matter if the memory is bad or not.
This is a great idea, even if you don't want to use the bad RAM. I don't know how often this would happen, but I could imagine a scenario like this:
1. The software polls every once in a while for the bad RAM and marks it when it finds it.
2. Some other software could pick up on the kernel message and send an email to an administrator,
3. Who could schedule the downtime for the replacement without worrying about the system going blooey in the next 5 minutes.
A bit more peace of mind can't be a bad thing.
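Steps 2 and 3 of the scenario above could look something like this in user space. It is only a sketch: the log message format, the regular expression and the admin address are all assumptions (a real kernel message would look different), and nothing here actually sends mail.

```python
import re
from email.message import EmailMessage

# Hypothetical log format; not taken from the BadRAM patch or the kernel.
BAD_PAGE = re.compile(r"bad page at (0x[0-9a-fA-F]+)")


def find_bad_pages(log_lines):
    """Return the addresses mentioned in matching log lines, in order."""
    found = []
    for line in log_lines:
        match = BAD_PAGE.search(line)
        if match:
            found.append(match.group(1))
    return found


def build_alert(addresses, admin="admin@example.com"):
    """Compose (but do not send) a mail asking for scheduled downtime."""
    msg = EmailMessage()
    msg["To"] = admin
    msg["Subject"] = f"{len(addresses)} bad memory page(s) detected"
    msg.set_content(
        "Schedule downtime to replace RAM. Affected addresses:\n"
        + "\n".join(addresses)
    )
    return msg
```

Handing the resulting message to whatever mailer the site already uses is left out on purpose.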
"Anyone that has ever gotten an idea based on any of my work and done something better with it-good for you."--J.Carmack
mod this guy up, he knows what he's talking about.
--
{ Joke Mode On }
From the "Why did this article get posted...and not mine" department:
Ohh..This sounds a lot like the story I submitted a week ago called:
"How to make your good memory make use of bad software."
{Joke Mode Off }
(+1 Funny) only if I laugh out loud.
Your argument doesn't make sense. A machine with 500M of usable RAM works just as well as a machine with 512M in almost every scenario. That's completely unlike your 5-working-cylinder car analogy.
What kind of warranties will the end user get for the memory? what kind of performance is eaten by this program? does the memory run up to spec? will it still work in 2 months?
There is no performance hit; the pages are marked bad and not used, not continuously tested. It's the same as marking a chunk of memory as invalid (causing a page fault on access) but never marking it valid again since it's never swapped back in by the VMM. I'm not well versed enough to know if memory with bad sections will get progressively worse, but everything else you've mentioned is silliness.
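To make the "no performance hit" point concrete, here is a toy model, not the actual patch: the bad pages are filtered out once, when the free list is built, and the allocator never looks at them again. The 4096-byte page size matches x86; the function and class names are made up.

```python
PAGE_SIZE = 4096  # x86 page size in bytes


def build_free_list(total_pages, bad_addresses):
    """Done once at boot: drop every page that contains a bad address."""
    bad_pages = {addr // PAGE_SIZE for addr in bad_addresses}
    return [page for page in range(total_pages) if page not in bad_pages]


class Allocator:
    """Hands out page numbers; it has no idea bad pages ever existed."""

    def __init__(self, free_pages):
        self.free = list(free_pages)

    def alloc(self):
        return self.free.pop(0)


free = build_free_list(8, [0x1000, 0x1004, 0x5FFF])
print(free)   # [0, 2, 3, 4, 6, 7]
```

Two bad addresses in the same page cost only one page, and allocation itself does exactly the same work as on a machine with perfect RAM.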
You can find more information here: http://www.home.zonnet.nl/vanrein/badram
bug.gd: error search engine. Humanity working together to solve all errors.
I'm mostly worried about those buying a preinstalled box as their first Linux PC.
Lucky this has happened all of ZERO times in the history of the universe.
A lot of vendors offer service contracts and warranties. But peecee vendors, accustomed to dealing with...shall we say, less than reliable operating systems, will try to make you go through 543 steps and tests before allowing you to send your hardware for replacement, because most problems in that world are either OS bugs or user error. In the real computer market, they don't fuck around. You paid a lot for your system, and you can expect it to work. When you call them up and say you have a bad foowhatzit, they send you a new one (unless they're Sun, in which case they make you sign an NDA first - bad Sun, bad!). They expect, and rightly so in most cases, that you know what you're doing and it isn't a software problem. No runaround, no bullshit, no cost to you. This is one of several reasons I'll never own another peecee. The service just ain't the same.
I understand the concept of trying to get the most you can out of any hardware you might have. But I also think that people stuck with such hardware ought to learn their lesson next time instead of relying on hacks, however clever, to work around their poor buying decisions. Anyone actually seeking out bad memory to use with this is insane. Firstly, there's good reason to believe that if memory is failing, other areas in the same part may fail as well, perhaps with less frequency or at a later time. Second, even if the cost is half that of a good part, is it really worth saving 50 bucks and having to configure this thing, test it, and make sure periodically that no other memory areas fail? I would suggest that technical work of this type is worth at least 50 bucks an hour...so if you value your time fairly, it's unlikely that you'll win out of this. I'll gladly pay some extra money to know that I won't get a sig11 the next time I go to compile something...and if I do, I can get replacement parts the next morning at no cost, without any hassle. I don't work for any vendors. I'm just a sysadmin who'd rather read slashdot than argue with tech support.
One bad bit causes them to throw out a 4KB page.
If they'd take advantage of x86's segmented memory model they could reduce that amount to 16 bytes.
Trolls throughout history:
Jonathan Swift
We use a SCSI CD-R for data backups, average about 2CDRs per week. I've ended up with one coaster in the last year, and it had a scratch on it. There's no reduction in the machines (heavy) load while burning. SCSI rocks!
1984 was supposed to be a warning, not an instruction manual.
Does it run every possible combination of CPU instructions on boot up?
It can't. Running every possible combination would take an indefinitely long period of time (infinity).
Does it check every single block on the hard drive? No!
This is because the hard drive is not essential to the functioning of the computer. With modern operating systems, usually a hard drive is required, but again, it's not essential.
Does it check all the blocks of floppies, CDs, DVDs, etc to make sure they work?
This would be absurd. "Please insert a DVD, CD and floppy to boot".
If the memory test is essential to the functioning of the system, why do they let you skip it?
You then go on to contradict yourself by saying: Obviously, the smart thing to do is to _wait_ for the memory to fail rather than test the whole lot for a minute or two..
Hmm. I think the price of bad RAM just went up.
I'm not sure I'd use this myself though. I find myself resurrecting machines that won't boot from lilo anymore quite a bit. It'd be awfully annoying not to be able to use an off the shelf rescue disk. It's bad enough when you have to get some weird scsi driver working.
OTOH, it'd be a lifesaver if you have some wounded machine you need to get back up ASAP.
If you're operating a linux server and someone wants to replace it with w2k, well, let him try. But don't tell him that the RAM is defective. (It works with linux, so what?) :-)
It can't [run every possible instruction combination]. Running every possible combination would take an indefinitely long period of time (infinity).
Indeed, that's my point. The BIOS makes no great time-consuming effort to ensure the CPU works accurately and completely. The CPU's correct functioning is essential, as the FDIV bug showed. The CPU is the most essential part of the computer. And the only tests done on it are ones that work out which CPU it is, and some basic sanity. As the CPU isn't fully tested, and it's more important than the memory, why is the memory fully tested?
the hard drive is not essential to the functioning of the computer. With modern operating systems, usually a hard drive is required, but again, it's not essential.
Some form of device from which the OS, software, etc is loaded is necessary. If it's possible to use every block on this device, then to be sure of success every block on this device should be tested. This is the crackpot theory of BIOS memory testing applied to other system parts. My point is that hard drives map bad blocks out as and when they find them, when they're actually needed. So should memory. That's what I mean by 'waiting for memory to fail rather than test the whole lot'.
Does my bum look big in this?
Hey, this sounds illogical. If RAM chips are damaged, there is potential that they get damaged further, and eventually your amount of usable RAM will run down to 0 bytes. They are electronic components anyway. Would you run your O/S on a hard disk with lots of damaged bad sectors and a potentially dying motor? I doubt so.
First, correct me if I'm wrong, but before your O/S even boots, the first few bytes of your RAM are already taken up. If they're damaged, there are a few things which could happen:
a) You can't start your computer
b) Somewhere before your O/S loads, the whole boot process goes bonkers
c) It corrupts the data on your hard disk
Say this is working at O/S kernel level: the kernel loads way after a huge chunk of the initialization done by your system during boot, and some systems just refuse to boot after a failed RAM test. Even if you got your system running, there's a high risk of failure. Why waste time and money on a system that's gonna potentially die on you? Just spend a few more bucks on a brand new RAM stick. Come on, I run my Squid on a 48mb system!
I have a car that runs on 5 of its 6 cylinders -- and it runs great. New Cadillacs with V8's have a feature that detects if a cylinder is bad and stops its plugs from firing while the engine churns on. I have a hard drive with lots of bad sectors -- but they are marked bad and my box is stable. You can buy new harddrives in the same condition. Ever put a 450MHz processor on a 400MHz motherboard? You're throwing away 12% of your processing power but the machine will still run fine. Why is RAM any different? Sure, new 100% working RAM is better -- but if I found a 512M DIMM lying on the street -- and 448M still works -- I'm not gonna complain!!
Where patchy power mains, brownouts etc can easily blow holes in your memory chips. I have 16 mb lying at home full of bad bits... I'm sure there are many others who have experienced the bad effects of a power surge. Sadly these chips blew out even though the computer was on a UPS. Memory is not cheap for people of limited resources and this is one great way of recycling something the developed world really doesn't bother about too much. In India at least we hate to throw anything away and this is one more major plus point for linux.
Cheerio
Robin
Linkedin http://in.linkedin.com/in/robinsaikatchatterjee
Who told you that!?? Nothing, NOTHING gets thrown away. Most memory companies SELL their faulty modules to other hardware companies. Then those modules are used in low (i.e cheap) household appliances and other non-critical electronic applications that do not require a stable system.
"I can't see a f#@!! thing" - photon a to crossing photon b
You seem to miss the whole idea of the patch. It's like using a drive with bad sectors on it: you can map the sectors as bad and hop over them as you seek through the disk. The badram patch does things in a similar way: it flags any area that is questionable as bad, and hops over it when mapping memory. Your stick of ram could have 0k usable and the patch would still work fine, and it wouldn't affect the stability of the machine at all.
That wasn't particularly insightful, even for a dig at Signal 11.
Your right to not believe: Americans United for Separation of Church and
Ahh, this would have been useful with an old P90/32MB motherboard/memory combo I recently gave away...
It was quite fun, running a system (FreeBSD) with a single-bit memory error. Sure, gcc would die on occasion, but then there was the oddness of having a script break because a file http_log was missing (mysteriously renamed to httx_log). The best part was actually figuring out which bit was bad...
iSKUNK!
No, Windows just runs as slowly as if all your RAM has a couple of wait states.
I want to delete my account but Slashdot doesn't allow it.
Please post a URL or place to buy ram this cheap! I'd be more interested in that, than a funny kernel patch.
You wouldn't, by any chance, know where I could get a 512M for about $250, either?
But Best Buy and CompUSA carry only the highest quality "Bulk RAM". In fact, it was just about 9-12 months ago I purchased an 128MB SDRAM DIMM from Best Buy for $114. It only took me 3 trips to get one that actually worked. I think this experience alone is a testament to Best Buy's commitment to providing only the highest quality untested, shoddy, & defective merchandise! Rain check anyone? Don't even get me started...
Sure, you wouldn't want to intentionally put bad memory into a production machine, but what if good memory goes bad? This patch, if further developed to perform periodic testing and updating of the bad memory map *during operation*, could actually harden the linux kernel against spontaneous hardware failure!
If we ever want to see linux used in mission critical systems like air traffic control, embedded medical devices, or military applications, then projects like this are the key. Fault tolerance now exists for memory (this project), storage (RAID), and communication (redundant NICs). The next target should be the CPU.
How about projects to detect the types of errors a failing (typically, overheated) cpu produces, and adjust the scheduler accordingly to insert idle time and cool down the cpu? Or to use one cpu to monitor another in multiprocessor systems, and avoid using a processor that starts producing faulty results?
I knew from the badRAM website that it was discussed on kt (and so read that earlier today), but I hadn't noticed it there when it first appeared -- sometimes I'm too interested in other topics, sometimes I don't read it all the way through, whatever. There's a lot of information in the world. I'm glad that someone sent in the link and explained it a bit (so I was intrigued and looked through it), which is what this site is about.
... don't read it or waste your own time commenting :) There are a lot of projects out there that have been laboring quietly which may have spectacular results at any time -- do you not want them discussed because they're "old news"? The in-progress Tux2 filesystem was no secret, for instance, (that, too, was discussed on kt), but how many people had heard of it before ALS? Not nearly as many as would have been interested, I warrant, and the comments on the slashdot story about it indicate that.
But how many people saw it on kt? For purely selfish reasons, I'd like to see a lot more people know about this project, because I find it very interesting and useful-looking. Plus, I think it's just a neat hack in general, and I'd like to point it out.
If it's too old for you, then
YMMV, whaddya do?
OK.
timothy
jrnl: http://tinyurl.com/c2l8yr / foes: http://tinyurl.com/ckjno5
I actually had a friend who ran DOS debug on his BIOS and noted that it *did* actually test every one of the X86's registers during boot - obviously this didn't get printed in the boot sequence (because if it had, the message would flash up for about 8 clock cycles) but it actually did a sequence like:
put arbitrary number in first register
copy first register to second register
...
copy second-last register to last register
compare last register to first register
if (different) HALT
(I have a feeling it did this twice, with 01010... and then with 10101...)
He thought this was quite clever until he realized that a bad bit in the first register would still pass the test (and also negate the test of that bit on all the other registers)...
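The flaw is easy to reproduce with a small simulation. This is only a model of the sequence sketched above, not the real BIOS code: registers are 16 bits wide, a fault is a bit stuck at 0, and the function names are invented.

```python
def chain_test(stuck_at_zero, pattern, width=16):
    """Copy `pattern` down a chain of registers and compare last with first.

    stuck_at_zero holds one mask per register; set bits always read as 0.
    """
    mask = (1 << width) - 1
    value = pattern & mask
    held = []
    for stuck in stuck_at_zero:
        value = value & ~stuck & mask   # what this register really holds
        held.append(value)
    return held[-1] == held[0]          # the comparison the BIOS made


def chain_test_fixed(stuck_at_zero, pattern, width=16):
    """Same chain, but compare against the known constant instead."""
    mask = (1 << width) - 1
    value = pattern & mask
    for stuck in stuck_at_zero:
        value = value & ~stuck & mask
    return value == (pattern & mask)


bad_first = [0x0001, 0, 0, 0]   # bit 0 of the first register is stuck
print(chain_test(bad_first, 0x5555))        # True: the fault goes unnoticed
print(chain_test_fixed(bad_first, 0x5555))  # False: the fault is caught
```

Because every later register only ever sees the already-damaged value, the first register's bad bit hides itself and also hides the same bit in all the others. Comparing against an immediate constant avoids that.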
But what happens when there are more faulty rows than spares? Answer: They sell it to the crackling-audio people, for cheap. Such chips might not have a higher tendency toward progressive RAM-cancer than those with fewer faults (though I will be happy to stand corrected if someone has contrary data.)
By marking the bad rows bad, Linux never allocates them. With virtual memory in fixed-size pages and memory-mapped I/O there's no penalty for scattering your data all over the place and hopping over the occasional chuckhole.
Downside would be if there's a flakey cell and the memory test misses it. So a persistent bad-page map might be useful, as would beefing up the startup test if the feature is enabled, and adding a background memory test on the currently unallocated pages, to pick up any really-low-density faults.
If an intermittent cell gives you a hit on a read-only or unmodified page, a hack in the parity-error recovery code could move and refresh it. A read hit on a modified page not yet written back to disk is bad news. (Another background hack could be writing modified pages back part of the time the disk is otherwise idle, to reduce that window.)
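A rough sketch of the background test on unallocated pages suggested above, run against simulated memory so it can be tried anywhere. Everything here is an assumption for illustration: the `SimulatedRAM` class, the tiny 16-byte pages, the stuck-at-0 fault model and the two test patterns. A real kernel test would be far more thorough.

```python
PATTERNS = (0x55, 0xAA)   # alternating bits, then the inverse


class SimulatedRAM:
    """Byte-addressed memory where some cells have bits stuck at 0."""

    def __init__(self, pages, page_size=16, stuck_low=None):
        self.page_size = page_size
        self.cells = bytearray(pages * page_size)
        self.stuck_low = stuck_low or {}   # {cell index: mask of dead bits}

    def write(self, index, value):
        self.cells[index] = value & ~self.stuck_low.get(index, 0) & 0xFF

    def read(self, index):
        return self.cells[index]


def page_is_good(ram, page):
    """Write each pattern over the whole page and read it back."""
    start = page * ram.page_size
    cells = range(start, start + ram.page_size)
    for pattern in PATTERNS:
        for i in cells:
            ram.write(i, pattern)
        if any(ram.read(i) != pattern for i in cells):
            return False
    return True


def scan_free_pages(ram, free_pages, bad_map):
    """Test only pages nobody is using; remember the failures."""
    for page in free_pages:
        if not page_is_good(ram, page):
            bad_map.add(page)
    return bad_map
```

Writing `bad_map` to disk and feeding it back in at the next boot would give the persistent bad-page map the post asks for.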
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
It's a lot more like saying, "This old Mercedes will only run on hi-test gasoline, but the new one burns any old crap you put in the tank and runs just as well."
And if that doesn't impress management, take your faulty DIMM and throw it in a Win2k box. Sit back and watch the fireworks.
This isn't as much "normalization" as it is "don't take so many drugs when you're designing tables."
PS/2?! Heck, didn't PDP-11s do this? I can still remember (barely, though) PDP memory boards with socketed discrete memory chips that included two or three spare chips so you could replace the bad chips yourself after locating them using an XXDP+ diagnostic. I don't recall bad chips bringing down the system, but in those days, when a big PDP had 4MB of memory, every little bit counted and the OS worked around it.
I heard somewhere several years ago that Windows got around flaky memory not by marking pages as `bad' but by forcing additional wait states if questionable memory was detected at boot time. Any truth to this?
--
CUR ALLOC 20195.....5804M
If the ram went 'bad' in the first place, what's to stop even more pages from becoming unusable while the machine is turned on? Does anyone else think this is a remarkably bad idea? At the very least it'll require reboots as less and less of the memory stays available. Is the money saved by buying bad ram worth the frustration of having to boot/fsck the machine at random? Am I the only one who thinks this is another case of "we did it because we could, not because it's a good idea"? It's not even a good example of that type of thinking...
But you're absolutely right, no other OS vendor would even think about trying something like this. If you love your OS this much, I am not the right person to try and talk you out of it. Peace.
--
Peace,
Lord Omlette
ICQ# 77863057
[o]_O
I once had a motherboard that had a problem refreshing RAM above the 1MB boundary. You could write and read just fine, but as time passed you would watch individual bits revert back to 1's. It was kind of amusing watching all the graphics in doom change ;^). I 'fixed' the problem by writing a TSR that tricked all software into thinking I only had 640k of memory. That memory would have been fine for cache if the data was protected by a checksum/CRC.
Actually memory fails for many different reasons. I personally work in the test department at a large semiconductor company that makes SDRAM. All memory gets tested before it gets soldered to the PCB but it can still encounter a fail after it leaves. Single bit fails and the like are actually fairly common. Most people don't even notice them. Also there are speed related problems, heat related problems, and mechanical problems that come up. For example, the early AMD chipsets had problems with certain memory. Memory also has clock issues and other little details that can affect things dramatically. However this project seems to be a little far fetched since most memory gets a little worse over time. This is okay for a temp fix but your memory will slowly get worse with time. Usually within 6 months the memory is almost totally bad. Another problem with using bad memory is that in several cases bad memory will draw a larger idle current than other modules. And if you have more bad modules there is a higher current load. This can lead to damaged parts on your motherboard. Another thing to realize is that load style can affect your stability. In several situations it has been found that windows can run over top of a memory error because it tends to not stress the memory quite as much as your basic high load unix setup. That's my $.02 on the issue I guess.. It seems like this is basically using a hard drive that is whining and sputtering. Not a smart move stability wise.
Answering machines use ARAM chips which are actually faulty DRAMs. They map the defects and avoid using those areas.
----
Stop worrying about the risks of nuclear power and start worrying about the risks of not using nuclear power.
Viceroy of the Trade Federation in Episode 1, best line in the whole movie (IMHO)
--
A feeling of having made the same mistake before: Deja Foobar
With that said, if I can get a 512MB DIMM for $100 because 50MB of it is unusable, I'll buy it and install this hack, even though having anywhere near that much ram on my Linux box will not help me (it's little more than an mp3 player and web browser; most of my work is on my Mac).
Burn Hollywood Burn
Actually this is quite handy. Windows always worked better with dodgy memory than Linux did because Linux always tries to use as much memory as possible for caching whereas Windows didn't.
That made Linux notorious with dodgy memory, failing to boot half of the time. I've seen people blame Linux for bad hardware because it would work with Windows.
It's nice that Linux now could just go
*ARGH YOU HAVE CRAP MEMORY*
shrug its shoulders and chug along anyway.
Doesn't this seem like a great opportunity for Linux (or any OS) in the embedded market? Suppose I have some critical and rather inaccessible chunk of hardware. (Satellite, remote weather station, ...) Wouldn't it be cool if the hardware could detect the fault and "heal" itself?
Anyways, this is waaayyy too late to get read by anyone sane, (I have to admit that I only read the first 100 or so posts, so sorry if someone already had this idea)
Better get this guy (the Bad RAM webpage dude), he said Linux was "open source" software.
cat
I don't think we're talking about defects that are detected when the memory is in the wafer stage. I think we are talking about the die that passed the initial wafer probe and then were packaged and then fail somehow at or after the packaging or even shipping stages.
We are talking about packaged silicon here. And we're talking about populated DIMM/SIMM/etc boards that fail. There's no way in hell that a manufacturer would let faulty die get that far, and by that time, there's no way that any type of depositing or fib technique is gonna be practical.
So with that in mind, let's return to the topic they are talking about. Why don't you just allow us to think that it's really cool to use software to lock out blocks of RAM? (i.e. by using page protection that is already present in the OS and already supported by the CPU)
So in conclusion, I think there are plenty of us at /. that have a clue about memories, memory protection architecture, fabrication, production, and testing techniques. We also are aware that there are a few techniques in practice like the one that you mention, but I seriously doubt that they are applicable to such dense cells as memories. Even if you could spare the silicon real estate and added logic and routing to provide the redundant resources, you would still have to reroute on a per-die and per-failure basis. And that assumes that the defect belongs to a special class of failures and is confined to an easily fixable range of memories, etc.
Don't get so high on yourself over there.
Plus, some of the motivation is a little askew. If you want to push these chips into an old machine, you still have the problems of RAM limitations due to motherboard design. A fat lot of good 512MB of semi-faulty memory is going to do in a board that can only support up to 32MB (or better yet, an older chipset that supports up to 8 or 2).
- I don't care if they globalize against free speech. All my best free thoughts are done in my head.
MS did this a lonnnnnng time ago in DOS: device=emm386 -exclude 0fff-ffff or something like that. Of course this doesn't work for the first 640k, but hey, just another example of how m$ innovates --toq
I had memory in one of my machines that was totally out there. Would cause the weirdest problems during compiles, but never a sig 11.
I wanted to rename the box Alzheimers.
speed drops because the program has to wait a few cycles to determine if it can use an area of RAM or not. HELLOO... this is already being done. The memory is already being mapped, and there is already a section specifically designated to store this info. Otherwise, you'd write to GOOD areas of the RAM that were already in use by running applications... There is no reason that this should make your system any slower in the least. This is a very useful tool, and once developed will allow for on-the-fly saves. Not only that, but it's going to allow me to have 512MB of RAM in my system without worrying about my wife fussing at me about the expenditure.
Handful of busted 256MB DIMMs: $10.71 with tax
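To make the "no slowdown" argument concrete, here is a minimal sketch of the idea (an illustration, not the BadRAM patch itself). The bad pages are filtered out once, when the free list is built at boot, so nothing has to be checked on each allocation or memory access afterwards. The names `PageAllocator` and `build_free_list`, and the 4 KiB page size, are assumptions made for this example.

```python
PAGE_SIZE = 4096  # assumed page size (4 KiB, as on i386)


def build_free_list(total_pages, bad_addresses):
    """Build the list of usable page frames once, at 'boot'."""
    # Any page containing at least one bad address is excluded entirely.
    bad_pages = {addr // PAGE_SIZE for addr in bad_addresses}
    return [page for page in range(total_pages) if page not in bad_pages]


class PageAllocator:
    """Toy allocator: hands out page frame numbers from a free list."""

    def __init__(self, total_pages, bad_addresses):
        self.free = build_free_list(total_pages, bad_addresses)

    def alloc(self):
        # No bad-page check here: bad frames were never put on the list.
        return self.free.pop()

    def release(self, page):
        self.free.append(page)


if __name__ == "__main__":
    # Hypothetical machine with 8 pages and two known-bad addresses.
    allocator = PageAllocator(8, [0x2010, 0x5FFF])
    print(sorted(allocator.free))
```

The only cost is the memory that is never handed out; the allocation path is the same as it would be without any bad pages.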
6 reboots, a little math, and a partial kernel compile: 21min
The look on my roommate's face when I typed "top": priceless!
I don't know about you guys, but whenever my RAM goes bad, it usually won't even boot, so this patch is rather useless to me. It's even more useless when you consider I don't use 2.2 anymore.
Linux forced its way into our IT Department when it could restore a trashed system into something useful. Here at The Salvation Army, we endeavor to be good stewards of what we are given. We have an IBM PC Server 350 (now named "Methusela") that crashed one day for no apparent reason. It refused to run Windows anymore... not even Win98 or Win95!
But it ran Linux flawlessly. Well, actually it did point out one flaw on its own: The internal Ethernet controller was getting an unusually high number of bad packets. It would receive DHCP assignments, even do some web work in Linux... but it was enough to shut Windows down completely. Even after installing a working NIC, Windows could not run due to the faulty internal NIC, but Linux ran fine!
Likewise, we found an instant way to crash every WinNT system in the building. Someone was re-arranging the hubs and switches, and accidentally created a packet loop by plugging a switch back to itself... in three seconds every WinNT system on the network went straight to the Blue Screen of Death.
It's one thing to handle the rules well, but quite another to deal with the exceptions!
No! A high performance computing shop would never use substandard parts, never mind parts that are KNOWN to be BAD.
Bad parts, regardless of whether they are somewhat usable, can corrupt data or cause incorrect results.
Therefore, those parts cannot be trusted.
-Dennis
You must have some sort of problem with Linux. This is a valuable and technically interesting addition to the Linux kernel, and all you can do is act like everybody in the world who needs 256MB DIMMs also has $135 ready.
I know you're just trolling, and I shouldn't respond, but for students, and anybody who has access to memory modules that are experiencing known, predictable faults, this would be great. Not everybody has some fancy $30,000/year job, y'know.
--
"Don't trolls get tired?"
Burris
..is that there's a patch to the linux kernel that will allow it to use rambus memory?
sweet.
Hey, this sounds illogical. If RAM chips were damaged, there is potential that they get damaged further, and eventually your amount of usable RAM will run down to 0 bytes. They are electronic components anyway.
Would you run your O/S on a hard disk with lots of bad sectors and a potentially dying motor? I doubt it. First, correct me if I'm wrong, but before your O/S even boots, the first few bytes of your RAM are already taken up. If they're damaged, there are a few things which could happen.
If he can only get the Kernel to do this on the fly...
"Depression is merely anger without enthusiasm." - Anonymous
What's worse is my housemate has cut it in two and glued it to the side of one of his model tanks to play Warhammer 40k with!
Every time I see that sucker trundling across the battlefield I'll be reminded of 100 wasted dollars!
It's certainly now an expensive model tank, eh?
'There is a Light that never goes out.'
Didn't PS/2s do this way back in the day? I seem to remember having an old Model 57 that reported an error in memory and then went happily on its way, using less RAM.
maybe?
In Vino Veritas
The only use for a patch like this is on systems with ECC RAM, for moving data out of pages which had non-catastrophic memory errors so that the system can keep running without using the flaky bits until that DIMM can be replaced.
Bad RAM is just that, bad, and is likely to have more failures over time.
Hello Slashdot??? This is old news. The kernel bad RAM patch was discussed on the weekly kernel traffic digest several weeks ago. Do the Slashdot story posters read any news sites other than Slashdot? Other news sites are out there...
Does the name Pavlov ring a bell?
I built a box from a hole-in-the-wall parts reseller that did volume, volume, volume in Silicon Valley, and started having some stability issues. So I started doing mass kernel recompiles (100 at a time) as a test, and sure enough, gcc exited with errors at random points. However, I've heard that this is not necessarily 'bad bits' on the memory sticks, but rather an inability of the memory to actually keep up with the 100MHz bus, even though it was billed as PC-100 RAM. Anyhow, after that, I always sprung for the premium Toshiba lifetime-guarantee RAM at Fry's, and I just got the other parts elsewhere.
Both my almost-never-used linux boxes are entirely composed of 2nd hand hardware. I'd say it would come in handy. Dumpster divers would appreciate it.
Rich
Now Linux is SO GOOD, it can even run well on defective hardware. Let's see M$ do that one! Now who's the one chasing tail-lights? The only way for M$ to one-better Linux is to make an OS that runs on NO RAM, and we know where they stand on bloat..
The car metaphor is flawed. It's more like Cadillac's 'limp home' feature that will run a V-8 engine on only 4 cylinders in the event of cooling failure, equipped with run-flat tires and bullet-proof Windows.
The REAL jabber has the /. user id: 13196
The REAL jabber has the user id: 13196
What you do today will cost you a day of your life
- Does it run every possible combination of CPU instructions on boot up? No!
- Does it check every single block on the hard drive? No!
- Does it check all the blocks of floppies, CDs, DVDs, etc to make sure they work? No!
- If the memory test is essential to the functioning of the system, why do they let you skip it?
Obviously, the smart thing to do is to _wait_ for the memory to fail rather than test the whole lot for a minute or two. After doing a full test once, the first time you boot, you can leave a very low priority memory tester running, or leave the full test to some quiet period with a cron job - a decent memory test of course, not that half-witted test that BIOSes do.
Does my bum look big in this?
Yeah, but your engine is most likely running unbalanced. The firing order is there for a reason. Most likely in the Caddy V8s you're talking about, the onboard computer would stop firing the opposing cylinder under the other head as well.
No flame here, just clarification...
> we are talking about the die that passed the initial wafer probe and then were packaged and then fail somehow at or after the packaging or even shipping stages.
Yes, we definitely are. I only brought up the fact that the industry already routes around process errors in DRAMs to demonstrate a point. He seemed frightened by the possibility that future DRAMs we buy might not be 100% "clean". I wanted to demonstrate that 100% clean isn't necessary, and in fact isn't produced now, by and large. What *is* necessary is 100% functional parts. DRAM manufacturers know this, and use it to improve yields, thus driving down cost. No foul play there.
> So in conclusion, I think there are plenty of us at /. that have a clue about memories, memory protection architecture, fabrication, production, and testing techniques.
I expect there are, but none of them were posting. Instead, most of the posts demonstrated a clear lack of understanding of the process. The consequences of techniques like Linux's remapping seemed to worry the original poster, and I wanted to explain why it wasn't something to worry about since a similar process is already performed quite successfully. Further, I wanted to emphasize that this is a perfectly valid technique for increasing yield, and is transparent to the user. It isn't like the manufacturers are trying to rip people off.
> Why don't you just allow us to think that it's really cool to use software to lock out blocks of RAM?
I'm not saying it isn't cool. Some seemed wary of the idea, though, and I wanted to point out that their present DIMMs use something very much like this. As long as the software can do this transparently (as the hardware does), what does it matter to the user?
> but I seriously doubt that they are applicable to such dense cells as memories
Believe me: they are. Think of it: a massive die area that will be completely destroyed by a single speck. Wouldn't you prefer an ever so slightly (say 5-10%) larger die that can withstand one or two specks? The redundancy is very easy to use in a homogeneous structure like DRAM (millions of identical cells). All that has to be done to "swap in" the replacement RAM block is to modify some address lines. That can be done by electrically blowing fuses on die, or through laser modification. One of my former employers was *most* fond of the laser approach. It added quite a bit of flexibility to their designs.
--Lenny
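The "swap in a replacement block by modifying address lines" idea can be sketched as a small lookup that redirects a defective row to a spare one. This is a toy model under stated assumptions: the class name `RepairableArray`, the row counts, and the dictionary standing in for blown fuses are all invented for illustration and do not describe any particular manufacturer's design.

```python
class RepairableArray:
    """Toy model of a memory array with a few spare rows for repair."""

    def __init__(self, rows, spares):
        self.rows = rows
        # Spare rows sit physically after the normal rows.
        self.spare_rows = list(range(rows, rows + spares))
        # Stands in for the fuses: defective row -> spare row.
        self.remap = {}

    def repair(self, bad_row):
        """Assign a spare to bad_row; return False if none are left."""
        if bad_row in self.remap:
            return True
        if not self.spare_rows:
            return False
        self.remap[bad_row] = self.spare_rows.pop(0)
        return True

    def physical_row(self, logical_row):
        """Row actually accessed for a given address, after repair."""
        return self.remap.get(logical_row, logical_row)


if __name__ == "__main__":
    array = RepairableArray(rows=1024, spares=2)
    array.repair(7)
    print(array.physical_row(7), array.physical_row(8))
```

From the outside the part still presents 1024 working rows, which is the sense in which the repair is transparent to the user; once the spares run out, the die cannot be repaired further.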
Actually, that's the point: the Toshiba RAM from Fry's was actually good stuff. It was the crap RAM from a hole-in-the-wall parts dealer that was no good. They mostly sold name brand stuff, just bulk and deep discounted (Hi-Tech USA), and obviously a P3 from them is the same as a P3 from anyone else, but the RAM they included in the packages by default just didn't cut it.
Why?
Also, wouldn't an entire page of memory be wiped out - not just one bit? I haven't looked at what these guys have done, but I wouldn't be surprised if entire 64KB pages are affected if only one bit in that page is gone.
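For a sense of scale on that question, here is a rough back-of-the-envelope sketch. It assumes that every page containing a bad address is given up whole, and compares 4 KiB pages (the normal page size on i386) with the 64KB guess above. The fault addresses and the function name `lost_bytes` are made up for the example.

```python
def lost_bytes(bad_addresses, page_size=4096):
    """Bytes given up when every page containing a bad address is excluded."""
    bad_pages = {addr // page_size for addr in bad_addresses}
    return len(bad_pages) * page_size


if __name__ == "__main__":
    # Hypothetical fault list: two bad cells in one page, one in another.
    faults = [0x01000010, 0x01000FF0, 0x02345678]
    dimm_bytes = 256 * 1024 * 1024

    for size in (4096, 65536):
        lost = lost_bytes(faults, size)
        share = 100 * lost / dimm_bytes
        print(f"{size}-byte pages: {lost} bytes lost ({share:.4f}% of 256MB)")
```

Either way, a handful of isolated bad cells costs a tiny fraction of the module; the loss only becomes significant when faults are spread across many pages.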
As I said before, I think it's really cool what they have been able to do. There may be some niche areas for this program to be useful. It is not, however, a good thing (IMO) to be buying bad memory just to save a buck.
That's just my opinion tho...
Verbatim
Price, Quality, Time. Pick none. What, you thought you had a choice?
Modern DRAM doesn't have much trouble with bad cells, and the yields are quite good. So there isn't a big supply of DRAM with bad cells that fail solidly. Most DRAM problems today are at the edges: at the buffers, the connectors, or clock synchronization - the things that can be messed up during installation.
Personally, I get ECC RAM even on desktops, just so I know it's working. It eliminates arguments with tech support when the hardware really is broken.
He's right you know. How many times have you looked at a pile of old computer parts and thought it would just take a couple hours of effort and maybe a $20 whatzit to make it all hum as your new file server?
Last time I tried that it took me 20 hours of bad part diagnostics and piecemeal parts replacements. 20 hours worth of effort that I could have just spent working and then bought a brand new (cheap) machine.
So *any* price they can get, above and beyond shipping and handling, is pure profit to them. And there's a cost to throwing them out which they wouldn't have any more.
--
Infuriate left and right
Nobody [except the brainless] leaves defective parts in mission critical hardware.
If I were experimenting with something, I might consider this patch, but for someone who leaves a server running, and expects stability for months at a time, it's not within the realm of consideration.
--
A feeling of having made the same mistake before: Deja Foobar
So tell me, how, without a memory tester, do you know what's predictable vs. unpredictable? As far as I'm concerned with DIMMs, 5% broken is 100% broken.
--
A feeling of having made the same mistake before: Deja Foobar
Well, that $135 also covers the cost of bad memory that normally gets thrown out. So, since the bad memory is essentially already paid for, you could consider the sale of a bad piece of memory all profit.
My pants just got tighter.
- A.P.
--
* CmdrTaco is an idiot.
"Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"
a Beowulf Cluster of Bad DIMMs? Seriously, I am impressed by the immaturity of the posts here. Whenever semiconductors are made at large scale in large factories (like those of Hyundai), there is a proper verification system which checks the DIMMs before sealing, and if any defect is found, then it's sent back to a separate process and mostly scrapped. Now if a company decides to actually sell defective DIMMs, the cost would come out to be greater than that of normal DIMMs simply because of the ratio of good yield to bad yield. Bill Clinton having sex with Chelsea, here!
You can check out Best Buy or CompUSA for some faulty RAM. They seem to have a never-ending supply of it. Not only that, but you can pay the price that you can get good RAM for off of the net!
huh?
Okay, so your car dealer marks down the car 80% because ONE of the pistons doesn't work right. However, the rest work fine and he installed a thing-a-ma-bob to make the engine ignore that piston.. ummm..
What kind of warranties will the end user get for the memory? What kind of performance is eaten by this program? Does the memory run up to spec? Will it still work in 2 months?
There could be a niche market for "used" memory sticks, but "damaged" or "defective" may not sell all too well...
I agree, however, that this does seem like a cool way to resurrect older systems into useful appliances (print servers, routers/gateways, etc).
Verbatim
Price, Quality, Time. Pick none. What, you thought you had a choice?