Slashdot Mirror


Patch To Allow Linux To Use Defective DIMMs

BtG writes: "BadRAM is a patch to Linux 2.2 which allows it to make use of faulty memory by marking the bad pages as unallocatable at boot time. If there were a source of cheap faulty DIMMs this would make building Linux boxes with buckets of memory significantly cheaper; it also demonstrates another advantage of having the source code to one's operating system." The BadRAM page has a great explanation of the project's motivation and status. Now where can I pick up some faulty-but-fixable 512MB RAM sticks?

247 comments

  1. Re:Signal 11 no more? by Signal+11 · · Score: 1
    So what are you saying?

    --

  2. Cool! Put it in the installer! by Moe+Yerca · · Score: 2
    If a distro install could test your memory on the install boot and autoconfigure the kernel to ignore bad memory, gee golly! I know I've got a few sticks somewhere that are suspect... never bothered to test them.

    Now if we can just get some kernel drivers that can bypass other bad hardware... umm... uh... ok... so I don't have any examples, but dammit! Don't you love that Snickers commercial with the guy wanting to go to lunch with his poster of the panda bear?! Pretty pretty panda! Pretty pretty panda!

    I INVENTED PANTS!

  3. Re:Bad Ram by ocie · · Score: 1

    This might add too much to the cost, but you could put a non-volatile chip on each SIMM that spells out where the bad memory is. Then when a compliant OS boots up it reads this information out of the SIMMs and uses it to block out those pages marked as bad by the manufacturer. But who knows if they would go to all this trouble for memory that can only be used by one OS.

    Also have to come up with a good euphemistic buzzword for this memory so that it can be sold. "Near compliant" was a good one I heard a while back.

    --
    JET Program: see Japan, meet intere
  4. Re:Is this good for Linux's rep? by Fat+Lenny · · Score: 1
    Ignore HP -- for the most part, he's just a troll (in the classic sense). Check The Jargon File if you need the real definition.

    --
    fat lenny's gonna lick your brain today.

  5. Re:Bad Ram by ocie · · Score: 1

    Why not incorporate a memory tester into the kernel? It could pick a page, swap out whatever user-space data that was there, and run a few read/write tests on it. If it passed, just free up the page and go on. If it failed, retry to make sure, then mark it as bad and contact a user-space daemon to append this address to the list of bad memory pages. When the kernel boots up the next time, it reads in this file and the newly detected bad pages are never used again.
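
    A minimal sketch of that loop, run against a simulated memory rather than real kernel pages. The `write_read` callback, the pattern set, and the one-page-number-per-line file format are all assumptions made for illustration; nothing here is part of the BadRAM patch.

```python
PATTERNS = (0x00, 0xFF, 0x55, 0xAA)  # assumed test patterns


def page_passes(write_read, page):
    """write_read(page, pattern) writes the pattern and returns what reads back."""
    return all(write_read(page, pattern) == pattern for pattern in PATTERNS)


def find_bad_pages(write_read, pages, retries=1):
    """Mark a page bad only if it fails the first test and every retry."""
    bad = []
    for page in pages:
        attempts = 1 + retries
        if not any(page_passes(write_read, page) for _ in range(attempts)):
            bad.append(page)
    return bad


def append_bad_pages(path, bad_pages):
    """Append newly found bad page numbers to the persistent list."""
    with open(path, "a") as handle:
        for page in bad_pages:
            handle.write(f"{page}\n")


def load_bad_pages(path):
    """Read the list back at the next boot; a missing file means no known bad pages."""
    try:
        with open(path) as handle:
            return sorted({int(line) for line in handle if line.strip()})
    except FileNotFoundError:
        return []
```

    A real implementation would also have to move live data off the page first, which this sketch leaves out.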

    --
    JET Program: see Japan, meet intere
  6. Re:A better solution... by kupolu · · Score: 1
    Or you know, maybe we _aren't_ all rich bastards who can go out and buy any computer we want. Maybe we need to work with the resources we have right now. *gasp*

    /me is frustrated by stupidity

    --
    -- We should kill all the intolerant people in the world.
  7. Uses of 512MB of RAM by kupolu · · Score: 1
    Now I'll *finally* be able to run Emacs in REAL TIME!

    (sorry, it had to be said:D)

    --
    -- We should kill all the intolerant people in the world.
  8. Re:If Linux works with crap, that's all IT will gi by SevenSeasOfRhye · · Score: 1

    I disagree.
    One of the major advantages Linux has over M$ products and even some flavours of UNIX is its ability to work on spartan hardware.
    It makes sense to use cheaper stuff. However, if you plan to use defective DIMMs on Mission Critical machines, you probably have
    some defective ones in your head.
    It's up to you (just like everything else with Linux). If you want it, use it.
    If you don't, good for you.
    This is very good news for people in places like India (Where I come from) where the cost of 32MB Ram (EDO RAM) is about 1/3rd
    of the average person's salary.

    Hackito Ergo Sum.
    Liberte, Egalite, Fraternite, Caffinate.

    --
    Electrical Engineering is BORING.
  9. Re:sounds good for high-performance computing shop by ibpooks · · Score: 1

    In a situation like that, it costs far more for a company to pay an IT worker to swap RAM chips than it does to buy quality, top-of-the-line components the first time around. Not to mention, mission-critical systems should NEVER use crappy hardware.

  10. Re:No, this *is* good for production use! by Anonymous Coward · · Score: 1

    For 'real' fault tolerance, the program needs to repeatedly scan memory and mark areas as bad 'on the fly', rather than just doing it once at bootup.

    Maybe as often as every file load.

  11. Re:Now there's a point to the BIOS memory test? by Ex-NT-User · · Score: 2

    Actually the BIOS memory test does a "rough" memory test. The test itself is a bit different from one BIOS manufacturer to another, but for instance with Phoenix BIOS the test is as follows:

    Write a pattern to RAM.
    Read it back and compare it to what it should be.
    Write an aliasing pattern to RAM.
    Read it back and make sure you got what you expected.
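
    To show why the second pass matters, here is a small sketch against a simulated RAM with one address line stuck low. The `SimulatedRam` class and the address-derived pattern are assumptions for illustration, not the actual Phoenix routine.

```python
class SimulatedRam:
    """Byte-addressable RAM; optionally one address line is stuck at 0."""

    def __init__(self, size, stuck_address_bit=None):
        self.cells = bytearray(size)
        self.stuck_address_bit = stuck_address_bit

    def _decode(self, address):
        if self.stuck_address_bit is not None:
            address &= ~(1 << self.stuck_address_bit)
        return address

    def write(self, address, value):
        self.cells[self._decode(address)] = value & 0xFF

    def read(self, address):
        return self.cells[self._decode(address)]


def fixed_pattern_test(ram, size, pattern=0x55):
    """Write the same byte everywhere, then read it back."""
    for address in range(size):
        ram.write(address, pattern)
    return [a for a in range(size) if ram.read(a) != pattern]


def address_pattern_test(ram, size):
    """Write a value derived from each address so aliased cells disagree."""
    def expected(address):
        return (address ^ (address >> 8)) & 0xFF

    for address in range(size):
        ram.write(address, expected(address))
    return [a for a in range(size) if ram.read(a) != expected(a)]
```

    The fixed pattern cannot see the stuck address line because every aliased cell holds the same byte; the address-derived pattern can.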

    This will catch quite a few serious memory problems. The 2 cases I saw recently were:

    1. PC100 memory that wasn't quite up to par. (Dropping bits randomly)

    2. A friend of mine put a PC100 dimm in a mobo set for PC133 dimms. The PC100 ram worked.. almost.

    In both cases the results were random lockups and application crashes. Turning on the BIOS ram test quickly identified the problem. Which was resolved by putting quality memory in the box.

    These tests are only really useful the first time you boot your box or if you are suspecting bad RAM. (It's a quick way to test for serious memory problems without having to pull out a RamChecker)

  12. Re:At least it was flaimbait! by Signal+11 · · Score: 1
    Think about what you write before you do. Karma whoring is patently impossible since the karma cap was added. I can only lose karma until I get below 50. Got it?

    --

  13. Why bad ram may not be sold by Demon-Xanth · · Score: 2

    Say a manufacturer has a DIMM with a bad address line, visually it's a normal DIMM. They decided to sell it for a massively low price. Some unscrupulous buyer utilizing "sell and run" tactics buys a bunch and sells them as normal DIMMs. Buyers call the manufacturer complaining that the DIMMs they bought are bad and the retailer no longer exists.

    Note: This has happened to me, I bought two hard to find keyboards only to find both had water damage upon arrival (packaging still good) and the retailer disappeared.

    --
    If you think education is expensive, you should try ignorance -- Derek Bok, president of Harvard
  14. I'm not so sure about this by John+Jorsett · · Score: 2

    I can see a lot of obstacles to becoming a dealer in bad RAM, including the hassles of having to test it and characterize the nature and extent of its problems. In order for someone to know whether the price and product were right, s/he'd have to have some detailed info about the defects, and it would vary from stick to stick. I'd think that the dealer would have to list each stick independently, along with the defect info. The customer would have to buy each stick as a unique entity, presumably. The logistics of doing all this and keeping the inventory records up to date would be quite costly. The economics of this are questionable, IMO, although given my parsimonious nature, I'd love to be proven wrong.

  15. Re:Would you an O/S on a HDD with bad sectors? by rlowe69 · · Score: 1

    If RAM chips were damaged, there is potential that they get damaged further, and eventually your amount of usable RAM will run down to 0 bytes.

    In theory it's possible but VERY unlikely given the quality of RAM made today. But it also depends how the RAM was damaged. If it's a bad connection, the modules (I don't know the name of the chips on the DIMM) themselves could be good but they aren't connected properly. If the chips are from a poorly made batch THEN you may have this degradation problem you speak of.

    But apart from individual testing, how do you differentiate these chips from "good" chips used on "good" RAM? You can't, and that's why there are warranties.

    rLowe

    --
    ----- rL
  16. Re:Signal 11 no more? by Signal+11 · · Score: 2
    at least have the balls to say it logged in you piece of shit.

    CmdrTaco that really does not become you.

    --

  17. Bad Memory doesn't go to waste by IvyMike · · Score: 3

    There isn't this huge supply of bad memory out there (Radio Shack jokes aside) because memory manufacturers are pretty clever. Bad memory is put into things like:

    Audio storage devices, like answering machines and mp3 players, where a bit or two of failure will just end up as a teeny bit more noise.

    Cheap digital cameras (once again, a bad pixel here or there....)

    Toys. They actually call bad memory "toy memory" sometimes.

    SIMMS. You take (for example) 4 bad chips and 1 good chip and get the equivalent of 4 good chips (by replacing bad io's on the bad chips with io's on the good chip). There are jillions of ways to do this, and companies have pretty much done them all.

    Sell them at CompUSA to people who don't know any better. (Sorry, couldn't resist)

    If I were you, I'd download memtest86 right now.

  18. Re:If only it made sense.. by verbatim · · Score: 1
    if I found a 512M DIMM lying on the street -- and 448M still works -- I'm not gonna complain!!

    It just seems like a recipe for disaster. And if Murphy has anything to say about it, the memory will completely fail just as you finish that 100% Windows-compatible OS (and forgot to save as you went) and it will be all lost to that great processing cycle in the sky.

    Hey, if you don't mind putting low-quality, defective or damaged parts in your computer, be my guest. I'd rather be stable than wonder how long until my computer craps out (granted that this could happen for many OTHER reasons, but I don't need another reason).

    I can't see it being recommended to businesses or even consumers. Hackers... maybe that's the niche you'd be looking for. By Hackers I mean - people who push their computer systems beyond specification for the simple pleasure of being able to do it.

    Businesses want reliability - no business will buy "damaged" or "defective" parts. You might swindle them by calling them "refurbished", but that's stretching it.

    Consumers want something to "surf the internet" or to send pictures to grandma. They don't want the hassle of replacing their RAM every 4 months because the lower-quality memory gives out too much.

    Hackers... well.. we're a weird bunch... Most will find the idea interesting.. maybe set a system up to see how kewl it looks, but... nah.

    Whatever..

    --
    Price, Quality, Time. Pick none. What, you thought you had a choice?
  19. Re:Finally! by Yardley · · Score: 2

    This RAM idea is great. It shows the true spirit of open source. We can fix anything from broken RAM to Microsoft.

    Now what about something to make me burn less coasters? ;)

    There's new error-prevention technology available, but I believe it relies on hardware and software, to keep you from burning coasters.

    #1) Sanyo's BURN-Proof technology (available on the newest Creative, QPS, Plextor, LaCie etc. writers)

    #2) Ricoh's JustLink technology (available on its CD-R/RW/DVD-ROM combination drive among others)

    Both technologies automatically prevent buffer under-run errors, which are the leading cause of coasters.

    If I were in the market for a new burner, I'd go with the $349 Ricoh combination drive. It does 12x CD-R, 10x CD-RW, 32x CD-ROM, and 8x DVD-ROM all in one device. That's smart.

    --
    He lives in a world where those who do not run the client software of the omnipresent meme are unacceptable.
  20. Re:Bad Ram by imp · · Score: 2

    Actually, these sorts of bad-memory-marking schemes have been around for a long time. The trouble with them is that RAM isn't as deterministic as you'd like to believe. Once you get one or two bad pages in RAM, more tend to break rather quickly. This has been tried in the past and I think that these systems will still be flaky even when not touching the bad RAM.

    I've also seen machines that had bad ram lock up randomly, even when the bad pages are never touched.

    Let me just say that I have my doubts.

  21. Re:Now there's a point to the BIOS memory test? by juhaz · · Score: 1

    Uh, if those BIOS memory tests disturb you so much, why don't you just turn them off like the rest of us who don't want to wait?
    Put the quick post/boot/whatever option on, and no more memory testing wasting your precious seconds.

  22. Re:Why bother? by John_Booty · · Score: 2

    "It's worth doing because it keeps a working system up, and Linux should have that"

    Huh? Isn't ECC functionality handled in the BIOS, not the OS? So... Linux does have that functionality, eh?

    --

    OtakuBooty.com: Smart, funny, sexy nerds.
  23. Re:Oh, sure, Linux users are this desperate by dboyles · · Score: 2

    Doesn't this make Linux look like a throwback to those old days of hobbies, like Amateur Radio making QRP rigs in sardine tins?

    Sounds to me like you're describing what the true definition of "hacking" is. Let's see, if you can get a certain amount of RAM by doing a little hacking for less than you'd pay in a store, what's wrong with that? People do this in their everyday lives. As I type this, I have a penny in my car, wedged between the stereo head unit and the side where it mounts to hold the thing in place. No, it doesn't look pretty. Yes, it did the job (and the price was right).

    Perhaps in corporate, "everything must look nice and neat" environments, this isn't a valid solution for adding RAM. But for the CS student who has an old DIMM sitting around, it's pretty damn cool.

    --
    -- "Complacency is a far more dangerous attitude than outrage." -Naomi Littlebear
  24. Re:Signal 11 no more? by E1ven · · Score: 5

    I must be reading way too much Slashdot.. I read the headline, and thought
    "Of course Signal 11 is no more.. He left after a big blowout with Rob..."

    --

    This message brought to you by Colin Davis

    --
    Colin Davis
  25. What about intermittent failures? by Kelledin · · Score: 1

    This still doesn't cover the issue of intermittent failures, though. Think on this: if memory were obviously good or bad, then bad memory would probably not even pass the initial memory check in a system. Yet bad memory often does pass even the most strenuous checks, even with ECC enabled (I had this problem myself). And if the "good/bad" status falls about such a hazy line, how effective can any piece of software be at singling out a defective chip?

    This BadRAM patch will probably improve the situation, of course, but it can't pick out every bad chip thrown at it unless it checks every chip for eternity. Obviously, that's not going to happen...so...system reliability is still compromised, if only less so.

    Kelledin Tane, the Dreaming Minstrel

    http://kelledin.tripod.com/scovsms.jpg
    1. Re:What about intermittent failures? by fgodfrey · · Score: 2
      There are ways of addressing this issue too. The group I work in at SGI is responsible for (among many other things) Irix and Linux RAS features on our hardware. RAS is Reliability, Availability, and Serviceability. One of the things we observed is that the most likely case to get a double bit error (we have ECC memory in all our Origin servers/supercomputers) was grabbing a new page off the freelist and then bzero'ing it.

      The theory on why this occurs is that the memory on the freelist isn't being accessed (well, ok, we have some bugs occasionally, but... :) and it degrades because of this. Since you don't care about the data on the page, it kinda sucks to panic during the bzero. So, Irix, starting with 6.5.7, knows how to "nofault" this bzero operation and if it fails, it grabs a new page off the freelist and discards the bad page. This is a feature we are thinking of adding to Linux. Other types of pages for which this same recovery can work are mapped files (i.e., program and library text/read-only data) and clean user data (i.e., just swapped in and not yet modified). This is about the best way to solve the intermittent failure problem.
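
      Roughly, the recovery described above could be sketched like this. The `Page`, `MemoryFault`, and `alloc_zeroed_page` names are made up for the illustration and are not Irix or Linux interfaces; a Python exception stands in for the hardware fault.

```python
PAGE_SIZE = 4096  # assumed page size


class MemoryFault(Exception):
    """Stands in for an uncorrectable memory error hit while zeroing."""


class Page:
    def __init__(self, number, faulty=False):
        self.number = number
        self.faulty = faulty
        self.data = None

    def zero(self):
        if self.faulty:
            raise MemoryFault(self.number)
        self.data = bytes(PAGE_SIZE)


def alloc_zeroed_page(freelist, retired):
    """Take pages off the freelist until one zeroes cleanly.

    Pages that fault are recorded in `retired` and never handed out.
    """
    while freelist:
        page = freelist.pop()
        try:
            page.zero()
        except MemoryFault:
            retired.append(page.number)
            continue
        return page
    raise MemoryError("freelist exhausted")
```

      Nothing of value is lost when the bad page is discarded, because a page on the freelist holds no data anyone cares about.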

      One other interesting thing we've noticed with RAM is that failure rates over time stay pretty much constant since, while manufacturing techniques are getting much better, the memories are getting larger. This causes failure rates to stay nearly flat.

      --
      Go Badgers! -- #include "std/disclaimer.h"
  26. Linux is not the be all and end all by FallLine · · Score: 2
    If we ever want to see linux used in mission critical systems like air traffic control, embedded medical devices, or military applications, then projects like this are the key.


    IANAESE, but Linux will never be used in a life critical medical device, never mind implantable medical devices. Firstly, the FDA requirements are simply too strict to allow Linux's usage. Secondly, it's both overkill and underkill at once. Linux may be relatively efficient compared to systems like Windows, but it's not anywhere near small enough for traditional embedded systems. Third, Linux simply does more than it would ever need to, so why use it? Fourth, it's not set up for DSP type operations. Fifth, do you really want to unnecessarily trust your life to Linux just so you can make a statement?
  27. Re:Oh, sure, Linux users are this desperate by ackthpt · · Score: 2

    Yes, but are you really going to impress management to adopt Linux when demonstrating that it can run on a machine with defective parts?

    It's like saying, "This new Mercedes E320 is as good as the one without a dent in the door." Both run, both are equally safe (assuming it's just a superficial dent), but it just doesn't sell itself.


    --

    A feeling of having made the same mistake before: Deja Foobar
  28. Re:Chips may still not work by dstone · · Score: 1

    512MB sticks are still expensive, faulty or not.

    I've never seen faulty RAM advertised. Where are you seeing these prices? Online?

  29. Re:Finally! by AndyL · · Score: 1

    Well, a couple of them had blue backs. It's some sort of rainbow pack. They were pretty inexpensive too. I suppose it's possible they sell a couple of different types or qualities of media. (Come to think of it, the floppies I mentioned were probably rainbow colored too.)

    Oh well, I guess it's back to making colored labels with my Neato and hoping they don't peel off.

    -Andy
  30. BTW, a bit of ancient, related trivia by Mr+Z · · Score: 2

    By the way, here's some ancient related trivia. The INTV Productions video game cartridge "Triple Challenge" integrated the previously-released Chess, Checkers and Backgammon on a single game cartridge. In its original form, the Chess cartridge came equipped with a 1K SRAM onboard, as the game required extra memory.

    At the time INTV went to produce the Triple Challenge carts, they discovered that since RAM had grown in capacity over the years, 1K SRAMs weren't available in quantity for reasonable prices, and larger SRAMs were too expensive as well. They almost had to cancel the Triple Challenge cart.

    That is, until they found someone with a stack of 2K SRAMs, in which half the RAM was good, the other half was bad. Since the game only needed 1K, it ignored the bad half, and off they went.

    Cool, eh?

    --Joe
    --
    Program Intellivision!
  31. Re:Why bother? by Animats · · Score: 2
    Isn't ECC functionality handled in the BIOS, not the OS?

    On x86 systems, the memory controller handles the ECC error correction, and you get an interrupt which allows you to log the event. Often this interrupt is handled by the BIOS. But the BIOS typically doesn't do anything but log the event. The OS can do more; it can map the bad block out, probably without a shutdown.
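
    As a rough illustration of the "OS can do more" part, here is a sketch of a handler that counts corrected errors per page and maps a page out once it crosses a threshold. The class name, the threshold, and the page size are assumptions; this is not how any particular kernel or BIOS does it.

```python
from collections import Counter


class CorrectedErrorLog:
    """Counts corrected ECC errors per page and retires repeat offenders."""

    def __init__(self, retire_threshold=3, page_size=4096):
        self.retire_threshold = retire_threshold
        self.page_size = page_size
        self.counts = Counter()
        self.retired = set()

    def record(self, physical_address):
        """Log one corrected error; return True if the page is now retired."""
        page = physical_address // self.page_size
        self.counts[page] += 1
        if self.counts[page] >= self.retire_threshold:
            self.retired.add(page)
        return page in self.retired
```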

  32. Just how useful is this, really? by b1t+r0t · · Score: 5
    See, the thing about bad RAM and SIMMs/DIMMs is that they can test the chips before soldering them onto the circuit board. If they want, they can even test them before putting the chips in a plastic case. So if you have "bad" RAM, it's more likely to be a defect in the soldering process that renders the whole stick (or an entire column of data bits) useless, or bad contact with the socket.

    You'll probably get better results simply by cleaning off the contacts with a pencil eraser (remembering to brush away all the eraser dust first) and firmly re-inserting them into the socket.

    --
    "Open source is good." - Steve Jobs
    "Open source is evil." - Microsoft
    1. Re:Just how useful is this, really? by nkpatel · · Score: 1

      This has some pretty useful points in high availability situations. Instead of having the system barf when it encounters a bad page, it can mark it bad and the OS avoids using it. Then, during some 'scheduled' down-time, technicians can replace the bad DIMM(s). Something even nicer would be runtime deallocation of bad memory...

    2. Re:Just how useful is this, really? by b1t+r0t · · Score: 2
      Moreover, most of the problems I've seen with bad memory have been intermittent random failures related (apparently) to thermal stress and/or specific timing patterns, not fixed "sticky bits".

      Ah yes, timing and all that other rot. Two stories here.

      My first ever memory upgrade was many years ago, adding 4116 RAM to a TRS-80 Expansion Interface. I put it in and it didn't work. When I looked at the address range with the TRSDOS debugger, it contained random values that changed every time I looked at it! I managed to get it to work right by cranking the power supply voltage down into the low 4.x volt range.

      And a few months ago, I got some old IBM 72 pin 8MB SIMMs at a computer show that were probably pulls from old PS/2 machines. Some of them worked, some didn't. I didn't realize until a few days later that this was 80ns RAM, and the motherboard only supported 60/70ns RAM. I was lucky to get any of them to work right.

      And then there was all that talk about "cosmic rays" messing up DRAM, until eventually it was discovered that the radiation was coming from within the chip itself!

      --
      "Open source is good." - Steve Jobs
      "Open source is evil." - Microsoft
    3. Re:Just how useful is this, really? by egnor · · Score: 2

      Moreover, most of the problems I've seen with bad memory have been intermittent random failures related (apparently) to thermal stress and/or specific timing patterns, not fixed "sticky bits".

      I thought it was already relatively common for RAM manufacturers to test for single bit errors in the factory and route around the affected cell, which would negate the economic value of doing this in software. (They should certainly be incented to do this, otherwise they'd have terrible yield issues.)

      This sounds like the "bad block" detectors that used to be necessary for hard drives, but aren't any longer (hard drives these days remap bad blocks internally)...

    4. Re:Just how useful is this, really? by alhaz · · Score: 2

      Well, here's my story.

      I have a Toshiba ultraportable. Nice little box, 3 pounds, magnesium case, didn't cost a whole lot, nice bright screen, 96 megs of ram (maxed out)

      64 of that 96 megs of ram are on an add-on card. For whatever reason, the cost of this proprietary memory card for this particular notebook has skyrocketed.

      I'd had the notebook about 10 months when, out of the blue, it started rebooting spontaneously. I fired up memtest86, and there were a few chunks of bad ram.

      I take it out and start frantically searching for a source of a replacement. Currently, Kingston is the only manufacturer i can find shipping it, and they want 1/2 what the whole notebook goes for on eBay, including the memory card. About $350 for 64 megs of ram. No Effin Way. Just not going to happen. I'm not poor but that's just plain stupid. I'd be disgusted with myself if i spent that much on that little memory when i'd be better off selling the whole notebook and buying another.

      So i found the badram patch. Patched my kernel. Found that my lilo was too old to allow the whole commandline. Downloaded and installed new lilo.

      And, it doesn't work.

      Well, it sort of works. Now, with that ram, it randomly locks up, instead of randomly rebooting. Big improvement, right? Wrong.

      I don't know what the problem is. A friend with a background in the semiconductor industry says that memtest86 is written from an outsider's point of view regarding memory, but that he's under a prior NDA, and Motorola would probably be Quite Upset if he leaked old documents that would tell the author how to improve it. Maybe i just need a better memtest86. Maybe i ought to expand the ranges so that an area around each affected area is also blocked out. I don't know.
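
      For what it's worth, widening the blocked-out ranges is easy to compute from a list of failing addresses. This sketch pads each bad page by a number of neighbouring pages and merges the overlaps; the function name, the 4096-byte page size, and the output format are assumptions, and it does not produce the BadRAM command-line syntax.

```python
def padded_bad_ranges(bad_addresses, pad_pages=1, page_size=4096):
    """Return merged (first_byte, last_byte) ranges covering each bad page
    plus `pad_pages` pages on either side."""
    pages = sorted({address // page_size for address in bad_addresses})
    ranges = []
    for page in pages:
        low = max(0, page - pad_pages)
        high = page + pad_pages
        if ranges and low <= ranges[-1][1] + 1:
            # Overlaps or touches the previous range: extend it.
            ranges[-1][1] = max(ranges[-1][1], high)
        else:
            ranges.append([low, high])
    return [(low * page_size, (high + 1) * page_size - 1) for low, high in ranges]
```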

      All I know is, in my case, it didn't really help. And that a notebook with 32 megs of ram and a really slow harddrive is useful for little more than an xterminal.

      --
      This is just like television, only you can see much further.
    5. Re:Just how useful is this, really? by joekool · · Score: 1

      Just curious: how does bad RAM affect your hard drive? I ask because of the unexplainable (by me) problems I have been having. I was chalking them up to a bad IDE interface, and was gonna get a new motherboard (want to upgrade to an Athlon anyway), but I would love to find out beforehand that it is my RAM, as I hate buying a replacement part only to find out it was the wrong replacement part!

      --

      Slackware: old school feel, new school gear.
  33. Re:If only it made sense.. by compwizrd · · Score: 1

    from reading the owners manual on the ceo's seville sts at work, it's designed to run in a case of total coolant loss. Runs on 4 cylinders, and pumps air through the other 4. Flip back and forth as needed, to keep the system cool enough to limp home.

  34. MFM Hard Drives Revisited by evilviper · · Score: 1

    Does anyone else see this as a step backwards? Don't you remember getting old hard drives with a map of all the bad sectors printed on them? Does anyone want to repeat those days using memory this time around? As with hard drives, one failing section is not alone... This kind of project encourages use of questionable memory sticks which are bound to bring down systems without warning. I will not use faulty memory for the same reason I won't overclock my processor (at least not my main system's CPU): there is more to it than price! Reliability, longevity, stability, etc. are all in question when you push a PC where it was not meant to go.

    --
    Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
  35. The patch is quite old... by mw · · Score: 1
    I have had a defective RAM module in use for nearly a year now using this patch, with no problems so far. The 128 MB RAM module has exactly 3 reproducible memory defects.

    Before using the patch I had quite some problems with this system when it was under load (the defects are located in the upper address range of the RAM). It has worked very well since I applied the patch.

  36. Yes, you can have more than one root in linux by ShortSpecialBus · · Score: 1

    Just FYI, you can have more than one root user in Linux unless I am way out of the ballpark. Just specify -u 0 when you add the user (useradd also needs -o to accept a UID that is already in use). This effectively sets the userid of whatever user you just added to the same as root, so that user can do whatever root can, but they're not technically root, as they don't have the 'root' username. They have their own.
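
    If you want to check which accounts share UID 0 on a box, something like this works on passwd-format text. The function name is made up for the example; the colon-separated layout with the UID in the third field is the standard /etc/passwd format.

```python
def uid_zero_accounts(passwd_text):
    """Return the login names whose UID field is 0."""
    names = []
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) >= 3 and fields[2] == "0":
            names.append(fields[0])
    return names
```

    On a live system you would feed it the contents of /etc/passwd, for example `uid_zero_accounts(open("/etc/passwd").read())`.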

    --
    //FIXME: Bad .sig
  37. Re:Hello this was on Kernel Traffic a long time ag by jarek · · Score: 2

    Well, I read kernel traffic and kernel mailing list but somehow this escaped me. Thanks slashdot. /jarek great stuff btw

  38. Re:Finally! by delysid-x · · Score: 1

    The only coasters I ever made were because of defective cheapass BASF cd-r's... They were total crap. Most of the ones that burned fine and worked after burning now no longer work (some chemical process? who knows?) Never had a coaster with the other brands I use, Sony, Maxell, Memorex and Fuji

  39. Actually, this has real-world applications... by Troy+Baer · · Score: 1

    This is very useful for production systems with very large amounts of memory. For instance, Cray systems have a capability where bad bits in memory can be "flawed out" on the fly. Extending Linux to support the same kind of thing (especially in combination with ECC memory!) would be very useful for shops that have big memory requirements and need as many 9s of uptime as they can get.

    --Troy

    --
    "My life's work has been to prompt others... and be forgotten." --Cyrano de Bergerac
    1. Re:Actually, this has real-world applications... by ackthpt · · Score: 1

      Troy,
      There's a reason many architectures hang when bad memory is found. If it can be corrected, great. If it can be mapped out on the fly, great. But if it took 1 byte and threw it into the Cuisinart of rand(), be it data or code, you'd probably want to know about it, particularly if it were for research, medical (there's a happy thought, eh?) or, heaven help you, you are Bill Gates hanging around in front of a congressional committee and your IE flakes out as you argue that removing IE from W98 would make the OS unstable. You'd probably like to know that it was a hardware failure and which piece it was so you could replace it and get on with things.


      --

      A feeling of having made the same mistake before: Deja Foobar
  40. Re:If Linux works with crap, that's all IT will gi by buckrogers · · Score: 1

    Memory is only defective if it doesn't return what you earlier put into it... With this patch the memory _will_ return what was put into it. Hence, the memory is _not_ defective under Linux with this patch.

    This type of thing is done with every hard drive in existence today (even production hard drives) and is such a non-issue that most people aren't even aware of the problem. Running scandisk under Windows will show you the blocks that are currently mapped out. Running mke2fs with the -c flag will map out bad blocks during formatting under Linux.

    This is a great patch! Good job.

    The only improvement that I can see is if it could add additional addresses to map out "on the fly" as they are found to be bad during operation.

    This would kill the program that hit the bad memory, but would otherwise let the computer keep running.

    --
    -- Never make a general statement.
  41. How to find bad ram cheap by The+Dev · · Score: 2

    Now where can I pick up some faulty-but-fixable 512MB RAM sticks?

    Oops, now you can't :(

  42. Re:reliable ? by Blrfl · · Score: 1
    cookieman writes:

    Only if the DIMMs don't degrade further slowly, right?

    Actually, building in support for on-the-fly memory testing might not be a bad idea. I've seen it done elsewhere and can't think of a good reason not to add it as an option to Linux. (Or other Unixes. You listening, Sun?) Even good RAM will eventually go south, and being able to prevent the system from using it could prevent critical systems from crashing. It might not be able to prevent every crash, but might prevent a few.

    If I had the time to experiment, I'd add software to the Kernel that periodically pulls a page of RAM out of service and uses idle time to test it out, returning it to service or marking it permanently bad.

  43. Sounds like a good plan..... by soulsteal · · Score: 1

    Windows kills the memory, Linux nurses it back to working health....

    1. Re:Sounds like a good plan..... by soulsteal · · Score: 1
      Shut the f**k up a**hole. Look at what you just posted. It is neither funny, insightful, interesting, or informative. All that was was a LAME ASS attempt at FIRST POST.

      Just to let you know, personally I thought it was funny. And apparently some others did too. I agree it's not funny enough to warrant a karma point. But if it entertains a few, why not just leave it be, smile and go on your way? You're just a bitter little bitch. So what if you don't like it? That's why I opted NOT to use my +2 score for this post. Why risk karma on a so-so funny post? I wouldn't be hurt if all my karma disappeared, but I do enjoy moderating. Besides, you're just mad 'cause I got the second post and you didn't!

    2. Re:Sounds like a good plan..... by Enahs · · Score: 1

      If this is the real Siggy...here ya go.

      Read your post. Here--I'll quote it.

      /*
      Shut the fuck up asshole. Look at what you just posted. It is neither funny, insightful, interesting, or informative. All that was was a LAME ASS attempt at FIRST POST.
      */

      Yeah, it was lame. But who died and made you God?

      //Get a GODDAMN LIFE.

      Read above line.

      --
      Stating on Slashdot that I like cheese since 1997.
  44. Does Slashdot readership know nothing of hardware? by slothbait · · Score: 5

    I'm amazed by how little this crowd knows about the details of semiconductor manufacturing. Defects are unavoidable! There, I said it. With the transistor sizes that we are pushing today, a speck of dust ruins an entire block. All you can do is *limit* the extent to which this happens by being as strict as possible with your clean room. But *some* contaminants will always get through. Perfection is unachievable. You have to accept this.

    Alright, so we've accepted that some dies are necessarily going to be damaged. Why not make the hardware such that it can resist imperfections? Well, actually we do. RAM, being as simple and homogeneous as it is, lends itself well to this approach. Here's the idea: you add extra "blocks" of memory to a decode line. Then, if one of the "regular" blocks is destroyed by a process imperfection, the post-fab die can be modified with a laser to reroute data to the extra backup block. So you invest some die room in backup structures, so that a die with only a few errors can be "corrected" and will still function as intended. This is basically like keeping a spare tire. If you get one blowout, you're still in business, but two and you are in trouble. Of course, you can package as many extras as necessary, but it may not make economic sense. Here you calculate the appropriate trade-off between die size and yield to make the decision.

    Anyway, long story short: your DRAM is already "bad". Quite a few RAM chips contain process errors that are rerouted around in hardware so that you, the consumer, need never know. To you, the process is transparent. All you should care about is that you get your *functional* RAM cheaper, because the manufacturer would have had to scrap that die otherwise.
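
    The spare-block idea can be sketched as a small remapping table. This is a toy Python model with made-up names (RepairableArray, repair, physical_row); real parts do the rerouting in hardware after fabrication, not in software.

```python
class RepairableArray:
    """Toy model of a memory array with spare rows available for repair."""

    def __init__(self, rows, spares):
        self.rows = rows
        # Spare rows sit physically after the regular ones.
        self.spare_rows = list(range(rows, rows + spares))
        self.remap = {}  # defective logical row -> spare physical row

    def repair(self, bad_row):
        """Reroute a defective row to a spare; return False if none are left."""
        if bad_row in self.remap:
            return True
        if not self.spare_rows:
            return False
        self.remap[bad_row] = self.spare_rows.pop(0)
        return True

    def physical_row(self, logical_row):
        """Row that is actually accessed for a given logical row."""
        return self.remap.get(logical_row, logical_row)


chip = RepairableArray(rows=8, spares=1)  # one "spare tire"
first = chip.repair(3)    # first blowout: rerouted to the spare
second = chip.repair(5)   # second blowout: no spares left
```

    With a single spare, the first defect is hidden from the user and the second is not, which is the spare-tire trade-off between die area and yield.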

    This post discusses software "rerouting" around blocks that had more errors than could be corrected in hardware, but somehow still made it out the door. What's wrong with that?

    Will semiconductor manufacturers suddenly think "Gee...let's not worry about yield anymore?" You'd better bet they won't. And even if they did, if the software rerouting is so clean as to not be noticeable (which is the only way it would fly), what do you care? You'd get your RAM cheaper.

    --Lenny

  45. Finally! by AntiPasto · · Score: 3
    I've *always* thought that software could make up for bad hardware (err... well I guess that's the point to bad sectors marked on disks, fault tolerance, and network routing)... but this is getting back to basics in a great way. Now what about something to make me burn less coasters? ;)

    ----

    1. Re:Finally! by karnal · · Score: 1

      The coaster issue you're talking about should be solved with the purchase of a Plextor 12/10/32 (read some reviews on it... dear god they're sweet...) :) good luck!

      --
      Karnal
    2. Re:Finally! by tzanger · · Score: 1

      Now what about something to make me burn less coasters? ;)

      • Stop kicking the computer
      • Use better media
      • Use a better CDR
      • Try not to do anything super CPU or disk intensive while burning*

      I use probably the cheapest media available (won't burn right at 4x), use a Yamaha IDE CDR with a 2MB cache and I've only ever burnt one coaster.. And I haven't a clue how it happened. Law of averages I guess. :-)

      Seriously though... something's really wrong if you're burning lots of coasters.

      * I've never had a problem with this since Linux and cdrecord seem to keep cdrecord as a high priority both CPU and I/O wise, but maybe I'm just lucky

    3. Re:Finally! by 11223 · · Score: 2

      Hrmn, only burnt one coaster? I've burnt quite a bit more, but they were all my fault: if the ISO is set up correctly (hah!) then I never make a single coaster. And this is on BeOS, where I don't even get to set the priority of cdrecord. And I've got an off-brand CDR drive.

    4. Re:Finally! by Aerolith_alpha · · Score: 1

      I use Iomega zipCD 4x, any media i can get my hands on--good, bad, in between--never burnt a coaster that wasn't the fault of my old dusty ass dorm room... sometimes the spools would get dust in them, and when i was in a hurry i would forget to check the disk before i put it in to burn--there would be these little 'clean' (not burned) spots on the disc afterwards where the dust was... needless to say, the disk wasn't readable.


      --


      mov ax, 13h
      int 10h
    5. Re:Finally! by ryusen · · Score: 1

      Ever tried to install RedHat 7 in a system that boots from a Promise IDE controller? i think this was a driver problem for all Linuxes... i had similar problems with Corel Linux on an A7V with a built-in Promise controller... new BIOS update supposedly fixes it, but i haven't had time to check yet

      --

      I believe sex is highly over rated... unless it involves me
    6. Re:Finally! by Suppafly · · Score: 1

      so you gonna sell the old cdr cheap then right? now we just need a /. classifieds section..

    7. Re:Finally! by twilightzero · · Score: 1

      Really? I've been using Verbatim media for YEARS and I've never ever burned a single coaster on them (other than one that had a flawed source cd) - you ARE talking about the ones with the kewl blue backs aren't you? Anyway I even worked at a radio station that switched to nothing but Verbatim after their first batch of it they were that impressed. I've used them on countless burners and never had a problem that was the verbatim CD's fault. As for colored ones, they're probably some lame-ass attempt at marketing and should be avoided at all costs. Find the kewl blue-backed CD-R's and you have found true happiness, Grasshoppa ;)

      -----Begin Geek Code-----
      Version 3.12
      GCS/MU/S>AT d--(?) s:-- a-- C++++ UL++>$
      P- L+++ E---- W++ N-- o- K+ w--- O+ M--
      V-- PS++(+) PE-() Y+ PGP- t+++(*) 5--(-)
      X(-) R+ tv(@) b++++ DI+++ D+++ G++ e+(*)
      h---(++) r+++ y+++(*)
      ------End Geek Code------
      Decoder is found at http://www.ebb.org/ungeek/ and original code, including instructions on how to make your own, can be found at http://www.geekcode.com - happy hacking!

      --

      "Christ what a design! I could eat a handful of iron filings and PUKE a better emergency pump than that!"
    8. Re:Finally! by Teancom · · Score: 1

      Can't set the priority in BeOS?? Download ProcessController from bebits and love life! It slices, it dices, it also happens to do everything that pulse does, but also kills, renices, and cuddles up to all your processing needs. In other words, go now :-)

    9. Re:Finally! by RobNich · · Score: 1
      Ha! with my HP 7200 Series (2x2xSomething) I play Quake3 while burning audio and data cds. Hell, I burn MP3s to AudioCD and play Q3 in the meantime. It makes Q3 a little jumpy, but I haven't made a single coaster in 2.5 years. My setup:
      • NT 4.0 SP6b
      • Dual PII-400
      • 256 MB PC-100 RAM
      • 9G SCSI 7200 RPM Quantum HD,
      • 32G UATA-66 7200 RPM IBM DeskStar HD (where my files are),
      • CD Burner on IDE (as slave, master is Plextor CDROM).

      It should be this easy for everyone...
      --
      Hello little man. I will destroy you!
    10. Re:Finally! by Defiler · · Score: 1

      Remember when NT required unusual hardware to run, no VESA, sound cards often didn't work, etc? Times sure have changed.

      Oh yeah.. Times sure have changed. Ever tried to install RedHat 7 in a system that boots from a Promise IDE controller?

    11. Re:Finally! by Yardley · · Score: 2

      If major speed is on your mind, Yamaha just announced some 16x writers. In conjunction with Oak Technology, Yamaha is bringing out a 16x/16x/40x CD-R/RW and just came out with (in Japan and parts of Europe) a 16x/10x/40x CD-R/RW.

      "Yamaha first to market with 16X CD-RW drive designed around Oak's controller that reduces CD burn time to under 5 minutes"
      16X Write
      16X ReWrite
      40X Read / Audio Ripping

      Yamaha's CRW2100:
      16X Write
      10X ReWrite
      40X Read / Audio Ripping

      These drives use an 8MB Memory Buffer for their high speed and to avoid buffer under-run. I can't find any indication if they use either Sanyo's or Ricoh's error prevention technology. I don't think they do.

      An interesting article on Plextor's newest drive talks about a newer form of BURN-proof and also JustLink hints that 24x write drives may be down the road.

      --
      He lives in a world where those who do not run the client software of the omnipresent meme are unacceptable.
    12. Re:Finally! by GoRK · · Score: 2

      It's called BURNproof and it's a Hardware/Software combo deal. The new Plextors support it and other drives can't be that far behind. One time I accidentally started burning about 50K extremely small files onto a CD on the fly on my new burner at 12X before realizing OOPS it's gonna UNDERRUN! Then it underran, the light kicked from blinky yellow (write) to green (you just underran you commie pig) and then back to blinky yellow (foiled again!)

      Amazing!

      Now if you're trying to correct a meatspace error you're having, that's a different story. Use rewritables!

      The reason you couldn't do this on a normal burner is that the change requires a bit of extra code in the firmware to handle the ready to restart and a laser that can switch from read to write very fast. CDRW drives can (and do) correct errors such as this when they are writing to RW media.

      ~GoRK

    13. Re:Finally! by xplosiv · · Score: 1
      Have you checked the new Plextor writer? I got one of these babies. It uses BURN-proof technology, licensed from Sanyo I believe, which will allow a CD writer to pause when the buffer is low. This is the first time I bought an IDE CD-RW (I am a SCSI user) because of this. I was unable to create a single bad CD, no matter how hard I tried (i.e. I tried on the fly mp3->cdaudio encoding on a P266). I even tried ejecting it, and after I put the CD back in, Nero asked me if I wanted to resume my ISO (was burning FreeBSD). Not to mention, you can't beat the speed of this machine, 12x write, 10x rewrite.

      For more technical info about this BURN proof technology, check out http://www.plextor.be/english/technical/burnproof.html

    14. Re:Finally! by AndyL · · Score: 1

      I recently bought a 10-pack of Verbatim disks. I grabbed 'em because they had these nifty thin cases instead of the jewel cases. (also I figured if they were colored I could remember what was on 'em until I got around to labeling them.)

      Out of a pack of 10 I've found 4 defective ones and I've still got three I haven't tried!

      I should have known better. After all the problems I've had with Verbatim floppies.

      -Andy
    15. Re:Finally! by RAruler · · Score: 1

      Really? No wonder I can't install Linux on my new computer.

      --
      Insert Witty Sig Here
    16. Re:Finally! by TheCuban · · Score: 1

      I have one, and yes... It rules ass. I've never burned a coaster EVER!

      --
      cuban
    17. Re:Finally! by e-Motion · · Score: 1

      There has been a recent development for CDRs that supposedly protects you from buffer underruns. It's called "Burn Proof" technology, and it allows a CDR to pick up right where it left off when you buffer underrun. I've never used a "Burn Proof" CDR myself, but the thought of getting one to replace my old 2x burner did cross my mind. This technology is fairly new, and is hardware only (I believe Sanyo came up with it, but Plextor, and possibly others, has a drive out that uses this technology). Look ma, no more coasters! =)

      More on burn proof technology
    18. Re:Finally! by ibpooks · · Score: 1

      Yes they are, if they aren't DOA like mine was. I'm going on week 6 and still haven't gotten it replaced.

  46. This IS good for Linux's rep by cirne · · Score: 2

    No, it seems to be more like "Your new Ford Explorer will automatically check your tires and reinforce any weak spots it finds". Honestly, (in my experience) integrated circuits are generally the second part of any electronic component to go, after moving parts (of which computers have very few). So, when memory goes bad, what would you rather do? Have the computer fix it for you, or go out and buy a new module? I personally have a pair of SIMMs which aren't in top shape anymore... since I don't have the money, the time, or the ambition to get replacements, I just put them in a low-load router box, which occasionally gets rebooted to clear out memory problems. I'd much rather have the system check my memory for me. So, unless someone decides to put out a major FUD about it, I don't see how it could be advertised in any but a good light.

  47. Re:anti-linux? by Mars+Saxman · · Score: 1

    Even those of us with fancy nice-paying jobs don't necessarily have $135 for a 256MB DIMM.

    This sort of thing has been done with hard drives for ages, and it's about time someone did it with RAM. Reminds me of that supercomputer some bunch of Hewlett Packard engineers built out of defective processors.

    -Mars

  48. Bad Ram by AntiTuX · · Score: 1

    To be honest, doing this does nothing but ENCOURAGE the manufacturers to continue to make bad ram. Why not make these manufacturing companies make GOOD ram so we don't have to do this in the first place?

    1. Re:Bad Ram by ryusen · · Score: 1

      without repeating the already stated technical issue of silicon yields... maybe this will be the saving grace for rambus and their bad yields...

      --

      I believe sex is highly over rated... unless it involves me
    2. Re:Bad Ram by Dastardly · · Score: 1

      They do make good RAM. Something like 99.99% of SDRAM chips are good. But when you are talking hundreds of millions of units per year, that is still 10,000 bad chips per 100,000,000. And that is just chip production: some get damaged during packaging, others during soldering onto a board, others during shipping. More will be aftermarket returns from people who have killed their RAM for one reason or another.

      Dastardly

    3. Re:Bad Ram by rlowe69 · · Score: 2

      To be honest, doing this does nothing but ENCOURAGE the manufacturers to continue to make bad ram. Why not make these manufacturing companies make GOOD ram so we don't have to do this in the first place?

      I think this comment is a result of not knowing how difficult it is to make "good" RAM (or *any* good electronics for that matter).

      The complex process behind making RAM means that there will ALWAYS be defective ones in the batches that can't meet standards set by the manufacturer to cover their ass on warranties, etc. If they are going to end up throwing this hardware out (or recycling the pieces if that's cheap) then they might be able to make more money selling the defective RAM.

      While having "partially defective" RAM on the market may seem bad, if the price point is right it could be useful for some people. Like if I could get 512MB with 50MB defective for 100 bucks (w/ maybe a 1 year warranty on the 462), I'd jump on it in a second. But that's just me.

      --
      ----- rL
    4. Re:Bad Ram by mmontour · · Score: 2

      Don't RAM chips already have some internal ability to map out bad areas, like the way that IDE/SCSI hard drives come with spare sectors that are automatically mapped in when a regular one fails?

      [google]

      DRAMs typically improve yield by using spare rows and columns to replace those occupied by defective cells. The repairs are performed using current-blown fuses, laser-blown fuses, or laser-annealed resistor connections. [Coc94] references one case in which memory repair increased the yield from 1% to 51%.

      from http://www.cs.berkeley.edu/~rfromm/Courses/SP96/cs294-4/project1/dram-test.html

    5. Re:Bad Ram by Lawbeefaroni · · Score: 1
      The problem I see is that manufacturers may use it as a way to jack up prices for bad RAM. Kind of like the way chips are sold: You make a ton of what you want to be 900s, say, and the ones that don't make the cut (bad) are sold as 800s, 750s, whatever clockspeed they are stable at. What's to stop a 512 card being sold as a viable 256 for Linux systems? So instead of dirt cheap/free damaged 512s, you're getting 256-priced damaged 512s.

      Granted, there was little use for damaged memory before, but now you'll have to pay for it.

      --
      "When it rains, it pours." --Morton's Salt
    6. Re:Bad Ram by bmongar · · Score: 1

      It could actually lower the price of good RAM by reducing the amount of padding needed to make up for the faulty RAM

      --
      As x approaches total apathy I couldn't care less.
    7. Re:Bad Ram by Abcd1234 · · Score: 2

      I think the point this guy is trying to make, on the economic side of things, is that there's a limit where the ratio of good to bad RAM is as high as it's going to be. IOW, there's always going to be a certain percentage of bad RAM in a given production run. So, why not make this semi-broken RAM viable, by selling it cheap for commodity PCs. This could help reduce prices on cheap PCs and make computing power and the Internet more accessible to those with tight funding (poorer folks, libraries, schools, non-profit orgs).

    8. Re:Bad Ram by kidlinux · · Score: 1

      Forget bad yields and whatever else those people talk about in the manufacturing process. What about RAM that becomes faulty just from extensive use and old age? Or even damage caused by the user?

      --
      -kidlinux.
  49. Re:If only it made sense.. by tzanger · · Score: 1

    Don't know much about x86 architecture, do you?

    It has to "mark" these bad memory sectors somewhere - I'll need to look at the thing to be more informed on HOW it marks these sectors - probably in memory.

    The exact register set escapes me at the moment but x86 processors (and indeed any processor with an MMU) keep track of which pages are there and not there in hardware. The descriptor tables are kept in memory regardless of whether they're used for marking dirty pages or bad pages.

    Also, wouldn't an entire page of memory be wiped out - not just one bit? I haven't looked at what these guys have done, but I wouldn't be surprised if entire 64KB pages are affected if only one bit in that page is gone.

    Yes, I would assume they are marking PAGES of memory. So 64k (I thought they were 4k? I know there's a granularity bit to set this) chunks are taken out of the memory... How is this any different than the 4k clusters being taken in your ext2 FS? 64k in a (minimum) 32M DIMM is a drop in the well.
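
    For a sense of scale, here is a small Python sketch of how much of a DIMM is lost when whole pages are taken out. The fault addresses are invented, and the 4 KB page size is an assumption (the usual x86 page size); the 64 KB case is included only because it was raised above.

```python
DIMM_SIZE = 32 * 1024 * 1024  # the 32M DIMM from the comment above


def pages_lost(bad_addresses, page_size):
    """Set of page numbers that contain at least one faulty address."""
    return {addr // page_size for addr in bad_addresses}


def fraction_lost(bad_addresses, page_size, dimm_size=DIMM_SIZE):
    """Share of the DIMM given up when each affected page is mapped out."""
    return len(pages_lost(bad_addresses, page_size)) * page_size / dimm_size


# Hypothetical fault list: two faults in the same 4 KB page, one elsewhere.
bad = [0x00123456, 0x00123FFF, 0x01A00000]

loss_4k = fraction_lost(bad, 4 * 1024)     # 2 pages of 4 KB
loss_64k = fraction_lost(bad, 64 * 1024)   # 2 pages of 64 KB
```

    Either way the loss is a fraction of a percent of the module, which is the "drop in the well" point.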

    As I said before, I think it's really cool what they have been able to do. There may be some niche areas for this program to be useful. It is not, however, a good thing (IMO) to be buying bad memory just to save a buck.

    Yes, it is a cool thing and will help those who either can't afford or can't wait to get their new memory in. Personally I won't use the module myself but that is no reason to go blasting it like you have. There aren't any speed hits, there aren't any vast wodges of memory taken up by it, you say it's buggy but that remains to be seen (the patch seems simple enough)... I'm just trying to figure out why you're so upset about this.

  50. Re:Hello this was on Kernel Traffic a long time ag by ttyRazor · · Score: 2

    This is news to ME, and I'm glad it was here to hear it. Sites like /. are meant to bring attention to a wide range of topics, while others aim to provide prompt coverage of narrower topics. Sure, it's annoying to see a story about something you already heard elsewhere a while ago, but it's important for those that missed it the first time.

  51. Great, deliberate instability :-/ by Nick+Driver · · Score: 2

    Now we have a way to deliberately make Linux unstable.... if you subscribe to the theory that if a DIMM has bad areas then that increases the probability that more of its areas will fail in the future.

    1. Re:Great, deliberate instability :-/ by AndyL · · Score: 1

      They're not talking about chips that just fail on their own.

      They're talking about chips that were defective to begin with. Chips that were malformed during manufacture. Manufacture only happens once, so they're not likely to malform again.

      If you could get chips that failed their final testing but were >99% ok you could use them this way. If these chips are available somewhere you can probably get them at a tremendous discount. So instead of paying $zillion for 128mb, you'd pay a few bucks for 128mb minus half a dozen bytes.

      If anything putting more RAM in your machine (for the same price) would give you more stability for your dollar, not less.

      -Andy
  52. One ignored potential use by Vicegrip · · Score: 1

    Another use:
    Making usable older machines that are donated to the 3rd world which come with technologies that often are so totally obsolete that replacing them is prohibitive.

    In this case the idea is a very good one. A machine that would have been otherwise useless is made fully functional by Linux and what seems an ingenious way to fix the problem.

    --
    Do not spread "09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0" over the internet, thank you.
  53. reliable ? by cookieman · · Score: 2

    Only if the DIMMs don't degrade further slowly, right?
    Anyway it's a nice thing.

    --
    Just another coder...
    1. Re:reliable ? by crt · · Score: 1

      Doubtful - the test run is fairly slow, especially for a large amount of memory. It would be like doing a massive fsck every time you boot.

    2. Re:reliable ? by Lennie · · Score: 1

      Only if you can think of a way that works without losing performance. Great would be to be able to 'detect' bad memory, maybe when you get a 'corrected ECC error' or similar you could start by marking it bad and testing it later?

      --
      New things are always on the horizon
    3. Re:reliable ? by BradleyUffner · · Score: 1

      hmmm.. Good point, I've never actually run the test. But your fsck comment gave me an idea. What if the memory test was done at the same time as fsck after the computer crashed? I doubt fsck takes up much of the computer's bandwidth because the drives are so slow. It should easily be able to check the memory during that time. Since bad memory is likely to cause the system to crash, why not examine it just like the filesystem after it does crash?

    4. Re:reliable ? by BradleyUffner · · Score: 1

      I'm guessing that the memtest86 discussed in the article will be integrated into the boot up of the system. That way if more memory goes bad it will detect it, block that part out, and reboot. I guess that eventually you will have to replace the chip, but it's better than paying the full price.

    5. Re:reliable ? by avandesande · · Score: 1

      Chips are graded first by eye (microscope) before they are 'packaged'. Chips with mass abnormalities won't make it to packaging (putting the pins on and encasing in plastic). The chips that fail on the test bench probably only have a few minor errors - there is no reason why they would multiply during use.

      --
      love is just extroverted narcissism
    6. Re:reliable ? by King+of+the+World · · Score: 1
      Recently I did a RAM check and it took 15 minutes.

      This is 128 megs at 133 MHz.

      That at boot-time would be fscking annoying.

  54. How does this even classify as news? by Pimpy · · Score: 1

    This patch has been around for quite some time, how does this classify as news? And since when did Slashdot start profiling kernel patches? Maybe they should set up a kernel patches section and archive the few hundred that are around in circulation that do something useful.

  55. Oh, sure, Linux users are this desperate by ackthpt · · Score: 5

    Doesn't this make Linux look like a throwback to those old days of hobbies, like Amateur Radio making QRP rigs in sardine tins?

    "Hello, Kingston, I'm looking for any old cruddy defective RAM, got any? Uh.. No.. I won't be reselling it to Linux users, I swear that I am with a major US ISP and we want to put it into our servers! Call Rambus, you say? Hello? Hello?"


    --

    A feeling of having made the same mistake before: Deja Foobar
    1. Re:Oh, sure, Linux users are this desperate by SevenSeasOfRhye · · Score: 1

      Your post makes you look like someone sucking up to the suits!!
      People still write code because they need it (and think others may too).
      Concentrate on the technology, not marketing (let's leave that to people like ESR if we want it at all)
      If people want this, they'll use it.
      If not, they won't use it - it's that simple


      --
      Electrical Engineering is BORING.
    2. Re:Oh, sure, Linux users are this desperate by gaudior · · Score: 1

      Not when I'm requesting new hardware, but it is an important point when I have to make do with whatever crap I can scrounge. We regularly start building stuff on crap, and once it's rolled into production, we justify the expenditure for new/better hardware.
      --

    3. Re:Oh, sure, Linux users are this desperate by Ed+Avis · · Score: 2

      The idea of defective hard disks has been around for years. Modern disks will map out defects automatically so they're hidden from you, but they're still there (a lot of the time). So if you don't have to throw a disk away because of one error, why do the same for memory? People didn't suddenly junk all their Pentiums when the F00F bug was discovered, they just worked around it in software.

      Still, I can appreciate there is a psychological problem with knowing that your system is not 100% flawless - plus, if RAM has some bad bits, might it not have others you don't know about?

      The only way BadRAM would take off, I believe, is if RAM manufacturers started shipping each DIMM with a list of known defects, as used to be done for disks. At present, a single defect means the RAM is not used, so the only defective memory modules are dodgy no-name ones you might not want to trust anyway. If, OTOH, the manufacturer guarantees that there are no flaws other than the handful given in the defect list, there'd be no reason not to use the memory provided you trust the manufacturer.

      --
      -- Ed Avis ed@membled.com
    4. Re:Oh, sure, Linux users are this desperate by dAzED1 · · Score: 1

      I'd be more impressed with on the fly patching of bad memory to keep a server going rather than having it hang. That would be a selling point. Wouldn't this be an obvious first step towards having it work on-the-fly? We all already have scripts in our crontabs monitoring the system health, things watching other things and making everything work. I'd imagine that it would be an eventual possibility for this to work on-the-fly, so long as the beginning stages are worked on/with.

    5. Re:Oh, sure, Linux users are this desperate by ackthpt · · Score: 1
      The idea of defective hard disks has been around for years.

      Pretty much all hard disks come with defects already mapped. It's expensive to buy a HD that doesn't have bad tracks and sectors. Again, usually reserved for critical applications, however, you're better off with RAID-5 with hotswapping.
      if RAM has some bad bits, might it not have others you don't know about?

      Raising the question: How is it bad? Intermittent, low tolerances? To adequately test bad RAM would be a Q/A process all by itself. I'd be the last person, even on my tight budget, to buy a computer at a discount that is known to have bad parts and patches in it. There's already enough variables with what you get. ;)
      Ed Avis writes (Thursday October 26, @02:21AM PDT):

      The only way BadRAM would take off, I believe, is if RAM manufacturers started shipping each DIMM with a list of known defects, as used to be done for disks.

      This is a job for another party, usually second rate parts will not even be allowed to feature the original manufacturer's name or part numbers, after all, they don't want to get stuck supporting it.

      I view BadRam as an interesting hack, but not something I'd ever use.


      --

      A feeling of having made the same mistake before: Deja Foobar
    6. Re:Oh, sure, Linux users are this desperate by Ed+Avis · · Score: 2

      I wouldn't use RAM with intermittent faults. But if it had a handful of known bad bits, with a guarantee that all the others were solid, I wouldn't have a problem with mapping out the bad 0.001% (with say 0.1% wasted space) and using the rest.

      usually second rate parts will not even be allowed to feature the original manufacturer's name or part numbers, after all, they don't want to get stuck supporting it.

      That's the problem - the perception that RAM with any defects at all is 'second rate'. In the past this has certainly been true because it wasn't possible to map bits out. If RAM starts being seen more like hard disks (and until a few years ago, floppies and LCDs) - seen as something which may have known defects without being unreliable - then manufacturers will be only too keen to improve their yields by selling the chips that are only almost-perfect.

      (I wonder - could you do this with other hardware? If one of the registers on your CPU is broken, could you sell it and tell the customer to use a compiler that won't use that register? That would be totally infeasible today, but in the future I can see it _could_ happen. For example, if the whole system were in bytecode with a small native-code bootstrap that finds out about the CPU's defects and sets up the JIT compiler appropriately. There have been cheaper chips which were rumoured to be defective versions of more expensive ones - eg the Intel 486SX may originally have been a use for 486DXes where the FPU turned out defective.)

      --
      -- Ed Avis ed@membled.com
    7. Re:Oh, sure, Linux users are this desperate by ackthpt · · Score: 1
      Your post makes you look like someone sucking up to the suits!!

      In the words of Viceroy Nute Gunray: You assume too much.

      I want Linux accepted and push Linux because I'd rather be working in it. I'm obviously more devious than you expected.


      --
      --

      A feeling of having made the same mistake before: Deja Foobar
    8. Re:Oh, sure, Linux users are this desperate by ackthpt · · Score: 2
      ...once it's rolled into production, we justify the expenditure for new/better hardware.

      Lucky you, where I once worked (two places actually) this is how these things played out:

      If you have it running on that [cobbled together pile of leftovers] then we'll just leave it the way it is.

      It eventually breaks because the upgrade never happens (due to always fewer priorities than necessities met)

      The [cobbled together pile of leftovers] never should have been done, it makes us look bad when a [cobbled together pile of leftovers] failure deprives users of one of our key services.

      In retrospect it's funny, but it wasn't funny each time innovation was mishandled this way.

      This too often seemed to resemble the institutional project model:

      1. Plan is proposed
      2. Wild enthusiasm
      3. Plan is put into action
      4. Process fails
      5. Feelings of hurt, loss and disillusionment
      6. Search for the guilty
      7. Punishment of the innocent
      8. Promotion of non-participants



      --

      --

      A feeling of having made the same mistake before: Deja Foobar
    9. Re:Oh, sure, Linux users are this desperate by SevenSeasOfRhye · · Score: 1

      In the words of Viceroy Nute Gunray: You assume too much.
      I'm obviously more devious than you expected.

      Wait till I take over the world. Then, people won't have to strive towards the goal of Linux on every desktop.
      The first thing I'll order as the RULER OF THE WORLD will be a pre-emptive nuclear strike on Redmond. What will Management do without the active guidance, devastatingly superior technology and tech support? They'll probably nail their nuts to Windows OEM packs and install Linux.
      I'm shrewd.

      Who the fuck was Gunray?

      --
      Electrical Engineering is BORING.
    10. Re:Oh, sure, Linux users are this desperate by jnik · · Score: 1

      Doesn't this make Linux look like a throwback to those old days of hobbies, like Amateur Radio making QRP rigs in sardine tins?
      Yeah, stupid stuff like low-power reliable communication. Or using repeaters to overcome line-of-sight on VHF and provide great range with handheld equipment (cell phones...). Or--get this, this one is really stupid--making those computer things talk to each other over the radio.
      It's all about advancing the state of the art, and I don't think anyone can laugh at that.

    11. Re:Oh, sure, Linux users are this desperate by gaudior · · Score: 2

      To a large extent, Linux is still in the hobbyist category. That's not a bad thing, it's just a recognition that it's not shrink-wrapped for general consumption. There is still a very high bar a person needs to jump over before Linux becomes truly accessible. Those of us using Linux for business purposes recognize this, and we are willing to take this into account when we make risk assessments.
      --

    12. Re:Oh, sure, Linux users are this desperate by ackthpt · · Score: 2

      Nothing wrong with that. My perspective is of one who tries to convince an employer to look beyond packaged solutions. The patch is fine for the hobbyist, but be honest: how far would you trust a defective DIMM? When you first install, no problem. When you are playing a few games, no problem. When you are setting up your own server to run on a DSL/CableModem, minor problem. When you depend upon it, bigger problem.

      I'd be more impressed with on the fly patching of bad memory to keep a server going rather than having it hang. That would be a selling point.


      --

      --

      A feeling of having made the same mistake before: Deja Foobar
    13. Re:Oh, sure, Linux users are this desperate by ackthpt · · Score: 2

      I'm not taking a swipe at Amateur Radio, or even QRP. The reference was to building it in a sardine tin (can't get a decent metal box from Radio Shack anymore, but you can get Cue Cats for free (then tear em apart))

      I'm well aware of the ingenuity of radio amateurs, my father has been one for decades, and the innovation which made early repeaters work (retuning salvaged military/commercial radio equipment.) This is great as a hobby, great if it helps out in an emergency, but, as with Linux seeking acceptance, try not to overlook those who opt to ditch the throwback for a stable, professional package with reliability. I can't see calling up Icom and asking for technical support after I nibbled a hole in the casing of my UHF handheld and wired in a customization any more than I can see calling up a tech at 1AM on a Friday because the cut-price 512M DIMM just flaked out a little more and took the system down.

      When you buy good memory, it's expected to be 100% good, no intermittency. If it flakes you replace it, hopefully under warranty or field service agreement. When you buy "iffy" memory, you accept that it is known to be broken, but you have no guarantee how it is broken and whether that state of broken is stable.

      Anecdote: A disreputable computer technician, who often overcharged for simple repairs and patches, is hit by a truck and killed immediately. He appears in hell and a demon welcomes him, and directs him down a passageway to his eternal punishment. The tech inquires as to what it will be. The demon indicates the punishment fits the crime and opens a door. The tech looks in and sees an enormous cavern filled with PCs, all tagged as broken. The tech says, oh, I'll spend eternity fixing broken computers? The demon says, yes, but since this is hell, they all have intermittent problems.


      --

      --

      A feeling of having made the same mistake before: Deja Foobar
  56. Similar solution exists in the 2.4 kernel already! by Anonymous Coward · · Score: 4

    Check out the 'mem=exactmap' boot-time option in the 2.4 kernel series - it got added a couple of weeks ago. That way you can specify and exclude faulty RAM via boot parameters.
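
For anyone wanting to try this, here is a rough sketch of turning a list of bad ranges into such a boot line. It assumes the `mem=nn[KMG]@ss[KMG]` region syntax as I recall it from the 2.4 kernel parameter documentation; the function name `exactmap_args` is hypothetical, and the sketch ignores reserved areas such as the 640K-1M hole, so a real map should be built from what the BIOS reports.

```python
def exactmap_args(total_bytes, bad_ranges):
    """Build a 'mem=exactmap ...' string covering all RAM except bad_ranges.

    bad_ranges is a list of (start, end) byte offsets, end exclusive,
    assumed to be aligned to 1 KiB so sizes can be written in K.
    """
    args = ["mem=exactmap"]
    cursor = 0  # start of the next usable region
    for start, end in sorted(bad_ranges):
        if start > cursor:
            args.append(f"mem={(start - cursor) // 1024}K@{cursor // 1024}K")
        cursor = max(cursor, end)
    if cursor < total_bytes:
        args.append(f"mem={(total_bytes - cursor) // 1024}K@{cursor // 1024}K")
    return " ".join(args)


if __name__ == "__main__":
    # 64 MB of RAM with one bad 4 KiB page at the 16 MB mark (made-up example).
    print(exactmap_args(64 * 1024 * 1024, [(0x1000000, 0x1001000)]))
```

Every defect splits the map into one more region, so this approach suits a handful of bad spots rather than hundreds.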

  57. Anything similar? by mbadolato · · Score: 5

    which allows it to make use of faulty memory... *sigh* ....of course my wife had to be reading over my shoulder and asked "Great, now is there anything I can install in you to make use of YOUR faulty memory...." She thinks she's funny. =)

    1. Re:Anything similar? by RobNich · · Score: 1

      Hear, hear. I wish my wife could do something like that. Hell, for one, she knew that DIMM meant memory!

      --
      Hello little man. I will destroy you!
    2. Re:Anything similar? by ackthpt · · Score: 1

      I'm hyperactive and forgetful. This works to my advantage, as I can usually pick up new stuff as fast as I forget old stuff. :)


      --

      --

      A feeling of having made the same mistake before: Deja Foobar
  58. Nice... now where can I find faulty 64/128MB DIMMS by ndnet · · Score: 1

    This is a cool idea to make use of bad hardware, and while it shouldn't be used to make new systems, it will.

    It's really nice for poor geeks like me who would be happy to have more than 64MB. My system is a workstation from Dell, and I imagine it can handle 128 to 192 MB RAM total.

    I know I won't get 64MB of ram out of a bad 64MB DIMM, but 50-60 extra still helps. I'd like to get this for the PeoplePC system I'm getting for Christmas (poor geek syndrome) which has 64MB RAM and 8MB of that is for the video card.

    My worry is that some local computer makers will use it to screw other linux geeks. Luckily, most Linux users will notice. I'm mostly worried about those buying a preinstalled box as their first Linux PC. Here's hoping that they're safe.

  59. Good idea, but.. by Dman33 · · Score: 1

    How is the stability? I mean, if it is determined that a given stick of RAM has some bad areas, then can that stick degrade further after time?

    For example, if a 128MB stick has 2MB removed as 'bad', is it possible that the chip may eventually have 3, then 4, then 5MB 'bad' as time goes on?

    Any memory gurus out there care to give me some insight on this?

  60. What about Quality Control rejects? by Anopheles · · Score: 1

    Now here's an idea.. What if a manufacturer, after going through a batch of bad memory, found a certain percentage of the group was bad. Then, conceivably, he could sell it at a lower price in some sort of clearance bin, and still make some money on the whole deal, as opposed to throwing it away and taking a loss on the deal.

    1. Re:What about Quality Control rejects? by Ares · · Score: 1

      Never buy something that comes in a box that is a low quality black and white copy of the original.

      Or for that matter, anything that is a half-way decent color copy of the original. If I didn't hate the bastards so much, I'd love to be an FBI badge at one of those shows. Particularly one on commission :).

    2. Re:What about Quality Control rejects? by Lord+Kano · · Score: 2

      Or for that matter, anything that is a half-way decent color copy of the original. If I didn't hate the bastards so much, I'd love to be an FBI badge at one of those shows. Particularly one on commission :).

      Bogus software wouldn't bother me at all, at least if it worked when I got it home. I'm talking about defective modems that were taken from a trash heap somewhere and new boxes made for them and put out for sale.

      I could give a damn if someone is selling bootlegs of Freddy the Fish or something like that.

      LK

      --
      "Hi. This is my friend, Jack Shit, and you don't know him." - Lord Kano
    3. Re:What about Quality Control rejects? by Lord+Kano · · Score: 2

      As much as I despise the racial/cultural epithets that you just used, I have a policy as it relates to computer show shopping. Never buy from someone who doesn't speak English as their primary language. Never buy from someone from more than 1 state away. Never buy something that comes in a box that is a low quality black and white copy of the original.

      LK

      --
      "Hi. This is my friend, Jack Shit, and you don't know him." - Lord Kano
    4. Re:What about Quality Control rejects? by Detritus · · Score: 2

      It's a bad idea. Defective parts have an annoying habit of being resold as good parts by unscrupulous vendors. That is why many manufacturers make a point of destroying defective parts, so they can't sneak back into the supply chain.

      --
      Mea navis aericumbens anguillis abundat
  61. Oops, sorry about that. by Anopheles · · Score: 1

    Of course, I read the webpage *after* I post. He's already thought of this idea. There goes my patentable idea...

  62. My bad RAM story by OlympicSponsor · · Score: 4

    Every time the topic of bad RAM comes up I can't help but tell this story:

    We had just installed an Exchange server and were rolling out the Exchange client to all the desktop PCs. Unfortunately, no one had thought to ask if they could take it--which many of them couldn't. So we were feverishly digging up all the RAM we could find and sticking it into machines as fast as we could. I happened to find a 32MB stick (glory be!) in an unused PC. I said to my boss: "Hey, I found a big one!" He turns around and asks "Is it any good?" while simultaneously reaching for it, and ZAP audibly discharges static electricity right into the thing. We look at each other for a moment and then I say "Not anymore."

    I was wrong, though--it was fine.
    --
    An abstained vote is a vote for Bush and Gore.

    --
    Non-meta-modded "Overrated" mods are killing Slashdot
    (Hey Ryan! Here's your proof!)
    1. Re:My bad RAM story by Elgon · · Score: 1

      <karmawhore> If it was still plugged into the PC when it got zapped there is a fairly good probability that it would be fine. It is the very high EMF of static that punches holes in the nonconducting oxide layer of CMOS and its derivatives; however, if all of the pins, or even (if you are lucky) several, are touched, then the EMF may cancel out to zero (be the same on both sides of the fragile oxide layer) and hence leave your FETs in good nick. </karmawhore>

      This is why the 32K static RAM chip (£15 GB or $26 US!) that I bought in 1990 for my Z80 based electronics project is stuck in a piece of tin foil wrapped around a lump of polystyrene.

      Elgon

  63. Questions by karzan · · Score: 1

    Would it be possible/plausible to detect bad RAM on the fly, without needing a reboot? What about testing RAM allocated right before a program fails? How long does it take to check RAM, and could it be done at runtime?

  64. At least it was flaimbait! by Dman33 · · Score: 1

    It is neither funny, insightful, interesting, or informative

    Yeah, about 90% of the posts on this site are none of the above. Personally, I thought it was funny, not to warrant +1, but worth a small chuckle. I guess not everybody is as interested in karma-whoring as you are..

    1. Re:At least it was flaimbait! by Dman33 · · Score: 1

      I see, so you had to go from karma whoring to trolling...

  65. Re:If Linux works with crap, that's all IT will gi by psergiu · · Score: 1

    > The only improvement that I can see is if it could add additional addresses to map out "on the fly" as they are found to be bad during operation.
    HP (HPPA) machines do this. But it's done with a combination of the kernel and the hardware/firmware. If a page is found to be faulty, it is noted in the PDT (Page Deallocation Table) and removed from use if possible (if it was previously used by the kernel the machine will dump core - if not it will just spit horrible errors). The PDT is stored in flash, has 50 entries, and is cleared when a memory module is changed.

    And this is done on big, expensive, production servers - Linux should adopt this, as it really works (the PDT mechanism kept one of my HP servers stable with a bad 512MB memory module until I could shut it down).
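
A toy model of the bookkeeping described above, for anyone who wants to see the shape of it. Only the 50-entry limit and the clear-on-replacement behaviour come from the description; the class and method names are invented, and this is not how the HP firmware is actually implemented.

```python
class PageDeallocationTable:
    """Toy model of a fixed-size table of retired physical pages."""

    def __init__(self, capacity=50):
        self.capacity = capacity
        self.retired = []

    def report_fault(self, page, used_by_kernel=False):
        """Record a faulty page and say what the system would do next."""
        if page in self.retired:
            return "already retired"
        if len(self.retired) >= self.capacity:
            raise OverflowError("PDT full: time to replace the module")
        self.retired.append(page)
        # A fault in a page the kernel itself was using cannot be survived.
        return "crash dump" if used_by_kernel else "retired, keep running"

    def module_replaced(self):
        """Swapping the memory module clears the table."""
        self.retired.clear()
```

The small fixed capacity acts as a health gauge: a module that fills the table is clearly on its way out.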

    --

    --
    1% APY, No fees, Online Bank https://captl1.co/2uIErYq Don't let your $$$ sit in a no-interest acct.
  66. it's part of the manufacturing process by Fat+Cow · · Score: 1

    i just started with a semiconductor testing company and i found out that memory usually has "bad blocks" which they can route around by blowing a link with a laser. they add redundant banks just for this purpose, it's cheaper to have that redundancy than scrap the whole chip.

    so now (or soon) we will be able to do the same thing with software that they already do during the manufacturing process, presumably for chips that go bad after they leave the fab?

    --
    stay frosty and alert
  67. The Environment by Overnight+Delivery · · Score: 2
    Ethical motivation:
    I like to preserve as much of the environment as I can. The production of chips is a very resource intensive process, and the complexity of a chip means that a lot of the produced chips are incorrect. I dislike wasting good materials, and even if they are merely `good enough' they should be taken seriously. By allowing the use of such `good enough' memory chips, I hope to help preserving the environment.

    Computers are pretty darn unbiodegradable, yet the pace of progress makes them obsolete at an ever increasing pace. How many 386's are somewhere other than landfill? A 386 is not actually that old when you compare it to a washing machine or a fridge.

    A lot of people are slamming this because it has some practical limitations, so what!

    This guy has done a pretty cool hack, but has also done something positive about a side of our industry that most of us don't think about very often.

    --

    When it absolutely positively has to be there.

  68. "bad" RAM doesn't not get worse! by AndyL · · Score: 1

    Assuming this patch works as advertised, why wouldn't you use bad memory on any machine you wanted to save a few bucks on?

    We're talking about ram that was defective when it was manufactured. It's not going to get worse. There are just some areas of the chips that can't be used.

    -Andy
  69. Sigh by Mr.+Buckaroo · · Score: 1

    I've been dealing with mcglen micro over some bad(from all my testing) ram. They charged me full price tho.

  70. Testing methods by skoda · · Score: 2

    I don't know anything about memory industry testing methods in particular, but given that RAM chips are produced in volume, I doubt every chip is inspected by a human. More likely, there is an automated inspection system that checks for surface defects, perhaps runs quick functionality tests, and batch sampling inspection by human inspectors.

    Not that that would lead to lower quality overall. It might even be better since people might get sloppy after looking at a few hundred identical chips every day, whereas machines don't get bored. (well, except for my computer. It insists I play Unreal Tourn. now and again :)
    -----
    D. Fischer

  71. Memtest86... by jjeff · · Score: 1

    Did anyone else have memtest86 sit there for hours (I think I gave up after about 6), passing all the tests it had performed so far? Why does it take so long?

    I mean, I was still on the first test and only the 11th pass??

    Anyway, I've given up on that idea for the moment, until I can easily pinpoint the corrupted addresses on my 32meg DIMM.

    --
    when everything is working perfectly.. BREAK SOMETHING before something else FUCKS up!
  72. Some things NEVER CHANGE... by mcrbids · · Score: 2

    I'm reminded of a Digital VAX 7/1150 (or was it 711/50? I don't remember) I worked on. It was the size of two refrigerators, and required TWO room air conditioners to keep temperatures in the room reasonable - to deliver roughly the processing power of a '286...

    But it was an AWESOME machine. And, mapping out memory that was bad was something it did on the fly! It would find a bad memory spot, and do one of several things with it:

    1) Stop using it;

    2) If the problem was intermittent, it would only store PROGRAM CODE there - which, if the memory was bad, it could re-load from the hard disk!

    3) If the memory tested good for a while doing program code, (a few days, I think) it would return that RAM to general use.
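
The three rules above amount to a small per-page state machine. A sketch of it follows, with the caveat that it only encodes the behaviour as remembered in this comment: the names are invented and the three-day probation is a guess at "a few days", not a documented VMS value.

```python
from enum import Enum


class PageState(Enum):
    GENERAL = "general use"
    CODE_ONLY = "program code only"   # contents can be re-read from disk
    RETIRED = "not used"


def next_state(state, fault_seen, intermittent=False,
               clean_days=0, probation_days=3):
    """Return the page's next state after one observation period."""
    if state is PageState.RETIRED:
        return state                          # rule 1: never comes back
    if fault_seen:
        # rule 2: intermittent faults demote the page instead of retiring it
        return PageState.CODE_ONLY if intermittent else PageState.RETIRED
    if state is PageState.CODE_ONLY and clean_days >= probation_days:
        return PageState.GENERAL              # rule 3: promoted after probation
    return state
```

Rule 2 works because program code is read-only: a corrupted copy can always be reloaded from disk, whereas corrupted data is gone for good.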

    An amazing machine - with some features that pale even a big, powerful *nix box today.. For example, versioning of just about EVERYTHING... *:1, *:2, *:3, etc, and while there was a "root" user (called admin on this system) there could be more than one! (My login, "dirdisb" was a "root" login too, and you could always tell when you looked at a file whether admin or dirdisb actually did it - much better than *nix style, IMHO)

    I seem to recall that there was a patch or something you could apply that would make it use ALL hard disk space to create as many versions as possible of documents - or just 10. (we used the latter)

    This machine, as slow as it was, would comfortably handle 20 simultaneous users! (granted, no X-windows or GUI at all)

    With patches such as this badram patch (which IMHO should be added to the kernel by default) we are getting some of these really cool features back...

    --
    I have no problem with your religion until you decide it's reason to deprive others of the truth.
    1. Re:Some things NEVER CHANGE... by Miod · · Score: 1

      You surely meant a Vax 11/750...

    2. Re:Some things NEVER CHANGE... by Tejota · · Score: 1

      Vax 11/750. and VMS was an AWESOME OS,
      much better than current *nix flavors.

      tj

  73. Re:Err... by jmkaza · · Score: 1

    Hmm.. Let's see. As a full time college student working 15 hours a week at 12 bucks an hour that's about 600 bucks a month after taxes. Figure in monthly home beer costs (18pack @ $10 X 4 = $40), bar beer costs (twice that) burritos ($24) petrol ($60) isp ($40) and rent ($350) and that gives a whole 6 dollars a month to spend on hardware. Now one could save up for almost two years to get a new 256MB stick of ram, or use this program and get a FAST ASS system with the money from taking cans back. The question of which option to take seems like no question at all.

  74. BadRAM Supplies by Schedler · · Score: 1

    I have a source for thousands of bad dimm modules. To inquire; jschedler@pnp-group.com

  75. Re:Would you an O/S on a HDD with bad sectors? by detach · · Score: 1

    Well, having fiddled with computer hardware for years, I figger electronic components, not ONLY computers, degrade. That makes perfect sense to you doesn't it?

    What's more, our RAM chips are running at really high speeds of 66MHz, 100MHz, and some at 133MHz for those using the P3-EB processors. They are done with .18 or .15 micron die processes (correct me if i am wrong), and since they are getting so puny and fast, the possibility of failure is even greater.

    Think -- why do old 486 PCs last a decade while our new Pentium PCs die after a few years, bit by bit?

    Anyway I can't think of a better way to figure out bad RAM pages until I get fux0red by a screwed data coming in. By the time that happens, it might be too late -- your screwed data might have been on its way down your data bus to your hard drive, or prolly writing some data onto your BIOS Flash chip.

    When that happens, that $80 you save on your RAM izn't worth the time. I won't wanna try this, even if the system is a devel system. It's just not worth the time. I figger the time's more worth wanking. ;-P

  76. Your bad RAM has to... by BornInASmallTown · · Score: 2

    ...be bad in the right way. If, for example, the most significant bit of the addressing bus were damaged, you would only have access to half of the chip's memory at a maximum.

    To fix this problem, you'd have to use 2 "half-working" chips to get the same amount of memory that 1 of the non-damaged ones would have provided.

    It seems that buying several damaged chips to make up for the one non-damaged chip would not be very cost effective in the long run.

    1. Re:Your bad RAM has to... by Pimpy · · Score: 1

      It's not aimed at buying damaged chips because it's cheap (as the slashdot posters don't seem to understand), it's more about, okay.. I have a faulty piece of ram, instead of having to toss it out I can at least use part of it.

    2. Re:Your bad RAM has to... by b1t+r0t · · Score: 2
      ...be bad in the right way. If, for example, the most significant bit of the addressing bus were damaged, you would only have access to half of the chip's memory at a maximum.

      It's worse than that. DRAM is addressed by rows and columns, so each address line controls two bits, and not necessarily two adjacent bits.

      If an entire address line is bad, it's time to make a keychain holder. At the factory, this type of problem won't even make it out of initial die testing, much less all the way to manufacturing a DIMM.
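
A toy model of why a dead address line is so much worse than a dead cell. It assumes the same pins carry first the row and then the column address, as described above; the function name and the tiny 4-pin array are only for illustration.

```python
def reachable_cells(address_pins, stuck_pin, stuck_high=False):
    """Count distinct (row, column) cells a chip can still select
    when one multiplexed address pin is stuck at 0 or 1."""
    bit = 1 << stuck_pin

    def seen_by_chip(value):
        # The stuck pin overrides whatever the controller tried to drive.
        return value | bit if stuck_high else value & ~bit

    cells = {
        (seen_by_chip(row), seen_by_chip(col))
        for row in range(1 << address_pins)
        for col in range(1 << address_pins)
    }
    return len(cells)


if __name__ == "__main__":
    total = (1 << 4) ** 2
    print(reachable_cells(4, stuck_pin=3), "of", total, "cells reachable")
```

Because the stuck pin corrupts both halves of the address, three quarters of the array becomes unreachable in this model, not half.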

      --

      --
      "Open source is good." - Steve Jobs
      "Open source is evil." - Microsoft
  77. Re:Now there's a point to the BIOS memory test? by Megane · · Score: 2
    I actually had a friend who ran DOS debug on his BIOS and noted that it *did* actually test every one of the X86's registers during boot

    The Atari 7800 had lots more ROM space than it could possibly use for just the 960-bit digital signature lockout code, so they included a full 6502 CPU test in there. Sheesh.

    --
    #naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
  78. Re:Hello this was on Kernel Traffic a long time ag by SomeOtherGuy · · Score: 1


    { SARCASM } Or better yet -- if you ALREADY are getting your news in other places -- why take the time to read or worse yet post to Slashdot? { /SARCASM }

    I would be willing to bet that a good percentage of Slashdot readers did not (or do not) read the "Weekly kernel traffic digest".

    --
    (+1 Funny) only if I laugh out loud.
  79. sounds good for high-performance computing shops. by 11390036 · · Score: 1

    Places like super-computing centers - especially those that use off the shelf parts and clustering software like beowulf will be able to both increase the total amount of memory available to their systems while still coming out cheaper than buying good memory.

    I still don't have the money I'd need to build my dream beowulf system, but its getting more affordable everyday!

  80. Re:Is this good for Linux's rep? by firewolf · · Score: 1

    Of course this is a good thing (Best Martha stewart impression) It's good because, as the rest here have said, it shows us that Linux can be run on virtually any system, any condition. MS can't even claim that, nor can Mac.

  81. Re:Signal 11 no more? by Anonymous Coward · · Score: 1

    fuck off

  82. SCO Unix by martin · · Score: 1

    SCO has had this facility for well over 10 years which is when I last played with it.

    It was very useful for solving issues with the 16MB to 16MB+8K area which many pre-pentium PCs used for special BIOS memory mapping stuff. It also helped out with being able to stop SCO Unix using the area of RAM that BIOS copied itself and Video RAM to before the BIOS became intelligent about this.

  83. Re:Best Buy by FraggleMI · · Score: 1

    Somebody emailed me and actually thought that I was serious.

    --
    huh?
  84. Re:256M for $135? by Strog · · Score: 1
    Try Pricewatch. A lot of places are selling memory that is called high density. This memory only works on the newer chipsets. This is generally the cheaper stuff. Make sure you know what you are getting if you don't have a new chipset. You can get Micron PC133 256MB for under $200.00 that will work in most every board.

  85. Re:Best Buy by Suppafly · · Score: 1

    you racist people crack me up i swear.. ive been ripped off by more 'crackers' than 'camel jockeys'

  86. Re:Best Buy by Strog · · Score: 1
    If you buy generic crap then you have no right to complain about a system that isn't rock solid. I will only buy quality parts for my systems. I haven't seen all the problems with Windows that I hear people whining about. I built a fairly high end (at the time) 3d animation system for my brother-in-law. I built it with ECC memory. He complained that NT would experience crashes after it was 6 months old. Come to find out he had bought some cheap non-ECC ram and threw it in his machine without telling me. We pulled the mismatched RAM out and it hasn't crashed since. Ok, it crashed twice on a flakey program. He still goes on about how stable it is for NT. I keep trying to tell him that NT can be stable if you use good hardware and stable drivers.

    Maya 3 is coming out for Linux in 2001 so he can dump NT but it is also coming for OSX and he will likely go that way because he has always been a Mac fan. Both are better than NT for this. Maya on NT is very cool though if you can't quite spend the bucks for Irix on a SGI machine.

  87. Great for laptops by bluestar · · Score: 1

    I own a laptop with some bad memory on the motherboard. It's usually not a problem, but a kernel compile will segfault at random points. Unlike a desktop, I can't just replace the RAM.

    As soon as I get the machine back (hi Rick :-), I'll try this out.

    --
    "The cost of freedom is eternal vigilance." -Thomas Jefferson
  88. Wouldn't use if I wanted a stable machine... by pwileyii · · Score: 1

    Obviously, this patch is for the desktop Linux user or Linux experimenter. No one in their right mind would use bad memory in a machine they want to remain stable, least of all a company.

  89. Re:No, this *is* good for production use! by Cellshade · · Score: 1

    This sounds like a new TV special.

    Tonight on Fox:

    WHEN GOOD MEMORY GOES BAD!

  90. Re:Is this good for Linux's rep? by ryusen · · Score: 1

    if you were to compare this to bad tires on ford cars... it would be more like "new ford explorer add-on can fix bad tires" or something like that.. but everyone knows i hate bad analogies...*cackle*

    --

    I believe sex is highly over rated... unless it involves me
  91. Big step forward to match enterprise UNIX systems by Loge · · Score: 1

    Some of the commercially-developed UNIX systems have successfully used this feature to handle enterprise workloads. For example, HP-UX has long derived a unique advantage from its Dynamic Memory Resilience feature, which allows a server to sustain single-bit errors (see HP-UX 11i specs). If the Linux implementation of this function can also be made to work dynamically, i.e. fence off memory that goes bad during runtime, it will be a huge step forward for establishing Linux as a true enterprise alternative.

  92. Re:Now there's a point to the BIOS memory test? by markos1-1 · · Score: 1

    The BIOS memory test is of ABSOLUTELY NO VALUE unless you sprung for parity memory. The BIOS test only tries to send a random bit to memory, read it and check the parity.

    If you bought non-parity memory, the test will always succeed no matter if the memory is bad or not.

  93. Very useful by Bun · · Score: 1

    This is a great idea, even if you don't want to use the bad RAM. I don't know how often this would happen, but I could imagine a scenario like this:

    1. The software polls every once in a while for the bad RAM and marks it when it finds it.
    2. Some other software could pick up on the kernel message and send an email to an administrator,
    3. Who could schedule the downtime for the replacement without worrying about the system going blooey in the next 5 minutes.

    A bit more peace of mind can't be a bad thing.

    --
    "Anyone that has ever gotten an idea based on any of my work and done something better with it-good for you."--J.Carmack
    1. Re:Very useful by Pimpy · · Score: 1

      Cute idea, but not the way it works. Your approach of polling the ram periodically would waste far too many cpu cycles and would be highly inefficient. All that is needed is to scan through the ram on bootup and flag unusable portions as bad, then when mapping the ram out you simply skip over anything flagged bad.
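
In outline, the flag-at-boot, skip-forever approach looks something like the sketch below. This is a generic illustration with invented names, not the BadRAM patch's code or the kernel's real allocator.

```python
class PageAllocator:
    """Hands out page numbers, never touching pages flagged bad at boot."""

    def __init__(self, num_pages, bad_pages=()):
        self.bad = set(bad_pages)
        # Bad pages are filtered out once, here; nothing checks them again.
        self.free = [p for p in range(num_pages) if p not in self.bad]

    def alloc(self):
        if not self.free:
            raise MemoryError("out of pages")
        return self.free.pop(0)

    def release(self, page):
        if page in self.bad:
            raise ValueError("bad pages are never handed out or returned")
        self.free.append(page)
```

Since the bad pages never enter the free list, the cost is paid once at startup and allocation afterwards is exactly as fast as on a machine with perfect memory.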

    2. Re:Very useful by Bun · · Score: 1

      Yeah, well, it was just a thought. I didn't think critical servers are rebooted that often, so another opportunity for the scan was needed. I don't think once a day around 2am would be a brutal waste, but then again, how many RAM modules go bad after a period of time? Ah, bugger it. Chalk one more up to ignorance.

      --
      "Anyone that has ever gotten an idea based on any of my work and done something better with it-good for you."--J.Carmack
  94. mod this guy up by Barbarian · · Score: 2

    mod this guy up, he knows what he's talking about.

    --

    1. Re:mod this guy up by Field+Marshall+Stack · · Score: 1

      Mod this "mod this "mod this comment up" comment down" up! It's not a fucking waste of space, it's an insightful commentary on the spiritual emptyness of the millenial situation of the er postmodern somethingerather of the other thing altogether!
      --
      "HORSE."

      --
      "HORSE."
      -Flaming Carrot
  95. My Article by SomeOtherGuy · · Score: 2

    { Joke Mode On }

    From the "Why did this article get posted...and not mine" department:
    Ohh..This sounds a lot like the story I submitted a week ago called:
    "How to make your good memory make use of bad software."

    {Joke Mode Off }

    --
    (+1 Funny) only if I laugh out loud.
  96. Re:If only it made sense.. by tzanger · · Score: 1

    Your argument doesn't make sense. A machine with 500M of usable RAM works just as well as a machine with 512M in almost every scenario. That's completely unlike your 5-working-cylinder car analogy.

    What kind of warranties will the end user get for the memory? what kind of performance is eaten by this program? does the memory run up to spec? will it still work in 2 months?

    There is no performance hit; the pages are marked bad and not used, not continuously tested. It's the same as marking a chunk of memory as invalid (causing a page fault on access) but never marking it valid again, since it's never swapped back in by the VMM. I'm not well versed enough to know if memory with bad sections will get progressively worse, but everything else you've mentioned is silliness.
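
    A minimal sketch of that idea, not the patch's actual code: a list of faulty addresses is turned into a set of page numbers once at boot, and those pages simply never reach the free list. The 4 KB page size, the fault addresses, and both function names are assumptions made up for illustration.

```python
PAGE_SIZE = 4096  # assumed page size in bytes

def pages_to_exclude(bad_addresses):
    """Map each faulty byte address to the page that contains it."""
    return {addr // PAGE_SIZE for addr in bad_addresses}

def build_free_list(total_pages, bad):
    """Done once at boot: every page not flagged bad goes on the free list."""
    return [p for p in range(total_pages) if p not in bad]

bad = pages_to_exclude([0x1234, 0x1FF0, 0x5008])  # hypothetical fault addresses
free_list = build_free_list(8, bad)               # a toy 32 KB machine
print(sorted(bad))   # the pages that will never be handed out
print(free_list)     # allocation only ever draws from this list
```

    Because the filtering happens once, allocation afterwards needs no per-access check, which is why there is no ongoing cost.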

  97. More information on bad ram modules by NoInfo · · Score: 1

    You can find more information here: http://www.home.zonnet.nl/vanrein/badram

  98. Re:Nice... now where can I find faulty 64/128MB DI by Anonymous Coward · · Score: 1

    I'm mostly worried about those buying a preinstalled box as their first Linux PC.

    Lucky this has happened all of ZERO times in the history of the universe.

  99. A better solution... by Anonymous Coward · · Score: 2
    Buy a real computer. Pay a lot of money for it. Get a warranty/service contract (they're often free for a certain level of support). Bad RAM? Send it back. Bad mainboard? Send it back. Bad anything for the first three years? You guessed it...send it back. Get the replacement by next-day air. Free.

    A lot of vendors offer service contracts and warranties. But peecee vendors, accustomed to dealing with...shall we say, less than reliable operating systems, will try to make you go through 543 steps and tests before allowing you to send your hardware for replacement, because most problems in that world are either OS bugs or user error. In the real computer market, they don't fuck around. You paid a lot for your system, and you can expect it to work. When you call them up and say you have a bad foowhatzit, they send you a new one (unless they're Sun, in which case they make you sign an NDA first - bad Sun, bad!). They expect, and rightly so in most cases, that you know what you're doing and it isn't a software problem. No runaround, no bullshit, no cost to you. This is one of several reasons I'll never own another peecee. The service just ain't the same.

    I understand the concept of trying to get the most you can out of any hardware you might have. But I also think that people stuck with such hardware ought to learn their lesson next time instead of relying on hacks, however clever, to work around their poor buying decisions. Anyone actually seeking out bad memory to use with this is insane. Firstly, there's good reason to believe that if memory is failing, other areas in the same part may fail as well, perhaps with less frequency or at a later time. Second, even if the cost is half that of a good part, is it really worth saving 50 bucks and having to configure this thing, test it, and make sure periodically that no other memory areas fail? I would suggest that technical work of this type is worth at least 50 bucks an hour...so if you value your time fairly, it's unlikely that you'll win out of this. I'll gladly pay some extra money to know that I won't get a sig11 the next time I go to compile something...and if I do, I can get replacement parts the next morning at no cost, without any hassle. I don't work for any vendors. I'm just a sysadmin who'd rather read slashdot than argue with tech support.

  100. Very inefficient... by MeowMeow+Jones · · Score: 1

    One bad bit causes them to throw out a 4KB page.

    If they'd take advantage of x86's segmented memory model they could reduce that amount to 16 bytes.

    --

    Trolls throughout history:
    Jonathan Swift

    1. Re:Very inefficient... by MeowMeow+Jones · · Score: 1

      I was just trying for a quick +1 insightful that would get the flames going. :)

      --

      Trolls throughout history:
      Jonathan Swift

    2. Re:Very inefficient... by cthulhubob · · Score: 1

      correct me if I'm wrong, but isn't it only segmented in real processor mode?

      Linux runs in protected mode 100% of the time after it's loaded by the BIOS.

      --

      In post-9/11 America, the CIA interrogates YOU!
  101. Use SCSI! by rhombic · · Score: 1

    We use a SCSI CD-R for data backups, averaging about 2 CD-Rs per week. I've ended up with one coaster in the last year, and it had a scratch on it. There's no reduction in the machine's (heavy) load while burning. SCSI rocks!

    --
    1984 was supposed to be a warning, not an instruction manual.
  102. Re:Now there's a point to the BIOS memory test? by Anonymous Coward · · Score: 2

    Does it run every possible combination of CPU instructions on boot up?

    It can't. Running every possible combination would take an indefinitely long period of time (infinity).

    Does it check every single block on the hard drive? No!

    This is because the hard drive is not essential to the functioning of the computer. With modern operating systems, usually a hard drive is required, but again, it's not essential.

    Does it check all the blocks of floppies, CDs, DVDs, etc to make sure they work?

    This would be absurd. "Please insert a DVD, CD and floppy to boot".

    If the memory test is essential to the functioning of the system, why do they let you skip it?

    You then go on to contradict yourself by saying "Obviously, the smart thing to do is to _wait_ for the memory to fail rather than test the whole lot for a minute or two."

  103. Bad Ram! Bad! by bperkins · · Score: 1

    Hmm. I think the price of bad RAM just went up.

    I'm not sure I'd use this myself though. I find myself resurrecting machines that won't boot from lilo anymore quite a bit. It'd be awfully annoying not to be able to use an off-the-shelf rescue disk. It's bad enough when you have to get some weird SCSI driver working.

    OTOH, it'd be a lifesaver if you have some wounded machine you need to get back up ASAP.

  104. make sure nobody replaces linux by jannic · · Score: 2

    If you're operating a linux server and someone wants to replace it with w2k, well, let him try. But don't tell him that the RAM is defective. (It works with linux, so what?) :-)

  105. Re:Now there's a point to the BIOS memory test? by kyz · · Score: 1

    It can't [run every possible instruction combination]. Running every possible combination would take an indefinitely long period of time (infinity).

    Indeed, that's my point. The BIOS makes no great time-consuming effort to ensure the CPU works accurately and completely. The CPU's correct functioning is essential, as the FDIV bug showed. The CPU is the most essential part of the computer. And the only tests done on it are ones that work out which CPU it is, and some basic sanity. As the CPU isn't fully tested, and it's more important than the memory, why is the memory fully tested?

    the hard drive is not essential to the functioning of the computer. With modern operating systems, usually a hard drive is required, but again, it's not essential.

    Some form of device from which the OS, software, etc is loaded is necessary. If it's possible to use every block on this device, then to be sure of success every block on this device should be tested. This is the crackpot theory of BIOS memory testing applied to other system parts. My point is that hard drives map bad blocks out as and when they find them, when they're actually needed. So should memory. That's what I mean by 'waiting for memory to fail rather than test the whole lot'.

    --
    Does my bum look big in this?
  106. Would you run an O/S on a HDD with bad sectors? by detach · · Score: 1

    Hey this sounds illogical. If RAM chips were damaged, there is potential that they get damaged further, and eventually your amount of usable RAM will run down to 0 bytes. They are electronic components anyway.

    Would you run your O/S on a hard disk with lots of damaged bad sectors and a potentially dying motor? I doubt so. First, correct me if I'm wrong, but before your O/S even boots, the first few bytes of your RAM are already taken up. If they're damaged, there are a few things which could happen:

    a) You can't start your computer
    b) Somewhere before your O/S loads, the whole boot process goes bonkers
    c) It corrupts the data on your hard disk

    Say, this is working at O/S kernel level, and the kernel loads way after a huge chunk of initialization process done by your system during boot, and some systems just refuse to boot after a failed RAM test. Even if you got your system running, there's a high risk of failure. Why waste time and money on a system that's gonna potentially die on you? Just spend a few more bucks on a brand new RAM stick. Come on, I run my Squid on a 48MB system!

  107. Re:If only it made sense.. by kmem · · Score: 1

    I have a car that runs on 5 of its 6 cylinders -- and it runs great. New Cadillacs with V8s have a feature that detects if a cylinder is bad and stops its plugs from firing while the engine churns on. I have a hard drive with lots of bad sectors -- but they are marked bad and my box is stable. You can buy new hard drives in the same condition. Ever put a 450MHz processor on a 400MHz motherboard? You're throwing away 12% of your processing power but the machine will still run fine. Why is RAM any different? Sure, new 100% working RAM is better -- but if I found a 512M DIMM lying on the street -- and 448M still works -- I'm not gonna complain!!

  108. Think 3rd world by robinsc · · Score: 1

    Where patchy power mains, brownouts etc. can easily blow holes in your memory chips. I have 16MB lying at home full of bad bits... I'm sure there are many others who have experienced the bad effects of a power surge. Sadly, these chips blew out even though the computer was on a UPS. Memory is not cheap for people of limited resources, and this is one great way of recycling something the developed world really doesn't bother about too much. In India at least we hate to throw anything away, and this is one more major plus point for Linux.

    Cheerio,
    Robin

    --
    Linkedin http://in.linkedin.com/in/robinsaikatchatterjee
  109. Re:They throw them away now by TrainedMonkey · · Score: 1

    Who told you that!?? Nothing, NOTHING gets thrown away. Most memory companies SELL their faulty modules to other hardware companies. Then those modules are used in low-end (i.e. cheap) household appliances and other non-critical electronic applications that do not require a stable system.

    --
    "I can't see a f#@!! thing" - photon a to crossing photon b
  110. Re:Wouldn't use if I wanted a stable machine... by Pimpy · · Score: 1

    You seem to miss the whole idea of the patch. It's like using a drive with bad sectors on it: you can map the sectors as bad and hop over them as you seek through the disk. The badram patch does things in a similar way: it flags any area that is questionable as bad, and hops over it when mapping it out. Your stick of RAM could have 0k usable and the patch would still work fine, and it wouldn't affect the stability of the machine at all.

  111. Re:Signal 11 no more? by ethereal · · Score: 1

    That wasn't particularly insightful, even for a dig at Signal 11.

    --

    Your right to not believe: Americans United for Separation of Church and

  112. SIMMs too, right? by Straker+Skunk · · Score: 2

    Ahh, this would have been useful with an old P90/32MB motherboard/memory combo I recently gave away...

    It was quite fun, running a system (FreeBSD) with a single-bit memory error. Sure, gcc would die on occasion, but then there was the oddness of having a script break because a file http_log was missing (mysteriously renamed to httx_log). The best part was actually figuring out which bit was bad...

    --
    iSKUNK!
  113. Re:PS/2 by Gothmolly · · Score: 1

    No, Windows just runs as slowly as if all your RAM has a couple of wait states.

    --
    I want to delete my account but Slashdot doesn't allow it.
  114. 256M for $135? by jroller · · Score: 1

    Please post a URL or place to buy ram this cheap! I'd be more interested in that, than a funny kernel patch.

    You wouldn't, by any chance, know where I could get a 512M for about $250, either?

    1. Re:256M for $135? by chamont · · Score: 1

      1st choice memory. I even ordered from them once or twice back in my sysadmin days. No problems.

  115. Re:Best Buy by Jeff4212 · · Score: 1

    But Best Buy and CompUSA carry only the highest quality "Bulk RAM". In fact, it was just about 9-12 months ago I purchased a 128MB SDRAM DIMM from Best Buy for $114. It only took me 3 trips to get one that actually worked. I think this experience alone is a testament to Best Buy's commitment to providing only the highest quality untested, shoddy, & defective merchandise! Rain check anyone? Don't even get me started...

  116. No, this *is* good for production use! by Random+Q.+Hacker · · Score: 5

    Sure, you wouldn't want to intentionally put bad memory into a production machine, but what if good memory goes bad? This patch, if further developed to perform periodic testing and updating of the bad memory map *during operation*, could actually harden the linux kernel against spontaneous hardware failure!

    If we ever want to see linux used in mission critical systems like air traffic control, embedded medical devices, or military applications, then projects like this are the key. Fault tolerance now exists for memory (this project), storage (RAID), and communication (redundant NICs). The next target should be the CPU.

    How about projects to detect the types of errors a failing (typically, overheated) cpu produces, and adjust the scheduler accordingly to insert idle time and cool down the cpu? Or to use one cpu to monitor another in multiprocessor systems, and avoid using a processor that starts producing faulty results?

    1. Re:No, this *is* good for production use! by mmontour · · Score: 2

      I'd suggest one slight change to your post:

      Fault tolerance now exists for memory ( ECC RAM ), storage (RAID), and communication (redundant NICs).

    2. Re:No, this *is* good for production use! by ^chuck^ · · Score: 1
      How about projects to detect the types of errors a failing (typically, overheated) cpu produces, and adjust the scheduler accordingly to insert idle time and cool down the cpu?

      Don't worry, transmeta has got this covered ;)

      --

      Lemure, wtf! Don't you mean Lemur?
  117. Imperfect knowledge, but ... them's the breaks. by timothy · · Score: 3

    I knew from the badRAM website that it was discussed on kt (and so read that earlier today), but I hadn't noticed it there when it first appeared -- sometimes I'm too interested in other topics, sometimes I don't read it all the way through, whatever. There's a lot of information in the world. I'm glad that someone sent in the link and explained it a bit (so I was intrigued and looked through it), which is what this site is about.

    But how many people saw it on kt? For purely selfish reasons, I'd like to see a lot more people know about this project, because I find it very interesting and useful-looking. Plus, I think it's just a neat hack in general, and I'd like to point it out.

    If it's too old for you, then ... don't read it or waste your own time commenting :) There are a lot of projects out there that have been laboring quietly which may have spectacular results at any time -- do you not want them discussed because they're "old news"? The in-progress Tux2 filesystem was no secret, for instance, (that, too, was discussed on kt), but how many people had heard of it before ALS? Not nearly as many as would have been interested, I warrant, and the comments on the slashdot story about it indicate that.

    YMMV, whaddya do?

    OK.

    timothy

    --
    jrnl: http://tinyurl.com/c2l8yr / foes: http://tinyurl.com/ckjno5
  118. Re:Now there's a point to the BIOS memory test? by sab39 · · Score: 2

    I actually had a friend who ran DOS debug on his BIOS and noted that it *did* actually test every one of the X86's registers during boot - obviously this didn't get printed in the boot sequence (because if it had, the message would flash up for about 8 clock cycles) but it actually did a sequence like:

    put arbitrary number in first register
    copy first register to second register
    ...
    copy second-last register to last register
    compare last register to first register
    if (different) HALT

    (I have a feeling it did this twice, with 01010... and then with 10101...)

    He thought this was quite clever until he realized that a bad bit in the first register would still pass the test (and also negate the test of that bit on all the other registers)...
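
    The flaw can be reproduced with a toy model. This is only a sketch under assumptions: the 8-bit `Register` class with stuck-at-0 bits is a made-up fault model, and `immediate_test` is one possible alternative check, not something the BIOS in the story is known to have done.

```python
class Register:
    """An 8-bit register with optional stuck-at-0 bits (hypothetical fault model)."""
    def __init__(self, stuck_low=0):
        self.stuck_low = stuck_low
        self.value = 0

    def write(self, v):
        # Bits listed in stuck_low can never hold a 1.
        self.value = v & 0xFF & ~self.stuck_low

def chain_test(regs, pattern):
    """The test as described: load, copy down the chain, compare last to first."""
    regs[0].write(pattern)
    for prev, nxt in zip(regs, regs[1:]):
        nxt.write(prev.value)
    return regs[-1].value == regs[0].value

def immediate_test(regs, pattern):
    """Alternative: compare every register against the constant itself."""
    for r in regs:
        r.write(pattern)
        if r.value != pattern:
            return False
    return True

def run(test, regs):
    """Apply a test with both alternating bit patterns."""
    return all(test(regs, p) for p in (0x55, 0xAA))

faulty_first = [Register(stuck_low=0x08)] + [Register() for _ in range(3)]
faulty_later = [Register(), Register(stuck_low=0x08), Register(), Register()]

print(run(chain_test, faulty_first))      # fault in the first register slips through
print(run(chain_test, faulty_later))      # fault further down the chain is caught
print(run(immediate_test, faulty_first))  # comparing against the constant catches it
```

    The chained version compares the last register against a first register that is already corrupted, so both sides of the comparison agree.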

  119. Still practical... by Ungrounded+Lightning · · Score: 2
    Now we have a way to deliberately make Linux instable.... if you subscribe to the theory that if a DIMM has bad areas then that increases the probability that more of its areas will fail in the future.

    Good point! DRAMs are made with extra rows ... during manufacture the memory is checked and the chips are modified to allow the spare rows to replace the faulty rows. That's how they get the yield up so high. You can get faulty DRAM, but it is used mostly in voice-recording applications where the human brain can tolerate a few crackles.


    But what happens when there are more faulty rows than spares? Answer: They sell it to the crackling-audio people, for cheap. Such chips might not have a higher tendency toward progressive RAM-cancer than those with fewer faults (though I will be happy to stand corrected if someone has contrary data.)

    By marking the bad rows bad, Linux never allocates them. With virtual memory in fixed-size pages and memory-mapped I/O there's no penalty for scattering your data all over the place and hopping over the occasional chuckhole.

    Downside would be if there's a flakey cell and the memory test misses it. So a persistent bad-page map might be useful, as would beefing up the startup test if the feature is enabled, and adding a background memory test on the currently unallocated pages, to pick up any really-low-density faults.

    If an intermittent cell gives you a hit on a read-only or unmodified page, a hack in the parity-error recovery code could move and refresh it. A read hit on a modified page not yet written back to disk is bad news. (Another background hack could be writing modified pages back part of the time the disk is otherwise idle, to reduce that window.)
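
    A rough sketch of the background-test idea, on simulated memory rather than real hardware. Everything here is assumed for illustration: the `Memory` fault model, the toy 16-byte page size, and the `background_scan` helper. A real version would also have to save the bad-page map somewhere that survives a reboot.

```python
PAGE_SIZE = 16  # toy page size so the example stays small

class Memory:
    """Simulated RAM; `stuck` maps a byte address to bits that always read as 0."""
    def __init__(self, pages, stuck=None):
        self.cells = bytearray(pages * PAGE_SIZE)
        self.stuck = stuck or {}

    def write(self, addr, value):
        self.cells[addr] = value & ~self.stuck.get(addr, 0) & 0xFF

    def read(self, addr):
        return self.cells[addr]

def page_ok(mem, page):
    """Write alternating bit patterns to every byte of a page and read them back."""
    for pattern in (0x55, 0xAA):
        for addr in range(page * PAGE_SIZE, (page + 1) * PAGE_SIZE):
            mem.write(addr, pattern)
            if mem.read(addr) != pattern:
                return False
    return True

def background_scan(mem, free_pages, bad_map):
    """Test only unallocated pages; move failures from the free list to the bad map."""
    for page in list(free_pages):
        if not page_ok(mem, page):
            free_pages.remove(page)
            bad_map.add(page)

mem = Memory(pages=8, stuck={37: 0x02})  # one weak cell, which falls in page 2
free_pages = [0, 2, 3, 6]                # pages 1, 4 and 7 are in use and left alone
bad_map = {5}                            # pretend page 5 was recorded on an earlier boot
background_scan(mem, free_pages, bad_map)
print(free_pages, sorted(bad_map))
```

    Only free pages are touched, so the destructive write patterns never clobber live data.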
    --
    Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
  120. oh, bull by legLess · · Score: 2

    It's a lot more like saying, "This old Mercedes will only run on hi-test gasoline, but the new one burns any old crap you put in the tank and runs just as well."

    And if that doesn't impress management, take your faulty DIMM and throw it in a Win2k box. Sit back and watch the fireworks.

    --
    This isn't as much "normalization" as it is "don't take so many drugs when you're designing tables."
  121. Re:PS/2 by rnturn · · Score: 2
    ``Didn't PS/2s do this way back in the day?''

    PS/2?! Heck, didn't PDP-11s do this? I can still remember (barely, though) PDP memory boards with socketed discrete memory chips that included two or three spare chips so you could replace the bad chips yourself after locating them using an XXDP+ diagnostic. I don't recall bad chips bringing down the system but in those days, when a big PDP had 4MB of memory, every little bit counted and the OS worked around it.

    I heard somewhere several years ago that Windows got around flaky memory not by marking pages as `bad' but by forcing additional wait states if questionable memory was detected at boot time. Any truth to this?



    --

    --
    CUR ALLOC 20195.....5804M
  122. yes but by Lord+Omlette · · Score: 1

    if the ram went 'bad' in the first place, what's to stop even more pages from not being usable while the machine is turned on? Does anyone else think this is a remarkably bad idea? At the very least it'll require reboots until less and less of the memory is available. Is the money saved by buying bad ram worth the frustration of having to boot/fsck the machine at random? Am I the only one who thinks this is another case of "we did it because we could, not because it's a good idea"? It's not even a good example of that type of thinking...

    But you're absolutely right, no other OS vendor would even think about trying something like this. If you love your OS this much, I am not the right person to try and talk you out of it. Peace.
    --
    Peace,
    Lord Omlette
    ICQ# 77863057

    --
    [o]_O
  123. Bad RAM for cache by wmoyes · · Score: 2
    Flaky RAM would be very useful for caching. Simply add a checksum to all data stored in the flaky RAM. If, when reading the data, the checksum is invalid, just pretend that it was a cache miss. Most of the functionality and almost no chance of failure. Note that only the storage of the actual cached data should be put into the bad RAM, not the translation tables.

    I once had a motherboard that had a problem refreshing RAM above the 1MB boundary. You could write and read just fine, but as time passed you would watch individual bits revert back to 1's. It was kind of amusing watching all the graphics in doom change ;^). I 'fixed' the problem by writing a TSR that tricked all software into thinking I only had 640k of memory. That memory would have been fine for cache if the data was protected by a checksum/CRC.
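
    A small sketch of the checksum idea, assuming a read-only (or write-through) cache so that a miss can always be satisfied from the backing store. The `ChecksummedCache` class and its layout are invented for illustration; CRC32 from Python's `zlib` stands in for whatever checksum a real implementation would pick.

```python
import zlib

class ChecksummedCache:
    """Read-only cache whose payloads live in memory that may flip bits."""
    def __init__(self):
        self.flaky = {}      # key -> bytearray payload (imagine this is the bad RAM)
        self.checksums = {}  # key -> CRC32, kept in known-good RAM

    def put(self, key, data):
        self.flaky[key] = bytearray(data)
        self.checksums[key] = zlib.crc32(data)

    def get(self, key):
        payload = self.flaky.get(key)
        if payload is None:
            return None
        if zlib.crc32(payload) != self.checksums[key]:
            # Corrupted in storage: drop the entry and report a miss.
            del self.flaky[key], self.checksums[key]
            return None
        return bytes(payload)

cache = ChecksummedCache()
cache.put("block7", b"hello, disk block")
print(cache.get("block7"))        # intact entry is served from cache
cache.flaky["block7"][3] ^= 0x10  # simulate a single bit flipping in the bad RAM
print(cache.get("block7"))        # checksum mismatch is treated as a miss
```

    The checksums and lookup tables stay in good memory; only the bulk payload is exposed to faults.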

    1. Re:Bad RAM for cache by Roy+Ward · · Score: 1

      This would only work if either:
      * it was read-only cache
      * the cache was write-through.

      Writes to cache in most setups typically get written back to main storage sometime later, so flakey RAM would still be bad.

  124. Real information... by Anonymous Coward · · Score: 5

    Actually, memory fails for many different reasons. I personally work in the test department at a large semiconductor company that makes SDRAM. All memory gets tested before it gets soldered to the PCB, but it can still encounter a fail after it leaves. Single-bit fails and the like are actually fairly common. Most people don't even notice them. Also there are speed-related problems, heat-related problems, and mechanical problems that come up. For example, the early AMD chipsets had problems with certain memory. Memory also has clock issues and other little details that can affect things dramatically.

    However, this project seems a little far-fetched, since most memory gets a little worse over time. This is okay for a temp fix, but your memory will slowly get worse with time. Usually within 6 months the memory is almost totally bad. Another problem with using bad memory is that in several cases it will draw a larger idle current than other modules, and if you have more bad modules there is a higher current load. This can lead to damaged parts on your motherboard.

    Another thing to realize is that load style can affect your stability. In several situations it has been found that Windows can run over top of a memory error because it tends to not stress the memory quite as much as your basic high-load Unix setup. That's my $.02 on the issue I guess. It seems like this is basically using a hard drive that is whining and sputtering. Not a smart move stability-wise.

  125. mod parent up please by XNormal · · Score: 2

    Answering machines use ARAM chips which are actually faulty DRAMs. They map the defects and avoid using those areas.

    ----

    --
    Stop worrying about the risks of nuclear power and start worrying about the risks of not using nuclear power.
  126. OT: Viceroy Nute Gunray by ackthpt · · Score: 1

    Viceroy of the Trade Federation in Episode 1, best line in the whole movie (IMHO)


    --

    --

    A feeling of having made the same mistake before: Deja Foobar
  127. Umm.. by Auckerman · · Score: 2
    Forgive my ignorance, but can you actually purchase RAM that is "bad" and labeled as such? Also, anyone who wants "loads of RAM" is going to be in a position to need to buy good RAM, because their business and research will depend on it.

    With that said, if I can get a 512MB DIMM for $100 because 50MB of it is unusable, I'll buy it and install this hack, even though having anywhere near that much RAM on my Linux box will not help me (it's little more than an mp3 player and web browser; most of my work is on my Mac).

    --

    Burn Hollywood Burn
    1. Re:Umm.. by misleb · · Score: 2

      We are not talking about 50MB out of 512MB bad. I believe we are talking about single bits. Single 4k pages of addressable memory. That's 4k out of 512MB. If you could get a 512MB DIMM for $30, wouldn't you take it?
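
      To put rough numbers on that, here is a back-of-the-envelope calculation. It assumes 4 KB pages and that every fault lands in a different page; the bad-page counts are hypothetical.

```python
PAGE = 4 * 1024            # one 4 KB page
DIMM = 512 * 1024 * 1024   # a 512 MB module

pages_total = DIMM // PAGE  # number of pages on the module
print(pages_total)

for bad_pages in (1, 16, 256):
    lost = bad_pages * PAGE
    print(f"{bad_pages:>3} bad pages: {lost // 1024:>4} KB lost, "
          f"{100 * lost / DIMM:.4f}% of the module")
```

      Even a few hundred bad pages cost well under one percent of the module.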

      -matthew

      --
      "THERE IS NO JUSTICE, THERE IS ONLY ME." -Death
    2. Re:Umm.. by banky · · Score: 2

      we have a local reseller of used machines and parts that has buckets and buckets full of ram marked as bad. There's probably a company like this in every large town; they seem to do quite well, making a nice buck selling "throw away" boxes and other stuff like that.

      --
      ZOMG I WOULD LOVE TO KNOW ABOUT YOUR FEELINGS ON MACINTOSH VERSUS WINDOWS, VI VERSUS EMACS, AND HOW YOU'RE NOT A DORK
  128. Signal 11 no more? by Mel · · Score: 4

    Actually this is quite handy. Windows always worked better with dodgy memory than Linux did because Linux always tries to use as much memory as possible for caching, whereas Windows didn't.

    It made it notorious for working with dodgy memory, failing to boot half of the time. I've seen people blame Linux for bad hardware because it would work with Windows.

    It's nice that Linux now could just go

    *ARGH YOU HAVE CRAP MEMORY*

    shrug its shoulders and chug along anyway.

  129. Embedded Systems possibilities? by Halo- · · Score: 1

    Doesn't this seem like a great opportunity for Linux (or any OS) in the embedded market? Suppose I have some critical and rather inaccessible chunk of hardware. (Satellite, remote weather station, ...) Wouldn't it be cool if the hardware could detect the fault and "heal" itself? Anyways, this is waaayyy too late to get read by anyone sane. (I have to admit that I only read the first 100 or so posts, so sorry if someone already had this idea.)

  130. Re:Is this good for Linux's rep? by Abcd1234 · · Score: 3
    Okay, no offense, but unless you're joking, that's one of the most ridiculous things I've ever heard. What this does is associates Linux with the ability to compensate for bad hardware. It makes Linux look MORE robust, not less... yeesh, some people...

  131. Note to RMS by coolgeek · · Score: 1

    Better get this guy (the Bad RAM webpage dude), he said Linux was "open source" software.

    --

    cat /dev/null >sig
  132. no, mod this guy down by megalomang · · Score: 1

    I don't think we're talking about defects that are detected when the memory is in the wafer stage. I think we are talking about the die that passed the initial wafer probe and then were packaged and then fail somehow at or after the packaging or even shipping stages.

    We are talking about packaged silicon here. And we're talking about populated DIMM/SIMM/etc. boards that fail. There's no way in hell that a manufacturer would let faulty die get that far, and by that time there's no way that any type of depositing or FIB technique is gonna be practical.

    So with that in mind, let's return to the topic they are talking about. Why don't you just allow us to think that it's really cool to use software to lock out blocks of RAM? (i.e. by using page protection that is already present in the OS and already supported by the CPU)

    So in conclusion, I think there are plenty of us at /. that have a clue about memories, memory protection architecture, fabrication, production, and testing techniques. We also are aware that there are a few techniques in practice like the one that you mention, but I seriously doubt that they are applicable to such dense cells as memories. Even if you could spare the silicon real estate and added logic and routing to provide the redundant resources, you would still have to reroute on a per-die and per-failure basis. And that assumes that the defect belongs to a special class of failures and is confined to an easily fixable range of memories, etc.

    Don't get so high on yourself over there.

  133. Chips may still not work by Fervent · · Score: 2
    512MB sticks are still expensive, faulty or not.

    Plus, some of the motivation is a little askew. If you want to push these chips into an old machine, you still have the problems of RAM limitations due to motherboard design. A fat lot of good 512MB of semi-faulty memory is going to do in a board that can only support up to 32MB (or better yet, an older chipset that supports up to 8 or 2).

    --

    - I don't care if they globalize against free speech. All my best free thoughts are done in my head.

  134. M$ DID IT FIRST by t0qer · · Score: 1

    MS did this a lonnnnnng time ago in dos. device=emm386 -exclude 0fff-ffff or something like that. Of course this doesn't work for the first 640k but hey just another example of how m$ innovates --toq

  135. My memory sucks by suitcase · · Score: 1

    I had memory in one of my machines that was totally out there. Would cause the weirdest problems during compiles, but never a sig 11.

    I wanted to rename the box Alzheimers.

  136. Re:If only it made sense.. by dAzED1 · · Score: 1

    "speed drops because the program has to wait a few cycles to determine if it can use an area of ram or not." HELLOO... this is already being done. The memory is already being mapped; there is already a section specifically designated to store this info. Otherwise, you'd write to GOOD areas of the RAM that were already in use by running applications... There is no reason that this should make your system any slower in the least bit. This is a very useful tool, and once developed will allow for on-the-fly saves. Not only that, but it's going to allow me to have 512MB of RAM in my system, without worrying about my wife fussing at me about the expenditure.

  137. Better hurry... by darial · · Score: 5
    I beat feet to my local purveyor of crappy used hardware as soon as I saw this, and all I have to say is:

    handful of busted 256M DIMMs: $10.71 with tax

    6 reboots, a little math, and a partial kernel compile: 21min

    The look on my roommate's face when I typed "top": priceless!

  138. Bad RAM by Octal · · Score: 1

    I don't know about you guys, but whenever my RAM goes bad, it usually won't even boot, so this patch is rather useless to me. It's even more useless when you consider I don't use 2.2 anymore.

  139. Swiss Cheese by twisty · · Score: 3
    Those who know me realize my memory is already Swiss Cheese. ;-) But I think that this latest breakthrough takes Exception Handling to new levels of fault tolerance...

    Linux forced its way into our IT Department when it could restore a trashed system into something useful. Here at The Salvation Army, we endeavor to be good stewards of what we are given. We have an IBM PC Server 350 (now named "Methusela") that crashed one day for no apparent reason. It refused to run Windows anymore... not even Win98 or Win95!

    But it ran Linux flawlessly. Well, actually it did point out one flaw on its own: The internal Ethernet controller was getting an unusually high number of bad packets. It would receive DHCP assignments, even do some web work in Linux... but it was enough to shut Windows down completely. Even after installing a working NIC, Windows could not run due to the faulty internal NIC, but Linux ran fine!

    Likewise, we found an instant way to crash every WinNT system in the building. Someone was re-arranging the hubs and switches, and accidentally created a packet loop by plugging a switch back to itself... in three seconds every WinNT system on the network went straight to the Blue Screen of Death.

    It's one thing to handle the rules well, but quite another to deal with the exceptions!

  140. Re:sounds good for high-performance computing shop by dbretton · · Score: 1

    No! A high performance computing shop would never use substandard parts, never mind parts that are KNOWN to be BAD.
    Bad parts, regardless of whether they are somewhat usable, can corrupt data or cause incorrect results.
    Therefore, those parts cannot be trusted.

    -Dennis

  141. anti-linux? by mosch · · Score: 4

    You must have some sort of problem with linux. This is a valuable, and technically interesting addition to the Linux kernel, and all you can do is act like everybody in the world who needs 256MB DIMMs also has $135 ready.

    I know you're just trolling, and I shouldn't respond, but for students, and anybody who has access to memory modules that are experiencing known, predictable faults, this would be great. Not everybody has some fancy $30,000/year job, y'know.

    --
    "Don't trolls get tired?"

  142. Coming soon to Mac OS-X by burris · · Score: 2
    Considering that the kernel of Mac OS-X is Open Source, you'll eventually be able to have a hack like this on your Mac as well.

    Burris

  143. so what they're saying.. by peterjm · · Score: 1

    ..is that there's a patch to the linux kernel that will allow it to use rambus memory?

    sweet.

  144. Here's my writeup with paragraphs by detach · · Score: 1

    Hey this sounds illogical. If RAM chips were damaged, there is potential that they get damaged further, and eventually your amount of usable RAM will run down to 0 bytes. They are electronic components anyway.

    Would you run your O/S on a hard disk with lots of damaged bad sectors and a potentially dying motor? I doubt so. First, correct me if I'm wrong, but before your O/S even boots, the first few bytes of your RAM are already taken up. If they're damaged, there's a few things which could happen.

    a) You can't start your computer
    b) Somewhere before your O/S loads, the whole boot process goes bonkers
    c) It corrupts the data in your hard disk

    Say, this is working at O/S kernel level, and the kernel loads way after a huge chunk of initialization process done by your system during boot, and some systems just refuse to boot after a failed RAM test.

    Even if you got your system running, there's a high risk of failure. Why waste time and money on a system that's gonna potentially die on you? Just spend a few more bucks on a brand new RAM stick.

    Come on, I run my Squid on a 48mb system!

  145. One step further: by Soko · · Score: 2

    If he can only get the Kernel to do this on the fly...

    --
    "Depression is merely anger without enthusiasm." - Anonymous
  146. ARRRRGHHHHH ONE WEEK TOO LATE!! by child_of_mercy · · Score: 1
    Last week i pitched 64Mb of faulty RAM out of my home box!!!!

    What's worse is my house mate has cut it in two and glued it to the side of one of his model tanks to play Warhammer 40k with!

    Every time i see that sucker trundling across the battlefield i'll be reminded of 100 wasted dollars!

    It's certainly now an expensive model tank eh?

    --
    'There is a Light that never goes out.'
  147. PS/2 by Kevbo · · Score: 1

    Didn't PS/2s do this way back in the day? I seem to remember having an old Model 57 that reported an error in memory and then went happily on its way, using less RAM.
    maybe?

    --
    In Vino Veritas
  148. get real, this is only useful for high reliability by Splork · · Score: 2

    The use of a patch like this is only on systems with ECC ram for moving data out of pages which had non-catastrophic memory errors so that the system can keep running without using the flakey bits until that DIMM can be replaced.
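
    A toy model of that idea, purely illustrative: when a frame reports a corrected error, copy its contents to a fresh frame, repoint the mapping, and never hand the old frame out again. The `PageTable` class and its method names are invented here; a real kernel does this with far more machinery.

```python
class PageTable:
    """Toy model mapping virtual pages to physical frames."""

    def __init__(self, num_frames):
        self.free_frames = list(range(num_frames))
        self.retired = set()   # frames that must never be reused
        self.mapping = {}      # virtual page -> physical frame
        self.contents = {}     # physical frame -> data stored there

    def map_page(self, vpage, data):
        frame = self.free_frames.pop(0)
        self.mapping[vpage] = frame
        self.contents[frame] = data
        return frame

    def on_corrected_error(self, frame):
        """React to a (hypothetical) corrected-error report for one frame."""
        self.retired.add(frame)
        if frame in self.free_frames:
            # Frame was idle: just take it out of circulation.
            self.free_frames.remove(frame)
            return
        for vpage, used in self.mapping.items():
            if used == frame:
                # Frame was in use: move the data while it is still readable.
                new_frame = self.free_frames.pop(0)
                self.contents[new_frame] = self.contents.pop(frame)
                self.mapping[vpage] = new_frame
                break


pt = PageTable(4)
pt.map_page("a", b"hello")   # lands in frame 0
pt.on_corrected_error(0)     # data moves to frame 1, frame 0 is retired
print(pt.mapping["a"], pt.retired)
```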

    bad ram is just that, bad, and is likely to have more failures over time.

  149. Hello this was on Kernel Traffic a long time ago by Squeezer · · Score: 2

    Hello Slashdot??? This is old news. The kernel bad ram patch was discussed on the weekly kernel traffic digest several weeks ago. Do the slashdot story posters read any news sites other than slashdot? Other news sites are out there...

    --
    Does the name Pavlov ring a bell?
  150. Could have used this last year.... by MattW · · Score: 2

    I built a box from a hole-in-the-wall parts reseller that did volume, volume, volume in the silicon valley, and started having some stability issues. So I started doing mass kernel recompiles (100 at a time) as a test, and sure enough, gcc exited with errors, at random points. However, I've heard that this is not necessarily 'bad bits' on the memory sticks, but rather an inability of the memory to actually keep up with the 100MHz bus, even though it was billed as PC-100 RAM. Anyhow, after that, I always sprung for the premium Toshiba lifetime-guarantee ram at fryes, and I just got the other parts elsewhere.

  151. Re:Is this good for Linux's rep? by British · · Score: 2

    Both my almost-never-used linux boxes are entirely composed of 2nd hand hardware. I'd say it would come in handy. Dumpster divers would appreciate it.

  152. A long time ago... by Richy_T · · Score: 2
    I suggested this very thing on a Linux newsgroup and it was poopooed as being not worth the effort. Kinda nice to see my idea vindicated. Even though I now would tend to agree with the poopooers.

    Rich

  153. Absolutely good. by jabber01 · · Score: 1
    It SURE is...

    Now Linux is SO GOOD, it can even run well on defective hardware. Let's see M$ do that one! Now who's the one chasing tail-lights? The only way for M$ to one-better Linux is to make an OS that runs on NO RAM, and we know where they stand on bloat..

    The car metaphor is flawed. It's more like Cadillac's 'limp home' feature that will run a V-8 engine on only 4 cylinders in the event of cooling failure, equipped with run-flat tires and bullet-proof Windows.

    The REAL jabber has the /. user id: 13196

    --

    The REAL jabber has the user id: 13196
    What you do today will cost you a day of your life

  154. Now there's a point to the BIOS memory test? by kyz · · Score: 3
    I've never trusted PCs because the BIOS 'tests the memory' before booting up. Why do they do this?
    • Does it run every possible combination of CPU instructions on boot up? No!
    • Does it check every single block on the hard drive? No!
    • Does it check all the blocks of floppies, CDs, DVDs, etc to make sure they work? No!
    • If the memory test is essential to the functioning of the system, why do they let you skip it?
    Obviously, the smart thing to do is to _wait_ for the memory to fail rather than test the whole lot for a minute or two. After doing a full test once, the first time you boot, you can leave a very low priority memory tester running, or leave the full test to some quiet period with a cron job - a decent memory test of course, not that half-witted test that BIOSes do.
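
    For what a "decent" test might look like in outline, here is a sketch run against simulated memory, since ordinary user code can't poke physical RAM directly. The fault model (bits stuck at 0) and the names `SimulatedRam` and `pattern_test` are assumptions for illustration; real testers use many more patterns and address sequences.

```python
class SimulatedRam:
    """Byte-addressable toy memory where chosen bits are stuck at 0."""

    def __init__(self, size, stuck_at_zero=None):
        self.cells = bytearray(size)
        self.stuck = dict(stuck_at_zero or {})  # address -> mask of stuck bits

    def write(self, addr, value):
        # Stuck bits silently refuse to take a 1.
        self.cells[addr] = value & ~self.stuck.get(addr, 0) & 0xFF

    def read(self, addr):
        return self.cells[addr]


def pattern_test(ram, size, patterns=(0x00, 0xFF, 0x55, 0xAA)):
    """Write each pattern everywhere, read it back, collect failing addresses."""
    bad = set()
    for pattern in patterns:
        for addr in range(size):
            ram.write(addr, pattern)
        for addr in range(size):
            if ram.read(addr) != pattern:
                bad.add(addr)
    return sorted(bad)


ram = SimulatedRam(64, stuck_at_zero={10: 0x04, 33: 0x80})
print(pattern_test(ram, 64))  # [10, 33]
```

    The list of failing addresses is the kind of input a bad-page patch needs; writing a single value once, as a quick boot-time check might, would miss any bit that happens to be stuck at the value written.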
    --
    Does my bum look big in this?
  155. Re:If only it made sense.. by CyberChrist · · Score: 1

    Yeah, but your engine is most likely running unbalanced. The firing order is there for a reason. Most likely in the Caddy V8s you're talking about, the onboard computer would stop firing the opposing cylinder under the other head as well.

  156. ... by slothbait · · Score: 2

    No flame here, just clarification...

    > we are talking about the die that passed the initial wafer probe and then were packaged and then fail somehow at or after the packaging or even shipping stages.

    Yes, we definitely are. I only brought up the fact that the industry already routes around process errors in DRAMs to demonstrate a point. He seemed frightened by the possibility that future DRAMs we buy might not be 100% "clean". I wanted to demonstrate that 100% clean isn't necessary, and in fact isn't produced now, by and large. What *is* necessary is 100% functional parts. DRAM manufacturers know this, and use it to improve yields, thus driving down cost. No foul play there.

    > So in conclusion, I think there are plenty of us at ./ that have a clue about memories, memory protection architecture, fabrication, production, and testing techniques.

    I expect there are, but none of them were posting. Instead, most of the posts demonstrated a clear lack of understanding of the process. The consequences of techniques like Linux's remapping seemed to worry the original poster, and I wanted to explain why it wasn't something to worry about since a similar process is already performed quite successfully. Further, I wanted to emphasize that this is a perfectly valid technique for increasing yield, and is transparent to the user. It isn't like the manufacturers are trying to rip people off.

    > Why don't you just allow us to think that is't really cool to use software to lock out blocks of RAM?

    I'm not saying it isn't cool. Some seemed wary of the idea, though, and I wanted to point out that their present DIMMs use something very much like this. As long as the software can do this transparently (as the hardware does), what does it matter to the user?

    > but I seriously doubt that they are applicable to such dense cells as memories

    Believe me: they are. Think of it: a massive die area that will be completely destroyed by a single speck. Wouldn't you prefer an ever so slightly (say 5-10%) larger die that can withstand one or two specks? The redundancy is very easy to use in a homogeneous structure like DRAM (millions of identical cells). All that has to be done to "swap in" the replacement RAM block is to modify some address lines. That can be done by electrically blowing fuses on the die, or through laser modification. One of my former employers was *most* fond of the laser approach. It added quite a bit of flexibility to their designs.
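
    In software terms the remapping amounts to a small lookup in front of the row decoder. The sketch below is only an analogy for what the fuses do in hardware; `DramArray`, `blow_fuse`, and the row counts are hypothetical.

```python
class DramArray:
    """Toy model of a DRAM array with a few spare rows."""

    def __init__(self, rows, spare_rows):
        self.rows = rows
        # Spare rows sit past the end of the normal address range.
        self.free_spares = list(range(rows, rows + spare_rows))
        self.fuse_map = {}  # defective logical row -> spare physical row

    def blow_fuse(self, bad_row):
        """Permanently redirect a defective row to a spare (done at test time)."""
        if bad_row in self.fuse_map:
            return self.fuse_map[bad_row]
        if not self.free_spares:
            raise RuntimeError("no spare rows left; this die cannot be repaired")
        spare = self.free_spares.pop(0)
        self.fuse_map[bad_row] = spare
        return spare

    def physical_row(self, logical_row):
        """What the address decoder selects for a given logical row."""
        return self.fuse_map.get(logical_row, logical_row)


die = DramArray(rows=1024, spare_rows=2)
die.blow_fuse(17)
print(die.physical_row(17), die.physical_row(18))  # 1024 18
```

    From the outside the part still presents 1024 good rows, which is why the repair is invisible to the user.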

    --Lenny

  157. Re:who would trust ram from Fryes? by MattW · · Score: 1

    Actually, that's the point: the toshiba ram from Frye's was actually good stuff. It was the crap ram from a hole-in-the-wall parts dealer that was no good. They mostly sold name brand stuff, just bulk and deep discounted (Hi-Tech USA), and obviously a p3 from them is the same as a p3 from anyone else, but the RAM they included in the packages by default just didn't cut it.

  158. Re:If only it made sense.. by verbatim · · Score: 1
    There is no performance hit; the pages are marked bad and not used, not continuously tested. It's the same as marking a chunk of memory as invalid (causing a page fault on access) but never marking it valid again since it's never swapped back in by the VMM. I'm not well versed enough to know if memory with bad sections will get progressively worse but everything else you've mentioned is silliness.
    Uhh.. Another kernel level module won't cause any problems? Uh huh. It has to "mark" these bad memory sectors somewhere - I'll need to look at the thing to be more informed on HOW it marks these sectors - probably in memory. I don't think that it is unreasonable to assume that this module is not as great as some are making it out to be.

    Why?

    • another potentially buggy kernel module
    • memory is used to record the bad locations
    • speed drops because the program has to wait a few cycles to determine if it can use an area of ram or not.

    Also, wouldn't an entire page of memory be wiped out - not just one bit? I haven't looked at what these guys have done, but I wouldn't be surprised if entire 64KB pages are affected if only one bit in that page is gone.
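
    To put rough numbers on the page-granularity question: the snippet below assumes each page containing a bad bit is withheld whole, and compares 4 KiB pages (the usual x86 page size) with the 64KB figure guessed above. The fault addresses and the 256MB module size are invented for the example.

```python
def memory_lost(bad_bit_addresses, page_size=4096):
    """Bytes made unusable when every page holding a bad bit is withheld."""
    bad_pages = {addr // page_size for addr in bad_bit_addresses}
    return len(bad_pages) * page_size


total = 256 * 1024 * 1024                 # a hypothetical 256MB module
faults = [0x0123456, 0x0123999, 0x0F00000]  # three bad bits, two share a page

lost_4k = memory_lost(faults)                      # 2 pages * 4096 = 8192
lost_64k = memory_lost(faults, page_size=65536)    # 2 pages * 65536 = 131072
print(lost_4k, lost_64k)
print(f"{100 * lost_4k / total:.4f}% of the module")  # 0.0031% of the module
```

    Either way the loss is a tiny fraction of the module as long as the faults are few and clustered; it only gets expensive if bad bits are scattered across many pages.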

    As I said before, I think it's really cool what they have been able to do. There may be some niche areas for this program to be useful. It is not, however, a good thing (IMO) to be buying bad memory just to save a buck.

    That's just my opinion tho...

    Verbatim

    --
    Price, Quality, Time. Pick none. What, you thought you had a choice?
  159. Why bother? by Animats · · Score: 4
    It would make more sense to use ECC RAM, which can tolerate even intermittent bad bits. You get an interrupt on an ECC correction, at which point the OS should stop using that memory, without crashing. Mainframes were doing that decades ago. It's worth doing because it keeps a working system up, and Linux should have that feature. It's a big win for server farms.
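
    For anyone wondering how a bad bit gets corrected at all, here is a toy Hamming(7,4) code: 4 data bits plus 3 check bits, enough to locate and flip back any single wrong bit. This is only an illustration of the principle; as far as I know, real ECC modules use a wider code over 64-bit words, and the function names below are made up.

```python
def hamming74_encode(data_bits):
    """Encode [d1, d2, d3, d4] into a 7-bit codeword (positions 1..7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, error_position); error_position is 0 if no error."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4  # syndrome names the bad position
    if position:
        c[position - 1] ^= 1         # flip the offending bit back
    return [c[2], c[4], c[5], c[6]], position


word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1                         # simulate one flaky cell (position 5)
print(hamming74_decode(word))        # ([1, 0, 1, 1], 5)
```

    A nonzero position is the software analogue of the corrected-error interrupt: the data is still right, but the OS now knows which memory to stop trusting.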

    Modern DRAM doesn't have much trouble with bad cells, and the yields are quite good. So there isn't a big supply of DRAM with bad cells that fail solidly. Most DRAM problems today are at the edges: at the buffers, the connectors, or clock synchronization - the things that can be messed up during installation.

    Personally, I get ECC RAM even on desktops, just so I know it's working. It eliminates arguments with tech support when the hardware really is broken.

  160. Mod this guy up dammit! by sethdelackner · · Score: 1

    He's right you know. How many times have you looked at a pile of old computer parts and thought it would just take a couple hours of effort and maybe a $20 whatzit to make it all hum as your new file server?

    Last time I tried that it took me 20 hours of bad part diagnostics and piecemeal parts replacements. 20 hours worth of effort that I could have just spent working and then bought a brand new (cheap) machine.

  161. They throw them away now by A+nonymous+Coward · · Score: 2

    So *any* price they can get, above and beyond shipping and handling, is pure profit to them. And there's a cost to throwing them out which they wouldn't have any more.

    --

  162. Re:sounds good for high-performance computing shop by ackthpt · · Score: 1

    Nobody [except the brainless] leaves defective parts in mission critical hardware.

    If I were experimenting with something, I might consider this patch, but for someone who leaves a server running, and expects stability for months at a time, it's not within the realm of consideration.


    --

    A feeling of having made the same mistake before: Deja Foobar
  163. Predictable faults? by ackthpt · · Score: 2

    So tell me, how, without a memory tester, do you know what's predictable vs. unpredictable? As far as I'm concerned with DIMMs, 5% broken is 100% broken.


    --

    A feeling of having made the same mistake before: Deja Foobar
  164. Re:Err... by Dastardly · · Score: 1

    Well, that $135 also covers the cost of bad memory that normally gets thrown out. So, since the bad memory is essentially already paid for, you could consider selling a bad piece of memory to be all profit.

  165. Re:Does Slashdot readership know nothing of hardwa by ogre2112 · · Score: 1

    My pants just got tighter.

  166. Err... by Wakko+Warner · · Score: 4
    *how* much cheaper can faulty RAM be? I mean, 256MB SDRAM DIMMs are already $135 apiece... Would it really be worth it to get a dodgy piece of memory if the difference in price is negligible?

    - A.P.

    --
    * CmdrTaco is an idiot.

    --
    "Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"
  167. Can't you just imagine by www.lunateks.com · · Score: 1

    a Beowulf Cluster of Bad DIMMs? Seriously, I am impressed by the immaturity of posts here. Whenever semiconductors are made in large scale in large factories (like those of Hyundai), there is a proper verification system, which checks the DIMMs before sealing, and if any defect is found, then it's sent back to a separate process, mostly scrapped. Now if a company decides to actually sell defective DIMMs, the cost would come out to be greater than that of normal DIMMs simply because of the ratio of good yield to bad yield. Bill Clinton having sex with Chelsea, here!

  168. Best Buy by FraggleMI · · Score: 4

    You can check out Best Buy or CompUSA for some faulty RAM. They seem to have a never ending supply of it. Not only that but you can pay the price that you can get it off of the net for good ram!

    --
    huh?
  169. If only it made sense.. by verbatim · · Score: 2
    Say that current-day RAM modules would cost $100. These are the 100% correct modules, all others are wasted. By selling a decent subset of the wasted ones for a profit of, say, $20 each, a new audience is addressed, namely the one that thinks $100 is a lot (mostly home users). The professional market has the tendency to go for high-quality materials, and they will continue to be willing to pay $100 per module.

    Okay, so your car dealer marks down the car 80% because ONE of the pistons doesn't work right. However, the rest work fine and he installed a thing-a-ma-bob to make the engine ignore that piston.. ummm..

    What kind of warranties will the end user get for the memory? what kind of performance is eaten by this program? does the memory run up to spec? will it still work in 2 months?

    There could be a niche market for "used" memory sticks, but "damaged" or "defective" may not sell all too well...

    I agree, however, that this does seem like a cool way to resurrect older systems into useful appliances (print servers, routers/gateways, etc).

    Verbatim

    --
    Price, Quality, Time. Pick none. What, you thought you had a choice?