MRAM Inches Towards Prime Time
levin writes "According to an article over at EETimes, magnetoresistive RAM chips are getting a little more practical. Infineon Technologies released info on a new 16M MRAM component on Tuesday and the read and write cycle times of this chip make it 'competitive with established DRAM.' How long before nonvolatile memory becomes the solution to crash-prone software rather than better programming?"
I don't care how fast it is... What good is 16M?
Toast lands jelly down. If you jelly both sides of a piece of toast, it will hover in a state of quantum indecision.
Last time I checked, most of the software crashes aren't caused by memory randomly disappearing.
How long before nonvolatile memory becomes the solution to crash-prone software rather than better programming?
Probably very long......
There are very few volatile-memory-related software bugs.....
HINT: You don't want your RAM back in the same corrupt state it was in before the reboot.
Jeroen
Secure messaging: http://quickmsg.vreeken.net/
That being said, imagine the power savings and lightning fast startup times! I'd love an "instant on" PC! (or, erm... Mac)
CAn'T CompreHend SARcaSm?
Looks cool for applications such as hibernate.
Someone explain to me how MRAM will help with stability if it is simply replacing the same type of functionality that good old fashioned RAM has.
My sig is blank, I typed this by hand.
How is nonvolatile RAM supposed to prevent crashes? Crashes are the result of unexpected program interaction, hardware incompatibility, or poorly-anticipated user input.
"How long before nonvolatile memory becomes the solution to crash-prone software rather than better programming?"
Never. Having the same bits in memory after a reboot doesn't help if you wrote the wrong bits in the first place.
"On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."
~Lake
Non-volatile main memory is unlikely to be a solution against crash-prone software. If the software crashed because there was a bug in how it handled the data in memory, and the data is still there when the application reads it again, it'll just crash in exactly the same place.
In any case, an application crashing very seldom causes the machine to actually power down, and an application crashing and being restarted never gets to use the same memory the same way anyway, so the point is entirely moot. If your main memory is nonvolatile RAM, the advantage is you can design a system that can be powered down and suspended without that lengthy write of the entire machine's state to disk (and read when it comes back up again), which would be extremely useful on a laptop. If you can do this, you can essentially have uptimes of years, so the incentive would be to write MORE stable operating systems and applications if the expectation is that even a laptop may go years between reboots.
Oolite: Elite-like game. For Mac, Linux and Windows
How long before nonvolatile memory becomes the solution to crash-prone software rather than better programming?
What? A crash-prone program is a crash-prone program, regardless of whether it vanishes or not when you turn the power off.
How long before nonvolatile memory becomes the solution to crash-prone software rather than better programming?
Good, now I'll be able to preserve memory corruption even after a power-cycle! Last time I checked, software crashes weren't due to the fact that DRAM loses its contents when powered down.
If your operating system has crashed, it has crashed. You need to reboot it. MRAM cannot change that. The point is that with MRAM you should be able to switch off your computer and switch it on again later without a reboot, and without needing to save RAM contents to disk at power-down and restore them from disk after the system is switched on again.
Hence if anything, this technology will increase pressure on operating system vendors to produce OSes which don't crash badly enough to require a reboot.
Under construction: swpat politics overview article
One step closer to replacing HDD, CDROM, DVD and all those other "moving parts" storage devices.
In 20 years, we'll all be looking back at DVD and CDROM like we do at Tape Cassette.
Moving parts and things that go whirr make me cringe.
I just want to plug it in and get instant access.
Who cares about instant on? Using this advanced operating system called Linux my PC never needs a reboot.
On the other hand, Windows users are gonna love this feature.
"How long before nonvolatile memory becomes the solution to crash-prone software rather than better programming?"
Hello. What do you think -hard disks- are?
I'll give you 5 seconds to come up with a list of operating system 'features' that have been 'standardized' which really resulted from this 'ideology' about how to not write 'safe' code and just let other parts of the system 'deal with it'
Give up? Okay, I'll give you a few:
1. Swap. Yup, if the program has no idea how much RAM it has or needs, and no idea how to manage it, and the programmer just wants it all
2. "Protected Memory". Yup. Same deal. Let the OS deal with 'bad programming'.
Non-volatile memory has nothing to do with 'protecting from bad programming' and everything to do with writing 'true' persistent state machines... just like these two 'features'.
In summary: If it wasn't for 'bad programming', operating systems wouldn't have anything to do
Flame on.
; -- the corruption of government starts with its secrets. a truly free people keep no secrets. --
That means I can switch on my comp like I do the TV!! That's pretty cool.
In that case,
next stage: "Computer ON"------"computer off" voice commands!
Is this all fantasy or will it come true one day with the MRAM?
Why does yahoo do this
Is not the main point of non-volatile memory. The two main advantages are significantly less power consumption (you only put energy into it when you want to change the state, not on every single cycle) and having permanent storage at the speed of system memory (may I see the time coming when there will be no separate permanent storage devices, like HDs and all this periphery, with all the bus technology and other error-prone parts?... It's a long shot, but this is an important first step)
Just because I can imagine doing a hippopotamus, doesn't mean I'd like to do it.
I don't see how non-volatile memory will cure crash-prone software. One of the main contributory factors to buggy code is memory leaks. Non-volatile memory would allow the invalid state of memory to be preserved between 'reboots', making defects in code more obvious. If non-volatile memory is adopted, then we'll need higher quality code than we have now.
"The noble art of losing face will one day save the human race"---Hans Blix
They'll just make the ram blue with white text on it in the future. Windows can even crash a stationary car these days
I like muppets.
... to the "immediately on" computer. Boot times reduced to next to nothing will prove to be a giant leap in the usability of computers, I think.
"It usualy starts with some screaming. Afterwards there is much running around."
Wouldn't non-volatile RAM actually make programmers more attentive?
One of the most common programming errors is a memory-leak. Can you imagine what would happen if you couldn't reboot the Windows machine to clear the memory for another few days?
Non-volatile RAM may be the best excuse yet to switch to something more, ah... tightly coded!
That said, I think that the current memory/disk model of computing is antiquated. Why distinguish memory from disk? Why not treat it all the same?
A HDD is the base storage medium. RAM is a cache of that. L2 cache is a cache of RAM. L1 cache caches L2 cache.
Why the distinction from HDD to memory? Instead of allocating RAM directly, why not follow the *nix philosophy of "everything is a file" and if you want a storage space for some temp values, open a file and write them in.
The memory allocated for a particular process would then appear as a file (perhaps buried somewhere in /proc ?) like any other file. Then, determining which program was leaking ram could be done with a simple `ls -la`.
Instead of flushing to special swap partitions, the memory files would simply be committed to disk when you run out of RAM. (moved down the cache chain from RAM to disk)
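For what it's worth, the "memory as a file" idea can already be approximated with memory-mapped files. The following is only a minimal sketch of that idea in Python, not a description of any existing OS design; the file name `scratch.mem` and the 4096-byte size are made up for illustration.

```python
import mmap
import os
import tempfile

SIZE = 4096  # one page of scratch space (arbitrary choice for this sketch)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "scratch.mem")  # hypothetical file name

    # Create the backing file at its full size.
    with open(path, "wb") as f:
        f.truncate(SIZE)

    # Map the file and use it like an ordinary in-memory byte array.
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), SIZE) as mem:
            mem[0:5] = b"hello"
            mem.flush()  # ask the OS to write the mapped pages back to the file

    # The "memory" is now visible as a plain file: its size and its contents.
    file_size = os.path.getsize(path)
    with open(path, "rb") as f:
        first_bytes = f.read(5)

print(file_size, first_bytes)
```

Here the operating system decides when the pages live in RAM and when they go to disk, which is roughly the "moved down the cache chain" behaviour described above.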
Switching to a fundamentally different type of memory may be the right time to reconsider system architectures and challenge our conventional assumptions of computing, especially since memory leaks can be so severe, even in commercial software!
I have no problem with your religion until you decide it's reason to deprive others of the truth.
Hmm...fast and low power - I like it. I don't exactly know how it might be a substitute for my PC's RAM, but I can certainly imagine it being a great way to replace Flash and SRAM.
Programs don't crash because the memory is cleared during reboots. They crash because they refer to memory that never existed in the first place.
Perhaps nonvolatile memory will improve startup times (think super-fast hibernate) but crashes? Not a chance.
The real future is in ENRAM. Give it all your money and then it crashes !
A Multiplayer Strategy Game for Mac OS X, Windows, and Linux
At least now, when your Windows crashes, you can reboot your machine, and in extreme cases powercycle it.
However, with such non-volatile RAM, this is a thing of the past: even leaving the machine unpowered for an hour won't erase the crashed program state...
Why does everyone automatically assume that memory can't be cleared upon reboot?! WTF???!! What were you smoking today? It's fucking RAM guys! The BIOS could clear it for you during reboot. Or the operating system could do it before loading itself.
Exactly. The kernel's crashed state will be preserved, so you won't be able to reboot cleanly. Some kind of checkpointing (like in database servers) would be useful here: just reboot to the last valid checkpoint. Of course, this requires a lot more WRAM though...
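As a rough illustration of the checkpoint idea (a toy model only, assuming the whole state fits in one Python object and that a validity check exists; `CheckpointedState` and `is_valid` are hypothetical names):

```python
import copy


class CheckpointedState:
    """Toy model: keep one known-good snapshot next to the live state."""

    def __init__(self, state):
        self.live = state
        self.checkpoint = copy.deepcopy(state)

    def commit(self, is_valid):
        # Only promote the live state to a checkpoint if it passes validation.
        if is_valid(self.live):
            self.checkpoint = copy.deepcopy(self.live)
            return True
        return False

    def rollback(self):
        # "Reboot" to the last valid checkpoint, discarding the crashed state.
        self.live = copy.deepcopy(self.checkpoint)


def is_valid(state):
    # Hypothetical invariant, chosen only for this demo.
    return state["balance"] >= 0


mem = CheckpointedState({"balance": 100})
mem.live["balance"] = 80
mem.commit(is_valid)              # good state, becomes the checkpoint
mem.live["balance"] = -5          # simulated corruption
committed = mem.commit(is_valid)  # rejected, checkpoint stays at 80
mem.rollback()
print(mem.live)
```

The sketch keeps a full second copy of the state, which is where the extra memory requirement comes from.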
cpghost at Cordula's Web.
Yay! We're going to get instant-on computers, just like home computers in the '80s were. How are we going to achieve it? Some form of jumped-up magnetic core memory!
In soviet russia stale jokes recycle you!
I thought the whole point [of MRAM] was reducing power consumption. Am I missing something?
If the big advantage of non-volatile RAM is the reduction in how many times you have to wait for your PC to perform a full startup and shutdown, the last thing you want to have is your software being so crap that you have to reboot it all the time anyway.
You see, there are perfectly good reasons to turn a computer off, regardless of whether it's running Linux or Windows or Solaris or MacOS X. And then you'll want it to start as quickly as possible when you want it back on.
Laptops are the prime example. You don't want it on all the time, when you don't need it. You want to still have some juice in the battery when you do need it. You'll also want it up and running as fast as possible when you do need it.
Dunno about you, but I'd rather just start using it, instead of sitting and watching through 5 minutes of Linux loading everything _and_ the kitchen sink at startup, then loading KDE, then taking ages to start Open Office, etc. If MRAM lets me have it up and ready in 1 second, I'm all for it.
E.g., there are computers in a lot of gadgets. Take my CD-based MP3 player, for example. Whenever I power it up, it takes a couple of seconds to basically boot and read the track list. If all that could stay in MRAM, and have it start playing the millisecond I hit that button, it would be a much more convenient gadget.
And even with regular PCs, you have to understand that some people actually _use_ their PC. They don't just keep them for a retarded "my uptime can beat yours" contest. And, like any other tool, there are perfectly good reasons to turn it off when you're not using it any more.
If nothing else, for the noise. Now this computer is a lot more silent since I replaced the fans with 12 dBA ones, and got Seagate drives. But all else being equal, I'd still _not_ have an extra source of noise near my bed when I'm trying to sleep.
For a lot of people the electricity bill is a factor too. Yes, it's not a small fortune, but for a lot of people it matters. And it's still paying money for something they don't need. They're getting exactly zero use out of that computer running all night, so why would that be on their electricity bill?
Basically all I'm saying is: next time make sure brains are engaged, before jumping in with the standard knee-jerk "Microsoft sucks" post. Yes, I know. It gives retards the impression of belonging to some big sad community. Makes you sooo cool if you're whining about Microsoft too.
But sometimes it still can't hurt to pull your head out of your ass. There _are_ uses for some stuff (e.g., the MRAM we're talking about here) that aren't a Windows-vs-Linux thing at all. They're just as useful for either.
Of course, that would mean actually thinking and actually doing a real analysis, instead of just reaching for the fashionable dogma. But I'm sure you'll get the hang of that, eventually.
A polar bear is a cartesian bear after a coordinate transform.
There are very little volatile-memory related software bugs.....
Oh, are you SURE about that? You should research such statements first, my friend, rather than assuming.
Take a look at this review from last year of power supplies by Anandtech.
They ran a six-hour memory test 54 times--and found that with 512MB of RAM, after each six hour test there were an average of four bits that had flipped! That means there is a memory error on a 512MB PC--on average--every 90 minutes!
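The 90-minute figure is just the test length divided by the average number of flips. A quick check, using only the numbers quoted above:

```python
test_hours = 6       # length of each memory test run, from the post
flips_per_test = 4   # average flipped bits per run, from the post

minutes_between_errors = test_hours * 60 / flips_per_test
print(minutes_between_errors)  # 90.0
```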
If that error occurs in a code segment in a driver, you may get a system crash. In a Windows DLL, perhaps some system instability. In an application, perhaps an application crash. If it's in a data segment, your important manuscript may suddenly lose a paragraph or skip a couple pages as a linked list pointer jumps to the wrong spot, or you may find a bunch of junk replacing normal text.
Memory errors are a serious problem that very few people acknowledge. Why people still buy non-ECC RAM is beyond me. (Of course, even with ECC RAM, there are still various places inside the PC where failure can occur--along the various buses for example, which don't all have ECC. So this is only part of the solution.)
More reliable RAM would definitely be a step in the right direction.
The magnetoresistive cell can change the way ANY sequential logic circuit operates. It can make much denser CPUs, ASICs and FPGAs, because now you can make the clock input be THE power supply line.
It can also make your timepiece battery last
... well, longer.
You just need to look at it from a different view than Yet Another Non-PowerCycle-Erasable Storage.
Now one thing Novell is not is stupid. They refused.
Somehow, the story of the challenge got around the exhibition floor, and a crowd assembled. Perhaps it was gremlins. Never eager to pass up an opportunity, the KeyKOS staff happily spent the next hour kicking their plug out of the wall. Each time, the system would come back within 30 seconds (15 of which were spent in the BIOS PROM, which was embarrassing, but not really Key Logic's fault). Each time Key Logic did this, more of the audience would give Novell a dubious look.
Eventually, the Novell folks couldn't take it anymore, and gritting their teeth they carefully turned the power off on their machine, hoping that nothing would go wrong. As you might expect, the machine successfully stopped running. Very reliable.
Having successfully stopped their machine, Novell crossed their fingers and turned the machine back on. 40 minutes later, they were still checking their file systems. Not a single useful program had been started.
Figuring they probably had made their point, and not wanting to cause undeserved embarrassment, the KeyKOS folks stopped pulling the plug after five or six recoveries.
Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
You know, the core memory of "way back when" was also magnetic, and nonvolatile. Actually, it was destructive-on-read, so you only had to refresh a bit when reading it. Otherwise, you could turn the machine off and it would keep its contents.
(No, I'm not that old. But I had some friends in college who played around with an old PDP-11/45 we found, which used core.)
You described a hardware bug (last time I checked RAM was still considered hardware).....
I was talking about a software bug.
Jeroen
Secure messaging: http://quickmsg.vreeken.net/
I at least don't consider any data too safe before it hits disk anyway. The risk that it gets destroyed due to user error (like me accidentally hitting the wrong key), program bug, failing hardware, power outage,
Now if I'd use my computer to control a nuclear power plant or medical equipment, I'd certainly use ECC-RAM only. But then, I'd probably use more reliable components all over the place.
You don't secure your house against airplane crashes either, do you?
The Tao of math: The numbers you can count are not the real numbers.
Err, then the PC and ram Anandtech have been using are dodgy.
Due to the design of Dynamic RAM chips, memory bit flip errors are not influenced by how long the memory sits "idle". I emphasize idle here because Dynamic RAM is never really idle. Each cell in a DRAM chip contains a capacitor and a transistor. If a DRAM cell is left to its own devices, the capacitor soon discharges and the cell loses its state. To stop this from happening, in the background, the RAM controller on the chip is constantly recharging the capacitors. Each cell is read and rewritten about every few milliseconds.
Because DRAM chips are never idle, the whole methodology of the Anandtech test is WRONG, and the most obvious conclusion is that Anandtech is using dodgy RAM, or is simply pushing the RAM beyond its specs to forcibly generate errors.
You described a hardware bug (last time I checked RAM was still considered hardware)..... I was talking about a software bug.
Every time a bit flips in a code segment, what do you think happens? More likely than not, a bug is introduced into the software at that address. A JZ instruction turned into a JNZ, for example (0x74 into 0x75). Memory errors do indeed cause software bugs.
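To make the example concrete: in x86 machine code the short-form opcodes for JZ and JNZ are 0x74 and 0x75, which differ only in their lowest bit. A tiny sketch of that single-bit flip (the variable names are just for illustration):

```python
JZ_SHORT = 0x74   # x86 opcode for JZ/JE with an 8-bit offset
JNZ_SHORT = 0x75  # x86 opcode for JNZ/JNE with an 8-bit offset

flipped = JZ_SHORT ^ 0x01                               # flip the lowest bit
differing_bits = bin(JZ_SHORT ^ JNZ_SHORT).count("1")   # how many bits differ
print(hex(flipped), differing_bits)  # 0x75 1
```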
Memory errors do indeed cause software bugs
And how exactly is one expected to code against this? Again, it's not the *developer's* fault that memory returned a different value than was stored there.
"Ignorance more frequently begets confidence than does knowledge"
- Charles Darwin
Now if I'd use my computer to control a nuclear power plant or medical equipment, I'd certainly use ECC-RAM only. But then, I'd probably use more reliable components all over the place.
Well, I do use my PC for software development, and I surely don't want to be shipping buggy software to customers because I saved $2 buying non-ECC RAM. Really--$2. That's all the price difference is these days. Check out Crucial or NewEgg. There's no reason to go non-ECC. ECC isn't any slower (paranoid folks will tell you it is; not true--266 MHz is 266 MHz, and CAS 2 is CAS 2), and the reliability is tops.
It's funny you mention things like a "corrupt file system" being a larger risk--most modern operating systems (including Windows and Linux) use a large amount (often most, if you have a lot) of your RAM as a disk cache. If a bit flips in one of those areas, you are very likely to end up with corrupt data on your drive, such as a corrupt inode. Where do you think such errors come from? They don't come from the drives themselves, which rely extensively on Reed Solomon ECC.
Perhaps I'm being too paranoid, but I see some potential for abuse here. Imagine a program that deals with passwords or credit card numbers... They could be still lying around in your non-volatile memory after the machine is switched off.
An intelligent program should then zero out those passwords before freeing memory. Even so, would this kind of storage suffer from the security issue already discussed here and here (ability to retrieve data from many previous writes)?
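A minimal sketch of "zero it before you free it", assuming the secret is held in a mutable buffer. The names `use_secret` and `wipe` and the password value are made up, and Python itself may keep other copies (the literal below, for one), so this only illustrates the idea rather than guaranteeing anything:

```python
def use_secret(secret: bytearray) -> int:
    """Hypothetical consumer of a secret; here it only returns the length."""
    return len(secret)


def wipe(buf: bytearray) -> None:
    # Overwrite in place so this buffer no longer holds the secret.
    for i in range(len(buf)):
        buf[i] = 0


password = bytearray(b"hunter2")  # made-up example value
try:
    n = use_secret(password)
finally:
    wipe(password)  # runs even if use_secret raises

print(n, password)
```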
Score: i, Imaginary
So according to you when I trip over the power cord and all software disappears from RAM it's a software bug?
The HARDWARE failed and thus it's a hardware bug.
(In the power cord example it's an operator bug)
Jeroen
Secure messaging: http://quickmsg.vreeken.net/
Will this stuff be here before Bubble Memory? I have been waiting for bubble memory for a long time now.
They whose government reduces their essential liberties for temporary security, receive neither liberty nor security.
And how exactly is one expected to code against this?
It's not difficult.
Just add ECC in software.
I've done this before in some of the software I've written for hospitals and banks; it's been a design requirement for the software to detect when there is a failure, and to correct if possible.
And, yes, failures ARE detected, AND corrected.
The way it works is you divide memory up into blocks (for example, 512 bytes of 1KB). You do this for both your data and code. For each memory block, store the ECC data (usually, in a separate area of memory, so it's non-intrusive to the program design).
A thread runs in the background, often on a second CPU, continuously checking the program's data and code to ensure that the ECC data is valid. When an error is detected, it is logged and corrected if possible.
When modifying data, a flag is set for that memory block that it has been altered; a new ECC value is calculated as soon thereafter as possible. (This is done automatically by setting the CPU to generate an exception when writing to a particular segment. It's a feature built into Intel processors and available through high-level calls in both Windows and Linux.)
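The scheme described above can be sketched roughly as follows. This is not the poster's actual code: it swaps in a per-block CRC-32 (via `zlib.crc32`) as the check data and repairs single-bit flips by trial, and it runs the scrub once instead of in a background thread. `make_checks` and `scrub` are hypothetical names.

```python
import zlib

BLOCK_SIZE = 512  # bytes per block, as in the post's example


def make_checks(memory: bytearray) -> list:
    """Compute one CRC-32 per block, stored separately from the data."""
    return [zlib.crc32(memory[i:i + BLOCK_SIZE])
            for i in range(0, len(memory), BLOCK_SIZE)]


def scrub(memory: bytearray, checks: list) -> list:
    """Verify every block and try to repair single-bit flips. Returns a log."""
    log = []
    for n, expected in enumerate(checks):
        start = n * BLOCK_SIZE
        block = memory[start:start + BLOCK_SIZE]  # slicing makes a copy
        if zlib.crc32(block) == expected:
            continue
        # Trial correction: flip each bit in turn until the CRC matches again.
        fixed = False
        for bit in range(len(block) * 8):
            block[bit // 8] ^= 1 << (bit % 8)
            if zlib.crc32(block) == expected:
                memory[start:start + BLOCK_SIZE] = block
                fixed = True
                break
            block[bit // 8] ^= 1 << (bit % 8)  # undo and try the next bit
        log.append((n, "corrected" if fixed else "uncorrectable"))
    return log


memory = bytearray(b"important data " * 100)  # 1500 bytes, so 3 blocks
original = bytes(memory)
checks = make_checks(memory)

memory[700] ^= 0x10          # simulate one flipped bit in block 1
log = scrub(memory, checks)
print(log, bytes(memory) == original)
```

After a legitimate write, the check value for the affected block has to be recomputed before the next scrub, which is the job of the "altered" flag in the description above.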
I'm sure you remember the Java exploit from a couple of years back, where the security model was bypassed completely by blowing a hairdryer on the RAM until a byte code error was induced in very-carefully-constructed code. Software ECC is the kind of thing you need to do to mitigate those types of attacks.
No. Why would you say that?
I'm saying it's a software bug when your code reads:
JNZ 112
when it was supposed to read:
JZ 112
If the programmer explicitly wrote it as "JZ 112", it's a bug in the software due to the fault of the programmer. If the machine randomly flips a bit and switches it, it's a bug in the software due to the fault of the memory.
Just because memory caused the bug in the software to appear doesn't mean it's not a software bug. Memory isn't any less capable of adding bugs to software than an incompetent programmer is.
No need for something so complex. All one has to do to recover from such a state is to extend (or emulate one of volatile RAM's 'features', if you wish) the 'reset'-function:
The moment you push the 'reset' button, not only does the system reboot, but the memory is also wiped, after which a non-corrupted copy is loaded from the 'HDD' (or whatever is used for storage).
So in other words, the 'power'-button would be used to power the system down, while the entire state would be preserved (like the hibernate feature).
The 'reset'-button would literally reset the system to its default state, just like when you boot a system employing volatile RAM.
Site & blog: http://www.mayaposch.com
While it's possible, a RAM error is a hardware failure and can rarely be connected with software.
On the other hand, our handy ability to shut down and clear out bad programming is a luxury that might become more difficult with the new RAM technology.
This could mean that viruses and other malware could remain even more resistant to removal than before!
More reliable RAM would definitely be a step in the right direction.
The RAM isn't less reliable because it is non-volatile, you idiot. For all you know, this MRAM could have 10x the failure rate of DRAM.
Also, that Anandtech review obviously used crappy RAM or some eleet overclocked piece of shit computer: ECC RAM will pick up flipped bits and they are reported. Now, on a solid system with 16GB of good ECC RAM, errors just don't occur anywhere near that frequency.
No, that's wrong. The truth is that errors in dynamic RAM can be introduced on each refresh. As you said yourself, dynamic RAM needs to be refreshed every few milliseconds--read and rewritten. Each time that happens, it's possible for an error to be introduced. If the refresh circuitry reads the value incorrectly, you get an error. If it writes the value incorrectly, you get an error. The longer the RAM sits around, the more refresh cycles, so the greater the chance for errors. If the voltages aren't stable enough, for example, you'll find a "1" bit refreshed with slightly too low of a current so that when the next refresh comes around, it's read as a "0" as it's been discharging over time and falls just below the threshold to be read as a "1".
As far as errors not being introduced when the memory is "idle," you're thinking of static RAM. Static RAM doesn't need to be refreshed, and thus actually CAN be idle. So it holds a huge advantage here. Without the refresh cycle, there's no place for errors to be introduced except during the actual reads and writes by the processor.
The RAM isn't less reliable because it is non-volatile, you idiot.
Not true. See this post for details on how having to refresh memory introduces errors. NVRAM is inherently stable because it doesn't need to be refreshed every few milliseconds, so you don't have the possibility of the refresh cycle introducing errors.
ECC RAM will pick up flipped bits and they are reported.
Right, that was exactly my point; there is no reason that we shouldn't all be using ECC RAM, yet there are still dumbasses out there who insist on saving $5 on their $1000 system by buying non-ECC RAM.
Sorry--that should have read "512 bytes or 1KB" ... not "512 bytes of 1KB." Too many typos means it's time for me to head to bed....
How long before nonvolatile memory becomes the solution to crash-prone software rather than better programming?
I don't understand how non-volatile memory would solve the problems of crash-prone software. Okay, so it might make the problem of recovering lost work after a crash that little bit easier. I fail to see how it's going to solve the problem of crashing though.
Electronic Music Made Using Linux http://soundcloud.com/polyp
...but they won't be the same as uses for RAM or for hard disk.
Using it for RAM would be silly - RAM is supposed to be transient, keeping it around would be a security and stability loss.
Using it for hard disk would be silly - the price per megabyte would be ridiculous unless you're doing stock-market data crunching or some such.
Some uses I can immediately see for it:
- boot the OS, and save a snapshot for an instant reboot
- use it to store persistent caches of binaries, libraries, etc
- use it for filesystem and database journals
- do RAID4 and use it to hold the parity volume
Without the refresh cycle, there's no place for errors to be introduced except during the actual reads and writes by the processor.
What about external influences (heat, cosmic radiation, etc)?
What's in a sig?
Pete Burris, president and namesake of Alpaca Pete's, a retail chain and website that sells rugs and clothes made from the woolly South American alpaca, buys finished products almost exclusively from a group of about 4,000 Peruvians from the island of Amantani, located in the middle of Lake Titicaca, the highest-elevation lake in the world. Aside from a small tourism business, Burris says, his Alpaca exports constitute one of the only local sources of employment.
One of the interesting aspects of MRAM is the ability to not lose system memory "state". You turn off the machine, and the contents of memory remain for the next session.
Can you imagine a Windows XP "state" that has never been rebooted? How about a continually running process that has a memory leak?
Eventually all machines need to be rebooted (some much less than others). That means re-creating a "clean" system state in memory.
-ted
You make it sound like it's impossible to add a reset button.
Site & blog: http://www.mayaposch.com
Is for a PC equipped with AAMRAM. Nothing like having an Air to Air attack missile doubling as RAM on my motherboard. That would certainly make software developers think twice about releasing buggy code.
Is that a real poncho? I mean, is that a Mexican poncho or is that a Sears poncho?
Two solutions really, I'm no CS major, but I think they ought to work.
Solution 1: A button on the motherboard (or jumper, or on the front of the case) that clears the memory. I'm not sure what exactly it would want to write. 1's? 0's? 1/2s?
Solution 2: Bootmenu. Even old versions of Windows know when you didn't finish your bootup sequence and give you a menu to delete the hibernation data. To be safe, perhaps displaying this every time the computer is dehibernated would be a good solution.
Solution 3: Combination of 1 and 2.
Even those who arrange and design shrubberies are under considerable economic stress at this period in history.
Call me when I can get rid of my hard disks, it's been a looong time since I wondered about 'static' ram computing machines.
What's in a sig?
Yes, and if the state is flipped by an outside influence before the read, the new "read and rewritten state" will be incorrect.
This is known to happen randomly from natural radiation (mainly cosmic rays). That is the main reason ECC memory exists (as the grandparent pointed out). You should do a little research before loudly proclaiming your incorrect thesis next time.
Galileo: "The Earth revolves around the Sun!"
Score: -1 100% Flamebait
in other words, WHO CARES?
These powerpoint slides describe the "hairdryer attack" in question, for the curious.
If you want "better programming", send your programming jobs to new zealand and not india
Webmaster of Infoweb
Then what's a hardware bug? A crack in the computer case?
But that's beside the point.
The article implied that non-volatile memory was somehow a solution to poorly written software, which is preposterous. You're saying that memory that doesn't fail is a solution to memory that does fail, which goes without saying.
ENDUT! HOCH HECH!
Funny, you are arguing against yourself:
With 'NO' you imply that me tripping the power cord wasn't a software bug....
Now let's have some fun with the rest of your post:
I'm saying it's a software bug when your code goes blank and stops when it was supposed to make sense and execute.
If the programmer explicitly blanked the RAM it's a bug in the software due to the fault of the programmer. If the user randomly pulls the plug, it's a bug in the software due to the fault of the user.
Just because the user caused the bug in the software to appear doesn't mean it's not a software bug. Users aren't any less capable of adding bugs to software than an incompetent programmer is.
You just gave me a great excuse: 'It wasn't me.... blame the software guys..... It's always a software bug'
Jeroen
Secure messaging: http://quickmsg.vreeken.net/
You don't get it. What it implies is that there's a greater chance of error when refreshing an "old" value than a "new" one. The problem is that there is no such thing as an "old" value, since the memory is never "idle".
I don't know how MemTest86 works, but, according to AnandTech, the delay between tests is several seconds. During those several seconds, there's a lot of refresh. If there's a problem once in a while with a refresh, then it will happen whether the memory was written 6 seconds or 6 hours ago. This means that running a test that lasts 6 seconds over and over for 6 hours should result in about the same number of errors as running one test that lasts 6 hours.
I would really like to know where I can find the modified MemTest86 used to do this test, because I think the AnandTech article is doubtful, to say the least.
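The equal-error-count argument can be checked with a little arithmetic. The sketch below assumes errors arrive at a constant average rate no matter how long ago the memory was written (which is the premise of the argument); the rate itself is a made-up number, not a measurement.

```python
from fractions import Fraction


def expected_errors(rate_per_second, seconds_per_test, repetitions=1):
    """Expected error count when errors arrive at a constant average rate."""
    return rate_per_second * seconds_per_test * repetitions


# Hypothetical rate: one error per 10,000 seconds of operation.
RATE = Fraction(1, 10_000)

# 3600 back-to-back tests of 6 seconds each (6 hours in total).
many_short = expected_errors(RATE, 6, repetitions=3600)

# One single test that lasts the full 6 hours.
one_long = expected_errors(RATE, 6 * 3600)

print(many_short, one_long)
```

Under that assumption both schedules give the same expected count, so a large gap between them would have to come from something other than refresh.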
However, this is not so useless as you think... modern memory is not installed in single chips (at least not for PCs); modern memory is installed in sticks, which are comprised of many chips. While 2 megabytes is still too small for even a stick to hold much, it's not so far away from practical uses. When we start seeing 64Mbit chips, you'll know it's just around the corner before they appear on desktop systems.
Until then, a non-volatile 2-megabyte chip makes a great solution for cache memory inside instant-on products, like MP3 players :)
/dev/random
i usually just assume that if someone has physical access to the machine's cpu or memory (ram or hd) then there is no way to protect that machine from being exploited.
Yes, and it just gets worse as chip densities increase. That's why IBM invented Chipkill (which is essentially RAID-5 for ECC RAM banks). The error rate for a server equipped with 1GB of ECC memory is 9 outages per 100 servers over 3 years (IBM whitepaper, PDF). The rate for non-ECC RAM is probably ridiculously high!
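To make the RAID-5 analogy concrete, here is a minimal sketch of XOR parity across memory chips. It illustrates the analogy only and is not IBM's actual Chipkill design; the chip contents are made-up byte values.

```python
from functools import reduce
from operator import xor


def parity(values):
    """XOR all values together, as a RAID-5 style parity unit would."""
    return reduce(xor, values)


# Hypothetical bytes stored on four data chips, plus one parity chip.
data_chips = [0x3C, 0xA5, 0x0F, 0xF0]
parity_chip = parity(data_chips)

# Pretend chip 2 dies completely: rebuild its contents from the
# surviving chips and the parity chip.
failed = 2
survivors = [v for i, v in enumerate(data_chips) if i != failed]
rebuilt = parity(survivors + [parity_chip])

print(hex(rebuilt))
```

The rebuilt value equals the lost one because XOR-ing a value in twice cancels it out, leaving only the missing chip's contribution.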
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
They're not dumbasses so much as slightly ignorant. They expect their RAM to operate correctly. And why shouldn't they?
Can software developers get any lazier?
The ones who aren't lazy are too busy reading Slashdot.
It's a double whammy.
The RAM doesn't crash the computer, but if you don't lose everything in RAM when the computer DOES crash for some other reason (by crash I mean something that needs a reboot, not your web browser crapping out), then there isn't as much of an incentive to write software that doesn't crash.
`which fortune`
Not really a problem; the memory doesn't get reset, but the processor's registers will (unless they use magnetoresistive registers too for some reason). So basically, yes, the memory will have all the same stuff, but the processor won't be looking in the same place to fetch instructions after you reboot it.
`which fortune`
That means there is a memory error on a 512MB PC--on average--every 90 minutes!
No, that means if you're running an extremely intensive memory tester that's changing ALL the memory every few seconds you'll have a memory error every 90 minutes. To interpret the memtest86 results you have to understand a bit about how memtest86 works, and how memory failures occur. From the memtest86 readme:
In other words you can't extend the results of memtest86 to the real world. Most of the time all of your memory isn't being constantly written to, it just maintains its current state.
AccountKiller
I think you're mistaken about the cause of DRAM errors. It's not the refresh which causes errors - it's errors in the charge stored in a cell.
The refresh effectively "rounds" the charge back to the nearest bit. If there is enough error in a cell, it will flip states. Without external influences, the chance of an error being large enough to flip a bit is not zero, but so small that you could get away with saying "it never happens". But, for example, a stray cosmic ray has a chance of imparting enough charge to push the level past the sense threshold for that bit. The refresh then "rounds" the charge and effectively causes the bit to flip. It looks like the refresh is causing the error, but really the information was already lost when the cosmic ray hit, and there's nothing the refresh could have done.
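The "rounding" behaviour can be sketched as a toy model. The threshold and the disturbance sizes below are invented for illustration and do not describe any real DRAM part.

```python
THRESHOLD = 0.5  # hypothetical sense threshold, as a fraction of full charge


def refresh(charge):
    """Restore the cell to a full 0 or a full 1, whichever side it is on."""
    return 1.0 if charge >= THRESHOLD else 0.0


def read_bit(charge):
    """Interpret the stored charge as a logical bit."""
    return 1 if charge >= THRESHOLD else 0


cell = 1.0                      # a stored logical 1

cell -= 0.2                     # ordinary leakage between refreshes
cell = refresh(cell)            # rounded back up to a full 1
after_leak = read_bit(cell)

cell -= 0.7                     # a large disturbance, e.g. a particle strike
before_refresh = read_bit(cell)  # the bit is already wrong at this point
cell = refresh(cell)            # the refresh just makes the wrong value solid
after_strike = read_bit(cell)

print(after_leak, before_refresh, after_strike)
```

In this model the bit already reads wrong before the refresh runs, which is the point: the refresh only locks in a loss that had already happened.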
The point is Anandtech seem to have the wrong method, because letting DRAM lie in refresh for extended periods should make absolutely no difference to the chances of an error. It's going to be refreshed every 64ms (for the Micron SDRAM I was interfacing recently) regardless of whether you're accessing it or not.
The fact that Anandtech are also making conclusions about hardware buying based on statistics as worthless as their 0-8 errors over a tiny number of trials is also quite ridiculous, and reassures me that they don't know what they're doing.
No, it's just RAM. All you have to do is tell your computer that you want to reboot. RAM does not have to be initialized to all 0s before booting. You just set the program counter to the boot sequence and off you go - your OS will be loaded into RAM (no matter what sort of RAM, and no matter what the contents are which are overwritten).
From the article: "Before we go to commercial production there remains a lot of work to do."
Man I wish they'd just release some anyways. I don't care if the 1st generation is 1.42 microns^2 per cell, I'd just like to have the stuff. I've been waiting on the stuff ever since I first heard about it ages ago. But it's good to see they are still moving forward.
Sorry, but you aren't reading the article correctly.
Bit errors are "hardware" problems, not software problems. By definition, a software problem is due to incorrect coding; software assumes hardware that conforms perfectly to spec, with zero errors. It would be impossible to write software (on modern hardware at least) that compensates for all bit errors. Yes, we have ECC memory and caches, but they are only one component. Now, reproducible hardware errors can be corrected for in software (the division error in old Pentiums, for example). Also, high-probability error points (such as inter-machine communication) are designed with explicit redundancy, error correction, and error detection. BUT this protects data, not the instructions themselves. I am not aware of software solutions that correct for corrupted instructions. (OK, there are perfectly redundant CPUs with an I/O comparator, but this doesn't solve the problem of the DRAM bit errors.)
As for SRAM in the CPU, that's even more volatile than DRAM. It has much higher current requirements, and thus has a higher probability of heat-related errors. Its values change more often than those in main memory, and it has MUCH higher performance requirements. Most SRAM has ECC these days.
The static part of SRAM just means it's continually held at a certain logic level. But in modern CPUs that's contingent upon a steady input voltage. With millions of taps on the voltage (and especially with overclocked CPUs with faster-than-spec switching times, and often over-pumped voltages), the probability of a temporary voltage drain on one of millions of taps is very high.
With DRAM, there is a very controlled simultaneous refresh of a row (or column, don't remember which). Depending on the technology (I believe SDRAM falls into this category, but I'm not completely sure; yes I know S means Synchronous, not static), micro-caches (read as static ram) exist for the purpose of fast column accesses. When switching rows, the cache is written back in parallel to the row. When a particular row has aged too long, general memory access is halted so that a quick DRAM -> SRAM -> DRAM can occur (e.g. a refresh). This all happens transparently within the DRAM (though depending on the technology, the memory-controller may have to send the appropriate signals).
My point there is that in these types of memory architectures, refreshes are no more dangerous than regular memory reads. The only possible danger here is if a particular memory cell will expire (discharge) sooner than spec but isn't caught during testing because it happens to be refreshed in the wrong order; it's thus a time bomb waiting to be accessed in the right order in a production environment. But such a situation is reproducible.
One other note: I said DRAM is a symmetric operation. This is because in modern DRAM the entire row is read or written at a time (tens of thousands of bits). In SRAM, only a single cache line (roughly 128 bits) is read or written at a time. Further, in some architectures, there is simultaneous access to different sections of the cache (2 read, 1 write caches, for example). A register file is the ultimate cache, with many potential input and output ports to each and every bit...
-Michael
You are storing information in magnetic fields. There are LOTS of things around the house that produce magnetic fields. It doesn't take much to knock it one way or another.
"Learning is not compulsory... neither is survival."
--Dr.W.Edwards Deming
They'll have to start doing more than telling you to reboot.
My new
But DRAM will hold its state without a refresh for MUCH longer than the time the chip actually spends between refreshes. This ensures that cells don't discharge too far and lose their correct state. A DRAM chip that refreshes every 64ms will have cells that could hold their state for 200ms or longer.
There are external events that can cause a bit flip, but their occurrence is so exceedingly rare that I would be surprised if more than one such event happened per year to most people.
How long before nonvolatile memory becomes the solution to crash-prone software rather than better programming?
Seriously, Slashdot would be so much more interesting if editors just posted news stories without the little editorial riders. Sometimes these are from the submitters, sometimes from the editors, but they're almost always either biased, inflammatory, or just plain wrong. Enough already!
You won't see better programming until it starts chasing us down the streets and completely ruining great literature.
All MRAM does is make it non-volatile. It doesn't prevent or even ameliorate the problem you mentioned; it's not more reliable in the sense you're looking for.
I am not merely a "consumer" or a "taxpayer". I am a Citizen of the State of Texas
Absolutely agreed, would give you an "Insightful" if I had points.
This article could have occasioned some interesting discussion about real applications of MRAM, but instead we've got dozens of redundant comments pointing out the idiocy of that line in the writeup.
Two words: auto-save
We're using non-volatile memory as a solution
to crash-prone software. We're just not using
MRAM yet.
Umm... dude. You using ECC RAM is not going to protect users of your software from memory-related bugs(*). Your users using ECC RAM is going to protect them from memory-related bugs.
Don't get me wrong: Using ECC is always better than using non-ECC (it isn't really all that much more expensive), but you seem to be under the impression that non-ECC memory errors are more likely to corrupt your data than simple programmer errors in the OS, your filesystem code, your revision control system, etc. That impression is wrong.
(*) Unless you are talking about the extremely unlikely scenario of a memory-related bug causing random changes to your source code and another memory-related bug causing the revision control software (you do use RCS, don't you?) to not notice the random change when you do a "$RCS diff". Programmer error in the $RCS source code is far far far more likely to screw up your source than any random bit-flip error in memory.
HAND.
isn't what introduces the errors. It's random changes in the stored charge induced by the environment (e.g. cosmic rays and other electromagnetic radiation).
Try running memtest86 on any custom box and you will be surprised at the number of problems you will find. I would say at least 1 out of 50 boxes has some form of memory-related issue. Obviously this has nothing to do with programming as the article states. But even with RAM stats programmed in SPD, the motherboard may still not handle them very well. I'm not sure that MRAM will help much, if at all, with industry compatibility issues among all vendors... but this is a very real issue that needs to be dealt with!
Life is not for the lazy.
What? It guarantees that his code wasn't fucked up because of a hardware memory error at compile time (therefore causing a problem in the binary). I think that's all he was saying.
Ah, of course, sometimes I forget that not everyone releases their code as source. :)
But regardless, the probability of such an error occurring is still far lower than that of another piece of system software introducing errors.
For example, if doing a static compile, there are probably lots of bugs hiding in libraries, and any of these is more likely to cause problems for his users than some random memory error.
Hell, even compiler bugs are probably more likely (e.g. gcc >=3.2 is notoriously buggy on Pentium4).
So, just to be absolutely clear about what I was saying: He's effectively guarding against a failure which is so unlikely that it can be considered irrelevant when considering the stability of his software.
HAND.
If it's in a data segment, your important manuscript may suddenly lose a paragraph or skip a couple pages as a linked list pointer jumps to the wrong spot, or you may find a bunch of junk replacing normal text.
Memory errors could also convert a NOP instruction into a HCF and burn down your house!
A lot of things COULD happen. But when was the last time any of your examples HAVE happened to anyone here? Show of hands? No one? Huh.
However, with such non-volatile RAM, this is a thing of the past: even leaving the machine unpowered for an hour won't erase the crashed program state...
It'd be easy enough to make it so that if you hold down the power button for, say, 10 seconds, the MRAM will be flushed and the machine will go through its "long" boot process.
I have had my previous computer on for 400 days just as a test of reliability. If memory errors are the cause of so many 'problems', why did my machine not fall over in over 400 days? Am I just lucky to have gotten 'good' RAM?
If you take a look at the actual test results for their power supply overview, you will see that the number of errors varies widely among the different power supplies. As such, I submit that the RAM itself, which was not being overclocked in any way, is perfectly fine, and otherwise there should have been more consistency among the errors.
And how do you choose what/when is a steady state ready to save?
Totally agree. That was superlame.
Goodness me this is atrocious. Come on, you can do better than that fucker.
Do you realize just how small that error is? That's roughly one bit in 4294967296 per 90 minutes. Even if you assume that the system takes up the entire 512MB of RAM, the chances that the flipped bit is going to hit something critical are minimal. If you start worrying about that small an error, you've got much bigger problems to worry about.
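For anyone wondering where 4294967296 comes from, it is simply the number of bits in 512MB. The sketch below works that out; the size of the "critical" region is a purely hypothetical figure chosen to show the scale.

```python
from fractions import Fraction

MIB = 1024 * 1024

total_bytes = 512 * MIB
total_bits = total_bytes * 8

print(total_bits)             # number of bits in 512MB
print(total_bits == 2 ** 32)  # the same number written as a power of two

# Hypothetical: suppose only 1MB of that memory holds data where a
# single flipped bit would actually matter.
critical_bytes = 1 * MIB
chance_hit_is_critical = Fraction(critical_bytes, total_bytes)

print(chance_hit_is_critical)
```

With that assumed 1MB critical region, a randomly placed bit flip lands somewhere that matters about one time in 512.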
-"It seems like you're trying to exploit a security hole. Would you like help?"
Or is the x86 motherboard design so crappy that the memory controller is actually on the motherboard, and the RAM sticks we buy are just the chips, no control? If that's the case, it's not just a question of ECC being "just a couple of dollars more"; I never see consumer motherboards advertising ECC support.
Unlimited growth == Cancer.
Priceless! I was trying to figure out how the hell this could possibly work, until I hit the last paragraph.
Somebody mark the parent as the best troll of the year! (look at the name of the comment's submitter if you're wondering why I think it's a troll, not just a clever posting.)
So the program runs with the wrong values for a time. Any operations done between the last successful check and this check have to be undone and redone (because they could be tainted by the wrong values). And if a memory error happens in the code of the checking/correcting thread, the program crashes anyway, because the recovery system has been damaged.
Wouldn't it be easier and less error-prone to just restart the program when an error is detected?
Is it available to normal (non-root) user processes under Linux? What is this interface called?
No, to defend against these types of attacks you need an elaborate and arcane defensive measure known as a locked door :).
I'm reminded of all those stories I heard about machines behind firewall after firewall and password after password, which someone, dressed as a janitor, simply carried out of the building...
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
I read somewhere that the typical bit error rate for normal non-ECC memory is one flip every 3 months or so. Sounds reasonable given the stability of my non-ECC systems.
Certainly 90 minutes is a joke; an overnight memtest86 will easily show decent quality memory's a lot more stable than that.
Since there's no error correction going on, and no active refreshing going on, lots of small "pushes" could accumulate and cause the state of the MRAM bits to change more often than those of DRAM (which, at least, restore an almost-one to a one and an almost-zero to a zero on each refresh).
Unless there's some physical mechanism in MRAM that accomplishes the same thing?
"A great democracy must be progressive or it will soon cease to be a great democracy." --Theodore Roosevelt
How long before nonvolatile memory becomes the solution to crash-prone software rather than better programming?
Errr.... Wouldn't non-volatile memory make life harder when programs crash? Say Windows wipes out on a computer with MRAM. Turning the machine off and then back on wouldn't erase the crashed image in memory, and you'd restore right back to the crashed state. Sounds like software (well, OS and drivers anyway) would have to be much more robust than they are today.
P.
"That's exactly what I said, only different."
Right now data has to be kept on disk, and disk access is 1000x slower than RAM. If NVRAM comes at competitive price and reliability, it can mean a large difference in everyday desktop use (loading any application in millisecs), and ALL the difference in databases, where all data can be kept in memory, regardless of its size.
The big difference is that programs no longer need to differentiate between slow disk storage and fast RAM access when you can have all the data in one quick cache.
If you are afraid of programs or the OS losing stability after staying in RAM for a long time, then make a "stable snapshot" in NVRAM somewhere, and reload it when things get unstable. Almost instant "reboot".
VIVA1023.com | Political Fashion.
If you drop your computer in a swimming pool, the software will probably stop working correctly. Does that mean you need to go review and debug your code to find the "software bug" that caused the problem?
Anyone who's had much experience with the Windows hibernate feature (especially the first year or so it was released) knows that a fresh batch of 0's and 1's on each boot works more consistently.
"Memory errors are RAMPANT" -- Oh my gawd! How will I sleep tonight?
Priceless! I was trying to figure out how the hell this could possibly work, until I hit the last paragraph.... Somebody mark the parent as the best troll of the year!
Sorry I wasn't able to respond sooner (I really was asleep), but I didn't make that up.
Here are some links about the hairdryer attack.
CNet News
Some professor's lecture notes (Google Cache)
It's quite real. If you deny that such attacks exist, you're living in a fantasy world.
Wow.
You really don't understand what a "bug" is, do you? You should pick up a nice technology dictionary some day when you get a chance.
Well, that and perhaps get a bit of an additude adjustment.... Bugs can come from many places other than the original programmers.
I should ridicule you for that ridiculous spelling.
security loss:
- passwords and sensitive data will persist in /proc/kcore indefinitely
- powering down the machine will no longer prevent people snooping what you were doing beforehand
- I encrypt my swap. It would be a bitch to have to encrypt most of my RAM as well.
stability loss:
- any sort of "instant on" would be vulnerable to "repeating the mistake"
- any sort of "instant on" would be a virus writer's wet dream.
- it's a whole lot quicker and easier for a runaway process to overwrite and erase all the nvram, than to overwrite all the data on the hard disk.
And, yes, failures ARE detected, OR corrected.
Damn. Looks like a bit flipped.
That doesn't follow.
memtest86 can't write to "all of your memory" at once. It just writes one word at a time, like any other program.
The text you quote mentions that an adjacent word can be corrupted when a different one is written. True enough. That can happen on any single write. memtest86 checks to see if it happened. Non-test software does not, so you'll never know. Memory is rarely allocated so that you skip every other word to avoid errors in adjacent cells.
The reason memtest86 does lots of writes is to make the intermittent error show up often enough that you might see it in a reasonable amount of time.
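A write-then-verify pass of the kind described can be sketched as follows. This is not memtest86's actual algorithm; the `CouplingFaultMemory` class is a made-up stand-in for a faulty RAM chip where writing one cell disturbs its neighbour.

```python
class CouplingFaultMemory:
    """Hypothetical faulty RAM: writing `aggressor` clears bit 0 of the cell before it."""

    def __init__(self, size, aggressor):
        self.cells = bytearray(size)
        self.aggressor = aggressor

    def __len__(self):
        return len(self.cells)

    def __getitem__(self, index):
        return self.cells[index]

    def __setitem__(self, index, value):
        self.cells[index] = value
        if index == self.aggressor and index > 0:
            self.cells[index - 1] &= 0xFE  # disturb the neighbouring cell


def pattern_test(memory, patterns=(0x00, 0xFF, 0xAA, 0x55)):
    """Write each pattern to every cell, then read it all back and collect mismatches."""
    errors = []
    for pattern in patterns:
        for i in range(len(memory)):
            memory[i] = pattern
        for i in range(len(memory)):
            if memory[i] != pattern:
                errors.append((i, pattern, memory[i]))
    return errors


good = pattern_test(bytearray(16))
bad = pattern_test(CouplingFaultMemory(16, aggressor=5))
print(good)
print(bad)
```

Ordinary software does the writes but never the read-back comparison, so the same disturbance would go unnoticed there. Only the patterns with bit 0 set expose this particular fault, which is one reason a tester cycles through several patterns.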
Even an "idle" program (and OS) is continually updating a lot of RAM locations.
ECC memory is good for detecting and correcting
single bit errors only. A random cosmic ray
traveling at near light speed can flip a bit
in otherwise perfectly good memory. Even rad-
hardened memory is not impervious to continued
cosmic (or other) radiation. This MRAM memory
offers an improved operating environment for
reliable computing, BUT until memory is designed
to detect AND correct multiple bit errors (such
as the use of Hamming Code in hardware), truly
rock solid computing will not have arrived.
Better memory hardware does NOT in any case
guarantee better code -- the old axiom about
GIGO (Garbage In == Garbage Out) still applies
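For reference, here is a minimal sketch of a textbook Hamming(7,4) code. It is an illustration only and does not describe how any particular ECC module is built. In this basic form the code repairs a single flipped bit per codeword, and a second flip in the same codeword makes it "correct" to the wrong value.

```python
def encode(data):
    """Encode 4 data bits into a 7-bit codeword (parity at positions 1, 2, 4)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(codeword):
    """Return (data_bits, error_position); position 0 means no error was seen."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4   # syndrome points at the bad position
    if position:
        c[position - 1] ^= 1          # flip the suspected bit back
    return [c[2], c[4], c[5], c[6]], position


data = [1, 0, 1, 1]
word = encode(data)

one_flip = list(word)
one_flip[4] ^= 1                      # single-bit error at position 5
fixed, where = decode(one_flip)

two_flips = list(word)
two_flips[0] ^= 1                     # two errors in the same codeword
two_flips[1] ^= 1
wrong, _ = decode(two_flips)

print(fixed == data, where)
print(wrong == data)
```

Practical ECC memory typically adds an extra parity bit so that double errors are at least detected instead of silently miscorrected.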
I think the first major use of MRAM would be in the hard disk cache. These caches are pretty small (2-8MB), so you wouldn't need much, and they make a huge difference in your computer's speed. BUT if the power goes out or your computer crashes completely, you lose whatever's in the cache. (Not common, but it does happen, and the consequences can be severe.) If the cache were non-volatile, the computer could just finish the operation after it is restarted, and the data wouldn't be corrupted or, at the very least, would be less likely to be corrupted.