New Framework For Programming Unreliable Chips

Posted by samzenpus on Monday November 4, 2013 @02:33AM from the this-is-how-you-do-it dept.

rtoz writes "To handle future unreliable chips, a research group at MIT's Computer Science and Artificial Intelligence Laboratory has developed a new programming framework that lets software developers specify when errors may be tolerable. The system then calculates the probability that the software will perform as intended. As transistors get smaller, they also become less reliable. This unreliability won't be a major issue in some cases. For example, if a few pixels in each frame of a high-definition video are improperly decoded, viewers probably won't notice, but relaxing the requirement of perfect decoding could yield gains in speed or energy efficiency."
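
As a rough illustration of the kind of calculation the summary describes (this is not the MIT framework's actual analysis): if each unreliable operation independently gives the right answer with probability r, a straight-line run of n such operations is entirely correct with probability r^n. The function name and the reliability figure below are made-up assumptions.

```python
def sequence_reliability(per_op_reliability: float, unreliable_ops: int) -> float:
    """Probability that every one of `unreliable_ops` independent operations succeeds.

    Assumes errors are independent and that any single error ruins the result,
    which is a deliberately pessimistic simplification.
    """
    if not 0.0 <= per_op_reliability <= 1.0:
        raise ValueError("per_op_reliability must be a probability in [0, 1]")
    if unreliable_ops < 0:
        raise ValueError("unreliable_ops must be non-negative")
    return per_op_reliability ** unreliable_ops


# Hypothetical numbers: 1000 operations, each correct with probability 0.999999.
print(sequence_reliability(0.999999, 1000))
```

A developer could compare a figure like this against whatever error budget the application can tolerate, for example per decoded video frame.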

godzilla by Anonymous Coward · 2013-11-04 02:37 · Score: 5, Insightful

Asking software to correct hardware errors is like asking godzilla to protect tokyo from mega godzilla
this does not lead to rising property values
1. Re:godzilla by n6mod · 2013-11-04 02:57 · Score: 5, Interesting
  
  I was hoping someone would mention James Mickens' epic rant.
  
  --
  You have violated Robot's Rules of Order and will be asked to leave the future immediately.
2. Re:godzilla by vux984 · 2013-11-04 06:31 · Score: 2
  
  OTOH, in measurement theory, it's been long known that random errors can be eliminated by post-processing multiple measurements.
  Gaining speed and energy efficiency is not usually accomplished by doing something multiple times, and then post-processing the results of THAT, when you used to just do it once and got it right.
  You'll have to do the measurements in parallel, and do it a lot faster to have time for the post processing and still come out ahead for performance. And I'm still not sure that buys you any improved efficiency.
  random errors can be eliminated by post-processing multiple measurements.
  And this is the real crux of the paradox :) Random errors can be introduced by post processing multiple measurements on an unreliable processor doing the post processing.
  Now we have to post-post-process the results of the post-processed results to eliminate any random errors there? Turtles all the way down.
  That said, as TFA suggested there are operations that can tolerate error, like video decoding -- and if we can realize substantial gains in performance or energy efficiency that translates into your laptop running a lot longer in exchange for a few transient (sub tenth of a second) pixel errors... that's a pretty good trade.
3. Re:godzilla by K. S. Kyosuke · 2013-11-04 07:56 · Score: 2
  
  Gaining speed and energy efficiency is not usually accomplished by doing something multiple times, and then post-processing the results of THAT, when you used to just do it once and got it right.
  For some kinds of computations, results can be verified in a time much shorter than the time in which they are computed. Often even asymptotically, but that's not even necessary. If you can perform a certain computation twice as fast and with half the energy on a faster but sometimes unreliable circuit/computational node, with the proviso that you need to invest five percent extra time and energy to check the result, you've still won big. (There are even kinds of computation where even probabilistically wrong results don't matter all that much because the computation as a whole doesn't diverge easily, but I digress.)
  
  Now we have to post-post-process the results of the post-processed results to eliminate any random errors there? Turtles all the way down.
  How do you know that the universe isn't lying to you? How do you know that your brain isn't delusional about the lack of cognitive problems on your part? Those are the same kind of question.
  
  That said, as TFA suggested there are operations that can tolerate error, like video decoding -- and if we can realize substantial gains in performance or energy efficiency that translates into your laptop running a lot longer in exchange for a few transient (sub tenth of a second) pixel errors... that's a pretty good trade.
  I believe I've seen that kind of computing before. I even believe there was already a post on something like this here on /. (Unfortunately, that was at a time when I didn't write extensive searchable notes on the things I stumbled upon, so I can't provide a link.)
  
  --
  Ezekiel 23:20
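
The "compute unreliably, verify cheaply" point in this thread can be sketched as follows. Sorting costs O(n log n), but checking that a candidate is a sorted permutation of the input takes only O(n). The flaky sorter, its error rate, and all function names are hypothetical stand-ins, and the checker is assumed to run on reliable hardware (which is exactly the "turtles" objection raised above).

```python
import random
from collections import Counter


def unreliable_sort(values, error_rate, rng):
    """Stand-in for a fast but flaky sorter: sometimes corrupts one element."""
    result = sorted(values)
    if result and rng.random() < error_rate:
        i = rng.randrange(len(result))
        result[i] += 1  # simulated hardware error in one output word
    return result


def is_sorted_permutation(original, candidate):
    """Linear-time check: candidate is in order and holds the same elements."""
    in_order = all(a <= b for a, b in zip(candidate, candidate[1:]))
    return in_order and Counter(original) == Counter(candidate)


def checked_sort(values, error_rate=0.2, max_attempts=10, seed=0):
    """Retry the flaky sorter until the cheap check passes."""
    rng = random.Random(seed)
    for attempt in range(1, max_attempts + 1):
        candidate = unreliable_sort(values, error_rate, rng)
        if is_sorted_permutation(values, candidate):
            return candidate, attempt
    raise RuntimeError("no verified result within max_attempts")


result, attempts = checked_sort([5, 3, 9, 1, 3])
print(result, "verified after", attempts, "attempt(s)")
```

Whether this wins overall depends on the error rate and on how cheap the check really is relative to the computation.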
Hmmm ... by gstoddart · 2013-11-04 02:38 · Score: 4, Insightful

So, expect the quality of computers to go downhill over the next few years, but we'll do our best to fix it in software?
That sounds like we're putting the quality control on the wrong side of the equation to me.

--
Lost at C:>. Found at C.
1. Re:Hmmm ... by bill_mcgonigle · 2013-11-04 02:44 · Score: 2
  
  So, expect the quality of computers to go downhill over the next few years, but we'll do our best to fix it in software?
  If you use modern hard drives, you've already accepted high error rates corrected by software.
  
  --
  My God, it's Full of Source!
  OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
2. Re:Hmmm ... by ZeroPly · 2013-11-04 03:19 · Score: 2
  
  Relax, pal - frameworks that don't particularly care about accuracy have been around for years now. If you don't believe me, talk to anyone who uses .NET Framework.
  
  --
  Support microSD: in a post 9/11 world, it is unwise to carry your data on media that you cannot comfortably swallow.
3. Re:Hmmm ... by viperidaenz · 2013-11-04 08:04 · Score: 2
  
  Our MS developers of course want everyone to have a Core i7 machine with 64GB RAM and SSD hard drive
  Do what the company I'm working for has done then.
  Give everyone an i7 with 16GB RAM and an SSD.
  Except they run Windows 7 32bit, so we can only use 4GB of that (and PAE is disabled on Win7 32bit), and the SSD is the D: drive, not the system drive so when everything does page, it slows to a crawl.
How on earth by dmatos · 2013-11-04 02:48 · Score: 4, Insightful

are they going to make "unreliable transistors" that, upon failure, simply decode a pixel incorrectly, rather than, oh, I don't know, branching the program to an unspecified memory address in the middle of nowhere and borking everything.
They'd have to completely re-architect whatever chip is doing the calculations. You'd need three classes of "data" - instructions, important data (branch addresses, etc), and unimportant data. Only one of these could be run on unreliable transistors.
I can't imagine a way of doing that where the overhead takes less time than actually using decent transistors in the first place.
Oh, wait. It's a software lab that's doing this. Never mind, they're not thinking about the hardware at all.

--

It may look like I'm doing nothing, but I'm actively waiting for my problems to go away.
--Scott Adams
1. Re:How on earth by bluefoxlucid · 2013-11-04 07:06 · Score: 3, Insightful
  
  Erm, that's the whole point. If we allowed high error rates with existing architectures, none of our results would be trustworthy. I imagine the most practical approach would be a fast, low-power but error-prone co-processor living alongside the main, low-error processor.
  Or you know, the thing from 5000 years ago where we used 3 CPUs (we could on-package ALU this shit today) all running at high speeds and looking for 2 that get the same result and accepting that result. It's called MISD architecture.
  
  --
  Support my political activism on Patreon.
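
The three-unit voting scheme mentioned in this thread can be simulated in a few lines. This is only a sketch: the flaky adder, its error model (one flipped low-order bit), and the function names are assumptions for illustration, and the voter itself is assumed to be reliable.

```python
import random
from collections import Counter


def flaky_add(a, b, error_rate, rng):
    """Hypothetical unreliable ALU: occasionally flips one of the low 8 result bits."""
    result = a + b
    if rng.random() < error_rate:
        result ^= 1 << rng.randrange(8)
    return result


def majority_vote(results):
    """Return the value produced by a strict majority of replicas, else None."""
    value, count = Counter(results).most_common(1)[0]
    return value if count * 2 > len(results) else None


def voted_add(a, b, error_rate, rng, replicas=3):
    """Run the flaky adder on several replicas and accept the majority answer."""
    return majority_vote([flaky_add(a, b, error_rate, rng) for _ in range(replicas)])


rng = random.Random(42)
print(voted_add(2, 3, error_rate=0.05, rng=rng))
```

With independent errors at rate p per replica, at least two of three replicas are wrong with probability 3p^2(1-p) + p^3, so the vote helps most when p is already small. The cost is three times the work, which is the efficiency objection raised earlier in the discussion.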
Chicken and the Egg. by jellomizer · 2013-11-04 03:17 · Score: 3, Informative

We need software to design hardware to make software...
In short it is about better adjusting your tolerance levels on individual features.
I want my integer arithmetic to be perfect, and my floating point good to 8 decimal places. But there are components meant for interfacing with humans. Take audio: so much is altered or lost due to differences in speaker quality, even with top-notch ones using gold (or whatever crazy stuff) cables. So in your digital-to-audio conversion, you may be fine if a voltage is a bit off, or if you skipped a random change, as the smoothing mechanism will often hide that little mistake.
Now for displays... We need to be pixel-perfect when we have screens with little movement. But if we are watching a movie, a pixel of color #8F6314 can be #A07310 for 1/60th of a second and we wouldn't notice it. And most displays are not even high enough quality to show these differences.
We hear of these errors and think how horrible it is that we are not getting perfect products... However, it is more of a trade-off: getting smaller and faster at the cost of a few more glitches.

--
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
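
To put numbers on the pixel example above, here is a small sketch that computes the per-channel difference between the two colors. The helper names are made up for illustration.

```python
def hex_to_rgb(color: str) -> tuple:
    """Convert a '#RRGGBB' string into an (R, G, B) tuple of 0-255 integers."""
    digits = color.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected a color of the form #RRGGBB")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def channel_errors(expected: str, actual: str) -> tuple:
    """Absolute per-channel difference between two colors."""
    return tuple(abs(e - a) for e, a in zip(hex_to_rgb(expected), hex_to_rgb(actual)))


print(channel_errors("#8F6314", "#A07310"))  # (17, 16, 4)
```

The largest deviation is 17 out of 255 on the red channel, a bit under 7% of full scale, lasting one frame (about 16.7 ms at 60 frames per second).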
1. Re:Chicken and the Egg. by CastrTroy · 2013-11-04 03:32 · Score: 2
  
  Yeah, but you could save just as much power (I'm guessing) with dedicated hardware decoders as you could by letting the chips be inaccurate. As chips get smaller it's much more feasible to have task-specific hardware for just about everything. The ARM chips in phones and tablets have all kinds of specialized hardware, some for decoding video and audio, others for doing encryption and other things that are usually costly for a general-purpose processor. Plus it's a lot easier for the developer to not have to consider how inaccurate stuff can be, and just write code as though things are actually going to be correct. Even programming with binary floating-point numbers is problematic enough, as there are many decimal fractions that can't be exactly represented.
  
  --
  
  Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
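
The floating-point remark in the reply above is easy to demonstrate with Python's standard library. The helper name is an assumption for illustration; `Fraction` and `Decimal` are used here only to expose the exact value a binary double actually stores.

```python
import math
from decimal import Decimal
from fractions import Fraction


def is_exact_in_binary(text: str) -> bool:
    """True if the decimal literal converts to a binary double with no rounding."""
    return Fraction(float(text)) == Fraction(text)


print(is_exact_in_binary("0.5"))   # True: 0.5 is 1/2, a power of two
print(is_exact_in_binary("0.1"))   # False: 1/10 has no finite binary expansion
print(0.1 + 0.2 == 0.3)            # False
print(math.isclose(0.1 + 0.2, 0.3))  # True: compare with a tolerance instead
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True with decimal arithmetic
```

So even on perfectly reliable hardware, developers already reason about small, bounded numerical error whenever they use floats.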
Similar Idea to EnerJ Language by MetaDFF · 2013-11-04 03:30 · Score: 3, Interesting

The idea of fault tolerable computing is similar to the EnerJ programming language being developed at the University of Washington for power savings (see "The Language of Good Enough Computing").

The gist of the idea is that the programmer can specify which variables need to be exact and which variables can be approximate. The approximate variables would then be stored in low-refresh RAM, which is more prone to errors, to save power, while the precise variables would be stored in higher-power memory that would be error free.

The example they gave was calculating the average shade of grey in a large image of 1000 by 1000 pixels. The running total could be held in an approximate variable since the error incurred by adding one pixel incorrectly out of a million would be small, while the control loop variable would be accurate since you wouldn't want your loop to overflow.
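
The average-grey example can be simulated in plain Python. This is not EnerJ syntax; it is a sketch in which "approximate" storage is modeled by a hypothetical `approx_read` helper that occasionally flips one bit of an 8-bit pixel value, while the loop counter is kept precise. The error rate and names are assumptions.

```python
import random


def approx_read(value, error_rate, rng, bits=8):
    """Model a read from error-prone memory: sometimes flip one of the low bits."""
    if rng.random() < error_rate:
        value ^= 1 << rng.randrange(bits)
    return value


def average_grey(pixels, error_rate, seed=0):
    """Average 8-bit grey levels using an approximate sum and a precise loop counter."""
    rng = random.Random(seed)
    total = 0          # approximate data: small errors here are tolerable
    i = 0              # precise data: an error here could skip pixels or overrun
    while i < len(pixels):
        total += approx_read(pixels[i], error_rate, rng)
        i += 1
    return total / len(pixels)


# Small synthetic image (100 x 100) so the demo runs quickly.
image = [(x * 7 + 13) % 256 for x in range(100 * 100)]
print("exact  :", average_grey(image, error_rate=0.0))
print("approx :", average_grey(image, error_rate=0.001))
```

Under this model each corrupted read shifts the sum by at most 128, so k bad reads out of n move the average by at most 128k/n. For one bad pixel in a million, that is 0.000128 grey levels.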