Slashdot Mirror


New Framework For Programming Unreliable Chips

rtoz writes "To handle future unreliable chips, a research group at MIT's Computer Science and Artificial Intelligence Laboratory has developed a new programming framework that enables software developers to specify when errors may be tolerable. The system then calculates the probability that the software will perform as it's intended. As transistors get smaller, they also become less reliable. This unreliability won't be a major issue in some cases. For example, if a few pixels in each frame of a high-definition video are improperly decoded, viewers probably won't notice, but relaxing the requirement of perfect decoding could yield gains in speed or energy efficiency."

83 of 128 comments

  1. godzilla by Anonymous Coward · · Score: 5, Insightful

    Asking software to correct hardware errors is like asking godzilla to protect tokyo from mega godzilla

    this does not lead to rising property values

    1. Re:godzilla by n6mod · · Score: 5, Interesting

      I was hoping someone would mention James Mickens' epic rant.

      --
      You have violated Robot's Rules of Order and will be asked to leave the future immediately.
    2. Re:godzilla by K. S. Kyosuke · · Score: 1

      Asking software to correct hardware errors is like asking godzilla to protect tokyo from mega godzilla

      OTOH, in measurement theory, it's been long known that random errors can be eliminated by post-processing multiple measurements.

      --
      Ezekiel 23:20
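A small sketch of the measurement-theory point, under stated assumptions: the noise is independent, zero-mean and Gaussian, and the helper names `noisy_measure` and `averaged_measure` are made up for this illustration. Under those assumptions averaging does not remove the error entirely, but it shrinks it by roughly the square root of the number of repeats.

```python
import random
import statistics

def noisy_measure(true_value, noise_sd, rng):
    """One measurement with zero-mean Gaussian noise (hypothetical noise model)."""
    return true_value + rng.gauss(0.0, noise_sd)

def averaged_measure(true_value, noise_sd, repeats, rng):
    """Average several independent noisy measurements of the same quantity."""
    return statistics.fmean(
        noisy_measure(true_value, noise_sd, rng) for _ in range(repeats)
    )

rng = random.Random(42)  # fixed seed so the run is repeatable

# Typical absolute error of a single reading versus a 25-reading average.
single_errors = [abs(noisy_measure(10.0, 1.0, rng) - 10.0) for _ in range(2000)]
averaged_errors = [abs(averaged_measure(10.0, 1.0, 25, rng) - 10.0) for _ in range(2000)]

print(statistics.fmean(single_errors))    # error of one measurement
print(statistics.fmean(averaged_errors))  # clearly smaller with 25 repeats
```

Whether the 25 extra readings cost less than one reliable reading is exactly the trade-off the replies argue about.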
    3. Re:godzilla by jeffb (2.718) · · Score: 1

      (This is where the analogy falls apart. How useful is a partly sorted array? Not very. An almost correct floating point calculation on the other hand might even be just as good as the correct result, depending on the application.)

      Actually, it seems to me that the analogy is still quite valid. Having a large array where items are guaranteed to be off by no more than one spot -- in other words, where some adjacent items may be swapped from their correct positions -- could be quite useful. I'm thinking of things like "sort by most recent" for news articles, or "search by price ascending" in an online store. In fact, I'm seeing such "approximate ordering" a lot more frequently on large-scale Web apps; it's better to have an approximately-ordered list quickly than a precisely-ordered list much more slowly.

      Of course, if you're looking for a sorted list to support binary search, your mileage will vary.
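As a rough illustration of the "off by no more than one spot" case, here is a sketch that measures how far items sit from their sorted positions and adapts binary search to tolerate adjacent swaps. The function names are invented for this example, and the search assumes distinct items that are each at most one position out of place.

```python
def max_displacement(items):
    """Largest distance between an item's index and its index in sorted order.
    Assumes distinct items for simplicity."""
    sorted_pos = {value: i for i, value in enumerate(sorted(items))}
    return max(abs(i - sorted_pos[value]) for i, value in enumerate(items))

def search_nearly_sorted(items, target):
    """Binary search for a list where every item is at most one slot out of place.

    At each step the middle item and both neighbours are checked, so the
    remaining range can safely shrink by skipping past them.
    """
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        for j in (mid - 1, mid, mid + 1):
            if lo <= j <= hi and items[j] == target:
                return j
        if target < items[mid]:
            hi = mid - 2
        else:
            lo = mid + 2
    return -1

prices = [10, 30, 20, 40, 60, 50, 70]  # two adjacent pairs swapped
print(max_displacement(prices))          # 1
print(search_nearly_sorted(prices, 50))  # 5
print(search_nearly_sorted(prices, 35))  # -1
```

With larger or unbounded displacement the neighbour check is no longer enough, which is where the mileage starts to vary.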

    4. Re:godzilla by rasmusbr · · Score: 1

      Nobody is suggesting allowing errors everywhere. Errors will only be allowed where they wouldn't cause massive unexpected effects.

      A simple (self-driving) car analogy here would be that you might allow the lights to flicker a little if that saves power. You might even allow the steering wheel to move very slightly at random in order to save power as long as it never causes the car to spin out of control, but you would never allow even a small chance that the car would select its destination at random.

    5. Re:godzilla by vux984 · · Score: 2

      OTOH, in measurement theory, it's been long known that random errors can be eliminated by post-processing multiple measurements.

      Gaining speed and energy efficiency is not usually accomplished by doing something multiple times, and then post processing the results of THAT, when you used to just do it once and got it right.

      You'll have to do the measurements in parallel, and do it a lot faster to have time for the post processing and still come out ahead for performance. And I'm still not sure that buys you any improved efficiency.

      random errors can be eliminated by post-processing multiple measurements.

      And this is the real crux of the paradox :) Random errors can be introduced by post processing multiple measurements on an unreliable processor doing the post processing.

      Now we have to post-post-process the results of the post-processed results to eliminate any random errors there? Turtles all the way down.

      That said, as TFA suggested there are operations that can tolerate error, like video decoding -- and if we can realize substantial gains in performance or energy efficiency that translates into your laptop running a lot longer in exchange for a few transient (sub tenth of a second) pixel errors... that's a pretty good trade.

    6. Re:godzilla by K. S. Kyosuke · · Score: 2

      Gaining speed and energy efficiency is not usually accomplished by doing something multiple times, and then post processing the results of THAT, when you used to just do it once and got it right.

      For some kinds of computations, results can be verified in a time much shorter than the time in which they are computed. Often even asymptotically, but that's not even necessary. If you can perform a certain computation twice as fast and with half the energy on a faster but sometimes unreliable circuit/computational node, with the proviso that you need to invest five percent extra time and energy to check the result, you've still won big. (There are even kinds of computation where even probabilistically wrong results don't matter all that much because the computation as a whole doesn't diverge easily, but I digress.)

      Now we have to post-post-process the results of the post-processed results to eliminate any random errors there? Turtles all the way down.

      How do you know that the universe isn't lying to you? How do you know that your brain isn't delusional about the lack of cognitive problems on your part? Those are the same kind of question.

      That said, as TFA suggested there are operations that can tolerate error, like video decoding -- and if we can realize substantial gains in performance or energy efficiency that translates into your laptop running a lot longer in exchange for a few transient (sub tenth of a second) pixel errors... that's a pretty good trade.

      I believe I've already seen that kind of computing before. I even believe there had already been a post on something like this here on /. (Unfortunately, that was at a time when I didn't write extensive searchable notes on the things I stumbled upon, so I can't serve with a link.)

      --
      Ezekiel 23:20
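The "compute fast, then verify cheaply" argument can be sketched with sorting, where checking the result is O(n) while producing it is O(n log n). Everything here is an assumption made for illustration: `unreliable_sort` is a software stand-in for a flaky fast unit, and its fault model (one occasional neighbour swap) is invented. Taking the numbers in the comment at face value, a run at half the cost plus a five percent check comes to roughly 0.53 to 0.55 of the original cost, depending on what the five percent is measured against.

```python
import random

def unreliable_sort(values, error_rate, rng):
    """Stand-in for a fast but flaky sorter: sometimes swaps two neighbours.
    (Hypothetical fault model, purely for illustration.)"""
    result = sorted(values)
    if len(result) > 1 and rng.random() < error_rate:
        i = rng.randrange(len(result) - 1)
        result[i], result[i + 1] = result[i + 1], result[i]
    return result

def is_sorted(values):
    """O(n) order check. It does not confirm the output is a permutation
    of the input, which a real checker would also need to do."""
    return all(a <= b for a, b in zip(values, values[1:]))

def checked_sort(values, error_rate, rng):
    """Try the fast path; pay for the reliable path only if the check fails."""
    candidate = unreliable_sort(values, error_rate, rng)
    if is_sorted(candidate):
        return candidate
    return sorted(values)  # reliable fallback

rng = random.Random(0)
data = [5, 3, 9, 1, 7]
print(checked_sort(data, 0.01, rng))  # [1, 3, 5, 7, 9]
```

The check itself is assumed to run on reliable hardware here, which is the "turtles" question the parent comment raises.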
    7. Re:godzilla by viperidaenz · · Score: 1

      I'd rather end up at the wrong street number than sideways into a power pole...

    8. Re:godzilla by Azure Flash · · Score: 1

      Are you kidding? Properties with a beautiful view on the battlefield between Godzilla and Mega Godzilla would definitely be worth MILLIONS of yen

    9. Re:godzilla by kermidge · · Score: 1

      God, that's one beautiful little piece of writing. Thank you.

      From the posted summary "...if few pixels in each frame of a high-definition video are improperly decoded, viewers probably won't notice" - Now there's a slippery slope if ever I saw one.

  2. Hmmm ... by gstoddart · · Score: 4, Insightful

    So, expect the quality of computers to go downhill over the next few years, but we'll do our best to fix it in software?

    That sounds like we're putting the quality control on the wrong side of the equation to me.

    --
    Lost at C:>. Found at C.
    1. Re:Hmmm ... by bill_mcgonigle · · Score: 2

      So, expect the quality of computers to go downhill over the next few years, but we'll do our best to fix it in software?

      If you use modern hard drives, you've already accepted high error rates corrected by software.

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    2. Re:Hmmm ... by Desler · · Score: 1

      Next few years? More like a few decades or more. Drivers, firmware, microcode, etc. have always contained software workarounds for hardware bugs. This is nothing new.

    3. Re: Hmmm ... by fizzer06 · · Score: 1

      I haven't accepted bad data from the newer hard drives.

    4. Re:Hmmm ... by fast turtle · · Score: 1

      if you access any server remotely then you're already using this - it's called ECC RAM

      --
      Mod me up/Mod me down: I wont frown as I've no crown
    5. Re:Hmmm ... by ZeroPly · · Score: 2

      Relax, pal - frameworks that don't particularly care about accuracy have been around for years now. If you don't believe me, talk to anyone who uses .NET Framework.

      --
      Support microSD: in a post 9/11 world, it is unwise to carry your data on media that you cannot comfortably swallow.
    6. Re:Hmmm ... by InsightfulPlusTwo · · Score: 1

      You don't seem to have read the article. The software is not going to supply extra error correction when the hardware has errors. It's going to allow the programmer to specify code operations that can tolerate more errors, which the compiler can then move to the lower-quality hardware. Some software operations, like audio or video playback, can allow errors and still work OK, which allows you to use lower-energy less-quality hardware for those operations. If they did as you suggest, and tried to fix hardware errors in the software, that would cause the software to take more energy to correct the errors and be more complex besides, which would seem to negate the benefits of the new hardware. This is not unprecedented since various applications (audio CDs, hearing aids, etc.) already use a lesser standard of error correction.

      --
      I felt bad for the man who had no signature, until I met a man who had no comment.
    7. Re:Hmmm ... by Joshua Fan · · Score: 1

      All in preparation for next big thing after that... MORE accurate hardware! 6.24% more!

    8. Re:Hmmm ... by fizzer06 · · Score: 1

      frameworks that don't particularly care about accuracy . . . .NET Framework.

      Okay, I'll bite. Explain yourself.

    9. Re:Hmmm ... by ZeroPly · · Score: 1

      I'm an application deployment guy, not a programmer. Every time we push something that needs .NET Framework, the end users complain about it being hideously slow. Our MS developers of course want everyone to have a Core i7 machine with 64GB RAM and SSD hard drive - to which I reply "learn how to write some fucking code without seven layers of frameworks and abstraction layers".

      Then of course, I can never get a straight answer from the developers on which .NET to install. Do you want 4, 3.5 SP1, 2? The usual answer is "load all of them". I get that .NET Framework is great in theory, but if you have to deal with the actual implementation, you'll see things differently. A lot of times we'll get screen glitches which the devs are convinced is a MS issue, but there's no available fix, so we go with "that's not a serious enough problem to fix".

      On the other side of the fence are the Linux apps I have to deploy. The Linux devs send me a .DEB file. I generally have that pushed out the same day.

      --
      Support microSD: in a post 9/11 world, it is unwise to carry your data on media that you cannot comfortably swallow.
    10. Re:Hmmm ... by K. S. Kyosuke · · Score: 1

      It uses algorithms to correct errors, instead of simply using more reliable memory cell hardware. I believe that's the point of the comparison, not whether the algorithm runs in software or in hardware.

      --
      Ezekiel 23:20
    11. Re:Hmmm ... by viperidaenz · · Score: 1

      Not 6.24%, 6.26%... or was it 8.24%?

      I forget which bit got flipped.

    12. Re:Hmmm ... by viperidaenz · · Score: 2

      Our MS developers of course want everyone to have a Core i7 machine with 64GB RAM and SSD hard drive

      Do what the company I'm working for has done then.
      Give everyone an i7 with 16GB RAM and an SSD.

      Except they run Windows 7 32bit, so we can only use 4GB of that (and PAE is disabled on Win7 32bit), and the SSD is the D: drive, not the system drive so when everything does page, it slows to a crawl.

  3. Huh? by Desler · · Score: 1

    but relaxing the requirement of perfect decoding could yield gains in speed or energy efficiency."

    Which you could already get now simply by not doing error correction. No need for some other programming framework to get this.

    1. Re:Huh? by SJHillman · · Score: 1

      It's not so much about skipping error correction as it is saying when you can skip error correction. If 5 pixels are decoded improperly, fuck it, just keep going. However, if 500 pixels are decoded improperly, then maybe it's time to fix that.

    2. Re:Huh? by Desler · · Score: 1

      And as I said you can do that already.

    3. Re:Huh? by MightyYar · · Score: 1

      Really? You can tell your phone/PC/laptop/whatever to run the graphics chip at an unreliably low voltage on demand?

      --
      W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
    4. Re:Huh? by HybridST · · Score: 1

      For PC and laptop, yes I can.

      Overclocking utilities can also underclock, and near the lower stability threshold of graphics frequency I often do see a few pixels out of whack. Not enough to crash, but artifacts definitely appear. A MHz or 2 higher clock clears them up though.

      I have a dumb phone so reclocking it isn't necessary.

      --
      Ever notice that Cobra Commander sounds an awful lot like Star scream?
    5. Re:Huh? by MightyYar · · Score: 1

      So you've done this yourself and you still don't see the utility in doing it at the application level rather than the system level?

      --
      W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
    6. Re:Huh? by HybridST · · Score: 1

      Automating the process would be handy, but not revolutionary. Automating it at the system level makes more sense to me but I'm just a power user.

      --
      Ever notice that Cobra Commander sounds an awful lot like Star scream?
    7. Re:Huh? by Xrikcus · · Score: 1

      When you do it that way you have no control over which computations are inaccurate. There's a lot more you can do if you have some input information from higher levels of the system.

      You may be happy that your pixels come out wrong occasionally, but you certainly don't want the memory allocator that controls the data to do the same. The point of this kind of technology (which is becoming common in research at the moment, the MIT link here is a good use of the marketing department) is to be able to control this in a more fine-grained fashion. For example, you could mark the code in the memory allocator as accurate - it must not have errors and so must enable any hardware error correction, might use a core on the platform that operates at a higher voltage, or would add extra software error correction as necessary. At the same time you might allow the visualization code to degrade to reduce overall power consumption, because the visualization code is not mutating any important data structures. Anything it generates is transient and the errors will barely be noticed.

    8. Re:Huh? by MightyYar · · Score: 1

      I don't think it is revolutionary, either... it's just a framework, after all. I was imagining a use where you have some super-low-power device out in the woods sampling temperatures, only firing itself up to "reliable" when it needs to send out data or something. Or a smartphone media app that lets the user choose between high video/audio quality and better battery life. Yeah, they could have already done this with some custom driver or something, but presumably having an existing framework would make it easier, less apt to conflict, and more standard.

      --
      W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
  4. "A few pixels incorrectly decoded"... by gnasher719 · · Score: 1

    h.264 relies heavily on the pixels in all previous frames. Incorrectly decoded pixels will be visible on many frames that are following. What's worse, they will start moving around and spreading.
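The propagation effect can be shown with a toy model. This is not H.264: it is a deliberately simplified sketch in which each "frame" is a single number predicted from the previous frame plus a delta, and the `decode` function and its parameters are invented for illustration.

```python
def decode(keyframe, deltas, corrupt_at=None, corruption=0):
    """Decode a toy stream where frame[n] = frame[n-1] + delta[n].

    If corrupt_at is given, a one-off error is added while decoding that
    frame; every later frame is predicted from it and inherits the error.
    """
    frames = [keyframe]
    for n, delta in enumerate(deltas, start=1):
        value = frames[-1] + delta
        if n == corrupt_at:
            value += corruption  # a single decoding glitch
        frames.append(value)
    return frames

deltas = [1, 1, 1, 1, 1]
clean = decode(100, deltas)
glitched = decode(100, deltas, corrupt_at=2, corruption=7)

errors = [g - c for g, c in zip(glitched, clean)]
print(errors)  # [0, 0, 7, 7, 7, 7]
```

In this model the error only disappears when a new keyframe restarts the chain, and a glitch in a frame that nothing else is predicted from would stay confined to that frame.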

    1. Re:"A few pixels incorrectly decoded"... by Desler · · Score: 1

      Not always true. There are cases where corrupted macroblocks will only cause artifacts in a single frame and won't necessarily cause further decoding corruption.

    2. Re:"A few pixels incorrectly decoded"... by SJHillman · · Score: 1

      So what you're saying is that the pixels are alive, and growing! I smell a SyFy movie of the week in the works.

    3. Re:"A few pixels incorrectly decoded"... by gigaherz · · Score: 1

      You missed the point. This is a framework for writing code that KNOWS about unreliable bits. The whole idea is that it lets you write algorithms that can tell the compiler where it's acceptable to have a few erroneous bits, and where it isn't. No one said it would apply to EXISTING code...

    4. Re:"A few pixels incorrectly decoded"... by SuricouRaven · · Score: 1

      24fps? Depends on content. It's too high for landscape establishing shots, talking heads and presentations. Yet too low for high-action scenes and sports. It's a happy medium.

      If you don't like it, try to get variable frame rate support more established. Then everyone is happy.

    5. Re:"A few pixels incorrectly decoded"... by MightyYar · · Score: 1

      So then for you, the compromise in this particular example would be that you would crank up the power a bit and make the pixels all perfect. Other people without such good eyes could crank down the power and get more battery life.

      --
      W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
    6. Re:"A few pixels incorrectly decoded"... by viperidaenz · · Score: 1

      So why not just add more instructions, for doing faster but less accurate calculations? 24bit operations for RGB values, for example.

  5. How on earth by dmatos · · Score: 4, Insightful

    are they going to make "unreliable transistors" that, upon failure, simply decode a pixel incorrectly, rather than, oh, I don't know, branching the program to an unspecified memory address in the middle of nowhere and borking everything.

    They'd have to completely re-architect whatever chip is doing the calculations. You'd need three classes of "data" - instructions, important data (branch addresses, etc), and unimportant data. Only one of these could be run on unreliable transistors.

    I can't imagine a way of doing that where the overhead takes less time than actually using decent transistors in the first place.

    Oh, wait. It's a software lab that's doing this. Never mind, they're not thinking about the hardware at all.

    --

    It may look like I'm doing nothing, but I'm actively waiting for my problems to go away.
    --Scott Adams
    1. Re:How on earth by bestdealex · · Score: 1

      Where are my mod points when I need them?! This is exactly my sentiment as well. Even the simple processing required to check if the data output is correct or within bounds will be staggering compared to simply letting it pass.

      --
      If you can't convince them, confuse them!
    2. Re:How on earth by gigaherz · · Score: 1

      This was on Slashdot years ago. I can't find the Slashdot link, but I did find this one. The idea is that you design a CPU focusing the reliability in the more significant bits, while you allow the least significant bits to be wrong more often. The errors will be centered around the right values (and tend to average into them), so if you write code that is aware of that fact, you can teach it to compensate for the wrong values. Of course this is not acceptable for certain kinds of software, but for things like multimedia processing, a small % error in the result wouldn't be appreciable, and over time, the image should keep averaging out the old errors while introducing new ones, assuming the software is designed for it.
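A quick sketch of why confining errors to the low bits keeps them small. The fault model is an assumption made up for this example (each of the lowest bits flips independently with a fixed probability), and `flip_low_bits` is a hypothetical helper, not a description of any real chip.

```python
import random

def flip_low_bits(value, unreliable_bits, flip_prob, rng):
    """Flip each of the lowest `unreliable_bits` bits with probability flip_prob.
    Higher bits are treated as fully reliable."""
    for bit in range(unreliable_bits):
        if rng.random() < flip_prob:
            value ^= 1 << bit
    return value

rng = random.Random(1)
pixels = [rng.randrange(256) for _ in range(10_000)]       # 8-bit samples
noisy = [flip_low_bits(p, 2, 0.05, rng) for p in pixels]   # 2 unreliable bits

worst = max(abs(a - b) for a, b in zip(pixels, noisy))
print(worst)  # never more than 3, i.e. 2**2 - 1
```

With two unreliable bits out of eight the worst case is an error of 3 on a 0..255 scale, whereas a flip in the top bit would be an error of 128.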

    3. Re:How on earth by MightyYar · · Score: 1

      Doesn't that depend on the application? What if I'm simply updating a position based upon an already noisy sensor? I already have a bunch of code to throw out crappy results. I'm taking lots of samples, so as long as most of my measurements are accurate, it's all good. Obviously I can't tolerate a random error in every single cycle, but maybe 1 in a million is OK and lets me run at a lower voltage.

      --
      W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
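For the noisy-sensor case, one common way to "throw out crappy results" is a median over recent samples. The sketch below assumes glitches are rare and wildly wrong; the sample values are invented.

```python
import statistics

samples = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1]

# Same readings, but one sample is corrupted by a hypothetical bit flip.
glitched = samples.copy()
glitched[3] = 4096.0

print(statistics.median(samples))   # 10.0
print(statistics.median(glitched))  # 10.0, the outlier is ignored
print(statistics.fmean(glitched))   # a plain average is ruined by it
```

The median tolerates the bad sample only while most of the window is good, which matches the "1 in a million is OK, every cycle is not" condition.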
    4. Re:How on earth by Warbothong · · Score: 1

      How on earth are they going to make "unreliable transistors" that, upon failure, simply decode a pixel incorrectly, rather than, oh, I don't know, branching the program to an unspecified memory address in the middle of nowhere and borking everything.

      Very easily: the developer specifies that pixel values can tolerate errors but that branch conditions/memory addresses can't. If you'd bothered to read the summary, you'd see it says exactly that:

      a new programming framework that enables software developers to specify when errors may be tolerable.

      They'd have to completely re-architect whatever chip is doing the calculations.

      Erm, that's the whole point. If we allowed high error rates with existing architectures, none of our results would be trustworthy. I imagine the most practical approach would be a fast, low-power but error-prone co-processor living alongside the main, low-error processor. This could be programmed just like GPUs are at the moment. The nice thing about this work is that the separation can be largely transparent; just annotate your programs and the compiler will figure out which parts can be offloaded to the co-processor.

      I can't imagine a way of doing that where the overhead takes less time than actually using decent transistors in the first place.

      As far as I can tell there is no overhead involved. In fact it's the other way around: calculating exact answers (as we do now) is a perfectly acceptable way to execute an error-tolerant program. The opposite is not true though: an error-intolerant program cannot be executed with errors. Since we're strictly increasing the execution strategies available, we can only ever increase efficiency (since we can choose to ignore the new strategies).

    5. Re:How on earth by tlhIngan · · Score: 1

      are they going to make "unreliable transistors" that, upon failure, simply decode a pixel incorrectly, rather than, oh, I don't know, branching the program to an unspecified memory address in the middle of nowhere and borking everything.

      They'd have to completely re-architect whatever chip is doing the calculations. You'd need three classes of "data" - instructions, important data (branch addresses, etc), and unimportant data. Only one of these could be run on unreliable transistors.

      I can't imagine a way of doing that where the overhead takes less time than actually using decent transistors in the first place.

      Oh, wait. It's a software lab that's doing this. Never mind, they're not thinking about the hardware at all.

      More properly, the language takes care of it.

      You declare variables to be "approximate" - where errors are tolerated and you can use lower power hardware to do it (it turns out reliability means having to use higher voltages which raise power consumption, and lower clock speeds which keeps cores powered up longer rather than race them to sleep as fast as possible).

      So a counter would be "exact" and have to use the high-powered reliable hardware mode, while the pixel data will be inexact and use low power mode. Even a counter that iterates over the pixel array has to be exact.

      And you can easily transition from exact data to inexact data, but transitions back are limited and explicit - you can't test inexact values - you have to promote the inexact data (because there will always be times when you need to deal with it).

      Of course, it's a new programming language because existing ones model reliable systems.
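The exact/approximate split described above can be mimicked in plain Python. This is only a sketch: `Approx` and `promote` are hypothetical names invented here, not the framework's or any real language's API, and nothing in it actually runs on unreliable hardware. It just shows the rule that approximate values cannot drive a branch until they are explicitly promoted.

```python
class Approx:
    """Wrapper marking a value as error-tolerant (hypothetical, for illustration)."""

    def __init__(self, value):
        self._value = value

    @staticmethod
    def _raw(other):
        return other._value if isinstance(other, Approx) else other

    def __add__(self, other):
        # Mixing exact and approximate data yields approximate data.
        return Approx(self._value + Approx._raw(other))

    __radd__ = __add__

    def __lt__(self, other):
        # A comparison on approximate data is itself approximate.
        return Approx(self._value < Approx._raw(other))

    def __bool__(self):
        raise TypeError("approximate value used in a branch; call promote() first")


def promote(approx):
    """Explicit, visible conversion from approximate back to exact."""
    return approx._value


pixels = [Approx(200), Approx(13), Approx(77)]
total = Approx(0)
for i in range(len(pixels)):  # the loop counter i stays exact
    total = total + pixels[i]

try:
    if total < 1000:          # testing an approximate value is rejected
        pass
    branch_allowed = True
except TypeError:
    branch_allowed = False

print(branch_allowed)   # False
print(promote(total))   # 290
```

A real implementation would enforce this at compile time rather than with a runtime exception.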

    6. Re:How on earth by bluefoxlucid · · Score: 3, Insightful

      Erm, that's the whole point. If we allowed high error rates with existing architectures, none of our results would be trustworthy. I imagine the most practical approach would be a fast, low-power but error-prone co-processor living alongside the main, low-error processor.

      Or you know, the thing from 5000 years ago where we used 3 CPUs (we could on-package ALU this shit today) all running at high speeds and looking for 2 that get the same result and accepting that result. It's called MISD architecture.
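The two-out-of-three scheme is easy to sketch in software. The replicas below are ordinary functions standing in for separate hardware units, and `vote`, `run_redundant` and `flaky_square` are names made up for this example.

```python
from collections import Counter

def vote(results):
    """Return the value that at least two replicas agree on, else raise."""
    value, count = Counter(results).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no two replicas agree")
    return value

def run_redundant(replicas, x):
    """Run every replica on the same input and accept the majority result."""
    return vote([replica(x) for replica in replicas])

def good_square(x):
    return x * x

def flaky_square(x):
    return x * x + 1  # stand-in for a unit that produced a wrong result

print(run_redundant([good_square, flaky_square, good_square], 12))  # 144
```

The voter is assumed to be reliable, and all three units burn power on every operation, which is the cost the replies go on to question.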

    7. Re:How on earth by viperidaenz · · Score: 1

      A big class of CPU bugs consists of so-called speed-paths, where a part of the CPU expects a calculation in a different part of the CPU to be complete before it has actually completed

      Care to expand on that? This is not a typical race condition. What you're describing is a CPU not ordering instructions as expected - not doing its primary purpose.

    8. Re:How on earth by viperidaenz · · Score: 1

      2+2=5 for large values of 2.
      When you're performing calculations, you need to know where and how rounding takes place if everything isn't an integer.
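The joke has a literal version, assuming the "large value of 2" is 2.4 and that rounding happens either before or after the addition:

```python
a = b = 2.4

print(round(a) + round(b))  # 4: each operand is rounded to 2 first
print(round(a + b))         # 5: the sum 4.8 is rounded afterwards
```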

    9. Re:How on earth by Darinbob · · Score: 1

      It seems a bit strange to me also. Didn't read all of the article, but a few pixels wrong is extremely minor and very lucky. One wrong bit is far more likely to crash your computer than to make a pixel be incorrect. What about the CPU? Are we so media obsessed now that getting the pixels wrong is considered a major error but we completely ignore all the serious errors that could result? We'd need redundant transistors to monitor everything, making sure that the CPU registers have the correct values, that addition is performed correctly, that all memory values have not been corrupted. And at that point the redundant transistors are eliminating the gain achieved by making transistors smaller.

      A software solution here would have to ultimately come down to the machine language level. I.e., some add operators are tagged as error tolerant but others tagged as crucial, so they go to different ALUs (the cheap ass one versus one with redundancy). Every single branch would still have to go to the highest quality ALU though.

    10. Re:How on earth by able1234au · · Score: 1

      This is the better approach but I wonder if there is a saving with 3 dodgy processors over 1 good processor. I guess if the yield falls below one third then it might. But power requirements may triple, so it's hard to see the saving.

    11. Re:How on earth by jouassou · · Score: 1

      I can imagine a couple of applications of these transistors though...

      Many numerical simulations require repeated random sampling of some process, and then combine the results in the end. If you're averaging some billion simulations, the result should be quite robust to fluctuations in the results of each simulation. Thus it might well be worth it to use 10 billion unreliable transistors instead of 1 billion reliable transistors, if they cost the same.

      Another application could be to generate random numbers. Let's say that you have a pseudorandom number generator with periodicity N, and your unreliable transistors make the algorithm do a random jump after an average of N/100 numbers. Wouldn't that be "random enough" for more applications than just the pseudorandom number generator itself?

    12. Re:How on earth by bluefoxlucid · · Score: 1

      Power requirements actually increase hyperlinearly. DDR RAM uses a serializer, for example, so that you run the RAM at 100MHz but fetch multiple bytes into a buffer and output that across your FSB. This is because running the RAM at 100MHz takes N power, while running at 200MHz takes N^2 power or something ridiculously bigger than 2N.

  6. Re:viewers probably won't notice? by Desler · · Score: 1

    You confuse what that sentence is talking about. They aren't talking about stuck pixels on an LCD. It's talking about not spending time doing extensive error correction/masking when a few pixels in the video are corrupted and thus will be decoded with some level of artifacting.

  7. Re:viewers probably won't notice? by SJHillman · · Score: 1

    You must have gone through a lot of monitors before realizing this has nothing to do with dead pixels on a display.

  8. Chicken and the Egg. by jellomizer · · Score: 3, Informative

    We need software to design hardware to make software...

    In short it is about better adjusting your tolerance levels on individual features.
    I want my Integer arithmetic to be perfect. My Floating point, good up to 8 decimal places. But there are components meant for interfacing with humans. Audio: so much is altered or lost due to differences in the quality of speakers, even top-notch ones with Gold (or whatever crazy stuff) cables. So in your digital-to-audio conversion, you may be fine if a voltage is a bit off, or you skipped a random change, as the smoothing mechanism will often hide that little mistake.

    Now for displays... We need to be pixel perfect when we have screens with little movement. But if we are watching a movie, a pixel color #8F6314 can be #A07310 for 1/60th of a second and we wouldn't notice it. And most displays are not even high enough quality to show these differences.

    We hear of these errors and think how horrible it is that we are not getting perfect products... However, it is more due to the trade-off of getting smaller and faster with a few more glitches.

    --
    If something is so important that you feel the need to post it on the internet... It probably isn't that important.
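To put numbers on the two colors mentioned above, this short sketch splits each hex code into its red, green and blue channels and reports how far apart they are. The helper names are invented for the example.

```python
def channels(hex_color):
    """Split '#RRGGBB' into (r, g, b) integers in the range 0..255."""
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))

def channel_deltas(a, b):
    """Absolute per-channel difference between two '#RRGGBB' colors."""
    return tuple(abs(x - y) for x, y in zip(channels(a), channels(b)))

print(channels("#8F6314"))                   # (143, 99, 20)
print(channel_deltas("#8F6314", "#A07310"))  # (17, 16, 4)
```

So the example is an error of at most 17 out of 255 per channel, which affects bit 4 and below.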
    1. Re:Chicken and the Egg. by CastrTroy · · Score: 2

      Yeah, but you could save just as much power (I'm guessing) with dedicated hardware decoders, as you could by letting the chips be inaccurate. As chips get smaller it's much more feasible to have hardware-specific chips for just about everything. The ARM chips in phones and tablets have all kinds of specialized hardware, some for decoding video and audio, others for doing encryption and other things that are usually costly for a general purpose processor. Plus it's a lot easier for the developer to not have to consider how inaccurate stuff can be, and just write code as though things are actually going to be correct. Even programming with binary floating point numbers is problematic enough, as there are many decimal fractions that can't be represented exactly.

      --

      Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
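The floating point remark is easy to reproduce. The sketch below uses only the standard library; `Decimal` is shown as one way to get exact decimal fractions, not as a recommendation for any particular workload.

```python
from decimal import Decimal
import math

print(0.1 + 0.2 == 0.3)              # False: none of the three is exact in binary
print(math.isclose(0.1 + 0.2, 0.3))  # True: compare with a tolerance instead

# Decimal built from strings stores the decimal fractions exactly.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True

# Decimal built from a float reveals the binary value actually stored.
print(Decimal(0.1) == Decimal("0.1"))  # False
```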
    2. Re:Chicken and the Egg. by Dahamma · · Score: 1

      Yeah, but you could save just as much power (I'm guessing) with dedicated hardware decoders, as you could by letting the chips be inaccurate.

      Eh, a dedicated hardware decoder is still made out of silicon. That's the point, make chips that perform tasks like that (or other things pushing lots of data that is only relevant for a short period, like GPUs - GPUs used only for gfx and not computation, at least) tolerate some error, so that they can use even less power. No one is yet suggesting we make general purpose CPUs in today's architectures unreliable :)

    3. Re:Chicken and the Egg. by CastrTroy · · Score: 1

      Yeah, but that's not something the application level software developer has to account for. They just use OpenGL, or DirectX, and the chip and video card driver decides how to execute it and render it. Actually, with some graphics cards and driver implementations, they basically do this already: rendering the image incorrectly speeds up the result, and they hope nobody notices. Basically, if any error is acceptable when programming against certain hardware, it should just be handled at the API level for accessing the hardware. The people programming against the hardware shouldn't have to decide how much, if any, error is acceptable. For instance, If I'm decoding video, I would just pass the encoded stream to a function, and get decoded frames back, or they would be displayed on the screen. In many cases, it might even be user configurable. Some users might be OK with colors being incorrect in exchange for higher frame rates. However, other users might want the exact opposite experience. Maybe their hardware is already producing enough frames, and they just want a nicer picture.

      --

      Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
    4. Re:Chicken and the Egg. by Dahamma · · Score: 1

      They just use OpenGL, or DirectX, and the chip and video card driver decides how to execute it and render it.

      *Real* use of OpenGL and DirectX these days is all about the shaders, which get compiled and run on the GPUs. And even basic ops that are "builtin" to the drivers usually are using shader code internal to the driver (or microcode internal to the hardware/firmware).

      The people programming against the hardware shouldn't have to decide how much, if any, error is acceptable.

      Absolutely they should, and have been doing so with existing 3D hardware for a long time. It's just been more about 3D rendering shortcuts/heuristics/etc than faulty hardware. It's all about tricking the viewer's eyes and brain to increasing degrees, not reproducing an exactly correctly rendered 3D image... and until everything is raytraced that will continue to be the case.

      For instance, If I'm decoding video, I would just pass the encoded stream to a function, and get decoded frames back, or they would be displayed on the screen.

      Well, I just finished implementing stereoscopic 3D video playback on the PS4, and I guarantee it's more work than that ;) Libraries are provided to do the low level decoding, but demuxing, decryption, scaling/blitting to framebuffers, compositing with UI elements, audio processing, A/V sync, etc, are largely left up to the application programmer. Even with that, the "hardware decoding" is mostly happening on the GPU or other *programmable* video decoder hardware anyway.

      These reasons are precisely why GPUs / "hardware" decoders will likely be the first processors to benefit from frameworks like the one described in the article...

  9. Similar Idea to EnerJ Language by MetaDFF · · Score: 3, Interesting

    The idea of fault-tolerant computing is similar to the EnerJ programming language being developed at the University of Washington for power savings: "The Language of Good Enough Computing".

    The gist of the idea is that the programmer can specify which variables need to be exact and which variables can be approximate. The approximate variables would then be stored in low-refresh RAM, which is more prone to errors, to save power, while the precise variables would be stored in higher-power memory, which would be error free.

    The example they gave was calculating the average shade of grey in a large image of 1000 by 1000 pixels. The running total could be held in an approximate variable since the error incurred by adding one pixel incorrectly out of a million would be small, while the control loop variable would be accurate since you wouldn't want your loop to overflow.
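
    As a rough illustration of that example, here is a Python sketch that models one mis-added pixel deterministically. This is not EnerJ's actual API (EnerJ is a Java extension); the function name average_grey and its corrupt_index / corrupt_value parameters are assumptions made up for the sketch.

```python
def average_grey(pixels, corrupt_index=None, corrupt_value=0):
    """Average 8-bit grey values, optionally mis-adding a single pixel.

    `total` stands in for the variable that could live in approximate
    storage; the loop index `i` stands in for the one that must stay exact.
    """
    total = 0
    for i in range(len(pixels)):
        value = pixels[i]
        if i == corrupt_index:
            value = corrupt_value  # model one pixel being added incorrectly
        total += value
    return total / len(pixels)


# A 1000 x 1000 image of uniform mid-grey (hypothetical test data).
image = [128] * (1000 * 1000)

exact = average_grey(image)
approx = average_grey(image, corrupt_index=12345, corrupt_value=255)

# One wrong 8-bit addend can shift the average by at most 255 / N.
worst_case = 255 / len(image)
print(exact, approx, abs(approx - exact) <= worst_case)
```

    The bound only holds if the fault corrupts the value being added. A fault in the stored total itself is a different failure mode.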

    1. Re:Similar Idea to EnerJ Language by JesseMcDonald · · Score: 1

      The example they gave was calculating the average shade of grey in a large image of 1000 by 1000 pixels. The running total could be held in an approximate variable since the error incurred by adding one pixel incorrectly out of a million would be small...

      What makes them think that the kinds of errors you'd get in a variable in low-refresh-rate RAM would be small? Flip the MSB from 1 to 0 and your total is suddenly divided in half. Or, if it's a floating-point variable, flip one bit in the exponent field and your total changes from 1.23232e4 to 1.23232e-124.
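
      A small Python sketch of this objection, assuming a plain unsigned integer total and an IEEE 754 double. The helper names flip_int_bit and flip_double_bit are made up for illustration, and the exact size of the damage depends on which bit flips and on the storage format.

```python
import struct


def flip_int_bit(value: int, bit: int) -> int:
    """Flip one bit of a non-negative integer."""
    return value ^ (1 << bit)


def flip_double_bit(x: float, bit: int) -> float:
    """Flip one bit (0 = least significant) of a 64-bit IEEE 754 double."""
    (raw,) = struct.unpack("<Q", struct.pack("<d", x))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return flipped


# Running total for a million pixels of value 128.
total = 128 * 1000 * 1000
top_bit = total.bit_length() - 1           # highest bit that is set
damaged_total = flip_int_bit(total, top_bit)

# In a double, bits 52..62 hold the exponent; bit 62 is its top bit.
value = 1.23232e4
damaged_value = flip_double_bit(value, 62)

print(total, damaged_total)
print(value, damaged_value)
```

      Flipping the top set bit of the integer removes more than half of the total, and flipping the top exponent bit of the double shrinks it by hundreds of orders of magnitude, so a single-bit fault in the accumulator is not bounded the way a single wrong addend is.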

      --
      "The state is that great fiction by which everyone tries to live at the expense of everyone else." - Bastiat
  10. Re:I for one welcome by alexander_686 · · Score: 1

    Seriously, why do we want to do this? Is power usage going to be cut in half?

    Yes. Well, about in 1/2. Think about signal processors and cell phones. Would you accept a 5% reduction in voice quality for a doubling of your talk time?

  11. Re:unrelliable is not really useful by somersault · · Score: 1

    What exactly led you to believe that anyone is wanting to use this concept in situations where 100% reliability is required?

    --
    which is totally what she said
  12. french by Spaham · · Score: 1

    Am I the only French speaker who thinks that the "Computer Science and Artificial Intelligence Laboratory" sounds like this in French:
    CS-AIL?

  13. This could make computers more brain-like by Dr.+Spork · · Score: 1

    I love this idea, because it reminds me of the most energy efficient signal processing tool in the known universe, the human brain. Give Ken Jennings a granola bar, and he'll seriously challenge Watson, who will be needing several kilowatt-hours to do the same job. Plus, Ken Jennings is a lot more flexible. He can carry on conversations, tie shoes, etc. This is because his central processing unit basically relies on some sort of fault-tolerant software. I think that there will be a lot more applications of a fault-tolerant, energy efficient software strategy, beyond just media decoding. When we get around to asking computers to be creative and apply variously-weighted "rules of thumb", I expect that those operations will run best on systems that sacrifice calculation accuracy for speed and energy efficiency. You gain almost nothing when you apply rough heuristic rules precisely. Let's allow the computers to apply rough rules imprecisely, and reap the speed and energy benefits of the trade.

    1. Re:This could make computers more brain-like by bluefoxlucid · · Score: 1

      I love this idea, because it reminds me of the most energy efficient signal processing tool in the known universe, the human brain.

      Dumb analogy. Being inaccurate does not make you more intelligent and won't cause emergent behavior.

      Give Ken Jennings a granola bar, and he'll seriously challenge Watson, who will be needing several kilowatt-hours to do the same job.

      Wrong. Ken Jennings' brain runs on blood sugar, glycogen stored in the liver from previous food (converted into blood sugar by glucagon as blood sugar is consumed for work), fat stored in consolidated fat cells from previous food (converted into blood sugar by lipolysis), and a huge set of neurotransmitters (mainly acetylcholine) stored up by prior processes. Never mind that you get 10% of the energy at each level: the plants convert 2% of the sunlight they collect to energy, which is mainly stored as inaccessible fiber and other structural work (i.e. vitamins, hormones...); herbivores (the normal analogy is 'pork chop' or 'steak') get maybe 10% of that converted energy; you get maybe 10% of the energy input from the herbivores. This is like 0.02% efficiency versus 12%-efficient photovoltaic panels or 38%-efficient parabolic solar collectors, not considering the direct inefficiency of Ken Jennings' brain for converting sugar energy into useful work.

      Plus, Ken Jennings is a lot more flexible. He can carry on conversations, tie shoes, etc. This is because his central processing unit basically relies on some sort of fault-tolerant software.

      No, it's because he has better programming. A gerbil's brain relies on fault tolerant processing, and they can't talk or tie shoes; they can eat and have sex.

      I think that there will be a lot more applications of a fault-tolerant, energy efficient software strategy, beyond just media decoding. When we get around to asking computers to be creative and apply variously-weighted "rules of thumb", I expect that those operations will run best on systems that sacrifice calculation accuracy for speed and energy efficiency. You gain almost nothing when you apply rough heuristic rules precisely. Let's allow the computers to apply rough rules imprecisely, and reap the speed and energy benefits of the trade.

      Actually that's slow and stupid. This is less effective than taking a working computer with a high clock rate (SOD-CMOS at 394GHz, low-power, accurate) and seeding various inputs with a noise-based RNG (audio-entropyd measures the noise fluctuation on an unused microphone line, for example: this is just spaztastic voltage wobble from EMR).

      Stop romanticising the human mind as a result of "lots of imprecise and uniquely organic failings creating something amazing and beautiful". It's a really fucking complex system.

  14. No! Unreliability is a feature by Mister+Liberty · · Score: 1

    May the best chi(m)p win.

    1. Re:No! Unreliability is a feature by Iniamyen · · Score: 1

      The cihps rlealy olny hvae to get the frsit and lsat ltetres corcert. Yuor brian can flil in the rset.

  15. Already done by mjr167 · · Score: 1

    Doesn't intel already make a chip that is unreliable?

    1. Re:Already done by CaseCrash · · Score: 1

      Well, they did make one that was reliably incorrect.

      --
      No, that link you posted to a web comic we've all seen a hundred times is not "obligatory."
  16. Oh, GREAT by Iniamyen · · Score: 1

    Yeah, let's take away the only thing that computers had going for them - doing exactly what they're told. THAT sounds like a GREAT idea.

  17. How about stop making crap hardware? by Lumpy · · Score: 1

    It can be done. We don't have to race for atomic-size transistors before we have the technology to make them more reliable.

    --
    Do not look at laser with remaining good eye.
  18. Chips for unreliable programming... by Alejux · · Score: 1

    now that would be world changing!

  19. Decades old news by viperidaenz · · Score: 1

    What do you think the artefacts shown on screen are when you overclock your video card too high? Acceptable (sometimes) hardware errors.

  20. Re:The end of general-purpose computing? by viperidaenz · · Score: 1

    In other words, it assumes that we won't be using general-purpose computers in the future.

    Too true. Any transistor that is in the path of calculating anything that ends up as a memory location or an offset to one anywhere has the possibility of crashing the process if you're lucky, or compromising the entire system if you're not.

  21. And the inexorable decline of humanity continues by EmagGeek · · Score: 1

    This is why everything is disposable and nothing works anymore. People are too willing to sacrifice quality and reliability for cost.

  22. Re:viewers probably won't notice? by viperidaenz · · Score: 1

    A stuck pixel is still just an unreliable transistor...

  23. Infinite recursion here? by Jorgensen · · Score: 1

    So: This assumes that something, somewhere knows which transistors are unreliable. This data needs to be stored somewhere - on the "good" transistors. How is this data obtained? Is there a trustworthy "map" of "unreliable transistors"? And the code that determines the probability has to run on the "good" transistors too. Will those transistors stay good?

    I cannot see any way of allowing *any* transistor to be unreliable... And based on my (admittedly incomplete) understanding of chip production, *any one* of the transistors on the silicon can be faulty, so there still is a chicken-and-egg problem in here somewhere.

    Surely, such "suspect" transistors can only be used for storing the final end result of a calculation: If you were to use it for intermediate values on which you base "if" statements (or any sort of branch), your code will end up unreliable as a result. Unfortunately, 99% of the time the "end result" of one calculation is used as input to another calculation, so the problem spreads like rings in the water.

    What if humans want to rely on the output of the computer? Does that pixel on the screen matter? If you are playing Angry Birds, fine. But the pixels may be important if you're a doctor looking at a scan. Or you're a flight controller scanning the screen for planes. The graphics routines do not know the context in which they run. So the actual usability of this ends up being radically diminished....

    What use is a computer where you cannot trust the result? We already have logic bugs, race conditions, usability issues etc confusing everybody - I don't think we need to make the computers even more unreliable...

  24. Re:Or we could just use java by viperidaenz · · Score: 1

    Or we could just use java, with its "almost" IEEE complete libraries

    That's a design feature and what strictfp is for. It's not Sun's fault that all the different CPUs Java code can run on implement floating point hardware differently. The only other option is to emulate it in software.
    It's a pity nothing you mentioned has anything to do with Java not guaranteeing floating point operations.

  25. Re:I for one welcome by viperidaenz · · Score: 1

    Except the battery drain in talk-time is mostly the radio, not the CPU.
    The battery drain while using it is mostly the screen backlight.

    So cutting in half the power consumption of something contributing an almost insignificant amount of power isn't going to do much.
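
    A back-of-the-envelope Python sketch of that argument. The function name runtime_multiplier is made up, and the 10% CPU share is purely an assumed figure for illustration, not a measurement from any phone.

```python
def runtime_multiplier(component_share: float, component_saving: float) -> float:
    """Battery-life multiplier when one component's power draw is reduced.

    component_share:  fraction of total power drawn by the component (0..1)
    component_saving: fraction of that component's power that is saved (0..1)
    """
    remaining = 1.0 - component_share * component_saving
    return 1.0 / remaining


# If the CPU were the whole power budget, halving it would double talk time.
cpu_is_everything = runtime_multiplier(1.0, 0.5)

# Assumed case: the CPU draws 10% of the total and its power is halved.
cpu_is_minor = runtime_multiplier(0.10, 0.5)

print(cpu_is_everything, cpu_is_minor)
```

    Under the assumed 10% share, talk time improves by only about 5%, which is the point being made: the gain is capped by how much of the total budget the component uses.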

  26. Funny.... by hackus · · Score: 1

    I already thought we had a framework for making chips unreliable in the programming realm known as Windows API.

    Oh wait...

    -Hackus

    --
    Got Geometrodynamics? Awe, too hard to figure out? Too bad.
  27. faster and broken != upgrade by Gravis+Zero · · Score: 1

    if it's a choice between using a slower chip that is reliable and a chip that is blistering fast but makes mistakes, i'll take the slower chip every time.

    --
    Anons need not reply. Questions end with a question mark.
  28. a fourth possibility by NikeHerc · · Score: 1

    From the article: "A third possibility, which some researchers have begun to float, is that we could simply let our computers make more mistakes."

    A fourth possibility is to forget this silliness before it turns into epic failure, go back to the drawing board, and design computers that make fewer mistakes, not more mistakes. Sheesh, what lunacy!

    --
    Circle the wagons and fire inward. Entropy increases without bounds.