New Framework For Programming Unreliable Chips

godzilla by Anonymous Coward · 2013-11-04 02:37 · Score: 5, Insightful

Asking software to correct hardware errors is like asking godzilla to protect tokyo from mega godzilla

this does not lead to rising property values

Re:godzilla by n6mod · 2013-11-04 02:57 · Score: 5, Interesting

I was hoping someone would mention James Mickens' epic rant.

--
You have violated Robot's Rules of Order and will be asked to leave the future immediately.
Re:godzilla by K.+S.+Kyosuke · 2013-11-04 04:14 · Score: 1

Asking software to correct hardware errors is like asking godzilla to protect tokyo from mega godzilla
OTOH, in measurement theory, it's been long known that random errors can be eliminated by post-processing multiple measurements.
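A minimal sketch of that averaging idea, assuming zero-mean Gaussian noise and a reliable averaging step. The `noisy_measurement` sensor and all parameter values are hypothetical. Strictly speaking, averaging N independent readings shrinks the random error roughly as 1/sqrt(N) rather than eliminating it:

```python
import random
import statistics

def noisy_measurement(true_value, noise_sd, rng):
    """Return one reading with zero-mean Gaussian noise added (hypothetical sensor)."""
    return true_value + rng.gauss(0.0, noise_sd)

def averaged_measurement(true_value, noise_sd, samples, rng):
    """Average several independent readings to shrink the random error."""
    readings = [noisy_measurement(true_value, noise_sd, rng) for _ in range(samples)]
    return statistics.fmean(readings)

rng = random.Random(42)
single_error = abs(noisy_measurement(10.0, 1.0, rng) - 10.0)
averaged_error = abs(averaged_measurement(10.0, 1.0, 10_000, rng) - 10.0)
print(f"single reading error: {single_error:.4f}")
print(f"error after averaging 10000 readings: {averaged_error:.4f}")
```

With 10,000 readings the expected error of the mean is about a hundredth of the per-reading noise, which is the whole trade: more work per answer in exchange for a tighter answer.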

--
Ezekiel 23:20
Re:godzilla by Anonymous Coward · 2013-11-04 04:25 · Score: 0

Typical CPUs are designed for the worst case: If they don't give the correct result under the most adverse conditions (temperature, electrical noise, bit pattern), then it's considered a bug. One could design CPUs for the average case: Make them such that calculations are mostly correct, except under rare conditions, and even then design the CPU to give less precise results, not entirely random results.
A software analogy: Quicksort has a worst case time complexity of O(n^2), but an average case time complexity of just O(n * log n). Suppose you have a real time application that needs to sort something. Do you allocate quadratic time for the sorting or do you vastly improve the responsiveness by only waiting for the sort result long enough to deal with the average case? Obviously this depends on how horribly things will go wrong when you run into the worst case. (This is where the analogy falls apart. How useful is a partly sorted array? Not very. An almost correct floating point calculation on the other hand might even be just as good as the correct result, depending on the application.)
Re:godzilla by jeffb+(2.718) · 2013-11-04 04:58 · Score: 1

(This is where the analogy falls apart. How useful is a partly sorted array? Not very. An almost correct floating point calculation on the other hand might even be just as good as the correct result, depending on the application.)
Actually, it seems to me that the analogy is still quite valid. Having a large array where items are guaranteed to be off by no more than one spot -- in other words, where some adjacent items may be swapped from their correct positions -- could be quite useful. I'm thinking of things like "sort by most recent" for news articles, or "search by price ascending" in an online store. In fact, I'm seeing such "approximate ordering" a lot more frequently on large-scale Web apps; it's better to have an approximately-ordered list quickly than a precisely-ordered list much more slowly.
Of course, if you're looking for a sorted list to support binary search, your mileage will vary.
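To make the binary search caveat concrete, here is a small sketch. It assumes distinct items and a list where every item is at most one slot away from its sorted position; the function names are made up for illustration. A plain binary search can miss items that are present, while scanning a small window around the landing point finds them again:

```python
from bisect import bisect_left

def max_displacement(items):
    """Largest gap between an item's index and its index in sorted order (distinct items)."""
    sorted_index = {value: i for i, value in enumerate(sorted(items))}
    return max(abs(i - sorted_index[value]) for i, value in enumerate(items))

def naive_contains(items, target):
    """Plain binary search, which assumes an exactly sorted list."""
    i = bisect_left(items, target)
    return i < len(items) and items[i] == target

def tolerant_contains(items, target, slack=2):
    """Binary search, then scan a small window around where it landed."""
    i = bisect_left(items, target)
    return target in items[max(0, i - slack): i + slack + 1]

almost_sorted = [0, 2, 1, 3]  # one adjacent pair swapped
print(max_displacement(almost_sorted))      # every item is at most one slot off
print(naive_contains(almost_sorted, 2))     # plain binary search misses it
print(tolerant_contains(almost_sorted, 2))  # the windowed scan finds it
```

The window is two slots either side because, under the stated assumption, the search can land one slot past the sorted position while the item sits one slot before it.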
Re:godzilla by rasmusbr · 2013-11-04 05:41 · Score: 1

Nobody is suggesting allowing errors everywhere. Errors will only be allowed where they wouldn't cause massive unexpected effects.
A simple (self-driving) car analogy here would be that you might allow the lights to flicker a little if that saves power. You might even allow the steering wheel to move very slightly at random in order to save power as long as it never causes the car to spin out of control, but you would never allow even a small chance that the car would select its destination at random.
Re:godzilla by vux984 · 2013-11-04 06:31 · Score: 2

OTOH, in measurement theory, it's been long known that random errors can be eliminated by post-processing multiple measurements.
Gaining speed and energy efficiency is not usually accomplished by doing something multiple times, and then post processing the results of THAT, when you used to just do it once and got it right.
You'll have to do the measurements in parallel, and do it a lot faster to have time for the post processing and still come out ahead for performance. And I'm still not sure that buys you any improved efficiency.
random errors can be eliminated by post-processing multiple measurements.
And this is the real crux of the paradox :) Random errors can be introduced by post processing multiple measurements on an unreliable processor doing the post processing.
Now we have to post-post-process the results of the post-processed results to eliminate any random errors there? Turtles all the way down.
That said, as TFA suggested there are operations that can tolerate error, like video decoding -- and if we can realize substantial gains in performance or energy efficiency that translates into your laptop running a lot longer in exchange for a few transient (sub tenth of a second) pixel errors... that's a pretty good trade.
Re:godzilla by Anonymous Coward · 2013-11-04 06:42 · Score: 0

You might even allow the steering wheel to move very slightly at random in order to save power as long as it never causes the car to spin out of control,
You've obviously never lived in a place where it snows and ice remains on the roadways for months at a time. Stop by western Minnesota or the Dakotas sometime in mid-January after a 27" snowfall and say that again with a straight face.
The other analogies hold, though XD
Re:godzilla by K.+S.+Kyosuke · 2013-11-04 07:56 · Score: 2

Gaining speed and energy efficiency is not usually accomplished by doing something multiple times, and then post processing the results of THAT, when you used to just do it once and got it right.
For some kinds of computations, results can be verified in a time much shorter than the time in which they are computed. Often even asymptotically, but that's not even necessary. If you can perform a certain computation twice as fast and with half the energy on a faster but sometimes unreliable circuit/computational node, with the proviso that you need to invest five percent extra time and energy to check the result, you've still won big. (There are even kinds of computation where even probabilistically wrong results don't matter all that much because the computation as a whole doesn't diverge easily, but I digress.)
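Sorting is one example where checking is cheaper than computing: verifying a candidate result takes linear time. The sketch below simulates the flaky unit in software (`unreliable_sort` and its 30% failure rate are invented for illustration) and assumes the checker itself runs on reliable hardware:

```python
import random
from collections import Counter

def unreliable_sort(items, rng, failure_rate=0.3):
    """Stand-in for a fast but flaky unit: sometimes returns a slightly wrong order."""
    result = sorted(items)
    if len(result) > 1 and rng.random() < failure_rate:
        i = rng.randrange(len(result) - 1)
        result[i], result[i + 1] = result[i + 1], result[i]
    return result

def is_valid_sort(original, candidate):
    """Linear-time check: same multiset of elements and non-decreasing order."""
    in_order = all(a <= b for a, b in zip(candidate, candidate[1:]))
    return in_order and Counter(original) == Counter(candidate)

def checked_sort(items, rng, max_attempts=50):
    """Retry the flaky sort until a result passes the cheap check."""
    for attempt in range(1, max_attempts + 1):
        candidate = unreliable_sort(items, rng)
        if is_valid_sort(items, candidate):
            return candidate, attempt
    raise RuntimeError("no verified result within the attempt budget")

data = [5, 3, 8, 1, 9, 2]
result, attempts = checked_sort(data, random.Random(7))
print(result, "after", attempts, "attempt(s)")
```

Whether this wins overall depends on the failure rate and the cost of a retry, which is exactly the trade-off described above.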

Now we have to post-post-process the results of the post-processed results to eliminate any random errors there? Turtles all the way down.
How do you know that the universe isn't lying to you? How do you know that your brain isn't delusional about the lack of cognitive problems on your part? Those are the same kind of question.

That said, as TFA suggested there are operations that can tolerate error, like video decoding -- and if we can realize substantial gains in performance or energy efficiency that translates into your laptop running a lot longer in exchange for a few transient (sub tenth of a second) pixel errors... that's a pretty good trade.
I believe I've already seen that kind of computing before. I even believe there had already been a post on something like this here on /. (Unfortunately, that was at a time when I didn't write extensive searchable notes on the things I stumbled upon, so I can't serve with a link.)

--
Ezekiel 23:20
Re:godzilla by viperidaenz · 2013-11-04 08:08 · Score: 1

I'd rather end up at the wrong street number than sideways into a power pole...
Re:godzilla by Azure+Flash · 2013-11-04 08:12 · Score: 1

Are you kidding? Properties with a beautiful view on the battlefield between Godzilla and Mega Godzilla would definitely be worth MILLIONS of yen
Re:godzilla by Anonymous Coward · 2013-11-04 10:10 · Score: 0

LOL! Sounds like Denis Leary or something. Got to the line about the materials science guy jumping out of a birthday cake. Continuing my read.
Re:godzilla by kermidge · 2013-11-04 17:53 · Score: 1

God, that's one beautiful little piece of writing. Thank you.
From the posted summary "...if few pixels in each frame of a high-definition video are improperly decoded, viewers probably won't notice" - Now there's a slippery slope if ever I saw one.

Hmmm ... by gstoddart · 2013-11-04 02:38 · Score: 4, Insightful

So, expect the quality of computers to go downhill over the next few years, but we'll do our best to fix it in software?

That sounds like we're putting the quality control on the wrong side of the equation to me.

--
Lost at C:>. Found at C.

Re:Hmmm ... by bill_mcgonigle · 2013-11-04 02:44 · Score: 2

So, expect the quality of computers to go downhill over the next few years, but we'll do our best to fix it in software?
If you use modern hard drives, you've already accepted high error rates corrected by software.

--
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
Re:Hmmm ... by Anonymous Coward · 2013-11-04 02:45 · Score: 0

eh - more like allowing them to work outside the (rather small) window of reliability. Better reliability in high-radiation environments, like outer space, would be a good example.
Re:Hmmm ... by Desler · 2013-11-04 02:47 · Score: 1

Next few years? More like a few decades or more. Drivers, firmware, microcode, etc. have always contained software workarounds to hardware bugs. This is nothing new.
Re: Hmmm ... by fizzer06 · 2013-11-04 02:56 · Score: 1

I haven't accepted bad data from the newer hard drives.
Re:Hmmm ... by fast+turtle · 2013-11-04 02:57 · Score: 1

if you access any server remotely then you're already using this - it's called ECC RAM

--
Mod me up/Mod me down: I wont frown as I've no crown
Re:Hmmm ... by Anonymous Coward · 2013-11-04 03:05 · Score: 0

ECC RAM doesn't use software to correct errors.
Re:Hmmm ... by ZeroPly · 2013-11-04 03:19 · Score: 2

Relax, pal - frameworks that don't particularly care about accuracy have been around for years now. If you don't believe me, talk to anyone who uses .NET Framework.

--
Support microSD: in a post 9/11 world, it is unwise to carry your data on media that you cannot comfortably swallow.
Re:Hmmm ... by Anonymous Coward · 2013-11-04 03:27 · Score: 0

But in that case you're depending on reliable hardware to correct unreliable hardware. If the hard drive microcontroller played it fast and loose, you'd be screwed either way.
Re:Hmmm ... by InsightfulPlusTwo · 2013-11-04 03:50 · Score: 1

You don't seem to have read the article. The software is not going to supply extra error correction when the hardware has errors. It's going to allow the programmer to specify code operations that can tolerate more errors, which the compiler can then move to the lower-quality hardware. Some software operations, like audio or video playback, can allow errors and still work OK, which allows you to use lower-energy, lower-quality hardware for those operations. If they did as you suggest, and tried to fix hardware errors in the software, that would cause the software to take more energy to correct the errors and be more complex besides, which would seem to negate the benefits of the new hardware. This is not unprecedented since various applications (audio CDs, hearing aids, etc.) already use a lesser standard of error correction.

--
I felt bad for the man who had no signature, until I met a man who had no comment.
Re:Hmmm ... by Joshua+Fan · 2013-11-04 03:56 · Score: 1

All in preparation for next big thing after that... MORE accurate hardware! 6.24% more!
Re:Hmmm ... by Anonymous Coward · 2013-11-04 04:51 · Score: 0

Then take a USB stick... There you use ECC to correct bad bits... Usually done in software in the embedded microcontroller...
Or if you take basically any router out there... They basically all use NAND flash, since it's cheap due to being shrunk down so much, and that does ECC correction usually in the driver...
Or if you take basically any SSD out there you have the same thing where the firmware takes care of the used ECC...
Doing ECC in hardware can be quite expensive since you can then be locked down to a limited number of chip manufacturers...
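For anyone who hasn't seen ECC done in software, here is a sketch using the textbook Hamming(7,4) code, which corrects any single flipped bit in a 7-bit codeword. This is only the simplest illustrative code; it is an assumption for the example, not a claim about what any particular flash controller or driver actually runs:

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword laid out as p1 p2 d1 p4 d2 d3 d4."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_decode(codeword):
    """Correct up to one flipped bit; return (data_bits, 1-based error position or 0)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s4  # the syndrome names the bad position
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], error_position

word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1  # simulate a single bad bit read back from storage
print(hamming74_decode(word))  # ([1, 0, 1, 1], 5)
```

The cost is three extra bits per four data bits plus a handful of XORs on every read, which is the kind of overhead being traded against cheaper, denser cells.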
Re:Hmmm ... by fizzer06 · 2013-11-04 05:13 · Score: 1

frameworks that don't particularly care about accuracy . . . .NET Framework.
Okay, I'll bite. Explain yourself.
Re:Hmmm ... by ZeroPly · 2013-11-04 05:41 · Score: 1

I'm an application deployment guy, not a programmer. Every time we push something that needs .NET Framework, the end users complain about it being hideously slow. Our MS developers of course want everyone to have a Core i7 machine with 64GB RAM and SSD hard drive - to which I reply "learn how to write some fucking code without seven layers of frameworks and abstraction layers".

Then of course, I can never get a straight answer from the developers on which .NET to install. Do you want 4, 3.5 SP1, 2? The usual answer is "load all of them". I get that .NET Framework is great in theory, but if you have to deal with the actual implementation, you'll see things differently. A lot of times we'll get screen glitches which the devs are convinced are an MS issue, but there's no available fix, so we go with "that's not a serious enough problem to fix".

On the other side of the fence are the Linux apps I have to deploy. The Linux devs send me a .DEB file. I generally have that pushed out the same day.

--
Support microSD: in a post 9/11 world, it is unwise to carry your data on media that you cannot comfortably swallow.
Re:Hmmm ... by K.+S.+Kyosuke · 2013-11-04 05:51 · Score: 1

It uses algorithms to correct errors, instead of simply using more reliable memory cell hardware. I believe that's the point of the comparison, not whether the algorithm runs in software or in hardware.

--
Ezekiel 23:20
Re:Hmmm ... by viperidaenz · 2013-11-04 07:59 · Score: 1

Not 6.24%, 6.26%... or was it 8.24%?
I forget which bit got flipped.
Re:Hmmm ... by viperidaenz · 2013-11-04 08:04 · Score: 2

Our MS developers of course want everyone to have a Core i7 machine with 64GB RAM and SSD hard drive
Do what the company I'm working for has done then.
Give everyone an i7 with 16GB RAM and an SSD.
Except they run Windows 7 32bit, so we can only use 4GB of that (and PAE is disabled on Win7 32bit), and the SSD is the D: drive, not the system drive so when everything does page, it slows to a crawl.
Re:Hmmm ... by Anonymous Coward · 2013-11-04 15:29 · Score: 0

And that is relevant to a discussion on accuracy, how?

I for one welcome by Anonymous Coward · 2013-11-04 02:40 · Score: 0

...Our new snowy-screen overlords. I was just thinking how much I liked the good old days where if the TV was flaking out, you could just give the set a good whack and it would get its act together.

Seriously, why do we want to do this? Is power usage going to cut in half? Are yields going to double? I think it's nice to talk about (especially in the interest of making systems that go "kinda bad" instead of completely breaking), but why not just invest the time/effort into fixing the issues directly? Have we run out of ideas on how to do that?

Re:I for one welcome by alexander_686 · 2013-11-04 03:34 · Score: 1

Seriously, why do we want to do this? Is power usage going to cut in half?
Yes. Well, about in 1/2. Think about signal processors and cell phones. Would you accept a 5% reduction in voice quality for a doubling of your talk time?
Re:I for one welcome by Anonymous Coward · 2013-11-04 03:44 · Score: 0

No. Because I can hardly understand people on the phone already. Anyway, the solution there is custom ASICs to do encoding/decoding, not non-deterministic software.
Re:I for one welcome by viperidaenz · 2013-11-04 07:58 · Score: 1

Except the battery drain in talk-time is mostly the radio, not the CPU.
The battery drain while using it is mostly the screen backlight.
So cutting in half the power consumption of something contributing an almost insignificant amount of power is going to do not much.

Huh? by Desler · 2013-11-04 02:41 · Score: 1

but relaxing the requirement of perfect decoding could yield gains in speed or energy efficiency."

Which you could already get now simply by not doing error correction. No need for some other programming framework to get this.

Re:Huh? by SJHillman · 2013-11-04 03:08 · Score: 1

It's not so much about skipping error correction as it is saying when you can skip error correction. If 5 pixels are decoded improperly, fuck it, just keep going. However, if 500 pixels are decoded improperly, then maybe it's time to fix that.
Re:Huh? by Desler · 2013-11-04 03:28 · Score: 1

And as I said you can do that already.
Re:Huh? by MightyYar · 2013-11-04 03:41 · Score: 1

Really? You can tell your phone/PC/laptop/whatever to run the graphics chip at an unreliably low voltage on demand?

--
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
Re:Huh? by HybridST · 2013-11-04 04:14 · Score: 1

For PC and laptop, yes I can.
Overclocking utilities can also underclock, and near the lower stability threshold of the graphics frequency I often do see a few pixels out of whack. Not enough to crash, but artifacts definitely appear. A MHz or 2 higher clock clears them up though.
I have a dumb phone so reclocking it isn't necessary.

--
Ever notice that Cobra Commander sounds an awful lot like Star scream?
Re:Huh? by MightyYar · 2013-11-04 04:20 · Score: 1

So you've done this yourself and you still don't see the utility in doing it at the application level rather than the system level?

--
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
Re:Huh? by HybridST · 2013-11-04 05:35 · Score: 1

Automating the process would be handy, but not revolutionary. Automating it at the system level makes more sense to me but I'm just a power user.

--
Ever notice that Cobra Commander sounds an awful lot like Star scream?
Re:Huh? by Xrikcus · 2013-11-04 05:37 · Score: 1

When you do it that way you have no control over which computations are inaccurate. There's a lot more you can do if you have some input information from higher levels of the system.
You may be happy that your pixels come out wrong occasionally, but you certainly don't want the memory allocator that controls the data to do the same. The point of this kind of technology (which is becoming common in research at the moment, the MIT link here is a good use of the marketing department) is to be able to control this in a more fine-grained fashion. For example, you could mark the code in the memory allocator as accurate - it must not have errors and so must enable any hardware error correction, might use a core on the platform that operates at a higher voltage, or would add extra software error correction as necessary. At the same time you might allow the visualization code to degrade to reduce overall power consumption, because the visualization code is not mutating any important data structures. Anything it generates is transient and the errors will barely be noticed.
Re:Huh? by MightyYar · 2013-11-04 05:49 · Score: 1

I don't think it is revolutionary, either... it's just a framework, after all. I was imagining a use where you have some super-low-power device out in the woods sampling temperatures, only firing itself up to "reliable" when it needs to send out data or something. Or a smartphone media app that lets the user choose between high video/audio quality and better battery life. Yeah, they could have already done this with some custom driver or something, but presumably having an existing framework would make it easier, less apt to conflict, and more standard.

--
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.

Or we could just use java by Anonymous Coward · 2013-11-04 02:45 · Score: 0

Or we could just use java, with its "almost" IEEE complete libraries. I mean who really needs a perfect answer anyway?

Does that register "really" need to contain that value? How about any value?

Does the stack pointer "really" need to be there?

Does the password really need to match? It's just a hash anyway, what are a few bits of uncertainty?

Does the packet "really" need to get sent?

does the CRC "really" have to match..

These functions will make great improvements to Java.

Re:Or we could just use java by viperidaenz · 2013-11-04 07:54 · Score: 1

Or we could just use java, with its "almost" IEEE complete libraries
That's a design feature and what strictfp is for. It's not Sun's fault all the different CPUs Java code can run on implement floating point hardware differently. The only other option is to emulate it in software.
It's a pity nothing you mentioned has anything to do with Java not guaranteeing floating point operations.

viewers probably won't notice? by Anonymous Coward · 2013-11-04 02:46 · Score: 0

if few pixels in each frame of a high-definition video are improperly decoded, viewers probably won't notice

We used to return those LCD monitors back to the store.

Re:viewers probably won't notice? by Desler · 2013-11-04 02:49 · Score: 1

You confuse what that sentence is talking about. They aren't talking about stuck pixels on an LCD. It's talking about not spending time doing extensive error correction/masking when a few pixels in the video are corrupted and thus will be decoded with some level of artifacting.
Re:viewers probably won't notice? by SJHillman · 2013-11-04 03:09 · Score: 1

You must have gone through a lot of monitors before realizing this has nothing to do with dead pixels on a display.
Re:viewers probably won't notice? by viperidaenz · 2013-11-04 07:50 · Score: 1

A stuck pixel is still just an unreliable transistor...
Re: viewers probably won't notice? by Anonymous Coward · 2013-11-04 08:01 · Score: 0

Haha, sucker, you can't anymore - now your monitor isn't "defective" unless it contains a large number of defective pixels (like 10). The store may let you exchange it for another one, if they're feeling generous, but it's not considered defective.

"A few pixels incorrectly decoded"... by gnasher719 · 2013-11-04 02:47 · Score: 1

h.264 relies heavily on the pixels in all previous frames. Incorrectly decoded pixels will be visible on many frames that are following. What's worse, they will start moving around and spreading.
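A toy model of why one bad value lingers: the sketch below is not H.264, just a hypothetical one-pixel stream where "I" entries carry a full value and "P" entries carry a delta from the previous frame. It leaves out motion compensation, so it shows the error persisting until the next keyframe but not the moving and spreading part:

```python
def decode_stream(frames):
    """Decode a toy stream: ('I', value) sets the pixel, ('P', delta) adds to the previous one."""
    decoded, current = [], 0
    for kind, payload in frames:
        current = payload if kind == "I" else current + payload
        decoded.append(current)
    return decoded

stream = [("I", 100), ("P", 2), ("P", -1), ("P", 3), ("I", 110), ("P", 1)]
corrupted = list(stream)
corrupted[1] = ("P", 2 + 16)  # a single flipped bit in one delta

clean = decode_stream(stream)
damaged = decode_stream(corrupted)
errors = [d - c for c, d in zip(clean, damaged)]
print(errors)  # [0, 16, 16, 16, 0, 0]
```

The single bad delta contaminates every predicted frame after it, and the damage only clears when the next full frame arrives.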

Re:"A few pixels incorrectly decoded"... by Desler · 2013-11-04 02:51 · Score: 1

Not always true. There are cases where corrupted macroblocks will only cause artifacts in a single frame and won't necessarily cause further decoding corruption.
Re:"A few pixels incorrectly decoded"... by Anonymous Coward · 2013-11-04 03:01 · Score: 0

Incorrectly decoded pixels will be visible on many frames that are following
Hardly 'many', only a few seconds worth.
Re:"A few pixels incorrectly decoded"... by SJHillman · 2013-11-04 03:11 · Score: 1

So what you're saying is that the pixels are alive, and growing! I smell a SyFy movie of the week in the works.
Re:"A few pixels incorrectly decoded"... by Anonymous Coward · 2013-11-04 03:12 · Score: 0

How many frames in a few seconds? For acceptable frame rates that's many enough for me.
And no, I don't think 24 fps is enough. Especially in this day and age.
Re:"A few pixels incorrectly decoded"... by gigaherz · 2013-11-04 03:17 · Score: 1

You missed the point. This is a framework for writing code that KNOWS about unreliable bits. The whole idea is that it lets you write algorithms that can tell the compiler where it's acceptable to have a few erroneous bits, and where it isn't. No one said it would apply to EXISTING code...
Re:"A few pixels incorrectly decoded"... by Anonymous Coward · 2013-11-04 03:39 · Score: 0

So what you're saying is that the pixels are alive, and growing! I smell a SyFy movie of the week in the works.
The notion of living and evolving pixels is indeed very recent.
Re:"A few pixels incorrectly decoded"... by SuricouRaven · 2013-11-04 03:45 · Score: 1

24fps? Depends on content. It's too high for landscapes, establishing shots, talking heads and presentations. Yet too low for high-action scenes and sports. It's a happy medium.
If you don't like it, try to get variable frame rate support more established. Then everyone is happy.
Re:"A few pixels incorrectly decoded"... by MightyYar · 2013-11-04 03:46 · Score: 1

So then for you, the compromise in this particular example would be that you would crank up the power a bit and make the pixels all perfect. Other people without such good eyes could crank down the power and get more battery life.

--
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
Re:"A few pixels incorrectly decoded"... by Anonymous Coward · 2013-11-04 04:51 · Score: 0

The idea is sound though...
So let's say you decode a pixel and end up with red 255, green 200 and blue 150, but it should be red 254, green 200 and blue 150.
The question is whether that is acceptable. For viewing, maybe. Would have to try it, but it *may* create a warble in that pixel, especially if the value was borderline and flicked back and forth between 254 and 255. Would depend on the frequency of the I-frames.
But say maybe an audio decode. Would you be able to hear an off by 1 bit difference? Probably not.
The idea is that accuracy costs time and power. If it were less accurate you may be able to lower both of those. Just so long as you were in the tolerance. I think the entire NTSC spec is basically that :)
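To put a number on the off-by-one pixel above, one common yardstick is PSNR (peak signal-to-noise ratio). The sketch below applies it to just the three channel values from that example, which is an assumption made for illustration; real measurements are taken over whole frames:

```python
import math

def psnr(reference, decoded, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)

reference = [254, 200, 150]  # what the pixel should have been (R, G, B)
decoded = [255, 200, 150]    # what the sloppy decode produced
print(f"{psnr(reference, decoded):.1f} dB")
```

This comes out at roughly 53 dB. Whether that is visible still depends on the flicker concern raised above, which a single-frame metric like this does not capture.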
Re:"A few pixels incorrectly decoded"... by viperidaenz · 2013-11-04 07:47 · Score: 1

So why not just add more instructions, for doing faster but less accurate calculations? 24bit operations for RGB values, for example.

How on earth by dmatos · 2013-11-04 02:48 · Score: 4, Insightful

are they going to make "unreliable transistors" that, upon failure, simply decode a pixel incorrectly, rather than, oh, I don't know, branching the program to an unspecified memory address in the middle of nowhere and borking everything.

They'd have to completely re-architect whatever chip is doing the calculations. You'd need three classes of "data" - instructions, important data (branch addresses, etc), and unimportant data. Only one of these could be run on unreliable transistors.

I can't imagine a way of doing that where the overhead takes less time than actually using decent transistors in the first place.

Oh, wait. It's a software lab that's doing this. Never mind, they're not thinking about the hardware at all.

--

It may look like I'm doing nothing, but I'm actively waiting for my problems to go away.
--Scott Adams

Re:How on earth by Anonymous Coward · 2013-11-04 02:58 · Score: 0

Unreliable computing is like NoSQL, a nice step backwards, allowing quantity to win over quality.
What is unimportant data? If MPEG used all IFRAME packets, it might make sense, but a blown bit will just propagate until the next IFRAME comes around in the sequence, and this would be at best distracting and at worst render a movie unwatchable.
Already, look at software. It is unreliable as it is. Hardware is where one needs to know every time you stick in 2+2, you get 4, even for large values of 4. We don't need another part of the computing stack where corners are cut.
Re:How on earth by bestdealex · 2013-11-04 03:00 · Score: 1

Where are my mod points when I need them?! This is exactly my sentiment as well. Even the simple processing required to check if the data output is correct or within bounds will be staggering compared to simply letting it pass.

--
If you can't convince them, confuse them!
Re:How on earth by Anonymous Coward · 2013-11-04 03:03 · Score: 0

Oh, wait. It's a software lab that's doing this. Never mind, they're not thinking about the hardware at all.
Doesn't mean that the research will be completely useless.
Unreliable results sounds like something you would get from quantum computers.
For super scalar CPUs it could also be of interest to add an extra line of floating point operations implemented with analog circuits. Most floating point operations don't need an exact result anyway; allowing the CPU to parallelize the operations where possible could have boosted performance slightly if the problem wasn't cache-misses.
Re:How on earth by Anonymous Coward · 2013-11-04 03:09 · Score: 0

something like harvard architecture with different "reliability"
Re:How on earth by Anonymous Coward · 2013-11-04 03:12 · Score: 0

What they mean by unreliable transistors is that as transistors become smaller, more unexpected things happen (such as electron tunneling) that can cause an incorrect result. The number of these unexpected things will vary transistor to transistor.
So while the transistor might give you the correct action 95% of the time, that other 5% is an issue. They are saying rather than having to redo the entire chip we can keep the chip and check and correct the incorrect issues in software. If the problem happens when it isn't tolerable to have an error we send it back and do it again, but if the error is causing something minor like a single pixel in a movie to be colored wrong for a frame, we can accept it and not have to take the time and energy to reprocess for a minor error.
Re:How on earth by Anonymous Coward · 2013-11-04 03:12 · Score: 0

Algorithms that operate on floating point data may not need exact results, but they sure as hell do not need results randomly drawn from an unknown distribution that may or may not violate invariants such as x^2 >= 0.
Re:How on earth by Anonymous Coward · 2013-11-04 03:13 · Score: 0

Well, let's not be too hasty - This is pretty much how any modern hardware video decoder chip works. The logic is "fixed function", that is, there aren't any conditional branches in the VHDL code which use the value of input data to decide which path to take.
For a task such as video decoding, indeed, this is quite ideal. After all, you know that you have some input data which turns into some output data of a certain bounded length, dimensionality, and structure. If there is an error in the input data stream, it will simply decode it "wrong" until the next frame marker / resync point.
That frame will look garbled, but the decode logic won't "crash", simply because it doesn't do any unbounded operations depending on the input (like allocating 'N' bytes of memory or looping 'N' times).
Re:How on earth by Anonymous Coward · 2013-11-04 03:18 · Score: 0

They're not proposing bad RAM. In a CPU, different tasks are performed by different parts of the CPU. Instruction and address decoding are not done by the same transistors that do arithmetic calculations, for example. You could make an unreliable floating point unit and not affect the program flow in any way (except where it depends on results which you'd then know to be unreliable). The simple things that guide program flow (comparisons, address calculations, etc.) are fast and efficient. Calculating the root of a floating point number for example is expensive in comparison. There are many applications where the exact result doesn't matter, but speed and power consumption are important. Unless you've given a lot of thought to the intricacies of floating point math, chances are that your application doesn't really rely on the full precision guaranteed by the hardware specification. A big class of CPU bugs consists of so-called speed-paths, where a part of the CPU expects a calculation in a different part of the CPU to be complete before it has actually completed. If you arrange the calculations such that incomplete calculations just lose precision and don't give wildly different results, then you don't need to wait the maximum time every time. Instead you can wait only long enough for most calculations to complete and take the less precise "speed path" result for the remaining calculations. This speeds up all calculations at the cost of making a few minor mistakes, which are not a problem if the software can handle these mistakes (or if they don't even matter in the first place).
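A software analogy for "incomplete calculations just lose precision": an iterative square root that is cut off after a fixed number of steps still returns a usable, merely less precise, answer. This is only an analogy under that assumption, since the hardware speed paths described above are about circuit timing rather than loop counts:

```python
import math

def newton_sqrt(x, iterations):
    """Approximate sqrt(x) for x > 0 with a fixed number of Newton steps."""
    guess = x if x >= 1 else 1.0
    for _ in range(iterations):
        guess = 0.5 * (guess + x / guess)
    return guess

# Stopping early costs precision, but the answer stays in the right neighbourhood.
for steps in (1, 2, 3, 4):
    approx = newton_sqrt(2.0, steps)
    print(steps, approx, abs(approx - math.sqrt(2.0)))
```

Each extra step roughly squares away the remaining error, so a caller that can live with three or four digits can stop far earlier than one that needs full double precision.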
Re:How on earth by gigaherz · 2013-11-04 03:23 · Score: 1

This was in slashdot years ago. I can't find the slashdot link, but I did find this one. The idea is that you design a cpu focusing the reliability in the more significant bits, while you allow the least significant bits to be wrong more often. The errors will be centered around the right values (and tend to average into them), so if you write code that is aware of that fact, you can teach it to compensate for the wrong values. Of course this is not acceptable for certain kinds of software, but for things like multimedia processing, a small % error in the result wouldn't be appreciable, and over time, the image should keep averaging out the old errors while introducing new ones, assuming the software is designed for it.
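A quick simulation of that design, assuming a hypothetical model where only the two lowest bits of an 8-bit value can flip, each with 10% probability. It shows the part that is easy to guarantee: the error is bounded by the weight of the unreliable bits. Whether the errors also average out to zero depends on the error model, and with plain bit flips they can be biased:

```python
import random

def flip_low_bits(value, unreliable_bits, flip_probability, rng):
    """Flip each of the lowest `unreliable_bits` bits independently with the given probability."""
    for bit in range(unreliable_bits):
        if rng.random() < flip_probability:
            value ^= 1 << bit
    return value

rng = random.Random(0)
true_value = 0b10110100  # 180
noisy = [flip_low_bits(true_value, 2, 0.1, rng) for _ in range(10_000)]
worst_error = max(abs(v - true_value) for v in noisy)
mean_error = sum(v - true_value for v in noisy) / len(noisy)
print("worst error:", worst_error)  # never more than 3 with two unreliable bits
print("mean error:", mean_error)
```

For a brightness value of 180, being off by at most 3 is a small percentage error, which is the argument for spending the reliability budget on the significant bits.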
Re:How on earth by MightyYar · 2013-11-04 03:50 · Score: 1

Doesn't that depend on the application? What if I'm simply updating a position based upon an already noisy sensor? I already have a bunch of code to throw out crappy results. I'm taking lots of samples, so as long as most of my measurements are accurate, it's all good. Obviously I can't tolerate a random error in every single cycle, but maybe 1 in a million is OK and lets me run at a lower voltage.
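Something like the following is what that could look like, as a minimal Python sketch. The sensor model (Gaussian noise plus a rare wild value standing in for a hardware error) and the names `read_sensor` and `robust_position` are assumptions for illustration; the glitch rate is set much higher than 1 in a million so the effect is visible.

```python
import random
import statistics

def read_sensor(true_pos, rng, glitch_prob=1e-3):
    """Hypothetical sensor read: Gaussian noise plus an occasional wild value."""
    if rng.random() < glitch_prob:
        return rng.uniform(-1e6, 1e6)  # stand-in for a computation error
    return true_pos + rng.gauss(0.0, 0.5)

def robust_position(samples):
    """The median ignores a minority of wild samples."""
    return statistics.median(samples)

if __name__ == "__main__":
    rng = random.Random(1)
    samples = [read_sensor(10.0, rng, glitch_prob=0.01) for _ in range(1001)]
    print("mean:  ", statistics.mean(samples))
    print("median:", robust_position(samples))
```

The plain mean can be dragged far off by one bad sample, while the median stays close to the true position as long as most samples are good.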

--
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
Re:How on earth by Anonymous Coward · 2013-11-04 03:52 · Score: 0

This is not magic or novel. It is just extension of stuff already used in the GPU world and available in OpenCL.
http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/exp.html
see the function for exp? half_exp? native_exp? That is what you can optimize - floating point operations mostly.
Re:How on earth by Anonymous Coward · 2013-11-04 04:15 · Score: 0

Why did this get so many mod points when they clearly have no idea what they're talking about? This whole post is basically saying "I have no idea how this works therefore it's wrong! Nothing must exist that is outside my understanding!"
Re:How on earth by Warbothong · 2013-11-04 04:34 · Score: 1

How on earth are they going to make "unreliable transistors" that, upon failure, simply decode a pixel incorrectly, rather than, oh, I don't know, branching the program to an unspecified memory address in the middle of nowhere and borking everything.
Very easily: the developer specifies that pixel values can tolerate errors but that branch conditions/memory addresses can't. If you'd bothered to read the summary, you'd see it says exactly that:

a new programming framework that enables software developers to specify when errors may be tolerable.

They'd have to completely re-architect whatever chip is doing the calculations.
Erm, that's the whole point. If we allowed high error rates with existing architectures, none of our results would be trustworthy. I imagine the most practical approach would be a fast, low-power but error-prone co-processor living alongside the main, low-error processor. This could be programmed just like GPUs are at the moment. The nice thing about this work is that the separation can be largely transparent; just annotate your programs and the compiler will figure out which parts can be offloaded to the co-processor.

I can't imagine a way of doing that where the overhead takes less time than actually using decent transistors in the first place.
As far as I can tell there is no overhead involved. In fact it's the other way around: calculating exact answers (as we do now) is a perfectly acceptable way to execute an error-tolerant program. The opposite is not true though: an error-intolerant program cannot be executed with errors. Since we're strictly increasing the execution strategies available, we can only ever increase efficiency (since we can choose to ignore the new strategies).
Re:How on earth by Anonymous Coward · 2013-11-04 04:47 · Score: 0

Does this edge case outweigh 6?
1) A shorter-running (or more parallel) approximation to a correct algorithm can be encoded with different instructions
2) The longer-running instructions still exist
3) The code is marked up (pragma or hand-written?) to use the specific instructions that can produce a random error
4) You know the probability distribution of the error, and it is consistent
5) All parts of your application are written to tolerate the error, and are provably free of bugs or assumption violations stemming from it.
6) You believe this is "better" for power usage than e.g. simpler silicon on the sensor to smooth/average out its samples itself.
Re:How on earth by tlhIngan · 2013-11-04 04:58 · Score: 1

are they going to make "unreliable transistors" that, upon failure, simply decode a pixel incorrectly, rather than, oh, I don't know, branching the program to an unspecified memory address in the middle of nowhere and borking everything.
They'd have to completely re-architect whatever chip is doing the calculations. You'd need three classes of "data" - instructions, important data (branch addresses, etc), and unimportant data. Only one of these could be run on unreliable transistors.
I can't imagine a way of doing that where the overhead takes less time than actually using decent transistors in the first place.
Oh, wait. It's a software lab that's doing this. Never mind, they're not thinking about the hardware at all.
More properly, the language takes care of it.
You declare variables to be "approximate" - where errors are tolerated and you can use lower power hardware to do it (it turns out reliability means having to use higher voltages which raise power consumption, and lower clock speeds which keeps cores powered up longer rather than race them to sleep as fast as possible).
So a counter would be "exact" and have to use the high-powered reliable hardware mode, while the pixel data will be inexact and use low power mode. Even a counter that iterates over the pixel array has to be exact.
And you can easily transition from exact data to inexact data, but transitions back are limited and explicit - you can't test inexact values - you have to promote the inexact data (because there will always be times when you need to deal with it).
Of course, it's a new programming language because existing ones model reliable systems.
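To make the exact/approximate split concrete, here is a toy Python sketch. It is not the framework's actual syntax; the `Approx` wrapper and the `endorse` function are hypothetical names chosen for illustration, and nothing here runs on unreliable hardware. It only shows the rule that approximate values cannot steer control flow until they are explicitly promoted.

```python
class Approx:
    """Marks a value as tolerating errors (illustrative wrapper only)."""

    def __init__(self, value):
        self._value = value

    def __add__(self, other):
        other_value = other._value if isinstance(other, Approx) else other
        return Approx(self._value + other_value)

    __radd__ = __add__  # exact + approximate is still approximate

    def __lt__(self, other):
        other_value = other._value if isinstance(other, Approx) else other
        return Approx(self._value < other_value)

    def __gt__(self, other):
        other_value = other._value if isinstance(other, Approx) else other
        return Approx(self._value > other_value)

    def __bool__(self):
        # Branching on an approximate value is rejected outright.
        raise TypeError("cannot branch on an approximate value; endorse it first")

def endorse(approx):
    """Explicit, visible promotion from approximate back to exact."""
    return approx._value

if __name__ == "__main__":
    pixels = [Approx(p) for p in (10, 20, 30)]
    total = Approx(0)
    for i in range(len(pixels)):  # the counter is an ordinary exact int
        total = total + pixels[i]
    print(endorse(total) / len(pixels))
```

Writing `if total > 50:` raises a `TypeError` in this sketch, while `if endorse(total) > 50:` works, which mirrors the "transitions back are limited and explicit" rule.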
Re:How on earth by bluefoxlucid · 2013-11-04 07:06 · Score: 3, Insightful

Erm, that's the whole point. If we allowed high error rates with existing architectures, none of our results would be trustworthy. I imagine the most practical approach would be a fast, low-power but error-prone co-processor living alongside the main, low-error processor.
Or you know, the thing from 5000 years ago where we used 3 CPUs (we could on-package ALU this shit today) all running at high speeds and looking for 2 that get the same result and accepting that result. It's called MISD architecture.
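The voting scheme (often called triple modular redundancy) can be sketched in a few lines of Python. `vote` and `run_redundant` are hypothetical helper names, and calling the same function three times merely stands in for three separate hardware units.

```python
from collections import Counter

def vote(results):
    """Return the value reported by a strict majority of units, else raise."""
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError("no majority among redundant results")
    return value

def run_redundant(func, args, units=3):
    """Run the same computation on several (simulated) units and vote."""
    return vote([func(*args) for _ in range(units)])

if __name__ == "__main__":
    print(vote([4, 4, 5]))                       # one unit disagrees
    print(run_redundant(lambda a, b: a + b, (2, 2)))
```

With three units this masks any single faulty result; if all three disagree there is nothing to accept and the sketch raises instead.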

--
Support my political activism on Patreon.
Re:How on earth by viperidaenz · 2013-11-04 07:38 · Score: 1

A big class of CPU bugs consists of so-called speed-paths, where a part of the CPU expects a calculation in a different part of the CPU to be complete before it has actually completed
Care to expand on that? This is not a typical race condition. What you're describing is a CPU not ordering instructions as expected - not doing its primary purpose.
Re:How on earth by viperidaenz · 2013-11-04 07:44 · Score: 1

2+2=5 for large values of 2.
When you're performing calculations, you need to know where and how rounding takes place if everything isn't an integer.
Re:How on earth by Darinbob · 2013-11-04 07:54 · Score: 1

It seems a bit strange to me also. Didn't read all of the article, but a few pixels wrong is extremely minor and very lucky. One wrong bit is far more likely to crash your computer than to make a pixel be incorrect. What about the CPU? Are we so media obsessed now that getting the pixels wrong is considered a major error but we completely ignore all the serious errors that could result? We'd need redundant transistors to monitor everything, making sure that the CPU registers have the correct values, that addition is performed correctly, that all memory values have not been corrupted. And at that point the redundant transistors are eliminating the gain achieved by making transistors smaller.
A software solution here would have to ultimately come down to the machine language level. I.e., some add operators are tagged as error tolerant but others tagged as crucial, so they go to different ALUs (the cheap ass one versus one with redundancy). Every single branch would still have to go to the highest quality ALU though.
Re:How on earth by Anonymous Coward · 2013-11-04 08:16 · Score: 0

CPUs are analog devices. Signals take time to propagate, transistors don't switch immediately, rise times depend on how many loads a transistor is currently driving, etc. Suppose you take a couple of logic gates and build an adder. The sum of the inputs doesn't appear at the output pins immediately. It takes a short but non-negligible time for the transistors to switch and the signals to propagate, and this time depends on the environment, the values that you're adding, manufacturing tolerances and more. If you want to use that adder in a clocked circuit, you have to make sure that whatever needs to use the result waits long enough for it to appear.
To know how fast a CPU can clock, CPU makers identify speed paths, which are the calculations where errors occur first as the clock frequency goes up. Sometimes there are unidentified speed paths and those lead to CPUs being binned for a higher clock speed than they can actually support in all situations and under all circumstances. If that happens, then there are calculations which cause the CPU to work with bad data because the voltage levels which represent the final result from one part of the CPU, say the ALU, don't arrive in time before another part of the CPU tries to read the result.
It may be beneficial to allow speed path errors (and other errors) if the CPU is designed to calculate correctly most of the time and to give almost correct results otherwise.
Re:How on earth by able1234au · 2013-11-04 11:25 · Score: 1

This is the better approach, but I wonder if there is a saving with 3 dodgy processors over 1 good processor. I guess if the yield falls below one third then it might. But power requirements may triple, so it's hard to see the saving.
Re:How on earth by jouassou · 2013-11-04 14:21 · Score: 1

I can imagine a couple of applications of these transistors though...
Many numerical simulations require repeated random sampling of some process, and then combine the results in the end. If you're averaging some billion simulations, the result should be quite robust to fluctuations in the results of each simulation. Thus it might well be worth it to use 10 billion unreliable transistors instead of 1 billion reliable transistors, if they cost the same.
Another application could be to generate random numbers. Let's say that you have a pseudorandom number generator with periodicity N, and your unreliable transistors make the algorithm do a random jump after an average of N/100 numbers. Wouldn't that be "random enough" for more applications than just the pseudorandom number generator itself?
Re:How on earth by bluefoxlucid · 2013-11-05 03:31 · Score: 1

Power requirements actually increase superlinearly. DDR RAM uses a serializer, for example, so that you run the RAM at 100MHz but fetch multiple bytes into a buffer and output that across your FSB. This is because running the RAM at 100MHz takes N power, while running at 200MHz takes N^2 power or something ridiculously bigger than 2N.

--
Support my political activism on Patreon.

unrelliable is not really useful by Anonymous Coward · 2013-11-04 03:00 · Score: 0

Yeah we really want those "almost working" machines:
- flying planes
- controlling infrastructure
- running financial transactions
- doing medical inference

Well they would probably be alright in mobile devices, except
- when authorizing transactions
- doing secure communications

Well they would be fine for Angry Birds.

Re:unrelliable is not really useful by somersault · 2013-11-04 03:35 · Score: 1

What exactly led you to believe that anyone is wanting to use this concept in situations where 100% reliability is required?

--
which is totally what she said
Re:unrelliable is not really useful by Anonymous Coward · 2013-11-04 03:42 · Score: 0

Not really. Didn't Samsung get slammed in China for selling truckloads of "almost working" memory in their mobile phones? The result wasn't a few corrupted pixels in Angry Birds. It resulted in a few corrupted officials getting off their asses after too many Angry Customers couldn't stand their phones locking up several times a day anymore while Samsung claimed "almost working" should be good enough for everybody :)
Re:unrelliable is not really useful by Anonymous Coward · 2013-11-04 07:51 · Score: 0

So what? No one is putting this technology to places where accurate results are important.

Chicken and the Egg. by jellomizer · 2013-11-04 03:17 · Score: 3, Informative

We need software to design hardware to make software...

In short it is about better adjusting your tolerance levels on individual features.
I want my integer arithmetic to be perfect, and my floating point good to 8 decimal places. But there are components meant for interfacing with humans. With audio, so much is altered or lost due to differences in speaker quality, even top-notch ones with gold (or whatever crazy stuff) cables. So in your digital-to-audio conversion, you may be fine if a voltage is a bit off, or you skipped a random change, as the smoothing mechanism will often hide that little mistake.

Now for displays... We need to be pixel perfect when we have screens with little movement. But if we are watching a movie, a pixel color #8F6314 can be #A07310 for 1/60th of a second and we wouldn't notice it. And most displays are not even high enough quality to show these differences.

We hear of these errors and think how horrible it is that we are not getting perfect products... However, it is more about the trade-off of getting smaller and faster with a few more glitches.

--
If something is so important that you feel the need to post it on the internet... It probably isn't that important.

Re:Chicken and the Egg. by CastrTroy · 2013-11-04 03:32 · Score: 2

Yeah, but you could save just as much power (I'm guessing) with dedicated hardware decoders, as you could by letting the chips be inaccurate. As chips get smaller it's much more feasible to have hardware-specific chips for just about everything. The ARM chips in phones and tablets have all kinds of specialized hardware, some for decoding video and audio, others for doing encryption and other things that are usually costly for a general purpose processor. Plus it's a lot easier for the developer to not have to consider how inaccurate stuff can be, and just write code as though things are actually going to be correct. Even programming with binary floating point numbers is problematic enough, as there are many decimal fractions that can't be exactly represented.

--

Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
Re:Chicken and the Egg. by Anonymous Coward · 2013-11-04 04:43 · Score: 0

Silicon costs money... If you can shrink the die even more then it's a cost saving... So if you can make the dedicated video decoder take up 50% less space, at the cost of a few pixels getting the average or same color as the pixel(s) beside them, it would be a huge cost saving for a minimal reduction in quality... Also, things like motion estimation need some calculations, so if they are off by a percent or two for a couple of frames it will not cause any big visual artifacts...
You can also add in things like "This part only needs this much accuracy for these arithmetic functions so use some estimation with high enough accuracy"... Or maybe even "sin/cos should generate a circle, we don't care if the functions cause the height or width to be off by 1-2% as long as it's predictable and always performs the same on the whole batch"...
But the main thing here is that if they can shrink the isolation between components on the wafer, the chip can shrink quite a bit, but at the same time it will cause more glitches... This is what this is for, as far as I understood the summary... (Hell no, I will never read the actual article :)
Re:Chicken and the Egg. by Dahamma · 2013-11-04 07:21 · Score: 1

Yeah, but you could save just as much power (I'm guessing) with dedicated hardware decoders, as you could by letting the chips be inaccurate.
Eh, a dedicated hardware decoder is still made out of silicon. That's the point, make chips that perform tasks like that (or other things pushing lots of data that is only relevant for a short period, like GPUs - GPUs used only for gfx and not computation, at least) tolerate some error, so that they can use even less power. No one is yet suggesting we make general purpose CPUs in today's architectures unreliable :)
Re:Chicken and the Egg. by CastrTroy · 2013-11-04 09:35 · Score: 1

Yeah, but that's not something the application level software developer has to account for. They just use OpenGL, or DirectX, and the chip and video card driver decides how to execute it and render it. Actually, with some graphics cards, and driver implementations, they basically do this already, by rendering the image incorrectly, it speeds up the result, and they hope nobody notices. Basically, if any error is acceptable when programming against certain hardware, it should just be handled at the API level for accessing the hardware. The people programming against the hardware shouldn't have to decide how much, if any, error is acceptable. For instance, If I'm decoding video, I would just pass the encoded stream to a function, and get decoded frames back, or they would be displayed on the screen. In many cases, it might even be user configurable. For some users might be OK for colors to be incorrect in exchange for higher frame rates. However, other users might want the exact opposite experience. Maybe their hardware is already producing enough frames, and they just want a nicer picture.

--

Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
Re:Chicken and the Egg. by Anonymous Coward · 2013-11-04 09:47 · Score: 0

Exactly. Anyone who thinks these errors will be a problem for, say, multimedia, is either an audio/videophile or doesn't understand the concept of lossy compression.
Re:Chicken and the Egg. by Dahamma · 2013-11-04 10:57 · Score: 1

They just use OpenGL, or DirectX, and the chip and video card driver decides how to execute it and render it.
*Real* use of OpenGL and DirectX these days is all about the shaders, which get compiled and run on the GPUs. And even basic ops that are "builtin" to the drivers usually are using shader code internal to the driver (or microcode internal to the hardware/firmware).

The people programming against the hardware shouldn't have to decide how much, if any, error is acceptable.
Absolutely they should, and have been doing so with existing 3D hardware for a long time. It's just been more about 3D rendering shortcuts/heuristics/etc than faulty hardware. It's all about tricking the viewer's eyes and brain to increasing degrees, not reproducing an exactly correctly rendered 3D image... and until everything is raytraced that will continue to be the case.

For instance, If I'm decoding video, I would just pass the encoded stream to a function, and get decoded frames back, or they would be displayed on the screen.
Well, I just finished implementing stereoscopic 3D video playback on the PS4, and I guarantee it's more work than that ;) Libraries are provided to do the low level decoding, but demuxing, decryption, scaling/blitting to framebuffers, compositing with UI elements, audio processing, A/V sync, etc, are largely left up to the application programmer. Even with that, the "hardware decoding" is mostly happening on the GPU or other *programmable* video decoder hardware anyway.
These reasons are precisely why GPUs / "hardware" decoders will likely be the first processors to benefit from frameworks like the one described in the article...

Pentium FDIV by Anonymous Coward · 2013-11-04 03:19 · Score: 0

So it's like the Pentium FDIV bug, where "a little error" wasn't enough reason to recall the processors until they got bashed for it.

Similar Idea to EnerJ Language by MetaDFF · 2013-11-04 03:30 · Score: 3, Interesting

The idea of fault-tolerant computing is similar to the EnerJ programming language being developed at the University of Washington for power savings (see "The Language of Good Enough Computing").

The gist of the idea is that the programmer can specify which variables need to be exact and which variables can be approximate. The approximate variables would then be stored in low-refresh RAM, which is more prone to errors, to save power, while the precise variables would be stored in higher-power memory, which would be error free.

The example they gave was calculating the average shade of grey in a large image of 1000 by 1000 pixels. The running total could be held in an approximate variable since the error incurred by adding one pixel incorrectly out of a million would be small, while the control loop variable would be accurate since you wouldn't want your loop to overflow.
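A small Python simulation of that example, under a deliberately mild assumed fault model: now and then one addition adds a wrong pixel value in 0..255 instead of the real one. This is not EnerJ code, and `average_grey` and `error_prob` are hypothetical names for illustration.

```python
import random

def average_grey(pixels, error_prob=1e-6, seed=0):
    """Average grey level with an error-prone running total (simulated)."""
    rng = random.Random(seed)
    total = 0                         # the variable that would be "approximate"
    for i in range(len(pixels)):      # the loop counter stays exact
        value = pixels[i]
        if rng.random() < error_prob:
            value = rng.randrange(256)  # assumed fault: a wrong pixel is added
        total += value
    return total / len(pixels)

if __name__ == "__main__":
    image = [128] * (1000 * 1000)     # flat grey test image
    print(average_grey(image, error_prob=1e-6))
```

Under this model each fault shifts the average by at most 255 divided by the pixel count, so a handful of faults in a million additions is invisible. A fault that corrupts the stored total itself would behave very differently.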

Re:Similar Idea to EnerJ Language by JesseMcDonald · 2013-11-04 06:58 · Score: 1

The example they gave was calculating the average shade of grey in a large image of 1000 by 1000 pixels. The running total could be held in an approximate variable since the error incurred by adding one pixel incorrectly out of a million would be small...
What makes them think that the kinds of errors you'd get in a variable in low-refresh-rate RAM would be small? Flip the MSB from 1 to 0 and your total is suddenly divided in half. Or, if it's a floating-point variable, flip one bit in the exponent field and your total changes from 1.23232e4 to 1.23232e-124.
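The size of a single bit flip is easy to check in Python. This sketch assumes IEEE 754 doubles and a 32-bit unsigned total; `flip_bit_double` and `flip_bit_uint32` are hypothetical helper names. The exact magnitude of the damage depends on which bit flips.

```python
import struct

def flip_bit_double(value, bit):
    """Flip one bit (0 = least significant) of an IEEE 754 double."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))
    return flipped

def flip_bit_uint32(value, bit):
    """Flip one bit of a 32-bit unsigned integer."""
    return (value ^ (1 << bit)) & 0xFFFFFFFF

if __name__ == "__main__":
    total = 3_000_000_000
    print(total, "->", flip_bit_uint32(total, 31))   # most significant bit
    x = 12323.2
    print(x, "->", flip_bit_double(x, 0))            # lowest mantissa bit
    print(x, "->", flip_bit_double(x, 62))           # highest exponent bit
```

Flipping the lowest mantissa bit changes the value by a negligible amount, while flipping the top exponent bit shrinks it by hundreds of orders of magnitude, which is the objection being raised here.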

--
"The state is that great fiction by which everyone tries to live at the expense of everyone else." - Bastiat
Re:Similar Idea to EnerJ Language by Anonymous Coward · 2013-11-04 08:41 · Score: 0

Or flip all the bits and divide by zero.

french by Spaham · 2013-11-04 03:37 · Score: 1

Am I the only French speaker who thinks that the "Computer Science and Artificial Intelligence Laboratory" sounds like this in French:
CS-AIL ?

Re:french by Anonymous Coward · 2013-11-04 04:03 · Score: 0

Obviously you're the only one.

Seen this done a few times by Anonymous Coward · 2013-11-04 03:39 · Score: 0

Such as using imprecise calculations during an animation sequence moving a UI around since it won't really be worthwhile making it absolutely 100% perfect since it will be moving fast.
If it is something like the UI for a mobile app, extremely worth doing, but if it was a game that required pixel-perfect smooth animation, might be a problem. (mind you, in games, both cases are useful since you wouldn't notice, such as spinning stupidly fast in an FPS, or slowly moving through corridors hunting for people)

Simply put, some things do not require quality to perform, they just need something that more-or-less is accurate to a point.
It wouldn't be hard to imagine such a thing working easily, you just need to provide the system with the correct interfaces to allow software to punt code off to either the "TCP" of CPU or the "UDP" of CPU, to put it another way.
With TCP we want 100% of the data perfect, with UDP we only really care if most of it gets there or is even accurate.

We already use it. So if it could drop prices considerably, hell, even moderately, for some parts of a hardware design, it would be good in the long run.
Mobile applications such as tablets, phones, watches and whatever comes next would benefit greatly from this.
But even fixed applications like desktops would benefit since it would drop prices in general. Only a percentage of hardware would need to be working 100% where some could have failed or be of bad quality. Most hardware already has a yield value, the effective products that survived creation, others are usually gimped and then sold at a cheaper price to make the most use of them.
Designing them with failure in mind could alleviate a lot of headaches and allow more headroom to fix other issues and come up with better designs.

It wouldn't be welcomed very well initially. When people think "inefficient" they'd think "bad", which would be right, but it isn't all bad.
If it adds more frames and less power use with very minimal impact on smooth animation when it is needed, what is to hate?
Not everything requires quality. There are acceptable levels of bad quality. Like JPEG for natural pictures. But not GIF. Screw GIF. And screw people that use GIF for non-animated imagery. Why won't you just unexist?

Unreliable unreliability by Anonymous Coward · 2013-11-04 04:03 · Score: 0

Ok, I have to ask this... If the chip itself is unreliable, then how can you trust software running on said chip to detect errors reliably? Would the unreliability of the chip affect the software as well?

This could make computers more brain-like by Dr.+Spork · 2013-11-04 04:04 · Score: 1

I love this idea, because it reminds me of the most energy efficient signal processing tool in the known universe, the human brain. Give Ken Jennings a granola bar, and he'll seriously challenge Watson, who will be needing several kilowatt-hours to do the same job. Plus, Ken Jennings is a lot more flexible. He can carry on conversations, tie shoes, etc. This is because his central processing unit basically relies on some sort of fault-tolerant software. I think that there will be a lot more applications of a fault-tolerant, energy efficient software strategy, beyond just media decoding. When we get around to asking computers to be creative and apply variously-weighted "rules of thumb", I expect that those operations will run best on systems that sacrifice calculation accuracy for speed and energy efficiency. You gain almost nothing when you apply rough heuristic rules precisely. Let's allow the computers to apply rough rules imprecisely, and reap the speed and energy benefits of the trade.

Re:This could make computers more brain-like by bluefoxlucid · 2013-11-04 07:19 · Score: 1

I love this idea, because it reminds me of the most energy efficient signal processing tool in the known universe, the human brain.
Dumb analogy. Being inaccurate does not make you more intelligent and won't cause emergent behavior.

Give Ken Jennings a granola bar, and he'll seriously challenge Watson, who will be needing several kilowatt-hours to do the same job.
Wrong. Ken Jennings' brain runs on blood sugar, glycogen stored in the liver from previous food (converted into blood sugar by glucagon as blood sugar is consumed for work), fat stored in consolidated fat cells from previous food (converted into blood sugar by lipolysis), and a huge set of neurotransmitters (mainly acetylcholine) stored up by prior processes. Never mind that you get 10% of the energy at each level--the plants convert 2% of the sunlight they collect to energy, which is mainly stored as inaccessible fiber and other structural work (i.e. vitamins, hormones...); herbivores (the normal analogy is 'pork chop' or 'steak') get maybe 10% of that converted energy; you get maybe 10% of the energy input from the herbivores. This is like 0.02% efficiency versus 12% efficient Photovoltaic panels or 38% efficient parabolic solar collectors, not considering the direct inefficiency of Ken Jennings' brain for converting sugar energy into useful work.

Plus, Ken Jennings is a lot more flexible. He can carry on conversations, tie shoes, etc. This is because his central processing unit basically relies on some sort of fault-tolerant software.
No, it's because he has better programming. A gerbil's brain relies on fault tolerant processing, and they can't talk or tie shoes; they can eat and have sex.

I think that there will be a lot more applications of a fault-tolerant, energy efficient software strategy, beyond just media decoding. When we get around to asking computers to be creative and apply variously-weighted "rules of thumb", I expect that those operations will run best on systems that sacrifice calculation accuracy for speed and energy efficiency. You gain almost nothing when you apply rough heuristic rules precisely. Let's allow the computers to apply rough rules imprecisely, and reap the speed and energy benefits of the trade.
Actually that's slow and stupid. This is less effective than taking a working computer with a high clock rate (SOD-CMOS at 394GHz, low-power, accurate) and seeding various inputs with a noise-based RNG (audio-entropyd measures the noise fluctuation on an unused microphone line, for example: this is just spaztastic voltage wobble from EMR).
Stop romanticising the human mind as a result of "lots of imprecise and uniquely organic failings creating something amazing and beautiful". It's a really fucking complex system.

--
Support my political activism on Patreon.
Re:This could make computers more brain-like by Anonymous Coward · 2013-11-04 12:54 · Score: 0

Are you Dwight Schrute?

The end of general-purpose computing? by Anonymous Coward · 2013-11-04 04:06 · Score: 0

For example, if few pixels in each frame of a high-definition video are improperly decoded, viewers probably won't notice

Ok, fine, but that assumes those transistors are dedicated to decoding, and are not used for anything that requires complete accuracy.

In other words, it assumes that we won't be using general-purpose computers in the future.

Remember that the future will be full of encryption and DRM. Those technologies have maximum brittleness -- just one wrong bit will cause large amounts of data to be discarded or blocked.

Re:The end of general-purpose computing? by viperidaenz · 2013-11-04 07:28 · Score: 1

In other words, it assumes that we won't be using general-purpose computers in the future.
Too true. Any transistor that is in the path of calculating anything that ends up as a memory location or an offset to one anywhere has the possibility of crashing the process if you're lucky, or compromising the entire system.

Nonsensical Garbage by Anonymous Coward · 2013-11-04 04:18 · Score: 0

For every transistor that is DIRECTLY connected to a specific software algorithm, even when this algorithm is encapsulated in a hardware block, there are >10 transistors acting in essential support roles, whose malfunction CANNOT be trivially ignored. Either a chip is fabricated so the TOTAL number of detectable errors across the die is vanishing low, or the chip is USELESS.

We already have the power/speed trade-off of lower precision for chains of maths calculations that can withstand the accumulating errors such lowered precision creates. Running a maths block at such a high speed, or low power, that the actions of individual transistors becomes impossible to predict, is self-defeating.

The problem is that cretinous PhDs, people who have remained in academia for all the wrong reasons, get to spew blue-sky papers to justify their existence to their universities, with no regard to the real world. "Hey, my maths is correct" does NOT ensure the quality of a paper. The 'maths' can always be made correct, with no regard to real world, applied issues.

As for error tolerance, well almost every computer is producing errors all the time. Your Windows PC, for instance, is designed to be error tolerant in the sense that most errors do not 'crash' the machine, and are frequently handled invisibly (to the user) in the background. HOWEVER, this does not mean any test is made to ensure the errors are 'harmless'. While your hardware and OS may seemingly continue to function happily, many errors CAN be silently corrupting data or processing on which you rely. It just so happens that with the experience of billions of computers deployed across the decades, such errors have proven to be worth ignoring on average.

AGAIN raising clocks, or lowering power increases the probability of errors, so the PROPER Computer Science method is to use the LEAST work to complete your task, and this includes using numerical analysis to do no more maths than is absolutely necessary. For instance, morons as poorly skilled as those responsible for the paper in the article had MPEG1/2 decoding as a fully floating point process, because "everyone knows floats, especially doubles, are always FAR better than integers". As a consequence, no two MPEG1/2 decode units created exactly the same output from the same input data.

MPEG4, on the other hand, was designed by DECENT mathematicians, and uses Integer decode methods, producing much more correct output, with less energy per unit of maths work (although decoding MPEG4 is more maths intensive than decoding MPEG1/2). Better, every MPEG4 decode unit produces identical results (if coded correctly).
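The reproducibility argument can be shown with a toy example. This is NOT the actual MPEG transform, only a three-term sum, and `SCALE` and `to_fixed` are names invented for the sketch. Floating-point addition is not associative, so two implementations that add the same terms in a different order can disagree in the last bit; integer (fixed-point) addition gives the same answer in any order.

```python
# Floating point: the grouping of the additions changes the result.
values = [0.1, 0.2, 0.3]
float_left = (values[0] + values[1]) + values[2]
float_right = values[0] + (values[1] + values[2])

# Fixed point: scale to integers first (16 fractional bits is an
# arbitrary choice for this illustration), then add as plain integers.
SCALE = 1 << 16


def to_fixed(x: float) -> int:
    """Convert a real value to a fixed-point integer with 16 fractional bits."""
    return round(x * SCALE)


fixed = [to_fixed(v) for v in values]
fixed_left = (fixed[0] + fixed[1]) + fixed[2]
fixed_right = fixed[0] + (fixed[1] + fixed[2])

print("float results identical:", float_left == float_right)
print("fixed results identical:", fixed_left == fixed_right)
```

The fixed-point version pays a one-time quantization error when converting, but after that every conforming implementation produces the same bits.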

No! Unreliability is a feature by Mister+Liberty · 2013-11-04 04:35 · Score: 1

May the best chi(m)p win.

Re:No! Unreliability is a feature by Iniamyen · 2013-11-04 05:26 · Score: 1

The cihps rlealy olny hvae to get the frsit and lsat ltetres corcert. Yuor brian can flil in the rset.
Re:No! Unreliability is a feature by Anonymous Coward · 2013-11-05 03:37 · Score: 0

Look buggy, my brain doesn't have a reset!

Already done by mjr167 · 2013-11-04 04:45 · Score: 1

Doesn't Intel already make a chip that is unreliable?

Re:Already done by CaseCrash · 2013-11-04 08:42 · Score: 1

Well, they did make one that was reliably incorrect.

--
No, that link you posted to a web comic we've all seen a hundred times is not "obligatory."

Oh, GREAT by Iniamyen · 2013-11-04 04:58 · Score: 1

Yeah, let's take away the only thing that computers had going for them - doing exactly what they're told. THAT sounds like a GREAT idea.

How about stop making crap hardware? by Lumpy · 2013-11-04 05:17 · Score: 1

It can be done, we don't have to race for atomic size transistors before we have the technology to make them more reliable.

--
Do not look at laser with remaining good eye.

Chips for unreliable programming... by Alejux · 2013-11-04 07:07 · Score: 1

now that would be world changing!

Decades old news by viperidaenz · 2013-11-04 07:22 · Score: 1

What do you think the artefacts shown on screen are when you overclock your video card too high? Acceptable (sometimes) hardware errors.

And the inexorable decline of humanity continues by EmagGeek · 2013-11-04 07:46 · Score: 1

This is why everything is disposable and nothing works anymore. People are too willing to sacrifice quality and reliability for cost.

These researchers are assholes by Anonymous Coward · 2013-11-04 07:47 · Score: 0

If errors in the low-order bits were economically acceptable, we wouldn't be using high-precision data formats like floats and doubles in the first place. We'd be using 4-bit fixed point or some such BS.

If you look at the history of GPUs, you see the exact opposite trend. The native data types have gotten larger and more precise every generation, because this is actually a very cheap thing to improve.

These assholes are hyping the shit out of a deliberately crippled product nobody asked for. Fuck them. Fuck them all.

Infinite recursion here? by Jorgensen · 2013-11-04 07:53 · Score: 1

So: This assumes that something, somewhere knows which transistors are unreliable. This data needs to be stored somewhere - on the "good" transistors. How is this data obtained? Is there a trustworthy "map" of "unreliable transistors"? And the code that determines the probability has to run on the "good" transistors too. Will those transistors stay good?

I cannot see any way of allowing *any* transistor being unreliable... And based on my (admittedly incomplete) understanding of chip production, *any one* of the transistors on the silicon can be faulty, so there still is a chicken-and-egg problem in here somewhere.

Surely, such "suspect" transistors can only be used for storing the final end result of a calculation: If you were to use them for intermediate values on which you base "if" statements (or any sort of branch), your code will end up unreliable as a result. Unfortunately, 99% of the time the "end result" of one calculation is used as input to another calculation, so the problem spreads like rings in the water.
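A small sketch of the branching problem, with a fault simulated in software: `flip_bit` and `classify` are hypothetical helpers, and real hardware faults would not be this tidy. Flipping only the lowest mantissa bit of a double changes the value by about one part in 10^16, which is negligible as a final result, yet it is enough to send a comparison down the other branch.

```python
import struct


def flip_bit(x: float, bit: int) -> float:
    """Return `x` with one bit of its binary64 representation inverted."""
    (raw,) = struct.unpack("<Q", struct.pack("<d", x))
    return struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))[0]


def classify(reading: float, threshold: float = 1.0) -> str:
    """Branch on an intermediate value, as most real code does."""
    return "alarm" if reading > threshold else "ok"


exact = 1.0
faulty = flip_bit(exact, 0)  # simulated fault in the lowest mantissa bit

print("numeric difference:", faulty - exact)
print("exact  ->", classify(exact))
print("faulty ->", classify(faulty))
```

As stored data, the faulty value is as good as the exact one. As input to an "if", it produces a different outcome, and everything computed after the branch inherits that difference.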

What if humans want to rely on the output of the computer? Does that pixel on the screen matter? If you are playing Angry Birds, fine. But the pixels may be important if you're a doctor looking at a scan. Or you're a flight controller scanning the screen for planes. The graphics routines do not know the context in which they run. So the actual usability of this ends up being radically diminished....

What use is a computer where you cannot trust the result? We already have logic bugs, race conditions, usability issues etc confusing everybody - I don't think we need to make the computers even more unreliable...

Funny.... by hackus · 2013-11-04 09:43 · Score: 1

I already thought we had a framework for making chips unreliable in the programming realm known as the Windows API.

Oh wait...

-Hackus

--
Got Geometrodynamics? Awe, too hard to figure out? Too bad.

faster and broken != upgrade by Gravis+Zero · 2013-11-04 10:59 · Score: 1

if it's a choice between using a slower chip that is reliable and a chip that is blistering fast but makes mistakes, i'll take the slower chip every time.

--
Anons need not reply. Questions end with a question mark.

a fourth possibility by NikeHerc · 2013-11-05 03:57 · Score: 1

From the article: "A third possibility, which some researchers have begun to float, is that we could simply let our computers make more mistakes."

A fourth possibility is to forget this silliness before it turns into epic failure, go back to the drawing board, and design computers that make fewer mistakes, not more mistakes. Sheesh, what lunacy!

--
Circle the wagons and fire inward. Entropy increases without bounds.

Slashdot Mirror

128 comments