'Inexact' Chips Save Power By Fudging the Math
Barence writes "Computer scientists have unveiled a computer chip that turns traditional thinking about mathematical accuracy on its head by fudging calculations. The concept works by allowing processing components — such as hardware for adding and multiplying numbers — to make a few mistakes, which means they are not working as hard, and so use less power and get through tasks more quickly. The Rice University researchers say prototypes are 15 times more efficient and could be used in some applications without having a negative effect."
37 posts about the Pentium division bug.
Don't they do this too, but fudge the maths so they can be a bit faster?
Hence the Cuda stuff needed special modes to operate in IEEE floats etc...
I wish I could say reading the article gave me some insight as to where it fudges, but they kinda left it out.
He tried to kill me with a forklift!
CC.
TaijiQuan (Huang, 5 loosenings)
These chips will, of course, be aimed at government markets.
This is first post according to my new power-efficient computer!
See my journal for slashdot ID's by year. Mine created in 2005. http://slashdot.org/journal/289875/slashdot-ids-by-year
I feel this is more relevant: http://xkcd.com/1047/
At least our eventual computer overlords won't be able to count accurately to be sure they've eliminated all of us...
technical whipping boy, Occam's Strop (think about it...)
From here on out, I'm requiring my chips to show their work. And, it better not look the same as the work that that northbridge chip you are sitting next to.
Seems like nothing new to me. Floating point binary math is basically used for the same reason. It gives us an answer that's close enough, without requiring too much computation time. And it causes all sorts of fun since even simple numbers like 0.1 can't be represented exactly in binary floating point. Binary floating point works well for scientific apps, but fails quite badly at financial apps. I think this is basically taking floating point to the next level where the calculations are even more off. Which might work for certain applications, but for other types of applications would be completely catastrophic. What really bothers me is languages and platforms that provide no ability to work with numbers in a decimal representation.
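To make the 0.1 point concrete, here is a minimal Python sketch (Python is my choice for illustration, not something the post mentions). It compares binary floats with the standard library's `decimal` module, which is one example of the decimal representation asked for above.

```python
from decimal import Decimal

# Binary floating point: 0.1 has no exact representation, so the sum drifts.
float_sum = 0.1 + 0.2
float_matches = (float_sum == 0.3)

# Decimal arithmetic: the same values are stored exactly.
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_matches = (decimal_sum == Decimal("0.3"))

# The exact value that the float literal 0.1 actually holds.
exact_value_of_float = Decimal(0.1)

print(float_matches, decimal_matches)
print(exact_value_of_float)
```

Constructing `Decimal` from a string keeps the value exact; constructing it from a float only exposes the binary approximation that was already there.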
Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
Before someone comes up with that stupid remark, not much. :) If the chips are 15 times as efficient as normal ones, it means that you could run for instance four in parallel and rerun each calculation in which one of them differs. That way you would both get both accurate calculations and power savings. Modify the number of chips to run in parallel depending on the accuracy and efficiency needed.
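A rough sketch of that run-in-parallel-and-rerun idea, in Python. Everything here is hypothetical: `make_inexact_adder` is an invented error model (wrong by a small offset with some probability), not how the Rice prototypes behave, and `voted_add` is just one way to do the comparison.

```python
import random

def make_inexact_adder(error_rate, rng):
    """Return a simulated adder that is wrong with probability error_rate (invented model)."""
    def add(a, b):
        if rng.random() < error_rate:
            return a + b + rng.choice([-2, -1, 1, 2])  # a small wrong answer
        return a + b
    return add

def voted_add(units, a, b, max_retries=100):
    """Run every unit; accept the result only when all units agree, otherwise rerun."""
    for _ in range(max_retries):
        results = [unit(a, b) for unit in units]
        if len(set(results)) == 1:
            return results[0]
    raise RuntimeError("units never agreed")

rng = random.Random(0)
units = [make_inexact_adder(0.05, rng) for _ in range(4)]
print(voted_add(units, 2, 3))
```

One caveat: agreement only proves the units produced the same value, so if all four make the same mistake the wrong answer is accepted. The scheme assumes errors are independent and rarely identical.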
Football Odds
Meh, close enough.
"This isn't so much a circle as a square, what the hell's going on?!"
"Oh, that's because the chip in your machine doesn't accurately define PI, it rounds the value up"
"To what?"
"4"
Summation 2
Oh you misunderstand. It will still return the "right" answer, it'll just be "engineer" right, not "mathematician" right, i.e. "Good enough for all intents and purposes."
Furthermore, posting under the top post when your reply is nothing to do with the OP is considered a faux pas. Minus 50 DKP.
Finally had enough. Come see us over at https://soylentnews.org/
Anyone else think of http://en.wikipedia.org/wiki/Technology_in_The_Hitchhiker%27s_Guide_to_the_Galaxy#Bistromathic_drive when they read this?
Hmm, seems this has been used by The Fed and European Central Bank for quite a while now.
"But we decide which is right, and which is an illusion"
In more recent news, computer scientists determined that monkeys can get the same job done even faster, and by using even less power, and by making, um... a lot more mistakes.
This is exactly the problem with American chips lately. They're too lazy to put any effort into their work. Sure, they're "saving energy" but that just means they're going to become even more obese. Chips from many Asian manufacturers are already much more accurate and efficient than American ones. We need to encourage American chips to be more interested in STEM fields if we're ever going to turn our economy around!
the concept works by allowing processing components — such as hardware for adding and multiplying numbers — to make a few mistakes, which means they are not working as hard
But my math teacher didn't understand the important difference between efficient and lazy.
This concept was used a lot back in my high school.
http://tech.slashdot.org/story/09/02/08/1716235/sacrificing-accuracy-for-speed-and-efficiency-in-processors
Of course, you might've been sacrificing speed for accuracy in that 3 year estimate.
(and for all of the nay sayers -- I could see this being great for monte carlo simulations or other modeling where you're dealing with so many imprecise inputs that a minor error's not going to be significant)
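As a sanity check on the Monte Carlo claim, a small Python sketch that estimates pi while every squaring is deliberately perturbed. The 0.5% figure in `REL_ERROR` and the uniform error model are assumptions made for illustration, not numbers taken from the chip.

```python
import math
import random

REL_ERROR = 0.005  # assumed per-operation relative error, chosen for illustration

def inexact_square(x, rng):
    """Square x, then perturb the result by up to +/- REL_ERROR (hypothetical error model)."""
    return x * x * (1.0 + rng.uniform(-REL_ERROR, REL_ERROR))

def estimate_pi(samples, rng):
    """Monte Carlo estimate of pi using only the inexact squaring above."""
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if inexact_square(x, rng) + inexact_square(y, rng) <= 1.0:
            inside += 1
    return 4.0 * inside / samples

estimate = estimate_pi(200_000, random.Random(1))
print(abs(estimate - math.pi))
```

With errors that are symmetric around zero, the sampling noise of the method itself dominates the arithmetic noise. A systematic (one-sided) hardware error would not average out like this.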
Build it, and they will come^Hplain.
They could be useful in a few small circumstances, but for the vast majority of cases, I'd be interested in how a speed payoff is going to be beneficial given you don't know whether you got the correct answer. You could run a check to see whether it's correct, but then you can't trust the check to give you the right answer either... so you could run a third check...
Clearly, the answer is to run 14 checks.
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
GPS. I don't need to know that I'm precisely in the middle of the left lane of the 4-lane highway going 59.2MPH. I'd rather it use the processor for screen refreshes and finding a better route around Dallas or Chicago at Rush Hour.
Scales at the checkout - the faster it gets a reading on how much my apples weigh, the faster I get away from the "People of Wal-Mart" and I'll bet there's less than a penny difference anyway.
Video Games (see GPS) - many switch to integer maths already for speed, how about fuzzy integers? ;)
DHS airport scanners - the faster they scan, the less I'll glow in the dark
I write 15 times as much code by not bothering to fix the mistakes.
I envision the "less precise" CPUs being used in consumer laptops where people are just watching movies or listening to music.
It does not matter if the MPEG4 conversion is slightly off with the color, because the consumer's eye won't detect it. The selling point will be a laptop or tablet that lasts 10x longer on a battery charge.
My AC stalker: " I personally agree with your posts most of the time, but that won't keep me from modding you troll"
Wow, so the goal to be Green in the future is to introduce more bugs into hardware to save power. While I am sure there are limited uses of this kind of "math", in general I don't believe these chips will have widespread adoption because mathematical accuracy, at least for integer values, is kind of critical for most applications. It's hard enough for developers to predict the random and idiotic nature of the users of their software, now they have to build protection against hardware throwing them random results.
This instantly reminded me of a developer that claimed a 1200% improvement in performance after he optimized some code. The developer wasn't particularly skilled and some senior level guys had already optimized the performance about as far as it could be taken, so we were dubious. We found after a code review that basically this developer had improved the efficiency of the software by skipping some critical intensive calculations that were the point of the software.
Sure you could claim that this optimization is greener than the original code because the CPU is not working as hard, but if you are not going to get the expected results, f*ck being green.
I haven't thought of anything clever to put here, but then again most of you haven't either.
I haven't RTFA yet, but I strongly suspect that there would be different instructions when accuracy matters (ie program flow control), from where it's not as important (ie signal processing).
What about in a RT rendering (game/BD-Rom decode) situation, or a RT communication (Skype) situation?
Both of these do not need exact values, just close enough, and even if there was an error it will be transient and gone almost as fast as it was noticed?
-nB
whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
I envision the "less precise" CPUs being used in consumer laptops where people are just watching movies or listening to music.
It does not matter if the MPEG4 conversion is slightly off with the color, because the consumer's eye won't detect it. The selling point will be a laptop or tablet that lasts 10x longer on a battery charge.
In other words, the Walmart Netbook?
Call it the "Close enough for government work" chip.
It's still math, it's just in the hardware rather than the software.
Video game graphics could probably benefit from this. Very few people will notice that one pixel is #FA1003 instead of #FC1102, especially when it's replaced 16ms (or, worst-case, 33ms) later with yet another color. It might actually make things "better" - making the rendering seem more analog. Many games are "wasting" power adding film grain or bokeh depth-of-field or lens flares or vignette, to try to simulate the imperfections of analog systems to try to make their graphics less artificial-looking. If you can get a "better" look while using *less* power, all the better.
Actually, I seem to recall hearing about this earlier. For some reason I want to say nVidia specifically has been looking into this.
Not quite so transient with MP4. You get the I-frame which has the complete picture. That picture only lasts maybe 1/24 of a second. But following that I-frame are B and P-frames. Those are deltas from the I-frame, and would contain those errors PLUS the errors from the delta for the new frames.
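A toy Python model of that accumulation. It assumes every decode step contributes the same fixed error (a worst case; random errors would partly cancel) and it ignores B-frames, so `decode_drift` is only a sketch of the argument, not of a real MP4 decoder.

```python
def decode_drift(num_frames, gop_size, error_per_decode):
    """Accumulated error per frame when every decode step adds a fixed error."""
    drift = []
    accumulated = 0
    for frame in range(num_frames):
        if frame % gop_size == 0:
            accumulated = error_per_decode   # I-frame: fresh picture, one decode error
        else:
            accumulated += error_per_decode  # P-frame: inherits the error, adds its own
        drift.append(accumulated)
    return drift

print(decode_drift(8, 4, 1))
```

In this model the error grows until the next I-frame resets it, so how visible it gets depends on how far apart the I-frames are.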
It seems to me that this may make repeatability difficult. What if you want to recreate the situation for debugging, court cases, etc? Perhaps there can be a "testing mode" where full accuracy is on, but switch to "efficiency mode" for low-power production. Still, losing repeatability makes me noivus, to quote the 3 Stooges.
Table-ized A.I.
You do that in the same place you currently go to not-do what they're trying to do.
Kind of like how I deal with the people I know who are trying to start a square-dancing club: by not going to the place they do it at the same time.
http://alternatives.rzero.com/
What!? That rocket was NO WHERE NEAR ME. Wait, why is everything FROZEN?!
Connection Terminated. Desynch error rate exceeded.
Oh sure we'll just snapshot the whole flippin' gamestate to the clients and do reconciliation -- But that's just wrong.
Error propagation, Non-determinism, etc etc. This is OK for GPU stuff that ONLY draws pixels. Anything that affects gameplay could only be done server side with dumb clients, but not for any real detailed worlds (just ask second life devs) -- Without deterministic client side prediction you need MUCH higher bandwidth and latency of less than 30ms to get equivalent experience. The size of game state in game worlds has been increasing geometrically (in PCs it still grows, consoles hit limits due to ridiculously long cycles and outdated HW), determinism and pseudo randomness helps keep the required synch state bandwidth low. Oh, I guess I could use less precise computations for SOME particle effects (non damaging stuff), but you know what? I'M ALREADY DOING THAT.
What's that you say? The errors could be deterministic? Oh really... well, then what the hell is the point? Why not just use SMALLER NUMBERS and let the PROGRAMMER decide what the precision should be. It's like no one's heard of short int or 16bit processors. Give a dedicated path for smaller numbers, and keep us from being penalised when we use them (currently, 16 bit instructions are performed in 32bit or 64bit then trimmed back down). Some GPU stuff already has HALF PRECISION floats. Optimise that path and STFU about fuzzy math, you sound moronic...
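For a feel of what programmer-chosen precision costs, here is a small Python sketch that rounds values through IEEE 754 half precision using the standard `struct` module's `e` format. The helper name `to_half` is mine; this only shows the rounding, not any speed or power benefit.

```python
import struct

def to_half(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def relative_error(x):
    """Relative error introduced by storing x as a half-precision float."""
    return abs(to_half(x) - x) / abs(x)

print(to_half(0.1), relative_error(0.1))
print(to_half(4097.0))
```

The error is deterministic and bounded (about 2**-11 relative for normal values), which is the property the post is asking for: the programmer knows exactly how much precision was given up.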
I'm surprised g_earth = pi^2 wasn't one of those.
That one actually becomes relevant when back-of-the-enveloping orbital calculations....
Can you be Even More Awesome?!
As many have said below, your brain is indeed doing math - what it's not doing is "computation".
Most of the discussions in this thread are forgetting that important difference. The applications for which this type of chip will be useful are those in which the exact value of something is not important, but the relationships between values are. For instance, if you're implementing a control system algorithm, you don't care that the value of your integration is something specific, but you do care that it will always increase in proportion to the inputs and time. This is more akin to how your brain works - it doesn't care how much force it has to apply to your arm to make it move to catch a ball - it just knows that it needs "more" or "less".
For things like finance or engineering design that actually require computation this chip would be a poor choice.
"There are a dozen opinions on a matter until you know the truth. Then there is only one." - CS Lewis (paraphrase)
Furthermore, posting under the top post when your reply is nothing to do with the OP is considered a faux pas
That has always bugged me, but it is commonly done and they commonly get modded high.
If you could reason with religious people, there would be no religious people
So we'll probably see something like the situation we see with laptop displays. "Good enough" for a movie is good enough for 90% of people, so that's what the market will be flooded with. Anyone who actually cares about quality will lose out on the economies of scale.
Give me Classic Slashdot or give me death!
Hardly. With engineering projects (especially with regards to people losing their lives) you ALWAYS build in safety factors. Large ones in fact. If you are within (from the article) 0.54% of the limits of the material you have a lot bigger problems than the processor.
Secondly, we are talking about low-power hardware here, not a software application. I see these chips being pushed into tablets and mobile devices, not things like laptops & desktops where they do some serious mathematical lifting.
I call it 'The Aristocrats'
If someone is doing structural engineering they are already aware of how much precision they actually need, and probably are not going to be reusing some 'hobby' application to do those calculations... crow, they probably are not even going to use one of the common languages like C/C++ since floating point operations in them are already unpredictable past a certain point (the chips will do the work to great precision, but the language is sloppy)... if they REALLY need the precision they will probably use specialized libs or a more audit-able language like Ada or FORTRAN.
Most programmers doing these kinds of calculations are using floating point numbers, which already have interesting rounding error failure modes that most programmers don't understand. This is going to exacerbate the problem.
Decreasing hardware intelligence and counting on programmers to make up the difference hasn't been a winning proposition in a long time.
I envision the "less precise" CPUs being used in consumer laptops where people are just watching movies or listening to music.
It does not matter if the MPEG4 conversion is slightly off with the color, because the consumer's eye won't detect it. The selling point will be a laptop or tablet that lasts 10x longer on a battery charge.
Exactly that.
Prepare to see GPUs which go into "fudged mode" when dealing with graphics (3D, Video, etc.), and which go into "high precision mode" when doing science (OpenCL, CUDA, etc...)
Then further down the line, be prepared to see the "high precision mode" to be a paid-for only option.
(Buy a GPU marketed as tablet/laptop/entry-level desktop: Only "fudged mode" available,
Buy a GPU marketed as high-level desktop/workstation/cluster: "High precision mode" available too, costs 2x more, although it's exactly the same chip (only perhaps with a different number of disabled/enabled core) )
That's already the case with other pro features:
- ECC mode is only available on cluster OpenCL/CUDA cards (although they don't use ECC DRAM chips. Instead, they reserve a small portion of the memory to do checksumming in firmware/software). They are identical. Or in fact even cheaper (the graphic output is disabled or not even soldered-on).
- Quad-buffer stereo OpenGL is only available on "workstation"-grade cards, although there's no peculiar hardware requirement (and a subset of the same capability is available as proprietary gaming 3D-Stereo DX3D/OpenGL on some mid- and high-range models).
So, yeah, one more characteristic that will be artificially price-tiered through a pure software setting!
And one more opportunity for the open-source drivers to shine...
Well, except maybe they will lack the necessary man-power, due to the required additional reverse engineering, or due to the seldom needed feature.
(Although, we might see a better chance with AMD hardware:
AMD supports the development of open-source drivers by providing documentation for almost everything (except Video DRM), and the computing part is recent enough (OpenCL was recently developed and is only on version 1.2) and relies on fewer quirks and optimisations than graphics: so performance shouldn't be lagging that much behind the closed source drivers.
When you also take into account that being open-source these drivers are easily packaged-with and maintained by distributions, thus making them a little bit easier to deploy (no need to add a manufacturer's 3rd party repository, no need to recompile a separate kernel module, etc. always compatible with up-to-date Xorg/Wayland API & ABI), we can expect the AMD hardware to see more open-source usage for computing, and thus the computing feature being more sought after and also developed by the opensource drivers).
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
I can imagine a few cases when it could be allowed, based on mathematical proof in advance that error level would be acceptable.
Audio/Video playback in a noisy environment
Processing similar to PageRank and the recently announced NetRank for biochemical analysis might be able to produce better results for a given cost in electricity. In other words, deeper graph analysis traded for less significant digits
CPU-controlled activities that depend on statistics and sensors, for example street light control, voice/gesture based activation of lighting
Applications in which low power is the most important thing, especially if it is output meant for a human brain which already operates on a lossy basis. A wristwatch might be lower power if it is allowed to be correct within plus or minus 15 seconds.
My question is whether they have controlled for where the error occurs. The nice thing about approximations is that you know what the error is.
According to the article, the low power mode increases the relative error to 7.5% (quite huge) but reduces the power requirement 15x (massive benefits).
A possible explanation:
Some mathematical computation (like trigonometry) is done with lookup table and interpolation.
By using a simpler interpolation step (like linear instead of polynomial), or even doing away with it, you can considerably speed up the corresponding ops and lower their power requirement.
By doing this you only increase the expected relative error, rather than occasionally producing garbage.
You thus only get a more approximate DCT step in your video decoding, and the output is more "blocky" (see the attached JPEG in the article).
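The lookup-table idea can be sketched in Python as below. This is only an illustration of the trade-off being guessed at here, not a description of the actual chip. The table size and the names `sin_nearest`, `sin_linear` and `max_error` are all my own choices.

```python
import math

TABLE_SIZE = 64  # number of intervals over [0, pi/2]; arbitrary, for illustration
STEP = (math.pi / 2) / TABLE_SIZE
SINE_TABLE = [math.sin(i * STEP) for i in range(TABLE_SIZE + 1)]

def sin_nearest(x):
    """Cheapest variant: no interpolation, just the closest table entry."""
    return SINE_TABLE[min(TABLE_SIZE, round(x / STEP))]

def sin_linear(x):
    """Linear interpolation between the two neighbouring table entries."""
    i = min(TABLE_SIZE - 1, int(x / STEP))
    frac = x / STEP - i
    return SINE_TABLE[i] + frac * (SINE_TABLE[i + 1] - SINE_TABLE[i])

def max_error(approx, samples=10_000):
    """Largest absolute error of approx against math.sin over [0, pi/2]."""
    xs = [(math.pi / 2) * k / samples for k in range(samples + 1)]
    return max(abs(approx(x) - math.sin(x)) for x in xs)

print(max_error(sin_nearest), max_error(sin_linear))
```

Dropping the interpolation makes every result somewhat worse by a bounded amount, which matches the point above: a larger expected error, but no garbage.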
Another explanation:
TFA speaks about reduced precision multiplication and addition.
So you could also use a simpler (but more error prone) circuitry for handling the least significant bits (TFA mentions lower voltage).
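To see why confining the sloppiness to the least significant bits keeps the damage bounded, here is a Python sketch of a truncating adder. It is a stand-in I chose for illustration (`truncated_add` simply ignores the low bits of each operand); the real circuitry presumably errs in a less tidy way.

```python
def truncated_add(a, b, dropped_bits):
    """Add two non-negative ints after zeroing the lowest `dropped_bits` bits of each."""
    mask = ~((1 << dropped_bits) - 1)
    return (a & mask) + (b & mask)

def absolute_error(a, b, dropped_bits):
    """How far the truncated sum falls short of the exact sum."""
    return (a + b) - truncated_add(a, b, dropped_bits)

print(truncated_add(1000003, 2000005, 4), absolute_error(1000003, 2000005, 4))
```

Each operand loses less than 2**dropped_bits, so the sum is off by less than 2**(dropped_bits + 1) no matter how large the inputs are. The relative error shrinks as the numbers grow, which would not hold if the most significant bits could flip.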
If you can have bit errors anywhere including the MSB then you're going to be limited to situations where you don't actually care about the answer
Or situations where you don't actually need exactly 1 answer per input, but where you somewhat statistically combine ("reduce") the output (example: you only need an average of all results), and the b0rked-bit-flipped-results would be dropped with most of the other outliers.
You trade a loss of precision (the final mean will be computed on fewer samples - you lose p.pp% of them as outliers) against a massive power requirement decrease (15x less power).
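A minimal Python sketch of that reduce-and-drop-outliers step. The median-based filter and the `tolerance` parameter are assumptions of mine; any robust statistic would do, and the sample data is made up, with one value corrupted by flipping a high bit.

```python
import statistics

def robust_mean(values, tolerance):
    """Average only the values within `tolerance` of the median; drop the rest as outliers."""
    center = statistics.median(values)
    kept = [v for v in values if abs(v - center) <= tolerance]
    return statistics.fmean(kept), len(values) - len(kept)

# Made-up results; one of them has bit 20 flipped.
results = [100, 101, 99, 100, 100 ^ (1 << 20), 102, 98]
mean, dropped = robust_mean(results, tolerance=5)
print(mean, dropped)
```

A flipped high bit lands so far from the median that it is trivially rejected. A flipped low bit stays inside the tolerance and just adds a little noise to the mean.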
Again, that's not how the chip works.
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
I was thinking the same thing. Remember the company: Adaptive Logic with their AL220? There does seem to be some long cycle idea repeat loop going on in our industry. (Perhaps any industry really).
H.
I cannot wait until these chips start doing high frequency trading in the financial markets....
If telephones are outlawed, then only outlaws will have telephones.
Sounds about right, which would probably be a good thing. Too many programmers are obsessed with getting the mathematically correct answer to a precision that can have no actual impact on whatever they are trying to accomplish (or even worse, is rendered 'wrong' anyway by FP limitations of the language or chip anyway).
Too many programmers appreciate a programming environment where the FP implementation doesn't play any tricks that mess up perfectly good code.
Look, we've been there 40-50 years ago. Floating-point arithmetic was rubbish, because the amount of hardware that was available was very much limited. Thanks mostly to Prof. Kahan, Apple who introduced SANE floating-point arithmetic (very fitting acronym), and Intel who proved it could be implemented in a hardware FPU with the 8087 co-processor, sanity prevailed. These guys at Rice University should be hanged, quartered, flogged and shot, I just can't decide in which order.
Exacerbate what problem? How often will an error rate in the 0.5%-1% range ever actually matter? Financial calculations are seldom floating point to begin with, but outside of finance and some scientific computation, results accurate to 2 significant digits are just fine for most things (especially where the error in measurement for all the inputs is worse than 1% to begin with!).
Socialism: a lie told by totalitarians and believed by fools.
In civil structural engineering analysis the required precision is normally around 3 significant digits, or less than 1%. The factors of safety required for different conditions vary from 30% to over 200% and are far higher than this precision. Loads are often estimated with only 2 or 3 significant digits.
Where higher precision is most required in structural engineering is on geometry and fabrication tolerances. Construction tolerances for a beam length may be limited to 1/8th of an inch regardless of the beam length. Errors in calculating assembly lengths and geometry fit up, can lead to costly construction repairs and delays. I still think most of this precision fits within the standards of Excel for most cases.
Yeah, I don't know if you get to choose where, exactly, that inexactness comes into play.
For instance, you wouldn't want a vague answer as to the value of the integer 'i' in a for loop.
I am John Hurt.
Agreed. I've seen trans-coded video these days, with hideously high video encoding rates and the latest video codecs, and I wonder how they manage to screw it up so completely.
It takes only an afternoon to learn how to trans-code a video to a certain bit-rate or size with minimal or no artifacts. How are they doing this, then? I want to know.
It's like getting a Porsche (any Porsche) to do 0-100 in 30 seconds, consistently.
I am John Hurt.
Large safety factors are bad engineering in a lot of fields. Maybe not for architecture and bridges, but for airplanes, the safety factor is as close to 1 as possible (and there are certainly lives on the line). The weight savings are always worth it. In fact, in aerospace, safety factors down to 0.9 are common, meaning the part _will_ more than likely fail at some point, and so it is inspected regularly for signs of fatigue failure.