The Fight Against Dark Silicon
An anonymous reader writes "What do you do when chips get too hot to take advantage of all of those transistors that Moore's Law provides? You turn them off, and end up with a lot of dark silicon: transistors that lie unused because of power limitations. As detailed in MIT Technology Review, researchers at UC San Diego are fighting dark silicon with a new kind of processor for mobile phones that employs a hundred or so specialized cores. They achieve an 11x improvement in energy efficiency by doing so."
Yeah, what a great idea. Just like the PS3 with its specialized processors. 100+ should be easy to program for... NOT.
Language support for ubiquitous and provably threadsafe implicit parallelization -- done right -- is the answer to using generic dark silicon -- not building specialized silicon. See The Flow Programming Language, an embryonic project to do just that: http://www.flowlang.net/p/introduction.html
The CPU in a cell phone does not use much power so there is little to gain. Now if you can make more efficient radio transceivers - that would be something. Or the display, that would also significantly reduce power consumption. But adopting a new, unproven technology for minimal benefits.... That's not going to happen.
http://cseweb.ucsd.edu/users/swanson/papers/Asplos2010CCores.pdf
They call the specialized cores "c-cores" in the paper. I took a quick skim through it. C-cores seem like a bunch of FPGAs: they take stable apps and synthesize them down to FPGA cells on the fly with help from the OS. The c-core-to-hardware toolchain has Verilog and Synopsys in it.
Cool tech. I guess they could add clock gating and the other techniques taught in the classroom to turn off these c-cores further when needed.
cheers.
A couple of thoughts:
1. The common functionality would surely include OS APIs, as they seem pretty stable. But would it include common applications such as social networking apps, office apps, etc.?
2. If a patch is necessary, then upgrading hardware might be a little tricky. This will become a serious issue with the invasion of malware.
The Sinclair ZX81 replaced fourteen of the chips used on the ZX80 with one big programmable logic array chip that was only supposed to have 70% of the gates programmed in it. However, Sinclair used up all the gates on the chip and it ran nice and hot because of that. I suppose that the design could have used two chips instead, leading to lots of dark silicon and a cost implication.
I realise OpenJDK's VM is stack-based and Dalvik's is register-based. But aren't they essentially mapping virtual machine instructions to hardware instructions? In a rudimentary manner this was tried a decade ago with Java. It was found that general-purpose processors would spank a Java CPU in performance, due to the way a VM interacts with a JIT rather than processing raw instructions.
[Aside - ARM does include instructions for JVM-like code - Jazelle/ThumbEE. Can/does Dalvik even take advantage of them?]
The extent to which this idea can escape from a research lab depends on relative performance. Quite interesting from the power consumption aspect though.
Why is it dark silicon they fight against? This represents the struggle of the black man to overcome racial prejudice and retake the word "nigger". The parallels are deep, man.
Exactly. Why do you think green olives are in glass jars and black olives are in tin cans? So the black olives can't look out. It's subliminal racism I tell you.
- then I'll be impressed. Currently I sit at 3 days with very heavy usage, and 5-6 days with low to moderate usage. If this sort of multi-core stuff breaks the all-important one-week barrier, then it'll be a welcome technology.
Occasionally living proof of the Ballmer peak.
I guess using unused space is a good thing, but will it be cost-effective to make these huge chips on small process nodes? It might be more cost-efficient to include two chips made on a larger process node. Also, batteries are always getting a little better (albeit very slowly). I think Android phones especially would benefit from more cores: there are hundreds of threads running on that OS with just a few apps open.
Dark Silicon: Luke, I am your father.
Who would win this election: Andrew Weiner vs Andrew Weiner's weiner.
the claim is that this is the most power efficient design route.
the problem is that there just aren't the sophisticated tool sets you need for design and analysis.
of course I've never been clear on why you couldn't just use the asynchronous design ideas and substitute very low clock speeds in place of disables or some such thing.
not a digital designer so can't get too far into the details.
Absolute statements are never true
Well, that makes some sense. When you implement functions in FPGA/silicon you usually get a 10X/100X reduction in the number of gates needed and power consumed compared with compiled code. I will point out, though, that 'writing' FPGA code is 10X harder than programming in C/C++, let alone Java/C#/Python, so it's more likely that this would be used for OS code and drivers.
Specialized CPU elements have been tried. The track record to date is roughly this:
A lot of things which you might think would help turn out to be a lose. Superscalar machines and optimizing compilers do a good job on inner loops today. (If it's not in an inner loop, it's probably not being executed enough to justify custom hardware.)
So here is the power usage breakdown from my Samsung Galaxy S running Froyo:
Display: 89%
Maps: 5%
Wifi: 3%
Cell Standby: 2%
So how is enabling "Dark Silicon" going to help the power usage on my phone when the display uses the vast majority of the power?
The information wants to be free, I just give it somewhere to go.
We can't see it, we can only detect it by its power draw, and it makes up 95% of your chips!
I suffer from attention surplus disorder.
the lack of for-loops is anything but elegant in scientific computation, image processing etc.
What's the difference between imperative "for" and functional "map" for iterating through a collection? Python has both, and I end up using generator expressions (which use syntax not unlike "for" and the semantics of "map") at least as often as an ordinary for-loop.
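To make the comparison concrete, here is a minimal Python 3 sketch of the three forms mentioned above. The names `double`, `traced_double`, and `data` are made up for illustration. It only shows that the forms produce the same values, and that `map` (like a generator expression) does no work until something consumes it.

```python
def double(x):
    return 2 * x

data = [1, 2, 3]

# Imperative for-loop: build the result step by step.
result_for = []
for x in data:
    result_for.append(double(x))

# Functional map: apply the function to every element.
result_map = list(map(double, data))

# Generator expression: "for" syntax with map-like semantics.
result_gen = list(double(x) for x in data)

# In Python 3, map is lazy: nothing is computed until an element is requested.
calls = []

def traced_double(x):
    calls.append(x)  # record each element that actually gets processed
    return 2 * x

lazy = map(traced_double, data)
calls_before = list(calls)   # snapshot before consuming anything
first = next(lazy)           # pull exactly one element
calls_after = list(calls)    # snapshot after consuming one element
```

The practical difference is mostly about what the loop body is allowed to do: the for-loop can mutate outside state freely, while the other two forms push you toward one expression per element.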
I think the main challenge will be making such a language intuitive
Dijkstra wrote that programming will never be intuitive.
You can't win, because when a performance hacker reads this, he thinks, "Ooh, such waste! I need to parallelize all my stuff to increase utilization. Light 'em up!"
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
Why do we need ever more powerful phones? I don't think people are going to want to run CFDs, protein-folding, or SETI-at-home on their phone.
On the other hand, if the phone consumes 11 times less power, you could go a few months without charging it which would be good.
a protocol changes. Hardware solutions are nice if the inputs and outputs never change. Why not many computers?
http://greenarrays.com/
Making memory extremely small makes it possible to emulate a whole chip with a lookup table: each pass through a loop would do nothing more than look up what the next gate would be on a real chip, tracing a virtual path through the chip.
My idea is that one would create chips that are massively parallel in their structure, the OS on top of them would be massively parallelized in its structure, and the programs on top of that would be massively parallelized too.
Simpler chips would have massive lookup tables instead of many gates (just enough gates to run a loop and query a well-indexed database of the gates).
So you would have a loop, as I explained earlier, that searches the tiny memory database based on its inputs and produces an emulated NAND, AND, NOR, XOR, etc. on each pass through the loop. The chips would be simple, and you would have hundreds, thousands, or millions of them running in parallel.
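A toy Python sketch of the lookup-table idea described above, purely as an illustration: gates are not computed, only looked up in a table, and a loop walks a netlist one gate per pass. The table layout, the netlist format, and the names `GATE_TABLES`, `evaluate`, and `HALF_ADDER` are all assumptions made for this example, not anything from a real chip design.

```python
# Hypothetical truth tables: (input_a, input_b) -> output, one table per gate type.
GATE_TABLES = {
    "AND":  {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1},
    "NAND": {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 0},
    "NOR":  {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 0},
    "XOR":  {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0},
}

def evaluate(netlist, inputs):
    """Emulate a circuit by table lookup, one gate per loop pass.

    netlist is a list of (output_name, gate_type, input_a, input_b) tuples,
    ordered so that every gate's inputs are already known when it is reached.
    """
    signals = dict(inputs)
    for out, gate, a, b in netlist:
        signals[out] = GATE_TABLES[gate][(signals[a], signals[b])]
    return signals

# A half adder expressed as two table lookups.
HALF_ADDER = [
    ("sum", "XOR", "a", "b"),
    ("carry", "AND", "a", "b"),
]

result = evaluate(HALF_ADDER, {"a": 1, "b": 1})
```

This is roughly how FPGA logic cells already work (small lookup tables wired together), so the sketch mainly shows the evaluation loop, not a new architecture.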
As in rolled out the factory within the last few months, warranty starts ticking when I open the box. Hell, I'd be happy just to be able to buy new batteries for the one I have.
This is definitely an interesting approach they're taking.
In my research group, we're looking at a different tactic called near-threshold computing. Say you have a 32nm device that uses 100W at 1V. If you were to run it at 400mV, it would use about 1W, but logic slows by a factor of 10. So that 100X reduction in power translates into a 10X reduction in energy.
Fast-forward to 11nm, where the transistor density is 10X what it is at 32nm. Nominal voltage won't go down much, so without doing something drastic, your chip would use a kilowatt. To run the whole device, you have to lower the voltage a lot.
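The arithmetic in the two paragraphs above can be written out as a small Python sketch. It uses only the round numbers quoted in this comment (100W at 1V, about 1W at 400mV, 10X slower logic, 10X density at 11nm). The helper name `energy_per_op` is made up, and the 11nm projection assumes per-transistor power stays the same when the voltage does not drop, which is a simplification.

```python
def energy_per_op(power_watts, relative_delay):
    # Energy is power multiplied by time; time is expressed relative to nominal speed.
    return power_watts * relative_delay

# Figures quoted in the comment for a 32nm device.
nominal_power, nominal_delay = 100.0, 1.0   # 100 W at 1 V, full speed
ntv_power, ntv_delay = 1.0, 10.0            # about 1 W at 400 mV, 10X slower

power_reduction = nominal_power / ntv_power
energy_reduction = (energy_per_op(nominal_power, nominal_delay)
                    / energy_per_op(ntv_power, ntv_delay))

# Assumption: 10X the transistors at the same voltage draws 10X the power.
density_scale = 10.0
projected_power_11nm = nominal_power * density_scale

print(f"power reduction:  {power_reduction:.0f}X")
print(f"energy reduction: {energy_reduction:.0f}X")
print(f"projected 11nm power at nominal voltage: {projected_power_11nm:.0f} W")
```

The point of separating the two ratios is that the 100X power saving is partly paid back in run time, so the saving per unit of work is only 10X.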
Unfortunately, that's not the whole story, which is where we're focused. Process variation is the variation in transistor parameters that results from manufacturing difficulties. For instance, we're doing sub-wavelength lithography, which makes all edges really fuzzy. And transistors are now small enough that the number of dopant atoms per transistor is in the tens. So if one transistor has 20 boron atoms, and one nearby has 25, then that's a 25% difference in dopant count, which shows up as threshold voltage variation. These problems are getting progressively worse as transistors scale down. And the effect is amplified when you lower voltage. Die-to-die and within-die variation are massive at that scale and voltage. Generally, your whole processor's frequency is limited by the slowest circuit path on the chip. At smaller geometry and lower voltage, the speed difference between the slowest and fastest paths grows. The faster paths use more power, which is also a problem.
We tend to focus on architectural solutions to this. Let the circuits fail... then catch and correct the errors. I published a paper in MICRO 2010, where we lower the voltage to SRAM caches. (Google for >.) At some point, the bit cell failures become catastrophic, so we applied forward error correction. The result is that at a voltage where an unprotected cache is completely useless (so many errors that nearly every cache line has more bad cells than standard ECC can handle), our approach salvages 50% of the nominal-voltage cache capacity.
with for loops, you can: * break out of it early
Some cases of breaking early can be represented as composition of iterator operations: "Find the first ten elements that meet these criteria" is something like islice(ifilter(criterion, seq), 10) where criterion is a function returning nonzero for elements that match. "Find all elements until the first not meeting the criteria" is takewhile(criterion, seq). What were you thinking?
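For anyone on Python 3, a runnable version of the two compositions above might look like this. Note that `ifilter` is the Python 2 name; in Python 3 the built-in `filter` is already lazy, so that is what the sketch uses. The `criterion` function and the sample data are made up for illustration.

```python
from itertools import count, islice, takewhile

def criterion(x):
    # Example criterion (an assumption): keep even numbers.
    return x % 2 == 0

# "Find the first ten elements that meet these criteria."
# count(1) is an infinite iterator, so this only terminates because
# islice stops pulling elements after the tenth match.
first_ten = list(islice(filter(criterion, count(1)), 10))

# "Find all elements until the first not meeting the criteria."
leading = list(takewhile(criterion, [2, 4, 6, 7, 8, 10]))

print(first_ten)
print(leading)
```

Using an infinite source for the first case is deliberate: it shows the "break early" behaviour is real, not just a slice taken after processing everything.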
iterate a subsequence without actually creating a subsequence, step across several elements on each iteration
You mean like Python islice(seq, start, stop, step), which takes anything iterable and returns an iterator of a subsequence?
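A short Python 3 sketch of that call, assuming a made-up generator `squares` as the iterable. It steps across several elements per iteration without ever building the full sequence or an intermediate sub-list.

```python
from itertools import islice

def squares():
    # Hypothetical endless source: 0, 1, 4, 9, 16, ...
    n = 0
    while True:
        yield n * n
        n += 1

# islice(iterable, start, stop, step): items at indices 2, 5, 8, 11.
every_third = list(islice(squares(), 2, 12, 3))

print(every_third)
```

One caveat: `islice` still has to consume the skipped elements from the underlying iterator, so it saves memory rather than work.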
Ya know, the only people I've ever seen argue for the use of map vs for() are those academia types who never write useful code to begin with.
That and anyone who has ever worked with a framework that implements parallel map (e.g. Grand Central Dispatch) or distributed map (e.g. Google MapReduce).
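As a small stand-in for the frameworks named above (this is Python's standard library, not Grand Central Dispatch or MapReduce), here is a sketch of a parallel map using `concurrent.futures`. The function `work` and the pool size are assumptions chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def work(x):
    # Stand-in for an independent per-element computation.
    return x * x

items = range(8)

# Sequential map.
sequential = list(map(work, items))

# Parallel map: same call shape, but the calls may run concurrently.
# Executor.map returns results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(work, items))
```

Because each call depends only on its own element, switching from `map` to `pool.map` needs no other change, which is much harder to say about an arbitrary for-loop body.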
Why do those [sex analogy] always think that for() loops only have 2-3 lines within them and no real logical operations ?
In a lot of cases, the body of a for loop can be refactored into a separate function. Such loops can be considered as having only a call to that function as the body, in which case map is equivalent.
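A sketch of that refactoring in Python 3, with a loop body that has several lines and some real logic in it. The record format and the function name `normalize` are invented for the example.

```python
def normalize(record):
    # The former loop body, moved into a function: several steps and a decision.
    name = record["name"].strip().title()
    age = int(record["age"])
    return {"name": name, "age": age, "adult": age >= 18}

records = [
    {"name": "  ada lovelace ", "age": "36"},
    {"name": "tim", "age": "12"},
]

# For-loop whose body is now a single call.
out_loop = []
for record in records:
    out_loop.append(normalize(record))

# Equivalent map over the same function.
out_map = list(map(normalize, records))
```

This only works cleanly when each iteration is independent of the others; a loop that carries state from one pass to the next needs something like a fold instead.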
Sure, map[] may replace first year CS student for() loops, but that's about it.
Tell that to all the generator expressions* found in the real-world Python code that I write for real-world work where I earn my real-world paycheck.
* In Python, generator expressions take the form (some_expr(el) for el in some_iterable). They are commonly used to populate lists, sets, and dictionaries. Despite using the same for element in iterable wording as for loops, they have similar semantics to functional map.
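A minimal Python 3 illustration of the footnote above, using a made-up word list. Each container is populated from a generator expression, and the last line checks the first one against the equivalent `map` call.

```python
words = ["dark", "silicon", "dark", "cores"]

# Generator expressions feeding a list, a set, and a dictionary.
lengths = list(len(w) for w in words)
unique_upper = set(w.upper() for w in words)
length_by_word = dict((w, len(w)) for w in words)

# Same semantics as functional map for the simple one-function case.
same_as_map = lengths == list(map(len, words))
```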
What are the editing commands for forcing a new line in text? I know about bold, and I thought a new line was a less-than character followed by a p. Where is the help / reminder for us responders?
Leslie Satenstein Montreal Quebec Canada
So, with all those suggested energy savings, the battery could become smaller, and the number of apps could increase. Is there a Moore's law about the number of semi-useful applications?
Leslie Satenstein Montreal Quebec Canada
"dark silicon' has a cool ring to it? ;-)