Processors and the Limits of Physics

Dupe by Anonymous Coward · 2014-08-16 03:38 · Score: 5, Informative

Same ArsTechnica article link and everything

Re: Dupe by LocutusOfBorg1 · 2014-08-19 18:45 · Score: 1

42!

There are no limits! by Anonymous Coward · 2014-08-16 03:40 · Score: 1

Well, except that every other technology has hit limits, except computers! They'll just endlessly get better. Forever.

Re:There are no limits! by blue+trane · 2014-08-16 04:40 · Score: 1, Insightful

Yes, like Simon Newcomb proved we had hit limits in heavier-than-air flight, in 1903!

In the October 22, 1903, issue of The Independent, Newcomb made the well-known remark that "May not our mechanicians . . . be ultimately forced to admit that aerial flight is one of the great class of problems with which man can never cope, and give up all attempts to grapple with it?"
Re:There are no limits! by AchilleTalon · 2014-08-16 04:45 · Score: 2

Your reasoning is false. Most AI algorithms have a high level of parallelism, which makes them less susceptible to the single-CPU physical limit. You can achieve incredible performance improvements on GPUs and other parallel architectures.

--
Achille Talon
Hop!

This seems like a good time to meniton these by nctritech · 2014-08-16 03:44 · Score: 5, Interesting

Clockless logic circuits might be an interesting workaround for the communication problem. The other side of the chip starts working when the data CAN make it over there, for example. I don't claim to know much about CPU design beyond how they work on a basic logical level, but I'd love to hear the opinions of someone here who does regarding CPUs and asynchronous logic.

Re:This seems like a good time to meniton these by K.+S.+Kyosuke · 2014-08-16 04:01 · Score: 1

Chuck Moore does these. Of course, they're extremely simple at the moment (comparatively speaking). But they are indeed extremely energy efficient, and the self-timing thingy works great for them.

--
Ezekiel 23:20
Re:This seems like a good time to meniton these by TechyImmigrant · 2014-08-16 07:04 · Score: 2

I guess that's me then.
Every D flip-flop is an async circuit. We use a variety of other standard small async circuits that are a little bigger. Receiving clock-in-data signals like DS links is a common example. What you're talking about is async across larger regions.
Scaling fully asynchronous designs to a whole chip is a false economy. The area cost is substantially greater than a synchronous design and with the static power draw of circuits now dominating, the dynamic power savings of asynchronous design are moot. You need to turn circuits off to save power. Just rendering them static doesn't help much.
A modern CPU is made of islands of synchronous design, which are not assumed to be globally synchronous. Data passing between these islands is generally re-synchronized.
An exception is power control signaling. Clock trees are power hogs, so you don't want to have to leave them on to support the power gating interfaces. So an async state machine to communicate power management protocols is common.

--
I should use this sig to advertise my book ISBN-13 : 978-1501515132.
Re:This seems like a good time to meniton these by TechyImmigrant · 2014-08-16 07:08 · Score: 1

How are they energy efficient?
More gates == more static power draw.
Leaving a circuit switched on because you don't know when asynchronous transitions will arrive == more static power draw.
Global async design may have made sense in 1992, but not these days. Silicon has moved on.

--
I should use this sig to advertise my book ISBN-13 : 978-1501515132.
Re:This seems like a good time to meniton these by K.+S.+Kyosuke · 2014-08-16 08:37 · Score: 2

It depends. This is the same guy that Intel licenses a lot of power-saving patents from. You'd have to ask him, but the static power draw of his circuits is indeed minimal. Perhaps the reason is that he doesn't use manufacturing processes with high static power draw on purpose, I really don't know. It may also be the case that a switch from contemporary silicon to something else in the future will make this design more relevant again, power-wise (but the timing considerations, as well as the speed of light, are of course going to stay the same).

--
Ezekiel 23:20
Re:This seems like a good time to meniton these by nctritech · 2014-08-16 11:02 · Score: 2

More relevant links to asynchronous/clockless computing:
http://www.embedded.com/design...
http://www.technologyreview.co...
http://www.scientificamerican....
http://www.nytimes.com/2001/03...
Re:This seems like a good time to meniton these by sjames · 2014-08-16 11:17 · Score: 1

It's perfectly feasible in an async processor. You just have to hold the results in a buffer at the end of the computation until a timer fires. The timer being set to the longest the op might possibly take.
Async chips can have timers, they're just not driven by them.
Re:This seems like a good time to meniton these by TechyImmigrant · 2014-08-16 13:56 · Score: 1

It depends.
Yes. In semiconductors, there's a basic tradeoff between static power dissipation and propagation latency. In a slow/low static current process, you might well be able to use asynchronous design to improve power efficiency. This is more the realm of RFID tags, payment cards and smart card chips. You won't be finding much of that going on in a desktop or phone CPU.

--
I should use this sig to advertise my book ISBN-13 : 978-1501515132.
Re:This seems like a good time to meniton these by udippel · 2014-08-16 23:54 · Score: 1

> Every D-flip flop is an async circuit.
How's that then? Would you care to explain, please, what you mean?
My D-FFs here are totally synchronous: The D-input pops up at the output exactly with the rising clock edge + processing delay. And the latter is unavoidably indefinite.
Re:This seems like a good time to meniton these by TechyImmigrant · 2014-08-17 05:38 · Score: 2

I do understand it. That patent describes an asynchronous data transfer with rendezvous using a conventional quadrature handshake. I can't imagine that there isn't prior art. That is standard stuff. The date of the patent is 2006. I finished my degree in 1991 around the same time the Amulet async ARM was beginning development. My tutor at college invented the async register file for the Amulet.
The method it describes is slow because it requires two round trips between source and destination. That is why clock-in-data schemes are preferred. The neatest of those is the DS link code that was put out by Inmos in the early 90s (I used to work there). A receiving async circuit can recover data and clock and pass it on to a synchronous receiver using normal methods.
But it makes the same wrong assumption that the 'better than clock gating' efficiencies of async logic would be superior to a synchronous circuit. However these days it doesn't matter. In small geometries, your logic is sucking power whether or not it is clocked, due to static leakage. The way of the world these days is fine grained power gating. As per my previous post, async transactions have a role to play in power gating (because you don't need to leave the clock tree on to use them), but they are a false economy in random logic applications because the increased gate count leads to an increased static current draw.
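The "two round trips" cost can be sketched with a toy model. This assumes the handshake in question behaves like a textbook four-phase (return-to-zero) request/acknowledge protocol; that reading, the event list, and the function names below are illustrative assumptions, not details taken from the patent.

```python
# Toy model of a four-phase (return-to-zero) handshake.
# ASSUMPTION: the "quadrature handshake" is modelled as the textbook
# req/ack four-phase protocol; names here are hypothetical.

# Each entry is one signal edge that has to travel across the link.
FOUR_PHASE_EDGES = [
    ("sender", "req", 1),    # sender: data is valid
    ("receiver", "ack", 1),  # receiver: data has been taken
    ("sender", "req", 0),    # sender: request released
    ("receiver", "ack", 0),  # receiver: ready for the next word
]


def round_trips_per_word():
    """Two one-way edges make one round trip."""
    return len(FOUR_PHASE_EDGES) // 2


def min_time_per_word_ns(wire_delay_ns, logic_delay_ns=0.0):
    """Lower bound on time per word: every edge crosses the wire once."""
    return len(FOUR_PHASE_EDGES) * (wire_delay_ns + logic_delay_ns)


if __name__ == "__main__":
    print(round_trips_per_word())     # 2
    print(min_time_per_word_ns(1.0))  # 4.0
```

In this model each word costs four wire traversals, whereas a clock-in-data link only needs signals to flow one way, which is the advantage described above.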

--
I should use this sig to advertise my book ISBN-13 : 978-1501515132.
Re:This seems like a good time to meniton these by TechyImmigrant · 2014-08-17 06:22 · Score: 1

>Most power is drawn on transitions.
Most power is drawn in static leakage.
There, fixed that for you. It isn't 1990 any more, when what you said was true.

--
I should use this sig to advertise my book ISBN-13 : 978-1501515132.
Re:This seems like a good time to meniton these by TechyImmigrant · 2014-08-17 06:27 · Score: 1

> Every D-flip flop is an async circuit.
> How's that then? Would you care to explain, please, what you mean?
> My D-FFs here are totally synchronous: The D-input pops up at the output exactly with the rising clock edge + processing delay. And the latter is unavoidably indefinably.
A DFF is the basic element of a synchronous circuit, yes. But look inside a DFF and it's a basic async circuit with two (or 3 or 4) inputs and one (or two) outputs. That's why it requires you to maintain at least a minimum time gap between certain transitions on the inputs.
Don't they teach async design at college these days?

--
I should use this sig to advertise my book ISBN-13 : 978-1501515132.
Re:This seems like a good time to meniton these by udippel · 2014-08-17 07:15 · Score: 1

Yep, 'they' do. Where 'they' includes me. I was afraid that's what you'd thought; and that's fundamentally wrong. Propagation delay and processing time don't render a circuit asynchronous.
Following your logic, any circuit would be asynchronous, if only on the propagation delay induced by any non-infinitely short wire. Then we'd say that the whole world is asynchronous; and that'd be it. When we use these terms, however, it is common understanding that a synchronous circuit is 'guided', clocked, by a central clocking device. And all sub-circuits are controlled by that same device. Likewise, an asynchronous circuit is controlled by the actual occurrence of the signal, without a predefined time frame. Respectively by a pre-cursor (header) in the signal path.
One could as well focus on the number of signals: a synchronous circuit has a clock signal and payload signal(s), while the asynchronous circuit has only one signal: the payload. Possibly with a header.
Re:This seems like a good time to meniton these by TechyImmigrant · 2014-08-18 04:09 · Score: 1

You are conflating asynchronous circuits with asynchronous communication between mutually asynchronous circuits.

--
I should use this sig to advertise my book ISBN-13 : 978-1501515132.

Go vertical! by putaro · 2014-08-16 03:48 · Score: 5, Interesting

Stacking dies or some other form of going from flat to vertical will get you around some of the signaling limits. If you look back at old supercomputer designs there were a lot of neat tricks played with the physical architecture to work around performance problems (for example, having a curved backplane lets you have a shorter bus but more space between boards for cooling). Heat is probably the major problem, but we still haven't gone to active cooling for chips yet (e.g. running cooling tubes through the processor rather than trying to take the heat off the top).

Re:Go vertical! by savuporo · 2014-08-16 03:57 · Score: 1

Ah, Prime Radiant !

--
http://validator.w3.org/check?uri=http%3A%2F%2Fwww.slashdot.org Errors found while checking this document as HTML5!
Re:Go vertical! by Nemyst · 2014-08-16 05:04 · Score: 2

This. It won't be easy, of course not, but there's this entire third dimension we're barely even using right now which would give us an entirely new way to scale up. The possible benefits can already be seen in for instance Samsung's new 3D NAND, where they can get similar density to current SSDs with much larger NAND, thus improving reliability while keeping capacities and without significantly increasing costs. Of course, CPUs generate far more heat than SSDs, but the benefits could be tremendous. If anything, imagine the amount of cores you could cram in the same die area if you could stack them!
Re:Go vertical! by guruevi · 2014-08-16 13:23 · Score: 1

There have been plenty of concept designs and current chips use 3D technology to an extent. The problem IS cooling. On a flat plane, you can simply put a piece of metal on top and it will cool it. Current chips sometimes dissipate close to 200W. With 3D designs, you need to build in the heat transfer (taking up space you can't use for chips or communications) in between and both planes will produce equal amounts of heat so either heat transfer needs to be really, really good or you need a heat sink several times larger than the space you'd save in between the planes.

--
Custom electronics and digital signage for your business: www.evcircuits.com
Re:Go vertical! by InvalidError · 2014-08-16 13:45 · Score: 1

But as you said yourself, CPUs (and GPUs) generate a lot more heat. They are already challenging enough on their own, imagine how hot the CPU or GPU at the middle of the stack would get with all that extra thermal resistance and heat added above and below it. As it is now, CPU manufacturers already have to inflate their die area just to fit all the micro-BGAs under the die and get the heat out.
Unless you find a way to teleport heat out from the middle and possibly bottom of the stack, stacking high-power chips will not work.
At best, you could stack memory and CPU/GPU for faster, wider and lower-power interconnects.
Re:Go vertical! by ChrisMaple · 2014-08-16 13:45 · Score: 1

Pump the coolant through the chip.

--
Contribute to civilization: ari.aynrand.org/donate
Re:Go vertical! by putaro · 2014-08-17 01:50 · Score: 1

Think different!
Maybe instead of stacking the chips, you put one on the bottom and have it double as a backplane and then mount additional dies to it vertically (like itty bitty expansion cards). Then you can get some airflow or other coolant flow in between those vertically mounted dies.
These kinds of funky solutions will only show up when they're cost-effective (that is, absolutely needed). The reason we stick with flat dies (and single die packages) is because it's cheaper to make/mount a single die in a package. However, when the performance is really needed we'll start seeing some innovative solutions.

Alpha Particles by AlecDalek · 2014-08-16 03:49 · Score: 1

So why don't we use Alpha radiation particles?

Re:Alpha Particles by wonkey_monkey · 2014-08-16 06:32 · Score: 1

So how would we use alpha particles?

--
systemd is Roko's Basilisk.
Re:Alpha Particles by AlecDalek · 2014-08-16 11:03 · Score: 1

They work just like electrons, but faster. Heat would be an issue though. And radiation, of course.
Re:Alpha Particles by Mr+Z · 2014-08-16 11:52 · Score: 1

Can you propose a non-stochastic process that produces alpha particles, and a way of constructing a family of logic gates out of that process?

--
Program Intellivision!
Re:Alpha Particles by slew · 2014-08-16 12:02 · Score: 1

So just why would alpha particles (which are basically a helium nucleus consisting of 4 really heavy particles) be somehow faster than electrons (which are much lighter and take less energy to manipulate)?
Another problem is that we aren't currently using free-space electrons either, but electrons in a wave guide (where we lay down conductors to steer the electrons around the circuits we design). Not as easy to do with alpha particles...

can't cross chip in one clock. big deal. by dbc · 2014-08-16 04:01 · Score: 5, Interesting

"Even if signals in the chip were moving at the speed of light, a chip running above 5GHz wouldn't be able to transmit information from one side of the chip to the other." ... in a single clock.

So in the 1980's I was a CPU designer working on what I call "walk-in, refrigerated, mainframes". It was mostly 100K-family ECL in those days and compatible ECL gate arrays. Guess what -- it took most of a clock to get to a neighboring card, and certainly took a whole clock to get to another cabinet. So in the future it will take more than one clock to get across a chip. I don't see how that is anything other than a job posting for new college graduates.

That one statement in the article reminds me of when I first moved to Silicon Valley. Everybody out here was outrageously proud of themselves because they were solving problems that had been solved in mainframes 20 years earlier. As the saying goes: "All the old timers stole all our best ideas years ago."
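A back-of-the-envelope sketch of the arithmetic behind the quoted sentence. Only the 5GHz figure comes from the quote. The 20 mm die edge and the slower-than-light signal fraction are illustrative assumptions, as are the function names.

```python
# How far can a signal get in one clock period?
C_VACUUM_M_PER_S = 299_792_458  # speed of light in vacuum (exact by definition)


def distance_per_clock_mm(freq_hz, fraction_of_c=1.0):
    """Millimetres covered in one clock period at a given fraction of c."""
    return C_VACUUM_M_PER_S * fraction_of_c / freq_hz * 1000.0


def clocks_to_cross(die_edge_mm, freq_hz, fraction_of_c=1.0):
    """Clock periods needed to cross a die edge of the given length."""
    return die_edge_mm / distance_per_clock_mm(freq_hz, fraction_of_c)


if __name__ == "__main__":
    freq = 5e9           # 5GHz, from the quoted sentence
    die_edge_mm = 20.0   # ASSUMPTION: illustrative die edge length
    slow_fraction = 0.1  # ASSUMPTION: illustrative on-chip signal speed, 0.1c

    print(distance_per_clock_mm(freq))                        # roughly 60 mm at c
    print(clocks_to_cross(die_edge_mm, freq))                 # under one clock at c
    print(clocks_to_cross(die_edge_mm, freq, slow_fraction))  # several clocks at 0.1c
```

By this arithmetic, light in a vacuum covers about 60 mm per 5GHz clock period, so crossing a die only takes more than one clock once the effective signal speed is well below c. Either way the consequence is the one described above: more than one clock to get across, which is a pipelining problem rather than a wall.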

Re:can't cross chip in one clock. big deal. by Rockoon · 2014-08-16 04:27 · Score: 5, Interesting

Even more obvious is that even today's CPUs don't perform any calculation in a single clock cycle. The distances involved only affect latency, not throughput. The fact that a simple integer addition operation has a latency of 2 or 3 clock cycles doesn't prevent the CPU from executing 3 or more of those additions per clock cycle.

Even AMD's Athlon designs did that. Intel's latest offerings can be coerced into executing 5 operations per cycle that are each 3 cycle latency, and that's on a single core with no SIMD.

It's not how quickly the CPU can produce a value.. it's how frequently the CPU can retire(*) instructions.

(*) That's actually a technical term.
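The latency versus throughput distinction can be sketched with an idealized pipeline model. The 3-cycle latency and 3-wide issue are the figures claimed in this comment; the model itself (fully pipelined units, no hazards, no stalls) and the function names are simplifying assumptions.

```python
import math

# Idealized model: fully pipelined execution units, no hazards or stalls.


def cycles_independent(n_ops, latency, issue_width):
    """Cycles to finish n independent ops when issue_width start per cycle."""
    if n_ops == 0:
        return 0
    # The last group issues at cycle ceil(n/width) - 1 and then
    # needs `latency` more cycles to complete.
    return latency + math.ceil(n_ops / issue_width) - 1


def cycles_dependent(n_ops, latency):
    """Cycles when each op needs the previous result (a serial chain)."""
    return n_ops * latency


if __name__ == "__main__":
    n, latency, width = 300, 3, 3
    indep = cycles_independent(n, latency, width)
    dep = cycles_dependent(n, latency)
    print(indep, n / indep)  # 102 cycles, close to 3 ops per cycle
    print(dep, n / dep)      # 900 cycles, one op every 3 cycles
```

With independent work the latency is paid once and the rate approaches the issue width. With a dependent chain the latency is paid on every operation.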

--
"His name was James Damore."
Re:can't cross chip in one clock. big deal. by AchilleTalon · 2014-08-16 04:28 · Score: 3, Informative

Well, clearly moving mainframe people to OS/2 development wouldn't have been such a great idea. The mainframe segment was much more profitable than the PC segment, where the profit margins are so thin that IBM decided to sell the whole division to Lenovo. The money is elsewhere.
And do not forget memory management had to be reinvented because there were IP rights on the MVS algorithms that IBM wasn't willing to transfer to OS/2. In those old times, the PC market and mid-range market were perceived as a threat by the big mainframe guys at IBM, who were still the guys at the top of the hierarchy. The technical side is just the lesser part of this problem.

--
Achille Talon
Hop!
Re:can't cross chip in one clock. big deal. by Zero__Kelvin · 2014-08-16 04:49 · Score: 3

I think the informed among us can agree, this whole article combines a special lack of imagination, misunderstanding of physics, and a complete lack of understanding of how computers work, in order to come up with a ridiculous article that sounds like it was written by chicken little :-)

--
Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
Re:can't cross chip in one clock. big deal. by Splab · 2014-08-16 05:02 · Score: 2

Are you saying electrons were moving slower in the 80's?
Re:can't cross chip in one clock. big deal. by AchilleTalon · 2014-08-16 05:48 · Score: 1

There is no such constant in physics as the speed of the electron. The speed of the electron depends on the medium it is travelling in as well as the force applied to it. That's why the electron's speed is not the same in an old CRT monitor as in the LEP (Large Electron-Positron Collider, the ancestor of the LHC in Geneva).

--
Achille Talon
Hop!
Re:can't cross chip in one clock. big deal. by Splab · 2014-08-16 06:55 · Score: 1

Erm, well true, but same goes for light, yet we speak about the speed of light as a constant...
The point I was trying to make (obviously Slashdot of old has gone away, so I guess you need to pencil it out in stone) was that the guy is claiming a clock cycle took ages to propagate through the systems, which tells us he has no idea what was and is going on in a computer. Syncing a clock across several huge monolithic machines back then was easy, because a clock cycle was happening almost at a walking pace. Going to 5GHz is an entirely different beast, as you are now dealing with the limits of physics.
Re:can't cross chip in one clock. big deal. by TechyImmigrant · 2014-08-16 07:13 · Score: 1

> The fact that a simple integer addition operation has a latency of 2 or 3 clock cycles doesnt prevent the CPU from executing 3 or more of those additions per clock cycle.
That's just wrong. It doesn't take three clock periods to propagate through an adder on today's silicon unless it's a particularly huge adder. It might take several cycles for an add instruction to propagate through a CPU pipeline, but that is completely different.

--
I should use this sig to advertise my book ISBN-13 : 978-1501515132.
Re:can't cross chip in one clock. big deal. by TechyImmigrant · 2014-08-16 07:14 · Score: 1

The speed of the electrons is immaterial. The speed of the electric field in the wires is what matters.
The electrons move really slowly.

--
I should use this sig to advertise my book ISBN-13 : 978-1501515132.
Re:can't cross chip in one clock. big deal. by Zero__Kelvin · 2014-08-16 08:07 · Score: 1

No, you're missing the point. What happens, for example, when quantum computing makes the transition from bleeding edge to mundane? Nobody knows, but it's safe to say that this guy doesn't either. There is plenty of room for improvement even with current methods and understanding of physics, even if one makes the mistake of ruling out parallel advancements.

--
Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
Re:can't cross chip in one clock. big deal. by Mr+Z · 2014-08-16 11:51 · Score: 1

Sure, throughput is what matters most for operations you can parallelize. However, as Amdahl's Law cruelly reminds us, there's always parts of the problem that remain serial, and they'll put an upper bound on performance. You can't parallelize the traversal of a linked list, no matter how hard you try. You have to invent new algorithms and programming techniques. (In the specific case of linked lists, there are other options that trade space for efficiency, such as skiplists.)
Gustafson's Law does offer some hope: As we build more capable machines, we'll tackle bigger problems to utilize those machines. That's how we're able to, for example, get wireless data speeds on our cell phones operating on batteries that would make wired modem users of just 10-15 years ago jealous.
But, Gustafson's Law only serves as a counterpoint to Amdahl's Law to the extent that you tackle bigger problems, as opposed to trying to reduce current problems to take less time and energy.
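The two laws can be put side by side in a short sketch using their standard textbook forms. The 95% parallel fraction and the processor counts are illustrative assumptions, not measurements.

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Amdahl's Law: fixed problem size, speedup from n processors."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_processors)


def gustafson_speedup(parallel_fraction, n_processors):
    """Gustafson's Law: problem size grows with n (scaled speedup)."""
    serial = 1.0 - parallel_fraction
    return serial + parallel_fraction * n_processors


if __name__ == "__main__":
    p = 0.95  # ASSUMPTION: 95% of the work parallelizes
    for n in (1, 8, 64, 1024):
        print(n, round(amdahl_speedup(p, n), 2), round(gustafson_speedup(p, n), 2))
    # Amdahl can never exceed 1 / (1 - p), which is 20x here,
    # while Gustafson's scaled speedup keeps growing with n.
```

The Amdahl column flattens out below 20x no matter how many processors are added, while the Gustafson column keeps climbing, but only because the problem is assumed to grow along with the machine.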

--
Program Intellivision!
Re:can't cross chip in one clock. big deal. by K.+S.+Kyosuke · 2014-08-16 12:06 · Score: 1

I think that's exactly why Guy L. Steele, Jr. made that whole presentation on how you shouldn't be using linked lists (or anything accumulator-style, really).

--
Ezekiel 23:20
Re:can't cross chip in one clock. big deal. by lister+king+of+smeg · 2014-08-16 12:07 · Score: 1

That's because we talk about the speed of light - in a vacuum - which is a constant....
...absent large gravitational fields.

--
---Saying gnome 3 is better than windows 8 not so much a compliment as it is damning with light praise.
Re:can't cross chip in one clock. big deal. by K.+S.+Kyosuke · 2014-08-16 12:10 · Score: 1

> The speeds of the electrons is immaterial.
Outside of the transistor channels, that is. :-)

--
Ezekiel 23:20
Re:can't cross chip in one clock. big deal. by InvalidError · 2014-08-16 13:51 · Score: 1

Maybe he was talking FADD.
In a float addition, you need to denormalize the inputs, do the actual addition and then normalize the output. Three well-defined pipelining steps, each embodying one distinct step of the process.
Re:can't cross chip in one clock. big deal. by udippel · 2014-08-16 23:59 · Score: 1

Having no mod points, I can only nod points.
Re:can't cross chip in one clock. big deal. by udippel · 2014-08-17 00:06 · Score: 1

The best you AC know about is proverbs. An electrical engineer you are not, and neither a physicist. Your ideas on crosstalk and interference are basically vague.
Re:can't cross chip in one clock. big deal. by udippel · 2014-08-17 00:11 · Score: 1

> Sure, throughput is what matters most for operations you can parallelize. However, as Amdahl's Law cruelly reminds us, there's always parts of the problem that remain serial, and they'll put an upper bound on performance. You can't parallelize the traversal of a linked list, no matter how hard you try. You have to invent new algorithms and programming techniques. (In the specific case of linked lists, there are other options that trade space for efficiency, such as skiplists.)
Not to forget, Amdahl's law is pure calculation, not speculation like Moore's law. And you are right, certain items simply remain serial.
Re:can't cross chip in one clock. big deal. by TechyImmigrant · 2014-08-17 06:21 · Score: 1

No, he wasn't. From his post: "simple integer addition"

--
I should use this sig to advertise my book ISBN-13 : 978-1501515132.

what diminishing returns? by edxwelch · 2014-08-16 04:10 · Score: 2

Each semiconductor node shrink is faster and more power efficient than the previous. For instance, TSMC 20nm process is 30% higher speed, or 25% less power than 28nm. Likewise, 16nm will provide a 60% power saving over 20nm.

Re:what diminishing returns? by phantomfive · 2014-08-16 05:13 · Score: 1

The summary is so confused, the only explanation I can think of is that it doesn't reflect what is in the article. Which is behind a paywall, so I won't be reading it any time soon.

--
"First they came for the slanderers and i said nothing."
Re:what diminishing returns? by Anonymous Coward · 2014-08-16 05:32 · Score: 1

Something called leakage grows as process size goes down.
For 40nm, leakage was around 1 to 4% depending on the process variant chosen.
For 28nm, it jumped to 5 to 10%.
For 20nm, it is around 20 to 25%. This means that just turning a circuit on and doing nothing (0 MHz) adds to the power consumption.
Re:what diminishing returns? by TechyImmigrant · 2014-08-16 07:16 · Score: 1

If you have a crappy old planar process maybe.

--
I should use this sig to advertise my book ISBN-13 : 978-1501515132.
Re:what diminishing returns? by edxwelch · 2014-08-16 08:06 · Score: 1

Not really, because the tech is improving all the time. At 20nm they have high-k metal gates and at 16nm FinFETs.

So what by Anonymous Coward · 2014-08-16 04:13 · Score: 1

You don't need to constantly shrink everything. My computer is about 2 feet tall and wide. I don't care if it's a couple more inches in any direction. Make a giant processor that weighs 20 pounds.

Re:So what by gl4ss · 2014-08-16 05:03 · Score: 1

shrinking can allow for higher speed.
that's what makes this article sound dumb just by the blurb(..that it takes x amount of time to get to the other side of the chip and thus the chip can't run faster bullcrap).
I mean, current overclocking records are way, way, wayyy over 5ghz. so what is the point?

--
world was created 5 seconds before this post as it is.
Re:So what by ledow · 2014-08-16 05:43 · Score: 3, Informative

Nobody says 5GHz is impossible. Read it.
It says that you can't traverse the entire chip while running at 5GHz. Most operations don't - why? Because the chips are small and any one set of instructions tends to operate in a certain smaller-again area.
What they are saying is that chips will no longer be synchronous - if chips get any bigger, your clock signal takes too long to traverse the entire length of the chip and you end up with different parts of the chips needing different clocks.
It's all linked. The size of the chip can get bigger and still pack in the same density, but then the signals get more out of sync, the voltages have to be higher, the traces have to be straighter, the routing becomes more complicated, and the heat will become higher. Oh, and you'll have to have parts of it "go dark" to avoid overheating neighbours, etc. This is exactly what the guy is saying.
At some point, there's a limit at which it's cheaper and easier to just have a bucket load of synchronous-clock chips tied together loosely than one mega-processor trying to keep everything ticking nicely.
And current overclocking records are only around 8GHz. Nobody says you can't make a processor operating at 10THz if you want. The problem is that it has to be TINY and not do very much. Frequency, remember, is high in anything dealing with radio - your wireless router can do some things at 5GHz and, somewhere inside it, is an oscillator doing just that. But not the SAME kinds of things as we expect modern processors to do.
Taking into account that most of those overclocking benchmarks probably operate in small areas of the silicon, are run in mineral oil or similar and are the literal speed of a benchmark over a complicated chip that ALREADY takes account that signals take so long that clocks can get out of sync across the chip, we don't have much leeway at all. We hit a huge wall at 2-3GHz and that's where people are tending to stay despite it being - what, a decade or more? - since the first 3GHz Intel chip. We add more processors and more cores and more threading but pretty much we haven't got "faster" over the last decade, we're just able to have more processors at that speed.
No doubt we can push it further, but not forever, and not with the kind of on-chip capabilities you expect now.
With current technology (i.e. no quantum leaps of science making their way into our processors), I doubt you'll ever see a commercially available 10GHz chip that'll run Windows. Super-parallel machines running at a fraction of that but delivering more gigaflops - yeah - but basic core sustainable frequency? No.
Re:So what by anarcobra · 2014-08-16 10:02 · Score: 1

The reason processors are small is mostly due to yield. Silicon wafers have more or less a constant amount of defects per unit of area. What this means is that the larger your chip is, the lower the number of working processors you end up with. The smaller the chip the more working processors you end up with per wafer.
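The yield argument can be sketched with the simple Poisson yield model, a common textbook simplification rather than what any particular fab uses. The defect density, die areas and wafer size below are illustrative assumptions, and the gross die count ignores edge loss.

```python
import math


def poisson_yield(die_area_cm2, defects_per_cm2):
    """Fraction of dies with zero defects under a Poisson defect model."""
    return math.exp(-die_area_cm2 * defects_per_cm2)


def good_dies_per_wafer(die_area_cm2, defects_per_cm2, wafer_diameter_cm=30.0):
    """Expected working dies per wafer (ignores edge loss and scribe lines)."""
    wafer_area = math.pi * (wafer_diameter_cm / 2.0) ** 2
    gross_dies = int(wafer_area // die_area_cm2)
    return gross_dies * poisson_yield(die_area_cm2, defects_per_cm2)


if __name__ == "__main__":
    defect_density = 0.1  # ASSUMPTION: 0.1 defects per cm^2, illustrative only
    for area in (1.0, 2.0, 4.0):
        y = poisson_yield(area, defect_density)
        good = good_dies_per_wafer(area, defect_density)
        print(area, round(y, 3), round(good))
```

Under these assumptions, quadrupling the die area cuts the number of working dies per wafer by more than a factor of four, because fewer dies fit and a larger share of them catch a defect.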
Re:So what by AK+Marc · 2014-08-19 14:28 · Score: 1

async is still synchronous. You would have the region close to the clock input running at C+0. On the other side of the chip, you'd be running with a clock at C+0.9. Where the sections converged, the clocks would also converge. Two related synchronous functions (even off the same clock) are not async just because they are not synchronous.

My words fail me. The operation is clocked. That the clock doesn't happen at the same time everywhere doesn't change the nature of the operation being clocked. And all of them from the same clock.

Also, hard problems often have simple solutions. The clock doesn't propagate across the chip? Then send it to all the chip at the same time. Car analogy. Shortest-path headers are inefficient. So you "tune" the headers. How do you do that? You change the path so they are closer to equal distance.

For a chip, a clock cycle that's exactly one cycle late is perfect. So drift is more important to minimize than lag/delay. So run the clock to the middle of the chip, then equidistant traces to multiple pins. The clock will be the same in as many places in the chip as necessary for proper synchronous operation. Even if the clock speed wasn't enough to cover 10% of the chip, it can still be the same +-5% over the whole chip. It'll just take more pins for clock.

--
Learn to love Alaska

Lightspeed by AJWM · 2014-08-16 04:22 · Score: 1

Yet another reason to find a way around the speed of light.

Actually I've always said (jokingly) that if anyone does find a way to go FTL, it'll be the computer chip manufacturers. In fact Brad Torgersen and I had a story to that effect in Analog magazine a couple of years ago, "Strobe Effect".

--
-- Alastair

Re:Lightspeed by cavreader · 2014-08-16 05:02 · Score: 1

Master and ultimately harness quantum entanglement as it pertains to quantum computing, and the limits we face right now go right out the window.

Re:Random Title by blue+trane · 2014-08-16 04:43 · Score: 1

Didn't you get the memo? Hemp seeds are better than graphene. Plus you can get high while growing the seeds.

Density limit - not computational limit by gman003 · 2014-08-16 05:13 · Score: 2

Congratulations, you identified the densest possible circuits we can make. That doesn't even give an upper bound to Moore's Law, let alone an upper bound to performance.

Moore's Law is "the number of transistors in a dense integrated circuit doubles every two years". You can accomplish that by halving the size of the transistors, or by doubling the size of the chip. Some element of the latter is already happening - AMD and Nvidia put out a second generation of chips on the 28nm node, with greatly increased die sizes but similar pricing. The reliability and cost of the process node had improved enough that they could get a 50% improvement over the last gen at a similar price point, despite using essentially the same transistor size.

You could also see more fundamental shifts in technology. RSFQ seems like a very promising avenue. We've seen this sort of thing with the hard drive -> SSD transition for I/O bound problems. If memory-bound problems start becoming a priority (and transistors get cheap enough), we might see a shift back from DRAM to SRAM for main memory.

So yeah, the common restatement of Moore's Law as "computer performance per dollar will double every two years" will probably keep running for a while after we hit the physical bounds on transistor size.

Re:Density limit - not computational limit by slew · 2014-08-16 12:30 · Score: 3, Informative

Moore's Law is "the number of transistors in a dense integrated circuit doubles every two years". You can accomplish that by halving the size of the transistors, or by doubling the size of the chip. Some element of the latter is already happening - AMD and Nvidia put out a second generation of chips on the 28nm node, with greatly increased die sizes but similar pricing. The reliability and cost of the process node had improved enough that they could get a 50% improvement over the last gen at a similar price point, despite using essentially the same transistor size.
Bad example, the initial yield on 28nm was so bad that the initial pricing was hugely impacted by wafer shortages. Many fabless customers reverted to the 40nm node to wait it out. TSMC eventually got things sorted out so now 28nm has reasonable yields.
Right now, the next node is looking even worse. TSMC isn't counting on the yield-times-cost of their next gen process to *ever* get to the point when it crosses over 28nm pricing per transistor (for typical designs). Given that reality, it will likely only make sense to go to the newer processes if you need its lower-power features, but you will pay a premium for that. The days of free transistors with a new node appear to be numbered until they make some radical manufacturing breakthroughs to improve the economics (which they might eventually do, but it currently isn't on anyone's roadmap down to 10nm). Silicon architects need to now get smarter, as they likely won't have many more transistors to work with at a given product price point.

If memory-bound problems start becoming a priority (and transistors get cheap enough), we might see a shift back from DRAM to SRAM for main memory.
Given the above situation, and that fast SRAMs tend to be quite a bit larger than fast DRAMs (6T vs 1T+C) and the basic fact that the limitation is currently the interface to the memory device, not the memory technology, a shift back to SRAM seems mighty unlikely.
The next "big-thing" in the memory front is probably WIDEIO2 (the original wideio1 didn't get many adopters). Instead of connecting an SoC (all processors are basically SoCs these days) to a DRAM chip, you put the DRAM and SoC in the same package (either stacked with through silicon vias or side-by-side in a multi-chip package). Since the interface doesn't need to go on the board, you can have many more wires to connect the two, and each wire will have lower capacitance which will increase the available bandwidth to the memory device.
Re:Density limit - not computational limit by gman003 · 2014-08-16 13:07 · Score: 1

Odd that TSMC is so pessimistic, because Intel claims their 22nm node was their most high-yield ever, and even their 14nm yield is pretty high for this early in development. Perhaps the multi-gate FinFETs helped? I know TSMC is planning FinFET for 16nm later this year. That's not a "radical manufacturing breakthrough" but it is a pretty substantial change that could change their yields considerably.

Re:Limits of physics by phantomfive · 2014-08-16 05:17 · Score: 1

President Romney agrees with you too.

I see you are from the reality where the Republican Senate repealed the laws of physics. The time-space continuum is altering already.

--
"First they came for the slanderers and i said nothing."

Re:You CAN break the laws of physics by Anonymous Coward · 2014-08-16 05:32 · Score: 1

Right, which is why we live in the leisure society with 10 hour workweeks, everyone has a flying car, a Star Trek replicator and personal warp drive space ships.

You are clueless. You live in a bubble of technology created by people infinitely smarter than you and you are happy with comic-book levels of understanding.

Re: Lightfoot by fzlotnick · 2014-08-16 05:33 · Score: 4, Informative

The speed of light is approximately .3 X 10^8 m. Per sec in a vacuum. It's about half as fast in a semiconductor like silicon. So closer to 6 inches. Nearly all chips are less than one inch. Even if this were not the case, that would not be an upper limit, data does not have to reach the end of the chip before the next clock cycle. This is an example of the author having a bit of knowledge ( erroneous, as you point out) and extrapolating an incorrect answer.

Re:Lightfoot by philip.paradis · 2014-08-16 05:33 · Score: 1

Please see propagation delay.

--
Write failed: Broken pipe

Unconventional architectures and quantum computing by earthforce_1 · 2014-08-16 06:05 · Score: 1

I see increasing emphasis in the future on unconventional architectures to solve certain problems
http://www.research.ibm.com/ar...
http://en.wikipedia.org/wiki/Q...

and a little further into the future, single molecule switches and gates.
http://en.wikipedia.org/wiki/M...

We have a ways to go, but at some point we are going to have to say bye-bye to the conventional transistor.

--
My rights don't need management.

Re:Reminds me of Lord Kelvin... by wonkey_monkey · 2014-08-16 06:37 · Score: 1

As Einstein showed, yes things are relative.

"Things," eh? Any particular "things"?

He also showed that one particular thing was absolute, if you recall.

--
systemd is Roko's Basilisk.

What we need for efficiency by hAckz0r · 2014-08-16 06:45 · Score: 1

The human brain is a marvel of technology. Brain waves move through it as waves of activity. It only consumes (most) energy where the wave of intensified activity is passing through it. If a 3d circuit could be made to sense when a signal is incoming then it could be more efficient. In this paradigm it's not 1's and 0's, but rather circuit on vs circuit off. In addition, if you could turn those on/off cycles into charge pump circuits then you could essentially recycle part of that charge and reuse it in a cascade-like or layered circuit. I believe Sun Micro was working on one such design, but the cost benefits were not there at the time to make it to production. Things have changed.

Re:What we need for efficiency by janvo · 2014-08-16 14:41 · Score: 1

IBM is working on something like this, a 'neuromorphic' chip: http://www.nytimes.com/2014/08...

Re:Duplicate article, ignore. by leonardluen · 2014-08-16 06:49 · Score: 1

remember 512k is good enough for everyone.

we aren't quite there yet...

Re: Lightfoot by TechyImmigrant · 2014-08-16 06:53 · Score: 1

Right. There's no way you'd run a signal across a one inch chip and expect to get anything useful out the other end.

In days of yore, the signal would be buffered a few times.

These days it would pass through 5 clock domains and power boundaries and so have to be rebuffered, resynchronized, levelshifted and firewalled at each stage. But this is normal and we do it all the time.

--
I should use this sig to advertise my book ISBN-13 : 978-1501515132.

Re:Lightfoot by HiThere · 2014-08-16 08:05 · Score: 1

There is also the assumption that the chip structure is 2-D. This is already not totally true, though there are tremendous heat problems as you start stacking layers. This is one of the attractions of "spintronics"...state can be switched with less heat.

--

I think we've pushed this "anyone can grow up to be president" thing too far.

Power use is NOT proportional with voltage by Anonymous Coward · 2014-08-16 08:07 · Score: 1

Maybe Markov should go back to school.. Power use is modeled as voltage squared, not as proportional.
Apologies to Markov if it is just the summary that is wrong.

Re:Power use is NOT proportional with voltage by Mr+Z · 2014-08-16 11:11 · Score: 1

That's true for active power. (V^2/R). For leakage power, it's even worse. That looks closer to exponential. I've seen chips for which leakage accounted for close to half the power budget.
Supposedly FinFET /Tri-gate will help dramatically with leakage. We'll see.
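
For the quadratic part, here is a minimal sketch using the usual textbook switching-power model, P = activity * C * V^2 * f. That formula is swapped in for illustration (it is not the V^2/R form above), and the capacitance and frequency values are invented:

```python
def dynamic_power_w(capacitance_f, voltage_v, frequency_hz, activity=1.0):
    """Textbook CMOS switching power: activity * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


# Illustrative numbers only: 1 nF of switched capacitance at 1 GHz.
full = dynamic_power_w(1e-9, 1.0, 1e9)
half = dynamic_power_w(1e-9, 0.5, 1e9)

# Halving the voltage cuts switching power to a quarter, not a half.
print(f"power at 1.0 V: {full:.3f} W")
print(f"power at 0.5 V: {half:.3f} W (ratio {half / full:.2f})")
```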

--
Program Intellivision!

Is that really correct?? by udippel · 2014-08-16 09:02 · Score: 1

Power use is proportional to the chip's operating voltage, and transistors simply cannot operate below a 200 milli-Volt level

Wow. To me it is like P~U^2. So proportional, but not linear.
And where would that 200 mV level come from? In my understanding it depends very much on the semiconductor used.

Re:Is that really correct?? by slew · 2014-08-16 12:51 · Score: 1

200mV likely comes from a generic analysis of CMOS on Silicon wafer oxide assuming you don't want a leakage factor more than 50% the current (most of which comes from the subthreshold conduction current) and you don't do any weird body-biasing techniques (which would consume lots of circuit area). It isn't a hard number but a general ballpark. Since everyone is scaling down the supply voltage, we must also scale down the threshold voltage and then the amount a signal is below the threshold voltage when you are 'off' is now conducting at a level that is a significant fraction of when it is 'on' meaning the device no longer has much noise margin to work reliably at all.
Of course with different insulators and conduction semiconductors and transistor types and tolerance for leakage and reliability this will be different.
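
A crude way to see why a low threshold voltage hurts: below threshold, the current drops by about one decade per "subthreshold swing" of gate voltage. The sketch below assumes a swing of 80 mV/decade (the ideal room-temperature limit is around 60 mV/decade) and only compares the current at zero gate voltage with the current at threshold. It is a ballpark model, not the analysis behind the 200mV figure:

```python
def off_current_fraction(threshold_mv, swing_mv_per_decade=80.0):
    """Current at Vgs = 0 relative to the current at Vgs = Vth.

    Assumes a simple exponential subthreshold characteristic with a
    constant swing; real devices add other leakage mechanisms.
    """
    return 10 ** (-threshold_mv / swing_mv_per_decade)


for vth in (400, 320, 240, 160, 80):
    print(f"Vth = {vth:3d} mV -> off current is {off_current_fraction(vth):.1e} "
          "of the at-threshold current")
```

Each 80 mV shaved off the threshold costs a factor of ten in off-state leakage under these assumptions.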

Seems simple enough by jd · 2014-08-16 09:11 · Score: 1

You need single isotope silicon. Silicon-28 seems best. That will reduce the number of defects, thus increasing the chip size you can use, thus eliminating chip-to-chip communication, which is always a bugbear. That gives you effective performance increase.

You need better interconnects. Copper is way down on the list of conducting metals for conductivity. Gold and silver are definitely to be preferred. The quantities are insignificant, so price isn't an issue. Gold is already used to connect the chip to outlying pins, so metal softness isn't an issue either. Silver is trickier, but probably solvable.

People still talk about silicon-on-insulator and stressed silicon as new techniques. After ten bloody years? Get the F on with it! These are the people who are breaking Moore's Law, not physics. Drop 'em in the ocean for a Shark Week special or something. Whatever it takes to get people to do some work!

SoI, since insulators don't conduct heat either, can be made back-to-back, with interconnects running through the insulator. This would give you the ability to shorten distances to compute elements and thus effectively increase density.

More can be done off-cpu. There are plenty of OS functions that can be shifted to silicon, but where the specialist chips have barely changed in years, if not decades. If you halve the number of transistors required on the CPU for a given task, you have doubled the effective number of transistors from the perspective of the old approach.

Finally, if we dump the cpu-centric view of computers that became obsolete the day the 8087 arrived (if not before), we can restructure the entire PC architecture to something rational. That will redistribute demand for capacity, to the point where we can actually beat Moore's Law on aggregate for maybe another 20 years.

By then, hemp capacitors and memristors will be more widely available.

(Heat is only a problem for those still running computers above zero Celsius.)

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)

Re:Seems simple enough by ultranova · 2014-08-16 11:34 · Score: 1

Finally, if we dump the cpu-centric view of computers that became obsolete the day the 8087 arrived (if not before), we can restructure the entire PC architecture to something rational. That will redistribute demand for capacity, to the point where we can actually beat Moore's Law on aggregate for maybe another 20 years.

Please explain how your vision is different from, say, OpenCL?

--
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
Re:Seems simple enough by slew · 2014-08-16 13:26 · Score: 1

Single isotope silicon? Silicon wafers surfaces (where the transistors are) are generally doped with ions using diffusion and etched, and the most serious defects are usually parametric due to patterning issues. We've got a long way to go before actual isotope purity is going to be a limiting factor...
Conductivity of gold vs copper? Copper is a better conductor than gold (although silver is a better conductor than both of them). The reason that gold is used for *connections* is that it is more malleable than copper allowing it to make a more robust physical connection. For conduction, copper or silver is much better. The reason that silver isn't used today is that the processes needed to etch it are in their infancy. Also, there's a reluctance to go there, because it's known that silver is much more prone to electro-migration issues than copper and the gain in conductivity is relatively small (compared to the step between aluminum and copper).
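
To put rough numbers on that ranking, here is a small sketch using approximate bulk resistivities at about 20 C. These are handbook-style values quoted from memory, and on-chip wires at nanometre widths behave worse than bulk metal, so treat them as ballpark only:

```python
# Approximate bulk resistivities at about 20 C, in ohm-metres (ballpark values;
# thin on-chip wires behave worse than bulk metal).
RESISTIVITY_OHM_M = {
    "silver": 1.59e-8,
    "copper": 1.68e-8,
    "gold": 2.44e-8,
    "aluminum": 2.65e-8,
}


def improvement(old_metal, new_metal):
    """Fractional drop in resistance when swapping old_metal for new_metal."""
    old = RESISTIVITY_OHM_M[old_metal]
    new = RESISTIVITY_OHM_M[new_metal]
    return (old - new) / old


ranking = sorted(RESISTIVITY_OHM_M, key=RESISTIVITY_OHM_M.get)
print("best to worst conductor:", ranking)
print(f"aluminum -> copper: {improvement('aluminum', 'copper'):.0%} lower resistance")
print(f"copper -> silver:   {improvement('copper', 'silver'):.0%} lower resistance")
```

The aluminum-to-copper step is several times larger than the copper-to-silver one, which fits the point about silver's gain being relatively small.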
Also, silicon on insulator isn't w/o problems. The main problem today is the floating-body problem**, which will likely render SOI impractical for devices that need high frequency switching (e.g., a CPU) in future process nodes. This is why after a brief commercial taste of this technology, many companies are moving away from it except for specialty products (like Z-ram).
Also, if your assertion is true that insulators don't conduct heat (very well), now you can't get heat away from one hemisphere of your circuit (instead of heat conducting up and down). Wouldn't that tend to make more problems than it solves?
**As I understand it, in bulk silicon, there is a leakage path that bleeds the charge away, but on an insulator, the body of a transistor is effectively a large parasitic capacitor. Failure to fully discharge the transistor body after switching creates somewhat of a memory effect, limiting performance and potentially causing parasitic transistors to drain floating nodes (e.g., in latches and xor-gates) in a way that depends on the operational sequence and is influenced by neighboring transistors. This makes it hard to margin for and may ultimately make it unworkable to obtain the tolerances needed for high speed design.
Re:Seems simple enough by ChrisMaple · 2014-08-16 14:10 · Score: 1

The most recent IC transistors - FinFETs and the like - have the control element (gate electrode) on three of the four sides of the gate. The gate-to-substrate region is rather a small part of the gate surface, compared to processes a decade ago; this should reduce the magnitude of the floating-body problem.
Alas, my knowledge of this is becoming obsolete, so I could easily be wrong.

--
Contribute to civilization: ari.aynrand.org/donate
Re:Seems simple enough by jd · 2014-08-16 15:48 · Score: 1

OpenCL is highly specific in application. Likewise, RDMA and Ethernet Offloading are highly specific for networking, SCSI is highly specific for disks, and so on.
But it's all utterly absurd. As soon as you stop thinking in terms of hierarchies and start thinking in terms of heterogeneous networks of specialized nodes, you soon realize that each node probably wants a highly specialized environment tailored to what it does best, but that for the rest, it's just message passing. You don't need masters, you don't need slaves. You need bus switches with a bit more oomph (they'd need to be bidirectional, support windowing and handle multipath routing where shortest route may be congested).
Above all, you need message passing that is wholly target-independent since you've no friggin' clue what the target will actually be in a heterogeneous environment.

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Re:Seems simple enough by ultranova · 2014-08-17 01:37 · Score: 1

OpenCL is highly specific in application. Likewise, RDMA and Ethernet Offloading are highly specific for networking, SCSI is highly specific for disks, and so on.

Well, since the CPU already specializes in general-purpose serial computation, other nodes in a heterogenous environment must logically specialize for either generic parallel computation or specific applications, otherwise you have just plain old SMP.

But it's all utterly absurd. As soon as you stop thinking in terms of hierarchies and start thinking in terms of heterogeneous networks of specialized nodes, you soon realize that each node probably wants a highly specialized environment tailored to what it does best, but that for the rest, it's just message passing. You don't need masters, you don't need slaves. You need bus switches with a bit more oomph (they'd need to be bidirectional, support windowing and handle multipath routing where shortest route may be congested).

That describes neither how this heterogenous network of equal nodes would function (how do you dispatch tasks to nodes without the dispatching node becoming de facto master) nor what advantage it would have over current model (heterogenous network of nodes with some specializing in overall control). In fact it sounds a lot like buzzword bingo.

Above all, you need message passing that is wholly target-independent since you've no friggin' clue what the target will actually be in a heterogeneous environment.

You mean like the extension card mechanism PCs have had from the very beginning? Also, SATA seems to be remarkably uncaring of whether the device on the other end stores information on spinning disks or in electric capacitors.

--
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
Re:Seems simple enough by jd · 2014-08-17 08:02 · Score: 1

Let's start with basics. Message-passing is not master-slave because it can be instigated in any direction. If you look at PCI Express 2.1, you see a very clear design - nodes at the top are masters, nodes at the bottom are slaves, masters cannot talk to masters, slaves cannot talk with slaves, only devices with bus master support can be masters. Very simple, totally useless.
Ok, what specifically do I mean by message passing? I mean, very specifically, a non-blocking, asynchronous routable protocol that contains an operation and a data block as an operand (think: microkernels, MPI-3). If you're clever, the operand is self-describing (think: CDF) because that lets you have overloaded functions.
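
As a toy illustration of that definition (an operation plus a self-describing operand, delivered asynchronously), here is a minimal sketch. Everything in it is hypothetical: the `Message` fields, the type tags and the handler table are invented for the example, and an in-process queue stands in for whatever routable fabric the real thing would use:

```python
import asyncio
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Message:
    op: str        # the operation requested
    type_tag: str  # self-description of the operand
    payload: Any   # the data block itself


# The same operation name resolves to different code depending on the
# operand's type tag, which is the "overloaded functions" idea.
HANDLERS = {
    ("add", "int_pair"): lambda p: p[0] + p[1],
    ("add", "vector_pair"): lambda p: [a + b for a, b in zip(*p)],
}


async def node(inbox, outbox):
    """A node that handles whatever arrives; it has no master."""
    while True:
        msg = await inbox.get()
        if msg is None:  # sentinel: shut the node down
            break
        handler = HANDLERS[(msg.op, msg.type_tag)]
        await outbox.put(handler(msg.payload))


async def demo():
    inbox, outbox = asyncio.Queue(), asyncio.Queue()
    worker = asyncio.create_task(node(inbox, outbox))
    # put_nowait does not block the sender while the node works.
    inbox.put_nowait(Message("add", "int_pair", (2, 3)))
    inbox.put_nowait(Message("add", "vector_pair", ([1, 2], [10, 20])))
    inbox.put_nowait(None)
    await worker
    return [outbox.get_nowait(), outbox.get_nowait()]


results = asyncio.run(demo())
print(results)
```

Routing, multipath and congestion handling are left out entirely; the sketch only shows the dispatch-on-operand-type part.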
The CPU is a bit naff, really. I mean, at least some operations can be pushed into a Processor In Memory, you have a fancy coprocessor for maths that you're repeatedly (and expensively) calling to create the functions that exist as a limited subset in FFTW, BLAS and LAPACK. Put all three, in optimized form, along with your basic maths operations into a larger piece of silicon. Voila, massive speed boost.
But now let's totally eliminate the barrier between graphics, sound and all other processors. Instead of limited communications channels and local memory, have distributed shared memory (DSM) and totally free communication between everything. Thus, memory can open a connection to the GPU, the GPU can talk to the disk, Ethernet cards can write direct to buffers rather than going via software (RDMA and OpenSockets concepts, just generalized).
You now have a totally open network, closer to Ethernet than PCI or HyperTransport in architecture, but closer to C++ or Java in protocol, since the data type determines the operation.
What room, in such a design, for a CPU? Everything can be outsourced.
Now, move onto Wafer Scale Integration. We can certainly build single wafers that can take this entire design. Memory and compute elements, instead of segregated, are mixed. Add some pipelining and you have an arrangement that could blow most computer designs out the water.
Extrapolate this further. Instead of large chunks of silicon talking to each other, since the protocol is entirely routable, get as close to individual compute elements as you can. Have the router elements take care of heat and congestion issues, rather than compilers. Since packet headers can contain whatever label information you want, you have a notion of processes with independent storage.
It doesn't (or shouldn't) take long to figure out that a true network, rather than a bus, architecture will let you move chunks of the operating system (which is just a virtual machine, anyway) into the physical computer, eliminating the need for running an expensive bit of simulation.
And this is marketspeak? Marketspeak for what? Name me a market that wants to eliminate complexity and abandon planned obsolescence in favour of a schizophrenic cross between a parallel Turing machine, a vector computer and a Beowulf cluster.

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Re:Seems simple enough by ultranova · 2014-08-17 13:44 · Score: 1

But now let's totally eliminate the barrier between graphics, sound and all other processors. Instead of limited communications channels and local memory, have distributed shared memory (DSM) and totally free communication between everything.

This sounds a lot like NUMA. Which, I might add, absolutely requires differentiating between local and non-local memory, since the latter is much slower.

Thus, memory can open a connection to the GPU,

Like GPUs have done since the time of AGP? Or did you mean memory will simply send some random data for no particular reason?

the GPU can talk to the disk,

For what purpose? Do you plan to write a file system driver that runs on the GPU? To accomplish... what, exactly speaking?

Ethernet cards can write direct to buffers rather than going via software (RDMA and OpenSockets concepts, just generalized).

Haven't they done this for a long time now? In fact, don't all devices that do significant IO use direct memory access?

What room, in such a design, for a CPU? Everything can be outsourced.

And the part that keeps track of the overall program execution state and issues these outsourced tasks to other components is, for all intents and purposes, a CPU.

Have the router elements take care of heat and congestion issues, rather than compilers.

...What the heck are you talking about?

And this is marketspeak? Marketspeak for what? Name me a market that wants to eliminate complexity and abandon planned obsolescence in favour of a schizophrenic cross between a parallel Turing machine, a vector computer and a Beowulf cluster.

None does. It's the "schizophrenic" part that's the killer. Which is why, if you need to sell garbage anyway, you litter your product description with enough trendy buzzwords to convince the technologically illiterate that it's cutting edge high tech. Which, if you are trying to polish a particularly smelly turd for a sale, can end up using almost all of them. And that can have great synergy with illusion-challenged human resources seeking a solution for cynicism management.

--
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.

Re: Lightfoot by Anonymous Coward · 2014-08-16 09:33 · Score: 1

The speed of light in a vacuum is about 3.0 x 10^8 m/sec, not 0.3 x 10^8 m/sec. Still, your 6" per nanosec at half the speed of light in a vacuum is about right.

Can you fit that in a laptop? by tepples · 2014-08-16 09:50 · Score: 1

Heat is only a problem for those still running computers above zero Celsius.

Good luck fitting your frozen computer into a laptop case or something else that can be used while riding public transit. Not everybody is content to just "consume" on a "mobile device" while away from mains power.

Re:Can you fit that in a laptop? by jd · 2014-08-16 15:34 · Score: 1

Hemp turns out to make a superb battery. Far better than graphene and Li-Ion. I see no problem with developing batteries capable of supporting sub-zero computing needs.
Besides, why shouldn't public transport support mains? There's plenty of space outside for solar panels, plenty of interior room to tap off power from the engine. It's very... antiquarian... to assume something the size of a bus or train couldn't handle 240V at 13 amps (the levels required in civilized countries).

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)

Re: Lightfoot by K.+S.+Kyosuke · 2014-08-16 10:36 · Score: 1

Local (narrow) interconnects are several times slower than that, though. You need wide ones across longer distances. And I'm not really sure the whole thing with routing an impulse from place A to place B is that simple anymore.

--
Ezekiel 23:20

popular science is back by holophrastic · 2014-08-16 10:48 · Score: 1

one day, computers will be twice as fast and ten times as big -- vacuum tubes? meet transistors.
computers can't get any more popular because we'll run out of copper. . . zinc. . . nickel -- welcome to silicon. Is there enough sand for you?

everything will stay the way it is now forever. things will never get any faster because these issues that aren't problems today will eventually become completely insurmountable.

relax. take it easy. we don't solve problems in-advance. capitalism is about quickly solving huge problems, while totally ignoring small and medium problems.

wait for it. computers will be different in twenty years. I promise.

Re:Reminds me of Lord Kelvin... by ultranova · 2014-08-16 11:22 · Score: 1

He also showed that one particular thing was absolute, if you recall.

Nope. Einstein showed consequences of the speed of light being a constant of nature. He didn't show or even predict that it was one, that was done by Maxwell's equations and various attempts to measure Earth's velocity relative to luminous aether (which turned out to be "zero").

And as it happens, one of those consequences is that timewise and spacewise distance are relative.

--

Forget magic. Any technology distinguishable from divine power is insufficiently advanced.

Re:Why not integrate entire C-library functions? by Mr+Z · 2014-08-16 11:28 · Score: 1

And you think printf() and strtol() are major bottlenecks worth dedicated silicon area why?

Modern CPUs already have many accelerators for high end functions, such as numerical computations, cryptography, and the all important memcpy. (Memory copies are a traditional bottleneck, and general enough that they can be easily offloaded.) They come in two forms—specialized SIMD/vector instruction sets, and dedicated blocks for high-level functions that take multiple microseconds. An example of the former are the SIMD-oriented AVX instructions found on modern x86 chips. As an example of the latter, chips aimed at high end signal processing often have discrete blocks such as FFT accelerators. Others aimed at network tasks (especially DPI) have regular expression engines.

The problem with accelerator blocks is that they do take up area. And if they're powered up, they leak. Leakage current is a significant factor in modern designs. To get faster transistors, you need to drive their threshold voltage down. As you lower the threshold voltage, their leakage current goes up exponentially. So, that circuit better be bringing a lot of bang for the buck if it's going to be sitting there taking up space and leaking.

Another issue with dedicating area to fixed functions is the impact it has on distance between functions on the die. In the Old Days, you could get anywhere on the die in a single clock cycle. With modern designs and modern clock rates, cross-die communication is slow, taking many many cycles. So, when you plop down your custom accelerator, you have to figure out where to put it. Do you put it right in the middle of the rest of the computational units, slowing down the communication between their functions (either lowering clock rate or increasing cycle counts), or do you put it on the other side of the cache, meaning it takes several cycles to send it a request and several cycles to see the result?

This is why many custom accelerator blocks out there today focus on meaty workloads. A large FFT still takes a good bit of time to execute, and there's usually other work the main CPU can do while it executes. Thus, the communication overhead doesn't tank your performance. printf(), on the other hand, generally shows up right in the middle of a bunch of other serial steps. You can't overlap that with anything. Hauling off to a printf() accelerator block generally would make zero sense. If you're really spending that much time in printf(), you're better off rewriting the code to use a less general facility.
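
The trade-off in that paragraph can be sketched as a tiny cost model: what the CPU pays for an offload is the round trip plus whatever accelerator time it cannot hide behind other work. The function name and every number below are invented for illustration:

```python
def offload_cost_us(work_us, speedup, round_trip_us, overlappable_us=0.0):
    """Time the CPU effectively loses when it offloads a task.

    work_us:         time the task would take on the CPU itself
    speedup:         how much faster the accelerator runs it
    round_trip_us:   cost of sending the request and fetching the result
    overlappable_us: other useful CPU work available while waiting
    """
    accel_time = work_us / speedup
    stalled = max(0.0, accel_time - overlappable_us)
    return round_trip_us + stalled


# A meaty FFT: long-running, and the CPU has other work to overlap with it.
fft_cpu_us = 100.0
fft_offloaded_us = offload_cost_us(fft_cpu_us, speedup=10.0,
                                   round_trip_us=0.1, overlappable_us=10.0)

# A printf() in the middle of serial code: short, nothing to overlap with.
printf_cpu_us = 0.2
printf_offloaded_us = offload_cost_us(printf_cpu_us, speedup=2.0,
                                      round_trip_us=0.3)

print(f"FFT:    {fft_cpu_us} us on CPU vs {fft_offloaded_us} us offloaded")
print(f"printf: {printf_cpu_us} us on CPU vs {printf_offloaded_us:.2f} us offloaded")
```

With these made-up figures the FFT offload is nearly free while the printf() offload costs more than just running it, because the round trip dominates.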

A final issue with dedicated hardware is that you can't patch it. Someone finds a bug in your printf() and you're back to using a library version. I could go on, but I think I've made my point.

--
Program Intellivision!

Re:Lightfoot by Mr+Z · 2014-08-16 12:00 · Score: 1

Wires on silicon aren't a vacuum. The dominant effect is actually RC delay. As you make wires smaller, the resistance goes up (inversely proportional to cross-sectional area). As you make the wires closer together, capacitance goes up (inversely proportional to distance between the conductors). So, as geometries shrink, propagation delays for real signals in real wires on real silicon go up.

I won't even get into buffers which are required to recondition the signal on long routes... (Someone elsewhere on the thread already did.)
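
A back-of-envelope sketch of the RC argument, using the common approximation that an unbuffered distributed RC wire has a 50% delay of about 0.38 * R * C. The bulk copper resistivity, the 0.2 fF per micron capacitance and the 20 ps buffer delay are assumptions picked for illustration. The capacitance per length is held constant here, although, as noted above, it actually rises as wires move closer together:

```python
COPPER_RESISTIVITY_OHM_M = 1.68e-8  # bulk value; real nanoscale wires are worse
CAP_PER_METRE_F = 2e-10             # assumed: about 0.2 fF per micron of wire


def wire_delay_s(length_m, width_m, thickness_m):
    """Approximate 50% delay of an unbuffered distributed RC wire."""
    resistance = COPPER_RESISTIVITY_OHM_M * length_m / (width_m * thickness_m)
    capacitance = CAP_PER_METRE_F * length_m
    return 0.38 * resistance * capacitance


def buffered_delay_s(length_m, width_m, thickness_m, segments, buffer_delay_s):
    """Same wire split into equal segments, with a buffer driving each one."""
    per_segment = wire_delay_s(length_m / segments, width_m, thickness_m)
    return segments * (per_segment + buffer_delay_s)


base = wire_delay_s(1e-3, 100e-9, 200e-9)    # 1 mm wire, 100 nm x 200 nm
shrunk = wire_delay_s(1e-3, 50e-9, 100e-9)   # same length, half the width and height
long_wire = wire_delay_s(5e-3, 100e-9, 200e-9)
long_buffered = buffered_delay_s(5e-3, 100e-9, 200e-9, segments=5,
                                 buffer_delay_s=20e-12)

print(f"1 mm wire:            {base * 1e12:.0f} ps")
print(f"1 mm wire, shrunk:    {shrunk * 1e12:.0f} ps")
print(f"5 mm wire, no buffer: {long_wire * 1e12:.0f} ps")
print(f"5 mm wire, 5 buffers: {long_buffered * 1e12:.0f} ps")
```

Delay grows with the square of unbuffered length, which is why long routes get broken up with buffers.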

--
Program Intellivision!

Re:Lightfoot by FatdogHaiku · 2014-08-16 14:07 · Score: 1

Reminds me of Grace Hopper and her nanosecond samples...
https://www.youtube.com/watch?v=1-vcErOPofQ
It's a 10 minute Letterman interview but well worth the time...

--
You have the right to remain sentient. If you give up the right to remain sentient, you will be elected to public office

The limit is human by ulatekh · 2014-08-16 14:16 · Score: 1

Your reasoning is false. Most AI algorithms are having a high level of parallelism which make them less susceptible to the single CPU physical limit. You can achieve incredible performance improvement on GPU and other parallel architectures.

Good luck finding enough programmers that can write code with that level of parallelism.

Most of the multithreaded code I encounter in the real world simply slaps mutexes around things, whether or not they're needed, or even applied consistently. Most of the time, the mutex could be replaced with something cheaper, like atomic operations, or even unique state-transitions on a single volatile global variable.

Your experience may differ. Maybe I just have the bad luck of working with morons most of the time.

--
"Once we've identified and embraced our sickness, we'll have strength...and that's when we get dangerous." - John Waters

Re: The limit is human by 4wdloop · 2014-08-17 04:03 · Score: 1

Except multithreading is used to do many usually different things at the same time, while a parallel algorithm does the same thing on different parts of the data. Think car assembly line vs clothes-sewing sweatshop....
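
A small sketch of the two styles using only the standard library. Threads are used to keep it short; in CPython the global interpreter lock means this shows the structure of each approach rather than any speedup on CPU-bound work:

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    return x * x


data = list(range(8))

with ThreadPoolExecutor(max_workers=4) as pool:
    # Data parallelism: the same operation applied to different pieces of data.
    squares = list(pool.map(square, data))

    # Task parallelism: different operations running at the same time.
    total_future = pool.submit(sum, data)
    largest_future = pool.submit(max, data)
    total, largest = total_future.result(), largest_future.result()

print(squares, total, largest)
```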

--
4wdloop
Re:The limit is human by AK+Marc · 2014-08-19 13:45 · Score: 1

Good luck finding enough programmers that can write code with that level of parallelism.
Just build an AI that programs AI in a highly parallel fashion. What could possibly go wrong?

--
Learn to love Alaska

Re:Why not integrate entire C-library functions? by ChrisMaple · 2014-08-16 14:17 · Score: 1

Intel's iAPX 432 was a 1981 attempt to do what you suggest, the reference language being Ada. It was a resounding flop.

--
Contribute to civilization: ari.aynrand.org/donate

The marching morons! by ulatekh · 2014-08-16 14:22 · Score: 1

You are clueless. You live in a bubble of technology created by people infinitely smarter than you and you are happy with comic-book levels of understanding.

So you're saying that Cyril M. Kornbluth was right? Race you to Venus!

--
"Once we've identified and embraced our sickness, we'll have strength...and that's when we get dangerous." - John Waters

Wrong, wrong, wrong by ChrisMaple · 2014-08-16 14:26 · Score: 2

Power use is proportional to the chip's operating voltage

Wrong.

transistors simply cannot operate below a 200 milli-Volt level

Wrong. Get the voltage too low and they won't be fast, but they won't necessarily stop working.

And of course, the analysis of the communications issue is also wrong.

There are obvious and non-obvious physical limitations that limit scaling, but nobody is being helped by this muddy, error-ridden presentation.
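
For reference, the usual first-order textbook model of CMOS switching power is P = a * C * V^2 * f, so dynamic power scales with the square of the supply voltage rather than linearly. A rough Python sketch of that model follows; the function name, the default activity factor, and the sample numbers are illustrative assumptions, and leakage power is ignored.

```python
def dynamic_power_watts(capacitance_f, voltage_v, frequency_hz, activity=1.0):
    """First-order CMOS switching power: P = a * C * V^2 * f (leakage ignored)."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical numbers, chosen only to show the scaling.
full = dynamic_power_watts(1e-9, 1.0, 1e9)
half = dynamic_power_watts(1e-9, 0.5, 1e9)
print(half / full)  # halving V at fixed C and f quarters the switching power
```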

--
Contribute to civilization: ari.aynrand.org/donate

5 GHz limit? by ko7 · 2014-08-16 20:13 · Score: 1

" Even if signals in the chip were moving at the speed of light, a chip running above 5GHz wouldn't be able to transmit information from one side of the chip to the other."

Eh?

At 300 Megameters per second, the signal would travel 6cm during one clock cycle. Just how large of a "chip" are we talking about, and how much clock skew can we design into our processor?

I call bullshit on the above statement.
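
The 6 cm figure is easy to check. A minimal Python sketch, assuming vacuum light speed; the velocity_fraction parameter is a hypothetical knob added here for slower real-world signal propagation, not something from the article.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact by SI definition


def distance_per_cycle_cm(clock_hz, velocity_fraction=1.0):
    """Distance a signal covers during one clock period, in centimetres."""
    period_s = 1.0 / clock_hz
    return SPEED_OF_LIGHT_M_PER_S * velocity_fraction * period_s * 100.0


for ghz in (1, 3, 5, 10):
    print(f"{ghz:>2} GHz: {distance_per_cycle_cm(ghz * 1e9):.2f} cm per cycle")
```

At 5 GHz this comes out just under 6 cm per cycle, which is several times the edge length of a typical die.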

Re:Commenting on signal not crossing chip by expatriot · 2014-08-16 22:11 · Score: 1

Pipelining increases performance and instructions per cycle, but at the cost of power efficiency as branches cause a pipeline flush.

The problem is balancing area, power, and performance.

There are obviously limits to the ability to make smaller circuits; even the ones described as 14nm are not really 14 in the same way 160 was 160. There is a lot of wasted space because of the LELE process and the need to minimise crosstalk and distortion.

The real limit, however, is not how much better X-ray exposure will shrink the size, but how much it costs to make circuits. 28nm is likely to be the most cost efficient size for some time to come. Many fabs are making chips in larger process sizes for fast turnaround and cheap masks.

Must be ten years again.. by doccus · 2014-08-17 07:45 · Score: 1

Every ten years I hear the same thing. "We have reached the limits of processor technology" I remember hearing it in 1994 upon the arrival of the Pentium.. that the x86 processor was maxed out.. during 2004, when the next gen x86 chips arrived.. Now it's 2014.. and it's the same tune again. Suuure. They'll find a breakthrough. Count on it.

Re:Lucy foretold this... by doccus · 2014-08-17 07:47 · Score: 1

But at the rate this is happening, the majority of a chip will have to be kept inactive at any given time, creating what Markov terms 'dark silicon.'

When it's believed that computers only use 10% of their silicon, imagine if we could use 100% of our processors' capacity at the same time!

*mind blown*

*processor also blown*

But hey.. We also only use 10% of our carbon...

Re:Reminds me of Lord Kelvin... by kwbauer · 2014-08-18 17:32 · Score: 1

It is very plain that many parts of the Bible are not meant to be taken literally. The age of the earth being the most obvious.

Re: Lightfoot by AK+Marc · 2014-08-19 13:58 · Score: 1

Also, there's the issue of assuming that there's one instruction per clock. It's common for some instructions to take longer than one cycle, and it's possible to have fuzzy logic, and not even link output to clock, though those usually fail.

--
Learn to love Alaska

Re:Reminds me of Lord Kelvin... by AK+Marc · 2014-08-19 14:19 · Score: 1

We all know Bees can't fly.

--
Learn to love Alaska

Re: Lightfoot by TechyImmigrant · 2014-08-20 03:07 · Score: 1

I don't think you design chips, do you?

--
I should use this sig to advertise my book ISBN-13 : 978-1501515132.
