Slashdot Mirror


Processors and the Limits of Physics

An anonymous reader writes: As our CPU cores have packed more and more transistors into increasingly tiny spaces, we've run into problems with power, heat, and diminishing returns. Chip manufacturers have been working around these problems, but at some point, we're going to run into hard physical limits that we can't sidestep. Igor Markov from the University of Michigan has published a paper in Nature (abstract) laying out the limits we'll soon have to face. "Markov focuses on two issues he sees as the largest limits: energy and communication. The power consumption issue comes from the fact that the amount of energy used by existing circuit technology does not shrink in a way that's proportional to their shrinking physical dimensions. The primary result of this issue has been that lots of effort has been put into making sure that parts of the chip get shut down when they're not in use. But at the rate this is happening, the majority of a chip will have to be kept inactive at any given time, creating what Markov terms 'dark silicon.' Power use is proportional to the chip's operating voltage, and transistors simply cannot operate below a 200 milli-Volt level. ... The energy use issue is related to communication, in that most of the physical volume of a chip, and most of its energy consumption, is spent getting different areas to communicate with each other or with the rest of the computer. Here, we really are pushing physical limits. Even if signals in the chip were moving at the speed of light, a chip running above 5GHz wouldn't be able to transmit information from one side of the chip to the other."

20 of 168 comments

  1. Dupe by Anonymous Coward · · Score: 5, Informative
  2. This seems like a good time to mention these by nctritech · · Score: 5, Interesting

    Clockless logic circuits might be an interesting workaround for the communication problem. The other side of the chip starts working when the data CAN make it over there, for example. I don't claim to know much about CPU design beyond how they work on a basic logical level, but I'd love to hear the opinions of someone here who does regarding CPUs and asynchronous logic.

    1. Re:This seems like a good time to mention these by TechyImmigrant · · Score: 2

      I guess that's me then.

      Every D flip-flop is an async circuit. We also use a variety of other standard small async circuits that are a little bigger. Receiving clock-in-data signals like DS links is a common example. What you're talking about is async across larger regions.

      Scaling fully asynchronous designs to a whole chip is a false economy. The area cost is substantially greater than a synchronous design and with the static power draw of circuits now dominating, the dynamic power savings of asynchronous design are moot. You need to turn circuits off to save power. Just rendering them static doesn't help much.

      A modern CPU is made of islands of synchronous design, which are not assumed to be globally synchronous. Data passing between these islands is generally re-synchronized.

      An exception is power control signaling. Clock trees are power hogs, so you don't want to have to leave one on to support the power gating interfaces. So an async state machine to communicate power management protocols is common.

      --
      I should use this sig to advertise my book ISBN-13 : 978-1501515132.
    2. Re:This seems like a good time to mention these by K.+S.+Kyosuke · · Score: 2

      It depends. This is the same guy that Intel licenses a lot of power-saving patents from. You'd have to ask him, but the static power draw of his circuits is indeed minimal. Perhaps the reason is that he doesn't use manufacturing processes with high static power draw on purpose, I really don't know. It may also be the case that a switch from contemporary silicon to something else in the future will make this design more relevant again, power-wise (but the timing considerations, as well as the speed of light, are of course going to stay the same).

      --
      Ezekiel 23:20
    3. Re:This seems like a good time to mention these by nctritech · · Score: 2
    4. Re:This seems like a good time to mention these by TechyImmigrant · · Score: 2

      I do understand it. That patent describes an asynchronous data transfer with rendezvous using a conventional quadrature handshake. I can't imagine that there isn't prior art. That is standard stuff. The date of the patent is 2006. I finished my degree in 1991 around the same time the amulet async ARM was beginning development. My tutor at college invented the async register file for the amulet.

      The method it describes is slow because it requires two round trips between source and destination. That is why clock-in-data schemes are preferred. The neatest of those is the DS link code that was put out by Inmos in the early 90s (I used to work there). A receiving async circuit can recover data and clock and pass it on to a synchronous receiver using normal methods.
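
To make the clock-in-data idea concrete, here is a minimal Python sketch of data-strobe (DS) encoding as it is commonly described: the strobe line toggles whenever the data line does not, so exactly one of the two wires changes per bit and D XOR S can serve as a recovered clock. The function names `ds_encode` and `ds_decode` and the idle line levels of 0 are illustrative assumptions, not taken from any Inmos documentation.

```python
def ds_encode(bits, d0=0, s0=0):
    """Return (data, strobe) line levels, one entry per bit period.

    d0 and s0 are the assumed idle levels of the two wires.
    """
    data, strobe = [], []
    prev_d, prev_s = d0, s0
    for b in bits:
        # Strobe toggles only when the data line stays the same,
        # so exactly one wire changes in every bit period.
        s = prev_s if b != prev_d else prev_s ^ 1
        data.append(b)
        strobe.append(s)
        prev_d, prev_s = b, s
    return data, strobe


def ds_decode(data, strobe, d0=0, s0=0):
    """Recover the bits by sampling data each time D XOR S toggles."""
    bits = []
    prev_clk = d0 ^ s0
    for d, s in zip(data, strobe):
        clk = d ^ s  # recovered clock: flips once per bit period
        if clk != prev_clk:
            bits.append(d)
        prev_clk = clk
    return bits


if __name__ == "__main__":
    message = [1, 1, 0, 1, 0, 0, 0, 1]
    d, s = ds_encode(message)
    print("data  :", d)
    print("strobe:", s)
    print("decoded matches:", ds_decode(d, s) == message)
```

The receiver needs no round trip: each bit carries its own timing, which is the contrast with the handshake described in the patent.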

      But it makes the same wrong assumption that the 'better than clock gating' efficiencies of async logic would be superior to a synchronous circuit. However, these days it doesn't matter. In small geometries, your logic is sucking power whether or not it is clocked, due to static leakage. The way of the world these days is fine-grained power gating. As per my previous post, async transactions have a role to play in power gating (because you don't need to leave the clock tree on to use them), but they are a false economy in random logic applications because the increased gate count leads to an increased static current draw.
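
The argument above can be put in rough numbers with the usual first-order CMOS power model: dynamic power scales as activity x capacitance x V^2 x f, while static power is V x leakage current. The sketch below is only illustrative; every figure in it (voltage, capacitance, leakage) is a made-up assumption, and power gating is idealized as cutting leakage to zero.

```python
def dynamic_power(activity, capacitance_f, voltage_v, freq_hz):
    """First-order switching power: alpha * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * freq_hz


def static_power(voltage_v, leakage_a):
    """Leakage power: V * I_leak, drawn whether or not the clock runs."""
    return voltage_v * leakage_a


def block_power(voltage_v, freq_hz, activity, capacitance_f, leakage_a,
                clock_gated=False, power_gated=False):
    """Total power of one logic block under the two gating schemes."""
    if power_gated:
        return 0.0  # idealized: supply removed, so no leakage either
    dyn = 0.0 if clock_gated else dynamic_power(
        activity, capacitance_f, voltage_v, freq_hz)
    return dyn + static_power(voltage_v, leakage_a)


if __name__ == "__main__":
    # Hypothetical block in which leakage exceeds switching power.
    params = dict(voltage_v=1.0, freq_hz=3e9, activity=0.1,
                  capacitance_f=1e-9, leakage_a=0.4)
    print("running     :", block_power(**params), "W")
    print("clock gated :", block_power(**params, clock_gated=True), "W")
    print("power gated :", block_power(**params, power_gated=True), "W")
```

With these assumed values, stopping the clock removes less than half of the block's power, while removing the supply removes all of it. The model also shows why the dynamic term is not simply proportional to voltage: halving V quarters it.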
         

      --
      I should use this sig to advertise my book ISBN-13 : 978-1501515132.
  3. Go vertical! by putaro · · Score: 5, Interesting

    Stacking dies or some other form of going from flat to vertical will get you around some of the signaling limits. If you look back at old supercomputer designs there were a lot of neat tricks played with the physical architecture to work around performance problems (for example, having a curved backplane lets you have a shorter bus but more space between boards for cooling). Heat is probably the major problem, but we still haven't gone to active cooling for chips yet (e.g. running cooling tubes through the processor rather than trying to take the heat off the top).

    1. Re:Go vertical! by Nemyst · · Score: 2

      This. It won't be easy, of course not, but there's this entire third dimension we're barely even using right now which would give us an entirely new way to scale up. The possible benefits can already be seen in, for instance, Samsung's new 3D NAND, where they can get similar density to current SSDs with much larger NAND, thus improving reliability while keeping capacities and without significantly increasing costs. Of course, CPUs generate far more heat than SSDs, but the benefits could be tremendous. If anything, imagine the number of cores you could cram in the same die area if you could stack them!

  4. can't cross chip in one clock. big deal. by dbc · · Score: 5, Interesting

    "Even if signals in the chip were moving at the speed of light, a chip running above 5GHz wouldn't be able to transmit information from one side of the chip to the other." ... in a single clock.

    So in the 1980's I was a CPU designer working on what I call "walk-in, refrigerated, mainframes". It was mostly 100K-family ECL in those days and compatible ECL gate arrays. Guess what -- it took most of a clock to get to a neighboring card, and certainly took a whole clock to get to another cabinet. So in the future it will take more than one clock to get across a chip. I don't see how that is anything other than a job posting for new college graduates.

    That one statement in the article reminds me of when I first moved to Silicon Valley. Everybody out here was outrageously proud of themselves because they were solving problems that had been solved in mainframes 20 years earlier. As the saying goes: "All the old timers stole all our best ideas years ago."

    1. Re:can't cross chip in one clock. big deal. by Rockoon · · Score: 5, Interesting

      Even more obvious is that even today's CPUs don't perform any calculation in a single clock cycle. The distances involved only affect latency, not throughput. The fact that a simple integer addition operation has a latency of 2 or 3 clock cycles doesn't prevent the CPU from executing 3 or more of those additions per clock cycle.

      Even AMD's Athlon designs did that. Intel's latest offerings can be coerced into executing 5 operations per cycle that each have a 3-cycle latency, and that's on a single core with no SIMD.

      It's not how quickly the CPU can produce a value; it's how frequently the CPU can retire(*) instructions.

      (*) That's actually a technical term.
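
The latency-versus-throughput point can be sketched with a toy model. It assumes an idealized, fully pipelined machine and a stream of independent operations (no data dependencies, no stalls); the function name `cycles_to_finish` and the example numbers are illustrative, not measurements of any real CPU.

```python
import math


def cycles_to_finish(n_ops, latency, issue_width):
    """Cycles until the last of n independent ops completes.

    Ops are issued issue_width per cycle, and each one takes
    `latency` cycles from issue to result.
    """
    if n_ops == 0:
        return 0
    issue_cycles = math.ceil(n_ops / issue_width)
    # The last group issues in cycle issue_cycles - 1 and
    # finishes `latency` cycles later.
    return issue_cycles - 1 + latency


if __name__ == "__main__":
    for latency in (1, 3, 30):
        total = cycles_to_finish(3000, latency, issue_width=3)
        print(f"latency {latency:2d}: {total} cycles, "
              f"{3000 / total:.2f} ops/cycle")
```

In this model, raising the latency from 3 to 30 cycles moves the total for 3000 operations from 1002 to 1029 cycles, so sustained throughput stays close to the issue width. The picture changes once operations depend on each other's results, which the sketch deliberately ignores.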

      --
      "His name was James Damore."
    2. Re:can't cross chip in one clock. big deal. by AchilleTalon · · Score: 3, Informative

      Well, clearly moving mainframe people to OS/2 development wouldn't have been such a great idea. The mainframe segment was much more profitable than the PC segment, where the profit margins are so thin that IBM decided to sell the whole division to Lenovo. The money is elsewhere.

      And do not forget memory management had to be reinvented because there were IP rights on the MVS algorithms IBM wasn't willing to transfer to OS/2. In those old times, the PC market and mid-range market were perceived as a threat by the big mainframe guys at IBM, who were still the guys at the top of the hierarchy. The technical side is just the lesser part of this problem.

      --
      Achille Talon
      Hop!
    3. Re:can't cross chip in one clock. big deal. by Zero__Kelvin · · Score: 3

      I think the informed among us can agree, this whole article combines a special lack of imagination, misunderstanding of physics, and a complete lack of understanding of how computers work, in order to come up with a ridiculous article that sounds like it was written by Chicken Little :-)

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    4. Re:can't cross chip in one clock. big deal. by Splab · · Score: 2

      Are you saying electrons were moving slower in the 80's?

  5. what diminishing returns? by edxwelch · · Score: 2

    Each semiconductor node shrink is faster and more power efficient than the previous. For instance, TSMC's 20nm process offers 30% higher speed or 25% less power than 28nm. Likewise, 16nm will provide a 60% power saving over 20nm.

  6. Re:There are no limits! by AchilleTalon · · Score: 2

    Your reasoning is false. Most AI algorithms have a high level of parallelism, which makes them less susceptible to the single-CPU physical limit. You can achieve incredible performance improvements on GPUs and other parallel architectures.

    --
    Achille Talon
    Hop!
  7. Density limit - not computational limit by gman003 · · Score: 2

    Congratulations, you identified the densest possible circuits we can make. That doesn't even give an upper bound to Moore's Law, let alone an upper bound to performance.

    Moore's Law is "the number of transistors in a dense integrated circuit doubles every two years". You can accomplish that by halving the size of the transistors, or by doubling the size of the chip. Some element of the latter is already happening - AMD and Nvidia put out a second generation of chips on the 28nm node, with greatly increased die sizes but similar pricing. The reliability and cost of the process node had improved enough that they could get a 50% improvement over the last gen at a similar price point, despite using essentially the same transistor size.

    You could also see more fundamental shifts in technology. RSFQ seems like a very promising avenue. We've seen this sort of thing with the hard drive -> SSD transition for I/O bound problems. If memory-bound problems start becoming a priority (and transistors get cheap enough), we might see a shift back from DRAM to SRAM for main memory.

    So yeah, the common restatement of Moore's Law as "computer performance per dollar will double every two years" will probably keep running for a while after we hit the physical bounds on transistor size.

    1. Re:Density limit - not computational limit by slew · · Score: 3, Informative

      Moore's Law is "the number of transistors in a dense integrated circuit doubles every two years". You can accomplish that by halving the size of the transistors, or by doubling the size of the chip. Some element of the latter is already happening - AMD and Nvidia put out a second generation of chips on the 28nm node, with greatly increased die sizes but similar pricing. The reliability and cost of the process node had improved enough that they could get a 50% improvement over the last gen at a similar price point, despite using essentially the same transistor size.

      Bad example, the initial yield on 28nm was so bad that the initial pricing was hugely impacted by wafer shortages. Many fabless customers reverted to the 40nm node to wait it out. TSMC eventually got things sorted out so now 28nm has reasonable yields.

      Right now, the next node is looking even worse. TSMC isn't counting on the yield-times-cost of their next gen process to *ever* get to the point when it crosses over 28nm pricing per transistor (for typical designs). Given that reality, it will likely only make sense to go to the newer processes if you need its lower-power features, but you will pay a premium for that. The days of free transistors with a new node appear to be numbered until they make some radical manufacturing breakthroughs to improve the economics (which they might eventually do, but it currently isn't on anyone's roadmap down to 10nm). Silicon architects need to now get smarter, as they likely won't have many more transistors to work with at a given product price point.

      If memory-bound problems start becoming a priority (and transistors get cheap enough), we might see a shift back from DRAM to SRAM for main memory.

      Given the above situation, and that fast SRAMs tend to be quite a bit larger than fast DRAMs (6T vs 1T+C) and the basic fact that the limitation is currently the interface to the memory device, not the memory technology, a shift back to SRAM seems mighty unlikely.

      The next "big-thing" on the memory front is probably WIDEIO2 (the original WIDEIO1 didn't get many adopters). Instead of connecting an SoC (all processors are basically SoCs these days) to a DRAM chip, you put the DRAM and SoC in the same package (either stacked with through-silicon vias or side-by-side in a multi-chip package). Since the interface doesn't need to go on the board, you can have many more wires to connect the two, and each wire will have lower capacitance, which will increase the available bandwidth to the memory device.

  8. Re: Lightfoot by fzlotnick · · Score: 4, Informative

    The speed of light is approximately 3 x 10^8 m per second in a vacuum. It's about half as fast in a semiconductor like silicon. So closer to 6 inches. Nearly all chips are less than one inch. Even if this were not the case, that would not be an upper limit; data does not have to reach the end of the chip before the next clock cycle. This is an example of the author having a bit of knowledge (erroneous, as you point out) and extrapolating an incorrect answer.
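
The arithmetic behind these figures is easy to check. The sketch below computes how far a signal travels in one clock period; the 0.5 velocity factor is simply the "about half as fast" figure from the comment above, taken as an assumption, and the function names are illustrative.

```python
C_VACUUM_M_PER_S = 299_792_458  # speed of light in vacuum
METERS_PER_INCH = 0.0254


def distance_per_cycle_m(freq_hz, velocity_factor=1.0):
    """Distance covered in one clock period, in meters.

    velocity_factor is the signal speed as a fraction of c.
    """
    return C_VACUUM_M_PER_S * velocity_factor / freq_hz


def distance_per_cycle_in(freq_hz, velocity_factor=1.0):
    """Same distance, converted to inches."""
    return distance_per_cycle_m(freq_hz, velocity_factor) / METERS_PER_INCH


if __name__ == "__main__":
    for freq_hz in (1e9, 5e9):
        for vf in (1.0, 0.5):
            print(f"{freq_hz / 1e9:.0f} GHz, {vf:.1f}c: "
                  f"{distance_per_cycle_m(freq_hz, vf) * 100:.1f} cm "
                  f"({distance_per_cycle_in(freq_hz, vf):.1f} in)")
```

At 1 GHz this gives roughly a foot per cycle in vacuum and about 6 inches at half that speed. At 5 GHz the figures are about 6 cm and 3 cm, so a one-inch die is still within reach of a signal moving at these idealized speeds in a single cycle.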

  9. Re:So what by ledow · · Score: 3, Informative

    Nobody says 5GHz is impossible. Read it.

    It says that you can't traverse the entire chip while running at 5GHz. Most operations don't - why? Because the chips are small and any one set of instructions tends to operate in a certain smaller-again area.

    What they are saying is that chips will no longer be synchronous - if chips get any bigger, your clock signal takes too long to traverse the entire length of the chip and you end up with different parts of the chip needing different clocks.

    It's all linked. The size of the chip can get bigger and still pack in the same density, but then the signals get more out of sync, the voltages have to be higher, the traces have to be straighter, the routing becomes more complicated, and the heat will become higher. Oh, and you'll have to have parts of it "go dark" to avoid overheating neighbours, etc. This is exactly what the guy is saying.

    At some point, there's a limit at which it's cheaper and easier to just have a bucket load of synchronous-clock chips tied together loosely than one mega-processor trying to keep everything ticking nicely.

    And current overclocking records are only around 8GHz. Nobody says you can't make a processor operating at 10THz if you want. The problem is that it has to be TINY and not do very much. Frequency, remember, is high in anything dealing with radio - your wireless router can do some things at 5GHz and, somewhere inside it, is an oscillator doing just that. But not the SAME kinds of things as we expect modern processors to do.

    Taking into account that most of those overclocking benchmarks probably operate in small areas of the silicon, are run in mineral oil or similar and are the literal speed of a benchmark over a complicated chip that ALREADY takes account that signals take so long that clocks can get out of sync across the chip, we don't have much leeway at all. We hit a huge wall at 2-3GHz and that's where people are tending to stay despite it being - what, a decade or more? - since the first 3GHz Intel chip. We add more processors and more cores and more threading but pretty much we haven't got "faster" over the last decade, we're just able to have more processors at that speed.

    No doubt we can push it further, but not forever, and not with the kind of on-chip capabilities you expect now.

    With current technology (i.e. no quantum leaps of science making their way into our processors), I doubt you'll ever see a commercially available 10GHz chip that'll run Windows. Super-parallel machines running at a fraction of that but delivering more gigaflops - yeah - but basic core sustainable frequency? No.

  10. Wrong, wrong, wrong by ChrisMaple · · Score: 2

    Power use is proportional to the chip's operating voltage

    Wrong.

    transistors simply cannot operate below a 200 milli-Volt level

    Wrong. Get the voltage too low and they won't be fast, but they won't necessarily stop working.

    And of course, the analysis of the communications issue is also wrong.

    There are obvious and non-obvious physical limitations that limit scaling, but nobody is being helped by this muddy, error-ridden presentation.

    --
    Contribute to civilization: ari.aynrand.org/donate