Chip Power Breakthrough Reported by Startup
Carl Bialik from WSJ writes "The Wall Street Journal reports that a tiny Silicon Valley firm, Multigig, is proposing a novel way to synchronize the operations of computer chips, addressing power-consumption problems facing the semiconductor industry. From the article: 'John Wood, a British engineer who founded Multigig in 2000, devised an approach that involves sending electrical signals around square loop structures, said Haris Basit, Multigig's chief operating officer. The regular rotation works like the tick of a conventional clock, while most of the electrical power is recycled, he said. The technology can achieve 75% power savings over conventional clocking approaches, the company says.'"
Chip Power Breakthrough Reported
By DON CLARK
May 8, 2006; Page B6
A tiny Silicon Valley company is proposing a novel way to synchronize the operations of computer chips, addressing power-consumption problems that are a major issue facing the semiconductor industry.
Multigig Inc., a closely held start-up company in Scotts Valley, Calif., says its technology is a major advance over the clock circuitry used on many kinds of chips.
Semiconductor clocks work like the drum major in a marching band, sending out electrical pulses to keep tiny components on chips performing operations at the right time. In microprocessor chips used in computers, the frequency of those pulses -- also called clock speed -- helps determine how much computing work gets done per second.
One problem is that the energy from timing pulses flows in a one-way pattern through a chip until it is discharged, wasting most of the power. Clocks account for 50% or more of the power consumption on some chips, estimates Kenneth Pedrotti, an associate professor of electrical engineering at the University of California at Santa Cruz.
Partly for that reason, companies such as Intel Corp. have all but stopped increasing the clock speeds of microprocessors, a popular way to increase computing performance through most of the 1990s.
John Wood, a British engineer who founded Multigig in 2000, devised an approach that involves sending electrical signals around square loop structures, said Haris Basit, Multigig's chief operating officer. The regular rotation works like the tick of a conventional clock, while most of the electrical power is recycled, he said. The technology can achieve 75% power savings over conventional clocking approaches, the company says.
A typical chip would use an array of timing loops, in a grid akin to a piece of graph paper, Mr. Basit said. The loops automatically synchronize their timing pulses. That feature helps address a problem called "skew" -- the slightly different arrival times of timing pulses throughout a typical chip -- that tends to limit clock precision.
Multigig says its self-synchronizing loops can run efficiently at unusually high frequencies.
Mr. Pedrotti said past attempts to address the skew problem have tended to increase power consumption. He and his students, some of whom receive research funding from Multigig, have performed simulations that so far back up the company's claims, though the team is just about to start tests using actual chips, he said.
Multigig is in talks to license its technology to chip makers, as well as design some of its own products to use the clock technology. Besides microprocessors and other digital chips, the approach could help synchronize frequencies of communication chips, Mr. Basit said.
"This is a dramatic way of clocking circuits," said Steve Ohr, an analyst at Gartner Inc. He cautioned it could take years to get existing manufacturers to modify existing products to take advantage of the new technology. "Intel is not going to redesign the Pentium tomorrow because of it," he said.
So "up to" 75% savings on "up to" 50% of the electricity usage. So 3/8 or 37.5% savings, all in all... Of course this is only for the CPU... Could be noticeable in production... Maybe...
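The back-of-the-envelope figure above can be checked in a couple of lines. This is only a sketch: both inputs are the "up to" numbers quoted in the article, and the function name is made up for illustration.

```python
# Both figures are the "up to" values quoted in the article.
CLOCK_SHARE = 0.50    # fraction of chip power spent on clocking (some chips)
CLOCK_SAVINGS = 0.75  # claimed saving on the clocking portion


def chip_level_savings(clock_share, clock_savings):
    """Fraction of total chip power saved if only the clock power shrinks."""
    return clock_share * clock_savings


print(f"{chip_level_savings(CLOCK_SHARE, CLOCK_SAVINGS):.1%}")
```

Any chip where clocks take a smaller share than 50% gets proportionally less benefit.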
Most of the power in a computer is used once and wasted. The input to a gate acts like a capacitor. When the input is driven from a zero to a one, the current is limited by the resistance of the output gate driving it. That resistance is where the power is dissipated. The charge is drained to ground when the input is driven from a one to a zero. If there were some way to re-use the charge stored in the inputs, the power dissipation of a chip could be dramatically reduced. There would be a limit to how much efficiency could be gained but we haven't done anything about it yet. One of the major limits to chip performance is heat and doing something like this would help to keep Moore's law valid.
An impossible concept only invented like a hundred years ago. Next, they will be charging things known as capacitors from the induced current.
P.S. In this context, the correct spelling of nerd is E-N-G-I-N-E-E-R ;^)
In most respects, chips today are ALREADY 3d in that there are multiple layers of planar (flat layers) metal wiring (anywhere from 4 to 8) connected by vias (vertical interconnect) over a single layer of transistors. The routing of signals on each layer is designed on purpose to be a crazy-ass network (to avoid electromagnetic signal coupling noise between adjacent wires).
However, in current technology, there's still only 1 layer of transistors, and the main limitation of adding more is that there's no good way to get rid of the heat of transistors. Even today, there isn't a good way to get rid of the heat of the transistors in the 1 layer of current chips, let alone a big pancake stack (or lasagna) of transistors. People are already starting to stack memory chips that don't get too hot together, and I'm sure they'll eventually start doing different kinds of stacks too as they get better at figuring out the heat problem...
FDIV wasn't particularly obscure; IIRC it went unnoticed for a very long time and affected many real world calculations. It was unlike many other errata in the regard that it was a documented function misbehaving and was not caught early. You could see it in action simply by loading up a spreadsheet app and doing a division. The software workaround wasn't that difficult, but the lack at the time of microcode support made it a big hassle.
The Pentium also had the more egregious F00F bug, the nonexistent opcode which would simply deadlock the processor and could be called in any mode. The workaround was a huge performance drain on all OS's. These two problems were probably the two most serious and publicized errata for the Pentium, but they were certainly not the only ones. If you are suggesting that any microprocessor of equivalent complexity produced in recent years has shipped without a flaw, I'd like to know about it.
Better link here
http://www.eetimes.com/news/latest/showArticle.jhtml?articleID=187200783
Looks interesting. I wonder what they mean by 'taps', and if they calculated their power savings right (would each register need its own tap, or if not, is the buffer needed to boost the power from the loop included in the clock system power?)
--- Hindsight is 20/20, but walking backwards is not the answer.
You get power dissipation in each gate or buffer that changes state because of some signal, regardless of the direction in which the information is flowing. You cannot recycle this power. This comes directly from the basic principle behind CMOS technology (used by almost all digital chips today) - you are charging and discharging a capacitor.
You're half right. You're right that what's going on is a charging and discharging of a cap, but you're wrong that the charge can't be recycled. A conventional clock works by connecting the gates of a bunch of devices (i.e. capacitance) to Vdd, then after a little time connecting it to ground instead. Wait a little bit, then repeat. What effectively happens is that you dump some amount of charge from Vdd to ground each switch, and it's gone (i.e. it's heat now). A water analogy would be a tub of water above you (Vdd), a bucket in your hand (the capacitance), and the ground (gnd). You pour some water from the tub into your bucket (charge the cap), then dump it on the ground.
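To put rough numbers on the "pour the bucket on the ground" picture: each full charge/discharge of a capacitance C from Vdd draws C*Vdd^2 from the supply and turns it into heat, so the power is C*Vdd^2*f. The sketch below uses made-up values (1 nF of total clock load, 1.2 V, 2 GHz) purely for illustration; none of them come from the article.

```python
def conventional_clock_power(c_farads, vdd_volts, freq_hz):
    """Power burned by charging a capacitance to Vdd and dumping it to ground
    once per clock cycle: P = C * Vdd^2 * f."""
    energy_per_cycle = c_farads * vdd_volts ** 2  # joules drawn from the supply
    return energy_per_cycle * freq_hz


# Assumed, illustrative values (not from the article)
C_CLOCK = 1e-9   # total clock load in farads
VDD = 1.2        # supply voltage in volts
FREQ = 2e9       # clock frequency in hertz

print(f"{conventional_clock_power(C_CLOCK, VDD, FREQ):.2f} W")
```

Note the power scales linearly with frequency and with the square of the supply voltage.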
It doesn't have to be this way. There are actually ways to charge a capacitor, and then pull the charge back out again (without dumping it to ground)! I'm going to assume you're familiar with LRC circuits, and how they can resonate when an impulse is applied. What's going on during the oscillations? Charge is moving into the capacitor, and then being pulled back out to the inductor. The same charge goes back and forth, ideally forever (of course, in practice, the resistance isn't 0 so you put out some heat and the oscillations die down). I'm not sure what exactly the water analogy would be - maybe a wave sloshing back and forth in a trough.
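A minimal sketch of the numbers behind a series LRC tank, using the standard textbook formulas for resonant frequency and quality factor. The component values are assumptions picked for illustration, and the per-cycle loss formula assumes a lightly damped circuit.

```python
import math


def resonant_frequency(l_henries, c_farads):
    """f0 = 1 / (2*pi*sqrt(L*C)) for an LC tank."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henries * c_farads))


def quality_factor(r_ohms, l_henries, c_farads):
    """Q of a series RLC circuit: sqrt(L/C) / R."""
    return math.sqrt(l_henries / c_farads) / r_ohms


def fraction_lost_per_cycle(q):
    """Fraction of the stored energy dissipated in one oscillation period,
    assuming a lightly damped circuit whose energy decays as exp(-w0*t/Q)."""
    return 1.0 - math.exp(-2.0 * math.pi / q)


# Assumed, illustrative component values (not from the thread)
L_TANK = 1e-9    # 1 nH inductor
C_LOAD = 10e-12  # 10 pF of load capacitance
R_LOSS = 0.5     # 0.5 ohm of series resistance

q = quality_factor(R_LOSS, L_TANK, C_LOAD)
print(f"f0 = {resonant_frequency(L_TANK, C_LOAD) / 1e9:.2f} GHz, Q = {q:.1f}")
print(f"energy lost per cycle: {fraction_lost_per_cycle(q):.0%}")
```

Only the lost fraction has to be topped up each cycle, whereas the conventional scheme throws away the whole charge every cycle.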
I recently attended a seminar where the presenter talked about clocking based on LRC oscillations and he had actually fabbed chips that worked. The basic idea was to put an inductor on the die, and set up oscillations between the inductor and the clock load capacitance, which results in a ticking clock. Of course, you get a sinusoidal clock instead of a nice almost-square-wave, so your circuits have to be designed a little bit differently, but the point is, it works and is doable.
Now, the technology described in this article, as best as I can tell, uses another idea - transmission lines. In a normal design, your clock grid basically looks like a bunch of capacitors with resistors in between (i.e. distributed RC). It takes time for a signal to propagate - signals propagate much slower than the speed of light, because you actually have to charge up the capacitance along the line through the resistance of the line itself. Imagine a long trough that's empty. You start pouring water in, and although water reaches the far side pretty quickly, you don't actually observe it until the water level at the far end is half way up. Signals propagate differently when wires are set up as transmission lines - they propagate at much closer to the speed of light, because you're actually sending a wave down the line (imagine creating a ripple on a trough of water, instead of actually filling and emptying the trough).
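To make the contrast concrete, here is a rough comparison of the two delay models. The distributed-RC figure uses the common textbook approximation of 0.38 * R_total * C_total for the 50% point, and the transmission-line figure is simple time of flight in a dielectric. The per-length resistance and capacitance and the dielectric constant are assumed values, not measurements from any real process.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in vacuum, m/s


def distributed_rc_delay(r_per_m, c_per_m, length_m):
    """Approximate 50% delay of a distributed RC wire: 0.38 * R_total * C_total.
    Grows with the square of the length."""
    return 0.38 * (r_per_m * length_m) * (c_per_m * length_m)


def time_of_flight(length_m, eps_r):
    """Wave delay on an ideal transmission line: length * sqrt(eps_r) / c.
    Grows linearly with the length."""
    return length_m * math.sqrt(eps_r) / C_LIGHT


# Assumed, illustrative wire parameters (not from the thread)
R_PER_M = 1e5    # 0.1 ohm per micron of wire
C_PER_M = 2e-10  # 0.2 fF per micron of wire
EPS_R = 3.9      # relative permittivity of silicon dioxide

for length_mm in (1, 5, 10):
    length = length_mm * 1e-3
    rc = distributed_rc_delay(R_PER_M, C_PER_M, length)
    tof = time_of_flight(length, EPS_R)
    print(f"{length_mm:>2} mm: RC {rc * 1e12:7.1f} ps, time of flight {tof * 1e12:5.1f} ps")
```

With these assumed numbers the two are comparable on a short wire, and the RC delay pulls far ahead as the wire gets longer because it is quadratic in length.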
Now, I don't understand how they combined charge recycling and transmission lines, and I don't understand transmission lines all that well, but your arguments aren't good reasons to disregard the claims made by the company.
If you're interested, here is a little bit of info about the talk I went to.
A typical example showing that running signals around a loop does not by itself save power: take a ring oscillator (a number of inverters wired in a loop). This circuit will oscillate (send changing signals through its loop) and consume a considerable amount of power.
If you created an oscillator between an inductor and a capacitor, on the other hand, once you started it going, it would continue for a long time with minimal energy injected in the future.
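For the ring oscillator in the quoted post, the usual first-order model is f = 1 / (2 * N * t_pd), with every node fully charged and discharged once per period. The sketch below uses that model with hypothetical stage count, delay, and node capacitance; it ignores short-circuit and leakage current.

```python
def ring_oscillator_frequency(n_stages, stage_delay_s):
    """First-order model: a transition must travel the ring twice per period."""
    return 1.0 / (2.0 * n_stages * stage_delay_s)


def ring_oscillator_power(n_stages, stage_delay_s, c_node_farads, vdd_volts):
    """Each of the N nodes draws C * Vdd^2 from the supply once per period."""
    freq = ring_oscillator_frequency(n_stages, stage_delay_s)
    return n_stages * c_node_farads * vdd_volts ** 2 * freq


# Hypothetical values for illustration only
N_STAGES = 5
STAGE_DELAY = 20e-12  # 20 ps per inverter
C_NODE = 5e-15        # 5 fF per node
VDD = 1.2

print(f"{ring_oscillator_frequency(N_STAGES, STAGE_DELAY) / 1e9:.1f} GHz")
print(f"{ring_oscillator_power(N_STAGES, STAGE_DELAY, C_NODE, VDD) * 1e6:.0f} uW")
```

Here all of the charge is dumped every cycle, which is the difference from an inductor-capacitor oscillator where most of the energy sloshes back and forth.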
My server
"Intel is not going to redesign the Pentium tomorrow because of it," he said.
Why not?
For starters the automated design tools will need a rehack.
Current synchronous chips use a "clock tree" to try to get all the flops and latches to clock at once. Then the design tools assume that the outputs flip at the same time and try to route the signals so they all get through the logic to set up the flops in time for the next clock.
This scheme will produce waves of clocking that propagate across/around the chip. So different flops will be clocked at different times. This is good for signals going the same direction as the clocking wave (though not perfect, since the propagation time of a signal on a wire is NOT linear with length), because they get extra time to set up the next flop. It's rotten for signals going the other way.
But it's disaster for design software that doesn't understand the issue.
So new versions of the tools will be needed that can take the non-simultaneous clocking into account, both to compute the layout and wiring right and to take advantage of the effect to achieve improved performance by arranging for timing-tight data paths to "go with the flow" and slower stuff to go the other way.
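The "go with the flow" effect can be sketched with the standard setup-time check, where skew is the capture clock's arrival time minus the launch clock's. All delays below are hypothetical and in picoseconds; a real tool would also have to run the matching hold-time check, which gets harder as the skew grows.

```python
def setup_slack_ps(period, clk_to_q, logic_delay, setup, skew):
    """Setup slack in picoseconds; a negative value means the path fails.

    skew = capture-clock arrival minus launch-clock arrival, so a positive
    skew (data travelling with the clock wave) gives the path extra time.
    """
    return (period + skew) - (clk_to_q + logic_delay + setup)


# Hypothetical path, all values in picoseconds
PATH = dict(period=500, clk_to_q=50, logic_delay=420, setup=40)

print("no skew:         ", setup_slack_ps(skew=0, **PATH))
print("with the wave:   ", setup_slack_ps(skew=+30, **PATH))
print("against the wave:", setup_slack_ps(skew=-30, **PATH))
```

The same path fails or passes depending only on which way it runs relative to the clock wave, which is the information an unaware tool does not have.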
Even if this hack works, getting those tool mods done, and getting them right, will hold up large projects using it.
(But something can be done meanwhile, with unaware tools, by doing some manual layout of blocks with respect to the clocking waves and telling the tools to treat each block as if it had a simultaneous clock internally, skewed with respect to other blocks and with less setup/hold time margin to take into account the internal skew.)
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
You're a bit confused, I think. If something is patented, then (in theory, at least), there is a publicly available patent disclosure that describes the technique in sufficient detail that anyone "skilled in the art" of its field should be able to read and implement it. Patents and trade secrets are mutually exclusive.
Clocks are not a high percentage of the power. They're not trivial but mostly the problem with clocks is the length of the line. The bus between the register file and ALU is probably 1/20th that of the clock traces.
Compared to all the other logic in a cpu from the decoders to the schedulers to the ALUs, load-store, and then all the support pipeline registers, control logic, etc not to mention the cache...
The problem with "doing away with the clock" is being able to co-ordinate things in some usable amount of time. Each pipeline stage would need bidirectional signalling to co-ordinate the state transition, etc.
There already is some of that in the modern processors. When you do
MOV EAX,[EBX]
Provided there are no dependencies the scheduler will assume it takes 3 cycles [on the AMD side] to complete. It will then stall the ALU for two additional cycles before attempting to feed something in. But the MOV may not be finished so there is some need for feedback.
That in mind, if they are truly 1-cycle ops the scheduler will pump out instructions without waiting. The write to the register file is on a clock edge, etc, etc, etc... A lot of things "just assume" data will be ready. Similarly talking with other things like memory will have to get into a clock domain making things a bit more complicated.
The fact of the matter is asynchronous circuits are not new. They're just not space efficient. You get the efficiency of running as fast as possible (e.g. if your ADD takes 0.25ns and your ADC takes 0.35ns then executing ADD will be faster as the ALU will signal it's finished 0.10ns sooner) but waste a lot of space with the synchronization steps.
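A toy model of that ADD/ADC example, using the 0.25 ns and 0.35 ns latencies from the paragraph above. The instruction mix and the `handshake_ps` knob are hypothetical; the area cost mentioned above is not modelled at all.

```python
# Latencies from the example above, in picoseconds
LATENCY_PS = {"ADD": 250, "ADC": 350}


def synchronous_time_ps(ops):
    """Clocked design: the period must cover the slowest operation."""
    period = max(LATENCY_PS.values())
    return period * len(ops)


def asynchronous_time_ps(ops, handshake_ps=0):
    """Self-timed design: each op takes its own latency plus an assumed
    per-operation handshake overhead (hypothetical, defaults to zero)."""
    return sum(LATENCY_PS[op] + handshake_ps for op in ops)


# Hypothetical instruction mix
ops = ["ADD"] * 8 + ["ADC"] * 2

print("synchronous: ", synchronous_time_ps(ops), "ps")
print("asynchronous:", asynchronous_time_ps(ops), "ps")
print("asynchronous with 100 ps handshake:", asynchronous_time_ps(ops, 100), "ps")
```

The advantage depends entirely on the handshake staying cheap relative to the spread between fast and slow operations.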
If you look at the ARM case they maxed out at ~80MIPS or so. The typical Athlon gets ~1MIPS per MHz at a minimum and up to 2MIPS per MHz on more efficient code. So a 2GHz processor is netting upwards of 2000MIPS. Sure it takes more power but if you install the requisite 25 ARM cores the "power efficiency" drops quickly.
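The core count above follows directly from the quoted throughput figures. A quick check, assuming perfect scaling across cores (which is generous to the many-core side):

```python
import math


def cores_needed(target_mips, mips_per_core):
    """Number of slower cores needed to match a target throughput,
    assuming perfect scaling across cores."""
    return math.ceil(target_mips / mips_per_core)


ARM_MIPS = 80        # figure quoted above for the asynchronous ARM core
CLOCK_MHZ = 2000     # a 2 GHz processor

print(cores_needed(1 * CLOCK_MHZ, ARM_MIPS))  # at 1 MIPS per MHz
print(cores_needed(2 * CLOCK_MHZ, ARM_MIPS))  # at 2 MIPS per MHz
```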
That said, there are good uses for that core. It leaks less RF energy as it's not pulsing at a fixed frequency. That is, it does leak RF, just it's spread over more of the spectrum. Also if the cpu is idle it effectively is not switching which reduces power.
Tom
Someday, I'll have a real sig.
> the current is limited by the resistance of the output gate driving it. That resistance is where the power is dissipated. The charge is drained to ground when the input is driven from a one to a zero. If there was some way to re-use the charge stored in the inputs, the power dissipation of a chip could be dramatically reduced.
Capacitors store charge (potential energy), not power. To discharge a capacitor, you have to transfer the charge from it to another place. You can do it fast or slow, but you always have to move the full amount of energy. Resistance during discharge causes an instantaneous power loss of P = V^2/R, where V is the voltage across the resistor (this varies with time). Although raising the resistance reduces the power loss, it also increases the time constant. That means it takes proportionally longer to transfer the charge, and your clock rate goes down. And remember that you have to pay the 'conservation of energy' bill sometime: for example, doubling the resistance halves the instantaneous power but doubles the time constant, so the area under the power curve is unchanged and the resistor still consumes the same amount of energy.
Faster clock rate -> lower time constant -> more heat (unless you figure out how to reduce the capacitance).
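The "area under the curve" argument can be checked numerically. The sketch below integrates P = V^2/R over an RC discharge and compares the result with the stored energy C*V0^2/2 for two different resistances. The component values are assumptions chosen only for illustration.

```python
import math


def resistor_energy(r_ohms, c_farads, v0_volts, steps=20_000):
    """Numerically integrate P(t) = V(t)^2 / R while a capacitor discharges
    through a resistor, with V(t) = V0 * exp(-t / (R*C)).
    Uses the midpoint rule over 20 time constants."""
    tau = r_ohms * c_farads
    dt = 20.0 * tau / steps
    energy = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        v = v0_volts * math.exp(-t / tau)
        energy += (v * v / r_ohms) * dt
    return energy


# Assumed, illustrative values
C_GATE = 10e-15  # 10 fF
V0 = 1.2         # volts

stored = 0.5 * C_GATE * V0 ** 2
for r in (100.0, 200.0):
    print(f"R = {r:5.0f} ohm: dissipated / stored = {resistor_energy(r, C_GATE, V0) / stored:.4f}")
```

Changing R only changes how quickly the energy is burned, not how much of it.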
The press has a knack for distorting stories and making it very hard to figure out real technical details.
http://multigig.com/pub.html has some whitepapers. I read the ISSCC 2006 slide set, which let me know the general technique.
Basically, they produce a clock ring to produce a "differential" clock pair that after one lap swaps neg and pos and so its frequency is tuned by its own capacitance and inductance. They call it a "moebius" loop since it's not really a differential pair, but the clock wave makes two round trips before getting back to the start. Neighboring loops can be tuned together (although if that's by just routing the wave throughout the chip I'm not sure). They didn't seem to mention synchronizing the period to outside sources, and I'm not sure how they'll be able to do that.
The clocking is not the interesting part to me, but rather their logic strategy. The trick is that logic itself has no connection to power or ground. The clock nets provide the "power and ground" and all logic must be done as differential (a and abar as inputs, q and qbar as outputs). This is where they get the power savings from--the swings are reduced and there's no path to power or ground to drain away charge. Without really discussing it, charge seems to just shift around on internal nodes between the differential logic states. They then use pure NMOS fets for logic, which removes all PMOS. The logic will never reach the power rail, though--it will always be a Vt drop. I just looked this over quickly, but it seems the full-swing clocks and lack of PMOS make this work out fine.
For quick adoption, they'll need to work out clever techniques to connect this logic to standard clocked logic. Otherwise, it looks only a little bit easier to use than asynchronous logic. The issues they face seem very similar to asynchronous logic issues--tool support, interface to standard clocked logic, debug, test, etc.
It's not vapor.
actually...sharp turns are a problem for high frequency circuits. when the frequencies get very high compared to the wires length, the waves *do* actually reflect back from sharp corners and will favor a straight path. this is the basis for things such as tdr (when finding kinks) and directional couplers.
Last time I checked, speed of electron flow is only based on the material around it. Higher dielectric constant = lower speed of propagation. Transmission lines aren't voodoo science, they are a property of the electrical length of the line and the rate of change of the signal on that line. It does not change the rate of propagation at all. Whether a given wire is 1" long, or 200 miles long, it will not change the speed of propagation.
I didn't say electron flow speed changes. I said signal propagation speed changes, which is true, because if I send a "1" down a long transmission line, the receiver will get it faster than they'd get it if I send a "1" using RC-style signalling. As I tried to explain before, in a normal signalling scheme, you charge an entire line up to Vdd or Gnd, and don't detect a 1 or 0 until the signal crosses Vdd/2. Take an empty trough and start filling it up; see how long it takes the water level to reach half way up on the far side. It'll cross the half way point at the far side pretty soon after it crosses at the near side, but actually filling and emptying it will still take a while. With transmission line signalling, however, you never actually charge/discharge the whole line, but send a wave down it instead. Take a trough of water and make a ripple, then on the receiving side observe the ripple. If you want to read a proposal for on-chip transmission lines, read this.
There are a lot of issues involved with using transmission lines (for example, wires have to be long before transmission-line signalling becomes better, and you have to do impedance matching at the receiver to avoid reflections, and based on the paper I linked to, your wires need to be wide and thick), but they do offer some very cool properties.
Not to be cheeky, but it's quite easy to change a sine wave into a square wave: Schmitt trigger. While I can't rule this out entirely, I would imagine that if it was more economical to produce an LRC resonator, it would be built into devices already. These circuits have been around for decades. It's very difficult to beat quartz crystals in terms of stability, ease of use, and power consumption.
I didn't say it was a flawless idea, and I also didn't say it was a stupid idea. I DID say you have to design your circuits differently (i.e. your flip flops do schmitt-trigger-like things to compensate for the slow slew rates). I brought it up because it was an example of charge recovery that works in the real world. It does have downsides, but every option has downsides (be it power, skew, manufacturability, whatever). Based on the presentation I saw, the downsides of that particular clocking method are enough to keep it out of mass-produced designs for a while, but that doesn't mean somebody else might not have found a way to make charge-recovering clocks more realistic. It's worthwhile research (meaning it might not be in the CPU you buy tomorrow, it might not be in any mass produced CPU ever, but it might also lead to a design that IS mass produced in the future, based on the knowledge gained from this research).
Wrong. The clock drives into a high impedance node. (The CMOS receivers on the other side of the clock line). CMOS drivers do have the problem of connecting to ground temporarily during switching - more akin to spilling some of the water out of the bucket as you pour it, not pouring it entirely on the ground. This can be overcome using clocks that are 90deg out of phase.
That's not what I was talking about. Short-circuit current is not a big deal as long as your signal slew rates are good.
And if the cap that you're talking about is the 10pF or so that is on the gate of the receiver CMOS - there are larger fish to fry power wise than this minimal capacitance.
I mentioned only gate cap on the clock receivers to simplify things. Since you're goin