Chip Power Breakthrough Reported by Startup
Carl Bialik from WSJ writes "The Wall Street Journal reports that a tiny Silicon Valley firm, Multigig, is proposing a novel way to synchronize the operations of computer chips, addressing power-consumption problems facing the semiconductor industry. From the article: 'John Wood, a British engineer who founded Multigig in 2000, devised an approach that involves sending electrical signals around square loop structures, said Haris Basit, Multigig's chief operating officer. The regular rotation works like the tick of a conventional clock, while most of the electrical power is recycled, he said. The technology can achieve 75% power savings over conventional clocking approaches, the company says.'"
Chip Power Breakthrough Reported
By DON CLARK
May 8, 2006; Page B6
A tiny Silicon Valley company is proposing a novel way to synchronize the operations of computer chips, addressing power-consumption problems that are a major issue facing the semiconductor industry.
Multigig Inc., a closely held start-up company in Scotts Valley, Calif., says its technology is a major advance over the clock circuitry used on many kinds of chips.
Semiconductor clocks work like the drum major in a marching band, sending out electrical pulses to keep tiny components on chips performing operations at the right time. In microprocessor chips used in computers, the frequency of those pulses -- also called clock speed -- helps determine how much computing work gets done per second.
One problem is that the energy from timing pulses flows in a one-way pattern through a chip until it is discharged, wasting most of the power. Clocks account for 50% or more of the power consumption on some chips, estimates Kenneth Pedrotti, an associate professor of electrical engineering at the University of California at Santa Cruz.
Partly for that reason, companies such as Intel Corp. have all but stopped increasing the clock speeds of microprocessors, a popular way to increase computing performance through most of the 1990s.
John Wood, a British engineer who founded Multigig in 2000, devised an approach that involves sending electrical signals around square loop structures, said Haris Basit, Multigig's chief operating officer. The regular rotation works like the tick of a conventional clock, while most of the electrical power is recycled, he said. The technology can achieve 75% power savings over conventional clocking approaches, the company says.
A typical chip would use an array of timing loops, in a grid akin to a piece of graph paper, Mr. Basit said. The loops automatically synchronize their timing pulses. That feature helps address a problem called "skew" -- the slightly different arrival times of timing pulses throughout a typical chip -- that tends to limit clock precision.
Multigig says its self-synchronizing loops can run efficiently at unusually high frequencies.
Mr. Pedrotti said past attempts to address the skew problem have tended to increase power consumption. He and his students, some of whom receive research funding from Multigig, have performed simulations that so far back up the company's claims, though the team is just about to start tests using actual chips, he said.
Multigig is in talks to license its technology to chip makers, as well as design some of its own products to use the clock technology. Besides microprocessors and other digital chips, the approach could help synchronize frequencies of communication chips, Mr. Basit said.
"This is a dramatic way of clocking circuits," said Steve Ohr, an analyst at Gartner Inc. He cautioned it could take years to get existing manufacturers to modify existing products to take advantage of the new technology. "Intel is not going to redesign the Pentium tomorrow because of it," he said.
Conventional electronics uses circular loop structures to send electrical signals as the electrons would get caught on corners that were too sharp. These people must have overcome that limitation.
So "up to" 75% savings on "up to" 50% of the electricity usage. So 3/8 or 37.5% savings, all in all... Of course this is only for the CPU... Could be noticeable in production... Maybe...
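For anyone who wants to check that arithmetic, here is a minimal sketch. It assumes the two figures from the article simply multiply (clocks at 50% of chip power, 75% of that saved); the function name is made up for illustration.

```python
def overall_savings(clock_share: float, clock_savings: float) -> float:
    """Fraction of total chip power saved when only the clock network improves.

    clock_share   -- fraction of chip power spent on clocking (article: up to 0.50)
    clock_savings -- fraction of clock power saved (company claim: up to 0.75)
    """
    return clock_share * clock_savings


# Best case from the article's numbers.
print(f"best case: {overall_savings(0.50, 0.75):.1%}")

# A chip where clocks are a smaller share of the budget (assumed 30%).
print(f"30% clock share: {overall_savings(0.30, 0.75):.1%}")
```

Both inputs are "up to" figures, so the product is an upper bound rather than an expected value.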
We're getting ever closer to the perpetual motion machine, just 25% energy savings to go ;-)
Seriously though, I'll look forward to seeing this new chip in production, since more energy efficient chips mean less waste heat, and thus quieter computers with fewer fans. I'll trust it when I see it; I'm not so swayed by a company that is still just a "startup" probably looking to get a boost to its stock price by announcing a breakthrough.
Oh You POS
Most of the power in a computer is used once and wasted. The input to a gate acts like a capacitor. When the input is driven from a zero to a one, the current is limited by the resistance of the output gate driving it. That resistance is where the power is dissipated. The charge is drained to ground when the input is driven from a one to a zero. If there was some way to re-use the charge stored in the inputs, the power dissipation of a chip could be dramatically reduced. There would be a limit to how much efficiency could be gained but we haven't done anything about it yet. One of the major limits to chip performance is heat and doing something like this would help to keep Moore's law valid.
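The charge-and-dump cycle described above can be put into numbers with the textbook model of a gate input as an ideal capacitor charged through a resistor. This is only a sketch: the capacitance, voltage and frequency below are assumed values, not figures for any real chip.

```python
def charge_cycle_energy(c_farads: float, vdd: float):
    """Energy bookkeeping for one 0 -> 1 -> 0 cycle of an ideal capacitive input."""
    drawn = c_farads * vdd ** 2          # taken from the supply while charging
    stored = 0.5 * c_farads * vdd ** 2   # left on the capacitor once charged
    lost_charging = drawn - stored       # burned in the driver's resistance
    lost_discharging = stored            # drained to ground on the 1 -> 0 edge
    return drawn, lost_charging, lost_discharging


def dynamic_power(c_farads: float, vdd: float, freq_hz: float, activity: float = 1.0) -> float:
    """Classic dynamic power estimate: activity * C * V^2 * f."""
    return activity * c_farads * vdd ** 2 * freq_hz


# Assumed example: 1 nF of total switched capacitance, 1.2 V supply, 2 GHz clock.
drawn, up, down = charge_cycle_energy(1e-9, 1.2)
print(f"per cycle: {drawn:.3e} J drawn, {up:.3e} J lost charging, {down:.3e} J lost discharging")
print(f"dynamic power: {dynamic_power(1e-9, 1.2, 2e9):.2f} W")
```

Note that the loss while charging does not depend on the value of the resistance, which is why a smaller driver resistance alone does not help; recovering the stored half is the part a charge-recycling scheme would have to address.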
Why not? If this works it sounds like Moore's law would continue, and would give whatever company that deployed it first a performance advantage.
Because first they're going to get a bunch of their theoreticians to work the math on the problem to make sure it's viable. Then they're going to get a bunch of their VLSI modellers to run virtual simulations on the clock modification to refine exactly how great the potential efficiency gain would be. If that turns out OK then they'd produce some simple mock-ups of the new clock architecture to make sure that it functions correctly in hardware. Then they'd go about the expensive and time-consuming process of redesigning the current chip architectures to include the new style clock. Then they'd produce an initial fabrication of the chip to run through extensive hardware testing (and on the inevitable failure they'd hop two steps back and try again.) Once they were happy with the design they'd scale up to full production and roll it out.
Everybody in the microprocessor design world remembers this all too well.
The gift of death metal does not smile on the good looking.
You can't readily adjust the amount of time it takes electricity to make its way around a fixed-size loop. If this is what is actually clocking the chip, it'll have an official frequency (or two, perhaps, for low-power usage) and you'll be stuck with that. The manufacturer would have to throw out, rather than derate, any parts that don't work at that frequency.
Like with asynchronous processors, maybe its downside will be the silicon area required to implement it.
Other techniques like multiple independent clock areas that can be shut down when not in use seem far more beneficial, IMHO.
Now, whether it is linear or not, any heat reduction is a Good Thing (tm).
Hopefully we can choose between faster chips at the heat levels we have now, or the same speed chips at a 37.5% reduction in heat (and points in between).
It just amazes me that a small, never-before-heard-of company offers a solution to a problem that Intel, IBM, and AMD have been trying to solve for over a decade, each of which has 10 times the budget, expertise, and personnel. Did I mention a headstart of a minimum of 10 years of R&D tossed at this problem? I hate to be a pessimistic troll-like poster, but without even a working proof of concept, I can only call this vaporware until they show me a working product. This article says nothing except "we have technology every computer in the world will need in the next ten years... please invest in us and we'll get you a demo soon."
Why? Quite a few guys got car battery adapted to work with laptops. Up to a week on a single charge! :)
So, would it be possible to make a 3-D chip?
Yes, by stacking multiple dies in one chip. The problem however is thermal. It's hard enough getting one die to cool down. How do you propose flushing the heat of the dies sandwiched in the middle?
Life is not for the lazy.
An impossible concept only invented like a hundred years ago. Next, they will be charging things known as capacitors from the induced current.
I share your doubts, but must point out that current hybrid cars already use regenerative braking. The efficiency is only something like 30% (losses to transmit through the CVT, generate, store, spin the motor again), but it's still a little bit of return. Since the motor is already designed to act as a generator, it should be little extra investment to program the transmission to load the motor before mechanically engaging the brakes.
In your average laptop, the power consumed by a CPU when running something (i.e. not just idling around) is about half the total power. The other half, roughly, is consumed by the screen.
The Raven
I've read the FA and despite having a couple of CMOS designs behind me I don't understand a bit of what they are saying. Either the reporter that wrote this has absolutely no idea what he is writing or this entire 'breakthrough' is just vapourware.
The article seems to say that the 'tick' of the clock is carrying energy throughout the chip and when the 'tick' hits the edge, the energy is lost. Electronics in your typical digital circuit does not work that way. Energy does not flow through the chip with the signals (ok, it does theoretically, but that amount is negligible compared with the dynamic losses in the gates mentioned below).
You get power dissipation in each gate or buffer that changes state because of some signal, regardless of the direction in which the information is flowing. You cannot recycle this power. This comes directly from the basic principle behind CMOS technology (used by almost all digital chips today) - you are charging and discharging a capacitor.
A typical example showing that running signals around a circuit does not save power: take a ring oscillator (a number of inverters wired in a loop). This circuit will oscillate (send changing signals through its loop) and consume a considerable amount of power.
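To put rough numbers on the ring oscillator example, here is a small sketch using the usual first-order formulas: frequency 1/(2·N·t_d) for N inverters with delay t_d each, and C·V² of energy per node per period. The stage delay, node capacitance and supply voltage are assumed values chosen only for illustration.

```python
def ring_oscillator(n_stages: int, stage_delay_s: float, node_cap_f: float, vdd: float):
    """First-order frequency and dynamic power of an inverter ring oscillator."""
    if n_stages < 3 or n_stages % 2 == 0:
        raise ValueError("a simple inverter ring needs an odd number of stages (>= 3)")
    # An edge must travel around the ring twice to complete one full period.
    freq_hz = 1.0 / (2 * n_stages * stage_delay_s)
    # Every node charges and discharges once per period, costing C * V^2 each.
    power_w = n_stages * node_cap_f * vdd ** 2 * freq_hz
    return freq_hz, power_w


# Assumed values: 20 ps per inverter, 2 fF per node, 1.0 V supply.
for n in (5, 11, 101):
    f, p = ring_oscillator(n, 20e-12, 2e-15, 1.0)
    print(f"{n:3d} stages: {f / 1e9:6.3f} GHz, {p * 1e6:6.1f} uW")
```

In this model the power works out to C·V²/(2·t_d) no matter how many stages there are: a longer ring oscillates more slowly but switches more nodes, so circulating the signal by itself recycles nothing.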
P.S. In this context, the correct spelling of nerd is E-N-G-I-N-E-E-R ;^)
In most respects, chips today are ALREADY 3d in that there are multiple layers of planar (flat layers) metal wiring (anywhere from 4 to 8) connected by vias (vertical interconnect) over a single layer of transistors. The routing of signals on each layer is on purpose designed to be a crazy-ass network (to avoid electromagnetic signal coupling noise between adjacent wires).
However, in current technology, there's still only 1 layer of transistors, and the main limitation of adding more is that there's no good way to get rid of the heat of transistors. Even today, there isn't a good way to get rid of the heat of the transistors in the 1 layer of current chips, let alone a big pancake stack (or lasagna) of transistors. People are already starting to stack memory chips that don't get too hot together, and I'm sure they'll eventually start doing different kinds of stacks too as they get better at figuring out the heat problem...
What a breakthrough
FDIV wasn't particularly obscure; IIRC it went unnoticed for a very long time and affected many real world calculations. It was unlike many other errata in the regard that it was a documented function misbehaving and was not caught early. You could see it in action simply by loading up a spreadsheet app and doing a division. The software workaround wasn't that difficult, but the lack at the time of microcode support made it a big hassle.
The Pentium also had the more egregious F00F bug, the nonexistent opcode which would simply deadlock the processor and could be called in any mode. The workaround was a huge performance drain on all OS's. These two problems were probably the two most serious and publicized errata for the Pentium, but they were certainly not the only ones. If you are suggesting that any microprocessor of equivalent complexity produced in recent years has shipped without a flaw, I'd like to know about it.
"Intel is not going to redesign the Pentium tomorrow because of it," he said.
Why not?
For starters the automated design tools will need a rehack.
Current synchronous chips use a "clock tree" to try to get all the flops and latches to clock at once. Then the design tools assume that the outputs flip at the same time and try to route the signals so they all get through the logic to set up the flops in time for the next clock.
This scheme will produce waves of clocking that propagate across/around the chip. So different flops will be clocked at different times. This is good for signals going the same direction as the clocking wave (though not perfect, since the propagation time of a signal on a wire is NOT linear with length), because they get extra time to set up the next flop. It's rotten for signals going the other way.
But it's a disaster for design software that doesn't understand the issue.
So new versions of the tools will be needed that can take the non-simultaneous clocking into account, both to compute the layout and wiring right and to take advantage of the effect to achieve improved performance by arranging for timing-tight data paths to "go with the flow" and slower stuff to go the other way.
Even if this hack works, getting those tool mods done, and getting them right, will hold up large projects using it.
(But something can be done meanwhile, with unaware tools, by doing some manual layout of blocks with respect to the clocking waves and telling the tools to treat each block as if it had a simultaneous clock internally, skewed with respect to other blocks and with less setup/hold time margin to take into account the internal skew.)
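The "go with the flow" effect can be illustrated with the standard setup and hold slack checks that timing tools perform. This is a simplified sketch, not how any particular tool does it; all delays are assumed numbers in picoseconds, and skew is defined here as capture-clock arrival minus launch-clock arrival.

```python
def setup_slack(period, clk_to_q, logic_delay, setup, skew):
    """Setup slack: positive means data arrives in time for the next capture edge.

    skew > 0: the capturing flop is clocked later (data travels with the clock wave).
    skew < 0: the capturing flop is clocked earlier (data travels against the wave).
    """
    return period + skew - (clk_to_q + logic_delay + setup)


def hold_slack(clk_to_q, min_logic_delay, hold, skew):
    """Hold slack: positive means new data does not race through too early."""
    return clk_to_q + min_logic_delay - (hold + skew)


# Assumed path: 500 ps period, 50 ps clock-to-Q, 420 ps of logic, 40 ps setup time.
for skew in (0, +30, -30):
    print(f"skew {skew:+4d} ps -> setup slack {setup_slack(500, 50, 420, 40, skew):+4d} ps")

# The same +30 ps skew that helps setup eats into hold margin on short paths.
print("hold slack on a short path:", hold_slack(50, 10, 20, 30), "ps")
```

With these numbers the path fails with a perfectly simultaneous clock, passes when the data runs with the wave, and fails by more when it runs against it, which is why the tools have to know the skew at every flop.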
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
Remember, in advertising-speak, "up to" means "less than". Values between 0% and 75% fulfill the conditions of being "up to a 75% savings".
Weaselmancer
ridiculous.
You're a bit confused, I think. If something is patented, then (in theory, at least), there is a publicly available patent disclosure that describes the technique in sufficient detail that anyone "skilled in the art" of its field should be able to read and implement it. Patents and trade secrets are mutually exclusive.
Clocks are not a high percentage of the power. They're not trivial, but mostly the problem with clocks is the length of the line. The bus between the register file and ALU is probably 1/20th the length of the clock traces.
Compared to all the other logic in a cpu from the decoders to the schedulers to the ALUs, load-store, and then all the support pipeline registers, control logic, etc not to mention the cache...
The problem with "doing away with the clock" is being able to co-ordinate things in some usable amount of time. Each pipeline stage would need bidirectional signalling to co-ordinate the state transition, etc.
There already is some of that in the modern processors. When you do
MOV EAX,[EBX]
Provided there are no dependencies the scheduler will assume it takes 3 cycles [on the AMD side] to complete. It will then stall the ALU for two additional cycles before attempting to feed something in. But the MOV may not be finished so there is some need for feedback.
That in mind, if they are truly 1-cycle ops the scheduler will pump out instructions without waiting. The write to the register file is on a clock edge, etc, etc, etc... A lot of things "just assume" data will be ready. Similarly talking with other things like memory will have to get into a clock domain making things a bit more complicated.
The fact of the matter is asynchronous circuits are not new. They're just not space efficient. You get the efficiency of running as fast as possible (e.g. if your ADD takes 0.25ns and your ADC takes 0.35ns then executing ADD will be faster as the ALU will signal it's finished 0.10ns sooner) but waste a lot of space with the synchronization steps.
If you look at the ARM case they maxed out at ~80 MIPS or so. The typical Athlon gets ~1 MIPS per MHz at a minimum and up to 2 MIPS per MHz on more efficient code. So a 2 GHz processor is netting upwards of 2000 MIPS. Sure it takes more power but if you install the requisite 25 ARM cores the "power efficiency" drops quickly.
That said, there are good uses for that core. It leaks less RF energy as it's not pulsing at a fixed frequency. That is, it does leak RF, just it's spread over more of the spectrum. Also if the cpu is idle it effectively is not switching which reduces power.
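The core-count comparison above is simple division; a quick sketch using only the rough figures quoted in this comment (they are the poster's estimates, not benchmark results):

```python
import math


def cores_needed(target_mips: float, mips_per_core: float) -> int:
    """Number of slower cores required to match a target throughput."""
    return math.ceil(target_mips / mips_per_core)


ARM_ASYNC_MIPS = 80   # "~80 MIPS or so" from the comment
ATHLON_MHZ = 2000     # the 2 GHz part used in the comment

for mips_per_mhz in (1, 2):
    athlon_mips = ATHLON_MHZ * mips_per_mhz
    n = cores_needed(athlon_mips, ARM_ASYNC_MIPS)
    print(f"{mips_per_mhz} MIPS/MHz -> {athlon_mips} MIPS -> {n} ARM cores")
```

This assumes the workload splits perfectly across cores, which flatters the multi-core side of the comparison.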
Tom
Well, the FDIV was NOT obscure (I remember seeing it in every major PC magazine at that time), and it was not only one obscure bug, but more like 0.9986756235 bug.
Of Code And Men
In addition to the already cited
http://www.eetimes.com/news/latest/showArticle.jhtml;jsessionid=SG3NCFVRB3QWEQSNDBESKHA?articleID=187200783
the EE Times piece (in the printed edition, not up on the web) has a sidebar
with neat background on the inventor:
________
Christmas present leads to rotary wave epiphany
The Rotary Traveling Wave technology was the brainchild of MultiGig Inc.
founder and chief technology officer John Wood, a self-taught inventor
and son of an inventor who developed a method for self-aligning installed
underground water pipes. In a company filled with PhDs, Wood is the only
employee without a college degree.
Wood earned millions from a patent on this technique for flash-welding
plastic materials. His passion for technology drives him to order textbooks
by the dozen when pursuing a new subject, sometimes noting their errors in
scribbled notes in the margins, said MultiGig COO Haris Basit. "I've worked at
research labs including Yorktown Heights and Bell Labs, and John is clearly
a cut above," Basit said.
In the late 1990s, Wood was researching high-speed serial I/O using
traditional ring and crystal oscillators. "As I started to explore alternatives,
the first thing I looked at was transmission times," he said.
An initial prototype, using coaxial cables, was "not very exciting."
Then Christmas 1998 brought an epiphany. "My son had just gotten a
car racing game with a crossover on a single track. That gave me the idea
for arranging the transmission line that way," said Wood.
After a few more months of work, Wood decided to use arrays of loops
to create an approach that could work independently of any frequency
or process technology.
"It took a year or two until we could find direct commercial applications.
Before that, I was just working on it as a hobby," said Wood. "But the more we
looked at clock distribution, the more we realized this could be useful."
-- Rick Merritt
The press has a knack for distorting stories and making it very hard to figure out real technical details.
http://multigig.com/pub.html has some whitepapers. I read the ISSCC 2006 slide set, which let me know the general technique.
Basically, they produce a clock ring to produce a "differential" clock pair that after one lap swaps neg and pos and so its frequency is tuned by its own capacitance and inductance. They call it a "moebius" loop since it's not really a differential pair, but the clock wave makes two round trips before getting back to the start. Neighboring loops can be tuned together (although if that's by just routing the wave throughout the chip I'm not sure). They didn't seem to mention synchronizing the period to outside sources, and I'm not sure how they'll be able to do that.
The clocking is not the interesting part to me, but rather their logic strategy. The trick is that logic itself has no connection to power or ground. The clock nets provide the "power and ground" and all logic must be done as differential (a and abar as inputs, q and qbar as outputs). This is where they get the power savings from--the swings are reduced and there's no path to power or ground to drain away charge. Without really discussing it, charge seems to just shift around on internal nodes between the differential logic states. They then use pure NMOS fets for logic, which removes all PMOS. The logic will never reach the power rail, though--it will always be a Vt drop. I just looked this over quickly, but it seems the full-swing clocks and lack of PMOS make this work out fine.
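If that reading is right, the loop frequency falls out of ordinary transmission-line math. On a lossless line a wave takes sqrt(L_total * C_total) to travel its full length, and the crossover means a full period is two laps. The sketch below rests on those assumptions; the inductance and capacitance values are made up, and none of this is taken from Multigig's own design equations.

```python
import math


def rotary_loop_frequency(l_total_h: float, c_total_f: float) -> float:
    """Estimated oscillation frequency of a crossed (Moebius-style) transmission-line loop.

    l_total_h -- total loop inductance in henries (assumed lossless line)
    c_total_f -- total loop capacitance in farads, including whatever loads hang on it
    """
    lap_time_s = math.sqrt(l_total_h * c_total_f)  # one trip around the loop
    return 1.0 / (2.0 * lap_time_s)                # the wave inverts each lap, so a period is two laps


# Assumed example values: 1 nH and 10 pF around the whole loop.
f = rotary_loop_frequency(1e-9, 10e-12)
print(f"{f / 1e9:.2f} GHz")

# Hanging more gate capacitance on the loop slows it down.
f_loaded = rotary_loop_frequency(1e-9, 40e-12)
print(f"{f_loaded / 1e9:.2f} GHz with four times the capacitance")
```

It also shows why the frequency is set by the layout and loading rather than dialed in afterwards: both terms under the square root are physical properties of the loop.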
For quick adoption, they'll need to work out clever techniques to connect this logic to standard clocked logic. Otherwise, it looks only a little bit easier to use than asynchronous logic. The issues they face seem very similar to asynchronous logic issues--tool support, interface to standard clocked logic, debug, test, etc.
It's not vapor.
actually...sharp turns are a problem for high frequency circuits. when the frequencies get very high compared to the wire's length, the waves *do* actually reflect back from sharp corners and will favor a straight path. this is the basis for things such as tdr (when finding kinks) and directional couplers.
Clock skew impacts your timing margin (If you've got 2 flip flops that in theory see the clock at the same instant, any uncertainty in the clock arriving will impact your timing from one to the other). One consequence of this is you often have to have larger faster drivers on both your clock tree and your logic to work around this timing problem.
Larger drivers = larger power.
Therefore if you've got a method to make your clocks arrive more accurately then you've more timing margin between FFs and therefore can use smaller drivers.
Clock trees are also the major consumer of power in most designs, so anything that can reduce them is good.
Async removes the clock altogether so you save power there.
So yes both of them can be right.
"The weirdest thing about a mind, is that every answer that you find, is the basis of a brand new cliche" -
Last time I checked, speed of electron flow is only based on the material around it. Higher dielectric constant = lower speed of propagation. Transmission lines aren't voodoo science, they are a property of the electrical length of the line and the rate of change of the signal on that line. It does not change the rate of propagation at all. Whether a given wire is 1" long, or 200 miles long, it will not change the speed of propagation.
I didn't say electron flow speed changes. I said signal propagation speed changes, which is true, because if I send a "1" down a long transmission line, the receiver will get it faster than they'd get it if I send a "1" using RC-style signalling. As I tried to explain before, in a normal signalling scheme, you charge an entire line up to Vdd or Gnd, and don't detect a 1 or 0 until the signal crosses Vdd/2. Take an empty trough and start filling it up; see how long it takes the water level to reach half way up on the far side. It'll cross the half way point at the far side pretty soon after it crosses at the near side, but actually filling and emptying it will still take a while. With transmission line signalling, however, you never actually charge/discharge the whole line, but send a wave down it instead. Take a trough of water and make a ripple, then on the receiving side observe the ripple. If you want to read a proposal for on-chip transmission lines, read this.
There are a lot of issues involved with using transmission lines (for example, wires have to be long before transmission-line signalling becomes better, and you have to do impedance matching at the receiver to avoid reflections, and based on the paper I linked to, your wires need to be wide and thick), but they do offer some very cool properties.
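A rough way to see both sides of this exchange is to compare the two delays directly. The sketch below uses the common approximation that a distributed RC wire reaches 50% of its final value after about 0.38·R·C, and a wave speed of c/sqrt(eps_r) for the transmission-line case. The per-length resistance and capacitance and the dielectric constant are assumed round numbers, not taken from the linked paper, and the model ignores drivers, repeaters and line loss.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in vacuum, m/s


def time_of_flight(length_m: float, eps_r: float) -> float:
    """Wave delay along a line surrounded by a dielectric of relative permittivity eps_r."""
    return length_m * math.sqrt(eps_r) / C_LIGHT


def distributed_rc_delay(length_m: float, r_per_m: float, c_per_m: float) -> float:
    """Approximate 50% delay of an unbuffered distributed RC wire: 0.38 * R_total * C_total."""
    return 0.38 * (r_per_m * length_m) * (c_per_m * length_m)


# Assumed wire: 100 ohm/mm, 0.2 pF/mm, oxide-like dielectric with eps_r = 3.9.
R_PER_M, C_PER_M, EPS_R = 100e3, 200e-12, 3.9

for length_mm in (1, 5, 10):
    length_m = length_mm * 1e-3
    rc = distributed_rc_delay(length_m, R_PER_M, C_PER_M)
    tof = time_of_flight(length_m, EPS_R)
    print(f"{length_mm:2d} mm: RC ~ {rc * 1e12:7.1f} ps, time of flight ~ {tof * 1e12:5.1f} ps")
```

The RC delay grows with the square of the length while time of flight grows linearly, which fits the point that wires have to be long before transmission-line signalling pays off. The propagation speed itself depends only on the dielectric, as the parent post says.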
Not to be cheeky, but it's quite easy to change a sine wave into a square wave: Schmitt trigger. While I can't rule this out entirely, I would imagine that if it was more economical to produce an LRC resonator, it would be built into devices already. These circuits have been around for decades. It's very difficult to beat quartz crystals in terms of stability, ease of use, and power consumption.
I didn't say it was a flawless idea, and I also didn't say it was a stupid idea. I DID say you have to design your circuits differently (i.e. your flip flops do schmitt-trigger-like things to compensate for the slow slew rates). I brought it up because it was an example of charge recovery that works in the real world. It does have downsides, but every option has downsides (be it power, skew, manufacturability, whatever). Based on the presentation I saw, the downsides of that particular clocking method are enough to keep it out of mass-produced designs for a while, but that doesn't mean somebody else might not have found a way to make charge-recovering clocks more realistic. It's worthwhile research (meaning it might not be in the CPU you buy tomorrow, it might not be in any mass produced CPU ever, but it might also lead to a design that IS mass produced in the future, based on the knowledge gained from this research).
Wrong. The clock drives into a high impedance node. (The CMOS receivers on the other side of the clock line). CMOS drivers do have the problem of connecting to ground temporarily during switching - more akin to spilling some of the water out of the bucket as you pour it, not pouring it entirely on the ground. This can be overcome using clocks that are 90deg out of phase.
That's not what I was talking about. Short-circuit current is not a big deal as long as your signal slew rates are good.
And if the cap that you're talking about is the 10pF or so that is on the gate of the receiver CMOS - there are larger fish to fry power wise than this minimal capacitance.
I mentioned only gate cap on the clock receivers to simplify things. Since you're goin