AMD's Piledriver To Hit 4GHz+ With Resonant Clock Mesh
MojoKid writes about some interesting news from AMD. From the article: "Advanced Micro Devices plans to use resonant clock mesh (PDF) technology developed by Cyclos Semiconductor to push its Piledriver processor architecture to 4GHz and beyond, the company announced at the International Solid-State Circuits Conference (ISSCC) in San Francisco. Cyclos is the only supplier of resonant clock mesh IP, which AMD has licensed and implemented into its x86 Piledriver core for Opteron server processors and Accelerated Processing Units. Resonant clock mesh technology will not only lead to higher clocked processors, but also significant power savings. According to Cyclos, the new technology is capable of reducing power consumption by 10 percent or bumping up clockspeeds by 10 percent without altering the TDP."
Unfortunately, aside from a fuzzy whitepaper, actual technical details are all behind IEEE and other paywalls with useless abstracts.
It's all vaporware till they ship and it works.
If they pull it off, though, it might give Intel a run for their money again. It's about time!
whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
for a single executing thread of a specific bit width GHz means everything.
The trick is can they scale it to multiple cores/threads, while lowering their power to match Intel's performance/Watt at the high end of the compute arena. If they can do that they will once again pull in DC customers.
-nB
Man, I had to read that 4 times and I'm still not quite exactly sure what you're saying.
Let me give it a stab.
Unless it can provide competition for Intel's CPUs at the same price level, and not use a ton more power to do it (as they have been doing recently), I don't think there is any point in caring.
Communication isn't just about belching words, but actually putting them down so people can understand them.
So why post an article that contains no meaningful information?
Oh wait . . . never mind. I forgot where I was.
Intel is already running at 4GHz+. Ok not officially, but it is almost impossible to find a Sandy Bridge K series that won't easily overclock to 4Ghz or more. I bumped my 2600k to 4GHz. No voltage increase, no messing around, just turned the multiplier up. Zero stability issues, doesn't even draw a ton more power. Basically they are just being conservative for thermal reasons.
The 22nm Ivy Bridge is soon to launch as well. Never mind any potential better OCing, it is faster per clock than SB. Well SB is a good bit faster than Bulldozer (whose architecture Piledriver uses) per clock, sometimes more than a bit (depends on what you are doing).
So no, they'd need way more speed to give Intel any kind of run for their money, unfortunately. What they really need is a better design, something that does better per clock, but of course new designs take a long time and BD itself was quite delayed.
Remember the one and only time AMD did eclipse Intel was during Intel's P4 phase. Intel had decided to go for low work per clock, high clock speed. Well speeds didn't scale as they'd hoped and the P4 was not as powerful for it. AMD chips were tops. However the Core architecture turned all that around. It was very efficient per clock, and each generation just gets better. Meanwhile AMD stagnated on new architectures, and then released Bulldozer which is not that great.
Also they have to fight the losing fab battle. They spun off their fabs and as such aren't investing tons of R&D in it. Well Intel is, and thus are nearly a node ahead of everyone else. Other companies are just in the last few months getting their 32nm node and 28nm half-node production lines rolling out products to retail channels. Intel has their 22nm node process complete and is fabbing chips for retail release in a couple months. So they've got that over AMD, until other fabs catch up, by which time Intel will probably have their 14nm half-node process online in Chandler (the plant construction is in full swing).
Sadly, things are just not good in the x86 competition arena. AMD competes only in a few markets, and Intel seems to edge in more and more. Servers with lots of cores for reasonable prices seems to be the last place they really have an edge, and that is a small market.
I don't want to see a one player game, but AMD has to step it up and this unfortunately is probably not it. If they make it work, expect Intel to just release faster Core i chips with higher TDP specs. The massive OCing success shows they could do so with no problem.
There are no technical details. It's intellectual property, so it's powered by pixie dust, mana potions, and lawyers. Can't get more meaningful than that.
You can hold down the "B" button for continuous firing.
The Bulldozer and i7-2600K were about the same performance-wise, but that is an 8-core CPU vs. 4 cores + HT. Power usage of both machines at the wall was around 250 watts under load. When both were overclocked, the Bulldozer to 4.8GHz and the i7 to 5GHz, the i7 used 80 more watts while the Bulldozer doubled its draw to over 500 watts; I think it was 550 watts.
Single core performance is all that matters when processing a toolpath for CNC machining.
Rubbish. There is no way your CNC machining app will even get close to the minimum latency that a single AMD core is capable of. What you are really saying is that your vendor is slow to get a clue about parallel programming.
Have you got your LWN subscription yet?
Maybe it will catch up to the Sandy Bridge Core i5 now?
I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
but clock resonance sounds like it wouldn't play well with changing the clock frequency.
What you are really saying is that your vendor is slow to get a clue about parallel programming.
Maybe there are CNC algorithms that aren't easily parallelizable. Or (more likely) they can be parallelized, but the CNC development teams haven't got around to doing that yet. It doesn't really matter which as far as the consumer is concerned -- in either case, they will want a chip that maximizes single-threaded performance. Finger-pointing doesn't help them one bit, but fast CPUs might.
I don't care if it's 90,000 hectares. That lake was not my doing.
Quick background: Currently, clocks on most generic chips are structured as trees. As you can imagine, the fan-out of the clock trees is pretty large and thus requires clock buffers/driver circuits, which need to be balanced so that the clock signal gets to the leaves at about the same time (in a typical design where you don't use a lot of physical design tricks). To ease balancing the propagation delay, the clock tree often physically looks like a fractalized "H" (just imagine the root clock driving in the center of the crossbar out towards the leaves at the corners of the "H", with the wire lengths of the clock tree segments the same, then the corners of the big "H" driving the center of a smaller "H", etc., etc.). Of course, at the leaves there can be some residual imbalance due to small manufacturing variations and wire loading, and that has to be accounted for in closing the timing for the chip (to avoid short paths), and ultimately these imbalances limit the upper frequencies achievable by the chip.
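To make the "fractalized H" picture concrete, here is a minimal Python sketch of an idealized H-tree. The function name, the starting half-width, and the number of levels are made up for illustration; real clock trees also have buffers, wire loading, and process variation that this ignores. It only shows the geometric point: every leaf ends up the same wire length from the root.

```python
def h_tree_leaves(x, y, half, levels, dist=0.0):
    """Return (leaf_x, leaf_y, wire_length_from_root) for an idealized H-tree.

    Each level routes from the center (x, y) along the crossbar (+/- half in x)
    and then along an upright (+/- half in y) to the four corners of an "H".
    Each corner becomes the center of a half-sized "H" at the next level.
    """
    if levels == 0:
        return [(x, y, dist)]
    leaves = []
    for dx in (-half, half):        # left or right along the crossbar
        for dy in (-half, half):    # up or down along the upright
            leaves += h_tree_leaves(
                x + dx, y + dy, half / 2, levels - 1,
                dist + abs(dx) + abs(dy),  # Manhattan wire length added at this level
            )
    return leaves


# Hypothetical geometry: 3 levels, first "H" spans +/- 4 units from the root.
leaves = h_tree_leaves(0.0, 0.0, 4.0, 3)
lengths = {d for _, _, d in leaves}
print(f"{len(leaves)} leaves, distinct root-to-leaf wire lengths: {lengths}")
```

In this ideal model the set of wire lengths has exactly one element, which is the whole point of the H shape. The residual imbalance described above comes from everything the model leaves out.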
Additional background: In any electrical circuit, there are some so-called resonant frequencies because of the distributed (or lumped) inductance and capacitances in the network. That is, some frequencies experience a lot less energy loss than average (for the car analogy buffs, you can get your car to "bounce" quite easily if you bounce it at its resonant frequency).
The basic idea of the Cyclos technology is to "short-circuit" the middle of the clock tree on the chip with a mesh to make sure the whole middle of the clock tree is coordinated to the same clock (as opposed to a typical H-tree clock, where jitter builds up at every stage from the root). That way you avoid some of the imbalances that limit the upper frequencies achievable by the chip. The reason I say "short-circuit" is that it really isn't a "short circuit". If you just arbitrarily put a mesh in the middle of a clock tree, although it would tend to get the clocks aligned, it would present a very large capacitive and inductive load to drive and would likely increase power greatly. **Except** if that mesh was designed so that it resonated at the frequency at which you were going to drive the clock, then you can get the benefit of jitter reduction w/o the power cost. Since you get to pick the physical design parameters of the mesh (wire width, length, grid spacing, and external tank circuit inductance) and the target frequency, theoretically you can design that mesh to be resonant (well, that remains to be seen).
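As a rough illustration of "design the mesh to be resonant": for an ideal lumped LC tank the resonant frequency is f = 1 / (2π√(LC)), so once you know the capacitance you can solve for the inductance that puts resonance at the target clock. The sketch below does just that. The 100 pF capacitance is an invented placeholder, not a figure from AMD or Cyclos, and a real mesh is a distributed network rather than a single L and C.

```python
import math


def resonant_frequency_hz(inductance_h, capacitance_f):
    """Resonant frequency of an ideal lumped LC tank: f = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


def inductance_for_target_h(target_hz, capacitance_f):
    """Inductance that makes an ideal LC tank resonate at target_hz."""
    return 1.0 / ((2.0 * math.pi * target_hz) ** 2 * capacitance_f)


# ASSUMPTIONS (illustrative only, not from the article or white paper):
C_SECTOR_F = 100e-12   # capacitance of one clock-mesh sector, 100 pF
F_TARGET_HZ = 4e9      # the 4 GHz target from the headline

l_needed = inductance_for_target_h(F_TARGET_HZ, C_SECTOR_F)
f_check = resonant_frequency_hz(l_needed, C_SECTOR_F)
print(f"inductance needed: {l_needed * 1e12:.2f} pH")
print(f"resonance check:   {f_check / 1e9:.3f} GHz")
```

The takeaway is only the shape of the relationship: the larger the capacitance you have to drive, the smaller the inductance has to be for the same target frequency.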
The reason this idea hasn't been used to date is that it's a hard problem to create the mesh with the proper parameters and now the processor really has to just run at that frequency all the time (well, you can do clock cycle eating to approximate lower frequencies). Designers have gotten better at these things now and the area budgets for these types of things have gotten in the affordable range as transistors have gotten smaller.
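To see why the clock is more or less pinned to one frequency, here is a toy model that treats the mesh plus inductor as an ideal parallel RLC tank. For that circuit, the current a driver must supply for the same voltage swing grows as sqrt(1 + Q²(f/f0 − f0/f)²) relative to its value at resonance. The quality factor of 5 is a guess for illustration only, and the function name is made up.

```python
import math


def relative_drive_current(freq_ratio, q_factor):
    """Drive current needed for the same voltage swing, relative to resonance.

    freq_ratio is f / f0. Models an ideal parallel RLC tank, where
    |Y(f)| / |Y(f0)| = sqrt(1 + Q^2 * (f/f0 - f0/f)^2).
    """
    detune = freq_ratio - 1.0 / freq_ratio
    return math.sqrt(1.0 + (q_factor * detune) ** 2)


Q_ASSUMED = 5.0  # ASSUMPTION: illustrative quality factor, not a published value

for ratio in (1.0, 0.95, 0.75, 0.5):
    current = relative_drive_current(ratio, Q_ASSUMED)
    print(f"f/f0 = {ratio:.2f}: {current:.2f}x the drive current at resonance")
```

In this model a few percent of detuning costs little, but running well below resonance means the drivers have to push several times more current, which defeats the purpose.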
FWIW, in a pipeline design (like a CPU), sometimes it's advantageous to have a clock-follows-signal clocking topology or even an async strategy instead of a clock tree, but there is of course a complication if there is a loop or cycle in the pipeline (often this happens at, say, a register file or a bypass path in the pipeline), so that trick is limited in applicability, whereas the mesh idea is really a more general solution to clock network jitter problems.
Here's a white paper that describes this idea... http://www.cyclos-semi.com/pdfs/time_to_change_the_clocks.pdf
He's not talking about running the g-code, he's talking about generating it from a model. Most CAM software are very CPU intensive for toolpath generation.
Mind the frickin' laser...
Have gnu, will travel.
and has been for at least 5 years. A theoretical 10% performance boost? Gimme a break. I upgraded from a Core2Duo E6600 @ 2.4GHz to a quad core i5 2600k which runs at an overclocked 4.5GHz on air... Day to day, the new rig delivers a *mostly* perceptible performance advantage, but nothing earth shattering... I give you several recent changes that felt bigger:
1. Moving from hard drive to SSD
2. Moving from a DirectX9 class GPU to a DirectX 11 GPU (at least in games).
3. Move from pre-JIT JS browser engine to a JIT-engined browser.
As far as desktop CPU development goes, I think the future is largely about optimizing software for the multi-core architectures, not adding Gigahertz.
Except they won't sell them to you unless you are Sony or a reseller that's used to Defence pork contracts. The last time I finally got a price on a POWER CPU system (after two annoying weeks of the salesguy building up a "relationship" and carefully weighing my wallet) I gave up and got four Xeon systems that were almost as good each for a lower price than the single POWER CPU system.
Cyclos resonant clock mesh technology employs on-chip inductors to create an electric pendulum, or "tank circuit", formed by the large capacitance of the clock mesh in parallel with the Cyclos inductors. The Cyclos inductors and clock control circuits "recycle" the clock power instead of dissipating it on every clock cycle like in a clock tree implementation, which results in a reduction in total IC power consumption of up to 10%.
Inductors save power because unlike most other circuit elements, inductors are able to store energy in a magnetic field so it can be used later on. This is part of how switching power supplies get their efficiency.
We don't see the world as it is, we see it as we are.
-- Anais Nin
Agreed [that it looks like vaporware]. It's a breathlessly ebullient press release sales pitch.
Agreed it's a sales pitch. But not vaporware at all. Very neat solution. (I saw another with similar properties a couple years ago but this one is 'way better.)
The issue is the power consumption of the clocking of the chip. Modern designs are primarily layers of D-type flip-flop registers separated by small amounts of random logic and all the flip flops are clocked simultaneously, all the time. The clock signal is input to ALL the flipflops and a bit of the random logic. I'm guessing somewhere between one in five and one in ten gate inputs are driven about equally by CLK or ~CLK. Further, the other signals flip between one and zero once, sometimes, on each cycle. ALL the CLK signals flip from zero to one and back to zero EVERY cycle. So there's a lot of activity on the clock.
In CMOS the load on the clock is primarily capacitive - the stray capacitance of the CMOS gates and wiring - plus some losses, mainly due to the resistance of the wiring. The stray capacitance has to be charged and discharged every cycle. The charge represents energy. In a conventional design the clock drivers are essentially the same thing as logic gates (inverters). New energy is supplied from the power supply (and about half of it, excluding signal-line resistive losses, dumped as heat in the pullup transistors of the drivers) every cycle as the lines are charged. Then the charge is dumped to ground (and the rest of the energy dumped as heat in the pulldown transistors). All that energy gets lost as heat every cycle, and it represents about 30% of the power consumed by the chip. It would be nice to scavenge it and reuse most of it for the next tick.
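The charge-and-dump arithmetic can be written out. For a conventional driver swinging a capacitance C rail to rail at supply voltage V, the supply delivers C·V² per cycle: half is burned in the pullup while charging, and the half stored on the capacitor is burned in the pulldown while discharging. Multiply by frequency and you get P = C·V²·f. The capacitance and voltage below are invented round numbers, not measurements of any real chip.

```python
def clock_energy_per_cycle_j(capacitance_f, vdd_v):
    """Energy drawn from the supply per full charge/discharge cycle: C * V^2."""
    return capacitance_f * vdd_v ** 2


def clock_power_w(capacitance_f, vdd_v, freq_hz):
    """Average power of a conventionally driven clock net: C * V^2 * f."""
    return clock_energy_per_cycle_j(capacitance_f, vdd_v) * freq_hz


# ASSUMPTIONS (round numbers for illustration only):
C_CLOCK_F = 2e-9   # total clock load capacitance, 2 nF
VDD_V = 1.0        # supply voltage
F_HZ = 4e9         # 4 GHz clock

energy = clock_energy_per_cycle_j(C_CLOCK_F, VDD_V)
lost_charging = energy / 2      # dissipated in the pullups
lost_discharging = energy / 2   # stored on the capacitor, then dissipated in the pulldowns
power = clock_power_w(C_CLOCK_F, VDD_V, F_HZ)
print(f"energy per cycle: {energy * 1e9:.2f} nJ "
      f"({lost_charging * 1e9:.2f} nJ charging + {lost_discharging * 1e9:.2f} nJ discharging)")
print(f"clock power: {power:.1f} W")
```

Because power scales linearly with f and with the square of V, any energy you can recycle instead of dumping shows up directly as watts saved at a given clock speed.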
A previous invention used a half-wave transmission line looped around the chip and connected plus-to-minus. A big mobius strip. The CLK and ~CLK loads acted as distributed capacitance around the transmission line. A clock waveform circulated continuously, twice per cycle. Instead of a sea of drivers providing new energy and then throwing it away every cycle, the transmission ring had a few drivers distributed around it, keeping the wave circulating and correctly formed, and pumping in enough energy to replace the resistive losses while the bulk of the energy went round-and-round. Result: Most of the clock power requirements and heating load go away.
Unfortunately, the circulating clock wave meant the region completing a computation ALSO went round-and-round, rather than everything switching at the same time. Stock design tools assume CLK/~CLK is simultaneous (except for minor variations) across the whole chip. So using that earlier system would require a major rewrite on the stock tools and new design methodologies.
THIS system does a similar hack energetically, but with everything in sync. Instead of a sea of drivers driven by a carefully-balanced tree of pre-drivers, the CLK and ~CLK are constructed as a pair of heavy-conductor meshes - like two stacked layers of flattened-out window screens. These form two plates of a capacitor. These plates are connected by an inductor, forming a resonant "tank circuit". When this is "pumped up" by a few drivers and is "ringing", energy alternates between being an electric field between the screens and a magnetic field in the inductor coil, twice (once for each polarity) each cycle. Again the bulk of the energy is reused over and over while the drivers only have to replace the (mostly) resistive losses (and pump it up initially, over a number of cycles). Again the bulk of the clock power and heating is gone. But this time the whole chip is switching essentially simultaneously, so the stock design tools just work.
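A quick sanity check ties two numbers from this thread together: the estimate that the clock burns about 30% of chip power, and the "up to 10%" total savings quoted from Cyclos. If both hold, the tank only needs to recycle about a third of the clock energy. The sketch below is back-of-the-envelope arithmetic under those two figures, not a model of the actual design, and the function names are made up.

```python
def total_chip_saving(clock_share, recycled_fraction):
    """Fraction of total chip power saved if recycled_fraction of clock power is recovered."""
    return clock_share * recycled_fraction


def implied_recycled_fraction(clock_share, total_saving):
    """Fraction of clock power that must be recovered to reach total_saving."""
    return total_saving / clock_share


CLOCK_SHARE = 0.30           # commenter's estimate of the clock's share of chip power
CLAIMED_TOTAL_SAVING = 0.10  # Cyclos' "up to 10%" figure

needed = implied_recycled_fraction(CLOCK_SHARE, CLAIMED_TOTAL_SAVING)
print(f"clock energy that must be recycled: {needed:.0%}")
print(f"check: total saving = {total_chip_saving(CLOCK_SHARE, needed):.0%}")
```

So the claimed figure does not require a near-lossless tank, which fits with the drivers still having to replace resistive losses every cycle.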
Neat!
Downside (of both inventions): You can't quickly start and stop the clock in a given area or run it more than a few percent off the speed set by the resonance of the tank circuit or transmission line. No overclocking. Also no clock gating to save power on quiesc
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
I don't know about you, but I would be concerned about the effects of a resonance clock mesh cascade failure.
I know a guy who had to deal with a resonance cascade and it wasn't pretty.
Goodbye Slashdot. You've changed.