AMD's Piledriver To Hit 4GHz+ With Resonant Clock Mesh
MojoKid writes about some interesting news from AMD. From the article: "Advanced Micro Devices plans to use resonant clock mesh (PDF) technology developed by Cyclos Semiconductor to push its Piledriver processor architecture to 4GHz and beyond, the company announced at the International Solid-State Circuits Conference (ISSCC) in San Francisco. Cyclos is the only supplier of resonant clock mesh IP, which AMD has licensed and implemented into its x86 Piledriver core for Opteron server processors and Accelerated Processing Units. Resonant clock mesh technology will not only lead to higher clocked processors, but also significant power savings. According to Cyclos, the new technology is capable of reducing power consumption by 10 percent or bumping up clockspeeds by 10 percent without altering the TDP."
Unfortunately, aside from a fuzzy whitepaper, actual technical details are all behind IEEE and other paywalls with useless abstracts.
It's all vaporware till they ship and it works.
If they pull it off, though, it might give Intel a run for their money again. It's about time!
whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
But how will it scale? How many FLOPS can it pull? GHz doesn't mean squat.....
Misread the title as "AMD's Piledriver To Hit 4GHz+ With Resonant Cloak Mesh." Must say, that's a lot cooler than the reality.
Unless it can compete with Intel's CPUs at the same price level without using a ton more power, as AMD's chips have been, I can't say there's any point in caring.
So why post an article that contains no meaningful information?
Oh wait . . . never mind. I forgot where I was.
Whatever makes a better processor is a good thing, but I find it ironic AMD promoting higher clock speeds after renaming their processors due to the clock speed wars.
Intel is already running at 4GHz+. OK, not officially, but it is almost impossible to find a Sandy Bridge K series that won't easily overclock to 4GHz or more. I bumped my 2600k to 4GHz. No voltage increase, no messing around, just turned the multiplier up. Zero stability issues, doesn't even draw a ton more power. Basically they are just being conservative for thermal reasons.
The 22nm Ivy Bridge is soon to launch as well. Never mind any potential better OCing, it is faster per clock than SB. Well, SB is a good bit faster than Bulldozer (whose architecture Piledriver uses) per clock, sometimes more than a bit (depends on what you are doing).
So no, they'd need way more speed to give Intel any kind of run for their money, unfortunately. What they really need is a better design, something that does better per clock, but of course new designs take a long time and BD itself was quite delayed.
Remember the one and only time AMD did eclipse Intel was during Intel's P4 phase. Intel had decided to go for low work per clock, high clock speed. Well speeds didn't scale as they'd hoped and the P4 was not as powerful for it. AMD chips were tops. However the Core architecture turned all that around. It was very efficient per clock, and each generation just gets better. Meanwhile AMD stagnated on new architectures, and then released Bulldozer which is not that great.
Also they have to fight the losing fab battle. They spun off their fabs and as such aren't investing tons of R&D in it. Well Intel is, and thus are nearly a node ahead of everyone else. Other companies are just in the last few months getting their 32nm node and 28nm half-node production lines rolling out products to retail channels. Intel has their 22nm node process complete and is fabbing chips for retail release in a couple months. So they've got that over AMD, until other fabs catch up, by which time Intel will probably have their 14nm half-node process online in Chandler (the plant construction is in full swing).
Sadly, things are just not good in the x86 competition arena. AMD competes only in a few markets, and Intel seems to edge in more and more. Servers with lots of cores for reasonable prices seem to be the last place they really have an edge, and that is a small market.
I don't want to see a one player game, but AMD has to step it up and this unfortunately is probably not it. If they make it work, expect Intel to just release faster Core i chips with higher TDP specs. The massive OCing success shows they could do so with no problem.
Sounds fake. Like Hyperthreading. Sounds like they are doing some trick to make the numbers better, but not really improve any performance.
But I'm just an armchair slashfag know-it-all.
See Athlon vs P4. Both were best for single threaded stuff, owing to a single core. However the Athlon did more with less, got better performance at lower clocks. Why? It could do more per clock, or more properly took fewer clocks to execute an instruction.
IPC matters and the Core i series is really good at it. Bulldozer, not as good. What that means is that all other things being equal, BD needs to be clocked higher than SB to do the same calculations in the same time.
Well that is also a problem because the Core i series are beasts with regards to clock speed. You more or less cannot find a k series part that won't overclock to 4GHz on stock cooling at stock voltages with no stability issues.
You need both good IPC and good clock speed for bitchin' single threaded performance. Really the only thing Bulldozer has going for it is that it isn't a true 4 core system, in the classic way of thinking about it. It isn't a full 8 cores, but it is more than just 4 cores with 2 threads per core. So that can help for highly parallel stuff. Unfortunately, usually even in those cases SB wins out, and there is plenty that is not so highly parallel.
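To put rough numbers on the IPC-versus-clock argument above, here is a minimal sketch. It assumes the simple model time = instructions / (IPC x clock); the IPC figures are made-up placeholders, not measurements of Bulldozer or Sandy Bridge.

```python
def run_time_seconds(instructions, ipc, clock_hz):
    """Time to retire `instructions` at a given IPC and clock (simplified model)."""
    return instructions / (ipc * clock_hz)


def clock_needed_to_match(ipc_slow, ipc_fast, clock_fast_hz):
    """Clock the lower-IPC chip needs to finish the same work in the same time."""
    return clock_fast_hz * ipc_fast / ipc_slow


if __name__ == "__main__":
    # Hypothetical IPC values, chosen only for illustration.
    fast_ipc, slow_ipc = 1.25, 1.0
    needed = clock_needed_to_match(slow_ipc, fast_ipc, 4.0e9)
    print(f"Lower-IPC chip needs about {needed / 1e9:.2f} GHz to match 4.00 GHz")
```

The model ignores memory stalls, turbo behaviour and threading, so treat it as a way to see the trade-off, not as a benchmark.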
There are no technical details. It's intellectual property, so it's powered by pixie dust, mana potions, and lawyers. Can't get more meaningful than that.
You can hold down the "B" button for continuous firing.
This is a real thing. I'm sure anyone who has been reading the research literature knows that it works, but it's just very difficult to do well.
IBM was selling POWER 6 processors running at 5GHz years ago.
Maybe it will catch up to the Sandy Bridge Core i5 now?
I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
-Each logic gate in a chip needs a clock signal to get to it.
-This is normally done via a wavy, wormy mesh of clock wires.
-Clock skew (the clock edge takes a finite time to traverse the chip, so it reaches different gates at slightly different times) eats a growing share of each cycle as frequency rises. A fixed 100 ps of skew is 10% of the cycle at 1GHz but 30% at 3GHz. And so on.
-Clock skew = VERY BAD, big limiting factor in making faster chips.
-Cyclos has a solution.
A: Replace the clock with a simple "Tank Circuit" clock, to reduce the possibility of the clock not working.
B: Replace a massive mesh of interweaving wires with a "clock plane". Most PCBs have a voltage and ground plane, why not a clock plane?
This design principle has some advantages:
+Less length of wire in the processor = more savings. As Hz increases, the savings in power are exponential. As you increase Hz the entire name of the game becomes figuring out how to put fewer electrons through the die. Less wiring = less resistance = less power = less heat = more potential for speed.
+No more designing parts of the processor to time around each other; the whole plate loads and unloads very predictably now and superconductive materials can be inserted between the clock and plate to increase saturation speed.
Real world:
10-35% decrease in power usage.
Scaling the processor to more transistors or more GHz is now much less problematic.
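The skew point in the list above can be made concrete: a fixed amount of skew time takes up a larger fraction of the clock period as the frequency goes up. A minimal sketch, assuming a purely illustrative 100 ps of skew (not a figure from AMD or Cyclos):

```python
def skew_fraction_of_cycle(skew_seconds, clock_hz):
    """Fraction of one clock period consumed by a fixed skew time."""
    return skew_seconds * clock_hz


if __name__ == "__main__":
    assumed_skew = 100e-12  # hypothetical 100 ps of skew
    for ghz in (1, 2, 3, 4):
        share = skew_fraction_of_cycle(assumed_skew, ghz * 1e9)
        print(f"{ghz} GHz: skew uses {share:.0%} of the cycle")
```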
I am currently speccing out my next two new machines, and for the first time since 1999 I am going Intel. The X2 and X3 that are being replaced were just a pain in the ass, and directly out of the box felt unimpressive.
And it's not like they were a great deal or anything, less than $20 difference, so what's the angle?
AMD boards have better PCIe I/O and lanes than Intel boards
So, are they going to give their chips porn names? Piledriver? Someone needs to clue these guys in... Just calling your stuff by construction equipment names doesn't make it cool. I'm a Mac user, and I still hate the fact Apple has latched onto this "let's call our products by their code names" crap. Guess I should look forward to the days of AMD's Cleveland Steamer processors.
but clock resonance sounds like it wouldn't play well with changing the clock frequency.
Quick background: Currently clocks on most generic chips today are structured as trees. As you can imagine, the fan-out of the clock trees is pretty large and thus requires clock buffers/driver circuits, which need to be balanced so that the clock signal gets to the leaves at about the same time (in a typical design where you don't use a lot of physical design tricks). To ease balancing the propagation delay, the clock tree often physically looks like a fractalized "H" (just imagine the root clock driving from the center of the crossbar out towards the leaves at the corners of the "H", so the wire lengths of the clock tree segments are the same; then each corner of the big "H" drives the center of a smaller "H", etc., etc.). Of course at the leaves, there can be some residual imbalance due to small manufacturing variations and wire loading, and that has to be accounted for in closing the timing for the chip (to avoid short paths), and ultimately these imbalances limit the upper frequencies achievable by the chip.
Additional background: In any electrical circuit, there are some so-called resonant frequencies because of the distributed (or lumped) inductance and capacitances in the network. That is, some frequencies experience a lot less energy loss than average (for the car analogy buffs, you can get your car to "bounce" quite easily if you bounce it at its resonant frequency).
The basic idea of the Cyclos technology is to "short-circuit" the middle of the clock tree on the chip with a mesh, to make sure the whole middle of the clock tree is coordinated to be the same clock (as opposed to a typical H-tree clock, where the jitter builds up at every stage from the root). That way you avoid some of the imbalances that limit the upper frequencies achievable by the chip. The reason I say "short-circuit" in quotes is that it really isn't a short circuit. If you just arbitrarily put a mesh in the middle of a clock tree, although it would tend to get the clocks aligned, it would present a very large capacitive and inductive load to drive and would likely increase power greatly. **Except** if that mesh was designed so that it resonated at the frequency at which you were going to drive the clock, then you can get the benefit of jitter reduction w/o the power cost. Since you get to pick the physical design parameters of the mesh (wire width, length, and grid spacing, and external tank circuit inductance) and the target frequency, theoretically you can design that mesh to be resonant (well, that remains to be seen).
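To illustrate the fractal "H" layout described above, here is a small sketch that builds an idealized H-tree and checks that every leaf sits at the same wire distance from the root. The geometry (each level halves the arm length, wires run only horizontally and vertically) is an assumption made for illustration; real clock trees also have buffers, loading and manufacturing variation that this ignores.

```python
def h_tree_leaves(x, y, half_width, levels, length_so_far=0.0):
    """Return (x, y, wire_length_from_root) for every leaf of an idealized H-tree."""
    if levels == 0:
        return [(x, y, length_so_far)]
    leaves = []
    for dx in (-half_width, half_width):      # out along the crossbar
        for dy in (-half_width, half_width):  # then up or down an upright
            leaves += h_tree_leaves(
                x + dx, y + dy, half_width / 2, levels - 1,
                length_so_far + abs(dx) + abs(dy),
            )
    return leaves


if __name__ == "__main__":
    leaves = h_tree_leaves(0.0, 0.0, 4.0, levels=3)
    lengths = {length for _, _, length in leaves}
    print(f"{len(leaves)} leaves, distinct wire lengths from root: {sorted(lengths)}")
```

In this idealized picture the wire lengths match exactly, so any skew left over comes from the effects the comment mentions rather than from the layout itself.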
The reason this idea hasn't been used to date is that it's a hard problem to create the mesh with the proper parameters and now the processor really has to just run at that frequency all the time (well, you can do clock cycle eating to approximate lower frequencies). Designers have gotten better at these things now and the area budgets for these types of things have gotten in the affordable range as transistors have gotten smaller.
FWIW, in a pipeline design (like a CPU), sometimes it's advantageous to have a clock-follows-signal clocking topology or even an async strategy instead of a clock tree, but there is of course a complication if there is a loop or cycle in the pipeline (often this happens at, say, a register file or a bypass path in the pipeline), so that trick is limited in applicability, whereas the mesh idea is really a more general solution to clock network jitter problems.
Here's a white paper that describes this idea... http://www.cyclos-semi.com/pdfs/time_to_change_the_clocks.pdf
Have gnu, will travel.
and has been for at least 5 years. A theoretical 10% performance boost? Gimme a break. I upgraded from a Core2Duo E6600 @ 2.4GHz to a quad core i5 2600k which runs at an overclocked 4.5GHz on air... Day to day, the new rig delivers a *mostly* perceptible performance advantage, but nothing earth shattering... I give you several recent changes that felt bigger:
1. Moving from hard drive to SSD
2. Moving from a DirectX9 class GPU to a DirectX 11 GPU (at least in games).
3. Move from pre-JIT JS browser engine to a JIT-engined browser.
As far as desktop CPU development goes, I think the future is largely about optimizing software for the multi-core architectures, not adding Gigahertz.
AMD isn't going to die for a long time. I am honestly an AMD fanboy and use them in all of my computers. But the fact of the matter is that even if Intel could beat them to a pulp (and most likely they could if they wanted to, due to their power and war chest), Intel will not allow them to die, even if they have to start sabotaging themselves to do it. AMD is the only thing keeping them from monopoly status and a LOT more scrutiny and regulation. So long as they can point to AMD as what they are competing against, they can keep from that level of pressure.
Now I really do hope that one day AMD becomes a serious threat to Intel, but as it stands right now, they are only there because Intel needs them. They do have stuff they are good at, a price point I love and great stability, and they do force Intel to step up their game on performance and prices, though.
They are Intel's best friend and worst enemy at the same time. Without AMD, Intel would be a monopoly and regulated as such. But with AMD, Intel has to actually watch them some and compete with them some, or they risk AMD actually becoming a sizable threat.
Cyclos resonant clock mesh technology employs on-chip inductors to create an electric pendulum, or "tank circuit", formed by the large capacitance of the clock mesh in parallel with the Cyclos inductors. The Cyclos inductors and clock control circuits "recycle" the clock power instead of dissipating it on every clock cycle like in a clock tree implementation, which results in a reduction in total IC power consumption of up to 10%.
Inductors save power because unlike most other circuit elements, inductors are able to store energy in a magnetic field so it can be used later on. This is part of how switching power supplies get their efficiency.
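For anyone who wants to play with the "tank circuit" idea, here is a minimal sketch using the standard LC resonance formula f = 1 / (2*pi*sqrt(L*C)). The 1 nF mesh capacitance is a made-up placeholder; neither the article nor Cyclos gives the real value here.

```python
import math


def resonant_frequency_hz(inductance_h, capacitance_f):
    """Resonant frequency of an ideal LC tank: 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


def inductance_for_resonance(target_hz, capacitance_f):
    """Inductance that makes an ideal LC tank resonate at target_hz."""
    return 1.0 / ((2.0 * math.pi * target_hz) ** 2 * capacitance_f)


if __name__ == "__main__":
    mesh_capacitance = 1e-9  # hypothetical clock-mesh capacitance, in farads
    needed = inductance_for_resonance(4.0e9, mesh_capacitance)
    check = resonant_frequency_hz(needed, mesh_capacitance)
    print(f"L = {needed:.3e} H gives resonance at {check / 1e9:.2f} GHz")
```

Because the resonance depends on both L and C, a mesh tuned for one frequency will not be at its most efficient when driven far from it.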
We don't see the world as it is, we see it as we are.
-- Anais Nin
that trades off space for lower power. It might be an easy trade to make when you need enough area for the thousands of bumps anyway. It's a good thing to do because otherwise the power per area gets really unmanageable. ARM and Intel have been using many approaches to keep down power; this is just the next lowest-hanging fruit. I doubt that AMD has a lock on this technology. Hopefully they can make a few bucks out of it. It is great to see AMD keeping Intel and ARM on their toes. This may put a crimp in the plans of the OC crowd. I don't care much about OC; it's a lot of noise and expense when I rarely tax the processor anyway.
I don't even know what one is. And I haven't even glanced at the fine article. I just know I want one of those. Sounds so shiny. Just wanna say it over again and again and again...
Resonant Clock Mesh
Resonant Clock Mesh
Resonant Clock Mesh...
One man's pink plane is another man's blue plane.
Agreed [that it looks like vaporware]. It's a breathlessly ebullient press release sales pitch.
Agreed it's a sales pitch. But not vaporware at all. Very neat solution. (I saw another with similar properties a couple years ago but this one is 'way better.)
The issue is the power consumption of the clocking of the chip. Modern designs are primarily layers of D-type flip-flop registers separated by small amounts of random logic and all the flip flops are clocked simultaneously, all the time. The clock signal is input to ALL the flipflops and a bit of the random logic. I'm guessing somewhere between one in five and one in ten gate inputs are driven about equally by CLK or ~CLK. Further, the other signals flip between one and zero once, sometimes, on each cycle. ALL the CLK signals flip from zero to one and back to zero EVERY cycle. So there's a lot of activity on the clock.
In CMOS the load on the clock is primarily capacitive - the stray capacitance of the CMOS gates and wiring - plus some losses, mainly due to the resistance of the wiring. The stray capacitance has to be charged and discharged every cycle. The charge represents energy. In a conventional design the clock drivers are essentially the same thing as logic gates (inverters). New energy is supplied from the power supply (and about half of it, excluding signal-line resistive losses, dumped as heat in the pullup transistors of the drivers) every cycle as the lines are charged. Then the charge is dumped to ground (and the rest of the energy dumped as heat in the pulldown transistors). All that energy gets lost as heat every cycle, and it represents about 30% of the power consumed by the chip. It would be nice to scavenge it and reuse most of it for the next tick.
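The charge-and-dump cycle described above is the usual C*V^2*f picture of dynamic power: C*V^2 is drawn from the supply each cycle, half lost while charging and half while discharging. A minimal sketch, with capacitance and voltage values that are assumptions for illustration only:

```python
def clock_energy_per_cycle_joules(capacitance_f, vdd_volts):
    """Energy drawn from the supply per full charge/discharge cycle: C * V^2."""
    return capacitance_f * vdd_volts ** 2


def clock_power_watts(capacitance_f, vdd_volts, clock_hz):
    """Power burned by a conventionally driven clock network: C * V^2 * f."""
    return clock_energy_per_cycle_joules(capacitance_f, vdd_volts) * clock_hz


if __name__ == "__main__":
    # Hypothetical values: 1 nF of clock load, 1.0 V supply, 4 GHz clock.
    power = clock_power_watts(1e-9, 1.0, 4.0e9)
    print(f"Conventional clock power: {power:.1f} W")
    print(f"Lost in pull-ups: {power / 2:.1f} W, in pull-downs: {power / 2:.1f} W")
```

A resonant scheme aims to recycle most of that per-cycle energy, so the drivers only top up the resistive losses.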
A previous invention used a half-wave transmission line looped around the chip and connected plus-to-minus. A big mobius strip. The CLK and ~CLK loads acted as distributed capacitance around the transmission line. A clock waveform circulated continuously, twice per cycle. Instead of a sea of drivers providing new energy and then throwing it away every cycle, the transmission ring had a few drivers distributed around it, keeping the wave circulating and correctly formed, and pumping in enough energy to replace the resistive losses while the bulk of the energy went round-and-round. Result: Most of the clock power requirements and heating load go away.
Unfortunately, the circulating clock wave meant the region completing a computation ALSO went round-and-round, rather than everything switching at the same time. Stock design tools assume CLK/~CLK is simultaneous (except for minor variations) across the whole chip. So using that earlier system would require a major rewrite on the stock tools and new design methodologies.
THIS system does a similar hack energetically, but with everything in sync. Instead of a sea of drivers driven by a carefully-balanced tree of pre-drivers, the CLK and ~CLK are constructed as a pair of heavy-conductor meshes - like two stacked layers of flattened-out window screens. These form two plates of a capacitor. These plates are connected by an inductor, forming a resonant "tank circuit". When this is "pumped up" by a few drivers and is "ringing", energy alternates between being an electric field between the screens and a magnetic field in the inductor coil, twice (once for each polarity) each cycle. Again the bulk of the energy is reused over and over while the drivers only have to replace the (mostly) resistive losses (and pump it up initially, over a number of cycles). Again the bulk of the clock power and heating is gone. But this time the whole chip is switching essentially simultaneously, so the stock design tools just work.
Neat!
Downside (of both inventions): You can't quickly start and stop the clock in a given area or run it more than a few percent off the speed set by the resonance of the tank circuit or transmission line. No overclocking. Also no clock gating to save power on quiesc
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
At the time nothing used was available.
It was about the time people were building clusters with Playstation 2 consoles because it was the only cost effective way to get a platform with that CPU (pity about the lack of memory). I haven't even bothered to check since. A couple of weeks with daily calls from a very slimy salesman put me off and gave me an idea of what it would be like to be a good looking woman in a bar full of desperate horny lowlifes.
The anecdote was about IBM selling the things, but instead of making an effort selling them they were looking almost like they were deliberately setting them up to fail.
I have something like this, it uses water cooling and I call it "hydromesh technology"(Patent Pending)!
I saw the Cyclos public presentations a few years ago. A somewhat simpler explanation would be: On a chip such as a big processor, a large part of the power is used distributing the clock signal. This is for two reasons. 1) It takes more power to get a clock distribution network closer to zero skew, and processor design is done zero skew. 2) The ratio of clock power vs data power goes up as you put fewer data stages between your flops (double clock speed: double data activity vs double clock activity x twice the flops (or local clock gates) load). That makes an advanced processor the most suitable chip type for a technology that's potentially difficult to use. These chips already use clock meshes for their distribution. The mesh looks to its driver like a big capacitor. The driver does work charging the mesh each cycle (imagine a child on a swing: lift the swing up once and let go, child annoyingly stops at bottom of swing, repeat). In this technology the driver is replaced by a 'kicker' and a big inductor is matched to the capacitance of the mesh. Now you have a resonant circuit (child is swinging without stopping, just needs a small nudge each time). Less power at the cost of either an on-chip or off-chip inductor, lots of non-standard design flow and $$$ to Cyclos for the clever bits (the kicker and tool knowledge).
You are missing one point of the clock mesh. The mesh is only the middle part of the clock tree. This part serves to spread the clock globally on a single metal mesh with zero skew. Then the lower part of the clock tree is built normally and can be gated. This means that you can certainly turn off the clock of quiescent sections of the chip. You will still have to oscillate the mesh, but the registers and detailed distribution can be turned off.
Also you can probably run this mesh off the resonant frequency, but at a power disadvantage. Or you could dynamically vary the L if that was possible.
A few months ago, I upgraded an older PC to an AMD 4-way Phenom II as well. I chose the 910e for its low TDP of 65 watts and I'm so far quite happy with it.
But Intel has similar parts, like the Core i5-2400S. If it wasn't for the ECC RAM support the AMD offers (but Intel only in expensive Xeons), I might have gone with Intel this time. In most reviews, the i5-2400S wins clearly on performance.
So it will be a good thing if AMD can boost its performance/power ratio and become more competitive.
C - the footgun of programming languages
I don't know about you, but I would be concerned about the effects of a resonance clock mesh cascade failure.
I know a guy who had to deal with a resonance cascade and it wasn't pretty.
Goodbye Slashdot. You've changed.
Eh, you're better buying stock in a telecom. They all pay dividends.
IBM hit 5.2 GHz in 2010. That's with all cores active and constant operating speed, too.
Especially since sometimes the best algorithm isn't the most efficient one.
It might be better to use a less efficient algorithm that can be parallelized better.
An O(n log n) algorithm sounds better than an O(n^2) one. Except when the first is sequential, and the second can be parallelized along n, with n separate threads of O(n) complexity each.
Concrete example:
- For sorting big datasets, quicksort is among the best known algorithms... except that it's rather serial, and as such is the best algorithm *for single-thread* usage.
- Sorting networks have a lot of compare/exchange operations... except most are completely independent and thus can be done in parallel.
End result: when running in parallel, sorting networks outperform quicksort, even though they rely on more operations.
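As an illustration of the "more operations, but independent ones" point, here is a sketch of odd-even transposition sort, one of the simplest sorting networks (picked for brevity, not necessarily the network the comment had in mind). It does O(n^2) compare/exchanges in total, but within each of the n rounds every pair touches different elements, so a round could in principle run as a single parallel step. The code below runs serially and only counts the work.

```python
def odd_even_transposition_sort(values):
    """Sort with an odd-even transposition network.

    Returns (sorted_list, rounds, comparisons). Within one round, all the
    compare/exchange pairs are disjoint and could execute in parallel.
    """
    data = list(values)
    n = len(data)
    comparisons = 0
    for round_index in range(n):
        start = round_index % 2  # even rounds: pairs (0,1),(2,3)...; odd: (1,2),(3,4)...
        for i in range(start, n - 1, 2):
            comparisons += 1
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
    return data, n, comparisons


if __name__ == "__main__":
    result, rounds, comparisons = odd_even_transposition_sort([5, 2, 8, 1, 9, 3, 7, 4])
    print(result, f"rounds={rounds}", f"comparisons={comparisons}")
```

With enough parallel units the running time is governed by the number of rounds (n) and not by the total comparison count, which is the trade the comment describes.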
Yes, nine women can't do 1 baby in one month.
But if you want 9 babies, it's better to ask 9 different women. Not to wait 81 months on 1 single lady.
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
The main problem that AMD has with both the Brazos and Llano designs is production. Simply put, demand is so high that they were unable to meet it for all of their chips. Because of this and yield issues with GlobalFoundries, they jettisoned GF and worked with TSMC to get it out. From what the 3 prior posts (great explanations, and why I still read /.) said, my take is that AMD now has a method of cutting the power consumption of chips like the A350 from 12 watts to 1/3 - 1/2 of that (4 - 6 watts) while staying on the same process size. This opens the production floodgates across the board because there are plenty of 45nm foundries with available capacity.
Remember that one of the problems they had with Bulldozer was getting the yields up, and this may be a solution to some of that problem. Another issue that comes to mind is that GlobalFoundries simply didn't have enough capacity available, so AMD was going to have to farm some of their chip production out. So what chips make sense? Those still using the 45nm process, and if this offers anything near what I see in power reduction, then they would finally be able to meet Intel's performance per watt and start regaining market share.
Mod me up/Mod me down: I wont frown as I've no crown
Too bad the article is lacking on the technical details. This is about energy efficiency, not GHz. They have hit 4GHz and higher with a traditional clock mesh. The point here is that they hit 4GHz with a resonant clock mesh. What this means is that instead of charging and discharging the huge capacitor that is the clock grid every cycle using only FETs connected to VDD and VSS (traditional digital logic), there is an LC tank circuit that is resonating with the clock grid. The power rails still do some of the charging and discharging of the grid, but now some of the energy comes from the oscillator. I have seen the paper, the distributed LC tank is pretty cool. The technical achievement is that they got this to run at 4GHz while keeping the skew between clock grid points as low as a traditional mesh (had the skew increased, the max frequency of the processor would go down). They claim reduced clock power of 25%. Given that clock power is roughly half the power of a core, that's a 12% power reduction overall, pretty impressive. It's also really cool that the whole thing is on-die - they made the inductors out of back-end wires on the CPU die itself, no additional components on the package so no increased cost.
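The back-of-envelope figure above is just the clock's share of core power multiplied by the claimed clock-power saving. A tiny sketch using the comment's own numbers (25% saving, clock at roughly half of core power); the function name is made up for illustration:

```python
def overall_power_saving(clock_share_of_core, clock_power_reduction):
    """Fraction of total core power saved when only the clock power shrinks."""
    return clock_share_of_core * clock_power_reduction


if __name__ == "__main__":
    saving = overall_power_saving(0.5, 0.25)
    print(f"Overall core power saving: {saving:.1%}")
```

If the clock's share is closer to the 30% quoted elsewhere in the thread, the same formula gives a proportionally smaller overall saving.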
Intel transfers the difficulty from hardware to software: to get more power, programmers need more technology. -- chinaitn
Sandy Bridge already runs at 4 GHz just fine without this bullshit stuff.
I'm going to pour my entire savings into AMD stock - anyone else with me?
Yup, I will too... on a short, let me know how many you will be buying.
-AI
For me, it is far better to grasp the Universe as it really is than to persist in delusion
I think you'd be talking minimum 150 months for the single lady, it takes a few months to reset the equipment.
Fertility in human mothers is linked to breastfeeding/weaning. No breastfeeding means a shorter time until reset. A longer delay until weaning means a longer time until reset.
And the 81 months were just a figure of speech anyway.
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]