AMD's Piledriver To Hit 4GHz+ With Resonant Clock Mesh
MojoKid writes about some interesting news from AMD. From the article: "Advanced Micro Devices plans to use resonant clock mesh (PDF) technology developed by Cyclos Semiconductor to push its Piledriver processor architecture to 4GHz and beyond, the company announced at the International Solid State Circuits Conferences (ISSCC) in San Francisco. Cyclos is the only supplier of resonant clock mesh IP, which AMD has licensed and implemented into its x86 Piledriver core for Opteron server processors and Accelerated Processing Units. Resonant clock mesh technology will not only lead to higher clocked processors, but also significant power savings. According to Cyclos, the new technology is capable of reducing power consumption by 10 percent or bumping up clockspeeds by 10 percent without altering the TDP."
Unfortunately, aside from a fuzzy whitepaper, actual technical details are all behind IEEE and other paywalls with useless abstracts.
This is an ad. What is a "resonant clock mesh"? That's sounds really cool. So I started RTFA (I know, sorry). You don't have to chastise me that much, because I stopped reading soon. Right after
Which was obviously not written by anybody who has any clue what they are talking about.
Quick background: Currently clocks on most generic chips today are structured as trees. As you can imagine the fan-out of the clock trees is pretty large and thus require clock buffers/driver circuits which need to be balanced so that clock signal gets to the leaves at about the same time (in a typical design where you don't use a lot of physical design tricks). To ease balancing the propagation delay, the clock tree is often physically looks like a fractalized "H" (just imagine the root clock driving in the center of the crossbar out towards the leaves at the corners of the "H", the wire lengths of the clock tree segments are the same, then the corners the big H driving the center of a smaller "H", etc, etc). Of course at the leaves, there can be some residual imbalance due to small manufacturing variations and wire loading and that has to be accounted for in closing the timing for the chip (to avoid short paths), and ultimatly these imbalances limit the upper frequencies achievable by the chip.
Additional background: In any electrical circuit, there are some so-called resonant frequencies because of the distributed (or lumped) inductance and capacitances in the network. That is some frequencies experience a lot less energy loss than average (for the car analogy buffs, you can get your car to "bounce" quite easily if you bounce it at it's resonant frequency).
The basic idea of the Cyclos technology is to "short-circuit" the middle of the clock tree on the chip with a mesh to make sure all the middle of the clock tree is coordinated to be the same clock (as oppposed to a typical H tree clock, in every stage the jitter builds up from the root). That way you avoid some of the imbalances the limit the upper frequencies achievable by the chip. The reason I say "short-circuit" is that it really isn't a "short circuit". If you just arbitrarily put in a mesh in the middle of a clock tree, although it would tend to get the clocks aligned, it would presents a very large capacitive and inductive load to drive and would likely increase power greatly. **Except** if that mesh was designed so that it resonated at the frequency that you were going to drive the clock, then you can get the benefit of jitter reduction w/o the power cost. Since you get to pick the physical design parameters of the mesh (wire width, length, and grid spacing, and external tank circuit inductance) and the target frequency, theoretically you can design that mesh to be resonant (well, that remains to be seen).
The reason this idea hasn't been used to date is that it's a hard problem to create the mesh with the proper parameters and now the processor really has to just run at that frequency all the time (well, you can do clock cycle eating to approximate lower frequencies). Designers have gotten better at these things now and the area budgets for these types of things have gotten in the affordable range as transistors have gotten smaller.
FWIW, In a pipeline design (like a cpu), sometimes it's advantagous to have a clock-follows-signal clocking topology or even an async strategy instead of a clock tree, but there of course is a complication if there is a loop or cycle in the pipeline (often this happens at say a register file or a bypass path in the pipeline), so that trick is limited in appliciablity, where the mesh idea is really a more general solution to clock network jjitter problems.
Here's a white paper that describes this idea... http://www.cyclos-semi.com/pdfs/time_to_change_the_clocks.pdf