Clockless Chips
iarkin writes "TechReview is running a very interesting
article about clockless chips.
Clockless, or asynchronous, chips work very much faster and consume less power than their synchronous equivalents (Intel hade some experiments on these chips back in -97, the results showed that the asynchronous chips were three times faster and consumed only half the power)."
.. otherwise people would've noticed this has been
posted before (sept 15)
Clockless chips will never take off. How are people supposed to draw incorrect conclusions about which chip is the fastest when there's no MHz/GHz rating?
In other news, AMD abandons all current R&D to work on clockless chips so they can win the clock-speed wars against Intel...
There is no escape from The Muffin.
If there is no clock, how do they know that they are 3 times faster? :-D
This is gonna be bad for business I tell you ...
"A door is what a dog is perpetually on the wrong side of" - Ogden Nash
It wouldn't be that bad. The industry would just get away from numbers, and move to something like many software makers are doing today.
In place of a 2Ghz Pentium IV we will be seeing an Axium Gold.
It will take a little getting used to, but we'll get over it. Ford doesn't call their cars Model A's or Model T's anymore!
97, the results showed that the asynchronous chips were three times faster and consumed only half the power
so....the reason they weren't used is because....of....what else....
$$$$$
(from marketing mhZ!)
-k.
I need a TiVo for my car. Pause live traffic now.
Async processing is a very old idea. The problem is that designing the logic for it is a far greater chore than for regular chips. CPU designers are simply not good enough to do it well yet.
Marketing just has to play up the clockless thing like it's the best ever. "Gigahertz, Schmigahertz"... "So fast it doesn't even need a clock"... etc.
---If you can't trust a nerd, who can you trust?
"How Sun swerved to avoid Rambus"
http://www.theregister.co.uk/content/3/22279.html
More details on the CPU:
http://www.theregister.co.uk/content/3/22274.html
Sun press release:
Extends UltraSPARC III Chip Family Tree--First Use of Sun-Developed Asynchronous Logic Design in Chip's Memory Interface
At Sun Labs:
feature article
async research home page
I took the clock out of my computer with an xacto knife. I immediately noticed an infinite difference in the speed at which it ran.
I also have an asynchronous clock ever since the spring in my wristwatch snapped.
People have spent the past twenty plus years designing development tools for synchronous design. There's just a lot less groundwork covered for asynchronous design because no one has spent the millions of dollars to create a (mostly) new tool chain.
Intel has never produced, nor have they discussed at any ISSCC or Hot Chips forum, a plan for an asynchronous design.
Unless you can provide me with more detail, I think that statement is wrong.
Clockless chips would result, perhaps, in the most interesting (funny?) marketing.
Intel would develop a standard way of indicating performance. Based on something their particular chips are good at. We'll say they release the Pentium Clockless 1000, Pentium Clockless 2000 and Pentium Clockless 3000.
AMD would, if trends indicate anything, market them using performance ratings. Instead of deciding performance based on the Intel standard, they would have new names to indicate that their processors, in some situations, are faster than their Intel counterparts. They'd probably be called the AMD Athlon Clockless XP 1100+, and so on.
In response, Intel would start releasing worse processors, but with higher numbers. Pentium Clockless II 5000 would be their flagship.
AMD would continue making their processors in the traditional manner, but would adopt a new naming mechanism. AMD Athlon Clockless Performance XP Super Fantastic 6000, maybe.
Repeat ad nauseam.
-NeoTomba
The main problem with async. design is the asynchronous part of it. In a typical computer, you have tons of parts that you use interchangeably, and these parts operate at different speeds. How would two devices working at different speeds operate smoothly together? Generally, this is very hard. They can do it, but the devices themselves need to agree on a few things. Async. design is highly complicated because in a clockless environment you pretty much have to guarantee something like "I'll do this within 2 equivalent clock cycles," or have other types of signalling negotiation. You can't clock on a "clock" to do stuff. You have to clock on an "async" signal.
This is the problem in the large. When you go down to the chip level, there are tons of nightmares. There can be feedback loops causing race conditions that only occur at certain times. There are load problems that can increase complexity far more than the equivalent problems in a clocked design. Clocked design makes things a lot simpler, and designing a chip is still extremely difficult.
But I don't think the future is in clockless design; I think it is in "careful clock" design. For example, there are chips which are smart enough to stop sending the clock to certain parts of the chip when they know those parts will not be used. That saves a lot of power. There are chips which aim to spread the clock around carefully, thus increasing the speed. And remember, almost 50% of the power in a chip is lost due to the wiring!
me.
...chips work very much faster...
...Intel hade some experiments...
Unfortunately, these chips only seem to have half the spell-check and grammar-check capability.
As I understand it, traditional systems use a clock signal to let each stage of the pipeline know when the previous stage has completed. Each stage is designed so that a signal has to pass through few enough transistors to guarantee that it will be done by the time the next clock signal arrives. Clockless systems instead design the processor such that at each step in the processing, the difference between a partial result and a completed result is self-evident. This requires more work, both in the design of the processor and in terms of transistors, but with the benefit of eliminating the clock (and many associated transistors) and any waiting between when the processor has completed a step and when the clock signal arrives.
Since dealing with the clock signal has become increasingly complex, dealing with not having one is becoming a more reasonable alternative.
The article mentions the year -97. Perhaps this is a typo, but I kind of like the idea of using negative years for those before 2000 so that you'd subtract 2000 from a year, but that would make 1997 be year -3 not -97.
--Ben
Slashdot is SO behind. Kuro5hin had a story about this back in -96, right after the tests were done! Leave it to /. to wait 2,098 years to post a story. Sheesh.
"Destroy science and religion. Science would re-emerge exactly the same; but not religion." - Penn Jillette, paraphrased
The truth?
There is no clock.
Wind it too hard and it runs three times as fast and consumes less power!
The IBM Power4 architecture uses a "Wavepipelined" interconnect bus. This is a clockless bus. I believe the Alpha 21384 was going to use this as well.
Too bad IBM won't sell the chips. They only sell the servers. Each die has 170 million transistors with 2 microprocessors per die! They package 4 dies in one package totaling 710 million transistors.
It kicks the snot out of anything Intel or AMD has.
Initial benchmarks show the SPECINT2000 and SPECFP2000 at 808 and 1169 are comfortably ahead of the competition (2GHz Pentium IV was the SPECINT leader at 656, while Alpha 21264 @833MHz was SPECFP leader at 777).
Anybody have $450,000 to spare?
http://www-1.ibm.com/servers/eserver/pseries/hardware/datactr/p690.html
Just test how fast Photoshop filters take to run. :) "It's as fast as a stupercomputer!"
Or more likely Intel (by then the only CPU company left, of course) will start binning by actual performance - look for "runs Win 95 fast enough", "runs NT fast enough" and the expensive "runs XP a bit" speed grades.
I can't wait to see all of the timing errors that will pop up in software due to this. The defect reports due to race conditions alone will fill up Gigs of storage. Not to mention that systems will be as individual as fingerprints! Hours of debugging fun!!!
That is all.
Of course I'm used to things getting published a little late on slashdot ;-)
M0571y H@rml355.
that don't make sense: (from the article)
A chip without a clock would be about as useful as a page of text without any space between the letters
Actually, it's about as useful as a page of text that only exists when you have your eyes closed.
a lot has been done on clockless
what it requires is a great understanding and stringent design
these are the reasons why intel did IA64 non speculative
have a look at IBM's report on IA64 in the microprocessor report (they give good reasons why its doomed however clever people think it is)
amulet spun out of manchester and a stanford spin out company also started up
not exactly new new thing
only can be done in small teams with very trained people
but hey they got a clockless ARM running a long time ago
regards
john jones
Many, many years, and then a few more. All the current design tools and methodologies would have to be reworked, recoded, and redeveloped. The verification tools for both designs and the actual silicon would have to be thrown out the window.
Not many companies can afford to even try to do this. And, while it's still possible to increase the speed of the current sync designs through better design/better production technology, it's not worth the money to try it.
Once we hit the limit, it'll probably be a different story.
Never underestimate the bandwidth of a 747 filled with CD-ROMs.
I think Interdata made/sold a relatively large number of async computers back in the 1970s.
Treatment, not tyranny. End the drug war and free our American POWs.
See my user info for links.
Sure, in theory they are possible, and tests have been done on some types of circuit.. but to claim 'asynchronous chips are smaller, take less power, and are 3x faster' is kind of silly.. if this is the case, where are the chips?
It's too bad to see such an interesting subject butchered by someone so lacking in technical knowledge. The entire article felt like a compilation of Comdex marketing brochures. Check this out:
From that first choice came the steamroller effect of Moore's Law, wherein nearly all research, development and production in the semiconductor industry has focused on clocked chips
Yeah, that made sense... Maybe she was thinking of "Murphy's Law"
Since Transmeta is already a bit off the deep end and is known for energy-saving Intel compatible CPU's it seems to me it'd be good for them to partner with one of these async companies and work on a chip that incorporates both their ideas. Because Transmeta CPU's use less hardware they'd seem to me to be easier to reimplement in this manner and because of their code morphing concept they can still be Intel compat. Because of both the code morphing and the async design they'd run with less energy and less heat and because of the async design they'd be faster than Intel. (well even if it took long enough to get to market they'd still be pretty fast.. and very good for rack mounted machines and laptops)
At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
Actually FLOPS (floating-point operations per second) are too specific to be a general benchmark. They work well for gaming consoles and graphics cards because in those cases nearly every calculation involves floating point. In general-purpose processors the floating-point unit is only one part of the whole processor and isn't always the most important factor.
MIPS (million instructions per second) is better, but this gets back into RISC or CISC issues. How much work does one instruction do? Not that the current MHz system is any better in this regard. Hmm, I guess in that sense MIPS would be a good replacement for MHz. However, why would you want to move to another inaccurate measure of performance?
The factor that clockless computers have that most closely relates to MHz is IPS, or instructions per second. This is an average, obviously. One problem that this doesn't cover, though, is IPP, or instructions per program. Related to the old RISC and CISC concepts, some computers need more instructions to get the same work done. If a standard can be found for determining IPP and some method of combining IPP and IPS can be found that makes sense in a performance measurement way.....
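One obvious way the two figures could be combined is to divide them: run time = IPP / IPS. A quick sketch with made-up numbers (the chips and figures below are hypothetical, not measurements of anything):

```python
def run_time_seconds(instructions_per_program, instructions_per_second):
    """Time to finish one program: instruction count divided by average rate."""
    return instructions_per_program / instructions_per_second

# Hypothetical chip A: fewer, heavier instructions at a lower average rate.
chip_a = run_time_seconds(2_000_000_000, 1_000_000_000)

# Hypothetical chip B: higher IPS, but the same program needs more instructions.
chip_b = run_time_seconds(3_000_000_000, 1_200_000_000)

# The chip with the higher IPS is not automatically the faster one.
faster = "A" if chip_a < chip_b else "B"
```

With these numbers chip B has the bigger IPS rating and still loses, which is the same trap as comparing MHz.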
There are some compelling reasons:
Though synchronous design has enabled great strides to be taken in the design and performance of computers, there is evidence that it is beginning to hit some fundamental limitations. A circuit can only operate synchronously if all parts of it see the clock at the same time, at least to a reasonable approximation. However clocks are electrical signals, and when they propagate down wires they are subject to the same delays as other signals. If the delay to a particular part of the circuit takes a significant part of a clock cycle-time, that part of the circuit cannot be viewed as being in step with other parts.
For some time now it has been difficult to sustain the synchronous framework from chip to chip at maximum clock rates. On-chip phase-locked loops help compensate for chip-to-chip tolerances, but above about 50MHz even this isn't enough.
Building the complete CPU on a single chip avoids inter-chip skew, as the highest clock rates are only used for processor-MMU-cache transactions. However, even on a single chip, clock skew is becoming a problem. High-performance processors must dedicate increasing proportions of their silicon area to the clock drivers to achieve acceptable skew, and clearly there is a limit to how much further this proportion can increase. Electrical signals travel on chips at a fraction of the speed of light; as the tracks get thinner, the chips get bigger and the clocks get faster, the skew problem gets worse. Perhaps the clock could be injected optically to avoid the wire delays, but the signals which are issued as a result of the clock still have to propagate along wires in time for the next pulse, so a similar problem remains.
Even more urgent than the physical limitation of clock distribution is the problem of heat. CMOS is a good technology for low power as gates only dissipate energy when they are switching. Normally this should correspond to the gate doing useful work, but unfortunately in a synchronous circuit this is not always the case. Many gates switch because they are connected to the clock, not because they have new inputs to process. The biggest gate of all is the clock driver, and it must switch all the time to provide the timing reference even if only a small part of the chip has anything useful to do. Often it will switch when none of the chip has anything to do, because stopping and starting a high-speed clock is not easy.
Early CMOS devices were very low power, but as process rules have shrunk CMOS has become faster and denser, and today's high-performance CMOS processors can dissipate 20 or 30 watts. Furthermore there is evidence that the trend towards higher power will continue. Process rules have at least another order of magnitude to shrink, leading directly to two orders of magnitude increase in dissipation for a maximum performance chip. (The power for a given function and performance is reduced by process shrinking, but the smaller capacitances allow the clock rate to increase. A typical function therefore delivers more performance at the same power. However you can get more functions onto a single chip, so the total chip power goes up.) Whilst a reduction in the power supply voltage helps reduce the dissipation (by a factor of 3 for 3 Volt operation and a factor of 6 for 2 Volt operation, relative to a 5 Volt norm in both cases), the end result is still a chip with an increasing thermal problem. Processors which dissipate several hundred watts are clearly no use in battery powered equipment, and even on the desktop they impose difficulties because they require water cooling or similar costly heat-removal technology.
As feature sizes reduce and chips encompass more functionality it is likely that the average proportion of the chip which is doing something useful at any time will shrink. Therefore the global clock is becoming increasingly inefficient.
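The supply-voltage factors quoted above (3x at 3 Volts, 6x at 2 Volts) line up with the usual first-order rule that CMOS switching power goes as C * V^2 * f. A small check, assuming capacitance and clock rate are held fixed; the function name is just for illustration:

```python
def dynamic_power_ratio(v_ref, v_new):
    """Factor by which switching power drops when the supply goes from
    v_ref to v_new, assuming capacitance and frequency stay the same."""
    return (v_ref / v_new) ** 2

reduction_at_3v = dynamic_power_ratio(5.0, 3.0)  # (5/3)^2, a bit under 3
reduction_at_2v = dynamic_power_ratio(5.0, 2.0)  # (5/2)^2, a bit over 6
```

This ignores leakage and the fact that a lower supply usually forces a lower clock rate, so it is only a rough sanity check on the quoted figures.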
Clockless, or asynchronous, chips work very much faster and consume less power than their synchronous equivalents...
Well, yeah! Look at any electronics book where they have an ALU (Arithmetic Logic Unit). You can perform whatever integer operations the unit supports in almost no time flat. It all works with so-called logic gates that are cleverly arranged in the unit. There is no need for a clock. You just spill the bits on one end of the thing and the results come flying out the other side after whatever the thing's propagation delay is. Which isn't very long. (I don't have a reference book handy right now so I can't tell you exactly.) Oh yeah, and this "technology" has been around since the invention of the transistor.
So why do we need a clock in a microprocessor? Because there are a zillion other operations going on, and it's really hard to make a system as complicated as a computer (millions of transistors, eh?) that operates asynchronously without messing things up. (With that much circuitry, it's a miracle the things work at all.) So they put a clock on the thing. The real arithmetic still happens in no time flat, but then it sits there waiting for the clock pulse to come around and allow the results through. It's really amazing shit. And I don't even know jack about 'lectronics.
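A rough sketch of the "bits in one end, results out the other" idea: a ripple-carry adder written as nothing but gate logic. The function names are my own, and a Python loop only models the wiring, not the actual propagation delay.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder built from XOR, AND and OR gates."""
    total = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return total, carry_out

def ripple_add(x, y, width=8):
    """Chain full adders bit by bit; no clock or stored state is involved."""
    carry = 0
    result = 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result, carry  # final carry is the overflow out of the top bit

sum_a, carry_a = ripple_add(100, 55)
sum_b, carry_b = ripple_add(200, 100)  # overflows 8 bits
```

In hardware the answer is valid as soon as the carry has rippled through; a clocked design just waits for the worst-case ripple every time.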
But I was going to say something, and I forgot what it was. Oh well. Maybe I'll remember later. I really hate when that happens though. Oh well.
People have spent the past twenty plus years designing development tools for synchronous design. There's just a lot less groundwork covered for asynchronous design because no one has spent the millions of dollars to create a (mostly) new tool chain.
Ditto tools for chip testing.
Chip testing of synchronous designs is easy, and there are automated tools to do it.
The common ones are based on fullscan or partial scan: You add a mux to each flop and use a test signal to string them into one or more shift registers. Pop into test mode, shift out the old state for examination and shift in a new state for the next steps of the test.
You can change the function of the pins on the chip to shift out a bunch of little chains quickly, or use one or a few long chains and shift through the JTAG port (which is really intended for "boundary scan", where you switch the pin drivers into a similar scan mode controlled by the 4- or 5-pin JTAG port, and toss signals from chip to chip to see if all the chips got soldered onto the board correctly).
Scan works well on synchronous designs, where all the flops in each of several "clock domains" are clocked by a common signal. But in asynchronous designs, where each flop may be clocked by an arbitrary signal, this falls apart.
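For anyone who hasn't seen scan before, here is a toy model of the shift-out/shift-in step described above. It is only a sketch: the class and method names are invented, and real scan insertion is done by the synthesis and test tools, not by hand.

```python
class ScanChain:
    """Flops strung into one shift register while test mode is on."""

    def __init__(self, state):
        self.flops = list(state)

    def shift(self, bit_in):
        """One test-mode clock: push a bit in at the head and
        return the bit that falls off the tail."""
        bit_out = self.flops[-1]
        self.flops = [bit_in] + self.flops[:-1]
        return bit_out

    def scan(self, new_state):
        """Shift out the old state while shifting in a new one."""
        old = []
        for bit in reversed(new_state):  # last flop's value goes in first
            old.append(self.shift(bit))
        old.reverse()  # tail came out first, so flip back to flop order
        return old

chain = ScanChain([1, 0, 1, 1])
captured = chain.scan([0, 0, 1, 0])
```

Note that every `shift` assumes all the flops step together on one test clock, which is exactly the assumption an asynchronous design breaks.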
There IS a methodology - complete with automatic test program tools - that can test asynchronous designs as easily as synchronous. It's called the "Cross-Check Array". But it was never widely deployed in the United States and the company that did it has since been merged into another and by now may be gone. As far as I know, Sony (which got an unlimited license as part of investing in Cross Check when it was a startup) is the only big user of it these days.
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
Still, if it runs at whatever speed it can, I suppose it'll speed up automatically when I cool it, and slow down when it overheats. Wonder if this will eliminate burnt-out chips... riskless overclocking for the masses. Maybe I should buy shares in heatsink/fan manufacturers :-)
This is also going to make consistent benchmarking a thing of the past. You'll never get the same run twice on the same chip, let alone different chips in different environments.
Why would anyone engrave "Elbereth"?
That format will sort correctly with a simple numerical sort, something which yyddMM won't do.
dave
This change in input creates instability in the system, as all logic elements affected by the input change undergo state transitions. If the resulting stable state at the output end of the logic block is the same no matter what, it's a noncritical race. However, in some cases the output can settle in different stable states depending on the order of the flipflop state transitions within the circuit. This is called a critical race, and it is a bad thing.
Critical races mean we can't predict what the output of a circuit will be given an initial state and an input value. Therefore, the circuit is worthless.
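A classic small case is a latch made of two cross-coupled NOR gates when both inputs drop from 1 to 0 at once. That circuit is my choice of example, not something from the post above, and the code only models "which gate reacts first" as an update order.

```python
def nor(a, b):
    """Two-input NOR gate on 0/1 values."""
    return int(not (a or b))

def settle(s, r, q, qn, order):
    """Re-evaluate the two gates one at a time, in the given order,
    until the outputs stop changing."""
    for _ in range(10):  # small bound; this circuit settles in two passes
        before = (q, qn)
        for gate in order:
            if gate == "q":
                q = nor(r, qn)
            else:
                qn = nor(s, q)
        if (q, qn) == before:
            return q, qn
    raise RuntimeError("circuit did not settle")

# With S = R = 1 both outputs are forced to 0. Now release both inputs.
q_gate_first = settle(0, 0, 0, 0, order=("q", "qn"))
qn_gate_first = settle(0, 0, 0, 0, order=("qn", "q"))
```

Same starting state and same inputs, but the stable result differs depending on which gate wins: `(1, 0)` in one order and `(0, 1)` in the other.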
sigs are for suckers
Whenever the question of asynchronous chip design comes up, everyone points out the Intel work in '97, but nobody mentions the work done by the AMULET group in Manchester. Set up in 1990 they produced the world's first asynchronous chip in 1994, based on the ARM chipset. By the time Intel got their act in order, the second generation AMULET2e had arrived, providing higher performance than a synchronous ARM chip for the same power input.
++ Say to Elrond "Hello.".
Elrond says "No.". Elrond gives you some lunch.