Clockless Chips
iarkin writes "TechReview is running a very interesting
article about clockless chips.
Clockless, or asynchronous, chips work much faster and consume less power than their synchronous equivalents (Intel had some experiments on these chips back in '97; the results showed that the asynchronous chips were three times faster and consumed only half the power)."
It wouldn't be that bad. The industry would just get away from numbers, and move to something like what many software makers are doing today.
In place of a 2GHz Pentium IV we will be seeing an Axium Gold.
It will take a little getting used to, but we'll get over it. Ford doesn't call their cars Model A's or Model T's anymore!
What *should* happen is that everyone should agree on a standardized benchmark, which is OS & architecture independent, that would become the single-number comparison between two chips. Although, I highly doubt everyone would agree to such a single benchmark.... The real problem, which (thankfully) is coming more and more out into the open, is that there is no way to meaningfully reduce today's complex general-purpose CPUs to a single number, or even a small subset of numbers. Real performance is far too application dependent (and in some cases data dependent), meaning that the only truly useful benchmark for any application is to actually run the application in question. We're pretty much on the way to this already... gaming sites, for example, benchmark equipment based on how well/fast it runs a variety of common games using a variety of settings.
Any quoted single number is reasonably meaningless.
Of course I'm used to things getting published a little late on slashdot ;-)
M0571y H@rml355.
Actually FLOPS (floating-point operations per second) are too specific to be a general benchmark. They work well for gaming consoles and graphics cards because in those cases nearly every calculation involves floating point. In general-purpose processors the floating-point unit is only a subset of the whole processor and isn't always the most important factor.
MIPS (million instructions per second) is better, but this gets back into RISC or CISC issues. How much work does one instruction do? Not that the current MHz system is any better in this regard. Hmm, I guess in that sense MIPS would be a good replacement for MHz. However, why would you want to move to another inaccurate measure of performance?
The factor that clockless computers have that most closely relates to MHz is IPS, or instructions per second. This is an average, obviously. One problem that this doesn't cover, though, is IPP, or instructions per program. Related to the old RISC and CISC concepts, some computers need more instructions to get the same work done. If a standard can be found for determining IPP, and some method of combining IPP and IPS can be found that makes sense as a performance measurement.....
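One obvious way to combine the two, sketched here as an assumption rather than any agreed standard, is to divide IPP by the average IPS to get an estimated run time for that program. The function name and the example instruction counts and rates below are made up for illustration.

```python
def run_time_seconds(instructions_per_program: float,
                     instructions_per_second: float) -> float:
    """Estimated run time: instructions needed divided by average rate."""
    if instructions_per_second <= 0:
        raise ValueError("instructions_per_second must be positive")
    return instructions_per_program / instructions_per_second


# Hypothetical numbers, not measurements of any real chip.
# Machine A needs fewer instructions but executes them more slowly.
machine_a = run_time_seconds(1.0e9, 5.0e8)
# Machine B needs 50% more instructions but has twice the average IPS.
machine_b = run_time_seconds(1.5e9, 1.0e9)

print(f"machine A: {machine_a:.2f} s, machine B: {machine_b:.2f} s")
```

With these made-up figures the machine that needs more instructions still finishes first, which is the reason neither IPS nor IPP says much on its own. IPP also changes from program to program, so this still gives one number per application, not one number per chip.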
There are some compelling reasons:
Though synchronous design has enabled great strides to be taken in the design and performance of computers, there is evidence that it is beginning to hit some fundamental limitations. A circuit can only operate synchronously if all parts of it see the clock at the same time, at least to a reasonable approximation. However, clocks are electrical signals, and when they propagate down wires they are subject to the same delays as other signals. If the delay to a particular part of the circuit takes a significant part of a clock cycle-time, that part of the circuit cannot be viewed as being in step with other parts.
For some time now it has been difficult to sustain the synchronous framework from chip to chip at maximum clock rates. On-chip phase-locked loops help compensate for chip-to-chip tolerances, but above about 50MHz even this isn't enough.
Building the complete CPU on a single chip avoids inter-chip skew, as the highest clock rates are only used for processor-MMU-cache transactions. However, even on a single chip, clock skew is becoming a problem. High-performance processors must dedicate increasing proportions of their silicon area to the clock drivers to achieve acceptable skew, and clearly there is a limit to how much further this proportion can increase. Electrical signals travel on chips at a fraction of the speed of light; as the tracks get thinner, the chips get bigger and the clocks get faster, the skew problem gets worse. Perhaps the clock could be injected optically to avoid the wire delays, but the signals which are issued as a result of the clock still have to propagate along wires in time for the next pulse, so a similar problem remains.
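To put a rough number on the skew argument, here is a small sketch that estimates what fraction of a clock cycle is used up by the clock travelling along a wire. It assumes a constant propagation velocity expressed as a fraction of the speed of light; the 0.5 default, the 15 mm wire length and the clock rates are illustrative guesses, not figures from the text, and real on-chip wires are usually slower because of their resistance and capacitance.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def delay_fraction_of_cycle(wire_length_m: float,
                            clock_hz: float,
                            velocity_fraction: float = 0.5) -> float:
    """Wire propagation delay divided by the clock period.

    velocity_fraction is an assumed signal speed relative to light.
    """
    if not 0.0 < velocity_fraction <= 1.0:
        raise ValueError("velocity_fraction must be in (0, 1]")
    delay_s = wire_length_m / (velocity_fraction * SPEED_OF_LIGHT_M_PER_S)
    return delay_s * clock_hz  # same as delay_s / (1 / clock_hz)


# Hypothetical 15 mm path across a die at a few clock rates.
for clock_hz in (50e6, 1e9, 2e9):
    fraction = delay_fraction_of_cycle(0.015, clock_hz)
    print(f"{clock_hz / 1e6:6.0f} MHz: {fraction:.1%} of a cycle")
```

Under these assumptions the delay is negligible at 50 MHz but reaches roughly a tenth of a cycle at 1 GHz, and it grows in direct proportion to both the clock rate and the wire length.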
Even more urgent than the physical limitation of clock distribution is the problem of heat. CMOS is a good technology for low power as gates only dissipate energy when they are switching. Normally this should correspond to the gate doing useful work, but unfortunately in a synchronous circuit this is not always the case. Many gates switch because they are connected to the clock, not because they have new inputs to process. The biggest gate of all is the clock driver, and it must switch all the time to provide the timing reference even if only a small part of the chip has anything useful to do. Often it will switch when none of the chip has anything to do, because stopping and starting a high-speed clock is not easy.
Early CMOS devices were very low power, but as process rules have shrunk CMOS has become faster and denser, and today's high-performance CMOS processors can dissipate 20 or 30 watts. Furthermore there is evidence that the trend towards higher power will continue. Process rules have at least another order of magnitude to shrink, leading directly to two orders of magnitude increase in dissipation for a maximum performance chip. (The power for a given function and performance is reduced by process shrinking, but the smaller capacitances allow the clock rate to increase. A typical function therefore delivers more performance at the same power. However you can get more functions onto a single chip, so the total chip power goes up.) Whilst a reduction in the power supply voltage helps reduce the dissipation (by a factor of 3 for 3 Volt operation and a factor of 6 for 2 Volt operation, relative to a 5 Volt norm in both cases), the end result is still a chip with an increasing thermal problem. Processors which dissipate several hundred watts are clearly no use in battery powered equipment, and even on the desktop they impose difficulties because they require water cooling or similar costly heat-removal technology.
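The quoted voltage factors can be checked with the usual first-order model for CMOS switching power, P = a * C * V^2 * f, where power scales with the square of the supply voltage. The sketch below assumes that model and holds capacitance, activity and clock rate fixed; the function name is made up for illustration.

```python
def dynamic_power_reduction(reference_volts: float, new_volts: float) -> float:
    """Factor by which switching power drops when the supply is lowered.

    Assumes P is proportional to V**2 with everything else unchanged.
    """
    if reference_volts <= 0 or new_volts <= 0:
        raise ValueError("voltages must be positive")
    return (reference_volts / new_volts) ** 2


for volts in (3.0, 2.0):
    factor = dynamic_power_reduction(5.0, volts)
    print(f"5 V -> {volts:.0f} V: power reduced by a factor of {factor:.2f}")
```

This gives about 2.78 for 3 Volt operation and 6.25 for 2 Volt operation, in line with the rounded factors of 3 and 6 above.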
As feature sizes reduce and chips encompass more functionality it is likely that the average proportion of the chip which is doing something useful at any time will shrink. Therefore the global clock is becoming increasingly inefficient.