Clockless Computing: The State Of The Art
Michael Stutz writes: "This article in Technology Review is a good overview of the state of clockless computing, and profiles the people today who are making it happen." The article explains in simple terms some of the things that clockless chips are supposed to offer (advantages in raw performance, power consumption and security) and what characteristics make these advantages possible.
Damnit, Jim, I'm an anarchist, not a F@#$!^& doctor!
What will AMD and Intel try to one-up each other with? No clock speed, so how do you classify, much less hype, new processors?
The real reason they haven't moved to this yet is their marketing team doesn't want to give up on the MHz race.
Learning HOW to think is more important than learning WHAT to think.
The Amulet Group at The University Of Manchester have a clockless ARM (ARMs are used in many mobile phones, the Compaq iPaq and the GBA).
The article is very interesting. I thought that research in asynchronous computing died in the sixties. What the article misses is that async. operations have an overhead too - the "here is the data" synchronization. Synchronous computing does not have that.
I have previously read (forgotten where) that in theory async. computers will always be slower than sync. computers. It seems that that is not true anymore. I guess that the latest-and-greatest CPUs have a non-trivial percentage of idle time for instructions which take slightly longer than an integral number of clock ticks. If an instruction takes 2.1ns and the clock period is 1ns, everything has to assume that the instruction takes 3ns.
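To put a number on that rounding, here is a minimal Python sketch. The function names are made up for illustration, and it ignores any handshake cost an async design would add.

```python
import math

def clocked_latency_ns(actual_ns: float, clock_period_ns: float) -> float:
    """Time a clocked design has to budget: the next whole number of ticks."""
    ticks = math.ceil(actual_ns / clock_period_ns)
    return ticks * clock_period_ns

def idle_fraction(actual_ns: float, clock_period_ns: float) -> float:
    """Share of the budgeted time in which the logic has already finished."""
    budget = clocked_latency_ns(actual_ns, clock_period_ns)
    return (budget - actual_ns) / budget

if __name__ == "__main__":
    print(clocked_latency_ns(2.1, 1.0))
    print(idle_fraction(2.1, 1.0))
```

For the 2.1ns example the budget comes out at three ticks, so roughly 30% of the budgeted time is spent waiting for the next clock edge.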
Also imagine a fully async. computer. No need for a new motherboard or even changing settings in the BIOS when new and faster RAM chips are available - the system will automatically adapt.
I think that we will see more and more async. parts in the years to come. But I don't know if everything is going to be asynchronous.
The AMULET group at Manchester University have been developing this for years based on ARM cores.
http://www.cs.man.ac.uk/amulet/index.html
Well, I think that the reason the async chips are not being used is quite simple - a clocked system is much easier to design and verify. You know how long before and after a clock edge your signal needs to be there to be recognised. You know that if these constraints match across your system, it will work. Yes, this makes the system as fast as its slowest link - some circuits operate near their limits, some are actually wasting the time. But it works. An asynchronous design would be a pure hell to debug - that's probably why the industry doesn't (yet) mess with it.
:-) The only drawback is that most of us prefer a slow digital model of a thermonuclear reaction and similar problems...
BTW, does anybody here remember analog computing? A bunch of cleverly connected operational amplifiers? These things were asynchronous, just as mother nature is. If you can get the physics to work for you, bingo - compare the time nature needs to raytrace a complex scene with the time a digital model needs.
Busses can be made asynchronous. Handshaking is the key. New strategies will be needed, but people are bright so I feel they will be developed. With a little thinking I've sketched out a packet type asynchronous bus in my head. It would work nicely for up to a meter or so. Longer lengths would be slower than shorter ones. One thing I feel may work best is for any signal/data that needs to travel significant distances to go into synchronous transmission. Otherwise you end up adding in delays from the back handshake signals.
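For anyone who hasn't seen a handshake spelled out, here is a rough Python sketch of one common scheme, the four-phase request/acknowledge handshake. It is not the packet bus described above. The Channel class, the function names and the 5 ns per meter wire delay are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Channel:
    req: bool = False            # request wire, driven by the sender
    ack: bool = False            # acknowledge wire, driven by the receiver
    data: Optional[int] = None   # data wires
    trace: List[str] = field(default_factory=list)

def transfer(ch: Channel, value: int, received: List[int]) -> None:
    """Move one word across the channel with a four-phase handshake."""
    ch.data = value              # 1. sender sets data, then raises req
    ch.req = True
    ch.trace.append("req+")
    received.append(ch.data)     # 2. receiver latches data, raises ack
    ch.ack = True
    ch.trace.append("ack+")
    ch.req = False               # 3. sender sees ack, drops req
    ch.trace.append("req-")
    ch.ack = False               # 4. receiver sees req low, drops ack
    ch.trace.append("ack-")

def handshake_time_ns(length_m: float, ns_per_m: float = 5.0) -> float:
    """Four wire traversals per word; ns_per_m is an assumed wire delay."""
    return 4 * length_m * ns_per_m

if __name__ == "__main__":
    ch, got = Channel(), []
    for word in (1, 2, 3):
        transfer(ch, word, got)
    print(got, ch.trace[:4], handshake_time_ns(1.0))
```

Every word costs four trips down the wire, which is why the time per word grows with bus length and why a long run may be better served by a synchronous link.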
I remember some of the first articles in SIGARCH and how they sparked my interest. I've always felt that async was the way to go when you don't know how long an operation will take. I'm happy to see it's still getting research dollars.
First off, this is pure conjecture, IANAME.
Okay, the way I suppose this would work, considering that Intel had developed a chip that was compatible with the pentium series, would be an asynchronous design, with some kind of logic translator to communicate with the bus. Yes, at first you would be wasting processor power, but eventually, the bus technology would catch up (See ISA to EISA to VLB to PCI to AGP and on...). As for the RAM, it could either run on an independent clock-bus, or, I do not see why it would be a problem to develop asynchronous RAM if they have the technology for the chips. Also, the article states that the P IV utilises some asynchronous components, maybe that is part of the reason for the push to use RDRAM with it?
You say you want a revolution....
When designing a "conventional" CPU, you can have a clock that essentially drives events and data movement.
If you design a multiplier circuit using a bunch of full-adders, you'll notice that the output takes a long time to settle. In fact, depending on what numbers you are multiplying together, the circuit may take more or less time before the output settles.
You can always determine the worst-case scenario for a multiply operation to settle. If the multiply takes longer than any other operation, then the multiply op is the "critical path".
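A quick way to see the data dependence is to look at a plain ripple-carry adder instead of a full multiplier (a simplification, not the circuit described above). This Python sketch counts the longest chain of stages a carry has to ripple through; the function name and the 8-bit width are just for illustration.

```python
def longest_carry_chain(a: int, b: int, width: int = 8) -> int:
    """Longest run of adder stages that one carry ripples through."""
    carry = 0
    run = 0
    longest = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        if x & y:
            run = 1               # a new carry is generated at this stage
        elif (x ^ y) and carry:
            run += 1              # the incoming carry is passed along
        else:
            run = 0               # the carry dies here (or never existed)
        carry = (x & y) | (carry & (x ^ y))
        longest = max(longest, run)
    return longest

if __name__ == "__main__":
    print(longest_carry_chain(0b0101, 0b0010))
    print(longest_carry_chain(1, 1))
    print(longest_carry_chain(0xFF, 1))
```

Adding 0xFF and 1 ripples a carry through all eight stages, which is the worst case the clock has to allow for, while most operand pairs settle after a much shorter chain.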
A chip's frequency is the inverse of the period of the critical path (in most cases). So, if it's possible to do 100 million critical path operations in a second, then your machine can run at 100MHz.
What the article is hinting at is the amount of wasted time because everything is (currently) done on the clock cycle. Allow me to illustrate: Let's say a multiply takes 5 seconds, but an add only takes 1. A fixed clock rate (or having a clock at all) forces that add instruction to take the extra 4 seconds, and use it for nothing. Wasted computer time.
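Here is the same arithmetic as a small Python sketch, using the made-up latencies from the example (multiply = 5, add = 1). It assumes an ideal self-timed design with zero handshake overhead, which a real one would not have.

```python
LATENCY = {"mul": 5, "add": 1}   # time units from the example above

def clocked_time(program, latency=LATENCY):
    """Every instruction is charged the critical-path (slowest) latency."""
    period = max(latency.values())
    return len(program) * period

def self_timed_time(program, latency=LATENCY):
    """Each instruction takes only as long as it needs (overhead ignored)."""
    return sum(latency[op] for op in program)

if __name__ == "__main__":
    program = ["add", "add", "add", "add", "mul"]
    print(clocked_time(program), self_timed_time(program))
```

With four adds and one multiply the clocked version is charged 25 time units against 9 for the idealized self-timed one, so 16 units are spent waiting on the clock.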
Now, the reason people are skeptical is because there is no efficient way to tell if a multiply operation (or any other operation) has actually completed and the outputs have settled.
Incidentally, if this interests you, go grab a free program called "diglog" or "chipmunk". The software (for linux/windows) allows you to simulate almost any digital circuit.
Another thing to keep in mind about current CPUs is the way they execute an instruction. Every instruction is actually made of smaller instructions (called microinstructions). Microinstructions take one clock cycle each, but there is an arbitrary number of microinstructions for each larger instruction. The microinstructions perform the "fetch execute cycle" - the sequence that decodes the instruction, grabs the associated data, performs the desired task, and goes back for more.
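As a toy illustration of that, here is a Python sketch of a microcoded machine where every microinstruction costs exactly one tick. The instruction names and their micro-op lists are invented for the example and don't correspond to any real CPU.

```python
# Hypothetical microcode table: instruction -> list of microinstructions.
MICROCODE = {
    "LOAD": ["fetch", "decode", "read_mem", "write_reg"],
    "ADD":  ["fetch", "decode", "alu_add"],
    "NOP":  ["fetch", "decode"],
}

def run(program):
    """Step through a program, charging one clock tick per microinstruction."""
    ticks = 0
    trace = []
    for instr in program:
        for micro in MICROCODE[instr]:
            ticks += 1
            trace.append((ticks, instr, micro))
    return ticks, trace

if __name__ == "__main__":
    total, trace = run(["LOAD", "ADD", "NOP"])
    for tick, instr, micro in trace:
        print(tick, instr, micro)
    print("total ticks:", total)
```

The three instructions take 4, 3 and 2 ticks respectively, so the number of ticks per instruction varies even though each microinstruction is locked to the clock.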
If you're interested in designing a CPU yourself, go grab a book by Morris Mano called "Computer System Architecture". With that book and DigLog, it's pretty easy, but it takes a long time.
if there is no mass market for asynchronous chips, there's little incentive to create tools to build them; if there are no tools, no chips get produced. The same problem applies to the development of chip-testing technologies. Without any significant quantity of asynchronous circuits to test, there is no market for third-party testing tools.
But at least here there's an accidental solution - the Cross-Check Array.
Conventional clocked chips can be tested by scan: A multiplexer is added to the flop inputs, and a test signal turns them into one or more long shift registers. The old state of the flops is shifted out for examination while a new state is shifted in to start the next phase of the test. This only works when the flops to be strung together are all part of a common clocking domain.
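The shifting is easy to picture with a few lines of Python. This is only a behavioral sketch of a single scan chain, with a list standing in for the flops; the function name is made up.

```python
def scan_shift(flops, scan_in):
    """Shift a new state in and the old state out, one bit per test clock."""
    chain = list(flops)
    shifted_out = []
    for bit in scan_in:
        shifted_out.append(chain[-1])   # the last flop drives scan-out
        chain = [bit] + chain[:-1]      # every flop takes its neighbour's value
    return chain, shifted_out

if __name__ == "__main__":
    old_state = [1, 0, 1, 1]
    new_state, observed = scan_shift(old_state, [0, 0, 1, 0])
    print(observed)    # old state, last flop first
    print(new_state)   # new state, first-shifted bit ends up deepest in the chain
```

After as many test clocks as there are flops, the whole old state has been read out and the whole new state is in place. All the flops step together on the same test clock, which is why this only works within one clocking domain.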
The Cross-Check Array is more like a RAM. A grid of select lines and sense lines are laid down on the chip, with a transistor at each intersection. The transistor is undersized compared to those of the gates, forming a small tap on a nearby signal - or it can inject a signal if the sense line is driven rather than monitored. Select drivers are laid down along one edge of the chip, sense amplifiers/drivers along another.
This approach does not depend on the flip-flops to be active participants in the observation process (though it can still force their state), and thus can observe signals in asynchronous as well as synchronous designs. It also gives observability of testpoints in combinatorial logic without the addition of extra flops. Compared to a fullscan design it gives much greater observability and takes about half the silicon-area overhead.
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
Does programming for a clockless chip differ from programming for a synchronous one? Every link I tried to follow only explains the differences in design, speed, or power consumption.
They're talking about removing the internal CPU clock, which in effect, isn't really a clock at all. It's just something which ticks at regular intervals, and lets you do a number of things, such as synchronize instructions, pipeline, cache read/writes, and all the other stuff I forgot from CS 101.
A computer's clock (as in date, time, etc) is on another part of the motherboard, and runs (correct me if I'm wrong) off the CMOS battery. That'll always be a "clock" in the sense we understand.
And build in a microphone and make its screen touch sensitive. That way you can get rid of the keyboard, trackpad and hinge and make it a single, consolidated unit.
It would not be economically viable to try and push this new type of processor into a market dominated by traditional synchronized processors and computer equipment. However, it seems that asynchronous microprocessing can still be used inside traditional computers if it is mixed together with synchronized systems. Imagine a computer that uses a synchronous bus just the way it does now but has an asynchronous co-processor, driven by a special type of synchronous CPU that allows certain operations to be carried out asynchronously. If, for example, a matrix multiplication needs to be done, the normal CPU would require a number of clock cycles that is proportional to the number of multiplications within the matrix over the number of processor pipes allocated for this task. If it can be proven that asynchronous processing can do the same job three times faster than a 'normal' CPU takes, why can't the 'normal' or traditional CPU ask the asynchronous co-processor to do the task for it? The problem is of course asynchronous data retrieval and storage. Probably a co-processor could actually be a co-processor card with its own asynchronous memory bank on board that can be later synchronized with the traditional memory banks. Such a system should not be too difficult to implement, since it could use a PCI slot for example. Soon a computer would become less and less synchronous, with the synchronous parts synchronizing many asynchronous devices.
You can't handle the truth.
Of course there is some overhead. There has to be a system telling other parts of the computer when something is finished. But if that is a long enough stage (perhaps thousands of instructions) then it'll be faster overall.
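A back-of-the-envelope Python sketch of that trade-off, with made-up names and numbers: handing a job to the async part costs a fixed overhead, after which the work runs faster by some speedup factor (which has to be greater than 1 for any of this to make sense).

```python
def offload_time(work: float, overhead: float, speedup: float) -> float:
    """Total time if the job is handed off: fixed overhead plus faster work."""
    return overhead + work / speedup

def offload_wins(work: float, overhead: float, speedup: float) -> bool:
    """True if handing the job off beats doing it in place."""
    return offload_time(work, overhead, speedup) < work

def break_even_work(overhead: float, speedup: float) -> float:
    """Job size above which handing off pays for itself (speedup > 1)."""
    return overhead / (1 - 1 / speedup)

if __name__ == "__main__":
    print(offload_wins(work=1000, overhead=100, speedup=3))
    print(offload_wins(work=100, overhead=100, speedup=3))
    print(break_even_work(overhead=100, speedup=3))
```

With an overhead of 100 time units and a 3x speedup, the hand-off only pays off for jobs longer than about 150 units, so short stages lose and long ones win.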
Escher was the first MC and Giger invented the HR department.
One of the first common "thinking-out-of-the-box" techniques used to crack smart cards exploited the fact that the sw was written to take different amounts of time to compute legal and illegal keys. By measuring the battery consumption, the smart card crackers could narrow their search to the space of legal keys.
No doubt this was a sw path put in by a well intentioned programmer trying to save battery life, but now all respected encryption systems recommend a "veil" strategy, where all encryption/decryption operations take the same amount of time and power regardless of the key.
In practice this means that you find out the max time and power (plus some margin) and if you are done early and without using enough power, you waste time and power to pad out the veil...
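To make the leak concrete, here is a small Python sketch comparing an early-exit key check with one that always does the maximum amount of work. The step counter is a stand-in for time and power, the function names are made up, and this is not how any particular smart card did it.

```python
def check_leaky(candidate: bytes, secret: bytes):
    """Early-exit compare: the step count reveals where the first mismatch is."""
    steps = 0
    for c, s in zip(candidate, secret):
        steps += 1
        if c != s:
            return False, steps
    return len(candidate) == len(secret), steps

def check_veiled(candidate: bytes, secret: bytes):
    """Always walks the whole secret, so the step count is the same every time."""
    steps = 0
    diff = len(candidate) ^ len(secret)
    for i in range(len(secret)):
        c = candidate[i] if i < len(candidate) else 0
        diff |= c ^ secret[i]    # accumulate mismatches, never exit early
        steps += 1
    return diff == 0, steps

if __name__ == "__main__":
    secret = b"abcd"
    for guess in (b"xbcd", b"axxx", b"abcd"):
        print(guess, check_leaky(guess, secret), check_veiled(guess, secret))
```

The leaky version takes more steps the more leading bytes of the guess are right, which is exactly the signal the crackers measured. In real Python code you would reach for the standard library's hmac.compare_digest rather than roll your own comparison.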
Nice thought, but this just goes to show that cryptographic systems really need to be designed by experts...
Parts of processors are already asynchronous. The basic way you get stuff done in a clocked machine is that you have a register feeding an array of logic gates some number of gates deep, with the output going to some other register. Within the array of logic gates, which might be an adder, a multiplier, or an instruction decoder, things are asynchronous. But the timing is designed so that the logic will, in the slowest case, settle before the register at the receiving end locks in its input states. The worst case thus limits the clock rate, which is why there is interest in asynchronous logic.
The claims of lower power consumption are probably bogus. As Transmeta found out, the power saving modes weren't exclusive to their architecture. Once power-saving became a competitive issue, everybody put it in.
The article is surprisingly accurate, for a change. Read it.
However, it seems to have spawned the usual problems here with misunderstanding and confusion. Practically a /. trademark by this point...
Whether you construct a processor using conventional or asynchronous logic makes no difference to the programmer. The programming paradigm can be completely independent from the underlying hardware. (Admittedly, if you want to squeeze the absolute most performance from a given hardware design, you need to program with it in mind, but there is no reason why an ix86, or PPC, or SPARC, or MIPS chip couldn't be implemented asynchronously.)
One of the most interesting advantages of asynchronous logic is that it allows the use of arbitrarily large die sizes. In synchronous logic, you're limited by the delays that arise from transmitting your clock pulses across the chip... at some point maintaining a global lock-step becomes infeasible.
One of the most marketable advantages of asynchronous logic is the power saved by not having to constantly drive the same clock circuitry. Most chips support a 'sleep' or 'low power' mode where they turn off the clock or provide it to only a limited portion of the chip. The chip then has to go through a 'wake up' cycle to re-establish the clock throughout the chip before returning to normal operation. The power saved by asynchronous operation can be substantial, and the lack of a 'wake up' latency can be critical in certain applications.
The biggest problem right now is that the vast Layout and Design masses are used to solving the synchronous problems and not the asynchronous problems, ditto for the available tools. However, with an asynchronous-savvy group, a given solution can be designed in less time than the equivalent synchronous solution (someone here was claiming otherwise...).
And this technology is -not- vaporware... it's real and it's here. And whether you believe it or not, it's at least one part of the future.
-YA
PS: BS in EE from Caltech. Working for a company mentioned in the article, although their opinions have no logical relation or tie to mine.