ARM Offers First Clockless Processor Core
Sam Haine '95 writes "EETimes is reporting that ARM Holdings have developed an asynchronous processor based on the ARM9 core. The ARM996HS is thought to be the world's first commercial clockless processor. ARM announced they were developing the processor back in October 2004, along with an unnamed lead customer, which it appears could be Philips. The processor is especially suitable for automotive, medical and deeply embedded control applications. Although reduced power consumption, due to the lack of clock circuitry, is one benefit, the clockless design also produces a low electromagnetic signature because of the diffuse nature of digital transitions within the chip. Because clockless processors consume zero dynamic power when there is no activity, they can significantly extend battery life compared with clocked equivalents."
Soooo... How many mHz does it run at?
Victory or awesome!
Can a processor like this do things like play sounds? If it doesn't have a clock, I don't think it could measure time accurately enough to reproduce the samples. What other drawbacks are there?
Send email from the afterlife! Write your e-will at Dead Man's Switch.
I read the summary and cringed. (1) Don't call them clockless -- they're called asynchronous, because (unlike a synchronous processor, one with a clock) all the parts of the processor aren't constantly starting and stopping at the same time. A typical synchronous processor can only run at a maximum frequency inversely proportional to the delay of its critical path - so if it takes up to 5 nanoseconds for information to propagate from one part of the chip to the other, the clock cannot tick any faster than once every 5 nanoseconds. (2) One very serious problem in modern processors is clock skew - if you have one central clock, the parts closest to the clock get the 'tick' signal sooner than the parts farther away, so the processor doesn't run perfectly synchronously.
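To put a number on the critical-path point: a minimal Python sketch, assuming the whole critical path has to fit inside a single clock period. The function name max_clock_hz is made up for illustration; the 5 ns figure is the one from the comment above.

```python
def max_clock_hz(critical_path_seconds: float) -> float:
    """Upper bound on clock frequency when one cycle must cover the critical path."""
    if critical_path_seconds <= 0:
        raise ValueError("critical path delay must be positive")
    return 1.0 / critical_path_seconds


# 5 ns worst-case propagation delay, as in the comment above.
f_max = max_clock_hz(5e-9)
print(f"{f_max / 1e6:.0f} MHz")  # prints "200 MHz"
```

Real designs also budget for setup time and clock skew, so the usable frequency is lower than this bound.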
To make laws that man cannot, and will not obey, serves to bring all law into contempt.
--E.C. Stanton
So in short, your next smart clock may as well have a CPU without a clock.
Those damn young'uns and their newfangled clockless clocks.
But your assertion about critical path is slightly off. Async processors still have a critical path. If you imagine the components as a bucket brigade and the data as the buckets, then they may not all be heaving the buckets at exactly the same time anymore, but they will still be slowed down by the slowest man in the line. The difference is that the critical path is now dynamic. You don't have to time everything to the static, worst-case component on your chip. If you consistently don't use the slowest components (say, the multiply unit), then you will get a faster IPT (instructions per time) on average.
And yes, you don't have clock skew any more which is nice, but you now have to handshake data back-and-forth across the chip. Of course putting decoupling circuitry in can help.
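A rough Python sketch of the "dynamic critical path" idea. The latencies are invented for illustration (they are not ARM996HS figures), instructions are assumed to run one at a time, and handshake overhead is ignored, so this only shows the direction of the effect.

```python
# Hypothetical per-operation latencies in nanoseconds (illustrative only).
LATENCY_NS = {"add": 2.0, "load": 3.0, "multiply": 5.0}


def synchronous_time_ns(ops):
    """Clocked design: every op takes one cycle sized for the slowest unit."""
    cycle = max(LATENCY_NS.values())
    return cycle * len(ops)


def self_timed_time_ns(ops):
    """Self-timed design: each op takes only as long as the unit it uses."""
    return sum(LATENCY_NS[op] for op in ops)


workload = ["add", "load", "add", "add", "multiply", "add"]
print(synchronous_time_ns(workload))  # 30.0
print(self_timed_time_ns(workload))   # 16.0
```

The fewer multiplies in the mix, the bigger the gap; a workload made only of multiplies would show no gain at all.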
"You saved 1968." - Ms. Valerie Pringle to the crew of Apollo 8
One of the neatest things about async processors is their ability to run over a large range of voltages. You don't have to worry that lowering the voltage will make you miss gate setup timing, since the thing just slows down. Increasing the voltage shortens rise time and propagation delay and speeds the thing up. The grad students had a great demo where they powered one of their CPUs using a potato with some nails in it (like from elementary school science class). They called it the 'potato chip'.
"You saved 1968." - Ms. Valerie Pringle to the crew of Apollo 8
"What time is it?" "Shut! The! Fuck! Up! I'm saving energy here!"
This seems to be a good overview of clockless chips. I can't vouch for its accuracy (not my area), but the source - IEEE Computer Magazine - should be good. The article was published March 2005.
http://csdl2.computer.org/comp/mags/co/2005/03/r3018.pdf (warning: PDF)
ARM doesn't typically target super fast designs. They go more for low power and then reach for efficiency.
So this core wouldn't be designed for speed.
Also for many embedded platforms the cpu speed is less important compared to power consumption and bus contention.
Tom
Someday, I'll have a real sig.
What did I miss? I remember the hype, the early diagrams of how it was all supposed to weave through without the need for a clock. Would someone care to elaborate on the post-mortem of what was supposed to be the first clockless processor, 4 years ago?
I can't wait to get my hands on one of these and over-asynch the hell out of it. Imagine running it under dry ice - I bet it could run up to 50% more clockless over its default clocklessness.
I know typing this out will be useless, and it will get overlooked by the mods, but I might as well say this. Asynchronous designs have several advantages:
1. Good power consumption characteristics, i.e. low power consumed, not just because of the built-in power-down mode, but also because of the voltage the chips run at. An async chip can be run at a lower voltage than a synchronous equivalent, which makes greater power savings simpler to get. This becomes possible if you are willing to sacrifice speed, and in async devices the switching speed can change dynamically, as each block waits until the previous one is done, not until some outside clock has ticked.
2. Security: async designs give security against side-channel power analysis attacks. As all gates must switch (standard async design usually uses dual rail, so "all gates" means the gates along both the +ve and -ve rails), differential power attacks become much harder. Thus async designs are perfect for crypto chips (hardware AES anyone?)
3. Elegance of solution: the world is generally async. Key presses are, memory accesses are, so why not the processor? :) (Yes, I know busses are clocked, before you start, but if they were not....)
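For the power point in (1), a minimal Python sketch using the standard first-order CMOS dynamic power relation P = a * C * V^2 * f. The operating points below are invented for illustration, and for an async chip "f" stands in for the average switching rate rather than a clock frequency.

```python
def dynamic_power_watts(activity, capacitance_farads, voltage, switch_rate_hz):
    """First-order CMOS dynamic power: P = a * C * V^2 * f."""
    return activity * capacitance_farads * voltage ** 2 * switch_rate_hz


# Hypothetical operating points: the same circuit at nominal and reduced voltage.
nominal = dynamic_power_watts(0.1, 1e-9, 1.2, 100e6)
reduced = dynamic_power_watts(0.1, 1e-9, 0.9, 60e6)  # slower, but lower voltage

print(f"reduced / nominal = {reduced / nominal:.4f}")  # 0.3375
```

The voltage term is squared, which is why trading a little speed for a lower supply voltage pays off so well.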
But they have several points of disadvantage:
1. They are hard to do, especially using the synchronous design flow that most of the world uses. Synchronous tools assume, especially in RTL, that the world is combinational, and that sequential bits are simply registers that update once per clock cycle (not true for full custom designs like Intel's and AMD's, but true one level down, especially in ASIC design).
2. The tools that exist now can either produce good implementations of small functions (only a few gates), or bad implementations of larger functions that are in the worst case as slow as the synchronous equivalents. Tools like Petrify ( http://www.lsi.upc.edu/~jordicf/petrify/ ) exist, but these become unusable for circuits with more than ~50 gates.
3. Async designs are usually large. This is not always true, but standard async designs are usually implemented as dual rail or using 1-of-M encoding on the wires. But the main overhead comes from the handshaking circuitry. For really fine-grained pipelining, the output of each stage must be acknowledged to the previous stage. This adds a massive overhead, as it necessitates the use of a device called the Muller C-element, which sets its output to the value of its inputs only if the inputs are the same, and retains the previous value if not. Many copies of this element are usually required, and it's this that adds area; for example, a simple 1-bit OR gate, which would usually have 4 transistors, has 16 transistors in the dual rail async implementation.
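The Muller C-element behaviour described in point 3 can be modelled in a few lines of Python. This is only a behavioural sketch of a two-input element; the class name CElement is made up for illustration, and it says nothing about how the gate is built at the transistor level.

```python
class CElement:
    """Two-input Muller C-element: the output follows the inputs only when they agree."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        if a == b:
            self.out = a  # both inputs agree: output takes their value
        # inputs differ: hold the previous output
        return self.out


c = CElement()
trace = [c.update(a, b) for a, b in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]]
print(trace)  # [0, 0, 1, 1, 0]
```

The hold state is what makes it useful for handshaking: the output only moves once both the request and the acknowledge sides have caught up.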
For the time being, I think they will find a lot of use in low power applications - such as embedded microcontrollers/processors, in things like wireless sensor networks, and security processors. However, I believe that full processor design is very far off.
Legally obligatory sig : My opinions are my own... etc etc
They come in 4 models: fast faster fastest OMGspeed
I thought they came in Light Speed, Ridiculous Speed and LUDICROUS SPEED!
Unfortunately, self-clocked design (like the reported ARM uses) is also sometimes called "asynchronous" logic design; however, this is a completely different kind of thing than the "asynchronous" combinatorial logic used in clock-based design. Self-clocked design also does combinatorial logic in latched stages, but uses a self-timed asynchronous protocol to run the latches instead of a synchronous clock. Basically, the combinatorial logic figures out when it's finished, and tells both the next stage ("data's ready, latch it") and the input latch from the previous stage ("I'm done; gimme some more data").
To close the loop, each stage can wait until there's new data ready at its inputs, and space to put the output data. Thus, in absence of some bottleneck, your chip will simply run as fast as it can.
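A toy Python model of the flow control described above: a stage passes its data on only when the next latch is empty, and takes new data only when its own latch is free. The function names step and run are made up for illustration, stage delays are not modelled, and the handshakes are evaluated one after another from the output end, whereas real stages handshake concurrently.

```python
def step(stages, source, sink):
    """One round of handshakes, evaluated from the output end backwards."""
    # The last stage hands its datum to the sink (assumed always ready).
    if stages[-1] is not None:
        sink.append(stages[-1])
        stages[-1] = None
    # Each earlier stage passes data on only if the next latch is empty.
    for i in range(len(stages) - 2, -1, -1):
        if stages[i] is not None and stages[i + 1] is None:
            stages[i + 1] = stages[i]
            stages[i] = None
    # The first stage accepts new input only when it has room.
    if stages[0] is None and source:
        stages[0] = source.pop(0)


def run(items, depth=3):
    """Push all items through a pipeline of `depth` latches and return the output."""
    stages = [None] * depth
    source = list(items)
    sink = []
    while source or any(s is not None for s in stages):
        step(stages, source, sink)
    return sink


print(run([1, 2, 3, 4]))  # [1, 2, 3, 4]
```

Nothing here depends on a global tick; data simply advances whenever there is something to move and somewhere to put it.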
To overclock a self-timed design, you simply increase the voltage. No need to screw around with clock multipliers; as long as your oxide holds up, your traces don't migrate, and the chip doesn't melt...
Interestingly enough, Intel's latest project is being called "CORE". So yes, ARM is probably going to end up fighting CORE again.