ARM Offers First Clockless Processor Core
Sam Haine '95 writes "EETimes is reporting that ARM Holdings have developed an asynchronous processor based on the ARM9 core. The ARM996HS is thought to be the world's first commercial clockless processor. ARM announced they were developing the processor back in October 2004, along with an unnamed lead customer, which it appears could be Philips. The processor is especially suitable for automotive, medical and deeply embedded control applications. Although reduced power consumption, due to the lack of clock circuitry, is one benefit the clockless design also produces a low electromagnetic signature because of the diffuse nature of digital transitions within the chip. Because clockless processors consume zero dynamic power when there is no activity, they can significantly extend battery life compared with clocked equivalents."
Can a processor like this do things like play sounds? If it doesn't have a clock I don't think it could measure time accurately so it could reproduce the samples. What other drawbacks are there?
Send email from the afterlife! Write your e-will at Dead Man's Switch.
This may create difficulties for software that needs precise timing. Developing PICs I found that timing with the clock speed is easy and important.
Never let your sense of morals prevent you from doing what's right. --Isaac Asimov
Would this be responsive enough to be used in PDA like applications? or even laptops?
I worked for ARM for four years.
Truely wonderful and very special company for the first two of those years, then it slowly and surely went downhill - these days, it's just another company. ARM's culture didn't manage to survive its rapid growth in those few years from less than two hundred to more than seven hundred.
Maybe the first commercial micro-processor. DECs VAX-8600 was asynchronous. And it smoked for the day. I worked on some of the multi-variant multi-source clock skew calculations for the simulator used to model the processor, among other duties. Very slick hardware for the time. External syncronous contexts are maintained of course for syncronous busses but the internal processor speed is quicker in theory and cheaper power since you have fewer switching transitions. Think of the fun in ECL logic back then. :)
- Tjp
I am in wallow with my inner money grubbing capitalistic pig. ... Oink!
This seems obvious: laptops! The low power consumption makes them perfect. I'd love a multi-processor ARM9 core laptop running... oh, say, OS/X :-) Just for the geekiness of it.
Helping with organizational effectiveness is our job.
But your assertion about critical path is slightly off. Asynch processors still have a critical path. If you immagine the components as a bucket-bregade and the data the buckets, then they may not all be heaving the buckets at exactly the same time anymore, but they will still be slowed down by the slowest man in the line. The difference is that critical path is now dynamic. You don't have to time everything to the static, worst-case component on your chip. If you consistenly don't use the slowest components (say, the multiply unit), then you will get a faster IPT (instruction per time) on average.
And yes, you don't have clock skew any more which is nice, but you now have to handshake data back-and-forth across the chip. Of course putting decoupling circuitry in can help.
"You saved 1968." - Ms. Valerie Pringle to the crew of Apollo 8
One of the neatest things about asynch processors is their ability to run in a large range of voltages. You don't have to worry that lowering the voltage will make you miss gate setup timing since the thing just slows down. Increasing voltage increases rise time/propegation and speeds the thing up. The grad students had a great demo where they powered one of their CPUs using a potato with some nails in it (like from elementary school science class.) They called it the 'potato chip'.
"You saved 1968." - Ms. Valerie Pringle to the crew of Apollo 8
Not to belittle the energy savings, but how fast is it compared to a clocked CPU with a similar instruction set? To me, speed the most interesting quality of a new chip design other than reliability. The problem with a clock is that clock speed is dictated by the slowest instruction. Since a clockless CPU does not have to wait for a clock signal to begin processing the next instruction in a sequence, it should be significantly faster than a conventional CPU. Why is this not being touted as the most important feature of this processor?
This seems to be a good overview of clockless chips. I can't vouch for its accuracy (not my area), but the source - IEEE Computer Magazine - should be good. The article was published March 2005.
(warning: PDF)0 18.pdf
http://csdl2.computer.org/comp/mags/co/2005/03/r3
What did I miss? I remember the hype, the early diagrams of how it was all supposed to weave through without the need for a clock. Would someone care to elaborate on the post-mortem of what was supposed to be the first clockless processor, 4 years ago?
As others pointed out, you've made mistakes.
The most glaring is that you assume that synchronous processors can only have one clock - that's incorrect. While the clock tick is of fixed length (by design), the global clock (as seen by external parties) may run at a different speed than internal clocks.
If the a path of logic takes 5ns to complete, and its clock matches exactly, then you are perfectly optimized. You are hampered not by the clock, but by the transistor's switching speed. This path will have the same delay, regardless of whether it is driven by a clock.
You might be getting confused because you are thinking about pipelining, where the longest stage dictates stage length. If everything is driven by one clock, you create waste because some partitions will finish sooner than others, and are therefore idle. However, modern designs now employ shew-tolerant clocking. By using multiple clocks, the issues created by clock skew can be entirely avoided. The walls between pipeline stages are destroyed and skewing delay negated.
Your issue with propogation delay of the clock is also not of great concern, in most cases. Synchronous chips can employ distributed clocks, islands of asynchronous logic, and the Pentium-4 actually has stages to help propogate the clock. However most processors are unlike the speed demon design of the P4 and clock speed is limitted by other issues than clock propogation. Currently, that limitting factor is power. In dynamic logic, frequency has a direct relationship to power consumption.
"Open Source?" - Press any key to continue
Unfortunately, self-clocked design (like the reported ARM uses) is also sometimes called "asynchronous" logic design; however, this is a completely different kind of thing than the "asynchronous" combinatorial logic used in clock-based design. Self-clocked design also does combinatorial logic in latched stages, but uses a self-timed asynchronous protocol to run the latches instead of a synchronous clock. Basically, the combinatorial logic figures out when it's finished, and tells both the next stage ("data's ready, latch it") and the input latch from the previous stage ("I'm done; gimme some more data").
To close the loop, each stage can wait until there's new data ready at its inputs, and space to put the output data. Thus, in absence of some bottleneck, your chip will simply run as fast as it can.
To overclock a self-timed design, you simply increase the voltage. No need to screw around with clock multipliers; as long as your oxide holds up, your traces don't migrate, and the chip doesn't melt...
Thanks for looking at this with a realistic perspective. There is a reason that the article said these chips would be used in deeply embedded or automotive situations. In these situations, low power consumption granted by an asynchronous design is great. Not so great, however, is the overall performance. Part of the reason for clocking something (for example synchronous busses) is to avoid the excessive need for handshaking algorithms. Extending the handshaking methodology to multiple pipeline stages seems somewhat self-defeating. How might one effectively design an asynchronous pipeline with the same overall performance of a synchronous pipeline? How can you handle register bypassing or interlocks in a general case without some synchronization happening (as the different paths among different stages will inevitably introduce different time delays) yet also without adding a handshake to every stage? I guess my point is that I wouldn't expect this to find wide acceptance in scenarios in which reasonably high performance (where pipelining is a key factor) is needed. Seems great for embedded applications, poor for games.
"Is not a sentence" is not a sentence. Well damn.
> Security: Async designs give security against side channel power analysis attacks
You're right about that. I research side channel attacks on crypto hardware, and my first response to this was --- well, this would make EM analysis more complicated. For those not familiar with the general approach, in side channel attacks you don't try to do anything as complicated as breaking the underlying math of the crypto. Instead you observe the hardware for emissions that can give some clues as the instructions being carried out. If your observations help give you any info about what the chip is processing, you might learn parts of keys or gain a statistical advantage in other attacks. So if it's harder to observe signals emitted (electromagnetically from the chip, then attacking the hardware is harder.
ARM was never like that. Unlike their parent company, Acorn, it was both a company of brilliant engineers and was always highly profitable. In later days, Acorn's share in ARM was all that kept it from going under.
I don't like trolls and mod against me if you like, but I'd prefer if you'd reply.
Most (all?) commodity motherboards are completely synchronous. In fact, even the buses running at different speeds are actually clocked at rational fractions of the One True System Clock. (Letting them run at different clocks would require extra latency for the synchronization stages, to keep metastability from eating the data alive.)
Sun has clockless chips up and running (real silicon, not sims) and they have done some interesting things, but they don't have a complete system that's ready to ship. And there are other components out there that use the clockless philosophy to do certain things, but they're not CPUs in any sense. To give credit where credit is due, as the parent post points out, ARM beat Sun out the door with a clockless CPU that is a drop-in replacement (to some degree, anyway -- not clear how much) for an existing, established architecture. But that wasn't/isn't Suns goal (although perhaps it should be...). They're pushing in new directions, not using this to reimplement current architectures.
Am I part of the core demographic for Swedish Fish?
ARM have been working on asynchronous designs for a long, long time. I recall reading about their plans for an asynchronous core in Acron User magazine, back when there actually were Acorn users. This was back when the ARM6 was new and shiny, and the asynchronous part was expected to be released as the ARM8.
I am TheRaven on Soylent News
It's NetBSD that requires gcc and a paged MMU.
Linux is more portable. Linux runs on the original 68000. Linux was just ported to the Blackfin DSP. There seem to be about a dozen crappy little no-MMU processors that can run Linux.
Linux requires a gcc-like compiler, but not necessarily gcc. IBM and Intel have both produced non-gcc compilers that are able to compile Linux.