Philips, ARM Collaborate On Asynchronous CPU

Posted by timothy on Monday November 1, 2004 @08:40PM from the it's-always-either-late-or-early dept.

Sean D. Solle writes "While not an actual off-the-shelf chip, Philips and ARM have announced a clockless ARM core using what they call "Handshake Technology." Read on for more about just what that means; according to this article, the asynchronous ARM chip has yet to be developed, but the same Philips subsidiary has applied similar technology to other microprocessors.

Sean D. Solle continues "Back in the early 1990's there was a lot of excitement (well, Acorn users got excited) about Prof. Steve Furber's asynchronous ARM research project, "Amulet". The idea is to let the CPU's component blocks run at their own rate, synchronising with each other only when needed. Like a normal RISC processor, one instruction typically takes one clock cycle; but in a clockless ARM, a cycle can take less time for different classes of instructions.

For example, a MOV instruction could finish before (and hence consume less power than) an ADD, even though they both execute in a single cycle. As well as energy-efficiency, running at effectively random frequencies reduces a chip's RFI emissions - handy if it's living in a cellphone or other wireless device."
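
The MOV-versus-ADD point can be illustrated with a toy timing model. This is only a sketch: the latency figures and the function names are made up for illustration and do not come from Philips, ARM or the Amulet project. A clocked core sizes every cycle for the slowest instruction class, while a self-timed core lets each instruction finish when its own logic has settled.

```python
# Hypothetical settling times per instruction class, in nanoseconds.
# These numbers are illustrative assumptions, not measurements.
LATENCY_NS = {"MOV": 1.0, "ADD": 2.0, "LDR": 3.0}


def clocked_time_ns(program):
    """Clocked core: every instruction takes one cycle sized for the slowest class."""
    cycle_ns = max(LATENCY_NS.values())
    return cycle_ns * len(program)


def self_timed_time_ns(program):
    """Self-timed core: each instruction completes as soon as its own logic settles."""
    return sum(LATENCY_NS[op] for op in program)


program = ["MOV", "ADD", "MOV", "LDR", "MOV"]
print(clocked_time_ns(program))     # 15.0
print(self_timed_time_ns(program))  # 8.0
```

The model ignores the cost of the handshaking itself, which a real clockless design would have to pay between blocks.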

Such a processor already exists by philj · 2004-11-01 20:45 · Score: 5, Informative

See here. Developed by Steve Furber and his team at The University Of Manchester
Interesting implications by luvirini · 2004-11-01 20:45 · Score: 3, Interesting

If we see the same thing applied to non-ARM architectures, many strange things are going to happen, as quite a lot in current computers is based on the assumption that components run at specific clock rates. Obviously things might get very interesting...
Re:Intel were first... by h0tblack · 2004-11-01 20:47 · Score: 3, Informative

Read the story.. there were ARM based asynchronous chips in the lab (AMULET) a long time before 97.
way more elegant by fizze · 2004-11-01 20:47 · Score: 5, Informative

The very first drafts of microprocessors were clockless.
It was just that performance could be achieved more easily with higher clock speeds, i.e. by brute force.
The problems which could not be solved back then were the obvious synchronisation issues. Setting up a common clock seemed the only way to resolve them.

The idea behind clockless designs is less a "back-to-the-roots" idea than a step to gain the advantages of such a design, which are, amongst others:

Reduced Power Consumption
Higher Operation Speed

Moreover, highly sophisticated compilers could tune program code to match a given performance/power ratio.

Yet I would not bet on clockless cores becoming the new mainstream, far from it. Clockless cores will most likely be aimed at embedded designs and at low- and ultra-low-power applications.

--
Powerful is he who overpowers his temptations.
1. Re:way more elegant by renoX · 2004-11-01 21:27 · Score: 4, Interesting
  
  Agreed that clockless cores have little chance of becoming mainstream, but they still have a better chance of being used now than before.
  
  Let me explain: before, the "easy" way to reduce power consumption was to use a process which created smaller transistors, but smaller doesn't mean 'reduced power consumption' anymore.
  So clockless CPUs become more interesting now.
Encouraging technology, but useful soon? by Dancin_Santa · 2004-11-01 20:56 · Score: 3, Interesting

The benefit of today's high-functionality embedded operating systems like Linux, Symbian, iTron, and Windows CE is that they implement preemptive task switching. At any time, the clock interrupt may fire and the operating system will then queue up the next thread into the CPU.

Nowadays, the whole CPU is not powered at any one time. If an instruction does not access certain parts of the chip, they are dark. Now this does not hold for some predictive processors which may be processing not-yet-accessed instructions, but in general if an instruction is not using some part of the chip, that part of the chip does not require juice.

Taking out the clock and relying on the chip parts to fire and return means that each application in the system must return to the OS at some point to allow the OS a chance to queue up the next thread. Without the clock interrupt, the OS is at the mercy of the program, back to the bad old days of cooperative multitasking.

The clock is what tells the OS that it is time to give a time slice to another thread. If we say "OK, well we'll just stick a clock in there to fire an interrupt every x microseconds," then what have we accomplished? We are back at square one with a CPU controlled by a clock. No gain.

This kind of system would work in a dedicated embedded system which did not require a complex multitasking operating system. Industrial solutions for factories, car parts, HVACs, and other things that need reliability but don't really do that much feature-wise seem to be prime candidates for this technology. "Smart" devices? Not so much.
1. Re:Encouraging technology, but useful soon? by fizze · 2004-11-01 21:06 · Score: 4, Insightful
  
  Preemption is a "dirty hack" to achieve nice behaviour in a timely manner.
  For embedded systems where interrupt latency is the primary concern, other approaches have to be found. Also, if the CPU checks after every x instructions whether there is an interrupt to process, you get some margin on the timing behaviour.
  I am no embedded / safety critical developer, but I know that the fastest response times on interrupts and worst-case response times vary greatly depending solely on the (RT)OS used.
  
  --
  Powerful is he who overpowers his temptations.
2. Re:Encouraging technology, but useful soon? by Anonymous Coward · 2004-11-01 21:09 · Score: 5, Informative
  
  I think you are getting clock confused with ticker interrupt. A CPU clock is typically measured in nanoseconds. A ticker interrupt is typically measured in milliseconds. A clockless core will still need to field interrupts (for I/O) and very well can still field a ticker interrupt. -cdh
3. Re:Encouraging technology, but useful soon? by CaptainAlbert · 2004-11-01 21:14 · Score: 5, Informative
  
  You appear to be confusing the CPU's clock with a real-time clock interrupt. They are fundamentally not the same thing.
  
  The clock being dispensed with is the one that causes the registers inside the CPU to latch the new values that have been computed for them. At 3GHz, this happens every 333ps. The reason this clock exists is basically because it makes everything in a digital system much, much easier to think about, design, simulate, manufacture, test and re-use. But, it's not an absolute requirement that it be present, if you're clever. (Too clever by half, in fact.)
  
  The other clock, which you were referring to, fires off an interrupt with a period on the order of milliseconds, to facilitate time-slicing. If your application requires such a feature, you can have one, regardless of whether your CPU is synchronous or asynchronous internally. It's a completely separate issue.
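
To put the two clocks side by side, here is a small arithmetic sketch. The 3 GHz figure is the one quoted above; the 1 ms timer tick is an assumed example value ("on the order of milliseconds"), and the function names are made up for illustration.

```python
def clock_period_ps(freq_hz):
    """Period of a clock in picoseconds."""
    return 1e12 / freq_hz


def cycles_per_tick(cpu_hz, tick_hz):
    """How many CPU clock cycles fit into one timer-interrupt period."""
    return cpu_hz / tick_hz


CPU_HZ = 3e9    # the 3 GHz core clock used as the example above
TICK_HZ = 1000  # assumption: a 1 ms scheduler tick

print(round(clock_period_ps(CPU_HZ)))    # 333
print(cycles_per_tick(CPU_HZ, TICK_HZ))  # 3000000.0
```

With these numbers the scheduler tick is about six orders of magnitude slower than the core clock, which is why removing one says nothing about the other.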
  
  --
  These sigs are more interesting tha
4. Re:Encouraging technology, but useful soon? by KiloByte · 2004-11-01 21:32 · Score: 3, Informative
  We're talking about two different types of clocks:
  
  a timing source needed to preempt a long-running task
  
  the heart-beat that dictates when the CPU is going to do the next instruction.
  
  These two are completely different things. The former can have a pretty low resolution as well -- but is needed for other tasks as well. Any non-degenerate processor will need some kind of timing source, but there is no reason why it would be connected to the number of instructions executed.
  In a multitasking operating system, there are three reasons that can trigger a preemption:
  
  a hardware interrupt
  Some outside event has happened. A new bit/byte came in from a serial source, an IO transfer ended, etc, etc.
  
  resource needed
  The process requires some resource that is either held by another process or will require an IO.
  
  the time-slice has expired
  A timer interrupt is needed for this, but nothing bad will happen if its period is many orders of magnitude longer than the CPU core clock period would be. You don't preempt processes every handful of CPU cycles, do you?
  --
  The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
Re:Intel were first... by kf6auf · 2004-11-01 20:58 · Score: 4, Interesting

So the question is WHY didn't it make it out of the lab? Did it cost too much to produce? That's the only real possibility I can think of - I don't think Intel's Marketing Division had absolute power over the company in 1997 to push the MHz agenda.
Re:Intel were first... by jimicus · 2004-11-01 21:10 · Score: 4, Informative

No they weren't. From TFA:

The AMULET1 microprocessor is the first large scale asynchronous circuit produced by the APT group. It is an implementation of the ARM processor architecture using the Micropipeline design style. Work was begun at the end of 1990 and the design despatched for fabrication in February 1993. The primary intent was to demonstrate that an asynchronous microprocessor can offer a reduction in electrical power consumption over a synchronous design in the same role.
Not relevant by r6144 · 2004-11-01 21:15 · Score: 5, Informative

As far as I know, Linux and many other operating systems already use an external chip (the 8254 on the PC) for most timing tasks, including preemptive multitasking. For ultra-high precision timing, the CPU clock (the time stamp counter on an IA32 CPU) is used, but it is not all that essential. Last time I heard, since CPU frequencies can be changed by power management functions on some P4s, the counter is a bit tricky to use correctly for timing, so it is not used when not absolutely needed.
As for the power problem, all parts of the CPU are powered, except that gates that aren't switching consume less power (mostly leakage, which seems to be quite significant now). In synchronous circuits, at least the gates connected directly to the clock signal switch all the time, while in asynchronous circuits unused parts of the CPU can avoid switching altogether, so some power may be saved, but I don't know how much it will be.
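
A rough way to see the switching argument is the usual first-order approximation for CMOS dynamic power, P = activity × C × Vdd² × f. That formula is not given in the comment above, and every capacitance, voltage and frequency below is a made-up value chosen to keep the arithmetic simple. Leakage is deliberately left out, so this sketch says nothing about how large the real saving would be.

```python
def dynamic_power_w(activity, capacitance_f, vdd_v, freq_hz):
    """First-order CMOS switching power: P = activity * C * Vdd^2 * f."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


# Hypothetical values for one block of the chip (assumptions, not data).
VDD_V = 1.2
FREQ_HZ = 1e9
CLOCK_NET_F = 1e-9  # capacitance of the clock net feeding the block
LOGIC_F = 2e-9      # capacitance of the block's own logic


def block_power_w(logic_activity, clocked):
    """Switching power of one block; leakage is ignored in this sketch."""
    # In a clocked design the clock net toggles every cycle (activity 1.0),
    # even when the logic behind it is doing nothing.
    clock = dynamic_power_w(1.0, CLOCK_NET_F, VDD_V, FREQ_HZ) if clocked else 0.0
    logic = dynamic_power_w(logic_activity, LOGIC_F, VDD_V, FREQ_HZ)
    return clock + logic


print(f"idle block, clocked:    {block_power_w(0.0, clocked=True):.2f} W")
print(f"idle block, self-timed: {block_power_w(0.0, clocked=False):.2f} W")
```

Clock gating in a synchronous design attacks the same term by stopping the clock to idle blocks, so the comparison here is against an ungated clock only.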
ARM Business Model by joelethan · 2004-11-01 21:21 · Score: 3, Interesting

I'm interested because ARM's business model usually involves licensing their chip designs. ARM CPUs are widespread in cell phones etc. They have their own market and application area away from Wintel, PowerPC etc.
Also, anything that might boost my pitiful ARM shares value is most welcome! Why?... Why did I believe the hype?
/joelethan
ENIAC was first by chris_sawtell · 2004-11-01 21:24 · Score: 4, Funny

I hope they don't try to patent this.
Refer to 1944 for prior art.
Re:Philips growing into a Major R&D company by dtmos · 2004-11-01 22:52 · Score: 3, Interesting

Philips has been a world-class R&D company for a long time. Philips Research was established in 1914, and has contributed much, from the invention of the pentode vacuum tube (valve) by Tellegen in 1929 to the audio cassette in the 1960s and their more modern work developing CDs and DVDs.

The fire has been lit under IBM and other corporate research organizations for a long time.
I had an idea once by ajs318 · 2004-11-01 23:07 · Score: 4, Informative

The reason why a clock is commonly used in microprocessor circuits is to try to synchronise everything, because different logic elements take a different amount of time for the outputs to reach a stable state after the inputs change. This is known as "propagation delay" and is what ultimately limits the speed of a processor. With CMOS, you can actually reduce the propagation delay a little by increasing the supply voltage, but then your processor will be dissipating more power. {CMOS logic gates dissipate the most power when they are actually changing state, and almost no power at all while stable, whether they are sitting at 1 or 0. This is in contrast to TTL, which usually dissipates more power in a 0 state than in a 1 state, but there are some oddball devices that are the other way around}.

The clock is run at a speed that allows for the slowest propagation, with data being transferred in or out of the processor only on the rising or falling edges. This allows time for everything to get stable. It's also horrendously inefficient because propagation delays are actually variable, not fixed.

If you wire an odd number of NOT gates in series, you end up with an oscillator whose period is twice the sum of the propagation delays of all the gates. If you replace one of the NOT gates with a NAND or NOR gate, then you can stop or start the oscillator at will. Furthermore, by extra-cunning use of NAND/NOR and EOR gates, you can lengthen or shorten the delay in steps of a few gates. Obviously at least one of the gates should have a Schmitt trigger input to keep the edges nice and sharp; but that's just details.
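
The period rule in the paragraph above is easy to check numerically. In this sketch the gate delays are hypothetical values and the function name is made up; it only models the ideal case where the period is twice the sum of the delays.

```python
def ring_oscillator_period_ns(gate_delays_ns):
    """Period of an ideal ring of inverting gates: twice the total delay."""
    if len(gate_delays_ns) % 2 == 0:
        # An even number of inversions settles into a stable state
        # instead of oscillating.
        raise ValueError("a ring oscillator needs an odd number of inverting gates")
    return 2 * sum(gate_delays_ns)


delays_ns = [0.5, 0.5, 0.5]  # three hypothetical NOT gates, 0.5 ns each
period_ns = ring_oscillator_period_ns(delays_ns)
print(period_ns)                    # 3.0
print(round(1e3 / period_ns, 1))    # 333.3 (frequency in MHz)
```

Lengthening the chain by two gates lengthens the period by twice their combined delay, which is the "steps of a few gates" adjustment mentioned above.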

My idea was to scatter a bunch of NOT gates throughout the core of a processor, so as to get a propagation delay through the chain that is just longer than the slowest bit of logic. Any thermal effects that slow down or speed up the propagation will affect these gates as much as the processing logic. Now you use these NOT gates as the clock oscillator. If you want to try being clever, you could even include the ability to shorten the delay if you were not using certain "slow" sections such as the adder. This information would be available on an instruction-by-instruction basis, from the order field of the instruction word. The net result of all this fancy gatey trickery is that if the processor slows down, the clock slows down with it. It never gets too fast for the rest of the processor to keep up with. Most I/O operations can be buffered, using latches as a sort of electronic Oldham coupling; one end presents the data as it comes, the other takes it when it's ready to deal with it, and as long as the misalignment is not too great, it will work. For seriously time-critical I/O operations that can't be buffered, you can just stop the clock momentarily.

The longer I think about this, the deeper I regret abandoning it.

--
Je fume. Tu fumes. Nous fûmes!
1. Re:I had an idea once by chrysrobyn · 2004-11-02 00:26 · Score: 4, Informative
  
  My idea was to scatter a bunch of NOT gates throughout the core of a processor, so as to get a propagation delay through the chain that is just longer than the slowest bit of logic.
  
  I assume that you hope to use your self timed logic (as it's known in the industry) to avoid all the problems associated with clocked logic and provide an easy to use asynchronous solution. Please do not forget manufacturing tolerances and that you have to make your self-timed logic 99.99999% certain slower than the slowest asynchronous path. This means that you have to qualify your entire logic library with a specific technology, then guardband it to make sure that when manufacturing shifts due to reasons you cannot explain, your chip still works. For this reason, in my experience, self timed logic has been slower than clocked logic for nominal cases and much slower in fast cases (in special cases, better than breaking even in slow process conditions).
  
  Self-timed logic of the kind you describe would likely still end up with latches to capture the result / launch into the next self-timed logic block. In this case, you're still paying the latch cycle time penalty for clocking your pipeline. You're still burning the power associated with the clock tree (although you are gating your clocks to only the active logic, known as "clock gating", an accepted practice), and you're additionally burning the power for each oscillator, which I suggest would likely be more than the local clock buffers in a traditional centrally PLL clocked chip.
  
  An ideal asynchronous chip would be able to not use latches to launch / capture and still be able to keep multiple instructions in flight -- using race conditions for good and not evil. This would involve a great deal of work beyond simply using inverters and schmitt triggers. This is a larger architecture question requiring a team of PhDs and people with equivalent professional experience.
Way Back When by opos · 2004-11-01 23:22 · Score: 5, Interesting

A long long time ago (1970s) Charlie Molnar, designer of the Linc tape (the Linc computer was an NIH funded (late 1960s) minicomputer that evolved into the PDP 8 and pushed DEC into the minicomputer business) explored asynchronous computing. Along the way they discovered synchronizer failure - i.e. the inability to reliably synchronize asynchronous subsystems - see Chaney, T.J. and Molnar, C.E. 1973. Anomalous behavior of synchronizer and arbiter circuits. IEEE Trans. Comp. pages 421-422. The bottom line is that it is physically impossible to guarantee that the data setup requirements (the minimum time the data must be asserted before it can be reliably clocked into the flip flop) of a flip flop can be met when the clock is asserted by one async component and the data are asserted by another async component. To my knowledge, this fundamental limitation has never been overcome.
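
Designers usually live with this limitation by making failures rare rather than impossible. A commonly used first-order model (not taken from the paper cited above) is MTBF = exp(t_r / τ) / (T_w × f_clk × f_data), where t_r is the time the flip flop is given to settle. The sketch below uses that model with entirely hypothetical device parameters; the names are illustrative.

```python
import math


def synchronizer_mtbf_s(resolve_time_s, tau_s, window_s, clock_hz, data_hz):
    """Mean time between synchronizer failures under the first-order model."""
    return math.exp(resolve_time_s / tau_s) / (window_s * clock_hz * data_hz)


# Hypothetical device and system parameters (assumptions for illustration).
TAU_S = 100e-12     # metastability time constant of the flip flop
WINDOW_S = 100e-12  # width of the vulnerable sampling window
CLOCK_HZ = 100e6    # sampling clock rate
DATA_HZ = 1e6       # rate of asynchronous data transitions

for resolve_ns in (1, 2, 4):
    mtbf = synchronizer_mtbf_s(resolve_ns * 1e-9, TAU_S, WINDOW_S, CLOCK_HZ, DATA_HZ)
    print(f"{resolve_ns} ns to settle -> MTBF about {mtbf:.3g} s")
```

Waiting longer improves the figure exponentially, but the result is always finite, which matches the point that the failure cannot be ruled out, only made improbable.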
Interesting... by dkf · 2004-11-01 23:30 · Score: 4, Interesting

It looks like Philips (through their tame spin-off Handshake Solutions) are letting the world see Tangram again (or something very like it.) Back in around 1994/1995 the Amulet team (already mentioned accurately by others) were looking into using the Tangram language to develop their asynchronous microprocessor technology - it was a fairly neat solution that did most of the things we wanted, though there were a few things it was crap at at the time - but then Philips decided to cut us off. It would be entirely fair to say that this was very annoying! Now it looks like they're letting the cat get its whiskers out of the bag again.

FWIW, ARM have probably known (at least informally and at a level not much deeper than your average slashdot article) a large fraction of what Philips have been up to in this area for at least a decade.

--
"Little does he know, but there is no 'I' in 'Idiot'!"
The WIZ Processor by MarcoPon · 2004-11-02 00:08 · Score: 4, Interesting

Take a look at The WIZ Processor, by Steve Bush.
It's a drastic departure from common CPUs. Definitely interesting.
Bye!

--

SeqBox