Slashdot Mirror


Philips, ARM Collaborate On Asynchronous CPU

Sean D. Solle writes "While not an actual off-the-shelf chip, Philips and ARM have announced a clockless ARM core using what they call "Handshake Technology." Read on for more about just what that means; according to this article, the asynchronous ARM chip has yet to be developed, but the same Philips subsidiary has applied similar technology to other microprocessors.

Sean D. Solle continues "Back in the early 1990s there was a lot of excitement (well, Acorn users got excited) about Prof. Steve Furber's asynchronous ARM research project, "Amulet". The idea is to let the CPU's component blocks run at their own rate, synchronising with each other only when needed. Like a normal RISC processor, one instruction typically takes one cycle; but in a clockless ARM, a cycle can take a different amount of time for different classes of instructions.

For example, a MOV instruction could finish before (and hence consume less power than) an ADD, even though they both execute in a single cycle. As well as energy-efficiency, running at effectively random frequencies reduces a chip's RFI emissions - handy if it's living in a cellphone or other wireless device."
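A minimal sketch of the timing argument above, assuming made-up per-instruction latencies (the numbers and the `LATENCY_NS` table are purely illustrative, not figures from Philips or ARM). It compares a clocked core, where every cycle lasts as long as the slowest instruction class, with a self-timed core, where each instruction finishes when its own logic settles. Handshake overhead is ignored.

```python
# Hypothetical latencies in nanoseconds; illustrative values only.
LATENCY_NS = {"MOV": 1.0, "ADD": 2.0, "MUL": 4.0}


def clocked_time(program):
    """Fixed clock: every instruction occupies one period,
    and the period must cover the slowest instruction class."""
    period = max(LATENCY_NS.values())
    return period * len(program)


def self_timed_time(program):
    """Clockless: each instruction takes only its own latency."""
    return sum(LATENCY_NS[op] for op in program)


program = ["MOV", "ADD", "MOV", "MUL", "MOV"]
print("clocked   :", clocked_time(program), "ns")
print("self-timed:", self_timed_time(program), "ns")
```

With these assumed numbers the self-timed total is smaller because the three MOVs no longer wait out a MUL-sized cycle; a real design would pay some of that back in handshaking.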

19 of 163 comments

  1. Interesting implications by luvirini · · Score: 3, Interesting

    If we see the same thing applied to non-ARM architectures, there are many strange things going to happen, as quite a lot in current computers is based on the assumption that things have specific clock rates. Obviously things might get very interesting...

  2. Encouraging technology, but useful soon? by Dancin_Santa · · Score: 3, Interesting

    The benefit of today's high-functionality embedded operating systems like Linux, Symbian, iTron, and Windows CE is that they implement preemptive task switching. At any time, the clock interrupt may fire and the operating system will then queue up the next thread into the CPU.

    Nowadays, the whole CPU is not powered at any one time. If an instruction does not access certain parts of the chip, they are dark. Now this does not hold for some predictive processors which may be processing not-yet-accessed instructions, but in general if an instruction is not using some part of the chip, that part of the chip does not require juice.

    Taking out the clock and relying on the chip parts to fire and return means that each application in the system must return to the OS at some point to allow the OS a chance to queue up the next thread. Without the clock interrupt, the OS is at the mercy of the program, back to the bad old days of cooperative multitasking.

    The clock is what tells the OS that it is time to give a time slice to another thread. If we say "OK, well we'll just stick a clock in there to fire an interrupt every x microseconds," then what have we accomplished? We are back at square one with a CPU controlled by a clock. No gain.

    This kind of system would work in a dedicated embedded system which did not require a complex multitasking operating system. Industrial solutions for factories, car parts, HVACs, and other things that need reliability but don't really do that much feature-wise seem to be prime candidates for this technology. "Smart" devices? Not so much.

  3. Re:Intel were first... by kf6auf · · Score: 4, Interesting

    So the question is WHY didn't it make it out of the lab? Did it cost too much to produce? That's the only real possibility I can think of - I don't think Intel's Marketing Division had absolute power over the company in 1997 to push the MHz agenda.

  4. Quite impressive... by Goalie_Ca · · Score: 2, Interesting

    The complexity of such a core must be astounding. For all you non-EEs out there, a chip is full of little memory cells called flip-flops. At the end of each circuit rests a flip-flop in which, normally, the rising edge of the clock stores the results of that circuit so it can pass that data on and start new work without losing it. Everything is synchronized to the clock. This is definitely over-simplified, but that's essentially why a circuit has a clock.
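    A rough behavioural sketch of the flip-flop described above, written as a software model rather than real hardware. The class name `DFlipFlop` and its `tick` method are invented for illustration; the only point is that the stored value changes on a rising clock edge and is held otherwise.

```python
class DFlipFlop:
    """Toy model of a rising-edge-triggered D flip-flop."""

    def __init__(self):
        self.q = 0            # stored output
        self._prev_clk = 0    # last clock level seen

    def tick(self, clk, d):
        # Capture the data input only on a 0 -> 1 clock transition.
        if self._prev_clk == 0 and clk == 1:
            self.q = d
        self._prev_clk = clk
        return self.q


ff = DFlipFlop()
trace = [ff.tick(clk, d) for clk, d in [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]]
print(trace)
```

    A clockless design has to replace that shared rising edge with some locally generated "data is ready" event.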

    To eliminate clocks you would need new circuitry such as arbiters and some sort of completion logic which could be used to trigger a flip-flop. To break a slashdot law, I haven't done any reading on any modern techniques, so would someone enlighten me on some design issues involving simple tasks such as accessing a register file, or making a memory read? Surely a bus would still maintain a clock.

    --
    Go canucks, habs, and sens!
    1. Re:Quite impressive... by rahard · · Score: 2, Interesting
      To eliminate clocks you would need new circuitry such as arbiters and some sort of completion logic which could be used to trigger a flip-flop.... enlighten me on some design issues involving simple tasks such as accessing a register file, or making a memory read.

      if you remember your digital design, there's the asynchronous counter. basically, it involves handshaking, just like handshaking at the protocol level but at a lower level. yes, there's the arbiter, the Muller C-element (rendezvous), and other nifty components.
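      For anyone who has not met the Muller C-element, here is a small behavioural model (a sketch only; the `CElement` class and `update` method are names made up for this example). Its output follows the inputs when they agree and holds its previous value when they disagree, which is what makes it useful as a rendezvous between two handshaking stages.

```python
class CElement:
    """Toy model of a two-input Muller C-element."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        # Both inputs equal: output takes that value.
        # Inputs differ: output keeps its previous state.
        if a == b:
            self.out = a
        return self.out


c = CElement()
history = [c.update(a, b) for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]]
print(history)
```

      The output rises only once both inputs have risen and falls only once both have fallen, so neither side can run ahead of the other.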

      the most novel approach, IMHO, would be Ivan Sutherland's micropipeline, which could be extended into the Counterflow Pipeline Processor (CfPP). Here is his Turing Award paper on micropipelines. (very good and readable paper!)

      Other keywords include "self-timed". I believe there's somebody @ SFU Computer Science who did asynchronous design. I forgot the name. (sigh)

  5. Re:Intel WAS first by Anonymous Coward · · Score: 0, Interesting

    In English, Intel is not singular. It is a composite entity, made up of many people, and as such should be considered plural when choosing the verb conjugation. Many Americans make this mistake.

  6. ARM Business Model by joelethan · · Score: 3, Interesting
    I'm interested because ARM's business model usually involves licensing their chip designs. ARM CPUs are widespread in cell phones etc. They have their own market and application area away from Wintel, PowerPC etc.

    Also, anything that might boost my pitiful ARM shares value is most welcome! Why?... Why did I believe the hype?

    /joelethan

  7. Re:way more elegant by renoX · · Score: 4, Interesting

    Agreed that clockless cores have little chance of becoming mainstream, but still they have a better chance of being used now than before.

    Let me explain: before, the "easy" way to reduce power consumption was to use a process which created smaller transistors, but smaller doesn't mean 'reduced power consumption' anymore.
    So clockless CPUs become more interesting now.

  8. standardised implementation. by Anonymous Coward · · Score: 2, Interesting

    When I was at university we studied standard ways to overcome these problems down to gate and transistor level. The impression I was given is that it's not that hard. Some of the standard ways include doing a calculation twice and waiting for the results to be the same. Of course, there are no tools that I am aware of to do it in an automated fashion, as there are for synchronous design, where the tools are extensive, from VHDL to Handel-C implementation. Even dynamic logic can be synthesised automatically now. This means it's too risky for ASIC design houses to implement on their own, as they would have to do their own custom blocks. This also presents a design flow problem: how would a foundry verify the design for you? In the case of synchronous design the methodology is well established.

    my 2 cents..

  9. Re:Philips growing into a Major R&D company by dtmos · · Score: 3, Interesting

    Philips has been a world-class R&D company for a long time. Philips Research was established in 1914, and has contributed much, from the invention of the pentode vacuum tube (valve) by Tellegen in 1929 to the audio cassette in the 1960s and their more modern work developing CDs and DVDs.

    The fire has been lit under IBM and other corporate research organizations for a long time.

  10. Way Back When by opos · · Score: 5, Interesting

    A long long time ago (1970s) Charlie Molnar, designer of the Linc tape (the Linc computer was an NIH funded (late 1960s) minicomputer that evolved into the PDP 8 and pushed DEC into the minicomputer business) explored asynchronous computing. Along the way they discovered synchronizer failure - i.e. the inability to reliably synchronize asynchronous subsystems - see Chaney, T.J. and Molnar, C.E. 1973. Anomalous behavior of synchronizer and arbiter circuits. IEEE Trans. Comp. pages 421-422. The bottom line is that it is physically impossible to guarantee that the data setup requirements (the minimum time the data must be asserted before it can be reliably clocked into the flip flop) of a flip flop can be met when the clock is asserted by one async component and the data are asserted by another async component. To my knowledge, this fundamental limitation has never been overcome.
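    To put a number on "cannot be guaranteed", a commonly used textbook model for synchronizer reliability is MTBF = exp(t_r / tau) / (T_w * f_clk * f_data). This is offered as a sketch of that standard model, not as something taken from the Chaney and Molnar paper, and every parameter value below is an assumption chosen only for illustration.

```python
import math


def synchronizer_mtbf(resolve_time_s, tau_s, window_s, f_clk_hz, f_data_hz):
    """Mean time between synchronizer failures, in seconds.

    resolve_time_s: time allowed for a metastable output to settle
    tau_s:          regeneration time constant of the flip-flop
    window_s:       width of the vulnerable timing window
    f_clk_hz:       sampling clock frequency
    f_data_hz:      rate of asynchronous data transitions
    """
    return math.exp(resolve_time_s / tau_s) / (window_s * f_clk_hz * f_data_hz)


# Assumed, illustrative values: 50 ps tau, 100 ps window, 100 MHz clock, 1 MHz data.
for t_r in (1e-9, 2e-9, 5e-9):
    mtbf = synchronizer_mtbf(t_r, 50e-12, 100e-12, 100e6, 1e6)
    print(f"resolve time {t_r * 1e9:.0f} ns -> MTBF {mtbf:.3e} s")
```

    Under this model, waiting longer makes failures exponentially rarer, but the MTBF stays finite for any finite wait, which matches the point that the failure cannot be eliminated outright.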

    1. Re:Way Back When by BarryNorton · · Score: 2, Interesting

      A good review, as well as the state of the art, afaik, in showing how much we can formally say about what can be achieved practically is Ian Mitchell's MSc thesis (1996, British Columbia) 'Proving Newtonian Arbiters Correct, Almost Surely' (which is an answer to Mendler and Stroup's 'Newtonian Arbiters Cannot Be Proven Correct', paper versions of both being available from the proceedings of Designing Correct Circuits, in 1992 and 1996)

  11. Interesting... by dkf · · Score: 4, Interesting

    It looks like Philips (through their tame spin-off Handshake Solutions) are letting the world see Tangram again (or something very like it.) Back in around 1994/1995 the Amulet team (already mentioned accurately by others) were looking into using the Tangram language to develop their asynchronous microprocessor technology - it was a fairly neat solution that did most of the things we wanted, though there were a few things it was crap at at the time - but then Philips decided to cut us off. It would be entirely fair to say that this was very annoying! Now it looks like they're letting the cat get its whiskers out of the bag again.

    FWIW, ARM have probably known (at least informally and at a level not much deeper than your average slashdot article) a large fraction of what Philips have been up to in this area for at least a decade.

    --
    "Little does he know, but there is no 'I' in 'Idiot'!"
  12. Re:Such a processor already exists by Anonymous Coward · · Score: 2, Interesting

    Yes, indeed they did. The article doesn't mention any collaboration between the teams, which seems strange because:

    1) ARM like to licence CPU core design IP, as mentioned in a later thread.

    2) One of the major upsides of asynchronous CPU design (said Prof. Furber on the Manc. Uni course) is that because the subcomponents of the CPU aren't nearly so tied to temperature, voltage and clock speed requirements (which directly affect flip-flop "set up" and "hold" time), the intellectual property invested in creating such a chip is far more reusable than any synchronous design.

    So if this is (as I infer from the article) a clean-room implementation separate to the AMULET group's work, it's totally contrary to the ARM licensing model and duplicates a lot of effort. Which seems a shame.

  13. Re:I had an idea once by ajs318 · · Score: 2, Interesting

    That's another reason to scatter the delaying gates throughout the core, and use enough of them. You have to hope that you don't get too many instances of a logic element and one of its associated delaying gates falling on the opposite sides of a process variation boundary. Especially where the effects favour faster propagation in the delaying gate. So, my intention was to aim for the clock delay being slightly but definitely longer than, and not exactly equal to, the logic delay. It would still respond to dynamic effects like temperature better than an external clock oscillator.

    This would also be one of those kinds of circuits that, if it's not built on the same silicon substrate, won't work at all. Power op-amps are another good example: they rely on better thermal coupling than you can achieve with discrete components, and better properties-matching than you can achieve by just pulling transistors out of a bag at random without doing any tests on them. {You can't control the absolute values of most on-chip components precisely, but you can be fairly sure of the relative similarities between them}. Build one out of carefully-gain-matched transistors exactly according to the schematic in the data book, and it might just about work if you put it in a constant-temperature oven. In the best case it will distort like hell, and in the worst case it will go literally into meltdown.

    --
    Je fume. Tu fumes. Nous fûmes!
  14. The WIZ Processor by MarcoPon · · Score: 4, Interesting
    Take a look at The WIZ Processor, by Steve Bush.
    It's a drastic departure from common CPUs. Definitely interesting.

    Bye!

    --

    SeqBox
    1. Re:The WIZ Processor by MarcoPon · · Score: 2, Interesting
      Also this thread on MASM forum could be of interest. It was started by Steve Bush himself, and there are a lot of discussions & examples from the point of view of the ASM programmer (but not only that):
      The WIZ - a new and radical processor architecture

      P.S. I'm not associated with Mr. Bush in any way; I simply like this kind of thing.

      Bye!

      --

      SeqBox
  15. The 68000 had async operation with /dtack pin by Anonymous Coward · · Score: 2, Interesting
    This was the purpose of the /dtack pin. This was used to acknowledge a transfer when operating async, or you could just ground it and run things at cpu self clocking sync. So how is this new again?



  16. Re:I had an idea once by TonyJohn · · Score: 2, Interesting
    You stated that the clock period (and therefore the length of the ring oscillator) should be about the same length as the critical path through the design. This is likely to be significantly less than 50 gates, and therefore your oscillator will only have 25 inverters. In a design with a million gates or more, this is not really enough to monitor the process and temperature variation across the die (which is surprisingly significant). If you could get enough gates into the ring (use NAND gates?), then they will start consuming significant area, and therefore slow the chip down.
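    The 50-gates-to-25-inverters arithmetic above follows from the usual first-order model of an inverter ring: an edge has to travel around the ring twice per period, so period = 2 * N * t_gate. A small sketch, assuming identical stage delays; the function name and the 20 ps gate delay are invented for the example.

```python
def ring_oscillator_period(n_inverters, gate_delay_s):
    """First-order period of a ring of identical inverters.

    An edge must circulate twice (once rising, once falling)
    to complete one full period.
    """
    if n_inverters % 2 == 0:
        raise ValueError("a simple inverter ring needs an odd number of stages")
    return 2 * n_inverters * gate_delay_s


gate_delay = 20e-12  # assumed 20 ps per gate, illustrative only
period = ring_oscillator_period(25, gate_delay)
print(f"25 inverters -> period of {period / gate_delay:.0f} gate delays "
      f"({period * 1e9:.2f} ns)")
```

    So a ring sized to match a 50-gate critical path contains only 25 stages, which is the small sample of the die being pointed out here.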

    The idea is good and the physics is sound, but putting something like this into practice is much harder than you make out. Speed binning of chips goes part way to adjusting for process variation. Sophisticated chips have temperature monitors that will scale back the clock when things get too hot (but in a crude, broad-brush way). ARM is already working on more fine-grained closed-loop systems (see here), but as a way of saving power rather than going faster, and with an indirect link between chip speed and clock.

    --
    Owl tried to think of something wise to say, but couldn't.