Slashdot Mirror


Self-Timed ARM Provides Low Power Consumption

hardcorebit writes: "The Amulet Group at the University of Manchester is working on a 'self-timed' or 'asynchronous logic' chip which uses the ARM architecture and instruction set. The benefits? Much lower power consumption, lower EMF emissions, and it works with everything written for the ARM. Their latest effort is 'broadly equivalent' to an ARM9. Anyone had a chance to get their hands on one of these beasts?"

30 of 83 comments

  1. Re: Avoiding branch prediction troubles by SpinyNorman · · Score: 2

    For anyone curious about that statement, the way ARM does it is by making ALL instructions conditional. Rather than branch for a small piece of conditional code, you can just scream right thru it!
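
    A rough way to picture that, as a toy Python model rather than real ARM code: the function below mimics the textbook four-instruction GCD loop (CMP / SUBGT / SUBLT / BNE). The function name and the flag variables are made up for illustration, and it assumes positive integer inputs.

```python
def gcd_predicated(a, b):
    """GCD of two positive integers, mirroring a branch-light ARM loop.

    Assembly being modelled (assumed, the usual textbook example):
        loop: CMP   r0, r1
              SUBGT r0, r0, r1
              SUBLT r1, r1, r0
              BNE   loop
    """
    r0, r1 = a, b
    while True:
        # CMP: set the condition flags once per iteration.
        gt, lt, ne = r0 > r1, r0 < r1, r0 != r1
        # SUBGT / SUBLT: both are always fetched, but each one only
        # writes its result when its condition matches the flags.
        r0 = r0 - r1 if gt else r0
        r1 = r1 - r0 if lt else r1
        # BNE: the only real branch, closing the loop.
        if not ne:
            return r0


print(gcd_predicated(48, 18))
```

    The two subtractions never need a branch around them, which is the point being made above: the short conditional bodies simply fall through when their condition is false.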

    I used to work for Acorn (the original "A" in ARM before it was changed). The guy doing the Amulet work at U. Manchester is Steve Furber, who was one of the original Acorn design engineers and original architects of the ARM.

  2. Re:Will this help me get a webpad faster? by Bob+Ince · · Score: 2
    Ever seen the Transmeta webpad? OOoooOOhhh I want one of those.

    Quite so. Kinda like the ARM-based NewsPAD of old, innit. Hope it's rather more successful.

    Has anyone ported linux to ARM?

    Yes.


    --
    This comment was brought to you by And Clover.
  3. Async doesn't fail if a line is slow! by jncook · · Score: 2

    Actually, tolerance of manufacturing variability is one of asynchronous design's strengths. Because each "chunk" of gates only performs its computation when all input data is available, it doesn't matter if a piece of data arrives early or late. The computation is data-driven. It runs as fast as the lines can switch. So it does not have to be "more conservative."

    Async designs require more silicon area for simple things, like data lines. Rather than having a single "high = 1, low = 0" data line + a clock line, our group used three wires. First wire high = 1, second wire high = 0, third wire high = downstream component got the data, reset please.
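
    The three-wire scheme can be sketched in a few lines of Python. This is only an illustration of the encoding described above, not the Caltech group's actual circuit or tools; the names `encode`, `decode` and `transfer` are invented, and the return-to-empty step between bits is an assumption about how the "reset please" wire is used.

```python
NULL = (0, 0)  # neither data wire high: nothing on the link


def encode(bit):
    """Return (one_wire, zero_wire) for a data bit."""
    return (1, 0) if bit else (0, 1)


def decode(rails):
    """Return 1 or 0 for valid data, None for the empty state."""
    if rails == (1, 0):
        return 1
    if rails == (0, 1):
        return 0
    if rails == NULL:
        return None
    raise ValueError("both data wires high is not a legal code")


def transfer(bits):
    """Send bits one at a time; return (received bits, wire trace).

    Each trace entry is (one_wire, zero_wire, ack_wire).
    """
    received, trace = [], []
    for bit in bits:
        rails, ack = encode(bit), 0
        trace.append((*rails, ack))      # sender raises exactly one data wire
        received.append(decode(rails))   # receiver sees valid data and latches it
        ack = 1
        trace.append((*rails, ack))      # receiver raises "got it, reset please"
        rails = NULL
        trace.append((*rails, ack))      # sender drops the data wire
        ack = 0
        trace.append((*rails, ack))      # receiver drops ack; ready for the next bit
    return received, trace


received, trace = transfer([1, 0, 1, 1])
print(received)
```

    No step depends on how long the previous one took, which is why a slow or fast wire changes the speed but not the result.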

    The coolest geek feature of async processors is that if you improve the transistor physics (e.g., put an ice cube or some liquid nitrogen on the processor) the instruction rate increases. Whee!

    James Cook
    ex-"cook@vlsi.caltech.edu"
    now james@cookmd.com

  4. Clarifying my question. by Christopher+Thomas · · Score: 2

    Secondly, I assume you understand the purpose of the "acknowledgement", which is essentially the "hey, I'm done with the previous result, I'm ready for the next set of inputs"? The acknowledgement along with the normal properties of CMOS prevents any "race condition" from occurring (I assume that is your fear?).

    My fear isn't a race condition; it's a spurious signal emitted from a previous output stage causing processing to begin before it should in the following stage, with invalid data. Spurious signals like this occur all of the time, and are called "glitches"; they result when multiple paths through a logic block have different lengths. The canonical solution is to ignore all outputs until enough time has passed for them to stabilize. Glitches can also be minimized by adding redundant logic terms.
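
    To make the glitch concrete, here is a small Python sketch using a standard textbook function, f = a·b + (not a)·c, which is not taken from the NCL papers. The unit delay on the inverter and the instantaneous AND/OR gates are simplifying assumptions, and `simulate` is a made-up name.

```python
def simulate(a_wave, b, c, consensus=False):
    """Evaluate f = a*b + (not a)*c over time with a one-step inverter delay.

    a_wave holds input a at each time step; b and c are held constant.
    """
    out = []
    prev_a = a_wave[0]          # inverter output starts out settled
    for a in a_wave:
        not_a = 1 - prev_a      # inverter lags input a by one step
        f = (a & b) | (not_a & c)
        if consensus:
            f |= b & c          # redundant term that covers the transition
        out.append(f)
        prev_a = a
    return out


falling_a = [1, 1, 0, 0, 0]
print(simulate(falling_a, 1, 1))                   # output dips for one step
print(simulate(falling_a, 1, 1, consensus=True))   # output stays high
```

    With b = c = 1 the output should stay at 1 while a falls, but the two paths through the logic have different lengths, so it briefly drops. The extra b·c term is the "redundant logic term" fix mentioned above.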

    Again, I'm not exactly following you here, but remember, we're not using traditional NAND, etc... gates in CMOS, but NCL gates (e.g., 3 of 5) and they are designed specifically for NCL and the acknowledgement flow.

    However, your NCL gates are still composed of transistors set up using CMOS logic rules (or any of a variety of dynamic schemes that accomplish the same thing). This winds up giving effects similar to those you would see with standard boolean logic circuits. As far as I can tell from the documentation, in actual implementation NCL isn't so much a departure from boolean logic as a layer of meta-logic on top of it that makes it self-clocking. The actual physical signal encoding on individual lines is boolean (the lines are just grouped in interesting ways).

    Thus, while the gates are self-clocked, they seem to be as vulnerable to glitching as any other combinational logic blocks.

    Information regarding "orphans" noted. It's interesting, but doesn't relate to my question.

    Again, I do *NOT* speak for Theseus Logic and there are much better individuals here who can clear up any questions. Feel free to fire off some questions to the address(es) on the web site.

    Noted; thanks for posting the link, btw. This is a very interesting approach to asynchronous circuitry.

  5. PalmOS too by XNormal · · Score: 2

    PalmOS is also moving to the ARM processor.

    This is interesting because a cellular phone takes so much power for the RF transmission that the CPU consumption is relatively negligible. I don't know about you, but I really like the fact that a Palm runs forever on a pair of AAAs. I don't like the rechargeable Palm V, Palm IIIc, PocketPCs, etc.

    --
    Stop worrying about the risks of nuclear power and start worrying about the risks of not using nuclear power.
    1. Re:PalmOS too by Troed · · Score: 2
      Check out a Psion device then - they're even better at it, with MUCH more functionality than a Palm, and with a 32 bit OS ...

      I'd suggest a Psion Revo! Not much larger than a Palm V - and with a keyboard.

  6. Re:Is ARM about to go the way of DEC? by Junks+Jerzey · · Score: 2

    The Alphas were not "head and shoulders above the 386." When they were first introduced, they were sucking up way, way more power and requiring much more cooling than an average Intel chip. Faster, yes, but at a price. Alphas were targeted at the "performance at all costs" CPU market, not something for the average desktop or laptop.

  7. Amulet demo by BrianW · · Score: 3

    I saw a demo of the Amulet a couple of years ago, when I was at Manchester University. They'd wired up one to a variable-voltage power supply, and a speaker.

    By putting it into a loop where it powered the speaker every couple of cycles, it generated a tone. By adjusting the voltage of the power supply, it was possible to make the tone higher or lower, as it was having a direct effect on the running speed of the processor.

    Also, when put into a 'halt' loop, it would power down until interrupted. An ammeter connected in series with it showed that it was using almost literally no power.

  8. I *LOVE* ARM. YMMV. by Bob+Ince · · Score: 2
    Our main project for the semester was to build a behavioral and structural model of a pipelined ARM7 processor.

    That does sound kind of harsh, but then I'd hate even more to have to do it for any other kind of modern chip architecture.

    The ARM instruction set is pretty clean, and dead dead easy to program even large projects in. Mind you, some of the newer ARMv4, Thumb instructions must be pretty hairy from an implementation POV, especially keeping backwards-compatibility with 26-bit addressing.

    Hang on, what's this story doing on /., anyways? The Amulet project has been going a long, long time and achieved ARM9-level performance some time ago, IIRC. Asynchronous chips are interesting but the power of mainstream (particularly x86) processors has kept increasing at such a rate no-one has yet needed to make the huge change of design strategy. I don't expect to see async chips in the mainstream until Moore's law is well and truly broken.


    --
    This comment was brought to you by And Clover.
  9. Re:I *HATE* ARM by Junks+Jerzey · · Score: 4

    Maybe you haven't been exposed to enough processor architectures? The ARM chips have the cleanest instruction set and overall architecture that I've seen, and that includes lots of hands-on experience with the PowerPC, x86, SHx, and MIPS chips. The ARM designers had some very good ideas for keeping instructions simple while getting a lot done and they had a novel way of avoiding the usual branch prediction troubles. Very slick.

  10. "Kicking out the Clock" by Amulet's lead, Furber by BitMan · · Score: 4

    Amulet's lead, Steve Furber (who also designed the original ARM), wrote a recent editorial cover story called "Kicking out the Clock" in the May 2000 edition of Integrated System Design (ISD) magazine.

    In the article, he used an example of a "dual-rail" logic (as opposed to the "single-rail" found in most boolean designs) called Null Convention Logic (NCL) from Theseus Logic. Theseus' NCL approach goes a long way toward solving not only the power and noise problems (like most asynchronous approaches), but also the greater problem of design reuse (a problem with both async and, especially, synchronous designs) -- the latter is something Furber was quoted on in a past EE Times article (cannot seem to find it on-line anymore?).

    Timing verification is becoming increasingly difficult in IC design, adding ridiculous amounts of extra effort and, in some cases, causing complete design failures (e.g., AMD, IBM and Intel have all had timing-related design failures). Clocks may soon disappear in favor of async designs, especially those like Theseus Logic's nearly-100% delay INsensitive NCL technology. NCL's delay INsensitive nature comes from the fact that it is NOT boolean logic based, but a new method that breaks from the traditional foundation of what boolean logic was designed for: mathematicians, not computers.

    In addition to an "operand" and an "operator," as with traditional, human-based math, computers require a third "control" line. In synch/boolean, this is the clock. With the limitations of the speed of light, it is IMPOSSIBLE for one section of a 10M+ transistor IC to be timed synchronously with another. As such, most modern ICs have localized clocks, which further adds to design complexity.

    NCL removes the clock as the control (as with most async) *BUT* it places the control back in the data flow lines themselves! NCL is a 3-state logic of "true" and "false", plus the control which is derived from NCL math to be "null" (no data). This representation is 2NCL in NCL math (see Theseus' site for more details on NCL including 4NCL and 3NCL, the latter being used with most off-the-shelf tools and optimizers). In 2NCL, the lines (again, "dual-rail") put the false value (0) on one line and true (1) on the other line *IF* voltage is present; otherwise, no voltage (or low) results in the state of "null" (again, no data). Acknowledgements are used to maintain a delay INsensitive combinational logic circuit, including the fact that NCL can be placed alongside synch/boolean and maintain 100% data flow and integrity (again, totally delay INsensitive). So instead of data having to "wait" on a clock to move forward, data moves forward when it arrives! This further increases performance!
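
    For readers who want something concrete, below is a minimal Python sketch of an m-of-n threshold gate with hysteresis, which is how NCL gates are usually described in public material. It is an assumed behavioural model for illustration only, not Theseus' implementation; the class name `ThresholdGate` is made up.

```python
class ThresholdGate:
    """m-of-n gate with hysteresis (assumed behavioural model of an NCL gate)."""

    def __init__(self, m, n):
        self.m, self.n, self.out = m, n, 0

    def update(self, inputs):
        if len(inputs) != self.n:
            raise ValueError("expected %d inputs" % self.n)
        high = sum(inputs)
        if high >= self.m:
            self.out = 1    # enough inputs carry data: output asserts
        elif high == 0:
            self.out = 0    # every input is back to null: output resets
        # anything in between holds the previous output (hysteresis)
        return self.out


gate = ThresholdGate(3, 5)
steps = [
    [1, 1, 0, 0, 0],   # only two inputs high: stays null
    [1, 1, 1, 0, 0],   # threshold reached: output asserts
    [1, 0, 0, 0, 0],   # partial withdrawal: output holds
    [0, 0, 0, 0, 0],   # all null: output resets
]
print([gate.update(s) for s in steps])
```

    The hold state is what lets the output change only on a complete set of data or a complete return to null, independent of when the individual inputs arrive.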

    Although Theseus' NCL technology is NOT boolean based, it works with off-the-shelf synch/boolean IC design tools (unlike attempts like Cogency's), it is still CMOS-based, and it is not too difficult for an engineer to learn coming from the synch/boolean world.

    [Bias: I am an employee of Theseus Logic and know Mr. Furber, the Amulet lead. I am NOT an engineering lead, just a regular engineer (who seconds as the sysadmin ;-).]

    -- Bryan "TheBS" Smith

    --
    -- Bryan "TheBS" Smith
    Independent Author, Consultant and Trainer
  11. Bit Of Background info... by beebware · · Score: 3
    The A(R)mulet has been in development for a few years now (as readers of Acorn User would be aware).

    Its processor core is based on the ARM9 series, but since it is asynchronous (i.e., it hasn't got 'clock cycles' like normal synchronous processors) it should go very very fast (simple processes will rush through without being delayed by slightly harder/longer processes).

    While I haven't had a chance to get my hands on one of these yet, the specs I've seen (I can't remember if they are public or not) look good and the chips should be compatible with current ARM chips - as used in my RISC PC (BTW a RISC PC is used to run the 'Who Wants to Be A Millionaire' shows!).

    It is difficult to place an exact MHz rating on these chips due to the way they work, but the current version (AMULET3i) runs at roughly 120MHz - but they have started from the basics, without using much 'proven technology', so expect development to last a few more years - but the 120MHz version should be out next month/late this month.


    Richy C.
    --
  12. fixing asynchronous logic's drawbacks by TheDullBlade · · Score: 2

    These guys have an interesting way to deal with it.

    They describe a way to build asynchronous circuits (using the same design even for different fabs) that run as quickly as the gate/wire delays allow. It takes more surface elements to build the same logic, but once you take removal of the clock lines into consideration, things look a lot closer.

    IMHO, the real beauty of async designs is that your bit shifter op can take 1 nanosecond, your add op can take 3 nanoseconds, and your subtract op can take 4 nanoseconds, rather than having them each take a 4 nanosecond cycle. It really disturbs me to see designs where a multiplication (inherently slower by a minimum factor of lg(bits)) takes the same amount of time as an addition.
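
    The arithmetic behind that can be sketched in Python. The per-op delays are the ones given above; the instruction mix is hypothetical, and handshake overhead between stages is ignored (an optimistic assumption).

```python
LATENCY_NS = {"shift": 1, "add": 3, "sub": 4}  # per-op delays from the comment


def clocked_time(ops):
    """Every op takes one cycle sized for the slowest op."""
    return len(ops) * max(LATENCY_NS.values())


def self_timed_time(ops):
    """Every op takes only as long as it actually needs."""
    return sum(LATENCY_NS[op] for op in ops)


mix = ["shift", "add", "add", "sub", "shift"]  # hypothetical instruction mix
print(clocked_time(mix), self_timed_time(mix))
```

    For this particular mix that is 20 ns clocked versus 12 ns self-timed; the gain shrinks as the mix gets closer to all-worst-case ops.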

    --
    /.
    1. Re:fixing asynchronous logic's drawbacks by mOdQuArK! · · Score: 2
      IMHO, the real beauty of async designs is that your bit shifter op can take 1 nanosecond, your add op can take 3 nanoseconds, and your subtract op can take 4 nanoseconds, rather than having them each take a 4 nanosecond cycle. It really disturbs me to see designs where a multiplication (inherently slower by a minimum factor of lg(bits)) takes the same amount of time as an addition.

      Properly-implemented asynchronous circuitry ALSO has the quality of modularity - you can hook each individual module together w/o regard for inter-module timing - and if you come up with a better implementation for one of the modules, you can swap it in w/o adjusting timing in any of the other modules - just as long as the interfaces are well-defined and asynchronous.

      Most engineers would probably agree that this is a good thing.

  13. Caltech built an async MIPS processor in 1989 by jncook · · Score: 2

    When I was in grad school at Caltech, I worked on software tools in Alain Martin's asynchronous microprocessor group. The group had actually developed and fabricated a processor before I arrived. To quote their web page (www.cs.caltech.edu/~alains/previous/uP.html):

    "Above is the layout of the 1.6 micron version of the Caltech Asynchronous Microprocessor, fabricated in 1989. It is a 16-bit RISC machine with 16 general-purpose registers. Its peak performance is 5 MIPS at 2V drawing 5.2mA of current, 18 MIPS at 5V drawing 45mA, and 26 MIPS at 10V drawing 105mA. The chip was designed by Professor Alain Martin and his group at Caltech. You can read about the chip in Caltech CS Tech Reports CS-TR-89-02 and CS-TR-89-07."
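
    Those three operating points are enough for a quick back-of-the-envelope calculation of power (volts times amps) and energy per instruction. The Python below is just that arithmetic on the quoted figures; the function names are made up, and it assumes the quoted current is the total supply current at peak rate.

```python
# (supply volts, current in mA, peak MIPS) as quoted from the Caltech page
OPERATING_POINTS = [(2.0, 5.2, 5.0), (5.0, 45.0, 18.0), (10.0, 105.0, 26.0)]


def power_mw(volts, milliamps):
    """Supply power in milliwatts (V * mA)."""
    return volts * milliamps


def energy_per_instruction_nj(volts, milliamps, mips):
    """Energy per instruction in nanojoules.

    mW / MIPS = (1e-3 J/s) / (1e6 instr/s) = 1e-9 J per instruction.
    """
    return power_mw(volts, milliamps) / mips


for v, ma, mips in OPERATING_POINTS:
    print("%4.1f V: %7.1f mW, %5.2f nJ/instruction"
          % (v, power_mw(v, ma), energy_per_instruction_nj(v, ma, mips)))
```

    That works out to roughly 10 mW, 225 mW and 1050 mW, or about 2.1, 12.5 and 40.4 nJ per instruction: the same chip trades speed for efficiency just by changing the supply voltage.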

    Keep in mind that this is a 1.6 micron process. The chip was later fabricated in gallium arsenide with very few design changes. This is because the chip, being completely data driven, will perform computation as fast as the underlying device physics will allow. There are no "timing issues" as these must all be worked out in high-level design (or the chip won't function at all... race conditions in hardware really suck).

    Of course, the neatest geek feature is to pour liquid nitrogen on the chip and watch the instruction rate climb.

    Since I left the group, they have also fabricated an asynchronous "digital filter" or simple DSP. Details at http://www.cs.caltech.edu/~lines/filter/filter.html

    The downside of all this stuff is that the design process is very formalized and arduous. Our group designed by writing parallel programs in a special chip-design notation, then transforming the program by hand and by software into a VLSI gate layout. It was a completely different synthesis method than most designers are used to, so it requires completely new software and designer training to be productive. It's sad, really, because the output chips are so very very nifty.

    James Cook
    ex-"cook@vlsi.caltech.edu"
    now-"cook@alumni.caltech.edu"

  14. Lower EMF - Stealthy for the 'JEDI'! by Proteus · · Score: 2
    This Slashdot story mentioned the US Government's plan to create a corps of 'connected soldiers' using palmtops and GPS equipment (among other things).

    I hope whoever is in charge of this project becomes aware of this technology - as other posters on the aforementioned story noted, EMF radiation could make these JEDIs a glowing target. Lower EMF means fewer KIA (Killed In Action, not the crappy car company) JEDIs.

    Besides, the low power consumption is something that nearly every PDA user can appreciate: and in field-critical situations, could be another lifesaver.

    --
    We may not imagine how our lives could be more frustrating and complex—but Congress can. – Cullen Hightower
  15. Okay, Second Years, who's game? by Ian+Pointer · · Score: 2
    Porting Linux to Amulet3.

    Impress the dept. and gain the respect of /.! Certainly much more fun than the Java web-trawler I did this year...

  16. Self-timed CPUs have been around for a good while! by vanth · · Score: 3
    Self-timed CPUs have been in the design stages for quite some time now, and some groups have built prototypes.

    A group at CalTech built a 16bit RISC style self-timed CPU some years back (early 90's I believe) on a 1.5 micron process (I believe, somebody please correct me if I am wrong)

    One of the cool features is that as you cool the CPU, it literally becomes faster!
    The basic design of self timed CPU's has been around for probably more than 20 years.
    S Unger's Asynchronous Sequential Switching Circuits, Krieger, Malabar, FL, 1983 is probably one of the books one encounters when taking a course in this subject. (the book is pretty rough going though - )

  17. Re:Is ARM about to go the way of DEC? by mindstrm · · Score: 4

    ARM has lots of market share. LOTS.
    You are assuming that the main market for this type of chip is the home PC. This is absolutely not the case.

  18. Um. No. by Christopher+B.+Brown · · Score: 2
    The Cobalt Qube and RaQ 2 products use the SGI MIPS processor, not ARM. (And the RaQ 3 uses an "Intel compatible" processor, according to the data sheets found here. )

    I've seen information indicating expressions of interest in a port of PalmOS to StrongARM; I'll believe in there being product when I actually see it on store shelves.

    --
    If you're not part of the solution, you're part of the precipitate.
  19. What about glitches? by Christopher+Thomas · · Score: 2

    The ideas presented in the papers on the Theseus Logic site are interesting. However, the True/False/Null logic scheme defined seems to be vulnerable to glitches in gate inputs. A brief transition to a valid state on all inputs as the previous stage's logic settled would be interpreted as a new input datum by the gate in question, possibly resulting in unwanted output being produced. In other words, using T/F/N logic seems to place stricter timing requirements on input signals than clocked logic with edge-triggered registers.

    Is this correct, or am I missing something? I realize that glitching can be reduced by careful logic design, but this seems to be an issue that is addressed neither in your post nor in the papers on the Theseus site.

  20. How old? by davstott · · Score: 2

    Yes, all very interesting, but this is hardly a new project! I have a recollection of reading about the Amulet project back in the heady days of the ARM 3. I think it might have even had a fairly large lump of magazine dedicated to it when it was still called Micro User! But ancient history aside, it's good that people are still pushing ARM processors even though x86 seems to have all but won the war. Even Intel seem to think so as there's a 400MHz StrongARM due real soon now, I hear.

    Still, long live Arthur!

    1. Re:How old? by Ed+Avis · · Score: 2

      There's also a pretty decent Archimedes emulator called Arcem by David Alan Gilbert, who coincidentally used to work on Amulet a few years ago. Unfortunately his site is down ATM. It's based on the GPL'ed Armulator code released by ARM Ltd. (Why is it ARM Ltd when they're a publicly traded company? )

      --
      -- Ed Avis ed@membled.com
  21. Re:Is ARM about to go the way of DEC? by BrianW · · Score: 2

    ARM has some market share - the ARM chip is used in all sorts of small low-power devices. The most popular of which is probably the Psion range.

  22. Asynchronous logic's drawbacks by nweaver · · Score: 4

    Asynchronous logic appears, every once in a while, as a "new" hot topic within VLSI and computer architecture research. Yet it has consistently failed to offer the benefits it promises. Why?

    It is true that clocks in synchronous design consume a great deal of power, but when low power designs are required, it is well understood how to gate and conditionalize clocks so they don't use power when the associated logic is not operating.

    And asynchronous design has to be much more conservative than a synchronous design. With a synchronous design, a chip can be designed to operate at the maximum frequency, and then binned down if it fails to meet its target.

    However, an asynchronous design requires that the delay lines be designed very conservatively: if a delay line were a little faster, and the logic a little slower, on the worst-case critical path, the chip would fail completely. The result is a processor that is slower by design.

    Finally, the design methodology for building pipelined, synchronous devices is well understood, as a purely digital system. Asynchronous logic, by contrast, relies on building delay lines, which are essentially analog operations, and that is a great disadvantage.

    --
    Test your net with Netalyzr
  23. This deals with clocking, not bus width. by Christopher+Thomas · · Score: 3

    And I'm still trying to figure out why asynchronous smaller bandwidth (number of lines) buses are faster than synchronous parallel (more data lines).

    They aren't; what asynchronous logic in an IC context deals with is reducing power consumption by not clocking all parts of the chip all of the time.

    In a synchronous microprocessor, the system clock is distributed to all functional units, and the functional units even when not in use usually wind up having some kind of internal state change every clock cycle. This results in a lot of heat production, because every time the state of a bit in a register or of a bus line changes, heat is dissipated (by nature of the way the parasitic capacitances are charged and discharged).

    In a truly asynchronous microprocessor, there is no master system clock distributed to the functional units of the chip. Instead, actions in a functional unit take place when input data changes (i.e. new input data arrives). This results in only the state of units being used changing, which in turn means much less power dissipation if only one or two units are being used at a given time.

    In practice, real systems don't fit into either category. Fully synchronous circuits burn a lot of power, but truly asynchronous circuits are difficult to design and are very sensitive to certain types of process variation. An often-used compromise is to use gated clocks - a synchronous clock is propagated, but only to the functional units that are being used. This principle is extended within the functional units themselves; internal clocks and data are propagated only when they need to be for the operation being performed. This results in a circuit that is much easier to design and fabricate than a truly asynchronous circuit, and that is almost as good from a power consumption point of view.
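
    A rough numerical sketch of the gated-clock argument, in Python. It uses the standard switching-power relation P = activity × C × V² × f; the unit names, capacitances, voltage and frequency are all hypothetical, and idle-but-clocked units are assumed to switch as much as busy ones (a pessimistic simplification).

```python
def dynamic_power(activity, capacitance_f, volts, freq_hz):
    """Switching power P = activity * C * V^2 * f, in watts."""
    return activity * capacitance_f * volts ** 2 * freq_hz


# Hypothetical functional units: name -> (switched capacitance in farads, busy?)
UNITS = {
    "alu": (2e-10, True),
    "multiplier": (5e-10, False),
    "fpu": (8e-10, False),
}


def chip_power(units, volts, freq_hz, gated):
    """Total switching power, optionally withholding the clock from idle units."""
    total = 0.0
    for cap, busy in units.values():
        if gated and not busy:
            continue  # clock is not delivered to an idle unit
        total += dynamic_power(1.0, cap, volts, freq_hz)
    return total


print(chip_power(UNITS, 2.0, 100e6, gated=False))
print(chip_power(UNITS, 2.0, 100e6, gated=True))
```

    With these made-up numbers the ungated chip burns about 0.6 W and the gated one about 0.08 W, since only the ALU is switching.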

    I hope this clarifies what the debate over asynchronous computing is about.

  24. Great for Symbian! by Troed · · Score: 4
    ARM is what Epoc, Symbian's OS, runs on. Considering that Ericsson, Nokia, Motorola, Psion and Matsushita (Panasonic) own Symbian and will use its operating system in palmtop computers with built-in phones, handhelds and smartphones, the future looks extremely bright!

    Oh, forgot. Sony is also an Epoc licensee - and they make cool devices!

    Go ARM!

  25. Theseus logic by DGolden · · Score: 2

    Theseus Logic have some interesting papers on asynchronous logic design on their website, not directly connected to the story, but they're interesting nonetheless.

    --
    Choice of masters is not freedom.
  26. Re:New possibilities by bluGill · · Score: 2

    In fact, if anyone can give some examples where the user's benefit is greater using an ARM solution than an x86 or even PowerPC based solution, I'd love to know what they are. ARMs are cool CPUs and all, but hardly predominant in the current market place.

    Get out of your Everything is a PC or Mac box and you will see why you are wrong.

    At work we make a device with (up to) 80 SA110s. It is designed for a job which is easier to do in parallel than in serial. We need to do real time tasks, and it is easier to do 64 real time tasks on 64 different processors than to figure out the timing issues on a single [faster] processor, even if in theory the single processor would have as much power as all combined. With that many processors heat is an issue. Hardware could not have made any other processor work. (Non-ARM, that is.) Also, since our job parallelizes so well it was easier for them to design the hardware with 64 processors than to run all the external ports to one chip.

    This is not the only example of where the StrongARM is good. I've seen microwaves for campers. They run off batteries. There is nothing that can be done to get 750 watts of microwave with less than 750 watts of power, but the less power you need over that the better. Not to mention power consumed when not running. Here the asynchronous ARM shines. They don't need much of a processor, but it is easier to compare your sensors with tables in software than in hardware. (I've seen microwaves that smell when the food is done; it is one sensor, and then 100 different look-up tables for each type of food.)

    I have intentionally covered non-computing devices. However, if I could buy a Linux laptop with a reasonably fast processor with ultra-low power consumption I would. I currently am not using any significant processing power, and often I offload my hard tasks to the Sparc down the hall. Give me a Linux laptop that can supply bursts of power when I need it and I'll be happy. (Granted, my boss wouldn't be because he needs Windows programs where I compile from source anyway for all my programs.)

  27. I just never thought that there was an alternative by DeepDarkSky · · Score: 2

    Not having been up on the topic, I just never thought that integrated circuit logic was an alternative. And I'm still trying to figure out why asynchronous smaller bandwidth (number of lines) buses are faster than synchronous parallel (more data lines). But I guess the speed has at least something to do with the noise tolerance. Anyway, I'm reading from one of the links followed from the site that seems to be a pretty good explanation/history of the asynchronous logic.