Self-Timed ARM Provides Low Power Consumption
hardcorebit writes: "The Amulet Group at the University of Manchester is working on a 'self-timed' or 'asynchronous logic' chip which uses the ARM architecture and instruction set. The benefits? Much lower power consumption, lower EMF emissions, and it works with everything written for the ARM. Their latest effort is 'broadly equivalent' to an ARM9. Anyone had a chance to get their hands on one of these beasts?"
I saw a demo of the Amulet a couple of years ago, when I was at Manchester University. They'd wired up one to a variable-voltage power supply, and a speaker.
By putting it into a loop where it powered the speaker every couple of cycles, it generated a tone. By adjusting the voltage of the power supply, it was possible to make the tone higher or lower, as it wa having a direct effect on the running speed of the processor.
Also, when put into a 'halt' loop, it would power down until interrupted. An ammeter connected in series with it showed that it was using almost literally no power.
Maybe you haven't been exposed to enough processor archictectures? The ARM chips have the cleanest instruction set and overall archictecture that I've seen, and that includes lots of hands-on experience with the PowerPC, x86, SHx, and MIPS chips. The ARM designers had some very good ideas for keeping instructions simple while getting a lot done and they had a novel way of avoiding the usual branch prediction troubles. Very slick.
Amulet's lead, Steve Furber (who also designed the original ARM), wrote a recent editorial coverstory called "Kicking out the Clock" in the May 2000 edition of Integrated System Design (ISD) magazine.
In the article, he used an example of a "dual-rail" logic (as opposed to "single-rail" found in most boolean-designs) call Null Convention Logic (NCL) from Theseus Logic. Theseus' NCL approach not only goes a long way to not only solving the power and noise problems (like most asynchronous), but also the greater problem of design reuse (a problem with both async and, especially, synchronous) -- the later is something Furber was quoted on in a past EE Times article (cannot seem to find it on-line anymore?).
Timing verification is becoming increasingly difficult in IC design, adding rediculous ammounts of extra effort and, in some cases, complete design failures (e.g., AMD, IBM and Intel have all had timing-related design failures). Clocks may soon disappear in favor of async designs, especially those like Theseus Logic's nearly-100% delay INsensitive NCL technology. NCL's delay INsensitive nature comes from the fact that it is NOT boolean logic based, but a new method that breaks the traditional foundation of what boolean logic was design for, mathematicians, not computers.
In addition to an "operand" and an "operator," as with traditional, human-based math, computers require a third "control" line. In synch/boolean, this is the clock. With the limitations of the speed of light, it is IMPOSSIBLE for 10M+ transistor ICs on one section of the chip to be timed synchronous with another. As such, most modern ICs have localized clocks, which further adds to design complexity.
NCL removes the clock as the control (as with most async) *BUT* it places the control back in the data flow lines themselves! NCL is a 3-state logic of "true" and "false", plus the control which is derived from NCL math to be "null" (no data). This representation is 2NCL in NCL math (see Theseus' site for more details on NCL including 4NCL and 3NCL, the later being used with most off-the-shelf tools and optimizers). In 2NCL, the lines (again, "dual-rail") puts the false value (0) on one line and true (1) on the other line *IF* voltage is present, otherwise, no voltage (or low) results in the state of "null" (again, no data). Acknoledgements are used to maintain a delay INsensitive combinational logic circuit, including the fact that NCL can be place alonside synch/boolean and maintain 100% data flow and integrity (again, totally delay INsensitive). So instead of data having to "wait" on a clock to move forward, data moves forward when it arrives! This further increases performance!
Although Theseus' NCL technology is NOT boolean based, it works with off-the-shelf synch/boolean IC design tools (unlike attempts like Cogency's), it is still CMOS-based, and it not too difficult for an engineer to learn coming from the synch/boolean world.
[Bias: I am an employee of Theseus Logic and know Mr. Furber, the Amulet lead. I am NOT an engineering lead, just a regular engineer (who seconds as the sysadmin ;-).]
-- Bryan "TheBS" Smith
-- Bryan "TheBS" Smith
Independent Author, Consultant and Trainer
Its processor core is based on the ARM9 series, but since it is asynchronous (ie it hasn't got 'clock cycles' like normal synchronous processors) it should go very very fast (simple processes will rush through without being delayed by slight harder/longer processes).
While I haven't had a chance to get my hands on one of these yet, the spec's I've seen (I can't remember if they are public or not) look good and the chips should be compatible with current ARM chips - as used in my RISC PC (BTW a RISC PC is used to run the 'Who Wants to Be A Millionare' shows!).
It is difficult to place an exact Mhz rating on these chips due to the way they work, but the current version (AMULET3i) runs at roughly 120Mhz - but they have started from the basics, without using much 'proven technology', so expect development to last a few more years - but the 120Mhz version should be out next month/late this month.
Richy C.
--
A group at CalTech built a 16bit RISC style self-timed CPU some years back (early 90's I believe) on a 1.5 micron process (I believe, somebody please correct me if I am wrong)
One of the cool features is that as you coll the cpu, it literarly becomes faster!.
The basic design of self timed CPU's has been around for probably more than 20 years.
S Unger's Asynchronous Sequential Switching Circuits, Krieger, Malabar, FL, 1983 is probably one of the books one encounters when taking a course in this subject. (the book is pretty rough going though - )
ARM has lots of market share. LOTS.
You are assuming that the main market for this type of chip is the home PC. This is absolutely not the case.
Asynchronous logic appears, every once in a while, as a "new" hot topic within VLSI and computer architecture research. Yet it has consistantly failed to offer the benefits it promises. Why?
It is true that clocks in synchronous design consume a great deal of power, but when low power designs are required, it is well understood how to gate and conditionalize clocks so they don't use power when the associated logic is not operating.
And asynchronous design has to be much more conservative than a synchronous design. With a synchronous design, a chip can be designed to operate at the maximum frequency, and then binned down if it fails to meet its target.
However, an asynchronous design requires that the delay lines be very conservatively designed, as if the delay line was a little faster, and the logic a little slower, on the worst case critical path, the chip would fail completly, which results in a slower processor by design.
Finally, the design methodology for building pipelined, synchronous devices is well understood, as a purely digital system. While asynchronous logic relies on building delay lines, essentially analog operations, which is a great disadvantage.
Test your net with Netalyzr
And I'm still trying to figure out why asynchronous smaller bandwidth (number of lines) buses are faster than synchronous parallel (more data lines).
They aren't; what asynchronus logic in an IC context deals with is reducing power consumption by not clocking all parts of the chip all of the time.
In a synchronus microprocessor, the system clock is distributed to all functional units, and the functional units even when not in use usually wind up having some kind of internal state change every clock cycle. This results in a lot of heat production, because every time the state of a bit in a register or of a bus line changes, heat is dissipated (by nature of the way the parisitic capacitances are charged and discharged).
In a truly asynchronus microprocessor, there is no master system clock distributed to the functional units of the chip. Instead, actions in a functional unit take place when input data changes (i.e. new input data arrives). This results in only the state of units being used changing, which in turn means much less power dissipation if only one or two units is being used at a given time.
In practice, real systems don't fit into either category. Fully synchronus circuits burn a lot of power, but truly asynchronus circuits are difficult to design and are very sensitive to certain types of process variation. An often-used compromise is to use gated clocks - A synchronus clock is propagated, but only to the functional units that are being used. This principle is extended within the functional units themselves; internal clocks and data are propagated only when they need to be for the operation being performed. This results in a circuit that is much easier to design and fabricate than a truly asynchronus circuit, and that is almost as good from a power consumption point of view.
I hope this clarifies what the debate over asynchronus computing is about.
Oh, forgot. Sony is also an Epoc licensee - and they make cool devices!
Go ARM!
it's in my head