Clockless Computing: The State Of The Art

Sounds very interesting, but... by weslocke · 2001-09-15 01:34 · Score: 1

Is this another example of the 'bohemian/hippie renegade engineer out to save the computing world by their bold revolutionary ideas'?

Sort of reminds me of the Rolling Stone cover back in '90 (or so) that had Jesus Jones on the cover. "Will Jesus Jones save Rock & Roll?" (And notice where they are now)

--

'Life is like a spoonful of Drain-O, it feels good on the way down but leaves you feeling hollow inside'

Re:Sounds very interesting, but... by Anonymous Coward · 2001-09-15 01:48 · Score: 0, Offtopic

Yes, reminds me of those renegade hippie engineers that started Sun in their garage. (And notice where they are now)
Re:Sounds very interesting, but... by DavidRavenMoon · 2001-09-15 05:39 · Score: 1

Also reminds me of the renegade hippies that started Apple in their garage...
and where are they now? ;-)

--
-- if it was so, it might be; and if it were so, it would be; but as it isn't, it ain't. That's logic - Lewis Carroll

How... by blkros · 2001-09-15 01:39 · Score: 2, Funny

are computers going to know what time it is if they don't have any clocks?

--
Damnit, Jim, I'm an anarchist, not a F@#$!^& doctor!

Re:How... by Fat+Casper · 2001-09-15 01:50 · Score: 2

You'll just think that you've set the fuzziness scale too high.

--
I spent a year in Iraq looking for WMD and all I found was this lousy sig.
Re:How... by andika · 2001-09-15 03:00 · Score: 1

are computers going to know what time it is if they don't have any clocks?

I believe a clock is still needed, but the CPU itself doesn't depend on it. The OS will surely require a clock.
Re:How... by Anonymous Coward · 2001-09-16 00:32 · Score: 0

if you can't beat them on clock frequency...
Isn't AMD planning not to post the clock frequency in its new BIOS? ;)

Async clockless designs are not easy. How can we expect the HDL code monkeys out there to understand the hardware, given the abstractions in the language, let alone do something like this? Given the high demand for chip monks, this is not likely to happen on a large scale.
Re:How... by superflex · 2001-09-17 00:52 · Score: 1

Most computer systems have auxiliary timers that they use for actually keeping time. For example, in an architecture/assembly language course I just took in the summer, we were using a Motorola ColdFire processor and an MC68901 multifunction peripheral controller, which includes 4 independent timers. We wrote some assembly routines that set up one of the timers (I believe they used a 25MHz clock) so that the signal ran through a 1/16 clock divider and then into an accumulator, which generated an interrupt when it reached a certain value (25 000 000/16 = 1 562 500), i.e. once every second. The ISR that responded to this interrupt was our system clock.
Of course, this system relies on software to do all the work. A real system clock would just be the same thing implemented in hardware.
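
For illustration, the divider arithmetic above can be sketched in Python. This is a hypothetical model, not the course's assembly: the `SoftClock` class and its method names are invented here, and the 25MHz input and 1/16 divider are simply the figures the post recalls.

```python
TIMER_INPUT_HZ = 25_000_000   # timer input clock the post recalls (assumed)
PRESCALER = 16                # the 1/16 clock divider
TICKS_PER_SECOND = TIMER_INPUT_HZ // PRESCALER  # terminal count: 1 562 500


class SoftClock:
    """Hypothetical stand-in for the timer accumulator plus its ISR."""

    def __init__(self):
        self.accumulator = 0  # counts prescaled timer ticks
        self.seconds = 0      # what the ISR maintains as the system clock

    def advance(self, ticks):
        """Feed prescaled ticks in; 'interrupt' once per terminal count."""
        interrupts, self.accumulator = divmod(self.accumulator + ticks,
                                              TICKS_PER_SECOND)
        self.seconds += interrupts  # each interrupt is one elapsed second


clock = SoftClock()
clock.advance(3 * TICKS_PER_SECOND + 10)
print(TICKS_PER_SECOND, clock.seconds, clock.accumulator)
```

Using `divmod` instead of a tick-by-tick loop keeps the sketch fast; real hardware would of course count one tick at a time.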

--
sigs are for suckers

Who? by John+Guilt · 2001-09-15 01:41 · Score: 1

No, I think it quite likely that many in the Linux community have given about half a thousandth of their assets.

You know, as with the police, I have a lot less trouble with Bill & co. than with their sycophants. At least the way B.G. and M.S. operate makes sense for _them_; to hear these cheerleaders prate along as if Bill might actually _like_ them....

This is cool. by Anonymous Coward · 2001-09-15 01:45 · Score: 1, Interesting

And marketing these chips will have to get back to the real stuff: how many operations of a specific kind they can carry out per second.

I'm just wondering, would such a processor execute the same machine code using the same internal sequence of signals twice ? I guess asynchronous communication between elements would introduce some kind of randomness.

Re: This is cool. by SEWilco · 2001-09-15 02:52 · Score: 1

Well, I suppose it's the lesser of two evils. Having a benchmark result printed on a retail box does carry a little more information than having a clock rate on the box.
(Of course for the benefit of the consumer who doesn't know the differences between benchmarks, some standard benchmark would have to be used so the consumer can simply know "a bigger number on the box is better".)
Re:This is cool. by zzzz23 · 2001-09-15 06:49 · Score: 1

This 'randomness' exists both in synchronous and async logic. Clocked boolean logic will not handle the same operations exactly the same way twice with respect to timing at the gate level, but the clock hides that (that's why it's there). With clockless logic, other mechanisms hide that.

In the case of Null Convention Logic, it's the extra signal saying 'wait until I'm done!' to the next logic unit. This results in relative dataflow being more random-like. However, both approaches use design to ensure the end result is properly achieved.

With Null Convention Logic, this 'relative randomness' means the chip is producing more 'white noise' since multiple clocks are not joining together to produce a steady electromagnetic frequency. This should make design easier as you don't have to fight your own chip design to keep it from interfering with itself. This is a HUGE problem with current clocked designs. I believe it results in many forced re-designs to get it right, and the problem only gets worse with higher clock rates and bigger chips.

What will they advertise now? by chill · 2001-09-15 01:49 · Score: 3

What will AMD and Intel try to one-up each other with? No clock speed, so how do you classify, much less hype, new processors?

The real reason they haven't moved to this yet is their marketing team doesn't want to give up on the MHz race.

--
Learning HOW to think is more important than learning WHAT to think.

Re:What will they advertise now? by Anonymous Coward · 2001-09-15 01:51 · Score: 0

Umm, FLOPS ?

PS. Fuck Malda and his fucking lameness filter.
Re:What will they advertise now? by Anonymous Coward · 2001-09-15 01:56 · Score: 2, Insightful

That is NOT the reason they have not moved.
Designing something as complicated as a CPU without clocks is a daunting challenge. Keeping everything in sync, removing race conditions, keeping order of execution the same. There are a lot of challenges in a clockless design.
Re:What will they advertise now? by tshak · 2001-09-15 02:06 · Score: 1

There should be an independent association that uses a battery of benchmarks to come up with a few "measurements" that the general public can use to gauge the performance. We should implement this even now with our clock speed being almost ambiguous.

--

There is no longer anything that can be done with computers that is nontrivial and clearly legal. -- Paul Phillips
Re:What will they advertise now? by weslocke · 2001-09-15 02:59 · Score: 1

You have to remember that AMD is already about to abandon the "My MHz vs. Your MHz" game since the speeds are becoming an increasingly 'apples to oranges' comparison. They'll be referring to future chips by model number... and from what I've been hearing the consumer will actually have to dig to get to the speed of the chips.

--

'Life is like a spoonful of Drain-O, it feels good on the way down but leaves you feeling hollow inside'
Re:What will they advertise now? by Anonymous Coward · 2001-09-15 05:40 · Score: 0

Isn't that just floating point operations? What about apps that use integer operations? Or that are more restricted by cache or something rather than raw CPU power?
Re:What will they advertise now? by mattdm · 2001-09-15 07:31 · Score: 2

What a novel idea!

Sarcasm aside -- the SPEC benchmarks have been around for a long time and are well respected. You can see some SPEC CPU 2000 results here.
Re:What will they advertise now? by f_thegreenbear · 2001-09-15 11:42 · Score: 1

Looking at Spec95, was anyone else surprised by the fact that a P5-100 hit 3.05 Int, and 2.07/2.72 FP, whereas a K7-1000 ran at 42.9 Int and 29.4 FP? That's about 14x improvement from a 10x clock speed gain.
Given that the K7 is 9-way superscalar, you worry about compiler quality.
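
As a quick check of that ratio, here is a small Python sketch using only the scores quoted in the post (they are not independently verified here, and the dictionary keys `fp_low`/`fp_high` are just labels invented for the two FP figures given):

```python
# SPEC95 scores as quoted in the post above (not independently checked).
p5_100 = {"int": 3.05, "fp_low": 2.07, "fp_high": 2.72}
k7_1000 = {"int": 42.9, "fp": 29.4}

clock_ratio = 1000 / 100                               # 100 MHz -> 1000 MHz
int_ratio = k7_1000["int"] / p5_100["int"]             # integer speedup
fp_ratio_best = k7_1000["fp"] / p5_100["fp_low"]       # against the lower FP score
fp_ratio_worst = k7_1000["fp"] / p5_100["fp_high"]     # against the higher FP score

print(f"clock x{clock_ratio:.0f}, int x{int_ratio:.1f}, "
      f"fp x{fp_ratio_worst:.1f} to x{fp_ratio_best:.1f}")
```

So the "about 14x" holds for the integer score, while the FP gain depends on which of the two P5 figures is taken as the baseline.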

--
anarcho-roboticist [lopster incomplete: 6.5% of 2.5GB]

Doesn't matter. by Anonymous Coward · 2001-09-15 01:49 · Score: 0

It doesn't matter, computers are not sentient (yet).

Power saving, yes.... Good performance???? by AtomicBomb · 2001-09-15 01:51 · Score: 1

I can see the point that clockless design can reduce the power consumption. However, I don't really see why it would solve the other problems inherent in high speed computation.

Suppose we want to increment a register 1000M times. A clocked circuit will generate a hell of a lot of noise as all the signals push through the circuit at, say, 2GHz for a duration of, say, 0.5s... But if we want the clockless design to work as well, its asynchronous gates still have to switch that many times in the same 0.5s.
In terms of noise generation, it will be on par with a conventional design. As all the gates still need to switch at pretty much the same speed, the other physical barriers still apply.

Does anyone have more detailed info on this topic?

Re:Power saving, yes.... Good performance???? by r1ch · 2001-09-15 02:11 · Score: 1

But you can save time by removing the clock-synched latch that currently has to separate each piece of logic. Like the article says, async chip design is gradually being introduced in this way to current designs, like the Pentium 4 - not that that is much of an advert for the technology... It's not really anything new, it's just that previously the benefits were greatly outweighed by the difficulty in designing these systems.
Re:Power saving, yes.... Good performance???? by TeknoHog · 2001-09-15 03:27 · Score: 4, Informative

Say we're running at 2GHz, which allows a maximum time of 0.5 ns for an instruction. But if you use some simple instructions that only take 0.2 ns each, you'll be wasting 3/5 of your time waiting for the next cycle. With clockless computing you can move on to the next stage as quickly as you're done with the one before.
Of course there is some overhead. There has to be a system telling other parts of the computer when something is finished. But if that is a long enough stage (perhaps thousands of instructions) then it'll be faster overall.
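
A rough Python sketch of that arithmetic, working in picoseconds so the numbers stay exact. The function names and the 100 ps handshake cost are made up for illustration; only the 2GHz / 0.2 ns figures come from the post.

```python
from fractions import Fraction


def idle_fraction(cycle_ps, work_ps):
    """Share of a clock cycle spent waiting after the work is finished."""
    return Fraction(cycle_ps - work_ps, cycle_ps)


def async_wins(cycle_ps, work_ps, handshake_ps):
    """True if work plus completion signalling still beats a full cycle."""
    return work_ps + handshake_ps < cycle_ps


print(idle_fraction(500, 200))    # 2 GHz cycle = 500 ps, work = 200 ps
print(async_wins(500, 200, 100))  # hypothetical 100 ps handshake cost
```

The second function is the overhead caveat in miniature: the clockless version only comes out ahead while the signalling cost stays below the slack the clock would have wasted.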

--
Escher was the first MC and Giger invented the HR department.
Re:Power saving, yes.... Good performance???? by Anonymous Coward · 2001-09-15 03:51 · Score: 0

Noise: you get less noise without a clock for a few reasons, one being that the clock itself (and the tree that distributes it across the chip) isn't thumping out 2GHz waves. Another is that async design generally tries to eliminate hazards (hazards cause incorrect results to propagate for a small amount of time, which causes unnecessary transistor switching and hence power consumption). The extra switching from hazards causes noise and takes power. Like the article says, async has to get the values right the first time because there's no clock to indicate signal validity.
Performance: not really from async (there is signalling overhead), but if you save 30% of the chip space you can put in more functional units (integer adders etc.) and do more operations in parallel, hence you get greater throughput of instructions.
Various good results have been reported - Chuck Moore claims his async chip gets 500% better performance for 20% of the power of synchronous designs. The Amulet group at Manchester have managed to get something like 1400mips/watt with the latest Amulet core, and despite the article claiming there are no commercial async chips, the amulet3h core is being licensed now for embedded apps.
The main difficulty in designing these chips is that current tools don't support async design and verification, and from speaking to synchronous chip designers, they seem to have a real problem understanding it... maybe a CS background rather than EE helps here. Amulet at Manchester have open source tools like Balsa, which they used to design the MMU in Amulet.
Open source methodologies have found it difficult to break into chip design, but the paradigm shift towards asynchronous design invalidates a lot of the commercial tools and gives us a chance to start afresh. Balsa should be quite easy for programmers familiar with threads and asynchronous processes; in fact, I have heard of people designing async CPU cores in a weekend with it (beats VHDL :). Anyone interested should have a look at the open source tools on the Amulet site, check out some of the async tutorials and Sun's async site.
Re:Power saving, yes.... Good performance???? by ballista · 2001-09-15 04:44 · Score: 1

With clockless computing you can move on to the next stage as quickly as you're done with the one before.

Actually you can do even better. If the instruction executed does not need the memory stage of the pipeline, it can exit the pipeline before that stage. This will allow multiple quick instructions (e.g. shift) to execute and exit the pipeline while the slow memory instruction ties up the memory stage. This pseudo-parallel operation is what clocked processors can only do with multiple pipelines.
Re:Power saving, yes.... Good performance???? by randombit · 2001-09-15 05:54 · Score: 1

Say we're running at 2GHz, which allows a maximum time of 0.5 ns for an instruction.

Not so; many instructions take multiple cycles. Which ones depends on the machine, but multiplication, division, jumps, and of course memory accesses, are usually 2-20 clock cycles to execute.

More time is spent doing memory access and missed branches than anything else (IIRC: Pentium Pro guesses 90% of branches correctly, and missed branches count for about 30% of the overall time of executing a typical piece of code). IA-64 does some interesting things to prevent missed branches from hurting the code (basically, it executes both branches in parallel, throwing away whichever one was wrong). IA-64 has so many functional units that I guess in the long run, it turns out to be a win.

If the advantages they cite for these chips are true, things could get very interesting in a couple of years. :)
Re:Power saving, yes.... Good performance???? by Lars+T. · 2001-09-15 07:35 · Score: 1

Performance: functional units in clocked systems have to finish in a clock cycle (or multiples thereof). If a unit takes different times to complete for different inputs, the clock is limited by the worst-case time for a clocked unit, whereas an async'ed unit is done when it's done. So an async system will be at least as fast as an equivalent clocked system (using equivalent units) - ignoring overhead for the async system.

--
Lars T.
To the guy who modded me down from perfect to terrible Karma - Apple haters still suck
Re:Power saving, yes.... Good performance???? by Anonymous Coward · 2001-09-16 00:38 · Score: 0

It might have lower narrow band spectrum noise as the energy is now spread over a larger frequency range. It would be like using spread spectrum clocks to pass FCC requirements.

Any minor change in temperature or voltage is going to change the propagation delays. High frequency -> small time periods -> larger spread of frequency for small delta changes.

useful for transmeta? by ruck · 2001-09-15 01:52 · Score: 1

With its simplified core, a processor like the Crusoe seems like it could be a promising general-purpose chip to first adopt technology like this.

Any comments from someone more knowledgeable than I?

Re:useful for transmeta? by nd · 2001-09-15 05:52 · Score: 1

Err, not really. Crusoe is still very much synchronous design. Changing that is extremely non-trivial.
Re:useful for transmeta? by Anonymous Coward · 2001-09-15 06:56 · Score: 0

It would be cool if Transmeta would use clockless computing on its future processors. Their processors' speed could be on the same level as Intel's (or even faster) and they would use even less electricity than they do now.

It would also raise an interesting question: anarchistic processor with anarchistic software = Chaos? ;-)

crypto regulations by Theodore+Logan · 2001-09-15 01:55 · Score: 1

Because these chips give off no regularly timed signal, the way clocked circuits do, they can perform encryption in a way that is harder to identify and to crack.

Not if you have a backdoor. Guess these guys don't read Wired..

--

"If you think education is expensive, try ignorance" - Derek Bok

Clockless ARM by Anonymous Coward · 2001-09-15 01:58 · Score: 2, Interesting

The Amulet Group at The University Of Manchester have a clockless ARM (ARMs are used in many mobile phones, the Compaq iPaq and the GBA).

Re:Clockless ARM by class_A · 2001-09-15 09:18 · Score: 1

Saw this at an Acorn Computer User Group meeting at the University of Manchester about 4 years ago.

I was only about 14 and didn't have a clue about half the stuff that was being talked about, but the AMULET simulator they showed at the end looked kinda cool :-)

Maybe it was longer than 4 years, I remember we were waiting for the first shipments of the StrongARM processor upgrade cards for our RiscPC 600's and 700's

Ah well, guess I'm getting old now...

Asynchronous vs. synchronous computing by isj · 2001-09-15 02:00 · Score: 3, Interesting

The article is very interesting. I thought that research in asynchronous computing died in the sixties. What the article misses is that async. operations have an overhead too - the synchronization "here is the data". Synchronous computing does not have that.

I have previously read (forgotten where) that in theory async. computers will always be slower than sync. computers. It seems that that is not true anymore. I guess that the latest-and-greatest CPUs have a non-trivial percentage of idle time for instructions which take slightly longer than an integral number of clock ticks. If an instruction takes 2.1ns and the clock runs at 1ns, everything has to assume that the instruction takes 3ns.
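
The rounding-up in that last example can be written out as a short Python sketch. The function name is invented here, and times are kept as integer picoseconds to avoid floating point surprises:

```python
def clocked_latency_ps(work_ps, clock_period_ps):
    """Round an operation's real latency up to a whole number of ticks."""
    cycles = -(-work_ps // clock_period_ps)  # ceiling division on integers
    return cycles * clock_period_ps


work_ps = 2100    # the 2.1 ns instruction from the example
period_ps = 1000  # the 1 ns clock
charged = clocked_latency_ps(work_ps, period_ps)
print(charged, charged - work_ps)  # time charged, and the part that is idle
```

With these inputs the instruction is charged 3000 ps, so 900 ps of every such instruction is waiting for the next tick.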

Also imagine a fully async. computer. No need for a new motherboard or even changing settings in the BIOS when new and faster RAM chips are available - the system will automatically adapt.

I think that we will see more and more async. parts in the years to come. But I don't know if everything is going to be asynchronous.

Re:Asynchronous vs. synchronous computing by Trejus · 2001-09-15 04:29 · Score: 2, Interesting

Also imagine a fully async. computer. No need for a new motherboard or even changing settings in the BIOS when new and faster RAM chips are available - the system will automatically adapt.

Now I'm not an engineer, but in the article it mentioned that it was important to have wires and gates connected in a special manner so the data arrives in the proper order. It seems to me that it would make the microprocessor more dependent on the hardware and not less so. Maybe this wouldn't be a problem if all of your RAM was the same speed, but it could cause a problem if you had one 100MHz SIMM and one 133MHz SIMM. I would think that the information coming from the 133 could screw things up. Can anyone clarify this for me?

--
"To save the planet, I had to go to the worst spot on Earth, and that was Philadelphia." -- Sun Ra
Re:Asynchronous vs. synchronous computing by Anonymous Coward · 2001-09-15 05:52 · Score: 0

100MHz RAM is memory rated to input/output data at 100MHz or lower, anything faster would present a risk of frying the memory. same for PC133 versus PC150 and so on. The memory doesn't "go that fast" per se, it's a matter of the memory chips overheating or not.
Re:Asynchronous vs. synchronous computing by isj · 2001-09-15 06:47 · Score: 1

You are right that it depends a lot on how you implement it.

If the RAM delivers data in a serial manner (one bit at a time on one wire), faster RAM would definitely cause problems because the CPU would not know how to distinguish the individual bits ... unless the RAM chip generates a clock on a separate wire, which some RAM chips do.

On the other hand if the data bus is e.g. 33 wires = 32 data wires and one "handshake" wire, the protocol between the CPU and the RAM chip could be:
CPU -> "give me the contents of address 0x38762A63"
CPU then waits for the handshake wire to go high
The RAM chip sees the address, puts the contents on the data wires, and then sets the handshake wire high.
And then the CPU can read the data.

The above asynchronous protocol does not depend on the speed of the RAM chip. The RAM chip could be a future high-speed "zero-latency" chip, or a slow flash-ram chip. The CPU does not need to know.
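
The protocol above can be mimicked in software as a rough sketch. This is only an analogy under stated assumptions: the `AsyncRam` class and `cpu_read` function are invented for illustration, a `threading.Event` stands in for the handshake wire, and `time.sleep` stands in for the chip's access time.

```python
import threading
import time


class AsyncRam:
    """Hypothetical RAM model: data wires plus one handshake wire."""

    def __init__(self, contents, latency_s):
        self.contents = contents            # address -> stored word
        self.latency_s = latency_s          # how slow this particular chip is
        self.data_wires = None              # the 32 data wires
        self.handshake = threading.Event()  # the 33rd wire

    def request(self, address):
        """CPU puts an address on the bus; the RAM answers when it is ready."""
        self.handshake.clear()

        def respond():
            time.sleep(self.latency_s)                # chip-specific delay
            self.data_wires = self.contents[address]  # drive the data wires
            self.handshake.set()                      # raise the handshake wire

        threading.Thread(target=respond).start()


def cpu_read(ram, address):
    """Issue a request, wait for the handshake wire, then sample the data."""
    ram.request(address)
    ram.handshake.wait()
    return ram.data_wires


memory = {0x38762A63: 0xDEADBEEF}
fast_ram = AsyncRam(memory, latency_s=0.0)
slow_ram = AsyncRam(memory, latency_s=0.05)
print(hex(cpu_read(fast_ram, 0x38762A63)), hex(cpu_read(slow_ram, 0x38762A63)))
```

Note that `cpu_read` contains no timing constant at all; the same code works for both chips, which is the point being made about not needing to know the RAM speed.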

There are problems with this too. The protocol is sort of request-reply / lock-step. And how do multiple devices share the same bus? And and and...
No one said it was easy :-)
Re:Asynchronous vs. synchronous computing by mrogers · 2001-09-16 23:41 · Score: 1

I have previously read (forgotten where) that in theory async. computers will always be slower than sync. computers.
I don't think the advantage of async is a direct increase in speed, but rather a decrease in die size (because the clock signal doesn't have to be propagated to all parts of the chip), which leads to a decrease in power requirements and allows the chip to operate faster without overheating.

Sun has made a prototype by renoX · 2001-09-15 02:00 · Score: 1

They have a press release, see here: http://research.sun.com/features/async/

(I'm sorry, I can't use HTML: the lameness filter doesn't want to allow the posting otherwise.)

I imagine the "perfect" laptop:
- an OLED screen (no need for backlighting)
- an asynchronous processor (low power)
- no HDD, but plenty of MRAM (this RAM is persistent)

Old news? by NoMercy · 2001-09-15 02:14 · Score: 2, Informative

The AMULET group at Manchester University have been developing this for years based on ARM cores.

http://www.cs.man.ac.uk/amulet/index.html

Reliability by numo · 2001-09-15 02:17 · Score: 3, Interesting

Well, I think that the reason the async chips are not being used is quite simple - a clocked system is much easier to design and verify. You know how long before and after a clock edge your signal needs to be there to be recognised. You know that if these constraints match across your system, it will work. Yes, this makes the system as fast as its slowest link - some circuits operate near their limits, some are actually wasting the time. But it works. An asynchronous design would be a pure hell to debug - that's probably why the industry doesn't (yet) mess with it.

BTW, does anybody here remember analog computing? A bunch of cleverly connected operational amplifiers? These things were asynchronous, just as mother nature is. If you can get the physics to work for you, bingo - compare the time nature needs for raytracing a complex scene with that of a digital model :-) The only drawback is that most of us prefer a slow digital model of a thermonuclear reaction and similar problems...

Re:Reliability by Aaron+Denney · 2001-09-15 05:47 · Score: 1

Just the opposite actually -- it is usually easier to debug and design. You do have synchronization to avoid reading things at the wrong time, but all the synchronization is local, rather than tied to a global clock pulse, so you only need to verify things at the boundaries, not chipwide at once.

If some unit takes a bit long to respond, you don't get a glitch, as you would in synchronous designs, but instead the unit it is talking to slows down a bit.

Synchronous and Asynchronous are really misnomers. Better terms would be "globally synchronized" and "locally synchronized".
Re:Reliability by zzzz23 · 2001-09-15 07:30 · Score: 1

Well, I think that the reason the async chips are not being used is quite simple - a clocked system is much easier to design and verify. You know how long before and after a clock edge your signal needs to be there to be recognised. You know that if these constraints match across your system, it will work. Yes, this makes the system as fast as its slowest link - some circuits operate near their limits, some are actually wasting the time. But it works. An asynchronous design would be a pure hell to debug - that's probably why the industry doesn't (yet) mess with it.

Not so. In fact, one of the greatest problems with clocked boolean design is the interference caused by all the clocks on the chip. Fabrication will routinely result in broken chips, forcing multiple redesigns and long development cycles.

Tremendous resources are dedicated to getting around this problem. Also, you can't really just change the design 'a little bit', as doing so results in more interference issues. Want to add a new unit to a clocked boolean logic chip (a new cache, 3d unit, new pipeline, etc)? Sure you can do it, but it will require fundamental redesign as adding those clocks associated with the new unit will interfere with other clocks on the chip, and other clocks will interfere with your new unit. The fact that they all have to fire off simultaneously, generating electromagnetic interference, is a real needle in the eye for chip designers.

With well thought out async, all you have to do (more or less) is add the unit to the design. The 1st fab should work, no redesign cycle required. You can add cache memory or whatever and as long as the design is logically valid you will have a functioning chip in a few days' time (as long as it takes to fab the chip). Try that with synchronous logic.
Re:Reliability by numo · 2001-09-15 19:42 · Score: 1

The fact that they all have to fire off simultaneously, generating electromagnetic interference

Hmm... But in an async setup they may fire simultaneously - you simply don't know, it's up to the statistics. I fear that in chips that complex you will end up with a system that works by pure coincidence - some picosecond fluctuation somewhere and you get one glitch per 1000 hours of operation.

It is probably not that simple and, as someone wrote, the more proper name would be globally or locally synced. I fully agree with you that there is no reason to tie the bigger units to a single universal clock. But I think that on the lower levels you can get a more reliable design by using the traditional approach.

I have no experience in chip design (so I don't know the specific problems of trying to stuff tens of millions of transistors onto a square inch), but I have designed some non-trivial circuits.
Re:Reliability by nobozoz · 2001-09-16 04:40 · Score: 1

I suspect that the reason there aren't many (or any) commercial clockless logic designs has more to do with:
1. Lack of availability of design and synthesis tools.
2. Lack of engineers trained and experienced in the use of clockless logic.
3. Lack of multi-sourced, high volume production of clockless logic components.
4. Lack of a clear economic incentive to abandon clocked logic in favor of clockless logic.

I would dearly love to be able to experiment hands-on with a clockless CPU myself, but the cost and difficulty of obtaining just one such device is more than I can justify personally.

Re:what about other problems? by Bryan+Andersen · 2001-09-15 02:20 · Score: 2

Yep, you have to rethink a lot. It's possible. I expect we'll see async processors first show up in embedded situations where all parts of the system are integrated on one chip.

Busses can be made asynchronous. Handshaking is the key. New strategies will be needed, but people are bright so I feel they will be developed. With a little thinking I've sketched out a packet type asynchronous bus in my head. It would work nicely for up to a meter or so. Longer lengths would be slower than shorter ones. One thing I feel may work best is for any signal/data that needs to travel significant distances to go into synchronous transmission. Otherwise you end up adding in delays from the back handshake signals.

I remember some of the first articles in SIGARCH and how they sparked my interest. I've always felt that async was the way to go when you don't know how long an operation will take. I'm happy to see it's still getting research dollars.

This isn't new... by Anonymous Coward · 2001-09-15 02:21 · Score: 1, Informative

The old CDC supercomputers, and the Cray 1, were clockless. They were designed by that inspired madman, ...

The reason he built them clockless is that the propagation time to get the clock signal across the machines (which were fairly large) would have significantly slowed the performance. Instead, all of the wires are the right length so that all of the signals arrive at their destination at the right time. I've been told horror stories by ex-CDC salesmen that when they installed new machines, they would spend days or weeks clipping wires to different lengths and debugging hardware failure modes until it all ran smoothly.

Cray also solved the heat dissipation problem by designing the computer to run hot. This meant that when you turned it on it didn't work reliably until all of the ceramic boards heated up (and expanded) so that the connections were solid, etc.

F-ing brilliant.

Re:This isn't new... by Anonymous Coward · 2001-09-15 06:16 · Score: 1

That's not true, the Cray 1 was clocked at 80MHz. Cray did use exact wire lengths in the way you say, but he used the wires effectively as a buffer. It was most definitely not a clockless design (nor were the older CDC machines).
Re:This isn't new... by Anonymous Coward · 2001-09-15 12:20 · Score: 0

Your claim does not jibe with Cray's advertising claims.

Cray claimed that if your code was vectorized perfectly and you avoided things like floating point divide, branching, etc. you could get one FLOATING POINT RESULT per CLOCK PERIOD.

This was a speedup of about an order of magnitude over unvectorized runs.

I doubt that Cray deliberately made his computers run hot, but it was a little known fact that he was as much a master of thermodynamics design as of electronics. He could build computers more compact than his competition without them melting down. At the Cray's speed, the speed of light was a limitation, so the computer had to be as compact as possible.

Old news by Anonymous Coward · 2001-09-15 02:24 · Score: 0

- The Amulet Group at The University Of Manchester
- have a clockless ARM

Pocket watches were invented centuries ago.

Re:what about other problems? by Ravenscall · 2001-09-15 02:24 · Score: 2

First off, this is pure conjecture, IANAME.

Okay, the way I suppose this would work, considering that Intel had developed a chip that was compatible with the Pentium series, would be an asynchronous design, with some kind of logic translator to communicate with the bus. Yes, at first you would be wasting processor power, but eventually, the bus technology would catch up (See ISA to EISA to VLB to PCI to AGP and on...). As for the RAM, it could either run on an independent clock-bus, or, I do not see why it would be a problem to develop asynchronous RAM if they have the technology for the chips. Also, the article states that the P IV utilises some asynchronous components, maybe that is part of the reason for the push to use RDRAM with it?

--
You say you want a revolution....

CPU Primer by ThePurpleBuffalo · 2001-09-15 02:26 · Score: 2, Interesting

When designing a "conventional" CPU, you can have a clock that essentially drives events and data movement.

If you design a multiplier circuit using a bunch of full-adders, you'll notice that the output takes a long time to settle. In fact, depending on what numbers you are multiplying together, the circuit may take more or less time before the output settles.

You can always determine the worst-case scenario for a multiply operation to settle. If the multiply takes longer than any other operation, then the multiply op is the "critical path".
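
To make the data-dependent settling time concrete, here is a small Python sketch. It uses a single ripple-carry adder as a simpler stand-in for the multiplier described above, and treats the longest carry chain as a proxy for settling time; the function name and that proxy are assumptions made for illustration.

```python
def longest_carry_chain(a, b, bits=32):
    """Longest run of bit positions a single carry has to ripple through."""
    longest = run = 0
    carry = 0
    for i in range(bits):
        x = (a >> i) & 1
        y = (b >> i) & 1
        propagate = x ^ y                 # this bit passes a carry along
        generate = x & y                  # this bit creates a carry itself
        carry_out = generate | (carry & propagate)
        if carry_out and not generate:    # an incoming carry rippled through
            run += 1
        elif carry_out:                   # a fresh carry starts here
            run = 1
        else:                             # no carry leaves this bit
            run = 0
        longest = max(longest, run)
        carry = carry_out
    return longest


print(longest_carry_chain(0xFFFFFFFF, 1))   # worst case: carry crosses every bit
print(longest_carry_chain(1, 1))            # carry dies after one position
print(longest_carry_chain(0b0101, 0b1010))  # no carries at all
```

A clocked design has to budget for the first case on every addition, even though most operand pairs look more like the other two.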

A chip's frequency is the inverse of the period of the critical path (in most cases). So, if it's possible to do 100 million critical path operations in a second, then your machine can run at 100MHz.

What the article is hinting at is the amount of wasted time because everything is (currently) done on the clock cycle. Allow me to illustrate: Let's say a multiply takes 5 seconds, but an add only takes 1. A fixed clock rate (or having a clock at all) forces that add instruction to take the extra 4 seconds, and use it for nothing. Wasted computer time.
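
The same illustration as a Python sketch, using the made-up latencies from the paragraph above (multiply = 5 units, add = 1 unit). The function names and the optional per-operation handshake cost are assumptions, not anything the article specifies.

```python
LATENCY = {"mul": 5, "add": 1}  # time units from the example above


def clocked_time(ops):
    """Every op is charged one period, set by the slowest (critical) op."""
    period = max(LATENCY.values())
    return len(ops) * period


def self_timed_time(ops, handshake=0):
    """Each op takes only its own latency plus an assumed handshake cost."""
    return sum(LATENCY[op] + handshake for op in ops)


program = ["mul", "add", "add", "add"]
print(clocked_time(program),
      self_timed_time(program),
      self_timed_time(program, handshake=1))
```

For this four-instruction program the clocked total is 20 units against 8 self-timed, or 12 once a one-unit handshake is charged per operation. The sketch simply assumes completion can be detected, which is exactly the hard part.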

Now, the reason people are skeptical is because there is no efficient way to tell if a multiply operation (or any other operation) has actually completed and the outputs have settled.

Incidentally, if this interests you, go grab a free program called "diglog" or "chipmunk". The software (for linux/windows) allows you to simulate almost any digital circuit.

Another thing to keep in mind about current CPUs is the way they execute an instruction. Every instruction is actually made of smaller instructions (called microinstructions). Microinstructions take one clock cycle each, but there is an arbitrary number of microinstructions for each larger instruction. The microinstructions perform the "fetch execute cycle" - the sequence that decodes the instruction, grabs the associated data, performs the desired task, and goes back for more.

If you're interested in designing a CPU yourself, go grab a book by Morris Mano called "Computer System Architecture". With that book and DigLog, it's pretty easy, but it takes a long time.

Re:CPU Primer by Anonymous Coward · 2001-09-15 03:27 · Score: 0

Are you new to CPU design?

There's a solution to one problem mentioned by Ungrounded+Lightning · 2001-09-15 02:34 · Score: 4, Informative

if there is no mass market for asynchronous chips, there's little incentive to create tools to build them; if there are no tools, no chips get produced. The same problem applies to the development of chip-testing technologies. Without any significant quantity of asynchronous circuits to test, there is no market for third-party testing tools.

But at least here there's an accidental solution - the Cross-Check Array.

Conventional clocked chips can be tested by scan: A multiplexer is added to the flop inputs, and a test signal turns them into one or more long shift registers. The old state of the flops is shifted out for examination while a new state is shifted in to start the next phase of the test. This only works when the flops to be strung together are all part of a common clocking domain.

The Cross-Check Array is more like a RAM. A grid of select lines and sense lines are laid down on the chip, with a transistor at each intersection. The transistor is undersized compared to those of the gates, forming a small tap on a nearby signal - or it can inject a signal if the sense line is driven rather than monitored. Select drivers are laid down along one edge of the chip, sense amplifiers/drivers along another.

This approach does not depend on the flip-flops to be active participants in the observation process (though it can still force their state), and thus can observe signals in asynchronous as well as synchronous designs. It also gives observability of testpoints in combinatorial logic without the addition of extra flops. Compared to a fullscan design it gives much greater observability and takes about half the silicon-area overhead.

--
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way

Programming difference? by andika · 2001-09-15 02:53 · Score: 2, Interesting

Does programming for a clockless chip differ from programming for a synchronous one? Every link I tried to follow only explains the differences in design, speed, or power consumption.

Re:Programming difference? by 2nd+Post! · 2001-09-15 04:17 · Score: 5, Informative

It *can* be different, but that's really a function of the state of compilers and languages adapted for an async system. It needn't be different at all.

Disclaimer, I was a student at Caltech, and I took 1 async VLSI course, and not very in depth at that.

One way to go about it is to make an async CPU that externally looks like a sync CPU; then you drop it into just about any system, and it works. Speed is wholly dependent upon VCore settings, cooling solutions, and drive strength, I think, though of course there are always gate and transistor performance bottlenecks. Programming and using such a chip would be no different than any other CPU.

Another method is to have a partially async system, in which the CPU, some of the motherboard, and the ram interface is async because of how fast they operate; go ahead and clock something like PCI, USB, etc, because those operate slow enough that the effort of async isn't worth it. This solution is just a question of degrees, really, on how much of the system is async and how much isn't.

Now, that aside, there's the software aspect; how do you program an async system? At the lowest level it resembles, slightly, multi-threaded programming, in which you have multiple threads equating to the multiple function units, execution units, decoders, and stages in the pipeline, etc.

You shuttle data around and wait for acknowledges that the data has been processed before you continue shuttling and processing data. You can synchronize around stages or functional units by making other stages or units dependent upon the output of said unit; instead of waiting for a clock to signal the next cycle of execution, you wait for an acknowledge signal.

To be a little more clear, at the ASM level you would mov data, wait for an ack before another mov data, wait for an ack before sending an instruction, etc. Due to the magic of pipelining, the CPU doesn't have to be finished before you can start stuffing the pipeline, and because it's asynchronous, that means you can actually feed in data as fast as the processor can receive it, even if the back end or the core is choking on a particularly nasty multiplication.

So you're feeding data at a furious rate into the CPU, while the CPU is processing prior instructions. If the front end gets full, or whatnot, it fails to signal an ack, so whatever mechanism is feeding data in (ram, cache, memory, whatever) pauses until the CPU can handle more data.

The core, independent of the front end, is processing the data and sending out more instructions, branches, setting bits. With multiple functional units, each unit can run at its own speed. So if all it's doing is adds, checking conditionals, etc, it may be able to outrun the data feed mechanism, since an add can be completed in one pipeline unit, while data always has to wait upon a slower storage mechanism.

Or if the execution units are waiting because it's doing a square root or something, it just tells the prefetch or whatever front end units to wait, because it cannot handle another chunk of data or instruction yet, which propagates back to the data feed to wait as well.

When it finishes with its current instruction a ready signal would get propagated back through all the stages or so, and then more data would get fed in.

So at the lowest levels it would start to resemble writing threaded code, in which you have to wait for the thread to be ready, to be awake, to be active before you send data, and if the thread is asleep, you wait until it awakes, or something like that.
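
The threaded analogy can be sketched directly. This is only a software model, not how any real async chip is programmed: each pipeline stage is a thread, the buffer between stages is a bounded `queue.Queue`, and a `put()` that blocks on a full queue stands in for a missing ack. The function names and the `depth` parameter are made up for illustration.

```python
import queue
import threading

def stage(work, inbox, outbox):
    """One pipeline stage: take an item, process it, pass it on.
    put() on a full outbox blocks, playing the role of a missing ack."""
    while True:
        item = inbox.get()
        if item is None:          # sentinel: shut down and tell the next stage
            outbox.put(None)
            return
        outbox.put(work(item))

def run_pipeline(items, stages, depth=1):
    """Push items through the stages; depth is the buffer size between stages.
    Items must not be None, since None is used as the shutdown sentinel."""
    queues = [queue.Queue(maxsize=depth) for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    results = []

    def drain():
        # Back end: collect results until the sentinel arrives.
        while True:
            out = queues[-1].get()
            if out is None:
                return
            results.append(out)

    threads.append(threading.Thread(target=drain))
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)       # blocks whenever the front end is full
    queues[0].put(None)
    for t in threads:
        t.join()
    return results

print(run_pipeline([1, 2, 3], [lambda x: x + 1, lambda x: x * 2]))  # [4, 6, 8]
```

Nothing here waits on a clock: a slow stage simply stops accepting input, and the stall propagates back to the feeder one full buffer at a time.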

Multiprocessor async is similar, except that each CPU is just another thread, and if there's a hardware front end that decides which CPU to send instructions to, then it's really just a function of stuffing instructions into the least loaded or fastest running CPU; each CPU could, more or less, look like just another functional unit, and clusters pretty well because they all run asynchronously, meaning you don't have to do anything particularly special for load balancing; just send the data to the first one who signals ready, or if there are multiple cpus ready, read a status register to see which is more empty or whatever.
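
The "send it to the first one who signals ready" idea can be simulated in a few lines. This is a sketch under made-up assumptions: job costs and unit speeds are arbitrary numbers, and a heap of ready times stands in for the hardware ready signals.

```python
import heapq

def dispatch(job_costs, unit_speeds):
    """Assign each job to whichever unit signals ready first.
    job_costs: work per job; unit_speeds: work done per time unit.
    Returns (unit index chosen for each job, time the last job finishes).
    Both inputs are hypothetical figures, not measurements."""
    # Heap of (time the unit becomes ready, unit index); all ready at t=0.
    ready = [(0.0, i) for i in range(len(unit_speeds))]
    heapq.heapify(ready)
    assignment = []
    finish = 0.0
    for cost in job_costs:
        t, unit = heapq.heappop(ready)        # first unit to signal ready
        done = t + cost / unit_speeds[unit]
        assignment.append(unit)
        finish = max(finish, done)
        heapq.heappush(ready, (done, unit))
    return assignment, finish

# Unit 0 is twice as fast as unit 1, so it ends up taking three of four jobs.
print(dispatch([4, 4, 4, 4], [2.0, 1.0]))  # ([0, 1, 0, 0], 6.0)
```

No load balancer has to know the speeds in advance; the faster unit just reports ready more often.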

Apologies if I made some errors, especially to those who know much more than I; this is a 4 year old interpretation of my async vlsi class =)

--

GPL Deconstructed
Re:Programming difference? by ballista · 2001-09-15 04:31 · Score: 1

For general programming there will be no difference. It's when you try to optimize that differences appear. Optimization in async logic is very difficult. In synchronous logic you need to optimize around the superpipeline. You reorder your instructions so data is available to later instructions without causing it to stall the pipeline. Since you know exactly how long each instruction takes in cycles you can schedule correctly, even force schedules with NOP instructions.
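
As a rough illustration of forcing a schedule with NOPs on a clocked machine, here is a small sketch. It assumes a hypothetical in-order pipeline that issues one instruction per cycle; the instruction tuples, register names, and latency table are invented for the example.

```python
def pad_with_nops(program, latency):
    """program: list of (op, dest, sources).
    latency: cycles after issue until an op's result may be read.
    Returns op names with NOPs inserted so nothing reads a register early."""
    ready_at = {}   # register -> first cycle its value may be read
    out = []
    for op, dest, sources in program:
        need = max((ready_at.get(r, 0) for r in sources), default=0)
        out.extend(["nop"] * max(0, need - len(out)))   # wait out the latency
        ready_at[dest] = len(out) + latency[op]
        out.append(op)
    return out

latency = {"mul": 3, "add": 1}
program = [
    ("mul", "r1", ["r2", "r3"]),
    ("add", "r4", ["r1", "r5"]),   # needs r1, which takes 3 cycles
]
padded = pad_with_nops(program, latency)
print(padded)                                   # ['mul', 'nop', 'nop', 'add']
print([op for op in padded if op != "nop"])     # ['mul', 'add']
```

The second print is the same stream with the NOPs stripped, which is all an async pipeline would need, since the dependent instruction simply waits for the handshake.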

In async programming NOP instructions can't be implemented. They don't really make sense anyway. The pipeline in an async chip is technically always stalling. So you will need to learn many more consequences of the instructions you choose. At times the data has an effect on the speed of the instruction.

Using logic like DCVSL a 32 bit shift operation would finish faster if the data was a binary 1 versus any number that used multiple 1's in its representation. This makes optimization rather interesting. For example you can perform byte operations faster than word operations.

In the async processor I did for my thesis, we simply ran the optimized synchronous code throwing out the NOP instructions. The result was faster execution even if the code was not optimized for the correct processor.
Re:Programming difference? by rnd() · 2001-09-15 06:10 · Score: 1

could you sample the input and dynamically optimize based on the time the chip is taking to process?

--
Amazing magic tricks
Re:Programming difference? by isj · 2001-09-15 06:58 · Score: 1

> So at the lowest levels it would start to
> resemble writing threaded code,

In case of a pipeline stall, would SMT be advantageous?

To me it looks like asynchronous non-CPU devices, pipelined CPU, and SMT would be an ideal combination.
Re:Programming difference? by Anonymous Coward · 2001-09-15 07:00 · Score: 0

Are you kidding?

How many Slashdotters actually program "for" a chip anyway? At best, most of the coding that gets done is "programmed for an OS". That may be an OS that only runs on one chip type, but that's a different flamewar.

Based on the threads whenever software gets discussed, folks here have largely bought into the high-level==good theory of software completely. Fundamentally, for a high-level language only the compiler needs to know that the chip is anything different.

Speaking as one doing embedded/DSP programming, the vast majority of folks here have no clue about environments in which (too_big || too_slow)==wrong.

Oooh, don't you just hate it when ISO C is too abstract to let you properly utilize the CPU's features?

Eric
Re:Programming difference? by Anonymous Coward · 2001-09-17 08:24 · Score: 0

So you're feeding data at a furious rate into the CPU, while the CPU is processing prior instructions. If the front end gets full, or whatnot, it fails to signal an ack, so whatever mechanism is feeding data in (ram, cache, memory, whatever) pauses until the CPU can handle more data.

The main problem with pipelines these days is that they get longer and longer, which in turn makes prefetching of data and instructions harder and harder. In the worst case the processor makes so many mispredictions that it has to flush its pipeline and caches very often, which degrades its performance a lot.
The solution seems to be software pipelining. I.e. you make the pipeline much shorter to ease prefetching, but also make it fatter to allow many instructions to be carried out simultaneously by different processing cores. This, of course, adds the problem of figuring out which instructions are dependent on each other data-wise so you can factor the code into independent parts that can be run all at once. When solved you basically have made a software driven pipeline.
IMO it will be very interesting to see optimization theory evolve over the next few years to handle this. The incentive is strong since software pipelining is useful for both async. operation and multicore processors. IBM is already quite serious about its BlueGene multicore techno and colorForth/25X faces the same problems. From there it shouldn't be too hard to produce compilers that handle async. operation quite easily. Just hand time-to-complete-some-instruction data to the optimizer and off you go :-)

klasa at hl lu se
Re:Programming difference? by 2nd+Post! · 2001-09-17 09:08 · Score: 2

The interesting part about asynch CPUs is that they, duh, aren't clocked...

So in the case of a pipeline flush (and accompanying stall), it doesn't take N clocks (whatever the pipeline depth is), it goes as fast or as slow as the flush mechanism reset takes...

If done well, then a pipeline flush can operate at thousands of times faster than the normal operation of the pipeline because, well, you're just dumping data without doing any work; raise the proper bits and reset signals, and the whole pipeline dumps as fast as it can, while the front end feeder just slows down a bit (without stopping) in feeding data into the pipeline.

Above assembly, btw, the programming language for the CPU doesn't have to look like SMT; it can, but it doesn't have to.

--

GPL Deconstructed

Efficiency with growing clock speeds by Jimmy_B · 2001-09-15 02:58 · Score: 2

[from article] But after a point, cranking up the clock speed becomes an exercise in diminishing returns. That's why a one-gigahertz chip doesn't run twice as fast as a 500-megahertz chip.

Wrong. A 1GHz chip doesn't run twice as fast as a 500MHz chip because of pipelining, and because the support infrastructure in a typical PC can't handle a 1GHz chip well, so it spends a lot of time waiting for hard disk access and memory. Eliminating the clock isn't going to make the heads on a hard disk move any faster. The real benefit is that an idle component can't be a bottleneck anymore.

Re:Efficiency with growing clock speeds by Defiler · 2001-09-15 03:19 · Score: 1

The point they were trying to make with that comment is this:
Even if you removed all other bottlenecks, a 1GHz version of a 500MHz CPU (with no other architectural improvements) will not perform twice the work of the 500MHz version due to clock overhead.
Re:Efficiency with growing clock speeds by ericvids · 2001-09-15 05:58 · Score: 1

If you think about it, the article is actually correct. The latency within the wires themselves also prevents a single central clock from timing the whole system accurately.

Even with a hypothetical chip that doesn't incur speed decreases due to pipelining, the clock will still end up nearer to some parts of the chip than to others, which will result in latency at the end of the pipeline.

Hence if you've got a 500Mhz chip with 2 stages and the clock physically placed near stage 1, then stage 1 of the pipeline will run at 500MHz, stage 2 will also run at 500MHz but with some latency, so the two-stage pipeline will complete an instruction very slightly over 2 cycles. Add more stages, you'll get a bigger effect at the end. And as clock speeds go faster, you'll eventually hit the ceiling -- the latency might actually be as fast as a single cycle itself.

And having multiple clocks to offload the work (and to bridge the gap from the other stages) can only do so much -- eventually it becomes an issue of timing all these clocks together. You'll eventually wish to remove the clock altogether. =)

As for I/O with the rest of the system, it's not really an issue here -- what is being discussed is the processor's raw speed. I/O bottlenecks are already being solved via intelligent caching, and for more improvement we will probably have to wait for a totally new architecture.

--
Pet peeve: Profane people propagating perfunctory pedantry.
Re:Efficiency with growing clock speeds by Jimmy_B · 2001-09-15 08:30 · Score: 2

Hence if you've got a 500Mhz chip with 2 stages and the clock physically placed near stage 1, then stage 1 of the pipeline will run at 500MHz, stage 2 will also run at 500MHz but with some latency, so the two-stage pipeline will complete an instruction very slightly over 2 cycles. Add more stages, you'll get a bigger effect at the end. And as clock speeds go faster, you'll eventually hit the ceiling -- the latency might actually be as fast as a single cycle itself.
Latency doesn't affect the time required to complete the instruction, only the time at which it is executed. If the clock reaches the second pipeline stage late, the time required to complete that pipeline step is latency+calculation time. If that's more than one cycle, the chip is clocked higher than it can run, period. Anyway, there's a fairly obvious solution to clock latency in pipelines. Put the start of the pipeline near the registers, and the end of the pipeline near the registers, then a U shape for all the ones in the middle; since a stage only needs to interact with the ones before and after it, which are physically adjacent, the effective latency is small.
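
The "latency+calculation time must fit in one cycle" rule is easy to write down. This is a back-of-the-envelope sketch; the nanosecond figures are invented, and real timing analysis involves setup and hold margins that are ignored here.

```python
def stage_meets_timing(clock_mhz, latency_ns, calc_ns):
    """True if clock latency plus calculation time fits in one cycle."""
    period_ns = 1000.0 / clock_mhz
    return latency_ns + calc_ns <= period_ns

def max_clock_mhz(latency_ns, calc_ns):
    """Highest clock (MHz) at which the stage still fits in one cycle."""
    return 1000.0 / (latency_ns + calc_ns)

# Hypothetical stage: 0.5 ns of clock latency, 1.5 ns of logic.
print(stage_meets_timing(400, 0.5, 1.5))   # True  (2.5 ns period)
print(stage_meets_timing(1000, 0.5, 1.5))  # False (1.0 ns period)
print(max_clock_mhz(0.5, 1.5))             # 500.0
```

With these made-up numbers the stage tops out at 500 MHz no matter how fast the gates elsewhere on the chip are.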

No changes required. by seizer · 2001-09-15 03:02 · Score: 2

They're talking about removing the internal CPU clock, which in effect, isn't really a clock at all. It's just something which ticks at regular intervals, and lets you do a number of things, such as synchronize instructions, pipeline, cache read/writes, and all the other stuff I forgot from CS 101.

A computer's clock (as in date, time, etc) is on another part of the motherboard, and runs (correct me if I'm wrong) off the CMOS battery. That'll always be a "clock" in the sense we understand.

Re:No changes required. by bugg · 2001-09-15 04:04 · Score: 2

No, a computer does indeed know what time it is based on a clock- it's the same way digital watches know what time it is (counting pulses). The answer? Computers will still have some sort of calibrated oscillating circuit in them, but they won't be synchronizing processor activity.
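
Counting pulses really is all there is to it. A minimal sketch: 32,768 Hz is the usual watch-crystal frequency, while the 20 ppm error in the usage lines is just an example figure, not a spec for any particular board.

```python
CRYSTAL_HZ = 32_768  # common watch-crystal frequency (2**15 pulses per second)

def elapsed_seconds(pulse_count, freq_hz=CRYSTAL_HZ):
    """Whole seconds represented by a count of oscillator pulses."""
    return pulse_count // freq_hz

def drift_seconds_per_day(error_ppm):
    """Seconds gained or lost per day by an oscillator that is
    off by error_ppm parts per million."""
    return 86_400 * error_ppm / 1_000_000

print(elapsed_seconds(CRYSTAL_HZ * 60))   # 60
print(drift_seconds_per_day(20))          # 1.728
```

None of this needs the CPU itself to be clocked; the counter only has to be read now and then.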

--
-bugg
Re:No changes required. by praedor · 2001-09-15 04:24 · Score: 1

Except every frickin' mobo clock (date/time) I have ever come across can't keep proper time worth a damn. Why is it I could buy some cheap-ass cheesy Care-Bear watch and it is assured of keeping better time, by a long stretch, than any mobo clock? Why is that?!

--
In Bushworld, they struggle to keep church and state separate in Iraq as they increasingly merge the two in America.
Re:No changes required. by Anonymous Coward · 2001-09-16 04:11 · Score: 0

that's what ntpd is for.

Ideal Laptop by Vegan+Pagan · 2001-09-15 03:06 · Score: 2

And build in a microphone and make its screen touch sensitive. That way you can get rid of the keyboard, trackpad and hinge and make it a single, consolidated unit.

Mix&Shake by roman_mir · 2001-09-15 03:19 · Score: 2

It would not be economically viable to try and push this new type of processor into a market dominated by traditional synchronized processors and computer equipment; however, it seems that asynchronous microprocessing can still be used inside traditional computers if it is mixed together with synchronized systems. Imagine a computer that uses a synchronous bus just the way it does now but has an asynchronous co-processor, which is communicated with by a special type of synchronous CPU that allows certain operations to be carried out asynchronously. If, for example, a matrix multiplication needs to be done, the normal CPU would require a number of clock cycles that is proportional to the number of multiplications within the matrix over the number of processor pipes allocated for this task. If it can be proven that asynchronous processing can do the same job three times faster than a 'normal' CPU takes, why can't the 'normal' or traditional CPU ask the asynchronous co-processor to do the task for it? The problem is of course asynchronous data retrieval and storage. Probably a co-processor could actually be a co-processor card with its own asynchronous memory bank on board that can be later synchronized with the traditional memory banks. Such a system should not be too difficult to implement, since it could use a PCI slot for example. Soon a computer would become less and less synchronous, with the synchronous parts synchronizing many asynchronous devices.

--
You can't handle the truth.

Asynchronous is to impress your gf by Anonymous Coward · 2001-09-15 03:35 · Score: 0

Asynchronous VLSI is one of the most exotic yet hardest subjects around. Caltech's Alain Martin is perhaps one of the most popular people in the field. His group has designed an asynchronous MIPS 3000. It works pretty beautifully and is faster.

My gf is a device engineer and she really fell for me when she learnt that I know asynchronous VLSI design. Coool .... academics also earns chicks :)

Re:Asynchronous is to impress your gf by Anonymous Coward · 2001-09-15 14:16 · Score: 0

I bet your gf is really ugly.

You guys are way behind by pvera · 2001-09-15 03:50 · Score: 1

I read that article thru a link at the bottom of C-net's news.com a few days ago. Why bother /. it? Are you implying /. is the only place we look for news?

Gee...

--
Pedro
----
The Insomniac Coder

Re:You guys are way behind by Lars+T. · 2001-09-15 08:01 · Score: 1

No, /. may not be the only place "we" look for news (not that I can remember anybody saying that, esp. not the original blurb), but it's a damn good place to discuss the news.
Gee indeed.

--
Lars T.
To the guy who modded me down from perfect to terrible Karma - Apple haters still suck
Re:You guys are way behind by Anonymous Coward · 2001-09-15 09:12 · Score: 0

and you..... you are .... well.........

this is what Slashdot does, don't be such a dork.

take that.

Re:Hello! by Turd+Fergus0n · 2001-09-15 03:53 · Score: 0, Offtopic

I'm Turd Fergus0n. Funny name huh? Turd Fergus0n, remember it.

--

Yeah, that's right. Turd Ferguson. It's a funny name.

People are too stuck on MHz. by jellomizer · 2001-09-15 04:10 · Score: 1

The only way they can probably advertise the async chip is to give the MHz of the fastest segment of the chip. That, or they will actually have to advertise the other parts of the computer that determine speed. Does that mean that computers will be sold on having more cache again, or that they will actually tell you the bus speed or even the pipeline of the system? My god, this will turn computer advertising around, where a system similar to a SunBlade1000 with 8 megs of cache will actually be advertised as faster than a P4-like system with 1/2 meg of cache. Will wonders never cease.

--
If something is so important that you feel the need to post it on the internet... It probably isn't that important.

urban rumors... by Anonymous Coward · 2001-09-15 05:27 · Score: 0

One thing to think about is why in a freeway things move slower when there is a lot of traffic and faster when there is less traffic...

In asynchronous systems, control signals need to travel at the handshake speed. In a freeway, this is the brakelight to braking reaction time. In a chip this is the complete to start signal (or stall signal). This is often called the "speed-of-light" in an asynchronous system (the fastest speed information can travel).

Synchronous systems benefit from coordination that operates globally with a path independent fast communication (the clock tree mismatch/jitter time). Clearly this is not scalable indefinitely, but it helps when practical.

In the final analysis, it will probably be true that although it appears that time is wasted in a single computational stage, the backpressure wave it will create in a pipeline is limited by the "speed-of-light" in the handshake which negates most of the advantages (since a clock is often run at the fastest speed-of-light possible and computes are repipelined to account for this). In the end, the only real savings will probably be in power area (although a chip that burns less power can run faster, this is a second order effect)...

Think about this the next time you are stuck in asynchronous traffic and how global coordination trades some inefficiencies for greater global efficiency. In fact automatic car control systems have recognized this and have proposed car clusters to improve traffic efficiency (cars would group together using wireless lans and move in a group with tighter control, but still act asynchronously with the global traffic).

As with nearly all technologies, hybrid solutions are often better than ones that are architecturally "pure". Practical systems that mix locally synchronous and globally asynchronous systems are probably the more optimal solutions for many problems. With the reverse (locally async and globally sync), backpressure waves cause losses in performance (because of loss of throughput, you can't run it totally pipeline full)....

Simplistic analysis like "time is wasted between clocks" does a disservice to /.-ers trying to understand this technology...

and it doesn't even make sense... by slew · 2001-09-15 05:37 · Score: 2

One of the first common "thinking-out-of-the-box" techniques used to crack smart cards exploited the fact that the sw was written to take different amounts of time to compute legal and illegal keys. By measuring the battery consumption, the smart card crackers could search only the space of legal keys.

No doubt this was a sw path put in by a well intentioned programmer trying to save battery life, but now all respected encryption systems recommend a "veil" strategy, where all encryption/decryption operations take the same amount of time and power regardless of the key.

In practice this means that you find out the max time and power (plus some margin) and if you are done early and without using enough power, you waste time and power to pad out the veil...

Nice thought, but this just goes to show that cryptographic systems really need to be designed by experts...

Re:and it doesn't even make sense... by randombit · 2001-09-15 06:35 · Score: 1

No doubt this was a sw path put in by a well intentioned programmer trying to save battery life, but now all respected encryption systems recommend a "veil" strategy, where all encryption/decryption operations take the same amount of time and power regardless of the key.

That's not really necessary. All you have to do is randomize the computation. For example, power analysis of a smart card doing RSA can recover the secret key, if the attacker knows what the input is (in many situations, a reasonable setting). But if you multiply (or is it exponentiate?) the input by a random number, then do the RSA op, then demask the output, poof! - PA, electromagnetic emission analysis, etc all get very very hard.
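
For what it's worth, the usual textbook form of this trick (generally called blinding) does both: you multiply the input by a random number that has first been raised to the public exponent. Below is a sketch with a toy key. The tiny primes are hopelessly insecure and only there to make the arithmetic visible, the function name is made up, and `pow(r, -1, N)` needs Python 3.8 or later.

```python
import math
import secrets

# Toy RSA key (textbook-sized, insecure; for illustration only).
P, Q = 61, 53
N = P * Q            # 3233
E = 17               # public exponent
D = 2753             # private exponent: E * D == 1 (mod (P-1)*(Q-1))

def blinded_private_op(c):
    """Compute c**D mod N without ever exponentiating c itself."""
    while True:
        r = secrets.randbelow(N - 2) + 2      # random r in [2, N-1]
        if math.gcd(r, N) == 1:               # r must be invertible mod N
            break
    blinded = (c * pow(r, E, N)) % N          # mask: multiply by r**E
    result = pow(blinded, D, N)               # equals (c**D) * r mod N
    return (result * pow(r, -1, N)) % N       # demask: multiply by r**-1

message = 65
ciphertext = pow(message, E, N)
print(blinded_private_op(ciphertext))         # 65
```

Since `r` changes on every call, the value that actually goes through the private exponentiation is different each time, even for the same input.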

Also, it can be hard to disguise your "wasting time" as being part of the computation, if the attacker can, for example, track which memory is being accessed when.

I wonder how well these clockless chips would fare against differential fault analysis; basically progressively destroying gates in the chip and looking at its output over time. Almost any chip will fail against this attack (but it requires lots of expensive equipment and a fair amount of expertise).

Smells Like Vapor Ware by nukeshi · 2001-09-15 05:55 · Score: 1

Yikes, seems a little sci-fi and bogus claims....

Of course the new Pentium 4 contains some elements of asynchronous design... all synchronous chips do! In a synchronous design, the logic between registers (article calls Flip Flops) is asynchronous. The gating factor on the amount of asynchronous logic you can place between registers in a synchronous design is a function of the clock speed and the gate speed -- the faster the gates, and/or the slower the clock speed the more logic you can place between registers. Looks like the article is about a system with a clock rate of 0 without changing gate speed, so the processing rate will be the sum delay of the asynchronous logic -- I wonder what this would be on a chip the complexity of a P4 or G4?
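
The relationship between clock speed, gate speed and the amount of logic between registers can be sketched as simple integer arithmetic. All figures below are invented, and `overhead_ps` is a stand-in for register setup and clock-to-output time rather than a number from any datasheet.

```python
def max_logic_depth(clock_mhz, gate_delay_ps, overhead_ps=0):
    """Gate levels that fit between two registers in one clock period.
    Uses whole picoseconds; overhead_ps models register overhead (assumed)."""
    period_ps = 1_000_000 // clock_mhz
    return max(0, (period_ps - overhead_ps) // gate_delay_ps)

# Hypothetical 50 ps gates with 100 ps of register overhead.
print(max_logic_depth(1000, 50, 100))  # 18
print(max_logic_depth(500, 50, 100))   # 38
```

Halving the clock more than doubles the usable depth here, because the fixed register overhead is paid once per cycle either way.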

The upside to slower clocks is reduced pipelining, which can be useful in designs with limited data paths.

The down side to slower clock speed is increased complexity. Data skew has to be monitored across the chip, so gate delays have to be accounted for every gate in every possible data path (vewy complex). The chances for glitching increase with logic. With no clock it gets worse, every glitch can be seen -- not the case with a clock (glitches between clock edges may be tolerated).

I also disagree that clock distribution is the limiting factor. This problem is overcome in larger ICs by distributing PLLs throughout the silicon. The limiting factor in clock speed has more to do with materials used in the chip -- gate speed, skin effect, etc.

Finally, there are quite a few ways to increase the performance of synchronous design. One way is to have multiple data and ALU paths like the Pentium and G4. Another is IC technology. Personally, I'm waiting for the day an all optical processor hits the market.

So an asynchronous chip runs a little faster; the trade is an enormous design risk, marketing, OS development, etc. I say leave the anarchy to the software.

Asynchronous mainframes by Animats · 2001-09-15 06:38 · Score: 2

Asynchronous mainframes were built in the 1960s, by, I think, Honeywell. There was a modest performance gain, but well under 2x.

Parts of processors are already asynchronous. The basic way you get stuff done in a clocked machine is that you have a register feeding an array of logic gates some number of gates deep, with the output going to some other register. Within the array of logic gates, which might be an adder, a multiplier, or an instruction decoder, things are asynchronous. But the timing is designed so that the logic will, in the slowest case, settle before the register at the receiving end locks in its input states. The worst case thus limits the clock rate, which is why there is interest in asynchronous logic.

The claims of lower power consumption are probably bogus. As Transmeta found out, the power saving modes weren't exclusive to their architecture. Once power-saving became a competitive issue, everybody put it in.

Re:Asynchronous mainframes by Anonymous Coward · 2001-09-15 19:09 · Score: 0

I recall a section in my graduate degree in the mid/late-1970s on asynchronous hardware design for computers where the prime example was our university's DEC mainframe ... I suspect it was a KL10 (successor to the KA10 and before the KI10). This was pretty mainstream computing ... performance was the issue and power was hardly considered.

But it was all overrun by minicomputers, UNIX and then RISC microprocessors.

Of course, back then, if you wanted to do quick calculations, you used analogue computers ... provided you didn't need more than 2 or maybe 3 digits of accuracy.

circles and squares by xah · 2001-09-15 06:42 · Score: 1

Wouldn't an asynchronous microchip be fabricated as a disc rather than a square, to help make the wires closer to the same length?

--
I am not a lawyer. Do not take my words as legal advice. If you need legal advice, consult an attorney.

commercial async computer circa 1961 by Anonymous Coward · 2001-09-15 06:59 · Score: 0

I'm not sure that Philco ever attained dwarf status, but they made and sold computer systems for much of the '60s.

From the description of the Philco 2000:

"The Philco 2000 Electronic Data Processing System uses asynchronous logic which reduces computer operating time and allows new components to be added without redesigning the equipment."

http://www.ed-thelen.org/comp-hist/BRL61-p.html

The usual problems... by Yobgod+Ababua · 2001-09-15 08:43 · Score: 2, Informative

The article is surprisingly accurate, for a change. Read it.

However, it seems to have spawned the usual problems here with misunderstanding and confusion. Practically a /. trademark by this point...

Whether you construct a processor using conventional or asynchronous logic makes no difference to the programmer. The programming paradigm can be completely independent from the underlying hardware. (Admittedly, if you want to squeeze the absolute most performance from a given hardware design, you need to program with it in mind, but there is no reason why an ix86, or PPC, or SPARC, or MIPS chip couldn't be implemented asynchronously.)

One of the most interesting advantages of asynchronous logic is that it allows the use of arbitrarily large die sizes. In synchronous logic, you're limited by the delays that arise from transmitting your clock pulses across the chip... at some point maintaining a global lock-step becomes infeasible.

One of the most marketable advantages of asynchronous logic is the power saved by not having to constantly drive the same clock circuitry. Most chips support a 'sleep' or 'low power' mode where they turn off the clock or provide it to only a limited portion of the chip. The chip then has to go through a 'wake up' cycle to re-establish the clock throughout the chip before returning to normal operation. The power saved by asynchronous operation can be substantial, and the lack of a 'wake up' latency can be critical in certain applications.

The biggest problem right now is that the vast Layout and Design masses are used to solving the synchronous problems and not the asynchronous problems; ditto for the available tools. However, with an asynchronous-savvy group, a given solution can be designed in less time than the equivalent synchronous solution (someone here was claiming otherwise...).

And this technology is -not- vaporware... it's real and it's here. And whether you believe it or not, it's at least one part of the future.

-YA

PS: BS in EE from Caltech. Working for a company mentioned in the article, although their opinions have no logical relation or tie to mine.

Re:The usual problems... by Anonymous Coward · 2001-09-15 18:32 · Score: 0

This article thread is probably dead, but if you happen to revisit it, do you know of any async ADC design? I tried to design one many years ago, but my experiments ran into trouble because I was trying to build it out of cheap discrete transistors and couldn't get them matched. Oh, BTW, I don't mean a flash converter.

how would we benchmark these by Anonymous Coward · 2001-09-15 09:00 · Score: 0

How would you rate these in speed?
Woohoo! I just got a new computer with a speed of 0 MHz.
Oh yeah? Well, I got 0.01 MHz!
Well, I guess MHz doesn't exist in these chips, but how would the rating system go?

digging by mattdm · 2001-09-15 09:08 · Score: 1

By doing something like 'cat /proc/cpuinfo'?

What does this mean for programmers? by Anonymous Coward · 2001-09-15 10:46 · Score: 0

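#include <limits.h>

/* Assumed sentinel for "input not arrived yet"; the original post never defines it. */
#define NO_VALUE_YET INT_MIN

int sum(int a, int b)
{
    /* No result until both inputs have arrived. */
    if ((a == NO_VALUE_YET) || (b == NO_VALUE_YET))
        return NO_VALUE_YET;

    return a + b;
}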

Re:What does this mean for programmers? by Anonymous Coward · 2001-09-15 10:48 · Score: 0

int result;

/* Busy-wait until sum() reports a real value. */
while ((result = sum(123, 456)) == NO_VALUE_YET)
    ;

...

When will /. design a chip!!?? by Anonymous Coward · 2001-09-15 10:51 · Score: 0

Lin00x r00lz j00!!!

The highway and a few questions by Anonymous Coward · 2001-09-15 11:02 · Score: 0

Thanks for this analogy, the freeway is the way I describe/visualize circuitry.

Obviously I'm not an electrical engineer, I'm just trying to understand the technology.

When we increase the frequency are we increasing the speed - passing cars/sec - by increasing the density or increasing the velocity of traffic? It seems that a higher voltage would increase the velocity while a higher clock enables the cars to be closer together.

If the technology yields 3x performance at 32 bit will the multiplier increase at 64 bit and up? (making the highway wider) If so does this mean that the technology doesn't provide cost/performance now but in the future it will?

You mention a hybrid system of locally synchronous systems in an asynchronous environment.

Would this include an MP system of synchronous processors and an asynchronous bus and I/O? It seems to my uninformed mind that this could provide a huge performance gain. Besides, the bus is really what is holding us back, right?

I mean, isn't it the bus that makes high-end servers so expensive, and isn't this what leads to diminishing returns in MP systems? Isn't it latency that limits the speed of the bus? Would an asynchronous bus minimize the latency effect?

If so, could an asynchronous bus lead to higher aggregation of computing tasks - on a code level and processing level? (more threading and processors)

As an AI hobbyist I am always interested in a system that is aggregate and asynchronous - because this is how our brains work.

This type of topic is the reason I read slashdot. Cheers!!!!

Re:Call to Arms by Anonymous Coward · 2001-09-15 11:44 · Score: 0

personally, I laughed when I heard the news on Tuesday. I'm not getting laid, so it's hard for me to empathize with the bereaving spouses; I'm like "wow, at least you had someone. what's it like?" I actually feel more empathy for the terrorists, because my life sucks and I feel like I have nothing to lose and I appreciate it when someone strikes a blow against the forces of "normalcy" and "happiness" in the world.

Null Convention Logic by podom · 2001-09-15 12:18 · Score: 1

The article mentions Theseus' approach to asynchronous design -- Null Convention Logic (NCL) -- but does not go into any detail. For more info, check out Theseus' white paper on the subject: ncl_paper.pdf. I read this a couple of years ago and thought it was fascinating. At the time, I tried to design some "primitives" that could be implemented in an FPGA to at least try out some of the ideas. Not a trivial exercise.
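To give a flavour of what such a "primitive" might look like, here is a minimal C simulation of a two-input threshold gate with hysteresis, the kind of state-holding gate usually described as an NCL building block. This is only a sketch under that assumption: the names `th22_gate` and `th22_update` are made up for illustration and are not taken from the Theseus paper.

```c
#include <stdbool.h>

/* State of one two-input threshold gate with hysteresis (hypothetical model). */
typedef struct {
    bool out;   /* current output, held between updates */
} th22_gate;

/* Update the gate: the output rises only when both inputs are asserted,
   falls only when both are de-asserted, and otherwise holds its value. */
bool th22_update(th22_gate *g, bool a, bool b)
{
    if (a && b)
        g->out = true;
    else if (!a && !b)
        g->out = false;
    return g->out;
}
```

Starting from a cleared gate, one asserted input leaves the output low, both asserted drive it high, and it stays high until both inputs have returned to the de-asserted (NULL) state. That holding behaviour is what takes the place of a clock edge.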

--
We're wanted men. I have the death sentence in 12 systems!

HEHE... What about the joy of overclocking? by Anonymous Coward · 2001-09-15 14:21 · Score: 0

I guess it will just disappear...
Also check THIS OUT!!! IT'S A SOCKET A OVERCLOCKER,
like the gold finger device but for socket, not slot

http://cgi.ebay.com/aw-cgi/eBayISAPI.dll?ViewItem&item=1273872749

Stating the Obvious- Human Brain by Databass · 2001-09-15 17:32 · Score: 1

The human brain doesn't have a clock speed on the Central Processing Unit- in fact, there _is_ no central clock, but our minds manage to function with a great deal of processing power. Imagine the bandwidth of the file equivalent of all the .wav, .avi, .ogg, .mp3, .txt, optical character recognition, and AI functions we use, plus mechanical functions like bipedal balance. I've heard estimates and approximations that the brain performs about a trillion operations per second; is that about right? Pretty impressive.

An interesting thing to think about is how, with no clock speed, we can still perceive time. We need to do this to predict the paths of moving objects, like birds and arrows and spears... or more recently car trajectories when we're driving. With no absolutely authoritative central time in our minds, how do we still have such an accurate sense of time when it comes to predicting these paths?

I personally imagine that the brain does have some sense of ratios... that neural loops have some sense of ratios. For example, what if, hypothetically, the motor loop between, say, the basal ganglia and the corpus callosum ran at twice the speed of an eyeblink? The exact milliseconds could vary between people but still give a basis for comparing motion and "time" in the real world. Of course, this would be affected by age as the loops break down- this would account for the way the old people I've seen tend to drive.

Re:Stating the Obvious- Human Brain by Anonymous Coward · 2001-09-16 00:55 · Score: 0

That's why perceived time is not uniform.

I think we use busy loops for time measurements. On a "busy" day, time just flies by, while on a "slow" day you can be looking at your watch all day and the minute hand hardly moves.

A practical example by mandrewa · 2001-09-15 21:05 · Score: 1

There's a man named Charles Moore who has been developing asynchronous microprocessors over the last decade. His current chip is called the X18 and it can maintain a sustained processing rate of 2.4 billion instructions per second. The power consumption at that rate is 20 milliwatts. Check out http://www.mindspring.com/~chipchuck/X18.html Also check out http://www.mindspring.com/~chipchuck/25x.html, which describes his X25, currently available only as a prototype. Basically it's 25 X18s on one chip, running in parallel. Assuming that you can write a program that takes full advantage of 25 such CPUs, that would amount to 60 billion instructions per second. The power consumption is so low as to allow operation of the microprocessor array for one year on one 100mAh battery.
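As a rough check on those figures, here is a minimal C sketch of the arithmetic. The function names are hypothetical and the 8760-hour year is my assumption; the 2.4 billion instructions per second, the 25 cores and the 100 mAh capacity are the numbers quoted above.

```c
/* Aggregate rate if every core runs flat out (an ideal upper bound). */
double aggregate_ips(int cores, double ips_per_core)
{
    return cores * ips_per_core;
}

/* Average current, in microamps, that drains a battery of the given
   capacity (in mAh) over the given number of hours. */
double average_current_ua(double capacity_mah, double hours)
{
    return capacity_mah / hours * 1000.0;
}
```

With these, `aggregate_ips(25, 2.4e9)` gives the 60 billion quoted above, and `average_current_ua(100.0, 8760.0)` comes to roughly 11.4 microamps. So the one-year figure presumably describes a very low average draw, not continuous operation at the full rate.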

Re:A practical example by Anonymous Coward · 2001-09-16 01:08 · Score: 0

The way he designs chips is minimalistic. This is the way a real genius works.

When was the last time we heard of people on /. still dealing with an old 80386? You get flamed & crushed here if you are not running a 2.0GHz P4 and IE for web browsing. People just don't understand and appreciate the concept of using just enough to do the job.

We don't need increasingly complicated systems to do the same job as 10 years ago. Take a look at all those crappy websites out there using all kinds of CPU-sucking junk and adding very little content.
I think 99% would have been fine with just HTML and none of the active content.

The average code monkey is not capable of using such architectures fully. They would be turned off if they couldn't code in C++ or Java or VB or use the Win API.
Re:A practical example by mandrewa · 2001-09-16 02:33 · Score: 1

Anonymous, although Charles Moore definitely is a minimalist, there is nothing blocking anyone from implementing C++ or Java or Visual Basic on his X18.
Although I suspect C++ would be a bear, simply because C++ is apparently difficult to implement for any processor, the others might actually be easy.
Java has a Forth-like bottom layer.
Visual Basic, well, I am reminded of Marcel Hendrix (see http://home.iae.nl/users/mhx/). Some time back he developed a general procedure for implementing well-described computer languages in Forth.
As test cases he did this for several languages, one of which was Pascal. He was able to do these implementations quite rapidly, almost automatically, and as I recall, Pascal implemented in Forth was significantly faster than Borland's Turbo Pascal (implemented in, I think, assembler?).
Re:A practical example by mandrewa · 2001-09-16 03:04 · Score: 1

I got curious as to how the speed of the X18 would compare to a Pentium 4.
Pulled up Intel's Instruction Set Reference
(ftp://download.intel.com/design/Pentium4/manuals/24547104.pdf) and was
surprised to discover that they are apparently not giving the programmer
any clue as to how long, or how many clock cycles, it takes these instructions
to execute.

Likely this is because this is a very difficult question to answer, clock
cycles per instruction being highly variable dependent on what else is
going on in the processor at the same time.

As I recall, in earlier versions of the Pentium, clock cycles per instruction
would range from 110 to 20 cycles.

If we assume the average Pentium 4 instruction takes 30 clock cycles to
complete, then a Pentium 4 running at 2 gigahertz is executing 66 million
instructions per second.

The X18 executes 2.4 billion instructions per second. That's 36 times
faster.

Further, the X18 in any quantity would probably cost several cents per CPU to
produce. The Pentium 4 at 1.7 gigahertz costs about 209 dollars.

A slightly fairer comparison would be the X25 costing one dollar in quantity once
one has gone a million units down the learning curve. This is an array of 25 CPUs,
and its practical instruction processing rate is probably highly variable, dependent
on application. There might be special cases where one could use all the CPUs and
deliver 60 billion instructions per second (909 times faster than the Pentium 4),
but more typically I would guess it would be a fraction of that, although still
of course much faster.
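The arithmetic above can be reproduced with a few lines of C. This is only a sketch of the post's own simplified model (one instruction completing every N cycles, nothing overlapping); the function names are hypothetical, and the 30 cycles per instruction is the assumption stated above, not a measured figure.

```c
/* Instructions per second for a core that completes one instruction
   every `cycles_per_instr` clock cycles (the post's simplifying model). */
double instr_per_sec(double clock_hz, double cycles_per_instr)
{
    return clock_hz / cycles_per_instr;
}

/* How many times faster rate_a is than rate_b. */
double speedup(double rate_a, double rate_b)
{
    return rate_a / rate_b;
}
```

With the inputs above, `instr_per_sec(2.0e9, 30.0)` is about 66.7 million, `speedup(2.4e9, ...)` against that is about 36, and `speedup(60.0e9, ...)` is about 900. The 909 figure comes from rounding 66.7 million down to 66 million before dividing.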

Chip Security / # of wires by teridon · 2001-09-16 00:24 · Score: 1

Fant says, "There's no clear signal to watch. Potential hackers don't know where to begin."

Don't you just have to look for the handshake signals instead?

Also, what are the implications of the "dual-rail" circuits -- doesn't this mean that you won't be able to fit as many transistors on the chip?
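For anyone wondering what "dual-rail" means in practice, here is a minimal C model, assuming the usual encoding in which each bit gets two wires and "neither wire asserted" means no data yet. The type and function names are hypothetical, chosen only for this sketch.

```c
#include <stdbool.h>

/* One dual-rail bit: two wires instead of one. */
typedef struct {
    bool rail0;  /* asserted when the bit carries a 0 */
    bool rail1;  /* asserted when the bit carries a 1 */
} dual_rail;

typedef enum { DR_NULL, DR_ZERO, DR_ONE, DR_INVALID } dr_state;

/* Decode the two wires into one of the four possible states. */
dr_state dr_decode(dual_rail w)
{
    if (w.rail0 && w.rail1) return DR_INVALID;  /* both rails: illegal */
    if (w.rail0) return DR_ZERO;
    if (w.rail1) return DR_ONE;
    return DR_NULL;                             /* neither: no data yet */
}

/* Completion detection: a word is ready once every bit holds valid data. */
bool word_complete(const dual_rail *bits, int n)
{
    for (int i = 0; i < n; i++) {
        dr_state s = dr_decode(bits[i]);
        if (s == DR_NULL || s == DR_INVALID)
            return false;
    }
    return true;
}
```

Two wires per bit, plus the logic that performs the `word_complete` check, is where the extra area in the question would come from. How much that costs on a real chip is not something this sketch can say.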

--
I hold it, that a little rebellion, now and then, is a good thing. -- Thomas Jefferson

Prove your statement...(was: Re:How...) by CBravo · 2001-09-17 01:36 · Score: 1

Well, the OS can communicate asynchronously with many things. I don't think you can PROVE your statement. Well, I think I can falsify your statement.

--
nosig today

Intel by AlgUSF · 2001-09-17 04:41 · Score: 1

How will Intel sell chips if clockless computing is ever successful? They won't be able to double the length of their pipe to "speed up" their chips. I guess we will finally have to develop some fair metric to compare chips between product lines....

--

I want my rights back. I was actually using them when our government stole them after 9/11.

Slashdot Mirror

140 comments