Clockless Computing
ender81b writes "Scientific American is carrying a nice article on asynchronous chips. In general, the article advocates that eventually all computer systems will have to move to an asynchronous design. The article focuses on Sun's efforts but gives a nice overview of the general concept of asynchronous chip design." We had another story about this last year.
for the first guy who overclocks it ;-)
they can't take away the clock. how will i know what time it is?
Gyrate Dot Org - "Where high-tech meets low-life"
Yet another old idea revived. The Amiga's Zorro expansion bus was asynchronous and plug n play in the 80s (although the rest of the machine was clocked).
To clear a few things up, just because a processor/motherboard is "clockless" does not mean it won't be able to tell time. They can still use the 60 Hz AC signal for ticks.
This is really cool. I was learning a little about asynchronous systems in my Logic Design and Computer Organization class last fall...they seemed pretty cool on a small scale, however they could get really difficult to work with when you're dealing with something as complex as a processor.
"I may be quite wrong." - Socrates
The article talks about an advantage of clockless chips being the fact that you can do away with all of the overhead in sending out the clock signal to the various parts of the chip. It also discusses what kind of data processing activities are more suited for clocked vs. clockless chips. To get a best-of-both-worlds chip design, what about farming out various responsibilities on the chip to clockless sub-sections? The analogy I have in mind is when I drop my laundry off at the dry cleaners. I am on a tight schedule, and I have a lot of things to do in a certain sequence, while the dry cleaners collects laundry and does it at various rates during the course of the day. This particular laundry of mine can be done at any point over the next 4 days, and held afterwards, just so long as I have the finished product exactly when I need it, Thursday at 4:15 p.m. Different people assign different limits on the time-sensitivity of their laundry. The clocked sections can drop off their data for processing, and pick it up when they need it, and what happens in between is up to the clockless subchip, which does more-or-less FIFO, but can be flexible based on the time-sensitivity of the task.
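A rough sketch of that drop-off counter, in Python rather than hardware. It assumes the "flexible FIFO" is plain earliest-deadline-first with arrival order as the tie-breaker, which is my reading and not something the article specifies; `DropOffQueue`, `drop_off` and `next_job` are made-up names.

```python
import heapq
import itertools

class DropOffQueue:
    """Hypothetical hand-off point between clocked and clockless sections."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker: keeps FIFO order

    def drop_off(self, job, deadline):
        # Jobs are ordered by deadline first, then by arrival order.
        heapq.heappush(self._heap, (deadline, next(self._arrival), job))

    def next_job(self):
        # The clockless side takes the most urgent job whenever it is free.
        deadline, _, job = heapq.heappop(self._heap)
        return job

q = DropOffQueue()
q.drop_off("shirts", deadline=96)  # needed in 96 hours
q.drop_off("suit", deadline=24)    # needed tomorrow
q.drop_off("coat", deadline=96)
order = [q.next_job() for _ in range(3)]
```

With equal deadlines the queue behaves as a FIFO; the suit jumps ahead only because its deadline is tighter.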
The man who does not read good books has no advantage over the man who cannot read them. - Mark Twain
withoutaclocksignal,howcanyoutellwhenoneinstructionstopsandanotherbegins?
(kidding)
Moneyed corporations, non-working 'poor' and criminal prisoners are turning productive citizens into tax-slaves.
After reading the article, I have to wonder why asynchronous processors (or smaller logic devices, such as ALUs) haven't been considered before. The ideas have certainly been around for a while--and in fact, asynchronous is intrinsically simpler than synchronous logic. The only conclusion on this I can reach is that while asynchronous designs may be "simpler" in theory, in that they don't include a clock pulse, they are much more difficult to work with in practice. Here's an example for those of you that have worked with logic design: try creating the logic for a simple vending machine that dispenses the product whenever a combination of coins (triggered by 3 switches, quarter, dime, and nickel) adds up to $0.50. Which would you prefer to use--synchronous or asynchronous logic? I know when I did this example I got myself stuck by using asynchronous logic, because while asynchronous logic meant fewer memory states (all states above $0.50 were treated the same), it also meant lots of added complexity, which I didn't need for the problem at hand.
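For anyone who hasn't done the exercise, here is a minimal sketch of the clocked version as a state machine in Python. It assumes exactly one coin switch fires per clock tick and that no change is given; `COIN_VALUES`, `step` and `run` are illustrative names, not from any textbook solution.

```python
COIN_VALUES = {"quarter": 25, "dime": 10, "nickel": 5}  # cents
PRICE = 50

def step(total, coin):
    """One clock tick: return (next_total, dispense)."""
    # Everything at or above the price collapses into a single state.
    total = min(total + COIN_VALUES[coin], PRICE)
    if total >= PRICE:
        return 0, True   # dispense and reset
    return total, False

def run(coins):
    """Feed a sequence of coins; return (leftover_total, items_dispensed)."""
    total, dispensed = 0, 0
    for coin in coins:
        total, out = step(total, coin)
        dispensed += out
    return total, dispensed
```

The clock is what makes this easy: each coin is sampled once per tick. An asynchronous version also has to cope with switches changing at arbitrary moments, which is where the extra complexity comes from.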
I foresee lots of bugs, but if they can pull this off, more power to them.
"I may be quite wrong." - Socrates
... won't the bus and storage devices be a bottleneck still?
Bring on the solid state storage.
----- Whats wrong with this picture? http://www.revoh.org:1234/whatswrong
In a way, yes. If I remember well, its memory addressing and I/O bus system was asynchronous (not the clock of the CPU itself), meaning no 'wait states'. It would request a memory location and react as soon as the memory came up with the result. I forgot the details though.
Wasn't the 68000 asynchronous?
No, it was so slow it just seemed that way.
- ClocklessCPU
- ClocklessCPU 2
- Super ClocklessCPU 2
- Super ClocklessCPU Turbo
- Super ClocklessCPU Turbo 2
- Super ClocklessCPU Turbo !!!
- Super Duper ClocklessCPU Turbo MAX
- Super Duper ClocklessCPU Turbo MAX 2
- etc. etc.
Your comment violated the "postercomment" compression filter. Try less whitespace and/or less repetition. Comment aborted.
Kevin Nermoyle (Sun VP) advocated asynch at the 2001 uProcessor Forum. The biggest and most daunting objection I heard in response was that tool support would be a killer. There is no tool support for asynch design at the complexity level needed to do a processor. You're left to a bunch of Dr. Crays using the length of their forearm to resolve race conditions with wiring distance. Since a large portion of the industry would have to make the leap to get the tool guys to invest in development, this kills any realistic possibility of an overnight asynch revolution. Small niche applications will have to get the ball rolling on this. Even still, designers would need to get a lot smarter to think asynch. Think of how many chip protocols rely on a clock. How do you do even simple flow control in a queue, for example? Pipelining goes to pot - it's a whole different world. My two cents.. Loli
withoutxxxx axxxxxxxxxx clockxxxxxx signalxxxxx ,xxxxxxxxxx howxxxxxxxx canxxxxxxxx youxxxxxxxx tellxxxxxxx whenxxxxxxx onexxxxxxxx instruction stopsxxxxxx andxxxxxxxx anotherxxxx beginsxxxxx ?xxxxxxxxxx
Because rephrasing your question as above is what synchronous looks like; every word has to be padded to the longest word length. Asynchronous is like normal written language; words end when they end, not when some 5 char clock says so. Another crude analogy is sync vs async serial comm, except using Huffman-encoded chars, so async can use variable length chars, but sync has to pad the short ones out to the length of the longest.
I tried underline instead of x but the stupid lameness filter objected.
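To put numbers on the analogy, here is a small Python sketch that counts character slots both ways. It treats each character as one unit of time, which is purely an illustration device; `sync_cost`, `async_cost` and `pad` are made-up helper names.

```python
SENTENCE = ("without a clock signal , how can you tell when one "
            "instruction stops and another begins ?")
WORDS = SENTENCE.split()

def sync_cost(words):
    # Clocked: every word occupies a slot as wide as the longest word.
    slot = max(len(w) for w in words)
    return slot * len(words)

def async_cost(words):
    # Clockless: each word takes only as long as it actually is.
    return sum(len(w) for w in words)

def pad(words, filler="x"):
    # Rebuild the padded form shown above.
    slot = max(len(w) for w in words)
    return [w + filler * (slot - len(w)) for w in words]

padded = pad(WORDS)
```

For this sentence the padded form takes 187 slots against 73 for the unpadded one, because the 11-letter "instruction" sets the slot width for all 17 tokens.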
Infuriate left and right
if we have clockless computers for the desktop, HOW will Intel and AMD market them?
After all, a large quick and dirty rating they have used for decades is the clock speed. Throw that away and what do you have?
I can see the panic in their faces now...
"It is a greater offense to steal men's labor, than their clothes"
> It would request a memory location and react as soon as the memory came up with the result.
;-)
Well, kind of. A bus cycle completed when someone signaled "data transfer acknowledge" (DTACK) - then the CPU would read the data off of the bus. Most systems understood where in the address space the memory request was going, how fast that device was, and had logic to count system clocks to trigger DTACK when the data "should be" ready. (In fact, most memory devices have no way of signaling when a read has completed - they just guarantee it within a certain amount of time.)
On the other hand, if you didn't hit DTACK in time, a bus error was generated and an exception routine triggered. Ahhh, the good old days
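A toy model of that arrangement in Python, not a cycle-accurate 68000. The memory map, the wait-state counts and the 16-clock timeout are all invented for illustration, as are the names `REGIONS`, `dtack_delay` and `bus_cycle`.

```python
class BusError(Exception):
    """Raised when no device acknowledges the cycle in time."""

# Hypothetical memory map: (start, end, wait states before DTACK)
REGIONS = [
    (0x000000, 0x0FFFFF, 0),  # fast RAM
    (0x100000, 0x1FFFFF, 3),  # slow ROM
]
TIMEOUT_CLOCKS = 16  # assumed watchdog limit

def dtack_delay(addr):
    """Glue logic: how many clocks to count before asserting DTACK."""
    for start, end, waits in REGIONS:
        if start <= addr <= end:
            return waits
    return None  # nothing decodes this address, so DTACK never arrives

def bus_cycle(addr):
    """Return how many clocks the CPU waited for DTACK."""
    delay = dtack_delay(addr)
    for clock in range(TIMEOUT_CLOCKS):
        if delay is not None and clock >= delay:
            return clock  # DTACK seen, CPU latches the data
    raise BusError(hex(addr))
```

The CPU side only waits for the acknowledge; all knowledge of device speed lives in the counter that generates it.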
If you have looked at the "bucket brigade" graphic in the article, then you will know what I'm talking about...
Is it just me, or does that picture seem to imply that you get a lower "buckets per unit time" throughput from asynchronous processing?
I know that this is not the claim of the article... but it's still my gut reaction to the graphic.
"Gandy Dancers" (railroad manual track laying and repair teams) were so-called because the first part of their name was the Chicago tool maker that made track laying tools, and the second part of their name came from the fact that they worked to a rhythm.
A better analogy would be a work-content based multipath route, where the amount of time is based on the type of work to be performed.
This would have implied (correctly) that, in an asynchronous system, you should be able to "make up for" slow elements by doubling them up: i.e., when you are faced with a slow section of pipe, rather than bottle-necking, make it wider, instead.
Or to use their analogy, if you have a slow guy, then get another slow guy to stand next to him so he doesn't bottleneck the brigade.
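A back-of-the-envelope version of that in Python. It assumes steady state, that buckets are handed to whichever worker in a stage is free, and that hand-off itself costs nothing; `throughput` is an illustrative helper, not anything from the article.

```python
from fractions import Fraction

def throughput(stages):
    """stages: list of (seconds_per_bucket, workers). Returns buckets/second.

    The brigade can go no faster than its slowest stage, where a stage
    with N workers handles N buckets in the time one worker handles one.
    """
    return min(Fraction(workers, seconds) for seconds, workers in stages)

one_slow_guy = [(1, 1), (2, 1), (1, 1)]   # middle man takes 2 s per bucket
two_slow_guys = [(1, 1), (2, 2), (1, 1)]  # second slow man beside him

before = throughput(one_slow_guy)
after = throughput(two_slow_guys)
```

With the slow stage doubled, the brigade runs at the rate of the fast stages again.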
Probably a more apt analogy would be nice: it's hard to show throughput increases, except by number of buckets in the hands of the people.
-- Terry
Exactly right. Nowadays, most of the Motorola embedded processors (many of which use 68000 or 68020 cores) can generate their own DTACK signals. For example, the 68302 has four CS (chip select) lines that you can internally map to whatever address ranges you want. You specify how many wait states are required and the DTACK and CS signals get generated automagically. This cuts down dramatically on on-board glue logic and address decoding logic, which is important for (typically small) embedded designs.
Admit nothing, deny everything and make counter-accusations.
Here's a somewhat shorter primer from Wired:
http://www.wired.com/news/topstories/0,1287,6179,00.html
If your bitterest enemies are people who hack the heads off civilians, then I would say you're doing something right.
In the past I've mentioned here the role that popular publications like Scientific American have in creating hype. Be it the semantic web, nanotechnology, AI or asynchronous circuits, SciAm seems to focus on pie-in-the-sky ideas with a very small chance of success.
That would be fine if they acknowledged this in the text, but more often than not they take an extremely bullish approach and echo the wildest promises by the researchers as if they were to happen tomorrow.
Very smart people have been working for many years on asynchronous circuits, yet the likeliest scenario is hybrid designs mixing synch and asynch circuits (the asynch circuit stops the clock from propagating).
Why do SciAm and other such publications do this? According to Chomsky because they are told so by the trilateral commission. Personally, I think they do it because it sells magazines.
That's not true either. It can take fewer transistors even at a small scale, and it often takes fewer transistors at a large scale, since propagating the clock pulse across a chip requires a surprising amount of circuitry.
Consider that the Pentium 4 added entire pipeline stages for the sole purpose of getting data from one side of the chip to the other in step with the clock.
Consider that the x25, a largely asynchronous chip, has about as many gates as a 386 yet contains 25 parallel processors.
The main problem isn't impossibility or complexity; the problem is that asynchronous design isn't yet understood. We have a LOT of research to do. Once we've done it, engineers will consider asynchrony to be a simple, solved problem.
-Billy
It's amusing to read the claim that an asynchronous chip couldn't take advantage of pipelining. You see, the thing is that pipelining exists ONLY to control two of the disadvantages of clocked processors.
First, it allows different instructions to complete in different amounts of time. An asynchronous chip wouldn't have that disadvantage.
Second, it allows 'idle' portions of the chip to be used by other instructions whose time hasn't come. Asynchronous chips are vulnerable to that as well, but they can be much less vulnerable than even the most pipelined architecture, because dataflow can completely guide the chip: you can hammer in more data as soon as the previous data's been slurped in.
So far from not taking advantage of pipelining, asynchronous chips naturally have one of the advantages of pipelining, and can be built to have the only other.
-Billy
How would an asynchronous process affect determinism requirements, such as those of a hard real-time system?
Fascism starts when the efficiency of the government becomes more important than the rights of the people.
In 1993 I was a graduate student in the Caltech asynchronous circuit design group. That year we had a prototype asynchronous microprocessor that implemented a subset of the MIPS instruction set.
:-)
The guys in the lab used to demo this by hooking up an oscilloscope to show the instruction rate. They would then get out a can of liquid nitrogen, and pour it on the CPU. The instruction rate would climb right up... This led to many jokes about temporary cooling during heavy loads. "Hey, get the ice cubes... He's starting gcc!"
I believe our group used a different basic latch design than Sutherland describes. We handled all bits asynchronously using three wires, one that went high for 0, one that went high for 1, and a feedback wire for "got it". His design looks like it could latch a bus of wires simultaneously. Forgive me if I'm wrong... it's been almost a decade.
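For readers who haven't seen the three-wire scheme, here is a toy Python model of one bit crossing such a link. It assumes a four-phase, return-to-zero handshake, which is a common way to run dual-rail signalling but is my assumption and not necessarily what the Caltech chip did; `DualRailLink` and its methods are made-up names.

```python
class DualRailLink:
    """One bit lane: rail0 high means 0, rail1 high means 1, ack means 'got it'."""

    def __init__(self):
        self.rail0 = self.rail1 = self.ack = False
        self.received = []

    def send_bit(self, bit):
        assert not (self.rail0 or self.rail1 or self.ack), "link must be idle"
        # Phase 1: sender raises exactly one data rail.
        if bit:
            self.rail1 = True
        else:
            self.rail0 = True
        # Phase 2: receiver sees valid data, latches it, raises ack.
        self._receiver()
        # Phase 3: sender sees ack and returns both rails to zero.
        self.rail0 = self.rail1 = False
        # Phase 4: receiver sees the neutral state and drops ack.
        self._receiver()

    def _receiver(self):
        if (self.rail0 != self.rail1) and not self.ack:
            self.received.append(1 if self.rail1 else 0)
            self.ack = True
        elif not (self.rail0 or self.rail1) and self.ack:
            self.ack = False

link = DualRailLink()
for b in [1, 0, 1, 1]:
    link.send_bit(b)
```

No step depends on elapsed time, only on seeing the other side's wire change, which is why a slow wire just slows the transfer instead of corrupting it.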
One of the nice features of these chips is that they are tolerant of manufacturing errors. Often impurities in the silicon will change the resistance or capacitance of a long wire. In asynchronous designs, this just means operations that need that wire will be a little slower. In the synchronous world, either the whole chip fails or you have to underclock it.
A group of ex-Caltech graduate students started a company to sell these asynchronous processors. Details at Fulcrum Microsystems.
(For those at Caltech: Yes, that's me on the asynch VLSI people page. And yes, I wrote prlint. What an awful piece of software that was.)
Unless I missed it, there was no mention of Theseus Logic's Null Convention Logic at all which is a real disappointment. Theseus has one of the few approaches that doesn't require a PhD-level of education to understand and design in.
-- Bryan "TheBS" Smith
Independent Author, Consultant and Trainer