Clockless Computing: The State Of The Art
Michael Stutz writes: "This article in Technology Review is a good overview of the state of clockless computing, and profiles the people today who are making it happen." The article explains in simple terms some of the things that clockless chips are supposed to offer (advantages in raw performance, power consumption and security) and what characteristics make these advantages possible.
Is this another example of the 'bohemian/hippie renegade engineer out to save the computing world by their bold revolutionary ideas'?
Sort of reminds me of the Rolling Stone cover back in '90 (or so) that had Jesus Jones on the cover. "Will Jesus Jones save Rock & Roll?" (And notice where they are now)
'Life is like a spoonful of Drain-O, it feels good on the way down but leaves you feeling hollow inside'
Damnit, Jim, I'm an anarchist, not a F@#$!^& doctor!
No, I think it quite likely that many in the Linux community have given about half a thousandth of their assets.
You know, as with the police, I have a lot less trouble with Bill & co. than with their sycophants. At least the way B.G. and M.S. operate makes sense for _them_; to hear these cheerleaders prate along as if Bill might actually _like_ them....
And marketing these chips will have to get back to the real stuff: how many operations of a specific kind they can carry out per second.
I'm just wondering, would such a processor execute the same machine code using the same internal sequence of signals twice? I guess asynchronous communication between elements would introduce some kind of randomness.
What will AMD and Intel try to one-up each other with? No clock speed, so how do you classify, much less hype, new processors?
The real reason they haven't moved to this yet is their marketing team doesn't want to give up on the MHz race.
Learning HOW to think is more important than learning WHAT to think.
It doesn't matter, computers are not sentient (yet).
I can see how a clockless design can reduce power consumption. However, I don't really see why it would solve the other problems inherent in high-speed computation.
Suppose we want to increment a register 1000M times. A clocked circuit will generate a hell of a lot of noise as all the signals push through the circuit at, say, 2GHz for a duration of, say, 0.5s.
.... But if we want the clockless design to perform as well, its asynchronous gates would still have to switch that many times in the same 0.5s.
In terms of noise generation, it will be on par with a conventional design. As all the gates still need to switch at pretty much the same speed, the other physical barriers still apply.
Does anyone have more detailed info on this topic?
With its simplified core, a processor like the Crusoe seems like it could be a promising general-purpose chip to first adopt technology like this.
Any comments from someone more knowledgeable than I?
Not if you have a backdoor. Guess these guys don't read Wired..
"If you think education is expensive, try ignorance" - Derek Bok
The Amulet Group at The University Of Manchester have a clockless ARM (ARMs are used in many mobile phones, the Compaq iPaq and the GBA).
The article is very interesting. I thought that research in asynchronous computing died in the sixties. What the article misses is that async. operations have an overhead too: the "here is the data" synchronization. Synchronous computing does not have that.
I have previously read (forgotten where) that in theory async. computers will always be slower than sync. computers. It seems that is not true anymore. I guess that the latest-and-greatest CPUs spend a non-trivial percentage of their time idle on instructions which take slightly longer than an integral number of clock ticks. If an instruction takes 2.1ns and the clock period is 1ns, everything has to assume that the instruction takes 3ns.
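To put numbers on the 2.1ns example, here is a minimal sketch of the rounding a clocked design imposes. The function name is made up for illustration, and it assumes the only cost is rounding up to whole clock ticks.

```python
import math

def clocked_latency_ns(actual_ns: float, clock_period_ns: float) -> float:
    """Latency once an operation is rounded up to a whole number of clock ticks."""
    ticks = math.ceil(actual_ns / clock_period_ns)
    return ticks * clock_period_ns

actual = 2.1   # ns the operation really needs (figure from the comment above)
period = 1.0   # ns per clock tick (figure from the comment above)

clocked = clocked_latency_ns(actual, period)
idle = clocked - actual  # time spent waiting for the next clock edge

print(f"clocked latency: {clocked} ns, idle: {idle:.1f} ns")
```

In this toy case roughly 30% of the three ticks is spent waiting for the edge. A real async design would pay some handshake overhead in its place, which this sketch ignores.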
Also imagine a fully async. computer. No need for a new motherboard or even changing settings in the BIOS when new and faster RAM chips are available - the system will automatically adapt.
I think that we will see more and more async. parts in the years to come. But I don't know if everything is going to be asynchronous.
They have a press release, see here: http://research.sun.com/features/async/
(I'm sorry, I can't use HTML: the lameness filter doesn't want to allow the posting otherwise.)
I imagine the "perfect" laptop:
- an OLED screen (no need for backlighting)
- an asynchronous processor (low power)
- no HDD, but plenty of MRAM (this RAM is persistent)
The AMULET group at Manchester University have been developing this for years based on ARM cores.
http://www.cs.man.ac.uk/amulet/index.html
Well, I think that the reason the async chips are not being used is quite simple - a clocked system is much easier to design and verify. You know how long before and after a clock edge your signal needs to be there to be recognised. You know that if these constraints match across your system, it will work. Yes, this makes the system as fast as its slowest link - some circuits operate near their limits, some are actually wasting the time. But it works. An asynchronous design would be a pure hell to debug - that's probably why the industry doesn't (yet) mess with it.
BTW, does anybody here remember analog computing? A bunch of cleverly connected operational amplifiers? These things were asynchronous, just as mother nature is. If you can get the physics to work for you, bingo - compare the time nature needs for raytracing a complex scene with the time a digital model takes
:-) The only drawback is that most of us prefer a slow digital model of a thermonuclear reaction and similar problems...
Busses can be made asynchronous. Handshaking is the key. New strategies will be needed, but people are bright, so I feel they will be developed. With a little thinking I've sketched out a packet-type asynchronous bus in my head. It would work nicely for up to a meter or so. Longer lengths would be slower than shorter ones. One thing I feel may work best is for any signal/data that needs to travel significant distances to go into synchronous transmission. Otherwise you end up adding in delays from the back handshake signals.
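For anyone who hasn't seen handshaking spelled out, here is a rough sketch of a four-phase request/acknowledge transfer. This is a generic textbook-style protocol, not the packet bus the poster has in mind, and the function names and the "four wire trips per word" cost model are assumptions made for illustration.

```python
def four_phase_transfer(words):
    """Move each word across a bus with a req/ack four-phase handshake.

    Returns the words received and a trace of (req, ack) wire states.
    """
    req = ack = 0
    received = []
    trace = []
    for word in words:
        data, req = word, 1        # 1. sender drives data, then raises req
        trace.append((req, ack))
        received.append(data)      # 2. receiver latches data, raises ack
        ack = 1
        trace.append((req, ack))
        req = 0                    # 3. sender sees ack, drops req
        trace.append((req, ack))
        ack = 0                    # 4. receiver sees req low, drops ack
        trace.append((req, ack))
    return received, trace

def transfer_time_ns(n_words, one_way_delay_ns):
    """Assumed cost model: each word needs four trips down the wire."""
    return 4 * n_words * one_way_delay_ns

received, trace = four_phase_transfer([7, 9])
print(received, trace[:4])
print(transfer_time_ns(2, 5), "ns at 5 ns per trip")
```

Under that cost model the time per word grows with wire length, which is the "longer lengths would be slower" effect, and the return trips are the "back handshake" delay.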
I remember some of the first articles in SIGARCH and how they sparked my interest. I've always felt that async was the way to go when you don't know how long an operation will take. I'm happy to see it's still getting research dollars.
The old CDC supercomputers, and the Cray 1, were clockless. They were designed by that inspired madman, ...
The reason he built them clockless is that the propagation time to get the clock signal across the machines (which were fairly large) would have significantly slowed the performance. Instead, all of the wires are the right length so that all of the signals arrive at their destination at the right time. I've been told horror stories by ex-CDC salesmen that when they installed new machines, they would spend days or weeks clipping wires to different lengths and debugging hardware failure modes until it all ran smoothly.
Cray also solved the heat dissipation problem by designing the computer to run hot. This meant that when you turned it on it didn't work reliably until all of the ceramic boards heated up (and expanded) so that the connections were solid, etc.
F-ing brilliant.
- The Amulet Group at The University Of Manchester
- have a clockless ARM
Pocket watches were invented centuries ago.
First off, this is pure conjecture, IANAME.
Okay, the way I suppose this would work, supposing Intel had developed a chip that was compatible with the Pentium series, would be an asynchronous design with some kind of logic translator to communicate with the bus. Yes, at first you would be wasting processor power, but eventually the bus technology would catch up (see ISA to EISA to VLB to PCI to AGP and on...). As for the RAM, it could either run on an independent clocked bus, or, I do not see why it would be a problem to develop asynchronous RAM if they have the technology for the chips. Also, the article states that the P IV utilises some asynchronous components; maybe that is part of the reason for the push to use RDRAM with it?
You say you want a revolution....
When designing a "conventional" CPU, you can have a clock that essentially drives events and data movement.
If you design a multiplier circuit using a bunch of full-adders, you'll notice that the output takes a long time to settle. In fact, depending on what numbers you are multiplying together, the circuit may take more or less time before the output settles.
You can always determine the worst-case scenario for a multiply operation to settle. If the multiply takes longer than any other operation, then the multiply op is the "critical path".
A chip's frequency is the inverse of the period of the critical path (in most cases). So, if it's possible to do 100 million critical path operations in a second, then your machine can run at 100MHz.
What the article is hinting at is the amount of wasted time because everything is (currently) done on the clock cycle. Allow me to illustrate: Let's say a multiply takes 5 seconds, but an add only takes 1. A fixed clock rate (or having a clock at all) forces that add instruction to take the extra 4 seconds, and use it for nothing. Wasted computer time.
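Here is a small sketch of that argument using the 5-versus-1 figures above. The instruction mix and function names are made up, and the self-timed total assumes completion detection is free, which is an optimistic assumption.

```python
def max_clock_hz(critical_path_s):
    """Clock frequency is (roughly) the inverse of the critical path delay."""
    return 1.0 / critical_path_s

def clocked_time(latencies, program):
    """Every instruction is charged the worst-case (critical path) latency."""
    period = max(latencies.values())
    return period * len(program)

def self_timed_time(latencies, program):
    """Each instruction takes only as long as it needs (idealised)."""
    return sum(latencies[op] for op in program)

latencies = {"mul": 5, "add": 1}          # time units from the example above
program = ["add", "add", "mul", "add"]    # hypothetical instruction mix

clocked = clocked_time(latencies, program)
ideal = self_timed_time(latencies, program)
print(f"clocked: {clocked}, self-timed: {ideal}, wasted: {clocked - ideal}")
print(f"a 10 ns critical path allows about {max_clock_hz(10e-9) / 1e6:.0f} MHz")
```

How much is really saved depends on the instruction mix and on what it costs to detect completion.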
Now, the reason people are skeptical is because there is no efficient way to tell if a multiply operation (or any other operation) has actually completed and the outputs have settled.
Incidentally, if this interests you, go grab a free program called "diglog" or "chipmunk". The software (for Linux/Windows) allows you to simulate almost any digital circuit.
Another thing to keep in mind about current CPUs is the way they execute an instruction. Every instruction is actually made of smaller instructions (called microinstructions). Microinstructions take one clock cycle each, but there is an arbitrary number of microinstructions for each larger instruction. The microinstructions perform the "fetch execute cycle" - the sequence that decodes the instruction, grabs the associated data, performs the desired task, and goes back for more.
If you're interested in designing a CPU yourself, go grab a book by Morris Mano called "Computer System Architecture". With that book and DigLog, it's pretty easy, but it takes a long time.
if there is no mass market for asynchronous chips, there's little incentive to create tools to build them; if there are no tools, no chips get produced. The same problem applies to the development of chip-testing technologies. Without any significant quantity of asynchronous circuits to test, there is no market for third-party testing tools.
But at least here there's an accidental solution - the Cross-Check Array.
Conventional clocked chips can be tested by scan: A multiplexer is added to the flop inputs, and a test signal turns them into one or more long shift registers. The old state of the flops is shifted out for examination while a new state is shifted in to start the next phase of the test. This only works when the flops to be strung together are all part of a common clocking domain.
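A toy model of the scan shift described above, for readers who haven't seen it. The chain is a plain list of bits and the function name is invented, so treat this as a sketch of the idea only.

```python
def scan_shift(flops, scan_in):
    """Shift scan_in through the chain, one bit per test clock.

    Returns (new_state, shifted_out). The last flop drives the scan-out pin.
    """
    state = list(flops)
    shifted_out = []
    for bit in scan_in:
        shifted_out.append(state[-1])   # observe the flop at the end of the chain
        state = [bit] + state[:-1]      # every flop takes its neighbour's value
    return state, shifted_out

old_state = [1, 0, 1, 1]      # values captured by the flops after a test step
next_pattern = [0, 0, 1, 0]   # bits fed to the scan-in pin, in order

new_state, observed = scan_shift(old_state, next_pattern)
print("observed:", observed)
print("loaded:  ", new_state)
```

The old state comes out last flop first while the new pattern is loaded, and all of it relies on one common test clock stepping every flop together.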
The Cross-Check Array is more like a RAM. A grid of select lines and sense lines are laid down on the chip, with a transistor at each intersection. The transistor is undersized compared to those of the gates, forming a small tap on a nearby signal - or it can inject a signal if the sense line is driven rather than monitored. Select drivers are laid down along one edge of the chip, sense amplifiers/drivers along another.
This approach does not depend on the flip-flops to be active participants in the observation process (though it can still force their state), and thus can observe signals in asynchronous as well as synchronous designs. It also gives observability of testpoints in combinatorial logic without the addition of extra flops. Compared to a fullscan design it gives much greater observability and takes about half the silicon-area overhead.
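The RAM-like addressing can be sketched in a few lines. This models only the select/sense idea from the description above. The grid contents and function names are invented, and nothing here reflects the real electrical behaviour.

```python
def observe(testpoints, select_row):
    """Assert one select line; each sense line reports the node it taps in that row."""
    return list(testpoints[select_row])

def inject(testpoints, select_row, sense_col, value):
    """Drive a sense line while its row is selected to force a node's value."""
    testpoints[select_row][sense_col] = value

# Hypothetical 2x3 grid of tapped signals; no clock is involved in reading them.
grid = [[0, 1, 1],
        [1, 0, 0]]

snapshot = [observe(grid, row) for row in range(len(grid))]
inject(grid, 1, 2, 1)
print(snapshot, observe(grid, 1))
```

Nothing in the readout depends on flip-flops or a clock domain, which is why the approach carries over to asynchronous logic.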
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
Does programming for a clockless chip differ from programming for a synchronous one? Every link I tried to follow only explains the differences in design, speed, or power consumption.
They're talking about removing the internal CPU clock, which in effect, isn't really a clock at all. It's just something which ticks at regular intervals, and lets you do a number of things, such as synchronize instructions, pipeline, cache read/writes, and all the other stuff I forgot from CS 101.
A computer's clock (as in date, time, etc) is on another part of the motherboard, and runs (correct me if I'm wrong) off the CMOS battery. That'll always be a "clock" in the sense we understand.
And build in a microphone and make its screen touch sensitive. That way you can get rid of the keyboard, trackpad and hinge and make it a single, consolidated unit.
It would not be economically viable to try to push this new type of processor into a market dominated by traditional synchronous processors and computer equipment. However, it seems that asynchronous microprocessing could still be used inside traditional computers if it is mixed with synchronous systems. Imagine a computer that uses a synchronous bus just the way it does now but has an asynchronous co-processor, driven by a special type of synchronous CPU that allows certain operations to be carried out asynchronously. If, for example, a matrix multiplication needs to be done, the normal CPU would require a number of clock cycles proportional to the number of multiplications within the matrix divided by the number of processor pipes allocated for this task. If it can be proven that asynchronous processing can do the same job three times faster than a 'normal' CPU, why can't the 'normal' or traditional CPU ask the asynchronous co-processor to do the task for it? The problem is of course asynchronous data retrieval and storage. Probably the co-processor could actually be a co-processor card with its own asynchronous memory bank on board that can later be synchronized with the traditional memory banks. Such a system should not be too difficult to implement, since it could use a PCI slot, for example. Over time a computer would become less and less synchronous, with the synchronous parts synchronizing many asynchronous devices.
You can't handle the truth.
Asynchronous VLSI is one of the most exotic and hardest subjects around. Caltech's Alain Martin is perhaps one of the best-known people in the field. His group has designed an asynchronous MIPS 3000. It works pretty beautifully and is faster.
.... academics also earns chicks :)
My gf is a device engineer and she really fell for me when she learnt that I know asynchronous VLSI design. Coool
I read that article thru a link at the bottom of C-net's news.com a few days ago. Why bother /. it? Are you implying /. is the only place we look for news?
Gee...
Pedro
----
The Insomniac Coder
I'm Turd Fergus0n. Funny name huh? Turd Fergus0n, remember it.
Yeah, that's right. Turd Ferguson. It's a funny name.
The only way they can probably advertise the async chip is to give the MHz of the fastest segment of the chip. That, or they will actually have to advertise the other parts of the computer that determine speed. Does that mean that computers will be sold on having more cache again, or that they will actually tell you the bus speed or even the pipeline depth of the system? My god, this will turn computer advertising around: a system similar to a SunBlade 1000 with 8 megs of cache will actually be advertised as faster than a P4-like system with 1/2 meg of cache. Will wonders never cease.
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
One thing to think about is why in a freeway things move slower when there is a lot of traffic and faster when there is less traffic...
In asynchronous systems, control signals need to travel at the handshake speed. In a freeway, this is the brakelight to braking reaction time. In a chip this is the complete to start signal (or stall signal). This is often called the "speed-of-light" in an asynchronous system (the fastest speed information can travel).
Synchronous systems benefit from coordination that operates globally with a path-independent fast communication (the clock tree mismatch/jitter time). Clearly this is not scalable indefinitely, but it helps when practical.
In the final analysis, it will probably be true that although it appears that time is wasted in a single computational stage, the backpressure wave it will create in a pipeline is limited by the "speed-of-light" in the handshake, which negates most of the advantages (since a clock is often run at the fastest speed-of-light possible and computes are repipelined to account for this). In the end, the only real savings will probably be in power and area (although a chip that burns less power can run faster, this is a second-order effect)...
Think about this the next time you are stuck in asynchronous traffic and how global coordination trades some inefficiencies for greater global efficiency. In fact automatic car control systems have recognized this and have proposed car clusters to improve traffic efficiency (cars would group together using wireless LANs and move in a group with tighter control, but still act asynchronously with the global traffic).
As with nearly all technologies, hybrid solutions are often better than ones that are architecturally "pure". Practical systems that mix locally synchronous and globally asynchronous systems are probably the more optimal solutions for many problems. With the reverse (locally async and globally sync), backpressure waves cause losses in performance (because of loss of throughput, you can't run it totally pipeline full)....
Simplistic analysis like "time is wasted between clocks" does a disservice to /.-ers trying to understand this technology...
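A crude way to see the backpressure point in this comment: in a full pipeline, the steady-state rate is set by the slowest stage plus its handshake, not by the average stage. The stage delays, handshake cost and clock overhead below are invented numbers in picoseconds, and the model ignores everything except those three quantities.

```python
def async_cycle_ps(stage_delays_ps, handshake_ps):
    """A stage can't accept new data until it has computed and finished its handshake."""
    return max(delay + handshake_ps for delay in stage_delays_ps)

def sync_cycle_ps(stage_delays_ps, clock_overhead_ps):
    """Clock period is set by the slowest stage plus skew/jitter margin."""
    return max(stage_delays_ps) + clock_overhead_ps

stages = [1000, 600, 800]   # hypothetical per-stage compute delays
handshake = 400             # hypothetical complete-to-start handshake time
clock_margin = 100          # hypothetical clock skew/jitter allowance

average_stage = sum(stages) // len(stages)
print("average stage delay:", average_stage)
print("async cycle:", async_cycle_ps(stages, handshake))
print("sync cycle: ", sync_cycle_ps(stages, clock_margin))
```

With these particular numbers the clocked pipeline wins. With a cheaper handshake or a bigger clock margin it would go the other way, so this shows where the trade-off sits and does not settle it.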
One of the first common "thinking-out-of-the-box" techniques used to crack smart cards relied on the fact that the sw was written to take different amounts of time to compute legal and illegal keys. By measuring the battery consumption, the smart card crackers could restrict their search to the space of legal keys.
No doubt this was a sw path put in by a well-intentioned programmer trying to save battery life, but now all respected encryption systems recommend a "veil" strategy, where all encryption/decryption operations take the same amount of time and power regardless of the key.
In practice this means that you find out the max time and power (plus some margin) and if you are done early and without using enough power, you waste time and power to pad out the veil...
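The software cousin of the "veil" is constant-time comparison. The sketch below is not the smart card code from the story. It just counts how many bytes an early-exit comparison examines versus one that always does the same work, and the function names and key values are made up.

```python
import hmac

def leaky_equal(a: bytes, b: bytes):
    """Early-exit compare: the work done reveals where the first mismatch is."""
    steps = 0
    if len(a) != len(b):
        return False, steps
    for x, y in zip(a, b):
        steps += 1
        if x != y:
            return False, steps
    return True, steps

def veiled_equal(a: bytes, b: bytes):
    """Examines every byte regardless of where (or whether) a mismatch occurs."""
    steps = 0
    diff = len(a) ^ len(b)
    for x, y in zip(a, b):
        steps += 1
        diff |= x ^ y
    return diff == 0, steps

secret = b"key-1234"
early = b"kXy-1234"   # wrong at the second byte
late = b"key-12X4"    # wrong at the seventh byte

print(leaky_equal(secret, early), leaky_equal(secret, late))
print(veiled_equal(secret, early), veiled_equal(secret, late))
print(hmac.compare_digest(secret, secret))  # the standard library's version
```

The leaky version's step count tracks the position of the first wrong byte, which is exactly the kind of signal an attacker measures. An asynchronous chip that finishes as soon as the data allows would leak in the same way unless it is padded.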
Nice thought, but this just goes to show that cryptographic systems really need to be designed by experts...
Yikes, seems a little sci-fi and bogus claims....
Of course the new Pentium 4 contains some elements of asynchronous design... all synchronous chips do! In a synchronous design, the logic between registers (article calls Flip Flops) is asynchronous. The gating factor on the amount of asynchronous logic you can place between registers in a synchronous design is a function of the clock speed and the gate speed -- the faster the gates, and/or the slower the clock speed the more logic you can place between registers. Looks like the article is about a system with a clock rate of 0 without changing gate speed, so the processing rate will be the sum delay of the asynchronous logic -- I wonder what this would be on a chip the complexity of a P4 or G4?
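The "how much logic fits between registers" relation can be written down directly. The gate delay and register overhead below are invented round numbers in picoseconds, not figures for any real process, and the function name is made up.

```python
def max_logic_levels(clock_period_ps, gate_delay_ps, register_overhead_ps=0):
    """Gate levels that can settle between two registers within one clock period."""
    usable_ps = clock_period_ps - register_overhead_ps
    return max(0, usable_ps // gate_delay_ps)

gate_ps = 50        # hypothetical delay of one gate level
overhead_ps = 100   # hypothetical setup time plus clock-to-output delay

for period_ps in (2000, 1000, 500):   # 500 MHz, 1 GHz, 2 GHz
    levels = max_logic_levels(period_ps, gate_ps, overhead_ps)
    print(f"{period_ps} ps clock -> {levels} gate levels")
```

Halving the clock period more than halves the usable logic depth because the register overhead is fixed, so faster clocks force deeper pipelining.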
The upside to slower clocks is reduced pipelining, which can be useful in designs with limited data paths.
The down side to slower clock speed is increased complexity. Data skew has to be monitored across the chip, so gate delays have to be accounted for on every gate in every possible data path (vewy complex). The chances for glitching increase with logic. With no clock it gets worse, every glitch can be seen -- not the case with a clock (glitches between clock edges may be tolerated).
I also disagree that clock distribution is the limiting factor. This problem is overcome in larger ICs by distributing PLLs throughout the silicon. The limiting factor in clock speed has more to do with materials used in the chip -- gate speed, skin effect, etc.
Finally, there are quite a few ways to increase the performance of synchronous design. One way is to have multiple data and ALU paths like the Pentium and G4. Another is IC technology. Personally, I'm waiting for the day an all optical processor hits the market.
So an asynchronous chip runs a little faster; the trade-off is an enormous risk in design, marketing, OS development, etc. I say leave the anarchy to the software.
Parts of processors are already asynchronous. The basic way you get stuff done in a clocked machine is that you have a register feeding an array of logic gates some number of gates deep, with the output going to some other register. Within the array of logic gates, which might be an adder, a multiplier, or an instruction decoder, things are asynchronous. But the timing is designed so that the logic will, in the slowest case, settle before the register at the receiving end locks in its input states. The worst case thus limits the clock rate, which is why the interest in asynchronous logic.
The claims of lower power consumption are probably bogus. As Transmeta found out, the power saving modes weren't exclusive to their architecture. Once power-saving became a competitive issue, everybody put it in.
Wouldn't an asynchronous microchip be fabricated as a disc rather than a square, to help make the wires closer to the same length?
I am not a lawyer. Do not take my words as legal advice. If you need legal advice, consult an attorney.
I'm not sure that Philco ever attained dwarf status, but they made and sold computer systems for much of the '60s.
From the description of the Philco 2000:
"The Philco 2000 Electronic Data Processing System
uses asynchronous logic which reduces computer operating time and allows new components to be added without redesigning the equipment."
http://www.ed-thelen.org/comp-hist/BRL61-p.html
The article is surprisingly accurate, for a change. Read it.
However, it seems to have spawned the usual problems here with misunderstanding and confusion. Practically a /. trademark by this point...
Whether you construct a processor using conventional or asynchronous logic makes no difference to the programmer. The programming paradigm can be completely independent of the underlying hardware. (Admittedly, if you want to squeeze the absolute most performance from a given hardware design, you need to program with it in mind, but there is no reason why an ix86, or PPC, or SPARC, or MIPS chip couldn't be implemented asynchronously.)
One of the most interesting advantages of asynchronous logic is that it allows the use of arbitrarily large die sizes. In synchronous logic, you're limited by the delays that arise from transmitting your clock pulses across the chip... at some point maintaining a global lock-step becomes infeasible.
One of the most marketable advantages of asynchronous logic is the power saved by not having to constantly drive the same clock circuitry. Most chips support a 'sleep' or 'low power' mode where they turn off the clock or provide it to only a limited portion of the chip. The chip then has to go through a 'wake up' cycle to re-establish the clock throughout the chip before returning to normal operation. The power saved by asynchronous operation can be substantial, and the lack of a 'wake up' latency can be critical in certain applications.
The biggest problem right now is that the vast Layout and Design masses are used to solving the synchronous problems and not the asynchronous problems, ditto for the available tools. However, with an asynchronous-savvy group, a given solution can be designed in less time than the equivalent synchronous solution (someone here was claiming otherwise...).
And this technology is -not- vaporware... it's real and it's here. And whether you believe it or not, it's at least one part of the future.
-YA
PS: BS in EE from Caltech. Working for a company mentioned in the article, although their opinions have no logical relation or tie to mine.
How would you rate these in speed?
Woohoo! I just got a new computer with a speed of 0MHz
ohh yeah? well I got 0.01MHz!
well I guess MHz doesn't exist in these chips, but how would the rating system go?
By doing something like 'cat /proc/cpuinfo'?
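#include <limits.h>
#define NO_VALUE_YET INT_MIN  /* assumed sentinel: this input has not arrived yet */

/* Produce a result only once both operands have arrived. */
int sum(int a, int b)
{
    if ((a == NO_VALUE_YET) || (b == NO_VALUE_YET))
        return NO_VALUE_YET;
    return a + b;
}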
Lin00x r00lz j00!!!
Thanks for this analogy, the freeway is the way I describe/visualize circuitry.
Obviously I'm not an electrical engineer, I'm just trying to understand the technology.
When we increase the frequency are we increasing the speed - passing cars/sec - by increasing the density or increasing the velocity of traffic? It seems that a higher voltage would increase the velocity while a higher clock enables the cars to be closer together.
If the technology yields 3x performance at 32 bit will the multiplier increase at 64 bit and up? (making the highway wider) If so does this mean that the technology doesn't provide cost/performance now but in the future it will?
You mention a hybrid system of locally synchronous systems in an asynchronous environment.
Would this include an MP system of synchronous processors and an asynchronous bus and I/O? It seems to my uninformed mind that this could provide a huge performance gain. Besides, the bus is really what is holding us back, right?
I mean isn't it the bus that makes high-end servers so expensive and isn't this what leads to diminishing returns in MP systems? Isn't it latency that limits the speed of the bus? Would an asynchronous bus minimize the latency effect?
If so could an asynchronous bus lead to higher aggregation of computing tasks - on a code level and processing level? (more threading and processors)
As an AI hobbyist I am always interested in a system that is aggregate and asynchronous - because this is how our brains work.
This type of topic is the reason I read slashdot. Cheers!!!!
personally, I laughed when I heard the news on Tuesday. I'm not getting laid, so it's hard for me to empathize with the bereaving spouses; I'm like "wow, at least you had someone. what's it like?" I actually feel more empathy for the terrorists, because my life sucks and I feel like I have nothing to lose and I appreciate it when someone strikes a blow against the forces of "normalcy" and "happiness" in the world.
The article mentions Theseus' approach to asynchronous design -- Null Convention Logic (NCL) -- but does not go into any detail. For more info, check out Theseus' white paper on the subject: ncl_paper.pdf. I read this a couple of years ago and thought it was fascinating. At the time, I tried to design some "primitives" that could be implemented in an FPGA to at least try out some of the ideas. Not a trivial exercise.
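For a flavour of what such a "primitive" looks like, here is a behavioural sketch of an m-of-n threshold gate with hysteresis, which is how NCL gates are usually described. The semantics below are the commonly published ones and were not checked against the white paper, and the class name is made up.

```python
class ThresholdGate:
    """m-of-n threshold gate with hysteresis.

    The output sets once at least m inputs are 1, and resets only when all
    inputs have returned to 0 (NULL). Otherwise it holds its previous value.
    """

    def __init__(self, m, n):
        self.m = m
        self.n = n
        self.out = 0

    def update(self, inputs):
        if len(inputs) != self.n:
            raise ValueError("wrong number of inputs")
        high = sum(inputs)
        if high >= self.m:
            self.out = 1
        elif high == 0:
            self.out = 0
        return self.out   # unchanged when 0 < high < m

gate = ThresholdGate(2, 2)   # a 2-of-2 gate
history = [gate.update(x) for x in [(1, 0), (1, 1), (1, 0), (0, 0)]]
print(history)
```

The holding behaviour is the awkward part in an FPGA, since it means every gate carries state instead of being a pure lookup table.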
We're wanted men. I have the death sentence in 12 systems!
I guess it will just disappear...
Also Check THIS OUT!!! IT'S A SOCKET A OVERCLOCKER
like the gold finger device but for socket not slot
http://cgi.ebay.com/aw-cgi/eBayISAPI.dll?ViewItem&item=1273872749
The human brain doesn't have a clock speed on the Central Processing Unit - in fact, there _is_ no central clock, but our minds manage to function with a great deal of processing power. Imagine the bandwidth of the file equivalent of all the .wav, .avi, .ogg, .mp3, .txt, optical character recognition, and AI functions we use, plus mechanical functions like bipedal balance. I've heard estimates and approximations that the brain performs about a trillion operations per second, is that about right? Pretty impressive.
An interesting thing to think about is, with no clock speed, how we still can perceive time. We need to do this to predict the paths of moving objects, like birds and arrows and spears... or more recently car trajectories when we're driving. With no absolutely authoritative central time in our minds, how do we still have such an accurate sense of time when it comes to predicting these paths?
I personally imagine that the brain does have some sense of ratios... I imagine that neural loops have some sense of ratios... for example, what if, hypothetically, the motor loop between, say, the basal ganglia and the corpus callosum were twice the speed of an eyeblink? The exact milliseconds could vary between people but still give a basis for comparing motion and "time" in the real world. Of course, this would be affected by age as the loops break down - this would account for the way the old people I've seen tend to drive.
There's a man named Charles Moore who has been developing asynchronous microprocessors over the last decade. His current chip is called the X18 and it can maintain a sustained processing rate of 2.4 billion instructions per second. The power consumption at that rate is 20 milliwatts. Check out http://www.mindspring.com/~chipchuck/X18.html Also check out http://www.mindspring.com/~chipchuck/25x.html, which describes his X25, currently available only as a prototype. Basically it's 25 X18s on one chip, running in parallel. Assuming that you can write a program that could take full advantage of 25 such CPUs, that would amount to 60 billion instructions per second. The power consumption is so low as to allow operation of the microprocessor array for one year on one 100mAh battery.
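A quick back-of-the-envelope check of those figures. The 25-core, 2.4 billion, 20 mW and 100mAh numbers come from the post. The 1.8 V supply and the linear scaling of power across cores are my assumptions, so read the result as an order-of-magnitude estimate only.

```python
cores = 25
ips_per_core = 2.4e9        # instructions per second, from the post
mw_per_core = 20            # milliwatts at full rate, from the post
battery_mah = 100           # battery capacity, from the post
assumed_supply_v = 1.8      # ASSUMPTION: supply voltage is not given in the post

total_ips = cores * ips_per_core
full_speed_mw = cores * mw_per_core            # ASSUMPTION: power scales linearly

hours_per_year = 365 * 24
avg_current_ua = battery_mah / hours_per_year * 1000    # microamps
budget_mw = (avg_current_ua / 1000) * assumed_supply_v  # average power available
duty = budget_mw / full_speed_mw                         # fraction of time at full speed

print(f"total: {total_ips:.3g} instructions/s")
print(f"average current for a year: {avg_current_ua:.1f} uA")
print(f"implied duty cycle at full speed: {duty:.2e}")
```

The 60 billion figure checks out. Under these assumptions, though, the one-year battery life only works if the array sits idle nearly all the time. That fits an asynchronous chip drawing next to nothing when it has no work, but it is not a claim about sustained full-speed operation.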
Don't you just have to look for the handshake signals instead?
Also, what are the implications of the "dual-rail" circuits -- doesn't this mean that you won't be able to fit as many transistors on the chip?
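On the dual-rail question, a small sketch may help. Each bit gets two wires, so a receiver can tell "0" from "no data yet" and detect completion without a clock. The encoding below is the usual textbook one, and the tuple order and function names are my own choices.

```python
NULL = (0, 0)   # neither rail raised: no data yet

def encode_bit(bit):
    """(true_rail, false_rail): (1, 0) means 1, (0, 1) means 0."""
    return (1, 0) if bit else (0, 1)

def encode(value, width):
    return [encode_bit((value >> i) & 1) for i in range(width)]

def is_complete(word):
    """Data has arrived when exactly one rail of every bit is raised."""
    return all(t ^ f for t, f in word)

def decode(word):
    if not is_complete(word):
        return None            # still waiting on at least one bit
    return sum(t << i for i, (t, f) in enumerate(word))

def wire_count(width):
    return 2 * width           # two rails per data bit

word = encode(5, 4)
partial = [NULL] + word[1:]    # lowest bit has not arrived yet
print(decode(word), decode(partial), wire_count(32))
```

So yes, the data wires double, and the logic to process both rails costs extra transistors as well. In exchange, the completion check is the handshake signal the question above asks about.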
I hold it, that a little rebellion, now and then, is a good thing. -- Thomas Jefferson
Well, the OS can communicate asynchronously with many things. I don't think you can PROVE your statement. *ell, I think I can falsify your statement.
nosig today
How will Intel sell chips if clockless computing is ever successful? They won't be able to double the length of their pipe to "speed up" their chips. I guess we will have to finally develop some fair metric to finally be able to compare chips between product lines....
I want my rights back. I was actually using them when our government stole them after 9/11.