Slashdot Mirror


Clockless Computing: The State Of The Art

Michael Stutz writes: "This article in Technology Review is a good overview of the state of clockless computing, and profiles the people today who are making it happen." The article explains in simple terms some of the things that clockless chips are supposed to offer (advantages in raw performance, power consumption and security) and what characteristics make these advantages possible.

27 of 140 comments

  1. How... by blkros · · Score: 2, Funny
    are computers going to know what time it is if they don't have any clocks?

    --
    Damnit, Jim, I'm an anarchist, not a F@#$!^& doctor!
    1. Re:How... by Fat+Casper · · Score: 2
      You'll just think that you've set the fuzziness scale too high.

      --
      I spent a year in Iraq looking for WMD and all I found was this lousy sig.
  2. What will they advertise now? by chill · · Score: 3

    What will AMD and Intel try to one-up each other with? No clock speed, so how do you classify, much less hype, new processors?

    The real reason they haven't moved to this yet is their marketing team doesn't want to give up on the MHz race.

    --
    Learning HOW to think is more important than learning WHAT to think.
    1. Re:What will they advertise now? by Anonymous Coward · · Score: 2, Insightful

      That is NOT the reason they have not moved. Designing something as complicated as a CPU without clocks is a daunting challenge: keeping everything in sync, removing race conditions, keeping the order of execution the same. There are a lot of challenges in a clockless design.

    2. Re:What will they advertise now? by mattdm · · Score: 2

      What a novel idea!

      Sarcasm aside -- the SPEC benchmarks have been around for a long time and are well respected. You can see some SPEC CPU 2000 results here.

  3. Clockless ARM by Anonymous Coward · · Score: 2, Interesting

    The Amulet Group at The University Of Manchester have a clockless ARM (ARMs are used in many mobile phones, the Compaq iPaq and the GBA).

  4. Asynchronous vs. synchronous computing by isj · · Score: 3, Interesting

    The article is very interesting. I thought that research in asynchronous computing died in the sixties. What the article misses is that async. operations have an overhead too: the synchronization signal saying "here is the data". Synchronous computing does not have that.

    I have previously read (I've forgotten where) that in theory async. computers will always be slower than sync. computers. It seems that that is not true anymore. I guess that the latest-and-greatest CPUs have a non-trivial percentage of idle time for instructions which take slightly longer than an integral number of clock ticks. If an instruction takes 2.1ns and the clock period is 1ns, everything has to assume that the instruction takes 3ns.
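    The rounding described above can be sketched in a few lines of Python. This is a minimal illustration, assuming each instruction must finish on a clock edge and ignoring pipelining; the function names are hypothetical, and times are integer picoseconds to avoid floating-point rounding.

```python
def clocked_time_ps(latency_ps: int, period_ps: int) -> int:
    """Time an operation occupies when it must end on a clock edge."""
    cycles = -(-latency_ps // period_ps)  # ceiling division on integers
    return cycles * period_ps


def wasted_ps(latency_ps: int, period_ps: int) -> int:
    """Idle time between the logic settling and the next clock edge."""
    return clocked_time_ps(latency_ps, period_ps) - latency_ps


# The 2.1 ns instruction on a 1 ns clock from the comment above
print(clocked_time_ps(2100, 1000))  # 3000
print(wasted_ps(2100, 1000))        # 900
```

    A self-timed circuit would, in principle, hand the result on after about 2100 ps plus whatever its completion signalling costs.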

    Also imagine a fully async. computer. No need for a new motherboard or even changing settings in the BIOS when new and faster RAM chips are available - the system will automatically adapt.

    I think that we will see more and more async. parts in the years to come. But I don't know if everything is going to be asynchronous.

    1. Re:Asynchronous vs. synchronous computing by Trejus · · Score: 2, Interesting

      Also imagine a fully async. computer. No need for a new motherboard or even changing settings in the BIOS when new and faster RAM chips are available - the system will automatically adapt.

      Now I'm not an engineer, but the article mentioned that it was important to have wires and gates connected in a special manner so the data arrives in the proper order. It seems to me that this would make the microprocessor more dependent on the hardware, not less. Maybe this wouldn't be a problem if all of your RAM was the same speed, but it could cause a problem if you had one 100MHz SIMM and one 133MHz SIMM. I would think that the information coming from the 133 could screw things up. Can anyone clarify this for me?

      --
      "To save the planet, I had to go to the worst spot on Earth, and that was Philadelphia." -- Sun Ra
  5. Old news? by NoMercy · · Score: 2, Informative

    The AMULET group at Manchester University have been developing this for years based on ARM cores.

    http://www.cs.man.ac.uk/amulet/index.html

  6. Reliability by numo · · Score: 3, Interesting

    Well, I think that the reason the async chips are not being used is quite simple - a clocked system is much easier to design and verify. You know how long before and after a clock edge your signal needs to be there to be recognised. You know that if these constraints match across your system, it will work. Yes, this makes the system as fast as its slowest link - some circuits operate near their limits, some are actually wasting the time. But it works. An asynchronous design would be a pure hell to debug - that's probably why the industry doesn't (yet) mess with it.
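    The "before and after a clock edge" constraints are the setup and hold requirements. A minimal Python sketch of checking them for register-to-register paths follows; it ignores clock skew and jitter, and the `Path` fields, function name, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Path:
    clk_to_q_ps: int   # launching flop's clock-to-output delay
    logic_min_ps: int  # fastest propagation through the logic
    logic_max_ps: int  # slowest propagation through the logic


def meets_timing(path: Path, period_ps: int, setup_ps: int, hold_ps: int) -> bool:
    # Setup: the slowest signal must arrive setup_ps before the next edge
    arrival_max = path.clk_to_q_ps + path.logic_max_ps
    setup_ok = arrival_max + setup_ps <= period_ps
    # Hold: the fastest signal must not change until hold_ps after the edge
    arrival_min = path.clk_to_q_ps + path.logic_min_ps
    hold_ok = arrival_min >= hold_ps
    return setup_ok and hold_ok


paths = [Path(50, 100, 700), Path(50, 20, 900)]
print(all(meets_timing(p, 1000, 60, 40) for p in paths))  # False
```

    The second path misses setup by 10 ps, so the whole design would need a period of at least 1010 ps: the system is as fast as its slowest link, as the comment says.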

    BTW, does anybody here remember analog computing? A bunch of cleverly connected operational amplifiers? These things were asynchronous, just as Mother Nature is. If you can get the physics to work for you, bingo: compare the time nature needs for raytracing a complex scene with that of a digital model :-) The only drawback is that most of us prefer a slow digital model of thermonuclear reactions and similar problems...

  7. Re:what about other problems? by Bryan+Andersen · · Score: 2
    Yep, you have to rethink a lot. It's possible. I expect we'll see async processors first show up in embedded situations where all parts of the system are integrated on one chip.

    Busses can be made asynchronous. Handshaking is the key. New strategies will be needed, but people are bright, so I feel they will be developed. With a little thinking I've sketched out a packet-type asynchronous bus in my head. It would work nicely for up to a meter or so; longer lengths would be slower than shorter ones. I suspect it may work best for any signal or data that needs to travel a significant distance to switch to synchronous transmission at that point. Otherwise you end up adding in delays from the returning handshake signals.
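    Why the returning handshake signals add delay can be shown with simple arithmetic. In a four-phase (return-to-zero) handshake each word costs four one-way trips along the wire (request up, acknowledge up, request down, acknowledge down); a two-phase (transition-signalling) handshake costs two. The sketch below assumes a uniform wire delay per millimetre and a fixed logic delay per transition; the 7 ps/mm and 20 ps figures are illustrative, and the function names are hypothetical.

```python
def four_phase_cycle_ps(length_mm: int, ps_per_mm: int, logic_ps: int) -> int:
    # req rises, ack rises, req falls, ack falls: four wire traversals,
    # each followed by some local logic delay at the receiving end
    return 4 * (length_mm * ps_per_mm + logic_ps)


def two_phase_cycle_ps(length_mm: int, ps_per_mm: int, logic_ps: int) -> int:
    # any transition of req is one event, answered by one transition of ack:
    # two wire traversals per word
    return 2 * (length_mm * ps_per_mm + logic_ps)


for length in (10, 100, 1000):  # millimetres; 1000 mm is the "meter or so"
    print(length, four_phase_cycle_ps(length, 7, 20), two_phase_cycle_ps(length, 7, 20))
```

    The cycle time grows linearly with length because every word waits for a round trip, which is why switching long runs to synchronous transmission, where roughly only a one-way flight time is paid, can be attractive.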

    I remember some of the first articles in SIGARCH and how they sparked my interest. I've always felt that async was the way to go when you don't know how long an operation will take. I'm happy to see it's still getting research dollars.

  8. Re:what about other problems? by Ravenscall · · Score: 2

    First off, this is pure conjecture, IANAME.

    Okay, the way I suppose this would work, considering that Intel had developed a chip that was compatible with the Pentium series, would be an asynchronous design with some kind of logic translator to communicate with the bus. Yes, at first you would be wasting processor power, but eventually the bus technology would catch up (see ISA to EISA to VLB to PCI to AGP and on...). As for the RAM, it could either run on an independent clock bus, or, I do not see why it would be a problem to develop asynchronous RAM if they have the technology for the chips. Also, the article states that the P IV utilises some asynchronous components; maybe that is part of the reason for the push to use RDRAM with it?

    --
    You say you want a revolution....
  9. CPU Primer by ThePurpleBuffalo · · Score: 2, Interesting

    When designing a "conventional" CPU, you can have a clock that essentially drives events and data movement.

    If you design a multiplier circuit using a bunch of full adders, you'll notice that the output takes a long time to settle. In fact, depending on which numbers you are multiplying together, the circuit may take more or less time before the output settles.
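    The same data dependence is easiest to see in the ripple-carry adders that such a multiplier is built from. The Python sketch below (function name hypothetical) measures how many bit positions the longest carry travels for given operands; a ripple-carry adder's settling time grows with that length, so different inputs settle at different times.

```python
def longest_carry_chain(a: int, b: int, width: int) -> int:
    """Longest run of bit positions a carry ripples through when adding a + b.

    Generate (both bits 1) starts a new carry, propagate (exactly one bit 1)
    passes an incoming carry on, and kill (both bits 0) stops it.
    """
    longest = 0
    run = 0  # length of the carry currently rippling; 0 means no carry
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        if ai and bi:        # generate: carry-out no longer depends on carry-in
            run = 1
        elif ai != bi:       # propagate: extend an existing carry, if any
            run = run + 1 if run else 0
        else:                # kill: no carry leaves this position
            run = 0
        longest = max(longest, run)
    return longest


print(longest_carry_chain(0b0001, 0b0001, 4))  # 1: carry absorbed at bit 1
print(longest_carry_chain(0b0001, 0b1111, 4))  # 4: carry ripples through every bit
```

    A clocked design must budget for the worst case (a carry through all `width` bits) on every addition, while a self-timed adder with completion detection could finish early for operands like the first pair.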

    You can always determine the worst-case settling time for a multiply operation. If the multiply takes longer than any other operation, then the multiply op is the "critical path".

    A chip's frequency is the inverse of the period of the critical path (in most cases). So, if it's possible to do 100 million critical path operations in a second, then your machine can run at 100MHz.

    What the article is hinting at is the amount of wasted time because everything is (currently) done on the clock cycle. Allow me to illustrate: Let's say a multiply takes 5 seconds, but an add only takes 1. A fixed clock rate (or having a clock at all) forces that add instruction to take the extra 4 seconds, and use it for nothing. Wasted computer time.
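    The 5-second/1-second illustration can be turned into a small calculation. This sketch assumes, as the comment does, one operation per clock with the period set by the slowest operation, and it ignores pipelining, multi-cycle operations, and completion-detection overhead; the function names are hypothetical.

```python
from typing import Dict, List


def clocked_total(ops: List[str], latency: Dict[str, int]) -> int:
    # One clock period per operation, and the period must fit the slowest op
    period = max(latency.values())
    return len(ops) * period


def self_timed_total(ops: List[str], latency: Dict[str, int]) -> int:
    # Each operation takes only as long as it actually needs
    return sum(latency[op] for op in ops)


latency = {"mul": 5, "add": 1}  # seconds, from the example above
program = ["add", "add", "mul", "add"]
print(clocked_total(program, latency))     # 20
print(self_timed_total(program, latency))  # 8
```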

    Now, the reason people are skeptical is that there is no efficient way to tell whether a multiply operation (or any other operation) has actually completed and its outputs have settled.

    Incidentally, if this interests you, go grab a free program called "diglog" or "chipmunk". The software (for linux/windows) allows you to simulate almost any digital circuit.

    Another thing to keep in mind about current CPUs is the way they execute an instruction. Every instruction is actually made of smaller instructions (called microinstructions). Microinstructions take one clock cycle each, but there is an arbitrary number of microinstructions for each larger instruction. The microinstructions perform the "fetch execute cycle" - the sequence that decodes the instruction, grabs the associated data, performs the desired task, and goes back for more.

    If you're interested in designing a CPU yourself, go grab a book by Morris Mano called "Computer System Architecture". With that book and DigLog, it's pretty easy, but it takes a long time.

  10. There's a solution to one problem mentioned by Ungrounded+Lightning · · Score: 4, Informative

    if there is no mass market for asynchronous chips, there's little incentive to create tools to build them; if there are no tools, no chips get produced. The same problem applies to the development of chip-testing technologies. Without any significant quantity of asynchronous circuits to test, there is no market for third-party testing tools.

    But at least here there's an accidental solution - the Cross-Check Array.

    Conventional clocked chips can be tested by scan: A multiplexer is added to the flop inputs, and a test signal turns them into one or more long shift registers. The old state of the flops is shifted out for examination while a new state is shifted in to start the next phase of the test. This only works when the flops to be strung together are all part of a common clocking domain.
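    The shift described above can be modelled behaviourally in Python. This sketch does not model the multiplexers or the clock itself; each loop iteration stands for one scan clock, and the function name is hypothetical.

```python
from typing import List, Tuple


def scan_shift(state: List[int], pattern: List[int]) -> Tuple[List[int], List[int]]:
    """Shift a new pattern into a scan chain while shifting the old state out.

    state[0] is the flop nearest scan-in and state[-1] drives scan-out.
    pattern is fed in one bit per clock, first bit first.
    Returns (bits observed at scan-out, final chain contents).
    """
    chain = list(state)
    observed = []
    for bit in pattern:
        observed.append(chain[-1])   # scan-out sees the last flop
        chain = [bit] + chain[:-1]   # every flop takes its neighbour's value
    return observed, chain


old = [1, 0, 1, 1]
new = [1, 1, 0, 0]
out, chain = scan_shift(old, new)
print(out)    # [1, 1, 0, 1]
print(chain)  # [0, 0, 1, 1]
```

    The old state comes out last-flop-first and the pattern lands reversed, so test software feeds patterns in the corresponding order. The model also shows the limitation the comment points out: it only works if every flop in the chain shifts on the same clock.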

    The Cross-Check Array is more like a RAM. A grid of select lines and sense lines is laid down on the chip, with a transistor at each intersection. The transistor is undersized compared to those of the gates, forming a small tap on a nearby signal, or it can inject a signal if the sense line is driven rather than monitored. Select drivers are laid down along one edge of the chip, sense amplifiers/drivers along another.

    This approach does not require the flip-flops to be active participants in the observation process (though it can still force their state), and thus can observe signals in asynchronous as well as synchronous designs. It also gives observability of test points in combinatorial logic without the addition of extra flops. Compared to a full-scan design it gives much greater observability and takes about half the silicon-area overhead.
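    The RAM-like readout can be sketched as a toy Python model. The class, method, and node names are hypothetical, the model covers observation only (not signal injection), and real sense amplifiers read analog levels rather than clean bits.

```python
from typing import Dict, List, Optional, Tuple


class CrossCheckArray:
    """Toy model of a cross-check observation grid.

    taps maps a (select line, sense line) intersection to the name of the
    circuit node whose signal the small tap transistor sits next to.
    """

    def __init__(self, rows: int, cols: int, taps: Dict[Tuple[int, int], str]):
        self.rows = rows
        self.cols = cols
        self.taps = taps

    def read_row(self, row: int, nodes: Dict[str, int]) -> List[Optional[int]]:
        """Drive one select line and return what each sense amplifier sees.

        nodes is a snapshot of node values; no flip-flop takes part in the
        read, so the nodes may belong to clocked or self-timed logic.
        """
        if not 0 <= row < self.rows:
            raise IndexError(f"no select line {row}")
        return [
            nodes[self.taps[(row, col)]] if (row, col) in self.taps else None
            for col in range(self.cols)
        ]


taps = {(0, 0): "alu_carry3", (0, 2): "ack_stage2", (1, 1): "req_stage1"}
nodes = {"alu_carry3": 1, "ack_stage2": 0, "req_stage1": 1}
grid = CrossCheckArray(rows=2, cols=3, taps=taps)
print(grid.read_row(0, nodes))  # [1, None, 0]
print(grid.read_row(1, nodes))  # [None, 1, None]
```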

    --
    Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
  11. Programming difference? by andika · · Score: 2, Interesting

    Does programming for a clockless chip differ from programming for a synchronous one? Every link I tried to follow only explains differences in design, speed, or power consumption.

    1. Re:Programming difference? by 2nd+Post! · · Score: 5, Informative

      It *can* be different, but that's really a function of the state of compilers and languages adapted for an async system. It needn't be different at all.

      Disclaimer: I was a student at Caltech, and I took one async VLSI course, and not a very in-depth one at that.

      One way to go about it is to make an async CPU that externally looks like a sync CPU; then you drop it into just about any system, and it works. Speed is wholly dependent upon VCore settings, cooling solutions, and drive strength, I think, though of course there are always gate and transistor performance bottlenecks. Programming and using such a chip would be no different than any other CPU.

      Another method is to have a partially async system, in which the CPU, some of the motherboard, and the RAM interface are async because of how fast they operate; go ahead and clock something like PCI, USB, etc., because those operate slowly enough that the effort of async isn't worth it. This solution is just a question of degrees, really, on how much of the system is async and how much isn't.

      Now, that aside, there's the software aspect; how do you program an async system? At the lowest level it resembles, slightly, multi-threaded programming, in which you have multiple threads equating to the multiple function units, execution units, decoders, and stages in the pipeline, etc.

      You shuttle data around and wait for acknowledges that the data has been processed before you continue shuttling and processing data. You can synchronize around stages or functional units by making other stages or units dependent upon the output of said unit; instead of waiting for a clock to signal the next cycle of execution, you wait for an acknowledge signal.
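      The wait-for-acknowledge style maps fairly naturally onto threads connected by one-slot queues. The Python sketch below is a software analogy only, not a hardware model: `queue.Queue(maxsize=1)` stands in for a handshake channel, a blocking `put()` stands in for waiting on the acknowledge, and the function names are hypothetical.

```python
import queue
import threading

STOP = object()  # end-of-stream marker passed down the pipeline


def stage(inbox: queue.Queue, outbox: queue.Queue, work) -> None:
    """One self-timed stage: take a token, process it, hand it on.

    outbox.put() blocks while the next stage has not yet taken its previous
    token, which plays the role of waiting for an acknowledge.
    """
    while True:
        token = inbox.get()
        if token is STOP:
            outbox.put(STOP)
            return
        outbox.put(work(token))


def run_pipeline(values, works):
    # maxsize=1: each channel holds one token at a time, like a single latch
    channels = [queue.Queue(maxsize=1) for _ in range(len(works) + 1)]
    stages = [
        threading.Thread(target=stage, args=(channels[i], channels[i + 1], w))
        for i, w in enumerate(works)
    ]

    def feed():
        for v in values:
            channels[0].put(v)  # waits until the first stage has room
        channels[0].put(STOP)

    feeder = threading.Thread(target=feed)
    for t in stages + [feeder]:
        t.start()

    results = []
    while (item := channels[-1].get()) is not STOP:
        results.append(item)
    for t in stages + [feeder]:
        t.join()
    return results


print(run_pipeline(range(5), [lambda x: x + 1, lambda x: x * 2]))  # [2, 4, 6, 8, 10]
```

      Each stage runs at its own pace; a slow `work` function in one stage simply makes the stages and the feeder upstream of it block, which is the back-pressure described in the following paragraphs.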

      To be a little more clear, at the ASM level you would mov data, wait for an ack before another mov data, wait for an ack before sending an instruction, etc. Due to the magic of pipelining, the CPU doesn't have to be finished before you can start stuffing the pipeline, and because it's asynchronous, you can actually feed in data as fast as the processor can receive it, even if the back end or the core is choking on a particularly nasty multiplication.

      So you're feeding data at a furious rate into the CPU, while the CPU is processing prior instructions. If the front end gets full, or whatnot, it fails to signal an ack, so whatever mechanism is feeding data in (ram, cache, memory, whatever) pauses until the CPU can handle more data.

      The core, independent of the front end, is processing the data and sending out more instructions and branches, setting bits. With multiple functional units, each unit can run at its own speed. So if all it's doing is adds, checking conditionals, etc., it may be able to outrun the data feed mechanism, since an add can be completed in one pipeline unit, while data always has to wait on a slower storage mechanism.

      Or if the execution units are waiting because one is doing a square root or something, it just tells the prefetch or other front-end units to wait, because it cannot handle another chunk of data or another instruction yet, and that propagates back to the data feed, which waits as well.

      When it finishes with its current instruction, a ready signal gets propagated back through the stages, and then more data is fed in.

      So at the lowest levels it would start to resemble writing threaded code, in which you have to wait for the thread to be ready, to be awake, to be active before you send data, and if the thread is asleep, you wait until it awakes, or something like that.

      Multiprocessor async is similar, except that each CPU is just another thread, and if there's a hardware front end that decides which CPU to send instructions to, then it's really just a function of stuffing instructions into the least loaded or fastest running CPU; each CPU could, more or less, look like just another functional unit, and clusters pretty well because they all run asynchronously, meaning you don't have to do anything particularly special for load balancing; just send the data to the first one who signals ready, or if there are multiple cpus ready, read a status register to see which is more empty or whatever.
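      The "send it to whoever signals ready" scheme can be sketched as follows. `Unit`, `dispatch`, and the use of buffer occupancy as the "status register" are hypothetical; a real design would make this choice in hardware arbitration logic, not in software.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Unit:
    name: str
    capacity: int  # how many instructions the unit can buffer
    pending: List[str] = field(default_factory=list)

    def ready(self) -> bool:
        # A unit signals "ready" while it still has buffer space
        return len(self.pending) < self.capacity


def dispatch(units: List[Unit], instr: str) -> Optional[str]:
    """Send instr to the emptiest ready unit; None means every unit is busy."""
    ready = [u for u in units if u.ready()]
    if not ready:
        return None  # no ready signal anywhere: the feeder has to wait
    target = min(ready, key=lambda u: len(u.pending))  # ties go to the first unit
    target.pending.append(instr)
    return target.name


cpus = [Unit("cpu0", 2), Unit("cpu1", 2)]
print([dispatch(cpus, i) for i in "abcde"])  # ['cpu0', 'cpu1', 'cpu0', 'cpu1', None]
```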

      Apologies if I made some errors, especially to those who know much more than I; this is a 4 year old interpretation of my async vlsi class =)

    2. Re:Programming difference? by 2nd+Post! · · Score: 2

      The interesting part about asynch CPUs is that they, duh, aren't clocked...

      So in the case of a pipeline flush (and accompanying stall), it doesn't take N clocks (whatever the pipeline depth is), it goes as fast or as slow as the flush mechanism reset takes...

      If done well, a pipeline flush can operate thousands of times faster than the normal operation of the pipeline because, well, you're just dumping data without doing any work; raise the proper bits and reset signals, and the whole pipeline dumps as fast as it can, while the front-end feeder just slows down a bit (without stopping) in feeding data into the pipeline.

      Above assembly, btw, the programming language for the CPU doesn't have to look like SMT; it can, but it doesn't have to.

  12. Efficiency with growing clock speeds by Jimmy_B · · Score: 2
    [from article] But after a point, cranking up the clock speed becomes an exercise in diminishing returns. That's why a one-gigahertz chip doesn't run twice as fast as a 500-megahertz chip.
    Wrong. A 1GHz chip doesn't run twice as fast as a 500MHz chip because of pipelining, and because the support infrastructure in a typical PC can't handle a 1GHz chip well, so it spends a lot of time waiting for hard disk access and memory. Eliminating the clock isn't going to make the heads on a hard disk move any faster. The real benefit is that an idle component can't be a bottleneck anymore.
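    The argument here is essentially Amdahl's law: speeding up only the part of run time spent computing leaves the waiting untouched. A minimal Python sketch follows; the 60/40 split between computing and waiting is an illustrative assumption, not a measurement, and the function name is hypothetical.

```python
def overall_speedup(compute_fraction: float, compute_speedup: float) -> float:
    """Amdahl's law: only the compute fraction of run time gets faster."""
    waiting_fraction = 1.0 - compute_fraction  # disk and memory stalls
    return 1.0 / (waiting_fraction + compute_fraction / compute_speedup)


# Doubling the clock (500 MHz -> 1 GHz) when 60% of the time is spent computing
print(overall_speedup(0.6, 2.0))  # about 1.43, not 2
```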
    1. Re:Efficiency with growing clock speeds by Jimmy_B · · Score: 2
      Hence if you've got a 500MHz chip with 2 stages and the clock physically placed near stage 1, then stage 1 of the pipeline will run at 500MHz, stage 2 will also run at 500MHz but with some latency, so the two-stage pipeline will complete an instruction in very slightly over 2 cycles. Add more stages and you'll get a bigger effect at the end. And as clock speeds go higher, you'll eventually hit the ceiling: the latency might actually be as long as a single cycle itself.
      Latency doesn't affect the time required to complete the instruction, only the time at which it is executed. If the clock reaches second pipeline stage late, the time required to complete that pipeline step is latency+calculation time. If that's more than one cycle, the chip is clocked higher than it can run, period. Anyways, there's a fairly obvious solution to clock latency in pipelines. Put the start of the pipeline near the registers, and the end of the pipeline near the registers, then a U shape for all the ones in the middle; since a stage only needs to interact with the ones before and after it, which are physically adjacent, the effective latency is small.
  13. No changes required. by seizer · · Score: 2

    They're talking about removing the internal CPU clock, which in effect, isn't really a clock at all. It's just something which ticks at regular intervals, and lets you do a number of things, such as synchronize instructions, pipeline, cache read/writes, and all the other stuff I forgot from CS 101.

    A computer's clock (as in date, time, etc) is on another part of the motherboard, and runs (correct me if I'm wrong) off the CMOS battery. That'll always be a "clock" in the sense we understand.

    1. Re:No changes required. by bugg · · Score: 2

      No, a computer does indeed know what time it is based on a clock; it's the same way digital watches know what time it is (counting pulses). The answer? Computers will still have some sort of calibrated oscillating circuit in them, but they won't be synchronizing processor activity.
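      Counting pulses can be sketched in Python. The 32,768 Hz figure is an assumption drawn from common practice rather than from the comment: it is the usual watch and real-time-clock crystal frequency, 2**15 Hz, so fifteen halvings give one tick per second. The function names are hypothetical.

```python
CRYSTAL_HZ = 32_768  # 2**15, the usual watch / real-time-clock crystal frequency


def elapsed_seconds(pulse_count: int) -> int:
    # A 15-stage binary divider turns 32,768 pulses into one tick per second
    return pulse_count >> 15


def hh_mm_ss(seconds: int) -> str:
    h, rem = divmod(seconds % 86_400, 3_600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


print(hh_mm_ss(elapsed_seconds(CRYSTAL_HZ * 3_661)))  # 01:01:01
```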

      --
      -bugg
  14. Ideal Laptop by Vegan+Pagan · · Score: 2

    And build in a microphone and make its screen touch-sensitive. That way you can get rid of the keyboard, trackpad, and hinge and make it a single, consolidated unit.

  15. Mix&Shake by roman_mir · · Score: 2

    It would not be economically viable to push this new type of processor into a market dominated by traditional synchronous processors and computer equipment. However, it seems that asynchronous processing could still be used inside traditional computers if it is mixed with synchronous systems. Imagine a computer that uses a synchronous bus just as it does now but has an asynchronous co-processor, driven by a special type of synchronous CPU that allows certain operations to be carried out asynchronously. If, for example, a matrix multiplication needs to be done, the normal CPU would require a number of clock cycles proportional to the number of multiplications in the matrix product divided by the number of processor pipes allocated to the task. If it can be shown that asynchronous processing can do the same job three times faster than a 'normal' CPU, why can't the 'normal' or traditional CPU ask the asynchronous co-processor to do the task for it? The problem, of course, is asynchronous data retrieval and storage. The co-processor could be a card with its own asynchronous memory bank on board that is later synchronized with the traditional memory banks. Such a system should not be too difficult to implement, since it could use a PCI slot, for example. Over time a computer would become less and less synchronous, with the synchronous parts synchronizing many asynchronous devices.

  16. Re:Power saving, yes.... Good performance???? by TeknoHog · · Score: 4, Informative
    Say we're running at 2GHz, which allows a maximum time of 0.5 ns for an instruction. But if you use some simple instructions that only take 0.2 ns each, you'll be wasting 3/5 of your time waiting for the next cycle. With clockless computing you can move on to the next stage as quickly as you're done with the one before.

    Of course there is some overhead. There has to be a system telling other parts of the computer when something is finished. But if that is a long enough stage (perhaps thousands of instructions) then it'll be faster overall.
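    The break-even point can be worked out with this comment's own numbers (0.5 ns cycle, 0.2 ns instructions). In the sketch below, the handshake cost and the number of operations covered by one handshake are hypothetical parameters chosen to mirror the "long enough stage" idea; they do not describe any particular chip.

```python
def clocked_ps(n_ops: int, period_ps: int = 500) -> int:
    # At 2 GHz every operation occupies a full 500 ps cycle
    return n_ops * period_ps


def self_timed_ps(n_ops: int, op_ps: int = 200, overhead_ps: int = 0,
                  ops_per_handshake: int = 1) -> int:
    # Operations take only the 200 ps they need; a completion handshake
    # costs overhead_ps once per group of ops_per_handshake operations
    handshakes = -(-n_ops // ops_per_handshake)  # ceiling division
    return n_ops * op_ps + handshakes * overhead_ps


print(clocked_ps(1000))                                              # 500000
print(self_timed_ps(1000, overhead_ps=400))                          # 600000: slower
print(self_timed_ps(1000, overhead_ps=400, ops_per_handshake=1000))  # 200400: faster
```

    With one handshake per operation, the self-timed version wins only while the handshake costs less than the 300 ps it saves; spreading one handshake over many operations removes that constraint.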

    --
    Escher was the first MC and Giger invented the HR department.
  17. and it doesn't even make sense... by slew · · Score: 2

    One of the first common "thinking-out-of-the-box" techniques used to crack smart cards exploited software that was written to take different amounts of time to compute legal and illegal keys. By measuring the battery consumption, the smart card crackers could restrict their search to the space of legal keys.

    No doubt this was a software path put in by a well-intentioned programmer trying to save battery life, but now all respected encryption systems recommend a "veil" strategy, where all encryption/decryption operations take the same amount of time and power regardless of the key.

    In practice this means that you find out the maximum time and power (plus some margin), and if you finish early without using enough power, you waste time and power to pad out the veil...
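    A familiar software instance of the "veil" is comparing secrets without exiting early. In the Python sketch below, the step count stands in for time and power, and the function names are hypothetical. Python itself makes no timing guarantees, so real code should use the standard library's `hmac.compare_digest`, which is intended for exactly this purpose.

```python
import hmac


def leaky_equal(a: bytes, b: bytes):
    """Early-exit comparison: the work done reveals how many leading bytes match."""
    if len(a) != len(b):
        return False, 0
    steps = 0
    for x, y in zip(a, b):
        steps += 1
        if x != y:
            return False, steps
    return True, steps


def padded_equal(a: bytes, b: bytes):
    """Always inspects every byte, so the work done does not depend on the values."""
    if len(a) != len(b):
        return False, 0
    steps = 0
    diff = 0
    for x, y in zip(a, b):
        steps += 1
        diff |= x ^ y  # record any difference but keep going
    return diff == 0, steps


print(leaky_equal(b"secret", b"sXcret"))          # (False, 2)
print(padded_equal(b"secret", b"sXcret"))         # (False, 6)
print(hmac.compare_digest(b"secret", b"secret"))  # True
```

    The padded version always does the work of a full match, which is the cost of the veil the comment describes.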

    Nice thought, but this just goes to show that cryptographic systems really need to be designed by experts...

  18. Asynchronous mainframes by Animats · · Score: 2
    Asynchronous mainframes were built in the 1960s, by, I think, Honeywell. There was a modest performance gain, but well under 2x.

    Parts of processors are already asynchronous. The basic way you get stuff done in a clocked machine is that you have a register feeding an array of logic gates some number of gates deep, with the output going to some other register. Within the array of logic gates, which might be an adder, a multiplier, or an instruction decoder, things are asynchronous. But the timing is designed so that the logic will, in the slowest case, settle before the register at the receiving end locks in its input states. The worst case thus limits the clock rate, which is why the interest in asynchronous logic.

    The claims of lower power consumption are probably bogus. As Transmeta found out, the power saving modes weren't exclusive to their architecture. Once power-saving became a competitive issue, everybody put it in.

  19. The usual problems... by Yobgod+Ababua · · Score: 2, Informative

    The article is surprisingly accurate, for a change. Read it.

    However, it seems to have spawned the usual problems here with misunderstanding and confusion. Practically a /. trademark by this point...

    Whether you construct a processor using conventional or asynchronous logic makes no difference to the programmer. The programming paradigm can be completely independent of the underlying hardware. (Admittedly, if you want to squeeze the absolute most performance from a given hardware design, you need to program with it in mind, but there is no reason why an ix86, or PPC, or SPARC, or MIPS chip couldn't be implemented asynchronously.)

    One of the most interesting advantages of asynchronous logic is that it allows the use of arbitrarily large die sizes. In synchronous logic, you're limited by the delays that arise from transmitting your clock pulses across the chip... at some point maintaining a global lock-step becomes infeasible.

    One of the most marketable advantages of asynchronous logic is the power saved by not having to constantly drive the same clock circuitry. Most chips support a 'sleep' or 'low power' mode where they turn off the clock or provide it to only a limited portion of the chip. The chip then has to go through a 'wake up' cycle to re-establish the clock throughout the chip before returning to normal operation. The power saved by asynchronous operation can be substantial, and the lack of a 'wake up' latency can be critical in certain applications.
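    The usual first-order model of CMOS switching power (standard textbook material, not taken from the comment) shows where clock power comes from, with activity factor α, switched capacitance C, supply voltage V_DD, and frequency f:

```latex
P_{\text{dyn}} = \alpha \, C \, V_{DD}^{2} \, f
```

    A clock net charges and discharges every cycle (α = 1) whether or not any useful work is being done, while the nodes of an asynchronous circuit switch only when data actually moves, so idle asynchronous logic draws, to first order, only leakage.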

    The biggest problem right now is that the vast Layout and Design masses are used to solving synchronous problems, not asynchronous ones; ditto for the available tools. However, with an asynchronous-savvy group, a given solution can be designed in less time than the equivalent synchronous solution (someone here was claiming otherwise...).

    And this technology is -not- vaporware... it's real and it's here. And whether you believe it or not, it's at least one part of the future.

    -YA

    PS: BS in EE from Caltech. Working for a company mentioned in the article, although their opinions have no logical relation or tie to mine.