Clockless Computing?
ContinuousPark writes: "Ivan Sutherland, father of computer graphics, has spent the last ten years designing chips that don't use a clock. He's been proposing a method called asynchronous logic, in which no clock signal is distributed to regulate every part of the chip. The article doesn't give many technical details (greatly needed), but Sutherland, now doing research for Sun, says that significant breakthroughs have been made recently to make this technology viable for mass production. It is estimated that 15% of a chip's circuitry is dedicated to distributing the clock signal and as much as 20% of the power is consumed by the clock. This is indeed intriguing; what unit will replace the familiar megahertz?"
The hyper-marketing-droids at mega-chip-corporation won't believe that consumers can handle not comparing chips by MHz/GHz ratings, so you will see what they did to CD-ROM drives: 12.5GHz Max!
"Little does he know, but there is no 'I' in 'Idiot'!"
Logic synthesis tools are very important to modern day IC design and optimization.
That is why Theseus Logic, Inc. (mentioned about 2/3rds down in the NYTimes article) has a Strategic Alliance with Synopsys. Our patented NCL (Null Convention Logic) technology, unlike many other asynchronous technologies, is designed for maximum interoperability with existing tools, maximum design reuse and near-complete elimination of common CBL (clocked boolean logic) timing closure issues.
For those that mentioned Amulet, its project leader, as well as original ARM designer, Steve Furber is on Theseus' Advisory Board.
Please visit our web site for more information.
Disclaimer: I am an employee of Theseus Logic, Inc., who is NOT speaking on behalf of Theseus Logic in this post, nor has its content been approved by any Theseus Logic official.
-- Bryan "TheBS" Smith
Independent Author, Consultant and Trainer
Even without a set clock cycle, any CPU must have some sort of regulatory system which coordinates the execution of instructions (this is, of course, the primary function of the system clock). Without such a system, all parts of the CPU could execute instructions at random, making performance-improving techniques such as pipelining useless. So where would the regulatory circuitry be on such a chip? Surely adding it to the CPU itself would counteract the supposed gains from ditching the clock.
Imagine your local server room. Don't you think they would like a 20% decrease in their power bill?
We can afford the power bill easily enough. What we can't deal with is the extra cooling.
Does anyone know roughly:
cooling the chip will automagically speed it up.
asynch chips go as fast as the hardware can when the software needs it
Technically you don't know how long it takes on a regular microprocessor because of out-of-order execution and multiple issue per cycle. And on an async processor, I'd imagine that each instruction has an average latency. That way, you'd know about what should happen.
Outside of a dog, a book is a man's best friend. Inside a dog, it's too dark to read.
You would measure this form of speed with names like
Intel Pentium V Fast
Intel Pentium V Really Fast
Intel Pentium V Yeah we know this one costs the same as the fast one last year but it is so much faster.
AMD Thunderbird Oh my god did you see how fast that was
AMD Thunderbird Seriously ya'll this is quick
IRNI
if you wire the asynchronous machine well. I'm taking undergrad Computer Engineering at U of T, and we have a course which introduces asynchronous finite state machine design.
Although it's much harder to design such a machine, it is not impossible, and it acts the same way as a clocked machine, except there is no clock to interface you with it, so you would have to have it output a simulated "clock" that would give you the information on when it's ready for the next instruction.
This clock wouldn't really be a clock, as you might expect. It would give you an edge when the AFSM is ready, so the period would vary from instruction to instruction. This isn't such a problem though, since it's the clock that dictates the pace anyway.
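A rough sketch of that "ready edge" idea, in Python rather than hardware. The operation names and delays below are made up for illustration; the only point is that the spacing between ready edges follows the work done, not a fixed period.

```python
# Hypothetical per-operation delays in nanoseconds (illustrative, not measured).
DELAY_NS = {"nop": 1, "add": 2, "mul": 5}

def ready_edges(program):
    """Return the times at which the machine signals 'ready' after each op."""
    now = 0
    edges = []
    for op in program:
        now += DELAY_NS[op]   # the machine takes as long as this op needs
        edges.append(now)     # then raises its ready edge for the next op
    return edges

edges = ready_edges(["add", "mul", "nop", "add"])
# Gaps between consecutive edges: the varying "period" of the simulated clock.
periods = [later - earlier for earlier, later in zip([0] + edges, edges)]
print(edges)    # [2, 7, 8, 10]
print(periods)  # [2, 5, 1, 2]
```

Whatever sits outside the machine just waits for the next edge before handing over another instruction.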
If any of you are interested in more on this, check out our text, "Fundamentals of Digital Logic with VHDL Design" by S. Brown. Chapter 9 introduces the concepts.
Janimal
You forgot the all-important CowboyNeal. I don't know about you, but I wouldn't buy a computer with less than 50 giga-CowboyNeals of processing power.
Two things: A friend of mine was taking some EE classes at Cornell on chip design, and he said he attended an optional lecture (not related to his class, but his professor suggested that everybody go anyhow), and the person giving the lecture had built an asynchronous MIPS chip... That's cool =:-)
The second thing: I'd hope he'd have made some advances in programmability in the mean time...
---
Play Six Pack Man. I
So, what is a CFPP? It is a processor with a pipeline where data and instructions flow in opposite directions, with the instructions usually thought of as moving "up" and data as moving "down". The functional units (FU) are attached as sidings to the main pipeline. Each FU launches from a single pipeline stage and writes its results to a different stage, further "up" the pipeline. The main goals of this architecture were to make the processor simple and regular enough to create a correctness proof and to achieve purely local control.
If Sun ever produces a processor that is asynchronous, it will likely look similar to this.
--
"You can put a man through school,
But you cannot make him think."
Ben Harper
You mean, like the way they optimize for MHz over other, useful things, like flops? Remember when AMD did that little ad campaign of "Our 800 MHz chip is faster than Intel's 766 MHz chip!" How many "normal" people followed that one? Today, MHz is the standard rating of speed, and is misleading. mflops would be a much better measure (although you're right that, with different ops taking different amounts of time, you'd have to carefully define what you mean by an operation).
Secondly, I don't think it will take "several years" of experimentation to figure out how much faster your add is than your multiply. We already know the answer to that question, and it depends on how you decide to implement your circuit. If you decide to do multiplication with shift/add you could get a tiny little multiplier that's freaking slow, or you can go hog-wild with (7,3) counters, Wallace trees, fast adders, etc., and have a gigantic circuit that's really fast. That's how hardware design has always worked, and the options for solutions will be unchanged. Now, though, you have a few more choices to make, since your ops don't all have to fit into equal-length pipeline stages, and each op doesn't have to take the same amount of time for each set of inputs (for example, 7 + 1 might take x gate delays of time, whereas 7 + 0 could take far fewer).
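To make the 7 + 1 versus 7 + 0 point concrete, here is a small Python sketch that measures the longest carry chain in a plain ripple-carry add. Treating chain length as a stand-in for gate delay is an assumption for illustration; a real adder's timing depends on its circuit.

```python
def longest_carry_chain(a, b, width=8):
    """Longest run of bit positions a single carry ripples through in a + b."""
    longest = chain = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        if x & y:                 # generate: a new carry starts at this bit
            chain = 1
        elif (x ^ y) and chain:   # propagate: an incoming carry ripples onward
            chain += 1
        else:                     # no carry leaves this bit
            chain = 0
        longest = max(longest, chain)
    return longest

print(longest_carry_chain(7, 1))    # 3
print(longest_carry_chain(7, 0))    # 0
print(longest_carry_chain(255, 1))  # 8
```

A self-timed adder could signal completion after the short chain, while a clocked one has to budget for the worst case every time.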
It's all very exciting.
God does not play dice with the universe. Albert Einstein
Those who fail to understand communication protocols, are doomed to repeat them over port 80.
If asynchronous chips became popular, it might help further debate on the difficult question of what makes "fast" fast. The common number, MHz, is of course pretty meaningless -- sort of like measuring the speed of a car by the RPMs. (If you've got the same model of car in the same gear, it's a meaningful comparison. But is a G4 at 500 MHz exactly five times as fast as a 486 at 100? What does "five times as fast" mean anyway?)
...Or not -- there are obvious and equally meaningless alternatives for asynchronous chips, like FLOPS, or LOPS (Logical Ops Per Sec).
Making the familiar measurement meaningless might put more emphasis on benchmarks, and give more impetus for getting them better standardized and more meaningful.
The point most people miss about chip design in general is that whatever the methodology being discussed, modern chip designs would be utterly infeasible without the CAD software that goes with them. The complexity of any IC design, whether synchronous or asynchronous, is utterly beyond any manual design methods at this point, and the main reason that synchronous design predominates today is that CAD tools for synchronous synthesis came to market (Synopsys in particular) and have dominated the field for nearly 10 years now. However, research on async CAD tools continues, one notable effort being the European OMI project EXACT (http://www.omimo.be/system/templates/OMIProjects_Detail.cfm?ID=6&Project=6143), which yielded a commercial error correction chip used by Philips in DCC players. If groups such as the AMULET group can automate their methodologies, then async design can quickly gain ground in various power- and performance-sensitive niches.
This sounds like the kinds of problems that the concurrent functional programming people love. See Erlang or perhaps some concurrent variation of Haskell like Eden.
Regards,
Zooko
Try to count seconds _asynchronously_ with your heart... It's not easy, I've tried :-)
- Steeltoe
http://www.debunkingskeptics.com/
> [can measure how many operations it can do per second]
Yes, but the point is that even the same processor may take a different amount of time to do the same operation, albeit on different data.
There are structures in your brain that basically function as a 24 hour clock, and some people can use them so effectively that they can tell time. This "clock" is not distributed to every functional component though. Ditto in your hearing example.
Asynchronous computers will have a timing clock, just not a clock signal that controls the gates on each functional component. This is the difference between attaching a few thousand parts to the clock and attaching 37 million+.
What makes it interesting is that you have to fundamentally redesign your whole logical design so that you have a general purpose design.
With clocked computing, it is easy to see how you would flush buffers, etc. Clockless computing would be more problematic, and of course, would probably be proprietary.
My initial reaction is that it would work easiest in things like embedded processing. I also wonder if there would have to be some sort of evolution similar to what we have seen over the past few years with Intel, Motorola, etc.
One must not forget that the increases in performance for an awful lot of these chips have to do with clock speed increases, as well as code designed to take advantage of certain coding features in the hardware.
An early example of this is when the Pentiums first came out. For a while you had 486 boxes and Pentiums with the same clock speeds on the market. You could compare performance between systems with the same video cards, same RAM, same cache, etc., even though the chip sets were not the same. This was educational. As I recall, the performance boost for software not taking advantage of the Pentium feature set was about 20 - 25% (?) I may have this wrong, of course.
But at a time when Pentium systems cost twice as much as a 486, it was definitely buying for the future.
"It is a greater offense to steal men's labor, than their clothes"
All your suprachiasmatic nucleus are etc, etc.
How come one Amulet post is modded "3 informative", and this one is modded "3 offtopic"???
I'd have thought that a post about a commercially available async. processor and the benefits of async. design are rather "on topic" for a story on an async CPU... particularly when the story claims this is something new when the idea itself is decades old, and Amulet itself (async. ARM, designed by Steve Furber, the original ARM architect, now a professor at Manchester University) has been around for quite a while.
As this post points out, not only is async. (i.e. data driven) design good for low power (you only use power when doing something!), but it also promises to raise performance by allowing each part of the chip to independently run as fast as it is able, and compute results as soon as its operands are available rather than waiting for the next clock.
The obvious and most practically useful approach to speed grading async. CPUs would be to bin them into lots that meet a set of minimum performance standards.
I somewhat disagree about where speed grades would be an issue. The obvious market for async. CPUs like Amulet is in handheld consumer devices where precise performance characteristics don't matter (hence the acceptability of conventional power management techniques), but power consumption does.
For embedded real-time applications, however, you need repeatability more than power savings or peak performance.
And yes, it's true, Michael stole the Signal 11 account from the person who was using it. Shortly after it happened, there was a discussion about it on one of the front page stories of the day, in which Michael participated and basically admitted what he did (he acted like there was nothing wrong with it). He then promptly modded down all the posts in that thread (including his own) to -1 so they wouldn't get archived. Anybody who was paying attention at the time will remember what I'm talking about.
Axel backed up his statements with references wherever possible, and you merely assert, with no evidence, that he's lying.
Folks, what Axel said is true. Ignore "Ryan"; he's just trying to confuse the issue.
Free Hans!
The Commodore Amiga 3000 and 4000 actually feature a 32-bit "clockless bus" called Zorro III, which is overlaid on top of a clocked bus, similar to ISA, called Zorro II. The Zorro III bus is a very interesting bus. All timing is strobe based. This means data is thrown out on the bus, and the receiving card or bus controller strobes a reply back as soon as it can latch the data and make use of it. Transfers between cards and memory can be extremely fast; however, transferring from a "fast" card to a card with slower logic will actually slow the system down. Addressing is multiplexed, like PCI ... it's a very cool thing though: imagine if PCI were transparently overlaid onto ISA, and that's Zorro III! All data transfers were done between the 3.5MHz clock cycles of the Zorro II bus. Of course, the technology has been lost over the years -- it would be cool if it were used again. Clock sucks!
... Another concept of interest on this topic is Null Convention Logic. Here is a report to the NSF who apparently paid for some level of research on this.
I'm not sure if it was DEC or DG, or one of the other old Minicomputer manufacturers, but I seem to recall that something like a PDP-11 class machine was implemented without a CPU clock. Any old timers able to fill me in, here? -jcr
The only title of honor that a tyrant can grant is "Enemy of the State."
Yeah, but that clock is like the BIOS or hardware clock, not like the CPU clock. It times large scale activity in the brain, but not second by second activity. I remember reading about clocks that governed second by second activity that could be read in brainwaves.
Need a Python, C++, Unix, Linux develop
...but only when it needs to synchronize with those components.
Is ten years of research really worth a 20% decrease in power consumption and a 15% decrease in overall chip size? I can't see how it could be. Chances are, by the time this technology is ready for prime-time (if ever), chips will be utilizing vastly different technology than they are now.
It's becoming increasingly hard to shrink chip sizes and increase speeds. Even with different metals such as copper and shrinking trace widths, we are eventually going to hit a brick wall with current technology. After that, taking away 15% of the chip complexity is not going to go far in creating the next generation of faster chips.
It's time to look to new technologies: carbon nanotubules and buckyballs, quantum computing, etc.
-=e
It makes good sense. Some operations take longer than others. So adjust the clock period to suit each instruction. Only works if you execute one instruction at a time. Otherwise I can't see why you are puzzled. If, as at present, you run at a fixed clock speed, some instructions are ready before others, which is inefficient. Of course in the real world, it makes sense to keep the clock speed fixed, which is my worry about asynchronous designs; sounds good on paper, doesn't translate into zillion gate designs.
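A back-of-the-envelope version of that argument in Python, for the one-instruction-at-a-time case only (no pipelining, no overlap). The latencies are hypothetical numbers picked for illustration.

```python
# Hypothetical per-instruction latencies in nanoseconds (illustrative only).
LATENCY_NS = {"load": 1.0, "add": 2.0, "mul": 7.0}

def fixed_clock_time(program):
    """Every instruction takes one period, sized for the slowest instruction."""
    period = max(LATENCY_NS.values())
    return period * len(program)

def self_timed_time(program):
    """Each instruction takes only as long as it needs."""
    return sum(LATENCY_NS[op] for op in program)

program = ["load", "add", "load", "mul", "load"]
print(fixed_clock_time(program))  # 35.0
print(self_timed_time(program))   # 12.0
```

The gap shrinks as soon as the instruction mix is dominated by the slow operations, and the sketch says nothing about the overhead of the completion signalling itself.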
Have you been peeking at my source?
http://twitter.com/onion2k
You're right and yes, I have. To 288 packages, each one an ASIC, over a volume of about one cubic foot. I more or less gave up doing hardware after that!
I'm just taking low-level design for my BS in CS, so I'm no expert, but it seems to me that... All the clock does is tell the CPU when to execute the next instruction. Well, the voltage change also drives the execution, but the voltage change takes place when the clock tells it to. What if the next voltage change took place when the last instruction said it was done? Why wait 1/(800 x 10^6) seconds for a register load which is done in less than 1/3 of that time (and register loads are VERY common)? If the last instruction executed signaled the Program Counter to fetch the next instruction when it was done....voila! Faster computing!
Seems to me that system performance should be measured by real-world benchmarks... ...perhaps using Q3 as the de-facto standard...
>TIMEDEMO 1
Well, that's all most people seem to care about these days anyway.
"Hey man, I get 143 frames per second in Quake 3! 147 if I overclock!"
"Everything you know is wrong. (And stupid.)"
Moderation Totals: Wrong=2, Stupid=3, Total=5.
I can introduce you to one of several cousins who generally have the effect of sending high-frequency signals not only along your spine, but along nerves you never knew you had before - if the ``mike'' in your email address does stand for michael. Youngest candidate is about 15, oldest is about 30. Warning: they're more likely to stop your clock than start it, if the old ticker isn't in good shape or the old blood supply is a bit lean... (-:
Got time? Spend some of it coding or testing
Hm. How about the time it takes to compile the generic NetBSD kernel? NetBSD runs on almost anything for which the speed is interesting, and compiling involves a nice mix of real-world operations, except for floating-point.
(Yeah, I'm joking... a little bit :)
Most of the times wins. What people are saying is that asynchronous logic is much harder to create. Or is it just that everyone was taught more about clock-based circuits than asynchronous logic circuits? Well, I would personally like to see how this plays out.
'pretentious signatures on dork postings will be moderated as such'
Chuck Moore (the guy who invented Forth way back when) has built several minimalistic chips which have a lot of asynchrony. The way they do it is very different from dataflow -- instead of having functional units notify each other when they're ready, the hardware is generally designed to /assume/ readiness, and in fact to /be/ ready.
For example, in traditional processor design the first stage of execution is decoding, and the second is register lookup; in Chuck's chip there is no register lookup, because he uses a stack. Because of this, instruction decoding proceeds in parallel with all the ALU computations and result accesses, and when the instruction is decoded the result is simply to gate a result into the TopOfStack register (and sometimes to pop the stack, if the instruction was supposed to consume two values).
The only exception in the current design is the ADD instruction, which he's implemented as a ripple-carry; that can sometimes take more than one cycle to compute, so if there's a possibility of a large carry the programmer is responsible to insert up to 3 NOPs.
The URL is http://www.ultratechnology.com.
In my microprocessor design class a few years back, I built my processor around these concepts. It was an unqualified success; it was easy to build, easy to program, much less resource-constrained than any of the other designs in the class, and ran all the required programs much faster than anything else. Almost everyone else was following the party line and building a RISC-style two-operand machine; since we only had eight bits per instruction, this was suicidal. The few who weren't completely toeing the line were building accumulator machines, which worked well but didn't have the sheer flexibility.
-Billy
There would still need to be some type of oscillator on board, at least for computers. Otherwise, how would a computer be able to do time-sensitive operations, like waiting 1 second, for instance? Also, how would the computer be able to keep the time? As humans, we think in terms of seconds, minutes, and hours. There would be no way for Outlook to tell me I have a meeting in 5 minutes, not that it can do it correctly even with a CPU clock, though.
The biggest point is that this will save enormous amounts of electrical power that can be used elsewhere.
Sutherland's work is nothing new. The Computer Science department at the (a department he founded) has been working on this for years as well. They have made significant progress.
I know, I worked with one of the professors for a while before I went into electrical engineering.
Just 'cause the British government can't make it work does not mean it is impossible. Many inventions we use every day were considered "impossible".
Remember that since you don't know about the research, perhaps there is something you don't know.
Weren't the original Cray-designed machines clockless? Supposedly the idea was that signal propagation times synched up so that everything got where it was going at the right time. I remember hearing horror stories of CDC field engineers debugging installations by clipping wires and re-testing, over and over, until signal propagation times synched up. Of course, operating temperature was kinda an issue.
On that front, the Cray 1 didn't have any cooling -- everything was designed to run "hot," so the circuit boards' contacts were all flaky, for example, until the board heated up and the board expanded to tighten the contacts.
Brilliant? Insane? Yes.
At the end of the day, MIPS and FLOPS and all the other so-called performance numbers are pretty much meaningless anyway, since no one factor governs the overall performance.
:)
Seems to me that system performance should be measured by real-world benchmarks that most people can relate to and that hold much more importance to the user, such as:
SBT - System Boot-up time, where the value is normalized modulo the time it takes to get coffee
GAFR - Graphics Accelerator Frame rate, perhaps using Q3 as the de-facto standard
and the related:
MTBF - mean time between frags
of course, we should also measure things like:
MTBR - Mean Time Between Reboots
MTBNR - Mean Time Between Network Redials
... but these are questionable since they can be affected by outside influences unrelated to system performance
That's the whole point of many years of research on the subject. This is nothing new.
The CPU hits bottlenecks in slow components now. The difference is that right now everything keeps running at full power during a bottleneck while it does nothing at all. With async design, when components are waiting they take a fraction of their power, as there is no clock cycling them off and on.
Also, the layout and design of a synchronous clock is a major limiting factor in CPU design.
there's no problem. busy wait loop delays are calculated at boot time. just cuz you don't have a distributed clock, doesn't mean the instructions aren't complete at regular intervals.
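For anyone wondering what "calculated at boot time" looks like, here is a rough Python sketch, similar in spirit to the delay-loop calibration Linux reports as BogoMIPS. The function names are hypothetical, and note the assumption baked in: the calibration step itself still needs some real-time reference (here `time.perf_counter`), even if the wait loop afterwards does not.

```python
import time

def calibrate(sample_iterations=200_000):
    """Estimate how many empty loop iterations this machine runs per second."""
    start = time.perf_counter()
    for _ in range(sample_iterations):
        pass
    elapsed = time.perf_counter() - start
    return sample_iterations / elapsed

def busy_wait(seconds, loops_per_second):
    """Spin for roughly `seconds` using only the calibrated loop count."""
    for _ in range(int(seconds * loops_per_second)):
        pass

loops_per_second = calibrate()     # done once, e.g. at boot
busy_wait(0.01, loops_per_second)  # roughly 10 ms, no timer read inside the wait
```

The delay is only as accurate as the assumption that the loop runs at a steady rate, which is exactly what a self-timed chip whose speed varies with temperature or data would put in question.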
I don't think flops will be able to replace GHz, what with the idea of a floating point operation recently becoming "obsolete". To quote the Matlab version 6 help file for the function FLOPS (available in Matlab 5.3.11 and prior):
This is an obsolete function. With the incorporation of LAPACK in MATLAB version 6, counting floating-point operations is no longer practical.
While I have to admit that FLOPS didn't give a 100% accurate picture of what's going on, it was nice to test an algorithm and see if the actual flop count matched the theoretical count...or to get a rough idea of what constants are associated with the order of the algorithm. Thank God for Octave!
Neural nets are simply a way of creating adaptive digital logic. If you build a 3-layer network, you can train it to implement any boolean function (with however many inputs you have).
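As a tiny illustration of a layered network implementing a boolean function, here is XOR built from threshold units in Python. The weights are set by hand rather than trained, and the unit names are made up for this sketch; it only shows that one hidden layer is enough for a function a single unit cannot compute.

```python
def step(x):
    """Threshold activation: fire (1) if the weighted input is positive."""
    return 1 if x > 0 else 0

def xor_net(a, b):
    h_or = step(a + b - 0.5)          # hidden unit 1: fires if a OR b
    h_and = step(a + b - 1.5)         # hidden unit 2: fires if a AND b
    return step(h_or - h_and - 0.5)   # output: OR but not AND, i.e. XOR

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```

Training would mean finding weights like these automatically instead of picking them by inspection.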
Back in the 40s or 50s, some mathematician guy (not J'VonN', although he was somewhat involved?) proved that NNs/digital logic is isomorphic to some sort of logical calculus stuff. (Sorry for the lack of details.) People got excited because, philosophically, they thought that Formal Logic = Thought. Nowadays, most of us would be kinda skeptical of that assumption.
Also, keep in mind that our control systems aren't just neurons (of which there are a buncha kinds)...there's also the endocrine system and all.
Then there's the concept of the soul...I'd sure like to be non-deterministic, wouldn't you? ;-)
FLOPS, of course.
Even without a processor clock, you should still be able to measure how many operations it can do per (real-time) second.
While clockless digital circuit design is not new, it's not as well analyzed as conventional synchronous circuit design. There are still many improvements to be made to this design technique. It will also take time before it will be generally accepted by the industry.
I saw a presentation about Null Convention Logic (NCL) just last week. It seems that there are companies out there already manufacturing chips using this particular asynchronous design technique. Overall, the advantages don't seem to be great enough to make a big impact on the computer industry, but with active research in this area we might see something within the next few years.
It is both amusing and frustrating to hear all of the "armchair computer scientists" discussing the reasons this technology is a bad idea. As if they knew more about the subject than the many PhD's who have dedicated their careers to this subject based on the knowledge gleaned from the one Computer Architecture class the poster took as an undergraduate.
I was invited to work on a team at the University of Utah (Sutherland's old school) where they were researching this very topic. This is old news; they have been working on it for years. And as some people have correctly pointed out, there are both good and bad points to sync or async logic.
There are two major reasons to work on async logic: clock skew and power savings. The reason for power savings alone is a good one. People here have been complaining that it "is not worth it for only a 20% power savings".
Yes it is! In a modern office, computers end up taking a lot of power. Imagine your local server room. Don't you think they would like a 20% decrease in their power bill?
That means instead of building five power plants, you only need four (on a grand scale; please no newbie replies like doodz, thiz guy thinkz you n33d five pawer pl3nts to run a box). That is significant. And with today's high-MHz CPUs this means even more. Some think >50% savings, and even more during low cycle time.
The clock skew issue has been covered somewhat here. One of the major hurdles in solving the design problem is the development of new design tools, which is what many people at Utah are currently working on.
The way to move forward is not to argue for the limitations of systems of the past. Don't make me pull out Ken Olson quotes here.
http://channel.nytimes.com/2001/03/05/technology/05IVAN.html?pagewanted=all
Research themes for AI:
* representation
* understanding the representation
* reintegrating new things (learning)
* deciding how to act on what you know
So that would be no change there for 40 years then... but, I have to say, the spin offs are spectacular. To answer your comments one at a time:
* Neural nets are probably not capable of consciousness because beings with a digital matrix cannot conceivably operate in a linear fashion. What will time mean when you can live in any "when" that you, or anyone else, recorded? So no "self", no "stream of being".
* Subsystems - good research theme - look at http://www.cc.gatech.edu/~jimmyd/summaries/chandrasekaran1988-1.html for a summary of Chandrasekaran's seminal paper.
Hope that is of interest,
Si.
--------------------------------------------- "In the end, we're all just water and old stars."
The best part of slashdot is the hypocrisy. Slashdot has a definite "do as I say, not as I do" policy.
Example 1: Censorship
Slashdot claims to be anti-censorship. They make prominent figures in the anti-censorware movement authors. I'm talking about Michael Sims and Jamie. They claim to promote free speech. But do they really?
I'm not going to bore you with tales of the dreaded bitchslap.
Here's an article you might find interesting. It's about Michael's real position on censorware.
Also, here's a charming article.
Example 2: Auctions
Taco and Hemos find the idea of auctioning virtual property to be interesting. Here's a story by Hemos, and here's one by taco.
But what happens when someone tries to auction a slashdot account? Here's a snippet from an IRC log:
[22:25:58] [Questions] JustSomeGuy asks: How do you feel about the recent sale of user accounts on ebay?
[22:26:06] [CmdrTaco] should we fess up?
[22:26:11] [CmdrTaco] we fucked with the first guys karma.
[22:26:14] [CmdrTaco] it was funny as hell.
[22:26:28] [CmdrTaco] we wrote a script to give him random karma from 0.. number of seconds until ebay auction ends.
[22:26:35] [CmdrTaco] so he had 0 karma when the sale ended.
[22:26:41] [CmdrTaco] he updated his account to cry.
[22:26:44] [CmdrTaco] it was so funny.
What's this? Taco writing a script just to fuck with a user? Say it isn't so.
You can view the complete IRC log here.
Oddly enough, this never gets mentioned in any story on virtual property auctions.
Why is that?
Example 3: Community
Slashdot is a community oriented website. They win webbys for this. It's the community that helped Taco and Hemos to a big pile of VA Linux stock.
But they don't really give a fuck about the community.
Here's a quote from an email Taco sent to Shoeboy:
> Anyway, to go back to my original point, I think a fair
> number of readers are interested in who the trolls are
> and why they post what they do.
That may be, but I don't care. I post Slashdot stories that *I* want to read.
You can get the whole email thread here.
(Shoeboy kicks Taco's ass hardcore)
Want more? How about the theft of user accounts?
Famous slashdot poster Signal 11 grew tired of this site. So he gave away his account. Dear beloved free speech advocate Michael discovered this and used his authorial privileges to steal the account. No warning was given. No explanation either. The account was simply stolen and that was that.
These are all reasons I love this site. If I wanted a site that wasn't run by assholes, I'd read kuro5hin.
NOTE: this post is entirely factual. If you have any doubts about the veracity of these claims, feel free to contact Taco.
Cheers,
~Axel~
--
Well, it's not exactly a dataflow machine, anyway.
The old E&S machines were dataflow architectures at the equivalent of the "machine code" level. Newer architectures are using similar ideas, but in a way that does not require details of the dataflow model leaking outside the chip.
Look at the Pentium 3, for example. It exploits dataflow ideas at the microcode level by prefetching several machine code instructions, splitting them into a larger set of "micro-instructions" and then scheduling them together. That's not really a dataflow architecture, but it does use ideas from it: the idea of deciding on how to schedule the instructions at run-time.
The new clockless CPUs will exploit dataflow ideas by implementing a kind of dataflow machine between the functional units of the CPU itself. The CPU, remember, is like an interpreter for machine code. Since the "program" for that interpreter does not change, it can be implemented in a "plugboard" kind of way and people or programs producing machine code will never know the difference, apart from speed.
sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
This design paradigm and sets of tools all assume synchronous logic.
That is a significant issue. Because of that, I'm not betting that my next CPU will be async. The first applications will probably be in the microcontroller area where the chips tend to be simpler and less powerful, and where constraints on power consumption are tighter and more important. However, just as the industry made the transition from hand-drawn schematics w/ discrete components to ASICs, this too shall come to pass.
Imagine an automotive assembly line where things could only move forward if each station got permission from his adjacent stations.
To some extent, that's EXACTLY what happens in many line processes. To compensate, there are 'buffers' built into the system. Like with clocked logic, each step is designed to take about the same amount of time to complete so that there won't be pileups or starvation at any given station. Consider that in many line processes, some stations, especially at the input and output areas are human operated, and humans are very asynchronous, yet the line proceeds in an orderly manner overall.
In the case of CPUs, I imagine a RISC approach will probably be used where the instruction set is designed so that each instruction will take roughly the same amount of time (generally true today as well, but enforced by the clock). The compiler will be responsible for scheduling the instructions to avoid starvation and pileup. In cases where multiple identical units exist, there may even be routing bits in the instructions to choose which unit is employed. That's not a very big stretch from the current situation since good optimizing compilers already have to know about that process but don't get to make the choices.
But if it doesn't have a clock, how do you overclock it?
</sarcastic>
UK egghead Sir Clive Sinclair wanted to produce a clockless computer in the late 80's, I recall. It never came to anything, because his business acumen was always somewhat lacking.
Yeah, but can I run Linux on it?
-----
Actually, the human brain does have a clock signal of sorts in it. For interpretation of audio and such, there is a "gate" of a few hundredths of a second at the onset of a sound when the brain can figure out where it's coming from. After that onset, it's almost impossible to figure out the direction of a sound.
Also, how about people that can tell time without a watch? I knew someone who was accurate to the minute, day or night. There must have been some clock signal in his brain to govern that.
Yeah, I think your sig says it all.
MHz has always been a paltry substitute for a real indicator of processor power. Just look at the Pentium 4 and the PowerPC vs. Pentium wars a few years back. Not to mention the current Athlon craze.
;-).
A good benchmark program is always a better indicator of processor power anyway. No, they aren't perfect, but if you take the results of a few different ones, selected to balance their strengths/weaknesses, you end up with a good idea of how a processor really 'stacks' up.
There's been a good deal of research on this under the topic of wave-pipelined systems. The biggest problems are in the area of design tools, e.g. accurate timing checks and debugging methods. On the flip side, in 1993 a 32-bit wave-pipelined multiplier built by a grad student at NC State was running at 200 MHz in 2.0 micron CMOS.
Either we have to have separately clocked parts in smaller domains or we have to go asynchronous. Both are insanely difficult, but the latter has the possibility of generating speeds unheard of. There are transistors capable of 250 GHz (not in CMOS Si technology, but anyway) and with some reduction in the feedback, a back-fed inverter could generate 50-100 GHz, locally. Imagine small parts of a chip operating at that speed and using level-triggered handshaking... difficult, but mindblowing. :-)
Another thing: we would get rid of the power consumption. Clocked CMOS consumes power proportional to the frequency even when it is not doing anything (at least the clocked parts do). Asynchronous logic would not waste any charge in the on-off switches... Some real power saving!
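As a rough illustration of the "proportional to the frequency" point, here is a minimal sketch using the usual first-order model for dynamic CMOS power, P = a * C * V^2 * f, where a is the fraction of cycles in which a node switches. All the capacitance, voltage and frequency values are made-up assumptions, not figures from any real chip.

```python
def dynamic_power(activity, capacitance_f, vdd, freq_hz):
    """First-order dynamic CMOS power in watts: P = a * C * V^2 * f."""
    return activity * capacitance_f * vdd ** 2 * freq_hz

# Hypothetical values, chosen only to illustrate the model.
CLOCK_NET_CAP = 1e-9   # 1 nF of clock wiring and latch inputs (assumed)
LOGIC_CAP = 4e-9       # 4 nF of ordinary logic nodes (assumed)
VDD = 1.8              # supply voltage in volts (assumed)
FREQ = 1e9             # 1 GHz clock (assumed)

# The clock net switches every cycle (activity 1.0) whether or not the
# chip is doing useful work; logic nodes switch only when data changes.
clock_power = dynamic_power(1.0, CLOCK_NET_CAP, VDD, FREQ)
busy_logic_power = dynamic_power(0.1, LOGIC_CAP, VDD, FREQ)
idle_logic_power = dynamic_power(0.0, LOGIC_CAP, VDD, FREQ)

print(f"clock: {clock_power:.2f} W, busy logic: {busy_logic_power:.2f} W, "
      f"idle logic: {idle_logic_power:.2f} W")
```

With these assumed numbers the idle logic draws nothing in this model, while the clock net keeps burning the same power regardless of load.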
The next step is adiabatic calculations. After the logic has reached the result, the process is reversed with no energy or charge loss.
However, the quantum computers will not happen during my life time. If ever.
There is actually a sort of overall clock in the mind. I remember reading that brainwaves seem to be a kind of general clock signal that seems to be used for coordinating certain activities of consciousness. Sorry to be so vague. The article was a while ago, and in print. I think in Scientific American.
That being said, I think you're right, the brain is largely asynchronous.
BTW, as a shameless plug, my StreamModule System is also largely asynchronous. It's for IPC though, not for gate-logic.
Need a Python, C++, Unix, Linux develop
I'm also an old fart and not some software geek to whom every hardware technology mentioned is something unheard of before. That being said...
The first computing machines weren't synchronous. I forget the names, but this kind of thing was being done way back when because it was impractical to distribute a common clock across the racks and racks of equipment that made up a CPU back then.
Also, Motorola's PowerPC chips implement an asynchronous divider, so you might be using asynchronous technology right now.
The idea of having a computer run as fast as the transistors can go is a great goal, but there are some impractical aspects to the use of asynchronous circuits.
First, how do you know your computation is done? Well, there's several different ways of telling. You can use a current sensor to decide when your gates have settled out for a decent length of time or you can wait a predetermined amount of time based on worst case. All solutions involve bloating the design with more transistors to time the handshaking between Muller-C elements. Whether it's some type of current sensor or just inverter chains, there's at least 10% of a circuit tied up in timing (and it can run much, much higher).
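For readers who haven't met a Muller C-element: it is a small state-holding gate whose output changes only when both inputs agree, which is what lets it act as the "both sides are ready" element in a handshake. A minimal behavioural sketch (the class name and the input sequence are just for illustration, not taken from any particular design):

```python
class MullerC:
    """Muller C-element: output follows the inputs when they agree, else holds."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        # Only change state when both inputs have the same value.
        if a == b:
            self.out = a
        return self.out


c = MullerC()
# Inputs arrive at different times; the output waits for both.
trace = [c.update(a, b) for a, b in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]]
print(trace)
```

The output goes high only once both inputs are high and drops only once both are low again, so a slow input simply stalls the handshake instead of corrupting it.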
Also, what do you do with the data once you've processed it so fast? The IOBs are only so quick in driving pins, so while the core of the design can run really stinkin' fast asynchronously, it's hampered by the ability to get data in and out.
Design verification is also a nightmare with asynchronous logic. It's a hard enough problem figuring out my longest path between registers across process and temperature variations, but to add in the factor of not knowing your clock is... well, icky.
Finally, what about noise in an asynchronous design? For my current work, I have to make sure everything happens synchronously... or I end up with nasty noise in my CCD section. I can tolerate a little bit of asynchronous behaviour, but not a lot.
Where asynchronous technology makes sense now is something like Motorola's divider circuit. By making it asynchronous, they gain the speed advantage of not having to rely on a slower, global clock distribution network, by making it a local function, they avoid the problem of slow IO, and by using it for a "small" amount of their design, they avoid die bloat and noise problems.
I guess the idea of asynchronous design boils down to one of history. If it's such a wonderful thing and has been around for so long, why doesn't everybody do it? Well, because it has drawbacks and the design philosophy rarely fits the design criteria (cost, tools, reliability, performance, and function).
I don't think this is a newsworthy item. In asynchronous design, it's pretty much ALL old hat. Academic papers recycle the same ideas and the UK email reflector for asynchronous "researchers" goes quiet for months at a time.
Maybe tomorrow, /. will report the discovery of fire.
The DEC PDP-8 was an asynchronous design. No clock anywhere, just a bunch of delay lines to resolve race conditions.
/.'ers were probably in diapers back then. Or not born yet.
Of course, most
Concealed Handgun License Courses in Plano, Texas
Having interviewed, and also knowing personally, the people who work at ADD (Asynchronous Digital Design), the company in Pasadena that is working on asynchronous VLSI, I can tell you that there are definitely new paradigms for datapath and computation flow construction that make modern designs easy to create in asynchronous technology. The Caltech AVLSI research group has succeeded in creating a MIPS core using VLSI technology, and their current work at ADD will take them even further. One of the advantages of asynchronous design is that once you create a working unit, you can plug units together without worrying about any timing issues; assuming the units are fully tested, the entire system should work out of the box. A lot of current VLSI design involves recreating a lot of structures in order to optimize for the latest architecture scheme. There's more than just Sutherland working on this; in fact, some big names and some big people are interested in AVLSI technology. Asmodean / Naru Sundar
One interesting result of the AMULET group's research was in power consumption: the likelihood that all elements of the circuit switch at the same time is basically zero; however, the likelihood that 90% of all elements switch at the same time is quite high. This puts a limit on the power-dissipation advantage that was thought to be a benefit of asynchronous logic. However, asynchronous design is still useful if you want to save energy rather than power, which would be the case, e.g., in portable devices. I think Philips has some chips for CD players which are asynchronous.
Another problem for traditional synchronous design was clock skew, and the resulting need for large clock buffers - however there are some interesting approaches to handling this, too. There is a company which actually uses the clock skew to aid timing between interfacing blocks - sort of turning the problem on its head and treating it as a benefit. :)
(That's as much research as Sutherland's work, though.)
The main problem with asynchronous design is the lack of tools: you can't design multi-million gate ASICs by hand, and there is no support from the tool vendors.
Right now the possible advantages of asynchronous design don't seem to be big enough to actually use it on a commercial chip. Granted, Sutherland is optimistic, but he was optimistic 7 years ago, too - when I first heard of his micropipeline approach. (And it is a great idea, don't get me wrong.)
We are still in the dark ages as to what the brain actually does and how it actually does it; and we won't be able to use any of our discoveries in information processing technology for the foreseeable future.
In part, I would agree with you. But only to a certain extent. Yes, we don't know a lot. But you make it out to be a lot worse than it is. We also know probably as much as we don't know.
I wish I had a link to back up what I'm about to say, but I read about it in Ray Kurzweil's book, "The Age of Spiritual Machines." Groups of scientists have actually reverse engineered specific neural networks of human and other animal brains. One group has even re-implemented the visual cortexes of several animal brains (including humans, if I remember correctly) in high-density analog neural networks. There is also currently tons of work being done in other analogous areas.
Hey, a quick search on Google turned up this article: A Little Piece of (Silicon) Cortex.
Read up before you spout off.
Also, I recommend everybody read Kurzweil's book "The Age of Spiritual Machines."
-=/\- Jizzbug -/\=-
What if we try to compare clocked work cycles in a CPU to the "punch the time clock"-ed work cycles of today's society? What would it take to evolve society off the "clock cycle"? Clocking time = money. If the work clock is removed, 40 hour work weeks go away, deadlines disappear, and so on. What percentage of work (and life) is spent being "on time" which is really unnecessary and could be removed? Make sure you punch out the time clock if you're not allowed to read slashdot while at work!
This is indeed intriguing; what unit will replace the familiar megahertz?
Given the absence of a clock, I'd go for Inhertzia.
Karma karma karma karma karmeleon: it comes and goes, it comes and goes.
no text
---
How about an equally meaningless number, like BogoMIPS?
--
Restating the obvious since nineteen aught five.
So the extra complexity added by async logic becomes more and more feasible. However, I don't think we will see fully async chips in widespread use soon. Rather, bigger and bigger parts of the chip will work asynchronously, with a synchronous frontend.
nVidia pixel shaders work like this; see the information on their site wrt coding for the GeForce / GeForce2 with OpenGL. Judging by John Carmack's comments wrt GeForce3 pixel shaders, they haven't changed much since then.
They are quite usable, but then the "circuits" you build are not in excess of 25 stages long (most likely less than that).
The first electronics were built without clocks. Most communications don't use clocks. Your 100/t network doesn't have a clock to operate. It works by sending data (a high or a low), then a middle state. This is a voltage in between the zero and one states. The receiving end waits for the state changes. Once it receives that middle state, it knows that the next change is actually data. Once it receives the data, it waits for that invalid state to know that the bit has ended.
The reason it has to do this is that both systems may be running at different speeds. If one's running at 100MHz and the other is at 102MHz then they will eventually get out of sync. Without a system like this they wouldn't even know they were out of sync. DSL/Cable/Modems all use systems like this. There are several hundred out there. Some of the more common are Return to Zero (RZ), Non-Return to Zero (NRZ), etc.
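The "high or low, then a middle state" scheme described above can be sketched in a few lines. This is only a toy model of a three-level return-to-zero code, with made-up names and no claim to match any real network standard; the point is that the receiver reacts to level changes and never needs to know the sender's speed.

```python
HIGH, MID, LOW = 1, 0, -1  # three line levels; MID is the "invalid" gap state

def encode(bits):
    """Send each bit as a HIGH or LOW level followed by a MID gap."""
    line = []
    for bit in bits:
        line.append(HIGH if bit else LOW)
        line.append(MID)
    return line

def decode(samples):
    """Recover bits by watching level changes, not by counting time."""
    bits = []
    previous = MID
    for level in samples:
        # A new bit starts whenever the line leaves its previous level
        # for a data level (HIGH or LOW).
        if level != previous and level != MID:
            bits.append(1 if level == HIGH else 0)
        previous = level
    return bits

message = [1, 1, 0, 1, 0, 0]
# A receiver sampling faster than the sender sees every level several
# times; here each level is seen three times.
oversampled = [level for level in encode(message) for _ in range(3)]
print(decode(oversampled) == message)
```

Because the gap state separates consecutive bits, even a run of identical bits decodes correctly without any shared clock.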
Back to electronics though: several types of memory are actually async. You set your addressing pins, then pulse a pin high, which tells the chip that its address is waiting. The chip then looks up the memory, sets the output data pins, and then sets a return pin, which allows the other electronics to use the data returned. This allows for the fastest possible operation of the chip. Since there is no clock, there is no speed to conform to. The chip will always return its fastest possible response. This is also harder to work with, since you have to rate your chips by speed. You may remember this from the old 70ns/60ns/50ns EDO/SIMM memory era.
Well, the moderators seem to agree with you.
What, then, will replace the megahertz rating? Flops?
The story is about asynchronous computing, not about clocks in general. Asynchronous computing is to synchronous computing as functional programming is to imperative programming. Sure you may have methods of synchronizing with external entities, but the internal processes are (mainly) asynchronous.
The brain is an excellent example of parallel asynchronous computing, since a neuron will only fire when its input threshold has been reached. However, many internal processes in the brain may in fact be more or less synchronous, due to the fact that maybe it's an evolutionary advantage :-)
So the basic idea is that a neuron is asynchronous in principle, but groups of them may find it easier to communicate synchronously.
- Steeltoe
http://www.debunkingskeptics.com/
I agree completely. I think the clock skew is the big (enormous?) incentive to go async, and I'm surprised that no one mentioned this until Comment 59.
At least this thing will give you an excellent excuse while you're at work to say at 10AM: 'Hey I'm going home, my computer's telling me it's 5PM.'
I have a photographic memory for numbers. I know almost a hundred of them.
Unfortunately for Sutherland, there's something called the PS300.
Back in the late 70's and early 80's, his company, Evans and Sutherland, ruled the world of computer graphics with their very slick Picture System machines. These were peripherals to PDP-11s and VAXes, and were wonderfully programmable machines. There was a fast interface between host memory and Picture System memory; letting you mess with the bits to your heart's content. We had a couple of them at NYIT's computer graphics lab; and did a lot of great animation with them.
E&S's next machine, though, was the PS300. This was a far more powerful machine, its first machine with a raster display. It was an advance in every way, except that it imposed a dataflow paradigm on programming the machine. You could only write programs by wiring up functional units. It was astonishingly difficult to write useful programs using this technology. Everybody I know that tried (and this was the early 80s, when people were used to having to work very hard to get anything on the screen at all); every one, gave up in frustration and disgust.
ILM got the most out of the machine; but that was by imposing their will on E&S to provide them with a fast direct link to the PS300's internal frame buffer.
Basically, dataflow ideas killed the PS300, which destroyed the advantage that E&S had as the pioneer graphics company, and they have never recovered from it. While the idea is charming, and to a hardware engineer it makes a lot of sense, programming them takes you back to the plugboard era of the very first WW-II machines. Nobody wants to do that.
thad
I love Mondays. On a Monday, anything is possible.
Hmm. I've understood that several computers of the '60s and '70s were asynchronous. I'm sure the PDP11/20 was, and some early-model PDP10 (KA10) machines were too, and most likely the PDP6 was also asynchronous - and certainly there were many other asynchronous designs from manufacturers other than DEC.
So this is similar to CDROM data and other serial data that is "self-timing"? Do you have any more in-depth articles or whitepapers to back this up?
----
---- I made the Kessel Run in under 11 parsecs.
Synchronous design (using a clock) makes it a lot easier to structure your design. It makes it into a big state machine in effect.
There's nothing to stop you changing the clock period on a cycle by cycle basis - slow instruction, use a longer clock period. There used to be an AMD chip to do just that for bitslice machines. Even with a fixed clock period, only the critical path needs a full clock period, some bits of logic have made their minds up well before a clock tick, so the bottlenecks you mention apply here as well.
It all falls apart on the assumption that the clock ticks at various points in your circuit are all happening at the same time; speed of light is finite, so you can understand why that's not true in the real world. (Clock skew)
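To put rough numbers on the last two paragraphs: the clock period has to cover the slowest path plus a margin for skew, and every faster path just sits idle until the next tick. A minimal sketch, with hypothetical path names and delays (the setup margin is an extra assumption not mentioned above):

```python
def min_clock_period(path_delays_ns, skew_ns, setup_ns=0.0):
    """Smallest safe period: slowest path plus skew and setup margins."""
    return max(path_delays_ns.values()) + skew_ns + setup_ns

def slack(path_delays_ns, period_ns):
    """Time each path sits finished, waiting for the next clock tick."""
    return {name: period_ns - delay for name, delay in path_delays_ns.items()}

# Hypothetical path delays in nanoseconds.
paths = {"adder": 0.80, "shifter": 0.35, "decoder": 0.20}
period = min_clock_period(paths, skew_ns=0.15, setup_ns=0.05)
idle = slack(paths, period)

for name, wasted in idle.items():
    print(f"{name}: done {wasted:.2f} ns before the tick")
```

As skew grows relative to the period, the margin term eats a larger share of every cycle, which is the problem the clock skew paragraph is pointing at.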
It also falls apart because you can't enforce clocking on the entire universe - a signal coming into your system can (in theory) hang it up if it comes at just the right time, but there's maths to show that the likelihood is so small that it doesn't matter in practice. (Metastability)
That said, last time I looked into asynchronous design, not very deeply I admit, the cure looked worse than the disease. Which isn't to say it's a bad idea, maybe its time has come?
Yeah, and I'll bet that traffic lights and intersections take up a fair amount of road space in any city, and if we get rid of them, think about how much more traffic we can handle!
Seriously, one of the reasons that clocked logic is used everywhere is that it's (relatively) portable across process tweaks. If Intel used asynchronous logic on the new Octium... err... Pentium VII, then every time it tweaked its process there is the potential for the critical logic path to change, and the design would have to be re-optimized, re-laid out, and Oops! There goes the schedule again. Not to mention the fact that Logic Synthesis for Asynchronous circuits is a pain. (All of my designs are clocked...)
IANASPE, (I'm not a Semiconductor Process Engineer), so I'm probably getting this explanation all wrong. But I am a rather lame Digital Designer, FWIW.
On the other hand, there are some games you can play in a fast clocked design. At 1GHz+ speeds, the clock skew across a die is so great in comparison to the period that you have very little time (if any!) left over for logic. I think companies like Intel and AMD must have a way to "schedule" and plan for clock skew, so they can make tighter designs that actually work... This technique kind of looks like Asynchronous Design if you look at it sideways.
Not enough detail. What about the things that need a clock, like communication over a serial or parallel port, or USB, or data transfer to a HD? Or to a 3D card? How many bits wide would an asynchronous bus be, and is there any way to keep from having to design all new peripherals? And what kind of OS would it run? I don't think it would be as trivial as taking Linux and compiling it for an asynchronous computer, because it's so much different from a normal PC. And Bill's certainly not going to recompile Windows to run on one, not for a long time. I don't think asynchronous computers will ever be used for much more than specialty devices for specific applications, because of the above reasons.
The (Hopefully) Great Slashdot Blackout Apr 21-27
Yes exactly what I said. There is a difference between m and M, but not between k and K.
k = kilo = 1000
K = uhhh...well, kilo = 1000
See? Perhaps you misunderstood what I was saying...
-----
"People who bite the hand that feeds them usually lick the boot that kicks them"
Higher Logics: where programming meets science.
Our group at Cornell University works on asynchronous design. My advisor built an asynchronous MIPS processor at Caltech a couple of years ago. It works, and it is extremely energy-efficient (better than pretty much anything in existence for the same process). We use a different design methodology than Sutherland's group, and none of the criticisms posted here about asynchronous design apply to us (for example, all of our circuits -- including full CPUs -- have been formally proved to be 100% correct).
If anyone's interested, our group's page is:
http://vlsi.cornell.edu/
Anyone who wants a good overview of asynchronous design should read this paper:
http://vlsi.cornell.edu/~rajit/abstracts/async-case.html
ck
does anyone else keep reading that title as "Cockless Computing"?
A fully asynchronous design requires lots of ready signals, or some very careful time-of-flight constraints. Aside from the fact that the current popular logic-synthesis tools don't provide neatly packaged solutions for this kind of design, if you don't implement this stuff in an intelligent manner, you can easily create a design which completely destroys any advantage over a synchronous design in speed, power, reliability and/or area.
On the other hand, one more advantage that I haven't seen mentioned about asynchronous design is modularity - most synchronous designs can only be verified for correctness in the context of the global clock signal, whereas if you've verified the correctness of an asynchronous module, you can plug it in wherever its functionality fits, without having to adjust all the stuff around it.
When you think about it, however, you will note that synchronous design is actually just a SUBSET of asynchronous design - the clock signals are just a way of indicating a "data ready" condition to the next bunch of logic gates. Careful logic designers who hold this viewpoint can design hybrid synchronous/asynchronous designs, where the overall design is actually a bunch of smaller synchronous designs, where each block of synchronous logic receives a "clock" which is actually a data ready signal for the logic block as a whole.
Haven't you been on lately?
Anyways, you mention...
In a modern office, computers end up taking a lot of power. Imagine your local server room. Don't you think they would like a 20% decrease in their power bill?
The savings of 20% are under "ideal" circumstances, i.e. not going to happen, or going to be really expensive. I'd rather see people working on quantum computing, or cheap subzero cooling systems.
(you mention 50% at idle, but if your box runs at idle most of the time, you need to get seti@home or something)
If you want to save power in a server room, turn down/off the air conditioning / ventilation, don't run windows (idle is bad), turn seti@home off etc. . .
Also - the cpu isn't the only thing that generates heat / wastes energy in the box.
I couldn't resist, hell, you asked for it.
doodz, thiz guy thinkz you n33d five pawer pl3nts to run a box
I have a shotgun, a shovel and 30 acres behind the barn.
1q2w3e4r5t6y7u8i9o0pqawsedrftgthyjukilo;p'azsxdcf
The discovery of the tau mutation by Ralph and Menaker was the first mutation shown to alter circadian rhythm in a mammal. The mutation rendered the enzyme (CKIe) slower at switching on proteins produced by a key circadian rhythm gene called period. The regular rise and fall in levels of these circadian proteins governs the length of each cycle of the biological clock. The alteration in CKIe effectively changes the animals' circadian rhythm from 24 to 20 hours.
Someone who already replied to this particular sub-thread stated that the brain exhibits different wavelengths during different states of consciousness. While this is true, keep in mind (no pun intended) that brain waves are generally NOT periodic, but chaotic. This certainly discounts the theory that wave frequency is anything like the clock signal in microprocessors. Our brain isn't "faster" when our brain waves are of a higher frequency, either.
What's a second? An hour? A day?
It has much more to do with
the Earth's rotation than with cesium.
Performance measurement should be done by a test suite anyway. This "clockless" machine will only emphasize that. (They could, perhaps, call it the infinity chip .. the clock is a piece of wire, so it cycles "an infinite" number of times per second :-)
Caution: Now approaching the (technological) singularity.
I think we've pushed this "anyone can grow up to be president" thing too far.
The original Jim Clark Geometry Engine (c. 1981) was a self-timed floating point pipeline.
Mead and Conway talk about self-timed ICs in their book (1980)
There are connections between all of these folks and Sutherland.
[place clever signature here]
The OS is not the issue at all. Here we are talking about the logic circuits. One can always keep a clock to feed the OS. In an asynchronous circuit, the latches will not be clocked with that clock. That saves on a lot of power issues. On the other hand, in a synchronous circuit, all the parts of the circuit will be clocked by the same clock - which goes through an immensely complicated clock distribution network.
--- lm747
..and Ivan would be the first person to say so to anyone who reads that article and gets the idea that he invented it recently.
The pioneering work in this field was done in the late 60's and early 70's, primarily at Washington University. Look up macromodules by Tom Chaney, Wes Clark, and Charlie Molnar - this was a self-timed toolkit from which people could build larger systems. L Peter Deutsch whipped up a self-timed Lisp machine (!!) when he visited the group.
As an aside, the Digital PDP-6/10 had a self-timed main adder.
When Ivan was head of Caltech computer science in the late 70's, he brought together a group of self-timed luminaries, including Chuck Seitz, Charlie Molnar, and Wes Clark. The self-timed chapter in Carver Mead's 1980 VLSI book was written by Chuck Seitz. Alain Martin's group at Caltech succeeded in producing a 100MHz (that's how fast it went when left to its own, uhh, devices) self-timed uP a few years back.
There was also a self-timed ARM project. Couldn't tell you the outcome.
I haven't been following the work of Ivan's group at Sun (which again included Molnar, who died a couple years ago, and long-time collaborator Bob Sproull). These are all really smart guys doing cool work... whether they can overtake the clocked world is another matter.
Self-timed logic is a rich field, well worth wading into. Imagine a CPU that went as fast as its logic let it, and if you cooled it, it would simply go faster. No overclocking guesswork.
People have been worrying about this since the 1980s.
Speeds get faster; chip dies get larger;
far-off units get out of sync.
The problem with Mips:
;-)
Not all Mips are created equal. For example: is it fair and reasonable to compare a CISC Mips to a RISC Mips? The CISC may be doing something like a string move with one instruction while the RISC machine does it with a series of instructions in a loop. Obviously this is an apples and oranges comparison.
Okay - next you look at Flops - aren't Flops the same on every machine? Well - no, though that is probably less of an issue for comparing IEEE based implementations. The question comes up (and it has already been mentioned) that Flops don't compare useful work loads! The vast majority of computer work loads don't involve significant floating point operations. (Yes you can find workloads where that is the case - but it isn't the majority situation.)
So it comes down to this: comparing computer "systems" is a tricky business. Even MHz in the same architecture family doesn't work because you don't know how efficiently the machine is designed - the hardware might be capable of greater than one instruction per clock!
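The string-move example can be put into numbers. This is only a back-of-the-envelope sketch; the instruction counts, clock speeds and instructions-per-clock figures are invented for illustration and do not describe any real CISC or RISC part.

```python
def mips(clock_mhz, instructions_per_clock):
    """Millions of instructions retired per second."""
    return clock_mhz * instructions_per_clock

def run_time_us(instruction_count, clock_mhz, instructions_per_clock):
    """Microseconds needed to retire instruction_count instructions."""
    return instruction_count / mips(clock_mhz, instructions_per_clock)

# Hypothetical task: copy a 1000-byte string on two 500 MHz machines.
# Assumed CISC: one string-move instruction that takes 1000 clocks.
# Assumed RISC: a 4-instruction loop per byte at 1 instruction per clock.
cisc_mips = mips(500, 1 / 1000)
risc_mips = mips(500, 1.0)
cisc_time = run_time_us(1, clock_mhz=500, instructions_per_clock=1 / 1000)
risc_time = run_time_us(4000, clock_mhz=500, instructions_per_clock=1.0)

print(f"CISC: {cisc_mips} Mips, {cisc_time:.1f} us")
print(f"RISC: {risc_mips} Mips, {risc_time:.1f} us")
```

Under these assumptions the machine with the far lower Mips rating finishes the copy sooner, which is why a raw Mips or MHz figure says little about useful work.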
Finally - I don't believe the estimate of up to 15% for clock distribution. It's more like 1%-2%. (I do chip design for a living... at least I have an educated opinion on this!) The clocks ARE a significant part of the power issue though. CMOS burns power when signals move. The clock moves. Simple enough analysis there.
Asynch design methods have been around forever, but present a number of problems for traditional design tools that depend on the clock to do their work. Further, there are a lot of chip designers that throw up their hands if you just mention the words "asynchronous design" to them. Any push to this kind of design would be traumatic to say the least.
Have you compiled your kernel today??
Hmm, but I'd bet that my 550 MHz system with 128k of cache beats yours when it comes to rendering/2d graphics/etc. It all depends on what is important to you...since I rarely use Office apps I'd rather see raw MHz than larger caches. (Your memory bandwidth argument is generally being affected by the fact that whatever program you are running fits into your cache, but not the 512k one...something that doesn't hold for large programs or ones that work on large datasets)
During the last few years, a group has been working on an asynchronous processor: AMULET.
This CPU uses the ARM core.
It is so power-efficient that it could rely solely on the induction power resulting from its pins transmitting information.
The current status indicates delivery of the AMULET3i.
--
Trolling using another account since 2005.
The clock that the OS uses to wake itself up to make scheduling decisions is completely separate from the clock signal that is distributed on a CPU chip. The clock signal on a chip is what permits data and control signals to advance from one stage of the pipeline to the next, and that's the clock signal that async logic gets rid of. The clock that the OS uses to wake itself is a hardware interrupt from a completely separate place.
Outside of a dog, a book is a man's best friend. Inside a dog, its too dark to read.
Asynchronous logic uses a handshake signal to indicate completion of an operation. This is often done by adding an extra signal along with the result; the signal is asserted when the operation of the gate is completed.
There are other problems that have stopped asynchronous circuits from wide adoption. The handshake signalling adds overhead. More circuitry and more wiring, which means more silicon area used and higher wiring congestion.
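A toy sketch of that extra signalling, assuming a four-phase (return-to-zero) request/acknowledge protocol. That is one common scheme, not necessarily what any chip mentioned in this thread uses, and the `Channel` class and `wire_events` counter are made-up names for illustration only.

```python
class Channel:
    """Data wires plus two handshake wires (req, ack)."""

    def __init__(self):
        self.req = False
        self.ack = False
        self.data = None
        self.wire_events = 0  # handshake wire transitions, i.e. the overhead

    def send(self, value):
        # Phase 1: sender places the result and raises req ("data is valid").
        assert not self.req and not self.ack
        self.data = value
        self.req = True
        self.wire_events += 1

    def receive(self):
        # Phase 2: receiver latches the data and raises ack ("got it").
        assert self.req and not self.ack
        value = self.data
        self.ack = True
        self.wire_events += 1
        return value

    def release(self):
        # Phases 3 and 4: sender drops req, then receiver drops ack.
        assert self.req and self.ack
        self.req = False
        self.ack = False
        self.wire_events += 2


def transfer(values):
    """Move each value across one channel, one full handshake per value."""
    ch = Channel()
    received = []
    for v in values:
        ch.send(v)
        received.append(ch.receive())
        ch.release()
    return received, ch.wire_events


received, events = transfer([3, 1, 4])
print(received, events)
```

In this model every value moved costs four handshake transitions on top of the data itself, which is the overhead in circuitry and wiring the comment is pointing at.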
Chris Kuivenhoven is a thief, beware
I agree that you need a clock to indicate when data becomes available, but it's increasingly difficult to design complex circuits with a global clock - the clock distribution itself becomes a major part of the silicon.
Asynchronous logic offers the opportunity to avoid this problem. Self-clocked might be a better description than Un-clocked - essentially, every data object has a related 'clock', or arrival indicator.
Think of a massively parallel pipelined processor (at the register, or even gate level) rather than some sort of unclocked anarchy. Or a very finely divided set of Communicating Sequential Processes, to use your software analogy.
The question I'm really looking for an answer to is: Could CPU manufacturers be fooling us?
C.Burgess - email:colvinb@airnet.com.au
If you believe the parent, you're an idiot.
I kid you not, the trolls actually sit around for hours fabricating "evidence" to support their position. I would know...as fascdot I was inside the troll cabal. It disgusted me, though, so I left...and now they hound me.
They can't handle the fact that anyone would not want to be a part of their little circle jerk.
So I sez to him, I ain't givin' you no damn three-fity.
The Interdata series of computers did not use a clock -- twenty years ago. The logic just "falls through" with the answer and restarts.
This is not new or groundbreaking.
Treatment, not tyranny. End the drug war and free our American POWs.
See my user info for links.
Have you ever tried to debug your brain? It can be fairly difficult.
------
Since my master's thesis was an asynchronous general-purpose microprocessor, I can tell you that if you are interested in finding out how to do this, search for articles on DCVSL: Differential Cascode Voltage Switch Logic.
Basically the gates include extra circuitry that determines when the gate has completed switching. The major benefit is that heat is significantly reduced in the chip. Also the design of the chip in my thesis allowed for multiple pipeline exits and the ability to have multiple ALUs etc. The design of the chip is in fact easier once you get over the hump and understand how to deal without a clock.
The big problem was dealing with memory chips. Your processor and memory may get "out of phase" with each other and you end up waiting for the next memory cycle, every time you go to memory.
yes, it may reduce the number of ops per clock, but it more than makes up for that by allowing a much higher clocking.
How does one overclock without a clock?
They'd better allow for the user to alter the speed in software. Maybe a dedicated software interface that showed the intended speed and a disclaimer, and that could be set to different speeds in different modes (say, 100 MHz-equivalent up until boot sector load) for safety purposes. I do see potential for the CPU and FSB not being tied so closely in speed, though I have to wonder about latency.
I see tremendous potential for this to be used to alter speed depending on the task, without disturbing other components.
For example, an OS could raise or lower the speed of the CPU and memory and other such components dynamically depending on usage, and act like it still has full speed, so that if you're only running an MP3 player that takes 5% system resources, the processor only runs at 10% or so of its normal capacity rather than being told to chill 95 out of every 100 cycles and running at full capacity. This can save mobile users a ton of power (bye bye SpeedStep) and trouble and also save power and heat for desktop users, who are finding power and cooling costs increasingly high over time. Not to mention, the prospect of temporarily increasing the speed of a CPU to deal with bursty system usage is nice as well, given temp monitoring facilities.
Ideally, in my opinion, the CPU should run as fast as it safely can given its current running temperature, and slow down as the OS tells it to do so or by intelligently metering HLT calls and lowering its speed as appropriate until it gets few HLTs. This would be an optimal setup, but is only possible with a CPU that allows for arbitrary speed changes. Multiplier-locked CPUs on a bus that likes to be left alone are not going to cut it, and for that matter neither will an unlocked CPU that only adjusts in increments of .5 times the bus speed and cannot do so reliably.
I know that got a little off topic, but I see some serious potential for this aside from the theoretical speed increase. Just my $.02.
"You should stop trollspotting though. It's really irritating."
To whom? And why?
"I'd be quite pissed if you did that to one of my accounts..."
If I did it to "one of your accounts" it would be because *I* was pissed already. I'm getting pretty sick of trolls following an obvious procedure to garner karma and generate huge responses. Of course, the anger should really be directed at idiotic moderators, but there's no way to get a handle on *them*, now that meta-mod is gone.
As for you not being a troll: It's a gray area. Some people (or accounts) are trolls. Some are regular accounts that sometimes troll. Shoeboy is the latter. I don't care about that so much, because the account generates real discussion most of the rest of the time (or humor in Shoeboy's case).
--
Non-meta-modded "Overrated" mods are killing Slashdot
(Hey Ryan! Here's your proof!)
That's because your brain is an analog computer, not a digital one. As for "if the brain can do it, it must be possible", that is simply not true. We are still in the dark ages as to what the brain actually does and how it actually does it; and we won't be able to use any of our discoveries in information processing technology for the foreseeable future.
And no, artificial neural networks are NOT analogous to how the brain operates. It is most useful to think of them purely as mathematical creations. They are orders of magnitude simpler than the networks found in the brain; and their operations are at BEST guesswork.
Hell, what term will we use in place of "overclocking"?
"Ancillary does not mean you get to rule the world." --U.S. Circuit Judge Harry Edwards, speaking to the FCC's lawyer
I remember from my digital electronics class that in many ways, it makes the most sense to have tri-state (instead of binary) electronics. (In theory, e-state would be better, but since e isn't an integer, that's a little hard.) Of course, the problem is making the logic work, since your signals are low, medium, and high.
Well, what if the logic worked such that if any input was medium, the output was medium. Otherwise, the output was as it is now with binary logic. Then you could build a CPU that left units that weren't being used in the medium state. When an operation was performed, you would know when it was done as soon as the result didn't have medium bits.
Likewise, you could push this back on the memory and other subsystems.
Of course, now the question is whether adding the additional state is worth it in eliminating the clock.
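A rough sketch of the rule proposed above, written as ordinary software instead of real three-level gates. `MEDIUM`, `gate`, `add_bits` and `done` are hypothetical names, and the ripple-carry adder is just an assumed example circuit to show completion detection.

```python
MEDIUM = "M"  # stand-in for the third, "not ready yet" signal level


def gate(op, a, b):
    """Medium in means medium out; otherwise behave like a binary gate."""
    if a == MEDIUM or b == MEDIUM:
        return MEDIUM
    return op(a, b)


def add_bits(a_bits, b_bits):
    """Ripple-carry adder built from the gates above, least significant bit first."""
    xor = lambda x, y: x ^ y
    and_ = lambda x, y: x & y
    or_ = lambda x, y: x | y
    carry = 0
    out = []
    for a, b in zip(a_bits, b_bits):
        a_xor_b = gate(xor, a, b)
        out.append(gate(xor, a_xor_b, carry))
        carry = gate(or_, gate(and_, a, b), gate(and_, a_xor_b, carry))
    out.append(carry)
    return out


def done(bits):
    """The result is complete as soon as no bit is still medium."""
    return MEDIUM not in bits


ready = add_bits([1, 1, 0], [1, 0, 0])           # 3 + 1, all inputs valid
not_ready = add_bits([1, MEDIUM, 0], [1, 0, 0])  # one input still idle
print(ready, done(ready))
print(not_ready, done(not_ready))
```

The second call shows the medium level spreading to every output bit that depends on the idle input, so a consumer only has to watch the outputs to know when the unit has finished.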
If we don't have some sort of timing mechanism, how do we manage to play music, jump rope, or do anything else that requires regular periodic action?
Have a poll on it
1. FLOPS
2. Trolls/Sec
3. Katz/Sec
4. HP
5. Porn/Sec
It's good to see you can find the funny side and don't hold a grudge; while slightly juvenile of the /. crew, it did make me smirk when I read it.
Of course, on a serious note I could completely sympathise if e.g. Verant took a similar stance on eBay trading of virtual property - it's understandable to slap circumventing of the rules in place for allocation of the stuff. Coming back to this example, allowing a goat troll to post with +2 bonus simply because they bought an account would be irritating, to say the least.
"I Know You Are But What Am I?"
I think in such a system, other features (code optimization, use of 3D accelerators, etc) will be more important than the speed of an add. It will even take several years of experimentation to determine what optimizations to make (how many times is it better to add than multiply, how should loops be unrolled, etc).
I think many traditional measurements will become worse than useless, and instead misleading. Since a lot of your repetitive math operations may be offloaded to your 3D accelerator, it is questionable that, even if you could decide how to measure it, floating-point operations per second would be a real indicator. I wouldn't want the manufacturer optimizing for that over other, useful things.
A better question is, how long does a NOP last? Won't this system optimize it out? How can you time a NOP without a clock?
One problem with asynchronous systems is testing. If you have a chip where some of the units are slower than expected, you might get curious interactions and "race conditions" that are very hard to test before you put the chip into service.
Also, designing for asynchronous logic has been difficult - designing clocked and even pipelined systems is a breeze compared to dealing with asynchronous design. A lot of the structured methods that have been developed for conventional clocked circuits cannot be used, and so designers have a lot of trouble building complex systems.
Asynchronous CPUs exist. Have a look here. It's a commercial 32-bit system-on-chip with an Amulet asynchronous core. Even that article's a year old.
That's how brains seem to work. When synchronization is seen in the brain, it is suspected to reflect multiple regions attending to a single object. Note that synchronization in the brain is self-organized, rather than driven by a clock.
A little bit of a pain, but far from impossible. Anyone who works on software for a multithreaded, multiprocessor, or distributed environment solves asynchrony-related problems all the time. We do it by having locks instead of clocks; hardware folks can do and on occasion have done just the same. I'm sorry to hear that such basically simple problems are considered unsolvable by garden-variety EEs.
Slashdot - News for Herds. Stuff that Splatters.
It says radio interference is less but isn't the radio signal strength just spread over a larger spectrum? Instead of the whole chip broadcasting at 1 (really 3 or 4 because of the buses) strong signal, we actually have an unpredictable spread across multiple frequencies? Might that be more of a problem than a predictable frequency?
Just design it as a data-flow chip, with functional units propagating answers when they are available and halting on partial inputs (you can even envisage a system which allows out-of-order execution of ALU operations). The main difficulty is likely to be getting in-order commit to work out.
Apart from that, it is basically an exercise in bookkeeping -- tag all values as belonging to a subinstruction, so that you are able to get the data dependencies right.
I could go on, but I think you get the idea. However, let me emphasize that the situation of the whole chip waiting on the slowest component is what we AVOID by going asynchronous, as this is exactly the reason why Intel needs to pipeline so damn deep to get the clock rate up. They need to split the pipeline into steps small enough that each step can be done in one clock. Async circuitry wouldn't have that problem.
the Asynchronous VLSI and Architecture Group is also active in this field
You can have this one for $3.50
So I sez to him, I ain't givin' you no damn three-fity.
The radio interference is from the clock's rising and falling edges occurring, say, 1,000,000 times per second (at 1 MHz). Without a clock signal being driven throughout the entire circuit constantly, there are fewer fluxes, for shorter periods of time. So not only does the amount of the circuit causing radio interference at one time get reduced, but the amount of time it produces that interference is reduced.
Is here Sorry
What kind of benefits do clockless chips present?
--
silence is poetry.
well, to do a digital lowpass filter, you would use a FFT; i doubt that many commercial audio devices would use time-domain convolution when the FFT is faster
For one thing, it wouldn't necessarily be faster. Filtering with FFT is O(n log w) where w == window size; time-domain convolution is O(nm) where m == filter length. The hard edges of the FFT window create artifacts that can be audible as a buzzing noise; this is why MP3 and Vorbis spend a few extra cycles on MDCT (an overlapped transform). Besides, you don't need a lot of taps; I know of a decent FOUR tap low-pass filter [11 19 5 -3]/32.
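For anyone who wants to try that four-tap filter, here is a minimal time-domain convolution sketch. The tap values come from the comment above; the function name `fir_lowpass` and the choice to treat samples before the start of the input as zero are my own assumptions.

```python
TAPS = [11, 19, 5, -3]  # coefficients from the comment above
SCALE = 32              # the taps sum to 32, so a constant input passes unchanged


def fir_lowpass(samples):
    """Direct-form FIR: each output is a weighted sum of the last four inputs."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, tap in enumerate(TAPS):
            if n - k >= 0:  # inputs before the start are treated as zero
                acc += tap * samples[n - k]
        out.append(acc / SCALE)
    return out


dc = fir_lowpass([32] * 6)                       # constant (0 Hz) input
nyquist = fir_lowpass([1, -1, 1, -1, 1, -1])     # fastest possible alternation
print(dc)
print(nyquist)
```

Once the first three samples have filled the filter, the constant input comes through at full level and the alternating input is cancelled completely, which is what you would want from a low-pass filter. Cost is four multiply-accumulates per sample.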
All your hallucinogen are belong to us.
Will I retire or break 10K?
Here's the URL for the asynchronous design group's homepage. There's more info there.
The familiar gigahertz.
In other words, we'll continue to use synchronous processors for quite a while longer, and we will continue to judge their performance by their clock rate.
--
Fuck the system? Nah, you might catch something.
Is ten years of research really worth a 20% decrease in power consumption and a 15% decrease in overall chip size?
It is absolutely worthwhile to have a handful of people working for 10 years on this if, at the end, everyone can leverage the benefit of their work. In this specific case, asynchronous vs. synchronous is orthogonal to fabrication technologies. The lessons learned can be applied to future processes as easily as present-day ones.
Yup, digital communication systems don't SEND clock signals over wires or air, but they DO USE clocks, clock is regenerated from the signal encoding (RZ, NRZ, AMI, HDB3 or Manchester) and fed to the rest of the system so when you decode the signal you know where the transition from one bit to the other is.
Not all encodings represent the transition state with a voltage: RZ and NRZ are basically TTL signals (GND/Vcc), while Manchester, AMI and HDB3 are differential logic signals (-Vee/GND/Vcc) and do represent phase transitions with a different voltage...
Remember, telecomms and computer science are different beasts so circular logic doesn't apply here...
I hate to agree with davecrazy but...
With all the suffering and poverty in the world we should really question whether some "scientists" deserve the money they get or whether those same funds could be utilised elsewhere.
Your pizza just the way you ought to have it.
Anyone with experience in chip design want to make an evaluation of the possibility?
"Evil company X is threatening to restrict our rights! Let's all get together to stop--OOOH! SHINEY!!!" -- AC
Anyway, if anyone is interested, I have copies of both papers (presented at HPCA3 and HPCA4) that I can put online if there are people interested. Actually, I've got the simulators and stuff too if you are interested in those for continued research. It's a cycle-based simulator written in Java (don't laugh, it was a really good decision) that runs off traces generated by SimpleScalar.
Basically we drove the architectural research for CFPP forward about a generation's worth. Some of their papers look a bit like our stuff, but it was a logical progression, and I haven't gotten a chance to read their stuff in detail. They are more concentrated on the async circuits, where we were approaching it from a higher level, with a clocked approach. We stuck with clocks because our research was at such a high level that it didn't model the async circuits. Besides, after the architecture was straightened out, it would be easier to then work out the details of the async circuits at the lower levels.
It's really nice to see the research that I was doing being continued on :-)
The effort sounds like a great science fair project. Above that, I can't see anything coming out of it to fruition.
- I don't care if they globalize against free speech. All my best free thoughts are done in my head.
m = milli = 1/1000
M = Mega = 1,000,000x
So there!
I've seen this posted below, but there's an even more interesting question: Not all chips off the same line are the same speed. How do you "bin" the chips? Also, suppose one chip is somewhat faster overall, but for some reason the part that does FMUL is a little slow. Maybe my application cares, maybe not. If I need a DB server or web server or anything similar, who cares? But if I'm number crunching, I care very much. Without the MHz number, there is no effective way to compare two of the same chip. I think this isn't much of a problem for the embedded market (it will run X application in real-time, guaranteed); but for the CPU market it is a big deal.
I remember hearing about the Jews in WW2 being tortured if they couldn't tell the guard when a minute was up. They had a problem: no clock. So what did they do? They counted the beats of their heart, and with enough practice they were able to get it in sync with the Nazis' test. They would be spared if they could tell when a minute was up (give or take a few seconds). I personally think that this approach does work, and that the clock doesn't always need to be in the processing area, e.g. the heart (sure, you're counting with your brain...)... Perhaps I am wrong, but wasn't there a way to make the clock off-CPU?
Fight censors!
"Not my manner of thinking but the manner of thinking of others has been the source of my unhappiness." - M
I always hated giving the speed of a computer in MHz. MHz is an almost pointless measure of speed, since on most systems 90% of the CPU cycles are generally wasted. The cache, RAM, hard disk speed, pipelining, and bandwidth (network and bus) are the real speed of the computer, since the computer will generally go only as fast as it can retrieve memory. In case you haven't realized it yet, many chip makers are just boosting the MHz and cutting back on the more expensive components that really improve speed. The only case where really high MHz is needed is when you are doing a lot of complex math, such as 3D rendering or vector computing. (These are now mainly done on parallel computers, where MHz doesn't count as much because you have hundreds or thousands of processors, and where MHz still doesn't count because it can usually be done in a Big O set down.) Today's programs require more memory usage than processor usage, so we should judge speed on what counts and not on the meaningless MHz, which alone doesn't count for much anyway. My 440 MHz system with 2 megs of cache can beat a 1 GHz system with 512K of cache on most apps.
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
Fundamental mode is a basic circuits concept, and is the fastest possible design for any specific set of functional components. The only problem is that race conditions are a pain in the butt to solve, but surely computers can design these things these days.
There was a longer article in the infamous German c't magazine about it; sorry, but that article is not online. The main idea is to let different (very small) sections of the chip work at the speed they want to and provide several join points. The problem is that all sections have to provide a signal about the processing state, which makes an extended NAND take more components. So the size of the new chips is hard to predict, as the small units provide the synchronization themselves, which makes them bigger, but also makes external synchronization components obsolete.
The new smaller parts of the chip can go faster, as the paths are shorter, but the joining points have to be synchronized in a way that doesn't make the speed advantage of the sections vanish.
Just because I can imagine doing a hippopotamus, doesn't mean I'd like to do it.
I believe that he's rather understating the problem: in CPUs the clock is the source of major headaches. I suggest you look at any of the recent ISSCC digests to see the lengths to which we have to go to design clocks for these beasts. You'll see H-trees to manage power distribution effectively. You'll see 43 different clock domains with custom alignment circuitry available on test points to accurately align clocks (as the Intel P4 does). You'll see arrays of PLLs in clock subdomains to align clocks and minimize clock skew across the chip (as the IBM POWER3 does). To say that clock distribution is a major headache is an understatement; and he does underestimate the amount of power that the clock tree consumes, since it can be higher than half the chip power in some MPUs. The numbers he gave are appropriate for a normal ASIC, not a CPU.
All that said, async circuits have been around a long time and they have yet to prove viable. The overhead of adding the additional "computation completed" signal usually more than offsets the area and speed savings you might get from getting rid of the clock. Besides, you still have to clock the output to be able to talk to any modern bus. I'll believe this when I see the details of the "new magic" in operation, but right now I put this in the same class as "cold fusion."
Evans and Sutherland may not be a household name, but they're still a world-class image generation company, at least in the real-time arena. E&S image generators are very common in flight simulation. They may not have a large market share, but from my experience that would mainly be due to potential customers ordering cheaper (and inferior) systems.
Personally, I think benchmarks will. It might be something similar to those run by ZDNet. However, I'd probably separate them out.
I'd have a `How fast does it compile kernel 2.x.0` test, how fast does it render `scene x`, and a few others. Enough to touch on what people would consider important. Office applications would be done using the same sort of set-up ZDNet uses (unless somebody can think of a better one).
You'd have to use the same compiler/rendering/whatever program across all the platforms if possible. Otherwise, the ratings would be really unfair.
With this set-up, you could get a processor that would be great for what you're doing. The processor that works great for compiling may not be the best choice for office applications.
"This is indeed intriguing; what unit will replace the familiar megahertz"
Hear that? That's the sound of thousands of Apple engineers cheering as they anticipate the coming of widespread acceptance of ByteMark as the universal benchmark of computing power!
I'm not an OS guru, but afaik, you NEED the CPU clock to have an accurate timing loop in your OS. Especially in faster processors, the only reasonable way to figure out how long each "tick" is, is to divide the cpu clock. Are there other alternatives for a system like this?
God Fucking Damnit
The Amulet project has been going for over 10 years (it's an asynchronous ARM-like core, IIRC). I remember seeing a circuit that did asynchronous addition (or was it multiplication?) in a lecture about 2 years ago.
Another advantage besides power is speed: the speed isn't determined by the worst case of the most expensive instruction (e.g. adding 0 and 1 can be done a lot quicker than adding (2^31)-1 and 1, because no carry has to ripple through the whole word).
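To put a number on that example, here is a small sketch that measures how far a carry has to travel in a plain ripple-carry adder. That adder model is an assumption for illustration (real ALUs use faster carry schemes), and `longest_carry_chain` is a made-up helper name.

```python
def longest_carry_chain(a, b, width=32):
    """Longest run of bit positions that one carry ripples through in a + b."""
    longest = run = 0
    carry = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        if x & y:
            # Both bits set: a carry is generated here and a new chain starts.
            carry, run = 1, 1
        elif (x ^ y) and carry:
            # Exactly one bit set and a carry arrives: it passes straight through.
            run += 1
        else:
            # The carry (if any) is absorbed, so the chain ends.
            carry, run = 0, 0
        longest = max(longest, run)
    return longest


short = longest_carry_chain(0, 1)
long_ = longest_carry_chain(2**31 - 1, 1)
print(short, long_)
```

Under this model 0 + 1 settles with no carry movement at all, while (2^31)-1 + 1 has to push one carry through 31 bit positions. A clocked design budgets for the long case every cycle; a self-timed adder can signal completion early on the short one.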
Yes, the frequencies will be more spread out over a much larger spectrum. This is then going to provide much less interference as it will look like background noise. Remember that in a synchronous circuit, every flip-flop is clocked with the same clock, and the clock lines act as huge antennas. In an asynchronous circuit, every flip-flop (equivalent) is being "clocked" at a different frequency, and there are no longer clock lines acting as antennas. Therefore the frequency is spread out over a much larger spectrum and is also significantly less powerful than in an equivalent synchronous circuit.
What is not listed in the article but is also relevant is that the circuit, as well as creating less radiation, also consumes significantly less power (around half) than equivalent synchronous designs. This reduced power consumption also reduces the amount of radiation produced.
For example, Royal Philips Electronics has built a pager using asynchronous electronics, taking advantage of the fact that the circuits produce far less radio interference than do clock-driven circuits. This makes it possible to operate a radio receiver that is directly next to the electronic circuit, greatly increasing the unit's operating efficiency.
Philips has also actively pursued research into asynchronous logic. Two small start-ups, Asynchronous Digital Design in Pasadena, Calif., and Theseus Logic in Orlando, Fla., are developing asynchronous chips for low-end consumer markets and high-performance computing systems.
---Technology will liberate us if it doesn't enslave us first.
So far most of the comments here are along the lines of "this won't work, it's too hard to debug, etc.". But it seems to me that the human brain is a pretty good example of asynchronous computing? The last time I checked, there wasn't any sort of high frequency clock signal running down my spine.
- Mike
being a grad student at uiuc, i've heard older professors from time to time talk about the good old days of computing. back in the day, our department was all about asynchronous logic because everybody was convinced it would be faster than having to distribute a clock signal all about the computer. well, it turned out that the asynchronous machines were really slow, so they cheated a little bit by mixing in a bit of synchronous logic to speed things up. if anything were to come of this, it would be a long time in coming.
Someone mentioned CalTech - sorry, I forgot the cid, but thank you - so I went and did a little digging. Here is a link to the CalTech Asynchronous VLSI group. Right on the page are some cogent explanations of why they believe asynchronous designs will eventually become commonplace. Further in are pointers to some good papers, and an interesting discussion of their results implementing an asynchronous version of the MIPS R3000 architecture.
Slashdot - News for Herds. Stuff that Splatters.
A MAC is a very important operation in digital signal processing.
Thanks for the response - I guessed there was a standard FLOP, and I hoped someone would mention it. It sounds like a suitable measurement, but I'm still more interested in application benchmarks.
There is nothing new in studying asynchronous design. There is already some asynchrony in edgy circuit design currently. Note that asynchronous design also brings its own overhead, in "self synchronization" circuitry, such as handshakes. It allows for higher throughputs, but is a pain to design and to debug.
Both of them used async logic, were designed in the 60's and were quite successful, especially the 8/I. This is just another stupid "history repeats itself" article that Slashdot excels at. All digital circuit designers are aware of async logic and its limitations.
They stab it with their steely knives,
But they just can't kill the beast.
Only some families of asynchronous hardware suffer from the problems of slow feedback. Typical bundled-data handshaking, as described above, generally does suffer from these problems; however, an approach developed at Adelaide University and known as FeFA manages to remove the bulk of the feedback. It is more similar to synchronous logic than most asynchronous styles, but the performance is generally slightly better (due to lower latency) and the noise and power consumption are both a lot lower.
[For the uninitiated: noise emissions are bad in synchronous circuits due to the switching of the clock - everything happens at once, so you get noise at the clock frequency and its harmonics]
In addition, speed-insensitive designs - for which software synthesis tools, notably the open-source petrify, now exist - do not have typical feedback paths, but each logic block provides its own indications of completion by a different data encoding. This offers the benefits of *no* timing constraints whatsoever, resulting in (theoretically) much easier design processes and a significantly higher level of tolerance to fabrication errors.
For a good introduction to asynchronous system design, look at:
http://www.cs.man.ac.uk/async/background/return_async.html
A paper I found interesting on this subject:
http://www.ee.ic.ac.uk/pcheung/publications/Asyn%20Bus%20ISCAS96.pdf
Enjoy... I've heard Sutherland speak and he's done some very interesting things; most notably, he invented the method of "logical effort" for the sizing of transistors, without which CPUs would be several orders of magnitude harder to optimise than they are today.
These sigs are more interesting tha
First, most ASICs built these days are built with logic synthesis tools from Synopsys or Cadence. The inputs are typically register transfer level (RTL) code written in either the VHDL or Verilog languages. These logic synthesis tools have been around for quite some time (well over a decade for Synopsys) and have a significant infrastructure built around them. This design paradigm and sets of tools all assume synchronous logic. I can't fathom how you would build/constrain/debug these circuits in an asynchronous style with the existing toolset. And don't say "we'll use something else". It is these types of tools which have made our million gate ASICs possible. If we were still using schematics or other hack tools we would barely have passed the 80286. The current design tools took a long time to develop, hone, and get the bugs out of. The amount of money involved in just the tools is on the order of billions of dollars per year. That's a lot of inertia to move away from.
Second, yes the asynchronous approach can reduce the power consumption of ASICs. However, there are a lot of clocked approaches that do a very good job of reducing power. It all depends on what goals you have when you design the ASIC. Having multiple clocks and clock gating is common in the low power and embedded domains. It hasn't been as much of a factor in desktop systems but is certainly in use in handheld devices. The Crusoe takes these approaches to an extreme level. It's all a matter of what you want to design for and time to market pressures.
Lastly, speed. I think folks forget the feedback path. If you're going to rely on this asynchronous handshake, it requires a given stage to hold its outputs until the next stage acknowledges (asynchronously) that it got the data. This means the given stage can't accept anything new yet. This cascades/ripples back through the pipeline. This feedback takes time (and logic levels) that don't exist in clocked logic. Imagine an automotive assembly line where things could only move forward if each station got permission from his adjacent stations. In clocked logic you've guaranteed that the data is ready to move forward because you've calculated these things out. You've removed a bunch of communication overhead. Yes, there is slack in the synchronous pipeline, but for the most part current designs are pretty well balanced so that each stage uses a large portion of its clock cycle.
That's about all I can think of at the moment. I need to be getting home before I get snowed in! ;-) Just a few comments from a digital hardware designer. Hope this provided some food for thought...
A floating point operation is usually taken to mean a floating point multiply followed by a floating point addition, also known as a Multiply/Accumulate Cycle (MAC).
A MAC is a very important operation in digital signal processing. For example, to implement a digital lowpass filter (to remove tape hiss, for example), you define a finite impulse response filter (FIR filter) of some number of taps. You might need 256 taps to implement the needed low pass filter (this is a shot from the hip, the actual number of taps may be more or less). That means for every sample of audio (88.2kSamples/second for stereo audio) you need to do 256 MACs, or 22.6MFLOPS.
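The arithmetic behind that figure, spelled out. The 44.1 kHz per-channel rate is my assumption (it is what makes 88.2 kSamples/second for two channels), and the 256 taps is the comment's own rough guess.

```python
SAMPLE_RATE_HZ = 44_100  # assumed CD-style rate per channel
CHANNELS = 2             # stereo
TAPS = 256               # the "shot from the hip" filter length above

samples_per_second = SAMPLE_RATE_HZ * CHANNELS   # 88,200 samples/s
macs_per_second = samples_per_second * TAPS      # one MAC per tap per sample

print(samples_per_second, macs_per_second)
print(round(macs_per_second / 1e6, 1), "million MACs per second")
```

That works out to 22,579,200 multiply-accumulates per second, which rounds to the 22.6 million quoted above.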
www.eFax.com are spammers
Happy now?
Years ago we used to talk about the speed of processors in MIPS (million instructions per second). I don't really see the problem with this measurement to this day. FLOPS aren't really that relevant for the majority of computer activities since, with the exception of scientific applications, most things are done using integers.
Of course this kind of measurement worked nicely on the ARM processor, which executed most instructions in a single clock cycle. However, I suspect it may be somewhat more difficult on other processors, which perhaps take varying time periods to execute different instructions.
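A rough sketch of why varying instruction times muddy the number: the effective MIPS rating depends on the instruction mix, not just the clock. The function name `effective_mips` and the instruction mixes below are invented for illustration and are not measurements of any real chip.

```python
def effective_mips(clock_hz, instruction_mix):
    """instruction_mix maps cycles-per-instruction -> number of such instructions."""
    total_instructions = sum(instruction_mix.values())
    total_cycles = sum(cycles * count for cycles, count in instruction_mix.items())
    # instructions per second = clock * instructions / cycles
    return clock_hz * total_instructions / total_cycles / 1e6


# Hypothetical 100 MHz chip where everything takes one cycle.
single_cycle = effective_mips(100e6, {1: 100})

# Same clock, but 20% of the instructions take three cycles.
mixed = effective_mips(100e6, {1: 80, 3: 20})

print(single_cycle, round(mixed, 2))
```

With the same clock, the second mix lands well under 100 MIPS, so the rating only means something alongside the workload it was measured on.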
Mtf@turf.org
If the radio signal strength remained the same, but was spread over a larger spectrum, then the *interference* would definitely be less because most receivers are tuned for a small bandwidth section of the radio spectrum.
You could also say that since the receiver could be placed closer to the electronic bits then the wires would be shorter and would act less like antennas, making the total radiated energy less than in "traditional" pagers.
I swear to God, the first time I glanced at the article title, I thought it said, "Cockless Computing".
Sure, I've got Karma to burn. Do what you gotta do. Oh, and the lameness filter bites.
Quidquid latine dictum sit, altum videtur.
I only post comments when someone on the internet is wrong.
I know 'certain chip manufacturers' put in extra pipeline stages which increase the clock rate of the chip but actually degrade performance.
Most people will use the clock rate as a measure of the chip's performance so if you're designing a chip for end users then it makes more marketing sense to make a chip with a higher clock rate than one with a better performance.
The most common "clock-less" model is designing the circuit as a Finite State Machine, where the circuit is constantly checking "inputs" to determine when to move to the next step. This solves many timing issues: you send out a trigger pulse that activates a sub-circuit, and wait for a pulse on a return line to tell you when that piece is done computing. What you end up with is a complex system of Strobes and Acknowledgments, and lots of edge-sensitive circuitry (as opposed to the Hi and Lo levels that are the basis of CMOS).
Also, some types of designs run better as async circuits: FFT and division circuits, if I recall. You just trigger it to start and let the whole bundle cascade itself to completion. Also, bus circuitry in modern motherboards works in async mode already (there's this nifty thing called IRQ...)
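The strobe/acknowledge idea can be sketched in software. This is only a toy model, assuming a four-phase (return-to-zero) request/acknowledge protocol, which is one common way to do it and not necessarily what any particular chip uses. `SubCircuit` and `run_handshake` are hypothetical names.

```python
class SubCircuit:
    """Hypothetical self-timed block: raises ack once its result is valid."""

    def __init__(self, func):
        self.func = func
        self.ack = 0
        self.result = None

    def sense(self, req, data=None):
        if req == 1 and self.ack == 0:
            self.result = self.func(data)  # compute, however long it takes
            self.ack = 1                   # signal "done"
        elif req == 0 and self.ack == 1:
            self.ack = 0                   # return to idle


def run_handshake(sub, items):
    """Controller FSM: one full req/ack cycle per item, no clock involved."""
    results, trace = [], []
    for item in items:
        sub.sense(1, item)            # phase 1: raise request (the trigger)
        trace.append((1, sub.ack))    # phase 2: acknowledge seen high
        results.append(sub.result)
        sub.sense(0)                  # phase 3: drop request
        trace.append((0, sub.ack))    # phase 4: acknowledge seen low
    return results, trace


results, trace = run_handshake(SubCircuit(lambda x: x * x), [2, 3])
print(results)  # [4, 9]
print(trace)    # [(1, 1), (0, 0), (1, 1), (0, 0)]
```

The controller never assumes how long the sub-circuit takes; it only moves on when the acknowledge line changes.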
If you have a server room of constant size, with constant ac going, and you attempt to fill the room with a ridiculous number of boxes (stack that shit!), then as the number of comps in the room approaches infinity, the closer you get to having your b0x3n account for 100% of the electricity bill.
How stupid of a statement was that? Have you guys seen any of Exodus' server rooms?
--
Peace,
Lord Omlette
ICQ# 77863057
[o]_O
please elaborate.
--
Peace,
Lord Omlette
ICQ# 77863057
[o]_O
WRT the feedback path overhead....
IANAD (I am not a designer), but thinking about this I wonder if buffering interstage registers might not mitigate feedback path delay. Imagine three registers, R1 output of Stage 1, R2 buffer, and R3 input to Stage 2. Each register has a control bit (0 = read, 1 = unread). Further imagine two simple register to register copy circuits, one to copy R1 to R2, and a second to copy R2 to R3.
I apologize for the primitive exposition (I said IANAD), but intuitively it seems to me that such a buffer scheme could let logic stages overlap processing. The cost would be the time needed for the two hair-trigger copy operations between logic stages, but that should be minimal.
Bang1 - CopyR1R2 frees Stage1 to execute again, Bang2 - CopyR2R3 tells Stage2 it has an input.
If Stage1 completes a fast operation, the buffer copying lets it take on the next one (which might not be fast) perhaps before Stage2 is ready for its next input. Thus Stage1 and Stage2 can overlap in some circumstances, increasing overall speed. Multiply this by a dozen or so pipeline stages and the savings might be worth the effort.
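Here is a back-of-the-envelope timing model of the R1/R2/R3 scheme, to see when the buffer pays off. Everything in it is an assumption: the stage delays are invented, a direct handoff is treated as free, and each register copy costs one time unit. The function names are hypothetical.

```python
def unbuffered_finish(d1, d2):
    """Stage1 holds its output until Stage2 is free to take it."""
    stage1_free = 0
    stage2_free = 0
    for t1, t2 in zip(d1, d2):
        done1 = stage1_free + t1
        handoff = max(done1, stage2_free)  # wait for Stage2's acknowledge
        stage1_free = handoff              # only now can Stage1 restart
        stage2_free = handoff + t2
    return stage2_free


def buffered_finish(d1, d2, copy_cost):
    """Stage1 -> R2 buffer -> Stage2, with two copies per item."""
    stage1_free = 0
    buffer_free = 0
    stage2_free = 0
    for t1, t2 in zip(d1, d2):
        done1 = stage1_free + t1
        in_r2 = max(done1, buffer_free) + copy_cost   # CopyR1R2
        stage1_free = in_r2                           # Stage1 may run again
        in_r3 = max(in_r2, stage2_free) + copy_cost   # CopyR2R3
        buffer_free = in_r3
        stage2_free = in_r3 + t2
    return stage2_free


# Uneven delays: Stage1 is fast exactly when Stage2 is slow, and vice versa.
uneven = ([10, 10, 50, 10], [50, 10, 10, 50])
print(unbuffered_finish(*uneven), buffered_finish(*uneven, copy_cost=1))

# Perfectly balanced stages: the buffer only adds copy overhead.
balanced = ([10] * 4, [10] * 4)
print(unbuffered_finish(*balanced), buffered_finish(*balanced, copy_cost=1))
```

In this toy model the buffer helps when delays are uneven and costs a little when the stages are already balanced, which fits both the idea above and the parent's point about overhead.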
Or perhaps this is the overhead the parent post was referring to...
We must free the CPU from the oppressive overlordship of the CPU Clock! Let the CPU work as it is wont to work, beholden to no one! Let Nature be our guide!
... Hey, we'd be living in SINchrony. Get it? Heh heh heh.
CPU 1: Hey, I think we have a calculation due in a few microseconds.
CPU 2: Dude, don't sweat it. We'll get by.
CPU 1: No, really, my mom said if we don't do some work she's gonna put a clock in here.
CPU 2: Dude, that'd suck.
CPU 1: Heh heh heh. I've got the munchies.
--
Non-meta-modded "Overrated" mods are killing Slashdot
(Hey Ryan! Here's your proof!)
If you wanna know how fast a mhz-less chip is, just count the fps in Quake. DU-UH!
But seriously, even supposing we could come out with a retail clockless CPU, wouldn't it require a plethora of equally clockless peripherals like video cards and IDE controllers and whatnot? Otherwise it would need a clock to drive these "external" devices (external from the CPU's view, that is), and then we fall back into the same pit. The concept is fascinating but ill-fated, I'm afraid.
-Billco, Fnarg.com
And all this, just because you cannot suppress the clock from the taskbar.
At least in Linux you can go and edit your xinitrc not to start xclock if you don't want it to!
Juju
Black holes occur when God divides by zero.
Forgot the everything2.com link explaining that you can replace www with partners, channel for NYT.
My idea was to have a single data pipeline and two exception pipelines. Lay them out like parallel roads and lay a bunch of brutally simple CPUs along the opposite sides of the data pipeline (center). Have the two "exception" pipelines go directly to the front at the input side of the pipeline. These pipelines would need to be very complicated; they would need to understand whether they are holding data, instructions, or nothing. If they are holding nothing, they need to be able to signal the previous stage in the pipeline to come down the pipe one. This will become obvious in a second.
The simple CPUs will need to read in an instruction from the pipeline, and two optional data words. (At this time the pipeline will need to forward to the last empty spot before the next instruction.) They will then perform their calculation and output the result onto the pipeline. Any exceptions generated (or new instructions?) could be output onto the exception pipeline to be reinserted at the input side of the pipeline. Having 8 CPUs in such a system would allow the CPUs to operate; however, having fast enough pipeline throughput would be a major problem.
I couldn't think of solutions to some of the hard questions this idea gave me; I just don't know enough about the electronics and what they can do. I was hoping that the speed limit on the input pipeline would be fast enough that instructions would always be available. However, I also thought that it would allow for a very efficient CPU. While the system was idle, there wouldn't be very many instructions on the pipeline, and nearly all of those could be handled by the first couple of CPUs on the pipeline, allowing the others to lie dormant. This sounds a lot like a Turing machine, doesn't it!
I never noticed that before.
Spring is here. Don't believe me, look outside!
They are not considered unsolvable. They are considered intractable problems for the design time required by the industry. Asynch logic is one of the first things taught in the Digital side of EE. But those are small projects, _far_ fewer than the millions of transistors required by a modern micro.
UltraSparc III actually is partially asynch. The core pipeline uses a technique called ``Wave Pipelining'' which doesn't latch between pipeline stages. IIRC, one of the Alpha processors didn't have latches between the L1 cache pipeline stages.
It is true that MHz is less useful than FLOPS for determining performance. Also, with the numbers becoming huge, there isn't much difference between 766 MHz and 800 MHz, not as much as between 100 MHz and 133 MHz. Still, with the same architecture, you can make rough estimates based on clock speed (100 MHz Pentium vs 133 MHz Pentium, if that is possible).
When you compare across processors, you have to start looking elsewhere. As another poster mentions, FLOPS (especially Multiply-Accumulate) is important for signal processing. Still, I would rather see benchmarks for two signal processing programs running on the same processor. That way, you are comparing apples to apples.
The mind boggles at what this new kind of tech means. What does a NOP mean? Do interrupts still work? Are multi-process pipelines still possible? How does memory (timed refresh, tight processor-to-memory channels) work? Video card interfaces? Other devices that may feed off the system clock? This device may be years away from commercial use, may never run Microsoft, and may take a while to even run Linux. The whole discussion seems rather academic, and the technology may only be used in academic or high-end applications.
I have long been fascinated by neural nets and their potential to develop true machine consciousness. Currently most neural nets are step-by-step simulations using clock-based CPUs, which somehow seems vulnerable to missing some key factor that makes the natural neural nets in the brain conscious. Most research into actual neural nets (not digital simulations) seems to concentrate on maximizing the number of connections and what training algorithm to use. Can Slashdot readers suggest something beyond these three familiar themes in neural net research (digital simulation, maximizing connections, training algorithm) that gets into how neural nets might be organized to achieve true machine consciousness? If this is achieved in a manner based on the human brain, I imagine numerous neural net "subsystems", each wired differently internally and connected to other subsystems like a patchwork quilt. What are these subsystems? How are their internal wiring schemes different? How are they connected? How do these subsystems become self-aware?
Testing is the easiest part. Verification is the difficulty here - how do you verify that you've eliminated all race conditions? The whole problem of the design is to eliminate any and all possible race conditions. Today they use two-rail logic (one wire for 0, one wire for 1) and completion circuitry to decide when a computation is finished and another can start executing in the same unit. CalTech has made a MIPS compatible processor that works (with no race condition problems).
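For anyone who hasn't seen two-rail logic, here is a small sketch of the encoding and the completion check. The rail ordering, the all-zero "no data yet" state, and the function names are assumptions chosen for illustration; real designs vary.

```python
NULL = (0, 0)  # neither rail asserted: no data yet


def encode(bit):
    """Two-rail encoding as (wire_for_0, wire_for_1)."""
    return (0, 1) if bit else (1, 0)


def is_complete(word):
    """Completion detection: every bit has exactly one rail asserted."""
    return all(r0 != r1 for r0, r1 in word)


def dual_rail_and(a, b):
    """AND gate that stays NULL until both inputs have arrived."""
    a0, a1 = a
    b0, b1 = b
    out1 = a1 & b1
    out0 = (a0 & b0) | (a0 & b1) | (a1 & b0)
    return (out0, out1)


print(dual_rail_and(encode(1), encode(1)))  # (0, 1), i.e. logical 1
print(dual_rail_and(encode(0), NULL))       # (0, 0), still waiting on b
print(is_complete([encode(1), NULL]))       # False
print(is_complete([encode(1), encode(0)]))  # True
```

Because the output stays at NULL until valid data shows up on both inputs, the next stage can tell from the wires alone when the result is ready, without a clock edge to tell it.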
Asynchronous ARM core nears commercial debut (1998)
ARM researches asynchronous CPU design (feb 1995)
AMULET3: A High-Performance Self-Timed ARM Microprocessor (1998)
See www.avlsi.com for the Asynchronous Digital Design page. (They are one of the start-ups mentioned in the article.)
--
If we don't change direction soon, we'll end up where we're going.
Any EE knows the idiocy of designing complex logic without a clock. It is impossible to guarantee correct operation given latency variations due to slight temperature and voltage changes. To use a software analogy, how easy would it be to debug a program where half of the code consisted of conditional branches?