10GHz Processors And Moore's Law
AntiFreeze writes "There is an interesting story on MSNBC about Intel's attempts at producing chips capable of running at faster than 10 gigahertz. There was a previous /. article in early December about this here. This article from MSNBC is much more detailed (both technically and non) than the original article referenced from December, and provides a very intriguing look at what Intel's planning to do over the next four years, and what they'll have to show the general public as soon as April 1st. And as always, there's the heated /. argument about Moore's law buried in there, too."
We all seem to find new CPUs "kickass", since they decode and encode faster, render generated images faster, and so on.
But as was once stated at the first lecture I saw about Moore's Law: if we don't have the technology (or software) to "use" this new hardware, what good is it? The gap between software and hardware is getting larger every day.
Just a small side note: apart from running seti@home and some rendering jobs, my PII-Celeron 266 mostly sits at a load of 0.02.
This is a replacement signature.
are impossible in theory but really make no difference to the practical problem. You can solve the halting problem for every problem except that one special case which proves the halting problem unsolvable for every case. Woop! The "proof" that the halting problem is not solvable has done more to damage research into software verifiability than any suggestion that it might merely be a hard problem. How sane is that? Oh, we can't solve it for every case (because you can manufacture a case that is not solvable), so why bother trying to solve it for any cases, including the large majority of cases?
Sad.
How we know is more important than what we know.
Well, they travel through your body all the time, 24 hours a day. Are you going to shut down every radio transmitter in the world, or just live in a faraday cage?
-
I've had enough abrasive sigs. Kittens are cute and fuzzy.
[rant]
Haven't you learned ANYTHING from history? Apparently you were too busy studying physics to pay attention. Here's a tip - never, ever, ever tell a scientist that something's impossible, unless you want to be proven wrong...
[/rant]
Think outside the... Hey, where'd the friggin' box go?
Not only would a careful journalist make that distinction, but a careful professor of philosophy teaching the philosophy of technology would also point out the context. That being: Moore was listing a requirement for Intel staying on top of the processor heap. Fall under the 18-24 month doubling, and someone will most likely beat Intel in the market. This greatly alters the meaning of Moore's Law: it is not a pace of technology, but a metric for corporate health.
USA-Democracy is 270 million YESes and NOes a day, not one every four years.
The article says that without EUV the end of Moore's law would be around 2005, so how much time has EUV bought?
How we know is more important than what we know.
GALLIUM-ARSENIDE FET AMPLIFIERS have been developed which provide low-noise amplification up to about 30 dB in the 7- to 18-gigahertz range. The power output of many of these amplifiers is relatively low, approximately 20 to 200 milliwatts, but that is satisfactory for many microwave applications. Research has extended both the frequency range and the power output of gallium-arsenide FET amplifiers to frequencies as high as 26.5 gigahertz and power levels in excess of 1 watt in multistage amplifiers.
The web page with this info is located at http://www.tpub.com/neets/book11/45o.htm. There is nothing preventing this from being used for computing. Advances need to be made to provide synchronised clock signals to the whole chip, and the power consumption will need to be dealt with. These are analog devices at this time.
The truth shall set you free!
And as always, there's the heated /. argument about Moore's law buried in there too
What heated argument? They're just saying this is a way to keep it going...
--
Mosix does this, I believe... clustering can work really well if you have an abundance of independent processes... like a multiuser system, where the users don't really know where their processes are actually being executed.
My point was that the typical processes running on a desktop machine are unsuitable for this type of farming out to a cluster.
Doug
Venn ist das nurnstuck git und Slotermeyer? Ya! Beigerhund das oder die Flipperwaldt gersput!
Couldn't you sidestep this problem with a solution you hinted at at the end of your post? That is, parallel processing? If your statement is actually true, that 1 in every 250 Billion bits will be corrupted, couldn't you just run the same process on two processors at the same time and compare them? If the two results aren't the same, do it over again. But for the 1 in 250 Billion bits to line up with the other 1 in 250 Billion bits would be so astronomically unlikely that you may solve the problem right there. And if not, stick a second redundant processor in, and so on.
I think you see the solution to your own problem, so don't go saying it's impossible.
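To put a rough number on "astronomically unlikely", here is a minimal sketch. It takes the quoted 1-in-250-billion figure at face value and assumes the two processors fail independently, which the thread does not establish; the function names are made up for illustration.

```python
from fractions import Fraction

# Per-bit corruption rate quoted in the thread, taken at face value.
P_BIT = Fraction(1, 250_000_000_000)


def undetected_error_rate(p_bit: Fraction) -> Fraction:
    """Chance that both independent copies of a given bit are corrupted,
    so the comparison sees two matching (but wrong) values."""
    return p_bit * p_bit


def retry_rate(p_bit: Fraction) -> Fraction:
    """Chance that exactly one of the two copies is corrupted, which the
    comparison catches and which forces a re-run."""
    return 2 * p_bit * (1 - p_bit)


if __name__ == "__main__":
    print("undetected per bit:", float(undetected_error_rate(P_BIT)))
    print("re-run per bit:    ", float(retry_rate(P_BIT)))
```

Under those assumptions the undetected rate is the square of the single-processor rate, while the cost is roughly twice as many (detected) retries.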
Take a working chip of any kind. Try something new like copper, silver, an aluminum-gold alloy, different transistor doping, or a different interconnect size. Get new equipment to process it. Discover that copper and silver migrate into silicon and kill transistors. Try a few things to stop copper migration. Each experiment takes time and money to set up, and the results are not known for a couple of months because you have to make the IC to test it. When you are done 2 years later you have a 25% faster part. Now repeat with the next speed enhancement. It is not a conspiracy, it is the development cycle. Rushing it means changing multiple things at the same time (lines, insulators, transistor size, doping, voltage, etc.) all at once, then not understanding why it doesn't work. Any single change usually makes a device not work on the first try. The result needs to be examined and changes made to make it work right. Improvements are made by going from something working to something unknown, making that work, then going on to the next change.
The truth shall set you free!
I suppose you can think of it like CISC. I've never really thought of it that way - I'm sure the nVidia GeForce has an instruction set. I think of it more like the SGI / Amiga way of doing things. You have a very basic general purpose CPU, and then have optimized hardware to perform all the regular and otherwise complex tasks. I mentioned OS calls simply because they're among the most common operations a program does (even though they may not necessarily be that complex).
As for the question about making Word run faster.. This entire discussion assumes an underlying desire to ever increase the speed of processing; and specifically the potential limits of Moore's law - namely continuing human ingenuity.
Note, there are all sorts of problems with hardware based operation, but so long as we have APIs like OpenGL, POSIX, MFC, etc, then we don't have to worry about the specifics of how it's implemented. Is the latest kernel hardware accelerated? Who cares from a developer's point of view.
-Michael
I don't see how we're "handing" anything over.. Has OpenGL been conquered by nVidia or 3Dfx? So long as you have an open API, the specifics of how someone implements it don't fundamentally affect you. Yes, MS might add feature bloat so that we become dependent exclusively on their products; but they'll just open themselves up to further anti-trust litigation.
Further, I really only see ASICs as stepping stones in development. Isn't the GeForce a full blown processor? This is most likely because of the large volumes..
Rapid switching FPGA could very well be revolutionary, since you'd have one or two pieces of hardware that are reprogrammed for their environment on the fly. But that's vaporware at this point. FPGA is (to my understanding) primarily for proof of concept, or getting something out the door.
-Michael
omg! why not make a network cable in the form factor of a DIMM? holy fukc!? the thing of it is, the network would be just as fast as ram, and as long as you put a decent size cache in there, say 16Mb@clock, you could tune the kernel to cooperatively manage memory across the nodes. you could then order up mobos with multiplexed chipsets and 16 DIMM slots, no ports, no busses. you could then connect each processor in the cloud by 16 dimensions. at >10k nodes, this system would be sufficient (with an assumption of each processor doing about 2gflops with a combined 256Mb state cache) to process a perfect copy model of the human mind, in realtime.
thankyou.
:)Fudboy
I guess I'm only a Fudboy, looking for that real Transmeta
But that won't stop the chip manufacturers from trying of course.
January 15, 2028 - Intel announces their new 400THz processor, which performs 100 billion floating-point operations in the millisecond before it consumes itself in a nuclear explosion. This is a step up from AMD's recent processor which simply fries any nearby user with bolts of plasma energy. Hobbyists are already looking into ways to overclock the chip.
--
Obfuscated e-mail addresses won't stop sadistic 12-year-old ACs.
Win dain a lotica, en vai tu ri silota
While everything is obviously shielded, it is still amusing to speculate on the cooking potentials of the insides of your PC.
What is more worrisome is the problem of heat. I recall reading someplace that right now a typical processor runs the energy of a 60 watt light bulb through that piece of ceramic.
When we multiply this with the frequency shifts and the number of transistors, it becomes worrisome.
I occasionally have visions of computers glowing like a flying saucer [smile]
"It is a greater offense to steal men's labor, than their clothes"
Better off with multiple slower CPUs, like 1.5 GHz and Beowulf them. More machines to take care of, but better than rushed/poor fabbing of CPUs. Plus you get redundancy and almost unlimited scalability. And ungodly bandwidth if you use gigabit cards instead of just 100bt. It's the way to go for pretty much everything unless you have something custom for one cpu (which is rare these days)
Would you have felt better if they said "at the start of the second quarter"?
How we know is more important than what we know.
Heard of reversible computing? We haven't even scratched the surface of power/heat reduction.
How we know is more important than what we know.
What's funny is when I got my 800MHz Athlon, I committed myself to keeping the case cover on all the time for fear of rads. :-) One of my friends and I discussed this, but neither of us knows much about atomic physics.
Will processors running at that speed require shielding?
Yeah, right! The future is DNA computing? It's a hack that happens to work for some obscure computationally intensive problems that can easily be parallelized. You do not want a DNA computer to replace your desktop, trust me on this. It would take hours just to set up a simple computation. It could make for an interesting co-processor, though, but for mainstream use the gains are probably not worth it. Parallel computing? Maybe, but there are lots of interesting problems that are not easily parallelized. Anyway, you forgot to mention quantum computing. That is definitely interesting, if it will ever work (and chances are it will not). But for the near future, I'll be willing to bet a lot that Moore's law will still be valid for a few more years.
2D scaling is a problem, when you start pushing theoretical limits. However, many of our speed increases are due to better designs as well as better techniques. I've seen some promising theoretical work in expanding into the third dimension, which may cause cooling problems but may also allow better designs.
I also imagine, due to cooling requirements, development may go the route of multiple cheaper processors rather than expensive Apollo project processors (processors that push the theoretical limits). When this happens, software will start to morph to take advantage of it, and I predict we will still see gains comparable to Moore's law.
No one will ever need more than 640 MHz!
Remember "Bring 'em on"? *sigh
Oh, but Murphy's law has been experimentally tested by many people many times. It is most definitely a law of physics :-)
Better off with multiple slower CPUs, like 1.5 GHz and Beowulf them. More machines to take care of, but better than rushed/poor fabbing of CPUs. Plus you get redundancy and almost unlimited scalability. And ungodly bandwidth if you use gigabit cards instead of just 100bt. It's the way to go for pretty much everything unless you have something custom for one cpu (which is rare these days)
Actually, if you are going to have a system of highly interconnected CPUs like in a Beowulf cluster then you are limited fairly severely in scalability. This is mostly due to the size of the memory bus. Even if you move up to gigabit ethernet cards the bus is a big limiting factor.
Secondly, the class of tasks that a cluster is useful for is not that big. It does nothing towards making a really bloated program run any faster. Clusters are not very good for real time tasks, because once you have chopped up a problem and distributed it to all of the processors you have very little time left to work on it and get the results back in time.
While very useful, the cluster is not likely to be the solution to the potential end of Moore's-law-like growth.
When I want your opinion I will beat it out of you.
Then why does almost every single Linux company I know of (regardless of their field) have *at least* a 6-node Beowulf cluster? It's not for SETI, my friend. Some folks need that power without having to get a crazy expensive Sun/HP/SGI/DEC/Aviion or settle for some performance-crippled 8-way Xeon. If you BREAK UP the task, it works better. Gigabit is more than enough for databases, etc.
I thought crypt/brute force ratios increased exponentially in real time and cpu years, not incrementally.
"Me Ted"
BOSTON SUCKS!
Ok, so what happens when we hit a practical milestone? Will faster general purpose CPUs reach such a limit that it costs 10 times as much to achieve a 10% performance gain?
Here are the alternatives. Get away from pipelining (which is a hack that facilitates ever-increasing clock speeds).. Return to optimized and specialized adders / multipliers, etc. Now that we make things in parallel with 2 - 4 adders, simply produce CPUs with 24 adders, each with no intervening pipeline buffers.. The number of transistors goes down significantly for each adder, and through the use of highly conductive materials (such as diamond) you can achieve large surface area chips. (This assumes the reverse of the existing P4 approach: the control logic and memory interfaces run at 10GHz while your adder runs at, say, 100MHz, with each gate switching at nearly 1/20GHz propagation delay.)
Step two is even more obvious. Specialized hardware.. In the video world, we have only to compare software OpenGL to hardware OpenGL. Specialized hardware is monumental because it's the ultimate parallel algorithm. Algorithms such as those in MFC, or possibly even OS calls, could be hardware controlled.. Granted, it makes upgrades a lot harder, but don't we find ourselves spending the money on new video cards every year and a half now? How often does someone upgrade WinNT? It already costs $150 for the OS upgrade; what's an additional $50 for the PCI / adaptive AGP card?
To facilitate smoother transitions, I think that FPGAs or ASICs might have a popularity explosion. As far as I know, they're still manufactured with huge gate widths.. Bring ASICs into the "10GHz" range, and you have the potential for incredible performance.
In fact, the CPU as we know it might fade away into the annals of history over time. A return to cartridges, perhaps?
-Michael
They'll have something to show on April 1st? Am I the only one who raised an eyebrow at this bit?
The reason it is impossible is due to heating issues, and also that down at 0.01 microns a single bit is represented by only a few hundred electrons. Quantum Mechanics states that the uncertainty of such a conglomeration is about 1 in 200 Billion - ie, the 'bit' is only certain to that degree. Given that a processor at this speed will process many times this amount per second, it is impossible for a processor to run at this architectural scale because one in every 250 Billion bits will be corrupted - which is fatal. I have estimated that the top speed we are likely to see is about 3GHz at 0.05 microns. To assert otherwise is hogwash.
The future lies in parallel processing and DNA, mark my words. You can bet AMD and Intel are researching it now. The traditional CPU is nearly dead.
--Anticipation of a New Lover's Arrival, The
--
"We expect to have the first full field-scanned images by April 1," said Chuck Gwyn, program director for EUV. ;)
Wouldn't happen to look like this would it?
-----
"Almost isn't good enough - but it's almost good enough."
-Me
Theoretically, it should be possible to slightly increase the width of all data paths and add some error correction information.
:-)
The tricky part is that not only storage and data paths would need ECC - all processing circuitry would need to support error correction with redundant circuits. Even the most basic building blocks would need to be redesigned and replaced with versions that incorporate ECC sanity checking into their internal design to take into account the fact that any intermediate stage may flip a bit. I imagine designing an error correcting adder or multiplier would be a nightmare but it's possible.
The resulting architecture would probably need to be a very simple processor, VLIW perhaps.
And I bet it would emulate a Pentium using Transmeta-style translation
----
Stop worrying about the risks of nuclear power and start worrying about the risks of not using nuclear power.
Check out this truly scheweeeeeeet cluster!
http://tux.anu.edu.au/Projects/Bunyip/Beo-017.jpg
http://tux.anu.edu.au/Projects/Bunyip/Beo-015.jpg
Now *THAT* would be the ultimate quake server or GIMP beast! But... who can afford the electricity?!?!
If you go to an optical system, speed is always set by the medium, at most ~300,000 km/s. But when you transfer energy, whatever isn't converted back into signal is turned into heat too. What would be way kewl (Cool!) is integrated Peltier junctions to help dissipate heat. Built-in heat sink!
Another thing is the inductive coupling of longer wires. There's a reason for all those stupid ground returns on a parallel cable! They redirect the induced signal to gnd. Capacitance effectively blunts the wavefront of a signal, but if they work with soliton pulses (essentially a pre-squished square wave), there is nothing to blunt/induce. Induction is a rise time effect more than anything else. The trouble with solitons is: when is the bloody thing a 0 or a 1???
This mind intentionally left blank.
The KKK a bunch of sheetheads? You decide!
Power dissipation goes down with reduced size. This makes up for the increase with increased speed.
and also that down at 0.01 microns a single bit is represented by only a few hundred electrons.
Only if they make the transistor that small. .01 micron is the minimum size of a feature, not the size of all features. While smaller transistors are nice, smaller busses are actually more important. Anyway, to take your assumption at face value...
Quantum Mechanics states that the uncertainty of such a conglomeration is about 1 in 200 Billion - ie, the 'bit' is only certain to that degree. Given that a processor at this speed will process many times this amount per second, it is impossible for a processor to run at this architectural scale because one in every 250 Billion bits will be corrupted - which is fatal.
Certainly so -- if you don't design any error correction into the chip. It only requires about a 20% increase in real estate to implement two parity bits which would require two simultaneous bit failures to create a nonresolvable error. This would also slow things down very little as parity checking can be done in parallel with computation -- it's always going on. Thus, instead of crashing once every minute or so as your calculations suggest, it would crash once every several hundred billion minutes or so, which is quite tolerable.
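As an illustration of how extra parity bits let hardware repair a single flipped bit, here is a minimal sketch of the textbook Hamming(7,4) code. This is a swapped-in standard technique, not the specific two-parity-bit scheme or the 20% area figure mentioned above; a real chip would do this in logic gates, not software.

```python
def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword laid out as
    [p1, p2, d1, p3, d2, d3, d4] (positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, syndrome). A nonzero syndrome is the 1-based
    position of a single flipped bit, which is corrected before the
    data bits are extracted."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], syndrome


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])
    word[4] ^= 1  # simulate one corrupted bit
    print(hamming74_decode(word))
```

Any single bit error is repaired transparently; two simultaneous errors in the same word defeat this particular code, which matches the point that it takes a double failure to cause trouble.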
I have estimated that the top speed we are likely to see is about 3GHz at 0.05 microns. To assert otherwise is hogwash.
You know a lot about physics, but not much about CPU architecture. Your pet peeve will be relatively simple to work around when the time comes.
Brackets contain world's first nanosig, highly magnified:[.]
Intel should stop investing so much in CPU speed and move on to more important bottleneck elimination, such as bus speed.
- Amon CMB
Men believe what they want. - Caesar
Although Infiniband is still just getting going, it has great potential for clustering. Adapters can RDMA to remote hosts directly. Although you could do this with PCI-PCI bridges, PCI was still slow, had limited interconnectivity, and latency increased as bridges and busses were added. Infiniband cuts through all that with a network-like topology (hubs and switches) but still allows direct memory access.
Of course you still have the problem that current clusters require software to be rewritten to take advantage of them. I think someone could design a system that finds other systems across Infiniband and shares the work load automatically. The more transparent the clustering, the better.
-- soldack
A more careful journalist would hopefully have written:
--
Blaming GW Bush for the Iraq war is like blaming Ronald McDonald for the poor quality of food.
Lowering the voltage has some good effects - the main one is that the power consumption drops as the square of the voltage (assuming Ohm's law). However, lowering the voltage causes everything to run slower. The old fashioned 4000 series CMOS chips were much faster at 15 volts than they were at 5 volts.
Chips get faster when they shrink because the capacitances decrease as the surface area of a conductor shrinks; cut the feature size by a factor of two in both directions and the capacitance is down by a factor of four. However there is another effect which occurs as everything shrinks; the insulation between features shrinks, and that shrinking feature increases the parasitic capacitance between the two features.
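The two scaling rules just described can be sketched numerically. This is a rough illustration only: it assumes power depends on nothing but the square of the supply voltage, and it treats the capacitance as an ideal parallel plate (C = eps * A / d), which is a simplification of real chip geometry. The function names are made up for this example.

```python
def relative_power(v_new: float, v_old: float) -> float:
    """Power ratio when only the supply voltage changes (P proportional to V^2)."""
    return (v_new / v_old) ** 2


def relative_capacitance(linear_scale: float, insulator_scale: float = 1.0) -> float:
    """Parallel-plate estimate relative to the original geometry.

    linear_scale shrinks both lateral dimensions, so plate area scales with
    its square; insulator_scale is the factor applied to the dielectric
    thickness d."""
    return linear_scale ** 2 / insulator_scale


if __name__ == "__main__":
    # 4000-series example from the text: 15 V down to 5 V.
    print("power ratio 15 V -> 5 V:", relative_power(5.0, 15.0))
    # Halve the feature size in both directions, insulator unchanged.
    print("capacitance, area only:", relative_capacitance(0.5))
    # Same shrink, but the insulator is also half as thick.
    print("capacitance, thinner insulator:", relative_capacitance(0.5, 0.5))
```

In this toy model, thinning the insulator along with the features gives back half of the capacitance reduction that the area shrink bought.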
In the past the increase in capacitance caused by the thinning of insulators has not been a significant effect in limiting clock speeds but there comes a point where the effect does become important. In neurons the cell walls are so thin that the capacitance effects of the thin dielectric limit signal propagation speeds in the neuron to about 180 miles per hour or so. Long axons have thick sheaths to cut the capacitance and increase the signal propagation speeds.
This increasing capacitance with the decreasing dielectric thickness, combined with the decreasing speed from the lowered voltages, will eventually put an effective cap on the clock speed of silicon devices. The only big trick left in the book is to switch to diamond-based semiconductors - which are as much better than silicon as silicon was than germanium - and that will give us some more speed. Above a certain frequency Nature itself changes the way it does things. At RF frequencies bulk devices like crystals function - at the frequencies of light waves only atomic devices can switch from one state to another quickly enough.
In other words at some point in the near future we are going to reach a point where simple die shrinking won't be enough to crank up clock speeds any more. Enjoy things while they last - but another factor of a thousand increase in clock speed (Apple II one megahertz to present day one gigahertz) is going to be very difficult to achieve.
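For a sense of scale, a factor of a thousand is about ten doublings. The sketch below works that out and, purely as an assumption, applies the 18-24 month doubling period mentioned elsewhere in the thread to clock speed (that figure is usually quoted for Moore's law in general, not for clock rates specifically).

```python
import math


def doublings_needed(factor: float) -> float:
    """Number of successive doublings required to grow by `factor`."""
    return math.log2(factor)


def years_needed(factor: float, months_per_doubling: float) -> float:
    """Years to grow by `factor` at a fixed doubling period (assumed constant)."""
    return doublings_needed(factor) * months_per_doubling / 12


if __name__ == "__main__":
    print("doublings for 1000x:", doublings_needed(1000))
    print("years at 18 months per doubling:", years_needed(1000, 18))
    print("years at 24 months per doubling:", years_needed(1000, 24))
```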
~~~
It actually isn't hard at all to do this. Individual registers can be verified in real time with parity checks. Multiple parity bits can allow parity errors to automatically be resolved without losing data. A clock cycle might have to be skipped while this is done -- once every few hundred billion clocks. Otherwise, it's transparent and consumes rather little chip real estate.
In some cases it would be easier to duplicate entire modules and compare the outputs. It's not necessary to use three blocks with voting; if a compare fails, you redo the operation. It's a computer; until you write the results you still have your starting state to begin from again. So once again, you miss a clock cycle once in a great while.
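A minimal software sketch of the compare-and-redo idea: run the operation on two units, accept the result only if they agree, and otherwise start again from the untouched starting state. Real hardware would do this in circuitry; `make_unit` and `run_checked` are hypothetical names, and the fault injection is simulated just to exercise the retry path.

```python
def make_unit(corrupt_on_calls=()):
    """Build a simulated execution unit. Calls whose 0-based index is listed
    in corrupt_on_calls return a result with its lowest bit flipped."""
    counter = {"calls": 0}

    def unit(operation, state):
        result = operation(state)
        if counter["calls"] in corrupt_on_calls:
            result ^= 1  # simulate a single-bit fault
        counter["calls"] += 1
        return result

    return unit


def run_checked(operation, state, unit, max_attempts=10):
    """Run `operation` twice per attempt and compare the outputs. On a
    mismatch, redo from the same starting state, which is never modified.
    Returns (result, attempts_used)."""
    for attempt in range(1, max_attempts + 1):
        first = unit(operation, state)
        second = unit(operation, state)
        if first == second:
            return first, attempt
    raise RuntimeError("results never agreed; unit may be permanently faulty")


if __name__ == "__main__":
    flaky = make_unit(corrupt_on_calls={0})  # only the very first call is bad
    print(run_checked(lambda x: x + 1, 41, flaky))
```

No voting logic is needed because nothing is committed until the two results match; the price of a fault is one repeated operation.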
Remember also that most of the computer is not the CPU and isn't implemented at this level or running at this speed. You only have to harden the parts that are.
Brackets contain world's first nanosig, highly magnified:[.]
I had a 4 MHz Z80 system by Amstrad from the late 80's whose word processor blew away WordPerfect on a 12 MHz 8086 spell-checking documents. (I'd still be using it but it used proprietary disks and you can't get the drives any more.) This was a very powerful, intuitive word cruncher using an extended text mode that could display 512 different characters at a time on its 90x25 screen, and for ease of use it compared favorably with all but the fastest newest Windows-based systems. It also ran CP/M, and sported an interpreted BASIC that made QBasic look like a sick joke.
If we had software written like that for the x86 platform, it would be amazing what these machines could do. Imagine something text based, with pre-emptive multitasking, installable with only the features you need, highly configurable, with optional graphics, and built by people who really care about what they are doing...
Well, I guess we have an operating system like that, but it would be nice to have applications too.
Brackets contain world's first nanosig, highly magnified:[.]
And as I pointed out they were wrong.
even if it was running at 10Ghz all the components in the mobo would suffer from heavy timing problems due to different wire lengths.
Cray mainframes of the 80's had this problem as they were refrigerator-sized and operating over 100 MHz; the problem was solved by modularizing the system, desynchronizing the components, and recombining data under controlled circumstances. I remember being told by a beaming engineer in 1982 that some busses had three different addresses on them at the same time.
So it can be done. Now, we'll just see it done 100 times faster with equipment 100 times smaller -- a single chip.
Brackets contain world's first nanosig, highly magnified:[.]
Are we going to have a party when we reach this milestone?
As other people have stated before, such a processor is physically impossible; even if it was running at 10Ghz all the components in the mobo would suffer from heavy timing problems due to different wire lengths. The cost of memory for such a system would be prohibitive (yes, Rambus speed would be a joke for such a beast). You would be better off with ten 1GHz processors, even on different motherboards; you would have less memory latency and a much better price/performance ratio.
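To make the wire-length concern concrete, this sketch estimates how far a signal can travel during one clock period. The velocity factor of 0.5 (signals moving at roughly half the speed of light on a board trace) is an assumption for illustration, not a figure from the thread.

```python
C_VACUUM_M_PER_S = 299_792_458  # speed of light in vacuum


def distance_per_cycle_cm(clock_hz: float, velocity_factor: float = 0.5) -> float:
    """Distance in centimetres a signal covers in one clock period.

    velocity_factor is the assumed signal speed as a fraction of c."""
    return C_VACUUM_M_PER_S * velocity_factor / clock_hz * 100


if __name__ == "__main__":
    for label, hz in (("1 GHz", 1e9), ("10 GHz", 10e9)):
        print(label, "->", round(distance_per_cycle_cm(hz), 2), "cm per cycle")
```

At 10 GHz the budget comes out at a centimetre or two per cycle under this assumption, so trace-length differences across a motherboard span several clock periods.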