Reduce Transistor Power Consumption
revelCyllufyalP writes to tell us that University of Kentucky researchers have discovered a way to reduce the overall power consumption of transistors. From the article: "In order to improve computer chips' performance, transistors' size and gate insulators have to be continuously shrunken so that more components can be packed into a single chip. Computer chip producers were hitting a wall in downscaling the transistors and gate insulators because of their inability to reduce the leakage current of the existing gate insulators. This new technique will help the chip producers to develop more powerful chips with low-power consumption."
This may not sound like that big a deal, but let me assure you this is very significant for improving wireless infrastructure. One of the biggest limiting factors in wireless devices is power consumption, so this is great news for the industry!
The press release says they're getting several orders of magnitude less tunneling current through gate insulators. But tunneling happens because some portion of the electron's wavefunction extends to the other side of the insulator. What are they changing that would affect the physics? Or are they fixing a different kind of leakage and getting the press release wrong?
If Intel could apply this technique to existing P4 chips that burn ~150 watts, what would the savings be? How much of the equation is 10,000 to 100,000x less leakage current?
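One way to frame the arithmetic: the savings are capped by whatever share of the 150 W is gate leakage in the first place, no matter how large the reduction factor is. A rough sketch; `gate_leak_fraction` is a hypothetical input, since neither the article nor the press release says what that share is.

```python
def leakage_savings_watts(total_watts, gate_leak_fraction, reduction_factor):
    """Watts saved if only the gate-leakage share of total power shrinks.

    gate_leak_fraction is a hypothetical input (share of total power that is
    gate leakage); reduction_factor is how many times smaller that leakage gets.
    """
    gate_leak = total_watts * gate_leak_fraction
    return gate_leak * (1 - 1 / reduction_factor)

# Hypothetical: 150 W chip, 10% of it gate leakage, 10,000x reduction
print(round(leakage_savings_watts(150, 0.10, 10_000), 4))
```

Even a 100,000x reduction can never save more than the assumed 15 W in this example; the other 90% (switching power, other leakage paths) is untouched.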
Seriously, we have some really good programs. Hank Dietz, Bill Dieter, and Tim Mattox have some exceptional results in parallel computing. Until recently, their $40,000 home-made cluster beat UK's million-dollar HP Superdome cluster in Linpack ratings. The $40k even factors in the cost of student labor (in the form of pizza) to wire the cluster.
I just wish I could say the same for our CS department... It's been getting steadily better since the College of Engineering adopted it, but they switched to M$ Visual Studio .NET this year, and that really worries me... program internals shouldn't be hidden from the student at lower levels of computer science.
I hope I didn't /. aggregate.org too badly...
Moore's law still not dead.
zach.
"Destroy science and religion. Science would re-emerge exactly the same; but not religion." - Penn Jillette, paraphrased
The whole world will change when computers run faster.
As probably one of the few semiconductor geeks on /., I have to say: Where's the news? Gate dielectrics are always made with rapid thermal processing on current technologies. Basically, stick a wafer in a chamber, flow some gas, turn on some super-high-intensity lamps, heat the wafer to >1000C for a very brief time, grow a few atomic layers of silicon dioxide (or some variant that includes nitrogen), turn off lamps, cool wafer, take it out of chamber.
From what little info is in the press release, it doesn't sound like they're doing anything revolutionary, so I'm curious why they claim they can reduce gate leakage by so much.
Gate oxides in current microprocessors are around 1.2-2 nm thick and are grown using RTP (rapid thermal processing). Furnace oxidation is too fast. So yes, industry already uses rapid thermal annealing (as suggested in TFA) for its gate oxides. Can anyone tell me what is new here?
What this really means is that the next generation has just become possible. As an incidental side benefit, current-generation laptops will be able to run cooler.
Heh...for just a second as my eyes hit the headline, I thought that the researchers had discovered some "direct tunneling" from Kentucky to the United Kingdom.
Human being (n.): A genetically human, genetically distinct, functioning organism.
leakage current happens when the transistor is in the "off" state, in other words no channel is formed between the source and drain (for your run-of-the-mill MOS). at this time the only way for electrons to go through is by tunnelling, and as you noted, the wavefunction extends to the other side more easily with each shrinkage of the gate width.*
that said, i am much more interested in what exactly is the "thermal process" they are talking about.
*wavefunction's extension can be affected by several things, AFAIK: e.g. voltage across the gate, dielectric constant.
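On the dielectric-constant point: gate capacitance per area is eps0*k/t, so a higher-k insulator can be physically thicker (a wider tunneling barrier) while giving the same capacitance as thin SiO2. The usual bookkeeping is "equivalent oxide thickness". A small sketch; the k = 25 value is an assumption, picked from the range commonly quoted for hafnium oxide.

```python
K_SIO2 = 3.9  # relative permittivity of silicon dioxide

def equivalent_oxide_thickness(physical_nm, k):
    """SiO2 thickness giving the same gate capacitance per area as the given film.

    Capacitance per area is eps0 * k / t, so matching it requires
    t_SiO2 = t_phys * K_SIO2 / k.
    """
    return physical_nm * K_SIO2 / k

# Assumed k = 25: a 3 nm film behaves electrically like roughly 0.47 nm of SiO2
print(equivalent_oxide_thickness(3.0, 25))
```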
What you have to remember about heat is that electronics only get hot because they are never perfect conductors nor perfect insulators {though we can make nearer-perfect insulators than we can conductors}. A perfect conductor will never get hot, no matter how much current you put through it, because the voltage drop across it will be nil and power = voltage * current. Nor will a perfect insulator, because this time, the current through it will be nil.
..... hopefully a fuse.
CMOS is based around two transistors, a P-channel FET which goes conductive when the gate is driven low, and an N-channel FET which goes conductive when the gate is driven high. The P-FET is trying to pull the output high and the N-FET is trying to pull it low. Both the gates are joined together, and this is the input. This is a simple NOT gate.
For a NAND gate, where any input 0 will drive the output to a 1, we have several P-FETs in parallel trying to drive the output high, and the same number of N-FETs in series trying to drive the output low. Each P-FET gate joined to an N-FET gate is one input. When the inputs are all high, all the N-FETs turn on, allowing the output to go low; when any one is low, the chain of N-FETs is broken, one or more P-FETs turn on, and the output goes high. For a NOR gate, where any input 1 will drive the output to a 0, we put the Ns in parallel and the Ps in series. You can make AND gates from NAND+NOT, OR gates from NOR+NOT, and any other combination you like. In fact you really don't need both NAND and NOR, because you can make either one out of the other; but it turns out they're equally easy to make in CMOS {unlike many other technologies}.
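The pull-up/pull-down arrangement above can be modelled at the switch level: a parallel network conducts if any of its transistors is on, a series network only if all of them are. A minimal sketch (ideal switches, settled inputs, no timing), just to show that exactly one network conducts for every input combination:

```python
def cmos_gate(inputs, pull_up_conducts, pull_down_conducts):
    """Evaluate a static CMOS gate from its two transistor networks."""
    up = pull_up_conducts(inputs)
    down = pull_down_conducts(inputs)
    # In static CMOS exactly one network conducts once the inputs settle.
    assert up != down, "both or neither network conducting"
    return 1 if up else 0

def nand(*inputs):
    # P-FETs in parallel: any input at 0 turns one on and pulls the output high.
    # N-FETs in series: every input must be 1 to pull the output low.
    return cmos_gate(inputs,
                     lambda xs: any(x == 0 for x in xs),
                     lambda xs: all(x == 1 for x in xs))

def nor(*inputs):
    # The dual arrangement: P-FETs in series, N-FETs in parallel.
    return cmos_gate(inputs,
                     lambda xs: all(x == 0 for x in xs),
                     lambda xs: any(x == 1 for x in xs))

def not_(x):
    # One P-FET and one N-FET with joined gates: a one-input NAND.
    return nand(x)

def and_(a, b):
    # AND built as NAND followed by NOT.
    return not_(nand(a, b))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "NAND:", nand(a, b), "NOR:", nor(a, b))
```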
In an ideal world this would never dissipate any power, since the input cannot be high and low at the same time, so only one of the transistors will ever be on. In practice what happens is that the gates act like capacitors which take a finite time to charge and discharge. They do not switch instantaneously from conductive to non-conductive. So one stops conducting while the other is starting to conduct, and for a brief instant while the inputs are changing state both transistors are conducting a little. It's not a dead short circuit, of course; otherwise something would give way.
Now every time something changes state, you get a little pulse of heat. Which is why fast processors need cooling. Additionally, to make sure that the logic gate output has changed state before the next clock pulse, you need to make the gate capacitances charge up quickly -- which means using a higher voltage than you could get away with at lower speeds. But 2x more volts means 2x more amps means 4x more watts.
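The usual first-order formula for that switching heat is P = activity * C * V^2 * f, which gives the same "double the volts, four times the watts" result. A sketch; the capacitance, activity factor, voltage and frequency below are hypothetical round numbers, not figures for any real chip, and the brief short-circuit current described above is ignored.

```python
def dynamic_power(activity, capacitance_f, volts, freq_hz):
    """First-order switching power: activity * C * V^2 * f (short-circuit current ignored)."""
    return activity * capacitance_f * volts ** 2 * freq_hz

# Hypothetical chip-level numbers, chosen only to show the V^2 scaling
base = dynamic_power(0.1, 100e-9, 1.2, 3e9)
doubled_v = dynamic_power(0.1, 100e-9, 2.4, 3e9)
print(f"{base:.1f} W at 1.2 V, {doubled_v:.1f} W at 2.4 V, ratio {doubled_v / base:.1f}")
```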
Smaller transistors should have less gate capacitance, and so be capable of switching more quickly.
University of Kansas uses the acronym "UK"... This was slightly confusing during football season, but I figured our school must have been playing a team from the USA. In the title of a science article, however, I sure as hell don't assume Kansas origin.
"It was really simple," said the researchers at the University of Kentucky. "We simply asked the Flying Spaghetti Monster to fix the problem for us, and He did." </flamebait>
Don't waste your vote! Vote for whoever you want, unless you live in a swing state it won't matter anyways
As speeds increase, won't leakage also increase because the insulators are, in effect, capacitors? At RF speeds, power flows through capacitors.
I'm not a chip designer, just a ham radio bug, so I don't know if this problem has already been found to be a non-issue. Maybe one of you bright guys knows the answer?
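For an ideal capacitor, the AC current does rise linearly with frequency, but it is 90 degrees out of phase with the voltage, so the average of v(t)*i(t) over a cycle is zero: the energy flows in and back out. Real dissipation comes from the resistances that current passes through and from non-ideal dielectric loss. A small numerical sketch with an ideal capacitor; the 1 fF / 1 V / 1 GHz values are arbitrary.

```python
import math

def capacitor_avg_power(c_farads, v_peak, freq_hz, samples=10_000):
    """Average of v(t) * i(t) over one period for an ideal capacitor.

    v(t) = V sin(wt), so i(t) = C dv/dt = C V w cos(wt).
    """
    w = 2 * math.pi * freq_hz
    period = 1 / freq_hz
    total = 0.0
    for n in range(samples):
        t = n * period / samples
        v = v_peak * math.sin(w * t)
        i = c_farads * v_peak * w * math.cos(w * t)
        total += v * i
    return total / samples

def capacitor_peak_current(c_farads, v_peak, freq_hz):
    """Peak current C * V * w: doubles whenever the frequency doubles."""
    return c_farads * v_peak * 2 * math.pi * freq_hz

print(capacitor_peak_current(1e-15, 1.0, 1e9), capacitor_avg_power(1e-15, 1.0, 1e9))
```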
Lemon curry?
ummm... twins!?!?
Anyone know? For once Wikipedia isn't much help.
"We demand rigidly defined areas of doubt and uncertainty!" - Vroomfondel, H2G2
A quick lesson in quantum physics:
Basically, tunnelling occurs because an electron can get from one side of a potential barrier to the other without ever being in the forbidden region (the width of the barrier, where the potential energy exceeds the total energy of the electron) due to it existing as a wavefunction that does not collapse until you observe it. Anyway, the chance of an electron penetrating a simple potential barrier like the gate of a transistor is a function of the height of the barrier (voltage applied to the gate), the width of the barrier (gate length), and the energy of the electron (voltage across transistor + electron thermal energy).
So ways to decrease tunnelling include:
Just my $0.02 since if I knew for sure I'd be making 6 figures somewhere and not applying to grad schools...
Final solution: Beat the speed of light into the ground. Anything else is a baby step. Since a pico-second is , you either have to shrink everything, or make a pico-second . Class dismissed.
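For scale, here is how far light travels in vacuum during one clock period. Signals in on-chip wires are considerably slower than this, so the real reach per cycle is shorter still; the clock frequencies below are just example values.

```python
C_M_PER_S = 299_792_458  # speed of light in vacuum, m/s

def light_mm_per_cycle(clock_hz):
    """Distance light covers in vacuum during one clock period, in millimetres."""
    return C_M_PER_S / clock_hz * 1000

for ghz in (1, 3, 10):
    print(f"{ghz} GHz: {light_mm_per_cycle(ghz * 1e9):.1f} mm per cycle")
```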
Assuming they really have discovered a way to lower power consumption (forgive me for not understanding semi-conductor principles) would it not be applicable to other semiconductors? I immediately thought about cell phone/mp3 player battery life and other such things. Even so far as to think about laptops. I (roughly) understand the not-so-much-wasted-power train of thought, and heat reduction from a CPU core and all, but wouldn't this have just as much effect on battery-powered devices? Or am I just being an ass again?
The leakage path relevant to tunneling is through the gate oxide, from the gate to the channel below it. In this case, the width of the barrier is the gate oxide thickness, not the gate length. So the ways to decrease tunneling include having a thicker gate oxide, but of course it'll still be slower (less capacitive coupling of the gate to the charge in the channel). A representative paper reviewing gate tunneling and its effects on logic gate performance is this one (in pdf).
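To see why oxide thickness matters so much, the crude WKB estimate for a rectangular barrier is T ~ exp(-2*kappa*d), with kappa = sqrt(2*m*phi)/hbar, where d is the oxide thickness and phi the barrier height set by the materials. A sketch under loudly stated assumptions: a ~3.1 eV barrier (a commonly quoted Si/SiO2 conduction-band offset), a tunneling mass of 0.5 electron masses, and no applied bias, so the barrier stays rectangular. It only shows the exponential trend, not real leakage currents.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J*s
M0 = 9.1093837015e-31    # electron rest mass, kg
EV = 1.602176634e-19     # joules per electronvolt

def wkb_transmission(thickness_m, barrier_ev, mass_ratio):
    """Rough tunneling probability exp(-2 * kappa * d) for a rectangular barrier."""
    kappa = math.sqrt(2 * mass_ratio * M0 * barrier_ev * EV) / HBAR
    return math.exp(-2 * kappa * thickness_m)

# Assumed: 3.1 eV barrier, tunneling mass 0.5 * m0
for nm in (1.2, 1.5, 2.0):
    t = wkb_transmission(nm * 1e-9, 3.1, 0.5)
    print(f"{nm} nm oxide: T ~ {t:.2e}")
```

Under these assumptions, every few tenths of a nanometre of oxide changes the tunneling probability by more than an order of magnitude, which is why thin gate oxides leak so badly.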
Also, the height of the barrier is determined by the material properties, not the gate voltage. With that said, I still don't understand how the authors can do what the press release says they say they do. How does RTA affect the material properties enough to affect tunneling significantly? MOS gate oxides are one of the most studied materials known to man, with uncounted man-millennia devoted to eliminating any defects therein. What did they miss?
A final thought--if this was such a fundamental breakthrough one would think it would be presented at the International Electron Devices Meeting itself, rather than at the small conference associated with it held later in the week. But maybe not.
...would be a great start. Most modern COTS CPU fans crap out in 2-3 years, tops, but I have a dual PentiumPro 200 box under my desk which is showing no sign of wearing out a fan after ~10 years of continuous service.
Let's not even discuss my underengineered AOpen laptop with its hopelessly inefficient (and now defunct) fan, other than to say that bypassing 75% of its heat generation would be -ing marvellous; the hard disks might also last more than a year, and the battery might be worth something again (maybe with some help from white LEDs to replace the LCD screen's fluorescent backlights).
Fixing much less glamorous issues might actually have more of an impact on power consumption than pushing the boundaries of physics, but I'm all in favour of doing that, too.
Got time? Spend some of it coding or testing
Although I'm sure these guys are top-notch, research isn't what comes to mind when I think of Kentucky.
This just tells us that future technologies are not going to have twice the leakage power as current technologies. This doesn't mean that future process technologies are going to have less leakage power than the current ones.
Sometimes I doubt your commitment to SparkleMotion!
so KFC now means Kentucky Fried Chip?
*ducks*
Out of curiosity, and because I have been trying to figure this out for some time now... why don't they just make the processors bigger? Looking at an old PIII I have lying around, it's 2 in^2. I'm sure P4s aren't that much smaller. What if they bumped it up to 2.5 in^2 or 3 in^2? Then they wouldn't have to worry about making the transistors smaller, because there would be all that extra space. If there is any flaw in my theory, please let me know, because in my mind it seems like a very good solution...
Yes, power dissipated is V*V/R or VI. And yeah, smaller transistors have lower resistance. But smaller gates mean less power, not more. You need less current to move the charge in and out of a smaller transistor (since the charge is smaller). So the "I" in the "VI" can go down. Well, that "I" is really a "V/R" (current through a resistance), so lowering that I really means you can reduce the "V". And since the total power is V*V/R, that means the total power used drops drastically.
Let me explain it a little better because I think I even confused myself.
Power is V*I. The I is V/R. Lowering this R means the V/R value does get bigger (current goes up). But also, since the I only needs to be sufficient to fill or drain a gate in a given amount of time (one cycle), you can reduce the V until V/R is a more reasonable value. And when you lower that V you're also reducing the other V in the power formula (V*V/R), so in fact instead of power going up, it goes down greatly.
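The V*V/R argument can be checked with toy numbers. This treats the transistor as a plain resistor, which is the simplification used above; the 1 V / 1000 ohm starting point and the halved resistance are hypothetical values.

```python
def current(v, r):
    """Ohm's law: I = V / R."""
    return v / r

def power(v, r):
    """P = V * I, which with I = V / R is V * V / R."""
    return v * current(v, r)

# Hypothetical: a smaller device halves R; halving V keeps the same current
old_p = power(1.0, 1000.0)
new_p = power(0.5, 500.0)
print(current(1.0, 1000.0), current(0.5, 500.0))  # same current in both cases
print(new_p / old_p)                               # power is cut in half
```

Lower R alone would raise the current, but because the required current is fixed by how much charge has to move per cycle, the voltage can drop, and the V in V*V/R pulls the power down with it.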
For a much easier example, compare AMD's 130nm CPUs against their directly equivalent 90nm versions. The 90nm versions take half as much power.
Today's nuclear CPUs run so hot mainly because there are so many transistors switching so fast in such a small space. If you built an old-type CPU (like a Z80 or something) using 90nm technology, it would take far, far less power than the old ones, which ran off of +5V (plug that into V*V/R!). Additionally, current CPUs have a lot of leakage current, something that CMOS didn't have a problem with until we got to sub-180nm processes. Compare a current CPU to an old NMOS or even ECL processor, and you'll see how leakage was a problem before and how much of a savior CMOS was.
Additionally, the megahertz race is not over. It may not be where vendors are concentrating right now, but as chips go to smaller and smaller feature sizes, they naturally get faster. So even with little focus on speed, we'll still see a rise in individual core speed.
A 1000-thread (simultaneous) chip is a ridiculous idea. That means you have to duplicate every transistor in the chip (like registers) 1000 times. That makes no sense. You will never reach the same speed as current single processor chips with a 1000-thread CPU (at least not right now). A small number of cores is a better idea at the moment.
http://lkml.org/lkml/2005/8/20/95
I wonder if the article uses UK, which is easily misinterpreted as "United Kingdom", to give the missive more credence. I also note that the actual researcher is of oriental heritage.
As an aside, I'm reminded of the joke which is probably rewritten for numerous groups.
A person on the Indiana side of the Ohio river tells a person on the Kentucky side that he will shine a flashlight across the river and he can walk across the beam. The person on the other side says, "No way! You'll shut the flashlight off when I get halfway across!"
The article was purposefully mum on the technique these guys are using, so I'll try to elaborate:
We all know that the fab process for turning silicon into chips is *way* too complex to be explained by ordinary science, so the UK researchers instead brought in 7 Christian ministers to sanctify the process. Prior to etching, the wafer (pun intended) is doped with a mixture of holy water & oil. As the etching process takes place, the ministers intone the "Reverent Petition for Holy Quantumness" in quiet solitude, while reflecting on the needs for fast silicon to spread the Good News of our Lord and Savior Jebus Christ, Son of Man.
---
And if I offended you, oh I'm sorry but maybe you needed to be offended. - Muir
in the UK they call them chips!
Autonomous Retard -- Is your camp safe? UnsafeCamp.com
This news has made me very depressed.
how can I now be a cook at the same time as programming?
My Pentium 4 used to generate enough heat to fry eggs, and I was sure that in a few years I would be looking for recipes suited to the heat of a nuclear reactor (charcoal egg comes to mind), but now my dream is gone. Damn you, University of Kentucky researchers. I hope we never meet.
Even a 1GHz non-multithreaded chip can be much faster than a 4.0GHz Prescott today. It doesn't take threads to best it. All you have to do is jettison all the transistors that you absolutely can do without (16-bit mode, out of order execution) and then replace them with transistors you can use.
If you want to run code across a family of processors, you'll have some wastage of transistors. This isn't avoidable. But it is also critical. You can't just make one CPU and throw it away, you need a family to compete.
As to not replicating the parts 1,000 times, you're correct, you wouldn't have to replicate them all 1,000 times. But if you don't, you now have to arbitrate for those cells, and they are physically farther away (distance is latency) because they aren't co-located anymore. I would suggest that non-replicating thread support is a steeply falling slope. 2 threads is good, 4 not as much, 1000 isn't useful. All that you need to do is keep your transistors all working as much as possible. Having a 2nd thread to run when the first is stuck on memory adds a lot of performance, if you have some compute-bound threads to run. But do you have thousands eligible to run at any time? No.
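The "2 threads good, 1000 not useful" intuition shows up even in a very simple latency-hiding model: each thread computes for a while, then stalls on memory, and another ready thread fills the gap. Once there are enough threads to cover the stall, extra threads add nothing. This is a hypothetical model that ignores contention for shared resources (which, as argued above, only makes large thread counts worse); the cycle counts are made up.

```python
def utilization(threads, compute_cycles, stall_cycles):
    """Fraction of cycles doing useful work under a simple latency-hiding model.

    Each thread computes for compute_cycles, then waits stall_cycles on memory.
    While one thread waits, another ready thread runs. Contention is ignored.
    """
    busy = threads * compute_cycles
    cycle = compute_cycles + stall_cycles
    return min(1.0, busy / cycle)

# Hypothetical workload: 20 cycles of work, then a 20-cycle memory stall
for n in (1, 2, 4, 1000):
    print(n, utilization(n, compute_cycles=20, stall_cycles=20))
```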
Multi-chip CPUs don't really work anymore (your idea of high-speed interconnects to other chips). CPUs are too fast. It takes too long to drive that signal out to the other chip, it's not worth the trouble.
Anyway, I would suggest that to make 1,000 threads run well, most modules would have to be replicated at least 250 times. And if you're gonna do that, don't bother redesigning, just make a 2 or 4 thread chip and buy more of them (or replicate the area of the die).
BTW, you're right, Intel doesn't have SOI. But honestly it appears it is so because they don't need it. And don't use P4 as an example, everyone knows it sucks including Intel. Look at the new (non-SOI) 2-core Yonah that is roughly AMD A64 X2 3800+ speed and uses half the power.
Honestly, Intel are incredible at process technology. They have a lot of secrets and are way ahead of most of the industry. When the stories about using "strained silicon" came into the news, Intel had already SHIPPED chips using it. It turned out I had one in my PC!
If Intel switches tracks and gets as good at chip design as AMD is (and it appears they are at least trying now), we're going to see some spectacular stuff.
Finally, note that not all tasks can be parallelized effectively. CPU speed will still have its place because of this. Don't write off higher clock speeds. When I started with computers, chips ran at 500KHz or 1MHz. Now even decent chips (say, AMD or Intel P-M) run at 2400x that. There's no reason it won't go up 10x again over the next couple of years. And given that even a slow PC right now is very fast for nearly all normal tasks (Excel, anyone?), what percentage of the market is going to need a gaggle of cores then?
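The "not everything parallelizes" point is usually quantified with Amdahl's law: if only a fraction p of the work can be spread across n cores, the overall speedup is 1 / ((1 - p) + p / n). A small sketch; the 90% parallel fraction is a made-up example, not a measurement of any real workload.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: overall speedup when only part of the work parallelizes."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# Hypothetical workload that is 90% parallelizable
for cores in (2, 4, 1000):
    print(cores, round(amdahl_speedup(0.9, cores), 2))
```

With a 10% serial part, even 1000 cores stay under a 10x speedup, which is where faster single-thread execution still pays off.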
BTW, I find this discussion interesting. Too often things on slashdot turn into name calling and hate-fests. I may not agree with you on this topic, but it's pretty clear that we both at least recognize there are two legitimate sides to discuss.
I'm attending ISDRS in Bethesda this week and presenting a paper tomorrow. It was interesting, but I'm not entirely sure how well it would work. I'd have to see the actual data (which usually isn't shown in such presentations) and try it. It had to do with passivating the oxide-silicon interface with deuterium instead of hydrogen to reduce the number of hot carriers injected through the oxide. My review: interesting, but I want to repeat it and see it in action.
I'm not trying to write-off chip speeds, but as a programmer I'm aware we can't just count on the fact that this bloated, single-threaded program/game which slugs around now will work fine 1 year from now when the processors catch up.
Second, multi-chip CPUs do work: the GPU is essentially a graphics slave to the primary CPU, specialized for its tasks, with access to primary memory, and using specialized (though improvable) mechanisms to offload work from the CPU. TCP offload adapters are similar if not the same. How are multi-chip processors dying? Even the new physics add-ons are attached via a high-speed interface. The new line of high-speed buses and interconnects is easily capable of handling the data and latency. How else do Opterons share each other's memory and I/O, if not via a simple HT link?
Ok, Yonah blows my intel idea, but my point was that Yonah wasn't a blind "design it first, then just try to keep shrinking it" like prescott.
Or, make a version of an ALU that can SIMD 16-32 similar operations at once, and schedule accordingly.
Finally, no, there is almost no way we could keep a 1000-thread chip full today. That is not a flaw of the chip, or concept, but of our current programming methodology. Are you honestly telling me there is no way to change the language, or design guidelines, to unroll loops in parallel? Assuming only 200 are going to be running at any time due to memory/IO/waits, you have less reason to be so focused on branch prediction or cache misses. So a thread misses a couple cycles waiting on memory; the penalty is minuscule. Have 2 branches to choose from? Choose both, and discard the unused one. For loops can be coded as dependent or independent, or, in the case of DSP-like functionality, unrolled with the pointers tracked automatically, parallelizing the operation completely. All this could be done, if not today, then soon, without waiting for a 200x increase in clock speeds, which may or may not happen, and with few or none of the drawbacks. All the eye candy in modern OSes? Free of charge, because that is just processor power that would be wasted otherwise. Who needs a GPU? Your CPU can do that as well or better with an internal geometry unit, with no need for separate memory.
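The dependent/independent loop distinction can be sketched like this. In the first loop each result depends only on its own input, so iterations can be handed to any worker in any order; in the second, step i needs step i-1's result, so as written it runs one step at a time (parallel prefix-sum algorithms exist, but only by restructuring the loop). The function names are illustrative, and note that in CPython the GIL means threads won't actually speed up pure-Python arithmetic; the sketch is about the dependency structure, not performance.

```python
from concurrent.futures import ThreadPoolExecutor

def scale_all(values, factor, workers=4):
    """Independent iterations: each output depends only on its own input,
    so the work can be spread across workers (map preserves input order)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda v: v * factor, values))

def running_sum(values):
    """Loop-carried dependency: step i needs the result of step i - 1,
    so this loop, as written, executes strictly in sequence."""
    out, acc = [], 0
    for v in values:
        acc += v
        out.append(acc)
    return out

print(scale_all([1, 2, 3, 4], 10))
print(running_sum([1, 2, 3, 4]))
```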
The point is once the individual operation cost drops because of massive parallelization, the entire way of looking at programming and computers in general can be changed so the current high-revving, but stop for roadbumps viewpoint can be replaced with a steady convoy of trucks metaphor. Every instruction your computer executes is not dependent on every instruction executed before it, that is a restriction placed on computers by us, for debugging purposes and because multi-threaded chips were more expensive than high speed chips until recently, a product of CMOS fabrication technology. We still use processor interrupts for similar reasons, even though they are a huge performance hit unless done properly, and have been removed as far as possible from modern software and hardware designs.
Single threading is just a legacy frame of mind, because debugging dozens of threads without the right tools is hard. Once we develop the right tools, why in god's name would we want to run 1 thread at a time? Just imagine massive multithreading as the next generation of the superscalar architecture used in current chips. Take it to the next level, and blur the lines between current threads and a superscalar scheduler instead of handicapping yourself by forcing dependency scheduling on chip by dedicated circuits that have very little idea of the actual dependencies and execution flow, and less ability to flexibly reschedule for efficiency.
The first rule of USENET is you do not talk about USENET.
I probably wasn't clear enough about what I meant by multi-chip CPUs. And I probably interpreted your original description too rigidly to mean the thing that isn't viable anymore.
Let me explain.
It used to be that you might have multiple chips involved directly in the execution of the instruction stream. For example, the AMD 2900 series was bit-sliced. When an instruction was fetched and executed, multiple chips worked on it in parallel. I don't know all the restrictions, but the Am2901, for example, handled 4 bits per chip: four chips in parallel gave you a 16-bit processor, eight of them, 32-bit.
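Bit-slicing can be sketched in a few lines: each slice adds its own narrow chunk of the operands, and its carry-out feeds the next slice's carry-in. This assumes 4-bit slices (the width of the Am2901; the width is a parameter here) and a simple ripple carry for clarity; it is an illustration of the idea, not a model of any particular chip's datapath.

```python
def slice_add(a, b, carry_in, width=4):
    """One ALU slice: add two width-bit values plus a carry; return (sum, carry_out)."""
    total = a + b + carry_in
    mask = (1 << width) - 1
    return total & mask, total >> width

def sliced_add(a, b, slices, width=4):
    """Chain slices so each carry-out ripples into the next slice's carry-in."""
    mask = (1 << width) - 1
    result, carry = 0, 0
    for i in range(slices):
        shift = i * width
        s, carry = slice_add((a >> shift) & mask, (b >> shift) & mask, carry, width)
        result |= s << shift
    return result, carry

# Four 4-bit slices make a 16-bit adder
print(hex(sliced_add(0x1234, 0x0FFF, slices=4)[0]))
```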
A more common organization had some instructions go to different chips, very commonly FPUs. In the Intel 8086/8087, the 8087 watched the CPU's fetches, and when it saw a floating point instruction, it executed it itself. With the 80286/80287 and later, the main CPU would fetch the instruction and send it to the FPU for execution, then get the results back and put them where they were supposed to go. The Motorola 68020 and 68030 supported external FPUs also (68881/68882). The MIPS R2000 and R3000 also had external FPUs.
The original Motorola 88000 was actually two chips, the 88100 and the 88200. One chip did the operations, the other was the bus-interface (load/store and cache) chip.
This all ended at the end of the 80s, when new chips like the Motorola 68040 (except LC), the Intel 486 (except SX), Motorola 88110, and MIPS R4000 series came out. These chips were too fast to effectively use external FPUs, moving the data out to that other chip and back was too slow. Even though the on-chip 68040 FPU wasn't as capable as the external 68882 (or even 68881 in many ways), the lower latency made the performance much much higher.
So, what I meant was that off-chip processing of instructions in the stream just became unworkable at that point, and it still is.
But you're right, a processing system that includes multiple chips executing different instruction streams (the GPU) is quite viable and effective today.
HT is very effective on AMD's processors, but you have to remember that HT only comes into play for load/store instructions. So it has to be fast, but only fast compared to the bus speeds in order to be effective. "Calling out" to other chips for the execution of single-cycle math operations wouldn't work nearly as well.
For example, my understanding is that the HT links on current AMD processors run "only" at 1GHz. Since chips like mine run instructions at 2.2GHz or higher, using HT would mean no instruction could be executed in less than 2 clocks. This would be a big hit for simple register-register transfers or adds or such.
A lot of the other stuff you describe sounds remarkably like Intel's EPIC (Itanium). I am not convinced at this time of the value of EPIC in a processor family. I agree it sounds great on paper, but the results so far have been poor, and I know that Intel hasn't been completely successful in making compilers as effective as CPUs at scheduling instructions to run. And you can see why, right? Can you really statically schedule a loop perfectly which contains loads when some of those loads will come from cache (very fast) and some from SDRAM (very slow)? A CPU selecting instructions at runtime will be able to make the decisions on the fly; the compiler doesn't know what to do.
I do know that EPIC isn't incompatible with your idea of lots of threads though, so I know that rejection of one doesn't put a black mark against the other.
UK researchers - damn! Just when I thought us Brits might be making scientific progress again!
Still, great tagline for the UoK though.