Intel Hyperthreading In Reality
A reader writes: "Looks like GamePC has got the first look at Intel's new Xeon processor, which has the new super-fantastico Hyperthreading technology, which tricks your OS into thinking one CPU is two CPUs, two CPUs is four. Looks neat in theory, benchies included."
I am not as concerned with how it "tricks" the OS as much as I am about performance and reliability. Tell me how this actually makes the chip BETTER and I might get excited.
Tequila: It's not just for breakfast anymore!
So I can just buy half a processor, and get full functionality? ;)
Hyperthreading is a pretty cool idea, especially for those of us who would like to see SMP move more into the mainstream.
According to this article, though (posted on 2cpu.com), the Windows 2000 scheduler doesn't know how to take advantage of hyperthreading, since it doesn't know how to take advantage of virtual processors. (I suppose Windows XP does?) Go figure. Anyway, this looks like it's probably worth checking into. I'm sure Linux will support it!
---Have you crashed Windows XP with a simple printf recently? Try it!
Wow, kinda sucks if your OS has a per CPU license, like NT and Win2K server!
Make sure you use the Printer Friendly view, that way you don't get 12 pages of slashdotted hell! Look here.
-- Dan
In hyperthreading, the logical processors do not share registers, just function units. Thus, if one logical processor needs to multiply while the other needs to add, they may share the CPU resources simultaneously.
This was developed in response to the observation that individual function units remained idle for multiple cycles while the current process was busy doing one kind of operation.
-B
And it may hurt. A downside of "hyperthreading" is that the threads contend for cache space, so if the threads are executing very different code, the cache miss rate will rise. Of course, this happens in ordinary threading on each context switch, but with "hyperthreading", there's a context switch of sorts on every instruction cycle. If this effect shows up, it will show in L1 cache miss rates.
This isn't a totally new idea, either. The first step in this direction was the peripheral processor for the CDC 6600, in the 1960s, which appeared as ten peripheral processors to the programmer. Internally, it was ten sets of registers and one ALU, doing one instruction for each machine state in turn. Basic/4, a forgotten minicomputer manufacturer, tried a similar idea in the 1970s.
On the other hand, this apparently isn't that tough a feature to add to an already-superscalar CPU, so why not?
but then theres this: "While this looks great for showing off to co-workers or friends, you will absolutely NOT get the performance of four CPUs running in your system (I can't stress this enough). As you'll see in our benchmarks later, even if software is written to take advantage of SMP, you rarely ever see performance gains with Hyperthreading enabled."
Ave Molech Setting
Simultaneous Multithreading (SMT) is not a new idea, although no one to my knowledge has implemented it yet. Intel just calls it "Hyperthreading"...it is essentially SMT.
And yes, this is a very good idea. A modern superscaler out-of-order processor, like the Athlon and Pentium Pro (and later), can issue and retire multiple instructions per clock cycle. However, it can *only* do this if there is enough instruction-level parallelism (ILP). Turns out, there is not enough ILP in current programs to take full advantage of the chips processing capabilities. Issue slots and function units go unused due to dependencies in the program and cache misses that stall the processing. A typical processor can only look at about 32 instructions at a time. This is not a large enough window to execute future instructions out-of-order when such a stall occurs.
However, 2 threads of execution will likely fill all of the issue slots. They are also independent threads of execution, so dependencies don't exist between them. This means that when the pipeline stalls due to a cache miss, the other thread can keep on retiring instructions.
To all those saying that this is dumb, I suggest you study some modern architecture (I'm not talking about your undergrad architecture course either). A paper I read recently studied the affects of SMT on a simulated Alpha processor. The results were astounding with very little changes to the processor core. I heard that the next Alpha was slated to include SMT before Intel killed it.
I believe that this was done in the IBM AS/400 using a special version of the PowerPC chip. There was a talk on this at the Ottawa Linux Symposium last summer. According to the IBM people, it mostly worked great, but there were a few issues with spin locks--the CPU saw that one thread was busy (in a spin lock), so it never switched to the other one (that was holding the lock). The Intel implementation may be slightly different, but this is something to look at.
When your hardware isn't exactly what the software was written for, you tend to have weird bugs like that. I would not be surprised if Windows, Linux, FreeBSD, and other OSes need minor patches to work well with this new hyperthreading from Intel.
Despite VmWare, The Intel architecture isn't really virtualized. I am willing to believe that IBM can actually put multiple CPU cores on one die, for many SMP benefits without the downside of requiring multiple CPU dies and infrastructure.
On the other hand, for Intel this seems to be unrelated, and a cheap hack, where the CPU presents an SMP interface to the OS only to to request multithreaded code. Requesting multithreaded code should be possible without sending misinformation to the basic Operating System services. This seems to be a symptom of the disconnect inherent in having different companies producing the Architecture and the Operating System. It seems unlikely that you would see such a crude hack for fundamental systems infrastructure coming from an integrated vendor such as SGI, Sun, or even Apple. The Mips and PPC/Power don't lie about such things, because the support chipsets are actually supported by the Operating Systems.
Really, this seems reminicent of C/H/S mangling in storage interfaces and long file name mangling in the vFat filesystem. While is has taken up many generations (in computer years) to abstract the state of the art from such persistent kludges, they still haunt the consumer computing space to this day. Now do we really need to go even further backwards fuck with how the CPU interfaces with the core system?
Of course, perhaps I am not giving the Wintel juggernaut the benefit of the doubt... after all, they will have to bear the consequenses of such a choice, so maybe they aren't just adding saddlebags to the most basic part of the computing system. Perhaps they have the foresight to resolve this in the next few years before making this burdensome interface an accepted standard. It would be shameful too add more standardized interface limitations... such as the standardized "Gates" hopping of the 640K limit on commodity computing harware, or the IDE limit of 540MB, no 2GB, no 8GB, no 36GB, no wait 140Gb, Ad Nauseam. All of these arbitrary limitations were the results of cutting corners and not sending fully honest information to basic system interfaces. Can't the OS send efficient multithreaded code for processing without living a lie?
-castlan