Intel Hyperthreading In Reality
A reader writes: "Looks like GamePC has got the first look at Intel's new Xeon processor, which has the new super-fantastico Hyperthreading technology, which tricks your OS into thinking one CPU is two CPUs, two CPUs is four. Looks neat in theory, benchies included."
Hyperthreading is a pretty cool idea, especially for those of us who would like to see SMP move more into the mainstream.
According to this article, though (posted on 2cpu.com), the Windows 2000 scheduler doesn't know how to take advantage of hyperthreading, since it doesn't know how to take advantage of virtual processors. (I suppose Windows XP does?) Go figure. Anyway, this looks like it's probably worth checking into. I'm sure Linux will support it!
---Have you crashed Windows XP with a simple printf recently? Try it!
Make sure you use the Printer Friendly view, that way you don't get 12 pages of slashdotted hell! Look here.
-- Dan
Basically what they're doing is simply taking unused processor resources and allocating them to another thread. You can now have multiple _threads_ of excecution simultaneously... truely simultaneously.
Thread X is using register's B and C
Thread Y can able to use registers A and D.
These threads can be executed together without a context switch... and the processor will hunt out these relationships in hardware. That's what "the big deal" is.
Until now, when a processor "multitasks", it's simply switching from one thread of execution to the next... it allocates separatetime to two different threads....Now it can allocate the exact same timeslice to multiple threads as long as there isn't a resource dependancy.
If your program can be architechted to take advantage of this (or your OS can schedule tasks like this), you'll get a huge benifit (read: if it works on SMP systems, it'll get some benifit on this as well).
WHY? I mean, come on... If you want two processors, shouldnt you have 2 processors in the systems???
Maybe because SMT makes the die 5% bigger, while 2 processors is upwards of 100% bigger? This is where a thing called "cost" comes in.
SMT essentially allows for the CPUs to be used more efficiently. A lot of the time an ALU will sit idle while the FPUs work, and with SMT both can work at the same time on different threads.
Modern CPU's have many different execution units. Depending on the code running, not all of them may have work scheduled. Future work may depend on previous results; obviously you can't do this in parallel. The idea of "HyperThreading" is to run more than one thread of execution at a time with the multiple execution units - so more work gets done per clock cycle.
A quick Google search turned up an article here. At one point I read a really excellent article on single-processor multithreading (discussing a future Alpha processor) but I can't find it anymore. Hopefully AMD will do something like this as well for a future Hammer processor.
> I am not as concerned with how it "tricks" the OS as much as I am about performance and reliability. Tell me how this actually makes the chip BETTER and I might get excited.
Read the freaking article.
[simplified summary]
The processor can handle thread scheduling better than the os can handle thread scheduling. Claiming to be 2 processors pushes half of the scheduling from the os to the processor. Net performance gain is expected to be around 10% when the number of active threads is at least twice the number of processors.
A number of people have posted asking what the point was of making a single processor act like two processors. It's actually explained in the article linked to above.
Apparantly, he big deal is that a single processor can only handle one thread at a time--multitasking works by breaking programs down into threads, and working on one thread for a little while, then another, then another, then back to the first. But at any given time, only one thread is being actively executed. Hyperthreading changes this--a single processor can work on two threads truly simultaneously. This makes multitasking a hell of a lot more efficient.
The original Howling Frog is a fictional character and has no UID.
the site is partly /.'d already but the printer friendly (non graphic) version seems to actually still load.
http://www.gamepc.com/reviews/printreview.asp?revi ew=ppso&mscssid=&tp=
Saying Java is nice because it works on all OS's is like saying that anal sex is nice because it works on all genders.
And it may hurt. A downside of "hyperthreading" is that the threads contend for cache space, so if the threads are executing very different code, the cache miss rate will rise. Of course, this happens in ordinary threading on each context switch, but with "hyperthreading", there's a context switch of sorts on every instruction cycle. If this effect shows up, it will show in L1 cache miss rates.
This isn't a totally new idea, either. The first step in this direction was the peripheral processor for the CDC 6600, in the 1960s, which appeared as ten peripheral processors to the programmer. Internally, it was ten sets of registers and one ALU, doing one instruction for each machine state in turn. Basic/4, a forgotten minicomputer manufacturer, tried a similar idea in the 1970s.
On the other hand, this apparently isn't that tough a feature to add to an already-superscalar CPU, so why not?
Some other complaints about this "invented at Intel" terminalogy can be found at The Register.
Also Toronto has a nice slide show (pdf) on the topic.
For the record I contributed a little tiny bit to this stuff when I was at Intel (I found what I think was the first multi-processor bug for SMT.)
As Nietsche famously said, "If you stare too long into the Abyss, 1d4 Tanar'ri of random type will attack you."
Simultaneous Multithreading (SMT) is not a new idea, although no one to my knowledge has implemented it yet. Intel just calls it "Hyperthreading"...it is essentially SMT.
And yes, this is a very good idea. A modern superscaler out-of-order processor, like the Athlon and Pentium Pro (and later), can issue and retire multiple instructions per clock cycle. However, it can *only* do this if there is enough instruction-level parallelism (ILP). Turns out, there is not enough ILP in current programs to take full advantage of the chips processing capabilities. Issue slots and function units go unused due to dependencies in the program and cache misses that stall the processing. A typical processor can only look at about 32 instructions at a time. This is not a large enough window to execute future instructions out-of-order when such a stall occurs.
However, 2 threads of execution will likely fill all of the issue slots. They are also independent threads of execution, so dependencies don't exist between them. This means that when the pipeline stalls due to a cache miss, the other thread can keep on retiring instructions.
To all those saying that this is dumb, I suggest you study some modern architecture (I'm not talking about your undergrad architecture course either). A paper I read recently studied the affects of SMT on a simulated Alpha processor. The results were astounding with very little changes to the processor core. I heard that the next Alpha was slated to include SMT before Intel killed it.
Posted 1/14 on anandtech:
http://www.anandtech.com/cpu/showdoc.html?i=1576
One of the reasons that I became a lawyer was to avoid ever having to hire one. -SPYvSPY
The best example of how to split a task into threads (that I like to use) is rendering a 3D image to screen. If you want to split that task so that two threads (and thus two processors) can work on it, you just make one thread handle 'even' scan lines and one thread handle 'odd' lines. Keeping the caches cohereent between the two CPU's can be difficult - they're both executing the same code, and might also be twiddling around some piece of memory that they share.
My point is, with this hyperthreading business, that there's only one cache - so no more cache coherency bothers. I might be concerned that the arithemetic units or whatever else that are on the chip might be in contention for use - but they can just add more of 'em in later steppings of the CPU.
The problem for us Unix-lovin' folk is that Unix-esque OS'es don't often take threading very seriously. OpenBSD, for example, doesn't even have a kernel-threading implementation (correct me if I am wrong!) The 'Unix Way' is to just fork a process and run two process images. That's fifty billion times easier to debug than two threads that step on eachothers' data (see deadlock). But the forking method - even with nifty things like copy-on-write process images and such - doesn't seem to use as little memory, or perform as quickly, or process-switch as fast.
When I speak to developers who know their stuff (more than I do) they say - on NT, make a whole bunch of threads and make them talk to eachother with semaphores and stuff - on Unix, fork and write to a pipe. Nothing fundamentally wrong with that division, but advances such as this Hyperthreading thing won't work as well on Linux, I don't think.
Registers in modern processors get renamed. Intel gets away with having such few logical registers in their ISA (instruction-set architecture) because they have dozens of physical registers.
All hyperthreading will do is just maintain a different program counter and re-order buffer for each thread. There are probably other minor details as well, but don't get caught up in registers from a programmer's point of view. There is magic under the hood that the programmer will never ever be aware of. At some point in your program, their may be 8 or so "EAX" registers. Later on, this same register may be renamed to a "ESP" register.
From the article mentioned above:
As for the Xeon's Hyperthreading technologies, it's hard not to be disappointed with the scores which we got throughout our testing. Hyperthreading sounds like an incredibly useful processor feature in theory, but in practice, It's useless without compatible software on the market. Time will only tell if developers want to take on the Hyperthreading challenge, and the few developers we've talked to have not been that incredibly impressed with the technology thus far. If nothing else, Hyperthreading will certainly be an interesting to watch out for over the next few years.