Hyper-Threading Explained And Benchmarked


Posted by timothy on Tuesday January 6, 2004 @08:48PM from the two-hearts-beat-as-one dept.

John Martin writes "2CPU.com has posted an updated article about Hyper-threading performance. They discuss the technology behind it, provide benchmarks, and make observations on what the future holds for hyper-threading. It's actually an easy, interesting read. Of note, they'll be publishing Part II in the near future which will detail hyper-threading performance under Linux 2.6. Hardware geeks will probably appreciate this."

16 of 245 comments


Interesting. by Anonymous Coward · 2004-01-06 20:58 · Score: 5, Informative

There was an interesting discussion on the Plan9 newsgroup about hyperthreading recently, read here
Intel's Whitepaper by Cebu · 2004-01-06 21:02 · Score: 5, Informative

For those more technically inclined I would suggest reading Intel's Hyper-Threading Technology Architecture and Microarchitecture whitepaper instead.
1. Re:Intel's Whitepaper by arkanes · 2004-01-07 00:45 · Score: 4, Informative
  
  Ars Technica has one also - less technical than the Intel paper but very accessible and with pretty colored diagrams.
For the real technical details by photonic · 2004-01-06 21:14 · Score: 5, Informative

The article claims to talk about the technical details of hyperthreading. At first glance, however, it seems more like yet another article in the series "Athlon beats Pentium at Doom by 1/2 frame per second".
If you are really interested in the how and why of hyperthreading, I suggest you read through the lecture notes of Computer System Architecture at MIT OpenCourseWare. This gives you enough background to race through all the articles at Ars Technica et al.

--
karma police: arrest this man, he talks in maths; he buzzes like a fridge, he's like a detuned radio. [radiohead]
Re:Ever buy a car with auto-everything? by BlueBiker · 2004-01-06 21:28 · Score: 5, Informative

Well, Intel is already encountering heat problems which limit how fast they can crank the clock speed. Hyperthreading is a moderately successful attempt to make use of the available execution units on the chip which would otherwise sit idle. It's also not so new and untested: it was implemented but not enabled on earlier P4 steppings.

Athlon and Athlon64 are generally better able to make use of their execution units, and wouldn't benefit from HT as much as P4/Xeon.
YHBT HAND! by TheMidget · 2004-01-06 21:38 · Score: 4, Informative
Indeed, you've bitten on the following hooks:
- FDIV error: yes, it was division, not addition. However, the conditions were far less specific than Intel would have liked us to believe...
- CISC vs RISC: you correctly pointed out that Pentiums still are CISC (even though they nowadays have a RISC core)
And you've missed the following hooks:
- CAFEBABE: that's Java's magic number. The code that used to lock up the original Pentiums was F00FC7C8
- Hyperthreading and the OS's job: no, hyperthreading does not do something the OS would normally do. It just pretends that there is a second processor. The OS is still responsible for assigning threads to both virtual processors, just as it would with two real processors!
Note to moderators: mod grand-parent down. It is obviously a troll (albeit a rather well written troll!). If you absolutely must mod it up, at least use Funny rather than Interesting.
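
A quick way to see the "pretends that there is a second processor" point from user space: the OS simply reports more processors. The sketch below is only an illustration using Python's standard library; the helper name logical_cpu_count is made up for this example, and the number it returns counts logical processors, so on a hyperthreaded single-core P4 one would expect it to report 2.

```python
import os

def logical_cpu_count():
    """Return the number of logical processors the OS exposes.

    os.cpu_count() counts logical CPUs, so each hyperthreaded core
    contributes two. It may return None if the count is unknown;
    falling back to 1 is an assumption made for this sketch.
    """
    return os.cpu_count() or 1

if __name__ == "__main__":
    print("Logical processors visible to the scheduler:", logical_cpu_count())
```

The scheduler sees each of those entries as a place to run a thread, which is why thread placement remains the OS's job.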
Re:Celery by turgid · 2004-01-06 22:09 · Score: 4, Informative

A Celeron is much cheaper than a P4 with the hyperthreading
So it is, and it's not all that fast either. Then again, you shouldn't believe all that you read on the Intarweb.

--
Stick Men
Everything I know about Hyperthreading... by obergeist666 · 2004-01-06 22:27 · Score: 5, Informative

... I learned from this article.
Hyper-threading explained in 300 words or less. by Anonymous Coward · 2004-01-07 01:09 · Score: 4, Informative

When a process stalls because it is trying to access memory that is not loaded into the cache, the CPU sits idle while the data is retrieved from the much slower main memory. If you can store two process contexts on the CPU instead of just one, then whenever one process stalls on a memory read, the hardware can quickly switch to the other context, which is waiting to run.

I can't remember the name of the machine, but one parallel shared-memory machine used this exclusively. The CPU had 128 process contexts and would switch through them in order. The time between subsequent activations of each context was great enough that data could be fetched from main memory and loaded into a register. This eliminated cache coherency problems (no cache!) and all delays related to memory fetching.

A P4 with hyperthreading is a simplified and much more practical version of that machine.
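
The switch-on-stall idea described above can be sketched as a toy cycle-by-cycle simulation. Everything here is an assumption for illustration: the function name simulate, the fixed compute and stall lengths, and the rule that only one context issues per cycle (a real P4 can issue from both threads in the same cycle). It is not a model of any real CPU, only a way to see why extra contexts hide memory latency.

```python
def simulate(contexts, compute, stall, total_cycles):
    """Return the fraction of cycles in which the execution unit did work.

    Each context computes for `compute` cycles, then misses the cache
    and waits `stall` cycles. One ready context runs per cycle.
    """
    work_left = [compute] * contexts   # cycles of work before the next miss
    ready_at = [0] * contexts          # cycle at which each context may run
    busy = 0
    current = 0
    for cycle in range(total_cycles):
        # Prefer the context that ran last; otherwise take the next ready one.
        for offset in range(contexts):
            c = (current + offset) % contexts
            if ready_at[c] <= cycle:
                busy += 1
                work_left[c] -= 1
                if work_left[c] == 0:
                    # Cache miss: this context is parked until memory answers.
                    work_left[c] = compute
                    ready_at[c] = cycle + 1 + stall
                current = c
                break
    return busy / total_cycles

if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, "context(s):", simulate(n, compute=10, stall=30, total_cycles=4000))
```

With 10 cycles of work per 30-cycle stall, one context keeps the unit busy a quarter of the time, two contexts half the time, and four contexts all the time, which is the same reasoning behind the 128-context machine mentioned above.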
Re:From the article: by Glonoinha · 2004-01-07 02:38 · Score: 4, Informative

How about two people in moderate shape being able to push more wood through a single wood chipper than a single person who is in great shape (assuming the wood is piled up 18 feet away = cache miss)?

The single wood chipper, being analogous to the actual processing part of the core, is only going to be able to shred so much wood. But if two people fetching wood from the woodpile can keep it running at 100% capacity, they will shred more wood than a single guy running back and forth to the woodpile by himself.

--
Glonoinha the MebiByte Slayer
HT Technology by sameerdesai · 2004-01-07 03:23 · Score: 3, Informative

I have some insight into this technology, as I was part of a research group researching SMT. It is a really cool technology that exposes instruction-level parallelism (ILP) and increases performance. The basic HT technology, however, divides the processor's resources between the threads. The details of Intel HT are available at http://www.intel.com/technology/hyperthread/ and you can also find the associated whitepapers there. Now the catch is that the application should be multithreaded. You can't just buy an HT processor, run a single-threaded application, and expect performance to improve. The performance benefits appear when an optimal number of threads is used. With too few, resources are wasted. With too many, threads queue up and cause bottlenecks. The other thing that can affect performance is an unbalanced workload, which can leave some threads unable to exploit the parallelism. This is a new technology, a lot of research is going on in this area, and it looks really promising.
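
To make the thread-count and balance points concrete, here is a minimal sketch that sizes a worker pool to the number of logical processors and splits the work into nearly equal chunks. The names balanced_chunks and parallel_sum are hypothetical, and one worker per logical CPU is an assumption rather than a tuned figure. Note also that CPython threads do not run pure-Python arithmetic in parallel, so this shows the sizing and partitioning logic, not a speedup.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def balanced_chunks(items, workers):
    """Split items into `workers` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), workers)
    chunks, start = [], 0
    for i in range(workers):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks

def parallel_sum(items, workers=None):
    """Sum items using one worker per chunk."""
    # Assumption: one worker per logical CPU, never more workers than items.
    if workers is None:
        workers = os.cpu_count() or 1
    workers = max(1, min(workers, len(items)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum, balanced_chunks(items, workers)))

if __name__ == "__main__":
    print(parallel_sum(list(range(1000))))
```

Capping the worker count avoids the queueing problem, and the even split avoids one thread finishing long after the others.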
Re:SMT by jtshaw · 2004-01-07 04:12 · Score: 3, Informative

That is totally true. Processor-specific microcode optimizations are definitely the compiler's job. But you have to concede the fact that the compiler can only do so much. If the programmer doesn't choose a good method of solving the problem at hand, there isn't much a good compiler can do to optimize the code, especially if the problem being solved is complex.

Compilers simply can't be asked to pick up the slack for programs written with a poor logical flow. They can't be asked to figure out a completely different and improved algorithm for solving a complex problem they don't completely understand the parameters for.
AnandTech on Hyperthreading by glinden · 2004-01-07 04:46 · Score: 3, Informative

AnandTech did an excellent article on hyper threading a while back. Well written and worth reading.
IBM Will Do SMT Right by fupeg · 2004-01-07 05:06 · Score: 3, Informative

IBM will have SMT in the Power5. Their approach looks even better than Intel's, but part of that is the Power architecture and part of that is IBM learning from what Intel did. SMT is really the best way to get past the limiting reagent of modern processors: bandwidth.
"hyper-threading" vs. cache size by Animats · 2004-01-07 06:10 · Score: 4, Informative

The basic problem with hyperthreading is, of course, memory bandwidth. CPUs today are memory-bandwidth starved. 30 years ago, CPUs got about one memory cycle per instruction cycle. Since then, CPUs have sped up by a factor of about 1000, but memory has only sped up by a factor of 30 or so. The difference has been papered over, very successfully, with cache. The cache designers have accomplished more than seems possible. Compare paging to disk, which is a form of caching that hasn't improved much in decades.
If you want to benchmark a hyper-threaded machine, a useful exercise is to run two different benchmarks simultaneously. Running the same one is the best case for cache performance; one copy of the benchmark in cache is serving both execution engines. Running different ones lets you see if cache thrashing is occurring. Or try something like compressing two different video files simultaneously.
If you're seeing significant performance gains with real-world applications using a "hyper-threaded" CPU, that's a sign that the operating system's dispatcher is broken. And, of course, hyper-threading dumps more work on the scheduler. There's more stuff to worry about in CPU dispatching now.
Intel seems to be desperate for a new technology that will make people buy new CPUs. The Itanium bombed. The Pentium 4 clock speed hack (faster clock, less performance per clock) has gone as far as it can go. The Pentium 5 seems to be on hold. Intel still doesn't have a good response to AMD's 64-bit CPUs.
Remember what happened with the Itanium, Intel's last architectural innovation. Intel's plan was to convert the industry over to a technology that couldn't be cloned. This would allow Intel to push CPU price margins back up to their pre-AMD levels. For a few years, Intel had been able to push the price of CPU chips to nearly $1000, and achieved huge margins and profits. Then came the clones.
Intel has many patents on the innovative technologies of the Itanium. Itanium architecture is different, all right, but not, it's clear by now, better. It's certainly far worse in price/performance. Hyperthreading isn't quite that bad an idea, but it's up there.
From a consumer perspective, it's like four-valve per cylinder auto engines. The performance increase is marginal and it adds some headaches, but it's cool.
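
The benchmarking suggestion above (run two different workloads at once and compare against running them one after the other) could be scripted roughly as follows. This is a sketch under stated assumptions: the two commands INT_LOOP and MEM_WALK are hypothetical stand-ins that just launch small Python one-liners, and a real test would substitute actual benchmark binaries or two video encodes. Pinning the processes to the two logical CPUs of one core is OS-specific and is not shown.

```python
import subprocess
import sys
import time

def run_sequentially(commands):
    """Run each command to completion in turn; return total wall-clock seconds."""
    start = time.perf_counter()
    for cmd in commands:
        subprocess.run(cmd, check=True)
    return time.perf_counter() - start

def run_together(commands):
    """Start all commands at once and wait for all; return wall-clock seconds."""
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for cmd in commands]
    for p in procs:
        if p.wait() != 0:
            raise subprocess.CalledProcessError(p.returncode, p.args)
    return time.perf_counter() - start

# Hypothetical stand-in workloads: one arithmetic-heavy, one memory-heavy.
INT_LOOP = [sys.executable, "-c", "sum(i * i for i in range(2_000_000))"]
MEM_WALK = [sys.executable, "-c", "a = list(range(2_000_000)); a.sort(reverse=True)"]

def speedup(commands):
    """Ratio of sequential time to concurrent time (above 1 means overlap helped)."""
    return run_sequentially(commands) / run_together(commands)

if __name__ == "__main__":
    print("different workloads:", speedup([INT_LOOP, MEM_WALK]))
    print("same workload twice:", speedup([INT_LOOP, INT_LOOP]))
```

Comparing the two ratios, with hyperthreading enabled and then disabled in the BIOS, is one way to see whether the mixed pair suffers from the cache thrashing described above.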
1. Re:"hyper-threading" vs. cache size by Brandybuck · 2004-01-07 06:47 · Score: 4, Informative
  
  If you're seeing significant performance gains with real-world applications using a "hyper-threaded" CPU, that's a sign that the operating system's dispatcher is broken. And, of course, hyper-threading dumps more work on the scheduler. There's more stuff to worry about in CPU dispatching now.
  
  That was my suspicion. Hyperthreading can't be much more efficient than threading via the OS, unless the software is specifically compiled for it, or you use a scheduler specific to hyperthreading. Scheduling work STILL has to be performed, and hyperthreading STILL isn't parallel processing. So where are these performance improvements people are seeing coming from?
  
  I'm not using Linux, but FreeBSD. When I got my new HT P4, I considered turning it on. Then I read the hardware notes. Since FreeBSD does not use a scheduler specific for hyperthreading, it can't take full advantage of it. In some cases it might even result in sub-optimal performance. Just like logic would lead you to think.
  
  The OS cannot treat hyperthreading the same as SMP, because they are two different beasts.
  
  --
  Don't blame me, I didn't vote for either of them!
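
For a scheduler (or a curious user) to treat hyperthreading differently from SMP, it first has to know which logical processors share a physical core. As a hedged illustration only: on a current Linux system that information is exposed in sysfs, and the sketch below reads it. The helper names parse_cpu_list and thread_siblings are made up for this example, and the sysfs path is Linux-specific, so it does not apply to the FreeBSD setup described above.

```python
from pathlib import Path

def parse_cpu_list(text):
    """Expand a Linux CPU list such as '0-1,8' into [0, 1, 8]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus

def thread_siblings(cpu=0):
    """Logical CPUs sharing a physical core with `cpu`, or None if unknown."""
    # Linux-only path; on other systems the read fails and None is returned.
    path = Path(f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list")
    try:
        return parse_cpu_list(path.read_text())
    except OSError:
        return None

if __name__ == "__main__":
    print("cpu0 shares a core with:", thread_siblings(0))
```

Two logical CPUs listed together share execution units and cache, so placing two busy threads on them is not equivalent to placing them on two separate physical processors.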