Slashdot Mirror


Hyper-Threading Speeds Linux

developerWorks writes "The Intel Xeon processor introduces a new technology called Hyper-Threading (HT) that makes a single processor behave like two logical processors. The technology allows the processor to execute multiple threads simultaneously, which can yield significant performance improvement. But, exactly how much improvement can you expect to see? This article gives the results of the investigation into the effects of Hyper-Threading (HT) on the Linux SMP kernel. It compares the performance of a Linux SMP kernel that was aware of Hyper-Threading to one that was not." Ah, the joys of high performance.

10 of 239 comments

  1. What's really cool also by esconsult1 · · Score: 4, Interesting

    We've used Xeons on our DB server for a few months now. The performance has been outstanding. You also see 4 processors when you run top.

    At first we thought this was an error, and got in touch with Dell's tech support. But the geeks there said this is normal behavior.
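
    The count that top shows is the number of logical processors the kernel has brought online, so an HT-enabled dual Xeon would be expected to report 4. A minimal sketch of reading the same number from a program, assuming a Linux/glibc system where sysconf() accepts _SC_NPROCESSORS_ONLN (a widely available extension, not something the thread mentions). The function name is made up for illustration.

```c
#include <unistd.h>

/* Number of logical processors currently online, as seen by the OS.
   With Hyper-Threading enabled this counts each hardware thread,
   not each physical package. Returns -1 if the value is unavailable. */
long online_logical_cpus(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}
```

    Calling online_logical_cpus() on a given box tells you what the OS sees, but it cannot tell you whether those processors are physical packages or HT siblings.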

    1. Re:What's really cool also by Richard_at_work · · Score: 4, Interesting
      Windows 2000 Pro only shows two, as that's all it can handle. It's part of the Windows 2000 limitation:
      • Windows 2000 Pro - 2 CPUs
      • Windows 2000 Server - 4 CPUs
      • Windows 2000 Advanced Server - 8 CPUs

      We put Win2k Server on a dual Xeon with HT, and it showed 4 CPUs. (This was when we realised we had HT-capable Xeons! Sure enough, after checking, we were right.)
  2. But the real question... by Jace+of+Fuse! · · Score: 3, Interesting

    Does SMP support automatically allow benefits from Hyperthreading, or does that require special support all its own?

    --

    "Everything you know is wrong. (And stupid.)"

    Moderation Totals: Wrong=2, Stupid=3, Total=5.
    1. Re:But the real question... by lederhosen · · Score: 3, Interesting

      Yes, as said in the other posts,
      BUT, you want to schedule the
      same process on the same CPU in
      order to not trash the cache.

      I.e. you can make a huge improvement
      by making the scheduler aware of
      physical
      processors *and* logical processors.
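
      A rough sketch of what "aware of processors and logical processors" could mean when choosing where to run a task. This is not the Linux scheduler's code. The CPU numbering (logical CPUs 2k and 2k+1 share package k), the fixed NCPUS, and the names pick_cpu and sibling_of are all assumptions made up for illustration.

```c
#include <stdbool.h>

#define NCPUS 4   /* assumed: 2 physical packages x 2 logical CPUs each */

/* Assumed numbering: logical CPUs 2k and 2k+1 share physical package k. */
static int sibling_of(int cpu) { return cpu ^ 1; }

/* Pick a logical CPU for a task that last ran on last_cpu (-1 if none).
   busy[i] is true when logical CPU i is already running something. */
int pick_cpu(const bool busy[NCPUS], int last_cpu)
{
    /* 1. Cache affinity: go back to where the task's data may still be cached. */
    if (last_cpu >= 0 && last_cpu < NCPUS && !busy[last_cpu])
        return last_cpu;

    /* 2. Prefer a package whose two logical CPUs are both idle. */
    for (int cpu = 0; cpu < NCPUS; cpu++)
        if (!busy[cpu] && !busy[sibling_of(cpu)])
            return cpu;

    /* 3. Otherwise take any idle logical CPU, sharing a package with a busy sibling. */
    for (int cpu = 0; cpu < NCPUS; cpu++)
        if (!busy[cpu])
            return cpu;

    return -1;    /* everything is busy; the caller has to queue the task */
}
```

      A scheduler that only sees "4 CPUs" would skip step 2 and could put two tasks on one package while the other package sits idle.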

  3. 51% speed-up! by core+plexus · · Score: 5, Interesting
    An excellent, detailed article. For those in a hurry:

    "Conclusion
    Intel Xeon Hyper-Threading is definitely having a positive impact on Linux kernel and multithreaded applications. The speed-up from Hyper-Threading could be as high as 30% in stock kernel 2.4.19, to 51% in kernel 2.5.32 due to drastic changes in the scheduler run queue's support and Hyper-Threading awareness."

    My questions: What's the downside? Is AMD doing anything similar?

    Fight with computer brings SWAT team

  4. HT hurt perf by steelerguy · · Score: 5, Interesting

    Tested HT running a couple of large jobs on a 2-CPU box, with each process using over a GB of RAM. Performance went down.

    Also, HT can play havoc with an openMosix cluster, since processes can be migrated to CPUs that do not really exist and appear to have no load, while the physical CPU may in reality be 100% loaded.
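
    The migration problem can be illustrated with a crude heuristic: judge a logical CPU by the load of the physical package behind it, not by its own load. This is only a sketch and says nothing about how openMosix actually balances load. The sibling numbering (2k and 2k+1 share a package), NCPUS, and both function names are assumptions for illustration.

```c
#define NCPUS 4   /* assumed: 2 packages x 2 logical CPUs, siblings are 2k and 2k+1 */

/* Load of the physical package behind a logical CPU, in percent.
   Takes the busier of the two siblings, so an idle-looking logical
   CPU on a saturated package is not mistaken for free capacity. */
int package_load(const int load_pct[NCPUS], int cpu)
{
    int sib = cpu ^ 1;
    return load_pct[cpu] > load_pct[sib] ? load_pct[cpu] : load_pct[sib];
}

/* Migration target: lowest package load first, then lowest own load. */
int best_target(const int load_pct[NCPUS])
{
    int best = 0;
    for (int cpu = 1; cpu < NCPUS; cpu++) {
        int pl = package_load(load_pct, cpu);
        int bl = package_load(load_pct, best);
        if (pl < bl || (pl == bl && load_pct[cpu] < load_pct[best]))
            best = cpu;
    }
    return best;
}
```

    With loads of 100, 0, 40 and 30 percent, a balancer that looks only at per-CPU load would pick logical CPU 1, which shares a package with the saturated CPU 0. The sketch picks CPU 3 instead.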

    It is not all peaches and cream.

  5. Re:What are you talking about? by pVoid · · Score: 5, Interesting
    You're assuming the kernel is one thread =)

    The kernel has lots of work to do when you call into it. Of course it wouldn't boost up if all kernel calls were like this:

    long myKernelFunc( long param ) { return param * 2 - 23; }

    But kernels do work.

    See, something that very few people know is that the NT kernel is fully pre-emptible, fully interruptible, and re-entrant by design, i.e. it's like that even on single-processor systems.

    The Linux kernel is not.

    It's very hard to 'tack on' SMP features onto a system that wasn't made with that in mind in the first place.

    This has some advantages, and some drawbacks... NT kernel programming is *frigging* hard. HARD.

    But it also has the advantage that it makes *much* better use of SMP.

    Sad, but true.

  6. Re:Fundamental mistake by afidel · · Score: 5, Interesting

    Nope, you make incorrect assumptions: the hyperthreaded portion of the CPU shows up to software as a separate CPU. For this reason a Win2k Pro machine has to have hyperthreading disabled on a dual Xeon machine, or else it will just use the first physical CPU and its child hyperthread. This is why artificial SMP limitations suck. Also, Win2k Server will only allow 4 CPUs in Standard Edition, so it can only utilize two physical CPUs and their hyperthreads. Windows Server 2003 ups the number of CPUs allowed for Standard Edition to 8 to account for this.
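
    The arithmetic behind that can be sketched as follows, assuming (as the comment implies, but hypothetically here) that logical CPUs are enumerated package by package and that the OS simply stops at its CPU limit. The function name and parameters are made up for illustration.

```c
/* How many physical packages end up in use when the OS stops at
   max_logical logical CPUs. Assumes logical CPUs are enumerated
   package by package, threads_per_pkg at a time. */
int packages_used(int packages, int threads_per_pkg, int max_logical)
{
    int total = packages * threads_per_pkg;
    int usable = total < max_logical ? total : max_logical;
    /* Round up: a partially used package still counts as used. */
    return (usable + threads_per_pkg - 1) / threads_per_pkg;
}
```

    Under these assumptions packages_used(2, 2, 2) is 1 (dual Xeon with HT under a 2-CPU limit), while packages_used(2, 1, 2) is 2 (same box with HT disabled).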

    --
    There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
  7. Re:Technical Summary by cartman · · Score: 3, Interesting

    What you said was false.

    Take the example of database & OLTP applications. Database transactions are heavily dependent on repeated access to RAM. Virtually no database is small enough to fit into cache, and there is often little regularity in which data is accessed. Memory latency will REQUIRE a non-SMT processor to wait IDLY on each access that misses the cache, which takes >100 processor cycles on a modern CPU. This has NOTHING to do with the P4 architecture or long pipelines.

    "But remember that HT requires OS support, may require application support..."

    HT does not require OS support as long as the OS is capable of recognizing more than 1 CPU. Any threaded app can benefit from HT.
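
    The memory-latency argument above can be put into a back-of-the-envelope model. This is a deliberately crude sketch: it assumes each thread alternates a fixed burst of work with a fixed memory stall, that another thread can fully fill the stalled cycles, and that threads do not compete for cache or execution units. The function name and the numbers used with it are illustrative, not measurements.

```c
/* Fraction of cycles in which the core does useful work, assuming each
   thread alternates compute_cycles of work with stall_cycles waiting on
   memory, and a stalled thread's cycles can be used by another thread. */
double core_utilization(double compute_cycles, double stall_cycles, int threads)
{
    double u = threads * compute_cycles / (compute_cycles + stall_cycles);
    return u > 1.0 ? 1.0 : u;   /* a core cannot be more than 100% busy */
}
```

    With 100 cycles of work per 100-cycle stall, one thread keeps the core busy half the time and two threads keep it fully busy in this model. Real gains are smaller because the two threads share the same cache and execution units.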

  8. SMT will become increasingly important by cartman · · Score: 5, Interesting

    SMT (hyperthreading) will become increasingly important when processors are able to execute more than 2 threads simultaneously.

    This development is inevitable. Previously, each new processor generation was faster than the prior one at a given clock rate, because each new processor core had more execution units, and was therefore able to perform more work in parallel. This trend abruptly ended recently, for one reason: there is no more instruction-level parallelism (ILP) to exploit. It is impossible for a processor to look at a thread of execution and find more than a few instructions to execute in parallel.

    The only parallelism left to exploit is THREAD-LEVEL parallelism (TLP). Therefore the only way to continually increase performance is to increase the number of threads that a CPU can execute in parallel. This requires two modifications to CPU cores: first, increase the number of thread contexts per CPU, and second, increase the number of pipelines to which those threads can be dispatched.

    With the P4, it would be pointless to have more than 2 thread contexts, because there aren't enough CPU resources lying idle to execute more than 2 threads. But future CPUs could make use of more than 2 thread contexts by having enough CPU resources to execute all of them. Future CPUs could have 20 execution units or more, which would be enough to execute several threads. Remember that the number of transistors per CPU continues to increase exponentially.

    It's easy to foresee a time when processors have 20 execution units (10 integer, 10 fp) and 4 thread contexts, offering more than triple the performance of a non-SMT CPU. In the future, non-SMT CPUs will make as little sense as a non-superscalar CPU would today.
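
    The ILP-versus-TLP argument reduces to a simple bound, sketched here under strong assumptions: every thread can supply at most a fixed number of independent instructions per cycle, and any execution unit can run any instruction. The function name and the per-thread ILP figure of 5 are hypothetical. Only the 20 units and 4 contexts come from the comment.

```c
/* Peak instructions issued per cycle under a simple model: each thread can
   supply at most ilp_per_thread independent instructions per cycle, and the
   core has exec_units units to run them on. */
int peak_issue(int exec_units, int ilp_per_thread, int thread_contexts)
{
    int wanted = ilp_per_thread * thread_contexts;
    return wanted < exec_units ? wanted : exec_units;
}
```

    In this model peak_issue(20, 5, 1) is 5, so a single thread leaves 15 of the 20 units idle, while peak_issue(20, 5, 4) is 20. Adding execution units without adding thread contexts does nothing once the per-thread ILP limit is reached.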