Slashdot Mirror


Hyper-Threading Speeds Linux

developerWorks writes "The Intel Xeon processor introduces a new technology called Hyper-Threading (HT) that makes a single processor behave like two logical processors. The technology allows the processor to execute multiple threads simultaneously, which can yield a significant performance improvement. But exactly how much improvement can you expect to see? This article gives the results of an investigation into the effects of Hyper-Threading (HT) on the Linux SMP kernel. It compares the performance of a Linux SMP kernel that was aware of Hyper-Threading to one that was not." Ah, the joys of high performance.

44 of 239 comments

  1. Also the Pentium 4 - 3 Ghz is hyperthreaded. by deathcow · · Score: 5, Funny


    Xeon folks aren't having the only fun. The 3 GHz Pentium 4 is also hyperthreaded for that crunchy flavor and great taste.

    1. Re:Also the Pentium 4 - 3 Ghz is hyperthreaded. by deathcow · · Score: 5, Informative

      Here is the associated press release from Intel about HT in the 3 GHz P4s. I have seen screenshots of Windows Task Manager showing two CPU performance graphs.

    2. Re:Also the Pentium 4 - 3 Ghz is hyperthreaded. by Henry+V+.009 · · Score: 3, Insightful

      Yes, they do something almost exactly like it. Simply buy two processors and a multi-processor motherboard. That defeats the purpose of this technology, of course, but it nearly accomplishes the same thing.

      Other than that, well, I'm--still--waiting for Hammer. AMD is dropping a long way behind Intel. Price is all they've got, and AMD isn't even competing on price-performance very well at the moment. My guess is that Intel hyperthreaded systems will probably be better price-performance-wise than AMD before long--if they aren't already.

  2. What's really cool also by esconsult1 · · Score: 4, Interesting

    We've used Xeons on our DB server for a few months now. The performance has been outstanding. You also see 4 processors when you run top.

    At first we thought this was an error, and got in touch with Dell's tech support. But the geeks there said this is normal behavior.

    1. Re:What's really cool also by DivideX0 · · Score: 4, Funny
      But do you really want to see additional processors? Wouldn't SCO want to charge you more for them?

      Earlier SCO Story

      --
      My next Slashdot post will be ready soon, but subscribers can beat the rush and see it early!
    2. Re:What's really cool also by afidel · · Score: 3, Informative

      Nope, win2k only shows 2, well, at least for Pro. It also binds to the first physical CPU and its hyperthreaded child. For this reason you have to turn off hyperthreading if you are going to install win2k Pro on a workstation with 2 physical CPUs. I should know: I have a system reimaging right now because it came from the factory with hyperthreading enabled, and so only 1 physical CPU was being used.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    3. Re:What's really cool also by Richard_at_work · · Score: 4, Interesting
      Windows2000Pro only shows two as that's all it can handle. It's part of the Windows2000 limitation:
      • Windows2000pro - 2 cpus
      • Windows2000server - 4 cpus
      • Windows2000AdvServer - 8 cpus

      We put win2kserver on a dual Xeon with HT, and it showed 4 cpus (this was when we realised we had HT-capable Xeons! Sure enough, after checking, we were right).
  3. Fundamental mistake by cbcbcb · · Score: 5, Insightful
    >It compares the performance of a Linux SMP kernel that
    >was aware of Hyper-Threading to one that was not."

    But if you aren't going to use hyperthreading you would use a UP (non-SMP) kernel, which would gain you considerable performance. The benefits are not so clear-cut, as many of the benchmarks show limited benefit from hyperthreading and would perform faster on a uniprocessor kernel.

    1. Re:Fundamental mistake by afidel · · Score: 5, Interesting

      Nope, you make incorrect assumptions: the hyperthreaded portion of the CPU shows up to software as a separate CPU. For this reason a win2k Pro machine has to have hyperthreading disabled on a dual Xeon machine, or else it will just use the first physical CPU and its child hyperthread. This is why artificial SMP limitations suck. Also, win2k Server will only allow 4 CPUs in standard edition, so it can only utilize two physical CPUs and their hyperthreads. Windows Server 2003 ups the number of CPUs allowed for standard edition to 8 to account for this.
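
      The arithmetic behind this can be sketched in a few lines. This is only an illustration: it assumes logical CPUs are enumerated package by package (both hyperthreads of the first physical CPU, then the second, and so on), as the comment describes, and the function name and parameters are hypothetical rather than any Windows API.

```python
def physical_cpus_used(physical, threads_per_cpu, os_cpu_limit):
    """Return how many physical CPUs receive work under an OS CPU cap.

    Assumption: the OS counts every logical CPU against its limit and
    enumerates them package by package, so the cap fills one physical
    CPU's hyperthreads before touching the next physical CPU.
    """
    logical_total = physical * threads_per_cpu
    logical_used = min(logical_total, os_cpu_limit)
    # Ceiling division: a partly used package still counts as used.
    return -(-logical_used // threads_per_cpu)


# Dual Xeon with HT under a 2-CPU cap: only one physical CPU is used.
print(physical_cpus_used(physical=2, threads_per_cpu=2, os_cpu_limit=2))
# Same box under a 4-CPU cap: both physical CPUs are used.
print(physical_cpus_used(physical=2, threads_per_cpu=2, os_cpu_limit=4))
```

      With hyperthreading disabled (threads_per_cpu=1), the 2-CPU cap covers both physical CPUs, which is why turning HT off helps in that configuration.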

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
  4. Imagine a Beowulf Cluster of these by Anonymous Coward · · Score: 5, Funny

    All operating on a single chip!

  5. But the real question... by Jace+of+Fuse! · · Score: 3, Interesting

    Does SMP support automatically allow benefits from Hyperthreading, or does that require special support all its own?

    --

    "Everything you know is wrong. (And stupid.)"

    Moderation Totals: Wrong=2, Stupid=3, Total=5.
    1. Re:But the real question... by norton_I · · Score: 5, Informative

      SMP already can gain benefit from hyperthreading. However, an OS really needs special support to A) get the most out of hyperthreading and B) avoid worst-case scenarios, especially when you have both multiple physical CPUs and multiple logical CPUs per physical CPU.

      For instance, if you have two processes running, you want to put them on different physical CPUs, and if you have a choice, grouping threads with the same memory image on a single processor improves cache usage.

      Without this, hyperthreading may

    2. Re:But the real question... by lederhosen · · Score: 3, Interesting

      Yes, as said in the other posts,
      BUT, you want to schedule the
      same process on the same CPU in
      order to not thrash the cache.

      I.e. you can make a huge improvement
      by making the scheduler aware of
      processors *and* logical processors.
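
      A toy sketch of what "aware of processors and logical processors" could mean for placement. This is not the Linux scheduler; the topology mapping and function name are made up for illustration, and cache affinity (keeping a process where it last ran) is left out entirely.

```python
from collections import defaultdict


def pick_logical_cpu(topology, busy):
    """Pick an idle logical CPU, preferring an otherwise idle physical CPU.

    topology: dict mapping logical CPU id -> physical CPU id (hypothetical).
    busy: set of logical CPU ids that are currently running something.
    Returns a logical CPU id, or None if every logical CPU is busy.
    """
    # Count busy hyperthreads per physical CPU.
    load = defaultdict(int)
    for lcpu, pcpu in topology.items():
        if lcpu in busy:
            load[pcpu] += 1

    idle = [lcpu for lcpu in topology if lcpu not in busy]
    if not idle:
        return None
    # Least-loaded physical CPU first; break ties by lowest logical id.
    return min(idle, key=lambda lcpu: (load[topology[lcpu]], lcpu))


# Two physical CPUs, two hyperthreads each.
topology = {0: 0, 1: 0, 2: 1, 3: 1}
print(pick_logical_cpu(topology, busy={0}))  # chooses a thread on physical CPU 1
```

      An HT-unaware scheduler would see logical CPU 1 as just another idle processor and might pair two busy processes on one physical CPU while the other sits idle.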

  6. good stuff by The+Evil+Couch · · Score: 5, Insightful

    The results on Linux kernel 2.4.19 show Hyper-Threading technology could improve multithreaded applications by 30%. Current work on Linux kernel 2.5.32 may provide performance speed-up as much as 51%.

    while it may not be very useful for a single-user box (it actually looks like it would be a detriment), integrating it into client-server situations would give us some nice boosts in performance. web servers ought to see some real gains with this.

    1. Re:good stuff by windex · · Score: 5, Insightful

      You aren't looking at this logically. It's not that "you need that much CPU for a webserver", it's that "look at how many more customers you can squeeze in per server".

      This lowers cost for providers, and eventually lowers costs for consumers.

      Yee haw.

    2. Re:good stuff by koreth · · Score: 3, Informative
      Depends on what site you're running. If you read your traffic report and say to yourself, "Wow, 10000 hits yesterday! A new record!" then no. If you say to yourself, "Uh oh, only 7500000 hits yesterday, must have been a big network outage somewhere," then yes.

      There's a reason some sites have multiple racks of dedicated web servers, and any technology that lets them serve more users in less physical space is going to be a win if the cost isn't prohibitive.

  7. 51% speed-up! by core+plexus · · Score: 5, Interesting
    An excellent, detailed article. For those in a hurry:

    "Conclusion
    Intel Xeon Hyper-Threading is definitely having a positive impact on Linux kernel and multithreaded applications. The speed-up from Hyper-Threading could be as high as 30% in stock kernel 2.4.19, to 51% in kernel 2.5.32 due to drastic changes in the scheduler run queue's support and Hyper-Threading awareness."

    My questions: What's the downside? Is AMD doing anything similar?

    Fight with computer brings SWAT team

    1. Re:51% speed-up! by PCM2 · · Score: 5, Informative

      The downside is that for code that isn't SMP/HT-aware, performance can actually degrade. Tom's Hardware ran tests of hyperthreading on the 3.06GHz P-4, and in almost every case, it performed better with hyperthreading disabled.

      --
      Breakfast served all day!
    2. Re:51% speed-up! by tshak · · Score: 3

      Although threading is popular for server-based apps, for normal desktop apps threads should be used lightly or not at all. Take an MP3 encoder for example. Sure, the MP3 encoder will launch a thread to update the status bar, but the real CPU hog is the encoding itself, which is done in a single thread. According to Tom Pabst, in this scenario the MP3 encoding will perform slower than on a non-HT proc.

      Also, consider another big performance hog, games. Although a Game Server may take advantage of HT, I don't think (and this is purely speculation based on _minimal_ 3D engine programming experience) it would be a good idea for games to use threads. Threads carry overhead, and they also can make your codebase difficult to manage.

      --

      There is no longer anything that can be done with computers that is nontrivial and clearly legal. -- Paul Phillips
  8. Application dependent by PaschalNee · · Score: 5, Insightful

    The pretty detailed (for me anyway) article on Ars Technica concludes that performance on a HyperThreaded CPU will be very much dependent on the application mix. While research like this is useful, it will probably always be a try-and-see scenario.

  9. HT hurt perf by steelerguy · · Score: 5, Interesting

    Tested HT running a couple of large jobs on a 2 CPU box with each process using over a GB of RAM. Performance went down.

    Also HT can play havoc with an openMosix cluster since processes can start being migrated around to CPUs that do not really exist and appear to have no load, yet the physical CPU may be 100% loaded in reality.

    It is not all peaches and cream.

    1. Re:HT hurt perf by Zathrus · · Score: 3, Informative

      processes can start being migrated around to CPUs that do not really exist and appear to have no load, yet the physical CPU may be 100% loaded in reality

      The article indicates that they're fixing this in the 2.5 branch. Lots of additional patches to the scheduler to let it comprehend the difference between physical and logical processors and do the Right Thing with them.

      Oh, and if you're running a 2 CPU box with only a couple (as in two) large jobs then no, you won't see a performance gain. You already have 1 CPU/process and HT would just be additional overhead.
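
      The "idle CPU that does not really exist" problem can be shown with a toy load check. This is a sketch under assumptions, not openMosix or kernel code: the load figures, the topology mapping, and the threshold are all invented for illustration.

```python
def idle_logical_cpus(load, threshold=0.1):
    """Naive view: any logical CPU with low load looks like a free CPU."""
    return sorted(cpu for cpu, value in load.items() if value < threshold)


def idle_physical_aware(load, topology, threshold=0.1):
    """HT-aware view: a logical CPU only counts as free if every
    hyperthread on the same physical CPU is also below the threshold."""
    busy_physical = {topology[cpu] for cpu, value in load.items()
                     if value >= threshold}
    return sorted(cpu for cpu in load if topology[cpu] not in busy_physical)


# Logical CPU 0 is fully loaded; its sibling (logical CPU 1) reports no load.
load = {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0}
topology = {0: 0, 1: 0, 2: 1, 3: 1}  # logical id -> physical id (hypothetical)

print(idle_logical_cpus(load))              # [1, 2, 3]
print(idle_physical_aware(load, topology))  # [2, 3]
```

      The naive view offers logical CPU 1 as a migration target even though its physical CPU is already saturated, which matches the behaviour described in the parent comment.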

  10. Useful for development? by NixterAg · · Score: 5, Insightful

    Like most development shops, we do a great deal of development for multiprocessor machines so we write a lot of multithreaded code. Multithreaded code creates a whole host of new debugging pitfalls that don't show up if the developer is debugging on a single processor workstation. As John Robbins says in his terrific Debugging Applications book, if you are developing a multithreaded application, you better be certain you are doing your debugging in a multiprocessor environment.

    From a development standpoint, will a hyperthreaded chip provide an adequate environment in duplicating the behavior of a multi-processor PC well enough that shops can buy cheaper, one CPU machines for development and still be confident in their results? I'm guessing nothing will replace the real thing but I'd be interested in any commentary.

  11. Humph! by airrage · · Score: 4, Funny

    Well if I must say something, it's this: that's really going to put a fancy how-do-you-do in the knickers of all those pay-per-processor software types. I mean Oracle, for heaven's sake, is going to have to go absolutely bonkers trying to figure out how to screw the light-bulb into that buffalo (if you pardon my french). I mean what's a megalomaniac to do? I mean I've got expenses! I've got tricarbonfiberalloy yacht hulls to pay for! Can't have people going around trying to process code in a processor without us getting some slice of that monkey, I'll tell you right here and now sir! No sir! Maybe it's that you're not patriotic enough. Trying to cut corners, eh gov'nor? Now I'm gonna have to go and rewrite all the contracts stating explicitly that "processor" is defined as a virtual space for processing. Yes that ought to do it. But I'll still have to have the lawyers check it, just to make sure there aren't any loopies. Drats those lawyers! Taking all my money too!

    --
    "This isn't a study in computer science, its a study in human behavior"
  12. Hyper(Space)Threading by LookSharp · · Score: 5, Funny

    If you overclock the Xeons (And newer P4 CPUs) too high...

    "Prepare to go to HyperThread."

    "Go to HyperThread!"

    *WHOOSH*

    "My God, they've gone plaid!"

    (Just to keep on topic, there is a very informative shootout between HT/non-HT Intel and AMD SMP processor setups here.)

    Just couldn't resist the Spaceballs reference, tho!

    1. Re:Hyper(Space)Threading by The+Evil+Couch · · Score: 5, Funny

      so you're saying AMD's response is going to be to go to LudicrousThreading?

  13. Re:Underwhelmed by pVoid · · Score: 3, Informative
    Compilers are notoriously single threaded monsters.

    On Microsoft platforms, CL.exe goes file by file, and outputs. It's a linear operation. So HT for compiling makes no difference.

    However, the NT DDK has a cool feature that allows you to spawn as many instances of CL as there are processors. Which, you guessed it, is only of any use if you are compiling tens/hundreds of files.

    Sorry, I do not really know of compiler internals for *NIX. Maybe someone can back me up? or clear it up?

  14. Executive summary... by guido1 · · Score: 3, Informative

    Hyperthread support vs not.

    Standard API calls (w/ hyper thread) Increase (a bad thing (tm)) of latency of calls by 1-6%.

    STD workload (w/ hyper thread) Increase in throughput an average of 5-10%. Disk writes decreased throughput by 30%.

    Client network perf: "Chat room" test, increase of throughput 22-28%.

    Server network perf: File serving, increase of 9-31%.

    Kernel 2.5.24 roughly doubles the above benefits.

    Looks like no real downfalls... (How often are you running a single thread? Me either.)

  15. In other news... by dirvish · · Score: 3, Insightful

    Hyper-Threading Speeds Windows

  16. Re:Underwhelmed by Xerithane · · Score: 3

    Sorry, I do not really know of compiler internals for *NIX. Maybe someone can back me up? or clear it up?

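    With gcc builds, the -j option (which belongs to make, not to gcc itself) runs several compile jobs in parallel to utilize SMP. You specify the number of jobs, typically the number of processors you physically have. I do not know how it would work with HT, and I didn't RTFA to see if they covered it. A single gcc invocation is still single-threaded, though; the parallelism comes from make running one gcc per source file.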
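
    For reference, -j is an option of GNU make rather than of gcc: make starts several compiler processes at once, one per source file. A minimal sketch of choosing the job count from the CPU count the OS reports follows; the helper name is hypothetical, and it only builds the command line without running it.

```python
import os


def make_command(jobs=None):
    """Build a `make -jN` command line.

    If jobs is not given, use the number of CPUs the OS reports.
    os.cpu_count() counts logical CPUs, so on a hyperthreaded machine
    this includes the hyperthreads, not just physical processors.
    """
    if jobs is None:
        jobs = os.cpu_count() or 1  # cpu_count() may return None
    return ["make", f"-j{jobs}"]


print(make_command(4))  # ['make', '-j4']
print(make_command())   # depends on the machine it runs on
```

    Whether counting hyperthreads as full job slots actually shortens a build depends on the workload, so the job count is worth measuring rather than assuming.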

    --
    Dacels Jewelers can't be trusted.
  17. Re:What are you talking about? by pVoid · · Score: 5, Interesting
    You're assuming the kernel is one thread =)

    The kernel has lots of work to do when you call into it. Of course it wouldn't boost up if all kernel calls were like this:

    long myKernelFunc( long param ) { return param * 2 - 23; }

    But kernels do work.

    See, something that very few people know is that the NT kernel is fully pre-emptible, fully interruptible, and re-entrant by design, i.e. even for single processor systems, it's like that.

    The linux kernel is not.

    It's very hard to 'tack on' SMP features onto a system that wasn't made with that in mind in the first place.

    This has some advantages, and some drawbacks... NT kernel programming is *frigging* hard. HARD.

    But it also has the advantage that it makes *much* better use of SMP.

    Sad, but true.

  18. In Other News. by OS24Ever · · Score: 4, Funny

    Faster clock speed processors speed up Linux.

    --

    As a rock-and-roll Physicist once said, No matter where you go, there you are.

  19. Proof Bogomips are bogus! by Zathrus · · Score: 4, Funny

    As if there wasn't enough already...

    processor : 0
    bogomips : 3191.60
    processor : 1
    bogomips : 3198.15

    According to that the logical processor is actually faster than the physical one! Just think of what you could wind up with if you instantiated a logical CPU on the logical CPU!
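
    For anyone who wants to compare the figures on their own box, here is a small sketch that pulls the per-processor BogoMIPS values out of /proc/cpuinfo-style text. It assumes the x86 layout quoted above ("processor : N" followed by "bogomips : X"); the function name and the embedded sample are illustrative only. BogoMIPS comes from a boot-time delay-loop calibration, so small differences between processors are measurement noise.

```python
def parse_bogomips(cpuinfo_text):
    """Map processor number -> BogoMIPS from /proc/cpuinfo-style text."""
    result = {}
    current = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without a "key : value" pair
        key = key.strip().lower()
        value = value.strip()
        if key == "processor":
            current = int(value)
        elif key == "bogomips" and current is not None:
            result[current] = float(value)
    return result


# Sample taken from the figures quoted in the comment above.
SAMPLE = """processor : 0
bogomips : 3191.60
processor : 1
bogomips : 3198.15
"""

values = parse_bogomips(SAMPLE)
print(values)
print(f"difference: {values[1] - values[0]:.2f}")
```

    On a live Linux system the same function could be fed the contents of /proc/cpuinfo instead of the sample string.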

  20. It's just you by Royster · · Score: 5, Insightful

    What you've conveniently snipped out in your trollish post is all of the application benchmarks showing improvements. If you're not going to run any application code, you might as well shut the machine off and save the marginal stress on the environment.

    Most of us have our computers do work and those applications, running on an OS which has *barely* slowed, will be able to do more work in the same amount of time under the HT-aware OS than under one which does not utilize the second, virtual processor.

    --
    I have discovered a truly marvelous sig, unfortunately the sig limit is too small to contain i
  21. Technical Summary by 0x69 · · Score: 5, Insightful

    If you're running code that's efficient on a P4 (few mis-predicted branches, low cache miss rate, good parallelism, etc.) then HT is pretty much useless.

    If you're running code that's inefficient on a P4 (which pays for its high GHz with long pipelines, large latencies, a slow decode stage, and several other drawbacks), then HT can usually paper over a fair percentage of these problems. But remember that HT requires OS support, may require application support, and "your mileage will vary".

    --
    It's easy to make up & spread cool- and credible-sounding stuff. Finding & checking hard facts is hard work.
    1. Re:Technical Summary by cartman · · Score: 3, Interesting

      What you said was false.

      Take the example of database & OLTP applications. Database transactions are heavily dependent on repeated access to RAM. Virtually no database is small enough to fit into cache, and there is often little regularity in which data is accessed. Memory latency will REQUIRE a non-SMT processor to wait IDLY each time there is a memory access, which takes >100 proc cycles on a modern CPU. This has NOTHING to do with the P4 architecture or long pipelines.

      "But remember that HT requires OS support, may require application support..."

      HT does not require OS support as long as the OS is capable of recognizing more than 1 CPU. Any threaded app can benefit from HT.

  22. Expensive HT or cheap real SMP? by ponos · · Score: 5, Insightful


    In Europe P4 3.0 with HT costs ~745 euro (+tax)
    An Asus A7M for dual Athlon costs ~260 euro (+tax)
    Two Athlon XP 2200+ cost ~340 euro (+tax).
    Alternatively you can get two Athlon MP 2000+ for
    roughly the same money (if you don't trust the
    XPs).

    Now, please explain to me why would someone
    with real SMP needs in mind (and NOT games)
    consider the P4 with HT.

    P.

    P.S. I understand that the prices in the US are
    different, but still, it is VERY expensive.

  23. Re:excellent by Russ+Steffen · · Score: 5, Informative

    Holy intellectual dishonesty, Batman!

    NT and Windows 2000 do not support HT and never will. NT will not because it's been end-of-lifed, and Windows 2000 will not because of Microsoft policy. On a 2-CPU system with HyperThreading, NT and Windows 2000 will think they have 4 real CPUs (unsurprisingly, this is what a pre-HT version of Linux will see as well). HT support means the OS knows that it has, in this example, 2 real CPUs and 2 fakes, and the scheduler will weight the real CPUs accordingly.

    XPPro SP1 is the first, and only shipping version of Windows to support HT.

  24. Summary by swillden · · Score: 3, Insightful

    So, in a nutshell, what MS says is: Windows 2000 counts processors in a broken way and requires you to buy licenses for every logical processor, even though you won't get nearly as much processing power as you would if you really had that many physical processors. But rather than fix this bug, we're going to solve the problem by making you buy .NET, which counts processors correctly. So either way, if you're going to use hyperthreading, expect to send us more money.

    --
    Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
  25. Hyperthreading and memory access by cartman · · Score: 5, Informative

    One of the major impediments to increasing CPU performance has been increasing memory latency. Memory latency has grown worse as CPUs have gotten faster. Accessing RAM will now cause a >150 cycle latency, during which the processor sits IDLE.

    Cache only partly mitigates this problem. Some applications, such as databases and OLTP, are heavily dependent on repeatedly accessing non-cached RAM. There is no way to cache all the relevant data, since virtually all databases are larger than can fit in any present cache, no matter how large, and there is sometimes no way to predict which data will be accessed. ALL of these applications have CPUs that spend much of their time being IDLE, waiting for memory to be returned.

    SMT (hyperthreading) allows the processor to perform useful work during these otherwise idle periods, by allowing the cpu to switch to a thread that is not blocked on memory access. The "idle bubbles" in the execution pipeline can therefore be "filled in" by useful work that advances the state of relevant programs.

    SMT can cause a degradation in performance because it can lead to "cache thrashing." In an SMT-naive kernel, two unrelated threads could be scheduled for the same physical CPU. These unrelated threads will likely share very little code or data. The two threads will therefore "compete" for the single shared cache, with each thread's data being repeatedly displaced by the other's.

    This difficulty can be substantially mitigated by making the kernel aware of "virtual processors," and by implementing scheduling algorithms to minimize the impact. The performance of hyperthreading will likely improve as kernels are better able to exploit it.
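
    The "filling idle bubbles" argument can be put into a back-of-the-envelope model. This is a deliberately crude sketch: it assumes each thread alternates between a burst of useful work and a memory stall, that stalls from different threads overlap perfectly, and that there is no cache thrashing or contention for execution units. The 50-cycle work burst is an invented number; the 150-cycle stall echoes the latency figure quoted above.

```python
def utilization(compute_cycles, stall_cycles, threads):
    """Fraction of time the core does useful work in the toy model.

    Each thread is busy for compute_cycles, then stalled on memory for
    stall_cycles. With several hardware threads the core can run another
    thread during a stall, up to the point where it is never idle.
    """
    per_thread = compute_cycles / (compute_cycles + stall_cycles)
    return min(1.0, threads * per_thread)


# One thread: 50 cycles of work, then a 150-cycle memory stall.
print(utilization(50, 150, threads=1))  # 0.25
# Two hardware threads with the same behaviour.
print(utilization(50, 150, threads=2))  # 0.5
```

    In this model a memory-bound workload doubles its throughput with two threads, while a workload that rarely stalls gains almost nothing; real results sit below the model once the shared cache and execution units are contended.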

  26. Re:HT is not single-chip SMP by SpinyNorman · · Score: 3, Insightful

    I don't believe that's correct.

    As I understand it, HT can indeed speed up pure integer code (or more generally code that's competing for a single CPU resource). HT will allow another thread to execute if the current one is waiting on anything from pipeline results to memory access. I believe that the modern CPU/memory speed disparity was one of the driving forces behind it - if one thread gets a cache miss then another may be able to continue executing rather than having to sit idle waiting for main memory.

  27. SMT will become increasingly important by cartman · · Score: 5, Interesting

    SMT (hyperthreading) will become increasingly important when processors are able to execute more than 2 threads simultaneously.

    This development is inevitable. Previously, each new processor generation was faster than the prior one at a given clock rate, because each new processor core had more execution units, and was therefore able to perform more work in parallel. This trend abruptly ended recently, for one reason: there is no more instruction-level parallelism (ILP) to exploit. It is impossible for a processor to look at a thread of execution and find more than a few instructions to execute in parallel.

    The only parallelism left to exploit is THREAD-LEVEL parallelism (TLP). Therefore the only way to continually increase performance is to increase the number of threads that a CPU can execute in parallel. This requires two modifications to CPU cores: first, increase the number of thread contexts per CPU, and second, increase the number of pipelines to which those threads can be dispatched.

    With the P4, it would be pointless to have more than 2 thread contexts, because there aren't enough CPU resources lying idle to execute more than 2 threads. But future CPUs could make use of more than 2 thread contexts by having enough CPU resources to execute all of them. Future CPUs could have 20 execution units or more, which would be enough to execute several threads. Remember that the number of transistors per CPU continues to increase exponentially.

    It's easy to foresee a time when processors have 20 execution units (10 integer, 10 fp) and 4 thread contexts, offering more than triple the performance of a non-SMT CPU. In the future, non-SMT CPUs will make as little sense as a non-superscalar CPU would today.

  28. Re:excellent by zdarnell · · Score: 3, Informative

    Any SMP-capable operating system supports HT. On the other hand, license issues make combining true SMP and HT a pain on non-server versions of Windows.

    You're totally missing that part of the beauty of HT is the transparency.

    On the other hand you can write things SPECIFICALLY for HT to deal with things such as cache issues, but saying that windows doesn't support it at all is rather misleading and makes it seem like people wouldn't see any improvements at all.

  29. Re:What are you talking about? by pVoid · · Score: 3, Informative
    the NT kernel is fully pre-emptible, fully interruptible, and re-entrant by design

    Care to explain what these mean?

    I'll explain to you what this means:

    It means a piece of hardware raising an interrupt will launch a driver's ISR. Hardware-wise, at that point, any other hardware raising an interrupt at a lower IRQ level will not be serviced. BUT any hardware at a higher IRQL can take over the 'thread' that is servicing the first interrupt. There are *no* exceptions to this rule. In effect, your ISR is fully interruptible.

    The thread dispatcher runs at DIRQL (D=Dispatch); any IRQL higher than DIRQL is kind of beyond the concept of threads. But any thread running below DIRQL (i.e. APC or normal threads) is fully pre-emptible. There is no thread that has *any* guarantee of not being pre-empted anywhere in the system.

    All of the kernel is also re-entrant, which means from anywhere within the kernel, so long as you are in proper IRQL, you can call back the kernel.

    At these altitudes, or depths, whichever you wish, there are many strange beasts that you've never heard of - or never had a reason to really use - that are being used to do synchronization. Namely spin locks. Moft came up with queued spin locks a few years ago, and that was a rather Good Thing (tm). It made spin locks so much better under SMP systems.

    Now, to all the posts that say "all I need is an OS that doesn't suffer from SMP", all I have to say is why do you use SymmetricalMP in the first place??! Why not use asymmetrical processing, and just queue all interrupts on a single CPU? It'll sure as hell simplify everything, and reduce overloading the bus with those damn spin locks!

    If you're going to claim that the Linux kernel is doing a good job of an SMP system, you have to show me it's actually performing better. Not just allowing more threads to run on more processors... everyone can do that.

    Second thing is: don't flatter yourself with 'easy wins'. All I'm saying is that this is just *not* a win for linux. It's only a win for HT and multithreading... but hey, we all knew that Multithreading is a Good Thing(tm)... right?