Slashdot Mirror


Hyper-Threading Speeds Linux

developerWorks writes "The Intel Xeon processor introduces a new technology called Hyper-Threading (HT) that makes a single processor behave like two logical processors. The technology allows the processor to execute multiple threads simultaneously, which can yield significant performance improvement. But exactly how much improvement can you expect to see? This article gives the results of an investigation into the effects of Hyper-Threading (HT) on the Linux SMP kernel. It compares the performance of a Linux SMP kernel that was aware of Hyper-Threading to one that was not." Ah, the joys of high performance.

239 comments

  1. Also the Pentium 4 - 3 Ghz is hyperthreaded. by deathcow · · Score: 5, Funny


    Xeon folks aren't having the only fun. The 3 Ghz Pentium 4 is also hyperthreaded for that crunchy flavor and great taste.

    1. Re:Also the Pentium 4 - 3 Ghz is hyperthreaded. by deathcow · · Score: 5, Informative

      Here is the associated press release from Intel about the HT in 3 Ghz P4's. I have seen screenshots of Windows task manager showing (2) CPU performance graphs.

    2. Re:Also the Pentium 4 - 3 Ghz is hyperthreaded. by TokyoBoy · · Score: 2, Interesting

      Does anyone know if AMD will be doing something similar or if their current processors do something like this? I know that many High Performance Clusters use SMP machines and multi-threaded code and could take advantage of HT. Many clusters are made with AMD processors due to the fact that they are so much less expensive than Intel.

    3. Re:Also the Pentium 4 - 3 Ghz is hyperthreaded. by Henry+V+.009 · · Score: 3, Insightful

      Yes, they do something almost exactly like it. Simply buy two processors and a multi-processor motherboard. That defeats the purpose of this technology, of course, but it nearly accomplishes the same thing.

      Other than that, well, I'm--still--waiting for Hammer. AMD is dropping a long ways behind Intel. Price is all they've got, and AMD isn't even competing on price-performance real well at the moment. My guess is that Intel hyperthreaded systems will probably be better price-performance wise than AMD before long--if they aren't already.

    4. Re:Also the Pentium 4 - 3 Ghz is hyperthreaded. by imroy · · Score: 1
      Does anyone know if AMD will be doing something similar or if their current processors do something like this?

      Yes, from what I gather it's called Not-Having-Such-A-Ridiculously-Long-Pipeline (NHSARLP). Seriously, HT is more a method to reduce the impact of branch mispredictions on the long P4 pipe, rather than giving SMP on a die. The article gives pretty dismal numbers for the speed improvement. Sure, it's an improvement if you already have the hardware, but don't rush out now for a P4 or Xeon thinking it's equivalent to two CPUs.

    5. Re:Also the Pentium 4 - 3 Ghz is hyperthreaded. by _GIS_Geek · · Score: 1

      Okay ... would NOW be a good time to buy a Dual Processor (Non-Hyperthreaded) P3 Xeon (yes, P3!) ... Cheaper, and still a good processor? What about the price vs. Performance?????

  2. What's really cool also by esconsult1 · · Score: 4, Interesting

    We've used XEON's on our DB server for a few months now. The performance has been outstanding. You also see 4 processors when you run top.

    At first we thought this was an error, and got in touch with Dell's tech support. But the geeks there said this is normal behavior.

    1. Re:What's really cool also by JCholewa · · Score: 2, Interesting

      > We've used XEON's on our DB server for a few months now. The performance
      > has been outstanding. You also see 4 processors when you run top.

      > At first we thought this was an error, and got in touch with Dell's tech support.
      > But the geeks there said this is normal behavior.

      Of course it's normal behavior. Windows is (well, basically) counting the number of threads that the system can simultaneously execute (that's probably not entirely an accurate depiction), not the number of physical processors. But this does not mean that you're getting the performance of four processors. You still only have the execution resources of two processors at your system's disposal. The best that simultaneous multithreading can do is make more efficient use of the existing execution units. This can result in very nice performance boosts, no performance boosts at all, and (in some rarer circumstances) performance penalties. But it is in no way anything near like having that actual number of processors.

      Probably, a good rule of thumb would be "if it already stresses the execution units, then you won't see a boost, but if the code causes frequent thread stalls, then you'll probably see a nice jump".

      *EDIT* Crap, I didn't notice that you said top. Made the assumption about Windows. Sorry about that. My post more or less stands as the same, though. :)
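
The logical-versus-physical distinction above can be checked on a Linux box. The following is a minimal sketch, assuming an x86 kernel whose /proc/cpuinfo exposes `processor` and `physical id` lines; the sample text and the helper name `count_cpus` are hypothetical, and the sample values are illustrative rather than taken from any machine in this thread.

```python
# Hypothetical sample in the style of /proc/cpuinfo on a dual-processor box
# with Hyper-Threading enabled. The values are illustrative, not measured.
SAMPLE_CPUINFO = """\
processor\t: 0
physical id\t: 0

processor\t: 1
physical id\t: 0

processor\t: 2
physical id\t: 1

processor\t: 3
physical id\t: 1
"""


def count_cpus(cpuinfo_text):
    """Return (logical_count, physical_count) from cpuinfo-style text."""
    logical = 0
    packages = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines carry no field
        key = key.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            packages.add(value.strip())
    # Assumption: if no "physical id" lines exist, treat every logical CPU
    # as its own package.
    physical = len(packages) if packages else logical
    return logical, physical


logical, physical = count_cpus(SAMPLE_CPUINFO)
print(f"{logical} logical CPUs on {physical} physical packages")
```

On a real system the same function could be fed the contents of /proc/cpuinfo instead of the sample string; top counts the logical processors, which is why it shows four on a two-socket HT machine.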

    2. Re:What's really cool also by DivideX0 · · Score: 4, Funny
      But do you really want to see additional processors, wouldn't SCO want to charge you more for them?

      Earlier SCO Story

      --
      My next Slashdot post will be ready soon, but subscribers can beat the rush and see it early!
    3. Re:What's really cool also by ostiguy · · Score: 2

      Yup, I've heard task manager in win2k also shows 4 meters on a 2 physical cpu box.

      ostiguy

    4. Re:What's really cool also by afidel · · Score: 3, Informative

      Nope, win2k only shows 2, well at least for pro. It also binds to the first physical cpu and its hyperthreaded child. For this reason you have to turn off hyperthreading if you are going to install win2k pro on a 2 physical cpu workstation. I should know: I have a system reimaging right now because it came from the factory with hyperthreading enabled and so only 1 physical cpu was being used.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    5. Re:What's really cool also by Chundra · · Score: 2

      Yeah. I dunno about other systems, but on my supermicro p4dp6 the POST messages even say there are 4. Using my own ad hoc benchmarks with dnetc, it appears like there are 2 fast processors and the hyper-threaded ones crunch at around 20%-30% of those.

    6. Re:What's really cool also by Richard_at_work · · Score: 4, Interesting
      Windows2000Pro only shows two as that's all it can handle. It's part of the Windows2000 limitation:
      • Windows2000pro - 2 cpus
      • Windows2000server - 4 cpus
      • Windows2000AdvServer - 8 cpus

      We put win2kserver on a dual Xeon with HT, and it showed 4 cpus (this was when we realised we had HT capable Xeons! Sure enough, after checking, we were right).
    7. Re:What's really cool also by garyrich · · Score: 2

      Funny maybe, but the answer is probably yes. I know that if you have 2 multithreading processors you have to have a version of Windows that supports 4 processors - and that's a big additional chunk of change.

      --
      -- your Web browser is Ronald Reagan
  3. Fundamental mistake by cbcbcb · · Score: 5, Insightful
    >It compares the performance of a Linux SMP kernel that

    >was aware of Hyper-Threading to one that was not."

    But if you aren't going to use hyper threading you would use a UP (non-SMP) kernel, which would gain you considerable performance. The benefits are not so clear cut as many of the benchmarks show limited benefit from hyperthreading and would perform faster on a uniprocessor kernel.

    1. Re:Fundamental mistake by pergamon · · Score: 2

      Well, unless you have a computer that has multiple physical CPUs.

    2. Re:Fundamental mistake by JCholewa · · Score: 2, Interesting

      > Well, unless you have a computer that has multiple physical CPUs.

      You have a point, but he does, as well. SMT ("hyper-threading") should work automatically for multiprocessor systems. So if you have a dual processor, SMT-capable board in a system that's unaware of the SMT functionality, you should still get a boost from SMT. Unless Hyper-Threading is a really, really bizarre implementation of SMT. Reviewers should really compare against an SMP system that is incapable of doing SMT, because it'll do it automatically (or it should), even if you don't tell it to. Alternatively, you could approximate the same results by forcing the system to only use a number of threads equivalent to the number of processors. Not all programs can do this, though (compiling is the only thing that immediately comes to mind).

      Granted, I've been out of the loop a bit, so I might be making some really off the wall (and inaccurate) assumptions about Intel's SMT implementation.

    3. Re:Fundamental mistake by afidel · · Score: 5, Interesting

      Nope, you make incorrect assumptions: the hyperthreaded portion of the cpu shows up to software as a separate cpu. For this reason a win2k pro machine has to have hyperthreading disabled on a dual xeon machine or else it will just use the first physical cpu and its child hyperthread. This is why artificial smp limitations suck. Also win2k server will only allow 4 cpus in standard edition so it can only utilize two physical cpus and their hyperthreads. Windows Server 2003 ups the amount of cpus allowed for standard edition to 8 to account for this.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    4. Re:Fundamental mistake by Anonymous Coward · · Score: 0

      Err.. no that's wrong. Win2k pro and XP pro will show 4 cpu meters and run fine, running processes on all 4 "virtual" cpus. The hyperthreaded portion doesn't show up as a separate CPU, it shows up as a separate logical processing unit, which is handled differently..

    5. Re:Fundamental mistake by afidel · · Score: 2

      WRONG. Win2k pro will only use 2 cpu's total, XP Pro on the other hand will handle 2 physical cpus with or without hyperthreading. I just had to reimage a HP X4000 because it came from the factory with hyperthreading enabled (this is not usually the case not sure how it happened) and so the first 2 cpus the installer saw were physical cpu #1 and HT #1, physical cpu #2 and HT #2 were not used. This may have changed with SP3, but I doubt it as it is a major architecture change.
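
The effect described here is just a cap applied to an enumeration order. A minimal sketch, assuming the logical CPUs are presented as package 0 / thread 0, package 0 / thread 1, package 1 / thread 0, and so on (the order reported in this comment; other firmware may enumerate differently). The function names and tuples are hypothetical, not any Windows API.

```python
def usable_cpus(enumeration, cpu_limit):
    """Return the logical CPUs an OS capped at cpu_limit would pick up."""
    return enumeration[:cpu_limit]


def physical_packages(cpus):
    """Return the sorted physical package ids covered by a set of logical CPUs."""
    return sorted({package for package, _thread in cpus})


# Assumed enumeration order: (physical package, hardware thread).
DUAL_XEON_HT = [(0, 0), (0, 1), (1, 0), (1, 1)]
DUAL_XEON_NO_HT = [(0, 0), (1, 0)]

# A 2-CPU limit with HT on covers only package 0; with HT off it covers both.
print(physical_packages(usable_cpus(DUAL_XEON_HT, 2)))
print(physical_packages(usable_cpus(DUAL_XEON_NO_HT, 2)))
# A 4-CPU limit covers both packages and their sibling threads.
print(physical_packages(usable_cpus(DUAL_XEON_HT, 4)))
```

Under that assumed ordering, a 2-CPU cap with Hyper-Threading enabled spends both slots on one physical processor, which matches the reimaging story above.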

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    6. Re:Fundamental mistake by sjames · · Score: 2

      Reviewers should really compare against an SMP system that is incapable of doing SMT, because it'll do it automatically (or it should), even if you don't tell it to.

      By default, hyperthreading will be used. Every board I've seen that supports it has a BIOS option to disable the virtual processor(s) by setting a bit in one of the MSRs.

  4. It's not possible by Oculus+Habent · · Score: 0, Troll

    Something that makes your computer run faster also makes free operating systems faster too?!

    I wonder what it does for commercial OSes.

    Sorry for the sarcasm, but isn't that obvious? If you have a processor that can do more work than another processor at equivalent MHz, it, by most estimations, will speed something up.

    Not true for everything, but pretty close.

    --
    That what was all this school was for... to teach us how to solve our own problems. -- janeowit
    1. Re:It's not possible by makapuf · · Score: 1

      like the new shiny graphic card which has 700 fps under win and lin... oh wait, no free 3D accelerated drivers for X ...

    2. Re:It's not possible by lederhosen · · Score: 1

      It DOES exist in windows

  5. Imagine a Beowulf Cluster of these by Anonymous Coward · · Score: 5, Funny

    All operating on a single chip!

    1. Re:Imagine a Beowulf Cluster of these by Anonymous Coward · · Score: 0

      Has already been done

    2. Re:Imagine a Beowulf Cluster of these by Anonymous Coward · · Score: 0

      If only i had 5 mod points to pull this post down.....

  6. But the real question... by Jace+of+Fuse! · · Score: 3, Interesting

    Does SMP support automatically allow benefits from Hyperthreading, or does that require special support all its own?

    --

    "Everything you know is wrong. (And stupid.)"

    Moderation Totals: Wrong=2, Stupid=3, Total=5.
    1. Re:But the real question... by stratjakt · · Score: 2, Insightful

      >> Does SMP support automatically allow benefits from Hyperthreading

      Yes

      HT essentially partitions out the CPU's pipeline into two pipelines executing concurrently: That is, two CPUs on the same die.

      --
      I don't need no instructions to know how to rock!!!!
    2. Re:But the real question... by Anonymous Coward · · Score: 1, Informative

      I believe it emulates the dual CPUs, so the OS should see it as 2 chips. That's why XP Home wouldn't take advantage of it, but XP Pro/2K would.

    3. Re:But the real question... by norton_I · · Score: 5, Informative

      SMP already can gain benefit from hyperthreading. However, an OS really needs special support to A) get the most out of hyperthreading and B) avoid worst-case scenarios, especially when you have both multiple physical CPUs and multiple logical CPUs per physical CPU.

      For instance, if you have two processes running, you want to put them on different physical CPUs, and if you have a choice, grouping threads with the same memory image on a single processor improves cache usage.

      Without this, hyperthreading may

    4. Re:But the real question... by lederhosen · · Score: 3, Interesting

      Yes, as said in the other posts,
      BUT, you want to schedule the
      same process on the same CPU in
      order to not trash the cache.

      I.e. you can make a huge improvement
      by making the scheduler aware of
      processors *and* logical processors.
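
The placement idea in the last two comments can be sketched as follows. This is a toy model, not the Linux scheduler: it assumes a topology given as (logical id, physical package id) pairs and simply hands out one logical CPU per idle package before doubling up on siblings. The name `place_tasks` and the sample topology are hypothetical.

```python
def place_tasks(topology, task_count):
    """Pick logical CPUs for tasks, filling idle physical packages first.

    topology: list of (logical_id, package_id) pairs.
    Returns the logical ids chosen, in assignment order.
    """
    by_package = {}
    for logical_id, package_id in topology:
        by_package.setdefault(package_id, []).append(logical_id)

    chosen = []
    wanted = min(task_count, len(topology))
    round_index = 0
    while len(chosen) < wanted:
        # Each round takes at most one more sibling from every package.
        for siblings in by_package.values():
            if round_index < len(siblings) and len(chosen) < wanted:
                chosen.append(siblings[round_index])
        round_index += 1
    return chosen


# Two physical packages, two hardware threads each (assumed numbering).
TOPOLOGY = [(0, 0), (1, 0), (2, 1), (3, 1)]

print(place_tasks(TOPOLOGY, 2))  # one task per physical package
print(place_tasks(TOPOLOGY, 3))  # third task shares a package
```

A scheduler unaware of the topology might pick logical CPUs 0 and 1 for two tasks, which in this numbering are siblings on one package while the other package sits idle.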

    5. Re:But the real question... by jim3e8 · · Score: 1

      Holy cow, that is catastrophic!

    6. Re:But the real question... by Mattsson · · Score: 2

      Well, the *article* says that you get more performance if you patch the kernel to be optimized for ht. =)
      It also says that you get a performance boost even by using the standard smp kernel.

      --
      /.Mattsson - My native language is not English, so please don't whine over linguistic errors. (That's lame anyway...)
  7. good stuff by The+Evil+Couch · · Score: 5, Insightful

    The results on Linux kernel 2.4.19 show Hyper-Threading technology could improve multithreaded applications by 30%. Current work on Linux kernel 2.5.32 may provide performance speed-up as much as 51%.

    while it may not be very useful for a single-user box (it actually looks like it would be a detriment), integrating it into client-server situations would give us some nice boosts in performance. web servers ought to see some real gains with this.

    1. Re:good stuff by makapuf · · Score: 1

      err ... do you really need that much CPU power for a webserver ?

    2. Re:good stuff by cioxx · · Score: 2

      you would, if you're running an ircD server.

    3. Re:good stuff by escher · · Score: 1

      err ... do you really need that much CPU power for a webserver ?

      If you're running IIS, absolutely. (I work at an ISP so this isn't just idle speculation. :)

    4. Re:good stuff by windex · · Score: 5, Insightful

      You aren't looking at this logically. It's not that "you need that much CPU for a webserver", it's that "look at how many more customers you can squeeze in per server".

      This lowers cost for providers, and eventually lowers costs for consumers.

      Yee haw.

    5. Re:good stuff by The+Evil+Couch · · Score: 2

      precisely. more powerful, more efficient webservers mean lower overhead costs, which is never a bad thing.

    6. Re:good stuff by Anonvmous+Coward · · Score: 2

      "err ... do you really need that much CPU power for a webserver ?"

      Ask one of Slashdot's victims when they come back on-line.

    7. Re:good stuff by koreth · · Score: 3, Informative
      Depends on what site you're running. If you read your traffic report and say to yourself, "Wow, 10000 hits yesterday! A new record!" then no. If you say to yourself, "Uh oh, only 7500000 hits yesterday, must have been a big network outage somewhere," then yes.

      There's a reason some sites have multiple racks of dedicated web servers, and any technology that lets them serve more users in less physical space is going to be a win if the cost isn't prohibitive.

    8. Re:good stuff by ethereal · · Score: 1

      Well, as long as you can get enough bandwidth hooked up to the box to serve those customers, that is. Your web server is a lot more likely to be bound by bandwidth than by processing power; if you're really doing that much processing on the web server box, then you probably are running stuff on a 'net-accessible box that maybe shouldn't be there (DB server, etc.). Having a real powerful web server is only an advantage for that short time period where a rapidly growing dynamically-generated site hasn't quite reached the threshold at which you would break out the site's processing into a tiered architecture and move the real processing deeper within the site (up the hierarchy of machines, if you can imagine).

      For example, compare to how /. is set up (not exactly a sterling example, but... :). Do you think that even if they had a really powerful box and could do everything on one box, that they would do so? I would guess not.

      YMMV, of course.

      --

      Your right to not believe: Americans United for Separation of Church and

    9. Re:good stuff by sjames · · Score: 2

      Even in a single user situation, it could be a benefit. This will be more so once the scheduler knows how to avoid pathological schedules (two threads on one cpu, the other idle).

      doing make -j 4 (or 5) on a dual Xeon system is quite nice!
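
The "-j 4 (or 5)" figure is the logical CPU count of a dual Xeon with HT, optionally plus one. A minimal sketch of deriving it, assuming the build should use one job per logical CPU; `make_command` and its parameters are hypothetical helper names, and whether the extra job helps depends on the workload.

```python
import os


def make_command(logical_cpus=None, extra_jobs=0):
    """Build a make invocation with one job per logical CPU (plus extras)."""
    if logical_cpus is None:
        # os.cpu_count() reports logical CPUs and may return None.
        logical_cpus = os.cpu_count() or 1
    return ["make", f"-j{logical_cpus + extra_jobs}"]


print(make_command(4))                # the dual Xeon with HT case
print(make_command(4, extra_jobs=1))  # the "or 5" variant
```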

  8. What are you talking about? by pVoid · · Score: 2, Interesting
    The article clearly shows that syscalls and basically OS-dependent stuff rarely improve in performance, and in fact get slower in most spots.

    Of course multi-threaded applications are going to improve. What's your point?

    For those who didn't RTFA:

    Kernel function                 no HT   with HT Speed-up
    Simple syscall                   1.10      1.10       0%
    Simple read                      1.49      1.49       0%
    Simple write                     1.40      1.40       0%
    Simple stat                      5.12      5.14       0%
    Simple fstat                     1.50      1.50       0%
    Simple open/close                7.38      7.38       0%
    Select on 10 fd's                5.41      5.41       0%
    Select on 10 tcp fd's            5.69      5.70       0%
    Signal handler installation      1.56      1.55       0%
    Signal handler overhead          4.29      4.27       0%
    Pipe latency                    11.16     11.31      -1%
    Process fork+exit              190.75    198.84      -4%
    Process fork+execve            581.55    617.11      -6%
    Process fork+/bin/sh -c       3051.28   3118.08      -2%

    is it just me? or does the linux kernel not perform so much better in SMP HT?
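
For anyone checking the percentages, here is a small worked sketch. It assumes the first number in each row is the kernel without Hyper-Threading support, the second is the kernel with it, lower is better, and the speed-up is (first - second) / first truncated to a whole percent. That formula is an assumption that happens to reproduce the figures quoted above; the article's exact method is not given in this thread.

```python
def speedup_percent(without_ht, with_ht):
    """Relative change versus the first figure, truncated toward zero."""
    return int((without_ht - with_ht) / without_ht * 100)


# A few rows from the figures quoted above.
ROWS = [
    ("Simple syscall", 1.10, 1.10),
    ("Pipe latency", 11.16, 11.31),
    ("Process fork+exit", 190.75, 198.84),
    ("Process fork+execve", 581.55, 617.11),
    ("Process fork+/bin/sh -c", 3051.28, 3118.08),
]

for name, without_ht, with_ht in ROWS:
    print(f"{name}: {speedup_percent(without_ht, with_ht)}%")
```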

    1. Re:What are you talking about? by betaray · · Score: 1

      Of course your kernel is going to perform worse on an SMP box. Performance-wise, you're still limited to the speed of 1 CPU for any 1 call, with additional SMP overhead. The fact that most of the calls don't suffer is reason enough to praise the kernel.

      The whole point of parallel processing is to provide more processor power for parallel tasks. Try running two distributed.net clients (or one threaded client) with hyper threading. That'd be an interesting benchmark. You've got to actually make the processor work to see the difference.

    2. Re:What are you talking about? by killmeplease · · Score: 0

      You didn't finish reading the article where multithreaded apps are compared. Simple UNIX calls are NEVER sped up by SMP but file servers and web servers are much faster and this is shown in this article. This HT would be great in the situation of a DB server, SMB server, Apache, email server, CGI server, etc....

      Interesting things can come from Intel

      --
      - Kill Yourself, spare us all! -
    3. Re:What are you talking about? by pVoid · · Score: 5, Interesting
      You're assuming the kernel is one thread =)

      The kernel has lots of work to do when you call into it. Of course it wouldn't boost up if all kernel calls were like this:

      void myKernelFunc( long param ) { return param * 2 - 23; }

      But kernels do work.

      See, something that very few people know is that the NT kernel is fully pre-emptible, fully-interruptible, and re-entrant by design. I.e., even for single processor systems, it's like that.

      The linux kernel is not.

      It's very hard to 'tack on' SMP features onto a system that wasn't made with that in mind in the first place.

      This has some advantages, and some drawbacks... NT kernel programming is *frigging* hard. HARD.

      But it also has the advantage that it makes *much* better use of SMP.

      Sad, but true.

    4. Re:What are you talking about? by Anonymous Coward · · Score: 0

      Too bad you didn't read the rest of the article, or you would have seen the speedup.

    5. Re:What are you talking about? by charmer · · Score: 1

      These are latency measurements. Anyone with minimal threaded programming experience will tell you that any type of threading does not improve latency, just the throughput, by enabling overlapped execution.
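
The latency-versus-throughput point can be illustrated with a toy experiment. This sketch uses Python threads and `time.sleep` as a stand-in for a stall; it is not a Hyper-Threading benchmark, and the 0.05 second figure and the function names are arbitrary assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor

STALL_SECONDS = 0.05  # arbitrary stand-in for a stall such as an I/O wait


def timed_call(_):
    """Run one simulated call and return its individual latency."""
    start = time.perf_counter()
    time.sleep(STALL_SECONDS)
    return time.perf_counter() - start


def run(task_count, workers):
    """Return (smallest per-call latency, total wall time) for a batch."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = list(pool.map(timed_call, range(task_count)))
    return min(latencies), time.perf_counter() - start


serial_latency, serial_total = run(4, workers=1)
overlap_latency, overlap_total = run(4, workers=4)

# Each call takes about as long either way; only the batch finishes sooner.
print(f"per-call latency: {serial_latency:.3f}s vs {overlap_latency:.3f}s")
print(f"batch wall time:  {serial_total:.3f}s vs {overlap_total:.3f}s")
```

No single call gets faster with more workers, which is why a table of per-call latencies shows no gain, while the time to finish the whole batch drops because the waits overlap.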

    6. Re:What are you talking about? by Anonymous Coward · · Score: 0

      [quote]
      void myKernelFunc( long param ) { return param * 2 - 23; }
      [/quote]
      Man, some one needs to go back to 'Programming 101'. void functions do not return values.

      Quzah.

    7. Re:What are you talking about? by be-fan · · Score: 2

      Hah! The first poster on the OSNews thread about this story wasn't impressed either. Apparently, a lot of people don't have the attention span to read more than the first few tables in an article!

      --
      A deep unwavering belief is a sure sign you're missing something...
    8. Re:What are you talking about? by Tom · · Score: 1

      NT kernel is fully pre-emptible, fully-interruptible,

      Especially by the BSOD function. :-)

      scnr

      --
      Assorted stuff I do sometimes: Lemuria.org
    9. Re:What are you talking about? by IamTheRealMike · · Score: 2

      Well, I think the Linux kernel is now pre-emptable to some extent (not everywhere, but most places). Robert Love did the work necessary. I don't know how well it stacks up against the NT kernel, but I'd guess over the years it'll close the gap.

    10. Re:What are you talking about? by Anonymous Coward · · Score: 0

      Did a Slashbot just admit something positive about Windows? Even if it was partly negative, this is a big step for Slashbots everywhere.

    11. Re:What are you talking about? by Paul+Jakma · · Score: 2

      You're assuming the kernel is one thread =) Of course it wouldn't boost up if all kernel calls were like this:

      void myKernelFunc( long param ) { return param * 2 - 23; }


      urrgg... what balderdash.

      Look at what the LMBench benchmark is doing - in most cases it tests fairly specific OS paths. Eg a copy of data from userspace to kernel. Or a fork + exec. While these fairly shorts paths may not be "multi-threaded" the kernel itself still is.

      the NT kernel is fully pre-emptible, fully-interruptible, and re-entrant by design

      Care to explain what these mean?

      I doubt very much it's 100% preemptible and interruptible - e.g. the initial OS interrupt vector must not be interrupted, and drivers often have a requirement to turn off interrupts (usually of the one they handle, but sometimes all interrupts). And even if by some magic NT is "fully interruptible and preemptible" - that does not mean it has much of a gain. You must lock data before you can access it; the kernel might preempt one task with another, but that second task might find that data it needs is locked and so has to spin or sleep.

      Highly threaded and preemptible kernels also suffer from complexity (affects maintainability and stability), and this complexity and locking overhead can quite easily lead to /worse/ performance for many workloads.

      The linux kernel is not

      False. Linux acquired fine grained locking in 2.2, and has moved ever away from the big kernel lock since then. 2.5 is almost bkl free.

      But it also has the advantage that it makes *much* better use of SMP. (than linux you mean presumably.)

      Perhaps you may want to run some benchmarks and then come back and revisit that claim. Eg, which OS holds the SPECWeb record? Linux process/thread creation is an order of magnitude better than that of NT. etc..

      --
      I use Friend/Foe + mod-point modifiers as a karma/reputation system.
    12. Re:What are you talking about? by pVoid · · Score: 3, Informative
      the NT kernel is fully pre-emptible, fully-interruptible, and re-entrant by design

      Care to explain what these mean?

      I'll explain to you what this means:

      It means a piece of hardware raising an interrupt will launch a driver's ISR. Hardware-wise, at that point, any other hardware raising an interrupt at a lower IRQ level will not be serviced. BUT, any higher-IRQL hardware can take over that 'thread' that is servicing the first interrupt. There are *no* exceptions to this rule. In effect, your ISR is fully interruptible.

      The thread dispatcher runs at DIRQL (D=Dispatch); any IRQL higher than DIRQL is kind of beyond the concept of threads. But any thread running below DIRQL (i.e., APC or normal threads) is fully pre-emptible. There is no thread that has *any* priority of not being pre-empted anywhere in the system.
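
The preemption rule described above can be modelled in a few lines. This is a toy simulation only: the level numbers are arbitrary rather than real NT IRQL values, the class and method names are hypothetical, and holding back an interrupt at an equal level is an assumption (the comment only covers lower and higher levels).

```python
class ToyCpu:
    """Toy model: a higher-level interrupt preempts, a lower one waits."""

    def __init__(self):
        self.level_stack = [0]  # 0 stands for ordinary thread level
        self.pending = []
        self.log = []

    def raise_interrupt(self, level):
        if level > self.level_stack[-1]:
            # Outranks whatever is running: preempt it immediately.
            self.level_stack.append(level)
            self.log.append(f"enter {level}")
        else:
            # Lower (or, by assumption, equal) level: hold until later.
            self.pending.append(level)
            self.log.append(f"pend {level}")

    def finish_current(self):
        if len(self.level_stack) == 1:
            return  # nothing but the base level is running
        finished = self.level_stack.pop()
        self.log.append(f"leave {finished}")
        # Deliver the highest held interrupt if it now outranks the CPU.
        if self.pending and max(self.pending) > self.level_stack[-1]:
            next_level = max(self.pending)
            self.pending.remove(next_level)
            self.raise_interrupt(next_level)


cpu = ToyCpu()
cpu.raise_interrupt(5)  # starts servicing level 5
cpu.raise_interrupt(3)  # lower level: held back
cpu.raise_interrupt(8)  # higher level: preempts the level 5 handler
cpu.finish_current()    # level 8 done, back to level 5
cpu.finish_current()    # level 5 done, the held level 3 now runs
cpu.finish_current()
print(cpu.log)
```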

      All of the kernel is also re-entrant, which means from anywhere within the kernel, so long as you are in proper IRQL, you can call back the kernel.

      At these altitudes, or depths, whichever you wish, there are many strange beasts that you've never heard of - or never had a reason to really use - that are being used to do synchronization. Namely spin locks. Moft came up with queued spin locks a few years ago, and that was a rather Good Thing (tm). It made spin locks so much better under SMP systems.

      Now, to all the posts that say "all I need is an OS that doesn't suffer from SMP", all I have to say is why do you use SymmetricalMP in the first place??! Why not use asymmetrical processing, and just queue all interrupts on a single CPU? It'll sure as hell simplify everything, and reduce overloading the bus with those damn spin locks!

      If you're going to claim that the Linux kernel is doing a good job of an SMP system, you have to show me it's actually performing better. Not just allowing more threads to run on more processors... everyone can do that.

      Second thing is: don't flatter yourself by 'easy wins'. All I'm saying is that this is just *not* a win for linux. It's only a win for HT and multithreading... but hey, we all knew that Multithreading is a Good Thing(tm)... right?

    13. Re:What are you talking about? by pVoid · · Score: 2
      Thank you for probably the only rational post I've seen so far.

      Well, I agree with you, over the years, the gap will probably slowly close. With kernels, really, much more than the kernel, it's all the paraphernalia that counts. The countless drivers that have, or have not, been designed with pre-emptibility and interruptibility in mind.

      Even if the kernel is modified completely overnight, it will take a few years for the whole kernel mode system to catch up.

      Bottom line is, and given that this benchmark comes from IBM it really doesn't surprise me, there isn't much there to see *yet*.

    14. Re:What are you talking about? by fitten · · Score: 1

      LoL... SPECWeb...

    15. Re:What are you talking about? by alext · · Score: 2

      Moft? Not sure who they are, but I'm pretty sure Stratus VOS had these variations on locks in 1986, since they were heavily into SMP, they might have been in Multics too (like everything else).

    16. Re:What are you talking about? by fferreres · · Score: 2

      Man, some one needs to go back to 'Programming 101'. void functions do not return values.

      Man, someone needs to go back to 'Grammar 101'.

      --
      unfinished: (adj.)
    17. Re:What are you talking about? by Paul+Jakma · · Score: 2

      it means a piece of hardware raising an interrupt will launch a driver's ISR, hardware wise, at that point, any other hardware raising an interrupt at a lower IRQ level will not be serviced. BUT, any higher IRQLed hardware can take over of that 'thread' that is servicing the first interrupt. There are *no* exceptions to this rule. In effect, your ISR is fully interruptible.

      Linux doesn't do IRQ priority (beyond any irq priority done in the interrupt controller). Priorities are horrible to do correctly - priority inversion for example. Also, not all the platforms linux runs on might support irq priorities. So Linux tries to use a more abstract and portable model, which incidentally allows even better handling than the priority based system you describe (while retaining the advantage of "simplicity").

      In linux, a lock is taken relating to the IRQ number, the interrupt is acknowledged, and the driver 'handler' is run (usually with interrupts /enabled/). Hence in linux, running of a driver 'ISR' does not prevent other interrupts from being serviced (eg by another CPU if SMP). No priorities - the only rule is that the /same/ interrupt can not run concurrently. Also, linux interrupt handling is done in 2 halves (in the general Unix fashion) -> Top half and bottom half. The top half is run with the interrupt disabled - its job is to service the interrupt as fast as possible doing the minimum amount of work needed. The bottom half is run after the top-half has finished, but with the interrupt enabled. (NB: tasklets are replacing bottom-halves, but the general idea is the same. The ISR can schedule 'tasklets' to do work, when it finishes, the kernel will run the tasklets.)

      NB: a driver ISR can set SA_INTERRUPT which will mean the general kernel interrupt mechanism will call it with all interrupts disabled. (not recommended).
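
The split between the two halves can be pictured with a toy model. This is plain Python, not kernel code: `ToyIrqLine` and its methods are hypothetical names, the "line enabled" flag only mimics the rule that the same interrupt does not run concurrently with its own top half, and the deferred queue stands in for bottom halves or tasklets.

```python
from collections import deque


class ToyIrqLine:
    """Toy model of top-half / bottom-half interrupt handling."""

    def __init__(self):
        self.line_enabled = True
        self.deferred = deque()
        self.log = []

    def top_half(self, event):
        # Runs with this interrupt line disabled: do the minimum and return.
        self.line_enabled = False
        self.log.append(("top", event, self.line_enabled))
        self.deferred.append(event)  # schedule the real work for later
        self.line_enabled = True

    def bottom_half(self):
        # Runs afterwards with the line enabled, doing the deferred work.
        while self.deferred:
            event = self.deferred.popleft()
            self.log.append(("bottom", event, self.line_enabled))


line = ToyIrqLine()
line.top_half("packet 1")
line.top_half("packet 2")
line.bottom_half()
for entry in line.log:
    print(entry)
```

The log shows both top halves finishing quickly with the line masked, and the heavier processing happening afterwards with the line open again.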

      re-entrant: interrupts in linux can call certain other general kernel functions, but must be very careful, and there are certain things they must not do (and these restrictions impact very heavily on what functions they can call). However, this isn't a bad thing. A kernel that is re-entrant even from within interrupt handling is possibly one that is trying to overreach itself in its design goals. Purpose of an ISR is to do what's needed, do it fast, and get out of the way. Real work should be done outside of interrupt context - otherwise the complexities are horrendous. I.e. linux not allowing much reentrancy from interrupt context is quite possibly a very good design decision.

      spin locks: as someone points out, they are a well known and long-standing concept. However, fascinating as the concept and use of them may be - they do *not* increase performance in themselves, in fact they have an overhead. And while their use may allow a kernel to perform better by allowing concurrency, they do complicate the kernel. And their use is a tradeoff between simplicity, lack of reentrancy, efficient at serial loads/inefficient at parallel loads Vs complexity, reentrancy and inefficient at serial loads/efficiency of parallel loads. (but that efficiency is not scalar because of lock overhead - even more so because complexity overhead makes it difficult for programmers to get very fine-grained locking to be efficient and reliable).

      That said, Linux runs on systems with far more CPUs than NT does (iirc linux: SGI Origin 64node O2K with 128CPUs Vs NT on 32 CPU Unisys machines, but unisys recc'd partitioning the system, so in production use you'd run maybe 8 to 16 NT partitions with 2 to 4 CPUs each). And SGI and others have done lots of work to refine locking (and other many CPU SMP issues).

      Benchmarks: i asked first :) Its been a long while since those benchmarks came out that showed where NT was stronger than linux at SMP (very carefully targeted benchmark too). The scaling issues that threw up /benefited/ linux as the kernel hackers went and solved them.

      So show me a recent benchmark that backs up your claim.

      HT a win in the general case: not on certain workloads. :) (but in general, probably).

      --
      I use Friend/Foe + mod-point modifiers as a karma/reputation system.
    18. Re:What are you talking about? by pVoid · · Score: 2
      So show me a recent benchmark that backs up your claim

      My claim isn't really about NT. What I was saying about NT was just an FYI for the cheerleaders.

      My claim was that this article, despite its title (HT Speeds Linux), wasn't really showing any proof that linux itself was going faster.

      Obviously applications which use multiple threads are going to go faster on an HT machine, (the kernel would have to be really bad for that not to happen). But the article's numbers didn't show any speed up in linux itself. Which was my point, and the article itself, IMO, backs that claim.

      Another thing is that the NT kernel might be re-entrant, but the rules of IRQL still hold... there are many functions that can't be called at elevated IRQLevels. So, it's not as complicating the kernel as you might think.

      One last thing, the general point I'd like to stress is that kernel programming, and kernel optimizations are very much a black art, rocket science, or voodoo. It's very hard to assert whether spin locks are enough overhead to not be justifiable. And I think those kinds of discussions are way beyond the scope of this forum.

      The bottom line is this article, IMHO, was a piece of self serving specialized benchmark put out by IBM (what's so hard to believe about that?), and despite the fact that it's linux that's being talked about, I still find the numbers to be skewed to the profit of the publisher.

    19. Re:What are you talking about? by Paul+Jakma · · Score: 2

      My claim was that this article, despite its title (HT Speeds Linux), wasn't really showing any proof that linux itself was going faster.

      Read the paper. While lmbench results werent faster (and indeed lmbench by design tests very specific kernel paths) look at the other benchmarks - they show huge speedups, a lot of which will be due to parallelism in the kernel file and network paths.

      Another thing is that the NT kernel might be re-entrant, but the rules of IRQL still hold... there are many functions that can't be called at elevated IRQLevels. So, it's not as complicating the kernel as you might think.

      Well, i dont really know much about the NT kernel. So originally your point was it was so reentrant, depending on the level. Now you've qualified that to say that many functions cant be called if irq level is elevated - hence nullifying much of that supposed reentrancy. :)

      kernel programming, and kernel optimizations are very much a black art, rocket science, or voodoo

      It isnt really. Its quite well understood. Its just there's lots to it - which means most of us will never be able to understand all or even a part of modern highly-featured kernels. But there's nothing magic about it.

      I still find the numbers to be skewed to the profit of the publisher.

      So come up with some benchmarks. :)

      --
      I use Friend/Foe + mod-point modifiers as a karma/reputation system.
    20. Re:What are you talking about? by pVoid · · Score: 1
      >kernel programming, and kernel optimizations are very much a black art, rocket science, or voodoo

      It isnt really. Its quite well understood. Its just there's lots to it - which means most of us will never be able to understand all or even a part of modern highly-featured kernels. But there's nothing magic about it.

      Have a sense of humour. Of course *nothing* is magic. I was just making a point that it's beyond the programming most people learn in CompSci because it's so specific in nature. But on top of the sheer breadth of knowledge necessary, kernel mode programming has some fundamentally different ideas that aren't present in user mode programming. So it's not just the quantity.

      Well, i dont really know much about the NT kernel. So originally your point was it was so reentrant, depending on the level. Now you've qualified that to say that many functions cant be called if irq level is elevated - hence nullifying much of that supposed reentrancy. :)

      You're playing with words, it would take me far too long to explain the intricacies of what this means... And I'm not even an expert on the matter.

      See my last comment about this not being a place where such conversations would hold water... /. is essentially a bar where people meet and talk after hours over a few beers. If you really want to learn about the re-entrancy issues, read a few books about it... an 'insightful' post isn't going to cut it.

      And last but not least:

      So come up with some benchmarks. :)

      I *refuse* to come up with benchmarks in order to explain my dislikes about benchmarks. It's like saying to a movie critic, "hey, go and shoot a movie to explain yourself". Stop being defensive and viewing it as a race between good and evil. It will actually do linux more good in the end if people are critical, and don't cave in to what IBM might have to say that might just be the beginnings of a Microsoft style skewing of numbers. Because, heck, I'm positive this whole article would have been viewed in a completely different perspective had it been Microsoft posting its results.

  9. Re:Sad news -- Don Bluth dead at 65 by Anonymous Coward · · Score: 0


    I saw this also on CNN... except it wasn't Don Knuth you Do-Do. It was Don Bluth of animation fame... All Dogs Go To Heaven, Titan AE, Dragons Lair, etc.

  10. Only Threads ? by makapuf · · Score: 2, Insightful

    I know, there might be many places where it has been discussed before, but could someone please tell me if HT is only for threading or can it be used for processes, too.
    And I know, they are essentially the same syscall under linux, and might be faster, b/c of synchronization issues wrt to the memory access IIRC ...

    1. Re:Only Threads ? by lederhosen · · Score: 1

      They can be used for processes as well, but
      the real advantage is when used by two threads
      belonging to the same process, then it will not
      trash the caches.

    2. Re:Only Threads ? by Wesley+Felter · · Score: 2

      Threads and processes are almost the same thing in Linux, so HT benefits them both.

  11. HT - who cares? by Anonymous Coward · · Score: 0

    We are talking about 3.06GHz processors here, as far as desktop systems go.

    If my 500MHz had it, that would be cool.

    1. Re:HT - who cares? by Anonymous Coward · · Score: 0

      There's a 500MHz Pentium IV? OMG that thing must run like a dog!

  12. 51% speed-up! by core+plexus · · Score: 5, Interesting
    An excellent, detailed article. For those in a hurry:

    "Conclusion
    Intel Xeon Hyper-Threading is definitely having a positive impact on Linux kernel and multithreaded applications. The speed-up from Hyper-Threading could be as high as 30% in stock kernel 2.4.19, to 51% in kernel 2.5.32 due to drastic changes in the scheduler run queue's support and Hyper-Threading awareness."

    My questions: What's the downside? Is AMD doing anything similar?

    Fight with computer brings SWAT team

    1. Re:51% speed-up! by PCM2 · · Score: 5, Informative

      The downside is that for code that isn't SMP/HT-aware, performance can actually degrade. Tom's Hardware ran tests of hyperthreading on the 3.06GHz P-4, and in almost every case, it performed better with hyperthreading disabled.

      --
      Breakfast served all day!
    2. Re:51% speed-up! by Zathrus · · Score: 2

      What's the downside?

      Well, if your apps aren't multi-threaded then they can't make use of it. If you don't run enough CPU-intensive processes on the box, it won't buy you anything and may actually hurt you.

      If you look at the benchmarks not all the numbers are in the positive realm... although if you exclude the sync read/write numbers then it's generally a rather small difference.

      Is AMD doing anything similar?

      Not to my knowledge. They're betting the farm on Opteron/Athlon64.

    3. Re:51% speed-up! by Julian+Morrison · · Score: 2

      The downside is that for code that isn't SMP/HT-aware, performance can actually degrade.

      How many modern programs use no kernel threads / multiple processes at all? Not many I'm guessing.

    4. Re:51% speed-up! by dcmeserve · · Score: 1

      > My questions: What's the downside?

      Mostly, the cache size per "virtual cpu" is cut in half. So no individual thread of execution (including all non-threaded programs) is ever going to be able to use the entire cache.

      > Is AMD doing anything similar?

      I haven't heard of anything yet. I don't know how flexible the spec for the K9 architecture still is, but perhaps it might be slipped in if there's sufficient market demand. If it's not in there already.

      Though here's another angle: if Hammer/Opteron processors are sufficiently cheaper than equivalent-performance Intel cpu's, then you might be able to have an *actual* dual-processor machine for not a lot more money than a systems with a single HT'ing pentium.

      --
      "Orthodoxy is unconsciousness" - Orwell
    5. Re:51% speed-up! by tshak · · Score: 3

      Although threading is popular for server based apps, for normal desktop apps threads should be used lightly or not at all. Take an MP3 encoder for example. Sure, the MP3 encoder will launch a thread to update the status bar, but the real CPU hog is the encoding itself, which is done in a single thread. According to Tom Pabst, in this scenario the MP3 encoding will perform slower than on a non-HT proc.

      Also, consider another big performance hog, games. Although a Game Server may take advantage of HT, I don't think (and this is purely speculation based on _minimal_ 3D engine programming experience) it would be a good idea for games to use threads. Threads carry overhead, and they also can make your codebase difficult to manage.

      --

      There is no longer anything that can be done with computers that is nontrivial and clearly legal. -- Paul Phillips
    6. Re:51% speed-up! by neurojab · · Score: 2

      Those tests are very end user (luser) specific. Yes it's true that the majority of people running games on Windows won't benefit a bit from HT... neither would those people benefit from ordinary SMP. That's why Intel left HT disabled in all P4 models until recently. Server workloads are very different, where every decent app makes heavy use of threads, and therefore benefits much more from HT (and SMP). The IBM tests are pertinent to servers and some power users.

    7. Re:51% speed-up! by Anonymous Coward · · Score: 0

      Like all of Tom's benchmark tests, it was flawed. Many of the apps he ran are single threaded, so there is a very small performance boost if any. The same result would apply if you ran a plain dual proc system against a single CPU system on a single threaded app.

      Running the tests as they did is as stupid as running Q3 benchmarks for a hard drive.

    8. Re:51% speed-up! by Anonymous Coward · · Score: 1

      for normal desktop apps threads should be used lightly or not at all.

      You must live in 1990. Or are using Unix. Anyway, here's my taskman:

      iexplore.exe 20 threads
      outlook.exe 17 threads
      explorer.exe 14 threads
      phoenix.exe 13 threads

      The fact is that improving threading performance improves responsiveness which improves user-perceived speed, if not wallclock benchmarks.

      Just sit down at a modern threaded environment (eg Windows) running a X mhz CPU and then at one with 2 CPUs at X/2 mhz. The second one may well feel 'faster'.

  13. Application dependant by PaschalNee · · Score: 5, Insightful

    The pretty detailed (for me anyway) article on Ars Technica concludes that performance on a HyperThreaded CPU will be very much dependant on the application mix. While research like this is useful it will probably always be a try and see scenario.

  14. HT hurt perf by steelerguy · · Score: 5, Interesting

    Tested HT running couple large jobs on a 2 CPU box with each process using over a GB of RAM. Performance went down.

    Also HT can play havoc with a openMosix cluster since processes can start being migrated around to CPU's that do not really exist and appear to have no load, yet the physical CPU may be 100% loaded in reality.

    It is not all peaches and cream.

    1. Re:HT hurt perf by shaka999 · · Score: 1

      We have also done significant testing on a dual processor box. If, on the two processor box, we ran 4 processes and compared to 4 without HT then things got a nice performance boost. If we only ran two or three then performance went down.

      The problem is the kernel doesn't know what is a real processor and what isn't. When you look at the 2 process comparison we often saw both processes running on the same box. I've heard that a patch is on the way to fix the scheduler so this doesn't happen.

      --
      One should not theorize before one has data. -Sherlock Holmes-
    2. Re:HT hurt perf by Zathrus · · Score: 3, Informative

      processes can start being migrated around to CPU's that do not really exist and appear to have no load, yet the physical CPU may be 100% loaded in reality

      The article indicates that they're fixing this in the 2.5 branch. Lots of additional patches to the scheduler to let it comprehend the difference between physical and logical processors and do the Right Thing with them.

      Oh, and if you're running a 2 CPU box with only a couple (as in two) large jobs then no, you won't see a performance gain. You already have 1 CPU/process and HT would just be additional overhead.
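
      A rough sketch of why physical/logical awareness matters for placement, in Python. This is a toy model, not the 2.5 scheduler: it assumes 2 physical CPUs with 2 logical CPUs each, numbered so that siblings are adjacent (an assumption chosen only to make the problem visible), and the names package_of, naive_pick, ht_aware_pick and place are hypothetical.

```python
# Toy placement policy, assuming 2 physical CPUs with 2 logical CPUs each.
# Logical CPUs 0 and 1 share package 0, and 2 and 3 share package 1. This
# numbering and all function names are assumptions for illustration only.
LOGICAL_PER_PACKAGE = 2
LOGICAL = 4


def package_of(cpu):
    """Physical package that a logical CPU belongs to."""
    return cpu // LOGICAL_PER_PACKAGE


def naive_pick(busy):
    """HT-unaware: take the lowest-numbered idle logical CPU."""
    return next(c for c in range(LOGICAL) if c not in busy)


def ht_aware_pick(busy):
    """Prefer an idle logical CPU whose whole physical package is idle."""
    busy_packages = {package_of(c) for c in busy}
    idle = [c for c in range(LOGICAL) if c not in busy]
    for c in idle:
        if package_of(c) not in busy_packages:
            return c
    return idle[0]  # every package already has work, so share one


def place(n_jobs, pick):
    """Place n_jobs one at a time and return the logical CPUs used."""
    busy = set()
    for _ in range(n_jobs):
        busy.add(pick(busy))
    return sorted(busy)


print(place(2, naive_pick))     # both jobs share package 0
print(place(2, ht_aware_pick))  # one job per physical package
```

      With two jobs, the naive policy stacks both on one physical processor while the other sits idle, which is the "performance went down" case described above.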

    3. Re:HT hurt perf by pizza_milkshake · · Score: 2

      the "unbalance" problem between physical/virtual cpus was mentioned in the article, and it was said that as of 2.5.32 it was addressed.
      i guess it depends what apps you're running; from the article it looks like (web|file|db)servers (and a kernel that runs a smarter scheduler than 2.4.17 had) might be able to squeeze out a little (~30%) performance gain.

    4. Re:HT hurt perf by steelerguy · · Score: 1

      I should have indicated that the jobs are limited to 2 per machine due to the fact they use a lot of RAM, up to 2 GB. So in our case it is more of a memory limitation that prevents us from running additional jobs. If we ran 3 or 4 processes the additional performance from HT would be more than cancelled out by the swapping we would start to see. In cases like this, the additional overhead ends up slowing the jobs down rather than helping.

      I can certainly see a boost in performance for a webserver running many httpd's or something similar to that.

      Also just wanted to mention the clustering problems involved with HT since many people were starting to talk about Beowulf's.

    5. Re:HT hurt perf by jelle · · Score: 2

      I didn't see a mention of OpenMosix in the article, and the original poster worried specifically about process migration in an openmosix cluster.

      --
      --- Hindsight is 20/20, but walking backwards is not the answer.
  15. Useful for development? by NixterAg · · Score: 5, Insightful

    Like most development shops, we do a great deal of development for multiprocessor machines so we write a lot of multithreaded code. Multithreaded code creates a whole host of new debugging pitfalls that don't show up if the developer is debugging on a single processor workstation. As John Robbins says in his terrific Debugging Applications book, if you are developing a multithreaded application, you better be certain you are doing your debugging in a multiprocessor environment.

    From a development standpoint, will a hyperthreaded chip provide an adequate environment in duplicating the behavior of a multi-processor PC well enough that shops can buy cheaper, one CPU machines for development and still be confident in their results? I'm guessing nothing will replace the real thing but I'd be interested in any commentary.

    1. Re:Useful for development? by yakovlev · · Score: 1

      It's somewhere in-between. You get the "anything can happen in any order" that you would get with a modern processor, but cache synchronization bugs can be hidden. This is because the two threads share a common cache, and so there is minimal latency between when one thread does a cache write and when the other thread has that data in its cache, hiding bugs in the synchronization code that prevents using stale cache data.

      Most likely you'd catch any bugs that were in a higher-level language just fine, but you might miss bugs at the assembly level. For instance, you'd catch bugs in the code that uses a mutex, but might not catch bugs in the code that implements said mutex.
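
      For the "code that uses a mutex" case, a minimal Python sketch looks like the following. It only illustrates the high-level pattern: the names counter, lock and worker are made up, and CPython threads do not exercise the cache-level behaviour discussed above, so treat it as a picture of the kind of code in question rather than an HT test.

```python
import threading

counter = 0
lock = threading.Lock()


def worker(n):
    """Increment the shared counter n times, holding the lock each time."""
    global counter
    for _ in range(n):
        # The increment is a read-modify-write, so it is guarded by the lock.
        with lock:
            counter += 1


threads = [threading.Thread(target=worker, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000, since every increment was made under the lock
```

      Dropping the lock is the sort of high-level mistake that concurrent execution tends to expose; a bug inside the lock implementation itself is the kind that a shared cache could hide.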

    2. Re:Useful for development? by kEnder242 · · Score: 1

      yes, it helps. Stumbled on this a while ago... interesting.

      http://www.virtualdub.org/

      "Intel Corporation has graciously given me a 3.06GHz Pentium 4 with HyperThreading Technology ..."

      "This wouldn't be a new release, of course, without a little something for everyone else too. As it turns out, the HyperThreaded CPU exposed non-atomic synchronization code in the playback routine, and so this version fixes random lockups during playback on any SMP or HT-capable system. (A rather neat feature of HyperThreading is that you find all the mistakes in your threading code without having a second CPU do nothing all the time other than run WinAmp.) The VTune 6.0 profiler also spotted an unaligned row buffer in the resize routine, which should execute a little faster now. I fixed a bug that made the copy construction support in the filter API unusable, and fixed the directory bug that everyone's been telling me about in the Save Image Sequence command. I'm sorry I wasn't able to squish some of the other bugs or missing features that still exist, but I wanted to get the P4 version and the above critical fixes out first."

      --
      my associative arrays can kick your hash - TCL
  16. Re:excellent by pVoid · · Score: 2
    You are quite the Lame one.

    if you had read the article, you would have seen that the kernel doesn't show too many signs of superb HT usage. In fact, performance degrades in many places.

    Also, if you knew just an itsy bit about kernels, you would know that Microsoft has done some pretty good advancements and achievements in the SMP realm.

  17. Humph! by airrage · · Score: 4, Funny

    Well if I must say something, it's this: that's really going to put a fancy how-do-you-do in the knickers of all those pay-per-processor software types. I mean Oracle, for heaven's sake, is going to have to go absolutely bonkers trying to figure out how to screw the light-bulb into that buffalo (if you pardon my french). I mean what's a megalomaniac to do? I mean I've got expenses! I've got tricarbonfiberalloy yacht hulls to pay for! Can't have people going around trying to process code in a processor without us getting some slice of that monkey, I'll tell you right here and now sir! No sir! Maybe you're not patriotic enough. Trying to cut corners, eh gov'nor? Now I'm gonna have to go and rewrite all the contracts stating explicitly that "processor" is defined as a virtual space for processing. Yes that ought to do it. But I'll still have to have the lawyers check it, just to make sure there aren't any loopies. Drats those lawyers! Taking all my money too!

    --
    "This isn't a study in computer science, its a study in human behavior"
    1. Re:Humph! by estoll · · Score: 1

      Here is what Microsoft has to say about it.

      Operating System

      Microsoft Windows-Based Servers and Intel Hyper-Threading Technology

      By John Borozan
      Microsoft Corporation
      Updated: April 2002

      Abstract
      This article provides an overview of how the Microsoft® Windows® Server operating system works with Intel® Hyper-Threading technology. It explains the implications for performance, compatibility, and licensing.

      This is a preliminary document and may be changed substantially prior to final commercial release of the software described herein.
      The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.
      This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, AS TO THE INFORMATION IN THIS DOCUMENT.
      Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.
      Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.
      The example companies, organizations, products, people and events depicted herein are fictitious. No association with any real company, organization, product, person or event is intended or should be inferred.
      © 2002. Microsoft Corporation. All rights reserved.
      Microsoft, Windows, and the Windows logo are registered trademarks of Microsoft Corporation in the United States and/or other countries.
      The names of actual companies and products mentioned herein may be the trademarks of their respective owners.

      Contents
      Acknowledgements
      Introduction
      What is Hyper-Threading Technology?
      Windows 2000 Server Family and Hyper-Threading Technology
      Windows .NET Server Family and Hyper-Threading Technology
      Windows Server Applications and Hyper-Threading Technology
      Windows Server Performance on Processors with Hyper-Threading Technology
      Windows Server Licensing on Systems Enabled with Hyper-Threading Technology
      Frequently-Asked Questions
      Appendix
      Related Links

      Acknowledgements
      Bob Ellsworth, Brad Waters, Bruce Worthington, Carla Huffman, Hiroyuki Suzuki, Jim Livingston, Luisa Vacca, Mark Wood, Maurice Franklin, Peter Conway, Peter Johnston, Sean McGrane, Sunil Koduri, Velle Kolde, Wilhelmina Duyvestyn, William Lyon, Bryan Sutton, and John Kaiser.
      Microsoft Corporation

      Introduction
      What is Hyper-Threading Technology?
      Intel's Hyper-Threading Technology allows a single physical processor to execute multiple threads (instruction streams) simultaneously, potentially providing greater throughput and improved performance.
      Intel will introduce Hyper-Threading Technology in their Intel® Xeon(TM) processor family for servers in the first quarter of 2002. For more information, see Intel Demonstrates Breakthrough Processor Design at http://developer.intel.com/pressroom/archive/releases/20010828comp.htm.
      These processors will contain two architectural states on a single processor core, making each physical processor act as two logical processors for the operating system. However, the two logical processors will still share the same execution resources of the processor core, so performance gains do not approximate having two complete, physical processors. For more information, see Introduction to Hyper-Threading Technology at http://developer.intel.com/technology/hyperthread/download/25000802.pdf.
      Hyper-Threading Technology complements symmetric multi-processing (SMP) by allowing more threads to execute simultaneously per processor.
      How Do Windows-Based Servers Recognize Processors with Hyper-Threading Technology?
      Windows-based servers receive processor information from the BIOS. Each server vendor creates their own BIOS using specifications provided by Intel.
      Assuming the BIOS is written according to Intel specifications, it begins counting processors using the first logical processor on each physical processor. Once it has counted a logical processor on all of the physical processors, it will count the second logical processor on each physical processor, and so on, as shown in Figure 1.

      It is critical that the BIOS count logical processors in the manner described; otherwise, Windows 2000 or its applications may use logical processors when they should be using physical processors instead. For example, consider an application that is licensed to use two processors on the system diagrammed in Figure 1. Such an application will achieve better performance using two separate physical processors (such as 1 and 2) than it would using two logical processors on the same physical processor (such as 1 and 5).
      Note: The numbers used in these diagrams reflect the order in which logical processors are recognized by the BIOS and used by Windows; they do not reflect the actual processor numbers reported by the operating system.
      Windows 2000 Server Family and Hyper-Threading Technology
      Windows 2000 Server does not distinguish between physical and logical processors on systems enabled with Hyper-Threading Technology; Windows 2000 simply fills out the license limit using the first processors counted by the BIOS. For example, when you launch Windows 2000 Server (4-CPU limit) on a four-way system enabled with Hyper-Threading Technology, Windows will use the first logical processor on each of the four physical processors, as shown in Figure 2; the second logical processor on each physical processor will be unused, because of the 4-CPU license limit. (This assumes the BIOS was written according to Intel specifications. Windows uses the processor count and sequence indicated by the BIOS.)

      However, when you launch Windows 2000 Advanced Server (8-CPU limit) on a four-way system enabled with Hyper-Threading Technology, Windows will use all eight logical processors, as shown in Figure 3.

      Although Windows recognizes all eight logical processors in this example, in most cases performance would be better using eight physical processors.
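
      The counting order and the Windows 2000 behaviour described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Microsoft's or Intel's code: the function names bios_order and windows_2000_usable are hypothetical, each physical processor is assumed to expose two logical processors, and indices are zero-based, whereas the figures number processors from 1.

```python
def bios_order(physical, logical_per_physical=2):
    """Enumerate logical CPUs in the order the text describes: the first
    logical CPU on every physical processor, then the second on each.
    Returns (physical_index, logical_index) pairs, both zero-based."""
    return [(p, l)
            for l in range(logical_per_physical)
            for p in range(physical)]


def windows_2000_usable(physical, license_limit):
    """Windows 2000 per the text: use the first license_limit entries."""
    return bios_order(physical)[:license_limit]


order = bios_order(4)
server = windows_2000_usable(4, 4)    # Windows 2000 Server, 4-CPU limit
advanced = windows_2000_usable(4, 8)  # Advanced Server, 8-CPU limit

print(order)
print(server)
print(len(advanced))
```

      In this ordering the first and fifth entries land on the same physical processor, matching the "1 and 5" example, and a 4-CPU limit ends up with one logical processor on each of the four physical processors.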

      Windows .NET Server Family and Hyper-Threading Technology
      When examining the processor count provided by the BIOS, Windows .NET Server distinguishes between logical and physical processors, regardless of how they are counted by the BIOS. This provides a powerful advantage over Windows 2000, in that Windows .NET Server only treats physical processors as counting against the license limit. For example, if you launch Windows .NET Standard Server (2-CPU limit) on a two-way system enabled with Hyper-Threading Technology, Windows will use all four logical processors, as shown in Figure 4.

      Note: This reflects features defined at the Beta 3 release of the Windows .NET Server Family. CPU limits and product offerings are subject to change prior to final release. For more information, go to http://www.microsoft.com/windows.NETserver/evaluation/choosing/default.asp

      This example illustrates the great benefit provided by Windows .NET Server on systems enabled with Hyper-Threading Technology--customers are able to harness the processing power of four logical processors using a 2-CPU license.
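
      The difference between the two licensing behaviours can be sketched as a small Python comparison. The function names win2000_usable and dotnet_usable are hypothetical, two logical processors per physical processor are assumed, and the Windows 2000 case additionally assumes the BIOS counts processors in the Intel-specified order.

```python
def win2000_usable(physical, license_limit, logical_per_physical=2):
    """Windows 2000 per the text: every logical CPU counts against the limit.
    Returns (physical processors used, logical processors used)."""
    total_logical = physical * logical_per_physical
    used_logical = min(total_logical, license_limit)
    # With the Intel-specified BIOS order, each physical processor gets one
    # logical CPU in use before any physical processor gets a second one.
    used_physical = min(physical, used_logical)
    return used_physical, used_logical


def dotnet_usable(physical, license_limit, logical_per_physical=2):
    """Windows .NET Server per the text: only physical CPUs count."""
    used_physical = min(physical, license_limit)
    return used_physical, used_physical * logical_per_physical


print(win2000_usable(4, 4))  # four-way system, 4-CPU limit
print(dotnet_usable(2, 2))   # two-way system, 2-CPU limit
```

      On the four-way system Windows 2000 Server stops at four logical processors, while a 2-CPU Windows .NET license on a two-way system still reaches all four logical processors.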
      Windows Server Applications and Hyper-Threading Technology
      Regardless of whether an application has been specifically designed to take advantage of Hyper-Threading Technology, or even whether the application is multi-threaded, Intel expects the existing body of applications in the market today to run correctly on systems enabled with Hyper-Threading Technology without further modification, and without being recompiled. For more information, see Introduction to Hyper-Threading Technology at http://developer.intel.com/technology/hyperthread/download/25000802.pdf.

      Windows Server Performance on Processors with Hyper-Threading Technology
      Intel has published several benchmarks demonstrating improved performance for Windows-based servers equipped with Intel® Xeon(TM) and Intel® Xeon(TM) MP processors. For more information, see Intel® Xeon(TM) Processors - Performance Indicators at http://developer.intel.com/design/xeon/perfbref/index.htm. Microsoft expects performance to vary depending on the application, system configuration, and version of Windows that is used.
      Although Windows 2000 is compatible with Hyper-Threading Technology, we expect customers will get the best performance from Hyper-Threading Technology using Windows .NET Server. This is because the Windows .NET Server Family is engineered to take full advantage of the logical processors created by Hyper-Threading Technology.
      Windows Server Licensing on Systems Enabled with Hyper-Threading Technology
      Windows Server licensing is based on the number of physical processors on a system. For more information, see Processors with Hyper-Threading Technology at http://www.microsoft.com/business/downloads/licensing/hyper_threading_processors.doc.
      Because Windows 2000 Server does not distinguish between physical and logical processors, Windows 2000 simply fills out the license limit using the first processors counted by the BIOS. For example, consider the system diagrammed in Figure 1. When launching Windows 2000 Server (4-CPU limit) on this system, Windows will use logical processors 1-4 and disregard logical processors 5-8.
      In contrast, Windows .NET Server distinguishes between logical and physical processors, regardless of the way they're counted by the BIOS. Consider the system diagrammed in Figure 4 earlier. When launching Windows .NET Standard Server (2-CPU limit) on this system, Windows will use all four logical processors (1-4).
      Note: This reflects features defined at the Beta 3 release of the Windows .NET Server Family. CPU limits and product offerings are subject to change prior to final release. For more information, see http://www.microsoft.com/windows.NETserver/evaluation/choosing/default.asp.
      The table in the Appendix indicates the number and type of processors used by Windows-based servers on various systems.

      Frequently-Asked Questions
      Q: What is the difference between Hyper-Threading Technology and Jackson Technology (JT)?
      A: Jackson Technology was the Intel code name for Hyper-Threading Technology.
      Q: What is the difference between processors that contain Hyper-Threading Technology and multiple-core processors?
      A: Processors enabled with Hyper-Threading Technology share more resources (for example, execution resources) on the same physical processor than do multiple-core processors. For example, multiple-core processors have separate execution units and first-level memory caches.
      Q: Which applications will work with Hyper-Threading Technology?
      A: Based on tests with off-the-shelf applications, Intel expects the existing body of applications in the market today to run correctly on systems enabled with Hyper-Threading Technology without further modification or recompiling. For more information, see Introduction to Hyper-Threading Technology at http://developer.intel.com/technology/hyperthread/download/25000802.pdf. Check with your application vendor for further details.
      Appendix
      The following table indicates the maximum number of physical/logical processors used by Windows-based servers on systems enabled with Hyper-Threading Technology.

      Windows Server Product           2-way               4-way               8-way
                                       Physical  Logical   Physical  Logical   Physical  Logical
      Windows 2000 Server              2         4         4         4         n/a       n/a
      Windows 2000 Advanced Server     2         4         4         8         8         8
      Windows 2000 Datacenter Server   n/a       n/a       4         8         8         16
      Windows .NET Web Server          2         4         2         4         n/a       n/a
      Windows .NET Standard Server     2         4         2         4         n/a       n/a
      Windows .NET Enterprise Server   2         4         4         8         8         16
      Windows .NET Datacenter Server   n/a       n/a       4         8         8         16

      Windows Server Product           16-way              24-way              32-way
                                       Physical  Logical   Physical  Logical   Physical  Logical
      Windows 2000 Server              n/a       n/a       n/a       n/a       n/a       n/a
      Windows 2000 Advanced Server     n/a       n/a       n/a       n/a       n/a       n/a
      Windows 2000 Datacenter Server   16        32        24        24        32        32
      Windows .NET Web Server          n/a       n/a       n/a       n/a       n/a       n/a
      Windows .NET Standard Server     n/a       n/a       n/a       n/a       n/a       n/a
      Windows .NET Enterprise Server   n/a       n/a       n/a       n/a       n/a       n/a
      Windows .NET Datacenter Server   16        32        24        24        32        32

      To ensure optimal performance, Hyper-Threading is disabled on partitions larger than 16 processors.

      Note: This reflects features defined at the Beta 3 release of the Windows .NET Server Family. CPU limits and product offerings are subject to change prior to final release. For more information, see http://www.microsoft.com/windows.NETserver/evaluation/choosing/default.asp.
      Related Links
      See the following resources for further information:
      Processors with Hyper-Threading Technology at http://www.microsoft.com/business/downloads/licensing/hyper_threading_processors.doc
      Intel® Xeon(TM) Processors - Performance Indicators at http://developer.intel.com/design/xeon/perfbref/index.htm
      Intel Demonstrates Breakthrough Processor Design at http://developer.intel.com/pressroom/archive/releases/20010828comp.htm.
      Introduction to Hyper-Threading Technology at http://developer.intel.com/technology/hyperthread/download/25000802.pdf.
      For the latest information about Windows 2000 Server, see the Windows 2000 Server Web site at http://www.microsoft.com/windows2000/server.
      For the latest information about Windows .NET Server, see the Windows .NET Server Web site at http://www.microsoft.com/windows.netserver.

      --
      http://www.askthevoid.com
    2. Re:Humph! by Anonymous Coward · · Score: 0

      Oracle is playing their usual games. However many CPUs the OS sees in the box is how many you have to license. Microsoft (and hopefully others) is taking the saner route: a logical processor is not an actual processor, and you don't have to license it in a per-CPU model (e.g. SQL Server).

      Fortunately you can always turn off hyperthreading in the bios if you are running apps that can't take advantage or vendors who will take advantage.

  18. Re:excellent by stratjakt · · Score: 2, Informative

    Yeah

    SMP support has existed since NT 4.

    If you use NT 4 MP edition, 2k Pro or XP pro, HT just works if you have the hardware.

    Linux had to change to accommodate it, as it bypasses the original system BIOS with its own code.

    So what you meant to say was "once again Linux plays catchup to MicroSoft, but only about a year or so later this time, and not 5-10."

    --
    I don't need no instructions to know how to rock!!!!
  19. Underwhelmed by snitty · · Score: 1

    Seems great for server applications but not so great for desktop usage. It would be nice to see some info on compile times.

    --
    Modular Redundancy--Because 4 out of 5 Nodes agree
    1. Re:Underwhelmed by pVoid · · Score: 3, Informative
      Compilers are notoriously single threaded monsters.

      On Microsoft platforms, CL.exe goes file by file and outputs. It's a linear operation, so HT makes no difference for compiling.

      However, the NT DDK has a cool feature that allows you to spawn as many instances of CL as there are processors. Which, you guessed it, is only of any use if you are compiling tens or hundreds of files.

      Sorry, I do not really know of compiler internals for *NIX. Maybe someone can back me up? or clear it up?

    2. Re:Underwhelmed by stratjakt · · Score: 2, Insightful

      It's not so great; if you need SMP you still can't beat two or more physical CPUs.

      In this scheme, the pipeline is split into two and two concurrent threads run in it. Which is pretty neat, but hurts performance in some situations.

      - Cache latency is basically doubled, as two VCPUs now fight over access to the cache

      - Pipeline depth is shortened for either given VCPU, which hurts code that was optimized for the longer pipelines (lots of matrix math, MMX stuff).

      It's a cool development in CPU design, but it has a ways to go, and the OS needs to be aware of it. You should be able to 'shut it off' in code on the fly, if you want to dedicate 100% real CPU to a given task.

      --
      I don't need no instructions to know how to rock!!!!
    3. Re:Underwhelmed by Xerithane · · Score: 3

      Sorry, I do not really know of compiler internals for *NIX. Maybe someone can back me up? or clear it up?

      With gcc, the -j option will set up gcc to utilize SMP. You specify the number of processors you physically have. I do not know how it would work with HT, and I didn't RTFA to see if they covered it. There is native support inside of gcc for SMP-based compiling though.

      --
      Dacels Jewelers can't be trusted.
    4. Re:Underwhelmed by betaray · · Score: 1

      GCC is not threaded (nor am I aware of any threaded compilers, but I'm sure someone will bring one up just to prove me wrong).

      However, make files have supported concurrent build steps for quite a while. Try make -j4 on your linux kernel compile to "jump" ahead and build up to four files at once.

    5. Re:Underwhelmed by JoeBuck · · Score: 2

      -j is an option for GNU make, not gcc. And there is no rule that says you must specify the number of processors you physically have; for big compiles, you'll get a somewhat better time if you say -j2 on a single-processor machine. This is because, when two gcc's run in parallel, one can take the processor while the other is waiting for disk.

      There is no native support inside of gcc for SMP-based compiling. gcc itself is completely sequential. You are perhaps thinking of parallel makes.

    6. Re:Underwhelmed by danish · · Score: 2
      With gcc, the -j option will set up gcc to utilize SMP. You specify the number of processors you physically have. I do not know how it would work with HT, and I didn't RTFA to see if they covered it. There is native support inside of gcc for SMP-based compiling though.

      Um, no, not quite. You pass the -j option to make. make will then go through your makefile, and assuming you wrote it right, run specified commands (like gcc) in parallel. You have to be careful about target dependencies when doing this, though. And this parallelization is even useful on uniprocessor machines, as if you use make -j2 you will get some gain in time in a big compile because while one gcc is doing I/O, the other can be using the CPU and compiling.

      Just to be a pedant,

      -chris

    7. Re:Underwhelmed by hackstraw · · Score: 2

      Maybe this is a feature of newer releases of gcc, but I've never heard of -j doing auto SMP. There is a -j option for parallel makes with gnu make, but this is only for the compilation and not runtime.

      The Portland Group compiler and the Intel compiler do support some auto-parallelization via OpenMP and threads.

    8. Re:Underwhelmed by cant_get_a_good_nick · · Score: 2

      Typo, I assume you meant GNUmake -j, not gcc.

    9. Re:Underwhelmed by sholden · · Score: 2

      Gcc has no -j option. Make has a -j option.

      Which has nothing to do with SMP, it's simply how many jobs make will run simultaneously, which of course is a wise thing to use in a multi processor environment, but also a good thing to use in places where the CPU waits for IO (ie. if your code and output is stored on a disk).....
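
      The "how many jobs run simultaneously" idea above can be sketched as a tiny job runner in Python. This is not how GNU make is implemented; parallel_build, the targets dictionary, and the run callback are hypothetical names for the illustration. It starts a target only once its prerequisites are built and never runs more than `jobs` commands at once.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def parallel_build(targets, jobs, run):
    """Build every target, at most `jobs` at a time.

    targets: dict mapping a target name to a list of prerequisite names.
    run:     callable invoked as run(name) to build one target.
    Returns the names in the order they finished.
    """
    remaining = {name: set(deps) for name, deps in targets.items()}
    running = {}      # future -> target name
    finished_order = []
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        while remaining or running:
            # Launch every target whose prerequisites are all built.
            ready = [name for name, deps in remaining.items() if not deps]
            for name in ready:
                del remaining[name]
                running[pool.submit(run, name)] = name
            if not running:
                raise ValueError("dependency cycle or unknown prerequisite")
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                future.result()  # re-raise any build error
                finished_order.append(name)
                for deps in remaining.values():
                    deps.discard(name)
    return finished_order
```

      With jobs=2 and two independent object files, both compile steps can overlap (one waiting on disk while the other uses the CPU), and the link step still waits for both.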

    10. Re:Underwhelmed by dcmeserve · · Score: 2, Insightful

      > Cache latency is basically doubled, as two VCPUs now fight over access to the cache

      I'm pretty sure this is wrong -- cache latency isn't doubled; the SIZE is HALVED. The two threads access two different virtual caches. Trying to get them to contend for a single cache would be an architectural nightmare.

      Though I believe it's still one physical cache -- which means that the latency is going to be higher than what you'd expect for a cache of its apparent size.

      > Pipeline depth is shortened for either given VCPU, which hurts code that was optimized for the longer pipelines (lots of matrix math, MMX stuff).

      I don't actually know about the pipeline, but I suspect this is wrong too: shortening the pipeline (reducing the number of stages) is a fundamental change in the architecture; a pipeline isn't something you can cut in half and give the front end to one process and the back end to another. Each stage is quite unique.

      Now if you mean that the latest Pentiums have a shorter pipeline than previous incarnations, then maybe that's right (though I'd doubt it -- they're always *lengthening* the pipeline to get those higher GHz numbers). But that would have nothing to do with Hyperthreading.

      --
      "Orthodoxy is unconsciousness" - Orwell
  20. Hyper(Space)Threading by LookSharp · · Score: 5, Funny

    If you overclock the Xeons (And newer P4 CPUs) too high...

    "Prepare to go to HyperThread."

    "Go to HyperThread!"

    *WHOOSH*

    "My God, they've gone plaid!"

    (Just to keep on topic, this is a very informative shootout between HT/non-HT Intel and AMD SMP processors setups here.)

    Just couldn't resist the Spaceballs reference, tho!

    1. Re:Hyper(Space)Threading by The+Evil+Couch · · Score: 5, Funny

      so you're saying AMD's response is going to be to go to LudicrousThreading?

    2. Re:Hyper(Space)Threading by Anonymous Coward · · Score: 0

      That's why you don't overclock at "ludicrous speed"

    3. Re:Hyper(Space)Threading by Scooter · · Score: 2

      Well, it ain't like dusting crops.

  21. Re:excellent by dabraun · · Score: 1

    Win2K was capable of making use of hyperthreaded processors, though not aware of the difference between a hyperthreaded processor and 2 physical processors. Windows XP is aware of the difference and makes the right choices about thread prioritization / processor affinity - and licensing (XP Pro, which supports "2 processors" will still support "4 virtual processors" on a hyperthreaded machine.) There is nothing new about Linux supporting Hyperthreading. David

  22. Re:Sad news -- John Belushi dead in 82 by Anonymous Coward · · Score: 0

    Say it aint so!!

  23. Executive summary... by guido1 · · Score: 3, Informative

    Hyperthread support vs not.

    Standard API calls (w/ hyper thread) Increase (a bad thing (tm)) of latency of calls by 1-6%.

    STD workload (w/ hyper thread) Increase in throughput an average of 5-10%. Disk writes decreased throughput by 30%.

    Client network perf: "Chat room" test, increase of throughput 22-28%.

    Server network perf: File serving, increase of 9-31%.

    Kernal 2.5.24 roughly doubles the above benefits.

    Looks like no real downfalls... (How often are you running a single thread? Me either.)

  24. Re:IT'S TROLL TUESDAY AT SUBWAY, ER, SLASHDOT!!!1! by airrage · · Score: 1, Offtopic

    Slashdot must have a very large varchar size on the comment_txt field in their db, or are they just saving as a blob?

    --
    "This isn't a study in computer science, its a study in human behavior"
  25. In other news... by dirvish · · Score: 3, Insightful

    Hyper-Threading Speeds Windows

  26. How does HT compare vs SMP? by sboyko · · Score: 1

    Is there any comparison between a single, HyperThreaded, chip and two chips multiprocessed with SMP? I assume the results would be very similar.

    --
    SCO, Microsoft, P2P, what's your hot button?
    1. Re:How does HT compare vs SMP? by stratjakt · · Score: 1

      No, the SMP setup would thoroughly trounce the HT chip.

      In SMP each chip has its own cache; with HT two VCPUs fight for the same cache (twice the latency).

      Each SMP chip has full access to the pipeline for more complicated calculations, with HT the pipeline for each VCPU will vary.

      --
      I don't need no instructions to know how to rock!!!!
    2. Re:How does HT compare vs SMP? by sboyko · · Score: 1

      Wouldn't the SMP chips have greater problems with bus contention when they're both trying to access main RAM? Or does the HT chip just push the contention to the cache level?

      Good point about the pipeline(s).

      --
      SCO, Microsoft, P2P, what's your hot button?
  27. I wonder... by mkweise · · Score: 0, Offtopic

    ...did Intel come up with that name in response to AMD's Hypertransport bus architecture, or did they independently decide that the Xeon needed something hyper?

    --
    Gentlemen! You can't fight in here, this is the War Room!
  28. This may decrase performance in some cases by Jack+Wagner · · Score: 1, Troll

    We've been doing some research in the Wagner Labs and we've seen many cases where an app is optimized for hitting the level II cache and thus reducing the pipelining done by optimizing on modern day compilers and when you use these apps on a hyperthreaded proc you actually see a performance DECREASE by the order of Olog(n) due to the fact that the instruction set is running parallel in the CPU and never leaves the LII cache, thusly never getting a chance of utilizing the advantages of hyperthreading.

    Once again this proves the point made by Fred Brooks in "The Mythical Man Month" that even if you increase the technical levels of optimization you will only see actual real-world speed improvements in Olog(n)/4ac, on the average.

    That said I do think there is quite a bit of potential for hyperthreading when the compilers are able to catch up, so to speak.

    Warmest regards,
    --Jack

    --


    Wagner LLC Consulting Co. - Getting it right the first time
    1. Re:This may decrase performance in some cases by DAldredge · · Score: 1

      You are a jackass. Why do you post crap like this?

      ***CHECK HIS POSTING HISTORY and DOMAIN. THE DOMAIN DOESN'T EVEN WORK****

    2. Re:This may decrase performance in some cases by ethereal · · Score: 1

      Bravo, Jack - demonstrating moderator stupidity since, well, whenever you started doing it.

      Working "The Mythical Man Month" into a CPU architecture discussion is just classic :) I think you could probably say anything and get moderated up as long as you mentioned Fred Brooks; it's kind of the opposite of Godwin's law.

      See, folks, we need the trolls to keep the moderators honest. Now, if you are a moderator who knows 2+2, please moderate the parent down.

      --

      Your right to not believe: Americans United for Separation of Church and

  29. But... by suwain_2 · · Score: 2

    ...I don't understand how this helps. I'm typing this on a Dual 1.4 GHz system -- even if a process is multi-threaded, it's still not as fast as a 2800 MHz processor. In addition, many programs can't take advantage of SMP, rendering dual processors 'useless' (for any single process; Linux distributes processes across processors.)

    So if 2*1400 1400? Shouldn't taking, say, the 3 GHz P4 and 'emulating' SMP actually slow things down slightly? I don't understand how it can help, and am actually surprised that it doesn't *hurt* speedwise.

    --
    ________________________________________________
    suwain_2 :: quality slashdot p
    1. Re:But... by stratjakt · · Score: 2, Insightful

      It doesn't 'emulate' SMP, it actually performs two operations at the same time by splitting the instruction pipeline in half (well not in half, it varies as to how much pipeline each 'cpu' gets). It's not as good as SMP for various reasons, mostly boiling down to the two threads sharing the rest of the chip.

      It does 'hurt' sometimes, but it's usually negligible, and you have to pretty much go out of your way to design code that would run slower - such code can 'hurt' traditional SMP systems as well.

      I'm sure there will be plenty of cooked benchmarks for fanboys to rant about in the future, just like there are between 3DNow! and MMX/SSE/2..

      It is a cool development, and *can* be shut off if it's only hindering your system (ie; you're running Windows 98 or a linux kernel with no HT support - and thus wasting pipeline to a 'CPU' that isn't used)

      --
      I don't need no instructions to know how to rock!!!!
    2. Re:But... by be-fan · · Score: 2

      Look at it this way. The CPU has a bunch of execution units on it. The P4, specifically, has two arithmetic units, two FPUs, and some other stuff. Since threads usually don't use all these units optimally, some are wasted. A second simultaneous thread might be able to use the otherwise unused units, and thus the overall performance of the two threads combined increases.

      --
      A deep unwavering belief is a sure sign you're missing something...
  30. Benefits!? by caluml · · Score: 2

    * 128-byte lock alignment
    * Spin-wait loop optimization
    * Non-execution based delay loops
    * Detection of Hyper-Threading enabled processor and starting the logical processor as if machine was SMP
    * Serialization in MTRR and Microcode Update driver as they affect shared state
    * Optimization to scheduler when system is idle to prioritize scheduling on a physical processor before scheduling on logical processor
    * Offset user stack to avoid 64K aliasing

    Is that all?! I hoped it'd do the post-integer-supercooled-re-automation-longterm-buzzword-cipher-reallignment too. That's something new that you guys haven't heard of yet ;)

  31. Something else to think about... by Whatthehellever · · Score: 0

    HT = Yet *another* threat to Microsoft. Not only are the Linux servers reliable, but they just sped up 30 to 50%!

    --

    ---
    IMHO, of course.
    May the SOURCE be with you.
    1. Re:Something else to think about... by cant_get_a_good_nick · · Score: 2

      As opposed to MS OSes that are also sped up by this? If anything, it's more of a threat to other UNIX vendors where Linux and other x86 based free Unices get better performance/cost ratios.

  32. Hyper-Threading on the PPC? by siliconwafer · · Score: 1

    While this is cool news, it doesn't help us PPC users. Does anyone know if this technology will make its way to the new IBM chips that Apple will (according to rumors) use?

    1. Re:Hyper-Threading on the PPC? by Anonymous Coward · · Score: 0



      You guys just got to 1GHz, calm down. That's enough for this year.

    2. Re:Hyper-Threading on the PPC? by Anonymous Coward · · Score: 0
    3. Re:Hyper-Threading on the PPC? by Doctor+Memory · · Score: 1

      IBM's new chips (Power4) actually have two CPUs on one die. I believe the version that Apple is rumored to be considering (PPC 970) will be a single-CPU version, though.

      --
      Just junk food for thought...
    4. Re:Hyper-Threading on the PPC? by Anonymous Coward · · Score: 0

      Yeah. I'm sure Intel is going to share this technology with IBM and the PPC. Especially now that IBM and AMD are in bed.

      Duh.

  33. In Other News. by OS24Ever · · Score: 4, Funny

    Faster clock speed processors speed up Linux.

    --

    As a rock-in-roll Physicist once said, No matter where you go, there you are.

    1. Re:In Other News. by karlandtanya · · Score: 1, Offtopic

      Go Team Banzai!! Go Blue Blazers!!

      --
      "Reality is that which, when you stop believing in it, doesn't go away." - Philip K. Dick
  34. Look Ahead by On+Lawn · · Score: 1

    My OS course is getting a bit fuzzy already, but we were lucky enough to have a visiting professor from a real university teach it. Professor Tsuni was one of the best teachers I had.

    But back to the point, and excuse the "obviousness" of the questions. But HT sounded like a way to more efficiently use the pipelines on modern processors by allowing multiple threads to work on them.

    And here is the fuzzy part, or maybe I'm just not remembering correctly. Do the multiple threads need to be in the same process? If so, I remember the linux kernel threading actually throws threads out as new full processes, and I'm unsure of how the CPU can track that. Or is the scheduler smart enough to send processes down the queue in an order where threads that can share processor time easier are sent together?

    Also, if some of the posts are correct it seems that multiple processors show up in Top. Offhand I wonder if this hampers or helps OpenMosix's algorithms that decide where to place processes to run.

    1. Re:Look Ahead by dbavirt · · Score: 1

      As far as the OS is concerned, there are two processors running in SMP.

      HT isn't really about threads in a single program, it's about any thread that the OS can throw at the CPU. It is not limited to threaded applications; a process is a thread.

      So at the worst case, both threads are optimized such that the pipeline would always be full. That's ok, it will be as if one processor executes the threads, one after another. But that never happens. Some instructions within a thread will invariably depend on their predecessors, and that's where HT kicks in. Since the next instruction in thread A depends on the previous instruction, the CPU will simply start the next instruction from thread B, which has no dependency on that previous thread A instruction.

      HT uses pipelining better than a compiler could ever dream. Sure, you can come up with some code to challenge that statement, but a vast majority of real-world code must contain dependencies which prevent efficient pipelining.
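
      The stall-filling argument above can be illustrated with a toy cycle counter in Python. This is a deliberately crude model and not how the P4 actually works: it assumes one issue slot per cycle, that every instruction depends on the previous one in its own thread, and that each instruction's latency is a made-up positive integer. The function names are hypothetical.

```python
def cycles_interleaved(threads):
    """Cycles needed when any ready thread may use the single issue slot.

    threads: list of threads, each a list of positive integer latencies.
    An instruction cannot issue until its predecessor in the same thread
    has completed.
    """
    next_instr = [0] * len(threads)   # index of the next instruction per thread
    ready_at = [0] * len(threads)     # cycle at which that instruction may issue
    cycle = 0
    finish = 0
    while any(next_instr[t] < len(threads[t]) for t in range(len(threads))):
        for t in range(len(threads)):
            if next_instr[t] < len(threads[t]) and ready_at[t] <= cycle:
                latency = threads[t][next_instr[t]]
                ready_at[t] = cycle + latency
                finish = max(finish, ready_at[t])
                next_instr[t] += 1
                break  # only one instruction issues per cycle
        cycle += 1
    return finish

def cycles_back_to_back(threads):
    """Cycles needed when each thread runs alone, one after another."""
    return sum(cycles_interleaved([t]) for t in threads)
```

      In this model, two threads of three 3-cycle dependent instructions take 18 cycles back to back but 10 cycles interleaved, because one thread issues while the other is stalled. Two threads of 1-cycle instructions take 6 cycles either way, which is the "worst case" described above.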

    2. Re:Look Ahead by dcmeserve · · Score: 1

      > But HT sounded like a way to more efficiently use the pipelines on modern processors by allowing multiple threads to work on them.

      Not the pipeline itself; more like the *resources* that the chip has -- e.g. if you have 2 floating-point multipliers, 2 dividers, and 3 integer units, you're going to find it difficult to keep them all busy with just a single thread, unless you're running a very math-intensive process. Similarly for other types of resources, such as memory access, etc. Having multiple threads automatically removes a lot of the inter-instruction dependencies that the CPU sees, and thus allows it to throw more work at the resources (on average).

      > Do the multiple threads need to be in the same process?

      That's purely an OS issue; the CPU really doesn't know the difference. HT's job on the CPU is to make it act like two physical processors. It's then up to the OS to decide how to use it.

      > Also, if some of the posts are correct it seems that multiple processors show up in Top. Offhand I wonder if this hampers or helps OpenMosix's algorithms that decide where to place processes to run.

      I don't know anything about OpenMosix, but the OS kernel does need to take new info into account to be able to fully optimize things, as discussed in the article.

      --
      "Orthodoxy is unconsciousness" - Orwell
  35. REDUKT! REDUKT! by Anonymous Coward · · Score: 0

    CLOB, most likely. What do they need binary data for?

  36. Re:Hyperthread this! by Anonymous Coward · · Score: 0

    In soviet russia, first post fails you!

  37. Re:excellent by Anonymous Coward · · Score: 0

    Actually, SMP support has been been in NT since 3.1. From the start NT was designed to be multi-threaded.

  38. Proof Bogomips are bogus! by Zathrus · · Score: 4, Funny

    As if there wasn't enough already...

    processor : 0
    bogomips : 3191.60
    processor : 1
    bogomips : 3198.15

    According to that the logical processor is actually faster than the physical one! Just think of what you could wind up with if you instantiated a logical CPU on the logical CPU!

  39. not to be picky by DrSkwid · · Score: 1, Troll

    but surely someone who was qualified to comment would know how to spell kernel?

    It's not that I want to pull you up for bad spelling but in order to speak with authority one must get the simple things right.

    Apologies if you have some sort of linguistic problem but people who seriously study kernel performance see the word kernel constantly and therefore one would expect them to spell kernel kernel.

    --
    There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
    1. Re:not to be picky by Anonymous Coward · · Score: 0

      Maybe he's just stuck in the Commodore days. They used to call it a 'kernal', too.

      Or maybe he's just stupid. Occam's razor...

    2. Re:not to be picky by Anonymous Coward · · Score: 0

      You'd think people could spell "squid" correctly, too, wouldn't you?

  40. Re:excellent by stratjakt · · Score: 1

    Multithreaded, sure but I'm not sure if it supported SMP, because I don't think it was a reality on Intel hardware yet (?) Of course, back then there was an Alpha tree so maybe it was.

    At any rate, I find it easier to live as though NT 3.x never existed.

    --
    I don't need no instructions to know how to rock!!!!
  41. It's just you by Royster · · Score: 5, Insightful

    What you've conveniently snipped out in your trollish post is all of the application benchmarks showing improvements. If you're not going to run any application code, you might as well shut the machine off and save the marginal stress on the environment.

    Most of us have our computers do work and those applications, running on an OS which has *barely* slowed, will be able to do more work in the same amount of time under the HT-aware OS than under one which does not utilize the second, virtual processor.

    --
    I have discovered a truly marvelous sig, unfortunately the sig limit is too small to contain i
  42. SMP HT by DrSkwid · · Score: 2

    Anyone know of any details around SMP versions of HT CPUs. It's not a very google friendly set of search terms.

    I expect that there would be a performance difference if the scheduler knew which were real cpus and which were half of an HT pair.

    Even flags to fork concerning which processor to fork to. i.e. --this_cpu_but_different_HT_CPU
    Because you might want the freedom to attempt to reduce the in-CPU cache misses and the like.

    Likewise the implementation of Process Groups - setpgid() warrants investigation.
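
    A scheduler that knows which logical CPUs are halves of the same HT pair could place work along these lines. This Python sketch is purely illustrative and is not the Linux scheduler; pick_cpu, the packages layout, and the busy set are hypothetical names. It prefers a logical CPU on a completely idle physical processor before falling back to the idle sibling of a busy one.

```python
def pick_cpu(packages, busy):
    """Choose a logical CPU for a new task.

    packages: list of lists; each inner list holds the logical CPU ids
              that share one physical processor.
    busy:     set of logical CPU ids that are already running something.
    Returns a logical CPU id, or None if every logical CPU is busy.
    """
    # First choice: a physical processor with no busy logical CPU at all.
    for siblings in packages:
        if not any(cpu in busy for cpu in siblings):
            return siblings[0]
    # Second choice: any idle logical CPU, even if its sibling is busy.
    for siblings in packages:
        for cpu in siblings:
            if cpu not in busy:
                return cpu
    return None
```

    For two physical processors numbered as [[0, 1], [2, 3]] with logical CPU 0 busy, this picks CPU 2 rather than CPU 1, so the two tasks do not share execution resources or cache.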

    --
    There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
  43. Re:excellent by benzapp · · Score: 2

    Wow. It's posts like this that really make me feel old at 25.

    They had 486 SMP systems. In fact there was an awesome upgrade that came out like ten years ago that let you put two 486 processors in one socket. Of course you needed the clearance for it. SMP was actually all the rage ten years ago for the same reason PowerPC was all the rage. Intel had a hard time scaling, so one of the solutions was to use multithreading and divide up the work.

    OS/2 2.11 SMP was out in 1993, and NT 3.1 came out shortly thereafter. Both supported SMP. The Pentium Pro, which came out in early 1995 was highly optimized for 32-bit code and multiprocessing. 4 and 8 way Pentium Pro boards existed. And were somewhat common.

    If anything, SMP is LESS common today. When was the last time you saw a 4-way SMP board for sale anywhere? You could easily get them back then. The reason it's less common today is processors really are a lot faster. Intel is doing this Hyperthreading crap because they know that orders of magnitude performance gains are a thing of the past, so multithreading is the key.

    Of course, us old OS/2 fanatics were saying this ten years ago.

    --
    I don't read or respond to AC posts
  44. Technical Summary by 0x69 · · Score: 5, Insightful

    If you're running code that's efficient on a P4 (few mis-predicted branches, low cache miss rate, good parallelism, etc.) then HT is pretty much useless.

    If you're running code that's inefficient on a P4 (which pays for its high GHz with long pipelines, large latencies, a slow decode stage, and several other drawbacks), then HT can usually paper over a fair percentage of these problems. But remember that HT requires OS support, may require application support, and "your mileage will vary".

    --
    It's easy to make up & spread cool- and credible-sounding stuff. Finding & checking hard facts is hard work.
    1. Re:Technical Summary by iggymanz · · Score: 2

      but what about 2 unrelated apps running at the same time? Not everyone runs just one heavy program.

    2. Re:Technical Summary by cartman · · Score: 3, Interesting

      What you said was false.

      Take the example of database & OLTP applications. Database transactions are heavily dependent on repeated access to RAM. Virtually no database is small enough to fit into cache, and there is often little regularity in which data is accessed. Memory latency will REQUIRE a non-SMT processor to wait IDLY each time there is a cache miss to RAM, which takes >100 proc cycles on a modern CPU. This has NOTHING to do with the P4 architecture or long pipelines.

      "But remember that HT requires OS support, may require application support..."

      HT does not require OS support as long as the OS is capable of recognizing more than 1 CPU. Any threaded app can benefit from HT.

    3. Re:Technical Summary by 0x69 · · Score: 1

      It depends. If both programs are P4-INefficient, then HT has a good chance of speeding up the combination. I'd guess this would be the case with most "light weight but lots of 'em" generic server code.

      Two P4-efficient programs *might* get a minimal benefit from HT, but are more likely to perform much *worse* as they fight over the chip's maxed-out resources.

      I'd guess that a mix of the two types would fall some place in between. Theories & benchmarks are nice, but in the end, performance on your actual programs, data, etc. is what matters.

      (It would be nice if Intel's HT collected raw data to let the OS figure out when HT was & wasn't a good idea...anyone know if it does?)

      --
      It's easy to make up & spread cool- and credible-sounding stuff. Finding & checking hard facts is hard work.
    4. Re:Technical Summary by 0x69 · · Score: 1

      FWIW, such keep-stopping-to-wait-for-memory code is clearly what i classified as INefficient on a P4, so it should benefit from HT. But to respond to the assertion after your example:
      -and-
      The desired data IS often in the cache. Compare the cache latency of the 3GHz P4 (2 cycles L1, 18 cycles L2) to a 1.5GHz Itanium (1 cycle L1, 5 cycles L2). (Though the data could be in the Itanium's (12 cycle) L3 while the P4 is going to (far slower) RAM.) "...NOTHING to do with P4 architecture", right?

      When the CPU does go to RAM for the data, i bet that a 3GHz P4 waits idle for about twice as many cycles as a 1.5GHz Itanium (given same-speed memory systems). "...NOTHING to do with P4 architecture", right?

      So HT is useless if the OS doesn't recognize and use the (seeming) second CPU. Read the article, and you'll notice that HT gave a far larger benefit with the designed-specifically-for-HT kernel. You got a better 4-word summary of this than "HT requires OS support"?

      "Any threaded app can benefit from HT" is plain wrong. There are many (generally very CPU-intensive) apps that do WORSE under HT because both threads are fighting for the same (maxed-out) resources. Flip-side, there are plenty of NON-threaded apps that can benefit from HT - all this takes is for the OS to try running two different apps on the two CPU's that it sees.

      Summary: you can nit-pick okay, but i don't think you understand the technology or get the point of a short summary.

      --
      It's easy to make up & spread cool- and credible-sounding stuff. Finding & checking hard facts is hard work.
    5. Re:Technical Summary by cartman · · Score: 1

      "When the CPU does go to RAM for the data, i bet that a 3GHz P4 waits idle for about twice as many cycles as a 1.5GHz Itanium (given same-speed memory systems). "...NOTHING to do with P4 architecture", right?"

      The P4 waits idle for twice as many cycles because it HAS twice as many cycles. The clock rate of the processor is not what's important here. Latency to non-cached RAM has NOTHING to do with p4 architecture.
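
      The arithmetic behind this point can be sketched in a few lines. The 100 ns RAM latency is an assumed round number for illustration, not a measured figure for either chip.

```python
def stall_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Cycles a core spends waiting on a memory access of fixed wall-clock latency."""
    # 1 GHz is one cycle per nanosecond, so cycles = nanoseconds * GHz.
    return latency_ns * clock_ghz


RAM_LATENCY_NS = 100.0  # assumed value, for illustration only

for name, ghz in (("3 GHz P4", 3.0), ("1.5 GHz Itanium", 1.5)):
    cycles = stall_cycles(RAM_LATENCY_NS, ghz)
    print(f"{name}: {cycles:.0f} cycles idle, {RAM_LATENCY_NS:.0f} ns idle")
```

      Under this assumption both chips wait the same number of nanoseconds; only the cycle count differs, in proportion to the clock.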

      "Read the article, and you'll notice that HT gave a far larger benefit with the designed-specifically-for-HT kernel. You got a better 4-word summary of this than "HT requires OS support"?"

      Yes I have a better brief summary. You said that HT "requires" OS support. That is not just an exaggeration; it is false. If true, this would mean that HT would not work at all unless the OS were specifically written to take advantage of it. An accurate summary would read: "HT benefits from optimizations designed to exploit it."

      "Summary: you can nit-pick okay, but i don't think you understand the technology or get the point of a short summary."

      I understand the technology quite well, and I certainly understand the point of an accurate summary, which yours was not.

    6. Re:Technical Summary by 0x69 · · Score: 1

      Nobody M1'ed me down, but i was pretty flamey up there. Please accept my apologies for that. On the technical issues:

      The extra P4 idle cycles (waiting on external RAM) are a direct result of the P4 designers' decision to favor "higher clock speeds" over "get more work done per clock tick". Those idle cycles become bubbles in the execution pipelines. Those bubbles (a problem when running single-thread) are the opportunity that HT takes advantage of (by filling in with work for the other thread). (If the "higher clock speeds" design decision isn't part of what you're using "P4 architecture" to mean, that's okay by me.)

      I agree that "HT benefits from optimizations designed to exploit it" is true. However, "[insert name of any new technology added to CPU's to speed up computation] benefits from optimizations designed to exploit it" is *generally* true. I'd say that my 4 words ("HT requires OS support") convey much more useful information in less space (very good in a summary) at the price of misunderstanding (bad anywhere) if taken in a certain literal way. (Which gets into what "HT support" means in an OS. I'd say that advocates for any OS will use it in a way that minimizes the damage.)

      --
      It's easy to make up & spread cool- and credible-sounding stuff. Finding & checking hard facts is hard work.
  45. Would this work? by josh+crawley · · Score: 1

    I have an idea about "logical processors". If, for some reason Intel decided to make 1 cpu to 3 "virtual" cpu's, could you boot up the computer on cpu1 with a special OS that allows you to boot the other cpu's into their own modes, while having the master OS deal with memory and drive accesses?

    CPU1 - MasterOS
    CPU2 - Linux 2.4.18
    CPU3 - Win2k

    It'd be even neater if you could shut down the os'es and reboot the chips.

    To the kernel devs: is this possible?

    1. Re:Would this work? by Anonymous Coward · · Score: 0

      no.
      HT is sorta like co-operative multitasking. it sucks rocks compared to SMP (equivalent to pre-emptive multitasking).
      in both cases, single memory space and bus resources are allocated with no support for resource (drive/IRQ) sharing which is what you need to boot 3 OSes at once.
      of course you could skip everything and boot 3 instances of Vmware on linux with a quad xeon box or quad AMD Athlon MP box.

    2. Re:Would this work? by user32.ExitWindowsEx · · Score: 1

      See VMware ESX Server.
      It may do what you want, but it's absurdly priced.
      I don't know of any free solutions.

      --
      "Evil will always triumph because good is dumb." -- Dark Helmet
  46. Expensive HT or cheap real SMP? by ponos · · Score: 5, Insightful


    In Europe P4 3.0 with HT costs ~745 euro (+tax)
    An Asus A7M for dual Athlon costs ~260 euro (+tax)
    Two Athlon XP 2200+ cost ~340 euro (+tax).
    Alternatively you can get two Athlon MP 2000+ for
    roughly the same money (if you don't trust the
    XPs).

    Now, please explain to me why would someone
    with real SMP needs in mind (and NOT games)
    consider the P4 with HT.

    P.

    P.S. I understand that the prices in the US are
    different, but still, it is VERY expensive.

    1. Re:Expensive HT or cheap real SMP? by Anonymous Coward · · Score: 0
      Now, please explain to me why would someone with real SMP needs in mind (and NOT games) consider the P4 with HT.

      No one would. So what's your point? The 3.06G P4 simply introduced SMT to desktop CPUs, it's the latest and greatest from Intel so OF COURSE IT'S OVERPRICED. Intel has always priced high end parts that way, why should this be any different? But you know, two years from now every Intel CPU will have SMT and everyone will be happy that key Linux developers have two years of SMT experience behind them.

    2. Re:Expensive HT or cheap real SMP? by Anonymous Coward · · Score: 1, Insightful

      Excellent Point.

      HT works the best where you can do better with dual CPUs.

      Why not use dual CPUs?

    3. Re:Expensive HT or cheap real SMP? by Tailhook · · Score: 1

      Now, please explain to me why would someone
      with real SMP needs in mind (and NOT games)
      consider the P4 with HT.


      Because they can use it with motherboards that have chipsets that don't suck moist donkey balls?

      --
      Maw! Fire up the karma burner!
  47. TROLL - Mod Parent Down by Surreal_Streaker · · Score: 1
    TROLL

    This is a known troll, please mod parent down.

  48. Re:excellent by Russ+Steffen · · Score: 5, Informative

    Holy intellectual dishonesty, Batman!

    NT and Windows 2000 do not support HT and never will. NT will not because it's been end-of-lifed, and Windows 2000 will not because of Microsoft policy. On a 2-CPU system with HyperThreading, NT and Windows 2000 will think they have 4 real CPUs (unsurprisingly, this is what a pre-HT version of Linux will see as well). HT support means the OS knows that it has, in this example, 2 real CPUs and 2 fakes, and the scheduler will weight the real CPUs accordingly.

    XPPro SP1 is the first, and only shipping version of Windows to support HT.
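
    A toy sketch of what "the scheduler will weight the real CPUs accordingly" could mean. This is not code from any real kernel; `pick_cpu` and the `PACKAGE_OF` layout are hypothetical names made up for illustration.

```python
from typing import Dict, List, Optional


def pick_cpu(package_of: Dict[int, int], busy: List[int]) -> Optional[int]:
    """Pick an idle logical CPU, preferring one whose physical package is fully idle."""
    busy_set = set(busy)
    busy_packages = {package_of[cpu] for cpu in busy_set}
    idle = [cpu for cpu in sorted(package_of) if cpu not in busy_set]
    if not idle:
        return None
    # First choice: a logical CPU on a package with no running sibling.
    for cpu in idle:
        if package_of[cpu] not in busy_packages:
            return cpu
    # Otherwise the new thread has to share a package with a running thread.
    return idle[0]


# Hypothetical layout: 2 physical packages with 2 logical CPUs each.
PACKAGE_OF = {0: 0, 1: 0, 2: 1, 3: 1}

print(pick_cpu(PACKAGE_OF, busy=[0]))     # prefers the idle second package
print(pick_cpu(PACKAGE_OF, busy=[0, 2]))  # both packages busy, so it shares one
```

    A scheduler that is not HT-aware would treat all four logical CPUs as equal and could put two threads on one package while the other package sits idle.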

  49. Re:excellent by Jord · · Score: 1
    How is this informative?? Lets clear up the FUD a bit and see how informative he is.

    SMP support has existed since NT 4.

    Linux has also had SMP support for ages. The changes that Linux made recently to the kernel were specifically to handle virtual cpus vs. physical cpus. I am willing to bet the farm that NO current MS product tells the difference.

    So what you meant to say was "once again Linux plays catchup to MicroSoft, but only about a year or so later this time, and not 5-10."

    Sounds more like MS will be catching up (if they even bother) to Linux instead of the other way around. Sorry to feed the trolls but with a +5 informative, it had to be done. Some moderators need to be shot.

  50. NGPT 2.2.0 tops Linuxthreads and NPTL by truth_revealed · · Score: 1, Offtopic

    NGPT 2.2.0 tops both Linuxthreads and NPTL

    Keep in mind that NPTL paved the way for the kernel changes that NGPT also makes use of.
    I'm sure that the NPTL team won't simply give up.
    Anyway - it looks like Linux will finally have a good SMP threading library.

  51. Concurrency benefits from HT by Jeppe+Salvesen · · Score: 2

    Simply put, you'll need two or more processes consuming all available CPU power before you'll see some real benefits from HT. If you're severely IO-bound, running a high-end FC SAN solution on an old P2 server will outperform a 5ghz machine with a mediocre disk.

    So - yes, not all people and applications will benefit from this. But no - it is not try and see.

    --

    Stop the brainwash

  52. Wrong about XP by DudemanX · · Score: 2, Informative

    While Win2K will see a hyper-threaded CPU as 2 physical CPUs, WinXP is smart enough to see it as a CPU and a Virtual CPU. At the last Intel conference I attended they made sure to emphasize that while XP Home doesn't support 2 physical processors it will properly recognize a hyper-threaded CPU and allocate resources accordingly. Do you think Intel would enable the technology in the P4 3Ghz (a desktop CPU) without making sure Microsoft supported it in their desktop operating systems?

  53. Unmentioned benefit of Hyper-Threading by sphinxter · · Score: 1

    I see the real benefit of Hyper-Threading being increased stability especially for development boxes. In the case where you have an infinite loop bug your CPU usage will eventually hit 100% and the computer will lock up. With Hyper-Threading only one virtual processor would lock up, the other will remain free so you would be able to terminate the process and save yourself from crashing and rebooting the system.

    1. Re:Unmentioned benefit of Hyper-Threading by dcmeserve · · Score: 1

      > In the case where you have an infinite loop bug your CPU usage will eventually hit 100% and the computer will lock up.

      That'd only be true with a pre-NT Windows OS (i.e. '95, '98, ME, etc). Also the older Mac OS's, I believe. Any OS with true multitasking abilities will allow any program to be interrupted and killed, even if it's in an infinite loop. Unless, of course, the loop is within the kernel code. Which would be a major bug. Well, ok, so there go all the other versions of Windows.

      But anyways, any OS which *doesn't* have multitasking won't be able to make use of HT anyways.

      --
      "Orthodoxy is unconsciousness" - Orwell
  54. Don't be so harsh by Anonymous Coward · · Score: 0

    Some people just cannot spell. I started out in Anthropology but could never learn to spell peseant (or is it peasant or maybe peasent or pesant). They all look correct to me even tho I've read many books and articles on rural cultures in developing countries. I have a brother who can't spell the word engineer and yet makes 6 figures in this economy designing circuit boards for some well known companies. His partner writes the documentation. If you judge competence in a field based on a mispelled word in /. post, then (not to be too harsh) you're an idiot.

  55. Summary by swillden · · Score: 3, Insightful

    So, in a nutshell, what MS says is: Windows 2000 counts processors in a broken way and requires you to buy licenses for every logical processor, even though you won't get nearly as much processing power as you would if you really had that many physical processors. But rather than fix this bug, we're going to solve the problem by making you buy .NET, which counts processors correctly. So either way, if you're going to use hyperthreading, expect to send us more money.

    --
    Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    1. Re:Summary by ethereal · · Score: 1

      That fits right into the normal "if you're going to do anything at all, send us money" Microsoft business plan :)

      --

      Your right to not believe: Americans United for Separation of Church and

  56. 1.6GHz? by WasterDave · · Score: 2

    This is fine, I guess, if you're going to run a processor as slow (!) as this. Point being that a hyperthreaded system will place greater demands on the ram bandwidth.

    With a slow processor they may be using 80% of the available bandwidth instead of 60% with HT switched off. Upping the processor speed to ... say 3GHz, where HT is enabled in vanilla P4's ... and we can expect to see the memory bandwidth being toasted continuously. Under these conditions I doubt we would see a speedup at all, and quite possibly the reduced cache efficiency would reduce it.

    Executive Summary: Can we do this again with a non-Xeon P4 3GHz?

    Dave

    --
    I write a blog now, you should be afraid.
  57. The price comparison to 2 x Athlon-MP? by wytcld · · Score: 2

    If the results are similar to running SMP with two processors (and they look roughly similar), isn't a system with 2 Athlon-MPs still cheaper for a given performance level?

    --
    "with their freedom lost all virtue lose" - Milton
  58. It depends... by sheldon · · Score: 2

    Obviously that depends.

    If your web server is just doing static content, then probably not as a 486 can saturate a T1.

    If your web server is doing dynamic content, then possibly.

  59. Oh Crap! by Not+The+Real+Me · · Score: 1

    I just upgraded my two web servers to AMD Athlon 1.2 ghz processors with 1.5 gigs of RAM each.

    Don't tell me I have to buy another CPU and motherboard combo -- again....

  60. Re:excellent by dabraun · · Score: 1

    Of course, you are wrong.

    XP Pro and even XP Home support hyperthreading. They know the difference between physical and logical and they treat them accordingly (including licensing ramifications - you can have 2 logical processors on home, 4 on pro vs. 1 / 2 physical processors.)

    David

  61. Ninnle's already fast! by Anonymous Coward · · Score: 0

    You can't get much faster than Ninnle Linux! Blisteringly fast performance!

  62. Re:excellent by Jord · · Score: 1
    Typical knee jerk response. Try thinking about this. Linux was capable of utilizing the hyperthreaded processor as if it was two processors using the SMP code. However, the kernel was updated to improve upon the scheduling of threads on VCPUs vs. PCPUs.

    My statement was that I doubt any MS product does this. I did not say that MS did not recognize the hyperthreaded CPU.

    Can you show anywhere that someone has tested XP to show that it handles this scheduling correctly?

  63. Re:excellent by Anonymous Coward · · Score: 0

    Rumor/legend: OS/2 did not have SMP at the divorce; MS got it up in NT, but IBM couldn't on OS/2. Gordon Letwin made a semi-public bet that if IBM could by some date, he'd fly "everyone" [the usenet group? no one was ever sure] to Seattle for lunch. IBM flailed, and failed; Letwin won.

  64. ibm websphere by Anonymous Coward · · Score: 0

    i'll be curious to see the performance of websphere's application server for linux on these new chips.

    Currently websphere's performance on linux/intel is pretty crappy. Poor java cannot multithread properly on intel, so instead it spawns off multiple processes. Currently one application server will spawn off about 90 processes on linux, while the equivalent on AIX will spawn off only 2-3.

    If you think IBM is promoting linux now, just imagine how much they'll be promoting linux once their beloved WebSphere runs smoothly on it?

  65. Each generation believes that it invented sex by Anonymous Coward · · Score: 0

    While the technology may be new to Intel, it is 4 decades old. That's not new in my book.

  66. Nothing new by heydrick · · Score: 1

    This is nothing new. The Cray MTA supports 128 threads per processor and can scale up to 256 processors in a single system.

    And the OS is a BSD variant.

  67. Funny this article comes up, I just built one by mattx · · Score: 0

    Dual P4 2.4GHz Xeons. Had to compile my own vanilla kernel to get it to see it as 4, but it does...

    processor : 0
    vendor_id : GenuineIntel
    cpu family : 15
    model : 2
    model name : Intel(R) Xeon(TM) CPU 2.40GHz
    bogomips : 4771.02
    --
    processor : 1
    vendor_id : GenuineIntel
    cpu family : 15
    model : 2
    model name : Intel(R) Xeon(TM) CPU 2.40GHz
    bogomips : 4784.12
    --
    processor : 2
    vendor_id : GenuineIntel
    cpu family : 15
    model : 2
    model name : Intel(R) Xeon(TM) CPU 2.40GHz
    bogomips : 4784.12
    --
    processor : 3
    vendor_id : GenuineIntel
    cpu family : 15
    model : 2
    model name : Intel(R) Xeon(TM) CPU 2.40GHz
    bogomips : 4784.12

    Odd that the logical processors have more bogomips...Kernel is 2.4.20.

    This thing is fast...Apache compiled and installed in about 17 seconds. Kernel compiled in about a minute. The Java app that is going to run on this took about 16 seconds to compile and load...compare that to about 41 seconds on our old dual P3 1GHz machine...NICE.
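
    For anyone wanting to check how many of those entries are logical rather than physical, here is a rough sketch that parses /proc/cpuinfo-style text. It assumes the kernel reports a "physical id" field, which the output above does not show, and the `SAMPLE` values are hypothetical.

```python
from typing import Dict, List


def parse_cpuinfo(text: str) -> List[Dict[str, str]]:
    """Split /proc/cpuinfo-style text into one dict per logical processor."""
    cpus: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in text.splitlines():
        if ":" not in line:
            # A blank line (or a separator such as "--") ends a processor block.
            if current:
                cpus.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "processor" and current:
            # A new block started without a separator line.
            cpus.append(current)
            current = {}
        current[key] = value.strip()
    if current:
        cpus.append(current)
    return cpus


def count_cpus(text: str) -> Dict[str, int]:
    """Count logical processors and distinct physical packages."""
    cpus = parse_cpuinfo(text)
    # Assumption: "physical id" is present; if not, fall back to the logical count.
    packages = {c["physical id"] for c in cpus if "physical id" in c}
    return {"logical": len(cpus), "physical": len(packages) or len(cpus)}


# Hypothetical sample: 4 logical processors on 2 physical packages.
SAMPLE = """\
processor : 0
physical id : 0

processor : 1
physical id : 0

processor : 2
physical id : 3

processor : 3
physical id : 3
"""

print(count_cpus(SAMPLE))
```

    On a real box you would pass in the contents of /proc/cpuinfo instead of `SAMPLE`.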

  68. License issues? by Stonent1 · · Score: 2

    Don't companies like (guessing) Oracle charge by how many processors you use with their software? I know for solaris (even intel) you are licensed by how many cpus you can use. (Just like windows I guess, 1, 2, 4, 8+ cpus)

    Also since XP Home is only single processor capable where does that leave the home users that buy 3.x Ghz computers? Surely it wouldn't be long before someone figures out how to swap a multiprocessor HAL into XP Home...

    1. Re:License issues? by W+Parasyte · · Score: 2, Informative

      From what I've read, XP Home allows the use of 2 virtual CPUs, just not two physical CPUs. So any HyperThreading procs would work just fine.

      --
      -- Your IP is showing
  69. HT is not single-chip SMP by TeknoHog · · Score: 2
    There are too many posts here asking how HT compares with SMP. Correct me if I'm wrong, but isn't HT quite a lot different:

    To simplify greatly, if the CPU has separate units for integer and floating-point math (for example), Hyperthreading means you can use these units in parallel. Therefore, HT will not speed up pure integer or pure FP math, like SMP would. It will only speed up things if you run different kinds of process simultaneously.

    Also, many people have noted that HT sometimes slows things down a bit. I don't find this very surprising because the OS needs more work to organize things for HT, but it may not have more CPU resources than a non-HT version.

    Personally, I think HT is a good idea because it's using the existing hardware more efficiently in a true hacker spirit. However, it's nowhere near proper SMP.

    --
    Escher was the first MC and Giger invented the HR department.
    1. Re:HT is not single-chip SMP by SpinyNorman · · Score: 3, Insightful

      I don't believe that's correct.

      As I understand it HT can indeed speed up pure integer code (or more generally code that's competing for a single CPU resource). HT will allow another thread to execute if the current one is waiting on anything from pipeline results to memory access. I believe that modern CPU/memory speed disparity was one of the driving forces behind it - if one thread gets a cache miss then another may be able to continue executing rather than having to sit idle waiting for main memory.

  70. Tom's Hardware test. by Anonymous Coward · · Score: 0

    Tom's test is flawed because he ran them on an OS which wasn't Hyperthreading aware. It simply thought it had two CPUs.

    You schedule processes differently on two CPUs than you do on a single CPU. So, Hyperthreading tricks XP into using techniques which are optimal for a two processor machine when it really only has one.

    So, for example, you might go through the trouble of scheduling a thread on the 2nd CPU (since the first is busy). On a two CPU machine that 2nd thread runs. On a HT machine it barely gets run at all unless the first thread gets significantly memory bound. So, you reschedule the thread back on the first one after the first thread is scheduled out. Now you've scheduled the same thread twice to get just over one quantum of work done.

    You can easily see how CPU gets wasted.

    Once XP is aware of Hyperthreading, it will rarely if ever slow down when Hyperthreading is switched on.

    And this is the reason you cannot apply Tom's results broadly to Hyperthreading in general, just HT on non-aware versions of XP.

  71. Hyperthreading and memory access by cartman · · Score: 5, Informative

    One of the major impediments to increasing CPU performance has been increasing memory latency. Memory latency has grown worse as CPUs have gotten faster. Accessing RAM will now cause a >150 cycle latency, during which the processor sits IDLE.

    Cache only partly mitigates this problem. Some applications, such as databases and OLTP, are heavily dependent on repeatedly accessing non-cached RAM. There is no way to cache all the relevant data, since virtually all databases are larger than can fit in any present cache, no matter how large, and there is sometimes no way to predict which data will be accessed. ALL of these applications have CPUs that spend much of their time being IDLE, waiting for memory to be returned.

    SMT (hyperthreading) allows the processor to perform useful work during these otherwise idle periods, by allowing the cpu to switch to a thread that is not blocked on memory access. The "idle bubbles" in the execution pipeline can therefore be "filled in" by useful work that advances the state of relevant programs.

    SMT can cause a degradation in performance because it can lead to "cache thrashing." In an SMT-naive kernel, two unrelated threads could be scheduled for the same physical CPU. These unrelated threads will likely share very little code or data. The two threads will therefore "compete" for the single shared cache, with each thread's data being repeatedly displaced by the other's.

    This difficulty can be substantially mitigated by making the kernel aware of "virtual processors," and by implementing scheduling algorithms to minimize the impact. The performance of hyperthreading will likely improve as kernels are better able to exploit it.
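
    The "filling in idle bubbles" idea can be put into a toy model. This is an idealized upper bound, not a simulation of a real P4: it ignores cache thrashing entirely, and the 50 cycles of work between misses is an assumed number (the 150-cycle stall is the figure quoted above).

```python
def utilization(compute_cycles: int, stall_cycles: int, contexts: int = 1) -> float:
    """Fraction of cycles in which the core does useful work.

    Toy model: every hardware thread context alternates between
    `compute_cycles` of work and `stall_cycles` of waiting on RAM, and the
    core can overlap one context's stall with another context's work.
    """
    period = compute_cycles + stall_cycles
    # The core can never be more than 100% busy, however many contexts it has.
    return min(1.0, contexts * compute_cycles / period)


# Assumed workload: 50 cycles of work between 150-cycle RAM stalls.
print(f"1 context:  {utilization(50, 150, contexts=1):.0%} busy")
print(f"2 contexts: {utilization(50, 150, contexts=2):.0%} busy")
```

    In this model a memory-bound workload gains a lot from a second context, while a workload that rarely stalls is already near 100% and gains nothing.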

  72. Re:excellent by Listen+Up · · Score: 2


    Incorrect, OS/2 was SMP since 2.1. The OS/2 SMP model is still known to be one of the best SMP models to have ever been written. Click on this link http://www.byte.com/art/9406/sec11/art2.htm and learn something about OS/2 SMP (oh geez, it's 1994) and SMP in general.

  73. Win2K 2 CPU == 1 HT CPU ?? by MyHair · · Score: 2

    I have seen screenshots of Windows task manager showing (2) CPU performance graphs.

    Since the "Professional" line of NT/2K/XP kernels only support two processors, does this mean you can only use one HT CPU?

    1. Re:Win2K 2 CPU == 1 HT CPU ?? by spongman · · Score: 2

      XP sees HT processors as a single processor from a licensing standpoint so, yes, you can use two on XP pro.

  74. What is the margin of error? by rufusdufus · · Score: 2

    It really bugs me when I see benchmark numbers relied upon when they have not been presented as statistically significant.
    Whenever you run a benchmark, you MUST run it multiple times and do the proper statistical calculations for standard deviation.
    It is NOT VALID to do one run, and it is NOT VALID to average a bunch of runs without knowing what the deviation is.
    Some times a benchmark's time will vary by more than 100%. Sometimes the reasons are valid, sometimes they are because of an error in the benchmark.
    Without this sort of validation, the numbers presented should not be trusted.
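
    A minimal sketch of the kind of reporting this asks for, using only the Python standard library. The `bench` and `summarize` helper names are made up for illustration, and the workload is a placeholder.

```python
import statistics
import time
from typing import Callable, Dict, List


def summarize(samples: List[float]) -> Dict[str, float]:
    """Mean, sample standard deviation, and relative deviation of timing samples."""
    if len(samples) < 2:
        raise ValueError("need at least 2 runs to estimate a deviation")
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)
    return {"mean": mean, "stdev": stdev, "rel_stdev": stdev / mean if mean else 0.0}


def bench(fn: Callable[[], object], runs: int = 10) -> Dict[str, float]:
    """Run `fn` several times and summarize the wall-clock timings."""
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return summarize(samples)


# Placeholder workload; substitute the real benchmark here.
result = bench(lambda: sum(range(100_000)), runs=5)
print(f"mean {result['mean']:.6f}s, stdev {result['stdev']:.6f}s")
```

    If the deviation is a large fraction of the mean, a difference of a few percent between two kernels tells you very little.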

  75. SMT will become increasingly important by cartman · · Score: 5, Interesting

    SMT (hyperthreading) will become increasingly important when processors are able to execute more than 2 threads simultaneously.

    This development is inevitable. Previously, each new processor generation was faster than the prior one at a given clock rate, because each new processor core had more execution units, and was therefore able to perform more work in parallel. This trend abruptly ended recently, for one reason: there is no more instruction-level parallelism (ILP) to exploit. It is impossible for a processor to look at a thread of execution and find more than a few instructions to execute in parallel.

    The only parallelism left to exploit is THREAD-LEVEL parallelism (TLP). Therefore the only way to continually increase performance is to increase the number of threads that a CPU can execute in parallel. This requires two modifications to CPU cores: first, increase the number of thread contexts per CPU, and second, increase the number of pipelines to which those threads can be dispatched.

    With the P4, it would be pointless to have more than 2 thread contexts, because there aren't enough CPU resources lying idle to execute more than 2 threads. But future CPUs could make use of more than 2 thread contexts by having enough CPU resources to execute all of them. Future CPUs could have 20 execution units or more, which would be enough to execute several threads. Remember that the number of transistors per CPU continues to increase exponentially.

    It's easy to foresee a time when processors have 20 execution units (10 integer, 10 fp) and 4 thread contexts, offering more than triple the performance of a non-SMT cpu. In the future, non-SMT CPUs will make as little sense as a non-superscalar CPU would today.

  76. Re:excellent by zdarnell · · Score: 3, Informative

    Any SMP capable operating system supports HT. On the other hand license issues make combining true SMP and HT a pain on non-server versions of Windows.

    You're totally missing that part of the beauty of HT is the transparency.

    On the other hand you can write things SPECIFICALLY for HT to deal with things such as cache issues, but saying that windows doesn't support it at all is rather misleading and makes it seem like people wouldn't see any improvements at all.

  77. Re:excellent by Anonymous Coward · · Score: 0

    You Sir, Are an Idiot.

  78. make -j by Anonymous Coward · · Score: 0

    What about "make -j" speedup. Eg. How much does it speed up kernel builds? That's a good high-level, non-synthetic benchmark that's relevant to most of us Slashdotters and developers.

    These people measured the time it takes to complete a system call?? That's moronic. No amount of SMP will offer *any* speedup in that test. The advantages of SMT come when you are running multithreaded compute-intensive applications OR running many processes at high load.

  79. Re:Executive summary...(sp.) by Anonymous Coward · · Score: 0

    There's no A in kernel. Common begg^Hinner error.

  80. Price Comparison in Aus by steveoc · · Score: 1
    Prices in OzDollars ($2 AU is approx $1 US)

    Option 1 : 4 Logical CPUS

    • Tyan Tiger i7500 = $730
    • 2 x Xeon 2400 = $1200
    • 2 GB DDR = $1000
    • Total - $2930

    Option 2 : 2 x Dual Athlon OpenMosix

    • 2 x Gigabyte GA-7DPXDW+ = $600
    • 4 x Athlon MP1900 or 4 x Athlon XP2100 (and a bit of solder) = $800
    • 2 GB DDR = $1000
    • 2 x Gigabit Ethernet = $150
    • Total - $2550

    Option 3 : 4 node cheap Athlon OpenMosix
    • 4 x Shuttle AK32 = $500
    • 4 x Athlon XP2100 = $800
    • 4 x 512MB DDR = $1000
    • 4 x 100mbs ethernet cards + 8 port 100mbs switch = $150
    • Total - $2450

    All much of a muchness I suspect.

    I'd lean toward Option 2, since it's got real SMP, and OpenMosix, and redundancy, and coolness factor.

  81. If you are going to steal OSNews' words verbatim.. by Stalin · · Score: 0

    then at least give them some freakin credit. that is clearly a copy and paste on developerWorks' part. an introduction such as "i read this over at such and such site" is plenty sufficient.

    OSNews article timestamp: 2003-01-14 02:08:14

    Slashdot article timestamp: Tuesday January 14, @02:36PM

  82. HT Raises performance, depending. by mmol_6453 · · Score: 2

    Keep in mind, a 30% gain (for the 2.4 series) in a 2GHz machine would equate to a machine that performed server-oriented functions at an effective 2.6GHz.

    When they benchmarked 2.5.32, they showed a 51% increase, which would boost your effective server performance to 3GHz.
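
    The back-of-the-envelope math, spelled out. Treating a throughput gain as "effective GHz" is only a loose way of talking, and `effective_ghz` is just an illustrative helper name.

```python
def effective_ghz(clock_ghz: float, gain: float) -> float:
    """Clock a non-HT chip would need to match the reported throughput gain."""
    return clock_ghz * (1.0 + gain)


BASE_GHZ = 2.0

for kernel, gain in (("2.4 series", 0.30), ("2.5.32", 0.51)):
    print(f"{kernel}: {gain:.0%} gain -> {effective_ghz(BASE_GHZ, gain):.2f} GHz effective")
```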

    Granted, the way I understand it, the actual coordination of core components for the two threads is hard-wired or in firmware. That means Intel can still improve HT, to get a better performance boost. To further that line, consider if Intel were to add additional core sections of their CPUs, to be allocated dynamically by the firmware. That means you're increasing your per-clock performance without the major overhead of developing a whole new CPU core.

    I can't see Microsoft standing for it. Intel could put all the pieces for two CPUs on the same die, and call it HT. You might have all the functionality of a dual-CPU setup, with less latency, and still have it show up as a single HT-enabled processor.

    With the way Microsoft's handling SMP machines (with CPU licenses), in addition to their statement that they are developing a 64-bit version of Windows based on the Hammer architecture, I think AMD's future looks pretty bright.

    --
    What's this Submit thingy do?
    1. Re:HT Raises performance, depending. by Anonymous Coward · · Score: 0

      With the way Microsoft's handling SMP machines (with CPU licenses)

      What? Windows Server allows you 4 CPUs. Above 4 CPUs and nobody uses Windows Server. So this really affects them not. (Microsoft makes most of their bank from seat licences and single task configurations.)

      And in general, better x86 performance can only help Microsoft in the long run.

  83. HT is great by Vladimir · · Score: 1

    I tested HT on dual Xeon and my experience is VERY positive. My opinion is that 20% speed improvement is very conservative: my experience shows that code written with HT in mind (see Intel's develop. guide), gets 50% speed-up (compiled with icc). Many other standard tools I use daily (like BLAST) show significant improvement too. In fact, for my applications, to reach 2*2.8 Xeons I need something like 8*750Mhz Ultra3 CPUs or 4*667 EV67 :) This is quite impressive for me, if cost is taken into the picture!

    In general, I think HT is a very clever idea allowing much better use of CPU. I hope they'll come up with 4* and more. This is also an interesting challenge for M$: now, when in 6 months SMP is on every desktop, all kernel internals suddenly need to be SMP safe (including third party buggy drivers): looking back at the long way Linux evolved to the current state and multiplying on their rate of innovation, it's not going to be trivial (good for Linux, of course).

    1. Re:HT is great by Anonymous Coward · · Score: 0

      >This is also an interesting challenge for M$: now, when in 6 months SMP is on every desktop, all kernel internals suddenly need to be SMP safe (including third party buggy drivers)

      Pity for you, but Windows NT has been SMP capable for 5 years so all the bugs are probably out by now.

  84. Very good for single-user machines by r6144 · · Score: 1

    Users don't care about 5% lower throughput, but since the web browser can run simultaneously with a background compilation, the user experience will be much smoother (everyone says that). In the single-threaded benchmarks, there is only one thread running, so no speedups can be expected.

  85. Some background and details by sjames · · Score: 2

    It all starts with the long pipelines and being able to dispatch several instructions at once. The problem was that some of the execution units would go idle due to instructions that couldn't be reordered effectively enough.

    The idea behind Hyperthreading is to have an additional context also dispatching instructions which need not be in any particular order with respect to the first thread. This allows the CPU to actually run at closer to 100% capacity.

    The upside is that when things work out well, fewer execution units sit idle and waste cycles. The downside is that if one thread does manage to fully utilize the CPU, you don't benefit, and will likely pay a penalty for the extra scheduling.

    From what I've seen, many apps benefit. Heavy and well optimized floating point computation can lose on HT. Some of that can be helped by a more aware scheduler that tries to pair up primarily integer threads with threads doing a lot of floating point.

  86. Re:excellent by alext · · Score: 2

    Microsoft has done some pretty good advancements and achievements in the SMP realm

    Hmmm. Given that SMP has been around an awfully long time, I find this a little hard to believe. And I also remember talking to a senior DB guy at Microsoft where he was explaining how they'd just started to do SMP optimization in the OS - this was for an NT SP in 1998 or 99.

  87. Re:excellent by Anonymous Coward · · Score: 0

    What's incorrect? The split was before 2.0 shipped.

    Google Groups for 'Letwin SMP' says (April '93 posting!) his wager was "that IBM would *not* ship an SMP-capable OS/2 by March, 1993". And that guy, ten years back, was conceding defeat.

    I will admit this doesn't address when NT got SMPed. Maybe Letwin's bet was just bagging on OS/2, not pumping NT.

    As to what's "still known", I'll leave that to the peanut gallery.

  88. Re:Fundamental mistake - SP3 by Anonymous Coward · · Score: 0

    on my local dual P IV 2.4 GHz Dell PE2650:
    for SP3 on windows 2000 Advanced server, 2 logical per 1 physical.
    This will not be fixed in any release of windows 2000.
    It will be fixed in .Net/windows 2003 server.
    check the MSKB for details.

  89. Just like Windows by Anonymous Coward · · Score: 0

    2000 and probably WinNT 4 saw the HT as two processors, but WinXP differentiates between virtual and physical CPUs and schedules them appropriately.

  90. This stuff is hot by andbutso · · Score: 1

    I have two Hyper-Threading Xeons in my server and it compiles a kernel like I had typed 'ls'.

  91. Distributed Computing (Offtopic) by Quantum+Skyline · · Score: 1

    So can I use this to run two distributed computing projects at once? Or several instances of one? The Folding@Home project keeps track of how many "active CPUs" that have responded in the last week. Does a hyperthreaded processor count as 2?

  92. Re:excellent by Anonymous Coward · · Score: 0

    My statement was that I doubt any MS product does this.

    Do you really think that Intel implements ANY feature without running it by Microsoft first? Think about it.

    Anyway, Linux 2.5 is still in Alpha (long way away from mainstream distribution), and Windows 03 will be shipping in April, so we'll call this one for MSFT.

  93. Re:excellent by Anonymous Coward · · Score: 0

    OS/2 2.x SMP was very poor performing, and Microsoft benchmarketed the hell out of the fact that NT did better.

    At least with the application we were running (Notes Server), OS/2 SMP wasn't a win at all, while NT scaled almost linearly. Strange because Notes was very thread-happy (in fact too thread-happy, and it would hit some limit in OS/2 and panic).

    Furthermore, OS/2 2.x SMP wasn't in the box and required some special IBM Salesman Ass-Kissing mojo that we apparently didn't have.

    Ironically, OS/2 currently has one of the best SMP implementations on x86 available. But back in 1993-4, it was not so good.

  94. More Questions by chafey · · Score: 1

    I just ordered several dual P4-XEON, 1 Gig RAM, 80 Gig HD workstations for $1800 apiece from Dell. Besides the amazing price, I am very interested to see how a single processor HT compares to two non HT processors. I understand that HT can easily be controlled by a BIOS switch so it shouldn't be that hard. I fully expect the two non HT processors to win, but the question is by how much? Which will perform better for memory bound tasks? Which will perform better for context switches? Which will perform better for high levels of lock contention? How will the interactivity of the user interface compare? Interesting questions, yet I haven't seen any good data on this yet. My hope is that a single HT processor will provide 80% of the benefit a dual processor gives me today - that would be a major win for everyone!

  95. Hyperthreading better for SDRAM than DDR/rambus by Adam+J.+Richter · · Score: 2

    The benefits of hyperthreading are a function of how many cycles would otherwise be wasted by the CPU as it waits on a cache miss. Systems using regular Synchronous DRAM have more of these lost cycles than systems using Double Data Rate or Rambus. So, I think hyperthreading should narrow the performance gap between regular SDRAM versus DDR or Rambus.

    Also, with DDR or Rambus costing nearly triple what SDRAM costs, I wonder if some enterprising company will develop a chipset that can interleave access to two SDRAM DIMMs for performance similar to DDR. Even if the two DIMMs have to be on completely separate electrical busses, I would think that it would result in a lower total cost for most popular combinations of performance levels and memory capacity.
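
    The claim in the first paragraph (more stall cycles means more to gain from hyperthreading) can be put into a back-of-the-envelope model. This is only a sketch under crude assumptions: each thread is stalled on memory for a fixed fraction of cycles, stalls are independent between threads, and the core does useful work whenever at least one thread is not stalled. The function names and the sample stall fractions are made up for illustration, not measured values.

```python
def core_utilization(stall_fraction, threads=1):
    """Fraction of cycles in which at least one thread is ready to run.

    Assumes each thread is independently stalled on memory for
    `stall_fraction` of all cycles (a simplification).
    """
    if not 0.0 <= stall_fraction < 1.0:
        raise ValueError("stall_fraction must be in [0, 1)")
    return 1.0 - stall_fraction ** threads


def ht_speedup(stall_fraction):
    """Throughput with two hardware threads relative to one.

    Under the assumptions above this simplifies to 1 + stall_fraction.
    """
    one = core_utilization(stall_fraction, threads=1)
    two = core_utilization(stall_fraction, threads=2)
    return two / one


if __name__ == "__main__":
    # Hypothetical stall fractions: slower memory means more stalled cycles.
    for label, stall in (("faster memory", 0.2), ("slower memory", 0.5)):
        print(label, "->", ht_speedup(stall))
```

    Under these assumptions the machine with the slower memory gets the larger relative gain, which is the gap-narrowing effect described above.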

  96. Maybe I'm ignorant, here, but... by bsdbigot · · Score: 1

    Through a merger, we recently converted to being a Big Blew shop, so when procurement dropped my new IBM x255 in my lap, I was extremely surprised to see that I had more processors (8) than the box physically supports (4). Freaky. Then, I find through some querying that there is apparently work being done in 2.5 to "turn off" this "feature." Makes you wonder if everyone really thinks this is an improvement. But this is a real head trip if you don't know anything about it and your first encounter is in dmesg!

    On the whole, though, I'm pleased (so far) with its performance, though I haven't done any real benchmarking. But, the fact that anyone would want to turn this off still bugs me just a little bit - are people as scared of this as I am? I must be getting old...

  97. Re:Underwhelmed -- Incorrect by teflonrabbit · · Score: 1

    Incorrect. The "make" command takes care of the parallelism by using a dependency file (Makefile). If a makefile is written poorly (sequential compile commands with no dependency information given) then you will get no gain from forking additional make processes as the compile commands will execute sequentially on *one* processor. GCC itself does not utilize more than one processor.
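
    Why dependency information matters can be shown with a small Python sketch. This is not make's actual algorithm, just a simplified illustration: the function name, the target names, and the idea of grouping targets into "waves" are assumptions. Targets in the same wave have all their prerequisites built already, so they could be compiled at the same time.

```python
def build_waves(deps):
    """Group targets into waves that could be built concurrently.

    `deps` maps each target to the set of targets it depends on.
    Every prerequisite must itself appear as a key.
    """
    remaining = {target: set(prereqs) for target, prereqs in deps.items()}
    done = set()
    waves = []
    while remaining:
        # A target is ready once all of its prerequisites are built.
        ready = sorted(t for t, p in remaining.items() if p <= done)
        if not ready:
            raise ValueError("dependency cycle or unknown prerequisite")
        waves.append(ready)
        done.update(ready)
        for target in ready:
            del remaining[target]
    return waves


if __name__ == "__main__":
    # Well-described build: the object files are independent of each other.
    good = {"a.o": set(), "b.o": set(), "c.o": set(),
            "app": {"a.o", "b.o", "c.o"}}
    # Poorly described build, modelled as a strict chain of steps.
    poor = {"step1": set(), "step2": {"step1"}, "step3": {"step2"}}
    print(build_waves(good))
    print(build_waves(poor))
```

    With the independent object files, three compiles land in the same wave and extra processors have something to do. With the chain, every wave holds a single step, so extra make processes have nothing to run.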

  98. Re:excellent by Vladimir · · Score: 1

    HT is not completely transparent. For example, MTRR registers are not per virtual processor and need more care. So you need to modify the SMP kernel to (correctly) support HT. At least that's my impression from reading newsgroups.

  99. Uh, where have you been. by Anonymous Coward · · Score: 0

    There are plenty of paid MSFT astroturfers lurking around here.

  100. Re:Oh so Intel is not evil today? by Wonko+the+Sane · · Score: 1

    When will the myth of slashdot consensus die?

    Do you think that maybe the people who hate Intel are not the same ones who like Xeon HT processors?

    Just because it appears that a majority of slashdot readers think one way sometimes, more than likely it just means that the other 90% don't care enough to comment.

  101. Re:excellent by Listen+Up · · Score: 2

    The SMP in 2.11 wasn't necessarily the best, but I was just talking about when it really first occurred, so I may have misworded it. They actually had to write the SMP for some pretty interesting hardware (486 SMP) and needed to do some amazing wizardry to make it all work right. After OS/2 Warp came out, the SMP was amazing and NT couldn't touch it. I remember reading an article about the SMP software engineer (name escapes me right now), who was a programming savant; IBM by all means had hired the best of the best. A lot of his work was put on hold for OS/2 Warp as IBM was making OS/2 into a microkernel OS for the then-new PowerPC architecture (which was an ultra amazing OS, but never truly released to the public). This is what basically killed OS/2 in the end and held up the SMP implementation that today is one of the best in the world (on Intel hardware).

  102. OS/2 by Anonymous Coward · · Score: 0

    If I dedicate one CPU to manage all of the other CPUs, will OS/2 finally run efficiently?

  103. Re:excellent by virtual_mps · · Score: 1

    You're buying into the myth of HT transparency. License issues and "things such as cache issues" aren't the problem--the Achilles' heel of HT is that in a physical SMP system you can schedule two processor intensive tasks on two virtual cpus on the same physical cpu, while the other physical cpu is completely idle. This will lead to unpredictable, unrepeatable, and just plain bad performance. On a system with only one physical cpu this is not an issue, but you'd better make sure you don't run HT on a multi-cpu system without some OS scheduler support.
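
    The scheduling problem can be sketched in a few lines of Python. This is a simplified illustration, not any real kernel's scheduler: the topology map, the assumption that logical cpus 0 and 1 share a physical cpu, and both function names are hypothetical.

```python
def place_naive(num_tasks, topology):
    """HT-unaware placement: fill logical cpus in numbering order."""
    return sorted(topology)[:num_tasks]


def place_ht_aware(num_tasks, topology):
    """Prefer logical cpus whose physical cpu has the fewest busy siblings.

    `topology` maps logical cpu id -> physical cpu id.
    """
    busy = {}                    # physical cpu id -> number of busy siblings
    free = sorted(topology)
    chosen = []
    for _ in range(min(num_tasks, len(topology))):
        cpu = min(free, key=lambda c: (busy.get(topology[c], 0), c))
        free.remove(cpu)
        chosen.append(cpu)
        busy[topology[cpu]] = busy.get(topology[cpu], 0) + 1
    return chosen


if __name__ == "__main__":
    # Assumed layout: two physical cpus, two logical cpus on each.
    topology = {0: 0, 1: 0, 2: 1, 3: 1}
    for name, place in (("naive", place_naive), ("aware", place_ht_aware)):
        cpus = place(2, topology)
        print(name, cpus, "physical:", [topology[c] for c in cpus])
```

    With two busy tasks, the naive version puts both on the two logical cpus of physical cpu 0 and leaves physical cpu 1 idle, which is exactly the bad case described above. The aware version spreads them across both physical cpus.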