Slashdot Mirror


Ars Technica on Hyperthreading

radiokills writes "Ars Technica has a highly-informative technical paper up on Hyper-Threading. It's a technical overview of how simultaneous multithreading works, and what problems it will introduce. It also explains why comparing the technology to SMP is Apples to Oranges, in a sense. Starting with the 3 GHz Pentium 4, this tech will be standard in Intel's desktop lines (it's already in the Xeon), so this is important stuff."

21 of 235 comments

  1. Might not speed up benchmarks... by be-fan · · Score: 4, Informative

    But I'd bet it gives quite a boost to interactive performance. SMP setups tend to be wonderfully responsive under background loads (much more so than the sum of the CPU speeds would suggest) so I'd guess that allowing the CPU to run more than one thread at a time would make the UI a little more responsive on single-proc machines. Now, all we need are the UNIX developers to stop being afraid of multithreading and maybe some of us UNIX users would be able to take advantage of this :0

    --
    A deep unwavering belief is a sure sign you're missing something...
  2. Hyperthreading on Windows by kawika · · Score: 5, Informative

    If you plan to use any of these features effectively on Windows you'll need to upgrade to Windows .NET Server. Windows 2000 can't distinguish between virtual and physical processors, so if the BIOS doesn't set up a two (real) CPU system the right way it will end up ignoring the second physical processor. My source:

    www.microsoft.com/windows2000/docs/hyperthreading.doc

    1. Re:Hyperthreading on Windows by Tassleman · · Score: 2, Informative

      From the document:
      When examining the processor count provided by the BIOS, Windows .NET Server distinguishes between logical and physical processors, regardless of how they are counted by the BIOS. This provides a powerful advantage over Windows 2000, in that Windows .NET Server only treats physical processors as counting against the license limit. For example, if you launch Windows .NET Standard Server (2-CPU limit) on a two-way system enabled with Hyper-Threading Technology, Windows will use all four logical processors, as shown in Figure 4.
      [DIAGRAM 4]
      This example illustrates the great benefit provided by Windows .NET Server on systems enabled with Hyper-Threading Technology--customers are able to harness the processing power of four logical processors using a 2-CPU license.


      Well that's unsurprisingly lame on Microsoft's part. Basically that document says "we're too lazy to update Windows 2000 to PROPERLY recognize SMT-enabled processors, and will screw you on licensing unless you upgrade to .NET Server"

    2. Re:Hyperthreading on Windows by riiv · · Score: 4, Informative

      Not Quite.
      From hyperthreading.doc "Windows 2000 Server does not distinguish between physical and logical processors on systems enabled with Hyper-Threading Technology"

      Basically, for the 2000 family you need 2x your CPU-license limit; each virtual processor counts as a physical one.

      So .NET or newer is probably required, depending on your hardware requirements. The 2000 kernel will probably not be rewritten.

      --
      Unix is a standard, DOS is a standard, windows XX is not.
    3. Re:Hyperthreading on Windows by StonyUK · · Score: 5, Informative

      This is partial FUD - the document says that IF your BIOS counts processors the way Intel tell BIOS manufacturers they should, then your 4-CPU licence of 2000 server will utilize the 1st logical CPU of each physical CPU.

      However, it won't go on to use the extra 2nd logical CPU in each physical CPU because you've used up all your licences by then (2000 server only gives you a 4 CPU licence).

      If your BIOS doesn't enumerate CPUs the way Intel says they should, then 2000 will use both logical CPUs on the 1st and 2nd physical CPUs, and presumably leave your other two physical CPUs idle.

      In .NET, it appears that Microsoft have not only taught it how to count CPUs properly regardless of potential BIOS problems, but also decided that only physical CPUs count towards licensing (well DUH!), so with a 4 CPU hyperthreaded system, all 8 of your logical CPUs will be used.
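The counting rules described in this sub-thread can be sketched as a small Python model. This is only an illustration of the behaviour the posters describe, not of how either Windows version is implemented; the function names and the `(package, logical)` tuples are made up for the example, and each physical package is assumed to expose exactly two logical CPUs.

```python
from typing import List, Tuple

# A logical CPU is written as (physical_package, logical_index). Hypothetical model.
LogicalCpu = Tuple[int, int]


def enumerate_intel_order(packages: int) -> List[LogicalCpu]:
    """BIOS lists the first logical CPU of every package, then the second ones."""
    return [(p, l) for l in range(2) for p in range(packages)]


def enumerate_package_order(packages: int) -> List[LogicalCpu]:
    """BIOS lists both logical CPUs of package 0, then package 1, and so on."""
    return [(p, l) for p in range(packages) for l in range(2)]


def used_counting_logical(order: List[LogicalCpu], licence_limit: int) -> List[LogicalCpu]:
    """Windows 2000 as described above: every enumerated CPU uses one licence slot."""
    return order[:licence_limit]


def used_counting_physical(order: List[LogicalCpu], licence_limit: int) -> List[LogicalCpu]:
    """.NET Server as described above: only distinct physical packages are counted."""
    allowed = sorted({package for package, _ in order})[:licence_limit]
    return [cpu for cpu in order if cpu[0] in allowed]


if __name__ == "__main__":
    print(used_counting_logical(enumerate_intel_order(4), 4))
    print(used_counting_logical(enumerate_package_order(4), 4))
    print(len(used_counting_physical(enumerate_intel_order(4), 4)))
```

With a 4-CPU licence on a four-package box, the first call keeps one logical CPU per package, the second keeps both logical CPUs of packages 0 and 1 only, and the third keeps all eight.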

  3. SMP performance by swg101 · · Score: 2, Informative

    I would agree that an SMP system holds up well. I run 2x 200MHz Pentium Pro, and it gives solid performance as a desktop. I wonder if this tech would allow a slower clock speed chip, thus cooler, that still exhibited good performance. It seems like a good idea for laptops, etc.

    --
    Like pi? Try 10,000 digits.
  4. it's very difficult to do well by Trepidity · · Score: 5, Informative

    It's incredibly difficult to automatically parallelize a program well, even when you can run a preprocessor on it and spend days on computations; doing it in real time in hardware is even more difficult. This is currently done to a small extent in the pipelining hardware of modern CPUs, and even that small bit of automatic parallelization is ridiculously complex and slows things down (which is why the Itanium dumped it, and put the onus on the compiler to parallelize sufficiently for pipelining to work). If it's that difficult to do for the relatively meager parallelization requirements of pipelining, actually breaking the program into separate execution threads is damn near impossible with current technology (at least with any efficiency even remotely approaching writing a program to be properly multithreaded in the first place).

  5. Re:multithreading by Zathrus · · Score: 3, Informative

    Hey, if you know a new solution to deadlocks and race conditions so that it's trivially easy to solve all of them in realtime, then go talk to a processor vendor of your choice - you won't ever have to invent anything again.

    Until that happens it's simply not possible for anything but the most trivial of tasks (which is already done by compilers and processors with multiple execution units).

  6. Re:multithreading by Anonymous Coward · · Score: 1, Informative

    There almost is such a thing, at least in the academic literature:

    ftp://ftp.cs.wisc.edu/sohi/papers/2002/mssp.micro.pdf

  7. Re:Oracle, W2K Enterprise by MmmmAqua · · Score: 5, Informative

    I don't know where you're getting your info about Oracle, but it's wrong. Oracle licensing is determined per-physical CPU. This was something we made doubly-sure to check up on when migrating from our old Oracle server to our new one (dual Xeon w/HT).

    On the downside of HT, until the 2.6 (or 3.0, subject to Linus' whim) kernel comes out, there's no point in enabling HT on a Linux box; because the 2.4 scheduler is unaware of HT, all CPUs are treated the same, and the scheduler ends up starving one physical CPU. Performance on a dual 1.8 GHz Xeon with 1 GB RDRAM and HT enabled under 2.4.10 is roughly 5-15% slower than with HT disabled.

    2.5.31 with the HT patch dramatically reverses these numbers, providing an average performance that is 30% better than 2.4.10 without HT. YMMV, of course, and I'm not talking about OS performance, I'm talking about Oracle's performance. Still, 30% increase just for flipping a switch in the BIOS and recompiling the kernel is nothing to sneeze at.

    --
    Arr! The laws of physics be a harsh mistress!
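To make the "starving one physical CPU" point concrete, here is a toy placement model in Python. It is a sketch under stated assumptions, not the 2.4 or 2.5 scheduler: the topology table (logical CPUs 0 and 1 on package 0, 2 and 3 on package 1) and both placement functions are hypothetical and only illustrate why treating all logical CPUs alike can leave a whole package idle.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical dual-Xeon-with-HT layout: logical CPU id -> physical package id.
TOPOLOGY: Dict[int, int] = {0: 0, 1: 0, 2: 1, 3: 1}


def place_naive(num_tasks: int, topology: Dict[int, int]) -> List[int]:
    """Treat every logical CPU alike: fill the lowest-numbered ones first."""
    return sorted(topology)[:num_tasks]


def place_ht_aware(num_tasks: int, topology: Dict[int, int]) -> List[int]:
    """Prefer a logical CPU whose physical package has the fewest busy siblings."""
    busy_per_package: Counter = Counter()
    idle = sorted(topology)
    chosen: List[int] = []
    for _ in range(min(num_tasks, len(topology))):
        cpu = min(idle, key=lambda c: (busy_per_package[topology[c]], c))
        idle.remove(cpu)
        busy_per_package[topology[cpu]] += 1
        chosen.append(cpu)
    return chosen


def packages_in_use(cpus: List[int], topology: Dict[int, int]) -> Set[int]:
    """Which physical packages have at least one task running on them."""
    return {topology[c] for c in cpus}


if __name__ == "__main__":
    for place in (place_naive, place_ht_aware):
        cpus = place(2, TOPOLOGY)
        print(place.__name__, cpus, packages_in_use(cpus, TOPOLOGY))
```

With two runnable tasks, the naive placement puts both on the two logical halves of package 0, while the HT-aware one spreads them across both packages.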
  8. Linux support for Hyperthreading.. by molo · · Score: 3, Informative

    KernelTrap has had some articles on Linux's support of HT. Ingo Molnar has been working on tuning the scheduler for HT systems. Articles are here:

    http://kerneltrap.org/node.php?id=391
    http://kerneltrap.org/node.php?id=406

    </karmawhoring>

    --
    Using your sig line to advertise for friends is lame.
  9. Re:multithreading by iabervon · · Score: 3, Informative

    Processors do this to the extent that it's possible at runtime; that's what out-of-order execution is, basically. The problem is that it only makes your single threaded program into 2 or 3 threads; beyond that, you need to look at bigger chunks of the program than the processor ever sees at once.

    Beyond that, you really need to be able to look at the program as a whole in order to do anything that clever, so you're talking language, compiler, or library features, and you generally have to involve the programmer somewhat, although you don't necessarily have to do it as explicit threads. (E.g., there's a C variant with a keyword that says it's okay to evaluate all of the arguments to a function at the same time)

  10. Re:SYMMETRIC Multi Threading by wfmcwalter · · Score: 2, Informative

    They call this stuff Symmetric Multi Threading, but I think that name is a bit misleading.

    I believe when symmetric is used in the context of SMP and SMT it is intended to mean "all execution elements have the same public interface".

    Things would be asymmetric in cases where there was a differentiation between the performance or capabilities of the execution elements - e.g. where one processor could handle interrupts and the other couldn't. An 80286+80287 is an example of an asymmetric system - one execution element can only do FP stuff, the other can do everything but FP.

    --
    ## W.Finlay McWalter ## http://www.mcwalter.org ##
  11. Re:SMP is the way grasshopper by Anonymous Coward · · Score: 1, Informative

    pfft... I agree SMP is cool, but I tend to run many VMware processes at the same time.

    I play Q3, burn CDs and listen to music on my 800 MHz P3... The SMP isn't really giving you anything in that department. For example, playing an MP3 on my computer uses about 2% CPU, burning a CD (16x) uses about 20% CPU if that much (CD burning speed is limited by HD speed, not CPU; duh). Q3 gets the rest (around 100 FPS on my ancient GF2 MX card), no worries.

  12. Re:SMP is the way grasshopper by Darren+Winsper · · Score: 4, Informative

    You have a very significant mis-understanding of pre-emptive multi-tasking. There is no situation where a locked process cannot be killed on a single CPU system but can be on a multiple CPU system.

    When the locked application's timeslice runs out, other applications will get a go, and from that it is possible to kill the locked application. This is one of the reasons pre-emptive multi-tasking became popular.
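The timeslice argument above can be shown with a toy single-CPU simulation. This is only a sketch: the task names, the tick counts and the `simulate` function are invented for the example and do not model any real kernel.

```python
from collections import deque
from typing import List, Tuple


def simulate(preemptive: bool = True, timeslice: int = 3,
             max_ticks: int = 20) -> List[Tuple[int, str]]:
    """Run a spinning 'hung_app' and a 'shell' that wants to kill it on one CPU.

    Returns a log of (tick, event) pairs.
    """
    ready = deque(["hung_app", "shell"])
    alive = {"hung_app", "shell"}
    log: List[Tuple[int, str]] = []
    tick = 0
    while ready and tick < max_ticks:
        name = ready.popleft()
        if name not in alive:
            continue  # task was killed while waiting in the queue
        if name == "hung_app":
            if not preemptive:
                # Cooperative scheduling: the spinning task never yields the CPU.
                tick = max_ticks
                log.append((tick, "hung_app never yielded"))
                break
            # Pre-emptive scheduling: the timer interrupt ends the timeslice.
            tick += timeslice
            log.append((tick, "hung_app preempted"))
            ready.append(name)
        else:
            # The shell needs only one tick to deliver the kill.
            tick += 1
            alive.discard("hung_app")
            log.append((tick, "shell killed hung_app"))
    return log


if __name__ == "__main__":
    print(simulate(preemptive=True))
    print(simulate(preemptive=False))
```

In the pre-emptive run the shell gets the CPU as soon as the hung task's slice expires, with only one CPU in the model; in the cooperative run it never does.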

  13. Re:Hyperthreading on Windows - user experience by madbrain · · Score: 2, Informative

    I'm posting this on a Dell P530 development desktop, running Windows 2000 Server.
    The CPU is a single Intel Xeon 2.2 GHz.
    Hyperthreading can be turned on or off in the BIOS of the machine. I turned it on before I installed Win2K.

    The system was seen as a dual CPU machine from the time I installed it from the original CD, before I applied any service pack.

    If I disable hyperthreading in the BIOS and boot Win2K, then I only see one CPU.

    I have a second Xeon CPU on order for this machine as it is dual capable. Once I get it, it should make it look like a quad CPU in Win2K.

    FYI, I am also running another OS on the system, Warp Server for E-business with the SMP kernel. Unfortunately the OS2APIC.PSD driver only detected one CPU even with hyperthreading enabled. I contacted the OS/2 kernel developer at IBM Austin, who told me that somehow there needed to be explicit support for it in OS/2 SMP for it to work.

    I also left about 20 GB unpartitioned on my hard disk for Linux, but I haven't gotten around to installing it yet. Thread support in Linux has historically been poor and this is the main reason why I haven't done so. With the availability of the NPTL library, I'm looking forward to installing Linux, as NPTL becomes the standard pthreads library for Linux.

    --
    -- Julien Pierre http://www.madbrain.com/blog
  14. Re:"It's already in the Xeon" by Anonymous Coward · · Score: 5, Informative

    There's a good reason for this. The biggest problem with debugging multithreaded code is preventing the threads from shooting each other in the foot.

    Yes, you have to use mutexes and other synchronization primitives to serialize (or at least de-conflict) accesses to shared data. But, there's nothing that requires you to share data between threads. In fact, a significant percentage of the data in the average multi-threaded program is not shared. No matter whether you are building an application using multiple threads or multiple processes, you still have the freedom to use whatever mix of data sharing and message passing is appropriate for your application.
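A minimal sketch of that point, using Python's `threading` module as a stand-in for whatever thread library you prefer: only the one shared total is guarded by a mutex, while each thread's own working data needs no locking. The `count_words` function and its inputs are made up for illustration.

```python
import threading
from typing import List


def count_words(chunks: List[str]) -> int:
    """Count words across chunks, one thread per chunk."""
    total = {"words": 0}            # the only data shared between threads
    total_lock = threading.Lock()   # protects `total` and nothing else

    def worker(chunk: str) -> None:
        local_count = len(chunk.split())  # private to this thread, no lock needed
        with total_lock:                  # serialize only the shared update
            total["words"] += local_count

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total["words"]


if __name__ == "__main__":
    print(count_words(["a b c", "d e", "f"]))
```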

    On unix-like systems, there's a simple, elegant solution to this: processes. If you use independent processes with shared memory, you can limit the foot-shooting problems to only the shared segments, and the rest of the code is safe.

    Data shared by multiple processes needs exactly the same kind of protection as data shared by multiple threads. Except that using shared memory segments requires a lot of extra bookkeeping and the segments aren't cleaned up if a program terminates abnormally. And obviously, no matter whether you are using multiple threads or processes, the foot shooting is limited to the shared data only.

    You also have several kinds of inter-process communication that are easy to program and fairly failsafe.

    You can communicate between threads (or even between the same thread or process and itself) using named pipes if you want. Same goes for sockets. Using a multi-process model instead of a multi-threaded model doesn't give you access to any additional mechanisms. In fact, it's much easier to build useful communications mechanisms if you're working with threads.
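As a rough illustration that pipe-style IPC is not reserved for separate processes, the Python sketch below passes a message between two threads of the same process. It uses an anonymous pipe from `os.pipe()` rather than a named pipe to stay self-contained, and the `send_through_pipe` helper is hypothetical.

```python
import os
import threading
from typing import List


def send_through_pipe(message: bytes) -> bytes:
    """Send `message` from a writer thread to a reader thread over a pipe."""
    read_fd, write_fd = os.pipe()
    received: List[bytes] = []

    def writer() -> None:
        # Closing the write end (at the end of the with-block) signals EOF.
        with os.fdopen(write_fd, "wb") as w:
            w.write(message)

    def reader() -> None:
        # read() returns once the writer has closed its end of the pipe.
        with os.fdopen(read_fd, "rb") as r:
            received.append(r.read())

    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received[0]


if __name__ == "__main__":
    print(send_through_pipe(b"hello from another thread"))
```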

    On Windows, you don't much have these things. Developers don't much take advantage of multiprogramming, because the inter-process communication tools are so complex. So the model is a single huge program that does everything. The natural development is toward an emacs-like system, in which everything is a module in one huge program. In such a model, it makes sense to want to use threads, so that some tasks can proceed when others are blocked.

    In Windows, you have basically the same tools. You may not know this, but the process & thread model in Windows is virtually the same as in most modern UNIX systems. The fact that old UNIX command line tools are small and oriented around using pipes for IPC is mainly a byproduct of history & convention, if that's what you're thinking of.

    One way to get unix/linux developers to adopt threads is making it more difficult to use the basic unix multi-processing and IPC tools. If they can be made more complex than threads, then people will adopt the Windows model.

    Alternatively, the threads library could be made as easy to use as the older unix approach. But so far, there's little sign of this happening.

    I would say that building applications with multiple threads is already easier than building applications with multiple processes. That has been my experience anyway.

    Threads are a debugging nightmare, and a programmer who has lost months trying to debug a threadized program, and finding that the end result runs even slower than the original, is going to be shy to do it again.

    On the contrary, debugging apps that consist of multiple processes is a nightmare. Debugging multi-threaded programs is much easier. For one thing, how many debuggers let you attach to & debug more than one process at a time in the same set of debugger windows (or at all)? Further, when you're debugging a program with multiple processes, if you signal or interrupt one process the others continue on (and vice-versa when you continue). This is rarely what you want. In general, the differences boil down to the fact that the OS & debugger coordinate & manage the execution of multiple threads within one application, while you have to do it manually if you have an application built with multiple processes. That means less work for the developer in terms of lines of code, less work in debugging, etc.

    Also, calling the developers dummies isn't very persuasive. They mostly hear such insults as a euphemism for "It's too complicated for your simple mind." When I hear things like that as answers to my questions, I tend to agree with my critic, and revert to things that I can understand and get to work right.

    The problem isn't so much that old school UNIX programmers are dumb. Mostly, they're either afraid of change or just too damn arrogant & obstinate to bother learning new technologies.

  15. Re:"It's already in the Xeon" by Anonymous Coward · · Score: 3, Informative

    One real reason, I think, why we don't see more threads under Linux is that Linux doesn't support POSIX threads. The POSIX threads model is processes which have threads. Linux, on the other hand, has processes which can share address spaces, file descriptors and so on, which is not the same thing.

    Wrong. Linux threads are compliant with POSIX 1003.1c (and most of the common extensions). There is one exception, albeit a minor one - you can signal individual threads in Linux. The POSIX standard specifies nothing about how threads are to be mapped to processes.

    In Linux, the mapping between processes and threads is strictly one-to-one at the kernel level, although the use of thread groups makes it effectively one-to-many at the user process level. Other operating systems such as Solaris offer a many-to-many mapping with kernel light weight processes (LWPs), but it's again one-to-many at the user process level. Both implementations are about equally close to being POSIX compliant (Solaris threads aren't POSIX compliant because they don't support cancellation).

    For example, fork()ing a process in one thread and waitpid()ing on it in another thread simply doesn't work under Linux. This sort of thing makes porting POSIX-compliant multithreaded applications to Linux difficult at best.

    Not true again. In Linux 2.4, a parent process will wait on any child in the same thread group by default, unless you block SIGCHLD. In previous versions, it wasn't the default, but you could still do it. Besides, this doesn't have much to do with POSIX threads, because fork() and waitpid() aren't part of the pthreads API. fork() and waitpid() are process management functions. To create a new thread in POSIX, you use pthread_create() and to wait for one to exit you use pthread_join().
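The disputed case (fork in one thread, waitpid in another) is easy to try. The sketch below does it from Python, with `threading.Thread` start/join standing in for `pthread_create()`/`pthread_join()` and `os.fork()`/`os.waitpid()` for the process-management calls. It is Unix-only, the helper name is invented, and whether it succeeds depends on the kernel and thread library in use, so treat it as an experiment rather than proof about any particular Linux version.

```python
import os
import threading
import warnings
from typing import Dict


def fork_in_one_thread_wait_in_another(exit_code: int = 7) -> int:
    """Fork a child in one thread, reap it in a different thread."""
    shared: Dict[str, int] = {}

    def forker() -> None:
        with warnings.catch_warnings():
            # Newer Python versions warn when fork() is used with threads running.
            warnings.simplefilter("ignore", DeprecationWarning)
            pid = os.fork()
        if pid == 0:
            os._exit(exit_code)  # child: exit at once, skipping interpreter cleanup
        shared["pid"] = pid

    def waiter() -> None:
        # A different thread from the one that called fork() reaps the child.
        _, status = os.waitpid(shared["pid"], 0)
        shared["exit_code"] = os.WEXITSTATUS(status)

    for target in (forker, waiter):
        t = threading.Thread(target=target)  # thread creation, cf. pthread_create()
        t.start()
        t.join()                             # thread join, cf. pthread_join()
    return shared["exit_code"]


if __name__ == "__main__":
    print(fork_in_one_thread_wait_in_another())
```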

    Note: Before anyone accuses me of FUDing, note that I'm not passing value judgements here.

    Perhaps not, but you are passing bad info.

  16. Re:SYMMETRIC Multi Threading by akuma(x86) · · Score: 3, Informative

    It's not symmetric multithreading.
    It's SIMULTANEOUS multithreading.

    This means that both threads are in the processor pipeline simultaneously.

  17. Re:"It's already in the Xeon" by juggy · · Score: 2, Informative

    I don't think you know of the newer approaches to threading technology. For starters, you can check out the scheme48 site. They implemented threads not by using locks but by using logging facilities, that is to say, journals. I just spent 6 months working with this way of doing things, and I can assure you of this:
    1. you cannot get deadlocks
    2. you can hardly produce livelocks
    3. it is much faster than using shared memory
    4. the main system always has access to the memory, no need to unlock/lock/...

    There are a LOT of good reasons to use this sort of multi-threading, especially since - if correctly implemented - it requires much less memory, cpu and debugging efforts than processes or the old sort of threading model.
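For readers who have not seen the log-based style, here is a generic log-and-validate (optimistic) sketch in Python. It is not scheme48's API; the `Store` and `Transaction` classes are hypothetical, and the commit step, which a real runtime would have to make atomic, is trivially atomic here because the demo runs in a single thread.

```python
from typing import Any, Dict


class Store:
    """Shared values, each with a version number bumped on every commit."""

    def __init__(self, **initial: Any) -> None:
        self.values: Dict[str, Any] = dict(initial)
        self.versions: Dict[str, int] = {key: 0 for key in initial}


class Transaction:
    """Reads and writes go to private logs; nothing is locked while working."""

    def __init__(self, store: Store) -> None:
        self.store = store
        self.read_log: Dict[str, int] = {}   # key -> version seen at first read
        self.write_log: Dict[str, Any] = {}  # key -> value to publish on commit

    def read(self, key: str) -> Any:
        if key in self.write_log:
            return self.write_log[key]
        self.read_log.setdefault(key, self.store.versions[key])
        return self.store.values[key]

    def write(self, key: str, value: Any) -> None:
        self.write_log[key] = value

    def commit(self) -> bool:
        # Validate: fail if anything we read was changed by someone else.
        for key, seen in self.read_log.items():
            if self.store.versions[key] != seen:
                return False  # caller simply retries with a fresh transaction
        for key, value in self.write_log.items():
            self.store.values[key] = value
            self.store.versions[key] = self.store.versions.get(key, 0) + 1
        return True


if __name__ == "__main__":
    store = Store(balance=100)
    t1, t2 = Transaction(store), Transaction(store)
    t1.write("balance", t1.read("balance") + 10)
    t2.write("balance", t2.read("balance") - 10)
    print(t1.commit(), t2.commit(), store.values["balance"])
```

Because a conflicting transaction fails validation and retries instead of waiting on a lock, there is nothing for two transactions to deadlock on, which is the property claimed in the list above.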
  18. a clarification by Anonymous Coward · · Score: 3, Informative

    Since lots of people seem to be missing the point of "hyperthreading", as Intel is calling it, I feel like jumping in and trying to clarify a little bit.

    Processor clocks have gotten faster and faster and faster and faster over the last decade. Multiple orders of magnitude faster. Not only that, but processors have incorporated increasingly clever tricks to process the data they have available to them. Memory speeds have increased too, but even with DDR and all that great stuff, they haven't kept pace. So there are times when your super-fast processor is just sitting there waiting around because it's run out of data to process.

    Even if you could (cheaply) make memory that actually ran at 2 GHz or whatever, this would not solve an even more fundamental problem that makes the situation worse: due to the speed of light, a 2 GHz processor is going to have to wait a really significant amount of time if it has to wait on main memory before it's time to process something.
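The speed-of-light remark can be put into numbers with a couple of lines of Python. This is just back-of-the-envelope arithmetic with the vacuum speed of light; real signals in wires and circuit boards travel slower, and a memory access is a round trip, so the practical limit is tighter than this.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # in vacuum


def distance_per_cycle_cm(clock_hz: float) -> float:
    """How far light travels during one clock cycle, in centimetres."""
    return SPEED_OF_LIGHT_M_PER_S / clock_hz * 100


if __name__ == "__main__":
    # At 2 GHz one cycle lasts 0.5 ns, and light covers roughly 15 cm in that time.
    print(round(distance_per_cycle_cm(2e9), 2))
```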

    So, here's a question for you: if the processor has to wait a really long time, maybe enough time to execute maybe like 50 instructions, what should it do during that time? Should it:

    1. Sit on its butt and do absolutely nothing at all, or
    2. Quickly flip over to another thread and start executing its instructions?

    Well, the idea behind hyperthreading (a/k/a thread-level parallelism) is that the processor should make some sort of effort to do something.
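The two options in the question above can be compared with a toy cycle counter. This is a sketch of the "flip to another thread while stalled" idea exactly as the comment puts it, not of Intel's actual design (which can issue from both threads at the same time). The instruction strings, the 50-cycle miss latency and the `total_cycles` function are assumptions made for the example.

```python
from typing import List


def total_cycles(threads: List[str], miss_latency: int, switch_on_stall: bool) -> int:
    """Count cycles to finish all threads on one core.

    Each thread is a string of ops: 'c' is one cycle of compute, 'm' is a
    memory access that issues in one cycle and then blocks that thread for
    `miss_latency` further cycles.
    """
    pos = [0] * len(threads)        # next op index per thread
    ready_at = [0] * len(threads)   # first cycle each thread may issue again
    cycle = 0
    while any(pos[i] < len(t) for i, t in enumerate(threads)):
        unfinished = [i for i, t in enumerate(threads) if pos[i] < len(t)]
        if switch_on_stall:
            # Option 2: issue from any thread that is not waiting on memory.
            runnable = [i for i in unfinished if ready_at[i] <= cycle]
            chosen = runnable[0] if runnable else None
        else:
            # Option 1: stick with one thread and sit out every stall.
            first = unfinished[0]
            chosen = first if ready_at[first] <= cycle else None
        if chosen is not None:
            op = threads[chosen][pos[chosen]]
            pos[chosen] += 1
            if op == "m":
                ready_at[chosen] = cycle + 1 + miss_latency
        cycle += 1
    # A trailing memory access still has to complete before we are done.
    return max([cycle] + ready_at)


if __name__ == "__main__":
    work = ["cmcc", "cmcc"]
    print(total_cycles(work, 50, switch_on_stall=False))
    print(total_cycles(work, 50, switch_on_stall=True))
```

In this model the two stalls overlap almost completely when the core switches, so the same work finishes in about half the cycles; real gains depend heavily on the workload.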

    So, IMHO hyperthreading isn't stupid or a marketing ploy. It's a genuine attempt (one that many processor makers are working on, by the way) to solve a genuine problem. And not only a genuine problem, but one that will increasingly become a bottleneck. (It's already bad enough that it has its own name: "The Von Neumann Bottleneck".)

    And by the way, the advantage of this over two processors is that you don't have to build two chips! You don't get double the performance, but it's quite possible that you might get a better bang for the buck. (Notice I said "might".)

    Also note on the cache pollution issue (where one thread slows down another by "hogging" the cache and actually causing slower execution for another) that there are ways to mitigate this problem. An obvious one that comes to mind is to bias the processor towards executing a particular one of the threads. That way, one thread runs much more often and should tend to have what it needs in the cache.

    Anyway, until the economy gets better and I find a way not to be one of the masses of unemployed software developers anymore, I'm not buying one of these fancy processors...