Slashdot Mirror


Ars Technica on Hyperthreading

radiokills writes "Ars Technica has a highly-informative technical paper up on Hyper-Threading. It's a technical overview of how simultaneous multithreading works, and what problems it will introduce. It also explains why comparing the technology to SMP is Apples to Oranges, in a sense. Starting with the 3 GHz Pentium 4, this tech will be standard in Intel's desktop lines (it's already in the Xeon), so this is important stuff."

18 of 235 comments

  1. Might not speed up benchmarks... by be-fan · · Score: 4, Informative

    But I'd bet it gives quite a boost to interactive performance. SMP setups tend to be wonderfully responsive under background loads (much more so than the sum of the CPU speeds would suggest) so I'd guess that allowing the CPU to run more than one thread at a time would make the UI a little more responsive on single-proc machines. Now all we need is for the UNIX developers to stop being afraid of multithreading and maybe some of us UNIX users would be able to take advantage of this :0

    --
    A deep unwavering belief is a sure sign you're missing something...
    1. Re:Might not speed up benchmarks... by Kashif+Shaikh · · Score: 4, Interesting

      > UNIX developers to stop being afraid of multithreading and maybe some of us UNIX users would be able to take advantage of this

      Do you know why they are afraid? In my view, threads re-introduce the problem where you have a bunch of processes that can freely share any memory at will, use any means of communication, and are a pain in the Ass with a capital A to debug/trace properly (without using internal debuggers). Try debugging a single process with dozens of different threads (i.e. threads with different entry points), where each thread has another dozen instances of itself. Now try using traditional debugging tools like strace, gprof (for tracing), or gdb.

      In traditional multi-process environments, multiple processes are forced to communicate using well-designed message passing interfaces (pipes, unix domain & net sockets, FIFOs, message queues, shared memory). Sure, you can use shared memory, but it's done in a more restricted way (you share a buffer) so that it's not abused. Badly written threads in my experience use global variables and literally hundreds of flags (I'm not joking) for communicating what to do, what the state is, etc. Debugging processes is easier IMO, because all processes can dump their core, and you can pause a process in action and see exactly what it's currently doing (tracing).
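
      A minimal sketch of the message-passing style described above, assuming a Unix-like system where os.fork() is available. The function name ask_child is made up for illustration. The only thing the two processes share is what travels over the pipes.

```python
import os


def ask_child(message: bytes) -> bytes:
    """Send one message to a forked child over a pipe and read its reply."""
    to_child_r, to_child_w = os.pipe()
    to_parent_r, to_parent_w = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: close the pipe ends it does not use.
        os.close(to_child_w)
        os.close(to_parent_r)
        # Read until the parent closes its write end (EOF).
        with os.fdopen(to_child_r, "rb") as inbox:
            request = inbox.read()
        # Reply with an upper-cased copy of the request.
        with os.fdopen(to_parent_w, "wb") as outbox:
            outbox.write(request.upper())
        os._exit(0)

    # Parent: close the pipe ends that belong to the child.
    os.close(to_child_r)
    os.close(to_parent_w)
    # Leaving the with-block closes the write end, which signals EOF to the child.
    with os.fdopen(to_child_w, "wb") as outbox:
        outbox.write(message)
    with os.fdopen(to_parent_r, "rb") as inbox:
        reply = inbox.read()
    os.waitpid(pid, 0)
    return reply
```

      Either side can be paused, traced, or core-dumped on its own, since nothing but the pipe contents crosses the process boundary.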

      I want to ramble more, but I'm tired. Anyone have more input on threads vs. processes?

  2. "It's already in the Xeon" by Theatetus · · Score: 4, Insightful

    Yes, but since no one has a supersentient compiler and assembler like HT requires, very few programs are able to really take advantage of this.

    I dig innovation. I dig more impressive chips. But it's getting to the point where boxes with top of the line CPUs are like those old VWs with Porsche engines in them: there comes a point when improving one part doesn't really matter any more.

    --
    All's true that is mistrusted
    1. Re:"It's already in the Xeon" by be-fan · · Score: 4, Insightful

      Um, HT doesn't require supersentient compilers, it requires mildly sentient developers. Namely, developers have to make their programs multithreaded. In the Windows world, this happens already, far less so in the Linux world. Speaking of supersentient compilers, Intel C++ 6.0 supports OpenMP, even on Linux.

      --
      A deep unwavering belief is a sure sign you're missing something...
    2. Re:"It's already in the Xeon" by jc42 · · Score: 4, Interesting

      > developers have to make their programs multithreaded. In the Windows world, this happens already, far less so in the Linux world.

      There's a good reason for this. The biggest problem with debugging multithreaded code is preventing the threads from shooting each other in the foot. On unix-like systems, there's a simple, elegant solution to this: processes. If you use independent processes with shared memory, you can limit the foot-shooting problems to only the shared segments, and the rest of the code is safe. You also have several kinds of inter-process communication that are easy to program and fairly failsafe.
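
      A rough sketch of that idea, assuming a Unix-like system (os.fork() plus an anonymous mmap, which is shared across fork on Unix). The names private_counter and run_child_and_report are made up for illustration. The child scribbles on both an ordinary variable and the shared segment, and only the shared segment changes in the parent.

```python
import mmap
import os
import struct

# Ordinary memory: after fork, each process has its own copy of this.
private_counter = 0


def run_child_and_report():
    """Return (private_counter, shared_value) as seen by the parent."""
    global private_counter
    # Anonymous mapping; on Unix the default is MAP_SHARED, so it survives fork.
    shared = mmap.mmap(-1, 8)
    shared[0:8] = struct.pack("q", 0)

    pid = os.fork()
    if pid == 0:
        # Child: this write only touches the child's private copy.
        private_counter = 999
        # This write lands in the segment both processes can see.
        shared[0:8] = struct.pack("q", 42)
        os._exit(0)

    os.waitpid(pid, 0)
    (shared_value,) = struct.unpack("q", shared[0:8])
    shared.close()
    return private_counter, shared_value
```

      The shared 8 bytes still need the same care as any data touched by two threads; the point is only that everything else is out of the child's reach.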

      On Windows, you don't much have these things. Developers don't much take advantage of multiprogramming, because the inter-process communication tools are so complex. So the model is a single huge program that does everything. The natural development is toward an emacs-like system, in which everything is a module in one huge program. In such a model, it makes sense to want to use threads, so that some tasks can proceed when others are blocked.

      One way to get unix/linux developers to adopt threads is making it more difficult to use the basic unix multi-processing and IPC tools. If they can be made more complex than threads, then people will adopt the Windows model.

      Alternatively, the threads library could be made as easy to use as the older unix approach. But so far, there's little sign of this happening.

      Threads are a debugging nightmare, and a programmer who has lost months trying to debug a threadized program, and finding that the end result runs even slower than the original, is going to be shy to do it again.

      Also, calling the developers dummies isn't very persuasive. They mostly hear such insults as a euphemism for "It's too complicated for your simple mind." When I hear things like that as answers to my questions, I tend to agree with my critic, and revert to things that I can understand and get to work right.

      --
      Those who do study history are doomed to stand helplessly by while everyone else repeats it.
    3. Re:"It's already in the Xeon" by spitzak · · Score: 5, Insightful
      I would agree with the rest of the responders here that you have no idea what you are talking about.

      A correct multithreaded program is HARD!!!!! Anybody who thinks otherwise is an idiot. I have seen the results. All the systems I have seen are either broken or have so many locks in them that they may as well be single-threaded. Most Windows programmers use multithreading so that they can keep more state in local variables, which may be an ok goal but has nothing to do with speed. One of the biggest, buggiest programs here is a multi-threaded monstrosity written by a Windows programmer where there are 50 threads, ALL WAITING ON THE SAME SOCKET, and it crashes sporadically in the rare cases when two threads actually become alive at the same time. Every single rewrite to reduce the number of threads has greatly improved performance and reliability.

      I have no idea why you think GUI should be multi-threaded. GUI has no reason to be fast, computers are MUCH faster than humans, at least at drawing junk on the screen. In fact the best way to do it is pseudo-multithreading, such as the method Windows uses (gasp! Fact alert: it is NOT multithreaded, only one "DispatchMessage" is running at a time!).

      I think perhaps you mean that the GUI should be running in a parallel thread with the calculations and there you have a point, however a lot of the problems are solved by deferred redraw, which the X toolkits do quite well (and in fact Windows is broken because it produces WM_PAINT events without knowing if the program has more processing to do).

      Now if there are intense calculations I grant that parallel threads are necessary, and I am working on such a program, but I must warn you that it is extremely difficult: the GUI cannot modify ANY structure being used by the parallel thread, instead it must kill the threads, wait for them to stop, modify the structure, and start them again. If in fact nothing changed you need to restart so the partially-completed answer from last time can be reused; this means you must write all the code you would for a single-threaded application, so it does NOT save you anything. If you restart the complete parallel calculation you will get an unresponsive program if that parallel calculation takes more than a second or so. You could instead do a fancy test to see if your modifications will change the data before you kill the threads and commit them, but this often requires you to calculate the modifications twice, and the overhead of this may well kill the advantage of the parallel thread, and at least in my example this was far worse than reusing all the single-threaded restart code.
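
      A small sketch of the stop, wait, modify, restart cycle described above. The class BackgroundSum and its methods are hypothetical, and this is the naive variant that restarts the whole calculation from scratch rather than reusing a partial answer.

```python
import threading


class BackgroundSum:
    """Worker that sums a list step by step and can be stopped between steps."""

    def __init__(self, data):
        self.data = list(data)
        self.result = None
        self._stop_requested = threading.Event()
        self._thread = None

    def start(self):
        self._stop_requested.clear()
        self.result = None
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def _run(self):
        total = 0
        for value in self.data:
            if self._stop_requested.is_set():
                return  # abandon the partial answer
            total += value
        self.result = total

    def stop_and_wait(self):
        self._stop_requested.set()
        if self._thread is not None:
            self._thread.join()

    def modify(self, index, value):
        # The "GUI" side never touches data while the worker is running:
        # stop the worker, wait for it, change the structure, start again.
        self.stop_and_wait()
        self.data[index] = value
        self.start()

    def wait(self):
        self._thread.join()
        return self.result
```

      Calling BackgroundSum([1, 2, 3]).start() followed by modify(0, 10) and wait() gives the sum of the modified list. With a long calculation, throwing away the partial answer on every modify is exactly the unresponsiveness warned about above.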

    4. Re:"It's already in the Xeon" by Anonymous Coward · · Score: 5, Informative
      > There's a good reason for this. The biggest problem with debugging multithreaded code is preventing the threads from shooting each other in the foot.

      Yes, you have to use mutexes and other synchronization primitives to serialize (or at least de-conflict) accesses to shared data. But, there's nothing that requires you to share data between threads. In fact, a significant percentage of the data in the average multi-threaded program is not shared. No matter whether you are building an application using multiple threads or multiple processes, you still have the freedom to use whatever mix of data sharing and message passing is appropriate for your application.
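
      As a minimal illustration of that split, here is a sketch with one shared counter guarded by a mutex and per-thread working data that needs no locking at all. The function name count_with_lock and its parameters are made up for the example.

```python
import threading


def count_with_lock(num_threads=4, increments=10000):
    """Each thread counts privately, then adds its total to a shared counter."""
    counter = 0
    lock = threading.Lock()

    def work():
        nonlocal counter
        local_total = 0  # private to this thread, so no lock is needed
        for _ in range(increments):
            local_total += 1
        with lock:  # the only access to shared data is serialized
            counter += local_total

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```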

      > On unix-like systems, there's a simple, elegant solution to this: processes. If you use independent processes with shared memory, you can limit the foot-shooting problems to only the shared segments, and the rest of the code is safe.

      Data shared by multiple processes needs exactly the same kind of protection as data shared by multiple threads. Except that using shared memory segments requires a lot of extra bookkeeping and the segments aren't cleaned up if a program terminates abnormally. And obviously, no matter whether you are using multiple threads or processes, the foot shooting is limited to the shared data only.

      > You also have several kinds of inter-process communication that are easy to program and fairly failsafe.

      You can communicate between threads (or even between the same thread or process and itself) using named pipes if you want. Same goes for sockets. Using a multi-process model instead of a multi-threaded model doesn't give you access to any additional mechanisms. In fact, it's much easier to build useful communications mechanisms if you're working with threads.
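
      One way to picture message passing between threads is a pair of in-process queues, sketched below. This uses Python's standard queue module rather than named pipes or sockets, and the function name square_via_messages is hypothetical.

```python
import queue
import threading


def square_via_messages(numbers):
    """Send numbers to a worker thread as messages and collect the replies."""
    requests = queue.Queue()
    replies = queue.Queue()

    def worker():
        while True:
            item = requests.get()
            if item is None:  # sentinel: no more work
                break
            replies.put(item * item)

    thread = threading.Thread(target=worker)
    thread.start()
    for n in numbers:
        requests.put(n)
    requests.put(None)
    thread.join()
    # A single worker handles messages in FIFO order, so replies stay in order.
    return [replies.get() for _ in numbers]
```

      The worker and the caller never touch each other's variables directly; the queues are the only channel between them.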

      > On Windows, you don't much have these things. Developers don't much take advantage of multiprogramming, because the inter-process communication tools are so complex. So the model is a single huge program that does everything. The natural development is toward an emacs-like system, in which everything is a module in one huge program. In such a model, it makes sense to want to use threads, so that some tasks can proceed when others are blocked.

      In Windows, you have basically the same tools. You may not know this, but the process & thread model in Windows is virtually the same as in most modern UNIX systems. The fact that old UNIX command line tools are small and oriented around using pipes for IPC is mainly a byproduct of history & convention, if that's what you're thinking of.

      > One way to get unix/linux developers to adopt threads is making it more difficult to use the basic unix multi-processing and IPC tools. If they can be made more complex than threads, then people will adopt the Windows model.

      > Alternatively, the threads library could be made as easy to use as the older unix approach. But so far, there's little sign of this happening.

      I would say that building applications with multiple threads is already easier than building applications with multiple processes. That has been my experience anyway.

      > Threads are a debugging nightmare, and a programmer who has lost months trying to debug a threadized program, and finding that the end result runs even slower than the original, is going to be shy to do it again.

      On the contrary, debugging apps that consist of multiple processes is a nightmare. Debugging multi-threaded programs is much easier. For one thing, how many debuggers let you attach to & debug more than one process at a time in the same set of debugger windows (or at all)? Further, when you're debugging a program with multiple processes, if you signal or interrupt one process the others continue on (and vice-versa when you continue). This is rarely what you want. In general, the differences boil down to the fact that the OS & debugger coordinate & manage the execution of multiple threads within one application, while you have to do it manually if you have an application built with multiple processes. That means less work for the developer in terms of lines of code, less work in debugging, etc.

      > Also, calling the developers dummies isn't very persuasive. They mostly hear such insults as a euphemism for "It's too complicated for your simple mind." When I hear things like that as answers to my questions, I tend to agree with my critic, and revert to things that I can understand and get to work right.

      The problem isn't so much that old school UNIX programmers are dumb. Mostly, they're either afraid of change or just too damn arrogant & obstinate to bother learning new technologies.

  3. Hyperthreading on Windows by kawika · · Score: 5, Informative

    If you plan to use any of these features effectively on Windows you'll need to upgrade to Windows .NET Server. Windows 2000 can't distinguish between virtual and physical processors, so if the BIOS doesn't set up a two (real) CPU system the right way it will end up ignoring the second physical processor. My source:

    www.microsoft.com/windows2000/docs/hyperthreading.doc

    1. Re:Hyperthreading on Windows by dzym · · Score: 5, Funny

      I've also heard that a virtual processor requires its own CPU license, at least in Win2K.

    2. Re:Hyperthreading on Windows by riiv · · Score: 4, Informative

      Not Quite.
      From hyperthreading.doc "Windows 2000 Server does not distinguish between physical and logical processors on systems enabled with Hyper-Threading Technology"

      Basically, for the 2000 family you need 2x your CPU-license limit; each virtual processor counts as a physical one.

      So .NET or newer is probably required, depending on your hardware requirements. The 2000 kernel will probably not be rewritten.

      --
      Unix is a standard, DOS is a standard, windows XX is not.
    3. Re:Hyperthreading on Windows by StonyUK · · Score: 5, Informative

      This is partial FUD - the document says that IF your BIOS counts processors the way Intel tell BIOS manufacturers they should, then your 4-CPU licence of 2000 server will utilize the 1st logical CPU of each physical CPU.

      However, it won't go on to use the extra 2nd logical CPU in each physical CPU because you've used up all your licences by then (2000 server only gives you a 4 CPU licence).

      If your BIOS doesn't enumerate CPUs the way Intel says they should, then 2000 will use both logical CPUs on the 1st and 2nd physical CPUs, and presumably leave your other two physical CPUs idle.

      In .NET, it appears that Microsoft have not only taught it how to count CPUs properly regardless of potential BIOS problems, but also decided that only physical CPUs count towards licencing (well DUH!), so with a 4 CPU hyperthreaded system, all 8 of your logical CPUs will be used.
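
      The counting above can be put into a toy model. This is only a simulation of the behaviour as described in this comment, not anything taken from Windows itself; the function names and the "intel"/"packed" labels are invented for the example, and two logical CPUs per physical CPU are assumed.

```python
def logical_cpus(physical, order):
    """Return (physical_id, logical_id) pairs in BIOS enumeration order."""
    if order == "intel":
        # Recommended order: first logical CPU of every package, then the second.
        return [(p, l) for l in range(2) for p in range(physical)]
    # "packed" order: both logical CPUs of package 0, then package 1, and so on.
    return [(p, l) for p in range(physical) for l in range(2)]


def used_by_win2000(physical, license_limit, order):
    """Windows 2000 model: every logical CPU uses up one licence slot."""
    return logical_cpus(physical, order)[:license_limit]


def used_by_dotnet(physical, license_limit, order):
    """.NET Server model: only physical CPUs count against the licence."""
    return [cpu for cpu in logical_cpus(physical, order) if cpu[0] < license_limit]
```

      With 4 physical CPUs and a 4 CPU licence, the "intel" order gives Windows 2000 one logical CPU on each package, the "packed" order gives it both logical CPUs on the first two packages only, and the .NET model uses all 8.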

  4. it's very difficult to do well by Trepidity · · Score: 5, Informative

    It's incredibly difficult to automatically parallelize a program well. That's true even when you can run a preprocessor on it and spend days on computations; doing it in real time in hardware is even more difficult. This is currently done to a small extent in the pipelining hardware of modern CPUs, and even that small bit of automatic parallelization is ridiculously complex and slows things down (which is why the Itanium dumped it, and put the onus on the compiler to parallelize sufficiently for pipelining to work). If it's that difficult to do for the relatively meager parallelization requirements of pipelining, actually breaking the program into separate execution threads is damn near impossible with current technology (at least with any efficiency even remotely approaching writing a program to be properly multithreaded in the first place).

  5. Hyperthreading? What's next? by The+Slashdolt · · Score: 5, Funny

    What's next, LudicrousThreads?

    obligatory spaceballs reference

    --
    mp3's are only for those with bad memories
  6. Dear sir, by Anonymous+Cowrad · · Score: 5, Funny

    oh no!

    Sincerely,
    Intel

    --
    pants ahoy
  7. Tera/Cray MTA by astroboy · · Score: 5, Interesting

    The company that now owns the name Cray does something very much like this on a fairly grand scale on its own architecture, the MTA (Multi-Threaded Architecture). Here, each processor switches between 128(!) hardware threads to take advantage of the sort of concurrency you can get while waiting for memory access, etc.

  8. Re:Oracle, W2K Enterprise by MmmmAqua · · Score: 5, Informative

    I don't know where you're getting your info about Oracle, but it's wrong. Oracle licensing is determined per-physical CPU. This was something we made doubly-sure to check up on when migrating from our old Oracle server to our new one (dual Xeon w/HT).

    On the downside of HT, until the 2.6 (or 3.0, subject to Linus' whim) kernel comes out, there's no point in enabling HT on a Linux box; because the 2.4 scheduler is unaware of HT, all CPUs are treated the same, and the scheduler ends up starving one physical CPU. Performance on a dual-1.8GHz Xeon, 1GB RDRAM with HT enabled under 2.4.10 is roughly 5-15% slower than with HT disabled.

    2.5.31 with the HT patch dramatically reverses these numbers, providing an average performance that is 30% better than 2.4.10 without HT. YMMV, of course, and I'm not talking about OS performance, I'm talking about Oracle's performance. Still, 30% increase just for flipping a switch in the BIOS and recompiling the kernel is nothing to sneeze at.

    --
    Arr! The laws of physics be a harsh mistress!
  9. SYMMETRIC Multi Threading by keytoe · · Score: 5, Insightful

    They call this stuff Symmetric Multi Threading, but I think that name is a bit misleading. While the thread scheduling itself is symmetric (all process threads are created equal and receive equal execution time), the shared resources on the CPU (cache, shared registers) are NOT symmetric. Since these shared resources are in essence handled on the way in to the execution unit, it becomes really easy to starve the processor when you have contention for one of those resources.

    While proper application development can alleviate some of this issue, it will depend heavily on the actual usage patterns of the system. When you have a lot of overlap coming in from memory (like the file system cache on a web server), you don't worry too much about threads stepping on each others' registers. This sounds fantastic for data servers.

    Desktop systems, on the other hand, almost never work this way. When you're playing MP3s in the background while web surfing and checking your email, you're already working with vastly different areas of data. Throw the OS and any various background processes into the mix and you've pretty much eliminated any gain and possibly slowed down due to cache contention.

    While this was touched on at the end of the article, I don't think it was given enough weight. It doesn't just depend on what applications you're running and whether they were written to take advantage of it. It depends on what you want to do with the whole system. For serving data, this will certainly be good (especially with multiple CPUs!). For desktop systems, this is a non-starter.

    I'm not disparaging the technology - far from it. I'm just waiting for Intel and Microsoft to market this to my mom as a way to have higher quality DVD playback - at twice the cost. And her buying it. Again.

  10. Re:SMP is the way grasshopper by Darren+Winsper · · Score: 4, Informative

    You have a very significant mis-understanding of pre-emptive multi-tasking. There is no situation where a locked process cannot be killed on a single CPU system but can be on a multiple CPU system.

    When the locked application's timeslice runs out, other applications will get a go, and from that it is possible to kill the locked application. This is one of the reasons pre-emptive multi-tasking became popular.
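
    A quick way to see this for yourself, sketched under a few assumptions: a POSIX system, and os.sched_setaffinity being available (it is Linux-specific) to pin the parent and its child onto a single CPU. Where it isn't available the sketch still runs, it just doesn't force the single-CPU case. The function name kill_spinning_child is made up.

```python
import os
import signal
import subprocess
import sys
import time


def kill_spinning_child():
    """Start a child stuck in a busy loop, then kill it from the parent."""
    original = None
    if hasattr(os, "sched_setaffinity"):
        try:
            # Pin this process to one CPU; the child inherits the setting,
            # so both compete for the same processor.
            original = os.sched_getaffinity(0)
            os.sched_setaffinity(0, {min(original)})
        except OSError:
            original = None
    try:
        child = subprocess.Popen([sys.executable, "-c", "while True: pass"])
        time.sleep(0.2)  # let the child hog the CPU for a while
        # The scheduler preempts the spinning child so this line gets to run.
        child.send_signal(signal.SIGKILL)
        return child.wait()
    finally:
        if original is not None:
            os.sched_setaffinity(0, original)
```

    On POSIX the return value is the negative signal number, which shows the child died from the kill rather than exiting by itself.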