Slashdot Mirror


Ars Technica on Hyperthreading

radiokills writes "Ars Technica has a highly-informative technical paper up on Hyper-Threading. It's a technical overview of how simultaneous multithreading works, and what problems it will introduce. It also explains why comparing the technology to SMP is Apples to Oranges, in a sense. Starting with the 3 GHz Pentium 4, this tech will be standard in Intel's desktop lines (it's already in the Xeon), so this is important stuff."

235 comments

  1. AMD by Almace · · Score: 1

    Does amd have naything similar? Dan 20sec rule.

    --
    Remember,democracy never lasts long.It soon wastes, exhausts and murders itself. John Adams (1814)
    1. Re:AMD by SuperKendall · · Score: 1

      Does amd have naything similar? Dan 20sec rule.

      That's odd, right about now I was thinking perhaps the limit should have been a bit higher to allow people to proofread posts a bit more.

      --
      "There is more worth loving than we have strength to love." - Brian Jay Stanley
    2. Re:AMD by machine+of+god · · Score: 1

      Does amd have naything similar? Dan 20sec rule.

      Yeah, Dan that 20 second rule.

    3. Re:AMD by Anonymous+DWord · · Score: 3, Insightful

      From http://www.hardocp.com/article.html?art=MzEw :

      As Barton and MP were mentioned, I did think to ask what [Richard Heye, AMD Vice President of Platform Engineering and Infrastructure and the Computation Products Group] thought about the threat of Intel's Hyperthreading. While I see Hyperthreading as possibly becoming a very useful add-on for the Intel CPU, I can assure you that Richard Heye does not. In fact, the subject of Hyperthreading seemed to excite him. Mr. Heye explained that he had been reading papers on the subject for years and that for Intel to bring Hyperthreading to market successfully, they (Intel) were going to have to throw many more dollars at the marketing side than the development side of the issue.

      --
      "If he thinks he can hide and run from the United States and our allies, he's sorely mistaken." Bush on bin Laden
    4. Re:AMD by Anonymous Coward · · Score: 0

      Sorry, I care about performance -- not trash talking.

    5. Re:AMD by Anonymous Coward · · Score: 0

      Naything yuo cna sya wll hrut me

    6. Re:AMD by Almace · · Score: 1

      Actually, the first time I tried to post, it was error free. Then I got a 20 sec warning, pressed the back button, and it erased my post. I re-typed it very quickly. The original post was longer and more in depth. In fact I believe it was the finest post I ever composed. Alas it is now lost.

      --
      Remember,democracy never lasts long.It soon wastes, exhausts and murders itself. John Adams (1814)
    7. Re:AMD by Anonymous Coward · · Score: 0

      Yes, dey do. And because they are manucactured in Dresden, it is called UBER-Threading.

  2. No by chainrust · · Score: 1, Offtopic

    I refuse to support Intel as long as they support Palladium and DRM.

    1. Re:No by unicron · · Score: 2

      I'm sure they'll have to close a plant because of your radical decision.

      --
      Finally, math books without any of that base 6 crap in them.
    2. Re:No by Anonymous Coward · · Score: 0

      That's fine.

      BTW, AMD supports it too.

      Guess you'll be buying a Mac.

    3. Re:No by Billly+Gates · · Score: 0, Offtopic

      Not offtopic. If someone refuses a product based on principle or company ethics then it's a valid point. Unfortunately it won't make a difference in the corporate market, which is where this will sell, but a -1 offtopic moderation is inappropriate.

    4. Re:No by Anonymous Coward · · Score: 0

      you are a fucking moron. AMD will support it also. It is by choice to enable the secure part of the chip. get a clue.

  3. Might not speed up benchmarks... by be-fan · · Score: 4, Informative

    But I'd bet it gives quite a boost to interactive performance. SMP setups tend to be wonderfully responsive under background loads (much more so than the sum of the CPU speeds would suggest) so I'd guess that allowing the CPU to run more than one thread at a time would make the UI a little more responsive on single-proc machines. Now, all we need are the UNIX developers to stop being afraid of multithreading and maybe some of us UNIX users would be able to take advantage of this :0

    --
    A deep unwavering belief is a sure sign you're missing something...
    1. Re:Might not speed up benchmarks... by Anonymous Coward · · Score: 0, Offtopic

      > SMP setups tend to be wonderfully responsive under background loads

      Yes. Guess why ? Because there is an idle processor.

      Guess why ? Because the background task is generally single-threaded.

      > so I'd guess that allowing the CPU to run more than one thread at a time would make the UI a little more responsive on single-proc machines.

      Yes. basically, you'll have a virtual processor for the display server and one for the application. Very smooth.

      > Now, all we need are the UNIX developers to stop being afraid of multithreading and maybe some of us UNIX users would be able to take advantage of this

      Of course not. There will be no idle processor when executing a multi-threaded background task.

      Your setup will become as unresponsive as with a single core...

      Cheers,

      --fred

    2. Re:Might not speed up benchmarks... by be-fan · · Score: 2

      I don't have to guess why. I know why SMP setups are more responsive under background loads. I don't like your tone of voice. As for multithreading, I was talking about GUI apps, which generally aren't multithreaded on *NIX, not background apps.

      --
      A deep unwavering belief is a sure sign you're missing something...
    3. Re:Might not speed up benchmarks... by Anonymous Coward · · Score: 0

      My point was that SMP setups are more responsive under *single-threaded* background loads.

      Making all apps multi-threaded would void the perceived SMP responsiveness advantage.

      (And yes, I had a dual PPC be box...)

      Cheers,

      --fred

    4. Re:Might not speed up benchmarks... by Anonymous Coward · · Score: 0

      Ever use the news reader Pan?

      Multithreaded all over the place. A really cool news reader.

      I wish more applications were seriously multithreaded. You don't really know how much better it can be until you use something that is well designed and multithreaded, quite a joy.

    5. Re:Might not speed up benchmarks... by Anonvmous+Coward · · Score: 2

      Windows 2000 became noticeably more responsive when installed on a Dual Processor machine. I went from a Pentium 3 550 to a dual Pentium 3 500, and Windows/Explorer/IE etc were very enthusiastic and responsive about opening up and staying alert.

      Windows seems to be multi-threaded pretty well, at least from a UI point of view. I cannot help but feel that hyperthreading will likely have a similar result. If it does, it means Windows will behave better for the end user.

      Users often weigh performance on how fast a window pops up, not so much on how many calculations can be performed in a second.

    6. Re:Might not speed up benchmarks... by Ost99 · · Score: 2, Interesting

      SMT isn't necessarily a good idea for desktop computers, perhaps especially from a GUI / responsiveness point of view. SMP machines don't share cache, and have problems running tightly coupled threads, because the threads have to check the other CPU's cache when reading / writing to a (cached) shared resource. On an SMT processor this is not a problem, as the cache is shared.

      SMP does a very good job of "hiding" all the processes the OS runs from a desktop user; you'll never experience slowdowns when the OS or another app writes to / reads from disk (unless it's out of memory and has to use the swap files...). On older systems this could be a significant problem. Playing games from cd-rom was often impossible, as the cd-rom drive used 40%-60% cpu when reading / seeking. With SMP you had another cpu to do your stuff, while the OS did its stuff on another (not true of course, but close).

      An SMT PC won't necessarily benefit the same way as an SMP machine when running such unrelated processes simultaneously, especially cache-intensive processes (cache is a shared, limited resource).

      I think SMT will benefit processor-intensive programs like simulations, and (multithreaded) games.

      If some way of restricting each process's / thread's use of cache isn't implemented, realtime scheduling on these processors will be all but impossible (it's rather hairy on SMP as well).

      - Ost

      --
      ---- Sig. gone.
    7. Re:Might not speed up benchmarks... by Toraz+Chryx · · Score: 2

      Playing games from cd-rom was often impossible, as the cd-rom drive used 40%-60% cpu when reading / seeking
      Might I suggest the use of an IDE controller that supports DMA... otherwise known as pretty much all of them since 1994...

    8. Re:Might not speed up benchmarks... by Kashif+Shaikh · · Score: 4, Interesting

      UNIX developers to stop being afraid of multithreading and maybe some of us UNIX users would be able to take advantage of this

      Do you know why they are afraid? In my view, threads re-introduce the problem where you have a bunch of processes that can freely share any memory at will, use any means of communication, and are a pain in the Ass with a capital A to debug/trace properly (without using internal debuggers). Try debugging a single process with dozens of different threads (i.e. threads with different entry points), where each thread has another dozen instances of itself. Now try using traditional debugging tools like strace, gprof (for tracing), or gdb.

      In traditional multi-process environments, multiple processes are forced to communicate using well-designed message passing interfaces (pipes, unix domain & net sockets, FIFOs, message queues, shared memory). Sure, you can use shared memory, but it's done in a more restricted way (you share a buffer) so that it's not abused. Badly written threads in my experience use global variables and literally hundreds of flags (I'm not joking) for communicating what to do, what the state is, etc. Debugging processes is easier IMO, because all processes can dump their core, and you can pause a process in action and see exactly what it's currently doing (tracing).
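
      A rough illustration of the pipe-based style described above, written in Python purely for brevity. The child program, the name ask_worker, and the upper-casing "protocol" are all made up for this sketch; the only point is that the two processes share nothing except the bytes that travel through the pipe.

```python
import subprocess
import sys

# Hypothetical child program: upper-cases every line it receives on stdin.
CHILD_SOURCE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line.upper())\n"
)


def ask_worker(messages):
    """Send messages to a separate process over a pipe and collect its replies."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        input="".join(m + "\n" for m in messages),  # one message per line
        capture_output=True,
        text=True,
        check=True,  # raise if the child exits with an error
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    print(ask_worker(["hello", "threads"]))
```

      If the child crashes, the parent's memory is untouched and the failure shows up as a non-zero exit status, which is roughly the isolation argument being made here.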

      I want to ramble more, but I'm tired. Anyone have more input on threads v.s processes?

    9. Re:Might not speed up benchmarks... by Anonymous Coward · · Score: 0

      "Now, all we need are the UNIX developers to stop being afraid of multithreading"

      What?! Who, why, and which Unix developer has been afraid of multithreading??? I have never heard of such a thing.

      Please, give me a clue, sign, or at least a link!

    10. Re:Might not speed up benchmarks... by kasperd · · Score: 2

      as the cd-rom drive used 40%-60% cpu when reading / seeking. With SMP you had another cpu to do your stuff, while the OS did its stuff on another (not true of course, but close).

      This really indicates poorly designed hardware missing interrupts and/or DMA. Surely SMP will help here, but an extra CPU is a high price to pay compensating for the few bucks saved by using poorly designed components for the rest of the system.

      I think HT will also help. As long as the busy CPU is busywaiting, the clever driver/OS designer could even make use of the pause instruction to reduce this virtual CPU's resource usage, thus speeding up the other virtual CPU. This means that on HT the resources wasted on busywaiting can be less than on SMP.

      --

      Do you care about the security of your wireless mouse?
    11. Re:Might not speed up benchmarks... by technicurt · · Score: 1

      Relatively few programmers are going to take the time to split programs and algorithms into cooperative multithreaded tasks. Supercomputer-type numeric applications would clearly benefit but that is boring.

      A much bigger win would be to run instructions from multiple *processes* simultaneously. Think of all the background processes that are running and of things like 'gmake -j'. The O/S would manage it transparently and all programs would benefit.

      Curt

    12. Re:Might not speed up benchmarks... by Anonymous Coward · · Score: 0

      Global variables being overwritten is the whole idea with threads. Without any shared (i.e. global) variables, the whole idea disappears, and all you have left is something that should have been done with processes.

    13. Re:Might not speed up benchmarks... by Ost99 · · Score: 1

      A DMA capable controller didn't help much in the old days. Seeking still took much cpu. I'm talking about the good old days here, 1992-1994. The first 2x and 4x drives were horrible CPU hogs.

      But no matter, that was not the point, just an example.

      - Ost

      --
      ---- Sig. gone.
    14. Re:Might not speed up benchmarks... by dmelomed · · Score: 1

      There's more than one kind of thread, but pthreads programming can result in a major PITA. Furthermore, it may not even offer any performance improvement if the design was incorrect. Don't think of threads as a panacea.

    15. Re:Might not speed up benchmarks... by be-fan · · Score: 2

      Most of your complaints arise from bad design or poor tools. It's just as easy to make multiple threads communicate via well defined interfaces (messaging) as to make multiple processes communicate via well defined interfaces. There is the incentive to use global variables and whatnot, but that's just bad programming, and doesn't reflect on threads as a design feature. The main advantage of threads vs processes is that current APIs have a good deal of support for multithreaded programs, but not multiprocess programs. It's a lot easier to create a GUI program that handles all drawing and user-interface tasks in a separate thread than to create a GUI program that handles these tasks in a separate process, because the GUI toolkits have some level of support for threads.
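
      As a sketch of what "threads communicating via messaging" can look like, here is a small Python example. The names worker, square_all, and the STOP sentinel are invented for illustration; the two threads touch no shared variables directly and only exchange values through two thread-safe queues.

```python
import queue
import threading

STOP = object()  # sentinel message telling the worker to exit


def worker(inbox, outbox):
    """Receive numbers on inbox, reply with their squares on outbox."""
    while True:
        msg = inbox.get()
        if msg is STOP:
            break
        outbox.put(msg * msg)


def square_all(numbers):
    """Hand a list of numbers to a worker thread and collect the replies in order."""
    inbox, outbox = queue.Queue(), queue.Queue()
    thread = threading.Thread(target=worker, args=(inbox, outbox))
    thread.start()
    for n in numbers:
        inbox.put(n)
    inbox.put(STOP)
    thread.join()  # worker has processed everything once it exits
    return [outbox.get() for _ in numbers]


if __name__ == "__main__":
    print(square_all([1, 2, 3]))
```

      With a single worker the replies come back in the order the requests were sent; with several workers that ordering would no longer be guaranteed.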

      --
      A deep unwavering belief is a sure sign you're missing something...
    16. Re:Might not speed up benchmarks... by Kashif+Shaikh · · Score: 1

      Check out why samba doesn't use threads: here.

  4. "It's already in the Xeon" by Theatetus · · Score: 4, Insightful

    Yes, but since no one has a supersentient compiler and assembler like ht requires, very few programs are able to really take advantage of this.

    I dig innovation. I dig more impressive chips. But it's getting to the point where boxes with top of the line CPUs are like those old VWs with Porsche engines in them: there comes a point when improving one part doesn't really matter any more.

    --
    All's true that is mistrusted
    1. Re:"It's already in the Xeon" by be-fan · · Score: 4, Insightful

      Um, HT doesn't require supersentient compilers, it requires mildly sentient developers. Namely, developers have to make their programs multithreaded. In the Windows world, this happens already, far less so in the Linux world. Speaking of supersentient compilers, Intel C++ 6.0 supports OpenMP, even on Linux.

      --
      A deep unwavering belief is a sure sign you're missing something...
    2. Re:"It's already in the Xeon" by Anonymous Coward · · Score: 0

      > Um, HT doesn't require supersentient compilers, it requires mildly sentient developers.

      I've yet to meet a developer who makes a correct multi-threaded program.

      One that never deadlocks or crashes.

      Mildly sentient developers make multithreaded programs (that crash)

      Sentient developers make singlethreaded programs (that work)

      It is rumored that some supersentients developers can make multi-threaded programs (that work), but this is _really_ difficult.

      Cheers,

      --fred

    3. Re:"It's already in the Xeon" by jc42 · · Score: 4, Interesting

      > developers have to make their programs multithreaded. In the Windows world, this happens already, far less so in the Linux world.

      There's a good reason for this. The biggest problem with debugging multithreaded code is preventing the threads from shooting each other in the foot. On unix-like systems, there's a simple, elegant solution to this: processes. If you use independent processes with shared memory, you can limit the foot-shooting problems to only the shared segments, and the rest of the code is safe. You also have several kinds of inter-process communication that are easy to program and fairly failsafe.

      On Windows, you don't much have these things. Developers don't much take advantage of multiprogramming, because the inter-process communication tools are so complex. So the model is a single huge program that does everything. The natural development is toward an emacs-like system, in which everything is a module in one huge program. In such a model, it makes sense to want to use threads, so that some tasks can proceed when others are blocked.

      One way to get unix/linux developers to adopt threads is making it more difficult to use the basic unix multi-processing and IPC tools. If they can be made more complex than threads, then people will adopt the Windows model.

      Alternatively, the threads library could be made as easy to use as the older unix approach. But so far, there's little sign of this happening.

      Threads are a debugging nightmare, and a programmer who has lost months trying to debug a threadized program, and finding that the end result runs even slower than the original, is going to be shy to do it again.

      Also, calling the developers dummies isn't very persuasive. They mostly hear such insults as a euphemism for "It's too complicated for your simple mind." When I hear things like that as answers to my questions, I tend to agree with my critic, and revert to things that I can understand and get to work right.

      --
      Those who do study history are doomed to stand helplessly by while everyone else repeats it.
    4. Re:"It's already in the Xeon" by Anonymous Coward · · Score: 0

      > It is rumored that some supersentients developers can make multi-threaded programs (that work), but this is _really_ difficult.

      No. It is not. Well, not if you actually understand Computer Science.

      Would you suggest that it takes a supersentient contractor to lay a concrete slab?

      It's just part of the work. You can either do it, or you shouldn't be trying.

    5. Re:"It's already in the Xeon" by Wildcat+J · · Score: 2
      Um, HT doesn't require supersentient compilers, it requires mildly sentient developers.
      That's not entirely true, HT is not some sort of panacea. Simply making a program multithreaded doesn't guarantee that it will be faster on HT, or even SMP. There is the very real issue of resource contention, in which case the HT system can starve itself and run slower than a non-HT system.

      In general, I think that people learning to write multithreaded code is important. In a program where there are several disjoint tasks that can run in parallel, then multithreaded code can run faster. However, I disagree when people complain that it's a matter of lazy programmers, and that if they made everything multithreaded the world would be a better place. It's not so simple.

      -J

    6. Re:"It's already in the Xeon" by be-fan · · Score: 2

      True, I should have qualified this. I was mainly referring to GUI code, where there are a whole lot of disjoint tasks that could easily be multithreaded, but generally aren't on *NIX platforms.

      --
      A deep unwavering belief is a sure sign you're missing something...
    7. Re:"It's already in the Xeon" by Anonymous Coward · · Score: 2, Insightful

      > Well, not if you actually understand Computer Science.

      I disagree. I will not be able to get you to see the thing from my side, but let's state a few things:

      1/ I know computer science.
      2/ I wrote multi-threaded apps. Hell, I even wrote an IP stack and a B-Tree base transactional system. Those worked.
      3/ I painfully debugged other people's multi-threaded race conditions that only occur twice a day on a heavily loaded server.
      4/ I saw my co-workers writing bad multi-threaded code
      5/ I maintained code I wrote for years, so I know the cost of the mistakes and complex features
      6/ I now reject most designs that contain multiple threads.

      > You can either do it, or you shouldn't be trying.

      Most software is written by youngsters who did not realize that they should not be trying to write multi-threaded code. See the swing source code for instance. It is a pathetic mess of code made by (mostly) clueless coders who thought they were smart.

      I am not saying that I cannot write multi-threaded app, or that you cannot write one. I was arguing that the original poster, the Be Fan, was deadly wrong when he said that multi-threaded apps were easy, that "Namely, developers have to make their programs multithreaded."

      And that moderators that gave him that +5 never debugged production multi-threaded code.

      Cheers,

      --fred

    8. Re:"It's already in the Xeon" by Anonymous Coward · · Score: 0

      The student asks, "Is correct multi-threaded code so difficult that not even you can master it, Sensei?" The master replied, "That is irrelevant, for I am wise enough not to try." And the student and all who heard were enlightened.

      KISS

    9. Re:"It's already in the Xeon" by spitzak · · Score: 5, Insightful
      I would agree with the rest of the responders here that you have no idea what you are talking about.

      A correct multithreaded program is HARD!!!!! Anybody who thinks otherwise is an idiot. I have seen the results. All the systems I have seen are either broken or have so many locks in them that they may as well be single-threaded. Most Windows programmers use multithreading so that they can keep more state in local variables, which may be an ok goal but has nothing to do with speed. One of the biggest, buggiest programs here is a multi-threaded monstrosity written by a Windows programmer where there are 50 threads, ALL WAITING ON THE SAME SOCKET, and it crashes sporadically in the rare cases when two threads actually become alive at the same time. Every single rewrite to reduce the number of threads has greatly improved performance and reliability.

      I have no idea why you think GUI should be multi-threaded. GUI has no reason to be fast, computers are MUCH faster than humans, at least at drawing junk on the screen. In fact the best way to do it is pseudo-multithreading, such as the method windows uses (gasp! Fact alert: it is NOT multithreaded, only one "DispatchMessage" is running at a time!).

      I think perhaps you mean that the GUI should be running in a parallel thread with the calculations and there you have a point, however a lot of the problems are solved by deferred redraw, which the X toolkits do quite well (and in fact Windows is broken because it produces WM_PAINT events without knowing if the program has more processing to do).

      Now if there are intense calculations I grant that parallel threads are necessary, and I am working on such a program, but I must warn you that it is extremely difficult: the GUI cannot modify ANY structure being used by the parallel thread, instead it must kill the threads, wait for them to stop, modify the structure, and start them again. If in fact nothing changed you need to restart so the partially-completed answer from last time can be reused, this means you must write all the code you would for a single-threaded application, it does NOT save you anything. If you restart the complete parallel calculation you will get an unresponsive program if that parallel calculation takes more than a second or so. You could instead do a fancy test to see if your modifications will change the data before you kill the threads and commit them, but this often requires you to calculate the modifications twice, and the overhead of this may well kill the advantage of the parallel thread, and at least in my example this was far worse than reusing all the single-threaded restart code.
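
      The "kill the threads, wait for them to stop, modify the structure, and start them again" cycle can be sketched in Python as below. This is not the program described in the comment; BackgroundSum and its methods are hypothetical, the calculation is a trivial sum, and the sketch takes the simplest route of restarting the whole calculation from scratch, which is exactly the variant warned about above for long-running work.

```python
import threading


class BackgroundSum:
    """Sums self.data in a worker thread; data is edited only while the worker is stopped."""

    def __init__(self, data):
        self.data = list(data)
        self.result = None
        self._stop = threading.Event()
        self._thread = None

    def _run(self):
        total = 0
        for x in self.data:
            if self._stop.is_set():
                return  # abandoned: leave result unset
            total += x
        self.result = total

    def start(self):
        self._stop.clear()
        self.result = None
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def stop(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join()  # wait until the worker has really stopped

    def modify(self, index, value):
        self.stop()               # 1. kill the worker and wait for it
        self.data[index] = value  # 2. safe: no other thread is reading data now
        self.start()              # 3. restart the calculation from scratch

    def wait(self):
        self._thread.join()
        return self.result


if __name__ == "__main__":
    job = BackgroundSum([1, 2, 3])
    job.start()
    job.modify(0, 10)
    print(job.wait())
```

      Reusing partial results across restarts, as the comment describes, would need extra bookkeeping that this sketch leaves out.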

    10. Re:"It's already in the Xeon" by Anonymous Coward · · Score: 0

      Your post doesn't make any meaningful sense.

      You've basically said...

      "To prevent thread problems drop back to processes, unix'en can do that, win'doz can't very well."

      I've never heard such FUD. When you don't use threads, all you have are processes running independently talking IPC. Windows does that, Unix does that, every OS can do processes and IPC blindfolded. Hell, threads start out as processes!

      You said..."On Windows, you don't much have these things...". Oh, yea, highly technical aren't we? That's like saying, "Windows can't run processes...". Ok then click "Start->Run->Notepad.exe", that's a process!

      Yea, maybe I'd like to step backward into the stone age. Then again, I don't listen to fools.

      You've just stepped over the edge...have a nice trip.

    11. Re:"It's already in the Xeon" by Pieroxy · · Score: 1

      There's a good reason for this. The biggest problem with debugging multithreaded code is preventing the threads from shooting each other in the foot. On unix-like systems, there's a simple, elegant solution to this: processes.

      May I add that in both worlds (windows/unix) there's another elegant solution to this: Object oriented programming !

    12. Re:"It's already in the Xeon" by Anonymous Coward · · Score: 0

      It's just too bad that so many people claim to know CS, then make all kinds of "but, remember the children!" excuses.

      Either you know your skill, or you do not. To say threading is, in any way, bad is to not know when or how to use a potent and important tool.

      I have written multi-threaded code. Many millions of thread executions per second -- with as many as 15 programmers contributing. System simply could not be built without it.

      Too bad people keep hiring sub-standard professionals. Then, try to dumb down the entire industry, on the basis of "clueless coders that thought they were smart".

      Sounds like a race to the bottom to me.

    13. Re:"It's already in the Xeon" by Anonymous Coward · · Score: 5, Informative
      There's a good reason for this. The biggest problem with debugging multithreaded code is preventing the threads from shooting each other in the foot.

      Yes, you have to use mutexes and other synchronization primitives to serialize (or at least de-conflict) accesses to shared data. But, there's nothing that requires you to share data between threads. In fact, a significant percentage of the data in the average multi-threaded program is not shared. No matter whether you are building an application using multiple threads or multiple processes, you still have the freedom to use whatever mix of data sharing and message passing is appropriate for your application.

      On unix-like systems, there's a simple, elegant solution to this: processes. If you use independent processes with shared memory, you can limit the foot-shooting problems to only the shared segments, and the rest of the code is safe.

      Data shared by multiple processes needs exactly the same kind of protection as data shared by multiple threads. Except that using shared memory segments requires a lot of extra book keeping and the segments aren't cleaned up if a program terminates abnormally. And obviously, no matter whether you are using multiple threads or processes, the foot shooting is limited to the shared data only.

      You also have several kinds of inter-process communication that are easy to program and fairly failsafe.

      You can communicate between threads (or even between the same thread or process and itself) using named pipes if you want. Same goes for sockets. Using a multi-process model instead of a multi-threaded model doesn't give you access to any additional mechanisms. In fact, it's much easier to build useful communications mechanisms if you're working with threads.

      On Windows, you don't much have these things. Developers don't much take advantage of multiprogramming, because the inter-process communication tools are so complex. So the model is a single huge program that does everything. The natural development is toward an emacs-like system, in which everything is a module in one huge program. In such a model, it makes sense to want to use threads, so that some tasks can proceed when others are blocked.

      In Windows, you have basically the same tools. You may not know this, but the process & thread model in Windows is virtually the same as in most modern UNIX systems. The fact that old UNIX command line tools are small and oriented around using pipes for IPC is mainly a byproduct of history & convention, if that's what you're thinking of.

      One way to get unix/linux developers to adopt threads is making it more difficult to use the basic unix multi-processing and IPC tools. If they can be made more complex than threads, then people will adopt the Windows model.

      Alternatively, the threads library could be made as easy to use as the older unix approach. But so far, there's little sign of this happening.

      I would say that building applications with multiple threads is already easier than building applications with multiple processes. That has been my experience anyway.

      Threads are a debugging nightmare, and a programmer who has lost months trying to debug a threadized program, and finding that the end result runs even slower than the original, is going to be shy to do it again.

      On the contrary, debugging apps that consist of multiple processes is a nightmare. Debugging multi-threaded programs is much easier. For one thing, how many debuggers let you attach to & debug more than one process at a time in the same set of debugger windows (or at all)? Further, when you're debugging a program with multiple processes, if you signal or interrupt one process the others continue on (and vice-versa when you continue). This is rarely what you want. In general, the differences boil down to the fact that the OS & debugger coordinate & manage the execution of multiple threads within one application, while you have to do it manually if you have an application built with multiple processes. That means less work for the developer in terms of lines of code, less work in debugging, etc.

      Also, calling the developers dummies isn't very persuasive. They mostly hear such insults as a euphemism for "It's too complicated for your simple mind." When I hear things like that as answers to my questions, I tend to agree with my critic, and revert to things that I can understand and get to work right.

      The problem isn't so much that old school UNIX programmers are dumb. Mostly, they're either afraid of change or just too damn arrogant & obstinate to bother learning new technologies.

    14. Re:"It's already in the Xeon" by Anonymous Coward · · Score: 0
      There's a good reason for this. The biggest problem with debugging multithreaded code is preventing the threads from shooting each other in the foot. On unix-like systems, there's a simple, elegant solution to this: processes.

      May I add that in both worlds (windows/unix) there's another elegant solution to this: Object oriented programming!

      I should note that OOP doesn't solve this problem, it only gives you a logical place to put synchronization code/mutexes.
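
      To make the "logical place to put synchronization code" point concrete, here is a small Python sketch. The Counter class and the hammer/run helpers are invented for the example; the idea is only that the mutex lives inside the object, so callers cannot touch the shared value without going through it.

```python
import threading


class Counter:
    """Shared counter whose lock is private to the object."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self):
        with self._lock:  # the read-modify-write happens under the mutex
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


def hammer(counter, times):
    for _ in range(times):
        counter.increment()


def run(threads=4, times=1000):
    """Increment one Counter from several threads and return the final value."""
    counter = Counter()
    workers = [
        threading.Thread(target=hammer, args=(counter, times))
        for _ in range(threads)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.value


if __name__ == "__main__":
    print(run())
```

      The object does not remove the need for the lock; it just keeps the locking in one place instead of scattering it across every caller.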

    15. Re:"It's already in the Xeon" by jonabbey · · Score: 2

      Nah.

      Programming in threads is fine, so long as you have some ability to encapsulate your memory reliably. Doing multithreaded programming in Java is the default, and Java's strong object encapsulation and memory protection makes it quite reasonable to program with threads.

      You absolutely have to be aware of concurrency issues (deadlock, livelock, and all that), but it's not a terribly bad burden given that you gain so much in simplicity of memory management, integrated exceptions on null pointers, etc., etc.

    16. Re:"It's already in the Xeon" by Theatetus · · Score: 1

      Um, HT doesn't require supersentient compilers, it requires mildly sentient developers. Namely, developers have to make their programs multithreaded. In the Windows world, this happens already, far less so in the Linux world.

      I guess I should have been more clear in my original post. The multiple "threads" in a hyperthreaded processor set may or may not bear any relationship to the threads in a well-designed program using a threads library. Just because I spawn multiple threads safely in my language of choice doesn't mean that those threads will run as distinct threads on distinct processors.

      Threads spawned in a program are scheduled by the OS (or by the host/VM, depending on what language we're talking about; Java, after all, is "multi-threaded" in this sense), and no OS or VM that I know of guarantees that an arbitrary "thread" in a program will run as an independent thread on an SMP or HT processor set.

      That's why I mentioned the hypothetical "super-sentient" compiler. A compiler has to set up the "threads" so that the OS will schedule them as honest-to-God Threads for the hardware. Compilers' mileage will vary; that in itself is why I think we should be focusing more energy on better compilers and assemblers right now until they catch up with the capabilities of the processors out there.

      --
      All's true that is mistrusted
    17. Re:"It's already in the Xeon" by Pseudonym · · Score: 2
      Alternatively, the threads library could be made as easy to use as the older unix approach.

      Well there are a lot of very nice wrappers out there (e.g. boost::thread, QThread and so on), plus most serious applications have their own wrappers. This is the same reason, incidentally, why developers generally don't use open(), read(), write() and creat(): that's not the level you're meant to program at. You should use stdio, sfio or iostreams instead, unless you really need the finer control.

      One real reason, I think, why we don't see more threads under Linux is that Linux doesn't support POSIX threads. The POSIX threads model is processes which have threads. Linux, on the other hand, has processes which can share address spaces, file descriptors and so on, which is not the same thing. For example, fork()ing a process in one thread and waitpid()ing on it in another thread simply doesn't work under Linux. This sort of thing makes porting POSIX-compliant multithreaded applications to Linux difficult at best.

      Note: Before anyone accuses me of FUDing, note that I'm not passing value judgements here. Linux threads might well be better than POSIX threads. As a developer, however, when I #include <pthread.h>, I expect POSIX threads, and under Linux I don't get that. It's the lie that concerns me.

      --
      sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
    18. Re:"It's already in the Xeon" by Anonymous Coward · · Score: 3, Informative
      One real reason, I think, why we don't see more threads under Linux is that Linux doesn't support POSIX threads. The POSIX threads model is processes which have threads. Linux, on the other hand, has processes which can share address spaces, file descriptors and so on, which is not the same thing.

      Wrong. Linux threads are compliant with POSIX 1003.1c (and most of the common extensions). There is one exception, albeit a minor one - you can signal individual threads in Linux. The POSIX standard specifies nothing about how threads are to be mapped to processes.

      In Linux, the mapping between processes and threads is strictly one-to-one at the kernel level, although the use of thread groups makes it effectively one-to-many at the user process level. Other operating systems such as Solaris offer a many-to-many mapping with kernel light weight processes (LWPs), but it's again one-to-many at the user process level. Both implementations are about equally close to being POSIX compliant (Solaris threads aren't POSIX compliant because they don't support cancellation).

      For example, fork()ing a process in one thread and waitpid()ing on it in another thread simply doesn't work under Linux. This sort of thing makes porting POSIX-compliant multithreaded applications to Linux difficult at best.

      Not true again. In Linux 2.4, a parent process will wait on any child in the same thread group by default, unless you block SIGCHLD. In previous versions, it wasn't the default, but you could still do it. Besides, this doesn't have much to do with POSIX threads, because fork() and waitpid() aren't part of the pthreads API. fork() and waitpid() are process management functions. To create a new thread in POSIX, you use pthread_create() and to wait for one to exit you use pthread_join().
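
      A rough sketch of the split between the two APIs, in Python for brevity. This assumes a current POSIX system and says nothing about how Linux 2.4 itself behaves: os.fork() and os.waitpid() wrap the process calls, while threading.Thread's start() and join() play roughly the role of pthread_create() and pthread_join(). The function name and the exit status 7 are made up for the example.

```python
import os
import threading


def fork_in_one_thread_wait_in_another():
    """Fork a child in a helper thread, then reap it from the calling thread."""
    result = {}

    def forker():
        pid = os.fork()              # process management, not part of pthreads
        if pid == 0:
            os._exit(7)              # child: exit at once with a recognisable status
        result["pid"] = pid          # parent side: remember the child's pid

    helper = threading.Thread(target=forker)   # roughly pthread_create()
    helper.start()
    helper.join()                               # roughly pthread_join()

    # A different thread from the one that called fork() does the waiting.
    _, status = os.waitpid(result["pid"], 0)
    return os.WEXITSTATUS(status)
```

      On a system where threads of one process share their children, the call returns the child's exit status even though the fork and the wait happened in different threads.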

      Note: Before anyone accuses me of FUDing, note that I'm not passing value judgements here.

      Perhaps not, but you are passing bad info.

    19. Re:"It's already in the Xeon" by entrigant · · Score: 0, Flamebait

      A correct multithreaded program is HARD!!!!! Anybody who thinks otherwise is an idiot.

      That, or they are smarter than you.. which isn't too hard to believe.

    20. Re:"It's already in the Xeon" by zaqattack911 · · Score: 1

      Threads too complicated?? then don't use C!

      Try Threads in java... easy as hell.
      Threads in Python? no problem...

      yaddaa yaddaa yaddaa

    21. Re:"It's already in the Xeon" by Anonymous Coward · · Score: 0

      Increases to the intellectual level required to develop programs are generally a bad thing.

      I'm also going to have to agree with the grandparent mostly.

    22. Re:"It's already in the Xeon" by zonker · · Score: 0

      i dig being able to afford a chip, which is something i can't have with most xeons.

    23. Re:"It's already in the Xeon" by juggy · · Score: 2, Informative
      I don't think you know of the newer approaches to threading technology. For starters, you can check out the scheme48 site. They implemented threads not by using locks but by using logging facilities, that is to say, journals. I just spent 6 months working with this way of doing things, and I can assure you of this:
      1. you cannot get deadlocks
      2. you can hardly produce livelocks
      3. it is much faster than using shared memory
      4. the main system always has access to the memory, no need to unlock/lock/...

      There are a LOT of good reasons to use this sort of multi-threading, especially since - if correctly implemented - it requires much less memory, cpu and debugging efforts than processes or the old sort of threading model.
    24. Re:"It's already in the Xeon" by juggy · · Score: 2, Insightful

      I have no idea why you think GUI should be multi-threaded. GUI has no reason to be fast

      Yes, it does: If you multithread it, you can e.g. show debug output, update controls and enable the user to still use the GUI. In many unix-apps your gui sort of freezes while the processes/threads are running in the background (doxygen had/still has this problem, if I remember correctly).

      the GUI cannot modify ANY structure being used by the parallel thread, instead it must kill the threads, wait for them to stop, modify the structure, and start them again

      This is not correct. It only happens when you don't know how to correctly implement a threading model, e.g. if you use journal-based threads instead of log-based you won't have any of those problems whatsoever. For example, the folks using Scheme48 implemented this, and it made a lot of their problems just vanish.

    25. Re:"It's already in the Xeon" by Anonymous Coward · · Score: 0

      "Nothing that requires you to share data between threads"? So, now you've just reinvented processes, using threads instead, but without the memory protection that processes give you. Try using the right tool for the job: When you don't need to share data, or the amount of data shared is low, processes, possibly with shared memory. When you just want to share everything: Threads.

      Yes, data shared between processes needs the same protection. But when you have to think about what data should be shared, and put that in shared memory, you know where to put your locks. With processes, you don't accidentally share data.

      The problem with using processes under windows is (according to windows developers) that processes are (or were) much slower. If this has been fixed, and windows processes are just as good as *nix processes, then why do windows programmers still insist on using threads for things that should not use threads? Could it be that windows programmers are afraid of change?

      IMHO less than 10 percent of software should be using threads. For the rest, using threads is just a way to introduce more places where locking is needed, i.e. more places to introduce bugs by forgetting locks. And, as we all know, humans are not perfect. So, more places to introduce bugs, means more bugs.

      I don't believe it is a coincidence, that most windows programs use threads, and most windows programs are unstable.

    26. Re:"It's already in the Xeon" by Anonymous Coward · · Score: 0

      No, threads are broken by design.

    27. Re:"It's already in the Xeon" by joib · · Score: 2

      For those who have been following the Linux kernel development, there are two implementations trying to replace the current threads implementation, the NPTL (native posix thread library) by Ingo Molnar and Ulrich Drepper, and the NGPT (next generation posix threads) by IBM. The NPTL is a 1-1 implementation while NGPT is m-n. Based on some benchmarks it looks like NPTL is a lot faster and is being included in the 2.5 kernels (and glibc 2.3). Also, Solaris 9 has moved to a 1-1 thread implementation, so while a m-n thread implementation perhaps has some theoretical advantages it seems that it is so complex and has so much overhead that 1-1 is preferred.

      And, as the AC above already said, the POSIX thread standard leaves the choice of 1-1 or m-n up to the implementor. Which is logical, since it doesn't change the semantics of the program using pthreads.

    28. Re:"It's already in the Xeon" by joib · · Score: 3, Interesting

      Umm, could you elaborate on this? Do you mean some kind of COW (copy-on-write) scheme (i.e. MVCC in database terminology)? That is, if you write to a locked resource, a new copy of the resource is created, and when the writing is completed and no one is reading the original resource the new one is copied over the old one? I think someone was experimenting with this for the Linux kernel; they came to the conclusion that for data structures which are mostly read-only this is faster than the traditional locking approach, but if you need to write a lot, this is slower because of the overhead of copying.
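
      For what it's worth, the copy-on-write idea being asked about can be sketched like this in Python. It is only an illustration of the general technique, not what Scheme48 or the Linux kernel actually does; CowBox is a made-up name, and the sketch leans on the assumption that rebinding a reference is atomic in CPython.

```python
import threading


class CowBox:
    """Hypothetical copy-on-write container: readers never take a lock."""

    def __init__(self, data):
        self._data = dict(data)              # treated as immutable once published
        self._write_lock = threading.Lock()  # writers still serialise among themselves

    def read(self):
        # Readers grab the current reference and keep a consistent snapshot,
        # even if a writer publishes a newer version afterwards.
        return self._data

    def update(self, key, value):
        with self._write_lock:
            new = dict(self._data)           # the copying overhead mentioned above
            new[key] = value
            self._data = new                 # publish with a single reference swap
```

      Reads cost nothing extra, while every write pays for a full copy, which matches the conclusion that this only wins for mostly-read-only structures.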

    29. Re:"It's already in the Xeon" by joto · · Score: 2
      Debugging multi-threaded programs is much easier. For one thing, how many debuggers let you attach to & debug more than one process at a time in the same set of debugger windows (or at all)?

      Now, why on earth would you do that? The reason for putting things in separate processes is to have them separated. There is no reason to treat two entirely separate things as one. How many word processors do you see that allow you to edit two documents in the same window/pane?

    30. Re:"It's already in the Xeon" by Anonymous Coward · · Score: 0
      Now, why on earth would you do that? The reason for putting things in separate processes is to have them separated. There is no reason to treat two entirely separate things as one. How many word processors do you see that allow you to edit two documents in the same window/pane?

      Are you seriously unable to think of any reason why a program might need more than one thread of execution? There are lots of reasons to use either multiple threads or multiple processes. The number one reason to use multiple threads (or processes) is to isolate operations that block from the parts of the program that need to remain interactive. For example, the first web browsers were single threaded and would block waiting for a DNS query to complete. While they were blocked (which was anywhere from a fraction of a second to a minute or more), the UI was frozen. A lot of other non-threaded networking apps exhibit similar behaviour. Another reason to use more than one thread is if a part of your program has to sit on a socket waiting for packets or connection requests - you couldn't write a useful web server without multiple threads or processes. Finally, a lot of applications involve calculations that take a long time and it's usually desirable to let them run in the background rather than freezing up the whole UI while the calculation completes. I could go on, but I hope you get the point by now.

    31. Re:"It's already in the Xeon" by KoopaTroopa · · Score: 1

      In Linux 2.4, a ...

      Dude, get with the times. Linux 8.0 was just released!

      (kidding)

      --
      Sharpies don't just sniff themselves.
    32. Re:"It's already in the Xeon" by Anonymous Coward · · Score: 0

      In Windows, you have basically the same tools. You may not know this, but the process & thread model in Windows is virtually the same as in most modern UNIX systems. The fact that old UNIX command line tools are small and oriented around using pipes for IPC is mainly a byproduct of history & convention, if that's what you're thinking of.

      "old" "history" "convention": You don't think too highly of unix utilities, do you? These old, decrepeit utilities are in fact based around a complete OS design philosophy that has escaped many other OSes: A data driven operating system gives you more orthogonality that any other known paradigm. You say archaic? I say mature. At least if you're trying to build a general-purpose operating system.

    33. Re:"It's already in the Xeon" by Anonymous Coward · · Score: 0
      "old" "history" "convention": You don't think too highly of unix utilities, do you? These old, decrepeit utilities are in fact based around a complete OS design philosophy that has escaped many other OSes: A data driven operating system gives you more orthogonality that any other known paradigm. You say archaic? I say mature. At least if you're trying to build a general-purpose operating system.

      Actually, I didn't mean to convey that the design philosophy behind all the UNIX CLI tools was bad (I quite like it). I was just pointing out that this philosophy had nothing to do with the threads vs. processes debate. It is old enough to predate threads.

    34. Re:"It's already in the Xeon" by Anonymous Coward · · Score: 0
      "Nothing that requires you to share data between threads"? So, now you've just reinvented processes, using threads instead, but without the memory protection that processes give you. Try using the right tool for the job: When you don't need to share data, or the amount of data shared is low, processes, possibly with shared memory. When you just want to share everything: Threads.

      Don't give me this "right tool for the job" nonsense argument. Threads are the right tool for the job in most circumstances. Even if you are sharing absolutely nothing, using multiple threads instead of multiple processes is usually preferable for several reasons:

      - Threads have less overhead than processes on basically every OS other than Linux (where they are one and the same). They generally require less memory overhead, creation & termination is generally faster, and context switching is faster.

      - The API for managing threads is much richer than the API for managing processes.

      - There are more options available for communicating between threads than processes.

      - A multi-threaded app is easier to debug than a multi-process app.

      - The kernel scheduler & VM system work better.

      Yes, data shared between processes needs the same protection. But when you have to think about what data should be shared, and put that in shared memory, you know where to put your locks. With processes, you don't accidentally share data.

      I suppose that unexpected sharing is one potential disadvantage of using threads, but to be honest it's not hard at all to keep threads from accidentally sharing data. No matter whether you design a program to use multiple processes or multiple threads, you have to decide what's going to be shared and what isn't. For the data that you decide to share, you need exactly the same protection mechanisms no matter whether you are using threads or processes. For the data that you decide not to share, it's easy to write a multi-threaded program that doesn't share it.

      I've known a lot of people who initially didn't want to touch threads because of an irrational fear of shared data, but they've all changed their tune after working with threads for a while. Like I said earlier, there's nothing that requires you to share data between threads if you don't want to, and the pitfalls of working with shared data are just as much an issue if you're using processes.

      The problem with using processes under windows is (according to windows developers) that processes are (or were) much slower. If this has been fixed, and windows processes are just as good as *nix processes, then why do windows programmers still insist on using threads for things that should not use threads? Could it be that windows programmers are afraid of change?

      Process overhead (particularly creation) on Windows is relatively high compared to threads, but that's true on most UNIX operating systems too. Under Linux, process & thread overhead are identical because processes & threads are identical. But Linux is very unique in this respect. Windows programmers use threads because they're a superior solution for most applications, for all the reasons I've given and more. Besides, multi-threaded programming is extremely common on UNIX too. Most UNIX programmers have already moved on from building applications with multiple processes to using multiple threads. UNIX programmers who think like you do are in the minority these days, just like the people who refuse to write in any other language than C.

    35. Re:"It's already in the Xeon" by joto · · Score: 2
      Are you seriously unable to think of any reason why a program might need more than one thread of execution?

      No, and that isn't what I said. I would admit that I really didn't say much, but if I had said something I would have said that in the majority of cases, processes are more manageable and easier to debug than threads.

      The number one reason to use multiple threads (or processes) is to isolate operations that block from the parts of the program that need to remain interactive.

      Which in many cases can also be done with separate processes, select() or poll().

      For example, the first web browsers were single threaded and would block waiting for a DNS query to complete.

      Something that can easily be done as a separate process. Since the browser would have to listen on multiple file-descriptors anyway, in order to listen to both network and GUI events, this would have minimal overhead on the design of the webbrowser. And the increased modularity would help you in both designing and debugging both applications.

      Another reason to use more than one thread is if a part of your program has to sit on a socket waiting for packets or connection requests - you couldn't write a useful web server without multiple threads or processes.

      There exist several useful single-threaded webservers that select() or poll() instead of using threads. And there exist many that use processes instead of threads.
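
      A minimal sketch of the select() approach, in Python. The helper names are invented for this example, and socket.socketpair() stands in for real client connections so the snippet is self-contained. One thread serves several connections simply by asking the OS which sockets are readable.

```python
import select
import socket


def serve_each_once(server_ends, timeout=1.0):
    """Answer every readable socket once, in a single thread, using select()."""
    served = 0
    pending = list(server_ends)
    while pending:
        readable, _, _ = select.select(pending, [], [], timeout)
        if not readable:
            break                        # nothing more arrived within the timeout
        for sock in readable:
            data = sock.recv(1024)
            sock.sendall(data.upper())   # trivial stand-in for a real response
            pending.remove(sock)
            served += 1
    return served


def demo():
    """Three pretend clients, one serving thread, no extra threads or processes."""
    pairs = [socket.socketpair() for _ in range(3)]
    try:
        for i, (client, _) in enumerate(pairs):
            client.sendall(b"request %d" % i)
        served = serve_each_once([server for _, server in pairs])
        replies = [client.recv(1024) for client, _ in pairs]
        return served, replies
    finally:
        for client, server in pairs:
            client.close()
            server.close()
```

      A real server would keep looping and handle partial reads and writes; the point here is only that waiting on many descriptors does not by itself require threads.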

      Finally, a lot of applications involve calculations that take a long time and it's usually desirable to let them run in the background rather than freezing up the whole UI while the calculation completes.

      And most of those background calculations could just as well be put into a stupid background process, leaving you with a simple, dumb GUI to control them, and simple-to-write, easily replaceable background processes that can be developed and debugged independently.

      I could go on, but I hope you get the point by now.

      And so could I, but I doubt you get the point anyway.

    36. Re:"It's already in the Xeon" by spitzak · · Score: 2
      You are talking about multithreading the calculations along with a SINGLE thread that does the GUI, which can make sense. However it seems the original poster kept saying "multithreaded GUI" which implies to me some scheme like having each widget have its own thread. This serves no purpose except Microsoft uses it so that COM objects can be closed-source.

      I have no idea what you mean by "journal" or "log" based and would like an explanation.

      I am working from the assumption that threads (as opposed to processes) are taking advantage of shared memory. It seems to me that if I have a big calculation that depends on some data structure that the GUI can modify, I have to stop the calculation before I can modify the structure, and I have to inform the parallel thread that the structure has changed (in my case I decided to tell it to restart the calculation from the start, which I was calling "killing the thread" although in fact I really set a flag that makes the threads throw an exception, they then go to code that waits for the GUI to unlock them so that they can start the calculation over again, I am not really creating & destroying threads).
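
      The restart scheme described above might look roughly like this in Python. It is a sketch under assumptions, not the actual program: a single worker instead of several, a list of numbers standing in for the big data structure, and made-up names (Calculation, Restart, modify). The GUI side sets a flag, the worker raises an exception when it notices, parks until it is unlocked, and then starts over.

```python
import threading
import time


class Restart(Exception):
    """Raised inside the worker when the GUI wants the calculation restarted."""


class Calculation:
    def __init__(self):
        self.params = []                  # structure shared by GUI and worker
        self.result = None
        self.restarts = 0
        self._stopping = False
        self.abort = threading.Event()    # GUI -> worker: drop what you are doing
        self.idle = threading.Event()     # worker -> GUI: parked, structure is yours
        self.go = threading.Event()       # GUI -> worker: unlocked, start over
        self.started = threading.Event()  # worker has begun a pass (used by demo)
        self.done = threading.Event()     # worker finished a full pass
        self.thread = threading.Thread(target=self._worker)
        self.thread.start()

    def _worker(self):
        while True:
            self.idle.set()               # park here until the GUI unlocks us
            self.go.wait()
            self.go.clear()
            if self._stopping:
                return
            try:
                total = 0
                for x in self.params:
                    if self.abort.is_set():
                        raise Restart()
                    total += x
                    self.started.set()
                    time.sleep(0.001)     # stand-in for expensive work
                self.result = total
                self.done.set()
            except Restart:
                self.restarts += 1        # fall through and park again

    def _park_worker(self):
        self.abort.set()
        self.idle.wait()                  # worker is now blocked waiting on self.go
        self.abort.clear()

    def _release_worker(self):
        self.idle.clear()
        self.go.set()

    def modify(self, new_params):
        """Called from the GUI thread: stop, change the structure, restart."""
        self._park_worker()
        self.started.clear()
        self.done.clear()
        self.params = list(new_params)
        self._release_worker()

    def stop(self):
        self._park_worker()
        self._stopping = True
        self._release_worker()
        self.thread.join()


def demo():
    calc = Calculation()
    calc.modify(range(10000))             # kick off a long-running pass
    calc.started.wait()
    calc.modify([1, 2, 3])                # user twiddles a control mid-calculation
    calc.done.wait()
    outcome = (calc.result, calc.restarts)
    calc.stop()
    return outcome
```

      Note that the thread itself is never destroyed; "killing" it only means making it abandon the current pass, which matches the description above.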

      Notice that in my case the threads are doing a calculation that can take several minutes, although they produce some results immediately (a portion of an image) so the user can tell if they want to twiddle the controls some more.

      It is possible that I am confused because you are considering calculations that are done in a fraction of a second, such as parallel updating of a complex OpenGL scene. In such cases it may make sense for the GUI to wait for the previous calculation to finish before updating the structures. This should produce good results as long as it is trivial to exclude the majority of GUI events as not modifying the data structures.

      The other possibility is that you are assuming it is inexpensive to build a new data structure while keeping the old one in memory, destroying it only after the new one is built and the threads running the old one have either exited or are killed. This may be true but is certainly not in my case, where a huge savings is had by the reuse of cached information in the previous data structure.

      I don't know what "journal" vs "log" are, but they both sound like communication pipes. Unix has had pipes since 1972 so I don't think they are a recent innovation and in fact all possible uses for them have long been explored. Basically if you are avoiding use of shared memory then you are not using threads.

    37. Re:"It's already in the Xeon" by roofingfelt · · Score: 1
      One of biggest buggiest programs here is a multh-threaded monstrosity written by a Windows program

      Now that's just asking for trouble - nobody should let those things write code.

    38. Re:"It's already in the Xeon" by spitzak · · Score: 2
      No actually I seriously respect his ability to get this thing to work at all. He also made it portable between NT and Linux (unfortunately it would not port to Win2K so we have to use the Linux version, but he was quite familiar with the abilities of both systems and the Linux version actually is slightly more reliable).

      But this program convinced me that there is a mindset that says "more threads is better" and that this mindset is wrong. It is producing bloated slow libraries with locks around every single function (like putc() in the pthreads standard, thus killing the speed of the basic K&R design!). It also seems to be used 90% of the time for programming convenience so that state can be kept in local variables, something that could be solved by cooperative multitasking inside the program with some library support, rather than using real parallel multiprocessor threads.
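
      As an illustration of keeping state in local variables with cooperative multitasking, here is a small Python sketch using generators. The task and scheduler names are made up, and generators are just one way to get this kind of language or library support. Each task keeps its loop counter in an ordinary local and hands control back explicitly, so no locks and no real threads are involved.

```python
from collections import deque


def countdown(name, n, log):
    # The loop state (n) lives in an ordinary local variable across yields.
    while n > 0:
        log.append((name, n))
        n -= 1
        yield                       # hand control back to the scheduler


def run_round_robin(tasks):
    """Run generator-based tasks in turn until all of them have finished."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)              # run the task up to its next yield
        except StopIteration:
            continue                # task finished; drop it
        ready.append(task)
```

      Because a task can only be interrupted at a yield, nothing it touches between yields needs protection from the other tasks.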

    39. Re:"It's already in the Xeon" by juggy · · Score: 1

      You are talking about multithreading the calculations along with a SINGLE thread that does the GUI, which can make sense. However it seems the original poster kept saying "multithreaded GUI" which implies to me some scheme like having each widget have its own thread. This serves no purpose except Microsoft uses it so that COM objects can be closed-source.

      I agree with the original poster, multithreading gui makes sense because if each widget has its own thread you don't need to have one giant message-transpose-loop which has to do the work for all of them.

      The other possibility is that you are assuming it is inexpensive to build a new data structure while keeping the old one in memory

      Hmm, I don't really think I understand what you mean by "building a new data structure...". New data? A completely structurally different data component? I was referring to the possibility of changing data without it harming the other thread - unless the other thread depends on certain values stored in it, of course.

      I don't know what "journal" vs "log" are, but they both sound like communication pipes

      I am sorry, typo on my part :-). I meant to say "journal" vs. "lock". Journal means that you enter atomic operations to do in a journal and update the structures when necessary, that is you flush the journal. It works pretty much like journaling filesystems. And, I did not mean to say pipes - that is something completely different, and this way of doing things is at best one year old, that I know for sure.

    40. Re:"It's already in the Xeon" by juggy · · Score: 1
      Actually it works like this:
      1. announce the atomic operation you would like to do (read/write/...)
      2. your app has to wait now
      3. The journal now contains the ops you want to do. It checks whether they have to be carried out (e.g. you wanted to update a structure which another thread tries to read from now) now or can be postponed (nothing else depends on it).
      4. control is returned to your app
      As far as I know, this is much faster than the traditional way of locking everything, and much easier to keep consistent. You mentioned the linux-kernel: Usually systems that were originally designed to work on a single processor gain multiprocessor-functionality by locking the kernel for each processor sequentially. However, this is very inefficient (and I suppose you understand why ;-)).
      If it uses journals though, there is no need to lock all of the kernel; the journal will take care of keeping your data structures consistent and - I am sure about this - you will receive a considerable speed-up.
      As for the overhead of copying: It does not exist, since you don't need to maintain an old datastructure, copy it, work on the new one and then erase the old one. Instead, since the journal carries out only atomic operations and only the journal has direct access to those structures, there can never be any interference from other threads, and thus the structures are kept consistent.
      Hmm - I am late with this reply - will anyone read this, I wonder? ;-)
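
      One possible reading of the steps above, sketched in Python. This is an interpretation for illustration only, not the Scheme48 implementation; the Journal class and its methods are made-up names. A single journal thread owns the data, other threads announce operations by queueing them, writes are postponed until the journal gets to them, and a read waits until its entry has been reached. Note that queue.Queue takes a lock internally, so the hand-off itself is still synchronised.

```python
import queue
import threading


class Journal:
    """Hypothetical journal: one thread owns the data and applies queued operations."""

    _STOP = object()

    def __init__(self):
        self._data = {}                   # touched only by the journal thread
        self._ops = queue.Queue()         # note: queue.Queue locks internally
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def _run(self):
        while True:
            op = self._ops.get()
            if op is self._STOP:
                return
            func, reply = op
            result = func(self._data)     # operations are applied one at a time, in order
            if reply is not None:
                reply.put(result)

    def write(self, key, value):
        # The caller announces the operation and gets control back immediately.
        self._ops.put((lambda d: d.__setitem__(key, value), None))

    def read(self, key):
        # The caller waits until the journal reaches this entry, so every
        # write announced earlier has been applied by then.
        reply = queue.Queue(maxsize=1)
        self._ops.put((lambda d: d.get(key), reply))
        return reply.get()

    def close(self):
        self._ops.put(self._STOP)
        self._thread.join()
```

      Application threads never touch the dictionary directly, which is what keeps it consistent; the cost is that every operation has to be packaged up and passed through the queue.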
    41. Re:"It's already in the Xeon" by spitzak · · Score: 2
      multithreading gui makes sense because if each widget has its own thread you don't need to have one giant message-transpose-loop which has to do the work for all of them

      This is what I meant by "storing state in local variables". This may be a useful goal but I think it requires compiler/language support and certainly there is no reason to use anything other than lightweight threads for this, it can be totally cooperative. In my experience with this (I have tried to write systems that worked this way by using setjmp hacks) there are serious problems with the creation and destruction or visibility changes of widgets, and focus navigation cannot be handled locally by widgets. It also tends not to work well for minimal update, in fact many of the Unix programs you complain about have this exact problem in that they were written so one "important" widget (the main display) was managed using local state and incremental update. So in the end I am unconvinced that this is a good idea. However it could be a very good idea with language support.

      By "building new data structure" I mean modifying/creating whatever it is that the parallel calculation thread is reading. You cannot change it without a lock. Obviously you can change the parts the other thread does not care about, but by definition they have nothing to do with the calculation!

      I would guess that my implementation is pretty much a "journal". The GUI thread updates structures that are not used by the calculation thread. When it thinks something has changed it kills the calculations threads (there are 4 or more of them), waits for them to go idle, and then copies all the changes from the GUI structure to the actual structure, at that time comparing them to see if there really was a change. This is not "really" a journal because the actual changes are not recorded, but I do make a list of what objects have been modified so it does not have to search them all. If there really was a change it also instructs the calculation to destroy cached data so it starts over at the start. I can't believe my situation is unusual: the data is enormous and the calculation takes several minutes and it is extremely difficult to be certain whether the user's modifications have actually changed anything without looking at the current calculation results.

      In any case I cannot see any way a journal can avoid locks. Even if the parallel threads read from the journal you have to lock the journal while this is happening.

    42. Re:"It's already in the Xeon" by juggy · · Score: 1

      Though you do bring up a lot of good points, I would like to point out, that most of your problems can be avoided, mostly by clean design.
      I cannot tell all the details because it's too far back now and I wasn't the one who did all the hard work, but maybe you should check out this link, they implemented GUI and threads and all the stuff I was talking about, so it *can* be done. If you have trouble with German (which I assume most on this site have), this is the link where you can download it directly.
      One final note on your "In any case I cannot see any way a journal can avoid locks": To a certain extent this probably depends on how you define a lock. The traditional locking system I was talking about differs from the journal approach in the respect that you don't need locks because you pass the atomic operations to the journal-thread which deals with putting it in the right order. Since only this journal-thread is allowed to update/read data nothing else can interfere and the data stays consistent.

    43. Re:"It's already in the Xeon" by spitzak · · Score: 2
      I meant that a "journal" has to be locked when you insert or retrieve something from it.

      This can reduce the work provided that the time needed to retrieve or insert is tiny compared to the time needed to make the changes. It also requires that the code needed to turn a GUI action into a "message" and to turn a "message" into modifications of the data structure is about the same order of magnitude as the code needed to go from the GUI to the data structure directly. Not counting this overhead has been a problem with many message passing systems: in effect the GUI thread is "blocked" for the time it takes to calculate and insert a message, and the calculation thread is "blocked" for the time it takes to interpret a message, and even though they are not synchronized this overhead can add up to more time.

      In fact I think this overhead, and the difficulty of programming message-passing, is why there was such a push for multithreaded applications. I think now we are seeing the backlash as people realize that multithreaded is not the end-all solution they thought it was and are trying to find the correct middle ground between parallel processes and mt.

      The only way I see to reduce locks is to "batch" journal entries into a block, and allow them to be inserted and retrieved as blocks. Oddly enough though the more work done on this the more it looks like Unix pipes and stream i/o.
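      A minimal sketch of that batching idea, in Python. The `Journal` class, its `insert_block`/`retrieve_block` methods, the `acquisitions` counter and the block size of 10 are all hypothetical names and values chosen for illustration; nothing here comes from the implementations discussed in this thread. It only shows that moving entries in blocks cuts the number of lock acquisitions from one per entry to one per block.

```python
import threading


class Journal:
    """Hypothetical journal whose entries are inserted and retrieved as blocks."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = []
        self.acquisitions = 0  # illustration only: counts lock acquisitions

    def insert_block(self, ops):
        # One lock acquisition covers the whole block of operations.
        with self._lock:
            self.acquisitions += 1
            self._entries.extend(ops)

    def retrieve_block(self):
        # Swap the pending list out under a single lock acquisition.
        with self._lock:
            self.acquisitions += 1
            block, self._entries = self._entries, []
        return block


def producer(journal, name, count, block_size):
    """Stand-in for a GUI thread: collects operations locally, then inserts a block."""
    pending = []
    for i in range(count):
        pending.append((f"{name}-{i}", i))
        if len(pending) == block_size:
            journal.insert_block(pending)
            pending = []
    if pending:
        journal.insert_block(pending)


journal = Journal()
threads = [
    threading.Thread(target=producer, args=(journal, f"gui{n}", 100, 10))
    for n in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Stand-in for the journal thread: the only code that touches the shared state.
state = {}
for key, value in journal.retrieve_block():
    state[key] = value

print(len(state), journal.acquisitions)
```

      With 400 operations in blocks of 10, the lock is taken 41 times instead of 401. The producers still pay for building each block, which is the per-message overhead described above.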

    44. Re:"It's already in the Xeon" by juggy · · Score: 1

      I meant that a "journal" has to be locked when you insert or retrieve something from it.
      You're right on that point, and I guess I didn't make myself clear enough with the previous posts (though I *did* post something about it somewhere). I will comment on it after your "batch" entries:
      The only way I see to reduce locks is to "batch" journals entries into a block
      This was exactly what I was referring to when I wrote about "the journal checks whether update is necessary". By setting a limit like, let's say, 5 atomic operations, you can still have all the advantages and yet have consistent and clean data access. Besides, I don't feel that you generate much overhead with those multithreaded models I talked about; e.g. the Scheme48 implementation I was talking about uses less than 100 bytes per thread (so I was told), and I had no trouble holding more than 30000 (thirty thousand) threads in a test case on my system without it noticeably suffering.
      Also, the GUI system (ToyWindow) implementation referenced by the link I supplied shows how this sort of threading can be used to create lightweight multithreaded GUIs which do *not* interfere with the calculation.
      You don't have to accept everything I write; it's just that I wrote some apps this way and it provided lots of advantages. It worked well for me and I can understand the theory behind it, therefore I like it. But you do have to relearn everything about threading from scratch.

  5. Hyperthreading on Windows by kawika · · Score: 5, Informative

    If you plan to use any of these features effectively on Windows you'll need to upgrade to Windows .NET Server. Windows 2000 can't distinguish between virtual and physical processors, so if the BIOS doesn't set up a two (real) CPU system the right way it will end up ignoring the second physical processor. My source:

    www.microsoft.com/windows2000/docs/hyperthreading.doc

    1. Re:Hyperthreading on Windows by dzym · · Score: 5, Funny

      I've also heard that a virtual processor requires its own CPU license, at least in Win2K.

    2. Re:Hyperthreading on Windows by Tassleman · · Score: 2, Informative

      From the document:
      When examining the processor count provided by the BIOS, Windows .NET Server distinguishes between logical and physical processors, regardless of how they are counted by the BIOS. This provides a powerful advantage over Windows 2000, in that Windows .NET Server only treats physical processors as counting against the license limit. For example, if you launch Windows .NET Standard Server (2-CPU limit) on a two-way system enabled with Hyper-Threading Technology, Windows will use all four logical processors, as shown in Figure 4.
      [DIAGRAM 4]
      This example illustrates the great benefit provided by Windows .NET Server on systems enabled with Hyper-Threading Technology--customers are able to harness the processing power of four logical processors using a 2-CPU license.


      Well that's unsurprisingly lame on Microsoft's part. Basically that document says "we're too lazy to update Windows 2000 to PROPERLY recognize SMT-enabled processors, and will screw you on licensing unless you upgrade to .NET Server"

    3. Re:Hyperthreading on Windows by nick-less · · Score: 1


      Well that's unsurprisingly lame on Microsoft's part.

      and exactly the same on any older linux kernel - you can't support what you don't know - prior to ev8 or p4 most kernel hackers had never thought about "logical" processors. I'm pretty sure they could release a service pack to support hyperthreading on w2k, but they love to make money ;-)

    4. Re:Hyperthreading on Windows by Fjord · · Score: 2

      1) backporting (or if you prefer a windows term, service packs)

      2) the lame part is that you have to pay to not have to pay for your logical processors. linux doesn't have these licensing issues.

      --
      -no broken link
    5. Re:Hyperthreading on Windows by riiv · · Score: 4, Informative

      Not Quite.
      From hyperthreading.doc "Windows 2000 Server does not distinguish between physical and logical processors on systems enabled with Hyper-Threading Technology"

      Basically for 2000 family you need 2x your CPU-license limit; each virtual processor counts as a physical one.

      So .NET or newer is probably required, depending on your hardware requirements. The 2000 kernel will probably not be rewritten.

      --
      Unix is a standard, DOS is a standard, windows XX is not.
    6. Re:Hyperthreading on Windows by Billly+Gates · · Score: 2

      Typical Microsoft.

      This reminds me of what Microsoft did with DirectX under NT4. I had a copy of directx7 or directx8 for windows2000 beta2 and it worked fine under NT4. I finally could play other games besides quakeIII. I had a hard disk crash and had to re-install. Guess what? Microsoft updated the code to not install on NT4 by default and they removed the old directx package! They did this to sell more copies of Windows2000. Very sleazy.

      It worked. I then paid $300 for win2000 as more and more games used directx rather than opengl. My guess is that only a single line of code was used to force users to upgrade. Same is true with the code in Windows3.1 to make sure only ms-dos was used in conjunction with it. Add a single line of code here and there and watch consumers open their wallets.

      Linux at least does not have this problem.

    7. Re:Hyperthreading on Windows by StonyUK · · Score: 5, Informative

      This is partial FUD - the document says that IF your BIOS counts processors the way Intel tell BIOS manufacturers they should, then your 4-CPU licence of 2000 server will utilize the 1st logical CPU of each physical CPU.

      However, it won't go on to use the extra 2nd logical CPU in each physical CPU because you've used up all your licences by then (2000 server only gives you a 4 CPU licence).

      If your BIOS doesn't enumerate CPUs the way Intel says they should, then 2000 will use both logical CPUs on the 1st and 2nd physical CPUs, and presumably leave your other two physical CPUs idle.

      In .NET, it appears that Microsoft have not only taught it how to count CPUs properly regardless of potential BIOS problems, but also decided that only physical CPUs count towards licencing (well DUH!) and so with a 4 CPU hyperthreaded system, all 8 of your logical CPUs will be used.
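      The three cases above can be sketched as a toy model in Python. This is only a restatement of the description in this comment, not actual Windows or BIOS behaviour; the function names, the `(physical, logical)` tuples and the two enumeration orders are assumptions made for illustration.

```python
def enumerate_cpus(physical, intel_order):
    """Return (physical_id, logical_id) pairs in the order a BIOS might report them.

    intel_order=True  : first logical CPU of every package, then the second ones.
    intel_order=False : both logical CPUs of package 0, then package 1, and so on.
    """
    if intel_order:
        return [(p, l) for l in range(2) for p in range(physical)]
    return [(p, l) for p in range(physical) for l in range(2)]


def win2000_usable(cpus, license_limit):
    # Modelled as: every reported CPU counts against the limit, in BIOS order.
    return cpus[:license_limit]


def dotnet_usable(cpus, license_limit):
    # Modelled as: only physical packages count against the limit.
    allowed = sorted({p for p, _ in cpus})[:license_limit]
    return [c for c in cpus if c[0] in allowed]


good_bios = enumerate_cpus(4, intel_order=True)
bad_bios = enumerate_cpus(4, intel_order=False)

print(win2000_usable(good_bios, 4))  # one logical CPU on each of 4 packages
print(win2000_usable(bad_bios, 4))   # both logical CPUs on packages 0 and 1 only
print(len(dotnet_usable(bad_bios, 4)))  # all 8 logical CPUs
```

      In this model the enumeration order decides whether a 4-CPU Windows 2000 licence spreads across all four packages or piles onto the first two, while the .NET rule is indifferent to the order.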

    8. Re:Hyperthreading on Windows by nick-less · · Score: 2, Insightful


      2) the lame part is that you have to pay to not have to pay for your logical processors. linux doesn't have these licensing issues.


      yep - it might be lame to force people to buy new licenses, but hey we're free to run other operating systems and I'm sure any real microsoft zealot upgrades to .NET server years before he gets his first desktop Xeon ;-)

    9. Re:Hyperthreading on Windows by Tassleman · · Score: 1

      FYI - there is a relatively little-known tool that ships on the Windows CDs, I *think* it's called mkcompat.exe (for make compatible) that will allow you to "fool" an executable into thinking it is running on another version of Windows. I don't know if it would work for something like DirectX's installer, but it would have been worth a shot.

    10. Re:Hyperthreading on Windows by UnrefinedLayman · · Score: 1

      Try again: Windows XP supports hyperthreading.

      http://www.theinquirer.net/?article=5616

    11. Re:Hyperthreading on Windows by greenrd · · Score: 2
      So, in other words, "a virtual processor requires its own CPU license, at least in Win2K."

      Remind me - where was the FUD again?

  6. Hyper-Threding, eh? by xactoguy · · Score: 3, Funny

    So that's how we can put the thread through the needle even faster? Wow... back in MY day, we had to use our fingers to do that, in candle light, when you couldnt even see the friggin' hole! :P

    --


    And so we go, on with our lives
    We know the truth, but prefer lies
    Lies are simple, simple is bliss
    1. Re:Hyper-Threding, eh? by charon_on_acheron · · Score: 1

      You had needles? When I was a kid, we had to catch porcupines and use their quills. And we could only catch them at night because they're nocturnal, but we didn't even have candles to see with. First we had to catch dozens of fireflies and keep them in glass jars.

    2. Re:Hyper-Threding, eh? by Anonymous Coward · · Score: 0

      You had porcupines and fireflies? Back in my day we had to grind our own bones into needles and use our hair for thread. What's more fireflies weren't even invented so we had to thread in the dark. We pinched our fingers a lot, but we didn't complain. We liked it that way.

    3. Re:Hyper-Threding, eh? by Breakfast+Pants · · Score: 0

      Glass! Back in my day all we had was dandelions, and you better know you damn well better learn to like dandelions!

      --

      WHO ATE MY BREAKFAST PANTS?
    4. Re:Hyper-Threding, eh? by Woy · · Score: 1
      "Wow... back in MY day, we had to use our fingers to do that, in candle light, when you couldnt even see the friggin' hole! :P"

      Man, you sure know how to make threading sound sexy...

      --
      "If God created us in his own image we have more than reciprocated." - Voltaire
  7. apples to oranges? by edrugtrader · · Score: 0, Offtopic

    come on, more like oranges to tangerines...

    you are dealing with data instruction streams going on independently, sure maybe only x2 or more with SMP, but x anything is infinitely greater than x1 when dealing with threads.

    and what is really the difference with oranges and tangerines? man i hate tangerines... if anything they are worse than oranges, but so similar. all tangerines should be destroyed. and thus i have proven why hyper-threading will fail.

    on that note:
    ARE YOU A PHP DEVELOPER? WORK WITH ME AND MAKE MILLIONS!
    Web Developer II

    --
    MARIJUANA, SHROOMS, X: ONLINE?! - E
    1. Re:apples to oranges? by Anonymous Coward · · Score: 0

      you have obviously never had a fresh Florida tangerine... oh, so, so, good.

      mmmmmmmmmmmmmmmmm, tangerines - Homer

  8. Hyperthreading verses SMT by sielwolf · · Score: 3, Interesting

    I'm personally more partial to calling it Simultaneous Multi-Threading as compared to Hyperthreading which is the brandname Intel created for the concept. Sort of like Xerox versus Photocopy. Of course there are some mix-ups for those who seem to think of the multi-threading as OS based and not hardware. Eh, personal preference.

    --
    What is music when you despise all sound?
    1. Re:Hyperthreading verses SMT by dbarclay10 · · Score: 2

      Read the article. SMT sounds nicer, but "hyperthreading" is actually an improvement on something that's existed for ages in some traditional Unix environments: "superthreading". What should they have called it? "Superduperthreading"?

      In the historical context, the name is perfectly fitting.

      --

      Barclay family motto:
      Aut agere aut mori.
      (Either action or death.)
    2. Re:Hyperthreading verses SMT by Anonymous Coward · · Score: 0

      Or you could call it "thread-level parallelism", which is what they called it when I took a compilers class a little while ago. I'm not an expert on the subject (it was a fairly simple compilers class), but you would think if anyone would know the correct term, it'd be a college professor teaching a compilers class...

    3. Re:Hyperthreading verses SMT by Anonymous Coward · · Score: 0

      Thread-level parallelism is a more general concept, which could at one extreme be applied to usage of OS-level threads on a single processor. SMT is a specific microprocessor implementation technique, and there are other ways to make a CPU run several threads by itself.

  9. multithreading by kin_korn_karn · · Score: 3, Funny

    when will someone develop a processor that will automatically multithread tasks? i.e. you don't have to explicitly ask for new threads, it optimizes the code into threads for you?

    yes, I realize this is anti-geek, so this processor would also allow you to take control of thread creation by flipping a register or something.

    1. Re:multithreading by fgb · · Score: 1

      Wouldn't it be simpler and more effective if they just placed multiple processors on a single chip?

    2. Re:multithreading by Anonymous Coward · · Score: 0

      Die yields are all over the place on bleeding edge processors. They could maybe do four on a chip, but it wouldn't run at top speed unless all four did. They'd need at least some reengineering as well.

    3. Re:multithreading by Zathrus · · Score: 3, Informative

      Hey, if you know a new solution to deadlocks and race conditions so that it's trivially easy to solve all of them in realtime, then go talk to a processor vendor of your choice - you won't ever have to invent anything again.

      Until that happens it's simply not possible for anything but the most trivial of tasks (which is already done by compilers and processors with multiple execution units).

    4. Re:multithreading by swg101 · · Score: 1

      Note that the article mentioned that this performance increase was gained with only a 5% increase in die size. Much better than 2 entire procs (not to mention heat dissipation problems)

      --
      Like pi? Try 10,000 digits.
    5. Re:multithreading by Anonymous Coward · · Score: 1, Informative

      There almost is such a thing, at least in academic literature:

      ftp://ftp.cs.wisc.edu/sohi/papers/2002/mssp.micro.pdf

    6. Re:multithreading by iabervon · · Score: 3, Informative

      Processors do this to the extent that it's possible at runtime; that's what out-of-order execution is, basically. The problem is that it only makes your single threaded program into 2 or 3 threads; beyond that, you need to look at bigger chunks of the program than the processor ever sees at once.

      Beyond that, you really need to be able to look at the program as a whole in order to do anything that clever, so you're talking language, compiler, or library features, and you generally have to involve the programmer somewhat, although you don't necessarily have to do it as explicit threads. (E.g., there's a C variant with a keyword that says it's okay to evaluate all of the arguments to a function at the same time)

    7. Re:multithreading by Anonymous Coward · · Score: 0

      Craig Zilles is a good guy

    8. Re:multithreading by Wildcat+J · · Score: 1

      I believe this is what IBM is doing with the Power4.

    9. Re:multithreading by Have+Blue · · Score: 2

      Your programs are already being multithreaded at a very, very low level: Out-of-order execution, nonblocking fetch, superscalar design, and so on cause your processor to be performing more than one instruction simultaneously, and I guess you can call that multithreading. A good compiler (especially on a RISC platform) can design for this automatically to a certain extent.

  10. SMP performance by swg101 · · Score: 2, Informative

    I would agree that a SMP system holds up well. I run 2x 200MHz Pentium Pro, and it gives solid performance as a desktop. I wonder if this tech would allow a slower clock speed chip, thus cooler, that still exhibited good performance. It seems like a good idea for laptops, etc.

    --
    Like pi? Try 10,000 digits.
    1. Re:SMP performance by FuzzyMan45 · · Score: 2, Interesting

      Actually, i've used a hyper-threaded system (dual 2.0ghz xeons) and it's really not that much faster. Maybe intel fixed some stuff on the final spec, but the chips felt faster not in HT mode...

    2. Re:SMP performance by really? · · Score: 2, Interesting

      ... and benchmarked a bit faster in nonHT mode for me. (FreeBSD 4.6.2, with an 8port 64 bit 3ware controller)

      --

      "Consistency is contrary to nature, contrary to life. The only completely consistent people are the dead." A. Huxley
    3. Re:SMP performance by Anonymous Coward · · Score: 1, Insightful

      did you read the article?

      applications have to be specifically programmed to take advantage of SMT. Apps that are not (i can imagine most apps for freebsd are not) will suffer from performance degradation. Now, go read the article, its very well written and is deserving of your time.

    4. Re:SMP performance by Anonymous Coward · · Score: 1, Interesting
      Actually, i've used a hyper-threaded system (dual 2.0ghz xeons) and it's really not that much faster. Maybe intel fixed some stuff on the final spec, but the chips felt faster not in HT mode...

      Don't you really need to recompile to take advantage of hyper-threading? On top of that, current UIs just barely take advantage of conventional multithreading in the first place. So it's not surprising that it didn't feel faster.

    5. Re:SMP performance by Anonymous Coward · · Score: 0

      "Now, go read the article, its very well written[...]"

      Indeed a torvaldesque remark.

    6. Re:SMP performance by joto · · Score: 2
      No, what you need to do is to design your application to not use any memory, since that will undoubtedly trash the cache. But of course, reordering instructions for lower performance, so the CPU can interleave another process will also help (not that it's going to increase performance in any way).

      On the other hand, this might finally be an argument in support of threads versus processes, since the former avoids the cache issues.

    7. Re:SMP performance by jrwyant · · Score: 1

      I've got a dual-2GHz Prestonia (P4 "Xeon"), and without hyperthreading, a Linux kernel (2.4.19-pre-something with patches or IRQ balancing) compile takes about 3.5 minutes, whereas with hyperthreading enabled it takes just over 2 minutes. Something like that. But, when running 4 instances of dnetc on hyperthreading you get slightly less than half the key throughput of just 2 instances with hyperthreading disabled. So, I usually just disable it and run dnetc. :)

  11. it's very difficult to do well by Trepidity · · Score: 5, Informative

    It's incredibly difficult to automatically parallelize a program well, even when you can run a preprocessor on it and spend days on computations; doing it in real-time in hardware is even more difficult. This is currently done to a small extent in the pipelining hardware of modern CPUs, and even that small bit of automatic parallelization is ridiculously complex and slows things down (which is why the Itanium dumped it, and put the onus on the computer to parallelize sufficiently for pipelining to work). If it's that difficult to do for the relatively meager parallelization requirements of pipelining, actually breaking the program into separate execution threads is damn near impossible with current technology (at least with any efficiency even remotely approaching writing a program to be properly multithreaded in the first place).

    1. Re:it's very difficult to do well by iangoldby · · Score: 1

      Some languages, in particular Fortran 90/95, are designed to make it easy for the compiler to parallelise the code. The array operations and 'where' constructs are all fully parallelisable without any effort on the part of the programmer.

      Of course, Fortran was designed from the ground up to be highly optimisable. But it's not impossible to imagine C libraries being written in the same way. The key is to provide suitably high-level operations, perhaps with call-backs to user functions.

      On the other hand, I've heard that Intel's hyperthreading only really works when the two threads are using different parts of the processor's architecture. There would be no real performance gain where similar operations are being done in parallel as in the cases mentioned above.

  12. typo by Trepidity · · Score: 2

    In reference to the Itanium's pipelining, I of course meant "put the onus on the compiler..."

  13. More fun for OS guys by be-fan · · Score: 2

    To make optimal use of hyperthreading, I'm guessing the OS guys will have to do some work, like making sure that two threads with huge, non-overlapping data sets don't get scheduled at once, and trying to schedule threads who have overlapping datasets together. And it points out another thing. Again, just when we thought we had enough, we need MORE MEMORY BANDWIDTH. The tests show that while the dual channel RDRAM was fast enough for the two HT-enabled Xeon 2.0 GHz, it wasn't enough for the two 2.4 GHz Xeons.

    --
    A deep unwavering belief is a sure sign you're missing something...
    1. Re:More fun for OS guys by Anonymous Coward · · Score: 0

      Hence DDR PC3500 at 433MHZ.
      Still can't wait for DDR 2

  14. Hyperthreading? What's next? by The+Slashdolt · · Score: 5, Funny

    What's next, LudicrousThreads?

    obligatory spaceballs reference

    --
    mp3's are only for those with bad memories
    1. Re:Hyperthreading? What's next? by Anonymous Coward · · Score: 0, Funny

      Oh my god! They went Plaid.

    2. Re:Hyperthreading? What's next? by Anonymous Coward · · Score: 0

      What's next, LudicrousThreads?

      e-threading
      threading XP
      iThreading
      threading ti
      compuglobalhypermegathreading.com

    3. Re:Hyperthreading? What's next? by Anonymous Coward · · Score: 0

      The answer is 42.

      Obligatory H2G2 plug.

  15. Oracle, W2K Enterprise by Perdo · · Score: 2, Interesting

    They Love Hyperthreading. Licencing is determined per CPU reported to the OS not per actual piece of silicon.

    Double your licencing cost for a 5% to 30% performance improvement? I don't think so. Hyperthreading is DOA for enterprise.

    Luckily MS has decided to enable 2 CPUs in XP home so you don't have to ante up another hundred bucks for XP professional for the 5% to 30% performance improvement.

    Junkware.

    --

    If voting were effective, it would be illegal by now.

    1. Re:Oracle, W2K Enterprise by MmmmAqua · · Score: 5, Informative

      I don't know where you're getting your info about Oracle, but it's wrong. Oracle licensing is determined per-physical CPU. This was something we made doubly-sure to check up on when migrating from our old Oracle server to our new one (dual Xeon w/HT).

      On the downside of HT, until the 2.6 (or 3.0, subject to Linus' whim) kernel comes out, there's no point in enabling HT on a Linux box; because the 2.4 scheduler is unaware of HT, all CPUs are treated the same, and the scheduler ends up starving one physical CPU. Performance on a dual-1.8GHz Xeon, 1GB RDRAM with HT enabled under 2.4.10 is roughly 5-15% slower than with HT disabled.

      2.5.31 with the HT patch dramatically reverses these numbers, providing an average performance that is 30% better than 2.4.10 without HT. YMMV, of course, and I'm not talking about OS performance, I'm talking about Oracle's performance. Still, 30% increase just for flipping a switch in the BIOS and recompiling the kernel is nothing to sneeze at.

      --
      Arr! The laws of physics be a harsh mistress!
    2. Re:Oracle, W2K Enterprise by Unknown+Relic · · Score: 1

      Do you happen to have any links to more information on HT performance under Linux, specifically with the 2.4 kernel?

    3. Re:Oracle, W2K Enterprise by Anonymous Coward · · Score: 1, Insightful
      http://kerneltrap.org/node.php?id=391

      It's not 2.4, but it gives you an idea about how things work.

    4. Re:Oracle, W2K Enterprise by Anonymous Coward · · Score: 0
      On the downside of HT, until the 2.6 (or 3.0, subject to Linus' whim) kernel comes out, there's no point in enabling HT on a Linux box; because the 2.4 scheduler is unaware of HT, all CPUs are treated the same, and the scheduler ends up starving one physical CPU.

      That is of course assuming you have two physical CPUs, a UP-HT box will get improved performance even if the scheduler isn't HT-aware.

    5. Re:Oracle, W2K Enterprise by Anonymous Coward · · Score: 0

      > there's no point in enabling HT on a Linux box; because the 2.4 scheduler is unaware of HT, all CPUs are treated the same, and the scheduler ends up starving one physical CPU.

      Not exactly true.

      The HT CPU will look like two, and Linux will recognize it as two SMP units. It will schedule two processes in the normal SMP kinda way.

      There are some details, tho.

      1) When you hyperthread, the physical CPU is shared, so you may see what looks like a performance hit. In reality, one task might do 100 units of work with HT disabled. With HT, one task will do 80 units while the other does 30, a NET gain over BOTH tasks of 10 units.

      Measure "performance" carefully.

      2) Linux, today, sees each virtual CPU as if it were a physical one. The scheduler wants to schedule the same task back on the same CPU whenever possible (to keep the cache). It doesn't know that it can schedule the task on ALL virtual CPUs on the same silicon equally well. So it may avoid scheduling a task on a virtual twin when other runnable tasks are available.

      But, you're right, the HT patch is a good thing. Nice to know open development can optimize such things, and deliver the benefits to us, so quickly.
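      The arithmetic in point 1 can be spelled out in a few lines of Python. The unit counts are the illustrative ones from this comment, not measurements, and the variable names are made up for the sketch. It shows why a benchmark that watches a single task reports a loss while total throughput goes up.

```python
# Work completed per time slice, in the comment's arbitrary units.
without_ht = [100]    # one task has the physical CPU to itself
with_ht = [80, 30]    # two tasks share the physical CPU

# What a single-task benchmark sees: the first task slows down.
single_task_loss_pct = (without_ht[0] - with_ht[0]) * 100 // without_ht[0]

# What the machine as a whole gets done: total throughput rises.
net_gain = sum(with_ht) - sum(without_ht)

print(single_task_loss_pct, net_gain)
```

      Here the single task loses 20% while the machine as a whole completes 10 more units, so which number counts as "performance" depends on what is being measured.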

    6. Re:Oracle, W2K Enterprise by cpeterso · · Score: 2

      2.5.31 with the HT patch dramatically reverses these numbers, providing an average performance that is 30% better than 2.4.10 without HT.

      I don't think comparing Linux 2.5.31 with HT to Linux 2.4.10 without HT is a fair comparison. That supposed 30% performance gain could easily be attributed to many of the HUGE kernel changes made in the 2.5.x series. A more fair comparison would be 2.5.31 with HT turned ON and then turned OFF. Then you only have a single variable.

    7. Re:Oracle, W2K Enterprise by MmmmAqua · · Score: 1

      Of course, you're correct. My bad for not mentioning this in the first place.

      --
      Arr! The laws of physics be a harsh mistress!
    8. Re:Oracle, W2K Enterprise by MmmmAqua · · Score: 1

      1) When you hyperthread, the physical CPU is shared, so you may see what looks like a performance hit. In reality, one task might do 100 units of work with HT disabled. With HT, one task will do 80 units while the other does 30, a NET gain over BOTH tasks of 10 units.

      True, however, only true for single-CPU HT machines. In a dual-CPU system (or greater), where physical CPU 0 is logical CPU 0 & logical CPU 1, and physical CPU 1 is logical CPU 2 & logical CPU 3, the potential exists for the scheduler to assign tasks only to logical CPUs 0 & 1, thereby overloading physical CPU 0 while leaving physical CPU 1 twiddling its thumbs. That's where the big performance hit comes in, but, like I said, this should disappear with an HT-aware 2.6.
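      A toy illustration of that placement problem, in Python. This is not the Linux scheduler; `naive_place` and `ht_aware_place` are hypothetical functions, and the mapping of logical CPU n to physical CPU n // 2 simply follows the numbering used above. Both functions assume there are no more tasks than logical CPUs.

```python
def physical_of(logical):
    # Logical CPUs 0 and 1 share physical CPU 0; 2 and 3 share physical CPU 1.
    return logical // 2


def naive_place(n_tasks, n_logical=4):
    """Treat every logical CPU as equal: fill them in numerical order."""
    return list(range(n_tasks))


def ht_aware_place(n_tasks, n_logical=4):
    """Prefer the idle logical CPU whose physical package is least loaded."""
    load = {}
    free = list(range(n_logical))
    chosen = []
    for _ in range(n_tasks):
        best = min(free, key=lambda c: (load.get(physical_of(c), 0), c))
        free.remove(best)
        chosen.append(best)
        load[physical_of(best)] = load.get(physical_of(best), 0) + 1
    return chosen


for place in (naive_place, ht_aware_place):
    cpus = place(2)
    print(place.__name__, cpus, sorted({physical_of(c) for c in cpus}))
```

      With two runnable tasks, the naive policy picks logical CPUs 0 and 1, which are both on physical CPU 0, while the HT-aware policy picks 0 and 2 and keeps both packages busy.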

      --
      Arr! The laws of physics be a harsh mistress!
    9. Re:Oracle, W2K Enterprise by Perdo · · Score: 3, Insightful

      How many processor licenses does Oracle charge for a Power4, which is literally 4 PPC processors on a single die? What about a clustering approach that presents a server farm as a single virtual CPU?

      So many technologies can interfere with processor count that Oracle and Microsoft are using whatever is a best case scenario for them. If licensing is by physical silicon only, future iterations of multi-processing on die will really hamper software providers' profitability - something you know they will not stand for.

      If it was exclusively per CPU, you would also see a lot of shops always buying the absolute fastest processors available, and specialty shops selling factory over clocks of those processors. Reduced licensing costs would actually make the price of exotic cooling methods and reduced cpu life look good.

      Same rule applies to Co-location in a different way. How much power can you stuff into 1u of rack space?

      If the most costly machine you can buy is a 48 CPU machine that can fit into 3u using Quad processors cards on a back plane but costs less in the long term because you are not paying for 24u of rack space that dual processor 1u machines would take, you buy it. Even if your per cpu cost is 10 times the cost of more conventional systems, the machine pays for itself in rack space costs in 10 months. After 18 months you upgrade the machine because by then you are paying twice as much for per cpu licenses as you could be paying with modern hardware.

      Note to businesses: Upgrade now while prices are depressed, and interest rates are low. Sticking with your old hardware is costing you in the long term.

      Take out a loan and upgrade. If your hardware is over 18 months old, you can cut your licensing costs in half. Don't sit on hardware when you are just waiting for it to break.

      IT is not a static business. Do not keep your hardware until it has no resale value. Do not keep your hardware until you are paying twice as much for licenses as you could be paying. Do not balk at high up front costs if it saves you 10 times its upfront cost due to licensing/rack space costs. Do not keep old machines that are costing you three times as much in electricity at a given performance level.

      Do a real cost analysis, put in the time. This is the perfect time to upgrade. Competition has never been more fierce for the dollars you have to spend. You will get more value for your dollar now than you ever have been able to.

      IT is crap as capital. It has no value in three years. Keep your IT expenditures dynamic to avoid riding your capital investment into the ground. Playing the depreciation tax game will not save you nearly as much as keeping old hardware costs you in other areas.

      Disclaimer: I am not invested in any IT infrastructure provider and I do not do IT consulting. I just have to run my own shop like the rest of you.

      --

      If voting were effective, it would be illegal by now.

  16. Dear sir, by Anonymous+Cowrad · · Score: 5, Funny

    oh no!

    Sincerely,
    Intel

    --
    pants ahoy
  17. Tera/Cray MTA by astroboy · · Score: 5, Interesting

    The company that now owns the name Cray does something very much like this on a fairly grand scale on its own architecture, the MTA (Multi-Threaded Architecture). Here, each processor switches between 128(!) hardware threads to take advantage of the sort of concurrency you can get while waiting for memory access, etc.

  18. Be careful by essdodson · · Score: 2, Redundant

    Hyperthreading needs to be used carefully. Certain applications you will end up with significant performance decreases with it enabled. Hyperthreading adds additional overhead to threading models and schedulers.

    --
    scott
    1. Re:Be careful by MxTxL · · Score: 2

      Quite true... your applications have to be designed specifically with multi-processing in mind otherwise your system will just end up wasting any potential performance gains on context switches and other overhead. Also, there are some data sets and types of processing that are better suited to multi-processing environments.

      Since most applications have only one processor in mind, it's typical to see dual processor systems that don't have much performance gain.

      The distributed systems (SETI, GIMPS and others) are very well suited to multi-processor environments... and this is taken to extremes by having the multi-processing done on entirely different machines with some 'master' computers that handle the overhead of reassembling the multiple datasets into something coherent for the whole system. It's actually an amazing thing when you get to thinking about it.
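
      The master/worker pattern described above can be sketched roughly as follows. This is an illustration only: `process_unit`, `split` and `run_master` are hypothetical names, the "work" is a sum of squares, and threads in one process stand in for separate machines.

```python
from concurrent.futures import ThreadPoolExecutor


def process_unit(unit: list[int]) -> int:
    """Stand-in for the expensive per-unit computation (here: a sum of squares)."""
    return sum(x * x for x in unit)


def split(data: list[int], unit_size: int) -> list[list[int]]:
    """Master side: cut the dataset into independent work units."""
    return [data[i:i + unit_size] for i in range(0, len(data), unit_size)]


def run_master(data: list[int], unit_size: int = 4, workers: int = 3) -> int:
    units = split(data, unit_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_unit, units))  # results keep unit order
    return sum(partials)  # master reassembles the partial results


data = list(range(10))
total = run_master(data)
print(total)
```

      The point is that the units never need to talk to each other, which is what makes this kind of workload scale so easily.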

    2. Re:Be careful by Anonymous Coward · · Score: 0

      ya right... You don't know what you are talking about.

      Give me a realistic example where there will be a noticeable performance decrease.

      Now, I'm talking real world example not some "if, but, maybe" deal. I bet there isn't one...

    3. Re:Be careful by Anonymous Coward · · Score: 0

      Hyperthreading needs to be used carefully. With certain applications you will end up with significant performance decreases when it is enabled. Hyperthreading adds additional overhead to threading models and schedulers.

      Surely that is true for SMP with physical processors as well. But I sure wouldn't mind having a dual Athlon or Xeon box...

  19. Any advance is welcome by Anonymous Coward · · Score: 0

    I for one applaud any advance in computer technology - even one that has questionable benefits. There are bound to be lots of stumbling blocks as engineers try to increase the computing power of home PCs.

    Some people will say "what do you need more computing power for" - well on the Discovery Channel a few nights ago there was a documentary about using CAT scan data and visualization techniques to generate a real-time rendered 3D image of a human brain about to undergo surgery. The render farm they used was massive. It was a computer science research lab in the UK (sorry, don't know which one). Having the power to do that on a chip that is affordable might save lives.

    Granted, this is only a small step, with questionable benefits, but it is always good to try and push the boundaries.

  20. Great article by MxTxL · · Score: 2

    This is a very good article to read for those who are not really familiar with how a processor actually does its work. The first three pages or so are generally what a senior-level college OS course will teach you.

    The distinction between a program in memory and a process in execution is important. It is also important to understand the illusion of simultaneous execution that is achieved through concurrent processes using context switches.

    Given all that, the article makes it easy to understand where your performance gains (and losses) happen having multi-processors, and indeed in having multi-processing on the same chip.

    All in all a good read.
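
    To make the "illusion of simultaneous execution" concrete, here is a toy round-robin sketch. It is an assumption-laden model: Python generators stand in for processes, and each `yield` plays the role of a timeslice ending. A real OS forces the switch with a timer interrupt rather than waiting for the task to yield.

```python
from collections import deque


def task(name, steps):
    """A 'process': yields after every step, which models the end of a timeslice."""
    for i in range(steps):
        yield f"{name}:{i}"


def round_robin(tasks):
    """Run one step of each task in turn until all are finished."""
    ready = deque(tasks)
    trace = []
    while ready:
        current = ready.popleft()
        try:
            trace.append(next(current))   # run for one timeslice
            ready.append(current)         # context switch: back of the queue
        except StopIteration:
            pass                          # task finished, drop it
    return trace


trace = round_robin([task("A", 3), task("B", 2)])
print(trace)
```

    Only one task ever runs at a time, but the interleaved trace looks as if A and B were progressing together.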

    1. Re:Great article by Darren+Winsper · · Score: 2

      "This is a very good article to read for those who are not really familiar with how a processor actually does it's work. The first three pages or so are generally what a senior-level college OS course will teach you."
      Not really. For starters, it doesn't go into any detail on how to use threads. They make no mention of things like semaphores, locks, monitors or race conditions, the sort of things that make threaded application development difficult.

      It goes into the very basics of that part of the OS module I did at the beginning of my second year, so please don't trivialise such courses.
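
      As a rough illustration of the kind of thing meant here, the sketch below protects a shared counter with a lock. The `Counter` class and `hammer` function are made up for this example. Without the lock, the read-modify-write in `increment` is a classic race condition.

```python
import threading


class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The read-modify-write below is the critical section; the lock makes
        # sure only one thread is inside it at a time.
        with self._lock:
            self.value += 1


def hammer(counter, times):
    for _ in range(times):
        counter.increment()


counter = Counter()
threads = [threading.Thread(target=hammer, args=(counter, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)
```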

    2. Re:Great article by Hast · · Score: 2

      That's because it's a /hardware/ article. For obvious reasons it doesn't mention how threads work because at the CPU level that is not relevant.

      For people who haven't studied computer architecture I bet it's a rather tough read. (In which case they should go get Patterson & Hennessy's books. They are just great regarding this type of stuff.)

    3. Re:Great article by Darren+Winsper · · Score: 2

      Well that's all well and good, but it doesn't make the statement any more true. It touches on some areas an OS course/module teaches you, but it doesn't cover anywhere near the content the course/module would.

    4. Re:Great article by Hast · · Score: 1

      Ah, I was just about to write a lot about how OS level stuff isn't relevant to hardware, and then I looked up what the original guy posted. I had the impression he was talking about hardware (since it's a hardware article) but he was really talking about OS level stuff.

      I agree, if the first 3 pages are the same level as a senior-level OS course then it's high time to get your money back. At least the OS course I've taken was a lot more in depth than that. It's roughly on the level of introduction to computer design, i.e. they mention it but don't really go into detail. (Because it's not really relevant.)

  21. Damn thief by PissingInTheWind · · Score: 2, Funny

    From the article:

    (On a related note, this brings to mind one of my favorite .sig file quotes: "A message from the system administrator: 'I've upped my priority. Now up yours.'")

    He stole my .sig !!

    --

    A message from the system administrator: 'I've upped my priority. Now up yours.'
    1. Re:Damn thief by njm · · Score: 1, Funny

      If by steal your .sig you mean not claim it as his own, pointing out that it was another's, then yes.

  22. Linux support for Hyperthreading.. by molo · · Score: 3, Informative

    KernelTrap has had some articles on Linux's support of HT. Ingo Molnar has been working on tuning the scheduler for HT systems. Articles are here:

    http://kerneltrap.org/node.php?id=391
    http://kerneltrap.org/node.php?id=406

    </karmawhoring>

    --
    Using your sig line to advertise for friends is lame.
  23. Hammer? by cca93014 · · Score: 1

    I know the Hammer is 64 bit, but I've no idea about its multithreading properties...Anyone?

  24. Take this one to court... by Anonymous Coward · · Score: 1, Funny

    ... crack open the machine and demonstrate that there isn't but the one CPU. Really, the price tag of the software needs to be determined *outside* of the product being paid for -- especially on proprietary systems.

    <joke>
    California might not have spent so much on Oracle licensing costs had they not relied on a calculator running this little jobber:

    if (CPU_Count < 16) {
        // why would they run on a machine with less
        // than 16 cpus? it's an insult to our software!
        ChargeForLicenses(Random(255) + 16);
    }
    else {
        // Now we're playing with power!
        ChargeForLicenses(CPU_Count);
    }
    </joke>

  25. Does this mean by Anonymous Coward · · Score: 0

    Win2k will have twice as many opportunities to freeze?

  26. It's all about language support by km790816 · · Score: 2

    This is more of an issue of programming language support.

    There are languages (well, mostly modifications to existing languages) that allow one to create a program that will scale to any number of processors.

    It's actually a very tough problem, because most coders think in terms of doing x, then y, then z. You really need to think in terms of "I need these things done and they have these dependencies, but other than that, divide and conquer any way you want."

    parallel programming languages on Google
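
    A small sketch of "these things, with these dependencies", using the standard library's `graphlib`. The task names and the `parallel_batches` helper are hypothetical; the idea is only that anything in the same batch could be handed to separate processors.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each key lists the tasks it depends on.
deps = {
    "x": set(),
    "y": set(),
    "z": {"x", "y"},
    "report": {"z"},
}


def parallel_batches(deps):
    """Group tasks into batches; tasks within one batch do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are done
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = parallel_batches(deps)
print(batches)
```

    Here x and y have no ordering between them, so a runtime is free to do them at the same time; only z and report are forced into sequence.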

    1. Re:It's all about language support by Tony-A · · Score: 2

      Dijkstra's Guarded Commands maybe?
      Programs tend to be a linear order rather than a partial order.
      This can be a problem even with strictly sequential processing if the requirements keep changing.

  27. SYMMETRIC Multi Threading by keytoe · · Score: 5, Insightful

    They call this stuff Symmetric Multi Threading, but I think that name is a bit misleading. While the thread scheduling itself is symmetric (all process threads are created equal and receive equal execution time), the shared resources on the CPU (cache, shared registers) are NOT symmetric. Since these shared resources are in essence handled on the way in to the execution unit, it becomes really easy to starve the processor when you have contention for one of those resources.

    While proper application development can alleviate some of this issue, it will depend heavily on the actual usage patterns of the system. When you have a lot of overlap coming in from memory (like the file system cache on a web server), you don't worry too much about threads stepping on each others' registers. This sounds fantastic for data servers.

    Desktop systems, on the other hand, almost never work this way. When you're playing MP3s in the background while web surfing and checking your email, you're already working with vastly different areas of data. Throw the OS and any various background processes into the mix and you've pretty much eliminated any gain and possibly slowed down due to cache contention.
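
    The contrast between overlapping and unrelated working sets can be sketched with a toy cache model. Everything here is an assumption for illustration: a single fully associative LRU cache of 8 lines, two threads whose accesses strictly alternate, and made-up address ranges. Real caches are set-associative and far larger.

```python
from collections import OrderedDict
from itertools import chain


def hit_rate(accesses, capacity):
    """Replay a sequence of addresses through one LRU cache and return the hit rate."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for addr in accesses:
        total += 1
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)
        else:
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / total


def interleave(a, b):
    """Alternate accesses from two threads, as if they shared the cache."""
    return list(chain.from_iterable(zip(a, b)))


loop = list(range(6)) * 50                  # thread 1 walks a 6-line working set
same = loop                                 # thread 2 touches the same lines
other = [x + 100 for x in loop]             # thread 2 touches unrelated lines

shared_rate = hit_rate(interleave(loop, same), capacity=8)
disjoint_rate = hit_rate(interleave(loop, other), capacity=8)
print(shared_rate, disjoint_rate)
```

    With overlapping data almost every access hits; with unrelated data the combined working set no longer fits and, in this worst-case pattern, the two threads evict each other's lines continuously.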

    While this was touched on at the end of the article, I don't think it was given enough weight. It doesn't just depend on what applications you're running and whether they were written to take advantage of it. It depends on what you want to do with the whole system. For serving data, this will certainly be good (especially with multiple CPUs!). For desktop systems, this is a non-starter.

    I'm not disparaging the technology - far from it. I'm just waiting for Intel and Microsoft to market this to my mom as a way to have higher quality DVD playback - at twice the cost. And her buying it. Again.

    1. Re:SYMMETRIC Multi Threading by wfmcwalter · · Score: 2, Informative
      They call this stuff Symmetric Multi Threading, but I think that name is a bit misleading.

      I believe when symmetric is used in the context of SMP and SMT it is intended to mean "all execution elements have the same public interface".

      Things would be asymmetric in cases where there was a differentiation between the performance or capabilities of the execution elements - e.g. where one processor could handle interrupts and the other couldn't. An 80286+80287 is an example of an asymmetric system - one execution element can only do FP stuff, the other can do everything but FP.

      --
      ## W.Finlay McWalter ## http://www.mcwalter.org ##
    2. Re:SYMMETRIC Multi Threading by AxelTorvalds · · Score: 1

      Isn't it simultaneous multi threading?

    3. Re:SYMMETRIC Multi Threading by akuma(x86) · · Score: 3, Informative

      It's not symmetric multithreading.
      It's SIMULTANEOUS multithreading.

      This means that both threads are in the processor pipeline simultaneously.

  28. Software Objects Should Be Concurrent as a Rule by Louis+Savain · · Score: 2

    when will someone develop a processor that will automatically multithread tasks? i.e. you don't have to explicitly ask for new threads, it optimizes the code into threads for you?

    There should be no such thing as a sequential or algorithmic task. Programs should be parallel to start with. The biggest problem in software engineering is the age-old practice of using the algorithm as the basis of programming. This is the primary reason that software is so unreliable and so hard to develop. Objects in the real world are concurrent. Why should our software objects be any different?

    1. Re:Software Objects Should Be Concurrent as a Rule by GlassHeart · · Score: 1
      The biggest problem in software engineering is the age-old practice of using the algorithm as the basis of programming. This is the primary reason that software is so unreliable and so hard to develop.

      I am skeptical of the broad brush you are painting with. There are many kinds of software, many different ways they fail, and many different ways to do better.

      Parallel code, just like OOP code and structured code and garbage-collected code and extreme programming code, come with their own failure modes. Specifically, parallel code can contain deadlocks, which can be hellishly difficult to reproduce and therefore debug. They can have synchronization bugs that are dependent on processor load and other temporary phenomena. The most common debugging tools are also generally poor at debugging parallel programs.

      I'm not saying that it's a bad solution, just that it's entirely possible that there's no magic bullet for the software engineering crisis we face.
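
      For a concrete picture of the deadlock failure mode, consider two threads that each need the same two locks but name them in opposite order. The sketch below shows one common remedy, a fixed global acquisition order; `acquire_in_order` and `worker` are illustrative names, and ordering by `id` is just one convenient choice.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
transfers = []


def acquire_in_order(first, second):
    """Always take locks in one global order (here: by id) to rule out a cycle."""
    return sorted((first, second), key=id)


def worker(name, mine, theirs):
    low, high = acquire_in_order(mine, theirs)
    for _ in range(1000):
        with low:
            with high:
                transfers.append(name)


t1 = threading.Thread(target=worker, args=("t1", lock_a, lock_b))
t2 = threading.Thread(target=worker, args=("t2", lock_b, lock_a))
t1.start()
t2.start()
t1.join(timeout=10)
t2.join(timeout=10)
finished = not t1.is_alive() and not t2.is_alive()
print(finished, len(transfers))
```

      If each worker instead locked `mine` and then `theirs` directly, the program could hang only on some runs, depending on timing, which is exactly what makes such bugs hard to reproduce.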

    2. Re:Software Objects Should Be Concurrent as a Rule by Anonymous Coward · · Score: 0

      If you've ever really looked at any program, you realize that things go from A to B to C and that any parallelization is few and far between, esp. in user apps (not scientific). However, using a thread for I/O to keep interactivity in your program to a maximum is a good thing, but that's not parallel, that's running concurrently.
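
      A minimal sketch of the I/O-thread pattern, assuming a blocking call simulated with `time.sleep` and a polling loop standing in for a UI event loop. The names `slow_io` and `ticks` are invented for the example.

```python
import queue
import threading
import time


def slow_io(results):
    """Stand-in for a blocking read or network call."""
    time.sleep(0.2)
    results.put("file contents")


results = queue.Queue()
io_thread = threading.Thread(target=slow_io, args=(results,))
io_thread.start()

ticks = 0
while True:
    try:
        data = results.get_nowait()   # has the I/O finished yet?
        break
    except queue.Empty:
        ticks += 1                    # the "UI" stays responsive meanwhile
        time.sleep(0.01)

io_thread.join()
print(data, ticks)
```

      The main loop keeps ticking while the worker waits, which is concurrency for responsiveness rather than a parallel speedup.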

  29. IANAE but... by mikeg22 · · Score: 1

    is this similar to or in someway related to HyperBicycles? Please, no techie answers, I don't really understand that stuff.

  30. SMP is the way grasshopper by El_Nofx · · Score: 1, Offtopic

    This is awesome, because SMP is the future baby.

    After running single cpu systems for 10 years I finally anted up and built a dual 1 gig P III box, I would never go back.

    There are many many reasons for this, first off my computer hasn't locked up in probably 4 months, I always have a free processor to kill the app! Even though fairly few programs are multi-threaded, SETI@HOME, Photoshop, etc, I still use them both evenly by running 3 or 4 things at once...

    I still love to be able to burn a cd, listen to music and play Counter-Strike all at the same time.

    I heard of a Higher-UP at Transmeta saying that SMP was crap one time, what a moron, no wonder they aren't doing that well.

    Now maybe when we start seeing Asynchronous processor systems come down to the desktop level is when things will really start to cook..

    --
    It's not the OS it's the user that sucks. If it's user friendly, you get stupider people. - clinko
    1. Re:SMP is the way grasshopper by theBrownfury · · Score: 1

      Totally have to second this post. Having been on a single proc. system for the past 8 years and now finally switching to a dual Athlon 1800+ is just amazing. It's like having a second processor for everything! Oh wait, I do have a second processor for everything.

      Big problems though with heat dissipation and power consumption. Performance is great and all but having to have 8 fans in my case is just not fun.

      SMT is a great way to get more perf. out of existing technology.

      Now to make smarter developers.

      --

      "Unlike most of you, I am not a nut." - Homer J. Simpson
    2. Re:SMP is the way grasshopper by Anonymous Coward · · Score: 1, Informative

      pfft... I agree SMP is cool, but I tend to run many VMware processes at the same time.

      I play Q3, burn CD's and listen to music on my 800 Mhz P3... The SMP isn't really giving you anything in that department. For example, playing a MP3 on my computer uses about 2% CPU, burning a CD (16x) uses about 20% CPU if that much (CD burning speed is limited by HD speed, not CPU; duh). Q3 gets the rest (around 100 FPS on my ancient GF2 MX card), no worries.

    3. Re:SMP is the way grasshopper by Anonymous Coward · · Score: 0

      Oops, I meant 800 Mhz, single cpu P3...

    4. Re:SMP is the way grasshopper by Darren+Winsper · · Score: 4, Informative

      You have a very significant mis-understanding of pre-emptive multi-tasking. There is no situation where a locked process cannot be killed on a single CPU system but can be on a multiple CPU system.

      When the locked application's timeslice runs out, other applications will get a go, and from that it is possible to kill the locked application. This is one of the reasons pre-emptive multi-tasking became popular.
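
      A rough demonstration of the same idea, with caveats: it uses threads inside one CPython process rather than OS processes, and the "kill" is a flag the runaway thread checks. What it does show is that a thread spinning in a tight loop does not stop another thread from being scheduled, because the interpreter forces a switch periodically.

```python
import threading
import time

stop = threading.Event()
spins = 0


def runaway():
    """A 'locked up' task: it never sleeps or yields voluntarily."""
    global spins
    while not stop.is_set():
        spins += 1


hog = threading.Thread(target=runaway)
hog.start()
time.sleep(0.1)      # the main thread still gets scheduled...
stop.set()           # ...so it can "kill" the runaway task
hog.join(timeout=5)
killed = not hog.is_alive()
print(killed)
```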

    5. Re:SMP is the way grasshopper by Anonymous Coward · · Score: 0
      I heard of a Higher-UP at Transmeta saying that SMP was crap one time, what a moron, no wonder they aren't doing that well.

      Transmeta is aiming for the embedded market - PC-104 boards, subnotebooks, possibly blade servers, that sort of thing. for those applications, SMP is expensive overkill - that sort of processors spend most of their time idle as it is, i can't figure out why the transmeta folks even try to go gigahertz myself.

    6. Re:SMP is the way grasshopper by f00zbll · · Score: 1

      I had a dual processor system and it definitely improves the responsiveness of applications. But it did end up killing two CPU fans on my dual 450mhz. I seriously doubt a Xeon of same speed would perform as well.

    7. Re:SMP is the way grasshopper by GlassHeart · · Score: 1
      I still love to be able to burn a cd, listen to music and play Counter-Strike all at the same time.

      This has little to do with the number of processors, but with the scheduler.

      Burning a CD is what we call a "real time task", in the sense that any gap in the data stream going to the CD writer will destroy the disc. You might call it "hard real time" if the discs are expensive.

      Listening to music is similarly a real time task. The audio hardware must be fed a constant stream of data, or it'll run out of things to play and skip. However, since it only means that you get a bit annoyed (compared to burning a coaster), we might call that a "soft real time" task.

      Your game is also important, because it's an "interactive" task. This means a user is sitting there waiting for something to happen.

      Finally, your OS has a number of "background tasks" that it must run from time to time.

      Schedulers have different design goals. So-called Real Time Operating Systems (RTOS) are designed to respect the real time tasks, and they give CPU time to the highest priority task that wants it. The real time tasks that they handle may trigger emergency cooling for a runaway nuclear reactor, so no CD burning or MP3 playback better get in its way. PC operating systems don't tend to be designed with this strictness, but instead optimize the interactive experience. Basically, any OS that doesn't explicitly take good care of real time tasks can be overloaded into burning coasters. Having two CPUs or a faster CPU merely makes it harder to overload.
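
      The strict-priority policy attributed to an RTOS above can be sketched as below. This is a toy, not how any particular kernel works: the task names, priorities and slice counts are invented, and there are no arrivals, deadlines or blocking.

```python
import heapq


def run_by_priority(tasks):
    """Run tasks strictly by priority (lower number = more urgent).

    `tasks` is a list of (priority, name, slices_needed) tuples.
    Returns the order in which timeslices were handed out.
    """
    heap = list(tasks)
    heapq.heapify(heap)
    order = []
    while heap:
        priority, name, remaining = heapq.heappop(heap)
        order.append(name)                      # give this task one timeslice
        if remaining > 1:
            heapq.heappush(heap, (priority, name, remaining - 1))
    return order


# Hypothetical workload: priorities and slice counts are made up.
order = run_by_priority([
    (0, "cd-burn", 2),     # hard real time
    (1, "mp3", 2),         # soft real time
    (2, "game", 1),        # interactive
    (3, "background", 1),
])
print(order)
```

      A desktop scheduler would typically mix in fairness and interactivity heuristics, which is where the risk of starving the CD burner comes from.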

    8. Re:SMP is the way grasshopper by Arandir · · Score: 2

      My computer hasn't locked up in four years. That's because I stopped using Windows...

      --
      A Government Is a Body of People, Usually Notably Ungoverned
    9. Re:SMP is the way grasshopper by ez76 · · Score: 2
      You have a very significant mis-understanding of pre-emptive multi-tasking. There is no situation where a locked process cannot be killed on a single CPU system but can be on a multiple CPU system.
      That's all good in theory, but in practice it can be damned hard to kill an errant Win32 application on a single CPU box when, for example, the errant application is hooked into the same message processing loop as your explorer shell process and you can't get a Ctrl-Alt-Del in edgewise.

      And before the *ix/X/KDE folks smile too broadly about this, I routinely have the same thing happen in KOffice applications when I scroll through the font selector drop-down a bit too zealously. XFree86 and xfstt decide to have a CPU party and other X clients are not invited (sniff, sniff).

      My point being that a pre-emptive multi-tasking O/S is no guarantee you'll make it out of a (near-)infinite loop alive with your original session intact.
    10. Re:SMP is the way grasshopper by Darren+Winsper · · Score: 2

      Hmm...I've not seen Ctrl-Alt-Del not work on Windows 2000 unless the OS itself has fucked up somewhere. If that's the case, then all bets are off. And yes, I have had things fuck up and drag Explorer down with it. It never stopped me from being able to bring up the task manager.

      I do have a KDE application that tends to do funny things at times. The CVS version of Kopete sometimes decides to lock up just as I open a menu, which seems to render nothing else on the screen clickable. The way I get around this is to drop to a terminal and kill Kopete from the command line.

      I'd like to point out that in both those cases, adding another CPU doesn't increase your chance of being able to recover. The system might be a bit more responsive, but it doesn't matter how many CPUs you have if your UI is dead or the scheduler has fucked up.

  31. IDF Q/A Session by Anonymous Coward · · Score: 0

    There is a writeup on Hyperthreading along with some videos from a Q/A session with Intel representatives at their last IDF (Intel Developer Forum).

    http://www.hardwareanalysis.com/content/article/1532/

  32. One for work, one for Slashdot? by billstewart · · Score: 1
    Hey, there are lots of things you can do multi-threadedly. This way your Slashdot feed keeps on ticking even if you've got brief interrupts to do your real work, or vice versa. (Alternatively, you can leave one virtual CPU crunching prime numbers or space alien radio signals while the other's doing real work.)

    Yes, yes, I realize you don't need hyperthreading for that and regular multitasking is good enough...

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
  33. Re:apples to oranges?-Mental "Fit"? by Anonymous Coward · · Score: 0

    What I think is going to be the quintessential question is, how many algorithms can be expressed fully in a hyperthread context? Kind of like the difficulties with Multi-processor and hyper-cubic architecture. And as someone else pointed out. How do you debug such a beast? Kind of like trying to fix a modern jet engine with a crowbar and rusty nail. The fix-it part hasn't kept up with the create-it part.

  34. Who cares? Its got DRM!! by Anonymous Coward · · Score: 0

    Regardless of the technical achievements that are coming out of Intel - and hyperthreading is indeed an achievement to be applauded. The bottom line - Intel's chips have become totally irrelevant to me, regardless of their performance, since they will contain DRM restrictions.

    I'm pinning my hopes on Apple and maybe even China's new Dragon chip for my future computing needs.

  35. Who Cares? Its got DRM! by cosmosis · · Score: 2

    Regardless of the technical achievements that are coming out of Intel - and hyperthreading is indeed an achievement to be applauded. The bottom line - Intel's chips have become totally irrelevant to me, regardless of their performance, since they will contain DRM restrictions.

    I'm pinning my hopes on Apple and maybe even China's new Dragon chip for my future computing needs.

  36. Solution to "hog" problem by CustomDesigned · · Score: 1
    The "hog" problem, as mentioned in the article, is where competing threads thrash the cache, or other shared resource. To get good performance, the threads sharing the processor must "cooperate" to some degree.

    A good solution to this is to only allow threads from the same process to share a physical CPU via "hyper-threading". This makes it possible for the programmer to provide explicitly for their "cooperation", and even without programmer support, threads from the same process are more likely to use similar TLB and cache data. Traditional time-slicing will still get competing processes in and out of the CPU.
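
    A sketch of what such a placement rule might look like, purely as an illustration of the proposal: `assign_siblings` is a hypothetical helper, each physical CPU is assumed to expose two logical CPUs, and no real scheduler API is involved.

```python
from collections import defaultdict


def assign_siblings(runnable, physical_cpus):
    """Place runnable threads so that both logical CPUs of a core share one process.

    `runnable` is a list of (pid, tid) pairs. Returns a list of per-core tuples,
    each holding one or two threads from the same pid.
    """
    by_pid = defaultdict(list)
    for pid, tid in runnable:
        by_pid[pid].append((pid, tid))

    slots = []
    for pid in sorted(by_pid):
        threads = by_pid[pid]
        for i in range(0, len(threads), 2):
            slots.append(tuple(threads[i:i + 2]))   # pair siblings from one process
    return slots[:physical_cpus]                    # the rest wait for a timeslice


placement = assign_siblings(
    [(1, 10), (2, 20), (1, 11), (3, 30), (1, 12)], physical_cpus=2
)
print(placement)
```

    Threads that do not get a slot simply wait for ordinary time-slicing, as the comment suggests. A lone thread gets a core to itself rather than sharing it with a stranger.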

    1. Re:Solution to "hog" problem by CustomDesigned · · Score: 1
      The article mentioned that Xeon can run in 1 or 2 logical processor mode. If switching between modes is fast enough, the OS could switch to 1 logical processor mode when only single thread processes are runnable, switching to 2 logical processor when multiple threads from the same process are executing.

      BTW, does posting in Slashdot prevent some Bozo from filing for patents on these ideas?

    2. Re:Solution to "hog" problem by Wildcat+J · · Score: 2
      That's not a bad idea, prima facie. It's not quite so simple, though. While threads in a single process share the same address space, there are no guarantees as to their memory access patterns. The programmer can't be completely sure of where code, especially from something like a shared library, will be placed. Dynamically allocated memory could be anywhere on the heap. Now, in a simple enough program, it could be trivial to make sure that the instructions for routines each thread calls are close together, and likewise for the data. However, a large, well-written (in terms of modularity) program might not be so amenable to cooperation between threads.

      I have to think there is some merit to your idea, though. I just wanted to point out that it's not a simple matter.

      -J

    3. Re:Solution to "hog" problem by Anonymous Coward · · Score: 0

      > does posting in Slashdot prevent some Bozo for filing for patents on these ideas?

      Of course not. You can always file.

      But, worse, "prior-art" means you have to demonstrate it in something like a product.

      Odd, but true, even if programming is just a description by another language.

      The PTO is screwed. Always expect the worst.

  37. JAVA JAVA JAVA by gricholson75 · · Score: 1

    I think a lot of people are missing where this will help out a lot. Java servers. Java systems under high loads benefit a lot from multi-threading, and this can only help.

    1. Re:JAVA JAVA JAVA by Anonymous Coward · · Score: 0

      BFO. Blindingly fsking obvious.

  38. How so? by Anonymous Coward · · Score: 0

    > Hyperthreading adds additional overhead to threading models and schedulers.

    How So?

    The CPU simply appears as an SMP pair. But better, if you only have 1 process to schedule you get 100% of the physical silicon, rather than 50%.

    If you have multiple processes, the CPU simply uses instruction slots that a single pipeline would have otherwise left unused.

    It doesn't ADD overhead to anything in the software. It can be OPTIMIZED to IMPROVE some things. But out of the box, there is no substantial overhead added into the system.
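
    The "unused instruction slots" argument can be put into a toy model. The assumptions are loud ones: a fixed issue width, made-up per-cycle counts of ready instructions, and no modelling of cache or queue contention between the threads.

```python
def slots_used(ready_a, ready_b, width):
    """Count issue slots filled per cycle, alone versus with a second thread.

    `ready_a[i]` / `ready_b[i]` are the number of instructions each thread could
    issue in cycle i; `width` is the number of issue slots per cycle.
    """
    alone = sum(min(a, width) for a in ready_a)
    smt = sum(min(a + b, width) for a, b in zip(ready_a, ready_b))
    return alone, smt


# Made-up traces: thread A stalls (0 ready) in some cycles, thread B is steady.
ready_a = [3, 0, 1, 0, 2, 3]
ready_b = [1, 2, 1, 2, 1, 1]
alone, smt = slots_used(ready_a, ready_b, width=3)
total = 3 * len(ready_a)
print(alone / total, smt / total)
```

    Total slot usage goes up, but in the cycles where both threads want more than the width allows, one of them has to give something up. That is the contention the parent comment is worried about.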

  39. Re:Who Cares? Its got DRM! by djlowe · · Score: 1

    So...

    Have you SEEN these chips in action? Do you actually have a motherboard that has an Intel implementation of DRM?

    Why not wait to see what happens, instead of spreading FUD?

  40. Re:Who Cares? Its got DRM! by damiam · · Score: 1

    Will the Itanium have DRM?

    --
    It's hard to be religious when certain people are never incinerated by bolts of lightning.
  41. Novell Netware 5 & 6 by FreeLinux · · Score: 2

    Netware 5 & 6 fully support hyper-threading.

  42. was in p4 original design qjkx by Anonymous Coward · · Score: 0

    Dual core was in original p4 design, but was dropped due to lack of time to test. I'm not sure, but it may actually be in the silicon still yet just disabled.

  43. Re:Who Cares? Its got DRM! by Anonymous Coward · · Score: 0

    there is no freaking way DRM (in the CPU/MB) is going to happen for at least several years.

    If it does, then we can blame Intel but they haven't done anything YET.

    You sir are spreading FUD in the purest sense.

  44. Your "non-techie" answer: by Anonymous Coward · · Score: 0

    "No."

  45. Re:FP by Anonymous Coward · · Score: 0

    Hi, I'm having trouble reaching the linked site. Although if the page shows what I suspect it does, I probably don't really want to see it anyway.

  46. Increasing pain of Mis Predicts and IO Access by brandido · · Score: 3, Interesting

    When Intel switched from the P3 architecture to the P4 architecture, they increased the depth of their pipeline from 10 to 20, I believe. My understanding was that this significantly increased the performance penalty for mispredicts for branches and whatnot requiring a flush of the pipeline. I am curious if adding SMT to this will increase the penalty for mispredicts even more, if both threads must be flushed or only the one. If this is the case, are there cases where the penalty would outweigh the benefit?
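
    As a back-of-the-envelope illustration of why a deeper pipeline makes mispredicts hurt more, here is the usual textbook-style estimate. The branch fraction, mispredict rate and base CPI are assumed values, and treating the penalty as equal to the pipeline depth is a simplification.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Average cycles per instruction once branch mispredict stalls are included."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Assumed workload: 20% branches, 5% of them mispredicted, base CPI of 1.0.
# Penalties of 10 and 20 cycles stand in for the two pipeline depths.
short_pipe = effective_cpi(1.0, 0.20, 0.05, 10)
long_pipe = effective_cpi(1.0, 0.20, 0.05, 20)
print(round(short_pipe, 2), round(long_pipe, 2))
```

    Under these assumptions the mispredict overhead doubles with the pipeline depth. Whether SMT adds to it depends on whether a flush affects one thread's instructions or both, which is the question being asked.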

    --
    First Falcon-1 to orbit, then Falcon-9. Then I can die a happy man.
    1. Re:Increasing pain of Mis Predicts and IO Access by taeric · · Score: 2

      More likely, the processor would hold off on the program branching and send code for another process down the pipe. That way, you have slowed down (definitely) the process that is branching, as it is waiting for the branch, but you have allowed it to perform calculations on another process in the same time. Effectively making sure that you should never have to "flush" the registers.

      So... they could eliminate the whole concept of branch "penalties" altogether.

      Now is this how it is actually implemented? I don't know. There are already plenty of complications present in the processor, so changing this bit of logic is far from trivial. Still, since a branch calculation is a fixed amount of time that leaves part of the execution units free, I don't know of any reason this sort of scheme could not be implemented.

      Perhaps if someone else has some information?

  47. DRM by XiaouTuzi · · Score: 1

    And with the addition of DRM in the intel lines we can know exactly how quickly we won't be able to install the patch for the broken / insecure bit of GNU code.
    Wake me up when it's about AMD.

    1. Re:DRM by Anonymous Coward · · Score: 0

      AMD will support the DRM stuff also. Get a life and actually read up on things before making assumptions.

  48. Implications??? by Anonymous Coward · · Score: 0

    So, if CAML programs can be threaded, does that mean Bill Gates gets to go to heaven?

  49. A conspiracy theory for you by Anonymous Coward · · Score: 0

    Conspiracy theory of the day:

    When was the last time a software company found a "cure" for a basic programming problem like deadlocks, race conditions, or garbage collection?

    See, they aren't interested in actually solving problems once and for all, since it's not as profitable!

    And that's why we need to abolish patents!

  50. hey mods by Anonymous Coward · · Score: 0

    mod this guy up!

    he's a notorious Usenet kook! slashdot is the perfect place for him!

  51. Re:Hyperthreading on Windows - user experience by madbrain · · Score: 2, Informative

    I'm posting this on a Dell P530 development desktop, running Windows 2000 Server.
    The CPU is a single Intel Xeon 2.2 GHz.
    Hyperthreading can be turned on or off in the BIOS of the machine. I turned it on before I installed Win2K.

    The system was seen as a dual CPU machine from the time I installed it from the original CD, before I applied any service pack.

    If I disable hyperthreading in the BIOS and boot Win2K, then I only see one CPU.

    I have a second Xeon CPU on order for this machine as it is dual capable. Once I get it, it should make it look like a quad CPU in Win2K.

    FYI, I am also running another OS on the system, Warp Server for E-business with the SMP kernel. Unfortunately the OS2APIC.PSD driver only detected one CPU even with hyperthreading enabled. I contacted the OS/2 kernel developer at IBM Austin, who told me that somehow there needed to be explicit support for it in OS/2 SMP for it to work.

    I also left about 20 GB unpartitioned on my hard disk for Linux, but I haven't gotten around to installing it yet. Thread support in Linux has historically been poor and this is the main reason why I haven't done so. With the availability of the NPTL library, I'm looking forward to installing Linux, as NPTL becomes the standard pthreads library for Linux.

    --
    -- Julien Pierre http://www.madbrain.com/blog
  52. Apples to Oranges... by Anonymous Coward · · Score: 0

    "Years later when Apple brought dual-processing to its PowerMac line, SMP was officially mainstream"

    Actually, it's the other way around, there were dual processor PowerMacs, as well as BeBoxes, long before dual Celeron was available. What--more than any other single thing--brought SMP popularity to personal users was the Abit BP6.

  53. More good news by DeadBugs · · Score: 2

    According to this article, Windows XP Home and Pro already support Hyperthreading, as does Linux kernel 2.4.x and later.

    ASUS has released BIOS upgrades to the P4T533 line of motherboards that now support Hyperthreading.

    And rumors persist that Hyperthreading is on the current P4 chips (Socket 478?) and may be enabled at a later time if all goes well.

    --
    http://www.kubuntu.org/
    1. Re:More good news by grahamm · · Score: 1

      /proc/cpuinfo shows the p4 in my system supports hyperthreading, but the BIOS does not enable it - so I cannot use it.

      Why does Linux (2.4) need BIOS support for hyperthreading?
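
      For anyone who wants to script the check described above, here is a minimal sketch in C. It assumes the usual x86 layout where /proc/cpuinfo has a "flags" line of space-separated tokens; the function name cpuinfo_has_flag is made up for this example. As the comment notes, finding "ht" only says the CPU advertises the capability, not that the BIOS has enabled the second logical processor.

```c
#include <string.h>

/* Hypothetical helper: return 1 if `flag` appears as a whole
 * whitespace-delimited token on a /proc/cpuinfo "flags" line, else 0.
 * A plain substring search is not enough, since "ht" would also
 * match longer tokens that merely contain those letters. */
int cpuinfo_has_flag(const char *flags_line, const char *flag)
{
    size_t len = strlen(flag);
    const char *p = strchr(flags_line, ':');   /* skip the "flags :" label */

    if (len == 0)
        return 0;
    p = p ? p + 1 : flags_line;
    while ((p = strstr(p, flag)) != NULL) {
        /* The token must be bounded by separators on both sides. */
        int starts = (p == flags_line) || p[-1] == ' ' || p[-1] == '\t' || p[-1] == ':';
        int ends = p[len] == '\0' || p[len] == ' ' || p[len] == '\t' || p[len] == '\n';
        if (starts && ends)
            return 1;
        p += len;
    }
    return 0;
}
```

      A caller would read /proc/cpuinfo line by line, find the line that begins with "flags", and pass it to this function with "ht" as the second argument.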

    2. Re:More good news by DeadBugs · · Score: 2

      The BIOS is a common place to enable or disable features for the CPU such as cache or a CPU ID. It's also very common to have a BIOS upgrade contain new features. In the past I have had BIOS upgrades that supported new features for AGP performance and memory performance.

      Intel may have disabled Hyperthreading in its current CPUs to prove out the technology first, and once they are ready a simple BIOS update may enable this feature.

      --
      http://www.kubuntu.org/
  54. Mitigating branch misprediction? by Xife · · Score: 1

    Due to the restrictions for queueing instructions on a given thread (50% for each virtual processor), I would think this would help the P4 in a couple ways.

    This is kind of like reducing the 20 stage pipeline to a 10 stage pipeline. Caveat being Hyperthreading which dilutes the number of instructions from a mispredicted thread, but does not restrict the pipeline stages.

    It kind of makes me wonder why Intel didn't stop at Superthreading with a hardcoded interleaving of the 2 threads... That would give a hard and fast improvement on branch misprediction and it might have made a lot of the other logic much simpler.

    --
    ---- Smokin' another sig.
  55. A bit OT, but I've been wondering about Xeons... by Mitchell+Mebane · · Score: 1

    I've got an Abit VP6 mobo, and I currently have dual PIII 866s in it. What I was wondering is, can I use PIII Xeon chips in it, or do they require a special board? 'Cause dual 1 GHz Xeons would be SOOOO sweet...

    --

    The roots of education are bitter, but the fruit is sweet.
    --Aristotle
  56. Re:A bit OT, but I've been wondering about Xeons.. by Anonymous Coward · · Score: 0

    Just buy dual 1GHz P3s, it would be the same thing.

    First of all, Xeons use Slot 2. Second, they are essentially EXACTLY THE SAME CORE as normal P3s if you get 256KB cache Xeons. The higher cache models are still the same core just with more cache. Plus I don't think they make the faster P3 Xeons with more than 256KB of cache, which essentially makes them only marketable to people that don't know what they are doing.

    Don't waste your money on P3 Xeons. Normal 1GHz P3s do dual just fine and cost tons less, and would perform *identically*. That is why Intel removed SMP capability from normal P4s: so that people who want a new Intel-based SMP system have to buy their ludicrously priced line of processors, which again are pretty much exactly the same except with a different socket.

  57. What does this say to the market? by Quirk · · Score: 1

    Presently the high-end PC market is drying up as users come to realize a Celery 1 GHz and some vid RAM will take them anywhere they want to go. With them went the market that allowed Moore's Law to propagate during the commoditization (ugly word) of the PC. Aside from the server/workstation market, who are the buyers for this technology?

    --
    "Academicians are more likely to share each other's toothbrush than each other's nomenclature."
    Cohen
  58. SMT = bullshit marketing to sell expensive chips by irritating+environme · · Score: 1

    here's a question:

    Why use SMT on a $500-1000 processor to effectively subdivide 3 GHz among processes when there should be motherboards that support four to eight $50 processors that run at 2 GHz, and allow the subdivision of 8, 10, whatever gigahertz among processes?

    Basically, Intel doesn't have a mass-market application for gigahertz processors. DiVX encoders may need it, but ma on IE doesn't. So they try to lump as much processing onto the CPU as possible, damned if it should be there. If they had their way, your NVIDIA 3D processing would be directly on the chip too (SSE, MMX, or AMD's 3DNow!). But they can't do that economically with what's needed.

    --


    Hey, I'm just your average shit and piss factory.
  59. Sun's MAJC tried to do that by orz · · Score: 2

    Sun's MAJC (Microprocessor Architecture for Java Computing or something like that) tried to automatically and transparently split threads into multiple threads using some kind of weird speculative logic. I don't think it worked too well...

    Incidentally, that chip was also supposed to do SMT and single-chip SMP and SIMD. Dunno how well it fared; I kinda forgot about the chip after its second schedule slip, and I haven't seen it mentioned much since then... it should have been out for at least a year now.

  60. a clarification by Anonymous Coward · · Score: 3, Informative

    Since lots of people seem to be missing the point of "hyperthreading", as Intel is calling it, I feel like jumping in and trying to clarify a little bit.

    Processor clocks have gotten faster and faster and faster and faster over the last decade. Multiple orders of magnitude faster. Not only that, but processors have incorporated increasingly clever tricks to process the data they have available to them. Memory speeds have increased too, but even with DDR and all that great stuff, they haven't kept pace. So there are times when your super-fast processor is just sitting there waiting around because it's run out of data to process.

    Even if you could (cheaply) make memory that actually ran at 2 GHz or whatever, this would not solve an even more fundamental problem that makes the situation worse: due to the speed of light, a 2 GHz processor is going to have to wait a really significant amount of time if it has to wait on main memory before it's time to process something.

    So, here's a question for you: if the processor has to wait a really long time, maybe enough time to execute something like 50 instructions, what should it do during that time? Should it:

    1. Sit on its butt and do absolutely nothing at all, or
    2. Quickly flip over to another thread and start executing its instructions?

    Well, the idea behind the hyperthreading (a/k/a thread-level parallelism) is that the processor should make some sort of effort to do something.
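
    To put rough numbers on that choice, here is a back-of-the-envelope sketch in C. The function names and the 25 ns latency used in the usage note are assumptions picked only to reproduce the "50 instructions" figure at one instruction per cycle; real memory latencies vary. The second function is the textbook idealized model of switching between threads on a stall, not a description of how Intel's hardware actually schedules.

```c
/* Cycles a core spends waiting on one memory access.
 * GHz times nanoseconds gives cycles directly. */
double stall_cycles(double clock_ghz, double latency_ns)
{
    return clock_ghz * latency_ns;
}

/* Idealized core utilization when n_threads hardware threads each
 * compute for `busy` cycles and then stall for `stall` cycles, and
 * switching between threads costs nothing (an assumption).
 * One thread keeps the core busy for busy/(busy+stall) of the time;
 * extra threads fill the gaps until the core is saturated at 1.0. */
double core_utilization(int n_threads, double busy, double stall)
{
    double u = n_threads * busy / (busy + stall);
    return u > 1.0 ? 1.0 : u;
}
```

    For example, stall_cycles(2.0, 25.0) gives 50 cycles of waiting, and with 50 busy cycles per 50 stalled cycles the model says one thread keeps the core half busy while two threads can keep it fully busy. Real workloads land somewhere in between, because the threads also compete for cache and execution units.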

    So, IMHO hyperthreading isn't stupid or a marketing ploy. It's a genuine attempt (one that many processor makers are working on, by the way) to solve a genuine problem. And not only a genuine problem, but one that will increasingly become a bottleneck. (It's already bad enough that it has its own name: "The Von Neumann Bottleneck".)

    And by the way, the advantage of this over two processors is that you don't have to build two chips! You don't get double the performance, but it's quite possible that you might get a better bang for the buck. (Notice I said "might".)

    Also note on the cache pollution issue (where one thread slows down another by "hogging" the cache and actually causing slower execution for another) that there are ways to mitigate this problem. An obvious one that comes to mind is to bias the processor towards executing a particular one of the threads. That way, one thread runs much more often and should tend to have what it needs in the cache.

    Anyway, until the economy gets better and I find a way not to be one of the masses of unemployed software developers anymore, I'm not buying one of these fancy processors...

    1. Re:a clarification by Anonymous Coward · · Score: 0

      Not so much 'might' get better performance; in our experience at this datacenter, you pretty much always do. We run some of these newer servers with 4-way P4 Xeon 1.5's. From our performance metrics under heavy loads on Win2000A/Citrix MetaFrame 1.8 servers running JD Edwards (a very CPU-intensive financial app), we see CPU utilization about 1/2 of comparable Xeon 700's, and performance about 1/2 again faster.

      It's a moot point, as if your OEM is selling you exclusively Xeon SMP servers (and all of them are), you have no choice but to get P4X's these days. You can always disable it in the BIOS, though.

  61. Lock Granularity by Ben+Jackson · · Score: 3, Interesting

    All the systems I have seen are either broken or have so many locks in them that they may as well be single-threaded.

    Don't you mean they had so few locks in them that they might as well be single-threaded? Having more locks isn't a bad thing unless your critical sections have to hold more than one or two locks at a time. After all, you've got to have some kind of mutual exclusion when modifying global data, and you can only have as many threads holding locks as there are locks to hold!

    To scale well you want to lock data rather than code and that can lead to many locks when you are operating on many structures. Ideally these locks each have less contention and better data sharing than "bigger" locks.
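
    A minimal sketch of "locking data rather than code", using POSIX mutexes. The table, its size, and the function names are hypothetical and only illustrate the idea: each bucket carries its own lock, so threads touching different buckets never contend with each other.

```c
#include <pthread.h>
#include <stddef.h>

#define NBUCKETS 8   /* arbitrary size chosen for the sketch */

struct bucket {
    pthread_mutex_t lock;   /* protects only this bucket's count */
    long count;
};

static struct bucket table[NBUCKETS];

/* Give every bucket its own mutex. Call once before any threads start. */
void table_init(void)
{
    for (size_t i = 0; i < NBUCKETS; i++) {
        pthread_mutex_init(&table[i].lock, NULL);
        table[i].count = 0;
    }
}

/* Update one bucket while holding only that bucket's lock. */
void table_add(unsigned key, long delta)
{
    struct bucket *b = &table[key % NBUCKETS];

    pthread_mutex_lock(&b->lock);
    b->count += delta;
    pthread_mutex_unlock(&b->lock);
}

/* Read one bucket under the same per-bucket lock. */
long table_get(unsigned key)
{
    struct bucket *b = &table[key % NBUCKETS];
    long value;

    pthread_mutex_lock(&b->lock);
    value = b->count;
    pthread_mutex_unlock(&b->lock);
    return value;
}
```

    The "lock the code" alternative would be a single global mutex taken at the top of table_add, which serializes every caller regardless of which bucket it touches. The per-bucket version stays simple because no operation ever needs two locks at once.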

    1. Re:Lock Granularity by spitzak · · Score: 3, Interesting

      What I meant is the programs I have seen lock a piece of critical data in such a way that it is impossible for any two threads to be unlocked at any time. The code typically was like this:

      for (;;) {
          lock(big_lock_shared_by_everybody);
          figure_out_what_to_do();
          lock(small_lock_around_my_work);
          /* still holding the big lock, so every other thread waits here */
          do_about_95_percent_of_the_work();
          unlock(big_lock_shared_by_everybody);
          do_about_5_percent_of_the_work();
          unlock(small_lock_around_my_work);
          do_a_bit_more_that_should_be_locked_anyway();
          wait_for_next_message();
      }
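
      To see why a loop like that gains almost nothing from extra threads, Amdahl's law gives a quick upper bound. This is a sketch under the simplifying assumption that everything done while holding the shared lock is strictly serial and the rest overlaps perfectly; max_speedup is a name made up for this example.

```c
/* Upper bound on speedup with n_threads when `locked_fraction` of each
 * iteration runs under a lock shared by every thread (Amdahl's law).
 * The locked part cannot overlap; only the remainder is divided
 * among the threads. */
double max_speedup(double locked_fraction, int n_threads)
{
    return 1.0 / (locked_fraction + (1.0 - locked_fraction) / n_threads);
}
```

      With 95% of the work under the big lock, two threads can be at most about 1.03 times faster than one, and no number of threads gets past 1/0.95, roughly 1.05.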

  62. Prescott... by Anonymous Coward · · Score: 0

    Coming from a limey perspective, Prescott brings up images of our beloved deputy PM - not a pleasant sight: John Prescott in his prime.

    He's also known for being large, rough and prone to overconsumption of resources (2 Jags) - maybe Intel do know what they're doing.....

  63. Another good HT on Extremetech by Gremlin77 · · Score: 1

    If green text on a black background isn't your style, check out the HT article over at www.extremetech.com

  64. my benchmark on a dual Xeon system by dermond · · Score: 1

    running factor 12344322343342231127

    number of parallel processes / total execution time

    1    2.870
    2    2.995
    3    3.530
    4    4.005
    5    5.126
    6    6.055
    7    7.313
    8    8.049

    there are 2 Xeon 2 GHz CPUs in the system with hyperthreading activated, thus pretending there are 4 CPUs

    in the case of 4 or 8 parallel processes the execution time is still only about 0.70 times (70% of) what one would expect from only 2 CPUs. that means about 40% more performance.. not too bad..
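
    The arithmetic behind that estimate, as a small C sketch. It assumes that without hyperthreading n processes on p physical CPUs would take n/p times the single-process time; the function names are invented for this example.

```c
/* Ratio of the measured time to the time expected if only the physical
 * CPUs did the work: n_procs processes on physical_cpus CPUs would
 * ideally take t1 * n_procs / physical_cpus (an assumption). */
double time_ratio(double t1, int n_procs, int physical_cpus, double measured)
{
    return measured / (t1 * n_procs / physical_cpus);
}

/* Extra throughput implied by a time ratio: finishing in 0.70 of the
 * expected time means doing 1/0.70 times as much work per second. */
double throughput_gain(double ratio)
{
    return 1.0 / ratio - 1.0;
}
```

    Plugging in the table: 4 processes would be expected to take 2 * 2.87 = 5.74 and measured 4.005, a ratio of about 0.70; 8 processes would be expected to take 11.48 and measured about 8.05, again about 0.70. A ratio of 0.70 corresponds to a throughput gain of roughly 43%.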

    mond.

  65. Re:SMT = bullshit marketing to sell expensive chip by Scooter · · Score: 1

    From my understanding of the article, HT (with appropriately written code executing) drastically reduces wasted cycles in the execution core, so what you should be comparing is 4 or 8 normal CPUs with 4 or 8 HT-capable CPUs. And yes, I'm sure they'll cost more, at least at first. A Porsche 993 costs 10 times as much as a Ford Focus but goes less than twice as fast, yet in certain circumstances it's useful to have all the power in one unit.

    I would agree that it makes no sense right now to buy a 2 CPU HT architecture system over an 8-way (or however many you can get for the same cash) SMP system, but as the technique is perfected the price of these CPUs will come down, and as compilers and application/OS developers use more multi-threading in their products, it will make more sense.

  66. Placing an Idea in the Public Domain... by StevenMaurer · · Score: 2

    Just so that it won't be later patented...

    The starvation issues with simultaneous multithreading can easily be addressed by keeping an instruction count for each virtual thread, perhaps hooked to an interrupt the OS can use to tell when each thread has consumed its allotted processor resource.

    That way, threads that have been starved for resources will remain in the processor core longer than any that happen to "hog" a resource. In other words, instead of time slicing, you can use instruction slicing to ensure fair use of the scheduler between contending threads.

    Voilà! Problem solved. (Not counting the dozen man-years it would take to implement.)
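
    A toy sketch of the bookkeeping this idea implies, in C. Everything here is hypothetical: the struct, the per-thread quota, and the policy of running whichever thread has retired the fewest instructions are just one way to read the proposal, not anything a real processor or OS is known to do.

```c
#include <stddef.h>

/* Hypothetical per-virtual-thread accounting. */
struct vthread {
    unsigned long retired;   /* instructions retired in the current slice */
    unsigned long quota;     /* instruction budget for the slice */
};

/* The point at which the proposed interrupt would fire: the thread
 * has used up its instruction budget. */
int slice_expired(const struct vthread *t)
{
    return t->retired >= t->quota;
}

/* Pick the thread that has retired the fewest instructions, so a
 * starved thread gets the core before one that has been hogging it.
 * Returns (size_t)-1 when there are no threads. */
size_t pick_next(const struct vthread *threads, size_t n)
{
    size_t best = (size_t)-1;

    for (size_t i = 0; i < n; i++) {
        if (best == (size_t)-1 || threads[i].retired < threads[best].retired)
            best = i;
    }
    return best;
}
```

    The difference from ordinary time slicing is the unit being counted: a thread that spent its slice stalled on memory has retired few instructions, so it is not charged for time it could not use.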

  67. Re:SMT = bullshit marketing to sell expensive chip by Anonymous Coward · · Score: 0

    I think the poster's argument goes like this.

    When you HT a CPU you, basically, divide the core between two processes. So a 2.8GHz CPU roughly gives out 1.4G to each process.

    Today, a dual 1.4GHz CPU costs $200. A single 2.8GHz Xeon CPU costs $500.

    Could that $300 difference be better spent on making a dual/quad/N-way support chipset?

    Yes, the 2.8G will get cheaper, but so will the 1.4. Someday a 4G Xeon will come 'round, but the 2G would still be far cheaper.

    The point is, I think, that HT is an incremental advantage. Yes, it keeps more transistors firing on the CPU, but if you double the number of transistors with multi-CPU SMP you end up with even more of them to fire -- at less cost.

    What the poster misses, tho, is that it costs Intel just about as much to make a 2.8G part as it does a 1.4. It would much rather sell 1 CPU at the higher price than 4 at the lower.

  68. Hyperthreading at UCSD, and why the Tera Sucks by ShakaUVM · · Score: 3, Interesting

    UC San Diego has been a leader in research on hyperthreading. We used to have the Tera MTA, which kinda pioneered the whole field, and we have Dr. Dean Tullsen (and his lab of students), whose hyperthreading architecture was used in the new, now-cancelled Alpha chip.

    References: The Tera: http://www.cs.ucsd.edu/users/carter/Tera/tera.html
    Dean Tullsen: http://charlotte.ucsd.edu/users/tullsen/

    I was one of the first five students to use the Tera after it came out of development. I decided to take a different approach in evaluating its performance, because I didn't like what the Tera corporate benchmarkers were doing: taking applications with known parallelism, writing a serial version of the code, and then posting glowing reviews of the Tera automatically finding the parallelism, while ignoring that the pragmas they had to put into the code to allow the compiler to discover that parallelism were more work than just writing parallel code oneself.

    I instead called them on their advertising that their compiler could discover latent parallelism in any computation-heavy code. I noticed John Carmack's .plan file at the time openly questioned the same claim, so I took the single-threaded, computation-intensive utility for Quake2 (BSP; LIGHT & VIS are multithreaded) and ran it on the Tera. Nutshell: it couldn't find parallelism. The 300 MHz Tera supercomputer ran at the equivalent speed of a 600 MHz Pentium. Which is crap considering the incredible memory bandwidth and number of computational units it had available.

    When I reported the results to Carmack, his response was, "I have never been a big believer in magically parallizing dusty deck codes. I don't mind specifying explicitly parallel activities and threads, especially with the large payoffs involved."

    Cheers,
    Bill Kerney

  69. or kleenex vs tissue! by lopati · · Score: 1

    that is all :D

  70. Re:A bit OT, but I've been wondering about Xeons.. by Mitchell+Mebane · · Score: 1

    Darn... I thought all the P3 Xeons had the 2MB cache... thanks!

    --

    The roots of education are bitter, but the fruit is sweet.
    --Aristotle
  71. Last Post! by alpg · · Score: 1

    Very few things actually get manufactured these days, because in an
    infinitely large Universe, such as the one in which we live, most things one
    could possibly imagine, and a lot of things one would rather not, grow
    somewhere. A forest was discovered recently in which most of the trees grew
    ratchet screwdrivers as fruit. The life cycle of the ratchet screwdriver is
    quite interesting. Once picked it needs a dark dusty drawer in which it can
    lie undisturbed for years. Then one night it suddenly hatches, discards its
    outer skin that crumbles into dust, and emerges as a totally unidentifiable
    little metal object with flanges at both ends and a sort of ridge and a hole
    for a screw. This, when found, will get thrown away. No one knows what the
    screwdriver is supposed to gain from this. Nature, in her infinite wisdom,
    is presumably working on it.

    - this post brought to you by the Automated Last Post Generator...