Slashdot Mirror


Multithreading - What's it Mean to Developers?

sysadmn writes "Yet another reason not to count Sun out: Chip Multithreading. CMT, as Sun calls it, is the use of hardware to assist in the execution of multiple simultaneous tasks - even on a single processor. This excellent tutorial on Sun's Developer site explains the technology, and why throughput has become more important than absolute speed in the enterprise. From the intro: Chip multi-threading (CMT) brings to hardware the concept of multi-threading, similar to software multi-threading. ... A CMT-enabled processor, similar to software multi-threading, executes many software threads simultaneously within a processor on cores. So in a system with CMT processors, software threads can be executed simultaneously within one processor or across many processors. Executing software threads simultaneously within a single processor increases a processor's efficiency as wait latencies are minimized. "

357 comments

  1. -1, Redundant: Hyperthreading. by Anonymous Coward · · Score: 2, Insightful

    How long has hyperthreading been available on Intel CPUs?

    1. Re:-1, Redundant: Hyperthreading. by Anonymous Coward · · Score: 0

      Hyperthreading is a variant of SMT, Simultaneous Multi-Threading.

      How does CMT compare? Is it a variant of SMT, or is it an all-encompassing term covering both multi-core processors and multi-threaded cores?

    2. Re:-1, Redundant: Hyperthreading. by rcamans · · Score: 1

      Sorry, but hyperthreading is not available on Celerons.
      Only on the expensive stuff.

      --
      wake up and hold your nose
    3. Re:-1, Redundant: Hyperthreading. by Anonymous Coward · · Score: 0

      Compared to Sun hardware, anything Intel makes is cheap.

    4. Re:-1, Redundant: Hyperthreading. by johnhennessy · · Score: 5, Informative

      There are some significant differences between hyperthreading and Sun's approach.

      Tiny amount of background:

      Hardest part when trying to run things in parallel is figuring out what you can run in parallel. Example: two operations (pseudocode): c=a+b and d+c+e. These two cannot be run in parallel, since you need the result of a+b before you can start c+e.

      With modern operating systems there are many programs running at one time, and they may contain separate threads. One assumption of threading is that threads can run asynchronously to one another: you will not get a situation like the one above (okay, okay, I'm simplying!).

      With Hyperthreading, Intel gets the CPU to pretend to the OS that there are actually two of them. They duplicate the fetch and decode units but use only one execute unit, which probably has several FPUs and integer units. They rely on an FPU or an integer unit being free to get a performance benefit.

      So Intel has (until now) duplicated the fetch and decode stages but kept the same execute unit.

      Sun's approach is to replicate the whole pipeline: fetch, decode, execute. Intel can't really scale hyperthreading beyond two "processors", whereas Sun is aiming to execute 8, 16 or even more threads at one time.

      Because of Intel's architecture, they can't really scale hyperthreading this way, for lots of reasons. I'm sure other people can add them.

      This really won't be of huge benefit to your Doom3 FPS, but for business apps (think J2EE) or message queues or science applications it will allow compute servers to scale better at heavy loads (i.e. when lots of threads are doing something that isn't IO bound, at the same time).

      --
      [ Monday is a terrible way to spend one seventh of your life. ]
    5. Re:-1, Redundant: Hyperthreading. by Anonymous Coward · · Score: 3, Interesting


      The moderators are on crack today. Intel's hyperthreading is more of a marketing gimmick (which you fell for). It provides, what, a few percent improvement in performance?

      The fact is that Intel's Pentiums spend most of their time _not_doing_anything_at_all_. They just sit there waiting on data.

      Sun's Niagara will be able to queue 32 threads simultaneously, with 8 of those threads computing (8 cores). My guess is that Sun's analysis showed that, on average, three threads are waiting on memory while one can go forward with the data it has. This means that Sun is betting on your beloved Pentium being only 25% efficient!

      I think I've also read that Sun is planning on giving Niagara obscene amounts of bandwidth to RAM. In short, if you are running a web server, for example, it would be stupid to stick with something like Pentium when something like Niagara is available.
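      The 25% figure is simple arithmetic. A minimal Python sketch of it, under an idealized model that is an assumption and not Sun's published analysis: every thread spends a fixed fraction of its time stalled on memory, and the stalls of threads sharing a core overlap perfectly, so the result is an upper bound rather than a measurement. The function name is hypothetical.

```python
def core_utilization(threads_per_core: int, stall_fraction: float) -> float:
    """Upper bound on the fraction of cycles a core spends doing useful work.

    Idealized model (an assumption): each thread is ready to run
    (1 - stall_fraction) of the time, and the core always finds a ready
    thread whenever the threads' combined demand allows it.
    """
    ready_per_thread = 1.0 - stall_fraction
    return min(1.0, threads_per_core * ready_per_thread)


CORES = 8
THREADS_PER_CORE = 4  # 8 cores x 4 threads = the 32 threads mentioned above

# One thread per core, stalled 75% of the time: the core is busy 25% of the time.
single = core_utilization(1, 0.75)
# Four threads per core can, at best, cover each other's stalls completely.
niagara_style = core_utilization(THREADS_PER_CORE, 0.75)
print(CORES * THREADS_PER_CORE, single, niagara_style)  # 32 0.25 1.0
```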

    6. Re:-1, Redundant: Hyperthreading. by Anonymous Coward · · Score: 1, Informative


      Go price it out, apples to apples, and Sun really can compete with Dell. It will take people a while to really understand this (a double-take is probably in order), but it is true.

    7. Re:-1, Redundant: Hyperthreading. by Anonymous Coward · · Score: 0

      Read the damned article asshole!

    8. Re:-1, Redundant: Hyperthreading. by time64_t · · Score: 1, Funny
      > okay, I'm simplying!

      No offence, but did you mean to say

      "I'm simply lying!"

      or

      "I'm simplifying!"

    9. Re:-1, Redundant: Hyperthreading. by Anonymous Coward · · Score: 4, Informative

      Hyperthreading DOES NOT HAVE ADDITIONAL FETCH and DECODE. It just permits 2 different threads to occupy the reorder buffer, reducing the penalties of a context switch: instead of switching contexts, the CPU fools the OS into thinking it can issue two threads of instructions simultaneously. Fetch is designed to switch between instruction memory locations on a turn system, so it starts work on one thread and then in the next cycle begins work on the other thread. It keeps 2 separate rename tables, one for each thread, and tracks which thread a given instruction belongs to. So execute is essentially unchanged, and even the reorder buffer is almost the same, except that it tracks which thread each op belongs to. The tricky part is getting the front end to toggle correctly between the 2 register files and the 2 rename tables. Fetching from different threads of control is also tricky; I think some sort of queue is used.

      FYI, hyperthreading is used on Intel because of the number of instructions in flight. During a context switch the processor interrupts, saves state to the stack, and clears out the register file, rename table, and ROB, losing all the work not yet written back to the register file. On an AMD processor this is not a huge deal, but on the P4 it is a problem, because the frequent context switches on modern systems make the Intel design lose the advantage of having many instructions in flight. AMD could realize performance gains too, just not as much, and at the cost of clock speed.

      As for CMT: no, it is essentially hyperthreading, though it could be a better, more costly, more effective design than Intel's simple one. Duplicating the whole pipeline gives you a multicore chip, which is what Sun is doing with Niagara.
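      The "turn system" fetch described above can be pictured with a toy Python model. Everything here is a hypothetical illustration, not Intel's actual front end: each thread context keeps its own program counter, fetch alternates between contexts every slot, and every fetched op is tagged with its thread ID, the way the reorder buffer is said to track which thread an op belongs to.

```python
from dataclasses import dataclass


@dataclass
class ThreadContext:
    """Per-thread front-end state (hypothetical): its own instruction stream and PC."""
    tid: int
    program: list
    pc: int = 0

    def done(self) -> bool:
        return self.pc >= len(self.program)


def fetch_round_robin(contexts):
    """Yield (thread_id, instruction) pairs, switching thread every fetch slot.

    A finished thread gives up its turn so the remaining thread keeps the
    front end busy.
    """
    turn = 0
    while not all(ctx.done() for ctx in contexts):
        ctx = contexts[turn % len(contexts)]
        turn += 1
        if ctx.done():
            continue
        yield ctx.tid, ctx.program[ctx.pc]  # op is tagged with its thread ID
        ctx.pc += 1


t0 = ThreadContext(0, ["load r1", "add r1,r2"])
t1 = ThreadContext(1, ["mul r3,r4", "store r3", "add r5,r6"])
for tid, op in fetch_round_robin([t0, t1]):
    print(tid, op)
```

      Once thread 0 runs out of instructions, thread 1 gets every slot, which is the point of sharing one front end between two threads.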

    10. Re:-1, Redundant: Hyperthreading. by Anonymous Coward · · Score: 0

      I was hired to evaluate software tools supporting intel's 80960 series in 1987.

      The 80960 was (is?) a RISC/CISC chip with the benefits of fast RISC-like functions but also having the complex higher-level functions available (I think on a separate pipeline, in order to prevent them from blocking the RISC pipeline.)

      One key feature of that processor was that it supported threading directly in the instruction set - with a set of thread lock, unlock, and switch operations in single opcodes, they really WERE atomic functions.

      Unfortunately (in my opinion) the x86 architecture really took off around that time, and the commoditization of that architecture meant that Intel relegated the 80960 to the embedded market.

      And no, there is nothing in this note which has not already been made public by Intel.

    11. Re:-1, Redundant: Hyperthreading. by InvalidError · · Score: 2, Interesting

      Actually, Intel's research (before HT became reality) said that on average, the instruction decoder was issuing just under 2.5 instructions per tick out of a maximum of 3... so instruction decoder throughput in single-threaded mode is about 75% of maximum.

      On AMD's side, the decoder has quadruple outputs and IIRC, AMD's average is 3 out of 4 so again 75% from maximum.

      By adding SMT, Intel gave the P4 the potential to keep all instruction ports busy and AMD plans to do the same next year... a single-core A64 with SMT would be interesting but we will have to settle for dual-core dual-threaded A64s and P4s which should be interesting as well.

      How do AMD and Intel manage to get 75% single-threaded when we know they will be stalled by RAM? Simple, out-of-order execution - most CPUs can look 32-128 instructions ahead to find something to do while stalled, this is necessary to maximize single-thread performance and would become unnecessary if apps and CPUs became massively multi-threaded, which appears to be what Sun is gunning for.

      As far as concurrent SMT is concerned, I think four threads per CPU core will turn out to be the practical maximum for desktop chips. We will probably see this happen once the A64/P4/PM are upgraded to six execution ports, three or four years from now.

      The only reason Sun can think of doing an SMTx8/32 chip is because their CPUs run at ~1GHz. At higher speeds, they would not have the necessary timing margins to fit the extra logic to efficiently shuffle execution states between "reserve" and "active" threads.

    12. Re:-1, Redundant: Hyperthreading. by woah · · Score: 0
      I'm simplying!

      Well duh!
      You don't have to rub it in. Just simplify the spelling as you go along. I mean, talk about being redundant.

      *looks at the subject*

      oh... nevermind.

    13. Re:-1, Redundant: Hyperthreading. by pipingguy · · Score: 1


      Hardest part when trying to run things in parallel is figuring out what you can run in parallel. Example: two operations (pseudocode): c=a+b and d+c+e. These two cannot be run in parallel, since you need the result of a+b before you can start c+e.

      You know, that was pretty easy to understand, even for a non-programmer like me. Your overall post must make sense (not that I'd know enough to tell if it didn't), since it's been modded to +5 in the Developers' section.

      Not a troll, I actually have points now and can't mod it up further.

    14. Re:-1, Redundant: Hyperthreading. by lachlan76 · · Score: 3, Informative
      Hardest part when trying to run things in parallel is figuring out what you can run in parallel. Example: two operations (pseudocode): c=a+b and d+c+e. These two cannot be run in parallel, since you need the result of a+b before you can start c+e.

      Not at all, because you can add d+e. For example:
      No multithreading:
      add a,b   ; a = a + b (this is c)
      add d,a   ; d = d + c
      add d,e   ; d = d + c + e

      With multithreading:
      First thread:
      add a,b   ; a = a + b (this is c)
      ;Make sure other thread is finished
      add d,a   ; d = (d + e) + c, same result as above

      Second thread:
      add d,e   ; d = d + e, independent of a + b
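      The same decomposition can be sketched in Python with two worker threads (a toy: for a single addition, thread overhead far exceeds the work). It assumes, as the assembly does, that the sum can be regrouped as (a + b) + (d + e), which is exact for integers but not bit-for-bit guaranteed for floating point, since float addition is not associative. The helper names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def add(x, y):
    return x + y


def sum_two_threads(a, b, d, e):
    """Compute d + (a + b) + e by regrouping it as (a + b) + (d + e)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        c = pool.submit(add, a, b)         # first thread: c = a + b
        d_plus_e = pool.submit(add, d, e)  # second thread: d + e, independent of c
        # .result() blocks until each thread is finished, like the
        # "make sure other thread is finished" step in the assembly.
        return c.result() + d_plus_e.result()


print(sum_two_threads(1, 2, 3, 4))  # 10
```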
    15. Re:-1, Redundant: Hyperthreading. by Anonymous Coward · · Score: 0

      Actually, Intel's research (before HT became reality) said that on average, the instruction decoder was issuing just under 2.5 instructions per tick out of a maximum of 3... so instruction decoder throughput in single-threaded mode is about 75% of maximum.

      On what workloads? SPECint?

      Sun is designing for workloads that real businesses use, which tend to leave the CPU stalled a lot of the time looking for data.

  2. it means a lot by Anonymous Coward · · Score: 4, Informative

    I am a developer, mainly in C, and I did a lot of programming on QNX4 with multi-threading (even if the QNX4 implementation is not *really* threads); now I am doing it in Precise/MQX.
    Multi-threading comes with synchronization, semaphore, mutex, etc, once you know how to deal with them, it's easy.
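    As a minimal illustration of the basic tool, here is a Python sketch (Counter and run_counter are hypothetical names) of a mutex protecting a shared read-modify-write. Without the lock, concurrent `value += 1` updates from different threads can interleave and lose increments.

```python
import threading


class Counter:
    def __init__(self):
        self._lock = threading.Lock()  # the mutex guarding `value`
        self.value = 0

    def increment(self):
        # Read-modify-write: only one thread at a time may execute this block.
        with self._lock:
            self.value += 1


def run_counter(n_threads=4, per_thread=10_000):
    counter = Counter()

    def worker():
        for _ in range(per_thread):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every worker before reading the total
    return counter.value


print(run_counter())  # 40000: no increments are lost
```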

    1. Re:it means a lot by Frymaster · · Score: 1
      once you know how to deal with them, it's easy

      that's all well and good from a developer standpoint. but for the end user, the problem is going to be software availability.

      witness altivec: apple's vector processing promised to offer all sorts of wild and crazy performance gains, but the prospect of massive refactoring of existing codebases prevented it from being widely adopted. the result is that even though your spiffy new g5 has altivec under the hood, aside from photoshop, there isn't really any software available for it.

      i can foresee sun running into the same problem.

    2. Re:it means a lot by BigSven · · Score: 2, Informative

      The CVS version of GIMP has Altivec support. That makes it two applications already ;)

    3. Re:it means a lot by Waffle+Iron · · Score: 5, Insightful
      Multi-threading comes with synchronization, semaphore, mutex, etc, once you know how to deal with them, it's easy.

      I know how to deal with them. It may seem easy at first, but it's actually very hard. Your program can run for days before a thread synchronization bug surfaces and it finally deadlocks. And since it's timing dependent, you can't reproduce it.

      In principle there are rules to follow to avoid deadlocks and race conditions, but since they need to be manually enforced, there's always potential for error. At least with memory access bugs the hardware often shows you a segfault; with synchronization problems you usually don't even get that.

      I've learned over the years that preemptive multithreading should be used only as a last resort, and even then, it's best to put exactly one synchronization point in the entire app. Self-contained tasks should be dispatched from that point and deliver their results back with little or no interaction with the other threads.

      The worst thing you can do is randomly sprinkle a bunch of semaphores, mutexes, etc. all over your app.
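      One way to get the "single synchronization point" structure in Python is a thread pool: tasks are dispatched from one place, never touch shared state, and hand their results back through the pool. A minimal sketch; word_count and count_all are hypothetical stand-ins for real self-contained tasks.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text):
    """A self-contained task: reads only its argument, writes no shared state."""
    return len(text.split())


def count_all(documents, workers=4):
    # The executor is the one synchronization point: work goes out through
    # map() and results come back in input order; tasks never talk to each other.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(word_count, documents))


print(count_all(["the quick brown fox", "jumps", ""]))  # [4, 1, 0]
```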

    4. Re:it means a lot by fitten · · Score: 2, Interesting

      That's fine for producer/consumer type problems, but there are other types of problems that don't lend themselves to that model.

      I've been programming multithreaded code for a while, too, and giant locking (which is what you describe) is not very efficient much of the time for what I've done in the past. Linux and Solaris had this type of architecture for the kernel at one time and they've long since evolved away from that.

      In short, how you use threads really depends on what you are trying to do. Hammering all multi-threaded programming into this one model may not be efficient or easy. That model does serve nicely for a number of tasks, but not all.
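      The difference between one giant lock and finer-grained locking can be sketched with lock striping: keys are spread over several independently locked buckets, so threads touching different buckets do not contend. A minimal Python sketch (StripedDict is a hypothetical name; under CPython's GIL it shows the structure, not a speedup). Java's ConcurrentHashMap used a similar segment-locking scheme in Java 5 through 7.

```python
import threading


class StripedDict:
    """A dict split into buckets, each guarded by its own lock."""

    def __init__(self, stripes=16):
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._buckets = [{} for _ in range(stripes)]

    def _index(self, key):
        # The same key always maps to the same bucket and lock.
        return hash(key) % len(self._locks)

    def put(self, key, value):
        i = self._index(key)
        with self._locks[i]:  # lock only this key's bucket
            self._buckets[i][key] = value

    def get(self, key, default=None):
        i = self._index(key)
        with self._locks[i]:
            return self._buckets[i].get(key, default)


d = StripedDict(stripes=4)
d.put("alpha", 1)
print(d.get("alpha"), d.get("beta"))  # 1 None
```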

    5. Re:it means a lot by leonmergen · · Score: 4, Interesting

      I've learned over the years that preemptive multithreading should be used only as a last resort, and even then, it's best to put exactly one synchronization point in the entire app. Self-contained tasks should be dispatched from that point and deliver their results back with little or no interaction with the other threads.

      Exactly, and that's where design patterns come into play... many of these problems have been formally described in patterns you can follow to avoid this; with thread synchronization, you can use the Half-Sync/Half-Async pattern for example, and you can make a task an Active Object so it can deliver its own results...

      Multi-threaded programming is hard, very hard; but you're not the only one who thinks so, and many researchers have formally described a set of rules you can follow... if you follow them, you will often eliminate most of the more complicated problems.
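      A minimal Python sketch of the Active Object idea (ActiveCounter is a hypothetical name): callers invoke methods that return immediately with a Future, while the object's own single thread executes the queued requests in order. Here a one-worker ThreadPoolExecutor stands in for the pattern's activation queue and scheduler; a full implementation of the pattern would have explicit request and scheduler objects.

```python
from concurrent.futures import ThreadPoolExecutor


class ActiveCounter:
    def __init__(self):
        # One worker = one private thread that owns this object's state.
        # Requests run on it one at a time, in submission order.
        self._scheduler = ThreadPoolExecutor(max_workers=1)
        self._count = 0  # only ever touched by the private thread

    def _do_add(self, n):
        self._count += n
        return self._count

    def add(self, n):
        """Enqueue the request and return a Future; the caller does not block."""
        return self._scheduler.submit(self._do_add, n)

    def close(self):
        self._scheduler.shutdown(wait=True)


counter = ActiveCounter()
futures = [counter.add(1) for _ in range(5)]
print([f.result() for f in futures])  # [1, 2, 3, 4, 5]
counter.close()
```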

      --
      - Leon Mergen
      http://www.solatis.com
    6. Re:it means a lot by moonbender · · Score: 2, Informative

      I'm not an Apple geek, but from what I read here, OS X itself makes use of AltiVec everywhere it makes sense. That's one application everyone will run 100% of the time. Also, Apple's libraries, which many applications use, are optimised for AltiVec. From the sounds of it, AltiVec is used more than its x86 counterparts.

      --
      Switch back to Slashdot's D1 system.
    7. Re:it means a lot by Anonymous Coward · · Score: 0

      The question is, how many apps CAN benefit from multi-core systems? Graphics editors are the main ones now; I think I also heard about some new video editor too. Eventually game engines will get into it more. I imagine/hope they are already using multithreaded code for AI, physics, player input, etc. Most of my development, though, is office-based applications: leasing software and whatnot. We use tons of threading, but a multi-core processor wouldn't help our apps much, if at all, because we use threads primarily to isolate any wait points away from the GUI. That way the user gets their little flashy progress bar, and the window doesn't turn white and crap out.

      My question is, should the control of threading be the application coder's responsibility, or the OS's responsibility? Should I as a developer say "Send this physics algorithm to proc 3 and send this AI process to proc 5!" or should I say "System, start physics algorithm in new thread. System, start AI process in new thread!" and let the system deal with low-level control?

      -Rick

    8. Re:it means a lot by Anonymous Coward · · Score: 2, Informative

      Recent versions of the GNU Compiler Collection, IBM Visual Age Compiler and other compilers provide intrinsics to access AltiVec instructions directly from C and C++ programs.

    9. Re:it means a lot by iwadasn · · Score: 1


      I disagree. Multithreading is very important for virtually every major business (think j2ee/server) app around, even GUI apps shouldn't be doing much work in the GUI thread.

      However, you are right that you need to be very careful. I would recommend trying to cut your program into well-defined modules (OO programming, coming back again), and then attempt to make each of them as atomic as possible. Also, be careful of callbacks. It's best to only make callbacks with threads that you know for sure cannot be holding a lock. If you've done that, and you make sure that any call into the system can only contend for locks that are sealed completely within the module, and that these locks always have a definite ordering (you always get A before B if you need A and B), then you are at least guaranteed that this module will not participate in a deadlock.
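      The "always get A before B" rule can be enforced mechanically by sorting locks by some global key before taking them. A minimal Python sketch; the Account class and the transfer helpers are hypothetical, and using id() as the ordering key is just one convenient choice (any fixed total order works).

```python
import threading
from contextlib import ExitStack


class Account:
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    if src is dst:
        return  # taking the same non-reentrant Lock twice would self-deadlock
    with ExitStack() as stack:
        # Every thread takes the two locks in the same global order (by id),
        # so two opposite transfers can never wait on each other in a cycle.
        for lock in sorted((src.lock, dst.lock), key=id):
            stack.enter_context(lock)
        src.balance -= amount
        dst.balance += amount
    # ExitStack releases the locks in reverse order on exit.


def repeat_transfer(src, dst, times):
    for _ in range(times):
        transfer(src, dst, 1)


a, b = Account(100), Account(100)
# Opposite directions at the same time: the classic deadlock setup.
t1 = threading.Thread(target=repeat_transfer, args=(a, b, 1000))
t2 = threading.Thread(target=repeat_transfer, args=(b, a, 1000))
t1.start()
t2.start()
t1.join()
t2.join()
print(a.balance, b.balance)  # 100 100
```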

    10. Re:it means a lot by guitaristx · · Score: 5, Insightful

      As far as threading is concerned, one of the few languages I've dealt with that makes mutexes, semaphores, etc. easy to deal with is Java. Most other languages bury the stuff too deep into the proprietary APIs to make them useful. Consider multithreading in win32. We need better programming languages before we can ever start reaping the benefits of good multithreading hardware.

      Furthermore, we need to get rid of lazy programming. I'm tired of watching people write slow, lazy, inefficient (in terms of both memory space AND speed) code, and justify its existence with "it'll run fast on the new über-hyper-monkey-quadruple-bucky processors." Too many times, the problem is that you've got slow code running in every thread. If the code wasn't so damned lazy, programmers would care more about nifty new hardware. We're not even coming close to using our current hardware to capacity. I've got a 1.2GHz processor with 1024MB of RAM, and my box chugs opening an M$ Word doc?! WTF?!

      <soapbox>
      Most programming in the world is very similar to the universal statu$ symbol in the U.S.A. - a big gas-guzzling SUV. It's not like Jane the Soccer Mom really needs 300hp to haul her kids and groceries around town. Similarly, we have lots of lazy code out there that doesn't do much of anything but consume resources and pollute the environment. A nifty new processor feature won't be noticed in the computing world because it won't get used anyway, just like Jane the Soccer Mom wouldn't notice 100 more horsepower. </soapbox>

      --
      I pity the foo that isn't metasyntactic
    11. Re:it means a lot by SunFan · · Score: 1

      the problem is going to be software availability.

      Nope. Niagara is SPARC and will run Solaris. Just like any other Sun server.

      --
      -- Microsoft is the most expensive commodity operating system and office suite vendor in the marketplace.
    12. Re:it means a lot by tim256 · · Score: 1
      If you use Java it's a lot easier. But I agree, thread synchronization is difficult to test. If your application has multiple database connections, that can make things worse.

      GameSpy estimates that the next Xbox will have 3 cores. So finally, we will be seeing video games that use multi-threading. The results will be interesting.

    13. Re:it means a lot by Homology · · Score: 1
      Following design patterns as if they were formal rules to always obey is a sure way to disaster. Design patterns are a kind of generalized solution to a particular problem, and they are intended as a starting point for application to your specific problem.

      There are several patterns that are very useful for safe multithreaded programming. Properly applied, they can greatly reduce the risks of multithreaded programming while reaping some of the benefits.

      Multi-threaded programming in complex applications is very hard to do correctly. I know, I do this type of programming daily. Heck, even trivial programs have real problems with this. And why is that? Because we have problems thinking in parallel all the time. A good system architecture will make it easier to do this correctly.

    14. Re:it means a lot by Homology · · Score: 2, Insightful
      As far as threading is concerned, one of the few languages I've dealt with that makes mutexes, semaphores, etc. easy to deal with is Java. Most other languages bury the stuff too deep into the proprietary APIs to make them useful. Consider multithreading in win32. We need better programming languages before we can ever start reaping the benefits of good multithreading hardware.

      Pure bullshit.

    15. Re:it means a lot by Anonymous Coward · · Score: 0

      Your box chugs opening an MS Word doc? Maybe stop trying to open an MS Word doc in Mandrake Linux under WINE.

      It's funny how XP on a PII-266 with 192 MB of RAM is snappy, whereas any modern Linux on such a machine runs like an old dog.

    16. Re:it means a lot by Anonymous Coward · · Score: 0


      and even then, it's best to put exactly one synchronization point in the entire app.


      Did you design Windows 3?

    17. Re:it means a lot by rs79 · · Score: 1

      "Multithreading is very important for virtually every major business (think j2ee/server) app around"

      It makes you wonder how those Bell lab-rats ever invented unix on a machine that didn't have it; they didn't even have memory management.

      --
      Need Mercedes parts ?
    18. Re:it means a lot by Rattencremesuppe · · Score: 1
      A nifty new processor feature won't be noticed in the computing world because it won't get used anyway,

      this reminds me of the SIMD stuff (it's also a way of "parallel computing"). BTW, has the compiler support for automatic SIMD generation matured in the meantime?

      Last time I checked, if you wanted to use MMX/SSE/etc. on x86, you had to hand-code your stuff in assembler (separately for PIII, PIV, AMD etc.), or buy an expensive Intel compiler.

    19. Re:it means a lot by MORTAR_COMBAT! · · Score: 3, Informative

      exactly, and unlike Altivec, there are no "special instructions" to get benefits from Niagara -- just synchronization, deadlock, and such parallel processing issues which most enterprise software is already aware of.

      (to dumb it down: no new opcodes, existing software will benefit if it can, break if it was poorly written to begin with.)

      --
      MORTAR COMBAT!
    20. Re:it means a lot by fnord_uk · · Score: 1

      Dude, the worst thing you can do is randomly sprinkle anything all over your app, apart from comments, and maybe asserts.

      --
      In theory, theory and practice are the same. In practice, they're not.
    21. Re:it means a lot by SuiteSisterMary · · Score: 1

      The Sega Saturn had two CPUs. The PS2 has multiple vector units that need to be dealt with, as I recall, in a multi-threaded fashion.

      --
      Vintage computer games and RPG books available. Email me if you're interested.
    22. Re:it means a lot by perkr · · Score: 1

      Until Java 1.5 (and the java.util.concurrent API) Java didn't do that much to make it easier to produce multi-threaded programs, the built-in synchronization just tricks new programmers into writing dangerous applications. For example reading a long is not guaranteed to be atomic, there are no built-in facilities for semaphores, etc.

      And regarding your ranting on lazy programmers, programs actually do a lot more stuff now than they used to. Accessibility, i18n, flexible underlying toolkits and APIs, better graphics, more features in general, etc.
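      For comparison, a minimal Python sketch of a counting semaphore, the kind of facility Java gained as java.util.concurrent.Semaphore in 1.5; Python's threading.BoundedSemaphore plays a similar role. It caps how many threads may be inside a section at once, for example to limit concurrent use of a small pool of database connections. run_with_limit is a hypothetical helper that also records the peak concurrency it observes.

```python
import threading
import time


def run_with_limit(n_tasks, limit):
    """Start n_tasks threads, but let at most `limit` of them into the guarded
    section at once. Returns the highest concurrency actually observed."""
    sem = threading.BoundedSemaphore(limit)
    stats_lock = threading.Lock()
    active = 0
    peak = 0

    def task():
        nonlocal active, peak
        with sem:  # blocks while `limit` tasks are already inside
            with stats_lock:
                active += 1
                peak = max(peak, active)
            time.sleep(0.01)  # stand-in for real work, e.g. a database query
            with stats_lock:
                active -= 1

    threads = [threading.Thread(target=task) for _ in range(n_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak


print(run_with_limit(n_tasks=10, limit=3))  # never more than 3
```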

    23. Re:it means a lot by LurkerXXX · · Score: 1
      That's why I was always so impressed with QNX. They fit an OS, network stack, modem driver, GUI, web browser, and a few other utils on a single floppy disk. That's impressive.

      Now I look at my regular computer. Just my email client alone, Thunderbird, takes up 20 MB on my hard drive. Fat. Fat. Fat.

      I'm afraid no one's going to bother optimizing code until Moore's law runs up against some physical engineering walls.

    24. Re:it means a lot by Gauchito · · Score: 2, Insightful

      Another problem with multi-threading is that nothing is a black box anymore (not like anything really is, anyway). Once you start worrying about sharing statics and globals, you need to consider all the accesses done by objects you bring in from other libraries, which means you need to check the source to see if, for example, it uses a static cache (with no locking). Then, you need to dig in, find out why you're getting seg faults or corrupted memory, track down where else you're using this class (could be, for example, inside another object entirely, which could again be from a different library), synchronize with that other thread that you thought was completely unrelated (i.e., used the same class, but had its own instance of it), rinse, and repeat.

      If you are using third-party or homegrown (but not yours) libraries inside your multithreaded program, pretty soon you'll realize that not only do you need to know what and how they are accessing, but you also need to keep close tabs on what changes are done in future releases (again, keeping track of implementation details in those releases). Your quick and highly parallelized threading program just became a maintenance nightmare.
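      The "static cache with no locking" problem is easy to state in code. A minimal Python sketch of what a thread-safe version of such a cache has to do (SafeCache is a hypothetical name): every access to the shared dict goes through one lock. The expensive computation runs outside the lock, so two threads may occasionally compute the same key, but setdefault() ensures they both return the single value that was stored.

```python
import threading


class SafeCache:
    """Memoizing cache that can be shared safely between threads."""

    def __init__(self, compute):
        self._compute = compute
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            if key in self._data:
                return self._data[key]
        # Compute outside the lock so a slow computation does not block
        # lookups of other keys; a duplicate computation is possible.
        value = self._compute(key)
        with self._lock:
            # If another thread stored this key first, keep and return its value.
            return self._data.setdefault(key, value)


squares = SafeCache(lambda n: n * n)
print(squares.get(12))  # 144
```

      The catch the post describes remains: you only know to do this if you know the library keeps such a cache in the first place.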

    25. Re:it means a lot by Coryoth · · Score: 1

      Heck, even trivial programs have real problems with this. And why is that? Because we have problems thinking in parallel all the time.

      Exactly. So what we need is a way of thinking about the problem that makes it easy to think about and do analysis. I'm sure there are probably some Computer Science researchers contemplating these problems. Hopefully there are really good people, like Tony Hoare working on it. What's that? He did work on this already? He came up with something that solves most of these problems already? Can't have been that long ago could it? Only 20 years ago you say?

      Check out CSP. If you want a way to think about multithreaded applications that makes managing threads and avoiding deadlocks easy it's already available - it just hasn't got the publicity amongst developers that it deserves. If your project is in Java there are already things like JCSP which let you do CSP style threading in Java right now.

      Jedidiah.
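      A rough Python approximation of CSP-style structure (not JCSP, and not true CSP): three processes that share nothing and communicate only over channels. queue.Queue(maxsize=1) stands in for a channel here; real CSP channels are unbuffered rendezvous, whereas this one holds a single item. The process and function names are hypothetical.

```python
import queue
import threading

DONE = object()  # end-of-stream marker sent down each channel


def producer(out, n):
    for i in range(n):
        out.put(i)
    out.put(DONE)


def squarer(inp, out):
    while (x := inp.get()) is not DONE:
        out.put(x * x)
    out.put(DONE)


def consumer(inp, results):
    while (x := inp.get()) is not DONE:
        results.append(x)


def pipeline(n):
    # Each process owns its local state; the channels are the only shared objects.
    a, b = queue.Queue(maxsize=1), queue.Queue(maxsize=1)
    results = []
    procs = [
        threading.Thread(target=producer, args=(a, n)),
        threading.Thread(target=squarer, args=(a, b)),
        threading.Thread(target=consumer, args=(b, results)),
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return results


print(pipeline(5))  # [0, 1, 4, 9, 16]
```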

    26. Re:it means a lot by fupeg · · Score: 3, Interesting
      As far as threading is concerned, one of the few languages I've dealt with that makes mutexes, semaphores, etc. easy to deal with is Java
      Umm, ok. Java has always made synchronization easy to use. It's never been particularly straightforward, though, because of Java's interpretive nature and all the wonderful JIT liberties allowed for JVMs. Just look at all the confusion around double-checked locking. JDK 1.5 is the first version of Java to formally expose semaphores. Now they are "easy" to use just like synchronization. The verdict is still out on how easy they are to understand.
      Furthermore, we need to get rid of lazy programming.
      Oh brother, here we go again. Let me guess, you could probably write a multi-threaded database server that supported fully ATOMIC operations and transactionality, would only need 4K of memory, and would be blazingly fast on a 486SX machine, right? Over-optimization pundits are the worst, even worse than design pattern pundits. This has been discussed many times before. Fast, buggy code has zero value.
    27. Re:it means a lot by Anonymous Coward · · Score: 1, Insightful

      I'm quite impressed that you managed to convince yourself that Java multithreading is in any way better than win32's multithreading API. I mean, have you ever even tried to write a program with nontrivial multithreading in Java?

      Just an example: in win32, a thread can block waiting on multiple locks or events and wake when any of them is signalled (e.g. waiting on multiple sockets, or on multiple client threads, or until one of a set of locks is open). This is just a single WaitForMultipleObjectsEx call. This is impossible to do in Java. You have to have every lock run its own thread, to grab a lock and do a signal. At least Java now has select-style socket multiplexing (java.nio's Selector), so you don't need to keep a thread per socket (as you did in 1.0 and 1.1).

      Try writing something to handle the reader/writer problem with multiple readers and writers. You require some really contrived code using something like five locks in order for it to work correctly. Even the reader/writer streams provided by the Java API only work with a single pair of threads.

      The windows API is ugly in parts, especially the GUI section, but the windows base API is quite nice. I like their synchronization stuff a lot. It's even object oriented, of sorts. For example, you always use the same calls to wait for any object, regardless of the type of object (mutex, lock, semaphore, event, file handle, thread, etc), a sort of dynamic dispatch.

      I'm told Java 1.5 fixed up a lot of these problems, but unless you started coding a few months ago, I don't understand how you can consider Java's threading API anything but crippled and lousy.
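      For reference, a multiple-reader/multiple-writer lock can be built from one condition variable. A minimal Python sketch (ReadWriteLock is a hypothetical class; Python's standard library does not ship one, while Java 1.5 added ReentrantReadWriteLock in java.util.concurrent.locks). It lets any number of readers in at once, admits one writer at a time, and makes new readers wait while a writer is waiting, which prevents writer starvation at the cost of possibly delaying readers.

```python
import threading


class ReadWriteLock:
    """Many readers at once, or exactly one writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0          # readers currently inside
        self._writer = False       # is a writer currently inside?
        self._waiting_writers = 0  # writers queued for the lock

    def acquire_read(self):
        with self._cond:
            # New readers yield to an active or waiting writer.
            while self._writer or self._waiting_writers:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # a waiting writer may now proceed

    def acquire_write(self):
        with self._cond:
            self._waiting_writers += 1
            while self._writer or self._readers:
                self._cond.wait()
            self._waiting_writers -= 1
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()  # wake both readers and writers


rw = ReadWriteLock()
shared = {"value": 0}
seen = []


def writer():
    for _ in range(100):
        rw.acquire_write()
        try:
            shared["value"] += 1
        finally:
            rw.release_write()


def reader():
    for _ in range(100):
        rw.acquire_read()
        try:
            seen.append(shared["value"])
        finally:
            rw.release_read()


threads = [threading.Thread(target=writer) for _ in range(3)]
threads += [threading.Thread(target=reader) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared["value"])  # 300
```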

    28. Re:it means a lot by Anonymous Coward · · Score: 0

      This is a construct whose entire structure is designed so that if you are very very careful, and follow an expertly designed, still evolving, specific set of rules, you won't, most of the time, get a 'worse than seg-fault' crash.

      Does anyone want to re-read that?

      "Don't, you nasty blackhats, write code that doesn't follow these rules, because then exploits are possible." Now this is a security proposal I haven't heard before.

      There are military planes whose aerodynamic characteristics are so demanding that it is impossible for a human to fly, the computer must be making the decisions. [Response times etc etc.].

      The story about Chernobyl that I know is that a control room guy said he knew the reactor so well that he didn't need the computer safety systems to run the plant, and shut them off for his visitors.

      Putting together code and machinery so prone to unpleasant non-user-intended events is bad design and shouldn't be defended.

    29. Re:it means a lot by the_duke_of_hazzard · · Score: 1

      I find I spend most of my days telling graduates to "be stupid".

      "Stupid code is good. Stupid code works."

    30. Re:it means a lot by ckaminski · · Score: 1

      The biggest problem with thread synchronization is mixed primitives. On Windows, for instance, a thread can take ownership of a mutex several times, i.e. it never blocks on a mutex that it already owns. Some primitives aren't so forgiving, and that can prove to be a nasty little problem.
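      Python shows the same split. `threading.RLock` is re-entrant like a Win32 mutex, while `threading.Lock` is not, so its owner would hang forever on a second blocking acquire. Here is a small sketch (the function name is hypothetical):

```python
import threading

def reentrancy_demo():
    """Return (nested RLock acquire worked, nested Lock acquire worked)."""
    rlock = threading.RLock()
    with rlock:
        with rlock:  # the owning thread re-acquires: allowed for a re-entrant lock
            nested_rlock_ok = True

    lock = threading.Lock()
    with lock:
        # A blocking lock.acquire() here would deadlock the owner, so probe non-blockingly.
        nested_lock_ok = lock.acquire(blocking=False)
    return nested_rlock_ok, nested_lock_ok
```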

      It is one very tough nut to crack. The ACE toolkit (www.cs.wustl.edu/~schmidt) has a number of Reactor patterns in it that have simplified my development, doing what you've described above and helping keep synchronization to a minimum. One of the best solutions to deadlock, however, is to use try_lock() and abort (release what you hold and retry). Most modern Unixes and Windows support this feature. Done that way, a deadlock should never actually occur.
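      Here is a minimal sketch of the try-lock-and-back-off idea in Python, where `Lock.acquire(blocking=False)` plays the role of `pthread_mutex_trylock()`. The function name, attempt limit and back-off interval are assumptions for illustration:

```python
import random
import threading
import time

def acquire_both(first, second, max_attempts=100):
    """Take two locks without risking deadlock: block on the first, only *try*
    the second, and on failure release everything, back off, and retry."""
    for _ in range(max_attempts):
        first.acquire()
        if second.acquire(blocking=False):
            return True  # caller now holds both locks and must release both
        first.release()  # give up what we hold so the other thread can progress
        time.sleep(random.uniform(0, 0.001))  # short randomized back-off
    return False
```

      Backing off trades deadlock for possible livelock. The random delay makes it unlikely that two threads keep colliding in lockstep.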

    31. Re:it means a lot by Anonymous Coward · · Score: 0
      I've got a 1.2GHz processor with 1024Mb of RAM, and my box chugs opening an M$ Word doc?! WTF?!
      You run M$ Word instead of vim or emacs? WTF?!

      It's lame to blame programmers, when your perception of programmers has been biased due to you picking the worst software in the universe. That's like driving a 1973 Gremlin and then complaining that modern auto technology sucks.

      Quit running crapware; once you've tried something of roughly average quality, then you can flame lazy programmers.

      Sheesh, Microsoft stuff. Could you possibly pick a more extreme and biased example?

    32. Re:it means a lot by Anonymous Coward · · Score: 0
      Multi-threading comes with synchronization, semaphore, mutex, etc, once you know how to deal with them, it's easy.
      I know how to deal with them. It may seem easy at first, but it's actually very hard. Your program can run for days before a thread synchronization bug surfaces and it finally deadlocks. And since it's timing dependent, you can't reproduce it.
      I would point out that whatever synchronization problems you think you have are absolutely nothing compared to what digital hardware designers contend with. The equations for their transistors are always "executed" in parallel, often to the tune of multiple independent clocks. The difference is that they are professionals that actually bother to use the proper tools for the job, instead of tying on the C/C++ blindfold and then whining about how unfairly hard it all is.
      The worst thing you can do is randomly sprinkle a bunch of semaphores, mutexes, etc. all over your app.
      The worst thing you can do is have no way to formally express the correct behavior of your system, and not a single tool for modelling and simulating its synchronization behavior.

      "Captain, I am attempting to construct a mnemonic memory circuit using stone knives and bear skins."
      -- Spock, The City on the Edge of Forever

    33. Re:it means a lot by Rattencremesuppe · · Score: 1
      Fast, buggy code has zero value.

      But it can crash much faster than slow, buggy code.

    34. Re:it means a lot by JebusIsLord · · Score: 0, Redundant

      wow, that was insightful... care to elaborate? I was thinking the same thing as the grandparent... Java is one of the few languages that make multithreading relatively painless, and the abstraction therein would be very complementary to this type of technology. For Sun, this makes sense!

      --
      Jeremy
    35. Re:it means a lot by Anonymous Coward · · Score: 0

      "As far as threading is concerned, one of the few languages I've dealt with that makes mutexes, semaphores, etc. easy to deal with is Java. Most other languages bury the stuff too deep into the proprietary APIs to make them useful. Consider multithreading in win32. "

      I don't think that is a fair comparison. I haven't done much work with Java threads, so I can't really speak to their simplicity or lack thereof. However, under Win32 all those APIs your link points to are low-level functions. MFC has put a pretty nice wrapper around them: you can start a simple worker thread with two function calls, and there are class wrappers for mutexes, semaphores, critical sections and so on. Saying Java's implementation of threading is easier to use than a bunch of low-level APIs is almost obvious. But you are comparing apples to gorillas.

      It's also ironic how in one sentence you complain about bloat, and in the next you sing the praises of Java, one of the most bloated languages around.

      Just my $.02

    36. Re:it means a lot by ckaminski · · Score: 1

      For some years, Windows NT was THE de facto threading king: efficient, easy to use, with well-defined semantics. Compare that to the hell that was POSIX pthreads in 1994-1997... Condition variables were the only thing pthreads added that Windows didn't have, but those are damned easy to emulate (it's just a matter of how expensive it is).

    37. Re:it means a lot by Waffle+Iron · · Score: 1
      I would point out that whatever synchronization problems you think you have are absolutely nothing compared to what digital hardware designers contend with. The equations for their transistors are always "executed" in parallel, often to the tune of multiple independent clocks. The difference is that they are professionals that actually bother to use the proper tools for the job, instead of tying on the C/C++ blindfold and then whining about how unfairly hard it all is.

      Yes, I used to design hardware, and it's very parallel. It's also a much more constrained problem. I hope you're not proposing that people use hardware design languages to write software, because they were designed for a specialized problem domain and would suck as general-purpose languages. They feel clunky enough as it is when used for hardware design.

      Once you've done both hardware and software, then you'll realize that there are no silver bullets that cover all of the issues, and that includes your high-priced specialized CAD tools.

    38. Re:it means a lot by jovlinger · · Score: 1


      I'm afraid no one's going to bother optimizing code until Moore's law runs up against some physical engineering walls.


      And that is the right thing to do. It's probably cheaper for you the consumer (well, maybe not _you_, but the hypothetical Windows user who buys COTS software) to buy a faster PC every so many years than for the software vendors to spend employee-hours making their snazzy software run well on old systems.

      I'm sure M$ could have carefully coded Office XP (or whatever the current version is called) to run happily on a 486, but it would cost several K$ and be coming out sometime next decade.

      It's fun to speculate that if Intel/AMD were forced to dramatically raise prices -- say that the next gen processor required an exorbitantly expensive substrate to manufacture -- that major software houses could be tempted to subsidize the costs, as it would still be cheaper than actually writing efficient software.

    39. Re:it means a lot by TheNetAvenger · · Score: 1

      care to elaborate? I was thinking the same thing as the grandparent... Java is one of the few languages which make multithreading relatively painless

      Elaborate? Consider the 10-year-old next door using Visual Basic (or Delphi) and easily writing multi-threaded applications with only a couple of lines of code, not to mention the multi-threading that the Win32 API, and especially .NET, already do automatically.

      Take this reference from Visual Basic: if you can't understand it and write a solid, 'controlled' multi-threaded application, you need to consider another career.

      http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vbcn7/html/vaconfreethreading.asp

      The problem with developers and multi-threading within their own applications is thinking beyond the event model, where a response triggers a single action.

      Java is still a poor language patched together by a few smart people trying to save it. Even the Solaris team at one time called Java crap that was shoved down their throats.

      Java was the birth of many good ideas, but Sun has brought few of them to fruition. (And again, why on an Open Source Advocacy site, do people still bow down to Java and Sun? Microsoft has made C# and .NET more open than Sun has Java even. Geesh)

      And I am definitely not a Microsoft Developer fan boy; even on the Windows platform I seldom use MS development tools.

    40. Re:it means a lot by iwadasn · · Score: 1


      well, I guess they weren't doing real time market data distribution, go figure.

    41. Re:it means a lot by Anonymous Coward · · Score: 0

      I'm not sure what your argument is, other than "Java is bad" and "most developers are too dumb to write multithreaded code". Care to elaborate?

    42. Re:it means a lot by Anonymous Coward · · Score: 0

      Furthermore, we need to get rid of lazy programming.

      Lazy programming is one of the greatest advances in computer science: http://www.dict.org/bin/Dict?Form=Dict2&Database=*&Query=lazy+evaluation
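      For readers who haven't met the term: lazy evaluation means a value is computed only when something actually needs it. Python generators give a limited form of it. Here is a small sketch (the function name is hypothetical):

```python
import itertools

def naturals():
    """Infinite sequence 0, 1, 2, ...; each value is produced only on demand."""
    n = 0
    while True:
        yield n
        n += 1

# Only five squares are ever computed, even though naturals() never ends.
first_squares = list(itertools.islice((n * n for n in naturals()), 5))
```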

    43. Re:it means a lot by angel'o'sphere · · Score: 1



      And I am definitely not a Microsoft Developer fan boy, even on the Windows Platform I seldom use MS development tools.


      But Java you seem never to have used at all :D


      And again, why on an Open Source Advocacy site, do people still bow down to Java and Sun? Microsoft has made C# and .NET more open than Sun has Java even. Geesh


      Probably because technical issues have nothing to do with politics?

      Anyway, you are wrong. Java is quite open; see the JCP. M$ is not; besides, there is Mono.

      So how can you claim that M$ is more open than SUN? I assume the following:

      a) you don't like Java
      b) so you don't work with Java
      c) so you don't care about Java
      d) so you only pick up random statements which fit into your view about Java

      And now you tell us what Java is... that seems not really appropriate in my eyes.

      angel'o'sphere

      --
      Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
    44. Re:it means a lot by TwistedSquare · · Score: 1
      As far as threading is concerned, one of the few languages I've dealt with that makes mutexes, semaphores, etc. easy to deal with is Java. Most other languages bury the stuff too deep into the proprietary APIs to make them useful. Consider multithreading in win32 [microsoft.com]. We need better programming languages before we can ever start reaping the benefits of good multithreading hardware.

      Pure bullshit.

      Why? I think the grandparent was quite right. Lack of good support for concurrency in languages is a major reason why people shy away from it; concurrency in C and other such languages is both non-portable, and a frightening prospect.

      Java does make a valiant attempt, and a better one than C, but it still falls flat because it was too afraid to go all out and properly support concurrency. I suspect a lot of concurrency problems in Java are inherited from the underlying OSes, though (e.g. not being able to use threading well with GUIs). It is a problem that computer science has built up over the past 50 years, and one that will not be solved overnight.

    45. Re:it means a lot by guitaristx · · Score: 1
      Oh brother, here we go again. Let me guess, you could probably write a multi-threaded database server that supported fully ATOMIC operations and transactionality, would only need 4K of memory, and would be blazingly fast on a 486SX machine, right? Over-optimization pundits are the worst, even worse than design pattern pundits. This has been discussed [slashdot.org] many times before. Fast, buggy code has zero value.

      I couldn't agree more. However, I'm talking about writing code in a way that makes modification and maintenance easier for the next person, rather than harder. I've maintained too many pieces of crap^H^H^H^H legacy software that would be beautiful things if the previous maintainers had engineered their modifications and fixes, rather than hacking (sense 1) them.

      Most software is slow because of bad coding practices, usually of the following categories:
      1. Someone wrote an elegant and complex solution, and documentation was sparse or nonexistent, leaving the elegant solution utterly wasted whenever maintenance is due: "What the hell was Mel thinking? I've gotta fix all this crap!" Thus, the elegant solution loses its elegance because it's been "fixed to death."
      2. Someone did a quick, lazy fix because they didn't understand the code (often, because of #1): "Ehh... good enough." Of course, this makes the next cycle of maintenance senselessly complex, because it's source code gibberish - the original code plus the code written by someone that didn't know what they were doing. Then consider the next 5 maintenance cycles, all convoluting the code further from the original intent. How can this resultant code be faster?
      I'm not talking about over-optimization, I'm talking about writing good code^H^H^H^Hsoftware. It goes beyond just getting the job done. You can whine and call me an "over-optimization pundit," or you can do your part in the software community by writing better software.

      good_software != optimized_software
      --
      I pity the foo that isn't metasyntactic
    46. Re:it means a lot by lpq · · Score: 1

      Hmmm....

      Forgive me if I misunderstand -- but that sounds very much like SMP-level programming. How is thread synchronization different from SMP synchronization?

      Isn't thread level going to be figuring out more about what can be done
      in parallel -- different algorithms to exploit the increase in parallelism
      in the same process?

      This raises another question: is Linux well suited to multi-threaded CPUs, given the current thread emulation that has to be done on top of separate processes?
      Are Linux processes "light weight" enough for these new architectures?

      Sadly, as a programmer, I'd be hard pressed to design a program that would run optimally on both single-threaded and multi-threaded CPUs.

      If I knew I had multiple threads to play with, I might devise algorithms to split up work, but if run on a single CPU, the divide and merge code might introduce extra delay.

      Might it also not vary depending on how the threads are implemented? I.e. -
      if they are true "light weight threads" (whatever that means), vs. separate processes?

      I'm assuming that separate threads would all want CPU at the same time
      in the optimal case -- we are not talking about the case of one thread
      going to sleep while another operates and having them execute sequentially.
      The overhead of managing them with a process scheduler seems like it might
      outweigh parallelism benefits to "multi-thread" some code but have it
      run sequentially.

      Until the overhead of thread-switching/task-switching can be quantified, it will be difficult to _explicitly_ write code for multithreading.

      Worse -- if I am on a non-threading (or minimal threading) single cpu,
      just calling the main process scheduler might cause a disruption of what
      is in the cache and largely negate the benefits of multithreading.

      If the bottleneck now is memory I/O on the Intel platform, how will multiple cores and threads help that much? It seems like they'll be waiting a greater percentage of their clock cycles for memory I/O.

      -l

    47. Re:it means a lot by Anonymous Coward · · Score: 0

      It's funny how XP on a PII-266 with 192 MB of RAM is snappy, whereas any modern Linux on such a machine runs like an old dog.

      Look, this troll has bullshit dripping from his mouth.

    48. Re:it means a lot by LilMikey · · Score: 1

      sing the praises of Java, one of the most bloated languages around.

      Not sure if maybe you meant the language itself is bloated(?) but the runtime comes in at around 15 and a half meg. The .NET Runtime comes in at about 23 and a half meg. The Powerbuilder libraries we must distribute are around 43 meg (uncompressed and just for 9.0 although we must put out 7.0 as well). ActivePython is 15 (Linux) to 18 (Windows) meg. ActivePerl is 15 Linux and 12.5 Windows. The Java runtime itself seems rather... average.

      --
      LilMikey.com... I'll stop doing it when you sto
    49. Re:it means a lot by TheNetAvenger · · Score: 1

      a) you don't like Java
      b) so you don't work with Java
      c) so you don't care about Java
      d) so you only pick up random statements which fit into your view about Java


      A) you have no facts, so go after the poster's credibility.
      B) you are so in love with java, that you are considering making it your religion
      C) you have never seriously developed in the real world
      D) you know nothing about me, and are only making assumptions... An assumption is just fool's intuition.

      I was using JAVA when it first appeared, and marveled at the concepts of its ideals when it was in an infant stage and a lot of it was still conceptual (not that all of it has STILL been implemented). My company and I played with it for years, hoping that it would eventually live up to the promises of cross platform reliability, scalability and performance.

      However now... It is kind of old news. Running applets and applications across the Web is old hat, and there are several technologies that do it well.

      Besides, wasn't this topic about multi-threading?

      Java has very little real support for multi-threading. The very nature of the language and its cross-platform design were never meant to be the be-all and end-all for multi-threading.

      What little 'ease' in multi-threading in developing a JAVA application was nothing more than picking up on concepts introduced YEARS before JAVA was even constructed. Maybe you should pretend like you are capable of researching something, and actually go look up these theories, how they were implemented in various development languages and OS platforms.

      If you want to compare JAVA to writing C++ code for Windows NT back in 1993, then absolutely, I am 100% with you: it is far easier to create multi-threaded applications in JAVA. But in comparison to VB, Delphi, Kylix, and MODERN C development languages, APIs and libraries, it is nothing special, far from it.

      I wasn't just trying to make a rant that JAVA was just pure crap; I was trying to bring some reality back to the topic. Everyone for some reason rallies around SUN and JAVA here in these open source forums, when in fact they have done SO LITTLE for open source.

      Neither Sun nor its precious JAVA gets any kudos in my book for being open source. (Even though there were rumors again yesterday that Sun might open JAVA up: rumors again, the carrot on the stick coming out to get press.)

      After trying to use JAVA in production environments where cross-platform support and performance are important, I find it STILL fails miserably. That is why you don't even see SUN writing all their cross-platform applications in JAVA; they are writing them in C and compiling them for each platform. Unless you are talking about Web applets, very few, if ANY, mainstream applications are JAVA based.

      When you can show me a Video rendering subsystem written in JAVA because of the reliability of its managed code and hardware independence, I will gladly apologize. Heck even Microsoft's 'bloated' .NET is capable of doing this, portions of DirectX9 are written in .NET, a managed development language. Yet after 10 years JAVA still has trouble running a sprite based frogger game reliably. Microsoft is literally pushing around 3D photorealistic imaging that is hardware independent and writing it in a managed framework that is a lot like the conceptual makings of what JAVA was supposed to become someday.

      So even if you walk away from this thinking, this person knows nothing about the EASE of multi-threading in JAVA, ask yourself this simple fact...

      What GOOD is multi-threading going to do in a development language that is performance challenged, when multi-threading is a concept designed to get the most PERFORMANCE out of your hardware?

      The companies producing and innovating multi-core CPUs, multi-CPU configurations, Hyperthreading and other emerging innovations are not going to tell everyone to use JAVA because it is easier to multi-thread and take advantage of the great performance boost these technologies provide.

      Anyone that HAS worked with JAVA and has sensibility can realize how ridiculous this would be.

    50. Re:it means a lot by TheNetAvenger · · Score: 1

      PS

      I should add for people following this thread that the touted multi-threading abilities in JAVA have not always even been in JAVA.

      It wasn't until the release of the 5.0 platform that they even added features to manage concurrency in an application that is even remotely on par with what is available in other development/platform environments.

      For example, prior to this release, even getting a return value from a thread required a ton of code, something that has been very easy to do in other development languages/platforms for years and years now.
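      Java 5's answer is `Callable` plus `Future` from java.util.concurrent. Here is the same pattern using Python's `concurrent.futures`, as a sketch (the `checksum` function is a hypothetical stand-in for real work):

```python
from concurrent.futures import ThreadPoolExecutor

def checksum(payload: bytes) -> int:
    """Hypothetical stand-in for real work done on a background thread."""
    return sum(payload) % 65536

with ThreadPoolExecutor(max_workers=2) as pool:
    future = pool.submit(checksum, b"hello, world")  # starts running on a worker thread
    # ... the calling thread is free to do other work here ...
    result = future.result(timeout=5)  # blocks until the return value is ready
```

      If the worker raises an exception, `future.result()` re-raises it in the calling thread. That is a large part of why this takes less code than hand-rolled result passing.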

  3. How is this different by Anonymous Coward · · Score: 0

    from Intel's hyperthreading?

    1. Re:How is this different by wezelboy · · Score: 2, Informative

      Hyperthreading makes a single processor appear as multiple processors to the OS. The OS still has to do all of the loading and storing yadayada associated with threading. From what I gather, CMT handles the threading overhead in hardware for faster context switches. Sort of reminiscent of register windows on the SPARC chip.

  4. Sun is back baby! by Anonymous Coward · · Score: 1, Funny

    Nobody has anything like this and it will probably take competitors at least -2 years to catch up to sun.

    1. Re:Sun is back baby! by Anonymous Coward · · Score: 0

      Yeah, nobody has this except Intel with Hyperthreading and IBM with SMT on the Power5 chips.

      http://clusters.top500.org/ORSC/2004/power5.html

      I could either buy one Sun Fire V490 or two IBM P550s. Guess which way we went.

    2. Re:Sun is back baby! by Anonymous Coward · · Score: 0

      Dude it's called IRONY.

  5. Multithreading? by PopeAlien · · Score: 4, Funny

    I don't mean to look a gift horse in the mouth..

    ..but wouldn't it be even better if it was hyper-multi-threading?

    1. Re:Multithreading? by smittyoneeach · · Score: 1

      active-hyper-multi-threading-gold, baby.

      --
      Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
    2. Re:Multithreading? by Anonymous Coward · · Score: 0

      you mean active-hyper-multi-threading-gold-alpha+

    3. Re:Multithreading? by Anonymous Coward · · Score: 0

      Uhm, it's from Sun. That makes it UltraMULTITHREADING.

    4. Re:Multithreading? by Surt · · Score: 1

      How can I possibly consider using a technology featuring not even a single X?

      --
      "Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
  6. stackless.. by joeldg · · Score: 4, Interesting

    this makes me wonder what the effect would be on something like stackless python?
    the whole state pickling concept is pretty cool, and kind of throws threads all over..

    1. Re:stackless.. by CondeZer0 · · Score: 1

      Stackless is nice... especially channels.

      Python's main problems right now are multiple inheritance with all the unnecessary complexity it adds to the language and the lack of a decent concurrent programming model, stackless could provide a good concurrent programming model.

      But while it's outside the main distro stackless will only be a niche toy, so go nag Guido to include channels from stackless into Python 2.5!

      --
      "When in doubt, use brute force." Ken Thompson
    2. Re:stackless.. by GileadGreene · · Score: 1
      If you like the concurrency model that stackless uses, you should take a look at CSP, the concurrency theory that the stackless model is based on. It's the same theory that formed the basis for the occam programming language (probably the best concurrent language ever), as well as providing ideas for stackless, alef, limbo, concurrent-ml, and a host of other languages (as well as the JCSP and CCSP concurrency libraries for Java and C). The link above is to Hoare's original book on the subject, which is a little dated, but free.

      A good grasp of CSP theory makes concurrent programming much easier, and allows concurrent systems to be engineered mathematically, instead of just hacked together.
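      As a rough illustration of the CSP style: processes share nothing and communicate only over channels, and an unbuffered channel makes the sender wait until a receiver has taken the value. Here is a minimal sketch with plain Python threads. It is not Stackless's actual API, and `Channel`, `producer` and `squarer` are hypothetical names:

```python
import queue
import threading

class Channel:
    """Unbuffered CSP-style channel sketch: send() returns only after a
    receiver has taken the value (a rendezvous)."""

    def __init__(self):
        self._slot = queue.Queue(maxsize=1)  # at most one value waiting for a receiver

    def send(self, value):
        taken = threading.Event()
        self._slot.put((value, taken))  # blocks while another value is still waiting
        taken.wait()                    # rendezvous: wait until a receiver picks it up

    def recv(self):
        value, taken = self._slot.get()
        taken.set()                     # release the waiting sender
        return value

def producer(out, n):
    for i in range(n):
        out.send(i)
    out.send(None)  # sentinel: end of stream

def squarer(inp, out):
    # One pipeline stage: receive, transform, pass on.
    while (v := inp.recv()) is not None:
        out.send(v * v)
    out.send(None)

# Wire up producer -> squarer -> main thread.
numbers, squares = Channel(), Channel()
threading.Thread(target=producer, args=(numbers, 5), daemon=True).start()
threading.Thread(target=squarer, args=(numbers, squares), daemon=True).start()
results = []
while (v := squares.recv()) is not None:
    results.append(v)
```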

    3. Re:stackless.. by bbarooga · · Score: 1

      This processor would have precisely no effect on the scalability of Python since the Global Interpreter Lock ensures only one Python thread is run at any one time. See gil-of-doom Stackless appears to have no automatic context switching. Python typically needs multiple processes to utilise true parallel processing made available from multiple cores. Very heavyweight.

  7. i dont use multithreading by Anonymous Coward · · Score: 0, Interesting

    anything i write usually maxes out the processor at 100% for days at a time (i deal with huge data conversions)

    so yeah i'd also like to know: what does it mean to me?

    1. Re:i dont use multithreading by pclminion · · Score: 4, Insightful
      anything i write usually maxes out the processor at 100% for days at a time (i deal with huge data conversions) so yeah i'd also like to know: what does it mean to me?

      Well, if your data conversions are independent, multithreading might be of benefit to you if you have a hyperthreading processor.

      And are you sure you are maxing the processor? Surely you have to wait for disk or network, at least some of the time. If more than 10% or so (number pulled from ass but based on empirical observations) of your time is spent waiting for latent devices, you can benefit from multithreading even on a plain vanilla single-CPU system with no hyperthreading.
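      Here is a minimal sketch of that effect in Python. It assumes the work is dominated by device waits, which are simulated with `time.sleep` (this releases the GIL in CPython). `fetch` and `convert_all` are hypothetical names:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(record_id):
    """Hypothetical stand-in for one record: mostly waiting on a device, little CPU."""
    time.sleep(0.2)        # simulated disk/network latency
    return record_id * 2   # trivial "conversion" once the data has arrived

def convert_all(ids, workers):
    # While one thread sleeps on its "device", the others proceed.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, ids))

start = time.perf_counter()
converted = convert_all(range(8), workers=8)
elapsed = time.perf_counter() - start
```

      Run serially, the eight 0.2 s waits would take about 1.6 s; with eight threads they overlap. CPU-bound conversions would not benefit in the same way.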

    2. Re:i dont use multithreading by Fulcrum+of+Evil · · Score: 2, Insightful

      Well, if your data conversions are independent, multithreading might be of benefit to you if you have a hyperthreading processor.

      Unless the two execution states overflow your L1 cache, in which case a HT CPU could run slower.

      --
      "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
    3. Re:i dont use multithreading by beerman2k · · Score: 1

      Which of course is almost always the case... :( I've seen HT win only in very few cases. For most applications at work we end up disabling HT to improve performance...

    4. Re:i dont use multithreading by darkain · · Score: 2, Informative

      It's a nice theory and all, but I'm not too sure about it. If done correctly on a single threaded system, if one thread is in a wait state waiting on disc activity, then the CPU should jump threads and handle other tasks in the meantime. There is more than enough RAM and CPU cache on modern computers to make this quite effective. Also, isn't this what DMA channels are for? Wasn't the purpose of a DMA channel to move data from one location to another while the CPU is performing other tasks? .... This is actually getting back to programming at the hardware level of a 386; it's nothing new.

    5. Re:i dont use multithreading by pclminion · · Score: 1
      if done correctly on a single threaded system, if one thread is in a wait state waiting on disc activity, then the CPU should jump threads and handle other tasks in the mean time.

      That's precisely my point. One of those "other tasks" is another thread running your conversion. If you are maxing a CPU for long periods of time, you should be running on a dedicated box.

      Wasn't the purpose of a DMA channel to move data from one location to another while the CPU is performing other tasks?

      Uhh, yes... But clearly, the thread which is waiting for that data still needs to wait. My entire point was that by multithreading your task you can eliminate the impact of a lot of that latency.

    6. Re:i dont use multithreading by e4liberty · · Score: 1

      Cluster computing is meant to address your situation (really). More processors may be able to do your conversions faster.

      Of course, this adds communication overhead.

      On a recent bio analysis project, I found that hyperthreaded processors were able to achieve super-linear speedup. In other words, N processors were more than N times faster than a single processor for this computation. My hypothesis is that the communication overhead was absorbed by the hyperthread, and the addition of more cache and RAM (on the additional processors) led to the super-linear speedup.

      This was using PVM http://www.csm.ornl.gov/pvm/pvm_home.html

      e

    7. Re:i dont use multithreading by Anonymous Coward · · Score: 0

      It means that you can add a progress bar to the processes to watch while you wait for days.

      -Rick

    8. Re:i dont use multithreading by pclminion · · Score: 1
      Unless the two execution states overflow your L1 cache, in which case a HT CPU could run slower.

      I was under the impression that most HT cores had double the L1 cache to ameliorate this exact problem, but perhaps I haven't done my research.

      And if they DON'T, they sure as hell SHOULD.

      Also, I've always thought it would be cool to allow a process/thread to allocate a certain portion of the cache for its exclusive use. I understand that this would make the caching logic insanely complicated, however.

    9. Re:i dont use multithreading by fitten · · Score: 2, Interesting

      Cool, we did a bunch of research back in the mid-90s using MPI and published some papers about threaded communications and the like inside of MPI implementations. Also, it was common practice back on the i860 Paragons with two or three processors per node to devote one of the CPUs totally to communications while the other cranked away.

      Also, be careful that you take the working set into consideration. Suppose you had one processor with 1M L2 cache but your problem needed 1.5M data to work on. It runs at around main memory speeds. However, take two such processors and (if you can) divide the data in half, each processor can now fit all of its data inside L2, which runs at L2 speeds. You can see superlinear speedups that way too.

      However, what you are saying is pretty much right on... communication overhead is almost all integer work so if you have an FPU compute thread going on and have the communications offloaded to a thread, those two things should play quite nicely on hyperthreaded Intel parts. This is even cheaper than the other past solutions of burning an entire CPU for communications while the other does computation.

    10. Re:i dont use multithreading by ComputerSlicer23 · · Score: 2, Informative
      if one thread is in a wait state waiting on disc activity, then the CPU should jump threads

      You're using the wrong word there. Where you use the word "thread", you should be using the word "process" in UNIX parlance. What you are describing is "multi-tasking" in roughly a generic sense. It wasn't invented with the i386, try sometime in the 1960's (I'd have to crack out an OS book to be sure of the date).

      Threads are different from processes.

      Fundamentally, the standard definition of a thread is: "Separate control of the CPU, with the same VM space". Essentially, it's two processes that have precisely the same memory mapped. (I'm sure there are lots of details I just glossed over, but essentially, that's it.) One thing this leads to is blowing the L1 cache if you have several threads interacting on the same pieces of memory.

      Threads have lots of performance problems, but they also greatly simplify programming as you pay a lot less attention to the "shared memory" aspect of it. You add some locking, and then essentially multiple threads of execution can work on the same bits of memory.
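      A minimal illustration of "add some locking", using Python's `threading.Lock`; the counter, thread count, and iteration count are arbitrary:

```python
import threading


class SharedCounter:
    """State visible to every thread, because threads share one address space."""

    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()


def worker(counter, iterations):
    for _ in range(iterations):
        # value += 1 is a read-modify-write; the lock keeps updates from interleaving.
        with counter.lock:
            counter.value += 1


def run(n_threads=4, iterations=10_000):
    counter = SharedCounter()
    threads = [threading.Thread(target=worker, args=(counter, iterations))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


print(run())  # 40000
```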

      However, you can do roughly what you described at the process level. Apache used to have a significant patch for "State Threads", which used various OS primitives to tell whether an OS call would block or not. If it wasn't going to block, it would make the call. If it was, it would move on and see if there was more interesting work it could do rather than blocking.
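      The "would this call block?" check can be sketched with Python's `selectors` module. This illustrates the general idea only; it is not the State Threads patch itself:

```python
import selectors
import socket


def read_if_ready(sock, timeout=0.0):
    """Return data if recv() on sock would not block, else None."""
    with selectors.DefaultSelector() as sel:
        sel.register(sock, selectors.EVENT_READ)
        ready = sel.select(timeout)  # timeout=0 means poll and return immediately
    if not ready:
        return None  # recv() would block: go do other useful work instead
    return sock.recv(4096)


a, b = socket.socketpair()
print(read_if_ready(a))       # None: nothing has been sent yet
b.sendall(b"hello")
print(read_if_ready(a, 1.0))  # b'hello'
a.close()
b.close()
```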

      Threads are a huge performance win, because any time you have multiple independent tasks they can be performed in parallel, assuming you have enough CPU units. The problem is that most computer problems have independent portions they can work on in parallel, and other portions where they have to sync up and work serially. At some point, the overhead of spawning threads and synchronizing makes it a losing proposition. However, for a lot of things, it's the obvious way to speed up performance. (In GUI applications, it's nice to have one thread that keeps the screen updated while another fetches data to display, which avoids applications that feel non-responsive because the window won't refresh for long periods of time while data is being fetched.)
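      The serial "sync up" portions are what cap the gain. The comment doesn't name it, but the standard way to quantify this is Amdahl's law; a minimal sketch, with arbitrary example fractions:

```python
def amdahl_speedup(parallel_fraction, n_workers):
    """Best-case speedup when only parallel_fraction of the work can be split
    across n_workers and the remainder must run serially (Amdahl's law)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


for p in (0.5, 0.9, 0.99):
    print(f"{p:.0%} parallel on 32 workers: {amdahl_speedup(p, 32):.1f}x")
```

      Even a job that is half parallel gets under 2x from 32 workers, and this ignores the spawning and synchronization overhead, which only makes things worse.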

      This sounds roughly like they are adding hardware support for threading, just like the TLB hardware got added to make VM run at a sane speed. Fundamentally it's: we have this cool stuff we do in software that sucks speed-wise because the hardware is bad at X.

      Kirby

    11. Re:i dont use multithreading by Anonymous Coward · · Score: 0

      So, in other words your initial code wasn't optimal.

    12. Re:i dont use multithreading by rs79 · · Score: 1

      " It wasn't invented with the i386, try sometime in the 1960's (I'd have to crack out an OS book to be sure of the date)."

      Earlier than that, actually: in 1956 Manchester University began work on the ATLAS computer project, capable of primitive multitasking. In other words, multitasking has been around for very nearly 50 years.

      Good. I don't feel quite so old now. Although I suppose I should for remembering this. I think this is documented in Andy Tanenbaum's Computer Architecture book, which I read in the 70s at Waterloo.

      --
      Need Mercedes parts ?
    13. Re:i dont use multithreading by ComputerSlicer23 · · Score: 1
      Honest to goodness pre-emptive multi-tasking? Hmmm, I might have to look it up. I thought in the 50's, batch processing, partitioning, and maybe co-operative multitasking was pretty much all there was. Then again, I might be discussing "shipping commercial systems", while you are discussing "state of the art research projects".

      Kirby

    14. Re:i dont use multithreading by fitten · · Score: 1

      The word "thread" is fine. If a thread is blocked, on IO for example, another thread that is "ready to run" from the same, or any other, process can be scheduled. Most thread aware OSs schedule threads as their base unit of execution. Of course, there are some advantages with scheduling threads from the same process over those from another process, but the base premise is OK for discussion.

    15. Re:i dont use multithreading by Fat+Cow · · Score: 1

      I think that just puts you back in the position of SMP: you have to deal with synchronization between the 2 parts of the cache again.

      --
      stay frosty and alert
    16. Re:i dont use multithreading by Random832 · · Score: 1

      you don't need to divide it into "2 parts" to have twice as much.

      --
      We've secretly replaced Slashdot with new Folgers Crystals - let's see if it notices.
    17. Re:i dont use multithreading by Omnifarious · · Score: 1

      If your program is ever halted waiting for I/O data and there's something else it could be doing, it's written poorly. And if you 'solve' the problem by adding threads, you're really just making it worse by adding many varied and interestingly obtuse ways for your program to have bugs.

      There are mechanisms that exist so that you can keep your single threaded model and still never wait for IO. Those are what you should use to avoid waiting for IO. Threads should be either a last resort, or a way to take advantage of more CPUs.
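      One such mechanism is a single-threaded event loop. A minimal sketch with Python's `asyncio`, where `asyncio.sleep` stands in for a real disk or network wait and the delays are arbitrary:

```python
import asyncio


async def fake_io(name, delay, log):
    """Stand-in for an I/O wait; while it sleeps, the loop runs other tasks."""
    log.append(f"{name} start")
    await asyncio.sleep(delay)
    log.append(f"{name} done")
    return name


async def main(log):
    # Both waits overlap on a single thread, with no locks anywhere.
    return await asyncio.gather(fake_io("slow", 0.05, log), fake_io("fast", 0.01, log))


log = []
results = asyncio.run(main(log))
print(results)  # ['slow', 'fast']  (gather keeps argument order)
print(log)      # ['slow start', 'fast start', 'fast done', 'slow done']
```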

  8. Nothing new. by bigtallmofo · · Score: 3, Interesting

    This is Sun's Niagara Design. The more I learn about it, the more I think that it's nothing that exciting.

    From the lack of non-Sun-supplied buzz regarding this technology, it would appear that many people aren't finding it very exciting.

    --
    I'm a big tall mofo.
    1. Re:Nothing new. by zenslug · · Score: 3, Interesting

      The tech is actually pretty good, although it really depends on your application. If you want to run something single-threaded, then the Niagara chip is not going to impress you at all. The speed of the chip is not where its power is. Understand that the name is rather appropriate (i.e. like a river/waterfall): it is not very fast comparatively, but it can handle large volumes very well. Think massively multithreaded uses.

    2. Re:Nothing new. by SunFan · · Score: 3, Interesting


      What's not exciting about a 32-way single board computer? You don't have to program for it any differently than a 32-way SMP mainframe. Solaris does the rest for you.

      --
      -- Microsoft is the most expensive commodity operating system and office suite vendor in the marketplace.
    3. Re:Nothing new. by g0_p · · Score: 3, Informative

      Though in theory the Niagara design is another CMT implementation, it's the implementation that is the crux here. CMT theory has been studied in academia for 6-8 years, I think.

      Here is a very informative article on the Niagara design.

      For the lazy some main points from the article.
      - The Pentium 4 is a single core dual threaded CMT implementation. The Niagara has 8 cores and each core is capable of executing 4 threads.
      - Depending on the model of the application that is executing, a programmer can choose to either utilize it as a single process with multiple threads each mapped on to a hardware thread or as multiple processes mapped to hardware threads. Apart from this, individual cores can also be assigned to an individual process, adding one more level of flexibility.
      - Sharing data between threads on the same core is an L1 read and is extremely fast. Sharing data among threads on separate cores is an L2 read (since L2 is shared among cores).
      - The new chip provides a lot of flexibility in terms of how the programmer wants to allocate hardware threads across software processes or threads. But it looks like programming on it will be difficult unless the operating system provides very good support for it.

    4. Re:Nothing new. by platypus · · Score: 1

      Unfortunately, massively multithreaded uses are a small field between heavily parallelizable and non-parallelizable problems. And in both of these areas, SUN gets destroyed by x86(64) or POWER5 architectures, either as standalone, or as SMP or cluster architectures.
      This niagara thing is nothing more than Sun's contrived way of conceding they (or SPARC) lost the processor war.

  9. I didn't RTFA but... by cmburns69 · · Score: 0, Troll

    Can somebody explain to me how this differs from intel's hyperthreading technology?

    Is this just a fancy name for sticking multiple cores on the same die?

    What's the real story here?

    --
    Online Starcraft RPG? At
    Dietary fiber is like asynchronous IO-- Non-blocking!
    1. Re:I didn't RTFA but... by leoc · · Score: 0, Redundant

      I don't think it does. It sounds very very similar, and they even cite an Intel paper on Hyperthreading in the bibliography.

      --
      STFU about slashdot bias.
    2. Re:I didn't RTFA but... by Anonymous Coward · · Score: 0

      Intel's Hyperthreading was a relatively simple first step on the road to multiple concurrent threads on multiple CPU cores.

      Hyperthreaded CPUs can switch between threads very quickly, but they don't allow for running more than one thread at a time.

      With the new dual core Intel and AMD CPUs, you can run two threads at a time (one per CPU). The CPU cores are not Hyperthreaded.

      Intel's architecture packs a relatively small number of relatively fast cores onto the die.

      Sun's architecture puts more cores on the die, but each core is relatively less complex. However, each core can execute more than one thread.

      Ace's Hardware has a good description of Niagara. To be taken with a liberal dose of salt -- these guys are real Sun fanboys:
      http://www.aceshardware.com/read.jsp?id=65000292

      And of course Intel (and AMD) have plans to increase the number of cores on their chips.

  10. Re:Bad bad English headline by pclminion · · Score: 0, Offtopic

    "What's" is a common contraction of "What does." The fact that it is used, heavily, in American speech is evidence enough of this. In British English, who knows.

  11. Not sure... by MetalliQaZ · · Score: 1

    I haven't read all the technical docs, so I'm not sure what the difference is between Sun's technology and Hyperthreading, but I'm sure there is a difference, with Sun's technology probably being more complete. Not sure, but perhaps this technology would be better integrated with multi-core processors, to give not only multiple virtual processors, but also multiple simultaneous threads on each core. After all, if they want to compete against the Cell, they have to go multi-core. -d

    --
    "Here Lies Philip J. Fry, named for his uncle, to carry on his spirit"
    1. Re:Not sure... by Anonymous Coward · · Score: 0

      I havn't read all your post and the article, so I'm not sure what the difference is between your post and the article, but I'm sure there is a difference, with the article probably being more complete. Not sure but perhaps your post would summarize article better, to give not only opinion, but also multiple simultaneous thought on each subject. After all, if they want to compete, they have to go multi-post. -d -g :)

  12. Re:fp by Anonymous Coward · · Score: 0, Funny

    I would have had first post but I was reloading Slashdot using only a single thread!

  13. Same thing SMP and such has meant by Soong · · Score: 4, Insightful

    It means we're going to have to learn to program in parallel. We're going to have to parallelize our data processing, and we're going to have to learn synchronization and locking methods.

    This is nothing new. The diminishing returns and impending limits of single-threaded processing have been looming for a long time now.

    --
    Start Running Better Polls
    1. Re:Same thing SMP and such has meant by SunFan · · Score: 2, Insightful

      It means we're going to have to learn to program in parallel.

      Not really. If you've been using SMP servers, what's different about SMP on a chip? Even if you only have a few dozen Apache processes running, Solaris will schedule them onto Niagara just like if you had lots of separate CPUs.

      I don't think this is as big a change as people think. The main advantage will be a super-efficient CPU (50 to 60 watts, IIRC) but with the performance of many regular CPUs (hundreds of watts).

      --
      -- Microsoft is the most expensive commodity operating system and office suite vendor in the marketplace.
    2. Re:Same thing SMP and such has meant by Bastian · · Score: 2, Insightful

      I imagine that multithreading is a situation where OOP finally begins to really shine, as the amount of code factoring involved would make it much easier to keep track of when and where you need to be frotzing with synchronization and locking.

      I also imagine that if you can try to line up thread boundaries with object boundaries, the task of avoiding race conditions becomes almost trivial.

      But then, I haven't done much serious multithreaded programming, so maybe I am missing the point. Someone set me straight.

    3. Re:Same thing SMP and such has meant by Homology · · Score: 1
      Not really. If you've been using SMP servers, what's different about SMP on a chip? Even if you only have a few dozen Apache processes running, Solaris will schedule them onto Niagara just like if you had lots of separate CPUs.

      A crucial difference between processes and threads is that threads share (concurrently) the same data in the same address space. So, having many processes is not anything like having multiple threads.

    4. Re:Same thing SMP and such has meant by Mornelithe · · Score: 1

      Apache is written with parallel execution in mind. Forking off separate processes is the most basic and easiest method of parallel programming.

      Of course, not all programs can be partitioned into little units that don't need to communicate with each other, and that's where locking and synchronization become important.

      I agree, though; this isn't exactly huge news. I thought the idea of hardware SMT (oops, CMT) was around a long time ago (we learned about it in my computer architecture course at university). The newer Pentium 4s have hyperthreading, just not multiple cores to make this really happen.

      How is Sun's design different from having multiple cores appear as multiple chips, and then allowing separate threads to be on separate chips (that does happen currently, no?)? How will this be significantly different than using dual-core chips from Intel, AMD and IBM?

      --

      I've come for the woman, and your head.

    5. Re:Same thing SMP and such has meant by Anonymous Coward · · Score: 0


      Solaris treats each process as a single thread in the kernel. It makes little difference from a scheduling point of view whether you have a 32-thread application or a 32-process application, except the latter might consume more memory.

    6. Re:Same thing SMP and such has meant by Homology · · Score: 2, Insightful
      Solaris treats each process as a single thread in the kernel. It makes little difference from a scheduling point of view whether you have a 32-thread application or a 32-process application, except the latter might consume more memory.

      With threads you have to synchronize access to common data that resides in the same memory address space. With processes you don't have to do this as they have their own copy of the data at fork.
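      The difference can be shown directly. A POSIX-only sketch (it uses `os.fork`, which Windows lacks); the `counter` dictionary is just an illustrative piece of state:

```python
import os
import threading

counter = {"value": 0}


def bump():
    counter["value"] += 1


def demo_fork():
    """The child increments its own copy-on-write copy; the parent never sees it."""
    counter["value"] = 0
    pid = os.fork()
    if pid == 0:        # child process
        bump()
        os._exit(0)     # leave immediately, skipping the parent's cleanup code
    os.waitpid(pid, 0)
    return counter["value"]


def demo_thread():
    """A thread writes straight into the address space it shares with us."""
    counter["value"] = 0
    t = threading.Thread(target=bump)
    t.start()
    t.join()
    return counter["value"]


if hasattr(os, "fork"):
    print(demo_fork())  # 0
print(demo_thread())    # 1
```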

    7. Re:Same thing SMP and such has meant by radish · · Score: 1

      It means we're going to have to learn to program in parallel

      How many people are having to learn this? Is it really such a new thing?

      Whenever there's an article on some new multi-chip/multi-core tech, people always mention the difficulty of programming in more than one thread. I write server-side Java for web apps; there's a heavily threaded environment for you. Each incoming request is its own thread, and that's just a really basic Tomcat-style setup. In our world, we have probably 700-800 threads running at any given time, over 12 CPUs. Once you get your head around it (which, in my case, happened years ago at university) it's not really such a big deal. And of course, the performance gains are significant. What is also not mentioned often is the design improvements which are possible when you start thinking in multiple threads. IMHO, OO design really becomes significant only when you have more than one path: suddenly each object can do its own thing, without having to be scheduled from some "controller".
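      A minimal sketch of the thread-per-request model, using Python's standard `socketserver.ThreadingTCPServer` (which starts a new thread per connection) rather than Tomcat; the upper-casing handler is a made-up stand-in for real request processing:

```python
import socket
import socketserver
import threading


class UpperHandler(socketserver.BaseRequestHandler):
    """Runs on its own thread for each incoming connection."""

    def handle(self):
        data = self.request.recv(1024)
        self.request.sendall(data.upper())


def query_all(messages):
    # Port 0 asks the OS for any free port.
    with socketserver.ThreadingTCPServer(("127.0.0.1", 0), UpperHandler) as server:
        threading.Thread(target=server.serve_forever, daemon=True).start()
        host, port = server.server_address
        replies = []
        for msg in messages:
            with socket.create_connection((host, port)) as conn:
                conn.sendall(msg)
                replies.append(conn.recv(1024))
        server.shutdown()
    return replies


print(query_all([b"get /a", b"get /b"]))  # [b'GET /A', b'GET /B']
```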

      I'm sure there are plenty of other people here doing similar work to me... so why is multithreading considered such a big leap? I'm not even sure it's desktop-based developers: according to Windows, my Outlook currently has 15 threads, IE has 19, McAfee VirusScan has 18; hell, even winlogon has 20. Seems like Windows devs have no problems with threads either.

      --

      ---- One button is the power button, the other is the Bender voice button: "Bite My Shiny Metal Ass"

    8. Re:Same thing SMP and such has meant by ajs · · Score: 1
      "The decreasing returns and impending limits of single threaded processing has been upcoming for a long time now."
      First time I heard that was 1987. Pray, what's new?

      My feeling is this: most of today's bottlenecks are related to bus and memory I/O. The fraction of bottlenecks that are CPU-related can generally be addressed more reasonably via true SMP, rather than multi-core pseudo-threading.

      For the trivial number of cases where this technology is a win... well, there you go, but the problem is that that small section of the consumer base is dictating that MY CPUs will be more expensive. This is unthrilling to me....
    9. Re:Same thing SMP and such has meant by Kenard · · Score: 1

      You don't need to synchronize access to common data if all you are doing is reading it, so processes don't win there. If you want to have the processes communicate, all you do is hand off the synchronization to the OS, which at best would be like having threads.

      --
      (appended to the end of comments you post)
    10. Re:Same thing SMP and such has meant by Fjornir · · Score: 1
      ...With processes you don't have to do this as they have their own copy of the data at fork.

      It's not quite as grim as that. In most cases the same pages will be used after fork() until one of the processes decides to write to the page in which case it gets a copy of the original page to work with from then on (Copy On Write).

      --
      I want a new world. I think this one is broken.
    11. Re:Same thing SMP and such has meant by Homology · · Score: 1
      You don't need to synchronize access to common data if all you are doing is reading it, so processes don't win there.

      You sure missed the point. Synchronization is needed because one thread may modify common data at the same time as another thread is reading it.

    12. Re:Same thing SMP and such has meant by Rick+Genter · · Score: 1

      Amen.

      I program server-side Java code as well, and we usually run a couple of hundred threads over some small number of CPUs (2-4). Furthermore, the apps run non-stop for months at a time; during its current run one of our apps has been running for 3 months straight and has launched close to 1.5 million threads during that run.

      I still laugh when people think multithreaded is a big deal or "leads to hard to diagnose bugs". So can the use of pointers, but you don't see anyone saying that we should remove dynamic memory allocation from programming languages...

      --
      Don't underestimate the power of The Source
    13. Re:Same thing SMP and such has meant by GileadGreene · · Score: 1
      No, multithreading is where concurrency oriented programming, as implemented in languages such as Erlang and occam, begins to really shine.

      Lining up thread boundaries with object boundaries can help, but only if you implement the objects as "monitors" and avoid shared data. But why bother messing with monitors when you can go a step further, and use the concurrency model that Hoare (the inventor of monitors) came up with to address the shortcomings of monitors: CSP. Which just happens to be the concurrency model that underlies occam :) The CSP approach to concurrency makes the avoidance of race conditions trivial, and the avoidance of deadlock a matter of design instead of luck. Amusingly enough, the CSP model of message-passing concurrency much more closely matches the original concept of objects than the method-based object model implemented by most modern OO systems.

      I have personally created programs that implement several thousand "threads" using the CSP model, and are completely (and provably) free of deadlock and race conditions.
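      For the "objects as monitors" option, a minimal sketch: a bounded buffer whose every method runs under the object's own `threading.Condition`, so callers never touch a lock themselves (the capacity and item counts are arbitrary):

```python
import threading
from collections import deque


class BoundedBuffer:
    """Monitor-style object: all state is private, and every public method
    holds the object's own lock for its whole duration."""

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        self._cond = threading.Condition()

    def put(self, item):
        with self._cond:
            while len(self._items) >= self._capacity:
                self._cond.wait()       # releases the lock while waiting
            self._items.append(item)
            self._cond.notify_all()     # wake any consumer waiting for data

    def get(self):
        with self._cond:
            while not self._items:
                self._cond.wait()
            item = self._items.popleft()
            self._cond.notify_all()     # wake any producer waiting for space
            return item


def transfer(n_items, capacity):
    """Move n_items through the buffer from a producer thread to a consumer thread."""
    buf = BoundedBuffer(capacity)
    received = []

    def produce():
        for i in range(n_items):
            buf.put(i)

    def consume():
        for _ in range(n_items):
            received.append(buf.get())

    threads = [threading.Thread(target=produce), threading.Thread(target=consume)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


print(transfer(100, 2) == list(range(100)))  # True
```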

    14. Re: Same thing SMP and such has meant by gidds · · Score: 1
      Please tell me more!

      I've read the one-paragraph version of CSP, but I can't immediately see how the CSP primitives combine naturally into large systems, nor how they're inherently different from the sort of threading primitives that, say, Java gives you. How do you work with CSP? How do you approach system design? How does it avoid the sorts of problems you get elsewhere?

      --

      Furthermore, I consider that subscription must be destroyed.

    15. Re: Same thing SMP and such has meant by GileadGreene · · Score: 1
      CSP primitives are inherently different from threads because they strictly ban shared data. All communication between CSP processes is via message passing (typically with blocking). There is no need for locks, mutexes, or semaphores, nor all of the conceptual difficulties that those engender. CSP primitives combine very naturally into larger systems through a parallel composition operator which establishes an inter-process interface. Composite processes can be further composed with other processes, making it easy to build hierarchical process networks. Because there's no shared data, you don't need to mentally keep track of what every thread is doing as you build your system: CSP-style concurrency gives you a much cleaner, easier to use, more scalable concurrency model than threads.

      The key advantage that CSP has is that it is a real, live, honest-to-god mathematical theory of concurrency, which means that the primitives and composite operators have extremely well-defined semantics, and have been analyzed to ensure that they compose correctly. Plus, you can express a design intended for a language that supports CSP-style concurrency directly in CSP notation, and then analyze it using various tools (or even pencil and paper) to ensure that it is deadlock-free, and maintains whatever properties you want maintained.

      If you are a Java programmer, you may want to check out the JCSP library of CSP primitives for Java. IMHO much easier to use than Java's native concurrency model.
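      A rough Python approximation of the message-passing style described above: each "process" is a thread that touches only its own local variables and talks to the others solely through channels. Python has no native CSP channels, so this assumes a one-slot `queue.Queue` as a stand-in; real CSP channels (occam, JCSP) are unbuffered rendezvous, and this sketch offers none of CSP's formal guarantees:

```python
import queue
import threading


def channel():
    """Assumed stand-in for a CSP channel: a one-slot queue.
    Real CSP channels are unbuffered rendezvous; this one buffers one message."""
    return queue.Queue(maxsize=1)


def producer(n, out):
    for i in range(n):
        out.put(i)  # blocks until downstream has taken the previous value
    out.put(None)   # end-of-stream marker


def squarer(inp, out):
    while (x := inp.get()) is not None:
        out.put(x * x)
    out.put(None)


def run_pipeline(n):
    """Compose three processes in parallel, wired together only by channels."""
    a, b = channel(), channel()
    results = []

    def collector():
        while (x := b.get()) is not None:
            results.append(x)

    procs = [
        threading.Thread(target=producer, args=(n, a)),
        threading.Thread(target=squarer, args=(a, b)),
        threading.Thread(target=collector),
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return results  # read only after every process has finished


print(run_pipeline(5))  # [0, 1, 4, 9, 16]
```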

    16. Re:Same thing SMP and such has meant by ckaminski · · Score: 1

      Forking has been made very efficient in modern unixes, and has very few of the multiprocessing issues that threading does. It's the main reason why Postgres still is not multithreaded on Linux: there's no benefit, and only pain and suffering.

      Thread implementations really need protected memory. In order for thread A to access memory owned by thread B, it must be granted permission, or have a handle to said memory. Problem is, that kind of brings you back to the fork() implementation. Why not just use that instead?

      Threads get you performance, but at a cost. Thread synchronization can kill all your performance gains. It's why Apache didn't have a threaded MPM until recently, and why it's still not the default (process stability, security, etc).

    17. Re:Same thing SMP and such has meant by ckaminski · · Score: 1

      The point of CMT/SMT is to take stalled transistors, waiting on L1/L2/L3 or main memory data, and put them to good use on other queued tasks. So CPU1 has something like this in its pipeline:

      thread1-IP: value
      thread2-IP: value

      thread1 gets dispatched, executes in multiple parallel pipelines, gets caught waiting for data.
      CPU takes an extra pipeline, throws away the results, and executes thread2.

      If thread2 stalls, returns to thread1. If thread1 results turn out bad (the discarded pipeline results were the right one), complete thread2, reissue thread1.

      This is why SMT/hyperthreading gives you only incremental benefit, and can actually cost you performance.

      You can get a benefit by truly parallelizing the ENTIRE execution pipeline. Multiple pipelines, multiple execution units. But that's just dual core, so stop making complex CPUs, start putting 20 of them on a die, and bam: supercomputing on a chip.

    18. Re:Same thing SMP and such has meant by ckaminski · · Score: 1

      Which is a big reason why AMD doesn't have this feature on their roadmap. Dual core is where it's at. Hyperthreading was Intel's solution to a poor design decision (a 20-stage pipeline).

    19. Re:Same thing SMP and such has meant by farnz · · Score: 1
      In Sun's case, Niagara is intended for big database stuff; the idea of Niagara is that, within certain limits, the time taken per access is unimportant. What matters is the number of queries per second.

      Sun hopes that while Niagara may take (example numbers only) 10 ms for a typical query, and an equivalently priced Opteron can do the same query in 1 ms, the Niagara processor can do 1,000 of those queries in 500 ms, whereas the Opteron takes 1,200 ms.
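      Working through those example numbers (the comment's own illustrative figures, not benchmarks) with Little's law, which says the average number of requests in flight equals throughput times latency:

```python
def throughput_qps(queries, elapsed_ms):
    """Queries completed per second."""
    return queries / (elapsed_ms / 1000.0)


def in_flight(qps, latency_ms):
    """Little's law: average requests in the system L = lambda * W."""
    return qps * (latency_ms / 1000.0)


niagara = throughput_qps(1000, 500)   # 2000 queries/s
opteron = throughput_qps(1000, 1200)  # about 833 queries/s
print(round(in_flight(niagara, 10), 1))  # about 20 queries in progress at once
print(round(in_flight(opteron, 1), 2))   # under one query in progress on average
```

      In other words, under these numbers Niagara only comes out ahead if roughly 20 queries are available to run concurrently, which is exactly the "lots of threads" assumption behind the design.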

  14. I am not really that impressed. by Ex+Machina · · Score: 1, Troll

    This is kind of a trivial optimization! Basically, you extend your pthreads library so all the threads within a single shared memory application schedule themselves on cores on the same chip. Big deal! Now if it could figure out how to schedule processes on "adjacent" cpus to optimize their common memory accesses, I'd be more impressed.

    1. Re:I am not really that impressed. by Matthew+Weigel · · Score: 1

      Perhaps I'm misunderstanding you, but yes, I believe Irix supports this for its ccNUMA machines, where the 'distance' between CPUs (and associated memory) can vary quite a bit. If you've got a single system image running on 10 machines with 2 CPUs apiece, you really don't want it to treat every CPU as adjacent to every memory area.

      --
      --Matthew
    2. Re:I am not really that impressed. by Anonymous Coward · · Score: 0

      By using the same CPU to run multiple threads simultaneously you avoid having to save and restore the register context, which is significant.

      The major commercial OSs can already run threads of the same process on multiple CPUs simultaneously... is that your question? OK maybe it wasn't.

    3. Re:I am not really that impressed. by hlge · · Score: 1

      Most kernels already do that and more, using processor affinity to determine which CPU a given thread should be scheduled on next. And lately (for Solaris) a concept called locality groups (lgroups) has been introduced to group CPUs and memory together based on locality, so that the scheduler and memory allocator can make informed decisions about where to allocate resources from. This is needed since more and more systems tend to have NUMA-like characteristics; in the case of Sun, an E25K-class machine is a NUMA system that is made to look like a UMA system. So in short, in the case of Solaris, the scheduler already knows how to "optimize" scheduling of threads based on CPU and memory locality, and the memory allocator can make similar decisions about where to allocate memory from, depending on what directives you give it. Cheers
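      Applications can also request affinity explicitly. Solaris exposes this through the `pbind` command and the `processor_bind()` call; the Python sketch below instead uses the Linux-only `os.sched_setaffinity`, so treat it as the Linux analogue rather than the Solaris interface:

```python
import os


def pin_current_process(cpus):
    """Restrict the calling process to the given CPU ids and return the resulting set.

    Linux-only: returns None where os.sched_setaffinity is unavailable.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, cpus)  # pid 0 means "this process"
    return os.sched_getaffinity(0)


if hasattr(os, "sched_getaffinity"):
    allowed = os.sched_getaffinity(0)
    print(pin_current_process(allowed) == allowed)  # True
```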

  15. CMT processors by Anonymous Coward · · Score: 0

    Not enough teeth and too many showings of Dukes of Hazzard and Heart Like a Wheel.

    1. Re:CMT processors by rcamans · · Score: 1

      Yum Yum. Daisy Duke. Eeee-Haaa!

      --
      wake up and hold your nose
    2. Re:CMT processors by hankaholic · · Score: 1

      That's fucking hilarious. I almost spat bagel chunks.

      In case you come back to check replies, thanks for the laugh. ;)

      --
      Somebody get that guy an ambulance!
  16. INKEY and TSRs by Anonymous Coward · · Score: 3, Funny

    Can I still use INKEY in my basic programs? Will multi-threading make it more efficient? Can I actually run a second program on my DOS PC without having to force it as a TSR?

    1. Re:INKEY and TSRs by Anonymous Coward · · Score: 0

      Yes.

    2. Re:INKEY and TSRs by Anonymous Coward · · Score: 0

      No - DOS won't run on SUN HW.

  17. Re:Bad bad English headline by betelgeuse-4 · · Score: 1

    In informal British English speech it's fine. In informal writing it's probably okay, and in formal writing contractions shouldn't be used.

  18. marketing handwave by klossner · · Score: 2, Insightful
    "throughput has become more important than absolute speed in the enterprise"
    I've been seeing this quote in press releases for three decades. It has always meant "we can't compete on performance so we're going to explain why performance isn't important anymore." The few times my management bought that story, they came to regret it.
    1. Re:marketing handwave by Anonymous Coward · · Score: 0

      You can tell it's a lie by the three words "in the enterprise".

    2. Re:marketing handwave by Anonymous Coward · · Score: 1


      Everyone is going multi-core, because waste-buckets like Pentium 4 are way overextended. Sun is just taking it further earlier than other vendors, because they are betting that people don't realize just how inefficient and hyped up their Pentium/Xeon systems are.

    3. Re:marketing handwave by platypus · · Score: 1

      No, Sun is using this technique as a last resort, because POWER5 and x86(-64) architectures have been destroying everything SPARC has come up with for the last several years.
      God damn, I wish people would take a look at the normal performance characteristics of their servers (or even PCs). In most cases there are just one or two threads (processes) dominating CPU usage. Niagara won't help you there.

    4. Re:marketing handwave by lewiscr · · Score: 1

      I beg to differ. Just on my desktop, there are currently 14 processes receiving a slice of the CPU this second, and 200+ processes that will receive CPU time sometime in the next 10 seconds. 3 of those processes are always fighting for the CPU (this is on a single P4 SMT 2.8 GHz machine). Sure, 8 cores with 4 threads each might be overkill for my desktop. But 2 cores with 4 threads each wouldn't be. And that's this year.

      Now, if you want to talk servers, I've got many processes I could rewrite to be much more parallel, but the hardware can't keep up. I've only got 2 SMT CPUs per machine, so I can only do heavy processing in 2 to 4 processes/threads. I would dearly love to have CPUs with 8 cores 4 thread each. Currently the code is unnecessarily procedural so that I don't drive the machine into the ground. A few programs would benefit from thousands of active threads, but most would benefit from ~50 active threads. Sure, I could buy more hardware, but if a slightly more expensive CPU will let me run ~16x faster after a rewrite to take advantage, I'll take 50 new CPUs.

      I could be a biased sample. I'm a power Linux user on the desktop, and have a nice horizontally scalable database design. That horizontal scalability automatically makes many of my tasks implicitly parallel.

  19. And how is this different from hyperthreading? by qwertphobia · · Score: 0, Redundant

    And how is this different from hyperthreading?

    Seriously, is there a difference, or is this just a marketing name to differentiate the two?

    --
    Never ask for directions from a two-headed tourist! -Big Bird
  20. Re:Hyperthreading by Anonymous Coward · · Score: 0

    HyperThreading is simply an implementation of SMT, it isn't exactly an Intel invention. I think a lot of the tech was designed into an Alpha processor that never saw general release, IIRC it was a 4 thread core. HyperThreading is a naff 2 thread variant that sometimes reduces overall processing power, lol.

  21. Thruput ... by foobsr · · Score: 2, Funny

    Throughput computing maximizes the throughput per processor and per system. So a processor with multiple cores will be able to increase the throughput by the number of cores per processor. This increase in performance comes at a lower cost, fewer systems, reduced power consumption, and lower maintenance and administration, with increase in reliability due to fewer systems. (from TFA, emphasis mine)

    So it seems they invented a way to linearly scale performance. WOW! But maybe I misunderstood and the thing is over my head.

    CC.

    --
    TaijiQuan (Huang, 5 loosenings)
    1. Re:Thruput ... by mungtor · · Score: 1

      I can't tell if you're being sarcastic, but that should still be good, right? A machine with N processors is not N times faster than its single-processor equivalent. I thought there was a rule that every time you double the number of processors you can increase the performance by 50%. Something to do with shared resources limiting operations.

      Of course I'm not a hardware engineer, and I could be completely wrong.

    2. Re:Thruput ... by Anonymous Coward · · Score: 0


      No, performance should scale more than linearly, because the CPUs won't be idle nearly all the time like they are in modern Xeon systems. Sun is advertising that only eight cores will be 15x as fast as UltraSPARC III, for example. Even for the SPECint trolls, this is still way faster than Xeon.

    3. Re:Thruput ... by John+Courtland · · Score: 1

      While this may not scale all the way to N-way systems, the Opteron and its HyperTransport bus allow greater per-processor throughput to memory and, I believe, to the northbridge.

      --
      Slashdot is proof that Sturgeon's Law applies to mankind.
    4. Re:Thruput ... by foobsr · · Score: 1

      I can't tell if you're being sarcastic, ...

      Admittedly, I fell back to that. I shall practise more in order to keep a state of equanimity when confronted with salesdroid talk.

      CC.

      --
      TaijiQuan (Huang, 5 loosenings)
    5. Re:Thruput ... by owlstead · · Score: 1

      Hmmm, funny yes, but as always, it really depends on the application. CPU bound tasks that have no shared data will probably scale directly with the number of processors. I've got a simple crypto-analysis tool that does just that (with exactly 8 threads as well). So yes, that will run 8 times faster.

      Fortunately Sun understands throughput and memory sharing pretty well, so I presume that it'll scale pretty well in less fortunate situations, such as running an application server. They probably designed it just for that particular field as well.

  22. As for a reason to count Sun out by simpolman · · Score: 1

    ... their continued use of the word "Enterprise." What does this mean anyhow?

    1. Re:As for a reason to count Sun out by Z00L00K · · Score: 1
      Enterprise - NCC-1701, what else can it be? :-)

      (Yes I know - Offtopic to the max!)

      --
      If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
    2. Re:As for a reason to count Sun out by Anonymous Coward · · Score: 0

      They actually meant "Scalable Enterprise Solution."

    3. Re:As for a reason to count Sun out by rs79 · · Score: 1

      " "Enterprise." What does this mean anyhow?"

      You can use it on spaceships.

      --
      Need Mercedes parts ?
    4. Re:As for a reason to count Sun out by Listen+Up · · Score: 1

      It means such software implementations as massive/distributed databases, massive collaborative development environments, J2EE, groupware backends, webserving, various application servers, NFS, NIS, print servers, redundancy, fail-over, massive scalability, etc.

      Basically, everything that does not normally exist on a desktop computer or small office server. Personal computing is a very small tip of the iceberg in the overall picture of computing needs and power.

    5. Re:As for a reason to count Sun out by Anonymous Coward · · Score: 0

      It means "Big Database(s)". That's essentially the only job that makes big iron cost effective. Since databases are heavy on multiple transactions on the same tables, keeping as much in memory as possible while having as many processors as possible working on it is the key. Locking back to main memory or disk is incredibly slow compared with locks in cache memory between threads on the same processor.

  23. Efficiency and latency are mutual tradeoffs by squarooticus · · Score: 3, Interesting

    Not sure I buy that this "increases a processor's efficiency as wait latencies are minimized". It seems to me that decreasing latency reduces efficiency, because you spend a greater percentage of your cycles changing state (overhead) instead of doing useful work. This is why real-time OSes aren't the norm: they keep latencies below critical maximums, but at the cost of overall throughput.

    --
    [ home ]
    1. Re:Efficiency and latency are mutual tradeoffs by Anonymous Coward · · Score: 0

      From the article, it looks like their approach is to increase the number of thread switches but reduce the overhead by doing the thread switching in hardware.

      This should increase the throughput on a highly parallel, memory-latency-bound system. I'm just not sure how common such things are.

      If Solaris were a microkernel, I could see this paying off, but I'd have to see useful applications first.

    2. Re:Efficiency and latency are mutual tradeoffs by farnz · · Score: 2, Informative

      A big webserver or database server is a highly parallel, memory-latency-bound system; each request is an individual thread, and in most database and web servers, locks are fine-grained enough to allow many requests to proceed in parallel, subject to them being able to retrieve the data from RAM or disk in a timely fashion.

    3. Re:Efficiency and latency are mutual tradeoffs by farnz · · Score: 2, Insightful
      A processor's wait latency is the time it spends doing absolutely nothing while it waits for an external device to catch up. If your RAM latency is around 100 cycles, and context switching costs you 100 cycles, you're right in saying that efficiency goes down. On the other hand, if each context switch costs you 10 cycles, you can context switch nine times before you've started to lose efficiency.

      Sun are putting in hardware to ensure that context switches are fast (possibly even one or two cycles); hopefully, this will result in the context switches costing less than waiting for memory accesses, and speed up the throughput of the system as a whole. So, benchmarking one thread of execution will show a slow system, whereas a group will hopefully show a big speedup.

    4. Re:Efficiency and latency are mutual tradeoffs by Relic+of+the+Future · · Score: 2, Informative
      Actually, that's the whole point of this technology: there is no expensive context switch between threads. The processor goes along issuing instructions from several threads, and when one of the threads gets a cache miss, it just keeps chugging along, issuing instructions from the other threads.

      Skimming the article, it doesn't even seem that this processor bothers with out-of-order execution or register renaming; if a thread stalls, it just starts issuing from a different thread.

      --
      Those who fail to understand communication protocols, are doomed to repeat them over port 80.
  24. Re:They're not the same. by Anonymous Coward · · Score: 0

    If you'd bothered to RTFA, you'd see that CMP = multicore, *not* CMT. CMT uses "logical processors" in exactly the same way as HyperThreading.

    Get a clue in general before posting crap comments, please.

  25. What DOES it mean to me? by pla · · Score: 5, Insightful

    It means "Difficult to reproduce bugs".

    It worries me how many people just say "it means faster programs and doesn't take much more work". That mindset leads to lazy programmers who A - Can't optimize to save their jobs; and B - Don't actually understand what multithreading really does.

    If you consider it easy, you've either just thrown great big global locks on most of your code, in which case your code doesn't actually parallelize well; or you've written what I refer to in my first sentence - bugs that take an immense effort just to reproduce, never mind track down and fix.

    1. Re:What DOES it mean to me? by Anonymous Coward · · Score: 0

      Or maybe it means that the smart people who write libraries for operating systems can safely use these multithreading techniques to boost performance in those libraries. Then, mere mortal programmers like the ones you so dismissively disparage can take advantage of these techniques merely by using the libraries. Great how that works.

    2. Re:What DOES it mean to me? by bjsyd70 · · Score: 2

      If the tasks are unrelated and share no data, then you can get them to run in parallel with only a small decrease in reliability or increase in cost. This is the typical case for a web application serving multiple independent clients (you are reading your mail, and I am reading mine).

    3. Re:What DOES it mean to me? by TrappedByMyself · · Score: 1

      Yeah, but what if I'm reading other people's mail?

      --

      Help me take back Slashdot. When did 'News for Nerds' become 'FUD and Conspiracy Theories for Extremist Nutjobs'?
    4. Re:What DOES it mean to me? by Anonymous Coward · · Score: 1, Interesting
      You are using the wrong tools.

      Just as managing memory is a "hard problem" but malloc() and free() make it safer, there are tools that let you use threads safely and easily too.

      Consider using something like OpenMP. There is nothing dangerous or risky or hard to debug in examples like

      sum = 0;
      #pragma omp parallel for reduction(+: sum)
      for (ii = 0; ii < n; ii++) {
          sum = sum + some_complex_long_function(a[ii]);
      }
      If you're trying to write your own "thread create" "thread join" stuff by hand, you're wasting your time and your employer's resources in the same way as if you decided to re-write your own garbage collector.
    5. Re:What DOES it mean to me? by smcdow · · Score: 1
      #pragma omp parallel for private(sum) reduction(+: sum)
      Bleah. FYI, I'm pretty sure GCC will reject this. Even the newest versions.
      --
      In the course of every project, it will become necessary to shoot the scientists and begin production.
    6. Re:What DOES it mean to me? by smcdow · · Score: 1
      It means "Difficult to reproduce bugs".

      Agreed. It also means "Gigantic, bloated slabs of programs that do everything even when a multi-process, toolkit based model fits better for solving the problem because all anyone knows is the threading model."

      --
      In the course of every project, it will become necessary to shoot the scientists and begin production.
    7. Re:What DOES it mean to me? by Anonymous Coward · · Score: 0

      Cool: lots of thread overhead from creating little threads. Most lib functions are not behemoths. OK, let the "lwp" vs. true-thread war begin (again).

    8. Re:What DOES it mean to me? by m50d · · Score: 1

      It's good enough with global locks. Yes, it isn't as efficiently parallelisable, but generally that isn't why I've written threads in the first place. I've written them because I have something blocking to do, or something else that requires threads. I know they're a nightmare to do properly, so for what I do, multithreading is easy: if it isn't, I'll do something easier instead.

      --
      I am trolling
    9. Re:What DOES it mean to me? by Anonymous Coward · · Score: 1, Informative
      It worries me how many people just say "it means faster programs and doesn't take much more work".
      Except that sometimes it's really true! (Write a ray-tracer sometime.) There are some problems -- and I mean really compute-bound problems -- that speed up almost in proportion to the number of processors, and where the extra programming for locking is just plain trivial.

      Sure, not everything is like that, but some things are. So quit raining on everyone's parade. ;)

    10. Re:What DOES it mean to me? by platypus · · Score: 1

      And if you are doing that, Niagara won't help you one bit, because if your thread is blocking, the OS can schedule another thread anyway, even on a single-CPU machine. On Niagara, you would just put one core into a waiting state; ergo, no speedup.

    11. Re:What DOES it mean to me? by GileadGreene · · Score: 1
      Perhaps if we could get out of the mindset that the ancient "threads" paradigm is the only way to do concurrent programming we'd be better off. The folks at Bell Labs have been doing concurrent programming for decades using the CSP concurrency model. It's worked fine for them. The same concurrency model underlies the occam language, which has been used to produce some extremely complex concurrent applications. See also stackless python, JCSP for Java, CCSP for C, and C++CSP for C++.

      Concurrency isn't the problem; the threads model is. The CSP style of concurrency reduces the number of difficult-to-reproduce bugs, and makes those bugs that remain amenable to mathematical analysis to determine their cause and solution.

    12. Re:What DOES it mean to me? by Anonymous Coward · · Score: 0

      Amen. Agree 100%

    13. Re:What DOES it mean to me? by Anonymous Coward · · Score: 0

      It means "Difficult to reproduce bugs".

      It worries me how many people just say "it means faster programs and doesn't take much more work". That mindset leads to lazy programmers who A - Can't optimize to save their jobs; and B - Don't actually understand what multithreading really does.

      If you consider it easy, you've either just thrown great big global locks on most of your code, in which case your code doesn't actually parallelize well; or you've written what I refer to in my first sentence - Bugs that take an immense effort just to reproduce, nevermind track down and fix.

      Multithreading is not easy, but it is not rocket science either.

      Optimizing all your code is a big mistake. Whenever you optimize every ten lines of code, you are optimizing locally, not globally, so while your 10 lines may run faster, your whole app is running slower, because you don't know where your app is spending most of its cycles.

      To optimize properly, you need to profile your app to see where it is spending most of the cpu cycles. Then making it faster is usually simple.

  26. Not exactly the same. by 1010011010 · · Score: 5, Informative

    1.3 Simultaneous Multi-Threading

    Simultaneous multi-threading [15],[16],[17] uses hardware threads layered on top of a core to execute instructions from multiple threads. The hardware threads consist of all the different registers to keep track of a thread execution state. These hardware threads are also called logical processors. The logical processors can process instructions from multiple software thread streams simultaneously on a core, as compared to a CMP processor with hardware threads where instructions from only one thread are processed on a core.

    SMT processors have an L1 cache per logical processor while the L2 and L3 cache is usually shared. The L2 cache is usually on the processor with the L3 off the processor. SMT processors usually have logic for ILP as well as TLP. The core is not only usually multi-issue for a single thread, but can simultaneously process multiple streams of instructions from multiple software threads.

    1.4 Chip Multi-Threading

    Chip multi-threading encompasses the techniques of CMP, CMP with hardware threads, and SMT to improve the instructions processed per cycle. To increase the number of instructions processed per cycle, CMT uses TLP [8] (as in Figure 6) as well as ILP (see Figure 5). ILP exploits parallelism within a single thread using compiler and processor technology to simultaneously execute independent instructions from a single thread. There is a limit to the ILP [1],[12],[18] that can be found and executed within a single thread. TLP can be used to improve on ILP by executing parallel tasks from multiple threads simultaneously [18],[19].

    --
    Napster-to-go says "Fill and refill your compatible MP3 player", which is a lie. It's not MP3. It's WMA with DRM.
    1. Re:Not exactly the same. by Anonymous Coward · · Score: 0

      Actually you are right, this sounds more like out-of-order execution, i.e. executing both sides of a conditional branch.

  27. Re:They're not the same. by Anonymous Coward · · Score: 0

    Er, HyperThreading but with added hardware "task switching", which is what the OS takes care of on HT processors.

  28. Everything you need to know about . . . by Anonymous Coward · · Score: 1, Informative

    Hyperthreading (which is SMT) and CMT (the original CMT, not Sun's new acronym) is at:

    http://www.realworldtech.com/page.cfm?ArticleID=RWT122600000000

    It was written a while ago, I think before hyperthreading came out (and Alpha was still being developed). The other two parts of the series are also interesting, and explain some of the possibilities with hardware processor threading. I think the first part has more explanation, but I couldn't find it quickly.

    The forums on the site are also good, better in a technical sense than ars-technica or aceshardware and especially slashdot.

  29. Translation by Anonymous Coward · · Score: 0

    I skimmed through the article, and it seems this is just multiple-cores-on-one-die (hyperthreading style), but they also add hardware context-switches. So, you can feed the processor 10 different threads and it'll take care of switching contexts as soon as a cache misses or such, without invoking the OS.

    Is this all there is to it? I mean, with just one L1 cache per core, this is not going to work very well, is it?

  30. In the loosely coupled world? little by Ars-Fartsica · · Score: 1
    First of all, threading has always been a great way for programmers to get in over their heads, create very tough bugs, and generally waste development time.

    But that's beside the point: in the new world of very many cheap rackmount servers clustered together, loose coupling has taken over. Maybe if the world had turned out differently and was dominated by big servers, threading would have caught on.

  31. Hyperthreading by Dominic_Mazzoni · · Score: 2, Interesting

    As many others have already pointed out, Intel has had Hyperthreading available in Pentium 4 and Xeon CPUs for a couple of years now, which does exactly what the article is talking about.

    I was skeptical at first, and read some of those articles showing that some applications could actually run slower. But then I tried it for myself, and I have to admit I've been impressed. My main box is a dual-Xeon, each with Hyperthreading turned on. It appears to Linux as if I have four independent CPUs. A few numerical tasks saturate the processors if I have just two of them running in parallel, but several tasks do fine with four or more copies. My favorite is "make -j 4" - starting four gcc processes in parallel works surprisingly well. How long does it take you to compile the Linux kernel?

  32. In most cases by Z00L00K · · Score: 5, Informative
    multithreading hardware will not mean much, but in some cases it may mean a lot for performance. (By most cases I mean users running Word/Excel/PowerPoint and the like.)

    The real issue is how large each thread can be (in the matter of memory) before it has to access data that is external to the thread. It may mean a lot for gamers running close to reality games and also for those that are doing massive calculations.

    The important thing is that developers have to be aware of the possibilities and limitations of this technology. Otherwise it would be like putting a V8 into a Model T Ford. It is possible, but you would never be able to use the full power.

    Another thing is that today's programming languages are limited. C (and C++) are advanced macro assemblers (not really bad, but they require a lot of the programmer). Java has thread support, but it's still the programmer (in most cases) who has to decide. Java is not very efficient either, which of course depends on which platform it's running on, combined with general optimizations. C# is Microsoft's bastard of Java and C++, with the same drawbacks as Java.

    There are other languages, but most of them are either too obscure (like Erlang or Prolog) or too unknown.

    The point is that a compiler should be able to break out separate threads and/or processes whenever possible to improve performance. It is of course necessary for the programmer to hint to the compiler where it may do this and where it shouldn't, but otherwise the programmer should be kept blissfully unaware of the details. The details may depend on the actual system where the application is running. For example, if the system is busy serving a bunch of users, then splitting the application into a bunch of threads is not really what you want, but if you are running alone (or almost alone), then the application should be permitted to allocate more resources. The key is that the allocation has to be dynamic.

    Anybody knowing of any better languages?

    --
    If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
    1. Re:In most cases by Anonymous Coward · · Score: 0

      Anybody knowing of any better languages?

      Every functional language.

    2. Re:In most cases by Anonymous Coward · · Score: 0

      Take a look at Oz if you want a modern well-designed language that actually understands concurrency.

  33. MultiThreading by sameerdesai · · Score: 0

    If the point is not clear: one should program to get the maximum instructions per cycle (IPC). This is essentially what multithreading provides, i.e. the ability to execute more instructions in a given cycle.

  34. Wrong. by BillsPetMonkey · · Score: 1

    "What's"

    Can mean any of the following:

    "What is"
    "What does"
    "What has"

    So the title of this post can validly be read
    "What is it Mean to Developers?"

    So the answer can validly be stated as

    "Yeah, it's real mean to developers".

    Go on, look it up.

    --
    "It's not your information. It's information about you" - John Ford, Vice President, Equifax
    1. Re:Wrong. by pclminion · · Score: 1

      There is only one valid interpretation. The concept you are looking for here is "ambiguity." Just because something is ambiguous does not mean that all possible interpretations are correct.

    2. Re:Wrong. by Anonymous Coward · · Score: 0

      don't be daft.

      "What is it mean to developers?" is meaningless.
      "What, is it mean to developers?" is better.

      but you can't contract words across separate clauses. it's unnatural and is never done. they're separate units of speech. to contract, you'd say

      "What, it's mean to developers?".

      so, there is only one valid interpretation, and no ambiguity at all. it is a bit clunky though.

    3. Re:Wrong. by BillsPetMonkey · · Score: 1

      There can be more than one valid interpretation because an interpretation is subjective.

      You might interpret "aloominum" as the correct pronunciation of "aluminium". Regardless of how you might then pronounce "condominium" or "plutonium", your interpretation is still valid.

      As is mine.

      --
      "It's not your information. It's information about you" - John Ford, Vice President, Equifax
    4. Re:Wrong. by pclminion · · Score: 2, Insightful
      Are you being purposefully dense?

      When a person says something, the intended meaning is not ambiguous (unless you are a poet), although the words used to describe that meaning may be.

      In this case it was intended to mean "What does it mean" and absolutely nothing else, your grammatical writhings notwithstanding.

    5. Re:Wrong. by Anonymous Coward · · Score: 0

      Over here we spell aluminum "aluminum", so it doesn't matter how we pronounce "condominium".

    6. Re:Wrong. by BillsPetMonkey · · Score: 1

      the intended meaning is not ambiguous

      The intended meaning is never ambiguous. Unfortunately, the intended meaning exists only in the speaker's head, which is why the speaker should remove all ambiguity from their speech.

      Which the poster didn't.

      --
      "It's not your information. It's information about you" - John Ford, Vice President, Equifax
    7. Re:Wrong. by pclminion · · Score: 1

      It is only ambiguous because you are being intentionally boneheaded. There's a time and a place for geeky logical nitpicking, and this ain't it.

    8. Re:Wrong. by Anonymous Coward · · Score: 0

      There's a time and a place for geeky logical nitpicking, and this ain't it.

      You're new round here aren't you ?

  35. Comment removed by account_deleted · · Score: 2, Informative

    Comment removed based on user account deletion

  36. CSP and libthread by CondeZer0 · · Score: 4, Informative

    This is what it means for me: http://www.cs.bell-labs.com/who/rsc/thread/

    Also see Brian W. Kernighan's "A Descent into Limbo" and Dennis M. Ritchie's "The Limbo Programming Language".

    And of course Hoare's classic: Communicating Sequential Processes.

    Now you can enjoy the power and beauty of the CSP model in Linux and other Unixes thanks to plan9port including libthread and Inferno; yes, it's all Open Source.

    --
    "When in doubt, use brute force." Ken Thompson
    1. Re:CSP and libthread by GileadGreene · · Score: 1

      See also occam, the original CSP-based language, and JCSP, a CSP library for Java. The WoTUG website has lots of great references on CSP, occam, and the communicating process model in general.

  37. Count Them Out? by lbmouse · · Score: 1

    Yet another reason not to count Sun out...

    Who has ever counted them out?

    1. Re:Count Them Out? by Quill_28 · · Score: 1

      You new around here?

      90% of the posters think Sun is going the way of the dodo.

      Of course, most of that 90% have never worked in an enterprise environment where they could see some of the advantages of Solaris, AIX, HP-UX, etc.

    2. Re:Count Them Out? by dmh20002 · · Score: 0, Flamebait

      Who has counted Sun out? How about the stock market? Anyone in this thread who really likes Sun should go buy some of their very cheap stock ($4.43, down from $60.00 in 2001) and make a zillion on their comeback.

      put some of that in your social security personal account!

    3. Re:Count Them Out? by bradleycarpenter · · Score: 1

      Yes, and we all know how the stock market always knows everything. Just like before the bubble when all the tech stocks were in the $200 range. Sun has been slow to make a comeback like most of the tech companies that did not go bust, but they still have $7 billion in the bank and things are starting to look bright once again.

  38. Re:Hyperthreading by Anonymous Coward · · Score: 0

    In my experience, parallel makes on a hyperthreading CPU don't run faster. Did you time the total build each way? Are you not running with -pipe, so that you are really just parallelizing access to the disk? To do the experiment correctly, run make -j 4 and make -j 2, each with and without hyperthreading, getting four measurements in all.

  39. Old idea, but there are many ways to implement by jd · · Score: 2, Informative
    I designed pretty much the same concept in the OpenCore "FLOOP" project I'm working on. You just have an array of registers, such that each thread's state is maintained and is directly referrable.


    Actually, the "best" way to implement the design is to split the thread state from the processing elements, then use locking on the elements. If two threads use independent processor elements, they should be simultaneously executable.


    By having many instances of the more common processing elements, you would have many of the benefits of "multi-core" (in that you'd have parallel execution in the general case) but the design would be much simpler because you're working at the element level, not the core level.


    Yes, none of this is really any different from hyperthreading, multi-core, or any other parallel schemes. All parallel schemes work in essentially the same way, because they all need to preserve states and lock resources.


    Personally, I think REAL Parallel Processing CPUs that can handle multiple threads efficiently are already well-enough understood, they just have to become reasonably mainstream.


    For myself, I am much more interested in AMD's Hyper Tunneling bus technology, which looks like it could supplant most of the other bus designs out there.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    1. Re:Old idea, but there are many ways to implement by Hortensia+Patel · · Score: 1

      Do you have any links handy on the "hyper tunneling" bus you mention? I'd not heard of it before and can't find anything substantive on the Web. Is it something not yet announced, or is my Google-fu unusually poor today?

    2. Re:Old idea, but there are many ways to implement by jd · · Score: 1
      I can't find a direct link. Fortunately, I've archived the web pages and will be able to put them up somewhere. The idea seems to have been to have a generic high-speed bus into which you plugged PCI, SCSI, VME or other regular busses.


      The closest I can see to that on AMD's website is some vague referencing to Hyper Transport, which is a high-speed chip-to-chip communication system. My guess is that there wasn't enough interest for them to build an entire bus architecture, but that high-speed chip communication proved sellable.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  40. My J2EE Application will FLY by PHAEDRU5 · · Score: 3, Insightful

    Since I mostly work on J2EE stuff, I let the container take care of the threading for me. The one exception is J2EE Connector Architecture (JCA) bits that use the work manager. Even there, however, most of my work is simply putting a thin JCA layer in place between the outside world and the J2EE stack.

    For me, these new chips simply mean increased performance for deployed apps, without any modification to the app code.

    Beauty!

    --
    668: Neighbour of the Beast
    1. Re:My J2EE Application will FLY by Anonymous Coward · · Score: 0


      Finally, someone in this discussion who really gets it. Sometimes, Slashdot makes me want to cry.

    2. Re:My J2EE Application will FLY by Anonymous Coward · · Score: 0

      Agreed! All of these "Threading is HARD!" and "if you use any abstraction tools you are teh debil!" posts are just obnoxious.

      It all depends on the tool set you're using, and for those of us who are working with .Net or Java, extreme performance is most likely not our primary goal. Having a working application to deliver to a customer would be. For those game/physics/scientific coders that need every ounce of processing power, yeah, you'll have to write the "Hard" code.

      Me, my customers wanted a Knight Rider-style progress bar, and thanks to some (simple) multithreading, they got it. On schedule and on budget.

      -Rick

  41. whore for +1 informative by delirium+of+disorder · · Score: 1
    Intel was the first to implement this, true, but Sun has a history of great SMP support in both software and hardware; multiprocessing/multithreading could become a widely accepted trend in the industry, and Sun could take the lead. For more information on SMT/CMT, read up:


    http://en.wikipedia.org/wiki/Simultaneous_multithreading/

    --
    ------ Take away the right to say fuck and you take away the right to say fuck the government.
    1. Re:whore for +1 informative by convolvatron · · Score: 1

      No, Intel wasn't. A better candidate would be the Tera MTA processor, but it wasn't the first either.

  42. This is just Multi-core processing... by mzito · · Score: 5, Interesting

    CMT is nothing more than multi-core processors. Sun is using the marketing idea of CMT to hide the fact that the UltraSparc IV is nothing more than two UltraSparc III cores on one chip.

    One way to look at this is Sun maximizing their existing engineering efforts. However, by marketing it as some revolutionary feature advance, they're implying that they've done something new and exciting, as opposed to something that IBM is already doing and AMD and Intel are working on.

    Beyond that, Sun and Fujitsu have a co-manufacturing and R&D deal now, confirming something those in the enterprise space have been saying for a long time - Fujitsu was making better Sun servers than Sun.

    Plus Sun killed plans for the UltraSparc V, leaving only the Niagara. They have the Opteron line pushing up from below, and rapidly evaporating sales at the high end. They're resorting to marketing gibberish to add new features to the product line, while simultaneously offloading R&D and manufacturing to a partner.

    Remind me again why Sun is in the hardware business?

    Thanks,
    Matt

    --
    me@mzi.to
    1. Re:This is just Multi-core processing... by mzito · · Score: 4, Informative

      And actually, this makes me so grumpy that I forgot the whole other piece.

      Despite the fact that Sun markets the UltraSparc IV as a single processor, software licensors like BEA and Oracle require that you license their software PER CORE. This means that a "4 processor" UltraSparc IV requires 8 processor licenses for Oracle or Weblogic.
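      To make the arithmetic concrete, here is a minimal Python sketch. It assumes a flat per-core rule (a chip with n cores counts as n processors, as in the Oracle clause quoted further down this thread) and, for comparison, a hypothetical per-socket rule; `licenses_required` is an illustrative name, not any vendor's tool.

```python
def licenses_required(sockets: int, cores_per_socket: int, per_core: bool = True) -> int:
    """Count processor licenses for one box.

    per_core=True models the per-core rule (a chip with n cores counts as n
    processors); per_core=False models a hypothetical per-socket rule.
    """
    if sockets < 0 or cores_per_socket < 1:
        raise ValueError("need a non-negative socket count and at least one core per socket")
    return sockets * cores_per_socket if per_core else sockets


# A "4 processor" UltraSparc IV box: 4 sockets, 2 cores per chip.
print(licenses_required(4, 2))                  # 8
print(licenses_required(4, 2, per_core=False))  # 4
```

      Same box, same workload: the per-core rule doubles the license count compared with counting sockets.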

      Sun never tells you this, and consequently a lot of people suddenly get tagged with additional licenses if they get audited. BEYOND that, Sun tells people that they can "double their performance" by replacing all of their UltraSparc IIIs with UltraSparc IVs, not explaining that they are doubling their performance because they're doubling the number of processors, AND that doing that upgrade can put them on the hook for literally hundreds of thousands of dollars in software cost.

      We've seen a number of companies get bitten by that, and it is downright disingenuous of Sun.

      Thanks,
      Matt

      --
      me@mzi.to
    2. Re:This is just Multi-core processing... by Blitzenn · · Score: 1

      I have to agree. These manufacturers are bastardizing terms that we have had a hard enough time establishing and making mean something. This is not multithreading. It is not suited to the term. It is multi-core processing. If we allow people like this to continually mix up the terminology, how are we going to talk to each other intelligently when it comes down to coder A explaining to coder B what they did and how they accomplished it? We won't, without first arguing out the base terms for the conversation.

      I am sure that there will be others who will throw up definitions that support their argument, but... Multithreading:

      " The ability of an operating system to execute different parts of a program, called threads, virtually simultaneously. The programmer must carefully design the program in such a way that all the threads can run at the same time without interfering with each other."

      There is of course a wee bit more needed in this definition to make the difference clear, and that is that each thread takes its turn being processed if on a single-core machine (physically impossible to do otherwise). A multithreaded piece of code can run two threads physically at the same time if it is run on a multi-core machine. Otherwise the threads take turns in execution depending on the process priorities.

      A multi-core machine or processor could run multiple threads or programs simultaneously (if coded to take advantage of the multiple processing paths). This is something that a single-core or single-CPU machine can only perform virtually.

      Let's stop the graying out of terms, the bastardizing of terms and the blatant misuse of terms for the purpose of promotion and sale.

      Shame on Sun!

    3. Re:This is just Multi-core processing... by philipgar · · Score: 4, Informative

      You miss one of the major points in the article, and that is that CMT is not really about the UltraSparc IV being a fully CMT processor. This is about the Niagara chip. The Niagara chip is truly a CMT processor.

      The reason this is so is because it functions as both a chip multiprocessor and a multithreaded core (although I think I'd consider their multithreaded cores to be fine-grained multithreading rather than SMT, but that's a different story altogether). While IBM's POWER5 offers these same advantages (dual core, 2-way SMT cores), that is 4 threads per processor and not overly impressive.

      The Niagara chip, in comparison to IBM (and upcoming Intel dual-core/SMT designs), is based on the assumption that at higher clock speeds the CPU is rarely fully utilized (while the P4 can retire up to 3 instructions per cycle, many apps, particularly data-intensive apps, have an IPC of less than 1). The chip contains 8 cores with 4 threads being executed on each core. This means 32 threads can run concurrently. Sure, no single thread will run as fast as it would on a NetBurst, Athlon 64, or POWER chip, but the combined throughput is enormous. Assuming each runs at ~1/4 the speed of its counterpart, that still gives us the equivalent of 8 full-speed threads on a single chip. This is enormous, and will have a major impact on database design (I'm currently doing research on SMT's effect on database algorithms), and the payoffs can be great (as can those of standard prefetching).

      I wouldn't recommend writing off CMT as a marketing buzzword etc. The era of throughput computing is upon us; let's just hope Oracle and the other per-processor vendors change their licensing to something that correlates with TPC performance or some other metric that still has meaning. Otherwise companies are better off with a couple of massively parallel single-core chips that cost a whole lot more and draw a whole lot more power for the performance they produce.
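      A back-of-the-envelope Python sketch of that arithmetic. The 1/4 relative speed is the poster's assumption, not a measured figure, and `equivalent_full_speed_threads` is a hypothetical helper.

```python
from fractions import Fraction


def equivalent_full_speed_threads(cores: int, threads_per_core: int,
                                  relative_speed: Fraction) -> Fraction:
    """Aggregate throughput expressed as a number of full-speed threads.

    relative_speed is one hardware thread's speed relative to a single thread
    on a conventional high-clock core (an assumption, not a measurement).
    """
    hardware_threads = cores * threads_per_core
    return hardware_threads * relative_speed


niagara = equivalent_full_speed_threads(cores=8, threads_per_core=4,
                                        relative_speed=Fraction(1, 4))
print(niagara)  # 8
```

      Exact fractions avoid floating-point noise; the real answer depends entirely on the relative-speed assumption fed in.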

      Phil

    4. Re:This is just Multi-core processing... by Anonymous Coward · · Score: 0

      One way to look at this is Sun maximizing their existing engineering efforts.

      If you pay attention to what Sun has said in the past, Niagara is a ground-up, from-scratch design. It is not based on previous UltraSPARC cores. This is why they call the UltraSPARC IV "first generation" CMT, while Niagara will be the no-holds-barred real deal.

    5. Re:This is just Multi-core processing... by Anonymous Coward · · Score: 0


      Plus Sun killed plans for the UltraSparc V, leaving only the Niagara. They have the Opteron line pushing up from below, and rapidly evaporating sales at the high end. They're resorting to marketing gibberish to add new features to the product line, while simultaneously offloading R&D and manufacturing to a partner.

      Sun has been remarkably consistent in their R&D investment, even post-boom. But if you look at the UltraSPARC V, it is redundant with the Fujitsu SPARC64 designs, and it takes resources away from Niagara, Niagara II, and Rock. Sun shows they aren't afraid to change course, when they feel it is best for them and their customers. This is a good thing, in my opinion.

    6. Re:This is just Multi-core processing... by 0xABADCODA · · Score: 3, Informative

      Sun's CNP is modeled after Tera's MTA architecture (now named Cray again), which trades memory latency for throughput. Basically, in the MTA (Multithreaded Architecture) each of 128 processor threads issues a few memory fetch instructions and waits for the memory to arrive (dozens to hundreds of cycles). This happens for every thread, so the effect is that memory fetches and execution time are separated... in other words, time=max(execution,fetch) vs time=execution+fetch on normal processors. This also makes pipeline hazards irrelevant, so no effort is wasted on branch prediction.

      That's great for scientific apps since they are massively parallel... Sun has taken the same idea and scaled it down to 4 overlapping threads so normal applications can benefit. While it can be used to run 4 separate process threads at a time, on the MTA at least the threading is fine-grained, so what really happens is that the compiler changes a for (;;i++) loop into four for (;;i+=4) loops and runs them in parallel.

      This technology done right means a massive performance boost (as in like 25-50%) while also simplifying the processor. Contrast that with Hyperthreading, which complicates the processor and only gets a ~5-8% benefit on average... it's mostly designed to minimize context switch times.
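      A toy Python model of that overlap, assuming a single-issue core that picks the next ready thread each cycle (round-robin) and skips any thread still waiting on memory. It illustrates time=max(execution,fetch) versus time=execution+fetch; it is not a model of Tera's or Sun's actual pipelines, and `simulate` and its parameters are hypothetical.

```python
def simulate(num_threads: int, iterations: int, compute: int, latency: int) -> int:
    """Cycles until every thread has finished all iterations, last fetch included.

    Each thread repeats `iterations` times: `compute` cycles of work, then a
    memory fetch taking `latency` cycles. Each cycle the core issues one unit
    of work from the next ready thread; waiting threads are skipped, so one
    thread's fetch overlaps other threads' work.
    """
    if num_threads < 1 or iterations < 1 or compute < 1 or latency < 0:
        raise ValueError("invalid parameters")
    remaining = [iterations] * num_threads
    work_left = [compute] * num_threads   # compute cycles left in current iteration
    ready_at = [0] * num_threads          # first cycle the thread may issue again
    cycle = 0
    nxt = 0                               # round-robin starting point
    while any(r > 0 for r in remaining):
        for offset in range(num_threads):
            t = (nxt + offset) % num_threads
            if remaining[t] > 0 and ready_at[t] <= cycle:
                work_left[t] -= 1
                if work_left[t] == 0:
                    # Compute phase done: issue the fetch, sleep until it returns.
                    remaining[t] -= 1
                    work_left[t] = compute
                    ready_at[t] = cycle + 1 + latency
                nxt = (t + 1) % num_threads
                break
        cycle += 1                        # a cycle with no ready thread is a bubble
    return max(cycle, max(ready_at))


print(simulate(1, 2, 2, 3))  # 10
print(simulate(4, 2, 2, 3))  # 19
```

      One thread alone needs 10 cycles, so four threads run back-to-back on a conventional core would need 40; interleaved, they finish in 19, close to the 16 cycles of pure compute.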
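      A minimal Python sketch of that loop transformation, assuming a simple array sum: one `for (i = 0; i < n; i++)` loop becomes four interleaved loops that start at 0, 1, 2 and 3 and step by 4. The helper names are hypothetical. In CPython the global interpreter lock keeps these threads from actually computing in parallel, so this shows the decomposition, not the speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Sequence


def strided_partial_sum(data: Sequence[int], start: int, stride: int) -> int:
    """One of the interleaved loops: i = start, start + stride, start + 2*stride, ..."""
    total = 0
    for i in range(start, len(data), stride):
        total += data[i]
    return total


def parallel_sum(data: Sequence[int], ways: int = 4) -> int:
    """Split one `for (i = 0; i < n; i++)` loop into `ways` loops stepping by `ways`.

    Each worker reads a disjoint set of indices, so no locking is needed.
    """
    if ways < 1:
        raise ValueError("ways must be at least 1")
    with ThreadPoolExecutor(max_workers=ways) as pool:
        partials = pool.map(lambda start: strided_partial_sum(data, start, ways), range(ways))
        return sum(partials)


print(parallel_sum(list(range(100))))  # 4950
```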

    7. Re:This is just Multi-core processing... by Anonymous Coward · · Score: 0


      The problem is with BEA and Oracle, not Sun. With Sun, IBM, Intel, and AMD all going multi-core, companies like Oracle are just milking it for all it's worth before they have to cave in and charge per socket. It is inevitable.

    8. Re:This is just Multi-core processing... by mzito · · Score: 1


      I was only talking about the USIV, not the Niagara. Honestly, I would be willing to wager a (smallish) amount of money that Niagara is the last chip architecture Sun releases.

      Thanks,
      Matt

      --
      me@mzi.to
    9. Re:This is just Multi-core processing... by mzito · · Score: 2, Insightful


      Oracle's been talking about reworking their licensing for a long time, and I agree licensing by core is sub-optimal. However, Oracle is being forthright that they charge by core, while Sun is _hiding_ the fact the USIV _is_ a multi-core processor.

      Sure, Oracle are the ones charging per processor core, but Sun is the company that is selling this upgrade as a painless, cost-effective way to upgrade their infrastructure. I firmly believe they are being negligent in not warning customers that this is a multi-core architecture - if you go to Sun's site and look at how it's sold, they pitch it as one processor, one core.

      Imagine you're a customer - you spend $100k on Sun's new processors as a "painless" 1-1 upgrade, and suddenly find out that the first $100k has put you on the hook for $150k in new licenses. Wouldn't you feel like you'd been misled?

      Thanks,
      Matt

      --
      me@mzi.to
    10. Re:This is just Multi-core processing... by Bryan-10021 · · Score: 1

      Did you READ the article on CMT? The Niagara chip has eight cores that will each be able to run up to four instruction threads simultaneously. So Niagara can handle up to 32 simultaneous threads, but this is *JUST* multi-core processing?!? Current multi-core means two cores running one thread each! And the chip just taped out. Of course competitors are *working* on something to compete. Let me know when those chips tape out.

      Sun's cancelling of the USV and partnering with Fujitsu was a smart move, allowing Sun to leverage Fujitsu's 64-bit compatible SPARC CPU while saving tons of R&D money trying to design a competing SPARC CPU. It's a win-win situation to me.

      You also leave out Sun's Rock chip, which is a higher-end version of Niagara.

      So Sun cancels the USV and it's no longer in the hardware business? Intel has recently cancelled chips that were on its roadmap as it decided that dual cores and not GHz was the direction they wanted to go. I guess Dell isn't in the hardware business either because they don't design anything: CPU == INTEL, motherboard == INTEL, etc.

      Who other than Sun and IBM design their own chips for their servers? In IBM's case they even manufacture their CPUs. Sun farms it out to T.I.

    11. Re:This is just Multi-core processing... by Sentry21 · · Score: 1

      Well isn't that a result of Oracle's licensing? I mean, should Sun have to say 'Yes, we have this great new processor and it will make things faster, but oh, here is every piece of software that might not work or might charge you more money', and investigate everything their clients use?

      Yes, it sucks that Oracle is trying to bend people over a barrel (and succeeding), but look at both sides: Sun is selling processors as processors, and Oracle is treating them as dual processors. Both have their points, but look at things from different angles.

      Does Oracle provide a way to limit how many CPUs it runs on? Can you put it on an 8-processor machine and tell it to only use four? If not, then too bad for them, and it's not Sun's fault. If so, then too bad for the DBA that didn't configure it, and it's still not Sun's fault.

      If I have a badass database server with 4 dual-core processors and only want to pay for a 4-processor license, then let me pay for that, limit it to four cores, and then I can use the other four processors for database maintenance tasks (cronjobs and so on) if I want to. One dual-core for the kernel and system processes, and one for background processes, cronjobs, etc.
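      As a generic illustration of holding a process to a subset of CPUs (not an Oracle feature, and not a statement about what any vendor accepts for licensing), Linux exposes CPU affinity, which Python wraps as `os.sched_setaffinity`. On Solaris the native tools would be processor sets (`psrset`) or `pbind`. `pin_to_first_cpus` is a hypothetical helper.

```python
import os


def pin_to_first_cpus(count: int) -> set:
    """Restrict the calling process to the first `count` CPUs it is currently allowed.

    Linux-only: os.sched_setaffinity does not exist on macOS or Windows.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("CPU affinity is not available on this platform")
    allowed = sorted(os.sched_getaffinity(0))   # pid 0 means "this process"
    os.sched_setaffinity(0, allowed[:count])
    return os.sched_getaffinity(0)


# e.g. on an 8-CPU box, keep this process on four of them:
#   pin_to_first_cpus(4)
```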

      I don't think it's fair to blame Sun when companies get screwed by Oracle's licensing. Besides, if you buy hardware, you should know everything that that entails. If you don't do your research, you deserve to get screwed.

    12. Re:This is just Multi-core processing... by Anonymous Coward · · Score: 0
      We've seen a number of companies get bitten by that, and it is downright disingenuous of Sun.
      Or maybe it's disingenuous of Oracle and Weblogic. Why blame a hardware manufacturer for a billing problem? It's not their fault.
    13. Re:This is just Multi-core processing... by g0_p · · Score: 1

      First of all, if you are planning on spending $100K on a server, it is a good idea to do some product research before you buy the product. Also recommended is to see if the software that you want to install is supported well on that hardware. Sun has always maintained that the software that runs on its chips should be licensed per chip rather than per core. The per-core pricing is not a Sun pricing model, nor is it endorsed by any Sun software product either. So why exactly should it advertise that it is a multi-core chip? It is a complete non-issue from Sun's perspective.

      The data sheets for UltraSparc IV say clearly that it is a CMT processor with each processor capable of handling 2 threads. I don't know what you are complaining about. If anything you should be complaining to Oracle about their pricing model created for fleecing their customers. And BTW I checked out the pricing information sheets on the Oracle website. It does not say in any specific terms that the pricing is per core. Pricing models seem plainly stated as per CPU or on a per-user basis.

    14. Re:This is just Multi-core processing... by mzito · · Score: 1


      I'm not disagreeing that it's at least partially the customer's responsibility to figure out what's going on.

      What I am disagreeing with is Sun's policy of claiming that an UltraSparc IV != 2 UltraSparc IIIs. Instead, an US IV is somehow 2x faster than an US III, "because Sun is just that good".

      The datasheets say that the UltraSparc IV is a CMT processor, yes, with each processor capable of handling two threads, but show me where that datasheet says "two execution cores". In fact, take a look at the US IV datasheets and watch as the Sun folks dance like crazy to avoid ever saying "multiple execution cores". They say things like "simultaneous threads" or "based on two UltraSparc III pipelines". If Sun simply stepped up and said, "Look, we've smooshed two UltraSparc III processors into one slot, so you can double your CPU count in your existing frame", I would be content.

      And I'm not complaining because I feel like Sun has cheated me or my company. In fact, that policy only works to our favor - we get a number of interested customers who feel like Sun has treated them unfairly and want to move off Sun. I'm offended because as a business person, I do not intentionally mislead my customers as to the nature of what I do, something I know for a fact Sun sales reps have been doing.

      IBM has been doing multi-core chips for a while, and they've never tried to claim that they were one processor. They say it's two - why is that so hard? If you want to claim Niagara is something fundamentally different, you can make that argument. But the US IV is really just two USIIIs put together, nothing more, nothing less.

      The last thing - right from Oracle's licensing agreement:

      "For the purposes of counting the number of processors which require licensing, a multicore chip with "n" processor cores shall be counted as "n" processors."

      Seems pretty clear to me.

      Thanks,
      Matt

      --
      me@mzi.to
    15. Re:This is just Multi-core processing... by I_am_the_man · · Score: 1

      Actually, with Solaris Zones Sun provides a means to work the issue out. You can install and run Oracle in a zone that only uses two of the 4 processors (not sure Oracle supports running their DB in a Solaris zone). But your point is well taken: Oracle does not give you a means to run on a subset of the hardware if that is all you have paid for or want to pay for.

    16. Re:This is just Multi-core processing... by g0_p · · Score: 1

      The last thing - right from Oracle's licensing agreement: "For the purposes of counting the number of processors which require licensing, a multicore chip with "n" processor cores shall be counted as "n" processors."

      I am sure that they have to say it, at least in the licensing agreement, that they are charging per core. Otherwise how would Oracle communicate the fact at all? But it's conveniently not mentioned anywhere else on their product webpage. THAT definitely seems like they are trying to hide the fact.

      The datasheets say that the UltraSparc IV is a CMT processor, yes, with each processor capable of handling two threads, but show me where that datasheet says "two execution cores".

      Well show me on the datasheet for a Power 5 chip based server where it says that it is a dual core chip. I went through the pages and have not found any information anywhere.

    17. Re:This is just Multi-core processing... by ckaminski · · Score: 1

      There's two things you can do... dedicate more and more transistors to gaining instruction-level-parallelism, or just simplify the chips, increase clockspeed, and put more cores on a single die.

      The whole SMT fiasco has been about trying to gain benefits from a stalled processor pipeline by reexecuting alternate threads...

    18. Re:This is just Multi-core processing... by mzito · · Score: 1
      Well show me on the datasheet for a Power 5 chip based server where it says that it is a dual core chip. I went through the pages and have not found any information anywhere.

      Well, that's because they only sell the Power5 as a dual-core chip. That is, in the specs and what-not, when they say, "Two Power5 processors", they mean one physical die with two cores. Consequently the p595 that supports up to "64 processors", is 32 physical processors the way Sun counts them.

      Even in the single-processor ppc systems, it's a dual-core POWER5 with only one core activated.

      This redbook discusses it in very clear, comprehensive terms:

      http://www.redbooks.ibm.com/redpapers/abstracts/redp9117.html

      Thanks,

      Matt

      --
      me@mzi.to
    19. Re:This is just Multi-core processing... by Anonymous Coward · · Score: 0


      Sun's roadmap includes Niagara II and Rock. It could be that Sun continues the massive CMT designs, while Fujitsu provides the more traditional SPECtroll designs.

    20. Re:This is just Multi-core processing... by Anonymous Coward · · Score: 0


      Intel did Hyperthreading so they could market it for all it's worth. Just look at all the losers above who fell for it.

    21. Re:This is just Multi-core processing... by Anonymous Coward · · Score: 0

      Who other than Sun and IBM design their own chips for their servers?

      HP designs their...oh, wait, never mind.

  43. way to get it wrong by CaptainPinko · · Score: 5, Insightful

    As many others have already pointed out, Intel has had Hyperthreading available in Pentium 4 and Xeon CPUs for a couple of years now, which does exactly what the article is talking about.

    As many others know, you know exactly nothing about what you are talking about. HT basically has two sets of registers so that during a cache miss, which would cause a bubble, the chip switches to the other set so it doesn't sit idle. Sun's chips, on the other hand, actually have multiple cores physically doing work at the same time. In fact, were it not for Intel's hideously flawed NetBurst architecture, the hideous hack that is HyperThreading would not provide any performance increase at all (in fact it doesn't so much provide an increase as negate a decrease...). For evidence, consider how many Pentium Ms have HT on them... Now I may not be fully correct, but I didn't volunteer a comment; I only posted to prevent the misinformation of others. You'll find more on Ars Technica. I'd link to the article but I can't find anything on their redesigned site.

    --
    Your CPU is not doing anything else, at least do something.
    1. Re:way to get it wrong by taniwha · · Score: 1
      Well, of course it negates a decrease... the whole point of stuffing the pipe from a different PC when one thread pipe-stalls is to use up unused micro-cycles that otherwise would go to waste. It's something that's hard to do on a traditionally piped machine (because the pipe bubbles tend to be large) but easier on a superscalar machine, because the implementation is just a few extra bits of tag in the right place.

      What Sun have done is pretty amazing... not particularly new though; the SMT stuff has been done in a bunch of places before, and so has the multi-core stuff (think Alpha maybe 5 years ago). (I helped build a core that had both 5-6 years back.) If you do CPU design it's pretty state of the art these days - both make sense from a best-way-to-use-silicon point of view (in a world where cross-chip delays preclude doing big-fast anymore, small-decoupled makes so much sense). It's taken Intel's marketing people a while to come round, but their engineers have finally got them talking sense.

    2. Re:way to get it wrong by at_18 · · Score: 2, Interesting

      As many others know, you know exactly nothing about what you are talking about.

      Dude, you don't know anything either. P4's hyperthreading is a two-threads implementation of Simultaneous multithreading. Niagara is an 8-way multiprocessor on a chip, and each processor has four-way simultaneous multithreading, exactly like the P4, just with more threads.

      Regarding the number of concurrent threads, it's basically equivalent to a 16-way Xeon server with hyperthreading enabled, but with much faster inter-processor communication (since it's all inside the same chip), and of course much lower cost, heat dissipation, etc.

    3. Re:way to get it wrong by Lemming+Mark · · Score: 2, Informative
      HT has basically two sets of registers so that during a cache miss which would cause a bubble the chip switches to the other set so it doesn't sit idle.

      I'm sorry but that's not correct. What you refer to is known as "Fine Grained Multithreading", or the product name "Superthreading".

      Intel's product "hyperthreading" is also known as "simultaneous multithreading" and is able to run multiple instruction streams simultaneously in order to maximise use of the functional units when they are not saturated by a single instruction stream. This is in addition to avoiding complete stalls on cache misses.
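      A toy Python model of the distinction, with made-up per-cycle instruction counts: each cycle a 3-wide core either issues from one non-stalled thread only (fine-grained, "superthreading" style) or fills leftover slots from the second thread in the same cycle (SMT, "hyperthreading" style). Real cores have many more constraints; the traces and `slots_used` are illustrative assumptions.

```python
from typing import List


def slots_used(traces: List[List[int]], width: int, simultaneous: bool) -> int:
    """Count issue slots filled over a run of cycles.

    traces[t][c] is how many independent instructions thread t could issue in
    cycle c (0 = stalled, e.g. waiting on a cache miss). All traces must have
    the same length.
    """
    used = 0
    for cycle in range(len(traces[0])):
        free = width
        for trace in traces:
            ready = trace[cycle]
            if ready == 0:
                continue              # stalled: try the next thread
            take = min(ready, free)
            used += take
            free -= take
            if not simultaneous or free == 0:
                break                 # fine-grained: one thread per cycle; SMT: slots full
    return used


thread_a = [1, 2, 0, 1]
thread_b = [2, 0, 2, 1]
print(slots_used([thread_a, thread_b], 3, simultaneous=False))  # 6
print(slots_used([thread_a, thread_b], 3, simultaneous=True))   # 9
```

      Out of 12 available slots, thread A alone fills 4, switching on stalls fills 6, and issuing from both threads in the same cycle fills 9.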

    4. Re:way to get it wrong by Spinlock_1977 · · Score: 1

      If I remember correctly, Sun's approach can migrate a thread to another (local) core with almost (or exactly) zero latency.

      Intel's HT (which is an Intel-ization of DEC's Simultaneous Multithreading, aka SMT) requires time to switch between one "virtual processor" and the other.

      True multi-core chips (as opposed to HT stuff), hate to migrate a thread because it's so expensive in terms of time.

      But let us not forget the CELL processor (discussed here recently) - which has a collection of special-purpose "cores" - a small array of which will no doubt blow the doors off of the methods described above when it comes to aggregate throughput.

      -spinLock

      --
      - The Kessel run is for nerf herders. I can circumnavigate the entire Central Finite Curve in a lot less than 12 parse
    5. Re:way to get it wrong by ratboy666 · · Score: 1

      Jeez -- you insult the poster, *and* you are wrong *and* manage a +5!

      Way to go, dawg!

      PS. The Niagara implements both "hyperthreading" and "multiple cores".

      Hyperthreading IS a good idea -- read the article (or any of the Intel papers). Throughput is the word; most users who disable the feature do so to improve latency (which may suffer).

      Ratboy.

      --
      Just another "Cubible(sic) Joe" 2 17 3061
  44. Re:Hyperthreading by BigZaphod · · Score: 1

    Yes it *sometimes* reduces it. Not always, though. For processor-bound tasks (such as crunching numbers), it would reduce performance as you can only do so much at one time. For normal tasks, it tends to increase performance as most tasks usually are waiting on memory or disk leaving a lot of free CPU time. Also normal tasks tend to have a lot of waste due to things like branch mis-prediction. The hyperthreading concept fills in those gaps with other running threads rather than always flushing the pipeline or causing a stall on the entire thing.
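    The same effect is easy to see one level up, with OS threads rather than hardware threads. A minimal Python sketch, assuming `time.sleep` as a stand-in for waiting on disk or memory (the function names are hypothetical): while one thread waits, others proceed, so four 50 ms waits take roughly 50 ms threaded versus roughly 200 ms serially. A CPU-bound task would not benefit this way.

```python
import threading
import time


def io_bound_task(delay: float) -> None:
    """Stand-in for work that mostly waits (disk, network); sleeping frees the CPU."""
    time.sleep(delay)


def run_serial(n: int, delay: float) -> float:
    start = time.perf_counter()
    for _ in range(n):
        io_bound_task(delay)
    return time.perf_counter() - start


def run_threaded(n: int, delay: float) -> float:
    start = time.perf_counter()
    threads = [threading.Thread(target=io_bound_task, args=(delay,)) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                      # wait for every task; the waits overlapped
    return time.perf_counter() - start


print(f"serial:   {run_serial(4, 0.05):.3f}s")
print(f"threaded: {run_threaded(4, 0.05):.3f}s")
```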

  45. Re:Hyperthreading by PitaBred · · Score: 2, Interesting

    Try make -j5 or -j6. Tends to have better results than the -j4 on my dual Xeon rig. And yes, I have benchmarked it.
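    A small Python helper for picking the job count, assuming the -j4 baseline above means one job per logical CPU (a dual Xeon with Hyper-Threading enabled shows up as four logical CPUs). `make_command` is hypothetical; the +1 or +2 is the poster's empirical finding, so benchmark on your own machine.

```python
import os


def make_command(extra_jobs: int = 1, target: str = "all") -> list:
    """Build a `make -jN` argument list with N = logical CPUs + extra_jobs.

    extra_jobs=1 or 2 mirrors the -j5/-j6 reported above; treat it as a
    starting point to benchmark, not a rule.
    """
    logical_cpus = os.cpu_count() or 1   # counts Hyper-Threading siblings as CPUs
    return ["make", f"-j{logical_cpus + extra_jobs}", target]


# Usage (runs the build; needs make and a Makefile):
#   import subprocess
#   subprocess.run(make_command(extra_jobs=2), check=True)
```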

  46. Sun's new chips by Anonymous Coward · · Score: 3, Interesting

    Sun's upcoming "Niagara" chips are supposed to have eight cores, each core being able to execute four threads. So that allows up to 32 threads executing at once -- on one physical chip.

    And we're not talking about "HyperThreading" where one of the CPUs is virtual. It's a real execution unit.

    And Intel and AMD are talking about dual-cores?

    This should help save space and energy (both in the power needed to run the box, and in running the cooling system).

    1. Re:Sun's new chips by olddotter · · Score: 1
      Sun's chips will have eight cores while Intel and AMD are talking about dual-cores: Finally Sun will have a chip that is comparable to AMD Opteron in performance!

      Given the clock rate difference between AMD/Intel processors and the Sparc line, it will take 4x parallelism for Sun to be able to hold their own. :-)

      50% funny and more than 50% true

  47. Hyper-multi-threading by sczimme · · Score: 1


    ..but wouldn't it be even better if it was hyper-multi-threading?

    Of course - all things are better when they're hyper*. Of course they tend to jump from A to B so quickly everything becomes blurry. Besides, jumping into hyper-multi-threading [isn't] like dusting crops, boy!

    * See Compu-Global-Hyper-Mega-Net.

    --
    I want to drag this out as long as possible. Bring me my protractor.
    1. Re:Hyper-multi-threading by Anonymous Coward · · Score: 0

      Skip hyper and go straight to ludicrous (plaid).

  48. I buy the thoughtput / speed arguement, but... by PotatoHead · · Score: 3, Informative

    there are still some applications where raw CPU speed matters.

    We have been at the "throughput is good enough" point for several years. In truth, this is old news really. I've got IRIX servers doing lots of things plenty fast, clipping along at a brisk 400MHz. There is not much you can't do with that, particularly when running a nice NUMA box.

    I assume the same holds true for Sun gear. (I think their NUMA performance is a bit lower than the SGI's, but I also don't think it matters for a lot of enterprise stuff.)

    One application I have running, NUMA style, is MCAD. It's cool in that I have one copy of the software serving about 25 users, running on a nice NUMA server that never breaks. Admin is almost zero, except for the little things that happen from time to time --mostly user related.

    However, I'm going to have to migrate this to a win32 platform. (And yes, it's gonna suck.) Why? The peak CPU power available to me is not enough for very large datasets and I cannot easily make the data portable for roaming users. (If there were more MCAD on Linux, I could do this, alas...)

    Love it or hate it, the hot running, inefficient Intel / AMD cpu delivers more peak compute than any high I/O UNIX platform does. And it's cheap.

    Sun is stating the obvious with the whole I/O thing, IMHO. In doing so, they avoid a core problem; namely, peak compute is not an option under commercial UNIX that needs to be. (And where it is, there are no applications, or the cost is just too high...)

    This is where Linux is really important. It runs on the fast CPUs, but is also plenty UNIXey to allow smart admins to capture the benefits multi-user computing can provide.

    Linux rocks, so does Solaris, IRIX, etc... The difference is that I can get IRIX & Solaris applications.

    WISH THAT WOULD CHANGE FASTER THAN IT CURRENTLY IS.

    1. Re:I buy the thoughtput / speed arguement, but... by Sentry21 · · Score: 1

      Love it or hate it, the hot running, inefficient Intel / AMD cpu delivers more peak compute than any high I/O UNIX platform does. And it's cheap.

      I dunno, universities, laboratories and the like have been loving the G4, and the G5 is more and better. Yeah, it runs a little hotter, but it's still a sweet beast regardless, and it comes with a solid 64-bit UNIX core. The question I don't have the answer to, though, is whether or not it runs what you want. I suspect it would though.

    2. Re:I buy the thoughtput / speed arguement, but... by PotatoHead · · Score: 1

      It doesn't run MCAD. Currently X86/win32, Sparc, HP-UX (IBM & SGI Platforms on the decline)

      You are right about the G-series chips otherwise.

    3. Re:I buy the thoughtput / speed arguement, but... by ckaminski · · Score: 1

      I don't get it. Linux is close enough to Irix to make portability a cinch (compared to Win32), and aside from the endianness issue of your data, there should be no issues at all in moving to a Linux platform. It's not as if the Unix world hasn't known "SGI is dying" for almost half a decade now. Kick your vendors in the ass for a Linux version. Dumb ass vendors... :-)

      One benefit of Solaris on x86-64 is again, aside from the endian problem, relatively easy portability...

  49. Re:Hyperthreading by SunFan · · Score: 4, Interesting


    "Intel has had Hyperthreading available in Pentium 4 and Xeon CPUs for a couple of years now, which does exactly what the article is talking about"

    You are wrong. Period. Sun's CMT is several independent CPU cores on the same die with a huge bandwidth interconnect on-die. Intel's Hyperthreading is a gimmicky technology that has a very small real-world impact on performance.

    And your personal "benchmarks" cite no numbers. I be trolled!

    --
    -- Microsoft is the most expensive commodity operating system and office suite vendor in the marketplace.
  50. Memory Mismanagment. by Anonymous Coward · · Score: 0

    Now all we need is a revolution in memory. A lot of software design is memory management. From GCs that solve the "running into each other" memory pool problem, to simply shoving memory contents in and out of processors to set something really simple, e.g. #11111111b*

    *Yes DMA helps with some of this, but...

    1. Re:Memory Mismanagment. by jd · · Score: 1
      Some of this could be done via "Processor in Memory" architecture. The idea with PiM systems is that the memory chips have limited processing capability, so that tasks that are 100% memory-bound don't need to be pushed into the CPU.


      This technology has been around at least 20 years, but has never really moved outside of academia and theoretical systems.


      The simplest operation you might want to do a lot of is copying. So, if your memory chips supported such copying, you could trap such a command and support it internally. For block copies, you'd simply program a counter and cycle through.
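      As a rough illustration, here is a toy Python model of that counter-driven block copy. The PimMemory class and its block_copy command are hypothetical, not any real PiM interface; the point is only that the CPU issues one command and the loop runs inside the memory.

```python
class PimMemory:
    """Toy memory chip with an internal block-copy engine (hypothetical PiM design)."""

    def __init__(self, size):
        self.cells = bytearray(size)

    def block_copy(self, src, dst, length):
        """Copy `length` bytes inside the chip, driven by a simple counter.

        The CPU issues only this one command; the data never crosses the
        memory bus. When the ranges overlap with dst > src, the counter runs
        backwards so the source is not overwritten before it is read
        (memmove semantics).
        """
        if src < 0 or dst < 0 or max(src, dst) + length > len(self.cells):
            raise IndexError("copy range outside memory")
        if dst > src:
            counter = range(length - 1, -1, -1)
        else:
            counter = range(length)
        for i in counter:  # the counter cycling through the block
            self.cells[dst + i] = self.cells[src + i]
```

      A real design would do this in the chip's row buffers rather than one byte at a time; the loop just makes the "program a counter and cycle through" idea concrete.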

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  51. what MT means to developers by fred+fleenblat · · Score: 3, Interesting

    While conceptually unrelated, I put threads into the same mental category as untyped pointers. They are extremely powerful, but a complete PITA to debug if anything goes wrong, even more so if you are maintaining someone else's void* or pthread_create filled application.

    What I've always done is code extremely defensively:
    1. make the various threads data-independent enough to be free-running and only co-ordinate at the start and finish of a thread's activity. If necessary, re-architect everything in sight to make this possible.
    2. when interaction is required, get a nice big coarse-grained lock, do everything that needs to be done, and get it over with. Profile it; there's a good chance it'll be over quickly enough that it won't erase the gains from parallelism, or at least you can see what's taking so long and move it outside the lock.
    3. do TONS of load testing with lots of big files and random data. thread-related bugs can often hide for years in your code. Unlike divide by zero or null pointer references, a thread bug won't necessarily give any kind of hardware fault or exception. You have to go hunt for the bugs, they won't just pop up and say hi here i am.
    4. If you have multiple people of various technical abilities working on the code, you should add a grep/sed script to your makefile to check for accidental introduction of mt-unsafe library calls (strtok, ctime, etc). Flag new monitors and locks for review. Warn about dumb things like using static or global variables.
    5. Last trick is to use a layer to allow your program to be compiled for fork/wait, pthread_create/pthread_join, or just plain old co-routine execution (esp if there is a socket you can set to non-blocking). In addition to being able to test your code for correctness in various situations, you also have a baseline to see if the multithreading is an actual improvement.
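    A sketch of the check in point 4, written in Python rather than grep/sed. The list of calls is illustrative, not exhaustive; each listed function has a reentrant `_r` variant in POSIX (strtok_r, ctime_r, and so on), and how you name and hook the script into make is up to you.

```python
import re
import sys

# Illustrative (not exhaustive) list of C library calls that are not
# guaranteed to be thread-safe; each has a reentrant *_r counterpart.
MT_UNSAFE = ("strtok", "ctime", "asctime", "localtime", "gmtime", "rand")

# Match a call such as `strtok(` but not `strtok_r(` or `my_strtok(`.
PATTERN = re.compile(r"\b(" + "|".join(MT_UNSAFE) + r")\s*\(")


def find_unsafe_calls(source, filename="<source>"):
    """Return (filename, line number, function) for each MT-unsafe call."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            hits.append((filename, lineno, match.group(1)))
    return hits


def main(paths):
    """Scan the given files; return 1 if anything was flagged so make fails."""
    status = 0
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as f:
            for name, lineno, func in find_unsafe_calls(f.read(), path):
                print(f"{name}:{lineno}: MT-unsafe call to {func}()")
                status = 1
    return status


# Only run as a checker when file names are passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

    Run from a makefile target (for example a hypothetical `check-mt: ; python3 check_mt.py $(SRCS)`), the non-zero exit stops the build when someone introduces a new strtok().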

    With the obvious exceptions for embarrassingly parallel algorithms, I've found that humdrum client/server or middleware stuff:
    (a) gets only marginal gains from multithreading
    (b) you have to work for it--profiling and tuning are still required to get top-notch performance
    (c) efficient scaling beyond a handful of threads is the exception, not the rule. If you have more threads than CPUs, it's a simple fact that some of them are going to be waiting, and then your scaling is done.

    1. Re:what MT means to developers by GileadGreene · · Score: 1
      There's a simple answer to your problem: stop using the ancient and bug-instigating threads model of concurrency, and use a better model. It worked for Bell Labs.

      I've personally created complex programs with several thousand threads, without having to worry about any of the defensive techniques you preach, simply by using a better concurrency model. Not only do they work, I can prove that there are no thread-related bugs (which testing - no matter how much you do - simply isn't guaranteed to catch) because the CSP concurrency model can be mathematically analyzed and mechanically model-checked.
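      For readers who haven't seen CSP-style code, a minimal Python sketch of the idea: each "process" is a thread that shares no mutable state and talks to the others only through channels. Python has no built-in CSP channels, so queue.Queue(maxsize=1) stands in for them here (an approximation; a real CSP channel is a synchronous rendezvous). The mathematical guarantees mentioned above come from analysing a formal CSP model, which this sketch does not do.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of a stream


def producer(out, n):
    """Send 0..n-1 down the channel, then signal completion."""
    for i in range(n):
        out.put(i)
    out.put(DONE)


def squarer(inp, out):
    """Read values, square them, pass them on. Owns no shared state."""
    while (item := inp.get()) is not DONE:
        out.put(item * item)
    out.put(DONE)


def collector(inp, results):
    """Drain the final channel into a list owned only by this process."""
    while (item := inp.get()) is not DONE:
        results.append(item)


def run_pipeline(n):
    # maxsize=1 approximates a synchronous channel: a sender blocks while
    # the previous value is still waiting to be taken.
    a, b = queue.Queue(maxsize=1), queue.Queue(maxsize=1)
    results = []
    procs = [
        threading.Thread(target=producer, args=(a, n)),
        threading.Thread(target=squarer, args=(a, b)),
        threading.Thread(target=collector, args=(b, results)),
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return results
```

      Because no two threads ever touch the same variable, there is nothing to lock; all coordination happens at the channel operations.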

    2. Re:what MT means to developers by ckaminski · · Score: 1

      Threads are useful for only two things:

      * dispatching events
      * managing discretely different but cooperating events

      Obvious, and yet not so. Reactor/Proactor patterns take care of the first; careful programming is all that can really address the second. If I have a select loop, I'll dispatch connecting sockets to a set of worker threads (web server). If I have some log cleanup event that needs to be fired, I could spawn an object with a self-contained thread from that same Reactor pattern: fire and forget.
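      A minimal Python sketch of the first case, under stated assumptions: a selectors-based reactor waits for the listening socket to become readable, accepts, and hands each connection to a thread pool. The upper-casing "protocol" and the connection limit are hypothetical and exist only so the example terminates; a real server would loop forever and parse real requests.

```python
import selectors
import socket
import threading
from concurrent.futures import ThreadPoolExecutor


def handle_client(conn):
    """Worker thread: read one request, reply with it upper-cased, close."""
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


def serve(listener, max_connections, workers=4):
    """Reactor loop: wait in select() for the listener to become readable,
    accept, and dispatch each connection to the worker pool."""
    sel = selectors.DefaultSelector()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ)
    handled = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while handled < max_connections:  # a real server would loop forever
            for key, _events in sel.select(timeout=1.0):
                conn, _addr = key.fileobj.accept()
                conn.setblocking(True)  # workers use ordinary blocking I/O
                pool.submit(handle_client, conn)
                handled += 1
    sel.close()


def demo_reactor(messages):
    """Run the reactor on an ephemeral localhost port and send each message."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve, args=(listener, len(messages)))
    server.start()
    replies = []
    for msg in messages:
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(msg)
            replies.append(client.recv(1024))
    server.join()
    listener.close()
    return replies
```

      The select loop itself never blocks on a client, so one thread can keep accepting while the pool does the slow per-connection work.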

  52. Like hell it is by turgid · · Score: 5, Informative
    From the lack of non-Sun-supplied buzz regarding this technology, it would appear that many people aren't finding it very exciting.

    More like none of Sun's competitors have anything which comes remotely close.

    Notice how nearly a year after Sun announced this, intel finally admitted that clock frequency (i.e. gigahertz) isn't everything and that they'd be bringing out dual core processors?

    Niagara has 8 cores each capable of 0-clock cycle latency switching between 4 different thread contexts.

    Who else has working hardware and an OS to go that can do this?

    1. Re:Like hell it is by Anonymous Coward · · Score: 0

      What makes you think Linux can't support this if the hardware's there? It already supports SMT and SMP; no reason to think it won't continue...

    2. Re:Like hell it is by Anonymous Coward · · Score: 0

      The difference between Linux supporting this and Solaris supporting this is kind of like the difference between getting somewhere by hobbled-together horse-drawn carriage as opposed to by anti-gravity flying car, respectively.

    3. Re:Like hell it is by mrdisco99 · · Score: 1


      IBM does multicore on Power and has for years.

      --

      +++
      NO CARRIER

    4. Re:Like hell it is by mytec · · Score: 1

      Where on earth did you get the impression from the poster that Linux couldn't run the hardware at some point in the future? The poster is right that no one else has the hardware or the OS *now*.

    5. Re:Like hell it is by turgid · · Score: 1
      IBM does multicore on Power and has for years.

      Yes, two cores on a die. Niagara has 8.

    6. Re:Like hell it is by Anonymous Coward · · Score: 0

      Who else has working hardware and an OS to go that can do this?

      Well, maybe nobody, but what good is it to me if it doesn't run any of my apps?

      Sure, Solaris/Sparc have some interesting technical features that geeks on slashdot can drool over, but will it run the latest Photoshop faster than my dual-G5 Mac? Will it run MS Word or Excel faster? Will it run Doom 3 better? No? Then why would I care?

  53. developers by delirium+of+disorder · · Score: 1
    --
    ------ Take away the right to say fuck and you take away the right to say fuck the government.
    1. Re:developers by Anonymous Coward · · Score: 0

      If we're talking about running multiple threads, shouldn't that be "developers developers developers developers"?

  54. Re:Software:Hardware thread mapper by Anonymous Coward · · Score: 0

    Troll, go back to where you came from.

  55. Well, Enterprise is a reference to by Anonymous Coward · · Score: 0
    that failed Star-Trek-Like show.

    I think Sun is suggesting that they'll be canceled just as fast.

  56. Re:Hyperthreading by Anonymous Coward · · Score: 0


    When Intel ships a CPU that can run 32-threads simultaneously, then you should ask if Intel is inventing what Sun already did.

    Intel is more marketing than substance. Hyperthreading on Pentium is like 1/16th of Niagara, at best.

  57. What is unique about Sun's CMT by Anonymous Coward · · Score: 0
    is they're not pipelined very much, so it's like hyperthreading only more so. That means more chip real estate to put lots more cpu cores on.

    What that means from a programming POV is that you really need to exploit multithreading. And scalability becomes much more important. Things that scale well with 1 or 2 processors aren't going to scale with 16 or 32 processors. Lock-free synchronization will become more important since it scales better (11 on a scale of 1 to 10).

    That said, I think it will be some while before lock-free becomes important, which is why I put my project on the back burner. I think what will happen instead is Sun will position Niagara and Rock as a commodity solution, cheaper than a bunch of cheap PCs running disparate tasks. But, Sun in the commodity market?

    1. Re:What is unique about Sun's CMT by Anonymous Coward · · Score: 0

      But, Sun in the commodity market?

      Sun Grid

  58. FTA/RAIABSF instead of FT/RAID? by TimTheFoolMan · · Score: 1

    On the plus side, whatever you might think of CMT technology, the description given demonstrates the opportunity CMT brings for redundancy:

    "...the execution of multiple simultaneous tasks - even on a single processor."

    "Chip multi-threading (CMT) brings to hardware the concept of multi-threading, similar to software multi-threading..."

    "A CMT-enabled processor, similar to software multi-threading..."

    "...CMT processors, software threads can be executed simultaneously..."

    "...executes many software threads simultaneously within a processor..."

    "Executing software threads simultaneously within a single processor..."

    I could be wrong, but I get the idea that CMT allows you to perform multiple simultaneous software threads, even within a single processor!

    If I RTFA too, do I have a FTA (Fault-Tolerant Article), or am I simply creating a RAIABSF (Redundant Array of Independent Articles and Buzzword Sentence Fragments)?

    Tim

    1. Re:FTA/RAIABSF instead of FT/RAID? by Blitzenn · · Score: 1

      "I could be wrong, but I get the idea that CMT allows you to perform multiple simultaneous software threads, even within a single processor!"

      That's where they are hoodwinking you. A single processor machine hasn't any ability to process multiple instructions in the same clock cycle. Physically impossible as there is only one path or pipeline through the actual 'core' of the processor. Only one instruction can be physically processed at any given point in time. It's a physical limitation of a single CPU.

      Multithreading allowed us to run multiple 'threads' or pieces of code virtually simultaneously, by allowing us to run a second piece of code when the first piece was idle (as perhaps in the case of waiting for a response from another piece of hardware). The key here is that multithreading allows for virtual multiprocessing of code, whereas a multicored machine or processor can allow for physical, simultaneous execution of instructions.

      If Sun is bent on using this as a selling point, then they are setting themselves up for a lawsuit on selling a product under false pretenses. No amount of code is going to get a single processor cpu or machine to execute two instructions simultaneously. Virtually, sure, physically, never. See you in court Sun.

    2. Re:FTA/RAIABSF instead of FT/RAID? by SuiteSisterMary · · Score: 2, Insightful

      Ah, but when you have one physical 'chip' that actually consists of four processor cores, you *can* do four simultaneous tasks on one processor.

      The advantage over good old fashioned SMP? Well, probably the interconnect is way faster, and if the cores all share some cache or something, sibling threads should see some benefit.

      --
      Vintage computer games and RPG books available. Email me if you're interested.
    3. Re:FTA/RAIABSF instead of FT/RAID? by Anonymous Coward · · Score: 0

      That's true, but that is not what this article is about. Sun is (re)inventing "HyperThreading" which someone else came out with first.

    4. Re:FTA/RAIABSF instead of FT/RAID? by TimTheFoolMan · · Score: 1

      Uhm... I... uh... was trying to be funny.

      Clearly, the tongue needs to go deeper into the cheek. (I hate it when this happens!)

      Tim

  59. Complementary concepts by WebMink · · Score: 2, Insightful

    Since the Pentium 4 according to Intel, but it's not a good question as that's Intel's trademarked term for their two-thread implementation of simultaneous multithreading:

    Simultaneous multithreading allows multiple threads to execute different instructions in the same clock cycle, using the execution units that the first thread left spare.

    By contrast, Niagara is implementing Chip-level multiprocessing:

    CMP is SMP implemented on a single VLSI integrated circuit. Multiple processor cores (multicore) typically share a common second- or third-level cache and interconnect.

    In other words, Niagara implements in hardware, at greater scale, what Pentium 4 offers as an emulation feature. In theory one could build SMP systems on top of CMP chipsets for even greater throughput. If you find the Sun article too hard, the Wikipedia references I have cited will probably prove much easier to understand.

    1. Re:Complementary concepts by Tired+and+Emotional · · Score: 1
      The Wikipedia article says it's just multi-core.

      So there's nothing new here. The main points of interest would be how many ways you can actually go out and buy, and how much bang you get for your buck on your app of interest. That's not a trivial consideration, but it's not earth-shaking.

      The other point of interest is how the cache scales. If the shared cache scales as number of cores then you get some diversity and sharing that yields performance gains over multiple CPU systems (with no shared cache).

      If the cores go up but the cache does not, then you are generally better off with separate CPUs, although of course it's not hard to produce synthetic examples where the opposite is true. Finding a real app where multicore with fixed cache is faster will be harder, though probably not impossible (something that pushes a wave front might do it).

      It is interesting to contrast the SGI/Cray NUMA approach. With multi-core the memory bandwidth is fixed but so is memory access latency. With NUMA bandwidth also scales with number of processors but you pay a latency price for non-local memory access.

      (SGI's machines are actually multi-processors per memory system so the above is a simplification of the actual case)

      --
      Squirrel!
    2. Re:Complementary concepts by WebMink · · Score: 2, Insightful
      Actually the wikipedia article doesn't say that. It says:
      Sun Microsystems, in contrast, considers its UltraSPARC IV to be a multi-threaded rather than multi-processor chip. Intel agrees with Sun. This is not an idle debate, because software is often more expensive when licensed for more processors.

      Sun refers to the architecture as Chip-level Multi-Threading (CMT) and according to the white paper, while there are indeed multiple cores, each can also multi-thread:

      Sun's CMT processors will also have multiple cores on a single piece of silicon, with each core being able to process multiple threads, as shown in Figure 1.5. As a result, a single CMT processor will be able to process tens of threads simultaneously, exponentially increasing the amount of data processed each second.

      Cache also seems to have been considered:

      Shared chip resources such as large amounts of cache are designed to speed communications between cores to streamline parallel processing of threads.

      So while breathless enthusiasm may not be in order, a certain level of optimism seems warranted :-)

    3. Re:Complementary concepts by Tired+and+Emotional · · Score: 1
      Taking n processors and making them each m-way multi-threading gives you m x n threads, not m^n (or n^m) threads. So the use of the word exponentially is totally false. Probably written by someone who would use "Quantum Leap" to describe a very large advance rather than a very small one.
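      With the Niagara figures quoted earlier in the thread (n = 8 cores, m = 4 thread contexts per core), the difference is easy to see:

```latex
T = n \times m = 8 \times 4 = 32 \qquad \text{not} \qquad m^{n} = 4^{8} = 65\,536
```

      Doubling the number of cores doubles T; the growth is linear in each factor.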

      Also, the licensing issue is one that comes up on multi-threaded architectures because the threads look to the OS (and the licensing software) like multiple CPUs. I expect this has been fixed, but it was certainly a problem at first. It's more a marketing issue than a technical one.

      --
      Squirrel!
  60. it means a lot-Erlang. by Anonymous Coward · · Score: 1, Insightful

    "As far as threading is concerned, one of the few languages I've dealt with that makes mutexes, semaphores, etc. easy to deal with is Java."

    So can Erlang.

    Wings3D is written in Erlang.

    1. Re:it means a lot-Erlang. by GileadGreene · · Score: 1

      Erlang is much nicer to use than Java's built-in thread model. If you want decent concurrency for Java you should get the JCSP library.

  61. Re:Nothing new? Quite the opposite by davecb · · Score: 1
    Because the current big bottleneck is memory latency, either vendors will add more cores and use the memory bandwidth, or they'll scale more and more poorly.

    It makes good sense to fix the bottleneck, because that's where the problem lies. Improving other parts which don't have problems, according to Amdahl, is A Bad Idea (:-))
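    Amdahl's law makes the point quantitative: if a fraction p of run time is sped up by a factor s, the overall speedup is 1 / ((1 - p) + p/s). A small Python helper, with hypothetical numbers in the note below it:

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of run time is sped up by factor s."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a fraction between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    # The untouched part (1 - p) still takes as long as before.
    return 1.0 / ((1.0 - p) + p / s)
```

    For a hypothetical job that spends 75% of its time waiting on memory, doubling the speed of the memory side (p = 0.75, s = 2) gives an overall speedup of 1.6, while doubling the speed of the compute side (p = 0.25, s = 2) gives only about 1.14.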

    --
    davecb@spamcop.net
  62. Hype-threading, fer sure by Ancient_Hacker · · Score: 1

    It would be nice to have more than hype. IIRC the Intel hyperthreading documents were mostly hype, plus a few very unimpressive benchmarks. When benchmarks by the original company are borderline, a little bell should go off. So now Sun has something similar. We're supposed to buy their new proprietary hardware and rewrite our programs and introduce concurrency bugs? And for what, a few percent improvement? Hmmmm.... Pass..

  63. Delphi by Anonymous Coward · · Score: 0

    Delphi's standard library includes very convenient wrappers around such things as threads. Writing a new thread is as easy as descending from TThread and overriding the protected Execute method.
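    Python's standard library uses the same subclass-and-override pattern, which may help readers who don't know Delphi: subclass threading.Thread and override run(), the counterpart of TThread.Execute. This is a Python analogue, not Delphi code, and the SumWorker class is hypothetical. CPython's global interpreter lock means CPU-bound work like this won't actually run faster; the example shows the structure only.

```python
import threading


class SumWorker(threading.Thread):
    """Analogue of descending from TThread: subclass Thread, override run()."""

    def __init__(self, numbers):
        super().__init__()
        self.numbers = numbers
        self.result = None

    def run(self):
        # Executes in the new thread once start() is called
        # (the counterpart of overriding TThread.Execute).
        self.result = sum(self.numbers)


def parallel_sum(numbers, parts=4):
    """Split the list into chunks and sum each chunk in its own thread."""
    size = max(1, -(-len(numbers) // parts))  # ceiling division
    workers = [SumWorker(numbers[i:i + size])
               for i in range(0, len(numbers), size)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()  # wait for the thread to finish, like TThread.WaitFor
    return sum(w.result for w in workers)
```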

    Unfortunately, D7 was the last gasp of a fantastic language and environment. Between the Borland->Inprise->Borland fiasco, terrible marketing, and C#, Delphi will never be recognized for the fantastic language & product that it was.

    You claim that C# is a bastardization of Java, but it is in fact a combination of Java (which, by the way, simply employs a 100% object-oriented architecture) and Delphi. The lead designer, Anders Hejlsberg, had previously led the work of turning Pascal into the object-oriented Delphi. Many Delphi developers are very comfortable in C# due to the similarities in design and structure.

  64. mod me down more! by dextr0us · · Score: 1

    yeah, i'm flamebait....

    actually i was saying how cool beos was.

    just because there's caps doesn't mean it's flamebait.

    thanks mods.

    --
    "Martha Stewart can lick my Scrotum......do i have a scrotum?" -- Sharon Osbourne
  65. What's it mean? by Anonymous Coward · · Score: 0

    Now we have bad grammar in the headlines.

  66. Ada by Zygfryd · · Score: 2, Informative

    In fact I do know a better language, Ada95/2005.
    It's simply meant for threading and unconventional compiler optimizations (through the enforcement of constraints), while still being imperative and having a familiar syntax. And it's meant to be compiled unlike Java.
    Here's a site about Ada and here's another one.
    A good (alas not perfect) Ada95 compiler is included in GCC 3.4.

    So aye, we are ready for the CMT systems.

  67. Re:Hyperthreading by JoeBuck · · Score: 1

    You are correct in saying that there is an important difference between Intel's hyperthreading and actually having independent CPU cores on the same die. But you're wrong in claiming that Intel's hyperthreading has a very small real-world impact on performance; the win can be substantial for apps with lots of memory traffic (though the win disappears in number-crunching applications), and this is a common case. For "make -j" on a large project, the Xeon win is significant.

  68. Obligatory Ousterhout Threads Are Bad Idea Paper by Anonymous Coward · · Score: 0
  69. Parallel to the extreme by ka9dgx · · Score: 1
    Taking this to its logical conclusion means that some day, we'll have millions of 4-bit cells, each running a fixed 64-bit program, and passing the intermediate results to their neighbors in the orthogonal grid. There won't be a program counter, and it won't run C++ code. ;-)

    On the plus side, you'll have loader code that auto-routes around bad cells. --Mike--

    1. Re:Parallel to the extreme by Animats · · Score: 1

      Been there, done that. Look up "connection machine".

  70. Multi-threading means... by Anonymous Coward · · Score: 0

    Freaking impossible to debug!!

  71. Re:Hyperthreading by Anonymous Coward · · Score: 0


    For "make -j" on a large project, the Xeon win is significant.

    Just how significant? All the reviews when Hyperthreading came out showed 4% at best. It was kind of a joke at the time.

  72. Games in general would *LOVE* this if done right. by phorm · · Score: 3, Interesting

    Actually, when you think about it, an improved threading model would strongly benefit well-programmed games. Why? Because there are a lot of semi-related processes occurring. Sound, graphics, physics, etc... they're all part of the game but work in very different ways.

    Now if you're working with a multithreaded CPU, one processor can be handling your CPU-bound graphics work (much of this is handed off to the video card anyhow), another can be doing sound/surround mixing, etc.

    In an FPS with complicated AI, you could theoretically hand that off to CPU #2 while #1 is handling different things. Your graphics engine might not have ugly-mofo-alien #235 onscreen to render, but meanwhile he's watching you and looking for a boulder that will offer him good cover to snipe you from, instead of just sitting like a drone waiting for a computer-accurate headshot.

    Now let's say that PCs go multi-CPU. Maybe you don't need a single superpowerful processor, just a video card and a few lower-powered processors. Processor #1 is handing off the environmental data, #2 is prepping it for rendering and shovelling your GPU full of vertices, #3 is playing pinpoint surround for that cricket chirping behind the rock on your far left, and #4 is doing AI for ugly alien mofo #287.
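    One common way to structure this is a fork-join per frame: launch the independent subsystems on worker threads, wait for all of them, then render. A minimal Python sketch; the subsystem functions and state fields are hypothetical stand-ins for real AI, audio and physics code. Each task reads the same snapshot and writes nothing shared, so no locks are needed.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical per-frame subsystem updates; a real engine does far more work.
def update_ai(state):
    return {"alien_287": "seek_cover" if state["player_visible"] else "patrol"}


def mix_audio(state):
    return ["cricket@far_left"] if state["tick"] % 2 == 0 else []


def step_physics(state):
    return state["boulder_y"] - 1


def run_frame(pool, state):
    """Fork the independent subsystems onto worker threads, then join."""
    ai = pool.submit(update_ai, state)
    audio = pool.submit(mix_audio, state)
    physics = pool.submit(step_physics, state)
    # result() blocks until each task is done; only then is the new
    # frame state assembled and handed to the renderer.
    return {
        "tick": state["tick"] + 1,
        "player_visible": state["player_visible"],
        "boulder_y": physics.result(),
        "ai": ai.result(),
        "sounds": audio.result(),
    }


def demo_game_frames(frames=2):
    state = {"tick": 0, "player_visible": True, "boulder_y": 10}
    with ThreadPoolExecutor(max_workers=3) as pool:
        for _ in range(frames):
            state = run_frame(pool, state)
    return state
```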

    When I think about how games are advancing a lot can come down to interprocess communications and/or bandwidth limitations. The GPU still handles much of the video stuff so your CPU isn't really a bottleneck there in many cases, but as internet connections speed up then you're going to have MMORPGs, FPS's, and more chock full of "actors" that make up sight, sound, physics, and AI that could very well benefit from more CPU's rather than extra ticks on your overclocked single processor.

    After all, eye-candy is only a part of realism. True realism is also very much about a multitude of things happening at once.

  73. Re:Hyperthreading by BigZaphod · · Score: 2, Insightful

    Make a comment and ask a question and get marked as troll.

    Go figure.

  74. HyperRAM Technology! by MrNybbles · · Score: 3, Interesting

    I think the most interesting part of the article was when it said "Processor speed has increased many times -- it doubles every two years, while memory is still very slow, doubling every six years."
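    Taking the article's rates at face value and measuring both from the same starting point, the gap itself grows exponentially:

```latex
\frac{2^{t/2}}{2^{t/6}} = 2^{t/3}, \qquad t = 12\ \text{years}: \quad \frac{2^{6}}{2^{2}} = \frac{64}{4} = 16
```

    So after 12 years the processor would be 64 times faster but memory only 4 times faster, a 16-fold gap.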

    So maybe it would be more efficient for people to stop screwing around with new processor design ideas for a while and put a little effort into doubling the speed of memory access (and I don't mean by using level-whatever caches). Selling motherboards with a faster memory bus would be easy, just give it a cool-sounding name kind of like Sega's "Blast Processing". Let's call it "HyperRAM Technology!"

    --
    Losing faith in humanity one person at a time.
  75. It isn't just user level threads by MerlynEmrys67 · · Score: 1
    If your game is doing any significant I/O (think network, keyboard, mouse, etc. here), then all of the interrupt processing can be handed off to the second processor, leaving a processor to handle just "User" processes.

    In reality, games tend not to be I/O bound (except for graphics, which tends not to be interrupt-driven anyway), but many other workloads (think heavy, HEAVY network) are rather I/O bound, and they are bound in the interrupt processing. Freeing a CPU to handle just interrupt processing will help the other CPU do the important user-level work.

    --
    I have mod points and I am not afraid to use them
  76. Think of it this way by phorm · · Score: 1

    How about something as simple as a search through an array of data.

    You could have a CPU do a sort-based search, or a linear search. You could have two CPUs divide and conquer your list independently and tackle it on a first-there basis.

    For example, a list of names. If threading became commonplace (if multi-CPU were common), a thread function might become:

    Take list of 10000 clients where you are looking for client "Doug Ellis"

    CPU #1 tags all items (assuming alphabetical sort by lastname) between 1 and 5000. CPU #2 takes 10000-5001.

    The race begins, CPU #1 will hit the result and CPU #2's process can be terminated. OK, so this probably isn't much faster than your standard search.

    But how about "Al Sanders"... a single CPU would have more divides to reach that name, dividing names and finding if the target result would be more/less than the middle name. A linear search would take longer.

    Either way, a second CPU starting from the end of the list would tag that name first, ending the search faster. Of course, a single CPU which was truly 2x faster than either of the duals would do nicely as well... but when we've reached a point where more MHz aren't so easily forthcoming, dual CPUs handle the situation nicely indeed.
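    A Python sketch of the two-CPU race described above, for an unsorted list where a linear scan is the only option: one worker scans forward from the start, the other backward from the end, and whichever finds the target first sets a shared Event so the other stops. The function name is illustrative; under CPython's global interpreter lock the two threads won't truly run in parallel, so this shows the coordination rather than a real speedup.

```python
import threading


def two_ended_search(items, target):
    """Return an index of `target` in `items`, or None if it is absent."""
    found = threading.Event()
    lock = threading.Lock()
    result = []
    mid = len(items) // 2

    def scan(indices):
        for i in indices:
            if found.is_set():
                return  # the other worker already won the race
            if items[i] == target:
                with lock:  # make sure only one winner records its index
                    if not found.is_set():
                        result.append(i)
                        found.set()
                return

    front = threading.Thread(target=scan, args=(range(0, mid),))
    back = threading.Thread(
        target=scan, args=(range(len(items) - 1, mid - 1, -1),))
    front.start()
    back.start()
    front.join()
    back.join()
    return result[0] if result else None
```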

    1. Re:Think of it this way by pclminion · · Score: 1
      But how about "Al Sanders"... a single CPU would have more divides to reach that name, dividing names and finding if the target result would be more/less than the middle name.

      Nitpicking here... I assume you're talking about a binary search. It takes about the same amount of time to find a name anywhere in the list, unless you happen to get lucky and land directly on it. The lower and upper search bounds still have to converge to a zero-width window, and that takes the same number of divisions no matter where in the list the target is. You can optimize slightly by checking if you've hit the target on each division, in which case you might terminate early, but that doesn't happen often, especially in a large list.

      Unfolding the list into two lists, with the even elements in one list and the odd elements in the other, only saves you, at most, a single division. In short, searching a sorted list is not the type of operation that benefits greatly from multiple CPUs.
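      A quick way to check this in Python: count the halvings a binary search needs for every element of a sorted list. This sketch uses a lower-bound search with no early exit, so it always narrows the window to zero width, as described above.

```python
def lower_bound_steps(items, target):
    """Binary search without an early-exit check: narrow [lo, hi) until it
    is empty, counting halvings. Returns (insertion index, halvings)."""
    lo, hi, steps = 0, len(items), 0
    while lo < hi:
        mid = (lo + hi) // 2
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid
        steps += 1
    return lo, steps


def step_range(n):
    """Min and max halvings needed to locate each element of a sorted list."""
    items = list(range(n))
    counts = [lower_bound_steps(items, x)[1] for x in items]
    return min(counts), max(counts)
```

      For a 10,000-entry list, step_range(10000) returns (13, 14): every name, whether near the front or the back, takes 13 or 14 halvings.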

  77. Paper on multithreading by Richard+W.M.+Jones · · Score: 2, Interesting
    It's not a particularly new idea. I wrote a pretty detailed paper at university about multithreading. You can read it here:

    http://www.annexia.org/tmp/multithreading.ps

    Rich.

  78. sounds hyper by pyrrho · · Score: 2, Funny

    all you need is the ability to run processes... which I do right here.... on this abacus...

    --

    -pyrrho

  79. Bigger version of an existing idea. by Bruce+Perens · · Score: 2, Informative
    My Pentium 4 processor has 2 threads. Linux treats them as 2 processors, and makes full use of them. Yes, it's cool to have 8 cores and 4 threads per core. But this is all about price/performance. An 8-core chip that shares the cache, VM infrastructure, and memory interface between all cores is going to work best for CPU-intensive tasks that are not also I/O or memory-intensive and can be partitioned into multiple threads easily. Not photorealistic rendering, for example, that requires too much data. And it won't handle separate-process loads as well as 8 blades would. So, watch all of those parameters: price, power, memory bandwidth, how cache and VM are done, PC board real-estate, etc. How they are combined will tell you whether that chip is a win or lose for your application.

    Bruce

    1. Re:Bigger version of an existing idea. by turgid · · Score: 1
      My Pentium 4 processor has 2 threads. Linux treats them as 2 processors, and makes full use of them.

      Solaris makes full use of them too. In real-world applications, Intel's "Hyperthreading" buys you only a few percent, up to 30% at most in the best cases, and often results in a performance decrease. Hyperthreading is a very cheap and over-hyped implementation of SMT. It is a hastily cobbled-together afterthought tacked on to a disastrous architecture to try to mitigate its ludicrously long pipeline, which spends most of its time being filled and flushed due to branches and context switches.

      An 8-core chip that shares the cache, VM infrastructure, and memory interface between all cores is going to work best for CPU-intensive tasks that are not also I/O or memory-intensive and can be partitioned into multiple threads easily.

      Wrong. Niagara is designed to have very high memory bandwidth and to hide most of the latency through context-switching to different threads. It is designed for highly multi-threaded applications such as web serving.

      Not photorealistic rendering, for example, that requires too much data. And it won't handle separate-process loads as well as 8 blades would.

      It is not designed for floating-point workloads such as rendering or simulation. It will handle separate processes very well.

      So, watch all of those parameters: price, power, memory bandwidth, how cache and VM are done, PC board real-estate, etc. How they are combined will tell you whether that chip is a win or lose for your application.

      It's designed for efficient, high-density, highly multi-threaded blade applications.

      For such workloads, it's a much more efficient use of the available transistors on the CPU die than conventional CPUs. Conventional CPUs spend 75% of their time idling waiting for memory access. The Pentium IV is much worse and an extreme case as I mentioned above. Niagara does things properly and hides latency by switching thread contexts when a thread is waiting for memory access.
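      A toy Python model of that latency hiding. Each hardware thread does `compute` cycles of work, then stalls for `latency` cycles on a memory load; each cycle the core issues from the first ready thread, with switching assumed to be free. The parameters are hypothetical, chosen so that a single thread idles 75% of the time, matching the figure above; this is not a model of Niagara's actual pipeline.

```python
def core_utilization(threads, compute, latency, cycles=10_000):
    """Fraction of cycles in which a fine-grained multithreaded core does work.

    Each hardware thread runs `compute` cycles of work, then issues a memory
    load that stalls it for `latency` cycles, and repeats. Every cycle the
    core issues one instruction from the first ready thread (free switching).
    """
    work_left = [compute] * threads  # cycles of work before the next load
    ready_at = [0] * threads         # cycle at which each thread can run again
    busy = 0
    for cycle in range(cycles):
        for t in range(threads):
            if ready_at[t] <= cycle:
                busy += 1
                work_left[t] -= 1
                if work_left[t] == 0:  # issue a load and stall
                    work_left[t] = compute
                    ready_at[t] = cycle + 1 + latency
                break  # at most one instruction per cycle
    return busy / cycles
```

      With compute=25 and latency=75, one thread keeps the core busy 25% of the time, two threads 50%, and four threads 100%: with enough ready threads, the stalls are completely covered.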

    2. Re:Bigger version of an existing idea. by Bruce+Perens · · Score: 2, Informative
      It's designed for efficient, high-density, highly multi-threaded blade applications.

      Um. The whole point of blades is that they don't have the expensive interconnect. So, here's an architecture with the interconnect in the chip, which would make it cheaper. But it's still not clear to me that a big system built around one die that uses transistors really efficiently is going to be less expensive than eight smaller systems that don't use their transistors as efficiently.

      Bruce

    3. Re:Bigger version of an existing idea. by turgid · · Score: 1
      But it's still not clear to me that a big system built around one die that uses transistors really efficiently is going to be less expensive than eight smaller systems that don't use their transistors as efficiently.

      You have to take into consideration floor space, rack space, cooling and power requirements, etc. Of course, you can take a bunch of cheap Dell PeeCees and rack them. You could also, probably in less space, take 8 of these systems and use less power and pump out less heat and get vastly more throughput, supposing it does turn out to be as good as they say. Who knows how much a Niagara system is going to cost, but I'll bet it will be competitive with other small blade systems.

    4. Re:Bigger version of an existing idea. by platypus · · Score: 1

      And in the end, it's about raw speed. Even the Sun charts I saw some time ago only showed an advantage for Niagara in around 2007. And they were very optimistic in their projection of Niagara's speed, quite pessimistic in forecasting the speed of rival processors, and totally neglected that AMD, Intel or IBM could also combine multiple cores on one chip if there is a market for that. And all this combined with the questionable target problem field of such an architecture.

      One must not forget that competing CPUs are a) much faster core-for-core b) probably also much more cost efficient concerning (single core speed)/cost.

    5. Re:Bigger version of an existing idea. by htd2 · · Score: 1

      Um. The whole point of blades is that they don't have the expensive interconnect

      It depends. Some blade servers do have rather expensive interconnects (eGenera, for example), and all blades are more expensive than the same resources bought as 1U/2U servers.

      The first Niagara server from Sun is not going to be a large system; think 1U or 2U. In that sort of form factor, at a price comparable to current 1U/2U servers, it will clearly be competitive.

      Then factor in Solaris 10 and Zones/Containers, and you have a very effective competitor to x86 blade servers running a blade provisioning product, since provisioning a Zone/Container is a very simple and very quick process.

  80. really....? by grahamsz · · Score: 1

    I'm not certain, but I thought Intel's HT processors still only had one execution unit. They just have two fetch and decode processes and fast context switching between them.

    Sure this 8 core chip won't be good for everything. But when you've got a lot of similar processes in some server environment then it should do very well.

    1. Re:really....? by Bruce+Perens · · Score: 2, Informative
      I thought Intel's HT processors still only had one execution unit. They just have two fetch and decode processes and fast context switching between them.

      I wasn't even assuming they have that much. The minimum you need to make this trick work is two independent contexts. That means two copies of all kernel-visible control and data registers. You would probably not need to save internal microstate unless you need it to restart a long-running instruction.

      Anything else on top of that is optimization.

      Bruce

    2. Re:really....? by ckaminski · · Score: 3, Interesting

      The only thing Intel's hyperthreading buys you, and what most simultaneous multithreading implementations buy you, is a solution to the cache miss problem. If your pipeline stalls, you simply execute the next thread in the list until you get the data you need.

      Some more sophisticated designs (which is what I'd expected the P4 to do) turn the extra parallel execution units into independent ones, so you can issue 2 or 3 instructions simultaneously and forgo all the branch prediction, etc.

      It turns out that the P4's 20-stage pipeline needed help. SMT/Hyperthreading was it.

  81. Threading by Anonymous Coward · · Score: 0

    My re-entry into programming as a hobbyist was via BeOS circa 1998 (I did a lot of CS in college but eventually decided to go to medical school). BeOS had a design philosophy that everything should be multithreaded as much as possible. It made for a very user-responsive system, but it also made almost all apps susceptible to race conditions. The BeOS API was very "fun" and easy, but I think it was a little deceptive. My two fairly usable apps were a MineSweeper clone and a front-end for a single-player chess program. In both cases, I had to deal with synchronization issues to avoid bizarre behavior (e.g. making sure the chessboard displayed each piece once and only once).

    Around the time Be broke up, there was a great discussion involving some of the Be engineers in which it was acknowledged that the "pervasive multithreading" idiom really made it exceptionally difficult to write bug-free apps, and also imposed extraordinary demands on the OS with respect to messaging. (Usually we think of the OS as managing memory allocation, processor scheduling, and disk I/O, but under BeOS the app_server had an additional, highly critical role in handling messages between and within threads.)

  82. What's it Mean to Developers? by oliverthered · · Score: 1

    Well, it means you should be pestering the QT developers to make QT thread-safe; without a thread-safe toolkit, your hands are tied.

    (QT is GPL, so there's no reason a bunch of developers couldn't get together and make a thread-safe version of QT.)

    --
    thank God the internet isn't a human right.
  83. My Answer to the thread question. by dance2die · · Score: 0

    Multithreading - What's it Mean to Developers?

    Just another word to confuse your boss...

    --
    buffering...
  84. Don't use oracle by Anonymous Coward · · Score: 0

    Personally I would be more pissed at Oracle. Very few software vendors are taking this approach. MS SQL Server doesn't charge per core. Oracle even charges 2x for a hyperthreaded x86. It's just plain gouging by Oracle. Another slimy thing they do is charge you for the Enterprise version based on the number of cores you can put into your server: not the number in there, but the number it CAN have. So if you happened to buy a 4-CPU SPARC box in the UltraSPARC III days, you could pay for Oracle "Standard". But now, since it's possible to put dual-core UltraSPARC IVs in, the box is no longer allowed to use the "Standard" version because it is capable of holding 8 cores. How is THAT Sun's fault?

    Oracle sales is the used car lot of the software industry.

    If you don't happen to be running Oracle, then everything is usually fine. Resin charges per server, so use that instead of BEA. Use DB2, SQL Server, or PostgreSQL instead of Oracle. Oracle's cool, but the licensing insanity makes it less than useful in practice.

  85. okay, I'm simplying! by captwheeler · · Score: 1

    Either way, that's a good word: a warning that it's a simplification, some acceptance of responsibility for the errors that will come from the simplification, the implication that the simplification might reach the level of a lie, and a little fun with language.

    --

    Thanks for putting on the feedbag. Thanks for going all out. Thanks for showing me your Swiss Army knife.

  86. Bleah. FUD. by Cryptnotic · · Score: 2, Informative
    #pragma omp parallel for private(sum) reduction(+: sum)
    Bleah. FYI, I'm pretty sure GCC will reject this. Even the newest versions.

    I just tested it with GCC 2.95.3, 3.2.1, 3.3, and 3.4.2, and it works fine. Of course, GCC is just ignoring the #pragma. I didn't know about OpenMP before this, but it does look like a good way to "optimize later" and have your code still compile with gcc. And you don't have to write and maintain two different versions separated by #ifdef, #else, #endif.

    --
    My other first post is car post.
  87. OpenMP by Kenard · · Score: 1

    It's not really a language, but it does allow the programmer to mark sections of code that could be parallelized, and it will handle the number of threads, the forking, dividing up the work, and joining the threads. The programmer still has to make sure that the code really is parallelizable.

    --
    (appended to the end of comments you post)
  88. Looking back... by Wesley+Felter · · Score: 1

    I guess a few years ago you would have said: With Sun, IBM, Intel, and AMD all going SMP, companies like Oracle are just milking it for all it's worth before they have to cave in and charge per machine. It is inevitable.

    What is so special about a socket? Why is per-socket pricing legitimate, but per-core is not?

    1. Re:Looking back... by SunFan · · Score: 1

      Why is per-socket pricing legitimate, but per-core is not?

      IMO, the problem is that the cores vary so widely. Given a certain size piece of silicon, companies can put two, four, eight, etc. cores on it and still stay within a certain size and power-consumption envelope. And it is probably arguable that state-of-the-art design can only squeeze so much performance per mm^2 or per watt. That means state-of-the-art chips will be more or less even per socket (for licensing arguments), but not per core.

      --
      -- Microsoft is the most expensive commodity operating system and office suite vendor in the marketplace.
  89. Re:Hyperthreading by Dominic_Mazzoni · · Score: 1

    Oops, it looks like I was wrong...what the article is talking about is definitely not the same as hyperthreading, but goes far beyond this! Sorry, my bad....

    However, I was certainly not "trolling" when I claimed that Hyperthreading really is impressive. As a quick demonstration, I compiled the latest version of ImageMagick (6.2.0) on my dual-Xeon (3.0 GHz) with Hyperthreading on, with 1, 2, 4, and 6 threads (make -j):

    make -j 1: 6:26
    make -j 2: 4:09
    make -j 4: 2:54
    make -j 6: 2:48

    Anyone have a dual-CPU box without Hyperthreading to compare this to? When I've tried it in the past, going from 2 to 4 threads without Hyperthreading didn't give nearly that much of a speed boost.

  90. Doesn't MS do the same? by olddotter · · Score: 1
    That mindset leads to lazy programmers who A - Can't optimize to save their jobs; and B - Don't actually understand what multithreading really does.

    I always thought Microsoft's Visual "X" developer products did the same thing. That is, they lead to lazy (and often unskilled) programmers who A - can't optimize to save their jobs; and B - can't debug to save their jobs either.

  91. Re:Bad bad English headline by Anonymous Coward · · Score: 0

    Have you never heard "what does" shortened to "what's"? Come on, don't jerk me off here.

  92. Ironically, M$ did something right here by Anonymous Coward · · Score: 0

    MS per cpu licensing is per physical cpu socket, not per cpu core.

  93. "programmer can choose" ??? WTF? by olddotter · · Score: 1

    OK, let's face it: 98.9% of the time these decisions are made by the OS, the compiler, or the VM. Very few programmers out there are really capable of making these decisions, and even fewer work in an environment where they are allowed to make them.

    This is of interest to the OS developers, the compiler developers, and people who work on Beowulf clusters.

    Users who are running an application mix with a lot of different threads (or processes) will see a benefit as well.

  94. Re:Software:Hardware thread mapper by Doc+Ruby · · Score: 1

    Infant, come back for a spanking. Or don't you know any other words except poser nerdspeak?

    --

    --
    make install -not war

  95. However, there is just no substitute for speed by Anonymous Coward · · Score: 0
    throughput has become more important than absolute speed in the enterprise

    Not sure I agree with that. The thing is, if your box doesn't have enough speed after you have optimised the application and the database, there's not much you can do about it. If it doesn't have enough throughput, you can add more boxes.

    For example, I've been working on a nontrivial system which has a 200 ms response-time ceiling and huge data volumes. That is hard to achieve (much harder than on a previous project I worked on, which had a 100 µs response-time requirement). The system with the 200 ms requirement will also have large transaction volumes. However, the response time is the scary bit. Adding more servers will deal with the total volume, but if we can't handle the individual requests within 200 ms, we're in real trouble.

    And yes, you're right. Throwing hardware at the problem is expensive. We now have more kit installed than I have ever seen in one place before. In fact, at a rough estimate, more Sun boxes than I have eaten pizzas, and I've been working in IT for 15 years...

  96. ...but pretty darn close! by beaststwo · · Score: 1
    After reading the linked Sun article, this still sounds a lot more like Intel's Hyperthreading articles than your post would indicate.

    For a developer, the OS provides the interface for handling threads. Yes, we may have to handle locks on some platforms, but the OS still does most of the heavy lifting. What Sun describes would mainly apply to the interface between the hardware and the part of the OS that schedules processes and/or threads (some OSes, like Linux, treat threads and processes as contexts and don't differentiate between them for scheduling), rather than to something a developer can make direct use of.

    Sure, Intel uses the concept of logical processors for Hyperthreading, but the main thing that does is provide backward compatibility with the APIs available in current OSes. I'm sure that, at some point in the future, OSes may provide some other great way to use the capability.

    I don't see that Sun has offered anything significantly different. Yes, they're handling caching differently and their solution may actually work better, but the overall concept and net impact look very similar. A developer will still use a threading API, so for a given hardware/OS/development environment, there should be little difference. Future API developments and thread-safe libraries will make the biggest difference to the developer.

  97. Re:Software:Hardware thread mapper by Doc+Ruby · · Score: 1

    Moderation -1
    100% Offtopic

    I point out that HW threading support of SW threading techniques needs a mapping app, which would be gcc. How is that offtopic? And why the flame?

    --

    --
    make install -not war

  98. Re:Hyperthreading by owlstead · · Score: 2, Funny

    This is not college. Slashdot does not start with "there are no stupid questions". There are, you asked one, AND it was already more covered than the genitals in a Tyrolean soft-sex movie.

  99. Very Good!!! by Anonymous Coward · · Score: 0

    For Wikipedia? ;D

  100. Sun is still dead by geekee · · Score: 1

    So basically I can run a bunch of threads slowly simultaneously, and pay 10x the cost of AMD. Thanks but no thanks.

    --
    Vote for Pedro
    1. Re:Sun is still dead by swordgeek · · Score: 1

      That's nice. Tell me just how well your AMD box stacks up to my Sun F2900.

      I love (and own) AMD systems, but they don't compete beyond 2-4 processors. Increasingly, that's where Sun's market is.

      --

      "People who do stupid things with hazardous materials often die." -- Jim Davidson on alt.folklore.urban
  101. Process Calculi by Ulrich+Hobelmann · · Score: 0

    Multithreading is really cool. Maybe it's about time programmers took a look at CSP, CCS, the pi-calculus and other process calculi and parallel programming languages.

    Maybe the transputer and OCCAM will even return :D

  102. Cache coloring by tlambert · · Score: 1

    The technique you are looking for is called "cache coloring". If you search for those two terms in Citeseer, you'll get about 60 papers back.

    Effectively, multicore architectures are morally very similar to ccNUMA. In both cases you're talking about a hierarchy of execution units.

  103. Sun's stolen the IBM's idea, from Dr. Throughput! by Anonymous Coward · · Score: 0
    http://www-1.ibm.com/servers/eserver/pseries/hardware/whitepapers/p5_db2.pdf
    http://www-1.ibm.com/servers/eserver/iseries/perfmgmt/pdf/SMT.pdf
    http://www.cs.washington.edu/research/smt/

    It uses fast task switching (2, 3 or 4 cycles) among many software tasks on only one real core (more cores are better!).

    open4free ©

  104. Easy to understand the Dr. Throughput Idea!!! by Anonymous Coward · · Score: 0
    For example, take one real core (i.e. one PC (Program Counter) or IP (Instruction Pointer) in use at a time):

    1. Cache miss => approx. 5 cycles.
    2. TLB miss => approx. 5 to 100 cycles.
    3. MMU miss => approx. 100 to 100,000 cycles.

    Imagine that the core has 1 active task and many sleeping tasks...

    And imagine that the special fast task switch for SMT costs only 2 cycles.

    Without SMT, when the first task stalls on any of these misses, the remaining cycles go unused.

    With SMT, when the first task stalls on a miss, the remaining cycles are used by the fast task switch and then by the 2nd, 3rd, 4th, ... tasks in its scheduling list!

    open4free © it's the best IPC. More cores are better!!!

  105. Spot on and annoying as hell. by PotatoHead · · Score: 1

    Microsoft is winning these datacenters one at a time.

    Vendor = UGS in this case.

    Top reason: Which Linux do we support?

    Of course I tell them to just pick one and let their users sort it out. No dice. They believe expectations are too hard to manage, and they worry about what happens when their particular Linux dies.

    Dorks.

    1. Re:Spot on and annoying as hell. by ckaminski · · Score: 1

      People have been targeting Red Hat and SuSE for some time. Other than the abrupt end of Red Hat Linux and the emergence of Fedora Core, what's the big deal? Thanks to adherence to the LSB, this isn't as big an issue as it was five years ago.

      RPMs built for Red Hat 9 typically work without error on my SuSE 8.2 machine, unless they're designed specifically for system management. This is a cop-out.

      Oracle is not supported on Fedora Core 2, but I'm running it, thanks to some tricks with tweaking a few /etc files. So what's the big deal?

      You are so right, annoying as hell. :-/ I think vendors spend too much effort checking uname -a and not enough making sure their libraries are installed properly. That is where autoconf comes in... maybe it's time installers started running it before committing an install?

  106. Totally. by PotatoHead · · Score: 1

    Why they don't see this is beyond me. All they need to do is spec what their config is and the rest can be handled by the users, VAR, whatever.

    PTC does it, why can't the others?

    1. Re:Totally. by ckaminski · · Score: 1

      I'll second this. I spent YEARS at PTC qualifying specific hardware configurations for support before we finally figured out that we had to target feature sets, not "Compaq Model #1010 w/ AccelGraphics Graphics Engine #2020." It was a PITA, and 99.9% of all functionality worked without error on generic 3D hardware of the day (Voodoo, early-to-mid 1997). Part of that had to do with fewer vendors paying for the certification as the margins in the workstation business collapsed under the weight of the PII flooding the market, but it was a good wake-up call.

      Object Design/Excelon/Progress (same company, different names) does it with ObjectStore... It's not a hard problem. If autoconf can figure out how to build bacula on my highly tweaked SuSE 8.2 Linux configuration, it can surely be used to figure out if Random Linux #.1 can support Random App 1.3.

  107. Not a connection machine by ka9dgx · · Score: 1
    It's not a Connection Machine. There is no program counter to be found anywhere. It's orthogonal, and that's it: a single bit to and from each neighbor in a simple 2D grid.

    --Mike--