Slashdot Mirror


SW Weenies: Ready for CMT?

tbray writes "The hardware guys are getting ready to toss this big hairy package over the wall: CMT (Chip Multi Threading) and TLP (Thread Level Parallelism). Think about a chip that isn't that fast but runs 32 threads in hardware. This year, more threads next year. How do you make your code run fast? Anyhow, I was just at a high-level Sun meeting about this stuff, and we don't know the answers, but I pulled together some of the questions."

8 of 378 comments (clear)

  1. Steam Engine - Diesel by kpp_kpp · · Score: 5, Insightful

    Some people have predicted this move for quite some time. I remember hearing about it back in the late 80's early 90's and I'm sure it goes way back before then. The analogy was to Steam Engines and why they lost out over Diesels. You can only make a Steam engine so big but you cannot connect them together to get more power. With diesels you can hook many of them together for more power. Chips are finally getting to the same point -- It is more cost efficient to chain them together than to create a monsterous one. I'm surprised it has take this long to get to this point.

  2. well at least he seems to understand the problems by Anonymous Coward · · Score: 5, Interesting

    from TFA:
    "Problem: Legacy Apps You'd be surprised how many cycles the world's Sun boxes spend running decades-old FORTRAN, COBOL, C, and C++ code in monster legacy apps that work just fine and aren't getting thrown away any time soon. There aren't enough people and time in the world to re-write these suckers, plus it took person-centuries in the first place to make them correct.

    Obviously it's not just Sun, I bet every kind of computer you can think of carries its share of this kind of good old code. I guarantee that whoever wrote that code wasn't thinking about threads or concurrency or lock-free algorithms or any of that stuff. So if we're going to get some real CMT juice out of these things, it's going to have to be done automatically down in the infrastructure. I'd think the legacy-language compiler teams have lots of opportunities for innovation in an area where you might not have expected it."

  3. One Weenie's Perspective by Anonymous Coward · · Score: 5, Funny

    Well I am a Star Wars weenie, and I am definitely NOT ready for Country Music Television.

  4. Big Hairy Package by Tweak232 · · Score: 5, Funny

    "The hardware guys are getting ready to toss this big hairy package over the wall:"

    Vivid imagary...

  5. Programming isn't up to it by Toby+The+Economist · · Score: 5, Interesting

    32 threads in hardware on one chip is the same as 32 slow CPUs.

    Current programming languages are insufficiently descriptive to permit compilers to generate usefully multi-threaded code.

    Accordingly, multi-threading is currently handled by the programmer; which by and large doesn't happen, because programmers are not used to it.

    A lot of applications these days are weakly multi-threaded - Windows apps for example often have one thread for the GUI, another for their main processing work.

    This is *weak* multi-threading; because the main work done occurs within a single thread. Strong multi-threading is when the main work is somehow partioned so that it is processed by several threads. This is difficult, because a lot of tasks are inherently essentially serial; stage A must complete before stage B which must complete before stage C.

    The main technique I'm aware of for making good use of multi-threading support is that of worker-thread farms. A main thread receives requests for work and farms them out to worker threads. This approach is useful only for a certain subset of problem types, however, and within the processing of *each* worker thread, the work done itself remains essentially serial.

    In other words, clock speeds have hit the wall, transistor counts are still rising, the only way to improve performance is to have more CPUs/threads, but programming models don't yet know how to actually *use* multiple CPU/threads.

    El problemo!

    --
    Toby

  6. Re:Schism Growing by philipgar · · Score: 5, Interesting

    Actually from what I've heard, the entire industry is moving in this direction. The whole idea of out of order processors (OOP) has become outdated. OOP was great. Enabled massive single threaded performance, however the costs (in terms of area and heat dissipation) is enormous.

    I just came back from the DaMoN workshop where the keynote was delivered by one of the lead P4 developers. He explained the future of microprocessors and said that the 10-15% extra performance that OOP enables just isn't worth it. The Pentium 4 has 3 issue units, but the way things are rarely issues more than 1 instruction per cycle.

    We can squeeze more performance out of them, but not much. The easiest method is to go dual core. However if an application must be multithreaded to enable the best performance, what would you rather have . . . 2 highly advanced cores, or 8-10 simple cores that can issue half as many instructions per cycle as the dual core design. Than consider the fact that each core enables 4 threads to run (switch on cache miss/access). It doesn't take a rocket scientist to see that overall performance is improved with this.

    The other option is the hybrid core. A single really fast x86 core combined with multiple simpler x86 cores. That way single threaded apps can run fast (until they're converted) and you can get overall throughput from the system without blowing away your power budget on OOP optimizations.

    Granted most of this is in the future (within the next 5 years), but IBM's going that way (ala Cell), its within Intels roadmap, Sun is pushing that route etc. I assume AMD has plans to create a supercomputer on a chip . . . unless they wish to be obsoleted.

    Phil

  7. Re:Question: What needs multiple threads? by Frit+Mock · · Score: 5, Insightful


    In games the AI of non-player-characters (-objects) can profit a lot from threading.

    But for common apps ... I don't expect a big gain from multiple threads. I guess typical apps like browsers, word-processor and so one have a hard time utilizing more than 3-4 threads for the most common operations a user does.

  8. Why the future of SMT is bleak by spockvariant · · Score: 5, Informative

    I'm a researcher working on high performance computing and have used various configurations of Simultaneous Multithreading (aka Hyperthreading aka CMT) (Intel Xeon, IBM POWER5). The result is always the same - at the end, memory latencies and OS overheads kill most of the gains of instruction level parallelism coming from SMT. Look at it this way - the typical latencies of operations on most modern processors are of the order of 1 nanosecond, whereas DRAM latencies are of the order of 200ns. As long as you can't do anything about this latency, there's no point in cutting down on processing times. There's a very nice paper in this year's ACM SIGMETRICS that gives real experimental data to illustrate this fact - http://www.cs.princeton.edu/~yruan/XeonSMT/smt.pdf The paper shows that the speedups obtained using SMT in practice are meagre. The reason that the simulation results coming from the original UWashington research on the subject - http://www.cs.washington.edu/research/smt/ - looked far better was their use of unreasonably large caches in their simulations, and that they completely ignored the OS overhead of enabling SMT - which is non-negligeable - and is a thing that has been pointed out often on the Linux Kernel mailing list as well.