Slashdot Mirror


How We'll Program 1000 Cores - and Get Linus Ranting, Again

vikingpower writes: For developers, 2015 got kick-started mentally by a Linus Torvalds rant about parallel computing being a bunch of crock. Although Linus' rants are deservedly famous for their political incorrectness and (often) for their insight, it may be that Linus has overlooked Gustafson's Law. Back in 2012, the High Scalability blog already ran a post pointing towards new ways to think about parallel computing, especially the ideas of David Ungar, who thinks in the direction of lock-less computing of intermediate, possibly faulty results that are updated often. At the end of this year, we may be thinking differently about parallel server-side computing than we do today.

7 of 449 comments

  1. Pullin' a Gates? by Tablizer · · Score: 4, Interesting

    "4 cores should be enough for any workstation"

    Perhaps it's an over-simplification, but if it turns out wrong, people will be quoting that for many decades like they do Gates' memory quote.

    1. Re:Pullin' a Gates? by bruce_the_loon · · Score: 4, Interesting

      If you went and read Linus' rant, you'd find you are actually reinforcing his argument. He says that except for a handful of edge use-cases, there will be no demand for massively parallel computing in end-user usage, and that we shouldn't waste time that could be better spent optimizing the low-core processes.

      The CAD, video and HTPC use-cases are already solved by the GPU architecture and don't need to be re-solved by inefficient CPU algorithms.

      Your Linux workstation would be a good example, but is a very low user count requirement and can be done at the compiler level and not the core OS level anyway.

      Your Linux gaming machine shouldn't be using more than 3 or 4 CPU cores, and should be handing the heavy grunt work off to the GPU anyway. No need for a 64 core CPU for that one.

      Redesigning what we're already doing successfully is pretty pointless: a small number of controller/data-shifting CPU cores managing a large bank of dedicated rendering/physics GPU cores, plus task-specific ASICs for things like 10Gb networking and 6Gb/s I/O interfaces. That is what Linus is talking about, not that we only need 4 cores and nothing else.

      --
      Trying to become famous by taking photos. Visit my homepage please.
    2. Re:Pullin' a Gates? by Rei · · Score: 3, Interesting

      Linus's argument basically boils down to, "Parallel algorithms are sorcery, and the only places they matter are applications that demand performance, which are indeed increasingly using parallelism".

      Of course you don't need, say, a 50-threaded version of vi or alsamixer or whatever. But for apps that need performance, increasingly they have to get it from threading. And there's nothing "magical" about parallelism. Perhaps in Linus's dislike for C++ he's missed how trivially easy it's gotten to launch threads in C++11, but it takes less work now than a for-loop, since std::thread is so simple and you can inline the command with a lambda. And you have a nice clean mutex library including scoped mutexes like std::lock_guard so you don't even have to remember to unlock them.

      It's quite true that having multiple cores needing to read from and write to the same chunk of memory isn't a good thing. But I'd bet you that only in under 5% or so of high performance apps is that the *only* level you can thread at. Because if you have, say, five nested levels of looping, 4 of them can be memory constrained, but so long as at least one can be threaded without heavy reads/writes on shared cache, you can thread to your heart's content with minimal adverse impact. And "heavy" is the key word. So long as you're not doing essentially *constant* heavy reads/writes on shared cache, the overhead cost is minimal.

      --
      If you play a Ke$ha song backwards, you hear messages from Satan. Even worse, if you play it forwards you hear Ke$ha.
    3. Re:Pullin' a Gates? by im_thatoneguy · · Score: 3, Interesting

      It is a niche which will need specific algorithms tuned for the hardware (GPU or other); the pipeline must be kept busy to observe a performance gain. It doesn't scale to general purpose computing.

      I feel like this is moving the goal posts. "You will never do massively parallel computing on a CPU because if it's massively parallel it's a GPU not a CPU."

      Linus is 100% wrong. What's the "general purpose" computing that we all want? The NCC-1701D's main computer from Star Trek. If I say "Cortana/Siri/Google Now, please rough me out a flyer for our yard sale on Saturday," you're going to be looking at a massively parallel task for the neural networks to not only interpret the voice but then make sense of the words and finally produce a printable flyer suitable for hanging. Programming is still a really fancy version of "IF A THEN B", "for X in GROUP do Z", "X = Y". Yeah, if your application is incredibly serial then a serial processor is all that you'll need. When computing advances to the next phase of neural networks, AI and directed (not instructed) computing, then it'll need to be more like our brain: massively parallel.

      Now there are two obnoxious tautological arguments against this:
      A) "That's not a "CPU" that's like a NeuroProcessorUnit, an NPU if you will"
      B) "Yes we'll need a giant mainframe, but it'll be a server in the cloud!"

      A is moving the goal posts. Just because the processor isn't an ARM or x86 instruction compatible chip doesn't mean it's not worthy of the label CPU. As mentioned above you can't say that there'll never be a CPU with massive parallelism because as soon as it has massive parallelism it's by definition no longer a CPU. B is just saying that nobody will have a need for computers because we'll have a giant mainframe. Which might be true but you just need a basic DSP not even a CPU if it's just a pure thin client transmitting a video, audio and input stream to the cloud for processing. In which case all of the CPUs in existence... need to be massively parallel AI processors.

    4. Re:Pullin' a Gates? by Anonymous Coward · · Score: 2, Interesting

      Perhaps in Linus's dislike for C++ he's missed how trivially easy it's gotten to launch threads in C++11, but it takes less work now than a for-loop, since std::thread is so simple and you can inline the command with a lambda. And you have a nice clean mutex library including scoped mutexes like std::lock_guard so you don't even have to remember to unlock them.

      He doesn't mention C++ or anything like that. What he is talking about is that the overhead for task switching is pretty large, so where a tradeoff has to be made between the performance of a single core and adding more cores to a CPU, you will typically get more performance gain from having fewer, better cores, since the tasks most users do most of the time are of a nature that doesn't lend itself to parallelization. In those cases where it is easily done, it is already delegated to dedicated hardware like a GPU.
      For your typical for-loop that is so easy to launch threads for, the problem is that the overhead for moving the task to another core with another cache is so high that you don't get a performance gain. There are still cases where it makes sense to launch threads, but people who do it without thinking because "parallel is better" are the kind of programmers who jump on every new programming fad.

  2. How parallel does a Word Processor need to be? by Nutria · · Score: 3, Interesting

    Or a spreadsheet? (Sure, a small fraction of people will have monster multi-tab sheets, but they're idiots.)
    Email programs?
    Chat?
    Web browsers get a big win from multi-processing, but not parallel algorithms.

    Linus is right: most of what we do has limited need for massive parallelization, and the work that does benefit from parallelization has been parallelized.

    --
    "I don't know, therefore Aliens" Wafflebox1
  3. Bad summary, shocking by Urkki · · Score: 5, Interesting

    Linus doesn't so much say that parallelism is useless; he's saying that more cache and bigger, more efficient cores are much better. Therefore, increasing the number of cores at the cost of single-core efficiency is just stupid for general purpose computing. Better to just stick more cache on the die instead of adding a core. Or that is how I read what he says.

    I'd say the number of cores should scale with IO bandwidth. You need enough cores to make parallel compilation CPU bound. Is 4 cores enough for that? Well, I don't know, but if the cores are efficient (highly parallel out-of-order execution) and have large caches, I'd wager IO lags far behind today. Is IO catching up? When will it catch up, if it is? No idea. Maybe someone here does?