Slashdot Mirror


How We'll Program 1000 Cores - and Get Linus Ranting, Again

vikingpower writes For developers, 2015 got kick-started mentally by a Linus Torvalds rant about parallel computing being a bunch of crock. Although Linus' rants are deservedly famous for their political incorrectness and (often) for their insight, it may be that Linus has overlooked Gustafson's Law. Back in 2012, the High Scalability blog already ran a post pointing towards new ways to think about parallel computing, especially the ideas of David Ungar, who thinks in the direction of lock-less computing of intermediary, possibly faulty results that are updated often. At the end of this year, we may be thinking differently about parallel server-side computing than we do today.

13 of 449 comments (clear)

  1. Clue token? by Anonymous Coward · · Score: 0, Insightful

    Linus doesn't have a clue about much of computing:

    * Floating point? Nope.
    * Graphics? Nope.
    * High performance? Nope.
    * Parallel? Nope.
    * Compiling the Linux kernel? Maybe.

    This is another clear indication he currently lacks the clue token.

  2. Re:Pullin' a Gates? by jhol13 · · Score: 3, Insightful

    Why not? Currently Firefox has problems rendering (loading) two pages simultaneously, although it should be able to handle tens, using several cores.
    Same with Evince (which is crap anyway), it cannot do anything in parallel, should be able to use tens of cores.
    Javascript? Although the language is the worst I have seen since APL, a smart compiler could at least in some cases parallelize it (maybe with speculative execution or like).
    And so on.

    It will turn out to be as wrong as "640k".

  3. Re:Pullin' a Gates? by bloodhawk · · Score: 3, Insightful

    Actually the quote is just an internet myth. At least, no one has ever found a source for it, or anyone who even claims to have heard him say it, and Gates denies having said it as well.

  4. Re:Core of the article by imgod2u · · Score: 3, Insightful

    The idea isn't that the computer ends up with an incorrect result. The idea is that the computer is designed to be fast at doing things in parallel with the occasional hiccup that will flag an error and re-run in the traditional slow method. How much of a window you can have for "screwing up" will determine how much performance you gain.

    This is essentially the idea behind transactional memory: optimize for the common case where threads that would use a lock don't actually access the same byte (or page, or cacheline) of memory. Elide the lock (pretend it isn't there), have the two threads run in parallel and if they do happen to collide, roll back and re-run in the slow way.
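The "run without the lock, redo it the slow way on a collision" pattern can be sketched in software. What follows is not hardware transactional memory; it is a seqlock-style versioned read, swapped in because it shows the same optimistic idea with standard C++11 atomics. The names `VersionedPair`, `read_sum` and `count_inconsistent_reads` are made up for illustration, and the invariant (a + b == 0) is an assumption chosen so that a torn read is easy to detect.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical example type: two fields that must always satisfy a + b == 0.
struct VersionedPair {
    std::atomic<std::uint64_t> version{0};  // even = stable, odd = write in progress
    std::atomic<long> a{0};
    std::atomic<long> b{0};
    std::mutex slow_path;

    // Writers always serialize on the mutex and bump the version around the update.
    void write(long x) {
        std::lock_guard<std::mutex> guard(slow_path);
        version.fetch_add(1);  // now odd: readers will notice a write in flight
        a.store(x);
        b.store(-x);
        version.fetch_add(1);  // even again: the pair is consistent
    }

    // Readers try without the lock first. If the version moved (or was odd),
    // the optimistic result is discarded and the read is redone under the lock.
    long read_sum(std::atomic<long>& fallbacks) {
        const std::uint64_t before = version.load();
        const long x = a.load();
        const long y = b.load();
        const std::uint64_t after = version.load();
        if (before == after && (before % 2) == 0) return x + y;  // fast path
        fallbacks.fetch_add(1);
        std::lock_guard<std::mutex> guard(slow_path);            // slow path
        return a.load() + b.load();
    }
};

// Runs one writer against several readers and returns how many reads
// observed a pair that violated the invariant.
long count_inconsistent_reads(int readers, int iterations) {
    VersionedPair pair;
    std::atomic<long> fallbacks{0};
    std::atomic<long> inconsistent{0};
    std::vector<std::thread> pool;
    pool.emplace_back([&] {
        for (int i = 1; i <= iterations; ++i) pair.write(i);
    });
    for (int r = 0; r < readers; ++r) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i)
                if (pair.read_sum(fallbacks) != 0) inconsistent.fetch_add(1);
        });
    }
    for (auto& t : pool) t.join();
    return inconsistent.load();
}
```

How often readers land on the slow path depends on how often writes overlap reads, which is the "window for screwing up" described above: the rarer the collisions, the more of the work runs lock-free.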

    We see this concept play out in many parts of hardware and software algorithms actually. Hell, TCP/IP is built on letting packets distribute freely and possibly collide or drop, with the idea that you can resend them. It ends up speeding up the common case: that packets make it to their destination along 1 path.

  5. Torvalds is half right by popo · · Score: 5, Insightful

    The problem is that Linus is discussing two different things at once and so it sounds like he's making a more inflammatory point than he is.

    The issue is not whether parallelism is uniformly better for all tasks. The question is: is parallelism better for some tasks? And as Torvalds points out, those tasks do exist (graphics being an obvious one).

    The nature of the workload required for most workstations is non-uniform processing of large quantities of discrete, irregular tasks. For this, parallelism (as Torvalds correctly notes) is likely not the most efficient approach. To pretend that in some magical future, our processing needs can be homogenized into tasks for which parallel computing is superior is to make a faith-based prediction on how our use of computers will evolve. I would say that the evidence is quite the opposite: that tasks will become more discrete and unique.

    Some fields though (finance, science, statistics, weather, medicine, etc.) are rife with computing tasks which ARE well suited to parallel computing. But how many of those tasks happen on workstations? Not many, most likely. So Linus' point is valid.

    But I have to take issue with the tone in which Linus downplays "graphics" as being a rather unimportant subset of computing tasks. It's not "graphics". It's "GRAPHICS". That's not a small outlier of a task. Wait until we're all wearing ninth generation Oculus headsets... the trajectory of parallel processing requirements for graphics is already becoming clear -- and it's stratospheric. The issue is this: our desktop processing requirements are actually slowing and, as Linus points out, are probably ill-suited for increased parallelism. But our graphics requirements may be nearly infinite.

    Unlike other fields of computing, we know where graphics is going 20 years from now: It's going to the "holodeck".

    Keep working on parallel computing guys. Yes, we need it.

     

    --
    ------ The best brain training is now totally free : )
    1. Re:Torvalds is half right by Anonymous Coward · · Score: 2, Insightful

      No, that's not faith. That's an economic argument. I know that many tasks which are considered practically non-parallelizable today can in fact be parallelized. We don't do that today because the additional work doesn't pay off when massive multicore systems are not yet available or not yet capable of running general purpose code. Often it's just a matter of getting the right tools, but sometimes you need to look at problems again and solve them in a different way. With new algorithms and new tools, you will make use of many cores, because if you don't, you will be left in the dust by the people who do.

  6. Re:Pullin' a Gates? by Urkki · · Score: 5, Insightful

    Why not? Currently Firefox has problems rendering (loading) two pages simultaneously, although it should be able to handle tens, using several cores.
    Same with Evince (which is crap anyway), it cannot do anything in parallel, should be able to use tens of cores.
    Javascript? Although the language is the worst I have seen since APL, a smart compiler could at least in some cases parallelize it (maybe with speculative execution or like).
    And so on.

    It will turn out to be as wrong as "640k".

    Javascript is generally used in an event-driven manner, so it will perform quite well on a single core. Firefox having trouble loading multiple pages simultaneously should still be IO-bound, not CPU-bound, and if the engine has trouble, then it's a SW architecture problem where more cores will not really help.

    Linus's point was that taking a 6-core CPU and replacing 2 of the cores with more cache and more transistors per core should make almost anything on the desktop run faster.

  7. Re:Programs people want to use... by Rei · · Score: 3, Insightful

    Indeed. There's tons of CPU-intensive tasks that need to be done in a modern computer game, but they're typically done as:

    while (true)
    {
        do_task_1();
        do_task_2();
        ( ... )
        do_task_N();
    }

    Rather than...

    std::thread([&](){ while (true) do_task_1(); }).detach();
    std::thread([&](){ while (true) do_task_2(); }).detach();
    ( ... )
    std::thread([&](){ while (true) do_task_N(); }).detach();

    ... or similar. Because in C and older versions of C++ launching a thread takes significant typing and ugly code, up to and including - in the case of the same function threaded a variable number of times in a loop with more than a trivial argument - having to have a memory-managed threadsafe container to hold your arguments (and in C you don't have STL containers, you have to do all that work yourself too). It's not the end of the world to have to code threads in C or earlier C++, but it's enough work that programmers usually don't do it any more than they're pretty much forced to. "Okay, my game will literally run at half the speed if I don't thread this function" - fine, they'll thread it. But "this function call eats up 3% of my performance, this one 6%, this one 4%, this one 2,5%, this one 3,5%...."? Usually such functions just get stuck into one big main loop.

    I really hope with how easy it's gotten in C++11 that more people will make better use of threads. In the first example code, not only do you relegate all of your tasks to the same core, thus hitting performance, but if any one task hangs, all of them hang. It's a terrible approach, but it's the most common. The only case where threads aren't good is where you're doing heavy concurrent reads/writes to the same cached data, but in real world apps there's almost always a level where you can launch the thread where this isn't the case, if it's even an issue to begin with in your particular application. The presumption that concurrent access to cached memory will usually or always be a problem (which seems to be Linus's presumption) requires that A) your threads not be doing the majority of their work on thread-local memory, AND B) the shared data area being read from / written to concurrently is small enough to be cached, AND C) you can't just migrate your threads up in scope N levels to work around any such issue.
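As a rough illustration of condition A above (threads doing most of their work on thread-local memory), here is a minimal C++11 sketch. The function `parallel_sum` and its slicing scheme are assumptions made up for this example, not anything from the post; unlike the detached threads shown earlier, it joins its workers so the partial results can be collected.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sums `data` by giving each worker its own contiguous slice and its own
// result slot, so no two threads write to the same element.
long long parallel_sum(const std::vector<int>& data, unsigned workers) {
    if (workers == 0) workers = 1;
    std::vector<long long> partial(workers, 0);
    std::vector<std::thread> pool;
    const std::size_t chunk = (data.size() + workers - 1) / workers;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&data, &partial, chunk, w] {
            const std::size_t begin = std::min(data.size(), w * chunk);
            const std::size_t end = std::min(data.size(), begin + chunk);
            long long local = 0;  // accumulator lives on this thread's stack
            for (std::size_t i = begin; i < end; ++i) local += data[i];
            partial[w] = local;   // a single write to shared memory per thread
        });
    }
    for (auto& t : pool) t.join();  // wait for every worker before combining
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Each worker reads shared data but writes only to its local accumulator until the very end, so the threads never contend over the same memory during the hot loop.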
     

    --
    If you play a Ke$ha song backwards, you hear messages from Satan. Even worse, if you play it forwards you hear Ke$ha.
  8. Shi's Law, Gustafson's Law, Amdahl's Law by amplesand · · Score: 3, Insightful

    Shi's Law

    http://developers.slashdot.org...

    http://spartan.cis.temple.edu/...

    http://slashdot.org/comments.p...

    "Researchers in the parallel processing community have been using Amdahl's Law and Gustafson's Law to obtain estimated speedups as measures of parallel program potential. In 1967, Amdahl's Law was used as an argument against massively parallel processing. Since 1988 Gustafson's Law has been used to justify massively parallel processing (MPP). Interestingly, a careful analysis reveals that these two laws are in fact identical. The well publicized arguments were resulted from misunderstandings of the nature of both laws.

    This paper establishes the mathematical equivalence between Amdahl's Law and Gustafson's Law. We also focus on an often neglected prerequisite to applying the Amdahl's Law: the serial and parallel programs must compute the same total number of steps for the same input. There is a class of commonly used algorithms for which this prerequisite is hard to satisfy. For these algorithms, the law can be abused. A simple rule is provided to identify these algorithms.

    We conclude that the use of the "serial percentage" concept in parallel performance evaluation is misleading. It has caused nearly three decades of confusion in the parallel processing community. This confusion disappears when processing times are used in the formulations. Therefore, we suggest that time-based formulations would be the most appropriate for parallel performance evaluation."
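The equivalence claimed in the abstract can be checked numerically. The sketch below uses the standard textbook forms of the two laws; the function names are made up here, and the conversion between the two "serial fractions" is derived from the definitions (serial time measured on one processor versus on n processors), not taken from the paper itself.

```cpp
#include <cassert>
#include <cmath>

// Amdahl's Law: `s` is the serial fraction of the run time on ONE processor.
double amdahl_speedup(double s, double n) {
    return 1.0 / (s + (1.0 - s) / n);
}

// Gustafson's Law: `s_par` is the serial fraction of the run time on N processors.
double gustafson_speedup(double s_par, double n) {
    return s_par + (1.0 - s_par) * n;
}

// Converts Gustafson's fraction into Amdahl's for the same program.
// With the parallel run time normalized to 1, the serial part takes s_par
// and the whole job takes s_par + (1 - s_par) * n on a single processor.
double serial_fraction_on_one_processor(double s_par, double n) {
    return s_par / (s_par + (1.0 - s_par) * n);
}
```

Once both fractions describe the same measured times, the two formulas give the same speedup. Plugging the same percentage into both, without converting it, is what makes them appear to disagree.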




  9. Poor slashdot... by Anonymous Coward · · Score: 3, Insightful

    Few are actually people with a real engineering background anymore.

    What Linus means is:
    - Moore's law is ending (go read about mask costs and feature sizes)
    - If you can't geometrically scale transistor counts, you will be transistor count bound (Duh)
    - therefore you have to choose what to use the transistors for
    - anyone with a little experience with how machines actually perform (as one would have to admit Linus does) will know that keeping execution units running is hard.
    - since memory bandwidth has nowhere near scaled with the CPU's appetite for instructions and data, cache is already a bottleneck

    Therefore, do instruction and register scheduling well, have the biggest on-die cache you can, and enough CPUs to deal with common threaded workflows. And this, in his opinion, is about 4 CPUs in common cases. I think we may find that his opinion is informed by looking at real data of CPU usage on common workloads, seeing as how performance benchmarks might be something he is interested in. In other words, based on some (perhaps ad hoc) statistics.

  10. Re:i'm so tired of political correctness by Attila+Dimedici · · Score: 4, Insightful

    No, "political correctness" is a thing. It is where someone gets in trouble for using the word "niggardly" because it sounds like another word.

    --
    The truth is that all men having power ought to be mistrusted. James Madison
  11. Linus is right by gweihir · · Score: 3, Insightful

    Nothing significant will change this year or in the next 10 years in parallel computing. The subject is very hard, and that may very well be a fundamental limit, not one requiring some kind of special "magic" idea. The other problem is that most programmers have severe trouble handling even classical, fully-locked, code in cases where the way to parallelize is rather clear. These "magic" new ways will turn out just as the hundreds of other "magic" ideas to finally get parallel computing to take off: As duds that either do not work at all, or that almost nobody can write code for.

    Really, stop grasping for straws. There is nothing to be gained in that direction, except for a few special problems where the problem can be partitioned exceptionally well. CPUs have reached a limit in speed, and this is a limit that will be with us for a very long time, and possibly permanently. There is nothing wrong with that, technology has countless other hard limits, some of them centuries old. Life goes on.

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  12. Re: Pullin' a Gates? by Anonymous Coward · · Score: 2, Insightful

    There has been a push back against integrating ANNs into mobile platforms. I think low power real time classification is simply missing an application in the mass market that can't be solved by off loading to a server. We simply assume that we are continuously connected to a sufficiently large data pipe and the problem goes away. Whether the hardware changes on the server side or not is a question of power savings, but I doubt we will see gains in performance over software implemented on server farms.

    That said, if we put our future caps on, is there a point at which our electronics gather so much data for processing that pushing it into the cloud becomes cost- and time-prohibitive? If wearable electronics become a pervasive technology, we may need some on-board, continuously learning classifier cores to locally fuse sensor data rather than sending raw data into the cloud. This is where we could see truly assistive computing without the creepier general intelligence Hassabis and crew are working on at DeepMind.

    Imagine you have a conversation with your wife and she says the kids need to be picked up at 4 on Tuesday. If my phone put a reminder on my calendar for me based on my continuous audio stream, the mental offload would be huge as I could seamlessly continue with my day without managing my calendar, but I don't want to continuously stream my audio to Google nor do they want to continuously process the sound of me typing and sipping coffee... That's what we have the NSA for.