Slashdot Mirror


Auto-threading Compiler Could Restore Moore's Law Gains

New submitter Nemo the Magnificent writes "Develop in the Cloud has news about what might be a breakthrough out of Microsoft Research. A team there wrote a paper (PDF), now accepted for publication at OOPSLA, that describes how to teach a compiler to auto-thread a program that was written single-threaded in a conventional language like C#. This is the holy grail to take advantage of multiple cores — to get Moore's Law improvements back on track, after they essentially ran aground in the last decade. (Functional programming, the other great hope, just isn't happening.) About 2004 was when Intel et al. ran into a wall and started packing multiple cores into chips instead of cranking the clock speed. The Microsoft team modified a C# compiler to use the new technique, and claim a 'large project at Microsoft' have written 'several million lines of code' testing out the resulting 'safe parallelism.'" The paper is a good read if you're into compilers and functional programming. The key to operation is adding permissions to reference types allowing you to declare normal references, read-only references to mutable objects, references to globally immutable objects, and references to isolated clusters of objects. With that information, the compiler is able to prove that chunks of code can safely be run in parallel. Unlike many other approaches, it doesn't require that your program be purely functional either.

38 of 404 comments (clear)

  1. Not this shit again by Anonymous Coward · · Score: 4, Informative

    Moore's law has nothing to do with performance, numbnuts. Apparently the real geeks left Slashdot ages ago.

    1. Re:Not this shit again by xetovss · · Score: 4, Interesting

      Exactly, Moore's Law isn't a law, it is a marketing plan. I don't see why so many people get so serious about it. A real law (of science) would be something like the law of gravity where it has a near universal application, whereas Moore's Law is a "law" that describes Intel's marketing plan.

    2. Re:Not this shit again by Baloroth · · Score: 4, Informative

      Moore's law has nothing to do with performance, numbnuts. Apparently the real geeks left Slashdot ages ago.

      Yes and no. Moore's law states that the number of transistors will double every 2 years. The problem is that we are nearing the peak of what is possible with current technology in a single core, hence all the focus on 4,6,8, and even 16 core systems for consumers (always been popular in supercomputers and the like). That means doubling transistor count every 2 years can be done through increasing cores... but there is no point in doing that if programs can only use a few of them (very few consumers now need 4 cores, much less 8 or 16).

      So, if you can scale up the ability to use processor cores, then Moore's law can continue to hold for CPU's as we increase processor cores. If you can't, then it won't. It'll have to stop sometime, of course, just not necessarily today.

      --
      "None can love freedom heartily, but good men; the rest love not freedom, but license." --John Milton
    3. Re:Not this shit again by Jonner · · Score: 4, Insightful

      Moore's law has nothing to do with performance, numbnuts. Apparently the real geeks left Slashdot ages ago.

      Try reading the headline again. Usually, clueless posters have to be reminded to read TFA, but this is ridiculous. As Moore's "law" continues to be mostly true, the added transistors are being used for extra more cores rather than to make one core faster. Most of the cores sit idle most of the time because few programs can use more than one of them at once.

    4. Re:Not this shit again by c0lo · · Score: 5, Funny

      Apparently the real geeks left Slashdot ages ago.

      Casted to void?

      --
      Questions raise, answers kill. Raise questions to stay alive.
    5. Re:Not this shit again by Spiridios · · Score: 5, Insightful

      Maybe you need to read?

      Here's the headline (emphasis mine): Auto-threading Compiler Could Restore Moore's Law Gains

      In the past, adding more transistors to a core meant the core worked faster (simplification), so Moore's law indirectly lead to better performance. Now a core is close to as fast/efficient as its going to get, so we throw the extra transistors at new cores (still Moore's law). The problem is, there's only so many single-threaded processes a person can run before there will literally be one core per process. In order to see benefits from the extra transistors again, "gains" in the summary's terminology, then we need a way to take advantage of it (this new compiler technique, functional programming, programmers who can actually grok threads). In the end, if we're not seeing some kind of gain from Moore's law, then the chip manufacturers will stop adding new transistors because no one will pay money for a chip that's just as good as their current chip, and Moore's law will fail.

    6. Re:Not this shit again by ceoyoyo · · Score: 5, Interesting

      Moore's law was coined by an engineer to describe a series of observations. That is, it's a mathematical function that seems to fit some data, without any explanatory power. Just like various other "laws" such as the laws of thermodynamics, and, your favourite, Newton's laws, including his law of universal gravitation.

      Moore's law states that the number of components on an integrated circuit doubles approximately every two years.

    7. Re:Not this shit again by shmageggy · · Score: 4, Insightful

      Yeah but "Moore's Heuristic" just aint as snappy.

    8. Re:Not this shit again by tlhIngan · · Score: 4, Informative

      Yes and no. Moore's law states that the number of transistors will double every 2 years. The problem is that we are nearing the peak of what is possible with current technology in a single core, hence all the focus on 4,6,8, and even 16 core systems for consumers (always been popular in supercomputers and the like). That means doubling transistor count every 2 years can be done through increasing cores... but there is no point in doing that if programs can only use a few of them (very few consumers now need 4 cores, much less 8 or 16).

      Except... the number of transistors in a CPU is irrelevant!

      A CPU doesn't have the transistor density that really benefits much from Moore's Law - because the vast majority of the space on a chip is not taken up by transistors, but by wiring. In fact, the wiring density is what's limiting transistor density (a good thing - larger transistors can give you better performance because they can drive the longer wires quicker).

      Most of the transistors used in a CPU actually goes towards the cache - when you're talking about 16+ MB of pure L1/L2/L3 cache, implemented as 6T SRAM cells, that's 100M transistors right there (and that doesn't include the cache line tag logic and CAM).

      The thing with the highest transistor density (and thus the most benefit of Moore's Law) is actually memory structures - caches, DRAM, SRAM, flash memory, etc. This is where each transistor is vital to memory storage and packing them in close means more storage is available, in which case Moore's law states that RAM etc. will double in capacity or halve in cost every 18 months or so.

      Smaller transistors do help CPUs consume a little less power, but double the number of transistors doesn't do a whole lot because there's a lot of empty space that the wiring forces to be transistor-free. (Non-memory parts of the CPU are effectively "random logic" where there's no rhyme or reason to the wiring). It's why the caches have the most transistors yet take the smallest areas.

    9. Re:Not this shit again by TheLink · · Score: 4, Interesting

      The problem is that we are nearing the peak of what is possible with current technology in a single core

      But aren't there still plenty of things that the hardware can do to make the software stuff easier?

      Intel has actually started adding some stuff to help: http://en.wikipedia.org/wiki/Transactional_Synchronization_Extensions

      So maybe Intel, AMD should interact more with the OS and compiler bunch and figure out how to use those billions of transistors in more useful ways.

      There are things you can do in software to address the c10k problem (and similar problems): http://www.kegel.com/c10k.html
      But I'm sure there are things you can do in hardware to make stuff even easier. It's not like the stuff that's being done is the best way of doing things. Furthermore the OS and apps people might be writing things in certain ways because certain things aren't done by the hardware or done fast.

      I know that at least on the x86 platform checking the current time and also getting monotonic time is not as simple AND efficient as it could be. It was even worse before HPET, and now even HPET isn't that great. Monotonic system time (ticks) could be different from current human time, many programs need one or both.

      Same goes for scheduling stuff, serializing stuff and getting stuff to trigger on arbitrary things all with minimal overhead. I suspect many things would be easier if you can create a way of having cluster wide consistent monotonic high-res time, and also fast low latency clusterwide locking.

      On a related note many programming languages seem to like to push parameters onto a stack. That's fine but in many implementations they seem to push the data onto the code/call stack (which holds return addresses). This mixing of data and code stuff is unsafe. They should have separate data and code stacks. That way even if a hacker messes with the data, it's harder to execute arbitrary code- you'd just execute the same code but with mangled data, which is more likely to produce errors instead of arbitrary code execution.

      If the reason for doing such insecure stuff is performance or convenience, then perhaps Intel etc can use those transistors to make it faster and easier to do things safer.

      --
    10. Re:Not this shit again by chrysrobyn · · Score: 5, Interesting

      Except... the number of transistors in a CPU is irrelevant!

      No, it's very relevant.

      A CPU doesn't have the transistor density that really benefits much from Moore's Law - because the vast majority of the space on a chip is not taken up by transistors, but by wiring. In fact, the wiring density is what's limiting transistor density (a good thing - larger transistors can give you better performance because they can drive the longer wires quicker).

      How much wiring happens on doped silicon? None. The vast majority of the chip is covered in transistors, with 6-10 levels of wires on top of them. There are some designs where the I/O count demands so many pins that's what dictates the size of the chip -- so cache is filled in underneath. Heck, if your power budget allows it, you're already blowing the silicon area anyway, might as well increase your cache size! Consider your recent Core derived designs. Take away half the cache. Do you think the die area would go down? Not hardly.

      Most of the transistors used in a CPU actually goes towards the cache - when you're talking about 16+ MB of pure L1/L2/L3 cache, implemented as 6T SRAM cells, that's 100M transistors right there (and that doesn't include the cache line tag logic and CAM).

      You did the math right, but the cache line tag logic and coupled CAM are negligible. Sure, they may add a few million or so, but not anywhere near 5% of 100M.

      The thing with the highest transistor density (and thus the most benefit of Moore's Law) is actually memory structures - caches, DRAM, SRAM, flash memory, etc. This is where each transistor is vital to memory storage and packing them in close means more storage is available, in which case Moore's law states that RAM etc. will double in capacity or halve in cost every 18 months or so.

      I realize it's vogue for people to revisit Moore's Law and rewrite it every few years, but he was not speaking specifically about memory arrays. In fact, the chips Moore had access to at the time had very little memory on them.

      Smaller transistors do help CPUs consume a little less power, but double the number of transistors doesn't do a whole lot because there's a lot of empty space that the wiring forces to be transistor-free. (Non-memory parts of the CPU are effectively "random logic" where there's no rhyme or reason to the wiring). It's why the caches have the most transistors yet take the smallest areas.

      Wiring never forces silicon area to be transistor-free, unless you're thinking of 1980 era chips. Not even late '80s had wiring on doped silicon. Certainly the kinds of chips Moore was talking about has had no significant wiring on doped silicon in 20 years, the exceptions being only when layout designers are getting lazy. I've done layout design, I've done circuit design, I've audited dozens of chip layouts and seen several technology manuals dating back to the 90s.

      That random logic, by the way, is the subject of the most innovation in the field of chip layout and arguably in all of chip design. When your chip's entire goal is to funnel data through different units and do different things to it, you're dominated by buses. Automated tools often do split these buses up, but different algorithms can pull them together and make them more efficient. Caches are the smallest because they can be small. There's an entire periphery to them, including senseamps devoted to reading the baby FETs that can't make full rail to rail swings on the bitlines.

      May I guess you're a student? Perhaps one who is learning from a professor who hasn't been in the industry since about 1985?

  2. Re:I hate when people misuse Moore's law by oodaloop · · Score: 4, Interesting

    It turns out that Moore was observing only a small portion of a greater trend. That is, many (or most) things involving technology have a short doubling period, usually between 1 and 2 years. Total information created, processing power per dollar, resolution of brain scanning technology, et al all follow a similar doubling period to what Moore observed with transistors on integrated circuits. It's not that everyone else is misusing Moore's Law, so much as Moore didn't make make the law broad enough.

    --
    Tic-Tac-Toe, Global Thermonuclear War, and relationships all have the same winning move.
  3. Re:I hate when people misuse Moore's law by Anonymous Coward · · Score: 5, Funny

    No no, you misunderstand. They're teaching the compiler to be so efficient that it actually starts creating new transistors to make itself smarter.

    Make no mistake, this is how the rise of the machines will start.

  4. Re:I hate when people misuse Moore's law by newcastlejon · · Score: 4, Funny

    oodaloop's Law doesn't have quite the same ring to it.

    --
    If God forks the Universe every time you roll a die, he'd better have a damned good memory.
  5. Or.. teach devs to use threading as appropriate? by databeast · · Score: 4, Informative

    Or, gawd forbid.. we could teach programmers how to use threading? I am a casual developer, with no formal training beyond writing practical code and reading as much as I can from the keyboards of real developers. I've run into my fair share of "real", "professional" developers who've been tasked to work on my code and thrown their hands up in defeat - not, as I feared, because of the awfulness of my coding style, but "This uses threading, I don't know this!".. Of course, looking at their resumes, a long list of single threaded webapps where the actual threading is handled for them by the webserver itself.. I give up. Even some basic native GUI client development should teach developers simple threading and asynchronous callbacks? alas, yet another talent abandoned in the age of the webapp. Is it any wonder the security issues (my actual realm of supposed 'expertise') in their code often make the architectural issues look trivial in comparison?

    An interesting development, and much needed I fear, but yet another layer of abstraction to allow lazy developers to not have to really bother about knowing what their code is actually doing (that's for the poor SoB who has to maintain it is for...)

  6. Parallelizing is easy, performance is hard by HuguesT · · Score: 4, Insightful

    OK, so now we shall have another way to produce parallel programs.

    Running things safely in parallel is a very good thing, but it does not by itself guarantee significant improvements in speed. The hard bit is actually to optimize the parallel code.

  7. Re:Great potential by Desler · · Score: 5, Informative

    Having the compiler identify things which could run in parallel and thread work to run as A B&C D would be a huge step forward as long as it doesn't introduce bugs.

    Compilers already try to do this for example with auto-vectorization. The problem is that they are usually quite terrible at it. Even Intel's compiler which is probably the best at it usually misses out on most of the obvious places that should be vectorized. This is why pretty much all code dealing with multimedia content (audio/video/image codecs, games, etc.) still rely on tons of had written SIMD to be optimized to their fullest.

  8. mutable state by jbolden · · Score: 4, Informative

    Mainstream language have mutable state all over the code. Functional programming's big change on state issues is to careful isolate state. The Microsoft approach means that state needs to be tracked carefully so that it could be isolated by the compiler even if it isn't isolated by the code. Which is likely just as much work as isolating state. And the nice thing about isolating state is once you do it you can make use of all sorts of incredibly powerful functional paradigms like like first class functions (closures, partial execution...) and lazyness (infinite data structures, no need to figure out proper order of evaluation..)

    The solution to parallelism is functional programming. And no it is not too hard. Excel is a functional programming language that lots of people know that does a great job isolating state.

  9. Re:Or.. teach devs to use threading as appropriate by lennier · · Score: 4, Insightful

    Or, gawd forbid.. we could teach programmers how to use threading?

    That's easy: "Don't."

    From everything I've read about threading, there's no general way a hand-coded multithreaded program can ever be proven to be correct. Threads introduce all sorts of extremely subtle timing-based logic and security bugs which even the smartest programmers in the world think they can handle but in practice don't. And most programmers are not the smartest programmers in the world (they only think they are).

    The correct solution is to switch from C-based imperative languages to pure-functional implicitly parallelised languages, but that's not likely to happen before the heat death of the universe.

    --
    You are not a brain: http://books.google.com/books?id=2oV61CeDx-YC
  10. Re:How about by chromas · · Score: 5, Funny

    all the magic compiler is going to do is increase the speed of the System Idle Thread.

    Have you seen how much processing power that thing consumes? Usually 90% and up. On all cores. It really needs some serious optimization.

  11. Re:I hate when people misuse Moore's law by timeOday · · Score: 4, Insightful

    I hate it when all the potentially interesting discussion on a slashdot story is headed off by a pedantic nitpick.

  12. Re:Great potential by allcoolnameswheretak · · Score: 5, Insightful

    Imagine you have a for-loop that calls the method 'hop' on every object 'bunny' in the list:

    for every bunny in list {
            bunny.hop()
    }

    This is a simple, sequential process - bunny.hop() is called in order, for every bunny in the list, one after the other.
    Now suppose you have defined the data in your program in such a way that the compiler knows that method 'bunny.hop()' only ever accesses read-only data, or modifies local data that is unaccessible from anywhere else. The compiler now knows that the order of execution of the bunny hops doesn't really matter, as every call of bunny.hop() is independent from anything else. This frees the compiler to spawn threads or processes to call as many bunny.hop()'s as he likes at the same time, thereby processing through the list alot faster.

    Another method, bunny.eat() actually performs write access on a shared object 'carrot' when called. If two bunnies eat the same carrot, the compiler can not perform automatic parallelization, as running two bunny.eat() methods could lead to invalid state (only one piece of carrot remaining, two bunnies eating 1 piece of carrot at the same time, results in -1 pieces of carrot). In this case, the compiler will take care to run two bunnies eating away at the same carrot sequentially. However if there are 2 bunnies eating the green carrot and another 2 bunnies eating the yellow carrot, these are again independent from each other and can again be paralleled.

    The requirement to make this possible is to provide the compiler with information on what kind of data something is - is it an isolated hopping? Or a shared carrot? or a global Bunnygod that affects all bunnies?

  13. Re:Or.. teach devs to use threading as appropriate by Jonner · · Score: 5, Insightful

    An interesting development, and much needed I fear, but yet another layer of abstraction to allow lazy developers to not have to really bother about knowing what their code is actually doing (that's for the poor SoB who has to maintain it is for...)

    Developing software is all about managing complexity. Abstractions are the primary tool used to do so. They are neither inherently good or bad. A large part of writing good software is finding the appropriate abstractions and eliminating inappropriate ones. If an abstraction allows a program to be written more easily with an acceptible level of performance and correctness, it is an appropriate abstraction.

    To know what code is "actually doing" is relative. No programmer knows what his code is doing at the level of gates on the chip. It's rarely necessary or even helpful to know what the code is doing at the level of CPU instructions or microinstructions. This is especially true if the code runs on multiple CPUs.

  14. Re:I hate when people misuse Moore's law by loneDreamer · · Score: 4, Interesting

    Magnetic Hard drives, for instance is an example of the same curve with no transistors.

  15. Meh. Not that big a problem. by fyngyrz · · Score: 4, Insightful

    Honestly, you know what this says? This just says some programmers still need to go practice in the threading space.

    Threading is not that hard at all for a VERY large class of problems. There are issues -- like, the memory bandwidth will eventually choke the ability of processors to get at nearby memory regions -- and you need some kind of sane plan to deal with cacheing in one core as compared to the next core when you share flags or data coming in during execution from elsewhere -- and you need to figure out when thread overhead starts to chew into your efficiencies -- but really, it's entirely doable, and isn't that difficult a problem space as compared to the actual problems *serious* programmers have to solve. It may just be that your "thing" isn't really all that threadable. Some things aren't. But it should also be very obvious when they aren't.

    Then again, some problem spaces lend themselves very well to multiple core approaches -- I just finished a library that slices up images and then applies various separable algorithms to the slices in a variable number of threads (the user of the library can specify.) I wrote it in pure C, used posix threads, no sweat at all, in my 8-core machine, you get just about exactly the gains you'd think (x the number of cores) when the process is CPU cycle intensive, less as the ops are simpler and memory I/O rates rise.

    Seriously. If threading seems that hard, you need to go do some studying. It's not the threading. It's you. You really should get on this bus.

    --
    I've fallen off your lawn, and I can't get up.
    1. Re:Meh. Not that big a problem. by rk · · Score: 5, Funny

      Maybe it's subtle sarcasm regarding learning threaded programming? People can't identify the right thread in a conversation, but should be expected to write threaded code? :-)

  16. Re:Great potential by Boawk · · Score: 5, Funny

    If two bunnies eat the same carrot...the compiler will take care to run two bunnies eating away at the same carrot sequentially. However if there are 2 bunnies eating the green carrot and another 2 bunnies eating the yellow carrot

    Am I the only one who is totally horny after reading that?

  17. Nothing new here by stevel · · Score: 4, Informative

    This is nothing new. As in decades-old. Back when I was at DEC in the 1980s we had auto-threading compilers that worked very well on standard application code. Today, Intel has been doing auto-threading in its C++ and Fortran compilers for more than a decade and it is very effective for a large class of applications. Intel's compiler also has features that help you tweak the code for better auto-parallelism, notably the Guided Auto Parallel feature introduced in 2010. Other compiler vendors also have auto-parallelizing compilers.

    I've been in the industry for decades, and every couple of years someone comes out with yet another programming language or technique that is supposed to revolutionize application parallelization. This is just another one, and many years behind the rest of the industry from what I can tell.

  18. Re:Great potential by Your.Master · · Score: 4, Insightful

    If it has a visual effect, then it's not read-only.

  19. Re:Or.. teach devs to use threading as appropriate by betterunixthanunix · · Score: 5, Insightful
    Let me repeat myself: except for very small programs or very short inner loop bodies. Most of what you are describing is pretty small in terms of the amount of code and its complexity; hand-optimizing assembly code does not scale up.

    Even the best of C or C++ compilers are terrible at vectorization of code.

    Yeah, and the best humans are terrible at allocating registers -- so bad, in fact, that the best C compilers ignore the register keyword. What do you think is more relevant to the general case: vectorizing multimedia operations, or allocating registers? Compilers are also better than humans at:

    • Finding redundant or unreachable code
    • Finding dead or otherwise unneeded variables
    • Finding algebraic reductions
    • Strength reducing transformations
    • Boolean optimizations
    • Reducing program size

    ...and numerous other optimizations that, like register allocation, are more generally applicable than SIMD. There has also been work on compiler optimizations that are utterly out of reach for even the most skilled humans, like probabilistic optimizations that have a negligible (with respect to some tunable parameter) probability of being unsound.

    To put it another way, look at the Orbitz story, which is over a decade old now:

    http://www.paulgraham.com/carl.html

    On the one hand, you have hand-tuned assembly language. On the other, you have a program written in Lisp (a high level, compiled language) with some C++ mixed in (for managing memory). Orbitz was able to compete on speed, but more importantly, it was returning better results. It's not that the people who wrote that mainframe assembly language were idiots -- they were taking advantage of all sorts of special hardware features, they knew how to hack their machines better than anyone else -- it is just that the Orbitz algorithm was far too complex for efficient hand-rolled assembly code, at which point compilers are really the only choice. The mainframe guys were busy thinking about how to make use of special machine features in assembly language; the ITA team was busy solving the higher-level problem, and relying on their compiler to generate good assembly language code. This is a particularly telling line:

    We disassemble most every Lisp function looking for inefficiencies and have had both CMUCL and Franz enhanced to compile our code better.

    [emphasis mine]. They disassembled their code...and then improved their compiler when they saw problems. They did not hand-roll the code, they made the compiler do a better job of generating code. These are not lazy programmers, nor are they programmers who do not know how to use assembly language; they are programmers who understand that they have a tool that is far better at generating assembly language than they are, and that they have more important things to do with their time.

    I deal with quite a bit of crypto code in my work. I have seen lots of hand-tuned assembly language, I dealt with code that took advantage of the AESNI instructions to perform very fast encryption. I am well aware that in small, highly specialized functions (like AES), humans are better able to utilize special instructions to improve performance. Those are niche cases, and the techniques used in those cases have very limited applicability (even SSE is fairly limited in its applicability, by comparison with the sort of code programmers write and maintain every day), and the techniques scale very poorly.

    --
    Palm trees and 8
  20. Re:Great potential by paskie · · Score: 4, Insightful

    One would assume that "world modification" (monad; in this case, visual I/O) does not fit the restriction "only ever accesses read-only data, or modifies local data that is unaccessible from anywhere else" and as soon as you are showing something to the user, that blocks the auto-threading.

    Then again, if you are drawing map tiles, the order you draw them in doesn't matter - so this goes to show that in some situations, there's no way around the programmer actually having to stop, think and declare whether the program's contract allows it to do some user-visible operation in parallel.

    --
    It's not the fall that kills you. It's the sudden stop at the end. -Douglas Adams
  21. Re:Great potential by Jeremi · · Score: 5, Interesting

    Compilers already try to do this for example with auto-vectorization. The problem is that they are usually quite terrible at it.

    I suspect one reason they are so bad at it is they have to be very conservative in how they optimize, due to the relaxed nature of the C language. For example, if the C optimizer cannot prove beyond a shadow of a doubt that a particular memory location isn't being aliased, it can't make any assumptions about the location's value not changing at any step of the process. In practice, that means no optimizations for you.

    Given that, it would seem that the Microsoft approach (using not only the higher-level language C#, but a specially customized version of C#) gives their optimizer much greater latitude. Because the language forces the programmer to annotate his objects with readable/writable/immutable tags (think C's "const" tag, but with teeth), and because the language (presumably) doesn't allow the programmer to do sneaky low-level tricks like casting away const or aliasing pointers or pointer math, the optimizer can safely make assumptions about the code that a C or C++ optimizer could never get away with. That may allow it to be more effective than you might anticipate (or maybe not, we'll see).

    --


    I don't care if it's 90,000 hectares. That lake was not my doing.
  22. Re:Or.. teach devs to use threading as appropriate by betterunixthanunix · · Score: 5, Insightful

    Just like quicksort(3) is far faster than bubblesort so too is a highly threadable code faster than non-threadble code

    First, just to be pedantic, I'll point out that quicksort is as bad as bubblesort in the worst case, to a constant factor (you should have picked heapsort or mergesort). That aside, it is worth noting (and I am somewhat bothered by this when it comes to TFA) that we still do not know if it is even possible to optimize any program by parallelizing it; see the NC-vs-P question:

    https://en.wikipedia.org/wiki/P-complete

    Multithreading is not a magic bullet, and in all likelihood it is not generally applicable.

    Languages do not, contrary to belief, express intent, the provide a strict set of instructions that the computer MUST respect

    Wrong on all counts. Imperative languages are a way to convey instructions to a computer; declarative languages do not convey instructions, and programming in a declarative language requires an entirely different mode of thinking (it is closer to asking a question that giving instructions). It is also not strictly necessary for the computer to do exactly what a program expresses; there has been some work on compiler optimizations that have a (tunable) chance of not maintaining soundness.

    In the end a good algorithm with no compiler help will beat optimized "dumb" code in all cases larger than "toy" (say, a few dozen "n" in Big-O notation)

    If you are convinced of this, try implementing something more complex than the algorithms you see in Knuth's books; say, this:

    http://eurocrypt2010rump.cr.yp.to/9854ad3cab48983f7c2c5a2258e27717.pdf

    Then ask yourself this: could the constant factors in your implementation be better? At the end of the day, big constant factors will hurt your performance so badly that you might as well have used an asymptotically worse algorithm; indeed, consider fast integer multiplication:

    https://en.wikipedia.org/wiki/Sch%C3%B6nhage%E2%80%93Strassen_algorithm

    10000 digits are needed before that algorithm actually outperforms the asymptotically worse Toom-Cook family of algorithms. Here is an even more extreme example:

    https://en.wikipedia.org/wiki/Coppersmith%E2%80%93Winograd_algorithm

    Sure, that's a better matrix multiplication algorithm in the asymptotic sense...but only for matrices that are so large that you cannot even store them on today's computers.

    So really, while you are right that asymptotic improvements will always beat constant factor improvements (which is what compilers are mostly going to do for you), you are wrong to ignore the importance of constant factor improvements. In the real world, constant factors matter. In the real world, quicksort and mergesort will use asymptotically worse algorithms below a certain problem size because of constant factors. In the real world, large integer multiplication is done using Karatsuba or Toom-Cook methods, not FFT methods, because of constant factors. In the real world, if you are not using a compiler to optimize your code, your code is going to be slower than it needs to be, even if you spent hours hand-tuning it, unless you are only dealing with toy problems.

    --
    Palm trees and 8
  23. Re:Or.. teach devs to use threading as appropriate by betterunixthanunix · · Score: 4, Informative

    How frequently do you spend time thinking about which registers are being used to store partial results in the programs you write? When was the last time you spent time thinking about which registers are being used by which variables, or when it would be a good time to store register contents in memory?

    What about memory alignment and packing? Calling conventions? Near pointers and far pointers (OK maybe that one is a bit dated)? Most programmers do not think about these problems, because most programmers are trying to solve bigger problems and need to spend time thinking about the bigger picture. Getting mired down in low level details makes programmers less productive -- that is why we use programming languages.

    So if we can have compilers that can automatically utilize multithreading, even if it is in a somewhat limited context, we should be happy. If you are happy to not be spending time thinking about padding data or allocating registers, you should be happy about automatic multithreading.

    --
    Palm trees and 8
  24. Re:Fast First Post by aNonnyMouseCowered · · Score: 5, Interesting

    "MS R&D is the largest computer tech R&D in the world. Combine IBM, Intel, and AMD, and you get an idea of their size."

    Citation need. Not disputing the first part. Just the second part about the relative size of Miscorsoft Research.

  25. The real law in play is Amdahl's by 14erCleaner · · Score: 5, Informative
    From skimming their paper, it doesn't appear that they get any real speedup from their parallelism. This is apparent because they state, in the part about the millions of lines of code written in their language, that

    These and other applications written in the source language are performance-competitive with established implementations on standard benchmarks

    Translation: we didn't speed them up any, or at least not by enough that we care to share any number.

    Amdahl's Law is difficult to overcome in auto-parallelising systems that rely on anything other than loop optimizations. Basically, in straight-line code, if you make 50% of the code run on multiple cores, you're only going to get at most a 2x improvement in speed. In practice, you won't even get that much (except in loops) due to added overhead. Bottom line: you can write papers about your wonderful auto-parallelizing technique, but when the rubber hits the road this is unlikely to lead to much performance improvement.

    --
    Have you read my blog lately?
  26. Re:Fast First Post by happymellon · · Score: 5, Funny

    Of those, IBM have around 439,999 project managers.

  27. Re:Fast First Post by TheRaven64 · · Score: 4, Informative

    Nobel Prizes? None. Because there is no Nobel Prize for computer science. The closest equivalent is a Turing Award. I haven't checked the whole list, however the 2009 winner works at Microsoft Research (and has since 1997), the 1998 winner is now dead, but was working at Microsoft Research between 1995 and his death. You can check the list yourself for IBM.

    --
    I am TheRaven on Soylent News