Slashdot Mirror


Auto-threading Compiler Could Restore Moore's Law Gains

New submitter Nemo the Magnificent writes "Develop in the Cloud has news about what might be a breakthrough out of Microsoft Research. A team there wrote a paper (PDF), now accepted for publication at OOPSLA, that describes how to teach a compiler to auto-thread a program that was written single-threaded in a conventional language like C#. This is the holy grail to take advantage of multiple cores — to get Moore's Law improvements back on track, after they essentially ran aground in the last decade. (Functional programming, the other great hope, just isn't happening.) About 2004 was when Intel et al. ran into a wall and started packing multiple cores into chips instead of cranking the clock speed. The Microsoft team modified a C# compiler to use the new technique, and claim a 'large project at Microsoft' have written 'several million lines of code' testing out the resulting 'safe parallelism.'" The paper is a good read if you're into compilers and functional programming. The key to operation is adding permissions to reference types allowing you to declare normal references, read-only references to mutable objects, references to globally immutable objects, and references to isolated clusters of objects. With that information, the compiler is able to prove that chunks of code can safely be run in parallel. Unlike many other approaches, it doesn't require that your program be purely functional either.

404 comments

  1. ok now by Anonymous Coward · · Score: 0

    now that this has been described, programmers make it so!

  2. Not this shit again by Anonymous Coward · · Score: 4, Informative

    Moore's law has nothing to do with performance, numbnuts. Apparently the real geeks left Slashdot ages ago.

    1. Re:Not this shit again by xetovss · · Score: 4, Interesting

      Exactly, Moore's Law isn't a law, it is a marketing plan. I don't see why so many people get so serious about it. A real law (of science) would be something like the law of gravity where it has a near universal application, whereas Moore's Law is a "law" that describes Intel's marketing plan.

    2. Re:Not this shit again by Baloroth · · Score: 4, Informative

      Moore's law has nothing to do with performance, numbnuts. Apparently the real geeks left Slashdot ages ago.

      Yes and no. Moore's law states that the number of transistors will double every 2 years. The problem is that we are nearing the peak of what is possible with current technology in a single core, hence all the focus on 4,6,8, and even 16 core systems for consumers (always been popular in supercomputers and the like). That means doubling transistor count every 2 years can be done through increasing cores... but there is no point in doing that if programs can only use a few of them (very few consumers now need 4 cores, much less 8 or 16).

      So, if you can scale up the ability to use processor cores, then Moore's law can continue to hold for CPU's as we increase processor cores. If you can't, then it won't. It'll have to stop sometime, of course, just not necessarily today.

      --
      "None can love freedom heartily, but good men; the rest love not freedom, but license." --John Milton
    3. Re:Not this shit again by Jonner · · Score: 4, Insightful

      Moore's law has nothing to do with performance, numbnuts. Apparently the real geeks left Slashdot ages ago.

      Try reading the headline again. Usually, clueless posters have to be reminded to read TFA, but this is ridiculous. As Moore's "law" continues to be mostly true, the added transistors are being used for extra more cores rather than to make one core faster. Most of the cores sit idle most of the time because few programs can use more than one of them at once.

    4. Re:Not this shit again by Anonymous Coward · · Score: 0

      For chrissake... relax. We refer to Moore's law, in almost every case, in the context of chip performance, not actual transistor count.

      Moore's law is the observation that over the history of computing hardware, the number of transistors on integrated circuits doubles approximately every two years. The period often quoted as "18 months" is due to Intel executive David House, who predicted that period for a doubling in chip performance (being a combination of the effect of more transistors and their being faster).[1]

      Obivously we hit the wall on that, and the summary suggests the net result of this auto-threading would put "work done" back on track with predictions of Moore's law. Pull the stick out.

    5. Re:Not this shit again by c0lo · · Score: 5, Funny

      Apparently the real geeks left Slashdot ages ago.

      Casted to void?

      --
      Questions raise, answers kill. Raise questions to stay alive.
    6. Re:Not this shit again by Spiridios · · Score: 5, Insightful

      Maybe you need to read?

      Here's the headline (emphasis mine): Auto-threading Compiler Could Restore Moore's Law Gains

      In the past, adding more transistors to a core meant the core worked faster (simplification), so Moore's law indirectly lead to better performance. Now a core is close to as fast/efficient as its going to get, so we throw the extra transistors at new cores (still Moore's law). The problem is, there's only so many single-threaded processes a person can run before there will literally be one core per process. In order to see benefits from the extra transistors again, "gains" in the summary's terminology, then we need a way to take advantage of it (this new compiler technique, functional programming, programmers who can actually grok threads). In the end, if we're not seeing some kind of gain from Moore's law, then the chip manufacturers will stop adding new transistors because no one will pay money for a chip that's just as good as their current chip, and Moore's law will fail.

    7. Re:Not this shit again by Anonymous Coward · · Score: 2, Informative

      Moore's law has nothing to do with performance, numbnuts. Apparently the real geeks left Slashdot ages ago.

      Yes and no. Moore's law states that the number of transistors will double every 2 years.

      Why do people always fail to mention the part about complexity for minimum cost? It's an economy-of-scale observation. The cost part is essential to what Moore was saying.

      "Number of components per integrated circuit for minimum cost" will double every 1-2 years.

    8. Re:Not this shit again by Anonymous Coward · · Score: 0

      > ... but there is no point in doing that if programs can only use a few of them (very few consumers now need 4 cores, much less 8 or 16).
      Right, because editing video while video conferencing and/or burning an edit sample to dvd at the same time over different usb busses doesn't really require all that much cpu power.
      Nor would it be anything that a "consumer" would do right?

    9. Re:Not this shit again by ceoyoyo · · Score: 5, Interesting

      Moore's law was coined by an engineer to describe a series of observations. That is, it's a mathematical function that seems to fit some data, without any explanatory power. Just like various other "laws" such as the laws of thermodynamics, and, your favourite, Newton's laws, including his law of universal gravitation.

      Moore's law states that the number of components on an integrated circuit doubles approximately every two years.

    10. Re:Not this shit again by shmageggy · · Score: 4, Insightful

      Yeah but "Moore's Heuristic" just aint as snappy.

    11. Re:Not this shit again by Anonymous Coward · · Score: 0

      Right, because editing video while video conferencing and/or burning an edit sample to dvd at the same time over different usb busses doesn't really require all that much cpu power.

      Umm.... Right?

    12. Re:Not this shit again by Anonymous Coward · · Score: 0

      Indeed, it sums up your arguments well.

    13. Re:Not this shit again by Anonymous Coward · · Score: 0

      Actually part of the problem is probably the dinosaur of x86. Frequency isn't king. Remember Celeron. If more operations or larger operations could be completed in a cycle than performance would grow without increasing frequency since shrink would allow the extra transistors to accomplish this. I think Moore's Law is bollocks anyways cause it wasn't meant to foretell anything it was just noting a trend way back when.

    14. Re:Not this shit again by tlhIngan · · Score: 4, Informative

      Yes and no. Moore's law states that the number of transistors will double every 2 years. The problem is that we are nearing the peak of what is possible with current technology in a single core, hence all the focus on 4,6,8, and even 16 core systems for consumers (always been popular in supercomputers and the like). That means doubling transistor count every 2 years can be done through increasing cores... but there is no point in doing that if programs can only use a few of them (very few consumers now need 4 cores, much less 8 or 16).

      Except... the number of transistors in a CPU is irrelevant!

      A CPU doesn't have the transistor density that really benefits much from Moore's Law - because the vast majority of the space on a chip is not taken up by transistors, but by wiring. In fact, the wiring density is what's limiting transistor density (a good thing - larger transistors can give you better performance because they can drive the longer wires quicker).

      Most of the transistors used in a CPU actually goes towards the cache - when you're talking about 16+ MB of pure L1/L2/L3 cache, implemented as 6T SRAM cells, that's 100M transistors right there (and that doesn't include the cache line tag logic and CAM).

      The thing with the highest transistor density (and thus the most benefit of Moore's Law) is actually memory structures - caches, DRAM, SRAM, flash memory, etc. This is where each transistor is vital to memory storage and packing them in close means more storage is available, in which case Moore's law states that RAM etc. will double in capacity or halve in cost every 18 months or so.

      Smaller transistors do help CPUs consume a little less power, but double the number of transistors doesn't do a whole lot because there's a lot of empty space that the wiring forces to be transistor-free. (Non-memory parts of the CPU are effectively "random logic" where there's no rhyme or reason to the wiring). It's why the caches have the most transistors yet take the smallest areas.

    15. Re:Not this shit again by TheLink · · Score: 4, Interesting

      The problem is that we are nearing the peak of what is possible with current technology in a single core

      But aren't there still plenty of things that the hardware can do to make the software stuff easier?

      Intel has actually started adding some stuff to help: http://en.wikipedia.org/wiki/Transactional_Synchronization_Extensions

      So maybe Intel, AMD should interact more with the OS and compiler bunch and figure out how to use those billions of transistors in more useful ways.

      There are things you can do in software to address the c10k problem (and similar problems): http://www.kegel.com/c10k.html
      But I'm sure there are things you can do in hardware to make stuff even easier. It's not like the stuff that's being done is the best way of doing things. Furthermore the OS and apps people might be writing things in certain ways because certain things aren't done by the hardware or done fast.

      I know that at least on the x86 platform checking the current time and also getting monotonic time is not as simple AND efficient as it could be. It was even worse before HPET, and now even HPET isn't that great. Monotonic system time (ticks) could be different from current human time, many programs need one or both.

      Same goes for scheduling stuff, serializing stuff and getting stuff to trigger on arbitrary things all with minimal overhead. I suspect many things would be easier if you can create a way of having cluster wide consistent monotonic high-res time, and also fast low latency clusterwide locking.

      On a related note many programming languages seem to like to push parameters onto a stack. That's fine but in many implementations they seem to push the data onto the code/call stack (which holds return addresses). This mixing of data and code stuff is unsafe. They should have separate data and code stacks. That way even if a hacker messes with the data, it's harder to execute arbitrary code- you'd just execute the same code but with mangled data, which is more likely to produce errors instead of arbitrary code execution.

      If the reason for doing such insecure stuff is performance or convenience, then perhaps Intel etc can use those transistors to make it faster and easier to do things safer.

      --
    16. Re:Not this shit again by madprof · · Score: 1

      This is informative in more ways than one. I wish every single post on Slashdot was this good.

    17. Re:Not this shit again by Anonymous Coward · · Score: 0

      i wonder why so many "so-called" geeks fail to point that the relevant law here is Amdahl's law and not moore's law.

    18. Re:Not this shit again by Anonymous Coward · · Score: 0

      No: >null

    19. Re:Not this shit again by Baranovich · · Score: 1

      That means doubling transistor count every 2 years can be done through increasing cores....

      That makes no sense at all. You've got it backwards. Moore's Law concerns the doubling of transistor counts roughly every two years. How those additional transistors are used to increase computing performance, whether for additional cores or cache or whatever, is entirely separate.

      --
      Philosophy is questions that may never be answered, religion is answers that may never be questioned.
    20. Re:Not this shit again by fgouget · · Score: 1

      Most of the transistors used in a CPU actually goes towards the cache - when you're talking about 16+ MB of pure L1/L2/L3 cache, implemented as 6T SRAM cells, that's 100M transistors right there (and that doesn't include the cache line tag logic and CAM).

      More than you think: 16 MB * 1048576 bytes/MB * 8 bits/byte * 6 transistors/bit = 805 million transistors

    21. Re:Not this shit again by chrysrobyn · · Score: 5, Interesting

      Except... the number of transistors in a CPU is irrelevant!

      No, it's very relevant.

      A CPU doesn't have the transistor density that really benefits much from Moore's Law - because the vast majority of the space on a chip is not taken up by transistors, but by wiring. In fact, the wiring density is what's limiting transistor density (a good thing - larger transistors can give you better performance because they can drive the longer wires quicker).

      How much wiring happens on doped silicon? None. The vast majority of the chip is covered in transistors, with 6-10 levels of wires on top of them. There are some designs where the I/O count demands so many pins that's what dictates the size of the chip -- so cache is filled in underneath. Heck, if your power budget allows it, you're already blowing the silicon area anyway, might as well increase your cache size! Consider your recent Core derived designs. Take away half the cache. Do you think the die area would go down? Not hardly.

      Most of the transistors used in a CPU actually goes towards the cache - when you're talking about 16+ MB of pure L1/L2/L3 cache, implemented as 6T SRAM cells, that's 100M transistors right there (and that doesn't include the cache line tag logic and CAM).

      You did the math right, but the cache line tag logic and coupled CAM are negligible. Sure, they may add a few million or so, but not anywhere near 5% of 100M.

      The thing with the highest transistor density (and thus the most benefit of Moore's Law) is actually memory structures - caches, DRAM, SRAM, flash memory, etc. This is where each transistor is vital to memory storage and packing them in close means more storage is available, in which case Moore's law states that RAM etc. will double in capacity or halve in cost every 18 months or so.

      I realize it's vogue for people to revisit Moore's Law and rewrite it every few years, but he was not speaking specifically about memory arrays. In fact, the chips Moore had access to at the time had very little memory on them.

      Smaller transistors do help CPUs consume a little less power, but double the number of transistors doesn't do a whole lot because there's a lot of empty space that the wiring forces to be transistor-free. (Non-memory parts of the CPU are effectively "random logic" where there's no rhyme or reason to the wiring). It's why the caches have the most transistors yet take the smallest areas.

      Wiring never forces silicon area to be transistor-free, unless you're thinking of 1980 era chips. Not even late '80s had wiring on doped silicon. Certainly the kinds of chips Moore was talking about has had no significant wiring on doped silicon in 20 years, the exceptions being only when layout designers are getting lazy. I've done layout design, I've done circuit design, I've audited dozens of chip layouts and seen several technology manuals dating back to the 90s.

      That random logic, by the way, is the subject of the most innovation in the field of chip layout and arguably in all of chip design. When your chip's entire goal is to funnel data through different units and do different things to it, you're dominated by buses. Automated tools often do split these buses up, but different algorithms can pull them together and make them more efficient. Caches are the smallest because they can be small. There's an entire periphery to them, including senseamps devoted to reading the baby FETs that can't make full rail to rail swings on the bitlines.

      May I guess you're a student? Perhaps one who is learning from a professor who hasn't been in the industry since about 1985?

    22. Re:Not this shit again by HaZardman27 · · Score: 3, Funny

      In fact, the chips Moore had access to at the time had very little memory on them

      Well of course! That was a lot of 18 monthses ago!

      --
      Apparently wizard is not a legitimate career path, so I chose programmer instead.
    23. Re:Not this shit again by Mythran · · Score: 1

      You did the math right, but the cache line tag logic and coupled CAM are negligible. Sure, they may add a few million or so, but not anywhere near 5% of 100M.

      Umm, "a few million" is around 4 to 5 million...which can equate to "about 5% of 100 million"...so yeah, I take that to mean somewhere near and potentially exceeding 5% of 100 million. Just to be anally retentive.

    24. Re:Not this shit again by Mythran · · Score: 1

      Depending on your definition, usage, and/or context of "a few"...which could also mean "more than 2, less than 5"...or since we are talking about "a few million" out of 100 million, this could also go upwards of 10-15 million... :P

    25. Re:Not this shit again by Anonymous Coward · · Score: 0

      Man. Times like this, I wish the "Interesting" mod could go higher than 5.

    26. Re:Not this shit again by Anonymous Coward · · Score: 0

      The number of people predicting the end to Moore's Law doubles every two years.

    27. Re:Not this shit again by Anonymous Coward · · Score: 0

      Per dollar. Don't forget that bit.

    28. Re:Not this shit again by TeknoHog · · Score: 1

      In physics, a law is a very strong statement, it's not enough it if fits some data. For example, we have the law of conservation of energy, but the general theory of relativity. We know that the latter is not absolutely true at all levels of scale, but we believe that the former will hold nevertheless. That said, even a theory needs a pretty good track record, it's very far from a hypothesis.

      In the past, some observations have been unfortunately named as laws, so we have things like power laws in statistical linguistics. It would be great to separate legal opinions of judges and juries from something that happens independent of puny humans, but this is the language we're stuck with for now. Incidentally, Newton's laws aren't that bad, you just have to be careful with the frame of reference and the definition of momentum (It's F = dp/dt, not just F = ma). Just like the law of conservation of energy holds after slight refinements to our idea of energy.

      (I am a physicist, but it's been a while. I am not ANAL.)

      --
      Escher was the first MC and Giger invented the HR department.
    29. Re:Not this shit again by ceoyoyo · · Score: 2

      We also have the law of conservation of mass. Oh yeah, we had a nice demonstration of breaking that one in 1945. Then there's Kepler's laws of planetary motion: the first and second are pretty close, but not quite true (even if you're not looking at Mercury) and the third was only a statement of proportionality until it was refined later.

      Notice that all the physics "laws" originate before the 20th century. They were exactly as I described: statements, preferably mathematical, that fit the data of the time but didn't offer the explanatory power that most modern theories do. Some of them have been proven to be untrue. Some of them were approximate. Some of them were later refined.

      Actually, conservation of energy is about the only one that immediately comes to mind that is both exact and we still believe is universally true. I guess maybe you could put Newton's first and third laws in that category, but as you yourself point out, Newton's second law isn't really universally true unless you change it.

    30. Re:Not this shit again by Jonner · · Score: 1

      Thank you Spiridios for expanding on my post. I thought it was quite clear that the gains referred to result from Moore's "law" rather than being the "law" itself.

    31. Re:Not this shit again by Jonner · · Score: 1

      Actually part of the problem is probably the dinosaur of x86. Frequency isn't king. Remember Celeron. If more operations or larger operations could be completed in a cycle than performance would grow without increasing frequency since shrink would allow the extra transistors to accomplish this. I think Moore's Law is bollocks anyways cause it wasn't meant to foretell anything it was just noting a trend way back when.

      To be able to take advantage of the additional transistors provided by Moore's "law," either clock speed must increase or more work must be accomplished in a single clock cycle. It is getting very difficult to increase either of those within a single core. Therefore, the gains must come from additional cores.

    32. Re:Not this shit again by highphilosopher · · Score: 1

      Code stack, and a data stack? I can't decide. You either smell of pot, or stink of scripting languages.

      Ever heard of the stack and the heap?

    33. Re:Not this shit again by davesag · · Score: 1

      I've always assumed that the word "law" in "Moore's Law" was being used ironically, or at least sarcastically.

      --
      I used to have a better sig than this, but I got tired of it
    34. Re:Not this shit again by Anonymous Coward · · Score: 0

      gravity is a theory, not a law.

      electromagnetism is what you were looking for maybe.

    35. Re:Not this shit again by Anonymous Coward · · Score: 0

      Moore's law has nothing to do with performance, numbnuts. Apparently the real geeks left Slashdot ages ago.

      Then why are you here?

  3. Great potential by Anonymous Coward · · Score: 1

    There is great potential here if this sort of thing can be made to work cleanly and safely. I suspect a lot of programmers (myself included) think better in a single-threaded manner. People are mostly used to dealing with A B C D and, as such, I think we usually write code to execute steps the same way. Having the compiler identify things which could run in parallel and thread work to run as A B&C D would be a huge step forward as long as it doesn't introduce bugs.

    1. Re:Great potential by MightyMartian · · Score: 3, Interesting

      I'm wondering how this works. Does it scan for loops to remake them as event driven processes? Does it splice off multiple function calls and then throw the results into some sort of stack for retrieval? It's a cool idea, but man, for any kind of non-trivial program it must be some monumental piece of work.

      --
      The world's burning. Moped Jesus spotted on I50. Details at 11.
    2. Re:Great potential by Desler · · Score: 5, Informative

      Having the compiler identify things which could run in parallel and thread work to run as A B&C D would be a huge step forward as long as it doesn't introduce bugs.

      Compilers already try to do this for example with auto-vectorization. The problem is that they are usually quite terrible at it. Even Intel's compiler which is probably the best at it usually misses out on most of the obvious places that should be vectorized. This is why pretty much all code dealing with multimedia content (audio/video/image codecs, games, etc.) still rely on tons of had written SIMD to be optimized to their fullest.

    3. Re:Great potential by Desler · · Score: 2

      And it must be hell trying to debug without digging deeply into the generated code.

    4. Re:Great potential by Anonymous Coward · · Score: 2, Interesting

      Any code that doesn't access the same resources can be run simultaneously. Alternately, if no thread is writing to a resource, it can be shared without locking.

      Those two observations allow you to determine when code can be run together in different threads. If a method only uses 'final' or 'const' things, then you can run it. If your method only uses one isolated cluster of objects, then it can run simultaneously with another method that only uses a different isolated cluster of objects.

      Honestly, any programmer worth $120k could make something more efficient just using threads, if they thought about it. This will get you 20% of the efficiency gains without thinking of it.

    5. Re:Great potential by allcoolnameswheretak · · Score: 5, Insightful

      Imagine you have a for-loop that calls the method 'hop' on every object 'bunny' in the list:

      for every bunny in list {
              bunny.hop()
      }

      This is a simple, sequential process - bunny.hop() is called in order, for every bunny in the list, one after the other.
      Now suppose you have defined the data in your program in such a way that the compiler knows that method 'bunny.hop()' only ever accesses read-only data, or modifies local data that is unaccessible from anywhere else. The compiler now knows that the order of execution of the bunny hops doesn't really matter, as every call of bunny.hop() is independent from anything else. This frees the compiler to spawn threads or processes to call as many bunny.hop()'s as he likes at the same time, thereby processing through the list alot faster.

      Another method, bunny.eat() actually performs write access on a shared object 'carrot' when called. If two bunnies eat the same carrot, the compiler can not perform automatic parallelization, as running two bunny.eat() methods could lead to invalid state (only one piece of carrot remaining, two bunnies eating 1 piece of carrot at the same time, results in -1 pieces of carrot). In this case, the compiler will take care to run two bunnies eating away at the same carrot sequentially. However if there are 2 bunnies eating the green carrot and another 2 bunnies eating the yellow carrot, these are again independent from each other and can again be paralleled.

      The requirement to make this possible is to provide the compiler with information on what kind of data something is - is it an isolated hopping? Or a shared carrot? or a global Bunnygod that affects all bunnies?

    6. Re:Great potential by Boawk · · Score: 5, Funny

      If two bunnies eat the same carrot...the compiler will take care to run two bunnies eating away at the same carrot sequentially. However if there are 2 bunnies eating the green carrot and another 2 bunnies eating the yellow carrot

      Am I the only one who is totally horny after reading that?

    7. Re:Great potential by Culture20 · · Score: 2, Interesting

      Imagine you have a for-loop that calls the method 'hop' on every object 'bunny' in the list:

      for every bunny in list {
      bunny.hop()
      }

      And if the original intent is to make the bunnies hop in sequence like performing "the wave", making a new thread for each hop might produce a visual effect of them all hopping at once if the sleep occurred in the hop() function instead of the for loop calling the hops.

    8. Re:Great potential by Kjella · · Score: 2

      There is great potential here if this sort of thing can be made to work cleanly and safely. I suspect a lot of programmers (myself included) think better in a single-threaded manner. People are mostly used to dealing with A B C D and, as such, I think we usually write code to execute steps the same way.

      If you've ever done a dish with more than one casserole or making a side dish while the main dish is cooking you've done parallelization in real life, the problem is that expressing it in a computer language is hard. Functional programming works okay if you're doing lots of similar things at the same time, but not if you're doing lots of different - but parallelizable - tasks. I want to call oven.roast( meat ), stove.boil( potatoes ) and workbench.slice( salad ) but I don't particularly care about the order and I know they're side effect free relative to each other, but how do I express that? Either I write them in sequence ABC and they happen in sequence ABC or I have to write threads and schedule them which is honestly overkill - I want it to use the resources efficiently not micromanage the schedule myself.

      Particularly if these come in random and unpredictable amounts like the orders at a restaurant I need the computer to work it out itself. It's also very important if you have small resource conflicts like the hot casserole and hot pot roast both requiring mittens to move. I know quite a lot about mutexes and locks and semaphores to make it explicit but now it feels like I'm explicitly describing the opportunities for parallelism not the dependencies preventing it. Imagine if you were a project manager, do you start with every task depending on every other task? No, you start with the tasks then make dependencies where they're needed.

      I wonder if you could do something more with "scoped const-ness", for lack of a better term. That is to say a function is non-const in a certain scope and const outside it, and thus can be executed in parallel with anything else outside that scope. To continue the example above, let's say I could define a function oven.raiseTemperature() that's non-const in the oven scope and const in the global scope and a function stove.stirPot() that is the same for the stove. If I then call them the compiler would know it's free to reorder them or run them in parallel, it picks the optimal execution based on the hardware available or even dynamically at run time. Functions that depend on multiple objects would automatically become fences to "sync up", though not necessarily on a global scale. Like a function on both the oven and stove may sync the kitchen, but not the living room.

      It might not give you massive parallelization but it'd at least let you parallelize many mixed tasks that I don't feel you manage to do easily in any language today. It certainly sounds like a safer way than starting off a bunch of worker threads and praying you didn't create any deadlocks or live locks or race conditions or whatnot. Much like the global const the compiler should be able to validate that it doesn't in fact have any side effects outside the scope it claims. I'm sure there's lots of reasons why this wouldn't be as useful as I imagine it would be, but it certainly seems easier for those of us used to imperative programming.

      --
      Live today, because you never know what tomorrow brings
    9. Re:Great potential by Your.Master · · Score: 4, Insightful

      If it has a visual effect, then it's not read-only.

    10. Re:Great potential by paskie · · Score: 4, Insightful

      One would assume that "world modification" (monad; in this case, visual I/O) does not fit the restriction "only ever accesses read-only data, or modifies local data that is unaccessible from anywhere else" and as soon as you are showing something to the user, that blocks the auto-threading.

      Then again, if you are drawing map tiles, the order you draw them in doesn't matter - so this goes to show that in some situations, there's no way around the programmer actually having to stop, think and declare whether the program's contract allows it to do some user-visible operation in parallel.

      --
      It's not the fall that kills you. It's the sudden stop at the end. -Douglas Adams
    11. Re:Great potential by Jeremi · · Score: 5, Interesting

      Compilers already try to do this for example with auto-vectorization. The problem is that they are usually quite terrible at it.

      I suspect one reason they are so bad at it is they have to be very conservative in how they optimize, due to the relaxed nature of the C language. For example, if the C optimizer cannot prove beyond a shadow of a doubt that a particular memory location isn't being aliased, it can't make any assumptions about the location's value not changing at any step of the process. In practice, that means no optimizations for you.

      Given that, it would seem that the Microsoft approach (using not only the higher-level language C#, but a specially customized version of C#) gives their optimizer much greater latitude. Because the language forces the programmer to annotate his objects with readable/writable/immutable tags (think C's "const" tag, but with teeth), and because the language (presumably) doesn't allow the programmer to do sneaky low-level tricks like casting away const or aliasing pointers or pointer math, the optimizer can safely make assumptions about the code that a C or C++ optimizer could never get away with. That may allow it to be more effective than you might anticipate (or maybe not, we'll see).

      --


      I don't care if it's 90,000 hectares. That lake was not my doing.
    12. Re:Great potential by Anonymous Coward · · Score: 0

      No, you are not. I wish it was implemented in GCC.

    13. Re:Great potential by elfprince13 · · Score: 3, Insightful

      One problem, and the great attraction of this way of doing things, is that most software companies don't have too many programmers worth $120k. Lots of $50k-$80k, but not so much on the higher end of the pay scale until you get into research divisions at big companies like Microsoft or Google or IBM.

    14. Re:Great potential by Anonymous Coward · · Score: 0

      Did you ever think to read the paper that's linked in the article? It goes over in some depth how it works.
      http://research.microsoft.com/pubs/170528/msr-tr-2012-79.pdf

    15. Re:Great potential by Anonymous Coward · · Score: 0

      If two bunnies eat the same carrot...the compiler will take care to run two bunnies eating away at the same carrot sequentially. However if there are 2 bunnies eating the green carrot and another 2 bunnies eating the yellow carrot

      Am I the only one who is totally horny after reading that?

      Apparently. It's a pre-existing condition for the rest...

    16. Re:Great potential by unrtst · · Score: 2

      What you describe sounds a lot like "make".

      $ cat Makefile
      dinner: roast_meat bake lasagna boil_potatoes chop_salad
              turn_off stove oven
              wipe table
      clean_table:
              wipe table
      prepare_meat: clean_table
              cut meat
              salt meat
      set_meat_temp:
              # make it fast
              chtemp 800
      roast_meat: prepare_meat set_temp_meat
              ovenize meat
              sleep 900
              put_on_table meat
      prepare_lasagna: clean_table chop_salad prepare_meat
              remove_box lasagna
      set_lasagna_temp: roast_meat
              chtemp 500
      bake_lasagna: prepare_lasagna set_lasagna_temp roast_meat
              ovenize_lasagna
              sleep 3600
              put_on_table lasagna
      prepare_potatoes:
              wash potatoes
      boil_potatoes: prepare_potatoes turn_on_burner1
              put_on_burner1 pot water potatoes
              sleep 1800
              put_on_table potatoes
      chop_salad: clean_table prepare_meat prepare_potatoes
              put_on_table salad

      $ make dinner

    17. Re:Great potential by Anonymous Coward · · Score: 0

      Often the problem is that obvious opportunities for auto-vectorization (and other optimizations) aren't provably safe from a static perspective without a well-designed type system. Hence research on new type systems. In addition to permission-based type systems, researchers are considering optimization techniques in functional languages (see Manticore and ConcurrentML.)

    18. Re:Great potential by DragonWriter · · Score: 1

      Any code that doesn't access the same resources can be run simultaneously.

      Any code that doesn't access the same memory location can be run simultaneously, as can any code that does access the same memory location as long as that access is all read-only. But if more than one bit of code accesses any external resources (even different external resources), you can't parallelize (or do anything that might reorder) them without potentially changing the results.

    19. Re:Great potential by ignavus · · Score: 2

      Have you ever considered writing a text book on programming?

      --
      I am anarch of all I survey.
    20. Re:Great potential by Anonymous Coward · · Score: 0

      Carrots are ORANGE, numbnuts!

    21. Re:Great potential by phantomfive · · Score: 1

      That may allow it to be more effective than you might anticipate (or maybe not, we'll see).

      My guess is no, and here's why: the vast majority of cases where parallelism is actually useful is when you have to process a lot of data in an array, or some kind of sequence. You can't declare the whole array const (or threadsafe), because you want to have multiple threads working on the same array. So there's no good way to automatically have multiple threads working on the same project using this technique.

      Or consider another typical problem where threading gives you a good bonus: video encoding. A naive approach would be to load one frame, encode it, load the next frame, encode it, etc. How will the compiler know that it should optimize that loop and stick multiple worker threads on that task at once? I'm not sure having 'const' will help you with that.

      --
      "First they came for the slanderers and i said nothing."
    22. Re:Great potential by toby34a · · Score: 1

      But even make will do things sequentially. True parallelization would be like running make in several different terminal windows at the same time. I encounter this issue a lot with my scientific data analysis. The code that I write could be easily parallelized, if there existed the framework to do so. I repeat the same command for every hour of every data set. Writing this sequentially makes sense, works, and takes the least amount of effort to get the goal accomplished. But if I could leverage this into 12 instances of assigning a core to do each task (such as doing a daily total from 5 minute data over a year), I could truly speed up my productivity without much effort into writing separate applications to handle each month (which is possible as well, but can be a pain in the ass to actually operate, and then you have to compile it 12 times, and when you tweak your program have to recompile 12 times, etc.)

    23. Re:Great potential by ceoyoyo · · Score: 2

      I know nobody knows how to write low level code anymore but displaying something involves writing to memory.

    24. Re:Great potential by flyingfsck · · Score: 1

      We've been doing similar things for decades with Gate Array programming languages. Debugging is hard, but not impossible.

      --
      Excuse me, but please get off my Pennisetum Clandestinum, eh!
    25. Re:Great potential by Anonymous Coward · · Score: 0

      However, if you have something like an array that you want to process, if you're still in C# you can just use Parallel LINQ....

    26. Re:Great potential by Jeremi · · Score: 1

      You can't declare the whole array const (or threadsafe), because you want to have multiple threads working on the same array.

      True, but what the compiler can do is figure out which writable objects in the array will be accessed from each thread, and as long as each thread only reads/writes its own subset of the array, all the threads can happily work in parallel.

      According to the whitepaper, their compiler already does this sort of analysis (keeping track of the set of objects accessible by each thread, given the thread's starting set of references) for individual objects, and while it doesn't currently partition arrays automatically, they are working in that direction:

      "A common focus for safe data-parallelism systems is handling of arrays. The implementation currently does not support arrays directly, but via trusted library abstractions for safe data parallelism. We are currently designing a notion of a sub-array, using a combination of isolation and immutability to allow safe array partitioning for in-place updates, as well as functional data-parallelism. This part of the design is still evolving."

      --


      I don't care if it's 90,000 hectares. That lake was not my doing.
    27. Re:Great potential by phantomfive · · Score: 1

      Hmmmm, looks like they are just adding a bunch of language extensions. What do they say, "Every language eventually evolves in complexity until it's a subset of APL."

      --
      "First they came for the slanderers and i said nothing."
    28. Re:Great potential by commodore73 · · Score: 1

      Who are you calling a monad?

    29. Re:Great potential by rjstanford · · Score: 0

      Where it would get really interesting is with advanced compiler inspections. For example, let's say that you modified data within the loop, but only through atomic writes:

      i = 0;
      for every bunny in list {
                      ++i;
      }

      Silly example, I know, but in this particular case it still doesn't matter if the loop is parellized. ( i = i + 1 , however, would be a disaster unless it was initially optimized into an increment operation).

      Fun times!

      --
      You're special forces then? That's great! I just love your olympics!
    30. Re:Great potential by Jmc23 · · Score: 1
      (with-parallel-tasks

      ((task-1 ....) (task-2 ....) ..task-n ....)))

      Making macros in Lisp to deal with setting up parallel tasks isn't too difficult. Maybe even use the map syntax:

      (map-pl #'task-command (list dataset-1 dataset-2 ...dataset-n) :key 'hourly)

      --
      Don't complain about syntax, grammar, or spelling. There is no.hell like input on android.
    31. Re:Great potential by ExecutorElassus · · Score: 1

      Every computing/programming abstraction is improved by reference to a bunny metaphor. This is true facts.

    32. Re:Great potential by Alex+Belits · · Score: 1, Funny

      Honestly, any programmer worth $120k could make something more efficient just using threads, if they thought about it. This will get you 20% of the efficiency gains without thinking of it.

      THIS IS WHAT WINDOWS PROGRAMMERS ACTUALLY BELIEVE.

      --
      Contrary to the popular belief, there indeed is no God.
    33. Re:Great potential by phantomfive · · Score: 1

      Oh, come to Silicon Valley, where $100k-$120k is the mid range.

      --
      "First they came for the slanderers and i said nothing."
    34. Re:Great potential by hvdh · · Score: 1

      i++ or ++i or i = i + 1 doesn't generate an atomic operation. You need some intrinsic like Increment(&i) to get that.

    35. Re:Great potential by dutchd00d · · Score: 1

      $ make dinner

      That should be "make -j dinner", obviously.

    36. Re:Great potential by slashdotjunker · · Score: 1

      You will also need to tell the compiler that every element of list is a unique object.

      A = bunny()
      list = [A, A]
      for bunny in list:
          bunny.hop()

    37. Re:Great potential by Anonymous Coward · · Score: 0

      Yes, but in that case iterate through the list instead of using a construct that just happens to iterate in a way you want. I imagine it's defined in standards, but in "real life" speak, saying "for every x in xs" doesn't imply an order.

    38. Re:Great potential by Anonymous Coward · · Score: 0

      Now suppose you have defined the data in your program in such a way that the compiler knows that method 'bunny.hop()' only ever accesses read-only data, or modifies local data that is unaccessible from anywhere else. The compiler now knows ... to optimize it to bunny.nop().

      FTFY

      Maybe you didn't really mean "unaccessible from anywhere else"... maybe you confuse "local" and "private"?

      Words have meanings, please use them.

    39. Re:Great potential by martin_dk · · Score: 1

      Since the compiled program is proved to be thread safe debugging could be done like this:

      • Turn off parallel compiling and debug as usual until everything works fine
      • Turn on parallel compiling and track down your faulty permissions to referencetypes

      Of course the magic is to know which references and variables should be marked local, clustered or shared. But no need to ever check out the generated code assuming the compiler is bug free.

    40. Re:Great potential by martin_dk · · Score: 1

      Perhaps this example is better:

      var results:ThreadedQueue;
      foreach thing in todo {
      results.add(thing.do()) // If thing does not use any concurrent variables it can be spawned in parallel
      // OR
      thing.do(&results);
      }

    41. Re:Great potential by Anonymous Coward · · Score: 0

      But no need to ever check out the generated code assuming the compiler is bug free.

      Rest assured, we can trust Microsoft.

    42. Re:Great potential by Desler · · Score: 1

      But no need to ever check out the generated code assuming the compiler is bug free.

      Do unicorns exist in this fantasy land of yours as well?

    43. Re:Great potential by Bob+the+Super+Hamste · · Score: 2

      as can any code that does access the same memory location as long as that access is all read-only

      Not entirely true. If reads on that block of memory are atomic it will work every time, otherwise you might get some strange results as it may be possible for values to change while reading part of it. This mostly happens with more complex data structures but it can happen with primitive ones depending on their size.

      --
      Time to offend someone
    44. Re:Great potential by xelah · · Score: 1

      I'm sure it must have been possible for compilers to see potentially parallelizable operations for a long time. I remember watching some lectures a few years ago about that (and I'd be surprised if it hasn't been thought about decades before that, it didn't seem hard to spot it over short stretches of code). But that was with a Playstation 3 with (IIRC) essentially only one process scheduled to run at a time. What struck me then, and now, is that it could be a scheduling and optimization nightmare, especially if your compiler can only spot small scale parallelizability.

      Suppose your compiler can spot and parallelize in to chunks of, say, a few hundred or thousand instructions each running in half a dozen threads. That would be fine if your OS schedule each programmer-visible thread on to several cores, giving the compiler free access to a collection of cores at a time. But if it doesn't do that.....well, a thread is quite heavyweight, think of all the stack allocation, synchronization and scheduling that has to go on. And hardware and core availability is quite variable. Maybe it's a performance improvement across two cores in one package scheduled simultaneously, but not two cores in different packages. Or with one processor but not another.

      Maybe a further necessary step will be for processor to have so many cores that each programmer level thread can be scheduled on to two or more cores, giving scope for the compiler to generate parallelism as well as the developer.

    45. Re:Great potential by jones_supa · · Score: 1

      Good point. There is indeed parallelism going on with languages such as VHDL. Although they are technically just hardware description languages, not programming languages, there might be some things to learn from there.

    46. Re:Great potential by allcoolnameswheretak · · Score: 1

      Maybe bunny.hop() simply increments an internal hop counter that is written to a file when the program terminates... I know I could have been more specific and mentioned that the compiler can build a dependency and visibility tree on shared data to plan an optimal parallel execution schedule for blocks of unrelated processes at varying granularity... but the simplicity was intentional.

    47. Re:Great potential by Xest · · Score: 1

      AFAIK where it can be automatically inferred that a particular task can be run in a functionally independent manner as in your bunny.hop() example .NET 4.5 already automatically does this for the LINQ extension methods.

      So for example, if you have a list, such as:

      List bunnies = new List();

      Then you can do, say:

      bunnies.ForEach(x => x.hop());

      and the call to the ForEach extension method will automatically split execution against multiple threads. IIRC it's even somewhat intelligent in that it will only do this optimisation if the list is of adequate size to make it worthwhile.

      It sounds like this new research basically extends this existing idea beyond the LINQ extension methods but is based largely on the same base concept.

    48. Re:Great potential by Anonymous Coward · · Score: 0

      That's exactly true, and it spells why the method described in the article has very limited use.

      It's only very rarely that code does things that don't influence anything else. This example of bunny hopping, for instance, is very unrealistic: if a bunny hops and has no effect on anything else, it's useless and doesn't need to be executed.

      Now, it's possible that a bunny hopping connects to a database and does something there (note that it can't use an already established connection, otherwise it's having an effect on other parts of the program!), or opens a file and writes to it, etc., that would be useful. These are limited cases, though More often than not, to get real gains, you'd have to write code with this optimization in mind.

    49. Re:Great potential by TemporalBeing · · Score: 1

      Honestly, any programmer worth $120k could make something more efficient just using threads, if they thought about it. This will get you 20% of the efficiency gains without thinking of it.

      THIS IS WHAT WINDOWS PROGRAMMERS ACTUALLY BELIEVE.

      Unfortunately for them Windows eats that 20%, thus negating any gains they may make in their actual program.

      --
      Truth is like the sun. You can shut it out for a time, but it ain't goin' away. - Elvis Presley (source: imdb.com)
    50. Re:Great potential by wierd_w · · Score: 1

      Not necessarily. Some processes require sequential execution, as the input for the next function is the result of the previous. The two cannot execute in parallel, unless the people at microsoft have invented the ansible.

      For other things that don't require sequential execution, multithreading just makes sense given the current trend in computing hardware. Eg, if you want to both divide and square root a number, and return both results, there is no reason that both sets of computations could not fire simultaneously, since they share a single common input, and don't change each other's data. (Or two seperate database queries, or conditional checks, etc.)

      The issue here is that continuing execution past the paralleled operation requires both threads to complete.

      Race conditions galore can occur with improperly structured parallel execution. Many programmers have gotten comfortable with the intrinsic inability to produce that kind of race condition within their porgrams using sequential execution. Old habbits die hard.

      What this compiler does is parse the sourcecode for a sequential execution program, identify blocks of code that can execute in parallel, insert conditional checks to ensure all parallel blocks execute before returning, and restructure the compiled code approprately.

      Because execution prediction is tricky, and because programmers use nasty behavior exploits in some of their programs, this compiler could blow up spectacularly in certain circumstances when trying to multithread a single thread program. (Programmer makes use of a hardware quirk in how a processor handes cached results, for instance. Code executes on a different core than intended, cache behavior is not conserved, shit hits the fan.)

      As long as the compiler has metacomment flags you can put in to tell it to not attempt paralellism on certain blocks, it should be quite useful. But if it just blindly tries to mangle code, its gonna make a lot of programmers very angry.

    51. Re:Great potential by ImprovOmega · · Score: 1

      There's at least two reasonably decent free projects out there for auto-parallelism: Par4All and PLUTO. They both can take C code as input and chuck out a paralellized OpenMP implementation. I did a research project last semester comparing them with hand optimization on wavelet filters (think JPEG-2000 image processing). So far, hand optimization blew them out of the water in every case. And that was after reworking the code to be more amenable to their particular "quirks". Both of these have reasonably recent ongoing work being poured into them. At least as of Spring 2012, the state of the art for auto-paralellizers wasn't anywhere near "holy-grail" territory.

      More to the point, what is being claimed in this paper is that they can prove thread-safety and auto-split functions as long as the variables are properly typed into read-only and writable . While this is very cool and useful, it's not some game changing breakthrough that's suddenly going to let you multi-thread all your programs. The program must still be built with multi-threading in mind, but as long as you set up all of the type correctly, it can do a lot of the heavy lifting for you. Very cool, but hardly the "holy-grail" that the /. summary claims.

    52. Re:Great potential by cduffy · · Score: 1

      Functional programming works okay if you're doing lots of similar things at the same time, but not if you're doing lots of different - but parallelizable - tasks.

      Speaking with my FP hat on here, I have not the slightest idea what you're talking about.

      The big gains from FP come from isolation of side effects, and you get that gain no matter whether you're doing the same thing or a lot of little things.

    53. Re:Great potential by gwjgwj · · Score: 1

      make -j 8

    54. Re:Great potential by Anonymous Coward · · Score: 0

      Hi, I see you're new here; welcome to Slashdot!

    55. Re:Great potential by Anonymous Coward · · Score: 0

      And most of the time, they're right. You can get a lot of gains out of multithreading without doing much, as long as you're not stupid enough to introduce subtle bugs doing so.

    56. Re:Great potential by Bengie · · Score: 2

      i++ is atomic if the CPU can handle working at the size of i. On a 32bit machine, if "i" is 64bit, then it won't be atomic, but on a 64bit machine, i++ is atomic. Newer x86 CPUs do support 128bit and soon 256bit atomic memory reads and writes via SIMD registers.

      Now, it is not guaranteed to be pushed out to memory immediately. If you're doing some like for(i = 0; ... i++), "i" may be stored in a register and never pushed back out to memory. You need to declare i as volatile or use a memory fence to force i to be pushed.

      Volatile doesn't fix everything either. All it does is say the update will be pushed out to memory, but because most CPUs use Out-of-Order execution, the read/write may get re-ordered.

      eg.

      shared volatile int i = 0
      shared volatile bool b = false

      threaded method()
      {
      i = 3
      b = true
      }

      Even if the compiler doesn't re-order these two assignments, the CPU might because "i" doesn't have a detectable dependency.

    57. Re:Great potential by Alex+Belits · · Score: 1

      And most of the time, they're right. You can get a lot of gains out of multithreading without doing much

      Most of the time the cost of synchronization and shared cache overuse outweighs any gains given by supposed parallelism. Worse yet, in a genuinely multitasking environment, such as a server with large number of clients, there is no benefit in the first place because CPU time ends up being stolen from unrelated tasks in other processes, tasks that actually use CPU and RAM more effectively due to being unrelated and requiring no synchronization.

      On the other hand, many algorithms can be SIMD'ified to get actual performance benefits without wasting CPU time or additional synchronization, however ignorant Windows programmers such as yourself, only stumble on those things by accident when compiler recognizes their scribblings and does optimization behind their backs.

      as long as you're not stupid enough to introduce subtle bugs doing so.

      There is nothing subtle about race conditions. Just because you can avoid those, does not mean that you are not an idiot.

      --
      Contrary to the popular belief, there indeed is no God.
    58. Re:Great potential by Anonymous Coward · · Score: 0

      A lot of things about them are subtle. I don't expect an idiot like you to understand, so don't worry.

    59. Re:Great potential by Alex+Belits · · Score: 1

      If race condition is "subtle" to you, all programming is beyond your undersranding, and you should stay away from it.

      --
      Contrary to the popular belief, there indeed is no God.
    60. Re:Great potential by unrtst · · Score: 1

      Thank you. I miss three characters, and the whole point gets missed cause someone doesn't know anything about make ("make -j dinner", as dutchd00d below says, would do fine).

    61. Re:Great potential by Anonymous Coward · · Score: 0

      Of course it's not subtle once you find them you stupid idiotic fuck. Doesn't take a genius to realize that they are usually more complicated bugs than what you'll usually see.

  4. I hate when people misuse Moore's law by rminsk · · Score: 2, Informative

    Moore's law is only about the number of transistors on integrated circuits.

    1. Re:I hate when people misuse Moore's law by Anonymous Coward · · Score: 0

      it is now about intel cpu naming scheme...

    2. Re:I hate when people misuse Moore's law by oodaloop · · Score: 4, Interesting

      It turns out that Moore was observing only a small portion of a greater trend. That is, many (or most) things involving technology have a short doubling period, usually between 1 and 2 years. Total information created, processing power per dollar, resolution of brain scanning technology, et al all follow a similar doubling period to what Moore observed with transistors on integrated circuits. It's not that everyone else is misusing Moore's Law, so much as Moore didn't make make the law broad enough.

      --
      Tic-Tac-Toe, Global Thermonuclear War, and relationships all have the same winning move.
    3. Re:I hate when people misuse Moore's law by Anonymous Coward · · Score: 5, Funny

      No no, you misunderstand. They're teaching the compiler to be so efficient that it actually starts creating new transistors to make itself smarter.

      Make no mistake, this is how the rise of the machines will start.

    4. Re:I hate when people misuse Moore's law by newcastlejon · · Score: 4, Funny

      oodaloop's Law doesn't have quite the same ring to it.

      --
      If God forks the Universe every time you roll a die, he'd better have a damned good memory.
    5. Re:I hate when people misuse Moore's law by Anonymous Coward · · Score: 3, Informative

      You do realize that all the things you listed are a side effect of the doubling of transistor density - and not some separate independent phenomenon?

    6. Re:I hate when people misuse Moore's law by oodaloop · · Score: 2

      Does that include the doubling of processing power before the transistor was invented?

      --
      Tic-Tac-Toe, Global Thermonuclear War, and relationships all have the same winning move.
    7. Re:I hate when people misuse Moore's law by viperidaenz · · Score: 2

      Moore's Law didn't exist back then.

    8. Re:I hate when people misuse Moore's law by timeOday · · Score: 4, Insightful

      I hate it when all the potentially interesting discussion on a slashdot story is headed off by a pedantic nitpick.

    9. Re:I hate when people misuse Moore's law by Anonymous Coward · · Score: 0

      I hate it when all the potentially interesting discussion on a slashdot story is headed off by a pedantic nitprick.

      FTFY

    10. Re:I hate when people misuse Moore's law by loneDreamer · · Score: 4, Interesting

      Magnetic Hard drives, for instance is an example of the same curve with no transistors.

    11. Re:I hate when people misuse Moore's law by Anonymous Coward · · Score: 1

      Apparently it applied anyway.

    12. Re:I hate when people misuse Moore's law by ceoyoyo · · Score: 2

      "resolution of brain scanning technology"

      Oh, that would be fantastic. Unfortunately it's just not true. Physics intervenes. Well, that and not making patients smoke (because they're on fire, not because they crave nicotine).

    13. Re:I hate when people misuse Moore's law by ceoyoyo · · Score: 1

      Hey, it's a refreshing change to have a Slashdot discussion dominated by snide, pedantic comments pointing out an error in the summary, as opposed to a discussion dominated by snide comments making wild extrapolations after assuming an error in the summary is indeed correct.

    14. Re:I hate when people misuse Moore's law by jeff4747 · · Score: 1

      I hate it when posters can't be bothered to read the entire title before posting.

    15. Re:I hate when people misuse Moore's law by Anonymous Coward · · Score: 1

      Magnetic Hard drives, for instance is an example of the same curve with no transistors.

      Not so much. The DSP required to make the higher densities useful relies on transistors. I'm not saying its purely a function of transistors, but the new sort of platters wouldn't be anywhere near as useful if we didn't have increased computational ability to make something of them.

    16. Re:I hate when people misuse Moore's law by Alomex · · Score: 1

      Except that they went off curve at some point in the early 90s if I remember correctly, and then came back strong in the 2000s.

    17. Re:I hate when people misuse Moore's law by Anonymous Coward · · Score: 0

      Yep, there's always some dickhead that mentions Moore's Law, then some other dickhead corrects the first dickhole....... Rinse, Lather, motherfuckin' repeat.

  5. How about by Anonymous Coward · · Score: 0

    How about using threads when it makes sense, and not when it doesn't? It doesn't matter how many cores, how many threads, and how many gigahertz you have when the computer is doing nothing but blinking a cursor waiting for me to press the keyboard. The vast majority of programs match this situation, and all the magic compiler is going to do is increase the speed of the System Idle Thread.

    1. Re:How about by chromas · · Score: 5, Funny

      all the magic compiler is going to do is increase the speed of the System Idle Thread.

      Have you seen how much processing power that thing consumes? Usually 90% and up. On all cores. It really needs some serious optimization.

    2. Re:How about by TapeCutter · · Score: 1

      The vast majority of programs match this situation

      I fear your sample may be biased, most of the commercial applications I've worked on in the last 20yrs talk to other applications, not people.

      --
      And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
  6. Or.. teach devs to use threading as appropriate? by databeast · · Score: 4, Informative

    Or, gawd forbid.. we could teach programmers how to use threading? I am a casual developer, with no formal training beyond writing practical code and reading as much as I can from the keyboards of real developers. I've run into my fair share of "real", "professional" developers who've been tasked to work on my code and thrown their hands up in defeat - not, as I feared, because of the awfulness of my coding style, but "This uses threading, I don't know this!".. Of course, looking at their resumes, a long list of single threaded webapps where the actual threading is handled for them by the webserver itself.. I give up. Even some basic native GUI client development should teach developers simple threading and asynchronous callbacks? alas, yet another talent abandoned in the age of the webapp. Is it any wonder the security issues (my actual realm of supposed 'expertise') in their code often make the architectural issues look trivial in comparison?

    An interesting development, and much needed I fear, but yet another layer of abstraction to allow lazy developers to not have to really bother about knowing what their code is actually doing (that's for the poor SoB who has to maintain it is for...)

  7. Parallelizing is easy, performance is hard by HuguesT · · Score: 4, Insightful

    OK, so now we shall have another way to produce parallel programs.

    Running things safely in parallel is a very good thing, but it does not by itself guarantee significant improvements in speed. The hard bit is actually to optimize the parallel code.

    1. Re:Parallelizing is easy, performance is hard by Anonymous Coward · · Score: 0

      Quite so; this is presumably sensationalized by someone who's never heard of automated parallelization before.

      As far as I can tell, the authors are simply introducing semaphore abstraction, which is pretty much the lowest hanging fruit and still requires very careful consideration by the programmer for significant general purpose improvement over serial. There are a variety of much harder problems (cache blocking, dependency analysis/transformation, contraction, partitioning, ect.) that, while being studied, have seen extremely incremental improvements and are very far from seeing practical application.

      Wikipedia has a surprisingly good list of the automated tools in development - http://en.wikipedia.org/wiki/Automatic_parallelization_tool

    2. Re:Parallelizing is easy, performance is hard by godrik · · Score: 1

      Exactly, I am doing parallel programming and HPC for a leaving. I do not believe in compiler automagic^Wautomatic parallelisation. Typing is only a small parts of the problem. Most algorithms need to be written VERY differently to be able to exploit the parallelism within. Some times, you actually need to have variable concurrency and race condition within your algorithm but you have ways to fix problems whenever they appear.

      In brief, I do not think that compilers will be able to do anything more than quite basic stuff. What ever is proposed here do not appear to me to be significantly more powerful than Cilk (now Intel Cilk+).

    3. Re:Parallelizing is easy, performance is hard by Anonymous Coward · · Score: 0

      The funny thing is, the compiler is able to do the optimizations you're linking to. If you actually read the academic paper behind the blog post, it show's you how it avoids resource contention by tracking mutability. The strategy is to remove resources to contend with - the compiler only splits to multiple threads when it knows a resource contention would be impossible. From you're link, I'm sure you think the compiler is auto-generating locks & synchronization blocks (which slow down execution) - that's simply not true. The compiler that Microsoft wrote generates lock-less multi-threaded code.

    4. Re:Parallelizing is easy, performance is hard by Darinbob · · Score: 1

      One reason some explicitly parallel computers died and were replaced by SMP and "cores". Programmers don't want things to be hard and so insisted on the simplest of all parallel models.

    5. Re:Parallelizing is easy, performance is hard by Anonymous Coward · · Score: 0

      The fact that (some) people working on this stuff think race conditions are either desirable or unavoidable indicates quite strongly why we need compilers to do this stuff for us. To save typing? Ha ha ha. No, to save bugs. Such as bugs introduced by people who intentionally allow race conditions to exist.

    6. Re:Parallelizing is easy, performance is hard by i · · Score: 1

      It's rather that you have to *design* the application on a high level to get the most of the parallelism gains.

      --
      Mundus Vult Decipi
  8. mutable state by jbolden · · Score: 4, Informative

    Mainstream language have mutable state all over the code. Functional programming's big change on state issues is to careful isolate state. The Microsoft approach means that state needs to be tracked carefully so that it could be isolated by the compiler even if it isn't isolated by the code. Which is likely just as much work as isolating state. And the nice thing about isolating state is once you do it you can make use of all sorts of incredibly powerful functional paradigms like like first class functions (closures, partial execution...) and lazyness (infinite data structures, no need to figure out proper order of evaluation..)

    The solution to parallelism is functional programming. And no it is not too hard. Excel is a functional programming language that lots of people know that does a great job isolating state.

    1. Re:mutable state by readin · · Score: 2

      I wish I knew more about functional programming than I do. In reading the article I found the concept of a language the limits mutable variables interesting. In using Java I find that making most variables "final" is helpful to me for a number of reasons. I can easily find where the variable got its current value. If I write a line of code that changes the wrong variable it is immediately obvious. It keeps me from re-using variables that shouldn't be re-used (generally a variable should have one meaning and one value). It helps me keep variable scope narrow.

      Is this article suggesting that most functional languages make their variables the similar the Java "final" or am I way way off?

      --
      I often don't like the choices people make, but I like the fact that people make choices. That's why I'm a conservative.
    2. Re:mutable state by betterunixthanunix · · Score: 3, Interesting
      In functional algorithms and data structures, everything is immutable (in theory) -- rather than thinking in terms of "final," think in terms of Java's String class. If you want to change one character in a String instance, you must create a new String instance. For a less trivial example, consider how a functional algorithm that removes an element from a list would work (written in Java-like syntax):

      List remove_if(List lst, Predicate pred)
      {
      if(lst == null)
      {
      return null
      }
      else if(pred.test(lst.first()))
      {
      return remove_if(lst.rest());
      }
      else
      {
      return new List(lst.first(), remove_if(lst.rest()));
      }
      }

      Notice that this constructs an entirely new list, even if none of the elements in the list pass the test. This may seem like a terrible idea, but let's put it this way: if you have 10 threads that share the list, and one of them wants to remove some nodes, you would have had to have copied the list anyway; the functional approach to remove_if is exactly what you want. Now, consider this function, which only removes the first node to match:

      List remove_first(List lst, Predicate pred)
      {
      if(lst == null)
      {
      return null;
      }
      else if(pred.test(lst.first))
      {
      return lst.rest();
      }
      else
      {
      return new List(lst.first(), remove_first(lst.rest()));
      }
      }

      Now you have a situation where lists share nodes -- and again, imagine a situation where 10 threads share the list, and one wants to perform this operation. This is one of the reasons that functional programming is so promising for parallel algorithms: you have fewer situations where explicit mutexes are needed, because you are usually copying things that you intend to modify (or more precisely, you never really modify anything).

      Of course, things are more complicated in the real world. Purely functional approaches would obviously be pretty inefficient in a lot of cases, since things would be needlessly copied. Lisp, as an example, has both a non-destructive append as well as a destructive nconc, the latter being intended for use in situations where the original lists will not be used again (and can therefore be modified).

      --
      Palm trees and 8
    3. Re:mutable state by segfault_0 · · Score: 1

      The benefits of using functional languages is realized in terms of program safety and a lack of defects -- not necessarily performance. I think we are all aware of the fact that there are plenty of programmers out there who care very little about introducing a few bugs for the sake of speed and looking clever.

      But even if you just use immutable data and message passing as IPC under a procedural language you are headed in the right direction. It's really a mindset and even the procedural programming folks don't really have many excuses for not doing what the functional languages make natural.

      --

      I was crazy back when being crazy really meant something. (Charles Manson)
    4. Re:mutable state by shutdown+-p+now · · Score: 2

      All mainstream languages are slowly gravitating towards FP-style - C# is already halfway there, Java is on its way to join it, C++ STL is functional in spirit if not in implementation, and of course we have new languages like Scala that are explicitly designed to marry the old and the new.

      Problem is, for transparent parallelism, you need to go 100% functional / immutable, and that's just not happening. What does happen is isolating distinct units in an OOP program that are then implemented functional-style with immutable data structures etc - and then exposed via a high-level conventional OO API at component boundaries. Which, if you think about it, is not all that different from what the paper talks about.

    5. Re:mutable state by 10101001+10101001 · · Score: 1

      Mainstream language have mutable state all over the code. Functional programming's big change on state issues is to careful isolate state.

      Isolating state isn't per se the issue. Most mainstream languages isolate state as part of their automatic garbage collection. Yet, they still offer a high level of mutability. No, what functional programming's offer more fundamentally is closures, as you note. That translates into a much clearer tree-based observation of state. Ie, what functional programming languages offer is not per se protection from mutable state but a clear outline on exactly how one would go about writing a parallelizing compiler.

      Meanwhile, non-functional languages lack the structure that make such changes as apparent. Isolated state does very little for a compiler writer who can't easily comprehend the structure of the program. Of course, the same is actual true in functional programs as well: the compiler will shine in parallelizing and optimizing the lower-level functional parts yet not inherently understand or parallelize the overall groupings of functions. To that end, a more generic optimizing and parallelizing compiler is needed that can break down entire programs. And that's precisely what MS Research did by being able to effectively tree structure everything.

      Put another way, a parallelizing functional compiler only works as well as the programmer who structures the program to isolate state in a fashion that allows for the compiler to recognize that isolated state in the right level of nesting. But a truly parallelizing compiler can observe the actual level of isolated state and work on that and not just hope the programmer did a good job.

      --
      Eurohacker European paranoia, gun rights, and h
    6. Re:mutable state by Anonymous Coward · · Score: 0

      That's part of it. If you can change everwhere that your code says private to private.final, then you've gone a long way towards minimal shared state within your program. Threading something like that is a lot safer than threading functions that are very state-dependent.

      Really you can take any language and move towards a more functional method of thinking, and it sounds like you already largely thinking that way.
      http://www.youtube.com/watch?v=JeK979aqqqc

    7. Re:mutable state by readin · · Score: 1

      That helps. Thank you!

      --
      I often don't like the choices people make, but I like the fact that people make choices. That's why I'm a conservative.
    8. Re:mutable state by Anonymous Coward · · Score: 0

      I see your point--functional isn't really that foreign--but Excel is not actually functional. Data tables, iteration, and volatile functions (eg. rand()) come to mind.

    9. Re:mutable state by z3alot · · Score: 0

      It is important to remember that while a naive approach to functional programming (that is, using the same data structures and idioms as for imperative languages), will indeed be inefficient due to enforced persistence, much of the loss can be mitigated by using better data structures. For instance, Data.Sequence of Haskell supports access and modification time logrithmic in the closest distance to one end of the sequence, and does so in a thread safe (as always) way.

      Even in the case where we demand mutable data structures because nothing else is acceptable, we can control the access in a purely functional way using monads (a very cool concept which arent as scary as they first might seem). See for instance, the haskell State monad and mutable data structures.

    10. Re:mutable state by sjames · · Score: 1

      Not quite, but similar. A big part is that data is immutable. That is, you cannot change a variable in place or out of scope. You can assign a new value to a given name. For example, you cannot do something like sort(a). You CAN do a = sort(a). The difference is that sort isn't re-arranging a in place. Likewise, you cannot do a[3]=5 but you can do a = [ a[0:2], 5, a[4:]].

      That may not seem like much, but it also means there is no such thing as aliasing. For example, given a=[1,2,3,4,5,6], b=a, a=[ a[:2], 5, a[4:]], thjen b[3] will still be 4 but a[3] will be 5. That's because b=a means b is assigned the immutable array referenced by a. The later assignment to a doesn't affect b's assignment.

      A functional language will tend to produce less efficient code than a mid level language like C would, but it also produces code that lends itself well to parallelization and correctness checking.

    11. Re:mutable state by chthon · · Score: 2

      Problem is, for transparent parallelism, you need to go 100% functional / immutable

      Which comes down to re-educating programmers to learn to think about their algorithms.

      One book which does this nicely, without going to deep into theoretical computer science is How to Design Programs

    12. Re:mutable state by dkf · · Score: 1

      Of course, things are more complicated in the real world. Purely functional approaches would obviously be pretty inefficient in a lot of cases, since things would be needlessly copied. Lisp, as an example, has both a non-destructive append as well as a destructive nconc, the latter being intended for use in situations where the original lists will not be used again (and can therefore be modified).

      The other complication in the real world is that memory management becomes a real drag because memory is inevitably a shared modifiable resource, and determining what threads are using a particular piece of memory is messy. The best approach I'm aware of involves sharing objects between threads only for the duration of a message transfer and processing in response to it; if the receiver wishes to retain a value, it has to copy it (which can be done in a usually-lock-free way, fortunately). This model fits with pure functional programming languages fairly well (and a few others) but it really sits badly with the POSIX threading model and its derivatives (including both Java and C#).

      If your program is both lock-free and hazard-free, it can scale up and out very well. However, it's not trivial to get there, and it is an arrangement that is actually non-optimal in some special cases, such as applying a color transform to an image. Parallelization is harder than it looks, and it definitely should look hard.

      --
      "Little does he know, but there is no 'I' in 'Idiot'!"
    13. Re:mutable state by jbolden · · Score: 1

      Java is a really tough language to make analogies with functional programming. Instead of like French to Spanish or even French to English it is like French to Chinese. J2EE and J3EE have a lot of code so that objects on the client synchronize with objects on the server. Imagine if the default behavior when a client a changed an object was for them to make a copy of the server's object and then change that. Thus they always had their own unique copies and there was no need to synchronize until the end.

      A place where you've run into functional concepts most likely though is working with databases. SQL is a functional language. Inside a select, regardless of how big you cannot change data. Thus the RDBMS knows if it faces 200 selects it can execute them in any order safely. Thus you as a programmer have no ability to control the order the RDBMS executes selects in. That's a far better analogy.

    14. Re:mutable state by jbolden · · Score: 1

      Why is color transform on an image hard? That sounds like a classic map-reduce.

      a) decompose the image into n pieces
      b) apply color transform to each piece
      c) reassemble, note this is a pairwise associative reassemble because of position and thus you don't need complex organization.

    15. Re:mutable state by jbolden · · Score: 1

      For many functional languages a=[ a[:2], 5, a[4:]] wouldn't even be legal. It would only be legal if a is a state. That is a isn't an array but
      a:: S -> (K,S) where K is an array. Then b is either an alias to a and thus subject to the same state shifts or more likely b is the value of a at one particular state.

    16. Re:mutable state by jbolden · · Score: 1

      I understand what the paper is talking about. But what you are describing where the bulk of the code is immutable is easily implemented in a language which protects immutability. Then the exposure to mutability is just the "State Monad" and the lifting from immutable to stateful is easier. The Stateful monad then exposes the objects (or really data) as services.

        So why not do the immutable part in a functional language where you don't have to keep cutting against the grain?

    17. Re:mutable state by jbolden · · Score: 1

      Interesting point, I see. So it is the compiler that isolates state not the programmer and the hope is that the compiler sees a path for parallelizing via. implicit isolation?

      I gotta tell you I don't see that working for anything but a trivial program in practice. Synchronizing state like J2EE / J3EE does is hard. It just strikes me as a lot easier to bite the bullet. But the.NET compiler is an excellent compiler.

    18. Re:mutable state by jbolden · · Score: 1

      data tables are functional,that's just the functional "map". rand would be in the state monad in a functional language which is more or less how excel treats it. Iteration is just lazy evaluation
      f_n depends on g_{n-1}
      g_n depends on f_{n-1}

      So you can create a sequence l = [(f0,g0), (f1,g1),....]

    19. Re:mutable state by jbolden · · Score: 1

      Oh I agree it is a mindset. It is just a heck of a lot easier to do it in a language that makes it easy rather than in a language that makes it hard. It is also easier to acquire the mindset.

    20. Re:mutable state by lehphyro · · Score: 1

      It can be done without functional programming, see D's immutable keyword and thread-local instead of global variables by default. It seems the results of this research would fit very well in D rules.

    21. Re:mutable state by Anonymous Coward · · Score: 0

      It works in real world but the compiler/runtime must be capable of determining when the new list should be a copy, or not (with append), or a structure of original list plus changed items (and made a full copy if changes are too many), etc, etc.

      It's a lot of work for the compiler and basically no functional language is usable without very aggressive optimization which rewrite or rearrange code .

    22. Re:mutable state by jbolden · · Score: 1

      D is explicitly cross paradigm. This paper is about using using code that was designed for non-functional paradigms and tracking changes. D allows the author to use the appropriate paradigm in the appropriate spot. I see D as being more like scala in that you'll end up with stuff that moves in the direction but doesn't quite get there.

      For example take this code block:
          foreach(words; signs2words)
                    words = words.chomp().toLower();
                      if(words.length > 1)
                              writefln(words.join(" "));

      In a functional language hat you would do is all the calculation and have the output be a lazy stream. Then pass the stream to write. So the compiler has it easy:
      construction of stream = safe to make parallel
      display of stream = needs to be single threaded

      conversely the above D code the write (stateful operation) is interspersed with the stateless one. That's what you want to avoid.

    23. Re:mutable state by fatphil · · Score: 1

      > Notice that this constructs an entirely new list

      Nope. I notice that it may construct an unbounded number of new list, one for each level of the recursion.

      --
      Also FatPhil on SoylentNews, id 863
    24. Re:mutable state by betterunixthanunix · · Score: 1
      In the functional sense, we usually define a List recursively:
      • null is a List
      • List(value, rest), where value is some value and "rest" is a List

      So in Java, a very simple List class following this definition would be:

      class List {
      Object head;
      List rest;

      public List(Object h, List r){
      head = h;
      rest = r;
      }

      public Object head() {return head;}
      public List rest() {return rest;}
      }

      Another way to put it is this: the only value that represents "one list" is null.

      --
      Palm trees and 8
    25. Re:mutable state by Jmc23 · · Score: 1

      Lisp looks horrible dressed like java.

      --
      Don't complain about syntax, grammar, or spelling. There is no.hell like input on android.
    26. Re:mutable state by Jmc23 · · Score: 1

      If a variable had only one meaning and one value it wouldn't be a variable, it would be a constant.

      --
      Don't complain about syntax, grammar, or spelling. There is no.hell like input on android.
    27. Re:mutable state by readin · · Score: 1

      If a variable had only one meaning and one value it wouldn't be a variable, it would be a constant.

      In computer programming, something is usually called a constant only when the value is known before the program is run (at compile time). However it is possible that a variable will have only one value from the time it is allocated to the time it is de-allocated, but that the value will know be know until the program runs.

      --
      I often don't like the choices people make, but I like the fact that people make choices. That's why I'm a conservative.
    28. Re:mutable state by 10101001+10101001 · · Score: 1

      Interesting point, I see. So it is the compiler that isolates state not the programmer and the hope is that the compiler sees a path for parallelizing via. implicit isolation?

      More a combination of both, really. Programmers already do it all the time in object oriented programming though encapsulation and method-based access. Add to that language-based opaque referencing being the norm and compilers already have a lot of information on the pretty explicit isolation. Like I was trying to say, functional languages just lend themselves better to the obvious cases of isolation given the nested structure of functions.

      I gotta tell you I don't see that working for anything but a trivial program in practice.

      And I'd say, that's the big weakness in functional languages as well. You can try very hard and try to wrap everything you do in a functional way yet in the end you're fundamentally trying to restructure your own thought processes under some mostly vain hope the compiler will compile and dish out the right optimizations. Without a lot of extensive refactoring, though, you end up losing out on a lot of actually substantial benefits because it's often unclear exactly where the bottlenecks are or simply how much could be gained though extensive and pervasive small scope parallelizing throughout all of the code. This is, after all, the primary basis for superscalar architectures and their clear superior processing power over simple pipelined architectures.

      Synchronizing state like J2EE / J3EE does is hard. It just strikes me as a lot easier to bite the bullet. But the.NET compiler is an excellent compiler.

      Well, the simple fact is that compiler writing is hard. I mean, it's trivial enough to write a compiler to take simple code and output a very direct translation. But the second you deal with every last possible technical valid variation under the scope of a language standard that doesn't really entirely jive with the expectation of programmers and couple that with all the potential bugs introduced while adding parallelism... This is no doubt the reason why the ia64 was such a colossal failure.

      So I'd tend to agree with biting the bullet if for no other reason that we've yet to reach the state of the art in compiler design to actually make the ia64 or similar architectures shine--and it's not like we haven't had GCC to be the sort of living-laboratory for every compiler writieri n the world to gain the fame to actually successfully architect such a thing (and for all us simpler programmers to benefit from the genius level voodoo and potentially extrapolate it onto other projects) so it rather hints that such a design may be impractial.

      Still, since functional languages merely shift the road blocks instead of really removing them...and push towards having genius level voodoo programmers in every application for optimizing at that level...I'd still rather MS Research and others actually further their compiler designs. :)

      --
      Eurohacker European paranoia, gun rights, and h
    29. Re:mutable state by Jmc23 · · Score: 1

      Ah, true. I'm used to Lisp so the pieces aren't all the same, I forgot to translate.

      --
      Don't complain about syntax, grammar, or spelling. There is no.hell like input on android.
    30. Re:mutable state by jbolden · · Score: 1

      rogrammers already do it all the time in object oriented programming though encapsulation and method-based access

      I agree that's why it is possible to have objects reside on clients and go parallel in that way. But that often isn't a very good division of labor and communication between threads can be too expensive. Just having a couple thousand threads around even if most are blocked might be good enough to get some boost and certainly at least they are able to make use of additional CPUs but I don't see that as similar to being able to cleanly divide a large workload with low communication needs like in functional.

      As far as the weakness for functional. What I see is the big advantage is that encourages you to write the bulk of your code in a way that makes parallelizing trivial whether that be the compiler or the programmer. Most functional code (after partial evaluation) is going to be reducible to one variable transforms
      f: a -> b (map) then if you hit one that isn't you merge two lists (or self merge the list)
      g: a -> b -> c (reduce)

      That's super clean and because of the associative criteria I can keep communication between processes very low. And that gets rid of the vast bulk of my code. That leaves the small fraction that is I/O, stateful... code. Now there what Microsoft is doing makes some sense. But isolating off 90+% of code immediately strikes me as far better.

      In terms of bottlenecks in theory there shouldn't be any. Once you hit a bottleneck that means your algorithm was wrong. Functional makes it easy to change algorithms and encourages that sort of approach.

      Still, since functional languages merely shift the road blocks instead of really removing them...and push towards having genius level voodoo programmers in every application for optimizing at that level...I'd still rather MS Research and others actually further their compiler designs. :)

      I love what Microsoft's compiler group does. But I don't think it is about genius voodoo. Excel proves that rather poor programmers can handle functional. Javascript's popularity proves that people can do interfaces that way. Etc...

    31. Re:mutable state by 10101001+10101001 · · Score: 1

      The problem as I see it, though, is precisely the mythic "90+% of code" which is presumed to be related to variable transformation. Yet, if that were the case, functional programs along with functional compilers would already be producing programs substantially faster on vectorizing CPUs. Of course, I know the above isn't that simple and it helps little that poor programmers who can functional languages may well code so badly to miss out on most the compiler benefits. Yet trying to structure everything to fit the theoretical mode of functional languages quickly hits real-world design issues which is precisely who object oriented and procedural languages are still quite common, even if functional languages when well coded for routinely were substantially faster. The end problem after all is compilers and CPUs trying their best to see work done as fast as possible no matter how poorly code is or even must be written vs the potential expectations.

      Or, put another way, if there is any solid truth that seems to be pervasive it is that over time an ever growing use of CPU cycles are used to do the same amount of work as of decades past and it is very unclear exactly what steps could reasonably be taken to really eliminate this dilemma in any sensible fashion. Certainly, I'm not holding my breathe for a [near] purely functional language based OS. :) So, instead, I think we're left to further designs of compilers to tackle that mythical 10-% of code which, I think, may well account for much closer to 99% of code on many systems. :/

      --
      Eurohacker European paranoia, gun rights, and h
    32. Re:mutable state by jbolden · · Score: 1

      . Yet, if that were the case, functional programs along with functional compilers would already be producing programs substantially faster on vectorizing CPUs

      I would argue they are. The entire high speed graphics libraries that are used by GPUs are a good example of this. Graphics are way faster today because of the functional code that is executed on GPUs. Another example of this massive parallelism is Google's use of map / reduce which starting in the late 1990s allowed them to do a decomposition of the entire internet regularly.

      Of course, I know the above isn't that simple and it helps little that poor programmers who can functional languages may well code so badly to miss out on most the compiler benefits.

      No I actually think even poor programers can benefit from this. The problem is that functional compilers are much less sophisticated than structured / object oriented ones. Compiler theory and CPU theory over the last 3 decades has been about making C code run fast, and then a lot about virtual CPUs with bytecode. Functional compilers in some senses are a generation ahead in terms of innovations but in terms of raw performance about a generation behind. GCC used to be like this, very innovative but about much slower than good compilers. Linus' team worked hard for about 6 years to get GCC's performance up and by the early 2000s they had. Someone is going to have to do something similar, a GHC that's much less feature rich but much faster.

      Certainly, I'm not holding my breathe for a [near] purely functional language based OS. :)

      I am. I think huge number of CPUs being standard and diverse semi-independent chips to boot will make this more and more necessary. You see this already in hardware languages, the shift towards functional has been profound. Right now that functional code is shielding the end users from the complexity of modern hardware but I'd assume that eventually people will want OSes that take better advantage of what the hardware could potentially do. Apple's focus on performance is a real change in an industry which had moved away from performance towards inexpensive, they are raising the bar.

      I suspect that many of the dynamic languages will move in a functional direction and to get performance will have their systems written more functionally. Perl 6 being a good example of the change that's happened in just a few decades, where a lot of ideas from GHC and Haskell have bled over. If Perl6 hadn't stagnated it would be out years ago and would have like Scala and Clojure have moved the industry more in a functional direction.

      So, instead, I think we're left to further designs of compilers to tackle that mythical 10-% of code which, I think, may well account for much closer to 99% of code on many systems. :/

      You and I don't disagree that if you don't use functional it is 99%. My point is that once the programmer needs to give the system that much isolation and hints he might as well write functional. But obviously if the system could work on out of the box C# that's huge. Microsoft's research is very cutting edge and if they get this to work they have the kinds of advantages that it will be hard for anyone to compete with for a decade or more.

    33. Re:mutable state by badkarmadayaccount · · Score: 1

      Lots of lambda lifting, hidden classes, and implicit message passing for state sync.

      --
      I know tobacco is bad for you, so I smoke weed with crack.
    34. Re:mutable state by 10101001+10101001 · · Score: 1

      The problem is that GPUs and high speed graphics libraries are an edge case, just like Google's use of map / reduce. That isn't to say they're not very advantageous in their own field. The point though is that 99% of code isn't GPU or map / reduce and there's been literally decades for functional programming to be made provably the superior option in more than a lot of edge cases.

      Further, compiler and CPU theory over the last 3 decades have focused on making all code run faster, not simply C code. It is merely that C code which its highly interdependent structure has seen the most obvious gains. Meanwhile, as you say, functional compilers are a generation ahead because their inherent structure already allows for much better optimization. Yet by the same token, plenty of programs are still not written in a functional style precisely because they do not form into that inherent structure and trying to morph them into such a format is likely to be very suboptimal at best.

      And, btw, the reason I'm not holding my breathe for a purely functional language based OS is precisely because Symbolic Lisp machines were such a failure. Do note that RMS was a rather huge supporter of Lisp and it's a significant reason why GNU supports Guile--admittedly much more Scheme-like than Lisp-like--yet pursued a base system of Unix and C. Of course, I believe RMS's original vision was more of a core C/Unix system and most of user space in Guile or another functional language, but then that large didn't materialize because neither scripting languages--a form I think very appealing to most Free supporters--nor even a byte code language lends itself well to the sort of optimal user experience one would desire. Then again, Android is doing quite well, but that's more or less a consequence of the smartphone/low energy platform being a fluid one and trying to avoid any future CPU migration issues--ie, they simply made a virtual architecture and so it has very little to do with language format/structure.

      I think my final analysis would be that programmers aren't reasonably capable of giving the system the sort of isolation hints to a compiler be it in a functional, procedural, or whatever language--except that perhaps a genius level programmer may every once in a while manage it for isolated case. If they were, again, 99% of programs and 99% of code would already be functional because of its inherent superiority in compiler-driven optimizations. Instead, 1% of code that would otherwise use 99.9% of CPU time uses closer to 1% of CPU time. :) So the big bottleneck then is the rest of the code that doesn't fall so easily into a pattern and that's why optimizations on non-functional code is so critical. And precisely because humans are unable to reasonable see the pattern of isolations in millions of lines of code, a compiler driven approach that can sieve through the code is the key. And because compilers can't work miracles alone, languages that curtail some of the mutability of C have been born in countless languages while still retaining a lot of the strengths of C to try to give the best of both worlds.

      So, I think RMS had it partly right and your statements about further functional incorporation are right at one level. But Perl6, Scala, Clojure, and Haskell are no languages to write a kernel in--not that it may not be technically possible but in that it's rather orthogonal to what a kernel is about, especially when it comes to having a rather clear bounded understanding of execution time, memory usage, etc. Functional programming definitely has its place and there's certainly a lot of places that functional-like structures can be incorporated into a lot of languages to handle readily parallizable edge cases. But none of that handles a lot of the unrealized readily parallizable common cases which few humans are readily able to comprehend. That is, after all, the whole reason resource scheduling in kernels is still a very heavy area of research with still a lot of questions on even static models let alone the sort of real world ones.

      --
      Eurohacker European paranoia, gun rights, and h
  9. Auto-Threading by DaneM · · Score: 2

    About time.

    1. Re:Auto-Threading by Anonymous Coward · · Score: 0

      Labview has done auto threading for ages. Whenever a blocks inputs are valid, the block executes. The problem, at least with 8.6.1 and previous is it will generally make a copy of whatever the common inputs are so it can execute things in parallel, and, well, that copy often just makes things worse than just executing serially. A smarter compiler, perhaps similar to the one suggested in this, could avoid the copies, sometimes. Of course, the other problem with Labview is C# seems to be way faster, and for a sufficiently large project, I'd rather maintain the C# code.

      Still, I remain somewhat of the opinion that a graphical language makes it easier to design multithreaded code in principle...

    2. Re:Auto-Threading by DaneM · · Score: 1

      Labview has done auto threading for ages. Whenever a blocks inputs are valid, the block executes. The problem, at least with 8.6.1 and previous is it will generally make a copy of whatever the common inputs are so it can execute things in parallel, and, well, that copy often just makes things worse than just executing serially. A smarter compiler, perhaps similar to the one suggested in this, could avoid the copies, sometimes. Of course, the other problem with Labview is C# seems to be way faster, and for a sufficiently large project, I'd rather maintain the C# code.

      Still, I remain somewhat of the opinion that a graphical language makes it easier to design multithreaded code in principle...

      Insightful comment, AC. Post it as yourself, next time! :-)

  10. Great white hope? by Anonymous Coward · · Score: 1

    "Great white hope"?

    Is this Slashdot or segregation-era Mississippi?

    1. Re:Great white hope? by belg4mit · · Score: 2

      Indeed. silicon is sort of a shiny black color!

      OP probably once read or heard the phrase,
      misunderstood the context and thought it
      would embiggen the post's language.

      --
      Were that I say, pancakes?
    2. Re:Great white hope? by catf00d · · Score: 1

      If I could, I'd mod you up. I was pretty aghast as well.

    3. Re:Great white hope? by Anonymous Coward · · Score: 0

      Wow, this is a great contribution to this thread. Somebody mod parent up!

  11. So... by Anonymous Coward · · Score: 0

    I skimmed the paper... it requires you to describe how variables are accessed... is this statically checked? If not, then it is no better than the current paradigm, except the compiler will be introducing races rather than the developer.

    Also, there are no performance numbers. How much of the gains of paralleling such small chunks of code are lost through synchronization?

  12. Re:Or.. teach devs to use threading as appropriate by lennier · · Score: 4, Insightful

    Or, gawd forbid.. we could teach programmers how to use threading?

    That's easy: "Don't."

    From everything I've read about threading, there's no general way a hand-coded multithreaded program can ever be proven to be correct. Threads introduce all sorts of extremely subtle timing-based logic and security bugs which even the smartest programmers in the world think they can handle but in practice don't. And most programmers are not the smartest programmers in the world (they only think they are).

    The correct solution is to switch from C-based imperative languages to pure-functional implicitly parallelised languages, but that's not likely to happen before the heat death of the universe.

    --
    You are not a brain: http://books.google.com/books?id=2oV61CeDx-YC
  13. Now we wait 20 years... by reve_etrange · · Score: 0

    Until non-Microsoft compilers can implement the technique.

    --
    .: Semper Absurda :.
    1. Re:Now we wait 20 years... by Anonymous Coward · · Score: 0

      Or in under 20 years, autoparallelism is utilised by less proprietary languages; which may or may not happen before Microsoft drops C# and moves on to the next latest and greatest.

  14. Re:Or.. teach devs to use threading as appropriate by timeOday · · Score: 1

    Goto is swell also. Just be sure not to make any mistakes!

  15. Re:Or.. teach devs to use threading as appropriate by HornWumpus · · Score: 0

    No non-trivial program can ever be proven to be correct.

    --
    John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
  16. Re:Or.. teach devs to use threading as appropriate by Anonymous Coward · · Score: 0

    The researchers are proposing neither solution, and I think they are right.

    Explicit threading is hard to get right and the bugs are insane. The typical proposed solution, "use a functional programming language and abandon all state", is like using sarin on ants. Effective, and pretty neurotoxic.

    The researchers propose that humans annotate languages with information about intended uses of variables which is not too difficult to infer in the cognitive model of the programmers, and then let the the compiler take care of the coordination of the threads. As it should be.

  17. Re:Or.. teach devs to use threading as appropriate by doublebackslash · · Score: 1

    I wrote up a very long comment that ranted and raved and didn't come close to this comment right here. Some threading problems are hard, really hard. They aren't that common.

    Man up and learn. I like it.

    --
    md5sum /boot/vmlinuz
    d41d8cd98f00b204e9800998ecf8427e /boot/vmlinuz
  18. Re:Or.. teach devs to use threading as appropriate by Desler · · Score: 1

    The correct solution is to switch from C-based imperative languages to pure-functional implicitly parallelised languages, but that's not likely to happen before the heat death of the universe.

    Well, yes. Are you going to be the one to pay the 10s of billions of dollars, if not 100s of billions, to rewrite all software in existence into a functional language?

  19. Doubling processors would not restore Moore's law by rroman · · Score: 1

    Doubling of processor cores doesn't imply that things can be twice as fast. There is actually pretty hard science going on researching what can be done with N processors to speed up problem solving. It turns out, that even with large numbers of processors we are not able to speed up majority of algorithms in any significant way. Maybe in real world, few additional processors could compute some basic stuff in the meantime to speed things up, but I doubt that doubling the number of processors will add significant speed to computers when the number of them is large enough.

  20. Re:Or.. teach devs to use threading as appropriate by Desler · · Score: 1

    I think the point is that just because something is hard is no reason not to teach it to people. If no one knew every learns these low-level details who is going to be maintaining the compilers?

  21. Re:Or.. teach devs to use threading as appropriate by Nerdfest · · Score: 1

    There are plenty of threading frameworks in most languages where you can just define threadable operations. Your job is simply to ensure the task is correct, use the right framework and trust it to the correct degree. As with many things, someone only needs to do it right once.

  22. Re:Fast First Post by smittyoneeach · · Score: 0

    It means that, for a Schrödinger moment, you can be goatse'd by your own foot, before it comes to rest in your mouth.

    --
    Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
  23. Re:Or.. teach devs to use threading as appropriate by Intropy · · Score: 2

    This is totally true.

    So long as you define trivial programs to be a subset of programs that can be proven correct.

  24. Re:Or.. teach devs to use threading as appropriate by betterunixthanunix · · Score: 2

    Or, gawd forbid.. we could teach programmers how to use threading?

    No, we want our compilers to do these things for us, because things that compilers can do they usually do better than humans can. Once your programs become large enough and complex enough, compilers outperform humans every time -- it's not about laziness, it is about the limits of human programming ability.

    Compilers surpassed humans when it comes to optimization a very long time ago, except for very small programs or very short inner loop bodies.

    --
    Palm trees and 8
  25. Re:Or.. teach devs to use threading as appropriate by databeast · · Score: 2

    Bingo.. and as many other folks have pointed out, 90% of of threading is purely to decrease latency and bypass blocking operations - very few applications out there today are heavily dependent on concurrency to achieve any kind of raw horsepower... This does just seem like, if it were to become mainstream, would just become a crutch for developers to ignore threading and blocking I/O issues entirely, because "The compiler will sort that out".. In theory, this isn't necessarily a bad thing. In theory, all things work in practice too.

  26. Rubbish by Anonymous Coward · · Score: 0

    This is not auto-parallelising. This is adding in hints to the compiler to allow it to parallelise code. That's not automatic, that requires the programmer to do it. And with zero performance analysis in the paper, wtf? Seriously, this is a much harder problem than is being made out to be, and there is not one size fits all way of doing it. So they've found a neat way of doing things, so have other people. They work well in some circumstances, and not in others. Go figure. Paper probably quite interesting, but article complete tripe.

  27. Performance.. by Anonymous Coward · · Score: 0

    So, first we create a virtual machine and a new language, called c#. That cost about 50% performance, but that's disregarded as 'irrelevant because desktop apps don't need speed' and 'computers only get faster'.

    Then, when halved performance, we go seek ways to split inefficient single core code and split among two or more threads.

    So basically, we found a way to have 2 processors do the job of one. Doubling energy usage. I'm impressed.

    1. Re:Performance.. by Anonymous Coward · · Score: 0

      If you know how to code C#, it doesn't cut performance in half except when comparing against SIMD, which then it's more like 4x-6x reduction.

  28. Re:Or.. teach devs to use threading as appropriate by Hentes · · Score: 1

    Most parallel problems tend to fall into a small set of categories, and I see no problem with abstracting them away. There are many libraries already that handle the threading for you.

  29. Re:Or.. teach devs to use threading as appropriate by betterunixthanunix · · Score: 2

    How about we write new code in a functional language, instead of clinging to imperative languages? Put C, C++, and Java in the category they belong in: maintenance and legacy code only.

    --
    Palm trees and 8
  30. Re:Or.. teach devs to use threading as appropriate by Jonner · · Score: 5, Insightful

    An interesting development, and much needed I fear, but yet another layer of abstraction to allow lazy developers to not have to really bother about knowing what their code is actually doing (that's for the poor SoB who has to maintain it is for...)

    Developing software is all about managing complexity. Abstractions are the primary tool used to do so. They are neither inherently good or bad. A large part of writing good software is finding the appropriate abstractions and eliminating inappropriate ones. If an abstraction allows a program to be written more easily with an acceptible level of performance and correctness, it is an appropriate abstraction.

    To know what code is "actually doing" is relative. No programmer knows what his code is doing at the level of gates on the chip. It's rarely necessary or even helpful to know what the code is doing at the level of CPU instructions or microinstructions. This is especially true if the code runs on multiple CPUs.

  31. Re:Or.. teach devs to use threading as appropriate by Desler · · Score: 1, Informative

    Compilers surpassed humans when it comes to optimization a very long time ago, except for very small programs or very short inner loop bodies.

    Hahah, no they didn't. If this were true anything multimedia-related would not still require tons of hand-optimized SIMD code. Even the best of C or C++ compilers are terrible at vectorization of code. Just compile something like x264 with the best C compiler you can find and compare it to the assembly-optimized version for, say, the latest i7 processors. The difference in speed will be quite huge.

  32. Re:Or.. teach devs to use threading as appropriate by Anonymous Coward · · Score: 0

    And what in the research paper itself would you dislike, then?

    In particular, what do you dislike, more than all the "layers of abstraction" a compiler already adds between you and assembly-level management of code and machine state?

    As far as I'm concerned, FP / immutability *is* the thing programmers should learn to describe, not the threading details themselves. Describing immutability is more profound than the actual threading details. If you want to execute code out of order in sequence (=multi-threaded) and have a predictable result, it needs to be composed only of commutative operations, and operating only on immutable values (at least for the section that is executed in parallel, rather than in sequence). Threading doesn't allow you to get around that, either way. And people already largely know how to describe operations (it's up to the library developers to decompose as much of the basic ones into commutative ones, though). In my opinion, the main task is to make people work with immutable values, nothing else.
     

  33. Re:Or.. teach devs to use threading as appropriate by doublebackslash · · Score: 1

    It is possible to prove a multi-threaded program correct.

    First you might start at the memory model, and all the guarantees that one can make based on that (and, by association, all the guarantees in terms of the memory model that locking / atomic calls make), then one can move on to he library structures and routines (objects and methods to put it in another paradigm).

    Once you have hard guarantees using primitives and low level constructs it might be easy to construct a state-based proof. One example is Cliff Click's lock free hashtable (don't know ifhe has a formal proof, but nearly so if not)

    Correctness in multithreaded environments takes a different form than in a single thread, but it is nothing that cannot be managed. Generally there are only a few "tough locks" to crack and the majority of the speedup can be had. Some problems are harder than others, and some problems don't have great parallelism, but that is just a generalization of programming, really. "Some things are harder than others, but nothing doable is impossible"

    --
    md5sum /boot/vmlinuz
    d41d8cd98f00b204e9800998ecf8427e /boot/vmlinuz
  34. Meh. Not that big a problem. by fyngyrz · · Score: 4, Insightful

    Honestly, you know what this says? This just says some programmers still need to go practice in the threading space.

    Threading is not that hard at all for a VERY large class of problems. There are issues -- like, the memory bandwidth will eventually choke the ability of processors to get at nearby memory regions -- and you need some kind of sane plan to deal with cacheing in one core as compared to the next core when you share flags or data coming in during execution from elsewhere -- and you need to figure out when thread overhead starts to chew into your efficiencies -- but really, it's entirely doable, and isn't that difficult a problem space as compared to the actual problems *serious* programmers have to solve. It may just be that your "thing" isn't really all that threadable. Some things aren't. But it should also be very obvious when they aren't.

    Then again, some problem spaces lend themselves very well to multiple core approaches -- I just finished a library that slices up images and then applies various separable algorithms to the slices in a variable number of threads (the user of the library can specify.) I wrote it in pure C, used posix threads, no sweat at all, in my 8-core machine, you get just about exactly the gains you'd think (x the number of cores) when the process is CPU cycle intensive, less as the ops are simpler and memory I/O rates rise.

    Seriously. If threading seems that hard, you need to go do some studying. It's not the threading. It's you. You really should get on this bus.

    --
    I've fallen off your lawn, and I can't get up.
    1. Re:Meh. Not that big a problem. by Desler · · Score: 2

      You seem to have replied to the wrong person as I'm not seeing what your post has to do with mine.

    2. Re:Meh. Not that big a problem. by rk · · Score: 5, Funny

      Maybe it's subtle sarcasm regarding learning threaded programming? People can't identify the right thread in a conversation, but should be expected to write threaded code? :-)

    3. Re:Meh. Not that big a problem. by ceoyoyo · · Score: 0

      Ssh. You're going to hurt some poor web programmer's feelings.

    4. Re:Meh. Not that big a problem. by grumbel · · Score: 1

      Seriously. If threading seems that hard, you need to go do some studying.

      I very much disagree. Spawning a few worker threads is easy, however when you want to write an application that essentially puts everything asynchronously into the background (not just for speed, but also interactive responsiveness) then things get really complicated really fast, doubly so when programming in a language with no build in support for parallel programming.

      The core problem with parallel programming is that it bloats your code a lot, things that were trivial in sequential code turn into code that is five times the size and a nightmare of callback chaining.

    5. Re:Meh. Not that big a problem. by GiganticLyingMouth · · Score: 1

      You're describing an embarrassingly parallel application, the kind where you can actually get linear speedups. Unfortunately many problems don't fit into this category, and once you start getting into inter-thread data dependencies, jagged load balancing, and debugging race conditions it's an entirely different ballgame, not to mention massively parallel stream processors (i.e. GPGPU). Easy threading is easy, but deriding it as something less than "serious" programming belies a degree of ignorance.

    6. Re:Meh. Not that big a problem. by Anonymous+Brave+Guy · · Score: 3, Insightful

      It may just be that your "thing" isn't really all that threadable. Some things aren't. But it should also be very obvious when they aren't.

      Should it? Do you really want to bet that you understand a modern CPU, the related hardware architecture, and a production quality compiler for that platform, all well enough to predict reliably when it's worth running a few instructions in parallel and when it isn't, taking into account the inevitable overheads of fork/join operations vs. the potential performance gains of concurrency? Can you really analyse the global structure of a 1,000,000+ line project and reliably identify algorithms that are expressed sequentially but in fact could be transformed losslessly into parallelised equivalents?

      Implementing efficient algorithms for embarrassingly parallel computations is (relatively) easy. Implementing efficient algorithms for basically independent parallel operations, such as a web server, is (relatively) easy. Implementing efficient algorithms for problems that might admit some asymmetric parallelism as long as all interactions are coordinated safely and where it might be worth implicit generation of some parallel code paths as long as the benefits outweigh the overheads... Well, if that doesn't seem hard to you, I look forward to reading your papers, because you just outsmarted a lot of researchers who have been investigating this problem for years.

      --
      If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
    7. Re:Meh. Not that big a problem. by chthon · · Score: 2

      The problem with generalised parallel programming is that the possible interactions between all threads are on the order of O(n!). That is the big problem. That is also why coordinating parts of the program around synchronising objects makes the program easier to understand, because in the same way that O(n!) grows very fast, it also shrinks very fast if n can be reduced.

      Another problem I see is that the current core capacity is too large for running small problems.

      Take a complex mathematical expression, which can be transformed into a syntax tree with values and mathematical operations. In the set of mathematical operations, there is a big difference between additive, multiplicative and transcendent operations. Either the transcendent operations also need to be transformed into a syntax tree which can be executed on many more, much simpler nodes, or the tree must be balanced into transcendent operations, with the other simpler parts all to be run on one node with the same cost as the transcendent node.

    8. Re:Meh. Not that big a problem. by fyngyrz · · Score: 1

      I wasn't sharing data. So execution order is irrelevant. :)

      --
      I've fallen off your lawn, and I can't get up.
    9. Re:Meh. Not that big a problem. by fyngyrz · · Score: 1

      Should it?

      Yes. That's our job.

      Do you really want to bet that you understand a modern CPU, the related hardware architecture, and a production quality compiler for that platform, all well enough to predict reliably when it's worth running a few instructions in parallel and when it isn't,

      Yes, of course. Do you think a "modern CPU" is suddenly going to borrow your local stack from one core to another without having switched the rest of the relevant context as well? Do you think it will be able to tell who's on first without a mutex or similar mechanism? Do you think it won't be able to tell with one if you use it properly? Seriously, come on. The day they design a "modern (multicore) CPU" that doesn't provide for the basic mechanisms that make multi-core algorithms practical, is the day that CPU is a complete flop. Likewise, the day you use a multicore CPU without using those mechanisms properly is the day your program is a complete flop. You have to learn them. It's the bedrock that underlies the task.

      As for your question about how much code to switch, first of all, benchmarking in both unloaded and loaded systems is part of the ground work. At some point it becomes obvious: if you're going to be executing just a few lines of mundane memory-tossing, then the overhead of the threading isn't worth the time. You need to understand the basics: do all your cores get equal access to FPUs? How does the OS allocate cores? How many can you get? How many are there? Do you have control? Does the OS have control? Do you want to give the user control? These things aren't "hard", they're just part of the problem space.

      Can you really analyse(sic) the global structure of a 1,000,000+ line project and reliably identify algorithms that are expressed sequentially but in fact could be transformed losslessly into parallelised equivalents?

      Wait a minute. Now you're talking about trying to re-factor someone else's code. That's a whole nuther ball game than designing a system involving some degree of parallism and implementing it. OPC is an endless nightmare -- there's no limit to how bad it can get. If you ask me if I can design such a system, and implement it, then the answer is yes, almost certainly, as long as I can comprehend the problem space. If you're asking me if I can take someone else's single threaded, unknown quality code and turn it into a parallel version of itself, even given that I do understand the problem space, the answer is most likely "you can't afford me", but if you can, then the answer is a question: "do you have time for me to design it and re-implement it?" At this point, your answer is my answer.

      I look forward to reading your papers, because you just outsmarted a lot of researchers who have been investigating this problem for years.

      I didn't offer a "paper" -- I am not an academic -- nor did I contend that this knowledge is something that can be described in a few paragraphs from some ivory tower. I said that in order to succeed at it, you need to study until you get it. If you're a decent programmer, you can do it. That's not to say that I'm claiming any specific individual a decent programmer. Just that if you are, this is something you can do.

      --
      I've fallen off your lawn, and I can't get up.
    10. Re:Meh. Not that big a problem. by Acy+James+Stapp · · Score: 1

      You seem to be describing a class of problems known as "embarrassingly parallel", as in "I'm embarrassed when I can't solve this problem in parallel." There are other parallel problems that are more difficult and that require actual brainpower to solve. Perhaps you just haven't run up against one.

      --
      -- Too lazy to get a lower UID.
    11. Re:Meh. Not that big a problem. by fyngyrz · · Score: 2

      I very much disagree.

      Fine. But that doesn't make it so.

      however when you want to write an application that essentially puts everything asynchronously into the background (not just for speed, but also interactive responsiveness) then things get really complicated really fast, doubly so when programming in a language with no build in support for parallel programming.

      Perhaps you and I simply have different appreciations for the word "complicated." The only "support" I require for parallel programming are concurrency controls such as mutexes and a means to (hopefully, efficiently) launch a thread. I've written multi-threaded image processing, SDR code (and believe me, that's the poster child for requiring responsiveness... it's pure realtime), AL/evolutionary code, ray tracers, and so on. Me, a c compiler, and a lot of pizza. :)

      The core problem with parallel programming is that it bloats your code a lot, things that were trivial in sequential code turn into code that is five times the size and a nightmare of callback chaining.

      Sounds to me like that's a consequence of your approach, or perhaps your tools, because "a nightmare of callback chaining" is nothing that's ever happened to me. Quite the contrary, in fact. So it goes from your "core problem" to no problem at all in my case. I would go so far as to venture the suggestion that you may be doing it wrong, or poorly, or with a really problematic language. Or perhaps you just need more pizza. As I said above, it takes study.

      It's a skill set that's very similar to that required to learn assembly optimization. What I mean by that... where you move invariants outside loops, where they only get computed once; while the stuff that actually needs to change constantly is in the inner loop. You have to be able to look at an algorithm and see what needs to be computed within the current context, what comes from outside, what is interdependent, and so on. Without that insight, you can't code this stuff at all. But like learning to optimize assembler, once you learn that this is what you need to know about things, you begin to perceive it, and eventually, you'll get good at perceiving it.

      I've always felt that the best programmers (in the sense of creating the most efficient code in both space and execution time) are those who understand assembly programming extremely well, can look at the assembly code a c compiler produces and read it like it was this morning's news, and have a broad familiarity with a pretty good set of HLLs. For me, c is the ideal tool -- it is at precisely the right level to let me do what I need to do, no matter what it is; if I need OO, I can implement objects of my own design. If I need a thread manager, I can write it. If I need some specific type of control over memory allocation, or list management, associative arrays, etc., I can write it. Or lift it from a large collection of stuff I've already written. The higher up you go from c, and the more you depend upon other people's code and libraries / objects, the more control you lose. If that's ok, fine. But for parallel programming, I've never found that it was ok. Your milage, of course, may differ. I would only point out that I am very comfy with threaded programming, and you do not seem to be. There may be some useful information in that for you.

      --
      I've fallen off your lawn, and I can't get up.
    12. Re:Meh. Not that big a problem. by fyngyrz · · Score: 2

      The problem with generalised parallel programming is that the possible interactions between all threads are on the order of O(n!).

      Only when the threads share data; and that's what mutexes, etc., are for. If you share data without mutexes, it had better be 100% invariant (table of constants, etc.) Or actually not shared: different regions of an image where the process at hand doesn't have inter-region dependencies. So really, it's not O(n!) in any practical sense.

      Take a complex mathematical expression, which can be transformed into a syntax tree [rest of very well presented stuff clipped]

      Also, you need to take account of (and make the end user aware of) the difference between a real core and hyper-threading. Because while you may have run it on a quad core, they may run it on a "hyperthreaded dual core", and this is not the same thing at all, unless you can arrange for those stalls (in your hypothetical slow transcendent ops) to go do something else useful in your overall context.

      Sometimes the right answer to the type of problem you present is to move the goalposts. Sometimes you can use fixed point, tables of sin/cos, etc. 16-bit audio data is a good example of where this is easily done for certain types of operations. Generate the table(s) once on startup, thereafter it's no slower than any other memory access, yet it's still as transcendent as it needs to be.

      A concrete example: If you're going to rotate pixels in an image, then, you could use "nx = (x * sin(theta)) - (y * cos(theta))" as part of the job, if you wanted to do it that way. But you can observe that theta is invariant, so you only need two constants: sin(theta) and cos(theta); you don't have to run sin() and cos() for every pixel. So now your potentially stalling transcendents are of no consequence; your execution is "nx = x*StoredSinConst - y*StoredCosConst." (and of course, if you process the image Y by X, then you don't have to calculate ny except once per line, either, and you can substitute addition for multiplication if you process the cartesian space sequentially, but that's a different issue. Sorry. :)

      --
      I've fallen off your lawn, and I can't get up.
    13. Re:Meh. Not that big a problem. by fyngyrz · · Score: 1

      You're describing an embarrassingly parallel application [clippage] ...Easy threading is easy, but deriding it as something less than "serious" programming belies a degree of ignorance.

      What part of "some problem spaces lend themselves very well to multiple core approaches" did you not understand?

      --
      I've fallen off your lawn, and I can't get up.
    14. Re:Meh. Not that big a problem. by fyngyrz · · Score: 1

      See this post.

      --
      I've fallen off your lawn, and I can't get up.
    15. Re:Meh. Not that big a problem. by ultranova · · Score: 1

      It's a skill set that's very similar to that required to learn assembly optimization. What I mean by that... where you move invariants outside loops, where they only get computed once; while the stuff that actually needs to change constantly is in the inner loop.

      You're talking about common subexpression elimination, which is usually automated and has nothing to do with assembly specifically. It's also not what people generally think when they heard "assembly optimization".

      For me, c is the ideal tool

      Well, since you seem to like doing by hand what the computer could easily do automatedly, I have to agree.

      --

      Forget magic. Any technology distinguishable from divine power is insufficiently advanced.

    16. Re:Meh. Not that big a problem. by Anonymous+Brave+Guy · · Score: 1

      I'm sorry to be trollish, but you seem to be talking down to the rest of us as if we just don't understand, with comments about how you understand the problem space, we couldn't afford you, and so on. However, at this point, it's abundantly clear that you don't even recognise the problem we're considering. Frankly, I'm not convinced you even looked at TFA either, or understood it at all if you did.

      What you're describing is the stuff we all considered decades ago. Designing an algorithm to solve a simple, embarrassingly parallel problem is relatively easy. Implementing that algorithm using a primitive threading+locking model is also fairly easy, though even then the evidence to date suggests that such programming models are usually too error prone for critical uses even in the hands of experts, hence the exploration of alternatives such as actor-based models with minimal or no shared state, message passing for inter-process communication, and so on.

      What the rest of us are talking about is automatic parallelisation, either on a small scale as a kind of micro-optimisation (such as auto-vectorising a loop to use SIMD type instructions) or even on a medium/large scale as a systematic transformation from an algorithm expressed sequentially into one that exploits parallelism implicitly where it is cost-effective in performance terms to do so.

      Unfortunately, as Desler correctly said in the post you first replied to in this subthread, compilers are mostly still quite poor at employing these automated transforms even in the simpler, localised cases. Research into the more powerful but challenging cases is barely out of its infancy, and some of the smartest guys in the industry are working on it. TFA is interesting because it suggests potentially interesting developments in the field.

      --
      If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
    17. Re:Meh. Not that big a problem. by Anonymous Coward · · Score: 0

      Seriously. If threading seems that hard, you need to go do some studying. It's not the threading. It's you. You really should get on this bus.

      The problem is not how hard it is.

      The problem is what happens when bugs occur.

      Threaded code can have bugs that are uniquely difficult to find, reproduce, analyze, and debug.

    18. Re:Meh. Not that big a problem. by CodingHero · · Score: 1

      Honestly, you know what this says? This just says some programmers still need to go practice in the threading space.

      . . .

      Seriously. If threading seems that hard, you need to go do some studying. It's not the threading. It's you. You really should get on this bus.

      That's true. Threading concepts aren't hard, they're just not as widely taught as they should be. In my computer/software engineering education, I can't recall any required classes that really put a focus on developing multi-threaded applications.Operating Systems took a look at some of the issues associated with multi-tasking and thus multi-threaded programming, but it didn't really go so far as to force you to develop complex applications that put such lessons into practice.

      The real issue is that debugging your multi-threaded application can be difficult and becomes more difficult as your application becomes more complex, requiring more and more threads. Maybe that's not hard if all your threads are doing exactly the same thing, but if they're all heterogeneous then it becomes tough. How do you know what caused this dead-lock or that crash? Why does it only happen sometimes and not others? Why did adding this line of code here break all the other threads? It's not an easy problem to solve and requires either better tools (that don't exist yet) or a lot of experience to get a feel for where to look and what to do (or not do).

      Then again, some problem spaces lend themselves very well to multiple core approaches -- I just finished a library that slices up images and then applies various separable algorithms to the slices in a variable number of threads (the user of the library can specify.) I wrote it in pure C, used posix threads, no sweat at all, in my 8-core machine, you get just about exactly the gains you'd think (x the number of cores) when the process is CPU cycle intensive, less as the ops are simpler and memory I/O rates rise.

      If you are trying to accomplish the same amount of work in a shorter amount of time, Amdahl's Law calls you a liar. If, on the other hand, you're trying to do more work in the same amount of time, then Gustafson's Law agrees with you but you will still NEVER get a speedup of N by adding N cores; it will always be somewhat less than N.

    19. Re:Meh. Not that big a problem. by GiganticLyingMouth · · Score: 1

      The part where you then go on to state that that parallel programming is trivial (or at least, something beneath *serious* programmers), because you managed to code up an embarrassingly parallel application.

    20. Re:Meh. Not that big a problem. by rufty_tufty · · Score: 2

      I think the problem is the original article was about an approach to add another route to parallelise an existing bit of arbitrary code.
      Fyngyr jumps in and says something along the lines of 'well if you just wrote with parallelism in mind in the first place when architecting the system you wouldn't have this problem.' A viewpoint I agree with and in fact I would extend to say that from my viewpoint most systems are easier to design to be parallel in the first place than to force single threaded onto them once you are trained by whatever methods to think in this way.
      Now is where I think the problem comes, most SW engineers are so used to working in single threaded worlds and are so trained to think in single threaded modes it's all they know. The education and languages are so geared to single threaded viewpoint that this is understandable.
      So we end up with Fyngyr saying 'just design it to be parallel in the first place' and yourself correctly pointing out that not all problems can be parallel. Then the reaction of most SW engineers is that this is a compiler problem and we're back to TFA.
      I'd rather say that there is middle ground. IME some parts of the problem are inherently parallel, some are inherently serial and forcing one to be something it isn't is plain silly. The fact that the language encourages you to iterate serially over an image one pixel at a time without being able to say this can be done in parallel is to me daft. The fact that when most people design a large system they get the single threaded version working first then try and add in parallelism is totally wrong. I could rant all day about this but I think you get my point. Which is that while not all large problems are trivially parallelisable most are at least to some limited (but still huge) extent, so the fact that it cannot be solved for the totally general case should not stop it from being a preliminary design consideration and the fact that it often isn't is I think a shameful declaration of the state of the software industry.
      Now the coarsely grained multi-threaded-ness I'm talking about here may become invalid when we've got 256K core processors, but we're a way off that yet.

      As a side note to make my point, consider the times when your computer seems slow. What is it doing? Perhaps rendering a web page while your backup application is running in the background? Perhaps you also need your mail app to be checking the server at the same time as you've got music playing in the background. In any of the cases i can think of where my computer seems slow to complete a task it is a completely trivial problem and there is no excuse outside of poor system architecture. (this is of course ignoring the compute intense stuff that I might do like let's say transcoding video or rendering 3d, or playing computer games - oh wait, also trivially parallel.)

      --
      "The weirdest thing about a mind, is that every answer that you find, is the basis of a brand new cliche" -
    21. Re:Meh. Not that big a problem. by Anonymous+Brave+Guy · · Score: 1

      I think the fundamental problem here is an assumption that how we describe code and how it executes should be the same. There is no reason they should be, other than mechanical ease of translation by compilers and similar tools from one to the other. But the job of a good compiler/optimizer is precisely to allow me as programmer to express my intent in a convenient form, and to turn that into an efficient representation of that intent that my hardware can execute.

      In other words, just because I described an algorithm in parallel terms using say fork/join semantics or a message passing/event system, the compiler should still be free to implement it using full threading, green threads, or even some sort of single-threaded super-efficient state machine thing, and ideally I shouldn't need to know or care which it used. Similarly, just because I described an algorithm iteratively, ideally the programming model/language semantics would be expressive enough for a compiler to figure out how much sequential behaviour actually mattered and what was just coincidental, and if necessary to generate parallel execution paths under the hood.

      We don't really have tools that do these sorts of transforms very well at all yet, probably in part because we haven't really figured out which theoretical models for describing programs actually let us write compilers to do those conversions in practical situations yet either.

      --
      If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
  35. No, it cannot by gweihir · · Score: 0

    First, many problems do not have significant speed-up when parallelized. No automated tool can change that. Second, parallelizing things has absolutely nothing to do with Moore's law. Maybe look up the definition some time?

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    1. Re:No, it cannot by Alkonaut · · Score: 1

      Maybe read the article? Maybe just the title? "Help restore the GAINS of Moores law.". That is, since these days the extra transistors we put on cpu:s just go to more cores and not faster cores, we don't gain as much as we used to until 2004. These days to actually GAIN from Moore's Law, we need code that actually use these transistors, i.e. parallel code.

  36. Re:Or.. teach devs to use threading as appropriate by Desler · · Score: 0

    To further add, this is doubly compounded when running such things on, say, ARM processors. Anything multimedia related that doesn't have NEON SIMD or isn't using some form of hardware acceleration will run terribly slow relying only on compiler optimization.

  37. Re:Or.. teach devs to use threading as appropriate by Anonymous Coward · · Score: 0

    Threading as it is today is inherently non-deterministic, which makes isolating and fixing threading bugs very difficult. It doesn't matter how comfortable you are in writing threaded applications. At some point you -will- introduce thread-related bugs in your application. In fact many threaded apps run for years with threading bugs lying in wait and they aren't triggered until some later change alters conditions just enough for a race condition or other problem to be exposed. Those dormant bugs are a SERIOUS PITA to debug and fix because you look through recent changes in your version control and everything looks innocuous.

    Anything that can allow a compiler to prove correctness of concurrent execution is a welcome improvement. I'm all for additional education, but I wouldn't poo-poo this type of advance in compiler technology.

  38. Re:Or.. teach devs to use threading as appropriate by doublebackslash · · Score: 1

    That is because of knowledge of processor details, memory details, comple operations, and, well, they [compilers] are better than us. Except that the ability to optimise depends on pure logic, of some form. As the state gets large optimization gets more complex.

    Just like quicksort(3) is far faster than bubblesort so too is a highly threadable code faster than non-threadble code. Languages do not, contrary to belief, express intent, the provide a strict set of instructions that the computer MUST respect. In the end a good algorithm with no compiler help will beat optimized "dumb" code in all cases larger than "toy" (say, a few dozen "n" in Big-O notation)

    Good thought about opimizing compilers, but it doesn't bear out.

    --
    md5sum /boot/vmlinuz
    d41d8cd98f00b204e9800998ecf8427e /boot/vmlinuz
  39. Oh! My Sweet Day. by 140Mandak262Jamuna · · Score: 2
    All these days of careful coding diligently avoiding static_casts and const_casts ... All those recompilations triggered because I had to touch one header file used by everyone and his brother to satisfy my strict insistence of const correctness... I paid and paid and paid all these days. I avoided mutable data members because Soustroup pontificated in some vague posting in comp.lang.c++ (OMG I even forgot the usenet group name!) that "a well constructed object oriented code should not require mutables".

    This is my pay day baby! My code is going to run super fast in multicore machines without any (more) extra work from me! Wooo Hooo!

    Please take pity on my and let me enjoy my day dream for a few hours, without rudely inserting reality into my reverie! Thanks slashdotters, I know I can count on you!

    --
    sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
    1. Re:Oh! My Sweet Day. by phantomfive · · Score: 1

      Do you not feel that you benefit already from using const, in that your code has fewer bugs?

      --
      "First they came for the slanderers and i said nothing."
    2. Re:Oh! My Sweet Day. by 140Mandak262Jamuna · · Score: 2

      That is the only and correct reason to be a const correctness nazi. Auto threading and auto vectorization would produce a few papers and mint some PhDs. But for the guy hacking code to make a living, they are not going to make much diff.

      --
      sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
    3. Re:Oh! My Sweet Day. by phantomfive · · Score: 1

      Yeah, I had my eyes opened to the world of const correctness when I learned some functional programming recently. I used to think it was alright, but when I realized the effort it saved me in not having to read through whole sections of code in order to verify that they won't cause problems, I started really liking it.

      --
      "First they came for the slanderers and i said nothing."
    4. Re:Oh! My Sweet Day. by serviscope_minor · · Score: 1

      I started really liking it.

      indeed. I started liking it more and more as my youthful arrogance wore off and I realised I was not half so clever as I though I was. const correctness allows me to be dumb, by preventing me from having to reason about nearly so much code.

      I would like to see it go further in C++, though.

      One thing I dislike is the lack of transitive const. This gives a nasty disconnect between slices and values, unless you use raw pointers.

      If you have a function accepting const double*, then you are going to be accessing that data immutably (assuming no aliasing by a non const pointer). It doesn't matter where the data comes from and most importantly, a double* automatically converts to a const double*. In other words, the logical target, not the data (the pointer itself) becomes const.

      For other classes, it is always the data, not the logical target which becomes const, and there is no way of encoding that you want something to become logically const. The reason this matters is that if you need (assume ASliceClass holds a pointer and a size):

      void some_func(ASliceClass);

      byt ASliceClass won't automatically convert to a const double variant. It will instead convert to a const ASliceClass which is, of course entirely different.

      One can write conversion operators, but that is (a) boilerplaty and therefore bad and (b) doesn't work if the lookup gets more complex (like in the case of having some_func templated).

      One can use all sorts of interesting hacks, but fundementally, language support for passing on constness to data members would be a much better solution.

      The more constness, the better!

      --
      SJW n. One who posts facts.
    5. Re:Oh! My Sweet Day. by phantomfive · · Score: 1

      Maybe. It sounds like too much thinking to me, though. Recently I've been fairly satisfied with the solution of 'final' (since I mainly use Java at my current work) and using static methods to avoid state in general. Maybe when I have more experience like you, I will come over to your viewpoint as well!

      --
      "First they came for the slanderers and i said nothing."
    6. Re:Oh! My Sweet Day. by serviscope_minor · · Score: 1

      Maybe. It sounds like too much thinking to me, though.

      The opposite. Generally if things are immutable, then they can't change under you so there's less to worry about. Mutation leads to thinking (for me). The problem with C++ is that it is insufficuently expressive: you can't easily express a lack of mutability without introducing horrible syntatctic overhead (syntatic chaff?) for a moderately wide class of objects (slices).

      Final is good, and it's very similar to const in C++, but suffers the exact same problem.

      Basically, final in Java seems (to me, I'm not an expers) to be equivalent to makeing a pointer immutable in C++, i.e. you can't reassign.

      Const in C++ goes a bit further: you can also make all members of a class immutable. i.e., you can make the target of a pointer immutable as well.

      However in both cases (C++ and Java) if the embedded classes contain pointers/references to other classes, you can still mutate those targets even though the class in question is const/final.

      In both languages, there's no way of saying "it's impossible to ever alter anything through this object".

      TL;DR final is good, bit I want more!

      --
      SJW n. One who posts facts.
  40. Teach programmers by Culture20 · · Score: 2

    The holy grail of parallel processing is teaching programmers how to handle parallel processing (and what domains can benefit and where).

    1. Re:Teach programmers by Anonymous Coward · · Score: 0

      Agreed. Oddly enough, I don't see the hordes of people here on /. who complain about how HR/management doesn't think there skills are fresh rushing over to defend this point of view.

    2. Re:Teach programmers by Anonymous Coward · · Score: 0

      The techniques are fairly straightforward; monitors, semaphores, shared memory, scatter/gather, pipelines, intercommunication.

      The problem is that the programmers end up maintaining two more branches of the same code; the single-threaded code, and the multi-threaded code. Just as soon as CPU's (or GPU's) change architectures (clock speed, cache size, cache entry size) the multi-threaded code has to be reoptimized manually. You have to match the "chunk size" of every task with the processing speed of each core. Even if the language just gets "enhanced", that requires maintenance to scrape off the bit-rot.

    3. Re:Teach programmers by Alkonaut · · Score: 1

      This research doesn't change that. What parallel programming is about, is currently three steps 1) find parts of code that can execute in parallel (reasonably simple), 2) make sure there is no shared mutable state (hard), 3) make correct threading implementation (tedious

      The problem with todays (OO/imperative) languages and tools is that it is exceedingly hard to make sure that state isn't shared. It is also very hard to test for, and find bugs related to shared state. This research helps with step 2. You still have to figure out where these boundaries are, but you can make sure it is correct, by letting a compiler check this for you. It can also help you with step 3, but if your assumptions are correct that isn't hard in current tools, just tedious. Things like TPL and PLink have greatly simplified step 3), but what I assume MS have found out is that with such power to parallelize, developers are spending more and more time in step 2, thus gaining very little.

  41. functional programming by segfault_0 · · Score: 2

    A compiler may be able to thread your program, but it will be a long time before it understands your intent well enough to do it well. Also I can think of many situations under which such functionality may not even be a good idea at all. I would assume such a system would use a pool of threads to avoid constant thread construction overhead and if you get a few IO-busy actions threaded out in an ignorant fashion I think you will find your program blocking a lot more, and producing a lot less throughput than it should.

    Also, the OP blithely stated that functional programming isn't happening -- yet features of the functional paradigm, like anonymous functions, have made their way into nearly every language of consequence and many new languages proudly tout functional programming features (see f#, scala, rust, clojure, and go). Perhaps pure functional programming is moving pretty slow, but it's features are not.

    --

    I was crazy back when being crazy really meant something. (Charles Manson)
    1. Re:functional programming by shutdown+-p+now · · Score: 1

      A compiler may be able to thread your program, but it will be a long time before it understands your intent well enough to do it well.

      Note that they still require annotations in your code to do it right. The advantage here is that you work on a higher level - instead of stating the intent ("run this on a separate thread"), you state mutability/immutability of data, and existence and non-existence of dependencies, letting the compiler figure out how a given dependency graph of actions can be optimally threaded.

      I don't know what they do for IO-busy actions, but there's no reason why the compiler can't automagically transform a sync blocking I/O call to an async non-blocking one (with a callback corresponding to the continuation of the caller), which allows you to free up the thread. C# already does that, sort of, with async/await keywords - right now this isn't transparent, since you have to write something like "s = await stream.ReadLineAsync()" instead of "s = stream.ReadLine()", but this can obviously be done as a mechanical transformation if so desired.

    2. Re:functional programming by segfault_0 · · Score: 1

      It would have to invert control, generally asyncIO requires callbacks. Then once you get into that realm you have to worry about the stack and automatic variables not being scope in the callback. It wouldnt be simple.

      --

      I was crazy back when being crazy really meant something. (Charles Manson)
    3. Re:functional programming by shutdown+-p+now · · Score: 1

      It would have to invert control, generally asyncIO requires callbacks. Then once you get into that realm you have to worry about the stack and automatic variables not being scope in the callback. It wouldnt be simple.

      Well, of course, so you lift any variables on the stack into the heap, same as you do for closures. It actually is simple enough - it's effectively rewriting in continuation-passing style, except that you do the transformation at certain specific points (where async calls happen), not at every point you can. And we know how to transform to CPS pretty well. In a language with closures (which C# has), you just rewrite the remainder of the method as a closure. It's a bit more tricky when async call is, say, in the middle of a loop, since then you'll need to translate the loop into a state machine, but it's all doable.

      Also, as already noted, this transformation is already implemented in C#, with the only difference from what I described being that it requires the programmer to explicitly mark points where there is an async call that should be rewritten. It would be trivial to change that to always rewrite for any call that is async.

  42. Communicating Sequential Processes by Marxdot · · Score: 1

    Or we could just pursue CSP so that we have a friendly concurrency model with which to parallelize software that needs it, or just to introduce asynchronicity in a simple way. Shared-memory concurrency is pretty poor, whether or not a human user of the language is responsible for it.

    1. Re:Communicating Sequential Processes by serviscope_minor · · Score: 1

      Shared-memory concurrency is pretty poor, whether or not a human user of the language is responsible for it.

      No, it isn't. At least not for certain classes of problem. For numerical computation, it's often a very easy way to program things, especially with OpenMP.

      --
      SJW n. One who posts facts.
  43. Nothing new here by stevel · · Score: 4, Informative

    This is nothing new. As in decades-old. Back when I was at DEC in the 1980s we had auto-threading compilers that worked very well on standard application code. Today, Intel has been doing auto-threading in its C++ and Fortran compilers for more than a decade and it is very effective for a large class of applications. Intel's compiler also has features that help you tweak the code for better auto-parallelism, notably the Guided Auto Parallel feature introduced in 2010. Other compiler vendors also have auto-parallelizing compilers.

    I've been in the industry for decades, and every couple of years someone comes out with yet another programming language or technique that is supposed to revolutionize application parallelization. This is just another one, and many years behind the rest of the industry from what I can tell.

    1. Re:Nothing new here by serviscope_minor · · Score: 1

      I've been in the industry for decades, and every couple of years someone comes out with yet another programming language or technique that is supposed to revolutionize application parallelization. This is just another one, and many years behind the rest of the industry from what I can tell.

      One of the best out there is the rather unfashionable OpenMP. Usually as a programmer, you know if your loop is accesing read only data, or if the writes are independent per iteration. The compiler often has a much harder time finding out, especially with C and C++ over FORTRAN.

      If you happen to know that for sure then a simple
      #pragma omp parallel for

      at the beginning of the for-loop pretty much solves it for you. The main point is that doesn't try to tackle the impossible task (doing it automatically in general), but instead hands that off to the user and makes the rest essentially trivial.

      If anyone thinks that it isn't an impossible task, then look up P-complete.

      --
      SJW n. One who posts facts.
  44. Re:Or.. teach devs to use threading as appropriate by SplashMyBandit · · Score: 3, Insightful

    Actually you don't need functional. Java works fine with multi-threading (well, it does for my daily use). In fact, that is the real strength of the garbage collection model of Java, it can track and manage efficient release of resources in a multi-threaded system. Also Java has direct language support for synchronization mechanisms and extensive library support for multi-thread friendly data structures (Doug Leah is a legend!). Lots and lots of multi-threaded programs get written in Java, without needing functional languages, it's just that we we do write them they do work (after we test for various possible thread-related problems), work well, and we don't have to bitch about needing to change paradigms to get things going (eg. needing to change to functional languages when they are not always the best fit for many classes of problem).

    Of course you'll hear those still stuck on legacy languages designed without multi-threading in mind whinging. That doesn't mean there isn't a straightforward, easy to learn language and development platform out there that isn't strong on multi-threading. There is - and it is cross-platform and general purpose too.

  45. Re:NO potential by Eskarel · · Score: 1

    Innovation doesn't mean "doing something no one has ever thought of before" it means "doing something no one has done before". If Microsoft's technique works where Carnegie-Mellon's didn't then it's innovative, if it doesn't, it probably isn't. There could still be something innovative in the way they tried which could help someone else later, but mostly what we'd be looking for from an innovation point of view is "actually does what it says in the tin".

  46. keep moving them goalposts! by __aaltlg1547 · · Score: 0

    People who opine that a compiler advancement can get Moore's law back on track. clearly have no idea whatsoever what Moore's "law" states.

    Moore's law has absolutely nothing to do with compilers or any other sort of software. It's about transistor count in integrated circuits. Specifically, Moore's "law" observed that transistor count of state of the art ICs had doubled every year since the IC was invented and he predicted that it would continue to do so for at least 10 years.

    And he was approximately right (the best one can expect for a pseudo-prophetic utterance). But over time, the rate of increase has fallen below what Moore predicted, with an average doubling time of about 2 years. Now tech roadmaps assume more like 3 years. Every step is harder than the last and eventually we will reach a point of diminishing returns where the investment necessary to double IC complexity won't be seen as worthwhile.

    1. Re:keep moving them goalposts! by flimflammer · · Score: 1

      Yes, indeed; you are about the second dozenth person who has made this point so far.

    2. Re:keep moving them goalposts! by Alkonaut · · Score: 1

      Thus not only proving they didn't read the article, they didn't even properly read the title. "restore GAINS of moores law". Which is perfectly correct.

    3. Re:keep moving them goalposts! by __aaltlg1547 · · Score: 1

      Thus not only proving they didn't read the article, they didn't even properly read the title. "restore GAINS of moores law". Which is perfectly correct.

      No, that's still incorrect. It's like saying improvements in corn flake processing improve farm yields.

  47. Automatically paralleling compilers by Animats · · Score: 2

    Automatically paralleling compilers aren't new. SGI had one for C ten years ago. Intel has it for C++ and Fortran now. It's been done for Matlab. There's plenty of theory.

    Outside of the supercomputing community, it's never gone mainstream. Microsoft is at least trying to do it for C#, which is a reasonable language. An interesting point is that this Microsoft approach exploits immutable objects. Immutable objects can be freely shared without locking, so wide use of immutable objects makes automatic extraction of parallelism easier.

    I'd looked at doing something similar for Python, to get rid of the Global Interpreter Lock, Python's boat-anchor. Python already makes wide use of immutable objects, but doesn't gain any performance from them. If everything is thread-local, immutable, or synchronized in the Java sense, you don't need global locking. But the politics of the Python community do not favor performance. (Potentially, Python could be up to at least the LISP level of performance, within a factor of 2 of C, if the language was restricted in certain ways.)

    Another part of the problem is the pernicious heritage of POSIX/Linux locking primitives. Many programmers think that locking is an library or OS-level problem, not a language level problem. Locking via library primitives means language doesn't know what data is locked by which lock. This makes optimization and race checking very tough.

    The political and social problems here are tougher than the technical ones. So the open source community can't solve them. It takes someone with a big hammer, like Microsoft, to do it.

    "Who tells whom what to do?" - V. Lenin

    1. Re:Automatically paralleling compilers by whistl · · Score: 3, Informative

      SGI's automatic parallelizing software came from Kuck and Associates, Inc (kai.com). I worked there for 8-1/2 years, and one disappointing fact we learned was that the only people who really cared enough about parallelizing their software to analyze their code and modify the source to make it faster were either research scientists (of which there were relatively few) who mostly wanted quicker and cheaper results (because renting time on supercomputers costs $$) or marketing departments of computer hardware manufacturers (of which there were fewer) who only wanted to be able to advertise higher SPECmark numbers for their hardware. SGI was the only manufacturer who shipped our product with every C and Fortran compiler they sold. IBM, DEC, HP only sold it as an option, but all used it internally to speed up their own benchmark numbers.

      Automatic parallelizing is tough, tougher than you think. It's nearly impossible if you don't have a human performing program analysis and adding source code directives to inform the compiler about data dependence needs.

  48. Re:Or.. teach devs to use threading as appropriate by betterunixthanunix · · Score: 5, Insightful
    Let me repeat myself: except for very small programs or very short inner loop bodies. Most of what you are describing is pretty small in terms of the amount of code and its complexity; hand-optimizing assembly code does not scale up.

    Even the best of C or C++ compilers are terrible at vectorization of code.

    Yeah, and the best humans are terrible at allocating registers -- so bad, in fact, that the best C compilers ignore the register keyword. What do you think is more relevant to the general case: vectorizing multimedia operations, or allocating registers? Compilers are also better than humans at:

    • Finding redundant or unreachable code
    • Finding dead or otherwise unneeded variables
    • Finding algebraic reductions
    • Strength reducing transformations
    • Boolean optimizations
    • Reducing program size

    ...and numerous other optimizations that, like register allocation, are more generally applicable than SIMD. There has also been work on compiler optimizations that are utterly out of reach for even the most skilled humans, like probabilistic optimizations that have a negligible (with respect to some tunable parameter) probability of being unsound.

    To put it another way, look at the Orbitz story, which is over a decade old now:

    http://www.paulgraham.com/carl.html

    On the one hand, you have hand-tuned assembly language. On the other, you have a program written in Lisp (a high level, compiled language) with some C++ mixed in (for managing memory). Orbitz was able to compete on speed, but more importantly, it was returning better results. It's not that the people who wrote that mainframe assembly language were idiots -- they were taking advantage of all sorts of special hardware features, they knew how to hack their machines better than anyone else -- it is just that the Orbitz algorithm was far too complex for efficient hand-rolled assembly code, at which point compilers are really the only choice. The mainframe guys were busy thinking about how to make use of special machine features in assembly language; the ITA team was busy solving the higher-level problem, and relying on their compiler to generate good assembly language code. This is a particularly telling line:

    We disassemble most every Lisp function looking for inefficiencies and have had both CMUCL and Franz enhanced to compile our code better.

    [emphasis mine]. They disassembled their code...and then improved their compiler when they saw problems. They did not hand-roll the code, they made the compiler do a better job of generating code. These are not lazy programmers, nor are they programmers who do not know how to use assembly language; they are programmers who understand that they have a tool that is far better at generating assembly language than they are, and that they have more important things to do with their time.

    I deal with quite a bit of crypto code in my work. I have seen lots of hand-tuned assembly language, I dealt with code that took advantage of the AESNI instructions to perform very fast encryption. I am well aware that in small, highly specialized functions (like AES), humans are better able to utilize special instructions to improve performance. Those are niche cases, and the techniques used in those cases have very limited applicability (even SSE is fairly limited in its applicability, by comparison with the sort of code programmers write and maintain every day), and the techniques scale very poorly.

    --
    Palm trees and 8
  49. Re:Or.. teach devs to use threading as appropriate by betterunixthanunix · · Score: 5, Insightful

    Just like quicksort(3) is far faster than bubblesort so too is a highly threadable code faster than non-threadble code

    First, just to be pedantic, I'll point out that quicksort is as bad as bubblesort in the worst case, to a constant factor (you should have picked heapsort or mergesort). That aside, it is worth noting (and I am somewhat bothered by this when it comes to TFA) that we still do not know if it is even possible to optimize any program by parallelizing it; see the NC-vs-P question:

    https://en.wikipedia.org/wiki/P-complete

    Multithreading is not a magic bullet, and in all likelihood it is not generally applicable.

    Languages do not, contrary to belief, express intent, the provide a strict set of instructions that the computer MUST respect

    Wrong on all counts. Imperative languages are a way to convey instructions to a computer; declarative languages do not convey instructions, and programming in a declarative language requires an entirely different mode of thinking (it is closer to asking a question that giving instructions). It is also not strictly necessary for the computer to do exactly what a program expresses; there has been some work on compiler optimizations that have a (tunable) chance of not maintaining soundness.

    In the end a good algorithm with no compiler help will beat optimized "dumb" code in all cases larger than "toy" (say, a few dozen "n" in Big-O notation)

    If you are convinced of this, try implementing something more complex than the algorithms you see in Knuth's books; say, this:

    http://eurocrypt2010rump.cr.yp.to/9854ad3cab48983f7c2c5a2258e27717.pdf

    Then ask yourself this: could the constant factors in your implementation be better? At the end of the day, big constant factors will hurt your performance so badly that you might as well have used an asymptotically worse algorithm; indeed, consider fast integer multiplication:

    https://en.wikipedia.org/wiki/Sch%C3%B6nhage%E2%80%93Strassen_algorithm

    10000 digits are needed before that algorithm actually outperforms the asymptotically worse Toom-Cook family of algorithms. Here is an even more extreme example:

    https://en.wikipedia.org/wiki/Coppersmith%E2%80%93Winograd_algorithm

    Sure, that's a better matrix multiplication algorithm in the asymptotic sense...but only for matrices that are so large that you cannot even store them on today's computers.

    So really, while you are right that asymptotic improvements will always beat constant factor improvements (which is what compilers are mostly going to do for you), you are wrong to ignore the importance of constant factor improvements. In the real world, constant factors matter. In the real world, quicksort and mergesort will use asymptotically worse algorithms below a certain problem size because of constant factors. In the real world, large integer multiplication is done using Karatsuba or Toom-Cook methods, not FFT methods, because of constant factors. In the real world, if you are not using a compiler to optimize your code, your code is going to be slower than it needs to be, even if you spent hours hand-tuning it, unless you are only dealing with toy problems.

    --
    Palm trees and 8
  50. Re:Or.. teach devs to use threading as appropriate by shutdown+-p+now · · Score: 1

    "Using threading" as in spinning off a thread or two is not hard to teach. However, even when people know to use threads, and, more importantly, know how to use them right (avoiding race conditions etc), they are still not necessarily optimal at determining when to fork, and how many threads to create at each fork point. A compiler doing analysis of the entire code base to see what depends on what can come up with more opportunities to parallelize, some in very non-obvious places. This isn't really any different from many compiler optimizations that you see in today's C++ compilers - some of them are very non-obvious, when you look at the generated assembly.

    Anyway, if you want to do "threading for the masses", you need to add another level of abstraction on top - tasks/futures, and chains thereof. Like Apple's GCD, or .NET TPL. It's much easier to reason in terms of forking and joining tasks than it is in terms of threads and synchronization primitives. Also, you can do nifty syntactic sugar on top of tasks (see C# async/await for an example) without hiding all their flexibility.

  51. Re:Or.. teach devs to use threading as appropriate by betterunixthanunix · · Score: 1
    Why Java, though? What does the Java language have that other JVM languages do not? As a language, Java is pretty awful; you are stuck with a single programming paradigm even in cases where that paradigm makes no sense, you are writing enormous amounts of code to express what you are trying to do, and features that programmers in other languages take for granted are just not there (or are very limited). Meanwhile, you have Scala and Clojure on the JVM -- so why not just use these where functional approaches make sense, like for parallel algorithms?

    easy to learn language

    Having taught Java to novices at one time, I can tell you that this is not true. Here is a very small example of why Java is harder to learn than other languages:

    someObj.someMethod(if(x == 2) { 5; } else { 10; });

    Yes, they should have used the ternary operator. Why does Java even separate these? Why not just allow conditional statements to appear everything, which is what you get in Clojure (and any other Lisp, and many other programming languages that use reduction semantics)? This sort of rigid syntax makes the task of learning Java needlessly hard, because learning Java means learning how to think in a very different, very unnatural way.

    --
    Palm trees and 8
  52. Re:Or.. teach devs to use threading as appropriate by Anonymous Coward · · Score: 1

    Annotations are dangerous because humans mess them up in ways that maximally obscure bugs. Explicit threading has the same and more problems, including sometimes implicit architectural dependencies. When you have many $M invested in a code, it has to outlast generations of hardware systems.

    For the best work ever done in compiler-managed parallelism, read papers about the Tera/Cray MTA's compilation system.

  53. The levels of programming by Anonymous Coward · · Score: 0

    If something can be made more efficient - using a set of logic that can be programmed... and this itself can then be done in an optimisation program. If these 'rules' for optimisation are good they may be included in the compiler. They both try to do the same thing ultimately... to convert a language into efficient binary code that can be executed by the OS? So the OS should control the threads, based on what it knows is already in use and what threading model the code can accomodate?

    Seems to me that a lot of people have been passing the buck on this one, not just the end software devs.

  54. Re:Or.. teach devs to use threading as appropriate by Anonymous Coward · · Score: 0

    I'm not sure if this is meant to be funny or serious.

  55. Re:Or.. teach devs to use threading as appropriate by betterunixthanunix · · Score: 2

    Some threading problems are hard, really hard. They aren't that common.

    Which is why compilers should be handling as many threading problems as they can: so that rather than spending time and mental effort dealing with problems that are not so hard, we can spend time and mental effort on problems that compilers cannot solve for us. There is a reason that we use higher level languages, and that reason is not "laziness."

    --
    Palm trees and 8
  56. Re:Or.. teach devs to use threading as appropriate by DNS-and-BIND · · Score: 1

    Abstractions are the primary reason my four GIGAHERTZ computer with four GIGABYTES of RAM is less responsive than my 450MHz computer in 1998. All the performance gains are immediately devoured to make life easier for developers with no benefit to users.

    --
    Shutting down free speech with violence isn't fighting fascism. It IS fascism!
  57. I bet they patented it too by argee · · Score: 1

    Wanna bet?

  58. Re:Or.. teach devs to use threading as appropriate by KingMotley · · Score: 1

    Threads introduce all sorts of extremely subtle timing-based logic and security bugs which even the smartest programmers in the world think they can handle but in practice don't.

    To be fair, you don't even need multithreaded programming to see all the subtle timing issues that poor programers don't understand and make daily. It really isn't that difficult to write multithreaded programs, and ones that don't having "subtle timing-basic logic and security bugs".

  59. Re:Or.. teach devs to use threading as appropriate by betterunixthanunix · · Score: 4, Informative

    How frequently do you spend time thinking about which registers are being used to store partial results in the programs you write? When was the last time you spent time thinking about which registers are being used by which variables, or when it would be a good time to store register contents in memory?

    What about memory alignment and packing? Calling conventions? Near pointers and far pointers (OK maybe that one is a bit dated)? Most programmers do not think about these problems, because most programmers are trying to solve bigger problems and need to spend time thinking about the bigger picture. Getting mired down in low level details makes programmers less productive -- that is why we use programming languages.

    So if we can have compilers that can automatically utilize multithreading, even if it is in a somewhat limited context, we should be happy. If you are happy to not be spending time thinking about padding data or allocating registers, you should be happy about automatic multithreading.

    --
    Palm trees and 8
  60. Re:Or.. teach devs to use threading as appropriate by betterunixthanunix · · Score: 1

    All the performance gains are immediately devoured to make life easier for developers with no benefit to users.

    Nonsense:

    http://www.paulgraham.com/carl.html

    A move to a higher level of abstraction that was so beneficial to software users that the entire market was upended. That is not in any way an isolated case; the history of the software industry is full of stories of good abstractions beating the competition and improving the state of the art. Abstraction is why Unix won, and the Unix abstraction was immensely beneficial to computer users.

    --
    Palm trees and 8
  61. Re:Or.. teach devs to use threading as appropriate by Darinbob · · Score: 1

    Actually I suspect it's something like dataflow but at a coarser grain. Figure out which basic blocks don't depend on each other then have them execute. Just guessing w/o reading the paper, but it's not like this stuff is new. And you still need to teach the programmers because no matter how good the compiler you can easily write code that isn't parallelizable. All those kids who thought theory wasn't useful in real life need not apply.

    And what's with everyone calling processors "cores" now?

  62. Re:Fast First Post by Anonymous Coward · · Score: 3, Interesting

    /casts detect troll 10 post radius. Multiprocessor technology was already established when Bill Gates was still dumpster diving for code samples, and a lot of the work that gave us modern high performance multiprocessing was done by companies like IBM, Intel (and Intel shootoffs like Sequent) back when Windows was just the latest new DOS application. At the extreme end of the spectrum, this is why all of the top 10 supercomputers run on the UNIX-alike known as Linux, and none run on a Microsoft based OS.

    I'm afraid regardless of what Microsoft would now like to present as a PR image, they've consistently been embarrassingly late to the party. Let's not belittle their contribution though; they certainly throw more money at problems than any other tech company.

  63. Re:Or.. teach devs to use threading as appropriate by doublebackslash · · Score: 1

    I write in Java, most days, so I depend on compilers to do a lot of work for me. I don't depend on them [compilers] to pick the correct algorithm for me.

    Simple problems get library functions. Simple streams of instuctions get optiomized futher by the processor. It still requires an awareness of how to optimize the problem to get good results.

    Can't sort a linear list like you sort a random access list, an't sort a directed cylic graph at all (lest you peer inside and find a wayto break the loops) at all, and you can't provide consistency, patition tollerence, and availability at the same time. Some things *are* written in stone, but not as many as I let on, but CERTAINLY more than I'd like!

    Laziness need not apply.

    --
    md5sum /boot/vmlinuz
    d41d8cd98f00b204e9800998ecf8427e /boot/vmlinuz
  64. Re:Or.. teach devs to use threading as appropriate by Anonymous Coward · · Score: 3, Insightful

    L4 isn't exactly trivial, but it's proof exceeds the length of it's code by whole factors.

  65. Re:Or.. teach devs to use threading as appropriate by Darinbob · · Score: 1

    Or forget the silly new stuff tied to specific platforms. We've had parallel languages and languages with parellel constructs in the past that are perfectly fine, even in not mutated by exposure to megacorporations. Occam, SISAL, Ada, etc. The trick is to get rid of the stupid programmers who don't want to deal with complexity or theory and get them back to doing mindless tasks on the web.

  66. Re:Fast First Post by aNonnyMouseCowered · · Score: 5, Interesting

    "MS R&D is the largest computer tech R&D in the world. Combine IBM, Intel, and AMD, and you get an idea of their size."

    Citation need. Not disputing the first part. Just the second part about the relative size of Miscorsoft Research.

  67. Re:Or.. teach devs to use threading as appropriate by shutdown+-p+now · · Score: 1

    The paper in question is not really tied to any specific platform. They showcase it for C# (which is also multiplatform, by the way - there's always Mono), but the principles and math behind this technique can be applied just as well to any language with objects and references to them. And it does not replace what e.g. Occam did, but rather builds on top of that to add further, more high-level primitives to improve parallelism; it doesn't mean that you have to ditch the more basic building blocks, like the explicit parallel execution operator like Occam's PAR (indeed, the paper actually provides one).

  68. Re:Fast First Post by aNonnyMouseCowered · · Score: 1

    Damn don't-reply-to-first-post curse. Reposting without typos: "MS R&D is the largest computer tech R&D in the world. Combine IBM, Intel, and AMD, and you get an idea of their size." Citation needed. Not disputing the first part. Just the second part about the relative size of Microsoft Research.

  69. Superscalar, old is new again. by citizenr · · Score: 2

    Sounds like user provides list of Structural/Control/Data Hazards. Compiler Pipelines code into blocks that can be ran in parallel.
    Sounds familiar :)

    --
    Who logs in to gdm? Not I, said the duck.
  70. The real law in play is Amdahl's by 14erCleaner · · Score: 5, Informative
    From skimming their paper, it doesn't appear that they get any real speedup from their parallelism. This is apparent because they state, in the part about the millions of lines of code written in their language, that

    These and other applications written in the source language are performance-competitive with established implementations on standard benchmarks

    Translation: we didn't speed them up any, or at least not by enough that we care to share any number.

    Amdahl's Law is difficult to overcome in auto-parallelising systems that rely on anything other than loop optimizations. Basically, in straight-line code, if you make 50% of the code run on multiple cores, you're only going to get at most a 2x improvement in speed. In practice, you won't even get that much (except in loops) due to added overhead. Bottom line: you can write papers about your wonderful auto-parallelizing technique, but when the rubber hits the road this is unlikely to lead to much performance improvement.

    --
    Have you read my blog lately?
    1. Re:The real law in play is Amdahl's by Anonymous Coward · · Score: 1

      The paper is about safety, not speed; trying to make parallel programming easier, not more performant. Sure the established implementations are just as fast, but they were more difficult to write. Easier parallel programming means more programs written to actually use parallelism.

    2. Re:The real law in play is Amdahl's by Paradigma11 · · Score: 1

      From skimming their paper, it doesn't appear that they get any real speedup from their parallelism. This is apparent because they state, in the part about the millions of lines of code written in their language, that

      These and other applications written in the source language
      are performance-competitive with established implementations
      on standard benchmarks

      Translation: we didn't speed them up any, or at least not by enough that we care to share any number.

      Amdahl's Law is difficult to overcome in auto-parallelising systems that rely on anything other than loop optimizations. Basically, in straight-line code, if you make 50% of the code run on multiple cores, you're only going to get at most a 2x improvement in speed. In practice, you won't even get that much (except in loops) due to added overhead. Bottom line: you can write papers about your wonderful auto-parallelizing technique, but when the rubber hits the road this is unlikely to lead to much performance improvement.

      I do think that Amdahl's law is complete bullshit. There is no metaphysical part of the program that is not parallellizable . Parallell programming is, in contrast to concurrency, a performance optimization problem. It is the job of the programmer to come up with solutions, algorithms that perform better. After all, you can always do speculative evalution of later parts of the program if you have enough parallell ressources.

    3. Re:The real law in play is Amdahl's by chthon · · Score: 1

      Serial dependencies are where the problem is at. There are things that you can never do before other things, and this depends on the algorithm.

    4. Re:The real law in play is Amdahl's by Anonymous Coward · · Score: 0

      Well assuming it's in the cloud you can plug as many cores as you want speeding up the process to a reasonable limit with regards to cost/output.

    5. Re:The real law in play is Amdahl's by subreality · · Score: 3, Insightful

      There is no metaphysical part of the program that is not parallellizable


      require 'digest/sha2'
      string = 'Parallelize this!'
      123456789.times { string = Digest::SHA2.hexdigest(string) }
      puts string

      Any suggestions?

      Most programs have parts that can be run in parallel, but Amdahl's point is that as you increase the number of processors you'll find that the parts that are inherently serial - even if they're quite small - will increasingly become the bottleneck.

    6. Re:The real law in play is Amdahl's by Anonymous Coward · · Score: 0

      The paper is about safety, not speed; trying to make parallel programming easier, not more performant. Sure the established implementations are just as fast, but they were more difficult to write. Easier parallel programming means more programs written to actually use parallelism.

      Easy to do something useless? I'll admit I didn't read the paper, but nothing I've read in the comments convinces me it's different from what I read 20 years ago.

    7. Re:The real law in play is Amdahl's by dywolf · · Score: 1

      so you're saying that bits that benefit from parallel will benefit, and parts that dont wont...
      basically what people have been saying about multiple cpu's and cores for years.

      well duh.

      the whole point of the thing is that they claim to be able to compile code to take advantage of what parallelism it can, specifically when the program wasnt originally written to take advantage of parallel processing.

      --
      The guy who said the election was rigged won the presidency with the second-most votes.
    8. Re:The real law in play is Amdahl's by ImprovOmega · · Score: 1

      There are absolutely parts of a program that cannot be paralellized. Data dependencies charted using the Polyhedral Model (or similar) can be mathematically proven to be unable to be done in parallel without potentially changing the result. Many automatic parallelizing compilers use this or similar to generate parallel code.

      But by all means, dismiss the work of generations as "complete bullshit" we await your genius suggestions about how to do this better.

    9. Re:The real law in play is Amdahl's by ImprovOmega · · Score: 1

      The key point with Amdahl's law is that you can reasonably determine how much paralellism will help you. You fast approach a point of diminishing returns for most problems. So if, say, your maximum theoretical speedup is about 10x, then you start seeing drastically reduced returns at > 10 processors running it.

    10. Re:The real law in play is Amdahl's by citizenr · · Score: 1

      Serial dependencies are where the problem is at. There are things that you can never do before other things, and this depends on the algorithm.

      Superscalar CPUs have been doing this for 50 years.

      --
      Who logs in to gdm? Not I, said the duck.
  71. /. once again missing the point by Anonymous Coward · · Score: 1

    ..the point is that old code -- code that wasn't written to take advantage of multiple cores -- can now take advantage of multiple cores because of this new compiler without hiring you neckbeards to rewrite it.

    1. Re:/. once again missing the point by konohitowa · · Score: 1

      So... how is the code going to get rewritten so that the compiler can then optimize it? If that question confuses you... RTFS.

  72. Re:Or.. teach devs to use threading as appropriate by phantomfive · · Score: 1

    Get rid of the eye candy and watch how responsive your computer is.

    --
    "First they came for the slanderers and i said nothing."
  73. Re:Or.. teach devs to use threading as appropriate by PlusFiveTroll · · Score: 1

    I guess it depends if you're writing the hot path in an operating system kernel/driver or if you're writing an application for the desktop. Details of the hardware make a lot of sense on one and not on the other.

  74. Re:Or.. teach devs to use threading as appropriate by SplashMyBandit · · Score: 2

    someObj.someMethod(if(x == 2) { 5; } else { 10; });

    Aarrrrghhh!!!!!111 Surely you would never *teach* students such monstrosities. The idea is to keep it simple, one statement or side effect per line of code. This helps you when you wave your debugger over the code, since it will automagically highlight the result of the statement. This also helps the people reading your program (and source code is meant for people firstly). You are trying to write Java in non-Java different style, no wonder that code you gave is so hideous. Of course, the Java compiler is so damn efficient that extra assignments for debugging and readability are no problem at all - let the machine mash the code, not you. Trying to mash everything into one line is for C and LISP coders (which I was at one time so I recognize the same style), not for someone used to multiple 1080p displays. Newlines really aren't that expensive, dontcha know?

    As a language, Java is pretty awful; you are stuck with a single programming paradigm even in cases where that paradigm makes no sense, you are writing enormous amounts of code to express what you are trying to do, and features that programmers in other languages take for granted are just not there (or are very limited). Meanwhile, you have Scala and Clojure on the JVM -- so why not just use these where functional approaches make sense, like for parallel algorithms?

    I would argue that Java is more general purpose than Scala or Clojure, at least for the kind of problems I'm often trying to solve (eg. device control, 3D graphics, medical applications, numerical modelling etc). Perhaps Scala and Clojure fit your/people's problem domains, if so, then I have no problem with people using them. Don't mistake that for being useful in the outside (non-teaching) world. Note also that functional constructs do have some overheads that iterative-style code does not (where the latter is not exclusive to, but more in Java's 'style', yeah?). For my problems that are massively iterative (eg. some parts of graphics, numericals) this does make a difference, and the functional constructs are unnatural and used so infrequently to be useless. The usefulness of threading is above this (eg. the multi-threading is coarse grained and limited to a few dozen threads, again functional programming doesn't offer enough of an advantage in this regard).

    Back to your teaching. Surely new students follow the deterministic procedural code better at the start than having lots of anonymous closures popping up all over the place? It is more advanced users that like the constructs that bring them home to the Source, that is, LISP.

    Yes, they should have used the ternary operator.

    No they should not have. The ternary operator is for C programmers making the jump. Again, newlines are not expensive. There is nothing wrong with making an if statement and assigning to a variable. Then give the output of that variable to your function call. One statement or side-effect per line. The compiler can make this fast. Someone who is not you can read the code without thinking about it (unless they are snob, and think the code is too simple and should be slightly obfuscated in the way you have shown). With lines split out in this way you get to put a debugger on the statement you want because multiple statements are not mashed together, and again, the debugger will automagically show you the value of the variable or expression as you move you mouse over it (less clicking is good, yeah?).

    If aiming for simplicity is good enough for Einstein and Steve Jobs, then who am I (or you?) to argue? As they say, "You can write bad FORTRAN in any language" and your example proves it. Just because Java (and all other languages) allow obfuscated constructs because people think terse code is better (it's not!) and those that write terse code are better coders (not true, such things are only held in esteem by those lacking several decades of experienc

  75. Re:Or.. teach devs to use threading as appropriate by ceoyoyo · · Score: 1

    "From everything I've read about threading"

    I think you just made his point for him.

    Have you ever proved the correctness of a program? I have. It took a long time. And it was a very simple program. From that we learned that correctness proofs are for when you're doing very simple embedded programming with lives on the line. For everything else there's sanity checking, exception handling and debugging.

  76. Re:Or.. teach devs to use threading as appropriate by PlusFiveTroll · · Score: 1

    I think his complaint was there are a lot of shitty abstractions in use that happen to import a kitchen sink and bathroom in when you only need a bedroom, he just didn't realize it. Most of that slowness is due to the fact that hard drives and flash in small devices is still many times slower then the speed of the processors. I think .Net was bigger then a W95 install. Stick a fast SSD with high IOPS and you kick even the lightest weight OS from '98 out the door.

    The other thing he fails to state is that the OS from '98 will die a hot death on the net today. You do use up a bit more processing time verifying that your input isn't trying to kill you.

  77. "Require that your program be purely functional"? by bill_mcgonigle · · Score: 1

    Unlike many other approaches, it doesn't require that your program be purely functional either.

    WAT?

    One of the things that got the llvm project moving along was work on Single Static Assignment compilers. The basic idea is that you take the parse tree and transform it to find segments of the tree that don't depend on other branches of the parse tree. Once you find those, you're free to dispatch the independent branches to be run wherever it's convenient.

    And this was like a decade ago. Nobody is saying that to get parallelism you have to write functional code. Did somebody read an article on Erlang in CIO magazine and get confused?

    --
    My God, it's Full of Source!
    OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
  78. Re:Fast First Post by ChrisMaple · · Score: 1

    I don't know what portion of their staff is research, but IBM has 440,000 full-time employees, MSFT 90,000 (yahoo finance).

    --
    Contribute to civilization: ari.aynrand.org/donate
  79. 33 years old new.... by Anonymous Coward · · Score: 0

    This sounds very like work done by Josh Fisher leading up to his Ph.D. thesis in 1979.

  80. Re:Or.. teach devs to use threading as appropriate by VanessaE · · Score: 1

    "And what's with everyone calling processors 'cores' now?"

    Simple - it's because "processor" still generally refers to the chip and its packaging as a single device.

    Most processors have a number of structures and resources that are shared by some or all of their execution units (such as a processor-wide shared L3 cache versus a single core's dedicated L1 cache). Since an execution unit genrally can't be separated back out from the rest of the processor, but it still mostly functions as an independent unit, it doesn't make sense to call such a unit a "processor". Someone had to come up with a term that describes those execution units, and since they collectively make up the central operating core of the processor, "core" just makes sense. Who came up with that? No clue.

  81. Re:Or.. teach devs to use threading as appropriate by ChrisMaple · · Score: 1

    Optimization has a long way to go, and I'd guess that it's really difficult to write a good optimizer. I recently wrote a 128 bit x86 assembly cosine routine more than twice as fast as the maximally optimized libquadmath routine written in C.

    --
    Contribute to civilization: ari.aynrand.org/donate
  82. Re:Or.. teach devs to use threading as appropriate by Zmobie · · Score: 1

    I don't know what morons you have worked with, but straight out of college every company I talked to expected me to know at least basic multi-threading. My company now is not even that big and we use multi-threading extensively. An auto-threaded compiler though could run circles around 99% of all developers. This is great for doing large levels of optimization automatically, and just like with manage languages it is a great asset to programming in general.

    Yes, there are lazy developers that do not understand what a managed language is doing in the background, but look at it this way at least there is not near as much time wasted on debugging their lazy ass pointer arithmetic or messy memory leaks. There are also people that are insanely good at memory management such that they can rival or beat most managed languages, but the effort involved has highly diminished returns. Not only that but no more thread-lock that takes weeks to track down in highly complicated multi-threaded applications.

    This in my opinion could be huge boost to lots of fields. Imagine a completely scalable genome sequencer running on a super-computer. We are talking minutes or even potentially seconds to compute a sequence if it continues to grow on a Moore's law scale. Applications in physics, biology, and tons of other fields make this not just huge for computer science, but for everyone.

  83. No, it won't "Restore Moore's Law" by Anonymous Coward · · Score: 0

    Moore's law is about transistor counts doubling, which is what has been occurring. An i7 has *a lot* of transistors.

    But, no matter how many threads and cores you throw at a problem, you are still subject to Amdahl's law.

  84. Re:Fast First Post by happymellon · · Score: 5, Funny

    Of those, IBM have around 439,999 project managers.

  85. Re:Or.. teach devs to use threading as appropriate by Darinbob · · Score: 1

    I always just called the "core" the processor. Sure they share some higher level caches but that's only because someone moved it into the chip (or at least the package). What do you call it when the entire processor doesn't even fit on a single chip, or single board? The distinction about what's on silicon, what's in a "package", what's on a board, etc, is all implementation detail and should not affect the naming that everyone has understood for decades and is described in all textbooks.

  86. Re:Or.. teach devs to use threading as appropriate by sjames · · Score: 1

    And it remains a small subset.

  87. Sounds like polly to me by Anonymous Coward · · Score: 0

    The so called breakthrough at microsoft sounds a lot like the polly.llvm.org project which started 3 years ago.

  88. Best ever comment ... by dgharmon · · Score: 1

    Best ever comment ... (Score:+9)

    --
    AccountKiller
  89. Re:Or.. teach devs to use threading as appropriate by sjames · · Score: 1

    So all those threaded programs in daily use are just my imagination?

    Perhaps it's time to recognize that not all programmers are the same. If it appears that no programmer can handle threading, perhaps it's because ageist management usually pushes them out before they accumulate enough experience to actually manage it.

    So, the choice now is between no more improvements in performance, hire experienced programmers to actually code efficiently after years of blowing efficiency off with assurances that next year's faster CPUs will be fast enough to make up for crappy code, or hire programmers experienced enough to SUCCESSFULLY write multi-threaded code that can take advantage of the new bazillion core processors.

    Perhaps in another 20-40 years compilers will be as good at auto-threadibng as they are at optimizing serial code now.

  90. Re:Or.. teach devs to use threading as appropriate by Anonymous Coward · · Score: 0

    Some people just aren't that smart. Are you quite sure that threading is no obstacle for someone of an IQ of 100, supposing said person is hard working? I don't know. I know that auto-threading compilers could save me a lot of headaches.

  91. Re:Or.. teach devs to use threading as appropriate by Anonymous Coward · · Score: 0

    Abstractions are a source-level concept. They should be more or less compiled away, especially in a JITed language like Java/C#.

    That said, the, uh, nephew(?) poster makes a good point: bad abstractions can cause problems.

  92. My impressions about the topic and paper. by hackus · · Score: 1

    First of all, I would like to point out that many of these ideas present in the paper have been proven to not work practically, as one will take immediate note of the use of C# and similar architectural similarities and assumptions with regards to Microsoft's Universal Compiler. Microsoft (et. al.) I would like to point out, many other organizations with far stronger talent and resources have also been working on the problem of Thread mutex issues.

    With regards to Microsoft, they thought they had the problem solved with the .Net architecture of separating language constructs to code generation and with it reducing it to proof at runtime. Problem is it would seem they could never get it quite right with either bad code being produced, or the human coder coming up with situations and algorithms that just plain could not be proofed at runtime and ran single threaded anyway. Usually in a really bad way. (i.e. Employing the use of locking structures in the runtime code that really whacked CPU affinity and cache coherency.)

    This resulted in some fairly spectacular failures with systems designed with these sorts of assumptions as described in the papers "Lemma's" section.

    Secondly, I think given the length of time this problem has been around one hope remains for a more straightforward approach to language concurrent syntax. Like I said, this problem has been around and has been looked at extensively in my opinion by much more competent people long long LONG ago...(IBM Labs in 1990's PowerPC compiler papers there.). What hasn't been around is the hardware options for solving this issue. Now that we are at 22nm as well as using all three dimensions of that tiny space hardware will be advancing at a very pace for the next several years. With all of the extra room, there is hardware assist that may become practical to make at least the locking facilities more straightforward with registers and dedicated memory to handle mutex atoms as a sort of map for the compiler to fill out for execution. No code analysis required.

    So now I am SUPER skeptical. :-) Primarily because no knew computing hardware was mentioned by the authors and how the concurrrent algorithm generation would be assisted in the use of atomic operations on today's typical NUMA systems.

    But after reading the paper, I am not hopeful that this is practical or that it even works correctly.

    Microsoft has tried to foster this architecture on the exchanges by attempting to apply this sort of reasoning from similar papers in the past and apply it to massive computational concurrent computing and it failed in a spectacular way. (I am talking about the worlds exchanges on Wall St. London and Tokyo.)

    After several extremely costly failures that occurred wherever this system was deployed, all of world wide banking exchanges went back to LINUX, which is currently the all time master of concurrent computing world wide on scales that leave this paper and its researches I am afraid far far behind.

    All hand coded and with very little optimization work to boot via gcc.

    -Hack

    --
    Got Geometrodynamics? Awe, too hard to figure out? Too bad.
  93. Infinite Monkeys by dutchwhizzman · · Score: 2

    I'm sure that they have produced all the worlds great literature the last 30 years as well. Just their size doesn't make them the source of everything good.

    I haven't seen any GPU or CPU design used in modern computers come from MicroSoft. Sure, they have worked with it to implement it into their own software. They probably have been able to express their preferences to hardware vendors, in order to make them support the hardware and make it perform good in their software. High Performance multi-core kernel design in Windows was, if I call correctly, taken from VMS and Compaq design when they designed Windows NT. MicroSoft has been struggling to get over 16 cores effectively supported by their OS and most definitively by their applications. Linux has been getting great support from SGI when they ported their NUMA architecture from IRIX to Linux and building on that, Linux has been expanding and rewriting their multi core stuff more efficient and larger than MicroSoft has done with Windows.

    Now what technologies has Linux taken from MicroSoft (free) research exactly? What MicroSoft patents were violated by Linux again? What's that eerie silence all of a sudden?

    --
    I was promised a flying car. Where is my flying car?
    1. Re:Infinite Monkeys by Anonymous Coward · · Score: 0

      MS funds a lot of research and collaborates with Universities. They don't put their name on it but MS has dumped a lot of money into all kinds of computer tech that we use today.

  94. Re:Or.. teach devs to use threading as appropriate by jimbo · · Score: 1

    Ahhh, Occam, the memories come flooding....

  95. Re:Or.. teach devs to use threading as appropriate by Anonymous Coward · · Score: 0

    Mono works worse on the average program than Wine, if you consider C# multi-platform then C/C++ using the Windows API is multi-platform even more.

  96. Re:Or.. teach devs to use threading as appropriate by Anonymous Coward · · Score: 0

    someObj.someMethod(if(x == 2) { 5; } else { 10; });

    In what version of java is this legal? I have never seen anything to make me think that would compile, and trying it out for myself I can't make it work:


    class main {
            void someMethod(int bla) {
                    System.out.println(bla);
            }

            public static void main(String[] args) {
                    main m = new main();
                    m.someMethod(if(3 > 2) { 5; } else { 10; });
            }
    }

    All I get is errors on line 8 when trying to compile this. I don't know where you learnt this, but I don't think it was in Java class. Are you sure you're not getting Scala and Java confused?

  97. Re:Or.. teach devs to use threading as appropriate by ShakaUVM · · Score: 1

    >Or, gawd forbid.. we could teach programmers how to use threading?

    My Master's Degree was in High Performance / Parallel Code, and I used to TA the subject, and worked at the San Diego Supercomputer Center.

    I worked with one of the original auto-threading compilers back in the day (on the TERA supercomputer). It was a lot easier to port code to be automatically parallelized than it was to refactor code to be multi-threaded. Some code was easy enough to parallelize manually - for example, you can find one loop that does the lion's share of the calculations, and you just worry about parallelizing that - but some was a complete and total bitch to refactor. So an automated tool to find parallelism was pretty neat.

    I ran the Quake 2 utilities through the Tera, and it found some minor speedups, but nothing too incredible. I had a few conversations with John Carmack about it, and it confirmed what he thought about auto-parallelizing, too.

    So maybe Microsoft has finally done it right. There's no way to tell until I can test it for myself.

  98. Re:Fast First Post by Pascal+Sartoretti · · Score: 1

    MS R&D is the largest computer tech R&D in the world. Combine IBM, Intel, and AMD, and you get an idea of their size.

    For a company whose main job is to develop software, it is not surprising to have a large R&D.

    But if you talk about research, can you please remind me how many Nobel prizes IBM and Microsoft have each won ?

  99. Garbage collector by Anonymous Coward · · Score: 0

    Please recompile the .NET garbage collector with it.

  100. Re:Or.. teach devs to use threading as appropriate by Anonymous Coward · · Score: 0

    And all that multithreading goes flat when the GC enters the global lock, when it is time to clear out a generation

  101. Re:Or.. teach devs to use threading as appropriate by sjames · · Score: 1

    The problem here is that unless the programmer knows what is going on wrt threading, the compiler will necessarily be limited to threading across short segments of the code.

    OTOH, tools using some of the techniques in TFA to act as a sort of lint for threads would be a very good idea.

  102. Re:Or.. teach devs to use threading as appropriate by Anonymous Coward · · Score: 0

    How frequently do you spend time thinking about which registers are being used to store partial results in the programs you write? When was the last time you spent time thinking about which registers are being used by which variables, or when it would be a good time to store register contents in memory?

    Very seldom since the processors I use treat most registers the same way and I know that the compiler won't use the other registers needlessly.
    I do however often verify that the compiler generated acceptable output because this is not always the case. For modern compilers to work as intended you have to structure your code accordingly. Changing from while(blah) {}; to a do {} while(bleh); and similar things can do wonders.

  103. openmp/tbb/etc by l3v1 · · Score: 1

    So, basically you tell the compiler what's immutable, readonly, isolated, etc. so it can parallelize the code easier. It seems clever, however, I don't see people good in tbb and/or openmp et al. rushing to drop and move over. But others might like it. Also, someone who's familiar with Intel Cilk Plus or Berkeley UPC could say a few words about this.

    --
    I am putting myself to the fullest possible use, which is all I can think that any conscious entity can ever hope to do.
  104. Looks like by Anonymous Coward · · Score: 0

    declare normal references, read-only references to mutable objects, references to globally immutable objects, and references to isolated clusters of objects

    Somebody re-invented type safe pointers, const pointers, pointers to const and container objects. ;)

  105. Re:Fast First Post by deoxyribonucleose · · Score: 2

    If you talk about Nobel prizes, could you please remind a forgetful Swede how many are awarded in mathematics or software engineering? Or is research strictly a speaking only something that happens in physics, chemistry, medicine.... or literature... or (snrk) in (giggle) economics (bwa-ha-ha-ha-ha!)?

  106. Re:Or.. teach devs to use threading as appropriate by Vintermann · · Score: 1

    It's almost as hard as writing sentences that don't having grammar errors.

    --
    xkcd is not in the sudoers file. This incident will be reported.
  107. Re:Or.. teach devs to use threading as appropriate by Vintermann · · Score: 1

    auto-threadibng

    With multithreaded programming, there are simply more ways to make mistakes. Far more than in writing a Slashdot post, and yet...

    --
    xkcd is not in the sudoers file. This incident will be reported.
  108. Possible problematics by fa2k · · Score: 1

    This is good for number crunching applications that run for tens of seconds or more, but there may be a problem for small workloads: CPUs are normally clocked down when idle. If, say, Java was compiled with this tool and I tried to start Eclipse, it would have to realise that all four cores needed to be clocked up to nominal speed, rather than just one. This means more code will be executed with the very slow 800 MHz or so setting. Worse than that, if the compiler can't parallelize the program perfectly, you'll have cores with the clock speed constatnly being turned up and down. Or you can make the CPU frequency governor so sensitive that it doesn't clock down easily, but that will use more power. Similar concerns with the "turbo core" mode.

    It just means that people shouldn't compile normal application software with auto-parallelization, but only actual number crunching code (Photoshop filters, for example, if they are not already threaded).

  109. Re:Or.. teach devs to use threading as appropriate by sjames · · Score: 1

    And GUI programs offer many more ways to screw up than batch programs...

  110. Re:Or.. teach devs to use threading as appropriate by Anonymous Coward · · Score: 0

    Damn, a great comment completely spoiled by a stupidly incorrect final summary.

    In fact, FFT methods are used "in the real world", when it makes sense. Your comment even says when it makes sense (> 10,000 digits) and then you blow it by pretending that 10,000 digits never occurs in "the real world". Well, sure bud, except when it does. 10,000 digits is a very small number in some domains, just not domains that you've considered.

    So, nice post about constant factors - these are important things to understand - but STFU about "the real world". There are plenty of non-toy examples where hand-optimized assembler can still beat the compiler hands down. In the real world I spent three months optimizing the G729 codec on a certain processor, and made it two orders of magnitudes faster. I was paid enough to convince me that G729 isn't a "toy problem". According to your last sentence, things like this never happen in the real world, and I'm afraid that's bullshit. And you know it's bullshit because it contradicts everything you've already said, so I don't know why you even typed that in.

  111. Re:Or.. teach devs to use threading as appropriate by Anonymous Coward · · Score: 0

    Did this also happen when you forgot to write the Data Access Layer or didn't use a template engine, UI framework or communication protocols? Since I guess that too is just for lazy programmers who don't know what their code is doing...

  112. I'm not impressed; don't believe it... by Anonymous Coward · · Score: 0

    C# is a "toy" language akin to PASCAL, but not quite as good. And I challenge their claim is true if
    all of the "features" of C# were utilized. For example, there are many basic features missing in C# that
    require calling external functions in some MS DLL library.

    1. Re:I'm not impressed; don't believe it... by neminem · · Score: 1

      Convenient, then, that "calling external functions in some MS dll library" is something that's trivial to do in C#. Why reinvent the wheel?

      MS has made a lot of crappy stuff over the years, but I find I actually enjoy programming in C# more than any other language I've tried. Granted, there are a lot of languages I haven't tried, but there are a fair number I have, too.

  113. Re:Or.. teach devs to use threading as appropriate by girlinatrainingbra · · Score: 1

    hmmm, what exactly is the point of finding these "decreased computational complexity" algorithms (such as the matrix multiply coopersmith-winograd you mentioned or the integer multiply schonhage-strassen) if the constant in front of it is so large that the benefit is not seen for polynomial A (naive algorithm) vs polynomial B (lower computational complexity algorithm) until you get very far out on the x-axis (or would it be the n-axis?) ?
    Is it purely to find a better algorithm, or is it to show that as the size of the problem increases, there might be a better algorithm that would actually be of use?

  114. Hm, so... by Parker+Lewis · · Score: 1

    ... is like Scala?

  115. Topoligical sort in re: do things sequentially by girlinatrainingbra · · Score: 1

    re: But even make will do things sequentially. Couldn't something like tsort be used to do a topological sort and list the dependencies and branching, and allow for branches that do not share dependencies to be run in parallel?

  116. Re:Fast First Post by phagstrom · · Score: 3, Funny

    And the last one cleans the office and makes coffee.

  117. Re:Fast First Post by witherstaff · · Score: 1

    I'd also like a citation. I won't dispute that MS does some neat things but I've seen a lot of Pro-MS postings tossing buzzwords around lately. At least this posting was on a relevant story.

  118. Isn't this part of what VLIW has already tried? by unixisc · · Score: 1

    Isn't this one of the myriad issues that the VLIW compilers ran into, since in VLIW, just about everything has to be done by the compiler and there are no dynamic analysis portions on the chip itself? That was what sabotaged VLIW's attempt to go one-up over RISC - the percentage of die area that RISC was using for functions like branch prediction, speculative execution, register renaming was really low, so VLIW didn't save that much in terms of silicon, but at the same time, the perfect compiler that it was banking on never materialized.

    It would seem that the Itanium would be one of the best platforms to test the above theory. If only someone would make an Itanium based workstation w/ which to test that. Although since Microsoft has dropped support for the Itanium, that may be problematic. Or maybe they could take one of their Itanium boxes, put an internal version of Windows 8 on it, and then put this new compiler on it and try it.

    Looks like this VLIW compiler would have been a perfect project for Microsoft Research.

    1. Re:Isn't this part of what VLIW has already tried? by dumael · · Score: 1

      Afaik, the problem with VLIW processors in general is that they attempt to exploit instruction-level parallelism.

      This is an entirely different beast to what's presented in the paper.

      Instruction-level parallelism occurs when there are instructions within a (fixed) window of a code stream where there are no dependencies between two or more instructions.

      The VLIW paradigm is have bundles of instructions which contain all instructions that can be executed simultaneously. This shifts complexity from the hardware[1] to the compiler.

      Unfortunately, ILP can be very difficult to extract from arbitrary code, though cases exist where it's trivial.

      [1] Latter RISC chips and today's non-mobile CPUs take advantage of ILP through the use of multi-issue out of order execution. Out-of-order execution typically defers execution of any given instruction until all its dependencies have been fulfilled i.e. memory/cache accesses have occurred, previous results are available, etc. By making these units multi-issue the CPU dynamically exploits ILP to the availability of hardware, no recompilation required (though it may help).

      These hardware techniques are slowly coming to the mobile arena as they are relatively expensive transistor wise.

  119. Re:Or.. teach devs to use threading as appropriate by serviscope_minor · · Score: 1

    hmm, what exactly is the point of finding these "decreased computational complexity" algorithms

    That is a quite astonishingly ignorant statement. That's like saying "what is the point of any research".

    I'll leave you to figure out that for yourself.

    Perhaps you should research it.

    --
    SJW n. One who posts facts.
  120. Re:Or.. teach devs to use threading as appropriate by Rich0 · · Score: 1

    Well, arguably that is no harder than rewriting it all to be multi-threaded.

    I think the articles's approach is a good one - it allows procedural code while hiding threading details from the programmer, which means that the programmers task is to properly tag code and not to try to carefully implement locks/etc.

  121. Re:Or.. teach devs to use threading as appropriate by Anonymous Coward · · Score: 0

    Problem is that only a limited amount of parallelism can be achieved without changing the system itself - see Amdahl's law. Essentially, if in your system operation B consumes the result of operation A then A has to happen before B, and there's nothing a language or processor can do to help. As a result, any automatic multithreading will only be able to achieve a limited amount of speed-up. You'd be lucky to get a few times speed-up and extremely lucky to get a few tens of times. That may sounds a lot, but it isn't really, particularly if we find ourselves in a world of 1024-core CPUs. And even getting gains like this isn't totally automatic: in this case immutability hints have to be added, so they're not magically parallelising unmodified code in an existing language.

    To go beyond Amdahl's law is a higher-level design challenge: you need to restructure things on larger and larger scales in order to get improvements. Not only that, but the parallelism choices you make on a larger scale can negate or even reverse the benefits of lower-level parallelism such as the language itself might have applied. As a trivial example, a problem might be embarrassingly parallel on a large scale, allowing you to run large sequential tasks side-by-side, but if each of those tasks is attempting to parallelise itself things are suddenly a lot more complicated.

    This certainly doesn't mean that there's no point in having something that can automatically parallelise code at the small scale, or that finding ways to make things easy for programmers isn't a worthwhile area of research. However, it does make this a qualitatively different problem to something like choosing registers: you never have to completely redesign a system in order improve its regster usage. Some forms of parallelism could perhaps (it has't really been proven in practice) be considered a low-level detail that can be hidden by a language, but parallelism as a whole is something that affects design. As a result, however good the tools become, as long as there are programmers and a need for parallelism there will be a need for good programmers to understand parallelism, whatever language they are using.

  122. Re:Or.. teach devs to use threading as appropriate by Anonymous Coward · · Score: 0

    I think it's better to roll all this up into simply using the right tool or technique for the job. You have tons of code to write, doing it all in assembler may not get you to the finish line very quickly.

    I am really a big fan of profiling code projects and then looking for critical areas where speed ups are warranted and then writing these in assembler directly. If you know exactly how the CPU you are working with operates, you can optimize your assembler better than a compiler with all the tricks they try. I had an argument/competition about this with someone that was a compiler evangelist decades ago. I trounced his beloved Watcom C compiler. Things have changed a lot since then, but you will never convince me a compiler can out-wit ALL human beings.

  123. Re:Fast First Post by TheRaven64 · · Score: 4, Informative

    Nobel Prizes? None. Because there is no Nobel Prize for computer science. The closest equivalent is a Turing Award. I haven't checked the whole list, however the 2009 winner works at Microsoft Research (and has since 1997), the 1998 winner is now dead, but was working at Microsoft Research between 1995 and his death. You can check the list yourself for IBM.

    --
    I am TheRaven on Soylent News
  124. Re:Or.. teach devs to use threading as appropriate by Anonymous Coward · · Score: 0

    This is why you don't drop out of college. If your college is any good, they'll teach you how to design and code parallelized programs. Threads only introduce subtle bugs when you use them wrong (and you're using a threading library instead of reinventing the wheel right?). You don't waste time parallelizing trivial parts of your application. You parallelize the data and/or CPU heavy computations. There are some problems which are trivial to parallelize.

    When was the last time you developed any type of program that was proven to be correct? Do your hello world programs handle a closed standard output stream?

  125. Re:Or.. teach devs to use threading as appropriate by Anonymous Coward · · Score: 0

    You just did not "get" it, did you?

    It matters for both/all situations if success of the application depends on how fast it can happen. Think of weather simulations and nuclear explosion simulations and biological simulations. All those need to be as fast as can be to be as useful as they can be.

    The point that was so well-said by the previous poster is that you doing multithreading to the nth degree while writing comprehensible code is pretty much impossible. Compilers, however, can be taught how to do the intricate detailed analysis of the execution flow that your code implies and figure out how to multithread it more correctly and more completely than you would ever be likely to do.

  126. Re:Or.. teach devs to use threading as appropriate by Desler · · Score: 1

    Let me repeat myself: except for very small programs or very short inner loop bodies.

    Yes, and you're still wrong. Video/audio decoders are not "very small programs" nor are game engines which also extensively use SIMD.

  127. Great white hope? by Anonymous Coward · · Score: 0

    Was functional programming supposed to be a way to take development back from... uh... Indians?

  128. Re:Or.. teach devs to use threading as appropriate by bingoUV · · Score: 1

    Do you know how we can write to log files in functional languages? For any diagnosability in a program, logging is essential.
    "Log" is a difficult word to search these days, so couldn't find much on my intial research.

    thanks

    --
    Bingo Dictionary - Pragmatist, n. A myopic idealist.
  129. Re:Or.. teach devs to use threading as appropriate by Anonymous Coward · · Score: 0

    To whoever posted this originally:

    Even the best of C or C++ compilers are terrible at vectorization of code.

    The 1990s just called. Seymour Cray is on the phone laughing at you.

  130. Re:Or.. teach devs to use threading as appropriate by Anonymous Coward · · Score: 0

    Sounds like you're running the wrong OS or the wrong applications. My 2gb/2.2Ghz dual core desktop is *much* snappier than my previous 500Mhz/512Mb desktop, including when it's doing four things at once instead of just one.

  131. Re:Or.. teach devs to use threading as appropriate by KingMotley · · Score: 1

    If I cared. Or had an edit button. Or if this was grammardot, news for grammar nazis.

  132. ANY SORT has "upsides" &/or "downsides" by Anonymous Coward · · Score: 0

    DataStructures coursework shows students this much easily, per my subject-line above... now, on THIS from you? Small "correction::

    "First, just to be pedantic, I'll point out that quicksort is as bad as bubblesort in the worst case" - by betterunixthanunix (980855) on Monday December 03, @09:19PM (#42175437)

    In this case I'll outline, Bubblesort's actually FASTER than quicksort - however, ONLY IF THE DATASET IS ALREADY MOSTLY SORTED... fact!

    Otherwise though, quicksort's probably THE most "all-around optimal sort" there is + easy to implement, especially since it's (or a variant of it) usually "built-in" to many GUI controls as a method (think list boxes), AND, over the most case scenarios (hence why in many to most GUI oriented controls with a sorting method "built-in", some variant of quicksort's USUALLY used)!

    "In the real world, quicksort and mergesort will use asymptotically worse algorithms below a certain problem size because of constant factors" - by betterunixthanunix (980855) on Monday December 03, @09:19PM (#42175437)

    See above - that's one you overlooked/omitted - mostly sorted datasets...

    ---

    Sort strengths vs. weaknesses:

    SelectionSort - Good for small datasets & very easy to implement plus decent on non-mostly sorted lists
    InsertionSort - burns LOTS OF TIME on placement of item into sorted list
    Bubblesort - best on MOSTLY SORTED datasets
    Quicksort - has recursion overheads during pivot use for "divide & conquer" sort methods it employs
    Mergesort - has recursion overheads during pivot use for "divide & conquer" sort methods it employs but has temp overheads
    Heapsort - Uses b-trees, but has no temp "scratch area" like mergesort (avoiding paging)
    CountingSort - Fast for small datasets (think 1000 items or less) that deal in integers (not so fast otherwise)
    BucketSort - Fast for linked list sorting (faster than insertionsort typically) but overheads in linked list creation

    GENERIC RULES:

    A.) If your list is 99% sorted? Use Bubblesort.

    B.) If you want a GOOD "generically fast" algorithm for sorts?? Use quicksort (good over a large range of dataset sizes and sorted vs. unsorted - optimal).

    C.) If your list is TINY (think sub 100 items), use selectionsort

    D.) If you use linked lists? Use Bucketsort

    E.) If your list range is tiny (relatively speaking, say 5,000), use CountingSort

    ---

    * You're also correct in that not EVERY PROBLEM LENDS ITSELF TO MULTITHREADING HELPING IT (in some cases, it can HURT performance, as in on a single CPU/single CORE based system)...

    "Multithreading is not a magic bullet, and in all likelihood it is not generally applicable." - by betterunixthanunix (980855) on Monday December 03, @09:19PM (#42175437)

    Agreed, 110%, by example!

    E.G.-> Let's say I have this chain of events:

    A = A-B
    B = B+C
    C = A-C

    A can't get ANYWHERE until B finishes processing... C can't get anywhere until A gets done processing. This type of scenario isn't something to put threads into, it would ADD OVERHEADS needlessly, to NO GAIN!

    APK

    P.S.=> Personally, as far as multithreaded code goes? I've been writing it since 1996 using CreateThread API work, but I generally "steer clear" of "fine-grained" multithreading (where the threads can touch THE SAME DATA, since this is where lockup due to race conditions occur) IF/WHEN possible, & only use it IF I totally completely understand the issue & data...

    I usually go "the easier way" using "coarse" multithreading" (where the threads operate on diff. kinds of data or tasks, such as an example of printing 1 spreadsheet page on 1 thread, & formatting another on another, & then updating interfaces on yet another (or, just using "idle loop time" on the latter))...

    ... apk

    1. Re:ANY SORT has "upsides" &/or "downsides" by leaen · · Score: 1

      Which is theoretically correct but completely different in practice.
      When you do not care about sorting performance using library function is best.
      When you care about performance then optimized radixsort almost always wins by huge margin.(Using radixsort on floats is typical exercise.)
      Exception is when you require wild sorting order which does not happen often.
      Unless your data does not fit into memory where variant of quicksort is best.

  133. Re:Or.. teach devs to use threading as appropriate by Bob+the+Super+Hamste · · Score: 1

    This would be an example of a small segment of of highly specialized code that can be hand optimized to great benefit as the parent pointed out. I have done similar things in the past and gotten similar results. This was for some AI code in college and yes it did wonders but that was only a small subset of functions that were run the most, the vast majority of the rest of the program was done in C as it allowed me to concentrate on the higher level design.

    --
    Time to offend someone
  134. Real performance gained with multi-process arch. by Runesabre · · Score: 1

    Multi-process is the scalable architecture of choice. You get the same advantage of utilizing all available cores within a given hardware context as well as the ability to expand across multiple hardware contexts. Hardware still gets limited eventually by memory, cores, network throughput, disk, etc. Multi-process allows you to utilize more pieces of hardware to scale up your system.

    Multi-threading unnecessarily complicates the ability to develop, debug and maintain a program and is still limited within the hardware context it's running. If you want real, scalable performance that isn't bottlenecked by cores, memory or other hardware context limitations, design for multi-process architectures.

    --
    Runesabre
    Enspira Online
  135. Re:Fast First Post by hAckz0r · · Score: 3, Informative
    You were right to question the stated facts. Microsoft Research has only 350 "employees". Not all that many if you compare it to the combined resources of the others previously sited. IBM for instance has 1593 "researches" alone, and that is bonafide "researchers" not just employees or warm bodies. So the prior statement is provably false by even the quickest and roughest of google searched.

    https://en.wikipedia.org/wiki/Microsoft_Research
    http://researcher.ibm.com/researcher/search.php?sn=1

  136. Re:Or.. teach devs to use threading as appropriate by betterunixthanunix · · Score: 2

    The idea is to keep it simple, one statement or side effect per line of code

    Except that you are then going to either (a) duplicate code or (b) have to introduce and keep track of more variables. In what way is that simple?

    You are trying to write Java in non-Java different style

    Perhaps I was not clear: that was a mistake that one of my students made. They did not know Java, they were still trying to learn it, and the point is that the Java style of programming is very unnatural. "Call f with 5 if x is 2 or else 10" is how the student was thinking about the program; they were forced to think about it instead as "if x is 2 call f with 5, otherwise call f with 10."

    Note also that functional constructs do have some overheads that iterative-style code does not

    Yes, there are limits to functional programming. There are limits to all programming paradigms, which is why languages should not force programmers to use a particular paradigm in all cases (and as it so happens, Clojure does have a loop construct). One of Java's major problems is that it really lacks support for anything other than object oriented programming, with only limited support for other paradigms or styles.

    For my problems that are massively iterative (eg. some parts of graphics, numericals) this does make a difference, and the functional constructs are unnatural and used so infrequently to be useless

    Perhaps so, although I suspect that in both cases object oriented programming is not universally applicable, and that you would appreciate the ability to just write a function without having to dress it up with class definitions. When I was an undergrad, I knew a lot of professors in the EE department who used Matlab to do computationally intense signal processing, and I know people in the CS department here who use Matlab for image processing work (and they would rule out both Java and any functional language for that work).

    Surely new students follow the deterministic procedural code better at the start than having lots of anonymous closures popping up all over the place?

    They do, but what I was saying was the Java is not an easy language to learn. Let's put it this way:

    public class HelloWorld {
    public static void main(String[] args) {
    System.out.println("Hello world!");
    }
    }

    If you have never programmed before, that is an awful lot to deal with in a first program. It is certainly more to deal with than this:

    (print "Hello world!")

    In the Java case, most of the questions students were asking were about public, static, void, main, class, String, and then questions about function definition syntax, what "args" even is, etc. The Lisp case is far simpler and far more straightforward. I would say that it is acceptable to teach Java if all that complexity would eventually lead to simpler programs, if it was just a matter of scale, but unfortunately that is not the case; Java's syntax does little to keep complexity down, and the "everything is object oriented" approach adds a lot of complexity.

    I find it interesting that you would mention closures here; in introductory Java courses, it is almost guaranteed that students will have to write at least one class definition. Do you think students have an easier time dealing with classes and objects than closures? Why even bring up anonymous closures; do you think new programmers have an easy time dealing with anonymous classes and objects (do you think we even get that far in a single semester)? My experience with teaching students how to write classes was this: it's hard for people who have never programmed before, especially when they are writing programs that are tiny and benefit little from classes. I would say that object oriented programming

    --
    Palm trees and 8
  137. Re:Or.. teach devs to use threading as appropriate by 21mhz · · Score: 1

    Do you know how we can write to log files in functional languages?

    By using a monad such as Haskell's IO.
    The trouble is, you have to wrap with it any parts of the program that need logging.

    --
    My exception safety is -fno-exceptions.
  138. WTF? by 3seas · · Score: 1

    teach a compiler?

    So we have somehow gone from writing programs (a compiler is a program) to teaching a program?

    Well we have certainly achieved artificial intelligence. Any fool who thinks there is a difference on this issue is themselves artificially intelligent and not using their own computer called a brain..

    What does a compiler do but translate the input into output the hardware understands and preferably in an optimized manner.
    So... you don't teach software how to translate, you program it to.

  139. Re:Or.. teach devs to use threading as appropriate by Anonymous Coward · · Score: 0

    quicksort is as bad as bubblesort in the worst case

    Pick the pivot using recursive median of 3 on some percentage of values. This will pull the worst case to O(n^2/log n).

  140. Re:Or.. teach devs to use threading as appropriate by ahabswhale · · Score: 2

    I've spent a lot of time in the earlier part of my career writing multithreaded applications (mostly client/server stuff) and I got very good at it. That said, it's a complete waste of my time as a developer. Developers should focus on domain problems, not how to tweak the microprocessor. In my experience, most developers don't seem to ever fully get how to write multithread apps and threading problems are a royal pain in the ass to debug. I can't see a better candidate for automation than taking this out of developer's hands.

    I used to write my own linked lists and hash tables but I don't do that anymore either. It's a new era and it's all about the domain.

    --
    Are agnostics skeptical of unicorns too?
  141. Re:Or.. teach devs to use threading as appropriate by Anonymous Coward · · Score: 0

    By the way, Moore's law also took into account minimum component cost, so he was talking about complexity of components at the minimum cost, which also brings economics into the equation. It's not only that the number of components on an IC is (or was) doubling (and now it's about multiple cores), but it's that in order for this to be accepted, the cost of production cannot expand by the same factor, the cost of production should stay minimal, the increase should be a constant, not a function (which it is, the constant cost is the cost of development of the architecture and the cost of building a fab).

    Wow, I thought you were gong to make it through an entire comment without reciting your religious mantra - I was really rooting for you. Tragically, you just couldn't do it. After writing a mostly on-topic message, you closed with

    Bringing back human slavery will allow us to continue with Moore's law. Otherwise, we are DOOMED

  142. Re:Or.. teach devs to use threading as appropriate by Jmc23 · · Score: 1
    I'm not sure there's much difference to the two approaches. In the Lisp case, the code is still being hand-optimized, it's just being put into the compiler where it belongs.

    Not sure if Lisp qualifies exclusively as a high level language. I'll dissassemble my functions all the time (it's built into the language) to see what assembly is being generated. If I wanted to, it's trivial to jump to the source where a compiler decision is made or a virtual operation defined and tweak it... not that I have.

    Lisp is high level when in the development stage, but you can pretty much get as close to the metal as you want.

    --
    Don't complain about syntax, grammar, or spelling. There is no.hell like input on android.
  143. To optimize the idle loop by tepples · · Score: 1

    The only optimization you can really do for an idle loop is to underclock your CPU so that the whole processor uses less power. But that just leads to the CPU not being the most power-hungry component of a computer. On tablets, for instance, the screen's backlight takes a lot more power.

  144. Re:Or.. teach devs to use threading as appropriate by Anonymous Coward · · Score: 0

    This needs a +6:MakesYouSmarter tag.

  145. Re:Or.. teach devs to use threading as appropriate by betterunixthanunix · · Score: 1

    That is a matter of scale. An audio or video decoder is tiny compared to an application server (JBoss, at 80MB of code, is on the small end; Websphere is 2GB), and even audio and video decoders are rarely written in pure assembly language (and you can measure the difference between codecs that were optimized by a compiler and those that were not, even with those hand-rolled optimizations). Game engine SIMD is exactly where I said it would be: in very small functions (i.e. programs) or in inner loops.

    As I said, hand-optimization scales poorly. You are talking about niche cases involving instructions that were design specifically for those cases. In the general case, where you did not receive some special set of instructions to do things for you, humans are not nearly as good as compilers, even humans who are well-versed in their machine's architecture.

    --
    Palm trees and 8
  146. For speed &/or efficiency in DIFFERENT scenari by Anonymous Coward · · Score: 0

    "Is it purely to find a better algorithm, or is it to show that as the size of the problem increases, there might be a better algorithm that would actually be of use?" - by girlinatrainingbra (2738457) on Tuesday December 04, @05:49AM (#42177533)

    Per my subject-line above? BOTH really... it's about efficiency & speed in different dataset scenarios!

    Here's a "Big O notation" table of BEST case, AVERAGE case, & WORST case scenarios, respectively (for each general sort type):

    ---

    Selection - BEST = O(N squared) - AVERGE = O(N squared) - WORST O(N squared)

    Bubble = Same as above EXCEPT best case where data is MOSTLY sorted (again, here it excels)

    Insertion - BEST = O(N) IF data mostly sorted - AVERAGE = O(N squared) - WORST = O(N squared)

    Merge - BEST = O(N log N) - AVERAGE = O(N log N) - WORST = O(N log N)

    Heap = same as Merge

    Quick - BEST = O(N log N) - AVERAGE = O(N log N) - WORST = O(N squared)

    ---

    * I.E. -> They DO all have merits in different scenarios (regarding sort state of data, and of course, size too, as well as if they incur recursion memory overheads, which can induce paging, etc./et al)...

    Like multivariable calculus & computing optimization curves? There's often MORE THAN 1 SOLUTION to any problem...

    (Trick is, finding which solution fits the data scenario best, for optimal performance in speed (and even memory efficiencies))...

    APK

    P.S.=> I posted more about it here -> http://developers.slashdot.org/comments.pl?sid=3291593&cid=42178921 with some examples...

    Also - How/Where/When multiple thread usage has DOWNSIDES (on single cpu rigs due to overheads cost incurred, AND as well as a case example where all the threads in the WORLD don't make a difference, & can make it worse due to overheads costs incurred (due to "blocking" occurring on operations & their results needing to happen, first))...

    ... apk

  147. Re:Or.. teach devs to use threading as appropriate by Anonymous Coward · · Score: 0

    How frequently do you spend time thinking about which registers are being used to store partial results in the programs you write? When was the last time you spent time thinking about which registers are being used by which variables, or when it would be a good time to store register contents in memory?

    What about memory alignment and packing? Calling conventions? Near pointers and far pointers (OK maybe that one is a bit dated)?

    Every day at my job writing embedded software for an automotive supplier.

  148. Re:Or.. teach devs to use threading as appropriate by Anonymous Coward · · Score: 0

    That is a quite astonishingly ignorant statement. That's like saying "what is the point of any research".

    And yet your comment seemed even more useless to me, than hers did to you... funny the way that works.

    Please don't deride people on here for asking seemingly honest questions.

  149. Re:Or.. teach devs to use threading as appropriate by Anonymous Coward · · Score: 0

    Or, gawd forbid.. we could teach programmers how to use threading?

    I know how to use threading.

    And, as a direct result of that knowledge, I push hard to avoid the usage of threads.

    The problem is that when threaded code has bugs, they can be brutally hard to find, reproduce, debug, and analyze.

    All of the code I have written professionally has deliberately avoided the use of threads. (I do application and system programming on embedded systems.)

    As a result, I have saved my employer a huge amount of time and money over the past 8 years completely avoiding all those weird, flaky kinds of bugs that only show up occasionally in the field, and that can't be reproduced in a debugger -- that are the hallmark of bugs in threaded code.

  150. Great white hope? by Anonymous Coward · · Score: 0

    Racist construct from decades past creeps into computing language. Why is this necessary? Holy grail != great white hope. How about rosetta stone, lost ark instead?

  151. Re:Or.. teach devs to use threading as appropriate by Anonymous Coward · · Score: 0

    which even the smartest programmers in the world think they can handle

    Nope.

    The smartest programmers fully understand the level of difficulty and complexity that comes with using threads, and they know how brutally hard it can be to find, reproduce, analyze, and debug the problems that result from bugs in threaded code. And, as a result, they are smart enough to avoid using threads in the first place.

    They know enough about threads to know that the amount of time for them to fully understand how threads would work in their code is far greater than the amount of time they are authorized by their employer to spend.

  152. pfff... my auto-threading language is far superior by happyjack27 · · Score: 1

    pfff... i designed a much better autothreading language, much better, much simpler, much more intuitive, and allows for data-parrallel threading. a long time ago, too. https://sourceforge.net/projects/tsunamiprogramm/

  153. When can we get this baked into Windows Store Apps by elabs · · Score: 1

    I know a lot of apps that could take advantage of this technology right now.

  154. Re:Fast First Post by A+bsd+fool · · Score: 1

    They do actually spend more, not to say they're spending it better. For 2011, 2010, and 2009, companies spent:

    (billions)
    MS: $9.0, $8.7, $9.0
    IBM: $6.0, $6.0, $6.0 (IBMs 10-K filings do not list specific numbers. Each year says "approximately six billion.")
    AMD: $1.5, $1.4, $1.7
    Intel: $8.4, $6.6, $5.7

    It was an obvious mistake to include Intel in the list of "more than ______ combined". Both Intel and MS spend more than AMD and IBM combined, though.

  155. Re:For speed &/or efficiency in DIFFERENT scen by girlinatrainingbra · · Score: 1

    Thanks! Thanks for taking my question seriously and for giving me a useful explanation and answer. Also, thanks for the pointer to your other /. comment about this. :>)

  156. Re:Or.. teach devs to use threading as appropriate by leaen · · Score: 1

    Even the best of C or C++ compilers are terrible at vectorization of code.

    The 1990s just called. Seymour Cray is on the phone laughing at you.

    Really? Even simple examples like this:

    int sum(int *s){long i;
    int su=0;
    for(i=0;i<128;i+=2) su+=s[i]*s[i+1];
    return su;
    }

    Results in 765 byte monstrosity by latest icc and 1095 bytes with gcc

  157. Moore's Law and Lying by Anonymous Coward · · Score: 0

    Moore's *observation*, as the poster noted, got sick in 2004; which is when CPUs stopped getting faster. The important thing here is the lying; most sources -- i.e. the computer industry and the pathetically-mendacious magazines -- still pretend *nothing* happened to Moore's whatever-it-is, but in fact obviously something happened and, most noticeably, it involves a *lot of money*.

    Whatever this parallel press release is about, Microsoft's entire .NET feature program since around 2004 has been things that were supposed to make parallel processing work good. At last. And maybe this one will.

  158. Re:Or.. teach devs to use threading as appropriate by SplashMyBandit · · Score: 1

    When I was an undergrad, I knew a lot of professors in the EE department who used Matlab to do computationally intense signal processing, and I know people in the CS department here who use Matlab for image processing work (and they would rule out both Java and any functional language for that work).

    Matlab is ok (I use Octave from time to time). However, it not not really a general purpose solution. If you get to work outside your university you will come up against them. That is part of the problem with our discussion I think. From your post you are an academic working in CS department. Well, I'm an ex-academic (astrophysics) who gained sufficient energy to reach a free state. I work outside of university and get to build real systems in the non-ideal world outside the Ivory Tower. I'm not saying your observations are invalid, they just aren't practical with the variable quality of people you have out here.

    With regard to the "extra effort" you suppose. Well, I'm well aware of the psychological limits of working memory because I also am trained in the Information Mapping methodology of technical writing (and have do aviation, where those limits were found). I continue to disagree with you. Making variables and breaking long chains of calls up helps debug unfamiliar code (eg. written by someone else, or yourself many years ago) because the machine can help you with it. You don't have to remember all the variables because you've given them great names like "specificImpulseNetwons" or "annualInterestRate". Note how units become important. I could rant how the CS that I learned as an undergraduate placed emphasis on the wrong things for industry: like worrying about how few lines a toy program can be reduced to; how making yet another language will solve all programming problems (for a specific domain); like proving the algorithmic complexity for some obscure domain. Don't confuse that with teaching students useful things; like always thinking about the units they are working in (failing to do so will famously get your Mars Rover crashed), or how to design stuff to be as simple as possible (so lesser programmers can build and maintain it), how to get sh!t done quickly, give them 'religion' about testing or how to write robust software (hints: test, check your preconditions always, validate input data, assume that the stuff can fail at any time and fall back to a safe checkpoint state, etc).

    It is not very useful to continue this debate. You are interested in languages (for the sake of it I would assume, given your career is CS) and in constructs that are easy to teach. I'm interested in getting big projects done, getting them done on time, getting a team to produce it, making it bulletproof, and making it so that it can be deployed and maintained by folks who don't know anything about programming. This is why I recommended Java, and why so many enterprises use it - and because there is not yet a sensible alternative to it (yes, yes, the good is the enemy of the great - but I'd rather choose a good general purpose language over a great limited purpose language because of the enormous economic benefits of re-use).

  159. Re:Or.. teach devs to use threading as appropriate by Anonymous Coward · · Score: 0

    Threading isn't hard, just learn to design instead of throwing code into files and hoping it works. I find it easy to visualize the way different parts of the system interact.

    My first real-world project after college was to fix a program that took a 6 man team 9 months to create and another 6 months of debugging. I realized it would be the program to parallelize. Did a week of reading into threading, which seemed very simple, spent 3 weeks re-writing the program the program from scratch, took a week of testing, and now the program has been running in production for hundreds of customers for the past 3 years without a redeployment.

    Best part is is it is about 1/3rd the lines of code from the original, runs about 1000x faster single threaded, uses about 1/10th the memory, and scales close to linearly with cores.

    Now that was my first multi-threaded program, took about 1 week of reading and 1 more week of design, 3 weeks of coding, and another 1 week of debugging.

    Threading isn't hard. Once one understands how a CPU(what is atomic and what isn't), thread scheduling(interrupts), and IO(blocking) all interact, the rest is simple. I don't even have a background in CS or Engineering. Like I said, I can "see" in my head how everything interacts, I don't know how many people do that, but it was natural for me ever since I started programming.

  160. "I set fire to the rain..." by Anonymous Coward · · Score: 0

    http://www.youtube.com/watch?v=FlsBObg-1BQ

    * You're welcome... Hope it got you thinking (that's how it all starts).

    APK

    P.S.=> Do the same when you are able... pay it forward & what-not - that's all...

    ... apk

  161. "I set fire to the rain..."... apk by Anonymous Coward · · Score: 0

    Every day, all we speak of -> http://start64.com/index.php?option=com_content&view=article&id=5851:apk-hosts-file-engine-64bit-version&catid=26:64bit-security-software&Itemid=74

    "Which is theoretically correct but completely different in practice." - by leaen (987954) on Tuesday December 04, @06:52PM (#42186395)

    Perhaps, for those of you that are theorists... I never have been. I am about "Where the rubber meets the road"...

    * There is NO THEORY THERE ONLY, only work in practice...

    (Talk's cheap - deeds, are not...)

    APK

    P.S.=> See subject-line... & the program is about 1-2 hours of work since 2003

    ... apk

    1. Re:"I set fire to the rain..."... apk by Anonymous Coward · · Score: 0

      Stop bolding random sentences in your text. It makes you seem like a cook.

    2. Re:"I set fire to the rain..."... apk by Anonymous Coward · · Score: 0

      Stop giving orders nobody will obey and get on topic.

  162. Re:Or.. teach devs to use threading as appropriate by prionic6 · · Score: 1

    In what version of java is this legal?

    I think his point was, Java is harder because it does not allow that style of coding.

    Just for the record, I don't agree. The line looks too complicated to be understood efficiently.

  163. Re:Or.. teach devs to use threading as appropriate by Anonymous Coward · · Score: 0

    Wrong on all counts. Imperative languages are a way to convey instructions to a computer; declarative languages do not convey instructions

    Alan and Alonzo might like to have a talk with you if they were alive.

    Imperative languages convey instructions to a virtual computer that might or might not be implemented by feeding a direct translation of them to the chip. Declarative languages do the same, but the instruction is in such a form that it is inefficient to implement directly, e.g. "substitute parameters and expand the result".

    try implementing something more complex than the algorithms you see in Knuth's books

    Very few people ever code algorithms that even approach stuff in knuths book in complexity. (pouring featurism all over the codebase and then trying to fix the mess doesn't count. it's a different kind of problem)

    In the real world, quicksort and mergesort will use asymptotically worse algorithms below a certain problem size because of constant factors

    This is more because it's usually hard to instruct the compiler to properly expand fully inlined versions of cases where size<=16. If it could do that the result wouldn't be that far from optimal. These algorithms do not, in principle, suck enormously on small inputs.

  164. Re:Or.. teach devs to use threading as appropriate by Anonymous Coward · · Score: 0

    Hmm, let's take a claim back. It's not terribly hard to imagine a declarative language that can represent impossible problems or problems with unknown impossibility. Such a language does not represent instructions in this world.

  165. Not special. by Anonymous Coward · · Score: 0

    Intel has been doing this in their compilers for years. More marketing hype I suppose. You can read more here: Automatic Parallelization with Intel® Compilers.

  166. Re:Or.. teach devs to use threading as appropriate by Anonymous Coward · · Score: 0

    Just like quicksort(3) is far faster than bubblesort so too is a highly threadable code faster than non-threadble code

    First, just to be pedantic, I'll point out that quicksort is as bad as bubblesort in the worst case, to a constant factor (you should have picked heapsort or mergesort).

    ...

    Multithreading is not a magic bullet, and in all likelihood it is not generally applicable.

    Multithreading is a magic bullet for quicksort - Quicksort - Parallelization.

  167. No fun. by Anonymous Coward · · Score: 0

    Great. Now Microsoft is taking all the fun out of programming.

  168. Re:Or.. teach devs to use threading as appropriate by MagdJTK · · Score: 1

    Or, gawd forbid.. we could teach programmers how to use threading?

    Who said it was an either/or proposition? People should learn to use parallel processors efficiently, but that doesn't mean that I'm going to turn down a compiler that can do a bit of it too. Otherwise, why bother having optimising compilers at all; clearly the programmer should have just programmed more efficiently!

  169. ".....a breakthrough out of Microsoft Research." by Anonymous Coward · · Score: 0

    I was reading this article until it said ".....a breakthrough out of Microsoft Research."...at which point I realized it was a scam.
    EVERYBODY KNOWS MICROSOFT DOESN'T WRITE CODE!

    Only false promises and bad checks for stolen technology.

    HAHAHAAHAHAHAHAHAHAHAHAHAHAHAH

  170. Re:Or.. teach devs to use threading as appropriate by Vintermann · · Score: 1

    Point is, you make "stupid" errors when doing something as simple as constructing an English sentence. We all do, on occasion. Yet I am supposed to believe you're good enough to not make errors in a task that's much harder, and where there are far more errors to make?

    Security has to rely on something more than "don't make mistakes", because statistically, people will make mistakes. Even the best. Especially when it's easy to do things wrong and there are many opportunities to do so, as is the case with multithreaded programming.

    --
    xkcd is not in the sudoers file. This incident will be reported.
  171. Re:Or.. teach devs to use threading as appropriate by KingMotley · · Score: 1

    And the point is invalid because:
    1) I don't put the same weight on posting perfect English on slashdot as I do writing actual code.
    2) Slashdot does not have an edit button. My code editors do.
    2) While I am pretty good at the English language, it's not in my field. So, while I should not be taking a job as an English teacher, that says absolutely nothing about how I program. Oh, I am also very bad at drawing and coordinating colors. All of which has nothing to do with my ability to do my job.
    3) You are correct that mistakes happen, however, in programming we have things like editors with grammar checking, debuggers with step tracing and watch windows, unit tests, best practices, and design.
    4) Multithreaded programming isn't a difficult concept. In most cases it isn't difficult to implement.

  172. Re:Or.. teach devs to use threading as appropriate by Vintermann · · Score: 1

    Tell you what, why don't you show me some nontrivial multithreaded code you've written.

    --
    xkcd is not in the sudoers file. This incident will be reported.
  173. Re:Or.. teach devs to use threading as appropriate by jedwidz · · Score: 1

    Logging is a good candidate for Haskell's 'get out of FP jail free' function, unsafePerformIO.

    Apparently there's a standard function Debug.Trace.trace for this.

  174. Re:Or.. teach devs to use threading as appropriate by KingMotley · · Score: 1

    Sure, define non-trivial.

    Would you like some that I did in C#, C++, C, VB6, or 80386 assembly (My own written from the ground up protected mode OS)?

    Where would you like it, and where can I send the bill for my time to strip out any confidential, trade secret, and/or private section?