Auto-threading Compiler Could Restore Moore's Law Gains
New submitter Nemo the Magnificent writes "Develop in the Cloud has news about what might be a breakthrough out of Microsoft Research. A team there wrote a paper (PDF), now accepted for publication at OOPSLA, that describes how to teach a compiler to auto-thread a program that was written single-threaded in a conventional language like C#. This is the holy grail to take advantage of multiple cores — to get Moore's Law improvements back on track, after they essentially ran aground in the last decade. (Functional programming, the other great hope, just isn't happening.) About 2004 was when Intel et al. ran into a wall and started packing multiple cores into chips instead of cranking the clock speed. The Microsoft team modified a C# compiler to use the new technique, and claim a 'large project at Microsoft' have written 'several million lines of code' testing out the resulting 'safe parallelism.'"
The paper is a good read if you're into compilers and functional programming. The key to operation is adding permissions to reference types allowing you to declare normal references, read-only references to mutable objects, references to globally immutable objects, and references to isolated clusters of objects. With that information, the compiler is able to prove that chunks of code can safely be run in parallel. Unlike many other approaches, it doesn't require that your program be purely functional either.
now that this has been described, programmers make it so!
Moore's law has nothing to do with performance, numbnuts. Apparently the real geeks left Slashdot ages ago.
There is great potential here if this sort of thing can be made to work cleanly and safely. I suspect a lot of programmers (myself included) think better in a single-threaded manner. People are mostly used to dealing with A B C D and, as such, I think we usually write code to execute steps the same way. Having the compiler identify things which could run in parallel and thread work to run as A B&C D would be a huge step forward as long as it doesn't introduce bugs.
Moore's law is only about the number of transistors on integrated circuits.
How about using threads when it makes sense, and not when it doesn't? It doesn't matter how many cores, how many threads, and how many gigahertz you have when the computer is doing nothing but blinking a cursor waiting for me to press the keyboard. The vast majority of programs match this situation, and all the magic compiler is going to do is increase the speed of the System Idle Thread.
Or, gawd forbid.. we could teach programmers how to use threading? I am a casual developer, with no formal training beyond writing practical code and reading as much as I can from the keyboards of real developers. I've run into my fair share of "real", "professional" developers who've been tasked to work on my code and thrown their hands up in defeat - not, as I feared, because of the awfulness of my coding style, but "This uses threading, I don't know this!".. Of course, looking at their resumes, a long list of single threaded webapps where the actual threading is handled for them by the webserver itself.. I give up. Even some basic native GUI client development should teach developers simple threading and asynchronous callbacks? alas, yet another talent abandoned in the age of the webapp. Is it any wonder the security issues (my actual realm of supposed 'expertise') in their code often make the architectural issues look trivial in comparison?
An interesting development, and much needed I fear, but yet another layer of abstraction to allow lazy developers to not have to really bother about knowing what their code is actually doing (that's for the poor SoB who has to maintain it is for...)
OK, so now we shall have another way to produce parallel programs.
Running things safely in parallel is a very good thing, but it does not by itself guarantee significant improvements in speed. The hard bit is actually to optimize the parallel code.
Mainstream language have mutable state all over the code. Functional programming's big change on state issues is to careful isolate state. The Microsoft approach means that state needs to be tracked carefully so that it could be isolated by the compiler even if it isn't isolated by the code. Which is likely just as much work as isolating state. And the nice thing about isolating state is once you do it you can make use of all sorts of incredibly powerful functional paradigms like like first class functions (closures, partial execution...) and lazyness (infinite data structures, no need to figure out proper order of evaluation..)
The solution to parallelism is functional programming. And no it is not too hard. Excel is a functional programming language that lots of people know that does a great job isolating state.
About time.
"Great white hope"?
Is this Slashdot or segregation-era Mississippi?
I skimmed the paper... it requires you to describe how variables are accessed... is this statically checked? If not, then it is no better than the current paradigm, except the compiler will be introducing races rather than the developer.
Also, there are no performance numbers. How much of the gains of paralleling such small chunks of code are lost through synchronization?
Or, gawd forbid.. we could teach programmers how to use threading?
That's easy: "Don't."
From everything I've read about threading, there's no general way a hand-coded multithreaded program can ever be proven to be correct. Threads introduce all sorts of extremely subtle timing-based logic and security bugs which even the smartest programmers in the world think they can handle but in practice don't. And most programmers are not the smartest programmers in the world (they only think they are).
The correct solution is to switch from C-based imperative languages to pure-functional implicitly parallelised languages, but that's not likely to happen before the heat death of the universe.
You are not a brain: http://books.google.com/books?id=2oV61CeDx-YC
Until non-Microsoft compilers can implement the technique.
.: Semper Absurda
Goto is swell also. Just be sure not to make any mistakes!
No non-trivial program can ever be proven to be correct.
John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
The researchers are proposing neither solution, and I think they are right.
Explicit threading is hard to get right and the bugs are insane. The typical proposed solution, "use a functional programming language and abandon all state", is like using sarin on ants. Effective, and pretty neurotoxic.
The researchers propose that humans annotate languages with information about intended uses of variables which is not too difficult to infer in the cognitive model of the programmers, and then let the the compiler take care of the coordination of the threads. As it should be.
I wrote up a very long comment that ranted and raved and didn't come close to this comment right here. Some threading problems are hard, really hard. They aren't that common.
Man up and learn. I like it.
md5sum
d41d8cd98f00b204e9800998ecf8427e
The correct solution is to switch from C-based imperative languages to pure-functional implicitly parallelised languages, but that's not likely to happen before the heat death of the universe.
Well, yes. Are you going to be the one to pay the 10s of billions of dollars, if not 100s of billions, to rewrite all software in existence into a functional language?
Doubling of processor cores doesn't imply that things can be twice as fast. There is actually pretty hard science going on researching what can be done with N processors to speed up problem solving. It turns out, that even with large numbers of processors we are not able to speed up majority of algorithms in any significant way. Maybe in real world, few additional processors could compute some basic stuff in the meantime to speed things up, but I doubt that doubling the number of processors will add significant speed to computers when the number of them is large enough.
I think the point is that just because something is hard is no reason not to teach it to people. If no one knew every learns these low-level details who is going to be maintaining the compilers?
There are plenty of threading frameworks in most languages where you can just define threadable operations. Your job is simply to ensure the task is correct, use the right framework and trust it to the correct degree. As with many things, someone only needs to do it right once.
It means that, for a Schrödinger moment, you can be goatse'd by your own foot, before it comes to rest in your mouth.
Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
This is totally true.
So long as you define trivial programs to be a subset of programs that can be proven correct.
Or, gawd forbid.. we could teach programmers how to use threading?
No, we want our compilers to do these things for us, because things that compilers can do they usually do better than humans can. Once your programs become large enough and complex enough, compilers outperform humans every time -- it's not about laziness, it is about the limits of human programming ability.
Compilers surpassed humans when it comes to optimization a very long time ago, except for very small programs or very short inner loop bodies.
Palm trees and 8
Bingo.. and as many other folks have pointed out, 90% of of threading is purely to decrease latency and bypass blocking operations - very few applications out there today are heavily dependent on concurrency to achieve any kind of raw horsepower... This does just seem like, if it were to become mainstream, would just become a crutch for developers to ignore threading and blocking I/O issues entirely, because "The compiler will sort that out".. In theory, this isn't necessarily a bad thing. In theory, all things work in practice too.
This is not auto-parallelising. This is adding in hints to the compiler to allow it to parallelise code. That's not automatic, that requires the programmer to do it. And with zero performance analysis in the paper, wtf? Seriously, this is a much harder problem than is being made out to be, and there is not one size fits all way of doing it. So they've found a neat way of doing things, so have other people. They work well in some circumstances, and not in others. Go figure. Paper probably quite interesting, but article complete tripe.
So, first we create a virtual machine and a new language, called c#. That cost about 50% performance, but that's disregarded as 'irrelevant because desktop apps don't need speed' and 'computers only get faster'.
Then, when halved performance, we go seek ways to split inefficient single core code and split among two or more threads.
So basically, we found a way to have 2 processors do the job of one. Doubling energy usage. I'm impressed.
Most parallel problems tend to fall into a small set of categories, and I see no problem with abstracting them away. There are many libraries already that handle the threading for you.
How about we write new code in a functional language, instead of clinging to imperative languages? Put C, C++, and Java in the category they belong in: maintenance and legacy code only.
Palm trees and 8
An interesting development, and much needed I fear, but yet another layer of abstraction to allow lazy developers to not have to really bother about knowing what their code is actually doing (that's for the poor SoB who has to maintain it is for...)
Developing software is all about managing complexity. Abstractions are the primary tool used to do so. They are neither inherently good or bad. A large part of writing good software is finding the appropriate abstractions and eliminating inappropriate ones. If an abstraction allows a program to be written more easily with an acceptible level of performance and correctness, it is an appropriate abstraction.
To know what code is "actually doing" is relative. No programmer knows what his code is doing at the level of gates on the chip. It's rarely necessary or even helpful to know what the code is doing at the level of CPU instructions or microinstructions. This is especially true if the code runs on multiple CPUs.
Compilers surpassed humans when it comes to optimization a very long time ago, except for very small programs or very short inner loop bodies.
Hahah, no they didn't. If this were true anything multimedia-related would not still require tons of hand-optimized SIMD code. Even the best of C or C++ compilers are terrible at vectorization of code. Just compile something like x264 with the best C compiler you can find and compare it to the assembly-optimized version for, say, the latest i7 processors. The difference in speed will be quite huge.
And what in the research paper itself would you dislike, then?
In particular, what do you dislike, more than all the "layers of abstraction" a compiler already adds between you and assembly-level management of code and machine state?
As far as I'm concerned, FP / immutability *is* the thing programmers should learn to describe, not the threading details themselves. Describing immutability is more profound than the actual threading details. If you want to execute code out of order in sequence (=multi-threaded) and have a predictable result, it needs to be composed only of commutative operations, and operating only on immutable values (at least for the section that is executed in parallel, rather than in sequence). Threading doesn't allow you to get around that, either way. And people already largely know how to describe operations (it's up to the library developers to decompose as much of the basic ones into commutative ones, though). In my opinion, the main task is to make people work with immutable values, nothing else.
It is possible to prove a multi-threaded program correct.
First you might start at the memory model, and all the guarantees that one can make based on that (and, by association, all the guarantees in terms of the memory model that locking / atomic calls make), then one can move on to he library structures and routines (objects and methods to put it in another paradigm).
Once you have hard guarantees using primitives and low level constructs it might be easy to construct a state-based proof. One example is Cliff Click's lock free hashtable (don't know ifhe has a formal proof, but nearly so if not)
Correctness in multithreaded environments takes a different form than in a single thread, but it is nothing that cannot be managed. Generally there are only a few "tough locks" to crack and the majority of the speedup can be had. Some problems are harder than others, and some problems don't have great parallelism, but that is just a generalization of programming, really. "Some things are harder than others, but nothing doable is impossible"
md5sum
d41d8cd98f00b204e9800998ecf8427e
Honestly, you know what this says? This just says some programmers still need to go practice in the threading space.
Threading is not that hard at all for a VERY large class of problems. There are issues -- like, the memory bandwidth will eventually choke the ability of processors to get at nearby memory regions -- and you need some kind of sane plan to deal with cacheing in one core as compared to the next core when you share flags or data coming in during execution from elsewhere -- and you need to figure out when thread overhead starts to chew into your efficiencies -- but really, it's entirely doable, and isn't that difficult a problem space as compared to the actual problems *serious* programmers have to solve. It may just be that your "thing" isn't really all that threadable. Some things aren't. But it should also be very obvious when they aren't.
Then again, some problem spaces lend themselves very well to multiple core approaches -- I just finished a library that slices up images and then applies various separable algorithms to the slices in a variable number of threads (the user of the library can specify.) I wrote it in pure C, used posix threads, no sweat at all, in my 8-core machine, you get just about exactly the gains you'd think (x the number of cores) when the process is CPU cycle intensive, less as the ops are simpler and memory I/O rates rise.
Seriously. If threading seems that hard, you need to go do some studying. It's not the threading. It's you. You really should get on this bus.
I've fallen off your lawn, and I can't get up.
First, many problems do not have significant speed-up when parallelized. No automated tool can change that. Second, parallelizing things has absolutely nothing to do with Moore's law. Maybe look up the definition some time?
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
To further add, this is doubly compounded when running such things on, say, ARM processors. Anything multimedia related that doesn't have NEON SIMD or isn't using some form of hardware acceleration will run terribly slow relying only on compiler optimization.
Threading as it is today is inherently non-deterministic, which makes isolating and fixing threading bugs very difficult. It doesn't matter how comfortable you are in writing threaded applications. At some point you -will- introduce thread-related bugs in your application. In fact many threaded apps run for years with threading bugs lying in wait and they aren't triggered until some later change alters conditions just enough for a race condition or other problem to be exposed. Those dormant bugs are a SERIOUS PITA to debug and fix because you look through recent changes in your version control and everything looks innocuous.
Anything that can allow a compiler to prove correctness of concurrent execution is a welcome improvement. I'm all for additional education, but I wouldn't poo-poo this type of advance in compiler technology.
That is because of knowledge of processor details, memory details, comple operations, and, well, they [compilers] are better than us. Except that the ability to optimise depends on pure logic, of some form. As the state gets large optimization gets more complex.
Just like quicksort(3) is far faster than bubblesort so too is a highly threadable code faster than non-threadble code. Languages do not, contrary to belief, express intent, the provide a strict set of instructions that the computer MUST respect. In the end a good algorithm with no compiler help will beat optimized "dumb" code in all cases larger than "toy" (say, a few dozen "n" in Big-O notation)
Good thought about opimizing compilers, but it doesn't bear out.
md5sum
d41d8cd98f00b204e9800998ecf8427e
This is my pay day baby! My code is going to run super fast in multicore machines without any (more) extra work from me! Wooo Hooo!
Please take pity on my and let me enjoy my day dream for a few hours, without rudely inserting reality into my reverie! Thanks slashdotters, I know I can count on you!
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
The holy grail of parallel processing is teaching programmers how to handle parallel processing (and what domains can benefit and where).
A compiler may be able to thread your program, but it will be a long time before it understands your intent well enough to do it well. Also I can think of many situations under which such functionality may not even be a good idea at all. I would assume such a system would use a pool of threads to avoid constant thread construction overhead and if you get a few IO-busy actions threaded out in an ignorant fashion I think you will find your program blocking a lot more, and producing a lot less throughput than it should.
Also, the OP blithely stated that functional programming isn't happening -- yet features of the functional paradigm, like anonymous functions, have made their way into nearly every language of consequence and many new languages proudly tout functional programming features (see f#, scala, rust, clojure, and go). Perhaps pure functional programming is moving pretty slow, but it's features are not.
I was crazy back when being crazy really meant something. (Charles Manson)
Or we could just pursue CSP so that we have a friendly concurrency model with which to parallelize software that needs it, or just to introduce asynchronicity in a simple way. Shared-memory concurrency is pretty poor, whether or not a human user of the language is responsible for it.
This is nothing new. As in decades-old. Back when I was at DEC in the 1980s we had auto-threading compilers that worked very well on standard application code. Today, Intel has been doing auto-threading in its C++ and Fortran compilers for more than a decade and it is very effective for a large class of applications. Intel's compiler also has features that help you tweak the code for better auto-parallelism, notably the Guided Auto Parallel feature introduced in 2010. Other compiler vendors also have auto-parallelizing compilers.
I've been in the industry for decades, and every couple of years someone comes out with yet another programming language or technique that is supposed to revolutionize application parallelization. This is just another one, and many years behind the rest of the industry from what I can tell.
Actually you don't need functional. Java works fine with multi-threading (well, it does for my daily use). In fact, that is the real strength of the garbage collection model of Java, it can track and manage efficient release of resources in a multi-threaded system. Also Java has direct language support for synchronization mechanisms and extensive library support for multi-thread friendly data structures (Doug Leah is a legend!). Lots and lots of multi-threaded programs get written in Java, without needing functional languages, it's just that we we do write them they do work (after we test for various possible thread-related problems), work well, and we don't have to bitch about needing to change paradigms to get things going (eg. needing to change to functional languages when they are not always the best fit for many classes of problem).
Of course you'll hear those still stuck on legacy languages designed without multi-threading in mind whinging. That doesn't mean there isn't a straightforward, easy to learn language and development platform out there that isn't strong on multi-threading. There is - and it is cross-platform and general purpose too.
Innovation doesn't mean "doing something no one has ever thought of before" it means "doing something no one has done before". If Microsoft's technique works where Carnegie-Mellon's didn't then it's innovative, if it doesn't, it probably isn't. There could still be something innovative in the way they tried which could help someone else later, but mostly what we'd be looking for from an innovation point of view is "actually does what it says in the tin".
People who opine that a compiler advancement can get Moore's law back on track. clearly have no idea whatsoever what Moore's "law" states.
Moore's law has absolutely nothing to do with compilers or any other sort of software. It's about transistor count in integrated circuits. Specifically, Moore's "law" observed that transistor count of state of the art ICs had doubled every year since the IC was invented and he predicted that it would continue to do so for at least 10 years.
And he was approximately right (the best one can expect for a pseudo-prophetic utterance). But over time, the rate of increase has fallen below what Moore predicted, with an average doubling time of about 2 years. Now tech roadmaps assume more like 3 years. Every step is harder than the last and eventually we will reach a point of diminishing returns where the investment necessary to double IC complexity won't be seen as worthwhile.
Automatically paralleling compilers aren't new. SGI had one for C ten years ago. Intel has it for C++ and Fortran now. It's been done for Matlab. There's plenty of theory.
Outside of the supercomputing community, it's never gone mainstream. Microsoft is at least trying to do it for C#, which is a reasonable language. An interesting point is that this Microsoft approach exploits immutable objects. Immutable objects can be freely shared without locking, so wide use of immutable objects makes automatic extraction of parallelism easier.
I'd looked at doing something similar for Python, to get rid of the Global Interpreter Lock, Python's boat-anchor. Python already makes wide use of immutable objects, but doesn't gain any performance from them. If everything is thread-local, immutable, or synchronized in the Java sense, you don't need global locking. But the politics of the Python community do not favor performance. (Potentially, Python could be up to at least the LISP level of performance, within a factor of 2 of C, if the language was restricted in certain ways.)
Another part of the problem is the pernicious heritage of POSIX/Linux locking primitives. Many programmers think that locking is an library or OS-level problem, not a language level problem. Locking via library primitives means language doesn't know what data is locked by which lock. This makes optimization and race checking very tough.
The political and social problems here are tougher than the technical ones. So the open source community can't solve them. It takes someone with a big hammer, like Microsoft, to do it.
"Who tells whom what to do?" - V. Lenin
Even the best of C or C++ compilers are terrible at vectorization of code.
Yeah, and the best humans are terrible at allocating registers -- so bad, in fact, that the best C compilers ignore the register keyword. What do you think is more relevant to the general case: vectorizing multimedia operations, or allocating registers? Compilers are also better than humans at:
To put it another way, look at the Orbitz story, which is over a decade old now:
http://www.paulgraham.com/carl.html
On the one hand, you have hand-tuned assembly language. On the other, you have a program written in Lisp (a high level, compiled language) with some C++ mixed in (for managing memory). Orbitz was able to compete on speed, but more importantly, it was returning better results. It's not that the people who wrote that mainframe assembly language were idiots -- they were taking advantage of all sorts of special hardware features, they knew how to hack their machines better than anyone else -- it is just that the Orbitz algorithm was far too complex for efficient hand-rolled assembly code, at which point compilers are really the only choice. The mainframe guys were busy thinking about how to make use of special machine features in assembly language; the ITA team was busy solving the higher-level problem, and relying on their compiler to generate good assembly language code. This is a particularly telling line:
We disassemble most every Lisp function looking for inefficiencies and have had both CMUCL and Franz enhanced to compile our code better.
[emphasis mine]. They disassembled their code...and then improved their compiler when they saw problems. They did not hand-roll the code, they made the compiler do a better job of generating code. These are not lazy programmers, nor are they programmers who do not know how to use assembly language; they are programmers who understand that they have a tool that is far better at generating assembly language than they are, and that they have more important things to do with their time.
I deal with quite a bit of crypto code in my work. I have seen lots of hand-tuned assembly language, I dealt with code that took advantage of the AESNI instructions to perform very fast encryption. I am well aware that in small, highly specialized functions (like AES), humans are better able to utilize special instructions to improve performance. Those are niche cases, and the techniques used in those cases have very limited applicability (even SSE is fairly limited in its applicability, by comparison with the sort of code programmers write and maintain every day), and the techniques scale very poorly.
Palm trees and 8
Just like quicksort(3) is far faster than bubblesort so too is a highly threadable code faster than non-threadble code
First, just to be pedantic, I'll point out that quicksort is as bad as bubblesort in the worst case, to a constant factor (you should have picked heapsort or mergesort). That aside, it is worth noting (and I am somewhat bothered by this when it comes to TFA) that we still do not know if it is even possible to optimize any program by parallelizing it; see the NC-vs-P question:
https://en.wikipedia.org/wiki/P-complete
Multithreading is not a magic bullet, and in all likelihood it is not generally applicable.
Languages do not, contrary to belief, express intent, the provide a strict set of instructions that the computer MUST respect
Wrong on all counts. Imperative languages are a way to convey instructions to a computer; declarative languages do not convey instructions, and programming in a declarative language requires an entirely different mode of thinking (it is closer to asking a question that giving instructions). It is also not strictly necessary for the computer to do exactly what a program expresses; there has been some work on compiler optimizations that have a (tunable) chance of not maintaining soundness.
In the end a good algorithm with no compiler help will beat optimized "dumb" code in all cases larger than "toy" (say, a few dozen "n" in Big-O notation)
If you are convinced of this, try implementing something more complex than the algorithms you see in Knuth's books; say, this:
http://eurocrypt2010rump.cr.yp.to/9854ad3cab48983f7c2c5a2258e27717.pdf
Then ask yourself this: could the constant factors in your implementation be better? At the end of the day, big constant factors will hurt your performance so badly that you might as well have used an asymptotically worse algorithm; indeed, consider fast integer multiplication:
https://en.wikipedia.org/wiki/Sch%C3%B6nhage%E2%80%93Strassen_algorithm
10000 digits are needed before that algorithm actually outperforms the asymptotically worse Toom-Cook family of algorithms. Here is an even more extreme example:
https://en.wikipedia.org/wiki/Coppersmith%E2%80%93Winograd_algorithm
Sure, that's a better matrix multiplication algorithm in the asymptotic sense...but only for matrices that are so large that you cannot even store them on today's computers.
So really, while you are right that asymptotic improvements will always beat constant factor improvements (which is what compilers are mostly going to do for you), you are wrong to ignore the importance of constant factor improvements. In the real world, constant factors matter. In the real world, quicksort and mergesort will use asymptotically worse algorithms below a certain problem size because of constant factors. In the real world, large integer multiplication is done using Karatsuba or Toom-Cook methods, not FFT methods, because of constant factors. In the real world, if you are not using a compiler to optimize your code, your code is going to be slower than it needs to be, even if you spent hours hand-tuning it, unless you are only dealing with toy problems.
Palm trees and 8
"Using threading" as in spinning off a thread or two is not hard to teach. However, even when people know to use threads, and, more importantly, know how to use them right (avoiding race conditions etc), they are still not necessarily optimal at determining when to fork, and how many threads to create at each fork point. A compiler doing analysis of the entire code base to see what depends on what can come up with more opportunities to parallelize, some in very non-obvious places. This isn't really any different from many compiler optimizations that you see in today's C++ compilers - some of them are very non-obvious, when you look at the generated assembly.
Anyway, if you want to do "threading for the masses", you need to add another level of abstraction on top - tasks/futures, and chains thereof. Like Apple's GCD, or .NET TPL. It's much easier to reason in terms of forking and joining tasks than it is in terms of threads and synchronization primitives. Also, you can do nifty syntactic sugar on top of tasks (see C# async/await for an example) without hiding all their flexibility.
easy to learn language
Having taught Java to novices at one time, I can tell you that this is not true. Here is a very small example of why Java is harder to learn than other languages:
Yes, they should have used the ternary operator. Why does Java even separate these? Why not just allow conditional statements to appear everything, which is what you get in Clojure (and any other Lisp, and many other programming languages that use reduction semantics)? This sort of rigid syntax makes the task of learning Java needlessly hard, because learning Java means learning how to think in a very different, very unnatural way.
Palm trees and 8
Annotations are dangerous because humans mess them up in ways that maximally obscure bugs. Explicit threading has the same and more problems, including sometimes implicit architectural dependencies. When you have many $M invested in a code, it has to outlast generations of hardware systems.
For the best work ever done in compiler-managed parallelism, read papers about the Tera/Cray MTA's compilation system.
If something can be made more efficient - using a set of logic that can be programmed... and this itself can then be done in an optimisation program. If these 'rules' for optimisation are good they may be included in the compiler. They both try to do the same thing ultimately... to convert a language into efficient binary code that can be executed by the OS? So the OS should control the threads, based on what it knows is already in use and what threading model the code can accomodate?
Seems to me that a lot of people have been passing the buck on this one, not just the end software devs.
I'm not sure if this is meant to be funny or serious.
Some threading problems are hard, really hard. They aren't that common.
Which is why compilers should be handling as many threading problems as they can: so that rather than spending time and mental effort dealing with problems that are not so hard, we can spend time and mental effort on problems that compilers cannot solve for us. There is a reason that we use higher level languages, and that reason is not "laziness."
Palm trees and 8
Abstractions are the primary reason my four GIGAHERTZ computer with four GIGABYTES of RAM is less responsive than my 450MHz computer in 1998. All the performance gains are immediately devoured to make life easier for developers with no benefit to users.
Shutting down free speech with violence isn't fighting fascism. It IS fascism!
Wanna bet?
Threads introduce all sorts of extremely subtle timing-based logic and security bugs which even the smartest programmers in the world think they can handle but in practice don't.
To be fair, you don't even need multithreaded programming to see all the subtle timing issues that poor programers don't understand and make daily. It really isn't that difficult to write multithreaded programs, and ones that don't having "subtle timing-basic logic and security bugs".
How frequently do you spend time thinking about which registers are being used to store partial results in the programs you write? When was the last time you spent time thinking about which registers are being used by which variables, or when it would be a good time to store register contents in memory?
What about memory alignment and packing? Calling conventions? Near pointers and far pointers (OK maybe that one is a bit dated)? Most programmers do not think about these problems, because most programmers are trying to solve bigger problems and need to spend time thinking about the bigger picture. Getting mired down in low level details makes programmers less productive -- that is why we use programming languages.
So if we can have compilers that can automatically utilize multithreading, even if it is in a somewhat limited context, we should be happy. If you are happy to not be spending time thinking about padding data or allocating registers, you should be happy about automatic multithreading.
Palm trees and 8
All the performance gains are immediately devoured to make life easier for developers with no benefit to users.
Nonsense:
http://www.paulgraham.com/carl.html
A move to a higher level of abstraction that was so beneficial to software users that the entire market was upended. That is not in any way an isolated case; the history of the software industry is full of stories of good abstractions beating the competition and improving the state of the art. Abstraction is why Unix won, and the Unix abstraction was immensely beneficial to computer users.
Palm trees and 8
Actually I suspect it's something like dataflow but at a coarser grain. Figure out which basic blocks don't depend on each other then have them execute. Just guessing w/o reading the paper, but it's not like this stuff is new. And you still need to teach the programmers because no matter how good the compiler you can easily write code that isn't parallelizable. All those kids who thought theory wasn't useful in real life need not apply.
And what's with everyone calling processors "cores" now?
/casts detect troll 10 post radius. Multiprocessor technology was already established when Bill Gates was still dumpster diving for code samples, and a lot of the work that gave us modern high performance multiprocessing was done by companies like IBM, Intel (and Intel shootoffs like Sequent) back when Windows was just the latest new DOS application. At the extreme end of the spectrum, this is why all of the top 10 supercomputers run on the UNIX-alike known as Linux, and none run on a Microsoft based OS.
I'm afraid regardless of what Microsoft would now like to present as a PR image, they've consistently been embarrassingly late to the party. Let's not belittle their contribution though; they certainly throw more money at problems than any other tech company.
I write in Java, most days, so I depend on compilers to do a lot of work for me. I don't depend on them [compilers] to pick the correct algorithm for me.
Simple problems get library functions. Simple streams of instuctions get optiomized futher by the processor. It still requires an awareness of how to optimize the problem to get good results.
Can't sort a linear list like you sort a random access list, an't sort a directed cylic graph at all (lest you peer inside and find a wayto break the loops) at all, and you can't provide consistency, patition tollerence, and availability at the same time. Some things *are* written in stone, but not as many as I let on, but CERTAINLY more than I'd like!
Laziness need not apply.
md5sum
d41d8cd98f00b204e9800998ecf8427e
L4 isn't exactly trivial, but it's proof exceeds the length of it's code by whole factors.
Or forget the silly new stuff tied to specific platforms. We've had parallel languages and languages with parellel constructs in the past that are perfectly fine, even in not mutated by exposure to megacorporations. Occam, SISAL, Ada, etc. The trick is to get rid of the stupid programmers who don't want to deal with complexity or theory and get them back to doing mindless tasks on the web.
"MS R&D is the largest computer tech R&D in the world. Combine IBM, Intel, and AMD, and you get an idea of their size."
Citation need. Not disputing the first part. Just the second part about the relative size of Miscorsoft Research.
The paper in question is not really tied to any specific platform. They showcase it for C# (which is also multiplatform, by the way - there's always Mono), but the principles and math behind this technique can be applied just as well to any language with objects and references to them. And it does not replace what e.g. Occam did, but rather builds on top of that to add further, more high-level primitives to improve parallelism; it doesn't mean that you have to ditch the more basic building blocks, like the explicit parallel execution operator like Occam's PAR (indeed, the paper actually provides one).
Damn don't-reply-to-first-post curse. Reposting without typos: "MS R&D is the largest computer tech R&D in the world. Combine IBM, Intel, and AMD, and you get an idea of their size." Citation needed. Not disputing the first part. Just the second part about the relative size of Microsoft Research.
Sounds like user provides list of Structural/Control/Data Hazards. Compiler Pipelines code into blocks that can be ran in parallel. :)
Sounds familiar
Who logs in to gdm? Not I, said the duck.
These and other applications written in the source language are performance-competitive with established implementations on standard benchmarks
Translation: we didn't speed them up any, or at least not by enough that we care to share any number.
Amdahl's Law is difficult to overcome in auto-parallelising systems that rely on anything other than loop optimizations. Basically, in straight-line code, if you make 50% of the code run on multiple cores, you're only going to get at most a 2x improvement in speed. In practice, you won't even get that much (except in loops) due to added overhead. Bottom line: you can write papers about your wonderful auto-parallelizing technique, but when the rubber hits the road this is unlikely to lead to much performance improvement.
Have you read my blog lately?
..the point is that old code -- code that wasn't written to take advantage of multiple cores -- can now take advantage of multiple cores because of this new compiler without hiring you neckbeards to rewrite it.
Get rid of the eye candy and watch how responsive your computer is.
"First they came for the slanderers and i said nothing."
I guess it depends if you're writing the hot path in an operating system kernel/driver or if you're writing an application for the desktop. Details of the hardware make a lot of sense on one and not on the other.
someObj.someMethod(if(x == 2) { 5; } else { 10; });
Aarrrrghhh!!!!!111 Surely you would never *teach* students such monstrosities. The idea is to keep it simple, one statement or side effect per line of code. This helps you when you wave your debugger over the code, since it will automagically highlight the result of the statement. This also helps the people reading your program (and source code is meant for people firstly). You are trying to write Java in non-Java different style, no wonder that code you gave is so hideous. Of course, the Java compiler is so damn efficient that extra assignments for debugging and readability are no problem at all - let the machine mash the code, not you. Trying to mash everything into one line is for C and LISP coders (which I was at one time so I recognize the same style), not for someone used to multiple 1080p displays. Newlines really aren't that expensive, dontcha know?
As a language, Java is pretty awful; you are stuck with a single programming paradigm even in cases where that paradigm makes no sense, you are writing enormous amounts of code to express what you are trying to do, and features that programmers in other languages take for granted are just not there (or are very limited). Meanwhile, you have Scala and Clojure on the JVM -- so why not just use these where functional approaches make sense, like for parallel algorithms?
I would argue that Java is more general purpose than Scala or Clojure, at least for the kind of problems I'm often trying to solve (eg. device control, 3D graphics, medical applications, numerical modelling etc). Perhaps Scala and Clojure fit your/people's problem domains, if so, then I have no problem with people using them. Don't mistake that for being useful in the outside (non-teaching) world. Note also that functional constructs do have some overheads that iterative-style code does not (where the latter is not exclusive to, but more in Java's 'style', yeah?). For my problems that are massively iterative (eg. some parts of graphics, numericals) this does make a difference, and the functional constructs are unnatural and used so infrequently to be useless. The usefulness of threading is above this (eg. the multi-threading is coarse grained and limited to a few dozen threads, again functional programming doesn't offer enough of an advantage in this regard).
Back to your teaching. Surely new students follow the deterministic procedural code better at the start than having lots of anonymous closures popping up all over the place? It is more advanced users that like the constructs that bring them home to the Source, that is, LISP.
Yes, they should have used the ternary operator.
No they should not have. The ternary operator is for C programmers making the jump. Again, newlines are not expensive. There is nothing wrong with making an if statement and assigning to a variable. Then give the output of that variable to your function call. One statement or side-effect per line. The compiler can make this fast. Someone who is not you can read the code without thinking about it (unless they are snob, and think the code is too simple and should be slightly obfuscated in the way you have shown). With lines split out in this way you get to put a debugger on the statement you want because multiple statements are not mashed together, and again, the debugger will automagically show you the value of the variable or expression as you move you mouse over it (less clicking is good, yeah?).
If aiming for simplicity is good enough for Einstein and Steve Jobs, then who am I (or you?) to argue? As they say, "You can write bad FORTRAN in any language" and your example proves it. Just because Java (and all other languages) allow obfuscated constructs because people think terse code is better (it's not!) and those that write terse code are better coders (not true, such things are only held in esteem by those lacking several decades of experienc
"From everything I've read about threading"
I think you just made his point for him.
Have you ever proved the correctness of a program? I have. It took a long time. And it was a very simple program. From that we learned that correctness proofs are for when you're doing very simple embedded programming with lives on the line. For everything else there's sanity checking, exception handling and debugging.
I think his complaint was there are a lot of shitty abstractions in use that happen to import a kitchen sink and bathroom in when you only need a bedroom, he just didn't realize it. Most of that slowness is due to the fact that hard drives and flash in small devices is still many times slower then the speed of the processors. I think .Net was bigger then a W95 install. Stick a fast SSD with high IOPS and you kick even the lightest weight OS from '98 out the door.
The other thing he fails to state is that the OS from '98 will die a hot death on the net today. You do use up a bit more processing time verifying that your input isn't trying to kill you.
WAT?
One of the things that got the llvm project moving along was work on Single Static Assignment compilers. The basic idea is that you take the parse tree and transform it to find segments of the tree that don't depend on other branches of the parse tree. Once you find those, you're free to dispatch the independent branches to be run wherever it's convenient.
And this was like a decade ago. Nobody is saying that to get parallelism you have to write functional code. Did somebody read an article on Erlang in CIO magazine and get confused?
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
I don't know what portion of their staff is research, but IBM has 440,000 full-time employees, MSFT 90,000 (yahoo finance).
Contribute to civilization: ari.aynrand.org/donate
This sounds very like work done by Josh Fisher leading up to his Ph.D. thesis in 1979.
"And what's with everyone calling processors 'cores' now?"
Simple - it's because "processor" still generally refers to the chip and its packaging as a single device.
Most processors have a number of structures and resources that are shared by some or all of their execution units (such as a processor-wide shared L3 cache versus a single core's dedicated L1 cache). Since an execution unit genrally can't be separated back out from the rest of the processor, but it still mostly functions as an independent unit, it doesn't make sense to call such a unit a "processor". Someone had to come up with a term that describes those execution units, and since they collectively make up the central operating core of the processor, "core" just makes sense. Who came up with that? No clue.
Optimization has a long way to go, and I'd guess that it's really difficult to write a good optimizer. I recently wrote a 128 bit x86 assembly cosine routine more than twice as fast as the maximally optimized libquadmath routine written in C.
Contribute to civilization: ari.aynrand.org/donate
I don't know what morons you have worked with, but straight out of college every company I talked to expected me to know at least basic multi-threading. My company now is not even that big and we use multi-threading extensively. An auto-threaded compiler though could run circles around 99% of all developers. This is great for doing large levels of optimization automatically, and just like with manage languages it is a great asset to programming in general.
Yes, there are lazy developers that do not understand what a managed language is doing in the background, but look at it this way at least there is not near as much time wasted on debugging their lazy ass pointer arithmetic or messy memory leaks. There are also people that are insanely good at memory management such that they can rival or beat most managed languages, but the effort involved has highly diminished returns. Not only that but no more thread-lock that takes weeks to track down in highly complicated multi-threaded applications.
This in my opinion could be huge boost to lots of fields. Imagine a completely scalable genome sequencer running on a super-computer. We are talking minutes or even potentially seconds to compute a sequence if it continues to grow on a Moore's law scale. Applications in physics, biology, and tons of other fields make this not just huge for computer science, but for everyone.
Moore's law is about transistor counts doubling, which is what has been occurring. An i7 has *a lot* of transistors.
But, no matter how many threads and cores you throw at a problem, you are still subject to Amdahl's law.
Of those, IBM have around 439,999 project managers.
I always just called the "core" the processor. Sure they share some higher level caches but that's only because someone moved it into the chip (or at least the package). What do you call it when the entire processor doesn't even fit on a single chip, or single board? The distinction about what's on silicon, what's in a "package", what's on a board, etc, is all implementation detail and should not affect the naming that everyone has understood for decades and is described in all textbooks.
And it remains a small subset.
The so called breakthrough at microsoft sounds a lot like the polly.llvm.org project which started 3 years ago.
Best ever comment ... (Score:+9)
AccountKiller
So all those threaded programs in daily use are just my imagination?
Perhaps it's time to recognize that not all programmers are the same. If it appears that no programmer can handle threading, perhaps it's because ageist management usually pushes them out before they accumulate enough experience to actually manage it.
So, the choice now is between no more improvements in performance, hire experienced programmers to actually code efficiently after years of blowing efficiency off with assurances that next year's faster CPUs will be fast enough to make up for crappy code, or hire programmers experienced enough to SUCCESSFULLY write multi-threaded code that can take advantage of the new bazillion core processors.
Perhaps in another 20-40 years compilers will be as good at auto-threadibng as they are at optimizing serial code now.
Some people just aren't that smart. Are you quite sure that threading is no obstacle for someone of an IQ of 100, supposing said person is hard working? I don't know. I know that auto-threading compilers could save me a lot of headaches.
Abstractions are a source-level concept. They should be more or less compiled away, especially in a JITed language like Java/C#.
That said, the, uh, nephew(?) poster makes a good point: bad abstractions can cause problems.
First of all, I would like to point out that many of these ideas present in the paper have been proven to not work practically, as one will take immediate note of the use of C# and similar architectural similarities and assumptions with regards to Microsoft's Universal Compiler. Microsoft (et. al.) I would like to point out, many other organizations with far stronger talent and resources have also been working on the problem of Thread mutex issues.
With regards to Microsoft, they thought they had the problem solved with the .Net architecture of separating language constructs to code generation and with it reducing it to proof at runtime. Problem is it would seem they could never get it quite right with either bad code being produced, or the human coder coming up with situations and algorithms that just plain could not be proofed at runtime and ran single threaded anyway. Usually in a really bad way. (i.e. Employing the use of locking structures in the runtime code that really whacked CPU affinity and cache coherency.)
This resulted in some fairly spectacular failures with systems designed with these sorts of assumptions as described in the papers "Lemma's" section.
Secondly, I think given the length of time this problem has been around one hope remains for a more straightforward approach to language concurrent syntax. Like I said, this problem has been around and has been looked at extensively in my opinion by much more competent people long long LONG ago...(IBM Labs in 1990's PowerPC compiler papers there.). What hasn't been around is the hardware options for solving this issue. Now that we are at 22nm as well as using all three dimensions of that tiny space hardware will be advancing at a very pace for the next several years. With all of the extra room, there is hardware assist that may become practical to make at least the locking facilities more straightforward with registers and dedicated memory to handle mutex atoms as a sort of map for the compiler to fill out for execution. No code analysis required.
So now I am SUPER skeptical. :-) Primarily because no knew computing hardware was mentioned by the authors and how the concurrrent algorithm generation would be assisted in the use of atomic operations on today's typical NUMA systems.
But after reading the paper, I am not hopeful that this is practical or that it even works correctly.
Microsoft has tried to foster this architecture on the exchanges by attempting to apply this sort of reasoning from similar papers in the past and apply it to massive computational concurrent computing and it failed in a spectacular way. (I am talking about the worlds exchanges on Wall St. London and Tokyo.)
After several extremely costly failures that occurred wherever this system was deployed, all of world wide banking exchanges went back to LINUX, which is currently the all time master of concurrent computing world wide on scales that leave this paper and its researches I am afraid far far behind.
All hand coded and with very little optimization work to boot via gcc.
-Hack
Got Geometrodynamics? Awe, too hard to figure out? Too bad.
I'm sure that they have produced all the worlds great literature the last 30 years as well. Just their size doesn't make them the source of everything good.
I haven't seen any GPU or CPU design used in modern computers come from MicroSoft. Sure, they have worked with it to implement it into their own software. They probably have been able to express their preferences to hardware vendors, in order to make them support the hardware and make it perform good in their software. High Performance multi-core kernel design in Windows was, if I call correctly, taken from VMS and Compaq design when they designed Windows NT. MicroSoft has been struggling to get over 16 cores effectively supported by their OS and most definitively by their applications. Linux has been getting great support from SGI when they ported their NUMA architecture from IRIX to Linux and building on that, Linux has been expanding and rewriting their multi core stuff more efficient and larger than MicroSoft has done with Windows.
Now what technologies has Linux taken from MicroSoft (free) research exactly? What MicroSoft patents were violated by Linux again? What's that eerie silence all of a sudden?
I was promised a flying car. Where is my flying car?
Ahhh, Occam, the memories come flooding....
Mono works worse on the average program than Wine, if you consider C# multi-platform then C/C++ using the Windows API is multi-platform even more.
someObj.someMethod(if(x == 2) { 5; } else { 10; });
In what version of java is this legal? I have never seen anything to make me think that would compile, and trying it out for myself I can't make it work:
class main {
void someMethod(int bla) {
System.out.println(bla);
}
public static void main(String[] args) {
main m = new main();
m.someMethod(if(3 > 2) { 5; } else { 10; });
}
}
All I get is errors on line 8 when trying to compile this. I don't know where you learnt this, but I don't think it was in Java class. Are you sure you're not getting Scala and Java confused?
>Or, gawd forbid.. we could teach programmers how to use threading?
My Master's Degree was in High Performance / Parallel Code, and I used to TA the subject, and worked at the San Diego Supercomputer Center.
I worked with one of the original auto-threading compilers back in the day (on the TERA supercomputer). It was a lot easier to port code to be automatically parallelized than it was to refactor code to be multi-threaded. Some code was easy enough to parallelize manually - for example, you can find one loop that does the lion's share of the calculations, and you just worry about parallelizing that - but some was a complete and total bitch to refactor. So an automated tool to find parallelism was pretty neat.
I ran the Quake 2 utilities through the Tera, and it found some minor speedups, but nothing too incredible. I had a few conversations with John Carmack about it, and it confirmed what he thought about auto-parallelizing, too.
So maybe Microsoft has finally done it right. There's no way to tell until I can test it for myself.
MS R&D is the largest computer tech R&D in the world. Combine IBM, Intel, and AMD, and you get an idea of their size.
For a company whose main job is to develop software, it is not surprising to have a large R&D.
But if you talk about research, can you please remind me how many Nobel prizes IBM and Microsoft have each won ?
Please recompile the .NET garbage collector with it.
And all that multithreading goes flat when the GC enters the global lock, when it is time to clear out a generation
The problem here is that unless the programmer knows what is going on wrt threading, the compiler will necessarily be limited to threading across short segments of the code.
OTOH, tools using some of the techniques in TFA to act as a sort of lint for threads would be a very good idea.
How frequently do you spend time thinking about which registers are being used to store partial results in the programs you write? When was the last time you spent time thinking about which registers are being used by which variables, or when it would be a good time to store register contents in memory?
Very seldom since the processors I use treat most registers the same way and I know that the compiler won't use the other registers needlessly.
I do however often verify that the compiler generated acceptable output because this is not always the case. For modern compilers to work as intended you have to structure your code accordingly. Changing from while(blah) {}; to a do {} while(bleh); and similar things can do wonders.
So, basically you tell the compiler what's immutable, readonly, isolated, etc. so it can parallelize the code easier. It seems clever, however, I don't see people good in tbb and/or openmp et al. rushing to drop and move over. But others might like it. Also, someone who's familiar with Intel Cilk Plus or Berkeley UPC could say a few words about this.
I am putting myself to the fullest possible use, which is all I can think that any conscious entity can ever hope to do.
declare normal references, read-only references to mutable objects, references to globally immutable objects, and references to isolated clusters of objects
Somebody re-invented type safe pointers, const pointers, pointers to const and container objects. ;)
If you talk about Nobel prizes, could you please remind a forgetful Swede how many are awarded in mathematics or software engineering? Or is research strictly a speaking only something that happens in physics, chemistry, medicine.... or literature... or (snrk) in (giggle) economics (bwa-ha-ha-ha-ha!)?
It's almost as hard as writing sentences that don't having grammar errors.
xkcd is not in the sudoers file. This incident will be reported.
With multithreaded programming, there are simply more ways to make mistakes. Far more than in writing a Slashdot post, and yet...
xkcd is not in the sudoers file. This incident will be reported.
This is good for number crunching applications that run for tens of seconds or more, but there may be a problem for small workloads: CPUs are normally clocked down when idle. If, say, Java was compiled with this tool and I tried to start Eclipse, it would have to realise that all four cores needed to be clocked up to nominal speed, rather than just one. This means more code will be executed with the very slow 800 MHz or so setting. Worse than that, if the compiler can't parallelize the program perfectly, you'll have cores with the clock speed constatnly being turned up and down. Or you can make the CPU frequency governor so sensitive that it doesn't clock down easily, but that will use more power. Similar concerns with the "turbo core" mode.
It just means that people shouldn't compile normal application software with auto-parallelization, but only actual number crunching code (Photoshop filters, for example, if they are not already threaded).
And GUI programs offer many more ways to screw up than batch programs...
Damn, a great comment completely spoiled by a stupidly incorrect final summary.
In fact, FFT methods are used "in the real world", when it makes sense. Your comment even says when it makes sense (> 10,000 digits) and then you blow it by pretending that 10,000 digits never occurs in "the real world". Well, sure bud, except when it does. 10,000 digits is a very small number in some domains, just not domains that you've considered.
So, nice post about constant factors - these are important things to understand - but STFU about "the real world". There are plenty of non-toy examples where hand-optimized assembler can still beat the compiler hands down. In the real world I spent three months optimizing the G729 codec on a certain processor, and made it two orders of magnitudes faster. I was paid enough to convince me that G729 isn't a "toy problem". According to your last sentence, things like this never happen in the real world, and I'm afraid that's bullshit. And you know it's bullshit because it contradicts everything you've already said, so I don't know why you even typed that in.
Did this also happen when you forgot to write the Data Access Layer or didn't use a template engine, UI framework or communication protocols? Since I guess that too is just for lazy programmers who don't know what their code is doing...
C# is a "toy" language akin to PASCAL, but not quite as good. And I challenge their claim is true if
all of the "features" of C# were utilized. For example, there are many basic features missing in C# that
require calling external functions in some MS DLL library.
hmmm, what exactly is the point of finding these "decreased computational complexity" algorithms (such as the matrix multiply coopersmith-winograd you mentioned or the integer multiply schonhage-strassen) if the constant in front of it is so large that the benefit is not seen for polynomial A (naive algorithm) vs polynomial B (lower computational complexity algorithm) until you get very far out on the x-axis (or would it be the n-axis?) ?
Is it purely to find a better algorithm, or is it to show that as the size of the problem increases, there might be a better algorithm that would actually be of use?
... is like Scala?
re: But even make will do things sequentially. Couldn't something like tsort be used to do a topological sort and list the dependencies and branching, and allow for branches that do not share dependencies to be run in parallel?
And the last one cleans the office and makes coffee.
I'd also like a citation. I won't dispute that MS does some neat things but I've seen a lot of Pro-MS postings tossing buzzwords around lately. At least this posting was on a relevant story.
Isn't this one of the myriad issues that the VLIW compilers ran into, since in VLIW, just about everything has to be done by the compiler and there are no dynamic analysis portions on the chip itself? That was what sabotaged VLIW's attempt to go one-up over RISC - the percentage of die area that RISC was using for functions like branch prediction, speculative execution, register renaming was really low, so VLIW didn't save that much in terms of silicon, but at the same time, the perfect compiler that it was banking on never materialized.
It would seem that the Itanium would be one of the best platforms to test the above theory. If only someone would make an Itanium based workstation w/ which to test that. Although since Microsoft has dropped support for the Itanium, that may be problematic. Or maybe they could take one of their Itanium boxes, put an internal version of Windows 8 on it, and then put this new compiler on it and try it.
Looks like this VLIW compiler would have been a perfect project for Microsoft Research.
hmm, what exactly is the point of finding these "decreased computational complexity" algorithms
That is a quite astonishingly ignorant statement. That's like saying "what is the point of any research".
I'll leave you to figure out that for yourself.
Perhaps you should research it.
SJW n. One who posts facts.
Well, arguably that is no harder than rewriting it all to be multi-threaded.
I think the articles's approach is a good one - it allows procedural code while hiding threading details from the programmer, which means that the programmers task is to properly tag code and not to try to carefully implement locks/etc.
Problem is that only a limited amount of parallelism can be achieved without changing the system itself - see Amdahl's law. Essentially, if in your system operation B consumes the result of operation A then A has to happen before B, and there's nothing a language or processor can do to help. As a result, any automatic multithreading will only be able to achieve a limited amount of speed-up. You'd be lucky to get a few times speed-up and extremely lucky to get a few tens of times. That may sounds a lot, but it isn't really, particularly if we find ourselves in a world of 1024-core CPUs. And even getting gains like this isn't totally automatic: in this case immutability hints have to be added, so they're not magically parallelising unmodified code in an existing language.
To go beyond Amdahl's law is a higher-level design challenge: you need to restructure things on larger and larger scales in order to get improvements. Not only that, but the parallelism choices you make on a larger scale can negate or even reverse the benefits of lower-level parallelism such as the language itself might have applied. As a trivial example, a problem might be embarrassingly parallel on a large scale, allowing you to run large sequential tasks side-by-side, but if each of those tasks is attempting to parallelise itself things are suddenly a lot more complicated.
This certainly doesn't mean that there's no point in having something that can automatically parallelise code at the small scale, or that finding ways to make things easy for programmers isn't a worthwhile area of research. However, it does make this a qualitatively different problem to something like choosing registers: you never have to completely redesign a system in order improve its regster usage. Some forms of parallelism could perhaps (it has't really been proven in practice) be considered a low-level detail that can be hidden by a language, but parallelism as a whole is something that affects design. As a result, however good the tools become, as long as there are programmers and a need for parallelism there will be a need for good programmers to understand parallelism, whatever language they are using.
I think it's better to roll all this up into simply using the right tool or technique for the job. You have tons of code to write, doing it all in assembler may not get you to the finish line very quickly.
I am really a big fan of profiling code projects and then looking for critical areas where speed ups are warranted and then writing these in assembler directly. If you know exactly how the CPU you are working with operates, you can optimize your assembler better than a compiler with all the tricks they try. I had an argument/competition about this with someone that was a compiler evangelist decades ago. I trounced his beloved Watcom C compiler. Things have changed a lot since then, but you will never convince me a compiler can out-wit ALL human beings.
Nobel Prizes? None. Because there is no Nobel Prize for computer science. The closest equivalent is a Turing Award. I haven't checked the whole list, however the 2009 winner works at Microsoft Research (and has since 1997), the 1998 winner is now dead, but was working at Microsoft Research between 1995 and his death. You can check the list yourself for IBM.
I am TheRaven on Soylent News
This is why you don't drop out of college. If your college is any good, they'll teach you how to design and code parallelized programs. Threads only introduce subtle bugs when you use them wrong (and you're using a threading library instead of reinventing the wheel right?). You don't waste time parallelizing trivial parts of your application. You parallelize the data and/or CPU heavy computations. There are some problems which are trivial to parallelize.
When was the last time you developed any type of program that was proven to be correct? Do your hello world programs handle a closed standard output stream?
You just did not "get" it, did you?
It matters for both/all situations if success of the application depends on how fast it can happen. Think of weather simulations and nuclear explosion simulations and biological simulations. All those need to be as fast as can be to be as useful as they can be.
The point that was so well-said by the previous poster is that you doing multithreading to the nth degree while writing comprehensible code is pretty much impossible. Compilers, however, can be taught how to do the intricate detailed analysis of the execution flow that your code implies and figure out how to multithread it more correctly and more completely than you would ever be likely to do.
Let me repeat myself: except for very small programs or very short inner loop bodies.
Yes, and you're still wrong. Video/audio decoders are not "very small programs" nor are game engines which also extensively use SIMD.
Was functional programming supposed to be a way to take development back from... uh... Indians?
Do you know how we can write to log files in functional languages? For any diagnosability in a program, logging is essential.
"Log" is a difficult word to search these days, so couldn't find much on my intial research.
thanks
Bingo Dictionary - Pragmatist, n. A myopic idealist.
To whoever posted this originally:
The 1990s just called. Seymour Cray is on the phone laughing at you.
Sounds like you're running the wrong OS or the wrong applications. My 2gb/2.2Ghz dual core desktop is *much* snappier than my previous 500Mhz/512Mb desktop, including when it's doing four things at once instead of just one.
If I cared. Or had an edit button. Or if this was grammardot, news for grammar nazis.
DataStructures coursework shows students this much easily, per my subject-line above... now, on THIS from you? Small "correction::
"First, just to be pedantic, I'll point out that quicksort is as bad as bubblesort in the worst case" - by betterunixthanunix (980855) on Monday December 03, @09:19PM (#42175437)
In this case I'll outline, Bubblesort's actually FASTER than quicksort - however, ONLY IF THE DATASET IS ALREADY MOSTLY SORTED... fact!
Otherwise though, quicksort's probably THE most "all-around optimal sort" there is + easy to implement, especially since it's (or a variant of it) usually "built-in" to many GUI controls as a method (think list boxes), AND, over the most case scenarios (hence why in many to most GUI oriented controls with a sorting method "built-in", some variant of quicksort's USUALLY used)!
"In the real world, quicksort and mergesort will use asymptotically worse algorithms below a certain problem size because of constant factors" - by betterunixthanunix (980855) on Monday December 03, @09:19PM (#42175437)
See above - that's one you overlooked/omitted - mostly sorted datasets...
---
Sort strengths vs. weaknesses:
SelectionSort - Good for small datasets & very easy to implement plus decent on non-mostly sorted lists
InsertionSort - burns LOTS OF TIME on placement of item into sorted list
Bubblesort - best on MOSTLY SORTED datasets
Quicksort - has recursion overheads during pivot use for "divide & conquer" sort methods it employs
Mergesort - has recursion overheads during pivot use for "divide & conquer" sort methods it employs but has temp overheads
Heapsort - Uses b-trees, but has no temp "scratch area" like mergesort (avoiding paging)
CountingSort - Fast for small datasets (think 1000 items or less) that deal in integers (not so fast otherwise)
BucketSort - Fast for linked list sorting (faster than insertionsort typically) but overheads in linked list creation
GENERIC RULES:
A.) If your list is 99% sorted? Use Bubblesort.
B.) If you want a GOOD "generically fast" algorithm for sorts?? Use quicksort (good over a large range of dataset sizes and sorted vs. unsorted - optimal).
C.) If your list is TINY (think sub 100 items), use selectionsort
D.) If you use linked lists? Use Bucketsort
E.) If your list range is tiny (relatively speaking, say 5,000), use CountingSort
---
* You're also correct in that not EVERY PROBLEM LENDS ITSELF TO MULTITHREADING HELPING IT (in some cases, it can HURT performance, as in on a single CPU/single CORE based system)...
"Multithreading is not a magic bullet, and in all likelihood it is not generally applicable." - by betterunixthanunix (980855) on Monday December 03, @09:19PM (#42175437)
Agreed, 110%, by example!
E.G.-> Let's say I have this chain of events:
A = A-B
B = B+C
C = A-C
A can't get ANYWHERE until B finishes processing... C can't get anywhere until A gets done processing. This type of scenario isn't something to put threads into, it would ADD OVERHEADS needlessly, to NO GAIN!
APK
P.S.=> Personally, as far as multithreaded code goes? I've been writing it since 1996 using CreateThread API work, but I generally "steer clear" of "fine-grained" multithreading (where the threads can touch THE SAME DATA, since this is where lockup due to race conditions occur) IF/WHEN possible, & only use it IF I totally completely understand the issue & data...
I usually go "the easier way" using "coarse" multithreading" (where the threads operate on diff. kinds of data or tasks, such as an example of printing 1 spreadsheet page on 1 thread, & formatting another on another, & then updating interfaces on yet another (or, just using "idle loop time" on the latter))...
... apk
This would be an example of a small segment of of highly specialized code that can be hand optimized to great benefit as the parent pointed out. I have done similar things in the past and gotten similar results. This was for some AI code in college and yes it did wonders but that was only a small subset of functions that were run the most, the vast majority of the rest of the program was done in C as it allowed me to concentrate on the higher level design.
Time to offend someone
http://www.cs.iastate.edu/~prabhu/Tutorial/CACHE/amdahl.html
http://en.wikipedia.org/wiki/Amdahl's_law
Multi-process is the scalable architecture of choice. You get the same advantage of utilizing all available cores within a given hardware context as well as the ability to expand across multiple hardware contexts. Hardware still gets limited eventually by memory, cores, network throughput, disk, etc. Multi-process allows you to utilize more pieces of hardware to scale up your system.
Multi-threading unnecessarily complicates the ability to develop, debug and maintain a program and is still limited within the hardware context it's running. If you want real, scalable performance that isn't bottlenecked by cores, memory or other hardware context limitations, design for multi-process architectures.
Runesabre
Enspira Online
https://en.wikipedia.org/wiki/Microsoft_Research
http://researcher.ibm.com/researcher/search.php?sn=1
The idea is to keep it simple, one statement or side effect per line of code
Except that you are then going to either (a) duplicate code or (b) have to introduce and keep track of more variables. In what way is that simple?
You are trying to write Java in non-Java different style
Perhaps I was not clear: that was a mistake that one of my students made. They did not know Java, they were still trying to learn it, and the point is that the Java style of programming is very unnatural. "Call f with 5 if x is 2 or else 10" is how the student was thinking about the program; they were forced to think about it instead as "if x is 2 call f with 5, otherwise call f with 10."
Note also that functional constructs do have some overheads that iterative-style code does not
Yes, there are limits to functional programming. There are limits to all programming paradigms, which is why languages should not force programmers to use a particular paradigm in all cases (and as it so happens, Clojure does have a loop construct). One of Java's major problems is that it really lacks support for anything other than object oriented programming, with only limited support for other paradigms or styles.
For my problems that are massively iterative (eg. some parts of graphics, numericals) this does make a difference, and the functional constructs are unnatural and used so infrequently to be useless
Perhaps so, although I suspect that in both cases object oriented programming is not universally applicable, and that you would appreciate the ability to just write a function without having to dress it up with class definitions. When I was an undergrad, I knew a lot of professors in the EE department who used Matlab to do computationally intense signal processing, and I know people in the CS department here who use Matlab for image processing work (and they would rule out both Java and any functional language for that work).
Surely new students follow the deterministic procedural code better at the start than having lots of anonymous closures popping up all over the place?
They do, but what I was saying was the Java is not an easy language to learn. Let's put it this way:
If you have never programmed before, that is an awful lot to deal with in a first program. It is certainly more to deal with than this:
In the Java case, most of the questions students were asking were about public, static, void, main, class, String, and then questions about function definition syntax, what "args" even is, etc. The Lisp case is far simpler and far more straightforward. I would say that it is acceptable to teach Java if all that complexity would eventually lead to simpler programs, if it was just a matter of scale, but unfortunately that is not the case; Java's syntax does little to keep complexity down, and the "everything is object oriented" approach adds a lot of complexity.
I find it interesting that you would mention closures here; in introductory Java courses, it is almost guaranteed that students will have to write at least one class definition. Do you think students have an easier time dealing with classes and objects than closures? Why even bring up anonymous closures; do you think new programmers have an easy time dealing with anonymous classes and objects (do you think we even get that far in a single semester)? My experience with teaching students how to write classes was this: it's hard for people who have never programmed before, especially when they are writing programs that are tiny and benefit little from classes. I would say that object oriented programming
Palm trees and 8
Do you know how we can write to log files in functional languages?
By using a monad such as Haskell's IO.
The trouble is, you have to wrap with it any parts of the program that need logging.
My exception safety is -fno-exceptions.
teach a compiler?
So we have somehow gone from writing programs (a compiler is a program) to teaching a program?
Well we have certainly achieved artificial intelligence. Any fool who thinks there is a difference on this issue is themselves artificially intelligent and not using their own computer called a brain..
What does a compiler do but translate the input into output the hardware understands and preferably in an optimized manner.
So... you don't teach software how to translate, you program it to.
quicksort is as bad as bubblesort in the worst case
Pick the pivot using recursive median of 3 on some percentage of values. This will pull the worst case to O(n^2/log n).
I've spent a lot of time in the earlier part of my career writing multithreaded applications (mostly client/server stuff) and I got very good at it. That said, it's a complete waste of my time as a developer. Developers should focus on domain problems, not how to tweak the microprocessor. In my experience, most developers don't seem to ever fully get how to write multithread apps and threading problems are a royal pain in the ass to debug. I can't see a better candidate for automation than taking this out of developer's hands.
I used to write my own linked lists and hash tables but I don't do that anymore either. It's a new era and it's all about the domain.
Are agnostics skeptical of unicorns too?
By the way, Moore's law also took into account minimum component cost, so he was talking about complexity of components at the minimum cost, which also brings economics into the equation. It's not only that the number of components on an IC is (or was) doubling (and now it's about multiple cores), but it's that in order for this to be accepted, the cost of production cannot expand by the same factor, the cost of production should stay minimal, the increase should be a constant, not a function (which it is, the constant cost is the cost of development of the architecture and the cost of building a fab).
Wow, I thought you were gong to make it through an entire comment without reciting your religious mantra - I was really rooting for you. Tragically, you just couldn't do it. After writing a mostly on-topic message, you closed with
Bringing back human slavery will allow us to continue with Moore's law. Otherwise, we are DOOMED
Not sure if Lisp qualifies exclusively as a high level language. I'll dissassemble my functions all the time (it's built into the language) to see what assembly is being generated. If I wanted to, it's trivial to jump to the source where a compiler decision is made or a virtual operation defined and tweak it... not that I have.
Lisp is high level when in the development stage, but you can pretty much get as close to the metal as you want.
Don't complain about syntax, grammar, or spelling. There is no.hell like input on android.
The only optimization you can really do for an idle loop is to underclock your CPU so that the whole processor uses less power. But that just leads to the CPU not being the most power-hungry component of a computer. On tablets, for instance, the screen's backlight takes a lot more power.
This needs a +6:MakesYouSmarter tag.
That is a matter of scale. An audio or video decoder is tiny compared to an application server (JBoss, at 80MB of code, is on the small end; Websphere is 2GB), and even audio and video decoders are rarely written in pure assembly language (and you can measure the difference between codecs that were optimized by a compiler and those that were not, even with those hand-rolled optimizations). Game engine SIMD is exactly where I said it would be: in very small functions (i.e. programs) or in inner loops.
As I said, hand-optimization scales poorly. You are talking about niche cases involving instructions that were design specifically for those cases. In the general case, where you did not receive some special set of instructions to do things for you, humans are not nearly as good as compilers, even humans who are well-versed in their machine's architecture.
Palm trees and 8
"Is it purely to find a better algorithm, or is it to show that as the size of the problem increases, there might be a better algorithm that would actually be of use?" - by girlinatrainingbra (2738457) on Tuesday December 04, @05:49AM (#42177533)
Per my subject-line above? BOTH really... it's about efficiency & speed in different dataset scenarios!
Here's a "Big O notation" table of BEST case, AVERAGE case, & WORST case scenarios, respectively (for each general sort type):
---
Selection - BEST = O(N squared) - AVERGE = O(N squared) - WORST O(N squared)
Bubble = Same as above EXCEPT best case where data is MOSTLY sorted (again, here it excels)
Insertion - BEST = O(N) IF data mostly sorted - AVERAGE = O(N squared) - WORST = O(N squared)
Merge - BEST = O(N log N) - AVERAGE = O(N log N) - WORST = O(N log N)
Heap = same as Merge
Quick - BEST = O(N log N) - AVERAGE = O(N log N) - WORST = O(N squared)
---
* I.E. -> They DO all have merits in different scenarios (regarding sort state of data, and of course, size too, as well as if they incur recursion memory overheads, which can induce paging, etc./et al)...
Like multivariable calculus & computing optimization curves? There's often MORE THAN 1 SOLUTION to any problem...
(Trick is, finding which solution fits the data scenario best, for optimal performance in speed (and even memory efficiencies))...
APK
P.S.=> I posted more about it here -> http://developers.slashdot.org/comments.pl?sid=3291593&cid=42178921 with some examples...
Also - How/Where/When multiple thread usage has DOWNSIDES (on single cpu rigs due to overheads cost incurred, AND as well as a case example where all the threads in the WORLD don't make a difference, & can make it worse due to overheads costs incurred (due to "blocking" occurring on operations & their results needing to happen, first))...
... apk
How frequently do you spend time thinking about which registers are being used to store partial results in the programs you write? When was the last time you spent time thinking about which registers are being used by which variables, or when it would be a good time to store register contents in memory?
What about memory alignment and packing? Calling conventions? Near pointers and far pointers (OK maybe that one is a bit dated)?
Every day at my job writing embedded software for an automotive supplier.
That is a quite astonishingly ignorant statement. That's like saying "what is the point of any research".
And yet your comment seemed even more useless to me, than hers did to you... funny the way that works.
Please don't deride people on here for asking seemingly honest questions.
Or, gawd forbid.. we could teach programmers how to use threading?
I know how to use threading.
And, as a direct result of that knowledge, I push hard to avoid the usage of threads.
The problem is that when threaded code has bugs, they can be brutally hard to find, reproduce, debug, and analyze.
All of the code I have written professionally has deliberately avoided the use of threads. (I do application and system programming on embedded systems.)
As a result, I have saved my employer a huge amount of time and money over the past 8 years completely avoiding all those weird, flaky kinds of bugs that only show up occasionally in the field, and that can't be reproduced in a debugger -- that are the hallmark of bugs in threaded code.
Racist construct from decades past creeps into computing language. Why is this necessary? Holy grail != great white hope. How about rosetta stone, lost ark instead?
which even the smartest programmers in the world think they can handle
Nope.
The smartest programmers fully understand the level of difficulty and complexity that comes with using threads, and they know how brutally hard it can be to find, reproduce, analyze, and debug the problems that result from bugs in threaded code. And, as a result, they are smart enough to avoid using threads in the first place.
They know enough about threads to know that the amount of time for them to fully understand how threads would work in their code is far greater than the amount of time they are authorized by their employer to spend.
pfff... i designed a much better autothreading language, much better, much simpler, much more intuitive, and allows for data-parrallel threading. a long time ago, too. https://sourceforge.net/projects/tsunamiprogramm/
I know a lot of apps that could take advantage of this technology right now.
They do actually spend more, not to say they're spending it better. For 2011, 2010, and 2009, companies spent:
(billions)
MS: $9.0, $8.7, $9.0
IBM: $6.0, $6.0, $6.0 (IBMs 10-K filings do not list specific numbers. Each year says "approximately six billion.")
AMD: $1.5, $1.4, $1.7
Intel: $8.4, $6.6, $5.7
It was an obvious mistake to include Intel in the list of "more than ______ combined". Both Intel and MS spend more than AMD and IBM combined, though.
Thanks! Thanks for taking my question seriously and for giving me a useful explanation and answer. Also, thanks for the pointer to your other /. comment about this. :>)
Even the best of C or C++ compilers are terrible at vectorization of code.
The 1990s just called. Seymour Cray is on the phone laughing at you.
Really? Even simple examples like this:
Results in 765 byte monstrosity by latest icc and 1095 bytes with gcc
Moore's *observation*, as the poster noted, got sick in 2004; which is when CPUs stopped getting faster. The important thing here is the lying; most sources -- i.e. the computer industry and the pathetically-mendacious magazines -- still pretend *nothing* happened to Moore's whatever-it-is, but in fact obviously something happened and, most noticeably, it involves a *lot of money*.
Whatever this parallel press release is about, Microsoft's entire .NET feature program since around 2004 has been things that were supposed to make parallel processing work good. At last. And maybe this one will.
When I was an undergrad, I knew a lot of professors in the EE department who used Matlab to do computationally intense signal processing, and I know people in the CS department here who use Matlab for image processing work (and they would rule out both Java and any functional language for that work).
Matlab is ok (I use Octave from time to time). However, it not not really a general purpose solution. If you get to work outside your university you will come up against them. That is part of the problem with our discussion I think. From your post you are an academic working in CS department. Well, I'm an ex-academic (astrophysics) who gained sufficient energy to reach a free state. I work outside of university and get to build real systems in the non-ideal world outside the Ivory Tower. I'm not saying your observations are invalid, they just aren't practical with the variable quality of people you have out here.
With regard to the "extra effort" you suppose. Well, I'm well aware of the psychological limits of working memory because I also am trained in the Information Mapping methodology of technical writing (and have do aviation, where those limits were found). I continue to disagree with you. Making variables and breaking long chains of calls up helps debug unfamiliar code (eg. written by someone else, or yourself many years ago) because the machine can help you with it. You don't have to remember all the variables because you've given them great names like "specificImpulseNetwons" or "annualInterestRate". Note how units become important. I could rant how the CS that I learned as an undergraduate placed emphasis on the wrong things for industry: like worrying about how few lines a toy program can be reduced to; how making yet another language will solve all programming problems (for a specific domain); like proving the algorithmic complexity for some obscure domain. Don't confuse that with teaching students useful things; like always thinking about the units they are working in (failing to do so will famously get your Mars Rover crashed), or how to design stuff to be as simple as possible (so lesser programmers can build and maintain it), how to get sh!t done quickly, give them 'religion' about testing or how to write robust software (hints: test, check your preconditions always, validate input data, assume that the stuff can fail at any time and fall back to a safe checkpoint state, etc).
It is not very useful to continue this debate. You are interested in languages (for the sake of it I would assume, given your career is CS) and in constructs that are easy to teach. I'm interested in getting big projects done, getting them done on time, getting a team to produce it, making it bulletproof, and making it so that it can be deployed and maintained by folks who don't know anything about programming. This is why I recommended Java, and why so many enterprises use it - and because there is not yet a sensible alternative to it (yes, yes, the good is the enemy of the great - but I'd rather choose a good general purpose language over a great limited purpose language because of the enormous economic benefits of re-use).
Threading isn't hard, just learn to design instead of throwing code into files and hoping it works. I find it easy to visualize the way different parts of the system interact.
My first real-world project after college was to fix a program that took a 6 man team 9 months to create and another 6 months of debugging. I realized it would be the program to parallelize. Did a week of reading into threading, which seemed very simple, spent 3 weeks re-writing the program the program from scratch, took a week of testing, and now the program has been running in production for hundreds of customers for the past 3 years without a redeployment.
Best part is is it is about 1/3rd the lines of code from the original, runs about 1000x faster single threaded, uses about 1/10th the memory, and scales close to linearly with cores.
Now that was my first multi-threaded program, took about 1 week of reading and 1 more week of design, 3 weeks of coding, and another 1 week of debugging.
Threading isn't hard. Once one understands how a CPU(what is atomic and what isn't), thread scheduling(interrupts), and IO(blocking) all interact, the rest is simple. I don't even have a background in CS or Engineering. Like I said, I can "see" in my head how everything interacts, I don't know how many people do that, but it was natural for me ever since I started programming.
http://www.youtube.com/watch?v=FlsBObg-1BQ
* You're welcome... Hope it got you thinking (that's how it all starts).
APK
P.S.=> Do the same when you are able... pay it forward & what-not - that's all...
... apk
Every day, all we speak of -> http://start64.com/index.php?option=com_content&view=article&id=5851:apk-hosts-file-engine-64bit-version&catid=26:64bit-security-software&Itemid=74
"Which is theoretically correct but completely different in practice." - by leaen (987954) on Tuesday December 04, @06:52PM (#42186395)
Perhaps, for those of you that are theorists... I never have been. I am about "Where the rubber meets the road"...
* There is NO THEORY THERE ONLY, only work in practice...
(Talk's cheap - deeds, are not...)
APK
P.S.=> See subject-line... & the program is about 1-2 hours of work since 2003
... apk
In what version of java is this legal?
I think his point was, Java is harder because it does not allow that style of coding.
Just for the record, I don't agree. The line looks too complicated to be understood efficiently.
Wrong on all counts. Imperative languages are a way to convey instructions to a computer; declarative languages do not convey instructions
Alan and Alonzo might like to have a talk with you if they were alive.
Imperative languages convey instructions to a virtual computer that might or might not be implemented by feeding a direct translation of them to the chip. Declarative languages do the same, but the instruction is in such a form that it is inefficient to implement directly, e.g. "substitute parameters and expand the result".
try implementing something more complex than the algorithms you see in Knuth's books
Very few people ever code algorithms that even approach stuff in knuths book in complexity. (pouring featurism all over the codebase and then trying to fix the mess doesn't count. it's a different kind of problem)
In the real world, quicksort and mergesort will use asymptotically worse algorithms below a certain problem size because of constant factors
This is more because it's usually hard to instruct the compiler to properly expand fully inlined versions of cases where size<=16. If it could do that the result wouldn't be that far from optimal. These algorithms do not, in principle, suck enormously on small inputs.
Hmm, let's take a claim back. It's not terribly hard to imagine a declarative language that can represent impossible problems or problems with unknown impossibility. Such a language does not represent instructions in this world.
Intel has been doing this in their compilers for years. More marketing hype I suppose. You can read more here: Automatic Parallelization with Intel® Compilers.
Just like quicksort(3) is far faster than bubblesort so too is a highly threadable code faster than non-threadble code
First, just to be pedantic, I'll point out that quicksort is as bad as bubblesort in the worst case, to a constant factor (you should have picked heapsort or mergesort).
...
Multithreading is not a magic bullet, and in all likelihood it is not generally applicable.
Multithreading is a magic bullet for quicksort - Quicksort - Parallelization.
Great. Now Microsoft is taking all the fun out of programming.
Or, gawd forbid.. we could teach programmers how to use threading?
Who said it was an either/or proposition? People should learn to use parallel processors efficiently, but that doesn't mean that I'm going to turn down a compiler that can do a bit of it too. Otherwise, why bother having optimising compilers at all; clearly the programmer should have just programmed more efficiently!
I was reading this article until it said ".....a breakthrough out of Microsoft Research."...at which point I realized it was a scam.
EVERYBODY KNOWS MICROSOFT DOESN'T WRITE CODE!
Only false promises and bad checks for stolen technology.
HAHAHAAHAHAHAHAHAHAHAHAHAHAHAH
Point is, you make "stupid" errors when doing something as simple as constructing an English sentence. We all do, on occasion. Yet I am supposed to believe you're good enough to not make errors in a task that's much harder, and where there are far more errors to make?
Security has to rely on something more than "don't make mistakes", because statistically, people will make mistakes. Even the best. Especially when it's easy to do things wrong and there are many opportunities to do so, as is the case with multithreaded programming.
xkcd is not in the sudoers file. This incident will be reported.
And the point is invalid because:
1) I don't put the same weight on posting perfect English on slashdot as I do writing actual code.
2) Slashdot does not have an edit button. My code editors do.
2) While I am pretty good at the English language, it's not in my field. So, while I should not be taking a job as an English teacher, that says absolutely nothing about how I program. Oh, I am also very bad at drawing and coordinating colors. All of which has nothing to do with my ability to do my job.
3) You are correct that mistakes happen, however, in programming we have things like editors with grammar checking, debuggers with step tracing and watch windows, unit tests, best practices, and design.
4) Multithreaded programming isn't a difficult concept. In most cases it isn't difficult to implement.
Tell you what, why don't you show me some nontrivial multithreaded code you've written.
xkcd is not in the sudoers file. This incident will be reported.
Logging is a good candidate for Haskell's 'get out of FP jail free' function, unsafePerformIO.
Apparently there's a standard function Debug.Trace.trace for this.
Sure, define non-trivial.
Would you like some that I did in C#, C++, C, VB6, or 80386 assembly (My own written from the ground up protected mode OS)?
Where would you like it, and where can I send the bill for my time to strip out any confidential, trade secret, and/or private section?