Intel Updates Compilers For Multicore CPUs
Threaded writes with news from Ars that Intel has announced major updates to its C++ and Fortran tools. The new compilers are Intel's first that are capable of doing thread-level optimization and auto-vectorization simultaneously in a single pass. "On the data parallelism side, the Intel C++ Compiler and Fortran Professional Editions both sport improved auto-vectorization features that can target Intel's new SSE4 extensions. For thread-level parallelism, the compilers support the use of Intel's Thread Building Blocks for automatic thread-level optimization that takes place simultaneously with auto-vectorization... Intel is encouraging the widespread use of its Intel Threading Tools as an interface to its multicore processors. As the company raises the core count with each generation of new products, it will get harder and harder for programmers to manage the complexity associated with all of that available parallelism. So the Thread Building Blocks are Intel's attempt to insert a stable layer of abstraction between the programmer and the processor so that code scales less painfully with the number of cores."
We see Intel mainly as a CPU/chipset maker, but don't pay much attention to their software side. I believe they are one of the largest software development companies in the world. Between drivers, compilers, and all the other goodies to support all their hardware, they spend a lot of time doing software development.
And as much as they develop compilers to optimize code for Intel CPUs, most of the time the code will see a speed increase on AMD CPUs as well. Who else do you want developing a compiler but the people who made the hardware it's running on?
Will they add these features to GCC or make docs available so others can?
If compilers keep abstracting the CPU away from the programmer, and keep getting better at optimization, programmers won't need to write better code or learn new techniques to take advantage of all the power a few extra cores can provide.
Instead the programmer can concentrate on writing more understandable code.
Actually, I've always thought that telling the compiler what you wanted to do, instead of how to do it, would result in the compiler being able to determine the best path to take for a given task.
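To make the "what, not how" idea concrete, here is a minimal C++ sketch. The function names dot_how and dot_what are made up for illustration. The first spells out the loop by hand, the second only states that an inner product is wanted and leaves the loop shape to the library and compiler. Whether the second actually ends up faster depends on the compiler and flags, so treat it as an illustration of the style rather than a benchmark.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// "How": the iteration order and the accumulator are written out by hand.
// (dot_how is a hypothetical name used only for this sketch.)
double dot_how(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// "What": say that an inner product is wanted and let the standard
// library and the compiler decide how to walk the data.
double dot_what(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    return std::inner_product(a.begin(), a.end(), b.begin(), 0.0);
}
```

Both functions return the same value for the same inputs; the difference is only in how much of the mechanics the programmer has pinned down.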
Even more so for interpreted/compiled on the fly languages. They can be dynamically compiled to take advantage of whatever hardware is available on each machine, without the developer having to code for it.
If I am writing a quantum physics calculation package or compiling one (let us say... Molpro 96), I want it to correctly use the many cores I assign it to run on, using a highly parallelized Fortran compiler. I do not want to know how and why it does it, I just want it to do it. I ain't a computer scientist, I am not writing a thesis on computer science, and I don't care one iota about this. I leave that to the computer scientists. Neither does my chief care about academic computer science. Same for Intel multicore. The fun fact is that most of us want to use the power of the many cores, and don't care a bit about the how, the why, etc.
Fortran is dead, and it has had native parallel math since 1990. C is alive and it needs ugly hacks to get parallel math.
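For readers who haven't seen Fortran 90 array syntax, where a whole-array statement like c = a*k + b needs no explicit loop, the nearest thing in standard C++ is probably std::valarray. This is my own analogy, not something from the comment above, and valarray does not promise any parallel or vectorized execution; it only lets you write the whole-array expression. The function name scaled_sum is hypothetical.

```cpp
#include <cassert>
#include <valarray>

// Whole-array arithmetic without an explicit loop, roughly what
// "c = a*k + b" looks like in Fortran 90 array syntax.
// (scaled_sum is a hypothetical name used only for this sketch.)
std::valarray<double> scaled_sum(const std::valarray<double>& a,
                                 const std::valarray<double>& b,
                                 double k) {
    assert(a.size() == b.size());
    std::valarray<double> c = a * k + b;  // element-wise over all entries
    return c;
}
```

How well this gets optimized is up to the implementation, which is arguably the comment's point about C-family languages.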
>> As the company raises the core count with each generation of new products, it will get harder and harder for programmers to manage the complexity associated with all of that available parallelism.
I'm very surprised and disappointed by the pervasiveness of the myth that's being promoted, even among supposedly technically knowledgeable groups, that:
a) Writing multithreaded code is terribly difficult
b) You need to implement code to have the same number of threads as your target hardware has cores
Both of these are completely untrue, at least for the PC architecture.
The way to develop multithreaded code is to exploit the natural parallelism of the problem itself. If the problem decomposes down most neatly into one, three or 6789 threads, then design and write the implementation that way. Consequently the complexity of the problem does not increase as the number of cores available increases.
In the PC architecture case, attempting to design your code based on the number of cores in your target hardware just leads to a twisted and therefore bad and also non-portable design.
I'm surprised how few developers seem to understand that it's in fact OK, normal and often desirable to have more than one application thread running on the same core. You can't ensure, or even assume, that your multi-threaded app will get one core per thread even if the hardware has enough cores, or that it will work best if it does, because core/thread allocation is dynamically scheduled by the OS depending on load. Not to mention that all sorts of other apps, drivers and operating system tasks are running concurrently too, so depending on each core's load, one app thread per core may not be the optimal approach anyway.
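A rough C++ sketch of decomposing by the problem rather than by the core count, under my own assumptions: the work is summing independent rows, so there is one task per row, and the function name row_sums is hypothetical. Nothing in it asks how many cores the machine has; the OS scheduler decides where each thread runs.

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <vector>

// One task per row: the number of threads follows the shape of the
// problem, not std::thread::hardware_concurrency().
// (row_sums is a hypothetical name used only for this sketch.)
std::vector<long> row_sums(const std::vector<std::vector<int>>& rows) {
    std::vector<std::future<long>> pending;
    pending.reserve(rows.size());
    for (const auto& row : rows) {
        // std::launch::async requests a separate thread for each row.
        pending.push_back(std::async(std::launch::async, [&row] {
            return std::accumulate(row.begin(), row.end(), 0L);
        }));
    }
    std::vector<long> sums;
    sums.reserve(rows.size());
    for (auto& f : pending) {
        sums.push_back(f.get());  // wait for each row's result in order
    }
    return sums;
}
```

With four rows this starts four threads whether the box has one core or sixteen. For thousands of tiny tasks the cost of creating threads would start to matter and a pool would likely be a better fit, but the decomposition itself would not change.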
Almost everybody who can write better assembly than GCC is already working on compilers and optimization. Even GCC is better than most programmers' hand-optimized assembly. Many times over the past several years I've seen open source projects throw away assembly source because the C version is faster and more readable. (WINE in particular benchmarked their hand-optimized routines and found themselves soundly beaten by GCC.)
These days, a similar thing is happening with vectorization. If programmers try to do it manually, odds are that they won't do better than the compiler, but they will have wasted a lot of time on it. Eventually, we will probably see the same thing for multi-threading workloads. Compilers aren't stupid, and compiler writers are some of the best programmers around when it comes to optimization.
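As an illustration of the kind of code auto-vectorizers tend to handle well, here is a plain C++ loop with no intrinsics. The function name saxpy is the conventional one for this operation, but the code is just a sketch. Whether a given compiler actually emits SIMD instructions for it depends on the compiler, the target and the optimization flags (GCC, for example, typically needs something like -O3), so nothing here is guaranteed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// y[i] = a * x[i] + y[i] for every element. Each iteration is
// independent of the others and memory is accessed with unit stride,
// which is the shape auto-vectorizers look for.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    const float* xp = x.data();
    float* yp = y.data();
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

The hand-written alternative would be SSE intrinsics tied to one instruction set, which is exactly the kind of work the comment argues is usually not worth doing by hand.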