Revisiting Amdahl's Law
An anonymous reader writes "A German computer scientist is taking a fresh look at the 46-year-old Amdahl's law, which took a first look at the limitations of parallel computing with respect to serial computing. The fresh look considers software development models as a way to overcome parallel computing limitations. 'DEEP keeps the code parts of a simulation that can only be parallelized up to a concurrency of p = L on a Cluster Computer equipped with fast general purpose processors. The highly parallelizable parts of the simulation are run on a massively parallel Booster-system with a concurrency of p = H, H >> L. The booster is equipped with many-core Xeon Phi processors and connected by a 3D-torus network of sub-microsecond latency based on EXTOLL technology. The DEEP system software allows to dynamically distribute the tasks to the most appropriate parts of the hardware in order to achieve highest computational efficiency.' Amdahl's law has been revisited many times, most notably by John Gustafson."
The article makes little sense. The site of the DEEP project is more useful. It has the look of an EU publicly funded boondoggle. Those have a long history; see Plan Calcul, the 1966 plan to create a major European computing industry. That didn't do too well.
The trouble with supercomputers is that only governments buy them. When they do, they tend not to use them very effectively. The US has pork programs like the Alabama Supercomputer Center. One of their main activities is providing the censorware for Alabama schools.
There's something to be said for trying to come up with better ways of making sequential computation more parallel. But the track record of failures is discouraging. The game industry beat its head against the wall for five years trying to get the Cell processors in the PS3 to do useful work. Sony has given up; the PS4 is an ordinary shared-memory multiprocessor. So are all the Xbox machines.
It's encouraging to see how much useful work people are getting out of GPUs, though.
SMBC
"Xeon Phi = unavailable vaporware"
You know, I wrote a paper on SpMV for Xeon Phi and I got quite a lot of people from all over the world asking me for clarification and for code. So it seems to be quite widespread. You can actually buy some online, Google points to several vendors.
"in order to discourage folks from porting big science applications to CUDA"
There are two things wrong with this statement. First of all, I do not think scientists are discouraged from giving CUDA a shot. Just check any scientific conference and you'll see GPUs and CUDA everywhere. Actually, we see so much GPU programming that it is getting boring.
Also, porting to CUDA is difficult and alien for most people. If we can get similar performance using a programming model people are used to, how is that not a good thing? What is so good about CUDA? It is just pretty much the only way to get good performance out of NVIDIA GPUs.
The tradeoff between performance, hardware cost, and developer cost is a difficult one. I say let's throw them all in the arena and see what stands.
Disclaimer: my research is supported by both Intel and NVIDIA.
Amdahl's Law still stands. TFA is about changing the assumptions that Amdahl's Law is based on: instead of homogeneous parallel processing, you stick in a few big grunty processors for the serial components of your task and a huge pile of basic processors for the embarrassingly parallel components. You're still limited by the fastest processing of non-parallel tasks, but by using a heterogeneous mix of processors you're not wasting CPU time (and thus power and money) leaving processors idle.
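To make that concrete, here is a minimal sketch in Python. The function name, the 5% serial fraction, the core counts, and the relative core speeds are all made-up illustration values, not figures from TFA or from DEEP. It assumes the work is normalized to 1.0 and that the parallel part scales perfectly across cores.

```python
def runtime(serial_fraction, serial_core_speed, parallel_cores, parallel_core_speed):
    """Normalized runtime: the serial part runs on one core, the rest is spread evenly.

    All parameters are hypothetical; speeds are relative to a reference core (1.0).
    """
    serial_time = serial_fraction / serial_core_speed
    parallel_time = (1.0 - serial_fraction) / (parallel_cores * parallel_core_speed)
    return serial_time + parallel_time


SERIAL_FRACTION = 0.05  # assumed: 5% of the work cannot be parallelized

# Homogeneous: 1000 slow cores (speed 0.25); the serial part also runs on a slow core.
homogeneous = runtime(SERIAL_FRACTION, 0.25, 1000, 0.25)

# Heterogeneous: one fast core (speed 1.0) for the serial part, same 1000 slow cores.
heterogeneous = runtime(SERIAL_FRACTION, 1.0, 1000, 0.25)

# Amdahl's bound still applies: the serial part on the fastest available core.
lower_bound = SERIAL_FRACTION / 1.0

print(homogeneous, heterogeneous, lower_bound)
```

With these made-up numbers the heterogeneous mix is several times faster than the all-slow-cores setup, but it never drops below the time the serial part takes on the fastest core.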
Every time I hear someone starting about how Amdahl's law is wrong it means one of two things:
1. They want your attention and their topic isn't interesting enough without resorting to controversial statements.
2. They don't understand Amdahl's law.
Also, unless you're presenting a summary of the history of computing, you really shouldn't have a figure of Moore's law.
Some people handle this well. When they get to that point in their presentation they just say:
And this is the mandatory picture of Moore's law.
And skip to next slide.
In 2006 I submitted this (http://slashdot.org/comments.pl?sid=183461&cid=15153431):
"Researchers in the parallel processing community have been using Amdahl's Law and Gustafson's Law to obtain estimated speedups as measures of parallel program potential. In 1967, Amdahl's Law was used as an argument against massively parallel processing. Since 1988 Gustafson's Law has been used to justify massively parallel processing (MPP). Interestingly, a careful analysis reveals that these two laws are in fact identical. The well publicized arguments were resulted from misunderstandings of the nature of both laws.
This paper establishes the mathematical equivalence between Amdahl's Law and Gustafson's Law. We also focus on an often neglected prerequisite to applying the Amdahl's Law: the serial and parallel programs must compute the same total number of steps for the same input. There is a class of commonly used algorithms for which this prerequisite is hard to satisfy. For these algorithms, the law can be abused. A simple rule is provided to identify these algorithms.
We conclude that the use of the "serial percentage" concept in parallel performance evaluation is misleading. It has caused nearly three decades of confusion in the parallel processing community. This confusion disappears when processing times are used in the formulations. Therefore, we suggest that time-based formulations would be the most appropriate for parallel performance evaluation."
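A quick numerical check of the equivalence claim, written as a sketch using the usual textbook forms of the two laws rather than anything taken from the paper itself. The timings (1 unit serial, 9 units parallel phase, 16 processors) are invented, and the sketch assumes the prerequisite the abstract mentions: the serial and parallel programs do the same total number of steps.

```python
def amdahl(serial_fraction, p):
    """Amdahl: serial_fraction is measured on the single-processor run."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)


def gustafson(scaled_serial_fraction, p):
    """Gustafson: scaled_serial_fraction is measured on the p-processor run."""
    return scaled_serial_fraction + (1.0 - scaled_serial_fraction) * p


def speedup_from_times(t_serial, t_parallel, p):
    """Time-based form: t_parallel is the parallel-phase time on p processors."""
    one_processor_time = t_serial + p * t_parallel
    p_processor_time = t_serial + t_parallel
    return one_processor_time / p_processor_time


# Hypothetical measurements on a 16-processor run.
t_serial, t_parallel, p = 1.0, 9.0, 16

s_amdahl = t_serial / (t_serial + p * t_parallel)  # fraction of the 1-processor run
s_gustafson = t_serial / (t_serial + t_parallel)   # fraction of the 16-processor run

by_time = speedup_from_times(t_serial, t_parallel, p)
by_amdahl = amdahl(s_amdahl, p)
by_gustafson = gustafson(s_gustafson, p)

print(by_time, by_amdahl, by_gustafson)
```

All three routes give the same speedup. The two "serial percentages" differ only because they are taken against different baselines, which is why working directly with times avoids the confusion.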
Maybe it will be helpful again.
I am sure that means something...
I haven't thought of anything clever to put here, but then again most of you haven't either.
You can't cheat Amdahl's law any more than you can give birth in one month with nine women. The law is a rather simple idea, similar to chemical kinetics when you think about it, i.e., a rate-limiting step.
If you are interested in a non-mathematical description of Amdahl's law have a look at http://www.clustermonkey.net/Parallel-Programming/parallel-computing-101-the-lawnmower-law.html
HPC for Primates. Read Cluster Monkey
Amdahl's Law only applies to individual algorithms.
Besides which, Amdahl's law is an obvious truth unless you can make a process take negative time. All attempts to make Amdahl's Law sound fancy or complicated are a disservice. All attempts to pigeonhole Amdahl's Law into only applying to parallel design are a disservice. Any attempts to "revisit" it are either fallacious or focus on algorithm changes, which Amdahl made no attempt to address.
Amdahl's law in a nutshell: If you spend 10% of your time on X and 90% of your time on Y, you will never get more than a 1/0.9 (about 1.11x) speedup by optimizing X, even if you manage to make X instantaneous. Another way to put it is that if Y takes 9 seconds, you are never going to get the process under 9 seconds by modifying X...
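The arithmetic above as a few lines of Python, for anyone who wants to plug in their own numbers. The function name is just an illustrative choice; the 10%/90% split and the 9 seconds are the ones from the comment, with a 10-second total implied.

```python
import math


def overall_speedup(fraction_optimized, factor):
    """Overall speedup when `fraction_optimized` of the runtime gets `factor` times faster."""
    untouched = 1.0 - fraction_optimized
    return 1.0 / (untouched + fraction_optimized / factor)


total_seconds = 10.0  # X takes 1 second, Y takes 9 seconds

twice_as_fast = overall_speedup(0.1, 2.0)     # X made 2x faster
instantaneous = overall_speedup(0.1, math.inf)  # X takes no time at all

print(twice_as_fast)                   # about 1.05x
print(instantaneous)                   # about 1.11x, i.e. 1/0.9
print(total_seconds / instantaneous)   # about 9 seconds: Y is still there
```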
This most certainly does NOT break Amdahl's law. It simply partitions the problem to use the cheap gear for the embarrassingly parallel portion of the workload and the expensive gear for the harder to parallelize workload.
It necessarily cannot make a non-parallelizable portion (the serial part) run in parallel.
Note that what part of the problem is serial depends on the hardware. The lower the latency and the higher the bandwidth of the interconnect, the more of the problem you can get to run effectively in parallel. However, there comes a point where the problem cannot be decomposed further. The atoms that remain after that may all be run at once, but the individual atom will run serially. No matter what you do, 5*(2+3) can go no faster than serially adding and then multiplying (yes, you could do two multiplications in parallel and then add, but you gain nothing for it).
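A small sketch of the 5*(2+3) point, assuming every operation costs one time unit and there are as many processors as you like. The tuple encoding of expressions and the function name are made up for illustration. "Work" is the total number of operations; "depth" is the longest chain of dependent operations, which is the best time any amount of parallelism can reach.

```python
def work_and_depth(expr):
    """Return (total operations, longest dependency chain) for an expression tree.

    An expression is either a plain number or a tuple (op, left, right).
    """
    if not isinstance(expr, tuple):
        return 0, 0  # a constant costs nothing in this toy model
    _, left, right = expr
    left_work, left_depth = work_and_depth(left)
    right_work, right_depth = work_and_depth(right)
    return left_work + right_work + 1, max(left_depth, right_depth) + 1


original = ("*", 5, ("+", 2, 3))                # 5*(2+3)
distributed = ("+", ("*", 5, 2), ("*", 5, 3))  # 5*2 + 5*3

print(work_and_depth(original))     # (2, 2)
print(work_and_depth(distributed))  # (3, 2)
```

Distributing the multiplication lets the two products run at the same time, but the depth stays at 2 and the total work goes up from 2 to 3 operations, so nothing is gained.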
Amdahl came up with this law to voice his scepticism about parallel computing, in favor of better uniprocessors.
The law is revisited every day by anyone who needs to calculate the cost of software licenses.
Nowadays the cost of HW is minuscule compared to the cost of, for example, off-the-shelf Oracle databases, whose license fees are based on how many cores a system has.
Smart money would hoard dual-core Xeon processors and Oracle licenses while they are still available for these processors, as it can mean saving hundreds of thousands of dollars in licensing costs over the next few years.
Brannigan's Law.
Actually, you can buy Xeon Phis. We have a pair in one of our machines. Also, I don't see why using a proprietary NVIDIA system is any better than using a proprietary Intel system. If you care about interoperability, you should be using OpenCL.
Optimizing CUDA is almost, but not quite, as arcane as optimizing assembly code by hand. It requires a deep knowledge of the underlying architecture: the addressing, the memory read patterns, the role of each of the tiers of memory and the cost of moving between tiers, the size restrictions on each buffer, and how to coalesce the whole mess into a coherent answer. I once got a 30% performance increase by offsetting the addressing on my memory buffers so that they didn't all start on 16-byte boundaries. It allowed the data to be read in parallel and avoided collisions from the different processes trying to access the same block at the same time. The problem is that most programmers aren't particularly hardware-oriented, so CUDA comes with a steep learning curve if you want to do it well.
"You can actually buy some online, Google points to several vendors"
Provide a link. The only things that turn up are sucker traps that don't actually have Phis for sale.