Revisiting Amdahl's Law

Posted by Soulskill on Tuesday June 18, 2013 @06:15PM from the looking-for-new-loopholes dept.

An anonymous reader writes "A German computer scientist is taking a fresh look at the 46-year-old Amdahl's law, which took a first look at limitations in parallel computing with respect to serial computing. The fresh look considers software development models as a way to overcome parallel computing limitations. 'DEEP keeps the code parts of a simulation that can only be parallelized up to a concurrency of p = L on a Cluster Computer equipped with fast general purpose processors. The highly parallelizable parts of the simulation are run on a massively parallel Booster-system with a concurrency of p = H, H >> L. The booster is equipped with many-core Xeon Phi processors and connected by a 3D-torus network of sub-microsecond latency based on EXTOLL technology. The DEEP system software allows to dynamically distribute the tasks to the most appropriate parts of the hardware in order to achieve highest computational efficiency.' Amdahl's law has been revisited many times, most notably by John Gustafson."

54 comments

Buzzword-heavy by Animats · 2013-06-18 18:33 · Score: 4, Insightful

The article makes little sense. The site of the DEEP project is more useful. It has the look of an EU publicly funded boondoggle. Those have a long history; see Plan Calcul, the 1966 plan to create a major European computing industry. That didn't do too well.
The trouble with supercomputers is that only governments buy them. When they do, they tend not to use them very effectively. The US has pork programs like the Alabama Supercomputer Center. One of their main activities is providing the censorware for Alabama schools.
There's something to be said for trying to come up with better ways of making sequential computation more parallel. But the track record of failures is discouraging. The game industry beat its head against the wall for five years trying to get the Cell processors in the PS3 to do useful work. Sony has given up; the PS4 is an ordinary shared-memory multiprocessor. So are all the Xbox machines.
It's encouraging to see how much useful work people are getting out of GPUs, though.
1. Re:Buzzword-heavy by Anonymous Coward · 2013-06-18 18:42 · Score: 0
  
  Thank you for not forcing me to read that retarded string of characters that slashdot apparently like to call "news".
2. Re:Buzzword-heavy by cold+fjord · 2013-06-18 18:51 · Score: 3, Interesting
  
  The article makes sense, but I don't think the work appears to be especially innovative even if it could be very useful.
  It is more than governments that buy supercomputers. They are also used in industry for things like oil and gas exploration, economic modeling, and weather forecasts. Universities and research organizations also use them for a variety of purposes. Time on an actual supercomputer tends to be highly valuable and sought after. You may disagree with the use, but that is a different question from not being used effectively.
  The Secret Lives of Supercomputers, Part 1
  
  "It is probably the biggest trend in supercomputers -- the movement away from ivory-tower research and government-sponsored research to commerce and business," Michael Corrado, an IBM spokesperson, told TechNewsWorld. In 1997, there were 161 supersystems deployed in business and industry, but that figure grew to 287 by June 2008, he noted. "More than half the list reside in commercial enterprises. That's a huge shift, and it's been under way for years."
  Uses for supercomputers
  
  --
  much of left-wing thought is a kind of playing with fire by people who don't even know that fire is hot - George Orwell
3. Re:Buzzword-heavy by cold+fjord · 2013-06-18 19:05 · Score: 3, Interesting
  
  Demand Surges for Supercomputers
  Do Supercomputers Still Matter?
  Oil giant Total builds "world's largest commercial supercomputer"
  
  --
  much of left-wing thought is a kind of playing with fire by people who don't even know that fire is hot - George Orwell
4. Re:Buzzword-heavy by Anonymous Coward · 2013-06-18 21:04 · Score: 0
  
  As someone who actually gets paid by one of those EU publicly funded boondoggles I'd have to say they do quite well (at paying me).
  
  I'd even go so far as to say that there are some genuinely interesting results that come out of these projects.
  There is just a lot of additional work that goes into massaging your results into something that fits the EU's mission statements and goals.
  
  And of course the overall goal of developing some platform for EU industry or whatever is total BS.
  Research projects like this will never accomplish something like that.
  That mission is basically to develop solutions to problems that no one has.
  
  Posting anonymously for obvious reasons.
5. Re:Buzzword-heavy by rioki · 2013-06-18 21:50 · Score: 3, Interesting
  
  You might want to read / view these slides: An Introduction to Modern GPU Architecture. Especially slide 42.
  Modern GPUs are massively parallel in their execution. Yes, they work "only" on one image, but when rendering one scene the shaders work in parallel. For example, a fragment (aka per-pixel) shader will be run in parallel for each pixel, limited by the number of available shader units (aka cores). THIS is why you get the awesome performance: small, self-contained programs running in parallel.
6. Re:Buzzword-heavy by smallfries · 2013-06-18 22:28 · Score: 2
  
  How dare you criticise the author - he is a physicist and he has stooped to coming and telling us computer science types how to do it properly!
  There is a deeply appropriate xkcd but I cannot be bothered to find it. Decoding the garbage in the PCWorld story tells us that he is going to break Amdahl's Law by dynamically partitioning the workload between a fast single-threaded processor and many slower parallel processors. I would guess that by failing to make a fair comparison they can claim that the portion running under the boosted clock somehow beats the bounds predicted by Amdahl's law. Sadly it does not, as the law is worded in terms of the proportion of the code that can be executed on the parallel architecture.
  It is quite possible that much of the hyperbole was added as sales pitch, which is a little unfortunate as the dynamic partitioning and the toolchain support are far more interesting anyway.
  
  --
  Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
7. Re:Buzzword-heavy by RabidReindeer · 2013-06-18 22:54 · Score: 1
  
  The trouble with supercomputers is that only governments buy them.
  Actually, not so. For about 15 minutes, I once owned a supercomputer myself, believe it or not.
  It wasn't a major supercomputer, but it was classified as a true supercomputer and I was acting as an intermediary for an oil industry company who had offended the seller, so the seller wouldn't sell directly to them.
  Governments are definitely big consumers of supercomputers, but universities also do a lot of computationally-intensive work, not all of which is necessarily government-funded. I've already mentioned oil companies. I'm sure there are other cases.
8. Re:Buzzword-heavy by mysidia · 2013-06-18 23:45 · Score: 1
  
  claim that the portion running under the boosted clock somehow beats the bounds predicted by Amdahl's law.
  Right... their system cannot 'break' Amdahl's law. They bypass it by allowing the sequential portion of the workload to run on faster hardware, and the parallel portion of the workload to run on the massively parallel (but slower) architecture.
  Designing an approach that allows better parallel computing despite Amdahl's law does not necessarily imply breaking the law.
  It's more like: working cleverly within the constraints of Amdahl's law, to evade the issue.
  For it to work, they have to have that super-fast sequential platform though.
  Unless there continue to be linear speed performance improvements in the fast sequential execution architecture (and not just more cores on the commodity PCs), they will still run into Amdahl's law.
9. Re:Buzzword-heavy by grizdog · 2013-06-19 00:22 · Score: 1
  
  I agree. The article is next to worthless. In particular, it appears (and that is the problem - the article is just too vague) that they are not counting the GPU time against Amdahl's law. That's splitting hairs, at best.
  There might be some "there there" if they tried to refine Amdahl's law to include different kinds of processors, and the kinds of physical restrictions they talk about. All the article does is say such a thing might be possible - I think we already knew that.
10. Re:Buzzword-heavy by delt0r · 2013-06-19 00:23 · Score: 1
  
  I use supercomputers all the time for my work. I am at a university, so this is perhaps a government one. But we are not idiots and use it quite effectively, thank you very much. Most of the supercomputers in the EU, at least, are for universities. They are mostly used quite well. At least all the ones I have used, which is quite a few of them.
  
  --
  If information wants to be free, why does my internet connection cost so much?
11. Re:Buzzword-heavy by kramulous · 2013-06-19 00:24 · Score: 1
  
  Somebody has an axe to grind.
  
  --
  .
12. Re:Buzzword-heavy by rgbatduke · 2013-06-19 00:59 · Score: 2
  
  Hey, don't disrespect physicists in parallel computing. Some of us actually understand how to do it properly and agree with what you state. Superlinear speedup is not precisely unknown, but it is rare and depends on architectural "tricks" that typically preserve Amdahl's law at a low level but apparently violate it at a higher level. In the naivest, stupidest example, if we counted processors instead of cores, even embarrassingly parallel code would exhibit superlinear speedup on a single-processor system. Replace core count with internal ALUs, pipelines, SIMD/MIMD in the architecture, onboard vector units, etc., and one can get the same sort of thing per core for just the right code.
  I am deeply skeptical of any sort of toolset that purports to be able to either statically or dynamically partition a given set of upper level code to get superlinear speedup. I won't say it is impossible to build a set that "works" for some fraction of the parallelizable code in the Universe, but given the complex tradeoffs between computation and communication in different communication topologies and task partitionings, this is not a problem that has a simple universal solution. I suspect that in lots of cases an experienced parallel programming human being could spend a half day analyzing the code and architecture and beat the output from an automated tool (or tie it, as the USUAL rule is going to be no meaningful superlinear speedup for boring coarse-grained parallel or embarrassingly parallel code).
  An interesting example of a tool that does this sort of tuning (semi-empirically!) that works is ATLAS, the automatically tuned linear algebra system. Basically it does a search of the space of partitionings and algorithms to determine the best combination of the two for performing basic linear algebra functions (BLAS) and then implements it in a transparent library. It is semi-empirical because it is nearly impossible to predict the overall effect of every combination of SSE support, clock speed, bus speed, core architecture -- it is a lot easier to just go and find out. But the problem ATLAS solves is comparatively simple relative to even static task partitioning on heterogeneous computational resources with variable costs for core-to-core communication, especially in today's multicore world where one has different speeds between cores on a processor, between processors in a system, between systems, between general purpose processors and special purpose e.g. GPU/vector processors, where the communication topology itself can have a major impact on the kind of parallel speedup any given task has.
  This, then, is the interesting part as you note, and who knows, maybe they've built a sufficiently intelligent system to get nominally superlinear speedup (or hell, who cares, just getting close to optimal speedup, sublinear or not) from a meaningful fraction of the space of possible parallelizable code. But God couldn't get superlinear speedup on fine-grained synchronous parallel code with long range coupling out of any multi-node scalable parallel architecture available in the real world today, no matter how fancy the partitioning tool.
  rgb
  
  --
  Even when the experts all agree, they may well be mistaken. --- Bertrand Russell.
13. Re:Buzzword-heavy by rgbatduke · 2013-06-19 01:20 · Score: 2
  
  Double ditto. I've written magazine articles on beowulf-style supercomputers I've built at home (I used to write a column for "Cluster World" magazine in the brief time that it existed, but I also wrote an article or two for more mainstream computer mags). I have also set up clusters for companies I've founded and helped others set up corporate clusters. Some of what are arguably the world's largest parallel supercomputers -- Google's cluster, for example -- are not government funded. Many others aren't government funded, they are built by companies that sell products to many entities, among them (perhaps) the government. Aerospace engineering companies all need supercomputers to do computational fluid dynamics on hull designs, for example. Ordinary engineering companies use them to do finite element analysis. Gaming clusters are by any sensible definition a highly parallelized, dynamically partitioning supercomputer.
  Ever since the invention of PVM and open source versions of MPI, anybody with a small pile of computational resources and a network has been able to implement a beowulf-style supercomputer built from them, an architecture so successful that nearly all of the supercomputers built in the world today are basically "beowulfs". I've helped a few dozen individuals (one at a time, not via my book or magazine articles) build beowulf clusters at home just to dink around with for fun, or to learn a new job skill, or to set up a learning cluster at a small community college or university. No government funding, often out of pocket funding or repurposing old computers that are lying around. Not all of these clusters could beat Moore's Law, which has inexorably eaten Amdahl's Lunch after a few years (that is, by the time they were built it was often the case that a single processor over the counter computer at the high end of clock and so on would beat the small cluster made of older systems) but there is no doubt that they were supercomputer architecture with substantial (but sublinear) speedup compared to single threaded execution times for a suitable parallelized chore.
  Besides, it is useful to remember that your cell phone would have been considered a munition a bit over a decade ago. A better thing to state is that everybody buys supercomputers because almost every processor based system from navigation systems in cars to cell phones to tablets to personal computers is, these days, a supercomputer. My i7 laptop has four cores, eight contexts, and exhibits linear speedup on in-cache embarrassingly parallel code out to eight simultaneous tasks because Intel has done a pretty amazing job of internally parallelizing the execution subsystems for the contexts. It beats the hell out of almost all of the small clusters I've ever built, including clusters with many, many more nodes. Build even a small stack of i7 systems on a gigabit or better network -- two, for example -- and you've got a sixteen core supercomputer with a complex communication topology (variable speeds and nonlinear thresholds, as the i7 does stop giving you the purely linear 8 way speedup for large enough tasks and drops down to a bit over four -- again an instance of "superlinear speedup" of parallel code even WITHOUT using a fancy tool if you simply count cores instead of context and ignore internal parallelism for certain kinds of code that permits a single core to be managing memory I/O for one task while executing the other).
  rgb
  
  --
  Even when the experts all agree, they may well be mistaken. --- Bertrand Russell.
14. Re:Buzzword-heavy by postbigbang · 2013-06-19 03:03 · Score: 1
  
  Go back further to Von Neumann and you'll see that this is a hybrid model, where the state machine is respected, with management processes acting as controller daemons to child processes. It's not really a bypass, just a hybrid representation, as the distributed portions still respect Amdahl's precepts.
  
  --
  ---- Teach Peace. It's Cheaper Than War.
15. Re:Buzzword-heavy by phantomfive · 2013-06-19 03:29 · Score: 1
  
  Also worth mentioning that IBM sells more mainframes now than they did 20 years ago......
  
  --
  "First they came for the slanderers and i said nothing."
16. Re:Buzzword-heavy by Anonymous Coward · 2013-06-19 03:39 · Score: 0
  
  Please don't use Alabama as an example of how effective government is, or what governments use supercomputers for. Not all states are Alabama--this is why I don't live there. Some states use their supercomputers for other things, like biochemistry, aerodynamics, climate modeling, or you name it. Things that private industry would never invest in because they're too selfish to see the benefit.
  I'm tired of the government bashing. Sure, there's pork, but there's way more pork in private industry. And services like NOAA are critical to everyone.
  For every Alabama censoring project, there are private firms using supercomputers to cheat people out of money in the financial industry, or to make a buck by destroying ecosystems.
  To be clear, I don't hate private industry. I'm just tiring of the unfair comparisons that get made, like somehow private industry is a lily-white utopia without any problems of its own. Both have their advantages and disadvantages.
  FWIW, I agree with the general sentiment that this project seems like sort of a hand-waving exercise, which is common in research everywhere.
17. Re:Buzzword-heavy by Anonymous Coward · 2013-06-19 06:42 · Score: 0
  
  How dare you criticise the author - he is a physicist and he has stooped to coming and telling us computer science types how to do it properly!
  He's a computational physicist, moron. Did you even read the guy's credentials?
18. Re:Buzzword-heavy by Anonymous Coward · 2013-06-19 12:37 · Score: 0
  
  There is a deeply appropriate xkcd but I cannot be bothered to find it.
  This one?
19. Re:Buzzword-heavy by smallfries · 2013-06-19 22:06 · Score: 1
  
  That's a bingo.
  
  --
  Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
20. Re:Buzzword-heavy by smallfries · 2013-06-19 22:10 · Score: 1
  
  Your phrasing is kind of hard to parse - I actually can't tell if you are agreeing with what I wrote, or arguing in a passive-aggressive way. This implies that I have had too many arguments with passive aggressive people recently and I need to learn to read things more neutrally again. But yes, that is what I was pointing out: tweaking the frequency in the fast sequential part is still covered by Amdahl's law, contrary to their wild hyperbole.
  
  --
  Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
21. Re:Buzzword-heavy by Bengie · 2013-06-20 05:24 · Score: 2
  
  GPU cores are broken into groups. Each group must be doing the exact same instruction at the exact same time. Branches are horrible for performance, as they will force some cores to stop computing altogether while waiting for the branch to finish.
  
  There are many concurrent algorithms that don't like to keep the execution path in perfect sync. This is where a many-core CPU will take out a GPU in performance. GPUs also have horrible random access and very small caches. Actually, the per core cache of GPUs has been going down over the years for both nVidia and AMD.
  
  GPUs are excellent for what they're good at, and horrible for everything else. If I could remember the link, there was a 100GFlop Intel CPU kicking the crap out of a 2Tflop nVidia GPU on transcoding, even though both code-paths were highly optimized for each architecture. It just so happened that the algorithm used did not play well with GPUs.
SMBC by klapaucjusz · 2013-06-18 19:05 · Score: 2

SMBC
1. Re:SMBC by godrik · 2013-06-18 19:27 · Score: 1
  
  Yeah, there is nothing wrong with Amdahl's law. People who need to care about it clearly understand what it means. That is to say, when you increase parallelism, sequential parts become bottlenecks. You need to reengineer the problem/algorithm/architecture around that new bottleneck.
2. Re:SMBC by Anonymous Coward · 2013-06-18 20:39 · Score: 1
  
  "Sequential parts" usually means "We won't know how to proceed further until the previous step is done". However, if you have really massive, 2^data_word_length or higher scale parallelism, then you can actually try guessing, and executing next step on all possible outcomes of previous step, then throwing away every result but one as previous step completes. Even if your parallelism is of lower scale, statistically it may still yield some speedup, whenever you happen to have a lucky guess. Sure beats letting all those processors just idle helplessly.
3. Re:SMBC by mysidia · 2013-06-18 23:51 · Score: 2
  
  then you can actually try guessing, and executing next step on all possible outcomes of previous step, then throwing away every result but one as previous step completes.
  However... this requires power consumption, and it still does take time and tie up your infrastructure working on the 'guess'. Meanwhile, the previous step completes, and your CPUs are all still busy working on guessing the previous step, and you need additional sequential overhead to initiate and terminate the guessing process.
  You show 'CPU usage' on your 'idle cores', but it's 99% waste heat.
4. Re:SMBC by TheLink · 2013-06-19 04:18 · Score: 1
  
  I've long wondered if you can set up a quantum computer to process "all possible paths" and then "collapse" on the most probable right answer.
  After all you can have light beams (and other quantum state friendly stuff) that are superpositions of all possible states and perform functions on them.
Re:Xeon dream on by godrik · 2013-06-18 19:23 · Score: 3, Informative

"Xeon Phi = unavailable vaporware"
You know, I wrote a paper on SpMV for Xeon Phi and I got quite a lot of people from all over the world asking me for clarification and for code. So it seems to be quite widespread. You can actually buy some online, Google points to several vendors.
"in order to discourage folks from porting big science applications to CUDA"
There are two things wrong with this statement. First of all, I do not think scientists are discouraged from giving CUDA a shot. Just check any scientific conference and you'll see GPU and CUDA everywhere. Actually we see so much GPU programming that it is getting boring.
Also, porting to CUDA is difficult and alien for most people. If we can get similar performance using a programming model people are used to, how is that not a good thing? What is so good about CUDA? It is just pretty much the only way to get good performance out of NVIDIA GPUs.
The tradeoff between performance, hardware cost and developer cost is a difficult one. I say let's throw them all in the arena and see what stands.
Disclaimer: my research is supported by both Intel and NVIDIA.
Poor summary by Anonymous Coward · 2013-06-18 19:57 · Score: 5, Informative

Amdahl's Law still stands. TFA is about changing the assumptions that Amdahl's Law is based on; instead of homogeneous parallel processing, you stick a few big grunty processors in for the serial components of your task, and a huge pile of basic processors for the embarrassingly parallel components. You're still limited by the fastest processing of non-parallel tasks, but by using a heterogeneous mix of processors you're not wasting CPU time (and thus power and money) leaving processors idle.
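The heterogeneous argument can be put into numbers with a toy model. This is only a sketch under stated assumptions: the serial part runs on one fast core, the parallel part is spread perfectly over many slower cores, and communication costs are ignored. The function and parameter names are made up for illustration and are not taken from TFA or the DEEP project.

```python
def hetero_speedup(serial_fraction, fast_core_speed, slow_core_speed, slow_core_count):
    """Speedup over one reference core (speed 1.0) in a toy heterogeneous model.

    Assumptions (not from the article): the serial fraction runs on a single
    fast core, the rest is divided evenly over the slow cores, and there is
    no communication or scheduling overhead.
    """
    serial_time = serial_fraction / fast_core_speed
    parallel_time = (1.0 - serial_fraction) / (slow_core_speed * slow_core_count)
    return 1.0 / (serial_time + parallel_time)


def hetero_limit(serial_fraction, fast_core_speed):
    """Upper bound as the number of slow cores grows without limit."""
    return fast_core_speed / serial_fraction


if __name__ == "__main__":
    # 10% serial work, fast core 2x the reference, slow cores 0.5x the reference.
    for count in (16, 256, 4096):
        print(count, round(hetero_speedup(0.1, 2.0, 0.5, count), 2))
    print("limit", hetero_limit(0.1, 2.0))
```

In this model the bound is still set by the serial part: a faster serial core raises the ceiling, but no number of booster cores removes it.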
1. Re:Poor summary by Impy+the+Impiuos+Imp · 2013-06-19 01:06 · Score: 1
  
  Most of the cool stuff is pure parallel anyway, like the brain, or simulations of bodies made of atoms or cells. Plenty of room to grow regardless of some un-de-serializable algorithms.
  
  --
  (-1: Post disagrees with my already-settled worldview) is not a valid mod option.
2. Re:Poor summary by Anonymous Coward · 2013-06-19 12:17 · Score: 0
  
  So most of the cool stuff is pure nothing, right?
3. Re:Poor summary by Anonymous Coward · 2013-06-19 12:23 · Score: 0
  
  The smallest unit within any task is sequential. You're never "pure parallel".
Not again by Anonymous Coward · 2013-06-18 21:16 · Score: 0

Every time I hear someone starting on about how Amdahl's law is wrong it means one of two things:
1. They want your attention and their topic isn't interesting enough without resorting to controversial statements.
2. They don't understand Amdahl's law.

Also, unless you're presenting a summary of the history of computing, you really shouldn't have a figure of Moore's law.
Some people handle this well. When they get to that point in their presentation they just say:
And this is the mandatory picture of Moore's law.
And skip to next slide.
Shi's Law, revisited by G3ckoG33k · 2013-06-18 21:28 · Score: 1

In 2006 I submitted this (http://slashdot.org/comments.pl?sid=183461&cid=15153431):
"Researchers in the parallel processing community have been using Amdahl's Law and Gustafson's Law to obtain estimated speedups as measures of parallel program potential. In 1967, Amdahl's Law was used as an argument against massively parallel processing. Since 1988 Gustafson's Law has been used to justify massively parallel processing (MPP). Interestingly, a careful analysis reveals that these two laws are in fact identical. The well publicized arguments were resulted from misunderstandings of the nature of both laws.
This paper establishes the mathematical equivalence between Amdahl's Law and Gustafson's Law. We also focus on an often neglected prerequisite to applying the Amdahl's Law: the serial and parallel programs must compute the same total number of steps for the same input. There is a class of commonly used algorithms for which this prerequisite is hard to satisfy. For these algorithms, the law can be abused. A simple rule is provided to identify these algorithms.
We conclude that the use of the "serial percentage" concept in parallel performance evaluation is misleading. It has caused nearly three decades of confusion in the parallel processing community. This confusion disappears when processing times are used in the formulations. Therefore, we suggest that time-based formulations would be the most appropriate for parallel performance evaluation."
Maybe it will be helpful again.
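The equivalence claimed in the quoted abstract can be checked numerically. The sketch below assumes the usual textbook forms of both laws; the conversion between the two serial fractions is derived here from those forms, not copied from the paper, and all function names are illustrative.

```python
def amdahl_speedup(serial_fraction, processors):
    """Amdahl: serial_fraction is measured on the single-processor run."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def gustafson_speedup(scaled_serial_fraction, processors):
    """Gustafson: scaled_serial_fraction is measured on the parallel run."""
    return scaled_serial_fraction + (1.0 - scaled_serial_fraction) * processors


def to_amdahl_fraction(scaled_serial_fraction, processors):
    """Convert Gustafson's serial fraction to Amdahl's for the same program.

    With the parallel run normalised to time 1, the same work on one
    processor takes s' + (1 - s') * p, of which s' is serial.
    """
    single_processor_time = gustafson_speedup(scaled_serial_fraction, processors)
    return scaled_serial_fraction / single_processor_time


if __name__ == "__main__":
    for s_scaled in (0.05, 0.3):
        for p in (4, 1024):
            s = to_amdahl_fraction(s_scaled, p)
            print(s_scaled, p, gustafson_speedup(s_scaled, p), amdahl_speedup(s, p))
```

The two speedups agree once the serial fraction is measured against the same baseline, which is why a time-based formulation avoids the confusion.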
1. Re:Shi's Law, revisited by Anonymous Coward · 2013-06-19 06:24 · Score: 0
  
  Thank you for this excerpt, the paper seems to be down, though!
  A working link is: http://spartan.cis.temple.edu/shi/public_html/docs/amdahl/amdahl.html
  (It seems the Star Wars fan got replaced by a hellenophile, as the server was called yoda before.)
Hmm by TheSkepticalOptimist · 2013-06-19 00:44 · Score: 1

I am sure that means something...

--
I haven't thought of anything clever to put here, but then again most of you haven't either.
Understanding Amdahl's law by deadline · 2013-06-19 00:55 · Score: 2

You can't cheat Amdahl's law any more than you can give birth in one month with nine women. The law is a rather simple idea, similar to chemical kinetics when you think about it, i.e. a rate-limiting step.
If you are interested in a non-mathematical description of Amdahl's law have a look at http://www.clustermonkey.net/Parallel-Programming/parallel-computing-101-the-lawnmower-law.html

--
HPC for Primates. Read Cluster Monkey
Repeat after me: by Mashdar · 2013-06-19 01:06 · Score: 4, Insightful

Amdahl's Law only applies to individual algorithms. Amdahl's Law only applies to individual algorithms. Amdahl's Law only applies to individual algorithms.
Besides which, Amdahl's law is an obvious truth unless you can make a process take negative time. All attempts to make Amdahl's Law sound fancy or complicated are a disservice. All attempts to pigeonhole Amdahl's Law into only applying to parallel design are a disservice. Any attempts to "revisit" are either fallacious or focus on algorithm changes, which Amdahl made no attempt to address.
Amdahl's law in a nutshell: If you spend 10% of your time on X and 90% of your time on Y, you will never get more than a 1/0.9 (about 1.11x) speedup by optimizing X, even if you manage to make X instantaneous. Another way to put it is that if Y takes 9 seconds, you are never going to get the process under 9 seconds by modifying X...
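The nutshell version works out as follows. This is a minimal sketch of the standard formula; the function names are made up for illustration.

```python
def speedup_from_optimizing(fraction, factor):
    """Overall speedup when `fraction` of the runtime is made `factor` times faster."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if factor <= 0.0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / factor)


def best_possible_speedup(fraction):
    """Limit when the optimized part becomes instantaneous."""
    if fraction >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - fraction)


if __name__ == "__main__":
    # X is 10% of the runtime, Y is the remaining 90%.
    for factor in (2, 10, 1000):
        print(factor, round(speedup_from_optimizing(0.1, factor), 4))
    print("limit", round(best_possible_speedup(0.1), 4))
```

Even a thousandfold improvement to X stays below the 1/0.9 limit, because the 90% spent on Y is untouched.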
Not breaking Amdahls law by sjames · 2013-06-19 01:07 · Score: 2

This most certainly does NOT break Amdahl's law. It simply partitions the problem to use the cheap gear for the embarrassingly parallel portion of the workload and the expensive gear for the harder to parallelize workload.
It necessarily cannot make a non-parallelizable portion (the serial part) run in parallel.
Note that what part of the problem is serial depends on the hardware. The lower the latency and the higher the bandwidth of the interconnect, the more of the problem you can get to run effectively in parallel. However, there comes a point where the problem cannot be decomposed further. The atoms that remain after that may all be run at once, but the individual atom will run serially. No matter what you do, 5*(2+3) can go no faster than serially adding and then multiplying (yes, you could do two multiplications in parallel and then add, but you gain nothing for it).
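The 5*(2+3) point can be made concrete by counting operations (work) and the longest dependency chain (depth) for both forms. This is a minimal sketch; the tuple encoding and the names work, depth and evaluate are made up for illustration.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def evaluate(expr):
    """Evaluate a nested (op, left, right) tuple; plain numbers are leaves."""
    if not isinstance(expr, tuple):
        return expr
    op, left, right = expr
    return OPS[op](evaluate(left), evaluate(right))


def work(expr):
    """Total number of operations, i.e. the cost on a single processor."""
    if not isinstance(expr, tuple):
        return 0
    _, left, right = expr
    return 1 + work(left) + work(right)


def depth(expr):
    """Longest chain of dependent operations, i.e. the best case with unlimited processors."""
    if not isinstance(expr, tuple):
        return 0
    _, left, right = expr
    return 1 + max(depth(left), depth(right))


factored = ("*", 5, ("+", 2, 3))                  # 5*(2+3)
distributed = ("+", ("*", 5, 2), ("*", 5, 3))     # 5*2 + 5*3

if __name__ == "__main__":
    for name, expr in (("factored", factored), ("distributed", distributed)):
        print(name, evaluate(expr), "work", work(expr), "depth", depth(expr))
```

Both forms evaluate to 25 with a depth of 2; the distributed form simply does 3 operations instead of 2, so the extra parallelism buys nothing.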
1. Re:Not breaking Amdahls law by Anonymous Coward · 2013-06-19 03:12 · Score: 0
  
  5*(2+3) decomposes to 4 * (2 + 3) + (2 + 3), which can be boosted by using the hardware address resolver (which is a 3-way native adder with bitshift).
2. Re:Not breaking Amdahls law by Anonymous Coward · 2013-06-19 03:27 · Score: 0
  
  Amdahl's law is about the bandwidth and the time to compute something.
  For 5*(2+3) you have a fixed output of time
  Using your special instruction 4*(2+3) + (2+3) is changing where data goes (thus changing the time to send/recv). You can use Amdahl's law to compare each one and see if you have decreased the time to send/recv the results.
  You can also decompose it into 2+3+2+3+2+3+2+3+2+3. Which is wildly parallel. But the cost to send it out and get the data back may be more expensive than just doing the multiply. You have to try it on your hardware to get an idea what it would do. As decomposing it this way makes it highly dependent on your interconnects.
  What Amdahl was getting at was, do not ignore your interconnects when working out time to do something. Which *many* of our CS big O notations ignore.
3. Re:Not breaking Amdahls law by sjames · 2013-06-19 03:32 · Score: 1
  
  Where does that get you any useful parallelism?
It's revisited every day by Anonymous Coward · 2013-06-19 01:14 · Score: 0

Amdahl came up with this law to voice his scepticism about parallel computing, in favor of better uniprocessors.
The law is revisited every day by anyone who needs to calculate the cost of software licenses.
Nowadays the cost of HW is minuscule compared to the cost of, for example, off-the-shelf Oracle databases whose price is based on how many cores a system has.
Smart money would hoard dual-core Xeon processors and Oracle licenses while they are still available for these processors, as it can mean saving hundreds of thousands of dollars in licensing cost over the next few years.
There's only ONE law by Anonymous Coward · 2013-06-19 01:57 · Score: 0

Brannigan's Law.
Re:Xeon dream on by Anonymous Coward · 2013-06-19 02:28 · Score: 0

Actually, you can buy Xeon Phis. We have a pair in one of our machines. Also, I don't see why using a proprietary NVidia system is any better than using a proprietary Intel system. If you care about interoperability, you should be using OpenCL.
Re:Xeon dream on by ImprovOmega · 2013-06-19 05:09 · Score: 2

Optimizing CUDA is almost, but not quite, as arcane as optimizing assembly code by hand. It requires a deep knowledge of the underlying architecture. The addressing, the memory read patterns, and the role of each of the tiers of memory and the cost of moving between tiers, the size restrictions on each buffer, and how to coalesce the whole mess into a coherent answer. I once got a 30% performance increase by offsetting the addressing on my memory buffers so that they didn't all start on 16-byte boundaries. It allowed the data to be read in parallel and avoided collisions from the different processes trying to access the same block at the same time. The problem is most programmers aren't particularly hardware oriented, so CUDA comes with a steep learning curve if you want to do it well.
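The effect of staggering buffer start addresses can be illustrated with a toy model of banked memory. Everything here is an assumption made for illustration: the bank count, the one-word-per-bank mapping, and the function names. It is not the code described above and it does not model any particular GPU.

```python
from collections import Counter

NUM_BANKS = 16  # assumption: toy memory with 16 banks, consecutive words in consecutive banks


def worst_conflict(base_addresses, element_index):
    """Largest number of simultaneous accesses that land in the same bank.

    Each worker reads element `element_index` of its own buffer at the same time.
    """
    banks = Counter((base + element_index) % NUM_BANKS for base in base_addresses)
    return max(banks.values())


def aligned_bases(num_buffers, buffer_words):
    """Every buffer starts on a multiple of NUM_BANKS words."""
    stride = -(-buffer_words // NUM_BANKS) * NUM_BANKS  # round up to a bank multiple
    return [i * stride for i in range(num_buffers)]


def staggered_bases(num_buffers, buffer_words):
    """Same layout, but each buffer is pushed one extra word past the previous one."""
    stride = -(-buffer_words // NUM_BANKS) * NUM_BANKS + 1
    return [i * stride for i in range(num_buffers)]


if __name__ == "__main__":
    print("aligned  ", worst_conflict(aligned_bases(8, 100), 5))
    print("staggered", worst_conflict(staggered_bases(8, 100), 5))
```

In this model, eight aligned buffers all hit one bank and the reads serialize, while the staggered layout gives every worker its own bank. Whether a real kernel behaves this way depends on the hardware, so measuring remains the only reliable check.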
Re:Xeon dream on by Anonymous Coward · 2013-06-19 09:58 · Score: 0

"You can actually buy some online, Google points to several vendors"
Provide a link. The only things that turn up are sucker traps that don't actually have Phis for sale.