Intel, NVIDIA Take Shots At CPU vs. GPU Performance
MojoKid writes "In the past, NVIDIA has made many claims of how porting various types of applications to run on GPUs instead of CPUs can tremendously improve performance — by anywhere from 10x to 500x. Intel has remained relatively quiet on the issue until recently. The two companies fired shots this week in a pre-Independence Day fireworks show. The recent announcement that Intel's Larrabee core has been re-purposed as an HPC/scientific computing solution may be partially responsible for Intel ramping up an offensive against NVIDIA's claims regarding GPU computing."
I am now posting using my GPU. It's at least 50x faster!
Uh, Linus doesn't work for Microsoft.
Isn't it like saying "Ferrari makes the fastest tractors!" (yeah, I know!), which may be true, as long as they can actually carry out the things you want to do.
I don't know about the limits of OpenCL/GPU-code (or architecture compared to regular CPUs/AMD64 functions, registers, cache, pipelines, what not), but I'm sure there's plenty and that someone will tell us.
Do you mean that Intel is homosexual? I'm not even sure what that would mean. It's attracted to others of the same gender? Seems a bizarre thing to say about a company.
It seems unlikely that you're using the old usage of happy. It would make some sort of sense if the company turned out to give its employees a positive experience from working there, but I don't see any evidence of that.
Or perhaps you simply consider it somehow reprehensible that a company makes semiconductors. Why do you have an aversion to this practice?
I don't expect slashdot "editors" to actually edit, but could you at least link to the most applicable past story on the subject? It's almost like you people don't care if slashdot appears at all competent. Snicker.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
AMD must feel very conflicted...
The troll did have one point, the subject: where is AMD/ATI in this article? Didn't they also have a product in that segment?
New things are always on the horizon
At least as far as parallel computing goes. CPUs have been designed for decades to handle sequential problems, where each new computation is likely to depend on the results of recent computations. GPUs, on the other hand, are designed for situations where most of the operations happen on huge vectors of data; the reason they work well isn't really that they have many cores, but that the operations for splitting up the data and distributing it to the cores are (supposedly) done in hardware. On a CPU, the programmer has to handle splitting up the data, and giving the programmer control of that process makes many hardware optimizations impossible.
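To make the "on a CPU the programmer splits the data" point concrete, here is a minimal Python sketch of that bookkeeping: cut a vector into contiguous chunks, hand one chunk to each worker, and stitch the results back together in order. The function names and the scaling operation are hypothetical, and because of Python's GIL this shows the data-distribution work rather than any real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, n_chunks):
    """Split a list into n_chunks contiguous slices whose lengths differ by at most one."""
    base, extra = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks get one extra element each.
        end = start + base + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def scale_chunk(chunk, factor):
    # The per-element work: independent for every element, so chunks never interact.
    return [x * factor for x in chunk]


def parallel_scale(data, factor, n_workers=4):
    """Manually distribute the data to workers, then reassemble results in order."""
    chunks = split(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Executor.map returns results in the same order as the input chunks.
        results = pool.map(scale_chunk, chunks, [factor] * len(chunks))
        out = []
        for part in results:
            out.extend(part)
    return out


print(parallel_scale(list(range(10)), 2))
```

Everything outside `scale_chunk` is the splitting and gathering that, per the comment above, GPU hardware largely takes off the programmer's hands.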
The surprising thing in TFA is that Intel is claiming to have done almost as well on a problem that NVIDIA used to tout their GPUs. It really makes me wonder what problem it was. The claim that "performance on both CPUs and GPUs is limited by memory bandwidth" seems particularly suspect, since on a good GPU the memory access should be parallelized.
It's clear that Intel wants a piece of the growing CUDA userbase, but I think it will be a while before any x86 processor can compete with a GPU on the problems that a GPU's architecture was specifically designed to address.
The author doesn't understand what a straw man argument is. He thinks it means bringing up anything that isn't specifically mentioned in the original argument. Nvidia stating that optimizing for multi-core CPUs is difficult, and that hundreds of applications on the Nvidia architecture are seeing huge performance gains now, is a valid point even if the Intel side never mentioned the difficulty of implementation.
This sentence no verb.
"daddy, what's AMD?" ... "well son, its that company that tried to keep doing everything at once and died."
What the hell kind of sales pitch is "We're only a little more than twice as slow!"
It's gonna work, too.
Humanity sucks at math.
From the article, you can narrow the gap:
"with careful multithreading, reorganization of memory access patterns, and SIMD optimizations"
Sometimes, though, I don't want to spend all week on optimizations. I just want my code to run, and run fast. Sure, if you optimize the heck out of a section of code you can always eke out a bit more performance, but if the unoptimized code runs just as fast (on a GPU), why would I bother?
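As an illustration of what "reorganization of memory access patterns" can mean, one common form (not necessarily what Intel did in the article) is converting an array of structs into a struct of arrays, so a loop over one field walks a single contiguous run of values that SIMD loads can consume. The `Particle` type and helper names below are hypothetical, and in Python this only demonstrates the layout change, not the speedup you would see in C.

```python
from array import array
from dataclasses import dataclass


@dataclass
class Particle:
    # Array-of-structs layout: each record keeps its x, y, z together.
    x: float
    y: float
    z: float


def aos_to_soa(particles):
    """Return a struct of arrays: one contiguous array of doubles per field."""
    return {
        "x": array("d", (p.x for p in particles)),
        "y": array("d", (p.y for p in particles)),
        "z": array("d", (p.z for p in particles)),
    }


def sum_x(soa):
    # Reads only the x array, i.e. one contiguous block of doubles.
    return sum(soa["x"])


soa = aos_to_soa([Particle(1.0, 2.0, 3.0), Particle(4.0, 5.0, 6.0)])
print(list(soa["x"]), sum_x(soa))
```

This kind of rewrite is exactly the effort the comment would rather skip: it touches every data structure the hot loop uses.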
That'd be "you're" or "your are", not "your".
HTH. HAND.
On top of being highly capable at massively parallel floating-point math (the bread and butter of the Top500 and almost all real-world HPC applications), GPU chips benefit from economies of scale by having a much larger market to sell chips to. If Intel has an HPC-only processor, I don't see it really surviving. There have been numerous HPC-only accelerators that provided huge boosts over CPUs and still flopped. GPUs growing into that capability is the first large-scale phenomenon in HPC with legs.
XML is like violence. If it doesn't solve the problem, use more.
Does anyone under the age of 25 really care anymore about processor speed and video card "features"?
I only ask because 15 years ago I cared greatly about this stuff. However, I'm not sure if that is a product of my immaturity at that time, or the blossoming industry in general.
Nowadays it's all pretty much the same to me. Convenience (as in, there it is sitting on the shelf for a decent price) is more important these days.
..they have products in both segments.
..and for the record, AMD is still ruling the very high-end multi-CPU (aka server) benchmarks, and of course we all know that their GPUs are top notch.
AMD just isn't doing well in the high-end consumer-grade space, but then again the chips Intel is ruling that segment with are priced well above consumer budgets.
"His name was James Damore."
Evergreen had a *huge* lead over pre-Fermi nVidia chips, and still leads in 32-bit precision (and by extension most of what the mass market cares about), but lags Fermi in 64-bit precision. Of course, Evergreen beat Fermi to market by a very large margin.
AMD is the most advantaged on this front...
Intel and nVidia are stuck in the mode of realistically needing one another and simultaneously downplaying the other's contribution.
AMD can use what's best for the task at hand/accurately portray the relative importance of their CPUs/GPUs without undermining their marketing message.
Magny-Cours is currently showing significant performance advantage over Intel's offerings while at the same time AMD's Evergreen *mostly* shows performance advantages over nVidia's Fermi despite making it to market ahead of Fermi.
AMD is currently providing the best tech on the market. This will likely change, but at the moment, things look good for them.
That would be "you are", not "your are".
The day I build a computer with an Nvidia graphics processor as a CPU is when it's time to call 911, cause I will have completely lost my mind.
Just kiss and make up already. Intel and nVidia have but one choice: to join forces and try collectively to compete against AMD/ATI. Anything less, and they're cutting their nose off to spite their respective faces.
Yeah, speciality silicon for a small subset of problems will stomp all over a general purpose CPU. No big news there.
Why is Intel even bothering to whine about this stuff? They sound like a bunch of babies trying to argue that the sky isn't blue.
This makes Intel look truly sad. It's completely unnecessary.
A Pirate and a Puritan look the same on a balance sheet.
This troll reminds me of that Dave Chappelle skit about the racist black guy.
Truth is, blacks and hispanics are far more racist than whites.
I wonder if matrix inversion could be done with an ASIC, with a massive performance improvement over typical CPUs? I'm thinking of hardware designed to natively represent very large (sparse?) matrices efficiently and to perform elementary matrix ops on them.
Is this possible? Can you think of a way of implementing this in terms of actual transistor logic?
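For reference, the textbook way to invert a matrix using only elementary row operations is Gauss-Jordan elimination on the augmented matrix [A | I]. The minimal dense sketch below (hypothetical function name, exact `Fraction` arithmetic, no sparsity handling) shows the three operations such hardware would need: row swap, row scale, and adding a multiple of one row to another. Within each column step, the updates to the other rows are independent of each other, which is the part a wide parallel unit could do at once.

```python
from fractions import Fraction


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination on [A | I] (dense, exact)."""
    n = len(matrix)
    # Build the augmented matrix [A | I] with exact rational entries.
    aug = [
        [Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Pick the first row at or below `col` with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        # Elementary op 1: swap rows.
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Elementary op 2: scale the pivot row so the pivot becomes 1.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Elementary op 3: subtract multiples of the pivot row from every other row.
        # These row updates do not depend on each other.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    # The right half now holds A^-1.
    return [row[n:] for row in aug]


print(invert([[2, 1], [1, 1]]))
```

A real sparse design would also have to manage fill-in, since elimination tends to turn zeros into nonzeros; this sketch ignores that entirely.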
The reason Intel is whining is the context of large number-crunching systems and high-end workstations. For the former, instead of Intel selling thousands of CPUs, Nvidia (and to a lesser extent AMD) gets to sell hundreds of GPUs. And for the workstations, Intel sells only one chip instead of 2 to 4.
No, I don't trust in god. He'll have to pay up front, like everybody else.
I remember reading here on /. that it got abandoned by Intel.
Wow. All that and no mention of Eric S. Raymond's bisexual wife and their girlfriend? Or is that too close to a heterosexual male fantasy for the OP to imagine?
So what you're saying is nVidia will become a patent holding company and probably make just as much money as they're making now.
I don't think AMD really cares about competing with top-end Intel processors. It takes a lot of R&D investment with very little return (it's a tiny market segment).
In the low/mid range AMD rules the roost in terms of value for money.
No sig today...
There's something very misleading about this. I don't see any Socket 1567 CPUs listed, and the highest-listed Intel quad-socket part is not listed on Intel's web site (http://ark.intel.com/ProductCollection.aspx?series=36934). If it were, it would be a 6-core/6-thread job. I don't know for a fact that the Intel 7560 (8c/16t @ 2.26GHz; http://ark.intel.com/ProductCollection.aspx?series=46487) would be faster, but 64 threads seems >> 24 to me! It should at least be listed.
Intel decided to bail on marketing an in-house high performance GPU. But, they'd still like a return on their Larrabee investment. I don't doubt they would have been pushing the HPC mode anyway, but now, that's all they've got. Unfortunately for Intel, they've got to sell Larrabee performance based on in-house testing, while there are now a number of CUDA-based applications, and HPC-related conferences and papers are now replete with performance data.
To Intel's and AMD/ATI's advantage, NVIDIA has signed on with the OpenCL effort, so as the first two start getting drivers out, they can give the latter a run for their HPC-GPU money. At the moment, though, it's all talk.
Luke, help me take this mask off
Look at the X6 BE chip: 6 cores, better performance than anything Intel has at the same price. It doesn't compete with Intel's 12 "core" in 12-thread applications, but apart from video encodes (even that's iffy) you'll be hard pressed to find a 12-thread app that doesn't end up I/O bound, as a home user.
All of the above was encrypted with a Quad ROT-13 method. Unauthorized decryption is in violation of the DMCA.
Note that one of AMD's 12-core Opterons is cheaper than Intel's top-of-the-line "consumer grade" 4-core i7 Extreme, and wouldn't THAT kick the snot out of any i7 in I/O?
"His name was James Damore."
Don't get me wrong, I like what Intel is doing, but c'mon, you are understating this:
and the SIMD instructions that have been added to Intel/AMD CPUs in recent years really are the same thing you get with GPU programming, just on a bit smaller scale.
It's an order of magnitude different (and I know from experience coding CPU and GPU)
i7 960: 4 cores, 4-way SIMD, 102 SP GFLOPS
GTX 285 (not 280): 30 cores, 32-way SIMD, 1080 SP GFLOPS
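Where figures like these come from: theoretical peak is roughly cores × SIMD lanes × flops per lane per cycle × clock. The sketch below plugs in assumed specs: 3.2 GHz for the i7 960 with one 4-wide SSE add plus one 4-wide multiply per cycle, and for the GTX 285, 30 multiprocessors with 8 scalar units each (the 32-wide warp is issued over several cycles), a MAD plus a MUL counted as 3 flops, and a 1.476 GHz shader clock. Treat those inputs as assumptions, not quoted specs. The GPU result lands near, and slightly below, the ~1080 quoted, and the roughly 10x gap is the "order of magnitude" claimed.

```python
def peak_gflops(cores, lanes_per_core, flops_per_lane_per_cycle, clock_ghz):
    """Theoretical single-precision peak in GFLOPS (no memory or issue limits modelled)."""
    return cores * lanes_per_core * flops_per_lane_per_cycle * clock_ghz


# Assumed: 4 cores, 4-wide SSE, add + multiply per cycle (2 flops per lane), 3.2 GHz.
i7_960 = peak_gflops(4, 4, 2, 3.2)
# Assumed: 30 multiprocessors x 8 scalar units, MAD + MUL (3 flops), 1.476 GHz shader clock.
gtx_285 = peak_gflops(30, 8, 3, 1.476)

print(round(i7_960, 1), round(gtx_285, 1), round(gtx_285 / i7_960, 1))
```

Peak numbers only bound what is possible; as the article notes, memory bandwidth often limits both chips well before this ceiling.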
No matter what, AMD really wins in this one.
AMD has the potential to win, but it is currently in last place. Intel is aggressively solving all of the problems that previously gave AMD an advantage, and NVIDIA has aggressively put in place the things HPC wants (e.g. easy to code in C for the platform - I've done it and it is easy, also adding ECC and caching, etc.)
Using Badaboom, a CUDA app, you can rip DVD copies for your iPod in minutes, not hours.
Unfortunately Badaboom are idiots and are taking their sweet time porting to the 465/470/480 cards.
I'd love to see a processor fast enough to beat a GPU at tasks such as these, or at CD-to-MP3 conversion. On CUDA, it's like moving from a hard drive to a fast SSD.
I mean, when you get down to it, they seem really overpriced. No video output, their processor isn't any faster, what's the big deal? The big deal is that 4x the RAM can really speed shit up.
Unfortunately there are very hard limits to how much RAM they can put on a card. This is both because of the memory controllers, and because of electrical considerations. So you aren't going to see a 128GB GPU or the like any time soon.
Most of our researchers who do that kind of thing use only Teslas because of the need for more RAM. As you said, the transfer is the limiting factor. More RAM means you have to shuffle data back and forth less often.
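A deliberately crude model of why card RAM matters for iterative jobs: if the working set fits on the card it crosses the bus once, and if it doesn't, assume it has to be streamed across again on every pass. The function name, the 3 GB working set, the 1 GB vs 4 GB cards, and the 100 iterations are all hypothetical; real codes can keep part of the data resident, so the actual gap is usually smaller.

```python
def gb_over_bus(dataset_gb, card_ram_gb, iterations):
    """GB copied host-to-device by an iterative job under a simple fits/doesn't-fit model."""
    if dataset_gb <= card_ram_gb:
        # Whole working set fits: upload once, then every iteration runs from card memory.
        return dataset_gb
    # Doesn't fit: assume the full dataset is streamed across the bus on every pass.
    return dataset_gb * iterations


# Hypothetical 3 GB working set, 100 passes, on a 1 GB card vs a 4 GB card.
small_card = gb_over_bus(3, 1, 100)
big_card = gb_over_bus(3, 4, 100)
print(small_card, big_card)
```

Under these assumptions the larger card moves 100x less data over the bus, which matches the point about shuffling data back and forth less often.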
But I think the timescale will be a very long one.
I mean ideally, we want only the CPU in a computer. The whole idea of a computer is that it does everything, rather than having dedicated devices. Ideally that means that it does everything purely in software, that the CPU is all it needs. For everything else, we seem to have reached that point but graphics are still too intense. Have to have a dedicated DSP for them.
However, we'll keep wanting that until the CPU can do photorealistic graphics in realtime. That is a long way off yet. Even GPUs can't do that. Once GPUs can, the trick is then being able to scale that down to become a realistic subset of the CPU, rather than a dedicated unit. You can't very well scale CPUs up to massive sizes and power consumptions.
So I've no doubt it'll happen, but I think not for 20+ years.
AMD is the most advantaged on this front...
Intel and nVidia are stuck in the mode of realistically needing one another and simultaneously downplaying the other's contribution.
Exactly, and this manifested in Intel's new Pine Trail platform, to the consumer's detriment. Intel refused to grant NVidia the license to connect their ION chipset via DMI, and so people planning on using Pine Trail in HTPCs were saddled with Intel's own chipset with crappy graphics performance (No Native Hardware H.264 Decoding: Long Live Ion).
How is AMD "ruling" the high end multi-CPU benchmarks??
1) [Quad CPU] AMD Opteron 6168 = 23,784
2) [8-Way] Six-Core AMD Opteron 8435 = 22,745
3) [Dual CPU] Intel Xeon X5680 @ 3.33GHz = 17,377
Let me translate for you:
1) 48 core AMD box = 23,784
2) 48 core AMD box = 22,745
3) 12 core Intel box = 17,377
Somewhat less impressive, hey?
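The "translation" above amounts to dividing each score by the core count. The sketch below does that with the scores and core counts from this comment. A raw per-core figure ignores clock speed, price, and power, and the thread doesn't name the benchmark, so it's only a rough way to read the list.

```python
# Scores and core counts as given in the comment above.
scores = {
    "4x Opteron 6168 (48 cores)": (23784, 48),
    "8x Opteron 8435 (48 cores)": (22745, 48),
    "2x Xeon X5680 (12 cores)": (17377, 12),
}

# Benchmark score divided by the number of cores in the box.
per_core = {name: total / cores for name, (total, cores) in scores.items()}

for name, value in per_core.items():
    print(f"{name}: {value:.1f} per core")
```

On this crude measure the dual Xeon box does close to three times the work per core of either 48-core Opteron system.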
They aren't showing scores for Intel's 8-socket Beckton systems, which have 64 cores, or its 4-socket systems, which have 32 cores.
AMD is getting completely spanked up and down the CPU spectrum. They need a new core - maybe Bulldozer will be great.
Look at the X6 BE chip: 6 cores, better performance than anything Intel has at the same price. It doesn't compete with Intel's 12 "core" in 12-thread applications, but apart from video encodes (even that's iffy) you'll be hard pressed to find a 12-thread app that doesn't end up I/O bound, as a home user.
Apart from video encodes you'll be hard pressed to find a 12 thread app, as a home user. (as in actually thrashing 12 threads at once)
AMD really cares about competing with top-end Intel processors - as when the Athlon ruled the roost, AMD sold its chips at a premium. Now (since the Core2Duo launched), with Intel in top spot, AMD is selling its processors cheaper, so it's losing possible profit.
From Wikipedia, "OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors." In other words, write your massively parallel programs using OpenCL and then run them on the device (or combination of devices) that executes your program the fastest.
Hopefully, OpenCL will have the same catalyzing effect on HPC that OpenGL had on computer graphics, but time will tell.
Word of warning to Intel: Almost nobody wants to hand-code assembly to run your SIMD instructions. People doing HPC (at least the ones using CUDA) are scientists and engineers who typically have better things to worry about than reading through detailed tomes on the i7 architecture. Make it more convenient (i.e. via OpenCL) or continue to lose market share in this area.
Given that Intel's current Nehalem interconnect and chipsets perform far better than anything AMD has to offer, I wonder how AMD manages to kick the snot out of the i7 in I/O with a worse-performing interconnect.