Slashdot Mirror


Intel, NVIDIA Take Shots At CPU vs. GPU Performance

MojoKid writes "In the past, NVIDIA has made many claims of how porting various types of applications to run on GPUs instead of CPUs can tremendously improve performance — by anywhere from 10x to 500x. Intel has remained relatively quiet on the issue until recently. The two companies fired shots this week in a pre-Independence Day fireworks show. The recent announcement that Intel's Larrabee core has been re-purposed as an HPC/scientific computing solution may be partially responsible for Intel ramping up an offensive against NVIDIA's claims regarding GPU computing."

129 comments

  1. first post! by Dynetrekk · · Score: 4, Funny

    I am now posting using my GPU. It's at least 50x faster!

    1. Re:first post! by LordKronos · · Score: 4, Informative

      Awesome. And now maybe you've learned a lesson. While the external processor was faster, sending your data over the bus to the external processor has an inherent delay in it. That's why your first post came in fourth.

    2. Re:first post! by TheLink · · Score: 4, Funny

      The other earlier posts however seem to suffer from some sort of processing or data corruption/error.

  2. Re:AMD by jimmydevice · · Score: 1

    Uh, Linus doesn't work for Microsoft.

  3. It depends? by aliquis · · Score: 5, Insightful

    Isn't it like saying "Ferrari makes the fastest tractors!" (yeah, I know!), which may be true, as long as they can actually carry out the things you want to do.

    I don't know about the limits of OpenCL/GPU code (or the architecture compared to regular CPUs/AMD64: functions, registers, cache, pipelines, what not), but I'm sure there are plenty, and that someone will tell us.

    1. Re:It depends? by jawtheshark · · Score: 5, Informative

      Try Lamborghini next time... You do know that Mr Lamborghini originally made his money making tractors. Legend says he wasn't satisfied with the sports cars Ferrari offered and so made one himself. Lamborghini was originally a tractor brand. Not kidding. I think they still make them...

      --
      Ahhh...the great dumpster continuum. Many a free computer will be found there. -- sowth (748135)
    2. Re:It depends? by Sycraft-fu · · Score: 5, Informative

      Basically, GPUs are stream processors. They are fast at tasks that meet the following criteria:

      1) Your problem has to be more or less infinitely parallel. A modern GPU will have anywhere in the range of 128-512 parallel execution units, and of course you can have multiple GPUs. So it needs to be something that can be broken down into a lot of pieces.

      2) Your problem needs to be floating point. GPUs push 32-bit floating point numbers really fast. The most recent ones can also do 64-bit FP numbers at half the speed. Anything older is pretty much 32-bit only. For the most part, count on single precision FP for good performance.

      3) Your problem must fit within the RAM of the GPU. This varies: 512MB-1GB is common for consumer GPUs, and 4GB is fairly easy to get for things like Teslas that are built for GPGPU. GPUs have extremely fast RAM connected to them, much faster than even system RAM; 100GB/sec+ is not uncommon. While a 16x PCIe bus is fast, it isn't that fast. So to get good performance, the problem needs to fit on the GPU. You can move data to and from the main memory (or disk) occasionally, but most of the crunching must happen on card.

      4) Your problem needs to have little branching, and when it does branch, the parallel paths need to branch the same way. GPUs handle branching, but not all that well; the performance penalty is pretty high. Also, generally speaking, a whole group of shaders has to branch the same way. So you need the sort of problem where, when the "else" is hit, it is hit for the entire group.

      So, the more similar your problem is to that, the better GPUs work on it. 3D graphics is an excellent example of something that meets these criteria precisely, which is no surprise, as that's what GPUs are made for. The more you deviate from that, the less suited GPUs are. You can easily find tasks they are exceedingly slow at compared to CPUs.

      Basically, modern CPUs tend to be quite good at everything. They have strong performance across the board, so no matter what the task, they can do it well. The downside is that they are unspecialized; they excel at nothing. The other end of the spectrum is an ASIC, a circuit designed for one and only one thing. That kind of thing can be extremely efficient. A gigabit switch ASIC is a great example: you can have a tiny chip that draws a couple of watts and yet switches 50+gbit/sec of traffic. However, that ASIC can only do its one task; there is no programmability. GPUs are something of a hybrid. They are fully programmable, but they are specialized for a given field. As such, at the tasks they are good at, they are extremely fast. At the tasks they are not, they are extremely slow.

    3. Re:It depends? by antifoidulus · · Score: 1

      I wonder if competition from GPUs will influence Intel to beef up the vector processing capabilities of its chips. Currently Intel's SSE is pretty weak, especially when you compare it to competitors like Altivec. Unfortunately, outside of Cell there aren't a whole lot of CPUs nowadays that feature Altivec....

    4. Re:It depends? by g4b · · Score: 1

      Maybe GPUs can solve life, then. Real-life problems meet most of the criteria: infinite amounts all at the same time, many numbers floating, small description size, and a tendency to branch into an endless tree of solutions never to be achieved...

    5. Re:It depends? by aliquis · · Score: 1

      Yeah, I wondered which one it was, but I guess I was somewhat too lazy to check. Maybe the history was that the Lamborghini guy decided he could too..

      I only googled "ferrari tractor" to see if they had any, or whether it was Lamborghini; I got a few tractor images, so I went with that.

      So Lamborghini went super-cars and Ferrari tractors ("if they can beat us at cars we for sure will show them with tractors!"? :D)

      Sorry for messing up :)
      http://www.ferrari-tractors.com/

    6. Re:It depends? by JanneM · · Score: 4, Informative

      "So to get good performance, the problem needs to fit on the GPU. You can move data to and from the main memory (or disk) occasionally, but most of the crunching must happen on card."

      From what I have seen when people use GPUs for HPC, this, more often than anything else, is the limiting factor. The actual calculations are plenty fast, but the need to format your data for the GPU, send it, then do the same in reverse for the result really limits the practical gain you get.

      I'm not saying it's useless or anything - far from it - but this issue is as important as the actual processing you want to do in determining what kind of gain you'll see from such an approach.
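To put rough numbers on the transfer problem, here is a back-of-the-envelope Python sketch comparing a CPU-only run against GPU compute plus the trip across the bus. Every rate in it is an assumed, illustrative default (50 GFLOPS for the CPU, 1000 GFLOPS for the GPU, 8 GB/s for the bus), not a measurement of any real hardware, and it assumes transfers and compute do not overlap.

```python
def cpu_seconds(flops, cpu_gflops):
    """Time to do all the arithmetic on the CPU, ignoring memory stalls."""
    return flops / (cpu_gflops * 1e9)


def gpu_seconds(flops, bytes_moved, gpu_gflops, bus_gb_per_s):
    """Time to ship data over the bus, compute on the card, and ship results back.

    bytes_moved counts traffic in both directions. Transfers and compute are
    assumed to run one after the other (no overlap).
    """
    transfer = bytes_moved / (bus_gb_per_s * 1e9)
    compute = flops / (gpu_gflops * 1e9)
    return transfer + compute


def offload_speedup(flops, bytes_moved, cpu_gflops=50.0, gpu_gflops=1000.0, bus_gb_per_s=8.0):
    """CPU time divided by GPU time; values above 1 mean offloading pays off.

    The default rates are illustrative assumptions only.
    """
    return cpu_seconds(flops, cpu_gflops) / gpu_seconds(flops, bytes_moved, gpu_gflops, bus_gb_per_s)


# Same 1 GB round trip, two very different amounts of work per byte.
light = offload_speedup(flops=1e9, bytes_moved=1e9)    # ~1 operation per byte
heavy = offload_speedup(flops=1e12, bytes_moved=1e9)   # ~1000 operations per byte
print(f"light: {light:.2f}x, heavy: {heavy:.2f}x")
```

Under these assumptions the light job runs several times slower on the GPU because the bus dominates, while the heavy job comes out well ahead: the amount of arithmetic per byte transferred, not the raw compute rate, decides which side you land on.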

      --
      Trust the Computer. The Computer is your friend.
    7. Re:It depends? by rahvin112 · · Score: 4, Insightful

      It is not a secret (it's a stated fact on both Intel's and AMD's roadmaps) that they plan to integrate GPU-like programmable FP into the FP units of the general processor. The likely result will be the same general-purpose CPU you love, but with dozens of additional FP units that excel at mathematics like the parent described, except more flexible. When the Fusion-esque products ramp and GPGPU functionality is integrated into the CPU, Nvidia is out of business. Oh, I don't expect these Fusion products to have great GPUs, but once you destroy the low-end and mid-range graphics marketplace there is very little $$-wise left to fund R&D (3dfx was the first one into the high-end 3D market and they barely broke even on their first sales; the only reason they survived was that they were heavy in arcade sector sales). If Nvidia hasn't been allowed to purchase Via's x86 license by that point, they are quite frankly out of business. Not immediately, of course; they will spend a few years evaporating all their assets while they try to compete in only the high-end marketplace, but in the end they won't survive. Things go in cycles, and the independent graphics chip cycle is going to end very shortly; maybe in a decade it will come back, but I'm skeptical. CPUs have exceeded the speed needed for 80% of the tasks out there.

      When I first started my career, computer runs of my design work took about 5-30 minutes at bare minimum quality. These days I can exceed that bare minimum by 20 times and the run will take seconds. It's to the point where I can model with far more precision than the end product needs, with almost no time penalty. In fact, additional CPU speed at this point is almost meaningless, and my business isn't alone in this. Most of the software in my business is single-threaded (and the apps run that fast with single threads). Once the software is multi-threaded, there is really no additional CPU power needed, and it may come to the point where my business just stops upgrading hardware beyond what's needed to replace failures. I just don't see a future for independent graphics chip/card producers.

    8. Re:It depends? by Anonymous Coward · · Score: 1, Interesting

      That is an excellent post, with the exception of this little bit

      GPUs have extremely fast RAM connected to them, much faster than even system RAM

      I'd like to see a citation for that little bit of trivia... the specific type & speed of RAM on a board with a GPU varies based on model and manufacturer. Cheaper boards use slower RAM, the more expensive ones use higher end stuff. I haven't seen ANY GPU's that came with on-board RAM that is any different than what you can mount as normal system RAM, however.

      Not trolling; I wanted to point out a serious flaw in what is an otherwise great post.

    9. Re:It depends? by pnewhook · · Score: 3, Informative

      GPUs have extremely fast RAM connected to them, much faster than even system RAM

      I'd like to see a citation for that little bit of trivia

      Ok, so my Geforce GTX480 has GDDR5 ( http://www.nvidia.com/object/product_geforce_gtx_480_us.html ) which is based on DDR3 ( http://en.wikipedia.org/wiki/GDDR5 )

      My memory bandwidth on the GTX480 is 177 GB/sec. The fastest DDR3 module is PC3-17000 ( http://en.wikipedia.org/wiki/DDR3_SDRAM ) which gives approx 17000 MB/s which is approx 17GB/sec. So my graphics ram is basically 10x faster than system ram as it should be.

      --
      Tesla was a genius. Edison however was a overrated hack who liked to torture puppies.
    10. Re:It depends? by somenickname · · Score: 2, Interesting

      That's a very good breakdown of what you need to benefit from GPU based computing but, really, only #1 has any relevance vs. an x86 chip.

      #2) Yes, an x86 chip will have a high clock speed but, unless you can use SSE instructions, x86 is crazy slow. Also, most (if not all) architectures will give you half the flops for using the double precision vector instructions vs. the single precision ones.

      #3) This is a problem with CPUs as well except, as you point out, the memory is much slower. Performance is often about hiding latency. You don't need your problem to fit in the L2/L3 cache of a CPU, but, if the compiler/programmer/CPU can prefetch things into L2/L3 before it's accessed, it's a huge win. The same goes for having things in GPU memory before it's needed. The difference is that the GPU has a TON of memory compared to an L2/L3 cache.

      #4) You might be right here. I know that with hyperthreading a CPU will yield to another "thread" when it mispredicts a branch. However, the fact that branch misprediction is a condition in which the CPU will switch to another thread, to me, means that mispredicting a branch on an x86 CPU is also a fairly expensive thing to do. Maybe not as expensive as on a GPU but, expensive nonetheless.

      I suppose it all comes down to what kind of problem you are trying to compute but, if you can make your problem work in a way that is pleasing to #1, using a GPU is probably going to be a win.

    11. Re:It depends? by Spatial · · Score: 2, Interesting

      I haven't seen ANY GPU's that came with on-board RAM that is any different than what you can mount as normal system RAM, however.

      You haven't been looking very hard. Most GPUs have GDDR3 or GDDR5 running at very high frequencies.

      My system for example:
      Main memory: DDR2 400MHz, 64-bit bus. 6,400 MB/sec max.
      GPU memory: GDDR3 1050MHz, 448-bit bus. 117,600 MB/sec max.

      Maybe double the DDR2 figure since it's in dual-channel mode. I'm not sure, but it hardly makes much of a difference in contrast. :)

      That isn't even exceptional by the way. I have a fairly mainstream GPU, the GTX 260 c216. High-end cards like the HD5870 and GTX 480 are capable of pushing more than 158,000 and 177,000 MB/sec respectively.
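The figures in this exchange all come from one formula: peak bandwidth is the memory clock times transfers per clock times bus width in bytes (times the channel count for system RAM). A small Python check, assuming double data rate (2 transfers per clock) for the DDR2 and GDDR3 numbers above, and treating the GTX 480's roughly 3.7 GHz figure as an effective 3,696 MT/s on a 384-bit bus (both are assumptions about how the quoted clocks were stated):

```python
def peak_bandwidth_mb_s(clock_mhz, transfers_per_clock, bus_width_bits, channels=1):
    """Theoretical peak bandwidth in MB/s (1 MB = 10**6 bytes).

    clock_mhz * transfers_per_clock gives millions of transfers per second;
    each transfer moves bus_width_bits / 8 bytes on every channel.
    """
    bytes_per_transfer = bus_width_bits // 8
    return clock_mhz * transfers_per_clock * bytes_per_transfer * channels


ddr2_single = peak_bandwidth_mb_s(400, 2, 64)              # 6,400 MB/s
ddr2_dual = peak_bandwidth_mb_s(400, 2, 64, channels=2)    # 12,800 MB/s
gddr3_gtx260 = peak_bandwidth_mb_s(1050, 2, 448)           # 117,600 MB/s
gddr5_gtx480 = peak_bandwidth_mb_s(3696, 1, 384)           # 177,408 MB/s (effective rate)

for name, mb in [("DDR2 x1", ddr2_single), ("DDR2 x2", ddr2_dual),
                 ("GTX 260 GDDR3", gddr3_gtx260), ("GTX 480 GDDR5", gddr5_gtx480)]:
    print(f"{name:>14}: {mb:>9,} MB/s")
```

Dual channel does double the DDR2 figure, but the gap to the card stays close to an order of magnitude, mostly because of the much wider bus.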

    12. Re:It depends? by Bengie · · Score: 1

      New AVX SIMD is coming out soon. The first set of 256-bit registers is supposed to be twice as fast as SSE, and the later 512-bit and 1024-bit AVX are supposed to be another ~2-4x faster than the 256-bit. I guess one of the benefits of AVX is the new register sizes are suppose to give transparent speed increases. So a program made for 256bit AVX will automatically see faster calculations when the new 512bit AVX registers come out. Sounds good to me. They're supposed to be 3-operand instructions.

    13. Re:It depends? by santiagodraco · · Score: 1

      They are called, specifically, FPUs, not FPs.

      As for the cpu guys putting the gpu guys out of business... we know how successful Intel has been trying to do just that with their GPU offerings... you expect that to change in the next, say, 10 years? Not likely given their past track record of failure.

    14. Re:It depends? by Tacvek · · Score: 1

      GPUs are definitely worse than CPUs at branching.

      If your code splits into 8 different code paths at one point due to branching, your performance can be as bad as 1/8 of the maximum, since rather than do anything remotely like actual branching, some GPUs just interleave the code of the different branches, with each instruction tagged with the branch it belongs to. So if the unit is processing an instruction for a branch it is not on, it just sits there doing nothing for one instruction cycle. This type of design may also have a depth limit on branching, so eight simultaneous code branches may not even be possible.

      So the CPU performance only significantly degrades if a branch is mispredicted, while many GPU designs have performance suffer for every branch, even if it could have been accurately predicted.
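The serialization Tacvek describes can be modeled in a few lines of Python. This is a deliberately simplified sketch, not how any specific GPU schedules instructions: a group of lanes (32 here, the size of an NVIDIA warp) steps through every branch path that at least one lane takes, with the other lanes masked off, so the cost is the sum over the distinct paths rather than the cost of the longest one.

```python
WARP_SIZE = 32


def group_cycles(lane_paths, path_cost):
    """Cycles for one SIMD group under the masked-execution model.

    lane_paths: the branch path id chosen by each lane.
    path_cost:  cycles needed to execute each path's instructions.
    Every path taken by at least one lane is run once, for the whole group.
    """
    return sum(path_cost[p] for p in set(lane_paths))


def efficiency(lane_paths, path_cost):
    """Fraction of peak: if lanes could run their own paths concurrently,
    only the longest path would count."""
    ideal = max(path_cost[p] for p in lane_paths)
    return ideal / group_cycles(lane_paths, path_cost)


cost = {p: 10 for p in range(8)}                 # eight paths, 10 cycles each
uniform = [0] * WARP_SIZE                        # every lane takes path 0
divergent = [i % 8 for i in range(WARP_SIZE)]    # lanes spread over all eight paths

print(group_cycles(uniform, cost), efficiency(uniform, cost))      # 10 1.0
print(group_cycles(divergent, cost), efficiency(divergent, cost))  # 80 0.125
```

With eight equally long paths the group runs at 1/8 of peak, which is the worst case from the post.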

      --
      Stylish sheet to fix many problems in Slashdot's D3: https://gist.github.com/801524
    15. Re:It depends? by Elbows · · Score: 1

      The other big factor (the biggest in most of the GPU code I've written) is your pattern of memory access. Most GPUs have no cache so access to memory has very high latency even though the bandwidth is excellent. The card will hide this latency to some extent through clever scheduling; and if all your threads are accessing adjacent memory, it will coalesce that into one big read/write. But GPUs do best on problems where the ratio of arithmetic to memory access is high, and your data can hang around in registers for a while.

      I've found that in general GPU code has to be written much more carefully if you want good performance. On a regular CPU, if you pick a decent algorithm and pay attention to cache locality, you can usually rely on the compiler for the low-level optimizations and get pretty close to peak performance. On the GPU you have to pay very close attention to memory access patterns, register usage, and other low-level hardware details -- screwing up any of those parts can easily cost you a factor of 10 in performance.

      This is starting to change, though -- the newest chips from nvidia have an L1 cache, and relax some of the other restrictions.
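What coalescing buys you can be shown by counting how many memory segments a group of threads touches in one access. The 128-byte segment size below is an assumption for illustration; real transaction sizes and alignment rules differ between GPU generations.

```python
WARP_SIZE = 32
FLOAT_BYTES = 4
SEGMENT_BYTES = 128  # assumed transaction size; varies by hardware


def transactions(addresses, segment=SEGMENT_BYTES):
    """Distinct aligned memory segments touched by one group-wide load."""
    return len({addr // segment for addr in addresses})


# Thread i reads float i: all 32 reads fall in one 128-byte segment.
contiguous = [i * FLOAT_BYTES for i in range(WARP_SIZE)]
# Same pattern shifted by 16 floats (64 bytes): the reads straddle two segments.
misaligned = [64 + i * FLOAT_BYTES for i in range(WARP_SIZE)]
# Thread i reads float 32*i (e.g. walking down a column of a row-major,
# 32-wide float matrix): every read needs its own segment.
strided = [i * 32 * FLOAT_BYTES for i in range(WARP_SIZE)]

for name, addrs in [("contiguous", contiguous), ("misaligned", misaligned), ("strided", strided)]:
    print(name, transactions(addrs))
```

Each pattern moves the same number of useful bytes, but the strided one needs 32 times as many transactions as the contiguous one, which is how a memory-access mistake alone can cost an order of magnitude.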

    16. Re:It depends? by hairyfeet · · Score: 1

      If your theory was true, why hasn't it already happened? Both AMD and Nvidia have been putting pretty nice GPUs on motherboards for quite awhile, yet we still have discrete cards, why? There is a good reason why, for the most basic office task, even two or three year old gaming, the onboard chips work fine. I myself played Bioshock I and Swat 4 on my onboard with no trouble.

      But for anything where you care even a little bit about REAL performance the onboards, and I don't care if we are talking onboard or on die, simply won't have a chance. You just can't put hundreds of Mb or even Gbs of RAM onto the die. And honestly the amount of power you get in discrete cards for even the low end makes them a terrific buy. My 4650HD with 1Gb of RAM cost a grand total of $36 after MIR, plays all the games I care about, and does wonderfully well at transcoding and HD.

      So while you think discrete GPUs are gonna die, I'd say the opposite is true. I think the onboards will be used in machines where price trumps everything, such as bottom of the line netbooks and Walmart/Best Buy "specials", whereas for everything else since HD and games like Warcrack will continue to be popular and thus selling points discretes will bring in the "wow" factor and help OEMs to differentiate their products. And thanks to PCIe being standard on just about every board being made or sold in the last few years even those that get a Best Buy Special can bring it to someone like me and have an upgrade on the cheap.

      Although I do agree on one point you made: Nvidia. They need to buy out Via and they need to do it yesterday. If Nvidia doesn't prevail in an antitrust against Intel over the new socket they are gonna be dead in the water, as both AMD and Intel can offer "top to bottom" solutions (although I would call an Intel GPU a problem not a solution) and Nvidia simply doesn't have anything to compete. Both AMD and Intel get paid twice for every board sold, getting $$$ for both the GPU and the CPU; this extra cash will allow that much more R&D, which Nvidia won't be able to afford. Finally, from what I've seen, Fermi is a GP/GPU and NOT a gamer chip, and it cranks out waaaaay too much heat to boot. With each generation AMD seems to be getting better with power usage, with the second and third gen of a chip using much less than the first, while Nvidia seems to be stuck in "Netburst" mode. With the focus on green computing, now is not the time to be building space heaters, and considering the time it will take to combine CPU+GPU, Nvidia needs to be seriously trying to pick up Via. If they don't, I predict they will end up being bought out by Intel.

      --
      ACs don't waste your time replying, your posts are never seen by me.
    17. Re:It depends? by Kjella · · Score: 1

      My memory bandwidth on the GTX480 is 177 GB/sec. The fastest DDR3 module is PC3-17000 ( http://en.wikipedia.org/wiki/DDR3_SDRAM ) which gives approx 17000 MB/s which is approx 17GB/sec.

      And the high-end CPUs have, as far as I know, triple-channel memory now, so a total of 51 GB/s. Not sure how valid that comparison is, but graphics cards tend to get their fill rate from having a much wider memory bus (the GTX480 has a 384-bit wide bus) rather than from much faster memory, so it's probably not too far off. If CPUs move towards doing GPU-like work which can be loaded in wider chunks, they'll probably move towards a wider bus too.

      --
      Live today, because you never know what tomorrow brings
    18. Re:It depends? by Anonymous Coward · · Score: 0

      Nah. NVIDIA will continue to exist because there will still be a large market for dedicated GPUs for years to come. Cell phones, game consoles, and all sorts of other electronic devices will still want access to dedicated processors for the very type of problems that you describe. Not to mention the large-scale research that is now being done on NVIDIA cards.

      As a software engineer in the printing industry, I can tell you that our interest in dedicated GPUs for large-scale printing is increasing, not decreasing, and we are buying the kind of cards you think NVIDIA will only have left.

    19. Re:It depends? by Anonymous Coward · · Score: 0

      "I don't see a need for it in my business therefore I don't see why anybody else should have a need for it either"

    20. Re:It depends? by Jah-Wren+Ryel · · Score: 1

      2) Your problem needs to be floating point. GPUs push 32-bit floating point numbers really fast. The most recent ones can also do 64-bit FP numbers at half the speed. Anything older is pretty much 32-bit only. For the most part, count on single precision FP for good performance.

      That requirement is not necessarily true. Or at least not in the traditional sense of 'floating point.' GPUs make awesome pattern-matchers for data that isn't necessarily floating point.

      Elcomsoft (of Adobe DRM international arrest fame) has a GPU-accelerated password cracker that is essentially a massively parallel dictionary attack.

      A number of anti-virus vendors have GPU-accelerated scanners, like Kaspersky.

      And some people have been working with GPUs for network monitoring via packet analysis too.
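These workloads suit a GPU despite having no floating point in them because every candidate is checked independently. A minimal Python sketch of that shape, assuming SHA-256 as the target hash purely for illustration (it says nothing about which formats Elcomsoft or anyone else actually attacks):

```python
import hashlib


def digest(word):
    """Hex SHA-256 of a candidate password (the hash choice is an assumption)."""
    return hashlib.sha256(word.encode("utf-8")).hexdigest()


def dictionary_attack(target_hex, candidates):
    """Return every candidate whose digest matches target_hex.

    Each test reads only its own candidate and the shared target, so the
    work splits across any number of lanes with no communication; on a GPU
    each thread would take one (or a few) words. The hashing itself is
    integer and bitwise arithmetic, not floating point.
    """
    return [w for w in candidates if digest(w) == target_hex]


target = digest("hunter2")
print(dictionary_attack(target, ["password", "letmein", "hunter2", "qwerty"]))  # ['hunter2']
```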

      --
      When information is power, privacy is freedom.
    21. Re:It depends? by robthebloke · · Score: 1

      I guess one of the benefits of AVX is the new register sizes are suppose to give transparent speed increases. So a program made for 256bit AVX will automatically see faster calculations when the new 512bit AVX registers come out.

      Afraid not (well, there are ways if you are willing to litter your code with C++ templates). Yes, the instructions will process 8 floats, but you're only going to see a nice linear speed-up if you are already using SOA data structures. For a lot of the 'traditional' SSE code you'll tend to see (i.e. AOS vector3/matrix classes etc.), the AVX instructions will be of little use. In effect they duplicate all of SSE through SSE4.1 for the 256-bit register types, i.e. SSE has _mm_add_ps, AVX has _mm256_add_ps, and any new 512-bit instruction will be _mm512_add_ps (which incidentally is one of the Larrabee instructions). You'll have to modify just as much code porting SSE to AVX as you will porting the 256-bit AVX instructions to 512-bit AVX ones. The only advantage is that we know what the new 512-bit instructions will look like, and can plan for the future!

      I've already had a crack at porting a fair amount of existing code to AVX (Intel compiler comes with an emulator - I don't have any hardware obviously!). For code optimised in an SOA layout, coding is going to be quite fun. If you have a load of Vector3/Matrix type classes, then you aren't going to get much performance benefit. The nicest thing about AVX for that kind of code is that it allows you to convert to double, perform the operation, convert back to floats, and the resulting code should run at about the same speed as the SSE equivalent..... (a touch slower, but not enough to care....)
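The layout point is easier to see with the data in front of you. This Python sketch only mimics an 8-lane, 256-bit register: `simd_add` is a hypothetical stand-in for an instruction like `_mm256_add_ps`, not a real binding. In SoA form each register load is eight x's (or y's, or z's) in a row, so every lane does useful work, whereas one AoS Vector3 per register fills only 3 of the 8 lanes.

```python
LANES = 8  # 256 bits / 32-bit floats


def simd_add(a, b):
    """Stand-in for one 8-wide vector add: lane-wise sum of two 8-float chunks."""
    assert len(a) == len(b) == LANES
    return [x + y for x, y in zip(a, b)]


def aos_to_soa(points):
    """[(x, y, z), ...] -> ([x, ...], [y, ...], [z, ...])."""
    return tuple(list(component) for component in zip(*points))


def add_soa(a, b):
    """Add two SoA point sets one full register at a time.

    Assumes the point count is a multiple of LANES to keep the sketch short.
    """
    result = []
    for comp_a, comp_b in zip(a, b):
        summed = []
        for i in range(0, len(comp_a), LANES):
            summed.extend(simd_add(comp_a[i:i + LANES], comp_b[i:i + LANES]))
        result.append(summed)
    return tuple(result)


def aos_lane_utilization(components=3):
    """Fraction of lanes doing useful work when one Vector3 occupies a register."""
    return components / LANES


points = [(float(i), 2.0 * i, 3.0 * i) for i in range(16)]
soa = aos_to_soa(points)
print(add_soa(soa, soa)[0][:4])   # [0.0, 2.0, 4.0, 6.0]
print(aos_lane_utilization())     # 0.375
```

Moving the same loop from 8 to 16 lanes only changes `LANES` here because the data is already laid out as long runs of one component; the AoS version has no such slack to absorb.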

    22. Re:It depends? by pnewhook · · Score: 1

      Width is part of it, but it's also clock rate. The fastest overclocked DDR3 will go to 2.5GHz. The stock GeForce 480 is 3.7GHz. At those rates the bus length gets to be an issue. The memory on a graphics card can be kept very close to the chip. On a PC, the memory has to be set farther away for practical reasons, resulting in necessarily slower clocks and data rates.

      The 51 GB/sec you mention is definitely overclocked. I've not seen stock memory that fast. Even so, it's still less than a third of the rate of the graphics memory.

      --
      Tesla was a genius. Edison however was a overrated hack who liked to torture puppies.
    23. Re:It depends? by Anonymous Coward · · Score: 0

      GTX480? You mean the Pentium 4 of the graphics card world?

    24. Re:It depends? by pnewhook · · Score: 1

      No, I mean the fastest consumer card available right now.

      --
      Tesla was a genius. Edison however was a overrated hack who liked to torture puppies.
    25. Re:It depends? by kramulous · · Score: 1

      Some of the examples used in the cudaSDK are phoney. The Sobel one can be made to run faster on the CPU, provided you use the Intel compilers and performance primitives and can parallelise.

      It doesn't surprise me. There is an example of Sobel for FPGAs that touts much faster execution times, but when you examine the code, the FPGA version has algorithmic optimisations that were 'left out' of the CPU version. Again, it can be made to run faster on the CPU.

      I'm not saying that GPUs are crap. For the right problem, they can be really good. It is just that they are not anywhere near the magic bullet the NVidia PR machine is saying.
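For reference, this is the computation being benchmarked: a plain Python sketch of the Sobel operator (gradient magnitude from the standard 3x3 kernels, border pixels left at zero). It is a straightforward baseline, not the SDK's code. Any algorithmic change applied to one side of a comparison, for example exploiting the fact that both Sobel kernels separate into a 1-D smoothing pass and a 1-D difference pass, has to be applied to the other side too for the speedup figure to mean anything.

```python
import math

GX = ((-1, 0, 1),
      (-2, 0, 2),
      (-1, 0, 1))
GY = ((-1, -2, -1),
      ( 0,  0,  0),
      ( 1,  2,  1))


def sobel(img):
    """Gradient magnitude of a 2-D list of numbers; border pixels stay 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for j in range(3):
                for i in range(3):
                    p = img[y + j - 1][x + i - 1]
                    gx += GX[j][i] * p
                    gy += GY[j][i] * p
            out[y][x] = math.hypot(gx, gy)
    return out


# A vertical edge: dark on the left, bright in the right-hand column.
edge = [[0, 0, 10],
        [0, 0, 10],
        [0, 0, 10]]
print(sobel(edge)[1][1])  # 40.0
```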

    26. Re:It depends? by jesset77 · · Score: 1

      Blah, why can't I get a good GPU accelerated Mandelbrot set viewer, then? z = z^2 + c meets all your criteria great, dun it? :P
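For what it's worth, the escape-time loop does match the first three criteria: every pixel is independent and the arithmetic is floating point. Where it bends the fourth is the early exit, since neighboring pixels near the set's boundary leave the loop after very different iteration counts, which is exactly the divergence that costs GPUs. A minimal Python sketch of the per-pixel kernel (double precision; the viewport numbers are arbitrary choices):

```python
def escape_time(c, max_iter=100):
    """Iterations of z = z*z + c, starting from z = 0, until |z| > 2 (capped at max_iter)."""
    z = 0j
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return n + 1  # escaped: this pixel's loop ends early
    return max_iter       # never escaped: treated as inside the set


def render(width=40, height=20, max_iter=50):
    """ASCII view of the set: '#' inside, '.' outside."""
    rows = []
    for j in range(height):
        y = 1.2 - 2.4 * j / (height - 1)
        row = []
        for i in range(width):
            x = -2.0 + 2.6 * i / (width - 1)
            inside = escape_time(complex(x, y), max_iter) == max_iter
            row.append("#" if inside else ".")
        rows.append("".join(row))
    return "\n".join(rows)


print(render())
```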

      --
      People willing to trade their freedom of expression for temporary entertainment deserve neither and will lose both.
    27. Re:It depends? by Anonymous Coward · · Score: 0

      All 4 of your points are quite naive...

      1) Your problem does NOT need to be infinitely parallel to gain speedups. Most stream processors (be it a GPU, Cell, DSP, whatever) are clocked at 1/4 to 1/2 of your consumer-level CPU, with far faster cache/memory interfaces. So at worst you need to be able to split your workload into 2-4 workloads that can run on each execution unit to have your code 'execute' as fast as a CPU (more is generally faster, depending on memory transactions, cache read/write latency, etc.). Note: 'execute' does not mean 'complete' faster (latency can cause a single execution unit to not be 'ready' for a while).

      2) Most modern stream processors (nVidia GPUs, DSPs, Cell) can do 32bit and 64bit integer and FP mathematics as fast as a CPU, in many cases 32/64bit FP faster than a CPU - so again, as long as you spread your workload over the 2-4 execution units you'll be 'on par' with a single CPU core.

      3) Most workloads aren't as simple as running a tiny piece of code over 2-4 execution units for a few hundred µs; most workloads take many ms (overall) while individual execution units take a few µs. In this case many stream processors are capable of executing code while concurrently (either half or full duplex, depending) transferring memory to/from the GPU. In many cases it's very possible to concurrently execute code running over many gigabytes of data without incurring any obvious memory transfer overheads. The obvious limitation here is if your workload is extremely memory bound and you need so much data that you reach the limits of the PCI Express bus bandwidth.

      4) There are so many parallel programming paradigms to avoid branching it's not funny. I agree that if you naively copy/paste code written for a CPU and try to run it on a stream processor, you'll have major issues. But if you actually write an algorithm from the ground up with the execution and scheduling model of a stream processor in mind, you'll find you don't really branch much, if at all (at the cost of a few extra registers or cache).
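Point 4 in miniature: instead of branching, compute both candidate results and blend them with a 0/1 mask, which is the kind of job GLSL's built-in step() and mix() functions are commonly used for. The Python below only illustrates the pattern; the piecewise function is a made-up example.

```python
def f_branchy(x):
    """Reference version with an if/else."""
    if x < 1.0:
        return x * x
    return 2.0 * x - 1.0


def f_branchless(x):
    """Same function with no control flow: both sides are evaluated, then selected."""
    m = float(x < 1.0)  # 1.0 where the 'if' side applies, else 0.0
    return m * (x * x) + (1.0 - m) * (2.0 * x - 1.0)


xs = [-2.0, 0.5, 1.0, 3.0]
print([f_branchless(x) for x in xs])  # [4.0, 0.25, 1.0, 5.0]
```

The trade-off is that both sides are always evaluated, so each must be cheap and safe for every input: an infinity or NaN on the unused side still poisons the blend, since 0 times infinity is NaN.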

    28. Re:It depends? by internettoughguy · · Score: 1

      There are also ferrari tractors, unrelated to the sports car manufacturer though.

    29. Re:It depends? by Anonymous Coward · · Score: 0

      I put my bets on Intel and AMD. At least they publish documentation. What makes me sad is that GPUs are not standardized like USB mass storage. There is no OS-independent access method. This is anti-competitive because it destroys new OS players, even if they are open source and live on donations. Lack of standards and documentation means (for me) a crappy product. Until that changes, I prefer to do my work on my quad-core Atom with TBB. OpenCL/OpenGL are not for me until they are embedded in the GPU.

    30. Re:It depends? by drinkypoo · · Score: 1

      If your theory was true, why hasn't it already happened? Both AMD and Nvidia have been putting pretty nice GPUs on motherboards for quite awhile, yet we still have discrete cards, why?

      You have misunderstood the theory. The theory is that level of functionality will be merged into the CPU. Not into the motherboard.

      There is a good reason why, for the most basic office task, even two or three year old gaming, the onboard chips work fine. I myself played Bioshock I and Swat 4 on my onboard with no trouble.

      We still have discrete graphics cards because onboard GPUs are sufficient for most tasks? Do you have any idea what you're saying here?

      But for anything where you care even a little bit about REAL performance the onboards, and I don't care if we are talking onboard or on die, simply won't have a chance. You just can't put hundreds of Mb or even Gbs of RAM onto the die.

      They're not on the die in a motherboard-integrated solution, either. They use system memory. Further, CPUs can already access system memory. You clearly have no idea what you are talking about.

      So while you think discrete GPUs are gonna die, I'd say the opposite is true. I think the onboards will be used in machines where price trumps everything, such as bottom of the line netbooks and Walmart/Best Buy "specials", whereas for everything else since HD and games like Warcrack will continue to be popular and thus selling points discretes will bring in the "wow" factor and help OEMs to differentiate their products.

      You don't get it. ALL GPUs are going to go away, because CPUs are getting better at GPU tasks faster than GPUs are getting better at CPU tasks.

      Although I do agree on one point you made: Nvidia. They need to buy out Via and they need to do it yesterday. If Nvidia doesn't prevail in an antitrust against Intel over the new socket they are gonna be dead in the water, as both AMD and Intel can offer "top to bottom" solutions (although I would call an Intel GPU a problem not a solution) and Nvidia simply doesn't have anything to compete.

      This part, at least, makes sense. Although I don't think buying VIA would actually help them. VIA has no competitive processors whatsoever.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    31. Re:It depends? by CyberDragon777 · · Score: 1

      You mean the HD5970?

      --
      We both said a lot of things that you are going to regret.
    32. Re:It depends? by hairyfeet · · Score: 1

      I understand perfectly; it is you that probably needs a WHOOSH here. I know all about having everything onboard, as I'm old enough to remember when there was NOTHING but onboard. The problem with your theory is that unless you make the CPU as easy to toss as the old Slot 1s, it simply isn't gonna work for anything but the most basic tasks. And system memory will ALWAYS suck...full stop. That is why my CPU has 6MB of cache onboard; do you think they would waste that much die space on cache if memory access weren't slow as ass? Try watching full HD or playing a decent game with an onboard, then do the same with a $35 discrete. Even the shitty discrete will STOMP. What I am saying is that for office tasks, or the girl whose "gaming" consists of FB, they have no need for anything better.

      And CPUs are better at GPU tasks? Citation please? Because I gotta call bullshit on that one. Even with SSE4 the CPU still sucks ass for even basic transcoding when compared to GPUs, which is the whole reason why we see both AMD and Nvidia investing serious money in GP/GPU. It is because for HPC and other huge tasks the CPU just doesn't compare, not even close. Or are you trying to stand here and seriously argue a Pinetrail or Bulldozer CPU is gonna compete with an HD5xxx or Fermi?

      I'm sorry, but I've been hearing about the "death of the GPU" since Unreal, yet here they are. Why? Because I don't give a shit if it is on the board, the die, or in the microcode, it is gonna suck compared to a discrete. The reason we are seeing both Intel and AMD go to a GPU on a CPU has NOTHING to do with performance at all, and everything to do with power and cost. It is the old "race to the bottom" where by having fewer chips you can cut costs and lower power usage, but for anything other than MS Office or FB it is gonna blow chunks. Funny thing is ARM has seen the wisdom of going the other way, with specialized chips for things like HD decoding.

      But I tell you what, you bookmark this and come back in 2 years and we'll see who is right. My prediction is thus: The ONLY thing that the combo CPU/GPU will kill is bottom of the line shit chips like the Intel IGP or the 3xxx motherboard GPUs. For everything else they will be bragging (and charging more money) for the power of their discrete GPUs, just like they do now. Or do you seriously think AMD sunk all that money into ATI simply to kill the goose that lays the golden eggs?

      --
      ACs don't waste your time replying, your posts are never seen by me.
    33. Re:It depends? by Anonymous Coward · · Score: 0

      Here you go. From the Orange Book (2nd edition), pp. 467-475. Typos, however, are mine :)
      Vertex shader

      uniform vec3 LightPosition;
      uniform float SpecularContribution;
      uniform float DiffuseContribution;
      uniform float Shininess;

      varying float LightIntensity;
      varying vec3 Position;

      void main()
      {
          vec3 ecPosition = vec3(gl_ModelViewMatrix * gl_Vertex);
          vec3 tnorm = normalize(gl_NormalMatrix * gl_Normal);
          vec3 lightVec = normalize(LightPosition - ecPosition);
          vec3 reflectVec = reflect(-lightVec, tnorm);
          vec3 viewVec = normalize(-ecPosition);
          float spec = max(dot(reflectVec, viewVec), 0.0);
          spec = pow(spec, Shininess);
          LightIntensity = DiffuseContribution * max(dot(lightVec, tnorm), 0.0)
                         + SpecularContribution * spec;
          // Texture coordinates in [0,1] are recentred and scaled to [-2.5, 2.5]
          Position = vec3(gl_MultiTexCoord0 - 0.5) * 5.0;
          gl_Position = ftransform();
      }

      Fragment shader

      varying vec3 Position;
      varying float LightIntensity;

      uniform float MaxIterations;
      uniform float Zoom;
      uniform float Xcenter;
      uniform float Ycenter;
      uniform vec3 InnerColor;
      uniform vec3 OuterColor1;
      uniform vec3 OuterColor2;

      void main()
      {
          float real = Position.x * Zoom + Xcenter;
          float imag = Position.y * Zoom + Ycenter;
          float Creal = real; // change this line ...
          float Cimag = imag; // ... and this one to get a Julia set

          float r2 = 0.0;
          float iter;
          for (iter = 0.0; iter < MaxIterations && r2 < 4.0; ++iter)
          {
              float tempreal = real;
              real = (tempreal * tempreal) - (imag * imag) + Creal;
              imag = 2.0 * tempreal * imag + Cimag;
              r2 = (real * real) + (imag * imag);
          }

          // Base the color on the number of iterations
          vec3 color;
          if (r2 < 4.0) color = InnerColor;
          else color = mix(OuterColor1, OuterColor2, fract(iter * 0.5));
          color *= LightIntensity;
          gl_FragColor = vec4(color, 1.0);
      }
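
      For comparison, here is a minimal CPU-side sketch in Python of the per-pixel loop the fragment shader above runs. It assumes the same escape test (r2 < 4.0) and the same texture-coordinate mapping as the vertex shader; the function names and the ASCII "render" are hypothetical, for illustration only. On a GPU every pixel runs this loop in parallel; here the CPU visits the pixels one at a time.

```python
def mandel_iterations(real, imag, max_iterations):
    """Mirror of the shader's loop: returns (iterations run, final r2)."""
    c_real, c_imag = real, imag  # fix these to constants instead to get a Julia set
    r2 = 0.0
    iterations = 0
    while iterations < max_iterations and r2 < 4.0:
        temp_real = real
        real = temp_real * temp_real - imag * imag + c_real
        imag = 2.0 * temp_real * imag + c_imag
        r2 = real * real + imag * imag
        iterations += 1
    return iterations, r2


def render(width, height, zoom=1.0, x_center=0.0, y_center=0.0, max_iterations=50):
    """Coarse ASCII render: '#' where the point never escaped (InnerColor), '.' elsewhere."""
    rows = []
    for py in range(height):
        row = []
        for px in range(width):
            # Same mapping as the vertex shader: (texcoord - 0.5) * 5.0, texcoord in [0, 1]
            x = (px / (width - 1) - 0.5) * 5.0
            y = (py / (height - 1) - 0.5) * 5.0
            _, r2 = mandel_iterations(x * zoom + x_center, y * zoom + y_center, max_iterations)
            row.append("#" if r2 < 4.0 else ".")
        rows.append("".join(row))
    return rows


print("\n".join(render(41, 21)))
```

      The shader colours escaped points by iteration count with mix(); the sketch just marks them with "." to keep it short.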

    34. Re:It depends? by drinkypoo · · Score: 1

      I understand perfectly, it is you that probably needs a WHOOSH here.

      Well, give it a shot.

      I know all about having everything onboard, as I'm old enough to remember when there was NOTHING but onboard.

      Well, I had an Altos CP/M machine that had everything onboard, once. But my next one, a Kaypro 4, carried its modem on a daughterboard. So really, we're talking about times so old as to not be worth mentioning.

      The problem with your theory is unless you make the CPU as easy to toss as the old Slot 1s it simply isn't gonna work for anything but the most basic tasks.

      Unless the CPU is easily removable and discardable, it simply isn't going to be able to compute on the proper level? We're talking about power, not packages. Speak English, and make it relevant.

      And system memory will ALWAYS suck...full stop.

      That's funny, my crystal ball remains cloudy on the subject. Even my magic 8-Ball is no help. Neither of them can explain why the same type of memory is commonly used for system RAM and GPUs (i.e. various DDR SDRAM variants.) The only problem with the typical PC memory bus is that it is too narrow.

      Try watching full HD or playing a decent game with an onboard, then do the same with a $35 discrete. Even the shitty discrete will STOMP.

      Watching full HD is a completely solved problem with almost any modern video card. Even under Linux I have hardware-accelerated video decoding. You can get this functionality, more than enough to smoothly play 1080p H.264 on Windows or Linux, in a $200 Acer Aspire. Obviously, current integrated GPUs are more than sufficient for this task. Playing games is a little different, but it of course depends on how far you're trying to push the hardware. I can play Mechwarrior 4 at 720p full detail very smoothly on my Gateway LT3201u.

      However, this argument is completely and totally irrelevant for a further reason: the idea is that the CPU will surpass the GPU, especially the integrated one.

      And CPUs are better at GPU tasks? Citation please? Because I gotta call bullshit on that one.

      Logical fallacy: attacking a straw man. I note you don't bother to quote blocks of my text, because it would make it too obvious that you are not replying to them, but instead making shit up. Here's what I actually said: You don't get it. ALL GPUs are going to go away, because CPUs are getting better at GPU tasks faster than GPUs are getting better at CPU tasks. Here's how you interpreted it: "CPUs do GPU stuff faster than a GPU", which is of course not what I said. It's too bad I changed my sig recently; it used to say that you should read and attempt to understand my comment before replying. I ask you for this now. If you will not do me this courtesy then I will have to stop reading what you write, because it is becoming clear that you are not interested in a conversation, only in continually repeating your beliefs without addressing any points. You will, in fact, be deliberately obtuse in order to avoid addressing the actual points which are inconvenient to your argument.

      Even with SSE4 the CPU still sucks ass for even basic transcoding when compared to GPUs,

      Please compare on a total watt-for-watt basis.

      But I tell you what, you bookmark this and come back in 2 years and we'll see who is right.

      A single successful hardware project like the launch of a new chip can take five years or more from start to finish. You want this to be resolved in two? You're attempting to create a bullshit success condition to make yourself look clever. I am not buying in.

      Or do you seriously think AMD sunk all that money into ATI simply to kill the goose that lays the golden eggs?

      I think they did it because their technology is behin

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    35. Re:It depends? by Calinous · · Score: 1

      CPUs have a long way to go to reach the level of memory bandwidth and latency available to a top-grade graphics card: even triple-channel DDR3 kits offer only some 50 GB/s, while an AMD HD 4870 has some 115 GB/s. Latency on the GPU is similar to that of top-end desktop memory (server memory tends to have lower latency and reduced speed, though, even if servers might use more memory channels than the three used in top-end desktops).
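
      Those figures fall out of peak bandwidth = transfer rate x bus width in bytes x channels. A quick sketch; the speed grades (DDR3-1333 and DDR3-2133 for the desktop, 3600 MT/s effective GDDR5 on a 256-bit bus for the HD 4870) are assumptions chosen because they reproduce the quoted numbers. Most of the gap comes from the wider bus (256 bits on the card versus 3 x 64 bits on the desktop) and the higher transfer rate.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s, bus_width_bits, channels=1):
    """Theoretical peak bandwidth in GB/s (10**9 bytes per second)."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9


# Desktop: three 64-bit DDR3 channels (speed grades are assumptions)
ddr3_1333 = peak_bandwidth_gb_s(1333, 64, channels=3)  # about 32.0 GB/s
ddr3_2133 = peak_bandwidth_gb_s(2133, 64, channels=3)  # about 51.2 GB/s

# HD 4870: one 256-bit GDDR5 interface at 3600 MT/s effective
hd4870 = peak_bandwidth_gb_s(3600, 256)  # 115.2 GB/s

print(f"{ddr3_1333:.1f} {ddr3_2133:.1f} {hd4870:.1f}")
print(f"GPU advantage over DDR3-2133: {hd4870 / ddr3_2133:.2f}x")
```

      These are theoretical peaks; sustained bandwidth on either side is lower.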

    36. Re:It depends? by hairyfeet · · Score: 1

      Since you seem to be having trouble understanding, I'll break it down, kay? Let's take a machine just three years old, a standard "Best Buy Special". Now if your theory were true, that machine would either A) have to be thrown in the trash, very wasteful, or B) be able to have the CPU trivially thrown away, because with the GPU integrated it will NOT be able to do the tasks required today. Is that so hard to understand? HD is going nowhere but UP: higher resolutions, bigger screens, etc. This is trivially handled by adding a new GPU, but if we only have CPU+GPU? Well, let's just say I hope you have a way to dispose of the massive amounts of eWaste you'll generate.

      And talk about straw men! Comparing GDDR5 to DDR RAM? Uhh...bullshit? Even the shittiest discrete can crank out insane GB/s of bandwidth. Please show me a SINGLE BOARD that can even hope to approach what a $100 discrete can push. You can't, can you? That is because general RAM will always suck...full stop. And here you are accusing me of not reading while you COMPLETELY IGNORE what I write. Pot meets kettle? Sure, your shitty onboard can decode a 720p, hell if you are lucky it may even do 1080i without pegging the CPU to 100%, but what about transcoding? What about 3D, the next big wave of video? What about 2500x1900 resolution? These things are trivial by adding a GPU, but if it is integrated, how will you solve it? Will you tell everyone to throw away their machines and buy new? While I'm sure that would be nice for Intel and AMD it wouldn't be cost effective for the rest of us.

      Finally since you want me to quote, I will. You said, and I quote "You don't get it. ALL GPUs are going to go away, because CPUs are getting better at GPU tasks faster than GPUs are getting better at CPU tasks. " Where is your proof? Pinetrail? A FIVE YEAR OLD GPU jammed into the CPU? Even AMD says Bulldozer is to be for the "budget" (translation...cheap ass) market, so where is your proof? Show me a SINGLE page where either Intel or AMD has said they are going to put CURRENT cutting edge GPUs onboard the die. Just one, because from what I've seen they are integrating chips that wouldn't even be considered "cutting edge" three years ago. Bulldozer is gonna be based on the HD3xxx line, whereas Intel is based on the HD4xxx, neither of which are worth writing home about.

      I repeat, I'm old enough to remember the designer of Unreal swearing that GPUs would be dead in 5 years, that was 1999 I do believe. Yet here we are with more GPUs than we can shake a stick at. We have heard the "death of" stories for the past decade and the ONLY one they got right was Sound, and that is because there simply hasn't been an advancement in sound since surround sound a decade ago. Maybe a decade from now when everyone has settled on 5900x2300 res 50in flat panels, maybe we will see the death of GPUs then. But I haven't seen a single shred of evidence to support your claims, unless you think nobody will want more than a 5 year old integrated GPU. The only market I see for integrated GPU on die is the absolute bottom of the line, but those people never bought GPUs in the first place so who cares?

      --
      ACs don't waste your time replying, your posts are never seen by me.
    37. Re:It depends? by DarthVain · · Score: 1

      Shhhhh.... You're giving rednecks around the world hope that someday John Deere will make a sports car...

    38. Re:It depends? by Anonymous Coward · · Score: 0

      Or at least monster trucks. That would be nearly as awesome.

    39. Re:It depends? by drinkypoo · · Score: 1

      Finally since you want me to quote, I will. You said, and I quote "You don't get it. ALL GPUs are going to go away, because CPUs are getting better at GPU tasks faster than GPUs are getting better at CPU tasks. " Where is your proof? Pinetrail? A FIVE YEAR OLD GPU jammed into the CPU?

      You're so stupid, I can barely stand it. You want a GPU put into a CPU package to be proof that GPUs are going away? I said they're going away, not going into the package with the CPU. This is the general theme of your "conversation": attacking straw men. Welcome to my foes list, idiot. I can't waste more time on someone who doesn't understand when they are spewing logical fallacies (or who does it on purpose; I have not totally ruled out the possibility that you are a troll. But I suspect you simply have very poor reading comprehension skills and persist in assuming that you only need to read things through once when there are big words involved.)

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    40. Re:It depends? by pnewhook · · Score: 1

      I've had really bad experiences with ATI cards, especially the poor OpenGL support which I use predominantly. I'm sticking with nVidia - OpenGL is always faster on nVidia than ATI.

      --
      Tesla was a genius. Edison, however, was an overrated hack who liked to torture puppies.
    41. Re:It depends? by jawtheshark · · Score: 1

      Hehehe... Good one :-)

      --
      Ahhh...the great dumpster continuum. Many a free computer will be found there. -- sowth (748135)
    42. Re:It depends? by hairyfeet · · Score: 1

      Ahhh...so there it is. You can't even come up with a SINGLE SOURCE to back up your claims, so all you can do is insult and add me to some imaginary list I couldn't give less of a shit about. Did I call you names? Nope that would be you acting all childish because you have NO PROOF AT ALL and are just pulling a theory out of your ass.

      If "CPUs do it better" as you say, why are Intel and AMD trying to integrate GPUs? Wouldn't that be a waste? Of course it would be, but the answer is simple: your theory is simply bullshit. Video? GPUs stomp, have stomped, continue to stomp. The best Intel and AMD have come up with is SSE4a, which will still peg out a CPU just converting 720p. Gaming? Show me a SINGLE CPU that can play Far Cry 1 (a nearly 8 year old game BTW) without GPU assistance. Just one. You can't, can you?

      If you would think, even for just a moment, you would see that your argument makes NO sense. It is like saying x86 is gonna replace ARM. A CPU is by its very definition a general processing unit, where die space is costly and better served with more cores. A GPU is a highly specialized vector processor designed for multimedia and 3D applications. The few GPU-centric tasks added to CPUs have been to support a tiny subset of what even an 8 year old GPU can do. Show me a single CPU that can even hold its own against a 5 year old GPU in its tasks, just one.

      But instead you simply insult when you are shown to be mistaken. You are like those Linux zealots that make claims "Linux is pulling ahead!" and when shown wrong suddenly change their definition of what ahead means or call someone a shill. Yet again I point out we have heard the "death of the GPU" since 1999, and if anything with GP/GPU and Eyefinity GPUs have become an even more integral part of multimedia. Where is the proof of your claims? Show me a white paper backing you up, a single citation, anything at all? When is this "magic CPU" that makes GPUs obsolete supposed to be here? Five years? Twenty? Where is it on the Intel or AMD roadmap?

      Here is your chance to put your money where your mouth is, Drinkypoo: prove me wrong. You have the ENTIRE INTERNET at your disposal; surely if you aren't just making this up out of your ass you'll be able to provide citations? Whitepapers? Anything at all? Otherwise you are just another troll, throwing insults because someone points out your "Theory" is complete and total horseshit with no evidence to back it up. If you would like I can plaster this post with everything from roadmaps to whitepapers showing every. single. thing. I said to be true, but you already know this, don't you?

      --
      ACs don't waste your time replying, your posts are never seen by me.
    43. Re:It depends? by Jorophose · · Score: 1

      If Nvidia hasn't been allowed to purchase Via's x86 license by that point they are quite frankly out of business.

      Apparently, licensing terms for access to the x86 designs forbid it passing to non-US hands. I think VIA is working a loophole here, because Centaur is doing the CPU designs, VIA does the branding, chipsets, and sometimes the motherboards.

      However, it might work if nVidia acquired VIA and Centaur, and merged VIA's chipset departments into nVidia's (and segmented them or something), because otherwise they're only looking for Centaur and it ain't going to be pretty. I'm just disappointed in nVidia myself for letting Nano go the way it did, and not partnering GeForce+Nano properly.

  4. Re:Intel is gay by Anonymous Coward · · Score: 0

    Do you mean that Intel is homosexual? I'm not even sure what that would mean. It's attracted to others of the same gender? Seems a bizarre thing to say about a company.

    It seems unlikely that you're using the older sense of the word, meaning "happy". It would make some sort of sense if the company turned out to give its employees a positive experience from working there, but I don't see any evidence of that.

    Or perhaps you simply consider it somehow reprehensible that a company makes semiconductors. Why do you have an aversion to this practice?

  5. You lazy fuckers by drinkypoo · · Score: 5, Interesting

    I don't expect slashdot "editors" to actually edit, but could you at least link to the most applicable past story on the subject? It's almost like you people don't care if slashdot appears at all competent. Snicker.

    --
    "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    1. Re:You lazy fuckers by jgardia · · Score: 1

      s/editors/kdawson/g

    2. Re:You lazy fuckers by Anonymous Coward · · Score: 0

      s/kdawson/monkey_with_typewriter/g

      If you are going to substitute, at least substitute with something better.

    3. Re:You lazy fuckers by perryizgr8 · · Score: 1

      what does this mean? totally lost, i am.

      --
      Wealth is the gift that keeps on giving.
    4. Re:You lazy fuckers by Entropius · · Score: 0, Offtopic

      So, once upon a time, there was this text editor called vi.

      To make it do shit you type in cryptic commands. The one for search-and-replace is s, followed by a slash, followed by the thing you want to search for, followed by another slash, followed by the thing you want to replace it with. Because of more arcana, this will only happen once per line unless you put a g after it.

      So s/cat/dog/g means "replace all occurrences of cat with dog".

      Incidentally, you also have to tell vi in what range it should do this operation. So you get cryptic commands like :1,$s/cat/dog/g
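
      As a rough analogy (not a reimplementation of vi, ed or sed), Python's re.sub shows what the trailing g changes: its count argument limits how many matches per string get replaced, with 0 meaning all of them. The helper name `substitute` is made up for this sketch, and looping over every line plays the role of the 1,$ range.

```python
import re


def substitute(text, pattern, replacement, global_flag=False):
    """Apply s/pattern/replacement/[g] to every line, like :1,$s/.../.../[g]."""
    count = 0 if global_flag else 1  # re.sub: count=0 replaces every match
    return "\n".join(
        re.sub(pattern, replacement, line, count=count) for line in text.split("\n")
    )


text = "cat sat on the cat\nthe cat ran"
print(substitute(text, "cat", "dog"))                    # only the first cat on each line
print(substitute(text, "cat", "dog", global_flag=True))  # every cat
```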

    5. Re:You lazy fuckers by 3.1415926535 · · Score: 1

      I think that predates vi. Good ol' "ed", the line editor, has s/foo/bar/g command.

    6. Re:You lazy fuckers by cynyr · · Score: 1

      sed has the same syntax as well.

      --
      All of the above was encrypted with a Quad ROT-13 method. Unauthorized decryption is in violation of the DMCA.
    7. Re:You lazy fuckers by Anonymous Coward · · Score: 0

      Is this sarcasm?

  6. AMD by MadGeek007 · · Score: 5, Funny

    AMD must feel very conflicted...

  7. Re:AMD by Lennie · · Score: 1

    The troll did have one point, the subject: where is AMD/ATI in this article? Didn't they also have a product in that segment?

    --
    New things are always on the horizon
  8. CPUs and GPUs have different goals by leptogenesis · · Score: 5, Interesting

    At least as far as parallel computing goes. CPUs have been designed for decades to handle sequential problems, where each new computation is likely to have dependencies on the results of recent computations. GPUs, on the other hand, are designed for situations where most of the operations happen on huge vectors of data; the reason they work well isn't really that they have many cores, but that the operations for splitting up the data and distributing it to the cores are (supposedly) done in hardware. In a CPU, the programmer has to deal with splitting up the data, and allowing the programmer to control that process makes many hardware optimizations impossible.

    The surprising thing in TFA is that Intel is claiming to have done almost as well on a problem that NVIDIA used to tout their GPUs. It really makes me wonder what problem it was. The claim that "performance on both CPUs and GPUs is limited by memory bandwidth" seems particularly suspect, since on a good GPU the memory access should be parallelized.

    It's clear that Intel wants a piece of the growing CUDA userbase, but I think it will be a while before any x86 processor can compete with a GPU on the problems that a GPU's architecture was specifically designed to address.
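
    To make the "who splits up the data" point concrete: on a CPU the programmer typically chunks the vector and hands the pieces to threads by hand. A minimal sketch using Python's concurrent.futures; the SAXPY-style operation, chunking scheme and worker count are illustrative assumptions. In standard CPython the GIL means these threads won't actually speed up pure-Python arithmetic, so this shows the structure of the work split, not the performance.

```python
from concurrent.futures import ThreadPoolExecutor


def saxpy_chunk(a, xs, ys):
    """Compute a*x + y element-wise for one slice of the input vectors."""
    return [a * x + y for x, y in zip(xs, ys)]


def parallel_saxpy(a, xs, ys, workers=4):
    """Split the vectors into contiguous chunks by hand, one task per chunk."""
    n = len(xs)
    chunk = max(1, (n + workers - 1) // workers)  # ceiling division covers every element
    bounds = [(start, min(start + chunk, n)) for start in range(0, n, chunk)]
    result = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda b: saxpy_chunk(a, xs[b[0]:b[1]], ys[b[0]:b[1]]), bounds)
        for part in parts:  # map() yields results in the order the chunks were submitted
            result.extend(part)
    return result


print(parallel_saxpy(2.0, [1, 2, 3, 4, 5], [10, 10, 10, 10, 10], workers=2))
```

    On a GPU the equivalent kernel would just compute one element per thread and leave the distribution to the hardware scheduler.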

    1. Re:CPUs and GPUs have different goals by rahvin112 · · Score: 1

      All 3 of them?

  9. Straw man? by Posting=!Working · · Score: 1

    The author doesn't understand what the straw man argument is. He thinks it is bringing up anything that isn't specifically mentioned in the original argument. Nvidia stating that optimizing multi-core CPUs is difficult and that the Nvidia architecture has hundreds of applications seeing a huge gain in performance now is a valid point even if the Intel side never mentioned the difficulty of implementation.

    --
    This sentence no verb.
  10. Re:AMD by TheGryphon · · Score: 1

    "daddy, what's AMD?" ... "well son, its that company that tried to keep doing everything at once and died."

  11. Intel says "Buy Nvidia" by Posting=!Working · · Score: 4, Insightful

    What the hell kind of sales pitch is "We're only a little more than twice as slow!"

    [W]e perform a rigorous performance analysis and find that after applying optimizations appropriate for both CPUs and GPUs the performance gap between an Nvidia GTX280 processor and the Intel Core i7 960 processor narrows to only 2.5x on average.

    It's gonna work, too.

    Humanity sucks at math.

    --
    This sentence no verb.
    1. Re:Intel says "Buy Nvidia" by Cassini2 · · Score: 1

      What the hell kind of sales pitch is "We're only a little more than twice as slow!"

      A 2x speed gain is the point where it becomes pointless to exploit specialized hardware. Frequently, the software development program manager has two choices:
      a) Ship a product now, or
      b) Spend 1 to 2 more years developing the product, then ship it.
      The issue is that hardware doubles in speed every 1 to 2 years. If the cost of exploiting current specialized hardware is an additional 1 to 2 years software development, then the "user" performance at the end of 1 to 2 years is the same.

      The revenue, however, is not the same. Being in the market first can yield additional sales; simply having a product on the market results in sales. As such, delaying software development to get a speed gain adversely affects revenues.

      Most significantly, having a product in the marketplace allows one to understand what the users "want". The users might not want speed. They might have some massive algorithmic problem with the product. Perhaps you designed the software for a small investor tracking one stock, and the purchasers are Wall Street traders tracking thousands of stocks. In this case, the program needs to be restructured to better handle the problems the user base wants solved.

      The "additional software development time" argument significantly reduces the usefulness of the CUDA approach. Intel delivers processors that can speed up the current software, immediately. No need to rewrite software. "Twice as slow" is approximately the break even point for many businesses.
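
      The break-even argument can be put in numbers. Assume hardware doubles in speed every T years and porting delays shipping by d years; simply waiting would then have bought users 2^(d/T) for free. A minimal sketch under that simplified model (function names are hypothetical), plugging in the 2.5x average gap Intel's paper reports:

```python
def hardware_gain(delay_years, doubling_period_years):
    """Speedup users get for free by waiting delay_years for newer hardware."""
    return 2 ** (delay_years / doubling_period_years)


def port_beats_waiting(port_speedup, delay_years, doubling_period_years):
    """True if a port that ships delay_years late beats just shipping now.

    Simplified model: it ignores that the specialized hardware also
    gets faster during the delay.
    """
    return port_speedup > hardware_gain(delay_years, doubling_period_years)


# Intel's claimed 2.5x average GPU advantage, hardware doubling every year
print(port_beats_waiting(2.5, delay_years=1, doubling_period_years=1))  # True
print(port_beats_waiting(2.5, delay_years=2, doubling_period_years=1))  # False
print(port_beats_waiting(2.0, delay_years=1, doubling_period_years=1))  # False: exact break-even
```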

    2. Re:Intel says "Buy Nvidia" by royallthefourth · · Score: 1

      I did an experiment on a Core 2 Duo a couple years ago and found it to be only 5% as fast at doing a huge matrix multiply compared to a (then) top-of-the-line Nvidia. So, they're catching up pretty well.

      That's worth noting for people who've been following this closely for a while.

    3. Re:Intel says "Buy Nvidia" by evilviper · · Score: 1

      What the hell kind of sales pitch is "We're only a little more than twice as slow!"

      It's a very good sales pitch, actually. Unlike AMD, NVidia isn't an alternative to Intel CPUs. Instead it's a complementary technology, which adds additional cost.

      So, I could buy a $500 CPU and a $500 GPU, or I could buy TWO $500 CPUs, and get most of the performance, without having to completely redesign all software to run on a GPU.

      And Intel has at least one good point, in that NVidia's claims are based on pretty naive methods, and the SIMD instructions that have been added to Intel/AMD CPUs in recent years really are the same thing you get with GPU programming, just on a bit smaller scale. And if Intel could double the performance of SIMD instructions on near-future CPUs, you'd really lose the benefits of GPU programming, and the economics would take care of the rest quite simply.

      No matter what, AMD really wins in this one. They're packaging CPU & GPU ever closer. It might just expand the utility of SIMD, or it might introduce a new CPU instruction set for the GPU, like the x87 FPU did before it, or it might be integrated tighter still, to the point that their CPUs might just automatically route appropriate computations to the GPU silicon, and route anything else to the traditional CPU.

      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
    4. Re:Intel says "Buy Nvidia" by Twinbee · · Score: 1

      I think that CPUs are faster with conditional branching and other general purpose computing tasks, so I would sacrifice 2x for that.

      --
      Why OpalCalc is the best Windows calc
    5. Re:Intel says "Buy Nvidia" by sbates · · Score: 1

      Just a helpful tip: the next time you're tempted to add a comma, don't. It'll vastly improve the readability of your otherwise competent writing.

  12. Optimizations Matter by Lord+Byron+II · · Score: 1

    From the article, you can narrow the gap:

    "with careful multithreading, reorganization of memory access patterns, and SIMD optimizations"

    Sometimes, though, I don't want to spend all week making optimizations. I just want my code to run and run fast. Sure, if you optimize the heck out of a section of code, you can always eke out a bit more performance, but if the unoptimized code can run just as fast (on a GPU), why would I bother?

    1. Re:Optimizations Matter by Rockoon · · Score: 3, Informative

      Just to be clear, those same memory reorganizations are required for the GPU. That being specifically the Structure-of-Arrays strategy instead of the Array-of-Structures strategy.

      It's certainly true that most programmers reach for the latter style, but mainly because they aren't planning on using any SIMD.
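
      A minimal Python sketch of the two layouts; the particle fields are made up for illustration. In the Structure-of-Arrays form each field lives in one contiguous array of doubles, which is the access pattern SIMD lanes (and neighbouring GPU threads) want. Python's list of objects isn't laid out contiguously the way a C array of structs is, so this only shows the shape of the data; the memory-access payoff shows up in C, C++ or GPU code.

```python
from array import array
from dataclasses import dataclass


@dataclass
class Particle:
    """Array-of-Structures: each element is a record holding all of its fields."""
    x: float
    y: float
    mass: float


def to_soa(particles):
    """Structure-of-Arrays: one contiguous array of doubles per field."""
    return {
        "x": array("d", (p.x for p in particles)),
        "y": array("d", (p.y for p in particles)),
        "mass": array("d", (p.mass for p in particles)),
    }


def total_mass_aos(particles):
    # Visits every record just to read one field from each
    return sum(p.mass for p in particles)


def total_mass_soa(soa):
    # Reads a single dense array of doubles
    return sum(soa["mass"])


particles = [Particle(0.0, 1.0, 2.0), Particle(3.0, 4.0, 5.0)]
soa = to_soa(particles)
print(list(soa["x"]), total_mass_aos(particles), total_mass_soa(soa))
```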

      --
      "His name was James Damore."
    2. Re:Optimizations Matter by Anonymous Coward · · Score: 0

      Because it can't.
      To run on GPU, the code would have to be written for running on GPU. You can't just recompile some C and call it a day

    3. Re:Optimizations Matter by Junta · · Score: 2

      The difference is that the 'naive' code you write to do things in the simplest manner *can* run on a CPU. For the GPU languages, you *must* make those optimizations. This is not to undercut the value of GPU (as Intel concedes, the gap is large), but it does serve to counteract the dramatic numbers touted by nVidia.

      nVidia compared expertly tuned and optimized performance metrics on their product against stock, generic benchmarks on Intel products.

      --
      XML is like violence. If it doesn't solve the problem, use more.
  13. Re:AMD by Anonymous Coward · · Score: 0

    Wow. Whatever drugs your on

    That'd be "you're" or "you are", not "your".

    HTH. HAND.

  14. Still trying to keep Larrabee going? by Junta · · Score: 4, Insightful

    On top of being highly capable at massively parallel floating point math (the bread and butter of top500 and almost all real-world HPC applications), GPU chips benefit from economies of scale by having a much larger market to sell chips to. If Intel has an HPC-only processor, I don't see it really surviving. There have been numerous HPC-only accelerators that provided huge boosts over CPUs and still flopped. GPUs growing into that capability is the first large-scale phenomenon in HPC with legs.

    --
    XML is like violence. If it doesn't solve the problem, use more.
    1. Re:Still trying to keep Larrabee going? by Anonymous Coward · · Score: 0

      Except that Larrabee is x86 compatible.

      Did those past accelerators allow people to make their software faster with barely any (if any at all) rewriting?

    2. Re:Still trying to keep Larrabee going? by Amanieu · · Score: 1

      The software will need rewriting to take advantage of Larrabee anyways.

  15. Who cares anymore? by stewbacca · · Score: 1

    Does anyone under the age of 25 really care anymore about processor speed and video card "features"?

    I only ask because 15 years ago I cared greatly about this stuff. However, I'm not sure if that is a product of my immaturity at that time, or the blossoming industry in general.

    Nowadays it's all pretty much the same to me. Convenience (as in, there it is sitting on the shelf for a decent price) is more important these days.

    1. Re:Who cares anymore? by Overzeetop · · Score: 3, Insightful

      Two things: you've been conditioned to accept gaming graphics of yesteryear, and your need for more complex game play now trumps pure visuals. You can drop in a $100 video card, set the quality to give you excellent frame rates, and it looks fucking awesome because you remember playing Doom. Also, once you get to a certain point, the eye candy takes a backseat to game play and story - the basic cards hit that point pretty easily now.

      Back when we used to game, you needed just about every cycle you could get to make basic gameplay what would now be considered "primitive". Middling level detail is great, in my opinion. Going up levels to the maximum detail really adds very little. I won't argue that it's cool to see that last bit of realism, but it's not worth doubling the cost of a computer to get it.

      --
      Is it just my observation, or are there way too many stupid people in the world?
    2. Re:Who cares anymore? by Rockoon · · Score: 3, Informative

      Well, as far as GPUs and gaming go, there are two segments of the population: those with "low resolution" rigs such as 1280x1024 (most common group according to Steam), and those with "high resolution" rigs such as 1920x1200.

      An $80 video card enables high/ultra settings at 60+ FPS on nearly all games for the "low resolution" group, but not the "high resolution" group.
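
      Much of the gap between the two groups is pixel count, since per-frame shading work grows with the number of pixels drawn. Quick arithmetic; treating frame rate as inversely proportional to pixel count is a simplifying assumption:

```python
def pixels(width, height):
    return width * height


low = pixels(1280, 1024)    # 1,310,720 pixels
high = pixels(1920, 1200)   # 2,304,000 pixels
ratio = high / low          # about 1.76x the pixels per frame

# If frame rate scaled inversely with pixel count (a simplification):
fps_low = 60
fps_high = fps_low / ratio  # about 34 FPS
print(f"{ratio:.2f}x pixels, ~{fps_high:.0f} FPS at 1920x1200")
```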

      --
      "His name was James Damore."
    3. Re:Who cares anymore? by bat21 · · Score: 1

      I do, because I enjoy playing my games at 1920x1080 on high graphics settings with a decent frame rate. I think playing at low res on a crappy monitor degrades enjoyment of the game. I have a few friends that don't mind playing at 1024x768, and a couple more that still try to play the latest games with onboard video ("I can run Crysis if I set everything to low right?"), but that's more because they're cheap than because they don't care about high performance. That isn't to say I would go out and buy a $500 graphics card to get a few more fps, even a $200 card is pushing it.

    4. Re:Who cares anymore? by stewbacca · · Score: 1

      So you ARE under the age of 25! Joking aside, I find the $50-$75 3d cards to work just fine for new 3d games. This has been an adequate price-to-performance point for me since about 2003.

  16. Re:AMD by Rockoon · · Score: 4, Informative

    ..they have products in both segments.

    ..and for the record, AMD is still ruling the very high-end multi-CPU (aka server) benchmarks, and of course, we all know that their GPUs are top notch.

    AMD just isnt doing well in the high end consumer-grade space, but then again the chips that Intel is ruling with in that segment are priced well above consumer budgets.

    --
    "His name was James Damore."
  17. GPUs not top notch across the board... by Junta · · Score: 1

    Evergreen had a *huge* lead over pre-Fermi nVidia chips, and still leads in 32-bit precision (and by extension most of what the mass market cares about), but 64-bit precision lags Fermi. Of course, Evergreen beat Fermi to market by a large large margin.

    --
    XML is like violence. If it doesn't solve the problem, use more.
    1. Re:GPUs not top notch across the board... by Anonymous Coward · · Score: 0

      64-bit only marginally lags Fermi. It's pretty close.

    2. Re:GPUs not top notch across the board... by Anonymous Coward · · Score: 0

      Consumer-grade Fermi chips run double precision at only 1/8 of their single-precision rate.
      http://www.geeks3d.com/20100329/geforce-gtx-480-and-gtx-470-have-limited-double-precision-performance/
      Evergreen chips run DP at 1/5 of the SP rate, and Tesla chips run DP at 1/2 the SP rate.
      So that gives you a Fermi-based Tesla at 672 GFLOPS DP, the HD5870 at 544 GFLOPS DP, and the GTX480 at 168 GFLOPS DP.

      So among the consumer-grade chips, ATI has almost 3.25x the DP throughput, based on the geeks3d article.

      I couldn't find any information on dual-chip Teslas, so that crowns the HD5970 king of DP performance at 1088 GFLOPS, for a third of the price of the Tesla chip.
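      These DP figures follow from peak single-precision throughput multiplied by the DP:SP ratio. The sketch assumes the commonly quoted shader counts and clocks, which are not in the post: the GTX 480 has 480 cores at 1401 MHz, and the HD 5870 has 1600 stream processors at 850 MHz. It counts a fused multiply-add as two FLOPs. The Tesla line applies the 1/2 ratio to GTX 480-class SP throughput, which is what the 672 figure appears to do; shipping Tesla boards used different core counts and clocks. The HD 5970 line doubles the HD 5870, matching the 1088 figure.

```python
def peak_sp_gflops(alus: int, clock_mhz: float, flops_per_cycle: int = 2) -> float:
    """Peak SP GFLOPS: ALUs x clock x FLOPs per ALU per cycle (an FMA/MAD counts as 2)."""
    return alus * clock_mhz * flops_per_cycle / 1000.0

# Assumed spec-sheet figures (not taken from the post).
gtx480_sp = peak_sp_gflops(480, 1401)   # ~1345 GFLOPS
hd5870_sp = peak_sp_gflops(1600, 850)   # 2720 GFLOPS

# Apply the DP:SP ratios quoted above.
dp = {
    "gtx480": gtx480_sp / 8,      # consumer Fermi: DP = SP/8
    "hd5870": hd5870_sp / 5,      # Evergreen: DP = SP/5
    "tesla": gtx480_sp / 2,       # Fermi Tesla: DP = SP/2 (GTX 480 clocks assumed)
    "hd5970": 2 * hd5870_sp / 5,  # treated as two HD 5870 chips
}

for name, gflops in dp.items():
    print(f"{name}: {gflops:.0f} GFLOPS DP")

print(f"HD 5870 vs GTX 480 DP: {dp['hd5870'] / dp['gtx480']:.2f}x")
```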

    3. Re:GPUs not top notch across the board... by mjwx · · Score: 1

      Evergreen had a *huge* lead over pre-Fermi nVidia chips, and still leads in 32-bit precision (and by extension most of what the mass market cares about), but 64-bit precision lags Fermi. Of course, Evergreen beat Fermi to market by a large large margin.

      It alternates between the two. First Nvidia is in the lead, then ATI (AMD) takes a step forward and ATI is in the lead. Then Nvidia takes a step forward and regains the top spot. And so on. This is the best kind of competition you can hope for in a duopoly. Realistically, with either ATI's or Nvidia's top-of-the-line parts, the bottleneck will appear somewhere else in your system (the same goes for high-end Intel and AMD CPUs). I just ordered a 10K RPM disk for my gaming rig because my bottleneck was the ancient 320 GB drive I was using.

      --
      Calling someone a "hater" only means you can not rationally rebut their argument.
  18. Re:AMD by Junta · · Score: 4, Insightful

    AMD is the most advantaged on this front...

    Intel and nVidia are stuck in the mode of realistically needing one another and simultaneously downplaying the other's contribution.

    AMD can use what's best for the task at hand/accurately portray the relative importance of their CPUs/GPUs without undermining their marketing message.

    --
    XML is like violence. If it doesn't solve the problem, use more.
  19. Except that.. by Junta · · Score: 1

    Magny-Cours is currently showing significant performance advantage over Intel's offerings while at the same time AMD's Evergreen *mostly* shows performance advantages over nVidia's Fermi despite making it to market ahead of Fermi.

    AMD is currently providing the best tech on the market. This will likely change, but at the moment, things look good for them.

    --
    XML is like violence. If it doesn't solve the problem, use more.
    1. Re:Except that.. by Entropius · · Score: 1

      I just got back from a lattice QCD conference, and there were lots of talks on GPGPU. Everybody's falling over each other trying to implement their code on GPUs because of the performance gains.

      *Every* talk mentioned Nvidia cards -- Geforce GTX nnn's, Tesla boards, Fermi boards. Nobody talked about AMD at all.

      Maybe AMD does have an advantage, but nobody's using it.

    2. Re:Except that.. by hvdh · · Score: 1

      Interestingly, most scientific papers reporting large speed gains (factors of 2 to 10) from moving CPU computation to the GPU compare a hand-optimized GPU implementation to a plain single-threaded, non-SSE CPU implementation.

      In my experience, SSE intrinsics give a 2 to 8x speed-up over good generic code, and multi-threading improves things further until you hit the RAM bandwidth wall.
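      The baseline point can be made concrete. A GPU speedup measured against scalar, single-threaded code overstates the gain over a tuned CPU version by whatever SIMD and threading factors the baseline left unused. The numbers in this sketch are hypothetical and are picked from within the ranges quoted above.

```python
def speedup_vs_tuned_cpu(reported: float, simd_gain: float, thread_gain: float) -> float:
    """Rescale a GPU speedup quoted against naive CPU code to one against tuned CPU code.

    reported:    GPU advantage over single-threaded, non-SSE code
    simd_gain:   speedup SSE intrinsics would give over that naive code
    thread_gain: extra speedup from multi-threading (bounded by memory bandwidth)
    """
    return reported / (simd_gain * thread_gain)

# Hypothetical: a paper reports 10x, SSE would have given 4x, and two threads 2x more.
print(speedup_vs_tuned_cpu(10.0, 4.0, 2.0))
```

      With these illustrative factors, a headline "10x" shrinks to 1.25x against a tuned CPU version.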

    3. Re:Except that.. by raftpeople · · Score: 1

      For those problems that map well to the GPU model of processing, the gains can be enormous (I have ported code to NVIDIA). However, some of my code works better on the CPU and some of it really needs a middle ground of many traditional cores with good branching support, etc. and not as many streaming cores all doing the same thing.

    4. Re:Except that.. by Anonymous Coward · · Score: 0

      nVidia bet the farm on GPGPU. The Fermi is nothing but 512 stream processing units (mini dumb CPUs).

      Maybe AMD does have an advantage, but nobody's using it.

      Whilst AMD's hardware isn't as emphasised on GPGPU stuff, they do have the better video cards. (nVidia's GTX 4xx series has horrible power usage and is universally slower than AMD's equivalent price-point cards on any and all 3D graphics tasks).

    5. Re:Except that.. by Ken_g6 · · Score: 1

      *Every* talk mentioned Nvidia cards -- Geforce GTX nnn's, Tesla boards, Fermi boards. Nobody talked about AMD at all.

      Maybe AMD does have an advantage, but nobody's using it.

      That's because nVIDIA has excellent support and documentation for its CUDA GPGPU system, on both Windows and Linux. They even have an emulator so people without an nVIDIA GPU can develop for one. (Although it's now deprecated.)

      On the other hand, AMD has CAL, Stream, and OpenCL; and I can't even figure out which one I'm supposed to use to support all GPGPU-capable AMD cards. OpenCL has some documentation; I can't find anything good on CAL, and I can't find any way to develop for the platform on Linux without the hardware.

      That's why I've written a working CUDA app but nothing for AMD.

      --
      (T>t && O(n)--) == sqrt(666)
    6. Re:Except that.. by Anonymous Coward · · Score: 0

      The reason is that AMD has done very little work on optimising for GPGPU compared to NVIDIA. Their cards are great for graphics performance, but they get creamed by NVIDIA's cards in compute tasks.
      Case in point from Anand's gtx 480 review: http://www.anandtech.com/show/2977/nvidia-s-geforce-gtx-480-and-gtx-470-6-months-late-was-it-worth-the-wait-/6
      TLDR: A GTX 285 outperforms a 5870 by a significant margin and the 480 destroys both.
      I've also noticed this performance difference in my own experience with OpenCL performance on AMD cards.

    7. Re:Except that.. by Anonymous Coward · · Score: 0

      Well, thus far NVIDIA has been far more committed to the whole idea of GPGPU than AMD has.

      And it's not just that the programming stack for NVIDIA GPUs is far more mature than ATI's; the same seems to be true of the silicon.

      In OpenCL code, the performance difference between a GT200 card and an ATI 4870 was heavily skewed in favor of the NVIDIA part, even though on paper the ATI chip has far higher aggregate peak FP performance across its shaders.

      And we were able to reproduce the same results across OSX and Linux, so I am not surprised people are ignoring ATI when it comes to GPGPU codes.

      It's interesting because NVIDIA may be dead in the water if things like Fusion catch on. But then ATI depends on GPGPU for its future, and they still have a lot of catching up to do with NVIDIA. And to make things even weirder, given how half-assed Larrabee ended up, Intel does not seem able to execute for shit when it comes to large on-chip multiprocessors.

  20. Re:AMD by Anonymous Coward · · Score: 0

    That would be "you are", not "your are".

  21. Take me out back and shoot me by WillyWanker · · Score: 1

    The day I build a computer with an Nvidia graphics processor as a CPU is when it's time to call 911, cause I will have completely lost my mind.

  22. Oh for cryin' out loud by werewolf1031 · · Score: 4, Insightful

    Just kiss and make up already. Intel and nVidia have but one choice: to join forces and try collectively to compete against AMD/ATI. Anything less, and they're cutting their nose off to spite their respective faces.

  23. Big Deal, A Barrel... by jedidiah · · Score: 3, Insightful

    Yeah, speciality silicon for a small subset of problems will stomp all over a general purpose CPU. No big news there.

    Why is Intel even bothering to whine about this stuff? They sound like a bunch of babies trying to argue that the sky isn't blue.

    This makes Intel look truly sad. It's completely unnecessary.

    --
    A Pirate and a Puritan look the same on a balance sheet.
  24. Re:AMD by Anonymous Coward · · Score: 0

    This troll reminds me of that Dave Chappelle skit about the racist black guy.

  25. Re:AMD by Anonymous Coward · · Score: 0

    Truth is, blacks and hispanics are far more racist than whites.

  26. elementary matrix ops by Anonymous Coward · · Score: 0

    I wonder if matrix inversion could be done with an ASIC, with a massive performance improvement over typical CPUs? I'm thinking of hardware designed to natively represent very large (sparse?) matrices efficiently and perform elementary matrix ops on them.

    Is this possible? Can you think of a way of implementing this in terms of actual transistor logic?
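    In software, inversion by elementary row operations is Gauss-Jordan elimination. You augment the matrix with the identity, then swap, scale and add rows until the left half becomes the identity. The sketch below is dense and uses exact fractions. A hardware design would use fixed or floating point. A sparse version would also need a fill-reducing ordering, which this ignores.

```python
from fractions import Fraction

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination (elementary row ops only)."""
    n = len(matrix)
    # Build the augmented matrix [A | I].
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Row swap: bring a row with a nonzero pivot into position.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Row scaling: make the pivot equal to 1.
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Row addition: clear this column in every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    # The left half is now I, so the right half is the inverse.
    return [row[n:] for row in aug]

print(invert([[2, 1], [1, 1]]))
```

    Each step touches whole rows independently. That independence is what a parallel or hardware implementation would try to exploit.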

  27. Re:Big Deal, A Barrel... by chriso11 · · Score: 2, Insightful

    The reason Intel is whining is the market for large number-crunching systems and high-end workstations. Rather than Intel selling thousands of chips for the former, Nvidia (and to a lesser extent AMD) gets to sell hundreds of GPU chips. And for workstations, Intel sells only one chip instead of two to four.

    --
    No, I don't trust in god. He'll have to pay up front, like everybody else.
  28. Larrabee is back? by Gri3v3r · · Score: 1

    I remember reading here on /. that it got abandoned by Intel.

  29. Re:AMD by Anonymous Coward · · Score: 0

    Wow. All that and no mention of Eric S. Raymond's bisexual wife and their girlfriend? Or is that too close to a heterosexual male fantasy for the OP to imagine?

  30. Patent holding company by Anonymous Coward · · Score: 0

    So what you're saying is nVidia will become a patent holding company and probably make just as much money as they're making now.

  31. Re:AMD by Joce640k · · Score: 2, Insightful

    I don't think AMD really cares about competing with top-end Intel processors. It takes a lot of R&D investment for very little return (it's a tiny market segment).

    In the low/mid range AMD rules the roost in terms of value for money.

    --
    No sig today...
  32. Re:AMD by Anonymous Coward · · Score: 0

    There's something very misleading about this. I don't see any Socket 1567 CPUs listed, and the highest-listed Intel quad-socket chip is not listed on Intel's web site (http://ark.intel.com/ProductCollection.aspx?series=36934); if it were, it would be a 6-core/6-thread part. I don't know for a fact that the Intel 7560 (8c/16t @ 2.26GHz; http://ark.intel.com/ProductCollection.aspx?series=46487) would be faster, but 64 threads seems >> 24 to me! It should at least be listed.

  33. Larrabee Marketing == Direct-to-DVD? by cmholm · · Score: 1

    Intel decided to bail on marketing an in-house high performance GPU. But, they'd still like a return on their Larrabee investment. I don't doubt they would have been pushing the HPC mode anyway, but now, that's all they've got. Unfortunately for Intel, they've got to sell Larrabee performance based on in-house testing, while there are now a number of CUDA-based applications, and HPC-related conferences and papers are now replete with performance data.

    To Intel's and AMD/ATI's advantage, NVIDIA has signed on with the OpenCL effort, so as the first two start getting drivers out, they can give the latter a run for its HPC-GPU money. At the moment, though, it's all talk.

    --
    Luke, help me take this mask off ... Just for once, let me butterfly kiss you with my own eyes.
  34. Re:AMD by cynyr · · Score: 1

    Look at the X6 BE chip: 6 cores, and better performance than anything Intel has at the same price. It doesn't compete with Intel's 12-"core" parts in 12-thread applications, but apart from video encodes (even that's iffy), as a home user you'll be hard-pressed to find a 12-thread app that doesn't end up I/O bound.

    --
    All of the above was encrypted with a Quad ROT-13 method. Unauthorized decryption is in violation of the DMCA.
  35. Re:AMD by Rockoon · · Score: 1

    Note that one of AMD's 12-core Opterons is cheaper than Intel's top-of-the-line "consumer grade" 4-core i7 Extreme, and wouldn't THAT kick the snot out of any i7 in I/O?

    --
    "His name was James Damore."
  36. Yes, great sales pitch by raftpeople · · Score: 1

    Don't get me wrong, I like what Intel is doing, but c'mon, you are understating this:

    and the SIMD instructions that have been added to Intel/AMD CPUs in recent years really are the same thing you get with GPU programming, just on a bit smaller scale.

    It's an order of magnitude different (and I know from experience coding CPU and GPU)
    i7 960 - 4 cores 4 way SIMD
    GT285 (not 280) - 30 cores 32 way SIMD

    SP GFLOPS
    i7 960 - 102
    GT285 - 1080
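    Multiplying out the figures above shows where the order of magnitude comes from. For the i7 960 line, the sketch assumes a 3.2 GHz clock and 8 single-precision FLOPs per core per cycle (one 4-wide SSE add plus one 4-wide SSE multiply). The GT285 figure is taken as quoted rather than derived.

```python
# Parallel lanes as described in the post: cores x SIMD width.
i7_lanes = 4 * 4       # i7 960: 4 cores, 4-way SIMD
gt285_lanes = 30 * 32  # GT285: 30 cores, 32-way SIMD

# Assumed: 4 cores x 3.2 GHz x 8 SP FLOPs per core per cycle.
i7_peak = 4 * 3.2 * 8  # GFLOPS
gt285_peak = 1080      # GFLOPS, as quoted in the post

print(f"lanes: {i7_lanes} vs {gt285_lanes} ({gt285_lanes // i7_lanes}x)")
print(f"peak SP: {i7_peak:.1f} vs {gt285_peak} GFLOPS ({gt285_peak / i7_peak:.1f}x)")
```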

    No matter what, AMD really wins in this one.

    AMD has the potential to win, but is currently in last place. Intel is aggressively solving all of the problems that previously gave AMD an advantage, and NVIDIA has aggressively put in place the things HPC wants (e.g. it is easy to code in C for the platform, and I've done it; they have also added ECC, caching, etc.).

  37. Sorry Intel Nvidia Wins by Bruha · · Score: 1

    Using Badaboom, a CUDA app, you can rip DVD copies down to your iPod in minutes, not hours.

    Unfortunately Badaboom are idiots and are taking their sweet time porting to the 465/470/480 cards.

    I'd love to see a processor fast enough to beat a GPU at tasks such as these. As for CD-to-MP3 conversions on CUDA, it's like moving from a hard drive to a fast SSD.

  38. That's the big draw of the Teslas by Sycraft-fu · · Score: 2, Informative

    I mean, when you get down to it, they seem really overpriced. No video output, and their processor isn't any faster, so what's the big deal? The big deal is that 4x the RAM can really speed shit up.

    Unfortunately there are very hard limits to how much RAM they can put on a card. This is both because of the memory controllers, and because of electrical considerations. So you aren't going to see a 128GB GPU or the like any time soon.

    Most of our researchers who do that kind of thing use only Teslas because they need more RAM. As you said, the transfer is the limiting factor. More RAM means you have to shuffle data back and forth less often.
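    Here is why the extra memory matters. If the working set fits on the card, it is uploaded once and stays resident. If it doesn't fit, every pass has to stream it through again. This is a simplified model with hypothetical sizes: a 3 GB working set, 100 passes, and a 1 GB card compared with a 4 GB board.

```python
def gb_transferred(dataset_gb: float, card_gb: float, passes: int) -> float:
    """Host-to-card traffic for `passes` sweeps over a data set (simplified model).

    If the data fits on the card it is copied once and stays resident;
    otherwise each pass streams all of it through in card-sized chunks.
    """
    if dataset_gb <= card_gb:
        return dataset_gb
    return dataset_gb * passes

# Hypothetical sizes, chosen only to illustrate the effect.
for card in (1, 4):
    print(f"{card} GB card: {gb_transferred(3, card, 100):.0f} GB over the bus")
```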

    1. Re:That's the big draw of the Teslas by JanneM · · Score: 1

      The problem is when you have a larger system, with hundreds of cores, and an iterative simulation. You run the system for a cycle, propagate data, then run for another cycle, and so on. In that case you can't isolate a long-running process on the card, and you end up having to squeeze data through that bus every cycle anyway. It is likely still well worth using GPUs, but you do need to take a good look at whether adding GPUs is more or less effective than simply spending your funds on more cores.
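      The trade-off can be written as a per-cycle cost model. The GPU pays off only if its compute saving is larger than the time spent moving state across the bus every cycle. All figures below are hypothetical, and the model assumes copies do not overlap with compute.

```python
def gpu_wins(cpu_compute_s: float, gpu_speedup: float,
             bytes_per_cycle: float, bus_bytes_per_s: float) -> bool:
    """True if one simulation cycle is faster on the GPU, counting the bus round trip.

    Assumes the state goes to the card and back every cycle and transfers
    are not overlapped with computation.
    """
    gpu_time = cpu_compute_s / gpu_speedup + 2 * bytes_per_cycle / bus_bytes_per_s
    return gpu_time < cpu_compute_s

# Hypothetical: 10 ms of CPU work per cycle, a 20x faster kernel, ~5 GB/s effective bus.
print(gpu_wins(0.010, 20, 8e6, 5e9))   # 8 MB of state each way per cycle
print(gpu_wins(0.010, 20, 40e6, 5e9))  # 40 MB of state each way per cycle
```

      In this toy setup, the same 20x kernel turns from a win into a loss once the per-cycle state grows large enough. That is the point at which more CPU cores may be the better buy.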

      I expect over time to see better suited interfaces appear for this type of computing.

      --
      Trust the Computer. The Computer is your friend.
  39. Overall I think you are right by Sycraft-fu · · Score: 1

    But I think the timescale will be a very long one.

    I mean, ideally we want only the CPU in a computer. The whole idea of a computer is that it does everything, rather than having dedicated devices. Ideally that means it does everything purely in software, and the CPU is all it needs. For everything else we seem to have reached that point, but graphics are still too intense; they still need a dedicated DSP.

    However, we'll keep needing that dedicated hardware until the CPU can do photorealistic graphics in real time. That is a long way off yet. Even GPUs can't do that. Once GPUs can, the trick will be scaling that capability down into a realistic subset of the CPU, rather than a dedicated unit. You can't very well scale CPUs up to massive sizes and power consumption.

    So I've no doubt it'll happen, but I think not for 20+ years.

  40. Re:AMD by Anonymous Coward · · Score: 0

    AMD is the most advantaged on this front...

    Intel and nVidia are stuck in the mode of realistically needing one another and simultaneously downplaying the other's contribution.

    Exactly, and this manifested in Intel's new Pinetrail platform, to the consumer's detriment. Intel refused to grant NVidia the license to connect its ION chipset via DMI, so people planning on using Pinetrail in HTPCs were saddled with Intel's own chipset and its crappy graphics performance (No Native Hardware H.264 Decoding: Long Live Ion).

  41. Re:AMD by Anonymous Coward · · Score: 0

    How is AMD "ruling" the high end multi-CPU benchmarks??

    1) [Quad CPU] AMD Opteron 6168 = 23,784
    2) [8-Way] Six-Core AMD Opteron 8435 = 22,745
    3) [Dual CPU] Intel Xeon X5680 @ 3.33GHz = 17,377

    Let me translate for you:

    1) 48 core AMD box = 23,784
    2) 48 core AMD box = 22,745
    3) 12 core Intel box = 17,377

    Somewhat less impressive, eh?
    They aren't showing scores for Intel's 8-socket Beckton systems, which have 64 cores, or its 4-socket systems, which have 32 cores.
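    The per-core comparison can be made explicit by dividing each quoted score by the core count given in the translation above.

```python
# Benchmark scores and core counts as quoted in the post.
systems = {
    "4x AMD Opteron 6168": (23784, 48),
    "8x Six-Core AMD Opteron 8435": (22745, 48),
    "2x Intel Xeon X5680": (17377, 12),
}

per_core = {name: score / cores for name, (score, cores) in systems.items()}

for name, value in per_core.items():
    print(f"{name}: {value:.0f} per core")
```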

    AMD is getting completely spanked up and down the CPU spectrum. They need a new core - maybe Bulldozer will be great.

  42. Re:AMD by exomondo · · Score: 1

    Look at the X6 BE chip: 6 cores, and better performance than anything Intel has at the same price. It doesn't compete with Intel's 12-"core" parts in 12-thread applications, but apart from video encodes (even that's iffy), as a home user you'll be hard-pressed to find a 12-thread app that doesn't end up I/O bound.

    Apart from video encodes, you'll be hard-pressed to find a 12-thread app as a home user (as in one that actually thrashes 12 threads at once).

  43. Re:AMD by Calinous · · Score: 1

    AMD really cares about competing with top-end Intel processors - as when the Athlon ruled the roost, AMD sold its chips at a premium. Now (since the Core2Duo launched), with Intel in top spot, AMD is selling its processors cheaper, so it's losing possible profit.

  44. The Solution? by Vasheron · · Score: 1

    From Wikipedia, "OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors." In other words, write your massively parallel programs using OpenCL and then run them on the device (or combination of devices) that executes your program the fastest.

    Hopefully, OpenCL will have the same catalyzing effect on HPC that OpenGL had on computer graphics, but time will tell.

    Word of warning to Intel: Almost nobody wants to hand-code assembly to run your SIMD instructions. People doing HPC (at least the ones using CUDA) are scientists and engineers who typically have better things to worry about than reading through detailed tomes on the i7 architecture. Make it more convenient (i.e. via OpenCL) or continue to lose market share in this area.

  45. Re:AMD by Anonymous Coward · · Score: 0

    Given that Intel's current Nehalem interconnect and chipsets perform far better than anything AMD has to offer, I wonder how AMD manages to kick the snot out of the i7 in I/O with a worse-performing interconnect.