Slashdot Mirror


Five Nvidia CUDA-Enabled Apps Tested

crazipper writes "Much fuss has been made about Nvidia's CUDA technology and its general-purpose computing potential. Now, in 2009, a steady stream of launches from third-party software developers sees CUDA gaining traction in the mainstream. Tom's Hardware takes five of the most interesting desktop apps with CUDA support and compares the speed-up yielded by a pair of mainstream GPUs versus a CPU alone. Not surprisingly, depending on the workload you throw at your GPU, you'll see results ranging from average to downright impressive."

134 comments

  1. Nice, but... by mikiN · · Score: 2, Funny

    post.push("First!");

    All fine and dandy, but...does it run Linux?

    --
    The Hacker's Guide To The Kernel: Don't panic()!
    1. Re:Nice, but... by slummy · · Score: 4, Informative

      CUDA is a framework that will work on Windows and Linux.

    2. Re:Nice, but... by gustgr · · Score: 4, Informative

      I know you are trolling, but actually CUDA applications work better on Linux than on Windows. If you run a CUDA kernel on Windows that lasts longer than 5~6 seconds, your system will hang. The same will happen on Linux but then you can just disable the X server or have one card providing your graphical display and another one as your parallel co-processor.

    3. Re:Nice, but... by mikiN · · Score: 4, Funny

      Well, everywhere else in the world, Linux runs the CUDA Toolkit, so I can imagine that in Soviet Russia, a Beowulf cluster of Nvidia cards run Linux.

      --
      The Hacker's Guide To The Kernel: Don't panic()!
    4. Re:Nice, but... by mikiN · · Score: 2, Informative

      Queue mip-mapped, 8xAA, subpixel rendered, fogged, PhysX enhanced flyby of a 'Whoosh' passing over your head.

      The question was not whether CUDA runs _on_ Linux, but whether the GPU itself can run Linux.

      I can imagine that, if we had ever been given all the specs, a multi-function DSP card like IBM's Mwave could. It would probably even be able to read aloud console messages (besides being a graphics card and modem, it's also a sound card).

      --
      The Hacker's Guide To The Kernel: Don't panic()!
    5. Re:Nice, but... by Anpheus · · Score: 0

      Are you certain this is the case?

      I'm curious because ATI/AMD appear to have solved that problem, in that I can run the Folding@Home GPU client and my displays still run. I'm running Windows 7 with Aero, so it's hitting the GPU not the CPU for my displays.

    6. Re:Nice, but... by 3.1415926535 · · Score: 4, Informative

      Folding@Home runs its computations in short bursts. gustgr is talking about a single computation kernel that takes more than 5-6 seconds.

    7. Re:Nice, but... by Jah-Wren+Ryel · · Score: 3, Informative

      He's not talking about how long the app itself runs, but how long each subroutine that runs on the GPU runs before returning something back to the app on the CPU side. If that subroutine takes too long to complete, Windows gets unhappy. I don't remember if it was a watchdog timer thing or a bus-locking thing or something else. I don't even know if it's been fixed or not.

      --
      When information is power, privacy is freedom.
    8. Re:Nice, but... by Jah-Wren+Ryel · · Score: 5, Insightful

      Does it matter? Linux is not anywhere close to the target market,

      Linux support for CUDA matters hugely; Linux boxes are head and shoulders above any other market for CUDA-based software. That's because Linux is the OS for supercomputing nowadays, and CUDA's biggest niche is the exact same kind of number crunching that is typically associated with supercomputer workloads.

      In fact, these GPUs are yet another example of how there is nothing new under the sun. A GPU is very much like the vector processor of Cray-style supercomputing (when Cray was still alive that is) aka SIMD (single instruction, multiple data).

      --
      When information is power, privacy is freedom.
    9. Re:Nice, but... by David+Greene · · Score: 1

      Uhh...Cray is still very much alive. And doing vectors. And threads. And multicore. All long before Intel/AMD.

    10. Re:Nice, but... by bigstrat2003 · · Score: 2, Informative

      I know you are trolling...

      No, he's joking. Stop crying troll when there's not even a hint of troll, for God's sake.

      ...but actually CUDA applications work better on Linux than on Windows.

      Read carefully. He said "does it run Linux?", not "does it run on Linux?". Overused slashdot meme it might be, but the joke still went miles above your head.

      --
      "16MB (fuck off, MiB fascists)" - The Mighty Buzzard
    11. Re:Nice, but... by fuzzyfuzzyfungus · · Score: 3, Insightful

      If anything, NVIDIA is likely far more interested in CUDA working on Linux than in OpenGL working on Linux (something that they obviously do have some interest in).

      Gamers, certainly, most likely have Windows systems. Workstation applications are likely a good chunk of Windows, with a slice of Mac, and some Linux.

      Bulk crunching, though, which is where CUDA might make NVIDIA some real money, is overwhelmingly Linux based. Linux is, by a substantial margin, the obvious choice for big commodity clusters.

    12. Re:Nice, but... by Anpheus · · Score: 1

      Thanks for the clarification.

    13. Re:Nice, but... by Jah-Wren+Ryel · · Score: 4, Informative

      Uhh...Cray is still very much alive. And doing vectors. And threads. And multicore. All long before Intel/AMD.

      Seymour Cray was killed by a speeding redneck in a trans-am in 1996.

      The company currently known as Cray was formerly known as Tera, which bought the assets of Cray Research from SGI, who acquired Cray Research after Seymour had left to form Cray Computer, which is also defunct.

      Seymour was never significantly involved in multi-core or multi-threaded processors or NUMA. In fact, he specifically avoided designs even hinting of that sort of complexity because he felt that simplicity in design made it easier to fully utilize the maximum performance of the hardware.

      --
      When information is power, privacy is freedom.
    14. Re:Nice, but... by Anpheus · · Score: 1

      Thanks for the clarification, as well.

    15. Re:Nice, but... by jgtg32a · · Score: 1

      I'll give him the benefit of the doubt on "does it run Linux" and "does it run on Linux". I read it the same way and didn't notice it until I saw your comment.

    16. Re:Nice, but... by umeboshi · · Score: 1, Funny

      Seymour Cray was killed by a speeding redneck in a trans-am in 1996.

      Well, at least it wasn't a speeding redneck in a 'cuda. ;)

    17. Re:Nice, but... by Dragonslicer · · Score: 1

      Queue mip-mapped, 8xAA, subpixel rendered, fogged, PhysX enhanced flyby of a 'Whoosh' passing over your head.

      What, this thing runs on AA batteries? Sweet.

      And as a side note, unless you were talking about a long line of whooshes, the word you were looking for is "cue".

    18. Re:Nice, but... by Anonymous Coward · · Score: 0

      Queue mip-mapped, 8xAA, subpixel rendered, fogged, PhysX enhanced flyby of a 'Whoosh' passing over your head.

      What, this thing runs on AA batteries? Sweet.

      And as a side note, unless you were talking about a long line of whooshes, the word you were looking for is "cue".

      AA = Antialiasing.

      Wait, I sense that thing is flying towards me now...

    19. Re:Nice, but... by parlancex · · Score: 5, Interesting

      In fact, these GPUs are yet another example of how there is nothing new under the sun. A GPU is very much like the vector processor of Cray-style supercomputing (when Cray was still alive that is) aka SIMD (single instruction, multiple data).

      Actually, not quite. The execution architecture in Nvidia's G80-series GPUs and onwards is actually SIMT: single instruction, multiple threads. The not-so-subtle difference is that in a SIMD vector architecture the application explicitly manages instruction-level divergence, which will generally narrow the SIMD width of divergent paths to only one path. In a SIMT architecture, when threads diverge within a warp, all threads executing the same branch within that warp can be issued an instruction simultaneously, while the threads in that warp that are not on that branch sit inactive for that cycle. This is transparent to the application. In Nvidia's latest architecture the warp size is still statically set at 32 threads, so when threads within a warp diverge you'll see a performance penalty proportional to the number of unique paths taken. Interestingly, the next iteration of the hardware is rumored to feature a thread scheduler capable of variable warp sizes, probably still with some lower bound. That would bring the GPU much closer to the ideal "array of independently executing processing cores" that we have in modern CPUs, but obviously with far more cores.
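
      The divergence penalty described above can be modelled in a few lines. This is a plain-Python sketch, not CUDA code and not Nvidia's scheduler: it assumes the cost of a divergent region is simply one issue pass per distinct path per warp, and the name `warp_issue_passes` is made up for illustration.

```python
WARP_SIZE = 32  # warp width stated in the comment above


def warp_issue_passes(path_of_thread, warp_size=WARP_SIZE):
    """Return the modelled number of issue passes for one divergent region.

    path_of_thread[i] is a label for the branch path that thread i takes.
    Assumption of this sketch: each warp needs one pass per distinct path
    taken by the threads inside it.
    """
    total = 0
    for start in range(0, len(path_of_thread), warp_size):
        warp = path_of_thread[start:start + warp_size]
        total += len(set(warp))  # distinct paths inside this warp
    return total


# 64 threads (two warps), all on the same path.
uniform = ["a"] * 64
# 64 threads alternating between two paths: both warps diverge two ways.
alternating = ["a" if i % 2 == 0 else "b" for i in range(64)]
# The same two paths, grouped so that each warp is internally uniform.
grouped = ["a"] * 32 + ["b"] * 32

print(warp_issue_passes(uniform))      # 2
print(warp_issue_passes(alternating))  # 4
print(warp_issue_passes(grouped))      # 2
```

      In this model the alternating layout costs twice as much as the grouped one even though the same threads take the same paths, which is why arranging work so that a warp stays on one path matters.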

    20. Re:Nice, but... by David+Greene · · Score: 0

      Seymour Cray was killed by a speeding redneck in a trans-am in 1996.

      So? Cray != Seymour. In fact the most successful Cray machines were not designed by Seymour.

      The company currently known as Cray was formerly known as Tera, which bought the assets of Cray Research from SGI, who acquired Cray Research after Seymour had left to form Cray Computer, which is also defunct.

      So? Many of the engineers there have been there for a long time. Even if they've been bounced around between companies, it's a good number of the same people. And who's to say that SGI and Tera didn't provide some good brainpower to the current Cray, Inc.? No one has a monopoly on good design.

      Seymour was never significantly involved in multi-core or multi-threaded processors or NUMA. In fact, he specifically avoided designs even hinting of that sort of complexity because he felt that simplicity in design made it easier to fully utilize the maximum performance of the hardware.

      So? Seymour was wrong. It worked in the early days of CDC and Cray Research but it doesn't work any more. The microprocessor vendors made sure of that. Honestly, the man wasn't a god.

    21. Re:Nice, but... by Jah-Wren+Ryel · · Score: 1

      So? Cray != Seymour. In fact the most successful Cray machines were not designed by Seymour.

      So? A company can not be alive.

      --
      When information is power, privacy is freedom.
    22. Re:Nice, but... by F34nor · · Score: 0

      Not according to the Supreme Court. Currently corporations are in fact granted both human rights and limited liability for the investors. The reading of the Fourteenth Amendment goes that we granted human rights to property and therefore property has rights. Throw in money being a form of speech and you have Enron and banking deregulation.

    23. Re:Nice, but... by Anonymous Coward · · Score: 1, Insightful

      This is not a bug, it's a feature. It prevents your app from taking the OS down with it (or at least, part of it). It is generally quite easy to ensure your kernels are small enough to last less than 5s. Since a kernel launch takes a few milliseconds, it is both easy and efficient to subdivide a big computation into several consecutive kernel calls.
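
      The subdivision idea above can be sketched without any GPU at all. This is a plain-Python model under stated assumptions: `kernel` is an ordinary function standing in for one kernel launch, and `split_into_launches`, `run_in_chunks` and `square_kernel` are hypothetical names, not part of the CUDA API. The chunk size would in practice be picked so that each launch finishes well inside the time limit.

```python
def split_into_launches(total_items, max_items_per_launch):
    """Yield (start, stop) index ranges, one per hypothetical kernel launch."""
    if max_items_per_launch <= 0:
        raise ValueError("max_items_per_launch must be positive")
    for start in range(0, total_items, max_items_per_launch):
        yield start, min(start + max_items_per_launch, total_items)


def run_in_chunks(data, max_items_per_launch, kernel):
    """Apply `kernel` to consecutive slices of `data`, one slice per launch.

    Control returns to the host between slices, which is the point:
    no single call runs long enough to trip a watchdog.
    """
    results = []
    for start, stop in split_into_launches(len(data), max_items_per_launch):
        results.extend(kernel(data[start:stop]))
    return results


def square_kernel(chunk):
    """Stand-in for the per-element work a real kernel would do."""
    return [x * x for x in chunk]


launches = list(split_into_launches(10, 4))
squared = run_in_chunks(list(range(10)), 4, square_kernel)
print(launches)  # [(0, 4), (4, 8), (8, 10)]
print(squared)   # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```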

    24. Re:Nice, but... by AmiMoJo · · Score: 2, Informative

      Presumably it's some kind of issue with CUDA because running code on ATI GPUs does not seem to have this problem. Also, multiple GPUs are supported by apps like Elcomsoft's Wireless Password Recovery on Windows.

      It should be fixable anyway, since modern GPUs are massively parallel and desktop stuff needs only a fraction of the available processing, even if it's just a case of setting a few stream processors aside.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    25. Re:Nice, but... by mangu · · Score: 1

      Seymour Cray was killed by a speeding redneck in a trans-am in 1996.

      According to Wikipedia: "Daniel Rarick, 33, had tried to pass Cray on Interstate 25 in Colorado Springs, Colorado, struck another car, which then struck Cray's Jeep Cherokee, causing it to roll 3 times. Rarick received a citation for careless driving causing serious bodily injury. He was unhurt in the accident"

      It says nothing there about Rarick being a redneck or not, but at least one thing is clear: the American myth about SUVs being safer than small cars is busted, at least in this anecdote. SUVs are more likely to roll over and break your neck.

    26. Re:Nice, but... by robthebloke · · Score: 1

      The whoosh you mean? Nah, he's on holiday this week. Remember him saying something about having to go to the funeral of some Duke Nukem Forever jokes....

    27. Re:Nice, but... by mpdolan37 · · Score: 1

      In Soviet Russia the Cluster runs you!!!

      --
      Facts are useless, they can be used to prove anything.
    28. Re:Nice, but... by Anonymous Coward · · Score: 0

      Yeah, heaven knows people would never be greedy all on their own. It's the corporations! Never people.

      2009, A Slashdot Odyssey - My god, it's full of retards.

    29. Re:Nice, but... by David+Greene · · Score: 1

      You're making the mistake of equating a company's products with one person. It doesn't work that way. Seymour Cray did not single-handedly produce any machine. It takes a team of dedicated people to do that. Therefore, this quote is a non-sequitur if "Cray" means an individual:

      A GPU is very much like the vector processor of Cray-style supercomputing (when Cray was still alive that is)

      The quote only makes sense if "Cray" refers to the company. "Cray-style supercomputing" only exists because of a group of people, not one man. Otherwise you're going to be in serious trouble when the one man gets hit by a bus.

      And let's be a little more accurate with our history (this is directed to all of us). Seymour Cray had many more failures than successes. He had a tendency to go for the perfect instead of the good and that resulted in numerous over-budget, cancelled projects. We need to be much more realistic about how we treat key figures in our industry, lest we fail to learn the lessons they've provided us.

    30. Re:Nice, but... by Jah-Wren+Ryel · · Score: 1

      You're making the mistake of equating a company's products with one person. It doesn't work that way.

      No, YOU are making that mistake. It was quite clear from my original wording that I was talking about Seymour Cray.

      I wrote: A GPU is very much like the vector processor of Cray-style supercomputing (when Cray was still alive that is)

      • I use the word 'alive' because I'm talking about a person. Given my detailed follow-up with information not contained in Wikipedia, it should be obvious that I was quite aware that a company with the name Cray still exists.
      • If I were talking about a company, what sense would it make to exclude the kind of computers the company made after it was defunct? Name one company that not only manufactures but designs new computers after it has gone out of business.

      And let's be a little more accurate with our history (this is directed to all of us). Seymour Cray had many more failures than successes.

      You clearly have a bug up your ass about Cray and your entire contribution to this thread has been to show off that bug, to the point of making an ass of yourself.

      --
      When information is power, privacy is freedom.
    31. Re:Nice, but... by David+Greene · · Score: 1

      It was quite clear from my original wording that I was talking about Seymour Cray.

      No, it wasn't. But let's not continue a useless argument. Obviously I misunderstood what you were saying. But I'll note that Seymour didn't invent vector computing, so the statement is a little misleading. To his credit, Cray was the first to use vector registers, which was an important innovation.

      If I were talking about a company, what sense would it make to exclude the kind of computers the company made after it was defunct? Name one company that not only manufactures but designs new computers after it has gone out of business.

      Cray the company did not go out of business. It got acquired, spun off, and acquired again. That's not quite the same thing as going under. And as I noted earlier, it's the same people.

      You clearly have a bug up your ass about Cray and your entire contribution to this thread has been to show off that bug, to the point of making an ass of yourself.

      Well, that's your opinion. I simply think it's unwise to ignore the lessons history teaches.

    32. Re:Nice, but... by Jah-Wren+Ryel · · Score: 1

      No, it wasn't.

      Only to someone too caught up in his own preoccupation with Cray to be bothered to read closely. Let's illustrate:

      But I'll note that Seymour didn't invent vector computing,

      No one here has said THAT either. But you are so caught up in your own issues with the man that you read that into what I wrote too.

      Cray the company did not go out of business. It got acquired, spun off, and acquired again.

      Gee, I had nooooo idea. NOT. If you had thought for half a second you would have realized that since Cray Research is still around in some form or another, then it was a good bet that I must have been talking about some other Cray when I said "when Cray was still alive." How you can note such an obvious discrepancy and still think the quote "only makes sense" if you read it in the way that does not jibe with the facts can only be explained by you being blinded to basic reading comprehension because of your fixation.

      Well, that's your opinion. I simply think it's unwise to ignore the lessons history teaches.

      Yeah that's true, it is unwise to ignore the lessons history teaches, but you know what else is unwise? Inserting random tangents into a discussion and then using them as strawmen. Atari is still "alive" today and Nolan Bushnell doesn't deserve any of the credit for that either.

      --
      When information is power, privacy is freedom.
    33. Re:Nice, but... by F34nor · · Score: 1

      No, you AC idiot, that has nothing to do with what I said. What I said was that the courts granted corporations human rights. They have both limited liability and the right to speech vs. being a regulated industry. If the government can demand to walk in and look at your books anytime they want, then they are less likely to pull Enron-style shit. For instance, coal mines are a "heavily regulated industry", meaning that the government can walk in anytime they want and see what is going on. Corporations were originally intended to have limited rights in return for limited liability. Now they make more money than many countries but are not wholly governed by the public. That's fucked up. You, sir, are the retard.

  2. The war begins. by XPeter · · Score: 2, Interesting

    With NVIDIA slowly pushing its way into the CPU market (CUDA is the first step; in a few years I wouldn't be surprised if Nvidia started developing processors) and Intel trying to cut into Nvidia's GPU market share with Larrabee http://en.wikipedia.org/wiki/Larrabee_(GPU), we'll see who can develop outside of their box faster. This is good news for AMD, since Intel will be more focused on Nvidia instead of being neck and neck with them in the processor market. Hey, maybe AMD will regain its power in the server and netbook realms.

    There's also going to be a battle of patents pretty soon too. Wish I was a tech lawyer.

    --
    "The difference between genius and stupidity is that genius has it's limits" - Albert Einstein
    1. Re:The war begins. by David+Greene · · Score: 2, Interesting

      It's going to be interesting to see how Larrabee and AMD's Fusion battle it out. With Larrabee, Intel is taking a tightly integrated approach. One can easily imagine that LRBni will be integrated into mainstream CPUs in the not-so-distant future, at which point Intel will argue that no one needs a GPU.

      AMD, on the other hand, is taking the approach of (relatively) loosely-coupled specialized processors: the CPU for general-purpose/integer/branchy code and the GPU for graphics (and HPC?).

      Currently my bet is on Intel because of the much simpler Larrabee programming model. But if the performance isn't there, things could get heated.

    2. Re:The war begins. by Bat+Country · · Score: 1

      I'd honestly like to see the two work together to produce some sort of sickeningly powerful rendering setup.

      A processor which was good at preprocessing a scene for maximum performance on the GPU hardware and built-in support for multiple display adapters, plus an on-board chip which handles outputting the resulting images via the digital-link-du-jour.

      This sort of setup would mean that rather than having to update your GPUs every two years (you could just buy another one to run in parallel) - the graphics card manufacturers could get better at producing the hardware with a larger profit margin due to longer product lifetimes, the CPU manufacturers could get in on the action like they so clearly want to, and the motherboard chipset manufacturers could get in endless bidding wars to produce the best output signal pipeline and video decoders.

      Nobody would come out a loser, and the whole thing would be more friendly to consumers in a depressed economy, which I've no doubt customers would respond to.

      --
      The land shall stone them with the bread of his son.
    3. Re:The war begins. by Narishma · · Score: 1

      There's no power in the netbook realm for AMD to regain as it never had any to begin with. The netbook market is 95% Intel and the rest is mainly VIA and a smattering of MIPS and ARM nobody seems to care about.

      --
      Mada mada dane.
  3. Tied to a card by ComputerDruid · · Score: 5, Insightful

    What I don't understand is why people hype a technology that is tied to a specific manufacturer of card. If nvidia died tomorrow, we'd have a fair amount of code that's no longer relevant, unless there was some way to design cards that are CUDA-capable but not nvidia.

    Also worth noting that I'd completely forgotten CUDA even ran on Windows, as I've only heard of it in the context of Linux recently.

    1. Re:Tied to a card by gustgr · · Score: 5, Insightful

      OpenCL will hopefully help to set a solid ground for GPU and CPU parallel computing, and since it is not technically very different from CUDA, porting existing applications to OpenCL will not be a challenge. Nowadays with current massively parallel technology the hardest part is making the algorithms parallel, not programming any specific device.

    2. Re:Tied to a card by egr · · Score: 1

      I think there was an open source alternative which is not tied to any card, but I forgot what its name was. And I never programmed for it, so I don't know how well it performs.

    3. Re:Tied to a card by Caelius · · Score: 2, Informative

      OpenCL is the open source CUDA alternative. http://en.wikipedia.org/wiki/OpenCL

    4. Re:Tied to a card by Anonymous Coward · · Score: 0

      How is OpenCL "open source"??

    5. Re:Tied to a card by Darkness404 · · Score: 1

      Cross platform, royalty free, support from all major vendors... etc.

      --
      Taxation is legalized theft, no more, no less.
    6. Re:Tied to a card by jared9900 · · Score: 1

      OpenCL is not open source; OpenCL is a specification for a CUDA-equivalent language and API. Drivers are still necessary, and will likely be produced by the makers of the graphics hardware (ATI, Nvidia, Intel). Open source drivers and compilers are certainly possible, but I wouldn't expect them to be equivalent to the closed source stuff for some time yet.

    7. Re:Tied to a card by TheRaven64 · · Score: 3, Informative

      OpenCL is an open standard, but there is not yet an open source implementation. That said, OpenCL is very similar to GLSL, and there is already a GLSL front end for LLVM being worked on by Mesa and Tungsten Graphics, so extending it to support OpenCL should be relatively easy.

      --
      I am TheRaven on Soylent News
    8. Re:Tied to a card by Anonymous Coward · · Score: 4, Informative

      I hear this a lot in CUDA/GPGPU-related threads on slashdot, primarily from people who simply have zero experience with GPU programming. The bottom line is that in the present and for the foreseeable future, if you are going to try to accelerate a program by offloading some of the computation to a GPU, you are going to be tying yourself to one vendor (or writing different versions for multiple vendors) anyways. You simply cannot get anything approaching worthwhile performance from a GPU kernel without having a good understanding of the hardware you are writing for. nVidia has a paper that illustrates this excellently, in which they start off with a seemingly good "generic" parallel reduction code and go through a series of 7 or 8 optimizations -- most of them based on knowledge of the hardware -- and improve its performance by more than a factor of 30 versus the generic implementation.

      Another thing to keep in mind is that CUDA is very simple to learn as an API -- if you're familiar with C you can pick up CUDA in an afternoon easily. The difficulty, as I said in the previous paragraph, is optimization; and optimizations that work well for a particular GPU in CUDA will (or at least should) work well for the same GPU in OpenCL.
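
      For readers who have not seen one, a "generic" parallel reduction of the kind mentioned above looks roughly like this. It is a plain-Python sketch of a pairwise tree sum, not the code from the nVidia paper and not CUDA: the inner `for` loop stands in for threads that would run at the same time, and `tree_reduce_sum` is a name chosen for illustration. None of the hardware-specific optimizations the comment refers to are shown.

```python
def tree_reduce_sum(values):
    """Sum `values` with a pairwise tree reduction.

    Each pass of the while loop models one synchronized step in which
    `stride` threads each add one pair of elements, so n values take
    about log2(n) steps instead of n - 1 sequential additions.
    """
    data = list(values)
    n = len(data)
    if n == 0:
        return 0
    # Pad with zeros up to a power of two so every step halves cleanly.
    size = 1
    while size < n:
        size *= 2
    data.extend([0] * (size - n))

    stride = size // 2
    while stride > 0:
        for i in range(stride):  # these additions are independent of each other
            data[i] += data[i + stride]
        stride //= 2
    return data[0]


print(tree_reduce_sum(range(1, 11)))  # 55
```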

    9. Re:Tied to a card by jared9900 · · Score: 4, Informative

      But OpenCL is a specification, not an implementation. The only three implementations I'm currently aware of are Apple's (with Snow Leopard), the one AMD demoed back in March, and Nvidia's beta. So far none of those are open source. If you're aware of an open source implementation, please let me know; I'm actually very interested in it, but have yet to locate one.

    10. Re:Tied to a card by mathimus1863 · · Score: 2, Interesting

      In general, it's not tied to a card. CUDA itself might be NVIDIA-dependent, but general-purpose GPU programming is not, and other manufacturers will have similar interfaces to GP-GPU programming, eventually.

      As for my own experience with it... everyone at work is going crazy over them. One of our major simulations implements a high-fidelity IR scene modeler. It used to take 2 seconds per frame on CPU-only. They re-wrote it with GPU and got it down to 12 ms.

      Anything that is highly parallelizable with low memory transfer requirements will get a pretty impressive speedup. My co-worker, who has been doing this for a year now, was explaining that computation is essentially free; it's the memory operations which are the bottleneck.
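
      To put numbers on this: the speedup follows directly from the two frame times quoted above, and the "memory is the bottleneck" point can be illustrated with a crude model. In this plain-Python sketch only the 2 s and 12 ms figures come from the comment; the 8 MB per frame, the 4 GB/s link and the 0.5 ms of compute are made-up values, and the model assumes transfer and compute do not overlap.

```python
def speedup(old_seconds, new_seconds):
    """Ratio of the old runtime to the new runtime."""
    return old_seconds / new_seconds


def gpu_frame_time(bytes_moved, bandwidth_bytes_per_s, compute_seconds):
    """Crude model: transfer time and compute time simply add (no overlap)."""
    return bytes_moved / bandwidth_bytes_per_s + compute_seconds


# Figures from the comment: 2 s per frame on the CPU, 12 ms on the GPU.
reported = speedup(2.0, 0.012)

# Hypothetical numbers, for illustration only.
bytes_per_frame = 8e6      # 8 MB moved per frame
link_bandwidth = 4e9       # 4 GB/s between host and card
compute_seconds = 0.0005   # 0.5 ms of actual arithmetic

frame = gpu_frame_time(bytes_per_frame, link_bandwidth, compute_seconds)
transfer_share = (bytes_per_frame / link_bandwidth) / frame

print(round(reported, 1))        # 166.7
print(round(transfer_share, 2))  # 0.8
```

      With these invented numbers, four fifths of the frame time goes to moving data, so making the arithmetic faster would barely help.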

    11. Re:Tied to a card by jasprov · · Score: 1

      That's where abstraction and specialization come into play. After defining your algorithm for independent use, specialize and optimize it to exploit current or future hardware. This gives you a fallback for calculation, and extremely enhanced performance for the life and support of said hardware. And, as others have pointed out, it's a stepping stone to an OpenCL implementation, eventually giving you multiple vendors to rely on.

      If NVIDIA goes out of business or drops support in two years, how much more work will you have gotten done over that time? If the cost of implementing the specialized solution is any less than the value of that extra work, it's worth it.

      Is there risk? Yes. And, it's highly mitigated with the abstracted solution and migration paths.

    12. Re:Tied to a card by Anonymous Coward · · Score: 0

      Just as all x86 code will no longer be relevant if Intel died tomorrow.

    13. Re:Tied to a card by Caelius · · Score: 1

      OpenCL is an open standard, but there is not yet an open source implementation.

      Thanks for clarifying to everyone for me. I was in a hurry and misspoke. I was trying to imply that it wasn't tied to a single company/entity like CUDA, but rather a consortium of industry players, and "open-source" is what my fingers typed, instead of "open standard." Gah.

    14. Re:Tied to a card by CAIMLAS · · Score: 2, Interesting

      How is this different than AMD-v, which Intel licenses for their virtualization (or maybe I'm confusing it with a64, which Intel licenses)?

      Either way, if AMD "died tomorrow", the same thing would happen as would happen if Nvidia did: some other company, likely a previous competitor, would buy up the technology, and things would continue with barely a hiccup.

      A product or technology does not need to be open source or 'standards based' to gain wild adoption. Sometimes, a technology speaks for itself. After all, ARM CPUs are literally everywhere, as are many other things which are quite closed (as I'm sure you're aware). There will be someone else waiting in the wings to pick up the chalice, should it be dropped, with all worthwhile technology.

      --
      ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
    15. Re:Tied to a card by Lucractius · · Score: 1

      This of course assumes that OpenCL is able to make a foothold and has support from the hardware and gets some software that really shows the improvements that other developers can get using it.

      Without those it won't have enough traction/mindshare.

      --
      XML - A clever joke would be here if /. didn't mangle tag brackets.
  4. For folders by esocid · · Score: 3, Informative

    Folding@home can use CUDA on Linux, but you have to compile the CUDA driver first.

    --
    Absolute power corrupts absolutely. indymedia
  5. Tom's Hardware by sexconker · · Score: 0, Troll

    Totally not a biased, money-hatted site. Totally. Trust us.

    (Not saying they're biased in this case, but because of the bullshit they've pulled in the past I'll never visit their site again.)

    1. Re:Tom's Hardware by crazipper · · Score: 2, Interesting

      I'd welcome the opportunity to prove otherwise. I've been managing editor for the last year, and much has changed. Best, Chris

    2. Re:Tom's Hardware by Anonymous Coward · · Score: 0

      Used to go there all the time, but stopped going because they weren't posting the kind of reviews that I wanted to read (up-to-date roundups).

      What bullshit are you referring to?

    3. Re:Tom's Hardware by feepness · · Score: 1

      Without being specific about the bullshit you are referring to, you just make yourself look like a fanboi whose favorite card was slammed.

    4. Re:Tom's Hardware by XPeter · · Score: 5, Funny

      Totally not a biased, money-hatted site. Totally. Trust us.

      Hi! You must be new to the internet as well as Slashdot, let me give you some tips.

              1. Always use the word "lunix" in place of "linux" in slashdot's discussion forums.
              2. You can steal mod points by copying someone else's insightful comment and pasting it as a reply to an earlier one.
              3. Mac users are a bunch of fucking queers.
              4. When there's something you need to do that can't be done with Windows but can be done with Lunix, keep in mind that you can do an even better job with Mac OS X. Some argue that BSD can do it better but no one makes software for BSD since no one gives a flying fuck.
              5. Adequacy.org was one of the best sites on the internet. Want to know if your sons a computer hacker? Click here! http://www.adequacy.org/stories/2001.12.2.42056.2147.html

      Good luck, friend!

      --
      "The difference between genius and stupidity is that genius has it's limits" - Albert Einstein
    5. Re:Tom's Hardware by ChunderDownunder · · Score: 3, Insightful

      To be honest, it's all about advertising.

      C'mon, 15 pages? You wonder why few of us ever RTFA...

      Make Slashdot linked articles direct to a single page version, with maybe a handful of ads, and we may stick around and look at the rest of your site. Otherwise, it's potentially 1 million readers who may not bother clicking the URL, or just skip to the conclusion and miss the point of the article - perhaps hurting sales of advertised nvidia cards, the crux of the article's technology.

    6. Re:Tom's Hardware by Kagura · · Score: 1

      I'd welcome the opportunity to prove otherwise. I've been managing editor for the last year, and much has changed. Best, Chris

      Tom's Hardware has been the best consistent site that I've gone to for the past four video cards I've bought (spanning many years). I'm happy with their benchmarks, more or less. I can deal with the 15 pages per article, but I am not impressed with that aspect.

    7. Re:Tom's Hardware by crazipper · · Score: 1, Informative

      I'll pass this feedback along to the design guys, but do you *really* want to scroll through 4,000 words and 50-some charts, rather than looking at just the pages you're interested in reading? Surely the length would be a bigger problem if there wasn't an index, right? TBH, I'm most focused on the editorial side of things.

    8. Re:Tom's Hardware by XPeter · · Score: 1

      Chris, as long as you keep the drop-down menu I'll keep reading Tom's.

      --
      "The difference between genius and stupidity is that genius has it's limits" - Albert Einstein
    9. Re:Tom's Hardware by crazipper · · Score: 1

      Cheers X. The devs got rid of it for a few days there and they definitely got an earful ;-)

    10. Re:Tom's Hardware by XPeter · · Score: 1

      Oh and one thing...when's the next SBM? I'm looking to build a new rig in the 2-2.5k range and I want to use the SBM's as a guide. We need some Q1 charts soon too :) Anyway enough with my demands, keep up the good work. Love the site. -Peter

      --
      "The difference between genius and stupidity is that genius has it's limits" - Albert Einstein
    11. Re:Tom's Hardware by crazipper · · Score: 1

      Next SBM starts next Monday and includes $600, $1,350, and $2,250 price points. Oh--and hold off on the purchase. All three systems are actually going to be given away this time around, so you never know. Might win one :)

    12. Re:Tom's Hardware by XPeter · · Score: 1

      If I won one of the PCs, I wouldn't use it. It would be placed on a glass shelf in my room and if someone goes near it, I release the hounds :)

      --
      "The difference between genius and stupidity is that genius has it's limits" - Albert Einstein
    13. Re:Tom's Hardware by ChunderDownunder · · Score: 3, Insightful

      Definitely YES, if it's an article worth viewing. I mightn't think I'm interested in a topic, only to find I am. :) Clicking a link after a screen only disrupts one's concentration, while the next page loads, when most of us just use a scroll wheel. And as far as revenue goes, you can fill an entire sidebar with ads, if lost advertising is a concern...

      And to whoever moderated his post a troll, get a life. He's trying to improve the experience for us readers and we should encourage dialog...

    14. Re:Tom's Hardware by rasherbuyer · · Score: 1

      There were ads?

    15. Re:Tom's Hardware by linhares · · Score: 2, Informative

      and seriously, are you talking gpgpu performance or the magical wonders of seti@home, h264, science funding, and so on? So many pages wasted... and of course, much worse: my time wasted on the poetry.

      If you absolutely need this type of wandering off to have more pages and more clicks to survive on the web, then I'm concerned your site may not last for very long. I personally love the site, but these 15-page wanderings off the subject drive me fucking nuts.

    16. Re:Tom's Hardware by Khyber · · Score: 3, Insightful

      Here's why you're proven to be a money-hatted site.

      Advertising bandwidth versus actual article content bandwidth. Your advertising uses up about 2500% more bandwidth than the actual article content.

      You care more about advertising than you do about content. That's why you split everything up into so many pages that I could have done in less than two, single-spaced, 20 point font.

      --
      Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
    17. Re:Tom's Hardware by ChunderDownunder · · Score: 1

      Yeah, try turning your ad-blocker off once in a while, for the full internet experience! :)

    18. Re:Tom's Hardware by Jah-Wren+Ryel · · Score: 2, Insightful

      I'll pass this feedback along to the design guys, but do you *really* want to scroll through 4,000 words and 50-some charts, rather than looking at just the pages you're interested in reading?

      Yes, I do. I can scroll just fine thank you and I can also use the browser's built in word search to find specific words anywhere in the current page, but I can't do that and stay sane at the same time if I have to click 15 times and search 15 times for each word I might want to find.

      Surely the length would be a bigger problem if there wasn't an index, right?

      Put the index in a sidebar or at the top of the single page. HTML has had document internal anchor points since pretty much day 1.

      --
      When information is power, privacy is freedom.
    19. Re:Tom's Hardware by perryizgr8 · · Score: 2, Interesting

      yeah, make it like wikipedia articles. they are long but easily navigable.

      --
      Wealth is the gift that keeps on giving.
    20. Re:Tom's Hardware by Boba001 · · Score: 3, Insightful

      What's with only allowing registered users access to the print version? I pretty much gave up on being able to read the article after seeing that.

    21. Re:Tom's Hardware by johannesg · · Score: 1

      I'll pass this feedback along to the design guys, but do you *really* want to scroll through 4,000 words and 50-some charts, rather than looking at just the pages you're interested in reading?

      Absolutely. I'd much rather load a page once and then read it in one go, than get stuck in a cycle doing load, read, load, read, load, read - all those loads interrupt workflow and make it much more likely for me to go do something else.

    22. Re:Tom's Hardware by Nom+du+Keyboard · · Score: 1

      2. You can steal mod points by copying someone else's insightful comment and pasting it as a reply to an earlier one.

      No wonder I'm always getting modded as Redundant -1.

      --
      "It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
    23. Re:Tom's Hardware by WNight · · Score: 1

      Are you kidding? Do you *really* want to wait 3-5 seconds for each page to load, after you spend 1-3 seconds interacting with the tiny little UI element in only a few spots on the page, or do you just want to hit Page-down again? Seriously, go to a nearby internet cafe and ask to watch someone pull up your site over the net without ad filters. Worse, read your article a page at a time, using the index to jump around. It's not the load speed that kills, even over cellular (help me!) but the latency between each couple of paragraphs.

      I don't know who seriously thinks scrolling through a document confuses or annoys people. Word processors work that way and I've helped thousands of customers with problems and never did one have to ask how to get to the rest of the document when the scroll-bar, pgdn, and arrow keys all work.

      How about a one-page solution with internal links? Like Wikipedia articles, etc. So you can jump where you want with the index, but the whole page loads and is usable in one piece.

      But I imagine this is more an issue of ad revenue than user-friendliness.

    24. Re:Tom's Hardware by xelah · · Score: 1

      Hold down Ctrl and click on all of the contents links one after another to open them in new tabs. Then the rest load while you're reading the first page and you can flip back and forth between pages quickly enough to make it about usable. Still annoying, though. And, of course, you need to know about tabs.

    25. Re:Tom's Hardware by Nicolay77 · · Score: 1

      I do. I totally prefer that and have even read full length books in both my computer and cell phone in this way.

      However, there's a school of 'design' for lazy readers that treat everything I like as 'the ugly wall of text'.

      Anywhere I find a long text the wall-of-text comment appears, no matter how well the paragraphs are formatted.

      So in this case it seems you really can't please everybody.

      --
      We are Turing O-Machines. The Oracle is out there.
    26. Re:Tom's Hardware by WNight · · Score: 1

      Yeah, I know. While I could easily open each page in a new tab I'd rather go to a site that caters to me and load an article in each tab.

      I'm asking if he seriously thinks reading it like that is any good though. My guess is he reads their articles from the office mainly, and occasionally at that, and has no idea of the typical user experience.

    27. Re:Tom's Hardware by Anonymous Coward · · Score: 0

      Best what, you moron?

      It's idiots like you that came up with bullshit like SyFy.

    28. Re:Tom's Hardware by The+End+Of+Days · · Score: 1

      I'm sure the managing editor of Tom's Hardware shares your Slashdottian belief that he should be providing his services to you for free. Perhaps you and the rest of the entitled people could form a rota to feed and house him (and his family if applicable.)

    29. Re:Tom's Hardware by sexconker · · Score: 1

      Actually, you've just revealed yourself to be that fanboy. I never mentioned "cards", and can only assume you are referring to graphics cards.

      Their entire site is filled to the brim with ads.
      20 "page" reviews filled with copy-and-pasted marketing bullshit from the press kits.

      They test cherry-picked hardware samples no mere mortal will ever get to touch.

      You assume I give a fucking shit about AMD/Intel or nVidia/ATi or whatever other drama there is. I give a shit about honest reviews with real products (not "gifts" or "review" hardware) that aren't essentially written by the manufacturer.

      I'd like to read said review without clicking through 20 2-paragraph pages. I don't need 5000 charts, a single fucking table will suffice.

      Bottom line:
      If a review site is using non-retail product, non-paid-for products, it's bullshit. Call it a preview if it's newsworthy and you still want to post it.

      If a review site copies and pastes press material and marketing-speak from the manufacturer, it's bullshit.

      If a review site sucks the genitals of the manufacturer and agrees to NDAs, it's bullshit.

      If a review site loads up more ads than actual review, it's bullshit.

      This goes for all sites, but obviously Tom's Hardware is the biggest offender, and has been for the longest time.

  6. SETI? by NiteMair · · Score: 4, Informative

    Waste your GPU cycles on something more interesting than SETI...

    http://www.gpugrid.net/
    http://distributed.net/download/prerelease.php (ok, maybe that's less interesting...)

    And why limit this discussion to CUDA? ATI/AMD's STREAM is usable as well...

    http://folding.stanford.edu/English/FAQ-ATI

    1. Re:SETI? by ComputerDruid · · Score: 1

      As of now, though, nvidia's CUDA has all of the hype, as well as a handful of applications developed for the platform.

    2. Re:SETI? by Anonymous Coward · · Score: 0

      Waste your GPU cycles on something more interesting than SETI...

      http://www.gpugrid.net/
      http://distributed.net/download/prerelease.php (ok, maybe that's less interesting...)

      And why limit this discussion to CUDA? ATI/AMD's STREAM is usable as well...

      http://folding.stanford.edu/English/FAQ-ATI

      More interesting? Yeah, right. Are any of those ever going to get me the alien porn I need? NO. Therefore they are all a waste of time, QED.

  7. Science is a parasite by gustgr · · Score: 1

    The same way the DoD paid for the Cray supercomputers, gamers are paying for the GPUs. Science dropped by and said thanks.

    1. Re:Science is a parasite by Bigjeff5 · · Score: 1

      Science prefers you use the term "symbiont".

      Parasite has a negative connotation.

      --
      Security is mostly a superstition... Avoiding danger is no safer in the long run than outright exposure. - Helen Keller
  8. Hooray ... Fortran again by thoughtspace · · Score: 2, Informative

    For those out of work since the millennium bug, at long last FORTRAN is back: http://www.nvidia.com/object/cuda_what_is.html

    Can't wait for the APL support. Reorganising my keyboard keys in anticipation.

    1. Re:Hooray ... Fortran again by Anonymous Coward · · Score: 1, Funny

      Back? You've never been in a Physics department, have you? Fortran was never gone.

    2. Re:Hooray ... Fortran again by Anonymous Coward · · Score: 0

      FYI, Fortran is still heavily used for scientific computing. All libraries that target this community have bindings for C, Fortran and often C++.

  9. h.264 encoding by BikeHelmet · · Score: 5, Informative

    h.264 encoding didn't improve with more shaders for some of the results (like PowerDirector 7), because of the law of diminishing returns.

    I remember reading about x264 when quad-cores were becoming common. It mentioned that if quality is of the utmost importance, you should still encode on a single core. It splits squares of pixels between the cores; where those squares connect there can be very minor artifacts. It smooths these artifacts out with a small amount of extra data and post processing; the end result is a file hardly 1-2% bigger than if encoded on a single core, but encoded roughly 4x faster.

    Now, if we're talking about 32 cores, or 64, or 128, would the size difference be bigger than 1-2%? Probably. After a certain point, it would almost certainly not be worth it.

    This is supported by Badaboom's results, where the higher resolution videos (with more encoded squares) seem to make use of more shaders when encoding, while most of the lower resolution vids do not. (indicating that some shaders may be lying idle)

    What I'm curious about, is could the 9800GTX encode two videos at once, while the 9600GT could only manage one? ;)

    I'm also curious why the 320x240 video encoded so quickly - but that could be from superior memory bandwidth, shader clockspeed, and some other important factor in h.264 encoding.

    Take it with a grain of salt; I'm not an encoder engineer; just regurgitating what I once read, hopefully accurately. ;)

    1. Re:h.264 encoding by Anonymous Coward · · Score: 0

      That makes no sense. Why don't they start the encode on each processor at 0%, 25%, 50%, and 75% of the movie?

    2. Re:h.264 encoding by SpazmodeusG · · Score: 2, Informative

      Data compression is an inherently serial operation. Parts of it can be done in parallel but in general the way you compress the next bit is based on the patterns observed earlier.

      Say you wanted one core to start encoding at 0% and the other at 50% of the way into the movie. The core starting at 50% has to start compression without any of the learned patterns in the 0-50% range. In the example you gave one core encodes half the screen and the other core encodes the other half. If they are running in parallel the second core can't use the learnt patterns of the first unless it wants to wait for the first core to finish its current frame (thereby making it non-parallel).

      So you have a tradeoff. You can run everything serially, or you can accept that you'll miss a few observed patterns here and there and run more in parallel.
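
      A rough sketch of that tradeoff, assuming zlib as a stand-in for a video codec (it is not one) and synthetic text as a stand-in for a movie. The names make_sample, compress_whole and compress_chunks are made up for illustration. Each chunk is compressed with no knowledge of the others, so it can run on its own core, but it has to relearn patterns from scratch:

```python
import random
import zlib

def make_sample(n_words=20000, seed=0):
    """Build deterministic, repetitive sample data (a stand-in for video data)."""
    rng = random.Random(seed)
    vocab = [("word%03d" % i).encode() for i in range(200)]
    return b" ".join(rng.choice(vocab) for _ in range(n_words))

def compress_whole(data):
    """Serial case: one compressor sees the entire input."""
    return zlib.compress(data, 9)

def compress_chunks(data, n_chunks):
    """Parallel-friendly case: every chunk is compressed with no shared history."""
    size = -(-len(data) // n_chunks)  # ceiling division
    return [zlib.compress(data[i:i + size], 9) for i in range(0, len(data), size)]

data = make_sample()
whole = compress_whole(data)
chunks = compress_chunks(data, 4)
chunked_size = sum(len(c) for c in chunks)

# The chunked total comes out larger than the single-stream result.
print(len(data), len(whole), chunked_size)
```

      How big the gap is depends entirely on the data and the compressor, so treat the printed sizes as an illustration and not as a prediction for h.264.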

    3. Re:h.264 encoding by Anonymous Coward · · Score: 0

      Disk read/write would likely become a bottleneck there. Plus you would have to recombine the files at the end, using more disk time.

    4. Re:h.264 encoding by SpazmodeusG · · Score: 4, Informative

      Encoding from multiple different keyframes works when you can seek to any part of the input video but it doesn't help with realtime encoding.

      If I'm encoding a signal in realtime from TV I have to start encoding at 0% onwards. The only way to parallelize it is to split the individual frames up into boxes (as done by the Badaboom).

    5. Re:h.264 encoding by geekboy642 · · Score: 1

      I know almost nothing about data compression beyond the readme for pkzip. Are there really enough learned patterns in a video stream that would make a >1% difference in filesize if compressed in independent chunks? As far as I can reason it out, independent chunks would act like you'd just inserted an extra keyframe at the splitpoints.

      --
      Just another "DOJ fascist authoritarian totalitarian bootlicker" -- Zeio
    6. Re:h.264 encoding by Anonymous Coward · · Score: 0

      You are correct - it's basically key-frame boundaries that matter with conventional video compression.

    7. Re:h.264 encoding by midicase · · Score: 1

      You are thinking in terms of data from start to finish, but many types of video encoding/compression operate on the frame or relative to a frame.

      One can store an entire frame in data, and the next bit of data would be the delta between the next and previous frame. Every so often the cycle restarts so that systems can cope with streaming data (you do need at least one full frame as a reference).

      You can chop up the frame into many individual blocks. Do more of the same as above but on portions of the screen data.

      There are many, many methods of handling video data. I'm working in the industry now, but still have yet to bend my mind around many of them, but we do have engineers whose sole job is to deal with this.
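
      A toy sketch of the full-frame-plus-delta idea described above, assuming each frame is just a flat list of pixel values. The function names and the keyframe_interval parameter are made up for illustration; real codecs use motion compensation and entropy coding on top of this:

```python
def encode(frames, keyframe_interval=4):
    """Encode frames as ('key', pixels) or ('delta', per-pixel differences)."""
    encoded = []
    previous = None
    for index, frame in enumerate(frames):
        if index % keyframe_interval == 0:
            # Restart the cycle with a full reference frame.
            encoded.append(("key", list(frame)))
        else:
            # Store only the change relative to the previous frame.
            encoded.append(("delta", [cur - prev for cur, prev in zip(frame, previous)]))
        previous = frame
    return encoded

def decode(encoded):
    """Rebuild the frames by applying each delta to the frame before it."""
    frames = []
    for kind, payload in encoded:
        if kind == "key":
            frames.append(list(payload))
        else:
            frames.append([prev + d for prev, d in zip(frames[-1], payload)])
    return frames

frames = [
    [10, 10, 10, 10],
    [10, 11, 10, 10],
    [10, 12, 10, 10],
    [10, 12, 11, 10],
    [50, 50, 50, 50],
    [50, 50, 51, 50],
]
encoded = encode(frames, keyframe_interval=4)
for kind, payload in encoded:
    print(kind, payload)
```

      A decoder that joins the stream late only has to wait for the next "key" entry, which is why the cycle restarts every so often.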

    8. Re:h.264 encoding by electrosoccertux · · Score: 2, Informative

      Data compression is an inherently serial operation. Parts of it can be done in parallel but in general the way you compress the next bit is based on the patterns observed earlier.

      Say you wanted one core to start encoding at 0% and the other at 50% of the way into the movie. The core starting at 50% has to start compression without any of the learned patterns in the 0-50% range. In the example you gave one core encodes half the screen and the other core encodes the other half. If they are running in parallel the second core can't use the learnt patterns of the first unless it wants to wait for the first core to finish its current frame (thereby making it non-parallel).

      So you have a tradeoff. You can run everything serially, or you can accept that you'll miss a few observed patterns here and there and run more in parallel.

      For usability (seeking through a video) no codec works based on a learned pattern. The memory requirements to make use of this would be astronomical (you'd have to store the entire file in RAM, good luck doing that with a BluRay).

      IIRC, the furthest back any codec looks is something like 24 frames.

    9. Re:h.264 encoding by Anonymous Coward · · Score: 2, Informative

      For video encoding there is a ton of work that can be done in parallel. You can compute all of the dct's for all of the macroblocks in parallel. You can run your motion search for every block in parallel.
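
      A small sketch of why the per-macroblock DCT parallelizes so easily, assuming 8x8 blocks and a frame stored as a list of rows. All names here are made up for illustration. A thread pool in CPython will not actually make this CPU-bound loop faster; the point is only that no block needs anything from any other block, which is the property a GPU exploits:

```python
import math
from concurrent.futures import ThreadPoolExecutor

N = 8  # macroblock edge length assumed for this sketch

def dct_1d(values):
    """Orthonormal DCT-II of a length-N sequence."""
    out = []
    for k in range(N):
        total = sum(v * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                    for n, v in enumerate(values))
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        out.append(scale * total)
    return out

def dct_2d(block):
    """2D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def split_blocks(frame):
    """Cut a frame (height and width multiples of N) into NxN blocks, row-major."""
    height, width = len(frame), len(frame[0])
    return [
        [row[x:x + N] for row in frame[y:y + N]]
        for y in range(0, height, N)
        for x in range(0, width, N)
    ]

def transform_frame(frame, workers=4):
    """Transform every block independently; results keep the block order."""
    blocks = split_blocks(frame)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(dct_2d, blocks))

# 16x16 test frame: left half has value 5, right half has value 15.
frame = [[(x // 8) * 10 + 5 for x in range(16)] for _ in range(16)]
coeffs = transform_frame(frame)
print(len(coeffs), round(coeffs[0][0][0], 6), round(coeffs[1][0][0], 6))
```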

    10. Re:h.264 encoding by adolf · · Score: 2, Informative

      This is one of the most inane thought patterns I have yet to witness this week.

      The reason is simple: Fine, so you've split a process into chunks and distributed them across two or more cores. But it's not exactly like those cores are working in a vacuum; they all use the same RAM.

      As another reply has stated, codecs don't work quite how you describe -- they don't use the entire media as a reference, but at most a couple of dozen frames. But even if such mythological technology were really in use: There's no qualitative reason why something learned by process A cannot be shared with process B, and vice-versa. Therefore, the two processes can encode totally different segments of a given video, share what they've learned, and make similar and consistent tradeoffs.

      After that, you join the parts on an existing keyframe (which doesn't have to be exactly at 50% or whatever the ideal number happens to be), and call it a day.

  10. Well, it works awesome if your problem is parallel by Muerte23 · · Score: 5, Interesting

    The Tesla 1060 is a video card with no video output (strictly for processing) that has something like 240 processor cores and 4 GB of GDDR3 RAM. Just doing math on large arrays (1k x 1k) I get a performance boost of about a factor of forty over a dual core 3.0 GHz Xeon.

    The CUDA extension set has FFT functionality built in as well, so it's excellent for signal processing. The SDK and programming paradigm is super easy to learn. I only know C (and not C++) and I can't even make a proper GUI, but I can make my array functions run massively in parallel.

    The trick is to minimize memory moving between the CPU and the GPU because that kills performance. Only the brand newest cards support functionality for "simultaneous copy and execute" where one thread can be reading new data to the card, another can be processing, and the third can be moving the results off the card.

    One way that the video people can maybe speed up their processing (disclaimer: I don't know anything about this) is to do a quick sweep for keyframes, and then send the video streams between keyframes to individual processor cores. So instead of each core gets a piece of the frame, maybe each core gets a piece of the movie.

    The days of the math coprocessor card have returned!
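
    A sketch of the "each core gets a piece of the movie" idea, assuming the keyframe sweep has already produced one frame-type letter per frame ("I" for a keyframe, anything else for a dependent frame). The helper names and the round-robin assignment are made up for illustration and say nothing about how any real encoder schedules its work:

```python
def split_at_keyframes(frame_types):
    """Return (start, end) index pairs, one per keyframe-led segment."""
    starts = [i for i, kind in enumerate(frame_types) if kind == "I"]
    if not starts or starts[0] != 0:
        raise ValueError("stream must begin with a keyframe")
    ends = starts[1:] + [len(frame_types)]
    return list(zip(starts, ends))

def assign_round_robin(segments, n_workers):
    """Hand segments to workers in turn; each segment is self-contained."""
    buckets = [[] for _ in range(n_workers)]
    for i, segment in enumerate(segments):
        buckets[i % n_workers].append(segment)
    return buckets

frame_types = list("IPPBPIPPIPBBPP")
segments = split_at_keyframes(frame_types)
work = assign_round_robin(segments, 2)
print(segments)
print(work)
```

    Since every segment starts on a keyframe, no worker needs frames owned by another worker, and the outputs can be concatenated in segment order afterwards.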

  11. OpenCL? by Midnight+Thunder · · Score: 1

    I thought Nvidia was indicating they were going to move to supporting OpenCL, or are they simply planning to support multiple technologies?

    --
    Jumpstart the tartan drive.
    1. Re:OpenCL? by ChunderDownunder · · Score: 2

      Both, I'd guess. If someone releases some killer software for OpenCL they'd be mad not to - Apple are pushing it for OS X.

      On the other hand, if they do a deal with someone to write CUDA stuff, it's lock-in that you must buy an nvidia card.

      Either way they win...

    2. Re:OpenCL? by Trepidity · · Score: 1

      They also have control over adding features to CUDA relatively rapidly as hardware gains new capabilities, which they can't easily do with OpenCL.

    3. Re:OpenCL? by cptnapalm · · Score: 1

      I remember reading the OpenCL announcement (I like to pretend that I know what I'm talking about in programming matters) and Nvidia did indeed say that they would be supporting it.

    4. Re:OpenCL? by 3.1415926535 · · Score: 1

      CUDA and OpenCL are not exclusive, they're at different layers in the driver stack. If you look at the NVIDIA slides, you'll see that C, OpenGL, DX11 Compute, and Fortran are all just frontend languages that compile to/run on top of CUDA.

  12. Re:Well, it works awesome if your problem is parel by Anonymous Coward · · Score: 2, Interesting

    We've run some signal processing on a Tesla card, and get roughly 500x improvement over (somewhat poorly written) code for a Core 2 Duo.
    ~8 hr on a Core 2 Duo
    ~1.5 hr on Core i7
    seconds on Tesla

  13. Re:Well, it works awesome if your problem is parel by Muerte23 · · Score: 2, Informative

    Well I didn't say my code was *well* written. Apparently there's a lot of trickery with copying global memory to cached memory to speed up operations. Cached memory takes (IIRC) one clock cycle to read or write, and global GPU memory takes six hundred cycles. And there's all this whatnot and nonsense about aligning your threads with memory locations that I don't even bother with.

  14. OpenCL is an Open Standard Compute Language by Gary+W.+Longsine · · Score: 5, Informative

    It's not really clear what you're looking for, possibly because you're looking for the wrong thing. It might help if you first spend an hour or three learning a little more about OpenCL, and reading up at various sites to see who's doing what.

    OpenCL is an Open Standard compute language which comprises:
    • a language extended from C99,
    • a platform (hardware + OpenCL-aware device driver), and
    • a compiler and runtime (which may decide where to send a compute task at run time).

    If you're writing an OpenCL-aware device driver for a GPU, you'll probably need to wait a bit for some open source examples. It's reasonably likely that there will be some included in Darwin (once updated for Snow Leopard).

    Look to the LLVM project (sponsored heavily by Apple and others) for an open source compiler which will (if it doesn't already) know about OpenCL.

    It sounds like you might be looking for a higher level API which allows you to more easily use OpenCL, or possibly for language bindings to Java or Python perhaps? I suspect you'll see those coming along, once Apple ships Snow Leopard, and people have a chance to kick the tires, and then integrate LLVM into their tool chains, extend various higher level API, bridge to Java and whatnot.

    The earliest high level API to take easy and broad advantage of OpenCL will probably be from Apple, of course. They'll likely provide some nicely automatic ways to take advantage of OpenCL without programming the OpenCL C API directly. As a Cocoa programmer, you'll be using various high level objects, maybe an indexer for example, which have been taught new OpenCL tricks. You'll just recompile your program and it will tap the GPU as appropriate and if available. The Cocoa implementation is closed source, but people will see what's possible and emulate it in various open source libraries, on other platforms, for Java and other languages.

    Here's a good place to start: OpenCL - Parallel Computing on the GPU and CPU. Follow up with a google search.

    --
    If you mod me down, I shall become more powerful than you could possibly imagine.
  15. OpenCL - UnTied to a card by Gary+W.+Longsine · · Score: 1

    That's the whole point of the OpenCL architecture, to let the compiler figure out the hardware specific optimizations. If you want a cross platform, GPU-independent mechanism to:

    [ _Booming_ _Monster_ _Truck_ _Voice_]
    Tap the hidden potential of your GPU! then you want OpenCL.

    --
    If you mod me down, I shall become more powerful than you could possibly imagine.
    1. Re:OpenCL - UnTied to a card by Anonymous Coward · · Score: 0

      nVidia could not manage to make the magical optimizing compiler for their own API and their own hardware, nor could ATI/AMD make such a compiler for their API and their hardware. Why on earth are people expecting that the OpenCL implementations are going to manage to do any better? Furthermore, the OpenCL code that I've looked at so far in the beta OpenCL SDK from nVidia is very similar (in design and optimization) to the equivalent code from the CUDA SDK.

    2. Re:OpenCL - UnTied to a card by mdarksbane · · Score: 2, Informative

      And as someone who has worked in GLSL (which is a similar level of abstraction as OpenCL) I can say you'll still see major differences even between cards from the same vendor.

      I remember several minor tweaks in our code that gave 20% performance boosts on one card and 20% loss on another, and that was without ever actually getting into the assembler. Video games already often have largely different rendering paths for different cards when it comes to specific shader effects.

  16. MIMD by Gary+W.+Longsine · · Score: 1

    Apple and other OpenCL partners are undoubtedly looking forward, beyond SIMD, to the coming generation of MIMD capable GPU such as the nVIDIA GT300.

    --
    If you mod me down, I shall become more powerful than you could possibly imagine.
  17. Ya... by msimm · · Score: 1

    For once in my life I had to RTFA (all the way through) to see if he was really serious.

    In extreme cases, over-exposure to computer radiation can cause schizophrenia

    That explains so much about me. Classic. Great link. ;-)

    --
    Quack, quack.
  18. What About Multiple GPU Cards in 1 Host? by Doc+Ruby · · Score: 2, Insightful

    Those benchmarks show that even older ($120-140) nVidia GPU cards can really speed up some processing tasks, especially transcoding video. But what I think is even more exciting than just the acceleration from offloading CPU to GPU is using multiple GPU cards in a single host PC. Stuff a $1000 PC with $1120 in GPUs (like 8 $140 nVidia cards), and that's 1024 parallel cores, anywhere from 16x to 56x the performance at only just over double the price. PCI-e should make the data parallel fast enough to feed the cards. I bet that 8 $1000 cards stuffed into a $1000 PC would be something like 200x to 4000x for only 9x the price.
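
    The price arithmetic above can be checked with a few lines. This only reproduces the cost and core-count figures quoted in the comment; the 16x to 56x and 200x to 4000x performance estimates are the poster's guesses and are not modeled here. The function name build is made up for illustration:

```python
def build(pc_price, card_price, n_cards):
    """Return (total cost, cost relative to the bare PC)."""
    total = pc_price + card_price * n_cards
    return total, total / pc_price

# Eight $140 cards in a $1000 PC.
budget_total, budget_ratio = build(1000, 140, 8)
# Eight $1000 cards in a $1000 PC.
highend_total, highend_ratio = build(1000, 1000, 8)
# 1024 cores spread over 8 cards implies this many cores per card.
cores_per_card = 1024 // 8

print(budget_total, budget_ratio)
print(highend_total, highend_ratio)
print(cores_per_card)
```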

    So what I want to see is benchmarks for whole render farms. I want to see HD video transcoded into H.264 and other formats simultaneously on the fly, in realtime, with true fast-forward, in multiple independent streams from the same master source. This stuff is possible now on a reasonable budget.

    --
    make install -not war

    1. Re:What About Multiple GPU Cards in 1 Host? by adolf · · Score: 1

      Cool. Sign me up.

      Just one problem: Where can I find a $1000 PC with 8 available PCI Express x16 slots? The best machine I have at the moment only has three, and 8 won't even fit into a normal ATX case.

    2. Re:What About Multiple GPU Cards in 1 Host? by TubeSteak · · Score: 3, Interesting

      Those benchmarks show that even older ($120-140) nVidia GPU cards can really speed up some processing tasks, especially transcoding video. But what I think is even more exciting than just the acceleration from offloading CPU to GPU is using multiple GPU cards in a single host PC. Stuff a $1000 PC with $1120 in GPUs (like 8 $140 nVidia cards), and that's 1024 parallel cores, anywhere from 16x to 56x the performance at only just over double the price.

      Your passwords are no longer safe.
      It used to require days for a cluster of PCs to brute force an 8+ character password.
      Now with a big enough PSU, you can stuff a tower with graphics cards to get it done in hours.
      About the only common hash I can't find a CUDA enabled brute forcer for is NTLM2

      --
      [Fuck Beta]
      o0t!
    3. Re:What About Multiple GPU Cards in 1 Host? by Doc+Ruby · · Score: 1

      My password is probably safe. It might take hours to crack a single password, but what are the odds that it will be my password, of all the billions of them in use now, of all the dozens of passwords I use, each different?

      --
      make install -not war

  19. Re:Well, it works awesome if your problem is parel by parlancex · · Score: 3, Informative

    Actually, what you are referring to is simultaneous DMA and kernel execution, and this is available in every card that has compute 1.1 capability which is actually every card but the very first G80 series cards (8800 GTX and 8800 GTS). The GPU actually executes the DMA and pulls memory that has been allocated as aligned and pagelocked and this can be overlapped with kernel execution, it doesn't have anything to do with GPU or CPU threads. Transfers from non page-locked memory are always synchronous and as such can't be overlapped with kernel execution. But, generally, yes, host -> device memory bandwidth is usually the bottleneck for most CUDA applications. Applications that are able to perform a large amount of processing on the same data if that data will fit simultaneously in device memory are able to mitigate this, but this doesn't usually include supercomputing or general coprocessor-esque applications (transcoding).

  20. Re:Well, it works awesome if your problem is parel by Belisar · · Score: 2, Interesting

    I assume that's what the parent meant.

    As an addendum, the newest CUDA 2.2 (with a chip of the newest generation, i.e. GT200) actually has support for reading directly from (page-locked) host memory inside of GPU kernels... something I believe ATI cards have allowed for a while.

  21. Ok, hit me /. by Anonymous Coward · · Score: 0

    I'm currently running 2x Geforce 9800GTX and dual-booting Ubuntu and XP.

    What interesting and practical things am I, the average schmoe with a gaming pc, able to do with CUDA today? What resources have I been squandering?

  22. Amdahl's law by Anonymous Coward · · Score: 0

    see: Amdahl's law
    "....is used to find the maximum expected improvement to an overall system when only part of the system is improved. It is often used in parallel computing to predict the theoretical maximum speedup using multiple processors." http://en.wikipedia.org/wiki/Amdahl%27s_law
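
    For anyone who wants to play with it, a minimal sketch of the usual form of the law, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the runtime that can be parallelized and n is the number of processors. The function name and the sample fractions are just for illustration.

```python
def amdahl_speedup(parallel_fraction: float, n: int) -> float:
    """Theoretical speedup when `parallel_fraction` of the work runs on n processors."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n)


# Sample fractions are arbitrary; 128 is the per-card core count discussed upthread.
for p in (0.5, 0.9, 0.99):
    print(f"p={p}: {amdahl_speedup(p, 128):.1f}x on 128 cores, "
          f"limit {1.0 / (1.0 - p):.0f}x with unlimited cores")
```

    However many cores you add, the speedup can never exceed 1 / (1 - p), which is why the serial part of a workload matters so much on a GPU.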

  23. Single precision by Gerb · · Score: 1

    You get the big speedup only if you're doing single precision floating point computations.

    On the NVIDIA GTX 280 & 260, a multiprocessor has eight single-precision floating point ALUs (one per core) but only one double-precision ALU (shared by the eight cores). Thus, for applications whose execution time is dominated by floating point computations, switching from single-precision to double-precision will increase runtime by a factor of approximately eight.
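
    A quick sketch of that estimate, assuming runtime splits cleanly into a floating-point-bound part and everything else, and that the floating-point part slows down by the 8:1 ALU ratio described above. The function name and the sample fractions are made up for illustration.

```python
def double_precision_slowdown(fp_fraction: float, sp_to_dp_ratio: float = 8.0) -> float:
    """Relative runtime after switching from single to double precision.

    fp_fraction: share of single-precision runtime spent in floating point math.
    sp_to_dp_ratio: single-precision ALUs per double-precision ALU (8 per the comment).
    """
    return (1.0 - fp_fraction) + fp_fraction * sp_to_dp_ratio


for f in (0.25, 0.5, 1.0):
    print(f"{f:.0%} of time in FP math: {double_precision_slowdown(f):.2f}x runtime")
```

    The full factor of eight only shows up when floating point math dominates the runtime entirely; memory-bound kernels would see less.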

    A lot of my HPC customers do CFD with (1) double precision in (2) Fortran. 1 and 2 are not easy or fast with CUDA.

    --
    There's no place like 127.0.0.1
  24. Re:Well, it works awesome if your problem is parel by kramulous · · Score: 1

    Is that for single or double precision work? Which Xeon exactly? Which compiler? How was the code written for the compiler? Which compiler flags?

    Although I don't dispute your claims, writing code to get max performance out of the newer Xeons is *hard* and you need to be very careful. The 128-bit-wide SSE registers on the 54xx can be extremely handy for codes written the right way.

    I currently have a client that needs to run a lot of this and so far, I have the single cpu version running 10x faster than the parallel version running on 8 cores (single node). Only simple changes thus far although there is a particularly nasty data structure in there that is next for the chopping block.

    Just saying.

    --
    .
  25. Re:Well, it works awesome if your problem is parel by parlancex · · Score: 1

    Yeah, zerocopy is what they're calling it. It's most interesting in Nvidia's latest integrated chipsets because the latency is much lower than across the PCI-E bus, which allows for some interesting applications (it wouldn't be that hard to write a sound driver that could process almost everything in hardware on your GPU, and you could probably use the SPDIF mixed out over HDMI to actually output the sound directly).