Slashdot Mirror


AMD's OpenCL Allows GPU Code To Run On X86 CPUs

eldavojohn writes "Two blog posts from AMD are causing a stir in the GPU community. AMD has created and released the industry's first OpenCL implementation that allows developers to code against AMD's graphics API (normally only used for their GPUs) and run it on any x86 CPU. Now, as a developer, you can divide the workload between the two as you see fit instead of having to commit to either GPU or CPU. Ars has more details."

  1. Nice by clarkn0va · · Score: 5, Interesting

    Good on them. Now how about an API that allows me to run GPU code on the GPU? The day I can play 1080p mkvs from a netbook on AMD/ATI hardware is the day I'll quit buying nvidia.

    --
    I am literally 3000 tokens away from the chaotic crossbow --Stephen
    1. Re:Nice by Anonymous Coward · · Score: 2, Funny

      Good on them. Now how about an API that allows me to run GPU code on the GPU? The day I can play 1080p mkvs from a netbook on AMD/ATI hardware is the day I'll quit buying nvidia.

      *Head Explodes*

    2. Re:Nice by clarkn0va · · Score: 5, Informative

      I suppose I could have been clearer. I'm talking about gpu decoding of HD video, conspicuously absent on AMD hardware in Linux, fully functional on NVIDIA.

      --
      I am literally 3000 tokens away from the chaotic crossbow --Stephen
    3. Re:Nice by MostAwesomeDude · · Score: 5, Informative

      AMD/ATI only offers GPU-accelerated decoding and presentation through the XvBA API, which is only available to their enterprise and embedded customers. People seem to always forget that fglrx is for enterprise (FireGL) people first.

      Wait for the officially supported open-source radeon drivers to get support for GPU-accelerated decoding, or (God forbid!) contribute some code. In particular, if somebody would write a VDPAU frontend for Gallium3D...

      --
      ~ C.
    4. Re:Nice by Briareos · · Score: 3, Insightful

      I suppose I could have been clearer. I'm talking about gpu decoding of HD video, conspicuously absent on AMD drivers in Linux, fully functional on NVIDIA.

      Fixed that for you. Or does installing Linux somehow magically unsolder the video decoding part of AMD's GPUs?

      np: Death Cab For Cutie - Information Travels Faster (The Photo Album)

      --

      "I'm not anti-anything, I'm anti-everything, it fits better." - Sole

    5. Re:Nice by Bootarn · · Score: 2, Insightful

      Damn, you beat me to it!

      The problem now is the lack of applications that enable end users to benefit from having a powerful GPU. This will be the case until there's a standard API which works across multiple GPU architectures. Having both CUDA and OpenCL is one too many.

    6. Re:Nice by Anonymous Coward · · Score: 2, Interesting

      That's hilarious. Maybe you should quit buying nvidia hardware, then.

      .

      Maybe I should be a little clearer: you should have quit buying nvidia hardware in September of 2008, because hardware acceleration for video on Linux has been available since then, with the official AMD/ATI driver.

    7. Re:Nice by clarkn0va · · Score: 4, Funny

      does installing Linux somehow magically unsolder the video decoding part of AMD's GPUs?

      I'm not going to lie to you; I don't know the answer to that question, and I'm not about to make any assumptions.

      --
      I am literally 3000 tokens away from the chaotic crossbow --Stephen
    8. Re:Nice by hairyfeet · · Score: 2, Interesting

      I thought the AMD guys are releasing the specs so the Linux guys can code pretty much any goodie they want? I don't know how high def on AMD is/isn't on Linux, but one of the reasons why I went AMD for my new PC was how well their "bang for the buck" has gotten. My 780V board played videos (and Bioshock surprisingly) smooth as butter until my PCIe card came in, which I couldn't believe supported H264, WMV9, DivX, MPG, and a few others right out of the box with no fiddling. All that and a gig of RAM on a 4650 for a measly $50! (actually $37 when the rebate gets here)

      You may be unhappy at AMD for not having high def drivers out yet for your OS, but me? I'm fricking amazed at how far we have come. My first x86 was a 66MHz with a whopping 12MB of RAM, and even with an MPG card anything more than video that looked like it belonged on a Sega CD might as well have been a slideshow. And all told I probably sank over $1300 on the PC. Now I have an AMD 7550 with dual 2500MHz cores, 8GB of RAM, 3/4ths of a TB of HDD, and another GB just for the GPU, and all that with a nice 22x DVD burner and XP X64 and I barely spent $600 total. It is just amazing!

      So while it sucks that at this very moment you don't have drivers, I'm sure with the open specs they will come. And frankly the new AMD IGPs rock! Low heat, low power, and really smooth video. Doesn't AMD have binary drivers out for Linux? Don't they work? You'll probably have to use those until the developers can play catch up. Until then you can always dual boot, because I can tell you the Windows drivers are smooth as butter with MPC Home Cinema. And doesn't Nvidia only give out closed spec binary blobs? I thought you Linux guys hated those? Sorry if I am mistaken, but I'm a Windows repairman Jim, not a Linux Guru!

      --
      ACs don't waste your time replying, your posts are never seen by me.
    9. Re:Nice by RiotingPacifist · · Score: 3, Interesting

      look back about a year, since AMD opened up specs & docs, the radeon drivers have become very usable for everyday stuff (maybe not HD video, compiz or games), but the stability blows any prop driver i have ever used (nvidia or fglrx) right out of the water.
      For years linux users/developers have been claiming that we don't want drivers we just want open specs (without NDAs) and "we" would do the hard work. Well AMD have opened specs but it turns out when i say "we" i mean just the 2 guys who can be bothered, fortunately these guys are pretty fucking awesome so development is coming along smoothly but still lags behind what prop drivers offer (in terms of performance anyway). Perhaps radeon does not meet your needs but it is definitely a viable alternative to nvidia for many uses!

      --
      IranAir Flight 655 never forget!
  2. Re:Optimization by Shadow+of+Eternity · · Score: 5, Funny

    Why would anyone ever want to do something well when they can fail at several things?

    --
    A bullet may have your name on it but splash damage is addressed "To whom it may concern."
  3. The real benefit by HappySqurriel · · Score: 4, Insightful

    Wouldn't the real benefit be that you wouldn't have to create two separate code-bases to create an application that both supported GPU optimization and could run naively on any system?

    1. Re:The real benefit by Red+Flayer · · Score: 5, Funny

      to create an application that both supported GPU optimization and could run naively on any system?

      Yes, that's the solution. Have your code run on any system, all too willing to be duped by street vendors, and blissfully unaware of the nefarious intentions of the guy waving candy from the back of the BUS.

      Oh... you meant running code natively... I see.

      --
      "Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
  4. Intel counters with CPU+GPU on a chip by fibrewire · · Score: 5, Interesting

    Ironically Intel announced that they are going to stop outsourcing their GPUs in Atom processors and include the gpu + cpu in one package, yet nobody knows what happened to the dual core Atom N270...

    1. Re:Intel counters with CPU+GPU on a chip by avandesande · · Score: 3, Insightful

      Microsoft wouldn't allow licensing dual cores on netbooks.

      --
      love is just extroverted narcissism
    2. Re:Intel counters with CPU+GPU on a chip by PitaBred · · Score: 2, Interesting

      If that's not monopoly control, I don't know what is. A single company essentially telling another one what it can or can't develop or release?

  5. Re:Optimization by jjoelc · · Score: 2, Insightful

    Actually, this will provide more flexibility in their optimizations. There are some aspects that the CPU does very well, and there are others that the GPU handles well... being able to say "perform THIS function on the CPU and THAT one on the GPU" will free up resources on each chip. Utilizing the CPU for some functions will free up resources on the GPU, and vice versa, allowing (theoretically) the performance of EACH one to be optimized for a better overall experience.

  6. Re:Optimization by Timothy+Brownawell · · Score: 4, Insightful

    So now programmers can write code that will work on either processor and will be optimized on neither. Brilliant. I'm sure this is somehow a great step forward.

    -sigh-

    Um, what? How does the existence of a compiler that generates x86 code prevent the existence of an optimizing compiler that generates GPU instructions?

  7. Makes sense by m.dillon · · Score: 3, Interesting

    Things have been slowly moving in this direction already, since game makers have not been using available cpu horsepower very effectively. A little z-buffer magic and there is no reason why the object space couldn't be separated into completely independent processing streams.

    -Matt

  8. Use both at the same time? by TejWC · · Score: 2, Interesting

    I haven't read too much of OpenCL (just a few whitepapers and tutorials) but does anybody know if you can use both the GPU and CPU at the same time for the same kind of task? For example, if I want a single "kernel" run 100 times, can I send 4 runs to the quad-core CPU and the rest to the GPU? If so, this would be a big win for AMD.
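One way to picture the split asked about here: a minimal sketch in plain Python (not OpenCL host code) that divides 100 work items between two devices in proportion to an assumed relative throughput. The device names, the throughput numbers, and the `split_work` helper are all hypothetical; a real OpenCL host program would instead enqueue part of the work on each device's own command queue.

```python
def split_work(total_items, throughputs):
    """Split total_items across devices in proportion to throughput.

    throughputs maps a device name to an assumed relative speed.
    Returns a dict of device name -> (start, end) half-open index range.
    """
    total_speed = sum(throughputs.values())
    names = list(throughputs)
    ranges = {}
    start = 0
    for i, name in enumerate(names):
        if i == len(names) - 1:
            # The last device takes whatever is left, so rounding
            # never drops or duplicates an item.
            count = total_items - start
        else:
            count = round(total_items * throughputs[name] / total_speed)
            count = min(count, total_items - start)
        ranges[name] = (start, start + count)
        start += count
    return ranges


# Hypothetical relative speeds, chosen so the CPU gets 4 of the 100 items.
plan = split_work(100, {"cpu": 4, "gpu": 96})
print(plan)
```

Whether such a split pays off depends on how well the assumed speeds match the real hardware; a badly chosen ratio leaves one device idle while the other finishes.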

  9. Re:Isn't there a fundamental problem... by Eddy+Luten · · Score: 2, Interesting

    IMO, the fundamental problem with OpenCL is the same as with OpenAL, which is that Operating System vendors don't provide a standard implementation as is done with OpenGL.

    (Bus) speed isn't an issue as creating a CPU or GPU context requires a specific creation flag, so one would know what the target platform is.

  10. Re:Isn't there a fundamental problem... by ByOhTek · · Score: 2, Informative

    So, you store the data the GPU is working on in the card's memory, and the data the CPU is working on in system memory.

    yes, it is relatively slow to move between the two, but not so much that the one time latency incurred will eliminate the benefits.

    --
    Self proclaimed typo king, and inventor of the bear destroying coffee table (patent not pending).
  11. Overhyped by TheRaven64 · · Score: 5, Informative
    Compiling OpenCL code as x86 is potentially interesting. There are two ways that make sense. One is as a front-end to your existing compiler toolchain (e.g. GCC or LLVM) so that you can write parts of your code in OpenCL and have them compiled to SSE (or whatever) code and inlined in the calling code on platforms without a programmable GPU. With this approach, you'd include both the OpenCL bytecode (which is JIT-compiled to the GPU's native instruction set by the driver) and the native binary and load the CPU-based version if OpenCL is not available. The other is in the driver stack, where something like Gallium (which has an OpenCL state tracker under development) will fall back to compiling to native CPU code if the GPU can't support the OpenCL program directly.
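The "load the CPU-based version if OpenCL is not available" idea can be sketched as a simple dispatch. This is a minimal Python illustration, not how any real driver or toolchain does it: the `opencl_available` flag, `pick_saxpy`, and both `saxpy_*` functions are hypothetical, and the "GPU" path is only a stand-in that computes the same result on the CPU.

```python
def saxpy_cpu(a, xs, ys):
    """Native fallback: computes a*x + y element by element."""
    return [a * x + y for x, y in zip(xs, ys)]


def saxpy_gpu_standin(a, xs, ys):
    """Hypothetical placeholder for a version compiled for the GPU.

    A real program would hand the kernel to the driver here; this
    stand-in reuses the CPU code so the sketch stays runnable anywhere.
    """
    return saxpy_cpu(a, xs, ys)


def pick_saxpy(opencl_available):
    """Return (backend_name, function), preferring the GPU when usable."""
    if opencl_available:
        return "gpu", saxpy_gpu_standin
    return "cpu", saxpy_cpu


# Pretend the probe found no usable OpenCL device.
name, saxpy = pick_saxpy(opencl_available=False)
result = saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
print(name, result)
```

The calling code only ever sees `saxpy`, so which backend was chosen stays an implementation detail.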

    Having a separate compiler that doesn't integrate cleanly with the rest of your toolchain (i.e. uses a different intermediate representation preventing cross-module optimisations between C code and OpenCL) and doesn't integrate with the driver stack is very boring.

    Oh, and the press release appears to be a lie:

    AMD is the first to deliver a beta release of an OpenCL software development platform for x86-based CPUs

    Somewhat surprising, given that OS X 10.6 betas have included an OpenCL SDK for x86 CPUs for several months prior to the date of the press release. Possibly they meant public beta.

    --
    I am TheRaven on Soylent News
    1. Re:Overhyped by tyrione · · Score: 2, Informative

      AMD is the first to deliver a beta release of an OpenCL software cross development platform for x86-based CPUs

      Source: http://developer.amd.com/GPU/ATISTREAMSDKBETAPROGRAM/Pages/default.aspx

      Being able to target both Windows and Linux is something outside Apple's platform scope.

  12. Re:DIDN'T APPLE COME UP WITH THIS ABOUT A YEAR AGO by TejWC · · Score: 2, Informative

    Ok, I'll feed the troll (this time)

    Anyway, Apple was one of the companies that first came up with the OpenCL standard. Apple worked with Khronos to make it a full standard. AMD is one of the first to publicly release a full implementation of OpenCL which is why this is big news.

  13. Re:Optimization by V!NCENT · · Score: 2, Interesting

    I suppose it really sucks to code in OpenCL and also take advantage of your CPU. It also really sucks that when you have an nVidia card and the code is made for ATI that you can still use it on your CPU. Seriously...

    --
    Here be signatures
  14. Re:Optimization by olsmeister · · Score: 5, Insightful

    Welcome back to the days of the math coprocessor....

  15. Re:Isn't there a fundamental problem... by sarkeizen · · Score: 4, Interesting

    It's difficult to actually figure out what you are talking about here... from what I see this article is about writing code to the AMD stream framework and having it target x86 (or AMD GPUs).
    If your concern is that shipping object code to a card to be processed may end up being so time consuming that it would not be worth it, then I'd say that most examples of this kind of processing I've seen are doing some specific highly scalable task (e.g. MD5 hashing, portions of h264 decode). So clearly you have to do a cost/benefit like you would with any type of parallelization. That said, the cost of shipping code to the card is pretty small. So I would expect any reasonably repetitive task would afford some improvement. You're probably more worried about how well the code can be parallelized rather than the transfer cost.
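The cost/benefit check described here can be written down as a rough timing model. A minimal sketch, assuming made-up numbers: the bus bandwidth, the per-item costs, and the function names are illustrative assumptions, not measurements of any real card.

```python
def cpu_time(items, cpu_sec_per_item):
    """Time to process everything on the CPU, with no transfer."""
    return items * cpu_sec_per_item


def gpu_time(items, gpu_sec_per_item, bytes_moved, bus_bytes_per_sec):
    """Time to solution on the GPU: transfer over the bus plus compute."""
    transfer = bytes_moved / bus_bytes_per_sec
    return transfer + items * gpu_sec_per_item


def worth_offloading(items, cpu_sec_per_item, gpu_sec_per_item,
                     bytes_moved, bus_bytes_per_sec):
    """True when the GPU path, transfer included, beats the CPU path."""
    return (gpu_time(items, gpu_sec_per_item, bytes_moved, bus_bytes_per_sec)
            < cpu_time(items, cpu_sec_per_item))


# Assumed figures: 1e6 items, GPU 10x faster per item, 64 MB moved at 4 GB/s.
heavy = worth_offloading(1_000_000, 1e-6, 1e-7, 64e6, 4e9)
# Same data, but so little work per item that the transfer dominates.
light = worth_offloading(1_000_000, 1e-8, 1e-9, 64e6, 4e9)
print(heavy, light)
```

With these assumed numbers the first case favours the GPU and the second does not, which is the same trade-off as in any other parallelization decision.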

  16. GPUs are dying - the cycle continues by realmolo · · Score: 2, Insightful

    Now that we have CPUs with literally more cores than we know what to do with, it makes sense to use those cores for graphics processing. I think that within a few years, we'll start seeing games that don't require a high-end graphics card- they'll just use a couple of the cores on your CPU. It makes sense, and is actually a good thing. Fewer discrete chips is better, as far as power consumption and heat, ease-of-programming and compatibility are concerned.

    1. Re:GPUs are dying - the cycle continues by Pentium100 · · Score: 3, Insightful

      A dedicated graphics processor will be faster than a general purpose processor. Yes, you could use an 8 core CPU for graphics, or you could use a 4 year old VGA. Guess which one is cheaper.

    2. Re:GPUs are dying - the cycle continues by Khyber · · Score: 3, Insightful

      Hey, my nVidia 9800GTX+ has over 120 processing cores of one form or another in one package..

      Show me an Intel offering or AMD offering in the CPU market with similar numbers of cores in one package.

      --
      Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
    3. Re:GPUs are dying - the cycle continues by SpinyNorman · · Score: 3, Interesting

      For some games that'll be true, but I think it'll be a long time, if ever, before we see a CPU that can compete with a high end GPU especially as the bar gets higher and higher - e.g. physics simulation , ray tracing...

      Note that a GPU core/thread processor is way simpler than a general purpose CPU core and so MANY more can be fit on a die. Compare an x86 chip with maybe 4 cores with something like an NVIDIA Tesla (CUDA) card which starts with 128 thread processors and goes up to 960(!) in a 1U format card! I think there'll always be that 10-100 factor more cores in a high end GPU vs CPU and for apps that need that degree of parallelism/power the CPU will not be a substitute.

    4. Re:GPUs are dying - the cycle continues by ShadowRangerRIT · · Score: 2, Informative

      There's only two ways to do that:

      1. Some of the cores are specialized in the same way that current GPUs are: You may lose some performance due to memory bottlenecks, but you'll still have the specialized circuitry for doing quick vectored floating point math.
      2. You throw out the current graphics model used in 99% of 3D applications, replacing it with ray tracing, and lose 90% of your performance in exchange for mostly unnoticeable improvements in the quality of the generated graphics.

      Of course, you're reading this the wrong way. You think they are trying to replace GPUs with CPUs. They're really just trying to deal with the fact that some systems lack GPUs, and many systems with GPUs will have underutilized CPUs. GPGPU applications are using the specialized GPU hardware for a reason; falling back to CPU is for improved compatibility with low end systems and full hardware utilization on high end ones; it's not intended to get rid of the GPU (defined as any chip specializing in minimal branching, high throughput, vectorized floating point math).

      Take a look at Folding@Home sometime. They have a CPU and GPU client. They are both trying to solve protein folding problems. The CPU, being good at integer math, looks at the problem as a discrete particle simulation. The GPU, being good at bursts of floating point math, models the system in a continuous way (see their site for a complete explanation). While the GPU results have a small margin for error (due to FP rounding), they're still one of the best clients from the perspective of advancing the field, because on similar value hardware (say, a recent Core2Duo vs. an 8800GTX) they solve similar problems 5-10x faster. If they could run the GPU specific code on a CPU it wouldn't do them any good; since the CPU is bad at that type of problem, they'd end up doing worse than running the correct client on the CPU. The CPU clients can double check the GPU results if needed, but the GPU is by far the fastest at sorting plausible from implausible results.

      --
      $_ = "wftedskaebjgdpjgidbsmnjgcdwatb"; tr/a-z/oh, turtleneck Phrase Jar!/; print
    5. Re:GPUs are dying - the cycle continues by Scott+Francis[Mecham · · Score: 2, Funny

      And so, the wheel starts another turn.

      --
      --
  17. What's the story? by trigeek · · Score: 2, Informative

    The OpenCL spec already allowed for running code on a CPU or a GPU. It's just registered as a different type of device. So basically, they are enabling compiling the OpenCL programming language to the x86? I don't really see the story, here.
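To illustrate "registered as a different type of device": in the OpenCL C headers the device type is a bitfield, and the host asks for devices by type. The sketch below mimics that in plain Python. The constant values match `CL/cl.h`; the device list and the `devices_of_type` helper are hypothetical stand-ins for what a real call such as `clGetDeviceIDs` would report.

```python
# Bitfield values as defined in the OpenCL header CL/cl.h.
CL_DEVICE_TYPE_CPU = 1 << 1
CL_DEVICE_TYPE_GPU = 1 << 2
CL_DEVICE_TYPE_ALL = 0xFFFFFFFF

# Hypothetical machine: the names are invented for illustration.
DEVICES = [
    ("quad-core x86 CPU", CL_DEVICE_TYPE_CPU),
    ("discrete graphics card", CL_DEVICE_TYPE_GPU),
]


def devices_of_type(type_mask):
    """Return names of devices whose type bit is set in type_mask."""
    return [name for name, dev_type in DEVICES if dev_type & type_mask]


print(devices_of_type(CL_DEVICE_TYPE_CPU))
print(devices_of_type(CL_DEVICE_TYPE_GPU))
print(devices_of_type(CL_DEVICE_TYPE_ALL))
```

From the host's point of view the same kernel source is then built for whichever devices the query returned.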

    --
    Sometimes I doubt your committment to SparkleMotion!
  18. Re:Optimization by earnest+murderer · · Score: 4, Funny

    The SX is for Sux!

    --
    Platform advocacy is like choosing a favorite severely developmentally disabled child.
  19. Re:Isn't there a fundamental problem... by iluvcapra · · Score: 2, Interesting

    IMO, the fundamental problem with OpenCL is the same as with OpenAL, which is that Operating System vendors don't provide a standard implementation as is done with OpenGL.

    It's still pretty early to say, though Apple provides an API for this with Snow Leopard. I don't know if OpenAL is a bad comparison or not, but as someone that does audio coding, OpenAL is the biggest joke of an API yet devised by man. OpenAL has little support because it's an awful and useless set of resources and features.

    --
    Don't blame me, I voted for Baltar.
  20. Not any time soon by Sycraft-fu · · Score: 4, Insightful

    I agree that the eventual goal is everything on the CPU. After all, that is the great thing about a computer. You do everything in software, you don't need dedicated devices for each feature, you just need software. However, even as powerful as CPUs are, they are WAY behind what is needed to get the kind of graphics we do out of a GPU. At this point in time, dedicated hardware is still far ahead of what you can do with a CPU. So it is coming, but probably not for 10+ years.

  21. UniversCL by phil_ps · · Score: 2, Interesting

    Hi, I am working on an OpenCL implementation sponsored by google summer of code. It is nearly done supporting the CPU and the Cell processor. This news has come as a blow to me. I have struggled so much with my open source project and now a big company is going to come and trample all over me. boo hoo. http://github.com/pcpratts/gcc_opencl/tree/master

  22. Re:Expect more of this in the near future by ShadowRangerRIT · · Score: 2, Interesting

    I wouldn't be so sure on nVidia. They appear to think CUDA is a better system, and from what I've heard and seen, they're right. OpenCL appears to be more limited in scope and harder to optimize, partially due to OpenCL being written as a spec for abstract, heterogeneous hardware, while CUDA was written with the 8000+ series nVidia cards in mind. They'll probably eventually implement OpenCL, but I suspect it will take a back seat to CUDA.

    OpenCL has advantages in larger systems (e.g. supercomputers built from large numbers of commodity processors), but on a single machine, the heterogeneous support gains you little; CUDA's focus on the GPU often means the GPU does more work than an OpenCL program using both GPU and one or two CPU cores.

    --
    $_ = "wftedskaebjgdpjgidbsmnjgcdwatb"; tr/a-z/oh, turtleneck Phrase Jar!/; print
  23. Re:Isn't there a fundamental problem... by iluvcapra · · Score: 4, Interesting

    My main issues with OpenAL are that it is completely based around the concept of a "listener" interacting with sounds in "space." In other words, it's the OpenGL semantic applied to sound. I looked into it originally because I wanted something more system-independent than Apple's CoreAudio, but really OpenAL is just a videogame language, and it's focused completely around choreographing sounds for interactive emulation of space. OpenAL is hell if you want to apply subjective effects aside from its pre-cooked spatial repertory, or even do something simple like build a mixer with busses.

    In my line, film post-production, the users really don't want to control the "direction" and "distance" of a sound, they want to control the pan and reverb send of a sound; the language and the model is simply too high level for people who are used to setting their own EQ poles and their own pitch-shifts for doppler.... Most of the models OpenAL uses to create distance and direction sensations are pretty subjective, arbitrary, and not really based on current psychoacoustic modelling. It works to an extent, but it doesn't give a sound designer, of a videogame or anything else, the level of control over the environment they generally expect. It certainly doesn't give a videogame sound designer the level of control over presentation that OpenGL gives the modeller or shader developer.

    Oh, and OpenAL doesn't support 96k, 24 bit audio, or 5.1 surround.

    I admit I am not their target audience, and I can see how OpenAL is sufficient for videogame developers, but it really is nothing more than sufficient, and unlike OpenGL, which is universal enough that it can be used in system and productivity software, on computers, phones, and in renderfarms on everything from calendar software to animated movies, OpenAL is strictly for videogames only.

    --
    Don't blame me, I voted for Baltar.
  24. Re:Isn't there a fundamental problem... by kramulous · · Score: 2, Interesting

    I've found that an O(n^3) algorithm or less should be run on cpu. The overhead of moving to gpu memory is just too high. The gen2 pci is faster, but that just means I do #pragma omp parallel for and set the number of processors to 2.

    The comparisons of gpu and cpu code are not fair. They talk about highly optimised code for the gpu but totally neglect the cpu code (only use -O2 with the gcc compiler and that's it). On an E5430 Xeon, with the intel compiler and well written code, an algorithm that is O(n^3) or less is faster on the cpu.
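One way to reason about the transfer overhead is to count arithmetic per byte moved. A minimal sketch, assuming a naive dense matrix multiply as the O(n^3) example, 4-byte values, and that both inputs and the result each cross the bus once; the function names are made up for illustration and nothing here is a measurement.

```python
def matmul_flops(n):
    """Floating-point operations for a naive n x n matrix product: about 2 * n^3."""
    return 2 * n ** 3


def matmul_bytes(n, bytes_per_value=4):
    """Bytes crossing the bus: two input matrices in, one result out."""
    return 3 * n * n * bytes_per_value


def flops_per_byte(n):
    """Arithmetic done per byte transferred; grows linearly with n."""
    return matmul_flops(n) / matmul_bytes(n)


for n in (64, 1024, 8192):
    print(n, round(flops_per_byte(n), 1))
```

Under these assumptions the ratio works out to n/6, so small problems are dominated by the copy while large ones amortise it; an algorithm that does only O(n) or O(n^2) work on the same data never gets that amortisation.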

    --
    .
  25. Re:Isn't there a fundamental problem... by schwaang · · Score: 2, Interesting

    Unless of course you have a device (like newer macbooks) with nvidia's mobile chipset, which shares system memory and can therefore take advantage of Zero-copy access, in which case there is no transfer penalty because there is no transfer. A limited case, but useful for sure.

  26. Re:Isn't there a fundamental problem... by Chris+Burke · · Score: 3, Insightful

    I admit I am not their target audience, and I can see how OpenAL is sufficient for videogame developers, but it really is nothing more than sufficient, and unlike OpenGL, which is universal enough that it can be used in system and productivity software, on computers, phones, and in renderfarms on everything from calendar software to animated movies, OpenAL is strictly for videogames only.

    Um, yeah. I have only used it sparingly, but it has always been my understanding that OpenAL was a library for doing spatial audio, in particular for 3D games. I never got the impression that it was supposed to just be an arbitrary audio api. I never got the impression that it was supposed to be for anyone who wasn't specifically interested in spatial audio.

    I mean there are plenty of other cross-platform sound libraries.

    Is OpenAL seriously advertising itself as a general-purpose sound library akin to OpenGL these days? Is it suffering from feature/scope creep? Or is this just a case of picking the wrong tool for the job based on an understandable confusion regarding the OpenFoo nomenclature?

    --

    The enemies of Democracy are
  27. Re:Isn't there a fundamental problem... by Anonymous Coward · · Score: 2, Interesting

    As a related aside to this, how long before GPUs include a form of audio processing as well? We want to offload radiosity effects to video cards. GPGPU is one way, although specialized support for this that utilizes the graphics card's inherent knowledge of object positioning might be somewhat preferable.

    At the same time it might be beneficial to consider a similar, but slightly more general problem. Radiosity utilizes reflectivity "textures" to calculate final light levels. One could easily imagine applying audio reflectivity textures to objects, and simulating the reflections of sound to produce the final sound. Thus if the player is standing on the other side of a large audio absorptive object from the sound source the sound would be muffled. If a sound occurred in a large cavern-style area with appropriate sound textures it would inherently echo. Clearly there are some substantial similarities between the two, although of course the differences are also significant. Nevertheless, it seems reasonably possible to design a GPU hardware addition that is capable of performing the calculations for either, no?

  28. Re:Isn't there a fundamental problem... by kramulous · · Score: 2, Informative

    Not at all absurd. I realise that the gpu is a compute workhorse. That's not the issue. It is the data transfer rate to and from the card. Transferring 3GiB takes quite a while. Pulling the results back takes a while also. That's what kills it. The cpu can get the work done in that time.

    I'm using the cuda blas routines, examples from the sdk and those published as 'glorious almighty' codes. Everything that the card does is timed as it is all time to solution.

    --
    .