Slashdot Mirror


AMD's OpenCL Allows GPU Code To Run On X86 CPUs

eldavojohn writes "Two blog posts from AMD are causing a stir in the GPU community. AMD has created and released the industry's first OpenCL implementation that allows developers to code against AMD's graphics API (normally only used for their GPUs) and run it on any x86 CPU. Now, as a developer, you can divide the workload between the two as you see fit instead of having to commit to either GPU or CPU. Ars has more details."

176 comments

  1. Nice by clarkn0va · · Score: 5, Interesting

    Good on them. Now how about an API that allows me to run GPU code on the GPU? The day I can play 1080p mkvs from a netbook on AMD/ATI hardware is the day I'll quit buying nvidia.

    --
    I am literally 3000 tokens away from the chaotic crossbow --Stephen
    1. Re:Nice by Anonymous Coward · · Score: 2, Funny

      Good on them. Now how about an API that allows me to run GPU code on the GPU? The day I can play 1080p mkvs from a netbook on AMD/ATI hardware is the day I'll quit buying nvidia.

      *Head Explodes*

    2. Re:Nice by clarkn0va · · Score: 5, Informative

      I suppose I could have been clearer. I'm talking about gpu decoding of HD video, conspicuously absent on AMD hardware in Linux, fully functional on NVIDIA.

      --
      I am literally 3000 tokens away from the chaotic crossbow --Stephen
    3. Re:Nice by Anonymous Coward · · Score: 1, Insightful

      No you shouldn't, this way you got +5 interesting AND +5 informative ;)

    4. Re:Nice by MostAwesomeDude · · Score: 5, Informative

      AMD/ATI only offers GPU-accelerated decoding and presentation through the XvBA API, which is only available to their enterprise and embedded customers. People seem to always forget that fglrx is for enterprise (FireGL) people first.

      Wait for the officially supported open-source radeon drivers to get support for GPU-accelerated decoding, or (God forbid!) contribute some code. In particular, if somebody would write a VDPAU frontend for Gallium3D...

      --
      ~ C.
    5. Re:Nice by Anonymous Coward · · Score: 0

      Umm...what netbooks have 1080p resolution that would make this essential?

    6. Re:Nice by Briareos · · Score: 3, Insightful

      I suppose I could have been clearer. I'm talking about gpu decoding of HD video, conspicuously absent on AMD drivers in Linux, fully functional on NVIDIA.

      Fixed that for you. Or does installing Linux somehow magically unsolder the video decoding part of AMD's GPUs?

      np: Death Cab For Cutie - Information Travels Faster (The Photo Album)

      --

      "I'm not anti-anything, I'm anti-everything, it fits better." - Sole

    7. Re:Nice by Anonymous Coward · · Score: 0

      The Radeon R700 chip already supports H.264/MPEG-4 AVC decoding on hardware.

    8. Re:Nice by Bootarn · · Score: 2, Insightful

      Damn, you beat me to it!

      The problem now is the lack of applications that enable end users to benefit from having a powerful GPU. This will be the case until there's a standard API which works across multiple GPU architectures. Having both CUDA and OpenCL is one too many.

    9. Re:Nice by Anonymous Coward · · Score: 1, Insightful

      Why, any netbook that has a DVI, VGA or HDMI port that is connected to a widescreen monitor, silly!

    10. Re:Nice by Anonymous Coward · · Score: 2, Interesting

      That's hilarious. Maybe you should quit buying nvidia hardware, then.

      .

      Maybe I should be a little clearer: you should have quit buying nvidia hardware in September of 2008, because hardware acceleration for video on Linux has been available since then, with the official AMD/ATI driver.

    11. Re:Nice by clarkn0va · · Score: 4, Funny

      does installing Linux somehow magically unsolder the video decoding part of AMD's GPUs?

      I'm not going to lie to you; I don't know the answer to that question, and I'm not about to make any assumptions.

      --
      I am literally 3000 tokens away from the chaotic crossbow --Stephen
    12. Re:Nice by clarkn0va · · Score: 1

      From your link:

      XvBA isn't yet usable by end-users on Linux

      The API for XvBA isn't published yet and we are not sure whether it will be due to legal issues. We're told by a credible source though that X-Video Bitstream Acceleration wouldn't be much of a challenge to reverse-engineer by the open-source community.

      Interesting, but not yet useful (unless you're able to reverse-engineer this type of code, and I'm not). I'm still looking forward to the day when ATI hardware is a viable alternative on Linux.

      --
      I am literally 3000 tokens away from the chaotic crossbow --Stephen
    13. Re:Nice by Trahloc · · Score: 1

      I think the point isn't that the hardware doesn't support it, but that the software can't access it. It's like a ready and willing woman in the other room but there is a lock with no key blocking the way.

      --
      The Goal: A long simple life filled with many complex toys.
    14. Re:Nice by ion.simon.c · · Score: 1

      If the lock has no key, then it cannot be locked.

      Problem solved! :D

    15. Re:Nice by hairyfeet · · Score: 2, Interesting

      I thought the AMD guys are releasing the specs so the Linux guys can code pretty much any goodie they want? I don't know how high def on AMD is/isn't on Linux, but one of the reasons why I went AMD for my new PC was how good their "bang for the buck" has gotten. My 780V board played videos (and Bioshock surprisingly) smooth as butter until my PCIe card came in, which I couldn't believe supported H264, WMV9, DivX, MPG, and a few others right out of the box with no fiddling. All that and a gig of RAM on a 4650 for a measly $50! (actually $37 when the rebate gets here)

      You may be unhappy at AMD for not having high def drivers out yet for your OS, but me? I'm fricking amazed at how far we have come. My first x86 was a 66MHz with a whopping 12MB of RAM, and even with an MPG card anything more than video that looked like it belonged on a Sega CD might as well have been a slideshow. And all told I probably sank over $1300 on the PC. Now I have an AMD 7550 with dual 2500MHz cores, 8GB of RAM, 3/4ths of a TB of HDD, and another GB just for the GPU, and all that with a nice 22x DVD burner and XP X64 and I barely spent $600 total. It is just amazing!

      So while it sucks that at this very moment you don't have drivers, I'm sure with the open specs they will come. And frankly the new AMD IGPs rock! Low heat, low power, and really smooth video. Doesn't AMD have binary drivers out for Linux? Don't they work? You'll probably have to use those until the developers can play catch up. Until then you can always dual boot, because I can tell you the Windows drivers are smooth as butter with MPC Home Cinema. And doesn't Nvidia only give out closed spec binary blobs? I thought you Linux guys hated those? Sorry if I am mistaken, but I'm a Windows repairman Jim, not a Linux Guru!

      --
      ACs don't waste your time replying, your posts are never seen by me.
    16. Re:Nice by Anonymous Coward · · Score: 1, Insightful

      Or just use Windows where we've been enjoying working hardware acceleration and 1080p videos for a long time now.

    17. Re:Nice by clarkn0va · · Score: 1

      We've come a long way in most respects, I'll give you that. Hardware accelerated HD playback on Linux too is happening, but I want it now, see?

      When it comes to open source, I'm part of the pragmatist camp. Yeah, I totally prefer to use the stuff that's open, but then it has to be usable. AMD's video hardware is way more open than nvidia's if you believe the reports, yet time and again I'm disappointed by its poor real-world performance. As I implied earlier in this discussion, ATI has already won my heart, but they have yet to win my dollar.

      At this very moment I'm shopping out a new laptop and really waffling between nvidia--closed, but does great HD--and intel--open, but hasn't delivered on HD in linux yet. It's perplexing, and yes, time will sort it all out. I'm leaning toward intel, but I'll be really disappointed if this promised technology doesn't move beyond vapourware some time in the next year or two.

      --
      I am literally 3000 tokens away from the chaotic crossbow --Stephen
    18. Re:Nice by RiotingPacifist · · Score: 3, Interesting

      Look back about a year: since AMD opened up specs & docs, the radeon drivers have become very usable for everyday stuff (maybe not HD video, compiz or games), but the stability blows any prop driver I have ever used (nvidia or fglrx) right out of the water.
      For years linux users/developers have been claiming that we don't want drivers, we just want open specs (without NDAs) and "we" would do the hard work. Well, AMD have opened specs but it turns out when I say "we" I mean just the 2 guys who can be bothered. Fortunately these guys are pretty fucking awesome so development is coming along smoothly, but it still lags behind what prop drivers offer (in terms of performance anyway). Perhaps radeon does not meet your needs but it is definitely a viable alternative to nvidia for many uses!

      --
      IranAir Flight 655 never forget!
    19. Re:Nice by Jaroslav.Tucek · · Score: 1

      They are working on it. Look here or here. While linux xvba driver support seems almost finished, it might take a while before user space applications make use of the capability.

    20. Re:Nice by hairyfeet · · Score: 1

      If it is a laptop and you don't mind an old Windows repairman's advice, go nvidia. Intel just hasn't ever had any really good IGP, and you have to be REALLY careful now if you are using Linux, because one of their latest IGP offerings is NOT a true Intel, but a rebadged PowerVR chip, and apparently they give a big old finger to Linux users. Let me see if I can find the model... here you go, it is the GMA 500 and they say Linux support is piss poor at best.

      So if it were me, no matter which OS I wanted to be using I would NOT get Intel until Larrabee comes out. They are just too underpowered compared to AMD and Nvidia chips. If the Nvidia blobs are working good for you I would say go Nvidia. After all, this is a laptop, which I'm betting you are gonna wanna keep a little while at least, yes? Videos are only gonna get bigger, effects fancier, and the Nvidia is more likely to keep up with those changes than the current Intel chips. Hey, maybe you could find a nice laptop with the discrete Nvidia AND the Intel IGP? Then you could have the best of both worlds!

      --
      ACs don't waste your time replying, your posts are never seen by me.
    21. Re:Nice by quintesse · · Score: 1

      What the heck, this is /. so I can nitpick as much as I want.

      The OP you referred to said "decoding of HD video ... absent on AMD hardware in Linux" not "from". There's a difference and it's enough to understand his statement correctly (as he meant it).

    22. Re:Nice by dwater · · Score: 1

      > decoding of HD video ... absent from AMD hardware in Linux

      eh? Doesn't make any sense to me.

      --
      Max.
    23. Re:Nice by dwater · · Score: 1

      oh, now it does...when I put the right emphasis on and fill in the '...' with the right words (something like 'is') and group 'from' with 'absent' rather than 'AMD'.

      never mind...

      --
      Max.
    24. Re:Nice by kayditty · · Score: 0

      why are you telling people what music you listen to in your Slashdot posts? could you at least put it in your signature so I won't see it?

  2. Isn't there a fundamental problem... by tjstork · · Score: 1

    In that memory on the card is faster for the card's GPU and system memory is faster for the CPU. Like, I know PCI Express speeds things up, but is it so fast that you don't have to worry about the bottleneck of the system bus?

    --
    This is my sig.
    1. Re:Isn't there a fundamental problem... by InsertWittyNameHere · · Score: 1

      If it was a problem then it wouldn't have been worth it to have a separate GPU in the first place.

      The GPU is there, now let's make it useful as often as possible. And if there is no GPU but two CPUs, then with OpenCL we can use the two CPUs instead.

    2. Re:Isn't there a fundamental problem... by Anonymous Coward · · Score: 0

      Smaller problems are usually faster to run on the CPU, while larger problems can be much faster to run on the GPU. The programmer and the framework have to consider the trade-offs when deciding where to send the workload.

    3. Re:Isn't there a fundamental problem... by Eddy+Luten · · Score: 2, Interesting

      IMO, the fundamental problem with OpenCL is the same as with OpenAL, which is that Operating System vendors don't provide a standard implementation as is done with OpenGL.

      (Bus) speed isn't an issue as creating a CPU or GPU context requires a specific creation flag, so one would know what the target platform is.

    4. Re:Isn't there a fundamental problem... by ByOhTek · · Score: 2, Informative

      So, you store the data the GPU is working on in the card's memory, and the data the CPU is working on in system memory.

      Yes, it is relatively slow to move between the two, but not so much that the one-time latency incurred will eliminate the benefits.

      --
      Self proclaimed typo king, and inventor of the bear destroying coffee table (patent not pending).
    5. Re:Isn't there a fundamental problem... by sarkeizen · · Score: 4, Interesting

      It's difficult to actually figure out what you are talking about here. From what I see, this article is about writing code to the AMD stream framework and having it target x86 (or AMD GPUs).
      If your concern is that shipping object code to a card to be processed may end up being so time consuming that it would not be worth it, then I'd say that most examples of this kind of processing I've seen are doing some specific highly scalable task (e.g. MD5 hashing, portions of h264 decode). So clearly you have to do a cost/benefit analysis like you would with any type of parallelization. That said, the cost of shipping code to the card is pretty small, so I would expect any reasonably repetitive task to afford some improvement. You're probably more worried about how well the code can be parallelized than about the transfer cost.

    6. Re:Isn't there a fundamental problem... by iluvcapra · · Score: 2, Interesting

      IMO, the fundamental problem with OpenCL is the same as with OpenAL, which is that Operating System vendors don't provide a standard implementation as is done with OpenGL.

      It's still pretty early to say, though Apple provides an API for this with Snow Leopard. I don't know if OpenAL is a bad comparison or not, but as someone that does audio coding, OpenAL is the biggest joke of an API yet devised by man. OpenAL has little support because it's an awful and useless set of resources and features.

      --
      Don't blame me, I voted for Baltar.
    7. Re:Isn't there a fundamental problem... by tjstork · · Score: 1

      If your concern is shipping object code to a card to be processed may end up being so time consuming that it would not be worth it

      Not so much the code as the data. If you have a giant array of stuff to crunch, then yeah, shipping it to the card makes sense. But if you have a lot of tiny chunks of data, then it may not make as much sense to ship it all over to the card. That same problem is really what haunts multicore designs as well. It's like you can build a job scheduler that takes a list of jobs and have threads servicing it, but at some point the overhead of having your thread wait to get a job is more than it's worth, and certainly the act of creating a thread is pretty expensive.

      --
      This is my sig.
    8. Re:Isn't there a fundamental problem... by apharmdq · · Score: 1

      Please elaborate. I've been using OpenAL for a long time now, and I've come to prefer it to any other audio API. It may be lower-level than most, but it's fast, robust, and cross-platform. (Of course, it depends on what you're developing for, but to say that it's a joke of an API, you imply that it's useless. From what I've seen in the industry, it seems to be gaining quite a bit of momentum.)

    9. Re:Isn't there a fundamental problem... by iluvcapra · · Score: 4, Interesting

      My main issues with OpenAL are that it is completely based around the concept of a "listener" interacting with sounds in "space." In other words, it's the OpenGL semantic applied to sound. I looked into it originally because I wanted something more system-independent than Apple's CoreAudio, but really OpenAL is just a videogame language, and it's focused completely around choreographing sounds for interactive emulation of space. OpenAL is hell if you want to apply subjective effects aside from its pre-cooked spatial repertory, or even do something simple like build a mixer with busses.

      In my line, film post-production, the users really don't want to control the "direction" and "distance" of a sound, they want to control the pan and reverb send of a sound; the language and the model is simply too high level for people who are used to setting their own EQ poles and their own pitch-shifts for doppler.... Most of the models OpenAL uses to create distance and direction sensations are pretty subjective, arbitrary, and not really based on current psychoacoustic modelling. It works to an extent, but it doesn't give a sound designer, of a videogame or anything else, the level of control over the environment they generally expect. It certainly doesn't give a videogame sound designer the level of control over presentation that OpenGL gives the modeller or shader developer.

      Oh, and OpenAL doesn't support 96k, 24 bit audio, or 5.1 surround.

      I admit I am not their target audience, and I can see how OpenAL is sufficient for videogame developers, but it really is nothing more than sufficient, and unlike OpenGL, which is universal enough that it can be used in system and productivity software, on computers, phones, and in renderfarms on everything from calendar software to animated movies, OpenAL is strictly for videogames only.

      --
      Don't blame me, I voted for Baltar.
    10. Re:Isn't there a fundamental problem... by sarkeizen · · Score: 1

      I suppose it depends on what you mean by "lots of tiny chunks". Clearly doing a single "burst" transfer is better than lots of small ones but if you are still planning to process all these "chunks" of data at the same time then there's no reason why you couldn't just ship them all together and process them individually. Perhaps even from shared memory.

      Unless of course we're talking about a bunch of chunks that are not going to be worked on simultaneously, which goes back to my statement about the degree of parallelism that can be achieved being your chief worry.
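
      A rough way to see the "one burst versus lots of small transfers" point is a toy cost model in Python. Everything numeric here is an assumption picked for illustration (a fixed per-transfer latency and a flat bus bandwidth), not a measurement of any real card or bus, and transfer_time is a made-up helper name.

```python
def transfer_time(num_transfers, total_bytes, latency_s, bandwidth_bytes_per_s):
    """Seconds to move total_bytes across the bus, split over num_transfers
    separate transfers. Each transfer pays a fixed latency; the payload
    itself moves at the given bandwidth regardless of how it is split."""
    return num_transfers * latency_s + total_bytes / bandwidth_bytes_per_s


# Assumed, illustrative figures (hypothetical, not measured):
LATENCY_S = 10e-6    # fixed setup cost per transfer
BANDWIDTH = 4e9      # bytes per second across the bus
CHUNKS = 10_000      # number of tiny chunks
CHUNK_BYTES = 4096   # size of each chunk

total_bytes = CHUNKS * CHUNK_BYTES
separate = transfer_time(CHUNKS, total_bytes, LATENCY_S, BANDWIDTH)
batched = transfer_time(1, total_bytes, LATENCY_S, BANDWIDTH)

print(f"one transfer per chunk: {separate:.5f} s")
print(f"single batched transfer: {batched:.5f} s")
```

      Under this model the payload time is identical either way; only the per-transfer latency multiplies, which is why shipping the chunks together and processing them individually on the card tends to be the better deal.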

    11. Re:Isn't there a fundamental problem... by Elshar · · Score: 1

      No, it'd still be worth it. Right now hardware acceleration (using the GPU to generate graphics) is done via sending instructions to the GPU, which then does all the work of rendering the scene and sending it out to your display.

      What they're talking about is having the *CPU* render the scene, or at least part of it and then handing *THAT* off to the display.

      The problem that the GP was talking about is that there's only so much BW available on the system bus and with a lot of things going on, it's possible to max out that BW and actually cause a degradation of performance if it's not handled correctly.

    12. Re:Isn't there a fundamental problem... by kramulous · · Score: 2, Interesting

      I've found that an O(n^3) algorithm or less should be run on cpu. The overhead of moving to gpu memory is just too high. The gen2 pci is faster, but that just means I do #pragma omp parallel for and set the number of processors to 2.

      The comparisons of gpu and cpu code are not fair. They talk about highly optimised code for the gpu but totally neglect the cpu code (only use -O2 with the gcc compiler and that's it). On an E5430 Xeon, with the intel compiler and well written code, an O(n^3) or less is faster.

      --
      .
    13. Re:Isn't there a fundamental problem... by schwaang · · Score: 2, Interesting

      Unless of course you have a device (like newer macbooks) with nvidia's mobile chipset, which shares system memory and can therefore take advantage of Zero-copy access, in which case there is no transfer penalty because there is no transfer. A limited case, but useful for sure.

    14. Re:Isn't there a fundamental problem... by Anonymous Coward · · Score: 0

      Umh, sorry, but you're quite a bit off.
      A few of the things you mention can be done with EFX, but mostly you don't want to use spatialised audio, so yes, you're looking at the wrong tool.
      OpenAL is a specification, of course it doesn't support your preferred audio format, but implementations might (5.1 in particular is supported, but if you knew OpenAL you'd know why asking for 5.1 doesn't make much sense).
      You also seem to not know much about OpenGL, it is not used for (professional) animated movies and such, in fact OpenGL fails there for the same reason OpenAL does. Both specifications are mainly meant to be used live, for rendering you use other systems that result in much better quality at the expense of rendering time.
      For what it's worth, in a former life I developed with OpenGL and have contributed to two different OpenAL implementations (not much, but I'm familiar with the codebases and spec).

    15. Re:Isn't there a fundamental problem... by iluvcapra · · Score: 1

      For what it's worth, in a former life I developed with OpenGL and have contributed to two different OpenAL implementations (not much, but I'm familiar with the codebases and spec).

      Did you guys actually talk to any sound designers when you designed this spec? There are so many other things you could have done, but instead you chased the chimera of "OpenGL for sound" or rather "a sound design API for people who hate sound design."

      5.1 being implementation-defined is unacceptable. The signal presented on speaker channels should never be a matter of the platform vendor's choice, it must be the designer's.

      --
      Don't blame me, I voted for Baltar.
    16. Re:Isn't there a fundamental problem... by Chris+Burke · · Score: 3, Insightful

      I admit I am not their target audience, and I can see how OpenAL is sufficient for videogame developers, but it really is nothing more than sufficient, and unlike OpenGL, which is universal enough that it can be used in system and productivity software, on computers, phones, and in renderfarms on everything from calendar software to animated movies, OpenAL is strictly for videogames only.

      Um, yeah. I have only used it sparingly, but it has always been my understanding that OpenAL was a library for doing spatial audio, in particular for 3D games. I never got the impression that it was supposed to just be an arbitrary audio api. I never got the impression that it was supposed to be for anyone who wasn't specifically interested in spatial audio.

      I mean there are plenty of other cross-platform sound libraries.

      Is OpenAL seriously advertising itself as a general-purpose sound library akin to OpenGL these days? Is it suffering from feature/scope creep? Or is this just a case of picking the wrong tool for the job based on an understandable confusion regarding the OpenFoo nomenclature?

      --

      The enemies of Democracy are
    17. Re:Isn't there a fundamental problem... by Anonymous Coward · · Score: 0

      I once had to write a fire alarm type app for Mac OS X and found NSSound lacking and CoreAudio too complex, so I used OpenAL. I made the alarm sound "approach" the user, which had the added bonus of a builtin Doppler effect that added to the urgency of the alarm.

    18. Re:Isn't there a fundamental problem... by Anonymous Coward · · Score: 0

      First off, your argument is absurd from a simple theoretical standpoint. GPUs offer more computing power than CPUs, which means that for anything greater than O(n) GPUs will come out ahead for sufficiently large input.

      Secondly, although I certainly have seen some researchers publish suspicious results (including a few cases where the GPU performance improvement was larger than it theoretically could be!), the vast majority are comparing GPU codes to well-written, highly optimized scientific computing codes compiled with an Intel or PGI compiler.

      I suspect you are either used to working with smaller input sizes than are commonly used in GPU kernels, or you are simply a bad CUDA programmer. With an O(n^2) algorithm, a hundred megabytes of CPU-GPU memory transfer -- which can be done in a fraction of a second -- can translate to 625,000,000,000,000 floating point operations. Unless someone has released a general purpose CPU capable of 625TFLOPS in real world settings, it appears O(n^2) is perfectly sufficient to make GPUs worthwhile.
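
      The 625,000,000,000,000 figure can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: it assumes "a hundred megabytes" means 100,000,000 bytes, that each element is a 4-byte single-precision float, and that the O(n^2) algorithm does exactly n^2 operations with a constant factor of 1. None of those assumptions are stated in the post.

```python
BYTES_TRANSFERRED = 100_000_000  # assumption: "a hundred megabytes", decimal
BYTES_PER_FLOAT = 4              # assumption: single-precision elements

# Number of elements that fit in the transfer.
n = BYTES_TRANSFERRED // BYTES_PER_FLOAT

# Assumption: an O(n^2) algorithm doing exactly n^2 floating point operations.
flops = n * n

print(f"n = {n:,} elements")
print(f"n^2 = {flops:,} floating point operations")
```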

    19. Re:Isn't there a fundamental problem... by Anonymous Coward · · Score: 2, Interesting

      As a related aside to this, how long before GPUs include a form of audio processing as well? We want to offload radiosity effects to video cards. GPGPU is one way, although specialized support for this that utilizes the graphics card's inherent knowledge of object positioning might be somewhat preferable.

      At the same time it might be beneficial to consider a similar, but slightly more general problem. Radiosity utilizes reflectivity "textures" to calculate final light levels. One could easily imagine applying audio reflectivity textures to objects, and simulating the reflections of sound to produce the final sound. Thus if the player is standing on the other side of a large audio absorptive object from the sound source the sound would be muffled. If a sound occurred in a large cavern-style area with appropriate sound textures it would inherently echo. Clearly there are some substantial similarities between the two, although of course the differences are also significant. Nevertheless, it seems reasonably possible to design a GPU hardware addition that is capable of performing the calculations for either, no?

    20. Re:Isn't there a fundamental problem... by kramulous · · Score: 2, Informative

      Not at all absurd. I realise that the gpu is a compute workhorse. That's not the issue. It is the data transfer rate to and from the card. Transferring 3GiB takes quite a while. Pulling the results back takes a while also. That's what kills it. The cpu can get the work done in that time.

      I'm using the cuda blas routines, examples from the sdk and those published as 'glorious almighty' codes. Everything that the card does is timed as it is all time to solution.

      --
      .
    21. Re:Isn't there a fundamental problem... by Anonymous Coward · · Score: 0

      Yes, it is absurd. If the amount of computational work grows faster than the memory transfer required, a faster processor will always win for sufficiently large inputs. Note that I am not saying an O(n^2) algorithm (or some other greater-than-linear-time algorithm) will always be better on a GPU; just that it will be better for a large enough input. I've run into cases, like Cholesky Decomposition, where it's simply better for small inputs to run on the CPU. But even there, take a sufficiently large matrix as input and you are better off running on the GPU.

      Think of it this way: the running time (ignoring memory transfer) is roughly n^2/G for GPUs, and n^2/C for CPUs, with G > C for GPU-applicable problems. We add in the time to transfer memory to and from the GPU, getting n^2/G + m*n for the total GPU time, as the memory transfer scales linearly with amount of data transferred. Take sufficiently large n and you find that, because G > C, the total GPU time is less than the CPU time.
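
      The model above can be turned into a break-even calculation. Setting n^2/G + m*n < n^2/C and dividing by n gives n > m*C*G/(G - C). Below is a minimal Python sketch of that; the function names and the values of C, G and m are made up for illustration and are not measurements of any real hardware.

```python
def cpu_time(n, C):
    """CPU running time under the post's model: n^2 / C."""
    return n * n / C


def gpu_time(n, G, m):
    """GPU running time under the post's model: n^2 / G plus linear transfer m*n."""
    return n * n / G + m * n


def break_even_n(C, G, m):
    """Input size above which the GPU wins: n > m*C*G / (G - C).
    Only meaningful when the GPU is the faster processor (G > C)."""
    if G <= C:
        raise ValueError("model assumes G > C")
    return m * C * G / (G - C)


# Hypothetical rates: operations per second and transfer seconds per element.
C = 1e9
G = 1e10
m = 1e-6

n_star = break_even_n(C, G, m)
print(f"break-even input size: about {n_star:.1f} elements")
print(f"at 2x break-even: cpu {cpu_time(2 * n_star, C):.6f} s, "
      f"gpu {gpu_time(2 * n_star, G, m):.6f} s")
```

      Below n_star the transfer term dominates and the CPU is ahead; above it the GPU pulls away. Where n_star actually lands on real hardware depends entirely on the measured values of C, G and m.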

    22. Re:Isn't there a fundamental problem... by squidinkcalligraphy · · Score: 1

      I'm guessing we'll soon get with GPUs what happened with FPUs. Remember FPUs? Maths Co-processors? 80387? A separate chip that handled floating point ops because the CPU did not have those in the instruction set. Eventually merged into the main CPU chip. GPUs: initially on a separate card, but requiring an increasingly faster bus (GPUs have driven the development of high speed buses), now often on the mainboard (true, not top-of-the-line chips yet, but I suspect that has a lot to do with marketing rather than technology) with shared access to the system's main memory. I'm guessing before long the GPU will be on the CPU's die.

      --
      "I think it would be a good idea" Gandhi, on Western Civilisation
    23. Re:Isn't there a fundamental problem... by tyrione · · Score: 1

      IMO, the fundamental problem with OpenCL is the same as with OpenAL, which is that Operating System vendors don't provide a standard implementation as is done with OpenGL.

      (Bus) speed isn't an issue as creating a CPU or GPU context requires a specific creation flag, so one would know what the target platform is.

      http://www.khronos.org/registry/cl/

      Embrace and extend. So far I'm seeing C/C++ APIs and of course Apple extends their own with ObjC APIs.

      What's stopping you from using the C APIs?

      The Core Spec is akin to the OpenGL spec. The custom extensions for Intel, Nvidia and AMD will be based upon their design decisions they implement in their GPUs.

      However, the CPU specs for Intel and AMD are there to leverage with OpenCL.

      What else do you want?

    24. Re:Isn't there a fundamental problem... by ByOhTek · · Score: 1

      I'd argue it is absurd, it's way too simplistic.

      Let's say you have an O(n) algorithm, and a quad core CPU that can handle 4 billion instructions per second per core (16 billion IPS total for the CPU), on average, and the algorithm is highly scalable.

      Now, let's say the number of instructions per input is 1 million. That means 16 thousand inputs takes 1 seconds.

      Now, with a GPU, you might have 128 effective cores, each of which can handle 500 million instructions per second, and each unit requires 2 billion instructions instead of one (assuming a smaller/less efficient instruction set for the task).

      Sixteen thousand inputs will take 32 billion instructions, you have 64 billion instructions per second, so it will take 1/2 of a second to calculate.

      Which means you have 1/2 of a second to transfer the data up and down before the CPU catches up, so in this case the GPU still wins if the dataset for those 16,000 entries is under a couple of gigabytes using PCI Express.

      It's not as simple as calculation order (actually that is probably the least important issue here), it's
      (1) How much can be done in the algorithm in parallel
      (2) How large is the data set
      (3) What is your CPU-GPU transfer speed
      (4) What is the relative power of your CPU vs. GPU.
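
      The worked example above can be checked in a few lines of Python. This sketch uses the corrected figure of 2 million instructions per input on the GPU (see the follow-up post), and the bus rate of 4 GB/s is a hypothetical figure chosen only to show how the "couple gigabytes" ceiling might fall out; it is not a quoted PCI Express spec.

```python
INPUTS = 16_000

# CPU side, figures from the post.
CPU_CORES = 4
CPU_IPS_PER_CORE = 4_000_000_000
CPU_INSTR_PER_INPUT = 1_000_000

# GPU side, figures from the post (2 million per the author's correction).
GPU_CORES = 128
GPU_IPS_PER_CORE = 500_000_000
GPU_INSTR_PER_INPUT = 2_000_000

cpu_seconds = INPUTS * CPU_INSTR_PER_INPUT / (CPU_CORES * CPU_IPS_PER_CORE)
gpu_seconds = INPUTS * GPU_INSTR_PER_INPUT / (GPU_CORES * GPU_IPS_PER_CORE)

# Time left for moving data to and from the card before the CPU catches up.
transfer_budget_s = cpu_seconds - gpu_seconds

# Assumption: a hypothetical effective bus rate in bytes per second.
ASSUMED_BUS_BYTES_PER_S = 4_000_000_000
max_bytes = transfer_budget_s * ASSUMED_BUS_BYTES_PER_S

print(f"CPU compute time: {cpu_seconds} s")
print(f"GPU compute time: {gpu_seconds} s")
print(f"transfer budget:  {transfer_budget_s} s")
print(f"largest round-trip dataset at the assumed rate: {max_bytes / 1e9} GB")
```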

      --
      Self proclaimed typo king, and inventor of the bear destroying coffee table (patent not pending).
    25. Re:Isn't there a fundamental problem... by ByOhTek · · Score: 1

      Sorry for the reply to my own post. In the GPU section, I stated the units taking 2 billion instructions instead of 1; it should read 2 million instructions instead of 1.

      --
      Self proclaimed typo king, and inventor of the bear destroying coffee table (patent not pending).
    26. Re:Isn't there a fundamental problem... by Eddy+Luten · · Score: 1

      Nothing stops anyone from using the APIs, I'm talking about a working implementation.

      OpenGL is supported on pretty much all available platforms and has a standard implementation on them: Windows has opengl32.dll, Linux has Mesa3d, and Apple also has a default implementation.

      I guess the point I'm trying to make here is that an API is worthless without an implementation: the library containing the actual functionality. What are you loading if you don't have an IHV implementation available? Nothing. Just like OpenGL, OpenCL will need a default software implementation supported on all platforms.

      And I can promise you that Microsoft will not be jumping on this OpenCL bandwagon (providing a platform default software implementation), given their development of Direct3D Compute Shaders and the fact that Microsoft is no longer a Khronos partner. If they do in the next version of Windows I'll be very pleasantly surprised.

    27. Re:Isn't there a fundamental problem... by Anonymous Coward · · Score: 0

      true, not top-of-the-line chips yet, but I suspect that has a lot to do with marketing rather than technology

      The problem (well, one of them) with GPUs on the motherboard is the lack of bandwidth to system memory. High end GPUs in particular expect an order of magnitude more memory bandwidth than what is offered by current high end triple channel DDR3. Either you increase system memory bandwidth to the level GPUs require, or you have separate memory banks on the motherboard for the GPU. In the former case you significantly increase the cost of motherboards and system memory, and in the latter case you increase the cost of motherboards and don't really provide a clear-cut advantage over the current discrete card solution. You'd simply be replacing the PCI-E bus with HyperTransport or QPI.

    28. Re:Isn't there a fundamental problem... by Bigjeff5 · · Score: 1

      I apologize in advance; I'm not normally a grammar Nazi as I make the same mistakes myself, but this just made me cringe:

      That means 16 thousand inputs takes 1 seconds.

      Holy crap! If the subject is plural, the verb must be plural. Takes is not plural! I know the "s" throws you off, but takes is singular, take is plural. If that were all I would have been able to refrain from getting all Nazi on you, but "1 seconds"? Seriously?

      Say it with me: 16 thousand inputs take 1 second.

      Usually I make that kind of mistake when I edit and re-edit a statement without checking it thoroughly, so I can see how it might have happened. Still, I was almost unable to read the rest of your post because "inputs takes 1 seconds" was so mind-numbing.

      This one is probably worse, but for some reason not as mind-numbing. Perhaps only because it came after "inputs takes 1 seconds".

      Which means, you have 1/2 of a second to transfer the data up and down, which means, in this case, if the dataset is under a couple gigabytes for those 16,000 entries using PCI Express.

      If the dataset is under a couple gigabytes what? You just left the poor sentance there, hanging, fragmented. It will never be whole. Also, the double "which means," is inane and over-uses the comma, but that's just nit-picking.

      Now, for bonus points, where is the grammar error in my post? It must exist, it's a fundamental law of nature that one always commits a grammar error when correcting another's grammar. The more smug the correction the more glaring the error.

      Happy hunting!

      --
      Security is mostly a superstition... Avoiding danger is no safer in the long run than outright exposure. - Helen Keller
    29. Re:Isn't there a fundamental problem... by ByOhTek · · Score: 1

      I apologize. I have a lot of typos and slips of the finger. If that offends you, you might want to get off the internet and go outside for a while.

      --
      Self proclaimed typo king, and inventor of the bear destroying coffee table (patent not pending).
  3. Re:Optimization by Shadow+of+Eternity · · Score: 5, Funny

    Why would anyone ever want to do something well when they can fail at several things?

    --
    A bullet may have your name on it but splash damage is addressed "To whom it may concern."
  4. The real benefit by HappySqurriel · · Score: 4, Insightful

    Wouldn't the real benefit be that you wouldn't have to create two separate code-bases to create an application that both supported GPU optimization and could run naively on any system?

    1. Re:The real benefit by V!NCENT · · Score: 1

      Thank you _O_

      --
      Here be signatures
    2. Re:The real benefit by Red+Flayer · · Score: 5, Funny

      to create an application that both supported GPU optimization and could run naively on any system?

      Yes, that's the solution. Have your code run on any system, all too willing to be duped by street vendors, and blissfully unaware of the nefarious intentions of the guy waving candy from the back of the BUS.

      Oh... you meant running code natively... I see.

      --
      "Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
    3. Re:The real benefit by Bigjeff5 · · Score: 1

      Yes, that's the solution. Have your code run on any system, all too willing to be duped by street vendors, and blissfully unaware of the nefarious intentions of the guy waving candy from the back of the BUS.

      I don't see the problem, I run strange code all the time and nothing bad has ever happened. That I know of. Yet. ;)

      --
      Security is mostly a superstition... Avoiding danger is no safer in the long run than outright exposure. - Helen Keller
  5. Intel counters with CPU+GPU on a chip by fibrewire · · Score: 5, Interesting

    Ironically Intel announced that they are going to stop outsourcing their GPU's in Atom processors and include the gpu + cpu in one package, yet nobody knows what happened to the dual core Atom N270...

    1. Re:Intel counters with CPU+GPU on a chip by avandesande · · Score: 3, Insightful

      Microsoft wouldn't allow licensing dual cores on netbooks.

      --
      love is just extroverted narcissism
    2. Re:Intel counters with CPU+GPU on a chip by Cycon · · Score: 1

      Microsoft wouldn't allow licensing dual cores on netbooks.

      As far as I can tell, that only applies to Windows XP.

      See this article (which, admittedly, is talking about a "nettop" box, not a netbook):

      ...first thing you see is that it runs on Windows Vista - XP under Microsoft's licensing terms for netbooks limited it to single core CPUs.

      Got anything which specifically states that OSes other than XP (which they've been trying to drop support for for some time now) are restricted with regard to dual core?

      --
      Your Brain + EEG + LEGO Robots = Brainstorms
    3. Re:Intel counters with CPU+GPU on a chip by PitaBred · · Score: 2, Interesting

      If that's not monopoly control, I don't know what is. A single company essentially telling another one what it can or can't develop or release?

    4. Re:Intel counters with CPU+GPU on a chip by Dishmopo · · Score: 1

      Netbook OEMs are offered licenses cheaper than say a laptop OEM (presumably because a netbook is intended to be a low-cost machine). Microsoft is simply saying that a dual core netbook is functionally at the level of a real laptop, and thus needs to purchase laptop OEM licenses instead of netbook OEM licenses. It doesn't sound all that unreasonable to me.

    5. Re:Intel counters with CPU+GPU on a chip by HoppQ · · Score: 1

      Well, essentially Microsoft's monopoly is hurting the end-users by artificially forcing netbooks to be lower-powered than they would be otherwise. I'm sure the manufacturers think it's reasonable since they get to sell more powerful netbooks as upgrades to the old models once Microsoft eases the licensing deal requirements. Or whatever.

      --
      My sig will be released in 2015 third quarter. Rating pending.
    6. Re:Intel counters with CPU+GPU on a chip by Anonymous Coward · · Score: 0

      No, that's not a monopoly, that's just price differentiation. Microsoft just doesn't sell XP for the ridiculously low netbook prices unless the netbook is below their maximum specs. They have adjusted that once already (the max. HDD and display sizes, if I'm not mistaken), and I'd assume that they'll adjust these specs again if a dual core CPU with netbook-level performance and pricing came into widespread use. I don't know if they are planning to continue selling XP for netbooks or if they'll introduce those make-sure-they-don't-use-Linux prices for some edition of Windows 7.

    7. Re:Intel counters with CPU+GPU on a chip by PitaBred · · Score: 1

      Still, price controls that change the market are the hallmark of monopolies. If you can coerce other companies to make inferior products with your prices, the market you are operating in is not free.

  6. Re:Optimization by jjoelc · · Score: 2, Insightful

    Actually, this will provide more flexibility in their optimizations. There are some things that the CPU does very well, and there are others that the GPU handles well. Being able to say "perform THIS function on the CPU and THAT one on the GPU" will free up resources on each chip. Utilizing the CPU for some functions will free up resources on the GPU, and vice versa, (theoretically) allowing you to optimize the performance of EACH one for a better overall experience.

  7. Re:Optimization by Timothy+Brownawell · · Score: 4, Insightful

    So now programmers can write code that will work on either processor and will be optimized on neither. Brilliant. I'm sure this is somehow a great step forward.

    -sigh-

    Um, what? How does the existence of a compiler that generates x86 code prevent the existence of an optimizing compiler that generates GPU instructions?

  8. Makes sense by m.dillon · · Score: 3, Interesting

    Things have been slowly moving in this direction already, since game makers have not been using available CPU horsepower very effectively. A little z-buffer magic and there is no reason why the object space couldn't be separated into completely independent processing streams.

    -Matt

    1. Re:Makes sense by shentino · · Score: 1

      How do you handle translucency when you have a Z buffer?

    2. Re:Makes sense by Anonymous Coward · · Score: 0

      How do you handle translucency when you have a Z buffer?

      I'm sure a quick google search would tell you more, but generally the solution for this is to render all opaque objects first with z-test and z-write on, then turn off z-write and render all the translucent objects from back to front.
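
      A minimal sketch of that ordering, just to make the two passes concrete. It only computes the draw order and the z-write flag; it does not touch any real graphics API, and the `Drawable` type, its fields, and `draw_order` are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Drawable:
    name: str
    distance: float  # distance from the camera; larger means further away
    opaque: bool

def draw_order(scene):
    """Return (name, z_write_enabled) pairs in the order they should be drawn."""
    # Pass 1: opaque objects, any order, with z-test and z-write on.
    opaque = [d for d in scene if d.opaque]
    # Pass 2: translucent objects, back to front, with z-write off
    # (z-test stays on so they are still hidden behind opaque geometry).
    translucent = sorted(
        (d for d in scene if not d.opaque),
        key=lambda d: d.distance,
        reverse=True,
    )
    return [(d.name, True) for d in opaque] + [(d.name, False) for d in translucent]

scene = [
    Drawable("glass", 2.0, opaque=False),
    Drawable("wall", 10.0, opaque=True),
    Drawable("smoke", 5.0, opaque=False),
    Drawable("floor", 1.0, opaque=True),
]
order = draw_order(scene)
print(order)
```

      Sorting whole objects by one distance value is a simplification; intersecting or very large translucent objects need finer-grained sorting.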

    3. Re:Makes sense by Anonymous Coward · · Score: 0

      You turn it off in order to draw the parts of the scene that have non-opaque aspects.

      That said, this means that you must first sort the polygons in the scene in depth-first order (furthest-first) or it won't look right.

  9. Use both at the same time? by TejWC · · Score: 2, Interesting

    I haven't read too much about OpenCL (just a few whitepapers and tutorials), but does anybody know if you can use both the GPU and CPU at the same time for the same kind of task? For example, if I want a single "kernel" run 100 times, can I send 4 to the quad-core CPU and the rest to the GPU? If so, this would be a big win for AMD.

    1. Re:Use both at the same time? by jerep · · Score: 1

      I am pretty sure these are details for the implementation of OpenCL, not for client code. It is the very reason why libraries such as OpenGL/CL/AL/etc. exist, so you don't have to worry about implementation details in your code.

      From what I know of the spec, you would just create your kernel, feed it data, and execute it, the implementation will worry about sharing the work between the CPU and GPU to get optimal performance.

      However, I don't think it would be optimal to have all 4 cores of the CPU running parallel tasks when the GPU has dozens more processing cores dedicated to such tasks; the CPU would be better spent doing system tasks.

    2. Re:Use both at the same time? by Anonymous Coward · · Score: 1, Informative

      From what I know of the spec, you would just create your kernel, feed it data, and execute it, the implementation will worry about sharing the work between the CPU and GPU to get optimal performance.

      No. Any individual OpenCL kernel runs solely on one device (be it CPU, GPU, or otherwise). If you want to run a kernel on multiple devices you must manually divide the work into multiple kernels and set up an OpenCL context on each device you wish to use.
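
      The manual division might look something like the sketch below, using the grandparent's example of 100 work items with 4 going to the CPU. This is plain Python that only computes the per-device ranges; `split_work` and the device names are hypothetical, and actually submitting each chunk to its device through the OpenCL API is left out.

```python
def split_work(total_items, weights):
    """Split range(total_items) into contiguous (offset, count) chunks.

    weights maps a device name to its relative share of the work.
    The last device absorbs any rounding remainder.
    """
    total_weight = sum(weights.values())
    devices = list(weights)
    chunks = {}
    offset = 0
    for i, device in enumerate(devices):
        if i == len(devices) - 1:
            count = total_items - offset
        else:
            count = total_items * weights[device] // total_weight
        chunks[device] = (offset, count)
        offset += count
    return chunks

# 100 work items: 4 for the quad-core CPU, the rest for the GPU.
chunks = split_work(100, {"cpu": 4, "gpu": 96})
print(chunks)
```

      Picking good weights is the hard part; they depend on the relative speed of the devices for that particular kernel.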

  10. DIDN'T APPLE COME UP WITH THIS ABOUT A YEAR AGO??? by Anonymous Coward · · Score: 0

    This is old news, Apple has been touting this for a year now, not AMD.

  11. Overhyped by TheRaven64 · · Score: 5, Informative
    Compiling OpenCL code as x86 is potentially interesting. There are two ways that make sense. One is as a front-end to your existing compiler toolchain (e.g. GCC or LLVM) so that you can write parts of your code in OpenCL and have them compiled to SSE (or whatever) code and inlined in the calling code on platforms without a programmable GPU. With this approach, you'd include both the OpenCL bytecode (which is JIT-compiled to the GPU's native instruction set by the driver) and the native binary and load the CPU-based version if OpenCL is not available. The other is in the driver stack, where something like Gallium (which has an OpenCL state tracker under development) will fall back to compiling to native CPU code if the GPU can't support the OpenCL program directly.

    Having a separate compiler that doesn't integrate cleanly with the rest of your toolchain (i.e. uses a different intermediate representation preventing cross-module optimisations between C code and OpenCL) and doesn't integrate with the driver stack is very boring.

    Oh, and the press release appears to be a lie:

    AMD is the first to deliver a beta release of an OpenCL software development platform for x86-based CPUs

    Somewhat surprising, given that OS X 10.6 betas have included an OpenCL SDK for x86 CPUs for several months prior to the date of the press release. Possibly they meant public beta.

    --
    I am TheRaven on Soylent News
    1. Re:Overhyped by MemoryDragon · · Score: 1

      Compiling OpenCL code as x86 is potentially interesting. There are two ways that make sense. One is as a front-end to your existing compiler toolchain (e.g. GCC or LLVM) so that you can write parts of your code in OpenCL and have them compiled to SSE (or whatever) code and inlined in the calling code on platforms without a programmable GPU. With this approach, you'd include both the OpenCL bytecode (which is JIT-compiled to the GPU's native instruction set by the driver) and the native binary and load the CPU-based version if OpenCL is not available. The other is in the driver stack, where something like Gallium (which has an OpenCL state tracker under development) will fall back to compiling to native CPU code if the GPU can't support the OpenCL program directly.

      Having a separate compiler that doesn't integrate cleanly with the rest of your toolchain (i.e. uses a different intermediate representation preventing cross-module optimisations between C code and OpenCL) and doesn't integrate with the driver stack is very boring.

      Oh, and the press release appears to be a lie:

      AMD is the first to deliver a beta release of an OpenCL software development platform for x86-based CPUs

      Somewhat surprising, given that OS X 10.6 betas have included an OpenCL SDK for x86 CPUs for several months prior to the date of the press release. Possibly they meant public beta.

      I assume so. OpenCL for ATI cards is heaven-sent, since ATI seems to get nowhere with their custom shader language solutions, unlike NVidia, which made heavy inroads with CUDA on the video codec front.
      I am rather sick of having a powerhouse which rivals the best nvidia cards and yet all the codecs use CUDA for video coding acceleration!

    2. Re:Overhyped by caramelcarrot · · Score: 1

      Yeah, CUDA does this already as far as I know. Kernels you write in their version of restricted C can be transparently called as CPU code if you don't have an available physical CUDA device.

    3. Re:Overhyped by tyrione · · Score: 2, Informative

      AMD is the first to deliver a beta release of an OpenCL software cross development platform for x86-based CPUs

      Source: http://developer.amd.com/GPU/ATISTREAMSDKBETAPROGRAM/Pages/default.aspx

      Being able to target both Windows and Linux is something outside Apple's platform scope.

  12. Re:DIDN'T APPLE COME UP WITH THIS ABOUT A YEAR AGO by TejWC · · Score: 2, Informative

    Ok, I'll feed the troll (this time)

    Anyway, Apple was one of the companies that first came up with the OpenCL standard. Apple worked with Khronos to make it a full standard. AMD is one of the first to publicly release a full implementation of OpenCL which is why this is big news.

  13. Re:Optimization by ByOhTek · · Score: 1

    Yeah, it's amazing how things that can generate executables on multiple platforms, things like C, are so amazingly slow.

    Man, why did we ever stop using assembly?

    --
    Self proclaimed typo king, and inventor of the bear destroying coffee table (patent not pending).
  14. Re:Optimization by V!NCENT · · Score: 2, Interesting

    I suppose it really sucks to code in OpenCL and also take advantage of your CPU. It also really sucks that when you have an nVidia card and the code is made for ATI that you can still use it on your CPU. Seriously...

    --
    Here be signatures
  15. Re:Optimization by olsmeister · · Score: 5, Insightful

    Welcome back to the days of the math coprocessor....

  16. Re:DIDN'T APPLE COME UP WITH THIS ABOUT A YEAR AGO by Anonymous Coward · · Score: 0, Informative

    nVidia has had a full implementation of OpenCL out for months now.

  17. Re:DIDN'T APPLE COME UP WITH THIS ABOUT A YEAR AGO by Anonymous Coward · · Score: 0

    However, it's beta and only accessible via the "OpenCL Early Access Program" which you have to apply for.

  18. Re:Optimization by HomelessInLaJolla · · Score: 1

    Insightful, funny, best post yet

    --
    the NPG electrode was replaced with carbon blac
  19. Re:DIDN'T APPLE COME UP WITH THIS ABOUT A YEAR AGO by Sir_Sri · · Score: 1

    This idea isn't new. CUDA allows you to execute your GPU code on the CPU. This is just AMD implementing OpenCL, which afaik is sufficiently new that no one else has done this yet. I would have expected it to be another couple of months before we really saw NVIDIA and AMD start pushing OpenCL when they release new hardware. Obviously they're working on it already, it's just a matter of when anyone can do anything with it.

  20. GPUs are dying - the cycle continues by realmolo · · Score: 2, Insightful

    Now that we have CPUs with literally more cores than we know what to do with, it makes sense to use those cores for graphics processing. I think that within a few years, we'll start seeing games that don't require a high-end graphics card- they'll just use a couple of the cores on your CPU. It makes sense, and is actually a good thing. Fewer discrete chips is better, as far as power consumption and heat, ease-of-programming and compatibility are concerned.

    1. Re:GPUs are dying - the cycle continues by Pentium100 · · Score: 3, Insightful

      A dedicated graphics processor will be faster than a general purpose processor. Yes, you could use an 8 core CPU for graphics, or you could use a 4 year old VGA. Guess which one is cheaper.

    2. Re:GPUs are dying - the cycle continues by Khyber · · Score: 3, Insightful

      Hey, my nVidia 9800GTX+ has over 120 processing cores of one form or another in one package..

      Show me an Intel offering or AMD offering in the CPU market with similar numbers of cores in one package.

      --
      Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
    3. Re:GPUs are dying - the cycle continues by Eddy+Luten · · Score: 1

      CPUs are infamously bad at processing floating point operations, this is the reason that dedicated GPUs were invented in the first place. A graphics processor like the GTX 285 has 240 stream processors that are manufactured for processing floating point numbers but really bad at integer operations. A CPU like a Core 2 Quad has four cores that are really good at integer operations but requires CPU extensions like SSE to do high performance floating point operations.

      Both Intel and AMD are currently manufacturing CPU/GPU hybrids that would kind of balance both these worlds: Larrabee a GPU-like addon, AMD Fusion an on-chip solution. We'll see what kind of API hell they will bring.

    4. Re:GPUs are dying - the cycle continues by SpinyNorman · · Score: 3, Interesting

      For some games that'll be true, but I think it'll be a long time, if ever, before we see a CPU that can compete with a high end GPU, especially as the bar gets higher and higher - e.g. physics simulation, ray tracing...

      Note that a GPU core/thread processor is way simpler than a general purpose CPU core and so MANY more can be fit on a die. Compare an x86 chip with maybe 4 cores with something like an NVIDIA Tesla (CUDA) card which starts with 128 thread processors and goes up to 960(!) in a 1U format card! I think there'll always be that 10-100 factor more cores in a high end GPU vs CPU and for apps that need that degree of parallelism/power the CPU will not be a substitute.

    5. Re:GPUs are dying - the cycle continues by MistrBlank · · Score: 1

      Technology fail.

    6. Re:GPUs are dying - the cycle continues by ShadowRangerRIT · · Score: 2, Informative

      There are only two ways to do that:

      1. Some of the cores are specialized in the same way that current GPUs are: You may lose some performance due to memory bottlenecks, but you'll still have the specialized circuitry for doing quick vectored floating point math.
      2. You throw out the current graphics model used in 99% of 3D applications, replacing it with ray tracing, and lose 90% of your performance in exchange for mostly unnoticeable improvements in the quality of the generated graphics.

      Of course, you're reading this the wrong way. You think they are trying to replace GPUs with CPUs. They're really just trying to deal with the fact that some systems lack GPUs, and many systems with GPUs will have underutilized CPUs. GPGPU applications are using the specialized GPU hardware for a reason; falling back to CPU is for improved compatibility with low end systems and full hardware utilization on high end ones; it's not intended to get rid of the GPU (defined as any chip specializing in minimal branching, high throughput, vectorized floating point math).

      Take a look at Folding@Home sometime. They have a CPU and GPU client. They are both trying to solve protein folding problems. The CPU, being good at integer math, looks at the problem as a discrete particle simulation. The GPU, being good at bursts of floating point math, models the system in a continuous way (see their site for a complete explanation). While the GPU results have a small margin for error (due to FP rounding), they're still one of the best clients from the perspective of advancing the field, because on similar value hardware (say, a recent Core 2 Duo vs. an 8800GTX) they solve similar problems 5-10x faster. If they could run the GPU specific code on a CPU it wouldn't do them any good; since the CPU is bad at that type of problem, they'd end up doing worse than running the correct client on the CPU. The CPU clients can double check the GPU results if needed, but the GPU is by far the fastest at sorting plausible from implausible results.

      --
      $_ = "wftedskaebjgdpjgidbsmnjgcdwatb"; tr/a-z/oh, turtleneck Phrase Jar!/; print
    7. Re:GPUs are dying - the cycle continues by Scott+Francis[Mecham · · Score: 2, Funny

      And so, the wheel starts another turn.

      --
      --
    8. Re:GPUs are dying - the cycle continues by Anonymous Coward · · Score: 0

      No, we won't. Not without massive speedups in the bus speed between RAM and CPU. My card has 57.8 GB/sec transfer rates with a 256 bit bus. It also has 112 stream processors running at 600MHz. This is just stock; I could overclock it for more performance. Using one or two cores simply could not match all 112, given that rasterized rendering is 'embarrassingly parallel'.

      We won't see CPUs capable of that for a while.

    9. Re:GPUs are dying - the cycle continues by johnthorensen · · Score: 1

      Except that GPU architecture is pretty different from that of a CPU. IANAE(xpert), but from what I understand the GPU is very, very, parallel compared to a CPU thanks to how easily parallelized most graphics problems are. Though CPUs are gaining more cores, I think that the difficulty in parallelizing many problems places a practical limit on the CPU's parallelism.

      That's not to say though that a GPU-type parallel core can't be integrated into the CPU package, however. I believe NVIDIA is doing some of this?

    10. Re:GPUs are dying - the cycle continues by NoOneInParticular · · Score: 1

      Actually, ray tracing would be an area where a multi-core CPU would help. There's some progress, but in contrast with scanline rendering, ray tracing is very GPU unfriendly. So, for photo-realism, the future might still be with the CPU.

    11. Re:GPUs are dying - the cycle continues by Anonymous Coward · · Score: 0

      Technobabble advertising win.

    12. Re:GPUs are dying - the cycle continues by youshotwhointhewhat · · Score: 1

      You are fundamentally wrong for many reasons: 1) Highly data-parallel problems (like graphics) are always going to be solved faster on a GPU-like architecture. 2) GPUs are gaining processing power at a higher rate than CPUs. 3) Power/heat/cost for the number of CPUs needed to match the processing power of a GPU-based solution is always going to be worse. The mentality of one size fits all for processor architectures is what is actually dying.

    13. Re:GPUs are dying - the cycle continues by Anonymous Coward · · Score: 0

      Even modern CPUs pale in comparison to some mid-spec GPUs out there in specific operations.

      I remember doing some benchmarks with a highly** optimized matrix multiplication code (no, -O3 doesn't mean highly optimized :p) and a basic implementation in CUDA (nvidia's gpgpu language). The CPU was an Intel Core 2 Duo E7500 (2.93GHz) and the GPU was an nvidia 8400GS. The nvidia card left the CPU in its dust. The card produced about 40x as many FLOPS as the CPU.

      **I even resorted to loop unrolling (writing a bash script to unroll the loop n times, and use the best n I found)

    14. Re:GPUs are dying - the cycle continues by Turiko · · Score: 1

      The difference is that GPUs are simply so much more powerful. In high-end systems, I think it would be more useful to scrap the expensive CPU, swap a cheap-o thing in, and let the GPU handle what the CPU can't.

    15. Re:GPUs are dying - the cycle continues by phil_ps · · Score: 1

      The thing you must know about GPUs is that, currently, threads in groups of 32 all have to do the SAME THING or you get slower performance than a CPU.

    16. Re:GPUs are dying - the cycle continues by blahplusplus · · Score: 1

      "Now that we have CPUs with literally more cores than we know what to do with, it makes sense to use those cores for graphics processing."

      This comment is always trotted out by people who have no clue about hardware.

      CPUs doing graphics are bandwidth limited by main memory (not to mention general architecture). Graphics requires insane bandwidth. GPUs have had way more memory bandwidth than modern CPUs for a long time. There is simply no way CPUs will ever catch up to GPUs because the GPU has dedicated memory bandwidth that destroys mainboard memory bandwidth on modern motherboards.

      A GeForce 285 has a memory clock speed of 2584MHz, with memory bandwidth measured at 159GB/sec.

      That's more than 10 times what an i920 has and, more importantly, that kind of bandwidth is absolutely necessary for high speed graphics.

    17. Re:GPUs are dying - the cycle continues by andy_t_roo · · Score: 1

      Each is individually less than 15% as powerful as a CPU core anyway; if you lose 3/4 of your efficiency with bad code, you'll still end up with more than 4 CPU cores' worth of compute power.
      The "same thing" basically means that if you have a branch in your code and some threads go one way while others go the other way, the two paths are run sequentially.
      the total time to run
      code:
      A
      if B then X else Y
      C

      is a+b+x+y+c if all the threads don't take the same branch.
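
      As a toy model of that timing (a simplification that ignores real hardware details; the function name and the cost numbers are made up for illustration), a group of threads pays for a branch path only if at least one thread in the group takes it:

```python
def group_time(a, b, x, y, c, takes_x):
    """Time for one thread group to run: A; if B then X else Y; C.

    a, b, x, y, c are the costs of each piece of code.
    takes_x holds one boolean per thread: True if that thread takes the X path.
    """
    runs_x = any(takes_x)       # at least one thread needs X
    runs_y = not all(takes_x)   # at least one thread needs Y
    return a + b + (x if runs_x else 0) + (y if runs_y else 0) + c

GROUP = 32
uniform = group_time(1, 1, 5, 7, 1, [True] * GROUP)
divergent = group_time(1, 1, 5, 7, 1, [i % 2 == 0 for i in range(GROUP)])
print(uniform, divergent)
```

      A single stray thread is enough to make the whole group pay for both paths.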

    18. Re:GPUs are dying - the cycle continues by SpinyNorman · · Score: 1

      Maybe current GPU cores/architecture, designed for vertex shading, don't map well to ray tracing, but I'd not be surprised that if ray-tracing were to become mainstream, a different highly parallel architecture (and maybe new algorithm) may be able to accelerate it.

    19. Re:GPUs are dying - the cycle continues by oldpond · · Score: 1

      Sort of like a Cell processor?

    20. Re:GPUs are dying - the cycle continues by mikael · · Score: 1

      It would be like going back to the era of early DOS game programming where you just had the framebuffer, a sound function (sound), two keyboard input functions (getch/kbhit), and everyone wrote their own rendering code.

      --
      Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
    21. Re:GPUs are dying - the cycle continues by Bigjeff5 · · Score: 1

      Since when has NVidia sold CPU's?

      Intel and AMD are doing this, and NVidia is going to be left in the dust. Why do you think they are shifting some of their focus to ultra-high end parallel processing tasks? NVidia is slowly moving away from the desktop market, or at least are building a safety net in case they get pushed out of it. Who knows, maybe they'll team up with VIA to produce a third alternative to the CPU/GPU combo.

      --
      Security is mostly a superstition... Avoiding danger is no safer in the long run than outright exposure. - Helen Keller
  21. Re:Optimization by russotto · · Score: 1

    Man, why did we ever stop using assembly?

    For the kind of really high performance stuff OpenCL is targeted to, we didn't. Look at the low level code in GnuMP, for instance.

  22. Re:DIDN'T APPLE COME UP WITH THIS ABOUT A YEAR AGO by Runefox · · Score: 0

    No they haven't. Only as of last month have they had a release candidate for the developers-only crowd. I think you're thinking of CUDA, which is an nVidia-only technology similar to OpenCL, but differing in implementation (and I believe openness as well). Along with OpenCL, DirectX 11 is also bringing "Compute Shaders" into the DirectX model, making this kind of thing a requirement for a DX11 GPU.

    --
    Screw the rules, I have green hair!
  23. What's the story? by trigeek · · Score: 2, Informative

    The OpenCL spec already allowed for running code on a CPU or a GPU. It's just registered as a different type of device. So basically, they are enabling compiling the OpenCL programming language to the x86? I don't really see the story, here.

    --
    Sometimes I doubt your committment to SparkleMotion!
    1. Re:What's the story? by Anonymous Coward · · Score: 0

      I'm also troubled by the fact that it says "GPU Code" in the title.

      As long as there is an OpenCL-enabled device, be it CPU, GPU or maybe even a PCI card hosting a Cell processor, it should be used in OpenCL applications. But everyone is typically so focused on GPU only.

      This isn't some emulation of a GPU running on CPU's. It's what OpenCL was always meant to be, and the original articles do point this out.
      If GPU coders are surprised by this, they are just ignorant to what OpenCL really is. I'm surprised the CPU drivers didn't come first!

      And a small correction to parent; Not compiling, just having the runtime use all the resources. A OpenCL binary should be able to run on whatever openCL enabled devices the runtime can find.

  24. Expect more of this in the near future by Anonymous Coward · · Score: 1, Interesting

    Note that this OpenCL implementation works for CPU only. GPU support is forthcoming.
    However, we know that Mac OS X (Snow Leopard) will soon be shipping with an OpenCL implementation.
    I think we can expect full OpenCL (CPU & GPU) support from Intel, ATI/AMD, and nVidia sooner rather than later.

    1. Re:Expect more of this in the near future by ShadowRangerRIT · · Score: 2, Interesting

      I wouldn't be so sure on nVidia. They appear to think CUDA is a better system, and from what I've heard and seen, they're right. OpenCL appears to be more limited in scope and harder to optimize, partially due to OpenCL being written as a spec for abstract, heterogeneous hardware, while CUDA was written with the 8000+ series nVidia cards in mind. They'll probably eventually implement OpenCL, but I suspect it will take a back seat to CUDA.

      OpenCL has advantages in larger systems (e.g. supercomputers built from large numbers of commodity processors), but on a single machine, the heterogeneous support gains you little; CUDA's focus on the GPU often means the GPU does more work than an OpenCL program using both GPU and one or two CPU cores.

      --
      $_ = "wftedskaebjgdpjgidbsmnjgcdwatb"; tr/a-z/oh, turtleneck Phrase Jar!/; print
    2. Re:Expect more of this in the near future by UncleFluffy · · Score: 1

      CUDA's focus on the GPU often means the GPU does more work than an OpenCL program using both GPU and one or two CPU cores.

      Do you have evidence for this statement? Code that you can share?

      --

      What would Lemmy do?

    3. Re:Expect more of this in the near future by Chris+Burke · · Score: 1

      CUDA is the GLIDE of the GP-GPU movement. In the short term it may be highly attractive due to features, completeness, optimization, and so forth, and you'll see applications using it for this reason. In the long run it's a dead-end. Just like with rendering APIs, the winners will be one or both of the following: The open and cross-platform API, or the one Microsoft is creating.

      --

      The enemies of Democracy are
    4. Re:Expect more of this in the near future by Anonymous Coward · · Score: 0

      What are you talking about? NVIDIA is supporting both: http://www.nvidia.com:80/object/cuda_opencl.html

    5. Re:Expect more of this in the near future by Anonymous Coward · · Score: 0

      They'll probably eventually implement OpenCL, but I suspect it will take a back seat to CUDA.

      NVIDIA already released their OpenCL driver to registered developers for testing:

      http://www.nvidia.com/object/cuda_opencl.html

    6. Re:Expect more of this in the near future by Anonymous Coward · · Score: 0

      Just FYI - NVIDIA already provides conformant OpenCL 1.0 drivers and has submitted a *GPU* implementation to Khronos already...

      NVIDIA Submits OpenCL 1.0 Driver to Khronos for Conformance Certification for Windows and Linux

    7. Re:Expect more of this in the near future by Anonymous Coward · · Score: 0

      Limited in scope and harder to optimize? That's not what I see from the specifications. Do you have any links backing up your assertion?

  25. Re:Optimization by earnest+murderer · · Score: 4, Funny

    The SX is for Sux!

    --
    Platform advocacy is like choosing a favorite severely developmentally disabled child.
  26. Does OpenCL Make Parallel Programming Easy? by Louis+Savain · · Score: 1

    This is essentially what it comes down to. Does OpenCL make parallel programming of heterogeneous processors easy? The answer is no, of course, and the reason is not hard to understand. Multicore CPUs and GPUs are two incompatible approaches to parallel computing. The former is based on concurrent threads and MIMD (multiple instructions, multiple data) while the latter uses an SIMD (single instruction, multiple data) configuration. They are fundamentally different and no single interface will get around that fact. OpenCL (or CUDA) is really two languages in one. Programmers will have to frequently flip their mode of thinking in order to take effective advantage of both technologies and this is the primary reason that heterogeneous processors will be a pain to program. The other is multithreading, which, as we all know, is a royal pain in the arse in its own right.

    Obviously what is needed is a new universal parallel software model, one that is supported by a single *homogeneous* processor architecture. Unfortunately for the major players, they have so much money and resources invested in last century's processor technologies that they are stuck in a rut of their own making. They are like the Titanic on a collision course with a monster iceberg. Unless the big players are willing and able to make an about-face in their thinking (can a Titanic turn on a dime?), I am afraid that the solution to the parallel programming crisis will have to come from elsewhere. A true maverick startup will eventually turn up and revolutionize the computer industry. And then there shall be weeping and gnashing of teeth among the old guard.

    Read How to Solve the Parallel Programming Crisis if you're interested in an alternative approach to parallel computing.

  27. Not any time soon by Sycraft-fu · · Score: 4, Insightful

    I agree that the eventual goal is everything on the CPU. After all, that is the great thing about a computer. You do everything in software, you don't need dedicated devices for each feature, you just need software. However, even as powerful as CPUs are, they are WAY behind what is needed to get the kind of graphics we do out of a GPU. At this point in time, dedicated hardware is still far ahead of what you can do with a CPU. So it is coming, but probably not for 10+ years.

    1. Re:Not any time soon by ShadowRangerRIT · · Score: 1

      The question is why? Ideology should not make this determination. Assuming the current trajectories continue (or close enough to what we've seen so far), by the time the CPU can do what we want, the GPU will still be able to do it faster and with less waste. Energy costs aren't likely to drop in the next 50 years, and the GPU applications (e.g. 3D modelling/lighting) that we've done with a CPU based approach (ray tracing) usually require 10x the hardware. If one GPU (drawing, for example 200 watts) can do the work of 10 CPUs (each drawing 50 watts), you need to give a compelling, non-ideological reason for why the CPU is the better option. As the increasing number of GPGPU accelerated apps has shown, there are a lot of things that are better done with semi-specialized hardware. No, we don't need a special chip for every complex task, but at the same time it's ludicrous to ignore the advantages of specialization when you have so many tasks that benefit from it.
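
      To spell out the arithmetic in that example (the wattages and the 10x ratio are the illustrative figures from this comment, not measurements):

```python
# Illustrative figures taken from the comment above, not measured data.
GPU_WATTS = 200      # one GPU
CPU_WATTS = 50       # one CPU
CPUS_PER_GPU = 10    # CPUs assumed necessary to match one GPU

cpu_total_watts = CPU_WATTS * CPUS_PER_GPU   # power for the CPU-only setup
power_ratio = cpu_total_watts / GPU_WATTS    # CPU-only power relative to the GPU

print(cpu_total_watts, power_ratio)
```

      Under those assumptions the CPU-only route burns 500 watts against 200, or 2.5 times the power for the same work.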

      --
      $_ = "wftedskaebjgdpjgidbsmnjgcdwatb"; tr/a-z/oh, turtleneck Phrase Jar!/; print
    2. Re:Not any time soon by Miseph · · Score: 1

      Simplicity and size. The fewer components we need, and the smaller they can be, the better. Ultimately, if programmers didn't NEED to split up their code to run on different processors, they wouldn't, because it just makes life harder. Having one chip that handles everything makes that so, and having an API that brings us closer to a place where that makes intuitive sense is a logical progression toward that end.

      --
      Try not to take me more seriously than I take myself.
    3. Re:Not any time soon by darkwing_bmf · · Score: 1

      I don't think you understand. CPU transistor count is getting to the point where turning additional transistors into another general purpose core doesn't make as much sense as making a specialized graphics circuit with them on the same chip.

    4. Re:Not any time soon by Anonymous Coward · · Score: 0

      Moving all the processing to the CPU doesn't require that the CPU be designed around the current symmetric core architecture.
      It seems likely that the plan will be to first eliminate low-end on-motherboard graphics processors by moving them onto the CPU die, and then to develop those on-die graphics processors until they replace more and more of the discrete graphics card market. This kind of multi-target toolkit could help a great deal with this market since systems may eventually need to dynamically move work onto one or another CPU core based on load etc...

  28. Open Source OpenCL Compiler? by onionman · · Score: 1

    So, where can one obtain an open source OpenCL compiler? (Or, to be more precise, an open source compiler which can take OpenCL compliant code and produce object code that will run on my GPU via the driver stack?)

    1. Re:Open Source OpenCL Compiler? by TheRaven64 · · Score: 1

      So, where can one obtain an open source OpenCL compiler? (Or, to be more precise, an open source compiler which can take OpenCL compliant code and produce object code that will run on my GPU via the driver stack?)

      The Gallium3D architecture, which is likely to be the driver architecture for 3D drivers for open source operating systems for the next few years, compiles a bytecode that is a bit lower-level than OpenCL to native GPU code. Gallium has a pluggable architecture that allows different front ends to be plugged in and an OpenCL state tracker (the part that handles API-specific semantics) is under development and should appear in the next version.

      There is also a project to write an OpenCL front-end for LLVM. This is particularly interesting on systems which use an LLVM-based compiler for other code, because it gives the option of compiling OpenCL code to LLVM bitcode and linking this (while performing link-time optimisations) with code from other languages, which makes OpenCL useful for writing vector-heavy, branch-light code. When writing something like a video CODEC, you could compile the OpenCL code twice, once linked into the app directly, calling the OpenCL directly, and one that will be JIT-compiled by the driver.

      --
      I am TheRaven on Soylent News
  29. UniversCL by phil_ps · · Score: 2, Interesting

    Hi, I am working on an OpenCL implementation sponsored by Google Summer of Code. It is nearly done supporting the CPU and the Cell processor. This news has come as a blow to me. I have struggled so much with my open source project and now a big company is going to come and trample all over me. Boo hoo. http://github.com/pcpratts/gcc_opencl/tree/master

    1. Re:UniversCL by Anonymous Coward · · Score: 0

      Why a blow? x86 is not Cell.

    2. Re:UniversCL by phil_ps · · Score: 1

      Well, I guess you are right. It's just that I spent the whole summer working on this, and then I look at the code examples produced by AMD and there are just so many examples. I just don't feel like I have the manpower to compete with AMD, Nvidia, Apple and Intel. I want my version to support all GPU backends, the CPU and the Cell. The OpenCL C compiler is not done but I was thinking about writing it for my master's thesis. But I don't know... Any other open source developers out there with advice/tips on dealing with competition from corporations? Will they steal my stuff and not make it free source? Is it worth it to put effort into open source?

    3. Re:UniversCL by anti-NAT · · Score: 1

      I still encourage you to work on it.

      Firstly, "in theory, practice and theory are the same, in practice they're not." You'll learn from the process of implementing it, and if you provide your code (and a reasonable number of comments), what you've learned will be available to other people who read your code.

      Secondly, with the right licence, e.g. GPL, corporations won't (or shouldn't be able to) steal your code. If they do, you have legal grounds to sue them.

      I don't know the architecture of the ATI/Nvidia GPUs, however from what I understand of the Cell CPUs they might be fairly different with their SPEs. You may encounter problems and develop solutions for them that nobody else will develop. Even better, because your code will be open source, you'll also be openly publishing the solution.

      Competition is good, but competition relies on competitors. If everybody gave up because somebody else had done it, the world wouldn't be anywhere near as advanced as it is - and the Olympic games would be boring, because there'd only ever be one "competitor" in any race.

      --
      The Internet's nature is peer to peer - 20050301_cs_profs.pdf
    4. Re:UniversCL by BitZtream · · Score: 1

      ... You were doomed to fail for multiple reasons. 'nearly done supporting the CPU and the Cell'. ... which CPU? ARM, x86, SPARC, PPC? Are you ignoring all the other implementations that already support OCL on x86?

      If this comes as a blow to you, you didn't do any research before you started and I find it really hard to believe you haven't come across the other existing implementations in your research for your own project.

      --
      Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
    5. Re:UniversCL by phil_ps · · Score: 1

      I just finished the code for Google Summer of Code today. It supports every CPU that is supported by gcc. And it also supports the Cell. It does not have an OpenCL C compiler yet or support for the image API.

    6. Re:UniversCL by Anonymous Coward · · Score: 0

      There are two things to note here. First, a Cell implementation of OpenCL would be very valuable - probably more so than an x86 implementation for the time being. Secondly, don't throw out the work you've done on x86 yet: there have been a few projects to write optimizing x86 compilers for CUDA despite the fact that nVidia provides a CUDA->x86 compiler, because nVidia's achieved very poor performance. That may be the case with AMD's OpenCL->x86 compiler, in which case an optimizing compiler is still very much worthwhile.

    7. Re:UniversCL by TikiTDO · · Score: 1

      Why do you feel like you have to compete? Unless you went in expecting a profit, which is unlikely given the open nature of the project, you are contributing to the progress of humanity.

      Now, given a proper license, your product will likely be used by a few, and maybe even included in the PS3 and other Cell-based systems, spreading your name far and wide. So, look at this as an advertising opportunity. If you release your project soon, you could put on your resume that you were among the first OpenCL implementers, and if it's used by other companies, that's just an extra selling point. Hell, maybe even get in touch with Sony, and let them know that you have an open source project that they may be able to use.

  30. Re:Optimization by AcidPenguin9873 · · Score: 1

    And to take that one step further, both Intel and AMD are planning on integrating the GPU on-die in future products, just like the math coprocessor moved on-die 15-20 years ago.

  31. Re:Optimization by raftpeople · · Score: 1

    The problem is typically with how you set up your data structures to solve the problem at hand. When I converted my CPU code to run on a GPU, I had to go through and re-work the problem. I changed the way my data was stored, which was previously optimized for CPU serial processing and caching etc. to something that matched the GPU's model of queuing up read requests of multiple adjacent words while previously read memory is being processed.

    These types of changes aren't really optimizations the compiler can do.
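
    A rough illustration of the kind of re-layout I mean, as a plain Python sketch (the three-field record, the helper names and the sample values are all hypothetical, and real GPU code would do this on device buffers). It converts an "array of structures" layout into a "structure of arrays" layout so that neighbouring work-items touch neighbouring words:

```python
FIELDS = 3  # hypothetical record layout: x, y, z per element


def aos_index(i, field):
    """Offset of one field of element i when records are stored back to back."""
    return i * FIELDS + field


def soa_index(i, field, n):
    """Offset of the same field when each field has its own contiguous run."""
    return field * n + i


def to_soa(flat_aos, n):
    """Copy a flat array-of-structures buffer into structure-of-arrays order."""
    soa = [0.0] * (n * FIELDS)
    for i in range(n):
        for f in range(FIELDS):
            soa[soa_index(i, f, n)] = flat_aos[aos_index(i, f)]
    return soa


aos = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2, 2.0, 2.1, 2.2, 3.0, 3.1, 3.2]
soa = to_soa(aos, 4)

# Offsets touched when four work-items each read their own x value.
x_reads_aos = [aos_index(i, 0) for i in range(4)]     # strided
x_reads_soa = [soa_index(i, 0, 4) for i in range(4)]  # adjacent
print(x_reads_aos, x_reads_soa)
```

    In the first layout the four reads land three words apart; in the second they are adjacent, which is the access pattern the GPU can batch. The code that consumes the data has to change along with the layout, which is why this tends to be a rewrite and not a compiler flag.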

  32. Re:Optimization by ThePhilips · · Score: 1

    It was already explained above. CPU and GPU are very different at handling things, meaning that top level algorithms used are very different.

    Unless of course you can point at a compiler which can rethink and rewrite the program.

    --
    All hope abandon ye who enter here.
  33. Funny, cus this is about GPU ascendency. by Chris+Burke · · Score: 1

    Now that we have CPUs with literally more cores than we know what to do with, it makes sense to use those cores for graphics processing. I think that within a few years, we'll start seeing games that don't require a high-end graphics card- they'll just use a couple of the cores on your CPU.

    LOL. That's funny, because this is about exactly the opposite -- using the very impressive floating point number crunching power of the GPU to do the work that the CPU used to do. OpenCL is essentially an API for being able to use your GPU for general purpose computing. Not a way to use your CPU to do rendering (OpenGL already does that).

    Your CPU, four cores and all, is a LOOOOOOONG way from being able to do what your graphics card does wrt 3d rendering. That's okay, the tradeoffs are different for something that's supposed to be able to run databases just as competently as finite element analysis. But for raw floating point throughput on embarrassingly parallelizable tasks -- which the 3d rendering pipeline is, and thus why GPUs are optimized around it -- the GPU is miles ahead. Thus the motivation to use it instead of the CPU.

    It makes sense, and is actually a good thing. Fewer discrete chips is better, as far as power consumption and heat, ease-of-programming and compatibility are concerned.

    Well you got that right at least, but the way it's going to happen is that you're still going to have a GPU, but it's going to be on the same piece of silicon as your CPU. Both Intel and AMD have combined CPU/GPU products in the pipe that are supposed to be released in 2011, meaning they have been in development for a number of years now.

    Discrete graphics will live on for quite a while though in situations where low power is less important than performance. Both cpu and gpu having separate memory with their own memory controllers optimized for their needs is a big advantage over sharing a memory bus and memory controller. Not having to fit both functions within a single socket's TDP budget is another.

    Eventually, the built-in UMA graphics may become good enough that it doesn't make sense to have a separate card. In the meantime, discrete graphics cards will live on, and the GPU in general ain't going anywhere -- it's only becoming even more important!

    --

    The enemies of Democracy are
  34. CUDA is the reverse by raftpeople · · Score: 0

    CUDA allows you to easily compile C code to run on the GPU, not the reverse.

  35. Who modded this insightful? by Prien715 · · Score: 1

    If history tells us anything, it's quite the opposite. For years, graphics cards have been getting more and more cores and applications (especially games or anything 3D) have come to rely on them much more than the CPU. I remember playing Half-life 2 with a 5 year old processor and a new graphics card...and it worked pretty well.

    The CPU folk, meanwhile, are being pretty useless. CPUs haven't gotten much faster in the past 5 years; they just add more cores. Which is fine from the perspective of a multiprocess OS, but the fact remains that some algorithms you can parallelize, others you can't...and a GPU with hundreds of cores is only going to be as fast at one of these as its fastest core.

    We'll see. My bet is if Intel/AMD just keep dumping more cores in the processors, they'll risk becoming irrelevant as we'll have more processors than we know what to do with (see the SGI's Prism...which was terribly slow despite having dozens of processors.)
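
    The "some algorithms you can parallelize, others you can't" limit is usually written down as Amdahl's law. A quick sketch (the function name and the sample fractions are mine, purely for illustration):

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: overall speedup when only part of the work scales with cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# A fully serial job gains nothing, however many cores you throw at it.
print(amdahl_speedup(0.0, 512))
# A job that is 90% parallel stays below 10x even with a huge core count.
print(amdahl_speedup(0.9, 10**6))
```

    The serial part sets the ceiling: with 10% of the work stuck on one core, no number of extra cores gets you past a 10x speedup.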

    --
    -- Political fascism requires a Fuhrer.
    1. Re:Who modded this insightful? by Anonymous Coward · · Score: 0

      Parent is mostly talking out of his ass.

      Source engine games, especially Orange box versions like Half Life 2, Episode 1 & 2, Portal, and Team Fortress 2 are very heavily CPU intensive games.

      I've got a dual core Opteron clocked at 2.9 GHz and a GTX260. The CPU pegs itself at 100% and gets red hot while the GPU hardly gets pushed at all. Valve has actually made a lot of effort to move more tasks over to the GPU in later versions of the engine (L4D actually does a better job of utilizing the GPU).

    2. Re:Who modded this insightful? by Prien715 · · Score: 1

      Whatever.

      My system was an Athlon 2200+ with 2 GB RAM and a GeForce 6800. HL2 ran just fine -- albeit not at max res/max detail. As another anecdote, even upgrading my brother's old box (Athlon64 3000+) from a GF5500 to a GF7200 yielded tremendous performance gains.

      --
      -- Political fascism requires a Fuhrer.
  36. Re:DIDN'T APPLE COME UP WITH THIS ABOUT A YEAR AGO by Anonymous Coward · · Score: 0

    The PR freaks have always said CUDA could and would work wherever NVIDIA wants, i.e. on CPUs or supercomputers. Look up the Stanford University "Computer Systems Colloquium - Winter 2008 - Scalable Parallel Programming with CUDA on Manycore GPUs (February 27, 2008)" talk by John Nickolls from NVIDIA. Video should be on the net somewhere.

  37. Not exactly by raftpeople · · Score: 1

    "Now that we have CPUs with literally more cores than we know what to do with,"

    For many problems, multi-core CPUs aren't even close to having enough power; that's why there is so much interest in utilizing the GPU's processing power.

    They are different ends of a spectrum: CPU generally=fast serial processing, GPU generally=slow serial, fast parallel. Some problems require fast serial processing, some require fast parallel processing and some are in between. Both are valuable tools and neither will replace the other, although merging them onto one chip with shared memory/cache would be great.

  38. Unsurprising by SleepyHappyDoc · · Score: 1

    AMD obviously has a vested interest in making their scheme an industry standard, so of course they'd want to support Larrabee with their GPGPU stuff. Larrabee has x86 lineage (of some sort, I'm not clear on exactly what or how), so they'd have to have at least some x86 support to be able to use their scheme on Larrabee. It seems to me that if they were going to bake some x86 support in there, they may as well add regular CPUs in as well (if you already wrote 90% of it, why not write the other 10%?).

    I don't really know anything about this kind of stuff, but this news strikes me as unsurprising, given the environment.

    --
    Stasis is death. Embrace change.
  39. Download link please? by Anonymous Coward · · Score: 1, Funny

    Where is the link to the source tarball?
    Can't find it, just some more mumbo jumbo about delivering seamless integration with the goatse paradigm shift, blah, blah, etc.

  40. Bah. The Amiga did it already. by straponego · · Score: 1

    http://everything2.com/index.pl?node_id=1311164&displaytype=linkview&lastnode_id=1311164

    Exactly the same thing.

    I said EXACTLY!

    [wanders off, muttering and picking bugs out of beard]

  41. Re:DIDN'T APPLE COME UP WITH THIS ABOUT A YEAR AGO by Anonymous Coward · · Score: 0

    My bad, I forgot it was still for developers only. Although frankly it's so easy to become an nVidia "developer" that it may as well be called a public beta.

  42. Undermining Larrabee? by w0mprat · · Score: 1

    Is AMD cleverly trying to undermine Intel's Larrabee threat? If this code can run abstracted enough that it doesn't matter what CPU/GPU is under the hood, this knocks out the main selling point of Larrabee: x86 code.

    (Ars makes a similar point:)

    the fact that Larrabee runs x86 will be irrelevant; so Intel had better be able to scale up Larrabee's performance

    If AMD is working on an abstraction layer that lets OpenCL run on x86, could the reverse be in the works, having x86 code ported to run on CPU+GPGPU as one combined processing resource? AMD may be trying to make its GPUs more like what Intel is trying to achieve with Larrabee - a bridge between CPU and GPU -- yet Intel is originally trying to undermine the GPU as a unique processing platform.

    --
    After logging in slashdot still does not take you back to the page you were on. It's been that way for 20 years.
  43. Re:Optimization by Anonymous Coward · · Score: 1, Funny

    I used to have a 486 40mhz DLC cpu from Texas Instruments. It didn't have a math co-processor... Can you believe it? A TI chip that couldn't do math!

    We used to joke that DLC stood for:

    Da Low Cost

  44. Re:Optimization by Anonymous Coward · · Score: 1, Funny

    The DX is for Dux!

  45. Re:Optimization by lennier · · Score: 1

    "Unless of course you can point at a compiler which can rethink and rewrite the program."

    That's exactly what Lisp was invented for.

    Pity we abandoned it in the 1980s and left it half-built.

    --
    You are not a brain: http://books.google.com/books?id=2oV61CeDx-YC
  46. Re:Optimization by Jamie+Lokier · · Score: 1

    Those types of change aren't all that radical, even though they're not commonly implemented in compilers at the moment, as far as I know.

    You're not describing major algorithm changes, just reorganising data to suit different batching requirements, reorganising loops and so on.
    Reorganising loops is decades old already.

  47. I just had a by Cur8or · · Score: 0

    "why didn't they do this 10 years ago"-moment. Go A-Team!

    --
    Winkey shortcut mapping for 64bit windows. WinKeyPlus
  48. dear SANTA: AMD wishlist by korkakak · · Score: 1

    Hmmmm what about a *working*, *full-featured* linux driver instead of SDKs?

  49. CUDA on x86 by Anonymous Coward · · Score: 0

    The advantage of being able to run the same code on a GPU and an x86 multicore is that some parts of some apps run faster on one or the other, and with a compiler that targets both you can easily move apps between them.

    GPU architectures are becoming very similar to CPU architectures, enough so that it is becoming possible to write compilers that generate efficient code for each. On the NVIDIA side, my group wrote an emulator for running CUDA on x86 ( http://code.google.com/p/gpuocelot/ ). The step from an emulator like this to a compiler is not huge...

  50. ion.SIMIAN.c gets spanked, again? by Anonymous Coward · · Score: 0

    http://tech.slashdot.org/comments.pl?sid=1327945&cid=28981391 see subject above and read all about it in that url link. Ion.SIMIAN.c only brought it on himself, as usual.

    1. Re:ion.SIMIAN.c gets spanked, again? by Anonymous Coward · · Score: 0

      Oh wise APK from the year 1998, please tell me, what is a "url link"?

  51. Re:Optimization by HoppQ · · Score: 1

    The FX is for...

    Dammit, why didn't I know this stuff as a teen?

    --
    My sig will be released in 2015 third quarter. Rating pending.
  52. ion.simon.c is a convicted child rapist by Anonymous Coward · · Score: 0

    ion.simon.c is a convicted child rapist who was caught several years ago raping and molesting little boys.

  53. Re:Optimization by badkarmadayaccount · · Score: 1

    Pure functional or dataflow programming FTW!

    --
    I know tobacco is bad for you, so I smoke weed with crack.