An Open Source Compiler From CUDA To X86-Multicore
Gregory Diamos writes "An open source project, Ocelot, has recently released a just-in-time compiler for CUDA, allowing the same programs to be run on NVIDIA GPUs or x86 CPUs and providing an alternative to OpenCL. A description of the compiler was recently posted on the NVIDIA forums. The compiler works by translating GPU instructions to LLVM and then generating native code for any LLVM target. It has been validated against over 100 CUDA applications. All of the code is available under the New BSD license."
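To make the idea of running a GPU kernel on a CPU concrete, here is a minimal sketch in plain C++. It is not Ocelot's implementation, which per the summary translates GPU instructions to LLVM. It only illustrates the general idea that a kernel body written for one logical thread can be executed on a CPU by looping over every block and thread index. The names `Dim`, `saxpy_thread`, and `launch_saxpy` are hypothetical and chosen for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's built-in index variables (1-D only).
struct Dim { unsigned x; };

// The "kernel": one call does the work of one logical GPU thread.
void saxpy_thread(Dim blockIdx, Dim blockDim, Dim threadIdx,
                  float a, const float* x, float* y, std::size_t n) {
    std::size_t i = std::size_t(blockIdx.x) * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];  // guard against the partial last block
}

// Serial "launch": visit every block, and every thread within each block.
// A multicore version could hand different blocks to different CPU threads.
void launch_saxpy(unsigned gridDim, unsigned blockDim,
                  float a, const float* x, float* y, std::size_t n) {
    for (unsigned b = 0; b < gridDim; ++b)
        for (unsigned t = 0; t < blockDim; ++t)
            saxpy_thread({b}, {blockDim}, {t}, a, x, y, n);
}
```

Calling `launch_saxpy(3, 4, 2.0f, x.data(), y.data(), 10)` on two 10-element vectors covers all 10 elements with 12 logical threads; the bounds check discards the two extras.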
This isn't an alternative to CUDA; it lets CUDA code run on x86, but still doesn't do anything for AMD graphics cards. In other words, your choices as a developer are to use OpenCL and have your code run everywhere (AMD, nVidia, x86 slowly), or use CUDA and have your code run on nVidia or x86 slowly.
What possible reason could you have to want to be locked into one GPU vendor?
Why would you go from CUDA (fast floating point) to x86 (slower floating point)?
Is there support yet for double-precision floating point on Nvidia cards?
This makes as much sense as a Wookiee on the planet Endor.
Unless the point is portability, but then why write it in CUDA to begin with?
Because if you don't have the right hardware, CUDA isn't fast floats. It's a program that doesn't run at all.
A bit off topic, but since I'm seeing posts about OpenCL and portability...
OpenCL will indeed get you portability between processors; however, it makes no guarantees about how well that portable code will run. In the end, to get optimum performance you still have to code to the particular architecture on which your code is going to run. For example, performance on Nvidia chips is extremely sensitive to memory access patterns. You could write OpenCL code that runs very well on Nvidia chips but runs poorly on a different architecture.
I'm not saying that portability isn't a good thing, but a lot of people seem to think that OpenCL will solve all their portability problems. It won't. It will only let code run on multiple architectures. You'll still have to more or less hand-optimize for each architecture.
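A rough way to see why memory access patterns matter so much: the sketch below counts how many aligned memory segments a group of 32 threads touches when thread t reads the float at index t * stride. The 128-byte segment size and the function name `segments_touched` are assumptions for illustration, not figures from any particular chip's documentation. The fewer segments touched, the fewer memory transactions the hardware is likely to need.

```cpp
#include <cstddef>
#include <set>

// Count the distinct aligned segments touched when each of warp_size threads
// reads one 4-byte float at element index t * stride.
// segment_bytes = 128 is an assumed, illustrative transaction size.
std::size_t segments_touched(std::size_t stride,
                             std::size_t warp_size = 32,
                             std::size_t segment_bytes = 128) {
    std::set<std::size_t> segments;
    for (std::size_t t = 0; t < warp_size; ++t) {
        std::size_t byte_offset = t * stride * sizeof(float);
        segments.insert(byte_offset / segment_bytes);  // which segment holds it
    }
    return segments.size();
}
```

With these assumptions, a stride of 1 keeps all 32 reads inside a single segment, while a stride of 32 puts every read in its own segment. The same source code can therefore behave very differently depending on how a given architecture services those reads.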
In the course of every project, it will become necessary to shoot the scientists and begin production.