An Open Source Compiler From CUDA To X86-Multicore

Posted by timothy on Wednesday December 23, 2009 @07:06AM from the abstraction-gains-a-layer dept.

Gregory Diamos writes "An open source project, Ocelot, has recently released a just-in-time compiler for CUDA, allowing the same programs to be run on NVIDIA GPUs or x86 CPUs and providing an alternative to OpenCL. A description of the compiler was recently posted on the NVIDIA forums. The compiler works by translating GPU instructions to LLVM and then generating native code for any LLVM target. It has been validated against over 100 CUDA applications. All of the code is available under the New BSD license."

5 of 71 comments

Re:Alternative? by Sloppy · 2009-12-23 08:19 · Score: 3, Informative

He means CUDA was here first, and it does (did) lock you into NVIDIA. So if you jumped on the bandwagon early, your code is NVIDIA-only. If you waited for a standard (OpenCL), or ported your app, then you're cross-platform.

--
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
Re:Alternative? by TheRaven64 · 2009-12-23 08:50 · Score: 4, Informative

"it lets CUDA code run on x86, but still doesn't do anything for AMD graphics cards"
Actually, it does. It lets CUDA code run on any processor that has an LLVM back end. The open source Radeon drivers have an experimental LLVM back end and use LLVM for optimising shader code.

--
I am TheRaven on Soylent News
Re:Doesn't sound like a compiler by MostAwesomeDude · 2009-12-23 09:09 · Score: 3, Informative

There is no LLVM backend for AMD/ATI cards. Of the few of us that actually understand ATI hardware, most of us are working on other things besides GPGPU. Sorry.

--
~ C.
Re:Alternative? by CDeity · 2009-12-23 10:58 · Score: 4, Informative

The greatest challenges lie in accommodating arbitrary control flow among threads within a cooperative thread array. NVIDIA GPUs are SIMD multiprocessors, but they include a thread activity stack that enables serialization of threads when they reach diverging branches. Without hardware support, this kind of thing becomes difficult on SIMD processors, which is why Ocelot doesn't include support for SSE yet. It is also one of the obstacles to supporting AMD/ATI IL at the moment, though solutions are in order.
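As a rough illustration of the serialization described above, here is a minimal sketch of a mask-based activity stack. It is a simplified model, not NVIDIA's hardware mechanism or Ocelot's implementation; the function name, the branch being executed, and the use of Python are all assumptions made for the example. Lanes that disagree on a branch are split into two masks, and the two paths are executed one after the other.

```python
def run_divergent_branch(values):
    """Run `x * 2 if x is even else x + 1` in every lane using activity masks."""
    n = len(values)
    result = list(values)
    active = [True] * n  # all lanes start active

    # Evaluate the branch condition in every active lane.
    cond = [active[i] and values[i] % 2 == 0 for i in range(n)]

    # The stack holds the lane masks that still have to be executed.
    # The else-mask is pushed first, so the then-path is popped and run first.
    stack = [
        ("else", [active[i] and not cond[i] for i in range(n)]),
        ("then", cond),
    ]

    while stack:
        path, mask = stack.pop()
        for lane in range(n):
            if not mask[lane]:
                continue  # lane is masked off while this path runs
            if path == "then":
                result[lane] = values[lane] * 2
            else:
                result[lane] = values[lane] + 1

    # Both paths have run back to back; all lanes reconverge here.
    return result


print(run_divergent_branch([1, 2, 3, 4]))
```

The cost of divergence is visible in the loop: every path is walked across all lanes, even though only some lanes do useful work on each pass.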
Translation from PTX to LLVM to multicore x86 does not necessarily throw away information concerning the PTX thread hierarchy initially. The first step is to express a PTX kernel using LLVM instructions and intrinsic function calls. This phase is [theoretically] invertible and no information concerning correctness or parallelism is lost.
To get to multicore from here, a second phase of transformations inserts loops around blocks of code within the kernel to implement fine-grain multithreading. This is the part that isn't necessarily invertible or easy to translate back to GPU architectures and is what is referenced in the note you are citing.
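The loop-insertion step can be sketched roughly as follows. This is not Ocelot's code: the kernel, the function name, and the choice of Python are assumptions picked for illustration. A kernel with one barrier is split into two regions, and each region is wrapped in a loop over the thread IDs of one cooperative thread array, so a single CPU thread executes all of them in turn.

```python
# Hypothetical kernel, not taken from Ocelot: each thread writes a value to
# shared memory, waits at a barrier, then reads its neighbour's value.
#
#   shared[tid] = data[tid] * 2
#   barrier()
#   out[tid] = shared[(tid + 1) % n_threads]


def run_cta_serialized(data):
    """Emulate one cooperative thread array (CTA) on a single CPU thread."""
    n_threads = len(data)
    shared = [0] * n_threads  # stands in for per-CTA shared memory
    out = [0] * n_threads

    # Region 1: everything before the barrier, looped over all threads.
    for tid in range(n_threads):
        shared[tid] = data[tid] * 2

    # The barrier becomes the boundary between the two loops: no thread
    # starts region 2 until every thread has finished region 1.

    # Region 2: everything after the barrier, looped over all threads.
    for tid in range(n_threads):
        out[tid] = shared[(tid + 1) % n_threads]

    return out


print(run_cta_serialized([1, 2, 3, 4]))
```

Once the per-thread structure has been folded into loops like these, the original thread hierarchy is no longer explicit in the code, which is one way to see why this phase is harder to invert than the first.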
Disclosure: I'm one of the core contributors to the Ocelot project.
Why? by Gregory Diamos · 2009-12-23 14:13 · Score: 3, Informative

So there seem to be several questions as to why people would want to use CUDA when an open standard exists for the same thing (OpenCL).
Well, honestly, I wrote this because, when I started, OpenCL did not exist.
I have heard the following reasons why some people prefer CUDA over OpenCL:
- The toolchains for OpenCL are still immature. They are getting better, but are not quite as bug-free and high-performance as CUDA at this point.
- CUDA has more desirable features. For example, CUDA supports many C++ features such as templates and classes in device code that are not part of the OpenCL specification.
Additionally, I would like to see a programming model like CUDA or OpenCL replace the most widespread models in industry (threads, OpenMP, MPI, etc.). CUDA and OpenCL are each examples of Bulk Synchronous Parallel models, which are explicitly designed with the idea that communication latency and core count will increase over time. Although I think that it is a long shot, I would like to see more applications written in these languages so there is a migration path for developers who do not want to write specialized applications for GPUs, but can instead write an application for a CPU that can take advantage of future CPUs with multiple cores, or GPUs with a large degree of fine-grained parallelism.
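For readers unfamiliar with the Bulk Synchronous Parallel idea, here is a minimal sketch of its superstep structure, simulated sequentially. The function name and the tree-reduction example are assumptions chosen for illustration and are not taken from CUDA, OpenCL, or Ocelot. Each superstep has a phase in which virtual processors read values produced in the previous step, followed by a barrier after which the results become visible.

```python
def bsp_sum(values):
    """Sum a list with a BSP-style tree reduction, simulated sequentially."""
    local = list(values)  # local[p] is the state of virtual processor p
    stride = 1
    while stride < len(local):
        # Communication phase: each receiving processor collects the value
        # its partner held at the end of the previous superstep.
        inbox = {
            p: local[p + stride]
            for p in range(0, len(local) - stride, 2 * stride)
        }
        # Barrier: all messages are delivered before any state is updated.
        for p, received in inbox.items():
            local[p] += received
        stride *= 2
    return local[0] if local else 0


print(bsp_sum([1, 2, 3, 4, 5]))
```

Because processors only interact at superstep boundaries, the same program structure applies whether the virtual processors are mapped to a few CPU cores or to many GPU threads.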
Most of the codebase for Ocelot could be re-used for OpenCL. The intermediate representation for each language is very similar, with the main differences being in the runtime.
Please try to tear down these arguments; it really does help.