Slashdot Mirror


An Open Source Compiler From CUDA To X86-Multicore

Gregory Diamos writes "An open source project, Ocelot, has recently released a just-in-time compiler for CUDA, allowing the same programs to be run on NVIDIA GPUs or x86 CPUs and providing an alternative to OpenCL. A description of the compiler was recently posted on the NVIDIA forums. The compiler works by translating GPU instructions to LLVM and then generating native code for any LLVM target. It has been validated against over 100 CUDA applications. All of the code is available under the New BSD license."

3 of 71 comments (clear)

  1. Alternative? by Guspaz · · Score: 4, Insightful

    This isn't an alternative to CUDA; it lets CUDA code run on x86, but still doesn't do anything for AMD graphics cards. In other words, your choices as a developer are to use OpenCL and have your code run everywhere (AMD, nVidia, x86 slowly), or use CUDA and have your code run on nVidia or x86 slowly.

    What possible reason could you have to want to be locked into one GPU vendor?

    1. Re:Alternative? by TheRaven64 · · Score: 4, Informative

      it lets CUDA code run on x86, but still doesn't do anything for AMD graphics cards

      Actually, it does. It lets CUDA code run on any processor that has an LLVM back end. The open source Radeon drivers have an experimental LLVM back end and use LLVM for optimising shader code.

      --
      I am TheRaven on Soylent News
    2. Re:Alternative? by CDeity · · Score: 4, Informative

      The greatest challenges lie in accommodating arbitrary control flow among threads within a cooperative thread array. NVIDIA GPUs are SIMD multiprocessors, but they include a thread activity stack that enables serialization of threads when they reach diverging branches. Without hardware support, this kind of thing becomes difficult on SIMD processors which is why Ocelot doesn't include support for SSE yet. It is also one of the obstacles for supporting AMD/ATI IL at the moment, though solutions are in order.

      Translation from PTX to LLVM to multicore x86 does not necessarily throw away information concerning the PTX thread hierarchy initially. The first step is to express a PTX kernel using LLVM instructions and intrinsic function calls. This phase is [theoretically] invertible and no information concerning correctness or parallelism is lost.

      To get to multicore from here, a second phase of transformations insert loops around blocks of code within the kernel to implement fine-grain multithreading. This is the part that isn't necessarily invertible or easy to translate back to GPU architectures and is what is referenced in the note you are citing.

      Disclosure: I'm one of the core contributors to the Ocelot project.