Slashdot Mirror


BOINC Now Available For GPU/CUDA

GDI Lord writes "BOINC, open-source software for volunteer computing and grid computing, has posted news that GPU computing has arrived! The GPUGRID.net project from the Barcelona Biomedical Research Park uses CUDA-capable NVIDIA chips to create an infrastructure for biomolecular simulations. (Currently available for Linux64; other platforms to follow soon. To participate, follow the instructions on the web site.) I think this is great news, as GPUs have shown amazing potential for parallel computing."

20 comments

  1. It's thinking... by neomunk · · Score: 4, Interesting

    As someone who is interested in software neural nets, this announcement practically gives me a chubber.

    And let me be the first to welcome our new Distributed Overlord. The lack of an 's' on "Overlord" is the exciting part of this article.
     

  2. What I'm waiting for is by da5idnetlimit.com · · Score: 2, Interesting

    Video conversion for GPU/CUDA (an amd64 version for ubuntu heron, if I get to be really choosy)

    saw something about this, and they were getting unbelievable transcoding speeds...

    --
    It takes 40+ muscles to frown, but only four to extend your arm and bitchslap the motherfucker
    1. Re:What I'm waiting for is by Qhartb · · Score: 0

      Yep. That program is "Badaboom Media Converter" by Elemental Technologies. I look forward to seeing what other applications CUDA has for home users. So far we have video transcoding, gaming physics simulation, and distributed computing projects (SETI@home and Folding@home). Doubtless graphics pretty soon, ironically (CUDA ray-tracing). It's really exciting.

    2. Re:What I'm waiting for is by lavid · · Score: 1

      the way that CUDA deals with thread death in the current iterations is lacking. if they make that more graceful, you can really expect to see some insane speedups.

      --
      If Bush wants to kill the terrorists, he should jump off a cliff.
    3. Re:What I'm waiting for is by krilli · · Score: 1

      Interesting - Can you elaborate? Got a link?

      --
      Jag pratar lite svenska.
  3. Single platform only by DrYak · · Score: 3, Interesting

    The only sad thing is that CUDA is a single platform API that only supports a handful of cards from a single constructor. For a project like BOINC that tries to get as many computers working together as possible, it would also be good if they tried to support at least one more API.

    Brook could also have been a nice candidate. It has already been used by another distributed computing project (Folding@home), and it supports multiple back-ends (including a multi-CPU one which actually works(*), an OpenGL one which works with most hardware, and AMD/ATI's CAL back-end featured in their Brook+ fork).

    Too bad that currently both nVidia and Intel are trying to attract customers to proprietary single platform APIs (CUDA and Ct, respectively).
    Especially given some memory management weirdness in CUDA.

    (*) : unlike CUDA's device emulation mode which is just a ridiculous joke performance-wise.

    --
    "Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
    1. Re:Single platform only by Anonymous Coward · · Score: 2, Informative

      CUDA is being ported to ATI/AMD cards with nVidia's blessing and support. By next year there will probably be a lot of hardware support for the API.

    2. Re:Single platform only by Satis · · Score: 3, Informative

      fyi, as the other reply states, CUDA isn't limited to a single manufacturer. nVidia has made it available for other graphics card manufacturers to support. Here's an article on Extremetech talking a bit about it, but at least according to the article ATI doesn't appear interested.

      http://www.extremetech.com/article2/0,2845,2324555,00.asp

      --
      Satis clankiller.com
    3. Re:Single platform only by mikael · · Score: 2, Informative

      There are many parallel processing and networking APIs out there, both past and present: OpenMP, pthreads, CUDA, sockets, etc.

      There is a proposal by Apple to create a common API for parallel processing (OpenCL) which would be cross-platform compatible. The Guardian has an article on this topic.

      --
      Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
    4. Re:Single platform only by schwaang · · Score: 1

      Brook could also have been a nice candidate. It has already been used by another distributed computing project (Folding@home), and it supports multiple back-ends (including a multi-CPU one which actually works(*), an OpenGL one which works with most hardware, and AMD/ATI's CAL back-end featured in their Brook+ fork).

      Does Brook provide access like CUDA does to fast shared memory and registers vs. device memory vs. host memory?

      (*) : unlike CUDA's device emulation mode which is just a ridiculous joke performance-wise.

      Just to pick a nit, I'm pretty sure that the point of device emulation mode is ease of debugging, not performance.

      On the whole I think we agree that it would be nice for programmers to have a non-proprietary and non-vendor-specific language to express parallel programs in. But at this early stage, with things still emerging, using CUDA directly seems to have some advantages.

    5. Re:Single platform only by krilli · · Score: 1

      CUDA is really easy to use. So easy to use that BOINC+CUDA got off the ground.

      I don't see any cards other than NVIDIA's that are as effective, given cost, effectiveness and ease of programming.

      "A handful of cards from a single constructor"? You can also say "Cheap, powerful cards available anywhere".

      CUDA device emulation is only intended as a partial debugging tool.

      --
      Jag pratar lite svenska.
  4. Lemme put another item on my to-do list... by MostAwesomeDude · · Score: 1

    - Implement CUDA in Gallium, so all Gallium-capable HW can run CUDA

    --
    ~ C.
  5. CUDA is extremely nVidia oriented. by DrYak · · Score: 2, Informative

    Yes, but sorry, CUDA is as much oriented toward other graphics manufacturers as Microsoft's ISO Office XML, with all its "use_spacing_as_in_word_96='true'" options, is an open standard.

    It is very heavily oriented toward nVidia's architecture. It has several deeply asinine architecture quirks. (You see, you have several different types of memory. The twist is that 3 of them are accessed using regular pointer arithmetic, but textures are accessed using dedicated specific functions, because using the "[]" operator like all the other memory types would have been too straightforward.)
    Also, instead of just being able to declare stream buffers and bind them to some data with a language extension (as in Brook, for example), you have to go through a couple of specific function calls into the CUDA API. It's 1980s-style C all over again.
    This whole thing is very much directed toward an architecture like nVidia's, which can't apply a kernel on the fly while loading memory from main memory to the GFX card, but instead relies on concurrent kernels and loads.
    And don't ask me about this whole weird tendency to require the user to go through some function calls just to set a constant to its default value (instead of simply declaring and accessing it directly).

    CUDA provides a nice C-like language for kernels. But the host code itself looks like a direct dump of the driver's interface.
    It's definitely not something that will be easily used by 3rd party developers or map nicely to other architectures.

    That's why ATI isn't interested. Because most of the host API is designed in a way which is very nVidia oriented and won't necessarily map nicely to other architectures.

    FYI, I've been working on several projects using both CUDA and Brook. Although I appreciate the speed gain of CUDA, and I appreciate having several C dialects which make it possible to port an algorithm between C, CUDA and Brook without too much effort, I still find that Brook has a nicer and much more abstract architecture.

    --
    "Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
    1. Re:CUDA is extremely nVidia oriented. by krilli · · Score: 1

      CUDA is free and it works. I prefer a hackish CUDA now to a nice, abstract CUDA in two years.

      Also, I do believe someone will write a nice abstraction on top of CUDA. If CUDA is like C++, there will be nice Boost and Qt toolkits for it.

      Also, you can do asynchronous memory transfers and kernel executions ... unless you're talking about something else and it's my misunderstanding.

      --
      Jag pratar lite svenska.
  6. More abstraction could be appreciated by DrYak · · Score: 2, Insightful

    Does Brook provide access like CUDA does to fast shared memory and registers vs. device memory vs. host memory?

    No. Being multiplatform to begin with, Brook exposes fewer details of the underlying memory architecture (because it can vary widely between platforms, like CPU to GPU, or not be exposed at all by the platform underneath, like OpenGL).

    But what it has is that data is represented by simple C-like arrays, and the compiler remaps that to cached fast texture accesses. No weird "tex2D" functions, unlike CUDA - that's something I find weird in an architecture which is supposed to abstract and simplify GPGPU coding, especially when all the other memory types are accessed in CUDA using C pointer math.

    Probably now that ATI's Brook+ is maturing, extra attributes on variable declaration could be introduced to have more influence on the memory organisation on that specific back-end.

    CUDA is nice because it enables very low-level control on how memory is used. But this currently comes at the cost of syntax complexity.
    It's interesting to note that both CUDA and Brook+ use a matrix multiplication as an example of language usage. Brook+ simply explains how to partition the work to keep the data nicely inside the fast cache. CUDA has a significant number of code lines devoted to moving data between several Hungarian notation-prefixed pointers, which is a little bit more confusing.
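
    The partitioning idea itself can be sketched on the CPU. What follows is a minimal illustration in plain C++, not code taken from the CUDA or Brook+ documentation; the names multiply_naive and multiply_tiled and the tile parameter are made up for this sketch. The tiled version walks the matrices in small square blocks so that each block can be reused while it is still in fast memory (a cache here, shared memory or a texture cache on a GPU).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Row-major n x n matrix stored in a flat vector (illustrative type alias).
using Matrix = std::vector<float>;

// Reference version: streams over whole rows of a and b for every output row.
Matrix multiply_naive(const Matrix& a, const Matrix& b, std::size_t n) {
    Matrix c(n * n, 0.0f);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
    return c;
}

// Tiled version: the same arithmetic, partitioned into tile x tile blocks.
// tile must be greater than zero; it does not have to divide n.
Matrix multiply_tiled(const Matrix& a, const Matrix& b, std::size_t n,
                      std::size_t tile) {
    Matrix c(n * n, 0.0f);
    for (std::size_t i0 = 0; i0 < n; i0 += tile)
        for (std::size_t k0 = 0; k0 < n; k0 += tile)
            for (std::size_t j0 = 0; j0 < n; j0 += tile) {
                // Clamp the block edges for sizes that are not a multiple of tile.
                const std::size_t i1 = std::min(i0 + tile, n);
                const std::size_t k1 = std::min(k0 + tile, n);
                const std::size_t j1 = std::min(j0 + tile, n);
                for (std::size_t i = i0; i < i1; ++i)
                    for (std::size_t k = k0; k < k1; ++k)
                        for (std::size_t j = j0; j < j1; ++j)
                            c[i * n + j] += a[i * n + k] * b[k * n + j];
            }
    return c;
}
```

    Both functions compute the same product; only the order in which memory is touched changes. The complaint above is about how much extra bookkeeping the CUDA sample needs to express that reordering, not about the idea.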

    Just to pick a nit, I'm pretty sure that the point of device emulation mode is ease of debugging, not performance.

    But to be debuggable, the code must at least be runnable. Sadly, the emulation is so slow that it can run real-world complex algorithms only on really small sets of data. Those might be corner cases, and you might miss bugs that only happen on larger data sets. Also, it always runs single threaded, no matter how many cores are available in the system, which may lead to missing some concurrency problems (code works fine on CPU but breaks on GPU because a sync is missing somewhere).
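
    To make the missing-sync case concrete, here is a toy model in plain C++. It is not CUDA code and not how the emulator is implemented; it only assumes, as described above, that threads run one after another. Each simulated thread publishes a value to a shared array and then reads its left neighbour's slot. All names are made up for the sketch.

```cpp
#include <cstddef>
#include <vector>

// Each simulated thread tid does:
//   step 1: shared[tid] = in[tid]
//   step 2: out[tid] = shared[tid] + shared[tid - 1]   (0 for tid == 0)
// A correct kernel needs a barrier between the two steps.

// No barrier: every thread runs both steps to completion, in the given order.
std::vector<int> run_without_barrier(const std::vector<int>& in,
                                     const std::vector<std::size_t>& order) {
    std::vector<int> shared(in.size(), 0);
    std::vector<int> out(in.size(), 0);
    for (std::size_t tid : order) {
        shared[tid] = in[tid];
        const int left = (tid == 0) ? 0 : shared[tid - 1];
        out[tid] = shared[tid] + left;
    }
    return out;
}

// With barrier: all threads finish step 1 before any thread starts step 2,
// so the result no longer depends on the order.
std::vector<int> run_with_barrier(const std::vector<int>& in,
                                  const std::vector<std::size_t>& order) {
    std::vector<int> shared(in.size(), 0);
    std::vector<int> out(in.size(), 0);
    for (std::size_t tid : order)
        shared[tid] = in[tid];
    for (std::size_t tid : order) {
        const int left = (tid == 0) ? 0 : shared[tid - 1];
        out[tid] = shared[tid] + left;
    }
    return out;
}
```

    With threads run in ascending order, the version without a barrier happens to give the right answer, because the left neighbour has always finished already. Run the same threads in descending order and the neighbour's slot is still empty when it is read. A sequential emulator only ever shows one such order, which is why the bug can stay hidden until the code runs on real hardware.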

    It can be used to debug short matrix-operation algorithms, but it's very hard to debug more complex things like sequence analysis (and there are even a couple of teams trying to do parallelised antivirus on the GPU)

    But at this early stage, with things still emerging, using CUDA directly seems to have some advantages.

    There are cases where the low level-ness of CUDA definitely makes sense:
    when developing code for purpose-built hardware. Say the lab you work in has built a machine with a couple of GeForces inside for your project (given the price of graphics cards and the performance increase between each generation, it makes sense to just throw in a couple of hundred bucks per graphics card for a specific project when the performance need arises). CUDA makes sense - even if it is ugly in places - because it'll let you squeeze the last possible cycle out of the hardware.

    But for something that will run distributed across a huge number of home configurations like "@home" distributed computing, adding an API which will bring additional architectures and is more abstract makes sense. Going for a single API roughly restricts the code to running on only half of the gamer population's machines.

    --
    "Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
    1. Re:More abstraction could be appreciated by krilli · · Score: 1

      Why don't you get cracking then and write a nice Brook BOINC?

      --
      Jag pratar lite svenska.
    2. Re:More abstraction could be appreciated by DrYak · · Score: 1

      Why don't you get cracking then and write a nice Brook BOINC?

      I actually *do* happen to write parallel applications using Brook for bioinformatics processing.
      It just happens that the current application I'm paid for developing doesn't use BOINC. Otherwise I would happily contribute.

      --
      "Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
    3. Re:More abstraction could be appreciated by schwaang · · Score: 1

      But for something that will run distributed across a huge number of home configurations like "@home" distributed computing, adding an API which will bring additional architectures and is more abstract makes sense. Going for a single API roughly restricts the code to running on only half of the gamer population's machines.

      If something like Brook could come *near enough* to generating optimal code for both NVIDIA and ATI cards, I'd agree with you whole-heartedly. I strongly suspect that this isn't the case.

      Imagine if BOINC restricted you to writing i386 code because it will run on everything, but wasted the capabilities of i686 and SSE2 etc.

      I would think it would be better to write a CUDA-optimized client of your algo and a CAL-optimized client and let BOINC feed work appropriately. I believe that's what F@H did w.r.t. various hardware architectures.
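
      Roughly what I have in mind, as a sketch in plain C++. This is not BOINC's actual scheduler interface; the GpuApi enum, the Host struct and the client names are all hypothetical, and only illustrate the server picking a build per host.

```cpp
#include <string>

// Hypothetical: which GPU compute API a volunteer host reports, if any.
enum class GpuApi { None, Cuda, Cal };

// Hypothetical host description; a real project would carry far more detail.
struct Host {
    GpuApi api;
};

// Pick the application build to send to a host: a vendor-tuned GPU client
// where one exists, and the plain CPU client as the fallback for everyone else.
std::string select_client(const Host& host) {
    switch (host.api) {
        case GpuApi::Cuda: return "client-cuda";
        case GpuApi::Cal:  return "client-cal";
        case GpuApi::None: break;
    }
    return "client-cpu";
}
```

      Each build can then be optimized for its own hardware, at the cost of maintaining one client per architecture.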

      In the longer run I hope for the same utopia you do, where the strengths of each approach inform the final iteration of Brook or whatever succeeds it, and the back-end compilers do the hard work of optimizing for each architecture that programmers have to do today.

    4. Re:More abstraction could be appreciated by krilli · · Score: 1

      OK :)

      Big up!

      --
      Jag pratar lite svenska.
  7. F@H and multiple back-ends by DrYak · · Score: 1

    Yes, indeed, F@H sports quite an original zoo of various computation engines in order to squeeze as much performance as possible from as many clients as possible. Including a client running on PS3's Cell.

    I agree that BOINC should include support for more than a single API. Either adding CAL as you suggest (although it's rather low level stuff) or adding Brook (which has a CAL backend - I would think that would be better as it is much higher level).

    And you presume correctly: currently Brook only supports nVidia through the OpenGL/GLSL backend, which lacks some advanced features. There has been some discussion on some forums about trying a CUDA backend for Brook, but the idea doesn't have enough followers, mainly because the most speed-critical optimisation (shared memory) won't be easy to implement automatically (in CUDA it's a voodoo art done manually by the coder).

    --
    "Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]