Slashdot Mirror


NVIDIA's $10K Tesla GPU-Based Personal Supercomputer

gupg writes "NVIDIA announced a new category of supercomputers — the Tesla Personal Supercomputer — a 4 TeraFLOPS desktop for under $10,000. This desktop machine has 4 of the Tesla C1060 computing processors. These GPUs have no graphics out and are used only for computing. Each Tesla GPU has 240 cores and delivers about 1 TeraFLOPS single precision and about 80 GigaFLOPS double-precision floating point performance. The CPU + GPU is programmed using C with added keywords using a parallel programming model called CUDA. The CUDA C compiler/development toolchain is free to download. There are tons of applications ported to CUDA including Mathematica, LabView, ANSYS Mechanical, and tons of scientific codes from molecular dynamics, quantum chemistry, and electromagnetics; they're listed on CUDA Zone."

18 of 236 comments

  1. Graphics by Anonymous Coward · · Score: 5, Funny

    Wow, that's some serious computing power! I wonder if anyone has thought of using these for graphics or rendering? I imagine they could make some killer games, especially with advanced technology like Direct 3D.

    1. Re:Graphics by Gnavpot · · Score: 4, Funny

      "I wonder if anyone has thought of using these for graphics or rendering?"

      These are effectively just NVIDIA GT280 chips with the ports removed. Their heritage is gaming.

      We need a "+1 Whoosh" moderation option.

      No, I do not mean "-1 Whoosh". I want to see those embarrassingly stupid postings. But perhaps this moderation option should subtract karma.

    2. Re:Graphics by GigaplexNZ · · Score: 4, Funny

      I suppose I'm one of those guys now. Hook, line and sinker.

  2. 4 TFLOPS? by Anonymous Coward · · Score: 5, Insightful

    A single Radeon 4870x2 is 2.4 TFLOPS. Some supercomputer, that.

    Seriously, why is this even news? nVidia makes a product, which is OK, but nothing revolutionary. The devaluation of the "supercomputer" term is appalling.

    Also, how much of that 4 TFLOPS can you get on actual applications? How's FFT? Or LINPACK?

    1. Re:4 TFLOPS? by GigaplexNZ · · Score: 4, Informative

      A single Radeon 4870x2 is 2.4 TFLOPS.

      A single Radeon 4870x2 uses two chips. This Tesla thing uses 4 chips that are comparable to the Radeon ones. It should be obvious that they would be in a similar ballpark.

      Seriously, why is this even news?

      It isn't. Tesla was released a while ago; this is just a slashvertisement.

  3. What, no coil? by dgun · · Score: 5, Funny

    What a rip.

    --
    FAQs are evil.
  4. Binary-only toolchain by Anonymous Coward · · Score: 5, Informative

    The toolchain is binary only and has an EULA that prohibits reverse engineering.

    1. Re:Binary-only toolchain by FireFury03 · · Score: 5, Informative

      has an EULA that prohibits reverse engineering.

      Not really a big deal to those of us in the EU since we have a legally guaranteed right to reverse engineer stuff for interoperability purposes.

  5. Let me be the first to say... by rdnetto · · Score: 5, Funny

    4 teraflops should be more than enough for anybody...

    --
    Most human behaviour can be explained in terms of identity.
  6. Comment removed by account_deleted · · Score: 4, Funny

    Comment removed based on user account deletion

  7. weak DP performance by Henriok · · Score: 5, Informative

    In supercomputing circles (i.e. Top500.org), double precision floating point operations seem to be what is desired. 4 TFLOPS single precision, while impressive, is overshadowed by the comparatively weak 80 GFLOPS double precision, beaten by a single PowerXCell 8i (successor to the Cell in PS3) or the latest crop of Xeons. I'm sure Tesla will find its users, but we won't see them on the Top500 list anytime soon.

    --

    - Henrik

    - when the Shadows descend -
  8. Re:Heartening... by mangu · · Score: 4, Interesting

    Can you imagine a Beowulf cluster of these?

    Yes, I can. My first thought when I saw the article was to calculate how many of them one would need to simulate a human brain in real time. The answer is: with 2500 of these machines one could simulate a hundred billion neurons with a thousand synapses each, firing a hundred times per second, which is the approximate capacity of a human brain.

    People have paid $20 million to visit the space station, now who will be the first millionaire hobbyist to pay $25 million to have his own simulated human brain?

  9. Re:Only in C? Oh dear. by xororand · · Score: 5, Informative

    OO is very good for graphical interfaces, but it isn't particularly well suited for algorithms and other maths oriented stuff.

    The term OO is too general to make a statement about its usefulness for mathematics oriented problems. The powerful templating features of modern C++ are indeed very useful for numerical simulations:

    The technique is called C++ expression templates (ETs), an excellent tool for numerical simulations. ETs can get you very close to the performance of hand-optimized C code while being much more comfortable to use than plain C. Parallelization is also relatively easy to achieve with expression templates.

    A research team at my university actually uses expression templates to build some sort of meta compiler which translates C++ ETs into CUDA code. They use it to numerically simulate laser diodes.

    Search for papers by David Vandevoorde & Todd Veldhuizen if you want to know more about this. They both developed the technique independently.

    Vandevoorde also explains ETs to some degree in his excellent book "C++ Templates - The Complete Guide".
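The core idea behind expression templates, as described in the comment above, is that an expression such as `a * b + c` is captured as a tree first and then evaluated element by element in one pass, without materializing a temporary vector for `a * b`. C++ does this at compile time through template types. The sketch below is only a run-time analogue in Python using operator overloading, not C++ expression templates themselves; the class names `Expr`, `Vec`, and `BinOp` are hypothetical and chosen for illustration.

```python
import operator


class Expr:
    """Base class: overloaded operators build a tree instead of computing."""

    def __add__(self, other):
        return BinOp(self, other, operator.add)

    def __mul__(self, other):
        return BinOp(self, other, operator.mul)

    def evaluate(self):
        # Single fused pass: each element is computed once, directly from
        # the leaves, with no intermediate vectors.
        return [self.at(i) for i in range(len(self))]


class Vec(Expr):
    """Leaf node holding actual data."""

    def __init__(self, data):
        self.data = list(data)

    def __len__(self):
        return len(self.data)

    def at(self, i):
        return self.data[i]


class BinOp(Expr):
    """Interior node: remembers the operands and the operation."""

    def __init__(self, left, right, fn):
        if len(left) != len(right):
            raise ValueError("operand lengths differ")
        self.left, self.right, self.fn = left, right, fn

    def __len__(self):
        return len(self.left)

    def at(self, i):
        return self.fn(self.left.at(i), self.right.at(i))


a, b, c = Vec([1, 2, 3]), Vec([4, 5, 6]), Vec([7, 8, 9])
expr = a * b + c          # nothing is computed yet, only a tree is built
print(expr.evaluate())    # [11, 18, 27]
```

In real C++ expression templates the tree is encoded in the type of the expression, so the compiler can inline the whole thing into one loop; the Python version pays interpreter overhead and only illustrates the structure.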

  10. And in other news... by bsDaemon · · Score: 5, Funny

    ... AMD has announced today its new Edison Personal Supercomputer technology.

    The game is on.

  11. Re:Heartening... by smallfries · · Score: 4, Interesting

    Your figures are off by several orders of magnitude. 2500 of these is roughly 10,000 TFLOPS. As one TFLOPS is 10^12 operations per second and we have 10^11 neurons, that leaves 10^5 floating point operations per neuron per second. If each has 1000 synapses to process, then we are down to 100 operations per connection, per second.

    At this point it seems obvious that you've assumed a really simplistic model of a neuron that can compute a synaptic value in a single floating point operation. These simple neuron models don't behave like a real brain, and scaling up simulations of them doesn't produce anything interesting. Real neurons are capable of computing much more complex functions than these models. The throughput on the interconnect is going to be a major factor, and simulating each neuron will require from 10s to 1000000s of operations depending on the level of biological realism that is required. The Blue Brain project has a lot of interesting material on different models of the neuron and the tradeoff between performance and realism.

    Their end goal is to dedicate a large IBM Blue Gene to simulating an entire column within the brain (roughly 1,000,000 neurons) using a biologically-realistic model.
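The back-of-the-envelope budget in this comment can be checked with a few lines of Python. This is a minimal sketch that uses only the figures quoted in the thread (2500 machines, 4 TFLOPS each, 10^11 neurons, 1000 synapses per neuron, 100 firings per second); the function name `flop_budget` is hypothetical, and peak single-precision throughput is assumed to be fully usable, which is optimistic.

```python
def flop_budget(machines, tflops_per_machine, neurons,
                synapses_per_neuron, firing_rate_hz):
    """Return (per neuron per second, per synapse per second,
    per synapse per firing event) floating point operation budgets."""
    total_flops = machines * tflops_per_machine * 10**12  # FLOP/s, peak
    per_neuron = total_flops / neurons
    per_synapse = per_neuron / synapses_per_neuron
    per_event = per_synapse / firing_rate_hz
    return per_neuron, per_synapse, per_event


per_neuron, per_synapse, per_event = flop_budget(
    machines=2500,
    tflops_per_machine=4,
    neurons=10**11,
    synapses_per_neuron=1000,
    firing_rate_hz=100,
)
print(per_neuron, per_synapse, per_event)
```

With these inputs the budget works out to 10^5 operations per neuron per second, 100 per synapse per second, and a single operation per synapse per firing event, which is why only a very simple neuron model fits.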

    --
    Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
  12. Re:cold hard facts about cuda- unbalanced by anon+mouse-cow-aard · · Score: 4, Insightful

    People are always coming out of the woodwork to claim supercomputer performance with such and such a solution. Go back and look at GRAPE (which is really cool; see http://arstechnica.com/news.ars/post/20061212-8408.html ) or at a lot of other supercomputer clusters. When you want something flexible, you look for "balance": a good relationship between memory capacity, latency and bandwidth, as well as compute power. In terms of memory capacity, the number people talk about is 1 byte/flop; that is, 1 TByte of memory is about right to keep 1 TFLOP flexibly useful. This thing has 4 GB of memory for 4 TF, in other words 1 byte per 1000 flops. It's going to be hard to use in a general-purpose way.
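The "balance" rule of thumb above is a simple ratio, sketched here in Python. The inputs are the commenter's own figures (4 GB of memory against 4 TFLOPS) and the 1 byte/flop target from the comment; the helper names `bytes_per_flop` and `memory_for_balance` are hypothetical, and decimal units (10^9, 10^12) are assumed.

```python
GB = 10**9
TFLOPS = 10**12


def bytes_per_flop(memory_bytes, flops):
    """Memory capacity available per FLOP/s of compute."""
    return memory_bytes / flops


def memory_for_balance(flops, target_bytes_per_flop=1.0):
    """Memory (in bytes) needed to reach the target balance ratio."""
    return flops * target_bytes_per_flop


ratio = bytes_per_flop(4 * GB, 4 * TFLOPS)
needed = memory_for_balance(4 * TFLOPS)

print(f"balance: {ratio} bytes/flop")
print(f"memory for 1 byte/flop: {needed / GB:.0f} GB")
```

On the quoted figures the ratio is 0.001 bytes/flop, so reaching 1 byte/flop at 4 TFLOPS would take about 4000 GB of memory. Plugging in a different memory size changes the ratio proportionally.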

  13. Re:FLOPS not FLOP! by TeknoHog · · Score: 4, Funny

    What's the plural of FLOPS then? My preciouss FLOPSes?

    --
    Escher was the first MC and Giger invented the HR department.
  14. Re:Heartening... by LeDopore · · Score: 5, Informative

    You're right unless there's a computational way to take advantage of the fact that most neurons in cortex pretty much never fire (1), and that a small minority of synapses are responsible for nearly all of the excitation in a slab of cortical tissue (2). If not active == not important == not necessary to simulate with a 100% duty cycle (these are big "ifs"), then we could be literally about 3-5 orders of magnitude closer to being able to simulate whole brains than anyone realizes.

    (1) How silent is the brain: is there a "dark matter" problem in neuroscience? Shy Shoham, Daniel H. O'Connor, Ronen Segev. J Comp Physiol A (2006)

    (2) Highly Nonrandom Features of Synaptic Connectivity in Local Cortical Circuits. Sen Song, Per Jesper Sjöström, Markus Reigl, Sacha Nelson, Dmitri B. Chklovskii. PLoS Biology, March 2005

    --
    Expected time to finish is 1 hour and 60 minutes.