NVIDIA's $10K Tesla GPU-Based Personal Supercomputer
gupg writes "NVIDIA announced a new category of supercomputers — the Tesla Personal Supercomputer — a 4 TeraFLOPS desktop for under $10,000. This desktop machine has 4 of the Tesla C1060 computing processors. These GPUs have no graphics out and are used only for computing. Each Tesla GPU has 240 cores and delivers about 1 TeraFLOPS single precision and about 80 GigaFLOPS double-precision floating point performance. The CPU + GPU is programmed using C with added keywords using a parallel programming model called CUDA. The CUDA C compiler/development toolchain is free to download. There are tons of applications ported to CUDA including Mathematica, LabView, ANSYS Mechanical, and tons of scientific codes from molecular dynamics, quantum chemistry, and electromagnetics; they're listed on CUDA Zone."
The toolchain is binary only and has an EULA that prohibits reverse engineering.
A single Radeon 4870x2 is 2.4 TFLOPS.
A single Radeon 4870x2 uses two chips. This Tesla thing uses 4 chips that are comparable to the Radeon ones. It should be obvious that they would be in a similar ballpark.
Seriously, why is this even news?
It isn't. Tesla was released a while ago, this is just a slashvertisement.
I supercomputing circles (i.e. Top500.org) double precision floating point operations seems to be what is desired. 4 TFLOPS single precision, while impressive, is overshadowed by the equally weak 80 GFLOPS double precision, beaten by a single PowerXCell 8i (successor to the Cell in PS3) or the latest crop of Xeons. I'm sure tesla will find its users but we won't see them on the Top500 list anytime soon.
- Henrik
- when the Shadows descend -
Look, there's Python here. You can do the low-level high-performance core routines in C, and use Python to do all the OO programming. This is how God intended us to program.
OO is very good for graphical interfaces, but it isn't particularly well suited for algorithms and other maths oriented stuff.
The term OO is too general to make a statement about its usefulness for mathematics oriented problems. The powerful templating features of modern C++ are indeed very useful for numerical simulations:
It's called C++ Expression Templates, an excellent tool for numerical simulations. ETs can get you very close to the performance of hand optimized C code while they're much more comfortable to use than plain C. Parallelization is also relatively easy to achieve with expression templates.
A research team at my university actually uses expression templates to build some sort of meta compiler which translates C++ ETs into CUDA code. They use it to numerically simulate laser diodes.
Search for papers by David Vandevoorde & Todd Veldhuizen if you want to know more about this. They both developed the technique independently.
Vandevoorde also explains ETs to some degree in his excellent book "C++ Templates - The Complete Guide".
From NVidia's CUDA site, most of their regular display cards support CUDA, just with less cores (hence less performance) than the Tesla card. The cores that CUDA uses are what used to be called the vertex shaders on your (NVidia) card. The CUDA API is designed so that your code doesn't know/specify how many cores are going to be used - you just code to the CUDA architecture and at runtime it distrubutes the workload to the available cores... so you can develop for a low end card (or they even have an emulator) then later pay for th hardware/performance you need.
You're right unless there's a computational way to take advantage of the fact that most neurons in cortex pretty much never fire (1), and that a small minority of synapses are responsible for nearly all of the excitation in a slab of cortical tissue (2). If not active == not important == not necessary to simulate with a 100% duty cycle (these are big "ifs"), then we could be literally about 3-5 orders of magnitude closer to being able to simulate whole brains than anyone realizes.
(1) How silent is the brain: is there a "dark matter" problem in neuroscience? Shy Shoham, Daniel H. O'Connor, Ronen Segev. J Comp Physiol A (2006)
(2) Highly Nonrandom Features of Synaptic Connectivity in Local Cortical Circuits. Sen Song, Per Jesper Sjostro, Markus Reigl, Sacha Nelson, Dmitri B. Chklovskii. PLOS biology March 2005
Expected time to finish is 1 hour and 60 minutes.