Australia's CSIRO To Launch CPU-GPU Supercomputer
bennyboy64 contributes this excerpt from CRN Australia: "The CSIRO will this week launch a new supercomputer which uses a cluster of GPUs [pictures] to gain a processing capacity that competes with supercomputers over twice its size.
The supercomputer is one of the world's first to combine traditional CPUs with the more powerful GPUs.
It features 100 Intel Xeon CPU chips and 50 Tesla GPU chips, connected to an 80 Terabyte Hitachi Data Systems network attached storage unit. CSIRO science applications have already seen 10-100x speedups on NVIDIA GPUs."
Why is it "more powerful" than "traditional" CPUs?
And why is it not under my hood already if it is superior technology?
Can someone explain exactly what the benefits/drawbacks of using GPUs for processing are?
It would also be nice if someone could give a quick rundown of what sort of applications GPUs are good at.
The article didn't seem to mention cost, power usage, heat, or anything remotely relevant. Just a nice happy fluff piece for NVIDIA, who I do adore, but really these articles on Slashdot do not have as much tech sustenance as they used to.
..can it Run CRySiS?
Graphics processing, the technically demanding part of PC gaming, uses GPUs essentially exclusively. Physics processing, the runner-up, can already be offloaded to technically similar PPUs, or even to actual GPUs working as physics processors. The reason that most apps run on the CPU is that it's easier to write for, not that most apps actually run better on it for some fundamental reason.
No kidding!!! What do you say at this point?
A supercomputing cluster is already used for highly parallelized problems. Using hardware that handles those kinds of problems at a far greater speed than a typical CPU is a no-brainer. I think the part of the story that would be really interesting to the /. crowd is exactly what kinds of problems they're using this cluster to speed up. GPUs aren't too keen on problems involving data that is hard to cache and, as far as I know, the instruction set is somewhat limited: they are good at doing lots of little, parallel calculations but have a hard time with large, monolithic problems.
I am very interested in seeing what kinds of research this will help the most with and what areas will still be more efficient to run on Xeons/Opterons.
-Oz
In other words, the GPU is a single-instruction-multiple-data (SIMD) device. It is well matched to simple, regular computations like those that occur in digital signal processing, image processing, computer-generated graphics, etc.
The modern-day GPU is the difference between "Asteroids" (an arcade game from 1979) and Unreal Tournament 2004 (an intense 3D-graphics game of the 21st century).
Wow, the world of technology is spiking. I remember only a few years ago there was only 1 massive supercomputer; now every university will have one. What next, link every supercomputer and have a supercomputer cloud, or should I say nebula now? :p
The rise of the machine. Let me take this time to welcome our new overlords.
It's not a typo if you understood the meaning!
1) Your problem is one that is more or less infinitely parallel in nature. A GPU works as a whole bunch of parallel pathways, so your problem needs to be one that can be broken down into very small parts that can execute in parallel. A single GPU these days can have hundreds of parallel shaders (the GTX 285 has 240, for example).
2) Your problem needs to be fairly linear, without a whole lot of branching. Modern GPUs can handle branching, but they take a heavy penalty doing it. They are designed for processing data streams where you just crunch numbers, not a lot of if-then kind of logic. So your problem should be fairly linear to run well.
3) Your problem needs to be solvable using single precision floating point math. This is changing: new GPUs are getting double precision capability and better integer handling, but almost all of the ones on the market now are only fast with 32-bit FP. So your problem needs to use that kind of math.
4) Your problem needs to be able to be broken down into pieces that can fit in the memory on a GPU board. This varies; it is typically 512MB-1GB for consumer boards and as much as 4GB for Teslas. Regardless, your problem needs to fit in there for the most part. The memory on a GPU is very fast, 100GB/sec or more of bandwidth for high end ones. The communication back to the system via PCIe is usually an order of magnitude slower. So while you certainly can move data to main memory and to disk, it needs to be done sparingly. For the most part, you need to be cranking on stuff that is in the GPU's memory.
Now, the more your problem meets those criteria, the better a candidate it is for acceleration by GPUs. If your problem is fairly small, very parallel, very linear and all single precision, well you will see absolutely massive gains over a CPU. It can be 100x or so. These are indeed the kind of gains you see in computer graphics, which is not surprising given that's what GPUs are made for. If your problem is very single threaded, has tons of branching, requires hundreds of gigs of data and such, well then you might find offloading to a GPU slower than trying it on a CPU. The system might spend more time just getting the data moved around than doing any real work.
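To put rough numbers on the "more time moving data than doing work" case, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the function name `offload_speedup`, the 8 GB/s PCIe figure, the 100x kernel speedup, and the two example jobs are made up, and the model ignores overlap between transfers and compute.

```python
def offload_speedup(cpu_seconds, kernel_speedup, bytes_moved, pcie_bytes_per_sec):
    """Estimate end-to-end speedup when a job is offloaded to a GPU.

    Simple model: GPU time = PCIe transfer time + (CPU time / kernel speedup).
    Transfers and compute are assumed not to overlap.
    """
    transfer_seconds = bytes_moved / pcie_bytes_per_sec
    gpu_seconds = transfer_seconds + cpu_seconds / kernel_speedup
    return cpu_seconds / gpu_seconds


GB = 1e9
PCIE = 8 * GB  # assumed usable PCIe bandwidth in bytes/s (illustrative only)

# Compute-heavy job: 100 s of CPU work, only 1 GB crosses the bus.
compute_bound = offload_speedup(100.0, 100.0, 1 * GB, PCIE)

# Data-heavy job: 1 s of CPU work, but 200 GB has to be streamed over PCIe.
transfer_bound = offload_speedup(1.0, 100.0, 200 * GB, PCIE)

print(f"compute-bound job:  {compute_bound:.1f}x")
print(f"transfer-bound job: {transfer_bound:.3f}x")
```

With these made-up inputs the first job keeps most of its kernel speedup, while the second ends up slower than just running on the CPU, because the bus transfer dominates.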
The good news is, there's an awful lot of problems that nicely meet the criteria for running on GPUs. They may not be perfectly ideal, but they still run plenty fast. After all, if a GPU is ideally 100x a CPU, and your code can only use it to 10% efficiency, well hell you are still doing 10x what you did on a CPU.
So what kind of things are like this? Well graphics would be the most obvious one. That's where the design comes from. You do math on lots of matrices of 32-bit numbers. This doesn't just apply to consumer game graphics though, material shaders in professional 3D programs work the same way. Indeed, you'll find those can be accelerated with GPUs. Audio is another area that is a real good candidate. Most audio processing is the same kind of thing. You have large streams of numbers representing amplitude samples. You need to do various simple math functions on them to add reverb or compress the dynamics or whatever. I don't know of any audio processing that uses GPUs, but they'd do well for it. Protein folding is another great candidate. Folding@Home runs WAY faster on GPUs than CPUs.
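As a rough illustration of why this kind of stream processing parallelizes so well, here is a small Python sketch. The effect (gain followed by a hard clip) and the names `process_sample`, `process_serial` and `process_chunked` are hypothetical; this is plain CPU code, not GPU code. The point is only that each output depends on a single input sample, so the stream can be cut up and processed in any order. Effects like reverb depend on earlier samples as well, so they need more care than this.

```python
def process_sample(x, gain=2.0, limit=1.0):
    """Apply gain, then hard-clip to [-limit, limit]. Uses one sample only."""
    y = x * gain
    return max(-limit, min(limit, y))


def process_serial(samples):
    """Reference version: walk the stream front to back."""
    return [process_sample(x) for x in samples]


def process_chunked(samples, n_chunks):
    """Split the stream into chunks and process each chunk independently.

    The chunks are deliberately handled in reverse order to show that the
    order of execution does not affect the result.
    """
    size = max(1, (len(samples) + n_chunks - 1) // n_chunks)
    chunks = [samples[i:i + size] for i in range(0, len(samples), size)]
    results = [None] * len(chunks)
    for idx in reversed(range(len(chunks))):
        results[idx] = [process_sample(x) for x in chunks[idx]]
    return [y for chunk in results for y in chunk]


samples = [0.1, -0.3, 0.7, -0.9, 0.25, 0.6, -0.05, 0.45]
assert process_chunked(samples, 4) == process_serial(samples)
```

On a GPU the same idea would typically be taken to the extreme, with one thread per sample instead of a handful of chunks.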
At this point, GPGPU stuff is still really in its infancy. We should start to see more and more of it as more people these days have GPUs that are useful for GPGPU apps (pretty much DX10 or better hardware, nVidia 8000 or higher and ATi 3000 or higher). Also, there are starting to be better APIs out for it. nVidia's CUDA is popular, but proprietary to their cards. MS has introduced GPGPU support in DirectX, and OpenCL has come out and is being supported. As such, you should see more apps slowly start to be developed.
GPUs certainly aren't good at everything, I mean if they were, well then we'd just make CPUs like GPUs and call it good. However there is a large set of problems they are better than the CPU at solving.
Hmmm.... is this setup a realisation of this release from Nvidia in March?
Nvidia Touts New GPU Supercomputer
http://gigaom.com/2009/05/04/nvidia-touts-new-gpu-supercomputer/
Another 'standalone' GPGPU supercomputer, without the Infiniband switch
University of Antwerp makes 4000EUR NVIDIA supercomputer
http://www.dvhardware.net/article27538.html
Finally a machine good enough to run Crysis at full specs on 1680x1050 (well, I hope so)
Religion: The greatest weapon of mass destruction of all time
Does it use wood screws?
It depends.
Mod me up, pls.
GPUs are massively parallel handling hundreds of cores and tens of thousands of threads
eh? Massively parallel yes. The rest?
More to do with a single instruction performing the same operation on multiple bits of data at the same time. AKA vector processors. Great for physics/graphics processing where you want to perform the same process on lots of bits of data.
We have one of those already; I imagine a lot of schools do. Ours is only an 18-node cluster so the numbers are much smaller, but the story here is that this is relatively big, not that it's some new thing.
From an open source point of view... this is a mistake, since we (as open source people) must favor AMD GPUs. Moreover, for 2 years now AMD GPUs have seemed faster than nvidia ones. So despite such bad news, open source people must stay the course: favor AMD GPUs regardless.
There are indeed tasks that don't parallelise well. My brain's filed them as unimportant, but that's likely due to the difficulty in doing computational work that parallelises poorly rather than some fundamental deficiency. A better way of putting it would be to say that most hard-core research computing is done in a manner that's very similar to hard-core gaming computing, so it's actually a very sensible transition.
give it more gas, that's what we do? when we run out of/blow up from, gas??? ta da?
some of us are already learning to walk again. it feels pretty good.
modelling how to split the beer atom?
What API would be the best approach for writing some future-proof GPU code?
I'm willing to sacrifice some bleeding edge performance now for ease of maintainability.
Other GPU possibilities
* OpenCL
* GPGPU
* CUDA
* DirectCompute
* FireStream
* Larrabee
* Close to Metal
* BrookGPU
* Lib Sh
Cheers
... a beowulf cluster of those! ;)
(Sorry, it had to be said)
The biggest benefit of GPU processing is that GPUs are much more adept at floating-point math... that is, 2.5436*23.561234 instead of 1829*2304. The distributed computing efforts (Folding, BOINC projects) have started writing clients for users' GPUs as well, and have seen great success so far.
Floating point operations are tedious on normal processors, but the shader units on GPUs, designed to handle complex calculations for graphical effects, process non-integers much faster.
Sort of. NVIDIA's definition of a "thread" is different from a CPU thread -- it's more similar to the instructions executed on a single piece of data in a SIMD system. You're not required to make data-parallel code for the GPU, but certainly data-parallel code is the easiest to write and visualize.
On NVIDIA chips, at least, there are a number of independent processors. The processors execute vector instructions (though all the vector instructions can be conditionally executed, so that, e.g., they only affect some of the data). Optimally, they have many sets of instruction flows at the same time -- they have a built-in zero-cost thread context switch, and computation in one set of threads is used to hide memory access time for the other threads.
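Here is a toy Python model of that "conditionally executed vector instruction" idea, as I understand it. It is not how any real hardware or API is programmed: `run_predicated` is a made-up name, and a real group has a fixed width (32 threads per warp on NVIDIA), which this sketch ignores. It just shows why branching costs extra: if the lanes in a group disagree on an if/else, both sides get issued for the whole group and a mask decides which lanes keep which result.

```python
def run_predicated(data, predicate, then_op, else_op):
    """Model one SIMD group executing an if/else over `data`.

    A branch side is issued to the whole group if any lane needs it; the
    mask decides which lanes keep the result.
    Returns (results, passes_issued).
    """
    mask = [predicate(x) for x in data]
    results = list(data)
    passes = 0
    if any(mask):        # 'then' side is issued once for the whole group
        passes += 1
        results = [then_op(x) if m else r
                   for x, m, r in zip(data, mask, results)]
    if not all(mask):    # 'else' side is issued once for the whole group
        passes += 1
        results = [else_op(x) if not m else r
                   for x, m, r in zip(data, mask, results)]
    return results, passes


def is_positive(x):
    return x > 0


def double(x):
    return x * 2


def negate(x):
    return -x


# All lanes agree: only one side of the branch is ever issued.
uniform_out, uniform_passes = run_predicated([1, 2, 3, 4], is_positive, double, negate)

# Lanes disagree: both sides are issued, so the group does twice the work.
mixed_out, mixed_passes = run_predicated([1, -2, 3, -4], is_positive, double, negate)

print(uniform_out, uniform_passes)
print(mixed_out, mixed_passes)
```

In this model the uniform group needs one pass and the mixed group needs two, which is the "heavy penalty" for branchy code in miniature.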
http://www.top500.org/system/10186 The machine quoted in TFA is quoting single precision. Currently the ATI boards trounce the Nvidia boards in double precision. The next GPU cluster down the list is Nvidia based at #56 http://www.top500.org/site/690
The cluster actually has 50 Tesla S1070 units, each of which contains 4 GPUs,
so it's 200 GPUs, and that is just the initial rollout, with additional nodes to be delivered pretty quickly (perhaps waiting for Fermi).
... reading a story some time ago about the use of GPU clusters by organizations on national security watch lists to circumvent ITAR controls.
Have gnu, will travel.
All you need for that is a chisel and a back shed to work in.
http://michaelsmith.id.au