GPUs To Power Supercomputing's Next Revolution
evanwired writes "Revolution is a word that's often thrown around with little thought in high tech circles, but this one looks real. Wired News has a comprehensive report on computer scientists' efforts to adapt graphics processors for high performance computing. The goal for these NVidia and ATI chips is to tackle non-graphics related number crunching for complex scientific calculations. NVIDIA announced this week along with its new wicked fast GeForce 8800 release the first C-compiler environment for the GPU; Wired reports that ATI is planning to release at least some of its proprietary code to the public domain to spur non-graphics related development of its technology. Meanwhile lab results are showing some amazing comparisons between CPU and GPU performance. Stanford's distributed computing project Folding@Home launched a GPU beta last month that is now publishing data putting donated GPU performance at 20-40 times the efficiency of donated CPU performance."
I was thinking about the question of what makes GPUs so great..
I thought .. What is it that a CPU does that a GPU doesn't?
Oh yeah .. I know .. run Windows.
*I'm kidding I'm kidding*
One more step toward GPU raytracing. We're already pushing ridiculous numbers of polygons, with less and less return for our efforts. The future lies in projects like OpenRT. With any luck, we'll start being able to blow holes through levels rather than having to run the rat-maze.
Javascript + Nintendo DSi = DSiCade
I'll believe it when I see Linpack numbers
Let me see if I have this down right: With the progress of multi-core CPU's, especially looking at the AMD / ATI deal, PC's are moving towards a single 'super chip' that will do everything while phasing out the use of a truly separate graphics system. Meanwhile, supercomputers are moving towards using GPU's as the main workhorse. Doesn't that strike anybody else as a little odd?
Unpleasantries.
Simple video games that run ENTIRELY on the GPU- mainly for developers. Got 3 hours (or I guess it's now going on 7 hours) to wait for an ALTER statement to a table to complete, and you're bored stiff? Fire up this video game, and while your CPU cranks away, you can be playing the video game instead with virtually NO performance hit to the background CPU task.
SJW: a person who perceives an injustice, and while correcting it, commits a greater injustice.
enough power to run Windows Vista at the same time with DNF, plus every computer game on Earth
Great now Homeland Defence is going to buy up all the graphics cards to prevent their dangerous computing power from falling in the hands of evil script kiddies trying to crack your hotmail account...
Will it still be relevant if Intel delivers 80 cores in five years as they promise? Or will history repeat itself and we'll have our 80 cores plus specialized "math coprocessors" again?
Shh.
For those who are curious, CUDA stands for "compute unified device architecture".
Ben Hocking
Need a professional organizer?
Nvidia out and Intel graphics chipsets in. So long as Nvidia don't even release specs for their cards, I don't foresee their GPUs powering anything I'm involved with.
"Serious" computers won't come with fewer than 4 16x PCI-E slots for hooking in "scientific processing units"...
We used to tell our boss that we were going to do stress-testing when we stayed late to play Q3, this takes that joke to a whole new level.
Oh, you're not stuck, you're just unable to let go of the onion rings.
This may result in people buying high end video cards for headless servers doing weather simulations and the like.
Step back, step back from that sig....
"I thought .. What is it that a CPU does that a GPU doesn't?"
GPUs have dedicated circuitry to do math, math, and more math - and to do it *fast*. In a single cycle, they can perform mathematical computations that take general-purpose CPUs an eternity, in comparison.
Oh, you're not stuck, you're just unable to let go of the onion rings.
NVIDIA announced this week along with its new wicked fast GeForce 8800 release the first C-compiler environment for the GPU
"Wicked fast" GPU? And a compiler?
Sounds like a Boston C Party.
I want to drag this out as long as possible. Bring me my protractor.
Nice to see the mention of Acceleware in the press release. While a lot of the article is about lab results, Acceleware has been delivering actual GPU powered products for a couple of years now.
The 8800 looks like the first GPU that really enters the realm of the old fashioned supercomputing architectures pioneered by Seymour Cray that I cut my teeth on in the mid 1970s. I can't wait to get my hands on their "C" compiler.
Seastead this.
Excellent news! Below is the link, registration required, for the New York Times. I will try to paste the article.
Second. Anyone out there working on books that have examples? Please reply with any good 'how to' sources.
Source: http://www.nytimes.com/2006/11/09/technology/09chip.html?ref=technology
SAN JOSE, Calif., Nov. 8 -- A $90 million supercomputer made for nuclear weapons simulation cannot yet be rivaled by a single PC chip for a serious video gamer. But the gap is closing quickly.
Indeed, a new breed of consumer-oriented graphics chips have roughly the brute computing processing power of the world's fastest computing system of just seven years ago. And the latest advance came Wednesday when the Nvidia Corporation introduced its next-generation processor, capable of more than three trillion mathematical operations per second.
Nvidia and its rival, ATI Technologies, which was recently acquired by the microprocessor maker Advanced Micro Devices, are engaged in a technology race that is rapidly changing the face of computing as the chips -- known as graphical processing units, or G.P.U.'s -- take on more general capabilities.
In recent years, the lead has switched quickly with each new family of chips, and for the moment the new chip, the GeForce 8800, appears to give the performance advantage to Nvidia.
On Wednesday, the company said its processors would be priced at $599 and $449, sold as add-ins for use by video game enthusiasts and for computer users with advanced graphics applications.
Yet both companies have said that the line between such chips and conventional microprocessors is beginning to blur. For example, the new Nvidia chip will handle physics computations that are performed by Sony's Cell microprocessor in the company's forthcoming PlayStation 3 console.
The new Nvidia chip will have 128 processors intended for specific functions, including displaying high-resolution video.
And the next generation of the 8800, scheduled to arrive in about a year, will have "double precision" mathematical capabilities that will make it a more direct competitor to today's supercomputers for many applications.
"I am eagerly looking forward to our next generation," said Andy Keane, general manager of Nvidia's professional products division, a business the company set up recently to aim at commercial high-performance computing applications like geosciences and gene splicing.
The chips made by Nvidia and ATI are shaking up the computing industry and causing a level of excitement among computer designers, who in recent years have complained that the industry seemed to have run out of new ideas for gaining computing speed. ATI and Advanced Micro Devices have said they are working on a chip, likely to emerge in 2008, that would combine the functions of conventional microprocessors and graphics processors.
That convergence was emphasized earlier this year when an annual competition sponsored by Microsoft's research labs to determine the fastest sorting algorithm was won this year by a team that used a G.P.U. instead of a traditional microprocessor. The result is significant, according to Microsoft researchers, because sorting is a basic element of many modern computing operations.
Moreover, while innovation in the world of conventional microprocessors has become more muted and largely confined to adding multiple processors, or "cores," to single chips, G.P.U. technology is continuing to advance rapidly.
"The G.P.U. has this incredible memory bandwidth, and it will continue to double for the foreseeable future," said Jim Gray, manager of Microsoft's eScience group.
Although the comparison has many caveats, both computer scientists and game designers said that Nvidia GeForce 8800 had in some ways moved near the realm for the computing power of the supercomputing world of the last decade.
The fastest of thes
I think computers will eventually contain an FPGA, which can be re-programmed to perform any task. For example, a physics processor can be programmed into the FPGA when a game launches, folding@home can program the FPGA to do specific vector calculations very quickly, encryption algorithms can be programmed in to perform encryption/decryption very quickly, etc.
FPGAs are getting quite powerful and are getting a lot cheaper. It definitely won't be as fast as a dedicated ASIC, but if programmed properly, it should be able to accelerate certain tasks significantly.
The addition of a C compiler, plus drivers specific to GPGPU applications that are available for Linux (!) as well as XP/Vista, means that this is going to see widespread adoption amongst the HPC crowd. There probably won't be any papers on it published at SC06 in Florida next week, but over the next year there will probably be a veritable torrent of publications (there already is a LOT being done with GPUs). The new architecture really promotes GPGPU apps, and the potential performance/$ looks strong, especially factoring in the development time, which should be significantly less with this toolchain. A couple of 8800GTXes in SLI and I could be giving traditional clusters a run for their money when it comes to apps like FFTs etc. I can't wait till someone benchmarks FFT performance using CUDA. If anyone finds such numbers, post and let me know!
Original
It's not unusual at all. CPUs are very general and do certain things very quickly & efficiently. GPUs on the other hand do other things very quickly and efficiently. The type of number crunching that GPUs do is actually well suited to the massively repetitive number crunching done by most of the big super computers [think climatology studies]. Shifting from CPU to GPU architectures just makes sense there.
It's nice to see the name Acceleware mentioned in the NVIDIA press release, although they are missing from the 'comprehensive' report on Wired. It should be noted that they have been delivering high-performance computing solutions for a couple of years or so already. I guess now it's out of the bag that NVIDIA's little graphics cards had something to do with that.
Anyone know of any other companies that have already been commercializing GPGPU technology?
"Let me see if I have this down right: With the progress of multi-core CPU's, especially looking at the AMD / ATI deal, PC's are moving towards a single 'super chip' that will do everything while phasing out the use of a truly separate graphics system. Meanwhile, supercomputers are moving towards using GPU's as the main workhorse. Doesn't that strike anybody else as a little odd?"
I picture this:
Before:
CPU makers: "Hardware's expensive, keep it simple."
GPU makers: "We can specialize the expensive hardware separately!"
Now:
CPU makers: "Hardware's cheaper and cheaper, let's keep up our profits by making ours more inclusive."
GPU makers: "We can specialize the cheap hardware in really really big number-crunch projects!"
btw, why isn't the reply button showing up? I'm too lazy to hand type the address.
Demented But Determined.
Check out Peakstream (http://www.peakstreaminc.com/). They're a Silicon Valley startup doing a lot of tool development for multicore chips, GPUs and Cell.
They found they could get even more performance by turning off vsync!
We go into NVIDIA's "CUDA" (Compute Unified Device Architecture) here and it's pretty interesting actually.
In other news, the Von Neumann design was discovered to be pretty much Turing complete, but not the best tool for every job. Film at 11.
My honors thesis at college back in 2004 was a framework that would allow you to load pixel shaders (written in Cg) as 'threads' and run them in parallel on one GPU. As far as I can tell nVidia has done the same thing, but taken it a step further by translating from C (and more efficiently, I'm sure).
I guess I should have published that paper back then...oh well.
and they take a lot of memory. Does anyone know if the Nvidia cards will be able to access main system memory, or does that defeat the purpose? (E.g. I am currently running 268 climate states, which each take a couple of hours to run and about 1GB of physical memory, so on my cluster of 4 X2 5000s they should be done in a couple of days.) 40 times the processing power (these GPUs are probably ATIs with 24-48 pipelines, hence the 20-40 times performance) would be awesome (or 128 in the case of the GTX), but where is all the memory gonna come from? Will we see video cards with 64GB of DDR? If it means recoding everything from scratch then this benefit won't trickle down to people like me :(
*wipes tears from eyes and sniffs* so anyone know if this will run on my 6800GT :)
They "CUDA" come up with a better acronym.
I think that implementing the gpu as a collection of configurable ALUs is an awesome idea. I have two gripes:
(1) Power Management : I want at least 3 settings (lowest power, mid-range and max-performance)
(2) Where's the killer app? I value my electricity more than contributing to folding and SETI.
If they address these, I'm a customer... (I'm a cheap bastard who is fine with integrated 6150 graphics)
Why is a GPU so great FOR MATH? Parallel processing (it is on Page 2 of the Wired article linked at the start of the Slashdot summary). If you need to have lots of branching and decision making, it is not as good. The better bandwidth, etc. sure helps, but parallel processing is part of it. That is why they are so great for tasks such as the number crunching involved in graphics (3D is done not by "moving the points" but by changing the base axis around the points-- this is a way of visualizing the math done to transform those point locations to the new point locations when a 3D figure "moves").
So *some* parts of computer transactions can be done in parallel, but if much needs to be in serial, it will ALL be slowed down by the (serial) process that decides which parts go where. If you can make your problems purely non-serial, like math that can be done in chunks and reassembled, without conditions that affect processing between the chunks, THOSE problems can benefit from a parallel processor. Parallel processors are NOT new, but in the home-computing industry they just happen to be represented by GPUs, math co-processors, and not a lot else (dedicated cryptography chips probably too). If there is more demand at home, there will be more manufactured for the home. Currently, games and video are the main home demands, although home audio studios could probably benefit, if those people were to demand a lot more COMPLEX digital signal processing on the fly (maybe more likely audio soundboards?).
Compilers can also compile out-of-order, which is why a C compiler can benefit-- there is a static end result from a given compiler input-- no interaction and choices not defined by the input.
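To put rough numbers on the serial-versus-parallel point above, here is a minimal Python sketch of Amdahl's law. This is the standard textbook formula for that trade-off, not something taken from the Wired article, and the function name and example fractions are illustrative assumptions. The 128 is the processor count quoted for the GeForce 8800.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Best-case speedup when only part of a job can run in parallel.

    parallel_fraction: share of the work (0.0 to 1.0) that parallelizes.
    processors: number of parallel units available.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0.0 and 1.0")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part runs at full cost; only the parallel part is divided up.
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # Illustrative fractions only; real workloads have to be measured.
    for fraction in (0.5, 0.9, 0.99, 1.0):
        print(f"{fraction:.2f} parallel on 128 units: "
              f"{amdahl_speedup(fraction, 128):.1f}x")
```

The point of the sketch is the shape of the curve: a job that is half serial can never quite reach 2x, no matter how many processors the card has.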
Infoport
Folding@Home launched a GPU beta last month that is now publishing data putting donated GPU performance at 20-40 times the efficiency of donated CPU performance.
Obviously some of that is due to GPUs being better than general-purpose CPUs at this sort of math, but how much is also due to the fact that the people who are willing to run a Beta version of Folding@Home on their GPU tend to be the sort of people who would have much better computers overall than those who are merely running the project on their CPUs?
To the guy who utilized pixel shaders as threads for his 'thesis': when you do research, one of the most important steps is background research (related works). Ex: anything on http://www.gpgpu.org/ should have provided a cue for original work. Von Neumann machines are out, they clearly don't scale. Scheme taught us that local mutation may be ok, but global not so much. GPU coding exemplifies this assumption and, more interestingly, makes HPC apps accessible :)
I hope to work on this stuff next year, assuming funding :)
Intel wants more GPU power in their CPU's. NVidia is using features of their GPU to do problem solving. Which one will win out?
These CPUs are not that easy to program, but they run screamingly fast when you make them work right. I think that IBM may have the edge in putting lots of these together for a cluster computing environment (insert Beowulf joke here). IBM is planning a very large Cell based system, an additional member of the Blue Gene family. http://en.wikipedia.org/wiki/IBM_Roadrunner
On the other hand, GPUs will have a huge advantage because of the size of the graphics card market. This will drive prices down and make GPU based computation available to the masses.
The following idea from TFA is what caught my eye:
"In a sign of the growing importance of graphics processors, chipmaker Advanced Micro Devices inked a deal in July to acquire ATI for $5.4 billion, and then unveiled plans to develop a new "fusion" chip that combines CPU and GPU functions."
I can see the coming age of multi-core CPU's not necessarily lasting very long now. We don't tend to need a large number of general-purpose CPU's. But a CPU+GPU chip, where the GPU has for example 128 1.35GHz cores (from the Nvidia press release), and with a new generation of compilers written to funnel sections of code marked parallelizable to the GPU portion, and the rest to the CPU, would be tremendous.
Does Intel have any plans to try to acquire Nvidia?
Attention zealots and haters: 00100 00100
nVidia has PureVideo, ATi has whatever. Why are there still no GPU-assisted MPEG2 (or any other format) video encoders? Modern GPUs will do hardware assisted MPEG decoding, but software-only encoding is still too slow. TMPGEnc could be much faster. Same for the others. It seems as though the headlong rush to HD formats has left SD in the dust.
Great if you want fast answers, but the RAM used in GPUs isn't as robust accuracy-wise as normal RAM.
Soon trojans can put entire mini-OSs on the video card...
Hmm, is anyone porting NetBSD to it yet?
(I'm not serious, well, maybe a little.)
While it's probably too late to sign up for the general-purpose GPU tutorial at Supercomputing '06, there may still be time to get to the "General-Purpose GPU Computing: Practice and Experience" workshop (assuming you're going to Supercomputing to begin with.) Workshop's web page is http://www.gpgpu.org/sc2006/workshop/
The workshop itself has turned into a kind of "GPU and multi-core" forum, with lots of great speakers. NVIDIA's Ian Buck and ATI's Mark Segal will both be speaking to the Wired article's material. And IBM and Los Alamos will be talking about Cell and Roadrunner, among other things.
</shameless plug>
So, I wonder what Dinesh Manocha will be talking about at the workshop... Hmmm....
Google "Dominik Goeddeke" and read his GPGPU tutorial. It's excellent, as far as tutorials go, and helped me bootstrap.
Ok, ok, here's the link...
ATi has AVIVO, and they've been doing hardware-assisted encoding in a variety of formats for some time now. Google it up.
Why would anyone engrave "Elbereth"?
GPUs ok... Supercomputers, specific applications, custom code. I'd have thought it'd be an ideal application.
Intel's 80 core chip wasn't symmetric; most of those cores were stripped-down processors, not x86 standard. Like the Cell, only more so.
nVidia's G80, while not on the same chip, takes this to 128 cores. G90 will support full double-precision math. And although it's separate from the CPU, graphics cards are such a standard part of most systems that by the time five years have elapsed, you'll likely be able to get a quad-core x86 + 256-core DP gfx/HPC system for somewhat less than Intel's fancy new 80-core release alone.
Why would anyone engrave "Elbereth"?
Unfortunately, the new NV80 is still not IEEE754 compliant for single precision (32 bit) floating point math. It is mostly compliant however, so may be usable by some people. Forget it if you want to do 64 bit double precision floats though.
CPUs are inherently good at doing serial jobs, and GPUs are good at doing parallel jobs. GPUs can be thought of as the extreme enhanced graphical equivalent of DSP chips. So basically, any combination of a controlling and parallel execution processor can give you the supercomputing environment you need. Which again brings us back to our traditional supercomputing model, except for one change: the mathematical units have grown faster and massively parallel in nature! We haven't done much past anything Turing-computable anyway: chips growing faster, doing the same thing. So, you first had the CPU. Then you wanted faster graphics. So you separate it and have the GPU (which is more than a video adapter). Then you want faster computation. So you put it together ('cept the video adapter). It's crazy, but they keep shuffling things over the years, people aren't bored of it anyway, everybody buys stuff and everybody wins. Now again: what does a GPU do that a CPU can't?
-Karthik
I'm all for turning my 1 CPU machine + graphics card into a 2 CPU machine when a .NET or Java VM runs on the GPU. Following that, I'd like to see a micro-PC for $25 consisting of a boot ROM and IO interface ports (USB, video, etc.) that lets an off-the-shelf graphics card run as the main CPU, and therefore have a machine without a motherboard CPU.
- Graphics card
- Microcontroller for io board with 1 slot for graphics card + USB ports
- Boot rom
- Tiny wall plugin transformer
- Basic tiny router sized case
----
for $25 or less
First of all, the gf8800 has the same deficiency that the cell has, in that both are really good at performing single precision floating point math. This is great for video processing and the like, but real science has been using 64bit floats since the mid 70's. It might be hard to convince users that they can get the wrong answer, but it'll be really cheap and really fast.
Secondly, the bandwidth to memory is very high, but the amount of addressable memory is very very low. 768MB of memory, divided by 128 processing units, means that the entire problem set for each PE needs to fit in 6MB, otherwise you're bottlenecked going to main memory. Game rendering conveniently tends to reuse a lot of data, and that data compresses very well in memory. Not so with real science data. This is quite analogous to the problems a lot of scientists are having with Blue Gene, which has 256MB of memory available to each PE.
This is not to say that doing HPC computing on the GPU won't happen, it will just be fairly limited in the number of problems that will port well to that environment. For those that do, however, you can't beat the bang for the buck. I suspect that this is mostly for game physics and video transcoding, as those are things that nvidia/amd can sell as an added value. Anything else just doesn't seem to provide much additional revenue, so I can't imagine them putting a lot of effort into supporting it.
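The per-PE arithmetic in this comment is easy to check. A minimal Python sketch follows, assuming an even split of card memory across processing units, which is the commenter's simplification and not a statement about how the hardware actually partitions memory. The function names are made up for illustration.

```python
def memory_per_unit_mb(total_memory_mb: float, processing_units: int) -> float:
    """Memory share per processing unit, assuming an even split (a simplification)."""
    if processing_units < 1:
        raise ValueError("processing_units must be at least 1")
    return total_memory_mb / processing_units


def fits_in_share(working_set_mb: float, total_memory_mb: float,
                  processing_units: int) -> bool:
    """True if one unit's working set fits in its even share of card memory."""
    return working_set_mb <= memory_per_unit_mb(total_memory_mb, processing_units)


if __name__ == "__main__":
    # Figures quoted in the comment: 768MB on the card, 128 processing units.
    share = memory_per_unit_mb(768, 128)
    print(f"Share per unit: {share} MB")
    # Hypothetical working set sizes, for illustration only.
    for working_set in (4, 6, 64):
        print(working_set, "MB fits:", fits_in_share(working_set, 768, 128))
```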
GPUs are slower than CPUs for serialised operations.
It's great for highly parallel processor bound applications, but for anything close to user level apps it's just a waste of silicon.
"GPUs have dedicated circuitry to do math, math, and more math - and to do it *fast*. In a single cycle, they can perform mathematical computations that take general-purpose CPUs an eternity, in comparison."
Sounds like there is a lot of untapped potential. I propose we move GPUs off the external cards, and give them their own dedicated spot on the motherboard. Though, since we will be allowing it to be used for more general applications, we could just call it a Math Processor. Then again, it's not really a full processor like a dual core, so we'll just call it a Co-Processor. This new "Math Co-Processor" will revolutionize PCs like nothing we have ever seen before. Think of it, who would have thought 20 years ago we could have a whole chip just for floating point math!
It's quite obvious that computing is going in a direction where we won't say GPUs or CPUs, but rather serial processors and parallel processors, with the assumption of having both. The Cell processors are a good example of this thought, although they're too heavy on the parallel side. Many tasks do not parallelize well, and will still need a solid serial processor.
Remember this? although it was a failure commercially, it was the right idea after all: lots of small processing units that are able to process in parallel big chunks of data; that's what modern GPUs do.
So what we need now is for this kind of architecture to pass in CPUs (maybe already scheduled from what I've read lately) and then a programming language where operations are parallel, except when data dependencies exist (functional languages may be good for this task).
Until these things are able to do double precision, their applicability to general HPC problems remains very limited. Make them do DP arithmetic, and benchmark with SPEC and McCalpin's STREAM benchmark, and then we'll see. Oh and BTW, make a Fortran 90/95 compiler available.
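For anyone wondering why single precision is such a sticking point, here is a minimal Python sketch. Python floats are 64-bit doubles, so the sketch emulates 32-bit arithmetic by rounding through the standard `struct` module after every addition. The helper names are made up for illustration, and this says nothing about how any particular GPU rounds.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_float32(values) -> float:
    """Accumulate with every intermediate result rounded to 32 bits."""
    total = 0.0
    for v in values:
        total = to_float32(total + to_float32(v))
    return total


if __name__ == "__main__":
    # A 32-bit float has a 24-bit significand, so above 2**24 it can no
    # longer represent every integer and small increments get rounded away.
    big = float(2 ** 24)
    values = [big, 1.0, 1.0]
    print("64-bit sum:", sum(values))
    print("32-bit sum:", sum_float32(values))
```

In a long-running simulation this kind of lost increment happens over and over, which is the worry behind the "wrong answer, but really cheap and really fast" remark elsewhere in the thread.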
Perhaps finally, we will see the popular commercial/shareware/freeware programs taking advantage of GPU acceleration.
There are two main areas that I would love to see accelerated by GPU: DivX or another MPEG4 codec, and an MP3 codec.
Due to the asymmetry in CPU usage it is the ENCODING that would be revolutionized by GPU acceleration. I am sure I am not alone when I think of these two areas as the most time consuming tasks my home PC is set upon. Yes, ATI may have a solution, but I want to see support for both Nvidia and ATI in a more generally available encoder solution.
Everyone knows that the language of supercomputing is Fortran, for historical (legacy code) as well as truly practical reasons, such as a braindead language (very good for compiler optimizations and automatic rewriting) and efficient and predictable (loop unrolling, peephole optimization, optimal memory access without pointer indirections and heavy objects to pass between functions) linear algebra handling, which is the core of heavy numerical computing. What are they waiting for to release a Fortran compiler for the GPU? I think many chemical (Gaussian, GAMESS), physical (WIEN, VASP...), biological or engineering packages (STAR-CD) and math libraries (ScaLAPACK) are written in FORTRAN.
Google passes Turing test : see my journal
My rationale is something along the lines of how Apple may have implemented hardware assisted vector operations, falling back to scalar equivalents when AltiVec wasn't available.
On kernel startup (or dynamically, assuming hot swapping GPUs!) the system could load a configuration for a shared library to take advantage of GPU acceleration. Whether this happened when coding to a specific API or could somehow be trapped in the platform C lib or at a kernel level I'll leave as an exercise for the reader. [I wouldn't know, I just program for a well-known virtual machine. But as that VM might soon be GPLed, hopefully some well meaning soul would transparently integrate such a technology, at least for its math libraries. E.g. offloading work to a GPU has already seen massive improvements in Swing's performance without touching a line of application level code, or recompiling for that matter. Effectively that virtual machine saw a HW upgrade!]
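The fallback idea can be sketched in a few lines of Python. Everything here is hypothetical: there is no real GPU library involved, `select_dot` and `scalar_dot` are made-up names, and the "accelerated" backend is whatever callable the caller passes in. It only illustrates the pattern of picking an implementation once and keeping application code unchanged.

```python
from typing import Callable, Optional, Sequence

DotFunction = Callable[[Sequence[float], Sequence[float]], float]


def scalar_dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain scalar fallback: always available, one multiply-add at a time."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def select_dot(accelerated: Optional[DotFunction] = None) -> DotFunction:
    """Pick the implementation once, e.g. at startup.

    `accelerated` stands in for a hypothetical GPU-backed routine. If none
    is detected, callers get the scalar version and never notice.
    """
    return accelerated if accelerated is not None else scalar_dot


if __name__ == "__main__":
    dot = select_dot()  # no accelerator detected in this sketch
    print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

Application code only ever calls `dot`, so swapping in an accelerated backend later would not require touching or recompiling it, which is the effect described for Swing above.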
As a 3D graphics engine programmer, I like programming 3D graphics engines. I like GPUs. I like the fact they are tailored to graphics, and the clever things the artistic graphics programmers can do with them.
:(
So, when Nvidia announce CUDA I start looking for the April Fools joke.
Alas, it's not
They are genuinely creating a generalised multi-cell processor on their GPUs. The only purpose of this is to move the processing of generalised problems from one chip (the x86) to another (an nvidia gpu). It's easier to get people to replace their graphics card than CPU, I guess.
Not one of the examples in the announcement highlighted how graphics are to be improved with this mechanism. Guess what the G in GPU stands for...
I would suggest reading the net while you're waiting for the computation to finish, but I'm sitting here with Mozilla using 150MB of RAM and burning 98% of CPU because it's gotten itself into some kind of loop.... But Nethack is a nice low-CPU low-RAM game that shouldn't bother your CPU much.
Bill Stewart