Why 'Gaming' Chips Are Moving Into the Server Room
Esther Schindler writes "After several years of trying, graphics processing units (GPUs) are beginning to win over the major server vendors. Dell and IBM are the first tier-one server vendors to adopt GPUs as server processors for high-performance computing (HPC). Here's a high-level view of the hardware change and what it might mean to your data center. (Hint: faster servers.) The article also addresses what it takes to write software for GPUs: 'Adopting GPU computing is not a drop-in task. You can't just add a few boards and let the processors do the rest, as when you add more CPUs. Some programming work has to be done, and it's not something that can be accomplished with a few libraries and lines of code.'"
I've heard that many programmers have issues coding for 2 and 4 core processors. I'd like to see how they'll adapt to running "hundreds of threads" in parallel.
Sent from my iPhone 5
This is a long-standing issue. If your programs don't just "magically" run faster, you can count out 90% or more of the programs that might otherwise benefit from this.
"No matter where you go, there you are." -- Buckaroo Banzai
The sysadmins need new machines with powerful GPUs, you know, for business purposes.
Oh and, they sell ERP software on Steam now, too, so we'll have to install that as well.
I was interested in CUDA until I learned that even the simplest of "hello world" apps is still quite complex and quite low-level.
Nvidia needs to make the APIs and tools for CUDA programming simpler and more accessible, with solid support for higher-level languages. Once that happens, we could see adoption skyrocket.
A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
"OpenCL is managed by a standards group, which is a great way to get nothing done"
I don't see the correlation.
In soviet Russia, God creates you!
Sounds like a perfect job for OpenCL. When a program is rewritten for OpenCL, you can just drop in CPUs or GPUs and they get used.
It doesn't take a few libraries and lines of code... It takes a SHITLOAD of libraries and lines of code! - Lone Starr
I remember reading that IBM was planning to put Cell in mainframes and other high-end servers several years ago, supposedly to accrue the same benefits. I don't really know whether that was ever followed through on; I haven't kept track of the story.
American Third Position
Finally, a real choice!
I'm really interested in using GPGPU for my physics calculations. But you know - I don't want to learn Nvidia's low-level, proprietary (whateveritis) in order to do an addition or multiplication, which may or may not outperform the CPU version. What would be _really_ great is stuff like porting the standard "low-level numerics" libraries to the GPU: BLAS, LAPACK, FFTs, special functions, and whatnot - the building blocks for most numerical programs. LAPACK+BLAS you already get in multicore versions, and there's no extra work on my part to use all cores on my PC. Please, computer geeks (i.e. more computer geek than myself), let me have the same on the GPU. When that happens, we can all buy Nvidia HotShit gaming cards and get research done. Until then, GPGPU is for the superdupergeeks.
So.. webpages will soon be available in 3D with anti-aliasing and realistic shading?
So why a GPU rather than a dedicated DSP? Seems they do pretty much the same thing except a GPU is optimised for graphics. A DSP offers 32- or even 64-bit integers, has had 64-bit floats for a while now, allows more flexible memory write positions, and can use the previous results of adjacent values in calculations.
...coming soon to a server farm near you!
No mention of Microsoft's RemoteFX coming in Windows Server 2008 R2 SP1? RemoteFX uses the server GPU for compression and to provide 3D capabilities to the desktop VMs.
Any company large enough for a datacenter is looking at VDI, and RemoteFX is going to be supported by all of the VDI providers except VMware. VDI, not the relatively niche case of massive calculations, will put GPUs in the datacenter.
If your data center is running stochastic tests, trying scenarios on derivative securities, it's a big win. If it's serving pages with PHP, zero win.
There are many useful ways to use a GPU. Machine learning. Computer vision. Finite element analysis. Audio processing. But those aren't things most people are doing. If your problem can be expressed well in MATLAB, a GPU can probably accelerate it. MATLAB connections to GPUs are becoming popular. They're badly needed; MATLAB is widely used in engineering and scientific work, but it's not as fast as it should be.
There's always an application for that.
I could almost EOM that. They're massively parallel, deeply pipelined DSPs. This is why people have trouble with their programming model.
The only difference here is the arrays we're dealing with are 2D and the number of threads is huge (100s-1000s). But each pipe is just a DSP.
OpenCL and the like are basically revealing these chips for what they really are, and the more general purpose they try to make them, the more they resemble a conventional, if massively parallel, array of DSPs.
There are a lot of comments on this subject along the lines of "Why couldn't they make it easier to program?" Well, it always boils down to fundamental complexities in design, and those boil down to the laws of physics. The only way you can get things running this parallel and this fast is to mess with the programming model. People need to learn to deal with it, because all programming is going to end up heading this way.
It is a Graphics Processing Unit, not a Gaming Processing Unit. Sure, they are great for gaming, but also very useful for other types of 3D and 2D rendering of graphics.
Saw the title of this article and wondered "how will Las Vegas casinos make the move to have all of my gaming chips put onto a server."
While it is true that parallelization does not necessarily help a single program run faster or more efficiently, multi-CPU systems do allow more programs to run concurrently. In a major corporate context, there are thousands of jobs running at any given time. The greater the effective number of CPUs (and the more memory), the better for keeping costs down.
Sounds very interesting to me, as I'm pretty sick of upgrade treadmills. OnLive would probably also wipe out hacked-client based cheating (though bots and such might still be doable). It would also allow bleeding-edge games to be enjoyed by those without the best hardware, increasing adoption rates for those types of games.
Make sure everyone's vote counts: Verified Voting
So a car analogy would be: CPUs are normal cars, and GPUs are drag racers... high speed and no brakes?
I've done a little CUDA programming, and I've yet to find significant speedups doing it. Every single time, some limitation in the arch keeps it from running well. My last little project ran about 30x faster on the GPU than the CPU; the only problem was that the overhead of getting it to the GPU + computation + overhead of getting it back was roughly equal to the time it took to just dedicate a CPU.
I was really excited about AES on the GPU too, until it turned out to be about 5% faster than my CPU.
Now if the GPU were designed more as a proper coprocessor (à la early x87, or early Weitek) and integrated into the memory hierarchy better (put the funky texture RAM and such off to the side), some of my problems might go away.
Even better would be a language that didn't need horrendous amounts of crappy boilerplate code for every API.
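To put rough numbers on the transfer-overhead complaint above (a kernel that runs 30x faster but ends up no faster end to end), here is a minimal Python sketch of the arithmetic. The function names and every timing in it are made up for illustration; they are not measurements from the poster's project, and a real card will overlap or batch transfers in ways this ignores.

```python
def gpu_total_time(cpu_time, kernel_speedup, copy_in, copy_out):
    """Wall-clock time of the GPU path: copy to device + kernel + copy back.

    All arguments are hypothetical timings in seconds, except
    kernel_speedup, which is how many times faster the kernel alone
    runs on the GPU than the same computation does on the CPU.
    """
    kernel_time = cpu_time / kernel_speedup
    return copy_in + kernel_time + copy_out


def effective_speedup(cpu_time, kernel_speedup, copy_in, copy_out):
    """End-to-end speedup once both transfers are counted."""
    return cpu_time / gpu_total_time(cpu_time, kernel_speedup, copy_in, copy_out)


if __name__ == "__main__":
    # Assumed numbers: 30 s on the CPU, a 30x faster kernel (1 s),
    # and 14.5 s of copying in each direction.
    print(effective_speedup(30.0, 30.0, 14.5, 14.5))
    # Same kernel with the data already resident on the card.
    print(effective_speedup(30.0, 30.0, 0.0, 0.0))
```

With those assumed timings the copies eat the whole gain, which is the situation described above. The second call shows why keeping data on the device across several kernels matters more than the raw kernel speedup.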
What about FPGAs? You could install FPGAs in whole server farms, making it cheaper and also upgradable through software.
Don't want a GPU, but want something else? "Install" it without opening the cabinet!
I for one wouldn't mind gaming chips moving to the bedroom, if you know what I mean.
o hai
Erlang has good support for threads and parallelism; I think it would be a great idea to add GPU support to Erlang. Erlang has a natural way to write parallel applications in a functional programming style, and it also implements the notion of "green threads" very well. Does anyone see the prospects?
We've already known about what we could have done.
We've been writing code to use multiple cores for some time already. The trick is to (also) avoid locking, because locking generates serialization. Virtualization has its limits, since any form of communication between the virtual machines (and their processes) becomes expensive.
I'd suggest using newer techniques such as atomic variables (atomic builtins), with locking only for whatever really needs to be shared, and then divide and conquer! The easiest thing is to delegate transactions to specific threads, so that the data that pertains to the transaction is kept in one place -- doesn't move around. I know, easier said than done.
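For what it's worth, the "delegate transactions to specific threads" idea above can be sketched in a few lines of Python. Everything here (the `(account, amount)` tuples, `run_transactions`, the worker count) is a hypothetical illustration, not anybody's production code. Each worker thread owns one shard of the data outright, so the account data itself needs no lock; the only synchronization left is inside the standard-library `queue.Queue` objects that hand the work over. It covers the delegation half of the suggestion only, not atomic variables.

```python
import queue
import threading


def _worker(inbox, balances):
    """Apply transactions to a shard that only this thread ever touches."""
    while True:
        item = inbox.get()
        if item is None:  # sentinel: no more work for this worker
            break
        account, amount = item
        balances[account] = balances.get(account, 0) + amount


def run_transactions(transactions, num_workers=4):
    """Route each (account, amount) pair to the thread that owns the account."""
    inboxes = [queue.Queue() for _ in range(num_workers)]
    shards = [{} for _ in range(num_workers)]
    threads = [
        threading.Thread(target=_worker, args=(inboxes[i], shards[i]))
        for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    for account, amount in transactions:
        # The same account always maps to the same worker, so its
        # balance never moves between threads.
        inboxes[account % num_workers].put((account, amount))
    for inbox in inboxes:
        inbox.put(None)
    for t in threads:
        t.join()
    # Shards hold disjoint accounts, so merging is a plain union.
    merged = {}
    for shard in shards:
        merged.update(shard)
    return merged


if __name__ == "__main__":
    txns = [(i % 10, 1) for i in range(1000)]
    print(run_transactions(txns))
```

One caveat: in standard CPython the global interpreter lock keeps CPU-bound threads from running in parallel, so this shows the ownership pattern rather than a speedup. The same layout carries over to languages with real parallel threads.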
If you wish for your computations to be parallel at a level higher than algorithm steps (i.e. you can build libraries upon libraries that compute efficiently in parallel throughout the layers of libraries), then neither the CUDA driver API nor the CUDA runtime API (nor OpenCL or DirectCompute) is very good. One example for CUDA: using the CUDA APIs alone, it is generally not possible to take advantage of Fermi's concurrent kernel execution feature across all (or even very many) of the CUDA kernels in a program.
MPI (message passing interface) gives parallel computation at the clustering level and the Kappa Library gives you this at the library component level. If somebody knows about something other than MPI or Kappa that does this and is available for general use, I would be interested to hear about it.