Impressive GPU Numbers From Folding@Home
ludd1t3 writes, "The Folding@Home project has put forth some impressive performance numbers with the GPU client that's designed to work with the ATI X1900. According to the client statistics, there are 448 registered GPUs that produce 29 TFLOPS. Those 448 GPUs outperform the combined 25,050 CPUs registered by the Linux and Mac OS clients. Ouch! Are ASICs really that much better than general-purpose circuits? If so, does that mean that IBM was right all along with their AS/400, iSeries product which makes heavy use of ASICs?"
That's pretty lopsided, but I suppose some of it could be explained away by GPUs not chewing through OS code or having to play nice for memory, so they'd be a bit more efficient. It could be that most of those Linux and Mac OS systems are long in the tooth, but I suspect someone's missed a few decimal places somewhere. I do love how quickly a theory is posed and the OP starts to run with it. E.g., I look at the balance of my checking account and see there's $1,000 more in there than I expect there to be and immediately form the hypothesis that it's money to spend, without considering whether my rent check has gone through yet. Could be a rough time ahead if I went shopping with it. Either that or the GPU computers are on more than the others.
Whoops, used an old pentium for the math, never mind.
A feeling of having made the same mistake before: Deja Foobar
Everyone knows that ASCII text is faster than binary.
Are ASICs really that much better than general-purpose circuits?
Generally ASICs are much better than general-purpose circuits except in general cases.
Custom app written to run on hardware specifically designed to run apps like it, outperforming general purpose CPUs? Newsflash from Ric Romero!!1!
I want to delete my account but Slashdot doesn't allow it.
So, will someone please create a really pretty 3D screensaver representing the folding calculation process? I'd love to see a representation with hi-res lighting and texturing, full transforms, and user-scalable views at 400 million triangles/sec. Thanks.
Solomon
"Twice half-assed makes an ass whole." --Solomon K. Chang
I'm waiting for the clients that use all the other ASICS in modern computers. e.g. sound card.
Stats page shows Windows clients putting up 149 TFlops, GPUs only 29. What kind of crack are you smoking?
Maybe I'm missing some subtlety in the OP somewhere, but if GPUs weren't better at what they're doing than CPUs, there wouldn't be a point in having a GPU in the first place.
...and if you have a problem that can be expressed in terms of the problem space the GPU is designed to handle, then that problem is going to run faster on the gpu than on the CPU.
We're all born with nothing.
If you die in debt, you're ahead.
The purpose of a general-purpose CPU is to handle all calculations. For this task, which is very specific, a GPU may be that much better.
http://blindscribblings.com - Tasty pop-culture in conceptual fashion.
GPUs are, for the most part, highly specialized parallel computers. Virtually all modern CPUs are serial computers. They do essentially one thing at a time. Because of this, most modern programming languages are tailored to this serial processing.
Making a general purpose parallel computer is very, very hard. It just so happens that you can use things like shaders for more than just graphics processing, and so via OpenGL and DirectX you can make GPUs do some nifty things.
In theory, and indeed often in practice, parallel computers are much, much faster than their serial counterparts. Hence the reason a GPU that costs $200 can render incredible 3D scenes that a $1000 CPU wouldn't have a prayer trying to render.
Macs and Linux suck. This is SCIENTIFIC proof.
... to start heating your house with your computers ;)
I actually installed boinc with seti on several of my machines last night and it worked quite well to heat part of the house (us Canadians need to turn the heater on earlier). Took a bit of time to get started, but it was nice and toasty in the morning.
Does anyone know if this method is less efficient in generating heat than using a space heater? Slower perhaps...
If you're going to use energy by turning on the wall heater anyways, why not use it to crunch some numbers?
1q2w3e4r5t6y7u8i9o0pqawsedrftgthyjukilo;p'azsxdcf
Q: Are ASICs really that much better than general-purpose circuits?
Yes, that's why anyone would bother.
Q: If so, does that mean that IBM was right all along with their AS/400, iSeries product which makes heavy use of ASICs?
A: Yes and no. More relevant: will Cell pave the way to good price/performance? The problem with the iSeries line is not so much performance as price/performance. For the same cost as an iSeries config you can cluster a bunch of xSeries and beat it through sheer brute force of CPUs. If the QS20 and follow-ups yield better price/performance, it could be interesting.
XML is like violence. If it doesn't solve the problem, use more.
Fast ASICs are better than below-average CPUs, though.
Remember, the GPUs required for this are pretty new, while any CPU can run the normal client.
Look at how new high-end graphics cards have more RAM than the computer I bought just a couple years ago, for example. It's not surprising that new high-end GPUs are faster than average CPUs. Consider, also, that some fraction of people who would have otherwise run the normal client, and also have high-end systems (as demonstrated by their graphics card), have removed themselves from the normal CPU pool.
The "CPU" half of this statistic, then, is full of people with relatively wimpy computers. Are we surprised?
Are there actual benchmarks yet comparing average time per WU for GPU vs CPU?
For all we know the majority of those Linux and Mac clients are old P2s and G3s.
Whenever the offence inspires less horror than the punishment, the rigour of penal law is obliged to give way...
"...makes heavy use of ASICs..." Only someone raised in the x86 culture would find it flabbergasting that special-purpose ICs are 3 orders of magnitude faster than a general-purpose program.
What's really exciting is that if only 10% of the PCs that are currently running the CPU version switch to the GPU version, the work output will increase by a factor of 6. What does that mean for the researchers using this data? Will they get the answers they're looking for in a matter of years instead of decades?
GPU Speed % vs OS Type
Windows 6902%
Mac OS X 12818%
Linux 5370%
GPU 100%
Total 5889%
An average of 5889% faster than other "OS's" or PU's.
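A quick sanity check on the "factor of 6" claim, as a rough sketch only. It assumes the 5889% figure above means one GPU client does the work of about 58.89 average CPU clients, and that machines which switch stop contributing CPU work. The function name is made up for illustration.

```python
def output_after_switch(gpu_vs_cpu_ratio, fraction_switched):
    """Total output relative to an all-CPU pool (baseline = 1.0).

    Assumes every switched machine contributes gpu_vs_cpu_ratio times
    the work of an average CPU client and no longer runs the CPU client.
    """
    remaining_cpu = 1.0 - fraction_switched
    gpu_share = fraction_switched * gpu_vs_cpu_ratio
    return remaining_cpu + gpu_share


# 5889% from the table above, read as a 58.89x speed ratio (assumption).
speedup = output_after_switch(58.89, 0.10)
print(f"Relative output with 10% on GPUs: {speedup:.2f}x")
```

Under those assumptions the result lands a bit under 7x, which is in the same ballpark as the factor of 6 quoted above.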
Not so much to make the poster look less like a moron, e.g. "Are ASICs really that much better than general-purpose circuits? If so, does that mean that IBM was right all along with their AS/400, iSeries product which makes heavy use of ASICs?", but to spare the rest of us having their eyes roll back in their heads.
I used to have excellent vision, but reading these submissions to Slashdot is giving me eye strain from the frequent and violent eye rolls!
If the same processor can be used to generate eye candy and cure cancer, I wouldn't call it application specific.
Escher was the first MC and Giger invented the HR department.
If he is needing the heater this early in the year, it is a safe bet he lives in a climate where a heat-pump alone does not give enough delta-T to work all winter long.
P.S. - all electric heaters have the same efficiency, assuming no energy is "wasted" as visible light. The difference between them basically comes down to radiant vs. convection heat. Which is more useful depends on your circumstances. Radiant heat has the advantage of heating you and not the air.
I was curious about this so I did a bit of reading on their site. It seems like the GPUs are only useful for certain types of calculations. So while the GPUs can get a huge amount done fast, they still need processors to handle stuff GPUs aren't made for. Another factor is that the specific GPU model determines what types of calculations it can handle. That's the reasoning behind only supporting certain kinds of ATI cards: they can handle enough different types of calculations that they're worth using. As far as the PS3 is concerned for f@h, I would be concerned about overheating if you're running it like that for long periods of time. It's already been a problem to some extent for the Xbox 360, and the Cell processor is even more powerful. PS3s are very expensive machines (especially at launch) to be using for f@h if there's a risk of overheating.
If a GPU can be so effective, seems like they might be better off building a cluster of ATI powered pc's and running the calculations in-house. I bet that's a lot cheaper than a supercomputer.
But i guess you can't argue with 'free'
Anyone care to explain?
This article has recently been linked from Slashdot. Please keep an eye on the page history for errors or vandalism.
These days high-end graphics cards are multiprocessor DSP systems. That they're also ASICs is too general to be informative here. Those DSPs are programmable like the general-purpose processors, but they wouldn't be as efficient in normal programs. However, in certain types of programs they're very fast due to their simplified memory architecture, pipelining etc. I think it would be more accurate to ask:
"Are multiprocessor DSP systems really that much better than general-purpose multiprocessor systems?"
Usually the speed comes with the loss of programmability. Programs for those DSPs have to be designed with message-passing, tight threading and memory efficiency in mind, so it won't be easy to take advantage of the potential. It's interesting to see how far this will go.
Take one hundred people with computers, and who have an interest in Folding@Home. Offer them a CPU-driven version of the app, and 100 computers will be running the CPU-driven app, regardless of the age/performance of the machine.
Now, offer them a GPU-driven alternative. For the most part, the only people that will install and run it are those with a fancy-schmancy video card capable of running it, and for the most part, the only people that have a fancy-schmancy video card capable of running it have high-performance computers as well (or at least more recent computers that came with compatible cards.)
So let's say that's ten out of the hundred, and those ten are statistically likely to have had the highest-performing CPUs as well; so you've pulled the top ten performers out of the CPU-client pool, and thrown them in the GPU-client pool. Even if you didn't switch those ten people over to the GPU, you could probably isolate those computers' CPU-client performance numbers from the other 90 and find that they're disproportionally faster than a larger number of the slower computers.
There's still more to the story, of course, but you really are taking the highest-performing computers out of the CPU pool and into the GPU pool. The exception would be high-performance servers with lousy/no graphic cards, but those are likely working so hard to perform their normal business that Folding@Home isn't a priority.
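The selection effect described above can be sketched with toy numbers. The speeds below are entirely made up for illustration (they are not Folding@Home data), and the "ten fastest machines switch" rule is the post's hypothetical, not a measured fact.

```python
from statistics import mean

# Hypothetical relative speeds for 100 volunteer machines (1 = slowest).
# These are made-up numbers for illustration, not Folding@Home data.
cpu_speeds = list(range(1, 101))

# Suppose the ten fastest machines also own a capable video card and
# move to the GPU client, leaving the CPU pool.
switchers = sorted(cpu_speeds)[-10:]
remaining = sorted(cpu_speeds)[:-10]

print("mean CPU speed, everyone:       ", mean(cpu_speeds))
print("mean CPU speed, after switch:   ", mean(remaining))
print("mean CPU speed of the switchers:", mean(switchers))
```

Even with nothing else changing, the average of the remaining CPU pool drops once the fastest machines leave it, which is the point being made about the per-CPU numbers.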
So when are we going to see (x86/64) motherboards with a socket for a standard processor and a socket for a vector processor?
Couldn't we finally have graphics cards that only give output to the screen and separate vector processors with a standardized interface / instruction set?
I seriously don't see IBM's AS/400 as the wave of the future. For those of you who get to support these green screen monsters - keep up on your PTF's. I had the unpleasant task of migrating from an AS/400 to an i5 shortly after I started working for a university. We hadn't applied any updates in a few years and it was a nightmare trying to get our different software packages to run on the new system. If it wasn't a licensing issue then it was a hardware compatibility problem. Turns out the new i5's ship with gig ethernet controllers built into the motherboards that don't support older protocols. Fun for all - especially since most OS/400 applications are horribly old. It's solid as a rock over all - as long as you don't hit refresh too many times and lock up 99% of the cpu.
... I was a recent user of that new thing, Linux, and someone made a patch to use the fpu to accelerate memory content transfers, which that someone claimed was an often used operation. (the original patch page seems to be missing...)
;-) ... to speed up things like Oo.o ?
FFW to now, does anyone know whether we could make use of this GPU... e.g., Nvidia MX 440
If so, how?
Thx.
That does sound impressive.
:(
:)
Even if, as I imagine, many of the Linux clients aren't exactly top-end CPUs. Usually it seems the top-end GPU is as complex as the top-end CPU of the time. I know the transistor count was close when I built my last complete setup. Surely my 1.6 P4 (soon-to-be Linux box) gets trashed for complexity/throughput by a new video card by now.
That, and like others said, a targeted client and screaming memory for the GPU is gonna rock. It would be closer if the CPU client was aimed at a particular CPU, but no one is gonna make a dozen general clients to cover each generation and brand of CPU to best use each little feature.
but yowsers that's a lot of umph
X1900 - 48 pixel shader processors plus 8 vertex shaders. Assuming you manage to run them all equally in parallel: 56 processors.
Standard CPU - 1 core (assuming dual cores get read as 2 CPUs).
448 GPUs x 56 = 25,088 effective processors all with on card memory.
25,050 CPUs x 1 core = 25,050 effective processors all dealing with system busses etc.
In short, if you're performing one simple task trillions of times, many very simple, highly optimized processors with dedicated memory do the job better than even a similar number of much more capable processors that have to play nice across a whole system.
And this ignores the number of old couple-hundred-megahertz systems that people don't use anymore and so hand over to the task, vs. X1900s being the very high end of ATI's most recent line.
For massively parallel tasks like rendering pixels, folding proteins, compressing frames of a movie, etc. I'd absolutely love large quantities of a simple processor. For most other tasks, given present technology, I'd still side with fewer more able processors. Either way, comparing 448 of something with 56 processors within it to 25,000 single processors and saying, "But 448 is SO much less than 25,000!" is an unfair comparison.
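The head count in this post is easy to reproduce. This is just the post's own arithmetic written out; the one-core-per-CPU figure and the idea that all 56 shader units count as equal "processors" are the post's assumptions, not established facts.

```python
SHADERS_PER_X1900 = 48 + 8      # pixel + vertex shader units, per the post
GPU_COUNT = 448
CPU_COUNT = 25_050
CORES_PER_CPU = 1               # the post's assumption: one core per CPU

gpu_units = GPU_COUNT * SHADERS_PER_X1900
cpu_units = CPU_COUNT * CORES_PER_CPU

# Print both totals and how many GPU shader units there are per CPU core.
print(gpu_units, cpu_units, gpu_units / cpu_units)
```

Counted this way the two pools are almost exactly the same size, so the interesting difference is per-unit throughput and memory access rather than raw numbers of boxes.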
Whenever writing a significant bit of code, ask yourself if you can represent the functionality graphically. If so, recode the function as a graphical problem and use the GPU. If these figures are right, this conversion can carry a 99% 'inefficiency overhead' and still run faster on the GPU than the original code could on the CPU...
A hammer is optimised for the task of hammering nails in boards, it will do this task significantly better than say a screwdriver.
Same with a GPU: a GPU is designed for one specific task (or a number of specific tasks), among them folding. Not because of Folding@Home, mind you, but because folding is one of the "operators" in image processing.
Now someone is using a GPU for the task for which it was designed; frankly I'm actually surprised they didn't get any more out of it.
QED
Imagine if they had developed this application for NVidia video cards, probably 2x the speed!!1! Go ahead, mod me a troll....I will apologize tomorrow :o)
"My immediate reaction is "WTF? What kind of moron doesn't make things 64-bit safe to begin with?" Linus
That might very well be true, but his question was "Does anyone know if this method is less efficient in generating heat than using a space heater?"
You are also talking about the most efficient NG heaters around (and we can have a whole other discussion on the problems HE furnaces can cause in a house not designed for them) and your reply to yourself is quoting NG prices at their lowest point this year. I believe prices are down well over 70% from their post-Katrina peak, FWIW.
But yeah, resistive electric heat is not often a cheaper choice than NG.
And on the bulbs, while the heat they produce might be more expensive than gas heat, it does go a surprisingly long way to making CF bulbs less attractive from a return-on-investment POV.
PF "losses" are not losses, it is power that is in effect returned back to the source. One can simply treat it as power that isn't delivered at all. Therefore the original posting can be considered as essentially correct.
>Using your CPU as a space heater is not a bad idea. It is 100% efficient.
Not really. Consider exergy. Yes, your CPU is just as efficient as any electric space heater. However, consider that the alternative is probably burning natural gas or oil in a furnace. If you burn fuel for heat, 90%+ of the chemical energy goes to producing heat (the rest is lost as unburnt hydrocarbons in the exhaust). If you burn fuel to spin a turbine at a power plant, only about 40% goes to electrical energy, and unless it's a cogeneration plant which uses the waste heat for industrial purposes, the rest is lost as heat up the smokestacks. So, starting from the fossil-fuel source, electrical heating is less than half as efficient as burning fuel for heat. If you do need to heat using electric power, it's much more efficient to use that electricity to pump heat in from a lower temperature outside than it is to turn that electricity itself into heat.
If you are stuck with electric (non-heat-pump) heating in your house, however, you are correct: There is absolutely no reason not to run your CPU or any other electrical appliance full tilt.
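The fuel-to-heat comparison above can be put into numbers. This is a back-of-the-envelope sketch: the 90% and 40% figures come from the post, while the heat pump coefficient of performance (COP) of 3 is an assumed illustrative value, and transmission losses are ignored.

```python
FURNACE_EFFICIENCY = 0.90      # fuel -> heat at home, from the post
POWER_PLANT_EFFICIENCY = 0.40  # fuel -> electricity, from the post
HEAT_PUMP_COP = 3.0            # assumed value, NOT from the post


def heat_per_unit_fuel(plant_efficiency, heat_per_unit_electricity):
    """Heat delivered at home per unit of fuel energy burned at the plant.

    Ignores transmission losses to keep the sketch simple.
    """
    return plant_efficiency * heat_per_unit_electricity


# A CPU or a resistive space heater turns 1 unit of electricity into 1 unit of heat.
resistive = heat_per_unit_fuel(POWER_PLANT_EFFICIENCY, 1.0)
heat_pump = heat_per_unit_fuel(POWER_PLANT_EFFICIENCY, HEAT_PUMP_COP)

print(f"furnace:   {FURNACE_EFFICIENCY:.2f}")
print(f"resistive: {resistive:.2f} ({resistive / FURNACE_EFFICIENCY:.0%} of the furnace)")
print(f"heat pump: {heat_pump:.2f}")
```

With these inputs resistive heating, CPU included, delivers a little under half the heat per unit of fuel that the furnace does, which matches the "less than half as efficient" statement. The heat pump line depends entirely on the assumed COP.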
Imagine if Macs had stayed with PPC and used a Cell-based version like IBM's server; it would have 8x the grunt.
And before anyone says an SPE cannot run general code, look at the instruction set: it's as good as if not better than the old 68k chips, in fact much better, as it has lots of cool math SIMD-type instructions. What's 90% of C/C++ code? Lots of IFs and variable assignments, and structure memcpys. Now if the whole OS would thread to all SPEs on demand it would fly.
Liberty freedom are no1, not dicks in suits.
Look at the number of TFLOPS per active CPU by OS.
I took (TFLOPS/active CPU)*1000 to get a readable number, i.e. GFLOPS/CPU:
Windows is .948
Mac is .51
Linux is 1.21
And GPU is 65!
The source:
http://fah-web.stanford.edu/cgi-bin/main.py?qtype=osstats
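For anyone who wants to redo the per-client math, here is a small sketch. The 29 TFLOPS and 448 GPUs come from the story summary; the per-CPU figures are the ones quoted in this post, and pairing .948 with Windows and .51 with Mac is my reading of it, so treat that as an assumption.

```python
GPU_TFLOPS = 29
GPU_COUNT = 448

# 1 TFLOPS = 1000 GFLOPS
gflops_per_gpu = GPU_TFLOPS * 1000 / GPU_COUNT

# GFLOPS per active CPU as quoted in the post above (OS pairing assumed).
gflops_per_cpu = {"Windows": 0.948, "Mac": 0.51, "Linux": 1.21}

print(f"GFLOPS per GPU: {gflops_per_gpu:.1f}")
for os_name, gflops in gflops_per_cpu.items():
    print(f"GPU vs {os_name}: {gflops_per_gpu / gflops:.0f}x")
```

The per-GPU figure comes out just under 65 GFLOPS, in line with the "GPU is 65" number, and the ratios are roughly the same as the percentages another poster listed.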
The average Linux user probably has a decent AMD Athlon,
The average Windoze user has a P4 Dell.
Athlons just crunch the math better.
Power tends to corrupt, and absolute power corrupts absolutely.
A vehicle can be super efficient when designed to take one person from point A to point B over smooth terrain. When you start adding requirements like carrying a family of people with 50 cubic feet of junk and an attached trailer over both smooth terrain and off road, your efficiency drops tremendously. [/obligatory car metaphor]
The more specifically you can narrow down the problem set you're trying to solve, the faster you can solve it. The more specific your tool, the better it will work on that problem.
Why is a high-end computer gaming rig thousands more than a comparable next-gen console? Because it's a lot less specific than a console. Why are GPU's so insanely good at crunching linear streams of parallel floating point operations? Because that's all they do.
The ______ Agenda
So since we are talking games...
The top of the line iMac comes with a 7300GT Nvidia card. You can up it to a mid range 7600GT card at most. Now with Windows installed, do you think that the 7600GT would drive the 24" monitor as well as oh say a dual 7900GT SLI setup?
Oh, get a Mac Pro you say. Well except that you still have no SLI or Crossfire support and extremely limited choices of video cards. And it costs an arm and a leg. Sure it's cheap for a dual Xeon workstation - but if you just want to play games you are better off with a slower CPU and a top of the line graphics card setup.
Oh and the FB-DIMMs in the Mac Pro severely impact gaming performance. Check out the gaming benchmarks and you'll see that the Mac Pro comes in last in every one. The 3GHz Xeon (both dual and quad) gets beat out by a Core 2 Duo running at 2.66GHz.
Mac hardware is absolutely NOT the superior choice when it comes to high end game playing.
I use and like Mac OSX but one thing I dislike about the platform is the lack of hardware choice. That's the reason why you can build a PC for gaming that will be higher performing than a Mac and will cost less.
Sometimes my arms bend back.
GPUs are highly specialized and have far higher memory bandwidth than what's on your motherboard (in most cases, unless you're still stuck with an older GeForce or Radeon card that's on an AGP bus.) This shouldn't come as nearly as much of a surprise as dual Voodoo2 cards producing 1024x768 gaming resolution at 60 FPS. Are you hiding in a hole? No, I'm not trying to be a troll, but this is absolutely absurd. How does this make it to Slashdot?
Oh, I forgot, it's news REPOSTING site. It's nowhere on the forefront of bleeding-edge technology, and the way things are going, it most likely NEVER WILL BE. OSTG might as well die with the rest of the dinosaurs, in that case, because if you can't keep on the bleeding edge like *pukes hard* Digg does, fuck, I might as well just ignore the internet and start looking at independent benchmarks. Get these fucking slashvertisements out of my face, FFS.
Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
Perhaps AMD is conjuring up something interesting with their newly purchased ATI chips?
"refresh too many times and lock up 99% of the cpu." what is this, in X11?
Sounds like you need virtual machines, run the old crap under a VM so it still works.
What could those old systems possibly do that you cannot replicate or recode on new systems, or at least have recompiled with wrapper APIs? Or is it that horrendously badly coded?
i5 specs look good though - http://www-03.ibm.com/systems/i/hardware/
Liberty freedom are no1, not dicks in suits.
There is something that bothers me about all this protein folding stuff.
...and just for the end, I do want to use all the muscle of my $200 video card.
What is the main use of finding-the-way-the-proteins-fold? To find out how the protein would look like? Hmmmm.
If anyone wants, you can look into the past and see a number of papers showing that several hundred million dollars were spent on similar projects over the past 30 years, and the effect was? Almost nothing. Such enormous computing projects gave us predictions about protein folding (helix/strand) with accuracy from 60 to 70%. Don't forget, testing such algorithms on large series of random samples will give you MORE than 50% accuracy. On a large number of samples, practically any algorithm will give you about 66% accuracy.
The thing is, if you take one set of proteins, calibrate many of today's algorithms on it, and then test them on some different proteins, you should get accuracy of at least 80% to call the test usable/good. In medicine, the acceptable accuracy of a new method is 90%!
Can this project reach such accuracy?
Now, for the end, there are some, obviously less known, papers that describe methods for protein folding prediction with accuracy well above 85%. There are some papers suggesting that it does not really matter what the protein will look like, because one can predict when and how it will fold and, what is more important, interact with other proteins. Wasn't that the idea behind all this?
But that is, obviously, not mainstream.
Doing a good job is like spilling coffee on a dark suit, you feel warm all over, but nobody notices.
> If so, does that mean that IBM was right all along with their AS/400, iSeries product which makes heavy use of ASICs?"
Why do /.ers always ask such stupid questions???
We could call it... Altivec or even something shorter, like MMX or SSE!
Clear, Dark Skies
I saw the article a few weeks back about having a GPU client for the 1900xt. Decided to try it out on my mac pro booting in XP. Must say it may be giving off great numbers, but I'm not sure how it's affecting the card itself. Once the client starts the fan is running non-stop (which is obviously understandable).
Not sure how my vid card will be in a month or so because i don't think they're designed to be in game play mode for more than 12 hours or so (can't think of anyone who plays more than 12hours at a time besides WoW players)
And yes I have WoW and it doesn't even spin the fan up on the 1900, with the client it's at max.
MrJynx
Like idle time ... I expect that most people will stress their CPU's much more heavily than their GPU's. Productivity software, music playback, background threads checking email, instant messages, etc. all require more CPU than GPU, whereas games, photo and video editing are probably the only mainstream apps with much of a GPU need. Take into consideration multi-tasking wherein you've got active apps that aren't displaying anything, and it becomes even clearer that the CPU is pulling a lot more weight than the GPU. I don't dispute that in a head-to-head folding race the GPU may still come out ahead, but it's got a major jump-start in the idle-time department.
That these cards enable you to do things like Play Doom 3 at a speed hundreds of times faster than you could by doing the rendering on your CPU. What will people think of next?
The Gromacs code on an ATI card takes advantage of super low latency and bandwidth. It is a good case of taking a reasonably serial process and moving it to a parallel solution.
I would argue that a system with a massively parallel architecture could run rings around a current machine for operating system tasks. However, developing the code would require an army of top-end coders to get something reasonably acceptable to the public at large. The branching a current microprocessor does is a bit uglier on a GPU. However, there are some very neat COSC/MATH tricks to make the IF disappear. If this were done smoothly you could have a faster IF than the current IF. So getting a massively parallel OS any time soon is probably like hoping that BeOS will rise from the grave and take its rightful market share. :)
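One common reading of "make the IF disappear" is an arithmetic select, where the condition becomes a 0/1 mask instead of a branch. The sketch below is only my guess at the kind of trick meant; the function names are made up, and real shader code would do this per element in the GPU's own language rather than in Python.

```python
def select_branchy(cond, a, b):
    # Ordinary control flow: one of two paths is taken.
    if cond:
        return a
    return b


def select_branchless(cond, a, b):
    # Arithmetic select: mask is 1 when cond holds, 0 otherwise,
    # so exactly one of the two terms survives. Both a and b are
    # always evaluated, which is the price of dropping the branch.
    mask = int(bool(cond))
    return mask * a + (1 - mask) * b


for x in (-2.5, 0.0, 3.0):
    # Clamp negatives to zero, with and without an IF.
    assert select_branchy(x > 0, x, 0.0) == select_branchless(x > 0, x, 0.0)
```

Note that the arithmetic form misbehaves if either input is infinite or NaN, since zero times infinity is NaN, so it is a sketch of the idea rather than a drop-in replacement for every IF.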