FASTRA II Puts 13 GPUs In a Desktop Supercomputer
An anonymous reader writes "Last year tomography researchers of the ASTRA group at the University of Antwerp developed a desktop supercomputer with four NVIDIA GeForce 9800 GX2 graphics cards. The performance of the FASTRA GPGPU system was amazing; it was slightly faster than the university's 512-core supercomputer and cost less than 4000EUR. Today the researchers announce FASTRA II, a new 6000EUR GPGPU computing beast with six dual-GPU NVIDIA GeForce GTX 295 graphics cards and one GeForce GTX 275. The development of the new system was more complicated and there are still some stability issues, but tests reveal the 13 GPUs deliver 3.75x more performance than the old system. For the tomography reconstruction calculations these researchers need to do, the compact FASTRA II is four times faster than the university's supercomputer cluster, while being roughly 300 times more energy efficient."
It sounds like there might be easy money to be made buying these components, putting them in a computer case and then reselling them for profit at various universities. Just wait for the "Dell" of supercomputers.
Taxation is legalized theft, no more, no less.
Almost meets the minimum requirements for Crysis 2
This was post #2 and already modded -1, Redundant.
Blazing Fast Pron Machine running Windows Vista. Don't forget to pick up a copy of the latest memory-intensive anti-virus, as this machine will handle it just fine.
"the compact FASTRA II is four times faster than the university's supercomputer cluster, while consuming 300 times less power" And the original supercomputer was how fast? 512 cores doesn't say THAT much. I could compare my computer to supercomputers from the past and they'd say the performance of my system was amazing too.
Why does the computer from Swordfish?
Get Animated
*Drools*
It used to be that GPUs would sacrifice accuracy for speed in floating point calculations, making them unsuitable for scientific computing. Is this still the case?
...consuming 300 times less power.
*sigh*
I've got a pair of 9800gx2 in my rig. The cards turn room temperature air into ~46C air. Without proper ventilation, these things will turn a chassis into an easy bake oven.
For those not familiar with the 9800gx2 cards, it essentially is two 8800gts video cards linked together to act as a single card - something called SLI on the NVidia side of marketing. SLI typically required a mainboard/chipset that would allow you to plug in two cards and link them together. This model allowed any mainboard to have two 'internal' cards linked together, with the option of linking another 9800gx2 if your board actually supported SLI.
The pictures did not show any SLI bridge, so it looks like they are just taking advantage of multiple GPUs per card.
+++ UGUCAUCGUAUUUCU
Can it play Crysis with a high frame rate on maximum?
This is my footer. There are many like it, but this one is mine.
Duh! Look at the number of GPUs...13...try 12 or 14 and your luck will change.
jsut athnoer menagiensls ltitle psrhae for you to dcoede. Why do we wtsae our tmie dnoig tihs?
The guy in the video on that page looks exactly like the stereotype of the guy I'd expect to do this sort of thing.
If the only way you can accept an assertion is by faith, then you are conceding that it can't be taken on its own merits
This isn't a huge achievement. Nobody else has done it because it's silly.
There are two major reasons... the first is they use GeForce cards. That's not a good idea, since GeForces are held to much lower quality standards than Teslas and Quadros. They're intended for gaming graphics, where a minor error here or there isn't the end of the world. "Sorry we missed your cancer, since our supercomputer miscalculated that region of the reconstruction." The second problem is, that's one bandwidth starved machine. It's based on a pretty nice motherboard, but with 13 GPUs that's not a lot of bandwidth to go around.
The more popular layout for a GPU supercomputer of that size is a small cluster of 2-GPU blades, with a hypertransport interconnect. It's a little bit trickier to work with, but there are fewer bottlenecks.
Folding@home enthusiasts and academic contributors did more than that, and a long time ago, too. Just check this thread at foldingforums for one example.
"The agriculture ministry is not in charge of Gundam" - Japanese ministry official.
Careful about equating tessellation processing with matrices. Many matrix operations take N^3 or more operations, and the matrices may be close to singular (ill-conditioned). Single precision is poor for both.
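To make the precision concern concrete, here is a minimal Python sketch using only the standard library. It is not a matrix solver; it only shows one ingredient of the problem, namely that a long single-precision accumulation can silently drop small terms. The helper names `to_f32`, `accumulate_f32` and `accumulate_f64` are made up for this illustration, and single precision is emulated by round-tripping through a 4-byte IEEE 754 float.

```python
import struct


def to_f32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate_f32(start, increment, steps):
    """Add `increment` to `start` `steps` times, rounding to single precision after each add."""
    total = to_f32(start)
    for _ in range(steps):
        total = to_f32(total + increment)
    return total


def accumulate_f64(start, increment, steps):
    """The same accumulation, kept in double precision throughout."""
    total = start
    for _ in range(steps):
        total += increment
    return total


# 2**24 is the point where single precision stops representing every integer.
big = 2.0 ** 24
single = accumulate_f32(big, 1.0, 1000)
double = accumulate_f64(big, 1.0, 1000)

print(single - big)  # 0.0: every +1.0 was rounded away
print(double - big)  # 1000.0
```

An N^3 algorithm on a large matrix performs a great many such additions, and an ill-conditioned matrix amplifies whatever error they leave behind, which is why the two concerns compound.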
The pictures did not show any SLI bridge, so it looks like they are just taking advantage of multiple GPUs per card.
There's no seven-way SLI anyway. Since the GPUs are being used for processing and not graphics, there's no need for them to work together via SLI or Crossfire or what have you as long as the OS and programs treat 'em like any other multiprocessor setup.
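As a rough sketch of what treating the GPUs like any other multiprocessor setup can look like, the Python snippet below splits a batch of independent work items across 13 devices. The thread does not describe how the FASTRA software actually distributes work, so the function name `split_evenly` and the figure of 1000 slices are purely hypothetical.

```python
def split_evenly(num_items, num_devices):
    """Return one contiguous range of item indices per device, sizes differing by at most one."""
    base, extra = divmod(num_items, num_devices)
    chunks = []
    start = 0
    for device in range(num_devices):
        # The first `extra` devices take one additional item each.
        size = base + (1 if device < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


# Hypothetical workload: 1000 independent slices spread over 13 GPUs.
chunks = split_evenly(1000, 13)
for device, chunk in enumerate(chunks):
    print(device, chunk.start, chunk.stop, len(chunk))
```

Each device would then process its own range with no need to talk to the others, which is why no SLI bridge is involved.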
Wouldn't it be nice if the FASTRA II, which is 3.75 times faster than the FASTRA I, was actually called the FASTRA 375. Then I wouldn't have to ask.
Because it only applies to the kind of problems that CUDA is good at solving. Now while there are plenty of those, there are plenty that it isn't good for. Take a problem that is all 64-bit integer math and has a branch every couple hundred instructions and GPUs will do for crap on it. However a supercomputer with general purpose CPUs will do as well on it as basically anything else.
That's why I find these comparisons stupid. "Oh this is so much faster than our supercomputer!" No it isn't. It is so much faster for some things. Now if you are doing those things wonderful, please use GPUs. However don't then try to pretend you have a "supercomputer in a desktop." You don't. You have a specialized computer with a bunch of single precision stream processors. That's great so long as your problem is 32-bit fp, highly parallel, doesn't branch much, and fits within the memory on a GPU. However not all problems are hence they are NOT a general replacement for a supercomputer.
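The "doesn't branch much" condition can be illustrated with a toy cost model in Python. It assumes the usual lockstep picture of GPU execution, in which a group of threads (32 on NVIDIA hardware) has to run every branch path that at least one of its threads takes. The function name `warp_cost` and the cycle counts are invented for illustration; this is not a measurement of any real card.

```python
def warp_cost(takes_branch, cost_if, cost_else):
    """Cycles spent by one lockstep group of threads on an if/else.

    `takes_branch` holds one boolean per thread. Any path taken by at
    least one thread is executed by the whole group, one path after the other.
    """
    cost = 0
    if any(takes_branch):
        cost += cost_if
    if not all(takes_branch):
        cost += cost_else
    return cost


uniform = [True] * 32                         # every thread takes the same path
divergent = [i % 2 == 0 for i in range(32)]   # threads alternate between paths

print(warp_cost(uniform, 100, 60))    # 100
print(warp_cost(divergent, 100, 60))  # 160
```

In this model a divergent group pays for both paths while only half of its threads do useful work on each, which is the effect the comment is pointing at.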
That's a brilliant idea, now people can make snacks without ever leaving the computer.
A game has objectives and is competitive, anything else is just play
The Brady Bunch called; they want their set clothes back!
Did more what, exactly? None of the Folding setups listed have more than 4 GPU cards per motherboard.
..hyper linux
it was slightly faster than the university's 512-core supercomputer and cost less than 4000EUR.
but tests reveal the 13 GPUs deliver 3.75x more performance than the old system.
It is impossible to make such general statements about performance for something that is still very much specialized for long pipelines and streams of repetitive data (vector processing).
They may be much faster for tasks that fit that scheme. But slower for those that don’t.
Any sufficiently advanced intelligence is indistinguishable from stupidity.
Apparently, the regular BIOS can't boot with more than 5? graphics cards installed due to the amount of resources (memory & I/O space) that each one requires. So the researchers asked ASUS to make a special BIOS for them which doesn't set up the graphics card resources. However, the BIOS still needs to initialize at least one video card, so they agreed that the boot video card would be the one with only a single GPU. Presumably, they could have also chosen a dual GPU card that happened to be different from the others in some way.
Take a problem that is all 64-bit integer math and has a branch every couple hundred instructions and GPUs will do for crap on it.
So would a Cray; supercomputers and GPUs are made for the same sorts of problems (exploiting data parallelism). Now if by 'supercomputer' you mean 'a cluster of commodity hardware', then ok, you've got a point, that heap of cpus will handle branches plenty fast.
Except that 'supercomputer' and 'cluster of commodity hardware' are effectively synonymous these days. They all use the same Power/Xeon/Opteron/Itanium chips, with several cores and several GB of memory to a compute node. The only real difference left is the interconnect. Commercially built systems tend to have far beefier and more complex interconnects. Homebrew systems more often than not just use gigabit Ethernet, with the larger ones rarely using anything better than a 'fat tree' with channel bonding or 10 Gbps Ethernet.
I did not try baking anything, but it did turn the top of the computer into a nice coffee cup warmer.
+++ UGUCAUCGUAUUUCU
I've found that lately on Slashdot, I agree with them that highly moderated humorous posts seem to far outnumber the interesting ones. I've actually ratcheted down all funny comments to -4 or -5, and browse at 2, to catch the more interesting discussions which get passed over. But I've never seen any reason to moderate them down now that we have control when logged in... I dunno, maybe others think that people who come here looking for facetious comments should have to browse at funny +5 instead of us sourpusses :)
The sending of this message pretty much inconveniences everyone involved.
There are also a fair number of Cell-based supercomputers and even one hybrid out there. And even some pure custom solutions used by the NSA. (There is a reason they have their own chip fab.) And, if you include Folding@home-type applications, then GPUs represent a reasonable percentage of the world's supercomputing infrastructure.
Not only is there no seven-way SLI, it tends to work poorly with CUDA applications no matter what sort of SLI you're using. Before running any BOINC CUDA apps, SLI needs to be disabled or the app only sees "1" GPU.
Maybe there's a really good reason for it that I'm not fully aware of, but why do PC cases, motherboards, add-on cards etc. all seem to be designed around such limited amounts of space? Is there such a thing as a PC case the size of a mini-fridge or bigger? A motherboard with freaking 10 or 12 slots with enough space between them? A video card the size of a motherboard? Anything but a cramped little box with limited expansion? Is that such a bizarre thing to want?
I don't think anyone on /. actually "leaves" their computer...
It's redundant because some smartass mentions Crysis in response to *every fucking article* about someone doing something using powerful GPUs*.
Of course, if it was about CPUs, the post would be about what will be needed to run Windows 8, or 'finally meeting the minimum system requirements for Vista'.
Mostly, you can predict these posts from the title of the article. Doesn't stop crotchety people like me coming to complain about it though...
* Footnote: When someone equally crotchety complained about this before, a poster made the good point that Crysis draws this derision as it *still* taxes high-end systems. Maybe it's because CryEngine2 is bloated and inefficient, maybe it's because it tries to do too much. All I know is we keep getting these inane posts.
If all you have is a grenade, pretty soon every problem looks like a foxhole -- MightyYar
That was always true of supercomputers. In fact the stuff that runs well on CUDA now is almost precisely the same stuff that ran well on Cray vector machines - the classic stereotype of "Supercomputer"! Thus I do not see your point. The best computer for any particular task will always be one specialized for that task, and thus compromised for other tasks.
BTW, newer GPUs support double precision.
Au contraire, I clicked the article link JUST to find this comment. Thank you for maintaining a cherished /. tradition!
you had me at #!
Nice try at humor. But as almost always with these types of multi-processor machines, it runs Linux.
E X A C T L Y ! ! ! I always read about how fast the Cell Broadband Processor(tm) is and how anyone is a FOOL for not using it. No. They suck hard when it comes to branch prediction. Their memory access is limited to fast, but very small memory. Out of branch execution performance is awful. You have to rewrite code massively to avoid it. For embarrassingly parallel problems, they are a dream. For problems that are not parallel, they are quite slow. An old supercomputer isn't as fast as a new one. If ordinary processors, especially multi-core ones, had two or four stream processors for every core, parallel operations would be much faster too and the processors themselves would be faster; it's likely one of the improvements being looked at by Intel and AMD (and others). Something like this would make general purpose processors much more like the Cell Broadband Engine(tm), and would make the Cell somewhat obsolete. Certainly the Cell processor suffers from only being able to deal with problems that fit in 256 MB of memory (the Cell BE uses proprietary memory that is very fast but only available up to 256 MB; no one else makes this kind of memory, and they don't make chip sizes bigger than what winds up being 256 MB). GPUs are limited by memory size too (although 1 GB is bigger than 256 MB), and they still suffer all of the problems of a specialty processor. If you can use it, great. I can't get any performance boost out of them, because my programs have out-of-order branches, and I get better performance from a general purpose CPU.
That's why I find these comparisons stupid. "Oh this is so much faster than our supercomputer!" No it isn't. It is so much faster for some things. Now if you are doing those things wonderful, please use GPUs. However don't then try to pretend you have a "supercomputer in a desktop." You don't. You have a specialized computer with a bunch of single precision stream processors. That's great so long as your problem is 32-bit fp, highly parallel, doesn't branch much, and fits within the memory on a GPU. However not all problems are hence they are NOT a general replacement for a supercomputer.
For that matter, which is faster: A two-ton flatbed truck, or a Maserati? Kinda depends on what you are trying to do, doesn't it? Want to move 3,000 pounds of Hay? You probably DON'T want the Maserati!
And all machines are like this. Some machines are better at some tasks than others. And presumably, the comparison to the University Supercomputer was because of a task that they *needed* to perform, and the pittance cost of the GPGPU-based supercomputer favored very well against the cost of leasing University supercomputer time.
Even different people are better at some things than others.... Some people are better at maths than others. Some people can take a bit of vinegar and coffee grounds, and make an artistic masterpiece.
Because I'm a jogger, I can run long distances faster than most people. But I suck at sprints, and I take long showers. I type over 100 WPM.
See?
I have no problem with your religion until you decide it's reason to deprive others of the truth.
I also have two GX2s in a box for use with CUDA programming. (You don't use SLI with CUDA, in fact it's a disadvantage in that if you do you'll only be able to actually use one of the GPUs, so you won't see SLI bridges in a CUDA box.) The power consumption of the 9800gx2s is indeed fearsome even at idle. I measured it, but don't have the numbers on hand. BUT: Newer nvidia cards apparently use *much* less power at idle, and probably less at full blast as well (like a 45nm CPU vs. a 90nm CPU at the same GHz will use less power for the exact same work).
Since I only need one GX2 to test most programs, I keep the power unplugged to the second one most of the time to keep from wasting so much energy (and producing so much heat).
Sure, but if you look at it from their perspective: before, we needed time on a supercomputer, and now we don't. Either you redefine supercomputers to include that or it's another task where we don't need one, even better if you ask me. So it doesn't do everything; well, running an embarrassingly parallel problem on a supercomputer would also show "terrible" performance now compared to this.
That's great so long as your problem is 32-bit fp, highly parallel, doesn't branch much, and fits within the memory on a GPU.
As far as I know the Teslas will be doing double precision, and we certainly could put GPUs on a better backplane for GPU-GPU communication with NUMA. What's left is being highly parallel and doesn't branch much - aren't those two sides of the same coin? - and usually that's about finding a reasonable way to break it down like a finite element model or something, there's many ways you can do that and approach the right result.
Trying to solve one megastate usually has tons of cache coherency issues to let all CPUs do useful work too. If you have to rely on a single stream of calculations performance will suck one way or the other, so really you only get decent performance if you can divide it into blocks of work and yet each block is branching enough that a supercomputer is better than a GPU. So it won't be a general purpose supercomputer true but these aren't computers used for running a million different desktop apps. There's some highly specialized simulations you run, if you can do better on highly specialized hardware that's what will happen.
Live today, because you never know what tomorrow brings
Aside from a few homebrew PS3 clusters, I don't know of any large scale Cell installations. The Roadrunner is a fairly standard (if very large) Opteron based cluster, with PowerXCell co-processors. The latest Cray XT5 is a fairly standard (if very large) Opteron based cluster, with PowerXCell or FPGA co-processors.
The NSA's ASIC systems don't count; by definition, they are not general purpose. A modern 3GHz quad-core processor will manage an exhaustive DES search in about 600 years. Deep Crack in 1998 could manage that feat in 9 days, but you wouldn't consider it comparable with a 25k chip cluster.
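The DES figures above are easy to sanity-check. The short Python sketch below assumes both quoted times refer to sweeping the full 2^56 DES keyspace and that a year is 365.25 days; the variable names are just for illustration.

```python
KEYSPACE = 2 ** 56          # number of possible 56-bit DES keys
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365.25      # assumption: average year length

cpu_seconds = 600 * DAYS_PER_YEAR * SECONDS_PER_DAY   # quad-core, about 600 years
deep_crack_seconds = 9 * SECONDS_PER_DAY              # Deep Crack, 9 days

cpu_rate = KEYSPACE / cpu_seconds                # keys per second implied for the CPU
deep_crack_rate = KEYSPACE / deep_crack_seconds  # keys per second implied for Deep Crack
speedup = cpu_seconds / deep_crack_seconds

print(f"quad-core:  {cpu_rate:.2e} keys/s")
print(f"Deep Crack: {deep_crack_rate:.2e} keys/s")
print(f"speedup:    {speedup:.0f}x")  # 24350x
```

The ratio comes out at roughly 24,000, which is where the "25k chip cluster" comparison comes from.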
Similarly the @Home applications don't count because the interconnect bandwidth and latency is so abysmally low. It holds a huge amount of power (on the order of several PF), but it cannot be used for anything but relatively small Monte Carlo simulations.
I think the TESLA GPUs actually go up to 4GB, and that is per GPU. However for NVidia at least 4GB unfortunately is an architectural limit. (and due to "bad" memory management you won't be able to use nearly all of those 4GB, particularly for long-running stuff with many allocations/deallocations).
They have more powerful GPUs, and have had them for a long time.
"The agriculture ministry is not in charge of Gundam" - Japanese ministry official.
Well, isn't a supercomputer specially built for one task in the first place?
My lab will soon be building a computer or cluster for bioinformatics. Would something like this be appropriate / scalable for gene microarray analysis, clustering algorithm tasks, etc? We need the capability to work with datasets in the 400 GB range, and with many permutations, but the specific datapoints are not large. Any suggestions or input would be much appreciated...
Mobo Manufacturer
Let's see, I can help these guys develop a new use for my line of wonky mobos, get favourable mention all over the world on places like Slashdot and reap the benefit of every geek with excess cash and a yen for a supercomputer, or I can stand back and maybe they find someone else who has a "better" board or they develop their own.
Hmm, let's think on this one
Would be nice if it actually worked. It's not much good having the fastest desktop computer in the world if it isn't stable. Or are they using the Dilbert definition of a PC upgrade ?
Next time, make the fancy video when it's finished guys.
What the fuck is WRONG with you guys at Slashdot? I posted this on Tuesday and my submission gets turned down. Is it because I am not Anglo-Saxon? Whose **** do I need to suck?
It was posted on Monday, not Tuesday
It is kind of unfair to generalize commercial clusters vs homebrew in that manner. Many institutions that purchase commercial clusters from HP/Dell/SGI/etc opt out of the use of InfiniBand or 10GbE. The logic behind it is that when the vendor says that for $100,000 they can upgrade to IB, the purchaser goes back and says, "For $100,000, I'll just get more cluster nodes instead." This is probably a big reason that gigabit takes up 52% of the Top500 list of supercomputers.
The main accomplishment of Fastra II is that they put 7 GPU cards into the machine. Fastra I already had 4 GPU cards, which is what the FOH machines top out at. The GPU cards are the same cards all around (Nvidia GTX 295). The linked article & video point out the difficulties they faced & overcame in trying to make 7 GPU cards work simultaneously in the same box.
OK then. I'm raising an eyebrow in somewhat heightened interest.
"The agriculture ministry is not in charge of Gundam" - Japanese ministry official.
They had support from the mainboard manufacturer.
Read the fucking article.