Supercomputer Built With 8 GPUs

By what benchmark? by morgan_greywolf · 2008-05-31 05:25 · Score: 1, Redundant

By what benchmark is eight of the NVIDIA GPUs in the 9800 GX2 more powerful than 300 2.4 GHz C2Ds?

--
My blog

Re:By what benchmark? by Anonymous Coward · 2008-05-31 05:29 · Score: 5, Informative

By what benchmark is eight of the NVIDIA GPUs in the 9800 GX2 more powerful than 300 2.4 GHz C2Ds?

Looking at TFS: the benchmark of their own tomography code, which takes a couple of hours instead of weeks.
Re:By what benchmark? by cromar · 2008-05-31 05:33 · Score: 4, Insightful

I am guessing it has something to do with floating point calculations vs. integer calculations, but if I read the article, this wouldn't be Slashdot, would it? Think about it. We have GPUs to perform vector maths, flops, etc. because the CPU is not all that great at that sort of thing typically. A general purpose CPU is not necessarily going to be the fastest if your problem domain is more suited to an "inferior" chip; general purpose CPUs are not designed to be the fastest chip in every situation.
Re:By what benchmark? by symbolset · 2008-05-31 05:35 · Score: 4, Insightful

By what benchmark is eight of the NVIDIA GPUs in the 9800 GX2 more powerful than 300 2.4 GHz C2Ds?

By the benchmark that they solve the particular problem of this specific application in 1/300th of the time?

--
Help stamp out iliturcy.
Re:By what benchmark? by 77Punker · 2008-05-31 05:37 · Score: 4, Informative

By what benchmark is eight of the NVIDIA GPUs in the 9800 GX2 more powerful than 300 2.4 GHz C2Ds?

By any SIMD problem. For reference, fire up a game that's capable of using a software renderer and do some sort of benchmark, then use the 3D hardware on the same benchmark. That's the difference between SIMD on hardware that is designed to do SIMD and SIMD on hardware that's designed to do everything (or in the case of the Duo, multitasking).
Re:By what benchmark? by hansraj · 2008-05-31 05:39 · Score: 5, Informative

As far as my understanding goes, comparing a GPU's performance to a CPU's performance is very, very task dependent, and the comparison with 300 CPUs should not be taken to mean that an 8-GPU system is more powerful than a 300 Core Duo system in general.

If the application requires solving a small task many times over and all of these tasks can be done in parallel, then using a GPU works great because a GPU has many cores, each of which can handle a simple routine. Also the GPU is designed to spend very little time on the way code is handled (load, switch etc) and spend more time actually running the code (hence the requirement of only very simple functions).

Such problems frequently arise in tomography, physics, astronomy etc and I hear GPUs are a great success in these areas. But don't hold your breath for running your favorite distro blazingly fast using GPUs.
Re:By what benchmark? by 77Punker · 2008-05-31 05:46 · Score: 4, Informative

The GPU's are better at floating point than integer; if I remember correctly it takes 4 cycles on current GPU's to do a float operation, but it takes 16 to do an int. No, I don't understand why.

Also, the "multiply" and "add" instructions exist in a "madd" opcode which essentially doubles the theoretical floating point performance, even if you don't use "madd" very often.
Re:By what benchmark? by Guillaume+Castel · 2008-05-31 06:14 · Score: 1

Intel begs to differ. Not that I'm buying into their marketing.
Re:By what benchmark? by Jaime2 · 2008-05-31 06:16 · Score: 4, Insightful

I think the GP (and myself) were objecting to the use of the fairly general word "power" and the use of this one problem as a "power benchmark". While it is obviously true that 8GPUs is as fast as 300 C2Ds for this problem, this system isn't as fast as a supercomputer for most problems. All this does is point out that the recent trend of building supercomputers out of inexpensive general purpose CPUs may not be a good idea for all applications.
Re:By what benchmark? by gumbi+west · 2008-05-31 06:19 · Score: 2, Insightful

When you get into inverting matrices, or doing matrix-vector multiplication, the algo is very easily parallelized, but I always wonder where the full matrices live, i.e. they could easily be tens of GBs of matrix, so the CPU would seem to have to be heavily involved as well.
Re:By what benchmark? by foobsr · 2008-05-31 06:26 · Score: 1

building supercomputers out of inexpensive general purpose CPUs may not be a good idea for all applications

You may generalize that, like, e.g., in - 'for running VISTA', but (ymmv) of course you can come up with a more serious example.

CC.

--
TaijiQuan (Huang, 5 loosenings)
Re:By what benchmark? by kumar999 · 2008-05-31 06:27 · Score: 1

It is obvious that a GPU is going to do floating point calcs much faster than a CPU. But the question becomes: is this "super computer" fast only for the tests they designed for it, or is it fast for any application?

GPUs are required to do more floating point calculations, but are there such huge demands on super computers running financial calculations, for instance?

I can understand it might be of some interest to the scientific community. Especially to do with astronomy, where they wait for months to get access to computers fast enough to run models.
Re:By what benchmark? by pablomme · 2008-05-31 06:33 · Score: 4, Insightful

As far as I know, GPUs are amazingly fast at matrix operations and other things allowing vectorized evaluation. I guess these tomography applications must make massive use of these. After all, tomography is in essence image processing..

--
The state you are in while your HEAD is detached... - wait, what?
Re:By what benchmark? by symbolset · 2008-05-31 06:40 · Score: 5, Insightful

All this does is point out that the recent trend of building supercomputers out of inexpensive general purpose CPUs may not be a good idea for all applications.

And... a screwdriver is not always a prybar. A tool's a tool - they have preferred usage but if your requirement is specific and you're creative enough, you can do some fine work outside of the tool's intended purpose. Like this guy. Kudos to him.
Perhaps some more creative people finding this information will now discover if their specific requirements can be met by this interesting configuration. That will save them large quantities of cash or possibly enable some facility that was not previously available because supercomputers cost a grip-o-cash.
Of course for general purpose supercomputing you would want to use modified PS3s.

--
Help stamp out iliturcy.
Re:By what benchmark? by Calinous · 2008-05-31 06:54 · Score: 4, Informative

Because floating point operations go on a dedicated path, while the integer operations do not have a dedicated integer-only path.
Also, it's possible that loading floating-point operands and storing results in actual code can be pipelined, while integer operations are not pipelined.
(and yes, I don't know what I'm talking about)
Re:By what benchmark? by cheier · 2008-05-31 07:04 · Score: 5, Interesting

Too bad this isn't really news. I guess it is news if you consider that someone else has had their application accelerated by NVIDIA GPUs. I guess the only other reason that this could be news is by virtue of having 8 GPU cores.

Unfortunately, this setup won't work ideally for a lot of other CUDA based applications. For the past 6 months, I had a system with 6 GPUs (actual physical GPUs). This is the system that I showed at CES. We are easily able to do 8 physical GPUs, and now I've been solely focused on utilizing Tesla.

Given that NVIDIA released the GX2 series, I was not surprised that someone would announce an 8GPU system. I'm surprised it took this long for someone to do it, and almost equally surprised that slashdot took this long to publish any news that is decent in the realm of GPU super computing. I've been cranking out close to 228 billion atom evals. per second in VMD for months now, versus about 4 billion on dual quad core 3.0GHz Xeons.
Re:By what benchmark? by kipman725 · 2008-05-31 07:08 · Score: 2, Informative

Or 100's of FPGA's can do what was previously considered a task that, even with super computing resources, was so time consuming as to be only worthwhile for groups like the NSA: http://www.copacobana.org/index.html (the EFF had a similar custom chip device several years before, but that cost >$250K)
Re:By what benchmark? by TheThiefMaster · 2008-05-31 07:23 · Score: 3, Informative

The 9800 GX2's GPUs have 128 1.5GHz "shader processors". 8 of these is like having 1024 vector-processing-specialised processor cores at your command.

I could easily believe that it performed comparably to 300 2.4GHz Core 2 Duos (aka 600 "over 1.5x faster but not vector-specialised" cores).

Theoretical performance is 576 GFLOPS per 9800 GX2 GPU (4.608 TFLOPS total) vs 19.2 GFLOPS per Core 2 CPU (5.760 TFLOPS total). However in tests the Core 2 gets as low as 6 GFLOPS instead of its 19 theoretical, and the 9800 GPU gets a lot closer to its full power.
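
The totals quoted above can be re-derived with a few lines of arithmetic. This is only a sketch using the per-chip figures given in the comment (576 and 19.2 GFLOPS theoretical, 6 GFLOPS observed for the Core 2); the constant and function names are made up for illustration.

```python
# Per-chip figures are taken from the comment above, not measured here.
GPU_COUNT = 8
GPU_PEAK_GFLOPS = 576.0
CPU_COUNT = 300
CPU_PEAK_GFLOPS = 19.2
CPU_OBSERVED_GFLOPS = 6.0  # "as low as 6 GFLOPS" in tests, per the comment


def total_tflops(count, gflops_each):
    """Aggregate throughput in TFLOPS for `count` identical chips."""
    return count * gflops_each / 1000.0


print(f"8 GPUs, theoretical:   {total_tflops(GPU_COUNT, GPU_PEAK_GFLOPS):.3f} TFLOPS")
print(f"300 CPUs, theoretical: {total_tflops(CPU_COUNT, CPU_PEAK_GFLOPS):.3f} TFLOPS")
print(f"300 CPUs, observed:    {total_tflops(CPU_COUNT, CPU_OBSERVED_GFLOPS):.3f} TFLOPS")
```

On paper the 300 CPUs come out slightly ahead; with the lower observed CPU figure, the eight GPUs only need to reach about 40% of their theoretical peak to match them.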
Re:By what benchmark? by raftpeople · 2008-05-31 07:27 · Score: 2, Interesting

Just to expand on this stuff: Different tools are (obviously) designed for different workloads. I have a project I was contemplating porting to the Cell. Unfortunately only 40% of my performance bottleneck could take advantage of SIMD, but that 40% could have taken advantage of an enormous number of SIMD instructions just like the workload from TFA.

The other critical 40% of my project would have gained absolutely nothing from SIMD and on the Cell would have lost time due to branches. In this case 300 c2d's would far exceed the throughput of 8 GPU's.
Re:By what benchmark? by Enderandrew · 2008-05-31 07:30 · Score: 2, Insightful

Please, please, please do the math.

8 GPUs are being compared to 300 CPUs. So the single GPU for this purpose isn't 300 times as powerful as the CPU.

It is doing the operation in 1/37th the time approximately. This isn't news or unbelievable. GPUs are dedicated to performing certain types of tasks far better than a CPU.
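
Spelled out, the per-GPU ratio is just the CPU count divided by the GPU count. A trivial sketch (the function name is hypothetical), assuming the workload splits evenly across devices:

```python
def cpus_matched_per_gpu(cpu_count, gpu_count):
    """How many CPUs each GPU has to stand in for, assuming an even split."""
    return cpu_count / gpu_count


print(cpus_matched_per_gpu(300, 8))  # 37.5
```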

--
http://blindscribblings.com - Tasty pop-culture in conceptual fashion.
Re:By what benchmark? by AlecC · 2008-05-31 09:30 · Score: 4, Informative

It takes 4 cycles to do a floating point operation, and 4 cycles to do an integer add/subtract. It takes 16 cycles to do an integer multiply because it only has a 24-bit hardware multiplier (needed to achieve the 4-cycle flops), so it has to do long multiplies as four madds. This was for the first generation CUDA GPUs; the second generation, which should be out by now, was going to have double length floating and would be able to do 32 bit multiplies in the same four cycles.

While they can do integer, these machines are not very happy with it, and I found it much easier to do everything in floating point, even if you are talking about 8-bit colour data. It goes no slower, and everything is much better adapted to floating point. Then there are special instructions to get back to integer at the output.

While each operation takes 4 cycles, they are fully pipelined, so that it launches a new instruction per cycle, times 32 pipes per unit, times 8 units per GPU.

And madd is very useful for the sort of tasks for which supercomputers are traditionally used.
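
To make the "long multiply as four madds" idea concrete, here is a sketch of how a 32-bit multiply can be assembled from four narrower multiplies. The split into 16-bit halves is an assumption chosen for illustration (each half fits comfortably in a 24-bit multiplier); the thread does not say which sequence the hardware actually uses.

```python
MASK16 = 0xFFFF


def mul32_from_16bit_parts(a, b):
    """Full product of two unsigned 32-bit values from four 16x16 multiplies."""
    a_lo, a_hi = a & MASK16, (a >> 16) & MASK16
    b_lo, b_hi = b & MASK16, (b >> 16) & MASK16

    # Four partial products; on multiply-add hardware each one could be
    # folded into the running total with a single madd-style operation.
    low = a_lo * b_lo
    mid_1 = a_lo * b_hi
    mid_2 = a_hi * b_lo
    high = a_hi * b_hi

    return low + ((mid_1 + mid_2) << 16) + (high << 32)


print(mul32_from_16bit_parts(123456789, 987654321) == 123456789 * 987654321)
```

Four dependent steps at 4 cycles each is consistent with the 16-cycle figure quoted above, though that correspondence is an inference rather than something the comment states.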

--
Consciousness is an illusion caused by an excess of self consciousness.
Re:By what benchmark? by schwaang · 2008-05-31 10:35 · Score: 1

Interesting. Did you find it necessary like the Antwerp people to have a multi-core CPU to keep CPU-to-GPU communication bubbling along smoothly?

I'm looking to CUDA-ize some algorithms (for computation as opposed to real-time graphics). Are there any books or sites that you found really helpful? (I have GPU Gems 3 on order.)
Re:By what benchmark? by mikael · 2008-05-31 12:57 · Score: 1

They are doing tomography on large volume data sets. You take a large number of high-resolution slices of an object (1024x1024 x 16 bits) from a large number of different angles which encompass an entire circle (256 to 512 slices).

Since they are using eight GPU's, the total memory of the system must be in the range of 8 Gigabytes. They would need half the memory for the raw image data, and the other half for the final cube volume (1024^3 x 16 bits).
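
The data sizes implied by those dimensions can be checked directly. A rough sketch, assuming 16-bit samples throughout and the upper slice count of 512; the function names are hypothetical.

```python
BYTES_PER_SAMPLE = 2  # 16 bits per sample, as stated above
GIB = 1024 ** 3


def projection_bytes(width, height, slice_count):
    """Storage for the raw slices."""
    return width * height * BYTES_PER_SAMPLE * slice_count


def volume_bytes(edge):
    """Storage for a reconstructed cube of edge**3 samples."""
    return edge ** 3 * BYTES_PER_SAMPLE


print(projection_bytes(1024, 1024, 512) / GIB)  # 1.0 (GiB of raw slices)
print(volume_bytes(1024) / GIB)                 # 2.0 (GiB for the cube)
```

Under these assumptions the raw slices and the output volume together need about 3 GiB, before any intermediate buffers.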

From the video, a calculation which would normally take an hour on a network of PC's will take seconds on their system.

Most likely, the speed optimisations are probably through not having to synchronise the PC's, load and save data from disk, and transfer it through the local network as well as having multiple processors to transform the data (128 per GPU).
If you have everything in one place (ie. in GPU memory), then it is going to be a lot faster than a network of PC's.

--
Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
Re:By what benchmark? by mikael · 2008-05-31 13:23 · Score: 2, Informative

Because a floating point number really consists of two values plus a sign (a large 23-bit mantissa, a smaller 8-bit exponent, and a single sign bit), performing a single arithmetic operation on two floating point numbers requires:

1. Aligning the two mantissas so the exponents match
2. Performing the operation
3. Renormalizing the mantissa of new value so that it is in the range 1.0 to less than 2.0
4. Saving the result to the destination register

Each of these stages would probably take one read/write cycle.
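
The four stages can be sketched in software. This is a toy model only: it handles positive, normal numbers, truncates instead of rounding, and uses Python's `math.frexp`/`math.ldexp` to split and rebuild values; it is not a description of any particular GPU's adder.

```python
import math

MANT_BITS = 24  # 23 stored bits plus the implicit leading 1 of single precision


def decompose(x):
    """Split a positive float into (integer mantissa, exponent), x ~= m * 2**e."""
    frac, exp = math.frexp(x)             # x == frac * 2**exp, 0.5 <= frac < 1
    mantissa = int(frac * (1 << MANT_BITS))  # truncate to MANT_BITS bits
    return mantissa, exp - MANT_BITS


def toy_add(a, b):
    """Add two positive floats using the four stages listed above."""
    m_a, e_a = decompose(a)
    m_b, e_b = decompose(b)

    # 1. Align: shift the operand with the smaller exponent to the right.
    if e_a < e_b:
        m_a, e_a, m_b, e_b = m_b, e_b, m_a, e_a
    m_b >>= e_a - e_b

    # 2. Perform the operation on the aligned mantissas.
    m_sum, e_sum = m_a + m_b, e_a

    # 3. Renormalize so the mantissa fits in MANT_BITS bits again.
    while m_sum >= (1 << MANT_BITS):
        m_sum >>= 1
        e_sum += 1

    # 4. Store the result.
    return math.ldexp(m_sum, e_sum)


print(toy_add(1.5, 2.25))  # 3.75
```

The alignment step is also where precision is lost: `toy_add(1.0, 2.0**-30)` returns `1.0` because the smaller operand is shifted out entirely.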

Performing an integer operation requires a shift-and-add sequence for multiplication, and a shift-compare-and-conditional-subtract for division.
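
For reference, a shift-and-add multiply looks like this in software. It is a minimal sketch for unsigned operands that wraps the result to a fixed width; real hardware would process several bits per cycle rather than one per loop iteration.

```python
def shift_and_add_multiply(a, b, bits=32):
    """Multiply two unsigned integers by adding shifted copies of `a`.

    One loop iteration per bit of `b`; the result wraps to `bits` bits,
    as a fixed-width register would.
    """
    result = 0
    for i in range(bits):
        if (b >> i) & 1:        # bit i of the multiplier is set
            result += a << i    # add the multiplicand shifted into place
    return result & ((1 << bits) - 1)


print(shift_and_add_multiply(6, 7))  # 42
```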

Previous GPU's just stored the integer as the mantissa of floating point registers. But as integers are now represented separately as 32-bit values, they will be processed by a different hardware unit. Maybe they have two barrel shift registers working in parallel so that only 16 cycles are required.

--
Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
Re:By what benchmark? by Brian+Gordon · 2008-05-31 14:16 · Score: 2, Interesting

It's not really that surprising. CPUs are supposed to control the computer's resources and keep some kind of sanity and synchronization.. do one thing at a time and they do it fast. Multiple cores are nice but they just let you do 2 things at once. Yeah they're fancy and pipelined and there's all sorts of asynchronous optimization with arithmetic and logic, but as a whole it's executing one instruction after another.

GPUs on the other hand are far more parallel. The thousands of individual subprocessors can be independently controlled in software and given different tasks.
Re:By what benchmark? by schwaang · 2008-05-31 15:33 · Score: 1

They do single precision floating point, so yes there are limitations. But if you google for CUDA examples, there are plenty of useful computational applications like modelling physics for purposes other than graphical display.
Re:By what benchmark? by Xyrus · 2008-05-31 16:25 · Score: 3, Interesting

You're being overly simplistic.

In order to utilize this "super computer", your problem has to be refactored in such a way that it can utilize the hardware efficiently. This can either be fairly easy or incredibly difficult depending on the problem, the tool-set available, etc.

Their benchmark is good for them, but it is most likely meaningless to the general super-computing community. Porting something like LINPACK over and running that as a benchmark however would give a whole lot more insight into what kind of performance boost a typical scientific app might gain from said hardware.

Nice to see someone utilizing this functionality though.

~X~

--
~X~
Re:By what benchmark? by cheier · 2008-05-31 17:06 · Score: 3, Informative

I haven't done much of the development of the software myself. We have a developer we hired to work with CUDA. From what I've found, the documentation that is available on NVIDIA's site for CUDA is excellent and their developers are active on their forums.

For VMD, it was necessary to have 1 CPU core per GPU. We tested 6 GPUs with 4 cores and we could only spawn 4 threads for GPU processing. The guys at Evolved Machines told me they can use multi GPU off of a single core. If so, I have no idea how. NVIDIA even tells me 1 core per GPU, so that is the gospel I'm following by. Acceleware for some of their stuff even use 2 cores per GPU, but they have their own libraries outside CUDA for GPU stuff, so who knows.

I haven't come across any books on CUDA other than the support manuals, but since it isn't a very mature API, it is only a matter of time.
Re:By what benchmark? by frosty_tsm · 2008-05-31 17:10 · Score: 2, Funny

You might be a little off, but I'd say you're pretty close, and I just had a couple of architecture and super computer systems classes.
Re:By what benchmark? by arktemplar · 2008-05-31 19:25 · Score: 1

Ok, now I know a little about this, so let me expand a little on what you said - specifically matrix vector.

In the case of a matrix-vector operation Ax = b, the amount of reuse for a given vector x is limited to the number of rows of the matrix. The costs you incur are essentially in data movement: the cost of bringing the vector in and the cost of bringing the elements of A in.

The cost of the vector is amortised over time due to the parallelism that we see, but each element of A is used only once, and this is a problem: it makes matrix-vector a bandwidth-limited operation rather than a truly parallelizable one. Matrix-matrix, on the other hand, is memory limited and can be said to be more parallelizable.

In short - O(n^m) the higher the m - chances are the more parallelizable the operation, making it a better target for acceleration.
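
One way to put numbers on this is to count floating-point operations per value moved for each kind of product. The sketch below uses the textbook operation counts for dense n x n operands and assumes each input and output value is moved exactly once (an idealisation; real kernels move more); the function names are made up.

```python
def matvec_ops_per_value(n):
    """Flops per value moved for y = A x with an n x n matrix A."""
    flops = 2 * n * n            # one multiply and one add per element of A
    values_moved = n * n + 2 * n  # A, plus x in and y out
    return flops / values_moved


def matmat_ops_per_value(n):
    """Flops per value moved for C = A B with n x n matrices."""
    flops = 2 * n ** 3
    values_moved = 3 * n * n      # A and B in, C out
    return flops / values_moved


for n in (64, 1024, 4096):
    print(n, round(matvec_ops_per_value(n), 2), round(matmat_ops_per_value(n), 2))
```

The matrix-vector ratio stays just under 2 however large n gets, while the matrix-matrix ratio grows linearly with n, which is why the latter leaves far more room to hide memory traffic behind arithmetic.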

--
blog plug -> The Darker Side of Light
Re:By what benchmark? by arktemplar · 2008-05-31 19:28 · Score: 1

That is not entirely true - they are 'amazingly fast' only for a set of matrix operations of some limited size. They are not particularly effective for very large sizes, fortunately as you said tomography is essentially image processing and image processing matrices are usually not very large (filtering etc. etc. all will be done by matrices of sizes - 3x3- 9x9 such like, by a large matrix I'm talking of the order of 5000x50,000)

--
blog plug -> The Darker Side of Light
Re:By what benchmark? by arktemplar · 2008-05-31 19:31 · Score: 1

LAPACK and LINPACK and ScaLAPACK: why measure using your own benchmark when you don't mention performance with the standard ones?

--
blog plug -> The Darker Side of Light
Re:By what benchmark? by PeterGraham · 2008-06-01 11:09 · Score: 1

Would a configuration like this be good (and more economical) for usage as a render farm for 3d animation rendering?
Re:By what benchmark? by AlecC · 2008-06-01 11:47 · Score: 1

That is what they are built for, so if it isn't, NVidia have deeply failed.

The CUDA software takes a little getting your head around - but not much. It is basically plain vanilla C (not C++) with vector processing. A single $400 8800 GPU board matches manufacturer-that-I-used-to-work-for's $50,000 hardware at evens. It isn't faster, but it isn't slower either. Two would be twice as fast.

--
Consciousness is an illusion caused by an excess of self consciousness.
Re:By what benchmark? by PeterGraham · 2008-06-01 12:23 · Score: 1

So... I would have to learn some programming (C or C++), right? Do you know of a good book to learn from? On a system like this, would I be able to run apps like Maya and Photoshop? I'm a student at Ringling College of Art and Design going into my senior year, majoring in Computer Animation. This year I will be working on my thesis, which is a 2min 3d animated short, and I need to build a powerful computer. I just priced one out through BOXX for over $10,000, but if I can build one better for cheaper, I'm all for it!!! (plus if I can use it as a small render farm it would be a huge advantage)
Re:By what benchmark? by AlecC · 2008-06-01 13:09 · Score: 1

No, you will not be able to run Maya or Photoshop. I would hope that these would use GPU acceleration in their normal modes (either via DirectX or OpenGL) but they are still going to run mostly on the main CPU.

If you don't already speak C, then (IMO) learning C is too much of a burden to add onto the normal burdens of a thesis. If you were that kind of a geek, you would be there already. But you definitely want to look at the available graphic tools, such as the ones you mentioned, and see which ones can best use the added power of a GPU. The oomph of one of these GPUs will go of the order of a hundred times faster than the same render in the general purpose CPU - once it gets started. So it would, again IMO, be worth researching which animation package best exploits GPUs, and putting some effort into optimising that use. Starting "clean sheet" with C code is backing off too far to take a run at your target.

--
Consciousness is an illusion caused by an excess of self consciousness.
Re:By what benchmark? by PeterGraham · 2008-06-02 07:31 · Score: 1

Do you know of a good site for buying computer hardware through? and a site to get solid info on hardware compatibility?
Re:By what benchmark? by AlecC · 2008-06-02 09:23 · Score: 1

I buy my hardware from dabs.com. Cheap, fast, good refund on faulty goods. But you need to know what you are doing - they don't refund when you bought the wrong thing. For hardware compatibility, I am afraid I cannot help.

--
Consciousness is an illusion caused by an excess of self consciousness.

I guess... by LordVader717 · 2008-05-31 05:26 · Score: 3, Funny

They didn't have enough dough for 9.

Re:I guess... by lukas84 · 2008-05-31 06:58 · Score: 2, Informative

Look at the IBM x3850 M2.
Re:I guess... by nih · 2008-05-31 07:40 · Score: 1, Funny

they should have pushed it to 11

--
I'm a rabbit startled by the headlights of life :(

Re-birth of Amiga? by Yvan256 · 2008-05-31 05:30 · Score: 4, Interesting

Am I the only one seeing those alternative uses of GPUs as some kind of re-birth of the Amiga design?

Re:Re-birth of Amiga? by Quarters · 2008-05-31 06:03 · Score: 5, Informative

The Amiga design was, essentially, dedicated chips for dedicated tasks. The CPU was a Motorola 68XXX chip. Agnus handled RAM access requests from the CPU and the other custom chips. Denise handled video operations. Paula handled audio. This cpu + coprocessor setup is roughly analogous to a modern X86 PC with a CPU, northbridge chip, GPU, and dedicated audio chip. At the time the Amiga's design was revolutionary because PCs and Macs were using a single CPU to handle all operations. Both Macs and PCs have come a long way since then. 'Modern' PCs have had the "Amiga design" since about the time the AGP bus became prevalent.
nVidia's CUDA framework for performing general purpose operations on a GPU is something totally different. I don't think the Amiga custom chips could be repurposed in such a fashion.
Re:Re-birth of Amiga? by porpnorber · 2008-05-31 09:28 · Score: 3, Interesting

I think the parent was seeing the same situation a little differently. You ever code up Conway's Life for the blitter? Whoosh! Now CUDA does floating point where the Amiga could only do binary operations, and the GPU has a lot more control onboard, but the analogy is not unsound. After all, CPUs themselves didn't even do floating point in the old days (though of course they did do narrow integer arithmetic).
Re:Re-birth of Amiga? by Anonymous Coward · 2008-05-31 09:57 · Score: 2, Interesting

'Modern' PCs have had the "Amiga design" since about the time the AGP bus became prevalent.
Not really. The Amiga also had perfect synchronization between the different components. When you configured soundchip and graphics chip for a particular sample rate and screen resolution, you would know exactly how many samples would be played for the duration of one frame. And you had synchronization to the point where you could know which of the samples were played while a particular line was being sent through the D/A converter.

Considering how often audio and video gets out of sync on a PC, I will not say they have caught up with the Amiga design.

Another thing to remember is how much Amigas were being used in television production in the past. With an Amiga it was actually possible to sync the entire machine to an external clock source such that the video output could be mixed with another video source.

Syncing the CPU to an external source may not be a good idea these days. But syncing audio and video should be a no brainer, it just happens to be tricky to achieve in a modular design.

Another thing is the low latency the Amiga could achieve from input to output. If you moved the mouse while the last few lines of one picture was being sent to the monitor, the position would actually still be updated on the next frame.
Re:Re-birth of Amiga? by Anonymous Coward · 2008-05-31 10:54 · Score: 1, Informative

GPUs essentially act like field programmable gate arrays. In a CPU, to perform a typical mathematical transformation, you would write the mathematical algorithm, wrap it with a loop (such as a for or while loop) and iterate the loop once for each element in a large array. In a GPU, you define the mathematical algorithm, point the GPU at the array, and tell it to go. It applies the algorithm to every element in the array without the code overhead for the loop, and does so simultaneously for as many elements as it has pipelines, given the constraint that the algorithm cannot be recursive (dependent upon the value of another member of the array).
If your problem fits the non-recursion constraint, GPUs are going to kick ass all over normal CPUs. Most general programming problems do not fit that constraint. Most production scientific mathematical problems do.
So no, it's nothing like an Amiga.
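
The constraint described above can be shown in miniature. In this sketch (plain Python standing in for GPU hardware; all names are hypothetical), an element-wise transform gives the same answer whatever order the elements are visited in, which is what lets hardware evaluate them simultaneously. A running sum does not have that property, because each output depends on the one before it.

```python
def scale_and_offset(x):
    """An element-wise operation: the result depends only on its own input."""
    return 2.0 * x + 1.0


def map_in_order(func, data, order):
    """Apply `func` to every element, visiting indices in the given order."""
    out = [None] * len(data)
    for i in order:
        out[i] = func(data[i])
    return out


def running_sum(data):
    """A dependent computation: each output needs the previous output."""
    out, total = [], 0.0
    for x in data:
        total += x
        out.append(total)
    return out


data = [1.0, 2.0, 3.0, 4.0]
forward = map_in_order(scale_and_offset, data, range(len(data)))
backward = map_in_order(scale_and_offset, data, reversed(range(len(data))))
print(forward == backward)  # True: visiting order is irrelevant
print(running_sum(data))    # [1.0, 3.0, 6.0, 10.0]
```

Dependent computations like the running sum can still be restructured for parallel hardware, but that takes a different algorithm rather than a straight element-wise launch.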
Re:Re-birth of Amiga? by jaminJay · 2008-05-31 10:59 · Score: 1

I never thought I'd say this, but... if I had mod points right now...

The blitter and the copper of any machine can do much more than Block Image TransfER and screen timings. These two in concert and you can do many, many calculations with no CPU intervention at all (use the copper to control the blitter with blitter feedback into the copper).

But, yes. Only if you can do it all in binary, preferably huge chunks of it at a time.

--
Leela: "Is all the work done by children?" Alien: "No, not the whipping."
Re:Re-birth of Amiga? by J.R.+Random · 2008-05-31 12:32 · Score: 1

It's more like the re-birth of the ILLIAC III.

Why haven't they started releasing GPU CPUs yet? by arrenlex · 2008-05-31 05:34 · Score: 3, Interesting

This article makes it seem like it is possible to use the GPUs as general purpose CPUs. Is that the case? If so, why doesn't NVIDIA or especially AMD/ATI start putting its GPUs on motherboards? At a ratio of 8:300, a single high-end GPU seems to be able to do the work of dozens of high-end CPUs. They'd utterly wipe out the competition. Why haven't they put something like this out yet?

This is awesome! by sticks_us · 2008-05-31 05:36 · Score: 4, Funny

Ok, probably a paid NVIDIA ad placement, but check TFA anyway (and even if you don't read, you gotta love the case). It looks like heat generation is one of the biggest problems--sweet.

I like this too:

The medical researchers ran some benchmarks and found that in some cases their 4000EUR desktop superPC outperforms CalcUA, a 256-node supercomputer with dual AMD Opteron 250 2.4GHz chips that cost the University of Antwerp 3.5 million euro in March 2005...

...and at 4000EUR, that comes to what (rolls dice, consults sundial) about $20000 American?

--
"Beware of bugs in the above code; I have only proved it correct, not tried it." -- Donald Knuth

Re:This is awesome! by livingboy · 2008-05-31 05:42 · Score: 2, Informative

Instead of dice you could use KCalc. 1 EUR is about 1.55 USD, so instead of 20000 it cost only about 6200 USD.
Re:This is awesome! by krilli · 2008-05-31 05:48 · Score: 2, Informative

http://www.google.com/search?q=4000+eur+in+usd

$6.218.

--
Jag pratar lite svenska.
Re:This is awesome! by TubeSteak · 2008-05-31 05:57 · Score: 1

The FASTRA uses aircooling and with the sidepanel removed the GPUs run at 55 degrees Celsius in idle, 86 degrees Celsius under full load and 100 degrees Celsius under full load with the shaders 20% overclocked. They have to run the system with the left side panel removed as the graphics cards would otherwise overheat but they're looking for a solution for their heat problem.

Looking for a solution?
Geeks everywhere have used the old "box fan aimed at the case" solution since time immemorial.

If you wanna get real fancy, you can pull/push air through a water cooled radiator.
Example: http://www.gmilburn.ca/ac/geoff_ac.html

--
[Fuck Beta]
o0t!
Re:This is awesome! by maxume · 2008-05-31 06:36 · Score: 1

Designed in America, manufactured in Asia, purchased in Europe.

--
Nerd rage is the funniest rage.
Re:This is awesome! by pablomme · 2008-05-31 07:02 · Score: 1

and at 4000EUR, that comes to what (rolls dice, consults sundial) about $20000 American?

That made me try to extrapolate the 2002-2008 trend of the exchange rate to see when that would become true (provided the trend continues). I get 2014 and 2045 with linear extrapolations, which are gross approximations, and 2023 with an exponential extrapolation. Does anyone know how exchange rates should be expected to behave with respect to time?

--
The state you are in while your HEAD is detached... - wait, what?
Re:This is awesome! by osu-neko · 2008-05-31 07:16 · Score: 1

$6.218.
Six dollars and twenty two (rounding up) cents?! That's quite a bargain!

(One should probably refrain from using a "." as a separator when posting a figure in US dollars, where, by US convention [and this is US currency we're talking about], a "." indicates the end of the dollar part and the start of the cents part.)

--
"Convictions are more dangerous enemies of truth than lies."
Re:This is awesome! by osu-neko · 2008-05-31 07:19 · Score: 2, Insightful

Designed in America, manufactured in Asia, purchased in Europe.
20th century thinking. Welcome to globalization. The product was designed, manufactured, and purchased on Earth.

--
"Convictions are more dangerous enemies of truth than lies."
Re:This is awesome! by maxume · 2008-05-31 07:32 · Score: 1

Yeah thanks, it still makes a good riposte to the OP's Euro noise.

--
Nerd rage is the funniest rage.
Re:This is awesome! by maxume · 2008-05-31 07:37 · Score: 2, Insightful

Unpredictably.

(the big shift over the last 6 years is mostly due to wanton printing of money in the US and rather tight central banking in Europe [with a healthy dose of Chinese currency rate fixing thrown in]. The trend isn't all that likely to continue, as a weakening dollar is great for American businesses operating in Europe and horrible for European businesses operating in America, which creates [increasing amounts of] counter-pressure to the relatively loose government policy in the US, or saying it the other way around, counter-pressure to the relatively tight government policy in the EU.)

--
Nerd rage is the funniest rage.
Re:This is awesome! by Anonymous Coward · 2008-05-31 09:05 · Score: 1, Interesting

You can't use the currency exchange rate to get the US price for this system as many companies don't use this to set their prices. I did a quick check and the hardware used in this "supercomputer" would cost you a bit less than $4000 at Newegg.
Re:This is awesome! by noidentity · 2008-05-31 09:30 · Score: 1

Or just use google: http://google.com/search?q=4000+EUR+in+USD
Re:This is awesome! by Anonymous Coward · 2008-05-31 11:00 · Score: 3, Funny

WHOOOOOSH - over your head it went.
Re:This is awesome! by raynet · 2008-05-31 12:21 · Score: 1

With or without tax? The euro price does include about 20% of value added tax.

--
- Raynet --> .
Re:This is awesome! by krilli · 2008-06-03 00:03 · Score: 1

Hehe

Best FLOPS/$ ratio in the business!

--
Jag pratar lite svenska.

Tomography by ProfessionalCookie · 2008-05-31 05:36 · Score: 4, Informative

noun a technique for displaying a representation of a cross section through a human body or other solid object using X-rays or ultrasound.

In other news Graphics cards are good at . . . graphics.

Re:Tomography by krilli · 2008-05-31 05:53 · Score: 1

It's more complicated than that, trust me. The troubles inherent in graphics processing translate to a whole lot more stuff than just rendering the latest Doom.

--
Jag pratar lite svenska.
Re:Tomography by jergh · 2008-05-31 06:01 · Score: 5, Insightful

What they are doing is reconstruction: basically analyzing the raw data from a tomographic scanner and generating a representation which can then be visualized. So it's more numerical methods than graphics.

And BTW even rendering the reconstructed results is not that simple, as current graphics cards are optimized for geometry, not volumetric data.
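
To make "reconstruction" a bit more concrete, here is a toy sketch in Python. It is not the researchers' code; the 3x3 grid, the two viewing angles, and all function names are made up for illustration. It takes two projections of a tiny "slice" and smears them back across the grid (unfiltered back-projection, the crudest possible method):

```python
# Toy parallel-beam tomography on a 3x3 "slice" (all values are invented).
slice_ = [
    [0, 0, 0],
    [0, 5, 0],
    [0, 0, 0],
]

def project_rows(img):
    """Projection at 0 degrees: sum the densities along each row."""
    return [sum(row) for row in img]

def project_cols(img):
    """Projection at 90 degrees: sum the densities along each column."""
    return [sum(col) for col in zip(*img)]

def back_project(row_sums, col_sums):
    """Unfiltered back-projection: each cell gets the sum of the two rays through it."""
    return [[r + c for c in col_sums] for r in row_sums]

rows = project_rows(slice_)
cols = project_cols(slice_)
recon = back_project(rows, cols)

for line in recon:
    print(line)
```

The brightest cell of `recon` lands where the dense spot was, but with streaks along the row and column. Real reconstruction uses many more angles plus filtering or iterative correction to get rid of those streaks, and every output cell can be computed independently, which is the part that maps well onto a GPU.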
Re:Tomography by imrehg · 2008-05-31 08:13 · Score: 2, Informative

In other news Graphics cards are good at . . . graphics.
It's not the graphics part that makes it so computer-intensive.... All the mathematics behind it, once that's done, the presentation could be done on any ol' computer....

So, if you mean by "graphics" that they are good at difficult geometrical calculations (like in games, for example), then you are right.... because that's what it is, a truck-load of geometry...

From Wikipedia:
Tomography: "[...] Digital geometry processing is used to generate a three-dimensional image of the inside of an object from a large series of two-dimensional X-ray images taken around a single axis of rotation."
Re:Tomography by ProfessionalCookie · 2008-05-31 09:54 · Score: 1

if you mean by "graphics" that they are good at difficult geometrical calculations (like in games, for example)
I can't imagine any other definition of 3d graphics.
Cheers ;)
Re:Tomography by jergh · 2008-05-31 10:05 · Score: 2, Informative

Sounds like a simple 3D to 2D projection issue, no? Not really simple. The input data is "raw" in a sense that it contains lots of artifacts from the acquisition which have to be removed during reconstruction.
Re:Tomography by biobogonics · 2008-05-31 18:02 · Score: 1

It's not the graphics part that makes it so computer-intensive.... All the mathematics behind it, once that's done, the presentation could be done on any ol' computer....

So, if you mean by "graphics" that they are good at difficult geometrical calculations (like in games, for example), then you are right.... because that's what it is, a truck-load of geometry...

From Wikipedia:
Tomography: "[...] Digital geometry processing is used to generate a three-dimensional image of the inside of an object from a large series of two-dimensional X-ray images taken around a single axis of rotation."

Computed tomography has been around for some time. Early CT ran on computers about as powerful as the 16-bit PDP-11. But traditional CT requires x-ray images to be taken from many different angles around the patient. In addition, it has problems with artifacts and noise, such as movement. The computations are relatively simple because you are reconstructing an image from a large number of projections. To reduce the time, radiation dose and influence of artifacts, you can reduce the number of angles, etc., but this comes at a cost in much longer computation time.

So you trade off a shorter scan for longer reconstruction of the image.

Debate if you wish if this is a supercomputer or not. I think this is exciting because it is being done with relatively inexpensive off the shelf hardware.

Also, medical and instrumentation computers tend to lag far behind the usual gamer or consumer machine. 20+ years ago I was shown a piece of new lab equipment, the size of a large washing machine. Its owner bragged that its CPU was a PDP-8, something much less powerful than the contemporary IBM PC.

Imagine how long you would have to wait if Duke Nukem Forever had to be certified by the FDA before release.
Re:Tomography by iwein · 2008-05-31 23:45 · Score: 1

The researchers come to a different conclusion: "In fact, graphics cards are highly suitable for tomography computations."

I'm sure it's not that simple, but apparently they thought (as the parent noted) that these graphical operations should work very well on graphics cards.

And they have the data to back it up.

--
Show a man some news, distract him for an hour. Show a man some mod points, distract him for the rest of his life.

coincidence by DaveGod · 2008-05-31 05:37 · Score: 2, Insightful

I can't imagine that it is a coincidence that this comes along just as Nvidia are crowing about CUDA, or that the resulting machine looks like a gamer's dream rig.

While there is ample crossover between hardware enthusiasts and academia, anyone with solely the computation interest in mind probably wouldn't be selecting neon fans or aftermarket coolers, or spending that much time on presentable wiring.

Re:coincidence by osu-neko · 2008-05-31 07:28 · Score: 1

I can't imagine that it is a coincidence that this comes along just as Nvidia are crowing about CUDA, or that the resulting machine looks like a gamer's dream rig.

Given that a gamer's dream rig is precisely the kind of platform that can best computationally solve the problem they have at hand, you're right, there's nothing at all coincidental about it.

While there is ample crossover between hardware enthusiasts and academia, anyone with solely the computation interest in mind probably wouldn't be selecting neon fans or aftermarket coolers, or spending that much time on presentable wiring.

Anyone with solely one interest in mind is probably not a human being. That's the kind of singlemindedness you find in a good calculator but is probably literally impossible for a human to achieve. And we know that's not the case here. If that were their one and only interest, we'd never have heard about it; they'd have just built and used the machine to solve their problem. The fact that we even know it exists is because they also displayed a human need to tell other people about it, and even show it to them. Of course, once you stray into wanting to show it to someone, making it presentable becomes a real concern.

--
"Convictions are more dangerous enemies of truth than lies."

Re:Why haven't they started releasing GPU CPUs yet by kcbanner · 2008-05-31 05:37 · Score: 4, Insightful

They are useful for applications that can be massively parallelized. Your average program can't break off into 128 threads, that takes a little bit of extra skill on the coder's part. If, for example, someone could port gcc to run on the GPU, think of how happy those Gentoo folks would be :) (make -j128)!

--
Obligatory blog plug: http://www.caseybanner.ca/

In other news... by bobdotorg · 2008-05-31 05:39 · Score: 4, Funny

... 3D Realms announced this as the minimum platform requirements to run Duke Nuke'em Forever.

--
__ Someday, but not this morning, I'll finally learn to use the preview button.

Re:In other news... by MindlessAutomata · 2008-05-31 06:29 · Score: 1

Sounds extreme, but you have to keep in mind that when DNF comes out, all those cards and such will be stored in our closets or will be being used as spare equipment for our silly home Linux servers.

No it does not by symbolset · 2008-05-31 05:39 · Score: 1

It does not run Linux.

Can it? Anybody?

--
Help stamp out iliturcy.

Re:No it does not by 77Punker · 2008-05-31 05:49 · Score: 1

I guess this is probably a joke, but the sections of code that run on the GPU should work on any platform supported by CUDA (which is probably what they're using, didn't read TFA) with little or no modification. Unless they're modifying structures created by D3D, that is....
Re:No it does not by Tweenk · 2008-05-31 06:05 · Score: 1

Can it? Anybody? You can begin working on it, they also have SDKs for Linux:
CUDA SDK download

--
Those who would give up liberty to obtain working drivers, deserve neither liberty nor working drivers.
Re:No it does not by You+ain't+seen+me! · 2008-05-31 06:47 · Score: 1

It does not run Linux. But Microsoft will start porting Vista to it, as soon as they get their x86 version sorted out.

Re:Why haven't they started releasing GPU CPUs yet by MynockGuano · 2008-05-31 05:39 · Score: 1

No; if you read all the way to the end, you can see where they discuss the limited specific "general" programs that currently support this kind of thing. Namely, folding@home (on ATI cards) and maybe Photoshop in the future. The tomography software they use is likely their own code, is graphics-heavy, and is tailored for this set-up.

Re:Why haven't they started releasing GPU CPUs yet by Anonymous Coward · 2008-05-31 05:39 · Score: 1, Interesting

For information on their current HPC platform check out http://www.nvidia.com/object/tesla_computing_solutions.html FWIW I don't think there would be that big of a performance advantage in putting the GPUs on the motherboard; in fact you'd probably get a performance decrease if you UMA'd the memory. With discrete boards each GPU has its own framebuffer, resulting in higher memory bandwidth.

Finally... by ferrellcat · 2008-05-31 05:40 · Score: 5, Funny

Something that can play Crysis!

Re:Finally... by Yvan256 · 2008-05-31 05:43 · Score: 4, Funny

If you call 8 FPS "playing".
Re:Finally... by cpricejones · 2008-05-31 07:53 · Score: 2, Funny

And that's on 640x480

nVidia Tesla by beef3k · 2008-05-31 05:40 · Score: 1

Why not just buy a premade Tesla system from nVidia and avoid the heating problems?

Re:nVidia Tesla by Anonymous Coward · 2008-05-31 05:44 · Score: 1, Insightful

A Tesla system would cost a lot more.

This is not a supercomputer by poeidon1 · 2008-05-31 05:41 · Score: 3, Insightful

This is an example of an acceleration architecture. Anyone who has used FPGAs knows that. Of course, making sensational news is all too common on /.

--
They called me mad, and I called them mad, and damn them, they outvoted me. -Nathaniel Lee

Re:Why haven't they started releasing GPU CPUs yet by 77Punker · 2008-05-31 05:43 · Score: 4, Informative

It is possible to solve non-graphics problems on graphics cards nowadays, but the hardware is still very specialized. You don't want the GPU to run your OS or your web browser or any of that; when a SIMD (single instruction, multiple data) problem arises, a decent computer scientist should recognize it and use the tools he has available.
Also, this stuff isn't as mature as normal C programming, so issues that don't always exist in software that's distributed to the general public will crop up because not everyone's video card will support everything that's going on in the program.
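
For anyone unsure what a SIMD-shaped problem looks like, a minimal sketch in Python (the function name and the numbers are just for illustration; this is plain CPU code, not GPU code): the same operation is applied to every element, and no element depends on any other.

```python
def saxpy(a, xs, ys):
    """Apply one and the same instruction, a*x + y, to every pair of elements."""
    return [a * x + y for x, y in zip(xs, ys)]

# Every output element could be computed at the same time by a separate lane.
result = saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
print(result)
```

On a GPU each element would go to its own thread; on a CPU you loop (or use its much narrower vector units). That difference in width is where the speedup comes from.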

Killer Slant by FurtiveGlancer · 2008-05-31 05:43 · Score: 1, Insightful

The guys explain the eight NVIDIA GPUs deliver the same performance for their work as more than 300 Intel Core 2 Duo 2.4GHz processors.

Pardon the italics, but I was impacted by the killer slant of this posting.

For specific kinds of calculations, sure, GPGPU supercomputing is superior. I would question what software optimization they had applied to the 300 CPU system. Apparently, none. Let's not sensationalize quite so much, shall we?

--
Invenio via vel creo

Re:Killer Slant by osu-neko · 2008-05-31 07:33 · Score: 1

For specific kinds of calculations, sure, GPGPU supercomputing is superior. I would question what software optimization they had applied to the 300 CPU system. Apparently, none.
That sounds highly unlikely. The time it takes to run these kinds of programs on a PC is a major source of pain. They throw every optimization in the book at these programs.

--
"Convictions are more dangerous enemies of truth than lies."

Not a Supercomputer -- Special purpose hardware by gweihir · 2008-05-31 05:44 · Score: 2, Informative

It is also not difficult to find other tasks where, e.g., FPGAs perform vastly better than general-purpose CPUs. That does not make an FPGA a "Supercomputer". Stop the BS, please.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.

Re:Not a Supercomputer -- Special purpose hardware by emilper · 2008-05-31 07:17 · Score: 2, Interesting

Aren't most supercomputers designed to perform some very specific tasks? You don't buy a supercomputer to run the Super edition of Excel.
Re:Not a Supercomputer -- Special purpose hardware by iwein · 2008-05-31 23:36 · Score: 1

No, super computers are designed to be fast (at all tasks). That's why you can outsmart them if you have a very specific task at hand.

--
Show a man some news, distract him for an hour. Show a man some mod points, distract him for the rest of his life.

Re:Why haven't they started releasing GPU CPUs yet by yanyan · 2008-05-31 05:45 · Score: 1

Check out the GPGPU (General Purpose GPU) project:

http://www.gpgpu.org/

Re:Why haven't they started releasing GPU CPUs yet by gweihir · 2008-05-31 05:48 · Score: 1

This article makes it seem like it is possible to use the GPUs as general purpose CPUs. Is that the case? If so, why doesn't NVIDIA or especially AMD\ATI start putting its GPUs on motherboards? At a ratio of 8:300, a single high-end GPU seems to be able to do the work of dozens of high-end CPUs. They'd utterly wipe out the competition. Why haven't they put something like this out yet?

Simple: This is not a supercomputer at all, just special-purpose hardware running a very special problem. For general computations, GPUs are pretty inferior.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.

Brick of GPUs by Rufus211 · 2008-05-31 05:48 · Score: 4, Interesting

I love this picture: http://fastra.ua.ac.be/en/images.html

Between the massive brick of GPUs and the massive CPU heatsink/fan, you can't see the mobo at all.

Re:Brick of GPUs by Fumus · 2008-05-31 06:41 · Score: 3, Funny

They spent 4000 EUR for the computer, but use two boxes in order to situate the monitor higher. I guess they spent everything they had on the computer.

Re:Why haven't they started releasing GPU CPUs yet by krilli · 2008-05-31 05:51 · Score: 1

I don't understand completely what you mean, but:

* Only some people need the extra speed.

* The tasks that can be accelerated this much belong to a specific domain of computation.

* NVIDIA're just starting.

--
Jag pratar lite svenska.

Limited Application by MOBE2001 · 2008-05-31 05:52 · Score: 1

It is obvious that, if a computer is using GPUs exclusively, it is limited to vector or data parallel processing. And it is no surprise that it is being used by an outfit that specializes in visual processing, which is ideally suited to data parallel processing. Change the benchmark program to code that has a lot of data dependencies and this "supercomputer" will choke to a crawl.
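
A small Python sketch of the difference (both functions and the constants are invented for illustration): the first loop has no dependencies between elements, while in the second, as written, every step needs the result of the previous one, so extra cores cannot help.

```python
def independent(xs):
    """Data parallel: each output depends only on its own input."""
    return [x * x + 1 for x in xs]

def chained(seed, steps):
    """Loop-carried dependency: step n cannot start before step n-1 is done."""
    x = seed
    for _ in range(steps):
        x = (x * x + 1) % 1000003
    return x

print(independent([1, 2, 3]))
print(chained(2, 3))
```

With 128 lanes available, `independent` can in principle run 128 elements at once, whereas `chained` still takes `steps` sequential iterations no matter how much hardware you throw at it.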

Re:Limited Application by Calinous · 2008-05-31 07:04 · Score: 4, Informative

Even more: if you don't optimize the code specifically for the GPU-based supercomputer, your performance goes down the drain. I wouldn't be surprised if they obtained a speedup of an order of magnitude or more from the aggressive code optimisation.
The idea is: the original code would run faster on a machine with 8 Core2Duos than on the 8 GPUs. Further optimising the code will do little for the Core2Duos, due to limited memory bandwidth, FSB bandwidth, and so on.
Meanwhile, a pipelined system (load, compute, store) optimised for the GPU benefits greatly from the huge bandwidth (50GB/s on current systems), the huge number of computation units (128 or more) and so on.
Re:Limited Application by tdelaney · 2008-05-31 09:57 · Score: 5, Insightful

Sure - but at 4000 euros, you can afford to do a one-off purchase and write custom software for a limited application. The point of this is that if your application suits it, this is a very cheap way to get supercomputer performance without paying for your own supercomputer (cluster) or time on an existing one.

Wave of the Future? Yes by bockelboy · 2008-05-31 05:54 · Score: 5, Informative

Wave of the Future? Yes*. Revolution in computing? Not quite.

The GPGPU scheme is, after all, a re-invention of the vector processing of old. Vector processors died out, however, because there were too few users to support. Now that there's a commercially viable reason to make these processors (PS3 and video games), they are interesting again.

The researchers took a specialized piece of hardware, rewrote their code for it, and found it was faster than their original code on generic hardware. The problems here are that you have to rewrite your code (High Energy Physics codebases are about a GB, compiled... other sciences are similar) and you have to have a problem which will run well on this scheme. Have a discrete problem? Too bad. Have a gigantic, tightly coupled problem which requires lots of inter-GPU communication? Too bad.

Have a tomography problem which requires only 1GB of RAM? Here you go...

The standard supercomputer isn't going away for a long, long time. Now, as before, a one-size-fits-all approach is silly. You'll start to see sites complement their clusters and large-SMP machines with GPU power as scientists start to understand and take advantage of them. Just remember, there are 10-20 years of legacy code which will need to be ported... it's going to be a slow process.

Re:Why haven't they started releasing GPU CPUs yet by ChrisMaple · 2008-05-31 05:56 · Score: 1

GPUs are massively parallel and very poor at processing branches. This doesn't make for good speed with most programs. For a company to put out a product that they claimed is much faster than the competition, they would have to be very selective with their examples and create new benchmarks that took advantage of their product. They might even have to create new programs to take advantage of the power, like a modified gimp.

In all likelihood, if they tried too hard to advertise their speed advantage, they would be called dishonest because most programs would not exhibit a speedup.

--
Contribute to civilization: ari.aynrand.org/donate

Re:Wave of the Future? Yes by 77Punker · 2008-05-31 06:03 · Score: 4, Informative

Fortunately, Nvidia provides a CUDA version of the basic linear algebra subprograms, so even if your software is hard to port, you can speed it up considerably if it does some big matrix operations, which can easily take a long time on a CPU.
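
To show why big matrix operations are such a good fit, here is a plain-Python sketch of a matrix multiply. To be clear, this is not the CUDA BLAS interface, just a hypothetical `matmul` written for illustration: each output element is its own dot product and depends on no other output element.

```python
def matmul(a, b):
    """Naive matrix product; every (i, j) entry is an independent dot product."""
    cols_b = list(zip(*b))  # columns of b, so each entry is row-times-column
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

product = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product)
```

For an n-by-n product that is n*n independent pieces of work, each of length n, which is exactly the kind of job a library routine on the GPU can spread over all its units.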

Vector Computing by alewar · 2008-05-31 06:14 · Score: 2, Interesting

They are comparing their system against normal computers. It'd be interesting to see a benchmark against a vector computer, e.g. the NEC SX9.

Re:Vector Computing by Yvan256 · 2008-05-31 09:09 · Score: 1

Yeah but how much does a NEC SX9 cost?
Re:Vector Computing by Jeremy+Erwin · 2008-05-31 12:23 · Score: 1

You can lease one for 2.98 Million yen, per month.

Re:The idea is to use the CPU as the CPU by dreamchaser · 2008-05-31 06:23 · Score: 4, Insightful

Because for 95%+ of the problems a general purpose computer tackles GPU's would suck. It's only in very special cases that GPU's outperform CPU's. Thus, your idea is a poor one.

Get the performance where it's most needed by mangu · 2008-05-31 06:43 · Score: 3, Insightful

They are useful for applications that can be massively parallelized

Precisely. But that happens to be one of the areas where more performance is still needed.

You don't need a super-duper CPU for text editing, that's for sure. For most of the tasks people do on computers, we have had CPU enough for the last 15 years or more. But where we still need more CPU happens to be mostly in tasks that ARE massively parallel, for instance, physics simulations, of which you will find several examples in the nVidia site.

I'm following this technology with much interest, and I think I will have a major upgrade in my home computer soon. My old FX-5200 card has been more than enough for my gaming needs, but now I have a new reason for upgrading.

Re:Get the performance where it's most needed by kipman725 · 2008-05-31 07:11 · Score: 5, Funny

You don't need a super-duper CPU for text editing
clearly you have never used EMACS ;)
Re:Get the performance where it's most needed by Hadlock · 2008-05-31 08:44 · Score: 1

For most of the tasks people do on computers, we have had CPU enough for the last 15 years or more.

We're using core2 duos at my work to run Office 97 and a foxpro database app that's been warmed over once since 1996. I use the remaining 99% of CPU power for F@H and World Community Grid clients.

--
moox. for a new generation.
Re:Get the performance where it's most needed by jacquesm · 2008-05-31 09:49 · Score: 1

to expand on that a bit if you're looking to get your hands dirty with cuda & linux you can start doing so here: http://www.nvidia.com/object/cuda_get.html#linux

--
MP3 Search Engine
Re:Get the performance where it's most needed by Dolda2000 · 2008-05-31 12:55 · Score: 4, Funny

"clearly you have never used Eclipse ;)"
There, I corrected that for you.
Re:Get the performance where it's most needed by Ant+P. · 2008-05-31 13:32 · Score: 1

You don't need a super-duper CPU for text editing, that's for sure. Yeah, but if I had a CPU like that it could run the text editor and switch off the other 127 cores.
Re:Get the performance where it's most needed by Iron+Condor · 2008-05-31 19:02 · Score: 1

Wait -- you can use EMACS for text editing now? What will they think of next...

--
We're all born with nothing.
If you die in debt, you're ahead.
Re:Get the performance where it's most needed by eugene_roux · 2008-05-31 21:15 · Score: 2, Funny

"clearly you have never used Eclipse ;)"
There, I corrected that for you.

Clearly YOU have never used EMACS!

--
Part Time Philosopher, Oft Times Romantic, Full Time Unix Geek
Re:Get the performance where it's most needed by Renegrade · 2008-06-01 06:34 · Score: 1

You don't need a super-duper CPU for text editing, that's for sure. Except under Vista!
My old FX-5200 card... Ugh 5200FX? Those are slower than the 4200ti cards! It was the secret, unlabeled "MX" card of its line. I bought one myself as a replacement for an older, lower-generation card that had failed, and was surprised by how much slower it was.

The price ! by this+great+guy · 2008-05-31 06:48 · Score: 2, Funny

The system uses four NVIDIA GeForce 9800 GX2 graphics cards, costs less than 4,000 EUR to build

What's more crazy: calling something this inexpensive a supercomputer, or 4 video cards costing a freaking 4,000 EUR.

Re:The price ! by TheUnknownOne · 2008-05-31 07:23 · Score: 1

I'm even more surprised by the fact that they gave links to Newegg for all their parts, and if I build a similar system swapping out their PSU choice for a 1300W one from Newegg (the one Newegg had with the most PCIe connectors, as they didn't have the PSU they used) it costs only $3,400 US, or as per Google, about 2,200 EUR. Granted prices may have dropped since they did this, but I doubt they've been cut in half.
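
Spelling out the arithmetic with the figures above (the 4,000 EUR is the quoted build cost; the variable names are mine):

```python
usd_price = 3400.0      # Newegg total from the comment above
eur_price = 2200.0      # the same total converted, per the comment
quoted_eur = 4000.0     # build cost quoted in the summary

implied_rate = usd_price / eur_price   # USD per EUR implied by the conversion
share_of_quote = eur_price / quoted_eur

print(round(implied_rate, 3), share_of_quote)
```

So the conversion implies roughly 1.55 USD per EUR, and the Newegg build comes to 55% of the quoted figure, i.e. prices would have had to drop by about 45% to explain the gap on their own.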
Re:The price ! by danhuby · 2008-06-01 05:18 · Score: 1

What about the time taken to build it? Time is not normally free.

I wonder how that compares to the D870 by Lazy+Jones · 2008-05-31 06:57 · Score: 1

Nvidia offers an external GPU solution specifically for "deskside supercomputing", the Tesla D870. It has only 2 cores with 1.35GHz each, apart from it being a bit more expensive, I wonder how it compares (you can connect several to a PC).

--
"I love my job, but I hate talking to people like you" (Freddie Mercury)

Re:I wonder how that compares to the D870 by Wiseleo · 2008-05-31 19:57 · Score: 1

Tesla - Cuda 1.0
Graphics boards - Cuda 1.1 (capable of atomic operations)

Tesla board - 1.5GB RAM
Graphics board - 512MB RAM per GPU

If you have no requirement for 1.5GB of on-board RAM, the graphics boards are a better option. A Tesla is basically a Quadro board without the video output.

Computationally to CUDA, they are the same.

I am going to use some 9800GX2s for h.264 video encoding with the upcoming RapiHD product. I am still debating whether I will use 4 GX2s in a single box or if I will simply have twice as many boxes.

One thing that is curious about the article is that they are only using a single quadcore chip. Most people who are considering using quad GX2s suggest a core per GPU. Their rig probably would perform even faster with 8 cores.

--
Leonid S. Knyshov
Find me on Quora :)

Re:Why haven't they started releasing GPU CPUs yet by Dogtanian · 2008-05-31 07:02 · Score: 1

This article makes it seem like it is possible to use the GPUs as general purpose CPUs. Is that the case? As well as the issues that others have mentioned, there's also the problem of accuracy with GPUs.

AFAIK, in many (all?) ordinary consumer graphics cards, minor mistakes by the GPU are tolerated because they'll typically result in (at worst) minor or unnoticeable glitches in the display. I assume that this is because, to get the best performance, designers push the hardware beyond levels that would be acceptable otherwise.

Clearly if you're using them for other mathematical operations, or to partly replace a standard CPU, such mistakes might *not* be acceptable.

--
"Slashdot - News and Chat Sites Deviant". (Click "homepage" link above for details).

Re:Why haven't they started releasing GPU CPUs yet by **loki969** · 2008-05-31 07:13 · Score: 1

This article makes it seem like it is possible to use the GPUs as general purpose CPUs. The summary might, but the article doesn't.

Windows XP? by marind · 2008-05-31 07:16 · Score: 1

What a pity that they are running Windows XP. This computer could be the first to see running Vista at an acceptable speed.

Re:First post by gardyloo · 2008-05-31 07:18 · Score: 1

On par with anyone else's system?

Sooo... using GPU for graphics? by flyingfsck · 2008-05-31 07:20 · Score: 1

They are using graphical processors to process graphics. Truly revolutionary. Who woulda thunkit?

--
Excuse me, but please get off my Pennisetum Clandestinum, eh!

Re:Sooo... using GPU for graphics? by Warbothong · 2008-05-31 07:31 · Score: 1

Makes me slightly annoyed whilst I sit here watching my windows stutter around the screen when I switch desktops since all of my graphics are being processed by my CPU.

Thanks a lot ATI :(

(PS: This is in Enlightenment not Compiz)

Re:Why haven't they started releasing GPU CPUs yet by Dachannien · 2008-05-31 07:26 · Score: 1

Your average program can't break off into 128 threads, that takes a little bit of extra skill on the coder's part. Or, in some cases, a significant lack of skill, although in those cases, it usually doesn't stop at 128.

Hmm... by grodzix · 2008-05-31 07:31 · Score: 1

Ok, so if GPUs are far more powerful than CPUs why doesn't nVidia just create a CPU based on their GPU? AMD & Intel would be in serious troubles then. And if 300 CPUs can be overtaken by just 8 GPUs, why the heck people haven't been doing it before? No one got such idea? Yeah, right... Something stinks here...

--
My Windows is NOT slow, it's special!

Re:Hmm... by sycomonkey · 2008-05-31 08:16 · Score: 1

The article is correct; however, most software is not nearly so parallel/multithreaded that it can be run efficiently on a GPU. For general computing purposes, it is better to have a few fast cores (AMD/Intel) than to have many slower cores (ATI/nVidia). However, certain scientific computing applications are heavily parallelizable and so run very fast on GPUs. Basically, it depends on your definition of "powerful".

--
--The universe will not be altered by forum threads, even those which are very wry. --Tycho Brahe (Penny Arcade)

Oblig by s1d · 2008-05-31 07:56 · Score: 1

*Insert Beowulf cluster meme*

--
In Soviet Russia, everything runs linux.

Have they profiled it? by Chris+Snook · 2008-05-31 07:56 · Score: 3, Interesting

I'm extremely curious to know where the performance bottleneck is in this system. Is it memory bandwidth? PCIe bandwidth? Raw GPU power? Depending on which it is, it may or may not be very easy to improve upon the price/performance ratio of this setup. Given that the work parallelizes very easily, if you could build two machines that are each 2/3 as powerful and each cost 1/2 as much, that's a huge win for other people trying to build similar systems.
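
The arithmetic behind that last sentence, as a quick Python sketch. It assumes the work splits perfectly across the two machines with no communication overhead, which is my assumption, and the numbers are the hypothetical ones above, normalised to the current box:

```python
from fractions import Fraction

# Current machine, normalised: performance 1 at cost 1.
machines = 2
perf_each = Fraction(2, 3)
cost_each = Fraction(1, 2)

total_perf = machines * perf_each   # combined throughput of both boxes
total_cost = machines * cost_each   # combined price of both boxes
perf_per_cost = total_perf / total_cost

print(total_perf, total_cost, perf_per_cost)
```

Same money, a third more throughput. Whether that is reachable depends on which resource is the bottleneck, hence the question.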

--
There's no failure quite as dissatisfying as a complete and total solution to the wrong problem.

Re:Wave of the Future? Yes by Josef+Meixner · 2008-05-31 08:12 · Score: 2, Interesting

The GPGPU scheme is, after all, a re-invention of the vector processing of old. Vector processors died out, however, because there were too few users to support. Now that there's a commercially viable reason to make these processors (PS3 and video games), they are interesting again.

Since when have "vector processors died out"? The "Earth Simulator", for example, used the NEC SX-6 CPU; currently the SX-9 is sold. Vector processors never died out and were in use for what they are best at. The GPU and the Cell are no match for either processor: first, they both are only fast in single precision mode and much slower when they have to do double precision (the second generation of Cell is better at double precision), and second, they both have a weak memory subsystem when compared to a true VPU. It is slow and they can only use small memories. As far as I know the Cell can't even chain its VPUs, something which was standard since the Cray-2 on VPUs.

Single precision by SlashV · 2008-05-31 08:16 · Score: 1

I have the impression that the graphics cards only do single precision floating point math, just like the CELL. Makes sense since both were developed for gaming, and single precision is sufficient for graphics.
For many scientific problems however, double precision is a must have. I guess that's one reason why GPU based systems aren't in the .
I wish people were more explicit about what kind of FLOP they are talking about when claiming TeraFLOPs performance.
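
A quick way to see what single precision costs you, sketched in Python with the standard `struct` module (the helper name `to_f32` is mine): round a value to the nearest 32-bit float and compare it with the ordinary 64-bit double.

```python
import struct

def to_f32(x):
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

big = 16777217.0           # 2**24 + 1, exactly representable as a double
print(big, to_f32(big))    # the single-precision copy loses the trailing 1
print(0.1 == to_f32(0.1))  # 0.1 rounds differently in the two formats
```

A 32-bit float carries a 24-bit significand, so integers above 2**24 start to collapse onto their neighbours. Fine for pixel colours, not so fine when you accumulate millions of terms in a simulation.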

24 core IKEA-cabinet setup? by plaz_swe · 2008-05-31 08:41 · Score: 1

Sure, this machine is for floating point calculations. But for general-purpose computing, I think one of these setups would be more efficient: "24-core Linux cluster in IKEA cabinet" http://helmer.sfe.se/

Direct link to the project website by kjba · 2008-05-31 09:28 · Score: 1

The main post does not contain a direct link to the project webpage: http://fastra.ua.ac.be/, which contains a lot more info and explanations.

Re:FASTRA (overclocked) by Arimus · 2008-05-31 10:13 · Score: 1

The graphs are time taken in seconds to perform a task - so shorter graph == better.

--
--- Users are like bacteria -> Each one causing a thousand tiny crises until the host finally gives up and dies.

Re:Why haven't they started releasing GPU CPUs yet by Shark · 2008-05-31 10:23 · Score: 1

If, for example, someone could port gcc to run on the GPU, think of how happy those Gentoo folks would be :) (make -j128)! The mere mention of it gave me the Gentoo equivalent of a giant stiffy.

--
Mind the frickin' laser...

Why general purpose? by nurb432 · 2008-05-31 10:48 · Score: 1

Wouldn't be the first time a supercomputer was developed with a specific task in mind.

--
---- Booth was a patriot ----

We Need a Universal Multicore Processor by MOBE2001 · 2008-05-31 10:55 · Score: 3, Interesting

The point of this is that if your application suits it, this is a very cheap way to get supercomputer performance without paying for your own supercomputer (cluster) or time on an existing one.

No doubt about it. In spite of my admittedly negative criticism, I applaud these guys because I think this shows the amazing potential of multicore parallel computing to bring supercomputing power to the desktop and even to the laptop and the cellphone. However, this potential will not be realized unless we can find a way to design a universal multicore processor architecture that is at home in all possible parallel environments, not just vector parallel systems. IOW, we need a parallel processor that can handle anything we can throw at it with equal ease. Unfortunately, both the industry and academia are pushing the field toward so-called heterogeneous processors, hideous monsters that will be a nightmare to write code for. Check out Nightmare on Core Street for a good explanation of the multicore crisis and how it can be solved.

CUDA C programming environment is the key by Anonymous Coward · 2008-05-31 11:37 · Score: 1, Informative

The key to why they were able to do this is the CUDA C programming environment:
http://www.nvidia.com/cuda

Define: which is better? by mcrbids · 2008-05-31 11:44 · Score: 4, Informative

Which is faster? A Lamborghini or a 5-ton flatbed truck?

Depends on what you're after! If you are trying to get yourself from point A to point B, the Lamborghini is the obvious choice. But if you need to move 4.5 tons of stuff from point A to point B, the Lamborghini would suck ass when compared to the flatbed truck.

It's just a question of what you are trying to accomplish. There is no absolute framework for "power" to solve problems, even if you define it fairly narrowly. For example, let's talk about 'pattern matching': A free database (like PostgreSQL) on cheap hardware can search through millions of records to deliver a query result in a tenth of a second. In that respect, Postgres is WAY faster than, say, the human brain. But the human brain will KICK ASS over just about any other technology out there in deciding whether or not a particular image contains a cat.

Use the right tool for the job, and you'll be amazed at the results. That 8 GPUs handily outperform 512 CPU cores at a specific task is not surprising - the GPUs are designed from the beginning to solve the kind of problem that's needed!

Personally, I'm surprised as to why there hasn't been more development behind the FPGA: are they just expensive?

--
I have no problem with your religion until you decide it's reason to deprive others of the truth.

Re:Define: which is better? by Jaime2 · 2008-05-31 15:58 · Score: 1

So you agree with me. I never suggested that there was an absolute framework for "power". The summary is written in such a way that a casual reader who takes it at face value may come away believing that NVIDIA has invented some killer technology that is going to wipe Intel off the map. The truth is that the article linked from the summary refers to a group of people who found that NVIDIA GPUs happened to match their processing needs much better than the current crop of typical desktop/server processors.

Personally, I'm surprised as to why there hasn't been more development behind the FPGA: are they just expensive?

Coding that close to the metal is really expensive. It is usually easier to throw servers at the problem.
Re:Define: which is better? by irc.goatse.cx+troll · 2008-05-31 16:52 · Score: 1

But the human brain will KICK ASS over just about any other technology out there in deciding whether or not a particular image contains a cat.

I disagree -- I have about a 4 in 10 success rate on Rapidshare's new "type the letters that have a cat behind them" captcha. Surely someone could write an app to do better.

--
Pain lasts, kid. Its how you know you're alive. Sometimes I think this growing up thing is just pain management-TheMaxx
Re:Define: which is better? by arktemplar · 2008-05-31 19:36 · Score: 1

Well, consider it this way - high level to VHDL/Verilog sucks a lot. So you are left with optimising VHDL code for performing highly parallel computations. FPGAs have only recently started getting competitive with GPUs etc., with the advent of large FPGAs which clock all the way up to 500 MHz and have enough resources to make your mouth water (Xilinx LX330 has 54K LUT-6 slices).

There was a great paper by a guy called Underwood, which showed that in around 2009-2010 FPGAs could manage to outperform CPUs at computationally intensive tasks. Having worked on this for some time now, I do find myself believing him.

--
blog plug -> The Darker Side of Light
Re:Define: which is better? by thetoadwarrior · 2008-05-31 19:50 · Score: 1

But it's not just a picture of a cat. It's a warped black & white picture of a cat and a letter/number besides similar letters / numbers with a similar warped animal on them.

Rapidshare wants you to pay for it so they aren't going to make it easy and even if you do get it on the first go I've found it claims you've exceeded your number of tries and you must either pay or wait to try again.

Has been done before with PS3s by nkeat · 2008-05-31 13:02 · Score: 2, Interesting

The guys in Antwerp have probably got themselves the greater number crunching power, but reconstruction of tomographic images has been done using similar multi-core hardware. See the following (pdf alert) from the University of Erlangen, which uses a cluster of PS3s for a great use of commodity consumer hardware http://www.google.co.uk/url?sa=t&ct=res&cd=1&url=http%3A%2F%2Fwww.imp.uni-erlangen.de%2FIEEE%2520MIC2007%2FKnaup_Poster_M19-291.pdf&ei=t_FBSKnZKoie1gbh2Y23Bg&usg=AFQjCNG7vNGmMM2hBrYdVKbwZAJZL0oS3Q&sig2=sEdlnPROC77CZ_KJ5OOgrg .

A funny twist on 3DFx' marketing campaign by DrYak · 2008-05-31 13:09 · Score: 1

What this makes me think of is the marketing campaign that 3DFx used back when they launched their Voodoo 5 series ("So powerful it's kind of ridiculous").

Most of the TV spots started by explaining how scientists could save humanity with GFLOPS-grade chips. But then, humorously, the spot announces that they decided to play games instead (often with hilarious effect on the various "dreams of a better humanity" that the first half of the spot showed).

In a funny twist of things, it's the exact opposite that happened actually : nVidia and ATI were aiming for the highest possible frame rate in games, but then they decided to do science and did CUDA & Brook.

--
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]

Re:A funny twist on 3DFx' marketing campaign by cheesybagel · 2008-06-03 01:01 · Score: 1

Not really. If 3Dfx was ever serious about scientific or engineering applications, they would have made decent OpenGL drivers and they never did. They only had a craptascular miniGL driver. They wanted everyone to use their proprietary Glide, but by that time, people preferred OpenGL or Direct3D.

Intel is planning it by DrYak · 2008-05-31 13:19 · Score: 1

An ATI or nVidia GPU as main processor is completely out of the question. These things aren't even able to *call* a function (everything is inlined at compile time), not to mention that everything is run by a SIMD engine behind the scenes, which means that all the processing units must actually run the same code at the same time.
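To make the "same code at the same time" point concrete, here is a rough sketch in plain Python of how lockstep lanes can handle an if/else: every lane computes both branches, and a per-lane mask picks which result to keep. This is a simplified model, not a description of any particular GPU, and the name `simd_select` is made up for the illustration.

```python
def simd_select(values, predicate, then_fn, else_fn):
    """Emulate lockstep lanes: every lane runs both branches, a mask picks the result."""
    mask = [predicate(v) for v in values]          # one boolean per lane
    then_results = [then_fn(v) for v in values]    # all lanes run the "then" path
    else_results = [else_fn(v) for v in values]    # all lanes run the "else" path
    return [t if m else e for m, t, e in zip(mask, then_results, else_results)]


lanes = [-2.0, -1.0, 0.5, 3.0]
# Negate negative inputs, square the others.
out = simd_select(lanes, lambda v: v < 0, lambda v: -v, lambda v: v * v)
print(out)  # [2.0, 1.0, 0.25, 9.0]
```

In this model a branch that splits the lanes costs the time of both paths, which is one reason branch-heavy general purpose code maps poorly onto this kind of hardware.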

Intel on the other hand are touting their future Larrabee as being completely compatible with the x86 instruction set. The whole thing, according to them, should behave like a big many-core in-order CPU - think of Tilera's Tile64, but with overclocked Pentium1 + SSE (hum... probably the Larrabee is going to be a many-core Silverthorne?)
They have mentioned the possibility of producing Larrabees that use QuickPath to communicate with the outside and fit into a server socket. They could even be used alone and boot the OS by themselves, thanks to the compatibility with the original x86 instruction set.

Of course, that's only marketing, there are no Larrabees out yet. We shall see...

--
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]

Re:Why haven't they started releasing GPU CPUs yet by gollito · 2008-05-31 13:22 · Score: 1

They are useful for applications that can be massively parallelized. Your average program can't break off into 128 threads, that takes a little bit of extra skill on the coder's part. If, for example, someone could port gcc to run on the GPU, think of how happy those Gentoo folks would be :) (make -j128)!

That would be "make -j129" (CPUs + 1 is the recommended setting).

Supercomputer? Hardly! by Binder · 2008-05-31 13:29 · Score: 1

Impressive performance yes... but a supercomputer? Hardly!
Next article will be "hey, I overclocked my Q6600 and now it's a supercomputer."

Realistically this is a rather normal computer with a rather fast coprocessor.

Re:Why haven't they started releasing GPU CPUs yet by mikael · 2008-05-31 13:43 · Score: 1

The major difference between CPUs and GPUs is that CPUs have to handle the effects of different branch conditions. One of the optimizations that CPUs have been designed for is the combination of pipelining and dual path evaluation. Because the CPU is pipelining instructions, it has to evaluate both outcomes of each conditional instruction in parallel with the actual condition, and then select the actual outcome once it is known. Calculating the condition first, followed by the outcome, would reduce performance by half.

As GPUs are dedicated to floating point data, they don't have the space to do this kind of logic.

--
Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads

Monitor Height & Ergonomics by cciRRus · 2008-05-31 14:39 · Score: 1

On a more serious note, they shouldn't have elevated their monitor. Generally, the top of computer monitors should be at the eye-level of the user. From the picture, it doesn't seem so. This would cause more strain on the user's neck as he/she needs to look up more than looking down.

The way our neck bones are structured makes looking up more strenuous than looking down. Hence, it is more comfortable to look downwards than upwards.

--
w00t

presentable? by CaptainNerdCave · 2008-05-31 17:09 · Score: 1

presentable and flashy? how about practical? did you notice the case comes with four 120mm led fans? http://www.newegg.com/Product/Product.aspx?Item=N82E16811112160

have you built many computers? with all of the systems i've put together, bundling wires is the most reasonable way to make the hardware manageable. additionally, the use of aftermarket cooling devices is often essential when you are planning to run your hardware at maximum capacity 24/7. if the plan for the hardware is something that the manufacturer never envisioned (which is clearly the case), then you have to find appropriate solutions. in the future, you may want to look a little more closely before jumping straight into harsh criticism.

personally, i think their heat issue could be solved with 1/2" tubes and water.

Personally, I'd add disclaimer to my self-link by Mathinker · 2008-05-31 19:14 · Score: 1

> Check out my blog post Nightmare on Core Street ....

There, fixed that (IMNSHO) for you...

Re:The idea is to use the CPU as the CPU by dreamchaser · 2008-05-31 20:53 · Score: 1

Oh I understand perfectly why there is a 'stir', and I also understand why they aren't anywhere close to replacing general purpose CPUs yet. Not even close. I never said it wasn't an interesting use for the technology.

Typical tomography matrix sizes by BuGless · 2008-05-31 20:58 · Score: 1

For tomography reconstructions one typically needs an SVD (Singular Value Decomposition) of matrices whose side length is n^2 for an n x n image. E.g. consider a reconstructable image of 128 x 128 = 16384 pixels total: this needs (temporary) processing matrices of size 16384 x 16384, which, as you can see, grows rather fast with even moderate reconstruction grids.
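To put numbers on how fast that grows, here is a small back-of-the-envelope sketch. It assumes the matrix is stored densely with one value per pixel pair; the helper name `dense_matrix_bytes` and the choice of grid sizes are only for illustration, and real reconstruction codes may well use sparse or blocked storage instead.

```python
def dense_matrix_bytes(grid_side: int, bytes_per_value: int) -> int:
    """Bytes needed for a dense (n x n) matrix where n = grid_side ** 2 pixels."""
    n = grid_side * grid_side          # number of pixels in the image
    return n * n * bytes_per_value     # one matrix entry per pixel pair


for side in (64, 128, 256):
    n = side * side
    gib32 = dense_matrix_bytes(side, 4) / 2**30   # single precision
    gib64 = dense_matrix_bytes(side, 8) / 2**30   # double precision
    print(f"{side}x{side} image -> {n}x{n} matrix: "
          f"{gib32:g} GiB (float32), {gib64:g} GiB (float64)")
```

Under these assumptions the 128 x 128 case already needs 1 GiB in single precision, and doubling the grid side multiplies the memory by 16.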

Re:Typical tomography matrix sizes by arktemplar · 2008-06-01 00:26 · Score: 1

Possibly, however no one in their right mind would process a matrix of that size without blocking. There are several 'Block SVD' algorithms, and there are also some projects done by students on doing SVD on GPUs.

http://www.cs.unc.edu/~geom/Numeric/svd/
http://www.google.co.in/search?q=block+SVD&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a

Agreed that the matrices it is processing are large, but what is the size of the blocks it is processing? IIRC the 8600 that the guys in my lab are working on can have 32 threads running in parallel; however, each thread will be working on a very small block (I've been told it's up to 32x32 double precision floating point, and that's stretching it).

Thus -blocking-, as I had said, comes into play. Look, I'm not downplaying the usefulness of the GPU or anything, nor am I downplaying their research, it's just that it's to be expected and the usefulness to general applications is limited.
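For anyone unfamiliar with blocking, a minimal sketch of the idea follows. Note the assumptions: it uses plain matrix multiplication rather than a block SVD (which is considerably more involved), the function name `blocked_matmul` is made up, and the default tile size of 32 is just borrowed from the figure mentioned above.

```python
def blocked_matmul(a, b, block=32):
    """Multiply square matrices (lists of lists) tile by tile."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, block):
        for j0 in range(0, n, block):
            for k0 in range(0, n, block):
                # Only one block x block tile of a, b and c is touched here,
                # which is the working set a small fast memory would hold.
                for i in range(i0, min(i0 + block, n)):
                    for k in range(k0, min(k0 + block, n)):
                        aik = a[i][k]
                        for j in range(j0, min(j0 + block, n)):
                            c[i][j] += aik * b[k][j]
    return c


print(blocked_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], block=1))
```

The result is the same as an unblocked multiply; what changes is that each inner step only needs a small tile in fast memory, which is the property that matters on hardware with little per-thread storage.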

--
blog plug -> The Darker Side of Light

Galaxy collisions? by Man+Eating+Duck · 2008-05-31 22:05 · Score: 1

I'm no great coder, nor mathematician, but wouldn't a GPU be perfect for simulations such as this one?

I imagine there are a lot of vector calculations going on, which GPUs are very good at. Maybe the communication between memory and GPU would be too slow.

Someone with more knowledge, please share your insights!

--
Are you a grammar Nazi? I'm trying to improve my English; please correct my errors! :)

Made _your_ blood boil, obviously by Mathinker · 2008-05-31 22:22 · Score: 1

Looking at the last sequence of three posts in this thread, it seems obvious whose blood is boiling. LOL

Re:Specific Tasks == FLOPS by FurtiveGlancer · 2008-05-31 23:40 · Score: 1

Most supercomputers are optimized to perform 32 or 64 bit floating point operations. Currently broaching the petaflop boundary. The reason is that most established uses of supercomputers rely on mathematical computation: Fourier Transforms, Linear Algebra, etc., which utilize floating point operations. Visualization and imaging have been done on GP supercomputers, but not very economically. There's constant buzz in the HPC community about FPGAs and GPUs, but CPUs remain the most flexible hardware to do the core job of HPC which is floating point computation.

I wouldn't be surprised to see HPC centers adding (as budgets permit) GPU based solutions for visualization, imaging, ray tracing, etc. FPGA solutions appear, IMHO, to be a bit farther away from use. The biggest hold up for both will be the programming interfaces. HPC programming interfaces are just emerging for GPU solutions. It remains to be seen which will succeed.

--
Invenio via vel creo

Re:bottleneck analysis by Chris+Snook · 2008-06-01 03:01 · Score: 1

a) The motherboard in question runs 4 PCIe slots at 8x, giving a theoretical limit of 2GB/s for each slot (thus card). The very large transfers they're probably making can probably approach something like 90% of this, so we'll say each card would like 1.8GB/s.

Actually, that's PCIe 2.0. They should be getting double that bandwidth.
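The arithmetic behind both figures, as a quick sketch. The per-lane rates are the usual per-direction numbers for PCIe 1.x and 2.0 after 8b/10b encoding; the 90% efficiency factor is simply the guess from the quoted post, and whether the board in question really runs its slots at 2.0 speeds is taken on trust here.

```python
# Per-lane, per-direction payload rate in MB/s after 8b/10b encoding overhead.
LANE_MB_PER_S = {"1.x": 250, "2.0": 500}


def slot_bandwidth_gb_per_s(generation: str, lanes: int) -> float:
    """Theoretical one-direction bandwidth of a slot in GB/s."""
    return LANE_MB_PER_S[generation] * lanes / 1000


for gen in ("1.x", "2.0"):
    peak = slot_bandwidth_gb_per_s(gen, 8)
    print(f"PCIe {gen} x8: {peak:.1f} GB/s peak, "
          f"{0.9 * peak:.1f} GB/s at 90% efficiency")
```

So an x8 slot gives 2 GB/s at 1.x speeds and 4 GB/s at 2.0 speeds, which is where the "double that bandwidth" comes from.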

You're describing "GPGPU clustering", and although there has been some work on this concept, it's not very attractive because it's even more difficult to tailor your application to the hardware, and you almost always need Infiniband (at least), which is so expensive that you're rapidly approaching the realm of specialized hardware prices anyway.

A lot of graphics work is considered to be "embarrassingly parallel", such that gigabit ethernet is more than adequate as an interconnect. This is what movie studios use for their render farms. Given that they've managed to parallelize this to the array of stream processors on the GPUs, it sounds like they're at least approaching the "embarrassingly parallel" level. Whether or not they're so parallel they can use commodity ethernet as an interconnect... that's the question. From a programming perspective, running on a single system is certainly easier, but what if someone wanted to scale this up by a factor of 1000? Would this problem scale like that, or would they be limited to a smaller, less interesting set of problems?
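What "embarrassingly parallel" looks like in code, as a rough sketch: each work item depends only on its own input, so the items can be handed out in any order with no communication between workers. The `render_frame` function is a made-up stand-in for real rendering work, and threads on one machine stand in for separate render nodes; in CPython this illustrates the independence of the tasks, not an actual speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def render_frame(frame_number: int) -> int:
    """Stand-in for real rendering work: depends only on its own frame number."""
    return sum((frame_number * x) % 7 for x in range(1000))


def render_all(frame_numbers, workers=4):
    """Farm the frames out to a pool; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(render_frame, frame_numbers))


frames = render_all(range(8))
print(len(frames))  # 8
```

Because no frame needs data from any other frame, the only traffic is sending inputs out and collecting results, which is why a slow interconnect can be good enough for this class of problem.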

--
There's no failure quite as dissatisfying as a complete and total solution to the wrong problem.

The question is... by okmijnuhb · 2008-06-01 04:43 · Score: 1

...But will it run Vista?

What about video editing needs ?? by Woofer60 · 2008-06-01 05:15 · Score: 1

I do some intense video editing and the 'rendering' phase takes forever. I have a dual core AMD, fast FSB...etc.. still slow. But wouldn't it make sense to have a GPU (or 2 or ..) just for those sort of number crunches ?? I would imagine rendering can be parallel processed..? I would think there would be a market for an add-in board that can crunch numbers like crazy and would work with the common video software (Vegas, Premiere, etc.) Maybe even have a direct socket for SATA drives so as to avoid the bus. Home video editing has come down in price so much that I see a huge future market there.

my cellphone beats a 1970s Cray by peter303 · 2008-06-01 10:10 · Score: 1

Comparing something to an old supercomputer is stupid. At any given time a supercomputer is orders of magnitude faster than ordinary machines (60 to 600 teraflops in mid-2008).

Eight Megabytes and Constantly Swapping... by billstewart · 2008-06-02 06:31 · Score: 1

Emacs used to be really big. Now it's pretty small, even with the graphics interfaces and kitchen sink written in ELisp. My browser's currently burning 300MB (the 3 Beta 5 version used much less RAM than 2.x, and I bought more RAM about the same time the RC version came out, so I don't know how much of the change is which :-) Of course, any time you take a 5-year-old application and run it on a current computer, it turns out to be small and blazingly fast, unless it has to play games with hardware emulation that can make it as slow as the original.

--

Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks

That's what Supercomputers *are* these days by billstewart · 2008-06-02 06:44 · Score: 1

If you look at most of the Top500 supercomputers these days, they're not Fluorinert-cooled Cray-2s any more - they're almost all lots of processors with some kind of interconnection network to get data between processors, and they're running lots of SIMD to make LINPACK go fast. This machine is using the CPU interconnects because they benchmarked it and found they didn't need to mess with SLI, and it may not have quite the I/O scalability that the Earth Simulator or its faster competitors have, but it's in the same kind of space. It's not near the top, but it's certainly respectable.

--

Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
